How to securely store passwords
Bharat Kalluri / 2020-09-24
Having a way to login/signup users is one of the most common features web applications need. But doing it right is hard. Let us start with the most naive and basic ways of storing passwords and go level by level. Trying to understand how and why each level is bad and how we can improve on that.
Storing passwords plain text
This is the worst of all. From a developer standpoint, it is the easiest. Whenever a user logs in, you just have to go to the database get the password associated with the username. Do a string match and if the match says true, log in the user. But what happens if someone gets illegal access to your database? All the user's passwords are disclosed! The ramifications are even worse! A lot of users use the same passwords for multiple websites. So maybe the user might not lose a lot if the password of the current website is leaked, but if the same password is used for a bank? Then things get ugly. This is much more common than you might think, in a study by Cyclonis it was said that a stunning 83% of users use the same password for multiple websites, which means if one of them was compromised then every other login which has the same password has a high probability of being hacked. Encoding the password is an equally bad option, anyone with a decent computer science background can decode a base 64 encoded password in seconds. This is obviously bad. We can do much better.
Encrypting passwords
A quick 101 on encryption, you have a key and the encryption algorithm uses the key to convert information to garbled text. Later on, you can use the same key that can be used to decrypt the garbled text to the original information. This is heavily used in all modern communication platforms these days.
This is how it would work. The developer decides on a very good encryption algorithm, best in class even. And decides to encrypt all passwords with some key. Now whenever the system gets a password of the user, it decrypts the password stored in the database and does a string match. Although this sounds secure, This has pretty much the same problems as before. What happens when the encryption key gets in the wrong hands. You think all keys are carefully shared among the team members? Most probably than not, Keys are shared over slack and if someone can get access to slack, then they can basically go in and decrypt all user passwords. Something similar happened to twitter, where hackers gained access to sensitive information after internal tooling got compromised Let us go a step further.
Hashing passwords
Now we are talking! Hashing is a one-way function, unlike encryption. Encryption means you can convert passwords back to plain text using the key. If a password is hashed using a good algorithm, then it can't be converted back to text.
This is how it would work. When the user signs up, the system hash-es the password and stores it in the database. The next time the user puts in his user name and password. It would use the same hashing algorithm and hash the password again. If the generated hash matches with the hash in the database, then the user is verified and can log in. If not then the system rejects the user. Now, even if a hacker gets access to the database and can see all the password hashes, they cannot be used to retrieve the original plain text passwords.
But it is more complicated than that, the hacker cannot go back and find the plain text password from the hash. But The hacker can generate all possible combinations of text and numbers between length 6-12 and hash them. And then compare the generated hashes with the hash found in the database and thereby understand what was the original password. There are multiple databases of common passwords and their corresponding hashes. So if you're password is one of them. Then it just takes a search to figure out the password.
This might seem complicated and hard to do, but it is not. We have gigabytes of data saying what password corresponds to what hash and if the company is using a
simple algorithm for passwords and it gets leaked. it gets surprisingly simple to figure out passwords. For example, go ahead and paste this into your favorite search engine and see what results you get. 6eea9b7ef19179a06954edd0f6c05ceb
(This actually happens to be one of the most popular passwords, hashed format). Open the first link
and you can already see what is the hashing algorithm used and what is the original text.
Hashing passwords with salt
A salt is a random set of bytes used along with the password to hash. If you add random bytes to a very common password, the hash changes dramatically and thus will not be easy to crack. Even if the hacker gets access to the hashes, it will become significantly computationally expensive to crack passwords. Since every salt is unique and recreating all the databases with each salt is too much effort. The password salt can be stored as plain text as well.
Now the workflow would be, the user enters the password. The system generates a random salt, uses the random salt and the password, and hashes the password. Stores the hashed password+salt in the database and salt in plain text in another field. Next time the user tries logging in, it will take the salt along and hash the password and if it matches allows the user in. Or else rejects. One of the most popular algorithms for hashing now is bcrypt. There are libraries in almost all popular languages to simplify this. Most of these libraries handle the hash + salt for you and return back a string that stores the salt as well. So that the developer does not have to worry about storing the salt in a separate place.
Here is an example of bcrypt password hashing using golang
package main
import (
"fmt"
"golang.org/x/crypto/bcrypt"
)
func main() {
password := "secret"
hash, _ := bcrypt.GenerateFromPassword([]byte(password), bcrypt.MinCost)
fmt.Println("Password:", password)
fmt.Println("Hash: ", string(hash))
match := bcrypt.CompareHashAndPassword(hash, []byte(password))
fmt.Println("Match: ", (match == nil))
}
One thing I would like to mention is that bcrypt is not the state of the art password hashing algorithm right now. There is an algorithm called argon2 which is deemed to be the most secure. But it involves some tweaking and tuning from the developer's side for now for best performance. If you would like to read more about how to hash passwords using argon2 and golang, check out this awesome article by Alex Edwards.
Passwordless
The final and the best option in my opinion is the passwordless mechanism. In this method, the system sends a one time code to the user's email. The user inputs the one time code into the app. We verify the code and let the user in. This system assumes that the user email is secure, which is usually a fair assumption to make. This also offloads the security to the email client and email providers like google tend to have really good security protocols. This flow has been used for password reset for a long time.
Conclusion
If you can, go the passwordless route, and don't worry about passwords at all. If not, make sure you are using the state of art algorithms to carefully store passwords. Do not log passwords anywhere, this might seem like straightforward advice (And it is.) but multiple giants like Robinhood, Facebook and instagram have been subjected to these problems.
Hope this post has made it clear how you can work with passwords for your next project. Cheers!