Salt Before Hash or Hash Before Salt?

It is clear that we should salt and hash passwords before writing to the database. For this purpose, I have seen 2 different fundamental approaches:
Salting before hashing
hash(password + salt)
Hashing before salting
hash(hash(password) + salt)
My instincts somehow say that there is something wrong with the 2nd approach due to the hashing before salting, but that is only my instinct, without any cryptographic basis.
Is one of these methods more secure than the other? Is there any weakness in the 2nd method?

Recommended password-hashing algorithms such as PBKDF2 or BCrypt do not calculate a hash in a single pass, so neither of the described approaches is sufficient. They offer a cost factor that controls how much time is needed to calculate a single hash, typically by determining how many rounds of hashing are done.
How the salt is applied is part of the algorithm, so it is best to leave this to the algorithm.
This is how BCrypt applies the salt (from Wikipedia):
EksBlowfishSetup(cost, salt, key)
    state ← InitState()
    state ← ExpandKey(state, salt, key)
    repeat (2^cost)
        state ← ExpandKey(state, 0, key)
        state ← ExpandKey(state, 0, salt)
    return state
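As a rough illustration of leaving the salt handling to the algorithm, here is a minimal sketch using PBKDF2 from Python's standard library (bcrypt itself is not in the standard library). The names `hash_password` and `verify_password`, the 16-byte salt, and the iteration count are assumptions made for this example, not values from the answer above.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed cost factor; tune it for your own hardware


def hash_password(password: str) -> tuple:
    """Return (salt, derived_key) for a new password."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, expected)
```

The caller never decides whether the salt goes before or after the password; it only supplies the password, the salt, and the cost, and the function takes care of the mixing.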

Related

Why have a good salt?

Let's say we don't use password_hash and use crypt() with sha512 instead to hash passwords. We need to add a salt to the password so that an attacker can't use a rainbow table attack. Why does the salt have to be good and very random, as stated in many SO answers? Even if salts differ only a little or are not very random, each will still give a totally different hash from the others. So an attacker won't know who uses the same passwords, and he still won't be able to create just one rainbow table.

Computing and storing a strong salt requires minimal effort, yet it makes the chances of a rainbow table having been pre-computed with that salt astronomically small.
If the salt was a 3 digit number it would be feasible that an attacker could have pre-computed rainbow tables for all possible salt combinations. If the salt is a random 24 character alpha-numeric string then the chances an attacker could pre-compute this for all possible salts are practically zero.

A salt is supposed to be unique, must be long enough, and should be unpredictable. Randomness is not necessary, but it is the easiest way for a computer to meet those requirements. And it is not the purpose of a salt to be secret, a salt fulfills its purpose even when known.
Uniqueness means that it should not only be unique in your database (otherwise you could use a userid), it should be unique worldwide. Somebody could create rainbow tables for salts like 1-1000 and would then be able to retrieve passwords for all accounts with those userids (admin accounts often have low userids).
Long enough: If the salt is too short (too few possible combinations), it becomes profitable again to build rainbow tables. Salt and password together can then be seen as just a longer password, and if you can build a rainbow table for these longer passwords, you also get the shorter original passwords. For very strong and long passwords, salting would actually not be necessary at all, but most human-generated passwords can be brute-forced because they are short (people have to remember them).
Salts derived from other parameters can also fall into this category. Just because you calculate a hash from the userid does not mean the number of possible combinations increases.
Unpredictability is a bit less important, but imagine once more the case where you use the userid as salt: an attacker can find out what the next few userids will be, and can therefore precalculate a small number of rainbow tables. Depending on the hash algorithm used, this may or may not be applicable. He then has a time advantage and can retrieve the password immediately. It becomes more of a problem if the admin accounts use a predictable salt.
So using a really random number, generated from the OS random source (/dev/urandom), is the best you can do. Even if you ignore all the reasons above, why should you use a derived salt when there is a better way? Why not use the best way you know?
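A minimal sketch of generating such a salt in Python, assuming a 16-byte (128-bit) length; the helper name `new_salt` is made up for this example. The `secrets` module draws from the operating system's random source (the same one behind /dev/urandom on Linux).

```python
import secrets


def new_salt(num_bytes: int = 16) -> bytes:
    """Return a random salt drawn from the operating system's CSPRNG."""
    return secrets.token_bytes(num_bytes)


salt = new_salt()
print(salt.hex())  # store this next to the hash; it does not need to be secret
```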

Is salting a password pointless if someone gains access to the salt key? Off server salting?

Hearing about all the recent hacks at big tech firms made me wonder about how they store passwords.
I know salting + hashing is accepted as being generally secure, but every example I've seen of salting has the salt key hard-coded into the password script, which is generally stored on the same server.
So is it a logical solution to hash the user's password initially, pass that hash to a "salting server" or some function stored off-site, then pass back the salted hash?
The way I'm looking at it is, if an intruder gains access to the server or database containing the stored passwords, they won't immediately have access to the salt key.

No -- salt remains effective even if known to the attacker.
The idea of salt is that it makes a dictionary attack on a large number of users more difficult. Without salt, the attacker hashes all the words in a dictionary, and sees which match with your users' hashed passwords. With salt, he has to hash each word in the dictionary many times over (once for each possible salt value) to be certain of having one that fits each user.
This multiplication by several thousand (or possibly several million, depending on how large a salt you use) increases the time to hash all the values and the storage needed to store the results, to the point that (you hope) it's impractical.
I should add, however, that in many (most?) cases, a very large salt doesn't really add a lot of security. The problem is that if you use, say, a 24 bit salt (~16 million possible values) but have only, say, a few hundred users, the attacker can collect the salt values you're actually using ahead of time, then do his dictionary attack for only those values instead of the full ~16 million potential values. In short, your 24-bit salt adds only a tiny bit of difficulty beyond what a ~8 bit salt would have provided.
OTOH, for a large server (Google, Facebook, etc.) the story is entirely different -- a large salt becomes quite beneficial.

Salting is useful even if intruder knows the salt.
If passwords are NOT salted, it is possible to use widely available precomputed rainbow tables to quickly attack your passwords.
If your password table is salted, it becomes very difficult to precompute rainbow tables: it is impractical to create a rainbow table for every possible salt.
If you use a random salt that is different for every password entry, and put it in plaintext right next to the hash, it becomes very difficult for an intruder to attack your passwords by any means short of a brute-force attack.
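To make the point concrete, here is a small sketch with a three-entry dictionary standing in for a precomputed table. Plain SHA-256 is used only to show the effect of the salt, not as a recommendation for storing passwords; all names are made up for this example.

```python
import hashlib
import os

COMMON_PASSWORDS = ["123456", "password", "letmein"]  # tiny stand-in for a real table

# Precomputed lookup table: unsalted SHA-256 digest -> password.
precomputed = {hashlib.sha256(p.encode()).hexdigest(): p for p in COMMON_PASSWORDS}


def unsalted_hash(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()


def salted_hash(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()


salt_a, salt_b = os.urandom(16), os.urandom(16)

# The unsalted hash is found in the table by a single lookup.
print(precomputed.get(unsalted_hash("password")))
# The salted hash is not in the table.
print(precomputed.get(salted_hash("password", salt_a)))
# Two users with the same password but different salts get different hashes.
print(salted_hash("password", salt_a) == salted_hash("password", salt_b))
```

The attacker can still guess "password" for each user individually, because the salt is known, but the work has to be redone per salt instead of once for everybody.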

Salting passwords protects passwords against attacks where the attacker has a list of hashed passwords. There are some common hashing algorithms that hackers have tables for that allow them to look up a hash and retrieve the password. For this to work, the hacker has to have broken into the password storage and stolen the hashes.
If the passwords are salted, then the attacker must re-generate their hash tables, using the hashing algorithm and the salt. Depending on the hashing algorithm, this can take some time. To speed things up, hackers also use lists of the most common passwords and dictionary words. The idea of the salt is to slow an attacker down.
The best approach is to use a different salt for each password and make it long and random; it's ok to store the salt next to each password. This really slows an attacker down, because they would have to run their hash table generation for each individual password, for every combination of common passwords and dictionary words. This would make it implausible for an attacker to deduce strong passwords.
I had read a good article on this, which I can't find now. But Googling 'password salt' gives some good results. Have a look at this article.

I would like to point out, that the scheme you described with the hard-coded salt, is actually not a salt, instead it works like a key or a pepper. Salt and pepper solve different problems.
A salt should be generated randomly for every password, and can be stored together with the hashed password in the database. It can be stored in plain text, and fulfills its purpose even when known to the attacker.
A pepper is a secret key, that will be used for all passwords. It will not be stored in the database, instead it should be deposited in a safe place. If the pepper is known to the attacker, it becomes useless.
I tried to explain the differences in a small tutorial, maybe you want to have a look there.
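A rough sketch of how the two could be combined, assuming the pepper is applied as an HMAC key before a salted PBKDF2 derivation. This is one possible arrangement, not the only one; the pepper value, the function name, and the iteration count are hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical pepper: in practice it would be loaded from a secret store, not hard-coded.
PEPPER = b"example-pepper-kept-outside-the-database"


def hash_with_salt_and_pepper(password: str, salt: bytes) -> bytes:
    # Pepper: a keyed HMAC with a server-side secret shared by all passwords.
    peppered = hmac.new(PEPPER, password.encode("utf-8"), hashlib.sha256).digest()
    # Salt: unique per password, stored in plain text next to the result.
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, 100_000)


salt = os.urandom(16)
stored = hash_with_salt_and_pepper("hunter2", salt)
```

Only `salt` and `stored` would go into the database; an attacker who steals the database but not the pepper cannot even start guessing.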

Makes sense. Seems like more effort than worth (unless its a site of significant worth or importance) for an attacker.
all sites small or large, important or not, should take password hashing as high importance
as long as each hash has its own large random salt then yes, it does become mostly impracticable; if each hash uses a static salt you can use rainbow tables to weed out the hashes of users who used password1, for example
using a good hashing algorithm is important as well (using MD5 or SHA1 is nearly like using plaintext with the multi-GPU setups these days); use scrypt, if not then bcrypt, or if you have to use PBKDF2 then you need the rounds to be very high

Best practice for hashing passwords - SHA256 or SHA512?

I am currently using SHA256 with a salt to hash my passwords. Is it better to continue using SHA256 or should I change to SHA512?

Switching to SHA512 will hardly make your website more secure. You should not write your own password hashing function. Instead, use an existing implementation.
SHA256 and SHA512 are message digests; they were never meant to be password-hashing (or key-derivation) functions. (A message digest could, however, be used as a building block for a KDF, as in PBKDF2 with HMAC-SHA256.)
A password-hashing function should defend against dictionary attacks and rainbow tables. In order to defend against dictionary attacks, a password hashing scheme must include a work factor to make it as slow as is workable.
Currently, the best choice is probably Argon2. This family of password hashing functions won the Password Hashing Competition in 2015.
If Argon2 is not available, the only other standardized password-hashing or key-derivation function is PBKDF2, which is an oldish NIST standard. Other choices, if using a standard is not required, include bcrypt and scrypt.
Wikipedia has pages for these functions:
https://en.wikipedia.org/wiki/Argon2
https://en.wikipedia.org/wiki/Bcrypt
https://en.wikipedia.org/wiki/Scrypt
https://en.wikipedia.org/wiki/PBKDF2
EDIT: NIST does not recommend using message digests such as SHA2 or SHA3 directly to hash passwords! Here is what NIST recommends:
Memorized secrets SHALL be salted and hashed using a suitable one-way
key derivation function. Key derivation functions take a password, a
salt, and a cost factor as inputs then generate a password hash. Their
purpose is to make each password guessing trial by an attacker who has
obtained a password hash file expensive and therefore the cost of a
guessing attack high or prohibitive. Examples of suitable key
derivation functions include Password-based Key Derivation Function 2
(PBKDF2) [SP 800-132] and Balloon [BALLOON].
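As an illustration of the "cost factor" input, the following sketch times PBKDF2-HMAC-SHA256 from Python's standard library and doubles the iteration count until one derivation reaches a target duration. The function names, the starting count, and the 0.1 second target are assumptions for this example, not NIST figures.

```python
import hashlib
import os
import time


def time_pbkdf2(iterations: int) -> float:
    """Return the seconds taken by one PBKDF2-HMAC-SHA256 derivation."""
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"example password", salt, iterations)
    return time.perf_counter() - start


def calibrate(target_seconds: float = 0.1, start: int = 10_000) -> int:
    """Double the iteration count until one derivation takes at least target_seconds."""
    iterations = start
    while time_pbkdf2(iterations) < target_seconds:
        iterations *= 2
    return iterations


if __name__ == "__main__":
    print(calibrate())  # the result depends on the machine it runs on
```

Every guess an attacker makes against a stolen hash costs the same number of iterations, which is what makes the guessing attack expensive.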

SHA256 is still NIST Approved, but it would be good to change to SHA512, or bcrypt, if you can.
The list of NIST approved hash functions, at time of writing, is: SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256, and SHA3-224, SHA3-256, SHA3-384, and SHA3-512, SHAKE128 and SHAKE256.
See https://csrc.nist.gov/projects/hash-functions
Depending on what operating system you are running, you probably don't have access to the SHA3 or SHAKE hash functions.
Many people prefer bcrypt to SHA512, but bcrypt is also only available on some operating systems.
SHA512 will be available on your system, or if not, you probably have such an old system that choice of hashing algorithm is the least of your problems.
One reason commonly given for preferring bcrypt is that bcrypt is tuneable - you can increase the number of rounds (work factor) to increase the time it takes to crack bcrypt hashes.
But SHA256 and SHA512 are also tuneable. While the default is 5000 rounds, you can specify more if you wish. 500000 takes my current pc about 0.45 seconds to calculate, which feels tolerable.
e.g.:
password required pam_unix.so sha512 shadow rounds=500000 ...
The reason to change from SHA256 to SHA512 is that SHA256 needs a lot more rounds to be as secure as SHA512, so while it's not insecure, it's less secure.
See, for example: https://medium.com/@davidtstrauss/stop-using-sha-256-6adbb55c608
Crypto changes quickly, so any answer you get might be proved wrong tomorrow, but current state of the art is that while bcrypt is possibly better than SHA512, SHA512 is fine.
If SHA512 is what you have available 'out of the box', use it (not SHA256), and don't worry about bcrypt or any of the SHA3 family until they become standard for your distribution.
As an aside, the current top rated answer has a number of claims that are either wrong or misleading.
"Switching to SHA512 will hardly make your website more secure."
This is misleading. Switching to SHA512 will make your site slightly more secure. SHA256 isn't as good as SHA512, but it isn't dreadful either. There's nothing that is clearly better than SHA512 that is likely to be available on your system yet. Bcrypt might be better, but this isn't clear, and bcrypt isn't available on a lot of systems. The SHA3 family is probably better, but it isn't widely available either.
"SHA256 and SHA512 were never meant to be password-hashing"
This is wrong. Both SHA256 and SHA512 are approved NIST hash algorithms.
"to defend against dictionary attacks, a password hashing scheme must include a work factor to make it as slow as is workable."
This is wrong. A high work factor will protect against brute force hash cracking, but not against a dictionary attack. There is no work factor that is low enough to be usable but high enough to protect against a dictionary attack. If your password is a dictionary word, it will fall to a dictionary attack. The protection against a dictionary attack is to not use passwords that can be found in dictionaries.
On my current PC, the limit on rounds seems to be 10 million, which produces a delay of 8.74 seconds for each password entered. That's long enough to be extremely painful, longer than you'd want to use. It's long enough to prevent a brute force attack - but a determined adversary with a good cracking rig and a bit of patience could still iterate through a dictionary if they wanted to.
"A password-hashing function should defend against ... rainbow tables"
This is, at best, misleading. The defence against rainbow tables is to make sure that each password has its own 'salt'. That's pretty much standard practice these days, and it happens before the hash function is called. (Salting means adding a random string to the password before hashing it. The salt is stored with the password, so it's not a secret, but it does mean that even if a user picks a well-known password, the attacker can't just recognise that {this hash} belongs to {that password}; they still need to crack the hash.)
"Currently, the best choice is probably Argon2. This family of password hashing functions won the Password Hashing Competition in 2015."
This is unclear. Any 'new' cryptographic function can have unobvious ways of being broken, which is why most people prefer functions that have been widely used. Besides which, Argon2 is probably not available to you.
"Other choices, if using a standard is not required, include bcrypt and scrypt."
This is unclear. At one point, scrypt was seen as a better bcrypt. However, for various reasons, sentiment has moved away from scrypt towards bcrypt. See, for example: https://blog.ircmaxell.com/2014/03/why-i-dont-recommend-scrypt.html
To repeat, at this point in time, SHA512 appears to be a good choice and so does bcrypt.
SHA512 is NIST approved and bcrypt is not.
SHA512 will almost certainly be available on your system. Bcrypt may or may not be.
If both are on your system, I'd probably recommend bcrypt, but it's a close call. Either is fine.

This has already been answered reasonably well, if you ask me: https://stackoverflow.com/questions/3897434/password-security-sha1-sha256-or-sha512
Jeff had an interesting post on hashing, too: http://www.codinghorror.com/blog/2012/04/speed-hashing.html
Note that SHA512 is a lot slower to compute than SHA256. In the context of secure hashing, this is an asset. Slower to compute hashes mean it takes more compute time to crack, so if you can afford the compute cost SHA512 will be more secure for this reason.

SHA512 may be significantly faster when calculated on most 64-bit processors, as SHA256 uses 32-bit math, an operation that is often slightly slower.

Outside of the really good and more practical/accurate answers regarding passwords, I have another perspective (one that I think is complementary to the others).
We use tools and companies to perform vulnerability assessments. One red flag we've had in code is use of MD5. This was not anything related to passwords... it was simply to generate a digest for a string. MD5 is nice and short, and really not a security issue for this specific scenario.
The problem is, it takes time to configure scanners to ignore these false-positives. And it is much more difficult to modify a security report written by an external vendor, in order to change the "high risk" finding to "low risk" or removed.
So my view is, why not use a better algorithm? In my case, I'm starting to use SHA512 in place of MD5. The length is a bit obscene compared to MD5, but for me it doesn't matter. Obviously, one's own performance needs in either calculation or storage would need to be considered.
As an aside for my situation, switching from MD5 to SHA256 would probably also be okay and not raise any red flags... but that leads me to my "why not use a better algorithm" comment.

What would the ideal password hashing algorithm look like?

Disclaimer: there are many similar questions on SO, but I am looking for a practical suggestion instead of just general principles. Also, feel free to point out implementations of the "ideal" algorithm (PHP would be nice ;), but please provide specifics (how it works).
What is the best way to calculate hash string of a password for storing in a database? I know I should:
use salt
iterate hashing process multiple times (hash chaining for key stretching)
I was thinking of using such algorithm:
x = md5( salt + password);
repeat N-times:
x = md5( salt + password + x );
I am sure this is quite safe, but there are a few questions that come to mind:
would it be beneficial to include username in salt?
I have decided to use a common salt for all users, any downside in this?
what is the recommended minimum salt length, if any?
should I use md5, sha or something else?
is there anything wrong with the above algorithm / any suggestions?
... (feel free to provide more :)
I know the decisions necessarily depend on the situation, but I am looking for a solution that would:
provide as much security as possible
be fast enough ( < 0.5 second on a decent machine )
So, what would the ideal algorithm look like, preferably in pseudo-code?

The "ideal" password hashing function, right now, is bcrypt. It includes salt processing and a configurable number of iterations. There is a free opensource PHP implementation.
Second best would be PBKDF2, which relies on an underlying hash function and is somewhat similar to what you suggest. There are technical reasons why bcrypt is "better" than PBKDF2.
As for your specific questions:
1. would it be beneficial to include username in salt?
Not really.
2. I have decided to use a common salt for all users, any downside in this?
Yes: it removes the benefits of having a salt. The salt's sole reason to exist is to be unique for each hashed password. This prevents an attacker from attacking two hashed passwords with less effort than twice that of attacking one hashed password. Salts must be unique. Even having a per-user salt is bad: the salt must also be changed when a user changes his password. The kind of optimization that an attacker may apply when a salt is reused / shared includes (but is not limited to) tables of precomputed hashes, such as rainbow tables.
3. what is the recommended minimum salt length, if any?
A salt must be unique. Uniqueness is a hard property to maintain. But by using long enough random salts (generated with a good random number generator, preferably a cryptographically strong one), you get uniqueness with a high enough probability. 128-bit salts are long enough.
4. should I use md5, sha or something else?
MD5 is bad for public relations. It has known weaknesses, which may or may not apply to a given usage, and it is very hard to "prove" with any kind of reliability that these weaknesses do not apply to a specific situation. SHA-1 is better, but not "good", because it also has weaknesses, albeit much less serious ones than MD5's. SHA-256 is a reasonable choice. As was pointed out above, for password hashing, you want a function which does not scale well on parallel architectures such as GPU, and SHA-256 scales well, which is why the Blowfish-derivative used in bcrypt is preferable.
5. is there anything wrong with the above algorithm / any suggestions?
It is homemade. That's bad. The trouble is that there is no known test for the security of a cryptographic algorithm. The best we can hope for is to let a few hundred professional cryptographers try to break an algorithm for a few years. If they cannot, then we can say that although the algorithm is not really "proven" to be secure, at least its weaknesses must not be obvious. Bcrypt has been around, widely deployed, and analyzed for 12 years. You cannot beat that by yourself, even with the help of StackOverflow.
As a professional cryptographer myself, I would raise a suspicious eyebrow at the use of simple concatenation in MD5 or even SHA-256: these are Merkle–Damgård hash functions, which is fine for collision resistance but does not provide a random oracle (there is the so-called "length extension attack"). In PBKDF2, the hash function is not used directly, but through HMAC.
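To show the difference in shape between the two constructions, here is a small sketch contrasting plain concatenation with HMAC, using Python's standard library. The function names are made up for this example, and neither function on its own is a password hash; they only illustrate the building block PBKDF2 uses, where the password acts as the HMAC key.

```python
import hashlib
import hmac


def naive_digest(salt: bytes, password: bytes) -> bytes:
    # Simple concatenation fed straight into a Merkle-Damgard hash.
    return hashlib.sha256(salt + password).digest()


def hmac_digest(salt: bytes, password: bytes) -> bytes:
    # HMAC wraps the hash in two keyed passes; the password is the key, as in PBKDF2.
    return hmac.new(password, salt, hashlib.sha256).digest()


salt = b"per-password-salt"
print(naive_digest(salt, b"secret").hex())
print(hmac_digest(salt, b"secret").hex())
```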

I tend to use a fixed application salt, the username and the password
Example...
string prehash = "mysaltvalue" + "myusername" + "mypassword";
The benefit here is that people using the same password don't end up with the same hash value, and it prevents people with access to the database copying their password over another users - of course, if you can access the DB you don't really need to hack a login to get the data ;)
IMO, salt length doesn't matter too much, the hashed value length is always going to be 32 anyway (using MD5 - which again is what I would use)
I would say in terms of security, this password encryption is enough, the most important thing is to make sure your application/database has no security leaks in it!
Also, I wouldn't bother with repeated hashing, no point in my opinion. Somebody would have to know your algorithm to try to hack it that way, and then it doesn't matter if it is hashed once or many times: if they know it, they know it

What's the difference between bcrypt and hashing multiple times?

How is bcrypt stronger than, say,
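import hashlib

def md5lots(password, salt, rounds):
    # Naive stretching: re-hash the previous result together with the salt, `rounds` times.
    if rounds < 1:
        return password
    newpass = hashlib.md5((password + salt).encode()).hexdigest()
    return md5lots(newpass, salt, rounds - 1)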
I get the feeling, given its hype, that more intelligent people than me have figured out that bcrypt is better than this. Could someone explain the difference in 'smart layman' terms?

The principal difference - MD5 and other hash functions designed to verify data have been designed to be fast, and bcrypt() has been designed to be slow.
When you are verifying data, you want the speed, because you want to verify the data as fast as possible.
When you are trying to protect credentials, the speed works against you. An attacker with a copy of a password hash will be able to execute many more brute force attacks because MD5 and SHA1, etc, are cheap to execute.
bcrypt in contrast is deliberately expensive. This matters little when there are one or two tries to authenticate by the genuine user, but is much more costly to brute-force.
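A rough way to see the cost difference on your own machine is sketched below. bcrypt is not part of Python's standard library, so PBKDF2 with 100,000 iterations is used here as a stand-in for a deliberately slow function; the helper name, the salt, and the repeat counts are assumptions for this example.

```python
import hashlib
import time


def seconds_per_guess(hash_once, repeats: int) -> float:
    """Average wall-clock seconds for one call of hash_once."""
    start = time.perf_counter()
    for _ in range(repeats):
        hash_once()
    return (time.perf_counter() - start) / repeats


salt = b"0123456789abcdef"  # fixed salt, for timing only
fast = seconds_per_guess(lambda: hashlib.md5(salt + b"guess").digest(), 10_000)
slow = seconds_per_guess(
    lambda: hashlib.pbkdf2_hmac("sha256", b"guess", salt, 100_000), 3
)
print(f"MD5: {fast:.2e} s per guess, PBKDF2 (100,000 iterations): {slow:.2e} s per guess")
```

A genuine user pays the slow cost once per login, while an attacker pays it for every single guess.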

There are three significant differences between bcrypt and hashing multiple times with MD5:
The size of the output: 128 bits (16 bytes) for MD5 and 184 bits (23 bytes) for bcrypt, which is usually stored as a 60-character string that also encodes the cost and the salt. If you store millions of hashes in a database, this has to be taken into account.
Collisions and preimage attacks are possible against MD5.
Bcrypt can be configured to iterate more and more as cpu's become more and more powerful.
Hence, using salting-and-stretching with MD5 is not as safe as using bcrypt. This issue can be solved by selecting a better hash function than MD5.
For example, if SHA-256 is selected, the output size will be 256-bits (32-bytes). If the salting-and-stretching can be configured to increase the number of iterations like bcrypt, then there is no difference between both methods, except the amount of space required to store result hashes.

You are effectively talking about implementing PBKDF2 or Password-Based Key Derivation Function. Effectively it is the same thing as BCrypt, the advantage being that you can lengthen the amount of CPU time it takes to derive a password. The advantage of this over something like BCrypt is that, by knowing how many 'Iterations' you have put the password through, when you need to increase it you could do it without resetting all the passwords in the database. Just have your algorithm pick up the end result as if it were at the nth iteration (where n is the previous iteration count) and keep going!
It is recommended that you use a proper PBKDF2 library instead of creating your own because, let's face it, as with all cryptography, the only way you know if something is safe is if it has been 'tested' by the interwebs. (see here)
Systems that use this method:
.NET has a library already implemented. See it here
Mac, Linux and Windows file encryption uses many-iteration (10,000+) versions of this method to secure their file systems.
Wi-Fi networks are often secured using this method of encryption
Source
Thanks for asking the question, it forced me to research the method i was using for securing my passwords.
TTD

Although this question is already answered, i would like to point out a subtle difference between BCrypt and your hashing-loop. I will ignore the deprecated MD5 algorithm and the exponential cost factor, because you could easily improve this in your question.
You are calculating a hash-value and then you use the result to calculate the next hash-value. If you look at the implementation of BCrypt, you can see, that each iteration uses the resulting hash-value, as well as the original password (key).
Eksblowfish(cost, salt, key)
    state = InitState()
    state = ExpandKey(state, salt, key)
    repeat (2^cost)
        state = ExpandKey(state, 0, key)
        state = ExpandKey(state, 0, salt)
    return state
This is the reason, you cannot take a Bcrypt-hashed password and continue with iterating, because you would have to know the original password then. I cannot prove it, but i suppose this makes Bcrypt safer than a simple hashing-loop.
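The difference can be sketched with two toy loops. Neither is bcrypt: SHA-256 is used as a stand-in, and both function names are made up for this example. The first loop only feeds on its previous output, so it can be continued from a stored hash; the second mixes the password into every round, so it cannot be continued without the password.

```python
import hashlib


def chain(value: bytes, salt: bytes, rounds: int) -> bytes:
    """Simple hashing loop: each round hashes only the previous result plus the salt."""
    for _ in range(rounds):
        value = hashlib.sha256(value + salt).digest()
    return value


def chain_with_key(password: bytes, salt: bytes, rounds: int) -> bytes:
    """Variant that mixes the original password into every round."""
    value = b""
    for _ in range(rounds):
        value = hashlib.sha256(value + password + salt).digest()
    return value


salt = b"example-salt"
stored = chain(b"secret", salt, 10)
# The stored hash alone is enough to continue from 10 rounds to 15.
upgraded = chain(stored, salt, 5)
print(upgraded == chain(b"secret", salt, 15))
```

With `chain_with_key`, the intermediate value after 10 rounds is not enough to compute round 11, because every round needs `password` again.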

Strictly speaking, bcrypt actually encrypts the text:
OrpheanBeholderScryDoubt
64 times.
But it does it with a key that was derived from your password and some randomly generated salt.
Password hashing is not hashing
The real virtue of "password hashing algorithms" (like bcrypt) is that they use a lot of RAM.
SHA2 is designed to be fast. If you're a real-time web-server, and you want to validate file integrity, you want something that runs extraordinarily fast, with extraordinarily low resource usage. That is the antithesis of password hashing.
SHA2 is designed to be fast
SHA2 can operate with 128 bytes of RAM
SHA2 is easily implementable in hardware
i own a USB stick device that can calculate 330 million hashes per second
in fact, i own 17 of them
If you perform a "fast" hash multiple times (e.g. 10,000 is a common recommendation for PBKDF2), then you're not really adding any security.
What you need is a hash that is difficult to implement in hardware. What you need is a hash that is hard to parallelize on a GPU.
Over the last few decades we've learned that RAM is the key to slowing down password hashing attempts. Custom hardware shines at performing raw computation (in fact, only 1% of your CPU is dedicated to computation - the rest is dedicated to jitting the machine instructions into something faster; pre-fetching, out-of-order-execution, branch prediction, cache). The way to stymie custom hardware is to make the algorithm have to touch a lot of RAM.
SHA2: 128 bytes
bcrypt: 4 KB
scrypt (configurable): 16 MB in LiteCoin
Argon2 (configurable): 64 MB in documentation examples
Password hashing does not mean simply using a fast hash multiple times.
A modern recommended bcrypt cost factor is 12; so that it takes about 250 ms to compute.
you would have to perform about 330,000 iterations of SHA2 to equal that time cost on a modern single-core CPU
But then we get back to my 2.5W, USB, SHA2 stick and its 330 Mhashes/sec. In order to defend against that, it would have to be 83M iterations.
If you're trying to add only CPU cost: you're losing.
You have to add memory cost
bcrypt is 21 years old, and it only uses 4KB. But it is still ~infinitely better than any amount of MD5, SHA-1, or SHA2 hashing.
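The iteration figures above follow from a simple multiplication of hash rate by target time, sketched here. The function name is made up for this example, and the rates are the ones quoted in this answer, not independent measurements.

```python
def iterations_to_match(hashes_per_second: float, target_seconds: float) -> int:
    """Iterations needed so that one password guess costs target_seconds on the given hardware."""
    return round(hashes_per_second * target_seconds)


# 330,000 iterations in 250 ms implies roughly 1.32 million SHA2 hashes per second on one core.
cpu_iterations = iterations_to_match(1_320_000, 0.25)
# The USB stick at 330 million hashes per second needs far more to cost the attacker the same 250 ms.
usb_iterations = iterations_to_match(330_000_000, 0.25)
print(cpu_iterations, usb_iterations)
```

The second number is 82.5 million, which the answer rounds to 83M.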
