I am currently using SHA256 with a salt to hash my passwords. Is it better to continue using SHA256 or should I change to SHA512?
Switching to SHA512 will hardly make your website more secure. You should not write your own password hashing function. Instead, use an existing implementation.
SHA256 and SHA512 are message digests; they were never meant to be password-hashing (or key-derivation) functions. (Although a message digest can be used as a building block for a KDF, such as in PBKDF2 with HMAC-SHA256.)
A password-hashing function should defend against dictionary attacks and rainbow tables. In order to defend against dictionary attacks, a password hashing scheme must include a work factor to make it as slow as is workable.
Currently, the best choice is probably Argon2. This family of password hashing functions won the Password Hashing Competition in 2015.
If Argon2 is not available, the only other standardized password-hashing or key-derivation function is PBKDF2, which is an oldish NIST standard. Other choices, if using a standard is not required, include bcrypt and scrypt.
Wikipedia has pages for these functions:
https://en.wikipedia.org/wiki/Argon2
https://en.wikipedia.org/wiki/Bcrypt
https://en.wikipedia.org/wiki/Scrypt
https://en.wikipedia.org/wiki/PBKDF2
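As a rough illustration of "use an existing implementation", here is a minimal sketch using the PBKDF2 implementation in Python's standard library. The iteration count and the helper names `hash_password` / `verify_password` are assumptions for the example, not recommendations from any standard.

```python
import hashlib
import hmac
import secrets

# Assumed work factor for illustration; tune it to your own hardware and latency budget.
ITERATIONS = 200_000

def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, int, bytes]:
    """Return (salt, iterations, derived_key) for storage alongside the user record."""
    salt = secrets.token_bytes(16)  # fresh random 128-bit salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key

def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Recompute the key with the stored salt and cost, then compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(key, expected)
```

Storing the iteration count next to the salt lets you raise the work factor later without breaking existing hashes.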
EDIT: NIST does not recommend using message digests such as SHA2 or SHA3 directly to hash passwords! Here is what NIST recommends:
Memorized secrets SHALL be salted and hashed using a suitable one-way
key derivation function. Key derivation functions take a password, a
salt, and a cost factor as inputs then generate a password hash. Their
purpose is to make each password guessing trial by an attacker who has
obtained a password hash file expensive and therefore the cost of a
guessing attack high or prohibitive. Examples of suitable key
derivation functions include Password-based Key Derivation Function 2
(PBKDF2) [SP 800-132] and Balloon [BALLOON].
SHA256 is still NIST Approved, but it would be good to change to SHA512, or bcrypt, if you can.
The list of NIST approved hash functions, at time of writing, is: SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256, and SHA3-224, SHA3-256, SHA3-384, and SHA3-512, SHAKE128 and SHAKE256.
See https://csrc.nist.gov/projects/hash-functions
Depending on what operating system you are running, you probably don't have access to the SHA3 or SHAKE hash functions.
Many people prefer bcrypt to SHA512, but bcrypt is also only available on some operating systems.
SHA512 will be available on your system, or if not, you probably have such an old system that choice of hashing algorithm is the least of your problems.
One reason commonly given for preferring bcrypt is that bcrypt is tuneable - you can increase the number of rounds (work factor) to increase the time it takes to crack bcrypt hashes.
But SHA256 and SHA512 are also tuneable. While the default is 5000 rounds, you can specify more if you wish. 500000 takes my current pc about 0.45 seconds to calculate, which feels tolerable.
e.g.:
password required pam_unix.so sha512 shadow rounds=500000 ...
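If you want to pick a rounds value by measurement rather than by feel, something like the sketch below works. Note that it times PBKDF2-HMAC-SHA512 from Python's standard library as a stand-in, so the numbers will not match sha512-crypt rounds one for one; the function names and starting values are made up for the example.

```python
import hashlib
import time

def seconds_per_hash(iterations: int) -> float:
    """Time one PBKDF2-HMAC-SHA512 derivation (a stand-in for sha512-crypt rounds)."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha512", b"example password", b"example salt 16b", iterations)
    return time.perf_counter() - start

def calibrate(target_seconds: float, start_iterations: int = 10_000) -> int:
    """Double the iteration count until one derivation takes at least target_seconds."""
    iterations = start_iterations
    while seconds_per_hash(iterations) < target_seconds:
        iterations *= 2
    return iterations
```

For example, `calibrate(0.45)` returns the first doubling step at which a single hash costs about as much as the 0.45 seconds mentioned above, on whatever machine runs it.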
The reason to change from SHA256 to SHA512 is that SHA256 needs a lot more rounds to be as secure as SHA512, so while it's not insecure, it's less secure.
See, for example: https://medium.com/#davidtstrauss/stop-using-sha-256-6adbb55c608
Crypto changes quickly, so any answer you get might be proved wrong tomorrow, but current state of the art is that while bcrypt is possibly better than SHA512, SHA512 is fine.
If SHA512 is what you have available 'out of the box', use it (not SHA256), and don't worry about bcrypt or any of the SHA3 family until they become standard for your distribution.
As an aside, the current top rated answer has a number of claims that are either wrong or misleading.
"Switching to SHA512 will hardly make your website more secure."
This is misleading. Switching to SHA512 will make your site slightly more secure. SHA256 isn't as good as SHA512, but it isn't dreadful either. There's nothing that is clearly better than SHA512 that is likely to be available on your system yet. Bcrypt might be better, but this isn't clear, and bcrypt isn't available on a lot of systems. The SHA3 family is probably better, but it isn't widely available either.
"SHA256 and SHA512 were never meant to be password-hashing"
This is wrong. Both SHA256 and SHA512 are approved NIST hash algorithms.
"to defend against dictionary attacks, a password hashing scheme must include a work factor to make it as slow as is workable."
This is wrong. A high work factor will protect against brute force hash cracking, but not against a dictionary attack. There is no work factor that is low enough to be usable but high enough to protect against a dictionary attack. If your password is a dictionary word, it will fall to a dictionary attack. The protection against a dictionary attack is to not use passwords that can be found in dictionaries.
On my current PC, the limit on rounds seems to be 10 million, which produces a delay of 8.74 seconds for each password entered. That's long enough to be extremely painful, longer than you'd want to use. It's long enough to prevent a brute force attack - but a determined adversary with a good cracking rig and a bit of patience could still iterate through a dictionary if they wanted to.
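To put a number on "a bit of patience", here is the arithmetic as a small sketch. The 8.74 seconds per guess is the figure quoted above; the dictionary size and the number of parallel workers are assumptions picked purely for illustration.

```python
SECONDS_PER_GUESS = 8.74      # figure quoted above for 10 million rounds
DICTIONARY_WORDS = 100_000    # assumed wordlist size, for illustration only

def days_to_exhaust(words: int, seconds_per_guess: float, parallel_workers: int = 1) -> float:
    """Worst-case days to try every word against one salted hash."""
    return words * seconds_per_guess / parallel_workers / 86_400

single = days_to_exhaust(DICTIONARY_WORDS, SECONDS_PER_GUESS)                        # about 10.1 days
rig = days_to_exhaust(DICTIONARY_WORDS, SECONDS_PER_GUESS, parallel_workers=100)     # about 2.4 hours
```

So even at the painful end of the scale, a wordlist of that size is a matter of days on one core and hours on a modest rig.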
"A password-hashing function should defend against ... rainbow tables"
This is, at best, misleading. The defence against rainbow tables is to make sure that each password has its own 'salt'. That's pretty much standard practice these days, and it happens before the hash function is called. (Salting means adding a random string to the password before hashing it. The salt is stored with the password, so it's not a secret, but it does mean that even if a user picks a well-known password, the attacker can't just recognise that {this hash} belongs to {that password}; they still need to crack the hash.)
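A toy sketch of that point: a precomputed table finds an unsalted hash by direct lookup, but misses the same password once a random salt is mixed in. Plain SHA-256 is used here only to keep the example short; it is not being suggested as a password hash, and the three-entry table is obviously a stand-in for a real one.

```python
import hashlib
import secrets

def unsalted(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

def salted(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()

# A toy "precomputed table": hash -> password for a few well-known passwords.
table = {unsalted(p): p for p in ["123456", "password", "letmein"]}

leaked_unsalted = unsalted("password")
salt = secrets.token_bytes(16)            # random per-password salt, stored with the hash
leaked_salted = salted("password", salt)

found_unsalted = table.get(leaked_unsalted)   # direct lookup succeeds
found_salted = table.get(leaked_salted)       # lookup fails: table was built without this salt
```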
"Currently, the best choice is probably Argon2. This family of password hashing functions won the Password Hashing Competition in 2015."
This is unclear. Any 'new' cryptographic function can have unobvious ways of being broken, which is why most people prefer functions that have been widely used. Besides which, Argon2 is probably not available to you.
"Other choices, if using a standard is not required, include bcrypt and scrypt."
This is unclear. At one point, scrypt was seen as a better bcrypt. However, for various reasons, sentiment has moved away from scrypt towards bcrypt. See, for example: https://blog.ircmaxell.com/2014/03/why-i-dont-recommend-scrypt.html
To repeat, at this point in time, SHA512 appears to be a good choice and so does bcrypt.
SHA512 is NIST approved and bcrypt is not.
SHA512 will almost certainly be available on your system. Bcrypt may or may not be.
If both are on your system, I'd probably recommend bcrypt, but it's a close call. Either is fine.
This has already been answered reasonably well, if you ask me: https://stackoverflow.com/questions/3897434/password-security-sha1-sha256-or-sha512
Jeff had an interesting post on hashing, too: http://www.codinghorror.com/blog/2012/04/speed-hashing.html
Note that SHA512 is a lot slower to compute than SHA256. In the context of secure hashing, this is an asset. Slower to compute hashes mean it takes more compute time to crack, so if you can afford the compute cost SHA512 will be more secure for this reason.
SHA512 may be significantly faster when calculated on most 64-bit processors, as SHA256 uses 32-bit math, an operation that is often slightly slower.
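Since the two comments above disagree about which is faster, the simplest thing is to measure on your own hardware. A minimal sketch with the standard library follows; the payload size and repeat count are arbitrary choices, and the result depends on the CPU and on how the underlying crypto library was built.

```python
import hashlib
import timeit

def seconds_to_hash(name: str, data: bytes, repeats: int = 200) -> float:
    """Total seconds to hash `data` `repeats` times with the named hashlib algorithm."""
    return timeit.timeit(lambda: hashlib.new(name, data).digest(), number=repeats)

payload = b"x" * 65536  # 64 KiB of input, an arbitrary size chosen for the sketch
results = {name: seconds_to_hash(name, payload) for name in ("sha256", "sha512")}
```

Printing `results` shows the total time for each algorithm on the machine that ran it.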
Outside of the really good and more practical/accurate answers regarding passwords, I have another perspective (one that I think is complementary to the others).
We use tools and companies to perform vulnerability assessments. One red flag we've had in code is use of MD5. This was not anything related to passwords... it was simply to generate a digest for a string. MD5 is nice and short, and really not a security issue for this specific scenario.
The problem is, it takes time to configure scanners to ignore these false-positives. And it is much more difficult to modify a security report written by an external vendor, in order to change the "high risk" finding to "low risk" or removed.
So my view is, why not use a better algorithm? In my case, I'm starting to use SHA512 in place of MD5. The length is a bit obscene compared to MD5, but for me it doesn't matter. Obviously, one's own performance needs in either calculation or storage would need to be considered.
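For a sense of the length difference, here is a small sketch with Python's `hashlib`. The input string is made up, and `usedforsecurity=False` (Python 3.9+) is there only to flag the MD5 call as a non-security use.

```python
import hashlib

text = "example string to fingerprint".encode("utf-8")

md5_hex = hashlib.md5(text, usedforsecurity=False).hexdigest()  # 32 hex characters (128 bits)
sha256_hex = hashlib.sha256(text).hexdigest()                   # 64 hex characters (256 bits)
sha512_hex = hashlib.sha512(text).hexdigest()                   # 128 hex characters (512 bits)
```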
As an aside for my situation, switching from MD5 to SHA256 would probably also be okay and not raise any red flags... but that leads me to my "why not use a better algorithm" comment.
Related
I am working on a project that has to store users' passwords. With that password you can gain access to a user's achievements and stuff, so it's really important that you cannot get the password even if you hacked into the database. My problem is which hashing function to choose in terms of security and efficiency.
Right now I am using sha256 with salt and pepper, but I read that using a slow hashing function like Bcrypt with a cost factor of 12 can be superior. And if it is, how much more security do I gain from using a pepper as well, because it's really time consuming for a hash function like Bcrypt.
My question is what hash function should I use, based on the assumption that I do not expect to get hacked by a global hacking organization with supercomputers?
Plain crypto hash functions like sha2 or sha3 don't cut it anymore for password hashing, because it's too efficient to compute them, which means an attacker that controls many cpu cores can compute large tables relatively quickly. In practice the feasibility of this still depends on the length of passwords mostly (and the character set used in those password to some extent), but you should still not use these unless you really know why that will be acceptable.
Instead, you should be using a hash or key derivation function more suitable for password hashing. Pbkdf2, bcrypt, scrypt and Argon2 are all acceptable candidates, with somewhat different strengths and weaknesses. For a regular web app, it almost doesn't matter which of these you choose, and all will be more secure than plain hashes.
The difference will be very real if your database is compromised: properly hashed passwords will likely not be revealed, while at least some passwords hashed with plain sha likely will, despite salt and pepper.
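As one concrete option from that list, Python's standard library ships scrypt. The sketch below is illustrative only: the cost parameters are assumptions (roughly 16 MiB of memory per hash), and `scrypt_hash` / `scrypt_verify` are hypothetical helper names.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Assumed cost parameters for illustration: roughly 16 MiB of memory per hash.
N, R, P = 2**14, 8, 1

def scrypt_hash(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Derive a 32-byte key; generates a fresh random salt when none is supplied."""
    salt = salt if salt is not None else secrets.token_bytes(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P,
                         maxmem=64 * 1024 * 1024, dklen=32)
    return salt, key

def scrypt_verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    _, key = scrypt_hash(password, salt)
    return hmac.compare_digest(key, expected)
```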
Disclaimer: there are many similar questions on SO, but I am looking for a practical suggestion instead of just general principles. Also, feel free to point out implementations of the "ideal" algorithm (PHP would be nice ;), but please provide specifics (how it works).
What is the best way to calculate hash string of a password for storing in a database? I know I should:
use salt
iterate hashing process multiple times (hash chaining for key stretching)
I was thinking of using such algorithm:
x = md5( salt + password);
repeat N-times:
    x = md5( salt + password + x );
I am sure this is quite safe, but there are a few questions that come to mind:
would it be beneficial to include username in salt?
I have decided to use a common salt for all users, any downside in this?
what is the recommended minimum salt length, if any?
should I use md5, sha or something else?
is there anything wrong with the above algorithm / any suggestions?
... (feel free to provide more :)
I know the decisions necessarily depend on the situation, but I am looking for a solution that would:
provide as much security as possible
be fast enough ( < 0.5 second on a decent machine )
So, what would the ideal algorithm look like, preferably in pseudo-code?
The "ideal" password hashing function, right now, is bcrypt. It includes salt processing and a configurable number of iterations. There is a free opensource PHP implementation.
Second best would be PBKDF2, which relies on an underlying hash function and is somewhat similar to what you suggest. There are technical reasons why bcrypt is "better" than PBKDF2.
As for your specific questions:
1. would it be beneficial to include username in salt?
Not really.
2. I have decided to use a common salt for all users, any downside in this?
Yes: it removes the benefits of having a salt. The salt's sole reason to exist is to be unique for each hashed password. This prevents an attacker from attacking two hashed passwords with less effort than twice that of attacking one hashed password. Salts must be unique. Even having a per-user salt is bad: the salt must also be changed when a user changes his password. The kind of optimization that an attacker may apply when a salt is reused / shared includes (but is not limited to) tables of precomputed hashes, such as rainbow tables.
3. what is the recommended minimum salt length, if any?
A salt must be unique. Uniqueness is a hard property to maintain. But by using long enough random salts (generated with a good random number generator, preferably a cryptographically strong one), you get uniqueness with a high enough probability. 128-bit salts are long enough.
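A minimal sketch of that, assuming Python's `secrets` module as the cryptographically strong generator. The helper names are made up, and the collision estimate is the usual birthday approximation rather than an exact figure.

```python
import secrets

SALT_BYTES = 16  # 128 bits, the length suggested above

def new_salt() -> bytes:
    """Draw a fresh salt from the OS CSPRNG; call this on every password set or change."""
    return secrets.token_bytes(SALT_BYTES)

def collision_probability(n_salts: int, bits: int = 128) -> float:
    """Birthday approximation: chance that any two of n random salts coincide."""
    return n_salts * (n_salts - 1) / 2 / 2**bits

salts = {new_salt() for _ in range(10_000)}
```

Even with a billion stored passwords, `collision_probability(10**9)` comes out far below anything worth worrying about, which is why random 128-bit salts are treated as unique in practice.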
4. should I use md5, sha or something else?
MD5 is bad for public relations. It has known weaknesses, which may or may not apply to a given usage, and it is very hard to "prove" with any kind of reliability that these weaknesses do not apply to a specific situation. SHA-1 is better, but not "good", because it also has weaknesses, albeit much less serious ones than MD5's. SHA-256 is a reasonable choice. As was pointed out above, for password hashing, you want a function which does not scale well on parallel architectures such as GPU, and SHA-256 scales well, which is why the Blowfish-derivative used in bcrypt is preferable.
5. is there anything wrong with the above algorithm / any suggestions?
It is homemade. That's bad. The trouble is that there is no known test for security of a cryptographic algorithm. The best we can hope for is to let a few hundred professional cryptographers try to break an algorithm for a few years -- if they cannot, then we can say that although the algorithm is not really "proven" to be secure, at least weaknesses must not be obvious. Bcrypt has been around, widely deployed, and analyzed for 12 years. You cannot beat that by yourself, even with the help of StackOverflow.
As a professional cryptographer myself, I would raise a suspicious eyebrow at the use of simple concatenation in MD5 or even SHA-256: these are Merkle–Damgård hash functions, which is fine for collision resistance but does not provide a random oracle (there is the so-called "length extension attack"). In PBKDF2, the hash function is not used directly, but through HMAC.
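To make the "through HMAC" point concrete, here is the first output block of PBKDF2-HMAC-SHA256 written out by hand, as a sketch for reading rather than for production use. The function name is made up; the result can be checked against `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac

def pbkdf2_one_block(password: bytes, salt: bytes, iterations: int) -> bytes:
    """First 32-byte PBKDF2-HMAC-SHA256 block, written out to show where HMAC is used."""
    # U_1 = HMAC(password, salt || INT_32_BE(1)); the password is the HMAC key.
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha256).digest()
    block = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        # U_j = HMAC(password, U_{j-1}); every round is keyed with the password again.
        u = hmac.new(password, u, hashlib.sha256).digest()
        block ^= int.from_bytes(u, "big")  # the output is the XOR of all U_j
    return block.to_bytes(32, "big")
```

Note that there is no bare `hash(salt + password)` anywhere: the password only ever enters as the HMAC key.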
I tend to use a fixed application salt, the username and the password
Example...
string prehash = "mysaltvalue" + "myusername" + "mypassword";
The benefit here is that people using the same password don't end up with the same hash value, and it prevents people with access to the database copying their password over another user's - of course, if you can access the DB you don't really need to hack a login to get the data ;)
IMO, salt length doesn't matter too much, the hashed value length is always going to be 32 anyway (using MD5 - which again is what I would use)
I would say in terms of security, this password encryption is enough, the most important thing is to make sure your application/database has no security leaks in it!
Also, I wouldn't bother with repeated hashing, no point in my opinion. Somebody would have to know your algorithm to try to hack it that way, and then it doesn't matter if it is hashed once or many times: if they know it, they know it.
How is bcrypt stronger than, say,
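import hashlib

def md5lots(password, salt, rounds):
    # password and salt are bytes; each round re-hashes the previous digest with the salt
    if rounds < 1:
        return password
    else:
        newpass = hashlib.md5(password + salt).digest()
        return md5lots(newpass, salt, rounds - 1)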
I get the feeling, given its hype, that more intelligent people than me have figured out that bcrypt is better than this. Could someone explain the difference in 'smart layman' terms?
The principal difference - MD5 and other hash functions designed to verify data have been designed to be fast, and bcrypt() has been designed to be slow.
When you are verifying data, you want the speed, because you want to verify the data as fast as possible.
When you are trying to protect credentials, the speed works against you. An attacker with a copy of a password hash will be able to execute many more brute force attacks because MD5 and SHA1, etc, are cheap to execute.
bcrypt in contrast is deliberately expensive. This matters little when there are one or two tries to authenticate by the genuine user, but is much more costly to brute-force.
There are three significant differences between bcrypt and hashing multiple times with MD5:
The size of the output: 128 bits (16 bytes) for MD5, versus a 60-character encoded string for bcrypt (it packs the cost, a 128-bit salt and a 184-bit hash). If you store millions of hashes in a database, this has to be taken into account.
Collisions and preimage attacks are possible against MD5.
Bcrypt can be configured to iterate more and more as CPUs become more and more powerful.
Hence, using salting-and-stretching with MD5 is not as safe as using bcrypt. This issue can be solved by selecting a better hash function than MD5.
For example, if SHA-256 is selected, the output size will be 256-bits (32-bytes). If the salting-and-stretching can be configured to increase the number of iterations like bcrypt, then there is no difference between both methods, except the amount of space required to store result hashes.
You are effectively talking about implementing PBKDF2, or Password-Based Key Derivation Function 2. Effectively it is the same thing as BCrypt, the advantage being that you can lengthen the amount of CPU time it takes to derive a password. The advantage of this over something like BCrypt is that, by knowing how many 'iterations' you have put the password through, when you need to increase it you could do it without resetting all the passwords in the database. Just have your algorithm pick up the end result as if it were at the nth iteration (where n is the previous iteration count) and keep going!
It is recommended that you use a proper PBKDF2 library instead of creating your own because, let's face it, as with all cryptography, the only way you know if something is safe is if it has been 'tested' by the interwebs.
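The "pick up where you left off" idea can be sketched for a plain chained loop like the one in the question. This is an assumption about the hand-rolled loop only: PBKDF2 as standardized keys every HMAC round with the password, so do not expect a PBKDF2 library to offer the same shortcut. SHA-256 and the fixed salt below are placeholders.

```python
import hashlib

def stretch(value: bytes, salt: bytes, rounds: int) -> bytes:
    """Plain chained hashing: each round hashes the previous output with the salt."""
    for _ in range(rounds):
        value = hashlib.sha256(value + salt).digest()
    return value

salt = b"per-user-salt-16"                    # placeholder; use a random salt in practice
stored = stretch(b"hunter2", salt, 10_000)    # what the database holds today
upgraded = stretch(stored, salt, 40_000)      # 40,000 more rounds, no password needed
```

After the upgrade, the stored value equals what 50,000 rounds from the original password would have produced, so logins keep working once the recorded round count is bumped.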
Systems that use this method:
.NET has a library already implemented.
Mac, Linux and Windows file encryption uses many-iteration (10,000+) versions of this encryption method to secure their file systems.
Wi-Fi networks are often secured using this method of encryption
Source
Thanks for asking the question, it forced me to research the method i was using for securing my passwords.
TTD
Although this question is already answered, i would like to point out a subtle difference between BCrypt and your hashing-loop. I will ignore the deprecated MD5 algorithm and the exponential cost factor, because you could easily improve this in your question.
You are calculating a hash-value and then you use the result to calculate the next hash-value. If you look at the implementation of BCrypt, you can see, that each iteration uses the resulting hash-value, as well as the original password (key).
Eksblowfish(cost, salt, key)
    state = InitState()
    state = ExpandKey(state, salt, key)
    repeat (2^cost)
        state = ExpandKey(state, 0, key)
        state = ExpandKey(state, 0, salt)
    return state
This is the reason, you cannot take a Bcrypt-hashed password and continue with iterating, because you would have to know the original password then. I cannot prove it, but i suppose this makes Bcrypt safer than a simple hashing-loop.
Strictly speaking, bcrypt actually encrypts the text:
OrpheanBeholderScryDoubt
64 times.
But it does it with a key that was derived from your password and some randomly generated salt.
Password hashing is not hashing
The real virtue of "password hashing algorithms" (like bcrypt) is that they use a lot of RAM.
SHA2 is designed to be fast. If you're a real-time web-server, and you want to validate file integrity, you want something that runs extraordinarily fast, with extraordinarily low resource usage. That is the antithesis of password hashing.
SHA2 is designed to be fast
SHA2 can operate with 128 bytes of RAM
SHA2 is easily implementable in hardware
i own a USB stick device that can calculate 330 million hashes per second
in fact, i own 17 of them
If you perform a "fast" hash multiple times (e.g. 10,000 is a common recommendation for PBKDF2), then you're not really adding any security.
What you need is a hash that is difficult to implement in hardware. What you need is a hash that is hard to parallelize on a GPU.
Over the last few decades we've learned that RAM is the key to slowing down password hashing attempts. Custom hardware shines at performing raw computation (in fact, only 1% of your CPU is dedicated to computation - the rest is dedicated to jitting the machine instructions into something faster; pre-fetching, out-of-order-execution, branch prediction, cache). The way to stymie custom hardware is to make the algorithm have to touch a lot of RAM.
SHA2: 128 bytes
bcrypt: 4 KB
scrypt (configurable): 16 MB with the commonly cited N=2^14, r=8 parameters (Litecoin's own parameters use about 128 KB)
Argon2 (configurable): 64 MB in documentation examples
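Where two of those figures come from, as a small arithmetic sketch. The scrypt estimate is the usual 128 * N * r approximation for its large working array, and the parameter values are the commonly cited interactive settings, assumed here for illustration. The bcrypt number is the size of Blowfish's four S-boxes.

```python
def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate scrypt working memory: 128 * N * r bytes (the large V array)."""
    return 128 * n * r

scrypt_interactive = scrypt_memory_bytes(2**14, 8)   # 16 MiB
bcrypt_sboxes = 4 * 256 * 4                          # four S-boxes of 256 32-bit words = 4 KiB
```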
Password hashing does not mean simply using a fast hash multiple times.
A modern recommended bcrypt cost factor is 12; so that it takes about 250 ms to compute.
you would have to perform about 330,000 iterations of SHA2 to equal that time cost on a modern single-core CPU
But then we get back to my 2.5W, USB, SHA2 stick and its 330 Mhashes/sec. In order to defend against that, it would have to be 83M iterations.
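The arithmetic behind those numbers, spelled out as a sketch using only the figures quoted above:

```python
TARGET_SECONDS = 0.25            # bcrypt cost 12 figure quoted above
CPU_ITERATIONS = 330_000         # SHA2 iterations for 0.25 s on one core, as quoted
STICK_HASHES_PER_SEC = 330e6     # the USB device's quoted rate

cpu_rate = CPU_ITERATIONS / TARGET_SECONDS                    # 1.32 million hashes/s
iterations_vs_stick = STICK_HASHES_PER_SEC * TARGET_SECONDS   # 82.5 million, i.e. ~83M
speedup = STICK_HASHES_PER_SEC / cpu_rate                     # the stick is ~250x the CPU core
```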
If you're trying to add only CPU cost: you're losing.
You have to add memory cost
bcrypt is 21 years old, and it only uses 4KB. But it is still ~infinitely better than any amount of MD5, SHA-1, or SHA2 hashing.
I'm a bit conflicted with an answer when I google for this, as these algos are constantly improving and new exploits are being found and new issues come up all the time... a lot of advice on what algo to use is simply old, or keeping ideas from an older time when they were the best way.
I want to be very clear here: I'm not talking about passwords. I'm talking about message digests, not cryptographic hashes.
I could go ahead and use md5 as my first inkling for message digest (it's right in the name), but then I remembered there's more collisions than more modern algos out there. But then, what makes these newer algos more suitable for the message digest of a file or short string?
So that's my question, what's the modern message digest algo that should be used?
From that perspective, depending on the amount of data you are working with, SHA1 should do fine - if you will be working with larger amounts of data, a SHA-2 algorithm, such as SHA-256 might be more suitable as the fear of collisions in SHA1 is rising due to a flaw in its algorithm, but it isn't extremely serious when working with smallish amounts of data.
MD5 has been shown to be too vulnerable to collisions, as there have been attacks on SSL certificates that used MD5 to create a forged SSL certificate, so I'd stay away from there. Also depending on your application, MD5 is not FIPS 140 compliant, if that is of any importance to you.
SHA1 is preferable to MD5 because MD5 is risky to use, and SHA1 has better performance in most common circumstances than SHA-2. The SHA-2 algorithms are by no means slow, but SHA1 has an edge. However, SHA1 is slightly riskier because you've probably locked yourself into using it - if collisions start to be found, it might be hard for you to change, so it might be better to invest in a SHA-2 algorithm up-front. The penalty for using SHA-256 over SHA-1 is very little, depending on how you will be using the SHA algorithm. SHA-2 algorithms produce a much larger output than SHA1, but with the benefit of reduced chances of a collision.
So which one is right? It depends on what you are looking for and what your use case is. Hopefully now you can make a decision.
When in doubt, use SHA-256. The other SHA-2 functions are fine too; however, SHA-384 and SHA-512 may suffer from a non-negligible performance degradation on small (32-bit only) platforms. This may matter for some specific applications.
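For the file case, a minimal sketch with Python's `hashlib`; the function name and chunk size are arbitrary choices for the example.

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For a short string already in memory, `hashlib.sha256(data).hexdigest()` does the same job in one call.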
For non-security related usages (e.g. first pass of indexing in a hash table, or detection of accidental, non-malicious data alteration -- the kind of job where you could use a CRC), consider MD4, a predecessor to MD5. MD4 is even more broken than MD5, but also simpler to implement (with shorter code) and faster (actually, it has been measured to be faster than CRC32 on some ARM platforms).
I'm building a web application and would like to use the strongest hashing algorithm possible for passwords. What are the differences, if any, between sha512, whirlpool, ripemd160 and tiger192,4? Which one would be considered cryptographically stronger?
bCrypt - Why would be a very long explanation, for which I recommend Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes
Basically, it's secure, it's slow, it's already implemented.
If you are actually concerned about the security of your system (as opposed to the quite academic strength of algorithms) then you should go with a proven and mature implementation instead of nitpicking algorithms.
I would recommend Ulrich Drepper's SHA-crypt implementation. This implementation uses SHA-512, a 16 character long salt, is peer reviewed and scheduled to go into all major Linux distributions via glibc 2.7.
P.S.: Once you have reached this level of security, you'll be visited by the black helicopters anyways.
David, those are all plenty strong functions. Even the much-ballyhooed MD5 collisions are not of the password-cracking variety, they just generate two different strings with the same MD5 (a very different proposition from finding a string that generates a given MD5 value).
If you are concerned about the security of the passwords, you need to worry about the protocols used to store them, the protocols used to recover passwords forgotten by users, and all the other possible avenues of attack. Those options are used far more often to crack passwords than brute-force cryptanalysis.
Do use a salt, though.
But first read the article AviewAnew posted
Here's a good post on coding horror about storing passwords. In short, he suggests bcrypt or SHA-2 with a random unique salt.
MD5 and SHA are the two most popular hashing algorithms. SHA-256 produces a 256-bit hash, whereas MD5 produces a 128-bit hash value. So SHA-256 should be a good choice, as it is the stronger of the two.
You can find some useful case here -> https://codesigningstore.com/what-is-the-best-hashing-algorithm