Is it a good idea to store two hashes for each password in a database?

Is it a good idea to store two hashes for each password in a database (e.g. SHA-1 and MD5) and check both of the hashes in a login script to prevent collisions? On the other side, wouldn't it then be easier to calculate the password from the two hashes (for example if a hacker gets access to the database)?

This would probably not be useful.
Any hash function you'd use will be safe against accidental collisions -- they're almost impossibly unlikely. So the only collisions you're concerned with are when hackers have already compromised your database, and have your hashes, trying to figure out a password that generates the target hash.
This is called a "preimage attack" (a "second preimage attack" is the variant where the attacker also knows the original input), and it's incredibly hard. There are no known practical preimage attacks for any relatively recent algorithm, even going back to MD4. This shouldn't be a serious concern.
However, if you're using a generic hash function, then people brute-forcing your hacked hashes is a realistic concern. You shouldn't use a generic hash function like SHA-2, even with salts. You should use a password hash function, like bcrypt, which is resistant to brute-forcing. If you are using normal hash functions then, as you note, storing two means they only need to brute force the weaker one -- it's one more thing that can go wrong.
Don't bother. Use a password hash function instead. It will be safer and simpler.
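As a rough sketch of what "use a password hash function" looks like in practice, here is a version built on Python's standard `hashlib.pbkdf2_hmac`, used as a stand-in because bcrypt needs a third-party package. The function names, the iteration count and the `iterations$salt$hash` storage format are assumptions made for illustration, not a vetted configuration:

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor; tune it for your own hardware


def hash_password(password: str) -> str:
    """Return 'iterations$salt_hex$hash_hex', suitable for a single column."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, hash_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), hash_hex)
```

Only one value is stored per password, and the login check is a single call to `verify_password`, so there is nothing to gain from keeping a second hash next to it.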

I really don't see how there can be any benefit from storing 2 hashes for the password and then checking both. All you're doing here is giving your app more work to do and in my opinion not providing any extra level of security as they are still entering the same password.

Related

Is SHA3 output re-hashed a million times more secure than Scrypt?

I am using Scrypt to get a hash for my input and I didn't use SHA3 because I found out that it can be bruteforced with a dictionary attack to find the SHA3 output. Later I was told not to use Scrypt because it's unnecessary and just hash the output of SHA3 a million times, as it would be simpler but also more secure.
Is that true? or is using Scrypt still a fine choice?
No, just hashing the password a million times is not more secure than scrypt.
There are at least two things that are missing:
the use of a salt, which differentiates the hashes when users choose the same password and prevents the use of rainbow tables;
the memory usage of scrypt which can make it harder to crack passwords using specialized hardware.
What you are trying to re-implement is a password hash or PBKDF (Password Based Key Derivation Function, the same thing but to derive keys instead of hashes). There was a password hashing competition not too long ago, which Argon2 won. Balloon hashing is a later password hash created by a team of cryptographers.
I don't know which of your co-workers or acquaintances think that they could do better, but I think that they should learn about the Dunning-Kruger effect.
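To make the two missing ingredients concrete, here is a small comparison using Python's standard `hashlib`. The scrypt parameters and the helper names are illustrative assumptions, and the re-hashing loop runs only 1000 rounds to keep the demo quick:

```python
import hashlib
import secrets


def scrypt_hash(password: bytes, salt: bytes) -> bytes:
    # n=2**14 and r=8 make each guess need about 16 MiB of memory
    # (128 * n * r bytes). These values are assumptions for the demo.
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)


def iterated_sha3(password: bytes, rounds: int) -> bytes:
    # The "just re-hash it" scheme: no salt and almost no memory use.
    digest = password
    for _ in range(rounds):
        digest = hashlib.sha3_256(digest).digest()
    return digest


alice_salt = secrets.token_bytes(16)
bob_salt = secrets.token_bytes(16)

# Same password, different salts: the stored scrypt values differ.
same_scrypt = scrypt_hash(b"letmein", alice_salt) == scrypt_hash(b"letmein", bob_salt)
# Same password, no salt: every user with this password gets the same value.
same_sha3 = iterated_sha3(b"letmein", 1000) == iterated_sha3(b"letmein", 1000)
print(same_scrypt, same_sha3)
```

The re-hashing loop could be given a salt, but it would still lack the memory cost that makes scrypt expensive to run on specialized hardware.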

How do Hash-functions encode an infinite amount of data into a finite amount?

Hash-functions always create an output with a fixed length, even though the input can be infinitely large.
So how is it possible, that no information is lost here? Shouldn't some inputs result in the same output then?
Yes. Two inputs can result in the same output, resulting in a hash collision.
Hashes are designed so that hashing text is very easy, but reversing the process is difficult. The point of hashing isn't to store information. Instead, hashes are commonly used in security (and also data structures).
For instance, websites will hash a user's passwords and store the hashes instead of the physical passwords. This way, if the website's security is breached, the attacker can only obtain the hashes, which still doesn't let the attacker log in, as it is very difficult to reverse-engineer the password.
The hash set is another application of hashing. By hashing an object and storing only the hashes, you can check whether an object is present or not present in the set in constant time. You only have to search through all of the objects in the hash set that have the same hash as the object that you are checking. As the size of the hash set grows, so does the chance of a hash collision.
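A toy sketch of the hash set idea, assuming a fixed number of buckets and keeping the objects themselves in the bucket chosen by their hash (the class name and bucket count are made up for illustration):

```python
class TinyHashSet:
    """A toy hash set with a fixed number of buckets (illustration only)."""

    def __init__(self, bucket_count: int = 8) -> None:
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, item):
        # Many different items map to the same bucket index: a collision.
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item) -> None:
        bucket = self._bucket(item)
        if item not in bucket:
            bucket.append(item)

    def __contains__(self, item) -> bool:
        # Only the items sharing this bucket are compared, not the whole set.
        return item in self._bucket(item)


s = TinyHashSet()
for n in (3, 11, 19):  # with 8 buckets, these three all land in bucket 3
    s.add(n)
```

Here `11 in s` only has to look through the three colliding entries in bucket 3. Real implementations grow the number of buckets as the set fills up so that buckets stay short.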
So how is it possible, that no information is lost here?
It's not possible, and lots of information is lost.
In the case of a perfect hash there is no collision, and we could even argue that information isn't really lost (it's just not contained in the system alone), because we know all possible inputs and know there are no collisions in the hashes produced. The hashes can be used as an index in a way that isn't possible, or isn't as good, with the input data, so they are useful.
In the case of a hash-based collection we use a hash code to (hopefully) have few collisions so we get close to O(1) lookup, but have some means to handle it if a collision does happen.
In the case of a cryptographic hash we could have collisions, but it's extremely hard to deliberately produce one, for similar (roughly speaking) reasons as to why it's hard to break modern cryptography. So while you could have two passwords with the same hash, you couldn't find them easily (especially if you aren't going to e.g. have a password of several thousand pages of text).
In the case of a checksum hash we could have collisions, but that they're unlikely means that if we have corruption we probably won't have the matching hash.

adding salt to a password

Is there really a point in salting a password?
If a program does all the processing of a salt server-side, does it really make a brute-force or other attack any more difficult? The code is only going to apply the salt to whatever is entered by a user.
Do I have this all wrong?
Yes, there is a point in salting a password.
The point is that each password has its own salt, so that an attacker can't make use of dictionaries and rainbow tables to brute force all passwords at once.
The salt doesn't make it harder to crack a single password¹, but it removes the benefit from attempting to crack multiple passwords at once. An attacker has to brute force one password at a time.
¹ At least not enough to be a good reason to use it. Using better passwords works much better.
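A small sketch of why a per-password salt defeats precomputed tables. Plain SHA-256 is used here only to isolate the effect of the salt (it is not a recommendation for storing passwords), and the word list and function names are made up for the example:

```python
import hashlib
import secrets

wordlist = ["123456", "password", "letmein", "qwerty"]

# The attacker precomputes hash -> word once and reuses it on every unsalted dump.
precomputed = {hashlib.sha256(w.encode()).hexdigest(): w for w in wordlist}


def unsalted(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()


def salted(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()


salt = secrets.token_bytes(16)  # stored next to the hash, unique per user
leaked_unsalted = unsalted("letmein")
leaked_salted = salted("letmein", salt)

print(precomputed.get(leaked_unsalted))  # the lookup recovers "letmein"
print(precomputed.get(leaked_salted))    # None: the table has to be rebuilt for this salt
```

The attacker can still hash the word list again with the leaked salt, but that work now has to be repeated for every single user.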
In a word, yes.
Salting a password adds a level of complexity to the string and confuses humans, and makes dictionary attacks less likely to succeed.
Brute force can still crack this password however, hence the need for a randomly generated salt.
Salts are typically generated as byte arrays, which are then fed into a function that combines the salt and the password into one string at intervals. See my answer here.
The hashes may be leaked without the salt (common scenario: database gets dumped, but a salt is present in PHP source that does not leak).
You are right in a way but ... the most significant protection from SALT is that if the hashes ever do get released into the wild then reverse hash lookups are much much harder.
Hash a word and then put the hash result into your favourite search engine to see what I mean.

What is the recommended way to encrypt user passwords in a database?

In a web application written in Perl and using PostgreSQL the users have username and password. What would be the recommended way to store the passwords?
Encrypting them using the crypt() function of Perl and a random salt? That would limit the useful length of passwords to 8 characters and will require fetching the stored password in order to compare to the one given by the user when authenticating (to fetch the salt that was attached to it).
Is there a built-in way in PostgreSQL to do this?
Should I use Digest::MD5?
Don't use SHA1 or SHA256, as most other people are suggesting. Definitely don't use MD5.
SHA1/256 and MD5 are both designed to create checksums of files and strings (and other datatypes, if necessary). Because of this, they're designed to be as fast as possible, so that the checksum is quick to generate.
This speed makes it much easier to bruteforce passwords, as a well-written program can easily generate millions of hashes every second, and far more on GPUs.
Instead, use a slow algorithm that is specifically designed for passwords. They're designed to take a little bit longer to generate, with the upside being that bruteforce attacks become much harder. Because of this, the passwords will be much more secure.
You won't experience any significant performance disadvantages if you're only looking at encrypting individual passwords one at a time, which is the normal implementation of storing and checking passwords. It's only in bulk where the real difference is.
I personally like bcrypt. There should be a Perl version of it available, as a quick Google search yielded several possible matches.
MD5 is commonly used, but SHA1/SHA256 is better. Still not the best, but better.
The problem with all of these general-purpose hashing algorithms is that they're optimized to be fast. When you're hashing your passwords for storage, though, fast is just what you don't want - if you can hash the password in a microsecond, then that means an attacker can try a million passwords every second if they get their hands on your password database.
But you want to slow an attacker down as much as possible, don't you? Wouldn't it be better to use an algorithm which takes a tenth of a second to hash the password instead? A tenth of a second is still fast enough that users won't generally notice, but an attacker who has a copy of your database will only be able to make 10 attempts per second - it will take them 100,000 times longer to find a working set of login credentials. Every hour that it would take them at a microsecond per attempt becomes 11 years at a tenth of a second per attempt.
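The arithmetic behind those figures, worked through with integer microseconds to avoid rounding noise (the variable names are just for this sketch):

```python
FAST_US = 1          # one microsecond per hash attempt
SLOW_US = 100_000    # a tenth of a second per hash attempt

slowdown = SLOW_US // FAST_US                    # how many times longer each guess takes
slow_attempts_per_second = 1_000_000 // SLOW_US  # guesses per second at the slow rate

hours_at_slow_rate = 1 * slowdown                # one hour of fast cracking, rescaled
years_at_slow_rate = hours_at_slow_rate / (24 * 365.25)

print(slowdown, slow_attempts_per_second, round(years_at_slow_rate, 1))
# prints: 100000 10 11.4
```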
So, how do you accomplish this? Some folks fake it by running several rounds of MD5/SHA digesting, but the bcrypt algorithm is designed specifically to address this issue. I don't fully understand the math behind it, but I'm told that it's based on the Blowfish key setup, which is inherently slow (unlike MD5 operations, which can be heavily streamlined on properly-configured hardware), and it has a tunable "cost" parameter so that, as Moore's Law advances, all you need to do is adjust that "cost" to keep your password hashing just as slow in ten years as it is today.
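The idea of a tunable cost can be sketched with the standard library. This uses PBKDF2's iteration count as a stand-in for bcrypt's cost parameter (bcrypt is not in Python's standard library), and the function name, starting value and doubling strategy are assumptions for illustration:

```python
import hashlib
import time


def calibrate_iterations(target_seconds: float, start: int = 10_000) -> int:
    """Double the PBKDF2 iteration count until one hash takes at least target_seconds."""
    iterations = start
    while True:
        began = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"benchmark", b"fixed-salt-16byt", iterations)
        if time.perf_counter() - began >= target_seconds:
            return iterations
        iterations *= 2
```

Running something like `calibrate_iterations(0.1)` on new hardware every few years gives a fresh work factor, as long as the count is stored alongside each hash so that old entries can still be verified.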
I like bcrypt the best, with SHA2(256) a close second. I've never seen MD5 used for passwords but maybe some apps/libraries use that. Keep in mind that you should always use a salt as well. The salt itself should be completely unique for each user and, in my opinion, as long as possible. I would never, ever use just a hash against a string without a salt added to it. Mainly because I'm a bit paranoid and also so that it's a little more future-proof.
Having a delay before a user can try again and auto-lockouts (with auto-admin notifications) is a good idea as well.
The pgcrypto module in PostgreSQL has built-in support for password hashing that is pretty smart about storage, generation, multi-algorithm etc. See http://www.postgresql.org/docs/current/static/pgcrypto.html, the section on Password Hashing Functions. You can also see the pgcrypto section of http://www.hagander.net/talks/hidden%20gems%20of%20postgresql.pdf.
Use SHA1 or SHA256 hashing with salting. That's the way to go for storing passwords.
If you don't use a password recovery mechanism (not password reset), I think using a hashing mechanism is better than trying to encrypt the password. You can just check the hashes without any security risk. Even you don't know the user's password.
I would suggest storing it as a salted md5 hash.
-- "user" is quoted because it is a reserved word in PostgreSQL
INSERT INTO "user" (password) VALUES (md5('some_salt' || 'the_password'));
You could calculate the md5 hash in perl if you wish, it doesn't make much difference unless you are micro-optimizing.
You could also use sha1 as an alternative, but I'm unsure if Postgres has a native implementation of this.
I usually discourage the use of a dynamic random salt, as it is yet another field that must be stored in the database. Plus, if your tables were ever compromised, the salt becomes useless.
I always go with a one-time randomly generated salt and store this in the application source, or a config file.
Another benefit of using a md5 or sha1 hash for the password is you can define the password column as a fixed width CHAR(32) or CHAR(40) for md5 and sha1 respectively.

Purposely create two files to have the same hash?

If someone is purposely trying to modify two files to have the same hash, what are ways to stop them? Can md5 and sha1 prevent the majority case?
I was thinking of writing my own and I figure even if I don't do a good job if the user doesn't know my hash he may not be able to fool mine.
What's the best way to prevent this?
MD5 is generally considered insecure if hash collisions are a major concern. SHA1 is likewise no longer considered acceptable by the US government. There was a competition to find a replacement hash algorithm, but the recommendation at the moment is to use the SHA2 family - SHA-256, SHA-384 or SHA-512. [Update: on 2012-10-02, NIST chose Keccak as the SHA-3 algorithm.]
You can try to create your own hash — it would probably not be as good as MD5, and 'security through obscurity' is likewise not advisable.
If you want security, hash with multiple hash algorithms. Being able to simultaneously create files that have hash collisions using a number of algorithms is excessively improbable. [And, in the light of comments, let me make it clear: I mean publish both the SHA-256 and the Whirlpool values for the file — not combining hash algorithms to create a single value, but using separate algorithms to create separate values. Generally, a corrupted file will fail to match any of the algorithms; if, perchance, someone has managed to create a collision value using one algorithm, the chance of also producing a second collision in one of the other algorithms is negligible.]
The Public TimeStamp uses an array of algorithms. See, for example, sqlcmd-86.00.tgz for an illustration.
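A minimal sketch of publishing separate digests from more than one algorithm. SHA3-256 stands in for Whirlpool here because it is always available in Python's `hashlib`; the function names are made up for the example:

```python
import hashlib


def digests(data: bytes) -> dict:
    """Compute two independent digests of the same data."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "sha3_256": hashlib.sha3_256(data).hexdigest(),
    }


def matches_published(data: bytes, published: dict) -> bool:
    """Accept the data only if every published digest matches."""
    return digests(data) == published
```

A forged file would have to collide under both algorithms at once to pass `matches_published`.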
If the user doesn't know your hashing algorithm he also can't verify your signature on a document that you actually signed.
The best option is to use a publicly known one-way hashing algorithm that generates a long hash. SHA-256 creates a 256-bit hash, so a forger would have to try 2^255 different documents (on average) before they created one that matched a given document, which is pretty secure. If that's still not secure enough for you, there's SHA-512.
Also, I think it's worth mentioning that a good low-tech way to protect yourself against forged digitally-signed documents is to simply keep a copy of anything you sign. That way, if it comes down to a dispute, you can show that the original document you signed was altered.
There is a hierarchy of difficulty (for an attacker) here. It is easier to find two files with the same hash than to generate one to match a given hash, and easier to do the latter if you don't have to respect form/content/length restrictions.
Thus, if it is possible to use a well-defined document structure and lengths, you can make an attacker's life a bit harder no matter what underlying hash you use.
Why are you trying to create your own hash algorithm? What's wrong with SHA1HMAC?
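For reference, a keyed hash of this kind takes only a few lines with Python's standard `hmac` module. The function names and the key are placeholders for this sketch. The secret is the key, not the algorithm:

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Return the HMAC-SHA1 tag of message under the secret key."""
    return hmac.new(key, message, hashlib.sha1).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check a tag in constant time."""
    return hmac.compare_digest(sign(key, message), tag)
```

Without the key, someone who modifies the file cannot compute a matching tag, which gives the effect the question hopes to get from a secret home-made hash.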
Yes, there are repeats for hashes.
Any hash that is shorter than the plaintext is necessarily less information. That means there will be some repeats. The key for hashes is that the repeats are hard to reverse-engineer.
Consider CRC32 - commonly used as a hash. It's a 32-bit quantity. Because there are more than 2^32 possible messages, there will be repeats with CRC32.
The same idea applies to other hashes.
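With only 32 bits of output, repeats are easy to find by trial. This sketch hashes random 8-byte messages with `zlib.crc32` until two different messages share a value. The function name, seed and try limit are assumptions for the demo:

```python
import random
import zlib


def find_crc32_collision(seed: int = 0, max_tries: int = 2_000_000):
    """Return two different 8-byte messages with the same CRC32, or None."""
    rng = random.Random(seed)
    seen = {}  # crc value -> first message that produced it
    for _ in range(max_tries):
        message = rng.getrandbits(64).to_bytes(8, "big")
        crc = zlib.crc32(message)
        other = seen.get(crc)
        if other is not None and other != message:
            return other, message
        seen[crc] = message
    return None
```

By the birthday bound, a repeat is expected after somewhere around a hundred thousand messages, far fewer than 2^32. The same search against a 256-bit hash would need on the order of 2^128 attempts, which is why longer cryptographic hashes are considered collision resistant.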
This is called a "hash collision", and the best way to avoid it is to use a strong hash function. With MD5 it is relatively easy to artificially build colliding files, as seen here. Similarly, it's known that there is a relatively efficient method for computing colliding SHA1 files, although in this case "relatively efficient" still takes hundreds of hours of compute time.
Generally, MD5 and SHA1 are still expensive to crack, but not impossible. If you're really worried about it, use a stronger hash function, like SHA256.
Writing your own isn't actually a good idea unless you're a pretty expert cryptographer. Most of the simple ideas have been tried and there are well-known attacks against them.
If you really want to learn more about it, have a look at Schneier's Applied Cryptography.
I don't think coming up with your own hash algorithm is a good choice.
Another good option is to use salted MD5. For example, the string "acidzom!##" is appended to the input before it is passed to the MD5 function.
There is also a good read at Slashdot.