Hash and salt collision

I remember a guy telling me that if I let him change 4 bytes he can make a file have any checksum he wants (CRC-32).
I have also heard mention of salting a hash. If someone made his file's hash match my file's hash, would salting the MD5 or SHA-1 hash change the result so that the two files no longer collide? Or does it only change the final hash value?

You are mixing up two different uses of hash values:
1. Checksumming, to guard against random (non-malicious) errors.
2. Computing cryptographic message digests for storing passwords, signing messages, certificates, and so on.
CRCs are a good choice for the first application, but totally unsuited for the second, because it is easy to compute a collision (in math-speak: CRCs are linear). This is what your friend is essentially telling you.
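To make the "CRCs are linear" remark concrete, here is a small sketch using Python's zlib.crc32 (the helper name xor_bytes is mine, not from any library). For equal-length inputs, the CRC of the XOR of three messages equals the XOR of their CRCs, which is the kind of structure that lets someone solve for the bytes they need.

```python
import os
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, value in enumerate(chunk):
            out[i] ^= value
    return bytes(out)


# Three arbitrary messages of the same length.
a, b, c = (os.urandom(32) for _ in range(3))

# For equal-length inputs, the CRC of the XOR equals the XOR of the CRCs.
lhs = zlib.crc32(xor_bytes(a, b, c))
rhs = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
print(lhs == rhs)
```

A cryptographic hash is designed so that no such algebraic relation between inputs and outputs holds.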
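MD5 and SHA-1 are cryptographic hashes intended for the second kind of application. However, MD5 is broken with respect to collisions (two colliding inputs can be produced in seconds on ordinary hardware), and SHA-1 is considered broken as well since a practical collision was demonstrated in 2017. Neither has a practical preimage attack, though, so nobody can yet fit a file to an arbitrary given MD5 or SHA-1 value the way your friend can with CRC-32.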
As for salt: it makes the computation of the cryptographic hash local by mixing in a random, non-secret value called the salt. This prevents an attacker from building global tables that map hash values back to likely inputs (e.g. passwords). Computing such tables is extremely expensive, but without salt the cost would be amortized over many cracked passwords.

The attack (against CRC-32) is irrelevant if the hash you are using is not CRC-32 - MD5 and SHA-1 are not vulnerable to that kind of attack (yet).
The current attacks against MD5 are collision attacks, in which the attacker creates both documents so that they share the same hash.
Salts are used for password verification. They blunt an offline attack against a stolen password database: each user's password has a salt attached to the plain-text before hashing, so a pre-computed rainbow table of plaintext <-> hashed text is useless.
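A minimal sketch of why a precomputed table stops working once a salt is mixed in. The dictionary below is only a stand-in for a real rainbow table, the password list is made up, and MD5 is used purely to mirror the discussion, not as a recommendation.

```python
import hashlib
import os

# A tiny stand-in for a precomputed table: unsalted MD5 hash -> plaintext.
common_passwords = ["123456", "password", "letmein", "example"]
lookup = {hashlib.md5(p.encode()).hexdigest(): p for p in common_passwords}

# Unsalted storage: the stolen hash is found in the table immediately.
stolen_unsalted = hashlib.md5(b"letmein").hexdigest()
print(lookup.get(stolen_unsalted))

# Salted storage: the salt is mixed in before hashing, so the same table misses.
salt = os.urandom(16)
stolen_salted = hashlib.md5(salt + b"letmein").hexdigest()
print(lookup.get(stolen_salted))
```

The attacker can still guess passwords against the salted hash, but has to do the hashing work again for that specific salt.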

Adding salt to your hash function doesn't really serve any purpose if the digest function has been compromised, because the salt will have to be made public to be used, and the attacker can adjust their file to factor this in too.
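As an illustration with a checksum that really is compromised, the sketch below finds two different inputs with the same CRC-32 by a birthday search and then shows that a public salt, whether prepended or appended, leaves the collision intact. The function name and the salt value are made up for the example, and the result is specific to CRC-32's structure.

```python
import hashlib
import zlib


def find_crc32_collision() -> tuple[bytes, bytes]:
    """Birthday search for two distinct 8-byte messages with equal CRC-32."""
    seen: dict[int, bytes] = {}
    counter = 0
    while True:
        # Derive pseudo-random 8-byte messages from a counter.
        message = hashlib.sha256(str(counter).encode()).digest()[:8]
        checksum = zlib.crc32(message)
        other = seen.get(checksum)
        if other is not None and other != message:
            return other, message
        seen[checksum] = message
        counter += 1


file_a, file_b = find_crc32_collision()
salt = b"public-salt"

print(zlib.crc32(file_a) == zlib.crc32(file_b))                # collision
print(zlib.crc32(salt + file_a) == zlib.crc32(salt + file_b))  # still collides
print(zlib.crc32(file_a + salt) == zlib.crc32(file_b + salt))  # still collides
```

The salted checksums differ from the unsalted ones, but the two files still agree with each other, so the salt changed the end value without separating the files.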
The solution to this problem is to use a secure hash function. MD5 has been shown to be vulnerable to collisions, and a practical SHA-1 collision has since been demonstrated as well, so prefer a current function such as SHA-256.

Salting is usually used in password hashes to defeat dictionary attacks. There are plenty of web-based reverse hash dictionaries where you enter the hash (say: 1a79a4d60de6718e8e5b326e338ae533) and get back the text: "example". With salt, this becomes next to impossible: if you prepend a random salt to the password, a precomputed dictionary no longer matches.
As for collisions, I don't think you need to worry about entire files accidentally having the same MD5 or SHA-1 hash. The important use of the hash is to prove that the file you receive is the same as the file that was approved by someone who is an authority on the file. If you add salt to the file, you need to send the salt so the user can verify the hash.
This actually makes it easier for the attacker to spoof your file because he can provide a false salt along with the false file. The user can usually tell if the file is faked because it no longer serves the purpose it is supposed to. But how is the user supposed to know the difference between the correct salt and the attacker's salt?

Related

Is it more safe to modify MD5 output?

As a matter of routine, I have always stored the MD5 of passwords in the database, but there are many websites that reverse MD5 hashes to the original data (using rainbow-table databases).
I wonder whether it is safer to modify the output of the MD5 function (e.g. dropping the last character of the MD5 output to create a new hash value). Or is there some logic behind MD5 that makes it safer than any modified version?

No, this does little to make your passwords more secure. It adds a bit of "security by obscurity", but when we hash passwords, we prepare for the case where the attacker knows both the hashes and the algorithm.
The problem with MD5 and its derivations in general is that they can be calculated far too fast. Common hardware can calculate about 8 billion MD5 hashes per second (8 Giga MD5/s), which makes brute-forcing too easy. Today's password-cracking tools do not only handle plain MD5 hashes; they also support derivations such as md5(strtoupper(md5($pass))) out of the box.
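A back-of-the-envelope estimate of what that speed means. The alphabet and the password lengths are assumptions chosen for illustration; only the guess rate comes from the figure quoted above.

```python
ALPHABET_SIZE = 62            # [a-zA-Z0-9], an assumed password alphabet
GUESSES_PER_SECOND = 8e9      # the MD5 rate quoted above


def seconds_to_exhaust(max_length: int) -> float:
    """Time to try every password of length 1..max_length."""
    keyspace = sum(ALPHABET_SIZE ** n for n in range(1, max_length + 1))
    return keyspace / GUESSES_PER_SECOND


for length in (6, 8, 10):
    print(f"up to {length} chars: {seconds_to_exhaust(length):,.0f} seconds")
```

At this rate six characters fall in seconds and eight in a matter of hours, which is why the speed of the hash matters more than any tweak to its output.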
To store passwords securely you need a slow hash function with a cost factor, such as BCrypt, PBKDF2 or SCrypt. Of course, each password should be salted with its own unique salt.
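A minimal sketch of that advice using PBKDF2 from Python's standard library. The function names and the iteration count are assumptions for illustration; pick the count by measuring on your own hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # cost factor; an assumed value, tune it for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) using a fresh random salt per password."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, expected)


salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))
print(verify_password("wrong guess", salt, key))
```

Because the salt is random, hashing the same password twice yields two different stored values, and each guess costs the attacker the full iteration count.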

Perhaps you should consider a different hashing algorithm instead?
https://security.stackexchange.com/questions/4789/most-secure-password-hash-algorithms

Does a single Salt provide any additional security?

I understand that it's best practice to generate a long salt for each password you use. But does using a single salt provide any security benefit over not having a salt at all?

Having a shared salt makes you marginally more secure. It prevents an attacker from using a pre-computed rainbow table attack, but it does not prevent them from building a single, new rainbow table for your password database. Thus it is harder for an attacker to crack a single password with a given salt, but it is significantly easier for them to crack every password with that salt.
As an example, consider the following simplified set of passwords, salts and hashes:
Password  Salt    Hash
aaaaaz    y03sar  ze4lap
zzzzza    y03sar  enbe65
The attacker knows that your salt is y03sar and starts computing the hash, with that salt, of every password from a to zzzzzz. Long before they reach the password zzzzza (long in terms of iterations; the actual cracking would finish very quickly for passwords and salts of this complexity), they will find that they have also discovered the password aaaaaz. In other words, brute forcing one password in your database is, in the worst case, no harder than brute forcing every password.
With different salts, each password must be attacked separately.
Password  Salt    Hash
aaaaaz    bbq9f0  i2chf1
zzzzza    y03sar  enbe65
If the attacker again starts calculating hashes for every password from a to zzzzzz with the salt y03sar, then i2chf1 won't be in their output list (at least it's improbable with a reasonable length hash output; even if it is, the computed password for salt y03sar still won't work to gain access to the aaaaaz account because the hash would be different with salt bbq9f0).
To add some numbers to the mix, using a single salt lengthens the attack time from instantaneous (rainbow tables provide for a constant-time lookup on the hash digest) to linear time. As soon as the attacker has computed the hashes for every password allowed by your system, they have access to every account in your system. Even with 8-character passwords over [a-zA-Z0-9] (62^8, about 2.2 × 10^14 candidates), a fast hash means your whole system is compromised in hours to days. (Linked question is just an example - GPUs can crack passwords even faster than the hardware in the answer.)
Now if you have distinct salts for every password, then in the same amount of time it took an attacker to crack your entire database, they have only cracked a single password.
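A toy sketch of the difference, counting hash computations. The user names, guess list and hash helper are invented for the example, and a single fast SHA-256 is used only to keep it short.

```python
import hashlib


def h(salt: str, password: str) -> str:
    # A single fast hash, used only to keep the illustration short.
    return hashlib.sha256((salt + ":" + password).encode()).hexdigest()


candidates = ["aaaaaz", "hunter2", "letmein", "zzzzza"]   # attacker's guess list
users = {"alice": "aaaaaz", "bob": "zzzzza"}              # real passwords


def crack(db: dict[str, tuple[str, str]]) -> tuple[dict[str, str], int]:
    """Return (recovered passwords, number of hash computations)."""
    recovered, work = {}, 0
    cache: dict[tuple[str, str], str] = {}   # (salt, guess) -> hash, reused across users
    for user, (salt, stored) in db.items():
        for guess in candidates:
            if (salt, guess) not in cache:
                cache[(salt, guess)] = h(salt, guess)
                work += 1
            if cache[(salt, guess)] == stored:
                recovered[user] = guess
    return recovered, work


shared = {u: ("y03sar", h("y03sar", p)) for u, p in users.items()}
unique = {"alice": ("bbq9f0", h("bbq9f0", "aaaaaz")),
          "bob": ("y03sar", h("y03sar", "zzzzza"))}

print(crack(shared))   # both passwords recovered with 4 hash computations
print(crack(unique))   # both recovered, but with 8 hash computations
```

With the shared salt the work is done once and reused for every user; with unique salts it scales with the number of users.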
That's a pretty big difference. Use unique salts. (And a good hashing algorithm, while we're on the topic.)

How does checking hashes work if no 2 hashes are ever the same?

I may be wrong here, but from what I understand, no 2 hashes are ever the same.
Certainly, when I md5 the word "password" twice, I get two different hashes.
If the user's password is "password123", then the hash will be something like "482c811da5d5b4bc6d497ffa98491e38"
If the user enters their password when logging in at a later date, the hash of password123 is:
"286755fad04869ca523320acce0dc6a4"
How can I compare the 2 hashes if they're different for the exact same word?

The hash of the same value with the same algorithm is always the same. This is why it is OK to compare just the hashes: if the hashes differ, the values are definitely different. (If the hashes are the same, the values may still differ, but with a sufficiently long hash such as SHA-256 it is safe enough to assume the values are the same for password verification.)
Most likely you have a bug in how the original values are represented (e.g. untrimmed spaces, a different encoding, ...), and that causes the hashes to differ.
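A quick way to check this yourself, sketched in Python (md5_hex is just a local helper). The first comparison shows the hash is deterministic; the others show how input that looks the same can differ in its bytes.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


# Same bytes, same algorithm: the digest is always identical.
print(md5_hex(b"password123") == md5_hex(b"password123"))

# Inputs that look alike but differ in bytes give unrelated digests.
print(md5_hex(b"password123") == md5_hex(b"password123\n"))   # trailing newline
print(md5_hex(b"password123") == md5_hex(b" password123"))    # leading space
print(md5_hex("pässword".encode("utf-8")) == md5_hex("pässword".encode("latin-1")))
```

If two runs give different hashes for what should be the same password, printing the raw bytes of each input is usually the fastest way to find the difference.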
Note MD5 is generally not acceptable for hashing passwords due to known weakness.

Please start by reading How to securely hash passwords?.
I'll leave most of the detail in my answer to Password Verification - How to securely check if entered password is correct, but the high points are:
Hashes are deterministic; however, for password hashes, a per-user random salt of 8-16 bytes is generated when users select passwords, and the salt is saved in the clear with the user's password hash, the iteration count/work factor (see below), and the version of the password hashing scheme you're using (so you can change it easily). Thus, during verification, you use the same salt you did before.
Passwords should not be hashed using a single pass of any hash function.
Passwords should be hashed using PBKDF2, BCrypt, or SCrypt.
For PBKDF2 in particular, do not select an output size larger than the native hash size.
In all cases, select as high an iteration count as you can afford during expected peak times.
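The points above can be sketched as a single stored record that carries the version, iteration count, salt and hash together. The "$"-separated layout, the version label and the function names are assumptions made up for this example, not a standard format.

```python
import base64
import hashlib
import hmac
import os

VERSION = "v1"          # hypothetical scheme label
ITERATIONS = 200_000    # assumed work factor


def make_record(password: str) -> str:
    """Build 'version$iterations$salt$hash' with a fresh random salt."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    fields = [VERSION, str(ITERATIONS),
              base64.b64encode(salt).decode(), base64.b64encode(key).decode()]
    return "$".join(fields)


def check_record(password: str, record: str) -> bool:
    """Re-derive the key with the stored salt and iteration count."""
    version, iterations, salt_b64, key_b64 = record.split("$")
    if version != VERSION:
        raise ValueError("unknown password hash version")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(key_b64)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt,
                              int(iterations), dklen=len(expected))
    return hmac.compare_digest(key, expected)


record = make_record("password123")
print(check_record("password123", record))
print(check_record("password124", record))
```

Two records for the same password differ because the salts differ, yet each verifies, since verification reads the salt back out of the record.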

Is salting a password pointless if someone gains access to the salt key? Off server salting?

Hearing about all the recent hacks at big tech firms made me wonder about how they store passwords.
I know salting + hashing is generally accepted as secure, but every example I've seen of salting has the salt key hard-coded into the password script, which is generally stored on the same server.
So is it a logical solution to hash the user's password initially, pass that hash to a "salting server" or some function stored off-site, then pass back the salted hash?
The way I'm looking at it, if an intruder gains access to the server or database containing the stored passwords, they won't immediately have access to the salt key.

No -- salt remains effective even if known to the attacker.
The idea of salt is that it makes a dictionary attack on a large number of users more difficult. Without salt, the attacker hashes all the words in a dictionary and sees which match your users' hashed passwords. With salt, he has to hash each word in the dictionary many times over (once for each possible salt value) to be certain of having one that fits each user.
This multiplication by several thousand (or possibly several million, depending on how large a salt you use) increases the time needed to hash all the values, and the storage needed to hold the results, to the point that (you hope) it's impractical.
I should add, however, that in many (most?) cases, a very large salt doesn't really add a lot of security. The problem is that if you use, say, a 24 bit salt (~16 million possible values) but have only, say, a few hundred users, the attacker can collect the salt values you're actually using ahead of time, then do his dictionary attack for only those values instead of the full ~16 million potential values. In short, your 24-bit salt adds only a tiny bit of difficulty beyond what a ~8 bit salt would have provided.
OTOH, for a large server (Google, Facebook, etc.) the story is entirely different -- a large salt becomes quite beneficial.
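The arithmetic behind this can be sketched as follows, assuming the worst case for the attacker in which every user has a distinct salt where possible. The dictionary size and user counts are made-up figures.

```python
def attacker_hashes(dictionary_size: int, salt_bits: int, users: int) -> int:
    """Hashes needed to test every dictionary word against every user.

    The attacker only needs the salts actually in use, which is at most
    the number of users and at most the number of possible salt values.
    """
    salts_in_use = min(users, 2 ** salt_bits)
    return dictionary_size * salts_in_use


WORDS = 1_000_000  # assumed dictionary size

for bits in (8, 24):
    for users in (300, 100_000_000):
        print(bits, users, attacker_hashes(WORDS, bits, users))
```

With a few hundred users the 8-bit and 24-bit salts cost the attacker about the same; with a very large user base the larger salt multiplies the work by tens of thousands.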

Salting is useful even if the intruder knows the salt.
If passwords are NOT salted, widely available precomputed rainbow tables can be used to attack them quickly.
If your password table is salted, precomputing rainbow tables becomes very difficult: it is impractical to create a rainbow table for every possible salt.
If you use a random salt that is different for every password entry, and store it in plaintext right next to the hash, it becomes very difficult for an intruder to attack your passwords by any means short of brute force.

Salting passwords protects passwords against attacks where the attacker has a list of hashed passwords. There are some common hashing algorithms that hackers have tables for that allow them to look up a hash and retrieve the password. For this to work, the hacker has to have broken into the password storage and stolen the hashes.
If the passwords are salted, then the attacker must re-generate their hash tables, using the hashing algorithm and the salt. Depending on the hashing algorithm, this can take some time. To speed things up, hackers also use lists of the most common passwords and dictionary words. The idea of the salt is to slow an attacker down.
The best approach is to use a different salt for each password and to make it long and random; it's OK to store the salt next to each password. This really slows an attacker down, because they would have to run their hash table generation for each individual password, for every combination of common passwords and dictionary words. This makes it impractical for an attacker to deduce strong passwords.
I had read a good article on this, which I can't find now. But Googling 'password salt' gives some good results. Have a look at this article.

I would like to point out that the scheme you described, with the hard-coded salt, is actually not a salt; it works like a key or a pepper. Salt and pepper solve different problems.
A salt should be generated randomly for every password, and can be stored together with the hashed password in the database. It can be stored in plain text, and fulfills its purpose even when known to the attacker.
A pepper is a secret key, that will be used for all passwords. It will not be stored in the database, instead it should be deposited in a safe place. If the pepper is known to the attacker, it becomes useless.
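One possible way to combine the two, sketched with the standard library: the pepper is applied as an HMAC key and the salt feeds a slow hash. The environment variable name, the fallback value and the iteration count are hypothetical, and this is only one of several reasonable constructions.

```python
import hashlib
import hmac
import os

# Hypothetical: the pepper comes from configuration outside the database.
PEPPER = os.environ.get("PASSWORD_PEPPER", "dev-only-pepper").encode()
ITERATIONS = 200_000  # assumed work factor


def hash_with_salt_and_pepper(password: str, salt: bytes) -> bytes:
    # Keyed step: without the pepper, stored hashes cannot be tested offline.
    peppered = hmac.new(PEPPER, password.encode("utf-8"), hashlib.sha256).digest()
    # Slow, salted step: the salt is stored next to the result.
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, ITERATIONS)


salt = os.urandom(16)           # random per password, stored in the database
stored = hash_with_salt_and_pepper("s3cret", salt)
print(hmac.compare_digest(stored, hash_with_salt_and_pepper("s3cret", salt)))
```

The salt lives in the database row; the pepper never does, which is what gives it value if only the database leaks.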
I tried to explain the differences in a small tutorial, maybe you want to have a look there.

Makes sense. Seems like more effort than it's worth for an attacker (unless it's a site of significant worth or importance).
All sites, small or large, important or not, should treat password hashing as highly important.
As long as each hash has its own large random salt then yes, it does become mostly impracticable. If every hash uses the same static salt, an attacker can build one table for that salt and pick out the users who chose password1, for example.
Using a good hashing algorithm is important as well (with today's multi-GPU setups, using MD5 or SHA-1 is nearly like storing plaintext). Use scrypt, or failing that bcrypt; if you have to use PBKDF2, the number of rounds needs to be very high.

What is the purpose of the "salt" when hashing?

Ok, I’m trying to understand the reason to use salt.
When a user registers I generate a unique salt for him/her that I store in DB. Then I hash it and the password with SHA1. And when he/she is logging in I re-hash it with sha1($salt.$password).
But if someone hacks my database he can see the hashed password AND the salt.
Is that harder to crack than just hashing the password with out salt? I don’t understand …
Sorry if I’m stupid …

If you don't use a salt then an attacker can precompute a password<->hash database offline even before they've broken into your server. Adding a salt massively increases the size of that database, making it harder to perform such an attack.
Also, once they've broken in they can guess a commonly used password, hash it, and then check all of the passwords in the database for a match. With a different salt for each user, they can only attack one password at a time.
There's an article at Wikipedia about salts in cryptography.

Another purpose of a salt is to make sure two users with the same password don't end up with the same hash in the users table (assuming their salts differ). However, different salt and password pairs can concatenate to the same string (for example, salt "ab" with password "c" and salt "a" with password "bc"), in which case the hash will be exactly the same. So combine salt and password in a way where two different pairs can't produce the same input, for example by using a fixed-length salt.
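A small sketch of that ambiguity and one possible way around it. The function names are mine; using HMAC with the salt as the key is just one option, and a fixed-length salt avoids the problem too.

```python
import hashlib
import hmac


def naive(salt: bytes, password: bytes) -> str:
    # Plain concatenation: the boundary between salt and password is lost.
    return hashlib.sha256(salt + password).hexdigest()


def keyed(salt: bytes, password: bytes) -> str:
    # HMAC takes the salt as a separate key input, so the boundary is kept.
    return hmac.new(salt, password, hashlib.sha256).hexdigest()


# Different (salt, password) pairs that concatenate to the same bytes.
print(naive(b"ab", b"c") == naive(b"a", b"bc"))   # identical hashes
print(keyed(b"ab", b"c") == keyed(b"a", b"bc"))   # distinct hashes
```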

If an attacker creates a giant table of hash values for plaintext passwords, using a salt prevents him from using the same table to crack more than one password. The attacker would have to generate a separate table for each salt. Note that for this to actually work properly, your salt should be rather long. Otherwise the attacker's precomputed table is likely to contain the salt+password hash anyway.
