Using a hash of data as a salt - hash

I was wondering - are there any disadvantages to using the hash of something as its own salt?
E.g. hashAlgorithm(data + hashAlgorithm(data))
This prevents the usage of lookup tables, and does not require the storage of a salt in the database. If the attacker does not have access to the source code, he would not be able to obtain the algorithm, which would make brute-forcing significantly harder.
Thoughts? (I have a gut feeling that this is bad - but I wanted to check if it really is, and if so, why.)

If the attacker does not have access to the source code
This is called "security through obscurity", which is always considered bad. An inherently safe method is always better, even if the only difference lies in the fact that you don't feel safe "because they don't know how". Someone can and will always find the algorithm -- through careful analysis, trial-and-error, or because they found the source by SSH-ing to your shared hosting service, or any of a hundred other methods.

Using a hash of the data as salt for the data is not secure.
The purpose of salt is to produce unpredictable results from inputs that are otherwise the same. For example, even if many users select the same input (as a password, for example), after applying a good salt, you (or an attacker) won't be able to tell.
When the salt is a function of the data, an attacker can pre-compute a lookup table, because the salt for every password is predictable.
The best salts are chosen from a cryptographic pseudo-random number generator initialized with a random seed. If you really cannot store an extra salt, consider using something that varies per user (like a user name), together with something application specific (like a domain name). This isn't as good as a random salt, but it isn't fatally flawed.
Remember, a salt doesn't need to be secret, but it cannot be a function of the data being salted.
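To make the point above concrete, here is a minimal Python sketch. It assumes a single SHA-256 pass purely for illustration (a real system would use a dedicated password hashing function), and the helper name `salted_hash` and the sample password are made up for this example. Two users pick the same password, but because each gets an independent random salt, the stored hashes differ.

```python
import hashlib
import os


def salted_hash(password: str, salt: bytes) -> str:
    # Illustration only: one SHA-256 pass over salt + password.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Each user gets an independent 16-byte salt from the OS random source.
salt_alice = os.urandom(16)
salt_bob = os.urandom(16)

hash_alice = salted_hash("hunter2", salt_alice)
hash_bob = salted_hash("hunter2", salt_bob)

# Same password, different salts: the stored hashes do not match,
# so the table does not reveal that the two passwords are equal.
print(hash_alice == hash_bob)
```

The salts are stored next to the hashes in the clear. They only need to be unpredictable before they are generated, not secret afterwards.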

This offers no improvement over just hashing. Use a randomly generated salt.
The point of salting is to make the hashes of two equal values, hashed at different times, differ from each other, which in turn breaks pre-calculated lookup tables.
Consider:
data = "test"
hash = hash("test"+hash("test"))
Hash will be constant whenever data = "test". Thus, if the attacker has the algorithm (and the attacker always has the algorithm) they can pre-calculate hash values for a dictionary of data entries.
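A rough sketch of that pre-calculation in Python, assuming SHA-256 stands in for the unspecified hashAlgorithm and using a tiny made-up word list:

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def self_salted_hash(data: str) -> str:
    # The scheme from the question: hash(data + hash(data)).
    return sha256_hex(data + sha256_hex(data))


# The attacker builds this table once, offline, for a whole word list.
dictionary = ["password", "letmein", "test", "qwerty"]
lookup_table = {self_salted_hash(word): word for word in dictionary}

# Later, a stolen hash is reversed with a single dictionary lookup.
stolen_hash = self_salted_hash("test")
print(lookup_table.get(stolen_hash))
```

Because the "salt" is fully determined by the data, the same table works against every user and every database that uses this scheme.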

This is not a salt - you have just modified the hash function. Instead of using a lookup table for the original hashAlgorithm, an attacker can just build a table for your modified one; this does not prevent the usage of lookup tables.

It is always better to use true random data as salt. Imagine an implementation where the username is taken as the salt value. This would lead to reduced security for common names like "root" or "admin".
If you don't want to create and manage a salt value for each hash, you could use a strong application-wide salt. In most cases this would be absolutely sufficient, and many other things would be more vulnerable than the hashes.

Related

Is it a good idea to store two hashes for each password in a database?

Is it a good idea to store two hashes for each password in a database (e.g. SHA-1 and MD5) and check both of the hashes in a login script to prevent collisions? On the other side, wouldn't it then be easier to calculate the password from the two hashes (for example if a hacker gets access to the database)?
This would probably not be useful.
Any hash function you'd use will be safe against accidental collisions -- they're almost impossibly unlikely. So the only collisions you're concerned with are when hackers have already compromised your database, and have your hashes, trying to figure out a password that generates the target hash.
This is called a "second preimage attack", and it's incredibly hard. There are no known second preimage attacks for any relatively recent algorithm, even going back to MD4. This shouldn't be a serious concern.
However, if you're using a generic hash function, then people brute-forcing your hacked hashes is a realistic concern. You shouldn't use a generic hash function like SHA-2, even with salts. You should use a password hash function, like bcrypt, which is resistant to brute-forcing. If you are using normal hash functions then, as you note, storing two means they only need to brute force the weaker one -- it's one more thing that can go wrong.
Don't bother. Use a password hash function instead. It will be safer and simpler.
I really don't see how there can be any benefit from storing 2 hashes for the password and then checking both. All you're doing here is giving your app more work to do and in my opinion not providing any extra level of security as they are still entering the same password.

SALT Hash Passwords in Visual Studio

I know there are many questions on SALT and hashing passwords, but I have yet to find a tutorial to walk me through this in VS using the MVC pattern.
I currently have a DB created with a user table containing three columns:
userID(PK, int, not null)
password(varchar(45), not null)
loginID(varchar(8), null)
The password is saved as a visible string in the DB. After researching the issue, I assume password is easiest as binary instead of varchar. Does anyone know of a good tutorial to implement hashing and SALT into my program? One that clearly defines this in terms of the MVC pattern is preferred.
MVC doesn't have anything to do with salting your passwords, although someone might point to the proper libraries that might be used with your tech stack.
Salting involves using a specific sequence, and appending that to the end of user passwords, and then hashing that data.
The reason this is done is that a hash algorithm applied to a well-known string is easily reversible. A person could, for example, run well-known hash algorithms over a whole dictionary and compare the results to the stored user password hashes to determine what each one was hashed from. While a good hash function is a one-way function (i.e. you can't find the input based on the output), if you had a dictionary to map against you could easily do it for well-known strings and string combinations.
For example, the password password has a well known hash. When you attach a random sequence to the end (or start) and then hash that, it's a significantly less common hash as a result, and then it's significantly harder to reverse.
Sorry for not having the specific technologies related, but I wanted to communicate the general higher level concept of it since the over-focus on the technologies loses the bigger picture.
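As a language-neutral illustration of the concept (not a Visual Studio or MVC recipe), here is a hedged Python sketch of the store-and-verify flow. It assumes PBKDF2 from the standard library as the hash, an iteration count of 100,000 picked only for the example, and two binary columns for the salt and the derived key; the function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this example; tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key), to be stored in two binary columns."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)


salt, key = hash_password("s3cret")
print(verify_password("s3cret", salt, key))
print(verify_password("wrong", salt, key))
```

The plain-text password column is replaced by the salt and the derived key; the login code only ever calls something like `verify_password`.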

Salted Passwords

I have been reading on correct procedures for storing and checking user passwords, and am puzzling a little with salts.
I get the idea that they are to prevent use of such tools as rainbow tables, but to me the idea of storing the salt along with the hash seems a potential security problem. Too much data in one place for my liking.
The idea I have that I would like to run by folks is to use a 'lucky number' to create a salt from a portion of the password hash.
Basically, along with choosing a password, a user would choose a 'lucky number' as well. This number would then be used as the starting index to retrieve a salt from the hashed pass.
So a very basic example would be something like this.
Pass = "Password"
Lucky Number = "4"
Pass Hash = "00003gebdksjh2h4"
Salt Length = "5"
Resulting Salt = "3gebd"
My thinking is that as the 'lucky number' would not need to be stored, the salt would then also take computational time to work out and so make any attacks even more difficult. Plus it means storing slightly less data.
It wouldn't add more security. The goal of using a rainbow table is to have a map of combinations of dictionary words and their hashes. By using a more-or-less distinct salt (which in general shouldn't be found in a dictionary) for each password hash you compute, you force the attacker to generate a new set of rainbow tables for each entry, which would be prohibitively computationally expensive. At no point (except when generating the table) does the attacker calculate hashes, so your strategy fails here.
On the other hand when using a regular dictionary attack the attacker only needs a constant-time computation to calculate the salt, which he would have to compute only once for many millions or more combinations of hashes he would have to generate. It would only work if it was more computationally expensive than generating all that, but then you'd have to make the same computation each time a user wants to log in, which is not feasible.
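A small Python sketch of why the lucky number costs a dictionary attacker so little. It assumes SHA-256 as the hash, a fixed salt length of 5 as in the question's example, and that the final stored value is hash(password + salt); the question does not spell out that last step, so treat it as an assumption, and the function names are hypothetical.

```python
import hashlib

SALT_LENGTH = 5  # assumed fixed, as in the question's example


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def derive_salt(pass_hash: str, lucky_number: int) -> str:
    # The salt is just a slice of the password hash.
    return pass_hash[lucky_number:lucky_number + SALT_LENGTH]


def lucky_hash(password: str, lucky_number: int) -> str:
    salt = derive_salt(sha256_hex(password), lucky_number)
    return sha256_hex(password + salt)


def crack(stolen: str, dictionary: list[str]):
    # A SHA-256 hex digest has 64 characters, so there are only
    # 60 possible start indices to try for each candidate word.
    for word in dictionary:
        for lucky in range(64 - SALT_LENGTH + 1):
            if lucky_hash(word, lucky) == stolen:
                return word, lucky
    return None


print(derive_salt("00003gebdksjh2h4", 4))  # the question's own example
stolen = lucky_hash("Password", 4)
print(crack(stolen, ["letmein", "Password"]))
```

The unknown lucky number multiplies the attacker's work by a small constant (the number of valid start indices), which is negligible next to the size of the dictionary.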

Why have a good salt?

Let's say we don't use password_hash and use crypt() with sha512 instead to hash passwords. We need to add salt to the password, so an attacker couldn't use a rainbow table attack. Why does the salt have to be good and very random, as stated in many SO answers? Even if the salt differs only a little or is not very random, it will still give a totally different hash from the others. So an attacker won't know who uses the same passwords, and he still won't be able to create just one rainbow table.
Computing and storing a strong salt requires minimal effort, yet it makes the chance that a rainbow table has already been pre-computed with that salt astronomically small.
If the salt was a 3 digit number it would be feasible that an attacker could have pre-computed rainbow tables for all possible salt combinations. If the salt is a random 24 character alpha-numeric string then the chances an attacker could pre-compute this for all possible salts are practically zero.
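The arithmetic behind that comparison, as a quick Python check. It assumes "alpha-numeric" means upper-case letters, lower-case letters and digits (62 symbols); with a case-insensitive alphabet the number would be smaller but still enormous.

```python
import string

# A 3-digit numeric salt: 000 through 999.
three_digit_salts = 10 ** 3

# Assumed alphabet: a-z, A-Z, 0-9.
alphabet = string.ascii_letters + string.digits
random_24_char_salts = len(alphabet) ** 24

print(three_digit_salts)
print(f"{random_24_char_salts:.2e}")
```

A thousand tables is a one-off job for an attacker. More than 10^43 tables is not something anyone can pre-compute or store.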
A salt is supposed to be unique, must be long enough, and should be unpredictable. Randomness is not necessary, but it is the easiest way for a computer to meet those requirements. And it is not the purpose of a salt to be secret, a salt fulfills its purpose even when known.
Uniqueness means that it should not only be unique in your database (otherwise you could use a userid), it should be unique worldwide. Somebody could create rainbow tables for salts like 1-1000 and would be able to retrieve passwords for all accounts with those userids (admin accounts often have low userids).
Long enough: If the salt is too short (too few possible combinations), it becomes profitable again to build rainbow tables. Salt and password together can then be seen as just a longer password, and if you can build a rainbow table for these longer passwords, you also get the shorter original passwords. For very strong and long passwords, salting would actually not be necessary at all, but most human-generated passwords can be brute-forced because they are short (people have to remember them).
Salts derived from other parameters can also fall into this category. Calculating a hash from the userid does not increase the number of possible combinations.
Unpredictability is a bit less important, but imagine once more the case that you use the userid as salt: an attacker can find out what the next few userids will be, and can therefore precalculate a small number of rainbow tables. Depending on the hash algorithm used, this may or may not be applicable. He then has a time advantage and can retrieve the password immediately. It becomes more of a problem if the admin accounts use a predictable salt.
So using a really random number, generated from the OS random source (/dev/urandom), is the best you can do. Even if you ignore all the reasons above, why should you use a derived salt when there is a better way? Why not use the best way you know?

Is salting a password pointless if someone gains access to the salt key? Off server salting?

Hearing about all the recent hacks at big tech firms, it made me wonder their use of password storage.
I know salting + hashing is accepted as being generally secure, but every example I've seen of salting has the salt key hard-coded into the password script, which is generally stored on the same server.
So is it a logical solution to hash the user's password initially, pass that hash to a "salting server" or some function stored off-site, then pass back the salted hash?
The way I'm looking at it is, if an intruder gains access to the server or database containing the stored passwords, they won't immediately have access to the salt key.
No -- salt remains effective even if known to the attacker.
The idea of salt is that it makes a dictionary attack on a large number of users more difficult. Without salt, the attacker hashes all the words in a dictionary, and sees which match with your users' hashed passwords. With salt, he has to hash each word in the dictionary many times over (once for each possible salt value) to be certain of having one that fits each user.
This multiplication by several thousand (or possibly several million, depending on how large a salt you use) increases the time to hash all the values, and the storage needed to store the results -- to the point that (you hope) it's impractical.
I should add, however, that in many (most?) cases, a very large salt doesn't really add a lot of security. The problem is that if you use, say, a 24 bit salt (~16 million possible values) but have only, say, a few hundred users, the attacker can collect the salt values you're actually using ahead of time, then do his dictionary attack for only those values instead of the full ~16 million potential values. In short, your 24-bit salt adds only a tiny bit of difficulty beyond what a ~8 bit salt would have provided.
OTOH, for a large server (Google, Facebook, etc.) the story is entirely different -- a large salt becomes quite beneficial.
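A back-of-the-envelope version of that argument in Python. The dictionary size of one million words and the figure of 300 users are assumptions chosen only to make the numbers concrete.

```python
def hashes_needed(dictionary_size: int, salts_to_cover: int) -> int:
    # One hash per (word, salt) pair.
    return dictionary_size * salts_to_cover


dictionary_size = 1_000_000        # assumed word list size
possible_salts_24_bit = 2 ** 24    # 16,777,216 possible salt values
users = 300                        # "a few hundred users"

# Covering every possible 24-bit salt ahead of time:
cover_all_salts = hashes_needed(dictionary_size, possible_salts_24_bit)

# Covering only the salts actually present in the stolen table
# (at most one distinct salt per user):
cover_observed_salts = hashes_needed(dictionary_size, users)

print(cover_all_salts)
print(cover_observed_salts)
```

Once the attacker has the table, the work scales with the number of distinct salts in use rather than with the size of the salt space. The large salt space mainly stops tables from being built before the breach.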
Salting is useful even if intruder knows the salt.
If passwords are NOT salted, widely available precomputed rainbow tables can be used to quickly attack your passwords.
If your password table is salted, it becomes very difficult to precompute rainbow tables - it is impractical to create a rainbow table for every possible salt.
If you use a random salt that is different for every password entry, and put it in plaintext right next to the hash, it becomes very difficult for an intruder to attack your passwords, short of a brute force attack.
Salting passwords protects passwords against attacks where the attacker has a list of hashed passwords. There are some common hashing algorithms that hackers have tables for that allow them to look up a hash and retrieve the password. For this to work, the hacker has to have broken into the password storage and stolen the hashes.
If the passwords are salted, then the attacker must re-generate their hash tables, using the hashing algorithm and the salt. Depending on the hashing algorithm, this can take some time. To speed things up, hackers also use lists of the most common passwords and dictionary words. The idea of the salt is to slow an attacker down.
The best approach is to use a different salt for each password and make it long and random; it's OK to store the salt next to each password. This really slows an attacker down, because they would have to run their hash table generation for each individual password, for every combination of common passwords and dictionary words. This would make it implausible for an attacker to deduce strong passwords.
I had read a good article on this, which I can't find now. But Googling 'password salt' gives some good results. Have a look at this article.
I would like to point out, that the scheme you described with the hard-coded salt, is actually not a salt, instead it works like a key or a pepper. Salt and pepper solve different problems.
A salt should be generated randomly for every password, and can be stored together with the hashed password in the database. It can be stored in plain text, and fulfills its purpose even when known to the attacker.
A pepper is a secret key, that will be used for all passwords. It will not be stored in the database, instead it should be deposited in a safe place. If the pepper is known to the attacker, it becomes useless.
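A hedged Python sketch of how the two can be combined. The answer does not say how the pepper is applied, so the HMAC step here is an assumption (one common way of keying a password with a server-side secret). The PBKDF2 iteration count and the function name are also illustrative.

```python
import hashlib
import hmac
import os

# Hypothetical: in practice the pepper is loaded from a secret store or a
# config file outside the database, not generated when the program starts.
PEPPER = os.urandom(32)


def hash_with_salt_and_pepper(password: str, salt: bytes, pepper: bytes) -> bytes:
    # Key the password with the secret pepper first...
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    # ...then run the salted, slow hash over the result.
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, 100_000)


salt = os.urandom(16)  # per-password, stored in the database next to the hash
stored = hash_with_salt_and_pepper("password1", salt, PEPPER)

# Someone holding only the database (salt + hash) but not the pepper
# cannot reproduce the stored value, even for the correct password.
wrong_pepper = os.urandom(32)
print(stored == hash_with_salt_and_pepper("password1", salt, wrong_pepper))
```

The salt is per password and public; the pepper is shared by all passwords and only helps for as long as it stays secret.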
I tried to explain the differences in a small tutorial, maybe you want to have a look there.
Makes sense. Seems like more effort than it's worth for an attacker (unless it's a site of significant worth or importance).
All sites, small or large, important or not, should treat password hashing as highly important.
As long as each hash has its own large random salt then yes, it does become mostly impracticable. If every hash uses one static salt, an attacker can use rainbow tables to weed out the hashes of users who chose password1, for example.
Using a good hashing algorithm is important as well (using MD5 or SHA-1 is nearly like using plaintext, given the multi-GPU setups available these days). Use scrypt; if not that, then bcrypt; or, if you have to use PBKDF2, make sure the number of rounds is very high.
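For reference, a minimal scrypt sketch using Python's standard library (`hashlib.scrypt`, which needs a Python build linked against OpenSSL 1.1 or later). The cost parameters n, r and p below are assumptions for illustration, not values recommended anywhere in this thread.

```python
import hashlib
import hmac
import os


def scrypt_hash(password: str, salt: bytes) -> bytes:
    # n (CPU/memory cost), r (block size) and p (parallelism) are assumed
    # example values; raise n as far as your login latency budget allows.
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2 ** 14, r=8, p=1, dklen=32
    )


salt = os.urandom(16)                      # large random salt per hash
stored = scrypt_hash("password1", salt)

# Verification re-derives the key with the stored salt.
print(hmac.compare_digest(stored, scrypt_hash("password1", salt)))
```

Unlike a single MD5 or SHA-1 pass, each guess here costs the attacker a tunable amount of time and memory, which is what blunts GPU-based brute forcing.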