Preventing preimage attack on limited set of values - hash

I have asked about the cost of running a preimage attack on the hashes of social security numbers. The excellent answer I got was that the type of social security numbers only has 366,000,000 hashes, which would make it easy to run a preimage attack.
My question now is whether it is possible to avoid a preimage attack altogether. My scenario is that several clients need to store the social security number on a central server. The hashing must be consistent between the clients. The clients could communicate with online web services.

Your problem is similar to what must be done when using passwords. Passwords fit in human brains, and, as such, cannot be very difficult to guess.
There are two complementary ways to mitigate risks when using low-entropy secrets:
Use iterated/repeated hashing to make each "guess" more expensive for the attacker.
Use salts to prevent cost sharing. The attacker shall pay the full dictionary search attack for every single attacked password/SSN.
One way to make hashing more expensive is to hash the concatenation of n copies of the data, with n as large as possible (depending on the computing power of the clients, and, ultimately, the patience of the user). For instance, for (dummy) SSN "123456789", use H(123456789123456789123456789...123456789). You would count n in millions here; on a basic PC, SHA-256 can easily process a hundred megabytes per second.
A salt is a piece of public data which is used along with the data to hash (the SSN), and which is different for each user. A salt need not be secret, but it should not be reused (or at least not often). Since SSNs tend to be permanent (an individual has a unique SSN for his whole life), you can use the user name as the salt (this contrasts with passwords, where a user can change his password, and should use a new salt for every new password). Hence, if user Bob Smith has SSN 123456789, you would end up using: H("Bob Smith 123456789 Bob Smith 123456789 Bob Smith 123456789... Smith 123456789") with enough repetitions to make the process sufficiently slow.
Assuming you can make the user wait for one second (it is difficult to make a user wait for more) on a not-so-new computer, it can be expected that even a determined attacker will have trouble trying more than a few hundred SSNs per second. The cost of cracking a single SSN will be counted in weeks, and, thanks to the use of the user name as a salt, the attacker will have no shortcut (e.g. salting defeats precomputed tables, including the much-hyped "rainbow tables").
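A minimal sketch of the salted, repeated-input hashing described above, in Python. The function name `slow_salted_hash`, the space separator, and the default repetition count are illustrative assumptions, not part of the original answer; the count would need tuning so that one call takes about a second on the slowest client.

```python
import hashlib

def slow_salted_hash(user_name: str, ssn: str, repetitions: int = 1_000_000) -> str:
    """SHA-256 over `repetitions` copies of "<user_name> <ssn> "."""
    unit = f"{user_name} {ssn} ".encode("utf-8")
    hasher = hashlib.sha256()
    # Feed the input in blocks of 1000 copies to avoid building one huge string.
    block = unit * 1000
    full_blocks, remainder = divmod(repetitions, 1000)
    for _ in range(full_blocks):
        hasher.update(block)
    hasher.update(unit * remainder)
    return hasher.hexdigest()

if __name__ == "__main__":
    # Every client that uses the same name, SSN and count gets the same digest.
    print(slow_salted_hash("Bob Smith", "123456789"))
```

Because the user name is part of every repeated unit, two users with the same SSN would still get different digests. A standard key-stretching function such as PBKDF2 (`hashlib.pbkdf2_hmac` in Python) serves the same purpose and may be preferable to a hand-rolled construction.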

Related

Is it possible to brute-force my SHA-512 authentication algorithm?

I have an authentication application and don't know how secure it is.
Here is the algorithm.
1) A clientToken is generated by taking the SHA-512 hash of a new GUID. I have about 1000 clientTokens generated and stored in the database.
Every time a caller calls my web service, it needs to provide the clientToken. If the clientToken does not exist in the database, the caller is not a valid client.
The question is: how long would it take to brute-force an existing clientToken?
A GUID is a 128-bit value, with 6 bits held constant, so a total of 122 bits are available. Since this is your input to the hash, you're not going to have 2^512 unique hashes in your application. The 2^122 possible inputs amount to roughly 5.3*10^36 values to check.
Say your attacker is able to calculate 1,000,000 (10^6) hashes per second (I'm not sure how reasonable that is for SHA-512, but at this size, a few orders of magnitude won't influence things that much). This works out to about 5.3*10^30 seconds to check the space (For reference, this will be far beyond the time all stars have gone dark). Also, unless you have several billion clients, a birthday attack probably will not remove too many orders of magnitude from this.
But, just for fun, let's say the attacker has some trick that lets him reduce the number of hashes to check by half (or some combination of reduced space to check and increased speed), either by you having that many users, or some flaw in your GUID generator, or what have you. We're still looking at well over 100 million years to find a collision.
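The arithmetic behind these estimates can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the attacker speed is the 10^6 hashes per second assumed in the answer, and the variable names are illustrative.

```python
GUID_RANDOM_BITS = 122                   # 128 bits minus the 6 fixed bits
HASHES_PER_SECOND = 10**6                # attacker speed assumed in the answer
SECONDS_PER_YEAR = 365.25 * 24 * 3600

search_space = 2 ** GUID_RANDOM_BITS
seconds_full = search_space / HASHES_PER_SECOND
years_full = seconds_full / SECONDS_PER_YEAR
years_half = years_full / 2              # attacker only has to check half the space

print(f"candidates:          {search_space:.2e}")
print(f"seconds, full space: {seconds_full:.2e}")
print(f"years, half space:   {years_half:.2e}")
```

Even with the search space halved, the result is many orders of magnitude above the 100 million years mentioned above.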
I think you're beyond safe and into somewhat overkill territory. Also note that hashing the GUID in effect does nothing, and that GUIDs probably are not generated via a secure random number generator. You'd actually be a bit better off just generating 128 bits (16 bytes) of randomness via whatever secure random number generator your platform uses, and using that as the shared secret.
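As a sketch of that last suggestion, here is how a 128-bit token could be generated in Python with the standard `secrets` module. The function names and the in-memory set standing in for the database are hypothetical.

```python
import hmac
import secrets

def new_client_token() -> str:
    """Return 16 bytes (128 bits) from the OS CSPRNG, hex-encoded."""
    return secrets.token_hex(16)

def is_valid_token(presented: str, known_tokens: set) -> bool:
    """Compare against each stored token in constant time per comparison."""
    return any(hmac.compare_digest(presented, token) for token in known_tokens)

if __name__ == "__main__":
    tokens = {new_client_token() for _ in range(1000)}
    sample = next(iter(tokens))
    print(is_valid_token(sample, tokens))
    print(is_valid_token(new_client_token(), tokens))
```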

Why have a good salt?

Let's say we don't use password_hash and use crypt() with SHA-512 instead to hash passwords. We need to add a salt to the password so that an attacker can't use a rainbow table attack. Why does the salt have to be good and very random, as stated in many SO answers? Even if the salts differ only slightly or are not very random, each one will still give a totally different hash from the others. So an attacker won't know who uses the same password, and he still won't be able to get by with just one rainbow table.
Computing and storing a strong salt requires minimal effort, yet it makes the chance that a rainbow table has already been precomputed for that salt astronomically small.
If the salt was a 3 digit number it would be feasible that an attacker could have pre-computed rainbow tables for all possible salt combinations. If the salt is a random 24 character alpha-numeric string then the chances an attacker could pre-compute this for all possible salts are practically zero.
A salt is supposed to be unique, must be long enough, and should be unpredictable. Randomness is not necessary, but it is the easiest way for a computer to meet those requirements. And it is not the purpose of a salt to be secret, a salt fulfills its purpose even when known.
Uniqueness means that it should not only be unique in your database (otherwise you could use a userid), it should be unique worldwide. Somebody could create rainbow tables for salts such as 1-1000 and would then be able to retrieve passwords for all accounts with those userids (admin accounts often have low userids).
Long enough: If the salt is too short (too few possible combinations), it becomes profitable again to build rainbow tables. Salt and password together can then be seen as just a longer password, and if you can build a rainbow table for these longer passwords, you also get the shorter original passwords. For very strong and long passwords, salting would actually not be necessary at all, but most human-generated passwords can be brute-forced because they are short (people have to remember them).
Salts derived from other parameters can also fall into this category. Just because you calculate a hash from the userid, the number of possible combinations does not increase.
Unpredictability is a bit less important, but imagine once more that you use the userid as the salt: an attacker can find out what the next few userids will be, and can therefore precalculate a small number of rainbow tables. Depending on the hash algorithm used, this may or may not be applicable. He then has a time advantage and can retrieve the password immediately. It is more of a problem if the admin accounts use a predictable salt.
So using a really random number, generated from the OS random source (/dev/urandom), is the best you can do. Even when you ignore all reasons above, why should you use a derived salt when there is a better way, why not use the best way you know?
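A minimal Python sketch of a per-password random salt taken from the OS random source. The question is about PHP's crypt(); PBKDF2-HMAC-SHA512 from Python's `hashlib` is used here only as a stand-in, and the function name, salt length and iteration count are assumptions.

```python
import hashlib
import os

def hash_password(password: str, iterations: int = 200_000):
    """Return (salt, digest); the salt is stored in plain text next to the digest."""
    salt = os.urandom(16)  # 128 random bits from the OS source (e.g. /dev/urandom)
    digest = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, iterations)
    return salt, digest

if __name__ == "__main__":
    salt_a, digest_a = hash_password("correct horse")
    salt_b, digest_b = hash_password("correct horse")
    # Same password, different salts, so the stored digests differ.
    print(digest_a != digest_b)
```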

Is salting a password pointless if someone gains access to the salt key? Off server salting?

Hearing about all the recent hacks at big tech firms made me wonder about their use of password storage.
I know salting + hashing is accepted as being generally secure, but every example I've seen of salting has the salt key hard-coded into the password script, which is generally stored on the same server.
So is it a logical solution to hash the user's password initially, pass that hash to a "salting server" or some function stored off-site, then pass back the salted hash?
The way I'm looking at it is, if an intruder gains access to the server or database containing the stored passwords, they won't immediately have access to the salt key.
No -- salt remains effective even if known to the attacker.
The idea of salt is that it makes a dictionary attack on a large number of users more difficult. Without salt, the attacker hashes all the words in a dictionary, and sees which match with your users' hashed passwords. With salt, he has to hash each word in the dictionary many times over (once for each possible salt value) to be certain of having one that fits each user.
This multiplication by several thousand (or possibly several million, depending on how large a salt you use) increases the time to hash all the values, and the storage needed to store the results, to the point that (you hope) it's impractical.
I should add, however, that in many (most?) cases, a very large salt doesn't really add a lot of security. The problem is that if you use, say, a 24 bit salt (~16 million possible values) but have only, say, a few hundred users, the attacker can collect the salt values you're actually using ahead of time, then do his dictionary attack for only those values instead of the full ~16 million potential values. In short, your 24-bit salt adds only a tiny bit of difficulty beyond what a ~8 bit salt would have provided.
OTOH, for a large server (Google, Facebook, etc.) the story is entirely different -- a large salt becomes quite beneficial.
Salting is useful even if intruder knows the salt.
If passwords are NOT salted, an attacker can use widely available precomputed rainbow tables to quickly attack your passwords.
If your password table is salted, it becomes very difficult to precompute rainbow tables: it is impractical to create a rainbow table for every possible salt.
If you use a random salt that is different for every password entry, and store it in plaintext right next to the hash, it becomes very difficult for an intruder to attack your passwords, short of a brute-force attack.
Salting passwords protects passwords against attacks where the attacker has a list of hashed passwords. There are some common hashing algorithms that hackers have tables for that allow them to look up a hash and retrieve the password. For this to work, the hacker has to have broken into the password storage and stolen the hashes.
If the passwords are salted, then the attacker must re-generate their hash tables, using the hashing algorithm and the salt. Depending on the hashing algorithm, this can take some time. To speed things up, hackers also use lists of the most common passwords and dictionary words. The idea of the salt is to slow an attacker down.
The best approach is to use a different salt for each password and make it long and random; it's OK to store the salt next to each password. This really slows an attacker down, because they would have to run their hash table generation for each individual password, for every combination of common passwords and dictionary words. This would make it implausible for an attacker to deduce strong passwords.
I had read a good article on this, which I can't find now. But Googling 'password salt' gives some good results. Have a look at this article.
I would like to point out that the scheme you described, with the hard-coded salt, is actually not a salt; instead it works like a key or a pepper. Salt and pepper solve different problems.
A salt should be generated randomly for every password, and can be stored together with the hashed password in the database. It can be stored in plain text, and fulfills its purpose even when known to the attacker.
A pepper is a secret key that is used for all passwords. It is not stored in the database; instead it should be kept in a safe place. If the pepper becomes known to the attacker, it becomes useless.
I tried to explain the differences in a small tutorial, maybe you want to have a look there.
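To make the distinction concrete, here is one possible way to combine the two in Python. This is a sketch under stated assumptions, not the scheme from the tutorial: the pepper is applied with HMAC-SHA256 before a salted PBKDF2, and the function name and iteration count are hypothetical. In a real deployment the pepper would be loaded from somewhere outside the database.

```python
import hashlib
import hmac
import os

def hash_with_salt_and_pepper(password: str, salt: bytes, pepper: bytes,
                              iterations: int = 200_000) -> bytes:
    # Pepper: one secret key for all passwords, kept outside the database.
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    # Salt: random per password, stored in plain text next to the hash.
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, iterations)

if __name__ == "__main__":
    pepper = os.urandom(32)   # assumption: really read from a protected config
    salt = os.urandom(16)     # generated once per password, stored with the hash
    stored = hash_with_salt_and_pepper("hunter2", salt, pepper)
    attempt = hash_with_salt_and_pepper("hunter2", salt, pepper)
    print(hmac.compare_digest(stored, attempt))
```

An attacker who steals only the database gets the salts and hashes but not the pepper, so the hashes cannot be tested against a dictionary until the pepper is also obtained.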
Makes sense. Seems like more effort than it's worth for an attacker (unless it's a site of significant worth or importance).
All sites, small or large, important or not, should treat password hashing as high importance.
As long as each hash has its own large random salt then yes, it does become mostly impracticable. If every hash uses the same static salt, an attacker can build a single rainbow table to weed out the users whose hashes correspond to password1, for example.
Using a good hashing algorithm is important as well (using MD5 or SHA-1 is nearly like using plaintext with the multi-GPU setups these days). Use scrypt; if not, then bcrypt; or if you have to use PBKDF2, the number of rounds needs to be very high.
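A minimal sketch of the scrypt option using Python's standard `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or later). The cost parameters n=2**14, r=8, p=1 and the helper names are assumptions chosen for illustration; suitable values depend on the hardware.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

def scrypt_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest), generating a fresh random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            n=2**14, r=8, p=1, dklen=32)
    return salt, digest

def scrypt_verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    _, digest = scrypt_hash(password, salt)
    return hmac.compare_digest(digest, expected)

if __name__ == "__main__":
    salt, digest = scrypt_hash("password1")
    print(scrypt_verify("password1", salt, digest))
    print(scrypt_verify("password2", salt, digest))
```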

How to securely detect accounts with matching passwords?

On our message board, we use password matching to help detect members with multiple registrations and enforce our rules against malicious puppet accounts. It worked well when we had SHA256 hashes and a per-site salt. But we recently had a humbling security breach in which a number of password hashes fell to a dictionary attack. So we forced a password change, and switched to bcrypt + per-user salts.
Of course, now password matching doesn't work anymore. I don't have a formal education in cryptography or computer science so I wanted to ask if there's a secure way to overcome this problem. Somebody I work with suggested a second password field using a loose hashing algorithm which intentionally has lots of collisions, but it seems to me that this would either lead to tons of false positives, or else reduce the search space too much to be secure. My idea was to stick with bcrypt, but store a second password hash which uses a per-site salt and an extremely high iteration count (say 10+ seconds to generate on modern hardware). That way users with the same password would have the same hash, but it couldn't be easily deduced with a dictionary attack.
I'm just wondering if there's an obvious problem with this, or if someone more knowledgeable than me has any suggestions for a better way to approach things? It seems to me like it would work, but I've learned that there can be a lot of hidden gotchas when it comes to security. :P Thanks!
Short Answer
Any algorithm that would allow you to detect whether or not 2 users had the same password would also allow an attacker to detect whether or not 2 users had the same password. This is, effectively, a precomputation attack. Therefore, your problem is not securely solvable.
Example
Assume I've compromised your password database.
Assume I've figured out how your hashes are calculated.
If I can apply your password transformation algorithm to "password" and quickly tell which users use "password" as their password, then the system is vulnerable to a form of precomputation attack.
If I must do an expensive calculation to determine the password for each individual user and work spent to calculate User A's password does not make calculating User B's password easier, then the system is secure (against this type of attack).
Further Consideration
Your idea of using a per-site salt with bcrypt and a high iteration count may seem attractive at first, but it just can't scale. Even at 10 seconds, that's 6 password guesses per minute, 360 per hour, 8640 per day, or 3M per year (that's a lot). And that's just one machine. Throw a botnet of machines at that problem, or some GPUs, and suddenly that number goes through the roof. Just 300 machines/cores/GPUs could knock out 2.5M guesses in a day.
Because you would be using the same salt for each one, you're allowing the attacker to crack all of your users' passwords at once. By sticking with a per-user salt only, the attacker can effectively only attempt to crack a single user's password at a time.
The short answer given above makes the assumption that the attacker has the same access as the server at all times, which is probably not reasonable. If the server is compromised in a permanent manner (owned by the attacker) then no scheme can save you - the attacker can retrieve all passwords as they are set by the user. The model is more normally that an attacker is able to access your server for a limited period of time, some point after it has gone live. This introduces an opportunity to perform the password matching that you've asked about without providing information that is useful to an attacker.
If at sign-up or password change your server has access to the password in plain text, then the server could iterate through all the user accounts on the system, hashing the new password with each user's individual salt, and testing to see if they were the same.
This doesn't introduce any weaknesses, but it would only be useful to you if your algorithm for preventing multiple fake accounts can use this as a one-time input ("this password matches these accounts").
Storing that information for later analysis would obviously be a weakness (for if an attacker can obtain your database of passwords, they can probably also obtain this list of accounts with the same password). A middle ground might be to store the information for daily review - reducing the total useful information available to an attacker who temporarily compromises your storage.
All of this is moot if the salting and hashing occurs client-side - then the server can't carry out the test.
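The sign-up-time check described in this answer could look roughly like the following Python sketch. PBKDF2 from the standard library stands in for bcrypt so that the example has no third-party dependency; the function names and the dictionary standing in for the user table are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Dict, List, Tuple

def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    # Stand-in for bcrypt: any slow, salted password hash fits here.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def accounts_sharing_password(new_password: str,
                              users: Dict[str, Tuple[bytes, bytes]],
                              iterations: int = 100_000) -> List[str]:
    """Hash the plain-text password with every user's own salt and list the matches."""
    matches = []
    for name, (salt, stored) in users.items():
        candidate = hash_password(new_password, salt, iterations)
        if hmac.compare_digest(candidate, stored):
            matches.append(name)
    return matches

if __name__ == "__main__":
    users = {}
    for name, pw in [("alice", "tr0ub4dor"), ("bob", "hunter2"), ("sock1", "hunter2")]:
        salt = os.urandom(16)
        users[name] = (salt, hash_password(pw, salt))
    print(accounts_sharing_password("hunter2", users))
```

Note the cost: one slow hash per existing account for every sign-up or password change, so the work grows linearly with the number of users.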

Is forcing complex passwords "more important" than salting?

I've spent the past 2 hours reading up on salting passwords, making sure that I understood the idea. I was hoping some of you could share your knowledge on my conclusions.
If I'm an attacker, and I gain access to a user database, I could just take all the per-user salts present in the table and use those to create my rainbow tables. For big tables this could take a long time. If I could cut the list down to users of interest (admins, mods) I could use much bigger dictionary lists to create the rainbow tables, raising my percentage of hits...
If this is true then it seems that salting really doesn't do all that much to help. It only marginally slows down an attacker.
I know ideally you would want to force complex passwords and salt them with unique and random strings, but forcing complex passwords can annoy users (I know it annoys me), so a lot of sites don't do it. It seems sites are doing their users a disservice with this, and that forcing complex passwords is a lot more important than a good salting method.
I guess this isn't so much a question as a request for others' knowledge on the situation.
The point of a salt is that an attacker can no longer use a pre-existing dictionary to attack any user in your system. They have to create a brand new dictionary for each user using that user's salt, which takes time and effort. If you learn about a breach before dictionaries are created for all users of your system, you have time to act. (Alert users that their log-in credentials must be changed, which should generate a new random salt.)
I would say that you should use both a salt and the most complex password (pass phrase, really) that your users will tolerate. Even still, salting is a fundamental security measure, and you can't really afford to do without it.
Is keeping properly hydrated more important than breathing?
I tend to favor an approach that uses a salt per user, global salt (salt per algorithm), and modest password complexity rules (8+ characters with some combination of at least 2 uppercase/digit/punctuation characters) for most web sites. Using salts requires the generation of a rainbow table per account you want to break -- assuming unique salts per user. Using a global salt requires that you both compromise the DB and the application server. In my case, these are always two separate systems. Using password complexity rules helps to protect against simple, easy to guess passwords being used.
For accounts with more privileges, you may want to enforce greater password complexity. For example, admins in our AD forest are required to have passwords of at least 15 characters. It's actually easier than shorter passwords because it pretty much forces you to use a pass phrase rather than a password.
You also want to instruct your users in how to create good passwords, or better yet, pass phrases, and to be aware of various social engineering attacks that circumvent all of your technical means of protecting your data.
Okay, let's look at real figures:
A single Nvidia 9800GTX can calculate 350 million MD5 hashes / second. At this rate, the entire keyspace of 8-character passwords over lowercase and uppercase alphanumeric characters (62^8 candidates) will be done in 7 days. 7 characters, under three hours. Applying salting will only double or triple these times depending on your algorithm.
Cheap modern GPUs will easily boast one billion MD5 hashes / second. Determined people typically link up about 6 of these, and get 6 billion / second, rendering the 9 character keyspace obsolete in 26 days.
Note that I'm talking about brute force here, as preimage attacks may or may not apply after this level of complexity.
Now if you want to defend against professional attackers, there is no reason they can't get 1 trillion hashes / second, they'd just use specialized hardware or a farm of some cheap GPU machines, whichever is cheaper.
And boom, your 10 character keyspace is done in 9.7 days, but then 11 character passwords take 602 days. Notice that at this point, adding 10 or 20 special characters to the allowed character list will only bring the cracking time of a 10 character keyspace to 43 or 159 days, respectively.
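The figures quoted above follow from dividing the keyspace size by the hash rate. A short Python sketch, assuming a 62-character alphabet (lowercase, uppercase, digits) and an exhaustive search of the whole keyspace; the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400

def days_to_exhaust(alphabet_size: int, length: int, hashes_per_second: float) -> float:
    """Days needed to try every password of exactly `length` characters."""
    return alphabet_size ** length / hashes_per_second / SECONDS_PER_DAY

scenarios = [
    ("9 chars, 62 symbols, 6e9 hashes/s", 62, 9, 6e9),
    ("10 chars, 62 symbols, 1e12 hashes/s", 62, 10, 1e12),
    ("11 chars, 62 symbols, 1e12 hashes/s", 62, 11, 1e12),
    ("10 chars, 72 symbols, 1e12 hashes/s", 72, 10, 1e12),
    ("10 chars, 82 symbols, 1e12 hashes/s", 82, 10, 1e12),
]

for label, alphabet, length, rate in scenarios:
    print(f"{label}: {days_to_exhaust(alphabet, length, rate):.1f} days")
```

Each extra character multiplies the time by the alphabet size (62 here), while adding 10 or 20 symbols to the alphabet multiplies the 10-character time by only about 4.5 or 16, which is why length matters more than character variety.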
See, the problem with password hashing is that it only reduces time until your futile doom. If you want something really strong, but still as naive as stored hashed passwords are, go for PBKDF2.
Then there is still one more problem: will the user use this "strong" password you forced him to use on all his other sites? If he doesn't save them in a master password file, he most certainly will. And those other sites won't use the same strong hashing algorithms you use, defeating the purpose of your system. I can't really see why you want your hashes to be super strong if it isn't to stop users from using the same password on multiple sites; if an attacker has access to your hashes, you most likely already lost.
On the other hand, like I will repeat and repeat again to people asking questions about how "secure" their hashing scheme is, just use client certificates, and all your problems are solved. It becomes impossible for users to use the same credentials on multiple sites, attackers cannot break your credentials without modifying them, users cannot easily have their credentials stolen if they store them on a smart card, etc etc.
To naively answer your question: a strong password is only backed by a strong hashing algorithm.
With the sole exception of a requirement for a long string, every constraint reduces the size of password phase space. Constraints cause a decrease in complexity, not an increase.
If this seems counterintuitive, consider that you are providing a bunch of reliable assumptions for the cracker. Let me illustrate this point with a true story from my misspent youth:
In the early days of twin-primes encryption processors were so slow that people tended to use int32 arithmetic for speed. This allowed me to assume the primes were between 0 and four billion. People always picked large primes because conventional wisdom held that bigger was better. So I used a pre-computed dictionary of primes and worked down from a known ceiling, knowing that people generally chose keys close to that ceiling. This generally allowed me to crack their key in about 30 seconds.
Insist on long pass phrases, and use salting, with no other constraints.
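The claim above that composition rules shrink the password space can be checked by counting. The sketch below assumes a hypothetical rule ("at least one lowercase letter, one uppercase letter and one digit") over a 62-character alphabet and uses inclusion-exclusion; the rule and the names are illustrative, not taken from the answer.

```python
LOWER, UPPER, DIGIT = 26, 26, 10
ALL = LOWER + UPPER + DIGIT

def all_passwords(length: int) -> int:
    """Number of passwords of this length with no composition rule."""
    return ALL ** length

def passwords_with_every_class(length: int) -> int:
    """Passwords containing at least one lowercase, one uppercase and one digit."""
    missing_one_class = ((UPPER + DIGIT) ** length      # no lowercase
                         + (LOWER + DIGIT) ** length    # no uppercase
                         + (LOWER + UPPER) ** length)   # no digit
    missing_two_classes = LOWER ** length + UPPER ** length + DIGIT ** length
    return all_passwords(length) - missing_one_class + missing_two_classes

if __name__ == "__main__":
    n = 8
    allowed = passwords_with_every_class(n)
    total = all_passwords(n)
    print(f"{allowed} of {total} passwords remain ({allowed / total:.1%})")
```

Under this rule an attacker can skip every candidate that lacks one of the three classes, which is exactly the kind of reliable assumption the answer warns about.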
When people say "sophisticated" techniques they often mean complicated. A transformation can be very complicated and yet be commutative, and unfortunately if you don't know what that means then you're not in a position to assess the merits of the technique. Complexity of algorithm lends only security by obscurity, which is a bit like getting a house out of town and not locking the doors.
Use sophisticated techniques like salthash to keep your users' private information safe.
But don't obstruct your users. Offer suggestions, but don't get in their way.
It's up to your users to pick good passwords. It's up to you to suggest how to pick good passwords, and to accept any password given and keep the user's information as safe as the password given permits.
Both salted passwords and complex passwords are necessary for real security. Typically rainbow tables aren't created on the fly to attack a specific site, but are rather precomputed. It would be far more efficient to simply brute force a password than to generate a look up table based on a particular salt value.
That being said, the purpose of a hash is to ensure that an attacker can't recover a password if your database is compromised. It does nothing to prevent an attacker from guessing an easy password.
Requiring password complexity is really a matter of the type of site and the type of data you are protecting. It will annoy some users, and cause others to write their password on a Post-it and stick it to their monitor. I'd say it is absolutely essential to use a strong hash and salt on your end; neglecting to do so exposes not only your site, but completely compromises every user who recycles username/password combinations.
So in my opinion salting is mandatory regardless of the security level of your site. Enforced password complexity is good for high security sites - but is definitely more situational. It won't guarantee good security practices on the part of your users. I'll also add that requiring a secure password for a site that doesn't require it can do more harm than good as it is more likely that a user will recycle a high-security password that they use on other more essential sites.