Identifying password similarity without storing in plain text? - hash

One of my SaaS software vendors requires me to change passwords every 90 days, which is good.
What surprises me, though, is that the password change screen errors with a note that my new password is too similar to an old password.
This most often happens if I change fewer than three or four of the characters within a password.
If it were an exact match to an old password, I would have confidence that they are hashing my password, and comparing the hashes. The "similarity" matching makes me think they are storing and comparing the plaintext versions.
Is it possible to determine "similarity" by comparing one hash to another, or is this vendor more likely storing my password in plain-text?

It's possible. Whenever you change the password, the software could create hash codes for all combinations of the same password with a few characters masked or removed.
If your password is hello, it could create hash codes for _ello, h_llo, he_lo, hel_o, hell_, __llo, _e_lo, _ell_, he_l_, he__o... etc.
The next time you change the password, it can create the same set of combinations of that password, and compare to all the previous hash codes. If there is a match, only a few characters were changed.
It's a lot simpler to just save the passwords in plain text, of course.
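The masking idea can be sketched in Python as follows. Several details are assumptions that the answer does not specify. Salted SHA-256 stands in for whatever hash the vendor would use; a real system would want a slow password hash, and every stored variant is easier to crack than the full password. Only substituted characters are detected, not inserted or removed ones. The mask character `_` is assumed never to appear in real passwords. All function names are hypothetical.

```python
import hashlib
import itertools
import os

MASK = "_"  # assumption: never appears in a real password


def masked_variants(password: str, max_masked: int = 2) -> set[str]:
    """Every copy of `password` with 1..max_masked positions replaced by MASK."""
    variants = set()
    for k in range(1, min(max_masked, len(password)) + 1):
        for positions in itertools.combinations(range(len(password)), k):
            chars = list(password)
            for p in positions:
                chars[p] = MASK
            variants.add("".join(chars))
    return variants


def variant_hashes(password: str, salt: bytes, max_masked: int = 2) -> set[str]:
    """Salted SHA-256 of each masked variant; these are what get stored."""
    return {
        hashlib.sha256(salt + v.encode()).hexdigest()
        for v in masked_variants(password, max_masked)
    }


def too_similar(new_password: str, stored: set[str], salt: bytes,
                max_masked: int = 2) -> bool:
    """True if masking at most `max_masked` positions makes old and new equal."""
    return not variant_hashes(new_password, salt, max_masked).isdisjoint(stored)


if __name__ == "__main__":
    salt = os.urandom(16)  # stored per user and reused for later comparisons
    history = variant_hashes("hello", salt)
    print(too_similar("hxllo", history, salt))  # True
    print(too_similar("world", history, salt))  # False
```

The number of stored hashes grows quickly with `max_masked` (15 variants for a 5-character password at 2 masks), which is one reason the plain-text shortcut is tempting.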

This depends whether they are checking all old passwords, or just your last one.
The last one will be available in memory if you had to enter your old password in order to set a new one. A form usually asks for three inputs: old password, new password and confirm new password.
If they are storing your last few passwords in hashed form, they would be able to check these for an exact match, and they could check your previous password for similarity by running an algorithm on the old password that you just re-entered.
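The answer does not say which similarity algorithm would be used. As one common choice, here is a Python sketch using Levenshtein (edit) distance between the old password, which the form has just received in plain text, and the new one. The threshold of 3 edits is an assumption loosely based on the question's "three or four characters", and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def too_similar_to_previous(old: str, new: str, min_distance: int = 3) -> bool:
    """Reject the new password if fewer than `min_distance` edits separate it from the old one."""
    return levenshtein(old, new) < min_distance


print(too_similar_to_previous("Summer2023!", "Summer2024!"))  # True
```

This only covers the immediately previous password, since that is the only one available in plain text at change time.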

In all likelihood they are storing the plain text. With a good hashing algorithm there should be no correlation between the original content and the hash value (that is what makes it good).
It is possible they are storing some characteristics of the original password to use as a reference, for example the counts of characters, any numeric value, etc., and then comparing against those, but I doubt it.
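To illustrate the "no correlation" point, this Python snippet (SHA-256 chosen as an example) hashes two passwords that differ in a single character and counts how many of the 256 output bits differ. For a good hash you would expect roughly half of them to flip, so the hashes carry no usable similarity signal. The helper names are hypothetical.

```python
import hashlib


def sha256_int(text: str) -> int:
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Number of the 256 SHA-256 output bits that differ between a and b."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")


print(differing_bits("password1", "password2"))
```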

One way to do this is by reducing the space of the password.
For example, if you think that "Hello" and "h3LL0" are similar, then you can make a reduce() function that changes the string to uppercase and changes all vowels and digits to #. Both "Hello" and "h3LL0" will be reduced to "H#LL#".
In the database you need to store hash() of the current password and hash(reduce()) of the current and all previous passwords.
You can design any policy of similarity you want, as long as you can make a suitable reduce() function.
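A Python sketch of this scheme. The reduce() rule is the answer's own example (uppercase, then vowels and digits become #). Using PBKDF2 from hashlib with a per-user salt as hash(), and the PasswordHistory class, are assumptions for illustration. Keep in mind that hash(reduce()) values come from a much smaller space than full passwords, so they are easier to brute-force.

```python
import hashlib
import hmac
import os

VOWELS = set("AEIOU")


def reduce(password: str) -> str:
    """Uppercase, then map every vowel and digit to '#': 'Hello' -> 'H#LL#'."""
    return "".join("#" if c in VOWELS or c.isdigit() else c
                   for c in password.upper())


def slow_hash(value: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", value.encode(), salt, 100_000)


class PasswordHistory:
    """Stores hash() of the current password and hash(reduce()) of all of them."""

    def __init__(self) -> None:
        self.salt = os.urandom(16)
        self.current_hash = None  # bytes once a password has been set
        self.reduced_hashes = []  # one entry per password ever set

    def change_password(self, new_password: str) -> None:
        reduced = slow_hash(reduce(new_password), self.salt)
        if any(hmac.compare_digest(reduced, old) for old in self.reduced_hashes):
            raise ValueError("new password is too similar to an old one")
        self.current_hash = slow_hash(new_password, self.salt)
        self.reduced_hashes.append(reduced)

    def verify(self, password: str) -> bool:
        return self.current_hash is not None and hmac.compare_digest(
            slow_hash(password, self.salt), self.current_hash)
```

With this policy, setting "Hello" and later trying "h3LL0" raises ValueError, because both reduce to "H#LL#".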

Related

How do I store Argon2 passwords in my database?

I'm trying to store user passwords in my DB using Argon2 algorithm.
This is what I obtain by using it:
$ echo -n "password" | argon2 "smallsalt" -id -t 4 -m 18 -p 4
Type: Argon2id
Iterations: 4
Memory: 262144 KiB
Parallelism: 4
Hash: cb4447d91dd62b085a555e13ebcc6f04f4c666388606b2c401ddf803055f54ac
Encoded: $argon2id$v=19$m=262144,t=4,p=4$c21hbGxzYWx0$y0RH2R3WKwhaVV4T68xvBPTGZjiGBrLEAd34AwVfVKw
1.486 seconds
Verification ok
In this case, what should I store in the DB?
The "encoded" value as shown above?
The "hash" value as shown above?
Neither, but another solution?
Please, could you help me? I'm a newbie with this and I'm a little bit lost.
I'm a bit late to the party, but I disagree with the previous answers.
You should store the field: Encoded
The $argon2id$.... value.
(At least if you are using a normal Argon2 library that has a verify() function.
It does not look like the man page for the argon2 command offers this, however.
Only if you are stuck with the command line should you consider storing each field individually.)
The $argon2id$ encoded hash
The argon2 encoded hash follows the same syntax as its older cousin, bcrypt.
The encoded hash includes all you ever need to verify the hash when the user logs in.
It is most likely more future-proof. When a newer and better Argon2 comes along, you can upgrade your single-column hashed passwords, just as you could detect bcrypt's $2a$ hashes and re-hash them as $argon2id$ hashes the next time the user logs in (if you were moving from bcrypt to Argon2).
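A Python sketch of that upgrade-on-login idea. It relies only on the scheme identifier sitting between the first two $ signs ($2a$, $2b$, $argon2id$). The verify and hash_preferred callbacks are hypothetical stand-ins for whatever bcrypt and Argon2 libraries you actually use.

```python
from typing import Callable

PREFERRED_SCHEME = "argon2id"


def scheme_of(encoded_hash: str) -> str:
    """Identifier between the first two '$' signs: '2a', '2b', 'argon2id', ..."""
    if not encoded_hash.startswith("$"):
        raise ValueError("not a $-encoded hash")
    return encoded_hash.split("$")[1]


def login(password: str, stored: str,
          verify: Callable[[str, str], bool],
          hash_preferred: Callable[[str], str]) -> tuple[bool, str]:
    """Return (ok, value_to_store). Re-hashing happens only after a successful
    verify, because that is the only moment the plaintext password is known."""
    if not verify(password, stored):
        return False, stored
    if scheme_of(stored) != PREFERRED_SCHEME:
        return True, hash_preferred(password)
    return True, stored
```

If you use the argon2-cffi package, its PasswordHasher also offers check_needs_rehash() for upgrading parameters within Argon2 itself.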
TL;DR
Store the $-encoded value encoded_hash in your database.
Use argon2.verify(password, encoded_hash) to verify that the password is correct.
Don't bother about all the values inside the hash. Let the library do that for you. :)
Neither. Save the following as a single value:
algorithm ID (e.g. argon2id)
salt
number of iterations (4)
memory usage factor (18)
parallelism (4)
The output of the field "encoded" is misleading because you cannot use it as is for password checking with the command line (i.e. for hash generation): e.g. it contains m=262144 (the memory in KiB, i.e. 2^18), whereas the argon2 command needs the original factor m=18.
Are you going to launch an OS process each time you check a password? I would discourage you from doing this. I'd suggest you use a library (C++, Java, ...). Libraries produce a string that contains all of this data concatenated and separated with "$".
I'd put the type, iterations, memory, parallelism, hash, salt and corresponding user id into separate columns and leave the encoded bit out, because it's just all the attributes joined together. If they're in separate columns then you can reference the attributes more easily than having to split and index the encoded string.
The other option is to just store the encoded string in one column, but as I said, it's more tedious to look at certain attributes, as you'd have to split the encoded string and then index into it.
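If you do go with separate columns, splitting the encoded string takes only a few lines. This Python sketch (function and key names are hypothetical) parses the value from the question using only the standard library. It assumes the six-part `$argon2id$v=19$...` layout shown above, where the salt and hash are standard base64 with the trailing padding stripped. Note that m is stored in KiB, so m=262144 is 2^18 KiB, the value the CLI received as -m 18.

```python
import base64

# The encoded value printed by the argon2 CLI in the question.
ENCODED = ("$argon2id$v=19$m=262144,t=4,p=4$c21hbGxzYWx0"
           "$y0RH2R3WKwhaVV4T68xvBPTGZjiGBrLEAd34AwVfVKw")


def b64_unpadded(s: str) -> bytes:
    """Decode standard base64 whose trailing '=' padding was stripped."""
    return base64.b64decode(s + "=" * (-len(s) % 4))


def parse_argon2_encoded(encoded: str) -> dict:
    """Split an encoded Argon2 hash into values suitable for separate columns."""
    _, variant, version, params, salt_b64, hash_b64 = encoded.split("$")
    fields = dict(kv.split("=", 1) for kv in params.split(","))
    return {
        "type": variant,
        "version": int(version.split("=", 1)[1]),
        "memory_kib": int(fields["m"]),
        "iterations": int(fields["t"]),
        "parallelism": int(fields["p"]),
        "salt": b64_unpadded(salt_b64),
        "hash": b64_unpadded(hash_b64),
    }


row = parse_argon2_encoded(ENCODED)
print(row["salt"])        # b'smallsalt'
print(row["hash"].hex())  # the raw hash, as on the CLI's "Hash:" line
```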
I had the same question and read this post while gathering some information. Now after some days and thoughts about all this, I'll personally take a different route than the accepted answer and therefore slightly disagree with it. I thought I would share my perspective so that it might help others as well.
I suppose it will depend on everyone's context. I don't think there is a one size fits all answer here. I'm sure there are situations where it is perfectly valid and even better/simpler to store the encoded string ($argon2...).
However, I would argue that depending on the context, storing the encoded string doesn't seem to be the right approach.
First of all, it makes the hashing method very obvious. This is probably not that important, but for some reason it makes me a bit more comfortable not having it ^^. More importantly, though, it means that implementation details are stored in your persistence layer (DB or otherwise). At the time of writing, argon2id is the hashing mechanism recommended by OWASP, but these things can change (and eventually do...). Some day it might be considered insecure, or another function will be considered more secure.
As a result, I would suggest this more function-agnostic starting point:
The hash (for argon2 -> the hex string)
The salt
The last_modified date
A string with hashing parameters (for argon2, you could put the parameters here in the form of your choosing)
The last_modified date lets you know whether the hash needs updating, and the parameters string supports the verification and update of "old" hashes.
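A Python sketch of this layout. Because Argon2 is not in Python's standard library, PBKDF2 from hashlib stands in for it here; the `alg=...;iterations=...` parameter string, the StoredPassword dataclass and the helper names are all hypothetical. The point is that the stored parameters, not the code's current defaults, drive verification, so old rows keep working while needs_rehash() flags them for an upgrade.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical parameter-string format; PBKDF2 stands in for Argon2 here.
CURRENT_PARAMS = "alg=pbkdf2_sha256;iterations=200000"


def parse_params(params: str) -> dict:
    return dict(item.split("=", 1) for item in params.split(";"))


def derive(password: str, salt: bytes, params: str) -> bytes:
    """Dispatch on the stored parameters so old records stay verifiable."""
    p = parse_params(params)
    if p["alg"] != "pbkdf2_sha256":
        raise ValueError(f"unsupported algorithm: {p['alg']}")
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt,
                               int(p["iterations"]))


@dataclass
class StoredPassword:
    hash_hex: str            # the hash
    salt: bytes              # the salt
    last_modified: datetime  # when the hash was (re)computed
    params: str              # hashing parameters


def create(password: str, params: str = CURRENT_PARAMS) -> StoredPassword:
    salt = os.urandom(16)
    return StoredPassword(derive(password, salt, params).hex(), salt,
                          datetime.now(timezone.utc), params)


def verify(record: StoredPassword, password: str) -> bool:
    candidate = derive(password, record.salt, record.params).hex()
    return hmac.compare_digest(candidate, record.hash_hex)


def needs_rehash(record: StoredPassword) -> bool:
    return record.params != CURRENT_PARAMS
```

On a successful login where needs_rehash(record) is true, you would call create() with the password you just verified and overwrite the row.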
Of course, this means you have to do a bit more work in the code and can't simply use every library's shortcuts straight away. But I would say this increased complexity offers more flexibility in other circumstances (like moving away from a given hashing function). As always, there is no free lunch.
That's why I suppose it depends on your context and why personally I wouldn't go with the accepted answer in my situation.
PS: I'm no cryptography expert nor a DevSecOps guru. So feel free to contradict, enrich, agree or disagree. I just like to keep my implementation details contained ;)

Reason for salting a password for webservice

I have a very basic question related to user management, in particular storing hashed passwords.
I read a few pages (like https://wiki.python.org/moin/Md5Passwords ).
The way I understand hashing is this:
password provided by user is hashed (with whatever function) one way.
nobody (including user/admin) is able to see the password.
when user logs in - the string provided by him is hashed to see if it matches stored hashed password.
That's all clear; however, I am not sure what the 'salt' in hashing is for.
I read that os.urandom (Python) is good for creating a salt:
https://crackstation.net/hashing-security.htm
What I am not sure is how to work with this added "salt"
If I hash the user's password with a salt, and hashing is one-way, then the next time the user logs in he knows only the password and not the salt. From this I assume that the salt generated for this user needs to be stored somewhere; otherwise it would not make sense. But on the other hand, if somebody gets access to the DB, they will see the salt and the hashed password. In that case the salt does not add much value (it's pretty much the same as hashing the bare password). So maybe the salt is just there to provide protection on the front end (against brute force).
Can somebody give me a hint on how to work with salt? Is my understanding correct? Do I need to store the salt somewhere?
Before I posted this question I found this:
Should the Salt for a password Hash be "hashed" also?
what is the added value of the salt?
If I write a web service, I can block each login after 3 failed attempts.
Nobody on the front end is able to see the hashed values. Nobody can use brute force (at most this is a DoS, since 3 failed logins will block the user). The hacker would need access to the DB to see the hashed passwords. But if he has that, he will also see the salt.
Salt is used to prevent a hacker from reversing the password hashes into passwords. So here we assume that somehow the hacker has access to the database.
Without salt
Let us first assume the scenario without salt. In that case the table looks like:
user | md5 password (first 6 chars)
-------------------------------
1 | 1932ff
2 | d3b073
(here we make the situation simpler than it is in reality)
The hacker of course wants to know what the passwords behind d3b073 and 1932ff are. A hash function is one directional in the sense that we can hash a password very fast, but unhashing it will - given it is a good hashing function - take a very long time, after guessing a huge amount of passwords.
So there is not much hope to easily retrieve the possible password(s) behind d3b073. But we can easily find a list of the 100'000 most popular passwords, and calculate the MD5 hash of all these passwords. Such list could look like:
password | md5 (first 6 characters)
--------------------------------------------
foo | d3b073
bar | c157a7
So apparently user 2 has used foo as password. The password of user 1 is unknown to us (but we know it is not foo or bar).
Now the point is that we can construct such a table once and then use it to crack the passwords of all users. Constructing such a table for 100'000 passwords takes some time (for a fast hash like MD5, very little), but after that we can easily retrieve all passwords. So a hacker can construct (or download) such a table (there are more efficient ways, for instance rainbow tables), and then use it each time he/she hacks a website to obtain the passwords of all users.
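A small Python illustration of the precomputed-table attack described above. The five-word list is a hypothetical stand-in for the 100'000 popular passwords, and full MD5 hex digests are used instead of the 6-character prefixes in the tables. Users 2 and 3 also end up with identical stored hashes, which on its own reveals that they chose the same password.

```python
import hashlib


def md5_hex(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()


# Hypothetical stand-in for "the 100'000 most popular passwords".
POPULAR_PASSWORDS = ["123456", "password", "qwerty", "foo", "bar"]

# Built once, then reusable against every unsalted database.
LOOKUP_TABLE = {md5_hex(p): p for p in POPULAR_PASSWORDS}


def crack_unsalted(db: dict) -> dict:
    """Map user id -> recovered password for every hash found in the table."""
    return {user: LOOKUP_TABLE[h] for user, h in db.items() if h in LOOKUP_TABLE}


leaked = {1: md5_hex("x7#Rq!9v"), 2: md5_hex("foo"), 3: md5_hex("foo")}
print(crack_unsalted(leaked))  # {2: 'foo', 3: 'foo'}
```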
With salt
If we however use salting, the table could look like this:
user | salt | hashed password
-------------------------------
1 | a91f40 | 1a604e
2 | c2a67c | b36232
So here if the password of user 2 is foo, then we calculate the hash of fooc2a67c (or we use another way to combine the salt and the password) and store this into the database.
The point is that it is very hard to guess the password, since b36232 is not the hash of foo, but of fooc2a67c and the salt is typically something (pseudo)-random. We can of course again construct the most popular 100'000 passwords with salt c2a67c appended to it, but since we can not know the salt in advance, we can not create this table only once. Even if we are lucky and already constructed the table for salt c2a67c, it will not help us with hacking the password of user 1, since user 1 has a different salt.
So the only way to resolve this is by constructing a reverse hash lookup table for every user. Since it is usually expensive to construct such a table even once, it will not be easy to compute one for every user.
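To answer the question's practical part, here is a Python sketch of working with a salt: generate a random one per user with os.urandom, store it in the database next to the hash, and read it back at login. It mirrors the answer's hash(password + salt) with MD5 only to match the tables; for real password storage a deliberately slow function (for example hashlib.pbkdf2_hmac or hashlib.scrypt, bcrypt, or Argon2) is the usual choice. The function names are hypothetical.

```python
import hashlib
import hmac
import os


def salted_md5(password: str, salt: str) -> str:
    # Same combination as the answer: hash(password + salt), e.g. "foo" + "c2a67c".
    return hashlib.md5((password + salt).encode()).hexdigest()


def register(password: str) -> tuple[str, str]:
    salt = os.urandom(8).hex()                # fresh random salt per user
    return salt, salted_md5(password, salt)   # store BOTH columns in the DB


def check(password: str, salt: str, stored_hash: str) -> bool:
    return hmac.compare_digest(salted_md5(password, salt), stored_hash)


salt_a, hash_a = register("foo")
salt_b, hash_b = register("foo")
print(hash_a != hash_b)              # same password, different stored hashes
print(check("foo", salt_a, hash_a))  # True
```

The salt is not secret; its job is to make a single precomputed table useless across users, which is exactly what the table above shows.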
We might of course decide to calculate all hashes of all possible salts, like for instance:
password | md5 (first 6 characters)
---------------------------------------------
foo000000 | 367390
foo000001 | eca8ea
foo000002 | 6eb7bf
foo000003 | 7906b1
foo000004 | 0e9f0c
foo000005 | 0bfb11
... | ...
But as you can see, such a table would grow to gigantic sizes. Furthermore, computing it could take thousands of years. Even if we add only one hexadecimal character as salt, the size of the table grows 16 times. Yes, there are some techniques to reduce the time and space needed for such a table, but by increasing the "password space", hacking passwords definitely becomes much harder. Furthermore, a salt is usually a significant number of characters (or bytes) long, making it far harder than just 16 times.
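To put rough numbers on the table-size argument: a precomputed table needs one entry per (password, salt) pair, so its size is the number of candidate passwords |W| times the number of possible salts |S|. The 100'000-password list and the 6-hex-character salt are the answer's examples; the 16-byte salt is an assumption matching a typical os.urandom(16) call.

```latex
\begin{aligned}
N &= |W| \cdot |S| \\
|W| = 10^{5},\ |S| = 16^{6}:\quad N &= 10^{5} \cdot 16\,777\,216 \approx 1.68 \times 10^{12} \\
|W| = 10^{5},\ |S| = 256^{16} = 2^{128}:\quad N &\approx 10^{5} \cdot 3.4 \times 10^{38} \approx 3.4 \times 10^{43}
\end{aligned}
```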
Basically salt acts as a way to enlarge the password space. Even if you enter the very same password on two websites, the personal salt of the websites will (close to certainty) be unique, and therefore the hash will be unique as well.

Is it really necessary to restrict user to input password with only certain character

I have seen many websites with their own password validation rules. Some say don't use *&^%, others say it should be between 8 and 12 characters, etc. Is this really necessary?
I mean, the password field should not be validated at all! What if I have a 3-character password in mind and it's impossible to guess?
Otherwise, there should be a standard password validation so that users' expectations are the same for every website, rather than having to think about a new website's password rules every time they register.
It is good standard practice to require passwords with multiple character types. The longer and more complex a password is, the harder it will be for a script to crack it. A three-character password can take a matter of seconds to minutes to crack, whereas one of eight to twelve characters (mixing letters, numbers, and special characters) can take upwards of years to crack. In the end, it is up to you how secure you want your content to be.
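The arithmetic behind this answer: an exhaustive search must try up to alphabet_size ** length candidates, so length dominates. This Python sketch uses the 94 printable ASCII characters excluding space; the guessing rate is a hypothetical figure for an offline attack on a fast hash, and online attacks with lockouts are a different picture. Even a 3-character password that is "impossible to guess" has only 94 ** 3 = 830,584 candidates to try.

```python
import string

# 52 letters, 10 digits, 32 punctuation characters (space excluded).
PRINTABLE = len(string.ascii_letters + string.digits + string.punctuation)

# Hypothetical offline guessing rate against a fast hash; real rates vary widely.
GUESSES_PER_SECOND = 10**10


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of exactly `length` characters."""
    return alphabet_size ** length


def worst_case_seconds(alphabet_size: int, length: int) -> float:
    return keyspace(alphabet_size, length) / GUESSES_PER_SECOND


for length in (3, 8, 12):
    print(length, keyspace(PRINTABLE, length), worst_case_seconds(PRINTABLE, length))
```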

Pretty URLs with hashes (md5)

In our web application we display a list of pulses, but for linking and such we make every pulse individually addressable. In our CouchDB we give every pulse a unique id by MD5-hashing its unique attributes, e.g.: www.foo.com/bar/
However, these MD5 sums are long and make for ugly URLs. Is there another way to hash the attributes that requires fewer characters but still guarantees uniqueness?
Thanks a lot
Instead of creating an ugly MD5, you could generate a random string of a given length from a chosen set of characters and store it alongside the MD5, then use that 'pretty URL' string to retrieve the data from the database. One thing to think about is taking the vowels out of the allowed characters, since with them you could end up with bad words :) Also, make sure the string does not already exist in the database, of course, and if it does, just create another one... that won't happen very often, though.
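A Python sketch of the random-slug approach, using the standard secrets module. The alphabet (lowercase letters and digits minus vowels, 31 characters, so 31^8 ≈ 8.5 × 10^11 possible 8-character slugs), the default length, and the in-memory set standing in for the database check are all assumptions. In a real database, a unique index plus retry on conflict avoids the race between checking and inserting.

```python
import secrets
import string

# Lowercase letters and digits with the vowels removed, so slugs rarely spell words.
ALPHABET = "".join(c for c in string.ascii_lowercase + string.digits
                   if c not in "aeiou")


def new_slug(existing: set, length: int = 8) -> str:
    """Random slug not yet in `existing` (a stand-in for a DB uniqueness check)."""
    while True:
        slug = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if slug not in existing:
            existing.add(slug)
            return slug


taken = set()
print(new_slug(taken))
```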

Am I misunderstanding what a hash salt is?

I am working on adding hash digest generating functionality to our code base. I wanted to use a String as a hash salt so that a pre-known key/passphrase could be prepended to whatever it was that needed to be hashed. Am I misunderstanding this concept?
A salt is a random element which is added to the input of a cryptographic function, with the goal of impacting the processing and output in a distinct way upon each invocation. The salt, as opposed to a "key", is not meant to be confidential.
One century ago, cryptographic methods for encryption or authentication were "secret". Then, with the advent of computers, people realized that keeping a method completely secret was difficult, because this meant keeping software itself confidential. Something which is regularly written to a disk, or incarnated as some dedicated hardware, has trouble being kept confidential. So the researchers split the "method" into two distinct concepts: the algorithm (which is public and becomes software and hardware) and the key (a parameter to the algorithm, present in volatile RAM only during processing). The key concentrates the secret and is pure data. When the key is stored in the brain of a human being, it is often called a "password" because humans are better at memorizing words than bits.
Then the key itself was split later on. It turned out that, for proper cryptographic security, we needed two things: a confidential parameter, and a variable parameter. Basically, reusing the same key for distinct usages tends to create trouble; it often leaks information. In some cases (especially stream ciphers, but also for hashing passwords), it leaks too much and leads to successful attacks. So there is often a need for variability, something which changes every time the cryptographic method runs. Now the good part is that most of the time, variability and secret need not be merged. That is, we can separate the confidential from the variable. So the key was split into:
the secret key, often called "the key";
a variable element, usually chosen at random, and called "salt" or "IV" (as "Initial Value") depending on the algorithm type.
Only the key needs to be secret. The variable element needs to be known by all involved parties but it can be public. This is a blessing because sharing a secret key is difficult; systems used to distribute such a secret would find it expensive to accommodate a variable part which changes every time the algorithm runs.
In the context of storing hashed passwords, the explanation above becomes the following:
"Reusing the key" means that two users happen to choose the same password. If passwords are simply hashed, then both users will get the same hash value, and this will show. Here is the leakage.
Similarly, without a salt, an attacker could use precomputed tables for fast lookup; he could also attack thousands of passwords in parallel. This still uses the same leak, only in a way which demonstrates why this leak is bad.
Salting means adding some variable data to the hash function input. That variable data is the salt. The point of the salt is that two distinct users should use, as much as possible, distinct salts. But password verifiers need to be able to recompute the same hash from the password, hence they must have access to the salt.
Since the salt must be accessible to verifiers but need not be secret, it is customary to store the salt value along with the hash value. For instance, on a Linux system, I may use this command:
openssl passwd -1 -salt "zap" "blah"
This computes a hashed password, with the hash function MD5, suitable for usage in the /etc/passwd or /etc/shadow file, for the password "blah" and the salt "zap" (here, I choose the salt explicitly, but under practical conditions it should be selected randomly). The output is then:
$1$zap$t3KZajBWMA7dVxwut6y921
in which the dollar signs serve as separators. The initial "1" identifies the hashing method (MD5). The salt is in there, in cleartext notation. The last part is the hash function output.
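The same "store the salt next to the hash" idea in Python, as a sketch: a random salt is drawn per password and kept in cleartext inside one $-separated string. PBKDF2 from hashlib replaces MD5-crypt here, and the `$pbkdf2-sha256$iterations$salt$hash` layout is a hypothetical format for illustration, not the one openssl or glibc produces. Because every call draws a fresh salt, two users with the same password get different strings, which removes the leak described above.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000


def hash_password(password: str) -> str:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Hypothetical format, not a standard: $id$iterations$salt$hash
    return f"$pbkdf2-sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    _, scheme, iterations, salt_hex, hash_hex = stored.split("$")
    if scheme != "pbkdf2-sha256":
        raise ValueError(f"unknown scheme {scheme!r}")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                 bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(digest.hex(), hash_hex)


record = hash_password("blah")
print(record)
print(verify_password("blah", record))  # True
```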
There is a specification (somewhere) on how the salt and password are sent as input to the hash function (at least in the glibc source code, possibly elsewhere).
Edit: in a "login-and-password" user authentication system, the "login" could act as a passable salt (two distinct users will have distinct logins) but this does not capture the situation of a given user changing his password (whether the new password is identical to an older password will leak).
You are understanding the concept perfectly. Just make sure the prepended salt is repeatable each and every time.
If I'm understanding you correctly, it sounds like you've got it right. The pseudocode for the process looks something like:
string saltedValue = plainTextValue + saltString;
// or: string saltedValue = saltString + plainTextValue;
Hash(saltedValue);
The Salt just adds another level of complexity for people trying to get at your information.
And it's even better if the salt is different for each hashed phrase, since each salt requires its own rainbow table.
It's worth mentioning that even though the salt should be different for each password usage, your salt should in NO WAY be computed FROM the password itself! Doing so has the practical upshot of completely invalidating your security.
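A short Python demonstration of why a password-derived salt defeats the purpose. The "salt" below is a fixed function of the password, so identical passwords still produce identical records, and an attacker can precompute a table over it exactly as if there were no salt. A random salt does not have this problem. SHA-256 and the helper names are illustrative choices only.

```python
import hashlib
import os


def _hash(salt: str, password: str) -> str:
    return hashlib.sha256((salt + password).encode()).hexdigest()


def store_with_derived_salt(password: str) -> tuple[str, str]:
    """Anti-pattern: the 'salt' is computed from the password itself."""
    salt = hashlib.sha256(password.encode()).hexdigest()[:16]
    return salt, _hash(salt, password)


def store_with_random_salt(password: str) -> tuple[str, str]:
    salt = os.urandom(8).hex()
    return salt, _hash(salt, password)


# Two users who both pick "hunter2":
print(store_with_derived_salt("hunter2") == store_with_derived_salt("hunter2"))  # True
print(store_with_random_salt("hunter2") == store_with_random_salt("hunter2"))
```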