Different output, same username and password - .htpasswd

I was wondering why, even for the same username and the same password, htpasswd outputs a new hash every time. I tried finding an answer to this question, but couldn't.

The passwords generated by "htpasswd" use a random salt, to make it harder to guess. It also means that pre-crypted dictionaries for attacks have to be much larger since they have to crypt every possible password with every possible salt.
htpasswd uses crypt(3) behind the scenes.
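The effect of a random salt can be sketched in a few lines of Python. This is not htpasswd's actual algorithm: SHA-256 over salt + password is used here only as a stand-in, and the function names are made up for illustration.

```python
import hashlib
import os

def salted_hash(password: str, salt: bytes) -> str:
    # Stand-in for the real htpasswd/crypt(3) schemes: SHA-256 over salt + password.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

def make_entry(password: str) -> tuple[bytes, str]:
    # A fresh random salt is drawn for every entry and stored next to the digest.
    salt = os.urandom(8)
    return salt, salted_hash(password, salt)

def check(password: str, entry: tuple[bytes, str]) -> bool:
    # Verification reuses the stored salt, so the same password matches again.
    salt, stored = entry
    return salted_hash(password, salt) == stored

first = make_entry("secret")
second = make_entry("secret")
print(first[1] == second[1])                            # almost certainly False: different salts
print(check("secret", first), check("secret", second))  # both entries still verify
```

Both entries verify the same password even though the stored strings differ, which is what you see when running htpasswd twice.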

Here is a tip for you: when generating secret keys or strings, use one_way_hash(salt + current time). These are hard, if not impossible, to crack. I normally use this to create tokens or session keys.

Related

decode SHA1 knowing stored value [duplicate]

Is it possible to reverse a SHA-1?
I'm thinking about using a SHA-1 to create a simple lightweight system to authenticate a small embedded system that communicates over an unencrypted connection.
Let's say that I create a sha1 like this with input from a "secret key" and spice it with a timestamp so that the SHA-1 will change all the time.
sha1("My Secret Key"+"a timestamp")
Then I include this SHA-1 in the communication and the server, which can do the same calculation. And hopefully, nobody would be able to figure out the "secret key".
But is this really true?
If you know that this is how I did it, you would know that I did put a timestamp in there and you would see the SHA-1.
Can you then use those two and figure out the "secret key"?
secret_key = bruteforce_sha1(sha1, timestamp)
Note1:
I guess you could brute force in some way, but how much work would that actually be?
Note2:
I don't plan to encrypt any data, I just would like to know who sent it.
No, you cannot reverse SHA-1, that is exactly why it is called a Secure Hash Algorithm.
What you should definitely be doing though, is include the message that is being transmitted into the hash calculation. Otherwise a man-in-the-middle could intercept the message, and use the signature (which only contains the sender's key and the timestamp) to attach it to a fake message (where it would still be valid).
And you should probably be using SHA-256 for new systems now.
sha("My Secret Key"+"a timestamp" + the whole message to be signed)
You also need to additionally transmit the timestamp in the clear, because otherwise you have no way to verify the digest (other than trying a lot of plausible timestamps).
Whether a brute-force attack is feasible depends on the length of your secret key.
The security of your whole system would rely on this shared secret (because both sender and receiver need to know it, but no one else). An attacker would try to go after the key (either by brute-force guessing or by trying to get it from your device) rather than trying to break SHA-1.
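A rough sketch of the points in this answer: the message is covered by the digest, and the timestamp travels in the clear so the receiver can recompute it. It uses HMAC-SHA-256 from Python's standard library instead of the bare sha(key + timestamp + message) concatenation, and the key, the 30-second window and the function names are assumptions for illustration.

```python
import hashlib
import hmac
import time

SECRET = b"My Secret Key"  # placeholder; a real key should be long and random
MAX_SKEW = 30              # seconds; hypothetical freshness window

def sign(message: bytes, timestamp: int) -> str:
    # The tag covers both the timestamp and the full message.
    data = str(timestamp).encode("ascii") + b"|" + message
    return hmac.new(SECRET, data, hashlib.sha256).hexdigest()

def verify(message: bytes, timestamp: int, tag: str, now: int) -> bool:
    # The timestamp is sent in the clear, so the receiver can recompute the tag
    # and additionally reject messages that are too old.
    fresh = abs(now - timestamp) <= MAX_SKEW
    return fresh and hmac.compare_digest(sign(message, timestamp), tag)

now = int(time.time())
tag = sign(b"temp=21.5", now)
print(verify(b"temp=21.5", now, tag, now))         # genuine message
print(verify(b"temp=99.9", now, tag, now))         # tag moved onto a fake message
print(verify(b"temp=21.5", now, tag, now + 3600))  # stale replay
```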
SHA-1 is a hash function that was designed to make it impractically difficult to reverse the operation. Such hash functions are often called one-way functions or cryptographic hash functions for this reason.
However, SHA-1's collision resistance was theoretically broken in 2005. The attack finds two different inputs that have the same hash value faster than the generic birthday attack, which costs about 2^80 operations for a 50% success probability. In 2017 the collision attack became practical; that attack is known as SHAttered.
As of 2015, NIST dropped SHA-1 for signatures. You should consider using something stronger like SHA-256 for new applications.
Jon Callas on SHA-1:
It's time to walk, but not run, to the fire exits. You don't see smoke, but the fire alarms have gone off.
The question is actually how to authenticate over an insecure session.
The standard way to do this is to use a message authentication code, e.g. HMAC.
You send the message plaintext as well as an accompanying hash of that message where your secret has been mixed in.
So instead of your:
sha1("My Secret Key"+"a timestamp")
You have:
msg,hmac("My Secret Key",sha(msg+msg_sequence_id))
The message sequence id is a simple counter to keep track by both parties to the number of messages they have exchanged in this 'session' - this prevents an attacker from simply replaying previous-seen messages.
This is the industry-standard and secure way of authenticating messages, whether they are encrypted or not.
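A minimal sketch of the message-plus-counter scheme, using Python's hmac module. For brevity it feeds the counter and message straight into HMAC rather than hashing them first, and the Receiver class and key are hypothetical.

```python
import hashlib
import hmac

KEY = b"My Secret Key"  # placeholder shared secret

def tag_for(msg: bytes, seq: int) -> bytes:
    # The fixed-width sequence number is authenticated together with the message.
    return hmac.new(KEY, seq.to_bytes(8, "big") + msg, hashlib.sha256).digest()

class Receiver:
    """Tracks the next expected sequence number to reject replays."""

    def __init__(self) -> None:
        self.expected_seq = 0

    def accept(self, msg: bytes, tag: bytes) -> bool:
        ok = hmac.compare_digest(tag_for(msg, self.expected_seq), tag)
        if ok:
            self.expected_seq += 1  # advance only after a valid message
        return ok

demo = Receiver()
first_tag = tag_for(b"hello", 0)
print(demo.accept(b"hello", first_tag))  # first delivery is accepted
print(demo.accept(b"hello", first_tag))  # replay of the same message is rejected
```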
(this is why you can't brute the hash:)
A hash is a many-to-one function, meaning that many different inputs produce the same output.
As you know the secret, and you can make a sensible guess as to the range of the timestamp, then you could iterate over all those timestamps, compute the hash and compare it.
Of course two or more timestamps within the range you examine might 'collide' i.e. although the timestamps are different, they generate the same hash.
So there is, fundamentally, no way to reverse the hash with any certainty.
In mathematical terms, only bijective functions have an inverse function. But hash functions are not injective as there are multiple input values that result in the same output value (collision).
So, no, hash functions can not be reversed. But you can look for such collisions.
Edit
As you want to authenticate the communication between your systems, I would suggest using HMAC. This construct for calculating message authentication codes can use different hash functions. You can use SHA-1, SHA-256 or whatever hash function you want.
And to authenticate the response to a specific request, I would send a nonce along with the request that needs to be used as salt to authenticate the response.
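The nonce idea could look roughly like this in Python. The key, the 16-byte nonce size and the function names are assumptions, not part of any particular protocol.

```python
import hashlib
import hmac
import secrets

KEY = b"shared secret"  # placeholder shared secret

def respond(nonce: bytes, body: bytes) -> bytes:
    # The responder binds its answer to the nonce received with the request.
    return hmac.new(KEY, nonce + body, hashlib.sha256).digest()

def check_response(nonce: bytes, body: bytes, tag: bytes) -> bool:
    # The requester recomputes the tag with the nonce it generated itself.
    return hmac.compare_digest(respond(nonce, body), tag)

nonce = secrets.token_bytes(16)  # fresh random value per request
body = b"status=ok"
tag = respond(nonce, body)
print(check_response(nonce, body, tag))                    # matches this request
print(check_response(secrets.token_bytes(16), body, tag))  # old response, new request
```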
It is not entirely true that you cannot reverse SHA-1 encrypted string.
You cannot directly reverse one, but it can be done with rainbow tables.
Wikipedia:
A rainbow table is a precomputed table for reversing cryptographic hash functions, usually for cracking password hashes. Tables are usually used in recovering a plaintext password up to a certain length consisting of a limited set of characters.
Essentially, SHA-1 is only as safe as the strength of the password used. If users have long passwords with obscure combinations of characters, it is very unlikely that existing rainbow tables will have a key for the encrypted string.
You can test your encrypted SHA-1 strings here:
http://sha1.gromweb.com/
There are other rainbow tables on the internet that you can use so Google reverse SHA1.
Note that the best attacks against MD5 and SHA-1 have been about finding any two arbitrary messages m1 and m2 where h(m1) = h(m2) or finding m2 such that h(m1) = h(m2) and m1 != m2. Finding m1, given h(m1) is still computationally infeasible.
Also, you are using a MAC (message authentication code), so an attacker can't forge a message without knowing the secret, with one caveat: the general MAC construction that you used is susceptible to a length extension attack. An attacker can in some circumstances forge a message m2|m3, h(secret, m2|m3) given m2, h(secret, m2). This is not an issue with just a timestamp, but it is an issue when you compute a MAC over messages of arbitrary length. You could append the secret to the timestamp instead of prepending it, but in general you are better off using HMAC with a SHA-1 digest (HMAC is just a construction and can use MD5 or SHA as digest algorithms).
Finally, you are signing just the timestamp and not the full request. An active attacker can easily attack the system, especially if you have no replay protection (although even with replay protection, this flaw exists). For example, I can capture timestamp, HMAC(timestamp with secret) from one message and then use it in my own message, and the server will accept it.
Best to send message, HMAC(message) with sufficiently long secret. The server can be assured of the integrity of the message and authenticity of the client.
Depending on your threat scenario, you can either add replay protection or note that it is not necessary, since a message replayed in its entirety does not cause any problems.
Hashes are dependent on the input, and for the same input will give the same output.
So, in addition to the other answers, please keep the following in mind:
If you start the hash with the password, it is possible to pre-compute rainbow tables, and quickly add plausible timestamp values, which is much harder if you start with the timestamp.
So, rather than use
sha1("My Secret Key"+"a timestamp")
go for
sha1("a timestamp"+"My Secret Key")
I believe the accepted answer is technically right but wrong as it applies to the use case: to create & transmit tamper evident data over public/non-trusted mediums.
Because although it is technically highly-difficult to brute-force or reverse a SHA hash, when you are sending plain text "data & a hash of the data + secret" over the internet, as noted above, it is possible to intelligently get the secret after capturing enough samples of your data. Think about it - your data may be changing, but the secret key remains the same. So every time you send a new data blob out, it's a new sample to run basic cracking algorithms on. With 2 or more samples that contain different data & a hash of the data+secret, you can verify that the secret you determine is correct and not a false positive.
This scenario is similar to how Wifi crackers can crack wifi passwords after they capture enough data packets. After you gather enough data it's trivial to generate the secret key, even though you aren't technically reversing SHA1 or even SHA256. The ONLY way to ensure that your data has not been tampered with, or to verify who you are talking to on the other end, is to encrypt the entire data blob using GPG or the like (public & private keys). Hashing is, by nature, ALWAYS insecure when the data you are hashing is visible.
Practically speaking it really depends on the application and purpose of why you are hashing in the first place. If the level of security required is trivial or say you are inside of a 100% completely trusted network, then perhaps hashing would be a viable option. Hope no one on the network, or any intruder, is interested in your data. Otherwise, as far as I can determine at this time, the only other reliably viable option is key-based encryption. You can either encrypt the entire data blob or just sign it.
Note: This was one of the ways the British were able to crack the Enigma code during WW2, leading to favor the Allies.
Any thoughts on this?
SHA1 was designed to prevent recovery of the original text from the hash. However, SHA1 databases exist that allow you to look up common passwords by their SHA hash.
Is it possible to reverse a SHA-1?
SHA-1 was meant to be a collision-resistant hash, whose purpose is to make it hard to find distinct messages that have the same hash. It is also designed to be preimage-resistant, that is, it should be hard to find a message having a prescribed hash, and second-preimage-resistant, so that it is hard to find a second message having the same hash as a prescribed message.
SHA-1's collision resistance was broken in practice in 2017 by Google's team, and NIST had already removed SHA-1 for signature purposes in 2015.
SHA-1 pre-image resistance, on the other hand, still exists. One should be careful about the pre-image resistance, if the input space is short, then finding the pre-image is easy. So, your secret should be at least 128-bit.
SHA-1("My Secret Key"+"a timestamp")
This is the pre-fix secret construction, which is vulnerable to the length extension attack on Merkle-Damgard based hash functions like SHA-1. The attack was applied to Flickr. One should not use this construction with SHA-1 or SHA-2. One can use
HMAC-SHA-256 to achieve a better security system (HMAC doesn't require collision resistance of the hash function, therefore SHA-1 and MD5 are still fine for HMAC; however, forget about them). HMAC costs two calls of the hash function, which is a weakness for time-critical systems. A note: HMAC is a beast in cryptography.
KMAC is the pre-fix secret construction from SHA-3, since SHA-3 has resistance to length extension attack, this is secure.
Use BLAKE2 with pre-fix construction and this is also secure since it has also resistance to length extension attacks. BLAKE is a really fast hash function, and now it has a parallel version BLAKE3, too (need some time for security analysis). Wireguard uses BLAKE2 as MAC.
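As a small illustration, Python's hashlib exposes BLAKE2's built-in keyed mode, so a MAC can be computed without wrapping it in HMAC. The key below is a placeholder and blake2_mac is a made-up helper name.

```python
import hashlib
import hmac

KEY = b"0123456789abcdef0123456789abcdef"  # 32-byte placeholder; use a random key in practice

def blake2_mac(message: bytes) -> bytes:
    # BLAKE2b takes the key as a parameter of the hash itself (keyed mode).
    return hashlib.blake2b(message, key=KEY, digest_size=32).digest()

tag = blake2_mac(b"sensor reading 42")
# Constant-time comparison when checking a received tag.
print(hmac.compare_digest(tag, blake2_mac(b"sensor reading 42")))
print(hmac.compare_digest(tag, blake2_mac(b"sensor reading 43")))
```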
Then I include this SHA-1 in the communication and the server, which can do the same calculation. And hopefully, nobody would be able to figure out the "secret key".
But is this really true?
If you know that this is how I did it, you would know that I did put a timestamp in there and you would see the SHA-1. Can you then use those two and figure out the "secret key"?
secret_key = bruteforce_sha1(sha1, timestamp)
You did not define the size of your secret. If your attacker knows the timestamp, then they can look for the secret by searching. If we consider the collective power of the Bitcoin miners, as of 2022, they reach around 2^93 double SHA-256 computations in a year. Therefore, you must adjust your security according to your risk. As of 2022, NIST's minimum security level is 112 bits. One should consider a secret size of 128 bits or more.
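To put the numbers in perspective, here is a back-of-the-envelope calculation. It assumes the attacker's budget is roughly 2^93 hash evaluations per year (reading the Bitcoin figure above as that exponent) and ignores the difference between double SHA-256 and a single SHA-1 call.

```python
HASHES_PER_YEAR_LOG2 = 93  # assumed attacker budget: about 2^93 hashes per year

def years_to_exhaust(secret_bits: int) -> float:
    # Time to try every possible secret of the given size; on average
    # the attacker succeeds after about half of this.
    return 2.0 ** (secret_bits - HASHES_PER_YEAR_LOG2)

for bits in (80, 112, 128):
    print(bits, "bits:", years_to_exhaust(bits), "years")
```

Under these assumptions an 80-bit secret falls in a fraction of a year, a 112-bit secret needs 2^19 (about half a million) years, and a 128-bit secret needs 2^35 years.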
Note1: I guess you could brute force in some way, but how much work would that actually be?
Given the answer above. As a special case, against the possible implementation of Grover's algorithm ( a Quantum algorithm for finding pre-images), one should use hash functions larger than 256 output size.
Note2: I don't plan to encrypt any data, I just would like to know who sent it.
This is not the way. Your construction can only work if the secret is mutually shared, for example via DHKE, that is, if the secret is known only to the sender and you. Instead of managing this, a better way is to use digital signatures to solve this issue. Besides, one will get non-repudiation, too.
Any hashing algorithm is reversible if applied to strings of max length L. The only question is the value of L. To assess it exactly, you could run the state-of-the-art dehashing utility, hashcat. It is optimized to get the best performance out of your hardware.
That's why you need long passwords, like 12 characters. Here they say for length 8 the password is dehashed (using brute force) in 24 hours (1 GPU involved). For each extra character multiply it by alphabet length (say 50). So for 9 characters you have 50 days, for 10 you have 6 years, and so on. It's definitely inaccurate, but can give us an idea, what the numbers could be.
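The scaling rule above can be written down directly. The 24-hour baseline and the alphabet size of 50 are the rough figures from this answer, not measurements.

```python
BASE_LENGTH = 8
BASE_HOURS = 24     # quoted figure for an 8-character password on one GPU
ALPHABET_SIZE = 50  # assumed number of possible characters

def brute_force_hours(length: int) -> int:
    # Each extra character multiplies the search space by the alphabet size.
    # Intended for lengths of BASE_LENGTH or more.
    return BASE_HOURS * ALPHABET_SIZE ** (length - BASE_LENGTH)

for n in (8, 9, 10, 12):
    days = brute_force_hours(n) / 24
    print(f"{n} characters: {days:,.0f} days (~{days / 365:,.1f} years)")
```

With these inputs, 9 characters take 50 days, 10 characters take 2,500 days (just under 7 years), and 12 characters take over 17,000 years.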

Reason for salting a password for webservice

I have very basic question related to user management and in particular storing hashed passwords.
I read few pages (like https://wiki.python.org/moin/Md5Passwords ).
The way I understand hashing is this:
password provided by user is hashed (with whatever function) one way.
nobody (including user/admin) is able to see the password.
when user logs in - the string provided by him is hashed to see if it matches stored hashed password.
That's all clear, however I am not sure what with 'salt' in hashing.
I read os.urandom (Python) is good to create good salt:
https://crackstation.net/hashing-security.htm
What I am not sure is how to work with this added "salt"
If I hash the user's password with a salt, and it's one way, then the next time the user logs in he knows only the password and not the salt. From this I assume that the "salt" generated for this user needs to be stored somewhere. Otherwise it would not make sense. But on the other hand, if somebody gets access to the DB, then he will see the "salt" and the hashed password. In such a case the "salt" does not add much value (it's pretty much the same as hashing the pure password). So maybe the "salt" is just there to provide protection on the front end (against brute force).
Can somebody provide me a hint how to work with salt? Is my understanding correct. Do I need to store "salt" somewhere?
Before I posted this question I found this:
Should the Salt for a password Hash be "hashed" also?
what is the added value of the salt?
if I write web service I can block each log in after 3 failed attempts.
Nobody on the front end is able to see hashed values. Nobody can use brute force (this might be only DoS since 3 failed log ins will block user). The hacker will need have access to DB and see hashed passwords. But if he has, he will see "salt".
Salt is used to prevent a hacker from reversing the password hashes into passwords. So here we assume that somehow the hacker has access to the database.
Without salt
Let us first assume the scenario without salt. In that case the table looks like:
user | md5 password (first 6 chars)
-------------------------------
1 | 1932ff
2 | d3b073
(we here make the situation simpler than it is in reality)
The hacker of course wants to know what the passwords behind d3b073 and 1932ff are. A hash function is one directional in the sense that we can hash a password very fast, but unhashing it will - given it is a good hashing function - take a very long time, after guessing a huge amount of passwords.
So there is not much hope to easily retrieve the possible password(s) behind d3b073. But we can easily find a list of the 100'000 most popular passwords, and calculate the MD5 hash of all these passwords. Such list could look like:
password | md5 (first 6 characters)
--------------------------------------------
foo | d3b073
bar | c157a7
So apparently user 2 has used foo as password. The password of user 1 is unknown to us (but we know it is not foo or bar).
Now the point is that we can construct such table once and then use it to crack all passwords of all the users. Constructing such table for 100'000 passwords might perhaps take a few hours, but then we can easily retrieve all passwords. So a hacker can construct (or download) such table (there are more efficient ways, for instance with rainbow tables), and then use it each time he/she hacks a website and then obtains the passwords of all users.
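The table attack on unsalted hashes can be sketched as follows. The password list is a tiny illustrative stand-in for a real dictionary, and full MD5 digests are used instead of the shortened values in the tables above.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Built once, reusable against every leaked database of unsalted MD5 hashes.
common_passwords = ["foo", "bar", "123456", "password"]
lookup = {md5_hex(p): p for p in common_passwords}

# A leaked user table: user id -> unsalted password hash.
leaked = {1: md5_hex("correct horse battery staple"), 2: md5_hex("foo")}

cracked = {user: lookup.get(digest) for user, digest in leaked.items()}
print(cracked)  # {1: None, 2: 'foo'}
```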
With salt
If we however use salting, the table could look like this:
user | salt | hashed password
-------------------------------
1 | a91f40 | 1a604e
2 | c2a67c | b36232
So here if the password of user 2 is foo, then we calculate the hash of fooc2a67c (or we use another way to combine the salt and the password) and store this into the database.
The point is that it is very hard to guess the password, since b36232 is not the hash of foo, but of fooc2a67c and the salt is typically something (pseudo)-random. We can of course again construct the most popular 100'000 passwords with salt c2a67c appended to it, but since we can not know the salt in advance, we can not create this table only once. Even if we are lucky and already constructed the table for salt c2a67c, it will not help us with hacking the password of user 1, since user 1 has a different salt.
So the only way to resolve this, is by constructing a reverse hash lookup table, for every user. Since it is usually very expensive to construct such table once, it will not be easy to calculate such table for every user.
We might of course decide to calculate all hashes of all possible salts, like for instance:
password | md5 (first 6 characters)
---------------------------------------------
foo000000 | 367390
foo000001 | eca8ea
foo000002 | 6eb7bf
foo000003 | 7906b1
foo000004 | 0e9f0c
foo000005 | 0bfb11
... | ...
But as you can see, such a table would grow to a gigantic size. Furthermore, it would take thousands of years to build. Even if we add only one hexadecimal character as salt, the size of the table would grow 16 times. Yes, there are some techniques to reduce the amount of time and space for such a table, but by increasing the "password space", the problem of hacking passwords definitely becomes much harder. Furthermore, salt is usually a significant number of characters (or bytes) long, making it far harder than just 16 times.
Basically salt acts as a way to enlarge the password space. Even if you enter the very same password on two websites, the personal salt of the websites will (close to certainty) be unique, and therefore the hash will be unique as well.
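To answer the "do I need to store the salt" part concretely, here is a minimal sketch of per-user salting. It assumes PBKDF2 from Python's standard library rather than the plain MD5 of the tables above, and the iteration count is only illustrative.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor

def hash_password(password: str) -> tuple[bytes, bytes]:
    # A new random salt per user; it is stored in the database next to the hash.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    # At login the stored salt is read back and the hash is recomputed.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt1, digest1 = hash_password("foo")
salt2, digest2 = hash_password("foo")
print(digest1 == digest2)                     # same password, different stored hashes
print(verify_password("foo", salt1, digest1))
print(verify_password("bar", salt1, digest1))
```

An attacker who steals the table sees the salts, but still has to redo the dictionary work separately for every user.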

Setting Key of System.Security.Cryptography.AesManaged in C#

When I instantiate AesManaged in C#, it already has the .Key property set. Is it safe to use it? I.e. is it cryptographically strong and random enough for each new instantiation of AesManaged (and every time I call .GenerateKey() on an existing instance)?
All examples I've seen, first generate a random password and then use a key derivation function like Rfc2898DeriveBytes or PasswordDeriveBytes to generate the Key (e.g. How to use 'System.Security.Cryptography.AesManaged' to encrypt a byte[]?). This requires additional information - like salt value, number of password iterations, what hash algorithm to use.
I understand I need all that if I want my users to come up with passwords. I then need to produce random cryptographically strong Keys from them. But if everything is generated by the computer, do I need to programmatically generate random passwords and then Keys from them, or can I just use whatever AesManaged.Key contains?
Yes, you can use the default Key and IV values if you like. You can also explicitly regenerate a new random one with:
SymmetricAlgorithm.GenerateKey() or SymmetricAlgorithm.CreateEncryptor(null, null)
It depends on what you are protecting and how many owners of information you need to support. If speed / volume doesn't matter, then you are still better adopting PBKDF2 by using Rfc2898DeriveBytes for the iterations.
Regardless, you don't want to share the key across multiple users / tenants / security "realms", but sure, you can use the default key for a single application. If you do, combine the salt with it.
The reasons we use user-defined passwords and salts are to avoid attacks that exploit common/weak passwords or passwords shared between users, and to ensure that we, as application owners, don't know their keys.
The reasons we use PBKDF2 (derivation with many iterations) is to slow down the attacker. Penalty we pay 1 time per user is paid many times by an attacker.
If your needs are just to have a random key for a single application or system, then the default is usable, assuming, of course, it provides the strength you need.
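The two options discussed here, a machine-generated random key versus a key derived from a password, look roughly like this. The sketch is a Python analogue using the standard library, not the .NET API, and the passphrase and iteration count are placeholders.

```python
import hashlib
import secrets

# Option 1: everything is generated by the computer, so a random key is enough.
random_key = secrets.token_bytes(32)  # 256 bits from the OS CSPRNG

# Option 2: a human supplies a password, so a slow KDF with a salt is needed.
salt = secrets.token_bytes(16)
derived_key = hashlib.pbkdf2_hmac(
    "sha256", b"user passphrase", salt, 100_000, dklen=32
)

print(len(random_key), len(derived_key))  # both are 32-byte keys
```

In the second case the salt and iteration count have to be stored so the same key can be derived again; in the first case the key itself must be stored securely.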

Identifying password similarity without storing in plain text?

One of my SaaS software vendors requires me to change passwords every 90 days, which is good.
What surprises me though, is that the password change screen errors with a note that my new password is too similar to an old password.
This most often happens if I change less than three or four of the characters within a password.
If it were an exact match to an old password, I would have confidence that they are hashing my password, and comparing the hashes. The "similarity" matching makes me think they are storing and comparing the plaintext versions.
Is it possible to determine "similarity" by comparing one hash to another, or is this vendor more likely storing my password in plain-text?
It's possible. Whenever you change the password, the software could create hash codes for all combinations of the same password with a few characters masked or removed.
If your password is hello, it could create hash codes for _ello, h_llo, he_lo, hel_o, hell_, __llo, _e_lo, _ell_, he_l_, he__o, etc.
The next time you change the password, it can create the same set of combinations of that password, and compare to all the previous hash codes. If there is a match, only a few characters were changed.
It's a lot simpler to just save the passwords in plain text, of course.
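The masking scheme described in this answer might be sketched like this. Plain unsalted SHA-256 is used only to keep the example short, the function names are invented, and the approach only catches character substitutions, not insertions or deletions.

```python
import hashlib
from itertools import combinations

def masked_hashes(password: str, max_masked: int = 2) -> set[str]:
    # Hash every variant of the password with up to max_masked characters masked.
    hashes = set()
    for k in range(1, max_masked + 1):
        for positions in combinations(range(len(password)), k):
            chars = list(password)
            for i in positions:
                chars[i] = "_"
            hashes.add(hashlib.sha256("".join(chars).encode("utf-8")).hexdigest())
    return hashes

def too_similar(old_hashes: set[str], new_password: str, max_masked: int = 2) -> bool:
    # Any shared masked variant means only a few characters were changed.
    return not old_hashes.isdisjoint(masked_hashes(new_password, max_masked))

stored = masked_hashes("hello")      # saved when the old password was set
print(too_similar(stored, "hallo"))  # True: one character changed
print(too_similar(stored, "world"))  # False: four characters differ
```

Note that storing these extra hashes gives an attacker more material to work with than a single password hash would.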
This depends whether they are checking all old passwords, or just your last one.
The last one will be available in memory if you had to enter your old password in order to set a new one. A form usually asks for three inputs: old password, new password and confirm new password.
If they are storing your last few passwords in hashed form, they would be able to check these for an exact match, and they could check your previous password for similarities using an algorithm using the old password that you just re-entered.
In all likelihood they are storing the plain text. With a good hashing algorithm there should be no correlation between the original content and the hash value (that is what makes it good).
It is possible they are storing some characteristics of the original password to use as reference. For example the counts of characters, any numeric value, etc., and then comparing to that but I doubt it.
One way to do this is by reducing the space of the password.
For example, if you think that "Hello" and "h3LL0" are similar, then you can make a reduce() function that changes the string to uppercase and changes all vowels and digits to #. Both "Hello" and "h3LL0" will be reduced to "H#LL#".
In the database you need to store hash() of the current password and hash(reduce()) of the current and all previous passwords.
You can design any policy of similarity you want, as long as you can make a suitable reduce() function.
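A possible reduce() for the policy described above, written as a sketch. The name reduce_password and the use of SHA-256 are choices made for this example.

```python
import hashlib

def reduce_password(password: str) -> str:
    # Uppercase everything, then replace vowels and digits with '#'.
    upper = password.upper()
    return "".join("#" if c in "AEIOU" or c.isdigit() else c for c in upper)

def reduced_hash(password: str) -> str:
    # Only the hash of the reduced form is stored for the similarity check.
    return hashlib.sha256(reduce_password(password).encode("utf-8")).hexdigest()

print(reduce_password("Hello"), reduce_password("h3LL0"))  # H#LL# H#LL#
print(reduced_hash("Hello") == reduced_hash("h3LL0"))      # True
```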

Am I misunderstanding what a hash salt is?

I am working on adding hash digest generating functionality to our code base. I wanted to use a String as a hash salt so that a pre-known key/passphrase could be prepended to whatever it was that needed to be hashed. Am I misunderstanding this concept?
A salt is a random element which is added to the input of a cryptographic function, with the goal of impacting the processing and output in a distinct way upon each invocation. The salt, as opposed to a "key", is not meant to be confidential.
One century ago, cryptographic methods for encryption or authentication were "secret". Then, with the advent of computers, people realized that keeping a method completely secret was difficult, because this meant keeping software itself confidential. Something which is regularly written to a disk, or incarnated as some dedicated hardware, has trouble being kept confidential. So the researchers split the "method" into two distinct concepts: the algorithm (which is public and becomes software and hardware) and the key (a parameter to the algorithm, present in volatile RAM only during processing). The key concentrates the secret and is pure data. When the key is stored in the brain of a human being, it is often called a "password" because humans are better at memorizing words than bits.
Then the key itself was split later on. It turned out that, for proper cryptographic security, we needed two things: a confidential parameter, and a variable parameter. Basically, reusing the same key for distinct usages tends to create trouble; it often leaks information. In some cases (especially stream ciphers, but also for hashing passwords), it leaks too much and leads to successful attacks. So there is often a need for variability, something which changes every time the cryptographic method runs. Now the good part is that most of the time, variability and secret need not be merged. That is, we can separate the confidential from the variable. So the key was split into:
the secret key, often called "the key";
a variable element, usually chosen at random, and called "salt" or "IV" (as "Initial Value") depending on the algorithm type.
Only the key needs to be secret. The variable element needs to be known by all involved parties but it can be public. This is a blessing because sharing a secret key is difficult; systems used to distribute such a secret would find it expensive to accommodate a variable part which changes every time the algorithm runs.
In the context of storing hashed passwords, the explanation above becomes the following:
"Reusing the key" means that two users happen to choose the same password. If passwords are simply hashed, then both users will get the same hash value, and this will show. Here is the leakage.
Similarly, without a salt, an attacker could use precomputed tables for fast lookup; he could also attack thousands of passwords in parallel. This still uses the same leak, only in a way which demonstrates why this leak is bad.
Salting means adding some variable data to the hash function input. That variable data is the salt. The point of the salt is that two distinct users should use, as much as possible, distinct salts. But password verifiers need to be able to recompute the same hash from the password, hence they must have access to the salt.
Since the salt must be accessible to verifiers but needs not be secret, it is customary to store the salt value along with the hash value. For instance, on a Linux system, I may use this command:
openssl passwd -1 -salt "zap" "blah"
This computes a hashed password, with the hash function MD5, suitable for usage in the /etc/passwd or /etc/shadow file, for the password "blah" and the salt "zap" (here, I choose the salt explicitly, but under practical conditions it should be selected randomly). The output is then:
$1$zap$t3KZajBWMA7dVxwut6y921
in which the dollar signs serve as separators. The initial "1" identifies the hashing method (MD5). The salt is in there, in cleartext notation. The last part is the hash function output.
There is a specification (somewhere) on how the salt and password are sent as input to the hash function (at least in the glibc source code, possibly elsewhere).
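Since the salt is stored in cleartext inside the entry, a verifier can pull it back out by splitting on the dollar signs. A small sketch, handling only the three-field layout shown above (the function name is made up):

```python
def parse_crypt_entry(entry: str) -> dict[str, str]:
    # "$1$zap$..." splits into an empty leading field, the scheme id,
    # the salt and the hash output.
    _, scheme, salt, digest = entry.split("$")
    return {"scheme": scheme, "salt": salt, "digest": digest}

print(parse_crypt_entry("$1$zap$t3KZajBWMA7dVxwut6y921"))
# {'scheme': '1', 'salt': 'zap', 'digest': 't3KZajBWMA7dVxwut6y921'}
```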
Edit: in a "login-and-password" user authentication system, the "login" could act as a passable salt (two distinct users will have distinct logins) but this does not capture the situation of a given user changing his password (whether the new password is identical to an older password will leak).
You are understanding the concept perfectly. Just make sure the prepended salt is repeatable each and every time.
If I'm understanding you correctly, it sounds like you've got it right. The pseudocode for the process looks something like:
string saltedValue = plainTextValue + saltString;
// or string saltedValue = saltString + plainTextValue;
Hash(saltedValue);
The Salt just adds another level of complexity for people trying to get at your information.
And it's even better if the salt is different for each encrypted phrase since each salt requires its own rainbow table.
It's worth mentioning that even though the salt should be different for each password usage, your salt should in NO WAY be computed FROM the password itself! This sort of thing has the practical upshot of completely invalidating your security.