I am using the AES algorithm in my application to encrypt plain text. I want to use a key that is a six-digit number, but per the AES spec the key must be at least sixteen bytes long. I am planning to prepend zeros to my six-digit number to make it 16 bytes and then use that as the key.
Would this have any security implications? I mean, will it make my ciphertext more prone to attacks?
Please help.
You should use a key derivation function; in particular, PBKDF2 is the state of the art for obtaining an AES key from a password or PIN.
In particular, PBKDF2 makes a key search more difficult because it:
randomizes the key, making precomputed password dictionaries useless;
increases the computational cost of testing each candidate password, increasing the total time required to find the key.
As an additional remark, six decimal digits correspond to only about 20 bits of password entropy, which is definitely too little. Increase your password length.
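For example, here is a minimal Python sketch (using hashlib's built-in PBKDF2; the PIN value, salt handling, and iteration count below are illustrative only, not a recommendation):

import os, hashlib

pin = "483921"                 # hypothetical six-digit PIN
salt = os.urandom(16)          # random per-user salt, stored alongside the ciphertext
iterations = 200_000           # slows down brute force over the tiny PIN space

key = hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, iterations, dklen=16)
# key is 16 bytes, usable as an AES-128 key; the PIN itself still only holds ~20 bits of entropy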
Using the Express.js framework and crypto to hash a password with pbkdf2, I read that the default algorithm is HMAC-SHA1, but I don't understand why it hasn't been upgraded to one of the other families of SHA.
crypto.pbkdf2(password, salt, iterations, keylen, callback)
Is the keylen that we provide the variation of the SHA we want, like SHA-256, SHA-512, etc.?
Also how does HMAC change the output?
And lastly is it strong enough when SHA1 is broken?
Sorry if I am mixing things up.
Is the keylen that we provide the variation of the SHA we want, like SHA-256, SHA-512, etc.?
As you state you're hashing a password in particular, @CodesInChaos is right: keylen (i.e. the length of the output from PBKDF2) should be at most the native output size of your HMAC's hash function.
For SHA-1, that's 160 bits (20 bytes)
For SHA-256, that's 256 bits (32 bytes), etc.
The reason for this is that if you ask for more output (keylen) than the hash function natively produces, the first native-length block of output is identical either way, so an attacker only needs to attack that many bits. This is the problem 1Password fixed after the Hashcat team found it.
Example as proof:
Here's 22 bytes worth of PBKDF2-HMAC-SHA-1 - that's one native hash size + 2 more bytes (taking a total of 8192 iterations! - the first 4096 iterations generate the first 20 bytes, then we do another 4096 iterations for the set after that!):
pbkdf2 sha1 "password" "salt" 4096 22
4b007901b765489abead49d926f721d065a429c12e46
And here's just getting the first 20 bytes of PBKDF2-HMAC-SHA-1 - i.e. exactly one native hash output size (taking a total of 4096 iterations)
pbkdf2 sha1 "password" "salt" 4096 20
4b007901b765489abead49d926f721d065a429c1
Even if you store 22 bytes of PBKDF2-HMAC-SHA-1, an attacker only needs to compute the first 20 bytes... which takes about half the time: to get bytes 21 and 22, another entire set of HMAC iterations is calculated and then only 2 bytes are kept.
Yes, you're correct; 21 bytes takes twice the time 20 does for PBKDF2-HMAC-SHA-1, and 40 bytes takes just as long as 21 bytes in practical terms. 41 bytes, however, takes three times as long as 20 bytes, since 41/20 is between 2 and 3, exclusive.
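You can check the prefix behaviour yourself with Python's hashlib (a quick sketch, not part of the original answer):

import hashlib

dk22 = hashlib.pbkdf2_hmac("sha1", b"password", b"salt", 4096, dklen=22)
dk20 = hashlib.pbkdf2_hmac("sha1", b"password", b"salt", 4096, dklen=20)
print(dk22.hex())            # 22 bytes: two blocks of 4096 HMAC iterations each
print(dk20.hex())            # 20 bytes: one block of 4096 HMAC iterations
print(dk22[:20] == dk20)     # True - the first native-length block is identical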
Also how does HMAC change the output?
HMAC (RFC 2104) is a way of keying hash functions, particularly those with weaknesses when you simply concatenate key and text together. HMAC-SHA-1 is SHA-1 used in an HMAC; HMAC-SHA-512 is SHA-512 used in an HMAC.
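For illustration only, keying a hash with Python's hmac module looks like this:

import hmac, hashlib

key, msg = b"secret-key", b"some text"
print(hmac.new(key, msg, hashlib.sha1).hexdigest())     # HMAC-SHA-1
print(hmac.new(key, msg, hashlib.sha512).hexdigest())   # HMAC-SHA-512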
And lastly is it strong enough when SHA1 is broken?
If you have enough iterations (upper tens of thousands to lower hundreds of thousands or more, as of 2014) then it should be all right. PBKDF2-HMAC-SHA-512 in particular has the advantage that it does much worse on current graphics cards (i.e. many attackers) than it does on current CPUs (i.e. most defenders).
For the gold standard, see the answer @ThomasPornin gave in Is SHA-1 secure for password storage?, a tiny part of which is "The known attacks on MD4, MD5 and SHA-1 are about collisions, which do not impact preimage resistance. It has been shown that MD4 has a few weaknesses which can be (only theoretically) exploited when trying to break HMAC/MD4, but this does not apply to your problem. The 2^106 second-preimage attack in the paper by Kelsey and Schneier is a generic trade-off which applies only to very long inputs (2^60 bytes; that's a million terabytes -- notice how 106+60 exceeds 160; that's where you see that the trade-off has nothing magic in it)."
SHA-1 is broken, but that does not mean it's unsafe to use; SHA-256 (SHA-2) is more or less for future-proofing and as a long-term substitute. "Broken" only means faster than brute force, but not necessarily feasible or practically possible (yet).
See also this answer: https://crypto.stackexchange.com/questions/3690/no-sha-1-collision-yet-sha1-is-broken
A function getting broken often only means that we should start migrating to other, stronger functions, and not that there is practical danger yet. Attacks only get stronger, so it's a good idea to consider alternatives once the first cracks begin to appear.
I am comparing personal info of individuals, specifically their name, birthdate, gender, and race, by hashing a string containing all of this info and comparing the hash objects' hexdigests. This produces a 32-digit hexadecimal number, which I am using as a primary key in a database. For example, using my identifying string would work like this:
>>> import hashlib
>>> id_string = "BrianPeterson08041993MW"
>>> byte_string = id_string.encode('utf-8')
>>> hash_id = hashlib.md5(byte_string).hexdigest()
>>> print(hash_id)
3b807ad8a8b3a3569f098a575091bc79
At this point, I am trying to ascertain collision risk. My understanding is that MD5 doesn't have significant collision risk, at least for strings that are relatively small, which mine are (about 20-40 characters in length). However, I am not using the 128-bit digest object, but the 32 digit hexdigest.
Now, I believe the hexdigest is a compression of the digest (that is, it's stored in fewer characters), so isn't there an increased risk of collision when comparing hexdigests? Or am I off-base?
Now, I believe the hexdigest is a compression of the digest (that is, it's stored in fewer characters), so isn't there an increased risk of collision when comparing hexdigests? Or am I off-base?
[...]
I guess my question is: don't different representations have different chances of being non-unique, based on how many units of information they use for the representation vs. how many units of information the original message takes to encode? And if so, what is the best representation to use? Um, let me preface your next answer with: talk to me like I'm 10.
Old question, but yes, you were a bit off base, so to speak.
It's the number of random bits that matters, not the length of the representation.
The digest is just a number, an integer, which could be converted to a string using a different number of distinct digits. For example, a 128-bit number shown in a few different radices:
"340106575100070649932820283680426757569" (base 10)
"ffde24cb47ecbff8d6e461a67c930dc1" (base 16, hexadecimal)
"7vroicmhvcnvsddp31kpu963e1" (base 32)
Shorter is nicer and more convenient (in auth tokens etc), but each representation has the exact same information and chance of collision. Shorter representations are shorter for the same reason as why "55" is shorter than "110111", while still encoding the same thing.
This answer might also clarify things, as well as toying with code like:
new BigInteger("340106575100070649932820283680426757569").toString(2)
...or something equivalent in other languages (Java/Scala above).
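The same demonstration in Python, if that is more familiar (the integer is the one shown above; nothing else is assumed):

n = 340106575100070649932820283680426757569   # the 128-bit number from above
print(format(n, "x"))    # the same value written in base 16
print(format(n, "b"))    # the same value written in base 2 - a longer string, identical information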
On a more practical level,
[...] which I am using as a primary key in a database
I don't see why you wouldn't do away with any chance of collision by using a normal auto-incremented id column (BIGINT AUTO_INCREMENT in MySQL, BIGSERIAL in PostgreSQL).
An abbreviated 32-bit hexdigest (8 hex characters) would not be long enough to effectively guarantee a collision-free database of users.
The formula for the birthday collision probability is here:
What is the probability of md5 collision if I pass in 2^32 sets of string?
Using a 32-bit key would mean that your software would start to break at around 10,000 users. The collision probability would be about 1%. It gets a lot worse very fast after that. At 100,000 users, the collision probability is 69%.
A 64-bit key starts to break down at around 1 billion users, where the collision probability is about 2.7%; at 10 billion users it is well over 90%.
For 100 billion users (a generous upper bound of the earth's population for the foreseeable future), a 96-bit key is a little risky in my opinion: the collision chance is roughly one in 16 million. Really, you need a 128-bit key, which gives you a collision rate of about 1×10^-17.
128-bit keys are 128/4 = 32 hex characters long. If you wanted to use a shorter key for aesthetic purposes, you would need at least 22 alphanumeric characters to exceed 128 bits (log2(62) is about 5.95 bits per character). Or if you use printable characters (ASCII 32-126), you could get away with 20 characters.
So when you're talking about users, you need at least 128 bits for a collision-free random key, or a 20-32 character long string, or a 128/8 = 16 byte binary representation.
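For reference, the birthday-bound arithmetic behind those figures can be sketched in a few lines of Python (the standard approximation 1 - exp(-n(n-1)/2N); the inputs are the numbers quoted above):

import math

def collision_probability(n_items, key_bits):
    space = 2 ** key_bits
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))   # 1 - exp(...), computed accurately

print(collision_probability(10_000, 32))            # about 0.01  -> ~1% at 10,000 users with a 32-bit key
print(collision_probability(100_000, 32))           # about 0.69  -> ~69% at 100,000 users
print(collision_probability(100_000_000_000, 128))  # about 1e-17 even at 100 billion users, 128-bit key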
I'm just wondering what is the maximum length. I read from Wikipedia that it takes a message of any length less than 2^256 bits. Does that mean 2 to the power of 256? Also, would it be more secure to hash a password multiple times? Example:
WHIRLPOOL(WHIRLPOOL(WHIRLPOOL(WHIRLPOOL("passw0rd"))))
Or does that increase the risk of collisions?
Yes, this does mean 2^256 bits. Of course, as there are 2^3 bits in a byte, you will thus have a maximum space of 2^253 bytes. Nothing to worry about.
Yes, it is better to hash multiple times. No, you don't have to worry about "cycles" (much). Many pseudo-random number generators use hashes the same way. Hash algorithms should not lose too much information and should not have a short cycle time.
Password hashes should, however, be calculated using password-based key derivation functions. The resulting "key" is then stored. PBKDFs may use hashes (e.g. PBKDF2) or keyed block ciphers (bcrypt). Most KDFs use message authentication codes (HMAC or MAC) instead of directly using the underlying hash algorithm or block cipher.
Input to a PBKDF is a salt and an iteration count. The iteration count is used to make it harder for an attacker to brute force the system by trying out all kinds of passwords. It's basically the same as what you did above with WHIRLPOOL, except that the iteration count is normally somewhere between 1 and 10 thousand. More data is normally mixed in as well in each iteration.
More importantly, the (password specific) salt is used to make sure that duplicate passwords cannot be detected and to avoid attacks using rainbow tables. Normally the salt is about 64 to 128 bits. The salt and iteration count should be stored with the "hash".
Finally, it is probably better to use a NIST vetted hash algorithm such as SHA-512 instead of WHIRLPOOL.
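As an illustration of that recipe (a sketch only, using Python's hashlib with PBKDF2-HMAC-SHA-512; the salt size and iteration count are examples, not a recommendation):

import os, hmac, hashlib

def hash_password(password):
    salt = os.urandom(16)                     # per-password random salt (128 bits)
    iterations = 100_000                      # work factor against brute force
    digest = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest           # store all three together

def verify_password(password, salt, iterations, digest):
    candidate = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison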
I know the lengths of some small encrypted strings, such as 160, 196, ...
What determines the size?
The size of a single encrypted "block" is the same as the key size, which is the same as the size of the modulus. The private exponent is normally about the same size, but may be smaller. The public exponent can be up to the key size, but is normally much smaller to allow for more efficient encryption or verification. Most of the time it is the fourth Fermat number, 65537.
Note that this is the size in bits of the encrypted data. The plain data must be padded: PKCS#1 v1.5 allows at most the key size minus 11 bytes for the plain text. It is certainly smart to keep a higher margin though, say 19 bytes of padding minimum (a 16-byte random instead of an 8-byte random for the padding).
For this reason, and because RSA encryption/decryption is expensive, RSA is mostly used in combination with a symmetric primitive such as AES: instead of the plain text, a random AES symmetric secret key is encrypted with RSA, and that key is then used to encrypt the plain text.
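To illustrate the hybrid pattern, here is a sketch using the third-party Python "cryptography" package (assumed to be installed; the key size and padding choices are only examples):

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

aes_key = AESGCM.generate_key(bit_length=128)            # random symmetric key
nonce = os.urandom(12)
ciphertext = AESGCM(aes_key).encrypt(nonce, b"the actual plain text, any length", None)

wrapped_key = public_key.encrypt(                        # RSA encrypts only the small AES key
    aes_key,
    padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                 algorithm=hashes.SHA256(), label=None),
)
# wrapped_key is always 256 bytes for a 2048-bit modulus, regardless of the message length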
I need a hash algorithm that outputs an alphanumeric string that is at most 20 characters long. By "alphanumeric" I mean [a-zA-Z0-9].
Inputs are UUIDs in canonical form (example 550e8400-e29b-41d4-a716-446655440000)
Alternatively, is there a way to convert a SHA1 or MD5 hash to a string with these limitations?
Thanks.
EDIT
Doesn't need to be cryptographically secure. Collisions make data inaccurate, but if they happen sporadically I can live with it.
EDIT 2
I don't know if truncating MD5 or SHA1 would make collisions happen too often. Now I'm wondering whether it's better to truncate an MD5 value or a SHA1 value to 20 chars.
Just clip the characters you don't need from the hash of the GUID. With a good hash function, the unpredictability of any part of the hash is proportional to the part's size. If you want, you can encode it in base 32 instead of the standard hex (base 16). Bear in mind that this will not significantly improve entropy per character (only by 25%).
For non-cryptographic uses, it does not matter whether you truncate MD5, SHA1, or SHA2. None of them has any glaring deficiency in entropy.
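One possible implementation in Python (a sketch; the helper name is made up): hash the UUID, base32-encode the digest (A-Z and 2-7, a subset of alphanumeric), and keep 20 characters, which preserves 20 × 5 = 100 bits of the hash:

import base64, hashlib

def short_id(u):
    digest = hashlib.sha1(u.encode("utf-8")).digest()          # 20-byte SHA-1 digest
    return base64.b32encode(digest).decode("ascii")[:20]       # 32 base32 chars, keep the first 20

print(short_id("550e8400-e29b-41d4-a716-446655440000"))        # 20 characters from [A-Z2-7]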