I need a hash algorithm that outputs an alphanumeric string of at most 20 characters. By "alphanumeric" I mean [a-zA-Z0-9].
Inputs are UUIDs in canonical form (example 550e8400-e29b-41d4-a716-446655440000)
Alternatively, is there a way to convert a SHA1 or MD5 hash to a string within these limits?
Thanks.
EDIT
Doesn't need to be cryptographically secure. Collisions make data inaccurate, but if they happen sporadically I can live with it.
EDIT 2
I don't know whether truncating MD5 or SHA1 would make collisions happen too often. Now I'm wondering whether it's better to truncate an MD5 value or a SHA1 value to 20 chars.
Just clip the characters you don't need from the hash of the GUID. With a good hash function, the unpredictability of any part of the hash is proportional to the part's size. If you want, you can encode it in base 32 instead of the standard hex base 16. Bear in mind that this will not significantly improve entropy per character: 5 bits instead of 4, so only 25% more.
For non-cryptographic uses, it does not matter whether you truncate MD5, SHA1 or SHA2. None of them has any glaring deficiencies in entropy.
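To make the truncation idea concrete, here is a minimal Python sketch. It is only one way to do it, under my own assumptions: SHA-1 as the hash, base 62 rather than base 32 (so the full [a-zA-Z0-9] range is used), and a hypothetical helper name `short_id`. It keeps the 20 low-order base-62 digits of the digest, which is roughly 119 bits.

```python
import hashlib
import string

# Digit set for base 62: 0-9, A-Z, a-z (an arbitrary but fixed ordering).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def short_id(uuid_text: str, length: int = 20) -> str:
    """Hash a UUID string and return exactly `length` base-62 characters."""
    digest = hashlib.sha1(uuid_text.encode("ascii")).digest()
    number = int.from_bytes(digest, "big")
    chars = []
    # Peel off `length` base-62 digits; the rest of the digest is discarded.
    for _ in range(length):
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(chars)


print(short_id("550e8400-e29b-41d4-a716-446655440000"))
```

Swapping `hashlib.sha1` for `hashlib.md5` or `hashlib.sha256` should work the same way, since only the first ~119 bits' worth of digits are kept either way.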
Related
I want to create a unique hash for a given string and I was wondering if there is a difference in duplicate hashes for md5 and sha1.
Let's for the sake of argument assume the following code:
foo = "gdfgkldng"
bar = "fdsfdsf"
md5(foo)
>>>> "25f709d867523ff6958784d399f138d9"
md5(bar)
>>>> "25f709d867523ff6958784d399f138d9"
Is there a difference in the probability of this occurring between sha1 and md5? Also: if I use strings that have a big overlap ("blabla1", "blabla2") is there a difference?
BTW, I am not interested in the security of the algorithms; I just want to create a hash that is as unique as possible.
MD5 has a digest size of 128 bits. SHA-1 has a digest size of 160 bits. Even ignoring discovered weaknesses, MD5 is going to produce more collisions just because it has a smaller output space.
Consider using SHA-256 instead; it has a digest size of 256 bits (obviously), and furthermore hasn't been broken in a meaningful way.
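To put rough numbers on "smaller output space means more collisions", here is a small sketch using the standard birthday-bound approximation. It assumes the hash behaves like a uniformly random function, and the function name `collision_probability` is just something I made up for the illustration.

```python
import math


def collision_probability(num_items: int, bits: int) -> float:
    """Approximate P(at least one collision) among `num_items` random
    `bits`-bit values, using 1 - exp(-n(n-1) / 2^(bits+1))."""
    pairs = num_items * (num_items - 1) / 2
    return -math.expm1(-pairs / 2.0 ** bits)


# One billion hashed strings at several digest sizes.
for bits in (64, 128, 160, 256):
    print(bits, collision_probability(10 ** 9, bits))
```

Under that assumption, the difference between 128 and 160 bits is real but both probabilities are astronomically small for any realistic number of strings; it only starts to matter once the digest is truncated heavily.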
I understand not wanting to use '\0', but all the rest of the extended ASCII range is usable, right?
Wouldn't this provide a much better/more secure/"less colliding" hash?
You're starting from a false premise -- they produce a result that can (does) include all 8-bit values from 0 to 255. Just for example, one of the test vectors for SHA-256 is an input of "abc". The result from this (in hexadecimal) is:
ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad
Just within that test, the result includes bytes with values from 0x00 to 0xff.
For display, that may be (often is) rendered in something like hexadecimal. For transmission in email they're often encoded with something like MIME or UUENCODE. The hash itself, however, is not limited in this way.
Transforming the result this way makes no difference to collision resistance -- you still have 160/256/whatever bits of actual data, but the representation is expanded.
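If you want to see this for yourself, a short Python sketch (Python and Base64 are my choice here, not something from the question) shows the same SHA-256 digest of "abc" as raw bytes, as hex and as Base64. Only the length of the text changes; decoding gives back the same 32 bytes.

```python
import base64
import hashlib

digest = hashlib.sha256(b"abc").digest()    # 32 raw bytes, each 0..255

as_hex = digest.hex()                        # 64 characters, 4 bits each
as_b64 = base64.b64encode(digest).decode()   # 44 characters, 6 bits each

print(len(digest), min(digest), max(digest))
print(as_hex)
print(as_b64)

# Decoding either text form recovers the identical 32 bytes.
assert bytes.fromhex(as_hex) == digest
assert base64.b64decode(as_b64) == digest
```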
The result is just hexadecimal-encoded to make it more readable.
In fact, those hash algorithms are outputting numbers, not strings. They use only letters a-f in combination with numbers 0-9, which makes the output a hexadecimal number.
MD5 produces a 128-bit hash (16 bytes).
SHA, depending on whether it is SHA1 or SHA256, produces either a 160-bit (20-byte) or a 256-bit (32-byte) hash.
Note that I'm talking about binary length/strength. The longer the hash, the less likely a collision is.
The fact that most users stick it into a DB field or whatnot makes it convenient to convert it to ASCII using various binary-to-ASCII conversion algorithms. This should not influence the collision probability at all, since you simply end up with a longer ASCII string.
FWIW I've been using SHA1 and SHA256 in crypto products in binary form for over 5 years, and I'd recommend choosing hashes in the following order, from the strongest to the weakest: SHA256, SHA1, MD5. There is a website that can "reverse" MD5, so I'd strongly advise against it.
I need a hash that can be represented in fewer than 26 chars.
MD5 produces a 32-character string; if I convert it to base 36, how good will it be?
I need the hash not for cryptography but for uniqueness: basically, identifying each input based on the time of input and the input data. Currently I can think of this:
$hash=md5( str_ireplace(".","",microtime()).md5($input_data) ) ;
$unique_id= base_convert($hash,16,36) ;
Should I go with this, or use crc32, which gives a smaller hash but, I'm afraid, won't be that unique?
I think there is a much simpler solution.
According to your statement, you have 26 characters of space. However, to clarify what I understand a character to be and what you understand a character to be, let's do some digging.
The MD5 algorithm, according to Wikipedia, produces 16-byte hashes.
The CRC32 algorithm produces 4-byte hashes.
I understand "characters" (in the simplest sense) to be ASCII characters. Each ASCII character (e.g. A = 65) is stored in one 8-bit byte.
The MD5 algorithm produces 16 bytes * 8 bits per byte = 128 bits; CRC32 produces 32 bits.
You must understand that hashes are not mathematically unique, but "likely to be unique."
So my solution, given your description, would be to then represent the bits of the hash as ascii characters.
If you only have the choice between MD5 and CRC32, the answer would be MD5. But you could also fit a 160-bit SHA-1 hash into a string shorter than 26 characters (its raw bytes would be 20 one-byte characters long).
If you are concerned about the set of symbols each hash uses: in their usual hexadecimal form both hashes stay within [0-9a-f], which is inside [A-Za-z0-9]. The raw bytes, on the other hand, can take any value from 0 to 255.
Finally, when you convert what are essentially numbers from one base to another, the number doesn't change, therefore the strength of the algorithm doesn't change; it just changes the way the number is represented.
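As a sanity check on the base-36 idea, here is a minimal Python sketch (Python only because its integers have arbitrary precision; `to_base36` is a helper I wrote for the illustration, and the input string is made up). As far as I know, PHP's `base_convert` is documented as possibly losing precision on large numbers, so a 128-bit value is worth converting with big-integer arithmetic instead.

```python
import hashlib

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(number: int) -> str:
    """Write a non-negative integer in base 36 without losing precision."""
    if number == 0:
        return "0"
    out = []
    while number:
        number, remainder = divmod(number, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


hex_digest = hashlib.md5(b"example input").hexdigest()   # 32 hex characters
value = int(hex_digest, 16)                               # the same 128-bit number
encoded = to_base36(value)

print(hex_digest, len(hex_digest))
print(encoded, len(encoded))

# Converting back gives the identical number: only the notation changed.
assert int(encoded, 36) == value
```

A 128-bit number needs at most 25 base-36 digits, so the full MD5 value fits under the 26-character limit without truncation.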
Suppose I have only the first 16 characters of a MD5 hash. If I use brute force attack or rainbow tables or any other method to retrieve the original password, how many compatible candidates have I to expect? 1? (I do not think) 10, 100, 1000, 10^12? Even a rough answer is welcome (for the number, but please be coherent with hash theory and methodology).
The output of MD5 is 16 bytes (128 bits). I suppose that you are talking about a hexadecimal representation, hence 32 characters. Thus, "16 characters" means "64 bits". You are considering MD5 with its output truncated to 64 bits.
MD5 accepts inputs up to 2^64 bits in length; assuming that MD5 behaves as a random function, this means that the 2^18446744073709551616 possible input strings will map more or less uniformly among the 2^64 outputs, hence the average number of candidates for a given output is about 2^18446744073709551552, which is close to 10^5553023288523357112.95.
However, if you consider that you can find at least one candidate, then this means that the space of possible passwords that you consider is much reduced. A rainbow table is a special kind of precomputed table which accepts a compact representation (at the expense of a relatively expensive lookup procedure), but if it covers N passwords, then this means that, at some point, someone could apply the hash function N times. In practice, this severely limits the size N. Assuming N = 2^60 (which means that the table builder had about one hundred NVidia GTX 580 GPUs and could run them for six months; also, the table will use quite a lot of hard disks), then, on average, only 1/16th of 64-bit outputs have a matching password in the table. For those passwords which are in the table, there is a 93.75% probability that there is no other password in the table which leads to the same output; if you prefer, if you find a matching password, then you will find, on average, 0.0625 other candidates (i.e. most of the time, no other candidate).
In brief, the answer to your question depends on the size N of the space of possible passwords that you consider (those which were covered during rainbow table construction); but, in practice with Earth-based technology, if you can find one matching password for a 64-bit output, chances are that you will not be able to find another (although there are really many others).
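The arithmetic behind the 1/16 and 93.75% figures can be reproduced with a few lines of Python. This is only a sketch under the same assumptions as above (64-bit truncated output, a table covering N = 2^60 passwords, MD5 behaving like a random function); the variable names are mine.

```python
import math
from fractions import Fraction

OUTPUT_BITS = 64   # MD5 truncated to 16 hex characters
TABLE_BITS = 60    # assumed table coverage, N = 2^60

outputs = 2 ** OUTPUT_BITS
table_size = 2 ** TABLE_BITS

# Fraction of 64-bit outputs expected to have a password in the table.
# It is also (to within 1 / 2^64) the expected number of *other* table
# entries that share a given output.
load = Fraction(table_size, outputs)

# Chance that no other table entry shares the output.
p_alone_linear = 1 - float(load)           # first-order estimate
p_alone_poisson = math.exp(-float(load))   # Poisson estimate

print(load, p_alone_linear, p_alone_poisson)
```

The first-order estimate gives the 93.75% quoted above; the Poisson estimate comes out slightly higher, which does not change the conclusion.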
You should never ever be able to get a password from a partial hash.
I'd like to squeeze or compress the resulting hash value from MD5 or SHA1 in a server-side application so that the client can decompress it. Is this possible? It's a usability issue for my application.
No, hash values cannot be compressed. By design their bits are highly random and have maximum entropy, so there is no redundancy to compress.
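You can check this quickly with a general-purpose compressor. The sketch below uses Python's `zlib` on a SHA-256 digest (both are my choice of example, not something the question specified); the "compressed" form comes out longer than the input because the container overhead is added and nothing is saved.

```python
import hashlib
import zlib

digest = hashlib.sha256(b"abc").digest()   # 32 high-entropy bytes
compressed = zlib.compress(digest, 9)      # level 9 = maximum compression

print(len(digest), len(compressed))

# The round trip still works; it just does not save any space.
assert zlib.decompress(compressed) == digest
```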
If you want to make the hash values easier to read for users you can use different tricks, such as:
Displaying fewer digits. Instead of 32 digits just show 16.
Using a different base. For instance, if you used base 62 using all the uppercase and lowercase letters plus numbers 0-9 as digits then you could show a 128-bit hash using 22 letters+digits versus 32 hex digits:
log_62(2^128) ≈ 21.5
Adding whitespace or punctuation. You'll commonly see CD keys printed with dashes like AX7T4-BZ41O-JK3FF-QOZ96. It's easier for users to read this than 20 digits all jammed together.
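A rough Python sketch of the last two tricks, in case it helps. The helper names `digits_needed` and `group` are hypothetical, and the group size of 5 simply mirrors the CD-key example above.

```python
import math


def digits_needed(bits: int, base: int) -> int:
    """Characters needed to write any `bits`-bit value in the given base."""
    return math.ceil(bits / math.log2(base))


def group(text: str, size: int = 5, sep: str = "-") -> str:
    """Insert `sep` after every `size` characters to aid readability."""
    return sep.join(text[i:i + size] for i in range(0, len(text), size))


print(digits_needed(128, 16))   # hex digits for a 128-bit hash
print(digits_needed(128, 62))   # base-62 digits for the same hash
print(group("AX7T4BZ41OJK3FFQOZ96"))
```

The grouping is purely cosmetic: strip the separators before comparing or decoding the value.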
Hash values are quite short; attempting compression on these (quite random and highly varied) values is difficult and inefficient. If you want to save space, truncating the value could help, but keep in mind that if you do this, you increase collision space (and decrease key space).