I need a hash that can be represented in fewer than 26 characters.
MD5 produces a 32-character string. If I convert it to base 36, how good will it be?
I need the hash not for cryptography but for uniqueness: identifying each input based on the time of input and the input data. Currently I am thinking of something like this:
$hash=md5( str_ireplace(".","",microtime()).md5($input_data) ) ;
$unique_id= base_convert($hash,16,36) ;
Should I go with this, or use crc32, which gives a smaller hash but which I am afraid won't be unique enough?
I think a much simpler solution is possible.
According to your statement, you have 26 characters of space. However, to clarify what I understand a character to be and what you understand a character to be, let's do some digging.
According to Wikipedia, MD5 produces 16-byte hashes.
The CRC32 algorithm produces 4-byte hashes.
I understand "characters" (in the simplest sense) to be ASCII characters. Each ASCII character (e.g. A = 65) is stored in one 8-bit byte.
MD5 therefore produces 16 bytes * 8 bits per byte = 128 bits; CRC32 produces 32 bits.
You must understand that hashes are not mathematically unique, but "likely to be unique."
So my solution, given your description, would be to represent the bits of the hash directly as 8-bit characters.
If you only have the choice between MD5 and CRC32, the answer would be MD5. But you could also fit a 160-bit SHA-1 hash into a string of fewer than 26 characters (it would be 20 bytes long).
If you are concerned about the set of symbols each hash uses: the usual hexadecimal output of both hashes uses only [0-9a-f], which lies within [A-Za-z0-9]. The raw bytes, however, can take any value from 0 to 255, so they are not all printable ASCII.
Finally, when you convert what are essentially numbers from one base to another, the number doesn't change, therefore the strength of the algorithm doesn't change; it just changes the way the number is represented.
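To make the base-conversion point concrete, here is a minimal sketch in Python (not the asker's PHP). The names to_base36 and unique_id are hypothetical helpers introduced for illustration. It treats the 128-bit MD5 digest as one integer and rewrites it in base 36, which never needs more than 25 characters because 36^25 > 2^128.

```python
import hashlib
import time

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n: int) -> str:
    """Encode a non-negative integer using the digits 0-9 and a-z."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def unique_id(input_data: bytes, timestamp: str) -> str:
    """MD5 over timestamp + data, re-encoded in base 36 and padded to 25 chars."""
    digest = hashlib.md5(timestamp.encode() + input_data).digest()
    return to_base36(int.from_bytes(digest, "big")).rjust(25, "0")

# The timestamp source is an assumption; any sufficiently fine clock would do.
uid = unique_id(b"some input", str(time.time_ns()))
print(uid, len(uid))
```

The base-36 string decodes back to exactly the same integer as the hex string, so the collision behaviour is that of MD5 itself; only the spelling is shorter.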
I am looking for a hashing algorithm that generates alphanumeric output. I ran a few tests with MD5, SHA3, etc., and they produce hexadecimal output.
Example:
Input: HelloWorld
Output[sha3_256]: 92dad9443e4dd6d70a7f11872101ebff87e21798e4fbb26fa4bf590eb440e71b
The first character in the above output is 9. Since the output is in hex format, the possible values for each character are [0-9][a-f].
I am trying to get the widest possible range of values for the first character: [0-9][a-z][A-Z].
Any ideas would be appreciated. Thanks in advance.
MD5 computes a 128-bit hash and SHA256 a 256-bit hash; the output they provide is nothing more than a binary number that is 128 or 256 bits long, respectively. In short, that is a lot of zeros and ones. To get a more human-friendly representation of binary-coded values, software developers and system designers use hexadecimal numbers, which is a representation in base(16). For example, an 8-bit byte can have values ranging from 00000000 to 11111111 in binary form, which can be conveniently represented as 00 to FF in hexadecimal.
You could convert this binary number into a base(32) if you want. This is represented using the characters "A-Z2-7". Or you could use base(64) which needs the characters "A-Za-z0-9+/". In the end, it is just a representation.
There is, however, some practical use to base(16) or hexadecimal. In computer lingo, a byte is 8 bits and a word consists of two bytes (16 bits). All of these can be comfortably represented hexadecimally, as 2^8 = 2^4 × 2^4 = 16 × 16, whereas 2^8 = 2^5 × 2^3 = 32 × 8. Hence, in base(32), a byte is not cleanly represented. You already need 5 bytes to have a clean base(32) representation with 8 characters. That is not comfortable to deal with on a daily basis.
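As an illustration of "it is just a representation", the following Python sketch prints one and the same SHA-256 digest in base 16, base 32 and base 64 using the standard library. The input string is taken from the question; SHA-256 is used here rather than SHA3 purely as an example.

```python
import base64
import hashlib

digest = hashlib.sha256(b"HelloWorld").digest()   # 32 raw bytes

as_hex = digest.hex()                             # alphabet 0-9a-f
as_b32 = base64.b32encode(digest).decode()        # alphabet A-Z2-7, '=' padding
as_b64 = base64.b64encode(digest).decode()        # alphabet A-Za-z0-9+/, '=' padding

for name, text in (("hex", as_hex), ("base32", as_b32), ("base64", as_b64)):
    print(f"{name:7} {len(text):3} chars  {text}")

# All three decode back to the identical 32 bytes.
assert bytes.fromhex(as_hex) == base64.b32decode(as_b32) == base64.b64decode(as_b64)
```

The larger the alphabet, the shorter the string, but the underlying 256 bits are unchanged.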
What is the $key_length in PBKDF2?
It says that it will be derived from the input, but I see people using key_lengths of 256 and greater, but when I enter 256 as a key_length the output is 512 characters. Is this intentional? Can I safely use 64 as the key_length so the output is 128 characters long?
$key_length is the number of output bytes that you want from PBKDF2. (Note that if key_length is larger than the output size of the hash algorithm, the whole process is repeated for each additional block of output, slowing down the hashing perhaps more than you want. SHA256 gives 32 bytes of output, for example, so asking for 33 bytes will take roughly twice as long as asking for 32.)
The doubling of the length that you mention is because the code converts the output bytes to hexadecimal (i.e. 2 characters per 1 byte) unless you specify $raw_output = true. The test vectors included specify $raw_output = false, since hexadecimal is simply easier to work with and post online. Depending on how you are storing the data in your application, you can decide if you want to store the results as hex, base64, or just raw binary data.
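The same relationship between key length and output length can be seen with Python's standard hashlib.pbkdf2_hmac, where the dklen argument plays the role of $key_length. This is a sketch, not the PHP function from the question; the password, salt and iteration count are placeholder values chosen only for illustration.

```python
import hashlib

password = b"correct horse"   # placeholder value
salt = b"example-salt"        # placeholder; use a random per-password salt in practice
iterations = 1000             # illustrative only, far too low for real use

# dklen is the desired number of output *bytes*.
raw = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=64)
hex_form = raw.hex()          # hex encoding: 2 characters per byte

print(len(raw), len(hex_form))
```

Asking for 64 bytes yields 64 raw bytes, which become 128 characters once hex-encoded.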
The IETF's Password-Based Cryptography Specification Version 2.0 defines the key length as
"intended length in octets of the derived key, a positive integer, at most (2^32 - 1) * hLen"
Here hLen denotes the length in octets of the pseudorandom function output. For further details on PBKDF2, you can refer to "How to store passwords securely with PBKDF2".
I understand not wanting to use '\0', but everything else in the extended ASCII range is usable, right?
Wouldn't this provide a much better/more secure/"less colliding" hash?
You're starting from a false premise -- they produce a result that can (and does) include all 8-bit values from 0 to 255. Just for example, one of the test vectors for SHA-256 is an input of "abc". The result from this (in hexadecimal) is:
ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad
Just within that test, the result includes bytes with values from 0x00 to 0xff.
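This can be checked directly. The short Python sketch below recomputes the SHA-256 test vector for "abc" and reports the smallest and largest byte values in the raw digest.

```python
import hashlib

digest = hashlib.sha256(b"abc").digest()   # raw 32-byte result, not hex text

print(digest.hex())
print("smallest byte:", hex(min(digest)))
print("largest byte: ", hex(max(digest)))
```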
For display, that may be (often is) rendered in something like hexadecimal. For transmission in email they're often encoded with something like MIME or UUENCODE. The hash itself, however, is not limited in this way.
Transforming the result this way makes no difference to collision resistance -- you still have 160/256/whatever bits of actual data, but the representation is expanded.
The result is just hexadecimal-encoded to make it more readable.
In fact, those hash algorithms output numbers, not strings. The usual text form uses only the letters a-f in combination with the digits 0-9, which makes the output a hexadecimal number.
MD5 produces a 128-bit (16-byte) hash.
SHA, depending on whether it is SHA1 or SHA256, produces either a 160-bit (20-byte) or a 256-bit (32-byte) hash.
Note that I'm talking about binary length/strength. The longer the hash, the less likely a collision is.
Because most users store the hash in a database field or similar, it is convenient to convert it to ASCII using one of various binary-to-ASCII encodings. This does not affect the collision probability at all; you simply end up with a longer ASCII string.
For what it's worth, I've been using SHA1 and SHA256 in binary form in crypto products for over 5 years, and I'd recommend choosing hashes in the following order, from strongest to weakest: SHA256, SHA1, MD5. There are websites that can "reverse" MD5 hashes of common inputs via lookup tables, so I'd strongly advise against it.
I need a hash algorithm that outputs an alphanumeric string that is at most 20 characters long. By "alphanumeric" I mean [a-zA-Z0-9].
Inputs are UUIDs in canonical form (example: 550e8400-e29b-41d4-a716-446655440000).
Alternatively, is there a way to convert a SHA1 or MD5 hash to a string with these limitations?
Thanks.
EDIT
Doesn't need to be cryptographically secure. Collisions make data inaccurate, but if they happen sporadically I can live with it.
EDIT 2
I don't know if truncating MD5 or SHA1 would make collisions happen too often. Now I'm wondering whether it's better to truncate an MD5 value or a SHA1 value to 20 chars.
Just clip the characters you don't need from the hash of the GUID. With a good hash function, the unpredictability of any part of the hash is proportional to the part's size. If you want, you can encode it in base 32 instead of the standard hex base 16. Bear in mind that this will not significantly improve entropy per character (only by 25%, from 4 bits to 5 bits).
For non-cryptographic uses, it does not matter whether you truncate MD5, SHA1 or SHA2. None of them has any glaring deficiencies in entropy.
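A minimal Python sketch of the clip-and-re-encode idea, under stated assumptions: short_id is a hypothetical helper name, SHA-1 is an arbitrary choice among the hashes mentioned, and base 32 is used so that every character falls within [A-Z2-7], a subset of [a-zA-Z0-9].

```python
import base64
import hashlib

def short_id(uuid_text: str, length: int = 20) -> str:
    """Hash a canonical UUID string and keep the first `length` base-32 characters."""
    digest = hashlib.sha1(uuid_text.encode("ascii")).digest()     # 20 bytes
    # 20 bytes encode to exactly 32 base-32 characters, so no '=' padding appears.
    encoded = base64.b32encode(digest).decode("ascii")
    return encoded[:length]

sid = short_id("550e8400-e29b-41d4-a716-446655440000")
print(sid, len(sid))
```

At 5 bits per character, 20 characters retain 100 bits of the hash, compared with 80 bits for 20 hex characters.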
I'd like to squeeze or compress the resulting hash value from MD5 or SHA1 in a server-side application so that the client can decompress it. Is this possible? It's a usability issue for my application.
No, hash values cannot be compressed. By design their bits are highly random and have maximum entropy, so there is no redundancy to compress.
If you want to make the hash values easier to read for users you can use different tricks, such as:
Displaying fewer digits. Instead of 32 digits just show 16.
Using a different base. For instance, if you used base 62 using all the uppercase and lowercase letters plus numbers 0-9 as digits then you could show a 128-bit hash using 22 letters+digits versus 32 hex digits:
log_62(2^128) ≈ 21.5
Adding whitespace or punctuation. You'll commonly see CD keys printed with dashes like AX7T4-BZ41O-JK3FF-QOZ96. It's easier for users to read this than 20 digits all jammed together.
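The base-62 and grouping tricks above can be sketched in Python as follows. The helper names to_base62 and grouped, and the digit ordering of the alphabet, are assumptions made for illustration; there is no single standard base-62 alphabet.

```python
import hashlib
import string

# Assumed digit order: 0-9, then A-Z, then a-z.
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase

def to_base62(data: bytes) -> str:
    """Interpret bytes as a big-endian integer and write it in base 62."""
    n = int.from_bytes(data, "big")
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(BASE62[rem])
    return "".join(reversed(out)) or "0"

def grouped(text: str, size: int = 5, sep: str = "-") -> str:
    """Insert a separator every `size` characters for readability."""
    return sep.join(text[i:i + size] for i in range(0, len(text), size))

digest = hashlib.md5(b"example").digest()       # 128 bits
encoded = to_base62(digest).rjust(22, "0")      # always 22 characters
print(encoded)
print(grouped(encoded))
```

Padding to 22 characters keeps every identifier the same width, since 62^22 is larger than 2^128.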
Hash values are quite short; attempting compression on these (quite random and highly varied) values is difficult and inefficient. If you want to save space, truncating the value could help, but keep in mind that doing so makes collisions more likely (because it shrinks the key space).