Can an MD5 hash have ONLY numbers or ONLY letters in it? - hash

I have been researching this but I am still clueless.
I know that an MD5 hash can contain both numbers and letters, but if I ever hit a case where an MD5 hash has only numbers or only letters, it currently breaks my script.

The first few strings that give a digits-only MD5 hash:
ximaz : 61529519452809720693702583126814
aalbke : 55203129974456751211900188750366
afnnsd : 49716523209578759475317816476053
aooalg : 68619150135523129199070648991237
bzbkme : 69805916917525281143075153085385
Here's one with only letters:
cbaabcdljdac : cadbfdfecdcdcdacdbbbfadbcccefabd

You have 32 hexadecimal digits. If we assume all digit values are equally likely, there are 10^32 combinations made of numeric digits only, 6^32 combinations made of alphabetic digits only, and 16^32 combinations in total.
That gives a probability of (10^32 + 6^32) / 16^32 that your script will fail on each invocation.
echo "scale=10;(10^32 + 6^32) / 16^32" | bc
.0000002938
So once in about 3.4 million cases it will fail. How often do you want to use it?
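The same arithmetic can be reproduced exactly in Python. This is only a sketch of the calculation above; the function name `single_class_probability` is made up for illustration, and it relies on the same assumption that every hex digit is equally likely.

```python
from fractions import Fraction


def single_class_probability(length: int = 32) -> Fraction:
    """Probability that a uniformly random hex string of `length` digits
    consists only of 0-9 or only of a-f."""
    digits_only = 10 ** length   # every position is one of 0-9
    letters_only = 6 ** length   # every position is one of a-f
    total = 16 ** length         # every position is any hex digit
    return Fraction(digits_only + letters_only, total)


if __name__ == "__main__":
    p = single_class_probability()
    print(float(p))        # roughly 2.94e-07
    print(round(1 / p))    # roughly one in 3.4 million
```

Nearly all of that probability comes from the digits-only case; the letters-only term is many orders of magnitude smaller.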

Theoretically, yes, an MD5 hash (when converted to a hexadecimal string) could contain only decimal digits or only letters.
In practice, also yes: the string ximaz yields an MD5 hash of 61529519452809720693702583126814. Try it!
(Thanks to PHP Sadness for the example)

MD5 was intended to be a good hash function (it is now broken and should not be used in security applications), which means it produces random-looking output so that all possible values in the output space are used. Those letters and numbers are the hex representation of the output. Yes, sometimes you will get output that consists of letters only or numbers only, but most of the time you will have both.
If I had to parse hex representations of MD5, I would certainly take the time to support those rather rare cases where the output is letters only or numbers only.
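A parser that handles those rare cases only needs to check for 32 hexadecimal characters and must not require a mix of letters and digits. Here is a minimal sketch in Python; the names `is_md5_hex` and `describe` are hypothetical helpers, not part of any library.

```python
import hashlib
import re

# 32 hexadecimal characters, accepting either case.
MD5_HEX = re.compile(r"[0-9a-fA-F]{32}")


def is_md5_hex(text: str) -> bool:
    """True if text looks like the hex form of an MD5 digest."""
    return MD5_HEX.fullmatch(text) is not None


def describe(text: str) -> str:
    """Classify a candidate digest without rejecting the rare uniform cases."""
    if not is_md5_hex(text):
        return "invalid"
    if text.isdigit():
        return "digits only"
    if text.isalpha():
        return "letters only"
    return "mixed"


if __name__ == "__main__":
    for word in (b"foo", b"ximaz"):
        digest = hashlib.md5(word).hexdigest()
        print(word, digest, describe(digest))
```

All three kinds of output are valid digests, so only "invalid" should be treated as an error.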

I know this is a very old question, but I found three more strings with only numbers in their md5 hashes, and Google couldn't find anything while searching these hashes so I thought it might be worth posting these:
Ioktak : 54948232518148653519995784773259
'99x\`b0x\'b : 24034969117462298298932307218853
uttuJ## : 74616072929762262275291990931711

I believe you are working with the hex representation of the MD5 hashes. MD5 hashes are actually 128-bit strings. Most tools print them with the hex-representation which amounts to 32 hexadecimal digits. Hexadecimal digits use 0-9 and a-f.
Example:
susam@swift:~$ echo -n "foo" | md5sum
acbd18db4cc2f85cedef654fccc4a4d8 -

Related

Are MD5 hashes always either capital or lowercase?

I'm passing an HMAC-MD5 encoded parameter into a form and the vendor is returning it as invalid. However, it matches what their hash generator gives me, with the exception of capitalization on the letters. What I did to get around this was use an lcase command. I'm wondering if this will cause me trouble later. Coldfusion generates the hashed string in capital letters, the vendor always seems to use lowercase; is it always one or the other or will they ever be mixed?
MD5, like every other hash function, produces binary output; in the case of MD5 it is 16 bytes.
Because those bytes are awkward to handle, they are encoded as a string. MD5 digests are usually encoded as 32 lowercase hexadecimal digits, so every byte is represented by 2 characters.
Whether the target system accepts uppercase encodings, lowercase encodings, or both is up to that system. It is unrelated to the hash function: both are different representations of the same MD5 hash. So to answer your question, format the output the way the target system requires.
While RFC-1321 MD5 Message-Digest Algorithm doesn't discuss hexadecimal string encoding, the test suite does show results in lowercase.
The MD5 test suite (driver option "-x") should print the following results:
MD5 test suite:
MD5 ("") = d41d8cd98f00b204e9800998ecf8427e
MD5 ("a") = 0cc175b9c0f1b6a831c399e269772661
MD5 ("abc") = 900150983cd24fb0d6963f7d28e17f72
MD5 ("message digest") = f96b697d7cb7938d525a2f31aaf161d0
MD5 ("abcdefghijklmnopqrstuvwxyz") = c3fcd3d76192e4007dfb496cca67e13b
MD5 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") =
d174ab98d277d9f5a5611c2c9f419d9f
MD5 ("123456789012345678901234567890123456789012345678901234567890123456
78901234567890") = 57edf4a22be3c955ac49da2e2107b67a
Lowercase is simply the outcome of C/C++ printf() format specifier %02x, not a requirement: "should print", not "must print".
Ref: RFC-1321 Appendix A.5 Test suite
A hex string can contain anything in the 0-9 and a-f, A-F range, so you should anticipate both upper and lower-case versions.
If you're really stuck trying to interface between two highly opinionated systems, force upper or lower case depending on your requirements.
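If you control the comparison yourself, one way to sidestep the case question is to decode both hex strings back to raw bytes and compare those. A minimal Python sketch follows; `same_digest` is a hypothetical helper name, and `hmac.compare_digest` is used here only because it is a convenient constant-time comparison in the standard library.

```python
import hmac


def same_digest(hex_a: str, hex_b: str) -> bool:
    """Compare two hex-encoded digests, ignoring letter case.

    bytes.fromhex accepts both upper- and lowercase hex digits and raises
    ValueError if either argument is not valid hex.
    """
    return hmac.compare_digest(bytes.fromhex(hex_a), bytes.fromhex(hex_b))


if __name__ == "__main__":
    upper = "D41D8CD98F00B204E9800998ECF8427E"
    lower = "d41d8cd98f00b204e9800998ecf8427e"
    print(same_digest(upper, lower))
```

When the other side does a plain string comparison, you still have to send the case it expects, for example with `.lower()`.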

Dovecot password hashing

Can anyone tell me how the Dovecot administration tool (doveadm pw) hashes passwords when using SHA-512. $6$ indicates SHA-512, followed by a salt, then the hash. How exactly does Dovecot generate the salt? Does it use an own algorithm? As far as I can see it uses /dev/random or /dev/urandom, but how does it deal with non-ASCII characters?
Never mind, I found the answer in password-scheme.c.
It reads data from /dev/urandom and has an array with allowed characters (static const char salt_chars[] = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";).
Each salt character is picked from that array: it takes a byte from /dev/urandom modulo (the length of salt_chars - 1), that is, the number of characters excluding the terminating NUL, and uses the result as an index into salt_chars. Since every random byte is mapped onto this ASCII alphabet, non-ASCII characters never appear in the salt.
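The approach described above can be sketched in Python. This is an illustration of the technique, not Dovecot's actual code: `generate_salt` and its default length of 16 are assumptions made for the example, and `os.urandom` stands in for reading /dev/urandom directly.

```python
import os

# Same alphabet as the salt_chars array quoted above (64 characters).
SALT_CHARS = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def generate_salt(length: int = 16) -> str:
    """Map each random byte onto SALT_CHARS by taking it modulo the alphabet size.

    The default length of 16 is an assumption for this sketch.
    """
    random_bytes = os.urandom(length)
    return "".join(SALT_CHARS[b % len(SALT_CHARS)] for b in random_bytes)


if __name__ == "__main__":
    print(generate_salt())
```

Because the alphabet has 64 entries and a byte has 256 possible values, the modulo maps exactly four byte values to each character, so the choice is unbiased.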

Correct Hashing Algorithm/Function

Are there any secure hashing algorithms/functions whose output uses all the letters and numbers, and not just 0-9 and a-f?
So the output could contain: 0-9, a-z, A-Z and even some symbols.
Any hashing algorithm, really.
Hexadecimal is just a common representation for them. Look at this code snippet (using perl, because you didn't tag a programming language):
use Digest::MD5 qw/md5 md5_hex/;
use MIME::Base64;
my $str = 'Foobar';
# Hexadecimal representation
print md5_hex($str),"\n";
# Base64 encoded representation
print encode_base64(md5($str));
Output:
89d5739baabbbe65be35cbe61c88e06d
idVzm6q7vmW+NcvmHIjgbQ==
The first output is the hexadecimal representation of the MD5 digest of the string; the second is the Base64 encoded representation of the raw digest.
This would work with any digesting algorithm. It does not, however, affect how secure the underlying algorithm actually is.
Use your favorite hashing algorithm/function and convert the output to base64. A mechanism to do that in Java is here: how to convert hex to base64.
Note that the hash value will still be the same, but the presentation will be different. If there's a reason you want to use a fuller symbol set, perhaps you could edit your question.
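For comparison, the same idea in Python: compute the raw digest once and present it in whichever encoding you need. This is a minimal sketch; `digest_representations` is a hypothetical helper name, and SHA-256 is only an example default.

```python
import base64
import hashlib


def digest_representations(data: bytes, algorithm: str = "sha256") -> dict:
    """Return several text encodings of the same raw digest."""
    raw = hashlib.new(algorithm, data).digest()
    return {
        "hex": raw.hex(),                                  # 0-9, a-f
        "base32": base64.b32encode(raw).decode("ascii"),   # A-Z, 2-7, '=' padding
        "base64": base64.b64encode(raw).decode("ascii"),   # A-Z, a-z, 0-9, '+', '/', '='
    }


if __name__ == "__main__":
    for name, text in digest_representations(b"Foobar").items():
        print(name, text)
```

Every entry decodes back to the same bytes, so the choice of encoding changes only the alphabet and the length of the string, not the strength of the hash.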

Reading and storing numbers in Perl without a loss of precision (Perl)

I have a few numbers in a file in a variety of formats: 8.3, 0.001, 9e-18. I'm looking for an easy way to read them in and store them without any loss of precision. This would be easy in AWK, but how's it done in Perl? I'm only open to using Perl. Thanks!
Also, I was wondering if there's an easy way to print them in an appropriate format. For example, 8.3 should be printed as "8.3" not "8.3e0"
If they're text strings, then reading them into Perl as strings and writing them back out as strings shouldn't result in any loss of precision. If you have to do arithmetic on them, then I suggest installing the CPAN module Math::BigFloat to ensure that you don't lose any precision to rounding.
As to your second question, Perl doesn't do any reformatting unless you ask it to:
$ perl -le 'print 8.3'
8.3
Am I missing something?
From http://perldoc.perl.org/perlnumber.html:
Perl can internally represent numbers in 3 different ways: as native
integers, as native floating point numbers, and as decimal strings.
Decimal strings may have an exponential notation part, as in
"12.34e-56" . Native here means "a format supported by the C compiler
which was used to build perl".
This means that printing the number out depends on how the number is stored internal to perl, which means, in turn, that you have to know how the number is represented on input.
By and large, Perl will just do the right thing, but you should know what compiler was used, how it represents numbers internally, and how to print those numbers. For example:
$ perldoc -f int
int EXPR
int Returns the integer portion of EXPR. If EXPR is omitted, uses $_. You should
not use this function for rounding: one because it truncates towards 0, and two
because machine representations of floating-point numbers can sometimes produce
counterintuitive results. For example, "int(-6.725/0.025)" produces -268 rather than
the correct -269; that's because it's really more like -268.99999999999994315658
instead. Usually, the "sprintf", "printf", or the "POSIX::floor" and
"POSIX::ceil" functions will serve you better than will int().
I think that if you want to read a number in explicitly as a string, your best bet would be to use unpack() with the 'A*' format.

How can I convert the tiger hash values from the official implementations into the form used by Direct Connect?

I am trying to implement a Direct Connect Client, and I am currently stuck at a point where I need to hash the files in order to be able to upload them to other clients.
All other clients require TTHL (Tiger Tree Hashing Leaves) support for verifying downloaded data, so I searched for implementations of the algorithm and found tiger-hash-python.
I have implemented a routine that uses the hash function from before, and is able to hash large files, according to the logic specified in Tree Hash EXchange format (THEX) (basically, the tree diagram is the important part on that page).
However, the value produced by it is similar to those shown on Wikipedia, a hex digest, but is different from those shown in the DC clients I'm using for reference.
I have been unable to find out how the hex digest form is converted to this other one (39 characters, A-Z, 0-9). Could someone please explain how that is done?
Well ... I tried what Paulo Ebermann said, using the following functions:
import base64
import math
import tiger  # the tiger-hash-python module mentioned above

def strdivide(seq, length):
    """Split seq into consecutive blocks of `length` items; the last block may be shorter."""
    result = []
    # ceil(len(seq) / length) blocks cover the whole sequence, including a short final block
    for i in range(int(math.ceil(float(len(seq)) / length))):
        result.append(seq[i*length:(i+1)*length])
    return result

def dchash(data):
    result = tiger.hash(data)  # 48-character hex digest
    # Representation transform: reverse the byte order inside each 8-byte (16 hex character) word
    result = "".join("".join(strdivide(result[i:i+16], 2)[::-1]) for i in range(0, 48, 16))
    # Convert every 2 hex characters into 1 raw byte
    bits = bytes(bytearray(int(c, 16) for c in strdivide(result, 2)))
    result = base64.b32encode(bits)  # 40 characters, the last one being '=' padding
    return result[:-1]  # drop the trailing '='
The TTH for an empty file was found to be 8B630E030AD09E5D0E90FB246A3A75DBB6256C3EE7B8635A, which after the transformation specified here, becomes 5D9ED00A030E638BDB753A6A24FB900E5A63B8E73E6C25B6. Base-32 encoding this result yielded LWPNACQDBZRYXW3VHJVCJ64QBZNGHOHHHZWCLNQ, which was found to be what DC++ generates.
The only mention of the format of the hash in the Direct Connect protocol I found is on the $SR page on the NMDC Protocol wiki:
For files containing TTH, the <hub_name> parameter is replaced with TTH:<base32_encoded_tth_hash> (ref: TTH_Hash).
So, it is Base32-encoding. This is defined in RFC 4648 (and some earlier ones), section 6.
Basically, Base32 uses the capital letters A-Z and the decimal digits 2 to 7. One base32 digit represents 5 bits, while one base16 (hexadecimal) digit represents only 4.
This means every 5 hex digits map to 4 base32 digits, and a Tiger hash (192 bits) needs 39 base32 digits. The official encoding appends one = padding character to reach 40, which seems to be omitted if you say that there are always 39 characters.
I don't know of an implementation of a conversion from hex (or bytes) to base32, but it shouldn't be too complicated with a lookup table and some bit-shifting.
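A lookup-table conversion along those lines might look like the following Python sketch. `base32_nopad` is a hypothetical helper written for illustration; Python's standard library also offers `base64.b32encode`, which the example uses as a cross-check. The input is the already byte-reordered hex value for the empty file quoted earlier in this thread.

```python
import base64

# RFC 4648 base32 alphabet: index 0-25 -> A-Z, 26-31 -> 2-7.
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def base32_nopad(raw: bytes) -> str:
    """Encode raw bytes as base32 without the trailing '=' padding."""
    out = []
    buffer = 0   # holds bits that have not been emitted yet
    bits = 0     # number of valid bits currently in buffer
    for byte in raw:
        buffer = (buffer << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append(BASE32_ALPHABET[(buffer >> bits) & 0x1F])
        buffer &= (1 << bits) - 1   # keep only the leftover low bits
    if bits:
        # Pad the final partial group with zero bits on the right.
        out.append(BASE32_ALPHABET[(buffer << (5 - bits)) & 0x1F])
    return "".join(out)


if __name__ == "__main__":
    reordered = bytes.fromhex("5D9ED00A030E638BDB753A6A24FB900E5A63B8E73E6C25B6")
    encoded = base32_nopad(reordered)
    print(encoded)
    # Cross-check against the standard library, minus its '=' padding.
    print(encoded == base64.b32encode(reordered).decode("ascii").rstrip("="))
```

For a 24-byte Tiger digest this yields 39 characters: 38 full 5-bit groups plus one final group built from the remaining 2 bits.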