How can I convert a message into a hash value using SHA/MD5 hashing in MATLAB? Is there any built-in function or any fixed code?
There are no built-in functions in MATLAB to calculate hashes. However, you can call Java (any OS) or .NET (Windows only) functions directly from MATLAB, and either of these implements what you want.
Note that you haven't specified the encoding of the string. The hash is different if you consider the string in ASCII, UTF-8, UTF-16, etc.
Also note that MATLAB does not have 160-bit or 256-bit integer types, so the hash obviously cannot be a single integer.
Anyway, using .NET:
SHA256
string = 'some string';
sha256hasher = System.Security.Cryptography.SHA256Managed;
sha256 = uint8(sha256hasher.ComputeHash(uint8(string)));
dec2hex(sha256)
SHA1
sha1hasher = System.Security.Cryptography.SHA1Managed;
sha1= uint8(sha1hasher.ComputeHash(uint8(string)));
dec2hex(sha1)
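If a specific encoding matters (see the note above), a hedged variant is to produce the UTF-8 bytes explicitly with unicode2native before hashing; uint8() only matches UTF-8 for plain ASCII text:
string = 'some string';
utf8_bytes = unicode2native(string, 'UTF-8');   % explicit UTF-8 encoding of the text
sha256hasher = System.Security.Cryptography.SHA256Managed;
dec2hex(uint8(sha256hasher.ComputeHash(utf8_bytes)))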
Java based solution can be found in the following link
https://www.mathworks.com/matlabcentral/answers/45323-how-to-calculate-hash-sum-of-a-string-using-java
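For reference, a minimal sketch of the Java route using the standard java.security.MessageDigest class (no extra class files needed); it assumes plain ASCII text, since uint8() is used to get the bytes:
md = java.security.MessageDigest.getInstance('SHA-256');          % or 'SHA-1', 'MD5'
hash_bytes = typecast(md.digest(uint8('some string')), 'uint8');  % digest() returns signed Java bytes
dec2hex(hash_bytes)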
MATLAB's .NET classes appear to be a more recent addition than the Java hashing route.
However, these classes don't have much (if any) public MATLAB documentation available. After playing with them a bit, I found a way to specify one of several hash algorithms, as desired.
The System.Security.Cryptography.HashAlgorithm.Create factory method accepts a hash algorithm name (a string). Based on the name you pass in, it returns a different hasher class (SHA256Managed is only one type). See the example below for a complete string input ==> hash string output generation.
% Available options are 'SHA1', 'SHA256', 'SHA384', 'SHA512', 'MD5'
algorithm = 'SHA1';
% SHA1 category (the DEFAULT):
%   hasher = System.Security.Cryptography.HashAlgorithm.Create('SHA1');
% SHA2 category:
%   hasher = System.Security.Cryptography.HashAlgorithm.Create('SHA256');
%   hasher = System.Security.Cryptography.HashAlgorithm.Create('SHA384');
%   hasher = System.Security.Cryptography.HashAlgorithm.Create('SHA512');
% SHA3 category: does not appear to be supported
% MD5 category:
%   hasher = System.Security.Cryptography.HashAlgorithm.Create('MD5');
% Create the hasher that matches the algorithm chosen above:
hasher = System.Security.Cryptography.HashAlgorithm.Create(algorithm);
% GENERATING THE HASH:
str = 'Now is the time for all good men to come to the aid of their country';
hash_byte = hasher.ComputeHash( uint8(str) ); % System.Byte class
hash_uint8 = uint8( hash_byte ); % Array of uint8
hash_hex = dec2hex(hash_uint8, 2); % Array of 2-char hex codes (the 2 forces two digits per byte)
% Generate the hex codes as 1 long series of characters
nBytes = size(hash_hex, 1);
hashStr = reshape(hash_hex', 1, []);   % concatenate the 2-char rows in order
fprintf(1, '\n\tThe %s hash is: "%s" [%d bytes]\n\n', algorithm, hashStr, nBytes);
% SIZE OF THE DIFFERENT HASHES:
% SHA1: 20 bytes = 20 hex codes = 40 char hash string
% SHA256: 32 bytes = 32 hex codes = 64 char hash string
% SHA384: 48 bytes = 48 hex codes = 96 char hash string
% SHA512: 64 bytes = 64 hex codes = 128 char hash string
% MD5: 16 bytes = 16 hex codes = 32 char hash string
REFERENCES:
1) https://en.wikipedia.org/wiki/SHA-1
2) https://defuse.ca/checksums.htm#checksums
I just used this and it works well. It works on strings, files, and different data types.
For a file, I compared against the CRC SHA tool in Windows File Explorer and got the same answer.
https://www.mathworks.com/matlabcentral/fileexchange/31272-datahash
I understand that you have a hex string and perform SHA256 on it twice and then byte-swap the final hex string. The goal of this code is to find a Merkle Root by concatenating two transactions. I would like to understand what's going on in the background a bit more. What exactly are you decoding and encoding?
import hashlib
transaction_hex = "93a05cac6ae03dd55172534c53be0738a50257bb3be69fff2c7595d677ad53666e344634584d07b8d8bc017680f342bc6aad523da31bc2b19e1ec0921078e872"
transaction_bin = transaction_hex.decode('hex')
hash = hashlib.sha256(hashlib.sha256(transaction_bin).digest()).digest()
hash.encode('hex_codec')
'38805219c8ac7e9a96416d706dc1d8f638b12f46b94dfd1362b5d16cf62e68ff'
hash[::-1].encode('hex_codec')
'ff682ef66cd1b56213fd4db9462fb138f6d8c16d706d41969a7eacc819528038'
transaction_hex is a regular string of lowercase ASCII characters, and the decode() method with the 'hex' argument changes it to a (binary) string (a bytes object in Python 3) with bytes 0x93, 0xa0, etc. In C it would be an array of unsigned char, of length 64 in this case.
This array/byte string of length 64 is then hashed with SHA256, and its result (another binary string of size 32) is hashed again. So hash is a string of length 32, or a bytes object of that length in Python 3. Then encode('hex_codec') is a synonym for encode('hex') (in Python 2); in Python 3, it replaces it (so maybe this code is meant to work in both versions). It outputs an ASCII (lowercase hex) string again, replacing each raw byte (which is just a small integer) with the two-character string that is its hexadecimal representation. So the final line reverses the double hash and outputs it as hexadecimal, in a form I usually call "lowercase hex ASCII".
What is the difference between string and character class in MATLAB?
a = 'AX'; % This is a character.
b = string(a) % This is a string.
The documentation suggests:
There are two ways to represent text in MATLAB®. You can store text in character arrays. A typical use is to store short pieces of text as character vectors. And starting in Release 2016b, you can also store multiple pieces of text in string arrays. String arrays provide a set of functions for working with text as data.
This is how the two representations differ:
Element access. To represent char vectors of different length, one had to use cell arrays, e.g. ch = {'a', 'ab', 'abc'}. With strings, they can be created in actual arrays: str = [string('a'), string('ab'), string('abc')].
However, to index characters in a string array directly, the curly bracket notation has to be used:
str{3}(2) % == 'b'
Memory use. Chars use exactly two bytes per character; strings have overhead:
a = 'abc'
b = string('abc')
whos a b
returns
Name Size Bytes Class Attributes
a 1x3 6 char
b 1x1 132 string
The best place to start for understanding the difference is the documentation. The key difference, as stated there:
A character array is a sequence of characters, just as a numeric array is a sequence of numbers. A typical use is to store short pieces of text as character vectors, such as c = 'Hello World';.
A string array is a container for pieces of text. String arrays provide a set of functions for working with text as data. To convert text to string arrays, use the string function.
Here are a few more key points about their differences:
They are different classes (i.e. types): char versus string. As such, they will have different sets of methods and operators defined for each (see the short example after this list). Think about what sort of operations you want to do on your text, then choose the one that best supports those.
Since a string is a container class, be mindful of how its size differs from an equivalent character array representation. Using your example:
>> a = 'AX'; % This is a character.
>> b = string(a) % This is a string.
>> whos
Name Size Bytes Class Attributes
a 1x2 4 char
b 1x1 134 string
Notice that the string container lists its size as 1x1 (and takes up more bytes in memory) while the character array is, as its name implies, a 1x2 array of characters.
They can't always be used interchangeably, and you may need to convert between the two for certain operations. For example, string objects can't be used as dynamic field names for structure indexing:
>> s = struct('a', 1);
>> name = string('a');
>> s.(name)
Argument to dynamic structure reference must evaluate to a valid field name.
>> s.(char(name))
ans =
1
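To illustrate the first point, a small example (assuming R2016b or later): the plus operator concatenates string objects but performs element-wise numeric addition on char arrays:
>> string('ab') + string('cd')
ans =
    "abcd"
>> 'ab' + 'cd'
ans =
   196   198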
Strings do have some overhead, but memory use still grows by roughly 2 bytes per character: the variable's size increases in steps after every 8 characters. The red line is y = 2x + 127.
The figure is created using:
v=[];N=100;
for ct = 1:N
s=char(randi([0 255],[1,ct]));
s=string(s);
a=whos('s');v(ct)=a.bytes;
end
figure(1);clf
plot(v)
xlabel('# characters')
ylabel('# bytes')
p=polyfit(1:N,v,1);
hold on
plot([0,N],[127,2*N+127],'r')
hold off
One important practical thing to note is that strings and chars behave differently when interacting with square brackets. This can be especially confusing when coming from Python. Consider the following example:
>>['asdf' '123']
ans =
'asdf123'
>> ["asdf" "123"]
ans =
1×2 string array
"asdf" "123"
I have been given a string and its corresponding MD4 hash. I need to find a similar string that would give the same hash. Below is an MD4 collision example from the MD4 Wikipedia page (https://en.wikipedia.org/wiki/MD4).
I don't understand on what basis the characters were changed. What are the criteria for doing so?
Note: the altered hex characters (shown in bold in the original post, not rendered here) are the only places where k1 and k2 differ.
k1 = 839c7a4d7a92cb5678a5d5b9eea5a7573c8a74deb366c3dc20a083b69f5d2a3bb3719dc69891e9f95e809fd7e8b23ba6318edd45e51fe39708bf9427e9c3e8b9
k2 = 839c7a4d7a92cbd678a5d529eea5a7573c8a74deb366c3dc20a083b69f5d2a3bb3719dc69891e9f95e809fd7e8b23ba6318edc45e51fe39708bf9427e9c3e8b9
MD4(k1) = MD4(k2) = 4d7e6a1defa93d2dde05b45d864c429b
Note that two hex digits of k1 and k2 define one byte of the input string, whose length is 64 bytes.
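To see this concretely, here is a small sketch (in MATLAB, in keeping with the rest of this page) that converts the two hex strings to their 64-byte form and reports which bytes were altered; it does not compute MD4, it only locates the differences:
k1 = ['839c7a4d7a92cb5678a5d5b9eea5a7573c8a74deb366c3dc20a083b69f5d2a3b' ...
      'b3719dc69891e9f95e809fd7e8b23ba6318edd45e51fe39708bf9427e9c3e8b9'];
k2 = ['839c7a4d7a92cbd678a5d529eea5a7573c8a74deb366c3dc20a083b69f5d2a3b' ...
      'b3719dc69891e9f95e809fd7e8b23ba6318edc45e51fe39708bf9427e9c3e8b9'];
b1 = uint8(hex2dec(reshape(k1, 2, []).'));   % 64 bytes
b2 = uint8(hex2dec(reshape(k2, 2, []).'));
idx = find(b1 ~= b2)                         % positions of the altered bytes
[dec2hex(b1(idx), 2), repmat(' -> ', numel(idx), 1), dec2hex(b2(idx), 2)]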
I am trying to generate hashes using the Murmur3 algorithm. The hashes are consistent but they are different values being returned by Scala and Guava.
class package$Test extends FunSuite {
test("Generate hashes") {
println(s"Seed = ${MurmurHash3.stringSeed}")
val vs = Set("abc", "test", "bucket", 111.toString)
vs.foreach { x =>
println(s"[SCALA] Hash for $x = ${MurmurHash3.stringHash(x).abs % 1000}")
println(s"[GUAVA] Hash for $x = ${Hashing.murmur3_32().hashString(x).asInt().abs % 1000}")
println(s"[GUAVA with seed] Hash for $x = ${Hashing.murmur3_32(MurmurHash3.stringSeed).hashString(x).asInt().abs % 1000}")
println()
}
}
}
Seed = -137723950
[SCALA] Hash for abc = 174
[GUAVA] Hash for abc = 419
[GUAVA with seed] Hash for abc = 195
[SCALA] Hash for test = 588
[GUAVA] Hash for test = 292
[GUAVA with seed] Hash for test = 714
[SCALA] Hash for bucket = 413
[GUAVA] Hash for bucket = 22
[GUAVA with seed] Hash for bucket = 414
[SCALA] Hash for 111 = 250
[GUAVA] Hash for 111 = 317
[GUAVA with seed] Hash for 111 = 958
Why am I getting different hashes?
It looks to me like Scala's hashString converts pairs of UTF-16 chars to ints differently than Guava's hashUnencodedChars (hashString with no Charset was renamed to that).
Scala:
val data = (str.charAt(i) << 16) + str.charAt(i + 1)
Guava:
int k1 = input.charAt(i - 1) | (input.charAt(i) << 16);
In Guava, the char at an index i becomes the 16 least significant bits of the int and the char at i + 1 becomes the most significant 16 bits. In the Scala implementation, that's reversed: the char at i is the most significant while the char at i + 1 is the least significant. (The fact that the Scala implementation uses + rather than | could also be significant, I imagine.)
Note that the Guava implementation is equivalent to using ByteBuffer.putChar(c) twice to put two characters into a little endian ByteBuffer, then using ByteBuffer.getInt() to get an int value back out. The Guava implementation is also equivalent to encoding the characters to bytes using UTF-16LE and hashing those bytes. The Scala implementation is not equivalent to encoding the string in any of the standard charsets that JVMs are required to support. In general, I'm not sure what precedent (if any) Scala has for doing it the way it does.
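To make the packing difference concrete, here is a small illustration (written in MATLAB only to stay consistent with the rest of this page) of how the first two chars of a string end up in different 32-bit integers; it mirrors the two quoted lines and is not the full Murmur3 computation:
c1 = uint32('a');                        % first char of the pair (0x0061)
c2 = uint32('b');                        % second char of the pair (0x0062)
scala_k = bitshift(c1, 16) + c2;         % Scala: first char in the high 16 bits
guava_k = bitor(c1, bitshift(c2, 16));   % Guava: first char in the low 16 bits
dec2hex(scala_k, 8)                      % '00610062'
dec2hex(guava_k, 8)                      % '00620061'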
Edit:
The Scala implementation does another thing different than the Guava implementation as well: it passes the number of chars being hashed to the finalizeHash method where Guava's implementation passes the number of bytes to the equivalent fmix method.
I believe hashString(x, StandardCharsets.UTF_16BE) should match Scala's behavior. Let us know.
(Also, please upgrade your Guava to something newer!)
I'm trying to calculate the RIPEMD160 hash in MATLAB for some data represented by a hex string. I found the following Java class and compiled it for JVM 1.6:
http://developer.nokia.com/Community/Wiki/RIPEMD160_encryption_in_JavaME
The following code works perfectly in MATLAB for hashing strings:
clear all
% add folder with class file to java path
functions_folder = strcat(pwd,filesep,'functions');
javaaddpath(functions_folder)
% string to hash
string_to_hash = 'test12345';
% convert to java String
str_to_hash_java = javaObject('java.lang.String',uint8(string_to_hash));
% pass in string and convert output to char array
mystr = char(RIPEMD160.RIPEMD160String(str_to_hash_java))
Now my problem comes when I try to hash binary data represented by a hex string. The hash output is correct for byte values of 7f or smaller, but as soon as the high bit is set (values >= 80) it no longer gives the correct answer. I can't seem to find the problem. Here is my code:
clear all
% add folder with class file to java path
functions_folder = strcat(pwd,filesep,'functions');
javaaddpath(functions_folder)
% data to hash in hex format
hex_string_in = '80';
hex_string_in_length = length(hex_string_in);
% split every two characters and calculate the data in each byte
for i=1:hex_string_in_length/2
data_uint8_array(1,i) = uint8(hex2dec(hex_string_in(2*i-1:2*i)));
end
% constructor
x = RIPEMD160;
% pass in binary data
x.update(data_uint8_array)
% get hash in binary format
hash_out_bin = x.digestBin();
% typecast binary data into uint8 primitive
hash_out_uint8 = typecast(hash_out_bin,'uint8');
% convert to hex
hash_out_hex = dec2hex(hash_out_uint8)';
% pad with zeros if all bytes are smaller than 0x10 (dec2hex then returns single hex digits)
if(size(hash_out_hex,1))==1
hash_out_hex=[repmat('0',[1 size(hash_out_hex,2)]);hash_out_hex];
end
% final formatting, convert to lowercase
hash_out_hex = lower(hash_out_hex(:)')
for an input of '7f' it produces the correct hash of c8297aad716979548921b2e8e26ca8f20061dbef
but for '80' is gives e633ca40d977e24a1ffd56b7a992e99b48d13359 instead of the correct result b436441e6bb882fe0a0fa0320cb2d97d96b4d1bc
Thanks.
You are passing strings to your Java code instead of regular byte arrays. Not all bytes are representations of valid character encodings, so you are likely to lose information. Only hash bytes, without any conversion. If strings are required, use base64 encoding/decoding.
str_to_hash_java should not be needed.
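For what it's worth, here is a hedged sketch of the byte-only approach. It adds an assumption that is not stated in the answer above: the failure at 0x80 would be explained by MATLAB saturating out-of-range values when converting uint8 to Java's signed byte (127 is the largest value a Java byte can hold), so the bytes are typecast to int8 to preserve the bit pattern. The method names (update, digestBin) are taken from the question's code.
% data to hash in hex format
hex_string_in = '80';
% hex pairs -> bytes, then reinterpret as int8 so MATLAB maps them to Java byte without clamping (assumption, see above)
bytes_uint8 = uint8(hex2dec(reshape(hex_string_in, 2, []).'));
bytes_int8 = typecast(bytes_uint8, 'int8');
% hash the raw bytes only
x = RIPEMD160;
x.update(bytes_int8);
% digestBin() returns a Java byte[] (int8 in MATLAB); reinterpret as uint8 and print as lowercase hex
hash_bytes = typecast(x.digestBin(), 'uint8');
hash_out_hex = lower(reshape(dec2hex(hash_bytes, 2).', 1, []))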