HMAC-MD5 in pure Lua

I need to write an HMAC-MD5 algorithm in pure Lua.
I got this algorithm from Wikipedia:
function hmac (key, message)
    if (length(key) > blocksize) then
        key = hash(key) // keys longer than blocksize are shortened
    end if
    if (length(key) < blocksize) then
        key = key ∥ [0x00 * (blocksize - length(key))] // keys shorter than blocksize are zero-padded ('∥' is concatenation)
    end if
    o_key_pad = [0x5c * blocksize] ⊕ key // Where blocksize is that of the underlying hash function
    i_key_pad = [0x36 * blocksize] ⊕ key // Where ⊕ is exclusive or (XOR)
    return hash(o_key_pad ∥ hash(i_key_pad ∥ message)) // Where '∥' is concatenation
end function
and I have the MD5 code from here. The MD5 calculation function works correctly.
Implementing the algorithm in Lua, so far I have the following code:
local function hmac_md5(key, message)
    local blocksize = 64
    if string.len(key) > blocksize then
        key = calculateMD5(key)
    end
    while string.len(key) < blocksize do
        key = key .. "0"
    end
    -- local o_key_pad = bit_xor((0x5c * blocksize), key)
    -- local i_key_pad = bit_xor((0x36 * blocksize), key)
    return calculateMD5(o_key_pad .. calculateMD5(i_key_pad .. message))
end
-- calculateMD5 is the md5.Calc function in the Stack Overflow link specified
I am stuck at the part where o_key_pad and i_key_pad are calculated. Do I just XOR the two values? The Python implementation in the Wikipedia link had some weird calculations.
Please help!

Yes, "⊕" is the symbol for "exclusive or".
Remember: once you compute the final hash, DO NOT use an ordinary string comparison to check if a MAC is correct. An ordinary comparison returns as soon as a byte differs, and that timing leak WILL allow attackers to forge MACs for arbitrary messages; use a constant-time comparison instead.
Note that 0x5c * blocksize is probably not what you are looking for, since that multiplies 0x5c by blocksize. You want to create an array of length blocksize containing 0x5c in each position.
Note that you must pad with zero bytes, not the character "0". So key = key .. "0" is wrong. It should be key = key .. "\0", or however you create NUL bytes in Lua.
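In case a concrete reference helps, here is the same construction sketched in Python (Python rather than Lua, purely to show the byte-wise XOR; hashlib and hmac are standard-library modules, and the final assert cross-checks against the built-in implementation):

import hashlib, hmac

def hmac_md5(key: bytes, message: bytes) -> bytes:
    blocksize = 64                                 # MD5 block size in bytes
    if len(key) > blocksize:
        key = hashlib.md5(key).digest()            # shorten long keys
    key = key + b"\x00" * (blocksize - len(key))   # zero-pad with NUL bytes
    o_key_pad = bytes(0x5C ^ b for b in key)       # XOR each byte, no multiplication
    i_key_pad = bytes(0x36 ^ b for b in key)
    inner = hashlib.md5(i_key_pad + message).digest()
    return hashlib.md5(o_key_pad + inner).digest()

# Cross-check, using a constant-time comparison as recommended above:
assert hmac.compare_digest(hmac_md5(b"key", b"msg"),
                           hmac.new(b"key", b"msg", hashlib.md5).digest())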

Related

Seed for hash-table non-cryptographic hash functions

If one sets the hash table seed during resize or table creation to a random number, will that prevent DDoS attacks on such a hash table, or will an attacker who knows the hash algorithm still easily get around the seed? What if the algorithm uses the Pearson hash function with randomly generated tables, unknown to the attacker? Does such a table hash still need a seed, or is it safe enough?
Context: I want to use an on-disk hash table for a key-value database for my toy web server, where the keys may depend on the user input.
Several approaches exist to protect your hash subsystem from an "adverse selection" attack; the most popular of them is called Universal Hashing, where the hash function or its parameters are randomly selected at initialization.
In my own approach, I use the same hash function, where each character is added to the result with a non-linear mixing step that depends on a random array of uint32_t[256]. The array is created during system initialization; in my code, that happens at each start by reading /dev/urandom. See my implementation in the open-source emerSSL program. You're welcome to borrow the entire hash-table implementation, or the hash function only.
Currently, my hash function from the referred source computes two independent hashes for a double-hashing search algorithm.
Here is a "reduced" form of the hash function from that source, to demonstrate the idea of non-linear mixing with the S-block array:
uint32_t S_block[0x100]; /* substitution block, filled with random values */

#define NLF(h, c) (S_block[(unsigned char)((c) + (h))] ^ (c))
#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

uint32_t hash(const char *key) {
    uint32_t h = 0x1F351F35; /* Barker code * 2 */
    char c;
    for (int i = 0; (c = key[i]) != '\0'; i++) {
        h = ROL(h, 5);   /* rotate left by 5 bits */
        h += NLF(h, c);  /* non-linear mixing through the random S-block */
    }
    return h;
}
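If it helps to see the idea outside of C, a rough Python rendition of the same mixing scheme could look like this (the names are mine; the table filled from os.urandom plays the role of the per-start random seed):

import os

# 256 random 32-bit values, regenerated on every start
S_BLOCK = [int.from_bytes(os.urandom(4), "little") for _ in range(256)]

def seeded_hash(key: bytes) -> int:
    h = 0x1F351F35
    for c in key:
        h = ((h << 5) | (h >> 27)) & 0xFFFFFFFF               # 32-bit rotate left by 5
        h = (h + (S_BLOCK[(c + h) & 0xFF] ^ c)) & 0xFFFFFFFF  # non-linear mix via the table
    return h

Because the attacker never learns the table, they cannot precompute colliding keys offline; the table only has to stay secret and be regenerated if it ever leaks.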

Is there a difference between Signed and Unsigned LEB128, when *encoding* the number?

I understand that LEB128 decoders need to know whether an encoded number is signed or unsigned, but the encoder seems to work identically either way (though Wikipedia uses distinct functions for encoding signed and unsigned numbers).
If positive numbers are encoded the same way in Signed and Unsigned LEB128 (only the range changes), and negative numbers only occur in Signed LEB128, it seems more sensible to create a single function that encodes any integer (using the two's complement when the argument is negative).
I implemented a function that works the way I described, and it seems to work fine.
This is not an implementation detail (unless I've misunderstood something). Any function that can encode Signed LEB128 makes any function that encodes Unsigned LEB128 completely redundant, so there would never be a good reason to create both.
I used JavaScript, but the actual implementation is not important. Is there ever a reason to have a Signed LEB128 encoder and an Unsigned one?
const toLEB128 = function * (arg) {
    /* This generator takes any BigInt, encodes it as signed LEB128,
    and yields the result, one byte at a time (little-endian). */
    const digits = arg.toString(2).length;
    const length = digits + (7 - digits % 7);
    const sevens = new RegExp(".{1,7}", "g");
    const number = BigInt.asUintN(length, arg);
    /* Pad to the full computed width, so positive numbers whose bit
    length is a multiple of seven keep their leading sign group. */
    const string = number.toString(2).padStart(length, "0");
    const eights = string.match(sevens).map(function(string, index) {
        /* This callback takes each string of seven digits and its
        index (big-endian), prepends the correct continuation digit,
        converts the 8-bit result to a BigInt, then returns it. */
        return BigInt("0b" + Boolean(index) * 1 + string);
    });
    while (eights.length) yield eights.pop();
};
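For comparison, here is a rough Python sketch (the function names are mine) of the two textbook encoders. They agree on most positive inputs, but a minimal signed encoding must reserve a sign bit in the final group, so they diverge whenever that bit would be set: 127 encodes as [0x7F] unsigned but as [0xFF, 0x00] signed.

def uleb128(n: int) -> bytes:
    out = []
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))  # continuation bit if more remains
        if not n:
            return bytes(out)

def sleb128(n: int) -> bytes:
    out = []
    while True:
        byte = n & 0x7F
        n >>= 7  # Python's >> is an arithmetic shift, preserving the sign
        done = (n == 0 and not byte & 0x40) or (n == -1 and byte & 0x40)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)

assert uleb128(127) == b"\x7f" and sleb128(127) == b"\xff\x00"
assert sleb128(-1) == b"\x7f"

So a single signed encoder can represent every integer, but it does not make the unsigned form pointless in the other direction: unsigned output is one byte shorter for values like 127, which is one reason formats such as DWARF and WebAssembly specify both.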

Calculate a colliding key to a given hash using a specific hash function

Given the following hash function (written in Java):
long hash(String key) {
    char[] c = key.toCharArray();
    long hash = 7;
    for (int i = 0; i < c.length; i++) {
        hash = hash * 31 + c[i];
    }
    return hash;
}
(note: I would have put the type of this hash function in the question title but couldn't find out what it's called. If you know the term please let me know in the comments)
How can one compute a key that gets hashed to the same value as some other key?
long a = hash("myKey");
String x = reverseHash(a);
assert(hash(x) == a);
Is there a way to compute this efficiently? (I'm not looking for a decryption algorithm, only for a way to produce a key with an equivalent hash.)
What would such an algorithm look like? (It doesn't have to use the exact same numbers as my example; I just want to understand the approach.)
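One observation makes this concrete: because each step computes hash*31 + c, the hash is linear in the characters, so replacing two adjacent characters (c, d) with (c + 1, d - 31) leaves the final value unchanged. A hypothetical Python sketch of this construction (the masking emulates Java's wrapping 64-bit long arithmetic; the helper names are mine):

MASK = (1 << 64) - 1  # emulate Java's wrapping 64-bit long arithmetic

def java_hash(key: str) -> int:
    h = 7
    for ch in key:
        h = (h * 31 + ord(ch)) & MASK
    return h

def collide(key: str) -> str:
    # Bump the first character by 1 and compensate in the second:
    # (h*31 + c + 1)*31 + (d - 31) == (h*31 + c)*31 + d
    return chr(ord(key[0]) + 1) + chr(ord(key[1]) - 31) + key[2:]

original = "myKey"
forged = collide(original)   # "nZKey"
assert forged != original and java_hash(forged) == java_hash(original)

The same trick works at any pair of adjacent positions, so a key of length n yields many distinct colliding keys; this is exactly why Java's similar String.hashCode is easy to collide deliberately.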

Why does the Streaming-Operator in SystemVerilog reverse the byte order?

I simulated the following example:
shortint j;
byte unsigned data_bytes[];
j = 16'b1111_0000_1001_0000;
data_bytes = { >>{j}};
`uvm_info(get_type_name(), $sformatf("j data_bytes: %b_%b", data_bytes[1], data_bytes[0]), UVM_LOW)
Result:
UVM_INFO Initiator.sv(84) # 0: uvm_test_top.sv_initiator [Initiator] j data_bytes: 10010000_11110000
However, this seems strange to me, since the byte order is reversed: I expect the LSB to be at bit 0 of data_bytes[0] and the MSB at bit 7 of data_bytes[1]. Why does this happen? According to the documentation (Cadence Help), this should not be the case.
As defined in section 6.24.3 Bit-stream casting of the IEEE 1800-2017 LRM, the [0] element of an unpacked dynamic array is considered the left index, and the >> streaming operator goes from left to right indexes, so the most significant byte of j lands in data_bytes[0]. To get the results you want, write
data_bytes = { << byte {j}};
This streams the source in right-to-left order one byte at a time, while keeping the bit order within each byte intact, so data_bytes[0] receives 1001_0000 and data_bytes[1] receives 1111_0000.

Why do I need to add the original salt to each hash iteration of a password?

I understand it is important to hash passwords over multiple iterations to make things harder for an attacker. I have read numerous times that when processing these iterations, it is critical to hash not only the result of the previous iteration, but also to append the original salt each time. In other words:
I need to not do this:
var hash = sha512(salt + password);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash);
}
And instead, need to do this:
var hash = sha512(salt + password);
for (i = 0; i < 1000; i++) {
    hash = sha512(salt + hash);
}
My question is regarding the math here. Why does my bad example above make things easier for an attacker? I've heard that it would increase the likelihood of collisions, but I don't understand why.
It is not that you simply need to do "hash = sha512(salt + hash)" - it's more complex than that. (Briefly, on the "why": after the first step of your bad example, the remaining iterations no longer depend on the salt at all, so iteration chains that ever collide merge permanently, and the salt-independent iteration work can in principle be shared across users; re-injecting the salt makes every step salt-dependent and keeps the chains distinct.) An HMAC is a better way of mixing in your salt (and PBKDF2 is based on HMAC - see below for more detail on PBKDF2) - there's a good discussion at When is it safe to use a broken hash function? for those details.
You are correct in that you need to have multiple iterations of a hash function for security.
However, don't roll your own. See How to securely hash passwords?, and note that PBKDF2, BCrypt, and Scrypt are all means of doing so.
PBKDF2, also known as PKCS #5 v2.0 and RFC 2898, is in fact reasonably close to what you're doing (multiple iterations of a normal hash function), particularly in the form of PBKDF2-HMAC-SHA-512. In particular, section 5.2 of the RFC specifies:
For each block of the derived key apply the function F defined
below to the password P, the salt S, the iteration count c, and
the block index to compute the block:
T_1 = F (P, S, c, 1) ,
T_2 = F (P, S, c, 2) ,
...
T_l = F (P, S, c, l) ,
where the function F is defined as the exclusive-or sum of the
first c iterates of the underlying pseudorandom function PRF
applied to the password P and the concatenation of the salt S
and the block index i:
F (P, S, c, i) = U_1 \xor U_2 \xor ... \xor U_c
where
U_1 = PRF (P, S || INT (i)) ,
U_2 = PRF (P, U_1) ,
...
U_c = PRF (P, U_{c-1}) .
Here, INT (i) is a four-octet encoding of the integer i, most
significant octet first.
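To make the quoted definition concrete, here is a short Python sketch of F (pbkdf2_block is my name; hashlib.pbkdf2_hmac is the standard library's real implementation, used as a cross-check):

import hashlib
import hmac
import struct

def pbkdf2_block(password: bytes, salt: bytes, c: int, i: int) -> bytes:
    # T_i = F(P, S, c, i) per RFC 2898 section 5.2, with HMAC-SHA-512 as the PRF
    def prf(key: bytes, msg: bytes) -> bytes:
        return hmac.new(key, msg, hashlib.sha512).digest()
    u = prf(password, salt + struct.pack(">I", i))  # U_1 = PRF(P, S || INT(i))
    t = u
    for _ in range(c - 1):                          # U_2 ... U_c
        u = prf(password, u)                        # U_j = PRF(P, U_{j-1})
        t = bytes(a ^ b for a, b in zip(t, u))      # F = U_1 xor ... xor U_c
    return t

# One 64-byte block of PBKDF2-HMAC-SHA-512 equals T_1:
assert pbkdf2_block(b"password", b"salt", 1000, 1) == \
       hashlib.pbkdf2_hmac("sha512", b"password", b"salt", 1000, 64)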
P.S. SHA-512 was a good choice of hash primitive. SHA-512 (and SHA-384) are superior to MD5, SHA-1, and even SHA-224 and SHA-256 here, because SHA-384 and up use 64-bit operations, where current GPUs (early 2014) have much less of an advantage over current CPUs than they do with 32-bit operations; this reduces the margin of superiority attackers enjoy in offline attacks.