How to hash two 32-bit integers into one 32-bit integer without collision?

I am looking for a one-way hash that combines two 32-bit integers into one 32-bit integer. I am not sure whether this is feasible without collisions.
Edit:
I think my integers are generally small. One of them rarely takes more than 14 bits, and the other rarely takes more than 20 bits.
Edit 2: Thanks for the help in the comments. For cases where the combination of the two integers takes more than 32 bits, I can handle them differently and not hash them. With that, how should I hash my integers?
Thanks!
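No mapping from all pairs of 32-bit integers to a single 32-bit integer can be collision-free, since there are 2^64 possible pairs and only 2^32 possible outputs. For the small-value case described in the edits, one option is plain bit packing instead of hashing. The following is a minimal sketch, assuming a fixed 12/20-bit split; the widths A_BITS and B_BITS and the function names are hypothetical, chosen only so that the two fields add up to 32 bits. Pairs that do not fit return None so they can be handled separately.

```python
A_BITS = 12  # hypothetical width reserved for the first integer
B_BITS = 20  # hypothetical width reserved for the second integer (A_BITS + B_BITS == 32)

def pack_pair(a, b):
    """Return a collision-free 32-bit key, or None if the pair does not fit."""
    if not (0 <= a < (1 << A_BITS) and 0 <= b < (1 << B_BITS)):
        return None  # caller handles oversized pairs some other way
    return (a << B_BITS) | b

def unpack_pair(key):
    """Recover (a, b) from a key produced by pack_pair."""
    return key >> B_BITS, key & ((1 << B_BITS) - 1)

print(pack_pair(5, 7))
print(unpack_pair(pack_pair(5, 7)))
```

Note that this packing is reversible by construction, so it is not one-way. Within the accepted range, distinct pairs always produce distinct keys.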

Related

Truncating FNV hash

I'm using 32-bit FNV-1a hashing, but now I want to reserve one of the bits to hold useful information about the input key. That is, I want to use only 31 of the 32 bits for the hash and 1 bit for something else.
Assuming FNV is well distributed for my application, is it safe to assume that dropping 1 bit will increase the collision rate by a factor of 32/31, as opposed to something dramatic?
The algorithm recommends XORing the discarded MSB with the LSB, but for 1 bit that seems pointless. As such, would it matter which bit is discarded (MSB or LSB)? And if not, would it matter whether the bit is discarded after hashing each byte (i.e. using an even-numbered "prime") or after first computing the 32-bit hash of the entire byte array?

Removing a single bit from a 32-bit hash code will have a larger effect than a 32/31 increase in the collision rate. To see why, note that there are 2^32 possible 32-bit hashes and 2^31 possible 31-bit hashes, meaning that removing a bit from the hash cuts the number of possible hashes by a factor of two, a pretty significant reduction. This roughly doubles the probability that you see a hash collision across your hashes.
If you have a sufficiently small number of hashes that collisions are rare, then cutting out a single bit is unlikely to change much. But if collisions were already an issue, dropping a bit will roughly double the chance you see them.
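The doubling described above can be checked numerically with the standard birthday approximation. This is a rough sketch that assumes the hash values are uniformly distributed; the function name collision_probability is made up for this illustration.

```python
import math

def collision_probability(n, bits):
    """Birthday approximation: chance that n uniform hashes of the given width collide."""
    space = 2 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))

for n in (1_000, 10_000, 100_000):
    p32 = collision_probability(n, 32)
    p31 = collision_probability(n, 31)
    # Print the key count, both probabilities, and their ratio.
    print(n, p32, p31, p31 / p32)
```

While collisions are rare the ratio stays close to 2. Once collisions are already likely, both probabilities approach 1 and the ratio shrinks toward 1.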

Does halving every SHA224 2 bytes to 1 byte to halve the hash length introduce a higher collision risk?

Let's say I have strings that need not be reversible and let's say I use SHA224 to hash it.
The hash of hello world is 2f05477fc24bb4faefd86517156dafdecec45b8ad3cf2522a563582b and its length is 56 bytes (as a hex string).
What if I convert every two chars to its numerical representation and make a single byte out of them?
In Python I'd do something like this:
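shalist = list("2f05477fc24bb4faefd86517156dafdecec45b8ad3cf2522a563582b")
halved = ""
for first_byte, next_byte in zip(shalist[0::2], shalist[1::2]):
    # Add the two character codes to get one character per pair of hex digits
    halved += chr(ord(first_byte) + ord(next_byte))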
The result will be \x98ek\x9d\x95\x96\x96\xc7\xcb\x9ckhf\x9a\xc7\xc9\xc8\x97\x97\x99\x97\xc9gd\x96im\x94. 28 bytes. Effectively halved the input.
Now, is there a higher hash collision risk by doing so?

The simple answer is pretty obvious: yes, it increases the chance of collision by as many powers of 2 as there are bits missing. For 56 bytes halved to 28 bytes, the chance of collision increases by a factor of 2^(28*8). That still leaves the chance of collision at 1:2^(28*8).
Your use of that truncation can still be perfectly legitimate, depending on what it is. Git, for example, shows only the first few bytes of a commit hash, and for most practical purposes the short one works fine.
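For comparison, here is a small sketch of a plain prefix truncation applied to the digest from the question. It relies on the fact that the 56-character hex string is a text encoding of 28 raw bytes. The helper add_pair is a hypothetical name for the per-pair addition used in the question, included to show that this addition can map different hex pairs to the same character, which discards more information than simply dropping bytes.

```python
hex_digest = "2f05477fc24bb4faefd86517156dafdecec45b8ad3cf2522a563582b"

# Decode the 56 hex characters into the 28 raw bytes they represent (lossless).
raw = bytes.fromhex(hex_digest)

# Truncate by keeping a prefix, much like an abbreviated commit hash.
truncated = raw[:14]

def add_pair(pair):
    """The pairwise character-code addition from the question, for one pair."""
    return chr(ord(pair[0]) + ord(pair[1]))

print(len(hex_digest), len(raw), len(truncated))  # 56 28 14
print(add_pair("2f") == add_pair("f2"))  # True: different hex pairs, same character
```

Keeping a prefix of the raw bytes preserves every bit of the part that is kept, so the usual reasoning about a shorter hash applies to it directly.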
A "perfect" hash should retain a proportional amount of "effective" bits if you truncate it. For example 32 bits of SHA256 result should have the same "strength" as a 32-bit CRC, although there may be some special properties of CRC that make it more suitable for some purposes while the truncated SHA may be better for others.
If you're doing any kind of security with this, it will be difficult to prove your system secure; you're probably better off using a shorter but complete hash.
Let's shrink the sizes to make sense of it and use a 2-byte hash instead of 56. The original hash has 65536 possible values, so if you hash more than that many strings you will surely get a collision. Halve that to 1 byte and you are guaranteed a collision once you hash more than 256 strings, regardless of whether you take the first or the second byte. So your chance of collision is 256 times greater (2^(1 byte * 8 bits)) and is 1:256.
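The 1-byte case above is small enough to try directly. The sketch below keeps a single byte of a SHA-224 digest and hashes the strings "0", "1", "2", and so on until two of them collide. The function names and the choice of input strings are arbitrary and only serve the illustration.

```python
import hashlib

def one_byte_hash(text, index=0):
    """Keep a single byte (the first by default) of the SHA-224 digest of text."""
    return hashlib.sha224(text.encode("utf-8")).digest()[index]

def first_collision(index=0):
    """Hash the strings "0", "1", "2", ... until two share a 1-byte hash."""
    seen = {}
    i = 0
    while True:
        h = one_byte_hash(str(i), index)
        if h in seen:
            return seen[h], i
        seen[h] = i
        i += 1

a, b = first_collision()
print(a, b)  # two different inputs with the same 1-byte hash; b is at most 256
```

With only 256 possible values the loop must stop by the 257th string, and in practice it usually stops much earlier because of the birthday effect.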
Long hashes are used to make it truly impractical to brute-force them, even after many years of cryptanalysis. When MD5 was introduced in 1991 it was considered secure enough to use for certificate signing; by 2008 it was considered "broken" and not suitable for security-related use. Various cryptanalysis techniques can be developed to reduce the "effective" strength of hash and encryption algorithms, so the more spare bits there are (in an otherwise strong algorithm), the more effective bits should remain to keep the hash secure for all practical purposes.

How to calculate the risk of conflicts of a 64 bit hash?

I need globally unique IDs for my application. I know there is a UUID standard for this, but I wonder if I really need 128 bits.
So I am thinking about writing my own generator that uses the system time, a random number, and the machine's network address to generate an ID that fits into 64 bits and can therefore be stored in the unsigned long long int datatype in C++.
How can I determine whether 64 bits is enough for me?

64 bits gives 18,446,744,073,709,551,616 combinations,
which is around 18 and a half quintillion.
So if you're generating 1.92 million hashes, the odds of a collision will be about 1 in 10 million.
Source: http://preshing.com/20110504/hash-collision-probabilities
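The figure above can be reproduced with the birthday approximation. The sketch below inverts that approximation to estimate how many IDs can be generated before the collision chance reaches a chosen level. It assumes the IDs behave like uniformly random 64-bit values, which a generator built from time and network address may not satisfy. The function name is made up for this example.

```python
import math

def ids_for_collision_probability(bits, p):
    """Approximate number of uniform random IDs at which the collision chance reaches p."""
    space = 2 ** bits
    # Invert p = 1 - exp(-n^2 / (2 * space)); log1p keeps precision for small p.
    return math.sqrt(-2 * space * math.log1p(-p))

print(2 ** 64)  # 18446744073709551616
print(round(ids_for_collision_probability(64, 1e-7)))  # roughly 1.92 million
print(round(ids_for_collision_probability(64, 0.5)))   # roughly 5.06 billion
```

Plugging in the number of IDs the application expects to create over its lifetime gives a first estimate of whether 64 bits is enough.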

Does truncating a sha-160 hash produce a reasonable hash?

I have a sha-160 computation that gives me a 160 bit hash of my data, but I expect this is way larger than necessary. So I'm thinking I could truncate the resulting hash down to say the low 64 bits and use that.
Does taking the low 64 bits of a sha-160 hash computation give a reasonably random 64 bit hash?

Part of what it means for something to be a good hash is that any fixed subset of its bits is also (so far as possible, given how many bits) a good hash. The low 64 bits of a SHA-160 hash should be a good 64-bit hash, in so far as there is such a thing.
Note that for some purposes 64 bits really isn't all that many. For instance, if anything breaks in your application when someone finds two different things with the same hash, you probably want something longer: on average it will only take a modest number of billions of trials to find two things with the same 64-bit hash, no matter what your hashing algorithm.
What bad thing would happen if you just used all 160 bits?
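Taking the low 64 bits, as discussed above, can be sketched as follows. The sketch assumes that "sha-160" refers to SHA-1, which produces a 160-bit digest, and it treats the digest as a big-endian integer so that the low 64 bits are the last 8 bytes. The function name low64_sha1 is hypothetical.

```python
import hashlib

def low64_sha1(data):
    """Return the low 64 bits of the SHA-1 digest of data as an integer."""
    digest = hashlib.sha1(data).digest()       # 20 bytes = 160 bits
    return int.from_bytes(digest[-8:], "big")  # last 8 bytes = low 64 bits

print(hex(low64_sha1(b"hello world")))
```

Any other fixed choice of 8 bytes would be expected to work equally well, as long as the same choice is used everywhere.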

Hash length reduction?

I know that, given an MD5/SHA-1 hash of a value, reducing it from X bits (e.g. 128) to Y bits (e.g. 64) increases the possibility of birthday attacks, since information has been lost. Is there an easy-to-use tool, formula, or table that says what the probability of a "correct" guess will be after that length reduction (compared to the original guess probability)?

Crypto is hard. I would recommend against trying to do this sort of thing. It's like cooking pufferfish: Best left to experts.
So just use the full length hash. And since MD5 is broken and SHA-1 is starting to show cracks, you shouldn't use either in new applications. SHA-2 is probably your best bet right now.

I would definitely recommend against reducing the bit count of a hash. There are too many issues at stake here. Firstly, how would you decide which bits to drop?
Secondly, it would be hard to predict how the dropping of those bits would affect the distribution of outputs in the new "shortened" hash function. A (well-designed) hash function is meant to distribute inputs evenly across the whole of the output space, not a subset of it.
By dropping half the bits you are effectively taking a subset of the original hash function, which might not have nearly the desirable properties of a properly designed hash function, and may lead to further weaknesses.

Well, since every extra bit in the hash doubles the number of possible hashes, every time you shorten the hash by a bit there are only half as many possible hashes, and thus the chance of guessing that random number is doubled.
128 bits = 2^128 possibilities
thus
64 bits = 2^64 possibilities
so by cutting it in half, you are left with
2^64 / 2^128 = 2^-64
of the original possibilities
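The arithmetic above can be written out as a short script. It is a sketch that assumes an ideal hash whose outputs are uniformly distributed. The birthday figure of about 2^(bits/2) is the usual rule of thumb, not an exact value, and both function names are invented for this example.

```python
from fractions import Fraction

def guess_probability(bits):
    """Chance that one random guess equals a specific hash of the given width."""
    return Fraction(1, 2 ** bits)

def birthday_trials(bits):
    """Rough number of hashes before any two are likely to collide: about 2^(bits/2)."""
    return 2 ** (bits / 2)

for bits in (128, 64):
    print(bits, float(guess_probability(bits)), birthday_trials(bits))

# How much more likely a single guess becomes after cutting 128 bits to 64.
print(guess_probability(64) / guess_probability(128) == 2 ** 64)  # True
```

So a single guess becomes 2^64 times more likely to succeed, while the rough number of hashes needed for a birthday collision drops from about 2^64 to about 2^32.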
