Why does a direct-mapped cache use the n least significant bits as the index and not the n most significant bits? - cpu-cache

I understand the concept of a direct-mapped cache, but I was wondering why the index bits are never taken from the most significant end of the address.
I was thinking it might have something to do with two's complement?
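For concreteness, here is a minimal sketch in Python (the 64-byte lines and 256 sets are made-up example parameters) of the usual address split, where the index comes from the low bits so that consecutive addresses fall into different sets:

    # Sketch of direct-mapped cache address decomposition.
    # The cache geometry (64-byte lines, 256 sets) is a made-up example.
    LINE_SIZE = 64     # bytes per line  -> 6 offset bits
    NUM_SETS = 256     # number of sets  -> 8 index bits

    OFFSET_BITS = LINE_SIZE.bit_length() - 1   # log2(64)  = 6
    INDEX_BITS = NUM_SETS.bit_length() - 1     # log2(256) = 8

    def split_address(addr: int) -> tuple[int, int, int]:
        """Split an address into (tag, index, offset) fields."""
        offset = addr & (LINE_SIZE - 1)
        index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        return tag, index, offset

    # Consecutive lines land in different sets, so a linear scan
    # does not keep evicting the same cache line:
    for addr in (0x1000, 0x1040, 0x1080):
        print(split_address(addr))   # index goes 64, 65, 66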

Related

Truncating FNV hash

I'm using 32-bit FNV-1a hashing, but now I want to reserve one of the bits to hold useful information about the input key. That is, I want to use only 31 of the 32 bits for the hash and 1 bit for something else.
Assuming FNV is well distributed for my application, is it safe to assume that dropping 1 bit will increase the collision rate by 32/31, as opposed to something dramatic?
The algorithm recommends XOR-folding the discarded MSB into the LSB, but for a single bit that seems pointless. As such, would it matter which bit is discarded (MSB or LSB)? And if not, would it matter whether the bit is discarded after hashing each byte (i.e. using an even-numbered "prime") or only after the entire byte array has been hashed to 32 bits?
Removing a single bit from a 32-bit hash code will have a larger effect than a 32/31 increase in the collision rate. To see why, note that there are 2^32 possible 32-bit hashes and 2^31 possible 31-bit hashes, meaning that removing a bit from the hash cuts the number of possible hashes in half - a pretty significant reduction. This brings about roughly a doubling of the probability that you see a hash collision across your hashes.
If you have a sufficiently small number of hashes that collisions are rare, then cutting out a single bit is unlikely to change much. But if collisions were already an issue, dropping a bit will roughly double the chance you see them.
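As an illustration, here is a minimal Python sketch of 32-bit FNV-1a (the offset basis and prime are the standard published FNV constants) with one bit masked off after the full hash is computed; for a well-distributed hash it should not matter much which single bit is dropped:

    # 32-bit FNV-1a with one bit reserved; a sketch, not a vetted design.
    FNV_OFFSET_BASIS = 0x811C9DC5   # standard 32-bit FNV offset basis
    FNV_PRIME = 0x01000193          # standard 32-bit FNV prime

    def fnv1a_32(data: bytes) -> int:
        h = FNV_OFFSET_BASIS
        for byte in data:
            h ^= byte
            h = (h * FNV_PRIME) & 0xFFFFFFFF
        return h

    def fnv1a_31_drop_msb(data: bytes) -> int:
        # Hash the whole input first, then discard the top bit,
        # leaving 31 bits of hash and 1 bit for other uses.
        return fnv1a_32(data) & 0x7FFFFFFF

    def fnv1a_31_drop_lsb(data: bytes) -> int:
        # Same idea, discarding the bottom bit instead.
        return fnv1a_32(data) >> 1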

How to compute a reasonable number of bits for a checksum?

I have around 1500 bytes of data that I want to construct a checksum for, so that if the data gets corrupted the chance of the checksum still matching the data is less than, say, 1 in 10^15 - i.e. a probability low enough that I can treat it as never going to happen.
The question is how many bits I should compute. I have a sha-160 computation that gives me a 160-bit hash of my data, but I expect this is way larger than necessary. So I'm thinking I could truncate the resulting hash down to, say, the low 40 bits and use that as a sufficiently large bit pattern that, if the data gets corrupted, I will most likely detect it.
So the question is twofold: how many bits are good enough, and is taking the lower bits of a sha-160 hash a good approach to take?
You can use the table here to determine approximately how many bits you need for your desired error detection rate.
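The underlying rule of thumb: if the checksum behaves like a random function, the chance that corrupted data still matches an n-bit checksum is about 2^-n. A quick sketch of the arithmetic in Python, using the 1-in-10^15 target from the question:

    import math

    # Chance that corrupted data still matches an n-bit checksum
    # is about 2**-n if the checksum behaves like a random function.
    target = 1e-15                      # 1 in 10^15, from the question

    bits_needed = math.ceil(-math.log2(target))
    print(bits_needed)                  # 50 -> ~50 bits suffice

    # The 40-bit truncation floated in the question gives about
    # 1 in 10^12, which misses the stated target:
    print(2.0 ** -40)                   # ~9.1e-13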

Does truncating a sha-160 hash produce a reasonable hash?

I have a sha-160 computation that gives me a 160-bit hash of my data, but I expect this is way larger than necessary. So I'm thinking I could truncate the resulting hash down to, say, the low 64 bits and use that.
Does taking the low 64 bits of a sha-160 hash computation give a reasonably random 64-bit hash?
Part of what it means for something to be a good hash is that any fixed subset of its bits is also (so far as possible, given how many bits) a good hash. The low 64 bits of a SHA-160 hash should be a good 64-bit hash, in so far as there is such a thing.
Note that for some purposes 64 bits really isn't all that many. For instance, if anything breaks in your application when someone finds two different things with the same hash, you probably want something longer: by the birthday bound, it will on average take only a modest number of billions of trials (on the order of 2^32) to find two things with the same 64-bit hash, no matter what your hashing algorithm is.
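A short sketch of both points in Python, assuming the question's "sha-160" is SHA-1 and using hashlib:

    import hashlib
    import math

    def hash64(data: bytes) -> int:
        # Take the low 64 bits (last 8 bytes) of the 160-bit digest.
        digest = hashlib.sha1(data).digest()
        return int.from_bytes(digest[-8:], "big")

    print(hex(hash64(b"example")))

    # Birthday bound: about 1.1774 * sqrt(2^64) random inputs give a
    # ~50% chance of at least one collision among 64-bit hashes.
    print(1.1774 * math.sqrt(2.0 ** 64))   # ~5.1e9: billions of trials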
What bad thing would happen if you just used all 160 bits?

is it possible to retrieve a password from a (partial) MD5 hash?

Suppose I have only the first 16 characters of an MD5 hash. If I use a brute-force attack, rainbow tables, or any other method to retrieve the original password, how many compatible candidates should I expect? 1? (I don't think so.) 10, 100, 1000, 10^12? Even a rough answer is welcome (for the number, but please be consistent with hash theory and methodology).
The output of MD5 is 16 bytes (128 bits). I suppose that you are talking about a hexadecimal representation, hence 32 characters. Thus, "16 characters" means "64 bits". You are considering MD5 with its output truncated to 64 bits.
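A one-liner makes the correspondence concrete; a quick sketch with Python's hashlib:

    import hashlib

    # First 16 hex characters = first 8 bytes = 64 bits of the MD5 output.
    full = hashlib.md5(b"password").hexdigest()    # 32 hex chars, 128 bits
    partial = full[:16]                            # 16 hex chars,  64 bits
    print(full, partial)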
MD5 accepts inputs up to 2^64 bits in length; assuming that MD5 behaves as a random function, this means that the roughly 2^(2^64) possible input strings map more or less uniformly among the 2^64 outputs, hence the average number of candidates for a given output is about 2^(2^64 - 64), which is close to 10^5553023288523357112.95.
However, if you consider that you can find at least one candidate, then this means that the space of possible passwords that you consider is much reduced. A rainbow table is a special kind of precomputed table which accepts a compact representation (at the expense of a relatively expensive lookup procedure), but if it covers N passwords, then this means that, at some point, someone could apply the hash function N times. In practice, this severely limits the size N. Assuming N = 2^60 (which means that the table builder had about one hundred NVidia GTX 580 GPUs and could run them for six months; also, the table will use quite a lot of hard disks), then, on average, only 1/16th of 64-bit outputs have a matching password in the table. For those passwords which are in the table, there is a 93.75% probability that there is no other password in the table which leads to the same output; or, if you prefer, when you find a matching password you will find, on average, 0.0625 other candidates (i.e. most of the time, no other candidate).
In brief, the answer to your question depends on the size N of the space of possible passwords that you consider (those which were covered during rainbow table construction); but, in practice with Earth-based technology, if you can find one matching password for a 64-bit output, chances are that you will not be able to find another (although there really are many others).
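The figures above follow from straightforward counting; a sketch of the arithmetic, assuming as above a table of N = 2^60 passwords spread over 2^64 possible truncated outputs:

    # Counting argument behind the figures above.
    N = 2 ** 60          # passwords covered by the rainbow table
    outputs = 2 ** 64    # possible 64-bit truncated hashes

    # Average fraction of outputs that have a match in the table:
    print(N / outputs)          # 1/16 = 0.0625

    # Given one matching password, expected number of *other* table
    # entries hashing to the same output:
    print((N - 1) / outputs)    # ~0.0625

    # First-order approximation of the chance that there is no other
    # candidate, matching the 93.75% quoted above:
    print(1 - N / outputs)      # 0.9375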
You should never ever be able to get a password from a partial hash.

Hash length reduction?

I know that, given say an MD5/SHA-1 hash of a value, reducing it from X bits (e.g. 128) to Y bits (e.g. 64) increases the possibility of birthday attacks, since information has been lost. Is there any easy-to-use tool/formula/table that will say what the probability of a "correct" guess will be once that length reduction occurs (compared to its original guess probability)?
Crypto is hard. I would recommend against trying to do this sort of thing. It's like cooking pufferfish: Best left to experts.
So just use the full-length hash. And since MD5 is broken and SHA-1 is starting to show cracks, you shouldn't use either in new applications. SHA-2 is probably your best bet right now.
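With Python's hashlib, for instance, switching to a SHA-2 function is a drop-in change:

    import hashlib

    data = b"some data to protect"

    # SHA-256 (a SHA-2 family member) instead of MD5 or SHA-1,
    # used at its full length:
    print(hashlib.sha256(data).hexdigest())   # 64 hex chars = 256 bits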
I would definitely recommend against reducing the bit count of a hash. There are too many issues at stake here. Firstly, how would you decide which bits to drop?
Secondly, it would be hard to predict how the dropping of those bits would affect the distribution of outputs in the new "shortened" hash function. A (well-designed) hash function is meant to distribute inputs evenly across the whole of the output space, not a subset of it.
By dropping half the bits you are effectively taking a subset of the original hash function, which might not have nearly the desirable properties of a properly-designed hash function, and may lead to further weaknesses.
Well, since every extra bit in the hash doubles the number of possible hashes, every time you shorten the hash by a bit there are only half as many possible hashes, and thus the chance of guessing that random number is doubled.

128 bits -> 2^128 possibilities
64 bits -> 2^64 possibilities

So by cutting the hash in half you divide the number of possible values by a factor of 2^64, and the chance of a correct random guess goes from 2^-128 to 2^-64.
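A quick check of that arithmetic in Python (floats are fine at this scale):

    # Chance of one random guess matching an n-bit hash is 2**-n.
    p128 = 2.0 ** -128
    p64 = 2.0 ** -64
    print(p64 / p128)   # 2**64 ~ 1.8e19: guessing becomes ~2^64 times easier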