Hashing using division method - hash

For the hash function : h(k) = k mod m;
I understand that m=2^n will always give the last n LSB digits. I also understand that m=2^p-1 when K is a string converted to integers using radix 2^p will give same hash value for every permutation of characters in K. But why exactly "a prime not too close to an exact power of 2" is a good choice? What if I choose 2^p - 2 or 2^p-3? Why are these choices considered bad?
Following is the text from CLRS:
"A prime not too close to an exact power of 2 is often a good choice for m. For
example, suppose we wish to allocate a hash table, with collisions resolved by
chaining, to hold roughly n D 2000 character strings, where a character has 8 bits.
We don’t mind examining an average of 3 elements in an unsuccessful search, and
so we allocate a hash table of size m D 701. We could choose m D 701 because
it is a prime near 2000=3 but not near any power of 2."

Suppose we work with radix 2p.
2p-1 case:
Why that is a bad idea to use 2p-1? Let us see,
k = ∑ai2ip
and if we divide by 2p-1 we just get
k = ∑ai2ip = ∑ai mod 2p-1
so, as addition is commutative, we can permute digits and get the same result.
2p-b case:
Quote from CLRS:
A prime not too close to an exact power of 2 is often a good choice for m.
k = ∑ai2ip = ∑aibi mod 2p-b
So changing least significant digit by one will change hash by one. Changing second least significant bit by one will change hash by two. To really change hash we would need to change digits with bigger significance. So, in case of small b we face problem similar to the case then m is power of 2, namely we depend on distribution of least significant digits.

Related

Alternate method for bin2dec function [duplicate]

I have a 12-bit binary that I need to convert to a decimal. For example:
A = [0,1,1,0,0,0,0,0,1,1,0,0];
Bit 1 is the most significant bit, Bit 12 is the least significant bit.
Note: This answer applies primarily to unsigned data types. For converting to signed types, a few extra steps are necessary, discussed here.
The bin2dec function is one option, but requires you to change the vector to a string first. bin2dec can also be slow compared to computing the number yourself. Here's a solution that's about 75 times faster:
>> A = [0,1,1,0,0,0,0,0,1,1,0,0];
>> B = sum(A.*2.^(numel(A)-1:-1:0))
B =
1548
To explain, A is multiplied element-wise by a vector of powers of 2, with the exponents ranging from numel(A)-1 down to 0. The resulting vector is then summed to give the integer represented by the binary pattern of zeroes and ones, with the first element in the array being considered the most significant bit. If you want the first element to be considered the least significant bit, you can do the following:
>> B = sum(A.*2.^(0:numel(A)-1))
B =
774
Update: You may be able to squeeze even a little more speed out of MATLAB by using find to get the indices of the ones (avoiding the element-wise multiplication and potentially reducing the number of exponent calculations needed) and using the pow2 function instead of 2.^...:
B = sum(pow2(find(flip(A))-1)); % Most significant bit first
B = sum(pow2(find(A)-1)); % Least significant bit first
Extending the solution to matrices...
If you have a lot of binary vectors you want to convert to integers, the above solution can easily be modified to convert all the values with one matrix operation. Suppose A is an N-by-12 matrix, with one binary vector per row. The following will convert them all to an N-by-1 vector of integer values:
B = A*(2.^(size(A, 2)-1:-1:0)).'; % Most significant bit first
B = A*(2.^(0:size(A, 2)-1)).'; % Least significant bit first
Also note that all of the above solutions automatically determine the number of bits in your vector by looking at the number of columns in A.
Dominic's answer assumes you have access to the Data Acquisition toolbox. If not use bin2dec:
A = [0,1,1,0,0,0,0,0,1,1,0,0];
bin2dec( sprintf('%d',A) )
or (in reverse)
A = [0,1,1,0,0,0,0,0,1,1,0,0];
bin2dec( sprintf('%d',A(end:-1:1)) )
depending on what you intend to be bit 1 and 12!
If the MSB is right-most (I'm not sure what you mean by Bit 1, sorry if that seems stupid):
Try:
binvec2dec(A)
Output should be:
ans =
774
If the MSB is left-most, use fliplr(A) first.

Space complexity for a simple streaming algorithm

I want to determine the space complexity of the go to example of a simple streaming algorithm.
If you get a permutation of n-1 different numbers and have to detect the one missing number, you calculate the sum of all numbers 1 to n using the formula n (n + 1) / 2 and then you subtract each incoming number. The result is your missing number. I found a german wikipedia article stating that the space complexity of this algorithm is O(log n). (https://de.wikipedia.org/wiki/Datenstromalgorithmus)
What I do not understand is: The amount of bits needed to store a number n is log2(n). ok.. but I do have to calculate the sum, tough. So n (n + 1) / 2 is larger than n and therefore needs more space than just log (n) right?
Can someone help me with this? Thanks in advance!
If integer A in binary coding requires Na bits and integer B requires Nb bits then A*B requires no more than Na+Nb bits (not Na * Nb). So, expression n(n+1)/2 requires no more than log2(n) + log2(n+1) = O(2log2(n)) = O(log2(n)) bits.
Even more, you may raise n to any fixed power i and it still will use O(log2(n)) space. n itself, n10, n500, n10000000 all require O(log(n)) bits of storage.

find discretization steps

I have data files F_j, each containing a list of numbers with an unknown number of decimal places. Each file contains discretized measurements of some continuous variable and
I want to find the discretization step d_j for file F_j
A solution I could come up with: for each F_j,
find the number (n_j) of decimal places;
multiply each number in F_j with 10^{n_j} to obtain integers;
find the greatest common divisor of the entire list.
I'm looking for an elegant way to find n_j with Matlab.
Also, finding the gcd of a long list of integers seems hard — do you have any better idea?
Finding the gcd of a long list of numbers isn't too hard. You can do it in time linear in the size of the list. If you get lucky, you can do it in time a lot less than linear. Essentially this is because:
gcd(a,b,c) = gcd(gcd(a,b),c)
and if either a=1 or b=1 then gcd(a,b)=1 regardless of the size of the other number.
So if you have a list of numbers xs you can do
g = xs(1);
for i = 2:length(xs)
g = gcd(x(i),g);
if g == 1
break
end
end
The variable g will now store the gcd of the list.
Here is some sample code that I believe will help you get the GCD once you have the numbers you want to look at.
A = [15 30 20];
A_min = min(A);
GCD = 1;
for n = A_min:-1:1
temp = A / n;
if (max(mod(temp,1))==0)
% yay GCD found
GCD = n;
break;
end
end
The basic concept here is that the default GCD will always be 1 since every number is divisible by itself and 1 of course =). The GCD also can't be greater than the smallest number in the list, thus I start with the smallest number and then decriment by 1. This is assuming that you have already converted the numbers to a whole number form at this point. Decimals will throw this off!
By using the modulus of 1 you are testing to see if the number is a whole number, if it isn't you will have a decmial remainder left which is greater than 0. If you anticipate having to deal with negative you will have to tweak this test!
Other than that, the first time you find a number where the modulus of the list (mod 1) is all zeros you've found the GCD.
Enjoy!

How to generate all possible combinations n-bit strings?

Given a positive integer n, I want to generate all possible n bit combinations in matlab.
For ex : If n=3, then answer should be
000
001
010
011
100
101
110
111
How do I do it ?
I want to actually store them in matrix. I tried
for n=1:2^4
r(n)=dec2bin(n,5);
end;
but that gave error "In an assignment A(:) = B, the number of elements in A and B must be the same.
Just loop over all integers in [0,2^n), and print the number as binary. If you always want to have n digits (e.g. insert leading zeros), this would look like:
for ii=0:2^n-1,
fprintf('%0*s\n', n, dec2bin(ii));
end
Edit: there are a number of ways to put the results in a matrix. The easiest is to use
x = dec2bin(0:2^n-1);
which will produce an n-by-2^n matrix of type char. Each row is one of the bit strings.
If you really want to store strings in each row, you can do this:
x = cell(1, 2^n);
for ii=0:2^n-1,
x{ii} = dec2bin(ii);
end
However, if you're looking for efficient processing, you should remember that integers are already stored in memory in binary! So, the vector:
x = 0 : 2^n-1;
Contains the binary patterns in the most memory efficient and CPU efficient way possible. The only trade-off is that you will not be able to represent patterns with more than 32 of 64 bits using this compact representation.
This is a one-line answer to the question which gives you a double array of all 2^n bit combinations:
bitCombs = dec2bin(0:2^n-1) - '0'
So many ways to do this permutation. If you are looking to implement with an array counter: set an array of counters going from 0 to 1 for each of the three positions (2^0,2^1,2^2). Let the starting number be 000 (stored in an array). Use the counter and increment its 1st place (2^0). The number will be 001. Reset the counter at position (2^0) and increase counter at 2^1 and go on a loop till you complete all the counters.

How can I convert a binary to a decimal without using a loop?

I have a 12-bit binary that I need to convert to a decimal. For example:
A = [0,1,1,0,0,0,0,0,1,1,0,0];
Bit 1 is the most significant bit, Bit 12 is the least significant bit.
Note: This answer applies primarily to unsigned data types. For converting to signed types, a few extra steps are necessary, discussed here.
The bin2dec function is one option, but requires you to change the vector to a string first. bin2dec can also be slow compared to computing the number yourself. Here's a solution that's about 75 times faster:
>> A = [0,1,1,0,0,0,0,0,1,1,0,0];
>> B = sum(A.*2.^(numel(A)-1:-1:0))
B =
1548
To explain, A is multiplied element-wise by a vector of powers of 2, with the exponents ranging from numel(A)-1 down to 0. The resulting vector is then summed to give the integer represented by the binary pattern of zeroes and ones, with the first element in the array being considered the most significant bit. If you want the first element to be considered the least significant bit, you can do the following:
>> B = sum(A.*2.^(0:numel(A)-1))
B =
774
Update: You may be able to squeeze even a little more speed out of MATLAB by using find to get the indices of the ones (avoiding the element-wise multiplication and potentially reducing the number of exponent calculations needed) and using the pow2 function instead of 2.^...:
B = sum(pow2(find(flip(A))-1)); % Most significant bit first
B = sum(pow2(find(A)-1)); % Least significant bit first
Extending the solution to matrices...
If you have a lot of binary vectors you want to convert to integers, the above solution can easily be modified to convert all the values with one matrix operation. Suppose A is an N-by-12 matrix, with one binary vector per row. The following will convert them all to an N-by-1 vector of integer values:
B = A*(2.^(size(A, 2)-1:-1:0)).'; % Most significant bit first
B = A*(2.^(0:size(A, 2)-1)).'; % Least significant bit first
Also note that all of the above solutions automatically determine the number of bits in your vector by looking at the number of columns in A.
Dominic's answer assumes you have access to the Data Acquisition toolbox. If not use bin2dec:
A = [0,1,1,0,0,0,0,0,1,1,0,0];
bin2dec( sprintf('%d',A) )
or (in reverse)
A = [0,1,1,0,0,0,0,0,1,1,0,0];
bin2dec( sprintf('%d',A(end:-1:1)) )
depending on what you intend to be bit 1 and 12!
If the MSB is right-most (I'm not sure what you mean by Bit 1, sorry if that seems stupid):
Try:
binvec2dec(A)
Output should be:
ans =
774
If the MSB is left-most, use fliplr(A) first.