Why do people use hash(k) = c * k with a prime c?

Given a positive integer m, a hash function defined on a set T is a map T -> {0, 1, 2, ..., m - 1}. For an element k of T, we denote by hash(k, m) its hashed value.
For simplicity, most hash functions are of the form hash(k, m) = f(k) % m where f is a map from T to the set of integers.
In the case where m = 2^p (which is often used because the modulo m operation is then cheap) and T is a set of integers, I have seen many people using f(k) = c * k with c being a prime number.
I understand that if you want to choose a function of the form f(k) = c * k, you need gcd(c, m) = 1 for every hash table size m. Even though using a prime number fits the bill, c = 1 also works.
So my question is the following: why do people still use f(k) = prime * k as their hash function? What kind of nice property does it have?

You don't need it to be prime. One of the most efficient hash functions with provably low collision probability just multiplies by a random number: https://en.wikipedia.org/wiki/Universal_hashing#Avoiding_modular_arithmetic. You do, however, need it to be odd.
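For concreteness, here is a small Python sketch of that multiply-shift scheme; the word size w, the bucket-count exponent p and the function name are illustrative choices, not part of the linked page:
import random

w = 64                          # key word size in bits
p = 10                          # table has m = 2**p buckets
a = random.getrandbits(w) | 1   # random odd multiplier; the oddness is what matters

def multiply_shift_hash(k):
    # h(k) = ((a * k) mod 2**w) >> (w - p): keep the top p bits of the low word of the product
    return ((a * k) & ((1 << w) - 1)) >> (w - p)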

Related

Hash Function for 3 Integers

I have 3 non-negative integers and a number n such that
0 <= a <= n, 0 <= b <= n, and 0 <= c <= n.
I need a one-way hash function that maps these 3 integers to one integer (could be any integer, positive or negative). Is there a way to do this, and if so, how? Can this function be expressed as a simple mathematical expression whose only parameters are a, b, c, and n?
Note: I need this function because I was using tuples of 3 integers as keys in a Python dictionary, and with upwards of 10^10 keys, space is a real issue.
How about the Cantor pairing function (https://en.wikipedia.org/wiki/Pairing_function#Cantor_pairing_function)?
Let
H(a,b) := .5*(a + b)*(a + b + 1) + b
then
H(a,b,c) := .5*(H(a,b) + c)*(H(a,b) + c + 1) + c
You mentioned that you need a one-way hash, but based on your detailed description about memory constraints it seems that an invertible hash would also suffice.
This doesn't use the assumption that a, b, and c are bounded above and below.
Augmenting the answer above for a more concise implementation:
int cantor(int a, int b) {
    return (a + b + 1) * (a + b) / 2 + b;
}

int hash(int a, int b, int c) {
    return cantor(a, cantor(b, c));
}
The easiest way to understand this is that the Cantor pairing function assigns a unique natural number to every pair of natural numbers.
Once we've assigned a natural number N = cantor(b, c), we can assign a new unique natural number M = cantor(a, N), which we can use as a hash code: it is a unique natural number for every triple (a, b, c).
As a more general case, you could hash more integers by just applying another cantor with the next integer (e.g. cantor(a, cantor(b, cantor(c, d)))).
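Since the question mentions Python dictionaries, here is the same nesting as the C snippet above as a small Python sketch (the helper name hash3 is just for illustration):
def cantor(a, b):
    # Cantor pairing: a unique natural number for each pair of naturals (a, b)
    return (a + b) * (a + b + 1) // 2 + b

def hash3(a, b, c):
    # nest the pairing to get one natural number per triple (a, b, c)
    return cantor(a, cantor(b, c))

d = {}
d[hash3(1, 2, 3)] = "value"   # instead of d[(1, 2, 3)] = "value"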

Creating a script that takes matrices of size n x n and size n x 1

I have a MATLAB function that takes a matrix of size n, and uses Gaussian elimination with partial pivoting to solve Ax = b. A is n x n, and b is n x 1.
I'm trying to create a script that generates random numbers and then calls the function with those numbers. So far I have
A = rand(n)
b = rand(n, 1)
genp(r, r)
but since n is undefined, it doesn't work. Is the best way simply to create the variable n and assign a random integer to it?
Yes, the best way is to create a variable n. What you want is (probably):
n = 10; % Change this if you want different sized data
A = rand(n);
b = rand(n,1);
genp(A,b);
This way you can very easily alter the size of your data, by simply changing n. Based on this question, I'm assuming you want genp(A,b), and not genp(r,r), as you wrote in your question.
If you want a random n every time you call the function, check out randi.

Generating Unique ID Number from Two Integers

Given 2 integers a and b (positive or negative), is there any formula/method for generating a unique ID number?
Note: 1. The result of f(a,b) and f(b,a) should be different. 2. Calculating f(a,b) x times (x > 1) should always give the same result.
To clarify the question: the function f(n) = (n * p) % q (where n = input sequence value, p = step size, q = maximum result size, n is a non-negative integer, n < q, p < q, p ⊥ q (coprime)) gives a unique ID number.
But in my case the input is two numbers, and a and b can be negative or positive integers.
Any reference is appreciated.
You could generate a long (64 bit) from 2 integers (32 bit) by left-shifting the first integer by 32 bits and then adding the second integer.
private long uniqueId(int left, int right) {
    long uniqueId = (long) left;
    uniqueId = uniqueId << 32;   // put the first int in the upper 32 bits
    uniqueId += (long) right;    // the result is distinct for every (left, right) pair
    return uniqueId;
}
Say your integers have a range in [MIN_INT,MAX_INT]. Then, given an integer n from this range, the function
f(n) = n - MIN_INT
gives a unique non-negative integer f(n) in the range [0, MAX_INT - MIN_INT], which is often called a rank.
Denote M = MAX_INT - MIN_INT + 1. Then, to find a unique id g(n,m) of two concatenated integers n and m, you can use the common access style also used for two-dimensional arrays:
g(n,m) = f(n)*M + f(m)
That is, you simply scale the rank of the first integer by the number of possible values M and add the rank of the second.
Practically, of course, you have to be careful in order to avoid overflows -- that is, you should use some suited data types.
Here is an example: say your integers come from the range [-1,4], thus M=6. Then, for two integers n=3 and m=-1 out of this range, g(n,m) = f(3)*6 + f(-1) = 4*6 + 0 = 24 can be used as the id.
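A minimal Python sketch of this rank-and-offset scheme, assuming 32-bit signed integers (the constants and names below are illustrative; Python integers do not overflow, so no special data type is needed here):
MIN_INT = -2**31
MAX_INT = 2**31 - 1
M = MAX_INT - MIN_INT + 1       # number of distinct 32-bit values, 2**32

def rank(n):
    # f(n): shift n into the non-negative range [0, M - 1]
    return n - MIN_INT

def unique_id(n, m):
    # g(n, m) = f(n)*M + f(m): distinct for every ordered pair, so
    # unique_id(a, b) != unique_id(b, a) whenever a != b
    return rank(n) * M + rank(m)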

Multidimensional Array Multiplication in MATLAB

I have the following three arrays in Matlab:
A size: 2xMxN
B size: MxN
C size: 2xN
Is there any way to remove the following loop to speed things up?
D = zeros(2,N);
for i=1:N
    D(:,i) = A(:,:,i) * ( B(:,i) - A(:,:,i)' * C(:,i) );
end
Thanks
Yes, it is possible to do without the for loop, but whether this leads to a speed-up depends on the values of M and N.
Your idea of a generalized matrix multiplication is interesting, but it is not exactly to the point here, because through the repeated use of the index i you effectively take a generalized diagonal of a generalized product, which means that most of the multiplication results are not needed.
The trick to implement the computation without a loop is to a) match matrix dimensions through reshape, b) obtain the matrix product through bsxfun(@times, …) and sum, and c) get rid of the resulting singleton dimensions through reshape:
par = B - reshape(sum(bsxfun(@times, A, reshape(C, 2, 1, N)), 1), M, N);
D = reshape(sum(bsxfun(@times, A, reshape(par, 1, M, N)), 2), 2, N);
par is the value of the inner expression in parentheses, D the final result.
As said, the timing depends on the exact values. For M = 100 and N = 1000000 I find a speed-up by about a factor of two, for M = 10000 and N = 10000 the loop-less implementation is actually a bit slower.
You may find that the following
D=tprod(A,[1 -3 2],B-tprod(A,[-3 1 2],C,[-3 2]),[-3 2]);
cuts the time taken. I did a few tests and found the time was cut in about half.
tprod is available at
http://www.mathworks.com/matlabcentral/fileexchange/16275
tprod requires that A, B and C are full (not sparse).

Permuting n elements by swapping each element by no more than k positions

What I have is a vector (n = 4 in the example):
x = '0123';
What I want is a vector y of the same size as x and with the same elements as in x, in a different order:
y = ['0123'; '0132'; '0213'; '0231'; '0312'; '0321'; '1023'; '1032'; '1203'; '1302'; '2013'; '2031'; '2103'; '2301'];
y(ceil(rand * numel(y(:, 1))), :)
i.e. a permutation such that each element in y is allowed to randomly change no more than k positions with respect to its original position in x (k = 2 in the example). The probability distribution must be uniform (i.e. each permutation must be equally likely to occur).
An obvious but inefficient way to do it is of course to find a random unconstrained permutation and check ex post whether or not this happens to respect the constraint. For small vectors you can find all the permutations, delete those that are not allowed and randomly pick among the remaining ones.
Any idea about how to do the same more efficiently, for example by actually swapping the elements?
Generating all the permutations can be done easily using constraint programming. Here is a short model using MiniZinc for the above example (note that we assume that x will contain n different values here):
include "globals.mzn";
int: k = 2;
int: n = 4;
array[1..n] of int: x = [0, 1, 2, 3];
array[1..n] of var int: y;
constraint forall(i in 1..n) (
    y[i] in {x[i + offset] | offset in -min(k, i-1)..min(k, n-i)}
);
constraint all_different(y);
solve :: int_search(y, input_order, indomain_min, complete)
    satisfy;
output [show(y)];
In most cases, constraint programming systems have the possibility to use a random search. However, this would not give you a uniform distribution of the results. Using CP will nevertheless generate all valid permutations more efficiently than the naive method (generate and test for validity).
If you need to generate a random permutation of your kind efficiently, I think that it would be possible to modify the standard Fisher-Yates shuffle to handle it directly. The standard algorithm uses the rest of the array to choose the next value from, and chooses the value with a probability distribution that is uniform. It should be possible to keep a list of only the currently valid choices, and to change the probability distribution of the values to match the desired output.
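A rough Python sketch of the "currently valid choices" bookkeeping (not from the answer above; the function name is illustrative, and it deliberately leaves the re-weighting step out):
import random

def constrained_shuffle(x, k):
    # Fill positions left to right, drawing each position's value only from the
    # not-yet-placed elements whose original index lies within k of that position.
    n = len(x)
    remaining = list(range(n))   # original indices not yet placed
    out = []
    for pos in range(n):
        # the element originally at index pos - k (if still unplaced) has no later
        # chance to satisfy the constraint, so it must be placed now
        forced = [i for i in remaining if i + k == pos]
        valid = forced if forced else [i for i in remaining if abs(i - pos) <= k]
        # NOTE: a uniform choice among `valid` here does NOT make the overall
        # distribution over permutations uniform
        choice = random.choice(valid)
        remaining.remove(choice)
        out.append(x[choice])
    return out

print(''.join(constrained_shuffle('0123', 2)))
Making the step-wise probabilities yield a uniform distribution over all valid permutations would require weighting each candidate by the number of valid completions it leaves, which is exactly the part the suggestion above leaves open.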
I don't see any approach other than the rejection method that you mention. However, instead of listing all allowed permutations and then picking one, it's more efficient to avoid that listing. Thus, you can randomly generate a permutation, check if it's valid, and repeat if it's not:
x = '0123';
k = 2;
n = numel(x);
done = 0;
while ~done
    perm = randperm(n);
    done = all( abs(perm-(1:n)) <= k ); %// check condition
end
y = x(perm);