Hash function that returns the same hash for a sum even if different terms lead to the same sum - hash

let's say I have:
n = 14
n is the result of the following sums of integers:
[5, 2, 7] -> 5 + 2 + 7 = 14 = n
[3, 4, 5, 2] -> 3 + 4 + 5 + 2 = 14 = n
[1, 13] -> 1 + 13 = 14 = n
[13, 1] -> 13 + 1 = 14 = n
[4, 3, 5, 2] -> 4 + 3 + 5 + 2 = 14 = n
...
I would need a hash function h so that:
h([5, 2, 7]) = h([3, 4, 5, 2]) = h([1, 13]) = h([13, 1]) = h([4, 3, 5, 2]) = h(...)
I.e. the order of the integer terms doesn't matter, and as long as their sum is the same, their hash should also be the same.
I need to do this without computing the sum n, because the terms as well as n can be very high and easily overflow (they don't fit the bits of an int), that's why I am asking this question.
Are you aware or maybe do you have an insight on how I can implement such a hash function?
Given a list/sequence of integers, this hash function must return the same hash if the sum of the integers would be the same, but without computing the sum.
Thank you for your attention.
EDIT: I elaborated on @derpirscher's answer and modified his function a bit further, as I had collisions on multiples of BIG_PRIME (this example is in JavaScript):
function hash(seq) {
  const BIG_PRIME = 999999999989;
  const MAX_SAFE_INTEGER_DIV_2_FLOOR = Math.floor(Number.MAX_SAFE_INTEGER / 2);
  let h = 0;
  // `let` keeps the loop counter local instead of creating an implicit global
  for (let i = 0; i < seq.length; i++) {
    let value = seq[i];
    if (h > MAX_SAFE_INTEGER_DIV_2_FLOOR) {
      h = h % BIG_PRIME;
    }
    if (value > MAX_SAFE_INTEGER_DIV_2_FLOOR) {
      value = value % BIG_PRIME;
    }
    h += value;
  }
  return h;
}
My question now would be: what do you think about this function? Are there some edge cases I didn't take into account?
Thank you.
EDIT 2:
Using the above function, hash([1,2]) and hash([4504 * BIG_PRIME + 1, 4504 * BIG_PRIME + 2]) will collide, as mentioned by @derpirscher.
Here is another modified version of the above function, which applies the modulo % BIG_PRIME to only one of the two terms if either of the two is greater than MAX_SAFE_INTEGER_DIV_2_FLOOR:
function hash(seq) {
  const BIG_PRIME = 999999999989;
  const MAX_SAFE_INTEGER_DIV_2_FLOOR = Math.floor(Number.MAX_SAFE_INTEGER / 2);
  let h = 0;
  for (let i = 0; i < seq.length; i++) {
    let value = seq[i];
    if (
      h > MAX_SAFE_INTEGER_DIV_2_FLOOR &&
      value > MAX_SAFE_INTEGER_DIV_2_FLOOR
    ) {
      if (h > MAX_SAFE_INTEGER_DIV_2_FLOOR) {
        h = h % BIG_PRIME;
      } else if (value > MAX_SAFE_INTEGER_DIV_2_FLOOR) {
        value = value % BIG_PRIME;
      }
    }
    h += value;
  }
  return h;
}
I think this version lowers the number of collisions a bit further.
What do you think? Thank you.
EDIT 3:
Even though I tried to elaborate on @derpirscher's answer, his implementation of hash is the correct one and the one to use.
Use his version if you need such a hash function.

You could calculate the sum modulo some big prime. If you want to stay within the range of int, you need to know what the maximum integer is in the language you are using. Then select a BIG_PRIME that's just below maxint / 2.
Assuming an int to be 4 bytes, maxint = 2147483647, so the biggest prime < maxint/2 would be 1073741789:
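int hash(int[] seq) {
    // largest prime below maxint / 2 for a 4-byte int
    const int BIG_PRIME = 1073741789;
    int h = 0;
    for (int i = 0; i < seq.Length; i++) {
        // both summands are below BIG_PRIME, so their sum stays below maxint
        h = (h + seq[i] % BIG_PRIME) % BIG_PRIME;
    }
    return h;
}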
As at every step both summands will always be below maxint/2 you won't get any overflows.
Edit
From a mathematical point of view, the following property which may be important for your use case holds:
(a + b + c + ...) % N == (a % N + b % N + c % N + ...) % N
But yeah, of course, as with every hash function, you will have collisions. You can't have a hash function without collisions, because the size of the domain of the hash function (i.e. the number of possible input values) is generally much bigger than the size of the codomain (i.e. the number of possible output values).
For your example, the size of the domain is (in principle) infinite, as you can have any count of numbers from 1 to 2000000000 in your sequence. But your codomain has just ~2000000000 elements (i.e. the range of int).
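As a minimal sketch of the approach above, here is the same modular sum in Python. Python integers do not overflow, so this only illustrates the arithmetic and the collision behaviour; the name `sum_hash` and the `modulus` parameter are assumptions made for this example, and non-negative terms are assumed.

```python
BIG_PRIME = 1073741789  # the prime used in the answer above


def sum_hash(seq, modulus=BIG_PRIME):
    """Order-independent hash: the sum of the terms, reduced modulo `modulus`."""
    h = 0
    for value in seq:
        # Reduce the term first, then the running total, so that no
        # intermediate result ever reaches 2 * modulus.
        h = (h + value % modulus) % modulus
    return h


# Same sum, different terms or order: same hash.
print(sum_hash([5, 2, 7]) == sum_hash([13, 1]))  # True
# Sums that differ by a multiple of the modulus collide, as expected.
print(sum_hash([1, 2]) == sum_hash([BIG_PRIME + 1, BIG_PRIME + 2]))  # True
```

The second check shows the kind of collision discussed above: it is inherent to reducing modulo a fixed number, not a defect of a particular implementation.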

Related

Generate cell with random pairs without repetitions

How to generate a sequence of random pairs without repeating pairs?
The following code already generates the pairs, but does not avoid repetitions:
for k=1:8
Comb=[randi([-15,15]) ; randi([-15,15])];
T{1,k}=Comb;
end
When running I got:
T= [-3;10] [5;2] [1;-5] [10;9] [-4;-9] [-5;-9] [3;1] [-3;10]
The pair [-3;10] is repeated, which must not happen.
PS: The entries can be positive or negative.
Is there any built-in function for this? Any suggestion on how to solve this?
If you have the Statistics Toolbox, you can use randsample to sample 8 numbers from 1 to 31^2 (where 31 is the population size), without replacement, and then "unpack" each obtained number into the two components of a pair:
s = -15:15; % population
M = 8; % desired number of samples
N = numel(s); % population size
y = randsample(N^2, M); % sample without replacement
result = s([ceil(y/N) mod(y-1, N)+1]); % unpack pair and index into population
Example run:
result =
14 1
-5 7
13 -8
15 4
-6 -7
-6 15
2 3
9 6
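The same idea (sample distinct indices without replacement, then unpack each index into two components) can be sketched in Python. This is only an illustration of the technique, not a MATLAB replacement; the function name `sample_unique_pairs` and its parameters are assumptions made for this example.

```python
import random


def sample_unique_pairs(low, high, count, rng=random):
    """Draw `count` distinct ordered pairs with entries in low..high (inclusive)."""
    population = list(range(low, high + 1))
    size = len(population)
    # Each code in 0 .. size*size - 1 identifies exactly one ordered pair,
    # so sampling codes without replacement rules out repeated pairs.
    codes = rng.sample(range(size * size), count)
    return [(population[code // size], population[code % size]) for code in codes]


pairs = sample_unique_pairs(-15, 15, 8)
print(pairs)
```

Because the codes are distinct, the pairs are distinct by construction, so no rejection loop is needed.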
You can use ind2sub:
n = 15;
m = 8;
[x y]=ind2sub([n n],randperm(n*n,m));
Two possibilities:
1.
M = nchoosek(1:15, 2);
T = datasample(M, 8, 'replace', false);
2.
T = zeros(8,2);
k = 1;
while (k <= 8)
t = randi(15, [1,2]);
b1 = (T(:,1) == t(1));
b2 = (T(:,2) == t(2));
if ~any(b1 & b2)
T(k,:) = t;
k = k + 1;
end
end
The first method is probably faster but takes up more memory and may not be practicable for very large numbers (ex: if instead of 15, the max was 50000), in which case you have to go with 2.

vectorizing function in octave / matlab

I have a function, and I was wondering whether it is possible to vectorize it so that I don't have to use a for loop. The code is below.
a=1:2:8
for jj=1:length(a)
b(jj)=rtfib(a(jj)); % Fibonacci function
end
b
output below:
a =
1 3 5 7
>>>b =
1 3 8 21
I was trying to do it this way
t = 0:.01:10;
y = sin(t);
but the code below doesn't work. Any suggestions?
PS: I'm trying to keep the function rtfib because of its speed, and I need to use very large Fibonacci numbers. I'm using Octave 3.8.1.
a=1:2:8
b=rtfib(a)
Here's the rtfib code below as requested
function f = rtfib(n)
if (n == 0)
f = 0;
elseif (n==1)
f=1;
elseif (n == 2)
f = 2;
else
fOld = 2;
fOlder = 1;
for i = 3 : n
f = fOld + fOlder;
fOlder = fOld;
fOld = f;
end
end
end
You can see that your function rtfib actually computes every Fibonacci number up to n.
You can modify it so that it stores and returns all these numbers, so that you only have to call the function once with the maximum number you need:
function f = rtfib(n)
f=zeros(1,n+1);
if (n >= 0)
f(1) = 0;
end
if (n>=1)
f(2)=1;
end
if (n >= 2)
f(3) = 2;
end
if n>2
fOld = 2;
fOlder = 1;
for i = 3 : n
f(i+1) = fOld + fOlder;
fOlder = fOld;
fOld = f(i+1);
end
end
end
(It will return fibonacci(n) in f(n+1). If you don't need the 0, you could change it so that it returns fibonacci(n) in f(n) instead.)
Then you only need to call
>>f=rtfib(max(a));
>>b=f(a+1)
b =
1 3 8 21
If you don't want to store everything, you could modify the function rtfib a little more, so that it takes the array a as input, computes the Fibonacci numbers up to max(a) but only stores the ones needed, and directly returns b.
Both solutions will slow down rtfib itself, but they will be a lot faster than calculating the Fibonacci numbers from 0 each time.
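The second variant (one pass up to max(a), keeping only the requested entries) could look roughly like this in Python. It is a sketch that mirrors the seed values of the rtfib shown in the question (rtfib(0) = 0, rtfib(1) = 1, rtfib(2) = 2); the name `rtfib_many` is an assumption made for this example, and the indices are assumed to be non-negative integers.

```python
def rtfib_many(indices):
    """Return rtfib(k) for every k in `indices` using a single pass up to max(indices)."""
    wanted = set(indices)
    found = {}
    top = max(wanted, default=-1)
    # Seed values copied from the question's rtfib.
    for k, seed in ((0, 0), (1, 1), (2, 2)):
        if k in wanted:
            found[k] = seed
    older, old = 1, 2  # rtfib(1), rtfib(2)
    for k in range(3, top + 1):
        older, old = old, older + old
        if k in wanted:
            found[k] = old  # store only the values that were asked for
    return [found[k] for k in indices]


print(rtfib_many([1, 3, 5, 7]))  # [1, 3, 8, 21]
```

The loop runs once regardless of how many indices are requested, which is the point of the suggestion above.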

Fast way to test whether n^2 + (n+1)^2 is perfect square

I am trying to write code to test whether n^2 + (n+1)^2 is a perfect square.
As I do not have much experience in programming, I only have Matlab at my disposal.
So far this is what I have tried
function [ Liste ] = testSquare(N)
if exist('NumberTheory')
load NumberTheory.mat
else
MaxT = 0;
end
if MaxT > N
return
elseif MaxT > 0
L = 1 + MaxT;
else
L = 1;
end
n = (L:N)'; % Makes a list of numbers from L to N
m = n.^2 + (n+1).^2; % Makes a list of numbers on the form A^2+(A+1)^2
P = dec2hex(m); % Converts this list to hexadecimal
Length = length(dec2hex(P(N,:))); % Finds the maximum number of digits in the hexadecimal number
Modulo = ['0','1','4','9']'; % Only numbers ending on 0,1,4 or 9 can be perfect squares in hex
[d1,~] = ismember(P(:,Length),Modulo); % Finds all numbers that end on 0,1,4 or 9
m = m(d1); % Removes all numbers not ending on 0,1,4 or 9
n = n(d1); % -------------------||-----------------------
mm = sqrt(m); % Takes the square root of all the possible squares
A = (floor(mm + 0.5).^2 == m); % Tests whether these are actually squares
lA = length(A(A>0)); % Finds the number of such numbers
MaxT = N;
save NumberTheory.mat MaxT;
if lA>0
m = m(A); % makes a list of all the square numbers
n = n(A); % finds the corresponding n values
mm = mm(A); % Finds the squareroot values of m
fid = fopen('Tallteori.txt','wt'); % Writes everything to a simple text.file
for ii = 1:lA
fprintf(fid,'%20d %20d %20d\t',n(ii),m(ii),mm(ii));
fprintf(fid,'\n');
end
fclose(fid);
end
end
Which will write the squares with the corresponding n values to a file. Now I saw that using hexadecimal was a fast way to find perfect squares in C+, and tried to use this in matlab. However I am a tad unsure if this is the best approach.
The code above breaks down when m > 2^52 due to the hexadecimal conversion.
Is there an alternative/faster way to write all the perfect squares of the form n^2 + (n+1)^2, for n from 1 to N, to a text file?
There is a much faster way that doesn't even require testing. You need a bit of elementary number theory to find that way, but here goes:
If n² + (n+1)² is a perfect square, that means there is an m such that
m² = n² + (n+1)² = 2n² + 2n + 1
<=> 2m² = 4n² + 4n + 1 + 1
<=> 2m² = (2n+1)² + 1
<=> (2n+1)² - 2m² = -1
Equations of that type are easily solved, starting from the "smallest" (positive) solution
1² - 2*1² = -1
of
x² - 2y² = -1
corresponding to the number 1 + √2, you obtain all further solutions by multiplying that with a power of the primitive solution of
a² - 2b² = 1
which is (1 + √2)² = 3 + 2*√2.
Writing that in matrix form, you obtain all solutions of x² - 2y² = -1 as
|x_k|   |3 4|^k   |1|
|y_k| = |2 3|   * |1|
and all x_k are necessarily odd, thus can be written as 2*n + 1.
The first few solutions (x,y) are
(1,1), (7,5), (41,29), (239,169)
corresponding to (n,m)
(0,1), (3,5), (20,29), (119,169)
You can get the next (n,m) solution pair via
(n_(k+1), m_(k+1)) = (3*n_k + 2*m_k + 1, 4*n_k + 3*m_k + 2)
starting from (n_0, m_0) = (0,1).
Quick Haskell code since I don't speak MatLab:
Prelude> let next (n,m) = (3*n + 2*m + 1, 4*n + 3*m + 2) in take 20 $ iterate next (0,1)
[(0,1),(3,5),(20,29),(119,169),(696,985),(4059,5741),(23660,33461),(137903,195025)
,(803760,1136689),(4684659,6625109),(27304196,38613965),(159140519,225058681)
,(927538920,1311738121),(5406093003,7645370045),(31509019100,44560482149)
,(183648021599,259717522849),(1070379110496,1513744654945),(6238626641379,8822750406821)
,(36361380737780,51422757785981),(211929657785303,299713796309065)]
Prelude> map (\(n,m) -> (n^2 + (n+1)^2 - m^2)) it
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Edit by EitanT:
Here's the MATLAB code to calculate the first N numbers:
res = zeros(1, N);
nm = [0, 1];
for k = 1:N
nm = nm * [3 4; 2 3] + [1, 2];
res(k) = nm(1);
end
The resulting array res should hold the values of n that satisfy the condition of the perfect square.
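For the original request (all solutions with n between 1 and a given bound), the recurrence above can be sketched in Python, whose integers are exact, so the 2^52 limit mentioned in the question does not apply. The function name `square_sum_solutions` is an assumption made for this example.

```python
def square_sum_solutions(limit):
    """Return all (n, m) with 1 <= n <= limit and n**2 + (n + 1)**2 == m**2."""
    solutions = []
    n, m = 0, 1  # trivial starting solution: 0**2 + 1**2 == 1**2
    while True:
        # Step to the next solution with the recurrence derived above.
        n, m = 3 * n + 2 * m + 1, 4 * n + 3 * m + 2
        if n > limit:
            break
        solutions.append((n, m))
    return solutions


for n, m in square_sum_solutions(1000):
    print(n, m * m, m)
```

Since n grows by a factor of roughly 5.8 per step, the loop needs only a handful of iterations even for very large bounds.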

Counting Number of Specific Outputs of a Function

I have a matrix and I want to apply a function to each row of it. This function has three possible outputs: x = 0, x = 1, or x > 0. There are a couple of things I'm running into trouble with...
1) The cases that output x = 1 or x > 0 are different and I'm not sure how to differentiate between the two when writing my script.
2) My function isn't counting correctly? I think this might be a problem with how I have my loop set up?
This is what I've come up with. Logically, I feel like this should work (except for the hiccup w/ the first problem I've stated)
[m n] = size(matrix);
a = 0; b = 0; c = 0;
for i = 1 : m
x(i) = function(matrix(m,:));
if x > 0
a = a + 1;
end
if x == 0
b = b + 1;
end
if x == 1
c = c + 1;
end
end
First you probably have an error in line 4. It probably should be i instead of m.
x(i) = function(matrix(i,:));
You can calculate a, b and c out of the loop:
a = sum(x>0);
b = sum(x==0);
c = sum(x==1);
If you want to distinguish x==1 from x>0, you could count the values that are positive but not equal to 1 with sum(xor(x==1,x>0)).
Also, you may run into precision errors when comparing double values with 0 and 1.
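A small Python sketch of the same counting, with a tolerance to sidestep the floating-point comparison issue mentioned above. The function name `count_outputs` and the tolerance value are assumptions made for this example, not something taken from the question.

```python
import math


def count_outputs(values, tol=1e-9):
    """Count values that are ~0, ~1, and positive but not ~1."""
    zeros = sum(1 for v in values if math.isclose(v, 0.0, abs_tol=tol))
    ones = sum(1 for v in values if math.isclose(v, 1.0, abs_tol=tol))
    other_positive = sum(
        1 for v in values
        if v > tol and not math.isclose(v, 1.0, abs_tol=tol)
    )
    return zeros, ones, other_positive


print(count_outputs([0, 1, 2.5, 0.0, 3]))  # (2, 1, 2)
```

Keeping "equal to 1" and "positive but not 1" as separate counters avoids the double counting that happens when both x > 0 and x == 1 are tested independently.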

C / Generate a random number between 1 to 4 leaving 3 out with arc4random( )?

I have
int y = (arc4random()%4)+1;
So it generates a random number from 1 to 4.
I wanted to ask if there's a way to leave number 3 out so only numbers 1, 2 and 4 have a chance to get generated.
Thank you!
int allowedNumbers[3] = {1, 2, 4};
int index = arc4random() % 3;
int number = allowedNumbers[index];
You can always make a random from 0-2 (arc4random() % 3) and use that number with 2 as a power:
2^0 = 1
2^1 = 2
2^2 = 4
and there you got your random from 1-4 without 3. In C:
int y = 1 << (arc4random() % 3);
Generate a random number from 0 to the number of different numbers you have (exclusive, and in your case, 3), and distribute the result according to your preference. In your case:
int y = (rand() % 3) + 1;
if (y == 3)
y++;
Assuming you want numbers that correspond to powers of two, then this should work nicely.
int y = 1 << arc4random_uniform(3);
If you want to leave out 3 for some other reason, then that would probably do more to obfuscate what you are doing than to help. In that case, something more straightforward would suffice.
int y; /* declared outside the loop so the while condition can see it */
do {
    y = arc4random_uniform(4) + 1;
} while (y == 3);
You can do this:
int y = (arc4random() % 3) + 1;
if (y == 3) y = 4;
Though you should use arc4random_uniform instead of the modulo operator.