Generating Unique ID Number from Two Integers - numbers

Given two integers a and b (each positive or negative), is there a formula or method for generating a unique ID number from them?
Notes: 1. The results of f(a,b) and f(b,a) should be different. 2. Calculating f(a,b) any number of times should always give the same result.
To clarify the question: the function f(n) = (n * p) % q (where n = input sequence value, p = step size, q = maximum result size, n is a non-negative integer, n < q, p < q, and p ⊥ q (coprime)) gives a unique ID number for a single input.
But in my case the input is two numbers, a and b, each of which can be a negative or positive integer.
Any reference would be appreciated.

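You could generate a long (64 bit) from 2 integers (32 bit) by left bit shifting the first integer by 32 and then adding the second integer.
private long uniqueId(int left, int right) {
    // Widen first so the shift happens in 64 bits, then move `left` into the upper 32 bits.
    long uniqueId = (long) left;
    uniqueId = uniqueId << 32;
    // `right` is sign-extended here; the result is still distinct for every (left, right) pair.
    uniqueId += (long) right;
    return uniqueId;
}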
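For comparison, here is a minimal Python sketch of the same packing idea, together with an inverse to show that no information is lost. The names `pack_pair` and `unpack_pair` are made up for illustration, and both inputs are assumed to fit in signed 32 bits. Unlike the Java version, this variant masks each input to its 32-bit pattern, so the result is a non-negative 64-bit value.

```python
MASK32 = 0xFFFFFFFF

def pack_pair(left, right):
    """Pack two signed 32-bit integers into one non-negative 64-bit value."""
    # Masking maps each signed value to its 32-bit two's-complement bit pattern.
    return ((left & MASK32) << 32) | (right & MASK32)

def _to_signed32(value):
    # Reinterpret a 32-bit pattern as a signed integer.
    return value - (1 << 32) if value & 0x80000000 else value

def unpack_pair(packed):
    """Recover the original (left, right) pair from a packed value."""
    return _to_signed32(packed >> 32), _to_signed32(packed & MASK32)

print(pack_pair(1, 2) != pack_pair(2, 1))   # order matters
print(unpack_pair(pack_pair(-7, 42)))       # round trip with a negative input
```

Because the pair can be recovered from the ID, two different pairs can never share an ID.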

Say your integers have a range in [MIN_INT,MAX_INT]. Then, given an integer n from this range, the function
f(n) = n - MIN_INT
assigns a unique non-negative integer f(n) in the range [0, MAX_INT - MIN_INT], which is often called a rank.
Denote M = MAX_INT - MIN_INT + 1. Then, to find a unique id g(n,m) of two concatenated integers n and m, you can use the row-major indexing scheme that is also used for two-dimensional arrays:
g(n,m) = f(n)*M + f(m)
That is, you scale the rank of the first integer by the number of possible values M and then add the rank of the second.
In practice, of course, you have to be careful to avoid overflows; that is, you should use suitable data types.
Here is an example: say your integers come from the range [-1,4], thus M=6. Then, for two integers n=3 and m=-1 out of this range, f(3) = 4 and f(-1) = 0, so g(n,m) = 4*6 + 0 = 24 can be used as id.
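The rank construction can be sketched in Python as follows. The function names and the explicit `min_int` / `max_int` parameters are illustrative choices, not part of the answer. Python integers do not overflow, so the data-type caveat does not apply here.

```python
def rank(n, min_int):
    """f(n) = n - MIN_INT: position of n within [min_int, max_int]."""
    return n - min_int

def pair_id(n, m, min_int, max_int):
    """g(n, m) = f(n) * M + f(m), with M the number of values in the range."""
    if not (min_int <= n <= max_int and min_int <= m <= max_int):
        raise ValueError("both integers must lie in [min_int, max_int]")
    size = max_int - min_int + 1  # M in the text
    return rank(n, min_int) * size + rank(m, min_int)

def pair_from_id(ident, min_int, max_int):
    """Invert pair_id: recover (n, m) from the id."""
    size = max_int - min_int + 1
    first_rank, second_rank = divmod(ident, size)
    return first_rank + min_int, second_rank + min_int

print(pair_id(3, -1, -1, 4))       # f(3) = 4, f(-1) = 0, so 4*6 + 0 = 24
print(pair_from_id(24, -1, 4))     # (3, -1)
```

For the range [-1,4] the 36 possible pairs map exactly onto the ids 0 to 35.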

Related

Why is the following code correct for computing the hash of a string?

I am currently reading about the Rabin Karp algorithm and as part of that I need to understand string polynomial hashing. From what I understand, the hash of a string is given by the following formula:
hash = ( char_0_val * p^0 + char_1_val * p^1 + ... + char_n_val * p^n ) mod m
Where:
char_i_val: is the integer value of the character plus 1 given by string[i]-'a' + 1
p is a prime number larger than the character set
m is a large prime number
The website cp-algorithms has the following entry on the subject. They say that the code to write the above is as follows:
long long compute_hash(string const& s) {
const int p = 31;
const int m = 1e9 + 9;
long long hash_value = 0;
long long p_pow = 1;
for (char c : s) {
hash_value = (hash_value + (c - 'a' + 1) * p_pow) % m;
p_pow = (p_pow * p) % m;
}
return hash_value;
}
I understand what the program is trying to do but I do not understand why it is correct.
My question
I am having trouble understanding why the above code is correct. It has been a long time since I have done any modular math. After searching online I see that we have the following formulas for modular addition and modular multiplication:
a+b (mod m) = (a%m + b%m)%m
a*b (mod m) = (a%m * b%m)%m
Based on the above shouldn't the code be as follows?
long long compute_hash(string const& s) {
const int p = 31;
const int m = 1e9 + 9;
long long hash_value = 0;
long long p_pow = 1;
for (char c : s) {
int char_value = (c - 'a' + 1);
hash_value = (hash_value%m + ((char_value%m * p_pow%m)%m)%m ) % m;
p_pow = (p_pow%m * p%m) % m;
}
return hash_value;
}
What am I missing? Ideally I am seeking a breakdown of the code and an explanation of why the first version is correct.
Mathematically, there is no reason to reduce intermediate results modulo m.
Operationally, there are a couple of very closely related reasons to do it:
To keep numbers small enough that they can be represented efficiently.
To keep numbers small enough that operations on them do not overflow.
So let's look at some quantities and see if they need to be reduced.
p was defined as some value less than m, so p % m == p.
p_pow and hash_value have already been reduced modulo m when they were computed, so reducing them modulo m again would do nothing.
char_value is at most 26, which is already less than m.
char_value * p_pow is at most 26 * (m - 1). That can be, and often will be, more than m. So reducing it modulo m would do something. But it can still be delayed, because the next step is still "safe" (no overflow).
char_value * p_pow + hash_value is still at most 27 * (m - 1) which is still much less than 2^63-1 (the maximum value for a long long, see below why I assume that a long long is 64-bit), so there is no problem yet. It's fine to reduce modulo m after the addition.
As a bonus, the loop could actually do (2^63-1) / (27 * (m - 1)) iterations before it needs to reduce hash_value modulo m. That's over 341 million iterations! For most practical purposes you could therefore remove the first % m and return hash_value % m; instead.
I used 2^63-1 in this calculation because p_pow = (p_pow * p) % m requires long long to be a 64-bit type (or, hypothetically, an exotic size of 36 bits or higher). If it was a 32-bit type (which is technically allowed, but rare nowadays) then the multiplication could overflow, because p_pow can be approximately a billion and a 32-bit type cannot hold 31 billion.
BTW note that this hash function is specifically for strings that only contain lower-case letters and nothing else. Other characters could result in a negative value for char_value which is bad news because the remainder operator % in C++ works in a way such that for negative numbers it is not the "modulo operator" (misnomer, and the C++ specification does not call it that). A very similar function can be written that can take any string as input, and that would change the analysis above a little bit, but not qualitatively.
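As a sanity check of the argument, the sketch below compares the reduce-as-you-go loop with a direct evaluation of the polynomial using Python's arbitrary-precision integers, where nothing is reduced until the very end. `compute_hash_exact` is a helper invented for this comparison, and the input is assumed to contain only lower-case letters.

```python
P = 31
M = 10**9 + 9

def compute_hash(s):
    """Same loop as the C++ version: reduce once per step."""
    hash_value = 0
    p_pow = 1
    for c in s:
        hash_value = (hash_value + (ord(c) - ord("a") + 1) * p_pow) % M
        p_pow = (p_pow * P) % M
    return hash_value

def compute_hash_exact(s):
    """Evaluate the full polynomial with big integers, reduce only at the end."""
    total = sum((ord(c) - ord("a") + 1) * P**i for i, c in enumerate(s))
    return total % M

for word in ["a", "abc", "rabinkarp", "z" * 200]:
    assert compute_hash(word) == compute_hash_exact(word)

print(compute_hash("abc"))  # 1 + 2*31 + 3*31^2 = 2946
```

Both functions agree because reducing modulo m at any intermediate point does not change the final remainder.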

MATLAB - How do I find the first integer of an infinite set that satisfies this condition?

I want to find the smallest integer P, such that the number of primes in the set {1,2,..., P} is less than P/6.
I think I have the answer via (long) trial and error but would like to know how to verify this through MATLAB.
You can use isprime to check which values in an array are prime numbers. If we want to check all integers up to the integer N we can do
% You can change this to the maximum number that you'd like to consider for P
N = 2000;
possible_P_values = 2:N; % Start at 2: 1 is not prime (and P = 1 satisfies the condition trivially, since 0 < 1/6)
is_prime_mask = isprime(possible_P_values); % named to avoid shadowing MATLAB's built-in primes function
To determine how many primes have occurred up to each integer P, we can use cumsum of this logical array (the cumulative sum)
nPrimes_less_than_or_equal_to_P = cumsum(is_prime_mask);
Then we can divide possible_P_values by 6 and check where the number of primes up to a certain point is less than that number.
is_less_than_P_over_6 = nPrimes_less_than_or_equal_to_P < (possible_P_values ./ 6);
Then we can identify the first occurrence with find
possible_P_values(find(is_less_than_P_over_6, 1, 'first'))
% 1081
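The result can be cross-checked outside MATLAB. The following Python sketch uses a sieve of Eratosthenes and a running prime count; the function name and the `start` parameter are assumptions made for this illustration. The comparison is done in integers (6 * count < P) to avoid any floating-point doubt.

```python
def first_p_with_few_primes(limit, start=2):
    """Smallest P in [start, limit] with (number of primes <= P) < P / 6."""
    # Sieve of Eratosthenes over 0..limit (limit >= 1 assumed).
    is_prime = [False, False] + [True] * (limit - 1)
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False

    count = 0  # running number of primes <= p
    for p in range(1, limit + 1):
        count += is_prime[p]
        if p >= start and 6 * count < p:
            return p
    return None  # no such P up to limit

print(first_p_with_few_primes(2000))           # 1081
print(first_p_with_few_primes(2000, start=1))  # 1, the trivial case
```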

Why do people use hash(k) = c * k with a prime c

Given an integer m, a hash function defined on T is a map T -> {0, 1, 2, ..., m - 1}. If k is an element of T and m is a positive integer, we denote hash(k, m) its hashed value.
For simplicity, most hash functions are of the form hash(k, m) = f(k) % m where f is a map from T to the set of integers.
In the case where m = 2^p (which is often used so that the modulo m operation is cheap) and T is a set of integers, I have seen many people using f(k) = c * k with c being a prime number.
I understand that if you want to choose a function of the form f(k) = c * k, you need to have gcd(c, m) = 1 for every hash table size m. Even though using a prime number fits the bill, c = 1 is also good.
So my question is the following: why do people still use f(k) = prime * k as their hash function? What kind of nice property does it have?
You don't need it to be prime. One of the most efficient hash functions with provable collision resistance just multiplies with a random number: https://en.wikipedia.org/wiki/Universal_hashing#Avoiding_modular_arithmetic. You do however need it to be odd.
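A small Python sketch of why oddness is what matters when the table size is a power of two. `multiply_shift` follows the multiply-shift scheme as I understand it from the linked section (keep the top p bits of a * k modulo 2^W); the word size W = 32 and the helper `distinct_products` are assumptions for this illustration.

```python
W = 32  # assumed machine word size in bits

def multiply_shift(k, a, p):
    """Hash a W-bit key k to p bits: top p bits of (a * k) mod 2^W, a odd."""
    return ((a * k) % (1 << W)) >> (W - p)

def distinct_products(a, bits=8):
    """Count distinct values of (a * k) mod 2^bits over all keys of that width."""
    return len({(a * k) % (1 << bits) for k in range(1 << bits)})

print(distinct_products(31))  # 256: an odd multiplier permutes the residues
print(distinct_products(30))  # 128: an even multiplier loses one bit
print(multiply_shift(123456789, 0x9E3779B1, 10))  # some bucket in [0, 1024)
```

An odd multiplier is invertible modulo 2^W, so distinct keys stay distinct before the shift; any even multiplier collapses keys together, prime or not.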

Hash Function for 3 Integers

I have 3 non-negative integers and a number n such that
0 <= a <= n, 0 <= b <= n, and 0 <= c <= n.
I need a one-way hash function that maps these 3 integers to one integer (could be any integer, positive or negative). Is there a way to do this, and if so, how? Is there a way so that this function can be expressed as a simple mathematical expression where the only parameters are a, b, c, and n?
Note: I need this function because I was using tuples of 3 integers as keys in a dictionary in Python, and with upwards of 10^10 keys, space is a real issue.
How about the Cantor pairing function (https://en.wikipedia.org/wiki/Pairing_function#Cantor_pairing_function)?
Let
H(a,b) := .5*(a + b)*(a + b + 1) + b
then
H(a,b,c) := .5*(H(a,b) + c)*(H(a,b) + c + 1) + c
You mentioned that you need a one-way hash, but based on your detailed description about memory constraints it seems that an invertible hash would also suffice.
This doesn't use the assumption that a, b, and c are bounded above and below.
Augmenting the answer above for a more concise implementation:
int cantor(int a, int b) {
return (a + b + 1) * (a + b) / 2 + b;
}
int hash(int a, int b, int c) {
return cantor(a, cantor(b, c));
}
The easiest way to understand this is that the Cantor pairing function assigns a unique natural number to every pair of natural numbers (non-negative integers).
Once we've assigned a natural number N = cantor(b, c), we can assign a new unique natural number M = cantor(a, N), which we can use as a hash code and which is unique for every triple a, b, c.
As a more general case, you could hash more integers by nesting another cantor call for each additional integer (e.g. cantor(a, cantor(b, cantor(c, d)))).
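Since the question mentions Python, here is the same nesting as a short Python sketch, with a brute-force uniqueness check over a small cube of inputs. The name `hash3` is illustrative. Python integers are unbounded, so the overflow risk of the `int` version above does not arise.

```python
from itertools import product

def cantor(a, b):
    """Cantor pairing of two non-negative integers."""
    return (a + b) * (a + b + 1) // 2 + b

def hash3(a, b, c):
    """Nest the pairing to map a triple to a single non-negative integer."""
    return cantor(a, cantor(b, c))

# Every triple with entries in 0..7 gets its own code.
triples = list(product(range(8), repeat=3))
codes = {hash3(a, b, c) for a, b, c in triples}
print(len(codes) == len(triples))  # True
```

One caveat on size: for inputs up to n the nested value grows on the order of n^4, so the resulting integer is not necessarily smaller to store than the three originals.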

Find the number of pairs whose sum is divisible by k?

Given a value k such that k <= 100000,
we have to print the number of pairs such that the sum of the elements of each pair is divisible by k,
under the following conditions: the first element should be smaller than the second, and both elements should be less than 10^9.
I've found a solution. Let a and b be numbers such that (a+b)%k=0; we have to find the pairs (a,b) with a<b. First count how many pairs (a,b) satisfy a+b=k. For example, if k=3, then 0+3=3, 1+2=3, 2+1=3, 3+0=3 gives 4 ordered pairs, but only 2 of them have a<b, which is (k+1)/2 (integer division). The same holds for the pairs (a,b) whose sum is 2k, 3k, ..., nk, so the solution is (k+1)/2 + (2k+1)/2 + (3k+1)/2 + ... + (nk+1)/2, which (ignoring the rounding in the integer divisions) equals (k*n*(n+1)/2 + n)/2 and can be computed in O(1) time. Take care with the case n*k=2*10^9, because a can't be more than 10^9 under the given constraint.
Solution in O(N) time and O(N) space using hash map.
The concept is as follows:
If (a+b)%k=0 where
a=k*SOME_CONSTANT_1+REMAINDER_1
b=k*SOME_CONSTANT_2+REMAINDER_2
then (REMAINDER_1 +REMAINDER_2 )%k will surely be 0
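So for an array (4,2,3,31,14,16,8) and k = 5, grouping the numbers by remainder tells you which pairs have a sum divisible by k:
remainder 0: (none)
remainder 1: 31, 16
remainder 2: 2
remainder 3: 3, 8
remainder 4: 4, 14
Note that the remainders run from 0 to k-1, each listed with all the numbers that produce it.
Now all you need to do is place one pointer at remainder 1 and one at remainder k-1 and move them towards each other until they meet. If both pointer locations have numbers associated with them, every number from one location paired with every number from the other has sum%k equal to 0. Numbers with remainder 0 pair with each other.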
To solve it, you can keep track of all the remainders you have seen so far by using a hash table:
Create a hash map Map<Integer, List>.
Pre-populate its keys with 0 to k-1.
Iterate over the array and, for each number, add the actual number to the list stored under Key = remainder.
Iterate over the key set using two pointers moving towards each other, and do sum += listSizeAsPointedByPointer1 * listSizeAsPointedByPointer2. Where a remainder pairs with itself (remainder 0, and k/2 when k is even), add listSize * (listSize - 1) / 2 instead.
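The remainder-bucket idea can be sketched compactly in Python. This version stores only the count per remainder instead of the full lists, which is a simplification of the steps above, and `count_pairs_divisible` is an illustrative name. Non-negative inputs are assumed.

```python
from collections import Counter

def count_pairs_divisible(nums, k):
    """Number of index pairs i < j with (nums[i] + nums[j]) % k == 0."""
    counts = Counter(x % k for x in nums)
    total = 0
    for r, c in counts.items():
        partner = (k - r) % k  # remainder that completes r to a multiple of k
        if r == partner:
            # Remainder 0 (and k/2 for even k) pairs with itself.
            total += c * (c - 1) // 2
        elif r < partner:
            # Count each (r, partner) combination once.
            total += c * counts.get(partner, 0)
    return total

print(count_pairs_divisible([4, 2, 3, 31, 14, 16, 8], 5))  # 6
```

For the example array the six pairs are (4,31), (4,16), (14,31), (14,16), (2,3) and (2,8).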
One way is brute force:
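long numPairs = 0; // the count can exceed the int range
for (int i = 0; i < 1000000000; i++) // note: 10e9 would be 10^10, the bound is 10^9
{
    for (int j = i+1; j < 1000000000; j++)
    {
        int sum = i + j; // at most 2*10^9 - 3, which still fits in a 32-bit int
        if (sum % k == 0) numPairs++;
    }
}
return numPairs;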
I'll leave it up to you to optimize this for performance. I can think of at least one way to significantly speed this up.
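The answer does not say which optimization it has in mind. One possibility, offered here as an assumption, is to count how many integers below the limit fall into each remainder class and combine the counts, which takes O(k) time instead of looping over all pairs. The sketch assumes the pairs are drawn from 0 <= a < b < limit, and `count_pairs_below` is a made-up name.

```python
def count_pairs_below(limit, k):
    """Number of pairs 0 <= a < b < limit with (a + b) % k == 0."""
    full, extra = divmod(limit, k)
    # cnt[r] = how many integers in [0, limit) leave remainder r modulo k.
    cnt = [full + (1 if r < extra else 0) for r in range(k)]

    total = 0
    for r in range(k):
        partner = (k - r) % k
        if r == partner:
            # Remainder 0 (and k/2 for even k): choose 2 from the same class.
            total += cnt[r] * (cnt[r] - 1) // 2
        elif r < partner:
            total += cnt[r] * cnt[partner]
    return total

print(count_pairs_below(10, 3))
print(count_pairs_below(10**9, 100000))  # runs in a fraction of a second
```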
Some pseudo-code to get you started. It uses the brute-force technique you say you tried, but maybe something was wrong in your code?
max = 1000000000
numberPairs = 0
for i = 1 to max - 2 do
for j = i + 1 to max - 1 do
if (i + j) mod k = 0 then
numberPairs = numberPairs + 1
end if
end do
end do