Reverse multiplication of 32-bit numbers - hash

I have two large signed 32-bit numbers (java ints) being multiplied together such that they'll overflow. Actually, I have one of the numbers, and the result. Can I determine what the other operand was?
knownResult = unknownOperand * knownOperand;
Why? I have a string and a suffix being hashed with fnv1a. I know the resulting hash and the suffix, I want to see how easy it is to determine the hash of the original string.
This is the core of fnv1a:
hash ^= byte
hash *= PRIME

It depends. If the multiplier is even, at least one bit must inevitably be lost. So I hope that prime isn't 2.
If it's odd, then you can absolutely reverse it, just multiply by the modular multiplicative inverse of the multiplier to undo the multiplication.
There is an algorithm to calculate the modular multiplicative inverse modulo a power of two in Hacker's Delight.
For example, if the multiplier was 3, then you'd multiply by 0xaaaaaaab to undo (because 0xaaaaaaab * 3 = 1 modulo 2^32). For 0x01000193, the inverse is 0x359c449b.
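A quick way to check the two inverses quoted above is sketched here in Python, with 32-bit wraparound emulated by masking. The helper name mul32 and the example operand are made up for illustration; pow with a negative exponent and a modulus needs Python 3.8 or later.

```python
MASK32 = 0xFFFFFFFF  # keep results in the range of an unsigned 32-bit integer


def mul32(a, b):
    """Multiply and keep only the low 32 bits, like an overflowing 32-bit multiply."""
    return (a * b) & MASK32


FNV_PRIME = 0x01000193
# Modular multiplicative inverse modulo 2**32 (three-argument pow, Python 3.8+).
FNV_PRIME_INV = pow(FNV_PRIME, -1, 1 << 32)

# Undo a multiplication: recover the "unknown" operand from the product.
unknown_operand = 0xDEADBEEF  # arbitrary example value, not from the question
known_result = mul32(unknown_operand, FNV_PRIME)
recovered = mul32(known_result, FNV_PRIME_INV)
```

Java ints are signed, but the low 32 bits of a product are the same for signed and unsigned interpretation, so the same inverse applies there.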

You want to solve the equation y = prime * x for x, which you do by division in the finite ring modulo 2^32: x = y / prime.
Technically you do that by multiplying y with the multiplicative inverse of the prime modulo 2^32, which can be computed by the extended Euclidean algorithm.
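For reference, a minimal sketch of that computation in Python, using the textbook iterative extended Euclidean algorithm. The function names and the sample value of x are my own choices for illustration.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def inverse_mod(a, modulus):
    """Multiplicative inverse of a modulo modulus, if it exists."""
    g, s, _ = extended_gcd(a, modulus)
    if g != 1:
        raise ValueError("no inverse: a and the modulus share a factor")
    return s % modulus


MOD = 1 << 32
prime = 0x01000193
prime_inv = inverse_mod(prime, MOD)

x = 123456789                    # arbitrary example value
y = (prime * x) % MOD            # y = prime * x in the ring modulo 2^32
x_recovered = (y * prime_inv) % MOD
```

An even multiplier shares the factor 2 with 2^32, so inverse_mod raises for it, which matches the "at least one bit is lost" remark in the other answer.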

Uh, division? Or am I not understanding the question?

It's not the fastest method, but something very easy to memorise is this:
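// Inverse of x modulo 2**n, where n is the width of unsigned.
// x must be odd; for an even x the loop never terminates.
unsigned inv(unsigned x) {
    unsigned xx = x * x;
    while (xx != 1) {
        x *= xx;
        xx *= xx;
    }
    return x;
}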
It returns x**(2**n-1) (as in x*(x**2)*(x**4)*(x**8)*..., or x**(1+2+4+8+...)). As the loop exit condition implies, x**(2**n) is 1 when n is big enough, provided x is odd.
So, x**(2**n-1) equals x**(2**n)/x equals 1/x equals the thing you multiply x by to get the value 1 (mod 2**n). Which you then apply:
knownResult = unknownOperand * knownOperand
knownResult * inv(knownOperand) = unknownOperand * knownOperand * inv(knownOperand)
knownResult * inv(knownOperand) = unknownOperand * 1
or simply:
unknownOperand = knownResult * inv(knownOperand);
But there are faster ways, as given in other answers here. This one's just easy to remember.
Also, obligatory SO "use a library function" answer: BN_mod_inverse().
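Tying this back to the fnv1a question, here is a hedged Python sketch that ports the loop above and uses it to peel a known suffix off a hash. The function names are mine, the offset basis is the standard 32-bit FNV value (it does not appear in the question), and 32-bit overflow is emulated by masking.

```python
MASK32 = 0xFFFFFFFF
FNV_PRIME = 0x01000193
FNV_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis (assumed, not from the question)


def inv(x):
    """Inverse of an odd x modulo 2**32 by repeated squaring, as in the C loop."""
    if x % 2 == 0:
        raise ValueError("even numbers have no inverse modulo 2**32")
    x &= MASK32
    xx = (x * x) & MASK32
    while xx != 1:
        x = (x * xx) & MASK32
        xx = (xx * xx) & MASK32
    return x


def fnv1a(data, h=FNV_OFFSET_BASIS):
    """32-bit FNV-1a over a bytes object."""
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK32
    return h


def strip_suffix(h, suffix):
    """Given fnv1a(prefix + suffix) and the suffix, return fnv1a(prefix)."""
    prime_inv = inv(FNV_PRIME)
    for byte in reversed(suffix):
        h = (h * prime_inv) & MASK32  # undo "hash *= PRIME"
        h ^= byte                     # undo "hash ^= byte"
    return h
```

So with the suffix known, recovering the hash of the original string costs one multiplication and one xor per suffix byte.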

Why is the following code correct for computing the hash of a string?

I am currently reading about the Rabin Karp algorithm and as part of that I need to understand string polynomial hashing. From what I understand, the hash of a string is given by the following formula:
hash = ( char_0_val * p^0 + char_1_val * p^1 + ... + char_n_val * p^n ) mod m
Where:
char_i_val: is the integer value of the character plus 1 given by string[i]-'a' + 1
p is a prime number larger than the character set
m is a large prime number
The website cp-algorithms has the following entry on the subject. They say that the code to write the above is as follows:
long long compute_hash(string const& s) {
    const int p = 31;
    const int m = 1e9 + 9;
    long long hash_value = 0;
    long long p_pow = 1;
    for (char c : s) {
        hash_value = (hash_value + (c - 'a' + 1) * p_pow) % m;
        p_pow = (p_pow * p) % m;
    }
    return hash_value;
}
I understand what the program is trying to do but I do not understand why it is correct.
My question
I am having trouble understanding why the above code is correct. It has been a long time since I have done any modular math. After searching online I see that we have the following formulas for modular addition and modular multiplication:
a+b (mod m) = (a%m + b%m)%m
a*b (mod m) = (a%m * b%m)%m
Based on the above shouldn't the code be as follows?
long long compute_hash(string const& s) {
    const int p = 31;
    const int m = 1e9 + 9;
    long long hash_value = 0;
    long long p_pow = 1;
    for (char c : s) {
        int char_value = (c - 'a' + 1);
        hash_value = (hash_value%m + ((char_value%m * p_pow%m)%m)%m ) % m;
        p_pow = (p_pow%m * p%m) % m;
    }
    return hash_value;
}
What am I missing? Ideally I am seeking a breakdown of the code and an explanation of why the first version is correct.
Mathematically, there is no reason to reduce intermediate results modulo m.
Operationally, there are a couple of very closely related reasons to do it:
To keep numbers small enough that they can be represented efficiently.
To keep numbers small enough that operations on them do not overflow.
So let's look at some quantities and see if they need to be reduced.
p was defined as some value less than m, so p % m == p.
p_pow and hash_value have already been reduced modulo m when they were computed, reducing them modulo m again would do nothing.
char_value is at most 26, which is already less than m.
char_value * p_pow is at most 26 * (m - 1). That can be, and often will be, more than m. So reducing it modulo m would do something. But it can still be delayed, because the next step is still "safe" (no overflow)
char_value * p_pow + hash_value is still at most 27 * (m - 1) which is still much less than 2^63-1 (the maximum value for a long long, see below why I assume that a long long is 64-bit), so there is no problem yet. It's fine to reduce modulo m after the addition.
As a bonus, the loop could actually do (2^63-1) / (27 * (m - 1)) iterations before it needs to reduce hash_value modulo m. That's over 341 million iterations! For most practical purposes you could therefore remove the first % m and return hash_value % m; instead.
I used 2^63-1 in this calculation because p_pow = (p_pow * p) % m requires long long to be a 64-bit type (or, hypothetically, an exotic size of 36 bits or higher). If it was a 32-bit type (which is technically allowed, but rare nowadays) then the multiplication could overflow, because p_pow can be approximately a billion and a 32-bit type cannot hold 31 billion.
BTW note that this hash function is specifically for strings that only contain lower-case letters and nothing else. Other characters could result in a negative value for char_value which is bad news because the remainder operator % in C++ works in a way such that for negative numbers it is not the "modulo operator" (misnomer, and the C++ specification does not call it that). A very similar function can be written that can take any string as input, and that would change the analysis above a little bit, but not qualitatively.
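To convince yourself of the analysis above, here is a small Python sketch (Python integers never overflow, so it can serve as a reference). It compares the step-by-step reduction with evaluating the whole polynomial first and reducing once, and recomputes the iteration bound. The names compute_hash_exact, max_intermediate and safe_iterations are invented for this illustration.

```python
M = 10**9 + 9
P = 31


def compute_hash(s):
    """Reduce modulo M once per step, like the first C++ version."""
    hash_value, p_pow = 0, 1
    for c in s:
        hash_value = (hash_value + (ord(c) - ord("a") + 1) * p_pow) % M
        p_pow = (p_pow * P) % M
    return hash_value


def compute_hash_exact(s):
    """Evaluate the polynomial with no intermediate reduction, then reduce once."""
    total = sum((ord(c) - ord("a") + 1) * P**i for i, c in enumerate(s))
    return total % M


INT64_MAX = 2**63 - 1
# Upper bound on the value held just before the first "% m" in one iteration.
max_intermediate = 27 * (M - 1)
# How many such increments fit into a signed 64-bit integer.
safe_iterations = INT64_MAX // max_intermediate
```

Both functions agree on any lower-case string, which is the point of the answer: reducing more often than the first version does changes nothing.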

Why do I get an incorrect output from a modulus operation with negative number

I tried this code on Dart: I get 28.5
void main() {
  double modulo = -1.5 % 30.0;
  print(modulo);
}
The same code in Javascript returns -1.5
let modulo = -1.5 % 30;
console.log(modulo);
What is the equivalent of the javascript code above in Dart ?
The documentation for num.operator % states (emphasis mine):
Returns the remainder of the Euclidean division. The Euclidean division of two integers a and b yields two integers q and r such that a == b * q + r and 0 <= r < b.abs().
...
The sign of the returned value r is always positive.
See remainder for the remainder of the truncating division.
Meanwhile, num.remainder says (again, emphasis mine):
The result r of this operation satisfies: this == (this ~/ other) * other + r. As a consequence the remainder r has the same sign as the divider this.
So if you use:
void main() {
  double modulo = (-1.5).remainder(30.0);
  print(modulo);
}
you'll get -1.5.
Note that both values are mathematically correct; there are two different answers that correspond to the two different ways that you can compute a negative quotient when performing integer division. You have a choice between rounding a negative quotient toward zero (also known as truncation) or toward negative infinity (flooring). remainder corresponds to a truncating division, and % corresponds to the Euclidean division quoted above, which gives the same result as a flooring division whenever the divisor is positive.
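The conventions can be put side by side in a language-neutral way. The sketch below uses Python only as an illustration (it is not Dart): math.fmod truncates like JavaScript's % and Dart's remainder, Python's own % floors, and euclidean_remainder is a hypothetical helper that mimics the always-non-negative result described in the Dart documentation.

```python
import math


def truncated_remainder(a, b):
    """Remainder of truncating division: the sign follows the dividend a."""
    return math.fmod(a, b)


def floored_remainder(a, b):
    """Remainder of flooring division: the sign follows the divisor b."""
    return a % b


def euclidean_remainder(a, b):
    """Remainder of Euclidean division: always in the range [0, |b|)."""
    r = math.fmod(a, b)
    return r + abs(b) if r < 0 else r
```

With a positive divisor such as 30.0 the floored and Euclidean results coincide; they only differ when the divisor is negative.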
An issue was raised about this in the dart-lang repo a while ago. Apparently the % symbol in dart is an "Euclidean modulo operator rather than remainder."
An equivalent of what you are trying to do can be accomplished with
(-1.5).remainder(30.0)

Logarithm/Exponential of real numbers in cvc4

I am looking for a solver that can provide models of formulas on real numbers involving logarithms or exponentials.
Can cvc4 handle functions which contain logarithms or exponentials of real numbers? Similarly, can cvc4 express the constant e?
According to this question, z3 can only handle constant exponents, which does not help me.
This question only asks about logarithms for integers.
I am unfamiliar with cvc4 but I perhaps have some useful properties about logarithms that you may be able to exploit based on your limitations.
Technically speaking, no computer (no matter how powerful) can store e exactly as a finite decimal or fraction, because it is transcendental (it cannot be expressed as the solution to a polynomial equation with rational coefficients).
If you are limited such that you can only take logarithms for integers, you can express e as a fraction approximation and solve it that way. The formula ends up being a bit longer than just taking the logarithm directly but the advantage is that you can effectively calculate the logarithm where the base is any rational number, while only individually finding logarithms of whole numbers.
Let e be approximated by the fraction a/b where both a and b are integers.
(a/b)^n = x
log(base a/b)(x) = n
This doesn't really get you anywhere so we have to take a different route that requires a bit more algebra.
(a/b)^n = x
(a^n)/(b^n) = x
a^n = x * b^n
log(base a)(x * b^n) = n
log(base a)(x) + log(base a)(b^n) = n
log(base a)(x) + n*log(base a)(b) = n
log(base a)(x) = n - n*log(base a)(b)
log(base a)(x) = n * (1 - log(base a)(b))
n = log(base a)(x) / (1 - log(base a)(b))
In other words, log(base a)(x) / (1 - log(base a)(b)) is an approximation for ln(x) where a/b is an approximation of e. Obviously, this approximation for ln(x) gets closer to the real value of ln(x) as a/b more closely approximates e. Note that I kept this in a general form, so a/b can be any positive rational base other than 1, not just an approximation of e.
If this doesn't answer your question fully, I hope it at least helps.
Just tried an arbitrary example.
If you consider a and b as 27183 and 10000 respectively, I tried this quick calculation:
log(base 27183)(82834) / (1 - log(base 27183)(10000)) = 11.32452...
ln(82834) = 11.32459...
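The same check can be scripted. This is only a numerical sketch in Python of the formula derived above, not anything cvc4-specific; the function name ln_via_rational_base is made up, and math.log(value, base) stands in for "logarithm to an integer base".

```python
import math


def ln_via_rational_base(x, a, b):
    """Approximate ln(x) using only logarithms to the integer base a, where a/b is close to e."""
    log_a_x = math.log(x, a)  # log(base a)(x)
    log_a_b = math.log(b, a)  # log(base a)(b)
    return log_a_x / (1 - log_a_b)


approx = ln_via_rational_base(82834, 27183, 10000)
exact = math.log(82834)
error = abs(approx - exact)
```

Using a closer fraction for e, for example 2718282/1000000, should shrink the error further, as the derivation predicts.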

Why do people use hash(k) = c * k with a prime c

Given an integer m, a hash function defined on T is a map T -> {0, 1, 2, ..., m - 1}. If k is an element of T and m is a positive integer, we denote hash(k, m) its hashed value.
For simplicity, most hash functions are of the form hash(k, m) = f(k) % m where f is a map from T to the set of integers.
In the case where m = 2^p (which is often used so that the modulo m operation is cheap) and T is a set of integers, I have seen many people using f(k) = c * k with c being a prime number.
I understand if you want to choose a function of the form f(k) = c * k, you need to have gcd(c, m) = 1 for every hash table size m. Even though using a prime number fits the bill, c = 1 is also good.
So my question is the following: why do people still use f(k) = prime * k as their hash function? What kind of nice property does it have?
You don't need it to be prime. One of the most efficient hash functions with provable collision resistance just multiplies with a random number: https://en.wikipedia.org/wiki/Universal_hashing#Avoiding_modular_arithmetic. You do however need it to be odd.
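Why odd matters can be seen with a brute-force count. The sketch below (Python, helper names invented here) follows the question's hash(k, m) = (c * k) % m form with m = 2^p; note that the scheme on the linked Wikipedia page keeps the high bits of the product instead of taking % m, so this only illustrates the oddness requirement.

```python
def multiplicative_hash(k, c, p):
    """hash(k, m) = (c * k) % m with m = 2**p."""
    return (c * k) % (1 << p)


def distinct_slots(c, p):
    """Number of distinct hash values produced by the keys 0 .. 2**p - 1."""
    m = 1 << p
    return len({multiplicative_hash(k, c, p) for k in range(m)})
```

An odd c (prime or not) is invertible modulo 2^p, so it permutes the residues and every slot is reachable; an even c maps everything onto multiples of a power of two and wastes at least half of the table.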

problem with arithmetic using logarithms to avoid numerical underflow

I have two lists of fractions;
say A = [ 1/212, 5/212, 3/212, ... ]
and B = [ 4/143, 7/143, 2/143, ... ].
If we define A' = a[0] * a[1] * a[2] * ... and B' = b[0] * b[1] * b[2] * ...
I want to calculate the values of A' / B',
My trouble is A and B are both quite long and each value is small so calculating the product causes numerical underflow very quickly...
I understand turning the product into a sum through logarithms can help me determine which of A' or B' is greater
ie max( log(a[0])+log(a[1])+..., log(b[0])+log(b[1])+... )
but I need the actual ratio....
My best bet to date is to keep the number representations as fractions, ie A = [ [1,212], [5,212], [3,212], ... ] and implement my own arithmetic but it's getting clumsy and I have a feeling there is a (simple) way of logarithms I'm just missing....
The numerators for A and B don't come from a sequence. They might as well be random for the purpose of this question. If it helps the denominators for all values in A are the same, as are all the denominators for B.
Any ideas most welcome!
Mat
You could calculate it in a slightly different order:
A' / B' = a[0] / b[0] * a[1] / b[1] * a[2] / b[2] * ...
If you want to keep it in logarithms, remember that A/B corresponds to log A - log B, so after you've summed the logarithms of A and B, you can find the ratio of the larger to the smaller by exponentiating your log base with max(logsumA, logsumB)-min(logsumA,logsumB).
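A minimal Python sketch of the logarithm route, assuming all values are positive. The lists are the three example fractions from the question repeated 200 times, purely to force an underflow; log_ratio is a made-up name.

```python
import math


def log_ratio(a_values, b_values):
    """Return log(A'/B'), where A' and B' are the products of the two lists."""
    log_a = math.fsum(math.log(a) for a in a_values)
    log_b = math.fsum(math.log(b) for b in b_values)
    return log_a - log_b


# Example data: the fractions from the question, repeated to make the products tiny.
a_values = [1 / 212, 5 / 212, 3 / 212] * 200
b_values = [4 / 143, 7 / 143, 2 / 143] * 200

naive_a = math.prod(a_values)  # underflows to 0.0 (math.prod needs Python 3.8+)
log_r = log_ratio(a_values, b_values)
ratio = math.exp(log_r)        # representable even though A' and B' are not
```

If the ratio itself could be too large or too small for a float, keep working with log_r and only exponentiate at the very end, or not at all.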
Strip out the numerators and denominators since they are the same for the whole sequence. Compute the ratio of numerators element-by-element (rather as @Mark suggests), finally multiply the result by the right power of the denominator-of-B/denominator-of-A.
Or, if that threatens integer overflow in computing the product of the numerators or powers of the denominators, something like:
A'/B' = (numerator(A[0])/numerator(B[0])) * (denominator(B)/denominator(A)) * ...
I've probably got some of the fractions upside-down, but I guess you can figure that out ?
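Here is how that term-by-term version might look in Python, under the assumption that A and B have the same length (the question does not say so). The function names are invented, and exact_ratio uses fractions.Fraction only as an exact cross-check.

```python
from fractions import Fraction


def ratio_by_terms(a_numerators, a_denominator, b_numerators, b_denominator):
    """A'/B' as a float, folding one denominator ratio into every term."""
    scale = b_denominator / a_denominator
    ratio = 1.0
    for num_a, num_b in zip(a_numerators, b_numerators):
        ratio *= (num_a / num_b) * scale
    return ratio


def exact_ratio(a_numerators, a_denominator, b_numerators, b_denominator):
    """Same quantity with exact rational arithmetic (slower, but never underflows)."""
    ratio = Fraction(1)
    for num_a, num_b in zip(a_numerators, b_numerators):
        ratio *= Fraction(num_a * b_denominator, num_b * a_denominator)
    return ratio


# Example data: the numerators from the question, repeated 200 times.
a_nums = [1, 5, 3] * 200
b_nums = [4, 7, 2] * 200
approx = ratio_by_terms(a_nums, 212, b_nums, 143)
exact = exact_ratio(a_nums, 212, b_nums, 143)
```

Each factor here is of moderate size, so the running product drifts toward zero far more slowly than A' or B' would on their own.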