I'm trying to understand why one needs the modulo operator when writing a program that finds prime numbers. I'm a student analysing some code for learning purposes, and I'm confused about why modulo is needed.
Explicit modulo isn't required. Consider an implementation of the sieve of Eratosthenes (in C for the sake of using something):

#include <stdio.h>
#include <string.h>

int main(void)
{
    int numbersThatMayBePrime[100];
    memset(numbersThatMayBePrime, 0, sizeof(int) * 100);
    for (int c = 2; c < 100; c++)
    {
        if (!numbersThatMayBePrime[c])   /* never struck through, so c is prime */
        {
            printf("%d\n", c);
            for (int strikeThrough = c; strikeThrough < 100; strikeThrough += c)
                numbersThatMayBePrime[strikeThrough] = -1;   /* strike through every multiple of c */
        }
    }
    return 0;
}
If the remainder n % d is nonzero for every d from 2 up to n - 1 (equivalently, for every d up to √n), then n is prime.
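For contrast, here is a minimal trial-division sketch that does use the modulo operator directly (the function name isPrime is mine, not from the code being analysed):

#include <cstdio>

// Trial division: n is prime exactly when n % d is nonzero
// for every d from 2 up to the square root of n.
bool isPrime(int n)
{
    if (n < 2)
        return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0)   // d divides n, so n is composite
            return false;
    return true;
}

int main()
{
    for (int n = 2; n < 100; n++)
        if (isPrime(n))
            std::printf("%d\n", n);
    return 0;
}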
I am trying to figure out how to code this in JavaScript.
I am currently reading about the Rabin-Karp algorithm, and as part of that I need to understand string polynomial hashing. From what I understand, the hash of a string is given by the following formula:
hash = ( char_0_val * p^0 + char_1_val * p^1 + ... + char_n_val * p^n ) mod m
Where:
char_i_val: the integer value of the character plus 1, given by string[i] - 'a' + 1
p is a prime number larger than the character set
m is a large prime number
The website cp-algorithms has the following entry on the subject. They say that the code to write the above is as follows:
long long compute_hash(string const& s) {
    const int p = 31;
    const int m = 1e9 + 9;
    long long hash_value = 0;
    long long p_pow = 1;
    for (char c : s) {
        hash_value = (hash_value + (c - 'a' + 1) * p_pow) % m;
        p_pow = (p_pow * p) % m;
    }
    return hash_value;
}
I understand what the program is trying to do but I do not understand why it is correct.
My question
I am having trouble understanding why the above code is correct. It has been a long time since I have done any modular math. After searching online I see that we have the following formulas for modular addition and modular multiplication:
(a + b) mod m = ((a mod m) + (b mod m)) mod m
(a * b) mod m = ((a mod m) * (b mod m)) mod m
Based on the above shouldn't the code be as follows?
long long compute_hash(string const& s) {
    const int p = 31;
    const int m = 1e9 + 9;
    long long hash_value = 0;
    long long p_pow = 1;
    for (char c : s) {
        int char_value = (c - 'a' + 1);
        hash_value = (hash_value % m + ((char_value % m * p_pow % m) % m) % m) % m;
        p_pow = (p_pow % m * p % m) % m;
    }
    return hash_value;
}
What am I missing? Ideally I am seeking a breakdown of the code and an explanation of why the first version is correct.
Mathematically, there is no reason to reduce intermediate results modulo m.
Operationally, there are a couple of very closely related reasons to do it:
To keep numbers small enough that they can be represented efficiently.
To keep numbers small enough that operations on them do not overflow.
So let's look at some quantities and see if they need to be reduced.
p was defined as some value less than m, so p % m == p.
p_pow and hash_value have already been reduced modulo m when they were computed, reducing them modulo m again would do nothing.
char_value is at most 26, which is already less than m.
char_value * p_pow is at most 26 * (m - 1). That can be, and often will be, more than m, so reducing it modulo m would do something. But the reduction can still be delayed, because the next step is still "safe" (no overflow).
char_value * p_pow + hash_value is still at most 27 * (m - 1), which is still much less than 2^63 - 1 (the maximum value for a long long; see below why I assume that a long long is 64-bit), so there is no problem yet. It's fine to reduce modulo m after the addition.
As a bonus, the loop could actually do (2^63 - 1) / (27 * (m - 1)) iterations before it needs to reduce hash_value modulo m. That's over 341 million iterations! For most practical purposes you could therefore remove the first % m and return hash_value % m; instead.
I used 2^63 - 1 in this calculation because p_pow = (p_pow * p) % m requires long long to be a 64-bit type (or, hypothetically, an exotic size of 36 bits or higher). If it were a 32-bit type (which is technically allowed, but rare nowadays) then the multiplication could overflow, because p_pow can be approximately a billion and a 32-bit type cannot hold 31 billion.
BTW, note that this hash function is specifically for strings that contain only lower-case letters and nothing else. Other characters could result in a negative char_value, which is bad news because for negative operands the remainder operator % in C++ is not a true "modulo operator" (a common misnomer; the C++ specification does not call it that). A very similar function can be written that takes any string as input, and that would change the analysis above a little bit, but not qualitatively.
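To make the delayed-reduction idea concrete, here is a sketch (the function name is mine; it assumes strings far shorter than the ~341 million character limit derived above):

#include <string>

// hash_value is only reduced once, on return. Each iteration adds at most
// 26 * (m - 1), so hundreds of millions of iterations fit in a 64-bit
// long long before overflow becomes a concern.
long long compute_hash_delayed(std::string const& s) {
    const long long p = 31;
    const long long m = 1e9 + 9;
    long long hash_value = 0;
    long long p_pow = 1;
    for (char c : s) {
        hash_value += (c - 'a' + 1) * p_pow;  // no % m here
        p_pow = (p_pow * p) % m;              // still reduced, keeping p_pow < m
    }
    return hash_value % m;                    // single reduction at the end
}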
Section 18.5.10 in the IEEE Std 1800-2017 has the following example to illustrate why we need solve before:
rand bit s;
rand bit [31:0] d;
constraint c { s -> d == 0; }
Do the following 2 options still require solve before?
A.
rand bit s;
rand bit [31:0] d;
constraint c1 { d < 1000; }
constraint c2 { if ( s == 1 ) d == 0; }
B.
rand bit s;
rand bit [31:0] d;
constraint c1 {
    d < 1000;
    if ( s == 1 ) d == 0;
}
constraint c2 { s dist { 0 :/ 95, 1 :/ 5 }; }
A naive reading of all the above would assume a 50% chance (5% for option B) of choosing s as 1, whereas the IEEE standard says you need a solve s before d constraint in the first example to get the 50% chance.
Do A and B need a solve before to get the expected 50%/5% chance of s being 1?
Option A would give you a 0.0999% chance of s being 1 without the solve s before d construct: s == 0 leaves 1000 legal values for d while s == 1 leaves only one, so s is 1'b1 in only 1 of the 1001 solutions.
Option B would give you a 5% chance of s being 1 regardless of having the solve s before d construct. The dist constraint implies an ordering in picking values within the solution space, so adding the solve before construct would be redundant.
Consider the standard Murmurhash, giving 32-bit output values.
Suppose that we apply it on 32-bit inputs -- are there collisions?
In other words, does MurmurHash basically encode a permutation when applied to 32-bit inputs?
If collisions exist, can anyone give an example (scanning random inputs didn't yield any)?
I assume you mean MurmurHash3, 32-bit, and specifically the 32-bit fmix method:
FORCE_INLINE uint32_t fmix32 ( uint32_t h )
{
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}
If not, then you need to better specify what you mean.
For the above, there are no collisions (two distinct inputs never produce the same output). There is exactly one input that maps to itself: 0.
As there are not "that many" 32-bit values, you can actually iterate over all of them to verify, in a couple of minutes. This will require some memory for a bit field, but that's it.
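For concreteness, here is a sketch of that exhaustive check (the fmix32 body is copied from above; the bit field costs 2^32 bits = 512 MiB):

#include <cstdint>
#include <cstdio>
#include <vector>

uint32_t fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

int main()
{
    // One bit per possible output value.
    std::vector<uint64_t> seen((1ull << 32) / 64, 0);
    for (uint64_t i = 0; i < (1ull << 32); i++) {
        uint32_t out = fmix32((uint32_t)i);
        uint64_t word = out >> 6;
        uint64_t bit = 1ull << (out & 63);
        if (seen[word] & bit) {           // this output was already produced once
            std::printf("collision at input %llu\n", (unsigned long long)i);
            return 1;
        }
        seen[word] |= bit;
    }
    std::puts("no collisions: fmix32 is a permutation of 32-bit values");
    return 0;
}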
By the way, there is also a way to reverse the function (get the input back from the output), since each xorshift step and each multiplication by an odd constant is individually invertible modulo 2^32.
How can one compute the sum of two integers using the bitwise NOT, XOR, AND, and OR operators, without using the shift operator (and without arithmetic operators)? Is it possible?
Example C code:
int a = 5;
int b = 11;
int c = a ^ b;
int d = a & b;
int sum = ...
Let's take a = b = 1. The expected result is sum = 2. The binary representations of the inputs are 00000001 (let's use 8 bits for simplicity) and the output is 00000010. You can easily get the least significant bit as LSB = a0 XOR b0, but the carry a0 AND b0 has to be moved into bit position 1, and moving a bit to a different position is exactly what requires a shift.
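For contrast, here is a sketch of the standard XOR/AND adder; note that the shift is precisely what moves each carry into the next bit position, which is the step the question forbids:

#include <cstdio>

// Carry-propagation adder: XOR gives the per-bit sum, AND gives the
// carries, and the shift moves each carry one position to the left.
int add(int a, int b)
{
    unsigned ua = (unsigned)a, ub = (unsigned)b;  // unsigned avoids shift/overflow pitfalls
    while (ub != 0) {
        unsigned carry = (ua & ub) << 1;  // the forbidden shift
        ua ^= ub;                         // sum without carries
        ub = carry;
    }
    return (int)ua;
}

int main()
{
    std::printf("%d\n", add(5, 11));  // prints 16
    return 0;
}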
Given a value k such that k <= 100000, we have to print the number of pairs whose elements sum to a value divisible by k, under the following conditions: the first element must be smaller than the second, and both elements must be less than 10^9.
I've found a solution. Let a and b be numbers such that (a + b) % k = 0; we have to count the pairs (a, b) with a < b. So let's first count how many pairs (a, b) satisfy a + b = k. For example, if k = 3: 0 + 3 = 3, 1 + 2 = 3, 2 + 1 = 3, 3 + 0 = 3, which is 4 ordered pairs but only 2 pairs with a < b, i.e. (k + 1) / 2 (integer division). Similarly we can count the pairs (a, b) whose sum is 2k, 3k, ..., nk, so the answer is (k + 1) / 2 + (2k + 1) / 2 + (3k + 1) / 2 + ... + (nk + 1) / 2, which equals (k * n * (n + 1) / 2 + n) / 2, with time complexity O(1). Take care in the case n * k = 2 * 10^9, because a can't be more than 10^9 under the given constraint.
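As a sanity check on that formula, here is a sketch that counts per multiple directly instead of using the closed form (the structure and names are mine; it handles the 10^9 cutoff mentioned above explicitly, and is slow for small k):

#include <cstdint>
#include <cstdio>

int main()
{
    const int64_t LIMIT = 1000000000;  // upper bound on each element, from the problem
    const int64_t k = 3;               // example value of k
    int64_t pairs = 0;
    // For every multiple s = j*k, count pairs (a, b) with a < b, a + b = s,
    // and 0 <= a, b <= LIMIT.
    for (int64_t s = k; s <= 2 * LIMIT; s += k) {
        int64_t lo = s > LIMIT ? s - LIMIT : 0;  // b = s - a must not exceed LIMIT
        int64_t hi = (s - 1) / 2;                // a < b forces a <= (s - 1) / 2
        if (hi >= lo)
            pairs += hi - lo + 1;
    }
    std::printf("%lld\n", (long long)pairs);
    return 0;
}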
Solution in O(N) time and O(N) space using a hash map.
The concept is as follows:
If (a + b) % k = 0, where
a = k * SOME_CONSTANT_1 + REMAINDER_1
b = k * SOME_CONSTANT_2 + REMAINDER_2
then (REMAINDER_1 + REMAINDER_2) % k will surely be 0.
So for the array (4, 2, 3, 31, 14, 16, 8) and k = 5, group the numbers by remainder, with one bucket for each remainder from 0 to k - 1: bucket 0 is empty, bucket 1 holds (31, 16), bucket 2 holds (2), bucket 3 holds (3, 8) and bucket 4 holds (4, 14). Now place one pointer at remainder 1 and another at remainder k - 1 and move them towards each other until they meet. Whenever both pointer locations have numbers associated with them, any number from one bucket plus any number from the other gives sum % k = 0. (The remainder-0 bucket, and the middle bucket when k is even, pair with themselves.) To implement this, keep track of all the remainders you have seen so far using a hash table:
Create a hash map Map<Integer, List>.
Pre-populate its keys with 0 to k - 1.
Iterate over the array; for each number, use its remainder as the key and append the actual number to that key's list.
Iterate over the key set with the two pointers moving towards each other, accumulating sum += listSizeAsPointedByPointer1 * listSizeAsPointedByPointer2.
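A sketch of that counting in C++, assuming the input is a given array as in the example above (only the bucket sizes matter, so the lists reduce to counts; the names are mine):

#include <cstdio>
#include <vector>

// cnt[r] = how many array values have remainder r modulo k.
long long count_pairs(const std::vector<long long>& a, long long k) {
    std::vector<long long> cnt(k, 0);
    for (long long x : a) cnt[x % k]++;
    long long pairs = cnt[0] * (cnt[0] - 1) / 2;      // remainder 0 pairs with itself
    for (long long r = 1; 2 * r < k; r++)
        pairs += cnt[r] * cnt[k - r];                 // the two pointers meeting in the middle
    if (k % 2 == 0)
        pairs += cnt[k / 2] * (cnt[k / 2] - 1) / 2;   // middle bucket pairs with itself
    return pairs;
}

int main() {
    std::vector<long long> a = {4, 2, 3, 31, 14, 16, 8};  // example from the answer
    std::printf("%lld\n", count_pairs(a, 5));             // prints 6
    return 0;
}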
One way is brute force:
long long numPairs = 0;
for (long long i = 0; i < 1000000000; i++)   // 10^9, the bound from the problem
{
    for (long long j = i + 1; j < 1000000000; j++)
    {
        long long sum = i + j;
        if (sum % k == 0) numPairs++;
    }
}
return numPairs;
I'll leave it up to you to optimize this for performance. I can think of at least one way to significantly speed this up.
Some pseudo-code to get you started. It uses the brute-force technique you say you tried, but maybe something was wrong in your code?
max = 1000000000
numberPairs = 0
for i = 1 to max - 2 do
    for j = i + 1 to max - 1 do
        if (i + j) mod k = 0 then
            numberPairs = numberPairs + 1
        end if
    end do
end do