Sum of integers using XOR, AND gates - boolean

How can I compute the sum of two integers using only the bitwise NOT, XOR, AND, and OR operators, without using a SHIFT operator (and without arithmetic operators)? Is it possible?
Example C code:
int a = 5;
int b = 11;
int c = a ^ b;
int d = a & b;
int sum = ...

Let's take a = b = 1. The result of this sum is sum = 2. The binary representation of each input is 00000001 (let's use 8 bits for simplicity). The output is 00000010. You can easily get the least significant bit, LSB = a0 XOR b0, but the carry a0 AND b0 has to move from bit position 0 to bit position 1. NOT, XOR, AND, and OR all work on each bit position independently, so to modify bit number 1 you need a shift.
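For contrast, here is a minimal Python sketch of the usual carry-propagation adder, which does rely on a shift. The function name `add_with_shift` and the `mask` parameter are made up for illustration; the mask only keeps Python's unbounded integers within a fixed width (8 bits by default, as in the example above).

```python
def add_with_shift(a, b, mask=0xFF):
    """Add two integers with XOR, AND and a left shift, wrapping at the mask width."""
    a &= mask
    b &= mask
    while b:
        carry = (a & b) << 1   # the carry must move one position up: this is the shift
        a = (a ^ b) & mask     # per-bit sum without carries
        b = carry & mask       # carries still to be added
    return a


print(add_with_shift(1, 1))    # 2
print(add_with_shift(5, 11))   # 16
```

The only step that cannot be expressed with NOT, XOR, AND, and OR is the `<< 1` that moves the carry to the next bit position.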

Why is the following code correct for computing the hash of a string?

I am currently reading about the Rabin Karp algorithm and as part of that I need to understand string polynomial hashing. From what I understand, the hash of a string is given by the following formula:
hash = ( char_0_val * p^0 + char_1_val * p^1 + ... + char_n_val * p^n ) mod m
Where:
char_i_val: is the integer value of the character plus 1 given by string[i]-'a' + 1
p is a prime number larger than the character set
m is a large prime number
The website cp-algorithms has the following entry on the subject. They say that the code to write the above is as follows:
long long compute_hash(string const& s) {
    const int p = 31;
    const int m = 1e9 + 9;
    long long hash_value = 0;
    long long p_pow = 1;
    for (char c : s) {
        hash_value = (hash_value + (c - 'a' + 1) * p_pow) % m;
        p_pow = (p_pow * p) % m;
    }
    return hash_value;
}
I understand what the program is trying to do but I do not understand why it is correct.
My question
I am having trouble understanding why the above code is correct. It has been a long time since I have done any modular math. After searching online I see that we have the following formulas for modular addition and modular multiplication:
a+b (mod m) = (a%m + b%m)%m
a*b (mod m) = (a%m * b%m)%m
Based on the above shouldn't the code be as follows?
long long compute_hash(string const& s) {
    const int p = 31;
    const int m = 1e9 + 9;
    long long hash_value = 0;
    long long p_pow = 1;
    for (char c : s) {
        int char_value = (c - 'a' + 1);
        hash_value = (hash_value%m + ((char_value%m * p_pow%m)%m)%m ) % m;
        p_pow = (p_pow%m * p%m) % m;
    }
    return hash_value;
}
What am I missing? Ideally I am seeking a breakdown of the code and an explanation of why the first version is correct.
Mathematically, there is no reason to reduce intermediate results modulo m.
Operationally, there are a couple of very closely related reasons to do it:
To keep numbers small enough that they can be represented efficiently.
To keep numbers small enough that operations on them do not overflow.
So let's look at some quantities and see if they need to be reduced.
p was defined as some value less than m, so p % m == p.
p_pow and hash_value have already been reduced modulo m when they were computed, reducing them modulo m again would do nothing.
char_value is at most 26, which is already less than m.
char_value * p_pow is at most 26 * (m - 1). That can be, and often will be, more than m. So reducing it modulo m would do something. But the reduction can still be delayed, because the next step is still "safe" (no overflow).
char_value * p_pow + hash_value is still at most 27 * (m - 1), which is still much less than 2^63 - 1 (the maximum value for a long long, see below why I assume that a long long is 64-bit), so there is no problem yet. It's fine to reduce modulo m after the addition.
As a bonus, the loop could actually do (2^63 - 1) / (27 * (m - 1)) iterations before it needs to reduce hash_value modulo m. That's over 341 million iterations! For most practical purposes you could therefore remove the first % m and return hash_value % m; instead.
I used 2^63 - 1 in this calculation because p_pow = (p_pow * p) % m requires long long to be a 64-bit type (or, hypothetically, an exotic size of 36 bits or higher). If it was a 32-bit type (which is technically allowed, but rare nowadays) then the multiplication could overflow, because p_pow can be approximately a billion and a 32-bit type cannot hold 31 billion.
BTW note that this hash function is specifically for strings that only contain lower-case letters and nothing else. Other characters could result in a negative value for char_value which is bad news because the remainder operator % in C++ works in a way such that for negative numbers it is not the "modulo operator" (misnomer, and the C++ specification does not call it that). A very similar function can be written that can take any string as input, and that would change the analysis above a little bit, but not qualitatively.
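The point that intermediate reductions do not change the result can be checked with a small Python sketch. Python integers never overflow, so the second function can build the full polynomial and reduce it only once. Both function names are made up for illustration; the constants are the ones from the question.

```python
P = 31
M = 10**9 + 9


def compute_hash(s):
    """Polynomial hash that reduces modulo M in every iteration, like the C++ code."""
    hash_value = 0
    p_pow = 1
    for ch in s:
        char_value = ord(ch) - ord('a') + 1
        hash_value = (hash_value + char_value * p_pow) % M
        p_pow = (p_pow * P) % M
    return hash_value


def compute_hash_unreduced(s):
    """Same polynomial, evaluated exactly and reduced modulo M only at the end."""
    total = sum((ord(ch) - ord('a') + 1) * P**i for i, ch in enumerate(s))
    return total % M


print(compute_hash("abc"))             # 2946
print(compute_hash_unreduced("abc"))   # 2946
```

In C++ the reductions are there only to keep the intermediate values inside a 64-bit long long, not because the mathematics requires them.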

Hash Function for 3 Integers

I have 3 non-negative integers and a number n such that
0 <= a <= n, 0 <= b <= n, and 0 <= c <= n.
I need a one-way hash function that maps these 3 integers to one integer (could be any integer, positive or negative). Is there a way to do this, and if so, how? Is there a way so that this function can be expressed as a simple mathematical expression where the only parameters are a, b, c, and n?
Note: I need this function because I was using tuples of 3 integers as keys in a dictionary in Python, and with upwards of 10^10 keys, space is a real issue.
How about the Cantor pairing function (https://en.wikipedia.org/wiki/Pairing_function#Cantor_pairing_function)?
Let
H(a,b) := .5*(a + b)*(a + b + 1) + b
then
H(a,b,c) := .5*(H(a,b) + c)*(H(a,b) + c + 1) + c
You mentioned that you need a one-way hash, but based on your detailed description about memory constraints it seems that an invertible hash would also suffice.
This doesn't use the upper bound n on a, b, and c; it only needs them to be non-negative.
Augmenting the answer above for a more concise implementation:
int cantor(int a, int b) {
    return (a + b + 1) * (a + b) / 2 + b;
}
int hash(int a, int b, int c) {
    return cantor(a, cantor(b, c));
}
The easiest way to understand this is that the Cantor pairing function assigns a unique natural number to every pair of natural numbers.
Once we've assigned a natural number N = cantor(b, c), we can assign a new unique natural number M = cantor(a, N), which we can use as a hash code and which is a unique natural number for every triple a, b, c.
As a more general case, you could hash more integers by nesting another cantor call with the next integer (e.g. cantor(a, cantor(b, cantor(c, d)))).
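Since the question mentions Python dictionaries, here is a minimal Python sketch of the nested pairing. The name `hash3` is made up for illustration. Python integers do not overflow, whereas the `int` version above would for large inputs, because the result grows roughly like the fourth power of the inputs.

```python
def cantor(a, b):
    """Cantor pairing: maps a pair of non-negative integers to a unique non-negative integer."""
    return (a + b) * (a + b + 1) // 2 + b


def hash3(a, b, c):
    """Unique non-negative integer for a triple of non-negative integers."""
    return cantor(a, cantor(b, c))


print(cantor(1, 2))     # 8
print(hash3(1, 2, 3))   # 208
```

The product (a + b) * (a + b + 1) is always even, so the integer division by 2 is exact.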

MATLAB: How to discard overflow digits binary addition?

I want to know if it's possible in MATLAB to discard overflow digits when I add two binary numbers.
I've only been able to find how to have a least number of binary digits, but how do I set a maximum number of digits?
For example:
e = dec2bin(bin2dec('1001') + bin2dec('1000'))
That gave me:
e =
10001
How do I get only '0001'?
dec2bin will always give you the minimum amount of bits to represent a number. If you would like to retain the n least significant digits, you have to index into the string and grab those yourself.
Specifically, if you want to retain only the n least significant digits, given that you have a base-10 number stored in num, you would do this:
out = dec2bin(num);
out = out(end-n+1:end);
Bear in mind that this performs no error checking. Should n be larger than the total number of bits in the string, you will get an out of bounds error. I'm assuming you're smart enough to use this and know what you're doing.
So for your example:
e = dec2bin(bin2dec('1001') + bin2dec('1000'));
n = 4, and so:
>> n = 4;
>> e = e(end-n+1:end)
e =
0001
Here is a more robust (but less efficient, I fear) way: What you describe is exactly the modulo operation. A 4-bit binary number is the remainder after a division by 0b10000 = 16. This can be done using the mod function in MATLAB.
>> e = dec2bin(mod(bin2dec('1001') + bin2dec('1000'),16),4)
e =
0001
Note: I added 4 as additional argument to the dec2bin function, so the output will always be 4-bit wide.
This can of course be generalized to any bit width: If you want to add 8-bit numbers, you will need the remainder of the division by 0b1'0000'0000 = 256, for example
>> e = dec2bin(mod(bin2dec('10011001') + bin2dec('10001000'),256),8)
e =
00100001
Or for shorter numbers, e.g. 2-bit wide, it is 0b100 = 4:
>> e = dec2bin(mod(bin2dec('10') + bin2dec('11'),4),2)
e =
01
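The same wrap-around idea can be sketched outside MATLAB. This is a minimal Python version, assuming both inputs are bit strings and that the result should be as wide as the wider input; the function name `add_wrapped` is made up for illustration.

```python
def add_wrapped(x_bits, y_bits):
    """Add two binary strings and keep only the n least significant bits."""
    n = max(len(x_bits), len(y_bits))
    total = (int(x_bits, 2) + int(y_bits, 2)) % (2 ** n)  # discard the overflow
    return format(total, f"0{n}b")                        # zero-pad to n digits


print(add_wrapped('1001', '1000'))           # 0001
print(add_wrapped('10011001', '10001000'))   # 00100001
print(add_wrapped('10', '11'))               # 01
```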

Multiply two variables in Matlab with vpa - high precision

I want to be sure that two variables, a and b, are multiplied with high precision, i.e., perform the product c of a and b with arbitrary precision (in my case 50 correct decimal digits):
a = vpa(10/3,50);
b = vpa(7/13,50);
c = eval(vpa(vpa(a,50)*vpa(b,50),50)); % I basically want to do just c = a*b;
which gives me
a = 3.3333333333333333333333333333333333333333333333333
b = 0.53846153846153846153846153846153846153846153846154
c = 23.333333333333333333333333333333
Testing
d = eval(vpa(c*13,50))
gives
d = 23.333333333333333333333333333333333333335292490585
which shows that the multiplication to get c was not carried out with 50 significant digits.
What's wrong here, but, more importantly, how do I get a correct result for a*b and for other operations such as exp?
First, you should use vpa('7/13',50) or vpa(7,50)/13 to avoid the possibility of losing precision due to 7/13 being calculated in double-precision floating point (I believe that vpa, like sym, tries to guess common constants and rational fractions, but you shouldn't rely on it).
The issue is that while a and b are stored as 50-digit variable precision values, your multiplication is still being performed according to the default value of digits (32). The second argument to vpa only appears to specify the precision of the variable, not any subsequent operations on or with it (the documentation is not particularly helpful in this respect).
One way to accomplish what you want would be:
old_digits = digits(50);
a = vpa('10/3')
b = vpa('7/13')
c = a*b
d = 13*c
digits(old_digits);
Another would be to use exact symbolic expressions for all of the math (potentially more expensive) and then convert the result to 50-digit variable precision at the end:
a = sym('10/3')
b = sym('7/13')
c = a*b
d = vpa(13*c,50)
Both methods return 23.333333333333333333333333333333333333333333333333 for d.
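The second approach (exact arithmetic first, rounding only at the end) is not specific to MATLAB. As a rough analogy, and not a MATLAB equivalent, here is a Python sketch using the standard fractions and decimal modules; the helper name `to_decimal` is made up for illustration.

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def to_decimal(value, digits):
    """Round an exact Fraction to the given number of significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(value.numerator) / Decimal(value.denominator)


a = Fraction(10, 3)
b = Fraction(7, 13)
c = a * b      # exact: 70/39
d = 13 * c     # exact: 70/3

print(to_decimal(d, 50))   # 23.333333333333333333333333333333333333333333333333
```

Because every intermediate value is an exact rational, the only rounding happens in the final conversion.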

Generating Unique ID Number from Two Integers

Given two integers a and b (positive or negative), is there a formula or method for generating a unique ID number?
Notes: 1. f(a,b) and f(b,a) should give different results. 2. Calculating f(a,b) x times (x > 1) should always give the same result.
To clarify the question: the function f(n) = (n * p) % q (where n=input sequence value, p=step size, q=maximum result size, n=non-negative integer, n < q, p < q, p ⊥ q (coprime)) gives a unique ID number for a single input.
But in my case the input is two numbers, a and b, each of which can be a negative or positive integer.
Any reference is appreciated.
You could generate a long (64-bit) from two integers (32-bit) by left-shifting the first integer by 32 bits and then adding the second integer.
private long uniqueId(int left, int right) {
    long uniqueId = (long) left;
    uniqueId = uniqueId << 32;   // move the first integer into the upper 32 bits
    uniqueId += (long) right;
    return uniqueId;
}
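Here is a minimal Python sketch of the same packing idea, assuming both inputs fit in a signed 32-bit integer. It is a variation rather than a line-by-line port: it masks each value to 32 bits and combines them with OR instead of adding, so the two halves can be read back independently. The names `unique_id` and `unpack` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def unique_id(left, right):
    """Pack two signed 32-bit integers into one unsigned 64-bit value."""
    return ((left & MASK32) << 32) | (right & MASK32)


def unpack(uid):
    """Recover the two signed 32-bit integers from a packed id."""
    def to_signed(v):
        return v - (1 << 32) if v & 0x80000000 else v
    return to_signed(uid >> 32), to_signed(uid & MASK32)


print(unique_id(1, 2))              # 4294967298
print(unique_id(2, 1))              # 8589934593
print(unpack(unique_id(-5, 7)))     # (-5, 7)
```

Because the id can be unpacked, distinct pairs always get distinct ids, and unique_id(a, b) differs from unique_id(b, a) whenever a differs from b.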
Say your integers have a range in [MIN_INT,MAX_INT]. Then, given an integer n from this range, the function
f(n) = n - MIN_INT
assigns a unique non-negative integer f(n) in the range [0, MAX_INT - MIN_INT], which is often called a rank.
Denote M = MAX_INT - MIN_INT + 1. Then, to find a unique id g(n,m) of two concatenated integers n and m, you can use the common access style also used for two-dimensional arrays:
g(n,m) = f(n)*M + f(m)
That is, you simply offset the second integer by the largest possible value and count on.
Practically, of course, you have to be careful in order to avoid overflows -- that is, you should use some suited data types.
Here is an example: say your integers come from the range [-1,4], thus M=6. Then, for two integers n=3 and m=-1 out of this range, f(n) = 3 - (-1) = 4 and f(m) = -1 - (-1) = 0, so g(n,m) = 4*6 + 0 = 24 can be used as id.
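The rank-based formula can be sketched in a few lines of Python. The names `rank` and `pair_id`, and the `lo` and `hi` parameters standing in for MIN_INT and MAX_INT, are made up for illustration.

```python
def rank(n, lo):
    """f(n) = n - MIN_INT: position of n within the range, starting at 0."""
    return n - lo


def pair_id(n, m, lo, hi):
    """g(n, m) = f(n) * M + f(m) for integers n and m in [lo, hi]."""
    if not (lo <= n <= hi and lo <= m <= hi):
        raise ValueError("both integers must lie in [lo, hi]")
    size = hi - lo + 1   # M, the number of values in the range
    return rank(n, lo) * size + rank(m, lo)


print(pair_id(3, -1, -1, 4))   # 24
print(pair_id(-1, 3, -1, 4))   # 4
```

For the range [-1, 4] the 36 possible pairs map exactly onto the ids 0 to 35, so swapping the arguments gives a different id unless n equals m.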