Why is modulus different in different programming languages? - perl

Perl
print 2 % -18;
-->
-16
Tcl
puts [expr {2 % -18}]
-->
-16
but VBScript
wscript.echo 2 mod -18
-->
2
Why the difference?

The wikipedia answer is fairly helpful here.
A short summary is that any integer can be defined as
a = qn + r
where all of these letters are integers, and
0 <= |r| < |n|.
Almost every programming language will require that (a/n) * n + (a%n) = a. So the definition of modulus will nearly always depend on the definition of integer division. There are two choices for integer division by negative numbers 2/-18 = 0 or 2/-18 = -1. Depending on which one is true for your language will usually change the % operator.
This is because 2 = (-1) * -18 + (-16) and 2 = 0 * -18 + 2.
For Perl the situation is complicated. The manual page says: "Note that when use integer is in scope, "%" gives you direct access to the modulus operator as implemented by your C compiler. This operator is not as well defined for negative operands, but it will execute faster. " So it can choose either option for Perl (like C) if use integer is in scope. If use integer is not in scope, the manual says " If $b is negative, then $a % $b is $a minus the smallest multiple of $b that is not less than $a (i.e. the result will be less than or equal to zero). "

Wikipedia's "Modulo operation" page explains it quite well. I won't try to do any better here, as I'm likely to make a subtle but important mistake.
The rub of it is that you can define "remainder" or "modulus" in different ways, and different languages have chosen different options to implement.

After dividing a number and a divisor, one of which is negative, you have at least two ways to separate them into a quotient and a remainder, such that quotient * divisor + remainder = number: you can either round the quotient towards negative infinity, or towards zero.
Many languages just choose one.
I can't resist pointing out that Common Lisp offers both.

python, of course, explicitly informs you
>>> divmod(2,-18)
(-1, -16)

Related

Hashing using division method

For the hash function : h(k) = k mod m;
I understand that m=2^n will always give the last n LSB digits. I also understand that m=2^p-1 when K is a string converted to integers using radix 2^p will give same hash value for every permutation of characters in K. But why exactly "a prime not too close to an exact power of 2" is a good choice? What if I choose 2^p - 2 or 2^p-3? Why are these choices considered bad?
Following is the text from CLRS:
"A prime not too close to an exact power of 2 is often a good choice for m. For
example, suppose we wish to allocate a hash table, with collisions resolved by
chaining, to hold roughly n D 2000 character strings, where a character has 8 bits.
We don’t mind examining an average of 3 elements in an unsuccessful search, and
so we allocate a hash table of size m D 701. We could choose m D 701 because
it is a prime near 2000=3 but not near any power of 2."
Suppose we work with radix 2p.
2p-1 case:
Why that is a bad idea to use 2p-1? Let us see,
k = ∑ai2ip
and if we divide by 2p-1 we just get
k = ∑ai2ip = ∑ai mod 2p-1
so, as addition is commutative, we can permute digits and get the same result.
2p-b case:
Quote from CLRS:
A prime not too close to an exact power of 2 is often a good choice for m.
k = ∑ai2ip = ∑aibi mod 2p-b
So changing least significant digit by one will change hash by one. Changing second least significant bit by one will change hash by two. To really change hash we would need to change digits with bigger significance. So, in case of small b we face problem similar to the case then m is power of 2, namely we depend on distribution of least significant digits.

How to calculate machine epsilon in MATLAB?

I need to find the machine epsilon and I am doing the following:
eps = 1;
while 1.0 + eps > 1.0 do
eps = eps /2;
end
However, it shows me this:
Undefined function or variable 'do'.
Error in epsilon (line 3)
while 1.0 + eps > 1.0 do
What should I do?
First and foremost, there is no such thing as a do keyword in MATLAB, so eliminate that from your code. Also, don't use eps as an actual variable. This is a pre-defined function in MATLAB that calculates machine epsilon, which is also what you are trying to calculate. By creating a variable called eps, you would shadow over the actual function, and so any other functions in MATLAB that require its use will behave unexpectedly, and that's not what you want.
Use something else instead, like macheps. Also, your algorithm is slightly incorrect. You need to check for 1.0 + (macheps/2) in your while loop, not 1.0 + macheps.
In other words, do this:
macheps = 1;
while 1.0 + (macheps/2) > 1.0
macheps = macheps / 2;
end
This should give you 2.22 x 10^{-16}, which agrees with MATLAB if you type in eps in the command prompt. To double-check:
>> format long
>> macheps
macheps =
2.220446049250313e-16
>> eps
ans =
2.220446049250313e-16
Bonus
In case you didn't know, machine epsilon is the upper bound on the relative error due to floating point arithmetic. In other words, this would be the maximum difference expected between a true floating point number and one that is calculated on a computer due to the finite number of bits used to store a floating point number.
If you recall, floating numbers inevitably are represented as binary bits on your computer (or pretty much anything digital). In terms of the IEEE 754 floating point standard, MATLAB assumes that all numerical values are of type double, which represents floating point numbers as 64-bits. You can obviously override this behaviour by explicitly casting to another type. With the IEEE 754 floating point standard, for double precision type numbers, there are 52 bits that represent the fractional part of the number.
Here's a nice diagram of what I am talking about:
Source: Wikipedia
You see that there is one bit reserved for the sign of the number, 11 bits that are reserved for exponent base and finally, 52 bits are reserved for the fractional part. This in total adds up to 64 bits. The fractional part is a collection or summation of numbers of base 2, with negative exponents starting from -1 down to -52. The MSB of the floating point number starts with2^{-1}, all the way down to 2^{-52} as the LSB. Essentially, machine epsilon calculates the maximum resolution difference for an increase of 1 bit in binary between two numbers, given that they have the same sign and the same exponent base. Technically speaking, machine epsilon actually equals to 2^{-52} as this is the maximum resolution of a single bit in floating point, given those conditions that I talked about earlier.
If you actually look at the code above closely, the division by 2 is bit-shifting your number to the right by one position at each iteration, starting at the whole value of 1, or 2^{0}, and we take this number and add this to 1. We keep bit-shifting, and seeing what the value is equal to by adding this bit-shifted value by 1, and we go up until the point where when we bit shift to the right, a change is no longer registered. If you bit-shift any more to the right, the value will become 0 due to underflow, and so 1.0 + 0.0 = 1.0, and this is no longer > 1.0, which is what the while loop is checking.
Once the while loop quits, it is this threshold that defines machine epsilon. If you're curious, if you punch in 2^{-52} in the command prompt, you will get what eps is equal to:
>> 2^-52
ans =
2.220446049250313e-16
This makes sense as you are shifting one bit to the right 52 times, and the point before the loop stops would be at its LSB, which is 2^{-52}. For the sake of being complete, if you were to place a counter inside your while loop, and count up how many times the while loop executes, it will execute exactly 52 times, representing 52 bit shifts to the right:
macheps = 1;
count = 0;
while 1.0 + (macheps/2) > 1.0
macheps = macheps / 2;
count = count + 1;
end
>> count
count =
52
It looks like you may want something like this:
eps = 1;
while (1.0 + eps > 1.0)
eps = eps /2;
end
Although there is a fine answer above, I thought I would add a Octave and Matlab method mentioned[1].
>> a = 1; b = 1; while a+b~=a; b= b/2; end
It is read as a plus b is not equal to a.
References:
[1] Alfio Quarteroni, Fausto Saleri, and Paola Gervasio. 2016. Scientific Computing with MATLAB and Octave. Springer Publishing Company, Incorporated.

mod() operation weird behavior

I use mod() to compare if a number's 0.01 digit is 2 or not.
if mod(5.02*100, 10) == 2
...
end
The result is mod(5.02*100, 10) = 2 returns 0;
However, if I use mod(1.02*100, 10) = 2 or mod(20.02*100, 10) = 2, it returns 1.
The result of mod(5.02*100, 10) - 2 is
ans =
-5.6843e-14
Could it be possible that this is a bug for matlab?
The version I used is R2013a. version 8.1.0
This is not a bug in MATLAB. It is a limitation of floating point arithmetic and conversion between binary and decimal numbers. Even a simple decimal number such as 0.1 has cannot be exactly represented as a binary floating point number with finite precision.
Computer floating point arithmetic is typically not exact. Although we are used to dealing with numbers in decimal format (base10), computers store and process numbers in binary format (base2). The IEEE standard for double precision floating point representation (see http://en.wikipedia.org/wiki/Double-precision_floating-point_format, what MATLAB uses) specifies the use of 64 bits to represent a binary number. 1 bit is used for the sign, 52 bits are used for the mantissa (the actual digits of the number), and 11 bits are used for the exponent and its sign (which specifies where the decimal place goes).
When you enter a number into MATLAB, it is immediately converted to binary representation for all manipulations and arithmetic and then converted back to decimal for display and output.
Here's what happens in your example:
Convert to binary (keeping only up to 52 digits):
5.02 => 1.01000001010001111010111000010100011110101110000101e2
100 => 1.1001e6
10 => 1.01e3
2 => 1.0e1
Perform multiplication:
1.01000001010001111010111000010100011110101110000101 e2
x 1.1001 e6
--------------------------------------------------------------
0.000101000001010001111010111000010100011110101110000101
0.101000001010001111010111000010100011110101110000101
+ 1.01000001010001111010111000010100011110101110000101
-------------------------------------------------------------
1.111101011111111111111111111111111111111111111111111101e8
Cutting off at 52 digits gives 1.111101011111111111111111111111111111111111111111111e8
Note that this is not the same as 1.11110110e8 which would be 502.
Perform modulo operation: (there may actually be additional error here depending on what algorithm is used within the mod() function)
mod( 1.111101011111111111111111111111111111111111111111111e8, 1.01e3) = 1.111111111111111111111111111111111111111111100000000e0
The error is exactly -2-44 which is -5.6843x10-14. The conversion between decimal and binary and the rounding due to finite precision have caused a small error. In some cases, you get lucky and rounding errors cancel out and you might still get the 'right' answer which is why you got what you expect for mod(1.02*100, 10), but In general, you cannot rely on this.
To use mod() correctly to test the particular digit of a number, use round() to round it to the nearest whole number and compensate for floating point error.
mod(round(5.02*100), 10) == 2
What you're encountering is a floating point error or artifact, like the commenters say. This is not a Matlab bug; it's just how floating point values work. You'd get the same results in C or Java. Floating point values are "approximate" types, so exact equality comparisons using == without some rounding or tolerance are prone to error.
>> isequal(1.02*100, 102)
ans =
1
>> isequal(5.02*100, 502)
ans =
0
It's not the case that 5.02 is the only number this happens for; several around 0 are affected. Here's an example that picks out several of them.
x = 1.02:1000.02;
ix = mod(x .* 100, 10) ~= 2;
disp(x(ix))
To understand the details of what's going on here (and in many other situations you'll encounter working with floats), have a read through the Wikipedia entry for "floating point", or my favorite article on it, "What Every Computer Scientist Should Know About Floating-Point Arithmetic". (That title is hyperbole; this article goes deep and I don't understand half of it. But it's a great resource.) This stuff is particularly relevant to Matlab because Matlab does everything in floating point by default.

Compute 4^x mod 2π for large x

I need to compute sin(4^x) with x > 1000 in Matlab, with is basically sin(4^x mod 2π) Since the values inside the sin function become very large, Matlab returns infinite for 4^1000. How can I efficiently compute this?
I prefer to avoid large data types.
I think that a transformation to something like sin(n*π+z) could be a possible solution.
You need to be careful, as there will be a loss of precision. The sin function is periodic, but 4^1000 is a big number. So effectively, we subtract off a multiple of 2*pi to move the argument into the interval [0,2*pi).
4^1000 is roughly 1e600, a really big number. So I'll do my computations using my high precision floating point tool in MATLAB. (In fact, one of my explicit goals when I wrote HPF was to be able to compute a number like sin(1e400). Even if you are doing something for the fun of it, doing it right still makes sense.) In this case, since I know that the power we are interested in is roughly 1e600, then I'll do my computations in more than 600 digits of precision, expecting that I'll lose 600 digits by the subtractive cancellation. This is a massive subtractive cancellation issue. Think about it. That modulus operation is effectively a difference between two numbers that will be identical for the first 600 digits or so!
X = hpf(4,1000);
X^1000
ans =
114813069527425452423283320117768198402231770208869520047764273682576626139237031385665948631650626991844596463898746277344711896086305533142593135616665318539129989145312280000688779148240044871428926990063486244781615463646388363947317026040466353970904996558162398808944629605623311649536164221970332681344168908984458505602379484807914058900934776500429002716706625830522008132236281291761267883317206598995396418127021779858404042159853183251540889433902091920554957783589672039160081957216630582755380425583726015528348786419432054508915275783882625175435528800822842770817965453762184851149029376
What is the nearest multiple of 2*pi that does not exceed this number? We can get that by a simple operation.
twopi = 2*hpf('pi',1000);
twopi*floor(X^1000/twopi)
ans = 114813069527425452423283320117768198402231770208869520047764273682576626139237031385665948631650626991844596463898746277344711896086305533142593135616665318539129989145312280000688779148240044871428926990063486244781615463646388363947317026040466353970904996558162398808944629605623311649536164221970332681344168908984458505602379484807914058900934776500429002716706625830522008132236281291761267883317206598995396418127021779858404042159853183251540889433902091920554957783589672039160081957216630582755380425583726015528348786419432054508915275783882625175435528800822842770817965453762184851149029372.6669043995793459614134256945369645075601351114240611660953769955068077703667306957296141306508448454625087552917109594896080531977700026110164492454168360842816021326434091264082935824243423723923797225539436621445702083718252029147608535630355342037150034246754736376698525786226858661984354538762888998045417518871508690623462425811535266975472894356742618714099283198893793280003764002738670747
As you can see, the first 600 digits were the same. Now, when we subtract the two numbers,
X^1000 - twopi*floor(X^1000/twopi)
ans =
3.333095600420654038586574305463035492439864888575938833904623004493192229633269304270385869349155154537491244708289040510391946802229997388983550754583163915718397867356590873591706417575657627607620277446056337855429791628174797085239146436964465796284996575324526362330147421377314133801564546123711100195458248112849130937653757418846473302452710564325738128590071680110620671999623599726132925263826
This is why I referred to it as a massive subtractive cancellation issue. The two numbers were identical for many digits. Even carrying 1000 digits of accuracy, we lost many digits. When you subtract the two numbers, even though we are carrying a result with 1000 digits, only the highest order 400 digits are now meaningful.
HPF is able to compute the trig function of course. But as we showed above, we should only trust roughly the first 400 digits of the result. (On some problems, the local shape of the sin function might cause us to lose more digits than that.)
sin(X^1000)
ans =
-0.1903345812720831838599439606845545570938837404109863917294376841894712513865023424095542391769688083234673471544860353291299342362176199653705319268544933406487071446348974733627946491118519242322925266014312897692338851129959945710407032269306021895848758484213914397204873580776582665985136229328001258364005927758343416222346964077953970335574414341993543060039082045405589175008978144047447822552228622246373827700900275324736372481560928339463344332977892008702220160335415291421081700744044783839286957735438564512465095046421806677102961093487708088908698531980424016458534629166108853012535493022540352439740116731784303190082954669140297192942872076015028260408231321604825270343945928445589223610185565384195863513901089662882903491956506613967241725877276022863187800632706503317201234223359028987534885835397133761207714290279709429427673410881392869598191090443394014959206395112705966050737703851465772573657470968976925223745019446303227806333289071966161759485260639499431164004196825
So am I right, and we cannot trust all of these digits? I'll do the same computation, once in 1000 digits of precision, then a second time in 2000 digits. Compute the absolute difference, then take the log10. The 2000 digit result will be our reference as essentially exact compared to the 1000 digit result.
double(log10(abs(sin(hpf(4,[1000 0])^1000) - sin(hpf(4,[2000 0])^1000))))
ans =
-397.45
Ah. So of those 1000 digits of precision we started out with, we lost 602 digits. The last 602 digits in the result are non-zero, but still complete garbage. This was as I expected. Just because your computer reports high precision, you need to know when not to trust it.
Can we do the computation without recourse to a high precision tool? Be careful. For example, suppose we use a powermod type of computation? Thus, compute the desired power, while taking the modulus at every step. Thus, done in double precision:
X = 1;
for i = 1:1000
X = mod(X*4,2*pi);
end
sin(X)
ans =
0.955296299215251
Ah, but remember that the true answer was -0.19033458127208318385994396068455455709388...
So there is essentially nothing of significance remaining. We have lost all our information in that computation. As I said, it is important to be careful.
What happened was after each step in that loop, we incurred a tiny loss in the modulus computation. But then we multiplied the answer by 4, which caused the error to grow by a factor of 4, and then another factor of 4, etc. And of course, after each step, the result loses a tiny bit at the end of the number. The final result was complete crapola.
Lets look at the operation for a smaller power, just to convince ourselves what happened. Here for example, try the 20th power. Using double precision,
mod(4^20,2*pi)
ans =
3.55938555711037
Now, use a loop in a powermod computation, taking the mod after every step. Essentially, this discards multiples of 2*pi after each step.
X = 1;
for i = 1:20
X = mod(X*4,2*pi);
end
X
X =
3.55938555711037
But is that the correct value? Again, I'll use hpf to compute the correct value, showing the first 20 digits of that number. (Since I've done the computation in 50 total digits, I'll absolutely trust the first 20 of them.)
mod(hpf(4,[20,30])^20,2*hpf('pi',[20,30]))
ans =
3.5593426962577983146
In fact, while the results in double precision agree to the last digit shown, those double results were both actually wrong past the 5th significant digit. As it turns out, we STILL need to carry more than 600 digits of precision for this loop to produce a result of any significance.
Finally, to fully kill this dead horse, we might ask if a better powermod computation can be done. That is, we know that 1000 can be decomposed into a binary form (use dec2bin) as:
512 + 256 + 128 + 64 + 32 + 8
ans =
1000
Can we use a repeated squaring scheme to expand that large power with fewer multiplications, and so cause less accumulated error? Essentially, we might try to compute
4^1000 = 4^8 * 4^32 * 4^64 * 4^128 * 4^256 * 4^512
However, do this by repeatedly squaring 4, then taking the mod after each operation. This fails however, since the modulo operation will only remove integer multiples of 2*pi. After all, mod really is designed to work on integers. So look at what happens. We can express 4^2 as:
4^2 = 16 = 3.43362938564083 + 2*(2*pi)
Can we just square the remainder however, then taking the mod again? NO!
mod(3.43362938564083^2,2*pi)
ans =
5.50662545075664
mod(4^4,2*pi)
ans =
4.67258771281655
We can understand what happened when we expand this form:
4^4 = (4^2)^2 = (3.43362938564083 + 2*(2*pi))^2
What will you get when you remove INTEGER multiples of 2*pi? You need to understand why the direct loop allowed me to remove integer multiples of 2*pi, but the above squaring operation does not. Of course, the direct loop failed too because of numerical issues.
I would first redefine the question as follows: compute 4^1000 modulo 2pi. So we have split the problem in two.
Use some math trickery:
(a+2pi*K)*(b+2piL) = ab + 2pi*(garbage)
Hence, you can just multiply 4 many times by itself and computing mod 2pi every stage. The real question to ask, of course, is what is the precision of this thing. This needs careful mathematical analysis. It may or may not be a total crap.
Following to Pavel's hint with mod I found a mod function for high powers on mathwors.com.
bigmod(number,power,modulo) can NOT compute 4^4000 mod 2π. Because it just works with integers as modulo and not with decimals.
This statement is not correct anymore: sin(4^x) is sin(bidmod(4,x,2*pi)).

Problem with arithmetic using logarithms to avoid numerical underflow (take 2)

I have two lists of fractions;
say A = [ 1/212, 5/212, 3/212, ... ]
and B = [ 4/143, 7/143, 2/143, ... ].
If we define A' = a[0] * a[1] * a[2] * ... and B' = b[0] * b[1] * b[2] * ...
I want to calculate a normalised value of A' and B'
ie specifically the values of A' / (A'+B') and B' / (A'+B')
My trouble is A are B are both quite long and each value is small so calculating the product causes numerical underflow very quickly...
I understand turning the product into a sum through logarithms can help me determine which of A' or B' is greater
ie max( log(a[0])+log(a[1])+..., log(b[0])+log(b[1])+... )
and that using logs I can calculate the value of A' / B' but how do I do A' / A'+B'
My best bet to date is to keep the number representations as fractions, ie A = [ [1,212], [5,212], [3,212], ... ] and implement my own arithmetic but it's getting clumsy and I have a feeling there is a (simple) way of logarithms I'm just missing....
The numerators for A and B don't come from a sequence. They might as well be random for the purpose of this question. If it helps the denominators for all values in A are the same, as are all the denominators for B.
Any ideas most welcome!
( ps. I asked a similar question 24 hours ago regarding the ratio A'/B' but it was actually the wrong question to ask. I'm actually after A'/(A'+B'). Sorry, my mistake. )
I see few ways here
First of all you can notice that
A' / (A'+B') = 1 / (1 + B'/A')
and you know how to calculate B'/A' with logarithms.
Another way is to implement your own rational arithmetic but you don't need to go far about it. Since you know that denominators are the same for the whole array, it immediately gives you
numerator(A') = numerator(a[0]) * numerator(a[1]) ...
denumerator(A') = denumerator(a[0]) ^ A.length
All you now need to do is to sum A' and B' which is easy and then multiply A' and 1/(A'+B') which is also easy. The hardest part here is to normalize resulting value which is done with modulo operation and is trivial.
Alternatively, since you most likely using some popular scripting language, most of them have classes for rational arithmetic built in, Python and Ruby have them for sure.
My best bet to date is to keep the number representations as fractions and implement my own arithmetic but it's getting clumsy
What language are you using? If you can overload operators, it should be really easy to make up a Fraction class that you can treat as a number pretty much everywhere.
As an example, determining whether a fraction A / B is larger than C / D is basically comparing whether A * D is larger than B * C.
Both A and B have the same denominator in each fraction you mention. Is that true for every term in the list? If that's so, why don't you factor that out when you calculate the product? The denominator will simply be X^n, when X is the value and n is the number of terms in the list.
If you do that, you'll have the opposite problem: overflow in the numerator. You know that it can't be smaller than max(X)^n, where max(X) is the maximum value in the numerator and n is the number of terms in the list. If you can calculate that, you can see if your computer will have a problem. You can't put 10 pounds of anything in a 5 pound bag.
Unfortunately, the properties of logarithms limit you to the following simplifications:
(source: equationsheet.com)
and
(source: equationsheet.com)
So you're stuck with:
(source: equationsheet.com)
If you're using a language that supports infinite precision numbers (e.g., Java BigDecimal) it might make your life a little easier. But there's still a good argument for doing some thinking before you compute. Why use brute force when you can be elegant?
Well ... if you know A'(A'+B'), then B'(A'+B') should be one minus that. I personally would not use logarithms. I would use the actual fractions. I would also use some sort of BigInt class to represent the numerator and denominator. Which language are you using? Python can be a good fit.