How to calculate machine epsilon in MATLAB?

I need to find the machine epsilon and I am doing the following:
eps = 1;
while 1.0 + eps > 1.0 do
eps = eps /2;
end
However, it shows me this:
Undefined function or variable 'do'.
Error in epsilon (line 3)
while 1.0 + eps > 1.0 do
What should I do?

First and foremost, there is no such thing as a do keyword in MATLAB, so eliminate that from your code. Also, don't use eps as a variable name. eps is a built-in MATLAB function that returns machine epsilon, which is exactly what you are trying to calculate. By creating a variable called eps, you shadow the built-in function, so any other code that relies on it will behave unexpectedly, and that's not what you want.
Use something else instead, like macheps. Also, your algorithm is slightly incorrect. You need to check 1.0 + (macheps/2) in your while loop, not 1.0 + macheps; otherwise the loop halves one time too many and you end up with eps/2 instead of eps.
In other words, do this:
macheps = 1;
while 1.0 + (macheps/2) > 1.0
    macheps = macheps / 2;   % halve while the next candidate still registers against 1.0
end
This should give you 2.22 x 10^{-16}, which agrees with MATLAB if you type in eps in the command prompt. To double-check:
>> format long
>> macheps
macheps =
2.220446049250313e-16
>> eps
ans =
2.220446049250313e-16
Bonus
In case you didn't know, machine epsilon is the upper bound on the relative error due to floating point arithmetic. In other words, this would be the maximum difference expected between a true floating point number and one that is calculated on a computer due to the finite number of bits used to store a floating point number.
If you recall, floating-point numbers are inevitably represented as binary bits on your computer (or pretty much anything digital). In terms of the IEEE 754 floating-point standard, MATLAB assumes that all numerical values are of type double, which represents a floating-point number with 64 bits. You can override this behaviour by explicitly casting to another type. In the IEEE 754 standard, double-precision numbers have 52 bits that represent the fractional part of the number.
Here's a nice diagram of what I am talking about (source: Wikipedia):
[Diagram of the IEEE 754 double-precision layout: 1 sign bit, 11 exponent bits, 52 fraction bits]
You see that there is one bit reserved for the sign of the number, 11 bits reserved for the exponent and finally, 52 bits reserved for the fractional part. This in total adds up to 64 bits. The fractional part is a summation of powers of 2 with negative exponents, starting from -1 down to -52. The MSB of the fraction corresponds to 2^{-1}, all the way down to 2^{-52} as the LSB. Essentially, machine epsilon is the maximum resolution difference for an increase of 1 bit between two numbers, given that they have the same sign and the same exponent. Technically speaking, machine epsilon equals 2^{-52}, as this is the resolution of a single bit in the fraction, given those conditions I talked about earlier.
If you look at the code above closely, the division by 2 shifts the single set bit of your number one position to the right at each iteration, starting at the whole value of 1, or 2^{0}. At each step we add this shifted value to 1 and check whether the sum still registers as greater than 1. We keep shifting until a change is no longer registered: once macheps/2 reaches 2^{-53}, there are no longer enough fraction bits to represent 1.0 + 2^{-53}, so the sum rounds back to exactly 1.0, which is no longer > 1.0, and that is what the while loop is checking.
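You can verify this threshold directly at the command prompt; adding 2^{-52} to 1 still registers, while adding 2^{-53} rounds back to exactly 1:
>> 1 + 2^-52 > 1
ans =
1
>> 1 + 2^-53 > 1
ans =
0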
Once the while loop quits, it is this threshold that defines machine epsilon. If you're curious, if you punch in 2^{-52} in the command prompt, you will get what eps is equal to:
>> 2^-52
ans =
2.220446049250313e-16
This makes sense as you are shifting one bit to the right 52 times, and just before the loop stops the bit sits at the LSB position, 2^{-52}. For the sake of being complete, if you were to place a counter inside your while loop and count up how many times it executes, it will execute exactly 52 times, representing 52 bit shifts to the right:
macheps = 1;
count = 0;
while 1.0 + (macheps/2) > 1.0
    macheps = macheps / 2;
    count = count + 1;
end
>> count
count =
52

It looks like you may want something like this:
eps = 1;
while (1.0 + eps > 1.0)
    eps = eps / 2;
end
Note that this loop exits with eps equal to 2^{-53}, half of MATLAB's built-in eps; see the first answer for the corrected loop condition (and for why shadowing the built-in eps function is a bad idea).

Although there is a fine answer above, I thought I would add an Octave and MATLAB method mentioned in [1].
>> a = 1; b = 1; while a+b~=a; b= b/2; end
It is read as: while a plus b is not equal to a, halve b.
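When this loop exits, b holds the first value for which a + b == a in floating point, i.e. 2^{-53}, so doubling it recovers machine epsilon:
>> format long
>> a = 1; b = 1; while a+b~=a; b = b/2; end
>> 2*b
ans =
2.220446049250313e-16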
References:
[1] Alfio Quarteroni, Fausto Saleri, and Paola Gervasio. 2016. Scientific Computing with MATLAB and Octave. Springer Publishing Company, Incorporated.

Related

MATLAB numeric precision when generating a numeric sequence

I was testing an operation like this:
[input] 3.9/0.1 : 4.1/0.1
[output] 39 40
I don't know why 4.1/0.1 gets cut off at 40. If I add a round(), it goes as expected:
[input] 3.9/0.1 : round(4.1/0.1)
[output] 39 40 41
What's wrong with the first operation?
In this Q&A I go into detail on how the colon operator works in MATLAB to create a range. But the detail that causes the issue described in this question is not covered there.
That post includes the full code for a function that imitates exactly what the colon operator does. Let's follow that code. We start with start = 3.9/0.1, which is exactly 39, and stop = 4.1/0.1, which, due to rounding errors, is just slightly smaller than 41, and step = 1 (the default if it's not given).
It starts by computing a tolerance:
tol = 2.0*eps*max(abs(start),abs(stop));
This tolerance is intended to ensure that the stop value is still included if it lies within tol of an exact number of steps, even if the last step would slightly overshoot it. Without a tolerance, it would be really difficult to build correct sequences using floating-point end points and step sizes.
However, then we get this test:
if start == floor(start) && step == 1
% Consecutive integers.
n = floor(stop) - start;
elseif ...
If the start value is an exact integer, and the step size is 1, then it forces the sequence to be an integer sequence. Unfortunately, it does so by taking the number of steps as the distance between floor(stop) and start. That is, it is not using the tolerance computed earlier in determining the right stop! If stop is slightly above an integer, that integer will be in the range. If stop is slightly below an integer (as in the case of the OP), that integer will not be part of the range.
It could be debated whether MATLAB should round the stop number up in this case or not. MATLAB chose not to. All of the sequences produced by the colon operator use the start and stop values exactly as given by the user. It leaves it up to the user to ensure the bounds of the sequence are as required.
However, if the colon operator hadn't special-cased the sequence of integers, the result would have been less surprising in this case. Let's add a very small number to the start value, so it's not an integer:
>> a = 3.9/0.1 : 4.1/0.1
a =
39 40
>> b = 3.9/0.1 + eps(39) : 4.1/0.1
b =
39.0000 40.0000 41.0000
Floating-point numbers suffer from loss of precision when represented with a fixed number of bits (64 bits in MATLAB by default). This is because there is an infinite number of real numbers (even within a small range, say 0.0 to 0.1), while an n-bit binary pattern can represent only 2^n distinct numbers. Hence, not all real numbers can be represented: the nearest approximation is used instead, resulting in a loss of accuracy.
Because neither 4.1 nor 0.1 is exactly representable as a 64-bit double-precision floating point number, the quotient actually computed for 4.1/0.1 comes out slightly less than 41:
4.1/0.1 ≈ 40.9999999999999941713291207...
(that is the exact quotient of the doubles nearest to 4.1 and 0.1; the stored result is the double nearest to it, one ULP below 41).
So, in essence, 4.1/0.1 < 41.0 and that's what you get from the range. If you subtract, for example, 41 - 4.1/0.1 = 7.105427357601002e-15. But when you round, you get the closest value of 41.0 as expected.
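You can see this at the command prompt:
>> format long
>> 4.1/0.1
ans =
40.999999999999993
>> 41 - 4.1/0.1
ans =
7.105427357601002e-15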
The representation scheme for 64-bit double-precision according to the IEEE-754 standard:
The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for negative numbers.
The following 11 bits represent exponent (E).
The remaining 52 bits represent the fraction (F).

How is eps() calculated in MATLAB?

The eps routine in MATLAB essentially returns the positive distance between adjacent floating point numbers. It can take an optional argument, too.
My question: How does MATLAB calculate this value? (Does it use a lookup table, or does it use some algorithm to calculate it at runtime, or something else...?)
Related: how could it be calculated in any language providing bit access, given a floating point number?
Wikipedia has quite the page on it.
Specifically for MATLAB, eps is 2^(-52), as MATLAB uses double precision by default. In the 64-bit layout there is one bit for the sign, 11 for the exponent and the rest (52 bits) for the fraction.
The MATLAB documentation on floating point numbers also shows this:
d = eps(x), where x has data type single or double, returns the positive distance from abs(x) to the next larger floating-point number of the same precision as x.
Floating-point numbers are not equally spaced on the number line, so different values have different distances to the next representable number of the same precision. Their bit representations are:
1.0 = 0 01111111111 0000000000000000000000000000000000000000000000000000
0.9 = 0 01111111110 1100110011001100110011001100110011001100110011001101
The sign for both is positive (0), but the exponents differ and of course the fractions are vastly different. This means the distances to the next floating-point numbers, i.e. their eps values, differ too:
dec2bin(typecast(eps(1.0), 'uint64'), 64) = 0 01111001011 0000000000000000000000000000000000000000000000000000
dec2bin(typecast(eps(0.9), 'uint64'), 64) = 0 01111001010 0000000000000000000000000000000000000000000000000000
which are not the same, hence eps(0.9)~=eps(1.0).
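You can confirm this at the prompt:
>> format long
>> eps(1.0)
ans =
2.220446049250313e-16
>> eps(0.9)
ans =
1.110223024625157e-16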
Here is some insight into eps which will help you to write an algorithm.
See that eps(1) = 2^(-52). Now, say you want to compute the eps of 17179869183.9. Note that I have chosen a number which is 0.1 less than 2^34 (in other words, something like 2^{33.9999...}). To compute its eps, take log2 of the number, which is ~33.9999... as mentioned before. Take floor() of this and add it to -52, since eps(1) = 2^(-52) and the given number lies in the binade starting at 2^{33}. Therefore, log2(eps(17179869183.9)) = -52 + 33 = -19, i.e. eps(17179869183.9) = 2^{-19}.
If you take a number which is fractionally more than 2^34, e.g. 17179869184.1, then log2(eps(17179869184.1)) = -18. This shows that the eps value changes at the numbers that are integer powers of your base (or radix), in this case 2, which is why we take the floor of the power. Using this you can get the exact value of eps for any normalized number, as sketched below.
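Here is a minimal MATLAB sketch of that recipe, assuming a positive, normalized double input (subnormals, Inf and NaN are not handled):
x = 17179869183.9;
e = floor(log2(x));       % exponent of the enclosing binade: 33
my_eps = 2^(e - 52);      % spacing of doubles in [2^33, 2^34), i.e. 2^-19
isequal(my_eps, eps(x))   % returns 1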
MATLAB uses (along with other languages) the IEEE 754 standard for representing real floating point numbers.
In this format the bits allocated for approximating the actual¹ real number, usually 32 for single or 64 for double precision, are grouped into three groups:
1 bit for determining the sign, s.
8 (or 11) bits for the exponent, e.
23 (or 52) bits for the fraction, f.
Then a real number, n, is approximated by the following three-term relation:
n = (-1)^s * 2^(e - bias) * (1 + f)
where the bias offsets the exponent values downward² so that exponents can be effectively negative as well as positive, describing numbers between 0 and 1 as well as numbers greater than or equal to 1.
Now, the gap reflects the fact that real numbers do not map one-to-one onto their finite, 32- or 64-bit, representations; rather, a whole range of real numbers that differ in absolute value by less than eps maps to a single value in computer memory. That is, if you assign values near val to several variables:
var_1 = val - offset;
...
var_i = val;
...
var_n = val + offset;
where
offset < eps(val) / 2
then:
var_1 == var_2 == ... == var_i == ... == var_n.
The gap is determined by the second term in the relation above³, the one containing the exponent (or characteristic):
2^(e - bias)
This term sets the "scale" of the "line" on which the approximated numbers are located: the larger the numbers, the larger the distance between them and the less precise they are; conversely, the smaller the numbers, the more densely their representations are located, and the more accurate they are.
In practice, to determine the gap around a specific number, eps(number), you can add/subtract a gradually increasing small offset until the stored value of the number changes; the smallest offset that registers a change in that (positive or negative) direction is about eps(number) / 2.
To check possible implementations of MATLAB's eps (or ULP, unit in the last place, as it is called in other languages), you could search for ULP implementations in C, C++ or Java, which are the languages MATLAB is written in.
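If your language gives you bit access, the whole computation collapses to incrementing the bit pattern. A minimal MATLAB sketch, assuming a positive, finite double input (Inf, NaN, negative values and realmax are not handled):
x = 0.9;
bits = typecast(x, 'uint64');          % reinterpret the 64 bits of x as an integer
next = typecast(bits + 1, 'double');   % the next representable double above x
gap = next - x                         % equals eps(x)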
1. Real numbers are infinitely precise, i.e. they could be written with arbitrary precision, with any number of digits after the decimal point.
2. Usually by around half the exponent range: in single precision the 8 exponent bits encode decimal values from 0 to 2^8 - 1 = 255, and around the half in that case is 127, i.e. 2^(e - 127).
3. The term 2^(e - bias) can be thought of as representing the most significant digits of the number, i.e. the digits that describe how big the number is, as opposed to the least significant digits that describe its precise location. The larger the term containing the exponent, the smaller the significance of the 23 (or 52) bits of the fraction.

Calculating floating points as binary

The question is :
x and y are two floating point numbers in 32-bit IEEE floating-point format
(8-bit exponent with bias 127) whose binary representation is as follows:
x: 1 10000001 00010100000000000000000
y: 0 10000010 00100001000000000000000
Compute their product z = x y and give the result in binary IEEE floating-point format.
So I've found out that x = -4.3125 and y = 9.03125. I can multiply them and get -38.947265625. I don't know how to show it in IEEE floating-point format. Thanks in advance for the help.
I agree with the comment that it should be done in binary, rather than by conversion to decimal and decimal multiplication. I used Exploring Binary to do the arithmetic.
The first step is to find the actual binary significands. Neither input is subnormal, so they are 1.000101 and 1.00100001.
Multiply them, getting 1.00110111100101.
Similarly, subtract the bias, binary 1111111, from the exponents, getting 10 and 11. Add those, getting 101, then add back the bias, 10000100.
The sign bit for multiplying two numbers with different sign bits will be 1.
Now pack it all back together. The significand came out in the [1, 2) range so there is no need to normalize and adjust the exponent. We are still in the normal range, so drop the 1 before the binary point in the significand. The significand is narrow enough to fit without rounding; just add enough trailing zeros.
1 10000100 00110111100101000000000
You've made it harder for yourself by converting to decimal, because then you'd have to convert the result back. It's not that it can't be done that way, but it's harder by hand.
Without converting, the algorithm to multiply two floats is (roughly) this:
put the implicit 1 back (if applicable)
multiply, to full size (don't truncate) (you can get away with using just Guard and Sticky, if you know how they work)
add the exponents
xor the signs
normalize/round/handle special cases (under-/overflow)
So here, multiply (look up how binary multiplication works if you forgot):
1.00010100000000000000000 *
1.00100001000000000000000 =
1.00100001000000000000000 +
0.000100100001000000000000000 +
0.00000100100001000000000000000 =
1.00110111100101000000000000000
Add exponents (mind the bias), 2+3 = 5 in this case, so 132 = 10000100.
Xor the signs, get 1.
No rounding is necessary because the dropped bits are all zero anyway.
Result: 1 10000100 00110111100101000000000
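As a quick sanity check (a sketch, not part of the exercise), you can decode the packed result back to decimal in MATLAB:
s = 1;                                    % sign bit
e = bin2dec('10000100');                  % biased exponent, 132
f = bin2dec('00110111100101000000000');   % 23-bit fraction field
(-1)^s * 2^(e - 127) * (1 + f/2^23)       % returns -38.947265625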

mod() operation weird behavior

I use mod() to check whether the hundredths digit (the 0.01 digit) of a number is 2 or not.
if mod(5.02*100, 10) == 2
...
end
The result is that mod(5.02*100, 10) == 2 returns 0.
However, mod(1.02*100, 10) == 2 and mod(20.02*100, 10) == 2 both return 1.
The result of mod(5.02*100, 10) - 2 is
ans =
-5.6843e-14
Could it be possible that this is a bug in MATLAB?
The version I use is R2013a (version 8.1.0).
This is not a bug in MATLAB. It is a limitation of floating point arithmetic and of conversion between binary and decimal numbers. Even a simple decimal number such as 0.1 cannot be exactly represented as a binary floating point number with finite precision.
Computer floating point arithmetic is typically not exact. Although we are used to dealing with numbers in decimal format (base 10), computers store and process numbers in binary format (base 2). The IEEE standard for double precision floating point representation (see http://en.wikipedia.org/wiki/Double-precision_floating-point_format, what MATLAB uses) specifies the use of 64 bits to represent a binary number: 1 bit is used for the sign, 52 bits are used for the mantissa (the actual digits of the number), and 11 bits are used for the biased exponent (which specifies where the binary point goes).
When you enter a number into MATLAB, it is immediately converted to binary representation for all manipulations and arithmetic and then converted back to decimal for display and output.
Here's what happens in your example:
Convert to binary (keeping only up to 52 digits):
5.02 => 1.01000001010001111010111000010100011110101110000101e2
100 => 1.1001e6
10 => 1.01e3
2 => 1.0e1
Perform multiplication:
1.01000001010001111010111000010100011110101110000101 e2
x 1.1001 e6
--------------------------------------------------------------
0.000101000001010001111010111000010100011110101110000101
0.101000001010001111010111000010100011110101110000101
+ 1.01000001010001111010111000010100011110101110000101
-------------------------------------------------------------
1.111101011111111111111111111111111111111111111111111101e8
Cutting off at 52 digits gives 1.111101011111111111111111111111111111111111111111111e8
Note that this is not the same as 1.11110110e8 which would be 502.
Perform modulo operation: (there may actually be additional error here depending on what algorithm is used within the mod() function)
mod( 1.111101011111111111111111111111111111111111111111111e8, 1.01e3) = 1.111111111111111111111111111111111111111111100000000e0
The error is exactly -2^-44, which is -5.6843x10^-14. The conversion between decimal and binary and the rounding due to finite precision have caused a small error. In some cases you get lucky and the rounding errors cancel out, and you might still get the 'right' answer, which is why you got what you expected for mod(1.02*100, 10); but in general you cannot rely on this.
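You can confirm this at the prompt:
>> log2(abs(mod(5.02*100, 10) - 2))
ans =
-44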
To use mod() correctly to test the particular digit of a number, use round() to round it to the nearest whole number and compensate for floating point error.
mod(round(5.02*100), 10) == 2
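This now returns true as intended:
>> mod(round(5.02*100), 10) == 2
ans =
1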
What you're encountering is a floating point error or artifact, like the commenters say. This is not a Matlab bug; it's just how floating point values work. You'd get the same results in C or Java. Floating point values are "approximate" types, so exact equality comparisons using == without some rounding or tolerance are prone to error.
>> isequal(1.02*100, 102)
ans =
1
>> isequal(5.02*100, 502)
ans =
0
It's not the case that 5.02 is the only number this happens for; several other values are affected too. Here's an example that picks out several of them.
x = 1.02:1000.02;              % whole numbers plus .02
ix = mod(x .* 100, 10) ~= 2;   % logical index of the values where the test fails
disp(x(ix))                    % display the affected values
To understand the details of what's going on here (and in many other situations you'll encounter working with floats), have a read through the Wikipedia entry for "floating point", or my favorite article on it, "What Every Computer Scientist Should Know About Floating-Point Arithmetic". (That title is hyperbole; this article goes deep and I don't understand half of it. But it's a great resource.) This stuff is particularly relevant to Matlab because Matlab does everything in floating point by default.

Bit shifting a double value in MATLAB R2010b

I'm trying to perform a bit shift right operation on a double value in MATLAB 2010b. It seems that in newer MATLAB versions, this can be done using bitsra(), e.g.:
y = double(128);
bitsra(y,3)
but this function is not available in older versions.
What is the best way to achieve this?
You can use the bitshift function, which is available from at least MATLAB 2009a. From the documentation
c = bitshift(a, k) returns the value of a shifted by k bits.
When k is positive, 0-valued bits are shifted in on the right.
When k is negative, and a is unsigned, or a signed and positive, 0-valued bits are shifted in on the left.
When k is negative and a is a signed and negative, 1-valued bits are shifted in on the left.
On MATLAB 2012b
>> bitsra(128, 3)
ans =
16
On MATLAB 2009a:
>> bitshift(128, -3)
ans =
16
Edit: bitshift works with any fixed-point data type, even though the error message generated by calling bitshift(128.5, -3) suggests that it requires integer values. bitshift(128.5, -3) will not work as written, since 128.5 is by default a floating point double-precision variable. As the documentation for bitshift notes, you can use the fi function from the Fixed-Point Toolbox to create fixed-point numbers. So to work with fractions one could do something like
>> bitshift(fi(128.5), -3)
ans =
16.0625
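If you don't have the Fixed-Point Toolbox and only need an arithmetic right shift of an integer-valued double, plain division by a power of two does the same job. A minimal sketch, with floor() added to reproduce the rounding toward negative infinity that an arithmetic shift applies to negative values:
y = double(128);
k = 3;
shifted = floor(y / 2^k)   % 128 >> 3 = 16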