How to tell if two numbers are really different or they are actually the same due to floating point error - matlab

For example,
0.168033639538270
and
0.168033639538270
are two double type numbers that are from two different calculations (some further calculations from the eigenvalues of a matrix).
But MATLAB treats them as different (with unique or ==). How do I know whether MATLAB treats them as different because of floating-point error (eps = 2.220446049250313e-16), or whether they are actually different (the digits after the first 15 are not the same, but MATLAB just does not display them)? Sometimes MATLAB treats two numbers with the same displayed value as the same, and sometimes as different, so I want to know if they are really different.

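You can print a formatted version of each number at the required precision using sprintf (17 significant digits, e.g. the format '%.17g', is enough to distinguish any two distinct doubles), and then compare the two strings using strcmp.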
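The same idea can be sketched outside MATLAB. The snippet below is a minimal Python illustration, not the asker's actual data: the values a and b are stand-ins chosen because they agree to 15 digits but are different doubles. It formats both values at 15 and at 17 significant digits and compares the strings, which is what sprintf plus strcmp would do. The tolerance passed to math.isclose is an assumption (a few eps) and should be chosen to suit the calculation.

```python
import math

EPS = 2.220446049250313e-16  # double-precision eps, as quoted in the question

# Stand-in values from two different calculations (not the asker's data).
a = 0.1 + 0.2
b = 0.3

# 15 significant digits: roughly what a long display format shows.
short_a, short_b = "%.15g" % a, "%.15g" % b   # '0.3' and '0.3'

# 17 significant digits: enough to tell any two distinct doubles apart.
full_a, full_b = "%.17g" % a, "%.17g" % b     # '0.30000000000000004' and '0.29999999999999999'

same_when_displayed = short_a == short_b      # True
same_double = full_a == full_b                # False

# If the two values should count as equal up to rounding error,
# compare with a relative tolerance instead of ==.
close_enough = math.isclose(a, b, rel_tol=4 * EPS)  # True
```

If the 17-digit strings match, the two doubles are identical. If only the 15-digit strings match, the numbers really are different doubles, and whether that difference matters is a question of tolerance rather than of display.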

Related

Why NumberLong(9007199254740993) matches NumberLong(9007199254740992) in MongoDB from mongo shell?

This happens when the given number is large enough (greater than 9007199254740992). With more tests, I even found that several adjacent numbers can match a single number.
Not only NumberLong(9007199254740996) would match NumberLong("9007199254740996"), but also NumberLong(9007199254740995) and NumberLong(9007199254740997).
When I want to act upon a record using its number, I could actually use three different adjacent numbers to get back the same record.
The accepted answer from here makes sense; I quote the most relevant part below:
Caveat: Don't try to invoke the constructor with a too large number, i.e. don't try db.foo.insert({"t" : NumberLong(1234657890132456789)}); Since that number is way too large for a double, it will cause roundoff errors. Above number would be converted to NumberLong("1234657890132456704"), which is wrong, obviously.
Here are some additional details to make things clearer:
Firstly, the mongo shell is a JavaScript shell, and JS does not distinguish between integer and floating-point values. All numbers in JS are represented as floating-point values, which means the mongo shell uses 64-bit floating-point numbers by default. If the shell sees "9007199254740995", it treats this as a string and converts it to a long long. But when we omit the double quotes, the mongo shell sees an unquoted 9007199254740995 and treats it as a floating-point number.
Secondly, JS uses the 64-bit floating-point format defined in the IEEE 754 standard to represent numbers. The largest magnitude it can represent is ±1.7976931348623157 × 10^308, and the smallest nonzero magnitude is ±5 × 10^-324.
There are an infinite number of real numbers, but only a limited number of real numbers can be accurately represented in the JS floating point format. This means that when you deal with real numbers in JS, the representation of the numbers will usually be an approximation of the actual numbers.
This leads to the so-called rounding error. Because integers are also stored in binary floating-point format, they lose precision in their trailing digits for the same reason decimal fractions do.
The JS number format allows you to accurately represent all integers between -9007199254740992 (-2^53) and 9007199254740992 (2^53), inclusive.
Here, since the numbers are bigger than 9007199254740992, rounding occurs: the unquoted 9007199254740995 and 9007199254740997 are both rounded to the nearest representable value, 9007199254740996. The binary representations of NumberLong(9007199254740995), NumberLong(9007199254740996) and NumberLong(9007199254740997) are therefore the same. So when we query with these three numbers in this way, we are effectively asking for the same thing, and we get back the same record.
I think understanding that this problem is not specific to JS is important: it affects any programming language that uses binary floating point numbers.
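Because the effect is not specific to JS, it can be reproduced in any language with IEEE 754 doubles. The following Python sketch is an illustration only: Python integers are exact, so the hypothetical helper as_double uses float() to stand in for the shell parsing an unquoted literal, and int() on a string stands in for the quoted form.

```python
TWO_53 = 2 ** 53  # 9007199254740992, the limit for exact consecutive integers


def as_double(n: int) -> int:
    """Round-trip an integer through a 64-bit double (hypothetical helper)."""
    return int(float(n))


# Unquoted literals: all three adjacent values collapse to 9007199254740996.
unquoted = [as_double(n) for n in (9007199254740995,
                                   9007199254740996,
                                   9007199254740997)]

# Quoted literals: parsing the string as an integer keeps every digit.
quoted = [int(s) for s in ("9007199254740995",
                           "9007199254740996",
                           "9007199254740997")]

# The value from the quoted caveat: the double nearest to it ends in ...704.
caveat = as_double(1234657890132456789)  # 1234657890132456704
```

Above 2^53 the spacing between adjacent doubles is 2, so odd integers cannot be stored and are rounded to a neighbouring even value.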
You are misusing the NumberLong constructor.
The correct usage is to give it a string argument, as stated in the relevant documentation.
NumberLong("2090845886852")

How to count the number of significant digits?

For example, 5.020 would return 4. Preferably, it should work with vector inputs too.
I Googled around and found some answers, but none of them counted the last zero in 5.020.
From the given information, it is not possible.
The problem is that when you enter a number, it is (by default) represented as a double with a relative precision of eps, so the precision you entered is lost. Because one is typically not interested in showing all ~15 digits, Matlab uses a few display rules that are independent of the originally entered number; by default this typically shows the integer part plus 4 decimal digits.
Additionally, the standard rule when converting a number to a string (num2str) is to cut off trailing zeros, which is why you do not get the last zero.
Your only option is to count the number of significant digits when you obtain the data, which leads back to the question @Beaker asks you in the comments.
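Counting digits at the point where the data is obtained means working on the text before it becomes a double. As a rough sketch in Python (the function name significant_digits is made up for this example, and it assumes the input arrives as strings in plain decimal notation), the decimal module keeps trailing zeros that a double would discard:

```python
from decimal import Decimal


def significant_digits(text: str) -> int:
    """Count significant digits of a decimal literal given as text.

    Decimal keeps the digits exactly as written, including trailing
    zeros after the decimal point, and drops leading zeros.
    """
    return len(Decimal(text).as_tuple().digits)


# Works element-wise on a list ("vector") of inputs.
counts = [significant_digits(s) for s in ["5.020", "5.02", "0.0050", "120.0"]]
# counts == [4, 3, 2, 4]

# Once converted to a double, the two spellings are indistinguishable.
lost = float("5.020") == float("5.02")  # True
```

Note that trailing zeros in an integer such as "100" are counted as significant here, which is a convention rather than a fact about the number.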

Representation of fixed point number in systemverilog

How should I represent a fixed-point number in SystemVerilog, since it doesn't support fixed-point numbers for reg and logic? Is using the real data type the correct method, or can we use a different data type?
I am trying to implement a square root function in SystemVerilog, in which the result will be in FP, e.g. sqrt(8) = 2.82.
What should be the data type of my inputs and outputs (sqrt) so that I can check the decimal places correctly while verifying?
You use integral types for fixed point numbers. Some people will index their variables like
logic [M-1:-F] fp_number; // M-bits integer, F bit fractional
But it is up to you to adjust the position of the point when adding numbers of different sizes, and to adjust for it after multiplication and division. There are some OpenCore libraries that implement many of these operations for you.
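The bookkeeping described above can be modelled in software before writing any RTL, which also gives a reference to verify against. The Python sketch below is only an illustration under assumptions: F = 8 fractional bits is an arbitrary choice, all helper names are hypothetical, and values are unsigned. A fixed-point value is stored as the integer round(x * 2^F).

```python
import math

F = 8  # number of fractional bits (an arbitrary choice for this sketch)


def to_fixed(x, frac_bits=F):
    """Encode a real value as an integer scaled by 2**frac_bits."""
    return round(x * (1 << frac_bits))


def to_float(q, frac_bits=F):
    """Decode a scaled integer back to a float, e.g. for checking results."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=F):
    """Multiply two fixed-point values; the raw product has 2*F fractional
    bits, so shift right by F to restore the format (truncating)."""
    return (a * b) >> frac_bits


def fixed_sqrt(q, frac_bits=F):
    """Square root in the same format: sqrt(q / 2**F) * 2**F == sqrt(q * 2**F)."""
    return math.isqrt(q << frac_bits)


root = fixed_sqrt(to_fixed(8))                      # 724
root_value = to_float(root)                         # 2.828125
product = fixed_mul(to_fixed(1.5), to_fixed(2.0))   # 768, i.e. 3.0
```

With 8 fractional bits, sqrt(8) comes out as 724/256 = 2.828125, so a testbench can compare the integer output of the design against the integer from this kind of model rather than against a real value.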

How to turn off denormal number support in MATLAB?

I am trying to turn off denormal number support in MATLAB, so that any computation that would result in a denormal number instead results in zero (DAZ, FTZ).
I've researched several sites, including the one below, but I haven't found anything about doing this.
http://blogs.mathworks.com/cleve/2014/07/21/floating-point-denormals-insignificant-but-controversial-2/
I've never heard of such an option in Matlab. It would likely require deep manipulation of a lot of the floating-point math, effectively requiring a new datatype to be supported if this were to be an easily toggle-able option in Matlab. You could write your own mex C code to do this (more here and here) for an individual function.
And of course you can get something like this with one line of Matlab – here's an example:
a = [1e-300 1e-310 1e-310];
b = [1e-301 1e-311 1e-310];
x = a-b;
x(abs(x(:)) < realmin(class(x))) = 0;
where realmin is the smallest normalized floating-point number. However, the floating point math is still performed using the extended denormal/subnormal values in a. It's just the output that's clipped to zero.
Unless you're doing this for fun and experimentation, or possibly running code on an embedded platform, I'd really recommend against disabling denormals as a form of optimization. Instead, focus on why your values are so small and how you might rescale your problem to avoid the issue entirely.
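For comparison, the clipping trick from the MATLAB example can be sketched in Python with the same input values. This is an illustration only: sys.float_info.min plays the role of realmin, the helper name flush_subnormals is made up, and, as in the MATLAB version, the subtraction itself still uses subnormal arithmetic; only the result is clipped.

```python
import sys

REALMIN = sys.float_info.min  # smallest positive normalized double


def flush_subnormals(values):
    """Replace every value whose magnitude is below REALMIN with 0.0."""
    return [0.0 if abs(v) < REALMIN else v for v in values]


a = [1e-300, 1e-310, 1e-310]
b = [1e-301, 1e-311, 1e-310]

# Element-wise difference; the middle result is a nonzero subnormal.
x = [ai - bi for ai, bi in zip(a, b)]

# After clipping, only the first (normal) result survives.
flushed = flush_subnormals(x)
```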

Irrational number representation in computer

We can write a simple Rational Number class using two integers representing A/B with B != 0.
If we want to write an irrational number class (for storing and computing), the first thing that comes to mind is to use floating point, i.e. the IEEE 754 standard (binary fractions), because an irrational number must be approximated.
Is there another way to write an irrational number class other than using binary fractions (whether or not it conserves memory space)?
I studied jsbeuno's solution using Python: Irrational number representation in any programming language?
He's still using the built-in floating point for storage.
This is not homework.
Thank you for your time.
By a cardinality argument, there are many more irrational numbers than rational ones (and the number of IEEE 754 floating-point numbers is finite: a 64-bit double has fewer than 2^64 distinct values).
You can represent numbers with something other than fractions (e.g. logarithmically).
jsbeuno is storing the number as a base and a radix and using those when doing calcs with other irrational numbers; he's only using the float representation for output.
If you want to get fancier, you can define the base and the radix as rational numbers (with two integers) as described above, or make them themselves irrational numbers.
To make something thoroughly useful, though, you'll end up replicating a symbolic math package.
You can always use symbolic math, where items are stored exactly as they are and calculations are deferred until they can be performed with precision above some threshold.
For example, say you performed two operations on a non-irrational number like 2, one to take the square root and then one to square that. With limited precision, you may get something like:
(√2)²
= 1.414213562²
= 1.999999999
However, storing symbolic math would allow you to store the result of √2 as √2 rather than an approximation of it, then realise that (√x)² is equivalent to x, removing the possibility of error.
Now that obviously involves a more complicated encoding than simple IEEE754, but it's not impossible to achieve.
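To make the (√2)² example concrete, here is a very small Python sketch of the symbolic idea. It is a toy under stated assumptions, not a symbolic math package: the class name Surd is invented for this example, and it only handles values of the form coeff × √radicand with a rational coefficient and a non-negative integer radicand.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Surd:
    """Exact value coeff * sqrt(radicand) (hypothetical toy class)."""
    coeff: Fraction
    radicand: int  # non-negative integer

    def simplified(self):
        """Move square factors out of the root, e.g. sqrt(8) -> 2*sqrt(2)."""
        coeff, n = self.coeff, self.radicand
        k = 2
        while k * k <= n:
            while n % (k * k) == 0:
                coeff *= k
                n //= k * k
            k += 1
        return Surd(coeff, n)

    def __mul__(self, other):
        # (c1*sqrt(n1)) * (c2*sqrt(n2)) == c1*c2 * sqrt(n1*n2), kept exact.
        return Surd(self.coeff * other.coeff,
                    self.radicand * other.radicand).simplified()

    def __float__(self):
        # Approximate only when a numeric value is finally needed.
        return float(self.coeff) * math.sqrt(self.radicand)


root2 = Surd(Fraction(1), 2)
exact = root2 * root2           # Surd(Fraction(2, 1), 1), i.e. exactly 2
approx = math.sqrt(2) ** 2      # 2.0000000000000004 with doubles
```

The exact path never evaluates √2 numerically, so squaring it gives 2 with no error, while the floating-point path is off in the last digit. Supporting sums, other roots, or constants such as π is where the encoding grows toward a full symbolic package.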