Dealing with small numbers and accuracy - numbers

I have a program where I deal with a lot of very small numbers (towards the lower end of the Double limits).
During the execution of my application, some of these numbers progressively get smaller meaning their "estimation" is less accurate.
My solution at the moment is to scale them up before I do any calculations and then scale them back down again.
...but it's got me thinking, am I actually gaining any more "accuracy" by doing this?
Thoughts?

Are your numbers really in the region between 10^-308 (the smallest normalized double) and 10^-324 (the smallest representable double, which is denormalized, i.e. losing precision)? If so, then by scaling them up you do indeed gain accuracy by working around the limits of the exponent range of the double type.
I have to wonder though: what kind of application deals with numbers that extremely small? I know of no physical discipline that needs anything like that.

A double has a fixed number of significant digits (the mantissa), and another fixed number of bits to represent the exponent (the "power" part).
In fact you may, therefore, have two issues:
Regarding the power-part: that is what approaching the limit of small doubles is about.
Scaling them up (by powers of 2) helps prevent your numbers from becoming unrepresentable (underflowing to a denormal, and eventually to zero).
When you write about the accuracy of "estimation", I assume you mean the number of significant digits: that is not related to the small-number limit. A number that is very small, but not too small in the sense of the lower limit for doubles, has the same number of significant digits as any "more normal" number.
Concerns about numerical precision of a number should, generally speaking, focus on how the number is computed, rather than on the absolute size of the result.
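To make that concrete, here is a minimal Swift sketch (the value 1e-320 and the 2^100 scale factor are made-up for illustration): a subnormal double has lost most of its significant bits, while scaling by an exact power of two is lossless.

// Below Double.leastNormalMagnitude (~2.2e-308) values go subnormal:
// neighbouring values are a fixed 2^-1074 apart, so relative
// precision degrades as the value shrinks.
let tiny = 1e-320                  // subnormal, roughly 2^-1063
print(tiny.ulp / tiny)             // ~5e-4: only about 11 significant bits left

// Scaling by an exact power of two changes only the exponent, losslessly.
let scaled = tiny * 0x1p100        // 2^100 as a hex float literal
print(scaled.ulp / scaled)         // ~2^-52: full precision restored
print(scaled * 0x1p-100 == tiny)   // true: scaling back down is exact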

Related

What is the most efficient method for calculating very, very large numbers, specifically in the context of an exponential game?

Right now, my method for updating very large numbers is as follows:
Keep track of the health of a given monster, to be attacked, with both a double (for the significant digits) and an int (for the power).
If the monster is attacked, only update the small double / exponent values used.
By following the above, I am only ever dealing with calculations on small doubles between 0 and 10. However, I feel as though this is extremely complicated compared to simply using big integers.
I have worked in Java previously, and using BigInt resulted in a HUGE performance loss when the exponential numbers became massive (e.g., 10 ^ 20000 +), if I remember correctly.
In the interest of coding a game, and not having to re-evaluate formulas later, what is the best course of action?
Am I really saving that much performance by keeping calculations between small numbers? Or is there an implementation of BigDouble / BigInt for Swift that makes the performance gain from a hand-rolled approach negligible? I am well versed in how to use BigDouble / BigInt, so I am really only concerned with the performance difference between using either of those and an implementation of big numbers that splits each number into a double (for the significant digits) and an int (for the exponent).
Thank you, and if there needs to be any clarification, I can provide it.
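For what it's worth, the significand-plus-exponent scheme described in the question can be sketched in Swift roughly as below. This is a toy illustration with my own names and normalization choices, not a library API:

import Foundation

// A number stored as significand * 10^exponent, with the significand
// renormalized into [1, 10) after every operation.
struct BigDouble {
    var significand: Double
    var exponent: Int

    init(_ significand: Double, _ exponent: Int) {
        self.significand = significand
        self.exponent = exponent
        normalize()
    }

    private mutating func normalize() {
        guard significand != 0 else { exponent = 0; return }
        let shift = Int(floor(log10(abs(significand))))
        significand /= pow(10, Double(shift))
        exponent += shift
    }

    static func * (lhs: BigDouble, rhs: BigDouble) -> BigDouble {
        BigDouble(lhs.significand * rhs.significand, lhs.exponent + rhs.exponent)
    }

    static func - (lhs: BigDouble, rhs: BigDouble) -> BigDouble {
        // Align exponents; an operand more than ~15 decimal orders below
        // the other is beneath double precision and simply vanishes.
        let diff = lhs.exponent - rhs.exponent
        if diff > 15 { return lhs }
        if diff < -15 { return BigDouble(-rhs.significand, rhs.exponent) }
        return BigDouble(lhs.significand - rhs.significand * pow(10, Double(-diff)),
                         lhs.exponent)
    }
}

// Usage: a monster with 10^20000 health takes a 3 * 10^19997 hit.
var health = BigDouble(1, 20000)
health = health - BigDouble(3, 19997)
print(health.significand, health.exponent)   // 9.97 19999

The subtraction shows the inherent trade-off: anything more than ~15 decimal orders smaller than the other operand is lost, which is usually acceptable in an incremental game but is exactly what a BigInt would preserve.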

Numerical convergence and minimum number size

I have a program which calculates probability values (p-values), but it is entering a very large negative number into the exp function: exp(-626294.830) evaluates to zero instead of the very small positive number that it should be.
How can I get this to evaluate as a very small floating point number? I have tried Math::BigFloat, bignum, and bigrat, but all have failed.
Wolfram Alpha says that exp(-626294.830) is 4.08589×10^-271997... zero is a pretty close approximation to that ;-) Although you've edited and removed the context from your question, do you really need to work with such tiny numbers, or perhaps there is some way you could optimize your algorithm or scale your numbers?
Anyway, you are correct that code like Math::BigFloat->new("-626294.830")->bexp seems to take quite some time, even with the support of use Math::BigFloat lib => 'GMP';.
The only alternative I can offer at the moment is Math::Prime::Util::GMP's expreal, although you need to specify a precision to it.
use Math::Prime::Util::GMP qw/expreal/;
use Math::BigFloat;
my $e = Math::BigFloat->new(expreal(-626294.830,272000));
print $e->bnstr,"\n";
__END__
4.086e-271997
But on my machine, even that still takes ~20s to run, which brings us back to the question of potential optimization in other places.
Floating point numbers do not have infinite precision. Assuming the number is represented as an IEEE 754 double, we have 52 bits for the fraction, 11 bits for the exponent, and one bit for the sign. Due to the way exponents are encoded, the smallest positive normalized number that can be represented is 2^-1022 (denormals extend this down to 2^-1074, at the cost of precision).
If we look at your number e^-626294.830, we can do a change of base and see that it equals 2^(log_2 e · -626294.830) = 2^-903552.445, which is significantly smaller than 2^-1022. Approximating your number as zero is therefore correct.
Instead of calculating this value using arbitrary-precision numerics, you are likely better off solving the necessary equations by hand, then coding this in a way that does not require extreme precision. For example, it is unlikely that you need the exact value of e^-626294.830, but perhaps just the magnitude. Then, you can calculate the logarithm instead of using exp().
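If all you need is the magnitude and the leading digits, you can stay in log space the whole way and never call exp() on a huge negative argument. A hedged Swift sketch (the constant comes from the question; the decomposition is generic):

import Foundation

// Work with log(p) instead of p: products of probabilities become
// sums of logs, with no underflow anywhere.
let logP = -626294.830

// Recover p = mantissa * 10^exponent without evaluating exp(logP):
let log10P = logP / log(10.0)            // convert natural log to base 10
let exponent = Int(floor(log10P))        // -271997
let mantissa = pow(10.0, log10P - Double(exponent))
print("p ≈ \(mantissa)e\(exponent)")     // p ≈ 4.0859...e-271997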

Matlab precision: simple subtraction is not zero

I compute this simple sum in Matlab:
2*0.04-0.5*0.4^2 = -1.387778780781446e-017
but the result is not zero. What can I do?
Aabaz and Jim Clay have good explanations of what's going on.
It's often the case that, rather than exactly calculating the value of 2*0.04 - 0.5*0.4^2, what you really want is to check whether 2*0.04 and 0.5*0.4^2 differ by an amount that is small enough to be within the relevant numerical precision. If that's the case, then rather than checking whether 2*0.04 - 0.5*0.4^2 == 0, you can check whether abs(2*0.04 - 0.5*0.4^2) < thresh. Here thresh can either be some arbitrary smallish number, or an expression involving eps, which gives the precision of the numerical type you're working with.
EDIT:
Thanks to Jim and Tal for suggested improvement. Altered to compare the absolute value of the difference to a threshold, rather than the difference.
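The same pattern in Swift terms, as a sketch (Double.ulpOfOne plays the role of Matlab's eps; the factor of 16 is an arbitrary illustrative margin):

let a = 2 * 0.04             // inferred as Double
let b = 0.5 * (0.4 * 0.4)    // Matlab's 0.5*0.4^2

print(a - b == 0)            // false: the difference is about -1.39e-17

// Compare against a tolerance scaled to the operands instead.
let thresh = 16 * Double.ulpOfOne * max(abs(a), abs(b))
print(abs(a - b) < thresh)   // true: equal to within numerical precision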
Matlab uses double-precision floating-point numbers to store real numbers. These are numbers of the form m*2^e where m is an integer between 2^52 and 2^53 (the mantissa) and e is the exponent. Let's call a number a floating-point number if it is of this form.
All numbers used in calculations must be floating-point numbers. Often, this can be done exactly, as with 2 and 0.5 in your expression. But for other numbers, most notably most numbers with digits after the decimal point, this is not possible, and an approximation has to be used. What happens in this case is that the number is rounded to the nearest floating-point number.
So, whenever you write something like 0.04 in Matlab, you're really saying "get me the floating-point number that is closest to 0.04". In your expression, there are two numbers that need to be approximated: 0.04 and 0.4.
In addition, the exact result of operations like addition and multiplication on floating-point numbers may not be a floating-point number. Although it is always of the form m*2^e, the mantissa may be too large. So you get an additional error from rounding the results of operations.
At the end of the day, a simple expression like yours will be off by about 2^-52 times the size of the operands, or about 10^-17.
In summary: the reason your expression does not evaluate to zero is two-fold:
Some of the numbers you start out with are different (approximations) to the exact numbers you provided.
The intermediate results may also be approximations of the exact results.
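Both effects are easy to see by printing more digits than the default display shows. A quick Swift illustration (the equivalent in Matlab is fprintf('%.20f\n', 0.04)):

import Foundation

// Neither 0.04 nor 0.4 has an exact binary representation; the
// stored values are the nearest doubles.
print(String(format: "%.20f", 0.04))   // 0.04000000000000000083
print(String(format: "%.20f", 0.4))    // 0.40000000000000002220
print(String(format: "%.20f", 0.08))   // 0.08000000000000000167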
What you are seeing is quantization error. Matlab uses doubles to represent numbers, and while they are capable of a lot of precision, they still cannot represent all real numbers because there are an infinite number of real numbers. I'm not sure about Aabaz's trick, but in general I would say there isn't anything you can do, other than perhaps massaging your inputs to be double-friendly numbers.
I do not know if it is applicable to your problem but often the simplest solution is to scale your data.
For example:
a=0.04;
b=0.2;
a-0.2*b                  % the rounded binary inputs leave a tiny residue
ans=-6.9389e-018
c=a/min(abs([a b]));     % rescale by the smallest magnitude: c = 1
d=b/min(abs([a b]));     % d = 5; both are exactly representable
c-0.2*d                  % 0.2*5 rounds to exactly 1, so the difference is 0
ans=0
EDIT: of course I did not mean to give a universal solution to these kinds of problems, but it is still a good practice that can help you avoid a few problems in numerical computation (curve fitting, etc.). See Jim Clay's answer for the reason why you are experiencing these problems.
I'm pretty sure this is a case of ye olde floating point accuracy issues.
Do you need 1e-17 accuracy? Is this merely a case of wanting 'pretty' output?
In that case, you can just use a formatted sprintf to display the number of significant digits you want.
Realize that this is not a matlab problem, but a fundamental limitation of how numbers are represented in binary.
For fun, work out what .1 is in binary...
Some references:
http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
http://www.mathworks.com/support/tech-notes/1100/1108.html

iPhone and floating point math

I have following code:
float totalSpent;
int intBudget;
float moneyLeft;
totalSpent += Amount;
moneyLeft = intBudget - totalSpent;
And this is how it looks in debugger: http://www.braginski.com/math.tiff
Why is moneyLeft, as calculated by the code above, off by .02 compared to the same expression evaluated in the debugger?
The expression window is correct, yet the code above produces a result that is wrong by .02. It only happens for very large numbers (yet way below the int limit).
thanks
A single-precision float has 23 bits of precision. That means that every calculation is rounded to 23 binary digits. This means that if you have a computation that, say, adds a very small number to a very large number, rounding may result in strange results.
Imagine that you are doing decimal math in scientific notation by hand, under the rule that you may only keep four significant figures. Let's say I ask you to write twelve in scientific notation, with four significant figures. Remembering junior high school, you write:
1.200 × 10^1
Now I say compute the square of 12, and then add 0.5. That is easy enough:
1.440×10^2 + 0.005×10^2 = 1.445×10^2
How about twelve cubed plus 0.75:
1.728×10^3 + 0.00075×10^3 = 1.72875×10^3
But remember, I only gave you room for four significant digits, so you must round; then we get:
1.728×10^3 + 7.5×10^-1 = 1.729×10^3
See? The lack of precision can make the computation come out with unexpected results.
In your example, you've got 999999 in a calculation where you're trying to be precise to 0.01. log2(999999) = 19.93 and log2(0.01) = -6.64. The difference is more than 23; therefore you would need more than 23 binary digits to perform this calculation accurately.
Because floating point mathematics rounds off precision by its very nature, it is usually a bad choice for currency computation, where you must be accurate to the last cent. But are you really concerned with fractions of a cent in your application? If not, then why not do away with the decimal point altogether, and simply store cents (instead of dollars) in a 64-bit integer? 2^64¢ is more than the GDP of the entire planet.
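A sketch of that integer-cents idea in Swift (the budget and amounts are invented for illustration):

import Foundation

// Keep money as whole cents in a 64-bit integer; arithmetic is exact.
let budgetCents: Int64 = 100_000_00          // $100,000.00
let spentCents: Int64 = 99_999_98            // $99,999.98
let leftCents = budgetCents - spentCents     // exactly 2, never 1.9999...

// Convert to dollars only at display time.
let dollars = Int(leftCents / 100)
let cents = Int(leftCents % 100)
print(String(format: "$%ld.%02ld", dollars, cents))   // $0.02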
Floating point will always produce strange results with money-type calculations.
The golden rule is that floating point is good for things you measure (litres, yards, light-years, bushels, etc.) but not for things you count (sheep, beans, buttons, etc.).
Most money calculations are about counting pennies, so use integer math and you won't get the strange results. Either use a fixed-decimal arithmetic library (which would probably be overkill on an iPhone) or store your amounts as whole numbers of cents and only convert to dollars and cents on display.

Is it a good idea to use NSDecimalNumber for floating point arithmetics instead of plain double?

I wonder what's the point of NSDecimalNumber. It offers some arithmetic methods, but why should I use NSDecimalNumber and not just double or NSNumber? Did Apple take care of some floating-point arithmetic ugliness there? Would it make life easier when making heavy use of high-precision, big floating-point math?
This all depends on your needs.
It is a trade off between precision, speed and size of data.
If you are writing an accounting application you cannot lose any precision, and so might well use NSDecimalNumber.
If you are doing complex numerical analysis, speed could matter, and NSDecimalNumber would then be too slow. But even in that case your analysis should look at the precision and errors you can afford, and there could be cases where you need more precision than doubles etc. give you.
NSNumber is a separate case: it is a class cluster that allows storage of C-type numbers in objects, for general use in Cocoa.
If your software deals with money, or other non-integer numbers of interest to accountants, you are well advised to use decimal numbers for that (rather than the binary ones that the underlying HW is optimized to process); that's why all sorts of general purpose languages and databases bend over backwards to support decimal non-integer numbers, not just binary ones.
Rounding issues with binary non-integers might easily result in fractions-of-a-cent discrepancies that, at the limit, might even land you in legal trouble, and, more realistically, will be perceived by accountants and others dealing with money &c as errors in your program, no matter how staunchly you may argue otherwise!-)
NSDecimalNumber is, in effect, a fixed-precision (and fixed-scale) integer, scaled to represent fractional numbers. This is a little different from a floating-point number (where the point, obviously, floats...).
As an example, say you need to represent money from 0.00 to 999.99: you could store this in an integer from 0 to 99999 as an amount in pennies. The scale (in digits) is 2 and the precision is 5. With a precision of 5 digits and a floating point, you could represent .00001 or 99999, but not 999.999, for example.
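To see the difference concretely, here is a small Swift sketch using Decimal, the value type bridged to NSDecimalNumber (note the construction from strings, so the values never pass through a binary double):

import Foundation

// Binary double: 0.1 has no exact representation, so error creeps in.
let d: Double = 0.1 + 0.2
print(d == 0.3)                       // false
print(String(format: "%.17f", d))     // 0.30000000000000004

// Decimal keeps a base-10 significand and exponent, so decimal
// fractions like 0.1 are represented exactly.
let x = Decimal(string: "0.1")! + Decimal(string: "0.2")!
print(x == Decimal(string: "0.3")!)   // true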