What does AP stand for in llvm::APFloat and llvm::APInt? - llvm-c++-api

https://llvm.org/doxygen/classllvm_1_1APFloat.html
https://llvm.org/doxygen/classllvm_1_1APInt.html
I'd appreciate it if you could answer the question.

The AP in APFloat stands for Arbitrary Precision, which means it can be used for calculations with very large precision. See also Arbitrary-precision arithmetic on Wikipedia.

Related

Numerical convergence and minimum number size

I have a program which calculates probability values (p-values), but it is entering a very large negative number into the exp function, exp(-626294.830), which evaluates to zero instead of the very small positive number that it should be. How can I get this to evaluate as a very small floating point number? I have tried Math::BigFloat, bignum, and bigrat, but all have failed.
Wolfram Alpha says that exp(-626294.830) is 4.08589×10^-271997... zero is a pretty close approximation to that ;-) Although you've edited and removed the context from your question, do you really need to work with such tiny numbers, or perhaps there is some way you could optimize your algorithm or scale your numbers?
Anyway, you are correct that code like Math::BigFloat->new("-626294.830")->bexp seems to take quite some time, even with the support of use Math::BigFloat lib => 'GMP';.
The only alternative I can offer at the moment is Math::Prime::Util::GMP's expreal, although you need to specify a precision to it.
use Math::Prime::Util::GMP qw/expreal/;
use Math::BigFloat;
my $e = Math::BigFloat->new(expreal(-626294.830,272000));
print $e->bnstr,"\n";
__END__
4.086e-271997
But on my machine, even that still takes ~20s to run, which brings us back to the question of potential optimization in other places.
Floating point numbers do not have infinite precision. Assuming the number is represented as an IEEE 754 double, we have 52 bits for the fraction, 11 bits for the exponent, and one bit for the sign. Due to the way exponents are encoded, the smallest positive normal number that can be represented is 2^-1022 (subnormal numbers extend this down to 2^-1074, at reduced precision).
If we look at your number e^-626294.830, we can do a change of base and see that it equals 2^(log_2 e · -626294.830) = 2^-903552.445, which is significantly smaller than 2^-1022. Approximating your number as zero is therefore correct.
Instead of calculating this value using arbitrary-precision numerics, you are likely better off solving the necessary equations by hand, then coding this in a way that does not require extreme precision. For example, it is unlikely that you need the exact value of e^-626294.830, but perhaps just the magnitude. Then, you can calculate the logarithm instead of using exp().
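As a rough illustration of the "calculate the logarithm instead" idea, here is a minimal Python sketch (not the asker's Perl). It never evaluates exp() directly; it converts the exponent to base 10 and splits it into a mantissa and a power of ten. The function name exp_as_sci is made up for this example, and the mantissa is only good to roughly ten significant digits because the input itself is a rounded double.

```python
import math

def exp_as_sci(x):
    """Return (mantissa, exponent) such that exp(x) ~= mantissa * 10**exponent.

    Works in log space, so it does not underflow for very negative x.
    """
    log10_value = x / math.log(10)      # log10(exp(x)) = x / ln(10)
    exponent = math.floor(log10_value)  # integer power of ten
    mantissa = 10.0 ** (log10_value - exponent)  # in [1, 10)
    return mantissa, exponent

mantissa, exponent = exp_as_sci(-626294.830)
print(f"{mantissa:.4f}e{exponent}")  # agrees with the 4.086e-271997 shown above
```

If only comparisons between such tiny probabilities are needed, keeping log10_value (or the natural log) and comparing those directly avoids the conversion altogether.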

Perl: Which floating point operations are lossless?

There are some great docs out there describing the nature of floating point numbers and why precision must necessarily be lost in some floating point operations. For example:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
I'm interested in understanding which floating point operations are lossless in terms of the precision of the underlying number. For instance,
$x = 2.0/3.0; # this is an operation which will lose some precision
But will this lose precision?
$x = 473.25 / 1000000;
or this?
$x = 0.3429 + 0.2939201;
What general rules of thumb can one use to know when precision will be lost and when it will be retained?
In general most operations will introduce some precision loss. The exceptions are values that are exactly representable in floating point (mantissa is expressible as a non-repeating binary). When applying arithmetic, both operands and the result would have to be expressible without precision loss.
You can construct arbitrary examples that have no precision loss, but the number of such examples is small compared with the domain. For example:
1.0f / 8.0f = 0.125f
Put another way, the mantissa must be expressible as A/B where B is a power of 2, and (as pointed out by @ysth) both A and B are smaller than some upper bound which is dictated by the total number of bits available for the mantissa in the representation.
I don't think you'll find any useful rules of thumb. Generally speaking, precision will not be lost if both the following are true:
Each operand has an exact representation in floating point
The (mathematical) result has an exact representation in floating point
In your second example, 473.25 / 1000000, each operand has an exact floating point representation, but the quotient does not. (Any rational number that has a factor of 5 in the denominator has a repeating expansion in base 2 and hence no exact representation in floating point.) The above rules are not particularly useful because you cannot tell ahead of time whether the result is going to be exact just by looking at the operands; you need to also know the result.
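The two conditions above can be checked after the fact with exact rational arithmetic. The following is a small Python sketch rather than Perl, assuming IEEE 754 doubles (which is what both languages normally use); division_is_exact is a name invented for this illustration. Fraction(float) recovers the exact value stored in the double, so comparing the rounded quotient with the exact quotient shows whether any precision was lost.

```python
from fractions import Fraction

def division_is_exact(a: float, b: float) -> bool:
    """True if the floating point quotient a / b equals the exact
    mathematical quotient of the two stored operands."""
    exact = Fraction(a) / Fraction(b)  # exact rational arithmetic
    return Fraction(a / b) == exact    # did a / b need rounding?

print(division_is_exact(1.0, 8.0))          # True: 0.125 is a dyadic fraction
print(division_is_exact(473.25, 1000000.0)) # False: denominator contains factors of 5
print(division_is_exact(2.0, 3.0))          # False: 2/3 repeats in binary
```

This confirms the point made above: the check needs the result, so it cannot serve as a rule of thumb that looks only at the operands.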
Your question makes a big assumption.
The issue isn't whether certain operations are lossless, it's whether storing numbers in floating point at all will be without loss. For example, take this number from your example:
$x = .3429;
sprintf "%.20f", $x;
Outputs:
0.34289999999999998000
So to answer your question, some operations might be lossless in specific cases. However, it depends on both the original numbers and the result, so should never be counted upon.
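The same storage effect can be made visible in Python, assuming the usual IEEE 754 doubles. This sketch uses the standard decimal module purely as an inspection tool: Decimal(float) converts the stored binary value exactly, while Decimal("...") keeps the decimal literal as written.

```python
from decimal import Decimal

stored = Decimal(0.3429)       # exact value of the double nearest to 0.3429
literal = Decimal("0.3429")    # the decimal number as written

print(stored)                  # a long expansion that is not exactly 0.3429
print(stored == literal)       # False: the literal was rounded on storage
print(Decimal(0.125) == Decimal("0.125"))  # True: 1/8 is exactly representable
```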
For more information, read perlnumber.

Matlab precision: simple subtraction is not zero

I compute this simple sum on Matlab:
2*0.04-0.5*0.4^2 = -1.387778780781446e-017
but the result is not zero. What can I do?
Aabaz and Jim Clay have good explanations of what's going on.
It's often the case that, rather than exactly calculating the value of 2*0.04 - 0.5*0.4^2, what you really want is to check whether 2*0.04 and 0.5*0.4^2 differ by an amount that is small enough to be within the relevant numerical precision. If that's the case, then rather than checking whether 2*0.04 - 0.5*0.4^2 == 0, you can check whether abs(2*0.04 - 0.5*0.4^2) < thresh. Here thresh can either be some arbitrary smallish number, or an expression involving eps, which gives the precision of the numerical type you're working with.
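A minimal sketch of that threshold comparison, written in Python rather than Matlab. Here sys.float_info.epsilon plays the role of Matlab's eps, and the factor of 4 and the function name nearly_equal are arbitrary choices for this illustration, not a recommended universal tolerance.

```python
import sys

def nearly_equal(a: float, b: float, factor: float = 4.0) -> bool:
    """Compare two doubles with a threshold scaled to their magnitude."""
    eps = sys.float_info.epsilon            # 2**-52 for IEEE 754 doubles
    thresh = factor * eps * max(abs(a), abs(b))
    return abs(a - b) <= thresh

lhs = 2 * 0.04
rhs = 0.5 * 0.4 * 0.4

print(lhs - rhs)               # tiny nonzero residue from rounding
print(nearly_equal(lhs, rhs))  # True: equal within the chosen tolerance
print(nearly_equal(0.08, 0.09))  # False: a real difference
```

Scaling the threshold by the size of the operands keeps the test meaningful for both very large and very small values; a fixed absolute threshold is simpler but only suits data of a known scale.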
EDIT:
Thanks to Jim and Tal for suggested improvement. Altered to compare the absolute value of the difference to a threshold, rather than the difference.
Matlab uses double-precision floating-point numbers to store real numbers. These are numbers of the form m*2^e where m is an integer between 2^52 and 2^53 (the mantissa) and e is the exponent. Let's call a number a floating-point number if it is of this form.
All numbers used in calculations must be floating-point numbers. Often, this can be done exactly, as with 2 and 0.5 in your expression. But for other numbers, most notably most numbers with digits after the decimal point, this is not possible, and an approximation has to be used. What happens in this case is that the number is rounded to the nearest floating-point number.
So, whenever you write something like 0.04 in Matlab, you're really saying "Get me the floating-point number that is closest to 0.04." In your expression, there are 2 numbers that need to be approximated: 0.04 and 0.4.
In addition, the exact result of operations like addition and multiplication on floating-point numbers may not be a floating-point number. Although it is always of the form m*2^e the mantissa may be too large. So you get an additional error from rounding the results of operations.
At the end of the day, a simple expression like yours will be off by about 2^-52 times the size of the operands, or about 10^-17.
In summary: the reason your expression does not evaluate to zero is two-fold:
Some of the numbers you start out with are approximations of, rather than equal to, the exact numbers you provided.
The intermediate results may also be approximations of the exact results.
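To make the m*2^e form concrete, here is a short Python sketch (Python floats are the same double-precision type Matlab uses). The helper name decompose is made up for this example, and it assumes a positive, normal input. It recovers the integer mantissa and exponent of a stored value and shows that 0.04 is stored as a nearby number rather than exactly.

```python
import math
from fractions import Fraction

def decompose(x: float):
    """Split a positive normal double into (m, e) with x == m * 2**e
    and 2**52 <= m < 2**53."""
    frac, exp = math.frexp(x)         # x == frac * 2**exp, 0.5 <= frac < 1
    mantissa = int(frac * 2**53)      # exact: frac has at most 53 significant bits
    return mantissa, exp - 53

m, e = decompose(0.04)
print(m, e)                            # integer mantissa and binary exponent
print(m * 2.0**e == 0.04)              # True: the decomposition is exact
print(Fraction(0.04) == Fraction(4, 100))  # False: 0.04 itself was approximated
```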
What you are seeing is quantization error. Matlab uses doubles to represent numbers, and while they are capable of a lot of precision, they still cannot represent all real numbers because there are an infinite number of real numbers. I'm not sure about Aabaz's trick, but in general I would say there isn't anything you can do, other than perhaps massaging your inputs to be double-friendly numbers.
I do not know if it is applicable to your problem but often the simplest solution is to scale your data.
For example:
a=0.04;
b=0.2;
a-0.2*b
ans=-6.9389e-018
c=a/min(abs([a b]));
d=b/min(abs([a b]));
c-0.2*d
ans=0
EDIT: of course I did not mean to give a universal solution to this kind of problem, but it is still a good practice that can help you avoid a few problems in numerical computation (curve fitting, etc.). See Jim Clay's answer for the reason why you are experiencing these problems.
I'm pretty sure this is a case of ye olde floating point accuracy issues.
Do you need 1e-17 accuracy? Is this merely a case of wanting 'pretty' output?
In that case, you can just use a formatted sprintf to display the number of significant digits you want.
Realize that this is not a matlab problem, but a fundamental limitation of how numbers are represented in binary.
For fun, work out what .1 is in binary...
Some references:
http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
http://www.mathworks.com/support/tech-notes/1100/1108.html

What's the most precise data type for floating point calculations on iPhone OS?

I always thought it's double, until I accidentally hit floo+ESC and it told me there is a floorl(<#long double #>) function. So long double is the solution to all big inaccuracy problems? ;-)
Or is there even something more precise than that?
One thing to be aware of is that long double simply acts as double on the iPhone hardware. You don't get any additional precision from the larger type. It will give you more precision in the Simulator, because you're running on a Mac there, so that can confuse you.
As is noted here (and by other commenters), NSDecimal or NSDecimalNumber is the way to go for precision (a mantissa of up to 38 decimal digits), and calculations performed using it are done with true decimal math, not binary floating point. This avoids many of the errors that you see with normal IEEE 754 math.
I think long double is the limit, but it really depends what you want to do. Have you actually been running into inaccuracy problems?
For even more precision look at the NSDecimalNumber class. As the other comment says: have you found any inaccuracy problems? Also, more accuracy will be slower.
Adding more precision is one approach to solving the problem, but the real problem sometimes (usually?) lies in the way you are performing the computation. In that case, I prescribe a healthy dose of RTFM. Any primer on FP arithmetic will cover why certain forms of equations are disastrous and how to avoid the gaping maw of oblivion.

How to make sure an NSDecimalNumber represents no fractional digits?

I want to do some fairly complex arithmetic that requires very high precision, i.e. calculating
10000000000 + 0.00000000001 = 10000000000.00000000001
10000000000.00000000001 * 3 = 30000000000.00000000003
I want to use NSDecimalNumber for this kind of math, but the problem is: How to feed it with these values?
The documentation says:
- (id)initWithMantissa:(unsigned long long)mantissa exponent:(short)exponent isNegative:(BOOL)flag
The first problem I see is the mantissa. It requires an unsigned long long. As I understand that data type, it is a floating point, right? So if it is, at this point the entered value is already "dirty". It may have unwanted fractional digits somewhere at the end of it. I couldn't find good documentation on "unsigned long long" from Apple, but I remember a code snippet where someone fed the mantissa with a CGFloat, so that's why I assume it's a floating-point type.
Well, if it is indeed some super floating point datatype, then the hard question is: How to get a clean, really clean integer into this thing? So clean that I could multiply it by half a trillion without getting wrong results?
Are there good tutorials on the usage of NSDecimalNumber in practice?
Edit: No problem here! Thanks everyone!
If you really are concerned about feeding in less precise types, I'd recommend using -initWithString:, -initWithString:locale:, +decimalNumberWithString:, or +decimalNumberWithString:locale:. Using the string description avoids ever having to convert the numerical representation to a floating point or other numerical type before generating your NSDecimalNumber.
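The benefit of constructing decimal values from strings can be sketched with Python's standard decimal module. This is only an analogy for the NSDecimalNumber string initializers, not the Cocoa API itself, and it relies on Python's default decimal context of 28 significant digits, which is enough for the values in the question.

```python
from decimal import Decimal

# Built from strings, so no binary floating point value is ever involved.
a = Decimal("10000000000") + Decimal("0.00000000001")
b = a * 3

print(a)  # 10000000000.00000000001
print(b)  # 30000000000.00000000003

# The same sum in binary doubles silently drops the small term.
print(10000000000 + 0.00000000001 == 10000000000.0)  # True
```

The last line shows why going through a double first would already have lost the fractional part before any decimal type could preserve it.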