Why does the multiplicative inverse of 0 result in infinity? - swift

I am writing a program in Swift that takes the multiplicative inverse of random bytes. Sometimes, the byte is 0, and when the multiplicative inverse is taken, it results in inf.
The multiplicative inverse is being determined using
powf(Float(byte), -1.0)
byte is of type UInt8. If byte is equal to 0, the result is inf as mentioned earlier. How would the multiplicative inverse of 0 be infinity? Wouldn't the multiplicative inverse also be 0 since 0/0's multiplicative inverse is 0/0?

Short answer: By definition. In Swift (and many other languages), floating-point numbers follow the IEEE-754 standard, which in most cases is implemented directly by the underlying hardware and is thus quite fast. According to that standard, division of a nonzero float by 0 is defined to be Infinity, and Swift is merely returning that result to you. (To be precise, 0/0 is defined to be NaN, any positive number divided by 0 is defined to be Infinity, and any negative number divided by 0 is defined to be -Infinity.)
An interesting question to ask might be "why?" Why does IEEE-754 define division by 0 to be Infinity for floats, when one could reasonably expect the machine to throw an error, define it as NaN (not-a-number), or perhaps even return 0? For an analysis of this, you should really read the notes on this matter by Kahan, the designer of the semantics behind IEEE-754. Starting on page 10 of the linked document, he discusses why Infinity is the preferable result for division by zero. The argument essentially boils down to efficient implementation of numerical algorithms, since this convention allows expensive tests to be skipped in iterative numerical code. Go through the examples he discusses, which end at the top of page 14.
To sum up: Floating point division by 0 is defined to be Infinity by the IEEE-754 standard, and there are good reasons for making this choice. Of course, one can imagine different systems adopting a different answer as well, depending on their particular need or application area; but then they wouldn't be IEEE-754 compliant.
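The rules from the short answer can be written out as a small sketch. It is in Python purely for illustration, and Python itself raises ZeroDivisionError for float division by zero instead of returning the IEEE-754 default result, so the helper below spells the cases out by hand. The name ieee_divide is hypothetical and is not a Swift or library API.

```python
import math

def ieee_divide(a: float, b: float) -> float:
    """Divide a by b, returning the IEEE-754 default result when b is zero."""
    if b != 0.0:
        return a / b              # ordinary division (also covers a NaN divisor)
    if a == 0.0 or math.isnan(a):
        return math.nan           # 0/0 (and NaN/0) is NaN
    # Nonzero divided by zero: infinity, signed by the product of the operand signs.
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return math.copysign(math.inf, sign)

print(ieee_divide(1.0, 0.0))    # inf
print(ieee_divide(-1.0, 0.0))   # -inf
print(ieee_divide(0.0, 0.0))    # nan
```

With a zero byte, the multiplicative inverse is the first case: 1 divided by 0, which is Infinity.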

Plugging in 0 just means the value is 0 divided by some positive number (0/n). Its multiplicative inverse is then n/0, which is a division by 0. As you probably know, this is undefined in mathematics, but Swift still tries to calculate it. Essentially, it keeps subtracting 0 from the number but never reaches a result, so it outputs infinity.
Edit: As Alias pointed out, Swift is not actually going through that process of continually subtracting 0. It will just return infinity anytime it is supposed to divide by 0.

Related

What accounts for most of the integer multiply instructions?

The majority of integer multiplications don't actually need multiply:
Floating-point is, and has been since the 486, normally handled by dedicated hardware.
Multiplication by a constant, such as for scaling an array index by the size of the element, can be reduced to a left shift in the common case where it's a power of two, or a sequence of left shifts and additions in the general case.
Multiplications associated with accessing a 2D array, can often be strength reduced to addition if it's in the context of a loop.
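The two reductions just described can be sketched as follows. This is an illustration in Python rather than what any particular compiler emits, and the function names are made up for the example.

```python
def times_ten(x: int) -> int:
    # 10 = 8 + 2, so x * 10 == (x << 3) + (x << 1): two shifts and one add.
    return (x << 3) + (x << 1)

def row_offsets(rows: int, cols: int) -> list:
    # Strength-reduced form of [r * cols for r in range(rows)]:
    # the per-iteration multiply becomes a running addition.
    offsets = []
    offset = 0
    for _ in range(rows):
        offsets.append(offset)
        offset += cols
    return offsets

print(times_ten(7))        # 70
print(row_offsets(4, 5))   # [0, 5, 10, 15]
```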
So what's left?
Certain library functions like fwrite that take a number of elements and an element size as runtime parameters.
Exact decimal arithmetic e.g. Java's BigDecimal type.
Such forms of cryptography as require multiplication and are not handled by their own dedicated hardware.
Big integers e.g. for exploring number theory.
Other cases I'm not thinking of right now.
None of these jump out at me as wildly common, yet all modern CPU architectures include integer multiply instructions. (RISC-V omits them from the minimal version of the instruction set, but has been criticized for even going this far.)
Has anyone ever analyzed a representative sample of code, such as the SPEC benchmarks, to find out exactly what use case accounts for most of the actual uses of integer multiply (as measured by dynamic rather than static frequency)?

Numerical convergence and minimum number size

I have a program which calculates probability values (p-values), but it is entering a very large negative number into the exp function, exp(-626294.830), which evaluates to zero instead of the very small positive number that it should be.
How can I get this to evaluate as a very small floating point number?
I have tried Math::BigFloat, bignum, and bigrat, but all have failed.
Wolfram Alpha says that exp(-626294.830) is 4.08589×10^-271997... zero is a pretty close approximation to that ;-) Although you've edited and removed the context from your question, do you really need to work with such tiny numbers, or perhaps there is some way you could optimize your algorithm or scale your numbers?
Anyway, you are correct that code like Math::BigFloat->new("-626294.830")->bexp seems to take quite some time, even with the support of use Math::BigFloat lib => 'GMP';.
The only alternative I can offer at the moment is Math::Prime::Util::GMP's expreal, although you need to specify a precision to it.
use Math::Prime::Util::GMP qw/expreal/;
use Math::BigFloat;
my $e = Math::BigFloat->new(expreal(-626294.830,272000));
print $e->bnstr,"\n";
__END__
4.086e-271997
But on my machine, even that still takes ~20s to run, which brings us back to the question of potential optimization in other places.
Floating point numbers do not have infinite precision. Assuming the number is represented as an IEEE 754 double, we have 52 bits for the fraction, 11 bits for the exponent, and one bit for the sign. Due to the way exponents are encoded, the smallest positive normal number that can be represented is 2^-1022 (subnormal numbers extend this down to 2^-1074, at reduced precision).
If we look at your number e^-626294.830, we can do a change of base and see that it equals 2^(log_2 e · -626294.830) = 2^-903552.445, which is significantly smaller than 2^-1022. Approximating your number as zero is therefore correct.
Instead of calculating this value using arbitrary-precision numerics, you are likely better off solving the necessary equations by hand, then coding this in a way that does not require extreme precision. For example, it is unlikely that you need the exact value of e^-626294.830, but perhaps just the magnitude. Then, you can calculate the logarithm instead of using exp().
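As a rough sketch of that last suggestion (in Python rather than Perl, and assuming the program can keep the natural logarithm of the p-value, here called log_p), the change of base and the decimal magnitude can both be obtained without ever calling exp() on the full value:

```python
import math

log_p = -626294.830                 # natural log of the p-value (assumed available)

print(math.exp(log_p))              # 0.0, the result underflows

# Change of base: e**log_p == 2**log2_p, far below the range of a double.
log2_p = log_p / math.log(2)
print(round(log2_p, 3))             # -903552.445

# Decimal mantissa and exponent recovered from the logarithm alone.
log10_p = log_p / math.log(10)
exponent = math.floor(log10_p)
mantissa = 10 ** (log10_p - exponent)
print(f"{mantissa:.3f}e{exponent}") # 4.086e-271997
```

Only the fractional part of log10_p is ever exponentiated, so everything stays comfortably inside double range.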

Perl: Which floating point operations are lossless?

There are some great docs out there describing the nature of floating point numbers and why precision must necessarily be lost in some floating point operations. E.g.,:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
I'm interested in understanding which floating point operations are lossless in terms of the precision of the underlying number. For instance,
$x = 2.0/3.0; # this is an operation which will lose some precision
But will this lose precision?
$x = 473.25 / 1000000;
or this?
$x = 0.3429 + 0.2939201;
What general rules of thumb can one use to know when precision will be lost and when it will be retained?
In general most operations will introduce some precision loss. The exceptions are values that are exactly representable in floating point (mantissa is expressible as a non-repeating binary). When applying arithmetic, both operands and the result would have to be expressible without precision loss.
You can construct arbitrary examples that have no precision loss, but the number of such examples is small compared with the domain. For example:
1.0f / 8.0f = 0.125f
Put another way, the mantissa must be expressible as A/B where B is a power of 2, and (as pointed out by @ysth) both A and B are smaller than some upper bound which is dictated by the total number of bits available for the mantissa in the representation.
I don't think you'll find any useful rules of thumb. Generally speaking, precision will not be lost if both the following are true:
Each operand has an exact representation in floating point
The (mathematical) result has an exact representation in floating point
In your second example, 473.25 / 1000000, each operand has an exact floating point representation, but the quotient does not. (Any rational number that, in lowest terms, has a factor of 5 in the denominator has a repeating expansion in base 2 and hence no exact representation in floating point.) The above rules are not particularly useful because you cannot tell ahead of time whether the result is going to be exact just by looking at the operands; you need to also know the result.
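The condition on the result can at least be checked after the fact. The sketch below is in Python rather than Perl, on the assumption that both use IEEE 754 doubles, and the helper name is_exact is made up. It compares the rounded floating point result with the exact rational result computed from the stored operands:

```python
import operator
from fractions import Fraction

def is_exact(op, a: float, b: float) -> bool:
    """True if op(a, b) in floating point equals the exact rational result."""
    exact = op(Fraction(a), Fraction(b))   # Fraction(float) is the exact stored value
    return Fraction(op(a, b)) == exact

print(is_exact(operator.truediv, 1.0, 8.0))          # True
print(is_exact(operator.truediv, 2.0, 3.0))          # False
print(is_exact(operator.truediv, 473.25, 1000000.0)) # False
```

This only says whether the operation itself rounded. It does not say whether the operands were exact copies of the decimal literals in the source code.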
Your question makes a big assumption.
The issue isn't whether certain operations are lossless, it's whether storing numbers in floating point at all will be without loss. For example, take this number from your example:
$x = .3429;
sprintf "%.20f", $x;
Outputs:
0.34289999999999998000
So to answer your question, some operations might be lossless in specific cases. However, it depends on both the original numbers and the result, so should never be counted upon.
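The same point can be seen without relying on print formatting. As a small sketch in Python (assumed here to use the same IEEE 754 doubles as Perl), Decimal(float) converts the stored binary value exactly, so it can be compared with the decimal literal that was written:

```python
from decimal import Decimal

stored = Decimal(0.3429)       # exact value of the double nearest to 0.3429
written = Decimal("0.3429")    # the decimal literal as typed

print(stored)                  # a long expansion close to, but not equal to, 0.3429
print(stored == written)       # False
```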
For more information, read perlnumber.

Irrational number representation in computer

We can write a simple Rational Number class using two integers representing A/B with B != 0.
If we want to write an irrational number class (for storing and computing), the first thing that came to my mind is to use floating point, which means using the IEEE 754 standard (binary fractions). This is because an irrational number must be approximated.
Is there another way to write an irrational number class other than using binary fractions (whether or not it conserves memory space)?
I studied jsbeuno's solution using Python: Irrational number representation in any programming language?
He's still using the built-in floating point to store.
This is not homework.
Thank you for your time.
By a cardinality argument, there are many more irrational numbers than rational ones (and the number of IEEE 754 floating point numbers is finite, fewer than 2^64 for doubles).
You can represent numbers with something else than fractions (e.g. logarithmically).
jsbeuno is storing the number as a base and a radix and using those when doing calcs with other irrational numbers; he's only using the float representation for output.
If you want to get fancier, you can define the base and the radix as rational numbers (with two integers) as described above, or make them themselves irrational numbers.
To make something thoroughly useful, though, you'll end up replicating a symbolic math package.
You can always use symbolic math, where items are stored exactly as they are and calculations are deferred until they can be performed with precision above some threshold.
For example, say you performed two operations on a rational number like 2, one to take the square root and then one to square that. With limited precision, you may get something like:
(√2)²
= 1.414213562²
= 1.999999999
However, storing symbolic math would allow you to store the result of √2 as √2 rather than an approximation of it, then realise that (√x)² is equivalent to x, removing the possibility of error.
Now that obviously involves a more complicated encoding than simple IEEE 754, but it's not impossible to achieve.
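A very small sketch of that idea, in Python and covering only square roots of non-negative rationals, might look like the following. The class name Sqrt and its methods are invented for illustration, and a real symbolic package does far more.

```python
import math
from fractions import Fraction

class Sqrt:
    """Exact stand-in for sqrt(radicand), with radicand a non-negative rational."""

    def __init__(self, radicand):
        self.radicand = Fraction(radicand)
        if self.radicand < 0:
            raise ValueError("radicand must be non-negative")

    def squared(self) -> Fraction:
        # (sqrt(x))**2 == x, so no approximation is ever made.
        return self.radicand

    def __mul__(self, other: "Sqrt") -> "Sqrt":
        # sqrt(a) * sqrt(b) == sqrt(a * b), still stored exactly.
        return Sqrt(self.radicand * other.radicand)

    def __float__(self) -> float:
        # Approximate only when a numeric value is finally needed.
        return math.sqrt(self.radicand)

root_two = Sqrt(2)
print(root_two.squared() == 2)     # True
print(math.sqrt(2) ** 2 == 2.0)    # False
```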

Matlab precision: simple subtraction is not zero

I compute this simple sum on Matlab:
2*0.04-0.5*0.4^2 = -1.387778780781446e-017
but the result is not zero. What can I do?
Aabaz and Jim Clay have good explanations of what's going on.
It's often the case that, rather than exactly calculating the value of 2*0.04 - 0.5*0.4^2, what you really want is to check whether 2*0.04 and 0.5*0.4^2 differ by an amount that is small enough to be within the relevant numerical precision. If that's the case, then rather than checking whether 2*0.04 - 0.5*0.4^2 == 0, you can check whether abs(2*0.04 - 0.5*0.4^2) < thresh. Here thresh can either be some arbitrary smallish number, or an expression involving eps, which gives the precision of the numerical type you're working with.
EDIT:
Thanks to Jim and Tal for suggested improvement. Altered to compare the absolute value of the difference to a threshold, rather than the difference.
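Here is a sketch of the threshold check, written in Python instead of Matlab and assuming both use IEEE 754 doubles. The helper nearly_equal and its factor parameter are hypothetical, and sys.float_info.epsilon plays the role of Matlab's eps.

```python
import sys

def nearly_equal(a: float, b: float, factor: float = 4.0) -> bool:
    """True when a and b differ by at most `factor` epsilons, scaled to their size."""
    thresh = factor * sys.float_info.epsilon * max(abs(a), abs(b))
    return abs(a - b) <= thresh

lhs = 2 * 0.04
rhs = 0.5 * 0.4 ** 2

print(lhs == rhs)               # False
print(nearly_equal(lhs, rhs))   # True
```

Scaling the threshold by the size of the operands keeps the test meaningful for both very large and very small values. The factor of 4 is an arbitrary choice for the example.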
Matlab uses double-precision floating-point numbers to store real numbers. These are numbers of the form m*2^e where m is an integer between 2^52 and 2^53 (the mantissa) and e is the exponent. Let's call a number a floating-point number if it is of this form.
All numbers used in calculations must be floating-point numbers. Often, this can be done exactly, as with 2 and 0.5 in your expression. But for other numbers, most notably most numbers with digits after the decimal point, this is not possible, and an approximation has to be used. What happens in this case is that the number is rounded to the nearest floating-point number.
So, whenever you write something like 0.04 in Matlab, you're really saying "Get me the floating-point number that is closest to 0.04." In your expression, there are 2 numbers that need to be approximated: 0.04 and 0.4.
In addition, the exact result of operations like addition and multiplication on floating-point numbers may not be a floating-point number. Although it is always of the form m*2^e the mantissa may be too large. So you get an additional error from rounding the results of operations.
At the end of the day, a simple expression like yours will be off by about 2^-52 times the size of the operands, or about 10^-17.
In summary: the reason your expression does not evaluate to zero is two-fold:
Some of the numbers you start out with are approximations of, and so differ from, the exact numbers you provided.
The intermediate results may also be approximations of the exact results.
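The m*2^e form can be inspected directly. The following sketch uses Python rather than Matlab (both are assumed to use IEEE 754 doubles), and decompose is a made-up helper that is only valid for positive normal numbers.

```python
import math
from fractions import Fraction

def decompose(x: float):
    """Return (m, e) with x == m * 2**e and 2**52 <= m < 2**53."""
    mantissa, exponent = math.frexp(x)   # x == mantissa * 2**exponent, 0.5 <= mantissa < 1
    return int(mantissa * 2**53), exponent - 53

for literal in ("0.04", "0.4", "0.5", "2"):
    m, e = decompose(float(literal))
    stored = Fraction(m) * Fraction(2) ** e      # exact value actually stored
    print(literal, m, e, stored == Fraction(literal))
```

The last column is False for 0.04 and 0.4 and True for 0.5 and 2, which matches the distinction drawn above between numbers that must be approximated and numbers that are stored exactly.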
What you are seeing is quantization error. Matlab uses doubles to represent numbers, and while they are capable of a lot of precision, they still cannot represent all real numbers because there are an infinite number of real numbers. I'm not sure about Aabaz's trick, but in general I would say there isn't anything you can do, other than perhaps massaging your inputs to be double-friendly numbers.
I do not know if it is applicable to your problem but often the simplest solution is to scale your data.
For example:
a=0.04;
b=0.2;
a-0.2*b
ans=-6.9389e-018
c=a/min(abs([a b]));
d=b/min(abs([a b]));
c-0.2*d
ans=0
EDIT: of course I did not mean to give a universal solution to this kind of problem, but it is still a good practice that can help you avoid a few problems in numerical computation (curve fitting, etc.). See Jim Clay's answer for the reason why you are experiencing these problems.
I'm pretty sure this is a case of ye olde floating point accuracy issues.
Do you need 1e-17 accuracy? Is this merely a case of wanting 'pretty' output?
In that case, you can just use a formatted sprintf to display the number of significant digits you want.
Realize that this is not a matlab problem, but a fundamental limitation of how numbers are represented in binary.
For fun, work out what .1 is in binary...
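For anyone who would rather let the machine do the working out, here is a sketch in Python (binary_digits is a made-up helper) that expands a fraction between 0 and 1 in base 2 by repeated doubling:

```python
from fractions import Fraction

def binary_digits(value: Fraction, count: int) -> str:
    """First `count` binary digits after the point of a fraction in [0, 1)."""
    digits = []
    for _ in range(count):
        value *= 2
        bit = int(value)          # the integer part is the next binary digit
        digits.append(str(bit))
        value -= bit
    return "0." + "".join(digits)

print(binary_digits(Fraction(1, 10), 20))   # 0.00011001100110011001
```

The pattern 0011 repeats forever, so no finite number of binary digits holds 0.1 exactly.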
Some references:
http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
http://www.mathworks.com/support/tech-notes/1100/1108.html