Portability of auto kind/type conversions in numerical operations in Fortran

According to the Fortran standard, if the operands of a numeric operation have different data kind/types, then the resulting value has a kind/type determined by the operand with greater decimal precision. Before the operation is evaluated, the operand with the lower decimal precision is first converted to the higher-precision kind/type.
Now, using a high-precision kind/type implies accuracy to a certain number of significant digits, but kind/type conversion does not seem to guarantee that.¹ For this reason, I avoid mixing single- and double-precision reals.
But does this mean that automatic kind/type conversions should be avoided at all costs? For example, I would not hesitate to write x = y**2 where both x and y are reals (of the same kind), but the exponent is an integer.
Let us limit the scope of this question to the result of a single operation between two operands; we are not considering longer expressions with multiple operations, where other issues might creep in.
Let us also assume we are using a portable type/kind system. For example, in the code below selected_real_kind is used to define the kind assigned to double-precision real values.
Then, I have two questions regarding numerical expressions with type/kind conversions between two operands:
Is it "portable", in practice? Can we expect the same result for an operation that uses automatic type/kind conversion from different compilers?
Is it "accurate" (and "portable") if the lower-precision operands are limited to integers or whole-number reals? To make this clear, can we always assume that 0==0.0d0, 1==1.0d0, 2==2.0d0, ... , for all compilers? And if so, then can we always assume that simple expressions such as (1 - 0.1230d0) == (1.0d0 - 0.1230d0) are true, and therefore the conversion is both accurate and portable?
To provide a simple example, would automatic conversion from an integer to a double-precision real like shown in the code below be accurate and/or portable?
program main
   implicit none
   integer, parameter :: dp = selected_real_kind(p=15)
   print *, ((42 - 0.10_dp) == (42.0_dp - 0.10_dp))
end program
I have tested with gfortran and ifort, using different operands and operations, but have yet to see anything to cause concern as long as I limit the conversions to integers or whole-number reals. Am I missing anything here, or just revealing my non-CS background?
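As a cross-check outside Fortran: every integer of magnitude up to 2^53 is exactly representable in an IEEE binary64 value, so a conversion such as 42 to 42.0_dp introduces no error at all. A minimal Python sketch (Python floats are binary64, the same precision as the dp kind above), offered only as an illustration:

# Python floats are IEEE binary64, so integers up to 2**53 round-trip exactly.
for n in (42, 2**53, 2**53 + 1):
    exact = float(n) == n
    print(n, "exact" if exact else "not exact")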
¹According to these Intel Fortran docs (for example), an integer converted to a real type has its fractional part filled with zeros. When a single-precision real is converted to a higher-precision real, the additional low-order mantissa bits of the converted operand are simply set to zero. So, for example, when a single-precision operand with a non-zero fractional part (such as 1.2) is converted to a double, the conversion does not automatically increase the accuracy of the value: 1.2 does not become 1.2000000000000000d0 but instead something like 1.2000000476837158d0. How much this actually matters probably depends on the application.
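The widening described in the footnote is easy to reproduce. The Python sketch below (standard struct module only) rounds 1.2 to single precision and then widens the stored bits back to a double:

import struct

# Round 1.2 to IEEE binary32, then widen the stored bits back to a binary64.
single_as_double = struct.unpack("f", struct.pack("f", 1.2))[0]
print(repr(single_as_double))   # 1.2000000476837158, not 1.2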

Related

What accounts for most of the integer multiply instructions?

The majority of integer multiplications don't actually need multiply:
Floating-point is, and has been since the 486, normally handled by dedicated hardware.
Multiplication by a constant, such as scaling an array index by the size of the element, can be reduced to a left shift in the common case where the constant is a power of two, or to a short sequence of shifts and additions in the general case (see the sketch after this list).
Multiplications associated with accessing a 2D array can often be strength-reduced to additions when they occur inside a loop.
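As a rough sketch of the constant-multiply case, here is the kind of shift-and-add rewriting a compiler applies during instruction selection, expressed in Python purely for illustration (the function names are made up):

def multiply_by_8(x):
    # A power-of-two constant needs only a single left shift: x * 8 == x << 3.
    return x << 3

def multiply_by_10(x):
    # A general constant becomes a short shift-and-add sequence:
    # x * 10 == x * 8 + x * 2 == (x << 3) + (x << 1).
    return (x << 3) + (x << 1)

print(multiply_by_8(7), multiply_by_10(7))   # 56 70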
So what's left?
Certain library functions like fwrite that take a number of elements and an element size as runtime parameters.
Exact decimal arithmetic, e.g. Java's BigDecimal type.
Cryptographic algorithms that require multiplication and are not handled by their own dedicated hardware.
Big integers, e.g. for exploring number theory.
Other cases I'm not thinking of right now.
None of these jump out at me as wildly common, yet all modern CPU architectures include integer multiply instructions. (RISC-V omits them from the minimal version of the instruction set, but has been criticized for even going this far.)
Has anyone ever analyzed a representative sample of code, such as the SPEC benchmarks, to find out exactly what use case accounts for most of the actual uses of integer multiply (as measured by dynamic rather than static frequency)?

Rationale for CBOR negative integers

I am confused as to why CBOR chooses to encode negative integers as unsigned binary numbers with the value defined as -1 minus the unsigned value, instead of e.g. regular two's complement representation. Is there an obvious advantage that I'm missing, apart from increased negative range (which, IMO, is of questionable value weighed against increased complexity)?
Advantages:
There's only one allowed encoding type for each integer value, so all encoders will emit consistent output. If the encoders use the shortest encoding for each value as recommended by the spec, they'll emit identical output.
Picking the shortest numeric field is easier for non-negative numbers than for signed negative numbers, and CBOR aims to let tiny IoT devices transmit data readily.
It fits twice as many values into each integer encoding field width, thus making the data more compact. (It'd be yet more compact if the integer encodings didn't overlap, but that'd be notably more complicated.)
It can handle twice as large a negative value before needing the bignum extension.
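To make the -1-minus-n rule concrete, here is a minimal Python sketch of how small integers are encoded (major type 0 for non-negative values, major type 1 for negative ones); it handles arguments only up to 16 bits and is not a complete encoder:

def encode_cbor_int(n):
    # Major type 0 (0x00) for n >= 0, major type 1 (0x20) for n < 0,
    # with the encoded argument for a negative n being -1 - n.
    major, arg = (0x00, n) if n >= 0 else (0x20, -1 - n)
    if arg < 24:
        return bytes([major | arg])                          # argument fits in the initial byte
    if arg < 256:
        return bytes([major | 24, arg])                      # 1-byte argument
    if arg < 65536:
        return bytes([major | 25]) + arg.to_bytes(2, "big")  # 2-byte argument
    raise ValueError("out of range for this sketch")

print(encode_cbor_int(10).hex())    # 0a
print(encode_cbor_int(-10).hex())   # 29     (major type 1, argument 9)
print(encode_cbor_int(-500).hex())  # 3901f3 (major type 1, argument 499)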

Is it possible to predict when Perl's decimal/float math will be wrong? [duplicate]

In one respect, I understand that Perl's floats are inexact binary representations, which causes Perl's math to sometimes be wrong. What I don't understand is why these floats sometimes seem to give exact answers and other times not. Is it possible to predict when Perl's float math will give the wrong (i.e. inexact) answer?
For instance, in the code below, Perl's math is wrong 1 time when the subtraction is "16.12 - 15.13", wrong 2 times when the problem is "26.12 - 25.13", and wrong 20 times when the problem is "36.12 - 35.13". Furthermore, for some reason, in all of the above mentioned test cases, the result of our subtraction problem (i.e. $subtraction_problem) starts out as being wrong, but tends to become more correct the more we add to or subtract from it (with $x). This makes no sense: why is it that the more we add to or subtract from our arithmetic problem, the more likely it becomes that the value is correct (i.e. exact)?
my $subtraction_problem = 16.12 - 15.13;   # nominally 0.99
my $perl_math_failures = 0;
for (my $x = -25; $x < 25; $x++) {
    my $result = $subtraction_problem + $x;
    print "$result\n";
    # Count a "failure" when the printed value is longer than 6 characters,
    # a rough proxy for an inexact result.
    $perl_math_failures++ if length $result > 6;
}
print "There were $perl_math_failures perl math failures!\n";
None of this is Perl specific. See Goldberg:
Rounding Error
Squeezing infinitely many real numbers into a finite number of bits requires an approximate representation. Although there are infinitely many integers, in most programs the result of integer computations can be stored in 32 bits. In contrast, given any fixed number of bits, most calculations with real numbers will produce quantities that cannot be exactly represented using that many bits. Therefore the result of a floating-point calculation must often be rounded in order to fit back into its finite representation. This rounding error is the characteristic feature of floating-point computation. The section Relative Error and Ulps describes how it is measured.
Since most floating-point calculations have rounding error anyway, does it matter if the basic arithmetic operations introduce a little bit more rounding error than necessary? That question is a main theme throughout this section. The section Guard Digits discusses guard digits, a means of reducing the error when subtracting two nearby numbers. Guard digits were considered sufficiently important by IBM that in 1968 it added a guard digit to the double precision format in the System/360 architecture (single precision already had a guard digit), and retrofitted all existing machines in the field. Two examples are given to illustrate the utility of guard digits.
The IEEE standard goes further than just requiring the use of a guard digit. It gives an algorithm for addition, subtraction, multiplication, division and square root, and requires that implementations produce the same result as that algorithm. Thus, when a program is moved from one machine to another, the results of the basic operations will be the same in every bit if both machines support the IEEE standard. This greatly simplifies the porting of programs. Other uses of this precise specification are given in Exactly Rounded Operations.
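For the specific numbers in the question, the stored values can be inspected directly. A small Python sketch (Python floats are the same IEEE binary64 values a stock Perl build uses) showing that neither operand, nor their difference, is stored exactly:

from decimal import Decimal

# Decimal(float) shows the exact binary64 value that was actually stored.
print(Decimal(16.12) == Decimal("16.12"))   # False: 16.12 is not representable
print(Decimal(15.13) == Decimal("15.13"))   # False: neither is 15.13
print((16.12 - 15.13) == 0.99)              # False: the subtraction inherits the error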

Perl: Which floating point operations are lossless?

There are some great docs out there describing the nature of floating-point numbers and why precision must necessarily be lost in some floating-point operations, e.g.:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
I'm interested in understanding which floating point operations are lossless in terms of the precision of the underlying number. For instance,
$x = 2.0/3.0; # this is an operation which will lose some precision
But will this lose precision?
$x = 473.25 / 1000000;
or this?
$x = 0.3429 + 0.2939201;
What general rules of thumb can one use to know when precision will be lost and when it will be retained?
In general, most operations will introduce some precision loss. The exceptions are values that are exactly representable in floating point (the value has a terminating binary expansion that fits in the available mantissa bits). For an arithmetic operation to be lossless, both operands and the exact result have to be representable without precision loss.
You can construct arbitrary examples that have no precision loss, but the number of such examples is small compared with the domain. For example:
1.0f / 8.0f = 0.125f
Put another way, the value must be expressible as A/B where B is a power of 2, and (as pointed out by ysth) both A and B are smaller than an upper bound dictated by the number of bits available for the mantissa in the representation.
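That A/B form can be inspected directly: Python's float.as_integer_ratio (Python floats are the same binary64 values) returns exactly that fraction. A short illustrative sketch:

# Exactly representable values have small power-of-two denominators;
# values that had to be rounded do not.
print((0.125).as_integer_ratio())    # (1, 8)
print((473.25).as_integer_ratio())   # (1893, 4)
print((0.3429).as_integer_ratio())   # a huge numerator over a huge power of two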
I don't think you'll find any useful rules of thumb. Generally speaking, precision will not be lost if both the following are true:
Each operand has an exact representation in floating point
The (mathematical) result has an exact representation in floating point
In your second example, 473.25 / 1000000, each operand has an exact floating-point representation, but the quotient does not. (Any rational number whose denominator, in lowest terms, has a factor of 5 has a repeating expansion in base 2 and hence no exact floating-point representation.) The above rules are not particularly useful because you cannot tell ahead of time whether the result is going to be exact just by looking at the operands; you need to also know the result.
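One way to tell, after the fact, whether a particular operation was exact is to redo it in exact rational arithmetic and compare. A small sketch using Python's fractions module (the helper name is made up for illustration):

from fractions import Fraction

def division_is_exact(x, y):
    # Fraction(float) captures the exact stored value, so comparing the float
    # quotient with the exact rational quotient reveals any rounding.
    return Fraction(x / y) == Fraction(x) / Fraction(y)

print(division_is_exact(1.0, 8.0))           # True:  0.125 is exactly representable
print(division_is_exact(473.25, 1000000.0))  # False: the true quotient is not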
Your question makes a big assumption.
The issue isn't whether certain operations are lossless, it's whether storing numbers in floating point at all will be without loss. For example, take this number from your example:
my $x = 0.3429;
printf "%.20f\n", $x;
Outputs:
0.34289999999999998000
So to answer your question, some operations might be lossless in specific cases. However, it depends on both the original numbers and the result, so should never be counted upon.
For more information, read perlnumber.

Irrational number representation in computer

We can write a simple Rational Number class using two integers representing A/B with B != 0.
If we want to represent an irrational number class (storing and computing), the first thing that comes to mind is to use floating point, which means using the IEEE 754 standard (binary fractions). This is because irrational numbers must be approximated.
Is there another way to write an irrational number class other than using binary fractions (whether or not it conserves memory space)?
I studied jsbeuno's solution using Python: Irrational number representation in any programming language?
He's still using the built-in floating point to store.
This is not homework.
Thank you for your time.
By a cardinality argument, there are far more irrational numbers than rational ones (and the number of IEEE 754 floating-point values is finite, fewer than 2^64 for a 64-bit format).
You can represent numbers with something other than fractions (e.g. logarithmically).
jsbeuno is storing the number as a base and a radix and using those when doing calcs with other irrational numbers; he's only using the float representation for output.
If you want to get fancier, you can define the base and the radix as rational numbers (with two integers) as described above, or make them themselves irrational numbers.
To make something thoroughly useful, though, you'll end up replicating a symbolic math package.
You can always use symbolic math, where items are stored exactly as they are and calculations are deferred until they can be performed with precision above some threshold.
For example, say you performed two operations on a non-irrational number like 2, one to take the square root and then one to square that. With limited precision, you may get something like:
(√2)²
= 1.414213562²
= 1.999999999
However, storing symbolic math would allow you to store the result of √2 as √2 rather than an approximation of it, then realise that (√x)² is equivalent to x, removing the possibility of error.
Now that obviously involves a more complicated encoding than simple IEEE 754, but it's not impossible to achieve.
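As an illustration of that deferred, symbolic approach, here is a short sketch using the third-party SymPy library (assuming it is installed); the point is only that √2 stays √2 until digits are explicitly requested:

import sympy

root = sympy.sqrt(2)                  # kept symbolically, not as 1.414213562...
print(root**2)                        # 2 -- (sqrt(x))**2 simplifies exactly
print(sympy.simplify(root**2 - 2))    # 0
print(root.evalf(30))                 # digits are produced only when asked for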