Double precision in Ada?

I'm very new to Ada and was trying to see whether it offers a double-precision type. I see that we have Float, and
Put( Integer'Image( Float'digits ) );
on my machine gives a value of 6, which is not enough for numerical computations.
Does Ada have double and long double types as in C?
Thanks a lot...

It is a wee bit more complicated than that.
The only predefined floating-point type that compilers have to support is Float. Compilers may optionally support Short_Float and Long_Float. You should be able to look in Appendix F of your compiler documentation to see what it supports.
In practice, your compiler almost certainly defines Float as a 32-bit IEEE float and Long_Float as a 64-bit one. Note that C works much the same way with its float and double: the C standard doesn't actually define the size of those either.
If you absolutely must have a certain precision (e.g., you are sharing the data with something external that must use IEEE 64-bit), then you should probably define your own float type with exactly that precision. That would ensure your code is either portable to any platform or compiler you move it to, or that it will produce a compiler error so you can fix the issue.
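Since the answer compares Ada with C here, the same "portable or fail to compile" idea can be sketched on the C side. This is only an illustration, assuming a C11 compiler (for _Static_assert); the typedef name ieee_binary64 is made up for the example.

```c
#include <float.h>
#include <limits.h>

/* Fail the build, rather than misbehave at run time, if double does not
   have the shape of an IEEE 754 binary64 value on this platform. */
_Static_assert(sizeof(double) * CHAR_BIT == 64, "double is not 64 bits wide");
_Static_assert(DBL_MANT_DIG == 53, "double does not have a 53-bit significand");

/* Hypothetical alias used by code that shares data with external systems. */
typedef double ieee_binary64;
```

If either assertion fails on some platform, compilation stops with the given message, which is roughly what an unsupported "digits" request does in Ada.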

You can declare your own floating-point type with whatever decimal precision you need, up to the maximum your compiler supports. For a long float it would be:
type My_Long_Float is digits 11;
Wikibooks is a good reference for things like this.

Related

What are scenarios where you should use Float in Swift?

I'm learning about the difference between Float and Double in Swift. I can't think of any reason to use Float. I know there are some, and I know I am just not experienced enough to understand them.
So my question is: why would you use Float in Swift?
why would you use float in Swift
Left to your own devices, you likely never would. But there are situations where you have to. For example, the value of a UISlider is a Float. So when you retrieve that number, you are working with a Float. It’s not up to you.
And so with all the other numerical types. Swift includes a numerical type corresponding to every numerical type that you might possibly encounter as you interface with Cocoa and the outside world.
Float32 is a typealias for Float. Float32 and Float16 are incredibly useful for GPU programming with Metal. Both will someday feel as archaic on the GPU as they do on the CPU, but that day is years off.
https://developer.apple.com/metal/
Double represents a 64-bit floating-point number and has a precision of at least 15 decimal digits.
Float represents a 32-bit floating-point number. The precision of Float can be as little as 6 decimal digits.
The appropriate floating-point type to use depends on the nature and range of values you need to work with in your code. In situations where either type would be appropriate, Double is preferred.
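The gap between roughly 6 and roughly 15 decimal digits shows up with integers just above 2^24, the point where a 32-bit float can no longer represent every whole number. The sketch below is in C rather than Swift, on the assumption that both languages map their 32-bit and 64-bit types to IEEE 754 binary32 and binary64; the helper names are invented for illustration.

```c
#include <stdint.h>

/* Round-trip an integer through a 32-bit float.
   Assumes float is IEEE 754 binary32 (24-bit significand). */
int32_t round_trip_float(int32_t n) {
    volatile float f = (float)n;   /* volatile forces a real store at float precision */
    return (int32_t)f;
}

/* Same round trip through a 64-bit double (53-bit significand),
   which can hold every 32-bit integer exactly. */
int32_t round_trip_double(int32_t n) {
    volatile double d = (double)n;
    return (int32_t)d;
}
```

Under those assumptions, round_trip_float(16777217) comes back as 16777216, while round_trip_double(16777217) returns its input unchanged.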

Representing Double values in Kaitai

Some of the values I need to read in my ksy file are doubles, which I assume use a binary64 structure. The native data types for a float won't stretch that far. Has anyone managed to represent this data type in Kaitai?
"binary64" is a normal IEEE 754 double-precision float, occupying 64 bits = 8 bytes.
These are perfectly well supported by the vast majority of languages and, consequently, Kaitai Struct offers built-in support for them as type: f8 (float, 8 bytes long).
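To make concrete what parsing such a field amounts to, here is a minimal hand-written sketch in C of decoding 8 little-endian bytes into a double. It is not code generated by Kaitai Struct; the function name read_f8le is hypothetical, and it assumes the host's double is IEEE 754 binary64 stored with the same byte order as its 64-bit integers. In a ksy file the byte order comes from meta/endian or from an explicit f8le or f8be type.

```c
#include <stdint.h>
#include <string.h>

/* Decode 8 little-endian bytes as an IEEE 754 binary64 value.
   Assumption: double is binary64 and shares byte order with uint64_t. */
double read_f8le(const unsigned char bytes[8]) {
    uint64_t bits = 0;
    /* Assemble the integer from most significant byte (index 7) down. */
    for (int i = 7; i >= 0; --i) {
        bits = (bits << 8) | bytes[i];
    }
    double value;
    memcpy(&value, &bits, sizeof value);  /* reinterpret the bit pattern */
    return value;
}
```

For example, the bytes 00 00 00 00 00 00 F0 3F decode to 1.0 under these assumptions.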
If you're instead interested in larger floating-point values (binary128 or binary256, i.e. quad or octuple precision), there is no built-in support for them in KS, due to the lack of standard support for these types in most target languages. If you want something like that, the recommended way would be to implement it as an opaque type in a target language of your choice. That will likely require bringing in an external library that implements the type through some kind of software emulation, as hardware support seems to be almost non-existent in commodity CPUs (like Intel or ARM) as of 2020.
For more details on these, see issue #101.

What's the correct number type for financial variables in Swift?

I am used to programming in Java, where the BigDecimal type is the best choice for storing financial values, since it lets you specify rounding rules for calculations.
In the latest Swift version (2.1 at the time this post is written), which native type best supports correct calculations and rounding for financial values? Is there an equivalent to Java's BigDecimal, or anything similar?
You can use NSDecimal or NSDecimalNumber for high-precision base-10 numbers (the mantissa holds up to 38 decimal digits, so this is not truly arbitrary precision).
See NSDecimalNumber's reference page for more.
If you are concerned about storing, for example, $1.23 in a float or double and the inaccuracies that floating-point rounding would introduce, and you only ever deal in whole cents or pence (or whatever the smallest unit is), then use an integer to store the value, with the cent/penny as your unit instead of the dollar/pound. Integer amounts of cents are then represented exactly, and this is easier than using a class like NSDecimalNumber. How that value is displayed is purely a presentation issue.
If, however, you need to deal with fractions of a penny/cent, then NSDecimalNumber is probably what you want.
I recommend looking into how classes like this actually work, and how floating point numbers work too, because having an understanding of this will help you to see why precision errors arise and just what the precision limits are of a class like NSDecimalNumber, why it's better for storing decimal numbers, why floats are good at storing numbers like 17/262144 (i.e. where the denominator is a power of two) but can't store 1/100, etc.
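The integer-cents approach can be sketched in a few lines. The example is in C rather than Swift, and the function name format_cents is made up for illustration; the point is only that arithmetic happens on whole cents and the decimal point appears at display time.

```c
#include <stdio.h>

/* Format an amount held as whole cents (e.g. 123 -> "1.23").
   Returns what snprintf returns: the length the text needs. */
int format_cents(long long cents, char *out, size_t out_size) {
    const char *sign = cents < 0 ? "-" : "";
    /* Take the magnitude in unsigned arithmetic so LLONG_MIN is safe too. */
    unsigned long long magnitude =
        cents < 0 ? 0ULL - (unsigned long long)cents
                  : (unsigned long long)cents;
    return snprintf(out, out_size, "%s%llu.%02llu",
                    sign, magnitude / 100, magnitude % 100);
}
```

Adding 110 cents and 20 cents gives exactly 130, which formats as "1.30". No rounding is involved until a calculation such as interest or tax produces a fraction of a cent.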

Trying to understand how the compiler performs a cast/conversion, e.g., from float to int

When a float is cast to int, how does the compiler implement the cast?
Does the compiler mask off part of the float variable's memory? That is, which part of the memory does the compiler pick out to pass to the int variable?
I guess the answer lies in how int and float are laid out in memory.
But isn't that machine dependent rather than compiler dependent? How does the compiler decide which part of memory to copy when casting to a lower type (this is a static cast, right)?
I think I have been confused by some wrong information.
(I read some questions tagged downcasting, where there was a debate on whether this is a cast or a conversion. I am not very interested in what it is called, since both are performed by the compiler, but in how it is performed.)
...
Thanks
When talking about basic types and not pointers, then a conversion is done. Because floating point and integer representations are very different (usually IEEE-754 and two's complement respectively) it's more than just masking out some bits.
If you wanted to see the floating point number represented as an int without doing a conversion, you can do something like this (in C):
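float f = 10.5f;
int i2;
/* Copy the raw bits (needs <string.h>); assumes sizeof(int) == sizeof(float). */
memcpy(&i2, &f, sizeof i2);
printf("%f %d\n", f, i2);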
Most CPU architectures provide a native instruction (or multi-instruction sequence) to do float<->int conversions. The compiler will generally just generate this instruction. There are often faster methods. This question has some good information: What is the fastest way to convert float to int on x86.
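The difference between converting a value and reinterpreting its bits can be put side by side. This is a minimal sketch with invented function names, assuming float is IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: the fractional part is discarded (truncation toward
   zero). The compiler emits a float-to-int conversion, not a bit mask. */
int convert_value(float f) {
    return (int)f;
}

/* Bit reinterpretation: the same 32 bits viewed as an unsigned integer.
   Assumes float is IEEE 754 binary32 and is exactly 32 bits wide. */
uint32_t reinterpret_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For 10.5f, convert_value returns 10, whereas reinterpret_bits returns 0x41280000: sign 0, biased exponent 130, and the fraction bits of 1.3125.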

double_t in C99

I just read that C99 has double_t, which should be at least as wide as double. Does this imply that it gives more digits of precision after the decimal point, more than the usual 15 digits for double?
Secondly, how to use it: Is only including
#include <float.h>
enough? I read that one has to set the FLT_EVAL_METHOD to 2 for long double. How to do this? As I work with numerical methods, I would like maximum precision without using an arbitrary precision library.
Thanks a lot...
No. double_t is at least as wide as double; i.e., it might be the same as double. Footnote 190 in the C99 standard makes the intent clear:
The types float_t and double_t are intended to be the implementation’s most efficient types at least as wide as float and double, respectively.
As Michael Burr noted, you can't set FLT_EVAL_METHOD.
If you want the widest floating-point type on any system available using only C99, use long double. Just be aware that on some platforms it will be the same as double (and could even be the same as float).
Also, if you "work with numerical methods", you should be aware that for many (most even) numerical methods, the approximation error of the method is vastly larger than the rounding error of double precision, so there's often no benefit to using wider types. Exceptions exist, of course. What type of numerical methods are you working on, specifically?
Edit: seriously, either (a) just use long double and call it a day or (b) take a few weeks to learn about how floating-point is actually implemented on the platforms that you're targeting, and what the actual accuracy requirements are for the algorithms that you're implementing.
Note that you don't get to set FLT_EVAL_METHOD - it is set by the compiler's headers to let you determine how the library does certain things with floating point.
If your code is very sensitive to exactly how floating point operations are performed, you can use the value of that macro to conditionally compile code to handle those differences that might be important to you.
So for example, in general you know that double_t will be at least a double in all cases. If you want your code to do something different if double_t is a long double then your code can test if FLT_EVAL_METHOD == 2 and act accordingly.
Note that if FLT_EVAL_METHOD is something other than 0, 1, or 2 you'll need to look at the compiler's documentation to know exactly what type double_t is.
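A minimal sketch of that conditional-compilation idea follows; the function name describe_double_t is invented for the example. FLT_EVAL_METHOD comes from <float.h>, while double_t itself is declared in <math.h>.

```c
#include <float.h>
#include <math.h>

/* Report which type double_t is, based on the read-only
   FLT_EVAL_METHOD macro supplied by the implementation. */
const char *describe_double_t(void) {
#if FLT_EVAL_METHOD == 0
    return "double_t is double (float_t is float)";
#elif FLT_EVAL_METHOD == 1
    return "double_t is double (float_t is double as well)";
#elif FLT_EVAL_METHOD == 2
    return "double_t is long double (so is float_t)";
#else
    return "implementation-defined; see the compiler documentation";
#endif
}
```

Whatever branch is taken, sizeof(double_t) >= sizeof(double) holds, which is all the standard promises.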
double_t may be defined simply by typedef double double_t; but of course, if you plan to rely on implementation specifics, you need to look at your own implementation.