The Swift library includes the function bigEndian that can be used on integer types (such as Int, UInt, UInt8, UInt64, Int64, etc) to convert them from host order (which might presumably be anything, but realistically will be big or little endian) to network byte order (which is big endian). There're some good SO answers referring to this, and a particularly complete one is here.
However, I've not found a good resource that covers arranging a Float (32 bit) or Double (64 bit) type in to network byte order. Given that these types don't have a bigEndian method, I'm wondering if there is some subtlety involved? (The linked question does discuss floating point types, but I'm not sure it is definitely covering all details that might be relevant).
Specifically, I want to handle the 64 bit Double floating point type. I'd like a solution that will work on any platform where Swift is available.
Thank you.
Related
Some of the values I need to read in my ksy file are double's which I assume is a binary64 structure. The native data-types for a float won't stretch that far. Has anyone managed to represent this datatype in Kaitai ?
"binary64" is a normal IEEE 754 double-precision floats, occupying 64 bits = 8 bytes.
They're perfectly supported by vast majority of languages and, subsequently, Kaitai Struct offers built-in supports for them as type: f8 (float, 8 bytes long).
If you're rather interested in larger floating point values (binary128, binary256 — i.e. quad or octuple precision), there is no built-in support for them in KS due to lack of standard support for these types in most target languages. If you want something like that, the recommended way would be implementing one as opaque type in a target language of your choice. That will likely require you to bringing in some external library which implements this type using some kind of software emulation / complex arithmetics — as hardware support seems to be almost non-existent in commodity CPUs (like Intel or ARM) as of 2020.
For more details on these, see issue #101.
My English is not good so i apologize for it.
i experienced little about java and C++. But there is a problem. I only use integer for integer numbers and double for decimal numbers. There are many types like float, long int etc. Is there a specific way to decide what i must use?
It purely depends on the size of the data and of course the type of it. For example if you have a very large number that cannot fit within the size of a machine word (typically mapped to an int[eger] type) then you would choose long, and so forth.
For a small number I would go with char (since it occupies one byte in C/C++), or short if the number is greater than 255 but less than 65535, etc.
And all of these again depend on the programming language.
Be sure to check your programming language reference for the limits.
Hope that helps.
Different numerical data types are used for different value ranges. What range applies to what data type depends on the language you are using and the operating system, where the program is compiled/run.
For example, byte data type uses 1 byte of storage and can store numbers from 0 to 255. word data type usually uses 2 bytes of storage and can store numbers from 0 to 65,536. Then you get int - here the number of bytes vary, but often it would be 4 bytes with values of -2^31 to 2^31-1 - and so on. In C/C++ there also qualifiers signed and unsigned, which are not present in Java.
With float/double, not only the range of numbers, but also the precision (the number of decimal places that can be stored) will be one of the deciding factors. With double you can store a lot more decimal places than with single.
On the whole, the decision will be based on what data you need to store in it, how much memory you're willing to allocate and what platform you're running on. Check your language documentation for more details. For example, this page describes primitive data types in java.
You must check first for the type of data you want to store with the reference of data types provided in that programming language. Then very important you must check for the range of that data type...
I know that we need to convert decimal, octal, and hexadecimal into binary, but I am confused about conversion of decimal to octal or octal to hexadecimal or decimal to hexadecimal.
Why and where we need these types of conversion?
Different bases are good for different purposes.
Decimal is obviously what most people know how to deal with, so is good for output of real quantities to end users.
Hex is short and has an even ratio of exactly 2 characters per byte, so it's good for expressing large numbers like SHA1 hashes or private keys and the like in a type-able format, particularly since those numbers don't really represent a quantity, so users don't need to be able to understand them as numbers.
Octal is mostly for legacy reasons -- UNIX file permission codes are traditionally expressed as octal numbers, for example, because three bits per digit corresponds nicely to the three bits per user-category of the UNIX permission encoding scheme.
One sometimes will want to use numbers in one base for a purpose where another base is desired. Thus, the various conversion functions available. In truth, however, my experience is that in practice you almost never convert from one base to another much, except to convert numbers from some non-binary base into binary (in the form of your language of choice's native integral type) and back out into whatever base you need to output. Most of the time one goes from one non-binary base to another is when learning about bases and getting a feel for what numbers in different bases look like, or when debugging using hexadecimal output. Even then, if a computer does it the main method is to convert to binary and then back out, because current computers are just inherently good at dealing with base-2 numbers and not-so-good at anything else.
One important place you see numbers actually stored and operated on in decimal is in some financial applications or others where it's important that "number-of-decimal-place" level precision be preserved. Sometimes fixed-point arithmetic can work for currency, but not always, and if it doesn't using binary-floating-point is a bad idea. Older systems actually had built in support for this in the form of binary-coded-decimal arithmetic. In BCD, each 4 bits acts as a decimal digit, so you give up a chunk of every 4 bits of storage in exchange for maintaining your level of precision in the base-of-choice of the non-computing world.
Oddly enough, there is one common use case for other bases that's a bit hidden. Modern languages with large number support (e.g. Python 2.x's long type or Java's BigInteger and BigDecimal type) will usually store the numbers internally in an array with each element being a digit in some base. Then they implement the math they support on strings of digits of that base. Really efficient bigint implementations may actually use use a base approaching 2^(bits in machine native word size); a base 2^64 number is obviously impossible to usefully output in that form, but doing the calculations in chunks of that size ends up making the best use of space and the CPU. (I don't know if that's the best base; it may be best to use a base of half that number of bits to simplify overflow handling from one digit to the next. It's been awhile since I wrote my own bigint and I never implemented the faster/more-complicated versions of multiplication and division.)
MIME uses hexadecimal system for Quoted Printable encoding (e.g. mail subject in Unicode) abd 64-based system for Base64 encoding.
If your workplace is stuck in IPv4 CIDR - you'll be doing quite a lot of bin -> hex -> decimal conversions managing most of the networking equipment until you get them memorized (or just use some random, simple tool).
Even that usage is a bit few-and-far-between - most businesses just adopt the lazy "/24 everything" approach.
If you do a lot of graphics work - there's the chance you'll want to convert colors between systems and need to convert from hex -> dec... most tools have this built in to the color picker, though.
I suppose there's no practical reason to be able to do other than it's really simple and there's no point not learning how to do it. :)
... unless, for some reason, you're trying to do mantissa binary math in your head.
All of these bases have their uses. Hexadecimal in particular is useful as a shorthand for binary. Every hexadecimal digit is equivalent to 4 bits, so you can write a full 32-bit value as a string of 8 hex digits. Likewise, octal digits are equivalent to 3 bits, and are used frequently as a shorthand for things like Unix file permissions (777 = set read, write, execute bits for user/group/other).
No one base is special--they all have their (obscure) uses. Decimal is special to us because it reflects human experience (10 fingers) but that's really the only reason.
A real world use case: a program prints error code in decimal, to get info from a database or the internet you need the hexadecimal format, because the bits of the error 'number' convey extra info you need to look at it in binary.
I'm there are occasional uses for this. One use case would be a little app that allows user who wants to convert decimal to octal ... like you can with lots of calculators.
But I'm not sure I understand the point of the question. Standard libraries typically don't provide methods like String toOctal(String decimal). Instead, you would normally convert from a decimal String to a primitive integer and then from the primitive integer to (say) an octal String.
I just read that C99 has double_t which should be at least as wide as double. Does this imply that it gives more precision digits after the decimal place? More than the usual 15 digits for double?.
Secondly, how to use it: Is only including
#include <float.h>
enough? I read that one has to set the FLT_EVAL_METHOD to 2 for long double. How to do this? As I work with numerical methods, I would like maximum precision without using an arbitrary precision library.
Thanks a lot...
No. double_t is at least as wide as double; i.e., it might be the same as double. Footnote 190 in the C99 standard makes the intent clear:
The types float_t and double_t are
intended to be the implementation’s
most efficient types at least as wide
as float and double, respectively.
As Michael Burr noted, you can't set FLT_EVAL_METHOD.
If you want the widest floating-point type on any system available using only C99, use long double. Just be aware that on some platforms it will be the same as double (and could even be the same as float).
Also, if you "work with numerical methods", you should be aware that for many (most even) numerical methods, the approximation error of the method is vastly larger than the rounding error of double precision, so there's often no benefit to using wider types. Exceptions exist, of course. What type of numerical methods are you working on, specifically?
Edit: seriously, either (a) just use long double and call it a day or (b) take a few weeks to learn about how floating-point is actually implemented on the platforms that you're targeting, and what the actual accuracy requirements are for the algorithms that you're implementing.
Note that you don't get to set FLT_EVAL_METHOD - it is set by the compiler's headers to let you determine how the library does certain things with floating point.
If your code is very sensitive to exactly how floating point operations are performed, you can use the value of that macro to conditionally compile code to handle those differences that might be important to you.
So for example, in general you know that double_t will be at least a double in all cases. If you want your code to do something different if double_t is a long double then your code can test if FLT_EVAL_METHOD == 2 and act accordingly.
Note that if FLT_EVAL_METHOD is something other than 0, 1, or 2 you'll need to look at the compiler's documentation to know exactly what type double_t is.
double_t may be defined by typedef double double_t; — of course, if you plan to rely on implementation specifics, you need to look at your own implementation.
I'm very new to Ada and was trying to see if it offers double precision type. I see that we have float and
Put( Integer'Image( Float'digits ) );
on my machine gives a value of 6, which is not enough for numerical computations.
Does Ada has double and long double types as in C?
Thanks a lot...
It is a wee bit more complicated than that.
The only predefined floating-point type that compilers have to support is Float. Compilers may optionally support Short_Float and Long_Float. You should be able to look in appendex F of your compiler documentation to see what it supports.
In practice, your compiler almost certianly defines Float as a 32-bit IEEE float, and Long_Float as a 64-bit. Note that C pretty much works this way too with its float and double. C doesn't actually define the size of those.
If you absolutely must have a certian precision (eg: you are sharing the data with something external that must use IEEE 64-bit), then you should probably define your own float type with exactly that precision. That would ensure your code is either portable to any platform or compiler you move it to, or that it will produce a compiler error so you can fix the issue.
You can create any size Float you like. For a long it would be:
type My_Long_Float is digits 11;
Wiki Books is a good reference for things like this.