Maximum allowed digits after decimal point for double

Hi,
I want to know how many digits are allowed after the decimal
point for the primitive double data type in Java, without the value
actually getting rounded off.

It depends on the number; read up on floating point representations for more information.
Use the BigDecimal class for fixed-scale arithmetic.

Taken literally, the number is 0. Most decimal fractions with at least one digit after the decimal point get rounded on conversion to double. For example, the decimal fraction 0.1 is rounded to 0.1000000000000000055511151231257827021181583404541015625 on conversion to double.
The problem is that double is a binary floating point system, and most decimal fractions can no more be exactly represented in binary than 1/3 can be exactly represented as a decimal fraction with a finite number of significant digits.
As already recommended, if truly exact representation of decimal fractions is important, use BigDecimal.
The primitive double has, for normal numbers, 53 significant bits, about 15.9 decimal digits. If very, very close is good enough, you can use double.
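A short Java sketch of both points: the BigDecimal double constructor exposes the exact value a double actually stores, while the String constructor gives a truly exact decimal.

import java.math.BigDecimal;

public class DoubleDigits {
    public static void main(String[] args) {
        // The double constructor preserves the exact binary value of 0.1:
        System.out.println(new BigDecimal(0.1));
        // prints 0.1000000000000000055511151231257827021181583404541015625

        // The String constructor gives an exact decimal, so sums stay exact:
        BigDecimal tenth = new BigDecimal("0.1");
        System.out.println(tenth.add(tenth).add(tenth)); // exactly 0.3
    }
}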

Related

Postgres floating point math - do I need to do anything special?

I am new to PG and I'm wondering if I need to 'do anything' extra to properly handle floating-point math.
For example, in ruby you use BigDecimal, and in Elixir you use Decimal.
Is what I have below the best solution for PG?
SELECT
COALESCE(SUM(active_service_fees.service_fee * (1::decimal - active_service_fees.withdraw_percentage_discount)), 0)
FROM active_service_fees
Data types:
service_fee integer NOT NULL
withdraw_percentage_discount numeric(3,2) DEFAULT 0.0 NOT NULL
It depends on what you want.
If you want floating point numbers you need to use the data types real or double precision, depending on your precision requirements.
These floating point numbers need a fixed space (4 or 8 bytes), are stored in binary representation and have limited precision.
If you want arbitrary precision, you can use the binary coded decimal type numeric (decimal is a synonym for it).
Such values are stored as decimal digits, and the amount of storage required depends on the number of digits.
The big advantage of floating point numbers is performance: floating point arithmetic is implemented in hardware in the processor, while arithmetic on binary coded decimals is implemented in software inside PostgreSQL.
A rule of thumb would be:
If you need values that are exact up to a certain number of decimal places (like monetary data) and you don't need to do a lot of calculations, use decimal.
If you need to do number crunching and you don't need values rounded to a fixed precision, use double precision.
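For comparison, the same trade-off is visible in Java, where double plays the role of double precision and BigDecimal the role of numeric; this is a rough analogy in Java, not PostgreSQL itself:

import java.math.BigDecimal;

public class ExactVsBinary {
    public static void main(String[] args) {
        // Binary floating point: fast, hardware-backed, approximate.
        System.out.println(0.1 + 0.2);        // 0.30000000000000004

        // Arbitrary-precision decimal: exact, but computed in software.
        BigDecimal a = new BigDecimal("0.1");
        BigDecimal b = new BigDecimal("0.2");
        System.out.println(a.add(b));         // 0.3
    }
}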

round to significant number of digits in Postgres

Is it possible to round a double precision number to a given number of significant digits in postgres?
I would like to reproduce the R function signif as closely as possible. In particular, I would like the result to conform to the IEEE 754 floating point standard for the format and rounding rules.
It looks like it may be possible using to_char: http://www.postgresql.org/docs/9.4/static/functions-formatting.html
Here is the logic for R's signif: https://github.com/wch/r-source/blob/b156e3a711967f58131e23c1b1dc1ea90e2f0c43/src/nmath/fprec.c
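As a hedged sketch of what signif-style rounding looks like, here it is in Java using BigDecimal with a MathContext. The signif name is mine, and the choice of HALF_EVEN (IEEE 754's default rounding) is an assumption; edge cases may differ from R's implementation.

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class Signif {
    // Round x to n significant digits, in the spirit of R's signif().
    static double signif(double x, int n) {
        if (x == 0.0 || Double.isNaN(x) || Double.isInfinite(x)) return x;
        return new BigDecimal(x)
                .round(new MathContext(n, RoundingMode.HALF_EVEN))
                .doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(signif(123.456, 2));   // 120.0
        System.out.println(signif(0.0012345, 3)); // 0.00123
    }
}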

Meaning of Precision Vs. Range of Double Types

To begin with, allow me to confess that I'm an experienced programmer with over 10 years of programming experience. However, the question I'm asking here is one that has bugged me ever since I first picked up a book on C, about a decade back.
Below is an excerpt from a book on Python, explaining about Python Floating type.
Floating-point numbers are represented using the native
double-precision (64-bit) representation of floating-point numbers on
the machine. Normally this is IEEE 754, which provides approximately
17 digits of precision and an exponent in the range of –308 to
308. This is the same as the double type in C.
What I never understood is the meaning of the phrase
" ... which provides approximately 17 digits of precision and an
exponent in the range of –308 to 308 ... "
My intuition goes astray here: I understand the meaning of precision, but how can range be different from that? I mean, if a floating point number can represent a value of up to 17 digits (i.e. a max of 10^17 - 1), then how can the exponent be +308? Wouldn't that make a 308-digit number if the base is 10, or a roughly 100-digit number if the base is 2?
I hope, I'm able to express my confusion.
Regards
Vaid, Abhishek
Suppose that we write 1500 with two digits of precision. That means we are being precise enough to distinguish 1500 from 1600 and 1400, but not precise enough to distinguish 1500 from 1510 or 1490. Telling those numbers apart would require three digits of precision.
Even though I wrote four digits, floating-point representation doesn't necessarily contain all these digits. 1500 is 1.5 * 10^3. In decimal floating-point representation, with two digits of precision, only the first two digits of the number and the exponent would be stored, which I will write (1.5, 3).
Why is there a distinction between the "real" digits and the placeholder zeros? Because it tells us how precisely we can represent numbers, that is, what fraction of their value is lost to approximation. We can distinguish 1500 = (1.5, 3) from 1500+100 = (1.6, 3). But if we increase the exponent, we can't distinguish 15000 = (1.5, 4) from 15000+100, which would need a third digit: (1.51, 4). At worst, with two decimal digits of precision, the representable numbers are spaced about 10% apart, so any value can be approximated to within about +/- 5%. This is true no matter how small or large the exponent is allowed to be.
The regular decimal representation of numbers obscures the issue. If instead one considers them in normalised scientific notation, separating the mantissa and exponent, the distinction is immediate. Normalisation is achieved by scaling the mantissa into a fixed interval (conventionally between 0.1 and 1.0, or between 1 and 10) and adjusting the exponent to compensate, so the value is unchanged.
The precision of the number is the number of digits in the mantissa. A floating point number has a limited number of bits to represent this portion of the value. It determines how accurately numbers that are similar in size can be distinguished.
The range determines the allowable values of the exponent. In your example, the range of -308 through 308 is stored independently of the mantissa and is limited by the number of bits in the floating point number allocated to the exponent.
These two values can be varied independently to suit. For example, many graphics pipelines use much smaller 16-bit floating point formats, trading both precision and range for space.
Numerical methods libraries expend large amounts of effort in ensuring that these limits are not exceeded to maintain the correctness of calculations. Casual use will not usually encounter these limits.
The choices in IEEE 754 are accepted as a reasonable trade-off between precision and range. The 32-bit single format has similar, but smaller, limits. The question What range of numbers can be represented in a 16-, 32- and 64-bit IEEE-754 systems? provides a longer summary and further links for more study.
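A short Java illustration of the two independent limits: precision runs out at around 16 significant digits long before the exponent range does.

public class PrecisionVsRange {
    public static void main(String[] args) {
        // Precision: roughly 15-17 significant decimal digits.
        double big = 1e16;
        System.out.println(big + 1 == big);   // true: the +1 is below the last digit

        // Range: the exponent reaches about 10^308, independent of precision.
        System.out.println(1e308);            // 1.0E308, still representable
        System.out.println(1e308 * 10);       // Infinity, range exceeded
    }
}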
From Wikipedia, double precision floating point numbers have 15 to 17 significant decimal digits of precision, and the binary exponent can be in the range -1022 to 1023.
Since 2^1023 is about 10^308, the book's -308 to 308 is that same range expressed as a power of ten, not an error.

Fixed point vs Floating point number

I just can't understand fixed point and floating point numbers, due to the hard-to-read definitions of them all over Google, none of which provide a simple enough explanation of what they really are. Can I get a plain definition with an example?
A fixed point number has a specific number of bits (or digits) reserved for the integer part (the part to the left of the decimal point) and a specific number of bits reserved for the fractional part (the part to the right of the decimal point). No matter how large or small your number is, it will always use the same number of bits for each portion. For example, if your fixed point format was in decimal IIIII.FFFFF then the largest number you could represent would be 99999.99999 and the smallest non-zero number would be 00000.00001. Every bit of code that processes such numbers has to have built-in knowledge of where the decimal point is.
A floating point number does not reserve a specific number of bits for the integer part or the fractional part. Instead it reserves a certain number of bits for the number (called the mantissa or significand) and a certain number of bits to say where within that number the decimal place sits (called the exponent). So a floating point number that took up 10 digits with 2 digits reserved for the exponent might represent a largest value of 9.9999999e+50 and a smallest non-zero value of 0.0000001e-49.
A fixed point number just means that there are a fixed number of digits after the decimal point. A floating point number allows for a varying number of digits after the decimal point.
For example, if you have a way of storing numbers that requires exactly four digits after the decimal point, then it is fixed point. Without that restriction it is floating point.
Often, when fixed point is used, the programmer actually uses an integer and then makes the assumption that some of the digits are beyond the decimal point. For example, I might want to keep two digits of precision, so a value of 100 actually means 1.00, 101 means 1.01, 12345 means 123.45, etc.
Floating point numbers are more general purpose because they can represent very small or very large numbers in the same way, but there is a small penalty in having to have extra storage for where the decimal place goes.
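A minimal Java sketch of that convention, with two implied decimal digits:

public class TwoDigitFixed {
    public static void main(String[] args) {
        int a = 12345;               // means 123.45
        int b = 100;                 // means 1.00
        int sum = a + b;             // 12445, i.e. 124.45: plain integer addition
        System.out.printf("%d.%02d%n", sum / 100, sum % 100); // 124.45
    }
}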
From my understanding, fixed-point arithmetic is done using integers, where the decimal part is stored in a fixed number of bits, or the number is multiplied by a power of ten matching how many digits of decimal precision are needed.
For example, if the number 12.34 needs to be stored and we only need two digits of precision after the decimal point, the number is multiplied by 100 to get 1234. All math on the number follows that same convention: adding 5620, which represents 56.20, yields 6854 in the data, or 68.54.
If we want to extract the decimal part of a fixed-point number, we use the modulo (%) operator.
12.34 in Java, with a scale factor of 100:
int fixed = 1234;
int whole = fixed / 100;                     // 12: the whole number
int frac  = fixed % 100;                     // 34: the decimal part (100ths of a whole)
System.out.printf("%d.%02d%n", whole, frac); // "12.34" (%02d pads, so 12.05 doesn't become 12.5)
Floating point numbers are a completely different story in programming. The current IEEE 754 single-precision format uses 23 bits for the fraction of the number, 8 bits for the exponent, and 1 bit for the sign. See this Wikipedia link for more information on this.
The term ‘fixed point’ refers to the manner in which the numbers are represented, with a fixed number of digits after, and sometimes before, the decimal point.
With floating-point representation, the placement of the decimal point can ‘float’ relative to the significant digits of the number.
For example, a fixed-point representation with a uniform decimal point placement convention can represent the numbers 123.45, 1234.56, 12345.67, etc, whereas a floating-point representation could in addition represent 1.234567, 123456.7, 0.00001234567, 1234567000000000, etc.
The other answers explain what a fixed-point number is and how it is represented, but say very little about what I consider the defining feature. The key difference is that floating-point numbers have a roughly constant relative (percent) error caused by rounding or truncating, while fixed-point numbers have a constant absolute error.
With 64-bit floats, you can be sure that the answer to x+y will never be off by more than one bit, but how big is a bit? Well, it depends on x and y: treating the mantissa as an integer scaled by 2^exponent, if the exponent is 10, then rounding off the last bit represents an error of 2^10 = 1024, but if the exponent is 0, then rounding off a bit is an error of 2^0 = 1.
With fixed point numbers, a bit always represents the same amount. For example, if we have 32 bits before the decimal point and 32 after, truncation errors will change the answer by 2^-32 at most. This is great if you're working with numbers that are all roughly equal to 1, which gains you a lot of precision, but bad if you're working at wildly different scales: who cares if you calculate a distance of a googol meters and end up with an error of 2^-32 meters?
In general, floating-point lets you represent much larger numbers, but the cost is higher (absolute) error for medium-sized numbers. Fixed points get better accuracy if you know how big of a number you'll have to represent ahead of time, so that you can put the decimal exactly where you want it for maximum accuracy. But if you don't know what units you're working with, floats are a better choice, because they represent a wide range with an accuracy that's good enough.
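Java's Math.ulp makes the contrast concrete: the floating point step grows with magnitude, while a fixed point step never changes (a small sketch, with the two-decimal-digit scaling as an assumption):

public class ErrorKinds {
    public static void main(String[] args) {
        // Floating point: the gap to the next representable value (the ulp)
        // grows with the magnitude, giving roughly constant relative error.
        System.out.println(Math.ulp(1.0));      // 2.220446049250313E-16
        System.out.println(Math.ulp(1024.0));   // 2.2737367544323206E-13

        // Fixed point with two implied decimal digits: the step is always 0.01,
        // a constant absolute error, whether the value is 1.23 or 1024.00.
        long small = 123;       // 1.23
        long large = 102400;    // 1024.00, resolution still 0.01
    }
}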
Note that fixed-point numbers don't merely have some fixed number of digits after the point; mathematically they are represented as an integer scaled by a fixed negative power of the base. This was very convenient for mechanical calculators:
e.g., a price of USD 23.37 has Q = 2 digits after the point, and the machine always knows where the point is supposed to be!
Take the number 123.456789:
As an integer, this number would be 123.
As a fixed point number with two decimal places, it would be 123.46 (assuming you round).
As a floating point number, it would be 123.456789.
Floating point lets you represent almost every number with a great deal of precision. Fixed point is less precise, but simpler for the computer.
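The same three views, sketched in Java:

public class ThreeViews {
    public static void main(String[] args) {
        double x = 123.456789;
        System.out.println((int) x);       // 123: integer (truncated)
        System.out.printf("%.2f%n", x);    // 123.46: fixed point with 2 places (rounded)
        System.out.println(x);             // 123.456789: floating point
    }
}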

Xcode Obj-C Why is 7/10 not 0.7

int a = 7;
int b = 10;
float answer = (float)a / b;
// answer == 0.699999988 (I expected 0.7?)
The short version is: floating point numbers are not exact; there is only a finite set of bits, and a finite set of bits cannot be used to represent an infinite set of numbers.
The longer version is here: What Every Computer Scientist Should Know About Floating-Point Arithmetic
See also:
How is floating point stored? When does it matter?
Why is my number being rounded incorrectly?
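For comparison, the same effect reproduced in Java, since the arithmetic is identical in any language using IEEE 754 single precision:

public class SevenTenths {
    public static void main(String[] args) {
        float answer = (float) 7 / 10;
        System.out.println(answer);            // 0.7: the shortest string that round-trips
        System.out.printf("%.9f%n", answer);   // 0.699999988: the value actually stored
        System.out.printf("%.9f%n", 7.0 / 10); // 0.700000000: double is closer, still inexact
    }
}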
Floating point numbers are accurate only to a certain finite number of digits of precision. You will need to do some rounding to get whole numbers.
If you need more precision, use the double data type, or the NSDecimalNumber class (which will preserve your decimal digits at the expense of complexity).
It is because floating point calculations are not precise.
The only thing I rely on is the existence of exact small integers (namely -2, -1, 0, 1, 2, as you might use for representing [0,1] plus some special values), and some people frown on using that too.