Efficiently Store Decimal Numbers with Many Leading Zeros in PostgreSQL

A number like:
0.000000000000000000000000000000000000000123456
is difficult to store without a large performance penalty with the available numeric types in postgres. This question addresses a similar problem, but I don't feel like it came to an acceptable resolution. Currently one of my colleagues landed on rounding numbers like this to 15 decimal places and just storing them as:
0.000000000000001
So that the double precision type can be used, which avoids the penalty of moving to the decimal numeric type. For my purposes, numbers this small are more or less functionally equivalent, because they are all very small (and mean more or less the same thing). However, we are graphing these results, and when a large portion of the data set gets rounded like this it looks exceptionally stupid (a flat line on the graph).
Because we are storing tens of thousands of these numbers and operating on them, the decimal numeric type is not a good option for us as the performance penalty is too large.
I am a scientist, and my natural inclination would just be to store these types of numbers in scientific notation, but it doesn't appear that postgres has this kind of functionality. I don't actually need all of the precision in the number; I just want to preserve 4 digits or so, so I don't even need the 15 digits that the float type offers. What are the advantages and disadvantages of storing these numbers in two fields like this:
1.234 (real)
-40 (smallint)
where this is equivalent to 1.234 * 10^-40? This would allow for ~32000 leading decimal places using only 2 bytes for the exponent and 4 bytes for the real value, for a maximum of 6 bytes per number (it gives me the exact number I want to store and takes less space than the existing solution, which consumes 8 bytes). Sorting these numbers also seems much simpler: you would only need to sort on the smallint field first and the real field second.
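A minimal sketch of what I have in mind (table and column names are just for illustration):

CREATE TABLE measurements (
    mantissa real,      -- e.g. 1.234
    exponent smallint   -- e.g. -40, meaning 1.234 * 10^-40
);

INSERT INTO measurements VALUES (1.234, -40), (9.87, -3), (5.0, -40);

-- Sorting goes exponent first, then mantissa (assuming positive values):
SELECT mantissa, exponent
FROM measurements
ORDER BY exponent, mantissa;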

You and/or your colleague seem to be confused about what numbers can be represented using the floating point formats.
A double precision (aka float8) value can store at least 15 significant digits, with magnitudes ranging from about 1e-307 to 1e+308. You have to think of it as scientific notation: remove all the leading zeroes and move them into the exponent. If whatever you have, once written in scientific notation, has 15 or fewer significant digits and an exponent between -307 and +308, it can be stored as is.
That means that 0.000000000000000000000000000000000000000123456 can definitely be stored as a double precision, and you'll keep all the significant digits (123456). No need to round that to 0.000000000000001 or anything like that.
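A quick check in psql (the exact output formatting depends on your PostgreSQL version and the extra_float_digits setting):

SELECT 0.000000000000000000000000000000000000000123456::double precision;
-- returns 1.23456e-40, with all six significant digits intact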
Floating point numbers have the well-known issue that most decimal fractions cannot be represented exactly (a finite number of digits in base 10 does not necessarily map to a finite number of digits in base 2), but that's probably not an issue for you (it only matters if you need to do exact comparisons on such numbers).
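The classic illustration of that representation issue, if you want to see it for yourself:

SELECT 0.1::double precision + 0.2::double precision = 0.3::double precision;
-- returns false, because none of the three values is exactly representable in base 2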

What are the advantages and disadvantages of storing these numbers in two fields like this
You'll have to manage 2 columns instead of one.
Roughly, what you'll be doing is saving space by storing lower-precision floats. If you only need 4 digits of precision, you can go further and save 2 more bytes by using smallint + smallint (a 1000-9999 mantissa plus an exponent). Using that format, you could even cram the two smallints into one 32-bit int (exponent * 2^16 + mantissa); that should work too.
That's assuming that you need to save storage space and/or need to go beyond the ±308 exponent limit of double precision floats. If that's not the case, the standard format is fine.
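As a sketch of that packing idea (my own illustration, not a tested recipe: I add a bias to the exponent so the packed value stays non-negative, which keeps the decoding arithmetic and ORDER BY straightforward):

WITH sample(mantissa, exponent) AS (
    VALUES (1234, -40), (9876, -3), (5000, -40)
)
SELECT mantissa,
       exponent,
       (exponent + 1024) * 65536 + mantissa                  AS packed,
       ((exponent + 1024) * 65536 + mantissa) / 65536 - 1024 AS exponent_back,
       ((exponent + 1024) * 65536 + mantissa) % 65536        AS mantissa_back
FROM sample
ORDER BY packed;   -- orders by exponent first, then mantissa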

Related

T-SQL Data type for fixed precision and variable scale

I have a set of data with a precision of 16 digits, but the values range from very large numbers with all 16 digits to the left of the decimal point to very small numbers with all 16 digits to the right of it (e.g. 1234567890123456.0 and 0.1234567890123456). I am trying to figure out the correct ("best") data type to store this data in. I need to store the exact values and not approximations, so float and real are not viable options. Numeric or decimal seem appropriate, but I am getting hung up on the most efficient precision and scale to set: it seems I must go with (32,16) to account for both extremes, which seems inefficient, as I am requesting twice the storage I will ever use. Is there a better option?
Thank You for your assistance.
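The tradeoff being described, spelled out as declarations (an illustrative sketch, not from the original question):

CREATE TABLE samples (
    wide  decimal(32,16),  -- accepts both 1234567890123456.0 and 0.1234567890123456
    tight decimal(16,16)   -- accepts 0.1234567890123456 but overflows on any value >= 1
);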

Postgres floating point math - do I need to do anything special?

I am new to PG and I'm wondering if I need to 'do anything' extra to properly handle floating-point math.
For example, in ruby you use BigDecimal, and in Elixir you use Decimal.
Is what I have below the best solution for PG?
SELECT
COALESCE(SUM(active_service_fees.service_fee * (1::decimal - active_service_fees.withdraw_percentage_discount)), 0)
FROM active_service_fees
Data types:
service_fee integer NOT NULL
withdraw_percentage_discount numeric(3,2) DEFAULT 0.0 NOT NULL
It depends on what you want.
If you want floating point numbers you need to use the data types real or double precision, depending on your precision requirements.
These floating point numbers need a fixed space (4 or 8 bytes), are stored in binary representation and have limited precision.
If you want arbitrary precision, you can use the binary coded decimal type numeric (decimal is a synonym for it).
Such values are stored as decimal digits, and the amount of storage required depends on the number of digits.
The big advantage of floating point numbers is performance – floating point arithmetic is implemented in hardware in the processor, while arithmetic on binary coded decimals is implemented in software by PostgreSQL.
A rule of thumb would be:
If you need values that are exact up to a certain number of decimal places (like monetary data) and you don't need to do a lot of calculations, use decimal.
If you need to do number crunching and you don't need values rounded to a fixed precision, use double precision.
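If you want a rough feel for the difference yourself, a quick psql experiment along these lines works (absolute timings depend entirely on your hardware and version; only the relative difference is interesting):

\timing on
SELECT sum(x::double precision) FROM generate_series(1, 1000000) AS g(x);  -- hardware float math
SELECT sum(x::numeric)          FROM generate_series(1, 1000000) AS g(x);  -- software numeric math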

Selecting floating point numbers in decimal form

I've a small number in a PostgreSQL table:
test=# CREATE TABLE test (r real);
CREATE TABLE
test=# INSERT INTO test VALUES (0.00000000000000000000000000000000000000000009);
INSERT 0 1
When I run the following query it returns the number as 8.96831e-44:
test=# SELECT * FROM test;
r
-------------
8.96831e-44
(1 row)
How can I show the value in psql in its decimal form (0.00000000000000000000000000000000000000000009) instead of the scientific notation? I'd be happy with 0.0000000000000000000000000000000000000000000896831 too. Unfortunately I can't change the table and I don't really care about loss of precision.
(I've played with to_char for a while with no success.)
Real in Postgres is a floating point datatype stored in 4 bytes, that is, 32 bits.
Your value,
0.00000000000000000000000000000000000000000009
cannot be precisely represented in a 32-bit IEEE 754 floating point number. You can check the exact values in this calculator.
You could try using double precision (64 bits) to store it; according to the calculator, that seems to be an exact representation. NOT TRUE: Patricia showed that it was just the calculator rounding the value, even though it was explicitly asked not to... Double would mean a bit more precision, but still no exact value, as this number is not representable using a finite number of binary digits. (Thanks, Patricia, a lesson learnt (again): don't believe what you see on the Intertubez.)
Under normal circumstances, you should use a NUMERIC(precision, scale) format, which stores the number exactly and gives you back the correct value.
However, double-check the scale you need against what numeric allows: a declared numeric column can have a precision of up to 1000 digits, so the 44 decimal places here are not a problem for an exact decimal representation. If you don't want to do calculations and just need to store the values (which would not be a very common situation, I admit), you could also store them as strings... (but this is ugly...)
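One way to coax psql into a plain decimal rendering without touching the table (a sketch on my part, with the float4 precision caveats above still applying) is to cast to numeric on output, since numeric values are never displayed in scientific notation:

SELECT r::numeric FROM test;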
EDIT
This to_char problem seems to be a known bug...
Quote:
My immediate reaction to that is that float8 values don't have 57 digits
of precision. If you are expecting that format string to do something
useful you should be applying it to a numeric column not a double
precision one.
It's possible that we can kluge things to make this particular case work
like you are expecting, but there are always going to be similar-looking
cases that can't work because the precision just isn't there.
In a quick look at the code, the reason you just get "0." is that it's
rounding off after 15 digits to ensure it doesn't print garbage. Maybe
it could be a bit smarter for cases where the value is very much smaller
than 1, but it wouldn't be a simple change.
(from here)
However, I don't find this defensible. IMHO a double (an IEEE 754 64-bit floating point value, to be exact) will always carry ~15 significant decimal digits, as long as the value fits into the type...
Recommended reading:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Postgres numeric types
BUG #6217: to_char() gives incorrect output for very small float values

comparing float and double and printing them

I have a quick question. Say I have a really big number, up to 15 digits, and I assign the same input to two variables, one float and one double. If I then want to compare the two, how should I do it? I think double has precision up to about 15 digits, and float about 8. So do I compare them while the float only holds 8 digits and pad the rest, or do I print the float out to all 15 digits and then make the comparison? Also, if I were asked to print out the float, is the standard way just to print it up to 8 digits, since that is its maximum precision?
thanks
Most languages will do some form of type promotion to let you compare types that are not identical, but reasonably similar. For details, you would have to indicate what language you are referring to.
Of course, the real problem with comparing floating point numbers is that the results might be unexpected due to rounding errors. Most mathematical equivalences don't hold for floating point arithmetic, so two sequences of operations which SHOULD yield the same value might actually yield slightly different values (or even very different values if you aren't careful).
EDIT: as for printing, the "standard way" is based on what you need. If, for some reason, you are doing monetary computations in floating point, chances are that you'll only want to print 2 decimal digits.
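To make the promotion point concrete in SQL terms (this page's context), the "same" literal stored at the two precisions does not compare equal, because the real value already lost precision before being promoted to double precision for the comparison:

SELECT 0.1::real = 0.1::double precision;
-- returns false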
Thinking in terms of digits may be a problem here. Floats can have a range from negative infinity to positive infinity. In C# for example the range is ±1.5 × 10^−45 to ±3.4 × 10^38 with a precision of 7 digits.
Also, IEEE 754 defines floats and doubles.
Here is a link that might help http://en.wikipedia.org/wiki/IEEE_floating_point
Your question is the right one. You want to consider your approach, though.
Whether at 32 or 64 bits, the floating-point representation is not meant for comparing numbers for equality. For example, the assertion 2.0/7.0 == 60.0/210.0 may or may not be true in the CPU's view. Conceptually, floating point is inherently imprecise.
If you wish to compare numbers for equality, use integers. Consider again the ratios of the last paragraph. The assertion that 2*210 == 7*60 is always true -- noting that those are the integral versions of the same four numbers as before, only related using multiplication rather than division. One suspects that what you are really looking for is something like this.
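In SQL terms (an illustrative sketch), that means comparing the integer cross-products instead of the floating point quotients:

SELECT 2 * 210 = 7 * 60;
-- always true, with no rounding involved; whether 2.0/7.0 = 60.0/210.0 holds
-- in floating point depends on the particular values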

Fixed point vs Floating point number

I just can't understand fixed point and floating point numbers because the definitions of them all over Google are hard to read, and none of the ones I have read provide a simple enough explanation of what they really are. Can I get a plain definition with an example?
A fixed point number has a specific number of bits (or digits) reserved for the integer part (the part to the left of the decimal point) and a specific number of bits reserved for the fractional part (the part to the right of the decimal point). No matter how large or small your number is, it will always use the same number of bits for each portion. For example, if your fixed point format was in decimal IIIII.FFFFF then the largest number you could represent would be 99999.99999 and the smallest non-zero number would be 00000.00001. Every bit of code that processes such numbers has to have built-in knowledge of where the decimal point is.
A floating point number does not reserve a specific number of bits for the integer part or the fractional part. Instead it reserves a certain number of bits for the number (called the mantissa or significand) and a certain number of bits to say where within that number the decimal place sits (called the exponent). So a floating point number that took up 10 digits with 2 digits reserved for the exponent might represent a largest value of 9.9999999e+50 and a smallest non-zero value of 0.0000001e-49.
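In PostgreSQL terms, numeric(10,5) behaves like the IIIII.FFFFF fixed point format above, while double precision is the floating point case (a sketch; the column types are just for illustration):

SELECT 99999.99999::numeric(10,5);        -- largest value the fixed format can hold
SELECT 0.00001::numeric(10,5);            -- smallest non-zero value
SELECT 100000::numeric(10,5);             -- ERROR: numeric field overflow
SELECT 1.23456789e-40::double precision;  -- fine: the exponent "floats"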
A fixed point number just means that there are a fixed number of digits after the decimal point. A floating point number allows for a varying number of digits after the decimal point.
For example, if you have a way of storing numbers that requires exactly four digits after the decimal point, then it is fixed point. Without that restriction it is floating point.
Often, when fixed point is used, the programmer actually uses an integer and then makes the assumption that some of the digits are beyond the decimal point. For example, I might want to keep two digits of precision, so a value of 100 actually means 1.00, 101 means 1.01, 12345 means 123.45, etc.
Floating point numbers are more general purpose because they can represent very small or very large numbers in the same way, but there is a small penalty in having to have extra storage for where the decimal place goes.
From my understanding, fixed-point arithmetic is done using integers, where the fractional part is stored in a fixed number of bits, or the number is multiplied by a power of ten matching how many digits of decimal precision are needed.
For example, if the number 12.34 needs to be stored and we only need two digits of precision after the decimal point, the number is multiplied by 100 to get 1234. When performing math on this number, we use that rule set: adding 5620, i.e. 56.20, to this number yields 6854 in storage, or 68.54.
If we want to extract the decimal part of a fixed-point number, we use the modulo (%) operator.
12.34 (pseudocode):
v1 = 1234 / 100 // get the whole number
v2 = 1234 % 100 // get the decimal number (100ths of a whole).
print v1 + "." + v2 // "12.34"
Floating point numbers are a completely different story in programming. The current standard for single-precision floating point numbers uses 23 bits for the significand, 8 bits for the exponent, and 1 bit for the sign. See this Wikipedia link for more information on this.
The term ‘fixed point’ refers to the manner in which numbers are represented, with a fixed number of digits after, and sometimes before, the decimal point.
With floating-point representation, the placement of the decimal point can ‘float’ relative to the significant digits of the number.
For example, a fixed-point representation with a uniform decimal point placement convention can represent the numbers 123.45, 1234.56, 12345.67, etc, whereas a floating-point representation could in addition represent 1.234567, 123456.7, 0.00001234567, 1234567000000000, etc.
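The same contrast in PostgreSQL terms (illustrative column types, not from the original answer): a numeric(9,2) column pins the point at two decimal places, while double precision lets it float.

SELECT 1.234567::numeric(9,2);        -- 1.23: the point is fixed at two places
SELECT 1.234567::double precision;    -- 1.234567: the point floats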
The other answers cover what a fixed-point number is and how it is stored, but make very little mention of what I consider the defining feature. The key difference is that floating-point numbers have a constant relative (percentage) error caused by rounding or truncating. Fixed-point numbers have constant absolute error.
With 64-bit floats, you can be sure that the answer to x+y will never be off by more than 1 bit, but how big is a bit? Well, it depends on x and y -- if the exponent is equal to 10, then rounding off the last bit represents an error of 2^10=1024, but if the exponent is 0, then rounding off a bit is an error of 2^0=1.
With fixed point numbers, a bit always represents the same amount. For example, if we have 32 bits before the decimal point and 32 after, that means truncation errors will always change the answer by 2^-32 at most. This is great if you're working with numbers that are all about equal to 1, which gain a lot of precision, but bad if you're working with numbers that have different units--who cares if you calculate a distance of a googol meters, then end up with an error of 2^-32 meters?
In general, floating-point lets you represent much larger numbers, but the cost is higher (absolute) error for medium-sized numbers. Fixed points get better accuracy if you know how big of a number you'll have to represent ahead of time, so that you can put the decimal exactly where you want it for maximum accuracy. But if you don't know what units you're working with, floats are a better choice, because they represent a wide range with an accuracy that's good enough.
Note that fixed-point numbers don't just have some fixed number of digits after the point; mathematically they are represented as integer multiples of a fixed negative power of the base. This was very good for mechanical calculators:
e.g., the price of something is USD 23.37 (Q = 2 digits after the point). The machine knows where the point is supposed to be!
Take the number 123.456789:
As an integer, this number would be 123.
As a fixed point number (2 decimal places), this number would be 123.46 (assuming you rounded it up).
As a floating point number, this number would be 123.456789.
Floating point lets you represent almost every number with a great deal of precision. Fixed point is less precise, but simpler for the computer.