Numeric vs Real Datatypes for Storing Monetary Values - postgresql

An answer to a question about a good schema for stock data recommended this schema:
Symbol - char 6
Date - date
Time - time
Open - decimal 18, 4
High - decimal 18, 4
Low - decimal 18, 4
Close - decimal 18, 4
Volume - int
In addition, Postgres documentation says:
"If you require exact storage and calculations (such as for monetary amounts), use the numeric type instead (of floating point types)."
I'm fairly new at SQL, and I hope this is not a really naive question. I'm wondering about the necessity of using the numeric datatype (especially 18,4) - it seems like overkill to me. And "exact" is not really something I'd specify, if exact means correct out to 12 decimal places.
I'm thinking of using real 10,2 for the monetary columns. Here's my rationale.
A typical calculation might compare a stock price (2 decimal places) to a moving average (that could have many decimal places), to determine which is larger. My understanding is that the displayed value of the average (and any calculated results) would be rounded to 2 decimal places, but that calculations would be performed using the higher precision of the stored internal number.
So such a calculation would be accurate to at least 2 decimal places, which is really all I need, I think.
Am I way off base here, and is it possible to get an incorrect answer to the above comparison by using the real 10,2 datatype?
I'd also welcome any other comments, pro or con, about using the numeric datatype.
Thanks in advance.

Floating point values are subject to rounding errors, because most decimal fractions have no exact binary representation. Therefore, if accuracy is important (as it is any time money is involved), it's always recommended to use a numeric type.
https://en.wikipedia.org/wiki/Floating-point_arithmetic#Accuracy_problems
Floating point inaccuracy examples
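To make the comparison risk from the question concrete, here is a minimal sketch in Python rather than SQL. It assumes Python's float behaves like double precision and Decimal behaves like numeric; the three prices are made-up values chosen to trigger the problem.

```python
from decimal import Decimal

# Hypothetical data: two earlier closes and the current price.
prev_close, last_close, price = "0.10", "0.20", "0.15"

# Binary floating point, comparable to double precision.
float_avg = (float(prev_close) + float(last_close)) / 2
float_at_or_above = float(price) >= float_avg

# Exact decimal arithmetic, comparable to numeric.
dec_avg = (Decimal(prev_close) + Decimal(last_close)) / 2
dec_at_or_above = Decimal(price) >= dec_avg

print(float_avg, float_at_or_above)  # 0.15000000000000002 False
print(dec_avg, dec_at_or_above)      # 0.15 True
```

The price equals the two-day average, yet the floating point comparison says it is below it. The error is far smaller than a cent, but a greater-than or equality test only cares about which side of the boundary the value lands on.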

Let's start with the schema above, and look at what a value that fills numeric(18,4) looks like as a floating point number:
select '12345678901234.5678'::float4;
float4
-------------
1.23457e+13
(1 row)
select '12345678901234.5678'::double precision;
float8
------------------
12345678901234.6
(1 row)
So with 14 digits before the decimal point, both floating point types round your number, and you store rounded (and therefore wrong) values.
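The same rounding can be reproduced outside the database. This is a rough sketch in Python, assuming real and double precision are IEEE 754 4-byte and 8-byte floats (which the docs say holds on all currently supported platforms); to_float32 is a helper made up for this example.

```python
import struct
from decimal import Decimal

def to_float32(x):
    """Round x to the nearest 4-byte IEEE 754 float, as a real column would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

text = "12345678901234.5678"
as_numeric = Decimal(text)        # exact, like numeric(18,4)
as_double = float(text)           # like double precision
as_real = to_float32(as_double)   # like real

print(as_numeric)           # the value exactly as typed
print(Decimal(as_double))   # exact value the double actually holds
print(Decimal(as_real))     # exact value the 4-byte float actually holds
```

The double is off in the fourth decimal place, while the 4-byte float is off by more than a hundred thousand units.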
Also, where does your assumption about rounding to two decimal places come from?
select '1.2345678'::float4;
float4
---------
1.23457
(1 row)
So far you have presented a number of assumptions and shortcuts, without showing why you want to use floating point numbers instead of numeric. What is your compelling reason? Just to save some bytes?
My next question is: if your application expands and does more than just "avg" calculations, will you need to change the data type back to numeric?

Related

Storing values as real data type in PostgreSQL. Confusion about the concept of precision

I am confused about the real data type in PostgreSQL:
CREATE TABLE number_data_types (
numeric_column numeric(20,5),
real_column real,
double_column double precision
);
INSERT INTO number_data_types
VALUES (.7, .7, .7),
(2.13579, 2.13579, 2.13579),
(2.1357987654, 2.1357987654, 2.1357987654);
SELECT * FROM number_data_types;
The output in the 3rd row, 2nd column is 2.1357987. Since the real data type in PostgreSQL has a precision of 6, the number of digits in a number that can be stored is 6. I expected to see the number 2.13579, because there are 6 digits in the number. What's wrong with my thought?
In my textbook, the author writes: "On the third row, you see PostgreSQL's default behavior in those two columns, which is to output floating-point numbers using their shortest precise decimal representation rather than show the entire value."
The precision on a real is at least 6 digits. From the docs...
On all currently supported platforms, the real type has a range of around 1E-37 to 1E+37 with a precision of at least 6 decimal digits.
When displayed, a real (aka float4) can show up to 9 digits.
By default, floating point values are output in text form in their shortest precise decimal representation; the decimal value produced is closer to the true stored binary value than to any other value representable in the same binary precision. (However, the output value is currently never exactly midway between two representable values, in order to avoid a widespread bug where input routines do not properly respect the round-to-nearest-even rule.) This value will use at most 17 significant decimal digits for float8 values, and at most 9 digits for float4 values.
Floating point numbers try to cram a lot of precision into a very small space. As a result they have a lot of quirks.
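As a rough illustration of why 2.1357987 is printed rather than 2.13579, the sketch below (Python, assuming real is a 4-byte IEEE 754 float; to_float32 is a helper invented for this example) checks which shorter decimal strings map back to the same stored value.

```python
import struct

def to_float32(x):
    """Round x to the nearest 4-byte IEEE 754 float, as a real column would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

stored = to_float32(2.1357987654)

# Does a shorter decimal string identify the same stored binary value?
print(to_float32(2.13579) == stored)    # 6 digits: False
print(to_float32(2.135799) == stored)   # 7 digits: False
print(to_float32(2.1357987) == stored)  # 8 digits: True
```

Eight digits is the shortest form that reads back to exactly the stored value, which matches the "shortest precise decimal representation" wording in the docs.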

Postgres floating point math - do I need to do anything special?

I am new to PG and I'm wondering if I need to 'do anything' extra to properly handle floating-point math.
For example, in ruby you use BigDecimal, and in Elixir you use Decimal.
Is what I have below the best solution for PG?
SELECT
COALESCE(SUM(active_service_fees.service_fee * (1::decimal - active_service_fees.withdraw_percentage_discount)), 0)
FROM active_service_fees
Data types:
service_fee integer NOT NULL
withdraw_percentage_discount numeric(3,2) DEFAULT 0.0 NOT NULL
It depends on what you want.
If you want floating point numbers you need to use the data types real or double precision, depending on your precision requirements.
These floating point numbers need a fixed space (4 or 8 bytes), are stored in binary representation and have limited precision.
If you want arbitrary precision, you can use the binary coded decimal type numeric (decimal is a synonym for it).
Such values are stored as decimal digits, and the amount of storage required depends on the number of digits.
The big advantage of floating point numbers is performance – floating point arithmetic is implemented in hardware in the processor, while arithmetic on binary coded decimals is implemented in PostgreSQL.
A rule of thumb would be:
If you need values that are exact up to a certain number of decimal places (like monetary data) and you don't need to do a lot of calculations, use decimal.
If you need to do number crunching and you don't need values rounded to a fixed precision, use double precision.
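For comparison with Ruby's BigDecimal or Elixir's Decimal, here is a small Python sketch of what the query computes when the discount is an exact decimal. The rows are hypothetical, and total_fees is a name made up for this example, not anything PostgreSQL provides.

```python
from decimal import Decimal

# Hypothetical rows: (service_fee integer, withdraw_percentage_discount numeric(3,2))
rows = [(100, Decimal("0.10")), (250, Decimal("0.25"))]

def total_fees(rows):
    # Starting from Decimal(0) gives 0 for an empty input,
    # similar in spirit to COALESCE(SUM(...), 0).
    return sum((fee * (Decimal(1) - discount) for fee, discount in rows), Decimal(0))

print(total_fees(rows))  # 277.50
print(total_fees([]))    # 0
```

Integer times exact decimal stays exact, so no extra handling is needed as long as none of the operands is a real or double precision value.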

Efficiently Store Decimal Numbers with Many Leading Zeros in Postgresql

A number like:
0.000000000000000000000000000000000000000123456
is difficult to store without a large performance penalty with the available numeric types in postgres. This question addresses a similar problem, but I don't feel like it came to an acceptable resolution. Currently one of my colleagues landed on rounding numbers like this to 15 decimal places and just storing them as:
0.000000000000001
So that the double precision numeric type can be used which prevents the penalty associated with moving to a decimal numeric type. Numbers that are this small for my purposes are more or less functionally equivalent, because they are both very small (and mean more or less the same thing). However, we are graphing these results and when a large portion of the data set would be rounded like this it looks exceptionally stupid (flat line on the graph).
Because we are storing tens of thousands of these numbers and operating on them, the decimal numeric type is not a good option for us as the performance penalty is too large.
I am a scientist, and my natural inclination would just be to store these types of numbers in scientific notation, but it doesn't appear that postgres has this kind of functionality. I don't actually need all of the precision in the number, I just want to preserve 4 digits or so, so I don't even need the 15 digits that the float numeric type offers. What are the advantages and disadvantages of storing these numbers in two fields like this:
1.234 (real)
-40 (smallint)
where this is equivalent to 1.234*10^-40? This would allow for ~32000 leading decimals with only 2 bytes used to store them and 4 bytes to store the real value, for a total of maximally 6 bytes per number (gives me the exact number I want to store and takes less space than the existing solution which consumes 8 bytes). It also seems like sorting these numbers would be much improved as you'd need only sort on the smallint field first followed by the real field second.
You and/or your colleague seem to be confused about what numbers can be represented using the floating point formats.
A double precision (aka float) number can store at least 15 significant digits, in the range from about 1e-307 to 1e+308. You have to think of it as scientific notation. Remove all the zeroes and move that to the exponent. If whatever you have once in scientific notation has less than 15 digits and an exponent between -307 and +308, it can be stored as is.
That means that 0.000000000000000000000000000000000000000123456 can definitely be stored as a double precision, and you'll keep all the significant digits (123456). No need to round that to 0.000000000000001 or anything like that.
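A quick way to convince yourself, sketched in Python on the assumption that its float is the same 8-byte IEEE 754 format as double precision. The value is written in scientific notation here, taking the exponent of -40 from the question.

```python
from decimal import Decimal

tiny = float("1.23456e-40")

# The shortest string that reads back to the same double keeps all six digits.
print(repr(tiny))  # 1.23456e-40

# Rounding to 15 decimal places is what throws the information away.
print(round(tiny, 15))  # 0.0

# The same digits written out in plain decimal notation.
print(format(Decimal("1.23456e-40"), "f"))
```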
Floating point numbers have a well-known issue with exact representation of decimal numbers (a fraction that terminates in base 10 does not necessarily terminate in base 2), but that's probably not an issue for you (it's an issue if you need to be able to do exact comparisons on such numbers).
What are the advantages and disadvantages of storing these numbers in two fields like this
You'll have to manage 2 columns instead of one.
Roughly, what you'll be doing is saving space by storing lower-precision floats. If you only need 4 digits of precision, you can go further and save 2 more bytes by using smallint + smallint (a mantissa of 1000-9999 plus an exponent). Using that format, you could cram the two smallints into one 32-bit int (exponent*2^16 + mantissa); that should work too.
That's assuming that you need to save storage space and/or need to go beyond the +/-308 digits exponent limit of the double precision float. If that's not the case, the standard format is fine.
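The single-int packing mentioned above could look roughly like this. It is only a sketch in Python for positive values: pack, unpack and to_decimal are names made up here, and the mantissa is treated as a 4-digit integer, so 1.234*10^-40 is stored as 1234 with exponent -43.

```python
from decimal import Decimal

def pack(mantissa, exponent):
    """Pack a 4-digit mantissa and a smallint exponent into one signed 32-bit int."""
    if not 1000 <= mantissa <= 9999:
        raise ValueError("mantissa must have exactly 4 digits")
    if not -32768 <= exponent <= 32767:
        raise ValueError("exponent must fit in a smallint")
    return exponent * 2**16 + mantissa

def unpack(packed):
    # divmod floors, so this also works for negative exponents.
    exponent, mantissa = divmod(packed, 2**16)
    return mantissa, exponent

def to_decimal(packed):
    mantissa, exponent = unpack(packed)
    return Decimal(mantissa).scaleb(exponent)  # mantissa * 10**exponent

packed = pack(1234, -43)
print(packed, unpack(packed), to_decimal(packed))
```

Because the mantissa always has four digits, sorting the packed integers orders the values the same way as sorting on exponent first and mantissa second.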

Selecting floating point numbers in decimal form

I've a small number in a PostgreSQL table:
test=# CREATE TABLE test (r real);
CREATE TABLE
test=# INSERT INTO test VALUES (0.00000000000000000000000000000000000000000009);
INSERT 0 1
When I run the following query it returns the number as 8.96831e-44:
test=# SELECT * FROM test;
r
-------------
8.96831e-44
(1 row)
How can I show the value in psql in its decimal form (0.00000000000000000000000000000000000000000009) instead of the scientific notation? I'd be happy with 0.0000000000000000000000000000000000000000000896831 too. Unfortunately I can't change the table and I don't really care about loss of precision.
(I've played with to_char for a while with no success.)
Real in Postgres is a floating point datatype, stored on 4 bytes, that is 32 bits.
Your value,
0.00000000000000000000000000000000000000000009
cannot be precisely represented as a 32-bit IEEE 754 floating point number. You can check the exact values in this calculator.
You could try double precision (64 bits) to store it; according to the calculator, that seems to be an exact representation. NOT TRUE: Patricia showed that it was just the calculator rounding the value, even though it was explicitly asked not to. Double would mean a bit more precision, but still no exact value, as this number is not representable using a finite number of binary digits. (Thanks, Patricia, a lesson learnt (again): don't believe what you see on the Intertubez.)
Under normal circumstances, you should use a NUMERIC(precision, scale) format, that would store the number precisely to get back the correct value.
However, your value to store seems to have a scale larger than postgres allows (which seems to be 30) for exact decimal representations. If you don't want to do calculations on the values and just want to store them (which would not be a very common situation, I admit), you could try storing them as strings... (but this is ugly...)
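If the formatting can happen on the client side, the stored value can be written out in plain decimal form there. The following is a sketch in Python, not a psql feature; it assumes real is a 4-byte IEEE 754 float, and to_float32 is a helper invented for the example.

```python
import struct
from decimal import Decimal

def to_float32(x):
    """Round x to the nearest 4-byte IEEE 754 float, as a real column would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

stored = to_float32(9e-44)   # what the real column actually holds
exact = Decimal(stored)      # exact decimal expansion of that binary value

# 49 decimal places: 43 leading zeros plus the 6 digits psql displayed.
print(format(exact, ".49f"))

# An exact decimal type keeps the value as typed.
print(format(Decimal("9e-44"), "f"))
```

The first line printed ends in 896831, matching the 8.96831e-44 from the question; the second ends in a single 9.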
EDIT
This to_char problem seems to be a known bug...
Quote:
My immediate reaction to that is that float8 values don't have 57 digits
of precision. If you are expecting that format string to do something
useful you should be applying it to a numeric column not a double
precision one.
It's possible that we can kluge things to make this particular case work
like you are expecting, but there are always going to be similar-looking
cases that can't work because the precision just isn't there.
In a quick look at the code, the reason you just get "0." is that it's
rounding off after 15 digits to ensure it doesn't print garbage. Maybe
it could be a bit smarter for cases where the value is very much smaller
than 1, but it wouldn't be a simple change.
(from here)
However, I don't find this defensible. IMHO a double (IEEE 754 64-bit floating point, to be exact) will always have ~15 significant decimal digits, if the value fits into the type...
Recommended reading:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Postgres numeric types
BUG #6217: to_char() gives incorrect output for very small float values

From Fraction to Decimal and Back?

Is it possible to store a fraction like 3/6 in a variable of some sort?
When I try this it only stores the numbers before the /. I know that I can use 2 variables and divide them, but the input is from a single text field. Is this at all possible?
I want to do this because I need to calculate the fractions to decimal odds.
A bonus question ;) - Is there an easy way to calculate a decimal value to a fraction? Thanks..
Well in short, there is no true way to extract the original fraction out of a decimal.
Example: take 5/10
you will get 0.5
now, 0.5 also translates back to 1/2, 2/4, 3/6, etc.
Your best bet is to store each integer separately, and perform the calculation later on.
The best thing to do is to implement a fraction class (or rational number class). Normally it would take a numerator and denominator and be able to provide a double, and do basic math with other fraction objects. It should also be able to parse and format fractions.
Rational Arithmetic on Rosetta Code looks like something good to start with.
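The question does not say which language is in use, so here is a sketch in Python, whose standard library happens to ship such a rational class (fractions.Fraction). parse_fraction is a helper name invented for this example.

```python
from fractions import Fraction

def parse_fraction(text):
    """Split 'n/d' on the slash and keep both integers, so 3/6 stays 3/6."""
    numerator, denominator = (int(part) for part in text.split("/"))
    if denominator == 0:
        raise ValueError("denominator must not be zero")
    return numerator, denominator

n, d = parse_fraction("3/6")
as_decimal = n / d          # 0.5
reduced = Fraction(n, d)    # normalised to 1/2, the original 3/6 is lost

print((n, d), as_decimal, reduced)  # (3, 6) 0.5 1/2
```

Keeping the pair of integers preserves what the user typed, while the Fraction object is the convenient form for arithmetic.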
I'm afraid there aren't any easy answers for you on this. For creating the fraction, you'll have to split the text field on the '/', convert the two halves to doubles, and divide them out. As for converting it back to a fraction, you'll have to crack open a math textbook and figure it out. (Even worse, a double is not actually precise: you may think it has 0.1 in it, but it really has 0.1000000000000000055511 or something like that, so you'll have to choose a precision and go for it, or write some sort of fraction class that's based on a pair of integers.)
The method, as has been said, is to store the numerator and denominator, much in the way you would write it on paper.
For C, use the GNU Multiple Precision Arithmetic Library; look for 'rational' in the docs.
Is there an easy way to calculate a decimal value to a fraction?
If you limit your decimal values to a certain number of decimal points you could create a lookup table.
0.3333, 1/3
0.6666, 2/3
0.0625, 1/16
0.1250, 1/8
0.2500, 1/4
0.5000, 1/2
0.7500, 3/4
etc...
So if the user inputs 0.5, you pad it with 0's until you have 4 decimal places. You would then use the lookup table to return "1/2". The lookup table should probably be a dictionary of sorts.
It wouldn't be too difficult to do estimating either. For example, if the user entered 0.0624 you could easily select the value in the table closest to that decimal. In this case it would return "1/16."
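A lookup table with nearest-match estimating might be sketched like this in Python. LOOKUP and nearest_fraction are names made up for the example, and the table only contains the pairs listed above.

```python
from decimal import Decimal

# Decimal keys avoid binary floating point surprises when comparing.
LOOKUP = {
    Decimal("0.3333"): "1/3",
    Decimal("0.6666"): "2/3",
    Decimal("0.0625"): "1/16",
    Decimal("0.1250"): "1/8",
    Decimal("0.2500"): "1/4",
    Decimal("0.5000"): "1/2",
    Decimal("0.7500"): "3/4",
}

def nearest_fraction(text):
    """Return the table entry whose decimal value is closest to the input."""
    value = Decimal(text)
    key = min(LOOKUP, key=lambda k: abs(k - value))
    return LOOKUP[key]

print(nearest_fraction("0.5"))     # 1/2
print(nearest_fraction("0.0624"))  # 1/16
```

Comparing as Decimal makes the zero padding unnecessary, since 0.5 and 0.5000 are equal as numbers.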
Don't let typing in the finite set of decimal/fraction pairs scare you (it's really not that large, depending on the precision you choose).
If all else fails, perhaps a Google search would reveal a library that does this sort of thing for you.