Does PostgreSQL support rounding at the database level?

For example is it possible to ensure that a particular number field is always rounded to exactly three decimal places?

Yes. Define the column as DECIMAL(16, 3) - see the Postgres documentation for this type.
The first number (16 in this case) is the precision: the maximum total number of digits. The second number (3 in this case) is the scale: the number of digits after the decimal point. Values with more decimal places are rounded to that scale when stored.
Every database I know of supports this datatype.
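A minimal sketch of the behavior (the table and column names here are made up for illustration):
create table measurements (
    reading decimal(16, 3)
);
insert into measurements (reading) values (1.23456), (2.5);
select reading from measurements;
 reading
---------
   1.235
   2.500
(2 rows)
The first value is rounded to three decimal places on the way in; the second is padded with trailing zeros to scale 3.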

Related

How to automatically use appropriate precision/scale for numeric type in postgres?

I have a numeric value in Postgres of 1.9999999999999999 (scale 16), and I run a query that calculates (1.9999999999999999 + 2) / 2. The exact answer should be 1.99999999999999995 (scale 17), but instead I get 2.0000000000000000 (the scale stays at 16) because the default precision/scale chosen for the numeric result is exceeded and the value gets rounded.
I could get around this by writing the query as (1.9999999999999999 + 2) / 2::numeric(1000,999), but then when I retrieve the value saved to the database, it actually appears with 1000 "0" digits in the SELECT query output.
How can I do either of the following?
1. Tell Postgres never to round a value when it exceeds the scale, but to increase the scale automatically instead.
2. Tell Postgres not to store trailing "0" digits after the decimal point and to keep only the minimum number of digits it needs, so that I can use the highest possible scale (::numeric(1000,999)) while only the necessary digits are stored.
I'd like to get exact results for any operations I do on numeric values, to the fullest precision Postgres allows, without storing unnecessary trailing "0" digits - the way you would expect it to work if you were calculating by hand.
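For reference, the reported behavior reproduces directly in psql, and, assuming PostgreSQL 13 or later, the built-in trim_scale() function covers the second request (it drops trailing zeros, though it does nothing about the rounding itself):
select (1.9999999999999999 + 2) / 2;
      ?column?
--------------------
 2.0000000000000000
(1 row)
select trim_scale((1.9999999999999999 + 2) / 2);
 trim_scale
------------
          2
(1 row)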

Numeric vs Real Datatypes for Storing Monetary Values

An answer to a question about a good schema for stock data recommended this schema:
Symbol - char 6
Date - date
Time - time
Open - decimal 18, 4
High - decimal 18, 4
Low - decimal 18, 4
Close - decimal 18, 4
Volume - int
In addition, Postgres documentation says:
"If you require exact storage and calculations (such as for monetary amounts), use the numeric type instead (of floating point types)."
I'm fairly new at SQL, and I hope this is not a really naive question. I'm wondering about the necessity of using the numeric datatype (especially 18,4) - it seems like overkill to me. And "exact" is not really something I'd specify, if exact means correct out to 12 decimal places.
I'm thinking of using real 10,2 for the monetary columns. Here's my rationale.
A typical calculation might compare a stock price (2 decimal places) to a moving average (which could have many decimal places) to determine which is larger. My understanding is that the displayed value of the average (and of any calculated results) would be rounded to 2 decimal places, but that the calculations themselves would be performed using the higher precision of the internally stored number.
So such a calculation would be accurate to at least 2 decimal places, which is really all I need, I think.
Am I way off base here, and is it possible to get an incorrect answer to the above comparison by using the real 10,2 datatype?
I'd also welcome any other comments, pro or con, about using the numeric datatype.
Thanks in advance.
Floating point variables are vulnerable to floating point errors. Therefore, if accuracy is important (and anytime money is involved, it is), it's always recommended to use a numeric type.
https://en.wikipedia.org/wiki/Floating-point_arithmetic#Accuracy_problems
Floating point inaccuracy examples
Let's start with the schema above and look at what an 18,4 value looks like as a floating point number:
select '12345678901234.5678'::float4;
float4
-------------
1.23457e+13
(1 row)
select '12345678901234.5678'::double precision;
float8
------------------
12345678901234.6
(1 row)
So with 14 digits before the decimal point, the value will always be rounded, and you store rounded (and therefore wrong) values.
Also, your assumption about rounding to two decimal places: where does that assumption come from?
select '1.2345678'::float4;
float4
---------
1.23457
(1 row)
So far you have presented a number of assumptions and shortcuts without showing why you want to use floating point numbers instead of numeric. What is your compelling reason? Just to save a few bytes?
My next question: if your application expands and does more than just "avg" calculations, will you need to change the data type to numeric again?
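To make the risk concrete, here is the classic comparison that silently fails with floating point but behaves as expected with numeric (a minimal psql demonstration, independent of the stock schema above):
select 0.1::float8 + 0.2::float8 = 0.3::float8;
 ?column?
----------
 f
(1 row)
select 0.1::numeric + 0.2::numeric = 0.3::numeric;
 ?column?
----------
 t
(1 row)
In binary floating point, 0.1 + 0.2 actually evaluates to 0.30000000000000004, so the equality test fails.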

How to count the number of significant digits?

For example, 5.020 would return 4. Preferably, it should work with vector inputs too.
I Googled around and found some answers, but none of them counted the last zero in 5.020.
From the given information, it is not possible.
The problem is that when you enter a number, it is (per standard) represented as a double, so it has a precision of eps and the precision you originally entered is lost. However, since one is typically not interested in seeing all ~15 digits, MATLAB uses a couple of display rules that are independent of the originally entered number; this typically involves the integer part plus 4 digits.
Additionally, the standard rule when converting a number to a string (num2str) is to cut off trailing zeros, which is why you do not get the last zero.
Your only option is to count the number of significant digits when you first obtain the data, which leads back to the question @Beaker asks you in the comments.

Efficiently Store Decimal Numbers with Many Leading Zeros in Postgresql

A number like:
0.000000000000000000000000000000000000000123456
is difficult to store without a large performance penalty with the available numeric types in postgres. This question addresses a similar problem, but I don't feel like it came to an acceptable resolution. Currently one of my colleagues landed on rounding numbers like this to 15 decimal places and just storing them as:
0.000000000000001
So that the double precision numeric type can be used which prevents the penalty associated with moving to a decimal numeric type. Numbers that are this small for my purposes are more or less functionally equivalent, because they are both very small (and mean more or less the same thing). However, we are graphing these results and when a large portion of the data set would be rounded like this it looks exceptionally stupid (flat line on the graph).
Because we are storing tens of thousands of these numbers and operating on them, the decimal numeric type is not a good option for us as the performance penalty is too large.
I am a scientist, and my natural inclination would be to store these kinds of numbers in scientific notation, but it doesn't appear that Postgres has this kind of functionality. I don't actually need all of the precision in the number; I just want to preserve 4 digits or so, so I don't even need the 15 digits that the float numeric type offers. What are the advantages and disadvantages of storing these numbers in two fields like this:
1.234 (real)
-40 (smallint)
where this is equivalent to 1.234 * 10^-40? This would allow for ~32000 leading decimal zeros using only 2 bytes for the exponent and 4 bytes for the real value, for a total of at most 6 bytes per number (it gives me the exact number I want to store and takes less space than the existing solution, which consumes 8 bytes). It also seems like sorting these numbers would be much improved, as you'd only need to sort on the smallint field first, followed by the real field.
You and/or your colleague seem to be confused about what numbers can be represented using the floating point formats.
A double precision (aka float) number can store at least 15 significant digits, in the range from about 1e-307 to 1e+308. Think of it as scientific notation: remove all the leading zeroes and move them into the exponent. If whatever you have, once in scientific notation, has 15 or fewer significant digits and an exponent between -307 and +308, it can be stored as is.
That means that 0.000000000000000000000000000000000000000123456 can definitely be stored as a double precision, and you'll keep all the significant digits (123456). No need to round that to 0.000000000000001 or anything like that.
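A quick check in psql confirms this (the exact output format can vary slightly between PostgreSQL versions):
select 0.000000000000000000000000000000000000000123456::double precision;
   float8
-------------
 1.23456e-40
(1 row)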
Floating point numbers have the well-known issue that they cannot exactly represent every decimal number (decimal fractions in base 10 do not necessarily map to finite fractions in base 2), but that's probably not a problem for you (it only matters if you need to do exact comparisons on such numbers).
"What are the advantages and disadvantages of storing these numbers in two fields like this?"
You'll have to manage 2 columns instead of one.
Roughly, what you'll be doing is saving space by storing lower-precision floats. If you only need 4 digits of precision, you can go further and save 2 more bytes by using smallint + smallint (a mantissa in the range 1000-9999 plus an exponent). Using that format, you could even cram the two smallints into one 32-bit int (exponent * 2^16 + mantissa); that should work too.
That's assuming that you need to save storage space and/or need to go beyond the +/-308 exponent limit of the double precision float. If that's not the case, the standard format is fine.
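If you did go the packed route, a sketch of the encoding could look like the following (the table name, the bias of 32768, and the layout are illustrative assumptions, not part of the answer above):
-- pack: (exponent + 32768) in the high 16 bits, 4-digit mantissa in the low 16
create table tiny_numbers (
    packed int4
);
-- store 1.234e-40, i.e. mantissa 1234 with exponent -43 (1234 * 10^-43)
insert into tiny_numbers values ((-43 + 32768) * 65536 + 1234);
-- unpack for display as a double precision value
select (packed % 65536) * power(10::float8, packed / 65536 - 32768) as value
from tiny_numbers;
   value
-----------
 1.234e-40
(1 row)
A single biased int also keeps the natural sort order, provided the mantissa always stays in the 1000-9999 range.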

Redshift error Overflow for NUMERIC(8,4)

Why do I get "Overflow for NUMERIC(8,4)" for 10595.148?
Redshift is based on PostgreSQL, so it follows the PostgreSQL rules for numeric data types.
NUMERIC(8,4) indicates a scale of 4, so it will try to store your number with 4 decimal digits in the fraction part: 10595.1480. That is 9 digits in total, which exceeds the precision of 8; equivalently, precision 8 minus scale 4 leaves only 4 digits for the integer part, and 10595 has 5. The maximum value you can store in this data type is 9999.9999.
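The overflow reproduces in plain PostgreSQL (Redshift's error text differs slightly), and widening the precision by one digit makes room for the fifth integer digit:
select 10595.148::numeric(8,4);
ERROR:  numeric field overflow
DETAIL:  A field with precision 8, scale 4 must round to an absolute value less than 10^4.
select 10595.148::numeric(9,4);
  numeric
------------
 10595.1480
(1 row)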