I stumbled across this oddity when multiplying DECIMAL numbers on SQL Server 2005/2008. Can anyone explain the effect?
DECLARE @a DECIMAL(38,20)
DECLARE @b DECIMAL(38,20)
DECLARE @c DECIMAL(38,20)
SELECT @a = 1.0,
       @b = 2345.123456789012345678,
       @c = 23456789012345.999999999999999999
SELECT CASE WHEN @a*@b*@c = @c*@b*@a
            THEN 'Product is the same'
            ELSE 'Product differs'
       END
It's due to how precision and scale are derived for the intermediate results, and the rounding that follows.
The problem is already visible in the first step:
SELECT @a*@b --(= 2345.123457)
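For multiplication, SQL Server derives the result type as precision = p1 + p2 + 1 and scale = s1 + s2. Here that would be DECIMAL(77,40), which exceeds the maximum precision of 38, so the precision is capped and the scale is cut back (to 6 in this case) to keep room for the integer part. Each intermediate product is rounded to that reduced scale, so multiplying in a different order rounds different intermediate values and the final results can differ. A minimal sketch to inspect the derived type of the intermediate product (reusing the variables above):
DECLARE @a DECIMAL(38,20), @b DECIMAL(38,20)
SELECT @a = 1.0, @b = 2345.123456789012345678
-- Show the type SQL Server actually produced for @a*@b
SELECT SQL_VARIANT_PROPERTY(@a*@b, 'BaseType')  AS base_type,
       SQL_VARIANT_PROPERTY(@a*@b, 'Precision') AS derived_precision,
       SQL_VARIANT_PROPERTY(@a*@b, 'Scale')     AS derived_scale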
[Please search SO for multiple examples.]
Related: Sql Server Decimal(30,10) losing last 2 decimals
I've got a Postgres ORDER BY issue with the following table:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record into the table, I
select the last record with SELECT * FROM employees ORDER BY em_code DESC
strip the alphabetic part from em_code using a regular expression and store it in ec_alpha
cast the remaining part to an integer ec_num
increment ec_num by one
pad with sufficient zeros and prefix ec_alpha again
When em_code reaches EM1000, the above algorithm fails.
The first step returns EM999 instead of EM1000, so it generates EM1000 as the new em_code again, breaking the unique key constraint.
Any idea how to select EM1000?
Since Postgres 10, it is possible to specify an ICU collation which will sort columns with numbers naturally.
https://www.postgresql.org/docs/10/collation.html
-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');
-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;
Now just query as you would otherwise.
SELECT * FROM employees ORDER BY em_code
On my data, I get results in this order (note that it also sorts foreign numerals):
Value
0
0001
001
1
06
6
13
۱۳
14
One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
The reason is that the string sorts alphabetically (instead of numerically like you would want), and '1' sorts before '9', so EM1000 comes before EM999.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
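If that is an option, the migration could look roughly like this (a sketch, assuming every existing em_code is 'EM' followed by digits):
ALTER TABLE employees ADD COLUMN em_num integer;
UPDATE employees SET em_num = substring(em_code, 3)::int;  -- strip the fixed 'EM' prefix
ALTER TABLE employees ALTER COLUMN em_num SET NOT NULL;
-- new rows can then take em_num from a sequence and render 'EM' || em_num only for display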
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.
This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
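A minimal sketch of that trigger idea; nso_normalize below is just a stand-in name for whatever normalization function the extension actually exposes (check its readme):
ALTER TABLE employees ADD COLUMN em_code_sort text;

CREATE OR REPLACE FUNCTION employees_set_sort_key() RETURNS trigger AS $$
BEGIN
    -- nso_normalize is a placeholder for the extension's normalization function
    NEW.em_code_sort := nso_normalize(NEW.em_code);
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER employees_sort_key
    BEFORE INSERT OR UPDATE ON employees
    FOR EACH ROW EXECUTE PROCEDURE employees_set_sort_key();

CREATE INDEX ON employees (em_code_sort);  -- ORDER BY em_code_sort can then use the index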
You can use just this line:
ORDER BY length(substring(em_code FROM '[0-9]+')), em_code
I wrote about this in detail in this related question:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
I came up with something slightly different.
The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
ORDER BY ARRAY(
    SELECT ROW(
        CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
        match[2]
    )
    FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g') AS match
)
I thought about another way of doing this that uses less database storage than padding and saves time compared to calculating on the fly.
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
https://github.com/ccsalway/dbNaturalSort
The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:
create function natsort(s text) returns text immutable language sql as $$
select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.
Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.
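For example, against the employees table from the question, an expression index makes the natural ordering indexable (a sketch; this works because the function is declared IMMUTABLE):
CREATE INDEX employees_em_code_natsort_idx ON employees (natsort(em_code));

SELECT * FROM employees ORDER BY natsort(em_code) DESC;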
I am writing a Postgres function in which I select calculation results of type double precision into a table type. When looking at the results for values which can't be represented exactly in binary, I can see the limited precision when calling the function from the IntelliJ IDEA database console, but not in pgAdmin.
To illustrate what I am doing, consider the following scenario:
CREATE TABLE mytable
(
mycol double precision
);
CREATE OR REPLACE FUNCTION myFn() RETURNS mytable AS $$
DECLARE
myvar mytable;
BEGIN
SELECT 0.3::DOUBLE PRECISION INTO myvar.mycol;
return myvar;
END $$ LANGUAGE plpgsql;
SELECT myFn();
As mentioned before, in IntelliJ I get the result (0.299999999999999989) whereas in pgAdmin I get (0.3).
I suspect that this is only a difference in displaying the value, as 0.3 can't be represented exactly in binary and therefore is not stored exactly in the database.
But now comes the strange part: when I insert a row into the database and select it again I get 0.3 in both applications.
INSERT INTO mytable (mycol) VALUES (0.3);
SELECT mycol FROM mytable;
Also, if I don't store the value in a table type within the function, but in a variable directly, I again get 0.3 in both applications:
CREATE OR REPLACE FUNCTION myFn() RETURNS DOUBLE PRECISION AS $$
DECLARE
myvar DOUBLE PRECISION;
BEGIN
SELECT 0.3 INTO myvar;
return myvar;
END $$ LANGUAGE plpgsql;
What exactly is happening here?
The value 0.3 cannot be stored exactly as double precision, because the IEEE standard stores floating point values in binary representation.
You do not normally notice that in PostgreSQL, because it rounds away the last three digits which may contain errors.
You can show all digits by setting extra_float_digits to 3.
SET extra_float_digits=3;
SELECT 0.3::double precision;
┌──────────────────────┐
│ float8 │
├──────────────────────┤
│ 0.299999999999999989 │
└──────────────────────┘
(1 row)
I don't know how IntelliJ IDEA formats and displays double precision values, but maybe it sometimes does its own rounding or converts the values to numeric first, which also rounds away these extra digits.
However, you should not worry about that. If you use double precision, you are in for a certain amount of imprecision.
I have a function in PL/pgSQL that is trying to back out some data for a date range. The problem is that I cannot seem to store the double precision value inside a variable: no matter what I do, the value is always null when running inside the function. When I run the query from the psql command line it returns the correct data. I can also run the query on another column that isn't of type double precision and it works fine; for example, if I change the column to "total_impressions_for_date_range" it returns the correct data.
I am using PostgreSQL 8.4
CREATE OR REPLACE FUNCTION rollback_date_range_revenue(campaign_id int,
begin_date timestamp, end_date timestamp, autocommit boolean)
RETURNS void AS $BODY$
DECLARE
total_impressions_for_date_range bigint;
total_clicks_for_date_range bigint;
total_revenue_for_date_range double precision;
total_cost_for_date_range double precision;
BEGIN
SELECT sum(revenue) INTO total_revenue_for_date_range
FROM ad_block_summary_hourly
WHERE ad_run_id IN (
SELECT ad_run_id FROM ad_run WHERE ad_campaign_id = campaign_id)
AND ad_summary_time >= begin_date
AND ad_summary_time < end_date
AND (revenue IS NOT NULL);
RAISE NOTICE 'Total revenue for given date range and campaign % was %',
campaign_id, total_revenue_for_date_range;
When I run this I always get a null value for the revenue
SELECT rollback_date_range_revenue(8818, '2015-07-20 18:00:00'::timestamp,
'2015-07-20 20:00:00'::timestamp, false);
NOTICE: Total revenue for given date range and campaign 8818 was <NULL>
When I run it from the command line, outside of the function, it works completely fine:
select sum(revenue) from ad_block_summary_hourly where ad_run_id in (
select ad_run_id from ad_run where ad_campaign_id = 8818) and ad_summary_time
>= '2015-07-20 18:00:00'::TIMESTAMP and ad_summary_time < '2015-07-20
20:00:00'::TIMESTAMP ;
sum
----------
3122.533
(1 row)
EDIT
Huge thanks to a_horse_with_no_name and Patrick. This was indeed a problem with a variable I had called revenue, which collided with a column name in my query. I was thrown off by the fact that the two queries that were not working were both for double precision columns; it just happened that those two were also the variables whose names overlapped with column names.
Two things to take away from this:
I adopted the p_ naming scheme for parameters and variables suggested by a_horse_with_no_name, so as not to run into this issue again.
Post a full code example; this could have been identified much more quickly by the experts.
First of all, PostgreSQL 8.4 is no longer supported so you should upgrade to 9.4 as soon as you can. Second, your function is obviously abbreviated because some declared variables are not used and there is no END clause. These two points together make it somewhat guesswork to give you an answer, but here goes.
Try casting the double precision to text, or convert it with to_char(). RAISE NOTICE expects a string for the expressions to be inserted; possibly in 8.4 this is not automatic.
You could also improve upon your query:
...
SELECT sum(sh.revenue) INTO total_revenue_for_date_range
FROM ad_block_summary_hourly sh
JOIN ad_run r USING (ad_run_id)
WHERE r.ad_campaign_id = campaign_id
AND sh.ad_summary_time BETWEEN begin_date AND end_date;
RAISE NOTICE 'Total revenue for given date range and campaign % was %',
campaign_id, to_char(total_revenue_for_date_range, '9D999');
...
Another potential cause of the problem (guessing again due to lack of information) is a name collision between a function parameter or variable with a column name from either of the two tables.
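To illustrate that scenario (a contrived sketch, not your actual code): if a variable happens to be named revenue, then on 8.4 sum(revenue) silently sums the variable, which is NULL, instead of the column; later versions raise an ambiguity error by default instead.
CREATE OR REPLACE FUNCTION collision_demo() RETURNS double precision AS $$
DECLARE
    revenue double precision;   -- shadows the column of the same name
    total   double precision;
BEGIN
    -- On 8.4 the variable takes precedence, so this sums NULLs and returns NULL
    SELECT sum(revenue) INTO total FROM ad_block_summary_hourly;
    RETURN total;
END $$ LANGUAGE plpgsql;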
This code works fine and does exactly what I want, which is to sum the Qty * Price for each instance of the dynamic query.
But when I add an IIF statement it breaks. What I am trying to do is the same thing as above but when the transaction type is 'CO' set the sum to a negative amount.
The problem turned out to be the NVARCHAR(4000) type of @sql, limiting its length to 4000 characters: the query got truncated at some random place after adding another long chunk to it.
DECLARE @sql NVARCHAR(MAX) solves the problem, allowing a dynamic query of any size below 2GB.
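A quick way to see the silent truncation (a minimal sketch; @short and @long are just illustrative names): REPLICATE caps its result at 4000 characters unless its input is already NVARCHAR(MAX).
DECLARE @short NVARCHAR(4000) = REPLICATE(N'x', 5000);
DECLARE @long  NVARCHAR(MAX)  = REPLICATE(CONVERT(NVARCHAR(MAX), N'x'), 5000);
SELECT LEN(@short) AS short_len,  -- 4000: capped without any error
       LEN(@long)  AS long_len;   -- 5000: no truncation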
I think this is best asked in the form of a simple example. The following chunk of SQL causes a "DB-Library Error:20049 Severity:4 Message:Data-conversion resulted in overflow" message, but how come?
declare @a numeric(18,6), @b numeric(18,6), @c numeric(18,6)
select @a = 1.000000, @b = 1.000000, @c = 1.000000
select @a/(@b/@c)
go
How is this any different to:
select 1.000000/(1.000000/1.000000)
go
which works fine?
I ran into the same problem the last time I tried to use Sybase (many years ago). Coming from a SQL Server mindset, I didn't realize that Sybase would attempt to coerce the decimals out -- which, mathematically, is what it should do. :)
From the Sybase manual:
Arithmetic overflow errors occur when the new type has too few decimal places to accommodate the results.
And further down:
During implicit conversions to numeric or decimal types, loss of scale generates a scale error. Use the arithabort numeric_truncation option to determine how serious such an error is considered. The default setting, arithabort numeric_truncation on, aborts the statement that causes the error but continues to process other statements in the transaction or batch. If you set arithabort numeric_truncation off, Adaptive Server truncates the query results and continues processing.
So assuming that the loss of precision is acceptable in your scenario, you probably want the following at the beginning of your transaction:
SET ARITHABORT NUMERIC_TRUNCATION OFF
And then at the end of your transaction:
SET ARITHABORT NUMERIC_TRUNCATION ON
This is what solved it for me those many years ago ...
This is just speculation, but could it be that the DBMS doesn't look at the runtime values of your variables, only at their declared types? Thus, a six-decimal numeric divided by a six-decimal numeric could result in a twelve-decimal numeric, whereas for the literal division the DBMS knows there is no overflow. Still not sure why the DBMS would care, though: shouldn't it return the result of two six-decimal divisions as up to an 18-decimal numeric?
Because you have declared the variables in the first example, the result is expected to be of the same declared type (i.e. numeric(18,6)), but it is not.
I have to say that the first one did work in SQL 2005 though (it returned 1.000000, the same declared type), while the second one returned 1.00000000000000000000000 (a totally different declaration).
Not directly related, but could possibly save someone some time with the Arithmetic overflow errors using Sybase ASE (12.5.0.3).
I was setting a few default values in a temporary table which I intended to update later on, and stumbled on to an Arithmetic overflow error.
declare @a numeric(6,3)
select 0.000 as thenumber into #test --indirect declare
select @a = ( select thenumber + 100 from #test )
update #test set thenumber = @a
select * from #test
Shows the error:
Arithmetic overflow during implicit conversion of NUMERIC value '100.000' to a NUMERIC field .
Which in my head should work, but doesn't, because the 'thenumber' column was never explicitly declared; it was only indirectly declared as decimal(4,3) by the literal. So you have to indirectly declare the temp table column with the precision and scale of the format you want, which in my case was 000.000.
select 000.000 as thenumber into #test --this solved it
Hopefully that saves someone some time :)