Odd behavior with overloaded function in Postgres 13.3 - postgresql

I've added a pair of overloaded functions to handle safe division, optionally with rounding, in PG 13.3. I've run some simple example cases through the routines and, in one case, the output varies unexpectedly. I'm hoping that someone can shed some light on what might be causing this inconsistency. First off, here is the code for the div_safe (anycompatible, anycompatible) : real and div_safe (anycompatible, anycompatible, integer) : real functions. (I tried replacing integer with anycompatible in that third parameter; it made no difference.)
------------------------------
-- No rounding
------------------------------
CREATE OR REPLACE FUNCTION tools.div_safe(
numerator anycompatible,
denominator anycompatible)
RETURNS real
AS $BODY$
SELECT numerator/NULLIF(denominator,0)::real
$BODY$
LANGUAGE SQL;
COMMENT ON FUNCTION tools.div_safe (anycompatible, anycompatible) IS
'Pass in any two values that are, or can be coerced into, numbers, and get a safe division real result.';
------------------------------
-- Rounding
------------------------------
CREATE OR REPLACE FUNCTION tools.div_safe(
numerator anycompatible,
denominator anycompatible,
rounding_in integer)
RETURNS real
AS $BODY$
SELECT ROUND(numerator/NULLIF(denominator,0)::numeric, rounding_in)::real
$BODY$
LANGUAGE sql;
COMMENT ON FUNCTION tools.div_safe (anycompatible, anycompatible, integer) IS
'Pass in any two values that are, or can be coerced into, numbers, the number of rounding digits, and get back a rounded, safe division real result.';
I threw together these checks, as I was working out the code:
-- (real, int)
select '5.1/nullif(null,0)', 5.1/nullif(null,0) as result union all
select 'div_safe(5.1,0)', div_safe(5.1, 0) as result union all
-- (0, 0)
select '0/nullif(0,0)', 5.1/nullif(null,0) as result union all
select 'div_safe(0, 0)', div_safe(0, 0) as result union all
-- (int, int)
select '5/nullif(8,0)::real', 5/nullif(8,0)::real as result union all
select 'div_safe(5,8)', div_safe(5, 8) as result union all
-- (string, int)
select 'div_safe(''5'',8)', div_safe('5', 8) as result union all
select 'div_safe(''8'',5)', div_safe('8', 5) as result union all
-- Rounding: Have to convert real result to numeric to pass it into ROUND (numeric, integer)
select 'round(div_safe(10,3)::numeric, 2)',
round(div_safe(10,3)::numeric, 2) as result union all
-- Pass a third parameter to specify rounding:
select 'div_safe(20,13,2)', div_safe(20, 13, 2) as result
+-----------------------------------+--------------------+
| ?column?                          | result             |
+-----------------------------------+--------------------+
| 5.1/nullif(null,0)                | NULL               |
| div_safe(5.1,0)                   | NULL               |
| 0/nullif(0,0)                     | NULL               |
| div_safe(0, 0)                    | NULL               |
| 5/nullif(8,0)::real               | 0.625              |
| div_safe(5,8)                     | 0.625              |
| div_safe('5',8)                   | 0.625              |
| div_safe('8',5)                   | 1.600000023841858  |
| round(div_safe(10,3)::numeric, 2) | 3.33               |
| div_safe(20,13,2)                 | 1.5399999618530273 |
+-----------------------------------+--------------------+
The last line looks wrong to me; it should be rounded to 1.54. I've discovered that I only get this behavior in the presence of one of the other tests. Specifically:
select '5/nullif(8,0)::real', 5/nullif(8,0)::real as result union all
Without that, the final line returns 1.54, as expected.
Can anyone shed some light on what's going on? Is it something to do with the combination of anycompatible with UNION ALL? Something incredibly simple that I'm missing?
And, if anyone knows, is there a chance that anynum might be added as a pseudo-type in the future?
Follow-up regarding inconsistent output
I've already gotten a helpful answer to my original question (thanks!), and am following up on a follow-on point. Namely, why does my function round the data before returning it, and then the value is changed in the final result? I think that there's something fundamental I'm missing here, and it's not obvious. I figured that I needed to confirm that the right version of the function is being called, and use RAISE NOTICE to get at the values as seen inside the function. This new version is div_safe_p (anycompatible, anycompatible, integer) : real, and is written in PL/pgSQL:
------------------------------
-- Rounding
------------------------------
drop function if exists tools.div_safe_p(anycompatible,anycompatible,integer);
CREATE OR REPLACE FUNCTION tools.div_safe_p(
numerator anycompatible,
denominator anycompatible,
rounding_in integer)
RETURNS real
AS $BODY$
DECLARE
result_r real := 0;
BEGIN
SELECT ROUND(numerator/NULLIF(denominator,0)::numeric, rounding_in)::real INTO result_r;
RAISE NOTICE 'Calling div_safe_p(%, %, %) : %', numerator, denominator, rounding_in, result_r;
RETURN result_r;
END
$BODY$
LANGUAGE plpgsql;
COMMENT ON FUNCTION tools.div_safe_p (anycompatible, anycompatible, integer) IS
'Pass in any two values that are, or can be coerced into, numbers, the number of rounding digits, and get back a rounded, safe division real result.';
Here's a sample call, and output:
select 5/nullif(8,0)::real union all
select div_safe_p(10,3, 2)::real
+--------------------+
| ?column? |
+--------------------+
| 0.625 |
| 3.3299999237060547 |
+--------------------+
The result of div_safe_p appears to be converted to a double, not a real. Checking the RAISE NOTICE console output, the function returned 3.33:
NOTICE: Calling div_safe_p(10, 3, 2) : 3.33
Yes, this 3.33 is shown as 3.3299999237060547. I'm not clear on why the value is modified from what the function returns. I also can't reproduce the transformation by converting the value by hand: both select 3.33::real and select 3.33::double precision return 3.33.
Another variant, the same as the original except without the ::real casts:
select 5/nullif(8,0) union all
select div_safe_p(10,3, 2)
+----------+
| ?column? |
+----------+
| 0 |
| 3.33 |
+----------+
It certainly looks like the first value encountered is guiding the column typing, as answered already. However, I'm stumped as to why this changes the behavior of the function itself. Or, at least changes how the output is interpreted.
If this sounds like a fine point...maybe it is. When I run into peculiarities that I can't explain, I hope to figure out what's going on so that I can predict and troubleshoot more complex examples in the future.
Thanks for any illumination!

This is as expected on account of the type resolution rules for UNION:
Select the first non-unknown input type as the candidate type, then consider each other non-unknown input type, left to right.
Now the first non-NULL data type is double precision (see the type resolution rules for operators), so all results get cast to double precision resulting in the imprecision being visible. Without that test, the result is of type real, so PostgreSQL shows few enough digits to hide the imprecision.
It is useful to use the pg_typeof function to show the data type, that will clear things up:
SELECT pg_typeof(v)
FROM (SELECT NULL
UNION ALL
SELECT 2::real / 3::real
UNION ALL
SELECT pi()) AS t(v);
pg_typeof
══════════════════
double precision
double precision
double precision
(3 rows)
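The same mechanism explains the follow-up's 3.3299999237060547: the NOTICE prints the value as real, but the UNION column is double precision, and widening the real exposes the single-precision approximation. A minimal sketch (plain psql, nothing specific to your functions):
SELECT 3.33::real                    AS shown_as_real,   -- 3.33
       3.33::real::double precision  AS shown_as_double; -- 3.3299999237060547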

Related

Understand rounded results after division involving floating point types

In Postgres 14, I see rounded results that do not make sense to me. Trying to understand what is going on. In all cases I am dividing 19 by 3.
Casting either integer value to a real:
SELECT 19::real /3;
yields a value of 6.333333333333333
SELECT 19/3::real;
yields a value of 6.333333333333333
However, casting both sides to real yields:
SELECT 19::real/3::real;
yields a value of 6.3333335
Interestingly enough, if I cast both sides to double precision or float, the answer is 6.333333333333333
Also
SELECT 19.0 / 3.0;
yields 6.333333333333333
SELECT 19.0::real / 3.0::real;
yields 6.3333335
SELECT ( 19.0 / 3.0) :: real;
yields 6.3333335
I now see that:
SELECT 6.333333333333333::real;
yields 6.3333335
So the real issue seems to be:
Why are we rounding in this weird way? (I know that real / floats are inexact, but this seems extreme.)
What data type is 19::real / 3;?
Why are we rounding in this weird way? (I know that real / floats are inexact but this seems extreme.)
Because real (float4) only uses 4 bytes for storage, and that's the closest possible value it can encode.
What data type is 19::real / 3;?
Check with pg_typeof() if your client does not indicate the column type (like pgAdmin4 does).
test=> SELECT pg_typeof(19::real / 3);
pg_typeof
------------------
double precision
(1 row)
test=> SELECT pg_typeof(19/3::real);
pg_typeof
------------------
double precision
(1 row)
test=> SELECT pg_typeof(19::real/3::real);
pg_typeof
-----------
real
(1 row)
This is the complete list of available division operators involving real:
test=> SELECT oprleft::regtype, oprright::regtype, oprresult::regtype
test-> FROM pg_operator
test-> WHERE oprname = '/'
test-> AND 'real'::regtype IN (oprleft, oprright);
oprleft | oprright | oprresult
------------------+------------------+------------------
real | real | real
money | real | money
real | double precision | double precision
double precision | real | double precision
(4 rows)
For combinations of types that have no exact match here, Postgres finds the closest match according to its operator resolution rules. Postgres aims to preserve precision, so the only division that produces real is real / real. All other variants produce double precision (float8). (money being a corner case exception.)
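As a hedged illustration of that rule, here are two ways to end up with a real result when the operands are mixed: make both operands real, or let the division run in double precision and narrow the result afterwards:
SELECT pg_typeof(19::real / 3::real)   AS both_operands_real,  -- real
       pg_typeof((19::real / 3)::real) AS result_cast_to_real; -- real (the division itself ran in double precision)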

PostgreSQL convert varchar to numeric and get average

I have a column that I want to get an average of; the column is varchar(200). I keep getting this error. How do I convert the column to numeric and get an average of it?
Values in the column look like
16,000.00
15,000.00
16,000.00 etc
When I execute
select CAST((COALESCE( bonus,'0')) AS numeric)
from tableone
... I get
ERROR: invalid input syntax for type numeric:
The standard way to represent (as text) a numeric in SQL is something like:
16000.00
15000.00
16000.00
So, your commas in the text are hurting you.
The most sensible way to solve this problem would be to store the data just as a numeric instead of using a string (text, varchar, character) type, as already suggested by a_horse_with_no_name.
However, assuming this is done for a good reason, such as having inherited a design you cannot change, one possibility is to get rid of all the characters which are not a minus sign, digit, or period before casting to numeric:
Let's assume this is your input data
CREATE TABLE tableone
(
bonus text
) ;
INSERT INTO tableone(bonus)
VALUES
('16,000.00'),
('15,000.00'),
('16,000.00'),
('something strange 25'),
('why do you actually use a "text" column if you could just define it as numeric(15,0)?'),
(NULL) ;
You can remove all the extraneous chars with regexp_replace and the proper regular expression ([^-0-9.]), applied globally:
SELECT
CAST(
COALESCE(
NULLIF(
regexp_replace(bonus, '[^-0-9.]+', '', 'g'),
''),
'0')
AS numeric)
FROM
tableone ;
| coalesce |
| -------: |
| 16000.00 |
| 15000.00 |
| 16000.00 |
| 25 |
| 150 |
| 0 |
See what happens to the 15,0 (this may NOT be what you want).
Check everything at dbfiddle here
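To get the average the question actually asks for, the same expression can be wrapped in AVG (a sketch reusing the tableone/bonus names from above):
SELECT AVG(
         CAST(
           COALESCE(
             NULLIF(
               regexp_replace(bonus, '[^-0-9.]+', '', 'g'),
               ''),
             '0')
           AS numeric)
       ) AS avg_bonus
FROM tableone;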
I'm going to go out on a limb and say that it might be because you have Empty strings rather than nulls in your column; this would result in the error you are seeing. Try wrapping the column name in a nullif:
SELECT CAST(coalesce(NULLIF(bonus, ''), '0') AS integer) as new_field
But I would really question your schema that you have numeric values stored in a varchar column...

PostgreSql round() giving Error [duplicate]

I am using PostgreSQL via the Ruby gem 'sequel'.
I'm trying to round to two decimal places.
Here's my code:
SELECT ROUND(AVG(some_column),2)
FROM table
I get the following error:
PG::Error: ERROR: function round(double precision, integer) does
not exist (Sequel::DatabaseError)
I get no error when I run the following code:
SELECT ROUND(AVG(some_column))
FROM table
Does anyone know what I am doing wrong?
PostgreSQL does not define round(double precision, integer). For reasons #Mike Sherrill 'Cat Recall' explains in the comments, the version of round that takes a precision is only available for numeric.
regress=> SELECT round( float8 '3.1415927', 2 );
ERROR: function round(double precision, integer) does not exist
regress=> \df *round*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+--------+------------------+---------------------+--------
pg_catalog | dround | double precision | double precision | normal
pg_catalog | round | double precision | double precision | normal
pg_catalog | round | numeric | numeric | normal
pg_catalog | round | numeric | numeric, integer | normal
(4 rows)
regress=> SELECT round( CAST(float8 '3.1415927' as numeric), 2);
round
-------
3.14
(1 row)
(In the above, note that float8 is just a shorthand alias for double precision. You can see that PostgreSQL is expanding it in the output).
You must cast the value to be rounded to numeric to use the two-argument form of round. Just append ::numeric for the shorthand cast, like round(val::numeric,2).
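Applied to the query from the question, that looks like the following sketch (table and some_column are the question's placeholders):
SELECT ROUND(AVG(some_column)::numeric, 2)
FROM table;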
If you're formatting for display to the user, don't use round. Use to_char (see: data type formatting functions in the manual), which lets you specify a format and gives you a text result that isn't affected by whatever weirdness your client language might do with numeric values. For example:
regress=> SELECT to_char(float8 '3.1415927', 'FM999999999.00');
to_char
---------------
3.14
(1 row)
to_char will round numbers for you as part of formatting. The FM prefix tells to_char that you don't want any padding with leading spaces.
Try also the old syntax for casting,
SELECT ROUND( AVG(some_column)::numeric, 2 ) FROM table;
which works with any version of PostgreSQL. But, as a definitive solution, you can overload the ROUND function.
Overloading as casting strategy
CREATE FUNCTION ROUND(float,int) RETURNS NUMERIC AS $f$
SELECT ROUND( CAST($1 AS numeric), $2 )
$f$ language SQL IMMUTABLE;
Now your statement will work fine; try this complete comparison:
SELECT trunc(n,3), round(n,3) n_round, round(f,3) f_round,
pg_typeof(n) n_type, pg_typeof(f) f_type, pg_typeof(round(f,3)) f_round_type
FROM (SELECT 2.0/3.0, 2/3::float) t(n,f);
 trunc | n_round | f_round | n_type  |      f_type      | f_round_type
-------+---------+---------+---------+------------------+--------------
 0.666 |   0.667 |   0.667 | numeric | double precision | numeric
The ROUND(float,int) function is f_round; it returns a (decimal) NUMERIC datatype, which is fine for some applications: problem solved!
In other applications we also need a float as the result. An alternative is to use round(f,3)::float or to create a round_tofloat() function.
Another alternative, overloading the ROUND function again and using the full accuracy-precision range of a floating-point number, is to return a float when the accuracy is given (see IanKenney's answer):
CREATE FUNCTION ROUND(
input float, -- the input number
accuracy float -- accuracy, the "counting unit"
) RETURNS float AS $f$
SELECT ROUND($1/accuracy)*accuracy
$f$ language SQL IMMUTABLE;
Try
SELECT round(21.04, 0.05); -- 21.05 float!
SELECT round(21.04, 5::float); -- 20
SELECT round(1/3., 0.0001); -- 0.3333
SELECT round(2.8+1/3., 0.5); -- 3
SELECT round(pi(), 0.0001); -- 3.1416
PS: the command \df round, in psql after these overloads, will show something like this table:
Schema | Name | Result | Argument
------------+-------+---------+------------------
myschema | round | numeric | float, int
myschema | round | float | float, float
pg_catalog | round | float | float
pg_catalog | round | numeric | numeric
pg_catalog | round | numeric | numeric, int
where float is a synonym for double precision, and myschema is public when you don't use a schema. The pg_catalog functions are the default ones; see the guide to the built-in math functions.
Rounding and formatting
The to_char function applies rounding internally, so, when your aim is only to show a final result in the terminal, you can use the FM modifier as a prefix to a numeric format pattern:
SELECT round(x::numeric,2), trunc(x::numeric,2), to_char(x, 'FM99.99')
FROM (SELECT 2.0/3) t(x);
 round | trunc | to_char
-------+-------+---------
  0.67 |  0.66 | .67
NOTES
Cause of the problem
There is a lack of overloads for some PostgreSQL functions. Why? I think it is a gap, but #CraigRinger, #Catcall and the PostgreSQL team agree on "pg's historic rationale".
Note about performance and reuse
The built-in functions, such as ROUND in pg_catalog, can be overloaded with no performance loss compared with writing the cast directly. Two precautions must be taken when implementing user-defined cast functions for high performance:
The IMMUTABLE clause is very important for code snippets like this because, as the guide says, it "allows the optimizer to pre-evaluate the function when a query calls it with constant arguments".
PL/pgSQL is the preferred language, except for "pure SQL" snippets like this one. For JIT optimization (and sometimes for parallelism), LANGUAGE SQL can obtain better optimizations; it behaves like copy/pasting a small piece of code instead of making a function call.
Conclusion: the above ROUND(float,int) function, after these optimizations, is as fast as #CraigRinger's answer; it will compile to (exactly) the same internal representation. So, although it is not standard for PostgreSQL, it can be standard for your projects, via a centralized and reusable "library of snippets" like pg_pubLib.
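A quick, hedged way to check both points, assuming the ROUND(float,int) overload above has been created: with an IMMUTABLE LANGUAGE SQL wrapper and constant arguments, EXPLAIN VERBOSE should show the pre-computed constant in its Output line instead of a run-time call to round (the exact plan text varies by version):
EXPLAIN (VERBOSE, COSTS OFF)
SELECT round(2/3::float, 2);
-- expected shape (illustrative):
-- Result
--   Output: 0.67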
Round to the nth bit or other numeric representation
Some people argue that it doesn't make sense for PostgreSQL to round a number of float datatype decimally, because float is a binary representation: it would require rounding to a number of bits, or to digits of its hexadecimal representation.
Well, let's solve the problem by adding an exotic suggestion... The aim here is to return a float type from another overloaded function, ROUND(float, text, int) RETURNS float. The text argument offers a choice between
'dec' for "decimal representation",
'bin' for "binary" representation and
'hex' for hexadecimal representation.
So, in different representations we have a different interpretation of the number of digits to be rounded. Approximating a number x by a shorter value, with fewer "fractional digits" (than its original d digits), means something different when d is counting binary digits instead of decimal or hexadecimal ones.
It is not easy without C++; using "pure SQL", these code snippets illustrate the idea and can be used as a workaround:
-- Looking for a round_bin() function! this is only a workaround:
CREATE FUNCTION trunc_bin(x bigint, t int) RETURNS bigint AS $f$
SELECT ((x::bit(64) >> t) << t)::bigint;
$f$ language SQL IMMUTABLE;
CREATE FUNCTION ROUND(
x float,
xtype text, -- 'bin', 'dec' or 'hex'
xdigits int DEFAULT 0
)
RETURNS FLOAT AS $f$
SELECT CASE
WHEN xtype NOT IN ('dec','bin','hex') THEN 'NaN'::float
WHEN xdigits=0 THEN ROUND(x)
WHEN xtype='dec' THEN ROUND(x::numeric,xdigits)
ELSE (s1 ||'.'|| s2)::float
END
FROM (
SELECT s1,
lpad(
trunc_bin( s2::bigint, CASE WHEN xd<bin_bits THEN bin_bits - xd ELSE 0 END )::text,
l2,
'0'
) AS s2
FROM (
SELECT *,
(floor( log(2,s2::numeric) ) +1)::int AS bin_bits, -- most significant bit position
CASE WHEN xtype='hex' THEN xdigits*4 ELSE xdigits END AS xd
FROM (
SELECT s[1] AS s1, s[2] AS s2, length(s[2]) AS l2
FROM (SELECT regexp_split_to_array(x::text,'\.')) t1a(s)
) t1b
) t1c
) t2
$f$ language SQL IMMUTABLE;
Try
SELECT round(1/3.,'dec',4); -- 0.3333 float!
SELECT round(2.8+1/3.,'dec',1); -- 3.1 float!
SELECT round(2.8+1/3.,'dec'); -- ERROR, need to cast string
SELECT round(2.8+1/3.,'dec'::text); -- 3 float
SELECT round(2.8+1/3.,'dec',0); -- 3 float
SELECT round(2.8+1/3.,'hex',0); -- 3 float (no change)
SELECT round(2.8+1/3.,'hex',1); -- 3.1266
SELECT round(2.8+1/3.,'hex',3); -- 3.13331578486784
SELECT round(2.8+1/3.,'bin',1); -- 3.1125899906842625
SELECT round(2.8+1/3.,'bin',6); -- 3.1301821767286784
SELECT round(2.8+1/3.,'bin',12); -- 3.13331578486784
And \df round now also shows:
Schema | Name | Result | Argument
------------+-------+---------+---------------
myschema | round | float | x float, xtype text, xdigits int DEFAULT 0
Try with this:
SELECT to_char (2/3::float, 'FM999999990.00');
-- RESULT: 0.67
Or simply:
SELECT round (2/3::DECIMAL, 2)::TEXT
-- RESULT: 0.67
You can use the function below:
SELECT TRUNC(14.568,2);
The result will show:
14.56
You can also cast your variable to the desired type:
SELECT TRUNC(YOUR_VAR::numeric,2)
SELECT ROUND(SUM(amount)::numeric, 2) AS total_amount
FROM transactions
Gives: 200234.08
Try casting your column to a numeric like:
SELECT ROUND(cast(some_column as numeric),2) FROM table
According to Bryan's response, you can do this to limit decimals in a query. I convert from km/h to m/s and display it in dygraphs, but when I did the rounding in dygraphs it looked weird. It looks fine when doing the calculation in the query instead. This is on PostgreSQL 9.5.1.
select date,(wind_speed/3.6)::numeric(7,1) from readings;
Error: function round(double precision, integer) does not exist
Solution: you need to add a type cast; then it will work.
Ex: round(extract(second from job_end_time_t)::integer, 0)

PostgreSQL - rounding floating point numbers

I have a newbie question about floating point numbers in PostgreSQL 9.2.
Is there a function to round a floating point number directly, i.e. without having to convert the number to a numeric type first?
Also, I would like to know whether there is a function to round by an arbitrary unit of measure, such as to nearest 0.05?
When casting the number into a decimal form first, the following query works perfectly:
SELECT round(1/3.::numeric,4);
round
--------
0.3333
(1 row)
Time: 0.917 ms
However, what really I'd like to achieve is something like the following:
SELECT round(1/3.::float,4);
which currently gives me the following error:
ERROR: function round(double precision, integer) does not exist at character 8
Time: 0.949 ms
Thanks
Your workaround solution works with any version of PostgreSQL,
SELECT round(1/3.::numeric,4);
But the answer to "Is there a function to round a floating point number directly?" is no.
The cast problem
You are reporting a well-known "gap": there is a lack of overloads for some PostgreSQL functions... Why? I think it is a gap, but #CraigRinger, #Catcall (see comments at Craig's answer) and the PostgreSQL team agree on "PostgreSQL's historic rationale".
The solution is to develop a centralized and reusable "library of snippets", like pg_pubLib. It implements the strategy described below.
Overloading as casting strategy
You can overload the built-in ROUND function with:
CREATE FUNCTION ROUND(float,int) RETURNS NUMERIC AS $f$
SELECT ROUND($1::numeric,$2);
$f$ language SQL IMMUTABLE;
Now your dream will be reality; try
SELECT round(1/3.,4); -- 0.3333 numeric
It returns a (decimal) NUMERIC datatype, which is fine for some applications... An alternative is to use round(1/3.,4)::float or to create a round_tofloat() function.
Another alternative, to preserve the input datatype and use the full accuracy-precision range of a floating-point number (see IanKenney's answer), is to return a float when the accuracy is given:
CREATE or replace FUNCTION ROUND(
input float, -- the input number
accuracy float -- accuracy
) RETURNS float AS $f$
SELECT ROUND($1/accuracy)*accuracy
$f$ language SQL IMMUTABLE;
COMMENT ON FUNCTION ROUND(float,float) IS 'ROUND by accuracy.';
Try
SELECT round(21.04, 0.05); -- 21.05
SELECT round(21.04, 5::float); -- 20
SELECT round(pi(), 0.0001); -- 3.1416
SELECT round(1/3., 0.0001); -- 0.33330000000000004 (oops!)
To avoid this floating-point noise (internal information loss), you can "clean" the result, for example by truncating it to 9 digits:
CREATE or replace FUNCTION ROUND9(
input float, -- the input number
accuracy float -- accuracy
) RETURNS float AS $f$
SELECT (ROUND($1/accuracy)*accuracy)::numeric(99,9)::float
$f$ language SQL IMMUTABLE;
Try
SELECT round9(1/3., 0.00001); -- 0.33333 float, solved!
SELECT round9(1/3., 0.005); -- 0.335 float, ok!
PS: the command \df round, in psql after these overloads, will show something like this table:
Schema | Name | Result | Argument
------------+-------+---------+------------------
myschema | round | numeric | float, int
myschema | round | float | float, float
pg_catalog | round | float | float
pg_catalog | round | numeric | numeric
pg_catalog | round | numeric | numeric, int
where float is a synonym for double precision, and myschema is public when you don't use a schema. The pg_catalog functions are the default ones; see the guide to the built-in math functions.
More details
See a complete Wiki answer here.
You can accomplish this by doing something along the lines of
select round( (21.04 /0.05 ),0)*0.05
where 21.04 is the number to round and 0.05 is the accuracy.
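Worked through: 21.04 / 0.05 = 420.8, round(420.8, 0) = 421, and 421 * 0.05 = 21.05, i.e. 21.04 rounded to the nearest 0.05.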

How to create a custom windowing function for PostgreSQL? (Running Average Example)

I would really like to better understand what is involved in creating a UDF that operates over windows in PostgreSQL. I did some searching about how to create UDFs in general, but haven't found an example of how to do one that operates over a window.
To that end I am hoping that someone would be willing to share code for how to write a UDF (it can be in C, PL/pgSQL or any of the procedural languages supported by PostgreSQL) that calculates the running average of numbers in a window. I realize there are ways to do this by applying the standard average aggregate function with the windowing syntax (the ROWS BETWEEN syntax, I believe); I am simply asking for this functionality because I think it makes a good, simple example. Also, I think if there were a windowing version of the average function, then the database could keep a running sum and observation count and wouldn't sum up almost identical sets of rows at each iteration.
You have to look at the PostgreSQL source code, postgresql/src/backend/utils/adt/windowfuncs.c and postgresql/src/backend/executor/nodeWindowAgg.c.
There is no good documentation :( -- a fully functional window function can be implemented only in C or PL/v8; there is no API for other languages.
http://www.pgcon.org/2009/schedule/track/Version%208.4/128.en.html is a presentation from the author of the implementation in PostgreSQL.
I found only one non-core implementation - http://api.pgxn.org/src/kmeans/kmeans-1.1.0/
http://pgxn.org/dist/plv8/1.3.0/doc/plv8.html
According to the documentation "Other window functions can be added by the user. Also, any built-in or user-defined normal aggregate function can be used as a window function." (section 4.2.8). That worked for me for computing stock split adjustments:
CREATE OR REPLACE FUNCTION prod(float8, float8) RETURNS float8
AS 'SELECT $1 * $2;'
LANGUAGE SQL IMMUTABLE STRICT;
CREATE AGGREGATE prods ( float8 ) (
SFUNC = prod,
STYPE = float8,
INITCOND = 1.0
);
create or replace view demo.price_adjusted as
select id, vd,
prods(sdiv) OVER (PARTITION by id ORDER BY vd DESC ROWS UNBOUNDED PRECEDING) as adjf,
rawprice * prods(sdiv) OVER (PARTITION by id ORDER BY vd DESC ROWS UNBOUNDED PRECEDING) as price
from demo.prices_raw left outer join demo.adjustments using (id,vd);
Here are the schemas of the two tables:
CREATE TABLE demo.prices_raw (
id VARCHAR(30),
vd DATE,
rawprice float8 );
CREATE TABLE demo.adjustments (
id VARCHAR(30),
vd DATE,
sdiv float);
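As a self-contained sanity check of prods used as a window function (hypothetical inline data, not the demo tables above), the running product should read 2, 6, 24:
SELECT x,
       prods(x) OVER (ORDER BY x ROWS UNBOUNDED PRECEDING) AS running_product
FROM (VALUES (2.0::float8), (3.0), (4.0)) AS t(x);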
Starting with table
payments
+------------------------------+
| customer_id | amount | item |
| 5 | 10 | book |
| 5 | 71 | mouse |
| 7 | 13 | cover |
| 7 | 22 | cable |
| 7 | 19 | book |
+------------------------------+
SELECT customer_id,
AVG(amount) OVER (PARTITION BY customer_id) AS avg_amount,
item
FROM payments;
we get
+----------------------------------+
| customer_id | avg_amount | item |
| 5 | 40.5 | book |
| 5 | 40.5 | mouse |
| 7 | 18 | cover |
| 7 | 18 | cable |
| 7 | 18 | book |
+----------------------------------+
AVG being an aggregate function, it can act as a window function. However, not all window functions are aggregate functions; the aggregate functions are the non-sophisticated window functions.
In the query above, let's not use the built-in AVG function but our own implementation. It does the same thing, just implemented by the user. The query above becomes:
SELECT customer_id,
my_avg(amount) OVER (PARTITION BY customer_id) AS avg_amount,
item
FROM payments;
The only difference from the former query is that AVG has been replaced with my_avg. We now need to implement our custom function.
On how to compute the average
Sum up all the elements, then divide by the number of elements. For customer_id of 7, that would be (13 + 22 + 19) / 3 = 18.
We can divide it into:
a step-by-step accumulation -- the sum.
a final operation -- division.
On how the aggregate function gets to the result
The average is computed in steps. Only the last value is necessary.
Start with an initial value of 0.
Feed 13. Compute the intermediate/accumulated sum, which is 13.
Feed 22. Compute the accumulated sum, which needs the previous sum plus this element: 13 + 22 = 35
Feed 19. Compute the accumulated sum, which needs the previous sum plus this element: 35 + 19 = 54. This is the total that needs to be divided by the number of elements (3).
The result of step 3 is fed to another function that knows how to divide the accumulated sum by the number of elements.
What happened here is that the state started with the initial value of 0 and was changed with every step, then passed to the next step.
State travels between steps for as long as there is data. When all data is consumed state goes to a final function (terminal operation). We want the state to contain all the information needed for the accumulator as well as by the terminal operation.
In the specific case of computing the average, the terminal operation needs to know how many elements the accumulator worked with because it needs to divide by that. For that reason, the state needs to include both the accumulated sum and the number of elements.
We need a tuple that will contain both. The pre-defined PostgreSQL POINT type to the rescue: POINT(5, 89) means an accumulated sum over 5 elements with the value 89. The initial state will be POINT(0,0).
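As an aside, the two components of such a state can be read back with point subscripts, which is what the functions below rely on (a quick check with an arbitrary example point):
SELECT (POINT(3, 54))[0] AS element_count,   -- 3
       (POINT(3, 54))[1] AS accumulated_sum; -- 54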
The accumulator is implemented in what's called a state function. The terminal operation is implemented in what's called a final function.
When defining a custom aggregate function we need to specify:
the aggregate function name and the type of its input argument
the initial state
the type of the state that the infrastructure will pass between steps and to the final function
a state function -- knows how to perform the accumulation steps
a final function -- knows how to perform the terminal operation. Not always needed (e.g. in a custom implementation of SUM the final value of the accumulated sum is the result.)
Here's the definition for the custom aggregate function.
CREATE AGGREGATE my_avg (NUMERIC) ( -- NUMERIC is the type of the input argument
initcond = '(0,0)', -- this is the initial state of type POINT
stype = POINT, -- this is the type of the state that will be passed between steps
sfunc = my_acc, -- this is the function that knows how to fold a new element into the existing state. Takes in the state (type POINT) and an element for the step (type NUMERIC)
finalfunc = my_final_func -- returns the result for the aggregate function. Takes in the state of type POINT (like all other steps) and returns the result as what the aggregate function returns - NUMERIC
);
The only thing left is to define two functions my_acc and my_final_func.
CREATE FUNCTION my_acc (state POINT, elem_for_step NUMERIC) -- performs accumulated sum
RETURNS POINT
LANGUAGE SQL
AS $$
-- state[0] is the number of elements, state[1] is the accumulated sum
SELECT POINT(state[0]+1, state[1] + elem_for_step);
$$;
CREATE FUNCTION my_final_func (POINT) -- performs division and returns the final value
RETURNS NUMERIC
LANGUAGE SQL
AS $$
-- $1[1] is the sum, $1[0] is the number of elements
SELECT ($1[1]/$1[0])::NUMERIC;
$$;
Now that the functions are available, the CREATE AGGREGATE statement defined above will run successfully. With the aggregate defined, the query based on my_avg instead of the built-in AVG can be run:
SELECT customer_id,
my_avg(amount) OVER (PARTITION BY customer_id) AS avg_amount,
item
FROM payments;
The results are identical with what you get when using the built-in AVG.
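A minimal way to sanity-check the aggregate on its own, without the OVER clause, is to feed it the three amounts of customer 7 as hypothetical inline data and expect 18:
SELECT my_avg(amount) AS avg_amount
FROM (VALUES (13), (22), (19)) AS t(amount);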
The PostgreSQL documentation suggests that users are limited to implementing user-defined aggregate functions (rather than true window functions):
In addition to these [pre-defined window] functions, any built-in or user-defined general-purpose or statistical aggregate (i.e., not ordered-set or hypothetical-set aggregates) can be used as a window function;
What I suspect ordered-set or hypothetical-set aggregates means:
the value returned is identical for all rows in the group (e.g. AVG and SUM; in contrast, RANK returns different values for the rows in a group, depending on more sophisticated criteria)
it makes no sense to ORDER BY when PARTITIONing, because the values are the same for all rows anyway; in contrast, we do want to ORDER BY when using RANK()
Query:
SELECT customer_id, item, rank() OVER (PARTITION BY customer_id ORDER BY amount desc) FROM payments;
Geometric mean
The following is a user-defined aggregate function for which I found no built-in equivalent; it may be useful to some.
The state function computes the average of the natural logarithms of the terms.
The final function raises constant e to whatever the accumulator provides.
CREATE OR REPLACE FUNCTION sum_of_log(state POINT, curr_val NUMERIC)
RETURNS POINT
LANGUAGE SQL
AS $$
SELECT POINT(state[0] + 1,
(state[1] * state[0]+ LN(curr_val))/(state[0] + 1));
$$;
CREATE OR REPLACE FUNCTION e_to_avg_of_log(POINT)
RETURNS NUMERIC
LANGUAGE SQL
AS $$
select exp($1[1])::NUMERIC;
$$;
CREATE AGGREGATE geo_mean (NUMERIC)
(
stype = POINT,
initcond = '(0,0)', -- represents a POINT value
sfunc = sum_of_log,
finalfunc = e_to_avg_of_log
);
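And a quick check of geo_mean as a plain aggregate, using hypothetical values whose geometric mean is 4 (expect possible floating-point noise in the last digits):
SELECT geo_mean(x) AS geometric_mean
FROM (VALUES (2.0), (8.0)) AS t(x);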
PL/R provides such functionality. See here for some examples. That said, I'm not sure that it (currently) meets your requirement of "keep[ing] a running sum and observation count and [not] sum[ming] up almost identical sets of rows at each iteration" (see here).