calculation in postgresql function - postgresql

I'm trying to create function that serves as replacement for postgresql sum function in my query.
query itself it LOOONG and something like this
SELECT .... sum(tax) as tax ... from (SUBQUERY) as bar group by id;
I want to replace this sum function with my own which alters the end value by tax calculation.
So I created pgsql function like this:
DECLARE
v double precision;
BEGIN
IF val > 256 THEN
v := 256;
ELSE
v := val;
END IF;
RETURN (v*0.21)/0.79;
END;
That function uses double precision as input. Using this function gives me error though:
column "bar.tax" must appear in the GROUP BY clause or be used in an aggregate function
So I tried to use double precision[] as input type for function - in that case the error was telling me that there was no postgresql function matching the name and given input type.
So is there a way to replace sum function in my query by using pgsql functions or not - I don't want to change any other parts of my query unless I really, Really, REALLY have to.

Look into CREATE AGGREGATE for aggregate functions. And in your expression you can do something like sum(LEAST(tax,256)*21/79) if I read you right.

Related

PostgreSQL - Writing aggregate function which do average or percentile

I would like to create aggregate function
avg_or_percentile(column, type::text)
which aggregate by average or given percentile depending on parameter 'type' which can take values like:
'avg', '50', '99' etc.
If 'avg' - aggregate using 'avg' function, if other - use percentile.
I don't see any possibility to include default avg function or aggregate function in my custom one.
If i try doing something like this:
CREATE OR REPLACE FUNCTION avg_or_percentile_transition(state internal, value double precision)
RETURNS internal AS $$
BEGIN
CASE agg_type
WHEN 'avg' THEN
return int8_avg_accum(state, value);
ELSE
return ordered_set_transition(state, value);
END CASE;
END;
$$ LANGUAGE plpgsql;
I am getting error, that internal type cannot be used.
Is the only way to achieve this to decompose them and combine both implementations in a custom one?
Am I missing something?
EDIT 1:
Right now I am achieving it using query like this ($ are macros from grafana):
Select
$__timeGroup("Closed", $__interval, 0) as "time",
CASE
WHEN '${Func}'='avg'
THEN avg(duration)
ELSE percentile_cont(try_cast_numeric('0.${Func}')) within group (order by duration asc)
END as "${Func}"
from
(SELECT
*,
extract(epoch from "Closed"-"Created")/3600 as duration
FROM pr."PullRequests" pr
WHERE $__timeFilter("Closed") and (pr."TeamName" = ${Team} or ${Team} = 'All')) withDuration
GROUP BY time
ORDER BY time
But it is very ugly statement and it gets even worse, when we add additional columns to aggregate. Would love to simplify it.

How to create a user-defined function in Postgresql?

I have the following query in Postgres. I want to have a function where the user can define the value of record_year and record_month using the function call, dynamically without having to specify in the query statement. That way the same query can be reused for different user input values for record_year and record_month (in this case). Can anybody help me out?
SELECT x.sid, x.record_year, x.record_month, y.addr
FROM x
FULL OUTER JOIN y
ON x.sid = y.sid
WHERE x.record_time='23:50'
and x.record_year='2020'
and x.record_month='1'
group by x.station_id, x.record_year, x.record_month, y.addr;
The function call could be something like, select function_name(record_year, record_month);
As you are returning a result the function should be defined as returns table(). PL/pgSQL is not required if all you want to do is to return a result. A SQL function will be enough.
create function get_data(p_time time, p_year int, p_month int)
returns table (sid int, record_year int, record_month int, addr text)
as
$$
SELECT x.sid, x.record_year, x.record_month, y.addr
FROM x
FULL OUTER JOIN y ON x.sid = y.sid
WHERE x.record_time p_time
and x.record_year = p_year
and x.record_month = p_month
group by x.station_id, x.record_year, x.record_month, y.addr
$$
language sql
stable;
You have to adjust the data types of the returned columns - I have only guessed them based on name.
Note that your full outer join is really a left join because of the conditions on table x
As it is a set returning function, use it like a table in the FROM clause:
select *
from get_data(time '23:50', 2020, 1);

Create function to compute an average of 3 values

I'm trying to write a function in postgre sql to take an average across three columns. I have written the following function:
create function xcol_avg (col1, col2, col3)
returns numeric as $$
begin
return (coalesce(col1, 0) + coalesce(col2,0) +coalesce(col3, 0))/
case when (col 1 is null or col1 = 0 then 0 else 1 end +
case when (col 2 is null or col2 = 0 then 0 else 1 end +
case when (col 3 is null or col3 = 0 then 0 else 1 end;
end
What is the problem with my code? Also, is there a way to get the function to return null if it ends up dividing by 0? Any help is really appreciated.
Thanks!
Actually, you can make a function that will use a variable number of arguments and depending on their number compute the average. In Postgres there's a word VARIADIC for such things:
SQL functions can be declared to accept variable numbers of arguments, so long as all the "optional" arguments are of the same data type
Function code:
CREATE FUNCTION xcol_avg(numeric, VARIADIC numeric[])
RETURNS numeric
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
RETURN (SELECT AVG(vals) FROM unnest($2 || ARRAY[$1]) t(vals));
END;
$$;
Use case with different number of arguments:
select xcol_avg(1,6); -- returns 3.5
select xcol_avg(1,5.5,4); -- returns 3.5
select xcol_avg(1,2,3,4,5,6,7); -- returns 4
Click on this Button to try this online.
Explanation:
Marking a function as IMMUTABLE improves the execution time by allowing the optimizer to pre-evaluate the function. Immutable functions cannot modify the database and are guaranteed to always return the same results when called with the same input.
Declaring the last parameter of a function as VARIADIC which has to be of an array type lets you provide optional arguments that will be passed to the function as an array. Note that you don't explicitly write the array, you just list your parameters as you normally would.
unnest() is a function that returns a set of rows by expanding an array. In other words it's "unpacking" the array elements into separate rows
|| is an array operator that provides the array-to-array concatenation. Here it serves the purpose of connecting the first (required) argument with the rest given in a VARIADIC array.
AVG() is an aggregate function that computes an average of all input values. In our case it would take "unpacked" rows from a column named vals and compute the average.
With this solution you don't need to worry about dividing by zero, as at least one argument is required and avg() is doing the job you wanted to do manually by building up the denominator.
Apply it in a query:
This function would also work for computing an average of multiple columns in a row. Consider a table tbl with columns name, cost1, cost2, cost3 and below statement:
SELECT
name, cost1, cost2, cost3,
xcol_avg(cost1, cost2, cost3) AS average_cost
FROM tbl
For more general information about CREATE FUNCTION check the resourceful documentation.

Trying to create aggregate function in PostgreSQL

I'm trying to create new aggregate function in PostgreSQL to use instead of the sum() function
I started my journey in the manual here.
Since I wanted to create a function that takes an array of double precision values, sums them and then does some additional calculations I first created that final function:
takes double precision as input and gives double precision as output
DECLARE
v double precision;
BEGIN
IF tax > 256 THEN
v := 256;
ELSE
v := tax;
END IF;
RETURN v*0.21/0.79;
END;
Then I wanted to create the aggregate function that takes an array of double precision values and puts out a single double precision value for my previous function to handle.
CREATE AGGREGATE aggregate_ee_income_tax (float8[]) (
sfunc = array_agg
,stype = float8
,initcond = '{}'
,finalfunc = eeincometax);
What I get when I run that command is:
ERROR: function array_agg(double precision, double precision[]) does
not exist
I'm somewhat stuck here, because the manual lists array_agg() as existing function. What am I doing wrong?
Also, when I run:
\da
List of aggregate functions
Schema | Name | Result data type | Argument data types | Description
--------+------+------------------+---------------------+-------------
(0 rows)
My installation has no aggregate functions at all? Or does only list user defined functions?
Basically what I'm trying to understand:
1) Can I use an existing functions to sum up my array values?
2) How can I find out about input and ouptut data types of functions? Docs claim that array_agg() takes any kind of input.
3) What is wrong with my own aggregate function?
Edit 1
To give more information and clearer picture of what I'm trying to achieve:
I have one huge query over several tables which goes something like this:
SELECT sum(tax) ... from (SUBQUERY) as foo group by id
I want to replace that sum function with my own aggregate function so I don't have to do additional calculations on backend - since they can all be done on database level.
Edit 2
Accepted Ants's answer. Since final solution comes from comments I post it here for reference:
CREATE AGGREGATE aggregate_ee_income_tax (float8)
(
sfunc = float8pl
,stype = float8
,initcond = '0.0'
,finalfunc = eeincometax
);
Array agg is an aggregate function not a regular function, so it can't be used as a state transition function for a new aggregate. What you want to do is to create an aggregate function which has a state transition function that is identical to array_agg and a custom final func.
Unfortunately the state transition function of array_agg is defined in terms of an internal datatype so it can't be reused. Fortunately there is an existing function in core that already does what you want.
CREATE AGGREGATE aggregate_ee_income_tax (float8)(
sfunc = array_append,
stype = float8[],
initcond = '{}',
finalfunc = eeincometax);
Also note that you had your types mixed up, you probably want aggregate a set of floats to an array, not a set of arrays to a float.
In addition to #Ants excellent advice:
1.) Your final function could be simplified to:
CREATE FUNCTION eeincometax(float8)
RETURNS float8 LANGUAGE SQL AS
$func$
SELECT (least($1, 256) * 21) / 79
$func$;
2.) It seems like you are dealing with money? In this case I would strongly advise to use the type numeric (preferred) or money for the purpose. Floating point operations are often not precise enough.
3.) The initial condition of the aggregate can simply be just 0:
CREATE AGGREGATE aggregate_ee_income_tax(float8)
(
sfunc = float8pl
,stype = float8
,initcond = 0
,finalfunc = eeincometax
);
4.) In your case (least(sum(tax), 256) * 21) / 79 is probably faster than your custom aggregate. Aggregate functions provided by PostgreSQL are written in C and optimized for performance. I would use that instead.

Are you able to use a custom Postgres comparison function for ORDER BY clauses?

In Python, I can write a sort comparison function which returns an item in the set {-1, 0, 1} and pass it to a sort function like so:
sorted(["some","data","with","a","nonconventional","sort"], custom_function)
This code will sort the sequence according to the collation order I define in the function.
Can I do the equivalent in Postgres?
e.g.
SELECT widget FROM items ORDER BY custom_function(widget)
Edit: Examples and/or pointers to documentation are welcome.
Yes you can, you can even create an functional index to speed up the sorting.
Edit: Simple example:
CREATE TABLE foo(
id serial primary key,
bar int
);
-- create some data
INSERT INTO foo(bar) SELECT i FROM generate_series(50,70) i;
-- show the result
SELECT * FROM foo;
CREATE OR REPLACE FUNCTION my_sort(int) RETURNS int
LANGUAGE sql
AS
$$
SELECT $1 % 5; -- get the modulo (remainder)
$$;
-- lets sort!
SELECT *, my_sort(bar) FROM foo ORDER BY my_sort(bar) ASC;
-- make an index as well:
CREATE INDEX idx_my_sort ON foo ((my_sort(bar)));
The manual is full of examples how to use your own functions, just start playing with it.
SQL: http://www.postgresql.org/docs/current/static/xfunc-sql.html
PL/pgSQL: http://www.postgresql.org/docs/current/static/plpgsql.html
We can avoid confusion about ordering methods using names:
"score function" of standard SQL select * from t order by f(x) clauses, and
"compare function" ("sort function" in the question text) of the Python's sort array method.
The ORDER BY clause of PostgreSQL have 3 mechanisms to sort:
Standard, using an "score function", that you can use also with INDEX.
Special "standard string-comparison alternatives", by collation configuration (only for text, varchar, etc. datatypes).
ORDER BY ... USING clause. See this question or docs example. Example: SELECT * FROM mytable ORDER BY somecol USING ~<~ where ~<~ is an operator, that is embedding a compare function.
Perhaps "standard way" in a RDBMS (as PostgreSQL) is not like Python's standard because indexing is the aim of a RDBMS, and it's easier to index score functions.
Answers to the question:
Direct solution. There are no direct way to use an user-defined function as compare function, like in the sort method of languages like Python or Javascript.
Indirect solution. You can use a user-defined compare function in an user-defined operator, and an user-defined operator class to index it. See at PostgreSQL docs:
CREATE OPERATOR with the compare function;
CREATE OPERATOR CLASS, to be indexable.
Explaining compare functions
In Python, the compare function looks like this:
def compare(a, b):
return 1 if a > b else 0 if a == b else -1
The compare function use less CPU tham a score function. It is usefull also to express order when score funcion is unknown.
See a complete description at
for C language see https://www.gnu.org/software/libc/manual/html_node/Comparison-Functions.html
for Javascript see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort#Description
Other typical compare functions
Wikipedia's example to compare tuples:
function tupleCompare((lefta, leftb, leftc), (righta, rightb, rightc))
if lefta ≠ righta
return compare(lefta, righta)
else if leftb ≠ rightb
return compare(leftb, rightb)
else
return compare(leftc, rightc)
In Javascript:
function compare(a, b) {
if (a is less than b by some ordering criterion) {
return -1;
}
if (a is greater than b by the ordering criterion) {
return 1;
}
// a must be equal to b
return 0;
}
C++ example of PostgreSQL docs:
complex_abs_cmp_internal(Complex *a, Complex *b)
{
double amag = Mag(a),
bmag = Mag(b);
if (amag < bmag)
return -1;
if (amag > bmag)
return 1;
return 0;
}
You could do something like this
SELECT DISTINCT ON (interval_alias) *,
to_timestamp(floor((extract('epoch' FROM index.created_at) / 10)) * 10) AT
TIME ZONE 'UTC' AS interval_alias
FROM index
WHERE index.created_at >= '{start_date}'
AND index.created_at <= '{end_date}'
AND product = '{product_id}'
GROUP BY id, interval_alias
ORDER BY interval_alias;
Firstly you define the parameter that will be your ordering column with AS. It could be function or any SQL expression. Then set it to ORDER BY expression and you're done!
In my opinion, this is the smoothest way to do such an ordering.