PostgreSQL - Writing aggregate function which do average or percentile - postgresql

I would like to create aggregate function
avg_or_percentile(column, type::text)
which aggregate by average or given percentile depending on parameter 'type' which can take values like:
'avg', '50', '99' etc.
If 'avg' - aggregate using 'avg' function, if other - use percentile.
I don't see any possibility to include default avg function or aggregate function in my custom one.
If i try doing something like this:
CREATE OR REPLACE FUNCTION avg_or_percentile_transition(state internal, value double precision)
RETURNS internal AS $$
BEGIN
CASE agg_type
WHEN 'avg' THEN
return int8_avg_accum(state, value);
ELSE
return ordered_set_transition(state, value);
END CASE;
END;
$$ LANGUAGE plpgsql;
I am getting error, that internal type cannot be used.
Is the only way to achieve this to decompose them and combine both implementations in a custom one?
Am I missing something?
EDIT 1:
Right now I am achieving it using query like this ($ are macros from grafana):
Select
$__timeGroup("Closed", $__interval, 0) as "time",
CASE
WHEN '${Func}'='avg'
THEN avg(duration)
ELSE percentile_cont(try_cast_numeric('0.${Func}')) within group (order by duration asc)
END as "${Func}"
from
(SELECT
*,
extract(epoch from "Closed"-"Created")/3600 as duration
FROM pr."PullRequests" pr
WHERE $__timeFilter("Closed") and (pr."TeamName" = ${Team} or ${Team} = 'All')) withDuration
GROUP BY time
ORDER BY time
But it is very ugly statement and it gets even worse, when we add additional columns to aggregate. Would love to simplify it.

Related

Create a postgresql function that returns a string 'fall', 'spring' depending on year

I need to create a function that returns the string 'fall' or 'spring' depending on the month of the year. If the function was named getterm and took no parameters I would like to use it in a select statement like this:
select name, classname, getterm from classtable
where classtable holds the names of the classes we offer. The result set would include the columns as follows:
-Jack, Data Systems, Sp2022
-Jill, Web Stuff, F2023
I have used the now() function. I can also use extract(quarter from now()) to get my current quarter. It would seem simple then to use an 'if' or 'case' clause to return 'spring' or 'fall' based upon the quarters. I just haven't found any examples of a function like this.
Can anyone suggest some sample code ?
Per documentation from here CASE:
There is a “simple” form of CASE expression that is a variant of the general form above:
CASE expression
WHEN value THEN result
[WHEN ...]
[ELSE result]
END
The first expression is computed, then compared to each of the value expressions in the WHEN clauses until one is found that is equal to it. If no match is found, the result of the ELSE clause (or a null value) is returned. This is similar to the switch statement in C.
This translates in your case to:
SELECT
CASE extract('quarter' FROM now())
WHEN 1 THEN
'winter'
WHEN 2 THEN
'spring'
WHEN 3 THEN
'summer'
WHEN 4 THEN
'fall'
END;
case
--------
spring
Thank you to Adrian.. I now have the working function:
create function getTermString() returns text as $$
select case extract(quarter from now())
when 1 then 'sp'
when 2 then 'sp'
when 3 then 'f'
when 4 then 'f'
end ;
$$ language SQL;

How to create a user-defined function in Postgresql?

I have the following query in Postgres. I want to have a function where the user can define the value of record_year and record_month using the function call, dynamically without having to specify in the query statement. That way the same query can be reused for different user input values for record_year and record_month (in this case). Can anybody help me out?
SELECT x.sid, x.record_year, x.record_month, y.addr
FROM x
FULL OUTER JOIN y
ON x.sid = y.sid
WHERE x.record_time='23:50'
and x.record_year='2020'
and x.record_month='1'
group by x.station_id, x.record_year, x.record_month, y.addr;
The function call could be something like, select function_name(record_year, record_month);
As you are returning a result the function should be defined as returns table(). PL/pgSQL is not required if all you want to do is to return a result. A SQL function will be enough.
create function get_data(p_time time, p_year int, p_month int)
returns table (sid int, record_year int, record_month int, addr text)
as
$$
SELECT x.sid, x.record_year, x.record_month, y.addr
FROM x
FULL OUTER JOIN y ON x.sid = y.sid
WHERE x.record_time p_time
and x.record_year = p_year
and x.record_month = p_month
group by x.station_id, x.record_year, x.record_month, y.addr
$$
language sql
stable;
You have to adjust the data types of the returned columns - I have only guessed them based on name.
Note that your full outer join is really a left join because of the conditions on table x
As it is a set returning function, use it like a table in the FROM clause:
select *
from get_data(time '23:50', 2020, 1);

postgres `order by` argument type

What is the argument type for the order by clause in Postgresql?
I came across a very strange behaviour (using Postgresql 9.5). Namely, the query
select * from unnest(array[1,4,3,2]) as x order by 1;
produces 1,2,3,4 as expected. However the query
select * from unnest(array[1,4,3,2]) as x order by 1::int;
produces 1,4,3,2, which seems strange. Similarly, whenever I replace 1::int with whatever function (e.g. greatest(0,1)) or even case operator, the results are unordered (on the contrary to what I would expect).
So which type should an argument of order by have, and how do I get the expected behaviour?
This is expected (and documented) behaviour:
A sort_expression can also be the column label or number of an output column
So the expression:
order by 1
sorts by the first column of the result set (as defined by the SQL standard)
However the expression:
order by 1::int
sorts by the constant value 1, it's essentially the same as:
order by 'foo'
By using a constant value for the order by all rows have the same sort value and thus aren't really sorted.
To sort by an expression, just use that:
order by
case
when some_column = 'foo' then 1
when some_column = 'bar' then 2
else 3
end
The above sorts the result based on the result of the case expression.
Actually I have a function with an integer argument which indicates the column to be used in the order by clause.
In a case when all columns are of the same type, this can work: :
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1
WHEN 2 THEN column2
.....
WHEN 1235 THEN column1235
END
If columns are of different types, you can try:
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1::varchar
WHEN 2 THEN column2::varchar
.....
WHEN 1235 THEN column1235::varchar
END
But these "workarounds" are horrible. You need some other approach than the function returning a column number.
Maybe a dynamic SQL ?
I would say that dynamic SQL (thanks #kordirko and the others for the hints) is the best solution to the problem I originally had in mind:
create temp table my_data (
id serial,
val text
);
insert into my_data(id, val)
values (default, 'a'), (default, 'c'), (default, 'd'), (default, 'b');
create function fetch_my_data(col text)
returns setof my_data as
$f$
begin
return query execute $$
select * from my_data
order by $$|| quote_ident(col);
end
$f$ language plpgsql;
select * from fetch_my_data('val'); -- order by val
select * from fetch_my_data('id'); -- order by id
In the beginning I thought this could be achieved using case expression in the argument of the order by clause - the sort_expression. And here comes the tricky part which confused me: when sort_expression is a kind of identifier (name of a column or a number of a column), the corresponding column is used when ordering the results. But when sort_expression is some value, we actually order the results using that value itself (computed for each row). This is #a_horse_with_no_name's answer rephrased.
So when I queried ... order by 1::int, in a way I have assigned value 1 to each row and then tried to sort an array of ones, which clearly is useless.
There are some workarounds without dynamic queries, but they require writing more code and do not seem to have any significant advantages.

calculation in postgresql function

I'm trying to create function that serves as replacement for postgresql sum function in my query.
query itself it LOOONG and something like this
SELECT .... sum(tax) as tax ... from (SUBQUERY) as bar group by id;
I want to replace this sum function with my own which alters the end value by tax calculation.
So I created pgsql function like this:
DECLARE
v double precision;
BEGIN
IF val > 256 THEN
v := 256;
ELSE
v := val;
END IF;
RETURN (v*0.21)/0.79;
END;
That function uses double precision as input. Using this function gives me error though:
column "bar.tax" must appear in the GROUP BY clause or be used in an aggregate function
So I tried to use double precision[] as input type for function - in that case the error was telling me that there was no postgresql function matching the name and given input type.
So is there a way to replace sum function in my query by using pgsql functions or not - I don't want to change any other parts of my query unless I really, Really, REALLY have to.
Look into CREATE AGGREGATE for aggregate functions. And in your expression you can do something like sum(LEAST(tax,256)*21/79) if I read you right.

Are you able to use a custom Postgres comparison function for ORDER BY clauses?

In Python, I can write a sort comparison function which returns an item in the set {-1, 0, 1} and pass it to a sort function like so:
sorted(["some","data","with","a","nonconventional","sort"], custom_function)
This code will sort the sequence according to the collation order I define in the function.
Can I do the equivalent in Postgres?
e.g.
SELECT widget FROM items ORDER BY custom_function(widget)
Edit: Examples and/or pointers to documentation are welcome.
Yes you can, you can even create an functional index to speed up the sorting.
Edit: Simple example:
CREATE TABLE foo(
id serial primary key,
bar int
);
-- create some data
INSERT INTO foo(bar) SELECT i FROM generate_series(50,70) i;
-- show the result
SELECT * FROM foo;
CREATE OR REPLACE FUNCTION my_sort(int) RETURNS int
LANGUAGE sql
AS
$$
SELECT $1 % 5; -- get the modulo (remainder)
$$;
-- lets sort!
SELECT *, my_sort(bar) FROM foo ORDER BY my_sort(bar) ASC;
-- make an index as well:
CREATE INDEX idx_my_sort ON foo ((my_sort(bar)));
The manual is full of examples how to use your own functions, just start playing with it.
SQL: http://www.postgresql.org/docs/current/static/xfunc-sql.html
PL/pgSQL: http://www.postgresql.org/docs/current/static/plpgsql.html
We can avoid confusion about ordering methods using names:
"score function" of standard SQL select * from t order by f(x) clauses, and
"compare function" ("sort function" in the question text) of the Python's sort array method.
The ORDER BY clause of PostgreSQL have 3 mechanisms to sort:
Standard, using an "score function", that you can use also with INDEX.
Special "standard string-comparison alternatives", by collation configuration (only for text, varchar, etc. datatypes).
ORDER BY ... USING clause. See this question or docs example. Example: SELECT * FROM mytable ORDER BY somecol USING ~<~ where ~<~ is an operator, that is embedding a compare function.
Perhaps "standard way" in a RDBMS (as PostgreSQL) is not like Python's standard because indexing is the aim of a RDBMS, and it's easier to index score functions.
Answers to the question:
Direct solution. There are no direct way to use an user-defined function as compare function, like in the sort method of languages like Python or Javascript.
Indirect solution. You can use a user-defined compare function in an user-defined operator, and an user-defined operator class to index it. See at PostgreSQL docs:
CREATE OPERATOR with the compare function;
CREATE OPERATOR CLASS, to be indexable.
Explaining compare functions
In Python, the compare function looks like this:
def compare(a, b):
return 1 if a > b else 0 if a == b else -1
The compare function use less CPU tham a score function. It is usefull also to express order when score funcion is unknown.
See a complete description at
for C language see https://www.gnu.org/software/libc/manual/html_node/Comparison-Functions.html
for Javascript see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort#Description
Other typical compare functions
Wikipedia's example to compare tuples:
function tupleCompare((lefta, leftb, leftc), (righta, rightb, rightc))
if lefta ≠ righta
return compare(lefta, righta)
else if leftb ≠ rightb
return compare(leftb, rightb)
else
return compare(leftc, rightc)
In Javascript:
function compare(a, b) {
if (a is less than b by some ordering criterion) {
return -1;
}
if (a is greater than b by the ordering criterion) {
return 1;
}
// a must be equal to b
return 0;
}
C++ example of PostgreSQL docs:
complex_abs_cmp_internal(Complex *a, Complex *b)
{
double amag = Mag(a),
bmag = Mag(b);
if (amag < bmag)
return -1;
if (amag > bmag)
return 1;
return 0;
}
You could do something like this
SELECT DISTINCT ON (interval_alias) *,
to_timestamp(floor((extract('epoch' FROM index.created_at) / 10)) * 10) AT
TIME ZONE 'UTC' AS interval_alias
FROM index
WHERE index.created_at >= '{start_date}'
AND index.created_at <= '{end_date}'
AND product = '{product_id}'
GROUP BY id, interval_alias
ORDER BY interval_alias;
Firstly you define the parameter that will be your ordering column with AS. It could be function or any SQL expression. Then set it to ORDER BY expression and you're done!
In my opinion, this is the smoothest way to do such an ordering.