Create a custom aggregate function in postgresql

I need an aggregate function in postgresql that returns the maximum value of a text column, where the maximum is calculated not by alphabetical order but by the length of the string.
Can anyone please help me out?

A custom aggregate consists of two parts: a function that does the work and the definition of the aggregate function.
So we first need a function that returns the longer of two strings:
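create function greater_by_length(p_one text, p_other text)
  returns text
as
$$
  select case
           when length(p_one) >= length(p_other) then p_one
           else p_other
         end
$$
language sql
immutable
strict; -- strict: the aggregate then skips NULL inputs instead of letting them replace the running result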
Then we can define an aggregate using that function:
create aggregate max_by_length(text)
(
sfunc = greater_by_length,
stype = text
);
And using it:
select max_by_length(s)
from (
values ('one'), ('onetwo'), ('three'), ('threefourfive')
) as x(s);
returns threefourfive
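
The aggregate also works per group. A minimal sketch, assuming a hypothetical inline data set with a grouping column g (not part of the question) and that max_by_length has been created as shown above:

```sql
-- Hypothetical sample data: g is a grouping key, s is the text to compare.
select g, max_by_length(s) as longest
from (
   values ('a', 'one'), ('a', 'onetwo'), ('b', 'three'), ('b', 'threefourfive')
) as x(g, s)
group by g
order by g;
-- a | onetwo
-- b | threefourfive
```

If two strings in a group have the same length, the result depends on the order in which rows reach the aggregate. An ORDER BY inside the call, such as max_by_length(s order by s), makes that deterministic.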

Related

How to turn a greater strict function into an aggregate

I have been looking for a max function that returns NULL if any input is NULL (treating NULL as the maximum), and found the following (https://www.postgresql.org/message-id/r2y162867791004201002x50843917y3d1f1293db7451e0#mail.gmail.com):
create or replace function greatest_strict(variadic anyarray)
returns anyelement as $$
select null from unnest($1) g(v) where v is null
union all
select max(v) from unnest($1) g(v)
limit 1
$$ language sql;
The problem is that this function is not an aggregate function, so it cannot be used with GROUP BY. How can I change that, so that I can use the following query?
SELECT greatest_strict(performed_on) as start_date
from task
group by contract_id;

I've created this before: https://wiki.postgresql.org/wiki/Aggregate_strict_min_and_max
I call it strict_max, not strict_greatest, because "max" is already an aggregate so that seems like a better name.
This has the advantage (over the other answer) of not storing all the values in memory while it is aggregating over them, so that it can work on very large data sets.
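
For illustration, a minimal sketch of how such a streaming aggregate might look. This is not the code from the wiki page; the names strict_max_sketch, strict_max_sketch_sfunc and strict_max_sketch_final are hypothetical. The state is a one-element array so that "no row seen yet" (state is NULL) can be told apart from "a NULL was seen" (the element is NULL):

```sql
-- Sketch only: hypothetical names, not the wiki implementation.
create or replace function strict_max_sketch_sfunc(anyarray, anyelement)
  returns anyarray
  language sql immutable as
$$
  select case
           when $1 is null    then array[$2]  -- first row: start the state
           when $1[1] is null then $1         -- a NULL was already seen: stay NULL
           when $2 is null    then array[$2]  -- first NULL input wins
           when $1[1] < $2    then array[$2]  -- new maximum
           else $1                            -- keep the current maximum
         end
$$;

-- Unwrap the single element held in the state.
create or replace function strict_max_sketch_final(anyarray)
  returns anyelement
  language sql immutable as
$$ select $1[1] $$;

create aggregate strict_max_sketch(anyelement) (
  sfunc     = strict_max_sketch_sfunc,
  stype     = anyarray,
  finalfunc = strict_max_sketch_final
);

select strict_max_sketch(v) from (values (1), (3), (2)) as t(v);     -- 3
select strict_max_sketch(v) from (values (1), (null), (3)) as t(v);  -- NULL
```

Only one value is held in the state at any time, which is what keeps memory use constant regardless of the number of rows.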

You can create your own aggregation functions.
create aggregate agg_greatest_strict(anyelement) (
sfunc = create_array,
stype = anyarray,
finalfunc = greatest_strict,
initcond = '{}'
);
sfunc is a function which will be executed for every row and returns an intermediate result.
finalfunc will be executed afterwards with the result of the last sfunc execution.
In your case you could create the arrays for every row (your sfunc):
create or replace function create_array(anyarray, anyelement)
returns anyarray as $$
SELECT
$1 || $2
$$ language sql;
This simply appends each row's value to one array. The first parameter is the result of the previous execution; on the first call, the initcond value is used instead.
Afterwards you can take your function as finalfunc:
create or replace function greatest_strict(anyarray)
returns anyelement as $$
select null from unnest($1) g(v) where v is null
union all
select max(v) from unnest($1) g(v)
limit 1
$$ language sql;
Edit: Earlier versions of this answer worked without a finalfunc by applying the greatest() function on every row, first with one sfunc for anyelement and then with overloaded sfuncs for text and numeric types because of a problem with special characters and ASCII order.

How to ARRAY values from a SELECT query in POSTGRES

I am trying to build an array within my Postgres call by pulling 3 values (all SMALLINTs) from a table, so that I can use them in the rest of the call like so: code_list[0].
Currently, I have only written this part of the function so that I can make sure I am structuring it correctly before I proceed. However, I receive the error "subquery must return only one column", which makes me think that it assumes I am trying to return a TABLE. As far as I am aware, I can't save a table into one value, so I am trying to create an array instead.
Am I creating the ARRAY properly? Is there a way to transform this into JSONB if that would be a better strategy?
CREATE OR REPLACE FUNCTION "RetrieveCodeValues" (
"#code" VARCHAR(50)
)
RETURNS SMALLINT[] AS
$func$
BEGIN
SELECT ARRAY (
SELECT c."big", c."mid", c."small"
FROM "codes" AS c
WHERE "code" = "#code"
) AS code_list;
RETURN code_list;
END;
$func$ LANGUAGE PLPGSQL;

Use the array constructor:
DECLARE res integer[];
BEGIN
SELECT ARRAY[c.big, c.mid, c.small] INTO res
FROM ...
RETURN res;
END;
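
Filled in for the table from the question, this might look like the following sketch. The function name retrieve_code_values and the parameter name _code are hypothetical replacements for the quoted identifiers in the question; the table codes and its columns big, mid, small and code are taken from the question.

```sql
-- Sketch only: assumes a table codes(code, big, mid, small) as in the question.
CREATE OR REPLACE FUNCTION retrieve_code_values(_code varchar)
  RETURNS smallint[]
  LANGUAGE plpgsql STABLE AS
$func$
DECLARE
   code_list smallint[];
BEGIN
   -- ARRAY[...] builds one array from several columns of a single row,
   -- whereas ARRAY(subquery) expects a subquery with exactly one column.
   SELECT ARRAY[c.big, c.mid, c.small]
   INTO   code_list
   FROM   codes AS c
   WHERE  c.code = _code;

   RETURN code_list;  -- NULL if no row matched
END
$func$;
```

Note that PostgreSQL arrays are 1-based by default, so the first element is code_list[1], while code_list[0] yields NULL. If JSONB is preferred, to_jsonb() can convert the array, for example SELECT to_jsonb(retrieve_code_values('abc')); where 'abc' is a placeholder code value.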

How to derive a column name in the return type from input parameters to the function?

Using Postgres 9.5 I have built this function:
CREATE or REPLACE FUNCTION func_getratio_laglag(_numeratorLAG text, _n1 int, _denominatorLAG text, _n2 int, _table text)
RETURNS TABLE (date_t timestamp without time zone, customer_code text, index text, ratio real) AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'SELECT
date_t,
customer_code,
index,
(LAG('||quote_ident(_numeratorLAG)||',' || quote_literal(_n1)||') OVER W / LAG('||quote_ident(_denominatorLAG)||','|| quote_literal(_n2)||') OVER W) '
|| ' FROM ' || quote_ident(_table)
|| ' WINDOW W AS (PARTITION BY customer_code ORDER BY date_t asc);';
END;
$BODY$ LANGUAGE plpgsql;
All the function does is let me pick 2 different columns from a specified table and calculate a ratio between them based on different lag windows. To execute the function above I use the following query:
SELECT * FROM func_getratio_laglag('order_first',1,'order_last',0,'customers_hist');
This returns a table with the column labels date_t, customer_code, index and ratio. I have really struggled on how to output ratio as a dynamic column label. That is, I would like to make it contingent on the input parameters e.g. if I ran the select query above then I would like the column labels date_t, customer_code, index and order_first_1_order_last_0.
I am stuck, any advice or hints?

How to derive a column name in the return type from input parameters to the function?
The short answer: Not possible.
SQL is very rigid about column data types and names. Those have to be declared before or at call time at the latest. No exceptions. No truly dynamic column names.
I can think of 3 half-way workarounds:
1. Column aliases
Use your function as is (or rather the audited version I suggest below) and add column aliases in the function call:
SELECT * FROM func_getratio_laglag('order_first',1,'order_last',0,'customers_hist')
AS f(date_t, customer_code, index, order_first_1_order_last_0)
I would do that.
2. Column definition list
Create your function to return anonymous records:
RETURNS SETOF record
Then you have to provide a column definition list with every call:
SELECT * FROM func_getratio_laglag('order_first',1,'order_last',0,'customers_hist')
AS f(date_t timestamp, customer_code text, index text, order_first_1_order_last_0 real)
I would not do that.
3. Use a registered row type as polymorphic input / output type.
Mostly useful if you happen to have row types at hand. You could register a row type on the fly by creating a temporary table, but that seems like overkill for your use case.
Details in the last chapter of this answer:
Refactor a PL/pgSQL function to return the output of various SELECT queries
Function audit
Use format() to make building the query string much safer and simpler.
Read the manual if you are not familiar with it.
CREATE OR REPLACE FUNCTION func_getratio_laglag(
_numerator_lag text, _n1 int
, _denominator_lag text, _n2 int
, _table regclass)
RETURNS TABLE (date_t timestamp, customer_code text, index text, ratio real) AS
$func$
BEGIN
RETURN QUERY EXECUTE format (
'SELECT date_t, customer_code, index
, (lag(%I, %s) OVER w / lag(%I, %s) OVER w) -- data type must match
FROM %s
WINDOW w AS (PARTITION BY customer_code ORDER BY date_t)'
, _numerator_lag, _n1, _denominator_lag, _n2, _table::text
);
END
$func$ LANGUAGE plpgsql;
Note the data type regclass for the table name. That's my personal (optional) suggestion.
Table name as a PostgreSQL function parameter
Aside: I would also advise not to use mixed-case identifiers in Postgres.
Are PostgreSQL column names case-sensitive?
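
To see what the format() specifiers used in the audited function produce, a few standalone calls can help. This is only an illustrative sketch; the identifier 'Order First' and the value 'O''Brien' are made-up inputs, and %L is shown for completeness although the function above does not use it.

```sql
-- %I quotes the argument as an identifier, but only when quoting is needed.
SELECT format('lag(%I, %s) OVER w', 'order_first', 1);
-- lag(order_first, 1) OVER w

SELECT format('lag(%I, %s) OVER w', 'Order First', 1);
-- lag("Order First", 1) OVER w

-- %L quotes the argument as a string literal and escapes embedded quotes.
SELECT format('WHERE customer_code = %L', 'O''Brien');
-- WHERE customer_code = 'O''Brien'
```

%s inserts the value as plain text, which is safe here for the integer offsets and for the regclass table name, because a regclass value cast to text is already quoted where necessary.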

PostgreSQL: store function in column as value

Can functions be stored as anonymous functions directly in column as its value?
Let's say I want this function be stored in column.
Example (pseudocode):
Table my_table: pk (int), my_function (func)
func ( x ) { return x * 100 }
And later use it as:
select
t.my_function(some_input) AS output
from
my_table as t
where t.pk = 1999
Function may vary for each pk.

Your title asks for something different from your example.
A function has to be created before you can call it. (title)
An expression has to be evaluated. You would need a meta-function for that. (example)
Here are solutions for both:
1. Evaluate expressions dynamically
You have to take into account that the resulting type can vary. I use polymorphic types for that.
CREATE OR REPLACE FUNCTION f1(int)
RETURNS int
LANGUAGE sql IMMUTABLE AS
'SELECT $1 * 100;';
CREATE OR REPLACE FUNCTION f2(text)
RETURNS text
LANGUAGE sql IMMUTABLE AS
$$SELECT $1 || '_foo';$$;
CREATE TABLE my_expr (
expr text PRIMARY KEY
, def text
, rettype regtype
);
INSERT INTO my_expr VALUES
('x', 'f1(3)' , 'int')
, ('y', $$f2('bar')$$, 'text')
, ('z', 'now()' , 'timestamptz')
;
CREATE OR REPLACE FUNCTION f_eval(text, _type anyelement = 'NULL'::text, OUT _result anyelement)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE
'SELECT ' || (SELECT def FROM my_expr WHERE expr = $1)
INTO _result;
END
$func$;
Related:
Refactor a PL/pgSQL function to return the output of various SELECT queries
Call:
SQL is strictly typed, the same result column can only have one data type. For multiple rows with possibly heterogeneous data types, you might settle for type text, as every data type can be cast to and from text:
SELECT *, f_eval(expr) AS result -- default to type text
FROM my_expr;
Or return multiple columns like:
SELECT *
, CASE WHEN rettype = 'text'::regtype THEN f_eval(expr) END AS text_result -- default to type text
, CASE WHEN rettype = 'int'::regtype THEN f_eval(expr, NULL::int) END AS int_result
, CASE WHEN rettype = 'timestamptz'::regtype THEN f_eval(expr, NULL::timestamptz) END AS tstz_result
-- , more?
FROM my_expr;
2. Create and use functions dynamically
It is possible to create functions dynamically and then use them. You cannot do that with plain SQL, however. You will have to use another function to do that or at least an anonymous code block (DO statement), introduced in PostgreSQL 9.0.
It can work like this:
CREATE TABLE my_func (func text PRIMARY KEY, def text);
INSERT INTO my_func VALUES
('f'
, $$CREATE OR REPLACE FUNCTION f(int)
RETURNS int
LANGUAGE sql IMMUTABLE AS
'SELECT $1 * 100;'$$);
CREATE OR REPLACE FUNCTION f_create_func(text)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE (SELECT def FROM my_func WHERE func = $1);
END
$func$;
Call:
SELECT f_create_func('f');
SELECT f(3);
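
The anonymous code block mentioned above can replace the helper function. A minimal sketch, assuming the my_func table and its row 'f' from the example:

```sql
-- Run the stored definition once, without creating f_create_func().
DO
$do$
BEGIN
   EXECUTE (SELECT def FROM my_func WHERE func = 'f');
END
$do$;

SELECT f(3);  -- 300
```

A DO block cannot take parameters or return a result, so the function name is hard-coded here; the helper function remains the better fit when the name varies.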
You may want to drop the function afterwards.
In most cases you should just create the functions instead and be done with it. Use separate schemas if you have problems with multiple versions or privileges.
For more information on the features I used here, see my related answer on dba.stackexchange.com.

How would I print out multiple columns from a stored procedure in plpgsql

I am trying to print multiple columns in a stored procedure. Can anyone please provide me with an example that uses a query? Thank you.
For example, I have a movie database and I want to find the percentage of profitable movies among all movies since the year 1960. I have the queries that do that; I ran them in pgAdmin and they work perfectly. However, when I try creating a stored procedure, I know I have to create a type holder as (yr INTEGER, prnct FLOAT).
So now I need to create a stored procedure that returns the two columns, one for the year and one for the percentage. How do I specify that the first column is yr and the next column is prnct?

If you want to return a single row with multiple columns, then you can use record or some_table as the type.
If you have a table like movie, then you can create a function like this:
CREATE OR REPLACE FUNCTION get_profitable_movie() RETURNS movie AS
If you want to return some arbitrary type, then you'll have to do something like this:
CREATE OR REPLACE FUNCTION get_profitable_movie() RETURNS record AS
And if you want to return more than 1 row, you have to use the SETOF modifier like this:
CREATE OR REPLACE FUNCTION get_profitable_movie() RETURNS SETOF record AS
You can create a function like this:
CREATE OR REPLACE FUNCTION multicolumn_thing() RETURNS record AS $$
DECLARE
r record;
BEGIN
SELECT 1, 2, 3 INTO r;
RETURN r;
END
$$ LANGUAGE plpgsql;
And select results from it like this:
SELECT
columns.a,
columns.b,
columns.c
FROM multicolumn_thing() AS columns(a int, b int, c int);
With SETOF it's the same, but with multiple rows of course :)
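
A minimal sketch of both multi-row variants. The function names and the generated placeholder values are hypothetical; the real query against the movie tables would take the place of the generate_series() calls. The second variant uses RETURNS TABLE, which names the output columns (yr and prnct, as in the question) in the function definition so that callers need no column definition list.

```sql
-- Variant 1: SETOF record, the caller supplies the column definition list.
CREATE OR REPLACE FUNCTION multicolumn_rows_sketch() RETURNS SETOF record AS $$
BEGIN
   RETURN QUERY
   SELECT g, g * 2                    -- placeholder values
   FROM   generate_series(1, 3) AS g;
END
$$ LANGUAGE plpgsql;

SELECT t.a, t.b
FROM   multicolumn_rows_sketch() AS t(a int, b int);

-- Variant 2: RETURNS TABLE, the column names and types are part of the function.
CREATE OR REPLACE FUNCTION yearly_percent_sketch()
  RETURNS TABLE (yr integer, prnct float) AS $$
BEGIN
   RETURN QUERY
   SELECT g, (g - 1959)::float        -- placeholder for the real percentage
   FROM   generate_series(1960, 1962) AS g;
END
$$ LANGUAGE plpgsql;

SELECT yr, prnct FROM yearly_percent_sketch();
```

In both cases the data types produced by the inner query have to match the declared column types exactly, hence the explicit cast to float in the second variant.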
