How to turn a greater strict function into an aggregate - postgresql

I have been looking for a max function setting null to the max value and found the following on (https://www.postgresql.org/message-id/r2y162867791004201002x50843917y3d1f1293db7451e0#mail.gmail.com) :
create or replace function greatest_strict(variadic anyarray)
returns anyelement as $$
select null from unnest($1) g(v) where v is null
union all
select max(v) from unnest($1) g(v)
limit 1
$$ language sql;
The problem is that this function is not an aggregation function usable for group by. How can I change that? Such that I can use the following query:
SELECT greatest_strict(performed_on) as start_date
from task
group by contract_id;

I've created this before: https://wiki.postgresql.org/wiki/Aggregate_strict_min_and_max
I call it strict_max, not strict_greatest, because "max" is already an aggregate so that seems like a better name.
This has the advantage (over the other answer) of not storing all the values in memory while it is aggregating over them, so that it can work on very large data sets.

You can create your own aggregation functions.
create aggregate agg_greatest_strict(anyelement) (
sfunc = create_array,
stype = anyarray,
finalfunc = greatest_strict,
initcond = '{}'
);
sfunc is a function which will be executed for every row and returns an intermediate result.
finalfunc will be executed afterwards with the result of the last sfunc execution.
In your case you could create the arrays for every row (your sfunc):
create or replace function create_array(anyarray, anyelement)
returns anyarray as $$
SELECT
$1 || $2
$$ language sql;
This simply aggregates the row values into one array. (first parameter is the result of the previous execution; if it is the first one, initcond value will be taken instead)
Afterwards you can take your function as finalfunc:
create or replace function greatest_strict(anyarray)
returns anyelement as $$
select null from unnest($1) g(v) where v is null
union all
select max(v) from unnest($1) g(v)
limit 1
$$ language sql;
demo:db<>fiddle
Edit: Former solutions without any finalfunc function using the greatest() function on every row:
demo:db<>fiddle (one sfunc for anyelement)
demo:db<>fiddle (overloaded sfunc for text and numeric type because of some problem with special chars and ASCII-order)

Related

How to use text input as column name(s) in a Postgres function?

I'm working with Postgres and PostGIS. Trying to write a function that that selects specific columns according to the given argument.
I'm using a WITH statement to create the result table before converting it to bytea to return.
The part I need help with is the $4 part. I tried it is demonstrated below and $4::text and both give me back the text value of the input and not the column value in the table if cols=name so I get back from the query name and not the actual names in the table. I also try data($4) and got type error.
The code is like this:
CREATE OR REPLACE FUNCTION select_by_txt(z integer,x integer,y integer, cols text)
RETURNS bytea
LANGUAGE 'plpgsql'
AS $BODY$
declare
res bytea;
begin
WITH bounds AS (
SELECT ST_TileEnvelope(z, x, y) AS geom
),
mvtgeom AS (
SELECT ST_AsMVTGeom(ST_Transform(t.geom, 3857), bounds.geom) AS geom, $4
FROM table1 t, bounds
WHERE ST_Intersects(t.geom, ST_Transform(bounds.geom, 4326))
)
SELECT ST_AsMVT(mvtgeom, 'public.select_by_txt')
INTO res
FROM mvtgeom;
RETURN res;
end;
$BODY$;
Example for calling the function:
select_by_txt(10,32,33,"col1,col2")
The argument cols can be multiple column names from 1 and not limited from above. The names of the columns inside cols will be checked before calling the function that they are valid columns.
Passing multiple column names as concatenated string for dynamic execution urgently requires decontamination. I suggest a VARIADIC function parameter instead, with properly quoted identifiers (using quote_ident() in this case):
CREATE OR REPLACE FUNCTION select_by_txt(z int, x int, y int, VARIADIC cols text[] = NULL, OUT res text)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format(
$$
SELECT ST_AsMVT(mvtgeom, 'public.select_by_txt')
FROM (
SELECT ST_AsMVTGeom(ST_Transform(t.geom, 3857), bounds.geom) AS geom%s
FROM table1 t
JOIN (SELECT ST_TileEnvelope($1, $2, $3)) AS bounds(geom)
ON ST_Intersects(t.geom, ST_Transform(bounds.geom, 4326))
) mvtgeom
$$, (SELECT ', ' || string_agg(quote_ident (col), ', ') FROM unnest(cols) col)
)
INTO res
USING z, x, y;
END
$func$;
db<>fiddle here
The format specifier %I for format() deals with a single identifier. You have to put in more work for multiple identifiers, especially for a variable number of 0-n identifiers. This implementation quotes every single column name, and only add a , if any column names have been passed. So it works for every possible input, even no input at all. Note VARIADIC cols text[] = NULL as last input parameter with NULL as default value:
Optional argument in PL/pgSQL function
Related:
quote_ident() does not add quotes to column name "first"
Column names are case sensitive in this context!
Call for your example (important!):
SELECT select_by_txt(10,32,33,'col1', 'col2');
Alternative syntax:
SELECT select_by_txt(10,32,33, VARIADIC '{col1,col2}');
More revealing call, with a third column name and malicious (though futile) intent:
SELECT select_by_txt(10,32,33,'col1', 'col2', $$col3'); DROP TABLE table1;--$$);
About that odd third column name and SQL injection:
https://www.explainxkcd.com/wiki/index.php/Little_Bobby_Tables
About VAIRADIC parameters:
Return rows matching elements of input array in plpgsql function
Pass multiple values in single parameter
Using an OUT parameter for simplicity. That's totally optional. See:
Returning from a function with OUT parameter
What I would not do
If you really, really trust the input to be a properly formatted list of 1 or more valid column names at all times - and you asserted that ...
the names of the columns inside cols will be checked before calling the function that they are valid columns
You could simplify:
CREATE OR REPLACE FUNCTION select_by_txt(z int, x int, y int, cols text, OUT res text)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format(
$$
SELECT ST_AsMVT(mvtgeom, 'public.select_by_txt')
FROM (
SELECT ST_AsMVTGeom(ST_Transform(t.geom, 3857), bounds.geom) AS geom, %s
FROM table1 t
JOIN (SELECT ST_TileEnvelope($1, $2, $3)) AS bounds(geom)
ON ST_Intersects(t.geom, ST_Transform(bounds.geom, 4326))
) mvtgeom
$$, cols
)
INTO res
USING z, x, y;
END
$func$;
(How can you be so sure that the input will always be reliable?)
You would need to use a dynamic query:
CREATE OR REPLACE FUNCTION select_by_txt(z integer,x integer,y integer, cols text)
RETURNS bytea
LANGUAGE 'plpgsql'
AS $BODY$
declare
res bytea;
begin
EXECUTE format('
WITH bounds AS (
SELECT ST_TileEnvelope($1, $2, $3) AS geom
),
mvtgeom AS (
SELECT ST_AsMVTGeom(ST_Transform(t.geom, 3857), bounds.geom) AS geom, %I
FROM table1 t, bounds
WHERE ST_Intersects(t.geom, ST_Transform(bounds.geom, 4326))
)
SELECT ST_AsMVT(mvtgeom, ''public.select_by_txt'')
FROM mvtgeom', cols)
INTO res
USING z,x,y;
RETURN res;
end;
$BODY$;

PL/pgSQL: structure of query does not match function result type

I have a table
create table abac_object (
inbound abac_attribute,
outbound abac_attribute,
created_at timestamptz not null default now(),
primary key (inbound, outbound)
);
abac_attribute is my custom type that is defined like
create type abac_attribute as (
value text,
key text,
namespace_id uuid
);
I try to create a number of functions that work over the table
create function abac_object_list_1(_attr abac_attribute)
returns setof abac_object as $$
select *
from abac_object
where outbound = _attr
$$ language sql stable;
create function abac_object_list_2(_attr1 abac_attribute, _attr2 abac_attribute)
returns setof abac_object as $$
select t1.*
from abac_object as t1
inner join abac_object as t2 on t1.inbound = t2.inbound
where
t1.outbound = _attr1
and t2.outbound = _attr2
$$ language sql stable;
create function abac_object_list_3(_attr1 abac_attribute, _attr2 abac_attribute, _attr3 abac_attribute)
returns setof abac_object as $$
select t1.*
from abac_object as t1
inner join abac_object as t2 on t1.inbound = t2.inbound
inner join abac_object as t3 on t1.inbound = t3.inbound
where
t1.outbound = _attr1
and t2.outbound = _attr2
and t3.outbound = _attr3
$$ language sql stable;
create function abac_object_list(_attrs abac_attribute[], _offset integer, _limit integer)
returns setof abac_attribute as $$
begin
case array_length(_attrs, 1)
when 1 then return query
select t.inbound
from abac_object_list_1(_attrs[1]) as t
order by t.created_at
offset _offset
limit _limit;
when 2 then return query
select t.inbound
from abac_object_list_2(_attrs[1], _attrs[2]) as t
order by t.created_at
offset _offset
limit _limit;
when 3 then return query
select t.inbound
from abac_object_list_3(_attrs[1], _attrs[2], _attrs[3]) as t
order by t.created_at
offset _offset
limit _limit;
else raise exception 'bad argument' using detail = 'array length should be less or equal to 3';
end case;
end
$$ language plpgsql stable;
abac_object_list_1 works fine
select *
from abac_object_list_1(('apple', 'type', '00000000-0000-1000-a000-000000000010') ::abac_attribute);
However with abac_object_list I get the following error
select *
from abac_object_list(array [('apple', 'type', '00000000-0000-1000-a000-000000000010')] :: abac_attribute[], 0, 100);
[42804] ERROR: structure of query does not match function result type Detail: Returned type abac_attribute does not match expected type text in column 1. Where: PL/pgSQL function abac_object_list(abac_attribute[],integer,integer) line 4 at RETURN QUERY
I can't understand where text type comes from.
These functions worked before but then I added created_at column to be able to add explicit ORDER BY clause. And now I can't rewrite functions correctly.
Also maybe there is a better (or more idiomatic) way to define these function.
Could you please help me to figure it out?
Thank you.
P. S.
Well, it seems to work if I write
return query select (t.inbound).*
But I am not sure if it is a right approach.
Also I wonder what is better to use return table or return setof.

Merge multiple result tables and perform final query on result

I have a function returning table, which accumulates output of multiple calls to another function returning table. I would like to perform final query on built table before returning result. Currently I implemented this as two functions, one accumulating and one performing final query, which is ugly:
CREATE OR REPLACE FUNCTION func_accu(LOCATION_ID INTEGER, SCHEMA_CUSTOMER TEXT)
RETURNS TABLE("networkid" integer, "count" bigint) AS $$
DECLARE
GATEWAY_ID integer;
BEGIN
FOR GATEWAY_ID IN
execute format(
'SELECT id FROM %1$I.gateway WHERE location_id=%2$L'
, SCHEMA_CUSTOMER, LOCATION_ID)
LOOP
RETURN QUERY execute format(
'SELECT * FROM get_available_networks_gw(%1$L, %2$L)'
, GATEWAY_ID, SCHEMA_CUSTOMER);
END LOOP;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION func_query(LOCATION_ID INTEGER, SCHEMA_CUSTOMER TEXT)
RETURNS TABLE("networkid" integer, "count" bigint) AS $$
DECLARE
BEGIN
RETURN QUERY execute format('
SELECT networkid, max(count) FROM func_accu(%2$L, %1$L) GROUP BY networkid;'
, SCHEMA_CUSTOMER, LOCATION_ID);
END;
$$ LANGUAGE plpgsql;
How can this be done in single function, elegantly?
Both functions simplified and merged, also supplying value parameters in the USING clause:
CREATE OR REPLACE FUNCTION pg_temp.func_accu(_location_id integer, schema_customer text)
RETURNS TABLE(networkid integer, count bigint) AS
$func$
BEGIN
RETURN QUERY EXECUTE format('
SELECT f.networkid, max(f.ct)
FROM %I.gateway g
, get_available_networks_gw(g.id, $1) f(networkid, ct)
WHERE g.location_id = $2
GROUP BY 1'
, _schema_customer)
USING _schema_customer, _location_id;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM func_accu(123, 'my_schema');
Related:
Dynamically access column value in record
I am using alias names for the columns returned by the function (f(networkid, ct)) to be sure because you did not disclose the return type of get_available_networks_gw(). You can use the column names of the return type directly.
The comma (,) in the FROM clause is short syntax for CROSS JOIN LATERAL .... Requires Postgres 9.3 or later.
What is the difference between LATERAL and a subquery in PostgreSQL?
Or you could run this query instead of the function:
SELECT f.networkid, max(f.ct)
FROM myschema.gateway g, get_available_networks_gw(g.id, 'my_schema') f(networkid, ct)
WHERE g.location_id = $2
GROUP BY 1;

eliminate duplicate array values in postgres

I have an array of type bigint, how can I remove the duplicate values in that array?
Ex: array[1234, 5343, 6353, 1234, 1234]
I should get array[1234, 5343, 6353, ...]
I tested out the example SELECT uniq(sort('{1,2,3,2,1}'::int[])) in the postgres manual but it is not working.
I faced the same. But an array in my case is created via array_agg function. And fortunately it allows to aggregate DISTINCT values, like:
array_agg(DISTINCT value)
This works for me.
The sort(int[]) and uniq(int[]) functions are provided by the intarray contrib module.
To enable its use, you must install the module.
If you don't want to use the intarray contrib module, or if you have to remove duplicates from arrays of different type, you have two other ways.
If you have at least PostgreSQL 8.4 you could take advantage of unnest(anyarray) function
SELECT ARRAY(SELECT DISTINCT UNNEST('{1,2,3,2,1}'::int[]) ORDER BY 1);
?column?
----------
{1,2,3}
(1 row)
Alternatively you could create your own function to do this
CREATE OR REPLACE FUNCTION array_sort_unique (ANYARRAY) RETURNS ANYARRAY
LANGUAGE SQL
AS $body$
SELECT ARRAY(
SELECT DISTINCT $1[s.i]
FROM generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY 1
);
$body$;
Here is a sample invocation:
SELECT array_sort_unique('{1,2,3,2,1}'::int[]);
array_sort_unique
-------------------
{1,2,3}
(1 row)
... Where the statandard libraries (?) for this kind of array_X utility??
Try to search... See some but no standard:
postgres.cz/wiki/Array_based_functions: good reference!
JDBurnZ/postgresql-anyarray, good initiative but needs some collaboration to enhance.
wiki.postgresql.org/Snippets, frustrated initiative, but "offcial wiki", needs some collaboration to enhance.
MADlib: good! .... but it is an elephant, not an "pure SQL snippets lib".
Simplest and faster array_distinct() snippet-lib function
Here the simplest and perhaps faster implementation for array_unique() or array_distinct():
CREATE FUNCTION array_distinct(anyarray) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x) FROM unnest($1) t(x);
$f$ LANGUAGE SQL IMMUTABLE;
NOTE: it works as expected with any datatype, except with array of arrays,
SELECT array_distinct( array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99] ),
array_distinct( array['3','3','hello','hello','bye'] ),
array_distinct( array[array[3,3],array[3,3],array[3,3],array[5,6]] );
-- "{1,2,3,4,6,8,99}", "{3,bye,hello}", "{3,5,6}"
the "side effect" is to explode all arrays in a set of elements.
PS: with JSONB arrays works fine,
SELECT array_distinct( array['[3,3]'::JSONB, '[3,3]'::JSONB, '[5,6]'::JSONB] );
-- "{"[3, 3]","[5, 6]"}"
Edit: more complex but useful, a "drop nulls" parameter
CREATE FUNCTION array_distinct(
anyarray, -- input array
boolean DEFAULT false -- flag to ignore nulls
) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x)
FROM unnest($1) t(x)
WHERE CASE WHEN $2 THEN x IS NOT NULL ELSE true END;
$f$ LANGUAGE SQL IMMUTABLE;
Using DISTINCT implicitly sorts the array. If the relative order of the array elements needs to be preserved while removing duplicates, the function can be designed like the following: (should work from 9.4 onwards)
CREATE OR REPLACE FUNCTION array_uniq_stable(anyarray) RETURNS anyarray AS
$body$
SELECT
array_agg(distinct_value ORDER BY first_index)
FROM
(SELECT
value AS distinct_value,
min(index) AS first_index
FROM
unnest($1) WITH ORDINALITY AS input(value, index)
GROUP BY
value
) AS unique_input
;
$body$
LANGUAGE 'sql' IMMUTABLE STRICT;
I have assembled a set of stored procedures (functions) to combat PostgreSQL's lack of array handling coined anyarray. These functions are designed to work across any array data-type, not just integers as intarray does: https://www.github.com/JDBurnZ/anyarray
In your case, all you'd really need is anyarray_uniq.sql. Copy & paste the contents of that file into a PostgreSQL query and execute it to add the function. If you need array sorting as well, also add anyarray_sort.sql.
From there, you can peform a simple query as follows:
SELECT ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234])
Returns something similar to: ARRAY[1234, 6353, 5343]
Or if you require sorting:
SELECT ANYARRAY_SORT(ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234]))
Return exactly: ARRAY[1234, 5343, 6353]
Here's the "inline" way:
SELECT 1 AS anycolumn, (
SELECT array_agg(c1)
FROM (
SELECT DISTINCT c1
FROM (
SELECT unnest(ARRAY[1234,5343,6353,1234,1234]) AS c1
) AS t1
) AS t2
) AS the_array;
First we create a set from array, then we select only distinct entries, and then aggregate it back into array.
In a single query i did this:
SELECT (select array_agg(distinct val) from ( select unnest(:array_column) as val ) as u ) FROM :your_table;
For people like me who still have to deal with postgres 8.2, this recursive function can eliminate duplicates without altering the sorting of the array
CREATE OR REPLACE FUNCTION my_array_uniq(bigint[])
RETURNS bigint[] AS
$BODY$
DECLARE
n integer;
BEGIN
-- number of elements in the array
n = replace(split_part(array_dims($1),':',2),']','')::int;
IF n > 1 THEN
-- test if the last item belongs to the rest of the array
IF ($1)[1:n-1] #> ($1)[n:n] THEN
-- returns the result of the same function on the rest of the array
return my_array_uniq($1[1:n-1]);
ELSE
-- returns the result of the same function on the rest of the array plus the last element
return my_array_uniq($1[1:n-1]) || $1[n:n];
END IF;
ELSE
-- if array has only one item, returns the array
return $1;
END IF;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
for exemple :
select my_array_uniq(array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99]);
will give
{3,8,2,6,4,1,99}

Sorting array elements

I want to write a stored procedure that gets an array as input parameter and sort that array and return the sorted array.
The best way to sort an array of integers is without a doubt to use the intarray extension, which will do it much, much, much faster than any SQL formulation:
CREATE EXTENSION intarray;
SELECT sort( ARRAY[4,3,2,1] );
A function that works for any array type is:
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(SELECT unnest($1) ORDER BY 1)
$$;
(I've replaced my version with Pavel's slightly faster one after discussion elsewhere).
In PostrgreSQL 8.4 and up you can use:
select array_agg(x) from (select unnest(ARRAY[1,5,3,7,2]) AS x order by x) as _;
But it will not be very fast.
In older Postgres you can implement unnest like this
CREATE OR REPLACE FUNCTION unnest(anyarray)
RETURNS SETOF anyelement AS
$BODY$
SELECT $1[i] FROM
generate_series(array_lower($1,1),
array_upper($1,1)) i;
$BODY$
LANGUAGE 'sql' IMMUTABLE
And array_agg like this:
CREATE AGGREGATE array_agg (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);
But it will be even slower.
You can also implement any sorting algorithm in pl/pgsql or any other language you can plug in to postgres.
Just use the function unnest():
SELECT
unnest(ARRAY[1,2]) AS x
ORDER BY
x DESC;
See array functions in the Pg docs.
This worked for me from http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I#General_array_sort
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[s.i] AS "foo"
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY foo
);
$$;
Please see Craig's answer since he is far more more knowledgable on Postgres and has a better answer. Also if possible vote to delete my answer.
Very nice exhibition of PostgreSQL's features is general procedure for sorting by David Fetter.
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[s.i] AS "foo"
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY foo
);
$$;
If you're looking for a solution which will work across any data-type, I'd recommend taking the approach laid out at YouLikeProgramming.com.
Essentially, you can create a stored procedure (code below) which performs the sorting for you, and all you need to do is pass your array to that procedure for it to be sorted appropriately.
I have also included an implementation which does not require the use of a stored procedure, if you're looking for your query to be a little more transportable.
Creating the stored procedure
DROP FUNCTION IF EXISTS array_sort(anyarray);
CREATE FUNCTION
array_sort(
array_vals_to_sort anyarray
)
RETURNS TABLE (
sorted_array anyarray
)
AS $BODY$
BEGIN
RETURN QUERY SELECT
ARRAY_AGG(val) AS sorted_array
FROM
(
SELECT
UNNEST(array_vals_to_sort) AS val
ORDER BY
val
) AS sorted_vals
;
END;
$BODY$
LANGUAGE plpgsql;
Sorting array values (works with any array data-type)
-- The following will return: {1,2,3,4}
SELECT ARRAY_SORT(ARRAY[4,3,2,1]);
-- The following will return: {in,is,it,on,up}
SELECT ARRAY_SORT(ARRAY['up','on','it','is','in']);
Sorting array values without a stored procedure
In the following query, simply replace ARRAY[4,3,2,1] with your array or query which returns an array:
WITH
sorted_vals AS (
SELECT
UNNEST(ARRAY[4,3,2,1]) AS val
ORDER BY
val
)
SELECT
ARRAY_AGG(val) AS sorted_array
FROM
sorted_vals
... or ...
SELECT
ARRAY_AGG(vals.val) AS sorted_arr
FROM (
SELECT
UNNEST(ARRAY[4,3,2,1]) AS val
ORDER BY
val
) AS vals
I'm surprised no-one has mentioned the containment operators:
select array[1,2,3] <# array[2,1,3] and array[1,2,3] #> array[2,1,3];
?column?
══════════
t
(1 row)
Notice that this requires that all elements of the arrays must be unique.
(If a contains b and b contains a, they must be the same if all elements are unique)