Consider a custom aggregate intended to take the set union of a bunch of arrays:
CREATE FUNCTION array_union_step (s ANYARRAY, n ANYARRAY) RETURNS ANYARRAY
AS $$ SELECT s || n; $$
LANGUAGE SQL IMMUTABLE LEAKPROOF PARALLEL SAFE;
CREATE FUNCTION array_union_final (s ANYARRAY) RETURNS ANYARRAY
AS $$
SELECT array_agg(i ORDER BY i) FROM (
SELECT DISTINCT UNNEST(x) AS i FROM (VALUES(s)) AS v(x)
) AS w WHERE i IS NOT NULL;
$$
LANGUAGE SQL IMMUTABLE LEAKPROOF PARALLEL SAFE;
CREATE AGGREGATE array_union (ANYARRAY) (
SFUNC = array_union_step,
STYPE = ANYARRAY,
FINALFUNC = array_union_final,
INITCOND = '{}',
PARALLEL = SAFE
);
As I understand it, array concatenation in PostgreSQL copies all the elements of both inputs into a new array, so this is quadratic in the total number of elements (before deduplication). Is there a more efficient alternative without writing extension code in C? (Specifically, using either LANGUAGE SQL or LANGUAGE plpgsql.) For instance, maybe it's possible for the step function to take and return a set of rows somehow?
An example of the kind of data this needs to be able to process:
create temp table demo (tag int, vals text[]);  -- "values" is a reserved word, so the column is named vals
insert into demo values
(1, '{"a", "b"}'),
(2, '{"c", "d"}'),
(1, '{"a"}'),
(2, '{"c", "e", "f"}');
select tag, array_union(vals) from demo group by tag;
 tag | array_union
-----+-------------
   2 | {c,d,e,f}
   1 | {a,b}
Note in particular that the built-in array_agg cannot be used with this data, because the arrays are not all the same length:
select tag, array_agg(vals) from demo group by tag;
ERROR: cannot accumulate arrays of different dimensionality
Array concatenation is expensive. That's why the built-in array_agg() uses an internal array-builder structure. Unfortunately, you cannot use this API at the SQL level.
I don't think using temp tables for custom aggregation is the right approach either. Creating and dropping tables is expensive (temp tables too), and very expensive at high frequency; INSERT and SELECT are expensive as well. (Tables have a much more complex format than arrays.) If you really need fast aggregation, write a C function and use the array builder.
If you cannot use your own C extension, then use the built-in array_agg() function and deduplicate and sort the already-aggregated data.
CREATE OR REPLACE FUNCTION array_distinct(anyarray)
RETURNS anyarray AS $$
BEGIN
RETURN ARRAY(SELECT DISTINCT v FROM unnest($1) u(v) WHERE v IS NOT NULL ORDER BY v);
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
Call:
SELECT ..., array_distinct(array_agg(somecolumn)) FROM tab;
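For the array_union question above, the same trick applies once the arrays are unnested, so that array_agg() sees scalar elements instead of arrays. A sketch against the demo table from the question (column renamed to vals; LATERAL requires PostgreSQL 9.3+):
SELECT tag, array_distinct(array_agg(v)) AS array_union
FROM demo CROSS JOIN LATERAL unnest(vals) AS u(v)  -- explode each array into rows
GROUP BY tag;
-- tag 1: {a,b}, tag 2: {c,d,e,f}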
I have been looking for a max function that treats NULL as the maximum value, and found the following at (https://www.postgresql.org/message-id/r2y162867791004201002x50843917y3d1f1293db7451e0#mail.gmail.com):
create or replace function greatest_strict(variadic anyarray)
returns anyelement as $$
select null from unnest($1) g(v) where v is null
union all
select max(v) from unnest($1) g(v)
limit 1
$$ language sql;
The problem is that this function is not an aggregate function, so it cannot be used with GROUP BY. How can I change that, so that I can use the following query:
SELECT greatest_strict(performed_on) as start_date
from task
group by contract_id;
I've created this before: https://wiki.postgresql.org/wiki/Aggregate_strict_min_and_max
I call it strict_max, not strict_greatest, because "max" is already an aggregate so that seems like a better name.
This has the advantage (over the other answer) of not storing all the values in memory while it is aggregating over them, so that it can work on very large data sets.
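The wiki page has the canonical code; the general shape is roughly the following minimal sketch (function and aggregate names are mine, not necessarily the wiki's). The idea is to keep only the running maximum in a one-element array state, and to collapse the state to NULL as soon as a NULL input appears:
CREATE FUNCTION strict_max_step(anyarray, anyelement)
RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
  SELECT CASE
    WHEN $1 IS NULL OR $2 IS NULL THEN NULL  -- a NULL input poisons the state for good
    WHEN $1 = '{}' THEN ARRAY[$2]            -- first value seen
    ELSE ARRAY[greatest($1[1], $2)]          -- keep only the running maximum
  END
$$;
CREATE FUNCTION strict_max_final(anyarray)
RETURNS anyelement LANGUAGE sql IMMUTABLE AS
$$ SELECT $1[1] $$;  -- NULL state or empty input yields NULL
CREATE AGGREGATE strict_max(anyelement) (
  SFUNC = strict_max_step,
  STYPE = anyarray,
  FINALFUNC = strict_max_final,
  INITCOND = '{}'
);
Because the state never holds more than one element, memory use stays constant no matter how many rows are aggregated.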
You can create your own aggregate functions. Note that CREATE AGGREGATE references its sfunc and finalfunc, so in a real session the two functions shown below have to be created before the aggregate.
create aggregate agg_greatest_strict(anyelement) (
sfunc = create_array,
stype = anyarray,
finalfunc = greatest_strict,
initcond = '{}'
);
sfunc is the function that is executed for every row; it returns the intermediate state.
finalfunc is executed once afterwards, with the result of the last sfunc execution.
In your case you could create the arrays for every row (your sfunc):
create or replace function create_array(anyarray, anyelement)
returns anyarray as $$
SELECT
$1 || $2
$$ language sql;
This simply aggregates the row values into one array. (The first parameter is the result of the previous execution; for the first row, the INITCOND value is used instead.)
Afterwards you can take your function as finalfunc:
create or replace function greatest_strict(anyarray)
returns anyelement as $$
select null from unnest($1) g(v) where v is null
union all
select max(v) from unnest($1) g(v)
limit 1
$$ language sql;
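With the two functions and the aggregate in place, the query from the question becomes:
SELECT agg_greatest_strict(performed_on) AS start_date
FROM task
GROUP BY contract_id;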
demo:db<>fiddle
Edit: former solutions without any finalfunc, using the greatest() function on every row:
demo:db<>fiddle (one sfunc for anyelement)
demo:db<>fiddle (overloaded sfunc for text and numeric types, because of a problem with special characters and ASCII ordering)
Can functions be stored as anonymous functions directly in a column as its value?
Let's say I want this function to be stored in a column.
Example (pseudocode):
Table my_table: pk (int), my_function (func)
func ( x ) { return x * 100 }
And later use it as:
select
t.my_function(some_input) AS output
from
my_table as t
where t.pk = 1999
The function may vary for each pk.
Your title asks for something different from your example.
A function has to be created before you can call it. (title)
An expression has to be evaluated. You would need a meta-function for that. (example)
Here are solutions for both:
1. Evaluate expressions dynamically
You have to take into account that the resulting type can vary. I use polymorphic types for that.
CREATE OR REPLACE FUNCTION f1(int)
RETURNS int
LANGUAGE sql IMMUTABLE AS
'SELECT $1 * 100;';
CREATE OR REPLACE FUNCTION f2(text)
RETURNS text
LANGUAGE sql IMMUTABLE AS
$$SELECT $1 || '_foo';$$;
CREATE TABLE my_expr (
expr text PRIMARY KEY
, def text
, rettype regtype
);
INSERT INTO my_expr VALUES
('x', 'f1(3)' , 'int')
, ('y', $$f2('bar')$$, 'text')
, ('z', 'now()' , 'timestamptz')
;
CREATE OR REPLACE FUNCTION f_eval(text, _type anyelement = 'NULL'::text, OUT _result anyelement)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE
'SELECT ' || (SELECT def FROM my_expr WHERE expr = $1)
INTO _result;
END
$func$;
Related:
Refactor a PL/pgSQL function to return the output of various SELECT queries
Call:
SQL is strictly typed; the same result column can only have one data type. For multiple rows with possibly heterogeneous data types, you might settle for type text, as every data type can be cast to and from text:
SELECT *, f_eval(expr) AS result -- default to type text
FROM my_expr;
Or return multiple columns like:
SELECT *
, CASE WHEN rettype = 'text'::regtype THEN f_eval(expr) END AS text_result -- default to type text
, CASE WHEN rettype = 'int'::regtype THEN f_eval(expr, NULL::int) END AS int_result
, CASE WHEN rettype = 'timestamptz'::regtype THEN f_eval(expr, NULL::timestamptz) END AS tstz_result
-- , more?
FROM my_expr;
db<>fiddle here
2. Create and use functions dynamically
It is possible to create functions dynamically and then use them. You cannot do that with plain SQL, however. You will have to use another function to do that or at least an anonymous code block (DO statement), introduced in PostgreSQL 9.0.
It can work like this:
CREATE TABLE my_func (func text PRIMARY KEY, def text);
INSERT INTO my_func VALUES
('f'
, $$CREATE OR REPLACE FUNCTION f(int)
RETURNS int
LANGUAGE sql IMMUTABLE AS
'SELECT $1 * 100;'$$);
CREATE OR REPLACE FUNCTION f_create_func(text)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE (SELECT def FROM my_func WHERE func = $1);
END
$func$;
Call:
SELECT f_create_func('f');
SELECT f(3);
db<>fiddle here
You may want to drop the function afterwards.
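For example (the argument list identifies which overload to drop):
DROP FUNCTION IF EXISTS f(int);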
In most cases you should just create the functions instead and be done with it. Use separate schemas if you have problems with multiple versions or privileges.
For more information on the features I used here, see my related answer on dba.stackexchange.com.
I have an array of type bigint, how can I remove the duplicate values in that array?
Ex: array[1234, 5343, 6353, 1234, 1234]
I should get array[1234, 5343, 6353, ...]
I tested out the example SELECT uniq(sort('{1,2,3,2,1}'::int[])) from the Postgres manual, but it does not work.
I faced the same problem, but in my case the array is created via the array_agg function. Fortunately, it allows aggregating DISTINCT values, like:
array_agg(DISTINCT value)
This works for me.
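Applied to the array from this question (unnesting first, since the source here is an array value rather than a table column):
SELECT array_agg(DISTINCT v)
FROM unnest(ARRAY[1234, 5343, 6353, 1234, 1234]::bigint[]) AS t(v);
-- {1234,5343,6353}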
The sort(int[]) and uniq(int[]) functions are provided by the intarray contrib module.
To enable its use, you must install the module.
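On PostgreSQL 9.1 or later, installation is a single statement per database. Note that intarray handles integer (int4) arrays only, not the bigint arrays from this question:
CREATE EXTENSION IF NOT EXISTS intarray;
SELECT uniq(sort('{1,2,3,2,1}'::int[]));  -- {1,2,3}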
If you don't want to use the intarray contrib module, or if you have to remove duplicates from arrays of different type, you have two other ways.
If you have at least PostgreSQL 8.4, you can take advantage of the unnest(anyarray) function:
SELECT ARRAY(SELECT DISTINCT UNNEST('{1,2,3,2,1}'::int[]) ORDER BY 1);
?column?
----------
{1,2,3}
(1 row)
Alternatively you could create your own function to do this
CREATE OR REPLACE FUNCTION array_sort_unique (ANYARRAY) RETURNS ANYARRAY
LANGUAGE SQL
AS $body$
SELECT ARRAY(
SELECT DISTINCT $1[s.i]
FROM generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY 1
);
$body$;
Here is a sample invocation:
SELECT array_sort_unique('{1,2,3,2,1}'::int[]);
array_sort_unique
-------------------
{1,2,3}
(1 row)
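Because the function is declared with ANYARRAY, it also works directly on the bigint arrays from the question:
SELECT array_sort_unique(ARRAY[1234, 5343, 6353, 1234, 1234]::bigint[]);
-- {1234,5343,6353}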
... Where are the standard libraries for this kind of array_X utility?
Try to search... You will find some, but nothing standard:
postgres.cz/wiki/Array_based_functions: a good reference!
JDBurnZ/postgresql-anyarray: a good initiative, but it needs some collaboration to enhance it.
wiki.postgresql.org/Snippets: a frustrated initiative, but the "official wiki"; it needs some collaboration to enhance it.
MADlib: good! ... but it is an elephant, not a "pure SQL snippets lib".
Simplest and fastest array_distinct() snippet-lib function
Here is the simplest and perhaps fastest implementation of array_unique() or array_distinct():
CREATE FUNCTION array_distinct(anyarray) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x) FROM unnest($1) t(x);
$f$ LANGUAGE SQL IMMUTABLE;
NOTE: it works as expected with any data type, except arrays of arrays;
SELECT array_distinct( array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99] ),
array_distinct( array['3','3','hello','hello','bye'] ),
array_distinct( array[array[3,3],array[3,3],array[3,3],array[5,6]] );
-- "{1,2,3,4,6,8,99}", "{3,bye,hello}", "{3,5,6}"
the "side effect" is that it explodes all arrays into a single set of elements.
PS: it works fine with jsonb arrays,
SELECT array_distinct( array['[3,3]'::JSONB, '[3,3]'::JSONB, '[5,6]'::JSONB] );
-- "{"[3, 3]","[5, 6]"}"
Edit: more complex but useful, a "drop nulls" parameter
CREATE FUNCTION array_distinct(
anyarray, -- input array
boolean DEFAULT false -- flag to ignore nulls
) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x)
FROM unnest($1) t(x)
WHERE CASE WHEN $2 THEN x IS NOT NULL ELSE true END;
$f$ LANGUAGE SQL IMMUTABLE;
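A usage sketch, assuming this two-parameter version replaces the earlier one-parameter function (keeping both would make one-argument calls ambiguous):
SELECT array_distinct(ARRAY[1, 1, NULL, 2, NULL]);        -- {1,2,NULL}
SELECT array_distinct(ARRAY[1, 1, NULL, 2, NULL], true);  -- {1,2}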
Using DISTINCT implicitly sorts the array. If the relative order of the array elements needs to be preserved while removing duplicates, the function can be designed like the following (works from 9.4 onwards, which introduced WITH ORDINALITY):
CREATE OR REPLACE FUNCTION array_uniq_stable(anyarray) RETURNS anyarray AS
$body$
SELECT
array_agg(distinct_value ORDER BY first_index)
FROM
(SELECT
value AS distinct_value,
min(index) AS first_index
FROM
unnest($1) WITH ORDINALITY AS input(value, index)
GROUP BY
value
) AS unique_input
;
$body$
LANGUAGE sql IMMUTABLE STRICT;
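For example, duplicates are removed while first-occurrence order is preserved:
SELECT array_uniq_stable(ARRAY[3, 3, 8, 2, 6, 6, 2]);  -- {3,8,2,6}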
I have assembled a set of stored procedures (functions) to combat PostgreSQL's lack of array handling coined anyarray. These functions are designed to work across any array data-type, not just integers as intarray does: https://www.github.com/JDBurnZ/anyarray
In your case, all you'd really need is anyarray_uniq.sql. Copy & paste the contents of that file into a PostgreSQL query and execute it to add the function. If you need array sorting as well, also add anyarray_sort.sql.
From there, you can perform a simple query as follows:
SELECT ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234])
Returns something similar to: ARRAY[1234, 6353, 5343]
Or if you require sorting:
SELECT ANYARRAY_SORT(ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234]))
Returns exactly: ARRAY[1234, 5343, 6353]
Here's the "inline" way:
SELECT 1 AS anycolumn, (
SELECT array_agg(c1)
FROM (
SELECT DISTINCT c1
FROM (
SELECT unnest(ARRAY[1234,5343,6353,1234,1234]) AS c1
) AS t1
) AS t2
) AS the_array;
First we create a set from the array, then we select only the distinct entries, and then we aggregate them back into an array.
In a single query, I did this:
SELECT (select array_agg(distinct val) from ( select unnest(:array_column) as val ) as u ) FROM :your_table;
For people like me who still have to deal with Postgres 8.2, this recursive function can eliminate duplicates without altering the ordering of the array:
CREATE OR REPLACE FUNCTION my_array_uniq(bigint[])
RETURNS bigint[] AS
$BODY$
DECLARE
n integer;
BEGIN
-- number of elements in the array (upper bound of the first dimension)
n := array_upper($1, 1);
IF n > 1 THEN
-- test if the last item belongs to the rest of the array
IF ($1)[1:n-1] #> ($1)[n:n] THEN
-- returns the result of the same function on the rest of the array
return my_array_uniq($1[1:n-1]);
ELSE
-- returns the result of the same function on the rest of the array plus the last element
return my_array_uniq($1[1:n-1]) || $1[n:n];
END IF;
ELSE
-- if array has only one item, returns the array
return $1;
END IF;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
For example:
select my_array_uniq(array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99]);
will give
{3,8,2,6,4,1,99}
Assuming that my subquery yields a number of rows with the columns (x,y), I would like to calculate the value avg(abs(x-mean)/y), where mean effectively is avg(x).
select avg(abs(x-avg(x))/y) as incline from subquery fails because I cannot nest aggregate functions. Nor can I think of a way to calculate the mean in a subquery while keeping the original result set. An avgdev function as it exists in other dialects would not exactly help me, so here I am stuck. Probably just due to lack of SQL knowledge; calculating the value from the result set in postprocessing is easy.
Which SQL construct could help me?
Edit: Server version is 8.3.4. No window functions with WITH or OVER available here.
Not sure I understand you correctly, but you might be looking for something like this:
SELECT avg(abs(x - mean) / y)
FROM (
   SELECT x,
          y,
          avg(x) OVER (PARTITION BY your_grouping_column) AS mean
   FROM your_table
) t
If you do not need to group your results to get the correct avg(x), simply leave out the PARTITION BY and use an empty OVER clause, as shown below.
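A sketch of that ungrouped variant (same hypothetical table and column names as above; note that window functions require PostgreSQL 8.4 or later, which the question's edit rules out):
SELECT avg(abs(x - mean) / y) AS incline
FROM (
   SELECT x, y, avg(x) OVER () AS mean
   FROM your_table
) t;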
If your data sets are not too large, you could accumulate them into an array and then compute the incline in a function:
create type typ as (x numeric, y numeric);
create aggregate array_accum( sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}' );
create or replace function unnest(anyarray) returns setof anyelement
language sql immutable strict as $$
select $1[i] from generate_series(array_lower($1,1), array_upper($1,1)) i;$$;
create function get_incline(typ[]) returns numeric
language sql immutable strict as $$
select avg(abs(x-(select avg(x) from unnest($1)))/y) from unnest($1);$$;
select get_incline((select array_accum((x,y)::typ) from subquery));
sample view for testing:
create view subquery as
select generate_series(1,5) as x, generate_series(1,6) as y;
One option I found is to use a temporary table:
begin;
create temporary table sub on commit drop as (...subquery code...);
select avg(abs(x-mean)/y) as incline from (SELECT x, y, (SELECT avg(x) FROM sub) AS mean FROM sub) as sub2;
commit;
But is that overkill?