Improve performance of custom aggregate function in PostgreSQL

I have a custom aggregate sum function which accepts the boolean data type:
create or replace function badd (bigint, boolean)
returns bigint as
$body$
select $1 + case when $2 then 1 else 0 end;
$body$ language sql;
create aggregate sum(boolean) (
sfunc=badd,
stype=int8,
initcond='0'
);
This aggregate should count the number of rows with TRUE. For example, the following should return 2 (and it does):
with t (x) as
(values
(true::boolean),
(false::boolean),
(true::boolean),
(null::boolean)
)
select sum(x) from t;
However, its performance is quite bad: it is 5.5 times slower than casting to integer:
with t as (select (gs > 0.5) as test_vector from generate_series(1,1000000,1) gs)
select sum(test_vector) from t; -- 52012ms
with t as (select (gs > 0.5) as test_vector from generate_series(1,1000000,1) gs)
select sum(test_vector::int) from t; -- 9484ms
Is writing some new C function the only way to improve this aggregate - e.g. some alternative to the int2_sum function in src/backend/utils/adt/numeric.c?

Your test case is misleading: you only count TRUE. You should have both TRUE and FALSE - or even NULL, if applicable.
As #foibs already explained, one wouldn't use a custom aggregate function for this. The built-in C functions are much faster and do the job. Use this instead (also demonstrating a simpler and more sensible test):
SELECT count(NULLIF(g%2 = 1, FALSE)) AS ct
FROM generate_series(1,100000,1) g;
How does this work?
Compute percents from SUM() in the same SELECT sql query
Several fast & simple ways (plus a benchmark) under this related answer on dba.SE:
For absolute performance, is SUM faster or COUNT?
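To see the NULLIF mechanics in isolation, here is a minimal sketch with example values of my own: NULLIF(expr, FALSE) turns FALSE into NULL, and count() skips NULLs, so only TRUE is counted.
SELECT NULLIF(TRUE, FALSE)            -- TRUE: counted by count()
     , NULLIF(FALSE, FALSE)           -- NULL: skipped by count()
     , NULLIF(NULL::boolean, FALSE);  -- NULL: skipped by count()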
Or faster yet, test for TRUE in the WHERE clause, where possible:
SELECT count(*) AS ct
FROM generate_series(1,100000,1) g
WHERE g%2 = 1 -- excludes FALSE and NULL !
If you had to write a custom aggregate for some reason, this form would be superior:
CREATE OR REPLACE FUNCTION test_sum_int8 (int8, boolean)
RETURNS bigint as
'SELECT CASE WHEN $2 THEN $1 + 1 ELSE $1 END' LANGUAGE sql;
The addition is only executed when necessary. Your original would add 0 for the FALSE case.
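To actually use this transition function, wire it into an aggregate the same way as the original; a sketch mirroring the definitions above (the aggregate name test_sum_int8 is my choice, not from the original answer):
CREATE AGGREGATE test_sum_int8(boolean) (
  sfunc    = test_sum_int8,
  stype    = int8,
  initcond = '0'
);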
Better yet, use a plpgsql function. It saves a bit of overhead per call, since it works like a prepared statement (the query is not re-planned). Makes a difference for a tiny aggregate function that is called many times:
CREATE OR REPLACE FUNCTION test_sum_plpgsql (int8, boolean)
RETURNS bigint AS
$func$
BEGIN
RETURN CASE WHEN $2 THEN $1 + 1 ELSE $1 END;
END
$func$ LANGUAGE plpgsql;
CREATE AGGREGATE test_sum_plpgsql(boolean) (
sfunc = test_sum_plpgsql
,stype = int8
,initcond = '0'
);
Faster than what you had, but still much slower than the presented alternative with a standard count(). And slower than any C function, too.
-> SQLfiddle

I created custom C function and aggregate for boolean:
C function:
#include "postgres.h"
#include <fmgr.h>
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
int
bool_sum(int arg, bool tmp)
{
if (tmp)
{
arg++;
}
return arg;
}
Transition and aggregate functions:
-- transition function
create or replace function bool_sum(bigint, boolean)
returns bigint
AS '/usr/lib/postgresql/9.1/lib/bool_agg', 'bool_sum'
language C strict
cost 1;
alter function bool_sum(bigint, boolean) owner to postgres;
-- aggregate
create aggregate sum(boolean) (
sfunc=bool_sum,
stype=int8,
initcond='0'
);
alter aggregate sum(boolean) owner to postgres;
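A quick sanity check for the new aggregate, reusing the example from the question (should return 2; NULL rows are skipped because the function is declared strict):
SELECT sum(x)
FROM (VALUES (true), (false), (true), (null::boolean)) t(x);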
Performance test:
-- Performance test - 10m rows
create table tmp_test as (
    select (case when random() < .3 then null
                 when random() < .6 then true
                 else false end) as test_vector
    from generate_series(1,10000000,1) gs
);
-- Casting to integer
select sum(test_vector::int) from tmp_test;
-- Boolean sum
select sum(test_vector) from tmp_test;
Now sum(boolean) is as fast as sum(boolean::int).
Update:
It turns out that I can call existing C transition functions directly, even with the boolean data type. It somehow gets magically converted to 0/1 on the way. So my current solution for boolean sum and average is:
create or replace function bool_sum(bigint, boolean)
returns bigint as
'int2_sum'
language internal immutable
cost 1;
create aggregate sum(boolean) (
sfunc=bool_sum,
stype=int8
);
-- Average for boolean values (percentage of rows with TRUE)
create or replace function bool_avg_accum(bigint[], boolean)
returns bigint[] as
'int2_avg_accum'
language internal immutable strict
cost 1;
create aggregate avg(boolean) (
sfunc=bool_avg_accum,
stype=int8[],
finalfunc=int8_avg,
initcond='{0,0}'
);
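A short usage sketch for avg(boolean), with example values of my own. Since the transition function is strict, NULLs are ignored, so the result is the fraction of TRUE among non-null values:
-- expect 0.50: two of the four non-null values are TRUE
SELECT avg(b)
FROM (VALUES (true), (false), (true), (false), (null::boolean)) t(b);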

I don't see the real issue here. First of all, using sum as your custom aggregate name is wrong. When you call sum with your test_vector cast to int, the built-in Postgres sum is used instead of yours; that's why it is so much faster. A C function will always be faster, but I'm not sure you need one in this case.
You could easily drop the badd function and your custom sum, and use the built-in sum with a WHERE clause:
with t as (select 1 as test_vector from generate_series(1,1000000,1) gs where gs > 0.5)
select sum(test_vector) from t;
EDIT:
To sum it up, the best way to optimize your custom aggregate is to remove it if it is not needed. The second best way would be to write a C function.

Related

Reusing json parsed input in postgres plpgsql function

I have a plpgsql function that takes a jsonb input, and uses it to first check something, and then again in a query to get results. Something like:
CREATE OR REPLACE FUNCTION public.my_func(
a jsonb,
OUT inserted integer)
RETURNS integer
LANGUAGE 'plpgsql'
COST 100.0
VOLATILE NOT LEAKPROOF
AS $function$
BEGIN
-- fail if there's something already there
IF EXISTS(
select t.x from jsonb_populate_recordset(null::my_type, a) f inner join some_table t
on f.x = t.x and
f.y = t.y
) THEN
RAISE EXCEPTION 'concurrency violation... already present.';
END IF;
-- straight insert, and collect number of inserted
WITH inserted_rows AS (
INSERT INTO some_table (x, y, z)
SELECT f.x, f.y, f.z
FROM jsonb_populate_recordset(null::my_type, a) f
RETURNING 1
)
SELECT count(*) from inserted_rows INTO inserted
;
END;
$function$;
Here, I'm using jsonb_populate_recordset(null::my_type, a) both in the IF check, and also in the actual insert. Is there a way to do the parsing once - perhaps via a variable of some sort? Or would the query optimiser kick in and ensure the parse operation happens only once?
If I understand correctly, you are looking for something like this:
CREATE OR REPLACE FUNCTION public.my_func(
a jsonb,
OUT inserted integer)
RETURNS integer
LANGUAGE 'plpgsql'
COST 100.0
VOLATILE NOT LEAKPROOF
AS $function$
BEGIN
WITH checked_rows AS (
SELECT f.x, f.y, f.z, t.x IS NOT NULL as present
FROM jsonb_populate_recordset(null::my_type, a) f
LEFT join some_table t
on f.x = t.x and f.y = t.y
), violated_rows AS (
SELECT count(*) AS violated FROM checked_rows AS c WHERE c.present
), inserted_rows AS (
INSERT INTO some_table (x, y, z)
SELECT c.x, c.y, c.z
FROM checked_rows AS c
WHERE (SELECT violated FROM violated_rows) = 0
RETURNING 1
)
SELECT count(*) from inserted_rows INTO inserted
;
IF inserted = 0 THEN
RAISE EXCEPTION 'concurrency violation... already present.';
END IF;
END;
$function$;
The jsonb type does not need to be parsed more than once; it is parsed at assignment:
while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed.
Link
The jsonb_populate_recordset function is declared STABLE:
STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements.
Link
I am not sure about this. On the one hand, a UDF call is considered a single statement; on the other hand, a UDF can contain multiple statements. Clarification is needed.
Finally, if you want to cache such things, you could use arrays:
CREATE OR REPLACE FUNCTION public.my_func(
a jsonb,
OUT inserted integer)
RETURNS integer
LANGUAGE 'plpgsql'
COST 100.0
VOLATILE NOT LEAKPROOF
AS $function$
DECLARE
d my_type[]; -- There is variable for caching
BEGIN
select array_agg(f) into d from jsonb_populate_recordset(null::my_type, a) as f;
-- fail if there's something already there
IF EXISTS(
select *
from some_table t
where (t.x, t.y) in (select x, y from unnest(d)))
THEN
RAISE EXCEPTION 'concurrency violation... already present.';
END IF;
-- straight insert, and collect number of inserted
WITH inserted_rows AS (
INSERT INTO some_table (x, y, z)
SELECT f.x, f.y, f.z
FROM unnest(d) f
RETURNING 1
)
SELECT count(*) from inserted_rows INTO inserted;
END $function$;
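A hypothetical call, assuming my_type consists of the fields x, y and z:
SELECT public.my_func('[{"x":1,"y":2,"z":3},{"x":4,"y":5,"z":6}]'::jsonb);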
If you actually want to reuse a result set repeatedly, the general solution would be a temporary table. Example:
Using temp table in PL/pgSQL procedure for cleaning tables
However, that's rather expensive. Looks like all you need is a UNIQUE constraint or index:
Simple and safe with UNIQUE constraint
ALTER TABLE some_table ADD CONSTRAINT some_table_x_y_uni UNIQUE (x,y);
As opposed to your procedural attempt, this is also concurrency-safe (no race conditions). Much faster, too.
Then the function can be dead simple:
CREATE OR REPLACE FUNCTION public.my_func(a jsonb, OUT inserted integer) AS
$func$
BEGIN
INSERT INTO some_table (x, y, z)
SELECT f.x, f.y, f.z
FROM jsonb_populate_recordset(null::my_type, a) f;
GET DIAGNOSTICS inserted = ROW_COUNT; -- OUT param, we're done here
END
$func$ LANGUAGE plpgsql;
If any (x,y) is already present in some_table you get your exception. Choose an instructive name for the constraint, which is reported in the error message.
And we can just read the command tag with GET DIAGNOSTICS, which is substantially cheaper than running another count query.
Related:
How does PostgreSQL enforce the UNIQUE constraint / what type of index does it use?
UNIQUE constraint not possible?
For the unlikely case that a UNIQUE constraint should not be feasible, you can still have it rather simple:
CREATE OR REPLACE FUNCTION public.my_func(a jsonb, OUT inserted integer) AS
$func$
BEGIN
INSERT INTO some_table (x, y, z)
SELECT f.x, f.y, f.z -- empty result set if there are any violations
FROM (
SELECT f.x, f.y, f.z, count(t.x) OVER () AS conflicts
FROM jsonb_populate_recordset(null::my_type, a) f
LEFT JOIN some_table t USING (x,y)
) f
WHERE f.conflicts = 0;
GET DIAGNOSTICS inserted = ROW_COUNT;
IF inserted = 0 THEN
RAISE EXCEPTION 'concurrency violation... already present.';
END IF;
END
$func$ LANGUAGE plpgsql;
Count the number of violations in the same query. (count() only counts non-null values). Related:
Best way to get result count before LIMIT was applied
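As a quick illustration of that count() behavior (example values of my own):
SELECT count(x) FROM (VALUES (1), (NULL::int), (3)) t(x);  -- returns 2, the NULL row is not counted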
You should have at least a simple index on some_table (x,y) anyway.
It's important to know that plpgsql does not return results before control exits the function. The exception cancels the return, the user never gets results, only the error message. We added a code example to the manual.
Note, however, that there are race conditions here under concurrent write load. Related:
Is SELECT or INSERT in a function prone to race conditions?
Would the query planner avoid repeated evaluation?
Certainly not between multiple SQL statements.
Even if the function itself is defined STABLE or IMMUTABLE (jsonb_populate_recordset() in the example is STABLE), the query planner does not know that values of input parameters are unchanged between calls. It would be expensive to keep track and make sure of it.
Actually, since plpgsql treats SQL statements like prepared statements, that's plain impossible: the query is planned before parameter values are fed to the planned query.

PostgreSQL: store function in column as value

Can functions be stored as anonymous functions directly in column as its value?
Let's say I want this function be stored in column.
Example (pseudocode):
Table my_table: pk (int), my_function (func)
func ( x ) { return x * 100 }
And later use it as:
select
t.my_function(some_input) AS output
from
my_table as t
where t.pk = 1999
Function may vary for each pk.
Your title asks for something different from what your example shows.
A function has to be created before you can call it. (title)
An expression has to be evaluated. You would need a meta-function for that. (example)
Here are solutions for both:
1. Evaluate expressions dynamically
You have to take into account that the resulting type can vary. I use polymorphic types for that.
CREATE OR REPLACE FUNCTION f1(int)
RETURNS int
LANGUAGE sql IMMUTABLE AS
'SELECT $1 * 100;';
CREATE OR REPLACE FUNCTION f2(text)
RETURNS text
LANGUAGE sql IMMUTABLE AS
$$SELECT $1 || '_foo';$$;
CREATE TABLE my_expr (
expr text PRIMARY KEY
, def text
, rettype regtype
);
INSERT INTO my_expr VALUES
('x', 'f1(3)' , 'int')
, ('y', $$f2('bar')$$, 'text')
, ('z', 'now()' , 'timestamptz')
;
CREATE OR REPLACE FUNCTION f_eval(text, _type anyelement = 'NULL'::text, OUT _result anyelement)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE
'SELECT ' || (SELECT def FROM my_expr WHERE expr = $1)
INTO _result;
END
$func$;
Related:
Refactor a PL/pgSQL function to return the output of various SELECT queries
Call:
SQL is strictly typed; the same result column can only have one data type. For multiple rows with possibly heterogeneous data types, you might settle for type text, as every data type can be cast to and from text:
SELECT *, f_eval(expr) AS result -- default to type text
FROM my_expr;
Or return multiple columns like:
SELECT *
, CASE WHEN rettype = 'text'::regtype THEN f_eval(expr) END AS text_result -- default to type text
, CASE WHEN rettype = 'int'::regtype THEN f_eval(expr, NULL::int) END AS int_result
, CASE WHEN rettype = 'timestamptz'::regtype THEN f_eval(expr, NULL::timestamptz) END AS tstz_result
-- , more?
FROM my_expr;
db<>fiddle here
2. Create and use functions dynamically
It is possible to create functions dynamically and then use them. You cannot do that with plain SQL, however. You will have to use another function to do that or at least an anonymous code block (DO statement), introduced in PostgreSQL 9.0.
It can work like this:
CREATE TABLE my_func (func text PRIMARY KEY, def text);
INSERT INTO my_func VALUES
('f'
, $$CREATE OR REPLACE FUNCTION f(int)
RETURNS int
LANGUAGE sql IMMUTABLE AS
'SELECT $1 * 100;'$$);
CREATE OR REPLACE FUNCTION f_create_func(text)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE (SELECT def FROM my_func WHERE func = $1);
END
$func$;
Call:
SELECT f_create_func('f');
SELECT f(3);
db<>fiddle here
You may want to drop the function afterwards.
In most cases you should just create the functions instead and be done with it. Use separate schemas if you have problems with multiple versions or privileges.
For more information on the features I used here, see my related answer on dba.stackexchange.com.

eliminate duplicate array values in postgres

I have an array of type bigint; how can I remove the duplicate values in that array?
Ex: array[1234, 5343, 6353, 1234, 1234]
I should get array[1234, 5343, 6353, ...]
I tested the example SELECT uniq(sort('{1,2,3,2,1}'::int[])) from the Postgres manual, but it is not working.
I faced the same issue, but in my case the array is created via the array_agg function. Fortunately, it allows aggregating DISTINCT values, like:
array_agg(DISTINCT value)
This works for me.
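Applied to the array from the question, a minimal sketch would be:
SELECT array_agg(DISTINCT v) AS deduped
FROM unnest(ARRAY[1234, 5343, 6353, 1234, 1234]::bigint[]) v;
-- {1234,5343,6353}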
The sort(int[]) and uniq(int[]) functions are provided by the intarray contrib module.
To enable its use, you must install the module.
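On PostgreSQL 9.1 or later that is a one-liner (older versions run the module's contrib SQL script instead); note that uniq() only removes adjacent duplicates, hence the sort():
CREATE EXTENSION intarray;
SELECT uniq(sort('{1,2,3,2,1}'::int[]));  -- {1,2,3}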
If you don't want to use the intarray contrib module, or if you have to remove duplicates from arrays of different type, you have two other ways.
If you have at least PostgreSQL 8.4 you could take advantage of unnest(anyarray) function
SELECT ARRAY(SELECT DISTINCT UNNEST('{1,2,3,2,1}'::int[]) ORDER BY 1);
?column?
----------
{1,2,3}
(1 row)
Alternatively you could create your own function to do this
CREATE OR REPLACE FUNCTION array_sort_unique (ANYARRAY) RETURNS ANYARRAY
LANGUAGE SQL
AS $body$
SELECT ARRAY(
SELECT DISTINCT $1[s.i]
FROM generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY 1
);
$body$;
Here is a sample invocation:
SELECT array_sort_unique('{1,2,3,2,1}'::int[]);
array_sort_unique
-------------------
{1,2,3}
(1 row)
... Where are the standard libraries for this kind of array_X utility?
Searching turns up some, but nothing standard:
postgres.cz/wiki/Array_based_functions: good reference!
JDBurnZ/postgresql-anyarray: good initiative, but needs some collaboration to enhance.
wiki.postgresql.org/Snippets: frustrated initiative, but the "official wiki"; needs some collaboration to enhance.
MADlib: good! ... but it is an elephant, not a "pure SQL snippets lib".
Simplest and fastest array_distinct() snippet-lib function
Here is the simplest and perhaps fastest implementation of array_unique() or array_distinct():
CREATE FUNCTION array_distinct(anyarray) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x) FROM unnest($1) t(x);
$f$ LANGUAGE SQL IMMUTABLE;
NOTE: it works as expected with any datatype, except with arrays of arrays,
SELECT array_distinct( array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99] ),
array_distinct( array['3','3','hello','hello','bye'] ),
array_distinct( array[array[3,3],array[3,3],array[3,3],array[5,6]] );
-- "{1,2,3,4,6,8,99}", "{3,bye,hello}", "{3,5,6}"
the "side effect" is to explode all arrays in a set of elements.
PS: with JSONB arrays works fine,
SELECT array_distinct( array['[3,3]'::JSONB, '[3,3]'::JSONB, '[5,6]'::JSONB] );
-- "{"[3, 3]","[5, 6]"}"
Edit: a more complex but useful variant, with a "drop nulls" parameter:
CREATE FUNCTION array_distinct(
anyarray, -- input array
boolean DEFAULT false -- flag to ignore nulls
) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x)
FROM unnest($1) t(x)
WHERE CASE WHEN $2 THEN x IS NOT NULL ELSE true END;
$f$ LANGUAGE SQL IMMUTABLE;
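A usage sketch for the two-parameter variant (example values of my own):
SELECT array_distinct(ARRAY[1, NULL, 2, 2, NULL]);        -- {1,2,NULL}
SELECT array_distinct(ARRAY[1, NULL, 2, 2, NULL], true);  -- {1,2}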
Using DISTINCT implicitly sorts the array. If the relative order of the array elements needs to be preserved while removing duplicates, the function can be designed like the following: (should work from 9.4 onwards)
CREATE OR REPLACE FUNCTION array_uniq_stable(anyarray) RETURNS anyarray AS
$body$
SELECT
array_agg(distinct_value ORDER BY first_index)
FROM
(SELECT
value AS distinct_value,
min(index) AS first_index
FROM
unnest($1) WITH ORDINALITY AS input(value, index)
GROUP BY
value
) AS unique_input
;
$body$
LANGUAGE 'sql' IMMUTABLE STRICT;
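A quick sketch of the order-preserving behavior (example values of my own):
SELECT array_uniq_stable(ARRAY[3, 1, 3, 2, 1]);  -- {3,1,2}: first-appearance order is kept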
I have assembled a set of stored procedures (functions), coined anyarray, to combat PostgreSQL's lack of array handling. These functions are designed to work across any array data-type, not just integers as intarray does: https://www.github.com/JDBurnZ/anyarray
In your case, all you'd really need is anyarray_uniq.sql. Copy & paste the contents of that file into a PostgreSQL query and execute it to add the function. If you need array sorting as well, also add anyarray_sort.sql.
From there, you can perform a simple query as follows:
SELECT ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234])
Returns something similar to: ARRAY[1234, 6353, 5343]
Or if you require sorting:
SELECT ANYARRAY_SORT(ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234]))
Returns exactly: ARRAY[1234, 5343, 6353]
Here's the "inline" way:
SELECT 1 AS anycolumn, (
SELECT array_agg(c1)
FROM (
SELECT DISTINCT c1
FROM (
SELECT unnest(ARRAY[1234,5343,6353,1234,1234]) AS c1
) AS t1
) AS t2
) AS the_array;
First we create a set from the array, then we select only the distinct entries, and then aggregate them back into an array.
In a single query, I did this:
SELECT (select array_agg(distinct val) from ( select unnest(:array_column) as val ) as u ) FROM :your_table;
For people like me who still have to deal with Postgres 8.2, this recursive function can eliminate duplicates without altering the sorting of the array:
CREATE OR REPLACE FUNCTION my_array_uniq(bigint[])
RETURNS bigint[] AS
$BODY$
DECLARE
n integer;
BEGIN
-- number of elements in the array
n = replace(split_part(array_dims($1),':',2),']','')::int;
IF n > 1 THEN
-- test if the last item belongs to the rest of the array
IF ($1)[1:n-1] #> ($1)[n:n] THEN
-- returns the result of the same function on the rest of the array
return my_array_uniq($1[1:n-1]);
ELSE
-- returns the result of the same function on the rest of the array plus the last element
return my_array_uniq($1[1:n-1]) || $1[n:n];
END IF;
ELSE
-- if array has only one item, returns the array
return $1;
END IF;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
For example:
select my_array_uniq(array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99]);
will give
{3,8,2,6,4,1,99}

Avoid nesting aggregate functions in PostgreSQL 8.3.4

Assuming that my subquery yields a number of rows with the columns (x,y), I would like to calculate the value avg(abs(x-mean)/y), where mean effectively is avg(x).
select avg(abs(x-avg(x))/y) as incline from subquery fails because I cannot nest aggregation functions. Nor can I think of a way to calculate the mean in a subquery while keeping the original result set. An avgdev function as it exists in other dialects would not exactly help me, so here I am stuck. Probably just due to lack of sql knowledge - calculating the value from the result set in postprocessing is easy.
Which SQL construct could help me?
Edit: Server version is 8.3.4. No window functions with WITH or OVER available here.
Not sure I understand you correctly, but you might be looking for something like this:
SELECT avg(abs(x - mean)/y) AS incline
FROM (
    SELECT x,
           y,
           avg(x) OVER (PARTITION BY your_grouping_column) AS mean
    FROM your_table
) t
If you do not need to group your results to get the correct avg(x) then simply leave out the "partition by" using an empty over: over()
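Note that the asker's 8.3 server has no window functions. A window-free sketch of the same idea computes the mean in an uncorrelated scalar subquery instead (it scans the source twice):
SELECT avg(abs(x - (SELECT avg(x) FROM subquery)) / y) AS incline
FROM subquery;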
If your data sets are not too large, you could accumulate them into an array and then return the incline from a function:
create type typ as (x numeric, y numeric);
create aggregate array_accum( sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}' );
create or replace function unnest(anyarray) returns setof anyelement
language sql immutable strict as $$
select $1[i] from generate_series(array_lower($1,1), array_upper($1,1)) i;$$;
create function get_incline(typ[]) returns numeric
language sql immutable strict as $$
select avg(abs(x-(select avg(x) from unnest($1)))/y) from unnest($1);$$;
select get_incline((select array_accum((x,y)::typ) from subquery));
sample view for testing:
create view subquery as
select generate_series(1,5) as x, generate_series(1,6) as y;
One option I found is to use a temporary table:
begin;
create temporary table sub on commit drop as (...subquery code...);
select avg(abs(x-mean)/y) as incline from (SELECT x, y, (SELECT avg(x) FROM sub) AS mean FROM sub) as sub2;
commit;
But is that overkill?

Sorting array elements

I want to write a stored procedure that takes an array as an input parameter, sorts it, and returns the sorted array.
The best way to sort an array of integers is without a doubt to use the intarray extension, which will do it much, much, much faster than any SQL formulation:
CREATE EXTENSION intarray;
SELECT sort( ARRAY[4,3,2,1] );
A function that works for any array type is:
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(SELECT unnest($1) ORDER BY 1)
$$;
(I've replaced my version with Pavel's slightly faster one after discussion elsewhere).
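For example (values of my own), it sorts any element type that has a default ordering:
SELECT array_sort(ARRAY[4,3,2,1]);      -- {1,2,3,4}
SELECT array_sort(ARRAY['b','c','a']);  -- {a,b,c}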
In PostgreSQL 8.4 and up you can use:
select array_agg(x) from (select unnest(ARRAY[1,5,3,7,2]) AS x order by x) as _;
But it will not be very fast.
In older Postgres versions you can implement unnest like this:
CREATE OR REPLACE FUNCTION unnest(anyarray)
RETURNS SETOF anyelement AS
$BODY$
SELECT $1[i] FROM
generate_series(array_lower($1,1),
array_upper($1,1)) i;
$BODY$
LANGUAGE 'sql' IMMUTABLE
And array_agg like this:
CREATE AGGREGATE array_agg (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);
But it will be even slower.
You can also implement any sorting algorithm in pl/pgsql or any other language you can plug in to postgres.
Just use the function unnest():
SELECT
unnest(ARRAY[1,2]) AS x
ORDER BY
x DESC;
See array functions in the Pg docs.
This worked for me from http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I#General_array_sort
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[s.i] AS "foo"
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY foo
);
$$;
Please see Craig's answer, since he is far more knowledgeable on Postgres and has a better answer. Also, if possible, vote to delete my answer.
A very nice exhibition of PostgreSQL's features is the general procedure for sorting by David Fetter:
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[s.i] AS "foo"
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY foo
);
$$;
If you're looking for a solution which will work across any data-type, I'd recommend taking the approach laid out at YouLikeProgramming.com.
Essentially, you can create a stored procedure (code below) which performs the sorting for you, and all you need to do is pass your array to that procedure for it to be sorted appropriately.
I have also included an implementation which does not require the use of a stored procedure, if you're looking for your query to be a little more portable.
Creating the stored procedure
DROP FUNCTION IF EXISTS array_sort(anyarray);
CREATE FUNCTION
array_sort(
array_vals_to_sort anyarray
)
RETURNS TABLE (
sorted_array anyarray
)
AS $BODY$
BEGIN
RETURN QUERY SELECT
ARRAY_AGG(val) AS sorted_array
FROM
(
SELECT
UNNEST(array_vals_to_sort) AS val
ORDER BY
val
) AS sorted_vals
;
END;
$BODY$
LANGUAGE plpgsql;
Sorting array values (works with any array data-type)
-- The following will return: {1,2,3,4}
SELECT ARRAY_SORT(ARRAY[4,3,2,1]);
-- The following will return: {in,is,it,on,up}
SELECT ARRAY_SORT(ARRAY['up','on','it','is','in']);
Sorting array values without a stored procedure
In the following query, simply replace ARRAY[4,3,2,1] with your array or query which returns an array:
WITH
sorted_vals AS (
SELECT
UNNEST(ARRAY[4,3,2,1]) AS val
ORDER BY
val
)
SELECT
ARRAY_AGG(val) AS sorted_array
FROM
sorted_vals
... or ...
SELECT
ARRAY_AGG(vals.val) AS sorted_arr
FROM (
SELECT
UNNEST(ARRAY[4,3,2,1]) AS val
ORDER BY
val
) AS vals
I'm surprised no-one has mentioned the containment operators:
select array[1,2,3] <# array[2,1,3] and array[1,2,3] #> array[2,1,3];
?column?
══════════
t
(1 row)
Notice that this requires all elements of the arrays to be unique.
(If a contains b and b contains a, they must be equal, provided all elements are unique.)
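A counter-example with duplicates (values of my own) shows why: mutual containment can hold for arrays that are not equal as multisets, because the containment operators ignore duplicates:
SELECT ARRAY[1,1,2] @> ARRAY[1,2,2]
   AND ARRAY[1,1,2] <@ ARRAY[1,2,2];  -- t, yet the arrays differ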