I need an aggregate function in postgresql that returns the maximum value of a text column, where the maximum is calculated not by alphabetical order but by the length of the string.
Can anyone please help me out?
A custom aggregate consists of two parts: a function that does the work and the definition of the aggregate itself.
So we first need a function that returns the longer of two strings:
create function greater_by_length(p_one text, p_other text)
returns text
as
$$
select case
when length(p_one) >= length(p_other) then p_one
else p_other
end
$$
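language sql
immutable
strict; -- strict: NULL inputs are skipped by the aggregate instead of overwriting the running result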
Then we can define an aggregate using that function:
create aggregate max_by_length(text)
(
sfunc = greater_by_length,
stype = text
);
And using it:
select max_by_length(s)
from (
values ('one'), ('onetwo'), ('three'), ('threefourfive')
) as x(s);
returns threefourfive
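If only a single overall result is needed, the same value can be obtained without a custom aggregate. The following is a minimal sketch using the same sample values; unlike the aggregate it does not combine directly with GROUP BY:

```sql
-- Sort by length, longest first, and keep only the first row.
select s
from (
  values ('one'), ('onetwo'), ('three'), ('threefourfive')
) as x(s)
order by length(s) desc
limit 1;
-- returns threefourfive
```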
I have been looking for a max function that returns null if any input is null (in effect treating null as the maximum value) and found the following at https://www.postgresql.org/message-id/r2y162867791004201002x50843917y3d1f1293db7451e0#mail.gmail.com :
create or replace function greatest_strict(variadic anyarray)
returns anyelement as $$
select null from unnest($1) g(v) where v is null
union all
select max(v) from unnest($1) g(v)
limit 1
$$ language sql;
The problem is that this function is not an aggregate function, so it cannot be used with GROUP BY. How can I change that, so that I can use the following query?
SELECT greatest_strict(performed_on) as start_date
from task
group by contract_id;
I've created this before: https://wiki.postgresql.org/wiki/Aggregate_strict_min_and_max
I call it strict_max, not strict_greatest, because "max" is already an aggregate so that seems like a better name.
This has the advantage (over the other answer) of not storing all the values in memory while it is aggregating over them, so that it can work on very large data sets.
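To illustrate how such an aggregate can keep only a constant amount of state, here is a rough sketch in the spirit of that approach. It is not necessarily the code on the wiki page, and the names strict_max_step, strict_max_final and strict_max_sketch are made up for this example. The state is a one-element array, so "no rows seen yet" (state is NULL) can be told apart from "a NULL was seen" (state is {NULL}):

```sql
-- Transition function: keeps a one-element array holding the running result.
create or replace function strict_max_step(anyarray, anyelement)
returns anyarray
language sql immutable
as $$
  select case
    when $1 is null    then array[$2]  -- first row: start with its value
    when $1[1] is null then $1         -- a NULL was already seen: stay NULL
    when $2 is null    then array[$2]  -- NULL input: result becomes NULL
    when $1[1] < $2    then array[$2]  -- new maximum
    else $1
  end
$$;

-- Final function: unwrap the single element.
create or replace function strict_max_final(anyarray)
returns anyelement
language sql immutable
as $$
  select $1[1]
$$;

create aggregate strict_max_sketch(anyelement) (
  sfunc     = strict_max_step,
  stype     = anyarray,
  finalfunc = strict_max_final
);

-- Usage with made-up sample rows standing in for the task table:
select contract_id, strict_max_sketch(performed_on) as start_date
from (
  values (1, date '2020-01-01'), (1, null),
         (2, date '2020-02-01'), (2, date '2020-03-01')
) as task(contract_id, performed_on)
group by contract_id
order by contract_id;
-- contract 1 gives NULL, contract 2 gives 2020-03-01
```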
You can create your own aggregation functions.
create aggregate agg_greatest_strict(anyelement) (
sfunc = create_array,
stype = anyarray,
finalfunc = greatest_strict,
initcond = '{}'
);
sfunc is a function which will be executed for every row and returns an intermediate result.
finalfunc will be executed afterwards with the result of the last sfunc execution.
In your case you could create the arrays for every row (your sfunc):
create or replace function create_array(anyarray, anyelement)
returns anyarray as $$
SELECT
$1 || $2
$$ language sql;
This simply appends each row's value to one array. The first parameter is the result of the previous execution; on the first call, the initcond value is used instead.
Afterwards you can take your function as finalfunc:
create or replace function greatest_strict(anyarray)
returns anyelement as $$
select null from unnest($1) g(v) where v is null
union all
select max(v) from unnest($1) g(v)
limit 1
$$ language sql;
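For comparison, the same "NULL if any value is NULL" result can probably be had with built-in aggregates only. This sketch relies on count(*) counting all rows while count(column) counts only non-null ones; the sample rows are made up and stand in for the task table:

```sql
select contract_id,
       -- If the two counts differ, the group contains at least one NULL,
       -- so the CASE falls through and yields NULL.
       case when count(*) = count(performed_on)
            then max(performed_on)
       end as start_date
from (
  values (1, date '2020-01-01'), (1, null),
         (2, date '2020-02-01'), (2, date '2020-03-01')
) as task(contract_id, performed_on)
group by contract_id
order by contract_id;
-- contract 1 gives NULL, contract 2 gives 2020-03-01
```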
demo:db<>fiddle
Edit: Former solutions without any finalfunc function using the greatest() function on every row:
demo:db<>fiddle (one sfunc for anyelement)
demo:db<>fiddle (overloaded sfunc for text and numeric type because of some problem with special chars and ASCII-order)
Write an aggregate to count the number of times the number 40 is seen in a column.
Use your aggregate to count the number of 40 year olds in the directory table.
This is what I was doing:
Create function aggstep(curr int) returns int as $$
begin
return curr.count where age = 40;
end;
$$ language plpgsql;
Create aggregate aggs(integer) (
stype = int,
initcond = '',
sfunc = aggstep);
Select cas(age) from directory;
You could do it for example like this:
First, create a transition function:
CREATE FUNCTION count40func(bigint, integer) RETURNS bigint
LANGUAGE sql IMMUTABLE CALLED ON NULL INPUT AS
'SELECT $1 + ($2 IS NOT DISTINCT FROM 40)::integer::bigint';
That works because FALSE::integer is 0 and TRUE::integer is 1.
I use IS NOT DISTINCT FROM rather than = so that it does the correct thing for NULLs.
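The two points above can be checked directly. A small sketch (the column aliases are only for readability):

```sql
select false::integer                          as false_as_int,     -- 0
       true::integer                           as true_as_int,      -- 1
       (null::integer = 40)                    as equals_on_null,   -- NULL
       (null::integer is not distinct from 40) as not_distinct_on_null;  -- false
```

With =, a NULL age would turn the comparison, and therefore the whole running sum, into NULL; with IS NOT DISTINCT FROM it simply counts as "not 40".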
The aggregate can then be defined as
CREATE AGGREGATE count40(integer) (
SFUNC = count40func,
STYPE = bigint,
INITCOND = 0
);
You can then query like
SELECT count40(age) FROM directory;
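Outside of an exercise about writing aggregates, the same count can be obtained with a built-in aggregate and a FILTER clause (available in PostgreSQL 9.4 and later). A sketch with made-up rows standing in for the directory table:

```sql
select count(*) filter (where age = 40) as forty_year_olds
from (
  values (40), (25), (40), (null)
) as directory(age);
-- returns 2
```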
I have recently started working with PostgreSQL and am trying to create a ListAggregation aggregate as given here, the only difference being that I am trying to use CONCAT instead of TextCat.
My function is as under
CREATE AGGREGATE ListAggregation(
basetype = Text,
sfunc = Concat,
stype = Text,
initcond = ''
);
It is throwing error
ERROR: function concat(text, text) does not exist
********** Error **********
ERROR: function concat(text, text) does not exist
SQL state: 42883
What mistake am I making? Please help.
N.B. I have even looked at the example given here.
Thanks
Interesting, what are you planning to do? There is already a string_agg() aggregate function in PostgreSQL 9.0+ ...
To implement an aggregate you need to create a state transition function sfunc with the signature: sfunc( state, value ) ---> next-state
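To make the signature concrete: the aggregate starts from initcond and feeds each row's value, together with the current state, into sfunc. A sketch of what that amounts to for three rows, written out by hand with the built-in textcat(text, text) function:

```sql
-- state0 = ''                      (initcond)
-- state1 = textcat(state0, 'AAAA')
-- state2 = textcat(state1, 'BBBB')
-- state3 = textcat(state2, '1111') (final result)
select textcat(textcat(textcat(''::text, 'AAAA'), 'BBBB'), '1111') as folded;
-- returns AAAABBBB1111
```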
-- sfunc:
CREATE OR REPLACE FUNCTION concat(text, text)
RETURNS text
LANGUAGE SQL
AS $$
SELECT $1||$2;
$$;
-- Aggregate:
CREATE AGGREGATE ListAggregation(
basetype = text,
sfunc = concat,
stype = text,
initcond = ''
);
-- Testing:
WITH test(v) AS (VALUES
('AAAA'),
('BBBB'),
('1111'),
('2222') )
SELECT ListAggregation(v) FROM test;
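For comparison, the built-in string_agg() mentioned above gives the same kind of result without any custom function. In this sketch the ORDER BY inside the call is an addition, since without it the concatenation order is not guaranteed:

```sql
with test(v) as (values
  ('AAAA'),
  ('BBBB'),
  ('1111'),
  ('2222') )
select string_agg(v, '' order by v) as joined
from test;
```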
I want to write a stored procedure that gets an array as input parameter and sort that array and return the sorted array.
The best way to sort an array of integers is without a doubt to use the intarray extension, which will do it much, much, much faster than any SQL formulation:
CREATE EXTENSION intarray;
SELECT sort( ARRAY[4,3,2,1] );
A function that works for any array type is:
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(SELECT unnest($1) ORDER BY 1)
$$;
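The body of that function can also be tried on its own, which shows what it does for a literal array:

```sql
-- unnest() expands the array into rows, ORDER BY 1 sorts them,
-- and the ARRAY(...) constructor collects the sorted rows again.
select array(select unnest(array[4,3,2,1]) order by 1) as sorted;
-- returns {1,2,3,4}
```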
(I've replaced my version with Pavel's slightly faster one after discussion elsewhere).
In PostgreSQL 8.4 and up you can use:
select array_agg(x) from (select unnest(ARRAY[1,5,3,7,2]) AS x order by x) as _;
But it will not be very fast.
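From PostgreSQL 9.0 on, the ordering can be written inside the aggregate call itself, which avoids depending on the order of the subquery's rows. A sketch with the same sample array:

```sql
select array_agg(x order by x) as sorted
from unnest(array[1,5,3,7,2]) as x;
-- returns {1,2,3,5,7}
```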
In older Postgres you can implement unnest like this
CREATE OR REPLACE FUNCTION unnest(anyarray)
RETURNS SETOF anyelement AS
$BODY$
SELECT $1[i] FROM
generate_series(array_lower($1,1),
array_upper($1,1)) i;
$BODY$
LANGUAGE 'sql' IMMUTABLE;
And array_agg like this:
CREATE AGGREGATE array_agg (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);
But it will be even slower.
You can also implement any sorting algorithm in pl/pgsql or any other language you can plug in to postgres.
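As an illustration of that last point, here is a sketch of a plain insertion sort in pl/pgsql. The function name insertion_sort_sketch is made up, and the code assumes a one-dimensional integer array with lower bound 1 and no NULL elements; it is meant to show the mechanics, not to compete with the approaches above on speed:

```sql
create or replace function insertion_sort_sketch(p_input integer[])
returns integer[]
language plpgsql
immutable
as $$
declare
    v_result integer[] := p_input;
    v_key    integer;
    j        integer;
begin
    -- array_length() is NULL for an empty array, hence the coalesce.
    for i in 2 .. coalesce(array_length(v_result, 1), 0) loop
        v_key := v_result[i];
        j := i - 1;
        -- Shift larger elements one slot to the right.
        while j >= 1 and v_result[j] > v_key loop
            v_result[j + 1] := v_result[j];
            j := j - 1;
        end loop;
        v_result[j + 1] := v_key;
    end loop;
    return v_result;
end;
$$;

select insertion_sort_sketch(array[4,3,2,1]);
-- returns {1,2,3,4}
```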
Just use the function unnest():
SELECT
unnest(ARRAY[1,2]) AS x
ORDER BY
x DESC;
See array functions in the Pg docs.
This worked for me from http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I#General_array_sort
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[s.i] AS "foo"
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY foo
);
$$;
Please see Craig's answer, since he is far more knowledgeable about Postgres and has a better answer. Also, if possible, vote to delete my answer.
A very nice exhibition of PostgreSQL's features is David Fetter's general procedure for sorting.
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[s.i] AS "foo"
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY foo
);
$$;
If you're looking for a solution which will work across any data-type, I'd recommend taking the approach laid out at YouLikeProgramming.com.
Essentially, you can create a stored procedure (code below) which performs the sorting for you, and all you need to do is pass your array to that procedure for it to be sorted appropriately.
I have also included an implementation which does not require the use of a stored procedure, if you're looking for your query to be a little more transportable.
Creating the stored procedure
DROP FUNCTION IF EXISTS array_sort(anyarray);
CREATE FUNCTION
array_sort(
array_vals_to_sort anyarray
)
RETURNS TABLE (
sorted_array anyarray
)
AS $BODY$
BEGIN
RETURN QUERY SELECT
ARRAY_AGG(val) AS sorted_array
FROM
(
SELECT
UNNEST(array_vals_to_sort) AS val
ORDER BY
val
) AS sorted_vals
;
END;
$BODY$
LANGUAGE plpgsql;
Sorting array values (works with any array data-type)
-- The following will return: {1,2,3,4}
SELECT ARRAY_SORT(ARRAY[4,3,2,1]);
-- The following will return: {in,is,it,on,up}
SELECT ARRAY_SORT(ARRAY['up','on','it','is','in']);
Sorting array values without a stored procedure
In the following query, simply replace ARRAY[4,3,2,1] with your array or query which returns an array:
WITH
sorted_vals AS (
SELECT
UNNEST(ARRAY[4,3,2,1]) AS val
ORDER BY
val
)
SELECT
ARRAY_AGG(val) AS sorted_array
FROM
sorted_vals
... or ...
SELECT
ARRAY_AGG(vals.val) AS sorted_arr
FROM (
SELECT
UNNEST(ARRAY[4,3,2,1]) AS val
ORDER BY
val
) AS vals
I'm surprised no-one has mentioned the containment operators:
select array[1,2,3] <@ array[2,1,3] and array[1,2,3] @> array[2,1,3];
?column?
══════════
t
(1 row)
Note that this requires all elements of the arrays to be unique.
(If a contains b and b contains a, they must be the same if all elements are unique)
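The uniqueness requirement matters because array containment ignores duplicates. A small sketch of a case where the mutual containment test is true even though the arrays are not reorderings of each other:

```sql
-- {1,1,2} and {1,2,2} each contain every element of the other,
-- so both containment tests succeed.
select array[1,1,2] <@ array[1,2,2]
   and array[1,1,2] @> array[1,2,2] as mutually_contained;
-- returns t
```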