I have an array of type bigint, how can I remove the duplicate values in that array?
Ex: array[1234, 5343, 6353, 1234, 1234]
I should get array[1234, 5343, 6353, ...]
I tested out the example SELECT uniq(sort('{1,2,3,2,1}'::int[])) in the postgres manual but it is not working.
I faced the same. But an array in my case is created via array_agg function. And fortunately it allows to aggregate DISTINCT values, like:
array_agg(DISTINCT value)
This works for me.
The sort(int[]) and uniq(int[]) functions are provided by the intarray contrib module.
To enable its use, you must install the module.
If you don't want to use the intarray contrib module, or if you have to remove duplicates from arrays of different type, you have two other ways.
If you have at least PostgreSQL 8.4 you could take advantage of unnest(anyarray) function
SELECT ARRAY(SELECT DISTINCT UNNEST('{1,2,3,2,1}'::int[]) ORDER BY 1);
?column?
----------
{1,2,3}
(1 row)
Alternatively you could create your own function to do this
CREATE OR REPLACE FUNCTION array_sort_unique (ANYARRAY) RETURNS ANYARRAY
LANGUAGE SQL
AS $body$
SELECT ARRAY(
SELECT DISTINCT $1[s.i]
FROM generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY 1
);
$body$;
Here is a sample invocation:
SELECT array_sort_unique('{1,2,3,2,1}'::int[]);
array_sort_unique
-------------------
{1,2,3}
(1 row)
... Where the statandard libraries (?) for this kind of array_X utility??
Try to search... See some but no standard:
postgres.cz/wiki/Array_based_functions: good reference!
JDBurnZ/postgresql-anyarray, good initiative but needs some collaboration to enhance.
wiki.postgresql.org/Snippets, frustrated initiative, but "offcial wiki", needs some collaboration to enhance.
MADlib: good! .... but it is an elephant, not an "pure SQL snippets lib".
Simplest and faster array_distinct() snippet-lib function
Here the simplest and perhaps faster implementation for array_unique() or array_distinct():
CREATE FUNCTION array_distinct(anyarray) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x) FROM unnest($1) t(x);
$f$ LANGUAGE SQL IMMUTABLE;
NOTE: it works as expected with any datatype, except with array of arrays,
SELECT array_distinct( array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99] ),
array_distinct( array['3','3','hello','hello','bye'] ),
array_distinct( array[array[3,3],array[3,3],array[3,3],array[5,6]] );
-- "{1,2,3,4,6,8,99}", "{3,bye,hello}", "{3,5,6}"
the "side effect" is to explode all arrays in a set of elements.
PS: with JSONB arrays works fine,
SELECT array_distinct( array['[3,3]'::JSONB, '[3,3]'::JSONB, '[5,6]'::JSONB] );
-- "{"[3, 3]","[5, 6]"}"
Edit: more complex but useful, a "drop nulls" parameter
CREATE FUNCTION array_distinct(
anyarray, -- input array
boolean DEFAULT false -- flag to ignore nulls
) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x)
FROM unnest($1) t(x)
WHERE CASE WHEN $2 THEN x IS NOT NULL ELSE true END;
$f$ LANGUAGE SQL IMMUTABLE;
Using DISTINCT implicitly sorts the array. If the relative order of the array elements needs to be preserved while removing duplicates, the function can be designed like the following: (should work from 9.4 onwards)
CREATE OR REPLACE FUNCTION array_uniq_stable(anyarray) RETURNS anyarray AS
$body$
SELECT
array_agg(distinct_value ORDER BY first_index)
FROM
(SELECT
value AS distinct_value,
min(index) AS first_index
FROM
unnest($1) WITH ORDINALITY AS input(value, index)
GROUP BY
value
) AS unique_input
;
$body$
LANGUAGE 'sql' IMMUTABLE STRICT;
I have assembled a set of stored procedures (functions) to combat PostgreSQL's lack of array handling coined anyarray. These functions are designed to work across any array data-type, not just integers as intarray does: https://www.github.com/JDBurnZ/anyarray
In your case, all you'd really need is anyarray_uniq.sql. Copy & paste the contents of that file into a PostgreSQL query and execute it to add the function. If you need array sorting as well, also add anyarray_sort.sql.
From there, you can peform a simple query as follows:
SELECT ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234])
Returns something similar to: ARRAY[1234, 6353, 5343]
Or if you require sorting:
SELECT ANYARRAY_SORT(ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234]))
Return exactly: ARRAY[1234, 5343, 6353]
Here's the "inline" way:
SELECT 1 AS anycolumn, (
SELECT array_agg(c1)
FROM (
SELECT DISTINCT c1
FROM (
SELECT unnest(ARRAY[1234,5343,6353,1234,1234]) AS c1
) AS t1
) AS t2
) AS the_array;
First we create a set from array, then we select only distinct entries, and then aggregate it back into array.
In a single query i did this:
SELECT (select array_agg(distinct val) from ( select unnest(:array_column) as val ) as u ) FROM :your_table;
For people like me who still have to deal with postgres 8.2, this recursive function can eliminate duplicates without altering the sorting of the array
CREATE OR REPLACE FUNCTION my_array_uniq(bigint[])
RETURNS bigint[] AS
$BODY$
DECLARE
n integer;
BEGIN
-- number of elements in the array
n = replace(split_part(array_dims($1),':',2),']','')::int;
IF n > 1 THEN
-- test if the last item belongs to the rest of the array
IF ($1)[1:n-1] #> ($1)[n:n] THEN
-- returns the result of the same function on the rest of the array
return my_array_uniq($1[1:n-1]);
ELSE
-- returns the result of the same function on the rest of the array plus the last element
return my_array_uniq($1[1:n-1]) || $1[n:n];
END IF;
ELSE
-- if array has only one item, returns the array
return $1;
END IF;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
for exemple :
select my_array_uniq(array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99]);
will give
{3,8,2,6,4,1,99}
Related
Consider a custom aggregate intended to take the set union of a bunch of arrays:
CREATE FUNCTION array_union_step (s ANYARRAY, n ANYARRAY) RETURNS ANYARRAY
AS $$ SELECT s || n; $$
LANGUAGE SQL IMMUTABLE LEAKPROOF PARALLEL SAFE;
CREATE FUNCTION array_union_final (s ANYARRAY) RETURNS ANYARRAY
AS $$
SELECT array_agg(i ORDER BY i) FROM (
SELECT DISTINCT UNNEST(x) AS i FROM (VALUES(s)) AS v(x)
) AS w WHERE i IS NOT NULL;
$$
LANGUAGE SQL IMMUTABLE LEAKPROOF PARALLEL SAFE;
CREATE AGGREGATE array_union (ANYARRAY) (
SFUNC = array_union_step,
STYPE = ANYARRAY,
FINALFUNC = array_union_final,
INITCOND = '{}',
PARALLEL = SAFE
);
As I understand it, array concatenation in PostgreSQL copies all the elements of both inputs into a new array, so this is quadratic in the total number of elements (before deduplication). Is there a more efficient alternative without writing extension code in C? (Specifically, using either LANGUAGE SQL or LANGUAGE plpgsql.) For instance, maybe it's possible for the step function to take and return a set of rows somehow?
An example of the kind of data this needs to be able to process:
create temp table demo (tag int, values text[]);
insert into demo values
(1, '{"a", "b"}'),
(2, '{"c", "d"}'),
(1, '{"a"}'),
(2, '{"c", "e", "f"}');
select tag, array_union(values) from demo group by tag;
tag | array_union
-----+-------------
2 | {c,d,e,f}
1 | {a,b}
Note in particular that the built-in array_agg cannot be used with this data, because the arrays are not all the same length:
select tag, array_agg(values) from demo group by tag;
ERROR: cannot accumulate arrays of different dimensionality
Array concatenation is expensive. That's why build-in array_agg() uses the internal array builder structure. Unfortunately, you cannot use this API on the SQL level.
I don't think using temp tables for custom aggregation is correct. Creating and dropping tables is expensive (temp table too) or very expensive (with high frequency), INSERT and SELECT is expensive, too. (Tables have a much more complex format than arrays.) If you need really fast aggregation, then write a C functions and use the array builder.
If you cannot use your own C extension, then use the built-in array_agg() function with deduplication and sort on already aggregated data.
CREATE OR REPLACE FUNCTION array_distinct(anyarray)
RETURNS anyarray AS $$
BEGIN
RETURN ARRAY(SELECT DISTINCT v FROM unnest($1) u(v) WHERE v IS NOT NULL ORDER BY v);
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
Call:
SELECT ..., array_distinct(array_agg(somecolumn)) FROM tab;
I need to loop through type RECORD items by key/index, like I can do this using array structures in other programming languages.
For example:
DECLARE
data1 record;
data2 text;
...
BEGIN
...
FOR data1 IN
SELECT
*
FROM
sometable
LOOP
FOR data2 IN
SELECT
unnest( data1 ) -- THIS IS DOESN'T WORK!
LOOP
RETURN NEXT data1[data2]; -- SMTH LIKE THIS
END LOOP;
END LOOP;
As #Pavel explained, it is not simply possible to traverse a record, like you could traverse an array. But there are several ways around it - depending on your exact requirements. Ultimately, since you want to return all values in the same column, you need to cast them to the same type - text is the obvious common ground, because there is a text representation for every type.
Quick and dirty
Say, you have a table with an integer, a text and a date column.
CREATE TEMP TABLE tbl(a int, b text, c date);
INSERT INTO tbl VALUES
(1, '1text', '2012-10-01')
,(2, '2text', '2012-10-02')
,(3, ',3,ex,', '2012-10-03') -- text with commas
,(4, '",4,"ex,"', '2012-10-04') -- text with commas and double quotes
Then the solution can be a simple as:
SELECT unnest(string_to_array(trim(t::text, '()'), ','))
FROM tbl t;
Works for the first two rows, but fails for the special cases of row 3 and 4.
You can easily solve the problem with commas in the text representation:
SELECT unnest(('{' || trim(t::text, '()') || '}')::text[])
FROM tbl t
WHERE a < 4;
This would work fine - except for line 4 which has double quotes in the text representation. Those are escaped by doubling them up. But the array constructor would need them escaped by \. Not sure why this incompatibility is there ...
SELECT ('{' || trim(t::text, '()') || '}') FROM tbl t WHERE a = 4
Yields:
{4,""",4,""ex,""",2012-10-04}
But you would need:
SELECT '{4,"\",4,\"ex,\"",2012-10-04}'::text[]; -- works
Proper solution
If you knew the column names beforehand, a clean solution would be simple:
SELECT unnest(ARRAY[a::text,b::text,c::text])
FROM tbl
Since you operate on records of well know type you can just query the system catalog:
SELECT string_agg(a.attname || '::text', ',' ORDER BY a.attnum)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = 'tbl'::regclass
AND a.attnum > 0
AND a.attisdropped = FALSE
Put this in a function with dynamic SQL:
CREATE OR REPLACE FUNCTION unnest_table(_tbl text)
RETURNS SETOF text LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE '
SELECT unnest(ARRAY[' || (
SELECT string_agg(a.attname || '::text', ',' ORDER BY a.attnum)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = _tbl::regclass
AND a.attnum > 0
AND a.attisdropped = false
) || '])
FROM ' || _tbl::regclass;
END
$func$;
Call:
SELECT unnest_table('tbl') AS val
Returns:
val
-----
1
1text
2012-10-01
2
2text
2012-10-02
3
,3,ex,
2012-10-03
4
",4,"ex,"
2012-10-04
This works without installing additional modules. Another option is to install the hstore extension and use it like #Craig demonstrates.
PL/pgSQL isn't really designed for what you want to do. It doesn't consider a record to be iterable, it's a tuple of possibly different and incompatible data types.
PL/pgSQL has EXECUTE for dynamic SQL, but EXECUTE queries cannot refer to PL/pgSQL variables like NEW or other records directly.
What you can do is convert the record to a hstore key/value structure, then iterate over the hstore. Use each(hstore(the_record)), which produces a rowset of key,value tuples. All values are cast to their text representations.
This toy function demonstrates iteration over a record by creating an anonymous ROW(..) - which will have column names f1, f2, f3 - then converting that to hstore, iterating over its column/value pairs, and returning each pair.
CREATE EXTENSION hstore;
CREATE OR REPLACE FUNCTION hs_demo()
RETURNS TABLE ("key" text, "value" text)
LANGUAGE plpgsql AS
$$
DECLARE
data1 record;
hs_row record;
BEGIN
data1 = ROW(1, 2, 'test');
FOR hs_row IN SELECT kv."key", kv."value" FROM each(hstore(data1)) kv
LOOP
"key" = hs_row."key";
"value" = hs_row."value";
RETURN NEXT;
END LOOP;
END;
$$;
In reality you would never write it this way, since the whole loop can be replaced with a simple RETURN QUERY statement and it does the same thing each(hstore) does anyway - so this is only to show how each(hstore(record)) works, and the above function should never actually be used.
This feature is not supported in plpgsql - Record IS NOT hash array like other scripting languages - it is similar to C or ADA, where this functionality is impossible. You can use other PL language like PLPerl or PLPython or some tricks - you can iterate with HSTORE datatype (extension) or via dynamic SQL
see How to set value of composite variable field using dynamic SQL
But request for this functionality usually means, so you do some wrong. When you use PL/pgSQL you have think different than you use Javascript or Python
FOR data2 IN
SELECT d
from unnest( data1 ) s(d)
LOOP
RETURN NEXT data2;
END LOOP;
If you order your results prior to looping, will you accomplish what you want.
for rc in select * from t1 order by t1.key asc loop
return next rc;
end loop;
will do exactly what you need. It is also the fastest way to perform that kind of task.
I wasn't able to find a proper way to loop over record, so what I did is converted record to json first and looped over json
declare
_src_schema varchar := 'db_utility';
_targetjson json;
_key text;
_value text;
BEGIN
select row_to_json(c.*) from information_schema.columns c where c.table_name = prm_table and c.column_name = prm_column
and c.table_schema = _src_schema into _targetjson;
raise notice '_targetjson %', _targetjson;
FOR _key, _value IN
SELECT * FROM jsonb_each_text(_targetjson)
LOOP
-- do some math operation on its corresponding value
RAISE NOTICE '%: %', _key, _value;
END LOOP;
return true;
end;
Can functions be stored as anonymous functions directly in column as its value?
Let's say I want this function be stored in column.
Example (pseudocode):
Table my_table: pk (int), my_function (func)
func ( x ) { return x * 100 }
And later use it as:
select
t.my_function(some_input) AS output
from
my_table as t
where t.pk = 1999
Function may vary for each pk.
Your title asks something else than your example.
A function has to be created before you can call it. (title)
An expression has to be evaluated. You would need a meta-function for that. (example)
Here are solutions for both:
1. Evaluate expressions dynamically
You have to take into account that the resulting type can vary. I use polymorphic types for that.
CREATE OR REPLACE FUNCTION f1(int)
RETURNS int
LANGUAGE sql IMMUTABLE AS
'SELECT $1 * 100;';
CREATE OR REPLACE FUNCTION f2(text)
RETURNS text
LANGUAGE sql IMMUTABLE AS
$$SELECT $1 || '_foo';$$;
CREATE TABLE my_expr (
expr text PRIMARY KEY
, def text
, rettype regtype
);
INSERT INTO my_expr VALUES
('x', 'f1(3)' , 'int')
, ('y', $$f2('bar')$$, 'text')
, ('z', 'now()' , 'timestamptz')
;
CREATE OR REPLACE FUNCTION f_eval(text, _type anyelement = 'NULL'::text, OUT _result anyelement)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE
'SELECT ' || (SELECT def FROM my_expr WHERE expr = $1)
INTO _result;
END
$func$;
Related:
Refactor a PL/pgSQL function to return the output of various SELECT queries
Call:
SQL is strictly typed, the same result column can only have one data type. For multiple rows with possibly heterogeneous data types, you might settle for type text, as every data type can be cast to and from text:
SELECT *, f_eval(expr) AS result -- default to type text
FROM my_expr;
Or return multplce columns like:
SELECT *
, CASE WHEN rettype = 'text'::regtype THEN f_eval(expr) END AS text_result -- default to type text
, CASE WHEN rettype = 'int'::regtype THEN f_eval(expr, NULL::int) END AS int_result
, CASE WHEN rettype = 'timestamptz'::regtype THEN f_eval(expr, NULL::timestamptz) END AS tstz_result
-- , more?
FROM my_expr;
db<>fiddle here
2. Create and use functions dynamically
It is possible to create functions dynamically and then use them. You cannot do that with plain SQL, however. You will have to use another function to do that or at least an anonymous code block (DO statement), introduced in PostgreSQL 9.0.
It can work like this:
CREATE TABLE my_func (func text PRIMARY KEY, def text);
INSERT INTO my_func VALUES
('f'
, $$CREATE OR REPLACE FUNCTION f(int)
RETURNS int
LANGUAGE sql IMMUTABLE AS
'SELECT $1 * 100;'$$);
CREATE OR REPLACE FUNCTION f_create_func(text)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE (SELECT def FROM my_func WHERE func = $1);
END
$func$;
Call:
SELECT f_create_func('f');
SELECT f(3);
db<>fiddle here
You may want to drop the function afterwards.
In most cases you should just create the functions instead and be done with it. Use separate schemas if you have problems with multiple versions or privileges.
For more information on the features I used here, see my related answer on dba.stackexchange.com.
Assuming that my subquery yields a number of rows with the columns (x,y), I would like to calculate the value avg(abs(x-mean)/y). where mean effectively is avg(x).
select avg(abs(x-avg(x))/y) as incline from subquery fails because I cannot nest aggregation functions. Nor can I think of a way to calculate the mean in a subquery while keeping the original result set. An avgdev function as it exists in other dialects would not exactly help me, so here I am stuck. Probably just due to lack of sql knowledge - calculating the value from the result set in postprocessing is easy.
Which SQL construct could help me?
Edit: Server version is 8.3.4. No window functions with WITH or OVER available here.
Not sure I understand you correctly, but you might be looking for something like this:
SELECT avg(x - mean/y)
FROM (
SELECT x,
y,
avg(x) as mean over(partition by your_grouping_column)
FROM your_table
) t
If you do not need to group your results to get the correct avg(x) then simply leave out the "partition by" using an empty over: over()
if your data sets are not too large you could accumulate them into an array and then return the incline from a function:
create type typ as (x numeric, y numeric);
create aggregate array_accum( sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}' );
create or replace function unnest(anyarray) returns setof anyelement
language sql immutable strict as $$
select $1[i] from generate_series(array_lower($1,1), array_upper($1,1)) i;$$;
create function get_incline(typ[]) returns numeric
language sql immutable strict as $$
select avg(abs(x-(select avg(x) from unnest($1)))/y) from unnest($1);$$;
select get_incline((select array_accum((x,y)::typ) from subquery));
sample view for testing:
create view subquery as
select generate_series(1,5) as x, generate_series(1,6) as y;
One option I found is to use a temporary table:
begin;
create temporary table sub on commit drop as (...subquery code...);
select avg(abs(x-mean)/y) as incline from (SELECT x, y, (SELECT avg(x) FROM sub) AS mean FROM sub) as sub2;
commit;
But is that overkill?
I want to write a stored procedure that gets an array as input parameter and sort that array and return the sorted array.
The best way to sort an array of integers is without a doubt to use the intarray extension, which will do it much, much, much faster than any SQL formulation:
CREATE EXTENSION intarray;
SELECT sort( ARRAY[4,3,2,1] );
A function that works for any array type is:
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(SELECT unnest($1) ORDER BY 1)
$$;
(I've replaced my version with Pavel's slightly faster one after discussion elsewhere).
In PostrgreSQL 8.4 and up you can use:
select array_agg(x) from (select unnest(ARRAY[1,5,3,7,2]) AS x order by x) as _;
But it will not be very fast.
In older Postgres you can implement unnest like this
CREATE OR REPLACE FUNCTION unnest(anyarray)
RETURNS SETOF anyelement AS
$BODY$
SELECT $1[i] FROM
generate_series(array_lower($1,1),
array_upper($1,1)) i;
$BODY$
LANGUAGE 'sql' IMMUTABLE
And array_agg like this:
CREATE AGGREGATE array_agg (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);
But it will be even slower.
You can also implement any sorting algorithm in pl/pgsql or any other language you can plug in to postgres.
Just use the function unnest():
SELECT
unnest(ARRAY[1,2]) AS x
ORDER BY
x DESC;
See array functions in the Pg docs.
This worked for me from http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I#General_array_sort
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[s.i] AS "foo"
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY foo
);
$$;
Please see Craig's answer since he is far more more knowledgable on Postgres and has a better answer. Also if possible vote to delete my answer.
Very nice exhibition of PostgreSQL's features is general procedure for sorting by David Fetter.
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[s.i] AS "foo"
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY foo
);
$$;
If you're looking for a solution which will work across any data-type, I'd recommend taking the approach laid out at YouLikeProgramming.com.
Essentially, you can create a stored procedure (code below) which performs the sorting for you, and all you need to do is pass your array to that procedure for it to be sorted appropriately.
I have also included an implementation which does not require the use of a stored procedure, if you're looking for your query to be a little more transportable.
Creating the stored procedure
DROP FUNCTION IF EXISTS array_sort(anyarray);
CREATE FUNCTION
array_sort(
array_vals_to_sort anyarray
)
RETURNS TABLE (
sorted_array anyarray
)
AS $BODY$
BEGIN
RETURN QUERY SELECT
ARRAY_AGG(val) AS sorted_array
FROM
(
SELECT
UNNEST(array_vals_to_sort) AS val
ORDER BY
val
) AS sorted_vals
;
END;
$BODY$
LANGUAGE plpgsql;
Sorting array values (works with any array data-type)
-- The following will return: {1,2,3,4}
SELECT ARRAY_SORT(ARRAY[4,3,2,1]);
-- The following will return: {in,is,it,on,up}
SELECT ARRAY_SORT(ARRAY['up','on','it','is','in']);
Sorting array values without a stored procedure
In the following query, simply replace ARRAY[4,3,2,1] with your array or query which returns an array:
WITH
sorted_vals AS (
SELECT
UNNEST(ARRAY[4,3,2,1]) AS val
ORDER BY
val
)
SELECT
ARRAY_AGG(val) AS sorted_array
FROM
sorted_vals
... or ...
SELECT
ARRAY_AGG(vals.val) AS sorted_arr
FROM (
SELECT
UNNEST(ARRAY[4,3,2,1]) AS val
ORDER BY
val
) AS vals
I'm surprised no-one has mentioned the containment operators:
select array[1,2,3] <# array[2,1,3] and array[1,2,3] #> array[2,1,3];
?column?
══════════
t
(1 row)
Notice that this requires that all elements of the arrays must be unique.
(If a contains b and b contains a, they must be the same if all elements are unique)