I have 2 tables in Postgresql with the same schema, the only difference is that in one of the table id field is of type bigint. Schema of the table I need to fill with data looks like this:
create table test_int_table(
id int,
description text,
hash_code int
);
I need to copy the data from test_table with bigint id to public.test_int_table. And some of the values which are bigger than id range should be filtered out. How can I track those values without hardcoding the range?
I can do something like this, but I would like to build more generic solution:
insert into test_int_table
select * from test_table as test
where test.id not between 2147483647 and 9223372036854775808
By generic I mean without constraints on the columns names and their number. So that in case, I have multiple columns of bigint type in other tables how can I filter all of their columns values generically (without specifying a column name)?
There is no generic solution, as far as I can tell.
But I would write it as
INSERT INTO test_int_table
SELECT *
FROM test_table AS t
WHERE t.id BETWEEN -2147483647 AND 2147483647;
You can do something like this if you want to track :
Create a function like this :
CREATE OR REPLACE FUNCTION convert_to_integer(v_input bigint)
RETURNS INTEGER AS $$
DECLARE v_int_value INTEGER DEFAULT NULL;
BEGIN
BEGIN
v_int_value := v_input::INTEGER;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Invalid integer value: "%". Returning NULL.', v_input;
RETURN NULL;
END;
RETURN v_int_value;
END;
$$ LANGUAGE plpgsql;
and write a query like this :
INSERT INTO test_int_table SELECT * FROM test_table AS t WHERE convert_to_integer(t.id) is not null;
Or you can modify a function to return 0.
I have an array of type bigint, how can I remove the duplicate values in that array?
Ex: array[1234, 5343, 6353, 1234, 1234]
I should get array[1234, 5343, 6353, ...]
I tested out the example SELECT uniq(sort('{1,2,3,2,1}'::int[])) in the postgres manual but it is not working.
I faced the same. But an array in my case is created via array_agg function. And fortunately it allows to aggregate DISTINCT values, like:
array_agg(DISTINCT value)
This works for me.
The sort(int[]) and uniq(int[]) functions are provided by the intarray contrib module.
To enable its use, you must install the module.
If you don't want to use the intarray contrib module, or if you have to remove duplicates from arrays of different type, you have two other ways.
If you have at least PostgreSQL 8.4 you could take advantage of unnest(anyarray) function
SELECT ARRAY(SELECT DISTINCT UNNEST('{1,2,3,2,1}'::int[]) ORDER BY 1);
?column?
----------
{1,2,3}
(1 row)
Alternatively you could create your own function to do this
CREATE OR REPLACE FUNCTION array_sort_unique (ANYARRAY) RETURNS ANYARRAY
LANGUAGE SQL
AS $body$
SELECT ARRAY(
SELECT DISTINCT $1[s.i]
FROM generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY 1
);
$body$;
Here is a sample invocation:
SELECT array_sort_unique('{1,2,3,2,1}'::int[]);
array_sort_unique
-------------------
{1,2,3}
(1 row)
... Where the statandard libraries (?) for this kind of array_X utility??
Try to search... See some but no standard:
postgres.cz/wiki/Array_based_functions: good reference!
JDBurnZ/postgresql-anyarray, good initiative but needs some collaboration to enhance.
wiki.postgresql.org/Snippets, frustrated initiative, but "offcial wiki", needs some collaboration to enhance.
MADlib: good! .... but it is an elephant, not an "pure SQL snippets lib".
Simplest and faster array_distinct() snippet-lib function
Here the simplest and perhaps faster implementation for array_unique() or array_distinct():
CREATE FUNCTION array_distinct(anyarray) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x) FROM unnest($1) t(x);
$f$ LANGUAGE SQL IMMUTABLE;
NOTE: it works as expected with any datatype, except with array of arrays,
SELECT array_distinct( array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99] ),
array_distinct( array['3','3','hello','hello','bye'] ),
array_distinct( array[array[3,3],array[3,3],array[3,3],array[5,6]] );
-- "{1,2,3,4,6,8,99}", "{3,bye,hello}", "{3,5,6}"
the "side effect" is to explode all arrays in a set of elements.
PS: with JSONB arrays works fine,
SELECT array_distinct( array['[3,3]'::JSONB, '[3,3]'::JSONB, '[5,6]'::JSONB] );
-- "{"[3, 3]","[5, 6]"}"
Edit: more complex but useful, a "drop nulls" parameter
CREATE FUNCTION array_distinct(
anyarray, -- input array
boolean DEFAULT false -- flag to ignore nulls
) RETURNS anyarray AS $f$
SELECT array_agg(DISTINCT x)
FROM unnest($1) t(x)
WHERE CASE WHEN $2 THEN x IS NOT NULL ELSE true END;
$f$ LANGUAGE SQL IMMUTABLE;
Using DISTINCT implicitly sorts the array. If the relative order of the array elements needs to be preserved while removing duplicates, the function can be designed like the following: (should work from 9.4 onwards)
CREATE OR REPLACE FUNCTION array_uniq_stable(anyarray) RETURNS anyarray AS
$body$
SELECT
array_agg(distinct_value ORDER BY first_index)
FROM
(SELECT
value AS distinct_value,
min(index) AS first_index
FROM
unnest($1) WITH ORDINALITY AS input(value, index)
GROUP BY
value
) AS unique_input
;
$body$
LANGUAGE 'sql' IMMUTABLE STRICT;
I have assembled a set of stored procedures (functions) to combat PostgreSQL's lack of array handling coined anyarray. These functions are designed to work across any array data-type, not just integers as intarray does: https://www.github.com/JDBurnZ/anyarray
In your case, all you'd really need is anyarray_uniq.sql. Copy & paste the contents of that file into a PostgreSQL query and execute it to add the function. If you need array sorting as well, also add anyarray_sort.sql.
From there, you can peform a simple query as follows:
SELECT ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234])
Returns something similar to: ARRAY[1234, 6353, 5343]
Or if you require sorting:
SELECT ANYARRAY_SORT(ANYARRAY_UNIQ(ARRAY[1234,5343,6353,1234,1234]))
Return exactly: ARRAY[1234, 5343, 6353]
Here's the "inline" way:
SELECT 1 AS anycolumn, (
SELECT array_agg(c1)
FROM (
SELECT DISTINCT c1
FROM (
SELECT unnest(ARRAY[1234,5343,6353,1234,1234]) AS c1
) AS t1
) AS t2
) AS the_array;
First we create a set from array, then we select only distinct entries, and then aggregate it back into array.
In a single query i did this:
SELECT (select array_agg(distinct val) from ( select unnest(:array_column) as val ) as u ) FROM :your_table;
For people like me who still have to deal with postgres 8.2, this recursive function can eliminate duplicates without altering the sorting of the array
CREATE OR REPLACE FUNCTION my_array_uniq(bigint[])
RETURNS bigint[] AS
$BODY$
DECLARE
n integer;
BEGIN
-- number of elements in the array
n = replace(split_part(array_dims($1),':',2),']','')::int;
IF n > 1 THEN
-- test if the last item belongs to the rest of the array
IF ($1)[1:n-1] #> ($1)[n:n] THEN
-- returns the result of the same function on the rest of the array
return my_array_uniq($1[1:n-1]);
ELSE
-- returns the result of the same function on the rest of the array plus the last element
return my_array_uniq($1[1:n-1]) || $1[n:n];
END IF;
ELSE
-- if array has only one item, returns the array
return $1;
END IF;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
for exemple :
select my_array_uniq(array[3,3,8,2,6,6,2,3,4,1,1,6,2,2,3,99]);
will give
{3,8,2,6,4,1,99}
I want to write a stored procedure that gets an array as input parameter and sort that array and return the sorted array.
The best way to sort an array of integers is without a doubt to use the intarray extension, which will do it much, much, much faster than any SQL formulation:
CREATE EXTENSION intarray;
SELECT sort( ARRAY[4,3,2,1] );
A function that works for any array type is:
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(SELECT unnest($1) ORDER BY 1)
$$;
(I've replaced my version with Pavel's slightly faster one after discussion elsewhere).
In PostrgreSQL 8.4 and up you can use:
select array_agg(x) from (select unnest(ARRAY[1,5,3,7,2]) AS x order by x) as _;
But it will not be very fast.
In older Postgres you can implement unnest like this
CREATE OR REPLACE FUNCTION unnest(anyarray)
RETURNS SETOF anyelement AS
$BODY$
SELECT $1[i] FROM
generate_series(array_lower($1,1),
array_upper($1,1)) i;
$BODY$
LANGUAGE 'sql' IMMUTABLE
And array_agg like this:
CREATE AGGREGATE array_agg (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);
But it will be even slower.
You can also implement any sorting algorithm in pl/pgsql or any other language you can plug in to postgres.
Just use the function unnest():
SELECT
unnest(ARRAY[1,2]) AS x
ORDER BY
x DESC;
See array functions in the Pg docs.
This worked for me from http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I#General_array_sort
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[s.i] AS "foo"
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY foo
);
$$;
Please see Craig's answer since he is far more more knowledgable on Postgres and has a better answer. Also if possible vote to delete my answer.
Very nice exhibition of PostgreSQL's features is general procedure for sorting by David Fetter.
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[s.i] AS "foo"
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
ORDER BY foo
);
$$;
If you're looking for a solution which will work across any data-type, I'd recommend taking the approach laid out at YouLikeProgramming.com.
Essentially, you can create a stored procedure (code below) which performs the sorting for you, and all you need to do is pass your array to that procedure for it to be sorted appropriately.
I have also included an implementation which does not require the use of a stored procedure, if you're looking for your query to be a little more transportable.
Creating the stored procedure
DROP FUNCTION IF EXISTS array_sort(anyarray);
CREATE FUNCTION
array_sort(
array_vals_to_sort anyarray
)
RETURNS TABLE (
sorted_array anyarray
)
AS $BODY$
BEGIN
RETURN QUERY SELECT
ARRAY_AGG(val) AS sorted_array
FROM
(
SELECT
UNNEST(array_vals_to_sort) AS val
ORDER BY
val
) AS sorted_vals
;
END;
$BODY$
LANGUAGE plpgsql;
Sorting array values (works with any array data-type)
-- The following will return: {1,2,3,4}
SELECT ARRAY_SORT(ARRAY[4,3,2,1]);
-- The following will return: {in,is,it,on,up}
SELECT ARRAY_SORT(ARRAY['up','on','it','is','in']);
Sorting array values without a stored procedure
In the following query, simply replace ARRAY[4,3,2,1] with your array or query which returns an array:
WITH
sorted_vals AS (
SELECT
UNNEST(ARRAY[4,3,2,1]) AS val
ORDER BY
val
)
SELECT
ARRAY_AGG(val) AS sorted_array
FROM
sorted_vals
... or ...
SELECT
ARRAY_AGG(vals.val) AS sorted_arr
FROM (
SELECT
UNNEST(ARRAY[4,3,2,1]) AS val
ORDER BY
val
) AS vals
I'm surprised no-one has mentioned the containment operators:
select array[1,2,3] <# array[2,1,3] and array[1,2,3] #> array[2,1,3];
?column?
══════════
t
(1 row)
Notice that this requires that all elements of the arrays must be unique.
(If a contains b and b contains a, they must be the same if all elements are unique)