Split array of values into rows in Amazon Redshift - amazon-redshift

How can i split an array of values in a column into corresponding rows in Redshift using a delimiter (,) ?
Input Data:-
—————————————
Empid | Items
—————————————
1001| A, B
1002| B
1003| C, D, E
Required Output:-
—————————————
Empid | Items
—————————————
1001| A
1001| B
1002| B
1003| C
1003| D
1003| E
Any help is appreciated.
Thanks

Based on the official docs, you can do with JOIN!
Let's say your input is:
—————————————
empid | items
—————————————
1001| [A, B]
1002| [B]
1003| [C, D, E]
1004| []
Then you can do it as:
SELECT t.empid, items as item
FROM table_name AS t
LEFT JOIN t.items AS items ON TRUE
This will returns:
—————————————
empid | item
—————————————
1001| A
1001| B
1002| B
1003| C
1003| D
1003| E
1004| <NULL>

There is no way to "functionally" do this in redshift.
Instead you need something like this
select empid,'A' as items from input where items ilike 'A,'
union all
select empid,'B' as items from input where items ilike 'B,'
union all
select empid,'C' as items from input where items ilike 'C,'
union all
select empid,'D' as items from input where items ilike 'D,'
union all
select empid,'E' as items from input where items ilike 'E,'

Actually with the addition of stored procedures to Redshift this is possible
The procedure below accepts two parameters (source_table and target_table)
assuming both table exists it transforms the data described in the question
The way it works is
Reads data from source table row by row
Finds out max items in Items column
In a loop extracts each item
Inserts id + item combo into target table
CREATE OR REPLACE PROCEDURE Array_to_Rows(source_table VARCHAR, target_table VARCHAR)
LANGUAGE plpgsql
AS $$
DECLARE i INTEGER;
rec RECORD;
query VARCHAR;
item VARCHAR;
cnt INTEGER;
BEGIN
query := 'SELECT * FROM ' || source_table;
FOR rec IN EXECUTE query
LOOP
select INTO cnt regexp_count(rec.items,',')+1;
i := 1;
<< items_loop >>
LOOP
SELECT INTO item trim(split_part(rec.items,',',i));
EXECUTE 'INSERT INTO ' || target_table || ' values (' || rec.Empid || ',''' || item ||''')';
i := i + 1;
EXIT items_loop WHEN (i > cnt);
END LOOP;
END LOOP;
END;
$$
Usage: CALL Array_to_Rows('source table name','target table name')
With test data in the question it took less than 0.2 seconds, don't know how big OPs data set is
Output is
Empid item
1001 A
1001 B
1002 B
1003 C
1003 D
1003 E

Related

How to loop through columns within a row

I have a row, let it be in this format
DECLARE
a t1%ROWTYPE;
BEGIN
SELECT * INTO a FROM t1 WHERE id=<some_id>
-- a = id: <some_id>, name: "some_name", description: "some_descr"
END;
And I need to insert one row per column into t2
t2 TABLE
column_name TEXT, value JSONB
Excepted result:
column_name | value
--------------------
id | '"some_id"'
name | '"some_name"'
description | '"some_descr"'
How can I do it?
No need for PL/pgSQL or a loop. You can convert the row from t1 to a JSON value, then turn those key/value pairs into rows:
insert into t2 (column_name, value)
select x.col, to_jsonb(x.val)
from t1
cross join jsonb_each_text(to_jsonb(t1)) as x(col, val)
where t1.id = 42;

How to count values in a column that come after a particular value

I have a column that has two types of values A or B. I need to find the count of values that come after the first occurrence of A.
eg
column
B
B
B
A
B
A
B
The result in this case would be 4 as their are 4 entries after the first occurrence of A(including A)
You can use a sub query to get the id number (of whatever column you are using to order the columns) of the first 'A'.
CREATE TABLE t (
id serial,
col char(1));
insert into t (col) values
('B'),
('B'),
('A'),
('B'),
('A'),
('B')
✓
6 rows affected
select
count(*) NUM
from t
where id >=
(select MIN(id)
from t
where col = 'A');
| num |
| --: |
| 4 |
db<>fiddle here

Query with condition on array items in PostgreSQL

I would like to select rows in a table in which a certain number of items in an array column meet a comparison condition (>= n). Is this possible without using unnest?
unnest() is a natural way to count filtered elements in an array.
However, you can hide this in an sql function like this:
create or replace function number_of_elements(arr int[], val int)
returns bigint language sql
as $$
select count(*)
from unnest(arr) e
where e > val;
$$;
with test(id, arr) as (
values
(1, array[1,2,3,4]),
(2, array[3,4,5,6]))
select id, arr, number_of_elements(arr, 3)
from test;
id | arr | number_of_elements
----+-----------+--------------------
1 | {1,2,3,4} | 1
2 | {3,4,5,6} | 3
(2 rows)

Get complex output type of function that returns record in postgresql

I want to write nice and detailed report on functions in my postgresql database.
I built the following query:
SELECT routine_name, data_type, proargnames
FROM information_schema.routines
join pg_catalog.pg_proc on pg_catalog.pg_proc.proname = information_schema.routines.routine_name
WHERE specific_schema = 'public'
ORDER BY routine_name;
It works as it should (basically returns me what I want it to: function name, output data type and input data type) except one thing:
I have relatively complicated functions and many of them return record.
The thing is, data_type returns me record as well for such functions, while I want detailed list of function output types.
For instance, I have something like this in one of my functions:
RETURNS TABLE("Res" integer, "Output" character varying) AS
How can I make query above (or, perhaps, a new query, if it will solve the problem) return something like
integer, character varying instead of record for such functions?
I am using postgresql 9.2
Thanks in advance!
The RECORD returned value is evaluated at runtime, there is no way that the information can be retrieved this way.
BUT, if RETURNS TABLE("Res" integer, "Output" character varying) AS is used, there is a solution.
The test functions I used:
-- first function, uses RETURNS TABLE
CREATE FUNCTION test_ret(a TEXT, b TEXT)
RETURNS TABLE("Res" integer, "Output" character varying) AS $$
DECLARE
ret RECORD;
BEGIN
-- test
END;$$ LANGUAGE plpgsql;
-- second function, test some edge cases
-- same name as above, returns simple integer
CREATE FUNCTION test_ret(a TEXT)
RETURNS INTEGER AS $$
DECLARE
ret RECORD;
BEGIN
-- test
END;$$ LANGUAGE plpgsql;
How to retrieve this function return datatype is easy as it's stored into pg_catalog.pg_proc.proallargtypes, the problem is that this is an array of OID. We must unnest this thing and join it to pg_catalog.pg_types.oid.
-- edit: add support for function not returning tables, thx Tommaso Di Bucchianico
WITH pg_proc_with_unnested_proallargtypes AS (
SELECT
pg_catalog.pg_proc.oid,
pg_catalog.pg_proc.proname,
CASE WHEN proallargtypes IS NOT NULL THEN unnest(proallargtypes) ELSE null END AS proallargtype
FROM pg_catalog.pg_proc
JOIN pg_catalog.pg_namespace ON pg_catalog.pg_proc.pronamespace = pg_catalog.pg_namespace.oid
WHERE pg_catalog.pg_namespace.nspname = 'public'
),
pg_proc_with_proallargtypes_names AS (
SELECT
pg_proc_with_unnested_proallargtypes.oid,
pg_proc_with_unnested_proallargtypes.proname,
array_agg(pg_catalog.pg_type.typname) AS proallargtypes
FROM pg_proc_with_unnested_proallargtypes
LEFT JOIN pg_catalog.pg_type ON pg_catalog.pg_type.oid = proallargtype
GROUP BY
pg_proc_with_unnested_proallargtypes.oid,
pg_proc_with_unnested_proallargtypes.proname
)
SELECT
information_schema.routines.specific_name,
information_schema.routines.routine_name,
information_schema.routines.routine_schema,
information_schema.routines.data_type,
pg_proc_with_proallargtypes_names.proallargtypes
FROM information_schema.routines
-- we can declare many function with the same name and schema as long as arg types are different
-- This is the only right way to join pg_catalog.pg_proc and information_schema.routines, sadly
JOIN pg_proc_with_proallargtypes_names
ON pg_proc_with_proallargtypes_names.proname || '_' || pg_proc_with_proallargtypes_names.oid = information_schema.routines.specific_name
;
Any refactoring is welcome :)
Here is the result:
specific_name | routine_name | routine_schema | data_type | proallargtypes
----------------+--------------+----------------+-----------+--------------------------
test_ret_16633 | test_ret | public | record | {text,text,int4,varchar}
test_ret_16635 | test_ret | public | integer | {NULL}
(2 rows)
EDIT
Identification of input and output arguments is not trivial, here is my solution for pg 9.2
-- https://gist.github.com/subssn21/e9e121f6fd5ff50f688d
-- Allow us to use array_remove in pg < 9.3
CREATE OR REPLACE FUNCTION array_remove(a ANYARRAY, e ANYELEMENT)
RETURNS ANYARRAY AS $$
BEGIN
RETURN array(SELECT x FROM unnest(a) x WHERE x <> e);
END;
$$ LANGUAGE plpgsql;
-- edit: add support for function not returning tables, thx Tommaso Di Bucchianico
WITH pg_proc_with_unnested_proallargtypes AS (
SELECT
pg_catalog.pg_proc.oid,
pg_catalog.pg_proc.proname,
pg_catalog.pg_proc.proargmodes,
CASE WHEN proallargtypes IS NOT NULL THEN unnest(proallargtypes) ELSE null END AS proallargtype
FROM pg_catalog.pg_proc
JOIN pg_catalog.pg_namespace ON pg_catalog.pg_proc.pronamespace = pg_catalog.pg_namespace.oid
WHERE pg_catalog.pg_namespace.nspname = 'public'
),
pg_proc_with_unnested_proallargtypes_names_and_mode AS (
SELECT
pg_proc_with_unnested_proallargtypes.oid,
pg_proc_with_unnested_proallargtypes.proname,
pg_catalog.pg_type.typname,
-- we can't unnest multiple array of same length the way we expect in pg 9.2
-- just retrieve each mode manually using type row_number
pg_proc_with_unnested_proallargtypes.proargmodes[row_number() OVER w] AS proargmode
FROM pg_proc_with_unnested_proallargtypes
LEFT JOIN pg_catalog.pg_type ON pg_catalog.pg_type.oid = proallargtype
WINDOW w AS (PARTITION BY pg_proc_with_unnested_proallargtypes.proname)
),
pg_proc_with_input_and_output_type_names AS (
SELECT
pg_proc_with_unnested_proallargtypes_names_and_mode.oid,
pg_proc_with_unnested_proallargtypes_names_and_mode.proname,
array_agg(pg_proc_with_unnested_proallargtypes_names_and_mode.typname) AS proallargtypes,
-- we should use FILTER, but that's not available in pg 9.2 :(
array_remove(array_agg(
-- see documentation for proargmodes here: http://www.postgresql.org/docs/9.2/static/catalog-pg-proc.html
CASE WHEN pg_proc_with_unnested_proallargtypes_names_and_mode.proargmode = ANY(ARRAY['i', 'b', 'v'])
THEN pg_proc_with_unnested_proallargtypes_names_and_mode.typname
ELSE NULL END
), NULL) AS proinputargtypes,
array_remove(array_agg(
-- see documentation for proargmodes here: http://www.postgresql.org/docs/9.2/static/catalog-pg-proc.html
CASE WHEN pg_proc_with_unnested_proallargtypes_names_and_mode.proargmode = ANY(ARRAY['o', 'b', 't'])
THEN pg_proc_with_unnested_proallargtypes_names_and_mode.typname
ELSE NULL END
), NULL) AS prooutputargtypes
FROM pg_proc_with_unnested_proallargtypes_names_and_mode
GROUP BY
pg_proc_with_unnested_proallargtypes_names_and_mode.oid,
pg_proc_with_unnested_proallargtypes_names_and_mode.proname
)
SELECT
*
FROM pg_proc_with_input_and_output_type_names
;
And here is my sample output:
oid | proname | proallargtypes | proinputargtypes | prooutputargtypes
-------+--------------+--------------------------+------------------+-------------------
16633 | test_ret | {text,text,int4,varchar} | {text,text} | {int4,varchar}
16634 | array_remove | {NULL} | {} | {}
16635 | test_ret | {NULL} | {} | {}
(3 rows)
Hope that helps :)
Answer, provided by Clément Prévost, is detailed and educative and therefore I marked it as best, while, however, after I executed suggested script I ended up with empty (filled only by {}) proinputargtypes and prooutputargtypes columnes on my machine. So I conducted a little research on my own, using hints that I learned from the answer above, and wrote following query:
WITH pg_proc_with_unhandled_proallargtypes AS (
SELECT
pg_catalog.pg_proc.oid,
pg_catalog.pg_proc.proname,
pg_catalog.pg_proc.proargmodes,
CASE WHEN proallargtypes IS NOT NULL THEN cast(proallargtypes AS text) ELSE NULL END AS proallargtype,
CASE WHEN array_agg(proargtypes) IS NOT NULL THEN replace(string_agg(proargtypes::text, ','), ' ', ',') ELSE NULL END AS proargtype
FROM pg_catalog.pg_proc
JOIN pg_catalog.pg_namespace ON pg_catalog.pg_proc.pronamespace = pg_catalog.pg_namespace.oid
WHERE pg_catalog.pg_namespace.nspname = 'public'
GROUP BY pg_catalog.pg_proc.oid, pg_catalog.pg_proc.proname, pg_catalog.pg_proc.proargmodes, proallargtype
),
pg_proc_with_unnested_proallargtypes AS(
SELECT proname, CASE WHEN char_length(proargtype) =0 THEN NULL
ELSE ('{' || proargtype || '}')::oid[] end AS inp,
CASE WHEN proallargtype is NULL THEN NULL ELSE replace(proallargtype, proargtype || ',', '')::oid[] end AS output
FROM pg_proc_with_unhandled_proallargtypes
),
smth_input AS(
SELECT proname, unnest(inp) AS inp FROM pg_proc_with_unnested_proallargtypes
),
smth_output AS(
SELECT proname, unnest(output) AS output FROM pg_proc_with_unnested_proallargtypes
),
input_unnested AS(
SELECT proname, string_agg(pg_catalog.pg_type.typname::text, ',') AS fin_input FROM smth_input
JOIN pg_catalog.pg_type ON pg_catalog.pg_type.oid = inp
GROUP BY proname
),
output_unnested AS (
SELECT proname, string_agg(pg_catalog.pg_type.typname::text, ',') AS fin_output FROM smth_output
JOIN pg_catalog.pg_type ON pg_catalog.pg_type.oid = output
GROUP BY proname
)
SELECT input_unnested.proname, fin_input, CASE WHEN fin_output IS NOT NULl THEN fin_output ELSE information_schema.routines.data_type end AS fin_output
FROM input_unnested
LEFT JOIN output_unnested ON input_unnested.proname = output_unnested.proname
JOIN information_schema.routines ON information_schema.routines.routine_name = input_unnested.proname
It might be inefficient a bit and I probably used too much explicict type casting, but it worked.

PostgreSQL function or stored procedure that outputs multiple columns?

Here is what I ideally want. Imagine that I have a table with the row A.
I want to do:
SELECT A, func(A) FROM table
and for the output to have say 4 columns.
Is there any way to do this? I have seen things on custom types or whatever that let you sort of get a result that would look like
A,(B,C,D)
But it would be really great if I could have that one function return multiple columns without any more finagling.
Is there anything that can do something like this?
If the function func returns only 1 row with 3 values, such as:
CREATE OR REPLACE FUNCTION func
(
input_val integer,
OUT output_val1 integer,
OUT output_val2 integer,
OUT output_val3 integer
)
AS $$
BEGIN
output_val1 := input_val + 1;
output_val2 := input_val + 2;
output_val3 := input_val + 3;
END;
$$ LANGUAGE plpgsql;
and you then execute SELECT a, func(a) FROM table1 you'll get:
a | func
integer | record
========|==========
1 | (2, 3, 4)
2 | (3, 4, 5)
3 | (4, 5, 6)
but, if you execute:
SELECT a, (f).output_val1, (f).output_val2, (f).output_val3
FROM (SELECT a, func(a) AS f FROM table1) AS x
you'll get:
a | output_val1 | output_val2 | output_val3
integer | integer | integer | integer
========|=============|=============|=============
1 | 2 | 3 | 4
2 | 3 | 4 | 5
3 | 4 | 5 | 6
or, using CTE (Common Table Expressions), if you execute:
WITH temp AS (SELECT a, func(a) AS f FROM table1)
SELECT a, (f).output_val1, (f).output_val2, (f).output_val3 FROM temp
you'll also get:
a | output_val1 | output_val2 | output_val3
integer | integer | integer | integer
========|=============|=============|=============
1 | 2 | 3 | 4
2 | 3 | 4 | 5
3 | 4 | 5 | 6
Note: you may also use the following queries to obtain the same results:
SELECT a, (f).*
FROM (SELECT a, func(a) AS f FROM table1) AS x
or
WITH temp AS (SELECT a, func(a) AS f FROM table1)
SELECT a, (f).* FROM temp
I agree with bambam's answer but would like to point out that JackPDouglas's more succinct syntax SELECT a, (func(a)).* FROM table1, from my tests, would actually execute the function once for each column returned whereas the CTE expression will only execute the function once. So the CTE expression is preferred if the function takes a long time to execute.
If the function always returns 3 columns, you can do something like that:
CREATE TYPE sometype AS (b INT, c TEXT, d TEXT);
CREATE OR REPLACE FUNCTION func(a TEXT) RETURNS SETOF sometype AS $$
BEGIN
RETURN QUERY EXECUTE 'SELECT b, c, d FROM ' || a;
END;
$$ LANGUAGE plpgsql;
SELECT a, (f).b, (f).c, (f).d
FROM (SELECT a, func(a) AS f FROM table) x;
If you can access the table from within a view, maybe you can create a view in some way
CREATE VIEW v AS
SELECT 'tab1' AS a, b, c, d FROM tab1 WHERE 'tab1' IN (SELECT a FROM table)
UNION
SELECT 'tab2' AS a, b, c, d FROM tab2 WHERE 'tab2' IN (SELECT a FROM table)
UNION
SELECT 'tab3' AS a, b, c, d FROM tab3 WHERE 'tab3' IN (SELECT a FROM table);
then it's just a SELECT * FROM v. But again this looks like Inheritance could be used.
I think you will want to return a single record, with multiple columns? In that case you can use the return-type RECORD for example. This will allow you to return an anonymous variable with as many columns as you want. You can find more information about all the different variables here:
http://www.postgresql.org/docs/9.0/static/plpgsql-declarations.html
And about return types:
http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-OUTPUT-PARAMETERS
If you want to return multiple records with multiple columns, first check and see if you have to use a stored procedure for this. It might be an option to just use a VIEW (and query it with a WHERE-clause) instead. If that's not a good option, there is the possibility of returning a TABLE from a stored procedure in version 9.0.