Redshift: Convert comma-delimited values into rows with all combinations

I have:
user_id|user_name|user_action
-----------------------------
1 | Shone | start,stop,cancell
I would like to see:
user_id|user_name|parsed_action
-------------------------------
1 | Shone | start
1 | Shone | start,stop
1 | Shone | start,cancell
1 | Shone | start,stop,cancell
1 | Shone | stop
1 | Shone | stop,cancell
1 | Shone | cancell
....

You can create the following Python UDF:
create or replace function get_unique_combinations(input_list varchar(max))
returns varchar(max)
stable as $$
from itertools import combinations
# "list" would shadow the Python built-in, hence the parameter name input_list
arr = input_list.split(',')
response = []
for length in range(1, len(arr) + 1):
    for subset in combinations(arr, length):
        response.append(','.join(subset))
return ';'.join(response)
$$ language plpythonu;
This UDF takes your list of actions and returns the unique combinations separated by semicolons (the elements within each combination stay comma-separated).
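For example, a quick sanity check (assuming the UDF above has been created):
select get_unique_combinations('start,stop,cancell');
-- start;stop;cancell;start,stop;start,cancell;stop,cancell;start,stop,cancell
Then you use a UNION ALL hack to split the values into separate rows like this: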
WITH unique_combinations as (
SELECT
user_id
,user_name
,get_unique_combinations(user_action) as action_combinations
FROM your_table
)
,unwrap_lists as (
SELECT
user_id
,user_name
,split_part(action_combinations,';',1) as parsed_action
FROM unique_combinations
UNION ALL
SELECT
user_id
,user_name
,split_part(action_combinations,';',2) as parsed_action
FROM unique_combinations
-- repeat, increasing the 3rd parameter (a 1-based index) by 1 each time, with as many UNION ALL branches as the maximum number of combinations a single row can produce
)
SELECT *
FROM unwrap_lists
WHERE parsed_action <> ''  -- split_part returns an empty string, not NULL, past the last element
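Writing the UNION ALL branches by hand gets tedious. A sketch of an alternative: generate the indexes from a numbers CTE and cross join. The limit of 127 (enough for up to 7 actions, since 2^7 - 1 = 127) and the reuse of your_table as a row source are assumptions to adapt:
WITH unique_combinations as (
    SELECT user_id,
           user_name,
           get_unique_combinations(user_action) as action_combinations
    FROM your_table
),
numbers as (
    -- 1..127; any table with at least that many rows works as the row source
    SELECT row_number() over () as n
    FROM your_table
    LIMIT 127
)
SELECT user_id,
       user_name,
       split_part(action_combinations, ';', numbers.n::int) as parsed_action
FROM unique_combinations
CROSS JOIN numbers
WHERE split_part(action_combinations, ';', numbers.n::int) <> '';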

DB2: How to transpose multidimensional table from row to column to find data changes across rows

I am trying the following with Db2:
Problem
So I've got a table with 80+ columns and two rows.
What I need to accomplish is checking which columns have changed value between the two rows, and returning a table of the column names that changed, their initial value from row 1, and their new value from row 2.
Approach so far
My initial idea was to pivot the two rows into two columns, row 1 as column 1 and row 2 as column 2, then join a column of column names (likely taken from syscat.columns) to the table as column 3, at which point I could select where column1 != column2, returning the rows with all the data needed. But alas, not long after coming up with this I discovered that DB2 doesn't support pivot / unpivot...
Question
So is there any idea for how to accomplish this in DB2, taking a table with 80+ columns and two rows like so:
| Col A | Col B | Col C | ... | Col Z|
| ----- | ----- | ----- | --- | ---- |
| Val A | Val B | 123 | ... | 01/01/2021 |
| Val C | Val B | 124 | ... | 02/01/2021 |
And returning a table with the columns changed, their initial value, and their new value:
| Initial | New | ColName|
| ----- | ----- | ----- |
| Val A | Val C | Col A |
| 123 | 124 | Col C |
| 01/01/2021 | 02/01/2021 | Col Z |
Also note that the column data types vary, so they will need to be converted to varchar.
DB2 version is 11.1
EDIT: Also for reference, as per a comment request, this is the code I attempted to use to achieve this goal:
WITH
INIT AS (SELECT * FROM TABLE WHERE SOMEDATE=(SELECT MIN(SOMEDATE) FROM TABLE)),
LATE AS (SELECT * FROM TABLE WHERE SOMEDATE=(SELECT MAX(SOMEDATE) FROM TABLE)),
COLS AS (SELECT COLNAME FROM SYSCAT.COLUMNS WHERE TABNAME='TABLE' ORDER BY COLNO)
SELECT * FROM (
SELECT
COLNAME AS ATTRIBUTE,
(SELECT COLNAME AS INITIAL FROM INIT),
(SELECT COLNAME AS NEW FROM LATE)
FROM
COLS
WHERE
(INITIAL != NEW) OR (INITIAL IS NULL AND NEW IS NOT NULL) OR (INITIAL IS NOT NULL AND NEW IS NULL));
The only issue with this one is that I couldn't figure out how to use the values from the COLS table as the columns to be selected.
You may easily generate the text of the expressions needed, if you don't want to type them manually.
Consider the following example, which prints only the differing column values between 2 rows of the same fairly wide table, SYSCAT.TABLES. We use the following query to generate the expressions:
SELECT
'DECODE(I.I, '
|| LISTAGG(COLNO || ', A.' || COLNAME || CASE WHEN TYPENAME NOT LIKE '%CHAR%' AND TYPENAME NOT LIKE '%GRAPHIC' THEN '::VARCHAR(128)' ELSE '' END, ', ')
|| ') AS INITIAL' AS EXPR_INITIAL
, 'DECODE(I.I, '
|| LISTAGG(COLNO || ', B.' || COLNAME || CASE WHEN TYPENAME NOT LIKE '%CHAR%' AND TYPENAME NOT LIKE '%GRAPHIC' THEN '::VARCHAR(128)' ELSE '' END, ', ')
|| ') AS NEW' AS EXPR_NEW
, 'DECODE(I.I, '
|| LISTAGG(COLNO || ', ''' || COLNAME || '''', ', ')
|| ') AS COLNAME' AS EXPR_COLNAME
FROM SYSCAT.COLUMNS C
WHERE TABSCHEMA = 'SYSCAT' AND TABNAME = 'TABLES'
AND TYPENAME NOT LIKE '%LOB';
It doesn't matter how many columns the table contains. We just filter out the columns of *LOB types as an example. If you want them as well, you should change the ::VARCHAR(128) casting to some ::CLOB(XXX).
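For a sense of what comes back, the EXPR_COLNAME column is a single DECODE expression, roughly like this (a shortened, hypothetical column list; the real output has one COLNO/name pair per column):
DECODE(I.I,
  0, 'TABSCHEMA',
  1, 'TABNAME',
  2, 'OWNER'
) AS COLNAME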
We then put these 3 generated expressions into the corresponding places in the query below:
WITH MYTAB AS
(
-- We enumerate the rows to reference them later
SELECT ROWNUMBER() OVER () RN_, T.*
FROM SYSCAT.TABLES T
WHERE TABSCHEMA = 'SYSCAT'
FETCH FIRST 2 ROWS ONLY
)
SELECT *
FROM
(
SELECT
-- Place here the result got in the EXPR_INITIAL column
-- , Place here the result got in the EXPR_NEW column
-- , Place here the result got in the EXPR_COLNAME column
FROM MYTAB A, MYTAB B
,
(
SELECT COLNO AS I
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = 'SYSCAT' AND TABNAME = 'TABLES'
AND TYPENAME NOT LIKE '%LOB'
) I
WHERE A.RN_ = 1 AND B.RN_ = 2
)
WHERE INITIAL IS DISTINCT FROM NEW;
The result I got in my database:
|INITIAL |NEW |COLNAME |
|--------------------------|--------------------------|---------------|
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|ALTER_TIME |
|26 |15 |COLCOUNT |
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|CREATE_TIME |
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|INVALIDATE_TIME|
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|LAST_REGEN_TIME|
|ATTRIBUTES |AUDITPOLICIES |TABNAME |

Postgresql replace string in table from another table

I have a problem with two tables, used for accounting.
The first table, tabela1, holds a set of symbols and accounts. The second table holds the symbol, the text it should be replaced with in the first table, and the ID of the matching record in the first table.
Tabela1 is:
ID |KNT_S_WN | KNT_S_MA |
1 |3021-_R | 3021-_K-_W|
2 |_W-_R | _Z |
Tabelas is:
ID | SYMBOL |REP |
1 | _R |7Z45 |
1 | _K |321-05 |
1 | _W |490 |
2 | _W |C1 |
2 | _R |C17 |
2 | _Z |320 |
I need this output:
ID |KNT_S_WN | KNT_S_MA |
1 |3021-7Z45 | 3021-321-05-490|
2 |C1-C17 | 320 |
I tried this:
update tabela set
knt_s_wn=replace(knt_s_wn,
(select symbol from tabelas where tabela.id=tabelas.id and position(tabelas.symbol in knt_s_wn)>0),
(select rep from tabelas where tabela.id=tabelas.id and position(tabelas.symbol in knt_s_wn)>0))
With this expression, if knt_s_wn contains no matching symbol, the subqueries return NULL and the whole value is wiped out (blank).
Please help me!!!
One of the simplest solutions is to replace the strings in a loop inside a plpgsql function:
create or replace function multireplace(aid int, str text)
returns text language plpgsql as $$
declare
rec record;
begin
for rec in
select *
from tabelas
where id = aid
loop
str:= replace(str, rec.symbol, rec.rep);
end loop;
return str;
end $$;
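A minimal usage sketch against the sample data (assuming the first table is named tabela, as in the question's update statement):
update tabela
set knt_s_wn = multireplace(id, knt_s_wn),
    knt_s_ma = multireplace(id, knt_s_ma);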
A pure SQL solution (i.e. without procedural SQL) that produces the desired output shown above is below:
with recursive t(id, knt_s_wn, knt_s_ma, symbols, reps) as (
select
tabela.id,
knt_s_wn,
knt_s_ma,
array_agg(symbol),
array_agg(rep)
from tabela
join tabelas on tabelas.id = tabela.id
group by 1, 2, 3
union all
select
id,
replace(knt_s_wn, symbols[1], reps[1]),
replace(knt_s_ma, symbols[1], reps[1]),
array_remove(symbols, symbols[1]),
array_remove(reps, reps[1])
from t
where array_length(symbols, 1) > 0
)
select id, knt_s_wn, knt_s_ma
from t
where symbols = array[]::text[];

postgres expression to select elements from array

I want to select certain elements from an array column. I know you can do it by position, but I want to filter on content. Here's my data:
table_name | column_names
---------------------+---------------------------------------------------------------
attribute_definition | {attribute_type_concept_id}
cohort_definition | {definition_type_concept_id,subject_concept_id}
condition_occurrence | {condition_concept_id,condition_source_concept_id,condition_type_concept_id}
death | {cause_concept_id,cause_source_concept_id,death_impute_concept_id,death_type_concept_id}
device_exposure | {device_concept_id,device_source_concept_id,device_type_concept_id}
drug_exposure | {dose_unit_concept_id,drug_concept_id,drug_source_concept_id,drug_type_concept_id,route_concept_id}
What I would like to say is something like:
SELECT table_name,
array_agg(SELECT colname FROM column_names WHERE colname LIKE '%type%') AS type_cols,
array_agg(SELECT colname FROM column_names WHERE colname NOT LIKE '%type%') AS other_cols
FROM mytable
GROUP BY table_name
And the result I would like would be:
table_name | type_cols | other_cols
----------------------+------------------------------+---------------------------------------------------------------
attribute_definition | {attribute_type_concept_id} | {}
cohort_definition | {definition_type_concept_id} | {subject_concept_id}
condition_occurrence | {condition_type_concept_id} | {condition_concept_id,condition_source_concept_id}
death | {death_type_concept_id} | {cause_concept_id,cause_source_concept_id,death_impute_concept_id}
device_exposure | {device_type_concept_id} | {device_concept_id,device_source_concept_id}
drug_exposure | {drug_type_concept_id} | {dose_unit_concept_id,drug_concept_id,drug_source_concept_id,route_concept_id}
So, I want to end up with the same number of rows but different columns. There's gotta be a simple way to do this. Why can't I find it?
unnest is your friend. As in:
SELECT table_name,
array(SELECT colname FROM unnest(column_names) AS colname WHERE colname LIKE '%type%') AS type_cols,
array(SELECT colname FROM unnest(column_names) AS colname WHERE colname NOT LIKE '%type%') AS other_cols
FROM mytable
GROUP BY table_name, column_names
Here is Dan Getz's answer again but in a self-contained statement so it's easily runnable without copying my data.
with grps as
(
with numlist as
(
select '1 - 10' as grp, generate_series(1,10) num
union
select '11 - 20', generate_series(11,20) order by 1,2
)
select grp, array_agg(num) as nums
from numlist
group by 1
)
select grp,
(select array_agg(evens) from unnest(nums) as evens where evens % 2 = 0) as evens,
(select array_agg(odds) from unnest(nums) as odds where odds % 2 != 0) as odds
from grps
group by grp, nums;
grp | evens | odds
---------+------------------+------------------
11 - 20 | {12,14,16,18,20} | {11,13,15,17,19}
1 - 10 | {2,4,6,8,10} | {1,3,5,7,9}
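For comparison, on Postgres 9.4+ the same split can be written with aggregate FILTER clauses instead of correlated subqueries. A sketch using the question's mytable/column_names names (the coalesce supplies the {} that FILTER would otherwise leave as NULL):
SELECT table_name,
       coalesce(array_agg(colname) FILTER (WHERE colname LIKE '%type%'), '{}') AS type_cols,
       coalesce(array_agg(colname) FILTER (WHERE colname NOT LIKE '%type%'), '{}') AS other_cols
FROM mytable
CROSS JOIN LATERAL unnest(column_names) AS colname
GROUP BY table_name;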

PostgreSQL JSONB grouping array values inside a hash

We have a PostgreSQL jsonb column containing hashes which in turn contain arrays of values:
id | hashes
---------------
1 | {"sources"=>["a","b","c"], "ids"=>[1,2,3]}
2 | {"sources"=>["b","c","d","e","e"], "ids"=>[1,2,3]}
What we'd like to do is create a jsonb query which would return
code | count
---------------
"a" | 1
"b" | 2
"c" | 2
"d" | 1
"e" | 2
we've been trying something along the lines of
SELECT jsonb_to_recordset(hashes->>'sources')
but that's not working - any help with this hugely appreciated...
The setup (should be a part of the question, note the proper json syntax):
create table a_table (id int, hashes jsonb);
insert into a_table values
(1, '{"sources":["a","b","c"], "ids":[1,2,3]}'),
(2, '{"sources":["b","c","d","e","e"], "ids":[1,2,3]}');
Use the function jsonb_array_elements():
select code, count(code)
from
a_table,
jsonb_array_elements(hashes->'sources') sources(code)
group by 1
order by 1;
code | count
------+-------
"a" | 1
"b" | 2
"c" | 2
"d" | 1
"e" | 2
(5 rows)
Alternatively, unnest the array in a derived table and aggregate in the outer query:
SELECT h, count(*)
FROM (
SELECT jsonb_array_elements_text(hashes->'sources') AS h FROM mytable
) sub
GROUP BY h
ORDER BY h;
We finally got this working this way (note that PostgreSQL 10 and later reject a set-returning function inside an aggregate like count(), so the approaches above are safer):
SELECT jsonb_array_elements_text(hashes->'sources') as s1,
count(jsonb_array_elements_text(hashes->'sources'))
FROM a_table
GROUP BY s1;
but Klin's solution is more complete and both Klin and Patrick got there quicker than us (thank you both) - so points go to them.

Postgresql Update inside For Loop

I'm new enough to PostgreSQL, and I'm having issues updating a column of null values in a table using a for loop. The table I'm working on is huge, so for brevity I'll give a smaller example which should get the point across. Take the following table:
+----+---+---+------+
| id | A | B | C    |
+----+---+---+------+
| a  | 1 | 0 | NULL |
| b  | 1 | 1 | NULL |
| c  | 2 | 4 | NULL |
| a  | 3 | 2 | NULL |
| c  | 2 | 3 | NULL |
| d  | 4 | 2 | NULL |
+----+---+---+------+
I want to write a for loop which iterates through all of the rows and does some operation
on the values in columns a and b and then inserts a new value in c.
For example, where id = 'a', set C = A*B; where id = 'd', set C = A+B, etc. This would then give me a table like:
+----+---+---+------+
| id | A | B | C    |
+----+---+---+------+
| a  | 1 | 0 | 0    |
| b  | 1 | 1 | NULL |
| c  | 2 | 4 | NULL |
| a  | 3 | 2 | 6    |
| c  | 2 | 3 | NULL |
| d  | 4 | 2 | 6    |
+----+---+---+------+
So ultimately I'd like to loop through all the rows of the table and update column C according to the value in the "id" column. The function I've written (which isn't giving any errors but also isn't updating anything either) looks like this...
-- DROP FUNCTION some_function();
CREATE OR REPLACE FUNCTION some_function()
RETURNS void AS
$BODY$
DECLARE
--r integer; not too sure if this needs to be declared or not
result int;
BEGIN
FOR r IN select * from 'table_name'
LOOP
select(
case
when id = 'a' THEN B*C
when id = 'd' THEN B+C
end)
into result;
update table set C = result
WHERE id = '';
END LOOP;
RETURN;
END
$BODY$
LANGUAGE plpgsql
I'm sure there's something silly I'm missing, probably around what I'm returning... void in this case. But as I only want to update existing rows, do I need to return anything? There are probably easier ways of doing this than using a loop, but I'd like to get it working using this method.
If anyone could point me in the right direction or point out anything blatantly obvious that I'm doing wrong I'd much appreciate it.
Thanks in advance.
No need for a loop or a function, this can be done with a single update statement:
update table_name
set c = case
when id = 'a' then a*b
when id = 'd' then a+b
else c -- don't change anything
end;
SQLFiddle: http://sqlfiddle.com/#!15/b65cb/2
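If you'd rather leave the unaffected rows completely untouched instead of rewriting c for every row, a WHERE clause does it (a sketch of the same statement):
update table_name
set c = case
          when id = 'a' then a*b
          else a+b
        end
where id in ('a', 'd');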
The reason your function isn't doing anything is this:
update table set C = result
WHERE id = '';
You don't have a row with an empty string in the column id. Your function also seems to use the wrong formula: when id = 'a' THEN B*C I guess that should be: then a*b. As C is NULL initially, b*c will also yield null. So even if your update in the loop would find a row, it would update it to NULL.
You are also retrieving the values incorrectly from the cursor.
If you really, really want to do it inefficiently in a loop, then your function should look something like this (not tested!):
CREATE OR REPLACE FUNCTION some_function()
RETURNS void AS
$BODY$
DECLARE
result int;
r record; -- the loop variable must be declared
BEGIN
-- r is a structure that contains an element for each column in the select list
FOR r IN select * from table_name
LOOP
if r.id = 'a' then
result := r.a * r.b;
elsif r.id = 'd' then
result := r.a + r.b;
else
continue; -- leave rows with other ids untouched
end if;
update table_name
set C = result
WHERE id = r.id; -- note the where condition that uses the value from the record variable
END LOOP;
END
$BODY$
LANGUAGE plpgsql
But again: if your table is "huge" as you say, the loop is an extremely bad solution. Relational databases are made to deal with "sets" of data. Row-by-row processing is an anti-pattern that will almost always have bad performance.
Or to put it the other way round: doing set-based operations (like my single update example) is always the better choice.