DB2: How to transpose multidimensional table from row to column to find data changes across rows - db2

I am trying the following with Db2:
Problem
So I've got a table with 80+ columns and two rows.
What I need to accomplish is checking which columns have changed value between the two rows, and returning a table of the changed column names, their initial value from row 1, and their new value from row 2.
Approach so far
My initial idea was to pivot the two rows into two columns (row 1 as column 1, row 2 as column 2), then join a column of column names (likely taken from syscat.columns) to the table as column 3. At that point I could do a select where column1 != column2, returning the rows with all the data needed. But alas, not long after coming up with this I discovered that DB2 doesn't support pivot / unpivot...
Question
So is there any idea for how to accomplish this in DB2, taking a table with 80+ columns and two rows like so:
| Col A | Col B | Col C | ... | Col Z|
| ----- | ----- | ----- | --- | ---- |
| Val A | Val B | 123 | ... | 01/01/2021 |
| Val C | Val B | 124 | ... | 02/01/2021 |
And returning a table with the columns changed, their initial value, and their new value:
| Initial | New | ColName|
| ----- | ----- | ----- |
| Val A | Val C | Col A |
| 123 | 124 | Col C |
| 01/01/2021 | 02/01/2021 | Col Z |
Note that the column data types vary, so the values will need to be converted to varchar.
DB2 version is 11.1
EDIT: Also for reference as per comment request, this is code I attempted to use to achieve this goal:
WITH
INIT AS (SELECT * FROM TABLE WHERE SOMEDATE=(SELECT MIN(SOMEDATE) FROM TABLE)),
LATE AS (SELECT * FROM TABLE WHERE SOMEDATE=(SELECT MAX(SOMEDATE) FROM TABLE)),
COLS AS (SELECT COLNAME FROM SYSCAT.COLUMNS WHERE TABNAME='TABLE' ORDER BY COLNO)
SELECT * FROM (
SELECT
COLNAME AS ATTRIBUTE,
(SELECT COLNAME AS INITIAL FROM INIT),
(SELECT COLNAME AS NEW FROM LATE)
FROM
COLS
WHERE
(INITIAL != NEW) OR (INITIAL IS NULL AND NEW IS NOT NULL) OR (INITIAL IS NOT NULL AND NEW IS NULL));
The only issue with this one is that I couldn't figure out how to use the values from the COLS table as the columns to be selected.

If you don't want to type the expressions by hand, you can generate their text easily.
As an example, suppose you want to list only the columns whose values differ between 2 rows of the same, fairly wide, table SYSCAT.TABLES. The following query generates the expressions needed:
SELECT
'DECODE(I.I, '
|| LISTAGG(COLNO || ', A.' || COLNAME || CASE WHEN TYPENAME NOT LIKE '%CHAR%' AND TYPENAME NOT LIKE '%GRAPHIC' THEN '::VARCHAR(128)' ELSE '' END, ', ')
|| ') AS INITIAL' AS EXPR_INITIAL
, 'DECODE(I.I, '
|| LISTAGG(COLNO || ', B.' || COLNAME || CASE WHEN TYPENAME NOT LIKE '%CHAR%' AND TYPENAME NOT LIKE '%GRAPHIC' THEN '::VARCHAR(128)' ELSE '' END, ', ')
|| ') AS NEW' AS EXPR_NEW
, 'DECODE(I.I, '
|| LISTAGG(COLNO || ', ''' || COLNAME || '''', ', ')
|| ') AS COLNAME' AS EXPR_COLNAME
FROM SYSCAT.COLUMNS C
WHERE TABSCHEMA = 'SYSCAT' AND TABNAME = 'TABLES'
AND TYPENAME NOT LIKE '%LOB';
It doesn't matter how many columns the table contains. We just filter out the columns of *LOB types as an example. If you want them as well, you should change the ::VARCHAR(128) casting to some ::CLOB(XXX).
Put these 3 generated expressions into the corresponding places in the query below:
WITH MYTAB AS
(
-- We enumerate the rows to reference them later
SELECT ROWNUMBER() OVER () RN_, T.*
FROM SYSCAT.TABLES T
WHERE TABSCHEMA = 'SYSCAT'
FETCH FIRST 2 ROWS ONLY
)
SELECT *
FROM
(
SELECT
-- Paste here the text generated in the EXPR_INITIAL column
-- , Paste here the text generated in the EXPR_NEW column
-- , Paste here the text generated in the EXPR_COLNAME column
FROM MYTAB A, MYTAB B
,
(
SELECT COLNO AS I
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = 'SYSCAT' AND TABNAME = 'TABLES'
AND TYPENAME NOT LIKE '%LOB'
) I
WHERE A.RN_ = 1 AND B.RN_ = 2
)
WHERE INITIAL IS DISTINCT FROM NEW;
The result I got in my database:
|INITIAL |NEW |COLNAME |
|--------------------------|--------------------------|---------------|
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|ALTER_TIME |
|26 |15 |COLCOUNT |
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|CREATE_TIME |
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|INVALIDATE_TIME|
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|LAST_REGEN_TIME|
|ATTRIBUTES |AUDITPOLICIES |TABNAME |
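The generate-then-compare idea above can also be driven from a host language. What follows is only a minimal sketch in Python against SQLite, not Db2, chosen so that it runs anywhere: `PRAGMA table_info` stands in for SYSCAT.COLUMNS, `CASE` for `DECODE`, and `IS NOT` for `IS DISTINCT FROM`. The table `demo`, its columns, and the function name `changed_columns` are made up for illustration, and the sketch assumes the table holds exactly two rows.

```python
import sqlite3


def changed_columns(conn, table):
    """Return (initial, new, colname) for each column that differs between the two rows."""
    # Column names from the catalog; PRAGMA table_info plays the role of SYSCAT.COLUMNS.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

    def value_case(alias):
        # One branch per column, cast to text so every branch has the same type.
        branches = " ".join(
            f'WHEN {i} THEN CAST({alias}."{c}" AS TEXT)' for i, c in enumerate(cols)
        )
        return f"CASE i.i {branches} END"

    name_case = "CASE i.i " + " ".join(
        f"WHEN {i} THEN '{c}'" for i, c in enumerate(cols)
    ) + " END"
    # A one-column index table with one row per column number.
    index_rows = " UNION ALL ".join(f"SELECT {i} AS i" for i in range(len(cols)))

    sql = f"""
        SELECT initial_value, new_value, colname
        FROM (
            SELECT {value_case('a')} AS initial_value,
                   {value_case('b')} AS new_value,
                   {name_case} AS colname,
                   i.i AS colno
            FROM {table} a, {table} b, ({index_rows}) i
            WHERE a.rowid < b.rowid  -- assumes the table holds exactly two rows
        )
        WHERE initial_value IS NOT new_value  -- NULL-safe inequality in SQLite
        ORDER BY colno
    """
    return conn.execute(sql).fetchall()


# Hypothetical two-row table, modelled on the example in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col_a TEXT, col_b TEXT, col_c INTEGER)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [("Val A", "Val B", 123), ("Val C", "Val B", 124)],
)
result = changed_columns(conn, "demo")
print(result)
```

Because the column list is read at run time, nothing has to be pasted by hand; the trade-off is that the SQL text is assembled dynamically, so the table name must come from a trusted source.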

Related

Maintaining order in DB2 "IN" query

This question is based on this one. I'm looking for a solution to that question that works in DB2. Here is the original question:
I have the following table
DROP TABLE IF EXISTS `test`.`foo`;
CREATE TABLE `test`.`foo` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(45) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Then I try to get records based on the primary key
SELECT * FROM foo f where f.id IN (2, 3, 1);
I then get the following result
+----+--------+
| id | name |
+----+--------+
| 1 | first |
| 2 | second |
| 3 | third |
+----+--------+
3 rows in set (0.00 sec)
As one can see, the result is ordered by id. What I'm trying to achieve is to get the results ordered in the sequence I'm providing in the query. Given this example it should return
+----+--------+
| id | name |
+----+--------+
| 2 | second |
| 3 | third |
| 1 | first |
+----+--------+
3 rows in set (0.00 sec)
You could use a derived table with the IDs you want, and the order you want, and then join the table in, something like...
SELECT ...
FROM mcscb.mcs_premise prem
JOIN mcscb.mcs_serv_deliv_id serv
ON prem.prem_nb = serv.prem_nb
AND prem.tech_col_user_id = serv.tech_col_user_id
AND prem.tech_col_version = serv.tech_col_version
JOIN (
SELECT 1, '9486154876' FROM SYSIBM.SYSDUMMY1 UNION ALL
SELECT 2, '9403149581' FROM SYSIBM.SYSDUMMY1 UNION ALL
SELECT 3, '9465828230' FROM SYSIBM.SYSDUMMY1
) B (ORD, ID)
ON serv.serv_deliv_id = B.ID
WHERE serv.tech_col_user_id = 'CRSSJEFF'
AND serv.tech_col_version = '00'
ORDER BY B.ORD
You can use a derived column to do custom ordering.
select
case
when serv.SERV_DELIV_ID = '9486154876' then 1
when serv.SERV_DELIV_ID = '9403149581' then 2
else 3
end as custom_order,
...
...
ORDER BY custom_order
To make the logic a little bit more evident you might modify the solution provided by bhamby like so:
WITH ordered_in_list (ord, id) as (
VALUES (1, '9486154876'), (2, '9403149581'), (3, '9465828230')
)
SELECT ...
FROM mcscb.mcs_premise prem
JOIN mcscb.mcs_serv_deliv_id serv
ON prem.prem_nb = serv.prem_nb
AND prem.tech_col_user_id = serv.tech_col_user_id
AND prem.tech_col_version = serv.tech_col_version
JOIN ordered_in_list il
ON serv.serv_deliv_id = il.ID
WHERE serv.tech_col_user_id = 'CRSSJEFF'
AND serv.tech_col_version = '00'
ORDER BY il.ORD
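The ordered-list join can be tried out quickly with the `foo` table from the original question. This is a sketch in Python with SQLite standing in for Db2 (SQLite also accepts a `VALUES` list inside a CTE); the helper name `fetch_in_order` is hypothetical, and it assumes a non-empty list of ids.

```python
import sqlite3


def fetch_in_order(conn, ids):
    """Return foo rows for the given ids, in the order the ids were supplied."""
    # One (ord, id) pair per requested id; ord records the caller's position.
    values = ", ".join("(?, ?)" for _ in ids)
    params = [p for pair in enumerate(ids, start=1) for p in pair]
    sql = f"""
        WITH ordered_in_list (ord, id) AS (VALUES {values})
        SELECT f.id, f.name
        FROM foo f
        JOIN ordered_in_list il ON f.id = il.id
        ORDER BY il.ord
    """
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)", [(1, "first"), (2, "second"), (3, "third")]
)
ordered = fetch_in_order(conn, [2, 3, 1])
print(ordered)
```

Generating the ordinal from the list position keeps the ids and their order in one place, so the two cannot drift apart the way a hand-written CASE expression can.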

convert array of aclitem into multiple rows redshift

I have one array with column values as
{james=UC/james,adam=C/james,chris=UC/james,john=U/james}
The above column values are not json. They are in string in the following form:
{ username=privilegestring/grantor }
How to convert above column into multiple rows
Edit #3:
Updated the query to specifically target pg_catalog.pg_namespace for ACL permission grants, via the CTE pg_namespace. This CTE is filtered in its WHERE clause to select a single namespace name (the query below uses 'public'; the sample output further down uses 'avengers'). If you want to select several namespace names, you should be able to add them to the WHERE clause of this CTE directly, or, if you want all namespace names, remove the clause altogether.
It's also worth noting that you will need to expand the CASE expressions in access_privilege_types to handle all permission cases: 'r', 'w', 'a', 'd', and 'x', for the operations SELECT, UPDATE, INSERT, DELETE, and REFERENCES, respectively.
Edit #2:
The final posted version of the query below should get you the data you want in the format that you want it in. I don't know how many possible values there are for the permission types; if you have more than the two currently specified, you will need to expand the CASE expressions in the CTE access_privilege_types. Obviously you'll also need to substitute your own table name in the query, etc. Let me know if you run into any trouble and I'll help as necessary.
Edit #1:
Was able to validate that this query works in Redshift. Updated the query to break out separate rows by grantee and owner. The current version doesn't break out individual permissions by row yet -- Will take a look later tonight to see if I can get that working as well.
Original:
I don't have access to my Redshift cluster to test this at the moment, but I will when I get home. The general idea behind the following method is to create a numbered index table to cross join against, which expands the data in the permissions field into a row-based representation.
I had inquired about the size limit, because this will currently only handle 10,000 possible delimited values, however you can adjust the CTEs to scale up to larger amounts if needed for your specific application:
Revision 3 Query:
WITH
pg_namespace AS (
SELECT
nspname
, nspowner
, rtrim(ltrim(array_to_string(nspacl, ','), '{'), '}') as nspacl
FROM pg_catalog.pg_namespace
WHERE nspname = 'public'
),
-- Generating a table with the numbers 0 - 9 in a single column.
ten_numbers AS (
SELECT
1 AS num
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 0
),
-- Expands the values in ten_numbers to create a single column with the values 0 - 9,999.
depivot_index AS (
SELECT
(1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num AS gen_num
FROM ten_numbers AS t1
JOIN ten_numbers AS t2 ON 1 = 1
JOIN ten_numbers AS t3 ON 1 = 1
JOIN ten_numbers AS t4 ON 1 = 1
),
-- Filters down depivot_index to house only the numbers from 1 up to the maximum number of delimited entries.
splitter AS (
SELECT
*
FROM depivot_index
WHERE gen_num BETWEEN 1 AND (
SELECT max(REGEXP_COUNT(nspacl, '\\,') + 1)
FROM pg_namespace
)
),
-- Cross joins pg_namespace and splitter to populate all entries, delimited on ','.
expanded_input AS (
SELECT
pg.nspname
, pg.nspacl
, trim(split_part(pg.nspacl, ',', s.gen_num)) AS raw_permissions_string
FROM pg_namespace AS pg
JOIN splitter AS s ON 1 = 1
WHERE split_part(nspacl, ',', s.gen_num) <> ''
),
-- Breaks out the owner and grantee fields into their own columns respectively.
users_with_raw_permissions_data AS (
SELECT
e.raw_permissions_string
, e.nspname
, trim(split_part(e.raw_permissions_string, '=', 1)) AS grantee
, trim(split_part(trim(split_part(e.raw_permissions_string, '=', 2)), '/', 2)) AS owner
, trim(split_part(trim(split_part(e.raw_permissions_string, '=', 2)), '/', 1)) AS raw_permissions_data
FROM
expanded_input e
),
-- Mines privilege types from raw string data.
access_privilege_types AS (
SELECT
u.nspname
, u.owner
, u.grantee
,CASE
WHEN position('C*' IN u.raw_permissions_data) > 0 THEN 'C*'
WHEN position('U*' IN u.raw_permissions_data) > 0 THEN 'U*'
WHEN position('C' IN u.raw_permissions_data) > 0 THEN 'C'
WHEN position('U' IN u.raw_permissions_data) > 0 THEN 'U'
ELSE u.raw_permissions_data
END AS first_access_privilege
, CASE
WHEN position('U*' IN u.raw_permissions_data) > 0 THEN 'U*'
WHEN position('C*' IN u.raw_permissions_data) > 0 THEN 'C*'
WHEN position('U' IN u.raw_permissions_data) > 0 THEN 'U'
WHEN position('C' IN u.raw_permissions_data) > 0 THEN 'C'
ELSE u.raw_permissions_data
END AS second_access_privilege
, first_access_privilege || ',' || second_access_privilege AS merged_access_privileges
FROM users_with_raw_permissions_data u
),
-- Cross joins access_privilege_types and splitter to populate all privilege types, delimited on ','.
expanded_access_privilege_types AS (
SELECT
a.nspname
, a.owner
, a.grantee
, trim(split_part(a.merged_access_privileges, ',', s.gen_num)) AS access_privileges
FROM access_privilege_types AS a
JOIN splitter AS s ON 1 = 1
WHERE split_part(a.merged_access_privileges, ',', s.gen_num) <> ''
GROUP BY 1, 2, 3, 4
)
SELECT
ea.nspname
, ea.owner
, ea.grantee
, LEFT(ea.access_privileges, 1) AS access_privilege
, CASE
WHEN POSITION('*' IN ea.access_privileges) > 0 THEN 'YES'
ELSE 'NO'
END AS is_grantable
FROM expanded_access_privilege_types ea
ORDER BY 1, 2, 3, 4, 5
Edit #4:
Adding some clarification on how the ten_numbers, depivot_index, and splitter tables work to break apart the pg_catalog.pg_namespace.nspacl field. The general overview is that ten_numbers and depivot_index are created purely to return tables with numbered rows to use as an index when joining in the split_part values of nspacl.
ten_numbers generates a table with a single column, containing the numbers 0-9:
-------
| num |
-------
| 0 |
-------
| 1 |
-------
| etc |
-------
| 9 |
-------
This table is then expanded to house the range 0-9999 during the CTE depivot_index:
-----------
| gen_num |
-----------
| 0 |
-----------
| 1 |
-----------
| 2 |
-----------
| etc |
-----------
| 9998 |
-----------
| 9999 |
-----------
splitter then narrows down the table to house only the numbers from 1 up to the maximum number of delimited entries within the nspacl field:
-----------
| gen_num |
-----------
| 1 |
-----------
| 2 |
-----------
| etc |
-----------
| 6 |
-----------
The table returned by splitter is then used as the target of a CROSS JOIN via the join on 1 = 1 in CTE expanded_input. This ensures that each member returned by split_part will have its own row:
---------------------------------------------------------------------------
| nspname | nspacl | raw_permissions_string |
---------------------------------------------------------------------------
| avengers | "{james=UC/james,adam=C/james}" | "james=UC/james" |
---------------------------------------------------------------------------
| avengers | "{james=UC/james,adam=C/james}" | "adam=C/james" |
---------------------------------------------------------------------------
| avengers | etc. | etc. |
---------------------------------------------------------------------------
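As a cross-check on what the query should return, the same decomposition can be written outside SQL. The sketch below is plain Python, not Redshift code; the function name `expand_acl` and the tuple layout (grantee, privilege, is_grantable, grantor) are assumptions made for illustration. It treats a `*` after a privilege letter as the grant option and does not handle quoted names containing commas.

```python
def expand_acl(acl):
    """Split an ACL string such as '{james=UC/james,adam=C/james}' into one row per privilege."""
    rows = []
    # Drop the surrounding braces, then take one 'grantee=privileges/grantor' entry at a time.
    for item in acl.strip("{}").split(","):
        item = item.strip()
        if not item:
            continue
        grantee, _, rest = item.partition("=")
        privileges, _, grantor = rest.partition("/")
        i = 0
        while i < len(privileges):
            # A '*' directly after a privilege letter marks it as grantable.
            grantable = i + 1 < len(privileges) and privileges[i + 1] == "*"
            rows.append((grantee, privileges[i], grantable, grantor))
            i += 2 if grantable else 1
    return rows


acl_rows = expand_acl("{james=UC/james,adam=C/james,chris=UC/james,john=U/james}")
for row in acl_rows:
    print(row)
```

Walking the privilege string character by character avoids the need to enumerate every privilege letter in CASE expressions, which is the part of the SQL version that has to be extended by hand.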

Redshift. Convert comma delimited values into rows with all combinations

I have:
user_id|user_name|user_action
-----------------------------
1 | Shone | start,stop,cancell
I would like to see:
user_id|user_name|parsed_action
-------------------------------
1 | Shone | start
1 | Shone | start,stop
1 | Shone | start,cancell
1 | Shone | start,stop,cancell
1 | Shone | stop
1 | Shone | stop,cancell
1 | Shone | cancell
....
You can create the following Python UDF:
create or replace function get_unique_combinations(list varchar(max))
returns varchar(max)
stable as $$
    from itertools import combinations
    arr = list.split(',')
    response = []
    # Collect every combination of every size, from 1 element up to the full list
    for L in range(1, len(arr)+1):
        for subset in combinations(arr, L):
            response.append(','.join(subset))
    return ';'.join(response)
$$ language plpythonu;
that will take your list of actions and return unique combinations separated by semicolon (elements in combinations themselves will be separated by commas). Then you use a UNION hack to split values into separate rows like this:
WITH unique_combinations as (
SELECT
user_id
,user_name
,get_unique_combinations(user_action) as action_combinations
FROM your_table
)
,unwrap_lists as (
SELECT
user_id
,user_name
,split_part(action_combinations,';',1) as parsed_action
FROM unique_combinations
UNION ALL
SELECT
user_id
,user_name
,split_part(action_combinations,';',2) as parsed_action
FROM unique_combinations
-- add as many UNION ALL branches as there are possible combinations for a single row, increasing the 3rd parameter (the 1-based part index) by 1 each time
)
SELECT *
FROM unwrap_lists
WHERE parsed_action <> '' -- split_part returns an empty string, not NULL, when the part does not exist
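Before deploying the UDF, its logic can be checked locally. The following sketch repeats the body of the function in ordinary Python and returns a list instead of a semicolon-joined string; the name `unique_combinations` is made up for this example.

```python
from itertools import combinations


def unique_combinations(actions):
    """Return every non-empty combination of a comma-separated list, shortest first."""
    items = actions.split(",")
    return [
        ",".join(subset)
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]


parsed = unique_combinations("start,stop,cancell")
for action in parsed:
    print(action)
```

A list of n actions yields 2^n - 1 combinations (7 for the three actions here), which is also the number of UNION ALL branches the unwrapping query needs for that row.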

Report duplicate data

create table dupt(cat varchar(10), num int)
insert dupt(cat,num) values ('A',1),('A',2),('A',3),
('B',1),('B',2),
('C',1),('C',2), ('C',3),
('D',1),('D',2), ('D',4),
('E',1),('E',2),
('F',1),('F',2)
I need to create a report which finds out duplicate data. From the sample data above, report needs to show that data for cat A is duplicated by cat C (notice the num value and no. of records) and cat B is duplicated by cat E and F. What is the best way to show that?
Example output
-------------
|cat | dupby|
-------------
| A | C |
| B | E, F |
-------------
Updated: switched to traditional set matching using common table expression and the stuff() with select ... for xml path ('') method of string concatenation only on the final results:
;with cte as (
select *
, cnt = count(*) over (partition by cat)
from dupt
)
, duplicates as (
select
x.cat
, dup_cat = x2.cat
from cte as x
inner join cte as x2
on x.cat < x2.cat
and x.num = x2.num
and x.cnt = x2.cnt
group by x.cat, x2.cat, x.cnt
having count(*) = x.cnt
)
select
d.cat
, dupby = stuff((
select ', '+i.dup_cat
from duplicates i
where i.cat = d.cat
for xml path (''), type).value('.','varchar(8000)')
,1,2,'')
from duplicates d
where not exists (
select 1
from duplicates i
where d.cat = i.dup_cat
)
group by d.cat
rextester demo: http://rextester.com/KHAG98718
returns:
+-----+-------+
| cat | dupby |
+-----+-------+
| A | C |
| B | E, F |
+-----+-------+
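The expected report can be cross-checked outside the database. This is a small Python sketch, not a translation of the T-SQL above: it groups categories by their set of num values, so it assumes each (cat, num) pair occurs at most once. The function name `duplicate_report` is hypothetical.

```python
from collections import defaultdict


def duplicate_report(rows):
    """Map each category to the later categories that hold exactly the same set of nums."""
    nums_by_cat = defaultdict(set)
    for cat, num in rows:
        nums_by_cat[cat].add(num)

    # Categories sharing an identical set of nums end up in the same bucket.
    cats_by_set = defaultdict(list)
    for cat in sorted(nums_by_cat):
        cats_by_set[frozenset(nums_by_cat[cat])].append(cat)

    # The first category (alphabetically) is reported; the rest are its duplicates.
    return {cats[0]: cats[1:] for cats in cats_by_set.values() if len(cats) > 1}


sample = [
    ("A", 1), ("A", 2), ("A", 3),
    ("B", 1), ("B", 2),
    ("C", 1), ("C", 2), ("C", 3),
    ("D", 1), ("D", 2), ("D", 4),
    ("E", 1), ("E", 2),
    ("F", 1), ("F", 2),
]
report = duplicate_report(sample)
print(report)
```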

Comparing tables and getting non matching values

I'm pretty new to SQL and I can't get this to work. I've got the two tables below:
Table A Table B
_________________ _________________
| A | 2015-10-4 | B | 2015-11-6
| B | 2015-11-4 | C | 2015-05-4
| C | 2015-05-6 | D | 2015-05-8
| D | 2015-05-7 | C | 2015-05-5
I'm trying to write a stored procedure that will get all letters from table B that has a date less than table A and any letter that doesn't exist in table B.
This is what I have so far
SELECT *
FROM A q JOIN
B c ON q.Letter = c.Letter AND q.Date > c.Date OR c.Letter IS NULL
This returns C but I can't have it return A also. It's confusing to me trying to join and compare tables still.
I do not want duplicate rows, the results I would be expecting would return
| A | 2015-10-4
| C | 2015-05-6
EDIT
I'm running into an issue now where if I have a case like this
Table A Table B
_________________ _________________
| A | 2015-10-4 | B | 2015-11-6
| B | 2015-11-4 | C | 2015-05-4
| C | 2015-05-6 | D | 2015-05-8
| D | 2015-05-7 | C | 2015-05-5
| C | 2015-05-7
It will still return C for some reason. Using a.date > max(b.date) doesn't work because max can't be used that way. And I want to assume the max date can be anywhere in table B.
So now my new results would be
| A | 2015-10-4
But I am getting A and C still.
You should use a LEFT JOIN:
SELECT DISTINCT A.letter, A.[Date]
FROM dbo.TableA A
LEFT JOIN dbo.TableB B
ON A.letter = B.letter
WHERE B.[Date] < A.[Date] OR B.letter IS NULL;
UPDATE
You should have explained your requirements as: "get all letters from table B in which every date is less than...."
SELECT DISTINCT A.letter, A.[Date]
FROM dbo.TableA A
LEFT JOIN (SELECT letter, MAX([Date]) [Date]
FROM dbo.TableB
GROUP BY letter) B
ON A.letter = B.letter
WHERE B.[Date] < A.[Date] OR B.letter IS NULL;
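Both versions of the sample data can be run through the MAX-based query to confirm the expected rows. The sketch below uses Python with SQLite instead of SQL Server, so the bracketed [Date] column becomes `my_date` and dates are stored as ISO strings (which compare correctly as text); the table names `table_a` and `table_b` and the helper `newer_or_missing` are assumptions for this example.

```python
import sqlite3

QUERY = """
    SELECT DISTINCT a.letter, a.my_date
    FROM table_a a
    LEFT JOIN (SELECT letter, MAX(my_date) AS my_date
               FROM table_b
               GROUP BY letter) b
      ON a.letter = b.letter
    WHERE b.my_date < a.my_date OR b.letter IS NULL
    ORDER BY a.letter
"""


def newer_or_missing(rows_a, rows_b):
    """Rows of A whose date is later than every date in B, or whose letter is absent from B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (letter TEXT, my_date TEXT)")
    conn.execute("CREATE TABLE table_b (letter TEXT, my_date TEXT)")
    conn.executemany("INSERT INTO table_a VALUES (?, ?)", rows_a)
    conn.executemany("INSERT INTO table_b VALUES (?, ?)", rows_b)
    return conn.execute(QUERY).fetchall()


rows_a = [("A", "2015-10-04"), ("B", "2015-11-04"), ("C", "2015-05-06"), ("D", "2015-05-07")]
rows_b = [("B", "2015-11-06"), ("C", "2015-05-04"), ("D", "2015-05-08"), ("C", "2015-05-05")]

first_case = newer_or_missing(rows_a, rows_b)
# Second case from the question: B gains a later row for C.
second_case = newer_or_missing(rows_a, rows_b + [("C", "2015-05-07")])
print(first_case, second_case)
```

Aggregating table B to one row per letter before the join is what makes the "every date" requirement hold; joining the raw rows would let any single earlier date qualify the letter.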
I would go for a UNION / UNION ALL, so that you get the result subset for the first condition + the ones for the second one.
Something similar to this should do the job:
sqlite> create table A (letter, my_date);
sqlite> create table B (letter, my_date);
sqlite> insert into A values ('A', '2015-10-04');
sqlite> insert into A values ('B', '2015-11-04');
sqlite> insert into A values ('C', '2015-05-06');
sqlite> insert into A values ('D', '2015-05-07');
sqlite> insert into B values ('B', '2015-11-06');
sqlite> insert into B values ('C', '2015-05-04');
sqlite> insert into B values ('D', '2015-05-08');
sqlite> insert into B values ('C', '2015-05-05');
A 2015-10-04
sqlite> select B.* from A, B where A.letter = B.letter and B.my_date < A.my_date UNION ALL select A.* from A where not exists (select 1 from B where B.letter=A.letter);
letter my_date
---------- ----------
C 2015-05-04
C 2015-05-05
A 2015-10-04