convert array of aclitem into multiple rows redshift - amazon-redshift

I have one array with column values as
{james=UC/james,adam=C/james,chris=UC/james,john=U/james}
The above column values are not json. They are in string in the following form:
{ username=privilegestring/grantor }
How to convert above column into multiple rows

Edit #3:
Updated the query to specifically target pg_catalog.pg_namespace for acl permissions grants, via CTE pg_catalog. Currently this CTE is filtered in the where clause to select a single namespace name ('avengers'); if you want to select from multiple namespace names, you should be able to add them into the WHERE clause of this CTE directly, or in the case of wanting all namespace names, remove the clause altogether.
It's worth noting as well, that you will need to expand the case statements in access_privilege_types to handle all permissions cases: 'r', 'w', 'a', 'd', and 'x', for the operations: SELECT, UPDATE, INSERT, DELETE, REFERENCE, respectively.
Edit #2:
The final posted version of the query below should get you the data you want in the format that you want it in. I don't know how many possible values there are for the permissions types; if you have more than the two specified currently, you will need to expand the case statements in the CTE* access_privilege_types*. Obviously you'll also need to replace your table name within the query, etc.. Let me know if you run into any trouble and I'll help as necessary.
Edit #1:
Was able to validate that this query works in Redshift. Updated the query to break out separate rows by grantee and owner. The current version doesn't break out individual permissions by row yet -- Will take a look later tonight to see if I can get that working as well.
Original:
I don't have access to my Redshift cluster to test this at the moment, but I will when I get home. The general idea behind the following method, is to create a numbered index table to cross join against that will expand the data in the permissions field into a row-based representation.
I had inquired about the size limit, because this will currently only handle 10,000 possible delimited values, however you can adjust the CTEs to scale up to larger amounts if needed for your specific application:
Revision 3 Query:
WITH
pg_namespace AS (
SELECT
nspname
, nspowner
, rtrim(ltrim(array_to_string(nspacl, ','), '{'), '}') as nspacl
FROM pg_catalog.pg_namespace
WHERE nspname = 'public'
),
-- Generating a table with the numbers 1 - 10 in a single column.
ten_numbers AS (
SELECT
1 AS num
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 0
),
-- Expands the values in ten_numbers to create a single column with the values 1 - 10,000.
depivot_index AS (
SELECT
(1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num AS gen_num
FROM ten_numbers AS t1
JOIN ten_numbers AS t2 ON 1 = 1
JOIN ten_numbers AS t3 ON 1 = 1
JOIN ten_numbers AS t4 ON 1 = 1
),
-- Filters down generated_numbers to house only the numbers up to the maximum times that the delimiter appears.
splitter AS (
SELECT
*
FROM depivot_index
WHERE gen_num BETWEEN 1 AND (
SELECT max(REGEXP_COUNT(nspacl, '\\,') + 1)
FROM pg_namespace
)
),
-- Cross joins permissions_groups and splitter to populate all requests, delimited on ','.
expanded_input AS (
SELECT
pg.nspname
, pg.nspacl
, trim(split_part(pg.nspacl, ',', s.gen_num)) AS raw_permissions_string
FROM pg_namespace AS pg
JOIN splitter AS s ON 1 = 1
WHERE split_part(nspacl, ',', s.gen_num) <> ''
),
-- Breaks out the owner and grantee fields into their own columns respectively.
users_with_raw_permissions_data AS (
SELECT
e.raw_permissions_string
, e.nspname
, trim(split_part(e.raw_permissions_string, '=', 1)) AS grantee
, trim(split_part(trim(split_part(e.raw_permissions_string, '=', 2)), '/', 2)) AS owner
, trim(split_part(trim(split_part(e.raw_permissions_string, '=', 2)), '/', 1)) AS raw_permissions_data
FROM
expanded_input e
),
-- Mines privilege types from raw string data.
access_privilege_types AS (
SELECT
u.nspname
, u.owner
, u.grantee
,CASE
WHEN position('C*' IN u.raw_permissions_data) > 0 THEN 'C*'
WHEN position('U*' IN u.raw_permissions_data) > 0 THEN 'U*'
WHEN position('C' IN u.raw_permissions_data) > 0 THEN 'C'
WHEN position('U' IN u.raw_permissions_data) > 0 THEN 'U'
ELSE u.raw_permissions_data
END AS first_access_privilege
, CASE
WHEN position('U*' IN u.raw_permissions_data) > 0 THEN 'U*'
WHEN position('C*' IN u.raw_permissions_data) > 0 THEN 'C*'
WHEN position('U' IN u.raw_permissions_data) > 0 THEN 'U'
WHEN position('C' IN u.raw_permissions_data) > 0 THEN 'C'
ELSE u.raw_permissions_data
END AS second_access_privilege
, first_access_privilege || ',' || second_access_privilege AS merged_access_privileges
FROM users_with_raw_permissions_data u
),
-- Cross joins access_privilge_types and splitter to populate all privilege_types, delimited on ','.
expanded_access_privilege_types AS (
SELECT
a.nspname
, a.owner
, a.grantee
, trim(split_part(a.merged_access_privileges, ',', s.gen_num)) AS access_privileges
FROM access_privilege_types AS a
JOIN splitter AS s ON 1 = 1
WHERE split_part(a.merged_access_privileges, ',', s.gen_num) <> ''
GROUP BY 1, 2, 3, 4
)
SELECT
ea.nspname
, ea.owner
, ea.grantee
, LEFT(ea.access_privileges, 1) AS access_privilege
, CASE
WHEN POSITION('*' IN ea.access_privileges) > 0 THEN 'YES'
ELSE 'NO'
END AS is_grantable
FROM expanded_access_privilege_types ea
ORDER BY 1, 2, 3, 4, 5
Edit #4:
Adding some clarification on how the ten_numbers, depivot_index, and splitter tables work to break apart the pg_catalog.pg_namespace.nspacl field. The general overview, is that ten_numbers and depivot_index are created purely to return tables with numbered rows to use as an index when joining in thesplit_partvalues ofnspacl`.
ten_numbers generates a table with a single column, containing the numbers 0-9:
-------
| num |
-------
| 0 |
-------
| 1 |
-------
| etc |
-------
| 9 |
-------
This table is then expanded to house the range 0-9999 during the CTE depivot_index:
-----------
| gen_num |
-----------
| 0 |
-----------
| 1 |
-----------
| 2 |
-----------
| etc |
-----------
| 9998 |
-----------
| 9999 |
-----------
splitter then narrows down the table to house only the numbers up to the maximum count of the specified delimiter within the nspacl field:
-------
| num |
-------
| 0 |
-------
| 1 |
-------
| etc |
-------
| 6 |
-------
The table returned by splitter is then used as the target of a CROSS JOIN via the join on 1 = 1 in CTE expanded_input. This ensures that each member returned by split_part will have its own row:
---------------------------------------------------------------------------
| nspname | nspacl | raw_permissions_string |
---------------------------------------------------------------------------
| avengers | "{james=UC/james,adam=C/james}" | "james=UC/james" |
---------------------------------------------------------------------------
| avengers | "{james=UC/james,adam=C/james}" | "adam=C/james" |
---------------------------------------------------------------------------
| avengers | etc. | etc. |
---------------------------------------------------------------------------

Related

Aggregate results in SQLAnywhere 16 (For XML replacement)

I'm using Ultralite 16.0 and trying to aggregate multiple rows into one.
I have a film table and a genre table, that have a many-to-many relationship through the film_genre table.
I want a result like this:
| mFilm | idGenre |
| --------- | ------- |
| Film_One | 1, 2 |
| Film_Two | 1, 3, 4 |
I can do this easily in SQL Server Management Studio (SSMS) with
SELECT f.nmFilm,
(
REPLACE(
STUFF(
(SELECT g.idGenre FROM genre g
JOIN film_genre fg on g.idGenre = fg.idGenre
WHERE fg.idFilm = f.idFilm
FOR XML PATH('')
) , 1 , 1 , ''
)
, '&', '&')
) AS genres FROM film f;
However, any time I try to use FOR XML in SQLAnywhere such as in the query below
SELECT f.nmFilm,
(SELECT g.idGenre FROM genre g
JOIN film_genre fg on g.idGenre = fg.idGenre
WHERE fg.idFilm = f.idFilm
FOR XML AUTO
) AS genres FROM film f;
I get syntax error
Could not execute statement.
[UltraLite Database] Syntax error near 'XML' [SQL Offset: 57]
SQLCODE=-131, ODBC 3 State="42000"
Line 1, column 1
I can't find any reference to FOR XML in the documentation, so I'm unsure if this is available in version 16.0.
How could I achieve this?
In case someone else needs this, all I had to do was use the LIST() function.
SELECT f.nmFilm,
(SELECT LIST(g.idGenre) FROM genre g
JOIN film_genre fg on g.idGenre = fg.idGenre
WHERE fg.idFilm = f.idFilm
) AS genres
FROM film f;

DB2: How to transpose mutlidimensional table from row to column to find data changes across rows

I am trying the following with Db2:
Problem
So I've got a table with 80+ columns and two rows.
I need to accomplish is checking what columns have changed value between the two rows, and return a table of the column names that have changed, their initial value from row1, and their new value from row2.
Approach so far
My initial idea was to perform a pivot of the two rows into two columns, row 1 as column 1, row 2 as column 2, then join a column of column names (likely taken from syscat.columns) to the table as column 3, at which point I can then do a select where column1 != column2, hence returning the rows with all the data needed. But alas, it was not long after coming up with this that I discover DB2 doesn't support pivot / unpivot...
Question
So is there any idea for how to accomplish this in DB2, taking a table with 80+ columns and two rows like so:
| Col A | Col B | Col C | ... | Col Z|
| ----- | ----- | ----- | --- | ---- |
| Val A | Val B | 123 | ... | 01/01/2021 |
| Val C | Val B | 124 | ... | 02/01/2021 |
And returning a table with the columns changed, their initial value, and their new value:
| Initial | New | ColName|
| ----- | ----- | ----- |
| Val A | Val C | Col A |
| 123 | 124 | Col C |
| 01/01/2021 | 02/01/2021 | Col Z |
Also note the column data types also vary, so will need to be converted to varchar
DB2 version is 11.1
EDIT: Also for reference as per comment request, this is code I attempted to use to achieve this goal:
WITH
INIT AS (SELECT * FROM TABLE WHERE SOMEDATE=(SELECT MIN(SOMEDATE) FROM TABLE),
LATE AS (SELECT * FROM TABLE WHERE SOMEDATE=(SELECT MAX(SOMEDATE) FROM TABLE),
COLS AS (SELECT COLNAME FROM SYSCAT.COLUMNS WHERE TABNAME='TABLE' ORDER BY COLNO)
SELECT * FROM (
SELECT
COLNAME AS ATTRIBUTE,
(SELECT COLNAME AS INITIAL FROM INIT),
(SELECT COLNAME AS NEW FROM LATE)
FROM
COLS
WHERE
(INITIAL != NEW) OR (INITIAL IS NULL AND NEW IS NOT NULL) OR (INITIAL IS NOT NULL AND NEW IS NULL));
Only issue with this one is that I couldn't figure how to use the values from the COLS table as the columns to be selected
You may easily generate text of the expressions needed, if you don't want to type them manually.
Consider the following example, if you want to print different column values only in 2 rows of the same quite a wide table SYSCAT.TABLES. We use the following query for such an expression generation.
SELECT
'DECODE(I.I, '
|| LISTAGG(COLNO || ', A.' || COLNAME || CASE WHEN TYPENAME NOT LIKE '%CHAR%' AND TYPENAME NOT LIKE '%GRAPHIC' THEN '::VARCHAR(128)' ELSE '' END, ', ')
|| ') AS INITIAL' AS EXPR_INITIAL
, 'DECODE(I.I, '
|| LISTAGG(COLNO || ', B.' || COLNAME || CASE WHEN TYPENAME NOT LIKE '%CHAR%' AND TYPENAME NOT LIKE '%GRAPHIC' THEN '::VARCHAR(128)' ELSE '' END, ', ')
|| ') AS NEW' AS EXPR_NEW
, 'DECODE(I.I, '
|| LISTAGG(COLNO || ', ''' || COLNAME || '''', ', ')
|| ') AS COLNAME' AS EXPR_COLNAME
FROM SYSCAT.COLUMNS C
WHERE TABSCHEMA = 'SYSCAT' AND TABNAME = 'TABLES'
AND TYPENAME NOT LIKE '%LOB';
It doesn't matter how many columns the table contains. We just filter out the columns of *LOB types as an example. If you want them as well, you should change the ::VARCHAR(128) casting to some ::CLOB(XXX).
These 3 generated expressions we put to the corresponding places in the query below:
WITH MYTAB AS
(
-- We enumerate the rows to reference them later
SELECT ROWNUMBER() OVER () RN_, T.*
FROM SYSCAT.TABLES T
WHERE TABSCHEMA = 'SYSCAT'
FETCH FIRST 2 ROWS ONLY
)
SELECT *
FROM
(
SELECT
-- Place here the result got in the EXPR_INITIAL column
-- , Place here the result got in the EXPR_NEW column
-- , Place here the result got in the EXPR_COLNAME column
FROM MYTAB A, MYTAB B
,
(
SELECT COLNO AS I
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = 'SYSCAT' AND TABNAME = 'TABLES'
AND TYPENAME NOT LIKE '%LOB'
) I
WHERE A.RN_ = 1 AND B.RN_ = 2
)
WHERE INITIAL IS DISTINCT FROM NEW;
The result I got in my database:
|INITIAL |NEW |COLNAME |
|--------------------------|--------------------------|---------------|
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|ALTER_TIME |
|26 |15 |COLCOUNT |
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|CREATE_TIME |
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|INVALIDATE_TIME|
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|LAST_REGEN_TIME|
|ATTRIBUTES |AUDITPOLICIES |TABNAME |

Extract words before and after a specific word

I need to extract words before and after a word like '%don%' in a ntext column.
table A, column name: Text
Example:
TEXT
where it was done it will retrieve the...
at the end of the trip clare done everything to improve
it is the only one done in these times
I would like the following results:
was done it
clare done everything
one done in
I am using T-SQL, Left and right functions did not work with ntext data type of the column containing text.
As others have said, you can use a string splitting function to split out each word and then return those you require. Using the previously linked DelimitedSplit8K:
CREATE FUNCTION dbo.DelimitedSplit8K
--===== Define I/O parameters
(#pString VARCHAR(8000), #pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(#pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(#pString,t.N,1) = #pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(#pDelimiter,#pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(#pString, l.N1, l.L1)
FROM cteLen l
;
go
declare #t table (t ntext);
insert into #t values('where it was done it will retrieve the...'),('at the end of the trip clare done everything to improve'),('we don''t take donut donations here'),('ending in don');
with t as (select cast(t as nvarchar(max)) as t from #t)
,d as (select t.t
,case when patindex('%don%',s.Item) > 0 then 1 else 0 end as d
,s.ItemNumber as i
,lag(s.Item,1,'') over (partition by t.t order by s.ItemNumber) + ' '
+ s.Item + ' '
+ lead(s.Item,1,'') over (partition by t.t order by s.ItemNumber) as r
from t
cross apply dbo.DelimitedSplit8K(t.t, ' ') as s
)
select t
,r
from d
where d = 1
order by t
,i;
Output:
+---------------------------------------------------------+-----------------------+
| t | r |
+---------------------------------------------------------+-----------------------+
| at the end of the trip clare done everything to improve | clare done everything |
| ending in don | in don |
| we don't take donut donations here | we don't take |
| we don't take donut donations here | take donut donations |
| we don't take donut donations here | donut donations here |
| where it was done it will retrieve the... | was done it |
+---------------------------------------------------------+-----------------------+
And a working example:
http://rextester.com/RND43071

Update Count column in Postgresql

I have a single table laid out as such:
id | name | count
1 | John |
2 | Jim |
3 | John |
4 | Tim |
I need to fill out the count column such that the result is the number of times the specific name shows up in the column name.
The result should be:
id | name | count
1 | John | 2
2 | Jim | 1
3 | John | 2
4 | Tim | 1
I can get the count of occurrences of unique names easily using:
SELECT COUNT(name)
FROM table
GROUP BY name
But that doesn't fit into an UPDATE statement due to it returning multiple rows.
I can also get it narrowed down to a single row by doing this:
SELECT COUNT(name)
FROM table
WHERE name = 'John'
GROUP BY name
But that doesn't allow me to fill out the entire column, just the 'John' rows.
you can do that with a common table expression:
with counted as (
select name, count(*) as name_count
from the_table
group by name
)
update the_table
set "count" = c.name_count
from counted c
where c.name = the_table.name;
Another (slower) option would be to use a co-related sub-query:
update the_table
set "count" = (select count(*)
from the_table t2
where t2.name = the_table.name);
But in general it is a bad idea to store values that can easily be calculated on the fly:
select id,
name,
count(*) over (partition by name) as name_count
from the_table;
Another method : Using a derived table
UPDATE tb
SET count = t.count
FROM (
SELECT count(NAME)
,NAME
FROM tb
GROUP BY 2
) t
WHERE t.NAME = tb.NAME

SQL to remove rows with duplicated value while keeping one

Say I have this table
id | data | value
-----------------
1 | a | A
2 | a | A
3 | a | A
4 | a | B
5 | b | C
6 | c | A
7 | c | C
8 | c | C
I want to remove those rows with duplicated value for each data while keeping the one with the min id, e.g. the result will be
id | data | value
-----------------
1 | a | A
4 | a | B
5 | b | C
6 | c | A
7 | c | C
I know a way to do it is to do a union like:
SELECT 1 [id], 'a' [data], 'A' [value] INTO #test UNION SELECT 2, 'a', 'A'
UNION SELECT 3, 'a', 'A' UNION SELECT 4, 'a', 'B'
UNION SELECT 5, 'b', 'C' UNION SELECT 6, 'c', 'A'
UNION SELECT 7, 'c', 'C' UNION SELECT 8, 'c', 'C'
SELECT * FROM #test WHERE id NOT IN (
SELECT MIN(id) FROM #test
GROUP BY [data], [value]
HAVING COUNT(1) > 1
UNION
SELECT MIN(id) FROM #test
GROUP BY [data], [value]
HAVING COUNT(1) <= 1
)
but this solution has to repeat the same group by twice (consider the real case is a massive group by with > 20 columns)
I would prefer a simpler answer with less code as oppose to complex ones. Is there any more concise way to code this?
Thank you
You can use one of the methods below:
Using WITH CTE:
WITH CTE AS
(SELECT *,RN=ROW_NUMBER() OVER(PARTITION BY data,value ORDER BY id)
FROM TableName)
DELETE FROM CTE WHERE RN>1
Explanation:
This query will select the contents of the table along with a row number RN. And then delete the records with RN >1 (which would be the duplicates).
This Fiddle shows the records which are going to be deleted using this method.
Using NOT IN:
DELETE FROM TableName
WHERE id NOT IN
(SELECT MIN(id) as id
FROM TableName
GROUP BY data,value)
Explanation:
With the given example, inner query will return ids (1,6,4,5,7). The outer query will delete records from table whose id NOT IN (1,6,4,5,7).
This fiddle shows the records which are going to be deleted using this method.
Suggestion: Use the first method since it is faster than the latter. Also, it manages to keep only one record if id field is also duplicated for the same data and value.
I want to add MYSQL solution for this query
Suggestion 1 : MySQL prior to version 8.0 doesn't support the WITH clause
Suggestion 2 : throw this error (you can't specify table TableName for update in FROM clause
So the solution will be
DELETE FROM TableName WHERE id NOT IN
(SELECT MIN(id) as id
FROM (select * from TableName) as t1
GROUP BY data,value) as t2;