Split and sequentially join string parts in PostgreSQL

I need to create a DB view that lists the cumulative combinations of the string parts of a source column. Example:
IN:
tag
--------
A_B_C_D
X_Y_Z
OUT:
subtag
--------
A
A_B
A_B_C
A_B_C_D
X
X_Y
X_Y_Z
The answer seems to be somewhere around WITH RECURSIVE, but I cannot put it all together.

SELECT
array_to_string( -- 3
array_agg(t.value) OVER (PARTITION BY tags ORDER BY t.number), --2
'_'
) AS subtag
FROM
tags,
regexp_split_to_table(tag, '_') WITH ORDINALITY as t(value, number) -- 1
1. Split the string into one row per element. WITH ORDINALITY adds a row number that preserves the original order of the elements.
2. Use array_agg() as a window function to aggregate the elements. The ORDER BY makes the aggregation cumulative.
3. Re-aggregate the array into a string.
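To check the three steps outside the database, here is a minimal Python sketch of the same idea: split, accumulate in order, re-join. The function name subtags is made up for illustration and is not part of the SQL answer.

```python
from itertools import accumulate

def subtags(tag, sep="_"):
    """Return the cumulative prefixes of a separator-delimited tag."""
    parts = tag.split(sep)  # step 1: one element per part, original order kept
    # steps 2 and 3: running join of all parts seen so far
    return list(accumulate(parts, lambda acc, part: acc + sep + part))

for tag in ["A_B_C_D", "X_Y_Z"]:
    print(subtags(tag))
```

Each list corresponds to one partition of the window function, so the prefixes of one tag never mix with those of another.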

You can use a recursive query:
WITH RECURSIVE s AS (
SELECT tag FROM tag
UNION
SELECT regexp_replace(tag, '_[^_]*$', '') FROM s
)
SELECT * FROM s;
tag
---------
A_B_C_D
X_Y_Z
A_B_C
X_Y
A_B
X
A
(7 rows)
The idea is to successively cut off _* at the end.
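The query stops on its own because UNION (unlike UNION ALL) discards rows that were already produced: once a tag has no underscore left, the replacement returns the same value again and nothing new is added. A rough Python model of that evaluation, with the hypothetical function name strip_suffixes, might look like this:

```python
import re

def strip_suffixes(tags):
    """Model of the recursive UNION query: keep cutting the last _part."""
    working = list(dict.fromkeys(tags))  # non-recursive term, duplicates removed
    seen = set(working)
    result = list(working)
    while working:                       # stop when an iteration adds no new row
        new_rows = []
        for tag in working:
            shorter = re.sub(r"_[^_]*$", "", tag)
            if shorter not in seen:      # UNION drops rows that already exist
                seen.add(shorter)
                new_rows.append(shorter)
        result.extend(new_rows)
        working = new_rows
    return result

print(strip_suffixes(["A_B_C_D", "X_Y_Z"]))
```

With UNION ALL instead of UNION there would be no such duplicate check, and an explicit stop condition in the WHERE clause would be required.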

Thanks a lot, @laurenz-albe! Your code has no explicit recursion stop condition, so I ended up with this:
WITH RECURSIVE s AS (
SELECT tag FROM tag
UNION
SELECT regexp_replace(tag, '_[^_]*$', '')
FROM s
WHERE tag LIKE '%\_%'
)
SELECT * FROM s;

Related

How to use redshift regex to get out numbers in an array

In Redshift, I have a column that contains an array-like string such as [1,2,3], and I want to return 1,2,3 using Redshift's regex functionality. How can I do this? I don't want to do this:
SELECT LISTAGG(option_name , ',') WITHIN GROUP (ORDER BY option_name) as pets_names
FROM reference.vital_options
WHERE option_id in
(
-- this nested CTE splits the json string array into comma separated pet ids
with NS AS (
SELECT vo.option_id + 1 as n
FROM <column with number id> as vo
WHERE upper(vo.country) = 'US'
...
)
select TRIM(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(u.pets_vital, NS.n - 1)) AS val
FROM NS
INNER JOIN go_prod.users AS u ON NS.n <= JSON_ARRAY_LENGTH(u.pets_vital)
WHERE u.id = %(user_id)s
)
AND ...
Are you just trying to remove the square brackets? If so, the translate() function is likely what you want. For example:
create table test as (select '[1,2,3]'::text as A);
select a, translate(a, '][', '') as b from test;

Postgresql node traversal using Recursive CTE

I am just trying to learn graph traversal using Recursive CTE in postgresql.
Below is my data set:
I am using the code below to get the path along with the existing columns (node and edges).
It produces output, but the path column is not in ARRAY format.
;WITH RECURSIVE CTE AS
(
SELECT NODE,EDGES,ARRAY[G.NODE]::TEXT AS PATH,1 AS LEVEL
FROM property_graph G
UNION ALL
SELECT G.NODE,G.EDGES,C.PATH || G.NODE,LEVEL + 1
FROM property_graph G
INNER JOIN CTE C ON G.NODE = ANY(C.EDGES)
WHERE G.NODE <> ALL(STRING_TO_ARRAY(C.PATH,'')) --Cond added to avoid cyclic graph
)
SELECT NODE,EDGES,PATH,LEVEL
FROM CTE
ORDER BY NODE,LEVEL;
Output:
Could you guys help me?
Thanks in advance.
The problem is that your PATH column is of type TEXT, and so is NODE, so the || operator performs string concatenation rather than array concatenation.
You should change the type of your PATH column from TEXT to TEXT[] (and then you can remove the STRING_TO_ARRAY call in the WHERE clause).
For example:
WITH RECURSIVE CTE AS
(
SELECT NODE,EDGES,ARRAY[G.NODE]::TEXT[] AS PATH,1 AS LEVEL
FROM property_graph G
UNION ALL
SELECT G.NODE,G.EDGES,C.PATH || ARRAY[G.NODE]::TEXT[],LEVEL + 1
FROM property_graph G
INNER JOIN CTE C ON G.NODE = ANY(C.EDGES)
WHERE G.NODE <> ALL(C.PATH) --Cond added to avoid cyclic graph
)
SELECT NODE,EDGES,PATH,LEVEL
FROM CTE
ORDER BY NODE,LEVEL;
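The difference between string and array concatenation, and the cycle check on the path, can be sketched in Python. The graph below is hypothetical (the data set from the question is not shown here), and walk_paths is an illustrative name only.

```python
def walk_paths(graph):
    """graph maps each node to the list of nodes its edges point to."""
    # Anchor step: every node starts a path that contains only itself
    frontier = [(node, [node]) for node in graph]
    rows = []
    while frontier:
        rows.extend(frontier)
        next_frontier = []
        for node, path in frontier:
            for neighbour in graph[node]:
                if neighbour not in path:  # same role as NODE <> ALL(PATH)
                    # list + list appends an element, like TEXT[] || TEXT[]
                    next_frontier.append((neighbour, path + [neighbour]))
        frontier = next_frontier
    return rows

# Hypothetical graph with a cycle between A and B
graph = {"A": ["B", "C"], "B": ["A"], "C": []}
for node, path in walk_paths(graph):
    print(node, path, len(path))
```

With plain strings, "A" + "B" gives "AB" and the individual nodes can no longer be told apart, which is the same effect the TEXT column had in the original query.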

Postgres - Repeating an element N times as array

For example, where the element is 'hi', and where N is 3, I need a PostgreSQL snippet I can use in a SELECT query that returns the following array:
['hi', 'hi', 'hi']
Postgres provides array_fill for this purpose, e.g.:
SELECT array_fill('hi'::text, '{3}');
SELECT array_fill('hi'::text, array[3]);
The two examples are equivalent, but the second form is more convenient if you want to replace the dimension 3 with a variable.
See also: https://www.postgresql.org/docs/current/functions-array.html
You may use array_agg with generate_series:
select array_agg(s) from ( values('hi')) as t(s) cross join generate_series(1,3)
Generic form:
select array_agg(s) from ( values(:elem)) as t(s) cross join generate_series(1,:n)
with cte as (
select 'hi' as rep_word, generate_series(1, 3) as value -- n = 3
)
select array(SELECT rep_word::text from cte);

Postgres frequency count across arrays

I have a column of text[]. How do I get a frequency count of all the objects across the column?
Example:
col_a
--------
{a, b}
{a}
{b}
{a}
Output should be:
col_a | count
----------------
a | 3
b | 2
My query:
with all_tags as (
select array_agg(c)
from (
select unnest(tags)
from message_tags
) as dt(c)
)
select count(*) from all_tags;
Figured it out:
-- Collapse all tags into one array
with all_tags as (
select array_agg(c) as arr
from (
select unnest(ner_tags)
from message_tags
) as dt(c)
),
-- Expand single array into a row per tag
row_tags as (
select unnest(arr) as tags from all_tags
)
-- count distinct tags
select tags, count(*) from row_tags group by tags
As an alternative, you could just skip several steps and directly group on the unnested value:
select unnest(ner_tags) as tags,
count(*) as cnt
from message_tags
group by tags
order by cnt desc
Since you only require a count over each of the values (no distinct or other aggregates), this is the simplest solution.
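For comparison, the same "flatten, then count per value" logic can be sketched in Python with the sample data from the question. This only illustrates the idea and is not a replacement for the SQL.

```python
from collections import Counter
from itertools import chain

rows = [["a", "b"], ["a"], ["b"], ["a"]]  # sample col_a values from the question

# Flatten every array into one stream of tags (like unnest), then count per tag
counts = Counter(chain.from_iterable(rows))

for tag, cnt in counts.most_common():  # highest count first, like ORDER BY cnt DESC
    print(tag, cnt)
```

No intermediate "collapse into one array" step is needed here either, which mirrors why the direct GROUP BY on the unnested value is enough.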

How to find the longest pattern of numerics using TSql

Say I have a table like the one below:
Values
--------
12Null345XXX23456
6712356
Expected Output
----------------
23456
123
How can I do this using T-SQL (preferably set-based)?
Thanks
So, you want the longest run of consecutive ascending digits, i.e. the longest substring of '0123456789' that occurs in the value? Maybe something like this (the t CTE only recreates your original table; if the table already exists, skip that CTE and use your table instead):
;with ten as (
select i from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) x(i)
), subs as (
select SUBSTRING('0123456789',s.i,l.i) as sub, l.i as run
from ten as s
join ten as l on s.i+l.i<=11
), t as (
select s
from (
values ('12Null345XXX23456'),('6712356'))x(s)
)
select x.sub
from t
cross apply (select top 1 subs.sub
from subs
where CHARINDEX(subs.sub,t.s)>0
order by subs.run desc) x
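The query builds every substring of '0123456789' and picks the longest one found in each value. A small Python sketch of that same search, with the made-up name longest_ascending_run, can be used to check the expected output:

```python
DIGITS = "0123456789"

def longest_ascending_run(value):
    """Longest substring of DIGITS that occurs in value, or '' if none does."""
    # Try longer candidates first, mirroring TOP 1 ... ORDER BY run DESC
    for length in range(len(DIGITS), 0, -1):
        for start in range(len(DIGITS) - length + 1):
            candidate = DIGITS[start:start + length]
            if candidate in value:
                return candidate
    return ""

for value in ["12Null345XXX23456", "6712356"]:
    print(value, longest_ascending_run(value))
```

One difference to keep in mind: the sketch returns an empty string for a value without digits, whereas CROSS APPLY in the query drops such rows from the result.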