PostgreSQL: calculate prefix combinations after split

I have an input string of the form foo:bar:something:221. I'm looking for a way to generate a table with all prefixes of this string, like:
foo
foo:bar
foo:bar:something
foo:bar:something:221
I wrote the following query to split the string, but can't figure out where to go from there:
select unnest(string_to_array('foo:bar:something:221', ':'));

An option is to simulate a loop over all elements, then take the sub-array from the input for each element index:
with data(input) as (
values (string_to_array('foo:bar:something:221', ':'))
)
select array_to_string(input[1:g.idx], ':')
from data
cross join generate_series(1, cardinality(input)) as g(idx);
generate_series(1, cardinality(input)) generates as many rows as the array has elements. The expression input[1:g.idx] takes the sub-array from the first element up to element idx. Since the result is an array, array_to_string re-creates the string representation with : as the separator.
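To make the slicing idea concrete outside the database, here is a minimal Python sketch of the same logic. The function name prefixes is hypothetical and not part of the original answer; it only mirrors what the query does, it does not replace it.

```python
def prefixes(text, sep=":"):
    """Return every prefix of a separator-delimited string."""
    parts = text.split(sep)  # counterpart of string_to_array(text, sep)
    # range(1, len(parts) + 1) plays the role of generate_series(1, cardinality(input));
    # parts[:idx] corresponds to the 1-based SQL slice input[1:idx].
    return [sep.join(parts[:idx]) for idx in range(1, len(parts) + 1)]


for p in prefixes("foo:bar:something:221"):
    print(p)
```

Each loop index yields one row in the SQL version and one list entry here.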

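You can use string_agg as a window function. Because the window has an ORDER BY, the default frame runs from the start of the partition through the current row, so each row aggregates all elements up to and including itself: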
SELECT string_agg(s, ':') OVER (ORDER BY n)
FROM unnest(string_to_array('foo:bar:something:221', ':')) WITH ORDINALITY AS u(s, n);
string_agg
-----------------------
foo
foo:bar
foo:bar:something
foo:bar:something:221
(4 rows)
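The running aggregate above behaves like a cumulative fold over the ordered elements. A rough Python sketch of that idea, using itertools.accumulate (the name running_join is an illustrative assumption, not something from the answer):

```python
from itertools import accumulate


def running_join(text, sep=":"):
    """Cumulatively join the split parts, one result per element."""
    parts = text.split(sep)
    # accumulate yields the first element unchanged, then applies the
    # function to (running result, next element), like a frame that
    # grows by one row at a time.
    return list(accumulate(parts, lambda acc, s: acc + sep + s))


print(running_join("foo:bar:something:221"))
```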

Related

Combine JSONB array of values by consecutive pairs

In PostgreSQL, I have a simple data store with a single JSONB column:
data
----------------------------
{"foo": [1,2,3,4]}
{"foo": [10,20,30,40,50,60]}
...
I need to convert consecutive pairs of values into points, essentially calling the array variant of ST_MakeLine like this: ST_MakeLine(ARRAY[ST_MakePoint(10,20), ST_MakePoint(30,40), ST_MakePoint(50,60)]) for each row of the source data.
Needed result (note that the x,y order of each point might need to be reversed):
data                            geometry (after decoding)
----------------------------    --------------------------
{"foo": [1,2,3,4]}              LINE (1 2, 3 4)
{"foo": [10,20,30,40,50,60]}    LINE (10 20, 30 40, 50 60)
...
Partial solution
I can already iterate over individual array values, but it is the pairing that is giving me trouble. Also, I am not certain if I need to introduce any ordering into the query to preserve the original ordering of the array elements.
SELECT ARRAY(
SELECT elem::int
FROM jsonb_array_elements(data -> 'foo') elem
) arr FROM mytable;
You can achieve this by using window functions lead or lag, then picking only every second row:
SELECT (
SELECT array_agg((a, b) ORDER BY o)
FROM (
SELECT elem::int AS a, lead(elem::int) OVER (ORDER BY o) AS b, o
FROM jsonb_array_elements(data -> 'foo') WITH ORDINALITY els(elem, o)
) AS pairs
WHERE o % 2 = 1
) AS arr
FROM example;
And yes, I would recommend specifying the ordering explicitly by using WITH ORDINALITY.
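The "take every second row together with its successor" idea can be sketched in Python as follows. This is only an illustration of the pairing logic; consecutive_pairs is a hypothetical name. One difference to be aware of: for an odd number of elements the SQL version pairs the last element with NULL, whereas zip silently drops it.

```python
import json


def consecutive_pairs(values):
    """Group a flat list into (x, y) pairs, preserving order."""
    # values[0::2] are the elements at odd 1-based positions (o % 2 = 1 in SQL);
    # values[1::2] are their successors, which is what lead() supplies.
    return list(zip(values[0::2], values[1::2]))


row = json.loads('{"foo": [10,20,30,40,50,60]}')
print(consecutive_pairs(row["foo"]))
```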

Redshift how to split a stringified array into separate parts

Say I have a varchar column, religions, whose values look like this: ["Christianity", "Buddhism", "Judaism"] (yes, the brackets are part of the string). I want the string (not an array) split into multiple rows like "Christianity", "Buddhism", "Judaism" so it can be used in a WHERE clause.
Eventually I want to use the results of the query in a where clause like this:
SELECT ...
FROM religions
WHERE name in
(
<this subquery>
)
How can one do this?
You can use the JSON_PARSE function to convert the varchar string into an array. Then you can use the strategy described in the Stack Overflow question "Convert varchar array to rows in redshift" to convert the array into separate rows.
Alternatively, you can do the following:
Create a temporary table holding a sequence of numbers.
Using that sequence and Redshift's split_part function, split the values by cross joining against the numbers in the temporary table.
To remove the double quotes and square brackets, use Redshift's regexp_replace function.
create temp table seq as
with recursive numbers(NUMBER) as
(
select 1 UNION ALL
select NUMBER + 1 from numbers where NUMBER < 28
)
select * from numbers;
select regexp_replace(split_part(val,',',seq.number),'[]["]','') as value
from
(select '["Christianity","Buddhism","Judaism"]' as val) as t -- Replace this literal with the actual column from your table.
cross join
seq
where seq.number <= regexp_count(val,'[,]')+1;
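For comparison, both strategies mentioned above (parse the string as JSON, or split on commas and strip the brackets and quotes) can be sketched in a few lines of Python. The function names are hypothetical. The .strip() call is an addition that is not in the SQL: it removes the space that follows each comma in the question's sample value.

```python
import json
import re

raw = '["Christianity", "Buddhism", "Judaism"]'


def parse_as_json(text):
    """Treat the stringified array as JSON (the JSON_PARSE idea)."""
    return json.loads(text)


def split_and_strip(text):
    """Split on commas, then remove [, ] and " (the split_part/regexp_replace idea)."""
    return [re.sub(r'[\]\["]', "", piece).strip() for piece in text.split(",")]


print(parse_as_json(raw))
print(split_and_strip(raw))
```

The second approach breaks if a value itself contains a comma, which is one reason to prefer real JSON parsing when it is available.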

PostgreSQL adding two integer Arrays

I have two instances of the type integer[] (generated by the Timescale histogram function), e.g. {3,5,1} and {2,2,2}.
I would like to add these two arrays element-wise to get {5,7,3}, but using
SELECT "ID", histogram(...) + histogram(...)
FROM "ID"
GROUP BY "ID"
throws the following error: operator does not exist: integer[] + integer[]. Is there any way to accomplish this?
I don't think there is such a function.
To achieve your goal in SQL, you have to unnest the arrays, add the corresponding elements, and aggregate the results back into an array.
SELECT
array_agg(
COALESCE(h1.val, 0)+COALESCE(h2.val, 0)
ORDER BY COALESCE(h1.row_number, h2.row_number)
) as result
FROM
(SELECT ROW_NUMBER() over (), val FROM unnest('{3,5,1,5}'::int[]) as val) as h1
FULL JOIN (SELECT ROW_NUMBER() over (), val FROM unnest('{2,2,2}'::int[]) as val) as h2 ON h1.row_number=h2.row_number
I'm using the ROW_NUMBER window function to get the position of each array element.
FULL JOIN is required because the arrays may be of different lengths. That is also why COALESCE is needed when adding the elements.
Thanks to @a_horse_with_no_name, the query can be rewritten using WITH ORDINALITY instead of relying on the row_number() function:
SELECT
array_agg(
COALESCE(h1.val, 0)+COALESCE(h2.val, 0)
ORDER BY COALESCE(h1.no, h2.no)
) as result
FROM
unnest('{3,5,1,5}'::int[]) WITH ORDINALITY as h1(val, no)
FULL JOIN unnest('{2,2,2}'::int[]) WITH ORDINALITY as h2(val, no) ON h1.no=h2.no
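The FULL JOIN plus COALESCE combination amounts to "pair elements by position and treat a missing element as 0". A small Python sketch of that behaviour, using itertools.zip_longest (add_arrays is an illustrative name, not an existing function in PostgreSQL or Timescale):

```python
from itertools import zip_longest


def add_arrays(a, b):
    """Element-wise sum; the shorter list is padded with zeros."""
    # zip_longest pairs elements by position, like joining on the ordinality;
    # fillvalue=0 plays the role of COALESCE(val, 0).
    return [x + y for x, y in zip_longest(a, b, fillvalue=0)]


print(add_arrays([3, 5, 1, 5], [2, 2, 2]))
```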

Query table by a value in the second dimension of a two dimensional array column

WHAT I HAVE
I have a table with the following definition:
CREATE TABLE "Highlights"
(
id uuid,
chunks numeric[][]
)
WHAT I NEED TO DO
I need to query the data in the table using the following predicate:
... WHERE id = 'some uuid' and chunks[????????][1] > 10 and chunks[????????][3] < 20
What should I put instead of [????????] in order to scan all items in the first dimension of the array?
Notes
I'm not entirely sure that chunks[][1] is even close to what I need.
All I need is to test whether a row's chunks column contains a two-dimensional array in which any of the tuples has some specific values.
There may be a better alternative, but this should do: iterate over the first dimension of each array and test your condition:
select *
from "Highlights" as h
where
exists (
select
from generate_series(1, array_length(h.chunks, 1)) as tt(i)
where
-- your condition goes here
h.chunks[tt.i][1] > 10 and h.chunks[tt.i][3] < 20
)
Update: as @arie-r pointed out, it would be better to use the generate_subscripts function:
select *
from "Highlights" as h
where
exists (
select *
from generate_subscripts(h.chunks, 1) as tt(i)
where
h.chunks[tt.i][3] = 6
)
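Both queries boil down to an "exists some inner tuple satisfying the condition" test. A minimal Python sketch of that check, assuming the array is represented as a list of lists (has_matching_chunk is a hypothetical name, and the thresholds are the ones from the question):

```python
def has_matching_chunk(chunks):
    """True if any inner tuple has 1st value > 10 and 3rd value < 20."""
    # chunk[0] and chunk[2] correspond to chunks[i][1] and chunks[i][3],
    # because SQL arrays are 1-based and Python lists are 0-based.
    return any(chunk[0] > 10 and chunk[2] < 20 for chunk in chunks)


print(has_matching_chunk([[1, 2, 3], [11, 0, 19]]))
print(has_matching_chunk([[11, 0, 25], [5, 0, 6]]))
```

As in the EXISTS subquery, both conditions must hold for the same inner tuple, not for two different ones.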

Need help in parsing column value based on value in other column

I have two columns, COL1 and COL2. COL1 has values like 'Birds sitting on $1 and enjoying' and COL2 has values like 'the.location_value[/tree,\building]'.
I need to update a third column, COL3, with values like 'Birds sitting on /tree and enjoying',
i.e. $1 in COL1 is replaced with /tree, which is the first item in the comma-separated list within the square brackets in COL2 (here [/tree,\building]).
I would like to know the most suitable combination of PostgreSQL string functions to achieve this.
You first need to extract the first element of the comma-separated list. You can use split_part() for that, but before that you need to extract the actual list of values, which can be done using substring() with a regular expression:
substring(col2 from '\[(.*)\]')
will return /tree,\building
So the complete query would be:
select replace(col1, '$1', split_part(substring(col2 from '\[(.*)\]'), ',', 1))
from the_table;
Online example: http://rextester.com/CMFZMP1728
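The three steps (extract the bracketed list, take its first item, substitute it for $1) can be traced in Python as a sanity check. This is only a sketch: fill_first is a hypothetical name, and returning col1 unchanged when no brackets are found is an assumption (the SQL version would yield NULL in that case).

```python
import re


def fill_first(col1, col2):
    """Replace $1 in col1 with the first item of the [...] list in col2."""
    match = re.search(r"\[(.*)\]", col2)   # like substring(col2 from '\[(.*)\]')
    if match is None:
        return col1                        # assumption: leave the text as is
    first = match.group(1).split(",")[0]   # like split_part(..., ',', 1)
    return col1.replace("$1", first)       # like replace(col1, '$1', ...)


print(fill_first("Birds sitting on $1 and enjoying",
                 r"the.location_value[/tree,\building]"))
```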
This one should work with any (int) number after $:
select t.*, c.col3
from t,
lateral (select string_agg(case
when o = 1 then s
else (string_to_array((select regexp_matches(t.col2, '\[(.*)\]'))[1], ','))[(select regexp_matches(s, '^\$(\d+)'))[1]::int] || substring(s from '^\$\d+(.*)')
end, '' order by o) col3
from regexp_split_to_table(t.col1, '(?=\$\d+)') with ordinality s(s, o)) c
http://rextester.com/OKZAG54145
Note: this is not the most efficient approach, because it splits col2's values (the part in the square brackets) again each time it replaces a $N.
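For reference, the general $N substitution can be sketched in Python with the list split only once. The name fill_placeholders is hypothetical, and leaving an out-of-range placeholder such as $9 untouched is an assumption rather than the behaviour of the SQL above.

```python
import re


def fill_placeholders(col1, col2):
    """Replace every $N in col1 with the N-th item of the [...] list in col2."""
    match = re.search(r"\[(.*)\]", col2)
    values = match.group(1).split(",") if match else []  # split once, reuse below

    def substitute(m):
        idx = int(m.group(1))  # N is 1-based, as in the SQL array subscript
        # Assumption: keep the placeholder text when N is out of range.
        return values[idx - 1] if 1 <= idx <= len(values) else m.group(0)

    return re.sub(r"\$(\d+)", substitute, col1)


print(fill_placeholders("Birds sitting on $1 and $2",
                        r"the.location_value[/tree,\building]"))
```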
Update: LATERAL and WITH ORDINALITY are not supported in older versions (LATERAL requires PostgreSQL 9.3, WITH ORDINALITY requires 9.4), but you could try a correlated subquery instead:
select t.*, (select array_to_string(array_agg(case
when s ~ E'^\\$(\\d+)'
then (string_to_array((select regexp_matches(t.col2, E'\\[(.*)\\]'))[1], ','))[(select regexp_matches(s, E'^\\$(\\d+)'))[1]::int] || substring(s from E'^\\$\\d+(.*)')
else s
end), '') col3
from regexp_split_to_table(t.col1, E'(?=\\$\\d+)') s) col3
from t