Combine JSONB array of values by consecutive pairs - postgresql

In PostgreSQL, I have a simple data store with a single JSONB column:
data
----------------------------
{"foo": [1,2,3,4]}
{"foo": [10,20,30,40,50,60]}
...
I need to convert consecutive pairs of values into points, essentially calling the array variant of ST_MakeLine like this: ST_MakeLine(ARRAY[ST_MakePoint(10,20), ST_MakePoint(30,40), ST_MakePoint(50,60)]) for each row of the source data.
Needed result (note that the x,y order of each point might need to be reversed):
data                           geometry (after decoding)
-----------------------------  --------------------------
{"foo": [1,2,3,4]}             LINE (1 2, 3 4)
{"foo": [10,20,30,40,50,60]}   LINE (10 20, 30 40, 50 60)
...
Partial solution
I can already iterate over individual array values, but it is the pairing that is giving me trouble. Also, I am not certain if I need to introduce any ordering into the query to preserve the original ordering of the array elements.
SELECT ARRAY(
    SELECT elem::int
    FROM jsonb_array_elements(data -> 'foo') elem
) arr
FROM mytable;

You can achieve this by using the window function lead (or lag), then picking only every second row:
SELECT (
    SELECT array_agg((a, b) ORDER BY o)
    FROM (
        SELECT elem::int AS a, lead(elem::int) OVER (ORDER BY o) AS b, o
        FROM jsonb_array_elements(data -> 'foo') WITH ORDINALITY els(elem, o)
    ) AS pairs
    WHERE o % 2 = 1
) AS arr
FROM mytable;
And yes, I would recommend specifying the ordering explicitly, making use of WITH ORDINALITY.
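For example, a minimal sketch of feeding those pairs straight into PostGIS (assuming the PostGIS extension is installed and the table is named mytable, as in the question):

SELECT (
    SELECT ST_MakeLine(array_agg(ST_MakePoint(a, b) ORDER BY o))  -- swap a and b here if the x,y order needs reversing
    FROM (
        SELECT elem::float8 AS a,
               lead(elem::float8) OVER (ORDER BY o) AS b,
               o
        FROM jsonb_array_elements(data -> 'foo') WITH ORDINALITY els(elem, o)
    ) AS pairs
    WHERE o % 2 = 1
) AS geom
FROM mytable;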

Related

How to unpivot a large AWS Redshift table

I am trying to run a query against a table in AWS Redshift (i.e., postgresql). Below is a simplified definition of the table:
CREATE TABLE some_schema.some_table (
row_id int
,productid_level1 char(1)
,productid_level2 char(1)
,productid_level3 char(1)
)
;
INSERT INTO some_schema.some_table
VALUES
(1, 'a', 'b', 'c')
,(2, 'd', 'c', 'e')
,(3, 'c', 'f', 'g')
,(4, 'e', 'h', 'i')
,(5, 'f', 'j', 'k')
,(6, 'g', 'l', 'm')
;
I need to return a de-duped, single-column table of a given productid and all of its children. "Children" means any productid at a higher "level" than the given product (for a given row), plus that child's own children, grandchildren, and so on.
For example, for productid 'c', I expect to return...
'c' (because it's found in rows 1, 2, and 3)
'e' (because it's a child of 'c' in row 2)
'f' and 'g' (because they're children of 'c' in row 3)
'h' and 'i' (because they're children of 'e' in row 4)
'j' and 'k' (because they're children of 'f' in row 5)
and 'l' and 'm' (because they're children of 'g' in row 6)
Visually, I expect to return the following:
productid
---------
c
e
f
g
h
i
j
k
l
m
The actual table has about 3M rows and has about 20 "levels".
I think there are 2 parts to this query -- (1) a recursive CTE to build out the hierarchy and (2) an unpivot operation.
I have not attempted (1) yet. For (2), I have tried a query like the following, but it hasn't returned even after 3 minutes. As this will be used for an operational report, I need it to return in < 15 seconds.
select
    b.productid
    ,b.product_level
from
    some_schema.some_table as a
    cross join lateral (
        values
        (a.productid_level1, 1)
        ,(a.productid_level2, 2)
        ...
        ,(a.productid_level20, 20)
    ) as b(productid, product_level)
How can I write the query to achieve (1) and (2) and be very performant?
I would avoid using the term hierarchy, as that "usually" implies each node having at most one parent.
I admit I'm lost as to the nature of the graph/network this table represents. But you might benefit from a little brute force and code repetition.
Whatever eventually works for you, I think you'll need to persist/materialise/cache the results, as repeating this at report time is unlikely to ever be a good idea.
I'm a data engineer by trade, and I'm sure they have good reasons for what they've done (or, like me, they maybe screwed up). Either way, there are many good reasons to ask them to materialise the graph in more than just one form, each suited to different use cases. So, asking them for a traditional adjacency list, as well as the table you already have, is a reasonable request. Or, at the very least, a good starting point for a conversation.
So, a brute force approach?
WITH
adjacency AS
(
    SELECT productid_level1 AS parent, productid_level2 AS child FROM some_table WHERE productid_level2 IS NOT NULL
    UNION
    SELECT productid_level2, productid_level3 FROM some_table WHERE productid_level3 IS NOT NULL
    UNION
    ...
    UNION
    SELECT productid_level19, productid_level20 FROM some_table WHERE productid_level20 IS NOT NULL
)
The WHERE clause eliminates any sparse data before it enters the map.
The UNION (without ALL) ensures duplicate links are eliminated. You should also test UNION ALL with a SELECT DISTINCT wrapped around it (or similar).
Then you can use that adjacency list in the usual recursive walk, to find all children of a given node. (Taking care that there aren't any cyclic paths.)
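For part (1), a minimal sketch of that recursive walk (PostgreSQL syntax, shown for just the three level columns in the sample DDL; 'c' is the example starting product, and note that Redshift's recursive CTE support may require UNION ALL, which puts cycle protection back on you):

WITH RECURSIVE
adjacency (parent, child) AS
(
    SELECT productid_level1, productid_level2 FROM some_schema.some_table WHERE productid_level2 IS NOT NULL
    UNION
    SELECT productid_level2, productid_level3 FROM some_schema.some_table WHERE productid_level3 IS NOT NULL
    -- ... plus the remaining level pairs, as in the adjacency CTE above
),
walk (productid) AS
(
    SELECT 'c'::char(1)              -- the starting product
    UNION                            -- UNION (not ALL) also stops infinite recursion on cycles
    SELECT a.child
    FROM walk AS w
    JOIN adjacency AS a ON a.parent = w.productid
)
SELECT productid FROM walk;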

Postgres - pair elements of one array with elements of another

The values to pair are defined in two arrays, array[1,2,3] and array['A','B','C'].
What I need to do is merge these two arrays together, where each element is paired with one at the same index in the other, so it results in array[[1,'A'],[2,'B'],[3,'C']].
How can I do that?
You could use UNNEST ... WITH ORDINALITY and join the two results this way:
Schema (PostgreSQL v14)
CREATE TABLE array_zip (
    id INT,
    a1 INT[],
    a2 TEXT[]
);
INSERT INTO array_zip
VALUES (1, ARRAY [1, 2, 3], ARRAY ['A', 'B', 'C'])
     , (2, ARRAY [4, 5, 6], ARRAY ['D', 'E', 'F', 'G']) -- different number of elements
;
Query #1
SELECT id, zip
FROM array_zip
CROSS JOIN LATERAL (
    SELECT array_agg((aa1.v, aa2.v) ORDER BY i) AS zip
    FROM UNNEST(a1) WITH ORDINALITY AS aa1(v, i)
    -- use an INNER JOIN instead to stop the zip at the first missing element
    FULL JOIN UNNEST(a2) WITH ORDINALITY AS aa2(v, i)
        USING (i)
) AS f
;
 id | zip
----+----------------------------------
  1 | {"(1,A)","(2,B)","(3,C)"}
  2 | {"(4,D)","(5,E)","(6,F)","(,G)"}
Query #2
You could also avoid the JOIN between the two arrays with an index-based direct access, which should be faster (but probably not by a lot, unless the arrays here are pretty big):
SELECT id, zip
FROM array_zip
CROSS JOIN LATERAL (
    SELECT array_agg((a1[i], a2[i]) ORDER BY i) AS zip
    -- use LEAST instead to stop the zip at the first missing element
    FROM generate_series(1, GREATEST(cardinality(a1), cardinality(a2))) AS i
) AS f;
 id | zip
----+----------------------------------
  1 | {"(1,A)","(2,B)","(3,C)"}
  2 | {"(4,D)","(5,E)","(6,F)","(,G)"}
The last option I can think of is writing a function (in PL/pgSQL, PL/v8, or another procedural language). The code would probably be easier to understand (especially if you need this feature in multiple queries), and you could handle the len(arr1) != len(arr2) case explicitly (for example by raising an error) if you want/need to.
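A minimal PL/pgSQL sketch of that idea (the function name array_zip2 and the fixed int[]/text[] signature are just assumptions for illustration):

CREATE FUNCTION array_zip2(a1 int[], a2 text[])
RETURNS TABLE (v1 int, v2 text)
LANGUAGE plpgsql AS
$$
BEGIN
    -- raise instead of padding with NULLs when the lengths differ
    IF cardinality(a1) <> cardinality(a2) THEN
        RAISE EXCEPTION 'array lengths differ: % vs %', cardinality(a1), cardinality(a2);
    END IF;
    RETURN QUERY
        SELECT a1[i], a2[i]
        FROM generate_series(1, cardinality(a1)) AS i;
END;
$$;

-- usage: SELECT * FROM array_zip2(ARRAY[1, 2, 3], ARRAY['A', 'B', 'C']);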

PostgreSQL calculate prefix combinations after split

I have an input string of the form foo:bar:something:221. I'm looking for a way to generate a table with all prefixes of this string, like:
foo
foo:bar
foo:bar:something
foo:bar:something:221
I wrote the following query to split the string, but can't figure out where to go from there:
select unnest(string_to_array('foo:bar:something:221', ':'));
One option is to simulate a loop over all elements, then take the sub-array of the input up to each element's index:
with data(input) as (
    values (string_to_array('foo:bar:something:221', ':'))
)
select array_to_string(input[1:g.idx], ':')
from data
cross join generate_series(1, cardinality(input)) as g(idx);
generate_series(1, cardinality(input)) generates as many rows as the array has elements, and the expression input[1:g.idx] takes the "sub-array" from the first element up to the idx-th one. As the output is an array, I use array_to_string to re-create the :-delimited representation.
You can use string_agg as a window function. The default frame is from the beginning of the partition to the current row:
SELECT string_agg(s, ':') OVER (ORDER BY n)
FROM unnest(string_to_array('foo:bar:something:221', ':')) WITH ORDINALITY AS u(s, n);
string_agg
-----------------------
foo
foo:bar
foo:bar:something
foo:bar:something:221
(4 rows)

Query table by a value in the second dimension of a two dimensional array column

WHAT I HAVE
I have a table with the following definition:
CREATE TABLE "Highlights"
(
id uuid,
chunks numeric[][]
)
WHAT I NEED TO DO
I need to query the data in the table using the following predicate:
... WHERE id = 'some uuid' and chunks[????????][1] > 10 and chunks[????????][3] < 20
What should I put instead of [????????] in order to scan all items in the first dimension of the array?
Notes
I'm not entirely sure that chunks[][1] is even close to what I need.
All I need is to test whether a row's chunks column contains a two-dimensional array that has some specific values in any of its tuples.
Maybe there's a better alternative, but this might do - you just go over the first dimension of each array and test your condition:
select *
from highlights as h
where exists (
    select
    from generate_series(1, array_length(h.chunks, 1)) as tt(i)
    where
        -- your condition goes here
        h.chunks[tt.i][1] > 10 and h.chunks[tt.i][3] < 20
);
Update: as @arie-r pointed out, it'd be better to use the generate_subscripts function:
select *
from highlights as h
where exists (
    select *
    from generate_subscripts(h.chunks, 1) as tt(i)
    where h.chunks[tt.i][3] = 6
);
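As a side note on why generate_subscripts is the safer choice: PostgreSQL arrays can have a lower bound other than 1, in which case generate_series over array_length would miss the real subscripts. A small sketch:

-- array_length(arr, 1) is 2 here, but the valid subscripts are 5 and 6
SELECT i
FROM generate_subscripts('[5:6][1:3]={{1,2,3},{4,5,6}}'::numeric[], 1) AS tt(i);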

Pixel values of raster records to be inserted in the table as columns

I have a table with following columns:
(ID, row_num, col_num, pix_centroid, pix_val1).
I have more than 1000 records. I am inserting my data using:
insert into pixelbased (row_num, col_num, pix_centroid, pix_val1)
select
    (ST_PixelAsPolygons(rast, 1)).x as X,
    (ST_PixelAsPolygons(rast, 1)).y as Y,
    (ST_Centroid((ST_PixelAsPolygons(rast, 1)).geom)) as geom,
    (ST_PixelAsPolygons(rast, 1)).val as pix_val1
from mytable
where rid = 1;
Now I am trying to insert all the other records as columns, and the pix_val1 column is the important one for me. All the other columns will remain the same. In other words, I want the final table to have these columns:
(ID, row_num, col_num, pix_centroid, pix_val1, pix_val2, pix_val3, ....)
Is there a way to do it?
I would want to store this data as a bitmap in a bytea if possible. Here's how to take a series of byte values and turn it into a bytea:
WITH bytes(b) AS (SELECT x % 256 FROM generate_series(1,53000) x)
SELECT ('\x'||string_agg(lpad(to_hex(b),2,'0'),''))::bytea FROM bytes;
You can access fields or ranges of the byte array using the substr function. This bytea is organized as a linear pixel array, but you may find it more useful to organize it into a more traditional bitmap format. Also, if your pixels are more than one byte you may need to cope with big-endian vs little-endian. You could do that in SQL, but it's likely to be much easier in a procedural language like PL/Perl.
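A small sketch of reading pixels back out of such a linear bytea (the table and column names here are hypothetical; it assumes one byte per pixel, row-major order, and an image width of 530 pixels):

SELECT get_byte(bitmap, (42 - 1) * 530 + (7 - 1)) AS pix_row42_col7,  -- get_byte offsets are 0-based
       substr(bitmap, (42 - 1) * 530 + 1, 530)    AS row42_bytes      -- substr positions are 1-based
FROM pixel_bitmaps
WHERE image_id = 1;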
Failing that, a multidimensional array would be a somewhat reasonable choice.
Using a generate_series statement as a substitute for your pix_val field for convenient testing, this query produces a two-dimensional array of integers using two aggregation passes:
SELECT ('{'||string_agg(subarray, ',')||'}')::integer[] AS arr
FROM (
    SELECT array_agg(x ORDER BY x)::text
    FROM generate_series(1,53000) x
    GROUP BY width_bucket(x, 1, 53001, 100)
) a(subarray);
The unfortunate use of the string literal form of the two dimensional array is made necessary by the fact that array_agg cannot aggregate arrays. In my view this is a real wart in PostgreSQL; in general its multidimensional arrays are odd to work with and inconsistent with how most applications and languages implement arrays.
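As a side note, on newer servers (PostgreSQL 9.5 and later, if I recall correctly) array_agg can aggregate arrays directly, provided all sub-arrays have equal length, which avoids the string round-trip. A sketch:

SELECT array_agg(subarray ORDER BY bucket) AS arr
FROM (
    SELECT width_bucket(x, 1, 53001, 100) AS bucket,
           array_agg(x ORDER BY x) AS subarray
    FROM generate_series(1,53000) x
    GROUP BY 1
) a;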
You can get fields out of the array by indexing it. Example:
regress=> SELECT ('{'||string_agg(subarray, ',')||'}')::integer[] AS arr INTO test FROM (SELECT array_agg(x order by x)::text from generate_series(1,53000) x GROUP BY width_bucket(x, 1, 53001, 100)) a(subarray);
regress=> \d test
Table "public.test"
Column | Type | Modifiers
--------+-----------+-----------
arr | integer[] |
test contains a single array with two dimensions:
regress=> \x
regress=> select array_dims(test.arr), array_ndims(test.arr), array_length(test.arr,1), array_length(test.arr,2) FROM test;
-[ RECORD 1 ]+---------------
array_dims | [1:100][1:530]
array_ndims | 2
array_length | 100
array_length | 530
I can get elements with two-level indexing:
regress=> SELECT test.arr[4][4] FROM test;
arr
------
1594
(1 row)
or a "column" with slicing:
regress=> SELECT test.arr[4:4][1:530] FROM test;
Oddly, this is still a two-dimensional array; the top dimension is just one element deep. You can flatten it (inefficiently) with unnest and array_agg if you need to.
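For example, a sketch of that flattening:

SELECT (SELECT array_agg(v ORDER BY o)
        FROM unnest(test.arr[4:4][1:530]) WITH ORDINALITY AS u(v, o)) AS flat
FROM test;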
Two-dimensional arrays in PostgreSQL are somewhat weird, as you can see, but so is what you're trying to do.