PostgreSQL JSONB grouping array values inside a hash - postgresql

We have a PostgreSQL jsonb column containing hashes which in turn contain arrays of values:
id | hashes
---------------
1 | {"sources"=>["a","b","c"], "ids"=>[1,2,3]}
2 | {"sources"=>["b","c","d","e","e"], "ids"=>[1,2,3]}
What we'd like to do is create a jsonb query which would return
code | count
---------------
"a" | 1
"b" | 2
"c" | 2
"d" | 1
"e" | 2
We've been trying something along the lines of
SELECT jsonb_to_recordset(hashes->>'sources')
but that's not working. Any help with this would be hugely appreciated.

The setup (should be a part of the question, note the proper json syntax):
create table a_table (id int, hashes jsonb);
insert into a_table values
(1, '{"sources":["a","b","c"], "ids":[1,2,3]}'),
(2, '{"sources":["b","c","d","e","e"], "ids":[1,2,3]}');
Use the function jsonb_array_elements():
select code, count(code)
from
a_table,
jsonb_array_elements(hashes->'sources') sources(code)
group by 1
order by 1;
code | count
------+-------
"a" | 1
"b" | 2
"c" | 2
"d" | 1
"e" | 2
(5 rows)
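To make the semantics of the query concrete, here is a minimal Python sketch of the same idea: flatten every "sources" array into individual values, then count each value. It only models the logic and does not talk to PostgreSQL; the `rows` list is a hand-written stand-in for a_table, and the names `rows`, `counts` and `result` are made up for this illustration.

```python
from collections import Counter

# Hand-written stand-in for a_table (id, hashes); not read from a database.
rows = [
    (1, {"sources": ["a", "b", "c"], "ids": [1, 2, 3]}),
    (2, {"sources": ["b", "c", "d", "e", "e"], "ids": [1, 2, 3]}),
]

# Unnest each "sources" array (the role of jsonb_array_elements),
# then count occurrences per value (the role of GROUP BY + count).
counts = Counter(
    code
    for _row_id, hashes in rows
    for code in hashes.get("sources", [])
)

# Sort by code, like ORDER BY 1.
result = sorted(counts.items())
print(result)
```

Note that duplicates inside a single array (the two "e" values in row 2) are counted separately, which matches the desired output in the question.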

SELECT h, count(*)
FROM (
SELECT jsonb_array_elements_text(hashes->'sources') AS h FROM a_table
) sub
GROUP BY h
ORDER BY h;

We finally got this working this way:
SELECT jsonb_array_elements_text(hashes->'sources') as s1,
count(jsonb_array_elements_text(hashes->'sources'))
FROM a_table
GROUP BY s1;
but Klin's solution is more complete and both Klin and Patrick got there quicker than us (thank you both) - so points go to them.

Related

Postgres, get two row values that are both linked to the same ID

I have a rather tricky database problem that has really stumped me, would appreciate any help.
I have a table which includes data from multiple different sources. This data from different sources can be ‘duplicated’ and we have ways of identifying if that is the case.
Each row in the table has an ‘id’, and if it is identified as a duplicate of another row then we merge it, and it is given a ‘merged_into_id’ which refers to another row in the same table.
I am trying to run a report which will return information about where we have identified duplicates from two of those different sources.
Let's say I have three sources: A, B and C. I want to identify all of the duplicate rows between source A and source B.
I have got the query working fine to do this if a row from source A is directly merged into source B. However, we also have instances in the DB where source A row AND source B row are merged into source C. I am struggling with these and was hoping someone could help with that.
An example:
Original DB:
id | source | merged_into_id
----+--------+----------------
1 | A | 3
2 | B | 3
3 | C | NULL
What I would like to do is to be able to return id 1 and id 2 from that table, as they are both merged into the same ID e.g. like so:
source_a_id | source_b_id
-------------+-------------
1 | 2
But I'm really struggling to get to that - all I've managed to do is create a parent and child link like the following:
parent_id | child_id | child_source
-----------+----------+--------------
3 | 1 | A
3 | 2 | B
I can also return just the IDs that I want, but they don't 'join' so to speak:
e.g.
SELECT
CASE WHEN child_source = 'A' THEN child_id END AS source_a_id,
CASE WHEN child_source = 'B' THEN child_id END AS source_b_id
But that just gives me a response with an empty row for the 'missing' data.
---EDIT---
Using array_agg and array_to_string I've gotten a little closer to what I need:
SELECT
parent.id as parent_id,
ARRAY_TO_STRING(
ARRAY_AGG(CASE WHEN child_source = 'A' THEN child.id END)
, ','
) a_id,
ARRAY_TO_STRING(
ARRAY_AGG(CASE WHEN child_source = 'B' THEN child.id END)
, ','
) b_id
but it's not quite the right format, as I can occasionally have multiple versions from each source, so I get a table that looks like:
parent_id | a_id | b_id
-----------+------+-------
3 | 1 | 2,4,5
In this case, I want to return a table that looks like:
parent_id | a_id | b_id
-----------+------+------
3 | 1 | 2
3 | 1 | 4
3 | 1 | 5
Does anyone have any advice on getting to my desired output? Many thanks
Suppose that we have this table:
select * from t;
id | source | merged_into_id
----+--------+----------------
1 | A | 3
2 | B | 3
3 | C |
5 | B | 3
4 | B | 3
(5 rows)
This should do the job:
WITH B_source as (select * from t where source = 'B'),
A_source as (select * from t where source = 'A')
SELECT merged_into_id,A_source.id as a_id,B_source.id as b_id
FROM A_source
INNER JOIN B_source using (merged_into_id);
Result
merged_into_id | a_id | b_id
----------------+------+------
3 | 1 | 2
3 | 1 | 5
3 | 1 | 4
(3 rows)
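The join above amounts to grouping rows by merged_into_id and pairing every source A id with every source B id inside each group. A minimal Python sketch of that logic, using the sample rows from this answer; the function name `pair_sources` and its parameters are hypothetical, chosen only for this illustration:

```python
from collections import defaultdict
from itertools import product

# Sample rows from the answer: (id, source, merged_into_id).
rows = [
    (1, "A", 3),
    (2, "B", 3),
    (3, "C", None),
    (5, "B", 3),
    (4, "B", 3),
]

def pair_sources(rows, left="A", right="B"):
    """Pair every `left` id with every `right` id sharing a merge target."""
    by_parent = defaultdict(lambda: {left: [], right: []})
    for row_id, source, merged_into_id in rows:
        # Rows with no merge target never match, like NULL in a SQL join.
        if merged_into_id is not None and source in (left, right):
            by_parent[merged_into_id][source].append(row_id)
    return [
        (parent, a_id, b_id)
        for parent, ids in by_parent.items()
        for a_id, b_id in product(ids[left], ids[right])
    ]

pairs = pair_sources(rows)
print(pairs)
```

A parent with several A rows and several B rows yields every A/B combination, which is the same behaviour as the inner join.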

extract all values of postgresql jsonb object

I have a PostgreSQL table t1 with columns id integer and data jsonb:
id | data
--------------------
1 | {"1":{"11":11},"2":{"12":12}}
and I need a function to extract all key/value pairs into separate rows, like this:
key | values
----------------------
1 | {"11":11}
2 | {"12":12}
With the "hstore" data type there was an "hvals" function that does this, but I can't find a similar function for jsonb.
You are looking for jsonb_each:
with t1 (id, data) as (
values (1, '{"1":{"11":11},"2":{"12":12}}'::jsonb)
)
select t.*
from t1, jsonb_each(data) as t(k,v)
returns:
k | v
--+-----------
1 | {"11": 11}
2 | {"12": 12}

Redshift. Convert comma delimited values into rows with all combinations

I have:
user_id|user_name|user_action
-----------------------------
1 | Shone | start,stop,cancell
I would like to see:
user_id|user_name|parsed_action
-------------------------------
1 | Shone | start
1 | Shone | start,stop
1 | Shone | start,cancell
1 | Shone | start,stop,cancell
1 | Shone | stop
1 | Shone | stop,cancell
1 | Shone | cancell
....
You can create the following Python UDF:
create or replace function get_unique_combinations(list varchar(max))
returns varchar(max)
stable as $$
    from itertools import combinations
    arr = list.split(',')
    response = []
    for L in range(1, len(arr)+1):
        for subset in combinations(arr, L):
            response.append(','.join(subset))
    return ';'.join(response)
$$ language plpythonu;
that will take your list of actions and return unique combinations separated by semicolon (elements in combinations themselves will be separated by commas). Then you use a UNION hack to split values into separate rows like this:
WITH unique_combinations as (
SELECT
user_id
,user_name
,get_unique_combinations(user_action) as action_combinations
FROM your_table
)
,unwrap_lists as (
SELECT
user_id
,user_name
,split_part(action_combinations,';',1) as parsed_action
FROM unique_combinations
UNION ALL
SELECT
user_id
,user_name
,split_part(action_combinations,';',2) as parsed_action
FROM unique_combinations
-- add as many UNION ALL branches as the maximum number of combinations a single row can have, increasing the 3rd parameter (the 1-based part index) by 1 each time
)
SELECT *
FROM unwrap_lists
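WHERE parsed_action is not null
AND parsed_action <> '' -- split_part returns an empty string when the index exceeds the number of parts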
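The body of the UDF can be checked outside Redshift as ordinary Python. This is only a sketch of the same logic for local testing; the parameter is renamed to `actions` here to avoid shadowing Python's built-in `list`, and the sample string is taken from the question.

```python
from itertools import combinations

def get_unique_combinations(actions: str) -> str:
    """Return all non-empty combinations of a comma-separated list.

    Items inside a combination are joined with ',' and the
    combinations themselves are joined with ';'.
    """
    parts = actions.split(',')
    response = []
    # Sizes 1..n, so every non-empty subset appears exactly once.
    for size in range(1, len(parts) + 1):
        for subset in combinations(parts, size):
            response.append(','.join(subset))
    return ';'.join(response)

result = get_unique_combinations('start,stop,cancell')
for combination in result.split(';'):
    print(combination)
```

For n actions this produces 2^n - 1 combinations, which is also the number of split_part branches the UNION ALL approach needs to cover for a single row.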

Migrate flat jsonb to hstore

I run Postgres 9.4 and want to migrate a column in my database table to hstore, just to be able to make a performance comparison.
My current column holds key-value pairs in jsonb, without nested structure.
Any tips on how to approach this problem?
Example data:
create table jsons (id int, val jsonb);
insert into jsons values
(1, '{"age":22}'),
(2, '{"height":182}'),
(3, '{"age":30, "height":177}');
Split json objects to key, value pairs:
select id, (jsonb_each_text(val)).key, (jsonb_each_text(val)).value
from jsons
id | key | value
----+--------+-------
1 | age | 22
2 | height | 182
3 | age | 30
3 | height | 177
(4 rows)
Aggregate the pairs and convert them to hstore:
select id, hstore(array_agg(key), array_agg(value))
from (
select id, (jsonb_each_text(val)).key, (jsonb_each_text(val)).value
from jsons
) sub
group by 1
order by 1
id | hstore
----+------------------------------
1 | "age"=>"22"
2 | "height"=>"182"
3 | "age"=>"30", "height"=>"177"
(3 rows)
The same can be accomplished in a more elegant way using a lateral join:
select id, hstore(array_agg(key), array_agg(value))
from jsons
cross join jsonb_each_text(val)
group by 1
order by 1;

Adding the results of two select queries into one table row with PostgreSQL

I am attempting to return the result of two distinct select statements into one row in PostgreSQL. For example, I have two queries each that return the same number of rows:
Select tableid1, tableid2, tableid3 from table1
+----------+----------+----------+
| tableid1 | tableid2 | tableid3 |
+----------+----------+----------+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
+----------+----------+----------+
Select table2id1, table2id2, table2id3, table2id4 from table2
+-----------+-----------+-----------+-----------+
| table2id1 | table2id2 | table2id3 | table2id4 |
+-----------+-----------+-----------+-----------+
| 7 | 8 | 9 | 15 |
| 10 | 11 | 12 | 19 |
+-----------+-----------+-----------+-----------+
Now I want to concatenate these tables keeping the same number of rows. I do not want to join on any values. The desired result would look like the following:
+----------+----------+----------+-----------+-----------+-----------+-----------+
| tableid1 | tableid2 | tableid3 | table2id1 | table2id2 | table2id3 | table2id4 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
| 1 | 2 | 3 | 7 | 8 | 9 | 15 |
| 4 | 5 | 6 | 10 | 11 | 12 | 19 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
What can I do to the two above queries (select * from table1) and (select * from table2) to return the desired result above?
Thanks!
You can use row_number() to join, but there is no guarantee that the rows will come back in the same order as they sit in the tables, so it's better to add an ORDER BY to the over() clause.
with cte1 as (
select
tableid1, tableid2, tableid3, row_number() over() as rn
from table1
), cte2 as (
select
table2id1, table2id2, table2id3, table2id4, row_number() over() as rn
from table2
)
select *
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn
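The row_number() approach boils down to: give each table an explicit order, number the rows, and glue together the rows that share a number. A minimal Python sketch of that idea follows. Sorting by the first column is an assumption made purely for illustration, since the question does not say which column defines the order, and `pair_by_rank` is a hypothetical helper name.

```python
# Sample data from the question, deliberately listed out of order to
# show that the pairing depends on the explicit sort, not on storage order.
table1 = [(4, 5, 6), (1, 2, 3)]
table2 = [(10, 11, 12, 19), (7, 8, 9, 15)]

def pair_by_rank(left, right, left_key, right_key):
    """Sort both inputs explicitly, then concatenate rows of equal rank."""
    left_sorted = sorted(left, key=left_key)
    right_sorted = sorted(right, key=right_key)
    # zip stops at the shorter input, like an inner join on the row number.
    return [l + r for l, r in zip(left_sorted, right_sorted)]

# Assumed ordering key: the first column of each table.
combined = pair_by_rank(table1, table2, lambda r: r[0], lambda r: r[0])
for row in combined:
    print(row)
```

Without the explicit sort keys the result would depend on whatever order the rows happen to arrive in, which is the risk the other answers describe.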
You can't have what you want, as you wrote the question. Your two SELECTs don't have any ORDER BY clause, so the database can return the rows in whatever order it feels like. If it currently matches up, it does so only by accident, and will stop matching up as soon as you UPDATE a row.
You need a key column. Then you need to join on the key column. Anything else is attempting to invent unreliable and unsafe joins without actually using a join.
Frankly, this seems like a pretty dodgy schema. Lots of numbered integer columns like this, and the desire to concatenate them, may be a sign you should be looking at using integer arrays, or using a side-table with a foreign key relationship, instead.
Sample data in case anyone else wants to play:
CREATE TABLE table1(tableid1 integer, tableid2 integer, tableid3 integer);
INSERT INTO table1 VALUES (1,2,3), (4,5,6);
CREATE TABLE table2(table2id1 integer, table2id2 integer, table2id3 integer, table2id4 integer);
INSERT INTO table2 VALUES (7,8,9,15), (10,11,12,19);
Depending on what you're actually doing you might really have wanted arrays.
I think you might need to read these two posts:
Join 2 sets based on default order
How keep data don't sort?
which explain that SQL tables just don't have an order. So you cannot fetch them in a particular order.
DO NOT USE THE FOLLOWING CODE, IT IS DANGEROUS AND ONLY INCLUDED AS A PROOF OF CONCEPT:
As it happens you can use a set-returning function hack to very inefficiently do what you want. It's incredibly ugly and completely unsafe without an ORDER BY in the SELECTs, but I'll include it for completeness. I guess.
CREATE OR REPLACE FUNCTION t1() RETURNS SETOF table1 AS $$ SELECT * FROM table1 $$ LANGUAGE sql;
CREATE OR REPLACE FUNCTION t2() RETURNS SETOF table2 AS $$ SELECT * FROM table2 $$ LANGUAGE sql;
SELECT (t1()).*, (t2()).*;
If you use this in any real code then kittens will cry. It'll produce insane and bizarre results if the number of rows in the tables differ and it'll produce the rows in orderings that might seem right at first, but will randomly start coming out wrong later on.
THE SANE WAY is to add a primary key properly, then do a join.