not understand json_agg in this context - postgresql

Reference: Query jsonb column containing array of JSON objects
begin;
CREATE TEMP TABLE segments (segments_id serial PRIMARY KEY, payload jsonb);
INSERT INTO segments (payload)
VALUES ('[{"kind": "person", "limit": "1"}, {"kind": "B", "filter_term": "fin"}]');
INSERT INTO segments (payload)
VALUES ('[{"kind": "person", "limit": "3"}, {"kind": "A", "filter_term": "abc"}]');
INSERT INTO segments (payload)
VALUES ('[{"kind": "person", "limit": "2"}, {"kind": "C", "filter_term": "def"}]');
commit;
CTE query:
with a as (select jsonb_array_elements(s.payload) j from segments s)
SELECT json_agg(a.j) AS filtered_payload from a
where j #> '{"kind":"person"}';
Return: [{"kind": "person", "limit": "1"}, {"kind": "person", "limit": "3"}, {"kind": "person", "limit": "2"}]
This QueryA:
SELECT a.filtered_payload,
a.ct_elem_row
, sum(ct_elem_row) OVER () AS ct_elem_total
, count(*) OVER () AS ct_rows
FROM segments s
JOIN LATERAL (
SELECT json_agg(j.elem) AS filtered_payload, count(*) AS ct_elem_row
FROM jsonb_array_elements(s.payload) j(elem)
WHERE j.elem #> '{"kind":"person"}'
) a ON ct_elem_row > 0
WHERE s.payload #> '[{"kind":"person"}]';
return :
In QueryA, the structure is like: select ... from segments s join lateral filtered_payload.... segments is 3 rows lateral join with one row (filtered_payload). filtered_payload return only row as per CTE query, as an a consolidate JSON array. So overall I am very confused with json_agg in the QueryA.
Edit at 2021-10-05 16:36 +5:30:
Even following code, a.filtered_payload return 3 jsonb array, instead of one arrgregate json array. I don't know when already aggregated jsonb array (using json_agg function) unnested to serveal jsonb arrays.
SELECT a.filtered_payload, s.*
FROM segments s
cross JOIN LATERAL (
SELECT json_agg(j.elem) AS filtered_payload
FROM jsonb_array_elements(s.payload) j(elem)
WHERE j.elem #> '{"kind":"person"}') a;

I believe the LATERAL JOIN is doing the trick there.
In your original query, you're using json_agg against the whole table dataset (filtered by '{"kind":"person"}')
with a as
(
select jsonb_array_elements(s.payload) j
from segments s
)
SELECT json_agg(a.j) AS filtered_payload
from a
where j #> '{"kind":"person"}';
Meanwhile in the second instance, you are playing with one row at the time using the LATERAL. That's why you end up having 3 rows with single "kind":"person" values instead of an unique row with 3 values.
Not sure about what you're trying to achieve but the following could put you in the right direction
SELECT a.filtered_payload,
a.ct_elem_row,
sum(ct_elem_row) OVER () AS ct_elem_total,
count(*) OVER () AS ct_rows
FROM segments s
JOIN LATERAL (
SELECT json_agg(j.elem) AS filtered_payload,
count(*) AS ct_elem_row
FROM segments d, lateral jsonb_array_elements(d.payload) j(elem)
WHERE j.elem #> '{"kind":"person"}'
) a ON ct_elem_row > 0
WHERE s.payload #> '[{"kind":"person"}]';
Results
filtered_payload | ct_elem_row | ct_elem_total | ct_rows
--------------------------------------------------------------------------------------------------------+-------------+---------------+---------
[{"kind": "person", "limit": "1"}, {"kind": "person", "limit": "3"}, {"kind": "person", "limit": "2"}] | 3 | 9 | 3
[{"kind": "person", "limit": "1"}, {"kind": "person", "limit": "3"}, {"kind": "person", "limit": "2"}] | 3 | 9 | 3
[{"kind": "person", "limit": "1"}, {"kind": "person", "limit": "3"}, {"kind": "person", "limit": "2"}] | 3 | 9 | 3
(3 rows)

Related

Update property of object in jsonb array and keep other properties

I have a postgres table with a jsonb colum like this:
create table if not exists doc
(
id uuid not null
constraint pkey_doc_id
primary key,
data jsonb not null
);
INSERT INTO doc (id, data) VALUES ('3cf40366-ea58-402d-b63b-c9d6fdf99ec8', '{"Id": "3cf40366-ea58-402d-b63b-c9d6fdf99ec8", "Tags": [{"Key": "inoivce", "Value": "70086"},{"Key": "customer", "Value": "100233"}] }' );
INSERT INTO doc (id, data) VALUES ('ae2d1119-adb9-41d2-96e9-53445eaf97ab', '{"Id": "ae2d1119-adb9-41d2-96e9-53445eaf97ab", "Tags": [{"Key": "project", "Value": "12345"},{"Key": "customer", "Value": "100233"}]}' );b9-41d2-96e9-53445eaf97ab", "Tags": [{"Key": "customer", "Value": "100233"}]}' )
Tags.Key in the first row contains a typo inoivce which I want to fix to invoice:
{
"Id": "3cf40366-ea58-402d-b63b-c9d6fdf99ec8",
"Tags": [{
"Key": "inoivce",
"Value": "70086"
},{
"Key": "customer",
"Value": "100233"
}]
}
I tried this:
update doc set data = jsonb_set(
data,
'{"Tags"}',
$${"Key":"invoice"}$$
) where data #> '{"Tags": [{ "Key":"inoivce"}]}';
The typo gets fixed but I'm loosing the other Tags elements in the array:
{
"Id": "3cf40366-ea58-402d-b63b-c9d6fdf99ec8",
"Tags": [{"Key": "invoice"}]
}
How can I fix the typo without removing the other elements of the Tags array?
Dbfiddle for repro.
One possible solution, not so obvious : we need a CTE because the idea here is to loop on the 'Tags' jsonb array elements using the jsonb_agg aggregate function to rebuild the array, but the SET clause of an UPDATE doesn't accept aggregate functions ...
WITH list AS
( SELECT d.id, (d.data - 'Tags') || jsonb_build_object('Tags', jsonb_agg(jsonb_set(e.content, '{Key}' :: text[], to_jsonb(replace(e.content->>'Key', 'inoivce', 'invoice'))) ORDER BY e.id)) AS data
FROM doc AS d
CROSS JOIN LATERAL jsonb_array_elements(d.data->'Tags') WITH ORDINALITY AS e(content, id)
WHERE d.data #? '$.Tags[*] ? (exists(# ? (#.Key == "inoivce")))'
GROUP BY d.id, d.data
)
UPDATE doc AS d
SET data = l.data
FROM list AS l
WHERE d.id = l.id
see the result in dbfiddle

Calculate sum for items in nested jsonb array

So I have Postgres DB table 'purchases' with columns 'id' and 'receipt'. 'id' is primary int column, the value in column 'receipt' is jsonb and can look like this:
{
"shop_name":"some_shop_name",
"items": [
{
"name": "foo",
"spent": 49,
"quantity": 1
},
{
"name": "bar",
"price": 99,
"quantity": 2
},
{
"name": "abc",
"price": 999,
"quantity": 1
},
...
]
}
There can be varied amount of items in receipt.
In the end I need to write a query so the resulting table contains purchase id and amount spent on all bar for every purchase in table as such:
id
spent
1
198
..
...
My issue:
I can't figure out how to work with jsonb inside select query along regular columns in the resulting table, I guess query should be structured as this:
SELECT p.id, %jsonb_parsing_result_here% AS spent
FROM purchases p
It's blocking me from moving further with iterating through items in FOR cycle (or maybe using another way).
You can unnest the items from the receipt column with a lateral join and the jsonb_array_elements function like:
SELECT p.id, SUM((item->>'price')::NUMERIC)
FROM purchases p,
LATERAL jsonb_array_elements(p.receipt->'items') r (item)
GROUP BY p.id
It's also possible to add a where condition, for example:
WHERE item->>'name' = 'bar

How to delete subsequent matching rows?

I have jsonb datatype, where each row has a name, last_updated, among other keys. How would I go about creating a query, which would leave only 1 row per name per day?
i.e. this:
id | data
1 | {"name": "foo1", "last_updated": "2019-10-06T09:29:30.000Z"}
2 | {"name": "foo1", "last_updated": "2019-10-06T01:29:30.000Z"}
3 | {"name": "foo1", "last_updated": "2019-10-07T01:29:30.000Z"}
4 | {"name": "foo2", "last_updated": "2019-10-06T09:29:30.000Z"}
5 | {"name": "foo2", "last_updated": "2019-10-06T01:29:30.000Z"}
6 | {"name": "foo2", "last_updated": "2019-10-06T02:29:30.000Z"}
becomes:
id | data
1 | {"name": "foo1", "last_updated": "2019-10-06T09:29:30.000Z"}
3 | {"name": "foo1", "last_updated": "2019-10-07T01:29:30.000Z"}
4 | {"name": "foo2", "last_updated": "2019-10-06T09:29:30.000Z"}
This query will run on some 9 million rows, on roughly 300 names.
Try something like this:
Table
create table test (
id serial,
data jsonb
);
Data
insert into test (data) values
('{"name": "foo1", "last_updated": "2019-10-06T09:29:30.000Z"}'),
('{"name": "foo1", "last_updated": "2019-10-06T01:29:30.000Z"}'),
('{"name": "foo1", "last_updated": "2019-10-07T01:29:30.000Z"}'),
('{"name": "foo2", "last_updated": "2019-10-06T09:29:30.000Z"}'),
('{"name": "foo2", "last_updated": "2019-10-06T01:29:30.000Z"}'),
('{"name": "foo2", "last_updated": "2019-10-06T02:29:30.000Z"}');
Query
with latest as (
select data->>'name' as name, max(data->>'last_updated') as last_updated
from test
group by data->>'name'
)
delete from test t
where not exists (
select 1 from latest
where t.data->>'name' = name
and t.data->>'last_updated' = last_updated
);
select * from test;
Example
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=2415e6f2c9c7980e69d178a331120dcd
You might have to index your jsonb column like create index on test((data->>'name'));; you could do that for last_updated also.
I make the assumption that a user doesn't have identical last_updated.
If that assumption is not true, you could try this:
with ranking as (
select
row_number() over (partition by data->>'name' order by data->>'last_updated' desc) as sr,
x.*
from test x
)
delete from test t
where not exists (
select 1 from ranking
where sr = 1
and id = t.id
);
In this case, we first give a serial number to users' records. Each user's latest_updated time gets sr 1.
Then, we ask the database to delete all records that aren't a match for sr 1's id.
Example: https://dbfiddle.uk/?rdbms=postgres_10&fiddle=dba1879a755ed0ec90580352f82554ee

Postgresql query array of objects in JSONB field filter Specific Object

CREATE TABLE company (id SERIAL, companyJson JSONB);
CREATE INDEX comapny_gin_idx ON company USING gin (companyJson);
INSERT INTO company (id, companyJson)
VALUES (1, '[{"name": "t", "company": "company1"}]');
INSERT INTO company (id, companyJson)
VALUES (2, '[{"name": "b", "company":"company2"}, {"name": "b", "company":"company3"}]');
SELECT * FROM company WHERE companyJson #> '[{"company": "company2" , "name": "b"}]';
The output of the above program is
2 [{"name": "b", "company": "company2"}, {"name": "b", "company": "company3"}]
Is there anyway to return {"name": "b", "company": "company2"} instead whole row.
I can only think of unnesting the array and the return the element from that:
SELECT x.j
FROM company c
cross join jsonb_array_elements(c.companyjson) as x(j)
where x.j = '{"company": "company2" , "name": "b"}'
You can directly return the first component through companyJson -> 0 which contains -> operand returning the first component by argument zero :
SELECT companyJson -> 0 as companyJson
FROM company
WHERE companyJson #> '[{"company": "company2" , "name": "b"}]';
Demo

How do I just select the "occupation"?

I have the following:
SELECT *
FROM (
SELECT '{"people": [{"name": "Bob", "occupation": "janitor"}, {"name": "Susan", "occupation": "CEO"}]}'::jsonb as data
) as b
WHERE data->'people' #> '[{"name":"Bob"}]'::jsonb;
I am filtering for the object '{"name": "Bob", "occupation": "janitor"}'
How do I return Bob's occupation ("janitor")?
SELECT data->'people'->>'occupation'
FROM (
SELECT '{"people": [{"name": "Bob", "occupation": "janitor"}, {"name": "Susan", "occupation": "CEO"}]}'::jsonb as data
) as b
WHERE data->'people' #> '[{"name":"Bob"}]'::jsonb;
returns
?column?
--------
NULL
Looking for:
occupation
----------
janitor
If you don't care about anything else on the row where the jsonb is, you can take all elements out of the jsonb and then use them as separate elements to select from
SELECT data->>'occupation' as occupation
FROM (
SELECT jsonb_array_elements(
'{"people":
[
{"name": "Bob", "occupation": "janitor"},
{"name": "Susan", "occupation": "CEO"}
]
}'::jsonb->'people') as data) as b
WHERE data #> '{"name":"Bob"}';
Results
occupation
-----------
janitor
(1 row)
Your "people" element is an array. You can get at the elements of an array with the jsonb_array_elements function. After that, you can just filter on person->>'name':
SELECT person->>'occupation' as occupation
FROM (
SELECT person.value as person
FROM (
SELECT
'{"people":
[
{"name": "Bob", "occupation": "janitor"},
{"name": "Susan", "occupation": "CEO"}
]
}'::jsonb as data
) a
CROSS JOIN
jsonb_array_elements(data->'people') as person
) b
WHERE person->>'name' = 'Bob';
Note that ->> returns text, while -> returns jsonb.