How to delete subsequent matching rows? - postgresql

I have jsonb datatype, where each row has a name, last_updated, among other keys. How would I go about creating a query, which would leave only 1 row per name per day?
i.e. this:
id | data
1 | {"name": "foo1", "last_updated": "2019-10-06T09:29:30.000Z"}
2 | {"name": "foo1", "last_updated": "2019-10-06T01:29:30.000Z"}
3 | {"name": "foo1", "last_updated": "2019-10-07T01:29:30.000Z"}
4 | {"name": "foo2", "last_updated": "2019-10-06T09:29:30.000Z"}
5 | {"name": "foo2", "last_updated": "2019-10-06T01:29:30.000Z"}
6 | {"name": "foo2", "last_updated": "2019-10-06T02:29:30.000Z"}
becomes:
id | data
1 | {"name": "foo1", "last_updated": "2019-10-06T09:29:30.000Z"}
3 | {"name": "foo1", "last_updated": "2019-10-07T01:29:30.000Z"}
4 | {"name": "foo2", "last_updated": "2019-10-06T09:29:30.000Z"}
This query will run on some 9 million rows, on roughly 300 names.

Try something like this:
Table
create table test (
id serial,
data jsonb
);
Data
insert into test (data) values
('{"name": "foo1", "last_updated": "2019-10-06T09:29:30.000Z"}'),
('{"name": "foo1", "last_updated": "2019-10-06T01:29:30.000Z"}'),
('{"name": "foo1", "last_updated": "2019-10-07T01:29:30.000Z"}'),
('{"name": "foo2", "last_updated": "2019-10-06T09:29:30.000Z"}'),
('{"name": "foo2", "last_updated": "2019-10-06T01:29:30.000Z"}'),
('{"name": "foo2", "last_updated": "2019-10-06T02:29:30.000Z"}');
Query
with latest as (
select data->>'name' as name, max(data->>'last_updated') as last_updated
from test
group by data->>'name'
)
delete from test t
where not exists (
select 1 from latest
where t.data->>'name' = name
and t.data->>'last_updated' = last_updated
);
select * from test;
Example
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=2415e6f2c9c7980e69d178a331120dcd
You might have to index your jsonb column like create index on test((data->>'name'));; you could do that for last_updated also.
I make the assumption that a user doesn't have identical last_updated.
If that assumption is not true, you could try this:
with ranking as (
select
row_number() over (partition by data->>'name' order by data->>'last_updated' desc) as sr,
x.*
from test x
)
delete from test t
where not exists (
select 1 from ranking
where sr = 1
and id = t.id
);
In this case, we first give a serial number to users' records. Each user's latest_updated time gets sr 1.
Then, we ask the database to delete all records that aren't a match for sr 1's id.
Example: https://dbfiddle.uk/?rdbms=postgres_10&fiddle=dba1879a755ed0ec90580352f82554ee

Related

not understand json_agg in this context

Reference: Query jsonb column containing array of JSON objects
begin;
CREATE TEMP TABLE segments (segments_id serial PRIMARY KEY, payload jsonb);
INSERT INTO segments (payload)
VALUES ('[{"kind": "person", "limit": "1"}, {"kind": "B", "filter_term": "fin"}]');
INSERT INTO segments (payload)
VALUES ('[{"kind": "person", "limit": "3"}, {"kind": "A", "filter_term": "abc"}]');
INSERT INTO segments (payload)
VALUES ('[{"kind": "person", "limit": "2"}, {"kind": "C", "filter_term": "def"}]');
commit;
CTE query:
with a as (select jsonb_array_elements(s.payload) j from segments s)
SELECT json_agg(a.j) AS filtered_payload from a
where j #> '{"kind":"person"}';
Return: [{"kind": "person", "limit": "1"}, {"kind": "person", "limit": "3"}, {"kind": "person", "limit": "2"}]
This QueryA:
SELECT a.filtered_payload,
a.ct_elem_row
, sum(ct_elem_row) OVER () AS ct_elem_total
, count(*) OVER () AS ct_rows
FROM segments s
JOIN LATERAL (
SELECT json_agg(j.elem) AS filtered_payload, count(*) AS ct_elem_row
FROM jsonb_array_elements(s.payload) j(elem)
WHERE j.elem #> '{"kind":"person"}'
) a ON ct_elem_row > 0
WHERE s.payload #> '[{"kind":"person"}]';
return :
In QueryA, the structure is like: select ... from segments s join lateral filtered_payload.... segments is 3 rows lateral join with one row (filtered_payload). filtered_payload return only row as per CTE query, as an a consolidate JSON array. So overall I am very confused with json_agg in the QueryA.
Edit at 2021-10-05 16:36 +5:30:
Even following code, a.filtered_payload return 3 jsonb array, instead of one arrgregate json array. I don't know when already aggregated jsonb array (using json_agg function) unnested to serveal jsonb arrays.
SELECT a.filtered_payload, s.*
FROM segments s
cross JOIN LATERAL (
SELECT json_agg(j.elem) AS filtered_payload
FROM jsonb_array_elements(s.payload) j(elem)
WHERE j.elem #> '{"kind":"person"}') a;
I believe the LATERAL JOIN is doing the trick there.
In your original query, you're using json_agg against the whole table dataset (filtered by '{"kind":"person"}')
with a as
(
select jsonb_array_elements(s.payload) j
from segments s
)
SELECT json_agg(a.j) AS filtered_payload
from a
where j #> '{"kind":"person"}';
Meanwhile in the second instance, you are playing with one row at the time using the LATERAL. That's why you end up having 3 rows with single "kind":"person" values instead of an unique row with 3 values.
Not sure about what you're trying to achieve but the following could put you in the right direction
SELECT a.filtered_payload,
a.ct_elem_row,
sum(ct_elem_row) OVER () AS ct_elem_total,
count(*) OVER () AS ct_rows
FROM segments s
JOIN LATERAL (
SELECT json_agg(j.elem) AS filtered_payload,
count(*) AS ct_elem_row
FROM segments d, lateral jsonb_array_elements(d.payload) j(elem)
WHERE j.elem #> '{"kind":"person"}'
) a ON ct_elem_row > 0
WHERE s.payload #> '[{"kind":"person"}]';
Results
filtered_payload | ct_elem_row | ct_elem_total | ct_rows
--------------------------------------------------------------------------------------------------------+-------------+---------------+---------
[{"kind": "person", "limit": "1"}, {"kind": "person", "limit": "3"}, {"kind": "person", "limit": "2"}] | 3 | 9 | 3
[{"kind": "person", "limit": "1"}, {"kind": "person", "limit": "3"}, {"kind": "person", "limit": "2"}] | 3 | 9 | 3
[{"kind": "person", "limit": "1"}, {"kind": "person", "limit": "3"}, {"kind": "person", "limit": "2"}] | 3 | 9 | 3
(3 rows)

Postgresql query array of objects in JSONB field filter Specific Object

CREATE TABLE company (id SERIAL, companyJson JSONB);
CREATE INDEX comapny_gin_idx ON company USING gin (companyJson);
INSERT INTO company (id, companyJson)
VALUES (1, '[{"name": "t", "company": "company1"}]');
INSERT INTO company (id, companyJson)
VALUES (2, '[{"name": "b", "company":"company2"}, {"name": "b", "company":"company3"}]');
SELECT * FROM company WHERE companyJson #> '[{"company": "company2" , "name": "b"}]';
The output of the above program is
2 [{"name": "b", "company": "company2"}, {"name": "b", "company": "company3"}]
Is there anyway to return {"name": "b", "company": "company2"} instead whole row.
I can only think of unnesting the array and the return the element from that:
SELECT x.j
FROM company c
cross join jsonb_array_elements(c.companyjson) as x(j)
where x.j = '{"company": "company2" , "name": "b"}'
You can directly return the first component through companyJson -> 0 which contains -> operand returning the first component by argument zero :
SELECT companyJson -> 0 as companyJson
FROM company
WHERE companyJson #> '[{"company": "company2" , "name": "b"}]';
Demo

How can I do less than, greater than in JSON array object Postgres fields?

I want to retrieve data by specific field operation it store array of object. i want to add new object in it.
CREATE TABLE justjson ( id INTEGER, doc JSONB);
INSERT INTO justjson VALUES ( 1, '[
{
"name": "abc",
"age": "22"
},
{
"name": "def",
"age": "23"
}
]');
retrieve data where age is greater then and equal to 23 how is possible
eg using jsonb_array_elements:
t=# with a as (select *,jsonb_array_elements(doc) k from justjson)
select k from a where (k->>'age')::int >= 23;
k
------------------------------
{"age": "23", "name": "def"}
(1 row)

Search for jsonb concatenated values specified by keys

I have jsonb data as :
id | data
1 | {"last": "smith", "first": "john", "title": "mr"}
2 | {"last": "smith jr", "first": "john", "title": "mr"}
3 | {"last": "roberts", "first": "Julia", "title": "miss"}
4 | {"last": "2nd", "first": "john smith", "title": "miss"}
I need to search for records which match with "John smith"; So, in this case IDs - 1,2,4
I cannot separate the search for each key => value pair; I need to get concatenated entry for records to check against incoming request.
I have tried
select * from contacts where jsonb_concat(data->>'title'::TEXT || data->>'first'::TEXT || data->>'last'::TEXT) ilike "John smith";
This doesn't work because I am trying to concat values and not jsonb object. Is there any way to concat jsonb values specified by keys?
I solved it myself with some research..
my query is like -
select * from contacts where trim(regexp_replace(CONCAT(data->>'title'::TEXT,' ',data->>'first'::TEXT,' ',data->>'last'::TEXT), '\s+', ' ', 'g')) ilike '%john%';
For PostgreSQL versions > 9.1 .. you can use '\s+' instead of '\s+'.
hope this helps someone.

How do I just select the "occupation"?

I have the following:
SELECT *
FROM (
SELECT '{"people": [{"name": "Bob", "occupation": "janitor"}, {"name": "Susan", "occupation": "CEO"}]}'::jsonb as data
) as b
WHERE data->'people' #> '[{"name":"Bob"}]'::jsonb;
I am filtering for the object '{"name": "Bob", "occupation": "janitor"}'
How do I return Bob's occupation ("janitor")?
SELECT data->'people'->>'occupation'
FROM (
SELECT '{"people": [{"name": "Bob", "occupation": "janitor"}, {"name": "Susan", "occupation": "CEO"}]}'::jsonb as data
) as b
WHERE data->'people' #> '[{"name":"Bob"}]'::jsonb;
returns
?column?
--------
NULL
Looking for:
occupation
----------
janitor
If you don't care about anything else on the row where the jsonb is, you can take all elements out of the jsonb and then use them as separate elements to select from
SELECT data->>'occupation' as occupation
FROM (
SELECT jsonb_array_elements(
'{"people":
[
{"name": "Bob", "occupation": "janitor"},
{"name": "Susan", "occupation": "CEO"}
]
}'::jsonb->'people') as data) as b
WHERE data #> '{"name":"Bob"}';
Results
occupation
-----------
janitor
(1 row)
Your "people" element is an array. You can get at the elements of an array with the jsonb_array_elements function. After that, you can just filter on person->>'name':
SELECT person->>'occupation' as occupation
FROM (
SELECT person.value as person
FROM (
SELECT
'{"people":
[
{"name": "Bob", "occupation": "janitor"},
{"name": "Susan", "occupation": "CEO"}
]
}'::jsonb as data
) a
CROSS JOIN
jsonb_array_elements(data->'people') as person
) b
WHERE person->>'name' = 'Bob';
Note that ->> returns text, while -> returns jsonb.