Postgres: how to delete duplicates based on a field in a jsonb column - postgresql

I have a postgres table with a created_at and data column, where the data column is in jsonb format.
A row is considered duplicated if it has the same value for the id property in the data column.
The table looks like this
   | created_at | data
---+------------+---------------------------------
 1 | 2018-03-20 | {"id": "abc", "name": "please"}
 2 | 2018-01-10 | {"id": "sdf", "name": "john"  }
 3 | 2018-03-31 | {"id": "lkj", "name": "doe"   }
 4 | 2018-02-30 | {"id": "dfg", "name": "apple" }
 5 | 2018-05-24 | {"id": "dfg", "name": "seed"  }
 6 | 2018-03-27 | {"id": "23f", "name": "need"  }
 7 | 2018-11-14 | {"id": "abc", "name": "help"  }
What is an efficient way to remove the duplicates from this table? I want to keep one instance of each,
i.e. if 5 entries share the same id I want to delete 4 and leave 1 in the table.
In this example that means removing one of the entries with id='abc' and one with id='dfg'.
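A hedged sketch of one way to do this, assuming the table is named events (the question does not name it) and has no primary key, so the system column ctid is used to tell physical rows apart:

```sql
-- Keep the earliest row (by created_at) for each data->>'id',
-- and delete every other row carrying the same id.
-- "events" is an assumed table name; ctid stands in for the missing primary key.
DELETE FROM events
WHERE ctid NOT IN (
    SELECT DISTINCT ON (data->>'id') ctid
    FROM events
    ORDER BY data->>'id', created_at
);
```

DISTINCT ON picks exactly one ctid per id (the first in the ORDER BY), so everything outside that set is a duplicate.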

Related

Query for jsonb: how to select by value from jsonb? [duplicate]

I have a PostgreSQL table with a jsonb products column:
| products |
| --------------------------------------------------- |
| [{"id": "eaaca8bc-c8a0-45f7-9698-d4fc701d2e5a", "#type": "#game", "extId": "da32af17-fa03-4a62-bd04-f026d04d16e9"}, {"id": "5fc5de21-9cb7-4bd3-a723-7936bfef7cde", "#type": "#book", "extId": "c945f005-2d37-491c-8ba9-9da2709a3aab"}, {"id": "892fe85c-d7d6-4815-8dec-1720b644205a", "#type": "#sport", "extId": "c252dcba-2a14-4e75-90db-29ccac2499d2"}] |
| [{"id": "gh6d86ls-wj8o-39r4-2694-1720b644205a", "#type": "#game", "extId": "da32af17-fa03-4a62-bd04-f026d04d16e9"}] |
| |
| [{"id": "892fe85c-d7d6-4815-8dec-1720b644205a", "#type": "#sport", "extId": "c252dcba-2a14-4e75-90db-29ccac2499d2"}] |
Example of pretty json from products column:
[
  {
    "id": "eaaca8bc-c8a0-45f7-9698-d4fc701d2e5a",
    "#type": "#game",
    "extId": "da32af17-fa03-4a62-bd04-f026d04d16e9"
  },
  {
    "id": "5fc5de21-9cb7-4bd3-a723-7936bfef7cde",
    "#type": "#book",
    "extId": "c945f005-2d37-491c-8ba9-9da2709a3aab"
  }
]
How can I select all rows, where "#type" equals "#game" and "extId" equals da32af17-fa03-4a62-bd04-f026d04d16e9?
You can use the containment operator @>
select *
from the_table
where products @> '[{"#type": "#game", "extId": "da32af17-fa03-4a62-bd04-f026d04d16e9"}]';
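As a side note, jsonb containment queries can be served by a GIN index, which matters on large tables; a sketch, reusing the table name from the answer:

```sql
-- A default GIN index on a jsonb column supports the containment (@>)
-- and existence (?, ?|, ?&) operators.
CREATE INDEX the_table_products_gin ON the_table USING gin (products);
```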

Postgres expand array of object (jsonb)

I have a postgres table in which I want to expand a jsonb column.
The table (called runs) has each participant as a single row, with the jsonb column holding their experiment data.
id |data |
---|-------------|
id1|[{}, {}] |
id2|[{}, {}, {}] |
The jsonb column is always an array with an unknown number of objects, where each object has an unknown number of arbitrary keys.
[
{
"rt": 3698,
"phase": "questionnaire",
"question": 1
},
{
"rt": 3698,
"phase": "forced-choice",
"choice": 0,
"options": ["red", "blue"]
}
]
I would like to either (1) expand the jsonb column so each object is a row:
id |rt | phase | question | choice | options |
---|------| ------------- | -------- | ------ | -------------- |
id1| 3698 | questionnaire | 1 | | |
id1| 5467 | forced-choice | | 0 | ["red", "blue"] |
OR (2) map the other columns in the row to the jsonb array (here the "id" key):
[
{
"id": "id1",
"rt": 3698,
"phase": "questionnaire",
"question": 1
},
{
"id": "id1",
"rt": 3698,
"phase": "forced-choice",
"choice": 0,
"options": ["red", "blue"]
}
]
The fact that the number of objects, the number of keys per object, and the keys themselves are unknown a priori is really stumping me on how to accomplish this. Maybe something like this, but this isn't right...
SELECT id, x.*
FROM
runs_table,
jsonb_populate_recordset(null::runs_table, data) x
PostgreSQL has many JSON functions. First you must extract the keys and values from the jsonb; then you can get the type of each value using the jsonb_typeof(jsonb) function. I wrote two samples for you:
-- sample 1
select *
from jsonb_array_elements(
'[
{
"id": "id1",
"rt": 3698,
"phase": "questionnaire",
"question": 1
},
{
"id": "id1",
"rt": 3698,
"phase": "forced-choice",
"choice": 0,
"options": ["red", "blue"]
}
]'::jsonb
) t1 (json_data)
cross join jsonb_each(t1.json_data) t2(js_key, js_value)
where jsonb_typeof(t2.js_value::jsonb) = 'array'
-- sample 2
select *
from jsonb_array_elements(
'[
{
"id": "id1",
"rt": 3698,
"phase": "questionnaire",
"question": 1
},
{
"id": "id1",
"rt": 3698,
"phase": "forced-choice",
"choice": 0,
"options": ["red", "blue"]
}
]'::jsonb
) t1 (json_data)
where jsonb_typeof((t1.json_data->'options')::jsonb) = 'array'
Sample 1: this query extracts all keys and values from the jsonb and then filters so that only values of array type are shown.
Sample 2: use this query if you already know which keys can contain arrays.
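For the asker's goal (2), mapping the row's id into every element, a sketch using the table and column names from the question (runs, id, data):

```sql
-- Unnest each array with jsonb_array_elements, then merge the row's id
-- into every element with the || object-concatenation operator (9.5+).
SELECT r.id, t.elem || jsonb_build_object('id', r.id) AS elem_with_id
FROM runs r
CROSS JOIN LATERAL jsonb_array_elements(r.data) AS t(elem);
```

This sidesteps the unknown-keys problem entirely, because the elements stay jsonb objects instead of being forced into fixed columns.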

PostgreSQL jsonb_path_query removes result instead of returning null value

In an example table:
create table example
(
id serial not null
constraint example_pk
primary key,
data json not null
);
and data
INSERT INTO public.example (id, data) VALUES (1, '[{"key": "1", "value": "val1"}, {"key": "2", "value": "val2"}]');
INSERT INTO public.example (id, data) VALUES (2, '[{"key": "1", "value": "val1"}]');
INSERT INTO public.example (id, data) VALUES (3, '[{"key": "1", "value": "val1"}, {"key": "2", "value": "val2"}]');
 id | data
----+----------------------------------------------------------------
  1 | [{"key": "1", "value": "val1"}, {"key": "2", "value": "val2"}]
  2 | [{"key": "1", "value": "val1"}]
  3 | [{"key": "1", "value": "val1"}, {"key": "2", "value": "val2"}]
I want to query the value field in the data column where key = 2
The query I'm currently using is this:
SELECT id,
       jsonb_path_query(
           TO_JSONB(data),
           '$[*] ? (@.key == "2").value'::JSONPATH
       )::VARCHAR AS "values"
FROM example
I would expect the results to be:
 id | values
----+--------
  1 | "val2"
  2 | null
  3 | "val2"
But the actual result is:
 id | values
----+--------
  1 | "val2"
  3 | "val2"
Is there a reason why the null output of jsonb_path_query is omitted? How do I get it to behave the way I'm expecting?
You want jsonb_path_query_first() if you want the result of the path expression:
SELECT id,
       jsonb_path_query_first(to_jsonb(data), '$[*] ? (@.key == "2").value') AS "values"
FROM example
Note that this returns a jsonb value. If you want a text value, use:
jsonb_path_query_first(to_jsonb(data), '$[*] ? (@.key == "2").value') #>> '{}'
As per the PostgreSQL documentation, the filter acts as a WHERE condition:
When defining the path, you can also use one or more filter expressions that work similar to the WHERE clause in SQL. A filter expression begins with a question mark and provides a condition in parentheses:
I managed to achieve what you're looking for using LATERAL and a LEFT JOIN:
SELECT id,
*
FROM example left join
LATERAL jsonb_path_query(
    TO_JSONB(data),
    '$[*] ? (@.key == "2").value'::JSONPATH)
on true;
Result
id | id | data | jsonb_path_query
----+----+----------------------------------------------------------------+------------------
1 | 1 | [{"key": "1", "value": "val1"}, {"key": "2", "value": "val2"}] | "val2"
2 | 2 | [{"key": "1", "value": "val1"}] |
3 | 3 | [{"key": "1", "value": "val1"}, {"key": "2", "value": "val2"}] | "val2"
(3 rows)

Search and update a JSON array element in Postgres

I have a jsonb column that stores an array of elements like the following:
[
{"id": "11", "name": "John", "age":"25", ..........},
{"id": "22", "name": "Mike", "age":"35", ..........},
{"id": "33", "name": "Tom", "age":"45", ..........},
.....
]
I want to replace the 2nd object (id=22) with a completely new object. I don't want to update each property one by one, because there are many properties and all of their values could have changed. I just want to identify the 2nd element and replace the whole object.
I know there is a jsonb_set(). However, to update the 2nd element, I need to know its array index=1 so I can do the following:
jsonb_set(data, '{1}', '{"id": "22", "name": "Don", "age":"55"}',true)
But I couldn't find any way to search and get that index. Can someone help me out?
One way I can think of is to combine row_number() and jsonb_array_elements:
-- test data
create table test (id integer, data jsonb);
insert into test values (1, '[{"id": "22", "name": "Don", "age":"55"}, {"id": "23", "name": "Don2", "age":"55"},{"id": "24", "name": "Don3", "age":"55"}]');
insert into test values (2, '[{"id": "32", "name": "Don", "age":"55"}, {"id": "33", "name": "Don2", "age":"55"},{"id": "34", "name": "Don3", "age":"55"}]');
select subrow, id, row_number() over (partition by id)
from (
    select jsonb_array_elements(data) as subrow, id
    from test
) as t;
subrow | id | row_number
------------------------------------------+----+------------
{"id": "22", "name": "Don", "age":"55"} | 1 | 1
{"id": "23", "name": "Don2", "age":"55"} | 1 | 2
{"id": "24", "name": "Don3", "age":"55"} | 1 | 3
{"id": "32", "name": "Don", "age":"55"} | 2 | 1
{"id": "33", "name": "Don2", "age":"55"} | 2 | 2
{"id": "34", "name": "Don3", "age":"55"} | 2 | 3
-- apparently you can filter what you want from here
-- apparently you can filter what you want from here
select * from (
    select subrow, id, row_number() over (partition by id) as rn
    from (
        select jsonb_array_elements(data) as subrow, id
        from test
    ) as t
) as t2
where subrow->>'id' = '23';
In addition, think about your schema design. It may not be the best idea to store your data this way.
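For the update itself, WITH ORDINALITY (PostgreSQL 9.4+) yields each element's array position directly, which can then feed jsonb_set; a sketch against the answer's test table, with a made-up replacement object:

```sql
-- Find the position of the element whose "id" is '23' (ordinality is 1-based,
-- jsonb paths are 0-based, hence "- 1") and replace the whole object.
UPDATE test t
SET data = jsonb_set(t.data,
                     ARRAY[(e.idx - 1)::text],
                     '{"id": "23", "name": "Updated", "age": "56"}')
FROM (
    SELECT id, idx
    FROM test,
         jsonb_array_elements(data) WITH ORDINALITY AS a(elem, idx)
    WHERE elem->>'id' = '23'
) e
WHERE t.id = e.id;
```

Unlike row_number(), the ordinality column is guaranteed to follow the array order, so it is safe to use as the jsonb_set path index.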

PSQL: Find record where JSONB field with array of hashes contains some case insensitive value

I need to find all the records in the PostgreSQL database (9.5) where the connections jsonb column (an array of hashes) contains certain information, and the search must be case-insensitive.
For example, the column would be [{"type":"email", "value":"john@test.com", "comment": "Test"}, {"type":"skype", "value":"john.b", "comment": "Test2"}]. I need to find the record where the connections column contains an entry with type "skype" and value "JOHN.B".
# SELECT * FROM contacts;
 id | email | connections
----+-------+-------------------------------------------------------------------------------------
  1 | asd   | [{"type": "email", "value": "john@test.com"}, {"type": "skype", "value": "john.b"}]
How can I do it? Thanks.
WITH t(id,email,connections) AS ( VALUES
(1,'test','[
{"type": "email", "value": "john@test.com"},
{"type": "skype", "value": "john.b"}
]'::JSONB)
)
SELECT * FROM t
WHERE connections #> '[{"type": "skype"}]'
AND connections #> '[{"value": "john.b"}]';
Result:
id | email | connections
----+-------+-------------------------------------------------------------------------------------
 1 | test  | [{"type": "email", "value": "john@test.com"}, {"type": "skype", "value": "john.b"}]
(1 row)
Here is one method using @>:
where connections @> '[{"type":"skype"}]' and
      connections @> '[{"value":"john.b"}]';
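Neither containment query addresses the case-insensitive part of the question ("JOHN.B"), since @> compares values exactly. A sketch that works on 9.5 is to unnest the array and compare lowercased values:

```sql
-- Case-insensitive match on value, exact match on type.
SELECT c.*
FROM contacts c
WHERE EXISTS (
    SELECT 1
    FROM jsonb_array_elements(c.connections) AS t(elem)
    WHERE elem->>'type' = 'skype'
      AND lower(elem->>'value') = lower('JOHN.B')
);
```

Note this approach cannot use a plain GIN index on connections; if case-insensitive lookups are frequent, it may be worth normalizing the values on write instead.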