PostgreSQL: Find and delete duplicated jsonb data, excluding a key/value pair when comparing - postgresql

I have been searching all over to find a way to do this.
I am trying to clean up a table with a lot of duplicated jsonb fields.
There are some examples out there, but as a little twist, I need to exclude one key/value pair in the jsonb field, to get the result I need.
Example jsonb
{
"main": {
"orders": {
"order_id": "1"
"customer_id": "1",
"update_at": "11/23/2017 17:47:13"
}
}
Compared to:
{
"main": {
"orders": {
"order_id": "1"
"customer_id": "1",
"updated_at": "11/23/2017 17:49:53"
}
}
If I can exclude the "updated_at" key when comparing, the query should find it a duplicate and this, and possibly other, duplicated entries should be deleted, keeping only one, the first "original" one.
I have found this query, to try and find the duplicates. But it doesn't take my situation into account. Maybe someone can help structuring this to meet the requirements.
SELECT t1.jsonb_field
FROM customers t1
INNER JOIN (SELECT jsonb_field, COUNT(*) AS CountOf
FROM customers
GROUP BY jsonb_field
HAVING COUNT(*)>1
) t2 ON t1.jsonb_field=t2.jsonb_field
WHERE
t1.customer_id = 1
Thanks in advance :-)

If the Updated at is always at the same path, then you can remove it:
SELECT t1.jsonb_field
FROM customers t1
INNER JOIN (SELECT jsonb_field, COUNT(*) AS CountOf
FROM customers
GROUP BY jsonb_field
HAVING COUNT(*)>1
) t2 ON
t1.jsonb_field #-'{main,orders,updated_at}'
=
t2.jsonb_field #-'{main,orders,updated_at}'
WHERE
t1.customer_id = 1
See https://www.postgresql.org/docs/9.5/static/functions-json.html
additional operators
EDIT
If you dont have #- you might just cast to text, and do a regex replace
regexp_replace(t1.jsonb_field::text, '"update_at": "[^"]*?"','')::jsonb
=
regexp_replace(t2.jsonb_field::text, '"update_at": "[^"]*?"','')::jsonb
I even think, you don't need to cast it back to jsonb. But to be save.
Mind the regex matche ANY "update_at" field (by key) in the json. It should not match data, because it would not match an escaped closing quote \", nor find the colon after it.
Note the regex actually should be '"update_at": "[^"]*?",?'
But on sql fiddle that fails. (maybe depends on the postgresbuild..., check with your version, because as far as regex go, this is correct)
If the comma is not removed, the cast to json fails.
you can try '"update_at": "[^"]*?",'
no ? : that will remove the comma, but fail if update_at was the last in the list.
worst case, nest the 2
regexp_replace(
regexp_replace(t1.jsonb_field::text, '"update_at": "[^"]*?",','')
'"update_at": "[^"]*?"','')::jsonb

for postgresql 9.4
Though sqlfidle only has 9.3 and 9.6
9.3 is missing the json_object_agg. But the postgres doc says it is in 9.4. So this should work
It will only work, if all records have objects under the important keys.
main->orders
If main->orders is a json array, or scalar, then this may give an error.
Same if {"main": [1,2]} => error.
Each json_each returns a table with a row for each key in the json
json_object_agg aggregates them back to a json array.
The case statement filters the one key on each level that needs to be handled.
In the deepest nest level, it filters out the updated_at row.
On sqlfidle set query separator to '//'
If you use psql client, replace the // with ;
create or replace function foo(x json)
returns jsonb
language sql
as $$
select json_object_agg(key,
case key when 'main' then
(select json_object_agg(t2.key,
case t2.key when 'orders' then
(select json_object_agg(t3.key, t3.value)
from json_each(t2.value) as t3
WHERE t3.key <> 'updated_at'
)
else t2.value
end)
from json_each(t1.value) as t2
)
else t1.value
end)::jsonb
from json_each(x) as t1
$$ //
select foo(x)
from
(select '{ "main":{"orders":{"order_id": "1", "customer_id": "1", "updated_at": "11/23/2017 17:49:53" }}}'::json as x) as t1
x (the argument) may need to be jsonb, if that is your datatype

Related

Search for string in jsonb values - PostgreSQL

For simplicity, a row of table looks like this:
key: "z06khw1bwi886r18k1m7d66bi67yqlns",
reference_keys: {
"KEY": "1x6t4y",
"CODE": "IT137-521e9204-ABC-TESTE"
"NAME": "A"
},
I have a jsonb object like this one {"KEY": "1x6t4y", "CODE": "IT137-521e9204-ABC-TESTE", "NAME": "A"} and I want to search for a query in the values of any key. If my query is something like '521e9204' I want it to return the row that reference_keys has '521e9204' in any value. Basicly the keys don't matter for this scenario.
Note: The column reference_keys and so the jsonb object, are always a 1 dimensional array.
I have tried a query like this:
SELECT * FROM table
LEFT JOIN jsonb_each_text(table.reference_keys) AS j(k, value) ON true
WHERE j.value LIKE '%521e9204%'
The problem is that it duplicates rows, for every key in the json and it messes up the returned items.
I have also thinked of doing something like this:
SELECT DISTINCT jsonb_object_keys(reference_keys) from table;
and then use a query like:
SELECT * FROM table
WHERE reference_keys->>'CODE' like '%521e9204%'
It seems like this would work but I really don't want to rely on this solution.
You can rewrite your JOIN to an EXISTS condition to avoid the duplicates:
SELECT t.*
FROM the_table t
WHERE EXISTS (select *
from jsonb_each_text(t.reference_keys) AS j(k, value)
WHERE j.value LIKE '%521e9204%');
If you are using Postgres 12 or later, you can also use a JSON path query:
where jsonb_path_exists(reference_keys, 'strict $.** ? (# like_regex "521e9204")')

JSONB Data Type Modification in Postgresql

I have a doubt with modification of jsonb data type in postgres
Basic setup:-
array=> ["1", "2", "3"]
and now I have a postgresql database with an id column and a jsonb datatype column named lets just say cards.
id cards
-----+---------
1 {"1": 3, "4": 2}
thats the data in the table named test
Question:
How do I convert the cards of id->1 FROM {"1": 3, "4": 2} TO {"1": 4, "4":2, "2": 1, "3": 1}
How I expect the changes to occur:
From the array, increment by 1 all elements present inside the array that exist in the cards jsonb as a key thus changing {"1": 3} to {"1": 4} and insert the values that don't exist as a key in the cards jsonb with a value of 1 thus changing {"1":4, "4":2} to {"1":4, "4":2, "2":1, "3":1}
purely through postgres.
Partial Solution
I asked a senior for support regarding my question and I was told this:-
Roughly (names may differ): object keys to explode cards, array_elements to explode the array, left join them, do the calculation, re-aggregate the object. There may be a more direct way to do this but the above brute-force approach will work.
So I tried to follow through it using these two functions json_each_text(), json_array_elements_text() but ended up stuck halfway into this as well as I was unable to understand what they meant by left joining two columns:-
SELECT jsonb_each_text(tester_cards) AS each_text, jsonb_array_elements_text('[["1", 1], ["2", 1], ["3", 1]]') AS array_elements FROM tester WHERE id=1;
TLDR;
Update statement that checks whether a range of keys from an array exist or not in the jsonb data and automatically increments by 1 or inserts respectively the keys into the jsonb with a value of 1
Now it might look like I'm asking to be spoonfed but I really haven't managed to find anyway to solve it so any assistance would be highly appreciated 🙇
The key insight is that with jsonb_each and jsonb_object_agg you can round-trip a JSON object in a subquery:
SELECT id, (
SELECT jsonb_object_agg(key, value)
FROM jsonb_each(cards)
) AS result
FROM test;
(online demo)
Now you can JOIN these key-value pairs against the jsonb_array_elements of your array input. Your colleague was close, but not quite right: it requires a full outer join, not just a left (or right) join to get all the desired object keys for your output, unless one of your inputs is a subset of the other.
SELECT id, (
SELECT jsonb_object_agg(COALESCE(obj_key, arr_value), …)
FROM jsonb_array_elements_text('["1", "2", "3"]') AS arr(arr_value)
FULL OUTER JOIN jsonb_each(cards) AS obj(obj_key, obj_value) ON obj_key = arr_value
) AS result
FROM test;
(online demo)
Now what's left is only the actual calculation and the conversion to an UPDATE statement:
UPDATE test
SET cards = (
SELECT jsonb_object_agg(
COALESCE(key, arr_value),
COALESCE(obj_value::int, 0) + (arr_value IS NOT NULL)::int
)
FROM jsonb_array_elements_text('["1", "2", "3"]') AS arr(arr_value)
FULL OUTER JOIN jsonb_each_text(cards) AS obj(key, obj_value) ON key = arr_value
);
(online demo)

Querying Postgres SQL JSON Column

I have a json column (json_col) in a postgres database with the following structure:
{
"event1":{
"START_DATE":"6/18/2011",
"END_DATE":"7/23/2011",
"event_type":"active with prior experience"
},
"event2":{
"START_DATE":"8/20/11",
"END_DATE":"2/11/2012",
"event_type":"active"
}
}
[example of table structure][1]
How can I make a select statement in postgres to return the start_date and end_date with a where statement where "event_type" like "active"?
Attempted Query:
select person_id, json_col#>>'START_DATE' as event_start, json_col#>>'END_DATE' as event_end
from data
where json_col->>'event_type' like '%active%';
Returns empty columns.
Expected Response:
event_start
6/18/2011
8/20/2011
It sounds like you want to unnest your json structure, ignoring the top level keys and just getting the top level values. You can do this with jsonb_each, looking at resulting column named 'value'. You would put the function call in the FROM list as a lateral join (but since it is a function call, you don't need to specify the LATERAL keyword, it is implicit)
select value->>'START_DATE' from data, jsonb_each(json_col)
where value->>'event_type' like '%active%';

Find element in array of jsonb documents by value as case insensitive

Assume I have userinfo table with person column containing the following jsonb object:
{
"skills": [
{
"name": "php"
},
{
"name": "Python"
}
]
}
In order to get Python skill i would write the following query
select * from userinfo
where person -> 'skills' #> '[{"name":"Python"}]'
It works well, but if specify '[{"name":"python"}]' as lower case it doesn't return me what i want.
How can i write case insensitive query there?
Postgre version is 11.2
you can do that when unnesting with an exists predicate:
select u.*
from userinfo u
where exists (select *
from jsonb_array_elements(u.person -> 'skills') as s(j)
-- make sure to only do this for rows that actually contain an array
where jsonb_typeof(u.person -> 'skills') = 'array'
and lower(s.j ->> 'name') = 'python');
Online example: https://rextester.com/XKVUA73952
demo:db<>fiddle
AFAIK there is no in-built JSON function for that. So, you have to convert the JSON-String to lower case (meaning casting into type text, lower-case it, recast it into type jsonb):
WHERE lower(person::text)::jsonb -> 'skills' #> '[{"name":"python"}]'
select * from (select 1 as id, 'fname' as firstname, 'sname' as surname, jsonb_array_elements('{
"skills": [{"name": "php"},{"name": "Python"}]}'::jsonb->'skills') skill) p
where p.skill->>'name' ilike 'python';
to suit the tables in the question it'd be something like
select * from (select *, jsonb_array_elements(person->'skills') skill from userinfo) u
where u.skill->>'name' ilike 'python';
Just a note, this will return multiple entries for the same userinfo if you start looking for multiple skills .. if you use the above, you'd want to group by the fields you want returned or select distinct id, username etc.
like (assuming there's an id column in the userinfo table)
select distinct id from (select *, jsonb_array_elements(person->'skills') skill from userinfo) u
where u.skill->>'name' ilike 'python' or u.skill->>'name' ilike 'php';
it all depends what you want to do

How to use postgresql any with jsonb data

Related
see this question
Question
I have a postgresql table that has a column of type jsonb. the json data looks like this
{
"personal":{
"gender":"male",
"contact":{
"home":{
"email":"ceo#home.me",
"phone_number":"5551234"
},
"work":{
"email":"ceo#work.id",
"phone_number":"5551111"
}
},
..
"nationality":"Martian",
..
},
"employment":{
"title":"Chief Executive Officer",
"benefits":[
"Insurance A",
"Company Car"
],
..
}
}
This query works perfectly well
select employees->'personal'->'contact'->'work'->>'email'
from employees
where employees->'personal'->>'nationality' in ('Martian','Terran')
I would like to fetch all employees who have benefits of type Insurance A OR Insurance B, this ugly query works:
select employees->'personal'->'contact'->'work'->>'email'
from employees
where employees->'employment'->'benefits' ? 'Insurance A'
OR employees->'employment'->'benefits' ? 'Insurance B';
I would like to use any instead like so:
select * from employees
where employees->'employment'->>'benefits' =
any('{Insurance A, Insurance B}'::text[]);
but this returns 0 results.. ideas?
What i've also tried
I tried the following syntaxes (all failed):
.. = any({'Insurance A','Insurance B'}::text[]);
.. = any('Insurance A'::text,'Insurance B'::text}::array);
.. = any({'Insurance A'::text,'Insurance B'::text}::array);
.. = any(['Insurance A'::text,'Insurance B'::text]::array);
employees->'employment'->'benefits' is a json array, so you should unnest it to use its elements in any comparison.
Use the function jsonb_array_elements_text() in lateral join:
select *
from
employees,
jsonb_array_elements_text(employees->'employment'->'benefits') benefits(benefit)
where
benefit = any('{Insurance A, Insurance B}'::text[]);
The syntax
from
employees,
jsonb_array_elements_text(employees->'employment'->'benefits')
is equivalent to
from
employees,
lateral jsonb_array_elements_text(employees->'employment'->'benefits')
The word lateral may be omitted. For the documentation:
LATERAL can also precede a function-call FROM item, but in this case
it is a noise word, because the function expression can refer to
earlier FROM items in any case.
See also: What is the difference between LATERAL and a subquery in PostgreSQL?
The syntax
from jsonb_array_elements_text(employees->'employment'->'benefits') benefits(benefit)
is a form of aliasing, per the documentation
Another form of table aliasing gives temporary names to the columns of
the table, as well as the table itself:
FROM table_reference [AS] alias ( column1 [, column2 [, ...]] )
You can use the containment operator ?| to check if the array contains any of the values you want.
select * from employees
where employees->'employment'->'benefits' ?| array['Insurance A', 'Insurance B']
If you happen to a case where you want all of the values to be in the array, then there's the ?& operator to check for that.