Deep search within jsonb field PostgreSQL - postgresql

A sample of my data looks something like this:
{"city": "NY",
"skills": [
{"soft_skills": "Analysis"},
{"soft_skills": "Procrastination"},
{"soft_skills": "Presentation"}
],
"areas_of_training": [
{"areas of training": "Visio"},
{"areas of training": "Office"},
{"areas of training": "Risk Assesment"}
]}
I would like to run a query to find users with soft_skills Analysis, and maybe run another one to find users whose areas of training include both Visio and Risk Assesment.
My column type is jsonb. How can I implement a search query on these deeply nested objects? A query on level one for city works using SELECT * FROM mydata WHERE content::json->>'city'='NY';
How can I also run a match using the LIKE keyword or string matching for deeply nested values?

1)
SELECT * FROM mydata
WHERE content->'skills' @> '[{"soft_skills": "Analysis"}]';
2)
SELECT * FROM mydata
WHERE content->'areas_of_training' @> '[{"areas of training": "Visio"},{"areas of training": "Risk Assesment"}]';
About JSON(B) operators
PS: Be prepared for extremely slow queries. I highly recommend considering data normalization.
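To make the containment approach easier to try out, here is a minimal sketch. The table name mydata_demo, the index name and the two rows are made up for illustration. It writes the containment test (@>) from the document root, so that a GIN index on the whole column can be considered by the planner.

```sql
-- Hypothetical demo table and rows, modeled on the sample document in the question.
CREATE TEMP TABLE mydata_demo (id serial PRIMARY KEY, content jsonb);

INSERT INTO mydata_demo (content) VALUES
  ('{"city": "NY", "skills": [{"soft_skills": "Analysis"}, {"soft_skills": "Presentation"}]}'),
  ('{"city": "LA", "skills": [{"soft_skills": "Procrastination"}]}');

-- A GIN index on the whole column can support @> containment searches.
CREATE INDEX mydata_demo_content_gin ON mydata_demo USING gin (content);

-- Containment from the document root: matches rows whose "skills" array
-- has at least one element containing {"soft_skills": "Analysis"}.
SELECT id, content->>'city' AS city
FROM mydata_demo
WHERE content @> '{"skills": [{"soft_skills": "Analysis"}]}';
```

With the rows above only the NY row qualifies. Whether the index is actually used on a real table depends on its size and statistics, so check the plan with EXPLAIN.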
Update for LIKE
For your example data it could be:
SELECT * FROM mydata
WHERE EXISTS (
SELECT *
FROM jsonb_array_elements(content->'areas_of_training') as a
WHERE a->>'areas of training' ilike '%vi%');
But the query depends heavily on the actual JSON structure.

Use jsonb_array_elements() to get values of nested elements, examples:
select d.*
from mydata d,
jsonb_array_elements(content->'skills')
where value->>'soft_skills' ilike '%analysis%';
select d.*
from mydata d,
jsonb_array_elements(content->'areas_of_training')
where value->>'areas of training' ~* 'visio|office';
It is possible that the query yields duplicate rows, so it is reasonable to use select distinct on (id), where id is a primary key.
Note that the function jsonb_array_elements() is costly and, in contrast to Abelisto's solution, cannot use indexes. However, you have to use it if you want access to the values of nested JSON elements.
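The duplicate-row issue mentioned above can be reproduced with a small sketch. The table name training_demo and its rows are assumptions for illustration; the column alias e(elem) is just a naming choice.

```sql
-- Hypothetical demo table; row 1 has two elements that match the pattern.
CREATE TEMP TABLE training_demo (id serial PRIMARY KEY, content jsonb);

INSERT INTO training_demo (content) VALUES
  ('{"areas_of_training": [{"areas of training": "Visio"}, {"areas of training": "Office"}]}'),
  ('{"areas_of_training": [{"areas of training": "Risk Assesment"}]}');

-- Plain lateral join: one output row per matching array element,
-- so the first table row is returned twice.
SELECT d.id
FROM training_demo d,
     jsonb_array_elements(d.content->'areas_of_training') AS e(elem)
WHERE e.elem->>'areas of training' ~* 'visio|office';

-- DISTINCT ON (id) keeps a single output row per table row.
SELECT DISTINCT ON (d.id) d.*
FROM training_demo d,
     jsonb_array_elements(d.content->'areas_of_training') AS e(elem)
WHERE e.elem->>'areas of training' ~* 'visio|office'
ORDER BY d.id;
```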

Related

How to query nested JSONB format data column in PostgreSQL?

I have gene expression data in a jsonb column, in multiple rows for different samples, as shown below:
Sample Gexp
Sample A {"data": [{"pval": 0.0154, "Protein": "A0A0B4J2D5", "FoldChange": 1.3534, "MinusLog10p": 0.1334, "Significance": "Non-significant"}, {"pval": 0.0689, "Protein": "A0FGR8", "FoldChange": 2.5448, "MinusLog10p": 1.1615, "Significance": "Significant"}]}
Sample B {"data": [{"pval": 0.0824, "Protein": "A0A0B4J2D5", "FoldChange": -0.1676, "MinusLog10p": 0.1084, "Significance": "Non-significant"}, {"pval": 0.0219, "Protein": "A0FGR8", "FoldChange": 2.3448, "MinusLog10p": 1.1615, "Significance": "Significant"}]}
I need to query across the column containing multiple records where a certain protein has a pval or FoldChange in a certain range. I tried multiple solutions provided in this forum (Search in nested Postgresql JSONB column, Postgresql query for objects in nested JSONB field, Query simplified JSONB form JSONB column containing nested JSON from a Postgresql database?, How to query nested array with heterogeneous elements in PostgreSQL JSONB column, etc.), with no luck. Can someone help me?
The conditions for selecting the data were not described precisely (unambiguously) in the question. For example, if we are looking for the A0FGR8 protein with a pval in the range 0.02 to 0.03, the query might look like this:
select sample, value
from my_table
cross join jsonb_array_elements(gexp->'data')
where value->>'Protein' = 'A0FGR8'
and (value->>'pval')::numeric between 0.02 and 0.03
Test the query in Db<>fiddle.
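The same pattern can be adapted to a FoldChange condition. The following is a sketch only: the table name gexp_demo is hypothetical, the rows are shortened versions of the samples in the question, and the threshold 2.0 is an arbitrary assumption.

```sql
-- Hypothetical demo table with shortened versions of the question's samples.
CREATE TEMP TABLE gexp_demo (sample text PRIMARY KEY, gexp jsonb);

INSERT INTO gexp_demo VALUES
  ('Sample A', '{"data": [{"pval": 0.0154, "Protein": "A0A0B4J2D5", "FoldChange": 1.3534},
                          {"pval": 0.0689, "Protein": "A0FGR8", "FoldChange": 2.5448}]}'),
  ('Sample B', '{"data": [{"pval": 0.0824, "Protein": "A0A0B4J2D5", "FoldChange": -0.1676},
                          {"pval": 0.0219, "Protein": "A0FGR8", "FoldChange": 2.3448}]}');

-- Unnest the "data" array, then filter on the protein name and a numeric range.
-- ->> returns text, so the value is cast to numeric before comparing.
SELECT t.sample,
       e.elem->>'Protein'              AS protein,
       (e.elem->>'pval')::numeric       AS pval,
       (e.elem->>'FoldChange')::numeric AS fold_change
FROM gexp_demo t
CROSS JOIN jsonb_array_elements(t.gexp->'data') AS e(elem)
WHERE e.elem->>'Protein' = 'A0FGR8'
  AND (e.elem->>'FoldChange')::numeric > 2.0;
```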

Search for string in jsonb values - PostgreSQL

For simplicity, a row of table looks like this:
key: "z06khw1bwi886r18k1m7d66bi67yqlns",
reference_keys: {
"KEY": "1x6t4y",
"CODE": "IT137-521e9204-ABC-TESTE",
"NAME": "A"
},
I have a jsonb object like this one {"KEY": "1x6t4y", "CODE": "IT137-521e9204-ABC-TESTE", "NAME": "A"} and I want to search for a query string in the values of any key. If my query is something like '521e9204', I want it to return the rows where reference_keys has '521e9204' in any value. Basically, the keys don't matter for this scenario.
Note: The column reference_keys and so the jsonb object, are always a 1 dimensional array.
I have tried a query like this:
SELECT * FROM table
LEFT JOIN jsonb_each_text(table.reference_keys) AS j(k, value) ON true
WHERE j.value LIKE '%521e9204%'
The problem is that it duplicates rows, for every key in the json and it messes up the returned items.
I have also thought of doing something like this:
SELECT DISTINCT jsonb_object_keys(reference_keys) from table;
and then use a query like:
SELECT * FROM table
WHERE reference_keys->>'CODE' like '%521e9204%'
It seems like this would work but I really don't want to rely on this solution.
You can rewrite your JOIN to an EXISTS condition to avoid the duplicates:
SELECT t.*
FROM the_table t
WHERE EXISTS (select *
from jsonb_each_text(t.reference_keys) AS j(k, value)
WHERE j.value LIKE '%521e9204%');
If you are using Postgres 12 or later, you can also use a JSON path query:
where jsonb_path_exists(reference_keys, 'strict $.** ? (@ like_regex "521e9204")')
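For completeness, a minimal sketch of the JSON path variant on Postgres 12 or later. The table name refs_demo, the column row_key and the second row are made up for illustration.

```sql
-- Hypothetical demo table; only the first row contains the search string.
CREATE TEMP TABLE refs_demo (row_key text PRIMARY KEY, reference_keys jsonb);

INSERT INTO refs_demo VALUES
  ('row1', '{"KEY": "1x6t4y", "CODE": "IT137-521e9204-ABC-TESTE", "NAME": "A"}'),
  ('row2', '{"KEY": "9z9z9z", "CODE": "IT999-ffffffff-XYZ", "NAME": "B"}');

-- $.** visits every level of the document; the filter keeps string values
-- that match the regular expression. One row per table row, no duplicates.
SELECT t.row_key
FROM refs_demo t
WHERE jsonb_path_exists(t.reference_keys,
                        'strict $.** ? (@ like_regex "521e9204")');

-- Case-insensitive variant using the "i" flag of like_regex.
SELECT t.row_key
FROM refs_demo t
WHERE jsonb_path_exists(t.reference_keys,
                        'strict $.** ? (@ like_regex "521E9204" flag "i")');
```

Note that like_regex takes a regular expression, not a LIKE pattern, so characters such as . or + in the search string would need escaping.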

Indexing a jsonb column in postgresql

I have a column in postgresql table with type jsonb.
{
.....
"type": "car",
"vehicleIds": [
"980e3761-935a-4e52-be77-9f9461dec4d1","980e3761-935a-4e52-be77-9f9461dec4d2"
]
.....
}
Application runs queries against these fields to fetch records. I need to index this column only for these fields.
How can this be done?
This is query structure with properties as the column name:
SELECT *
FROM Vehicle f
WHERE f.properties::text @@ CONCAT('$.vehicleIds[*] >', :vehicleId )= true
AND f.properties::text @@ CONCAT('$.type >', :type ) = true
The query you are using is highly confusing, as it boils down to a text search query, because the @@ is applied to a text value.
I also don't understand the '$.type > ... condition. With values like car I would expect an equality operator, rather than "greater than". Using > together with a UUID also doesn't seem to make sense.
If you want to search for values of type car that contain a list of IDs, using the "contains" operator @> is a better way to do that:
SELECT *
FROM Vehicle f
WHERE f.properties @> '{"type": "car", "vehicleIds": ["980e3761-935a-4e52-be77-9f9461dec4d1"]}'
The above could make use of a GIN index on the properties column:
create index on vehicle using gin (properties);
If the type key is always queried with equality (which I assume), a combined index might be more efficient:
create index on vehicle using gin ( (properties ->> 'type'), (properties -> 'vehicleIds') );
You need to install the btree_gin extension in order to create that index.
That index would be a bit smaller but needs a different query:
SELECT *
FROM Vehicle f
WHERE f.properties ->> 'type' = 'car'
AND f.properties -> 'vehicleIds' @> '["980e3761-935a-4e52-be77-9f9461dec4d1"]'
You will need to validate whether the indexes are used, and which one is more efficient, by looking at the execution plan.
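Checking the execution plan could look roughly like the sketch below. The table vehicle_demo, the index name and the rows are assumptions for illustration. Turning off enable_seqscan is only a demonstration aid for a tiny table, not something to keep in production.

```sql
-- Hypothetical demo table; the name vehicle_demo and the rows are made up.
CREATE TEMP TABLE vehicle_demo (id serial PRIMARY KEY, properties jsonb);

INSERT INTO vehicle_demo (properties) VALUES
  ('{"type": "car",   "vehicleIds": ["980e3761-935a-4e52-be77-9f9461dec4d1"]}'),
  ('{"type": "truck", "vehicleIds": ["980e3761-935a-4e52-be77-9f9461dec4d2"]}');

CREATE INDEX vehicle_demo_properties_gin ON vehicle_demo USING gin (properties);
ANALYZE vehicle_demo;

-- With only a few rows the planner usually prefers a sequential scan,
-- so discourage it for this demonstration only.
SET enable_seqscan = off;

-- Show the plan for the containment query without executing it.
EXPLAIN
SELECT *
FROM vehicle_demo f
WHERE f.properties @> '{"type": "car", "vehicleIds": ["980e3761-935a-4e52-be77-9f9461dec4d1"]}';

RESET enable_seqscan;
```

A scan node that names vehicle_demo_properties_gin indicates that the index was chosen. On real data, EXPLAIN (ANALYZE, BUFFERS) gives actual timings for comparing the two index variants.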

querying JSONB with array fields

If I have a jsonb column called value with fields such as:
{"id": "5e367554-bf4e-4057-8089-a3a43c9470c0",
"tags": ["principal", "reversal", "interest"],,, etc}
how would I find all the records containing given tags, e.g:
if given: ["reversal", "interest"]
it should find all records with either "reversal" or "interest" or both.
My experimentation got me to this abomination so far:
select value from account_balance_updated
where value @> '{}' :: jsonb and value->>'tags' LIKE '%"principal"%';
Of course, this is completely wrong and inefficient.
Assuming you are using PG 9.4+, you can use the jsonb_array_elements() function:
SELECT DISTINCT abu.*
FROM account_balance_updated abu,
jsonb_array_elements(abu.value->'tags') t
WHERE t.value <@ '["reversal", "interest"]'::jsonb;
As it turned out you can use cool jsonb operators described here:
https://www.postgresql.org/docs/9.5/static/functions-json.html
so original query doesn't have to change much:
select value from account_balance_updated
where value @> '{}' :: jsonb and value->'tags' ?| array['reversal', 'interest'];
In my case I also needed to escape the ? (writing ??| instead of ?|), because I am using a JDBC "prepared statement", where you pass the query string and parameters separately and question marks act as placeholders for the parameters:
https://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html
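Here is a small self-contained sketch of the ?| approach, as it would be typed in psql (no escaping of ? needed there). The table abu_demo, the index and the rows are hypothetical.

```sql
-- Hypothetical demo table; "value" mirrors the column name used in the question.
CREATE TEMP TABLE abu_demo (id serial PRIMARY KEY, value jsonb);

INSERT INTO abu_demo (value) VALUES
  ('{"id": "a", "tags": ["principal", "reversal", "interest"]}'),
  ('{"id": "b", "tags": ["principal"]}'),
  ('{"id": "c", "tags": ["interest"]}');

-- Optional: an expression GIN index on the tags array can support ?|.
CREATE INDEX abu_demo_tags_gin ON abu_demo USING gin ((value->'tags'));

-- ?| is true when at least one of the listed strings is an element of the array.
SELECT value->>'id' AS id
FROM abu_demo
WHERE value->'tags' ?| array['reversal', 'interest'];
```

With these rows the query returns the documents with id a and c. Unlike the jsonb_array_elements() join, it yields at most one row per record, so no DISTINCT is needed.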

How to use postgresql any with jsonb data

Related
see this question
Question
I have a postgresql table that has a column of type jsonb. the json data looks like this
{
"personal":{
"gender":"male",
"contact":{
"home":{
"email":"ceo@home.me",
"phone_number":"5551234"
},
"work":{
"email":"ceo@work.id",
"phone_number":"5551111"
}
},
..
"nationality":"Martian",
..
},
"employment":{
"title":"Chief Executive Officer",
"benefits":[
"Insurance A",
"Company Car"
],
..
}
}
This query works perfectly well
select employees->'personal'->'contact'->'work'->>'email'
from employees
where employees->'personal'->>'nationality' in ('Martian','Terran')
I would like to fetch all employees who have benefits of type Insurance A OR Insurance B, this ugly query works:
select employees->'personal'->'contact'->'work'->>'email'
from employees
where employees->'employment'->'benefits' ? 'Insurance A'
OR employees->'employment'->'benefits' ? 'Insurance B';
I would like to use any instead like so:
select * from employees
where employees->'employment'->>'benefits' =
any('{Insurance A, Insurance B}'::text[]);
but this returns 0 results.. ideas?
What I've also tried
I tried the following syntaxes (all failed):
.. = any({'Insurance A','Insurance B'}::text[]);
.. = any('Insurance A'::text,'Insurance B'::text}::array);
.. = any({'Insurance A'::text,'Insurance B'::text}::array);
.. = any(['Insurance A'::text,'Insurance B'::text]::array);
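employees->'employment'->'benefits' is a JSON array, so you should unnest it to use its elements in an any comparison. The ->> operator returns the whole array as a single text value, which never equals an individual benefit.
Use the function jsonb_array_elements_text() in a lateral join: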
select *
from
employees,
jsonb_array_elements_text(employees->'employment'->'benefits') benefits(benefit)
where
benefit = any('{Insurance A, Insurance B}'::text[]);
The syntax
from
employees,
jsonb_array_elements_text(employees->'employment'->'benefits')
is equivalent to
from
employees,
lateral jsonb_array_elements_text(employees->'employment'->'benefits')
The word lateral may be omitted. From the documentation:
LATERAL can also precede a function-call FROM item, but in this case
it is a noise word, because the function expression can refer to
earlier FROM items in any case.
See also: What is the difference between LATERAL and a subquery in PostgreSQL?
The syntax
from jsonb_array_elements_text(employees->'employment'->'benefits') benefits(benefit)
is a form of aliasing, per the documentation
Another form of table aliasing gives temporary names to the columns of
the table, as well as the table itself:
FROM table_reference [AS] alias ( column1 [, column2 [, ...]] )
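One thing to keep in mind with the lateral join is that an employee who has both benefits appears once per matching element. A possible variant, sketched here with a hypothetical table staff_demo and made-up rows, moves the unnesting into an EXISTS subquery so that each employee is returned at most once.

```sql
-- Hypothetical demo table; row 2 has both benefits we search for.
CREATE TEMP TABLE staff_demo (id serial PRIMARY KEY, employees jsonb);

INSERT INTO staff_demo (employees) VALUES
  ('{"employment": {"benefits": ["Insurance A", "Company Car"]}}'),
  ('{"employment": {"benefits": ["Insurance A", "Insurance B"]}}'),
  ('{"employment": {"benefits": ["Company Car"]}}');

-- The subquery unnests the array per employee; EXISTS stops at the first match,
-- so row 2 is returned once rather than twice.
SELECT e.id
FROM staff_demo e
WHERE EXISTS (
  SELECT 1
  FROM jsonb_array_elements_text(e.employees->'employment'->'benefits') AS b(benefit)
  WHERE b.benefit = ANY (ARRAY['Insurance A', 'Insurance B'])
);
```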
You can use the ?| operator to check whether the array contains any of the values you want.
select * from employees
where employees->'employment'->'benefits' ?| array['Insurance A', 'Insurance B']
If you happen to have a case where you want all of the values to be in the array, then there's the ?& operator to check for that.
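The difference between the two operators can be seen in a short sketch. The table benefits_demo and its rows are assumptions made for illustration.

```sql
-- Hypothetical demo table with three employees.
CREATE TEMP TABLE benefits_demo (id serial PRIMARY KEY, employees jsonb);

INSERT INTO benefits_demo (employees) VALUES
  ('{"employment": {"benefits": ["Insurance A", "Company Car"]}}'),
  ('{"employment": {"benefits": ["Insurance A", "Insurance B"]}}'),
  ('{"employment": {"benefits": ["Company Car"]}}');

-- ?| : at least one of the listed strings is an element of the array.
SELECT id
FROM benefits_demo
WHERE employees->'employment'->'benefits' ?| array['Insurance A', 'Insurance B'];

-- ?& : every listed string is an element of the array.
SELECT id
FROM benefits_demo
WHERE employees->'employment'->'benefits' ?& array['Insurance A', 'Insurance B'];
```

With these rows the first query matches employees 1 and 2, while the second matches only employee 2.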