Indexing a jsonb column in postgresql - postgresql

I have a column in postgresql table with type jsonb.
{
.....
"type": "car",
"vehicleIds": [
"980e3761-935a-4e52-be77-9f9461dec4d1","980e3761-935a-4e52-be77-9f9461dec4d2"
]
.....
}
Application runs queries against these fields to fetch records. I need to index this column only for these fields.
How can this be done?
This is query structure with properties as the column name:
SELECT *
FROM Vehicle f
WHERE f.properties::text @@ CONCAT('$.vehicleIds[*] >', :vehicleId )= true
AND f.properties::text @@ CONCAT('$.type >', :type ) = true

The query you are using is highly confusing, as it boils down to a text search query: the @@ operator is applied to a text value.
I also don't understand the '$.type > ...' condition. With values like car I would expect an equality operator rather than "greater than". Using > together with a UUID also doesn't seem to make sense.
If you want to search for rows of type car that contain a given list of IDs, the "contains" operator @> is a better way to do that:
SELECT *
FROM Vehicle f
WHERE f.properties @> '{"type": "car", "vehicleIds": ["980e3761-935a-4e52-be77-9f9461dec4d1"]}'
The above could make use of a GIN index on the properties column:
create index on vehicles using gin (properties);
If the type key is always queried with equality (which I assume), a combined index might be more efficient:
create index on vehicles using gin ( (properties ->> 'type'), (properties -> 'vehicleIds') );
You need to install the btree_gin extension in order to create that index.
That index would be a bit smaller but needs a different query:
SELECT *
FROM Vehicle f
WHERE f.properties ->> 'type' = 'car'
AND f.properties -> 'vehicleIds' @> '["980e3761-935a-4e52-be77-9f9461dec4d1"]'
You will need to validate whether the indexes are used, and which one is more efficient, by looking at the execution plan.
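A minimal sketch of that validation step, assuming the table is named vehicle and the jsonb column is named properties (the answer's index statements use vehicles, so adjust the name to your schema). It only shows how to request the plans; which index wins depends on your data.

```sql
-- Assumption: table vehicle, jsonb column properties.
-- btree_gin is only needed for the combined index on (type, vehicleIds).
CREATE EXTENSION IF NOT EXISTS btree_gin;

-- Plan for the containment query (candidate: the GIN index on properties).
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM vehicle f
WHERE f.properties @> '{"type": "car", "vehicleIds": ["980e3761-935a-4e52-be77-9f9461dec4d1"]}';

-- Plan for the split query (candidate: the combined GIN index).
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM vehicle f
WHERE f.properties ->> 'type' = 'car'
  AND f.properties -> 'vehicleIds' @> '["980e3761-935a-4e52-be77-9f9461dec4d1"]';
```

A Bitmap Index Scan node naming one of the indexes means that index was used; a Seq Scan means the planner chose to read the whole table instead.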

Related

In Postgres, how can I efficiently filter using the inner numbers of this jsonb structure?

So I work with Postgres SQL, and I have a jsonb column with the following structure:
{
"Store1":[
{
"price":5.99,
"seller":"seller"
},
{
"price":56.43,
"seller":"seller"
}
],
"Store2":[
{
"price":45.65,
"seller":"seller"
},
{
"price":44.66,
"seller":"seller"
}
]
}
I have a jsonb like this for every product in the database. I want to run an SQL query that will answer the following question:
For each product, is at least one of the prices in this JSON bigger than, equal to, or smaller than X?
Basically, filter the products to include only those that have at least one price that satisfies a mathematical condition.
How can I do it efficiently? What's the best way in Postgres to iterate a JSON like this, with a relatively complex inner structure?
Also, if I could control the way the data is structured (to an extent, I can), what changes can I make to make this query more efficient?
Thanks!
Use a json path expression:
WHERE col @@ '$.*[*].price < 20'
or
WHERE col @? '$.*[*] ? (@.price < 20)'
If you need to compare to another column or make the query parameterised, you can either build the jsonpath dynamically
WHERE col @@ format('$.*[*].price < %s', $1)::jsonpath
WHERE col @? format('$.*[*] ? (@.price < %s)', $1)::jsonpath
or you can use the respective function and pass variables as an object:
WHERE jsonb_path_match(col, '$.*[*].price < $limit', jsonb_build_object('limit', $1))
WHERE jsonb_path_exists(col, '$.*[*] ? (@.price < $limit)', jsonb_build_object('limit', $1))
I admit I had to check my cheat sheet to figure out the right combination of operator and expression. Takeaways:
if a comparison operator needs to work with multiple values, it generally functions as an ANY
@@ does not work with ? (@ …) filter expressions since they don't return a boolean,
@? does not work with predicates since they always return a value (even if it's false)
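The two operator forms can be compared side by side on a literal value. This is a small sketch rather than part of the original answer; the sample document and the column aliases are made up for illustration.

```sql
-- One-row sample in the same shape as the question's data (illustrative values).
WITH sample(col) AS (
  VALUES ('{"Store1": [{"price": 5.99}, {"price": 56.43}],
            "Store2": [{"price": 45.65}]}'::jsonb)
)
SELECT
  -- predicate form: the comparison acts like ANY over all selected prices
  col @@ '$.*[*].price < 20'                             AS match_predicate,
  -- filter form: asks whether any offer survives the filter
  col @? '$.*[*] ? (@.price < 20)'                       AS exists_filter,
  -- shows which offers the filter actually selects
  jsonb_path_query_array(col, '$.*[*] ? (@.price < 20)') AS matching_offers
FROM sample;
```

For this sample both boolean columns should be true, since one offer is priced below 20.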
What changes can I do to make this query more efficient?
As @jjanes commented on my other answer, the jsonpath match col @@ '$.*[*].price < $limit' isn't going to be fast and needs a full table scan, at least for < and >. To make a useful index, a different approach is required. Such an index can only hold a single value per row to compare against, not an arbitrary number of them. For that, we need to change the condition from EXISTS(SELECT prices_of(col) WHERE price < $limit) to (SELECT MIN(prices_of(col))) < $limit.
With this idea it is possible to build an expression index on the result of a custom immutable function:
CREATE FUNCTION min_price(data jsonb) RETURNS float
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT
RETURN (
SELECT min((offer ->> 'price')::float)
FROM jsonb_each(data) AS entries(name, store),
LATERAL jsonb_array_elements(store) AS elements(offer)
);
CREATE INDEX example_min_data_price_idx ON example (min_price(data));
which you can use as
SELECT * FROM example WHERE min_price(data) < 20;
Looking for rows with a price larger than a certain number requires a separate index on max_price(data). If you want to use the index in a JOIN with more conditions, consider making it a multi-column index.
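A sketch of that max_price counterpart, written by mirroring the min_price function above. The function and index names are hypothetical, and the RETURN (...) body syntax assumes PostgreSQL 14 or later.

```sql
-- Hypothetical mirror of min_price: the largest price across all stores.
CREATE FUNCTION max_price(data jsonb) RETURNS float
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT
RETURN (
  SELECT max((offer ->> 'price')::float)
  FROM jsonb_each(data) AS entries(name, store),
       LATERAL jsonb_array_elements(store) AS elements(offer)
);

-- Expression index for "at least one price above X" lookups.
CREATE INDEX example_max_data_price_idx ON example (max_price(data));

-- At least one price is above 20 exactly when the maximum is above 20.
SELECT * FROM example WHERE max_price(data) > 20;
```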
Looking for rows with a price equal to a certain number can be optimised by indexing the jsonb column and using a jsonpath:
CREATE INDEX example_data_idx ON example USING GIN (data jsonb_ops);
SELECT * FROM example WHERE data @@ '$.*[*].price == 20';
SELECT * FROM example WHERE data @? '$.*[*] ? (@.price == 20)';
Unfortunately you can't use jsonb_path_ops here since that doesn't support the wildcard.

Index created for PostgreSQL jsonb column not utilized

I have created an index for a field in jsonb column as:
create index on Employee using gin ((properties -> 'hobbies'))
Query generated is:
CREATE INDEX employee_expr_idx ON public.employee USING gin (((properties -> 'hobbies'::text)))
My search query has structure as:
SELECT * FROM Employee e
WHERE e.properties @> '{"hobbies": ["trekking"]}'
AND e.department = 'Finance'
Running EXPLAIN command for this query gives:
Seq Scan on employee e (cost=0.00..4452.94 rows=6 width=1183)
Filter: ((properties @> '{"hobbies": ["trekking"]}'::jsonb) AND (department = 'Finance'::text))
Going by this, I am not sure if index is getting used for search.
Is this entire setup ok?
The expression you use in the WHERE clause must match the expression in the index exactly. Your index uses the expression ((properties -> 'hobbies'::text)), but your query only uses e.properties on the left-hand side.
To make use of that index, your WHERE clause needs to use the same expression as was used in the index:
SELECT *
FROM Employee e
WHERE (properties -> 'hobbies') @> '["trekking"]'
AND e.department = 'Finance'
However: your execution plan shows that the table employee is really tiny (rows=6). With a table as small as that, a Seq Scan is always going to be the fastest way to retrieve data, no matter what kind of indexes you define.
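If the query cannot be changed, another option is to index the whole column, so that the original WHERE clause matches the indexed expression. This is a sketch under that assumption; the index name is hypothetical, and the planner may still prefer a sequential scan on a small table.

```sql
-- Hypothetical alternative: a GIN index on the entire jsonb column.
-- The default jsonb_ops operator class supports the @> operator.
CREATE INDEX employee_properties_idx ON employee USING gin (properties);

-- The original query shape now matches the indexed expression.
EXPLAIN
SELECT *
FROM employee e
WHERE e.properties @> '{"hobbies": ["trekking"]}'
  AND e.department = 'Finance';
```

An index on the whole column is typically larger than one on (properties -> 'hobbies'), so it trades index size for not having to rewrite queries.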

Select rows in postgres where JSONB field in provided list

Say I have table in Postgres with column data of type JSONB. This column contains pretty complex object for example:
{
...,
gender: ['men', 'women'],
...
}
I have a query like gender=men&gender=women&gender=something_else and want to find all rows in the table where ANY of gender's members is IN ('men', 'women', 'something_else'). For example:
SELECT uuid, data ->> 'gender' FROM "OX_Articles" WHERE data ->> 'gender' INTERSECTS WITH ('men', 'women', 'something_else');
Of course, there is no INTERSECTS WITH keyword.
Either in (...) or = any(array[...]) should work.
They should have similar performance. I favor = any because it handles an empty right-hand side (whereas IN can't handle empty literal tuples), and I'd expect whatever postgres bindings I have to convert the host language's arrays/lists/arraylists to pg arrays, not pg tuples.
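Since gender is stored as a JSON array, one way to express the "intersects" test directly is the ?| operator, which checks whether any of the given strings exists as a top-level array element. This is a sketch that assumes the table and column names from the question; it is not part of the answer above.

```sql
-- True when the gender array holds at least one of the listed values.
-- Note -> (returns jsonb) rather than ->> (returns text), so ?| sees an array.
SELECT uuid, data -> 'gender'
FROM "OX_Articles"
WHERE data -> 'gender' ?| array['men', 'women', 'something_else'];
```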

How to use postgresql any with jsonb data

Related
see this question
Question
I have a postgresql table that has a column of type jsonb. the json data looks like this
{
"personal":{
"gender":"male",
"contact":{
"home":{
"email":"ceo@home.me",
"phone_number":"5551234"
},
"work":{
"email":"ceo@work.id",
"phone_number":"5551111"
}
},
..
"nationality":"Martian",
..
},
"employment":{
"title":"Chief Executive Officer",
"benefits":[
"Insurance A",
"Company Car"
],
..
}
}
This query works perfectly well
select employees->'personal'->'contact'->'work'->>'email'
from employees
where employees->'personal'->>'nationality' in ('Martian','Terran')
I would like to fetch all employees who have benefits of type Insurance A OR Insurance B, this ugly query works:
select employees->'personal'->'contact'->'work'->>'email'
from employees
where employees->'employment'->'benefits' ? 'Insurance A'
OR employees->'employment'->'benefits' ? 'Insurance B';
I would like to use any instead like so:
select * from employees
where employees->'employment'->>'benefits' =
any('{Insurance A, Insurance B}'::text[]);
but this returns 0 results.. ideas?
What I've also tried
I tried the following syntaxes (all failed):
.. = any({'Insurance A','Insurance B'}::text[]);
.. = any('Insurance A'::text,'Insurance B'::text}::array);
.. = any({'Insurance A'::text,'Insurance B'::text}::array);
.. = any(['Insurance A'::text,'Insurance B'::text]::array);
employees->'employment'->'benefits' is a json array, so you should unnest it to use its elements in any comparison.
Use the function jsonb_array_elements_text() in lateral join:
select *
from
employees,
jsonb_array_elements_text(employees->'employment'->'benefits') benefits(benefit)
where
benefit = any('{Insurance A, Insurance B}'::text[]);
The syntax
from
employees,
jsonb_array_elements_text(employees->'employment'->'benefits')
is equivalent to
from
employees,
lateral jsonb_array_elements_text(employees->'employment'->'benefits')
The word lateral may be omitted. From the documentation:
LATERAL can also precede a function-call FROM item, but in this case
it is a noise word, because the function expression can refer to
earlier FROM items in any case.
See also: What is the difference between LATERAL and a subquery in PostgreSQL?
The syntax
from jsonb_array_elements_text(employees->'employment'->'benefits') benefits(benefit)
is a form of aliasing, per the documentation
Another form of table aliasing gives temporary names to the columns of
the table, as well as the table itself:
FROM table_reference [AS] alias ( column1 [, column2 [, ...]] )
You can use the ?| operator to check whether the array contains any of the values you want.
select * from employees
where employees->'employment'->'benefits' ?| array['Insurance A', 'Insurance B']
If you happen to have a case where you want all of the values to be in the array, then there's the ?& operator to check for that.
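For completeness, a sketch of the ?& variant using the table from the question. The pair of benefits is only an example taken from the sample document.

```sql
-- Matches only employees whose benefits array contains BOTH values.
select * from employees
where employees->'employment'->'benefits' ?& array['Insurance A', 'Company Car'];
```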

Postgres jsonb query missing index?

We have the following json documents stored in our PG table (identities) in a jsonb column 'data':
{
"email": {
"main": "mainemail@email.com",
"prefix": "aliasPrefix",
"prettyEmails": ["stuff1", "stuff2"]
},
...
}
I have the following index set up on the table:
CREATE INDEX ix_identities_email_main
ON identities
USING gin
((data -> 'email->main'::text) jsonb_path_ops);
What am I missing that is preventing the following query from hitting that index?? It does a full seq scan on the table... We have tens of millions of rows, so this query is hanging for 15+ minutes...
SELECT * FROM identities WHERE data->'email'->>'main'='mainemail@email.com';
If you use the JSONB data type for your data column, then in order to index ALL "email" entry values you need to create the following index:
CREATE INDEX ident_data_email_gin_idx ON identities USING gin ((data -> 'email'));
Also keep in mind that for JSONB you need to use one of the supported operators:
The default GIN operator class for jsonb supports queries with the @>,
?, ?& and ?| operators
The following queries will hit this index:
SELECT * FROM identities
WHERE data->'email' @> '{"main": "mainemail@email.com"}'
-- OR
SELECT * FROM identities
WHERE data->'email' @> '{"prefix": "aliasPrefix"}'
If you need to search against the array elements "stuff1" or "stuff2", the index above will not work; you need to explicitly add an expression index on the "prettyEmails" array element values in order to make the query faster.
CREATE INDEX ident_data_prettyemails_gin_idx ON identities USING gin ((data -> 'email' -> 'prettyEmails'));
This query will hit the index, because it uses the same expression as the index definition:
SELECT * FROM identities
WHERE data->'email'->'prettyEmails' @> '["stuff1"]'
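If the original equality query should stay unchanged, a plain B-tree expression index on the extracted text value is another option. This is a sketch, not part of the answer above; the index name is hypothetical.

```sql
-- Hypothetical B-tree expression index on the extracted email string.
CREATE INDEX ix_identities_email_main_btree
    ON identities ((data -> 'email' ->> 'main'));

-- The WHERE clause uses exactly the indexed expression with plain equality.
SELECT *
FROM identities
WHERE data -> 'email' ->> 'main' = 'mainemail@email.com';
```

A B-tree index covers only this one key, but for single-value equality lookups it is usually smaller than a GIN index over the whole email object.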