Postgres jsonb query missing index? - postgresql

We have the following json documents stored in our PG table (identities) in a jsonb column 'data':
{
"email": {
"main": "mainemail#email.com",
"prefix": "aliasPrefix",
"prettyEmails": ["stuff1", "stuff2"]
},
...
}
I have the following index set up on the table:
CREATE INDEX ix_identities_email_main
ON identities
USING gin
((data -> 'email->main'::text) jsonb_path_ops);
What am I missing that is preventing the following query from hitting that index?? It does a full seq scan on the table... We have tens of millions of rows, so this query is hanging for 15+ minutes...
SELECT * FROM identities WHERE data->'email'->>'main'='mainemail#email.com';

If you use the JSONB data type for your data column, then in order to index ALL "email" entry values you need to create the following index:
CREATE INDEX ident_data_email_gin_idx ON identities USING gin ((data -> 'email'));
Also keep in mind that for JSONB you need to use the appropriate operators:
The default GIN operator class for jsonb supports queries with the @>,
?, ?& and ?| operators
The following queries will hit this index:
SELECT * FROM identities
WHERE data->'email' @> '{"main": "mainemail#email.com"}'
-- OR
SELECT * FROM identities
WHERE data->'email' @> '{"prefix": "aliasPrefix"}'
If you need to search against the array elements "stuff1" or "stuff2" specifically, you can add a dedicated expression index on the "prettyEmails" array, which is smaller than an index on the whole "email" object:
CREATE INDEX ident_data_prettyemails_gin_idx ON identities USING gin ((data -> 'email' -> 'prettyEmails'));
This query will hit that index, because its left-hand side is the same expression as the one in the index:
SELECT * FROM identities
WHERE data->'email'->'prettyEmails' @> '["stuff1"]'
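The query in the question compares an extracted text value with =, which none of the GIN operator classes support. One alternative is a plain B-tree expression index on that value. This is only a minimal sketch: the index name is made up, and it assumes "main" is only ever searched by equality.

```sql
-- Hypothetical index name; B-tree expression index on the extracted text value.
CREATE INDEX ix_identities_email_main_btree
    ON identities ((data -> 'email' ->> 'main'));

-- The WHERE clause uses the same expression as the index,
-- so the planner is able to consider an index scan.
SELECT *
FROM identities
WHERE data -> 'email' ->> 'main' = 'mainemail#email.com';
```

With this approach the original query can stay as it is; only the index changes.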

Related

How to create index for postgresql jsonb field (array data) and text field

Please let me know how to create index for below query.
SELECT * FROM customers
WHERE identifiers @>
'[{"systemName": "SAP", "systemReference": "33557"}]'
AND country_code = 'IN';
identifiers is jsonb type and data is as below.
[{"systemName": "ERP", "systemReference": "TEST"}, {"systemName": "FEED", "systemReference": "2733"}, {"systemName": "SAP", "systemReference": "33557"}]
country_code is varchar type.
Either create a GIN index on identifiers ..
CREATE INDEX customers_identifiers_idx ON customers
USING GIN(identifiers);
.. or a composite index with identifiers and country_code (the gin_trgm_ops operator class requires the pg_trgm extension).
CREATE INDEX customers_country_code_identifiers_idx ON customers
USING GIN(identifiers,country_code gin_trgm_ops);
Whether the second option pays off will depend on the value distribution of country_code.
You can create a GIN index on jsonb columns in PostgreSQL. GIN has built-in operator classes that handle the jsonb operators. Learn more about GIN indexes here: https://www.postgresql.org/docs/12/gin-intro.html
For varchar columns, a btree index is good enough.
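Putting the two suggestions together, one possible setup is a GIN index for the jsonb column plus a separate btree index for country_code. The index names here are made up, and jsonb_path_ops is my own choice rather than something the answers asked for; it supports @> but not the key-existence operators ?, ?& and ?|.

```sql
-- Hypothetical index names.
-- GIN index for containment (@>) queries on the jsonb array.
CREATE INDEX customers_identifiers_path_idx
    ON customers USING gin (identifiers jsonb_path_ops);

-- Plain btree index for equality on the varchar column.
CREATE INDEX customers_country_code_idx
    ON customers (country_code);

-- Check which index (or combination of both) the planner picks.
EXPLAIN
SELECT *
FROM customers
WHERE identifiers @> '[{"systemName": "SAP", "systemReference": "33557"}]'
  AND country_code = 'IN';
```

The planner may use one index, combine both in a bitmap scan, or ignore them, depending on the table size and how selective each condition is.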

How to Index and make WHERE clause case insensitive?

Have this table in PostgreSQL 12, no index
CREATE TABLE tbl
(
...
foods json NOT NULL
)
sample record:
foods:
{
"fruits": [" 2 orange ", "1 apple in chocolate", " one pint of berry"],
"meat": ["some beef", "ground beef", "chicken",...],
"veg": ["cucumber"]
}
Need to select all records who satisfy:
fruits contains orange.
AND meat contains beef or chicken.
select * from tbl where foods->> 'fruits' LIKE '%ORANGE%' and (foods->> 'meat' LIKE '%beef%' or foods->> 'meat' LIKE '%chicken%')
Is it an optimized query? (I'm from RDBMS world)
How to index for faster response and not overkill, also how to make PostgreSQL case insensitive?
This will make you unhappy.
You would need two trigram GIN indexes to speed this up:
CREATE EXTENSION pg_trgm;
CREATE INDEX ON tbl USING gin ((foods ->> 'fruits') gin_trgm_ops);
CREATE INDEX ON tbl USING gin ((foods ->> 'meat') gin_trgm_ops);
These indexes can become large and will impact data modification performance.
Then you need to rewrite your query to use ILIKE.
Finally, the query might be slower than you want, because it will use three index scans and a (potentially expensive) bitmap heap scan.
But with a data structure like that and substring matches, you cannot do better.
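For reference, the query from the question rewritten with ILIKE could look like the sketch below. Like the original, it matches against the text of the whole array rather than against individual elements.

```sql
-- ILIKE is case insensitive and is supported by gin_trgm_ops indexes.
SELECT *
FROM tbl
WHERE foods ->> 'fruits' ILIKE '%orange%'
  AND (foods ->> 'meat' ILIKE '%beef%'
       OR foods ->> 'meat' ILIKE '%chicken%');
```

The expressions foods ->> 'fruits' and foods ->> 'meat' are written the same way as in the two index definitions, which is what allows the planner to match them.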

Indexing a jsonb column in postgresql

I have a column in postgresql table with type jsonb.
{
.....
"type": "car",
"vehicleIds": [
"980e3761-935a-4e52-be77-9f9461dec4d1","980e3761-935a-4e52-be77-9f9461dec4d2"
]
.....
}
Application runs queries against these fields to fetch records. I need to index this column only for these fields.
How can this be done?
This is query structure with properties as the column name:
SELECT *
FROM Vehicle f
WHERE f.properties::text @@ CONCAT('$.vehicleIds[*] >', :vehicleId )= true
AND f.properties::text @@ CONCAT('$.type >', :type ) = true
The query you are using is highly confusing, as it boils down to a text search query: the @@ is applied to a text value.
I also don't understand the '$.type > ... condition. With values like car I would expect an equality operator, rather than "greater than". Using > together with a UUID also doesn't seem to make sense.
If you want to search for rows of type car that contain a list of IDs, using the "contains" operator @> is a better way to do that:
SELECT *
FROM Vehicle f
WHERE f.properties @> '{"type": "car", "vehicleIds": ["980e3761-935a-4e52-be77-9f9461dec4d1"]}'
The above could make use of a GIN index on the properties column:
create index on vehicle using gin (properties);
If the type key is always queried with equality (which I assume), a combined index might be more efficient:
create index on vehicle using gin ( (properties ->> 'type'), (properties -> 'vehicleIds') );
You need to install the btree_gin extension in order to create that index.
That index would be a bit smaller but needs a different query:
SELECT *
FROM Vehicle f
WHERE f.properties ->> 'type' = 'car'
AND f.properties -> 'vehicleIds' @> '["980e3761-935a-4e52-be77-9f9461dec4d1"]'
You will need to validate whether the indexes are used, and which one is more efficient, by looking at the execution plan.
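One way to look at the execution plan is EXPLAIN (ANALYZE, BUFFERS), sketched here for the containment query. Keep in mind that ANALYZE actually runs the statement, so the timings come from a real execution.

```sql
-- Shows the chosen plan together with actual run times and buffer usage.
-- A "Bitmap Index Scan" node in the output means the GIN index was used;
-- a "Seq Scan" node means it was not.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM Vehicle f
WHERE f.properties @> '{"type": "car", "vehicleIds": ["980e3761-935a-4e52-be77-9f9461dec4d1"]}';
```

Running the same command for the second query (the one with ->> 'type' and -> 'vehicleIds') lets you compare the two index variants on your own data.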

Index created for PostgreSQL jsonb column not utilized

I have created an index for a field in jsonb column as:
create index on Employee using gin ((properties -> 'hobbies'))
Query generated is:
CREATE INDEX employee_expr_idx ON public.employee USING gin (((properties -> 'hobbies'::text)))
My search query has structure as:
SELECT * FROM Employee e
WHERE e.properties @> '{"hobbies": ["trekking"]}'
AND e.department = 'Finance'
Running EXPLAIN command for this query gives:
Seq Scan on employee e (cost=0.00..4452.94 rows=6 width=1183)
Filter: ((properties @> '{"hobbies": ["trekking"]}'::jsonb) AND (department = 'Finance'::text))
Going by this, I am not sure if index is getting used for search.
Is this entire setup ok?
The expression you use in the WHERE clause must match the expression in the index exactly. Your index uses the expression ((properties -> 'hobbies'::text)), but your query only uses e.properties on the left-hand side.
To make use of that index, your WHERE clause needs to use the same expression as was used in the index:
SELECT *
FROM Employee e
WHERE (properties -> 'hobbies') @> '["trekking"]'
AND e.department = 'Finance'
However: if the table employee is really tiny, a Seq Scan is usually going to be the fastest way to retrieve data, no matter what kind of indexes you define. Note that rows=6 in the plan is the estimated number of rows the scan returns, not the size of the table.
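To check whether the index can be used at all, independent of the planner's cost estimate, you can temporarily discourage sequential scans for a single transaction. This is a diagnostic sketch only, not something to leave enabled in production.

```sql
BEGIN;
-- Only affects the current transaction; makes sequential scans look very expensive.
SET LOCAL enable_seqscan = off;

EXPLAIN
SELECT *
FROM Employee e
WHERE (e.properties -> 'hobbies') @> '["trekking"]'
  AND e.department = 'Finance';

ROLLBACK;
```

If the plan still shows a Seq Scan with this setting, the WHERE clause most likely does not match the indexed expression.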

How to use jsonb index in postgres

My pg is 9.5+.
I have a jsonb data in column 'body':
{
"id":"58cf96481ebf47eba351db3b",
"JobName":"test",
"JobDomain":"SAW",
"JobStatus":"TRIGGERED",
"JobActivity":"ACTIVE"
}
And I created indexes on one key and on the whole body:
CREATE INDEX scheduledjob_request_id_idx ON "ScheduledJob" USING gin ((body -> 'JobName'));
CREATE INDEX test_index ON "ScheduledJob" USING gin (body jsonb_path_ops)
These are my queries:
SELECT body FROM "ScheduledJob" WHERE body @> '{"JobName": "analytics_import_transaction_job"}';
SELECT body FROM "ScheduledJob" WHERE (body#>'{JobName}' = '"analytics_import_transaction_job"') LIMIT 10;
Both return the correct data, but neither one uses an index.
I saw the explain:
-> Seq Scan on public."ScheduledJob" (cost=0.00..4.55 rows=1 width=532)
So I don't know why the index isn't used, or how to use an index for jsonb correctly.
Update:
If I create the index before inserting data, the query can use the index.
But if I create the index after inserting the first data, the query
scans all records.
This is strange. How can I make the index useful when I insert the data first?
So I did some research and testing:
SELECT body FROM "ScheduledJob" WHERE (body#>'{JobName}' = '"analytics_import_transaction_job"') LIMIT 10;
This kind of query will never use the GIN index.
And the index is only used reliably once the table has enough data.
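A sketch of two ways this could be handled, with the caveat that on a table this small the planner may still prefer a sequential scan. The second index name is made up, and the B-tree expression index is my own suggestion rather than something from the post.

```sql
-- 1) Use containment, which the GIN index test_index (jsonb_path_ops) supports.
SELECT body
FROM "ScheduledJob"
WHERE body @> '{"JobName": "analytics_import_transaction_job"}'
LIMIT 10;

-- 2) Keep the #> ... = form and add a B-tree expression index for it.
--    Hypothetical index name; the expression must match the query exactly.
CREATE INDEX scheduledjob_jobname_btree_idx
    ON "ScheduledJob" ((body #> '{JobName}'));

-- Refresh planner statistics after loading data or creating indexes.
ANALYZE "ScheduledJob";

SELECT body
FROM "ScheduledJob"
WHERE (body #> '{JobName}' = '"analytics_import_transaction_job"')
LIMIT 10;
```

The GIN operator classes for jsonb have no support for =, which is why the second query form cannot use either of the original indexes.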