Index created for PostgreSQL jsonb column not utilized

I have created an index for a field in jsonb column as:
create index on Employee using gin ((properties -> 'hobbies'))
Query generated is:
CREATE INDEX employee_expr_idx ON public.employee USING gin (((properties -> 'hobbies'::text)))
My search query has structure as:
SELECT * FROM Employee e
WHERE e.properties @> '{"hobbies": ["trekking"]}'
AND e.department = 'Finance'
Running EXPLAIN command for this query gives:
Seq Scan on employee e (cost=0.00..4452.94 rows=6 width=1183)
Filter: ((properties @> '{"hobbies": ["trekking"]}'::jsonb) AND (department = 'Finance'::text))
Going by this, I am not sure whether the index is being used for the search.
Is this entire setup ok?

The expression you use in the WHERE clause must match the expression in the index exactly. Your index uses the expression ((properties -> 'hobbies'::text)), but your query only uses e.properties on the left-hand side.
To make use of that index, your WHERE clause needs to use the same expression as was used in the index:
SELECT *
FROM Employee e
WHERE (properties -> 'hobbies') @> '["trekking"]'
AND e.department = 'Finance'
However: your execution plan shows that the table employee is really tiny (rows=6). With a table as small as that, a Seq Scan is always going to be the fastest way to retrieve data, no matter what kind of indexes you define.
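If you want to double-check that the rewritten predicate can use the index even on a tiny table, one session-level trick (just a sketch; enable_seqscan = off only discourages sequential scans, it does not forbid them) is:
SET enable_seqscan = off;
EXPLAIN
SELECT * FROM Employee e
WHERE (properties -> 'hobbies') @> '["trekking"]'
AND e.department = 'Finance';
RESET enable_seqscan;
If the plan then shows a Bitmap Index Scan on employee_expr_idx, the index and query expressions match.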

Related

How to use an index when using jsonb_array_elements in Postgres

I have the following table structure:
create table public.listings (id varchar(255) not null, data jsonb not null);
And the following indexes:
create index listings_data_index on public.listings using gin(data jsonb_ops);
create unique index listings_id_index on public.listings(id);
alter table public.listings add constraint listings_id_pk primary key(id);
With this row:
id | data
1 | {"attributes": {"ccid": "123", "listings": [{"vin": "1234","body": "Sleeper", "make": "International"}, { "vin": "5678", "body": "Sleeper", "make": "International" }]}}
The use case needs to retrieve a specific item inside the listings array that matches a specific vin.
I am accomplishing that with the following query:
SELECT elems
FROM public.listings, jsonb_array_elements(data->'attributes'->'listings') elems
WHERE id = '1' AND elems->'vin' ? '1234';
The output is what I need:
{"vin": "1234","body": "Sleeper", "make": "International"}
Now I am in the phase of optimizing this query, since there will be millions of rows and up to 100K items inside the listings array.
When I run EXPLAIN over that query, it shows this:
Nested Loop (cost=0.01..2.53 rows=1 width=32)
-> Seq Scan on listings (cost=0.00..1.01 rows=1 width=32)
Filter: ((id)::text = '1'::text)
-> Function Scan on jsonb_array_elements elems (cost=0.01..1.51 rows=1 width=32)
Filter: ((value -> 'vin'::text) ? '1234'::text)
I wonder what would be the right way to construct an index for that, or if I need to modify the query to another that is more efficient.
Thank you!
First: with a table as small as that, you will never see PostgreSQL use an index; you need to test with realistic amounts of data. Second: while PostgreSQL will happily use an index for the condition on id, it can never use an index for a JSON search like this one (unnesting the array and filtering the elements), no matter how you write it.
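To illustrate the first point, here is a hedged sketch of bulk-loading test data before re-reading the plan (the generated JSON shape is invented, not your real data):
-- Fill the table so the planner has a reason to use the index on id;
-- start at 2 because id = '1' already exists.
INSERT INTO public.listings (id, data)
SELECT g::text,
       jsonb_build_object('attributes',
           jsonb_build_object('ccid', g::text, 'listings', '[]'::jsonb))
FROM generate_series(2, 100000) AS g;
ANALYZE public.listings;  -- refresh planner statistics
EXPLAIN
SELECT elems
FROM public.listings, jsonb_array_elements(data->'attributes'->'listings') elems
WHERE id = '1' AND elems->'vin' ? '1234';
With enough rows, the id condition should switch to an index scan, while the vin filter still runs per unnested element.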

Can Postgres use multiple indexes in a single query?

Assume that I have a query like below:
select
sum(impressions) as imp, sum(taps) as taps
from report
where org_id = 1 and report_date between '2019-01-01' and '2019-10-10'
group by country, text;
In MySQL, there is no support for using multiple indexes in a single query. Can I use multiple indexes for a single query in PostgreSQL?
Like:
For where condition: index(org_id, report_date);
For group by: index(country, text);
Explain:
"GroupAggregate (cost=8.18..8.21 rows=1 width=604)"
" Group Key: country, text"
" -> Sort (cost=8.18..8.18 rows=1 width=556)"
" Sort Key: country, text"
" -> Index Scan using idx_org_date on report (cost=0.14..8.17 rows=1 width=556)"
" Index Cond: ((org_id = 1) AND (date >= '2019-01-01'::date) AND (date <= '2019-02-02'::date))"
Yes and no. It can in general, but it can't use one index to get selectivity, and another to obtain the ordering needed for an efficient GROUP BY, on the same relation.
For example, if you had separate indexes on "org_id" and "report_date", it would be able to combine them using a BitmapAnd. But that would be less efficient than having your current two-column index, so this fact probably isn't of use to you.
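For illustration, a sketch of that BitmapAnd case with hypothetical single-column indexes (the index names are made up):
CREATE INDEX report_org_id_idx ON report (org_id);
CREATE INDEX report_date_idx ON report (report_date);
EXPLAIN
SELECT sum(impressions) AS imp, sum(taps) AS taps
FROM report
WHERE org_id = 1 AND report_date BETWEEN '2019-01-01' AND '2019-10-10'
GROUP BY country, text;
-- With enough matching rows, the plan can contain:
--   BitmapAnd
--     -> Bitmap Index Scan on report_org_id_idx
--     -> Bitmap Index Scan on report_date_idx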
You might be better off with a HashAgg. You could try increasing work_mem in order to get one. But if there truly is only one row, it won't really matter.
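A minimal sketch of that experiment (the value is arbitrary; size it to your memory budget):
SET work_mem = '64MB';  -- session-only setting
EXPLAIN
SELECT sum(impressions) AS imp, sum(taps) AS taps
FROM report
WHERE org_id = 1 AND report_date BETWEEN '2019-01-01' AND '2019-10-10'
GROUP BY country, text;
RESET work_mem;
Look for a HashAggregate node at the top of the plan instead of Sort + GroupAggregate.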

How to use jsonb index in postgres

My PostgreSQL version is 9.5+.
I have jsonb data in a column 'body':
{
"id":"58cf96481ebf47eba351db3b",
"JobName":"test",
"JobDomain":"SAW",
"JobStatus":"TRIGGERED",
"JobActivity":"ACTIVE"
}
And I created indexes for the body and for one key:
CREATE INDEX scheduledjob_request_id_idx ON "ScheduledJob" USING gin ((body -> 'JobName'));
CREATE INDEX test_index ON "ScheduledJob" USING gin (body jsonb_path_ops)
These are my queries:
SELECT body FROM "ScheduledJob" WHERE body @> '{"JobName": "analytics_import_transaction_job"}';
SELECT body FROM "ScheduledJob" WHERE (body#>'{JobName}' = '"analytics_import_transaction_job"') LIMIT 10;
Both return the correct data, but neither uses an index.
The EXPLAIN output shows:
-> Seq Scan on public."ScheduledJob" (cost=0.00..4.55 rows=1 width=532)
So I don't know why the index is not used, or how to use an index on jsonb correctly.
Update:
If I create the index before inserting data, the query can use the index.
But if I create the index after inserting the first data, the query
scans all records.
This is so strange; how can I make the index useful when the data is inserted first?
So, I did some research and testing:
SELECT body FROM "ScheduledJob" WHERE (body#>'{JobName}' = '"analytics_import_transaction_job"') LIMIT 10;
This kind of query will never use the index.
And only once the table holds enough data will the planner choose the index.
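For reference, the index-friendly form is the containment query from the question; with the jsonb_path_ops index in place and enough rows, its plan should show a Bitmap Index Scan (a sketch, not a guaranteed plan):
EXPLAIN
SELECT body FROM "ScheduledJob"
WHERE body @> '{"JobName": "analytics_import_transaction_job"}'
LIMIT 10;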

Optimizing Postgres JSONB query with not null constraint

I've got a Postgres 9.4.4 database with 1.7 million records with the following information stored in a JSONB column called data in a table called accounts:
data: {
"lastUpdatedTime": "2016-12-26T12:09:43.901Z",
"UID": "2c5bb7fd-1a00-4988-8d92-ffaa52ebc20d",
"data": {
"country": "UK",
"verified_at": "2017-01-01T23:49:10.217Z"
}
}
The data format cannot be changed since this is legacy information.
I need to obtain all accounts where the country is UK, the verified_at value is not null and the lastUpdatedTime value is greater than some given value.
So far, I have the following query:
SELECT * FROM "accounts"
WHERE (data @> '{ "data": { "country": "UK" } }')
AND (data->'data' ? 'verified_at')
AND ((data->'data' ->> 'verified_at') is not null)
AND (data ->>'lastUpdatedTime' > '2016-02-28T05:49:08.511846')
ORDER BY data ->>'lastUpdatedTime' LIMIT 100 OFFSET 0;
And the following indexes:
"accounts_idxgin" gin (data)
"accounts_idxgin_on_data" gin ((data -> 'data'::text))
I've managed to get the query time down to about 1000 to 4000 ms.
Here is the EXPLAIN ANALYZE output for the query:
Bitmap Heap Scan on accounts (cost=41.31..6934.50 rows=9 width=1719)
(actual time=7.273..1067.657 rows=23190 loops=1)
Recheck Cond: ((data -> 'data'::text) ? 'verified_at'::text)
Filter: ((((data -> 'data'::text) ->> 'verified_at'::text) IS NOT NULL)
AND ((data ->> 'lastUpdatedTime'::text) > '2016-02-01 05:49:08.511846'::text)
AND (((data -> 'data'::text) ->> 'country'::text) = 'UK'::text))
Rows Removed by Filter: 4
Heap Blocks: exact=16039
-> Bitmap Index Scan on accounts_idxgin_on_data (cost=0.00..41.30 rows=1773 width=0)
(actual time=4.618..4.618 rows=23194 loops=1)
Index Cond: ((data -> 'data'::text) ? 'verified_at'::text)
Planning time: 0.448 ms
Execution time: 1069.344 ms
(9 rows)
I have the following questions:
Is there anything I can do to further speed up this query?
What is the correct way to speed up a field is not null query with JSONB? I ended up using the existence operator with (data->'data' ? 'verified_at') to filter out a large number of non-matching records, because much of my data doesn't have verified_at as a top level key. This increased the speed of the query, but I'm wondering if there's a general approach to optimizing this type of query.
In order to use the existence operator with (data->'data' ? 'verified_at'), I needed to add another index on ((data -> 'data'::text)). I already had an index on gin (data), but the existence operator didn't use this. Why is that? I thought the existence and containment operators would use this index.
3: Not really. This case is explicitly mentioned in the docs.
When you have an index on the column data, it is only used when you query the column itself, as in data @> '...' or data ? '...'. When you have an index on the expression (data -> 'data'), these queries can take advantage of it: (data -> 'data') @> '...' or (data -> 'data') ? '...'.
2: Usual jsonb indexes won't help with a (jsonb_col -> '<key>') is [not] null query at all. And unfortunately, you cannot use jsonb_col @> '{"<key>":null}' either, because a JSON object might lack the key entirely. Also, reverse use of the index (for is not null) is not possible at all. But there may be a trick...
1: Not much. There may be some improvements, but don't expect huge performance advantages. So here they go:
You can use the jsonb_path_ops operator class instead of the (default) jsonb_ops. This should mean a small performance improvement, but it cannot support the existence operator (?). We won't need it anyway, though.
You have a single, index-unfriendly, boolean-typed expression, which slows you down. Thankfully, you can use a partial index here if you are only interested in the rows where it is true.
So, your index should look something like this:
create index accounts_idxgin_on_data
on accounts using gin ((data -> 'data') jsonb_path_ops)
where (data -> 'data' ->> 'verified_at') is not null;
With this index, you can use the following query:
select *
from accounts
where (data -> 'data') @> '{"country":"UK"}'
and (data -> 'data' ->> 'verified_at') is not null
and (data ->> 'lastUpdatedTime') > '2016-02-28T05:49:08.511Z'
order by data ->>'lastUpdatedTime';
Note: for proper timestamp comparisons, you should use (data ->> 'lastUpdatedTime')::timestamptz > '2016-02-28T05:49:08.511Z'.
http://rextester.com/QWUW41874
After playing around a bit more, I've managed to reduce my query time from around 1000ms to 350ms by creating the following partial index:
CREATE INDEX index_accounts_partial_on_verified_at
ON accounts ((data->'data'->'verified_at'))
WHERE (data->'data'->>'verified_at') IS NOT NULL
AND (data->'data' ? 'verified_at')
AND (data->'data'->>'country' = 'UK');
I was able to hardcode some of the values in this index, such as country = 'UK', because I only need to consider UK accounts for this query. I was also able to remove the index on ((data->'data')), which was 258 MB, and replace it with the partial index, which is only 1360 kB!
For anyone interested, I found the details for building a partial JSONB index from here
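If you want to reproduce that size comparison, pg_relation_size works on indexes as well (a quick sketch):
SELECT pg_size_pretty(pg_relation_size('index_accounts_partial_on_verified_at'));
SELECT pg_size_pretty(pg_relation_size('accounts_idxgin_on_data'));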
Use the path access operator for faster access to lower-level objects:
SELECT * FROM "accounts"
WHERE data #>> '{data, country}' = 'UK'
AND data #>> '{data, verified_at}' IS NOT NULL
AND data ->> 'lastUpdatedTime' > '2016-02-28T05:49:08.511846'
ORDER BY data ->> 'lastUpdatedTime' LIMIT 100 OFFSET 0;
An index on the column data only supports queries on the column itself, such as data @> '...'. For a query like data -> 'data' ? 'verified_at' you need an index on the expression (data -> 'data').
Two more points:
I don't think it is necessary to test for the presence of verified_at. If it is not there it simply comes out as NULL so it gets caught by the same test.
Comparing string representations of timestamp values may work if the JSON value is properly and consistently formatted. Cast to timestamp to be on the safe side.
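A sketch of that cast, applied to the query above (this assumes the stored values are consistently formatted ISO 8601 strings):
SELECT * FROM "accounts"
WHERE data #>> '{data, country}' = 'UK'
AND data #>> '{data, verified_at}' IS NOT NULL
AND (data ->> 'lastUpdatedTime')::timestamptz > '2016-02-28T05:49:08.511846'::timestamptz
ORDER BY (data ->> 'lastUpdatedTime')::timestamptz
LIMIT 100 OFFSET 0;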

Postgres jsonb query missing index?

We have the following json documents stored in our PG table (identities) in a jsonb column 'data':
{
"email": {
"main": "mainemail#email.com",
"prefix": "aliasPrefix",
"prettyEmails": ["stuff1", "stuff2"]
},
...
}
I have the following index set up on the table:
CREATE INDEX ix_identities_email_main
ON identities
USING gin
((data -> 'email->main'::text) jsonb_path_ops);
What am I missing that is preventing the following query from hitting that index? It does a full seq scan on the table... We have tens of millions of rows, so this query hangs for 15+ minutes...
SELECT * FROM identities WHERE data->'email'->>'main'='mainemail@email.com';
If you use the JSONB data type for your data column, then in order to index ALL "email" entry values you need to create the following index:
CREATE INDEX ident_data_email_gin_idx ON identities USING gin ((data -> 'email'));
Also keep in mind that for JSONB you need to use the appropriate list of operators:
The default GIN operator class for jsonb supports queries with the @>,
?, ?& and ?| operators
The following queries will hit this index:
SELECT * FROM identities
WHERE data->'email' @> '{"main": "mainemail@email.com"}'
-- OR
SELECT * FROM identities
WHERE data->'email' @> '{"prefix": "aliasPrefix"}'
If you need to search against the array elements "stuff1" or "stuff2", the index above will not work; you need to explicitly add an expression index on the "prettyEmails" array element values in order to make the query faster.
CREATE INDEX ident_data_prettyemails_gin_idx ON identities USING gin ((data -> 'email' -> 'prettyEmails'));
This query will hit the index:
SELECT * FROM identities
WHERE data->'email' @> '{"prettyEmails":["stuff1"]}'
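For what it's worth, the same expression index (it uses the default jsonb_ops operator class) also supports the existence operator on array elements, so this variant can hit it too:
SELECT * FROM identities
WHERE data -> 'email' -> 'prettyEmails' ? 'stuff1';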