How to index and make a WHERE clause case insensitive? - postgresql

I have this table in PostgreSQL 12, with no index:
CREATE TABLE tbl
(
...
foods json NOT NULL
)
sample record:
foods:
{
"fruits": [" 2 orange ", "1 apple in chocolate", " one pint of berry"],
"meat": ["some beef", "ground beef", "chicken",...],
"veg": ["cucumber"]
}
I need to select all records that satisfy:
fruits contains orange,
AND meat contains beef or chicken.
select *
from tbl
where foods ->> 'fruits' LIKE '%ORANGE%'
  and (foods ->> 'meat' LIKE '%beef%' or foods ->> 'meat' LIKE '%chicken%');
Is this an optimized query? (I come from the RDBMS world.)
How should I index for a faster response without overkill, and how do I make the comparison case insensitive in PostgreSQL?

This will make you unhappy.
You would need two trigram GIN indexes to speed this up:
CREATE EXTENSION pg_trgm;
CREATE INDEX ON tbl USING gin ((foods ->> 'fruits') gin_trgm_ops);
CREATE INDEX ON tbl USING gin ((foods ->> 'meat') gin_trgm_ops);
These indexes can become large and will impact data modification performance.
Then you need to rewrite your query to use ILIKE.
Finally, the query might be slower than you want, because it will use three index scans and a (potentially expensive) bitmap heap scan.
But with a data structure like that and substring matches, you cannot do better.
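For illustration, the rewritten query might look like this (a sketch: ILIKE makes the match case insensitive, and the trigram GIN indexes above can support ILIKE as well as LIKE):

```sql
-- Case-insensitive version of the original query; the trigram GIN
-- indexes on (foods ->> 'fruits') and (foods ->> 'meat') support ILIKE.
SELECT *
FROM tbl
WHERE foods ->> 'fruits' ILIKE '%orange%'
  AND (foods ->> 'meat' ILIKE '%beef%'
       OR foods ->> 'meat' ILIKE '%chicken%');
```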

Related

Indexing a jsonb column in postgresql

I have a column in a PostgreSQL table with type jsonb.
{
.....
"type": "car",
"vehicleIds": [
"980e3761-935a-4e52-be77-9f9461dec4d1","980e3761-935a-4e52-be77-9f9461dec4d2"
]
.....
}
Application runs queries against these fields to fetch records. I need to index this column only for these fields.
How can this be done?
This is query structure with properties as the column name:
SELECT *
FROM Vehicle f
WHERE f.properties::text @@ CONCAT('$.vehicleIds[*] >', :vehicleId) = true
AND f.properties::text @@ CONCAT('$.type >', :type) = true
The query you are using is highly confusing, as it boils down to a text search query: the @@ operator is applied to a text value.
I also don't understand the '$.type > ...' condition. With values like car I would expect an equality operator rather than "greater than". Using > together with a UUID also doesn't seem to make sense.
If you want to search for rows whose type is car and whose list of IDs contains a given value, the "contains" operator @> is a better way to do that:
SELECT *
FROM Vehicle f
WHERE f.properties @> '{"type": "car", "vehicleIds": ["980e3761-935a-4e52-be77-9f9461dec4d1"]}'
The above could make use of a GIN index on the properties column:
create index on vehicles using gin (properties);
If the type key is always queried with equality (which I assume), a combined index might be more efficient:
create index on vehicles using gin ( (properties ->> 'type'), (properties -> 'vehicleIds') );
You need to install the btree_gin extension (CREATE EXTENSION btree_gin;) in order to create that index.
That index would be a bit smaller but needs a different query:
SELECT *
FROM Vehicle f
WHERE f.properties ->> 'type' = 'car'
AND f.properties -> 'vehicleIds' @> '["980e3761-935a-4e52-be77-9f9461dec4d1"]'
You will need to validate whether the indexes are used, and which one is more efficient, by looking at the execution plan.
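To check which index the planner actually picks, something like the following can be used (a sketch; the table and the UUID value are taken from the question above):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM Vehicle f
WHERE f.properties ->> 'type' = 'car'
  AND f.properties -> 'vehicleIds' @> '["980e3761-935a-4e52-be77-9f9461dec4d1"]';
```

Look for a Bitmap Index Scan node naming the index; a plain Seq Scan means the index was not used (or the table is too small for it to be worthwhile).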

Index created for PostgreSQL jsonb column not utilized

I have created an index for a field in jsonb column as:
create index on Employee using gin ((properties -> 'hobbies'))
Query generated is:
CREATE INDEX employee_expr_idx ON public.employee USING gin (((properties -> 'hobbies'::text)))
My search query has structure as:
SELECT * FROM Employee e
WHERE e.properties @> '{"hobbies": ["trekking"]}'
AND e.department = 'Finance'
Running EXPLAIN command for this query gives:
Seq Scan on employee e (cost=0.00..4452.94 rows=6 width=1183)
Filter: ((properties @> '{"hobbies": ["trekking"]}'::jsonb) AND (department = 'Finance'::text))
Going by this, I am not sure if index is getting used for search.
Is this entire setup ok?
The expression you use in the WHERE clause must match the expression in the index exactly. Your index uses the expression ((properties -> 'hobbies'::text)), but your query only uses e.properties on the left-hand side.
To make use of that index, your WHERE clause needs to use the same expression as was used in the index:
SELECT *
FROM Employee e
WHERE (properties -> 'hobbies') @> '["trekking"]'
AND e.department = 'Finance'
However: your execution plan shows that the table employee is really tiny (rows=6). With a table as small as that, a Seq Scan is always going to be the fastest way to retrieve data, no matter what kind of indexes you define.
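If you want to confirm that the index could be used at all on such a small table, you can temporarily discourage sequential scans in the current session (a debugging trick only, not something for production):

```sql
SET enable_seqscan = off;  -- planner now avoids seq scans if an index is usable

EXPLAIN
SELECT *
FROM Employee e
WHERE (properties -> 'hobbies') @> '["trekking"]'
  AND e.department = 'Finance';

RESET enable_seqscan;
```

If the plan still shows a Seq Scan with enable_seqscan off, the index genuinely cannot serve the query.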

How to efficiently index multiple nested numbers in JSONB structure with PostgreSQL for efficient comparison operations? (optional: SQLAlchemy)

I want to make sure PostgreSQL indexing works properly (B-tree, to allow efficient greater-than / less-than operations on numbers) on multiple nested numbers inside a JSONB column. The JSONB "data" column would look as follows:
data: {
a: {n: 1000, str: 'blabla'},
b: {n: 2000, str: 'blabla'},
c: {n: 3000, str: 'blabla'},
d: {n: 4000, str: 'blabla'},
...[we can assume 10 such nested dicts]
}
Where I would select rows based on combinations of multiple nested numbers, ex:
WHERE data['a']['n'] == 1000
AND data['b']['n'] == 2000
AND data['c']['n'] >= 3000
AND data['d']['n'] <= 4000
and adding multiple ORDER BYs such as:
ORDER BY DESC(data['a']['n']) + DESC(data['b']['n']) etc.
to achieve ordering based on the a, b, c, d hierarchy and nested numbers 'n' in ascending or descending order.
I've put some code below, but I can't tell if the indexing is working as expected and I'm wondering if this is the right way or if there's a better way to achieve this? (ideally using JSONB)
I'm using PostgreSQL 11 (with SQLAlchemy ORM), so the table and index declaration look as per below:
class TableWithJSONB(db.Base):
    __tablename__ = 'tablewithjsonb'
    id = Column(Integer, primary_key=True)
    data = Column(NestedMutable.as_mutable(JSONB), nullable=False)
    __table_args__ = (  # adding indexes
        # GIN using jsonb_path_ops => are these indexes useful?
        Index(
            'ix_data_a_gin',
            text("(data->'a') jsonb_path_ops"),
            postgresql_using='gin',
        ),
        Index(
            'ix_data_b_gin',
            text("(data->'b') jsonb_path_ops"),
            postgresql_using='gin',
        ),
        Index(
            'ix_data_c_gin',
            text("(data->'c') jsonb_path_ops"),
            postgresql_using='gin',
        ),
        ...
        # B-tree indexes on nested numbers
        Index(
            'ix_data_a_bTree',
            text("((data #> '{a, n}')::INTEGER) int4_ops"),
        ),
        Index(
            'ix_data_b_bTree',
            text("((data #> '{b, n}')::INTEGER) int4_ops"),
        ),
        Index(
            'ix_data_c_bTree',
            text("((data #> '{c, n}')::INTEGER) int4_ops"),
        ),
        ...
    )
After reading what I could find on the subject, I'm not sure whether the B-tree index actually works as expected for each nested numeric value inside the JSONB. I also can't tell whether the GIN jsonb_path_ops indexes make any sense on the nested dicts a, b, c, d for the usage described above. Is this the right way, or is there a better way?
UPDATE: I seem to have answered my own question. See dbfiddle here
Indexing nested numeric value in JSONB (with b-Tree index):
CREATE INDEX i_btree_a ON tablewithjsonb (((data #> '{a, n}')::INTEGER) int4_ops);
Successfully creates index on the numeric value data['a']['n'] in JSONB.
The index is used with queries such as:
explain analyze select * from tablewithjsonb
where (data #> '{a, n}')::INTEGER <= 10000;
Creating a combined index on multiple numeric values within the same JSONB works as well. (In this particular case the index above, i_btree_a, would be redundant: a search on data['a']['n'] would use the index i_btree_a_b below instead.)
CREATE INDEX i_btree_a_b ON tablewithjsonb
(((data #> '{a, n}')::INTEGER) int4_ops,
((data #> '{b, n}')::INTEGER) int4_ops);
...which would be used in queries such as:
explain analyze select * from tablewithjsonb
where (data #> '{a, n}')::INTEGER <= 10000
  and (data #> '{b, n}')::INTEGER <= 10000;
Indexing nested string/text value in JSONB (with b-Tree index):
CREATE INDEX i_btree_s_a ON tablewithjsonb ((data #>> '{a, s}'));
The B-tree index will be used for equality (=) (Execution Time: 0.048 ms):
explain analyze select * from tablewithjsonb
where (data #>> '{a, s}') = 'blabla';
A LIKE pattern with a leading wildcard, however, can never use a plain B-tree index (a B-tree supports LIKE only for left-anchored patterns, and even then only under the C collation or with a text_pattern_ops opclass). That is why the following go for a sequential scan (Execution Time: 53.712 ms for the second):
explain analyze select * from tablewithjsonb
where (data #>> '{a, s}') LIKE '%blabla%';
explain analyze select * from tablewithjsonb
where (data #>> '{a, s}') LIKE '%blabla 1 5%';
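If those leading-wildcard LIKE searches need to be fast, a trigram index on the same expression is the usual approach (a sketch, assuming the pg_trgm extension is available; this mirrors the trigram technique from the first answer above, and the index name is made up):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index on the extracted text value; can serve LIKE/ILIKE
-- patterns with wildcards on both sides.
CREATE INDEX i_trgm_s_a ON tablewithjsonb
    USING gin ((data #>> '{a, s}') gin_trgm_ops);
```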
Indexing nested string/text value in JSONB (with GIN full text search index):
CREATE INDEX i_gin_ts_s_a ON tablewithjsonb
USING GIN (( to_tsvector('english', (data #>> '{a, s}')) ));
The GIN full text search index will be used for queries as such:
explain analyze select * from tablewithjsonb where
to_tsvector('english', (data #>> '{a, s}')) @@ to_tsquery('blabla & 1 & 5:*');
(Execution Time: 34.845 ms)
I note that this last query (via GIN full-text search) is still fairly slow (why?), not far from the sequential scan mentioned above, where the execution time was 53.712 ms.

How to use jsonb index in postgres

My PostgreSQL version is 9.5+.
I have jsonb data in a column 'body':
{
"id":"58cf96481ebf47eba351db3b",
"JobName":"test",
"JobDomain":"SAW",
"JobStatus":"TRIGGERED",
"JobActivity":"ACTIVE"
}
And I created indexes on the body and on a key:
CREATE INDEX scheduledjob_request_id_idx ON "ScheduledJob" USING gin ((body -> 'JobName'));
CREATE INDEX test_index ON "ScheduledJob" USING gin (body jsonb_path_ops)
These are my queries:
SELECT body FROM "ScheduledJob" WHERE body @> '{"JobName": "analytics_import_transaction_job"}';
SELECT body FROM "ScheduledJob" WHERE (body#>'{JobName}' = '"analytics_import_transaction_job"') LIMIT 10;
Both return correct data, but neither uses an index.
The EXPLAIN output shows:
-> Seq Scan on public."ScheduledJob" (cost=0.00..4.55 rows=1 width=532)
So, I don't know why didn't use the index, and how to use the index for jsonb correctly.
Update:
If I create the index before inserting data, the query can use the index. But if I create the index after inserting the first data, the query scans all records.
This is strange; how can I make the index useful when I insert data first?
So, I did some research and testing:
SELECT body FROM "ScheduledJob" WHERE (body#>'{JobName}' = '"analytics_import_transaction_job"') LIMIT 10;
This kind of query will never use the index. The index only becomes useful once the table contains enough data.
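As a sketch of why table size matters: on a near-empty table the planner estimates a sequential scan as cheaper than an index scan, so one way to see the index kick in is to load a realistic amount of data and refresh the statistics (the jsonb_build_object payload below is made up for illustration):

```sql
-- Populate with enough rows that an index scan beats a seq scan
INSERT INTO "ScheduledJob" (body)
SELECT jsonb_build_object('JobName', 'job_' || i)
FROM generate_series(1, 100000) AS i;

ANALYZE "ScheduledJob";  -- refresh planner statistics

EXPLAIN SELECT body FROM "ScheduledJob"
WHERE body @> '{"JobName": "job_42"}';
-- with fresh statistics this containment query can use the GIN index
```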

Postgres jsonb query missing index?

We have the following json documents stored in our PG table (identities) in a jsonb column 'data':
{
"email": {
"main": "mainemail@email.com",
"prefix": "aliasPrefix",
"prettyEmails": ["stuff1", "stuff2"]
},
...
}
I have the following index set up on the table:
CREATE INDEX ix_identities_email_main
ON identities
USING gin
((data -> 'email->main'::text) jsonb_path_ops);
What am I missing that is preventing the following query from hitting that index? It does a full seq scan on the table... We have tens of millions of rows, so this query hangs for 15+ minutes...
SELECT * FROM identities WHERE data->'email'->>'main'='mainemail@email.com';
If you use the JSONB data type for your data column, then in order to index ALL "email" entry values you need to create the following index:
CREATE INDEX ident_data_email_gin_idx ON identities USING gin ((data -> 'email'));
Also keep in mind that for JSONB you need to use the appropriate list of operators:
The default GIN operator class for jsonb supports queries with the @>,
?, ?& and ?| operators
Following queries will hit this index:
SELECT * FROM identities
WHERE data->'email' @> '{"main": "mainemail@email.com"}'
-- OR
SELECT * FROM identities
WHERE data->'email' @> '{"prefix": "aliasPrefix"}'
If you need to search against the array elements "stuff1" or "stuff2", the index above will not work; you need to explicitly add an expression index on the "prettyEmails" array element values in order to make the query faster.
CREATE INDEX ident_data_prettyemails_gin_idx ON identities USING gin ((data -> 'email' -> 'prettyEmails'));
This query will hit the index:
SELECT * FROM identities
WHERE data->'email' @> '{"prettyEmails":["stuff1"]}'