How to create index for postgresql jsonb field (array data) and text field - postgresql

Please let me know how to create an index for the query below.
SELECT * FROM customers
WHERE identifiers @>
'[{"systemName": "SAP", "systemReference": "33557"}]'
AND country_code = 'IN';
identifiers is of type jsonb and the data looks like this:
[{"systemName": "ERP", "systemReference": "TEST"}, {"systemName": "FEED", "systemReference": "2733"}, {"systemName": "SAP", "systemReference": "33557"}]
country_code is of type varchar.

Either create a GIN index on identifiers ..
CREATE INDEX customers_identifiers_idx ON customers
USING GIN(identifiers);
.. or a composite index on identifiers and country_code.
CREATE INDEX customers_country_code_identifiers_idx ON customers
USING GIN (identifiers, country_code gin_trgm_ops);
Whether the second option pays off depends on the value distribution of country_code.
Demo: db<>fiddle
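For context, gin_trgm_ops in the composite index comes from the pg_trgm extension, and it is the jsonb containment operator (@>) that lets a GIN index on identifiers be used at all. A minimal sketch, assuming the table from the question:
-- needed only for the composite variant with gin_trgm_ops
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX customers_identifiers_idx ON customers
USING GIN (identifiers);
-- containment query that can use the GIN index
SELECT *
FROM customers
WHERE identifiers @> '[{"systemName": "SAP", "systemReference": "33557"}]'
  AND country_code = 'IN';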

You can create a GIN index on jsonb columns in PostgreSQL. GIN has built-in operator classes that handle the jsonb operators. Learn more about GIN indexes here: https://www.postgresql.org/docs/12/gin-intro.html
For varchar columns, a plain B-tree index is good enough.
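A sketch of that pairing, assuming the customers table from the question (the index names are just examples):
-- GIN for the jsonb containment searches
CREATE INDEX customers_identifiers_gin_idx ON customers
USING GIN (identifiers);
-- plain B-tree for the equality filter on country_code
CREATE INDEX customers_country_code_btree_idx ON customers (country_code);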

Related

index with where condition - Oracle

I would like to have an equivalent Oracle statement for the SQL Server query below.
SQL QUERY:
CREATE UNIQUE NONCLUSTERED INDEX ValidSub_Category ON ValidSub (Category ASC) WHERE (category IS NOT NULL)
Purpose: this index should ensure that the column can contain more than one NULL record but no duplicate non-NULL strings.
Thanks in advance
I found it:
CREATE UNIQUE INDEX VALIDSUB_CATEGORY ON VALIDSUB
(CASE WHEN Category IS NOT NULL THEN Category END);
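For what it's worth, rows where the indexed expression evaluates to NULL are simply not stored in the index, which is what allows multiple NULLs while still rejecting duplicate strings. A quick sketch of the behaviour with made-up values:
INSERT INTO VALIDSUB (Category) VALUES (NULL);  -- accepted
INSERT INTO VALIDSUB (Category) VALUES (NULL);  -- accepted, NULL keys are not indexed
INSERT INTO VALIDSUB (Category) VALUES ('A');   -- accepted
INSERT INTO VALIDSUB (Category) VALUES ('A');   -- rejected with a unique constraint violation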

How to Index and make WHERE clause case insensitive?

I have this table in PostgreSQL 12, with no index:
CREATE TABLE tbl
(
...
foods json NOT NULL
)
sample record:
foods:
{
"fruits": [" 2 orange ", "1 apple in chocolate", " one pint of berry"],
"meat": ["some beef", "ground beef", "chicken",...],
"veg": ["cucumber"]
}
I need to select all records that satisfy:
fruits contains orange
AND meat contains beef or chicken.
select * from tbl where foods->> 'fruits' LIKE '%ORANGE%' and (foods->> 'meat' LIKE '%beef%' or foods->> 'meat' LIKE '%chicken%')
Is this an optimized query? (I'm from the RDBMS world.)
How do I index this for a faster response without overkill, and how do I make the matching case insensitive in PostgreSQL?
This will make you unhappy.
You would need two trigram GIN indexes to speed this up:
CREATE EXTENSION pg_trgm;
CREATE INDEX ON tbl USING gin ((foods ->> 'fruits') gin_trgm_ops);
CREATE INDEX ON tbl USING gin ((foods ->> 'meat') gin_trgm_ops);
These indexes can become large and will impact data modification performance.
Then you need to rewrite your query to use ILIKE.
Finally, the query might be slower than you want, because it will use three index scans and a (potentially expensive) bitmap heap scan.
But with a data structure like that and substring matches, you cannot do better.
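For reference, the ILIKE rewrite of the query from the question might look like this (a sketch, assuming the same column and JSON keys); both conditions can then use the trigram indexes above:
SELECT *
FROM tbl
WHERE foods ->> 'fruits' ILIKE '%orange%'
  AND (foods ->> 'meat' ILIKE '%beef%'
       OR foods ->> 'meat' ILIKE '%chicken%');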

postgres: Composite fulltext / btree index

I want to do a full-text search on one column and sort on a different column. If I index these two columns separately, Postgres can't use both indexes in this query. Is there a way to create a composite index that could be used in this scenario?
Unfortunately not.
While you can attach scalar columns to a GIN index via the btree_gin contrib module, Postgres can't use a GIN index for sorting. From the docs:
Of the index types currently supported by PostgreSQL, only B-tree can
produce sorted output — the other index types return matching rows in
an unspecified, implementation-dependent order.
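For completeness, the btree_gin variant mentioned above looks roughly like this (a sketch with hypothetical table and column names); it lets one GIN index cover both a scalar filter and the full-text condition, but the ORDER BY still needs a separate sort step:
CREATE EXTENSION IF NOT EXISTS btree_gin;
-- hypothetical table: articles(category_id int, body_tsv tsvector, published_at timestamptz)
CREATE INDEX articles_category_body_gin_idx ON articles
USING GIN (category_id, body_tsv);
SELECT *
FROM articles
WHERE category_id = 3
  AND body_tsv @@ plainto_tsquery('english', 'postgres index')
ORDER BY published_at;  -- sorted by an explicit sort step, not by the GIN index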
I'm reposting my earlier comment as an answer, with an example.
In a similar scenario I built a GiST index on a tsvector column and on
another text column with the gist_trgm_ops operator class, so I could do a
full-text search on the tsvector column and then order on the
other text column by trigram distance using only one index.
I created an index on "title" and "search":
CREATE INDEX docs_docume_search_title_gist
ON public.docs_document
USING gist
(title COLLATE pg_catalog."default" gist_trgm_ops, search);
In this query the full-text search is on "search" and the ordering is on "title" with trigram:
SELECT "title", ("title" <-> 'json') AS "distance"
FROM "docs_document"
WHERE ("release_id" = 22 AND "search" ## (plainto_tsquery('json')) = true)
ORDER BY "distance" ASC
LIMIT 10
This is the explain:
Limit (cost=0.40..71.99 rows=10 width=29)
Output: title, (((title)::text <-> 'json'::text))
-> Index Scan using docs_docume_search_title_gist on public.docs_document (cost=0.40..258.13 rows=36 width=29)
Output: title, ((title)::text <-> 'json'::text)
Index Cond: (docs_document.search @@ plainto_tsquery('json'::text))
Order By: ((docs_document.title)::text <-> 'json'::text)
Filter: (docs_document.release_id = 22)
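Note that gist_trgm_ops comes from the pg_trgm extension, so a setup roughly like this is assumed (a sketch; the column types are guesses, and "search" is presumed to be a tsvector maintained elsewhere, e.g. by a trigger):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- hypothetical minimal shape of the table used in the example above
CREATE TABLE docs_document (
    id bigserial PRIMARY KEY,
    release_id integer,
    title varchar(200),
    search tsvector
);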

Postgres jsonb query missing index?

We have the following json documents stored in our PG table (identities) in a jsonb column 'data':
{
"email": {
"main": "mainemail#email.com",
"prefix": "aliasPrefix",
"prettyEmails": ["stuff1", "stuff2"]
},
...
}
I have the following index set up on the table:
CREATE INDEX ix_identities_email_main
ON identities
USING gin
((data -> 'email->main'::text) jsonb_path_ops);
What am I missing that is preventing the following query from hitting that index?? It does a full seq scan on the table... We have tens of millions of rows, so this query is hanging for 15+ minutes...
SELECT * FROM identities WHERE data->'email'->>'main'='mainemail@email.com';
If you use the JSONB data type for your data column, then in order to index ALL "email" entry values you need to create the following index:
CREATE INDEX ident_data_email_gin_idx ON identities USING gin ((data -> 'email'));
Also keep in mind that for JSONB you need to use the appropriate operators:
The default GIN operator class for jsonb supports queries with the @>,
?, ?& and ?| operators
Following queries will hit this index:
SELECT * FROM identities
WHERE data->'email' @> '{"main": "mainemail@email.com"}'
-- OR
SELECT * FROM identities
WHERE data->'email' @> '{"prefix": "aliasPrefix"}'
If you need to search against the array elements "stuff1" or "stuff2", the index above will not work; you need to explicitly add an expression index on the "prettyEmails" array element values to make that query faster.
CREATE INDEX ident_data_prettyemails_gin_idx ON identities USING gin ((data -> 'email' -> 'prettyEmails'));
This query will hit the index:
SELECT * FROM identities
WHERE data->'email' @> '{"prettyEmails":["stuff1"]}'

How to create an operator in PostgreSQL for the hstore type with an int4range value

I have a table with an hstore column 'ext', where the values are int4range strings. An example:
"p1"=>"[10, 18]", "p2"=>"[24, 32]", "p3"=>"[29, 32]", "p4"=>"[18, 19]"
However, when I try to create an expression index on this, I get an error:
CREATE INDEX ix_test3_p1
ON test3
USING gist
(((ext -> 'p1'::text)::int4range));
ERROR: data type text has no default operator class for access method "gist"
SQL state: 42704
Hint: You must specify an operator class for the index or define a default operator class for the data type.
How do I create the operator for this?
NOTE
Each record may have its own unique set of keys. Each key represents an attribute, and the value its range. So not all records will have "p1". Consider this an EAV model in hstore.
I don't get that error - I get "functions in index expression must be marked IMMUTABLE"
CREATE TABLE ht (ext hstore);
INSERT INTO ht VALUES ('p1=>"[10,18]"'), ('p1=>"[99,99]"');
CREATE INDEX ht_test_idx ON ht USING GIST ( ((ext->'p1'::text)::int4range) );
ERROR: functions in index expression must be marked IMMUTABLE
CREATE FUNCTION foo(hstore) RETURNS int4range LANGUAGE SQL AS $$ SELECT ($1->'p1')::int4range; $$ IMMUTABLE;
CREATE INDEX ht_test_idx ON ht USING GIST ( foo(ext) );
SET enable_seq_scan=false;
EXPLAIN SELECT * FROM ht WHERE foo(ext) = '[10,19)';
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using ht_test_idx on ht (cost=0.25..8.52 rows=1 width=32)
Index Cond: (foo(ext) = '[10,19)'::int4range)
I'm guessing the cast isn't immutable because you can change the default format of the range from inclusive...exclusive "[...)" to something else. You presumably won't be doing that though.
Obviously you'll want your real function to deal with things like missing "p1" entries, badly formed range values etc.
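A slightly more defensive variant of that helper could look like this (a sketch; returning NULL for missing or malformed values is only one possible choice):
CREATE OR REPLACE FUNCTION foo(hstore) RETURNS int4range
LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
    -- NULL when 'p1' is absent; the cast raises if the value is malformed
    RETURN ($1 -> 'p1')::int4range;
EXCEPTION WHEN others THEN
    -- treat badly formed range strings as "no value"
    RETURN NULL;
END;
$$;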