I'm wondering how to correctly create an index that would index the keys of a JSONB column that has the following shape:
{
"perm1": true,
"perm2": false,
"perm3": true
}
Basically I'd like the lookup of perm1, perm2 and perm3 to be the keys to reference. I don't know the values of the keys so I can't using the regular create index. I did find this as a suggestion:
CREATE INDEX idx_visitors ON visitors USING GIN (jsonb jsonb_path_value_ops);
Is there a more appropriate way of creating asserting this index?
Related
Please let me know how to create index for below query.
SELECT * FROM customers
WHERE identifiers #>
'[{"systemName": "SAP", "systemReference": "33557"}]'
AND country_code = 'IN';
identifiers is jsonb type and data is as below.
[{"systemName": "ERP", "systemReference": "TEST"}, {"systemName": "FEED", "systemReference": "2733"}, {"systemName": "SAP", "systemReference": "33557"}]
country_code is varchar type.
Either create a GIN index on identifiers ..
CREATE INDEX customers_identifiers_idx ON customers
USING GIN(identifiers);
.. or a composite index with identifiers and country_code.
CREATE INDEX customers_country_code_identifiers_idx ON customers
USING GIN(identifiers,country_code gin_trgm_ops);
The second option will depend on the values distribution of country_code.
Demo: db<>fiddle
You can create gin index for jsonb typed columns in Postgresql. gin index has built-in operator classes to handle jsonb operators. Learn more about gin index here https://www.postgresql.org/docs/12/gin-intro.html
For varchar types, btree index is good enough.
I am creating a table that has a features jsonb column. There will be a dynamic set of features (each row can have an unknown set of features). Each feature is boolean true/false values.
Example:
Row 1: feature: { "happy": true, "tall": false, "motivated": true }
Row 2: feature: { "happy": true, "fast": true, "strong": false }
Row 3: feature: { "smart": true, "fast": true, "sleepy": false }
What would be the best way to index this column such that I can make queries to find all rows with featureX = true? All the examples I have looked up seem to need a field name to base the index on.
You can create an index on the complete JSON value:
create index on the_table using gin (features);
It can be used for e.g. the #> operator:
select *
from the_table
where features #> '{"happy": true}'
Another method would be to not store key/value pairs, but only list the features that are "true" in an array: ["happy", "motivated"] and then use the ? operator. This way the JSON value is a bit smaller and that might be more efficient.
select *
from the_table
where features ? 'happy'
or if you want to test for multiple features:
select *
from the_table
where features ?| array['happy', 'motivated']
That too can make use of the GIN index
My pg is 9.5+.
I have a jsonb data in column 'body':
{
"id":"58cf96481ebf47eba351db3b",
"JobName":"test",
"JobDomain":"SAW",
"JobStatus":"TRIGGERED",
"JobActivity":"ACTIVE"
}
And I create index for body and key:
CREATE INDEX scheduledjob_request_id_idx ON "ScheduledJob" USING gin ((body -> 'JobName'));
CREATE INDEX test_index ON "ScheduledJob" USING gin (body jsonb_path_ops)
This are my queries:
SELECT body FROM "ScheduledJob" WHERE body #> '{"JobName": "analytics_import_transaction_job"}';
SELECT body FROM "ScheduledJob" WHERE (body#>'{JobName}' = '"analytics_import_transaction_job"') LIMIT 10;
Those are return correct data, but no one use index.
I saw the explain:
-> Seq Scan on public."ScheduledJob" (cost=0.00..4.55 rows=1 width=532)
So, I don't know why didn't use the index, and how to use the index for jsonb correctly.
Update:
I create index before insert data, the query can use index.
But I create index after insert the first data, the query will be
scan all records.
This is so strange, and how can I make the index useful when I insert data first.
So, I do some research and test that:
SELECT body FROM "ScheduledJob" WHERE (body#>'{JobName}' = '"analytics_import_transaction_job"') LIMIT 10;
This kind of query will never use the index.
And only the table have enough data, index can be available anytime.
We have the following json documents stored in our PG table (identities) in a jsonb column 'data':
{
"email": {
"main": "mainemail#email.com",
"prefix": "aliasPrefix",
"prettyEmails": ["stuff1", "stuff2"]
},
...
}
I have the following index set up on the table:
CREATE INDEX ix_identities_email_main
ON identities
USING gin
((data -> 'email->main'::text) jsonb_path_ops);
What am I missing that is preventing the following query from hitting that index?? It does a full seq scan on the table... We have tens of millions of rows, so this query is hanging for 15+ minutes...
SELECT * FROM identities WHERE data->'email'->>'main'='mainemail#email.com';
If you use JSONB data type for your data column, in order to index ALL "email" entry values you need to create following index:
CREATE INDEX ident_data_email_gin_idx ON identities USING gin ((data -> 'email'));
Also keep in mind that for JSONB you need to use appropriate list of operators;
The default GIN operator class for jsonb supports queries with the #>,
?, ?& and ?| operators
Following queries will hit this index:
SELECT * FROM identities
WHERE data->'email' #> '{"main": "mainemail#email.com"}'
-- OR
SELECT * FROM identities
WHERE data->'email' #> '{"prefix": "aliasPrefix"}'
If you need to search against array elements "stuff1" or "stuff2", index above will not work , you need to explicitly add expression index on "prettyEmails" array element values in order to make query work faster.
CREATE INDEX ident_data_prettyemails_gin_idx ON identities USING gin ((data -> 'email' -> 'prettyEmails'));
This query will hit the index:
SELECT * FROM identities
WHERE data->'email' #> '{"prettyEmails":["stuff1"]}'
I have a table with an HSTORE column 'ext', where the value is an int4range. An example:
"p1"=>"[10, 18]", "p2"=>"[24, 32]", "p3"=>"[29, 32]", "p4"=>"[18, 19]"
However, when I try to create an expression index on this, I get an error:
CREATE INDEX ix_test3_p1
ON test3
USING gist
(((ext -> 'p1'::text)::int4range));
ERROR: data type text has no default operator class for access method
"gist" SQL state: 42704 Hint: You must specify an operator class for
the index or define a default operator class for the data type.
How do I create the operator for this?
NOTE
Each record may have its own unique set of keys. Each key represents an attribute, and the values the value range. So not all records will have "p1". Consider this an EAV model in hstore.
I don't get that error - I get "functions in index expression must be marked IMMUTABLE"
CREATE TABLE ht (ext hstore);
INSERT INTO ht VALUES ('p1=>"[10,18]"'), ('p1=>"[99,99]"');
CREATE INDEX ht_test_idx ON ht USING GIST ( ((ext->'p1'::text)::int4range) );
ERROR: functions in index expression must be marked IMMUTABLE
CREATE FUNCTION foo(hstore) RETURNS int4range LANGUAGE SQL AS $$ SELECT ($1->'p1')::int4range; $$ IMMUTABLE;
CREATE INDEX ht_test_idx ON ht USING GIST ( foo(ext) );
SET enable_seq_scan=false;
EXPLAIN SELECT * FROM ht WHERE foo(ext) = '[10,19)';
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using ht_test_idx on ht (cost=0.25..8.52 rows=1 width=32)
Index Cond: (foo(ext) = '[10,19)'::int4range)
I'm guessing the cast isn't immutable because you can change the default format of the range from inclusive...exclusive "[...)" to something else. You presumably won't be doing that though.
Obviously you'll want your real function to deal with things like missing "p1" entries, badly formed range values etc.