PostgreSQL: Index JSONB array that is queried with the @? operator

My table (table) has a JSONB field (data) that contains a field (tags) holding an array where I store tags.
I query that table with an expression like:
SELECT * FROM table WHERE data->'tags' @? '$[*] ? (@ like_regex ".*(foo|bar).*" flag "i")';
With such a use case, is there a way for me to index the data->'tags' array to speed up the query? Or should I rather work on moving the tags array out of the JSONB field and into a TEXT[] field and index that?
I've already tried:
CREATE INDEX foo ON tbl USING GIN ((data->'tags') jsonb_path_ops);
but it doesn't work: https://gist.github.com/vkaracic/a62ac917d34eb6e975c4daeefbd316e8

The index you built can be used (if you set enable_seqscan = off, you will see that it does get used), but it is generally not chosen because it is pretty useless for this query. The only rows it could rule out through the index are the ones that don't have the 'tags' key at all, and even that is poorly estimated, so it probably won't be used without drastic measures.
You could try converting to text[] and then using parray_gin, but it would probably be better to convert to a child table with a text column and then use pg_trgm.
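A sketch of that suggested restructuring (the id column and the child table's name are assumptions on my part, not from the question):

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- One row per tag, assuming tbl has a unique id column to reference.
CREATE TABLE tbl_tags (
    tbl_id bigint NOT NULL REFERENCES tbl (id),
    tag    text   NOT NULL
);

CREATE INDEX ON tbl_tags USING gin (tag gin_trgm_ops);

-- Equivalent of the original jsonpath search:
-- tags containing foo or bar, case-insensitively.
SELECT DISTINCT t.*
FROM tbl AS t
JOIN tbl_tags AS tt ON tt.tbl_id = t.id
WHERE tt.tag ~* 'foo|bar';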

Related

Fuzzy finding through a database with Prisma

I am trying to build a storage manager where users can store their lab samples/data. Unfortunately, this means that the tables will end up being quite dynamic, as each sample might have different data associated with it. I will still require users to define a schema so I can display the data properly; however, I think this schema will have to be represented as a JSON field in the underlying database.
I was wondering: in Prisma, is there a way to fuzzy-search through collections? Could I type something like help and then return all rows that match this expression ANYWHERE in their columns (including the JSON fields)? Could I do something like this at all with PostgreSQL? Or with MongoDB?
thank you
You can easily do that with jsonb in PostgreSQL.
If you have a table defined like
CREATE TABLE userdata (
    id bigint PRIMARY KEY,
    important_col1 text,
    important_col2 integer,
    other_cols jsonb
);
you can create an index like this:
CREATE INDEX ON userdata USING gin (other_cols);
and search efficiently with
SELECT id FROM userdata WHERE other_cols @> '{"attribute": "value"}';
Here, @> is the JSONB containment operator in PostgreSQL. Note that containment matches complete values, not substrings; for substring matches anywhere in the row, see the next answer.
Yes, in PostgreSQL you can certainly do this, and it is quite straightforward. Here is an example.
Let your table be called the_table, aliased as tht. Cast an entire table row to text with tht::text and use the case-insensitive regular expression match operator ~* to find rows that contain help in that text. You can use more elaborate and powerful regular expressions for searching, too.
Please note that since the ~* operator cannot use a plain B-Tree index, this query will result in a sequential scan.
select * -- or whatever list of expressions you need
from the_table as tht
where tht::text ~* 'help';
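If the sequential scan turns out to be too slow, a trigram index on the text cast of the jsonb column can serve the ~* operator. This is an editor's sketch (not part of the original answer), reusing the userdata table from the answer above; note that it only accelerates matches inside the jsonb column, since the indexed expression must appear in the query:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Index the jsonb column cast to text so regex matches can use the index.
CREATE INDEX ON userdata USING gin ((other_cols::text) gin_trgm_ops);

-- The query must use the same expression as the index.
SELECT id FROM userdata WHERE other_cols::text ~* 'help';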

Fast way to check if PostgreSQL jsonb column contains certain string

For the past two days I've been reading a lot about jsonb, full text search, GIN indexes, trigram indexes and whatnot, but I still cannot find a definitive, or at least good enough, answer on how to quickly check whether a JSONB row contains a certain string as a value. Since it's a search functionality, the behavior should be like that of ILIKE.
What I have is:
A table, let's call it app.table_1, which contains a lot of columns, one of which is of type JSONB; let's call it column_jsonb.
The data inside column_jsonb will always be flat (no nested objects, etc.), but the keys can vary. An example of the data in the column, with obfuscated values, looks like this:
{"Key1": "Value1", "Key2": "Value2", "Key3": null, "Key4": "Value4", "Key5": "Value5"}
I have a GIN index on this column, which doesn't seem to affect the search time significantly (I am testing with 20k records now, which takes about 550 ms). The index looks like this:
CREATE INDEX ix_table_1_column_jsonb_gin
ON app.table_1 USING gin
(column_jsonb jsonb_path_ops)
TABLESPACE pg_default;
I am interested only in the VALUES and the way I am searching them now is this:
EXISTS(SELECT value FROM jsonb_each(column_jsonb) WHERE value::text ILIKE search_term)
Here search_term is a variable coming from the front end, containing the string that the user is searching for.
I have the following questions:
Is it possible to make the check faster without modifying the data model? I've read that a trigram index might be useful for cases like this, but it seems to me that converting the jsonb to text and then checking would be slower, and I am not sure whether a trigram index will actually work if the column's original type is JSONB and I explicitly cast each row to text. If I'm wrong, I would really appreciate an explanation, with an example if possible.
Is there some JSONB function that I am not aware of which offers what I am searching for out of the box? I'm constrained to PostgreSQL v11.9, so some new things coming with version 12 are not available to me.
If it's not possible to achieve a significant improvement with the current data structure, can you propose a way to restructure the data in column_jsonb (maybe another column of some other type, with the data persisted in some other way)?
Thank you very much in advance!
If the data structure is flat, and you regularly need to search the values, and the values are all the same type, a traditional key/value table would seem more appropriate.
create table table1_options (
    table1_id bigint not null references table1(id),
    key text not null,
    value text not null
);
create index table1_options_key on table1_options(key);
create index table1_options_value on table1_options(value);
select *
from table1_options
where value ilike 'some search%';
I've used simple B-Tree indexes here, but you can use whatever you need to speed up your particular searches.
The downsides are that all values must have the same type (which doesn't seem to be a problem here) and that you need an extra table for each table. The latter can be mitigated somewhat with table inheritance.
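A sketch of how the existing jsonb data could be migrated into that table (assuming table1 is the question's app.table_1 with its column_jsonb, and that it has an id primary key):

-- Expand each jsonb object into one row per key/value pair.
-- NULL values (like "Key3" in the question's example) are skipped,
-- since the value column is NOT NULL.
INSERT INTO table1_options (table1_id, key, value)
SELECT t.id, kv.key, kv.value
FROM table1 AS t
CROSS JOIN LATERAL jsonb_each_text(t.column_jsonb) AS kv
WHERE kv.value IS NOT NULL;

If the searches use a leading wildcard (ILIKE '%term%'), a plain B-Tree on value will not help; a trigram index on value (pg_trgm, as in other answers on this page) would.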

Indexing PostgreSQL JSONB Array Elements

Like the title says, how can I index a JSONB array?
The contents look like...
["some_value", "another_value"]
I can easily access the elements like...
SELECT * FROM table WHERE data->>0 = 'some_value';
I created an index like so...
CREATE INDEX table_data_idx ON table USING gin ((data) jsonb_path_ops);
When I run EXPLAIN, I still see it sequentially scanning...
What am I missing on indexing an array of text elements?
If you want to support that exact query with an index, the index would have to look like this:
CREATE INDEX ON "table" ((data->>0));
If you want to use the index you have, you cannot limit the search to a specific array element (in your case, the first). But you can speed up a search for some_value anywhere in the array:
SELECT * FROM "table"
WHERE data @> '["some_value"]'::jsonb;
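The jsonb_path_ops index from the question supports exactly this containment operator; several values can be checked at once by OR-ing containment conditions (an editor's example, not from the original answer):

SELECT * FROM "table"
WHERE data @> '["some_value"]'::jsonb
   OR data @> '["another_value"]'::jsonb;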
I ended up taking a different approach. I was still having problems getting the search to work with the JSONB type, so I ended up switching my column to a varchar ARRAY:
CREATE TABLE table (
    data varchar ARRAY NOT NULL
);
CREATE INDEX table_data_idx ON table USING GIN (data);
SELECT * FROM table WHERE data @> '{some_value}';
This works and is using the index.
I think the problem with my JSONB approach was that the element is actually nested much deeper and is being treated as text,
i.e. data->'some_key'->>'array_key'->>0
and every time I tried to search I got all sorts of invalid token errors and other such things.
You may want to create a materialized view that has the primary key (or other unique index of your table) and expands the array field into a text column with the jsonb_array_elements_text function:
CREATE MATERIALIZED VIEW table_mv AS
SELECT DISTINCT table.id, jsonb_array_elements_text(data) AS array_elem
FROM table;
You can then create a unique index on this materialized view (primary keys are not supported on materialized views):
CREATE UNIQUE INDEX table_array_idx ON table_mv(id, array_elem);
Then query with a join to the original table on its primary key:
SELECT * FROM table INNER JOIN table_mv ON table.id = table_mv.id WHERE table_mv.array_elem = 'some_value';
This query should use the unique index and then look up the primary key of the original table, both very fast.
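One caveat worth adding (an editorial note, not part of the original answer): a materialized view is a snapshot, so it must be refreshed after the base table changes. Thanks to the unique index created above, the refresh can run without blocking readers:

-- CONCURRENTLY requires a unique index on the view, created above.
REFRESH MATERIALIZED VIEW CONCURRENTLY table_mv;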

How to index a PostgreSQL JSONB flat text array for fuzzy and right-anchored searches?

PostgreSQL version: 9.6.
The events table has a visitors JSONB column:
CREATE TABLE events (name VARCHAR(256), visitors JSONB);
The visitors column contains a "flat" JSON array:
["John Doe","Frédéric Martin","Daniel Smith",...].
The events table contains 10 million rows; each row has between 1 and 20 visitors.
Is it possible to index the values of the array to perform efficient pattern-matching searches:
left anchored: select events whose visitors match 'John%'
right anchored: select events whose visitors match '%Doe'
unaccented: select events whose visitors match 'Frederic%'
case-insensitive: select events whose visitors match 'john%'
I am aware of the Postgres trigram extension pg_trgm and its gin_trgm_ops operator class, which make it possible to create indexes for case-insensitive and right-anchored searches, but I can't figure out how to create trigram indexes for the contents of "flat" JSON arrays.
I read Pattern matching on jsonb key/value and Index for finding an element in a JSON array but the solutions provided do not seem to apply to my use case.
You should cast the jsonb to text and create a trigram index on it:
CREATE EXTENSION pg_trgm;
CREATE INDEX ON events USING gin ((visitors::text) gin_trgm_ops);
Then use regular expression searches on the column. For example, to search for John Doe, you can use:
SELECT ...
FROM events
WHERE visitors::text ~* '\mJohn Doe\M';
The trigram index will support this query.
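Since a trigram index does not care where in the string the match occurs, the same index covers the left- and right-anchored searches from the question; for example:

-- Left-anchored, case-insensitive: visitors matching 'john%'
SELECT * FROM events WHERE visitors::text ~* '\mJohn';

-- Right-anchored: visitors matching '%Doe'
SELECT * FROM events WHERE visitors::text ~* 'Doe\M';

Unaccented matching ('Frederic' finding 'Frédéric') is not handled by pg_trgm by itself; that would additionally need something like the unaccent extension applied to the indexed expression.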

jsonb data type lookup cost in postgres

This might be an obvious and simple question.
But I read through the jsonb data type documentation, and nowhere does it mention the lookup cost of a key in jsonb data.
For example, let's say I have a table with following schema:
CREATE TABLE A (
    id character varying(20),
    info jsonb
);
I want to know how postgres would parse a where query as below:
SELECT * FROM A WHERE info->>'city' = 'portland';
While going through the jsonb field of a row, is the lookup within that jsonb dictionary constant time (O(1)) or linear time (checking each key one by one)?
My intuition is that it must be constant time (else what's the point of dictionary-style data?), but I can't find it in the official documentation to convince my team.
Any help would be great!
Thanks!
As with any WHERE condition in SQL: if there is no index, the database has to go through all rows of the table to find those that satisfy your condition.
You can either index a specific expression, or you can index the whole json value using a GIN index which then enables Postgres to use the index if any of the supported operators are used.
If you always check for the city, you can create a regular B-Tree index:
create index on a ( (info->>'city') );
If you don't know what you will be looking for, a GIN index might be a better choice:
create index on a using gin (info);
But you will need to change your query to use one of the operators that are supported by a GIN index, e.g. the containment operator @>:
select *
from a
where info @> '{"city": "portland"}'::jsonb;
Note that an index lookup is not always the most efficient solution. Sometimes it's faster to simply go through all rows, sometimes the index lookup is faster.
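To see which plan the planner actually picks for your data, a quick check might look like this (editor's example):

EXPLAIN
SELECT * FROM a WHERE info @> '{"city": "portland"}'::jsonb;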
If you want to learn more about indexes in relational databases, go through the material here: http://use-the-index-luke.com/