How to create a multicolumn GiST index in PostgreSQL

The PostgreSQL documentation specifies that a GiST index can have multiple columns, but does not provide an example of what this might look like.
I have a table that tracks assets owned by different customers.
CREATE TABLE asset (
id serial PRIMARY KEY,
description text NOT NULL,
customer_id uuid NOT NULL
);
I'm writing a query that allows a customer to search for an asset based on words in its description.
SELECT *
FROM asset
WHERE to_tsvector('english', asset.description) @@ plainto_tsquery('english', ?)
AND asset.customer_id = ?;
Were this a non-tsvector query, I'd build a simple multicolumn index:
CREATE INDEX idx_name ON asset(customer_id, description);
I can create an index only on the tsvector:
CREATE INDEX idx_name ON asset USING gist(to_tsvector('english', asset.description));
However, the query optimizer doesn't use the GiST index, because it seems to want to do the customer_id filtering first. Is there some way to include the non-tsvector column customer_id in the GiST index, or am I out of luck?

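You can create a multicolumn GIN or GiST index in Postgres by listing several columns in the index definition. The example below uses the gist_trgm_ops operator class, which comes from the pg_trgm extension and applies to text columns: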
CREATE INDEX index_name
ON table_name
USING gist (field_one gist_trgm_ops, field_two gist_trgm_ops);
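For the asset table in the question, customer_id is a uuid rather than text, so a trigram operator class does not apply to it. A sketch of a combined index instead relies on the btree_gist extension, which supplies GiST operator classes for scalar types (uuid support is assumed here, which requires PostgreSQL 10 or later). The index name and the search values are hypothetical, and whether the planner actually prefers this index over a plain btree on customer_id depends on the data, so it is worth confirming with EXPLAIN.

```sql
-- Setup: the table from the question (only created if it does not exist yet).
CREATE TABLE IF NOT EXISTS asset (
    id serial PRIMARY KEY,
    description text NOT NULL,
    customer_id uuid NOT NULL
);

-- btree_gist provides GiST operator classes for scalar types such as uuid,
-- so customer_id can share one GiST index with the tsvector expression.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical index name; the query below must repeat the same expression.
CREATE INDEX IF NOT EXISTS asset_customer_fts_idx
    ON asset
    USING gist (customer_id, to_tsvector('english', description));

-- Both indexed columns appear in the WHERE clause (placeholder values).
SELECT *
FROM asset
WHERE to_tsvector('english', asset.description) @@ plainto_tsquery('english', 'blue widget')
  AND asset.customer_id = '00000000-0000-0000-0000-000000000001';
```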

Related

Convert PostgreSQL JSONB column results for use in condition with IN

I have a table with a JSONB column that is used to store the integer IDs of the tags that have been applied to a task, e.g. '[123, 456, 789]'.
ALTER TABLE "public"."task" ADD COLUMN "tags" jsonb;
I also have a table dedicated to storing all the tags that can be used, and the primary key of each record is used in my JSONB column of the task table.
CREATE TABLE public.tag (
tag_id serial NOT NULL,
label varchar(50) NOT NULL
);
In this table (tag) I have an index on the tag ID, and I want to use this index in a query that returns the labels of the tags used in a task.
SELECT * FROM task, tag WHERE task.tags @> to_jsonb(tag.tag_id)
The query with to_jsonb performs badly because it doesn't use my table's index, but if I rewrite the SQL as in the example below, the index is used and performance is much better.
SELECT * FROM tag WHERE tag.tag_id IN (123, 456, 789)
How do I convert the jsonb column (task table) to a set of integer values that can be used with the IN condition, as in the example below?
SELECT * FROM task, tag WHERE tag.tag_id IN (task.tags);
You can use PostgreSQL's jsonb_array_elements function, which expands a JSON array into a set of rows. For example:
SELECT * FROM task, tag WHERE tag.tag_id in (
select jsonb_array_elements('[200, 100, 789]'::jsonb)::int4 as json_data
);
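Applied to the task table itself rather than to a literal array, a sketch of the same idea expands each task's tags with jsonb_array_elements_text in a LATERAL join and joins the elements to tag.tag_id, which gives the planner the option of using the index on tag_id. It assumes every tags value is a JSON array of integers, as in the question; the task_id column is hypothetical, since the question only shows the tags column.

```sql
-- Setup: simplified versions of the tables from the question.
CREATE TABLE IF NOT EXISTS tag (
    tag_id serial PRIMARY KEY,
    label varchar(50) NOT NULL
);
CREATE TABLE IF NOT EXISTS task (
    task_id serial PRIMARY KEY,  -- hypothetical key column
    tags jsonb
);

-- One row per array element, cast to integer, then joined to tag.
SELECT task.task_id, tag.tag_id, tag.label
FROM task
CROSS JOIN LATERAL jsonb_array_elements_text(task.tags) AS elem(value)
JOIN tag ON tag.tag_id = elem.value::int;
```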
For best performance, though, if the JSON data comes from a table column, index that column with a GIN index rather than the standard btree index type; GIN is the index type PostgreSQL offers for searching inside jsonb values. I use this index on a table with a million records, and performance is very good. Example of creating a GIN index:
CREATE INDEX tag_table_json_index ON tag_table USING gin (json_field_name jsonb_path_ops);
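The index above uses placeholder names (tag_table, json_field_name). Mapped onto the question's task.tags column, a hedged sketch looks like this. A jsonb_path_ops GIN index serves containment queries with @>, such as "which tasks carry tag 123", rather than the lookup against tag.tag_id, so it helps mainly when filtering tasks by tag. The index name and the task_id column are hypothetical.

```sql
CREATE TABLE IF NOT EXISTS task (
    task_id serial PRIMARY KEY,  -- hypothetical key column
    tags jsonb
);

-- jsonb_path_ops GIN indexes support the containment operator @>.
CREATE INDEX IF NOT EXISTS task_tags_gin_idx
    ON task USING gin (tags jsonb_path_ops);

-- Tasks whose tag array contains 123; this predicate can use the index.
SELECT task_id, tags
FROM task
WHERE tags @> '[123]';
```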

Postgres UUID array index possible with ANY?

I have a table like:
CREATE TABLE IF NOT EXISTS my_table (
id uuid NOT NULL PRIMARY KEY,
duplicate_ids uuid[] DEFAULT NULL
);
And my query is:
SELECT * FROM my_table WHERE 'some-uuid'=ANY(duplicate_ids)
Using EXPLAIN ANALYZE and trying lots of different indexes, I am unable to get the above to use an index.
Here's what I've tried (Postgres 12):
CREATE INDEX duplicate_ids_idx ON my_table USING GIN (duplicate_ids);
CREATE INDEX duplicate_ids_idx ON my_table USING GIN (duplicate_ids array_ops);
CREATE INDEX duplicate_ids_idx ON my_table USING BTREE (duplicate_ids);
CREATE INDEX duplicate_ids_idx ON my_table USING BTREE (duplicate_ids array_ops);
I've also run SET enable_seqscan TO off; before these tests to force index usage.
Questions I've read:
Can PostgreSQL index array columns?
Doesn't seem to apply for single values
https://dba.stackexchange.com/questions/125413/index-not-used-with-any-but-used-with-in
Seems to be talking about indexing multiple columns and using IN
Thank you very much for your time.
The question was answered by @a_horse_with_no_name.
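A GIN index on an array column (default array_ops operator class) supports the operators &&, @>, <@ and =, but not = ANY(...), so the solution is to rewrite the condition with one of those operators, for example: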
SELECT * FROM my_table WHERE duplicate_ids && array['some_uuid']::uuid[]
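A minimal end-to-end sketch, assuming the plain GIN index from the question (default array_ops operator class), a hypothetical index name, and a placeholder UUID. For a single value, the contains operator @> returns the same rows as && with a one-element array, and it can use the same index, whereas = ANY(...) cannot.

```sql
CREATE TABLE IF NOT EXISTS my_table (
    id uuid NOT NULL PRIMARY KEY,
    duplicate_ids uuid[] DEFAULT NULL
);

-- Default GIN operator class for arrays (array_ops).
CREATE INDEX IF NOT EXISTS duplicate_ids_gin_idx
    ON my_table USING gin (duplicate_ids);

-- Rows whose array contains the given UUID (placeholder value).
SELECT *
FROM my_table
WHERE duplicate_ids @> ARRAY['00000000-0000-0000-0000-000000000001']::uuid[];
```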

Is it possible to mix btree and gist in a Postgres index?

I have a table built like this:
create table pois (
id varchar(32) primary key,
location geography(Point,4326),
category varchar(32),
entity_class varchar(1),
hide boolean
);
on which most queries look like this:
SELECT * from pois
WHERE ( (
ST_contains(st_buffer(ST_SetSRID(ST_LineFromEncodedPolyline('ifp_Ik_vpAfj~FrehHhtxAhdaDpxbCnoa@~g|Ay_['), 4326)::geography, 5000)::geometry, location::geometry)
AND ST_Distance(location, ST_LineFromEncodedPolyline('ifp_Ik_vpAfj~FrehHhtxAhdaDpxbCnoa@~g|Ay_[')) < 5000
AND hide = false
AND entity_class in ('A', 'B')
) );
Currently I have two indexes: "pois_location_idx", a GiST index on (location), and "pois_select_idx", a btree index on (hide, entity_class).
Performance is acceptable but I am wondering if there is a better indexing strategy, and specifically if it is possible and makes sense to have mixed btree + gist indexes.
You can use the operator classes from the btree_gist extension to create a multi-column GiST index:
CREATE EXTENSION btree_gist;
CREATE INDEX ON pois USING gist (location, category);
In your special case I would doubt the usefulness of that, because hide is a boolean and a GiST index cannot support an IN clause.
Perhaps it would be better to create a partial index:
CREATE INDEX ON pois USING gist (location) WHERE NOT hide AND entity_class in ('A', 'B');
Such an index can only be used for queries whose WHERE clause matches that of the index, so it is less universally useful.

Full-text with partitioning in PostgreSQL

I have a table that I want to search in.
Table:
user_id: integer
text: text
deleted_at: datetime (nullable)
Index:
CREATE INDEX CONCURRENTLY "index_notifications_full_text" ON "notifications"
USING "gist" (to_tsvector('simple'::REGCONFIG, COALESCE(("text")::TEXT, ''::TEXT))) WHERE "deleted_at" IS NULL;
I need to implement a full-text search for users (only inside their messages that are not deleted).
How can I implement an index that indexes both user_id and text?
Using the btree_gin and/or btree_gist extensions, you can include user_id directly in a multicolumn FTS index. You can try each type in turn, as it can be hard to predict which one will be better in a given situation.
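A sketch of the btree_gin variant, assuming the notifications table described in the question (with deleted_at declared as timestamp, since PostgreSQL has no datetime type), a hypothetical index name and placeholder search values. For the planner to consider the index, the query has to repeat the indexed expression and the partial-index condition. Replacing gin with gist and btree_gin with btree_gist gives the GiST variant.

```sql
CREATE TABLE IF NOT EXISTS notifications (
    user_id integer NOT NULL,
    "text" text,
    deleted_at timestamp
);

-- btree_gin adds GIN operator classes for scalar types such as integer.
CREATE EXTENSION IF NOT EXISTS btree_gin;

CREATE INDEX IF NOT EXISTS notifications_user_fts_idx
    ON notifications
    USING gin (user_id, to_tsvector('simple'::regconfig, COALESCE("text", '')))
    WHERE deleted_at IS NULL;

-- Same expression and same WHERE condition as the index definition.
SELECT user_id, "text"
FROM notifications
WHERE user_id = 42
  AND to_tsvector('simple'::regconfig, COALESCE("text", '')) @@ plainto_tsquery('simple', 'invoice')
  AND deleted_at IS NULL;
```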
Alternatively, you could partition the table by user_id using declarative partitioning, and then keep the single-column index (although in that case, GIN is likely better than GiST).
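A sketch of the partitioning alternative, assuming PostgreSQL 11 or later, hash partitioning with an arbitrary four partitions, and a hypothetical table name (notifications_partitioned) so that it does not clash with the existing table. An index created on the partitioned parent is created on every partition, and a query with user_id = constant can be pruned to a single partition; the index is GIN, per the answer's suggestion.

```sql
CREATE TABLE IF NOT EXISTS notifications_partitioned (
    user_id integer NOT NULL,
    "text" text,
    deleted_at timestamp
) PARTITION BY HASH (user_id);

-- Four hash partitions; the modulus is an arbitrary choice for illustration.
CREATE TABLE IF NOT EXISTS notifications_partitioned_p0
    PARTITION OF notifications_partitioned FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE IF NOT EXISTS notifications_partitioned_p1
    PARTITION OF notifications_partitioned FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE IF NOT EXISTS notifications_partitioned_p2
    PARTITION OF notifications_partitioned FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE IF NOT EXISTS notifications_partitioned_p3
    PARTITION OF notifications_partitioned FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Single-column FTS index, cascaded to every partition.
CREATE INDEX IF NOT EXISTS notifications_partitioned_fts_idx
    ON notifications_partitioned
    USING gin (to_tsvector('simple'::regconfig, COALESCE("text", '')))
    WHERE deleted_at IS NULL;
```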
If you want more detailed advice, you need to give us more details: how many distinct user_id values there are, how many notifications there are per user_id, how many tokens there are per notification, and an example of a plausible query you hope to support efficiently.
You can add a new tsvector column to your notifications table, named e.g. document_with_idx:
ALTER TABLE notifications ADD COLUMN document_with_idx tsvector;
Then fill that column with the tsvector of the user_id and text columns:
update notifications set document_with_idx = to_tsvector(user_id || ' ' || coalesce(text, ''));
Finally, create an index on that column, named e.g. document_idx:
CREATE INDEX document_idx
ON notifications
USING GIN (document_with_idx);
Now you can run a full-text search on both the user_id and text values through the document_with_idx column, for example:
select user_id, text from notifications
where document_with_idx @@ to_tsquery('your search string goes here');
See more: https://www.postgresql.org/docs/9.5/textsearch-tables.html
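One limitation of the one-off UPDATE above is that rows inserted or changed later are not re-vectorized. On PostgreSQL 12 or later, a stored generated column keeps document_with_idx current automatically; the following is a sketch under that assumption. A generated column requires an immutable expression, so it uses the two-argument to_tsvector with an explicit configuration ('simple' here, an assumption chosen to match the question's index) and casts user_id to text explicitly. The search values are placeholders.

```sql
CREATE TABLE IF NOT EXISTS notifications (
    user_id integer NOT NULL,
    "text" text,
    deleted_at timestamp
);

-- Recomputed by PostgreSQL on every INSERT and UPDATE of the row.
ALTER TABLE notifications
    ADD COLUMN IF NOT EXISTS document_with_idx tsvector
    GENERATED ALWAYS AS (
        to_tsvector('simple'::regconfig, user_id::text || ' ' || COALESCE("text", ''))
    ) STORED;

CREATE INDEX IF NOT EXISTS document_idx
    ON notifications USING gin (document_with_idx);

-- Matches notifications of user 42 that mention "invoice".
SELECT user_id, "text"
FROM notifications
WHERE document_with_idx @@ to_tsquery('simple', '42 & invoice');
```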

PostgreSQL: pg_trgm full-text search with jsonb columns?

I have a table with a jsonb column where I store variable data. I would like to search this column and also find fragments (leading or trailing wildcards). I think I know how to do this with text columns, but I cannot wrap my head around how to achieve this with jsonb columns.
There are two scenarios that I would like to achieve:
Search a specific key inside the jsonb column only (for example data->>company)
Search the whole jsonb column
For text columns I generate gin indexes using pg_trgm.
Install extension pg_trgm:
CREATE extension if not exists pg_trgm;
Create table & index:
CREATE TABLE tbl (
col_text text,
col_json jsonb
);
CREATE INDEX table_col_trgm_idx ON tbl USING gin (col_text gin_trgm_ops);
Example query:
SELECT * FROM tbl WHERE col_text LIKE '%foo%'; -- leading wildcard
SELECT * FROM tbl WHERE col_text ILIKE '%foo%'; -- works case insensitive as well
Trying the same with the jsonb column fails. If I try to index the whole column:
CREATE INDEX table_col_trgm_idx ON tbl USING gin (col_json gin_trgm_ops);
I get the error
ERROR (datatype_mismatch): operator class "gin_trgm_ops" does not accept data type jsonb
(Which makes sense). If I try to index just one key of the jsonb column I also receive an error:
CREATE INDEX table_col_trgm_idx ON tbl USING gin (col_json->>company gin_trgm_ops);
Error:
ERROR (syntax_error): syntax error at or near "->>"
I used this answer by @erwin-brandstetter as a reference. Any help is highly appreciated (and no, I don't want to implement Elasticsearch as of now :) ).
Edit: Creating the index like this actually works:
CREATE INDEX table_col_trgm_idx ON tbl USING gin ((col_json->>'company') gin_trgm_ops);
And querying it also doesn't lead to an error:
SELECT * FROM tbl WHERE col_json->>'company' LIKE '%foo%';
But the result is always empty.
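For the second scenario (searching the whole jsonb column), one approach, offered here as an assumption rather than as an explanation of the empty result above, is a trigram index on the column's text representation. That text includes keys and JSON punctuation, so a pattern such as '%foo%' can also match a key name. The index name and sample data are hypothetical.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE IF NOT EXISTS tbl (
    col_text text,
    col_json jsonb
);

-- Expression index on the whole jsonb value rendered as text.
CREATE INDEX IF NOT EXISTS tbl_col_json_text_trgm_idx
    ON tbl USING gin ((col_json::text) gin_trgm_ops);

-- The query must use the same expression, col_json::text.
SELECT * FROM tbl WHERE col_json::text ILIKE '%foo%';
```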