PostgreSQL BTREE_GIN index with gin_trgm_ops option?

On https://www.postgresql.org/docs/current/static/pgtrgm.html it is explained how GIN indexes with the gin_trgm_ops operator class can be used to speed up the trigram similarity operators.
CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops);
It is also said:
These indexes do not support equality nor simple comparison operators,
so you may need a regular B-tree index too.
However, there is also the btree_gin extension, which should allow GIN indexes to be used as a substitute for B-tree indexes: https://www.postgresql.org/docs/current/static/btree-gin.html
My question is: if I install the btree_gin extension, could a pg_trgm GIN index (with the gin_trgm_ops operator class) be used as a substitute for a B-tree index? Does it combine the properties of both btree_gin and a trigram GIN index, or is an additional B-tree index still needed for joins, equality expressions, etc.?

No. Installing btree_gin only lets you create a GIN index over “basic” data types like integer, varchar or text; it does not change what a gin_trgm_ops index can do.
On its own such an index is normally useless, since there is nothing it can do that a regular B-tree index wouldn't do better. But it is very useful if you want to create a multicolumn GIN index that includes a column of such a data type, for example a combined index for an expression like tscol @@ to_tsquery('big & data') AND intcol = 42.
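To see why a gin_trgm_ops index cannot stand in for a B-tree even with btree_gin installed, it helps to look at what it stores: trigrams, not whole values. Below is a simplified Python model of the trigram extraction described in the pg_trgm documentation. It is an illustration under assumptions: word splitting is ASCII-only, and trigrams/similarity are hypothetical helpers that mimic show_trgm() and similarity(), not pg_trgm's actual C code.

```python
import re


def trigrams(text: str) -> set[str]:
    """Simplified model of pg_trgm's show_trgm(): lowercase the text, split it
    into alphanumeric words (ASCII only in this sketch), pad each word with two
    leading spaces and one trailing space, then collect every 3-char window."""
    result: set[str] = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


if __name__ == "__main__":
    # Three different values produce exactly the same set of index keys.
    for value in ("cat", "Cat", "cat, cat!"):
        print(repr(value), sorted(trigrams(value)))
    # "word" and "words" share 4 of their 7 distinct trigrams.
    print(similarity("word", "words"))
```

Because "cat", "Cat" and "cat, cat!" yield identical keys, such an index can only narrow down candidates for an equality test, each of which must be rechecked against the row itself, and the keys carry no ordering that <, > or ORDER BY could use.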

Related

Efficient postgres index type for LIKE operator (fixed ending)

There is a Postgres query using a LIKE condition:
email LIKE '%@%domain.com'
What is the most appropriate index type that I can use?
I've found the pg_trgm module, which must be enabled:
CREATE EXTENSION pg_trgm;
The pg_trgm module provides functions and operators for determining the similarity of ASCII alphanumeric text based on trigram matching, as well as index operator classes that support fast searching for similar strings.
And then you can
CREATE INDEX <index name> ON <table name> USING gin (<column> gin_trgm_ops);
Is there a better option?
gin_trgm_ops is described here: https://niallburkley.com/blog/index-columns-for-like-in-postgres/
The trigram index might be fine, and it allows you to write the query naturally. But a reversed-string index would be more efficient:
create index on foobar (reverse(email) text_pattern_ops);
select * from foobar where reverse(email) LIKE reverse('%@%domain.com');
If your default collation is "C", then you don't need to specify text_pattern_ops.
If the search parameter contains any escaped (literal) characters, then you will have to do something more complicated than simply reversing it.
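For the escaped-character case, one option is to reverse the pattern token by token instead of character by character, so each escape character stays in front of the character it makes literal. A minimal Python sketch, assuming the default backslash escape and that the application reverses the pattern before sending the query (reverse_like_pattern is a hypothetical helper name):

```python
def reverse_like_pattern(pattern: str, escape: str = "\\") -> str:
    """Reverse a LIKE pattern token by token instead of character by character.

    An escape sequence such as \\_ or \\% is kept as one token, so after
    reversal the escape character still precedes the character it makes
    literal. Assumes the default LIKE escape character (backslash).
    """
    tokens: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern[i] == escape:
            if i + 1 == len(pattern):
                # PostgreSQL also rejects a pattern ending in the escape char.
                raise ValueError("LIKE pattern must not end with escape character")
            tokens.append(pattern[i:i + 2])  # escape + escaped char stay together
            i += 2
        else:
            tokens.append(pattern[i])  # literal char or a wildcard (% or _)
            i += 1
    return "".join(reversed(tokens))


if __name__ == "__main__":
    print(reverse_like_pattern("%@%domain.com"))  # moc.niamod%@%
    naive = r"%\_corp.com"[::-1]  # moc.proc_\% : the escape now hits the %
    print(naive, reverse_like_pattern(r"%\_corp.com"))
```

The result would be used directly as the pattern in reverse(email) LIKE ..., rather than calling reverse() on the pattern in SQL. For suffix patterns like the one in the question it still starts with literal text, so it stays a prefix search that the text_pattern_ops index can serve.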

How to create a Gin index on a geometry column in Postgresql?

According to the pgAdmin 4 (4.21) documentation, section "Creating or Modifying a Table":
Select gin to create a GIN index. A GIN index may improve performance when managing two-dimensional geometric data types and nearest-neighbor searches
So we should create a GIN index on a geometry column if we intend to use nearest-neighbor searches, which I do!
However, when defining the GIN index, pgAdmin asks for an operator class, and there are two options there (jsonb_path_ops and gin_int_ops), but neither of them works with the geometry type.
Could someone please tell me how to create a GIN index on a geometry column?
P.S. By geometry I mean PostGIS's geometry column type.
Please link to the thing you are quoting so we don't have to go searching for it.
That looks like a bug in the pgAdmin 4 docs. They seem to have the GIN and GiST labels reversed in those descriptions. GIN supports multiple keys better than GiST does, but doesn't support nearest-neighbor or spatial searches. You want a GiST index.

What is the best type of index to use on a materialized view in PostgreSQL

I want to improve the performance of queries on a table in a PostgreSQL database that I need to use.
CREATE TABLE mytable (
article_number text NOT NULL,
description text NOT null,
feature text NOT null,
...
);
The table is just an example, but the point is that there are no unique columns. article_number is the column used in the WHERE clause, but a single value, for example article_number='000.002-00A', can have from 3 to 300 rows. The total number of rows is 102,165,920. What would be the best index to use in this situation?
I know there are B-tree, Hash, GiST, SP-GiST, GIN and BRIN index types in Postgres, but which one would be best for this?
If the lookups are filtered on article_number then an index should be created on that. Not quite sure what else you're asking.
The default index is a btree and that'll work fine. If you're only checking for strict equality, hash would also be an option, but hash indexes have issues before Postgres 10 (they are not WAL-logged, so they are not crash-safe and not replicated), so I wouldn't recommend them.
Other index types are for more complicated forms of querying or custom data types, there's no reason to even consider them if you just want to perform equality filters.
btrees are useful for strict equality and range searches (which include prefix searches, e.g. foo LIKE 'bar%', given the "C" collation or a text_pattern_ops index)
hash indexes are useful only for strict equality; they can be faster and smaller than btrees in some rare cases
GIN indexes are useful when you have multiple index values per row (arrays, json, some FTS cases)
GiST indexes are useful for more complex querying than equality and range (geom/gis, FTS)
I've never looked into BRIN indexes, so I'm not sure what their use case would be, but my understanding is that there's no reason to even consider them before you have huge numbers of rows.
Basically, use btree unless you know that you cannot.
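Both B-tree cases mentioned above, equality with many duplicate keys (as with article_number) and prefix LIKE, boil down to locating one contiguous range in sorted key order. A simplified Python model using a sorted list and the standard bisect module in place of a real B-tree (an illustration, not PostgreSQL's on-disk structure; the function names are hypothetical):

```python
from bisect import bisect_left, bisect_right


def equality_range(keys: list[str], value: str) -> range:
    """Positions of every entry equal to value in the sorted key list:
    descend to the first match, then scan forward over the duplicates."""
    return range(bisect_left(keys, value), bisect_right(keys, value))


def prefix_range(keys: list[str], prefix: str) -> range:
    """Positions matching LIKE 'prefix%', as the range prefix <= key < upper.

    upper is the prefix with its last character incremented. That is only
    valid for plain code-point order, which for UTF-8 text matches the
    byte-wise comparison of the "C" collation and text_pattern_ops; that is
    why other collations need text_pattern_ops for prefix searches.
    """
    if not prefix:
        return range(0, len(keys))
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return range(bisect_left(keys, prefix), bisect_left(keys, upper))


if __name__ == "__main__":
    keys = sorted(["000.002-00A"] * 3 + ["000.002-00B", "000.003-01A", "001.000-00C"])
    print([keys[i] for i in equality_range(keys, "000.002-00A")])
    print([keys[i] for i in prefix_range(keys, "000.002")])
```

A hash index can answer the first question but not the second, because hashing destroys the key order that the range depends on.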

Multicolumn index on 3 fields with heterogenous data types

I have a postgres table with 3 fields:
a : postgis geometry
b : array varchar[]
c : integer
I have a query that involves all of them. I would like to add a multicolumn index to speed it up, but I can't, because the 3 fields cannot go into the same index due to their nature.
What is the strategy in this case? Should I add 3 indexes (GiST, GIN and B-tree) and let Postgres use them all during the query?
Single-column index
Postgres can combine multiple indexes very efficiently in a single query with bitmap index scans. Most of the time, the most selective index is picked (or two, combined with bitmap index scans) and the rest is filtered. Once the result set is narrow enough, it's not efficient to scan more indexes.
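The bitmap combination described above can be pictured as ANDing one bitmap per index. A toy Python model, assuming row ids are small integers (real PostgreSQL bitmaps are organized by heap page, can become lossy page-level bitmaps when work_mem runs out, and then require a recheck; all names and data here are hypothetical):

```python
from collections.abc import Iterable
from functools import reduce


def to_bitmap(row_ids: Iterable[int]) -> int:
    """Turn one index scan's matches into a bitmap: bit n set = row n matches."""
    bitmap = 0
    for rid in row_ids:
        bitmap |= 1 << rid
    return bitmap


def bitmap_and(bitmaps: Iterable[int]) -> int:
    """Combine at least one index result, like a BitmapAnd plan node."""
    return reduce(lambda a, b: a & b, bitmaps)


def rows_in(bitmap: int) -> list[int]:
    """Row ids in ascending order, the order a bitmap heap scan visits them."""
    return [rid for rid in range(bitmap.bit_length()) if (bitmap >> rid) & 1]


if __name__ == "__main__":
    geom_matches = to_bitmap([2, 5, 7, 9])  # hypothetical GiST result on a
    tag_matches = to_bitmap([1, 5, 9, 12])  # hypothetical GIN result on b
    candidates = rows_in(bitmap_and([geom_matches, tag_matches]))
    c_values = {5: 42, 9: 7}  # hypothetical values of column c
    # The third condition is simply filtered on the few remaining rows.
    print([rid for rid in candidates if c_values[rid] == 42])  # [5]
```

Once two bitmaps have narrowed the candidates this far, scanning a third index usually costs more than filtering the remaining rows, which is why the planner often stops there.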
Multicolumn index
It is still faster to have a perfectly matching multicolumn index, but not by orders of magnitude.
Since you want to include an array type, I suggest using a GIN index. AFAIK, operator classes are missing for general-purpose GiST indexes on array types. (The exception being intarray for integer arrays.)
To include the integer column, first install the additional module btree_gin, which provides the necessary GIN operator classes. Run once per database:
CREATE EXTENSION btree_gin;
Then you should be able to create your multicolumn index:
CREATE INDEX tbl_abc_gin_idx ON tbl USING GIN(a, b, c);
The order of index columns is irrelevant for GIN indexes. The manual:
A multicolumn GIN index can be used with query conditions that involve
any subset of the index's columns. Unlike B-tree or GiST, index search
effectiveness is the same regardless of which index column(s) the
query conditions use.
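The manual's point about column order follows from how GIN is built: all columns share one inverted structure keyed by (column, value), so a query on any subset of columns is just a set of key lookups whose posting lists are intersected. A toy Python model of an index shaped like GIN(b, c), i.e. the varchar[] column b plus the integer column c (class and method names are hypothetical; real GIN keeps posting lists and trees on disk, not Python sets):

```python
from collections import defaultdict
from typing import Optional


class ToyGinIndex:
    """Toy model of a multicolumn GIN index: a single inverted structure whose
    keys are (column number, key value) pairs, each pointing to the set of
    row ids (the posting list) that contain that key."""

    def __init__(self) -> None:
        self.postings: dict[tuple[int, object], set[int]] = defaultdict(set)

    def insert(self, row_id: int, *columns: object) -> None:
        for colno, value in enumerate(columns):
            # An array contributes one key per element (like GIN array_ops);
            # a scalar contributes one key (what btree_gin makes possible).
            keys = value if isinstance(value, (list, tuple, set)) else [value]
            for key in keys:
                self.postings[(colno, key)].add(row_id)

    def search(self, conditions: dict[int, object]) -> set[int]:
        """Rows matching ALL {column number: key} conditions. Any subset of
        columns may be used; no column is privileged by its position."""
        result: Optional[set[int]] = None
        for colno, key in conditions.items():
            rows = self.postings.get((colno, key), set())
            result = set(rows) if result is None else result & rows
        return result if result is not None else set()


if __name__ == "__main__":
    idx = ToyGinIndex()
    idx.insert(1, ["red", "big"], 42)  # columns: b (varchar[]), c (integer)
    idx.insert(2, ["red"], 7)
    idx.insert(3, ["big"], 42)
    print(idx.search({0: "red"}), idx.search({1: 42}), idx.search({1: 42, 0: "red"}))
```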
Nearest neighbour search
Since you are including a PostGis geometry type, chances are you want to do a nearest neighbour search, for which you need a GiST index. In this case I suggest two indexes:
CREATE INDEX tbl_ac_gist_idx ON tbl USING GiST(a, c); -- geometry type
CREATE INDEX tbl_bc_gin_idx ON tbl USING GIN(b, c);
You could add the integer column c to either or both. It depends.
For that, you need either btree_gin or btree_gist or both, respectively.
the 3 fields cannot go under the same index because of their nature
The 3 fields can go under the same index using the btree_gist module.

Postgres GIST vs Btree index

Following on from my previous question on this topic, Postgres combining multiple Indexes:
I have the following table on Postgres 9.2 (with postgis):
CREATE TABLE updates (
update_id character varying(50) NOT NULL,
coords geography(Point,4326) NOT NULL,
user_id character varying(50) NOT NULL,
created_at timestamp without time zone NOT NULL
);
And I am running the following query on the table:
select *
from updates
where ST_DWithin(coords, ST_MakePoint(-126.4, 45.32)::geography, 30000)
and user_id='3212312'
order by created_at desc
limit 60
So, given that, which index should I use for (coords + user_id), GiST or B-tree?
CREATE INDEX ix_coords_user_id ON updates USING GIST (coords, user_id);
OR
CREATE INDEX ix_coords_user_id ON updates (coords, user_id);
I was reading that B-tree performs better than GiST, but am I forced to use GiST since I am using a PostGIS geography field?
You must use GiST if you want to use any index method other than the regular b-tree indexes (or hash indexes, but they shouldn't really be used, at least before PostgreSQL 10). PostGIS indexes require GiST.
B-tree indexes can only be used for basic operations involving equality or ordering, like =, <, <=, >, >=, BETWEEN and IN. While you can create a b-tree index on a geometry object (point, region, etc.), it can only actually be used for equality, since ordering comparisons like > are generally meaningless for such objects. A GiST index is required to support more complex and general comparisons like "contains", "intersects", etc.
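To make the "intersects" point concrete: a spatial GiST index keys each entry, and each inner page, by a bounding box, and skips any subtree whose box does not overlap the search box, something a single sort order cannot do for 2-D data. Below is a toy, one-level Python model of that pruning. It is an assumption-laden illustration: leaves are simple grid buckets, whereas PostGIS builds a multi-level R-tree-like GiST structure, and all names are hypothetical. Roughly speaking, a radius search like the ST_DWithin call in the question works the same way with the search box expanded by the distance, followed by an exact distance check on the surviving rows.

```python
from collections import defaultdict
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class Box:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other: "Box") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains(self, p: Point) -> bool:
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax


def bounding_box(points: list[Point]) -> Box:
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Box(min(xs), min(ys), max(xs), max(ys))


def build_leaves(points: list[Point], cell: float) -> list[tuple[Box, list[Point]]]:
    """Bucket points into square grid cells; each leaf keeps its bounding box."""
    groups: dict[tuple[int, int], list[Point]] = defaultdict(list)
    for x, y in points:
        groups[(int(x // cell), int(y // cell))].append((x, y))
    return [(bounding_box(pts), pts) for pts in groups.values()]


def box_search(leaves: list[tuple[Box, list[Point]]], query: Box) -> tuple[list[Point], int]:
    """Return the points inside query and how many leaves had to be read."""
    hits: list[Point] = []
    visited = 0
    for leaf_box, pts in leaves:
        if not leaf_box.overlaps(query):
            continue  # whole leaf skipped without looking at its points
        visited += 1
        hits.extend(p for p in pts if query.contains(p))
    return hits, visited


if __name__ == "__main__":
    points = [(float(x), float(y)) for x in range(10) for y in range(10)]
    leaves = build_leaves(points, 5.0)
    hits, visited = box_search(leaves, Box(2.0, 2.0, 3.0, 3.0))
    print(sorted(hits), f"{visited} of {len(leaves)} leaves read")
```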
You can use the btree_gist extension to enable b-tree indexing for GiST. It's considerably slower than regular b-tree indexes, but allows you to create a multi-column index that contains both GiST-only types and regular types like text, integer, etc.
In these situations you really need to use explain analyze (explain.depesz.com is useful for this) to examine how Pg uses various indexes and combinations of indexes that you create. Try different column orderings in multi-column indexes, and see whether two or more separate indexes are more effective.
I strongly suspect that you'll get the best results with the multicolumn GiST index in this case, but I'd try several different combinations of indexes and index column orderings to see.