Following on from my previous question on this topic, Postgres combining multiple Indexes:
I have the following table on Postgres 9.2 (with postgis):
CREATE TABLE updates (
update_id character varying(50) NOT NULL,
coords geography(Point,4326) NOT NULL,
user_id character varying(50) NOT NULL,
created_at timestamp without time zone NOT NULL
);
And I am running following query on the table:
select *
from updates
where ST_DWithin(coords, ST_MakePoint(-126.4, 45.32)::geography, 30000)
and user_id='3212312'
order by created_at desc
limit 60
So given that, what Index should I use for (coords + user_id), GIST or BTree?
CREATE INDEX ix_coords_user_id ON updates USING GIST (coords, user_id);
OR
CREATE INDEX ix_coords_user_id ON updates (coords, user_id);
I was reading that BTree performs better than GIST, but am I forced to use GIST since I am using postgis geography field??
You must use GiST if you want to use any index method other than the regular b-tree indexes (or hash indexes, but they shouldn't really be used). PostGIS indexes require GiST.
B-tree indexes can only be used for basic operations involving equality or ordering, like =, <, <=, >, >=, <>, BETWEEN and IN. While you can create a b-tree index on a geomtery object (point, region, etc) it can only actually be used for equality as ordering comparisons like > are generally meaningless for such objects. A GiST index is required to support more complex and general comparisons like "contains", "intersects", etc.
You can use the btree_gist extension to enable b-tree indexing for GiST. It's considerably slower than regular b-tree indexes, but allows you to create a multi-column index that contains both GiST-only types and regular types like text, integer, etc.
In these situations you really need to use explain analyze (explain.depesz.com is useful for this) to examine how Pg uses various indexes and combinations of indexes that you create. Try different column orderings in multi-column indexes, and see whether two or more separate indexes are more effective.
I strongly suspect that you'll get the best results with the multicolumn GiST index in this case, but I'd try several different combinations of indexes and index column orderings to see.
Related
I have a database with over 30,000,000 entries. When performing queries (including an ORDER BY clause) on a text field, the = operator results in relatively fast results. However we have noticed that when using the LIKE operator, the query becomes remarkably slow, taking minutes to complete. For example:
SELECT * FROM work_item_summary WHERE manager LIKE '%manager' ORDER BY created;
Creating indices on the keywords being searched will of course greatly speed up the query. The problem is we must support queries on any arbitrary pattern, and on any column, making this solution not viable.
My questions are:
Why are LIKE queries this much slower than = queries?
Is there any other way these generic queries can be optimized, or is about as good as one can get for a database with so many entries?
Your query plan shows a sequential scan, which is slow for big tables, and also not surprising since your LIKE pattern has a leading wildcard that cannot be supported with a plain B-tree index.
You need to add index support. Either a trigram GIN index to support any and all patterns, or a COLLATE "C" B-tree expression index on the reversed string to specifically target leading wildcards.
See:
PostgreSQL LIKE query performance variations
How to index a column for leading wildcard search and check progress?
One technic to speed up queries that search partial word (eg '%something%') si to use rotational indexing technic wich is not implementedin most of RDBMS.
This technique consists of cutting out each word of a sentence and then cutting it in "rotation", ie creating a list of parts of this word from which the first letter is successively removed.
As an example the word "electricity" will be exploded into 10 words :
lectricity
ectricity
ctricity
tricity
ricity
icity
city
ity
ty
y
Then you put all those words into a dictionnary that is a simple table with an index... and reference the root word.
Tables for this are :
CREATE TABLE T_WRD
(WRD_ID BIGINT IDENTITY PRIMARY KEY,
WRD_WORD VARCHAR(64) NOT NULL UNIQUE,
WRD_DROW GENERATED ALWAYS AS (REVERSE(WRD_WORD) NOT NULL UNIQUE) ;
GO
CREATE TABLE T_WORD_ROTATE_STRING_WRS
(WRD_ID BIGINT NOT NULL REFERENCES T_WRD (WRD_ID),
WRS_ROTATE SMALLINT NOT NULL,
WRD_ID_PART BIGINT NOT NULL REFERENCES T_WRD (WRD_ID),
PRIMARY KEY (WRD_ID, WRS_ROTATE));
GO
Can someone please explain why two indexes mechanism (a btree and a gist) are defined in an example given by Postgresql documentation (Please Check F.21.4. Example section).
This is the example code :
CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING gist(path);
CREATE INDEX path_idx ON test USING btree(path);
At the last two lines, the author has created two indexes on path column. why? isn't that Gist enough for the purpose?
In section F.21.3 the author implies that b-tree and gist may speed up the following operators :
B-tree index over ltree: <, <=, =, >=, >
GiST index over ltree: <, <=, =, >=, >, #>, <#, #, ~, ?
Which means Gist is enough for all the above operators.let clear my question. Is the author trying to provide example for both indexes here or there is a reason for using both indexes.
I am creating a table with a ltree column and I want to know that should I create both indexes (btree and gist) on my column or gist is just enough?
B-tree indices are faster; check this discussion: What's the difference between B-Tree and GiST index methods (in PostgreSQL)?
So the logic behind creating both indices is that GiST is more flexible and can be used when the method of searching isn't supported by B-tree (the extra #>, <#, #, ~, ? that GiST has). However, when you do use the methods common between both (<, <=, =, >=, >), B-tree will be used as it is faster.
I want to increase the performance of queries on table in Postgrsql db i need to use.
CREATE TABLE mytable (
article_number text NOT NULL,
description text NOT null,
feature text NOT null,
...
);
The table is just in example but the thing is that there are no unique columns. article_number is the one used in the where clause but for example article_number='000.002-00A' can have from 3 to 300 rows. The total number of rows is 102,165,920. What would be the best index to use for such a situation?
I know there B-tree, Hash, GiST, SP-GiST, GIN and BRIN index types in postgres but which one would be the best for this.
If the lookups are filtered on article_number then an index should be created on that. Not quite sure what else you're asking.
The default index is a btree and that'll work fine. If you're only checking for strict equality hash would also be an option but it has issues before Postgres 10, so I wouldn't recommend it.
Other index types are for more complicated forms of querying or custom data types, there's no reason to even consider them if you just want to perform equality filters.
btrees are useful for strict equality and range searches (which includes prefix search e.g. foo like 'bar%')
hash indexes are useful only for strict equality they can be faster & smaller than btrees in some rare cases
GIN indexes are useful when you have multiple index values per row (arrays, json, gis, some FTS cases)
GiST indexes are useful for more complex querying than equality and range (geom/gis, FTS)
I've never looked into BRIN index so I'm not sure what their use case would be. But my understanding is that there's no case to even consider it before you have huge numbers of rows.
Basically, use btree unless you know that you can not.
On https://www.postgresql.org/docs/current/static/pgtrgm.html it is explained how special GIN idexes with gin_trgm_ops option can be used to facilitate trigram similarity operator performance.
CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops);
It is also said:
These indexes do not support equality nor simple comparison operators,
so you may need a regular B-tree index too.
However, there is also BTREE_GIN extension which should allow GIN indexes to be used as substitute for BTREE indexes. https://www.postgresql.org/docs/current/static/btree-gin.html
My question is: If I install BTREE_GIN extension, could pg_trgm GIN index (with gin_trgm_ops option) be used as substitute for BTREE index? Does it combine properties of both BTREE_GIN and trigram GIN index, or additional BTREE index is still needed for joining and equality expressions etc.?
No, if you install btree_gin, you can create a GIN index over “basic” data types like integer, varchar or text.
This is normally useless, since you can use such an index for nothing that wouldn't be done better by a regular B-tree index, but it is very useful if you want to create a multicolumn GIN index including a column with such a data type, for example if you want to create a combined index for an expression like tscol ## to_tsquery('big data') AND intcol = 42.
I have a postgres table with 3 fields:
a : postgis geometry
b : array varchar[]
c : integer
and I have a query that involves all of them. I would like to add a multicolumn index to speed it up but I cannot as the 3 fields cannot go under the same index because of their nature.
What is the strategy in this case? Adding 3 indexes gist, gin and btree and postgres will use them all during the query?
Single-column index
Postgres can combine multiple indexes very efficiently in a single query with bitmap index scans. Most of the time, the most selective index is picked (or two, combined with bitmap index scans) and the rest is filtered. Once the result set is narrow enough, it's not efficient to scan more indexes.
Multicolumn index
It is still faster to have a perfectly matching multicolumn index, but not by orders of magnitude.
Since you want to include an array type I suggest to use a GIN index. AFAIK, operator classes are missing for general-purpose GiST indexes on array type. (The exception being intarray for integer arrays.)
To include the integer column, first install the additional module btree_gin, which provides the necessary GIN operator classes. Run once per database:
CREATE EXTENSION btree_gin;
Then you should be able to create your multicolumn index:
CREATE INDEX tbl_abc_gin_idx ON tbl USING GIN(a, b, c);
The order of index columns is irrelevant for GIN indexes. The manual:
A multicolumn GIN index can be used with query conditions that involve
any subset of the index's columns. Unlike B-tree or GiST, index search
effectiveness is the same regardless of which index column(s) the
query conditions use.
Nearest neighbour search
Since you are including a PostGis geometry type, chances are you want to do a nearest neighbour search, for which you need a GiST index. In this case I suggest two indexes:
CREATE INDEX tbl_ac_gist_idx ON tbl USING GiST(a, c); -- geometry type
CREATE INDEX tbl_bc_gin_idx ON tbl USING GIN(b, c);
You could add the integer column c to either or both. It depends.
For that, you need either btree_gin or btree_gist or both, respectively.
the 3 fields cannot go under the same index because of their nature
The 3 fields can go under the same index using the btree-gist module.