What is the best type of index to use on a materialized view in PostgreSQL - postgresql

I want to increase the performance of queries on table in Postgrsql db i need to use.
CREATE TABLE mytable (
article_number text NOT NULL,
description text NOT null,
feature text NOT null,
...
);
The table is just in example but the thing is that there are no unique columns. article_number is the one used in the where clause but for example article_number='000.002-00A' can have from 3 to 300 rows. The total number of rows is 102,165,920. What would be the best index to use for such a situation?
I know there B-tree, Hash, GiST, SP-GiST, GIN and BRIN index types in postgres but which one would be the best for this.

If the lookups are filtered on article_number then an index should be created on that. Not quite sure what else you're asking.
The default index is a btree and that'll work fine. If you're only checking for strict equality hash would also be an option but it has issues before Postgres 10, so I wouldn't recommend it.
Other index types are for more complicated forms of querying or custom data types, there's no reason to even consider them if you just want to perform equality filters.
btrees are useful for strict equality and range searches (which includes prefix search e.g. foo like 'bar%')
hash indexes are useful only for strict equality they can be faster & smaller than btrees in some rare cases
GIN indexes are useful when you have multiple index values per row (arrays, json, gis, some FTS cases)
GiST indexes are useful for more complex querying than equality and range (geom/gis, FTS)
I've never looked into BRIN index so I'm not sure what their use case would be. But my understanding is that there's no case to even consider it before you have huge numbers of rows.
Basically, use btree unless you know that you can not.

Related

Queries using LIKE are exceptionally slow

I have a database with over 30,000,000 entries. When performing queries (including an ORDER BY clause) on a text field, the = operator results in relatively fast results. However we have noticed that when using the LIKE operator, the query becomes remarkably slow, taking minutes to complete. For example:
SELECT * FROM work_item_summary WHERE manager LIKE '%manager' ORDER BY created;
Creating indices on the keywords being searched will of course greatly speed up the query. The problem is we must support queries on any arbitrary pattern, and on any column, making this solution not viable.
My questions are:
Why are LIKE queries this much slower than = queries?
Is there any other way these generic queries can be optimized, or is about as good as one can get for a database with so many entries?
Your query plan shows a sequential scan, which is slow for big tables, and also not surprising since your LIKE pattern has a leading wildcard that cannot be supported with a plain B-tree index.
You need to add index support. Either a trigram GIN index to support any and all patterns, or a COLLATE "C" B-tree expression index on the reversed string to specifically target leading wildcards.
See:
PostgreSQL LIKE query performance variations
How to index a column for leading wildcard search and check progress?
One technic to speed up queries that search partial word (eg '%something%') si to use rotational indexing technic wich is not implementedin most of RDBMS.
This technique consists of cutting out each word of a sentence and then cutting it in "rotation", ie creating a list of parts of this word from which the first letter is successively removed.
As an example the word "electricity" will be exploded into 10 words :
lectricity
ectricity
ctricity
tricity
ricity
icity
city
ity
ty
y
Then you put all those words into a dictionnary that is a simple table with an index... and reference the root word.
Tables for this are :
CREATE TABLE T_WRD
(WRD_ID BIGINT IDENTITY PRIMARY KEY,
WRD_WORD VARCHAR(64) NOT NULL UNIQUE,
WRD_DROW GENERATED ALWAYS AS (REVERSE(WRD_WORD) NOT NULL UNIQUE) ;
GO
CREATE TABLE T_WORD_ROTATE_STRING_WRS
(WRD_ID BIGINT NOT NULL REFERENCES T_WRD (WRD_ID),
WRS_ROTATE SMALLINT NOT NULL,
WRD_ID_PART BIGINT NOT NULL REFERENCES T_WRD (WRD_ID),
PRIMARY KEY (WRD_ID, WRS_ROTATE));
GO

Does multi-column BRIN column order matter?

I have a large amount of data (500+ mil rows) in a table that I need to filter/query in real-time. I haven't been able to get satisfactory performance OR predictable query plans using regular b-tree indexes. I thought that using a BRIN would help a lot, but because our data cannot be inserted in any controlled physical ordering that I need to query by, I have set up a MATERIALIZED VIEW to select the data (including joined data) and sort it in a specific order. Something along the lines of...
CREATE MATERIALIZED VIEW my_view AS
SELECT a.one, b.two, b.three, c.four, c.five, c.six
FROM a, b, c WHERE ...joins
ORDER BY b.three, b.two, a.one, c.four;
I then created the index based on multiple columns, since all specified columns will always be used in the single query this view is meant for.
CREATE INDEX my_view_idx ON my_view
USING BRIN (three, two, one, four) WITH (pages_per_range = 64);
I ordered the columns (both in the table and in the BRIN) based on selectivity, meaning b.three will filter out 80% of the records (ie. only 20% of records will match), b.two will filter out 70%, etc.
Was ordering the BRIN columns the same as the physical sorting of the table necessary? I cannot find any resources that describe this. The closest thing I could find was from: https://www.postgresql.org/docs/10/indexes-multicolumn.html ...
A multicolumn BRIN index can be used with query conditions that involve any subset of the index's columns. Like GIN and unlike B-tree or GiST, index search effectiveness is the same regardless of which index column(s) the query conditions use.
... but that doesn't describe column ordering, only inclusion in a query.
I could experiment (and have been, with surprisingly good results), but it's a slow process as it takes 2+ hours to materialize the view and build the index, so I would love to have some sort of factual basis for my guessing to avoid wasting lots of time.
I think the order of columns in BRIN index doesn't matter - according to the same doc: https://www.postgresql.org/docs/10/indexes-multicolumn.html
Like GIN and unlike B-tree or GiST, index search effectiveness is the same regardless of which index column(s) the query conditions use.
Looks like the order is only important for B-tree and GiST.

Are these indexes doing the same thing in respect to customer_id?

I'm pretty new to PostgreSQL so apologies if I'm asking the obvious.
I've got a table called customer_products. It contains the following two indexes:
CREATE INDEX customer_products_customer_id
ON public.customer_products USING btree (customer_id)
CREATE UNIQUE INDEX customer_products_customer_id_product_id
ON public.customer_products USING btree (customer_id, product_id)
Are they both doing the same thing in respect to customer_id or do they function in a different way? I'm not sure if I should leave them or remove customer_products_customer_id.
There is nothing that the first index can do that the second cannot, so you should drop the first index.
The only advantage of the first index over the second when it comes to queries whose WHERE (or ORDER BY) clause involves customer_id only is that the index is smaller. That makes a range scan over many index entries somewhat faster.
The price for an extra index in terms of size and data modification speed usually outweighs that advantage. In a read-only data warehouse where I have a query that profits significantly I may be tempted to keep both indexes, otherwise I wouldn't.
You should definitely not drop the UNIQUE index, because it has a valuable use that has nothing to do with performance: it prevents the table from containing two rows that have the save values for the indexed columns. If that is what you want to guarantee, a UNIQUE index will make sure that your data keep in good shape.
Side remark: even though the effect is the same, it is better if the table has a unique constraint (which is backed by a unique index) than just having the index. If nothing else, it documents the purpose better.

PostgreSQL OK to use HASH exclude constraint for uniqueness?

Since hashes are smaller than lengthy text, it seems to me that they could be preferred to b-trees for ensuring column uniqueness.
For the sole purpose of ensuring uniqueness, is there any reason the following isn't OK in PG 10?
CREATE TABLE test (
path ltree,
EXCLUDE USING HASH ((path::text) WITH =)
);
I assume hash collisions are internally dealt with. Would be useless otherwise.
I will use a GiST index for query enhancement.
I think quoting the manual on this sums it up:
Although it's allowed, there is little point in using B-tree or hash
indexes with an exclusion constraint, because this does nothing that
an ordinary unique constraint doesn't do better. So in practice the access method will always be GiST or SP-GiST.
All the more since you want to create a GiST index anyway. The exclusion constraint with USING GIST will create a matching GiST index as implementation detail automatically. No point in maintaining another, inefficient hash index not even being used in queries.
For simple uniqueness (WITH =) a plain UNIQUE btree index is more efficient. If your keys are very long, consider a unique index on a hash expression (using any immutable function) to reduce size. Like:
CREATE UNIQUE INDEX test_path_hash_uni_idx ON test (my_hash_func(path));
Related:
How does PostgreSQL enforce the UNIQUE constraint / what type of index does it use?
md5() would be a simple option as hash function.
Convert hex in text representation to decimal number
Would index lookup be noticeably faster with char vs varchar when all values are 36 chars
Before Postgres 10 the use of hash indexes was discouraged. But with the latest updates, this has improved. Robert Haas (core developer) sums it up in a blog entry:
PostgreSQL's Hash Indexes Are Now Cool
CREATE INDEX test_path_hash_idx ON test USING HASH (path);
Alas (missed that in my draft), the access method "hash" does not support unique indexes, yet. So I would still go with the unique index on a hash expression above. (But no hash function that reduces information can fully guarantee uniqueness of the key - which may or may not be an issue.)

alternative to bitmap index in postgresql

I have a table with hundreds of millions rows with schema like below.
tabe AA {
id integer primay key,
prop0 boolean not null,
prop1 boolean not null,
prop2 smallint not null,
...
}
The each "property" field (prop0, prop1, ...) has a small number of distinct values. And I usually query to find "id" from the given conditions of properties fields. I think Bitmap index is best for this query. But postgresql seems not support bitmap index.
I tried b-tree index on each field but these indexes are not used according to the query explain.
Is there a good alternative way to do this?
(i'm using postgresql 9)
Your real problem is a bad schema design, not the index. The properties should be placed in a different table and your current table should link to that table using a many to many relation.
The BIT datatype might also be of use, just check the manual.
Create a multicolumn index on properties which are always or almost always queried. Or several multicolumn indexes if needed.
The alternative, when you do not query the same properties almost always, is to make a tsvector column with words describing your data, maintained using trigger, for example
prop0=true
prop1=false
prop2=4
would be
'propzero nopropone proptwo4'::tsvector
index it using GIN and then use full text search for searching:
where tsv ## 'popzero & nopropone & proptwo4'::tsquery
An index is only used if it actually speeds up the query which is not necessarily always the case. Especially with smallish tables (say thousands of rows) a full table scan ("seq scan" in the Postgres execution plan) might indeed be a lot faster.
How many rows did the table have when you tried the statement?
How did the query look like? Maybe there are other conditions that prevent the index usage.
Did you analyze the table to have up-to-date statistics?