How multiple indexes in postgres work on the same column

How multiple indexes in postgres work on the same column - postgresql

I was wondering I'm not really sure how multiple indexes would work on the same column.
So lets say I have an id column and a country column. And on those I have an index on id and another index on id and country. When I do my query plan it looks like its using both those indexes. I was just wondering how that works? Can I force it to use just the id and country index.
Also is it bad practice to do that? When is it a good idea to index the same column multiple times?

It is common to have indexes on both (id) and (country,id), or alternatively (country) and (country,id) if you have queries that benefit from each of them. You might also have (id) and (id, country) if you want the "covering" index on (id,country) to support index only scans, but still need the stand along to enforce a unique constraint.
In theory you could just have (id,country) and still use it to enforce uniqueness of id, but PostgreSQL does not support that at this time.
You could also sensibly have different indexes on the same column if you need to support different collations or operator classes.
If you want to force PostgreSQL to not use a particular index to see what happens with it gone, you can drop it in a transactions then roll it back when done:
BEGIN; drop index table_id_country_idx; explain analyze select * from ....; ROLLBACK;

Related

What is the best type of index to use on a materialized view in PostgreSQL

I want to increase the performance of queries on table in Postgrsql db i need to use.
CREATE TABLE mytable (
article_number text NOT NULL,
description text NOT null,
feature text NOT null,
...
);
The table is just in example but the thing is that there are no unique columns. article_number is the one used in the where clause but for example article_number='000.002-00A' can have from 3 to 300 rows. The total number of rows is 102,165,920. What would be the best index to use for such a situation?
I know there B-tree, Hash, GiST, SP-GiST, GIN and BRIN index types in postgres but which one would be the best for this.

If the lookups are filtered on article_number then an index should be created on that. Not quite sure what else you're asking.
The default index is a btree and that'll work fine. If you're only checking for strict equality hash would also be an option but it has issues before Postgres 10, so I wouldn't recommend it.
Other index types are for more complicated forms of querying or custom data types, there's no reason to even consider them if you just want to perform equality filters.
btrees are useful for strict equality and range searches (which includes prefix search e.g. foo like 'bar%')
hash indexes are useful only for strict equality they can be faster & smaller than btrees in some rare cases
GIN indexes are useful when you have multiple index values per row (arrays, json, gis, some FTS cases)
GiST indexes are useful for more complex querying than equality and range (geom/gis, FTS)
I've never looked into BRIN index so I'm not sure what their use case would be. But my understanding is that there's no case to even consider it before you have huge numbers of rows.
Basically, use btree unless you know that you can not.

Are these indexes doing the same thing in respect to customer_id?

I'm pretty new to PostgreSQL so apologies if I'm asking the obvious.
I've got a table called customer_products. It contains the following two indexes:
CREATE INDEX customer_products_customer_id
ON public.customer_products USING btree (customer_id)
CREATE UNIQUE INDEX customer_products_customer_id_product_id
ON public.customer_products USING btree (customer_id, product_id)
Are they both doing the same thing in respect to customer_id or do they function in a different way? I'm not sure if I should leave them or remove customer_products_customer_id.

There is nothing that the first index can do that the second cannot, so you should drop the first index.
The only advantage of the first index over the second when it comes to queries whose WHERE (or ORDER BY) clause involves customer_id only is that the index is smaller. That makes a range scan over many index entries somewhat faster.
The price for an extra index in terms of size and data modification speed usually outweighs that advantage. In a read-only data warehouse where I have a query that profits significantly I may be tempted to keep both indexes, otherwise I wouldn't.
You should definitely not drop the UNIQUE index, because it has a valuable use that has nothing to do with performance: it prevents the table from containing two rows that have the save values for the indexed columns. If that is what you want to guarantee, a UNIQUE index will make sure that your data keep in good shape.
Side remark: even though the effect is the same, it is better if the table has a unique constraint (which is backed by a unique index) than just having the index. If nothing else, it documents the purpose better.

Understanding indexes and performance as they relate to indexed column and non-indexed column data in the same row

I have some tables that are around 100 columns wide. I haven't normalized them because to put it back together would require almost 3 dozen joins and am not sure it would perform any better... haven't tested it yet (I will) so can't say for sure.
Anyway, that really isn't the question. I have been indexing columns in these tables that I know will be pulled frequently, so something like 50 indexes per table.
I got to thinking though. These columns will never be pulled by themselves and are meaningless without the primary key (basically an item number). The PK will always be used for the join and even in simple SELECT queries, it will have to be a specified column so the data makes sense.
That got me thinking further about indexes and how they work. As I understand them the locations of a values are committed to memory for that column so it is quickly found in a query.
For example, if you have:
SELECT itemnumber, expdate
FROM items;
And both itemnumber and expdate are indexed, is that excessive and really adding any benefit? Is it sufficient to just index itemnumber and the index will know that expdate, or anything else that is queried for that item, is on the same row?
Secondly, if multiple columns constitute a primary key, should the index include them grouped together, or is individually sufficient?
For example,
CREATE INDEX test_index ON table (pk_col1, pk_col2, pk_col3);
vs.
CREATE INDEX test_index1 ON table (pk_col1);
CREATE INDEX test_index2 ON table (pk_col2);
CREATE INDEX test_index3 ON table (pk_col3);
Thanks for clearing that up in advance!

Uh oh, there is a mountain of basics that you still have to learn.
I'd recommend that you read the PostgreSQL documentation and the excellent book “SQL Performance Explained”.
I'll give you a few pointers to get you started:
Whenever you create a PRIMARY KEY or UNIQUE constraint, PostgreSQL automatically creates a unique index over all the columns of that constraint. So you don't have to create that index explicitly (but if it is a multicolumn index, it sometimes is useful to create another index on any but the first column).
Indexes are relevant to conditions in the WHERE clause and the GROUP BY clause and to some extent for table joins. They are irrelevant for entries in the SELECT list. An index provides an efficient way to get the part of a table that satisfies a certain condition; an (unsorted) access to all rows of a table will never benefit from an index.
Don't sprinkle your schema with indexes randomly, since indexes use space and make all data modification slow.
Use them where you know that they will do good: on columns on which a foreign key is defined, on columns that appear in WHERE clauses and contain many different values, on columns where your examination of the execution plan (with EXPLAIN) suggests that you can expect a performance benefit.

Indexing jsonb data for pattern matching searches

This is a follow-up to:
Pattern matching on jsonb key/value
I have a table as follows
CREATE TABLE "PreStage".transaction (
transaction_id serial NOT NULL,
transaction jsonb
CONSTRAINT pk_transaction PRIMARY KEY (transaction_id)
);
The content in my transaction jsonb column looks like
{"ADDR": "abcd", "CITY": "abcd", "PROV": "",
"ADDR2": "",
"ADDR3": "","CNSNT": "Research-NA", "CNTRY": "NL", "EMAIL": "#.com",
"PHONE": "12345", "HCO_NM": "HELLO", "UNQ_ID": "",
"PSTL_CD": "1234", "HCP_SR_NM": "", "HCP_FST_NM": "",
"HCP_MID_NM": ""}
I need search query like:
SELECT transaction AS data FROM "PreStage".transaction
WHERE transaction->>'HCP_FST_NM' ILIKE '%neer%';
But I need to give my user flexibility to search any key/value on the fly.
An answer to the previous question suggested to create index as:
CREATE INDEX idxgin ON "PreStage".transaction
USING gin ((transaction->>'HCP_FST_NM') gin_trgm_ops);
Which works, but I wanted to index other keys, too. Hence was trying something like:
CREATE INDEX idxgin ON "PreStage".transaction USING gin
((transaction->>'HCP_FST_NM'),(transaction->>'HCP_LST_NM') gin_trgm_ops)
Which doesn't work. What would be the best indexing approach here or will I have to create a separate index for each key in which case the approach will not be generic if a new key/value pair is added to the data.

The syntax error that #jjanes pointed out aside,
for a mix of some popular keys (contained in many rows and / or searched often) plus many more rare keys (contained in few rows and / or rarely searched, new keys might pop up dynamically) I suggest this combination:
Trigram indexes for popular keys
It does not seem like you are going to combine multiple keys in one search often, and a single index with many keys would grow very big and slow. So I would create a separate index for each popular key. Make it a partial index for keys that are not contained in most rows:
CREATE INDEX trans_idxgin_HCP_FST_NM ON transaction -- contained in most rows
USING gin ((transaction->>'HCP_FST_NM') gin_trgm_ops);
CREATE INDEX trans_idxgin_ADDR ON transaction -- not in most rows
USING gin ((transaction->>'ADDR') gin_trgm_ops)
WHERE transaction ? 'ADDR';
Etc. Like detailed in my previous answer:
Pattern matching on jsonb key/value
Basic jsonb GIN index
If you have many different keys and / or new keys are added dynamically, you can cover the rest with a basic (default) jsonb_ops GIN index:
CREATE INDEX trans_idxgin ON "PreStage".transaction USING gin (transaction);
Among other things, this supports the search for keys. But you cannot use it for pattern matching on values.
What's the proper index for querying structures in arrays in Postgres jsonb?
Query
Combine predicates addressing both indexes:
SELECT transaction AS data
FROM "PreStage".transaction
WHERE transaction->>'HCP_FST_NM' ILIKE '%neer%'
AND transaction ? 'HCP_FST_NM'; -- even if that seems redundant.
The second condition happens to match our partial indexes as well.
So either there is a specific trigram index for the given (popular / common) key, or there is at least an index to find (the few) rows containing the rare key - and then filter for matching values. The same query should give you the best of both worlds.
Be sure to run the latest version of Postgres, there have been various updates for cost estimates recently. It will be crucial that Postgres works with good estimates and current table statistics to choose the best query plan.

There is no built in index that does precisely what you want, searching for an exact key and a corresponding wild-card matching value, without specifying ahead of time which key(s) to use. It should be possible to create an extension which would do this, but it would be an awful lot of work, and I don't know of any that exist.
Your best option that works out of the box might be to cast the jsonb to text and index that text:
create index on transaction using gin ((transaction::text) gin_trgm_ops);
And then add a secondary condition to your query:
SELECT transaction AS data FROM transaction
WHERE transaction->>'HCP_FST_NM' ILIKE '%neer%'
AND transaction::text ilike '%neer%';
Now it can use the index to find anything containing 'neer', and then later re-check that 'neer' occurs in the value for the 'HCP_FST_NM' key, as opposed to just some other place in the JSONB.
If your query word occurs in lots of places other than in the value of the desired key, then this might not give you very good performance. For example, if someone searched for:
transaction->>'EMAIL' ilike '%ADDR%'
AND transaction::text ilike '%ADDR%';
The the index would return every row, assuming all records have the same structure as what you show, because every row contains 'ADDR' because used as a key. Then every row would fail the other condition check, but only after doing a lot of work.

alternative to bitmap index in postgresql

I have a table with hundreds of millions rows with schema like below.
tabe AA {
id integer primay key,
prop0 boolean not null,
prop1 boolean not null,
prop2 smallint not null,
...
}
The each "property" field (prop0, prop1, ...) has a small number of distinct values. And I usually query to find "id" from the given conditions of properties fields. I think Bitmap index is best for this query. But postgresql seems not support bitmap index.
I tried b-tree index on each field but these indexes are not used according to the query explain.
Is there a good alternative way to do this?
(i'm using postgresql 9)

Your real problem is a bad schema design, not the index. The properties should be placed in a different table and your current table should link to that table using a many to many relation.
The BIT datatype might also be of use, just check the manual.

Create a multicolumn index on properties which are always or almost always queried. Or several multicolumn indexes if needed.
The alternative, when you do not query the same properties almost always, is to make a tsvector column with words describing your data, maintained using trigger, for example
prop0=true
prop1=false
prop2=4
would be
'propzero nopropone proptwo4'::tsvector
index it using GIN and then use full text search for searching:
where tsv ## 'popzero & nopropone & proptwo4'::tsquery

An index is only used if it actually speeds up the query which is not necessarily always the case. Especially with smallish tables (say thousands of rows) a full table scan ("seq scan" in the Postgres execution plan) might indeed be a lot faster.
How many rows did the table have when you tried the statement?
How did the query look like? Maybe there are other conditions that prevent the index usage.
Did you analyze the table to have up-to-date statistics?

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse