Using PostgreSQL UNIQUE INDEX and FUNCTIONAL INDEX together

I have a postgresql table like below.
CREATE TABLE "user" (
"id" integer NOT NULL,
"hash" char(40) NOT NULL,
"username" char(255) NOT NULL,
PRIMARY KEY ("id"),
UNIQUE ("hash"));
However, since the hash is 40 characters long, I want to use a functional index like the one below to reduce the memory requirement.
CREATE INDEX CONCURRENTLY ON "user" (substr(hash, 1, 20));
Is it okay to do it like that, or will it just generate another useless index? How can I make sure that the UNIQUE index only indexes the first 20 characters of my hash?
Thanks.

If you need the hash to be unique, you must have a unique index on the whole thing. Otherwise you'll get unique violations for hashes that differ only in the last 20 chars.
You can create a non-unique index on the left 20 chars, like you showed:
CREATE INDEX ON "user" (left(hash, 20));
But it probably serves no useful purpose. PostgreSQL will create a unique index on the whole 40-char hash automatically when you declare it as a UNIQUE constraint. You cannot drop this index without dropping the constraint. So you're stuck with the full-size index if you want to enforce uniqueness of hashes. Given that, it's unlikely that the functional index will be of much benefit. Even in queries like:
SELECT ...
FROM "user"
WHERE left(hash, 20) = left($1, 20) AND hash = $1
where you might think you're saving time by using a smaller index to do a quick check first, in reality it's fairly likely that PostgreSQL will ignore the functional index and prefer the full index since it's more selective.
I'm not totally clear on what you're trying to achieve, but if it's doing a partial or functional index to implement a unique constraint, you can't do that.
Also, consider storing the hash as bytea and using the index expression substr(hash, 1, 20) (left() does not accept bytea). Or maybe 10 bytes, if you're currently storing a 2-chars-per-byte hex representation, since the raw value is only half as long.
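A minimal sketch of that bytea variant (the layout follows the question; index and column names are illustrative, and substr(hash, 1, 10) covers what were the first 20 hex characters):
CREATE TABLE "user" (
    "id" integer PRIMARY KEY,
    "hash" bytea NOT NULL UNIQUE,  -- 20 raw bytes instead of 40 hex chars
    "username" text NOT NULL
);
-- optional prefix index; as noted above, it probably serves no purpose:
CREATE INDEX CONCURRENTLY user_hash_prefix_idx ON "user" (substr("hash", 1, 10));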


What is the best type of index to use on a materialized view in PostgreSQL

I want to improve the performance of queries on a table in the PostgreSQL database I need to use.
CREATE TABLE mytable (
    article_number text NOT NULL,
    description text NOT NULL,
    feature text NOT NULL,
    ...
);
The table is just an example, but the point is that there are no unique columns. article_number is the one used in the WHERE clause, but, for example, article_number = '000.002-00A' can match anywhere from 3 to 300 rows. The total number of rows is 102,165,920. What would be the best index to use for such a situation?
I know there are B-tree, Hash, GiST, SP-GiST, GIN and BRIN index types in Postgres, but which one would be best for this?
If the lookups are filtered on article_number, then an index should be created on that column. I'm not quite sure what else you're asking.
The default index is a btree, and that will work fine here. If you're only checking for strict equality, a hash index would also be an option, but it has issues before Postgres 10, so I wouldn't recommend it.
Other index types are for more complicated forms of querying or custom data types; there's no reason to even consider them if you just want to perform equality filters.
btrees are useful for strict equality and range searches (which includes prefix searches, e.g. foo LIKE 'bar%').
hash indexes are useful only for strict equality; they can be faster and smaller than btrees in some rare cases.
GIN indexes are useful when you have multiple index values per row (arrays, json, GIS, some FTS cases).
GiST indexes are useful for more complex querying than equality and range (geometry/GIS, FTS).
I've never looked into BRIN indexes, so I'm not sure what their use case would be. But my understanding is that there's no reason to even consider them before you have huge numbers of rows.
Basically, use btree unless you know that you cannot.
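In this case that boils down to a single statement (a sketch against the example table; the index name is arbitrary):
-- btree is the default access method, so no USING clause is needed:
CREATE INDEX mytable_article_number_idx ON mytable (article_number);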

Are these indexes doing the same thing in respect to customer_id?

I'm pretty new to PostgreSQL so apologies if I'm asking the obvious.
I've got a table called customer_products. It contains the following two indexes:
CREATE INDEX customer_products_customer_id
    ON public.customer_products USING btree (customer_id);
CREATE UNIQUE INDEX customer_products_customer_id_product_id
    ON public.customer_products USING btree (customer_id, product_id);
Are they both doing the same thing in respect to customer_id or do they function in a different way? I'm not sure if I should leave them or remove customer_products_customer_id.
There is nothing that the first index can do that the second cannot, so you should drop the first index.
The only advantage of the first index over the second, for queries whose WHERE (or ORDER BY) clause involves customer_id alone, is that the index is smaller. That makes a range scan over many index entries somewhat faster.
The price of an extra index in terms of size and data modification speed usually outweighs that advantage. In a read-only data warehouse where some query profits significantly from the smaller index I might be tempted to keep both; otherwise I wouldn't.
You should definitely not drop the UNIQUE index, because it has a valuable use that has nothing to do with performance: it prevents the table from containing two rows that have the same values for the indexed columns. If that is what you want to guarantee, a UNIQUE index will make sure that your data stay in good shape.
Side remark: even though the effect is the same, it is better if the table has a unique constraint (which is backed by a unique index) than just having the index. If nothing else, it documents the purpose better.
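Putting both points together, the cleanup could look like this (a sketch; the constraint name is illustrative, and ADD CONSTRAINT ... USING INDEX renames the index to match the constraint name):
DROP INDEX customer_products_customer_id;
ALTER TABLE public.customer_products
    ADD CONSTRAINT customer_products_customer_id_product_id_key
    UNIQUE USING INDEX customer_products_customer_id_product_id;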

PostgreSQL OK to use HASH exclude constraint for uniqueness?

Since hashes are smaller than lengthy text, it seems to me that they could be preferred to b-trees for ensuring column uniqueness.
For the sole purpose of ensuring uniqueness, is there any reason the following isn't OK in PG 10?
CREATE TABLE test (
    path ltree,
    EXCLUDE USING HASH ((path::text) WITH =)
);
I assume hash collisions are dealt with internally; it would be useless otherwise.
I will use a GiST index for query enhancement.
I think quoting the manual on this sums it up:
Although it's allowed, there is little point in using B-tree or hash indexes with an exclusion constraint, because this does nothing that an ordinary unique constraint doesn't do better. So in practice the access method will always be GiST or SP-GiST.
All the more since you want to create a GiST index anyway. The exclusion constraint with USING GIST will create a matching GiST index as implementation detail automatically. No point in maintaining another, inefficient hash index not even being used in queries.
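For illustration, the GiST variant the quote alludes to would look like this (a sketch; ltree ships a GiST operator class that includes =, so no cast is needed):
CREATE TABLE test (
    path ltree,
    EXCLUDE USING GIST (path WITH =)
);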
For simple uniqueness (WITH =) a plain UNIQUE btree index is more efficient. If your keys are very long, consider a unique index on a hash expression (using any immutable function) to reduce size. Like:
CREATE UNIQUE INDEX test_path_hash_uni_idx ON test (my_hash_func(path));
Related:
How does PostgreSQL enforce the UNIQUE constraint / what type of index does it use?
md5() would be a simple option as hash function.
Convert hex in text representation to decimal number
Would index lookup be noticeably faster with char vs varchar when all values are 36 chars
Before Postgres 10 the use of hash indexes was discouraged. But with the latest updates, this has improved. Robert Haas (core developer) sums it up in a blog entry:
PostgreSQL's Hash Indexes Are Now Cool
CREATE INDEX test_path_hash_idx ON test USING HASH (path);
Alas (I missed that in my first draft), the access method "hash" does not support unique indexes yet. So I would still go with the unique index on a hash expression above. (But note that no hash function that reduces information can fully guarantee uniqueness of the key - which may or may not be an issue.)
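A concrete version of the unique index on a hash expression, assuming md5() as the hash function (the index name is illustrative):
-- md5() is immutable, so it is allowed in an index expression:
CREATE UNIQUE INDEX test_path_hash_uni_idx ON test (md5(path::text));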

Indexing jsonb data for pattern matching searches

This is a follow-up to:
Pattern matching on jsonb key/value
I have a table as follows
CREATE TABLE "PreStage".transaction (
transaction_id serial NOT NULL,
transaction jsonb
CONSTRAINT pk_transaction PRIMARY KEY (transaction_id)
);
The content in my transaction jsonb column looks like
{"ADDR": "abcd", "CITY": "abcd", "PROV": "",
"ADDR2": "",
"ADDR3": "","CNSNT": "Research-NA", "CNTRY": "NL", "EMAIL": "#.com",
"PHONE": "12345", "HCO_NM": "HELLO", "UNQ_ID": "",
"PSTL_CD": "1234", "HCP_SR_NM": "", "HCP_FST_NM": "",
"HCP_MID_NM": ""}
I need search query like:
SELECT transaction AS data FROM "PreStage".transaction
WHERE transaction->>'HCP_FST_NM' ILIKE '%neer%';
But I need to give my user flexibility to search any key/value on the fly.
An answer to the previous question suggested to create index as:
CREATE INDEX idxgin ON "PreStage".transaction
USING gin ((transaction->>'HCP_FST_NM') gin_trgm_ops);
Which works, but I wanted to index other keys, too. Hence I was trying something like:
CREATE INDEX idxgin ON "PreStage".transaction USING gin
((transaction->>'HCP_FST_NM'),(transaction->>'HCP_LST_NM') gin_trgm_ops)
Which doesn't work. What would be the best indexing approach here? Or will I have to create a separate index for each key, in which case the approach will not be generic if a new key/value pair is added to the data?
The syntax error that @jjanes pointed out aside:
for a mix of some popular keys (contained in many rows and/or searched often) plus many more rare keys (contained in few rows and/or rarely searched; new keys might pop up dynamically) I suggest this combination:
Trigram indexes for popular keys
It does not seem like you are going to combine multiple keys in one search often, and a single index with many keys would grow very big and slow. So I would create a separate index for each popular key. Make it a partial index for keys that are not contained in most rows:
CREATE INDEX trans_idxgin_HCP_FST_NM ON transaction -- contained in most rows
USING gin ((transaction->>'HCP_FST_NM') gin_trgm_ops);
CREATE INDEX trans_idxgin_ADDR ON transaction -- not in most rows
USING gin ((transaction->>'ADDR') gin_trgm_ops)
WHERE transaction ? 'ADDR';
Etc. Like detailed in my previous answer:
Pattern matching on jsonb key/value
Basic jsonb GIN index
If you have many different keys and / or new keys are added dynamically, you can cover the rest with a basic (default) jsonb_ops GIN index:
CREATE INDEX trans_idxgin ON "PreStage".transaction USING gin (transaction);
Among other things, this supports the search for keys. But you cannot use it for pattern matching on values.
What's the proper index for querying structures in arrays in Postgres jsonb?
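For instance, the default jsonb_ops GIN index supports key-existence and containment queries like these (a sketch):
SELECT transaction FROM "PreStage".transaction
WHERE transaction ? 'ADDR2';               -- key exists
SELECT transaction FROM "PreStage".transaction
WHERE transaction @> '{"CNTRY": "NL"}';    -- contains the key/value pair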
Query
Combine predicates addressing both indexes:
SELECT transaction AS data
FROM "PreStage".transaction
WHERE transaction->>'HCP_FST_NM' ILIKE '%neer%'
AND transaction ? 'HCP_FST_NM'; -- even if that seems redundant.
The second condition happens to match our partial indexes as well.
So either there is a specific trigram index for the given (popular / common) key, or there is at least an index to find (the few) rows containing the rare key - and then filter for matching values. The same query should give you the best of both worlds.
Be sure to run the latest version of Postgres; there have been various updates to cost estimates recently. It will be crucial that Postgres works with good estimates and current table statistics to choose the best query plan.
There is no built in index that does precisely what you want, searching for an exact key and a corresponding wild-card matching value, without specifying ahead of time which key(s) to use. It should be possible to create an extension which would do this, but it would be an awful lot of work, and I don't know of any that exist.
Your best option that works out of the box might be to cast the jsonb to text and index that text:
create index on transaction using gin ((transaction::text) gin_trgm_ops);
And then add a secondary condition to your query:
SELECT transaction AS data FROM transaction
WHERE transaction->>'HCP_FST_NM' ILIKE '%neer%'
AND transaction::text ilike '%neer%';
Now it can use the index to find anything containing 'neer', and then later re-check that 'neer' occurs in the value for the 'HCP_FST_NM' key, as opposed to just some other place in the JSONB.
If your query word occurs in lots of places other than in the value of the desired key, then this might not give you very good performance. For example, if someone searched for:
transaction->>'EMAIL' ilike '%ADDR%'
AND transaction::text ilike '%ADDR%';
Then the index would return every row (assuming all records have the same structure as what you show), because every row contains 'ADDR' used as a key. Every row would then fail the other condition check, but only after doing a lot of work.

How to create a primary key using the hash method in postgresql

Is there any way to create a primary key using the hash method? Neither of the following statements work:
oid char(30) primary key using hash
primary key(oid) using hash
I assume you meant to use the hash index method / type.
Primary keys are constraints. Some constraints can create index(es) in order to work properly (but this fact should not be relied upon). For example, a UNIQUE constraint will create a unique index. Note that only B-tree currently supports unique indexes. The PRIMARY KEY constraint is a combination of the UNIQUE and the NOT NULL constraints, so (currently) it only supports B-tree.
You can set up a hash index too, if you want (besides the PRIMARY KEY constraint) -- but you cannot make that unique.
CREATE INDEX name ON table USING hash (column);
But, if you are willing to do this, you should be aware that there are some limitations on hash indexes (up until PostgreSQL 10):
Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash if there were unwritten changes. Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. For these reasons, hash index use is presently discouraged.
Also:
Currently, only the B-tree, GiST and GIN index methods support multicolumn indexes.
Note: Unfortunately, oid is not the best name for a column in PostgreSQL, because it can also be a name for a system column and type.
Note 2: The char(n) type is also discouraged. You can use varchar or text instead, with a CHECK constraint -- or (if the id is uuid-like) the uuid type itself.
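Combining these notes, a hedged sketch of the suggested setup (table and column names are illustrative; the id column is renamed to avoid the oid clash and typed uuid):
CREATE TABLE items (
    item_id uuid PRIMARY KEY  -- backed by an implicit unique B-tree index
);
-- an additional, non-unique hash index for equality lookups, if it helps:
CREATE INDEX items_item_id_hash_idx ON items USING hash (item_id);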