PostgreSQL: OK to use HASH exclusion constraint for uniqueness?

Since hashes are smaller than lengthy text, it seems to me that they could be preferred to b-trees for ensuring column uniqueness.
For the sole purpose of ensuring uniqueness, is there any reason the following isn't OK in PG 10?
CREATE TABLE test (
path ltree,
EXCLUDE USING HASH ((path::text) WITH =)
);
I assume hash collisions are internally dealt with. Would be useless otherwise.
I will use a GiST index for query enhancement.

I think quoting the manual on this sums it up:
Although it's allowed, there is little point in using B-tree or hash
indexes with an exclusion constraint, because this does nothing that
an ordinary unique constraint doesn't do better. So in practice the access method will always be GiST or SP-GiST.
All the more since you want to create a GiST index anyway. An exclusion constraint declared USING GIST automatically creates a matching GiST index as an implementation detail. There is no point in maintaining an additional, less efficient hash index that queries will not even use.
For simple uniqueness (WITH =) a plain UNIQUE btree index is more efficient. If your keys are very long, consider a unique index on a hash expression (using any immutable function) to reduce size. Like:
CREATE UNIQUE INDEX test_path_hash_uni_idx ON test (my_hash_func(path));
Related:
How does PostgreSQL enforce the UNIQUE constraint / what type of index does it use?
md5() would be a simple option as hash function. Related:
Convert hex in text representation to decimal number
Would index lookup be noticeably faster with char vs varchar when all values are 36 chars
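As a rough sketch of the hash-expression approach, assuming the test table from the question and using the built-in md5() in place of the placeholder my_hash_func, the unique index could look like the following. The index name and the sample path are made up for illustration. The cast to uuid is an optional extra, not part of the answer above, that stores the 32 hex digits as 16 bytes.

```sql
-- Assumes the ltree extension and the "test" table from the question.
CREATE EXTENSION IF NOT EXISTS ltree;
CREATE TABLE IF NOT EXISTS test (path ltree);

-- Unique index on a fixed-size digest of the key instead of the key itself.
-- md5() returns 32 hex digits as text; casting to uuid packs them into 16 bytes.
CREATE UNIQUE INDEX test_path_md5_uni_idx ON test ((md5(path::text)::uuid));

INSERT INTO test VALUES ('Top.Science.Astronomy');

-- A second row with the same path produces the same digest,
-- so this statement is rejected with a unique violation.
INSERT INTO test VALUES ('Top.Science.Astronomy');
```

A lookup can only use this index if it repeats the indexed expression, for example WHERE md5(path::text)::uuid = md5('Top.Science.Astronomy')::uuid.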
Before Postgres 10 the use of hash indexes was discouraged. But with the latest updates, this has improved. Robert Haas (core developer) sums it up in a blog entry:
PostgreSQL's Hash Indexes Are Now Cool
CREATE INDEX test_path_hash_idx ON test USING HASH (path);
Alas (missed that in my draft), the access method "hash" does not support unique indexes, yet. So I would still go with the unique index on a hash expression above. (But any hash function that reduces information can produce collisions, so two distinct keys could be rejected as duplicates, which may or may not be an issue.)

Related

What is the best type of index to use on a materialized view in PostgreSQL

I want to improve the performance of queries on a table in a PostgreSQL database that I need to use.
CREATE TABLE mytable (
article_number text NOT NULL,
description text NOT NULL,
feature text NOT NULL,
...
);
The table is just an example, but the point is that there are no unique columns. article_number is the column used in the WHERE clause, but a single value such as article_number='000.002-00A' can match anywhere from 3 to 300 rows. The total number of rows is 102,165,920. What would be the best index to use in such a situation?
I know there are B-tree, Hash, GiST, SP-GiST, GIN and BRIN index types in Postgres, but which one would be best here?
If the lookups are filtered on article_number then an index should be created on that. Not quite sure what else you're asking.
The default index is a btree and that'll work fine. If you're only checking for strict equality, hash would also be an option, but it has issues before Postgres 10, so I wouldn't recommend it.
Other index types are for more complicated forms of querying or custom data types; there's no reason to even consider them if you just want to perform equality filters.
btrees are useful for strict equality and range searches (which includes prefix searches, e.g. foo like 'bar%')
hash indexes are useful only for strict equality; they can be faster & smaller than btrees in some rare cases
GIN indexes are useful when you have multiple index values per row (arrays, json, gis, some FTS cases)
GiST indexes are useful for more complex querying than equality and range (geom/gis, FTS)
I've never looked into BRIN indexes, so I'm not sure what their use case would be. But my understanding is that there's no reason to even consider them before you have huge numbers of rows.
Basically, use btree unless you know that you cannot.
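For the table in the question, that advice could be sketched as follows. The index name is made up, and the table is assumed to exist as described.

```sql
-- Plain b-tree index (the default access method) on the filter column.
CREATE INDEX mytable_article_number_idx ON mytable (article_number);

-- Refresh planner statistics, then check whether the index is chosen.
ANALYZE mytable;

EXPLAIN
SELECT *
FROM   mytable
WHERE  article_number = '000.002-00A';
```

On a table with around 100 million rows that is in active use, CREATE INDEX CONCURRENTLY may be worth considering so that writes are not blocked while the index is built.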

Possible to use a BRIN Index on a Primary Key in PostgreSQL

I was reading up on the BRIN index within PostgreSQL, and it seems to be beneficial to many of the tables we use.
That said, it applies nicely to a column which is already the primary key, in which case adding a separate index would negate part of the benefit of the index, which is space savings.
A PK is implicitly indexed, is it not? On that note, can it be done using a BRIN instead of a Btree, assuming the Btree is also implicit?
I tried this, and as expected it did not work:
create table foo (
id integer,
constraint foo_pk primary key using BRIN (id)
)
So, two questions:
Can a BRIN index be used on a PK?
If not, will the planner pick the more appropriate of the two if I have both a PK and a separate BRIN index (if performance means more to me than space)
And it's of course possible that my understanding of this is incomplete, in which case I would appreciate any enlightenment.
Primary keys are a logical combination of NOT NULL and UNIQUE, therefore only an index type that supports uniqueness can be used.
From the PostgreSQL documentation (currently version 13):
Only B-tree currently supports unique indexes.
I'm not so sure BRIN would be faster than B-tree. It's a lot more space-efficient, but the fact that it's lossy and requires a secondary verification pass erodes any potential speed advantages. Once you are locked into having your B-tree primary key index, there's not much point to making a secondary overlapping BRIN index.
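To see the trade-off on your own data, a minimal sketch (the BRIN index name is made up, and the range in the last query is arbitrary) is to keep the b-tree primary key, add a separate BRIN index, and then compare sizes and plans:

```sql
CREATE TABLE foo (
    id integer,
    CONSTRAINT foo_pk PRIMARY KEY (id)   -- backed by an implicit unique b-tree index
);

-- Separate BRIN index on the same column; it is not unique and cannot back the PK.
CREATE INDEX foo_id_brin_idx ON foo USING brin (id);

-- Compare the on-disk size of the two indexes.
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM   pg_class
WHERE  relname IN ('foo_pk', 'foo_id_brin_idx');

-- Check which index the planner picks for a range query.
EXPLAIN
SELECT * FROM foo WHERE id BETWEEN 1000 AND 2000;
```

The comparison is only meaningful once the table holds a realistic amount of data and has been analyzed.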

Are these indexes doing the same thing in respect to customer_id?

I'm pretty new to PostgreSQL so apologies if I'm asking the obvious.
I've got a table called customer_products. It contains the following two indexes:
CREATE INDEX customer_products_customer_id
ON public.customer_products USING btree (customer_id);
CREATE UNIQUE INDEX customer_products_customer_id_product_id
ON public.customer_products USING btree (customer_id, product_id);
Are they both doing the same thing in respect to customer_id or do they function in a different way? I'm not sure if I should leave them or remove customer_products_customer_id.
There is nothing that the first index can do that the second cannot, so you should drop the first index.
The only advantage of the first index over the second when it comes to queries whose WHERE (or ORDER BY) clause involves customer_id only is that the index is smaller. That makes a range scan over many index entries somewhat faster.
The price for an extra index in terms of size and data modification speed usually outweighs that advantage. In a read-only data warehouse where I have a query that profits significantly I may be tempted to keep both indexes, otherwise I wouldn't.
You should definitely not drop the UNIQUE index, because it has a valuable use that has nothing to do with performance: it prevents the table from containing two rows that have the same values for the indexed columns. If that is what you want to guarantee, a UNIQUE index will make sure that your data stay in good shape.
Side remark: even though the effect is the same, it is better if the table has a unique constraint (which is backed by a unique index) than just having the index. If nothing else, it documents the purpose better.
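Put together, the two suggestions could look like this. This is a sketch that reuses the index names from the question; the constraint name is simply chosen to match the existing index.

```sql
-- The composite unique index already serves lookups on customer_id alone.
DROP INDEX public.customer_products_customer_id;

-- Promote the existing unique index to a named UNIQUE constraint
-- instead of building a second index.
ALTER TABLE public.customer_products
    ADD CONSTRAINT customer_products_customer_id_product_id
    UNIQUE USING INDEX customer_products_customer_id_product_id;
```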

Using postgresql UNIQUE INDEX and FUNCTIONAL INDEX together

I have a postgresql table like below.
CREATE TABLE "user" (
"id" integer NOT NULL,
"hash" char(40) NOT NULL,
"username" char(255) NOT NULL,
PRIMARY KEY ("id"),
UNIQUE ("hash"));
However, since hash is 40 characters long, I want to create a functional index like the one below to reduce the memory requirement.
CREATE INDEX CONCURRENTLY ON "user" (substr(hash, 1, 20));
Is it okay to do it like that, or will it just generate another useless index? How can I make sure that the UNIQUE index only indexes the first 20 characters of my hash?
Thanks.
If you need the hash to be unique, you must have a unique index on the whole thing. Otherwise you'll get unique violations for hashes that differ only in the last 20 chars.
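A small demonstration of that failure mode, using a throwaway table and made-up values:

```sql
CREATE TABLE prefix_demo (hash text NOT NULL);

-- Unique index on the first 20 characters only.
CREATE UNIQUE INDEX prefix_demo_left20_uni ON prefix_demo (left(hash, 20));

-- First value: 20 x 'a' followed by 20 x 'b'.
INSERT INTO prefix_demo VALUES (repeat('a', 20) || repeat('b', 20));

-- Different 40-character value, but the same first 20 characters:
-- this insert fails with a unique violation.
INSERT INTO prefix_demo VALUES (repeat('a', 20) || repeat('c', 20));
```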
You can create a non-unique index on the left 20 chars, like you showed:
CREATE INDEX ON "user" (left(hash, 20));
But it probably serves no useful purpose. PostgreSQL will create a unique index on the whole 40 char hash automatically when you declare it as a UNIQUE constraint. You cannot drop this index without dropping the constraint. So you're stuck with the full size index if you want to enforce uniqueness of hashes. Given that, it's unlikely that the functional index will be of much benefit. Even in queries like:
SELECT ...
FROM "user"
WHERE left(hash, 20) = left($1, 20) AND hash = $1
where you might think you're saving time by using a smaller index to do a quick check first, in reality it's fairly likely that PostgreSQL will ignore the functional index and prefer the full index since it's more selective.
I'm not totally clear on what you're trying to achieve, but if it's doing a partial or functional index to implement a unique constraint, you can't do that.
Also, store hash as bytea and use the index expression left(hash, 20). Or maybe 10, if you're currently storing as a 2-chars-per-byte hex representation.
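A sketch of the bytea variant, with hypothetical table and index names and a sample 40-digit hex value. As far as I know, left() is only defined for text, so the prefix expression here uses substring() instead. That is an assumption on my part, not something stated in the answer.

```sql
CREATE TABLE user_bytea (
    id       integer PRIMARY KEY,
    -- 40 hex digits decode to 20 raw bytes.
    hash     bytea NOT NULL UNIQUE CHECK (octet_length(hash) = 20),
    username text  NOT NULL
);

-- decode(..., 'hex') converts the hex text representation to bytea.
INSERT INTO user_bytea
VALUES (1, decode('da39a3ee5e6b4b0d3255bfef95601890afd80709', 'hex'), 'alice');

-- Optional non-unique index on the first 10 bytes (equivalent to 20 hex digits).
CREATE INDEX user_bytea_hash_prefix_idx
    ON user_bytea ((substring(hash FROM 1 FOR 10)));
```

The UNIQUE constraint still indexes the full 20 bytes, which is already half the size of the 40-character text representation.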

How to create a primary key using the hash method in postgresql

Is there any way to create a primary key using the hash method? Neither of the following statements works:
oid char(30) primary key using hash
primary key(oid) using hash
I assume you meant to use the hash index method / type.
Primary keys are constraints. Some constraints create indexes in order to work properly (though this should not be relied upon). For example, a UNIQUE constraint creates a unique index. Note that only B-tree currently supports unique indexes. The PRIMARY KEY constraint is a combination of the UNIQUE and NOT NULL constraints, so (currently) it only supports B-tree.
You can set up a hash index too, if you want (besides the PRIMARY KEY constraint) -- but you cannot make that unique.
CREATE INDEX name ON table USING hash (column);
But if you do this, you should be aware of some limitations of hash indexes in versions before PostgreSQL 10:
Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash if there were unwritten changes. Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. For these reasons, hash index use is presently discouraged.
Also:
Currently, only the B-tree, GiST and GIN index methods support multicolumn indexes.
Note: Unfortunately, oid is not the best name for a column in PostgreSQL, because it can also be a name for a system column and type.
Note 2: The char(n) type is also discouraged. You can use varchar or text instead, with a CHECK constraint -- or (if the id is so uuid-like) the uuid type itself.
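Pulling the notes together, one possible shape is shown below. It is only a sketch: the table, column, and index names are hypothetical, chosen to avoid oid, and the 30-character limit is carried over from the question.

```sql
CREATE TABLE item (
    -- The primary key is enforced by an implicit unique b-tree index.
    -- text plus a CHECK constraint replaces char(30).
    item_id text PRIMARY KEY CHECK (char_length(item_id) <= 30)
);

-- Additional, non-unique hash index for equality lookups.
-- The crash-safety and replication warning quoted above applies to
-- versions before PostgreSQL 10.
CREATE INDEX item_item_id_hash_idx ON item USING hash (item_id);
```

Whether the extra hash index pays off next to the b-tree that the primary key already provides is something to measure rather than assume.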