How to create a primary key using the hash method in PostgreSQL

Is there any way to create a primary key using the hash method? Neither of the following statements works:
oid char(30) primary key using hash
primary key(oid) using hash

I assume you meant to use the hash index method / type.
Primary keys are constraints. Some constraints create indexes in order to work properly (but this fact should not be relied upon). For example, a UNIQUE constraint creates a unique index. Note that only B-tree currently supports unique indexes. The PRIMARY KEY constraint is a combination of the UNIQUE and NOT NULL constraints, so (currently) it only supports B-tree.
You can set up a hash index too, if you want (in addition to the PRIMARY KEY constraint), but you cannot make it unique.
CREATE INDEX name ON table USING hash (column);
But if you want to do this, you should be aware of some limitations of hash indexes. In versions before PostgreSQL 10, the documentation warned:
Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash if there were unwritten changes. Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. For these reasons, hash index use is presently discouraged.
Also:
Currently, only the B-tree, GiST and GIN index methods support multicolumn indexes.
Note: Unfortunately, oid is not the best name for a column in PostgreSQL, because it can also be a name for a system column and type.
Note 2: The char(n) type is also discouraged. You can use varchar or text instead, with a CHECK constraint -- or (if the ID is UUID-like) the uuid type itself.
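As a minimal sketch of the setup described above, assuming a hypothetical table named items with a uuid key column named item_id (both names are illustrative and not from the question), the primary key keeps its implicit B-tree index and the hash index is added separately:

```sql
-- Hypothetical table: the PRIMARY KEY is backed by an implicit unique B-tree index.
CREATE TABLE items (
    item_id uuid PRIMARY KEY,
    label   text NOT NULL
);

-- Optional additional hash index on the same column; it cannot be declared UNIQUE.
CREATE INDEX items_item_id_hash_idx ON items USING hash (item_id);

-- This variant is rejected, because the hash access method
-- does not support unique indexes:
-- CREATE UNIQUE INDEX items_item_id_hash_uni ON items USING hash (item_id);
```

Whether the extra hash index is worth its maintenance cost depends on the workload; equality lookups on item_id can already be served by the primary key index.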

Related

Index required for basic joins on foreign key that references a primary key

I have a question about a fundamental aspect of PostgreSQL.
Suppose I have two tables along the lines of the following:
create table source_data_property (
source_data_property_id integer primary key generated by default as identity,
property_name text not null
);
create table source_data_value (
source_data_value_id integer primary key generated by default as identity,
source_data_property_id integer not null references source_data_property,
data_value numeric not null
);
Suppose I write a very simple query that just performs a basic join:
select
sdp.source_data_property_id,
sdp.property_name,
sdv.source_data_value_id,
sdv.data_value
from source_data_property as sdp
join source_data_value as sdv using (source_data_property_id)
;
For optimal query performance, is it necessary to add an index on the source_data_property_id column in the source_data_value table? My original thought was no, because the source_data_property_id is already indexed in the source_data_property table, but after thinking about it a bit I'm not so sure.
For optimal query performance, is it necessary to add an index on the source_data_property_id column in the source_data_value table?
In general yes, make indexes for your foreign keys. However...
A very small table won't get any advantage from indexes, and Postgres will do a sequential scan instead.
It also depends on what sort of queries you're running. In your example you're fetching every row in source_data_property, which also fetches every matching row in source_data_value. For a query like that, using an index would be slower, so Postgres will do a sequential scan instead.
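A rough sketch of how to check this on the tables from the question. The index name and the filter value 42 are arbitrary choices for illustration, and the plan you actually get depends on table sizes and statistics:

```sql
-- Index on the referencing (foreign key) column.
-- PostgreSQL does not create this one automatically.
CREATE INDEX source_data_value_property_id_idx
    ON source_data_value (source_data_property_id);

-- A selective lookup is the kind of query where the index is likely to help.
EXPLAIN
SELECT sdv.source_data_value_id,
       sdv.data_value
FROM source_data_property AS sdp
JOIN source_data_value AS sdv USING (source_data_property_id)
WHERE sdp.source_data_property_id = 42;
```

Running EXPLAIN on the unfiltered join from the question, with and without the index, is a simple way to see whether the planner still prefers sequential scans there.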

Possible to use a BRIN Index on a Primary Key in PostgreSQL

I was reading up on the BRIN index within PostgreSQL, and it seems to be beneficial to many of the tables we use.
That said, it applies nicely to a column which is already the primary key, in which case adding a separate index would negate part of the benefit of the index, which is space savings.
A PK is implicitly indexed, is it not? On that note, can it be done using a BRIN instead of a Btree, assuming the Btree is also implicit?
I tried this, and as expected it did not work:
create table foo (
id integer,
constraint foo_pk primary key using BRIN (id)
)
So, two questions:
Can a BRIN index be used on a PK?
If not, will the planner pick the more appropriate of the two if I have both a PK and a separate BRIN index (if performance means more to me than space)
And it's of course possible that my understanding of this is incomplete, in which case I would appreciate any enlightenment.
Primary keys are a logical combination of NOT NULL and UNIQUE, therefore only an index type that supports uniqueness can be used.
From the PostgreSQL documentation (currently version 13):
Only B-tree currently supports unique indexes.
I'm not so sure BRIN would be faster than B-tree. It's a lot more space-efficient, but the fact that it's lossy and requires a secondary verification pass erodes any potential speed advantages. Once you are locked into having your B-tree primary key index, there's not much point to making a secondary overlapping BRIN index.
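For the second question, a minimal sketch (assuming PostgreSQL 9.5 or later, where BRIN is available; the index name and the range bounds are illustrative) is to keep the ordinary primary key, add the BRIN index separately, and ask the planner what it would use:

```sql
CREATE TABLE foo (
    id integer PRIMARY KEY  -- backed by an implicit unique B-tree index
);

-- Separate, non-unique BRIN index on the same column.
CREATE INDEX foo_id_brin_idx ON foo USING brin (id);

-- Show which access path the planner chooses for a range query.
EXPLAIN
SELECT * FROM foo WHERE id BETWEEN 1000 AND 2000;
```

The planner picks whichever path it estimates to be cheapest, so the result depends on the table's size and on how well the physical row order correlates with id.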

PostgreSQL OK to use HASH exclude constraint for uniqueness?

Since hashes are smaller than lengthy text, it seems to me that they could be preferred to b-trees for ensuring column uniqueness.
For the sole purpose of ensuring uniqueness, is there any reason the following isn't OK in PG 10?
CREATE TABLE test (
path ltree,
EXCLUDE USING HASH ((path::text) WITH =)
);
I assume hash collisions are internally dealt with. Would be useless otherwise.
I will use a GiST index for query enhancement.
I think quoting the manual on this sums it up:
Although it's allowed, there is little point in using B-tree or hash indexes with an exclusion constraint, because this does nothing that an ordinary unique constraint doesn't do better. So in practice the access method will always be GiST or SP-GiST.
All the more since you want to create a GiST index anyway. The exclusion constraint with USING GIST will create a matching GiST index as implementation detail automatically. No point in maintaining another, inefficient hash index not even being used in queries.
For simple uniqueness (WITH =) a plain UNIQUE btree index is more efficient. If your keys are very long, consider a unique index on a hash expression (using any immutable function) to reduce size. Like:
CREATE UNIQUE INDEX test_path_hash_uni_idx ON test (my_hash_func(path));
md5() would be a simple option as hash function.
Related:
How does PostgreSQL enforce the UNIQUE constraint / what type of index does it use?
Convert hex in text representation to decimal number
Would index lookup be noticeably faster with char vs varchar when all values are 36 chars
Before Postgres 10 the use of hash indexes was discouraged. But with the latest updates, this has improved. Robert Haas (core developer) sums it up in a blog entry:
PostgreSQL's Hash Indexes Are Now Cool
CREATE INDEX test_path_hash_idx ON test USING HASH (path);
Alas (missed that in my draft), the access method "hash" does not support unique indexes, yet. So I would still go with the unique index on a hash expression above. (But no hash function that reduces information can fully guarantee uniqueness of the key - which may or may not be an issue.)
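A minimal sketch of the unique index on a hash expression, using md5() as the hash function. To keep the example free of the ltree extension it assumes a hypothetical table test_paths with a plain text column, and the inserted value is made up:

```sql
-- Hypothetical stand-in for the table in the question, with a text column.
CREATE TABLE test_paths (
    path text NOT NULL
);

-- Unique B-tree index on the md5 digest of the value rather than the value itself.
-- md5(text) is immutable, so it is allowed in an index expression.
CREATE UNIQUE INDEX test_paths_path_md5_uni_idx ON test_paths (md5(path));

INSERT INTO test_paths (path) VALUES ('Top.Science.Astronomy');

-- Inserting the same value again is rejected with a unique violation:
-- INSERT INTO test_paths (path) VALUES ('Top.Science.Astronomy');
```

As noted above, two different paths that happen to share a digest would also be rejected, so this enforces uniqueness of the hash, not strictly of the key.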

Postgresql and primary key, foreign key indexing

On https://stackoverflow.com/questions/10356484/how-to-add-on-delete-cascade-constraints#= a user, kgrittn, commented saying that
But I notice that you have not created indexes on referencing columns... Deletes on the referenced table will take a long time without those, if you get many rows in those tables. Some databases automatically create an index on the referencing column(s); PostgreSQL leaves that up to you, since there are some cases where it isn't worthwhile.
I'm having difficulty understanding this completely. Is he saying that primary keys are not created automatically with an index, or is he saying that foreign keys should be indexed (in particular cases, that is)? I've looked at the PostgreSQL documentation and it appears from there that an index is created for primary keys automatically. Is there a command I can use to list all indexes?
Thanks
Behind the scenes, a primary key is a special kind of unique index. The quote is saying that it can be a good idea to also create an index on the referencing columns, that is, the columns in other tables where the primary key is used as a foreign key.
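To list indexes, one option is the pg_indexes system view. The sketch below assumes the tables live in the public schema; in psql, the \di meta-command gives a similar listing.

```sql
-- List all indexes in the public schema together with their definitions.
SELECT schemaname,
       tablename,
       indexname,
       indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;
```

Indexes created implicitly for primary keys appear in this list too, while foreign key columns show up only if an index was created on them explicitly.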

Postgres - unique index on primary key

On Postgres, a unique index is automatically created for primary key columns. From the docs,
When an index is declared unique, multiple table rows with equal
indexed values are not allowed. Null values are not considered equal.
A multicolumn unique index will only reject cases where all indexed
columns are equal in multiple rows.
From my understanding, it seems like this index only checks uniqueness and isn't actually present for faster access when querying by primary key id's. Does this mean that this index structure doesn't consist of a sorted table (or a tree) for the primary key column? Is this correct?
In theory a unique or primary key constraint could be enforced without the presence of an index, but it would be a painful process. The index is mainly there for performance purposes.
However some databases (eg Oracle) allow a unique or primary key constraint to be supported by a non-unique index. Primarily this allows the enforcement of the constraint to be deferred until the end of a transaction, so lack of uniqueness can be permitted temporarily during a transaction, but also allows indexes to be built in parallel and with the constraint then defined as a secondary step.
Also, I'm not sure how the internals work on a PostgreSQL B-tree index, but all Oracle B-tree indexes are internally declared to be unique, either:
on the key column(s), for an index that is intended to be UNIQUE, or
on the key column(s) plus the indexed row's ROWID, for a non-unique index.
Quite the contrary: the index is created in order to allow faster access, mainly to check for duplicates when a new record is inserted, but it can also be used by other queries against the PK columns. A B-tree is the best structure for unique-key indexes because the duplicate check happens while the new index entry is inserted: if the RDBMS detects a collision in the leaf, it raises a unique constraint violation.
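A small sketch to see this in PostgreSQL, using a hypothetical table pk_demo (the table, column names and values are illustrative):

```sql
CREATE TABLE pk_demo (
    id      integer PRIMARY KEY,
    payload text
);

-- Show the index that was created implicitly for the primary key,
-- including its access method and whether it is unique.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'pk_demo';

INSERT INTO pk_demo (id, payload) VALUES (1, 'first');

-- Rejected with a unique violation, because id = 1 already exists:
-- INSERT INTO pk_demo (id, payload) VALUES (1, 'duplicate');

-- The same index is available to ordinary lookups by primary key.
EXPLAIN
SELECT * FROM pk_demo WHERE id = 1;
```

On a table this small the planner may still choose a sequential scan; the point is only that the primary key index is a regular index that queries can use.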