Possible to use a BRIN Index on a Primary Key in PostgreSQL - postgresql

I was reading up on the BRIN index within PostgreSQL, and it seems to be beneficial to many of the tables we use.
That said, it applies nicely to a column which is already the primary key, in which case adding a separate index would negate part of the benefit of the index, which is space savings.
A PK is implicitly indexed, is it not? On that note, can it be done using a BRIN instead of a Btree, assuming the Btree is also implicit?
I tried this, and as expected it did not work:
create table foo (
id integer,
constraint foo_pk primary key using BRIN (id)
)
So, two questions:
Can a BRIN index be used on a PK?
If not, will the planner pick the more appropriate of the two if I have both a PK and a separate BRIN index (if performance means more to me than space)
And it's course possible that my understanding of this is incomplete, in which case I would appreciate any enlightenment.

Primary keys are a logical combination of NOT NULL and UNIQUE, therefore only an index type that supports uniqueness can be used.
From the PostgreSQL documentation (currently version 13):
Only B-tree currently supports unique indexes.
I'm not so sure BRIN would be faster than B-tree. It's a lot more space-efficient, but the fact that it's lossy and requires a secondary verification pass erodes any potential speed advantages. Once you are locked into having your B-tree primary key index, there's not much point to making a secondary overlapping BRIN index.

Related

What is the best type of index to use on a materialized view in PostgreSQL

I want to increase the performance of queries on table in Postgrsql db i need to use.
CREATE TABLE mytable (
article_number text NOT NULL,
description text NOT null,
feature text NOT null,
...
);
The table is just in example but the thing is that there are no unique columns. article_number is the one used in the where clause but for example article_number='000.002-00A' can have from 3 to 300 rows. The total number of rows is 102,165,920. What would be the best index to use for such a situation?
I know there B-tree, Hash, GiST, SP-GiST, GIN and BRIN index types in postgres but which one would be the best for this.
If the lookups are filtered on article_number then an index should be created on that. Not quite sure what else you're asking.
The default index is a btree and that'll work fine. If you're only checking for strict equality hash would also be an option but it has issues before Postgres 10, so I wouldn't recommend it.
Other index types are for more complicated forms of querying or custom data types, there's no reason to even consider them if you just want to perform equality filters.
btrees are useful for strict equality and range searches (which includes prefix search e.g. foo like 'bar%')
hash indexes are useful only for strict equality they can be faster & smaller than btrees in some rare cases
GIN indexes are useful when you have multiple index values per row (arrays, json, gis, some FTS cases)
GiST indexes are useful for more complex querying than equality and range (geom/gis, FTS)
I've never looked into BRIN index so I'm not sure what their use case would be. But my understanding is that there's no case to even consider it before you have huge numbers of rows.
Basically, use btree unless you know that you can not.

PostgreSQL OK to use HASH exclude constraint for uniqueness?

Since hashes are smaller than lengthy text, it seems to me that they could be preferred to b-trees for ensuring column uniqueness.
For the sole purpose of ensuring uniqueness, is there any reason the following isn't OK in PG 10?
CREATE TABLE test (
path ltree,
EXCLUDE USING HASH ((path::text) WITH =)
);
I assume hash collisions are internally dealt with. Would be useless otherwise.
I will use a GiST index for query enhancement.
I think quoting the manual on this sums it up:
Although it's allowed, there is little point in using B-tree or hash
indexes with an exclusion constraint, because this does nothing that
an ordinary unique constraint doesn't do better. So in practice the access method will always be GiST or SP-GiST.
All the more since you want to create a GiST index anyway. The exclusion constraint with USING GIST will create a matching GiST index as implementation detail automatically. No point in maintaining another, inefficient hash index not even being used in queries.
For simple uniqueness (WITH =) a plain UNIQUE btree index is more efficient. If your keys are very long, consider a unique index on a hash expression (using any immutable function) to reduce size. Like:
CREATE UNIQUE INDEX test_path_hash_uni_idx ON test (my_hash_func(path));
Related:
How does PostgreSQL enforce the UNIQUE constraint / what type of index does it use?
md5() would be a simple option as hash function.
Convert hex in text representation to decimal number
Would index lookup be noticeably faster with char vs varchar when all values are 36 chars
Before Postgres 10 the use of hash indexes was discouraged. But with the latest updates, this has improved. Robert Haas (core developer) sums it up in a blog entry:
PostgreSQL's Hash Indexes Are Now Cool
CREATE INDEX test_path_hash_idx ON test USING HASH (path);
Alas (missed that in my draft), the access method "hash" does not support unique indexes, yet. So I would still go with the unique index on a hash expression above. (But no hash function that reduces information can fully guarantee uniqueness of the key - which may or may not be an issue.)

How to create a primary key using the hash method in postgresql

Is there any way to create a primary key using the hash method? Neither of the following statements work:
oid char(30) primary key using hash
primary key(oid) using hash
I assume, you meant to use the hash index method / type.
Primary keys are constraints. Some constraints can create index(es) in order to work properly (but this fact should not be relied upon). F.ex. a UNIQUE constraint will create a unique index. Note, that only B-tree currently supports unique indexes. The PRIMARY KEY constraint is a combination of the UNIQUE and the NOT NULL constraints, so (currently) it only supports B-tree.
You can set up a hash index too, if you want (besides the PRIMARY KEY constraint) -- but you cannot make that unique.
CREATE INDEX name ON table USING hash (column);
But, if you are willing to do this, you should be aware that there is some limitation on the hash indexes (up until PostgreSQL 10):
Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash if there were unwritten changes. Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. For these reasons, hash index use is presently discouraged.
Also:
Currently, only the B-tree, GiST and GIN index methods support multicolumn indexes.
Note: Unfortunately, oid is not the best name for a column in PostgreSQL, because it can also be a name for a system column and type.
Note 2: The char(n) type is also discouraged. You can use varchar or text instead, with a CHECK constraint -- or (if the id is so uuid-like) the uuid type itself.

Postgres - unique index on primary key

On Postgres, a unique index is automatically created for primary key columns. From the docs,
When an index is declared unique, multiple table rows with equal
indexed values are not allowed. Null values are not considered equal.
A multicolumn unique index will only reject cases where all indexed
columns are equal in multiple rows.
From my understanding, it seems like this index only checks uniqueness and isn't actually present for faster access when querying by primary key id's. Does this mean that this index structure doesn't consist of a sorted table (or a tree) for the primary key column? Is this correct?
In theory a unique or primary key constraint could be enforced without the presence of an index, but it would be a painful process. The index is mainly there for performance purposes.
However some databases (eg Oracle) allow a unique or primary key constraint to be supported by a non-unique index. Primarily this allows the enforcement of the constraint to be deferred until the end of a transaction, so lack of uniqueness can be permitted temporarily during a transaction, but also allows indexes to be built in parallel and with the constraint then defined as a secondary step.
Also, I'm not sure how the internals work on a PostgreSQL btree index, but all Oracle btree's are internally declared to be unique either:
on the key column(s), for an index that is intended to be UNIQUE, or
on the key column(s) plus the indexed row's ROWID, for a non-unique index.
Quite the contrary, The index is created in order to allow faster access - mainly to check for duplicates when a new record is inserted but can also be used by other queries against PK columns. The best structure for uk indexes is a btree because during the insert the index is created - If the rdbms detects collision in the leaf he will raise a unique constraint violation.

Doubt in clustered and non Clustered index

I have a doubt that if my table do n't have any constraint like Primary Key,Foreign key,Unique key etc. then can i create the clustered index on table and clustered index can have the douplicate records ?
My 2nd question is where should we exectly use the non clustered index and when it is useful and benificial to create in table?
My 3rd question is How can we create the 249 non clustered index in a table .Is it the meaning, Creating the non clustered index on 249 columns ?
Can you anyone help me to remove my confusion in this.
First, the definition of a clustered index is that it is physical ordering of data on the disk. Every time you do an insert into that table, the new record will be placed on the physical disk in its order based on its value in the clustered index column. Because it is the physical location on the disk, it is (A) the most rapidly accessible column in the table but (B) only possible to define a single clustered index per table. Which column (or columns) you use as the clustered index depend on the data itself and its use. Primary keys are typically the clustered index, especially if the primary key is sequential (e.g. an integer that increments automatically with each insert). This will provide the fastest insert/update/delete functionality. If you are more interested in performing reads (select * from table), you may want to cluster on a Date column, as most queries have either a date in the where clause, the group by clause or both.
Second, clustered indexes (at least in the DB's I know) need not be unique (they CAN have duplicates). Constraining the column to be unique is separate matter. If the clustered index is a primary key its uniqueness is a function of being a primary key.
Third, I can't follow you questions concerning 249 columns. A non-clustered index is basically a tool for accelerating queries at the expense of extra disk space. It's hard to think of a case where creating an index on each column is necessary. If you want a quick rule of thumb...
Write a query using your table.
If a column is required to do a join, index it.
If a column is used in a where column, index it.
Remember all the indexes are doing for you is speeding up your queries. If queries run fast, don't worry about them.
This is just a thumbnail sketch of a large topic. There are tons of more informative/comprehensive resources on this matter, and some depend on the database system ... just google it.