About clustered index in postgres - postgresql

I'm using psql to access a postgres database. When viewing the metadata of a table, is there any way to see whether an index of a table is a clustered index?
I heard that the PRIMARY KEY of a table is automatically associated with a clustered index. Is that true?

Note that PostgreSQL uses the term "clustered index" to mean something vaguely similar to, and yet very different from, what SQL Server calls a clustered index.
If a particular index has been nominated as the clustering index for a table, then psql's \d command will indicate the clustered index, e.g.,
Indexes:
"timezone_description_pkey" PRIMARY KEY, btree (timezone) CLUSTER
PostgreSQL does not nominate indices as clustering indices by default. Nor does it automatically arrange table data to correlate with the clustered index even when so nominated: the CLUSTER command has to be used to reorganise the table data.

In PostgreSQL the clustered attribute is held in the metadata of the corresponding index, rather than in the relation itself: it is the indisclustered column of the pg_index catalogue. Note, however, that clustering relations within postgres is a one-time action: even if the attribute is true, updates to the table do not maintain the sorted nature of the data. To date, automatic maintenance of data clustering remains a popular TODO item.
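To make the "one-time action" point concrete, here is a toy Python model. It is not PostgreSQL's storage code: the heap is just a list in physical order, cluster() sorts it once, and insert() simply appends. Real PostgreSQL puts new rows wherever there is free space; appending is an assumption that keeps the sketch short.

```python
# Toy model, not PostgreSQL internals: the heap is a list of (key, payload)
# rows in physical order. CLUSTER sorts it once; later inserts are appended.
from dataclasses import dataclass, field


@dataclass
class ToyHeap:
    rows: list = field(default_factory=list)

    def insert(self, key, payload):
        # New rows go where there is room; in this sketch, at the end.
        self.rows.append((key, payload))

    def cluster(self):
        # One-time physical reorder by key, like running CLUSTER.
        self.rows.sort(key=lambda row: row[0])

    def is_physically_sorted(self):
        keys = [key for key, _ in self.rows]
        return keys == sorted(keys)


heap = ToyHeap()
for k in (30, 10, 20):
    heap.insert(k, f"row {k}")
heap.cluster()
print(heap.is_physically_sorted())  # True right after clustering
heap.insert(5, "row 5")
print(heap.is_physically_sorted())  # False: the new row was not put in order
```

Even with the clustering mark still set, nothing moves the new row into place until the table is clustered again.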
There is often confusion between clustered and integrated indexes, particularly since popular textbooks use conflicting names, and the terminology differs again between the manuals of postgres and SQL Server (to name just two). When I talk about an integrated index (also called a main index or primary index), I mean one in which the relation data is contained in the leaves of the index, as opposed to an external or secondary index, in which the leaves contain index entries that point to the table records. The former type is necessarily always clustered. Unfortunately, postgres only supports the latter type. Anyhow, the fact that an integrated (primary) index is always clustered may have given rise to the belief that "a PRIMARY KEY of a table is automatically associated with a clustered index". The two statements sound similar, but are different.
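The difference between the two index types can be sketched in Python, with sorted lists and bisect standing in for index leaves. This is a toy contrast, not how any real engine lays out pages; the position in the heap list plays the role of a tuple pointer.

```python
# Toy contrast between an integrated (index-organized) index and a
# secondary index over a separate heap. Not a real page layout.
import bisect

rows = [(3, "carol"), (1, "alice"), (2, "bob")]

# Integrated index: the leaves hold the rows themselves, in key order,
# so the table data is clustered by construction.
integrated = sorted(rows)
integrated_keys = [key for key, _ in integrated]


def integrated_lookup(key):
    i = bisect.bisect_left(integrated_keys, key)
    if i < len(integrated) and integrated[i][0] == key:
        return integrated[i]  # the row is found directly in the leaf
    return None


# Secondary index (the only kind postgres has): the heap keeps rows in
# insertion order; the leaves hold (key, heap position) pointers.
heap = list(rows)
secondary = sorted((key, pos) for pos, (key, _) in enumerate(heap))
secondary_keys = [key for key, _ in secondary]


def secondary_lookup(key):
    i = bisect.bisect_left(secondary_keys, key)
    if i < len(secondary) and secondary[i][0] == key:
        return heap[secondary[i][1]]  # extra hop: follow the pointer
    return None


print(integrated_lookup(2), secondary_lookup(2))  # (2, 'bob') (2, 'bob')
```

Both lookups find the same row, but only the integrated version keeps the data itself in key order; the secondary version leaves the heap unsorted.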

PostgreSQL does not have a direct implementation of clustered indexes like Microsoft SQL Server.
Reference taken from this blog:
In PostgreSQL, there is a CLUSTER command that is similar to a clustered index.
Once you have created the table's primary key or any other index, you can run the CLUSTER command with that index name to physically order the table data.
When a table is clustered, it is physically reordered based on the index information. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order.
Syntax of CLUSTER:
The first time, you must run CLUSTER with the index name:
CLUSTER table_name USING index_name;
Re-clustering the table:
Once you have run CLUSTER with an index, later runs only need the table name, because PostgreSQL remembers which index was marked as the clustering index:
CLUSTER table_name;
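The difference between the first and later runs can be sketched as a small Python toy model. ToyTable, set_cluster_on and cluster are hypothetical names. Besides what is described above, the model assumes that ALTER TABLE ... CLUSTER ON marks an index without reordering anything. The error text imitates the message PostgreSQL gives for a plain CLUSTER on a table with no marked index.

```python
# Hypothetical toy model of how CLUSTER remembers its index; the class and
# method names are illustrative, not a real PostgreSQL API.
class ToyTable:
    def __init__(self, name, rows, indexes):
        self.name = name
        self.rows = list(rows)        # physical row order
        self.indexes = dict(indexes)  # index name -> sort-key function
        self.clustered_on = None      # at most one index carries the mark

    def set_cluster_on(self, index_name):
        # Like ALTER TABLE ... CLUSTER ON: records the mark, does not reorder.
        if index_name not in self.indexes:
            raise KeyError(index_name)
        self.clustered_on = index_name

    def cluster(self, using=None):
        # CLUSTER t USING idx records idx and reorders;
        # a plain CLUSTER t reuses whichever index was recorded before.
        if using is not None:
            self.set_cluster_on(using)
        if self.clustered_on is None:
            raise RuntimeError(
                f'there is no previously clustered index for table "{self.name}"'
            )
        self.rows.sort(key=self.indexes[self.clustered_on])


t = ToyTable(
    "profile",
    [(3, "c"), (1, "a"), (2, "b")],
    {"profile_pkey": lambda r: r[0], "profile_name_idx": lambda r: r[1]},
)
try:
    t.cluster()  # no index recorded yet
except RuntimeError as err:
    first_error = str(err)
t.cluster(using="profile_pkey")  # first run: name the index
t.rows.append((0, "z"))          # a later insert lands at the end
t.cluster()                      # later runs: table name only
by_id = list(t.rows)             # [(0, 'z'), (1, 'a'), (2, 'b'), (3, 'c')]
t.set_cluster_on("profile_name_idx")  # mark only; nothing moves yet
t.cluster()
by_name = list(t.rows)           # [(1, 'a'), (2, 'b'), (3, 'c'), (0, 'z')]
```

Keeping a single clustered_on attribute mirrors the rule that a table has at most one clustering index: marking a new one replaces the old mark.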

is there any way to see whether an index of a table is a clustered index
PostgreSQL does not have clustered indexes in the SQL Server sense, so you won't be able to see any.
I heard that the PRIMARY KEY of a table is automatically associated with a clustered index, is it true?
No, that's not true (see above)
You can manually cluster a table along an index, but this is nothing that will be maintained automatically (as e.g. with SQL Server's clustered indexes).
For more details, see the description of the CLUSTER command in the manual.

Cluster Indexing
A clustered index tells the database to store rows with close key values physically close to one another on disk. It is often built on a key that uniquely identifies the rows, although this is not required. A table can have at most one clustered index, and a clustered index can cover more than one column. In SQL Server, for example, a primary key gets a clustered index by default; PostgreSQL does not do this.
A dictionary is like a table with a clustered index, because all its entries are physically stored in alphabetical order.
Non-Cluster Indexing
Non-clustered indexing is like the index of a book. It is just used for fast retrieval of data, and its keys are not necessarily unique. A non-clustered index contains the index keys and, for each key, a pointer to the location of the data. For example, a book's content index contains the name of a topic or chapter and its page location.
A book's content index holds each entry's name and its page location. The entries are not necessarily unique, because the same paragraph, line or word can appear many times.
PostgreSQL Indexing
PostgreSQL automatically creates indexes for the PRIMARY KEY and every UNIQUE constraint of a table. Log in to a database in the PostgreSQL terminal (psql) and type \d table_name: all of the table's indexes will be listed, and an index that has been marked for clustering is flagged with CLUSTER.
Creating a table
CREATE TABLE IF NOT EXISTS profile(
uid serial NOT NULL UNIQUE PRIMARY KEY,
username varchar(30) NOT NULL UNIQUE,
phone varchar(11) NOT NULL UNIQUE,
age smallint CHECK(age>12),
address text NULL
);
Three indexes will be created automatically. None of them is clustered:
"profile_pkey" PRIMARY KEY, btree (uid)
"profile_phone_key" UNIQUE CONSTRAINT, btree (phone)
"profile_username_key" UNIQUE CONSTRAINT, btree (username)
Create our own index on uid and username:
CREATE INDEX profile_index ON profile(uid, username);
This creates an ordinary, non-clustered index. To make it the clustering index, run the next command.
Mark a non-clustered index as the clustering index
ALTER TABLE profile CLUSTER ON profile_index;
Check the table with \d profile. It will look like this:
Table "public.profile"
Column | Type | Collation | Nullable | Default
----------+-----------------------+-----------+----------+--------------------------------------
uid | integer | | not null | nextval('profile_uid_seq'::regclass)
username | character varying(30) | | not null |
phone | character varying(11) | | not null |
age | smallint | | |
address | text | | |
Indexes:
"profile_pkey" PRIMARY KEY, btree (uid)
"profile_phone_key" UNIQUE CONSTRAINT, btree (phone)
"profile_username_key" UNIQUE CONSTRAINT, btree (username)
"profile_index" btree (uid, username) CLUSTER
Check constraints:
"profile_age_check" CHECK (age > 12)
Notice that the profile_index is now "CLUSTER"
Now cluster the table, so that its rows are physically reordered according to the clustering index:
CLUSTER profile;

If you want to know if a given table is CLUSTERed using SQL, you can use the following query to show the index being used (tested in Postgres versions 9.5 and 9.6):
SELECT
i.relname AS index_for_cluster
FROM
pg_index AS idx
JOIN
pg_class AS i
ON
i.oid = idx.indexrelid
WHERE
idx.indisclustered
AND idx.indrelid::regclass = 'your_table_name'::regclass;

Related

How to solve PostgreSQL index width problem

I have a docker container which contains a PostgreSQL database. An application then connects to the database. In the database I have a table defined as:
CREATE TABLE IF NOT EXISTS configuration (
id SERIAL PRIMARY KEY,
board BIGINT NOT NULL REFERENCES boards ( id ),
date_time decimal(20,0) NOT NULL,
version INTEGER NOT NULL,
data TEXT NOT NULL,
UNIQUE(board, date_time, version, data)
);
I am not explicitly creating any indices or any of the other meta-objects associated with this table.
The application used to be able to write to this table without problem, but now I am getting the following error:
Failure during 'insert_configuration_record': ERROR: index row size 5992 exceeds btree version 4 maximum 2704 for index "configuration_board_date_time_version_data_key"
DETAIL: Index row references tuple (2,12) in relation "configuration".
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
It is possible that the PostgreSQL version changed at some point when I rebuilt the docker container, but I have not seen any messages refusing to load the database from the persistent storage or asking me to upgrade it. The current database version is (PostgreSQL) 12.9 (Ubuntu 12.9-0ubuntu0.20.04.1).
It is possible that previously the data I was writing was short enough to not hit the limit.
How do I use "a function index of an MD5 hash of the value" to avoid this problem?
If you had that unique constraint before with the same data, then you must have built PostgreSQL with a block size greater than the default 8kB.
Anyway, you should do what the hint tells you: instead of the unique constraint, create a unique index:
CREATE UNIQUE INDEX ON configuration (
board,
date_time,
version,
md5(data)
);
You cannot turn this index into a unique constraint, because such a constraint can only be defined on plain columns, not on expressions. However, the behavior will be just the same as a constraint.
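A minimal Python sketch of why the expression index avoids the size limit: each index entry stores a fixed-size digest instead of the full text, while uniqueness is still checked on all four values. ToyUniqueIndex and index_key are hypothetical names. For a UTF-8 database, the hex digest here should match what PostgreSQL's md5(data) returns.

```python
import hashlib


def index_key(board, date_time, version, data):
    # Mirrors the expression index (board, date_time, version, md5(data)):
    # the long text is replaced by its MD5 digest, always 32 hex characters.
    digest = hashlib.md5(data.encode("utf-8")).hexdigest()
    return (board, date_time, version, digest)


class ToyUniqueIndex:
    """Hypothetical stand-in for the unique index; only tracks key tuples."""

    def __init__(self):
        self.keys = set()

    def insert(self, board, date_time, version, data):
        key = index_key(board, date_time, version, data)
        if key in self.keys:
            raise ValueError("duplicate key")
        self.keys.add(key)


idx = ToyUniqueIndex()
long_text = "x" * 6000  # a long value, like the one in the error message
idx.insert(1, 20220101, 1, long_text)
idx.insert(1, 20220101, 2, long_text)  # different version: accepted
try:
    idx.insert(1, 20220101, 1, long_text)  # same four values: rejected
    rejected = False
except ValueError:
    rejected = True
digest_length = len(index_key(1, 20220101, 1, long_text)[3])  # 32
```

One trade-off to keep in mind: two different data values with the same MD5 digest would be treated as duplicates. That is extremely unlikely by accident, but it is the price of indexing the hash rather than the value.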

Which index is used to answer aggregates when we have several indexes?

I have a table which is partitioned on a daily basis. Each partition has a primary key, and several other indexes on columns which are not null. If I get the query plan for the following:
SELECT COUNT(*) FROM parent_table;
I can see that different indexes are used: sometimes the primary key index, sometimes others. How is Postgres able to decide which index to use? Note that my table is not clustered and has never been clustered before. Also, the primary key is serial.
What are the catalog / statistics tables which are used to make this decision?

Index creation taking forever on postgres

I was trying to create an index to an integer column using a btree index, but it was taking forever (more than 2 hours!).
The table has 17,514,879 rows. I didn't expect it to take that long.
After almost 2.5 hours, the connection to the database just died. When I reconnected, the index was there, but I don't know how good this index is.
How can I be sure the index was not messed up by the lost connection?
How to Check Whether the Index is Fine
Connect to the database via psql and run \d table_name (where table_name is the name of your table). For example:
grn=# \d users
Table "public.users"
Column | Type | Modifiers
--------+------------------------+-----------
name | character varying(255) |
Indexes:
"users_name_idx" btree (name)
You'll see the indexes listed below the table schema. An invalid index, for example one left behind by a failed CREATE INDEX CONCURRENTLY, is marked INVALID there.
How to Create Indexes Without Locking the Whole Table
You can create an index in a way that doesn't block writes to the whole table, although it is even slower. To do so, add CONCURRENTLY to CREATE INDEX. For example:
CREATE INDEX CONCURRENTLY users_name_idx ON users(name);
How to Fix a Corrupt Index
If the index is corrupt, you can either drop it and recreate it CONCURRENTLY, or use REINDEX INDEX index_name. For example:
REINDEX INDEX users_name_idx;
will recreate the users_name_idx.

At what level do Postgres index names need to be unique?

In Microsoft SQL Server and MySQL, index names need to be unique within the table, but not within the database. This doesn't seem to be the case for PostgreSQL.
Here's what I'm doing: I made a copy of a table using CREATE TABLE new_table AS SELECT * FROM old_table etc and need to re-create the indexes.
Running a query like CREATE INDEX idx_column_name ON new_table USING GIST(column_name) causes ERROR: relation "idx_column_name" already exists
What's going on here?
Indexes and tables (and views, and sequences, and...) are stored in the pg_class catalog, and they're unique per schema due to a unique key on it:
# \d pg_class
Table "pg_catalog.pg_class"
Column | Type | Modifiers
----------------+-----------+-----------
relname | name | not null
relnamespace | oid | not null
...
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
Per @wildplasser's comment, you can omit the name when creating the index, and PG will assign a unique name automatically.
Names are unique within the schema. A schema is basically a namespace for tables and constraints (and indexes, functions, etc.).
Cross-schema constraints are allowed.
Indexes share their namespace (the schema) with tables (in Postgres, an index is a relation, just like a table).
(IIRC) the SQL standard does not define indexes; use constraints whenever you can (the GIST index in the question is probably an exception).
Ergo, you'll need to invent another name,
or omit it: the system can invent a name if you don't supply one.
The downside of this is that you can create multiple indices with the same definition (their names get a numeric suffix, for example new_table_column_name_idx1, new_table_column_name_idx2).
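The naming rules described in these answers can be modelled in a few lines of Python. This is a simplified sketch: ToySchemaCatalog is hypothetical, the idx, idx1, idx2 pattern reflects my understanding of PostgreSQL's default index naming, and the real name chooser also truncates long names to fit the 63-byte identifier limit, which is left out here.

```python
class ToySchemaCatalog:
    """Hypothetical model: relation names are unique per (schema, name),
    like pg_class_relname_nsp_index. Ignores the 63-byte name limit."""

    def __init__(self):
        self.relations = set()  # (schema, name) pairs: tables, indexes, ...

    def add(self, schema, name):
        if (schema, name) in self.relations:
            raise ValueError(f'relation "{name}" already exists')
        self.relations.add((schema, name))

    def choose_index_name(self, schema, table, column):
        # Default name is table_column_idx; on a clash try idx1, idx2, ...
        base = f"{table}_{column}_idx"
        name, n = base, 0
        while (schema, name) in self.relations:
            n += 1
            name = f"{base}{n}"
        return name


cat = ToySchemaCatalog()
cat.add("public", "idx_column_name")      # the index on old_table
try:
    cat.add("public", "idx_column_name")  # same schema: clash
except ValueError as err:
    clash = str(err)
cat.add("other", "idx_column_name")       # another schema: allowed
generated = []
for _ in range(3):
    name = cat.choose_index_name("public", "new_table", "column_name")
    cat.add("public", name)
    generated.append(name)
```

The clash reproduces the error from the question, while the same name in another schema is accepted because the uniqueness key includes the schema.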

Postgres - unique index on primary key

On Postgres, a unique index is automatically created for primary key columns. From the docs,
When an index is declared unique, multiple table rows with equal
indexed values are not allowed. Null values are not considered equal.
A multicolumn unique index will only reject cases where all indexed
columns are equal in multiple rows.
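The two rules quoted from the docs can be written as a tiny Python predicate. This only sketches the default behaviour, with None standing in for NULL; violates_unique is a hypothetical helper, not how PostgreSQL implements the check.

```python
def violates_unique(existing_rows, new_row):
    """Toy check of the documented rules, with None standing in for NULL."""
    if any(value is None for value in new_row):
        # NULL is never equal to anything, so the row cannot be a duplicate.
        return False
    # Reject only when every indexed column matches some existing row.
    return any(tuple(row) == tuple(new_row) for row in existing_rows)


rows = [(1, "a"), (1, None)]
checks = {
    (1, "b"): violates_unique(rows, (1, "b")),    # only first column equal
    (1, "a"): violates_unique(rows, (1, "a")),    # all columns equal
    (1, None): violates_unique(rows, (1, None)),  # NULLs are not equal
}
```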
From my understanding, it seems like this index only checks uniqueness and isn't actually there for faster access when querying by primary key IDs. Does this mean that this index structure doesn't consist of a sorted table (or a tree) for the primary key column? Is this correct?
In theory a unique or primary key constraint could be enforced without the presence of an index, but it would be a painful process. The index is mainly there for performance purposes.
However some databases (eg Oracle) allow a unique or primary key constraint to be supported by a non-unique index. Primarily this allows the enforcement of the constraint to be deferred until the end of a transaction, so lack of uniqueness can be permitted temporarily during a transaction, but also allows indexes to be built in parallel and with the constraint then defined as a secondary step.
Also, I'm not sure how the internals of a PostgreSQL btree index work, but all Oracle btrees are internally declared to be unique, either:
on the key column(s), for an index that is intended to be UNIQUE, or
on the key column(s) plus the indexed row's ROWID, for a non-unique index.
Quite the contrary: the index is created in order to allow faster access, mainly to check for duplicates when a new record is inserted, but it can also be used by other queries against the PK columns. The best structure for unique-key indexes is a btree, because the index entry is created during the insert: if the RDBMS detects a collision in the leaf, it raises a unique constraint violation.
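To illustrate the last two answers, here is a minimal Python sketch of a sorted unique index: the same ordered structure rejects duplicates at insert time and also serves lookups by key. A sorted list with bisect stands in for the btree, and the class name is hypothetical; a real btree splits pages rather than shifting a list.

```python
import bisect


class ToyBtreeUniqueIndex:
    """Hypothetical sorted-list stand-in for btree leaf entries (key, pointer)."""

    def __init__(self):
        self.keys = []
        self.pointers = []

    def insert(self, key, pointer):
        i = bisect.bisect_left(self.keys, key)
        # The slot where the new key belongs is exactly where a duplicate
        # would already sit, so the uniqueness check is one comparison.
        if i < len(self.keys) and self.keys[i] == key:
            raise ValueError(f"duplicate key {key!r}")
        self.keys.insert(i, key)
        self.pointers.insert(i, pointer)

    def lookup(self, key):
        # The same ordered structure answers "WHERE id = ?" queries.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.pointers[i]
        return None


pk = ToyBtreeUniqueIndex()
pk.insert(42, (0, 1))  # pointers imitate tuple ids such as (2,12)
pk.insert(7, (0, 2))
found = pk.lookup(7)   # (0, 2)
try:
    pk.insert(42, (0, 3))
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```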