At what level do Postgres index names need to be unique? - postgresql

In Microsoft SQL Server and MySQL, index names need to unique within the table, but not within the database. This doesn't seem to be the case for PostgreSQL.
Here's what I'm doing: I made a copy of a table using CREATE TABLE new_table AS SELECT * FROM old_table etc and need to re-create the indexes.
Running a query like CREATE INDEX idx_column_name ON new_table USING GIST(column_name) causes ERROR: relation "idx_column_name" already exists
What's going on here?

Indexes and tables (and views, and sequences, and...) are stored in the pg_class catalog, and they're unique per schema due to a unique key on it:
# \d pg_class
Table "pg_catalog.pg_class"
Column | Type | Modifiers
----------------+-----------+-----------
relname | name | not null
relnamespace | oid | not null
...
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
Per #wildplasser's comment, you can omit the name when creating the index, and PG will assign a unique name automatically.

Names are unique within the schema. A schema is basically a namespace for {tables,constraints}, (and indexes, functions,etc).
cross-schema-constraints are allowed
Indexes share their namespace ( :=schema) with tables. (for Postgres: an index is a table).
(IIRC) the SQL standard does not define indexes; use constraints whenever you can (The GIST index in the question is probably an exception)
Ergo You'll need to invent another name.
or omit it: the system can invent a name if you dont supply one.
The downside of this: you can create multipe indices with the same definition (their names will be suffixed with _1, _2, IIRC)

Related

Difference between oid and relfilenode

I am reading Internals of postgreSQL chp 1 and I am unable to understand the difference between object identifier and relfilenode.
Tables and indexes as database objects are internally managed by individual OIDs, while those data files are managed by the variable, relfilenode. The relfilenode values of tables and indexes basically but not always match the respective OIDs
I get that both these are the attributes of the system catalog 'pg_class' and OID can be thought of as the primary key of the table, so what is the purpose of relfilenode and how is it different from OID?
relfilenode is the prefix for the name of the files that make up the table. Initially it is identical to the immutable object ID (oid), but SQL statements that rewrite the table will modify it (for example VACUUM (FULL), CLUSTER, TRUNCATE or the variants of ALTER TABLE that rewrite the table).

duplicate key in a PostgreSQL index

I want to move my OwnCloud database to a new server, but the operation fails during restore.
pg_restore: [archive program (db)] COPY failed for table "oc_storages": ERROR: value of a duplicate key breaks unique constraint "storages_id_index"
DETAIL: The key "(id) = (local :: / var / www / owncloud_data /)" already exists.
Indeed, a simple query on the oc_sorages database shows that there is a duplicate.
ocl=# select * from oc_storages where id ~* 'owncloud_data';
id | numeric_id | available | last_checked
--------------------------------+------------+-----------+--------------
local::/var/www/owncloud_data/ | 491 | 1 |
local::/var/www/owncloud_data/ | 838 | 1 |
(2 rows)
but at the same time, postgresql managed to create an index for this table based on the id (storages_id_index). How is it possible that PostgreSQL accepts this duplicate in this table?
ocl=# SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'oc_storages';
indexname | indexdef
-------------------+-------------------------------------------------------------------------------------
oc_storages_pkey | CREATE UNIQUE INDEX oc_storages_pkey ON public.oc_storages USING btree (numeric_id)
storages_id_index | CREATE UNIQUE INDEX storages_id_index ON public.oc_storages USING btree (id)
(2 rows)
What to do to get out of this impasse: delete one of the two values? which ?
Thanks in advance.
Ernest.
There are usually two explanation for this:
Hardware problems leading to data corruption. Then remove conflicting rows manually, export the database and import it into a newly created cluster to get rid of potential lurking data corruption.
You upgraded the C library on the operating system and the collations changed, corrupting the index. Then remove conflicting rows manually and REINDEX the indexes with string columns.
This is one of those semantic annoyances I have with Postgres, but creating a UNIQUE INDEX on a table does not actually add an enforced table constraint.
You need to explicitly add each constraint USING the created index, e.g.:
CREATE UNIQUE INDEX oc_storages_pkey ON public.oc_storages USING btree (numeric_id);
ALTER TABLE public.oc_storages ADD CONSTRAINT oc_storages_pkey UNIQUE USING INDEX oc_storages_pkey;
If you do have such a table constraint already, then this would be a case of corruption.

ERROR: cannot create a unique index without the column "date_time" (used in partitioning)

I just started using timescaleDB with postgresql. I have a database named storage_db which contains a table named day_ahead_prices.
After installing timescaledb, I was following Migrate from the same postgresql database to migrate my storage_db into a timescaledb.
When I did (indexes included):
CREATE TABLE tsdb_day_ahead_prices (LIKE day_ahead_prices INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
select create_hypertable('tsdb_day_ahead_prices', 'date_time');
It gave me the following error:
ERROR: cannot create a unique index without the column "date_time" (used in partitioning)
But when I did (indexed excluded):
CREATE TABLE tsdb_day_ahead_prices (LIKE day_ahead_prices INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);
select create_hypertable('tsdb_day_ahead_prices', 'date_time');
It was successful. Following which, I did
select create_hypertable('tsdb_day_ahead_prices', 'date_time');
and it gave me the following output:
create_hypertable
------------------------------------
(3,public,tsdb_day_ahead_prices,t)
(1 row)
I am a bit new to this so can anyone please explain to me what is the difference between both of them and why was I getting an error in the first case?
P.S.:
My day_ahead_prices looks as follows:
id | country_code | values | date_time
----+--------------+---------+----------------------------
1 | LU | 100.503 | 2020-04-11 14:04:30.461605
2 | LU | 100.503 | 2020-04-11 14:18:39.600574
3 | DE | 106.68 | 2020-04-11 15:59:10.223965
Edit 1:
I created the day_ahead_prices table in python using flask and flask_sqlalchemy and the code is:
class day_ahead_prices(db.Model):
__tablename__ = "day_ahead_prices"
id = db.Column(db.Integer, primary_key=True)
country_code = db.Column(avail_cc_enum, nullable=False)
values = db.Column(db.Float(precision=2), nullable=False)
date_time = db.Column(db.DateTime, default=datetime.now(tz=tz), nullable=False)
def __init__(self, country_code, values):
self.country_code = country_code
self.values = values
When executing CREATE TABLE tsdb_day_ahead_prices (LIKE day_ahead_prices INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES); you're telling the database to create the tsdb_day_ahead_prices table using the day_ahead_prices as a template (same columns, same types for those columns), but you're also telling it to include the default values, constraints and indexes that you have defined on the original table, and apply/create the same for your new table.
Then you are executing the timescaledb command that makes the tsdb_day_ahead_prices table
a hypertable. A hypertable is an abstraction that hides away the partitioning of the physical
table. (https://www.timescale.com/products/how-it-works). You are telling
TimescaleDB to make the tsdb_day_ahead_prices a hypertable using the date_time column as a partitioning key.
When creating hypertables, one constraing that TimescaleDB imposes is that the partitioning column (in your case 'date_time') must be included in any unique indexes (and Primary Keys) for that table. (https://docs.timescale.com/latest/using-timescaledb/schema-management#indexing-best-practices)
The first error you get cannot create a unique index without the column "date_time" is exactly because of this. You copied the primary key definition on the id column. So the primary key is preventing
the table to be a hypertable.
The second time, you created the tsdb_day_ahead_prices table but you didn't copy
the indexes from the original table, so the primary key is not defined (which is really a unique index). So the creation of the hypertable was successfull.
The output you get from the create_hypertable function tells you that you have a new hypertable, in the public schema, the name of the hypertable, and the internal id that timescaledb uses for it.
So now you can use the tsdb_day_ahead_prices as normal, and timescaledb underneath will make sure the data goes into the proper partitions/chunks
Does the id need to be unique for this table?
If you're going to be keeping time-series data
then each row may not really be unique for each id, but may be uniquely identified by the id at a given time.
You can create a separate table for the items that you're identifying
items(id PRIMARY KEY, country_code) and have the hypertable be
day_ahead_prices(time, value, item_id REFERENCES items(id))

How to add a sort key to an existing table in AWS Redshift

In AWS Redshift, I want to add a sort key to a table that is already created. Is there any command which can add a column and use it as sort key?
UPDATE:
Amazon Redshift now enables users to add and change sort keys of existing Redshift tables without having to re-create the table. The new capability simplifies user experience in maintaining the optimal sort order in Redshift to achieve high performance as their query patterns evolve and do it without interrupting the access to the tables.
source: https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-redshift-supports-changing-table-sort-keys-dynamically/
At the moment I think its not possible (hopefully that will change in the future). In the past when I ran into this kind of situation I created a new table and copied the data from the old one into it.
from http://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html:
ADD [ COLUMN ] column_name
Adds a column with the specified name to the table. You can add only one column in each ALTER TABLE statement.
You cannot add a column that is the distribution key (DISTKEY) or a sort key (SORTKEY) of the table.
You cannot use an ALTER TABLE ADD COLUMN command to modify the following table and column attributes:
UNIQUE
PRIMARY KEY
REFERENCES (foreign key)
IDENTITY
The maximum column name length is 127 characters; longer names are truncated to 127 characters. The maximum number of columns you can define in a single table is 1,600.
As Yaniv Kessler mentioned, it's not possible to add or change distkey and sort key after creating a table, and you have to recreate a table and copy all data to the new table.
You can use the following SQL format to recreate a table with a new design.
ALTER TABLE test_table RENAME TO old_test_table;
CREATE TABLE new_test_table([new table columns]);
INSERT INTO new_test_table (SELECT * FROM old_test_table);
ALTER TABLE new_test_table RENAME TO test_table;
DROP TABLE old_test_table;
In my experience, this SQL is used for not only changing distkey and sortkey, but also setting the encoding(compression) type.
To add to Yaniv's answer, the ideal way to do this is probably using the CREATE TABLE AS command. You can specify the distkey and sortkey explicitly. I.e.
CREATE TABLE test_table_with_dist
distkey(field)
sortkey(sortfield)
AS
select * from test_table
Additional examples:
http://docs.aws.amazon.com/redshift/latest/dg/r_CTAS_examples.html
EDIT
I've noticed that this method doesn't preserve encoding. Redshift only automatically encodes during a copy statement. If this is a persistent table you should redefine the table and specify the encoding.
create table test_table_with_dist(
field1 varchar encode row distkey
field2 timestam pencode delta sortkey);
insert into test_table select * from test_table;
You can figure out which encoding to use by running analyze compression test_table;
AWS now allows you to add both sortkeys and distkeys without having to recreate tables:
TO add a sortkey (or alter a sortkey):
ALTER TABLE data.engagements_bot_free_raw
ALTER SORTKEY (id)
To alter a distkey or add a distkey:
ALTER TABLE data.engagements_bot_free_raw
ALTER DISTKEY id
Interestingly, the paranthesis are mandatory on SORTKEY, but not on DISTKEY.
You still cannot inplace change the encoding of a table - that still requires the solutions where you must recreate tables.
I followed this approach for adding the sort columns to my table table_transactons its more or less same approach only less number of commands.
alter table table_transactions rename to table_transactions_backup;
create table table_transactions compound sortkey(key1, key2, key3, key4) as select * from table_transactions_backup;
drop table table_transactions_backup;
Catching this query a bit late.
I find that using 1=1 the best way to create and replicate data into another table in redshift
eg:
CREATE TABLE NEWTABLE AS SELECT * FROM OLDTABLE WHERE 1=1;
then you can drop the OLDTABLE after verifying that the data has been copied
(if you replace 1=1 with 1=2, it copies only the structure - which is good for creating staging tables)
it is now possible to alter a sort kay:
Amazon Redshift now supports changing table sort keys dynamically
Amazon Redshift now enables users to add and change sort keys of existing Redshift tables without having to re-create the table. The new capability simplifies user experience in maintaining the optimal sort order in Redshift to achieve high performance as their query patterns evolve and do it without interrupting the access to the tables.
Customers when creating Redshift tables can optionally specify one or more table columns as sort keys. The sort keys are used to maintain the sort order of the Redshift tables and allows the query engine to achieve high performance by reducing the amount of data to read from disk and to save on storage with better compression. Currently Redshift customers who desire to change the sort keys after the initial table creation will need to re-create the table with new sort key definitions.
With the new ALTER SORT KEY command, users can dynamically change the Redshift table sort keys as needed. Redshift will take care of adjusting data layout behind the scenes and table remains available for users to query. Users can modify sort keys for a given table as many times as needed and they can alter sort keys for multiple tables simultaneously.
For more information ALTER SORT KEY, please refer to the documentation.
documentation
as for the documentation itself:
ALTER DISTKEY column_name or ALTER DISTSTYLE KEY DISTKEY column_name A
clause that changes the column used as the distribution key of a
table. Consider the following:
VACUUM and ALTER DISTKEY cannot run concurrently on the same table.
If VACUUM is already running, then ALTER DISTKEY returns an error.
If ALTER DISTKEY is running, then background vacuum doesn't start on a table.
If ALTER DISTKEY is running, then foreground vacuum returns an error.
You can only run one ALTER DISTKEY command on a table at a time.
The ALTER DISTKEY command is not supported for tables with interleaved sort keys.
When specifying DISTSTYLE KEY, the data is distributed by the values in the DISTKEY column. For more information about DISTSTYLE, see CREATE TABLE.
ALTER [COMPOUND] SORTKEY ( column_name [,...] ) A clause that changes
or adds the sort key used for a table. Consider the following:
You can define a maximum of 400 columns for a sort key per table.
You can only alter a compound sort key. You can't alter an interleaved sort key.
When data is loaded into a table, the data is loaded in the order of the sort key. When you alter the sort key, Amazon Redshift reorders the data. For more information about SORTKEY, see CREATE TABLE.
According to the updated documentation it is now possible to change a sort key type with:
ALTER [COMPOUND] SORTKEY ( column_name [,...] )
For reference (https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html):
"You can alter an interleaved sort key to a compound sort key or no sort key. However, you can't alter a compound sort key to an interleaved sort key."
ALTER TABLE table_name ALTER SORTKEY (sortKey1, sortKey2 ...etc)

About clustered index in postgres

I'm using psql to access a postgres database. When viewing the metadata of a table, is there any way to see whether an index of a table is a clustered index?
I heard that the PRIMARY KEY of a table is automatically associated with a clustered index, is it true?
Note that PostgreSQL uses the term "clustered index" to use something vaguely similar and yet very different to SQL Server.
If a particular index has been nominated as the clustering index for a table, then psql's \d command will indicate the clustered index, e.g.,
Indexes:
"timezone_description_pkey" PRIMARY KEY, btree (timezone) CLUSTER
PostgreSQL does not nominate indices as clustering indices by default. Nor does it automatically arrange table data to correlate with the clustered index even when so nominated: the CLUSTER command has to be used to reorganise the table data.
In PostgreSQL the clustered attribute is held in the metadata of the corresponding index, rather than the relation itself. It is the indisclustered attribute in pg_index catalogue. Note, however, that clustering relations within postgres is a one-time action: even if the attribute is true, updates to the table do not maintain the sorted nature of the data. To date, automatic maintenance of data clustering remains a popular TODO item.
There is often confusion between clustered and integrated indexes, particularly since the popular textbooks use conflicting names, and the terminology is different again in the manuals of postgres and SQL server (to name just two). When I talk about an integrated index (also called a main index or primary index) I mean one in which the relation data is contained in the leaves of the index, as opposed an external or secondary index in which the leaves contain index entries that point to the table records. The former type is necessarily always clustered. Unfortunately postgres only supports the latter type. Anyhow, the fact that an integrated (primary) index is always clustered may have given rise to the belief that "a PRIMARY KEY of a table is automatically associated with a clustered index". The two statements sound similar, but are different.
PostgreSQL does not have direct implementation of CLUSTER index like Microsoft SQL Server.
Reference Taken from this Blog:
In PostgreSQL, we have one CLUSTER command which is similar to Cluster Index.
Once you create your table primary key or any other Index, you can execute the CLUSTER command by specifying that Index name to achieve the physical order of the Table Data.
When a table is clustered, it is physically reordered based on the index information. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order.
Syntax of Cluster:
First time you must execute CLUSTER using the Index Name.
CLUSTER table_name USING index_name;
Cluster the table:
Once you have executed CLUSTER with Index, next time you should execute only CLUSTER TABLE because It knows that which index already defined as CLUSTER.
CLUSTER table_name;
is there any way to see whether an index of a table is a clustered index
PostgreSQL does not have a clustered index, so you won't be able to see them.
I heard that the PRIMARY KEY of a table is automatically associated with a clustered index, is it true?
No, that's not true (see above)
You can manually cluster a table along an index, but this is nothing that will be maintained automatically (as e.g. with SQL Server's clustered indexes).
For more details, see the description of the CLUSTER command in the manual.
Cluster Indexing
A cluster index means telling the database to store the close values actually close to one another on the disk. They can uniquely identify the rows in the SQL table. Every table can have exactly one one clustered index. A cluster index can cover more than one column. By default, a column with a primary key already has a clustered index.
dictionary
A dictionary itself is a table with clustered index. Because all the data is physically stored in alphabetical order.
Non-Cluster Indexing
Non-clustered indexing is like simple indexing of a book. They are just used for fast retrieval of data. Not sure to have unique data. A non-clustered index contains the non-clustered index keys and their corresponding data location pointer. For example, a book's content index contains the key of a topic or chapter and the page location of that.
book content index
A book's content table holds the content name and its page location. It is not sure that the data is unique. Because same paragraph or text line or word can be placed many times.
PostgreSQL Indexing
PostgreSQL automatically creates indexes for PRIMARY KEY and every UNIQUE constraints of a table. Login to a database in PostgreSQL terminal and type \d table_name. All stored indexes will be visualized. If there is a clustered index then it will also be identified.
Creating a table
CREATE TABLE IF NOT EXISTS profile(
uid serial NOT NULL UNIQUE PRIMARY KEY,
username varchar(30) NOT NULL UNIQUE,
phone varchar(11) NOT NULL UNIQUE,
age smallint CHECK(age>12),
address text NULL
);
3 index will be created automatically. All these indexes are non clustered
"profile_pkey" PRIMARY KEY, btree (uid)
"profile_phone_key" UNIQUE CONSTRAINT, btree (phone)
"profile_username_key" UNIQUE CONSTRAINT, btree (username)
Create our own index with uid and username
CREATE INDEX profile_index ON profile(uid, username);
This actually creates a non-clustered index. To make it clustered, run the next part.
Transform a non-clustered index into a clustered one
ALTER TABLE profile CLUSTER ON profile_index;
Check the table with \d profile. It will be like this:
Table "public.profile"
Column | Type | Collation | Nullable | Default
----------+-----------------------+-----------+----------+--------------------------------------
uid | integer | | not null | nextval('profile_uid_seq'::regclass)
username | character varying(30) | | not null |
phone | character varying(11) | | not null |
age | smallint | | |
address | text | | |
Indexes:
"profile_pkey" PRIMARY KEY, btree (uid)
"profile_phone_key" UNIQUE CONSTRAINT, btree (phone)
"profile_username_key" UNIQUE CONSTRAINT, btree (username)
"profile_index" btree (uid, username) CLUSTER
Check constraints:
"profile_age_check" CHECK (age > 12)
Notice that the profile_index is now "CLUSTER"
Now, re-cluster the table so that the table can follow the cluster index role
CLUSTER profile;
If you want to know if a given table is CLUSTERed using SQL, you can use the following query to show the index being used (tested in Postgres versions 9.5 and 9.6):
SELECT
i.relname AS index_for_cluster
FROM
pg_index AS idx
JOIN
pg_class AS i
ON
i.oid = idx.indexrelid
WHERE
idx.indisclustered
AND idx.indrelid::regclass = 'your_table_name'::regclass;