Duplicate key in a PostgreSQL index

I want to move my OwnCloud database to a new server, but the operation fails during restore.
pg_restore: [archiver (db)] COPY failed for table "oc_storages": ERROR: duplicate key value violates unique constraint "storages_id_index"
DETAIL: Key (id)=(local::/var/www/owncloud_data/) already exists.
Indeed, a simple query on the oc_storages table shows that there is a duplicate.
ocl=# select * from oc_storages where id ~* 'owncloud_data';
               id               | numeric_id | available | last_checked
--------------------------------+------------+-----------+--------------
 local::/var/www/owncloud_data/ |        491 |         1 |
 local::/var/www/owncloud_data/ |        838 |         1 |
(2 rows)
But at the same time, PostgreSQL managed to create a unique index on this table's id column (storages_id_index). How is it possible that PostgreSQL accepts this duplicate in the table?
ocl=# SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'oc_storages';
indexname | indexdef
-------------------+-------------------------------------------------------------------------------------
oc_storages_pkey | CREATE UNIQUE INDEX oc_storages_pkey ON public.oc_storages USING btree (numeric_id)
storages_id_index | CREATE UNIQUE INDEX storages_id_index ON public.oc_storages USING btree (id)
(2 rows)
What should I do to get out of this impasse: delete one of the two rows? Which one?
Thanks in advance.
Ernest.

There are usually two explanations for this:
Hardware problems leading to data corruption. Then remove conflicting rows manually, export the database and import it into a newly created cluster to get rid of potential lurking data corruption.
You upgraded the C library on the operating system and the collations changed, corrupting the index. Then remove conflicting rows manually and REINDEX the indexes with string columns.
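A minimal sketch of the "remove conflicting rows, then REINDEX" step for the table in the question. Turning off index scans is an assumption made here so that the duplicate check reads the table itself rather than the possibly damaged index. The value 838 is only a placeholder: which of the two rows to keep depends on which numeric_id the rest of the application's tables still refer to.

```sql
-- Read the table directly instead of trusting the suspect index.
SET enable_indexscan = off;
SET enable_indexonlyscan = off;
SET enable_bitmapscan = off;

-- List every id that occurs more than once, with the rows involved.
SELECT id, count(*) AS copies, array_agg(numeric_id) AS numeric_ids
FROM oc_storages
GROUP BY id
HAVING count(*) > 1;

BEGIN;
-- Placeholder choice: delete by primary key so only one of the twins is removed.
DELETE FROM oc_storages WHERE numeric_id = 838;
COMMIT;

-- Rebuild the unique index from the cleaned table.
REINDEX INDEX storages_id_index;

-- Restore the default planner settings for this session.
RESET enable_indexscan;
RESET enable_indexonlyscan;
RESET enable_bitmapscan;
```

If the REINDEX still reports a duplicate, the first query missed a pair and should be rerun before trying again.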

This is one of those semantic annoyances I have with Postgres: creating a UNIQUE INDEX on a table does not register a named table constraint, although the index itself still rejects duplicate keys.
To get the constraint, you need to add it explicitly USING the created index, e.g.:
CREATE UNIQUE INDEX oc_storages_pkey ON public.oc_storages USING btree (numeric_id);
ALTER TABLE public.oc_storages ADD CONSTRAINT oc_storages_pkey UNIQUE USING INDEX oc_storages_pkey;
Either way, duplicate values in a column covered by a valid unique index would be a case of corruption.

Related

Delete duplicate entries in Postgresql

I have a users table that contains multiple entries for the same user, and I need to delete the duplicates.
How can I skip the entries that are referenced by foreign keys and delete the remaining ones? For example, below are the entries I have in the table. I need to delete the duplicate entries that are not referenced by foreign keys. Could anyone please guide me on how to do this in PostgreSQL?
id   | name              | email              | role_id
2512 | Raja (Contractor) | raja_test@test.com | 5
6    | Raja (Contractor) | raja_test@test.com | 5
5    | Raja (Contractor) | raja_test@test.com | 5
I have tried below query
delete from users a using users b where a.email=b.email ;
ERROR: update or delete on table "users" violates foreign key constraint "fk_rails_c5e2af0763" on table "devices"
DETAIL: Key (id)=(14) is still referenced from table "devices".
Devices table
id | mac_address | model | user_id
14 | 14:5E:BE:26 | Arris | 6
You can use:
ALTER TABLE users disable TRIGGER ALL;
-- your delete query
ALTER TABLE users enable TRIGGER ALL;
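When you use DISABLE TRIGGER ALL in PostgreSQL, the internal triggers that enforce foreign-key constraints on the selected table are disabled as well (this requires superuser privileges). The delete is then no longer checked, so it can leave rows in referencing tables that point at users that no longer exist.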
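As an alternative that keeps the foreign-key checks on, here is a hedged sketch of a delete that skips referenced rows. It assumes that devices.user_id is the only column referencing users.id and that "duplicate" means "same email"; both are assumptions read off the question, not confirmed facts about the schema.

```sql
-- Delete a user row only if
--   1. no device references it, and
--   2. another row with the same email survives: either one that is
--      referenced by a device, or one with a smaller id.
DELETE FROM users AS u
WHERE NOT EXISTS (
        SELECT 1 FROM devices AS d WHERE d.user_id = u.id
      )
  AND EXISTS (
        SELECT 1
        FROM users AS k
        WHERE k.email = u.email
          AND k.id <> u.id
          AND (
                k.id < u.id
                OR EXISTS (SELECT 1 FROM devices AS d WHERE d.user_id = k.id)
              )
      );
```

With the sample rows from the question, this removes ids 5 and 2512 and keeps id 6, the one the devices table points at. When no row of a duplicate group is referenced, the row with the smallest id is kept.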

ERROR: cannot create a unique index without the column "date_time" (used in partitioning)

I just started using timescaleDB with postgresql. I have a database named storage_db which contains a table named day_ahead_prices.
After installing timescaledb, I was following Migrate from the same postgresql database to migrate my storage_db into a timescaledb.
When I did (indexes included):
CREATE TABLE tsdb_day_ahead_prices (LIKE day_ahead_prices INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
select create_hypertable('tsdb_day_ahead_prices', 'date_time');
It gave me the following error:
ERROR: cannot create a unique index without the column "date_time" (used in partitioning)
But when I did (indexes excluded):
CREATE TABLE tsdb_day_ahead_prices (LIKE day_ahead_prices INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);
select create_hypertable('tsdb_day_ahead_prices', 'date_time');
It was successful. Following which, I did
select create_hypertable('tsdb_day_ahead_prices', 'date_time');
and it gave me the following output:
create_hypertable
------------------------------------
(3,public,tsdb_day_ahead_prices,t)
(1 row)
I am a bit new to this so can anyone please explain to me what is the difference between both of them and why was I getting an error in the first case?
P.S.:
My day_ahead_prices looks as follows:
id | country_code | values | date_time
----+--------------+---------+----------------------------
1 | LU | 100.503 | 2020-04-11 14:04:30.461605
2 | LU | 100.503 | 2020-04-11 14:18:39.600574
3 | DE | 106.68 | 2020-04-11 15:59:10.223965
Edit 1:
I created the day_ahead_prices table in python using flask and flask_sqlalchemy and the code is:
class day_ahead_prices(db.Model):
    __tablename__ = "day_ahead_prices"
    id = db.Column(db.Integer, primary_key=True)
    country_code = db.Column(avail_cc_enum, nullable=False)
    values = db.Column(db.Float(precision=2), nullable=False)
    # Pass a callable so the timestamp is taken at insert time, not once at import time
    date_time = db.Column(db.DateTime, default=lambda: datetime.now(tz=tz), nullable=False)

    def __init__(self, country_code, values):
        self.country_code = country_code
        self.values = values
When executing CREATE TABLE tsdb_day_ahead_prices (LIKE day_ahead_prices INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES); you're telling the database to create the tsdb_day_ahead_prices table using the day_ahead_prices as a template (same columns, same types for those columns), but you're also telling it to include the default values, constraints and indexes that you have defined on the original table, and apply/create the same for your new table.
Then you are executing the TimescaleDB command that makes the tsdb_day_ahead_prices table a hypertable. A hypertable is an abstraction that hides away the partitioning of the physical table (https://www.timescale.com/products/how-it-works). You are telling TimescaleDB to make tsdb_day_ahead_prices a hypertable using the date_time column as the partitioning key.
When creating hypertables, one constraint that TimescaleDB imposes is that the partitioning column (in your case date_time) must be included in every unique index (and primary key) on that table (https://docs.timescale.com/latest/using-timescaledb/schema-management#indexing-best-practices).
The first error you get, cannot create a unique index without the column "date_time", is exactly because of this. You copied the primary key definition on the id column, and that primary key does not include date_time, so it prevents the table from becoming a hypertable.
The second time, you created the tsdb_day_ahead_prices table but didn't copy the indexes from the original table, so the primary key (which is really a unique index) is not defined, and the creation of the hypertable was successful.
The output of the create_hypertable function tells you that you have a new hypertable: the internal id that TimescaleDB uses for it, its schema (public), its name, and whether it was created.
So now you can use tsdb_day_ahead_prices as a normal table, and TimescaleDB will make sure underneath that the data goes into the proper partitions/chunks.
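If you still want a primary key on the new table, one way to satisfy the rule above is to make it a composite key that includes the partitioning column. This is a sketch under that assumption; the key (id, date_time) is my choice for illustration, not something the original schema defines.

```sql
-- Copy the column definitions but not the old single-column primary key.
CREATE TABLE tsdb_day_ahead_prices
    (LIKE day_ahead_prices INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);

-- The unique index behind this key contains date_time, so it is allowed.
ALTER TABLE tsdb_day_ahead_prices ADD PRIMARY KEY (id, date_time);

-- Turn the table into a hypertable partitioned on date_time.
SELECT create_hypertable('tsdb_day_ahead_prices', 'date_time');

-- Copy the existing rows over.
INSERT INTO tsdb_day_ahead_prices SELECT * FROM day_ahead_prices;
```

Note that a composite key only guarantees that each (id, date_time) pair is unique, not that id alone is.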
Does the id need to be unique for this table?
If you're going to be keeping time-series data, then a row may not be uniquely identified by its id alone, but by the id at a given time.
You can create a separate table for the items that you're identifying, items(id PRIMARY KEY, country_code), and have the hypertable be day_ahead_prices(time, value, item_id REFERENCES items(id)).
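Spelled out as DDL, the two-table layout described above might look like the following. The column types are assumptions picked for illustration, since the answer only names the columns.

```sql
-- Regular table: one row per item being tracked.
CREATE TABLE items (
    id           integer PRIMARY KEY,
    country_code text NOT NULL
);

-- Time-series table: many rows per item, no unique index on item_id alone.
CREATE TABLE day_ahead_prices (
    time    timestamptz      NOT NULL,
    value   double precision NOT NULL,
    item_id integer          NOT NULL REFERENCES items (id)
);

-- Partition the time-series table on its time column.
SELECT create_hypertable('day_ahead_prices', 'time');
```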

Index creation taking forever on postgres

I was trying to create a btree index on an integer column, but it was taking forever (more than 2 hours!).
The table has 17,514,879 rows. I didn't expect it to take that long.
After almost 2.5 hours, the connection to the database just died. When I reconnected, the index was there, but I don't know whether it is sound.
How can I be sure the index was not messed up by the lost connection?
How to Check Whether the Index is Fine
Connect to the database via psql and run \d table_name (where table_name is the name of your table). For example:
grn=# \d users
Table "public.users"
Column | Type | Modifiers
--------+------------------------+-----------
name | character varying(255) |
Indexes:
"users_name_idx" btree (name)
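You'll see the indexes listed below the table schema. An index that was left unusable (for example by an interrupted CREATE INDEX CONCURRENTLY) is marked INVALID there. A plain CREATE INDEX runs inside a transaction, so it either completes fully or leaves no index behind.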
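The same check can be done in SQL through the pg_index catalog. This is a small sketch using the users table from the example above. The last statement relies on the amcheck extension that ships with PostgreSQL 10 and later; whether it is available on your server is an assumption.

```sql
-- indisvalid = false means the index is not usable for queries;
-- indisready = false means it is not even being maintained on writes.
SELECT c.relname AS index_name, i.indisvalid, i.indisready
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
WHERE i.indrelid = 'users'::regclass;

-- Optional deeper check of the btree structure itself.
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_check('users_name_idx'::regclass);
```

bt_index_check raises an error if it finds an inconsistency and returns quietly otherwise.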
How to Create Indexes Without Locking the Whole Table
You can create an index in a way that doesn't block writes to the table, at the cost of being even slower. To do so, add CONCURRENTLY to CREATE INDEX. For example:
CREATE INDEX CONCURRENTLY users_name_idx ON users(name);
How to Fix a Corrupt Index
If the index is corrupt, you can either drop it and recreate it CONCURRENTLY or use REINDEX INDEX index_name. For example:
REINDEX INDEX users_name_idx;
will recreate users_name_idx.

At what level do Postgres index names need to be unique?

In Microsoft SQL Server and MySQL, index names need to be unique within the table, but not within the database. This doesn't seem to be the case for PostgreSQL.
Here's what I'm doing: I made a copy of a table using CREATE TABLE new_table AS SELECT * FROM old_table etc. and need to re-create the indexes.
Running a query like CREATE INDEX idx_column_name ON new_table USING GIST(column_name) causes ERROR: relation "idx_column_name" already exists
What's going on here?
Indexes and tables (and views, and sequences, and...) are stored in the pg_class catalog, and they're unique per schema due to a unique key on it:
# \d pg_class
Table "pg_catalog.pg_class"
Column | Type | Modifiers
----------------+-----------+-----------
relname | name | not null
relnamespace | oid | not null
...
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
Per @wildplasser's comment, you can omit the name when creating the index, and PG will assign a unique name automatically.
Names are unique within the schema. A schema is basically a namespace for tables and constraints (and indexes, functions, etc.).
Cross-schema constraints are allowed.
Indexes share their namespace (the schema) with tables (for Postgres, an index is a relation like a table).
(IIRC) the SQL standard does not define indexes; use constraints whenever you can (the GiST index in the question is probably an exception).
Ergo, you'll need to invent another name,
or omit it: the system can invent a name if you don't supply one.
The downside of this: you can create multiple indexes with the same definition (their names will be suffixed with _1, _2, IIRC).
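For the copy in the question, both options could look like this. new_table and column_name are the placeholders from the question, and the explicit name in the second statement is made up for illustration.

```sql
-- Option 1: omit the name and let PostgreSQL generate one.
CREATE INDEX ON new_table USING gist (column_name);

-- Option 2: pick a name that includes the table, so it cannot clash
-- with the index that already exists on old_table.
CREATE INDEX idx_new_table_column_name ON new_table USING gist (column_name);

-- Inspect the names that ended up in the schema.
SELECT schemaname, indexname
FROM pg_indexes
WHERE tablename = 'new_table';
```

Running both statements creates two indexes with the same definition, which is the downside mentioned above, so in practice you would pick one.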

About clustered index in postgres

I'm using psql to access a postgres database. When viewing the metadata of a table, is there any way to see whether an index of a table is a clustered index?
I heard that the PRIMARY KEY of a table is automatically associated with a clustered index, is it true?
Note that PostgreSQL uses the term "clustered index" to mean something vaguely similar to, and yet very different from, what SQL Server means by it.
If a particular index has been nominated as the clustering index for a table, then psql's \d command will indicate the clustered index, e.g.,
Indexes:
"timezone_description_pkey" PRIMARY KEY, btree (timezone) CLUSTER
PostgreSQL does not nominate indices as clustering indices by default. Nor does it automatically arrange table data to correlate with the clustered index even when so nominated: the CLUSTER command has to be used to reorganise the table data.
In PostgreSQL the clustered attribute is held in the metadata of the corresponding index, rather than the relation itself. It is the indisclustered attribute in pg_index catalogue. Note, however, that clustering relations within postgres is a one-time action: even if the attribute is true, updates to the table do not maintain the sorted nature of the data. To date, automatic maintenance of data clustering remains a popular TODO item.
There is often confusion between clustered and integrated indexes, particularly since the popular textbooks use conflicting names, and the terminology is different again in the manuals of postgres and SQL server (to name just two). When I talk about an integrated index (also called a main index or primary index) I mean one in which the relation data is contained in the leaves of the index, as opposed to an external or secondary index in which the leaves contain index entries that point to the table records. The former type is necessarily always clustered. Unfortunately postgres only supports the latter type. Anyhow, the fact that an integrated (primary) index is always clustered may have given rise to the belief that "a PRIMARY KEY of a table is automatically associated with a clustered index". The two statements sound similar, but are different.
PostgreSQL does not have direct implementation of CLUSTER index like Microsoft SQL Server.
Reference Taken from this Blog:
In PostgreSQL, we have one CLUSTER command which is similar to Cluster Index.
Once you create your table primary key or any other Index, you can execute the CLUSTER command by specifying that Index name to achieve the physical order of the Table Data.
When a table is clustered, it is physically reordered based on the index information. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order.
Syntax of Cluster:
The first time, you must execute CLUSTER using the index name.
CLUSTER table_name USING index_name;
Cluster the table:
Once you have executed CLUSTER with an index, next time you only need to run CLUSTER table_name, because PostgreSQL remembers which index the table was clustered on.
CLUSTER table_name;
is there any way to see whether an index of a table is a clustered index
PostgreSQL does not have a clustered index, so you won't be able to see them.
I heard that the PRIMARY KEY of a table is automatically associated with a clustered index, is it true?
No, that's not true (see above)
You can manually cluster a table along an index, but this is nothing that will be maintained automatically (as e.g. with SQL Server's clustered indexes).
For more details, see the description of the CLUSTER command in the manual.
Cluster Indexing
A clustered index tells the database to store rows with nearby key values physically close to one another on disk. It can also uniquely identify the rows in the table. A table can have at most one clustered index, which can cover more than one column. In some databases (SQL Server, for example), a primary key column gets a clustered index by default; PostgreSQL does not do this.
dictionary
A dictionary is itself like a table with a clustered index, because all the data is physically stored in alphabetical order.
Non-Cluster Indexing
Non-clustered indexing is like the simple index of a book. Such indexes are just used for fast retrieval of data and are not guaranteed to hold unique values. A non-clustered index contains the index keys and a pointer to the location of the corresponding data. For example, a book's table of contents holds the title of a topic or chapter and its page location.
book content index
A book's table of contents holds the content name and its page location. The data is not guaranteed to be unique, because the same paragraph, line of text or word can appear many times.
PostgreSQL Indexing
PostgreSQL automatically creates an index for the PRIMARY KEY and for every UNIQUE constraint of a table. Log in to a database in the psql terminal and type \d table_name. All indexes on the table will be listed. If one of them is a clustered index, it will be marked as such.
Creating a table
CREATE TABLE IF NOT EXISTS profile(
    uid serial NOT NULL UNIQUE PRIMARY KEY,
    username varchar(30) NOT NULL UNIQUE,
    phone varchar(11) NOT NULL UNIQUE,
    age smallint CHECK(age>12),
    address text NULL
);
Three indexes will be created automatically. All of them are non-clustered:
"profile_pkey" PRIMARY KEY, btree (uid)
"profile_phone_key" UNIQUE CONSTRAINT, btree (phone)
"profile_username_key" UNIQUE CONSTRAINT, btree (username)
Create our own index with uid and username
CREATE INDEX profile_index ON profile(uid, username);
This actually creates a non-clustered index. To make it clustered, run the next part.
Transform a non-clustered index into a clustered one
ALTER TABLE profile CLUSTER ON profile_index;
Check the table with \d profile. It will be like this:
Table "public.profile"
Column | Type | Collation | Nullable | Default
----------+-----------------------+-----------+----------+--------------------------------------
uid | integer | | not null | nextval('profile_uid_seq'::regclass)
username | character varying(30) | | not null |
phone | character varying(11) | | not null |
age | smallint | | |
address | text | | |
Indexes:
"profile_pkey" PRIMARY KEY, btree (uid)
"profile_phone_key" UNIQUE CONSTRAINT, btree (phone)
"profile_username_key" UNIQUE CONSTRAINT, btree (username)
"profile_index" btree (uid, username) CLUSTER
Check constraints:
"profile_age_check" CHECK (age > 12)
Notice that the profile_index is now "CLUSTER"
Now re-cluster the table so that its rows follow the order of the clustering index:
CLUSTER profile;
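To see whether the reordering took effect, one option is to look at the planner statistics. This is a sketch that assumes the table holds enough rows for ANALYZE to produce statistics; on an empty table the query returns nothing.

```sql
-- Refresh the statistics after the table was rewritten.
ANALYZE profile;

-- correlation close to 1 (or -1) means the physical row order closely
-- follows the column's sort order; values near 0 mean no such ordering.
SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'profile'
  AND attname = 'uid';
```

Because clustering is not maintained automatically, this value tends to drift away from 1 as rows are inserted and updated, which can serve as a hint for when to run CLUSTER again.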
If you want to know if a given table is CLUSTERed using SQL, you can use the following query to show the index being used (tested in Postgres versions 9.5 and 9.6):
SELECT i.relname AS index_for_cluster
FROM pg_index AS idx
JOIN pg_class AS i ON i.oid = idx.indexrelid
WHERE idx.indisclustered
  AND idx.indrelid::regclass = 'your_table_name'::regclass;