Why is this Postgres query throwing "duplicate key value violates unique constraint"? [duplicate] - postgresql

This question already has answers here:
Insert, on duplicate update in PostgreSQL?
(18 answers)
Closed 8 years ago.
I've implemented a simple update/insert query like this:
-- NOTE: :time is replaced in real code, ids are placed statically for example purposes
-- set status_id=1 to existing rows, update others
UPDATE account_addresses
SET status_id = 1, updated_at = :time
WHERE account_id = 1
AND address_id IN (1,2,3)
AND status_id IN (2);
-- filter values according to what that update query returns, i.e. construct a query like this to insert the remaining new records:
INSERT INTO account_addresses (account_id, address_id, status_id, created_at, updated_at)
SELECT account_id, address_id, status_id, created_at::timestamptz, updated_at::timestamptz
FROM (VALUES (1,1,1,:time,:time),(1,2,1,:time,:time)) AS sub(account_id, address_id, status_id, created_at, updated_at)
WHERE NOT EXISTS (
SELECT 1 FROM account_addresses AS aa2
WHERE aa2.account_id = sub.account_id AND aa2.address_id = sub.address_id
)
RETURNING id;
-- throws:
-- PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "..."
-- DETAIL: Key (account_id, address_id)=(1, 1) already exists.
The reason why I'm doing it this way is: the record MAY exist with status_id=2. If so, set status_id=1.
Then insert new records. If a record already exists but was not affected by the first UPDATE query, ignore it (i.e. rows with status_id=3).
This works nicely, but when run concurrently it crashes on a duplicate key due to a race condition.
But why does a race condition occur, if I'm trying to do the "insert-where-not-exists" atomically?

Ah. I just searched a little more: INSERT ... WHERE NOT EXISTS is not atomic.
Quote from http://www.postgresql.org/message-id/26970.1296761016@sss.pgh.pa.us :
Mage writes:
The main question is that isn't "insert into ... select ... where not
exists" atomic?
No, it isn't: it will fail in the presence of other transactions
doing the same thing, because the EXISTS test will only see rows that
committed before the command started. You might care to read the
manual's chapter about concurrency:
http://www.postgresql.org/docs/9.0/static/mvcc.html
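For the record: on Postgres 9.5 or later (released after this question was asked), both statements can be collapsed into a single atomic upsert. A sketch, assuming a UNIQUE constraint on (account_id, address_id) and the same :time placeholder as above:
INSERT INTO account_addresses AS aa
(account_id, address_id, status_id, created_at, updated_at)
VALUES (1, 1, 1, :time, :time)
, (1, 2, 1, :time, :time)
ON CONFLICT (account_id, address_id) DO UPDATE
SET status_id = 1, updated_at = EXCLUDED.updated_at
WHERE aa.status_id = 2 -- only flip rows that currently have status_id = 2
RETURNING id;
Existing rows with any other status_id fail the WHERE clause: they are left untouched and not returned, matching the original intent.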

Related

Redshift insert a date value into a table

insert into table1 (ID,date)
select
ID,sysdate
from table2
Assume I insert a record into table2 with values ID: 1, date: 2023-1-1.
The expected result is to update the ID of table1 based on the ID from table2, and to update the date value of table1 based on SYSDATE.
select *
from table1;
The expected result after running the insert statement would be:
ID | date
---+----------
 1 | 2023-1-6
But what I get is:
ID | date
---+----------
 1 | 2023-1-1
I see a few possibilities based on the information given:
You say "the expected result is update the ID of table1 base on the ID from table2", and this begs the question: did ID = 1 exist in table1 BEFORE you ran the INSERT statement? If so, are you expecting that the INSERT will update the value for ID #1? Redshift doesn't enforce or check uniqueness of primary keys, so you would get 2 rows in table1 in this case. Is this what is happening?
SYSDATE on Redshift provides the start timestamp of the current transaction, NOT the current statement (see the sketch below). Have you had the current transaction open since the 1st?
You didn't COMMIT the results (or the statement failed) and are checking from a different session. It could also be that the transaction in the second session started before the COMMIT completed. Working with MVCC across multiple sessions can trip anyone up.
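A minimal demonstration of point 2 (timestamps are illustrative):
BEGIN;
SELECT sysdate; -- e.g. 2023-01-01 09:00:00
-- ... any amount of time passes while this transaction stays open ...
SELECT sysdate; -- same value: SYSDATE is pinned to the transaction start
COMMIT;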
There are likely other possible explanations. If you could provide DDL, sample data, and a simple test case so that others can recreate what you are seeing, it would greatly narrow down the possibilities.

Return identifier of conflicted insert in postgres [duplicate]

I have the following UPSERT in PostgreSQL 9.5:
INSERT INTO chats ("user", "contact", "name")
VALUES ($1, $2, $3),
($2, $1, NULL)
ON CONFLICT("user", "contact") DO NOTHING
RETURNING id;
If there are no conflicts it returns something like this:
 id
----
 50
 51
(2 rows)
But if there are conflicts it doesn't return any rows:
 id
----
(0 rows)
I want to return the new id values if there are no conflicts, or the existing id values of the conflicting rows.
Can this be done? If so, how?
The currently accepted answer seems OK for a single conflict target, few conflicts, small tuples and no triggers. It avoids concurrency issue 1 (see below) with brute force. The simple solution has its appeal: the side effects may be less important.
For all other cases, though, do not update identical rows without need. Even if you see no difference on the surface, there are various side effects:
It might fire triggers that should not be fired.
It write-locks "innocent" rows, possibly incurring costs for concurrent transactions.
It might make the row seem new, though it's old (transaction timestamp).
Most importantly, with PostgreSQL's MVCC model UPDATE writes a new row version for every target row, no matter whether the row data changed. This incurs a performance penalty for the UPSERT itself, table bloat, index bloat, performance penalty for subsequent operations on the table, VACUUM cost. A minor effect for few duplicates, but massive for mostly dupes.
Plus, sometimes it is not practical or even possible to use ON CONFLICT DO UPDATE. The manual:
For ON CONFLICT DO UPDATE, a conflict_target must be provided.
A single "conflict target" is not possible if multiple indexes / constraints are involved. But here is a related solution for multiple partial indexes:
UPSERT based on UNIQUE constraint with NULL values
Back on the topic, you can achieve (almost) the same without empty updates and side effects. Some of the following solutions also work with ON CONFLICT DO NOTHING (no "conflict target"), to catch all possible conflicts that might arise - which may or may not be desirable.
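The examples below assume a table like this (hypothetical, since the question shows no DDL; the reserved word "user" is replaced with usr, see the aside at the end of this answer):
CREATE TABLE chats (
id serial PRIMARY KEY
, usr text
, contact text
, name text
, UNIQUE (usr, contact) -- the conflict target used below
);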
Without concurrent write load
WITH input_rows(usr, contact, name) AS (
VALUES
(text 'foo1', text 'bar1', text 'bob1') -- type casts in first row
, ('foo2', 'bar2', 'bob2')
-- more?
)
, ins AS (
INSERT INTO chats (usr, contact, name)
SELECT * FROM input_rows
ON CONFLICT (usr, contact) DO NOTHING
RETURNING id --, usr, contact -- return more columns?
)
SELECT 'i' AS source -- 'i' for 'inserted'
, id --, usr, contact -- return more columns?
FROM ins
UNION ALL
SELECT 's' AS source -- 's' for 'selected'
, c.id --, usr, contact -- return more columns?
FROM input_rows
JOIN chats c USING (usr, contact); -- columns of unique index
The source column is an optional addition to demonstrate how this works. You may actually need it to tell the difference between both cases (another advantage over empty writes).
The final JOIN chats works because newly inserted rows from an attached data-modifying CTE are not yet visible in the underlying table. (All parts of the same SQL statement see the same snapshots of underlying tables.)
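A tiny standalone demonstration of that visibility rule (using the chats table sketched above; assumes no such row existed before):
WITH ins AS (
INSERT INTO chats (usr, contact, name)
VALUES ('foo9', 'bar9', 'bob9')
RETURNING id
)
SELECT count(*) FROM chats
WHERE usr = 'foo9'; -- returns 0: the row inserted by the CTE is not visible yet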
Since the VALUES expression is free-standing (not directly attached to an INSERT) Postgres cannot derive data types from the target columns and you may have to add explicit type casts. The manual:
When VALUES is used in INSERT, the values are all automatically
coerced to the data type of the corresponding destination column. When
it's used in other contexts, it might be necessary to specify the
correct data type. If the entries are all quoted literal constants,
coercing the first is sufficient to determine the assumed type for all.
The query itself (not counting the side effects) may be a bit more expensive for few dupes, due to the overhead of the CTE and the additional SELECT (which should be cheap since the perfect index is there by definition - a unique constraint is implemented with an index).
May be (much) faster for many duplicates. The effective cost of additional writes depends on many factors.
But there are fewer side effects and hidden costs in any case. It's most probably cheaper overall.
Attached sequences are still advanced, since default values are filled in before testing for conflicts.
About CTEs:
Are SELECT type queries the only type that can be nested?
Deduplicate SELECT statements in relational division
With concurrent write load
Assuming default READ COMMITTED transaction isolation. Related:
Concurrent transactions result in race condition with unique constraint on insert
The best strategy to defend against race conditions depends on exact requirements, the number and size of rows in the table and in the UPSERTs, the number of concurrent transactions, the likelihood of conflicts, available resources and other factors ...
Concurrency issue 1
If a concurrent transaction has written to a row which your transaction now tries to UPSERT, your transaction has to wait for the other one to finish.
If the other transaction ends with ROLLBACK (or any error, i.e. automatic ROLLBACK), your transaction can proceed normally. Minor possible side effect: gaps in sequential numbers. But no missing rows.
If the other transaction ends normally (implicit or explicit COMMIT), your INSERT will detect a conflict (the UNIQUE index / constraint is absolute) and DO NOTHING, hence also not return the row. (Also cannot lock the row as demonstrated in concurrency issue 2 below, since it's not visible.) The SELECT sees the same snapshot from the start of the query and also cannot return the yet invisible row.
Any such rows are missing from the result set (even though they exist in the underlying table)!
This may be ok as is. Especially if you are not returning rows like in the example and are satisfied knowing the row is there. If that's not good enough, there are various ways around it.
You can check the row count of the output and repeat the statement if it does not match the row count of the input. May be good enough for the rare case. The point is to start a new query (can be in the same transaction), which will then see the newly committed rows.
Or check for missing result rows within the same query and overwrite those with the brute force trick demonstrated in Alextoni's answer.
WITH input_rows(usr, contact, name) AS ( ... ) -- see above
, ins AS (
INSERT INTO chats AS c (usr, contact, name)
SELECT * FROM input_rows
ON CONFLICT (usr, contact) DO NOTHING
RETURNING id, usr, contact -- we need unique columns for later join
)
, sel AS (
SELECT 'i'::"char" AS source -- 'i' for 'inserted'
, id, usr, contact
FROM ins
UNION ALL
SELECT 's'::"char" AS source -- 's' for 'selected'
, c.id, usr, contact
FROM input_rows
JOIN chats c USING (usr, contact)
)
, ups AS ( -- RARE corner case
INSERT INTO chats AS c (usr, contact, name) -- another UPSERT, not just UPDATE
SELECT i.*
FROM input_rows i
LEFT JOIN sel s USING (usr, contact) -- columns of unique index
WHERE s.usr IS NULL -- missing!
ON CONFLICT (usr, contact) DO UPDATE -- we've asked nicely the 1st time ...
SET name = c.name -- ... this time we overwrite with old value
-- SET name = EXCLUDED.name -- alternatively overwrite with *new* value
RETURNING 'u'::"char" AS source -- 'u' for updated
, id --, usr, contact -- return more columns?
)
SELECT source, id FROM sel
UNION ALL
TABLE ups;
It's like the query above, but we add one more step with the CTE ups, before we return the complete result set. That last CTE will do nothing most of the time. Only if rows go missing from the returned result do we use brute force.
That's more overhead, yet the more conflicts there are with pre-existing rows, the more likely this is to outperform the simple approach.
One side effect: the 2nd UPSERT writes rows out of order, so it re-introduces the possibility of deadlocks (see below) if three or more transactions writing to the same rows overlap. If that's a problem, you need a different solution - like repeating the whole statement as mentioned above.
Concurrency issue 2
If concurrent transactions can write to involved columns of affected rows, and you have to make sure the rows you found are still there at a later stage in the same transaction, you can lock existing rows cheaply in the CTE ins (which would otherwise go unlocked) with:
...
ON CONFLICT (usr, contact) DO UPDATE
SET name = name WHERE FALSE -- never executed, but still locks the row
...
And add a locking clause to the SELECT as well, like FOR UPDATE.
This makes competing write operations wait till the end of the transaction, when all locks are released. So be brief.
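Putting both pieces together, a sketch of the locking variant (based on the chats table assumed above, not a verbatim part of the original answer):
WITH input_rows(usr, contact, name) AS (
VALUES (text 'foo1', text 'bar1', text 'bob1')
)
, ins AS (
INSERT INTO chats AS c (usr, contact, name)
SELECT * FROM input_rows
ON CONFLICT (usr, contact) DO UPDATE
SET name = c.name WHERE false -- never executed, but locks the row
RETURNING id
)
, sel AS (
SELECT c.id
FROM input_rows
JOIN chats c USING (usr, contact)
FOR UPDATE OF c -- lock pre-existing rows until end of transaction
)
SELECT 'i' AS source, id FROM ins
UNION ALL
SELECT 's' AS source, id FROM sel;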
More details and explanation:
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
Is SELECT or INSERT in a function prone to race conditions?
Deadlocks?
Defend against deadlocks by inserting rows in consistent order. See:
Deadlock with multi-row INSERTs despite ON CONFLICT DO NOTHING
Data types and casts
Existing table as template for data types ...
Explicit type casts for the first row of data in the free-standing VALUES expression may be inconvenient. There are ways around it. You can use any existing relation (table, view, ...) as row template. The target table is the obvious choice for the use case. Input data is coerced to appropriate types automatically, like in the VALUES clause of an INSERT:
WITH input_rows AS (
(SELECT usr, contact, name FROM chats LIMIT 0) -- only copies column names and types
UNION ALL
VALUES
('foo1', 'bar1', 'bob1') -- no type casts here
, ('foo2', 'bar2', 'bob2')
)
...
This does not work for some data types. See:
Casting NULL type when updating multiple rows
... and names
This also works for all data types.
While inserting into all (leading) columns of the table, you can omit column names. Assuming table chats in the example only consists of the 3 columns used in the UPSERT:
WITH input_rows AS (
SELECT * FROM (
VALUES
((NULL::chats).*) -- copies whole row definition
, ('foo1', 'bar1', 'bob1') -- no type casts needed
, ('foo2', 'bar2', 'bob2')
) sub
OFFSET 1
)
...
Aside: don't use reserved words like "user" as identifier. That's a loaded footgun. Use legal, lower-case, unquoted identifiers. I replaced it with usr.
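To see the footgun in action:
SELECT user; -- perfectly legal: returns the session user name (current_user), not a column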
I had exactly the same problem, and I solved it using 'do update' instead of 'do nothing', even though I had nothing to update. In your case it would be something like this:
INSERT INTO chats ("user", "contact", "name")
VALUES ($1, $2, $3),
($2, $1, NULL)
ON CONFLICT("user", "contact")
DO UPDATE SET
name=EXCLUDED.name
RETURNING id;
This query will return all the rows, regardless of whether they were just inserted or existed before.
WITH e AS(
INSERT INTO chats ("user", "contact", "name")
VALUES ($1, $2, $3),
($2, $1, NULL)
ON CONFLICT("user", "contact") DO NOTHING
RETURNING id
)
SELECT * FROM e
UNION
SELECT id FROM chats WHERE "user" = $1 AND contact = $2;
The main purpose of using ON CONFLICT DO NOTHING is to avoid throwing an error, but it returns no rows on conflict. So we need another SELECT to get the existing id.
In this SQL, if the insert fails on a conflict, it returns nothing and the second SELECT fetches the existing row; if it inserts successfully, the same record may show up twice, so we need UNION (rather than UNION ALL) to merge the result.
Upsert, being an extension of the INSERT query, can be defined with two different behaviors in case of a constraint conflict: DO NOTHING or DO UPDATE.
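For reference, a hypothetical schema and seed data consistent with the examples below (the original post shows no DDL):
CREATE TABLE upsert_table (
id int PRIMARY KEY
, sub_id int UNIQUE -- auto-named constraint upsert_table_sub_id_key
, status text
);
INSERT INTO upsert_table VALUES (1, 6, 'inserted'), (2, 2, 'inserted');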
INSERT INTO upsert_table VALUES (2, 6, 'upserted')
ON CONFLICT DO NOTHING RETURNING *;
id | sub_id | status
----+--------+--------
(0 rows)
Note as well that RETURNING returns nothing, because no tuples have been inserted. Now with DO UPDATE, it is possible to perform operations on the conflicting tuple. First note that it is important to define a constraint which will be used to detect the conflict.
INSERT INTO upsert_table VALUES (2, 2, 'inserted')
ON CONFLICT ON CONSTRAINT upsert_table_sub_id_key
DO UPDATE SET status = 'upserted' RETURNING *;
id | sub_id | status
----+--------+----------
2 | 2 | upserted
(1 row)
For insertions of a single item, I would probably use a coalesce when returning the id:
WITH new_chats AS (
INSERT INTO chats ("user", "contact", "name")
VALUES ($1, $2, $3)
ON CONFLICT("user", "contact") DO NOTHING
RETURNING id
) SELECT COALESCE(
(SELECT id FROM new_chats),
(SELECT id FROM chats WHERE "user" = $1 AND contact = $2)
);
For insertions of multiples items, you can put the values on a temporary WITH and reference them later:
WITH chats_values("user", "contact", "name") AS (
VALUES ($1, $2, $3),
($4, $5, $6)
), new_chats AS (
INSERT INTO chats ("user", "contact", "name")
SELECT * FROM chats_values
ON CONFLICT("user", "contact") DO NOTHING
RETURNING id
) SELECT id
FROM new_chats
UNION
SELECT chats.id
FROM chats, chats_values
WHERE chats.user = chats_values.user
AND chats.contact = chats_values.contact;
Note: as per Erwin's comment, on the off-chance your application will try to 'upsert' the same data concurrently (two workers trying to insert <unique_field> = 1 at the same time), and such data does not exist in the table yet, you should change the isolation level of your transaction before running the 'upsert':
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
In that specific case, one of the two transactions will be aborted. If this case happens a lot in your application, you might want to just do 2 separate queries; otherwise, handling the error and re-doing the query is easier and faster.
Building on Erwin's answer above (terrific answer btw, never would have gotten here without it!), this is where I ended up. It solves a couple of additional potential problems: it allows for duplicates in the input (which would otherwise throw an error) by doing a SELECT DISTINCT on the input set, and it ensures that the returned IDs exactly match the input set, including the same order and allowing duplicates.
Additionally, and one part that was important for me, it significantly reduces the number of unnecessary sequence advancements by using the new_rows CTE to try inserting only the rows that aren't already there. Considering the possibility of concurrent writes, it will still hit some conflicts in that reduced set, but the later steps take care of that. In most cases, sequence gaps aren't a big deal, but when you're doing billions of upserts with a high percentage of conflicts, it can make the difference between using an int or a bigint for the ID.
Despite being big and ugly, it performs extremely well. I tested it extensively with millions of upserts, high concurrency, high numbers of collisions. Rock solid.
I've packaged it as a function, but if that's not what you want it should be easy to see how to translate to pure SQL. I've also changed the example data to something simple.
CREATE TABLE foo
(
bar varchar PRIMARY KEY,
id serial
);
CREATE TYPE ids_type AS (id integer);
CREATE TYPE bars_type AS (bar varchar);
CREATE OR REPLACE FUNCTION upsert_foobars(_vals bars_type[])
RETURNS SETOF ids_type AS
$$
BEGIN
RETURN QUERY
WITH
all_rows AS (
SELECT bar, ordinality
FROM UNNEST(_vals) WITH ORDINALITY
),
dist_rows AS (
SELECT DISTINCT bar
FROM all_rows
),
new_rows AS (
SELECT d.bar
FROM dist_rows d
LEFT JOIN foo f USING (bar)
WHERE f.bar IS NULL
),
ins AS (
INSERT INTO foo (bar)
SELECT bar
FROM new_rows
ORDER BY bar
ON CONFLICT DO NOTHING
RETURNING bar, id
),
sel AS (
SELECT bar, id
FROM ins
UNION ALL
SELECT f.bar, f.id
FROM dist_rows
JOIN foo f USING (bar)
),
ups AS (
INSERT INTO foo AS f (bar)
SELECT d.bar
FROM dist_rows d
LEFT JOIN sel s USING (bar)
WHERE s.bar IS NULL
ORDER BY bar
ON CONFLICT ON CONSTRAINT foo_pkey DO UPDATE
SET bar = f.bar
RETURNING bar, id
),
fin AS (
SELECT bar, id
FROM sel
UNION ALL
TABLE ups
)
SELECT f.id
FROM all_rows a
JOIN fin f USING (bar)
ORDER BY a.ordinality;
END
$$ LANGUAGE plpgsql;
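A hypothetical invocation, to show the calling convention (a composite-array literal; duplicates and input order are preserved in the output):
SELECT * FROM upsert_foobars('{(a),(b),(a)}'::bars_type[]);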
If all you want is to upsert a single row
Then you can simplify things rather significantly by using a simple EXISTS check:
WITH
extant AS (
SELECT id FROM chats WHERE ("user", "contact") = ($1, $2)
),
inserted AS (
INSERT INTO chats ("user", "contact", "name")
SELECT $1, $2, $3
WHERE NOT EXISTS (SELECT NULL FROM extant)
RETURNING id
)
SELECT id FROM inserted
UNION ALL
SELECT id FROM extant
Since there is no ON CONFLICT clause, there is no update – only an insert, and only if necessary. So no unnecessary updates, no unnecessary write locks, no unnecessary sequence increments. No casts required either.
If the write lock was a feature in your use case, you can use SELECT FOR UPDATE in the extant expression.
And if you need to know whether a new row was inserted, you can add a flag column in the top-level UNION:
SELECT id, TRUE AS inserted FROM inserted
UNION ALL
SELECT id, FALSE FROM extant
The simplest, most performant solution is
BEGIN;
INSERT INTO chats ("user", contact, name)
VALUES ($1, $2, $3), ($2, $1, NULL)
ON CONFLICT ("user", contact) DO UPDATE
SET name = excluded.name
WHERE false
RETURNING id;
SELECT id
FROM chats
WHERE ("user", contact) IN (($1, $2), ($2, $1));
COMMIT;
The DO UPDATE WHERE false locks but does not update the row, which is a feature, not a bug, since it ensures that another transaction cannot delete the row.
Some commenters wanted to distinguish between updated and created rows.
In that case, simply add txid_current() = xmin AS created to the select.
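A sketch of that flag added to the second statement above (casts are needed because xmin is of type xid; this ignores transaction-ID wraparound):
SELECT id, txid_current() = xmin::text::bigint AS created
FROM chats
WHERE ("user", contact) IN (($1, $2), ($2, $1));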
I modified the amazing answer by Erwin Brandstetter so that it won't increment the sequence and won't write-lock any rows. I'm relatively new to PostgreSQL, so please feel free to let me know if you see any drawbacks to this method:
WITH input_rows(usr, contact, name) AS (
VALUES
(text 'foo1', text 'bar1', text 'bob1') -- type casts in first row
, ('foo2', 'bar2', 'bob2')
-- more?
)
, new_rows AS (
SELECT
r.usr
, r.contact
, r.name
, c.id IS NOT NULL as row_exists
FROM input_rows AS r
LEFT JOIN chats AS c ON r.usr=c.usr AND r.contact=c.contact
)
INSERT INTO chats (usr, contact, name)
SELECT usr, contact, name
FROM new_rows
WHERE NOT row_exists
RETURNING id, usr, contact, name
This assumes that the table chats has a unique constraint on columns (usr, contact).
Update: added the suggested revisions from spatar (below). Thanks!
Yet another update, per Revinand comment:
WITH input_rows(usr, contact, name) AS (
VALUES
(text 'foo1', text 'bar1', text 'bob1') -- type casts in first row
, ('foo2', 'bar2', 'bob2')
-- more?
)
, new_rows AS (
INSERT INTO chats (usr, contact, name)
SELECT
r.usr
, r.contact
, r.name
FROM input_rows AS r
LEFT JOIN chats AS c ON r.usr=c.usr AND r.contact=c.contact
WHERE c.id IS NULL
RETURNING id, usr, contact, name
)
SELECT id, usr, contact, name, 'new' as row_type
FROM new_rows
UNION ALL
SELECT c.id, c.usr, c.contact, c.name, 'update' as row_type
FROM input_rows AS ir
INNER JOIN chats AS c ON ir.usr=c.usr AND ir.contact=c.contact
I haven't tested the above, but if you're finding that the newly inserted rows are being returned multiple times, then you can either change the UNION ALL to just UNION, or (better) just remove the first query altogether.

Cap on number of table rows matching a certain condition using Postgres exclusion constraint?

If I have a Postgresql db schema for a tires table like this (a user has many tires):
user_id integer
description text
size integer
color text
created_at timestamp
and I want to enforce a constraint that says "a user can only have 4 tires".
A naive way to implement this would be to run:
SELECT COUNT(*) FROM tires WHERE user_id = '123'
then compare the result to 4 and insert only if it's lower. This is susceptible to race conditions, which is what makes it naive.
I don't want to add a count column. How can I do this (or can I do this) using an exclusion constraint? If it's not possible with exclusion constraints, what is the canonical way?
The "right" way is using locks here.
Firstly, you need to lock related rows. Secondly, insert new record if a user has less than 4 tires. Here is the SQL:
begin;
-- lock the user's rows
-- (FOR UPDATE is not allowed with aggregates, so lock in a subquery)
select count(*)
from (
select 1
from tires
where user_id = 123
for update
) locked;
-- throw an error if there are more than 3
-- insert new row
insert into tires (user_id, description)
values (123, 'tire123');
commit;
You can read more here: How to perform conditional insert based on row count?

Primary key duplicate in a table-valued parameter in stored procedure

I am using the following code to insert data via a table-valued parameter in my stored procedure. It works when there is one record in my TVP, but when it has more than one record it raises the following error:
'Violation of Primary key constraint 'PK_ReceivedCash''. Cannot insert duplicate key in object 'Banking.ReceivedCash'. The statement has been terminated.
insert into banking.receivedcash(ReceivedCashID,Date,Time)
select (select isnull(Max(ReceivedCashID),0)+1 from Banking.ReceivedCash),t.Date,t.Time from #TVPCash as t
Your query is indeed flawed if there is more than one row in #TVPCash. The subquery retrieving the maximum ReceivedCashID is evaluated once, and that constant is then used for every row of #TVPCash inserted into Banking.ReceivedCash.
I strongly suggest finding alternatives rather than doing it this way. Multiple users might run this query and retrieve the same maximum. If you insist on keeping the query as it is, try running the following:
insert into banking.receivedcash(
ReceivedCashID,
Date,
Time
)
select
(select isnull(Max(ReceivedCashID),0) from Banking.ReceivedCash)+
ROW_NUMBER() OVER(ORDER BY t.Date,t.Time),
t.Date,
t.Time
from
#TVPCash as t
This uses ROW_NUMBER to count the row number in #TVPCash and adds this to the maximum ReceivedCashID of Banking.ReceivedCash.
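A possible alternative (T-SQL sketch; the sequence name is invented): draw the keys from a sequence, so concurrent sessions can never pick the same value:
CREATE SEQUENCE Banking.ReceivedCashSeq AS int
START WITH 1 INCREMENT BY 1; -- in a real migration, start above the current max
INSERT INTO Banking.ReceivedCash (ReceivedCashID, [Date], [Time])
SELECT NEXT VALUE FOR Banking.ReceivedCashSeq, t.[Date], t.[Time]
FROM #TVPCash AS t;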

Compact or renumber IDs for all tables, and reset sequences to max(id)?

After running for a long time, I get more and more holes in the id field. Some tables' ids are int32, and their id sequences are approaching the maximum value. Some of the Java sources are read-only, so I cannot simply change the id column type from int32 to long, as that would break the API.
I'd like to renumber them all. This may not be good practice, but whether it is good or bad is not the concern of this question. I especially want to renumber very long IDs like "61789238" and "548273826529524324". I don't know why they are so long, but shorter IDs are also easier to handle manually.
But it's not easy to compact IDs by hand because of references and constraints.
Does PostgreSQL itself support of ID renumbering? Or is there any plugin or maintaining utility for this job?
Maybe I can write some stored procedures? That would be very nice so I can schedule it once a year.
The question is old, but we got a new question from a desperate user on dba.SE after trying to apply what is suggested here. Find an answer with more details and explanation over there:
Compacting a sequence in PostgreSQL
The currently accepted answer will fail for most cases.
Typically, you have a PRIMARY KEY or UNIQUE constraint on an id column, which is NOT DEFERRABLE by default. (The OP mentions references and constraints.) Such constraints are checked after each row, so you will most likely get unique violation errors when trying this. Details:
Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?
Typically, one wants to retain the original order of rows while closing gaps. But the order in which rows are updated is arbitrary, leading to arbitrary numbers. The demonstrated example seems to retain the original sequence only because physical storage still coincides with the desired order (the rows were inserted in the desired order just a moment earlier), which is almost never the case in real-world applications and is completely unreliable.
The matter is more complicated than it might seem at first. One solution (among others) if you can afford to remove the PK / UNIQUE constraint (and related FK constraints) temporarily:
BEGIN;
LOCK tbl;
-- remove all FK constraints to the column
ALTER TABLE tbl DROP CONSTRAINT tbl_pkey; -- remove PK
-- for the simple case without FK references - or see below:
UPDATE tbl t -- intermediate unique violations are ignored now
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id;
-- Update referencing value in FK columns at the same time (if any)
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
ALTER TABLE tbl ADD CONSTRAINT tbl_pkey PRIMARY KEY(id); -- add PK back
-- add all FK constraints to the column back
COMMIT;
This is also much faster for big tables, because checking PK (and FK) constraint(s) for every row costs a lot more than removing the constraint(s) and adding it (them) back.
If there are FK columns in other tables referencing tbl.id, use data-modifying CTEs to update all of them.
Example for a table fk_tbl and a FK column fk_id:
WITH u1 AS (
UPDATE tbl t
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id
RETURNING t.id, t1.new_id -- return old and new ID
)
UPDATE fk_tbl f
SET fk_id = u1.new_id -- set to new ID
FROM u1
WHERE f.fk_id = u1.id; -- match on old ID
More in the referenced answer on dba.SE.
Assuming your ids are generated from a bigint sequence, just RESTART the sequence and update the table with id = DEFAULT.
CAVEAT: If this id column is used as a foreign key by other tables, make sure you have the on update cascade modifier turned on.
For example:
Create the table, put some data in, and remove a middle value:
db=# create sequence xseq;
CREATE SEQUENCE
db=# create table foo ( id bigint default nextval('xseq') not null, data text );
CREATE TABLE
db=# insert into foo (data) values ('hello'), ('world'), ('how'), ('are'), ('you');
INSERT 0 5
db=# delete from foo where data = 'how';
DELETE 1
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
4 | are
5 | you
(4 rows)
Reset your sequence:
db=# ALTER SEQUENCE xseq RESTART;
ALTER SEQUENCE
Update your data:
db=# update foo set id = DEFAULT;
UPDATE 4
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
3 | are
4 | you
(4 rows)
Another way: add a new id column and new Foreign Key(s) while the old ones are still in use. With some (quick) renaming, applications do not have to be aware. (But applications should be inactive during the final renaming step.)
\i tmp.sql
-- the test tables
CREATE TABLE one (
id serial NOT NULL PRIMARY KEY
, payload text
);
CREATE TABLE two (
id serial NOT NULL PRIMARY KEY
, the_fk INTEGER REFERENCES one(id)
ON UPDATE CASCADE ON DELETE CASCADE
);
-- And the supporting index for the FK ...
CREATE INDEX ON two(the_fk);
-- populate
INSERT INTO one(payload)
SELECT x::text FROM generate_series(1,1000) x;
INSERT INTO two(the_fk)
SELECT id FROM one WHERE random() < 0.3;
-- make some gaps
DELETE FROM one WHERE id % 13 > 0;
-- SELECT * FROM two;
-- Add new keycolumns to one and two
ALTER TABLE one
ADD COLUMN new_id SERIAL NOT NULL UNIQUE
;
-- UPDATE:
-- This could need DEFERRABLE
-- Note since the update is only a permutation of the
-- existing values, we dont need to reset the sequence.
UPDATE one SET new_id = self.new_id
FROM ( SELECT id, row_number() OVER(ORDER BY id) AS new_id FROM one ) self
WHERE one.id = self.id;
ALTER TABLE two
ADD COLUMN new_fk INTEGER REFERENCES one(new_id)
;
-- update the new FK
UPDATE two t
SET new_fk = o.new_id
FROM one o
WHERE t.the_fk = o.id
;
SELECT * FROM two;
-- The crucial part: the final renaming
-- (at this point it would be better not to allow other sessions
-- messing with the {one,two} tables ...
-- --------------------------------------------------------------
ALTER TABLE one DROP COLUMN id CASCADE;
ALTER TABLE one rename COLUMN new_id TO id;
ALTER TABLE one ADD PRIMARY KEY(id);
ALTER TABLE two DROP COLUMN the_fk CASCADE;
ALTER TABLE two rename COLUMN new_fk TO the_fk;
CREATE INDEX ON two(the_fk);
-- Some checks.
-- (the automatically generated names for the indexes
-- and the sequence still contain the "new" names.)
SELECT * FROM two;
\d one
\d two
UPDATE: added the permutation of new_id (after creating it as a serial)
Funny thing is: it doesn't seem to need 'DEFERRABLE'.
This script will work for PostgreSQL. It is a generic solution that works for all cases.
This query finds the description of the fields of all tables in any database:
WITH description_bd AS (
select colum.schemaname, coalesce(table_name, relname) as table_name, column_name,
ordinal_position, column_default, data_type, is_nullable, character_maximum_length,
is_updatable, description
from (
SELECT columns.table_schema as schemaname, columns.table_name, columns.column_name,
columns.ordinal_position, columns.column_default, columns.data_type,
columns.is_nullable, columns.character_maximum_length,
columns.character_octet_length, columns.is_updatable, columns.udt_name
FROM information_schema.columns
) colum
full join (
SELECT schemaname, relid, relname, objoid, objsubid, description
FROM pg_statio_all_tables, pg_description
where pg_statio_all_tables.relid = pg_description.objoid
) descre
on descre.relname = colum.table_name
and descre.objsubid = colum.ordinal_position
and descre.schemaname = colum.schemaname
)
This query generates a solution to fix the sequences of all database tables (it builds, in the req field, a statement that fixes the sequence of each table).
It takes the maximum value of the id column and increments it by one.
SELECT table_name, column_name, ordinal_position, column_default,
data_type, is_nullable, character_maximum_length, is_updatable, description,
'SELECT setval(''' || schemaname || '.'
|| replace(replace(column_default, '''::regclass)', ''), 'nextval(''', '')
|| ''', (select max( ' || column_name || ')+1 from ' || table_name || ' ), true);' as req
FROM description_bd
WHERE column_default like '%nextva%'
Since I didn't like the answers, I wrote a function in PL/pgSQL to do the job.
It is called like this :
=> SELECT resequence('port','id','port_id_seq');
resequence
--------------
5090 -> 3919
It takes 3 parameters:
name of table
name of column that is SERIAL
name of sequence that the SERIAL uses
The function returns a short report of what it has done, with the previous value of the sequence and the new value.
The function LOOPs over the table ORDERed by the named column and makes an UPDATE for each row. Then it sets the new value for the sequence. That's it.
The order of the values is preserved.
No ADDing and DROPing of temporary columns or tables involved.
No DROPing and ADDing of constraints and foreign keys needed.
Of course, you had better have ON UPDATE CASCADE on those foreign keys.
The code :
CREATE OR REPLACE FUNCTION resequence(_tbl TEXT, _clm TEXT, _seq TEXT) RETURNS TEXT AS $FUNC$
DECLARE
_old BIGINT;_new BIGINT := 0;
BEGIN
FOR _old IN EXECUTE 'SELECT '||_clm||' FROM '||_tbl||' ORDER BY '||_clm LOOP
_new=_new+1;
EXECUTE 'UPDATE '||_tbl||' SET '||_clm||'='||_new||' WHERE '||_clm||'='||_old;
END LOOP;
RETURN (nextval(_seq::regclass)-1)||' -> '||setval(_seq::regclass,_new);
END $FUNC$ LANGUAGE plpgsql;
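A hypothetical end-to-end test (table and data invented for illustration):
CREATE TABLE port (id serial PRIMARY KEY, name text);
INSERT INTO port (name) SELECT 'p' || g FROM generate_series(1, 10) g;
DELETE FROM port WHERE id % 2 = 0; -- leave gaps: ids 1,3,5,7,9
SELECT resequence('port', 'id', 'port_id_seq'); -- returns: 10 -> 5
SELECT id, name FROM port ORDER BY id; -- ids are now 1..5, original order preserved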