A bug in PostgreSQL suppress_redundant_updates_trigger? - postgresql

I was working on a set of triggers in PostgreSQL and I think I stumbled on a bug on the built-in function/trigger suppress_redundant_updates_trigger(). It's fully reproducible on my configuration (PostgreSQL 12 on my laptop).
First I set up a table, with two "before each row" triggers:
CREATE TABLE test (id int, val text);
INSERT INTO test VALUES (1, 'one'), (2, 'two');
CREATE OR REPLACE FUNCTION am_i_touched() RETURNS trigger LANGUAGE 'plpgsql'
AS $BODY$
BEGIN
RAISE NOTICE 'Yes, I am touched!';
RETURN NEW;
END;
$BODY$;
CREATE TRIGGER az_test_suppress_redundant_update
BEFORE UPDATE ON public.test
FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();
-- Make sure trigger name is after the previous one
-- in alphabetical order as it drives execution order
CREATE TRIGGER bz_am_I_touched
BEFORE UPDATE ON public.test
FOR EACH ROW EXECUTE PROCEDURE am_i_touched();
I then run UPDATE test SET id = 1 WHERE id = 1. As expected, the update is suppressed by the first trigger since the row is left unchanged, and bz_am_i_touched() never fires. So far so good.
But then I run:
ALTER TABLE test ADD COLUMN newcol int
Now, I run again UPDATE test SET id = 1 WHERE id = 1... And this time, the update is NOT suppressed and bz_am_i_touched() fires! PGAdmin (v4) reports that one record was updated, not zero like the time before!
This is a one-off occurrence. Further UPDATE test SET id = 1 WHERE id = 1 work as expected... But then I tried UPDATE test SET id = 2 WHERE id = 2... and again I have this strange behavior - the update is not suppressed.
Is that an expected behavior? I can't understand how UPDATE test SET id = 1 WHERE id = 1 can result in the update not being suppressed.

The way the newcol NULL value is represented is different between the new tuple and the old tuple. So they are not considered to be the same, and so the update is not suppressed.
The tuples are compared in total with memcmp, so differences in even user-invisible bytes will be found significant. It doesn't loop through each field making individual type-dependent decisions about what differences are semantically meaningful. This seems to be intentional, for speed and simplicity. I doubt it would be considered a bug.

Related

Create Trigger For Update another row automatically on same table using postgresql

I want to create a trigger that can update another row on the same table on PostgreSQL.
if i run the query like these:
UPDATE tokens
SET amount = (SELECT amount FROM tokens WHERE id = 1)
where id = 2
these result that i expected.
description:
i want to set field amount on a row with id:2, where the amount value is from query result on a subquery, so the amount value on id:2 is same with id:1
Hopefully, with this created trigger, i can do update amount value on id=1 so the amount value on id:2 is same with id:1
Before update result:
id | amount|
1 | 200 |
2 | 200 |
When i update the amount value on id:1 to 100 on, so the amount value on id:2 become 100
After update result:
id | amount|
1 | 100 |
2 | 100 |
Update for my temporary solution:
i just create the UDF like these
CREATE FUNCTION update_amount(id_ops integer, id_mir integer) returns boolean LANGUAGE plpgsql AS $$
BEGIN
UPDATE tokens SET amount = (SELECT amount FROM tokens WHERE id = id_ops) WHERE id = id_mir;
RETURN 1;
END;
$$;
description:
id_ops: id where the amount i always update
id_mir: id where the amount automatically update after i update the amount with id_ops
Example of using my written UDF to resolve my problem:
I update the amount of id: 1 to 2000. The amount of id: 2 not updated to 2000
When i run query select update_amount(1,2);
The amount of id: 2 will same with amount on id: 1
I Need a trigger on PostgreSQL to automate or replace the function of UDF that i wrote
What you want to do it not really that difficult, I'll show you. But first: This is a very very bad idea. In fact bad enough that some databases, most notable Oracle, throw an exception if try it. Unfortunately Postgres allows it. You essentially create a recursive update as you are updating the table that initiated the trigger. This update in turn initiates the trigger. Without logic to stop this recursion you could update every row in the table.
I assume this is an extract for a much larger requirement, or perhaps you just want to know how to create a trigger. So we begin:
-- setup
drop table if exists tokens;
create table tokens( id integer, amount numeric(6,2));
-- create initial test data
insert into tokens(id, amount)
values (1,100), (2,150.69), (3,95.50), (4,75), (5,16.40);
Now the heart Postgres trigger: the trigger function, and the trigger. Note the function must be defined prior to the trigger which calls it.
-- create a trigger function: That is a function returning trigger.
create or replace function tokens_bur_func()
returns trigger
language plpgsql
as $$
begin
if new.id = 1
then
update tokens
set amount = new.amount
where id = 2;
end if;
return new;
end ;
$$;
-- create the trigger
create trigger tokens_bur
before update of amount
on tokens
for each row execute procedure tokens_bur_func();
--- test
select *
from tokens
order by id;
-- do an initial update
update tokens
set amount = 200
where id = 1;
-- Query returned successfully: one row affected, 31 msec execution time.
-- 1 row? Yes: DML count does not see change made from within trigger.
-- but
select *
from tokens
order by id;
Hard coding ids in a trigger however is not very functional after all "update ... where id in (1,2)" would be much easier, and safer as it does not require the recursion stop logic. So a slightly more generic trigger function is:
-- More general but still vastly limited:
-- trigger that mirrors subsequent row whenever an odd id is updated.
create or replace function tokens_bur_func()
returns trigger
language plpgsql
as $$
begin
if mod(new.id, 2)=1
then
update tokens
set amount = new.amount
where id = new.id+1;
end if;
return new;
end ;
$$;
-- test
update tokens
set amount = 900
where id = 3;
update tokens
set amount = 18.95
where id in (2,5);
select *
from tokens
order by id;
No matter how you proceed you required prior knowledge of update specifics. For example you said "might be id 2 I can set mirror from id 3" to do so you would need to alter the database in some manner either changing the trigger function or the trigger to pass parameters. (Triggers can pass parameters but they are static, supplied at create trigger time).
Finally make sure you got your recursion stop logic down cold. Because if not:
-- The Danger: What happens WITHOUT the 'stopper condition'
-- Using an almost direct conversion of your UDT
-- using new_id as
create or replace function tokens_bur_func()
returns trigger
language plpgsql
as $$
begin
update tokens
set amount = new.amount
where id = new.id+1;
return new;
end ;
$$;
-- test
update tokens
set amount = 137.92
where id = 1;
-- Query returned successfully: one row affected, 31 msec execution time.
-- but
select *
from tokens
order by id;

How can I delete an old record and insert a new record with a pl/pgsql function? [duplicate]

Several months ago I learned from an answer on Stack Overflow how to perform multiple updates at once in MySQL using the following syntax:
INSERT INTO table (id, field, field2) VALUES (1, A, X), (2, B, Y), (3, C, Z)
ON DUPLICATE KEY UPDATE field=VALUES(Col1), field2=VALUES(Col2);
I've now switched over to PostgreSQL and apparently this is not correct. It's referring to all the correct tables so I assume it's a matter of different keywords being used but I'm not sure where in the PostgreSQL documentation this is covered.
To clarify, I want to insert several things and if they already exist to update them.
PostgreSQL since version 9.5 has UPSERT syntax, with ON CONFLICT clause. with the following syntax (similar to MySQL)
INSERT INTO the_table (id, column_1, column_2)
VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
ON CONFLICT (id) DO UPDATE
SET column_1 = excluded.column_1,
column_2 = excluded.column_2;
Searching postgresql's email group archives for "upsert" leads to finding an example of doing what you possibly want to do, in the manual:
Example 38-2. Exceptions with UPDATE/INSERT
This example uses exception handling to perform either UPDATE or INSERT, as appropriate:
CREATE TABLE db (a INT PRIMARY KEY, b TEXT);
CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
LOOP
-- first try to update the key
-- note that "a" must be unique
UPDATE db SET b = data WHERE a = key;
IF found THEN
RETURN;
END IF;
-- not there, so try to insert the key
-- if someone else inserts the same key concurrently,
-- we could get a unique-key failure
BEGIN
INSERT INTO db(a,b) VALUES (key, data);
RETURN;
EXCEPTION WHEN unique_violation THEN
-- do nothing, and loop to try the UPDATE again
END;
END LOOP;
END;
$$
LANGUAGE plpgsql;
SELECT merge_db(1, 'david');
SELECT merge_db(1, 'dennis');
There's possibly an example of how to do this in bulk, using CTEs in 9.1 and above, in the hackers mailing list:
WITH foos AS (SELECT (UNNEST(%foo[])).*)
updated as (UPDATE foo SET foo.a = foos.a ... RETURNING foo.id)
INSERT INTO foo SELECT foos.* FROM foos LEFT JOIN updated USING(id)
WHERE updated.id IS NULL;
See a_horse_with_no_name's answer for a clearer example.
Warning: this is not safe if executed from multiple sessions at the same time (see caveats below).
Another clever way to do an "UPSERT" in postgresql is to do two sequential UPDATE/INSERT statements that are each designed to succeed or have no effect.
UPDATE table SET field='C', field2='Z' WHERE id=3;
INSERT INTO table (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE id=3);
The UPDATE will succeed if a row with "id=3" already exists, otherwise it has no effect.
The INSERT will succeed only if row with "id=3" does not already exist.
You can combine these two into a single string and run them both with a single SQL statement execute from your application. Running them together in a single transaction is highly recommended.
This works very well when run in isolation or on a locked table, but is subject to race conditions that mean it might still fail with duplicate key error if a row is inserted concurrently, or might terminate with no row inserted when a row is deleted concurrently. A SERIALIZABLE transaction on PostgreSQL 9.1 or higher will handle it reliably at the cost of a very high serialization failure rate, meaning you'll have to retry a lot. See why is upsert so complicated, which discusses this case in more detail.
This approach is also subject to lost updates in read committed isolation unless the application checks the affected row counts and verifies that either the insert or the update affected a row.
With PostgreSQL 9.1 this can be achieved using a writeable CTE (common table expression):
WITH new_values (id, field1, field2) as (
values
(1, 'A', 'X'),
(2, 'B', 'Y'),
(3, 'C', 'Z')
),
upsert as
(
update mytable m
set field1 = nv.field1,
field2 = nv.field2
FROM new_values nv
WHERE m.id = nv.id
RETURNING m.*
)
INSERT INTO mytable (id, field1, field2)
SELECT id, field1, field2
FROM new_values
WHERE NOT EXISTS (SELECT 1
FROM upsert up
WHERE up.id = new_values.id)
See these blog entries:
Upserting via Writeable CTE
WAITING FOR 9.1 – WRITABLE CTE
WHY IS UPSERT SO COMPLICATED?
Note that this solution does not prevent a unique key violation but it is not vulnerable to lost updates.
See the follow up by Craig Ringer on dba.stackexchange.com
In PostgreSQL 9.5 and newer you can use INSERT ... ON CONFLICT UPDATE.
See the documentation.
A MySQL INSERT ... ON DUPLICATE KEY UPDATE can be directly rephrased to a ON CONFLICT UPDATE. Neither is SQL-standard syntax, they're both database-specific extensions. There are good reasons MERGE wasn't used for this, a new syntax wasn't created just for fun. (MySQL's syntax also has issues that mean it wasn't adopted directly).
e.g. given setup:
CREATE TABLE tablename (a integer primary key, b integer, c integer);
INSERT INTO tablename (a, b, c) values (1, 2, 3);
the MySQL query:
INSERT INTO tablename (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
becomes:
INSERT INTO tablename (a, b, c) values (1, 2, 10)
ON CONFLICT (a) DO UPDATE SET c = tablename.c + 1;
Differences:
You must specify the column name (or unique constraint name) to use for the uniqueness check. That's the ON CONFLICT (columnname) DO
The keyword SET must be used, as if this was a normal UPDATE statement
It has some nice features too:
You can have a WHERE clause on your UPDATE (letting you effectively turn ON CONFLICT UPDATE into ON CONFLICT IGNORE for certain values)
The proposed-for-insertion values are available as the row-variable EXCLUDED, which has the same structure as the target table. You can get the original values in the table by using the table name. So in this case EXCLUDED.c will be 10 (because that's what we tried to insert) and "table".c will be 3 because that's the current value in the table. You can use either or both in the SET expressions and WHERE clause.
For background on upsert see How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
I was looking for the same thing when I came here, but the lack of a generic "upsert" function botherd me a bit so I thought you could just pass the update and insert sql as arguments on that function form the manual
that would look like this:
CREATE FUNCTION upsert (sql_update TEXT, sql_insert TEXT)
RETURNS VOID
LANGUAGE plpgsql
AS $$
BEGIN
LOOP
-- first try to update
EXECUTE sql_update;
-- check if the row is found
IF FOUND THEN
RETURN;
END IF;
-- not found so insert the row
BEGIN
EXECUTE sql_insert;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- do nothing and loop
END;
END LOOP;
END;
$$;
and perhaps to do what you initially wanted to do, batch "upsert", you could use Tcl to split the sql_update and loop the individual updates, the preformance hit will be very small see http://archives.postgresql.org/pgsql-performance/2006-04/msg00557.php
the highest cost is executing the query from your code, on the database side the execution cost is much smaller
There is no simple command to do it.
The most correct approach is to use function, like the one from docs.
Another solution (although not that safe) is to do update with returning, check which rows were updates, and insert the rest of them
Something along the lines of:
update table
set column = x.column
from (values (1,'aa'),(2,'bb'),(3,'cc')) as x (id, column)
where table.id = x.id
returning id;
assuming id:2 was returned:
insert into table (id, column) values (1, 'aa'), (3, 'cc');
Of course it will bail out sooner or later (in concurrent environment), as there is clear race condition in here, but usually it will work.
Here's a longer and more comprehensive article on the topic.
I use this function merge
CREATE OR REPLACE FUNCTION merge_tabla(key INT, data TEXT)
RETURNS void AS
$BODY$
BEGIN
IF EXISTS(SELECT a FROM tabla WHERE a = key)
THEN
UPDATE tabla SET b = data WHERE a = key;
RETURN;
ELSE
INSERT INTO tabla(a,b) VALUES (key, data);
RETURN;
END IF;
END;
$BODY$
LANGUAGE plpgsql
Personally, I've set up a "rule" attached to the insert statement. Say you had a "dns" table that recorded dns hits per customer on a per-time basis:
CREATE TABLE dns (
"time" timestamp without time zone NOT NULL,
customer_id integer NOT NULL,
hits integer
);
You wanted to be able to re-insert rows with updated values, or create them if they didn't exist already. Keyed on the customer_id and the time. Something like this:
CREATE RULE replace_dns AS
ON INSERT TO dns
WHERE (EXISTS (SELECT 1 FROM dns WHERE ((dns."time" = new."time")
AND (dns.customer_id = new.customer_id))))
DO INSTEAD UPDATE dns
SET hits = new.hits
WHERE ((dns."time" = new."time") AND (dns.customer_id = new.customer_id));
Update: This has the potential to fail if simultaneous inserts are happening, as it will generate unique_violation exceptions. However, the non-terminated transaction will continue and succeed, and you just need to repeat the terminated transaction.
However, if there are tons of inserts happening all the time, you will want to put a table lock around the insert statements: SHARE ROW EXCLUSIVE locking will prevent any operations that could insert, delete or update rows in your target table. However, updates that do not update the unique key are safe, so if you no operation will do this, use advisory locks instead.
Also, the COPY command does not use RULES, so if you're inserting with COPY, you'll need to use triggers instead.
Similar to most-liked answer, but works slightly faster:
WITH upsert AS (UPDATE spider_count SET tally=1 WHERE date='today' RETURNING *)
INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1 WHERE NOT EXISTS (SELECT * FROM upsert)
(source: http://www.the-art-of-web.com/sql/upsert/)
I custom "upsert" function above, if you want to INSERT AND REPLACE :
`
CREATE OR REPLACE FUNCTION upsert(sql_insert text, sql_update text)
RETURNS void AS
$BODY$
BEGIN
-- first try to insert and after to update. Note : insert has pk and update not...
EXECUTE sql_insert;
RETURN;
EXCEPTION WHEN unique_violation THEN
EXECUTE sql_update;
IF FOUND THEN
RETURN;
END IF;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION upsert(text, text)
OWNER TO postgres;`
And after to execute, do something like this :
SELECT upsert($$INSERT INTO ...$$,$$UPDATE... $$)
Is important to put double dollar-comma to avoid compiler errors
check the speed...
According the PostgreSQL documentation of the INSERT statement, handling the ON DUPLICATE KEY case is not supported. That part of the syntax is a proprietary MySQL extension.
I have the same issue for managing account settings as name value pairs.
The design criteria is that different clients could have different settings sets.
My solution, similar to JWP is to bulk erase and replace, generating the merge record within your application.
This is pretty bulletproof, platform independent and since there are never more than about 20 settings per client, this is only 3 fairly low load db calls - probably the fastest method.
The alternative of updating individual rows - checking for exceptions then inserting - or some combination of is hideous code, slow and often breaks because (as mentioned above) non standard SQL exception handling changing from db to db - or even release to release.
#This is pseudo-code - within the application:
BEGIN TRANSACTION - get transaction lock
SELECT all current name value pairs where id = $id into a hash record
create a merge record from the current and update record
(set intersection where shared keys in new win, and empty values in new are deleted).
DELETE all name value pairs where id = $id
COPY/INSERT merged records
END TRANSACTION
CREATE OR REPLACE FUNCTION save_user(_id integer, _name character varying)
RETURNS boolean AS
$BODY$
BEGIN
UPDATE users SET name = _name WHERE id = _id;
IF FOUND THEN
RETURN true;
END IF;
BEGIN
INSERT INTO users (id, name) VALUES (_id, _name);
EXCEPTION WHEN OTHERS THEN
UPDATE users SET name = _name WHERE id = _id;
END;
RETURN TRUE;
END;
$BODY$
LANGUAGE plpgsql VOLATILE STRICT
For merging small sets, using the above function is fine. However, if you are merging large amounts of data, I'd suggest looking into http://mbk.projects.postgresql.org
The current best practice that I'm aware of is:
COPY new/updated data into temp table (sure, or you can do INSERT if the cost is ok)
Acquire Lock [optional] (advisory is preferable to table locks, IMO)
Merge. (the fun part)
UPDATE will return the number of modified rows. If you use JDBC (Java), you can then check this value against 0 and, if no rows have been affected, fire INSERT instead. If you use some other programming language, maybe the number of the modified rows still can be obtained, check documentation.
This may not be as elegant but you have much simpler SQL that is more trivial to use from the calling code. Differently, if you write the ten line script in PL/PSQL, you probably should have a unit test of one or another kind just for it alone.
Edit: This does not work as expected. Unlike the accepted answer, this produces unique key violations when two processes repeatedly call upsert_foo concurrently.
Eureka! I figured out a way to do it in one query: use UPDATE ... RETURNING to test if any rows were affected:
CREATE TABLE foo (k INT PRIMARY KEY, v TEXT);
CREATE FUNCTION update_foo(k INT, v TEXT)
RETURNS SETOF INT AS $$
UPDATE foo SET v = $2 WHERE k = $1 RETURNING $1
$$ LANGUAGE sql;
CREATE FUNCTION upsert_foo(k INT, v TEXT)
RETURNS VOID AS $$
INSERT INTO foo
SELECT $1, $2
WHERE NOT EXISTS (SELECT update_foo($1, $2))
$$ LANGUAGE sql;
The UPDATE has to be done in a separate procedure because, unfortunately, this is a syntax error:
... WHERE NOT EXISTS (UPDATE ...)
Now it works as desired:
SELECT upsert_foo(1, 'hi');
SELECT upsert_foo(1, 'bye');
SELECT upsert_foo(3, 'hi');
SELECT upsert_foo(3, 'bye');
PostgreSQL >= v15
Big news on this topic as in PostgreSQL v15, it is possible to use MERGE command. In fact, this long awaited feature was listed the first of the improvements of the v15 release.
This is similar to INSERT ... ON CONFLICT but more batch-oriented. It has a powerful WHEN MATCHED vs WHEN NOT MATCHED structure that gives the ability to INSERT, UPDATE or DELETE on such conditions.
It not only eases bulk changes, but it even adds more control that tradition UPSERT and INSERT ... ON CONFLICT
Take a look at this very complete sample from official page:
MERGE INTO wines w
USING wine_stock_changes s
ON s.winename = w.winename
WHEN NOT MATCHED AND s.stock_delta > 0 THEN
INSERT VALUES(s.winename, s.stock_delta)
WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN
UPDATE SET stock = w.stock + s.stock_delta
WHEN MATCHED THEN
DELETE;
PostgreSQL v9, v10, v11, v12, v13, v14
If version is under v15 and over v9.5 , probably best choice is to use UPSERT syntax, with ON CONFLICT clause
Here is the example how to do upsert with params and without special sql constructions
if you have special condition (sometimes you can't use 'on conflict' because you can't create constraint)
WITH upd AS
(
update view_layer set metadata=:metadata where layer_id = :layer_id and view_id = :view_id returning id
)
insert into view_layer (layer_id, view_id, metadata)
(select :layer_id layer_id, :view_id view_id, :metadata metadata FROM view_layer l
where NOT EXISTS(select id FROM upd WHERE id IS NOT NULL) limit 1)
returning id
maybe it will be helpful

postgresql on conflict-cannot affect row a second time

I have a table, i have auto numbering/sequence on data_id
tabledata
---------
data_id [PK]
data_code [Unique]
data_desc
example code:
insert into tabledata(data_code,data_desc) values(Z01,'red')
on conflict (data_code) do update set data_desc=excluded.data_desc
works fine, and then i insert again
insert into tabledata(data_code,data_desc) values(Z01,'blue')
on conflict (data_code) do update set data_desc=excluded.data_desc
i got this error
[Err] ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
this is my real code
insert into psa_aso_branch(branch_code,branch_desc,regional_code,status,created_date,lastmodified_date)
(select branch_code, branch, kode_regional,
case when status_data='Y' then true
else false end, current_date, current_date
from branch_history) on conflict (branch_code) do
update set branch_desc = excluded.branch_desc, regional_code = excluded.regional_code,status = (case when excluded.status='Y' then true else false end), created_date=current_date, lastmodified_date=current_date;
working fine on first, but not the next one (like the example i give you before)
You can use update on the existing record/row, and not on row you are inserting.
Here update in on conflict clause applies to row in excluded table, which holds row temporarily.
In the first case record is inserted since there is no clash on data_code and update is not executed at all.
In the second insert you are inserting Z01 which is already inserted as data_code and data_code is unique.
The excluded table still holds the duplicate value of data_code after the update, so the record is not inserted. In update set data_code have to be changed in order to insert record properly.
I have been stuck on this issue for about 24 hours.
It is weird when I test the query on cli and it's works fine. It is working fine when I make an insertion using one data row. This errors only appear when I'm using insert-select.
It is not mostly because of insert-select problem. It is because the select rows is not unique. This will trigger the CONFLICT for more than once.
Thanks to #zivaricha comment. I experiment from his notes. Just that its hard to understand at first.
Solution:
Using distinct to make sure the select returns unique result.
This error comes when the duplicacy occurs multiple times in the single insertion
for example you have column a , b , c and combination of a and b is unique and on duplicate you are updating c.
Now suppose you already have a = 1 , b = 2 , c = 3 and you are inserting a = 1 b = 2 c = 4 and a = 1 b = 2 c = 4
so means conflict occurs twice so it cant update a row twice
I think what is happening here
when you do an update on conflict, it does an update that re conflicts again and then throws that error
We can find the error message from the source code, which we can simply understand why we got ON CONFLICT DO UPDATE command cannot affect row a second time.
In the source code of PostgreSQL at src/backend/executor/nodeModifyTable.c and the function of ExecOnConflictUpdate(), we can find this comment:
This can occur when a just inserted tuple is updated again in the same command. E.g. because multiple rows with the same conflicting key values are inserted.
This is somewhat similar to the ExecUpdate() TM_SelfModified case. We do not want to proceed because it would lead to the same row being updated a second time in some unspecified order, and in contrast to plain UPDATEs there's no historical behavior to break.
As the comment said, we can not update the row which we are inserting in INSERT ... ON CONFLICT, just like:
postgres=# CREATE TABLE t (id int primary key, name varchar);
postgres=# INSERT INTO t VALUES (1, 'smart'), (1, 'keyerror')
postgres=# ON CONFLICT (id) DO UPDATE SET name = 'Buuuuuz';
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
Remember, the executor of postgresql is a volcano model, so it will process the data we insert one by one. When we process to (1, 'smart'), since the table is empty, we can insert normally. When we get to (1, 'keyerror'), there is a conflict with the (1, 'smart') we just inserted, so the update logic is executed, which results in updating our own inserted data, which PostgreSQL doesn't allow us to do.
Similarly, we cannot update the same row of data twice:
postgres=# DROP TABLE IF EXISTS t;
postgres=# CREATE TABLE t (id int primary key, name varchar);
postgres=# INSERT INTO t VALUES (1, 'keyerror'), (1, 'buuuuz')
postgres=# ON CONFLICT (id) DO UPDATE SET name = 'Buuuuuuuuuz';
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

Deciding to use insert or update depending if unique key is in table [duplicate]

Several months ago I learned from an answer on Stack Overflow how to perform multiple updates at once in MySQL using the following syntax:
INSERT INTO table (id, field, field2) VALUES (1, A, X), (2, B, Y), (3, C, Z)
ON DUPLICATE KEY UPDATE field=VALUES(Col1), field2=VALUES(Col2);
I've now switched over to PostgreSQL and apparently this is not correct. It's referring to all the correct tables so I assume it's a matter of different keywords being used but I'm not sure where in the PostgreSQL documentation this is covered.
To clarify, I want to insert several things and if they already exist to update them.
PostgreSQL since version 9.5 has UPSERT syntax, with ON CONFLICT clause. with the following syntax (similar to MySQL)
INSERT INTO the_table (id, column_1, column_2)
VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
ON CONFLICT (id) DO UPDATE
SET column_1 = excluded.column_1,
column_2 = excluded.column_2;
Searching postgresql's email group archives for "upsert" leads to finding an example of doing what you possibly want to do, in the manual:
Example 38-2. Exceptions with UPDATE/INSERT
This example uses exception handling to perform either UPDATE or INSERT, as appropriate:
CREATE TABLE db (a INT PRIMARY KEY, b TEXT);
CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
LOOP
-- first try to update the key
-- note that "a" must be unique
UPDATE db SET b = data WHERE a = key;
IF found THEN
RETURN;
END IF;
-- not there, so try to insert the key
-- if someone else inserts the same key concurrently,
-- we could get a unique-key failure
BEGIN
INSERT INTO db(a,b) VALUES (key, data);
RETURN;
EXCEPTION WHEN unique_violation THEN
-- do nothing, and loop to try the UPDATE again
END;
END LOOP;
END;
$$
LANGUAGE plpgsql;
SELECT merge_db(1, 'david');
SELECT merge_db(1, 'dennis');
There's possibly an example of how to do this in bulk, using CTEs in 9.1 and above, in the hackers mailing list:
WITH foos AS (SELECT (UNNEST(%foo[])).*)
updated as (UPDATE foo SET foo.a = foos.a ... RETURNING foo.id)
INSERT INTO foo SELECT foos.* FROM foos LEFT JOIN updated USING(id)
WHERE updated.id IS NULL;
See a_horse_with_no_name's answer for a clearer example.
Warning: this is not safe if executed from multiple sessions at the same time (see caveats below).
Another clever way to do an "UPSERT" in postgresql is to do two sequential UPDATE/INSERT statements that are each designed to succeed or have no effect.
UPDATE table SET field='C', field2='Z' WHERE id=3;
INSERT INTO table (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE id=3);
The UPDATE will succeed if a row with "id=3" already exists, otherwise it has no effect.
The INSERT will succeed only if row with "id=3" does not already exist.
You can combine these two into a single string and run them both with a single SQL statement execute from your application. Running them together in a single transaction is highly recommended.
This works very well when run in isolation or on a locked table, but is subject to race conditions that mean it might still fail with duplicate key error if a row is inserted concurrently, or might terminate with no row inserted when a row is deleted concurrently. A SERIALIZABLE transaction on PostgreSQL 9.1 or higher will handle it reliably at the cost of a very high serialization failure rate, meaning you'll have to retry a lot. See why is upsert so complicated, which discusses this case in more detail.
This approach is also subject to lost updates in read committed isolation unless the application checks the affected row counts and verifies that either the insert or the update affected a row.
With PostgreSQL 9.1 this can be achieved using a writeable CTE (common table expression):
WITH new_values (id, field1, field2) as (
values
(1, 'A', 'X'),
(2, 'B', 'Y'),
(3, 'C', 'Z')
),
upsert as
(
update mytable m
set field1 = nv.field1,
field2 = nv.field2
FROM new_values nv
WHERE m.id = nv.id
RETURNING m.*
)
INSERT INTO mytable (id, field1, field2)
SELECT id, field1, field2
FROM new_values
WHERE NOT EXISTS (SELECT 1
FROM upsert up
WHERE up.id = new_values.id)
See these blog entries:
Upserting via Writeable CTE
WAITING FOR 9.1 – WRITABLE CTE
WHY IS UPSERT SO COMPLICATED?
Note that this solution does not prevent a unique key violation but it is not vulnerable to lost updates.
See the follow up by Craig Ringer on dba.stackexchange.com
In PostgreSQL 9.5 and newer you can use INSERT ... ON CONFLICT UPDATE.
See the documentation.
A MySQL INSERT ... ON DUPLICATE KEY UPDATE can be directly rephrased to a ON CONFLICT UPDATE. Neither is SQL-standard syntax, they're both database-specific extensions. There are good reasons MERGE wasn't used for this, a new syntax wasn't created just for fun. (MySQL's syntax also has issues that mean it wasn't adopted directly).
e.g. given setup:
CREATE TABLE tablename (a integer primary key, b integer, c integer);
INSERT INTO tablename (a, b, c) values (1, 2, 3);
the MySQL query:
INSERT INTO tablename (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
becomes:
INSERT INTO tablename (a, b, c) values (1, 2, 10)
ON CONFLICT (a) DO UPDATE SET c = tablename.c + 1;
Differences:
You must specify the column name (or unique constraint name) to use for the uniqueness check. That's the ON CONFLICT (columnname) DO
The keyword SET must be used, as if this was a normal UPDATE statement
It has some nice features too:
You can have a WHERE clause on your UPDATE (letting you effectively turn ON CONFLICT UPDATE into ON CONFLICT IGNORE for certain values)
The proposed-for-insertion values are available as the row-variable EXCLUDED, which has the same structure as the target table. You can get the original values in the table by using the table name. So in this case EXCLUDED.c will be 10 (because that's what we tried to insert) and "table".c will be 3 because that's the current value in the table. You can use either or both in the SET expressions and WHERE clause.
For background on upsert see How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
I was looking for the same thing when I came here, but the lack of a generic "upsert" function botherd me a bit so I thought you could just pass the update and insert sql as arguments on that function form the manual
that would look like this:
CREATE FUNCTION upsert (sql_update TEXT, sql_insert TEXT)
RETURNS VOID
LANGUAGE plpgsql
AS $$
BEGIN
LOOP
-- first try to update
EXECUTE sql_update;
-- check if the row is found
IF FOUND THEN
RETURN;
END IF;
-- not found so insert the row
BEGIN
EXECUTE sql_insert;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- do nothing and loop
END;
END LOOP;
END;
$$;
and perhaps to do what you initially wanted to do, batch "upsert", you could use Tcl to split the sql_update and loop the individual updates, the preformance hit will be very small see http://archives.postgresql.org/pgsql-performance/2006-04/msg00557.php
the highest cost is executing the query from your code, on the database side the execution cost is much smaller
There is no simple command to do it.
The most correct approach is to use function, like the one from docs.
Another solution (although not that safe) is to do update with returning, check which rows were updates, and insert the rest of them
Something along the lines of:
update table
set column = x.column
from (values (1,'aa'),(2,'bb'),(3,'cc')) as x (id, column)
where table.id = x.id
returning id;
assuming id:2 was returned:
insert into table (id, column) values (1, 'aa'), (3, 'cc');
Of course it will bail out sooner or later (in concurrent environment), as there is clear race condition in here, but usually it will work.
Here's a longer and more comprehensive article on the topic.
I use this function merge
CREATE OR REPLACE FUNCTION merge_tabla(key INT, data TEXT)
RETURNS void AS
$BODY$
BEGIN
IF EXISTS(SELECT a FROM tabla WHERE a = key)
THEN
UPDATE tabla SET b = data WHERE a = key;
RETURN;
ELSE
INSERT INTO tabla(a,b) VALUES (key, data);
RETURN;
END IF;
END;
$BODY$
LANGUAGE plpgsql
Personally, I've set up a "rule" attached to the insert statement. Say you had a "dns" table that recorded dns hits per customer on a per-time basis:
CREATE TABLE dns (
"time" timestamp without time zone NOT NULL,
customer_id integer NOT NULL,
hits integer
);
You wanted to be able to re-insert rows with updated values, or create them if they didn't exist already. Keyed on the customer_id and the time. Something like this:
CREATE RULE replace_dns AS
ON INSERT TO dns
WHERE (EXISTS (SELECT 1 FROM dns WHERE ((dns."time" = new."time")
AND (dns.customer_id = new.customer_id))))
DO INSTEAD UPDATE dns
SET hits = new.hits
WHERE ((dns."time" = new."time") AND (dns.customer_id = new.customer_id));
Update: This has the potential to fail if simultaneous inserts are happening, as it will generate unique_violation exceptions. However, the non-terminated transaction will continue and succeed, and you just need to repeat the terminated transaction.
However, if there are tons of inserts happening all the time, you will want to put a table lock around the insert statements: SHARE ROW EXCLUSIVE locking will prevent any operations that could insert, delete or update rows in your target table. However, updates that do not update the unique key are safe, so if you no operation will do this, use advisory locks instead.
Also, the COPY command does not use RULES, so if you're inserting with COPY, you'll need to use triggers instead.
Similar to most-liked answer, but works slightly faster:
WITH upsert AS (UPDATE spider_count SET tally=1 WHERE date='today' RETURNING *)
INSERT INTO spider_count (spider, tally) SELECT 'Googlebot', 1 WHERE NOT EXISTS (SELECT * FROM upsert)
(source: http://www.the-art-of-web.com/sql/upsert/)
I custom "upsert" function above, if you want to INSERT AND REPLACE :
`
CREATE OR REPLACE FUNCTION upsert(sql_insert text, sql_update text)
RETURNS void AS
$BODY$
BEGIN
-- first try to insert and after to update. Note : insert has pk and update not...
EXECUTE sql_insert;
RETURN;
EXCEPTION WHEN unique_violation THEN
EXECUTE sql_update;
IF FOUND THEN
RETURN;
END IF;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION upsert(text, text)
OWNER TO postgres;`
And after to execute, do something like this :
SELECT upsert($$INSERT INTO ...$$,$$UPDATE... $$)
Is important to put double dollar-comma to avoid compiler errors
check the speed...
According the PostgreSQL documentation of the INSERT statement, handling the ON DUPLICATE KEY case is not supported. That part of the syntax is a proprietary MySQL extension.
I have the same issue for managing account settings as name value pairs.
The design criteria is that different clients could have different settings sets.
My solution, similar to JWP is to bulk erase and replace, generating the merge record within your application.
This is pretty bulletproof, platform independent and since there are never more than about 20 settings per client, this is only 3 fairly low load db calls - probably the fastest method.
The alternative of updating individual rows - checking for exceptions then inserting - or some combination of is hideous code, slow and often breaks because (as mentioned above) non standard SQL exception handling changing from db to db - or even release to release.
#This is pseudo-code - within the application:
BEGIN TRANSACTION - get transaction lock
SELECT all current name value pairs where id = $id into a hash record
create a merge record from the current and update record
(set intersection where shared keys in new win, and empty values in new are deleted).
DELETE all name value pairs where id = $id
COPY/INSERT merged records
END TRANSACTION
CREATE OR REPLACE FUNCTION save_user(_id integer, _name character varying)
RETURNS boolean AS
$BODY$
BEGIN
UPDATE users SET name = _name WHERE id = _id;
IF FOUND THEN
RETURN true;
END IF;
BEGIN
INSERT INTO users (id, name) VALUES (_id, _name);
EXCEPTION WHEN OTHERS THEN
UPDATE users SET name = _name WHERE id = _id;
END;
RETURN TRUE;
END;
$BODY$
LANGUAGE plpgsql VOLATILE STRICT
For merging small sets, using the above function is fine. However, if you are merging large amounts of data, I'd suggest looking into http://mbk.projects.postgresql.org
The current best practice that I'm aware of is:
COPY new/updated data into temp table (sure, or you can do INSERT if the cost is ok)
Acquire Lock [optional] (advisory is preferable to table locks, IMO)
Merge. (the fun part)
UPDATE will return the number of modified rows. If you use JDBC (Java), you can then check this value against 0 and, if no rows have been affected, fire INSERT instead. If you use some other programming language, maybe the number of the modified rows still can be obtained, check documentation.
This may not be as elegant but you have much simpler SQL that is more trivial to use from the calling code. Differently, if you write the ten line script in PL/PSQL, you probably should have a unit test of one or another kind just for it alone.
Edit: This does not work as expected. Unlike the accepted answer, this produces unique key violations when two processes repeatedly call upsert_foo concurrently.
Eureka! I figured out a way to do it in one query: use UPDATE ... RETURNING to test if any rows were affected:
CREATE TABLE foo (k INT PRIMARY KEY, v TEXT);
CREATE FUNCTION update_foo(k INT, v TEXT)
RETURNS SETOF INT AS $$
UPDATE foo SET v = $2 WHERE k = $1 RETURNING $1
$$ LANGUAGE sql;
CREATE FUNCTION upsert_foo(k INT, v TEXT)
RETURNS VOID AS $$
INSERT INTO foo
SELECT $1, $2
WHERE NOT EXISTS (SELECT update_foo($1, $2))
$$ LANGUAGE sql;
The UPDATE has to be done in a separate procedure because, unfortunately, this is a syntax error:
... WHERE NOT EXISTS (UPDATE ...)
Now it works as desired:
SELECT upsert_foo(1, 'hi');
SELECT upsert_foo(1, 'bye');
SELECT upsert_foo(3, 'hi');
SELECT upsert_foo(3, 'bye');
PostgreSQL >= v15
Big news on this topic as in PostgreSQL v15, it is possible to use MERGE command. In fact, this long awaited feature was listed the first of the improvements of the v15 release.
This is similar to INSERT ... ON CONFLICT but more batch-oriented. It has a powerful WHEN MATCHED vs WHEN NOT MATCHED structure that gives the ability to INSERT, UPDATE or DELETE on such conditions.
It not only eases bulk changes, but it even adds more control that tradition UPSERT and INSERT ... ON CONFLICT
Take a look at this very complete sample from official page:
MERGE INTO wines w
USING wine_stock_changes s
ON s.winename = w.winename
WHEN NOT MATCHED AND s.stock_delta > 0 THEN
INSERT VALUES(s.winename, s.stock_delta)
WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN
UPDATE SET stock = w.stock + s.stock_delta
WHEN MATCHED THEN
DELETE;
PostgreSQL v9, v10, v11, v12, v13, v14
If version is under v15 and over v9.5 , probably best choice is to use UPSERT syntax, with ON CONFLICT clause
Here is the example how to do upsert with params and without special sql constructions
if you have special condition (sometimes you can't use 'on conflict' because you can't create constraint)
WITH upd AS
(
update view_layer set metadata=:metadata where layer_id = :layer_id and view_id = :view_id returning id
)
insert into view_layer (layer_id, view_id, metadata)
(select :layer_id layer_id, :view_id view_id, :metadata metadata FROM view_layer l
where NOT EXISTS(select id FROM upd WHERE id IS NOT NULL) limit 1)
returning id
maybe it will be helpful

PL/pgSQL query in PostgreSQL returns result for new, empty table

I am learning to use triggers in PostgreSQL but run into an issue with this code:
CREATE OR REPLACE FUNCTION checkAdressen() RETURNS TRIGGER AS $$
DECLARE
adrCnt int = 0;
BEGIN
SELECT INTO adrCnt count(*) FROM Adresse
WHERE gehoert_zu = NEW.kundenId;
IF adrCnt < 1 OR adrCnt > 3 THEN
RAISE EXCEPTION 'Customer must have 1 to 3 addresses.';
ELSE
RAISE EXCEPTION 'No exception';
END IF;
END;
$$ LANGUAGE plpgsql;
I create a trigger with this procedure after freshly creating all my tables so they are all empty. However the count(*) function in the above code returns 1.
When I run SELECT count(*) FROM adresse; outside of PL/pgSQL, I get 0.
I tried using the FOUND variable but it is always true.
Even more strangely, when I insert some values into my tables and then delete them again so that they are empty again, the code works as intended and count(*) returns 0.
Also if I leave out the WHERE gehoert_zu = NEW.kundenId, count(*) returns 0 which means I get more results with the WHERE clause than without.
--Edit:
Here is an example of how I use the procedure:
CREATE TABLE kunde (
kundenId int PRIMARY KEY
);
CREATE TABLE adresse (
id int PRIMARY KEY,
gehoert_zu int REFERENCES kunde
);
CREATE CONSTRAINT TRIGGER adressenKonsistenzTrigger AFTER INSERT ON Kunde
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE PROCEDURE checkAdressen();
INSERT INTO kunde VALUES (1);
INSERT INTO adresse VALUES (1,1);
It looks like I am getting the DEFERRABLE INITIALLY DEFERRED part wrong. I assumed the trigger would be executed after the first INSERT statement but it happens after the second one, although the inserts are not inside a BEGIN; - COMMIT; - Block.
According to the PostgreSQL Documentation inserts are commited automatically every time if not inside such a block and thus there shouldn't be an entry in adresse when the first INSERT statement is commited.
Can anyone point out my mistake?
--Edit:
The trigger and DEFERRABLE INITIALLY DEFERRED seem to be working all right.
My mistake was to assume that since I am not using a BEGIN-COMMIT-Block each insert would be executed in an own transaction with the trigger being executed afterwards every time.
However even without the BEGIN-COMMIT all inserts get bundled into one transaction and the trigger is executed afterwards.
Given this behaviour, what is the point in using BEGIN-COMMIT?
You need a transaction plus the "DEFERRABLE INITIALLY DEFERRED" because of the chicken and egg problem.
starting with two empty tables:
you cannot insert a single row into the person table, because the it needs at least one address.
you cannot insert a single row into the address table, because the FK constraint needs a corresponding row on the person table to exist
This is why you need to bundle the two inserts into one operation: the transaction. You need the BEGIN+ COMMIT, and the DEFERRABLE allows transient forbidden database states to exists: it causes the check to be evaluated at commit time.
This may seem a bit silly, but the answer is you need to stop deferring the trigger and run it BEFORE the insert. If you run it after the insert, of course there is data in the table.
As far as I can tell this is working as expected.
One further note, you probably dont mean:
RAISE EXCEPTION 'No Exception';
You probably want
RAISE INFO 'No Exception';
Then you can change your settings and run queries in transactions to test that the trigger does what you want it to do. As it is, every insert is going to fail and you have no way to move this into production without editing your procedure.