I'm trying to insert or update data in a PostgreSQL db. The simplest case is a key-value pairing (the actual data is more complicated, but this is the smallest clear example).
When a value is set, I'd like to insert if the key is not there, otherwise update. Sadly, Postgres does not have a combined insert-or-update statement, so I have to emulate it myself.
I've been working with the idea of basically SELECTing whether the key exists, and then running the appropriate INSERT or UPDATE. Clearly this needs to be in a transaction, or all manner of bad things could happen.
However, this is not working exactly how I'd like it to. I understand that there are limitations to serializable transactions, but I'm not sure how to work around this one.
Here's the situation (sessions a and b both start with):
a, b: => set transaction isolation level serializable;
a: => select count(1) from table where id=1; --> 0
b: => select count(1) from table where id=1; --> 0
a: => insert into table values(1); --> 1
b: => insert into table values(1); -->
ERROR: duplicate key value violates unique constraint "serial_test_pkey"
Now I would expect it to throw the usual serialization failure ("could not serialize access due to concurrent update"), but I'm guessing that since the two inserts are different "rows", this does not happen.
Is there an easy way to work around this?
Prior to Postgres 9.1 there were issues with isolation:
asking for serializable isolation guaranteed only that a single MVCC snapshot would be used for the entire transaction, which allowed certain documented anomalies.
Perhaps you're running into these "anomalies".
You can try SELECT … FOR UPDATE when checking if the row exists.
Alternatively, LOCK TABLE yourself.
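A minimal sketch of both ideas, assuming a table named mytable with an integer primary key id (the names are illustrative, not from the question):
-- Checking with a row lock. Note that FOR UPDATE can only lock rows
-- that already exist, so it cannot stop two sessions from both
-- deciding to INSERT a missing key.
BEGIN;
SELECT 1 FROM mytable WHERE id = 1 FOR UPDATE;
-- ...then run the appropriate INSERT or UPDATE...
COMMIT;

-- Or taking a table lock up front. SHARE ROW EXCLUSIVE conflicts with
-- itself, so only one writer runs the check-then-write at a time.
BEGIN;
LOCK TABLE mytable IN SHARE ROW EXCLUSIVE MODE;
-- ...check, then INSERT or UPDATE...
COMMIT;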
If you're trying to implement UPSERT, then a slightly more reliable (or rather less unreliable) way is to first attempt UPDATE, check number of affected rows, and then try INSERT if no rows were updated.
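For completeness, here is that update-then-insert approach as a retry loop in a PL/pgSQL function, modelled on the example in the PostgreSQL documentation; the table kv (key, value) is a placeholder for your own schema:
CREATE FUNCTION merge_kv(k INT, v TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE kv SET value = v WHERE key = k;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert it; if someone else inserts the
        -- same key concurrently, we get a unique-key failure and loop
        -- back to retry the UPDATE
        BEGIN
            INSERT INTO kv (key, value) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
Note the loop: neither branch alone is race-free, but retrying until one of them succeeds is.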
Related
I want the following behaviour when inserting data (conflict on id):
if there is no model with the same id in the db, do the INSERT
if there is an entry with the same id in the db and that entry is newer (updated_at field), do NOT UPDATE
if there is an entry with the same id in the db and that entry is older (updated_at field), do the UPDATE
I'm using Ecto for this and want to work with constraints; however, I cannot find an option for this in the documentation. Pseudo-code of the constraint could look like:
CHECK: NULL(current.updated_at) or incoming.updated_at > current.updated_at
Is such behaviour possible in Postgres?
Quoting the PostgreSQL documentation:
PostgreSQL does not support CHECK constraints that reference table data other than the new or updated row being checked. While a CHECK constraint that violates this rule may appear to work in simple tests, it cannot guarantee that the database will not reach a state in which the constraint condition is false (due to subsequent changes of the other row(s) involved). This would cause a database dump and reload to fail. The reload could fail even when the complete database state is consistent with the constraint, due to rows not being loaded in an order that will satisfy the constraint. If possible, use UNIQUE, EXCLUDE, or FOREIGN KEY constraints to express cross-row and cross-table restrictions.
If what you desire is a one-time check against other rows at row insertion, rather than a continuously-maintained consistency guarantee, a custom trigger can be used to implement that. (This approach avoids the dump/reload problem because pg_dump does not reinstall triggers until after reloading data, so that the check will not be enforced during a dump/reload.)
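If you go the trigger route, a minimal sketch could look like the following (assuming a table mytable with columns id and updated_at, per the question; an illustration, not a drop-in). A BEFORE UPDATE row trigger that returns NULL silently skips that row's update, so stale writes are dropped without an error:
CREATE FUNCTION skip_stale_update() RETURNS trigger AS
$$
BEGIN
    -- keep the existing row when it is at least as new as the incoming one
    IF OLD.updated_at IS NOT NULL AND OLD.updated_at >= NEW.updated_at THEN
        RETURN NULL;  -- cancels this row's UPDATE without raising an error
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER skip_stale
BEFORE UPDATE ON mytable
FOR EACH ROW EXECUTE PROCEDURE skip_stale_update();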
That should be simple using the WHERE clause of ON CONFLICT ... DO UPDATE:
INSERT INTO mytable (id, entry) VALUES (42, '2021-05-29 12:00:00')
ON CONFLICT (id)
DO UPDATE SET entry = EXCLUDED.entry
WHERE mytable.entry < EXCLUDED.entry;
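Note that with this WHERE clause, a conflicting row whose entry is NULL is also left untouched, since mytable.entry < EXCLUDED.entry is not true when mytable.entry is NULL. If you want the question's "update when the current updated_at is NULL" branch, extend the condition to WHERE mytable.entry IS NULL OR mytable.entry < EXCLUDED.entry.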
I am creating a table in PostgreSQL 9.5 where id is the primary key. While inserting rows into the table, if anyone tries to insert a duplicate id, I want it to be ignored instead of raising an exception. Is there any way to specify, at table creation itself, that duplicate entries get ignored?
There are many techniques to resolve the duplicate-insertion issue when writing the insertion query, e.g. using ON CONFLICT DO NOTHING or a WHERE EXISTS clause. But I want to handle this at the table-creation end, so that the person writing the insertion query doesn't need to bother with it at all.
Creating a RULE is one possible solution. Are there other possible solutions? Maybe something like this:
`CREATE TABLE dbo.foo (bar int PRIMARY KEY WITH (FILLFACTOR=90, IGNORE_DUP_KEY = ON))`
Although this exact statement doesn't work on PostgreSQL 9.5 on my machine.
Add a trigger before insert, or a rule ON INSERT ... DO INSTEAD; otherwise it has to be handled by the inserting query. Both solutions will require more resources on each insert.
An alternative is to provide a function with arguments for the insert that checks for duplicates, so end users use the function instead of a raw INSERT statement.
A WHERE EXISTS sub-query is not atomic, by the way, so you can still get an exception after the check...
ON CONFLICT DO NOTHING (available since 9.5) is still the best solution.
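For reference, the rule-based version mentioned above might look something like this (using the dbo.foo table from the question; the rule silently discards inserts whose key already exists):
CREATE RULE ignore_dup_bar AS
ON INSERT TO dbo.foo
WHERE EXISTS (SELECT 1 FROM dbo.foo WHERE bar = NEW.bar)
DO INSTEAD NOTHING;
As noted above, the EXISTS check is not atomic, so two concurrent inserts of the same key can still collide; ON CONFLICT DO NOTHING does not have that problem.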
I'm trying to insert data into a table which has a foreign key constraint. If there is a constraint violation in a row that I'm inserting, I want to chuck that data away.
The issue is that postgres returns an error every time I violate the constraint. Is it possible for me to have some statement in my insert statement like 'ON FOREIGN KEY CONSTRAINT DO NOTHING'?
EDIT:
This is the query that I'm trying to do, where info is a dict:
cursor.execute("INSERT INTO event (case_number_id, date, \
session, location, event_type, worker, result) VALUES \
(%(id_number)s, %(date)s, %(session)s, \
%(location)s, %(event_type)s, %(worker)s, %(result)s) ON CONFLICT DO NOTHING", info)
It errors out when there is a foreign key violation.
If you're only inserting a single row at a time, you can create a savepoint before the insert and rollback to it when the insert fails (or release it when the insert succeeds).
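In SQL terms the savepoint dance looks roughly like this (the column values are placeholders for illustration):
BEGIN;
SAVEPOINT before_insert;
INSERT INTO event (case_number_id, date) VALUES (42, now());
-- if the INSERT raised a foreign key violation:
--     ROLLBACK TO SAVEPOINT before_insert;
-- if it succeeded:
RELEASE SAVEPOINT before_insert;
COMMIT;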
For Postgres 9.5 or later, you can use INSERT ... ON CONFLICT DO NOTHING which does what it says. You can also write ON CONFLICT DO UPDATE SET column = value..., which will automagically convert your insert into an update of the row you are conflicting with (this functionality is sometimes called "upsert").
This does not work because the OP is dealing with a foreign key constraint rather than a unique constraint. In that case, you can most easily use the savepoint method I described earlier, but for multiple rows it may prove tedious. If you need to insert multiple rows at once, it should be reasonably performant to split them into multiple INSERT statements, provided you are not working in autocommit mode, all inserts occur in one transaction, and you are not inserting a very large number of rows.
Sometimes, you really do need multiple inserts in a single statement, because the round-trip overhead of talking to your database plus the cost of having savepoints on every insert is simply too high. In this case, there are a number of imperfect approaches. Probably the least bad is to build a nested query which selects your data and joins it against the other table, something like this:
INSERT INTO table_A (column_A, column_B, column_C)
SELECT A_rows.*
FROM (VALUES (...)) AS A_rows(column_A, column_B, column_C)
JOIN table_B ON A_rows.column_B = table_B.column_B;
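The inner join silently filters out any rows whose column_B has no match in table_B, which is exactly the set of rows that would have violated the foreign key.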
I'm running Postgres 9.3, accessed through Python's psycopg2 module. I have a table called 'tags' that gets updated/inserted by two different methods, called 'update' and 'insert'. I also have several workers running concurrently, each of which calls either 'update' or 'insert'. Due to a uniqueness constraint, I'd like to lock the 'tags' table directly before I perform the inserts or updates, and commit the transaction directly after.
So my code looks roughly like this (in psycopg2 parlance):
UPDATE:
cur.execute("LOCK TABLE tags IN SHARE ROW EXCLUSIVE MODE")
cur.execute("UPDATE tags SET ...")
cur.execute("DELETE FROM tags ...")
cur.execute("INSERT INTO tags ...")
connection.commit()
INSERT:
cur.execute("LOCK TABLE tags IN SHARE ROW EXCLUSIVE MODE")
cur.execute("DELETE FROM tags ...")
cur.execute("INSERT INTO tags ...")
connection.commit()
And my schema looks like:
user_id varchar NOT NULL,
tag varchar NOT NULL,
time timestamptz,
CONSTRAINT unique_tag_key PRIMARY KEY (user_id, tag),
CONSTRAINT seen_before_user FOREIGN KEY (user_id)
    REFERENCES user_id_table (user_id) MATCH SIMPLE
    ON UPDATE NO ACTION ON DELETE NO ACTION
The problem I'm getting is that when I run concurrent workers I get deadlocks upon executing the share lock.
Weirdly though, if I replace the LOCK TABLE calls with a call like
cur.execute("SELECT pg_advisory_lock(tag_hash)")
where tag_hash is a hash of the table name 'tags', I get no such errors.
Why is it that I get deadlocks with SHARE ROW EXCLUSIVE, but not with pg_advisory_lock? Are there any downsides to using advisory locks here if I can guarantee that the tags table never gets modified outside of these two methods?
I think you are going about this the wrong way.
You should never lock an entire table in Postgres unless you are running DDL on it.
You already have a unique constraint on the fields in question.
Your best bet is to run the transaction in a try/except. If there is an exception for the constraint violation, handle it properly; if it is a deadlock-detected exception, simply retry the transaction.
In general, Postgres is very good about handling locks without needing manual lock control, unless you are doing something very crazy.
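That said, if you do stick with advisory locks, pg_advisory_xact_lock() is worth knowing about: unlike pg_advisory_lock(), it is released automatically at commit or rollback, so a crashed worker cannot leave the lock held. A sketch, keeping the questioner's hash-of-the-table-name convention (hashtext() is a built-in PostgreSQL hash function):
BEGIN;
SELECT pg_advisory_xact_lock(hashtext('tags'));
-- DELETE / INSERT / UPDATE on tags here
COMMIT;  -- the advisory lock is released here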
I have a question. The transaction isolation level is set to serializable. When one user opens a transaction and INSERTs or UPDATEs data in "table1", and then another user opens a transaction and tries to INSERT data into the same table, does the second user need to wait until the first user commits the transaction?
Generally, no. The second transaction is inserting only, so unless there is a unique index check or other trigger that needs to take place, the data can be inserted unconditionally. In the case of a unique index (including a primary key), it will block if both transactions try to write rows with the same key value, e.g.:
-- Session 1:
CREATE TABLE t (x INT PRIMARY KEY);
BEGIN;
INSERT INTO t VALUES (1);
-- Session 2:
BEGIN;
INSERT INTO t VALUES (1); -- blocks here
-- Session 1:
COMMIT;
-- Session 2:
-- finally completes with a duplicate key error
Things are less obvious in the case of updates that may affect insertions by the other transaction. I understand PostgreSQL does not yet support "true" serialisability in this case. I do not know how commonly other SQL systems support it.
See http://www.postgresql.org/docs/current/interactive/mvcc.html
The second user will be blocked until the first user commits or rolls back his/her changes.