Postgres: ShareLock Deadlock on transaction - postgresql

Recently we have started getting a lot of deadlock errors in the logs. (Postgres server 9.6.5)
Our table consists of two columns: one is an auto-increment primary key, while the other is a JSON object.
Two attributes of the JSON object are defined as unique.
Now in the logs we keep getting errors saying that two simple insert queries on different rows are blocking each other.
============
process 65325 detected deadlock while waiting for ShareLock on transaction 2934224126 after 1000.050 ms
DETAIL: Process holding the lock: 35530. Wait queue: .
CONTEXT: while inserting index tuple (128,10) in relation "A"
STATEMENT: INSERT INTO A AS t (info) VALUES('{"x":"y",....)
ERROR: deadlock detected
DETAIL: Process 65325 waits for ShareLock on transaction 2934224126; blocked by process 35530.
Process 35530 waits for ShareLock on transaction 2934224125; blocked by process 65325.
Process 65325: INSERT INTO A AS t (info) VALUES({"x":"y",....)
Process 35530: INSERT INTO A AS t (info) VALUES({"x":"z",....)
====================
So basically two different rows are in a deadlock condition.
Is there any suggestion as to the conditions under which such deadlocks may occur?

Rows can never be in deadlock. It is not two different rows, but two different transactions, that are in deadlock. Your log is showing you the most recent insertion attempt by each transaction. Presumably, there were previous inserts as well in each transaction. But those won't show up in the log, unless they show up for some other reason (like log_statement=all).
So if T1 successfully (and invisibly, as far as your log file shows) inserted "x":"y", then T2 successfully and invisibly inserted "x":"z", and now T1 tries to insert "x":"z" while T2 tries "x":"y", there will be a deadlock. This assumes the unique index is on info->>'x'.
This would be the same issue if you were not using JSON.
Possible mitigations: don't insert more than one row per transaction. Or, if you do, always insert the rows in a specified order (for example, "y" before "z", following Latin-alphabet ordering), although in that case you just replace the deadlock error with a unique-key violation. Or just be prepared to catch the deadlock and try again, as in the sketch below.
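For the last option, a minimal retry sketch with psycopg2 (the connection string, table name, and column are assumptions, not from the original post; psycopg2 raises TransactionRollbackError when the server picks the session as the deadlock victim):

import time
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=mydb")  # assumed connection string

def insert_with_retry(info_json, max_retries=5):
    # Attempt the INSERT; on a deadlock the server rolls back the
    # whole transaction, so it is safe to simply try again.
    for attempt in range(max_retries):
        try:
            with conn:  # commits on success, rolls back on error
                with conn.cursor() as cur:
                    cur.execute("INSERT INTO a (info) VALUES (%s)", (info_json,))
            return
        except psycopg2.extensions.TransactionRollbackError:
            time.sleep(0.1 * (attempt + 1))  # simple backoff before retrying
    raise RuntimeError("INSERT still deadlocking after %d retries" % max_retries)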

Related

Should I lock a PostgreSQL table when invoking setval for sequence with the max ID function?

I have the following SQL script, which sets the sequence value corresponding to the maximum value of the ID column:
SELECT SETVAL('mytable_id_seq', COALESCE(MAX(id), 1)) FROM mytable;
Should I lock 'mytable' in this case in order to prevent the ID changing in a parallel request, as in the example below?
request #1                request #2
MAX(id)=5
                          inserted id 6
SETVAL=5
Or is setval(max(id)) an atomic operation?
Your suspicion is right, this approach is subject to race conditions.
But locking the table won't help, because it won't keep a concurrent transaction from fetching new sequence values. This transaction will block while the table is locked, but will happily continue inserting once the lock is gone, using a sequence value it got while the table was locked.
If it were possible to lock sequences, that might be a solution, but sequences cannot be locked.
I can think of two solutions:
Remove all privileges on the sequence while you modify it, so that concurrent requests to the sequence will fail. That causes errors, of course.
The pragmatic way: use
SELECT SETVAL('mytable_id_seq', COALESCE(MAX(id), 1) + 100000) FROM mytable;
Here 100000 is a value safely bigger than the number of rows that might get inserted while your operation is running.
You can use two requests in the same transaction:
ALTER SEQUENCE mytable_id_seq RESTART;
SELECT SETVAL('mytable_id_seq', COALESCE(MAX(id), 1)) FROM mytable;
Note: the first command will lock the sequence for other transactions until the transaction commits or rolls back.
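A psycopg2 sketch of this (the connection string is a placeholder); psycopg2 starts a transaction implicitly on the first statement, so both commands share one transaction until commit():

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
cur = conn.cursor()
# Both statements run in one implicit transaction; the lock taken
# by ALTER SEQUENCE is held until the commit below.
cur.execute("ALTER SEQUENCE mytable_id_seq RESTART")
cur.execute("SELECT setval('mytable_id_seq', COALESCE(MAX(id), 1)) FROM mytable")
conn.commit()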

PostgreSQL: deadlock detected SELECT FOR UPDATE in transaction

I have the following schema
ID (PK) | REF_ID | ACTIVE | STATUS
ID - Primary Key
I am using following query to select and update
BEGIN;
select * from table where ref_id = $1 and is_active is true for update;
UPDATE table set status = $1 where id =$2;
END;
Explanation for the above:
1) The select query result is used to lock all the rows with the provided ref ID; that result is used for some business logic.
2) The update query updates the STATUS of a row that is part of the same ref ID.
ISSUE
postgres#machine ERROR: deadlock detected
postgres#machine DETAIL: Process 28297 waits for ShareLock on transaction 4809510; blocked by process 28296.
Process 28296 waits for ShareLock on transaction 4809502; blocked by process 28297.
Process 28297: select * from jobs where ref_id ='a840a8bd-b8a7-45b2-a474-47e2f68e702d' and is_active is true for update
Process 28296: select * from jobs where ref_id ='a840a8bd-b8a7-45b2-a474-47e2f68e702d' and is_active is true for update
postgres#machine ERROR: deadlock detected
postgres#machine DETAIL: Process 28454 waits for ShareLock on transaction 4810111; blocked by process 28384.
Process 28384 waits for ShareLock on transaction 4810092; blocked by process 28297.
Process 28297 waits for AccessExclusiveLock on tuple (113628,5) of relation 16817 of database 16384; blocked by process 28454.
Process 28454: select * from jobs where ref_id ='a840a8bd-b8a7-45b2-a474-47e2f68e702d' and is_active is true for update
Process 28384: select * from jobs where ref_id ='a840a8bd-b8a7-45b2-a474-47e2f68e702d' and is_active is true for update
Process 28297: select * from jobs where ref_id ='a840a8bd-b8a7-45b2-a474-47e2f68e702d' and is_active is true for update
This table is used in a highly concurrent, distributed application (hundreds of workers in parallel with the same ref_id), and that is why I wanted to avoid a distributed lock by doing the select and then the update in the same transaction. But I am facing this deadlock error, and I don't know why the explicit locking is not working.
The expected behaviour is that any other job with the same reference ID must wait if anyone else with the same reference ID has acquired the lock.
Help me figure out what I am missing, or suggest another workaround. I am still not clear why the deadlock occurs even with explicit locking inside a transaction.
As Laurenz said, in this simple case you should be able to eliminate the possibility of deadlock with an ORDER BY in your locking query.
A deadlock arises when, for example:
Process A acquires a lock on row 1
Process B acquires a lock on row 2
Process A requests a lock on row 2 (and waits for B to release it)
Process B requests a lock on row 1 (and waits for A to release it)
...And at this point, the processes will be waiting on each other forever (or rather, until the server notices, and kills one of them off).
But if both processes had agreed ahead of time to lock row 1 and then row 2, then this wouldn't have happened; one process would still be waiting on the other, but the other is free to proceed.
More generally, as long as all processes agree to follow the same ordering when acquiring locks, it's guaranteed that at least one of them is always making progress; if you only ever try to acquire locks which are "higher" than the ones you already hold, then whoever holds the "highest" lock will never be waiting on anyone.
The ordering needs to be unambiguous, and stable over time, so a generated primary key is ideal (i.e. you should ORDER BY id).
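For instance, the locking query from the question with a deterministic order (a sketch; psycopg2-style parameter passing is an assumption, any client library works the same way):

# Lock all rows for the ref_id in primary-key order, so every
# transaction acquires the row locks in the same sequence.
cur.execute(
    """
    SELECT *
    FROM jobs
    WHERE ref_id = %s
      AND is_active IS TRUE
    ORDER BY id
    FOR UPDATE
    """,
    (ref_id,),
)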

Unexpected deadlocks in Postgresql (while using psycopg2)

I am dealing with a deadlock issue in PostgreSQL that I do not understand.
I am trying to implement a Round Robin-like algorithm using Python, psycopg2 module, and a Postgres database.
I want several instances of an application to do the following:
- Lock the whole table with a list of tasks for a very short interval
- Pick a task to perform (least recently performed task, with some limitations)
- Label task so other instances do not pick it (only one instance is allowed to perform the same task at the same time)
- Unlock table
- Perform task
- Repeat
Other sessions should also be able to update certain fields of this table.
Suddenly, I am getting deadlocks that I cannot explain. I have simplified my Python script as much as I could, and I am performing a commit after every statement (when possible), but a deadlock still appears every now and then.
For some reason, every time I get a deadlock, it is on the first statement in a transaction. How is that even possible? My table doesn't have any triggers, foreign key constraints, or anything else that would complicate things. The only explanation I can come up with is that PostgreSQL does not release the lock immediately after the commit. Or perhaps psycopg2 is not working the way I expect it to? I have failed to reproduce the issue by manually running statements in different sessions.
Deadlocks are rare, but I do get them at least once every few hours.
I am running PostgreSQL 9.6.1 and Python 2.7.12.
Here is the code I run (this is just a simplified sample I made in order to catch the issue):
import psycopg2
import sys
import datetime
import time
sys.path.append('/opt/workflow/lib')
import config
import ovs_lib
instance_type='scan_master'
instance_id=sys.argv[1]
dbh=psycopg2.connect(dbname=config.values['pgsql']['db'], host=config.values['pgsql']['host'], port=int(config.values['pgsql']['port']), user=config.values['pgsql']['user'], password=config.values['pgsql']['pass'])
dbh.set_session(isolation_level='READ COMMITTED', autocommit=False)
cursor = dbh.cursor()
cursor.execute("SET search_path TO "+config.values['pgsql']['schema'])
def sanitize(string):
    string = string.replace("'", "''")
    return string

def get_task(instance_id):
    task_id = None
    out_struct = {}
    instance_id = sanitize(instance_id)
    # Lock whole table
    dbh.commit()  # Just in case
    cursor.execute("SELECT 1 FROM wf_task FOR UPDATE")  # Lock the table
    cursor.execute("UPDATE wf_task SET scanner_instance_id=null WHERE scanner_instance_id='"+instance_id+"'")  # release task from previous run
    # Now get the task
    sql  = "SELECT t.task_id, st.scanner_function, t.parallel_runs\n"
    sql += "FROM wf_task t\n"
    sql += "JOIN wf_scanner_type st ON t.scanner_type_id=st.scanner_type_id\n"
    sql += "WHERE status='A'\n"
    sql += "AND t.scanner_instance_id is NULL\n"
    sql += "AND last_scan_ts<=now()-scan_interval*interval '1 second'\n"
    sql += "ORDER BY last_scan_ts\n"
    sql += "LIMIT 1\n"
    cursor.execute(sql)
    cnt = cursor.rowcount
    if cnt > 0:
        row = cursor.fetchone()
        task_id = row[0]
        sql = "UPDATE wf_task SET scanner_instance_id='"+instance_id+"',last_scan_ts=current_timestamp(3) WHERE task_id="+str(task_id)
        cursor.execute(sql)
        scanner_function = row[1]
        parallel_runs = row[2]
        out_struct['task_id'] = task_id
        out_struct['scanner_function'] = scanner_function
        out_struct['parallel_runs'] = parallel_runs
    dbh.commit()
    return out_struct

def process_task(task_id):
    sql = "UPDATE wf_task SET submitted_ts=now() WHERE task_id="+str(task_id)+" AND submitted_ts<now()"
    cursor.execute(sql)
    dbh.commit()
    sql = "UPDATE wf_task SET executed_ts=now() WHERE task_id="+str(task_id)+" AND submitted_ts<now()"
    cursor.execute(sql)
    dbh.commit()

while True:
    if not ovs_lib.check_control(instance_type, instance_id):
        now_time = datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S')
        print now_time + " Stop signal received"
        exit(0)
    task_struct = get_task(instance_id)
    if 'task_id' not in task_struct:
        time.sleep(1)
        continue
    process_task(task_struct['task_id'])
And here are examples of the error I get:
Traceback (most recent call last):
File "/opt/workflow/bin/scan_simple.py", line 70, in <module>
process_task(task_struct['task_id'])
File "/opt/workflow/bin/scan_simple.py", line 58, in process_task
cursor.execute(sql)
psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL: Process 21577 waits for ShareLock on transaction 39243027; blocked by process 21425.
Process 21425 waits for ShareLock on transaction 39243029; blocked by process 21102.
Process 21102 waits for AccessExclusiveLock on tuple (8,12) of relation 39933 of database 16390; blocked by process 21577.
HINT: See server log for query details.
CONTEXT: while updating tuple (8,12) in relation "wf_task"
Traceback (most recent call last):
File "/opt/workflow/bin/scan_simple.py", line 66, in <module>
task_struct=get_task(instance_id)
File "/opt/workflow/bin/scan_simple.py", line 27, in get_task
cursor.execute("SELECT 1 FROM wf_task FOR UPDATE")
psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL: Process 21776 waits for ShareLock on transaction 39488839; blocked by process 21931.
Process 21931 waits for ShareLock on transaction 39488844; blocked by process 21776.
HINT: See server log for query details.
CONTEXT: while locking tuple (17,9) in relation "wf_task"
At that time I had 6 instances of this script running simultaneously
No other sessions were active in the database.
Later update
Today I learned something new about Postgres that is very relevant to this question.
Starting with version 9.5, PostgreSQL supports SKIP LOCKED, an option to SELECT ... FOR UPDATE that solves the problem I was trying to design my application around, and in a very elegant manner. A sketch of how it could be applied here follows the link below.
If you are struggling with concurrency in PostgreSQL while trying to implement some sort of queue or round-robin solution, you absolutely must read this:
https://blog.2ndquadrant.com/what-is-select-skip-locked-for-in-postgresql-9-5/
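Applied to the script above, the whole lock-scan-update dance in get_task could collapse into one statement (a sketch only; the eligibility conditions are abbreviated, and the pattern follows the queue example in that post):

def get_task_skip_locked(instance_id):
    # Claim one available task. Rows locked by other workers are
    # skipped rather than waited on, so no deadlock can arise here.
    cursor.execute(
        """
        UPDATE wf_task
        SET scanner_instance_id = %s, last_scan_ts = current_timestamp(3)
        WHERE task_id = (
            SELECT task_id
            FROM wf_task
            WHERE status = 'A'
              AND scanner_instance_id IS NULL
            ORDER BY last_scan_ts
            LIMIT 1
            FOR UPDATE SKIP LOCKED
        )
        RETURNING task_id
        """,
        (instance_id,),
    )
    row = cursor.fetchone()
    dbh.commit()
    return row[0] if row else None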
The problem is probably that the sequential scan in the first SELECT ... FOR UPDATE doesn't always return the rows in the same order, so concurrent executions of this statement lock the rows of the table in different orders. This leads to the deadlock you experience.
There are several solutions, in increasing goodness:
I think that the technique of locking the whole table for this update is horrible for performance, but if you insist on keeping your code, you could set synchronize_seqscans to off so that all sequential scans return the rows in the same order. But you really shouldn't lock all rows in a table the way you do, because:
It causes an unnecessary sequential scan.
It is not safe: somebody could INSERT new rows between the time you lock the rows and the time you run your UPDATEs.
If you really want to lock the whole table, use the LOCK TABLE statement instead of locking all rows in the table. That will get rid of the deadlock as well.
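For example, the row-locking SELECT in get_task above could be replaced with a single table-level lock (a sketch; the choice of EXCLUSIVE mode is an assumption - it blocks concurrent writers and other EXCLUSIVE lockers while still allowing plain SELECTs):

# One table-level lock instead of locking every row; it is
# released automatically when the transaction commits.
cursor.execute("LOCK TABLE wf_task IN EXCLUSIVE MODE")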
The best solution is probably to lock the rows with the UPDATE itself. To avoid deadlocks, examine the execution plans that PostgreSQL uses for the UPDATE. This will be an index scan or a sequential scan. With an index scan you are safe, because that will return the rows in a certain order. For a sequential scan, disable the synchronize_seqscans feature mentioned above, ideally only for the transaction:
START TRANSACTION;
SET LOCAL synchronize_seqscans = off;
/* your UPDATEs go here */
COMMIT;

How can I deal with a race condition in PostgreSQL?

I've been getting a handful of errors on PostgreSQL that seem to be tied to this race condition.
I have a process/daemon written in Twisted Python. The easiest way to describe it is as a Web Crawler - it pulls a page, parses the links, and logs what it's seen. Because of HTTP blocking, Twisted runs multiple "concurrent" processes deferred to threads.
Here's the race condition...
When I encounter a URL shortener, this logic happens:
result = """SELECT * FROM shortened_link WHERE ( url_shortened = %(url)s ) LIMIT 1;"""
if result:
    pass
else:
    result = """INSERT INTO shortened_link ( url_shortened ..."
A surprising number of psycopg2.IntegrityErrors are raised, because the unique index on url_shortened gets violated.
The select and the insert really do run that close together. From what I can tell, it looks like two shortened links get queued next to one another:
Process A: Select, returns Null
Process B: Select, returns Null
Process A: Insert , success
Process B: Insert , integrity error
Can anyone suggest any tips/tricks to handle this ? I'd like to avoid explicit locking, because I know that'll open up a whole other set of problems.
Do it all in a single command:
result = """
    INSERT INTO shortened_link ( url_shortened ...
    SELECT %(url)s
    WHERE NOT EXISTS (
        SELECT 1
        FROM shortened_link
        WHERE url_shortened = %(url)s
    );"""
It will only insert if that link does not exist.
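If you are on PostgreSQL 9.5 or later, the same intent can be written with ON CONFLICT DO NOTHING, which, unlike the WHERE NOT EXISTS form, cannot raise a unique-violation error when two sessions race (a sketch; the column list is an assumption):

# Insert the URL, silently doing nothing if it is already present.
cur.execute(
    """
    INSERT INTO shortened_link (url_shortened)
    VALUES (%(url)s)
    ON CONFLICT (url_shortened) DO NOTHING
    """,
    {"url": url},
)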
There's really no solution that avoids the need to handle a possible unique-constraint violation error. If your framework can't do that, I'd wrap the SQL in a PL/pgSQL function or procedure that can.
Given that you can handle the error, you might as well not test for the existence of the unique value and just attempt the insert, letting any error be handled by the EXCEPTION clause.
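In Python that could look like the following sketch (catching the unique violation client-side instead of in PL/pgSQL; the connection object and column name are assumptions):

import psycopg2

def record_link(conn, url):
    # Attempt the INSERT unconditionally; treat a unique violation
    # as "another worker got there first" and carry on.
    try:
        with conn:
            with conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO shortened_link (url_shortened) VALUES (%s)",
                    (url,),
                )
    except psycopg2.IntegrityError:
        pass  # the row already exists; nothing to do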
You either need a mutex lock of some kind, or you will have to live with the redundancies that will occur due to the race condition.
If you choose to go with the mutex lock - you don't necessarily need to use a database-level lock. You can simply lock down the Twisted process to block other threads handling a similar shortened url.
If you choose to avoid the lock, remove the unique constraint on the url_shortened field. Periodically, you can move these records to a 'clean' table that contains a single unique copy of each shortened url.

some questions about "select for update"?

Here is pseudo-code using libpq.so, but it does not behave the way I think it should.
transaction begin
    re1 = [select ics_time from table1 where c1=c11, c2=c22, c3=c33, c4=c44 for update];
    if (re1 satisfies the condition)
    {
        re2 = [select id where c1=c11, c2=c22, c3=c33, c4=c44 for update];
        delete from table1 where id = re2;
        delete from table2 where id = re2;
        delete from table3 where id = re3;
        insert a new record into table1, table2, table3 with c1,c2,c3,c4 as primary keys;
    }
commit or rollback
Note that c1, c2, c3, c4 together make up the primary key in the database, so there is only one row with these keys in the database.
What confuses me is the following:
1) There are two "select for update" statements which lock the same row. In this code, does the second SQL statement wait for the exclusive lock taken by the first statement? In practice, that does not happen.
2) Something occurs beyond my expectation. In the log, I see a large number of duplicate-insert errors. In my opinion, the "select for update" locks the row with the unique keys, so two processes should go serially, and the insert operation runs after a delete. How can these duplicate insertions occur? Doesn't "select for update" add an exclusive lock to the row, which blocks all other processes that want to lock the same row?
Regarding your first point: locks are not held by the statement, locks are held by the surrounding transaction. Your pseudo-code seems to use one connection with one transaction, which in turn issues several statements. So the second SELECT FOR UPDATE is not blocked by the first. Read the docs about locking for this:
[...]An exclusive row-level lock on a specific row is automatically acquired when the row is updated or deleted. The lock is held until the transaction commits or rolls back, just like table-level locks. Row-level locks do not affect data querying; they block only writers to the same row.
Otherwise it would be very funny if a transaction could block itself so easily.
Regarding your second point: I cannot answer it because a) your pseudo-code is too pseudo for this problem and b) I don't understand what you mean by "processes" and what the exact use case is.
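To illustrate the quoted behaviour, here is a minimal psycopg2 sketch (table name, column, and values are assumptions): the second FOR UPDATE returns immediately, because the row lock belongs to the transaction, not to the first statement.

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # assumed connection string
cur = conn.cursor()
# Both statements run inside the same (implicit) transaction.
cur.execute("SELECT ics_time FROM table1 WHERE id = %s FOR UPDATE", (1,))
# This second FOR UPDATE on the same row does NOT block: the
# transaction already holds the row lock it is asking for.
cur.execute("SELECT id FROM table1 WHERE id = %s FOR UPDATE", (1,))
conn.commit()  # the row lock is released here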