Unexpected deadlocks in PostgreSQL (while using psycopg2)

I am dealing with a deadlock issue in PostgreSQL that I do not understand.
I am trying to implement a Round Robin-like algorithm using Python, psycopg2 module, and a Postgres database.
I want several instances of an application to do the following:
- Lock the whole table with a list of tasks for a very short interval
- Pick a task to perform (least recently performed task, with some limitations)
- Label task so other instances do not pick it (only one instance is allowed to perform the same task at the same time)
- Unlock table
- Perform task
- Repeat
Other sessions should also be able to update certain fields of this table.
Suddenly, I am getting deadlocks that I cannot explain. I have simplified my Python script as much as I could, and I am performing a commit after every statement (where possible), but a deadlock still appears every now and then.
For some reason, every time I get a deadlock, it is the first statement in a transaction. How is it even possible? My table doesn’t have any triggers, or foreign key constraints, or anything that would make things complicated. The only explanation I can come up with is that PostgreSQL does not release the lock immediately after the commit. Or perhaps it is psycopg2 that is not working the way I expect it to? I have failed to reproduce the issue by manually running statements in different sessions.
Deadlocks are rare, but I do get them at least once every few hours
I am running on PostgreSQL 9.6.1 and Python 2.7.12
Here is the code I run (this is just a simplified sample I made in order to catch the issue):
import psycopg2
import sys
import datetime
import time

sys.path.append('/opt/workflow/lib')
import config
import ovs_lib

instance_type='scan_master'
instance_id=sys.argv[1]

dbh=psycopg2.connect(dbname=config.values['pgsql']['db'], host=config.values['pgsql']['host'], port=int(config.values['pgsql']['port']), user=config.values['pgsql']['user'], password=config.values['pgsql']['pass'])
dbh.set_session(isolation_level='READ COMMITTED', autocommit=False)
cursor = dbh.cursor()
cursor.execute("SET search_path TO "+config.values['pgsql']['schema'])

def sanitize(string):
    #Escape single quotes for interpolation into SQL literals
    string=string.replace("'","''")
    return string

def get_task(instance_id):
    #Pick the least recently scanned eligible task and label it with this instance's id
    task_id=None
    out_struct={}
    instance_id=sanitize(instance_id)
    #Lock whole table
    dbh.commit() #Just in case
    cursor.execute("SELECT 1 FROM wf_task FOR UPDATE") #Lock the table
    cursor.execute("UPDATE wf_task SET scanner_instance_id=null WHERE scanner_instance_id='"+instance_id+"'") #release task from previous run
    #Now get the task
    sql ="SELECT t.task_id, st.scanner_function, t.parallel_runs\n"
    sql+="FROM wf_task t\n"
    sql+="JOIN wf_scanner_type st ON t.scanner_type_id=st.scanner_type_id\n"
    sql+="WHERE status='A'\n"
    sql+="AND t.scanner_instance_id is NULL\n"
    sql+="AND last_scan_ts<=now()-scan_interval*interval '1 second'\n"
    sql+="ORDER BY last_scan_ts\n"
    sql+="LIMIT 1\n"
    cursor.execute(sql)
    cnt=cursor.rowcount
    if cnt>0:
        row=cursor.fetchone()
        task_id=row[0]
        sql ="UPDATE wf_task SET scanner_instance_id='"+instance_id+"',last_scan_ts=current_timestamp(3) WHERE task_id="+str(task_id)
        cursor.execute(sql)
        scanner_function=row[1]
        parallel_runs=row[2]
        out_struct['task_id']=task_id
        out_struct['scanner_function']=scanner_function
        out_struct['parallel_runs']=parallel_runs
    dbh.commit() #Commit releases the locks taken above
    return out_struct

def process_task(task_id):
    #Each UPDATE is committed immediately
    sql="UPDATE wf_task SET submitted_ts=now() WHERE task_id="+str(task_id)+" AND submitted_ts<now()"
    cursor.execute(sql)
    dbh.commit()
    sql="UPDATE wf_task SET executed_ts=now() WHERE task_id="+str(task_id)+" AND submitted_ts<now()"
    cursor.execute(sql)
    dbh.commit()

while True:
    if not ovs_lib.check_control(instance_type, instance_id):
        now_time=datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S')
        print now_time+" Stop signal received"
        exit(0)
    task_struct=get_task(instance_id)
    if 'task_id' not in task_struct:
        time.sleep(1)
        continue
    process_task(task_struct['task_id'])
And here are examples of the error I get:
Traceback (most recent call last):
File "/opt/workflow/bin/scan_simple.py", line 70, in <module>
process_task(task_struct['task_id'])
File "/opt/workflow/bin/scan_simple.py", line 58, in process_task
cursor.execute(sql)
psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL: Process 21577 waits for ShareLock on transaction 39243027; blocked by process 21425.
Process 21425 waits for ShareLock on transaction 39243029; blocked by process 21102.
Process 21102 waits for AccessExclusiveLock on tuple (8,12) of relation 39933 of database 16390; blocked by process 21577.
HINT: See server log for query details.
CONTEXT: while updating tuple (8,12) in relation "wf_task"
Traceback (most recent call last):
File "/opt/workflow/bin/scan_simple.py", line 66, in <module>
task_struct=get_task(instance_id)
File "/opt/workflow/bin/scan_simple.py", line 27, in get_task
cursor.execute("SELECT 1 FROM wf_task FOR UPDATE")
psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL: Process 21776 waits for ShareLock on transaction 39488839; blocked by process 21931.
Process 21931 waits for ShareLock on transaction 39488844; blocked by process 21776.
HINT: See server log for query details.
CONTEXT: while locking tuple (17,9) in relation "wf_task"
At that time I had 6 instances of this script running simultaneously
No other sessions were active in the database.
Later update
Today I learned something new about Postgres that is very relevant to this question.
Starting with version 9.5, PostgreSQL supports SELECT ... FOR UPDATE SKIP LOCKED, which solves the problem I was trying to design my application around, and in a very elegant manner.
If you are struggling with concurrency in PostgreSQL while trying to implement some sort of queue or round robin solution, you absolutely must read this:
https://blog.2ndquadrant.com/what-is-select-skip-locked-for-in-postgresql-9-5/
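For illustration, the task pickup from my script could probably be rewritten roughly like this (an untested sketch reusing the tables and columns from the code above; the claiming UPDATE would follow in the same transaction before COMMIT):

BEGIN;
SELECT t.task_id, st.scanner_function, t.parallel_runs
FROM wf_task t
JOIN wf_scanner_type st ON t.scanner_type_id = st.scanner_type_id
WHERE status = 'A'
  AND t.scanner_instance_id IS NULL
  AND last_scan_ts <= now() - scan_interval * interval '1 second'
ORDER BY last_scan_ts
LIMIT 1
FOR UPDATE OF t SKIP LOCKED;  -- rows already locked by other instances are skipped instead of waited on
-- UPDATE wf_task here to label the returned task, then COMMIT
COMMIT;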

The problem is probably that the sequential scan in the first SELECT ... FOR UPDATE doesn't always return the rows in the same order, so concurrent executions of this statement lock the rows of the table in different orders. This leads to the deadlock you experience.
There are several solutions, listed from worst to best:
I think that locking the whole table for this update is horrible for performance, but if you insist on keeping your code, you could set synchronize_seqscans to off so that all sequential scans return the rows in the same order. But you really shouldn't lock all rows in a table the way you do, because
- It causes an unnecessary sequential scan.
- It is not safe. Somebody could INSERT new rows between the time you lock the rows and the time you run your UPDATEs.
If you really want to lock the whole table, use the LOCK TABLE statement instead of locking all rows in the table. That will get rid of the deadlock as well.
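For example (just a sketch; the lock mode is my choice and should be checked against what the other sessions do, EXCLUSIVE blocks concurrent row modifications but still allows plain SELECTs):

BEGIN;
LOCK TABLE wf_task IN EXCLUSIVE MODE;  -- one table-level lock instead of locking every row
-- pick the task and label it with an UPDATE here
COMMIT;  -- the table lock is released at commit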
The best solution is probably to lock the rows with the UPDATE itself. To avoid deadlocks, examine the execution plans that PostgreSQL uses for the UPDATE. This will be an index scan or a sequential scan. With an index scan you are safe, because that will return the rows in a certain order. For a sequential scan, disable the synchronize_seqscans feature mentioned above, ideally only for the transaction:
START TRANSACTION;
SET LOCAL synchronize_seqscans = off;
/* your UPDATEs go here */
COMMIT;
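Applied to the table from the question, such a claiming UPDATE could look roughly like this (a sketch, not tested against the real schema; the literal 'instance-1' stands for the instance_id). If another instance claims the same task first, the re-check of scanner_instance_id IS NULL after the row lock is obtained makes this UPDATE affect zero rows, and the application simply retries:

START TRANSACTION;
SET LOCAL synchronize_seqscans = off;
UPDATE wf_task
   SET scanner_instance_id = 'instance-1',
       last_scan_ts = current_timestamp(3)
 WHERE scanner_instance_id IS NULL      -- re-evaluated once the row lock is acquired
   AND task_id = (SELECT t.task_id
                    FROM wf_task t
                    JOIN wf_scanner_type st ON t.scanner_type_id = st.scanner_type_id
                   WHERE status = 'A'
                     AND t.scanner_instance_id IS NULL
                     AND last_scan_ts <= now() - scan_interval * interval '1 second'
                   ORDER BY last_scan_ts
                   LIMIT 1)
RETURNING task_id;
COMMIT;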

Related

FOR LOOP without a transaction

We are doing a system redesign, and due to the change in design we need to import data from multiple similar source tables into one table. For this, I am running a loop over the list of tables and importing all the data. However, due to the massive amount of data, I got an out-of-memory error after around 12 hours of execution and 20 tables. I have now discovered that the whole loop runs in a single transaction, which I don't need, since the system that fills the data is suspended during that time. I believe this single transaction is also making it take longer. My requirement is to run my queries without any transaction.
DO $$
DECLARE r record;
BEGIN
  FOR r IN SELECT '
        INSERT INTO dbo.tb_requests
          (node_request_id, request_type, id, process_id, data, timestamp_d1, timestamp_d2, create_time, is_processed)
        SELECT lpad(A._id, 32, ''0'')::UUID, (A.data_type + 1) request_type, B.id, B.order_id, data_value, timestamp_d1, timestamp_d2, create_time, TRUE
        FROM dbo.data_store_' || id || ' A
        JOIN dbo.tb_new_processes B
          ON A.process_id = B.process_id
        WHERE A._id != ''0'';
      ' AS log_query
      FROM dbo.list_table
      ORDER BY line_id
  LOOP
    EXECUTE r.log_query;
  END LOOP;
END$$;
This is a sample code block. It is not the actual code block but I think, it will give the idea.
Error message (translated from the original Japanese):
ERROR: Out of memory
DETAIL: Request for size 32 failed in memory context "ExprContext".
SQL state: 53200
You cannot run any statement on the server side without a transaction. In newer Postgres releases (11 and later) you can run a COMMIT statement inside a DO block. It closes the current transaction and starts a new one. This breaks up a very long transaction and can solve the memory problem, because Postgres releases some memory at the end of a transaction.
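Adapted to the loop from the question, that could look roughly like this (a sketch; it needs a Postgres version with transaction control in DO blocks, i.e. 11 or later, must be run outside of any surrounding transaction, and the driving query is materialized at the first COMMIT):

DO $$
DECLARE r record;
BEGIN
  FOR r IN SELECT '...' AS log_query   -- '...' = the same dynamic INSERT ... SELECT string as above
           FROM dbo.list_table
           ORDER BY line_id
  LOOP
    EXECUTE r.log_query;
    COMMIT;   -- close the transaction after each table; memory and locks are released here
  END LOOP;
END$$;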
Or use a shell script (bash) instead, if that is possible.

Postgres: ShareLock Deadlock on transaction

Recently we have started getting a lot of deadlock errors in the logs. (Postgres server 9.6.5)
Our table consists of two columns: one is an auto-increment primary key, while the other is a JSON object.
Two attributes of the JSON object are defined as unique.
Now in the logs we keep getting errors saying that two simple insert queries on different rows are blocking each other.
============
process 65325 detected deadlock while waiting for ShareLock on transaction 2934224126 after 1000.050 ms
DETAIL: Process holding the lock: 35530. Wait queue: .
CONTEXT: while inserting index tuple (128,10) in relation "A"
STATEMENT: INSERT INTO A AS t (info) VALUES('{"x":"y",....)
ERROR: deadlock detected
DETAIL: Process 65325 waits for ShareLock on transaction 2934224126; blocked by process 35530.
Process 35530 waits for ShareLock on transaction 2934224125; blocked by process 65325.
Process 65325: INSERT INTO A AS t (info) VALUES({"x":"y",....)
Process 35530: INSERT INTO A AS t (info) VALUES({"x":"z",....)
====================
So basically two different rows are in deadlock condition.
Are there any suggestions as to under what conditions such deadlocks may occur?
Rows can never be in deadlock. It is not two different rows, but two different transactions, that are in deadlock. Your log is showing you the most recent insertion attempt by each transaction. Presumably, there were previous inserts as well in each transaction. But those won't show up in the log, unless they show up for some other reason (like log_statement=all).
So if T1 successfully (and invisibly, as far as your log file is concerned) inserted "x":"y", then T2 successfully and invisibly inserted "x":"z", and now T1 tries to insert "x":"z" and T2 tries "x":"y", there will be a deadlock. This assumes the unique index is on info->>'x'.
This would be the same issue if you were not using JSON.
Mitigations would be: don't insert more than one row per transaction. Or, if you do, always insert the rows in a specified order (for example, "y" before "z" following the Latin alphabet ordering), although in this case you just replace the deadlock error with a unique-key violation. Or just be prepared to catch the deadlock and try again.
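For example, with the assumed unique index on info->>'x', both transactions inserting their rows in the same key order would look like this sketch; the second transaction then waits (or fails with a unique-key violation) instead of deadlocking:

-- executed by both T1 and T2, always in this order
INSERT INTO A AS t (info) VALUES ('{"x": "y"}');
INSERT INTO A AS t (info) VALUES ('{"x": "z"}');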

Postgres: SELECT FOR UPDATE does not see new rows after lock release

While trying to support a PostgreSQL DB in my application, I found this strange behaviour.
Preparation:
CREATE TABLE test(id INTEGER, flag BOOLEAN);
INSERT INTO test(id, flag) VALUES (1, true);
Assume two concurrent transactions (Autocommit=false, READ_COMMITTED) TX1 and TX2:
TX1:
UPDATE test SET flag = FALSE WHERE id = 1;
INSERT INTO test(id, flag) VALUES (2, TRUE);
-- (wait, no COMMIT yet)
TX2:
SELECT id FROM test WHERE flag=true FOR UPDATE;
-- waits for TX1 to release lock
Now, if I COMMIT in TX1, the SELECT in TX2 returns an empty cursor.
This is strange to me, because the same experiment in Oracle and MariaDB results in the newly created row (id=2) being selected.
I could not find anything about this behaviour in the PG documentation.
Am I missing something?
Is there any way to force PG server to "refresh" statement visibility after acquiring lock?
PS: PostgreSQL version 11.1
TX2 scans the table and tries to lock the results.
The scan sees the snapshot of the database from the start of the query, so it cannot see any rows that were inserted (or made eligible in some other way) by concurrent modifications that started after that snapshot was taken.
That is why you cannot see the row with the id 2.
The row with id 1, on the other hand, is visible in that snapshot, so the scan finds it. But the query then has to wait until the lock is released. When that finally happens, it fetches the latest committed version of the row and evaluates the condition against it again; since flag is now false, that row is excluded as well.
This “EvalPlanQual” recheck (to use PostgreSQL jargon) is only performed for rows that were found during the scan, but were locked. The second row isn't even found during the scan, so no such processing happens there.
This is a bit odd, admittedly. But it is not a bug, it is just the way PostgreSQL works.
If you want to avoid such anomalies, use the REPEATABLE READ isolation level. Then you will get a serialization error in such a case and can retry the transaction, thus avoiding inconsistencies like that.
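A sketch of what that would look like for TX2 from the question (under REPEATABLE READ, trying to lock a row that a concurrent transaction has modified since the snapshot was taken raises SQLSTATE 40001, "could not serialize access due to concurrent update"):

BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT id FROM test WHERE flag = true FOR UPDATE;
-- fails with a serialization error if TX1 committed a conflicting change in the meantime;
-- the application catches the error, rolls back and retries the whole transaction
COMMIT;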

How can pgsql sequence be undefined when I just called nextval?

I've got an app built on top of PostgreSQL, which makes use of a custom sequence. I think I understand sequences pretty well by now: they are non-transactional, currval is defined only within the current session, etc. But I don't understand this:
2015-10-13 10:37:16 SQLSelect: SELECT nextval('commit_id_seq')
2015-10-13 10:37:16 commit_id_seq: 57
2015-10-13 10:37:16 SQLExecute: UPDATE bid SET is_archived=false,company_id=1436,contact_id=15529,...(etc)...,sharing_policy='' WHERE id = 56229
2015-10-13 10:37:16 ERROR: ERROR: currval of sequence "commit_id_seq" is not yet defined in this session
CONTEXT: SQL statement "INSERT INTO history (table_name, record_id, sec_user_id, created, action, notes, status, before, after, commit_id)
SELECT TG_TABLE_NAME, rec.id, (SELECT id FROM sec_user WHERE name = CURRENT_USER), now(), SUBSTR(TG_OP,1,1), note, stat, oldH, newH, currval('commit_id_seq')"
PL/pgSQL function log_to_history() line 28 at SQL statement
We log every call to the database, and in the case of the SELECT nextval, I also log the result. The above are the exact calls, except that I trimmed the UPDATE statement (because the original is really long).
So, you can see that we just called nextval on the sequence, got a reasonable number back, and then we do an UPDATE that invokes a trigger function that attempts to use currval on that sequence... and it fails, claiming currval is not defined.
Note that this doesn't usually happen, but once it does start happening, it does so consistently (perhaps until the user disconnects from the DB).
How can this be? And what can I do about it?
Your UPDATE statement obviously calls a trigger. The most plausible cause of this error is that the trigger function is in a different schema from where the sequence is defined and the schema of the sequence is not in the search_path. That gives you two options to resolve this:
Make the schema of the sequence visible to the trigger function using SET search_path TO .... Note that this will make all objects in the schema of the sequence visible, which may be something of a security risk, depending on your database design.
Schema-qualify the sequence name in the trigger function: currval('my_schema.commit_id_seq').
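For illustration, the two options could look like this (a sketch; my_schema is a placeholder for the schema that actually contains commit_id_seq):

-- Option 1: pin a search_path that includes the sequence's schema onto the trigger function
ALTER FUNCTION log_to_history() SET search_path = my_schema, public;

-- Option 2: schema-qualify the sequence inside the function body
-- ... currval('my_schema.commit_id_seq') ...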
Another plausible cause is connection pooling at your application end. Log the "session ID" (really just the start time and PID of the current session) by adding %c to your log_line_prefix parameter in postgresql.conf. In PostgreSQL every command runs in its own transaction unless a transaction is explicitly established. Connection pooling software also works at the transaction level (i.e. you start a transaction and your connection stays assigned to you until you close it; outside of a transaction there are no guarantees about session persistence). If that is the case, you can wrap your entire set of commands in a BEGIN ... COMMIT block (you should probably use a specific call from your pooling software), or better yet, change your code to not depend on a previous nextval() call.

DB2 deadlock timeout Sqlstate: 40001, reason code 68 due to update statements called from servlet using SQL

I am calling update statements one after the other from a servlet to DB2. I am getting error SQLSTATE 40001, reason code 68, which I found is due to a deadlock timeout.
How can I resolve this issue?
Can it be resolved by setting query timeout?
If yes then how to use it with update statements in servlet or where to use it?
The reason code 68 already tells you this is due to a lock timeout (deadlock is reason code 2). It could be due to other users running queries at the same time that use the same data you are accessing, or your own multiple updates.
Begin by running db2pd -db locktest -locks show detail from a db2 command line to see where the locks are. You'll then need to run something like:
select tabschema, tabname, tableid, tbspaceid
from syscat.tables where tbspaceid = # and tableid = #
filling in the # symbols with the ID number you get from the db2pd command output.
Once you see where the locks are, here are some tips:
Deadlock frequency can sometimes be reduced by ensuring that all applications access their common data in the same order – meaning, for example, that they access (and therefore lock) rows in Table A, followed by Table B, followed by Table C, and so on.
taken from: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.trb.doc/doc/t0055074.html
recommended reading: http://www.ibm.com/developerworks/data/library/techarticle/dm-0511bond/index.html
Addendum: if your servlet or another guilty application is using select statements found to be involved in the deadlock, you can try appending WITH UR to those select statements, provided accuracy of the newly updated (or inserted) data isn't important.
For me, the solution was adding FOR READ ONLY WITH UR at the end of all my SELECT statements. (Apparently my select statements were returning so much data, it locked the tables long enough to interfere with other SQL statements)
See https://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/sqlref/src/tpc/db2z_sql_isolationclause.html
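For example (a sketch with made-up table and column names, just to show where the clause goes):

SELECT order_id, status
FROM app.orders
WHERE customer_id = ?
FOR READ ONLY WITH UR   -- uncommitted read: takes no row locks, may see uncommitted data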