What I am trying to achieve is to have multiple instances of the same application running at the same time, but only one of those instances running a cron, by locking it in a Postgres database.
My solution so far is:
- Running a cron on all the instances.
- Inserting a row into a table cron_lock with a unique identifier for the cron.
- If the insert query fails, it is most likely because the row already exists (the cron identifier is the primary key of the table). In that case, I do nothing and exit.
- If the insert query succeeds, the application instance runs the cron process.
- At the end of the process, I delete the row with the unique identifier.
This solution works, but I am wondering whether Postgres offers another locking mechanism, in particular one that does not rely on queries that are expected to fail.
Thanks to #Belayer I found a nice way to do it with advisory locks.
Here is my solution:
Each of my crons has a unique associated ID (an integer).
All of the crons start on all the different servers, but before running its main function, each cron tries to take an advisory lock with its unique ID in the database. If it gets the lock, it runs the main function and then frees the lock; otherwise it just stops.
And here is some pseudocode if you want to implement it in a language of your choice:
enum Cron {
    Echo = 1,
    Test = 2
}

function uniqueCron(id, mainFunction) {
    result = POSTGRES('SELECT pg_try_advisory_lock($id) AS "should_run"')
    if (result == FALSE) { return }
    mainFunction()
    POSTGRES('SELECT pg_advisory_unlock($id)')
}

cron(* * * * *) do {
    uniqueCron(Cron.Echo, (echo "Unique cron"))
}

cron(*/5 * * * *) do {
    uniqueCron(Cron.Test, (echo "Test"))
}
Running this process many times, or on many different servers all using the same database, results in only one mainFunction being executed at a time, provided that the crons are launched at the same time (same time/timezone on the different servers). A main function that finishes very quickly can cause problems: one server might release the lock before another server has even tried to acquire it, in which case the second server would also acquire the lock and run the function. In that case, wait a little before releasing the lock.
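For reference, here is a minimal Python/psycopg2 sketch of the same idea. The connection parameters, cron IDs and the main functions are placeholders, not part of the original setup.

import psycopg2

ECHO_CRON = 1  # unique integer ID per cron, as described above
TEST_CRON = 2

def unique_cron(conn, cron_id, main_function):
    # Try to take a session-level advisory lock for this cron's unique ID.
    with conn.cursor() as cur:
        cur.execute("SELECT pg_try_advisory_lock(%s)", (cron_id,))
        got_lock = cur.fetchone()[0]
    if not got_lock:
        return  # another instance already holds the lock for this cron
    try:
        main_function()
    finally:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_advisory_unlock(%s)", (cron_id,))

if __name__ == "__main__":
    conn = psycopg2.connect(dbname="mydb")  # placeholder connection parameters
    conn.autocommit = True  # avoid leaving a transaction open between cron runs
    unique_cron(conn, ECHO_CRON, lambda: print("Unique cron"))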
Code in my MATLAB app:
query = sprintf('select *,st_askml(line) from %s;', table_name);
var = fetch(connection, query);
This completes successfully and I get the data, and the app continues running. However, if I separately (in this case within a Python script) try to run "drop table if exists" on the same table, it won't work because the table is locked.
How should I change my select query in MATLAB so that it finishes gracefully? (By the way, the lock is released when I close the app.)
I am open to hearing more details about this, but I solved the problem by adding
execute(connection,'COMMIT');
after my select query. Presumably the connection is not in autocommit mode, so the SELECT left a transaction open that kept holding an ACCESS SHARE lock on the table; the explicit COMMIT ends that transaction and releases the lock, which allows the DROP TABLE to proceed.
I would like to lock a table for writing during a period of time, while leaving it available for reading.
Is that possible?
Ideally, I would like to lock the table with a predicate (for example, prevent writing rows where country = 'france').
If you really want to lock against such inserts, i.e. the query should hang and only continue when you allow it, you would have to place a SHARE lock on the table and keep the transaction open.
This is usually not a good idea.
If you want to prevent any such inserts, i.e. throw an error when such an insert is attempted, create a BEFORE INSERT trigger that throws an exception if the NEW row satisfies the condition.
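For illustration, here is a rough psycopg2 sketch of such a trigger. The table and column names (user_table, country) and the exception message are assumptions for the example, not part of the original answer.

import psycopg2

conn = psycopg2.connect(dbname="mydb")  # placeholder connection parameters
cur = conn.cursor()

# Trigger function: reject any new row that matches the predicate.
cur.execute("""
    CREATE OR REPLACE FUNCTION reject_france_inserts() RETURNS trigger AS $$
    BEGIN
        IF NEW.country = 'france' THEN
            RAISE EXCEPTION 'inserts with country = france are currently blocked';
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
""")

# Fire the function before every insert on the (assumed) user_table.
cur.execute("""
    CREATE TRIGGER reject_france_inserts
    BEFORE INSERT ON user_table
    FOR EACH ROW EXECUTE PROCEDURE reject_france_inserts();
""")
conn.commit()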
You can use a FOR SHARE lock, which blocks other transactions from performing operations like UPDATE and DELETE on the locked rows, while still allowing SELECT FOR SHARE. (Read the docs for details: https://www.postgresql.org/docs/9.4/explicit-locking.html [13.3.2])
For example, there are 2 processes accessing table user_table, in the following sequence:
Process A: BEGIN;
Process A: SELECT username FROM user_table WHERE country = 'france' FOR SHARE;
Process B: SELECT * FROM user_table FOR SHARE; (Here, process B can still read all the rows of the table.)
Process B: UPDATE user_table SET username = 'test' WHERE country = 'france'; (Here, process B is blocked and waits for process A to finish its transaction.)
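If it helps, here is a rough psycopg2 sketch of that sequence using two connections. The table, its columns, and the connection parameters are assumed for illustration; the statement_timeout is only there so the demo does not hang forever.

import psycopg2
import psycopg2.extensions

conn_a = psycopg2.connect(dbname="mydb")  # placeholder connection parameters
conn_b = psycopg2.connect(dbname="mydb")

cur_a = conn_a.cursor()
cur_a.execute("SELECT username FROM user_table WHERE country = 'france' FOR SHARE")
# conn_a's transaction stays open, so its share locks are held.

cur_b = conn_b.cursor()
cur_b.execute("SELECT * FROM user_table FOR SHARE")  # still works: share locks do not conflict
print(cur_b.fetchall())

cur_b.execute("SET statement_timeout = '2s'")
try:
    cur_b.execute("UPDATE user_table SET username = 'test' WHERE country = 'france'")
except psycopg2.extensions.QueryCanceledError:
    print("UPDATE was blocked until process A commits or rolls back")
    conn_b.rollback()

conn_a.commit()  # releases the share locks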
Let's say I would like to create an index, and it will take some time. I use pgAdmin. Suppose that while the query is being executed, pgAdmin crashes for some reason (for example, the computer is restarted).
What would be the status of that index? Would it keep on being created and eventually complete successfully, or would it fail immediately, or fail after some time?
Is there any way to check the status of an index being created? (I'm using Postgres version 10.x.)
It is hard to answer the question without knowing the reason for the crash of the application. If it was not caused by a server failure, it is likely that the index was created correctly. You can check this by querying the system catalog pg_index. You have to know the index name, e.g.:
select indexrelid::regclass, indisvalid
from pg_index
where indexrelid::regclass::text = 'my_table_unique_col_key'
Per the documentation:
indisvalid - If true, the index is currently valid for queries. False means the index is possibly incomplete: it must still be modified by INSERT/UPDATE operations, but it cannot safely be used for queries. If it is unique, the uniqueness property is not guaranteed true either.
If the index has not been created yet (or not at all, due to a query failure), the above query returns no rows. You can check whether the query that creates the index is still running (see Dynamic Statistics Views):
select *
from pg_stat_activity
where query ilike 'create index%'
I am dealing with a deadlock issue in PostgreSQL that I do not understand.
I am trying to implement a Round Robin-like algorithm using Python, psycopg2 module, and a Postgres database.
I want several instances of an application to do the following:
- Lock the whole table with a list of tasks for a very short interval
- Pick a task to perform (least recently performed task, with some limitations)
- Label task so other instances do not pick it (only one instance is allowed to perform the same task at the same time)
- Unlock table
- Perform task
- Repeat
Other sessions should also be able to update certain fields of this table.
Suddenly, I am getting deadlocks that I cannot explain. I have simplified my Python script as much as I could, and I am performing a commit after every statement (when possible), but a deadlock still appears every now and then.
For some reason, every time I get a deadlock, it is on the first statement in a transaction. How is that even possible? My table doesn't have any triggers, foreign key constraints, or anything else that would complicate things. The only explanation I can come up with is that PostgreSQL does not release the lock immediately after the commit. Or perhaps it is psycopg2 that is not working the way I expect it to? I have failed to reproduce the issue by manually running statements in different sessions.
Deadlocks are rare, but I do get them at least once every few hours.
I am running on PostgreSQL 9.6.1 and Python 2.7.12
Here is the code I run (this is just a simplified sample I made in order to catch the issue):
import psycopg2
import sys
import datetime
import time
sys.path.append('/opt/workflow/lib')
import config
import ovs_lib
instance_type='scan_master'
instance_id=sys.argv[1]
dbh=psycopg2.connect(dbname=config.values['pgsql']['db'], host=config.values['pgsql']['host'], port=int(config.values['pgsql']['port']), user=config.values['pgsql']['user'], password=config.values['pgsql']['pass'])
dbh.set_session(isolation_level='READ COMMITTED', autocommit=False)
cursor = dbh.cursor()
cursor.execute("SET search_path TO "+config.values['pgsql']['schema'])
def sanitize(string):
    string=string.replace("'","''")
    return string

def get_task(instance_id):
    task_id=None
    out_struct={}
    instance_id=sanitize(instance_id)
    #Lock whole table
    dbh.commit() #Just in case
    cursor.execute("SELECT 1 FROM wf_task FOR UPDATE") #Lock the table
    cursor.execute("UPDATE wf_task SET scanner_instance_id=null WHERE scanner_instance_id='"+instance_id+"'") #release task from previous run
    #Now get the task
    sql ="SELECT t.task_id, st.scanner_function, t.parallel_runs\n"
    sql+="FROM wf_task t\n"
    sql+="JOIN wf_scanner_type st ON t.scanner_type_id=st.scanner_type_id\n"
    sql+="WHERE status='A'\n"
    sql+="AND t.scanner_instance_id is NULL\n"
    sql+="AND last_scan_ts<=now()-scan_interval*interval '1 second'\n"
    sql+="ORDER BY last_scan_ts\n"
    sql+="LIMIT 1\n"
    cursor.execute(sql)
    cnt=cursor.rowcount
    if cnt>0:
        row=cursor.fetchone()
        task_id=row[0]
        sql ="UPDATE wf_task SET scanner_instance_id='"+instance_id+"',last_scan_ts=current_timestamp(3) WHERE task_id="+str(task_id)
        cursor.execute(sql)
        scanner_function=row[1]
        parallel_runs=row[2]
        out_struct['task_id']=task_id
        out_struct['scanner_function']=scanner_function
        out_struct['parallel_runs']=parallel_runs
    dbh.commit()
    return out_struct

def process_task(task_id):
    sql="UPDATE wf_task SET submitted_ts=now() WHERE task_id="+str(task_id)+" AND submitted_ts<now()"
    cursor.execute(sql)
    dbh.commit()
    sql="UPDATE wf_task SET executed_ts=now() WHERE task_id="+str(task_id)+" AND submitted_ts<now()"
    cursor.execute(sql)
    dbh.commit()

while True:
    if not ovs_lib.check_control(instance_type, instance_id):
        now_time=datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S')
        print now_time+" Stop signal received"
        exit(0)
    task_struct=get_task(instance_id)
    if 'task_id' not in task_struct:
        time.sleep(1)
        continue
    process_task(task_struct['task_id'])
And here are examples of the error I get:
Traceback (most recent call last):
File "/opt/workflow/bin/scan_simple.py", line 70, in <module>
process_task(task_struct['task_id'])
File "/opt/workflow/bin/scan_simple.py", line 58, in process_task
cursor.execute(sql)
psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL: Process 21577 waits for ShareLock on transaction 39243027; blocked by process 21425.
Process 21425 waits for ShareLock on transaction 39243029; blocked by process 21102.
Process 21102 waits for AccessExclusiveLock on tuple (8,12) of relation 39933 of database 16390; blocked by process 21577.
HINT: See server log for query details.
CONTEXT: while updating tuple (8,12) in relation "wf_task"
Traceback (most recent call last):
File "/opt/workflow/bin/scan_simple.py", line 66, in <module>
task_struct=get_task(instance_id)
File "/opt/workflow/bin/scan_simple.py", line 27, in get_task
cursor.execute("SELECT 1 FROM wf_task FOR UPDATE")
psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL: Process 21776 waits for ShareLock on transaction 39488839; blocked by process 21931.
Process 21931 waits for ShareLock on transaction 39488844; blocked by process 21776.
HINT: See server log for query details.
CONTEXT: while locking tuple (17,9) in relation "wf_task"
At that time I had 6 instances of this script running simultaneously.
No other sessions were active in the database.
Later update
Today I learned something new about Postgres that is very relevant to this question.
Starting with version 9.5, PostgreSQL supports SKIP LOCKED (as part of SELECT ... FOR UPDATE), which solves the problem I was trying to design my application around, and in a very elegant manner.
If you are struggling with concurrency in PostgreSQL while trying to implement some sort of queue or round-robin solution, you absolutely must read this:
https://blog.2ndquadrant.com/what-is-select-skip-locked-for-in-postgresql-9-5/
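For illustration only, here is a hedged sketch of how the task-picking step from the script above might look with FOR UPDATE SKIP LOCKED instead of locking the whole table. The table and column names are taken from the script; this is not the original code.

def get_task_skip_locked(dbh, instance_id):
    """Pick one due task, skipping rows that other instances have already locked."""
    out_struct = {}
    cursor = dbh.cursor()
    cursor.execute("""
        SELECT t.task_id, st.scanner_function, t.parallel_runs
        FROM wf_task t
        JOIN wf_scanner_type st ON t.scanner_type_id=st.scanner_type_id
        WHERE status='A'
        AND t.scanner_instance_id is NULL
        AND last_scan_ts<=now()-scan_interval*interval '1 second'
        ORDER BY last_scan_ts
        LIMIT 1
        FOR UPDATE OF t SKIP LOCKED
    """)
    row = cursor.fetchone()
    if row is not None:
        task_id, scanner_function, parallel_runs = row
        cursor.execute(
            "UPDATE wf_task SET scanner_instance_id=%s, last_scan_ts=current_timestamp(3) "
            "WHERE task_id=%s",
            (instance_id, task_id))
        out_struct = {'task_id': task_id,
                      'scanner_function': scanner_function,
                      'parallel_runs': parallel_runs}
    dbh.commit()
    return out_struct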
The problem is probably that the sequential scan in the first SELECT ... FOR UPDATE doesn't always return the rows in the same order, so concurrent executions of this statement lock the rows of the table in different orders. This leads to the deadlock you experience.
There are several solutions, in increasing order of goodness:
I think that the technique of locking the whole table for this update is horrible for performance, but if you insist on keeping your code, you could set synchronize_seqscans to off so that all sequential scans return the rows in the same order. But you really shouldn't lock all rows in a table like you do, because:
- It causes an unnecessary sequential scan.
- It is not safe. Somebody could INSERT new rows between the time you lock the rows and the time you run your UPDATEs.
If you really want to lock the whole table, use the LOCK TABLE statement instead of locking all rows in the table. That will get rid of the deadlock as well.
The best solution is probably to lock the rows with the UPDATE itself. To avoid deadlocks, examine the execution plan that PostgreSQL uses for the UPDATE: it will be either an index scan or a sequential scan. With an index scan you are safe, because it returns the rows in a fixed order. For a sequential scan, disable the synchronize_seqscans feature mentioned above, ideally only for the transaction:
START TRANSACTION;
SET LOCAL synchronize_seqscans = off;
/* your UPDATEs go here */
COMMIT;
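If you do go the LOCK TABLE route mentioned above, the idea would look roughly like this; it is a sketch built on the script's table names, not the original code, and the choice of EXCLUSIVE mode is one possible option.

def get_task_with_table_lock(dbh, instance_id):
    cursor = dbh.cursor()
    # Take a table-level lock instead of locking every row.
    # EXCLUSIVE mode still allows plain SELECTs by other sessions, but blocks
    # concurrent row locking and updates until this transaction commits.
    cursor.execute("LOCK TABLE wf_task IN EXCLUSIVE MODE")
    cursor.execute("UPDATE wf_task SET scanner_instance_id=NULL "
                   "WHERE scanner_instance_id=%s", (instance_id,))
    # ... pick and claim the task as in the original get_task ...
    dbh.commit()  # releases the table lock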
I've been getting a handful of errors on PostgreSQL that seem to be tied to a race condition.
I have a process/daemon written in Twisted Python. The easiest way to describe it is as a Web Crawler - it pulls a page, parses the links, and logs what it's seen. Because of HTTP blocking, Twisted runs multiple "concurrent" processes deferred to threads.
Here's the race condition...
When I encounter a URL shortener, this logic happens:
result= """SELECT * FROM shortened_link WHERE ( url_shortened = %(url)s ) LIMIT 1;"""
if result:
pass
else:
result= """INSERT INTO shortened_link ( url_shortened ..."
A surprising number of psycopg2.IntegrityError exceptions are raised, because the unique index on url_shortened gets violated.
The select and insert really do run that close together. From what I can tell, it looks like two shortened links get queued next to one another:
Process A: Select, returns Null
Process B: Select, returns Null
Process A: Insert , success
Process B: Insert , integrity error
Can anyone suggest any tips/tricks to handle this ? I'd like to avoid explicit locking, because I know that'll open up a whole other set of problems.
Do it all in a single command:
result= """
INSERT INTO shortened_link ( url_shortened ...
SELECT %(url)s
where not exists (
select 1
from shortened_link
WHERE url_shortened = %(url)s
);"""
It will only insert if that link does not exist.
There's really not a solution that avoids the need to be able to handle the possibility of a unique constraint violation error. If your framework can't do it then I'd wrap the SQL in a PL/pgSQL function or procedure that can.
Given that you can handle the error you might as well not test for the existence of the unique value and just attempt the insert, letting any error be handled by the EXCEPTION clause.
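In psycopg2 terms, that could look roughly like the sketch below. The column list is assumed (the original INSERT is truncated), so treat the statement as illustrative only.

import psycopg2

def save_shortened_link(conn, url):
    """Attempt the insert and treat a unique violation as 'already there'."""
    try:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO shortened_link (url_shortened) VALUES (%(url)s)",
                {'url': url})
        conn.commit()
    except psycopg2.IntegrityError:
        conn.rollback()  # another process inserted the same URL first; that is fine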
You either need a mutex lock of some kind, or you will have to live with the redundancies that will occur due to the race condition.
If you choose to go with the mutex lock - you don't necessarily need to use a database-level lock. You can simply lock down the Twisted process to block other threads handling a similar shortened url.
If you choose to avoid the lock, remove the unique constraint on the url_shortened field. Periodically, you can move these records to a 'clean' table that contains a single unique copy of each shortened url.