I want to make a persistent job queue in postgresql. So that multiple workers can select one job from the queue (using select for update with skip locked), process it and than delete it from the queue. I have a table:
create table queue (
id serial primary key
some_job_param1 text not null
some_job_param2 text not null
)
Now if there are two jobs then it works fine:
the worker1 starts a transaction and selects the first job
begin;
select * from queue for update skip locked limit 1;
starts the processing. worker2 does the same thing and selects the second job with the same query.
After worker1 does it's job, deletes it from the queue and commits the transaction:
delete from queue where id=$1;
commit;
Then the worker1 is ready for a new job, so it does the same thing. Begins a new transaction, selects for a job that isn't locked. But the problem is, that there are no more jobs, the query returns zero rows.
Ideal would be if the query would block until there is a new job and it returns a result. Is it somehow possible? Or I'm going in a wrong direction?
EDIT:
the workers are an external process. So if the worker died, the session dies and the transaction also. Then the selected job will be locked no more and ready for another worker. The pseudo code would look like this:
while (true) {
tx = db.new_transaction()
job_id, param1, param2 = tx.query("select * from queue for update skip locked limit 1")
try {
some_long_processing(param1, param2)
tx.commit()
} catch {
tx.rollback()
}
}
Related
I running some automated tasks on my postgres database at night using the pg_cron extension. I am moving certain old records to archive database tables. I am running 5 Stored Procedures concurrently on 5 different background workers, so they all start at the same time and run on different workers (I am assuming this is similar to running different Tasks on different Threads in Java). These 5 Stored Procedures are independent (moving records to archive tables), so they can run at the same time. I schedule them each using a command like
cron.schedule (myJob1,
'* * * * *',
'call my_stored_proc_1()'
);
cron.schedule (myJob2,
'* * * * *',
'call my_stored_proc_2()'
);
.
..
...
cron.schedule (myJob5,
'* * * * *',
'call my_stored_proc_5()'
);
NOW, I have some MORE dependent Store Procedures that I want to run. But they need to run AFTER these 5 Jobs finish/complete, because they are doing some DELETE... sql operations.
How can I have this second Stored Procedure (the one doing the DELETE queries) Job run AFTER my first 5 Stored Procedures Jobs when they are DONE? I don't want to set a CRON expression for the second Stored Procedure doing the DELETES, because I don't know what time the first 5 Stored Procs are even going to finish...
Below I included a little schematic of how the Jobs are currently triggered and how I want it to work (if possible):
Preface: how I understand problem
I hope that I understand the problem described by OP.
If I was wrong then it makes everything below invalid.
I suppose that it's about periodic night tasks heavy in CPU and/or IO.
E.g:
there are tasks A-C for archiving data
maybe task D-E for rebuilding aggregates / refreshing mat views
and finally task F that runs reindexing/analyze on whole DB
So it makes sense to run task F only after tasks A-E are finished.
Every task is needed to be run just once in a period of time:
once in a day or hour or week or only during weekends in a night time
it's better not to run in a time when server is under load
Does it fits with OP requirement - IDK.
For the sake of simplicity let's presume that each task runs only once in a night. It's easy to extend for other periods/requirements.
Data-driven approach
1. Add log table
E.g.
CREATE TABLE job_log (
log_id bigint,
job_name text,
log_date timestamptz
)
Tasks A-E
On start
For each job function do check:
IF EXISTS(
SELECT 1 FROM job_log
WHERE
job_name = 'TaskA' # TaskB-TaskE for each functiont
AND log_date::DATE = NOW()::DATE # check that function already executed this night
) OR EXISTS(
SELECT 1 FROM pg_stat_activity
WHERE
query like 'SELECT * FROM jobA_function();' # check that job not executing right now
) THEN RETURN;
END IF;
It's possible that other conditions could be added: look for amount of connections, existence of locks and so on.
This way it will be guaranteed that function will not be executed more frequently than needed.
On finish
INSERT INTO job_log
SELECT
(SELECT MAX(log_id) FROM job_log) + 1 # or use sequences/other autoincrements
,'TaskA'
,NOW()
Cronjob schedule
The meaning of it becames different.
Now it's: "try to initiate execution of task".
It's safe to schedule it for every hour between a chosen period or even more frequently.
Cronjob cannot know if the server is under load or not, are there locks on a table or maybe somebody started execution of task manually.
Job function could be more smart in that.
Task F
Same as above but check on start looks for completion of other tasks.
E.g.
IF NOT EXISTS(
SELECT 1 FROM job_log
WHERE
job_name = 'TaskA'
AND log_date::DATE = NOW()::DATE
) OR NOT EXISTS(
SELECT 1 FROM job_log
WHERE
job_name = 'TaskB'
AND log_date::DATE = NOW()::DATE
)
.... # checks for completions of other tasks
OR EXISTS(
SELECT 1 FROM job_log
WHERE
job_name = 'TaskF' # TaskB-TaskE for each functiont
AND log_date::DATE = NOW()::DATE # check that function already executed this night
) OR EXISTS(
SELECT 1 FROM pg_stat_activity
WHERE
query like 'SELECT * FROM jobF_function();' # check that job not executing right now
) THEN RETURN;
On completion
Write to job_log the same as other functions.
UPDATE. Cronjob schedule
Create multiple schedule in cronjob.
E.g.
Let's say tasks A-E will run approximately 10-15 minutes.
And it's possible that one or two of them could work for 30-45-60 minutes.
Create a schedule for task F to attempt start every 5 minutes.
How that will work:
attempt 1: task A finished, other still working -> exit
attempt 2: task A-C finished -> exit
attempt 3: tasks A-E finished -> start task F
attempt 4: tasks A-E finished but in pg_stat_activity there is an executing task F -> exit
attempt 5: tasks A-E finished, pg_stat_activity is empty but in logs we see that task F already executed -> no need to work -> exit
... all other attempts will be the same till next night
Summary
It's easy extend this approach for any requirements:
another periodicity
or make it unperiodic at all. E.g. make a table with trigger and start execution on change
dependencies of any depth and/or "fuzzy" dependencies
... literally everything
Conception remains the same:
cronjob schedule means "try to run"
decision to run or not is data-driven
I would be glad to hear criticism of any kind - who knows maybe I'm overlooking something.
You could to use pg_stat_activity view to ensure that there are no active query like your jobs 1-5.
Note:
Superusers and members of the built-in role pg_read_all_stats (see also Section 21.5) can see all the information about all sessions
...
while (
select count(*) > 0
from pg_stat_activity
where query in ('call my_stored_proc_1()', 'call my_stored_proc_2()', ...))
loop
perform pg_sleep(1);
perform pg_stat_clear_snapshot(); -- needs to retrieve the fresh data
end loop;
...
Just insert this code at the beginning of your stored proc 6 and call it for a few seconds after the jobs 1-5.
Note 1:
The condition could be simplified and generalized using regexp:
when query ~ 'my_stored_proc_1|my_stored_proc_2|...'
Note 2:
You could to implement timeout using clock_timestamp() function:
...
is_timedout := false;
timeout := '10 min'::interval; -- stop waiting after 10 minutes
start_time := clock_timestamp();
while (...)
loop
perform pg_sleep(1);
perform pg_stat_clear_snapshot(); -- needs to retrieve the fresh data
if clock_timestamp() - start_time > timeout then
is_timedout := true;
break;
end if;
end loop;
if is_timedout then
...
else
...
end if;
...
Note 3:
Look at the other columns of the pg_stat_activity. You may need to use them as well.
What I am trying to achieve is to have multiple instances of the same application running at the same time, but only one of those instances running a cron, by locking it in a Postgres database.
My solution so far is :
Running a cron on all the instances.
Inserting a row in a table cron_lock with a unique identifier for the cron.
If I have an error while running the insert query, it is most likely because the row already exists (the cron identifier is the primary key of the table). If that is the case, I do nothing, and I exit.
If I don't have an error while running the insert query, then the application instance will run the cron process.
At the end of my process, I delete the row with the unique identifier.
This solution is working, but I am not sure if another locking mechanism would exist with Postgres, in particular one that would not have me execute queries that are creating errors.
Thanks to #Belayer I found a nice way to do it with advisory locks.
Here is my solution :
Each of my crons have an associated and unique ID (integer format).
All of the crons start on all the different servers. But before running the main function of the cron, I try to get an advisory lock with the unique ID in the database. If the cron can get the lock, then it will run the main function and free the lock, otherwise, it just stops.
And here is some pseudo code if you want to implement it in a language of your choice :
enum Cron {
Echo = 1,
Test = 2
}
function uniqueCron(id, mainFunction) {
result = POSTGRES ('SELECT pg_try_advisory_lock($id) AS "should_run"')
if(result == FALSE){ return }
mainFunction()
POSTGRES ('SELECT pg_advisory_unlock($id)')
}
cron(* * * * *) do {
uniqueCron(Cron.Echo, (echo "Unique cron"))
}
cron(*/5 * * * *) do {
uniqueCron(Cron.Test, (echo "Test"))
}
Running this process many times, or on many different servers, all using the same database, will result in only one mainFunction being executed at once, given that all crons are launched at the same time (same time/timezone on the different servers). A main function too short to execute might cause problems if one server try to get the lock and another already released it. In that case, wait a little before releasing the lock.
I would like to lock a table for writing during a period of time, while leaving it available for reading.
Is that possible ?
Ideally I would like to lock the table with a predicate (for example prevent writing rows "where country = france").
If you really want to lock against such inserts, i.e. the query should hang and only continue when you allow it, you would have to place a SHARE lock on the table and keep the transaction open.
This is usually not a good idea.
If you want to prevent any such inserts, i.e. throw an error when such an insert is attempted, create a BEFORE INSERT trigger that throws an exception if the NEW row satisfies the condition.
You can use FOR SHARE lock, which blocks other transactions from performing like UPDATE and DELETE, while allowing SELECT FOR SHARE. (Read the docs for details: https://www.postgresql.org/docs/9.4/explicit-locking.html [13.3.2])
For example, there are 2 processes accessing table user_table, in the following sequence:
Process A: BEGIN;
Process A: SELECT username FROM user_table WHERE country = france FOR SHARE;
Process B: SELECT * FROM user_table FOR SHARE; (In here, process B can still read all the rows of the table)
Process B: UPDATE user_table SET username = 'test' WHERE country = france; (In here, process B is blocked and is waiting for process A to finish its transaction)
I have the following schema
ID (PK)| REF_ID | ACTIVE | STATUS
ID - Primary Key
I am using following query to select and update
BEGIN;
select * from table where ref_id = $1 and is_active is true for update;
UPDATE table set status = $1 where id =$2;
END;
Explanation for above
1) Select query result will be used to lock all the rows with provided ref ID and that result is used for some business logic
2) Update query to update the STATUS of a row which is part of same ref ID
ISSUE
postgres#machine ERROR: deadlock detected
postgres#machine DETAIL: Process 28297 waits for ShareLock on transaction 4809510; blocked by process 28296.
Process 28296 waits for ShareLock on transaction 4809502; blocked by process 28297.
Process 28297: select * from jobs where ref_id ='a840a8bd-b8a7-45b2-a474-47e2f68e702d' and is_active is true for update
Process 28296: select * from jobs where ref_id ='a840a8bd-b8a7-45b2-a474-47e2f68e702d' and is_active is true for update
postgres#machine ERROR: deadlock detected
postgres#machine DETAIL: Process 28454 waits for ShareLock on transaction 4810111; blocked by process 28384.
Process 28384 waits for ShareLock on transaction 4810092; blocked by process 28297.
Process 28297 waits for AccessExclusiveLock on tuple (113628,5) of relation 16817 of database 16384; blocked by process 28454.
Process 28454: select * from jobs where ref_id ='a840a8bd-b8a7-45b2-a474-47e2f68e702d' and is_active is true for update
Process 28384: select * from jobs where ref_id ='a840a8bd-b8a7-45b2-a474-47e2f68e702d' and is_active is true for update
Process 28297: select * from jobs where ref_id ='a840a8bd-b8a7-45b2-a474-47e2f68e702d' and is_active is true for update
This table is used in highly concurrent and distributed application (100's in parallel with same ref_id) and thats why i wanted to avoid distributed lock by having select and then update in same transaction.But i am facing with this deadlock error I don't know why explicit locking is not working.
Expected behaviour is that any other job with same reference ID must wait if any one else with same reference ID has acquired the lock
Help me figure out what I am missing or another workaround for this. I am still not clear even after explicit locking and being within transaction why is deadlock occurring.
As Laurenz said, in this simple case you should be able to eliminate the possibility of deadlock with an ORDER BY in your locking query.
A deadlock arises when, for example:
Process A acquires a lock on row 1
Process B acquires a lock on row 2
Process A requests a lock on row 2 (and waits for B to release it)
Process B requests a lock on row 1 (and waits for A to release it)
...And at this point, the processes will be waiting on each other forever (or rather, until the server notices, and kills one of them off).
But if both processes had agreed ahead of time to lock row 1 and then row 2, then this wouldn't have happened; one process would still be waiting on the other, but the other is free to proceed.
More generally, as long as all processes agree to follow the same ordering when acquiring locks, it's guaranteed that at least one of them is always making progress; if you only ever try to acquire locks which are "higher" than the ones you already hold, then whoever holds the "highest" lock will never be waiting on anyone.
The ordering needs to be unambiguous, and stable over time, so a generated primary key is ideal (i.e. you should ORDER BY id).
We have a few different tasks (eg. process image, process video) that are created when a user uploads media. The concept we have at the moment is to have a primary Task that is the container for all task types, and a Subtask which has all the metadata required to process the task.
I've seen countless cases where the answer is to use Redis, but we would like to keep a task history to calculate things like average task time, and the metadata of subtasks can be complex.
I'm not very experienced with PostgreSQL so see this as pseudocode:
BEGIN TRANSACTION;
-- Find an unclaimed task.
-- Claim the task.
-- Prevent another worker from also claiming this task.
UPDATE (
SELECT FROM subtasks
INNER JOIN tasks
ON tasks.id = subtasks.id
WHERE tasks.started_at = NULL -- Not claimed
ORDER BY tasks.created_at ASC -- First in, first out
LIMIT 1
FOR UPDATE SKIP LOCKED -- Don't wait for it, keep looking.
)
SET tasks.started_at = now()
RETURNING *
-- Use metadata from the subtask to perform the task.
-- If an error occurs we can roll back, unlocking the row.
-- Will this also roll back if the worker dies?
-- Mark the task as complete.
UPDATE tasks
SET completed_at = now()
WHERE tasks.id = $id
END TRANSACTION;
Will this work?
Edit 1: Using clock_timestamp() and no subselect.
BEGIN TRANSACTION
-- Find an unclaimed task.
SELECT FROM subtasks
INNER JOIN tasks
ON tasks.id = subtasks.id
WHERE tasks.started_at = NULL -- Not claimed
ORDER BY tasks.created_at ASC -- First in, first out
LIMIT 1 -- We only want to select a single task.
FOR UPDATE SKIP LOCKED -- Don't wait for it, keep looking.
-- Claim the task.
UPDATE tasks
SET started_at = clock_timestamp()
WHERE id = $taskId
-- Use metadata from the subtask to perform the task.
-- If an error occurs we can roll back, unlocking the row.
-- Will this also roll back if the worker dies?
-- Task is complete.
-- Mark the task as complete.
UPDATE tasks
SET completed_at = clock_timestamp()
WHERE id = $taskId
END TRANSACTION