Multiple processes are reading from a table to take jobs - postgresql

Say I am using postgres for job queueing. Meaning I have a table like:
table_jobs:
-id
-name
-taskId
-created_date
-status (open, processing, completed)
Multiple processes will be reading this table, and each needs to lock the rows it takes. Is there a way to prevent concurrency issues, i.e. multiple processes reading the table and claiming a job that is already taken?

There is the SELECT ... FOR UPDATE feature:
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, or SELECT FOR UPDATE of these rows will be blocked until the current transaction ends.
One possible way of implementing your queue (a SQL sketch follows these steps) is:
Worker process runs SELECT ... WHERE status = 'open' FOR UPDATE.
Worker process runs UPDATE ... WHERE id IN (...) with the IDs it got from the previous step.
Worker process does its work.
Worker process updates the tasks' statuses to completed.
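A minimal sketch of steps 1 and 2 in SQL, collapsed into a single statement for brevity. It assumes the table_jobs columns from the question; the LIMIT of 10 and the 'processing' value are illustrative choices, not part of the answer above:

BEGIN;

-- Lock up to 10 open jobs and claim them. The row locks are held until COMMIT,
-- so another worker selecting the same rows FOR UPDATE blocks here and, under
-- READ COMMITTED, re-checks the WHERE clause after the commit and skips rows
-- that are no longer 'open'.
UPDATE table_jobs
SET status = 'processing'
WHERE id IN (
    SELECT id
    FROM table_jobs
    WHERE status = 'open'
    ORDER BY created_date
    LIMIT 10
    FOR UPDATE
)
RETURNING id;

COMMIT;

-- ... do the work outside that transaction, then per job (the id is illustrative):
UPDATE table_jobs SET status = 'completed' WHERE id = 42;

Keeping the claiming transaction short matters here, because the row locks block other workers until it commits; the actual work and the final completed update happen afterwards, as in steps 3 and 4.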

Related

Atomically update flag in Postgres?

I am implementing a lightweight jobs system in my highly-concurrent application. For simplicity, I will be using Postgres to manage the state of all jobs in the system.
I have a table where processes can mark specific jobs as "running" via a boolean flag is_running. If multiple processes attempt to run the same job, only one of the processes should "win".
I want to have an atomic statement that the application processes will call in an attempt to exclusively mark the job as running.
UPDATE jobs SET is_running = true WHERE is_running = false AND id = 1
If there is a single row with (id=1, is_running=false) and multiple processes attempt to execute the above statement, is it possible for more than one process to actually set is_running=true?
I only want one of the processes to see an updated row count of 1 - all other processes should see an updated row count of 0.
Your approach is safe and free from race conditions, because PostgreSQL will re-evaluate the WHERE condition after it has had to wait for a lock taken by a concurrent modification. It will then see the changed value and skip the row.
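As a small illustration (a sketch, not part of the original answer), the winner can be identified either from the reported row count or with a RETURNING clause:

-- Only the winning process gets a row (and an updated row count of 1) back;
-- every other process re-evaluates the WHERE clause after waiting for the lock,
-- finds is_running already true, and gets zero rows.
UPDATE jobs
SET is_running = true
WHERE is_running = false
  AND id = 1
RETURNING id;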

maxTransactionLockRequestTimeoutMillis with concurrent transactions

I'm trying to get a better understanding of the lock acquisition behavior on MongoDB transactions. I have a scenario where two concurrent transactions try to modify the same document. Since one transaction will get the write lock on the document first, the second transaction will run into a write conflict and fail.
I stumbled upon the maxTransactionLockRequestTimeoutMillis setting as documented here: https://docs.mongodb.com/manual/reference/parameters/#param.maxTransactionLockRequestTimeoutMillis and it states:
The maximum amount of time in milliseconds that multi-document transactions should wait to acquire locks required by the operations in the transaction.
However, changing this value does not seem to have an impact on the observed behavior with a write conflict. Transaction 2 does not seem to wait for the lock to be released; it immediately runs into a write conflict when another transaction holds the lock (unlike concurrent writes outside a transaction, which block and wait for the lock).
Do I understand correctly that the time configured in maxTransactionLockRequestTimeoutMillis does not cover actually acquiring the write lock on the document, or is there something wrong with my tests?

MS CRM Executing a Workflow for multiple records - Behavior of execution

I'm looking for a way to execute a workflow in Dynamics CRM for many records, such that each chosen record starts its workflow only AFTER the previous record's workflow has finished processing successfully.
I should note that the workflow in question runs a few child workflows, hence this request: I want to avoid things getting "tangled up" while the workflow updates records.
Thanks in advance :)
If a console application is an option, I would write one that:
gets references to all records that need to be processed
fires the workflow for the first record
in a loop, checks whether the workflow has finished for that record. You can check this by querying the asyncoperation table; its regardingobjectid field will hold the id of the record the workflow was executed for
when the workflow has finished for a record, fires the workflow for the next record and waits for it to finish

Can multiple SELECT FOR UPDATES in a single transaction cause a race condition (Postgres)?

I'm using Postgres 9.1. I'm wondering if using multiple SELECT FOR UPDATES in the same transaction could potentially cause a race condition.
2 concurrent transactions:
transaction 1: select for update on table 1 -- successfully acquires lock
transaction 2: select for update on table 2 -- successfully acquires lock
transaction 2: select for update on table 1 -- waiting for lock release from transaction 1
transaction 1: select for update on table 2 -- waiting for lock release from transaction 2
What happens in this situation? Does one of the waiting transactions eventually time out? If so, is there a way to configure the timeout duration?
edit: is deadlock_timeout the configuration I am looking for?
Yes, deadlock_timeout is the setting you are looking for; see the docs.
But your scenario doesn't necessarily mean there will be a deadlock, because PostgreSQL uses row-level locks and it is not clear whether your transactions are competing for the same rows.
Another option is to use an isolation level higher than the default READ COMMITTED. But in that case your application should be ready to receive exceptions with SQLSTATE 40001:
ERROR: could not serialize access due to concurrent update
This is expected; you should just retry the transaction as is.
You can find a very good overview of the Serializable isolation level on the wiki.
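For illustration only, both options can be exercised from SQL; the timeout value here is arbitrary, and changing deadlock_timeout requires superuser rights (or an edit to postgresql.conf):

-- How long a backend waits on a lock before the deadlock detector runs.
SET deadlock_timeout = '2s';

-- Opt in to a stricter isolation level for a single transaction; be prepared to
-- retry the whole transaction if it aborts with SQLSTATE 40001.
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... the SELECT ... FOR UPDATE statements ...
COMMIT;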
PostgreSQL will detect the deadlock on step 4 and will fail the transaction. Here's what happened when I tried it in psql (only showing step 4):
template1=# SELECT * FROM table2 FOR UPDATE;
ERROR: deadlock detected
DETAIL: Process 17536 waits for ShareLock on transaction 166946; blocked by process 18880.
Process 18880 waits for ShareLock on transaction 166944; blocked by process 17536.
HINT: See server log for query details.
template1=#
This happens after 1s, which is the default deadlock_timeout. The other answer has more information about this.
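For completeness, a sketch of the full four-step scenario in two concurrent psql sessions (table names follow the question; the comments describing the interleaving are mine):

-- Step 1, session A:
BEGIN;
SELECT * FROM table1 FOR UPDATE;   -- locks the rows of table1

-- Step 2, session B:
BEGIN;
SELECT * FROM table2 FOR UPDATE;   -- locks the rows of table2

-- Step 3, session B:
SELECT * FROM table1 FOR UPDATE;   -- blocks, waiting for session A

-- Step 4, session A:
SELECT * FROM table2 FOR UPDATE;   -- blocks on session B; once deadlock_timeout
                                   -- elapses, the detector aborts one of the two
                                   -- with "ERROR: deadlock detected"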

Conditional Work Queue Insertion for beanstalkd?

I'm using the Perl client of beanstalkd. I need a simple way to avoid enqueueing the same work twice.
Basically, I need something that waits until there are K elements and then groups them together. To accomplish this, I have the producer:
insert item(s) into DB
insert a queue item into beanstalkd
And the consumer:
while ( 1 ) {
    beanstalkd.retrieve              # take the next queue item
    if ( DB items >= K )
        func_to_process_all_items    # process the whole batch
    kill job                         # delete the queue item either way
}
This is linear in the number of requests/processing, but in the case of:
insert 1 item
... repeat many times ...
insert 1 item
Assuming all these insertions happened before a job was retrieved, this would add N queue items, and the consumer would do something like this:
check DB, process N items
check DB, no items
... many times ...
check DB, no items
Is there a smarter way to do this so that it does not insert/process the later job requests unnecessarily?
I had a related requirement. I only wanted to process a specific job once within a few minutes, but the producer could queue several instances of the same job. I used memcache to store the job identifier and set the expiry of the key to just a few minutes.
When a worker tried to add the job identifier to memcache, only the first would succeed - on failure to add the job id, the worker would delete the job. After a few minutes, the key expires from memcache and the job can be processed again.
Not particularly elegant, but it works.
Will this work for you?
Create two tubes, "buffer" and "live". Your producer only ever adds to the "buffer" tube.
Create two workers: one watches "buffer" and the other watches "live", each making the blocking reserve() call.
Whenever the "buffer" worker returns from reserve, it buries the job if there are fewer than K items. If there are exactly K, it "kicks" all K jobs and transfers them to the "live" tube.
The "live" watcher will now return from its own reserve().
You just need to take care that a job never returns to the buffer queue from the buried state. A failsafe way to do this might be to delete it and then add it to "live".
The two separate tubes are only for cleaner separation. You could do the same with a single tube by burying every job until there are K-1, and then, on the arrival of the K-th job, kicking all of them live.