Is this function free from race conditions? (PostgreSQL)

I wrote a function that returns a record ID that is in the PENDING state (the state column). It should also limit the number of active downloads (records in state STARTED or COMPLETED), which helps me avoid hitting resource limits. The function is:
CREATE FUNCTION start_download(max_run INTEGER) RETURNS INTEGER
LANGUAGE plpgsql AS
$$
DECLARE
    started_id INTEGER;
BEGIN
    UPDATE downloads SET state = 'STARTED' WHERE id IN (
        SELECT id FROM downloads WHERE state = 'PENDING' AND (
            SELECT max_run >= COUNT(id) FROM downloads WHERE state::TEXT IN ('STARTED','COMPLETED'))
        LIMIT 1 FOR UPDATE)
    RETURNING id INTO started_id;
    RETURN started_id;
END;
$$;
Is it safe with respect to race conditions? I mean: can the resource limit be exceeded because of a race, i.e. two or more threads getting the ID of some PENDING record (or even of the same record), so that the limit of active (STARTED/COMPLETED) downloads is overrun?
In short, this function should work as a test-and-set procedure and return an available ID (switched from PENDING to STARTED, but that's an irrelevant detail). Does it have this property of being free from race conditions? Or do I have to use some explicit locks?
PS: 1) The downloads table has columns id, state (an enum with values STARTED, PENDING, ERROR, COMPLETED, PROCESSED) and others that don't matter in this context. 2) max_run is the limit on active downloads.

Yes, there is a race condition in the innermost subquery.
If two concurrent sessions run your function at the same time, both could see a count of started or completed jobs that is still within max_run, and both would start a job, thereby pushing the number of started or completed jobs over the limit.
That is not easy to avoid, unless you lock all the started or completed rows (very bad for concurrency) or use a higher transaction isolation level (SERIALIZABLE).
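A minimal sketch of the SERIALIZABLE variant (the function body stays the same; what changes is that the caller runs it at SERIALIZABLE isolation and must be prepared to retry on serialization failures, SQLSTATE 40001):
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT start_download(10);   -- 10 = max_run, as in the question
COMMIT;
-- If the SELECT or the COMMIT fails with
-- "could not serialize access due to read/write dependencies",
-- the application rolls back and reruns the whole transaction.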

Related

PostgreSQL BEFORE INSERT trigger locking behavior in a concurrent environment

I have a general function that can manipulate the sequence of any table (why is irrelevant to my question). It reads the current value, works out the new value, sets it, and returns its calculation, which is what's inserted. This is obviously a multi-step process.
I call it from a BEFORE INSERT trigger on tables where I need it.
All I need to know is am I guaranteed that the function will be called by only one caller at a time in a multi-user environment?
Specifically, does the BEFORE INSERT trigger have to complete before it is called again by another caller?
Logically, I would assume yes, but one never knows what may be going on under the hood.
If the answer is no, what minimal locking would I need on the function to guarantee I can read and write the sequence in a "thread-safe" manner?
I'm using PG 10.
EDIT
Here is the function updated with a lock:
CREATE OR REPLACE FUNCTION public.uts_set()
RETURNS TRIGGER AS
$$
DECLARE
    sv  int8;
    seq text := format('%I.%I_uts_seq', tg_table_schema, tg_table_name);
BEGIN
    EXECUTE format('LOCK TABLE %I IN ROW EXCLUSIVE MODE;', tg_table_name);
    EXECUTE 'SELECT last_value + 1 FROM ' || seq INTO sv;  -- currval(seq) isn't usable here
    PERFORM setval(seq, GREATEST(sv, (EXTRACT(epoch FROM localtimestamp) * 1000000)::int8), false);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
However, the INSERT that fires this trigger already acquires ROW EXCLUSIVE on the table, so this statement may be redundant and a stronger lock may be needed. Or, conversely, it may mean no extra lock is needed at all.
UPDATE
If I am reading this SO question correctly, my original version without the LOCK should work since the trigger acquires the same lock my updated function is redundantly taking.
All I need to know is am I guaranteed that the function will be called by only one caller at a time in a multi-user environment?
No. This is not a property of function calls themselves, but you can achieve this behaviour with the SERIALIZABLE transaction isolation level:
This level emulates serial transaction execution for all committed transactions; as if transactions had been executed one after another, serially, rather than concurrently
But this approach introduces some trade-offs, such as preparing your application to retry transactions that fail with a serialization error.
Maybe I missed something, but I really believe that you just need NEXTVAL, something like below:
CREATE OR REPLACE FUNCTION public.uts_set()
RETURNS TRIGGER AS
$$
DECLARE
    sv int8;
    -- First, use the %I wildcard for identifiers instead of %s
    seq text := format('%I.%I', tg_table_schema, tg_table_name || '_uts_seq');
BEGIN
    -- Second, you can't call CURRVAL in a session
    -- that hasn't issued NEXTVAL before
    sv := NEXTVAL(seq);
    -- Do your logic here...
    -- The result is ignored since this is a STATEMENT trigger
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Remember that CURRVAL is scoped to the current session while NEXTVAL operates on the sequence's shared state, so you have a reliable, thread-safe mechanism in hand.
The sequence itself handles thread safety across concurrent sessions, so it really comes down to the code that interacts with the sequence. The following statement is thread-safe:
SELECT nextval('myseq');
If the code is doing much fancier things with the sequence, like setval and currval, I would be more worried about that in a high-transaction, multi-user environment. Even so, the sequence itself is locked against other queries while it is being manipulated.

How can two DELETE queries deadlock in Postgres?

Among the many things we do with Postgres at work, we use it as a cache for certain kinds of remote requests. Our schema is:
CREATE TABLE IF NOT EXISTS cache (
    key   VARCHAR(256) PRIMARY KEY,
    value TEXT NOT NULL,
    ttl   TIMESTAMP DEFAULT NULL
);
CREATE INDEX IF NOT EXISTS idx_cache_ttl ON cache (ttl);
This table does not have triggers or foreign keys. Updates are typically:
INSERT INTO cache (key, value, ttl)
VALUES ('Ethan is testing8393645', '"hi6286166"', sec2ttl(300))
ON CONFLICT (key) DO UPDATE
SET value = '"hi6286166"', ttl = sec2ttl(300);
(Where sec2ttl is defined as:)
CREATE OR REPLACE FUNCTION sec2ttl(seconds FLOAT)
RETURNS TIMESTAMP AS $$
BEGIN
    IF seconds IS NULL THEN
        RETURN NULL;
    END IF;
    RETURN now() + (seconds || ' SECOND')::INTERVAL;
END;
$$ LANGUAGE plpgsql;
Querying the cache is done in a transaction like this:
BEGIN;
DELETE FROM cache WHERE ttl IS NOT NULL AND now() > ttl;
SELECT value FROM cache WHERE key = 'Ethan is testing6460437';
COMMIT;
There are a few things not to like about this design: the DELETE happens on cache "reads"; the index on cache.ttl is not ascending, which makes it kind of useless (edit: ASC is the default, thanks wargre!); plus the fact that we're using Postgres as a cache at all. But all of that would have been acceptable, except that we've started getting deadlocks in production, which tend to look like this:
ERROR: deadlock detected
DETAIL: Process 12750 waits for ShareLock on transaction 632693475; blocked by process 10080.
Process 10080 waits for ShareLock on transaction 632693479; blocked by process 12750.
HINT: See server log for query details.
CONTEXT: while deleting tuple (426,1) in relation "cache"
[SQL: 'DELETE FROM cache WHERE ttl IS NOT NULL AND now() > ttl;']
Investigating the logs more thoroughly indicates that both transactions were performing this DELETE operation.
As far as I can tell:
My transactions are in READ COMMITTED isolation mode.
ShareLocks are grabbed by one transaction to indicate that it wants to mutate rows that another transaction has mutated (i.e. locked).
Based on the output of an EXPLAIN query, the ShareLocks should be grabbed by both DELETE transactions in physical order.
The deadlock indicates that both queries locked rows in a different order.
If all that is correct, then somehow some simultaneous transaction has changed the physical order of rows. I see that an UPDATE can move a row to an earlier or later physical position, but in my application, the UPDATEs always remove rows from consideration by the DELETEs (because they're always extending a row's TTL). If the rows were previously in physical order, and you remove one, then you're still left with physical order. Similarly for DELETE. We're not doing any VACUUM or any other operation which you might expect to reorder rows.
Based on Avoiding PostgreSQL deadlocks when performing bulk update and delete operations, I tried to change the DELETE queries to:
DELETE FROM cache c
USING (
    SELECT key
    FROM cache
    WHERE ttl IS NOT NULL AND now() > ttl
    ORDER BY ttl ASC
    FOR UPDATE
) del
WHERE del.key = c.key;
However, I'm still able to get deadlocks locally. So generally, how can two DELETE queries deadlock? Is it because they're locking in an undefined order, and if so, how do I enforce a specific order?
You should instead ignore expired cache entries when reading, so you will not depend on a frequent DELETE operation for cache expiration:
SELECT value
FROM cache
WHERE key = 'Ethan is testing6460437'
  and (ttl is null or ttl > now());
And have another job that periodically deletes expired keys. It has to either enforce a well-defined order for the deleted rows or, better, simply skip rows that are already locked for update:
with delete_keys as (
    select key from cache
    where ttl is not null
      and now() > ttl
    for update skip locked
)
delete from cache
where key in (select key from delete_keys);
If you can't schedule this periodically, you could run the cleanup randomly, say once per 1000 runs of your select query, like this:
create or replace function delete_expired_cache()
returns void
language sql
as $$
    with delete_keys as (
        select key from cache
        where ttl is not null
          and now() > ttl
        for update skip locked
    )
    delete from cache
    where key in (select key from delete_keys);
$$;
SELECT value
FROM cache
WHERE key = 'Ethan is testing6460437'
  and (ttl is null or ttl > now());
select delete_expired_cache() where random() < 0.001;
You should avoid writes, as they are expensive. Don't delete from the cache so often.
Also, you should use the timestamp with time zone type (timestamptz for short) instead of plain timestamp, especially if you don't know why: a plain timestamp is not what most people think it is; blame the SQL standard.
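For illustration, a quick sketch of why the distinction matters (exact rendering depends on the session's TimeZone setting):
-- timestamptz stores an absolute instant; plain timestamp stores
-- wall-clock digits with no zone attached.
SET TIME ZONE 'America/New_York';
SELECT TIMESTAMPTZ '2024-01-01 12:00:00+00';  -- same instant, shown as 07:00:00-05
SELECT TIMESTAMP   '2024-01-01 12:00:00';     -- no conversion: still 12:00:00,
                                              -- a different instant in every zone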

Why my sp deadlocked in a transaction?

This is what the sp does:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
IF EXISTS (SELECT 1 FROM dbo.Portfolio WHERE RawMessageID = @RawMessageID)
BEGIN
    -- do some cleanup, like deleting the portfolio
END
INSERT INTO Portfolio
SELECT @a, @b, @c, @d;
COMMIT;
Randomly, I see this deadlock happen, and the deadlock graph shows the detail.
So it must be that one instance of the call holds the shared lock from the SELECT statement and is asking for the null-resource lock, while another instance holds the null-resource lock and is asking for the shared lock.
Is my guess right? I tried to write a simplified version to demo this, but I can never trigger the deadlock.
Can someone help?

How to ensure a unique number field with zero order

Here is a table that has the fields id, id_user, order_id.
When creating a record, I need to find the user's last number and insert the next one in sequence.
I wrote a trigger function that takes the next order number for the user, but even it does not guarantee a unique order number.
CREATE OR REPLACE FUNCTION get_next_order()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $function$
DECLARE
    next_order_num bigint;
BEGIN
    SELECT order_id + 1 INTO next_order_num
    FROM payment_out
    WHERE payment_out.id_usr = NEW.id_usr
      AND payment_out.order_id IS NOT NULL
    ORDER BY payment_out.order_id DESC
    LIMIT 1;

    -- if no payments exist yet, start at 1
    NEW.order_id = COALESCE(next_order_num, 1);
    RETURN NEW;
END;
$function$;

CREATE TRIGGER get_next_order
    BEFORE INSERT ON payment_out
    FOR EACH ROW
    EXECUTE PROCEDURE get_next_order();
How can I avoid duplicate order numbers?
For this to work in the presence of multiple concurrent transactions inserting orders for the same user, you need a lock on a particular record to make them wait and execute serially.
e.g., before the first SELECT, you might:
PERFORM 1 FROM "users" where id_user = NEW.id_user FOR UPDATE;
where you lock the parent "users" record that owns the orders.
Otherwise, multiple concurrent transactions could execute your procedure at the same time, but they can't see each others' inserted values, so they'll pick the same numbers.
However, beware: a foreign key constraint will already cause a SHARE lock to be taken on the users row when you insert into a table that references it. Your trigger will then try to upgrade that to an UPDATE lock, but multiple transactions might already hold the SHARE lock, so the upgrade will block. You'll end up with transactions all waiting for each other, until PostgreSQL kills all but one of them with a deadlock-abort error. The only way to avoid this is for the application to SELECT 1 FROM users WHERE id_user = blahblah FOR UPDATE before it creates the orders for that user.
A variant is to keep a next_order_id field in users and do an UPDATE users SET next_order_id = next_order_id + 1 RETURNING next_order_id, and use the result of that to set the order ID. The same lock upgrade problem applies.
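That counter-column variant might look like the following sketch (the next_order_id column on users is an assumption, as are the column names, which are taken from the question; the same lock-upgrade caveat applies):
CREATE OR REPLACE FUNCTION get_next_order()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
    -- The UPDATE takes a row lock on the user's row, so concurrent
    -- inserts for the same user serialize here.
    UPDATE users
       SET next_order_id = next_order_id + 1
     WHERE id_user = NEW.id_usr
    RETURNING next_order_id INTO NEW.order_id;
    RETURN NEW;
END;
$$;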

PostgreSQL concurrent update selects

I am attempting to build some sort of update-select for a job queue. It needs to support concurrent processes affecting the same table or database. This server will be used only for the queue, so a database per queue is acceptable. Originally I was thinking about something like the following:
UPDATE queue SET state = 1, ts = NOW()
WHERE id IN (SELECT id FROM queue WHERE state = 0 LIMIT X)
RETURNING *;
I've been reading that this causes a race condition. I read that the SELECT subquery can take FOR UPDATE, but then the rows are locked and concurrent calls block, whereas I wouldn't mind if they simply skipped over to the next unlocked row.
So what I am asking for is the best way to have a FIFO system in Postgres that requires the least amount of locking.
The typical way to do this is to wrap it in a PL/pgSQL function, SELECT ... FOR UPDATE NOWAIT, and then use exception handling to skip the locked rows.
This does place some additional overhead on the function, because exception handling costs extra processor cycles to manage even if no exception is ever raised.
As a very simple example:
CREATE OR REPLACE FUNCTION get_all_unlocked_customers() RETURNS SETOF customer
LANGUAGE plpgsql AS
$$
BEGIN
    RETURN QUERY SELECT * FROM customer FOR UPDATE NOWAIT;
EXCEPTION
    WHEN lock_not_available THEN
        NULL;  -- no need to do anything
END;
$$;
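As an aside: on PostgreSQL 9.5 and later, FOR UPDATE SKIP LOCKED gives the skip-over-locked-rows behaviour directly, without the exception handler. A sketch against the queue table from the question (column names taken from there; the LIMIT of 10 is arbitrary):
-- Claim up to 10 unlocked pending jobs; rows locked by other
-- workers are skipped instead of blocking.
UPDATE queue SET state = 1, ts = now()
WHERE id IN (
    SELECT id FROM queue
    WHERE state = 0
    ORDER BY id              -- approximate FIFO
    LIMIT 10
    FOR UPDATE SKIP LOCKED
)
RETURNING *;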