PostgreSQL concurrent update selects - postgresql

I am attempting to have some sort of update select for a job queue. I need it to support concurrent processes affecting the same table or database This server will be used only for the queue so a database per queue is acceptable. Originally I was thinking about something like the following:
UPDATE state=1,ts=NOW() FROM queue WHERE ID IN (SELECT ID FROM queue WHERE state=0 LIMIT X) RETURN *
Which I been reading that this will cause a race condition, I read that there was a an option for the SELECT subquery to use FOR UPDATE, but then that will lock the row and concurrent calls will be blocked where I would not mind if they skip over to the next unlocked row.
So what i am asking for is the best way to have a fifo system in postgres that requires the least amount of locking the entire database.

The typical way to do this is to wrap it in a PLPGSQL function, select FOR UPDATE NOWAIT, and then use exception handling to skip the locked rows.
This does place some additional overhead on the function because exception handling requires additional processor cycles to manage even if there are no exceptions.
As a very simple example:
CREATE OR REPLACE FUNCTION get_all_unlocked_customers() RETURNS SETOF customer
LANGUAGE PLPGSQL AS
$$
RETURN QUERY SELECT * FROM customer FOR UPDATE NOWAIT;
EXCEPTION
WHEN LOCK_NOT_AVAILABLE THEN
-- NO NEED TO DO ANYTHING
END;
END;
$$;

Related

Can postgres insert triggers and/or check be ran without inserting

I would love to be able to validate objects representing table rows using the database's existing constraints (triggers that raise exceptions and checks) without actually inserting them into the database.
Is there currently a way one could do this in postgres? At least with BEFORE INSERT triggers and CHECK, I assume it makes no sense with AFTER INSERT triggers.
The easiest way I can think or right now would be to:
Lock the table
Insert a new row
If exception raise to the API / else DELETE the row and call it valid
Unlock
But I can see several issues with this.
A simpler way is to insert within a transaction and not commit:
BEGIN;
INSERT INTO tbl(...) VALUES (...);
-- see effects ...
ROLLBACK;
No need for additional locking. The row is never visible to any other transaction with default transacton isolation level READ COMMITTED. (You might be stalling concurrent writes that confict with the tested row.)
Notable side-effect: Sequences of serial or IDENTITY columns are advanced even if the INSERT is never committed. But gaps in sequential numbers are to be expected anyway and nothing to worry about.
Be wary of triggers with side-effects. All "transactional" SQL effects are rolled back, even most DDL commands. But some special operations (like advancing sequences) are never rolled back.
Also, DEFERRED constraints do not kick in. The manual:
DEFERRED constraints are not checked until transaction commit.
If you need this a lot, work with a copy of your table, or even your database.
Strictly speaking, while any trigger / constraint / concurrent event is allowed, there is no other way to "validate objects" than to insert them into the actual target table in the actual target database at the actual point in time. Triggers, constraints, even default values, can interact with the current state of the whole DB. The more possibilities are ruled out and requirements are reduced, the more options we might have to emulate the test.
CREATE FUNCTION validate_function ( )
RETURNS trigger LANGUAGE plpgsql
AS $function$
DECLARE
valid_flag boolean := 't';
BEGIN
--Validation code
if valid_flag = 'f' then
RAISE EXCEPTION 'This record is not valid id %', id
USING HINT = 'Please enter valid record';
RETURN NULL;
else
RETURN NEW;
end if;
END;
$function$
CREATE TRIGGER validate_rec BEFORE INSERT OR UPDATE ON some_tbl
FOR EACH ROW EXECUTE FUNCTION validate_function();
With this function and trigger you validate inside the trigger. If the new record fails validation you set the valid_flag to false and then use that to raise exception. The RETURN NULL; is probably redundant and I am not sure it will be reached, but if it is it will also abort the insert or update. If the record is valid then you RETURN NEW and the insert/update completes.

Is this function free from race-conditions?

I wrote a function which returns me a record ID which is in the PENDING state (the column state). And it should limit active downloads (records that are in states STARTED or COMPLETED). So, it helps me to avoid resource limits problems. The functions is:
CREATE FUNCTION start_download(max_run INTEGER) RETURNS INTEGER
LANGUAGE PLPGSQL AS
$$
DECLARE
started_id INTEGER;
BEGIN
UPDATE downloads SET state='STARTED' WHERE id IN (
SELECT id FROM downloads WHERE state='PENDING' AND (
SELECT max_run >= COUNT(id) FROM downloads WHERE state::TEXT IN ('STARTED','COMPLETED'))
LIMIT 1 FOR UPDATE)
RETURNING id INTO started_id;
RETURN started_id;
COMMIT;
END;
$$;
Is it safe in the meaning of race-conditions? I mean that the resources limit will not be hit because of race condition (2 or more threads will get ID of some PENDING record or even of the same record and the limit of active, i.e. STARTED/COMPLETED downloads will be reached).
In short this function should work as test-and-set procedure and to return the available ID (switched from PENDING to STARTED, but it's irrelevant details). Does it have such property - free from race conditions? Or maybe I have to use some locks...
PS. 1) downloads table has columns id, state (an enum with values like STARTED, PENDING, ERROR, COMPLETED, PROCESSED) and others that don't matter for the question's context. 2) max_run is the limit of active downloads.
Yes, there is a race condition in the innermost subquery.
If two concurrent sessions run your function at the same time, they both could find that max_run is equal to the count of started or completed jobs, and they would both start a job, thereby pushing the number of started or running jobs over the limit.
That is not easy to avoid, unless you lock all the started or completed jobs (very bad for concurrency) or use a higher transaction isolation level (SERIALIZABLE).

Which explicit lock to use for a trigger?

I am trying to understand which type of a lock to use for a trigger function.
Simplified function:
CREATE OR REPLACE FUNCTION max_count() RETURNS TRIGGER AS
$$
DECLARE
max_row INTEGER := 6;
association_count INTEGER := 0;
BEGIN
LOCK TABLE my_table IN ROW EXCLUSIVE MODE;
SELECT INTO association_count COUNT(*) FROM my_table WHERE user_id = NEW.user_id;
IF association_count > max_row THEN
RAISE EXCEPTION 'Too many rows';
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE CONSTRAINT TRIGGER my_max_count
AFTER INSERT OR UPDATE ON my_table
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE PROCEDURE max_count();
I initially was planning to use EXCLUSIVE but it feels too heavy. What I really want is to ensure that during this function execution no new rows are added to the table with concerned user_id.
If you want to prevent concurrent transactions from modifying the table, a SHARE lock would be correct. But that could lead to a deadlock if two such transactions run at the same time — each has modified some rows and is blocked by the other one when it tries to escalate the table lock.
Moreover, all table locks that conflict with SHARE UPDATE EXCLUSIVE will lead to autovacuum cancelation, which will cause table bloat when it happens too often.
So stay away from table locks, they are usually the wrong thing.
The better way to go about this is to use no explicit locking at all, but to use the SERIALIZABLE isolation level for all transactions that access this table.
Then you can simply use your trigger (without lock), and no anomalies can occur. If you get a serialization error, repeat the transaction.
This comes with a certain performance penalty, but allows more concurrency than a table lock. It also avoids the problems described in the beginning.

Insert values in a loop and see the progress postgresql [duplicate]

I have Postgresql Function which has to INSERT about 1.5 million data into a table. What I want is I want to see the table getting populated with every one records insertion. Currently what is happening when I am trying with say about 1000 records, the get gets populated only after the complete function gets executed. If I stop the function half way through, no data gets populated. How can I make the record committed even if I stop after certain number of records have been inserted?
This can be done using dblink. I showed an example with one insert being committed you will need to add your while loop logic and commit every loop. You can http://www.postgresql.org/docs/9.3/static/contrib-dblink-connect.html
CREATE OR REPLACE FUNCTION log_the_dancing(ip_dance_entry text)
RETURNS INT AS
$BODY$
DECLARE
BEGIN
PERFORM dblink_connect('dblink_trans','dbname=sandbox port=5433 user=postgres');
PERFORM dblink('dblink_trans','INSERT INTO dance_log(dance_entry) SELECT ' || '''' || ip_dance_entry || '''');
PERFORM dblink('dblink_trans','COMMIT;');
PERFORM dblink_disconnect('dblink_trans');
RETURN 0;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION log_the_dancing(ip_dance_entry text)
OWNER TO postgres;
BEGIN TRANSACTION;
select log_the_dancing('The Flamingo');
select log_the_dancing('Break Dance');
select log_the_dancing('Cha Cha');
ROLLBACK TRANSACTION;
--Show records committed even though we rolled back outer transaction
select *
from dance_log;
What you're asking for is generally called an autonomous transaction.
PostgreSQL does not support autonomous transactions at this time (9.4).
To properly support them it really needs stored procedures, not just the user-defined functions it currently supports. It's also very complicated to implement autonomous tx's in PostgreSQL for a variety of internal reasons related to its session and process model.
For now, use dblink as suggested by Bob.
If you have the flexibility to change from function to procedure, from PostgreSQL 12 onwards you can do internal commits if you use procedures instead of functions, invoked by CALL command. Therefore your function will be changed to a procedure and invoked with CALL command: e.g:
CREATE PROCEDURE transaction_test2()
LANGUAGE plpgsql
AS $$
DECLARE
r RECORD;
BEGIN
FOR r IN SELECT * FROM test2 ORDER BY x LOOP
INSERT INTO test1 (a) VALUES (r.x);
COMMIT;
END LOOP;
END;
$$;
CALL transaction_test2();
More details about transaction management regarding Postgres are available here: https://www.postgresql.org/docs/12/plpgsql-transactions.html
For Postgresql 9.5 or newer you can use dynamic background workers provided by pg_background extension. It creates autonomous transaction. Please, refer the github page of the extension. The sollution is better then db_link. There is a complete guide on Autonomous transaction support in PostgreSQL. There is a third way to start autonomous transaction in Postgres, but some patching neede. Please see Peter's Eisentraut patch proposal for OracleDB-style transactions.

PostgreSQL BEFORE INSERT trigger locking behavior in a concurrent environment

I have a general function that can manipulate the sequence of any table (why is irrelevant to my question). It reads the current value, works out the new value, sets it, and returns its calculation, which is what's inserted. This is obviously a multi-step process.
I call it from a BEFORE INSERT trigger on tables where I need it.
All I need to know is am I guaranteed that the function will be called by only one caller at a time in a multi-user environment?
Specifically, does the BEFORE INSERT trigger have to complete before it is called again by another caller?
Logically, I would assume yes, but one never knows what may be going on under the hood.
If the answer is no, what minimal locking would I need on the function to guarantee I can read and write the sequence in a "thread-safe" manner?
I'm using PG 10.
EDIT
Here is the function updated with a lock:
CREATE OR REPLACE FUNCTION public.uts_set()
RETURNS TRIGGER AS
$$
DECLARE
sv int8;
seq text := format('%I.%I_uts_seq', tg_table_schema, tg_table_name);
BEGIN
EXECUTE format('LOCK TABLE %I IN ROW EXCLUSIVE MODE;', tg_table_name);
EXECUTE 'SELECT last_value+1 FROM ' || seq INTO sv; -- currval(seq) isn't useable
PERFORM setval(seq, GREATEST(sv, (EXTRACT(epoch FROM localtimestamp) * 1000000)::int8), false);
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
However, a SELECT already acquires ROW EXCLUSIVE, so this statement may be redundant and a stronger lock may be needed. Or, conversely, it may mean no lock is needed.
UPDATE
If I am reading this SO question correctly, my original version without the LOCK should work since the trigger acquires the same lock my updated function is redundantly taking.
All I need to know is am I guaranteed that the function will be called by only one caller at a time in a multi-user environment?
No. Not related to calling functions itself, but you can achieve this behaviour with SERIALIZABLE transaction isolation level:
This level emulates serial transaction execution for all committed
transactions; as if transactions had been executed one after another,
serially, rather than concurrently
But this approach would introduce several tradeoffs, such preparing your application to retry transactions with serialization failure.
Maybe a missed something, but I really believe that you just need NEXTVAL, something like below:
CREATE OR REPLACE FUNCTION public.uts_set()
RETURNS TRIGGER AS
$$
DECLARE
sv int8;
-- First, use %I wildcard for identifiers instead of %s
seq text := format('%I.%I', tg_table_schema, tg_table_name || '_uts_seq');
BEGIN
-- Second, you couldn't call CURRVAL on a session
-- that you didn't issued NEXTVAL before
sv := NEXTVAL(seq);
-- Do your logic here...
-- Result is ignored since this is an STATEMENT trigger
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Remember that CURRVAL acts on session local scope and NEXTVAL on global scope, so you have a reliable thread-safe mechanism in hands.
The sequence itself handles thread safety with concurrent sessions. So it real comes down to the code that is interacting with the sequence. The following code is thread safe:
SELECT nextval('myseq');
If the sequence is doing much fancier things like setval and currval, I would be more worried about that being done in a high transaction/multi-user environment. Even so, the sequence itself should be locked from other queries while the sequence is being manipulated.