Conditional locking in Postgres function/stored procedure

I'm building an event sourcing service for a web crawler, where several crawler workers scrape several websites and keep deltas for each crawled resource. I've chosen PostgreSQL as the underlying data store. I need to give producers optimistic locking using a flag called "expectedSeq" to control whether or not an event should be written for a particular stream. Initially I was using a table per stream, leveraging its auto-increment column within a transaction to build the optimistic locking feature, but I quickly found there's a file-system cap on how many tables a server can handle.
Since I can't use auto-increment anymore, I'm trying to build this functionality using two tables, one for controlling the sequence of the stream, the other for storing the event itself.
The first question I have is whether I should use stored procedures or functions. The second is whether it is possible to have conditional transactions inside a stored procedure or a Postgres function.
The logic I need to implement is something along these lines:
storeEvent(stream, expectedSeq = null)
    lock row for `streams`.stream
    if expectedSeq = null
        update stream row with seq + 1
        release lock
        write event to event table
    else
        if expectedSeq != seq + 1
            release lock
            abort
        else
            update seq + 1
            release lock
            write event to event table

Thanks to Ian Harris for the following solution:
CREATE OR REPLACE PROCEDURE store_event (v_topic varchar(40), v_expected_next_seq integer, v_data text)
LANGUAGE plpgsql
AS $$
DECLARE
    next_seq integer;
BEGIN
    -- The FOR UPDATE clause places a row-level lock on the topic row,
    -- serializing concurrent writers for the same stream
    next_seq := (
        SELECT seq
        FROM topics
        WHERE topic = v_topic
        FOR UPDATE) + 1;
    -- NULL + 1 yields NULL, so next_seq is NULL when the topic does not exist
    IF next_seq IS NULL THEN
        RAISE 'Unknown topic';
    END IF;
    IF v_expected_next_seq IS NOT NULL AND next_seq != v_expected_next_seq THEN
        RAISE 'Optimistic locking error';
    END IF;
    UPDATE topics
    SET seq = next_seq
    WHERE topic = v_topic;
    INSERT INTO events (topic, seq, data)
    VALUES (v_topic, next_seq, v_data);
    COMMIT; -- ends the transaction and releases the row lock
END;
$$;
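For reference, a minimal sketch of the two tables the procedure assumes, plus example calls. The column types and sample data here are my assumptions, not from the original post:

CREATE TABLE topics (
    topic varchar(40) PRIMARY KEY,
    seq integer NOT NULL DEFAULT 0
);

CREATE TABLE events (
    topic varchar(40) NOT NULL REFERENCES topics (topic),
    seq integer NOT NULL,
    data text,
    PRIMARY KEY (topic, seq)
);

INSERT INTO topics (topic) VALUES ('example.com');

-- Unconditional append (the expectedSeq = null branch of the pseudocode):
CALL store_event('example.com', NULL, 'first delta');

-- Conditional append: succeeds only if this event would become seq 2:
CALL store_event('example.com', 2, 'second delta');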

Related

Can Postgres insert triggers and/or checks be run without inserting

I would love to be able to validate objects representing table rows using the database's existing constraints (triggers that raise exceptions and checks) without actually inserting them into the database.
Is there currently a way to do this in Postgres? At least with BEFORE INSERT triggers and CHECK constraints; I assume it makes no sense with AFTER INSERT triggers.
The easiest way I can think of right now would be to:
Lock the table
Insert a new row
If an exception is raised, pass it to the API; else DELETE the row and call it valid
Unlock
But I can see several issues with this.
A simpler way is to insert within a transaction and not commit:
BEGIN;
INSERT INTO tbl(...) VALUES (...);
-- see effects ...
ROLLBACK;
No need for additional locking. The row is never visible to any other transaction with the default transaction isolation level READ COMMITTED. (You might be stalling concurrent writes that conflict with the tested row.)
Notable side-effect: Sequences of serial or IDENTITY columns are advanced even if the INSERT is never committed. But gaps in sequential numbers are to be expected anyway and nothing to worry about.
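A quick demonstration of that side effect (using a hypothetical tbl with a serial id column, so the sequence name tbl_id_seq is an assumption):

BEGIN;
INSERT INTO tbl (val) VALUES ('test'); -- calls nextval('tbl_id_seq')
ROLLBACK;
-- The sequence keeps the advanced value even though the INSERT was rolled back:
SELECT currval('tbl_id_seq');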
Be wary of triggers with side-effects. All "transactional" SQL effects are rolled back, even most DDL commands. But some special operations (like advancing sequences) are never rolled back.
Also, DEFERRED constraints do not kick in. The manual:
DEFERRED constraints are not checked until transaction commit.
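A minimal illustration of that caveat (the table and constraint here are mine, for demonstration only):

CREATE TABLE uniq_demo (id int UNIQUE DEFERRABLE INITIALLY DEFERRED);
INSERT INTO uniq_demo VALUES (1);
BEGIN;
INSERT INTO uniq_demo VALUES (1); -- no error yet: the check is deferred
ROLLBACK;                         -- the violation would only have surfaced at COMMIT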
If you need this a lot, work with a copy of your table, or even your database.
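One way to create such a copy (note that LIKE ... INCLUDING ALL copies constraints, defaults, and indexes, but not triggers, which you would have to recreate on the copy):

CREATE TABLE tbl_test (LIKE tbl INCLUDING ALL);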
Strictly speaking, while any trigger / constraint / concurrent event is allowed, there is no other way to "validate objects" than to insert them into the actual target table in the actual target database at the actual point in time. Triggers, constraints, even default values, can interact with the current state of the whole DB. The more possibilities are ruled out and requirements are reduced, the more options we might have to emulate the test.
CREATE FUNCTION validate_function()
RETURNS trigger LANGUAGE plpgsql
AS $function$
DECLARE
    valid_flag boolean := true;
BEGIN
    -- Validation code goes here; set valid_flag to false for invalid rows
    IF valid_flag = false THEN
        RAISE EXCEPTION 'This record is not valid id %', NEW.id
            USING HINT = 'Please enter valid record';
        RETURN NULL;
    ELSE
        RETURN NEW;
    END IF;
END;
$function$;

CREATE TRIGGER validate_rec BEFORE INSERT OR UPDATE ON some_tbl
FOR EACH ROW EXECUTE FUNCTION validate_function();
With this function and trigger you validate inside the trigger. If the new record fails validation, you set valid_flag to false and use that to raise an exception (the exception references NEW.id, the id of the row being validated). The RETURN NULL; is probably redundant and I am not sure it will ever be reached, but if it is, it will also abort the insert or update. If the record is valid, you RETURN NEW and the insert/update completes.

Which explicit lock to use for a trigger?

I am trying to understand which type of a lock to use for a trigger function.
Simplified function:
CREATE OR REPLACE FUNCTION max_count() RETURNS TRIGGER AS
$$
DECLARE
    max_row INTEGER := 6;
    association_count INTEGER := 0;
BEGIN
    LOCK TABLE my_table IN ROW EXCLUSIVE MODE;
    SELECT INTO association_count COUNT(*) FROM my_table WHERE user_id = NEW.user_id;
    IF association_count > max_row THEN
        RAISE EXCEPTION 'Too many rows';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER my_max_count
AFTER INSERT OR UPDATE ON my_table
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE PROCEDURE max_count();
I initially was planning to use EXCLUSIVE, but it feels too heavy. What I really want is to ensure that, during this function's execution, no new rows with the concerned user_id are added to the table.
If you want to prevent concurrent transactions from modifying the table, a SHARE lock would be correct. But that could lead to a deadlock if two such transactions run at the same time — each has modified some rows and is blocked by the other one when it tries to escalate the table lock.
Moreover, all table locks that conflict with SHARE UPDATE EXCLUSIVE lead to autovacuum cancellation, which will cause table bloat if it happens too often.
So stay away from table locks, they are usually the wrong thing.
The better way to go about this is to use no explicit locking at all, but to use the SERIALIZABLE isolation level for all transactions that access this table.
Then you can simply use your trigger (without lock), and no anomalies can occur. If you get a serialization error, repeat the transaction.
This comes with a certain performance penalty, but allows more concurrency than a table lock. It also avoids the problems described in the beginning.
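A minimal sketch of the pattern (the retry loop itself has to live in the client; mydb is a placeholder name):

-- Run every transaction that touches the table like this:
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO my_table (user_id) VALUES (1);
COMMIT;
-- If any statement fails with SQLSTATE 40001 (serialization_failure),
-- roll back and re-run the whole transaction from the client.

-- Alternatively, make serializable the default for the whole database:
ALTER DATABASE mydb SET default_transaction_isolation = 'serializable';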

Insert values in a loop and see the progress postgresql [duplicate]

I have a PostgreSQL function which has to INSERT about 1.5 million records into a table. I want to see the table getting populated as each record is inserted. Currently, when I try with about 1000 records, the table gets populated only after the complete function has executed. If I stop the function halfway through, no data is populated. How can I make the records stay committed even if I stop after a certain number of them have been inserted?
This can be done using dblink. I show an example with one insert being committed; you will need to add your loop logic and commit every iteration. See http://www.postgresql.org/docs/9.3/static/contrib-dblink-connect.html
CREATE OR REPLACE FUNCTION log_the_dancing(ip_dance_entry text)
RETURNS INT AS
$BODY$
BEGIN
    -- Open a second connection back into the same cluster...
    PERFORM dblink_connect('dblink_trans','dbname=sandbox port=5433 user=postgres');
    -- ...and run the INSERT over it, in its own independent transaction
    PERFORM dblink('dblink_trans','INSERT INTO dance_log(dance_entry) SELECT ' || '''' || ip_dance_entry || '''');
    PERFORM dblink('dblink_trans','COMMIT;');
    PERFORM dblink_disconnect('dblink_trans');
    RETURN 0;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

ALTER FUNCTION log_the_dancing(ip_dance_entry text)
OWNER TO postgres;
BEGIN TRANSACTION;
SELECT log_the_dancing('The Flamingo');
SELECT log_the_dancing('Break Dance');
SELECT log_the_dancing('Cha Cha');
ROLLBACK TRANSACTION;

-- Show records committed even though we rolled back the outer transaction
SELECT * FROM dance_log;
What you're asking for is generally called an autonomous transaction.
PostgreSQL does not support autonomous transactions at this time (9.4).
To properly support them it really needs stored procedures, not just the user-defined functions it currently supports. It's also very complicated to implement autonomous transactions in PostgreSQL for a variety of internal reasons related to its session and process model.
For now, use dblink as suggested by Bob.
If you have the flexibility to change from a function to a procedure: from PostgreSQL 11 onwards you can do internal commits if you use procedures instead of functions, invoked by the CALL command. So your function would be changed to a procedure and invoked with CALL, e.g.:
CREATE PROCEDURE transaction_test2()
LANGUAGE plpgsql
AS $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT * FROM test2 ORDER BY x LOOP
        INSERT INTO test1 (a) VALUES (r.x);
        COMMIT; -- each row is committed immediately and survives interruption
    END LOOP;
END;
$$;
CALL transaction_test2();
More details about transaction management regarding Postgres are available here: https://www.postgresql.org/docs/12/plpgsql-transactions.html
For PostgreSQL 9.5 or newer you can use the dynamic background workers provided by the pg_background extension, which create an autonomous transaction. Please refer to the GitHub page of the extension; this solution is better than dblink. There is a complete guide on autonomous transaction support in PostgreSQL. There is also a third way to start an autonomous transaction in Postgres, but some patching is needed: see Peter Eisentraut's patch proposal for OracleDB-style transactions.
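A rough sketch of the pg_background approach, reusing the dance_log table from above (I am assuming the extension's pg_background_launch()/pg_background_result() API here; check its README for the exact signatures):

CREATE EXTENSION pg_background;

-- The INSERT runs in a background worker with its own transaction,
-- so it commits independently of the calling transaction:
SELECT * FROM pg_background_result(
    pg_background_launch('INSERT INTO dance_log(dance_entry) VALUES (''Tango'')')
) AS (result text);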

PostgreSQL generic handler for serialization failure

This is a follow-up question to this one, so I know I can use (blocking) LOCKs, but I want to use predicate locks and serializable transaction isolation.
What I'd like to have is a generic handler of serialization failures that would retry the function/query X number of times.
As example, I have this:
CREATE SEQUENCE account_id_seq;
CREATE TABLE account
(
    id integer NOT NULL DEFAULT nextval('account_id_seq'),
    title character varying(40) NOT NULL,
    balance integer NOT NULL DEFAULT 0,
    CONSTRAINT account_pkey PRIMARY KEY (id)
);
INSERT INTO account (title) VALUES ('Test Account');
CREATE OR REPLACE FUNCTION mytest() RETURNS integer AS $$
DECLARE
    cc integer;
BEGIN
    cc := balance FROM account WHERE id = 1;
    RAISE NOTICE 'Balance: %', cc;
    PERFORM pg_sleep(3);
    UPDATE account SET balance = cc + 10 WHERE id = 1 RETURNING balance INTO cc;
    RETURN cc;
END
$$
LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION myretest() RETURNS integer AS $$
DECLARE
    tries integer := 5;
BEGIN
    WHILE TRUE LOOP
        BEGIN -- nested block for exception
            RETURN mytest();
        EXCEPTION
            WHEN SQLSTATE '40001' THEN
                IF tries > 0 THEN
                    tries := tries - 1;
                    RAISE NOTICE 'Restart! % left', tries;
                ELSE
                    RAISE EXCEPTION 'NO RESTARTS LEFT';
                END IF;
        END;
    END LOOP;
END
$$
LANGUAGE plpgsql;
So if I call mytest() concurrently from two sessions, I get a serialization failure on the last commit:
4SO$ psql -c "select mytest()" & PIDA=$! && psql -c "select mytest()" && wait $PIDA
[1] 4909
NOTICE: Balance: 0
NOTICE: Balance: 0
mytest
--------
10
(1 row)
ERROR: could not serialize access due to concurrent update
CONTEXT: SQL statement "update account set balance = cc+10 where id=1 RETURNING balance"
PL/pgSQL function mytest() line 10 at SQL statement
If I call myretest(), it should retry mytest() up until the 5th try, at which point it would raise the exception.
So I have two points here (where maybe point 2 also invalidates point 1):
myretest() does not work as expected: every iteration results in a serialization_failure exception, even after the concurrent transaction finishes. Is there something I should add to "reset" the transaction?
how could I make this (myretest() logic) generic so that it would apply to every called function in the system without the need for "wrapper" functions as such?
Serializable transactions provide exactly what you are looking for as long as you use some framework that starts the transaction over when it receives an error with a SQLSTATE of 40001 or 40P01.
In PostgreSQL a function always runs in the context of a transaction. You can't start a new transaction within the context of a "wrapper" function. That would require a slightly different feature, commonly called a "stored procedure" -- something which did not exist in PostgreSQL when this was written (procedures with transaction control arrived in PostgreSQL 11). Therefore, you need to put the logic to manage the restart into the code which submits the transaction to the database. Fortunately, there are many connectors for that -- Java, Perl, Python, Tcl, ODBC, etc. There is even a connector for making a separate connection to a PostgreSQL database within a PostgreSQL procedural language, which might allow you to do something like what you want:
http://www.postgresql.org/docs/current/static/dblink.html
I have seen this done in various "client" frameworks. Clearly it is a bad idea to spread this around to all locations where the application is logically dealing with the database, but there are many good reasons to route all database requests through one "accessor" method (or at least a very small number of them), and most frameworks provide a way to deal with this at that layer. (For example, in Spring you would want to create a transaction manager using dependency injection.) That probably belongs in some language you are using for your application logic, but if you really wanted to you could probably use plpgsql and dblink; that's probably not going to be your easiest path, though.

PostgreSQL concurrent update selects

I am attempting to build some sort of update-select for a job queue. I need it to support concurrent processes affecting the same table or database. This server will be used only for the queue, so a database per queue is acceptable. Originally I was thinking about something like the following:
UPDATE queue SET state = 1, ts = NOW()
WHERE id IN (SELECT id FROM queue WHERE state = 0 LIMIT X)
RETURNING *;
I have been reading that this will cause a race condition. I read that there is an option for the SELECT subquery to use FOR UPDATE, but then that will lock the rows and concurrent calls will block, whereas I would not mind if they skipped over to the next unlocked row.
So what I am asking for is the best way to build a FIFO queue in Postgres that requires the least amount of locking on the database.
The typical way to do this is to wrap it in a PLPGSQL function, select FOR UPDATE NOWAIT, and then use exception handling to skip the locked rows.
This does place some additional overhead on the function because exception handling requires additional processor cycles to manage even if there are no exceptions.
As a very simple example:
CREATE OR REPLACE FUNCTION get_all_unlocked_customers() RETURNS SETOF customer
LANGUAGE plpgsql AS
$$
BEGIN
    RETURN QUERY SELECT * FROM customer FOR UPDATE NOWAIT;
EXCEPTION
    WHEN lock_not_available THEN
        NULL; -- no need to do anything; just return an empty set
END;
$$;
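On PostgreSQL 9.5 and later there is a more direct tool for this: FOR UPDATE SKIP LOCKED silently skips rows another transaction has locked, which is exactly the "skip over to the next unlocked row" behavior the question asks for. A minimal dequeue sketch against the queue table from the question (the LIMIT of 10 is a placeholder):

UPDATE queue
SET state = 1, ts = NOW()
WHERE id IN (
    SELECT id FROM queue
    WHERE state = 0
    ORDER BY id -- FIFO order
    LIMIT 10
    FOR UPDATE SKIP LOCKED
)
RETURNING *;

Each concurrent worker claims a disjoint batch of jobs without blocking on the others.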