State machine represented in the DB, enforcing state transitions - postgresql

I've got a finite state machine which represents the phases of a job.
I need to represent the states in a Postgres database. I would like to enforce correctness by forbidding updates from one state to another unless the state machine allows the transition.
A naive way to accomplish this would be to acquire an exclusive lock on the table, check the current and next state within the transaction, and abort with an error if the update is invalid.
This is clearly a performance killer, since I would be locking the Job table at every state transition.
Is there a way to accomplish the same goal via constraints?

A trigger is the answer to your problem.
Let's consider a simple table:
CREATE TABLE world (id serial PRIMARY KEY, state VARCHAR);
INSERT INTO world (state) VALUES ('big bang');
INSERT INTO world (state) VALUES ('stars formation');
INSERT INTO world (state) VALUES ('human era');
Here is the function that will be called by the trigger; define your state machine logic in it. RAISE EXCEPTION is useful, since you can provide a custom message.
CREATE FUNCTION check_world_change() RETURNS trigger AS $check_world_change$
BEGIN
    IF OLD.state = 'big bang' AND NEW.state = 'human era' THEN
        RAISE EXCEPTION 'Dont skip stars';
    END IF;
    IF OLD.state = 'stars formation' AND NEW.state = 'big bang' THEN
        RAISE EXCEPTION 'Impossible to reverse order of things';
    END IF;
    RETURN NEW;
END;
$check_world_change$ LANGUAGE plpgsql;
And define trigger for your table:
CREATE TRIGGER check_world_change BEFORE UPDATE ON world
FOR EACH ROW EXECUTE PROCEDURE check_world_change();
Now, when you try to update the state of one of the rows, you'll get an error:
world=# select * from world;
id | state
----+-----------------
2 | stars formation
1 | human era
3 | big bang
(3 rows)
world=# update world set state='human era' where state='big bang';
ERROR:  Dont skip stars
world=# select * from world;
id | state
----+-----------------
2 | stars formation
1 | human era
3 | big bang
(3 rows)
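For a state machine with more than a handful of states, the hardcoded IF branches get unwieldy. Below is a sketch of a data-driven variant where the allowed transitions live in a lookup table; the allowed_transition table and its contents are illustrative additions, not part of the original answer:
CREATE TABLE allowed_transition (
    from_state varchar NOT NULL,
    to_state   varchar NOT NULL,
    PRIMARY KEY (from_state, to_state)
);
INSERT INTO allowed_transition VALUES
    ('big bang', 'stars formation'),
    ('stars formation', 'human era');

CREATE OR REPLACE FUNCTION check_world_change() RETURNS trigger AS $check_world_change$
BEGIN
    -- allow no-op updates that keep the current state
    IF NEW.state = OLD.state THEN
        RETURN NEW;
    END IF;
    IF NOT EXISTS (SELECT 1 FROM allowed_transition
                   WHERE from_state = OLD.state AND to_state = NEW.state) THEN
        RAISE EXCEPTION 'Invalid transition: % -> %', OLD.state, NEW.state;
    END IF;
    RETURN NEW;
END;
$check_world_change$ LANGUAGE plpgsql;
The trigger binding above works unchanged; adding a state then only requires inserting rows, not redeploying the function.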
References:
https://www.postgresql.org/docs/9.5/static/plpgsql-trigger.html
https://www.postgresql.org/docs/9.5/static/sql-createtrigger.html

Related

Does CLOCK_TIMESTAMP from a BEFORE trigger match log/commit order *exactly* in PG 12.3?

I've got a Postgres 12.3 question: can I rely on CLOCK_TIMESTAMP() in a trigger to stamp an updated_dts timestamp in exactly the same order as changes are committed to the permanent data?
On the face of it, this might sound like kind of a silly question, but I just spent two days tracking down a super rare race condition in a non-Postgres system that hinged on exactly this behavior. (Lagging commits made their 'last value seen' tracking data unreliable.) Now I'm trying to figure out if it's possible for CLOCK_TIMESTAMP() not to match the order of changes recorded in the WAL perfectly.
It's simple to see how this could occur with NOW()/TRANSACTION_TIMESTAMP()/CURRENT_TIMESTAMP, as they return the transaction start time, not the completion time. It's pretty easy, in that case, to record a timestamp sequence where the stamps and the log order don't agree. But I can't figure out if there's any chance for commits to be saved in a different order than the BEFORE trigger CLOCK_TIMESTAMP() values.
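As a quick illustration of that difference: within a single transaction, NOW() stays frozen at the transaction start while CLOCK_TIMESTAMP() keeps advancing.
BEGIN;
SELECT now(), clock_timestamp();  -- nearly identical
SELECT pg_sleep(2);
SELECT now(), clock_timestamp();  -- now() unchanged, clock_timestamp() ~2 s later
COMMIT;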
For background, we need a 100% reliable timeline for an external search to use. As I understand it, I can create one using logical replication and a replication-target-side trigger to stamp changes as they're replayed from the log. What I'm unclear on is whether it's possible to get the same fidelity from CLOCK_TIMESTAMP() on a single server.
I haven't got the chops to get deep into the Postgres internals and see how requests are interleaved, nor how granular execution is, and am hoping that someone here knows definitively. If this is more of a question for one of the PG mailing lists, please let me know.
-- Thanks
Below is a bit of sample code for how I'm looking at building the timestamps. It works fine, but doesn't prove anything about behavior with lots of concurrent processes.
---------------------------------------------
-- Create the trigger function
---------------------------------------------
DROP FUNCTION IF EXISTS api.set_updated CASCADE;
CREATE OR REPLACE FUNCTION api.set_updated()
    RETURNS TRIGGER
AS $BODY$
BEGIN
    NEW.updated_dts := CLOCK_TIMESTAMP();
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;
COMMENT ON FUNCTION api.set_updated() IS 'Sets updated_dts field to CLOCK_TIMESTAMP(), if the record has changed.';
---------------------------------------------
-- Create the table
---------------------------------------------
DROP TABLE IF EXISTS api.numbers;
CREATE TABLE api.numbers (
    id uuid NOT NULL DEFAULT extensions.gen_random_uuid(),
    number integer NOT NULL,
    updated_dts timestamptz NOT NULL DEFAULT 'epoch'::timestamptz
);
---------------------------------------------
-- Define the triggers (binding)
---------------------------------------------
-- NOTE: I'm guessing that in production I can use DEFAULT CLOCK_TIMESTAMP() instead of a BEFORE INSERT trigger.
-- I'm using a distinct DEFAULT value here, as I want it to pop out if the trigger isn't firing.
CREATE TRIGGER trigger_api_number_before_insert
    BEFORE INSERT ON api.numbers
    FOR EACH ROW
    EXECUTE PROCEDURE api.set_updated();
CREATE TRIGGER trigger_api_number_before_update
    BEFORE UPDATE ON api.numbers
    FOR EACH ROW
    WHEN (OLD.* IS DISTINCT FROM NEW.*)
    EXECUTE PROCEDURE api.set_updated();
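-- Aside (not in the original question): the DEFAULT-based variant speculated
-- about in the NOTE above does work, since a column default can call any
-- function. It covers INSERT only; UPDATE still needs the trigger.
-- ALTER TABLE api.numbers
--     ALTER COLUMN updated_dts SET DEFAULT clock_timestamp();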
---------------------------------------------
-- INSERT some data
---------------------------------------------
INSERT INTO api.numbers (number) VALUES (1),(2),(3);
---------------------------------------------
-- Take a look
---------------------------------------------
SELECT * FROM api.numbers ORDER BY updated_dts ASC; -- The values should be listed as 1, 2, 3 from oldest to newest.
---------------------------------------------
-- UPDATE a row
---------------------------------------------
UPDATE api.numbers SET number = 11 WHERE number = 1;
---------------------------------------------
-- Take a look
---------------------------------------------
SELECT * FROM api.numbers ORDER BY updated_dts ASC; -- The values should be listed as 2, 3, 11 from oldest to newest.
No, you cannot depend on clock_timestamp() order during trigger execution (or while evaluating a DEFAULT clause) being the same as commit order.
Commit will always happen later than the function call, and you cannot control how long it takes between them.
But I am surprised that this is a problem for you. Typically, the commit time is not visible or relevant. Why don't you simply accept clock_timestamp() as the measure of things?
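A sketch of how the stamps and the commit order can diverge with two concurrent sessions (the interleaving shown is illustrative; the stamp is taken inside the UPDATE, before each COMMIT):
-- Session A                                -- Session B
BEGIN;
UPDATE api.numbers SET number = 21
 WHERE number = 2;   -- trigger stamps t1
                                            BEGIN;
                                            UPDATE api.numbers SET number = 31
                                             WHERE number = 3;   -- stamps t2 > t1
                                            COMMIT;              -- commits first...
COMMIT;  -- ...so commit order is B then A, while the stamps say A then B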

Conditional locking in Postgres function/stored procedure

I'm building an event sourcing service for a web crawler, where several crawler workers scrape several websites and try to keep deltas for each crawled resource. I've chosen PostgreSQL as the underlying data store. I need to give producers optimistic locking via a flag called "expectedSeq", which controls whether the event should be written for a particular stream. Initially I was using a table per stream, leveraging auto-increment within a transaction to build the optimistic locking feature, but I quickly found there's a file-system cap on how many tables a server can handle.
Since I can't use auto-increment anymore, I'm trying to build this functionality using two tables: one for controlling the sequence of the stream, the other for storing the event itself.
The first question I have is whether I should use stored procedures or functions. The second is whether it's possible to have conditional transactions inside a stored procedure or a Postgres function.
The logic I need to implement is something like this:
storeEvent(stream, expectedSeq = null)
    lock row for `streams`.stream
    if expectedSeq = null
        update stream row with seq + 1
        release lock
        write event to event table
    else
        if expectedSeq != seq + 1
            release lock
            abort
        else
            update seq + 1
            release lock
            write event to event table
Thanks to Ian Harris:
CREATE OR REPLACE PROCEDURE store_event (v_topic varchar(40), v_expected_next_seq integer, v_data text)
LANGUAGE plpgsql
AS $$
DECLARE
    next_seq integer;
BEGIN
    -- the FOR UPDATE clause places a row-level lock on the selected topic row
    next_seq := (
        SELECT seq
        FROM topics
        WHERE topic = v_topic
        FOR UPDATE) + 1;
    IF v_expected_next_seq IS NOT NULL AND next_seq != v_expected_next_seq THEN
        RAISE 'Optimistic locking error';
    END IF;
    IF next_seq IS NULL THEN
        RAISE 'Unknown topic';
    END IF;
    UPDATE topics
    SET seq = next_seq
    WHERE topic = v_topic;
    INSERT INTO events (topic, seq, data)
    VALUES (v_topic, next_seq, v_data);
    COMMIT;
END;
$$;
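Usage sketch (the topic name and payloads are illustrative, and assume a matching row already exists in topics):
-- unconditional append: no expected sequence supplied
CALL store_event('example-topic', NULL, '{"url": "https://example.com"}');

-- optimistic append: raises 'Optimistic locking error'
-- unless the next sequence number is exactly 2
CALL store_event('example-topic', 2, '{"url": "https://example.com/page"}');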

Create Trigger For Update another row automatically on same table using postgresql

I want to create a trigger that can update another row in the same table in PostgreSQL.
If I run a query like this:
UPDATE tokens
SET amount = (SELECT amount FROM tokens WHERE id = 1)
WHERE id = 2
it gives the result I expect.
Description:
I want to set the amount field of the row with id 2 to the value returned by the subquery, so that the amount on id 2 is the same as on id 1.
Hopefully, with this trigger created, I can update the amount on id 1 and the amount on id 2 will follow automatically.
Before update result:
 id | amount
----+--------
  1 |    200
  2 |    200
When I update the amount on id 1 to 100, the amount on id 2 should become 100 as well.
After update result:
 id | amount
----+--------
  1 |    100
  2 |    100
Update with my temporary solution:
I just created this UDF:
CREATE FUNCTION update_amount(id_ops integer, id_mir integer) RETURNS boolean LANGUAGE plpgsql AS $$
BEGIN
    UPDATE tokens SET amount = (SELECT amount FROM tokens WHERE id = id_ops) WHERE id = id_mir;
    RETURN true;
END;
$$;
Description:
id_ops: the id whose amount I always update
id_mir: the id whose amount is automatically updated after I update the amount on id_ops
Example of using my UDF to resolve my problem:
I update the amount of id 1 to 2000; the amount of id 2 is not updated to 2000. When I run the query select update_amount(1,2); the amount of id 2 becomes the same as the amount of id 1.
I need a trigger in PostgreSQL to automate or replace the function of the UDF that I wrote.
What you want to do is not really that difficult; I'll show you. But first: this is a very, very bad idea, bad enough that some databases, most notably Oracle, throw an exception if you try it. Unfortunately, Postgres allows it. You essentially create a recursive update, as you are updating the table that initiated the trigger; this update in turn initiates the trigger. Without logic to stop this recursion you could update every row in the table.
I assume this is an extract from a much larger requirement, or perhaps you just want to know how to create a trigger. So we begin:
-- setup
drop table if exists tokens;
create table tokens( id integer, amount numeric(6,2));
-- create initial test data
insert into tokens(id, amount)
values (1,100), (2,150.69), (3,95.50), (4,75), (5,16.40);
Now the heart of it, the Postgres trigger: the trigger function, and the trigger itself. Note that the function must be defined prior to the trigger which calls it.
-- create a trigger function: that is, a function returning trigger.
create or replace function tokens_bur_func()
returns trigger
language plpgsql
as $$
begin
    if new.id = 1
    then
        update tokens
           set amount = new.amount
         where id = 2;
    end if;
    return new;
end;
$$;
-- create the trigger
create trigger tokens_bur
before update of amount
on tokens
for each row execute procedure tokens_bur_func();
--- test
select *
from tokens
order by id;
-- do an initial update
update tokens
set amount = 200
where id = 1;
-- Query returned successfully: one row affected, 31 msec execution time.
-- 1 row? Yes: DML count does not see change made from within trigger.
-- but
select *
from tokens
order by id;
Hard-coding ids in a trigger, however, is not very useful; after all, "update ... where id in (1,2)" would be much easier, and safer, as it does not require the recursion-stop logic. So a slightly more generic trigger function is:
-- More general but still vastly limited:
-- a trigger that mirrors the subsequent row whenever an odd id is updated.
create or replace function tokens_bur_func()
returns trigger
language plpgsql
as $$
begin
    if mod(new.id, 2) = 1
    then
        update tokens
           set amount = new.amount
         where id = new.id + 1;
    end if;
    return new;
end;
$$;
-- test
update tokens
set amount = 900
where id = 3;
update tokens
set amount = 18.95
where id in (2,5);
select *
from tokens
order by id;
No matter how you proceed, you require prior knowledge of the update specifics. For example, you said "might be id 2 I can set mirror from id 3"; to do so you would need to alter the database in some manner, either changing the trigger function or changing the trigger to pass parameters. (Triggers can pass parameters, but they are static, supplied at CREATE TRIGGER time.)
Finally, make sure you have your recursion-stop logic down cold. Because if not:
-- The Danger: what happens WITHOUT the 'stopper condition',
-- using an almost direct conversion of your UDF
create or replace function tokens_bur_func()
returns trigger
language plpgsql
as $$
begin
    update tokens
       set amount = new.amount
     where id = new.id + 1;
    return new;
end;
$$;
-- test
update tokens
set amount = 137.92
where id = 1;
-- Query returned successfully: one row affected, 31 msec execution time.
-- but
select *
from tokens
order by id;
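A common way to bolt a recursion stop onto a trigger like this, without touching the function body, is the built-in pg_trigger_depth() function (available since PostgreSQL 9.2), which reports how deeply nested the current trigger call is. A sketch:
-- re-create the trigger so it only fires for statements issued directly
-- by a client (depth 0), not for updates made from inside a trigger
drop trigger if exists tokens_bur on tokens;
create trigger tokens_bur
    before update of amount
    on tokens
    for each row
    when (pg_trigger_depth() = 0)
    execute procedure tokens_bur_func();
With this guard in place, even the 'dangerous' version above cascades only one step: the client's update fires the trigger once, and the update made inside the trigger does not fire it again.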

PostgreSQL generic handler for serialization failure

This is a followup question from this one, so I know I can use (blocking) LOCKs, but I want to use predicate locks and the SERIALIZABLE transaction isolation level.
What I'd like to have is a generic handler for serialization failures that would retry the function/query X number of times.
As an example, I have this:
CREATE SEQUENCE account_id_seq;
CREATE TABLE account
(
    id integer NOT NULL DEFAULT nextval('account_id_seq'),
    title character varying(40) NOT NULL,
    balance integer NOT NULL DEFAULT 0,
    CONSTRAINT account_pkey PRIMARY KEY (id)
);
INSERT INTO account (title) VALUES ('Test Account');

CREATE OR REPLACE FUNCTION mytest() RETURNS integer AS $$
DECLARE
    cc integer;
BEGIN
    cc := balance FROM account WHERE id = 1;
    RAISE NOTICE 'Balance: %', cc;
    PERFORM pg_sleep(3);
    UPDATE account SET balance = cc + 10 WHERE id = 1 RETURNING balance INTO cc;
    RETURN cc;
END
$$
LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION myretest() RETURNS integer AS $$
DECLARE
    tries integer := 5;
BEGIN
    WHILE TRUE LOOP
        BEGIN  -- nested block for the exception handler
            RETURN mytest();
        EXCEPTION
            WHEN SQLSTATE '40001' THEN
                IF tries > 0 THEN
                    tries := tries - 1;
                    RAISE NOTICE 'Restart! % left', tries;
                ELSE
                    RAISE EXCEPTION 'NO RESTARTS LEFT';
                END IF;
        END;
    END LOOP;
END
$$
LANGUAGE plpgsql;
So if I call mytest() concurrently from two sessions, I get a serialization failure on the last commit:
4SO$ psql -c "select mytest()" & PIDA=$! && psql -c "select mytest()" && wait $PIDA
[1] 4909
NOTICE: Balance: 0
NOTICE: Balance: 0
mytest
--------
10
(1 row)
ERROR: could not serialize access due to concurrent update
CONTEXT: SQL statement "update account set balance = cc+10 where id=1 RETURNING balance"
PL/pgSQL function mytest() line 10 at SQL statement
If I call myretest() it should try to execute mytest() up to five times before raising the 'NO RESTARTS LEFT' exception.
So I have two points here (where maybe point 2 also invalidates point 1):
myretest() does not work as expected: every iteration results in a serialization_failure exception, even after the concurrent transaction finishes. Is there something I should add to "reset" the transaction?
How could I make this (the myretest() logic) generic, so that it applies to every called function in the system without needing such "wrapper" functions?
Serializable transactions provide exactly what you are looking for as long as you use some framework that starts the transaction over when it receives an error with a SQLSTATE of 40001 or 40P01.
In PostgreSQL a function always runs in the context of a transaction. You can't start a new transaction within the context of a "wrapper" function. That would require a slightly different feature, commonly called a "stored procedure"; these did not exist in PostgreSQL before version 11, when procedures with transaction control were added. Therefore, you need to put the logic to manage the restart into the code which submits the transaction to the database. Fortunately, there are many connectors for that: Java, Perl, Python, Tcl, ODBC, etc. There is even a module for making a separate connection to a PostgreSQL database from within a PostgreSQL procedural language, which might allow you to do something like what you want:
http://www.postgresql.org/docs/current/static/dblink.html
I have seen this done in various "client" frameworks. Clearly it is a bad idea to spread this around to every location where the application logically deals with the database, but there are many good reasons to route all database requests through one "accessor" method (or at least a very small number of them), and most frameworks provide a way to deal with this at that layer. (For example, in Spring you would create a transaction manager using dependency injection.) The retry logic probably belongs in whatever language you use for your application logic; if you really wanted to, you could probably use plpgsql and dblink, though that's probably not going to be your easiest path.
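For completeness, a hedged sketch of the dblink route: each attempt runs on its own connection and in its own transaction, so a failed attempt does not poison the calling transaction. This assumes the dblink extension is installed, that the illustrative local connection string works in your setup, and that dblink propagates the remote SQLSTATE 40001 to the caller (worth verifying on your version):
CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION myretest_dblink() RETURNS integer AS $$
DECLARE
    tries integer := 5;
    result integer;
BEGIN
    PERFORM dblink_connect('retry_conn', 'dbname=' || current_database());
    WHILE tries > 0 LOOP
        BEGIN
            PERFORM dblink_exec('retry_conn', 'BEGIN ISOLATION LEVEL SERIALIZABLE');
            SELECT r INTO result
              FROM dblink('retry_conn', 'SELECT mytest()') AS t(r integer);
            PERFORM dblink_exec('retry_conn', 'COMMIT');
            PERFORM dblink_disconnect('retry_conn');
            RETURN result;
        EXCEPTION
            WHEN SQLSTATE '40001' THEN
                -- clean up the remote transaction before retrying
                PERFORM dblink_exec('retry_conn', 'ROLLBACK');
                tries := tries - 1;
        END;
    END LOOP;
    PERFORM dblink_disconnect('retry_conn');
    RAISE EXCEPTION 'NO RESTARTS LEFT';
END
$$
LANGUAGE plpgsql;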

Locking in Postgres function

Let's say I have a transactions table and a transaction_summary table. I have created the following trigger to update the transaction_summary table:
CREATE OR REPLACE FUNCTION doSomeThing() RETURNS TRIGGER AS
$BODY$
DECLARE
    rec_cnt bigint;
BEGIN
    -- lock rows which have to be updated
    SELECT count(1) FROM (SELECT 1 FROM transaction_summary WHERE receiver = new.receiver FOR UPDATE) r INTO rec_cnt;
    IF rec_cnt = 0
    THEN
        -- if there are no rows then create a new entry in the summary table
        -- lock the whole table
        LOCK TABLE "transaction_summary" IN ACCESS EXCLUSIVE MODE;
        INSERT INTO transaction_summary( ... ) VALUES ( ... );
    ELSE
        UPDATE transaction_summary SET ... WHERE receiver = new.receiver;
    END IF;

    SELECT count(1) FROM (SELECT 1 FROM transaction_summary WHERE sender = new.sender FOR UPDATE) r INTO rec_cnt;
    IF rec_cnt = 0
    THEN
        LOCK TABLE "transaction_summary" IN ACCESS EXCLUSIVE MODE;
        INSERT INTO transaction_summary( ... ) VALUES ( ... );
    ELSE
        UPDATE transaction_summary SET ... WHERE sender = new.sender;
    END IF;
    RETURN new;
END;
$BODY$
LANGUAGE plpgsql;
Question: will there be a deadlock? According to my understanding, a deadlock might happen like this:
 _________
|__table__| <- executor #1 waits on executor #2 to be able to lock the whole table AND
|_________|    executor #2 waits on executor #1 to be able to lock the whole table
|_________|
|_________| <- row is locked by executor #1
|_________|
|_________| <- row is locked by executor #2
It seems that the only option is to lock the whole table at the beginning of every transaction.
Are your 'SELECT 1 FROM transactions WHERE ...' queries meant to access 'transaction_summary' instead? Also, notice that those two queries can at least theoretically deadlock each other if two DB transactions insert two 'transactions' rows with new.sender1 = new.receiver2 and new.receiver1 = new.sender2.
You can't, in general, guarantee that you won't get a deadlock from a database. Even if you try to prevent them by writing your queries carefully (e.g., ordering updates) you can still get caught out, because you can't control the order of INSERT/UPDATE or of constraint checks. In any case, comparing every transaction against every other to check for deadlocks doesn't scale as your application grows.
So your code should always be prepared to re-run transactions when it gets 'deadlock detected' errors. If you do that, and you think conflicting transactions will be uncommon, then you might as well let your deadlock-handling code deal with it.
If you think deadlocks will be common then they might cause you a performance problem, although contending on a big table lock could be one, too. Here are some options:
If new.receiver and new.sender are, for example, the IDs of rows in a MyUsers table, you could require all code which inserts into transaction_summary to first do 'SELECT 1 FROM MyUsers WHERE id IN (user1, user2) FOR UPDATE'. It'll break if someone forgets, but so will your table locking. By doing it that way you'll swap one big table lock for many separate row locks.
Add UNIQUE constraints to transaction_summary and look for the error when they're violated. You should probably add the constraints anyway, even if you handle this another way; they'll detect bugs. (With a unique key in place you can also replace the whole lock-then-insert-or-update dance with an upsert; see the sketch at the end of this answer.)
You could allow duplicate transaction_summary rows and require users of that table to add them up. Messy, and easy for developers who don't know about it to create bugs (though you could add a view which does the adding). But if you really can't take the performance hit of locking and deadlocks, you could do it.
You could try the SERIALIZABLE transaction isolation level and take out the table locks. By my reading, the SELECT ... FOR UPDATE should create a predicate lock (and so should a plain SELECT). That'd stop any other transaction that does a conflicting insert from committing successfully. However, using SERIALIZABLE throughout your application will cost you performance and give you a lot more transactions to retry.
Here's how the SERIALIZABLE transaction isolation level works:
create table test (id serial, x integer, total integer); ...
Transaction 1:
DB=# begin transaction isolation level serializable;
BEGIN
DB=# insert into test (x, total) select 3, 100 where not exists (select true from test where x=3);
INSERT 0 1
DB=# select * from test;
id | x | total
----+---+-------
1 | 3 | 100
(1 row)
DB=# commit;
COMMIT
Transaction 2, interleaved line for line with the first:
DB=# begin transaction isolation level serializable;
BEGIN
DB=# insert into test (x, total) select 3, 200 where not exists (select true from test where x=3);
INSERT 0 1
DB=# select * from test;
id | x | total
----+---+-------
2 | 3 | 200
(1 row)
DB=# commit;
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
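The upsert sketch promised above: if transaction_summary has a UNIQUE constraint per key, a single INSERT ... ON CONFLICT DO UPDATE (available since PostgreSQL 9.5) replaces the lock-then-insert-or-update dance. The column names here are illustrative, not from the original question:
-- assumes: ALTER TABLE transaction_summary
--              ADD CONSTRAINT transaction_summary_receiver_key UNIQUE (receiver);
-- inside the trigger function, new.receiver and new.amount are available
INSERT INTO transaction_summary (receiver, total_amount)
VALUES (new.receiver, new.amount)
ON CONFLICT (receiver)
DO UPDATE SET total_amount = transaction_summary.total_amount + EXCLUDED.total_amount;
Two concurrent inserts for the same receiver then serialize on the row lock taken by ON CONFLICT, with no table lock and no lost updates.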