Optimistic Locking with PostgreSQL

Problem statement: I am using the Repository pattern to pull and update records from a PostgreSQL database (version 11) into a Node.js API via the pg npm module. If two users try to modify the same record at almost the same time, the changes made by the first user to submit will be overwritten by the second user to submit. I want to prevent the second user's changes from being submitted until they have updated their local copy of the record with the first user's changes.
I know that a DB like CouchDB has a "_rev" property that it uses in this case to detect an attempt to update from a stale snapshot. As I've researched more, I've found this is called optimistic locking (Optimistic locking queue). So I'll add a rev column to my table and use it in my SQL update statement.
UPDATE tableX
SET field1 = value1,
field2 = value2,
...,
rev_field = uuid_generate_v4()
WHERE id_field = id_value
AND rev_field = rev_value
However, if id_value matches but rev_value doesn't, that won't tell my repository code that the record was stale, only that 0 rows were affected by the query.
So I've written a script in pgAdmin that detects the cases where the update affected 0 rows and then checks the rev_field.
DO $$
DECLARE
    i_id numeric;
    i_uuid uuid;
    v_count numeric;
    v_rev uuid;
BEGIN
    i_id := 1;
    i_uuid := '20b2e135-42d0-4a49-94c0-5557dd09abd1';

    UPDATE account_r
    SET account_name = 'savings',
        rev = uuid_generate_v4()
    WHERE account_id = i_id
    AND rev = i_uuid;

    GET DIAGNOSTICS v_count = ROW_COUNT;

    IF v_count < 1 THEN
        SELECT rev INTO v_rev
        FROM account_r
        WHERE account_id = i_id;

        IF v_rev <> i_uuid THEN
            RAISE EXCEPTION 'revision mismatch';
        END IF;
    END IF;

    RAISE NOTICE 'rows affected: %', v_count;
END $$;
While I'm perfectly comfortable adapting this code into a stored procedure and calling that from Node, I'm hoping there's a solution that's not nearly as complex. On the one hand, moving these functions to the DB will clean up my JS code; on the other hand, this is a lot of boilerplate SQL to write, since it will have to be done for UPDATE and DELETE on each table.
Is there an easier way to get this done? (Perhaps the code at Optimistic locking queue is the easier way?) Should I be looking at an ORM to help reduce the headache here?

There is no need to maintain a rev value. You can get the md5 hash of a row instead.
SQL Fiddle Here
create table mytable (
    id int primary key,
    some_text text,
    some_int int,
    some_date timestamptz
);

insert into mytable
values (1, 'First entry', 0, now() - interval '1 day'),
       (2, 'Second entry', 1, now()),
       (3, 'Third entry', 2, now() + interval '1 day');
select *, md5(mytable::text) from mytable order by id;
The fiddle includes other queries to demonstrate that the calculated md5() is based on the values of the row.
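For instance, updating a row changes its hash (a quick check in the spirit of the fiddle's queries):
-- capture the hash, change a value, and observe a different hash
select md5(mytable::text) from mytable where id = 2;
update mytable set some_int = 42 where id = 2;
select md5(mytable::text) from mytable where id = 2; -- no longer matches the first hash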
Using that hash for optimistic locking, the updates can take the form:
update mytable
set some_int = -1
where id = 1
and md5(mytable::text) = <md5 hash from select>
returning *
You will still need to check for zero returned rows, but that could be abstracted away on the Node side.
It looks like result.rowCount in the pg module contains the number of rows affected, so you will not need the returning * part.
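For illustration, a minimal round trip under this scheme might look like the following (the hash literal is a placeholder for the value captured in the first step):
-- 1. Read the row and its hash together
select id, some_text, some_int, some_date, md5(mytable::text) as rev
from mytable
where id = 1;
-- 2. Update only if the row is unchanged; zero rows affected means a stale snapshot
update mytable
set some_int = -1
where id = 1
and md5(mytable::text) = '<rev captured in step 1>';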

PostgreSQL concurrent check if row exists in table

Suppose I have simple logic:
If the user has not had a balance accrual before (which is recorded in the accruals table), we must add $100 to their balance:
START TRANSACTION;
DO LANGUAGE plpgsql $$
DECLARE
    _accrual accruals;
BEGIN
    --LOCK TABLE accruals; -- label A
    SELECT * INTO _accrual FROM accruals WHERE user_id = 1;
    IF _accrual.accrual_id IS NOT NULL THEN
        RAISE SQLSTATE '22023';
    END IF;
    UPDATE users SET balance = balance + 100 WHERE user_id = 1;
    INSERT INTO accruals (user_id, amount) VALUES (1, 100);
END
$$;
COMMIT;
The problem with this transaction is that it's not concurrency-safe.
Running this transaction in parallel can result in user_id = 1 getting balance = 200 and 2 accruals recorded.
How do I test concurrency?
1. I run in session 1: START TRANSACTION; LOCK TABLE accruals;
2. In session 2 and session 3 I run this transaction
3. In session 1: ROLLBACK
The question is: how do I make this fully concurrency-safe and make sure a user gets the $100 only once?
The only way I see is to lock the table (label A in the code sample).
But do I have another way?
The simplest way is probably to use the serializable isolation level (by changing default_transaction_isolation). Then one of the processes should get something like "ERROR: could not serialize access due to concurrent update"
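A minimal sketch of that approach (the retry has to happen client-side):
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... the accrual logic from the DO block above ...
COMMIT;
-- If any statement fails with SQLSTATE 40001 (serialization_failure),
-- roll back and re-run the whole transaction.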
If you want to keep the isolation level at 'read committed', then you can just count accruals at the end and throw an error:
START TRANSACTION;
DO LANGUAGE plpgsql $$
DECLARE
    _accrual accruals;
    _count int;
BEGIN
    SELECT * INTO _accrual FROM accruals WHERE user_id = 1;
    IF _accrual.accrual_id IS NOT NULL THEN
        RAISE SQLSTATE '22023';
    END IF;
    UPDATE users SET balance = balance + 100 WHERE user_id = 1;
    INSERT INTO accruals (user_id, amount) VALUES (1, 100);
    SELECT count(*) INTO _count FROM accruals WHERE user_id = 1;
    IF _count > 1 THEN
        RAISE SQLSTATE '22023';
    END IF;
END
$$;
COMMIT;
This works because one process will block the other on the UPDATE (assuming a non-zero number of rows gets updated), and by the time one process commits and releases the blocked one, its inserted row will be visible to the other.
Formally there is then no need for the first check, but if you don't want a lot of churn due to rolled back INSERT and UPDATE, you might want to keep it.

Return the value changed by an update without a trigger

Postgres has a great RETURNING clause for INSERT, DELETE and UPDATE...and it's made me a bit greedy. In a few cases, what I'd like to get is not only the current value, but the previous value:
UPDATE analytic_productivity
SET points = 1000
WHERE points > 1000
RETURNING id,
          points,
          OLD.points;
I don't believe there's any way to access previous values outside of the lifespan and context of a trigger. So, I'll guess what I'd like isn't possible as such. If that's right, can anyone suggest an alternative? I'm overwriting outliers with some set values, and would like to record the modified values in another table. This is why I don't know the current value in advance. This is a rare (and clearly suspect) operation, and I don't want to record the change on normal inserts and updates.
As an alternative, I'm thinking that I can select the outliers, revise them, and then write back the modifications. So, do most of the work on the client side with a couple of requests to Postgres. If so, can someone suggest the right locking level to apply between my initial SELECT and my following UPDATE? I believe that the FOR UPDATE lock is right.
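For what it's worth, a minimal sketch of that select-revise-write flow (using the analytic_productivity table from above) could be:
BEGIN;
-- lock the current outlier rows so concurrent writers wait until we commit
-- (note: FOR UPDATE does not block brand-new rows from being inserted)
SELECT id, points
FROM analytic_productivity
WHERE points > 1000
FOR UPDATE;
-- ... record the old values client-side, then overwrite ...
UPDATE analytic_productivity
SET points = 1000
WHERE points > 1000;
COMMIT;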
Any suggestions on a smart way to capture previous values, during an update, without a trigger would be great to hear about.
Follow-up
Thanks to comments here, I experimented a bit and came up with a solution that works in my case. To make my objectives clearer:
I've got a table named outlier_rule that defines values that are too high for a specific column.
The goal is to loop over the table, and apply the rules to set outliers to a fixed value.
Stomping on outliers like this is...questionable. There must be leaks in the app's UI that allow for unreasonable values. To help track these down, I'm recording the large values in a table named outlier_change.
I'd like to push this behavior into a server-side function so that any of our servers, regardless of their codebase version, can invoke the current logic.
The client servers compose and send an email with a result summary when outliers are found and corrected.
So, a server-side function to do everything, log some data, and return a result. I've got that working, but it's got the smell of You Don't Know What You're Doing So Just Keep Adding Code Until it Works. I've at least got a better handle on using FORMAT and think I understand now that a single function can do many things, and that you can choose what to return with the RETURN clause. For reference, the various bits of code:
CREATE TABLE IF NOT EXISTS data.outlier_rule (
    id uuid NOT NULL DEFAULT extensions.gen_random_uuid(),
    schema_name text NOT NULL,
    table_name text NOT NULL,
    column_name text NOT NULL,
    threshold integer,
    set_to integer,
    CONSTRAINT outlier_rule_id_pkey
        PRIMARY KEY (schema_name, table_name, column_name)
);
For tracking the modifications, I've got a second table named outlier_change:
------------------------------
-- Table
------------------------------
DROP TABLE IF EXISTS data.outlier_change CASCADE;
CREATE TABLE IF NOT EXISTS data.outlier_change (
    id uuid NOT NULL,
    outlier_rule_id uuid NOT NULL,
    value_was integer NOT NULL,
    set_to integer NOT NULL,
    change_count integer NOT NULL DEFAULT 0,
    last_changed_dts timestamptz NOT NULL DEFAULT NOW(),
    CONSTRAINT outlier_change_id_pkey
        PRIMARY KEY (id, outlier_rule_id)
);
ALTER TABLE data.outlier_change OWNER TO user_change_structure;
------------------------------
-- Trigger Function
------------------------------
CREATE OR REPLACE FUNCTION data.on_outlier_change_upsert()
RETURNS pg_catalog.trigger AS $BODY$
BEGIN
    NEW.last_changed_dts := NOW();
    IF TG_OP = 'UPDATE' THEN
        -- OLD is only defined for UPDATEs; on INSERT keep the DEFAULT of 0
        NEW.change_count := OLD.change_count + 1;
    END IF;
    RETURN NEW; -- important!
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
------------------------------
-- Trigger
------------------------------
CREATE TRIGGER outlier_change_upsert BEFORE INSERT OR UPDATE ON data.outlier_change
FOR EACH ROW
EXECUTE PROCEDURE data.on_outlier_change_upsert();
DROP FUNCTION IF EXISTS data.outlier_fix ();
CREATE OR REPLACE FUNCTION data.outlier_fix ()
RETURNS TABLE (
    schema_name text,
    table_name text,
    column_name text,
    id uuid,
    value_was integer,
    set_to integer,
    change_count integer
)
AS $$
DECLARE
    rule record;
    now_ timestamptz = NOW();
BEGIN
    FOR rule IN SELECT * FROM data.outlier_rule LOOP
        EXECUTE FORMAT (
            'INSERT INTO outlier_change (
                 outlier_rule_id,
                 set_to,
                 id,
                 value_was)
             SELECT %6$L,
                    %5$s,
                    %2$I.id,
                    %2$I.%3$I
             FROM %1$I.%2$I
             WHERE %3$I > %4$s
             ON CONFLICT (id, outlier_rule_id) DO UPDATE SET
                 value_was = EXCLUDED.value_was,
                 set_to = EXCLUDED.set_to
             RETURNING outlier_rule_id,
                       id,
                       value_was,
                       set_to,
                       change_count;
             UPDATE %1$I.%2$I
             SET %3$I = %5$s
             WHERE %3$I > %4$s;',
            rule.schema_name,
            rule.table_name,
            rule.column_name,
            rule.threshold,
            rule.set_to,
            rule.id);
    END LOOP;

    RETURN QUERY EXECUTE ('
        SELECT outlier_rule.schema_name,
               outlier_rule.table_name,
               outlier_rule.column_name,
               outlier_change.id,
               outlier_change.value_was,
               outlier_change.set_to,
               outlier_change.change_count
        FROM outlier_change
        JOIN outlier_rule ON (outlier_rule.id = outlier_change.outlier_rule_id)
        WHERE last_changed_dts = $1')
    USING now_;
END;
$$ LANGUAGE plpgsql;
ALTER FUNCTION data.outlier_fix() OWNER TO user_bender;
You could achieve that with a bit of a hack: self-join the table in your update query, like this:
UPDATE analytic_productivity NEW
SET points = 1000
FROM analytic_productivity OLD
WHERE NEW.points > 1000
AND NEW.id = OLD.id
RETURNING NEW.id,
          NEW.points,
          OLD.points AS old_points;
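If the goal is also to log the old values in the same statement, the trick combines with a data-modifying CTE (PostgreSQL 9.1+); the outlier_log table here is hypothetical:
WITH changed AS (
    UPDATE analytic_productivity NEW
    SET points = 1000
    FROM analytic_productivity OLD
    WHERE NEW.points > 1000
    AND NEW.id = OLD.id
    RETURNING NEW.id, OLD.points AS value_was, NEW.points AS set_to
)
INSERT INTO outlier_log (id, value_was, set_to)
SELECT id, value_was, set_to
FROM changed;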

Using the now() function and executing triggers

I am trying to create a trigger function in PostgreSQL that should check for records with the same id (i.e. compare new rows by id against existing records) before inserting or updating. If the function finds existing records with the same id, their time_dead should be set. Let me explain with this example:
INSERT INTO persons (id, time_create, time_dead, name)
VALUES (1, now(), NULL, 'james');
I want to have a table like this:
 id | time_create | time_dead | name
----+-------------+-----------+-------
  1 | 06:12       |           | henry
  2 | 07:12       |           | muka
id 1 has a time_create of 06:12 but a NULL time_dead, and the same goes for id 2. But the next time I run the insert query with the same id and a different name, I should get a table like this:
 id | time_create | time_dead | name
----+-------------+-----------+-------
  1 | 06:12       | 14:35     | henry
  2 | 07:12       |           | muka
  1 | 14:35       |           | waks
henry and waks share the same id 1. After running the insert query, henry's time_dead equals waks' time_create. If another entry were made with id 1, let's say for james, james' time_create would equal waks' time_dead. And so on.
So far my function looks like this. But it's not working:
CREATE FUNCTION tr_function() RETURNS trigger AS '
BEGIN
    IF tg_op = ''UPDATE'' THEN
        UPDATE persons
        SET time_dead = NEW.time_create
        WHERE id = NEW.id
        AND time_dead IS NULL;
    END IF;
    RETURN new;
END
' LANGUAGE plpgsql;
CREATE TRIGGER sofgr BEFORE INSERT OR UPDATE
ON persons FOR EACH ROW
EXECUTE PROCEDURE tr_function();
When I run this, it says time_dead is not supposed to be null. Is there a way I can write a trigger function that automatically enters the time upon insert or update, but gives me results like the tables above when I run a select query?
What am I doing wrong?
My two tables:
CREATE TABLE temporary_object
(
    id integer NOT NULL,
    time_create timestamp without time zone NOT NULL,
    time_dead timestamp without time zone,
    PRIMARY KEY (id, time_create)
);

CREATE TABLE persons
(
    name text
)
INHERITS (temporary_object);
Trigger function
CREATE FUNCTION tr_function()
RETURNS trigger AS
$func$
BEGIN
    UPDATE persons p
    SET time_dead = NEW.time_create
    WHERE p.id = NEW.id
    AND p.time_dead IS NULL
    AND p.name <> NEW.name;

    RETURN NEW;
END
$func$ LANGUAGE plpgsql;
You were missing the INSERT case in your trigger function (IF tg_op = ''UPDATE''). But there is no need for checking TG_OP to begin with, since the trigger only fires on INSERT OR UPDATE - assuming you don't use the same function in other triggers. So I removed the cruft.
Note that you don't have to escape single quotes inside a dollar-quoted string.
Also added:
AND p.name <> NEW.name
... to prevent INSERT's from terminating themselves instantly (and causing an infinite recursion). This assumes that a row can never succeed another row with the same name.
Aside: the setup is still not bullet-proof. UPDATEs could mess with your system. I could keep updating the id of a row, thereby terminating other rows but not leaving a successor. Consider disallowing updates on id, as sketched below. Of course, that would make the trigger ON UPDATE pointless. I doubt you need that to begin with.
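One way to disallow updates on id is column-level privileges, sketched here with a made-up application role name:
-- let the application change everything except id
REVOKE UPDATE ON persons FROM app_user;
GRANT UPDATE (time_dead, name) ON persons TO app_user;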
now() as DEFAULT
If you want to use now() as the default for time_create, just make it so. Read the manual about setting a column DEFAULT. Then skip time_create in your INSERTs and it is filled automatically.
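For example (a one-line sketch):
ALTER TABLE persons ALTER COLUMN time_create SET DEFAULT now();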
If you want to force it (prevent everyone from entering a different value), create a trigger ON INSERT or add the following at the top of your trigger:
IF TG_OP = 'INSERT' THEN
    NEW.time_create := now(); -- type timestamp or timestamptz!
    RETURN NEW;
END IF;
Assuming your misleadingly named column "time_create" is actually a timestamp type.
That would force the current timestamp for new rows.

PostgreSQL: How to figure out missing numbers in a column using generate_series()?

SELECT commandid
FROM results
WHERE NOT EXISTS (
    SELECT *
    FROM generate_series(0,119999)
    WHERE generate_series = results.commandid
);
I have a column in results of type int, but various tests failed and were not added to the table. I would like to create a query that returns a list of commandid values that are not found in results. I thought the above query would do what I wanted. However, it does not work even when I use a range that is outside the expected possible range of commandid (like negative numbers).
Given sample data:
create table results ( commandid integer primary key);
insert into results (commandid) select * from generate_series(1,1000);
delete from results where random() < 0.20;
This works:
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE NOT EXISTS (SELECT 1 FROM results WHERE commandid = s.i);
as does this alternative formulation:
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
LEFT OUTER JOIN results ON (results.commandid = s.i)
WHERE results.commandid IS NULL;
Both of the above appear to result in identical query plans in my tests, but you should compare with your data on your database using EXPLAIN ANALYZE to see which is best.
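For example, prefix either query to compare plans and timings:
EXPLAIN ANALYZE
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE NOT EXISTS (SELECT 1 FROM results WHERE commandid = s.i);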
Explanation
Note that instead of NOT IN I've used NOT EXISTS with a subquery in one formulation, and an ordinary OUTER JOIN in the other. It's much easier for the DB server to optimise these and it avoids the confusing issues that can arise with NULLs in NOT IN.
I initially favoured the OUTER JOIN formulation, but at least in 9.1 with my test data the NOT EXISTS form optimizes to the same plan.
Both will perform better than the NOT IN formulation below when the series is large, as in your case. NOT IN used to require Pg to do a linear search of the IN list for every tuple being tested, but examination of the query plan suggests Pg may be smart enough to hash it now. The NOT EXISTS (transformed into a JOIN by the query planner) and the JOIN work better.
The NOT IN formulation is both confusing in the presence of NULL commandids and can be inefficient:
SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE s.i NOT IN (SELECT commandid FROM results);
so I'd avoid it. With 1,000,000 rows the other two completed in 1.2 seconds and the NOT IN formulation ran CPU-bound until I got bored and cancelled it.
As I mentioned in the comment, you need to do the reverse of the above query.
SELECT generate_series
FROM generate_series(0, 119999)
WHERE generate_series NOT IN (SELECT commandid FROM results);
At that point, you should find the values that do not exist in the commandid column within the selected range.
I am not an experienced SQL guru, but I like finding other ways to solve problems.
Just today I had a similar problem: finding unused numbers in a character column.
I solved it using pl/pgsql and was very interested in how fast my procedure would be.
I used @Craig Ringer's approach to generate a table with a serial column, added one million records, and then deleted every 99th record. The procedure takes about 3 seconds to find the missing numbers:
-- creating table
create table results (commandid character(7) primary key);
-- populating table with serial numbers formatted as characters
insert into results (commandid) select cast(num_id as character(7)) from generate_series(1,1000000) as num_id;
-- delete some records
delete from results where cast(commandid as integer) % 99 = 0;
create or replace function unused_numbers()
returns setof integer as
$body$
declare
    i integer;
    r record;
begin
    -- loop through the table with a synchronized counter:
    i := 1;
    for r in
        (select distinct cast(commandid as integer) as num_value
         from results
         order by num_value asc)
    loop
        if not (i = r.num_value) then
            while true loop
                return next i;
                i := i + 1;
                if (i = r.num_value) then
                    i := i + 1;
                    exit;
                else
                    continue;
                end if;
            end loop;
        else
            i := i + 1;
        end if;
    end loop;
    return;
end;
$body$
language plpgsql volatile
cost 100
rows 1000;
select * from unused_numbers();
Maybe it will be useful for someone.
If you're on AWS Redshift, you might end up needing to defy the question, since it doesn't support generate_series. You'll end up with something like this:
select
    startpoints.id as gapstart,
    min(endpoints.id) as resume
from (
    select id + 1 as id
    from yourtable outer_series
    where not exists (
        select null
        from yourtable inner_series
        where inner_series.id = outer_series.id + 1
    )
    order by id
) startpoints,
yourtable endpoints
where endpoints.id > startpoints.id
group by startpoints.id;

postgresql: nested insert

I have two tables, let's say tblA and tblB.
I need to insert a row into tblA and use the returned id as a value for one of the columns in tblB.
I tried to find this in the documentation but couldn't. Is it possible to write a statement (intended to be used as a prepared statement) like
INSERT INTO tblB VALUES
(DEFAULT, (INSERT INTO tblA VALUES (DEFAULT, 'x') RETURNING id), 'y')
like we do for SELECT?
Or should I do this by creating a stored procedure? I'm not sure if I can create a prepared statement out of a stored procedure.
Please advise.
You'll need to wait for PostgreSQL 9.1 for this:
with ids as (
    insert ...
    returning id
)
insert ...
from ids;
In the meanwhile, you need to use plpgsql, a temporary table, or some extra logic in your app...
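For illustration, with the asker's tables the 9.1 form would look something like this (the tblB column names are assumed):
WITH new_a AS (
    INSERT INTO tblA VALUES (DEFAULT, 'x')
    RETURNING id
)
INSERT INTO tblB (tbla_id, t) -- tblB's own id column fills in by default
SELECT id, 'y'
FROM new_a;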
This is possible with 9.0 and the new DO for anonymous blocks:
do $$
declare
    new_id integer;
begin
    insert into foo1 (id) values (default) returning id into new_id;
    insert into foo2 (id) values (new_id);
end $$;
This can be executed as a single statement. I haven't tried creating a PreparedStatement out of that though.
Edit
Another approach would be to simply do it in two steps: first run the insert into tblA using the returning clause and get the generated value through JDBC, then fire the second insert, something like this:
PreparedStatement stmt_1 = con.prepareStatement("INSERT INTO tblA VALUES (DEFAULT, ?) RETURNING id");
stmt_1.setString(1, "x");
stmt_1.execute(); // important! Do not use executeUpdate()!
ResultSet rs = stmt_1.getResultSet();
long newId = -1;
if (rs.next()) {
    newId = rs.getLong(1);
}
PreparedStatement stmt_2 = con.prepareStatement("INSERT INTO tblB VALUES (DEFAULT, ?, ?)");
stmt_2.setLong(1, newId);
stmt_2.setString(2, "y");
stmt_2.executeUpdate();
You can do this in two inserts, using currval() to retrieve the foreign key (provided that key is serial):
create temporary table tb1a (id serial primary key, t text);
create temporary table tb1b (id serial primary key,
                             tb1a_id int references tb1a(id),
                             t text);

begin;
insert into tb1a values (DEFAULT, 'x');
insert into tb1b values (DEFAULT, currval('tb1a_id_seq'), 'y');
commit;
The result:
select * from tb1a;
id | t
----+---
3 | x
(1 row)
select * from tb1b;
id | tb1a_id | t
----+---------+---
2 | 3 | y
(1 row)
Using currval in this way is safe whether in or outside of a transaction. From the PostgreSQL 8.4 documentation:
currval
Return the value most recently obtained by nextval for this sequence in the current session. (An error is reported if nextval has never been called for this sequence in this session.) Because this is returning a session-local value, it gives a predictable answer whether or not other sessions have executed nextval since the current session did.
You may want to use an AFTER INSERT trigger for that. Something along the lines of:
create function dostuff() returns trigger as $$
begin
    insert into table_b(field_1, field_2) values ('foo', NEW.id);
    return new; -- values returned by after triggers are ignored, anyway
end;
$$ language 'plpgsql';

create trigger trdostuff
after insert on table_name
for each row execute procedure dostuff();
An AFTER INSERT trigger is needed because you need to have the generated id available to reference it. Hope this helps.
Edit
A trigger is called in the same "block" as the command that triggered it, even if you're not using transactions; in other words, it becomes part of that command. Therefore, there is no risk of something changing the referenced id between the two inserts.