Postgres concurrency issue

I wrote the following trigger to guarantee that, on insert, the field 'filesequence' always receives the maximum value + 1 for a given stakeholder:
CREATE OR REPLACE FUNCTION update_filesequence()
RETURNS TRIGGER AS '
DECLARE
lastSequence file.filesequence%TYPE;
BEGIN
IF (NEW.filesequence IS NULL) THEN
PERFORM ''SELECT id FROM stakeholder WHERE id = NEW.stakeholder FOR UPDATE'';
SELECT max(filesequence) INTO lastSequence FROM file WHERE stakeholder = NEW.stakeholder;
IF (lastSequence IS NULL) THEN
lastSequence = 0;
END IF;
lastSequence = lastSequence + 1;
NEW.filesequence = lastSequence;
END IF;
RETURN NEW;
END;
' LANGUAGE 'plpgsql';
CREATE TRIGGER file_update_filesequence BEFORE INSERT
ON file FOR EACH ROW EXECUTE PROCEDURE
update_filesequence();
But I get duplicate 'filesequence' values in the database:
select id, filesequence, stakeholder from file where stakeholder=5273;
id   | filesequence | stakeholder
6773 | 5            | 5273
6774 | 5            | 5273
My understanding is that the SELECT ... FOR UPDATE would make two transactions for the same stakeholder wait on each other, so the second one would read the new 'filesequence'. But it is not working.
I ran some tests in pgAdmin, executing the following:
BEGIN;
select id from stakeholder where id = 5273 FOR UPDATE;
And it really blocked other records from being inserted for the same stakeholder, so the lock itself seems to work.
But when I run the application with concurrent uploads, I see duplicates again.
Could someone help me find the problem with my trigger?
Thanks,
Douglas.

Your idea is right. To get an auto-increment scoped by another field (say, one that designates a group), you cannot use a sequence; you have to lock the rows of that group before incrementing.
The logic of your trigger function does that, but you have misunderstood the PERFORM statement. PERFORM is written in place of the SELECT keyword, so it does not take a query string as a parameter. This means that when you write:
PERFORM 'SELECT id FROM stakeholder WHERE id = NEW.stakeholder FOR UPDATE';
The PL/pgSQL is actually executing:
SELECT 'SELECT id FROM stakeholder WHERE id = NEW.stakeholder FOR UPDATE';
And discarding the result, so no row is locked at all.
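The interleaving that produces the duplicate values can be modelled in a few lines. Nothing serializes the "read max, then insert" steps, so two concurrent inserts for the same stakeholder can both read the maximum before either has committed, and both compute the same value. This is a plain Python model, not PostgreSQL. The pre-existing row (id 6772, filesequence 4) is hypothetical.

```python
# Toy model of the race: each "transaction" computes max + 1 before either
# one has inserted (committed) its row, because nothing locks the stakeholder.
rows = [(6772, 4, 5273)]  # hypothetical existing row: (id, filesequence, stakeholder)


def next_sequence(rows, stakeholder):
    """Return max(filesequence) + 1 for one stakeholder, as the trigger does."""
    seqs = [seq for _, seq, sh in rows if sh == stakeholder]
    return (max(seqs) if seqs else 0) + 1


# Interleaved: both transactions read the maximum first ...
seq_a = next_sequence(rows, 5273)
seq_b = next_sequence(rows, 5273)
# ... and only then do both insert.
rows.append((6773, seq_a, 5273))
rows.append((6774, seq_b, 5273))

for row in rows:
    print(row)
# (6772, 4, 5273)
# (6773, 5, 5273)
# (6774, 5, 5273)
```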
What you have to do on this line is:
PERFORM id FROM stakeholder WHERE id = NEW.stakeholder FOR UPDATE;
That's it: change only this line and you are done.
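With the corrected PERFORM, the row lock on the stakeholder is held until the transaction ends, so the read-max-then-insert steps for one stakeholder run one at a time. Below is a rough Python model. It assumes a `threading.Lock` per stakeholder stands in for the FOR UPDATE row lock and that each `insert()` call is one whole transaction. `FileTable` is a hypothetical name, and this is not how PostgreSQL implements locking.

```python
import threading
from collections import defaultdict


class FileTable:
    """Toy in-memory stand-in for the `file` table (not PostgreSQL)."""

    def __init__(self):
        self.rows = []  # (filesequence, stakeholder) pairs
        self._rows_guard = threading.Lock()  # protects the list itself
        self._group_locks = defaultdict(threading.Lock)  # one lock per stakeholder
        self._locks_guard = threading.Lock()  # protects the lock dictionary

    def _lock_for(self, stakeholder):
        with self._locks_guard:
            return self._group_locks[stakeholder]

    def insert(self, stakeholder):
        # Models "PERFORM id FROM stakeholder WHERE id = ... FOR UPDATE": the
        # stakeholder's lock is held across read-max and insert, i.e. until "commit".
        with self._lock_for(stakeholder):
            with self._rows_guard:
                seqs = [seq for seq, sh in self.rows if sh == stakeholder]
            seq = (max(seqs) if seqs else 0) + 1
            with self._rows_guard:
                self.rows.append((seq, stakeholder))
            return seq


table = FileTable()
workers = [threading.Thread(target=table.insert, args=(5273,)) for _ in range(20)]
for w in workers:
    w.start()
for w in workers:
    w.join()

seqs = sorted(seq for seq, sh in table.rows if sh == 5273)
print(seqs == list(range(1, 21)))  # True
```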

Related

PostgreSQL concurrent check if row exists in table

Suppose I have this simple logic:
If a user has had no balance accrual before (accruals are recorded in the accruals table), we must add $100 to his balance:
START TRANSACTION;
DO LANGUAGE plpgsql $$
DECLARE _accrual accruals;
BEGIN
--LOCK TABLE accruals; -- label A
SELECT * INTO _accrual from accruals WHERE user_id = 1;
IF _accrual.accrual_id IS NOT NULL THEN
RAISE SQLSTATE '22023';
END IF;
UPDATE users SET balance = balance + 100 WHERE user_id = 1;
INSERT INTO accruals (user_id, amount) VALUES (1, 100);
END
$$;
COMMIT;
The problem with this transaction is that it is not concurrency-safe.
Running it in parallel can leave user_id=1 with balance=200 and two accruals recorded.
How I test concurrency:
1. I run in session 1: START TRANSACTION; LOCK TABLE accruals;
2. In session 2 and session 3 I run this transaction
3. In session 1: ROLLBACK
The question is: how do I make this 100% concurrency-safe and make sure the user gets $100 only once?
The only way I see is to lock the table (label A in the code sample).
Is there another way?
The simplest way is probably to use the serializable isolation level (by changing default_transaction_isolation). Then one of the processes should get something like "ERROR: could not serialize access due to concurrent update".
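Under SERIALIZABLE, the application has to retry whichever transaction gets the serialization error (SQLSTATE 40001). Below is a minimal retry loop in Python. `SerializationFailure` is a hypothetical stand-in for whatever exception your driver raises for that SQLSTATE. `grant_bonus` is a fake transaction that loses the race twice and then succeeds.

```python
class SerializationFailure(Exception):
    """Hypothetical stand-in for the driver error carrying SQLSTATE 40001."""


def run_serializable(txn, max_attempts=5):
    """Call txn(), retrying it from scratch after each serialization failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
    raise ValueError("max_attempts must be at least 1")


calls = {"n": 0}


def grant_bonus():
    """Fake transaction: fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationFailure("could not serialize access due to concurrent update")
    return "granted"


print(run_serializable(grant_bonus), calls["n"])  # granted 3
```

In the real case, the retried transaction starts with a fresh snapshot. Its first SELECT then sees the other transaction's committed accrual and raises 22023, which is the outcome you want: the user is credited only once.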
If you want to keep the isolation level at 'read committed', you can instead count the accruals at the end and raise an error if there is more than one:
START TRANSACTION;
DO LANGUAGE plpgsql $$
DECLARE _accrual accruals;
_count int;
BEGIN
SELECT * INTO _accrual from accruals WHERE user_id = 1;
IF _accrual.accrual_id IS NOT NULL THEN
RAISE SQLSTATE '22023';
END IF;
UPDATE users SET balance = balance + 100 WHERE user_id = 1;
INSERT INTO accruals (user_id, amount) VALUES (1, 100);
select count(*) into _count from accruals where user_id=1;
IF _count >1 THEN
RAISE SQLSTATE '22023';
END IF;
END
$$;
COMMIT;
This works because one process blocks the other on the UPDATE (assuming a non-zero number of rows is updated). By the time the first process commits and releases the blocked one, its inserted row is visible to the other process's final count.
Strictly speaking, the first check is then unnecessary, but you might want to keep it to avoid a lot of churn from rolled-back INSERTs and UPDATEs.
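The reasoning above can be modelled with two threads. A lock stands in for the row lock that UPDATE users takes, and the count after the INSERT sees the other transaction's row only once that transaction has committed. This is a toy Python model of READ COMMITTED, not PostgreSQL. `Store` and `grant_once` are hypothetical names, and the optional first check is left out.

```python
import threading


class Store:
    """Toy model of the users/accruals tables for user_id = 1 (not PostgreSQL)."""

    def __init__(self):
        self.balance = 0
        self.accruals = []  # committed accrual rows: (user_id, amount)
        self.user_row_lock = threading.Lock()  # models the row lock taken by UPDATE users


def grant_once(store, outcomes):
    # UPDATE users ... WHERE user_id = 1 blocks while another transaction holds the row.
    with store.user_row_lock:
        pending = [(1, 100)]  # this transaction's own, not yet committed INSERT
        # A new statement under READ COMMITTED sees committed rows plus its own rows.
        count = sum(1 for user_id, _ in store.accruals if user_id == 1) + len(pending)
        if count > 1:
            outcomes.append("rolled back")  # RAISE: nothing is published
            return
        store.balance += 100  # COMMIT publishes the UPDATE and the INSERT ...
        store.accruals.extend(pending)
        outcomes.append("committed")  # ... and leaving the block releases the row lock


store, outcomes = Store(), []
threads = [threading.Thread(target=grant_once, args=(store, outcomes)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(outcomes), store.balance, len(store.accruals))
# ['committed', 'rolled back'] 100 1
```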

How to check if a user has data in certain tables

The approach I'm taking is to add boolean columns to the user table, one per table: true means that table has data for the user, false means it has none. Now I'm stuck because I don't know how to create the triggers, or rather the procedures the triggers call.
So my logic is to have a trigger for each table:
CREATE TRIGGER check_sales_trigger
AFTER INSERT OR DELETE
ON sales
FOR EACH ROW
EXECUTE PROCEDURE check_sales_table();
Then, create a procedure that updates the boolean column on the user table for each table. So basically I need help creating the procedure.
FYI, each client has his own db.
The functions (plural) need to handle a) a new order for a user, b) a user id change on an existing order, and c) a deleted order. Writing triggers isn't hard; it just takes some reading of the manual. No ifs, no buts, no exceptions.
Since this is your first time, here is an example of the most complicated one to get you started. It is the update case, which can deadlock if poorly written; updating the two users rows in ascending id order avoids that:
create function check_sales_table__update() returns trigger as $$
begin
if new.user_id < old.user_id then
update users
set has_sales = true
where id = new.user_id;
update users
set has_sales = exists (select 1 from sales where user_id = old.user_id)
where id = old.user_id;
elsif old.user_id < new.user_id then
update users
set has_sales = exists (select 1 from sales where user_id = old.user_id)
where id = old.user_id;
update users
set has_sales = true
where id = new.user_id;
end if;
return null;
end;
$$ language plpgsql;
(The above assumes sales.user_id is NOT NULL, of course.)
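For completeness, here is a sketch of all three cases (insert, user id change, delete) as triggers. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, so the trigger syntax is SQLite's, and the table layout (`users.has_sales`, `sales.user_id`) is assumed. SQLite allows only one writer at a time, so the lock-ordering trick in the PL/pgSQL function above is not needed there. Keep it in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, has_sales INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE sales (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);

    -- a) new order: its user certainly has sales now
    CREATE TRIGGER sales_after_insert AFTER INSERT ON sales
    BEGIN
        UPDATE users SET has_sales = 1 WHERE id = NEW.user_id;
    END;

    -- b) user id changed: the new user has sales, the old one is re-checked
    CREATE TRIGGER sales_after_update AFTER UPDATE OF user_id ON sales
    WHEN OLD.user_id <> NEW.user_id
    BEGIN
        UPDATE users SET has_sales = 1 WHERE id = NEW.user_id;
        UPDATE users SET has_sales = EXISTS (SELECT 1 FROM sales WHERE user_id = OLD.user_id)
        WHERE id = OLD.user_id;
    END;

    -- c) deleted order: re-check whether the user has any sales left
    CREATE TRIGGER sales_after_delete AFTER DELETE ON sales
    BEGIN
        UPDATE users SET has_sales = EXISTS (SELECT 1 FROM sales WHERE user_id = OLD.user_id)
        WHERE id = OLD.user_id;
    END;
""")


def flags():
    """Current has_sales flag per user id."""
    return dict(conn.execute("SELECT id, has_sales FROM users ORDER BY id"))


conn.executemany("INSERT INTO users (id) VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO sales (id, user_id) VALUES (10, 1)")
after_insert = flags()  # {1: 1, 2: 0}
conn.execute("UPDATE sales SET user_id = 2 WHERE id = 10")
after_move = flags()  # {1: 0, 2: 1}
conn.execute("DELETE FROM sales WHERE id = 10")
after_delete = flags()  # {1: 0, 2: 0}
print(after_insert, after_move, after_delete)
```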

Using the now() function and executing triggers

I am trying to create a trigger function in PostgreSQL that checks for existing records with the same id before inserting or updating a record. If the function finds a record with the same id, that record's time_dead should be set. Let me explain with this example:
INSERT INTO persons (id, time_create, time_dead, name)
VALUES (1, 'now();', ' ', 'james');
I want to have a table like this:
id | time_create | time_dead | name
1  | 06:12       |           | henry
2  | 07:12       |           | muka
The row with id 1 has time_create 06:12 and a NULL time_dead, and so does the row with id 2. The next time I run the insert query with the same id but a different name, I should get a table like this:
id | time_create | time_dead | name
1  | 06:12       | 14:35     | henry
2  | 07:12       |           | muka
1  | 14:35       |           | waks
henry and waks share id 1. After the insert, henry's time_dead equals waks's time_create. If another entry were made with id 1, say for james, waks's time_dead would become equal to james's time_create. And so on.
So far my function looks like this, but it's not working:
CREATE FUNCTION tr_function() RETURNS trigger AS '
BEGIN
IF tg_op = ''UPDATE'' THEN
UPDATE persons
SET time_dead = NEW.time_create
Where
id = NEW.id
AND time_dead IS NULL
;
END IF;
RETURN new;
END
' LANGUAGE plpgsql;
CREATE TRIGGER sofgr BEFORE INSERT OR UPDATE
ON persons FOR each ROW
EXECUTE PROCEDURE tr_function();
When I run this, it says time_dead is not supposed to be null. Is there a way to write a trigger function that automatically fills in the time on insert or update and gives me results like the tables above when I run a SELECT query?
What am I doing wrong?
My two tables:
CREATE TABLE temporary_object
(
id integer NOT NULL,
time_create timestamp without time zone NOT NULL,
time_dead timestamp without time zone,
PRIMARY KEY (id, time_create)
);
CREATE TABLE persons
(
name text
)
INHERITS (temporary_object);
Trigger function
CREATE FUNCTION tr_function()
RETURNS trigger AS
$func$
BEGIN
UPDATE persons p
SET time_dead = NEW.time_create
WHERE p.id = NEW.id
AND p.time_dead IS NULL
AND p.name <> NEW.name;
RETURN NEW;
END
$func$ LANGUAGE plpgsql;
You were missing the INSERT case in your trigger function (IF tg_op = ''UPDATE''). But there is no need to check TG_OP to begin with, since the trigger only fires on INSERT OR UPDATE (assuming you don't use the same function in other triggers). So I removed the cruft.
Note that you don't have to escape single quotes inside a dollar-quoted string.
Also added:
AND p.name <> NEW.name
... to prevent INSERTs from terminating themselves instantly (and causing an infinite recursion). This assumes that a row can never succeed another row with the same name.
Aside: The setup is still not bullet-proof. UPDATEs could mess with your system. I could keep updating the id of a row, thereby terminating other rows but not leaving a successor. Consider disallowing updates on id. Of course, that would make the trigger ON UPDATE pointless. I doubt you need that to begin with.
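A quick way to check the intended behaviour is to replay the example rows from the question. The sketch below uses Python's `sqlite3` module as a stand-in for PostgreSQL. It has no table inheritance and stores the times as plain text for readability. The trigger mirrors the UPDATE in tr_function above but fires on INSERT only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (
        id          INTEGER NOT NULL,
        time_create TEXT    NOT NULL,  -- plain text here; timestamp in PostgreSQL
        time_dead   TEXT,
        name        TEXT,
        PRIMARY KEY (id, time_create)
    );
    -- Close the currently open row for this id before its successor goes in.
    CREATE TRIGGER persons_close_previous BEFORE INSERT ON persons
    BEGIN
        UPDATE persons SET time_dead = NEW.time_create
        WHERE id = NEW.id AND time_dead IS NULL AND name <> NEW.name;
    END;
""")

conn.executemany(
    "INSERT INTO persons (id, time_create, name) VALUES (?, ?, ?)",
    [(1, "06:12", "henry"), (2, "07:12", "muka"), (1, "14:35", "waks")],
)
rows = conn.execute(
    "SELECT id, time_create, time_dead, name FROM persons ORDER BY time_create"
).fetchall()
for row in rows:
    print(row)
# (1, '06:12', '14:35', 'henry')
# (2, '07:12', None, 'muka')
# (1, '14:35', None, 'waks')
```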
now() as DEFAULT
If you want to use now() as the default for time_create, just make it so. Read the manual about setting a column DEFAULT. Then omit time_create in INSERTs and it is filled in automatically.
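Here is an illustration of the DEFAULT approach, again with `sqlite3` standing in for PostgreSQL. SQLite spells the default CURRENT_TIMESTAMP and stores it as UTC text; in PostgreSQL you would use now() on a timestamp column. The table is a simplified, hypothetical version of persons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE persons (
        id          INTEGER NOT NULL,
        time_create TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- PostgreSQL: DEFAULT now()
        time_dead   TEXT,
        name        TEXT
    )
""")
# time_create is omitted, so the column DEFAULT fills it in.
conn.execute("INSERT INTO persons (id, name) VALUES (1, 'james')")
time_create = conn.execute("SELECT time_create FROM persons WHERE id = 1").fetchone()[0]
print(time_create)  # current UTC time as 'YYYY-MM-DD HH:MM:SS'
```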
If you want to force it (prevent everyone from entering a different value) create a trigger ON INSERT or add the following at the top of your trigger:
IF TG_OP = 'INSERT' THEN
NEW.time_create := now(); -- type timestamp or timestamptz!
END IF; -- no RETURN here, so the UPDATE below still runs
This assumes your misleadingly named column "time_create" is actually of a timestamp type.
That would force the current timestamp for new rows.

strange behavior in table column in postgres

I am currently using postgres 8.3. I have created a table that acts as a dirty flag table for members that exist in another table. I have applied triggers after insert or update on the members table that will insert/update a record on the modifications table with a value of true. The trigger seems to work, however I am noticing that something is flipping the boolean is_modified value. I have no idea how to go about trying to isolate what could be flipping it.
Trigger function:
BEGIN;
CREATE OR REPLACE FUNCTION set_member_as_modified() RETURNS TRIGGER AS $set_member_as_modified$
BEGIN
LOOP
-- first try to update the key
UPDATE member_modification SET is_modified = TRUE, updated = current_timestamp WHERE "memberID" = NEW."memberID";
IF FOUND THEN
RETURN NEW;
END IF;
--member doesn't exist in modification table, so insert them
-- if someone else inserts the same key concurrently, we get a unique-key failure
BEGIN
INSERT INTO member_modification("memberID",is_modified,updated) VALUES(NEW."memberID", TRUE,current_timestamp);
RETURN NEW;
EXCEPTION WHEN unique_violation THEN
-- do nothing, and loop to try the update again
END;
END LOOP;
END;
$set_member_as_modified$ LANGUAGE plpgsql;
COMMIT;
CREATE TRIGGER set_member_as_modified AFTER INSERT OR UPDATE ON members FOR EACH ROW EXECUTE PROCEDURE set_member_as_modified();
Here is the sql I run and the results:
$CREATE TRIGGER set_member_as_modified AFTER INSERT OR UPDATE ON members FOR EACH ROW EXECUTE PROCEDURE set_member_as_modified();
Results:
UPDATE 1
bluesky=# select * from member_modification;
-[ RECORD 1 ]---+---------------------------
modification_id | 14
is_modified | t
updated | 2011-05-26 09:49:47.992241
memberID | 182346
bluesky=# select * from member_modification;
-[ RECORD 1 ]---+---------------------------
modification_id | 14
is_modified | f
updated | 2011-05-26 09:49:47.992241
memberID | 182346
As you can see something flipped the is_modified value. Is there anything in postgres I can use to determine what queries/processes are acting on this table?
Are you sure you've posted everything needed? The two queries on member_modification suggest that a separate query is being run in between, which sets is_modified back to false.
You could add a text[] field to member_modification, e.g. query_trace text[] not null default '{}', and then a BEFORE INSERT OR UPDATE trigger for each row on that table which does something like:
NEW.query_trace := NEW.query_trace || current_query();
If current_query() is not available in 8.3, see this:
http://www.postgresql.org/docs/8.3/static/monitoring-stats.html
SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
pg_stat_get_backend_activity(s.backendid) AS current_query
FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;
You could then get the list of subsequent queries that affected it:
select query_trace[i] from member_modification, generate_series(1, array_length(query_trace, 1)) as i where "memberID" = 182346;

count number of rows to be affected before update in trigger

I want to know the number of rows that will be affected by an UPDATE query in a BEFORE ... FOR EACH STATEMENT trigger. Is that possible?
The problem is that I want to allow only queries that update at most 4 rows. If the affected row count is 5 or more, I want to raise an error.
I don't want to do this in application code because I need this check at the database level.
Is this at all possible?
Thanks in advance for any clues on that
Write a function that performs the update for you and raises an error, which rolls the update back, if too many rows were affected. Sorry for the poor formatting.
create function update_max(varchar, int)
RETURNS void AS
$BODY$
DECLARE
sql ALIAS FOR $1;
max ALIAS FOR $2;
rcount INT;
BEGIN
EXECUTE sql;
GET DIAGNOSTICS rcount = ROW_COUNT;
IF rcount > max THEN
--ROLLBACK;
RAISE EXCEPTION 'Too many rows affected (%).', rcount;
END IF;
--COMMIT;
END;
$BODY$ LANGUAGE plpgsql;
Then call it like
select update_max('update t1 set id=id+10 where id < 4', 3);
where the first parameter is your SQL statement and the second is the maximum number of rows.
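Here is the same idea sketched in Python with the `sqlite3` module standing in for PostgreSQL. The function runs the statement, reads the affected row count from `cursor.rowcount`, and rolls back if the count exceeds the limit. `update_max` here is a hypothetical Python counterpart of the PL/pgSQL function. Unlike the PL/pgSQL version, it commits or rolls back the connection's whole current transaction.

```python
import sqlite3


def update_max(conn, sql, max_rows, params=()):
    """Run one UPDATE; roll it back and raise if it touched more than max_rows rows."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(sql, params)
        if cur.rowcount > max_rows:
            raise ValueError(f"Too many rows affected ({cur.rowcount}).")
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER)")
conn.executemany("INSERT INTO t1 (id) VALUES (?)", [(i,) for i in range(1, 11)])
conn.commit()

first = update_max(conn, "UPDATE t1 SET id = id + 10 WHERE id < 4", 3)  # 3 rows: allowed
message = ""
try:
    update_max(conn, "UPDATE t1 SET id = id + 10 WHERE id < 10", 3)  # 6 rows: rejected
except ValueError as exc:
    message = str(exc)
moved = conn.execute("SELECT count(*) FROM t1 WHERE id > 10").fetchone()[0]
print(first, message, moved)  # 3 Too many rows affected (6). 3
```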
Simon had a good idea but his implementation is unnecessarily complicated. This is my proposition:
create or replace function trg_check_max_4()
returns trigger as $$
begin
perform true from pg_class
where relname='check_max_4' and relnamespace=pg_my_temp_schema();
if not FOUND then
create temporary table check_max_4
(value int check (value<=4))
on commit drop;
insert into check_max_4 values (0);
end if;
update check_max_4 set value=value+1;
return new;
end; $$ language plpgsql;
I've created something like this:
begin;
create table test (
id integer
);
insert into test(id) select generate_series(1,100);
create or replace function trg_check_max_4_updated_records()
returns trigger as $$
declare
counter_ integer := 0;
tablename_ text := 'temptable';
begin
raise notice 'trigger fired';
select count(42) into counter_
from pg_catalog.pg_tables where tablename = tablename_;
if counter_ = 0 then
raise notice 'Creating table %', tablename_;
execute 'create temporary table ' || tablename_ || ' (counter integer) on commit drop';
execute 'insert into ' || tablename_ || ' (counter) values(1)';
execute 'select counter from ' || tablename_ into counter_;
raise notice 'Actual value for counter= [%]', counter_;
else
execute 'select counter from ' || tablename_ into counter_;
execute 'update ' || tablename_ || ' set counter = counter + 1';
raise notice 'updating';
execute 'select counter from ' || tablename_ into counter_;
raise notice 'Actual value for counter= [%]', counter_;
if counter_ > 4 then
raise exception 'Cannot change more than 4 rows in one transaction';
end if;
end if;
return new;
end; $$ language plpgsql;
create trigger trg_bu_test before
update on test
for each row
execute procedure trg_check_max_4_updated_records();
update test set id = 10 where id <= 1;
update test set id = 10 where id <= 2;
update test set id = 10 where id <= 3;
update test set id = 10 where id <= 4;
update test set id = 10 where id <= 5;
rollback;
The main idea is to have a BEFORE UPDATE FOR EACH ROW trigger that creates (if necessary) a temporary table, which is dropped at the end of the transaction. This table has just one row with one value: the number of rows updated in the current transaction. Each updated row increments the value, and if it exceeds 4, the transaction is aborted.
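Below is a sketch of the per-row counter idea using Python's `sqlite3` module as a stand-in. SQLite triggers cannot create tables, so this version assumes a permanent one-row `update_budget` table with a CHECK (value <= 4), similar to the check_max_4 table above. A hypothetical `run_in_transaction` helper resets the counter at the start of every transaction, and the BEFORE UPDATE trigger increments it once per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.executescript("""
    CREATE TABLE test (id INTEGER);
    -- Stand-in for the per-transaction temp table: one row, capped by a CHECK.
    CREATE TABLE update_budget (value INTEGER NOT NULL CHECK (value <= 4));
    INSERT INTO update_budget (value) VALUES (0);
    -- Fires once per updated row of test; the 5th row trips the CHECK.
    CREATE TRIGGER trg_bu_test BEFORE UPDATE ON test
    BEGIN
        UPDATE update_budget SET value = value + 1;
    END;
""")
conn.executemany("INSERT INTO test (id) VALUES (?)", [(i,) for i in range(1, 101)])


def run_in_transaction(conn, statements):
    """Run statements atomically with a fresh budget of 4 updated rows."""
    conn.execute("BEGIN")
    conn.execute("UPDATE update_budget SET value = 0")
    try:
        for sql in statements:
            conn.execute(sql)
    except sqlite3.IntegrityError:  # CHECK constraint failed inside the trigger
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


run_in_transaction(conn, ["UPDATE test SET id = id + 1000 WHERE id BETWEEN 1 AND 4"])  # 4 rows
rejected = False
try:
    run_in_transaction(conn, [
        "UPDATE test SET id = id + 1000 WHERE id BETWEEN 11 AND 12",  # 2 rows
        "UPDATE test SET id = id + 1000 WHERE id BETWEEN 13 AND 15",  # 3 more: 5 in total
    ])
except sqlite3.IntegrityError:
    rejected = True
moved = conn.execute("SELECT count(*) FROM test WHERE id > 1000").fetchone()[0]
print(rejected, moved)  # True 4
```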
But I think this is the wrong solution to your problem. What stops someone from running such a query twice, in two transactions, and changing 8 rows? And what about deleting rows or truncating the table?
PostgreSQL has two types of triggers: row and statement triggers. Row triggers only work within the context of a row so you can't use those. Unfortunately, "before" statement triggers don't see what kind of change is about to take place so I don't believe you can use those, either.
Based on that, I would say it's unlikely you'll be able to build that kind of protection into the database using triggers, unless you don't mind using an "after" trigger and rolling back the transaction if the condition isn't satisfied. Wouldn't mind being proved wrong. :)
Have a look at using Serializable Isolation Level. I believe this will give you a consistent view of the database data within your transaction. Then you can use option #1 that MusiGenesis mentioned, without the timing vulnerability. Test it of course to validate.
I've never worked with postgresql, so my answer may not apply. In SQL Server, your trigger can call a stored procedure which would do one of two things:
1. Perform a SELECT COUNT(*) to determine the number of records that will be affected by the UPDATE, and then only execute the UPDATE if the count is 4 or less.
2. Perform the UPDATE within a transaction, and only commit the transaction if the returned number of rows affected is 4 or less.
No. 1 is timing-vulnerable (the number of records affected by the UPDATE may change between the COUNT(*) check and the actual UPDATE). No. 2 is fairly inefficient if there are many cases where more than 4 rows get updated.