PostgreSQL read commit on different transactions - postgresql

I was running some tests to better understanding read commits for postgresql.
I have two transactions running in parallel:
-- transaction 1
begin;
select id from item order by id asc FETCH FIRST 500 ROWS ONLY;
select pg_sleep(10);
commit;
--transaction 2
begin;
select id from item order by id asc FETCH FIRST 500 ROWS ONLY;
commit;
The first transaction will select first 500 ids and then hold the id by sleeping 10s
The second transaction will in the mean while querying for first 500 rows in the table.
Based my understanding of read commits, first transaction will select 1 to 500 records and second transaction will select 501 to 1000 records.
But the actual result is that both two transactions select 1 to 500 records.
I will be really appreciated if someone can point out which part is wrong. Thanks

You are misinterpreting the meaning of read committed. It means that a transaction cannot see (select) updates that are not committed. Try the following:
create table read_create_test( id integer generated always as identity
, cola text
) ;
insert into read_create_test(cola)
select 'item-' || to_char(n,'fm000')
from generate_series(1,50) gs(n);
-- transaction 1
do $$
max_val integer;
begin
insert into read_create_test(cola)
select 'item2-' || to_char(n+100,'fm000')
from generate_series(1,50) gs(n);
select max(id)
into max_val
from read_create_test;
raise notice 'Transaction 1 Max id: %',max_val;
select pg_sleep(30); -- make sure 2nd transaction has time to start
commit;
end;
$$;
-- transaction 2 (run after transaction 1 begins but before it ends)
do $$
max_val integer;
begin
select max(id)
into max_val
from read_create_test;
raise notice 'Transaction 2 Max id: %',max_val;
end;
$$;
-- transaction 3 (run after transaction 1 ends)
do $$
max_val integer;
begin
select max(id)
into max_val
from read_create_test;
raise notice 'Transaction 3 Max id: %',max_val;
end;
$$;
Analyze the results keeping in mind that A transaction cannot see uncommitted DML.

Related

postgresql message_id auto increment for each user

I am building chatroom using Go,PostgreSQL.
I have this table -> messages
id | user | message_id | message
I want for each user, message_id Start from 1 and And increase automatically.
for example :
id | user | message_id | message
1 | one | 1 | Hello
2 | two | 1 | Hi!
3 | one | 2 | How are you?
4 | one | 3 | :)
Thanks in advance.
Without duplicate values and to safely increase message_id field we use SELECT FOR UPDATE command.
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row.
Now let's write a simple function for inserting messages:
CREATE OR REPLACE FUNCTION insert_message(p_user character varying, p_message text)
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE
v_new_id integer;
v_id integer;
begin
-- selecting user last inserting message and get last message_id, and blocking this record for update command
select mm.id, mm.message_id into v_id, v_new_id
from messages mm
where
mm."user" = p_user
order by
mm.message_id desc
limit 1
for update;
if (v_new_id is null) then
v_new_id = 0;
end if;
insert into messages
(
"user",
message_id,
message_text
) values
(
p_user,
v_new_id + 1,
p_message
);
-- after inserting message we must unblock record
if (v_new_id > 0) then
update messages
set
updated_date = now()
where id = v_id;
end if;
END;
$function$
;
As you can see from the code for increasing message_id we must filter records by user and then the ordered message_id field descending. Given that in future this table will have million records then performance will be bad. For good performancing let's write new function:
CREATE OR REPLACE FUNCTION insert_message(p_user character varying, p_message text)
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE
v_new_id integer;
v_id integer;
begin
-- in users table add new last_id field, for storing last message_id value by user
-- and then block this record
select last_id into v_new_id from users
where "user" = p_user
for update;
insert into messages
(
"user",
message_id,
message_text
) values
(
p_user,
v_new_id + 1,
p_message
);
-- unblocking user record
update users
set
last_id = v_new_id + 1
where "user" = p_user;
END;
$function$
;
This code was simple and high-performance.

Is it worth Parallel/Concurrent INSERT INTO... (SELECT...) to the same Table in Postgres?

I was attempting an INSERT INTO.... ( SELECT... ) (inserting a batch of rows from SELECT... subquery), onto the same table in my database. For the most part it was working, however, I did see a "Deadlock" exception logged every now and then. Does it make sense to do this or is there a way to avoid a deadlock scenario? On a high-level, my queries both resemble this structure:
CREATE OR REPLACE PROCEDURE myConcurrentProc() LANGUAGE plpgsql
AS $procedure$
DECLARE
BEGIN
LOOP
EXIT WHEN row_count = 0
WITH cte AS (SELECT *
FROM TableA tbla
WHERE EXISTS (SELECT 1 FROM TableB tblb WHERE tblb.id = tbla.id)
INSERT INTO concurrent_table (SELECT id FROM cte);
COMMIT;
UPDATE log_tbl
SET status = 'FINISHED',
WHERE job_name = 'tblA_and_B_job';
END LOOP;
END
$procedure$;
And the other script that runs in parallel and INSERTS... also to the same table is also basically:
CREATE OR REPLACE PROCEDURE myConcurrentProc() LANGUAGE plpgsql
AS $procedure$
DECLARE
BEGIN
LOOP
EXIT WHEN row_count = 0
WITH cte AS (SELECT *
FROM TableC c
WHERE EXISTS (SELECT 1 FROM TableD d WHERE d.id = tblc.id)
INSERT INTO concurrent_table (SELECT id FROM cte);
COMMIT;
UPDATE log_tbl
SET status = 'FINISHED',
WHERE job_name = 'tbl_C_and_D_job';
END LOOP;
END
$procedure$;
So you can see I'm querying two different tables in each script, however inserting into the same some_table. I also have the UPDATE... statement that writes to a log table so I suppose that could also cause issues. Is there any way to use BEGIN... END here and COMMIT to avoid any deadlock/concurrency issues or should I just create a 2nd table to hold the "tbl_C_and_D_job" data?

PostgreSQL concurrent check if row exists in table

Suppose I have simple logic:
If user had no balance accrual earlier (which is recorded in accruals table), we must give him 100$ to balance:
START TRANSACTION;
DO LANGUAGE plpgsql $$
DECLARE _accrual accruals;
BEGIN
--LOCK TABLE accruals; -- label A
SELECT * INTO _accrual from accruals WHERE user_id = 1;
IF _accrual.accrual_id IS NOT NULL THEN
RAISE SQLSTATE '22023';
END IF;
UPDATE users SET balance = balance + 100 WHERE user_id = 1;
INSERT INTO accruals (user_id, amount) VALUES (1, 100);
END
$$;
COMMIT;
The problem of this transaction is it's not concurrent.
Running this transaction in parrallel results getting user_id=1 with balance=200 and 2 accruals recorded.
How do I test concurrency ?
1. I run in session 1: START TRANSACTION; LOCK TABLE accruals;
2. In session 2 and session 3 I run this transaction
3. In session 1: ROLLBACK
The question is: How do I make this 100% concurrent and make sure user will have 100$ only once.
The only way I see is to lock the table (label A in code sample)
But do I have another way ?
The simplest way is probably to use the serializable isolation level (by changing default_transaction_isolation). Then one of the processes should get something like "ERROR: could not serialize access due to concurrent update"
If you want to keep the isolation level at 'read committed', then you can just count accruals at the end and throw an error:
START TRANSACTION;
DO LANGUAGE plpgsql $$
DECLARE _accrual accruals;
_count int;
BEGIN
SELECT * INTO _accrual from accruals WHERE user_id = 1;
IF _accrual.accrual_id IS NOT NULL THEN
RAISE SQLSTATE '22023';
END IF;
UPDATE users SET balance = balance + 100 WHERE user_id = 1;
INSERT INTO accruals (user_id, amount) VALUES (1, 100);
select count(*) into _count from accruals where user_id=1;
IF _count >1 THEN
RAISE SQLSTATE '22023';
END IF;
END
$$;
COMMIT;
This works because one process will block the other on the UPDATE (assuming non-zero number of rows get updated), and by the time one process commits to release the blocked process, its inserted row will be visible to the other one.
Formally there is then no need for the first check, but if you don't want a lot of churn due to rolled back INSERT and UPDATE, you might want to keep it.

Is there a similar function in postgresql for mysql's SQL_CALC_FOUND_ROWS?

everybody using mysql knows:
SELECT SQL_CALC_FOUND_ROWS ..... FROM table WHERE ... LIMIT 5, 10;
and right after run this :
SELECT FOUND_ROWS();
how do i do this in postrgesql? so far, i found only ways where i have to send the query twice...
No, there is not (at least not as of July 2007). I'm afraid you'll have to resort to:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT id, username, title, date FROM posts ORDER BY date DESC LIMIT 20;
SELECT count(id, username, title, date) AS total FROM posts;
END;
The isolation level needs to be SERIALIZABLE to ensure that the query does not see concurrent updates between the SELECT statements.
Another option you have, though, is to use a trigger to count rows as they're INSERTed or DELETEd. Suppose you have the following table:
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
poster TEXT,
title TEXT,
time TIMESTAMPTZ DEFAULT now()
);
INSERT INTO posts (poster, title) VALUES ('Alice', 'Post 1');
INSERT INTO posts (poster, title) VALUES ('Bob', 'Post 2');
INSERT INTO posts (poster, title) VALUES ('Charlie', 'Post 3');
Then, perform the following to create a table called post_count that contains a running count of the number of rows in posts:
-- Don't let any new posts be added while we're setting up the counter.
BEGIN;
LOCK TABLE posts;
-- Create and initialize our post_count table.
SELECT count(*) INTO TABLE post_count FROM posts;
-- Create the trigger function.
CREATE FUNCTION post_added_or_removed() RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'DELETE' THEN
UPDATE post_count SET count = count - 1;
ELSIF TG_OP = 'INSERT' THEN
UPDATE post_count SET count = count + 1;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
-- Call the trigger function any time a row is inserted.
CREATE TRIGGER post_added_or_removed_tgr
AFTER INSERT OR DELETE
ON posts
FOR EACH ROW
EXECUTE PROCEDURE post_added_or_removed();
COMMIT;
Note that this maintains a running count of all of the rows in posts. To keep a running count of certain rows, you'll have to tweak it:
SELECT count(*) INTO TABLE post_count FROM posts WHERE poster <> 'Bob';
CREATE FUNCTION post_added_or_removed() RETURNS TRIGGER AS $$
BEGIN
-- The IF statements are nested because OR does not short circuit.
IF TG_OP = 'DELETE' THEN
IF OLD.poster <> 'Bob' THEN
UPDATE post_count SET count = count - 1;
END IF;
ELSIF TG_OP = 'INSERT' THEN
IF NEW.poster <> 'Bob' THEN
UPDATE post_count SET count = count + 1;
END IF;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
There is a simple way, but keep in mind, that following COUNT(*) aggr function will be applied to all rows returned after where and before limit/offset (may be costy)
SELECT
id,
"count" (*) OVER () AS cnt
FROM
objects
WHERE
id > 2
OFFSET 50
LIMIT 5
No, PostgreSQL doesn't try to count all relevant results when you only need 10 results. You need a seperate COUNT to count all results.

count number of rows to be affected before update in trigger

I want to know number of rows that will be affected by UPDATE query in BEFORE per statement trigger . Is that possible?
The problem is that i want to allow only queries that will update up to 4 rows. If affected rows count is 5 or more i want to raise error.
I don't want to do this in code because i need this check on db level.
Is this at all possible?
Thanks in advance for any clues on that
Write a function that updates the rows for you or performs a rollback. Sorry for poor style formatting.
create function update_max(varchar, int)
RETURNS void AS
$BODY$
DECLARE
sql ALIAS FOR $1;
max ALIAS FOR $2;
rcount INT;
BEGIN
EXECUTE sql;
GET DIAGNOSTICS rcount = ROW_COUNT;
IF rcount > max THEN
--ROLLBACK;
RAISE EXCEPTION 'Too much rows affected (%).', rcount;
END IF;
--COMMIT;
END;
$BODY$ LANGUAGE plpgsql
Then call it like
select update_max('update t1 set id=id+10 where id < 4', 3);
where the first param ist your sql-Statement and the 2nd your max rows.
Simon had a good idea but his implementation is unnecessarily complicated. This is my proposition:
create or replace function trg_check_max_4()
returns trigger as $$
begin
perform true from pg_class
where relname='check_max_4' and relnamespace=pg_my_temp_schema();
if not FOUND then
create temporary table check_max_4
(value int check (value<=4))
on commit drop;
insert into check_max_4 values (0);
end if;
update check_max_4 set value=value+1;
return new;
end; $$ language plpgsql;
I've created something like this:
begin;
create table test (
id integer
);
insert into test(id) select generate_series(1,100);
create or replace function trg_check_max_4_updated_records()
returns trigger as $$
declare
counter_ integer := 0;
tablename_ text := 'temptable';
begin
raise notice 'trigger fired';
select count(42) into counter_
from pg_catalog.pg_tables where tablename = tablename_;
if counter_ = 0 then
raise notice 'Creating table %', tablename_;
execute 'create temporary table ' || tablename_ || ' (counter integer) on commit drop';
execute 'insert into ' || tablename_ || ' (counter) values(1)';
execute 'select counter from ' || tablename_ into counter_;
raise notice 'Actual value for counter= [%]', counter_;
else
execute 'select counter from ' || tablename_ into counter_;
execute 'update ' || tablename_ || ' set counter = counter + 1';
raise notice 'updating';
execute 'select counter from ' || tablename_ into counter_;
raise notice 'Actual value for counter= [%]', counter_;
if counter_ > 4 then
raise exception 'Cannot change more than 4 rows in one trancation';
end if;
end if;
return new;
end; $$ language plpgsql;
create trigger trg_bu_test before
update on test
for each row
execute procedure trg_check_max_4_updated_records();
update test set id = 10 where id <= 1;
update test set id = 10 where id <= 2;
update test set id = 10 where id <= 3;
update test set id = 10 where id <= 4;
update test set id = 10 where id <= 5;
rollback;
The main idea is to have a trigger on 'before update for each row' that creates (if necessary) a temporary table (that is dropped at the end of transaction). In this table there is just one row with one value, that is the number of updated rows in current transaction. For each update the value is incremented. If the value is bigger than 4, the transaction is stopped.
But I think that this is a wrong solution for your problem. What's a problem to run such wrong query that you've written about, twice, so you'll have 8 rows changed. What about deletion rows or truncating them?
PostgreSQL has two types of triggers: row and statement triggers. Row triggers only work within the context of a row so you can't use those. Unfortunately, "before" statement triggers don't see what kind of change is about to take place so I don't believe you can use those, either.
Based on that, I would say it's unlikely you'll be able to build that kind of protection into the database using triggers, not unless you don't mind using an "after" trigger and rolling back the transaction if the condition isn't satisfied. Wouldn't mind being proved wrong. :)
Have a look at using Serializable Isolation Level. I believe this will give you a consistent view of the database data within your transaction. Then you can use option #1 that MusiGenesis mentioned, without the timing vulnerability. Test it of course to validate.
I've never worked with postgresql, so my answer may not apply. In SQL Server, your trigger can call a stored procedure which would do one of two things:
Perform a SELECT COUNT(*) to determine the number of records that will be affected by the UPDATE, and then only execute the UPDATE if the count is 4 or less
Perform the UPDATE within a transaction, and only commit the transaction if the returned number of rows affected is 4 or less
No. 1 is timing vulnerable (the number of records affected by the UPDATE may change between the COUNT(*) check and the actual UPDATE. No. 2 is pretty inefficient, if there are many cases where the number of rows updated is greater than 4.