Lock row, release later - postgresql

I'm trying to understand how to lock a row, and only release that lock later.
I have a table like this:
create table testTable (Name varchar(100));
Some test data
insert into testTable (name) select 'Bob';
insert into testTable (name) select 'John';
insert into testTable (name) select 'Steve';
Now, I want to select one of those rows and prevent other queries from seeing it. I achieve that like this:
begin transaction;
select * from testTable where name = 'Bob' for update;
In another window, I do this:
select * from testTable for update skip locked;
Great, I don't see 'Bob' in that result set. Now I want to do something with the retrieved row (Bob), and after I have done my work, I want to release that row again. The simple answer would be to do:
commit transaction
However, I am running multiple transactions on the same connection, so I can't just begin and commit transactions all over the show. Ideally I would like to have a "named" transaction, something like:
begin transaction 'myTransaction';
select * from testTable where name = 'Bob' for update;
-- do stuff with the data outside SQL, then later call ...
commit transaction 'myTransaction';
But Postgres doesn't support that. I have found "prepare transaction", but that seems to be a pear-shaped path I don't want to go down, especially as those transactions even persist through restarts.
Is there any way I can keep a reference to commit/rollback a specific transaction?

You can have only one open transaction per database session, so the question as such is moot.
But I assume that you do not really want to run a transaction, you want to block access to a certain row for a while.
It is usually not a good idea to use regular database locks for such a purpose (the exception is advisory locks, which serve exactly that purpose, but are not tied to table rows). The problem is that long database transactions keep autovacuum from doing its job.
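For completeness, here is a minimal sketch of the advisory-lock alternative; the key 42 is an arbitrary example, in practice you would derive it from the row's id:
select pg_advisory_lock(42);   -- blocks until the lock is free
-- ... do your work, without holding a transaction open ...
select pg_advisory_unlock(42); -- release explicitly; no COMMIT needed
Session-level advisory locks survive across transactions on the same connection, which matches the "release later" requirement from the question.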
I recommend that you add a status column to the table and change the status rather than locking the row. That would serve the same purpose in a more natural fashion and make your problem go away.
If you are concerned that the status flag might not get cleared due to application logic problems, replace it with a visible_from column of type timestamp with time zone that initially contains -infinity. Instead of locking the row, set the value to current_timestamp + INTERVAL '5 minutes'. Only select rows that fulfill WHERE visible_from < current_timestamp. That way the “lock” will automatically expire after 5 minutes.
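A minimal sketch of that approach against the example table from the question:
alter table testTable add column visible_from timestamp with time zone not null default '-infinity';
-- "lock" Bob's row for five minutes:
update testTable set visible_from = current_timestamp + INTERVAL '5 minutes' where name = 'Bob';
-- other workers only see rows that are not "locked":
select * from testTable where visible_from < current_timestamp;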

Related

How to avoid a deadlock when deleting/updating the same record in Postgres

I ran into a scenario while working with Postgres.
We have one table with a primary key, and there are two concurrent processes: one can update a record, the other can delete it.
We are now facing a deadlock when the two processes update/delete the same record in the table.
I googled how to avoid the deadlock, and someone suggested using "SELECT FOR UPDATE".
Suppose there are the following two statements:
update table_A set name='aaaa' where cid=1;
delete from table_A where cid=1;
My question is,
(1) Do I need to add "SELECT FOR UPDATE" to both statements or just one statement in order to avoid deadlock?
(2) Could you give a complete example of how to add "SELECT FOR UPDATE"? I mean, what do the statements look like after you add it? I have never done this before and want to learn how.
SELECT ... FOR UPDATE locks the selected rows so that no other transaction can perform an update or a SELECT ... FOR UPDATE on these rows. Those transactions must wait until the transaction holding the first SELECT ... FOR UPDATE releases the lock on the rows again.
If SELECT ... FOR UPDATE is the first statement in all transactions, no deadlock can occur, because no transaction can already hold a lock on rows that another transaction will need later on.
So your two transactions should look like this:
BEGIN;
SELECT * FROM table_A WHERE cid = 1 FOR UPDATE;
-- some other statements
UPDATE table_A SET name = 'aaaa' WHERE cid = 1;
END;
and:
BEGIN;
SELECT * FROM table_A WHERE cid = 1 FOR UPDATE;
-- some other statements
DELETE FROM table_A WHERE cid = 1;
END;

How can I do a conditional insert in Postgres when concurrent inserts can create a conflict?

I am trying to write an experimentation framework where users can schedule experiments based on location ids and time.
My table schema looks like:
CREATE TABLE experiment (
id INT NOT NULL PRIMARY KEY,
name varchar(20) NOT NULL,
locationIds varchar[] NOT NULL,
timeStart timestamp NOT NULL,
timeEnd timestamp NOT NULL,
createdAt timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
updatedAt timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
)
Insert operations must satisfy the condition that the location(s) and time do not overlap with an already scheduled experiment.
I want to know how to avoid an inconsistent data state when there are two concurrent inserts whose location OR time overlaps.
Ideally I want one of the inserts to succeed, but I am fine if both fail and the application has to retry.
A few approaches I have considered:
APPROACH-1
Have an enable column that tells whether a certain entry is valid or not.
I insert the experiment schedule entry with enable=FALSE.
Then I check if there is any other entry which is enabled and overlaps with the current insert.
If there is such an entry, I do nothing and the experiment is not scheduled. Otherwise I update the entry to enable=TRUE.
Problem: if there is a concurrent conflicting insert, both will end up with enable=TRUE once both clear step 3.
I also considered the READ UNCOMMITTED transaction isolation level, but even then I can't tell the entries still in progress apart from the ones already set to enable=TRUE.
Then I thought, if I make enable an enum [IN_PROGRESS, ENABLED, DISABLED], the approach would look like this:
APPROACH-2
Have an enable column that tells whether a certain entry is IN_PROGRESS, ENABLED, or DISABLED.
I insert the experiment schedule entry with enable=IN_PROGRESS.
Then I check if there is any other entry with enable=ENABLED or enable=IN_PROGRESS that overlaps with the current insert.
If there is such an entry, I update my entry to enable=DISABLED and the experiment is not scheduled. Otherwise I update it to enable=ENABLED.
Problem: if there is a concurrent conflicting insert, both will end up with enable=DISABLED when both clear step 3 and see each other's overlapping entry.
If the transaction isolation level is READ COMMITTED, this only works if each step is its own transaction rather than the whole process being one transaction.
If the transaction isolation level were READ UNCOMMITTED, this could be done as one transaction, with the DISABLED state doubling as the rollback step.
APPROACH-3
Using a trigger-based solution: since I am using Postgres, I can add a trigger for each insert operation that checks for such an overlapping entry and, if there is none, sets enable=TRUE on the row.
CREATE OR REPLACE FUNCTION enable_if_unique()
RETURNS TRIGGER AS $$
BEGIN
  IF (TG_OP = 'INSERT') THEN
    -- enable the new row only if no enabled row overlaps it
    IF NOT EXISTS (
      SELECT 1
      FROM experiment
      WHERE enable = true
        AND locationIds && NEW.locationIds
        AND (NEW.timeStart, NEW.timeEnd) OVERLAPS (timeStart, timeEnd)
    ) THEN
      NEW.enable := true;
    END IF;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER enable_if_unique_trigger BEFORE INSERT ON experiment FOR EACH ROW EXECUTE PROCEDURE enable_if_unique();
I am not sure about Approach 3 because I feel it requires the trigger to act serially for each insert operation, so that one of the experiments is actually enabled while the rest of the overlapping ones are disabled.
APPROACH-4
From searching online for other possible solutions, I saw inserts done via a SELECT statement, with the WHERE clause adding the required condition.
INSERT INTO experiment (id, name, locationIds, timeStart, timeEnd)
SELECT 1, 'exp-1', ARRAY['123','234','345'], '2020-03-13 12:00:00', '2020-03-13 13:00:00' -- example end time
WHERE NOT EXISTS (
    SELECT 1
    FROM experiment
    WHERE enable = true
      AND locationIds && ARRAY['123','234','345']
      AND (timestamp '2020-03-13 12:00:00', timestamp '2020-03-13 13:00:00') OVERLAPS (timeStart, timeEnd)
);
I feel there is still a possible consistency issue, since neither of two concurrent operations will see the other's row in the SELECT that checks the constraint.
Final approach: APPROACH-2
I would like to know the following:
Which approach is best in terms of scalability and high throughput?
Which approach actually makes sure that data consistency is maintained?
Any other approach that I could have used but missed here?
I'm a newbie to Postgres, so I would appreciate examples or links.
As mentioned by @a_horse_with_no_name, we can use an exclusion constraint:
-- this prevents overlaps in the locationids AND the time range
alter table experiment
add constraint no_overlap
exclude using gist (locationids with &&, tsrange(timestart, timeend) with &&);
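A quick sketch of the constraint in action. One assumption up front: GiST has no built-in operator class for && on varchar[], so this sketch assumes integer location ids plus the intarray extension:
create extension if not exists intarray;
-- with locationids of type int[], a first insert succeeds:
insert into experiment (id, name, locationids, timestart, timeend)
values (1, 'exp-1', array[123,234,345], '2020-03-13 12:00:00', '2020-03-13 13:00:00');
-- a second insert with an overlapping location and time range is rejected:
insert into experiment (id, name, locationids, timestart, timeend)
values (2, 'exp-2', array[234], '2020-03-13 12:30:00', '2020-03-13 13:30:00');
-- ERROR:  conflicting key value violates exclusion constraint "no_overlap"
Unlike the trigger and insert-select approaches, the constraint is enforced by the database itself, so concurrent inserts cannot slip past it.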

Force a "lock" with Postgres and GO

I am new to Postgres so this may be obvious (or very difficult, I am not sure).
I would like to force a table or row to be "locked" for at least a few seconds at a time, which will cause a second operation to "wait".
I am using golang with "github.com/lib/pq" to interact with the database.
The reason I need this is that I am working on a project that monitors PostgreSQL. Thanks for any help.
You can use select ... for update to lock a row or rows for the length of the transaction.
Basically, it's like:
begin;
select * from foo where quatloos = 100 for update;
update foo set feens = feens + 1 where quatloos = 100;
commit;
This takes an exclusive row-level lock on the rows of foo where quatloos = 100. Once the select for update has run, any other transaction attempting to access those rows will be blocked until a commit or rollback is issued.
Ideally, these locks should be held for as short a time as possible.
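If you need the lock held for at least a few seconds, as the question asks, a minimal sketch is to keep the transaction open with pg_sleep, using the same example table:
begin;
select * from foo where quatloos = 100 for update;
select pg_sleep(5); -- hold the row lock for roughly five seconds
commit;
A second session issuing select ... for update (or an update) against the same rows will block until the commit.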
See: https://www.postgresql.org/docs/current/static/explicit-locking.html

How to ensure a unique number field with zero order

Here is a table that has fields id, id_user, order_id.
When creating a record, it is required to find the user's last number and insert the next one in sequence.
I wrote a stored procedure that takes the next order number for the user, but even that does not guarantee a unique order number.
CREATE OR REPLACE FUNCTION get_next_order()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $function$
DECLARE
next_order_num bigint;
BEGIN
select order_id + 1 INTO next_order_num
from payment_out
where payment_out.id_usr = NEW.id_usr
and payment_out.order_id is not null
order by payment_out.order_id desc
limit 1;
-- if no payment exists yet, return 1
NEW.order_id = coalesce(next_order_num, 1);
return NEW;
END;
$function$;
CREATE TRIGGER get_next_order
BEFORE INSERT
ON payment_out
FOR EACH ROW
EXECUTE PROCEDURE get_next_order();
How can I avoid duplicate order numbers?
For this to work in the presence of multiple concurrent transactions inserting orders for the same user, you need a lock on a particular record to make them wait and execute serially.
e.g., before the first SELECT, you might:
PERFORM 1 FROM "users" where id_user = NEW.id_user FOR UPDATE;
where you lock the parent "users" record that owns the orders.
Otherwise, multiple concurrent transactions could execute your procedure at the same time; they can't see each other's inserted values, so they'll pick the same numbers.
However, beware: a foreign key constraint already causes a SHARE lock to be taken on the users row when you insert into a table that references it. Your trigger will then try to upgrade that into an UPDATE lock, but multiple transactions may already hold the SHARE lock, so the upgrade will block. You'll end up with transactions all waiting for each other, until PostgreSQL kills all but one of them with a deadlock abort error. The only way to avoid this is for the application to SELECT 1 FROM users WHERE id_user = blahblah FOR UPDATE before it creates the orders for that user.
A variant is to keep a next_order_id field in users and do an UPDATE users SET next_order_id = next_order_id + 1 RETURNING next_order_id, and use the result of that to set the order ID. The same lock upgrade problem applies.
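A rough sketch of that variant, reusing the column names from the answer's example; the next_order_id column is hypothetical:
ALTER TABLE users ADD COLUMN next_order_id bigint NOT NULL DEFAULT 0;
-- inside the BEFORE INSERT trigger, instead of the SELECT:
UPDATE users
SET next_order_id = next_order_id + 1
WHERE id_user = NEW.id_user
RETURNING next_order_id INTO NEW.order_id;
The UPDATE itself takes the row lock on users, so concurrent inserts for the same user serialize on it.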

SQL Isolation levels or locks in large procedures

I have big stored procedures that handle user actions.
They consist of multiple select statements. These are filtered, and most of the time they return only one row. The selects are copied into temp tables or otherwise evaluated.
Finally, a MERGE statement makes the needed changes in the DB.
All is encapsulated in a transaction.
I have concurrent input from users, and the rows returned by the select statements should be locked to maintain data integrity.
How can I lock the selected Rows of all select statements, so that they aren't updated through other transactions while the current transaction is in process?
Does a table hint combination of ROWLOCK and HOLDLOCK work in a way that only the selected rows are locked, or are the whole tables locked because of the HOLDLOCK?
SELECT *
FROM dbo.Test WITH (ROWLOCK, HOLDLOCK)
WHERE id = @testId
Can I instead use
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
right after the start of the transaction? Or does this lock whole tables?
I am using SQL2008 R2, but would also be interested if things work differently in SQL2012.
PS: I just read about the table hints UPDLOCK and SERIALIZABLE. UPDLOCK seems to be a solution to lock only one row, and it seems as if UPDLOCK always takes locks, whereas ROWLOCK only specifies that locks are row-based if locks are taken at all. I am still confused about the best way to solve this...
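For reference, a select combining those hints against the same hypothetical dbo.Test table might look like this:
SELECT *
FROM dbo.Test WITH (UPDLOCK, ROWLOCK)
WHERE id = @testId
UPDLOCK takes update locks immediately and holds them until the end of the transaction, so a second transaction issuing the same select on the same row blocks until the first one commits.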
Changing the isolation level fixed the problem (and locked on row level):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
Here is how I tested it.
I created a statement in a new query window in SQL Server Management Studio:
begin tran
select
*
into #message
from dbo.MessageBody
where MessageBody.headerId = 28
WAITFOR DELAY '0:00:05'
update dbo.MessageBody set [message] = 'message1'
where headerId = (select headerId from #message)
select * from dbo.MessageBody where headerId = (select headerId from #message)
drop table #message
commit tran
While executing this statement (which takes at least 5 seconds due to the delay), I ran the second query in another window:
begin tran
select
*
into #message
from dbo.MessageBody
where MessageBody.headerId = 28
update dbo.MessageBody set [message] = 'message2'
where headerId = (select headerId from #message)
select * from dbo.MessageBody where headerId = (select headerId from #message)
drop table #message
commit tran
and I was rather surprised that it executed instantaneously. This was due to the default SQL Server isolation level "Read Committed" (http://technet.microsoft.com/en-us/library/ms173763.aspx). Since the update in the first script happens after the delay, there are no uncommitted changes yet while the second script runs, so row 28 is read and updated.
Changing the isolation level to SERIALIZABLE prevented this, but it also prevented concurrency: both scripts were executed consecutively.
That was OK, since both scripts read and changed the same row (headerId = 28). After changing headerId to another value in the second script, the statements executed in parallel. So the lock from SERIALIZABLE seems to be at row level.
Adding the table hint
WITH (SERIALIZABLE)
to the first select of the first statement also prevents further reads of the selected row.