Enforce atomic operations vs. locks - T-SQL

I have a database model that works similarly to a banking account (one table for operations, and a trigger to update the balance). I'm currently using SQL Server 2008 R2.
TABLE OPERATIONS
----------------
VL_CREDIT decimal(10,2)
VL_DEBIT decimal(10,2)
TABLE BALANCE
-------------
DT_OPERATION datetime
VL_CURRENT decimal(10,2)
PROCEDURE INSERT_OPERATION
--------------------------
GET LAST BALANCE BY DATE
CHECK IF VALUE OF OPERATION > BALANCE
IF > RETURN ERROR
ELSE INSERT INTO OPERATION(...,....)
The issue I have is the following:
The procedure to insert the operation has to check the balance to see if there's money available before inserting the operation, so the balance never gets negative. If there's no balance, I return some code to tell the user the balance is not enough.
My concern is: If this procedure gets called multiple times in a row, how can I guarantee that it's atomic?
I have some ideas, but I am not sure which of them would guarantee it:
BEGIN TRANSACTION on the OPERATION PROCEDURE
Some sort of lock on selecting the BALANCE table, but it must hold until the end of procedure execution
Can you suggest some approach to guarantee that? Thanks in advance.
UPDATE
I read on MSDN (http://technet.microsoft.com/en-us/library/ms187373.aspx) that if my procedure has BEGIN/END TRANSACTION, and the SELECT on table BALANCE has WITH (TABLOCKX), it locks the table until the end of the transaction, so if a subsequent call to this procedure is made during the execution of the first, it will wait, which guarantees that the value read is always the latest one. Will it work? And if so, is it the best practice?
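For reference, a minimal sketch of what such a procedure could look like, using the table and column names from the question. The parameter name, the return codes, and the UPDLOCK/HOLDLOCK hints (a narrower alternative to TABLOCKX that is still held until the end of the transaction) are my assumptions, not a definitive implementation:
CREATE PROCEDURE INSERT_OPERATION
    @VL_DEBIT decimal(10,2)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;
    BEGIN TRANSACTION;

    DECLARE @balance decimal(10,2);

    -- the lock hint keeps the balance row locked until COMMIT/ROLLBACK,
    -- so a concurrent call waits here instead of reading a stale balance
    SELECT TOP (1) @balance = VL_CURRENT
    FROM BALANCE WITH (UPDLOCK, HOLDLOCK)
    ORDER BY DT_OPERATION DESC;

    IF @balance < @VL_DEBIT
    BEGIN
        ROLLBACK TRANSACTION;
        RETURN -1;  -- "insufficient balance" code (assumed)
    END

    INSERT INTO OPERATIONS (VL_CREDIT, VL_DEBIT) VALUES (0, @VL_DEBIT);
    -- the trigger on OPERATIONS updates BALANCE, as described in the question

    COMMIT TRANSACTION;
    RETURN 0;
END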

If you're amenable to changing your table structures, I'd build it this way:
create table Transactions (
SequenceNo int not null,
OpeningBalance decimal(38,4) not null,
Amount decimal(38,4) not null,
ClosingBalance as CONVERT(decimal(38,4),OpeningBalance + Amount) persisted,
PrevSequenceNo as CASE WHEN SequenceNo > 1 THEN SequenceNo - 1 END persisted,
constraint CK_Transaction_Sequence CHECK (SequenceNo > 0),
constraint PK_Transaction_Sequence PRIMARY KEY (SequenceNo),
constraint CK_Transaction_NotNegative CHECK (OpeningBalance + Amount >= 0),
constraint UQ_Transaction_BalanceCheck UNIQUE (SequenceNo, ClosingBalance),
constraint FK_Transaction_BalanceCheck FOREIGN KEY
(PrevSequenceNo, OpeningBalance)
references Transactions
(SequenceNo,ClosingBalance)
/* Optional - another check that Transaction 1 has 0 amount and
0 opening balance, if required */
)
Here you just apply credits and debits as positive or negative values for Amount. The above structure is enough to enforce the "not going negative" requirement (via CK_Transaction_NotNegative), and it also ensures that you always know the current balance (by finding the row with the highest SequenceNo and taking its ClosingBalance value). Together, UQ_Transaction_BalanceCheck and FK_Transaction_BalanceCheck (and the computed columns) ensure that the entire sequence of transactions is valid, and PK_Transaction_Sequence keeps everything building in order.
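For example, reading the current balance against the table above could be as simple as this (a small illustrative sketch):
select top (1) ClosingBalance as CurrentBalance
from Transactions
order by SequenceNo desc;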
So, if we populate it with some data:
insert into Transactions (SequenceNo,OpeningBalance,Amount) values
(1,0.0,10.0),
(2,10.0,-5.50),
(3,4.50,2.75)
And now we can attempt an insert (this could be INSERT_PROCEDURE with @NewAmount passed as a parameter):
declare @NewAmount decimal(38,4)
set @NewAmount = -15.50
;With LastTransaction as (
select SequenceNo,ClosingBalance,
ROW_NUMBER() OVER (ORDER BY SequenceNo desc) as rn
from Transactions
)
insert into Transactions (SequenceNo,OpeningBalance,Amount)
select SequenceNo + 1, ClosingBalance, @NewAmount
from LastTransaction
where rn = 1
This insert fails because it would have caused the balance to go negative. But if @NewAmount were small enough, it would have succeeded. And if two inserts are attempted at "the same time" then either a) they're just far enough apart in reality that they both succeed, and the balances are all kept correct, or b) one of them will receive a PK violation error.
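To handle case (b), the caller can simply re-read the latest balance and retry. A rough sketch of what that might look like (the retry count and error handling are my choices; THROW needs SQL Server 2012+, on 2008 R2 you would re-raise with RAISERROR instead):
declare @NewAmount decimal(38,4) = -15.50;
declare @tries int = 0, @done bit = 0;

while @done = 0 and @tries < 3
begin
    begin try
        ;with LastTransaction as (
            select SequenceNo, ClosingBalance,
                   ROW_NUMBER() OVER (ORDER BY SequenceNo desc) as rn
            from Transactions
        )
        insert into Transactions (SequenceNo, OpeningBalance, Amount)
        select SequenceNo + 1, ClosingBalance, @NewAmount
        from LastTransaction
        where rn = 1;

        set @done = 1;                    -- insert succeeded
    end try
    begin catch
        if ERROR_NUMBER() = 2627          -- PK violation: another writer won the race
            set @tries = @tries + 1;      -- loop re-reads the latest balance and retries
        else
            throw;                        -- e.g. CK violation: balance would go negative
    end catch
end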


Best way to model state changes for point in time queries

I'm working on a system that needs to be able to find the "state" of an item at a particular time in history. The state is binary (either on or off). In this case it's to determine where to direct (to a particular "keyspace") a piece of timestamped data as determined by the timestamp of the data. I'm having a hard time deciding what the best way to model the data is.
Method 1 is to use the tstzrange with state being implied by the bounds of the range:
create extension btree_gist;
create table core.range_director (
range tstzrange,
directee_id text,
keyspace text,
-- allow a directee to be directed to multiple keyspaces at once
exclude using gist (directee_id with =, keyspace with =, range with &&)
);
insert into core.range_director values
('[2021-01-15 00:00:00 -0:00,2021-01-20 00:00:00 -0:00)', 'THING_ID', 'KEYSPACE_1'),
('[2021-01-15 00:00:00 -0:00,)', 'THING_ID', 'KEYSPACE_2');
select keyspace from core.range_director
where directee_id = 'THING_ID' and range_director.range @> '2021-01-15'::timestamptz;
-- returns KEYSPACE_1 and KEYSPACE_2
select keyspace from core.range_director
where directee_id = 'THING_ID' and range_director.range @> '2021-01-21'::timestamptz;
-- returns KEYSPACE_2
Method 2 is to have explicit state changes:
create table core.status_director (
status_time timestamptz,
status text,
directee_id text,
keyspace text
); -- not sure what pk to use for this method
insert into core.status_director values
('2021-01-15 00:00:00 -0:00','Open','THING_ID','KEYSPACE_1'),
('2021-01-20 00:00:00 -0:00','Closed','THING_ID','KEYSPACE_1'),
('2021-01-15 00:00:00 -0:00','Open','THING_ID','KEYSPACE_2');
select distinct on(keyspace) keyspace, status from core.status_director
where directee_id = 'THING_ID'
and status_time < '2021-01-16'
order by keyspace, status_time desc;
-- returns KEYSPACE_1:Open KEYSPACE_2:Open
select distinct on(keyspace) keyspace, status from core.status_director
where directee_id = 'THING_ID'
and status_time < '2021-01-21'
order by keyspace, status_time desc;
-- returns KEYSPACE_1:Closed, KEYSPACE_2:Open
-- so, client code has to ensure that it only directs to status=Open keyspaces
Maybe there are other methods that would work as well, but these two seem to make the most sense to me. The benefit of the first method is the really easy query, but the downside is that you have to update rows to close out a state, whereas in the second method you can just insert new states, which seems easier.
The table could conceivably grow into thousands or tens of thousands of rows, but will probably not grow into millions (but does the best method change depending on the expected row count?). I have a couple of similar tables with the same point-in-time "state" queries, so it's really important that I get the model for them right.
My instinct is to go with Method 1, but are there any footguns or performance considerations that I'm not thinking of that would push the use case towards Method 2 (or another method I haven't considered)?
No footguns with Method 1, just great big huge cannons. With that method, how do you determine the current status? You need to scan each status change and toggle the status for each one, or perhaps use something like count(*) % 2 (odd gives one state, even the other). What happens if any row gets deleted, or data is purged and you do not know how many state transitions there were? With Method 2 you retrieve the greatest date and directly obtain the status.
For myself I would do Method 3, that being Method 1 + Method 2. Yes, I would have a date range for the status and the status value itself. That gives me complex historical analysis, since I have the complete history, as well as direct access to the current status at any time.
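A rough sketch of what such a combined table could look like (the table name is made up; the schema and exclusion constraint are carried over from the question, and this is an illustration rather than a tested design):
create extension if not exists btree_gist;

create table core.status_range_director (
    directee_id text,
    keyspace    text,
    status      text,       -- explicit state, as in Method 2
    range       tstzrange,  -- period the state is valid for, as in Method 1
    exclude using gist (directee_id with =, keyspace with =, range with &&)
);

-- point-in-time lookups stay as simple as in Method 1,
-- and the current status is read directly from the row whose range contains now()
select keyspace, status
from core.status_range_director
where directee_id = 'THING_ID'
  and range @> now();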
So after doing a bunch of research on the topic I found that my case is a variation of a "Valid-Time State Table". See ch. 2 and ch. 5 of Developing Time-Oriented Database Applications in SQL by Richard Snodgrass.
The support for these tables isn't great but it's not terrible either (at least PostgreSQL has tstzranges to work with). Method 1 of my post is largely sufficient - the main wrinkle is maintaining referential integrity between the state table and other tables.
Since PostgreSQL doesn't have native support for these kinds of temporal tables, you have to build referential integrity yourself. There are a bunch of ways to do this, but for anyone in the future looking for some direction, here is an example of what that might look like for a referential query on two bitemporal tables:
create table a (
row_id bigserial, -- to track individual rows
id int,
pov tstzrange, -- period of validity
pop tstzrange -- period of presence
);
create table b (
row_id bigserial,
id int,
pov tstzrange,
pop tstzrange,
a_id int
);
-- are we good?
with each_pov as (
select bool_or(a.pov @> b.pov) as ok
from a
join b on a.id = b.a_id
and upper(a.pop) is null
and upper(b.pop) is null
group by b.pov
) select coalesce(
bool_and(each_pov.ok),
(select count(*) = 0 from b where upper(pop) is null)
) from each_pov;
You can put the query into a constraint trigger on both the main table and the referenced table to get something approaching sequenced referential integrity for the current period of presence.
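For illustration, a deferred constraint trigger wrapping the query above might look roughly like this (function and trigger names are made up; a mirror trigger would go on table a, and EXECUTE FUNCTION needs PostgreSQL 11+, older versions use EXECUTE PROCEDURE):
create or replace function check_a_b_integrity() returns trigger as $$
declare
    is_ok boolean;
begin
    -- same integrity query as above, reduced to a single boolean
    with each_pov as (
        select bool_or(a.pov @> b.pov) as ok
        from a
        join b on a.id = b.a_id
         and upper(a.pop) is null
         and upper(b.pop) is null
        group by b.pov
    )
    select coalesce(
        bool_and(each_pov.ok),
        (select count(*) = 0 from b where upper(pop) is null)
    ) into is_ok from each_pov;

    if not is_ok then
        raise exception 'sequenced referential integrity violated between a and b';
    end if;
    return null;
end;
$$ language plpgsql;

create constraint trigger b_check_a_integrity
    after insert or update or delete on b
    deferrable initially deferred
    for each row execute function check_a_b_integrity();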

PostgreSQL race conditions

I have a table which stores the geo position of a user. It looks like this:
|id|coords|create_time|
And I have a controller which saves a record in the database, but a user may save a record only once per 5 hours. A simple "if" check does not work, because if you send a request, say, 100 times within 10 ms, the check will fail, since the record is not yet in the DB (saving takes some time). So there is a simple race condition. How can I solve this problem at the database level?
One solution would be to use the SERIALIZABLE transaction isolation level throughout.
Then your transactions could be as simple as:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM mytable
WHERE create_time > current_timestamp - INTERVAL '5 hours';
-- throw an error if the result is not 0
INSERT INTO mytable (coords, create_time) VALUES (..., current_timestamp);
COMMIT;
SERIALIZABLE isolation will guarantee a serialization error in one of two concurrent transactions like that.
Now SERIALIZABLE is simple to use, but it saps performance somewhat, needs a bigger lock table and you have to be ready to repeat transactions that receive a serialization error.
A second solution that works with the default READ COMMITTED isolation level would be an exclusion constraint:
ALTER TABLE mytable ADD EXCLUDE USING gist (
tstzrange(create_time, create_time + INTERVAL '5 hours') WITH &&
);
Here && is the range "overlaps" operator, and the condition would exclude any two entries in the table that are less than 5 hours apart.
tstzrange is a “timestamp with time zone-range” and is the appropriate type if create_time is of that type; for timestamp without time zone use tsrange.
This is automatically safe from race conditions, and one of two concurrent INSERTs would receive a constraint violation error.
If you need to have that overlap check per person, let's assume that there is a person_id column as well. Then you need to extend the exclusion constraint:
CREATE EXTENSION btree_gist; -- for GiST indexes on bigint columns
ALTER TABLE mytable ADD EXCLUDE USING gist (
person_id WITH =,
tstzrange(create_time, create_time + INTERVAL '5 hours') WITH &&
);
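With that constraint in place, the race is resolved by the database itself. For example, two inserts for the same person within five hours would behave roughly as below (the column list and the reported constraint name are illustrative, not taken from the question):
INSERT INTO mytable (person_id, coords, create_time)
VALUES (42, 'POINT(0 0)', current_timestamp);   -- succeeds

INSERT INTO mytable (person_id, coords, create_time)
VALUES (42, 'POINT(1 1)', current_timestamp);   -- fails:
-- ERROR:  conflicting key value violates exclusion constraint "mytable_person_id_tstzrange_excl"
-- (SQLSTATE 23P01, exclusion_violation)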

Prevent two threads from selecting the same row - IBM Db2

I have a situation where I have multiple (potentially hundreds of) threads repeating the same task (using a Java scheduled executor, if you are curious). This task entails selecting rows of changes (from a table called change) that have not yet been processed (processed changes are tracked in an m:n join table called change_process_rel that keeps track of the process id, record id and status), processing them, then updating the status back.
My question is: what is the best way to prevent two threads of the same process from selecting the same row? Will the solution below (using FOR UPDATE to lock rows) work? If not, please suggest a working solution.
Create table change(
-- id, autogenerated pk
-- other fields
)
Create table change_process_rel(
-- change id (pk of change table)
-- process id (pk of process table)
-- status
)
The query I would use is listed below:
Select * from
change c
where c.id not in(select changeid from change_process_rel with cs) for update
Please let me know if this would work
You have to "lock" a row which you are going to process somehow. Such a "locking" should be concurrent of course with minimum conflicts / errors.
One way is as follows:
Create table change
(
id int not null generated always as identity
, v varchar(10)
) in userspace1;
insert into change (v) values ('1'), ('2'), ('3');
Create table change_process_rel
(
id int not null
, pid int not null
, status int not null
) in userspace1;
create unique index change_process_rel1 on change_process_rel(id);
Now you should be able to run the same statement from multiple concurrent sessions:
SELECT ID
FROM NEW TABLE
(
insert into change_process_rel (id, pid, status)
select c.id, mon_get_application_handle(), 1
from change c
where not exists (select 1 from change_process_rel r where r.id = c.id)
fetch first 1 row only
with ur
);
Every such statement inserts 1 or 0 rows into the change_process_rel table, which is used here as a "lock" table. The corresponding ID from change is returned, and you may proceed with processing the corresponding event in the same transaction.
If the transaction completes successfully, then the row inserted into the change_process_rel table is saved, so the corresponding id from change may be considered processed. If the transaction fails, the corresponding "lock" row from change_process_rel disappears, and the change may be processed later by this or another application.
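Put together, one iteration of a worker could look roughly like this (the status codes here are my assumptions; the question only defines the column, not its values):
-- suppose the statement above returned ID = 2
-- ... process change 2 here, in the same transaction ...
update change_process_rel set status = 2 where id = 2;  -- e.g. 2 = "processed" (assumed)
commit;                                                  -- persists both the "lock" row and the new status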
The problem with this method is that once both tables become large enough, such a sub-select may not work as quickly as before.
Another method is to use "Evaluate uncommitted data through lock deferral".
It requires placing the status column in the change table.
Unfortunately, Db2 for LUW doesn't have SKIP LOCKED functionality, which would help with this sort of algorithm.
If, let's say, status=0 is "not processed", and status<>0 is some processing / processed status, then after setting the DB2_EVALUNCOMMITTED and DB2_SKIP* registry variables and restarting the instance, you may "catch" the next ID for processing with the following statement.
SELECT ID
FROM NEW TABLE
(
update
(
select id, status
from change
where status=0
fetch first 1 row only
)
set status=1
);
Once you get it, you may do further processing of this ID in the same transaction as previously.
It's good to create an index for performance:
create index change1 on change(status);
and maybe mark this table as volatile, or periodically collect distribution statistics on this column in addition to the regular statistics on the table and its indexes.
Note that such registry variable settings have a global effect, and you should keep that in mind...
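For completeness, a sketch of how those registry variables are typically set (they take effect only after an instance restart):
db2set DB2_EVALUNCOMMITTED=ON
db2set DB2_SKIPDELETED=ON
db2set DB2_SKIPINSERTED=ON
db2stop
db2start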

PostgreSQL - How to make a condition with records between the current record date and the same date plus 5 min?

I have something like this. With this piece of code I detect whether a vehicle stopped for at least 5 minutes.
It works, but with a large amount of data it starts to get slow.
I did a lot of tests and I'm sure that my problem is in the not exists block.
My table:
CREATE TABLE public.messages
(
id bigint PRIMARY KEY DEFAULT nextval('messages_id_seq'::regclass),
messagedate timestamp with time zone NOT NULL,
vehicleid integer NOT NULL,
driverid integer NOT NULL,
speedeffective double precision NOT NULL,
-- ... few nonsense properties
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.messages OWNER TO postgres;
CREATE INDEX idx_messages_1 ON public.messages
USING btree (vehicleid, messagedate);
And my query:
SELECT
*
FROM
messages m
WHERE
m.speedeffective > 0
and m.next_speedeffective = 0
and not exists( -- my problem
select id
from messages
where
vehicleid = m.vehicleid
and speedeffective > 5 -- I forgot this condition
and messagedate > m.messagedate
and messagedate <= m.messagedate + interval '5 minutes'
)
I can't figure out how to build the condition in a more performant way.
Edit DAY2:
I added a preliminary CTE like this, to use in place of the second reference to the messages table:
WITH messagesx as (
SELECT
vehicleid,
messagedate
FROM
messages
WHERE
speedeffective > 5
)
and now it works better. I think I'm just missing a little detail.
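In case it helps to see the whole thing together, this is roughly how the CTE plugs into the original query (a sketch, not tested against the real data):
WITH messagesx AS (
    SELECT vehicleid, messagedate
    FROM messages
    WHERE speedeffective > 5
)
SELECT m.*
FROM messages m
WHERE m.speedeffective > 0
  AND m.next_speedeffective = 0
  AND NOT EXISTS (
        SELECT 1
        FROM messagesx x
        WHERE x.vehicleid = m.vehicleid
          AND x.messagedate > m.messagedate
          AND x.messagedate <= m.messagedate + interval '5 minutes'
  );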
Typically, a NOT EXISTS will slow down your query, as it requires a full scan of the table for each of the outer rows. Try to incorporate the same functionality within a join (I'm trying to rewrite the query without knowing the table, so I might make a mistake):
SELECT
*
FROM
messages m1
LEFT JOIN
messages m2
ON m1.vehicleid = m2.vehicleid AND m2.messagedate > m1.messagedate AND m2.messagedate <= m1.messagedate + interval '5 minutes'
WHERE
m1.speedeffective > 0
and m1.next_speedeffective = 0
and m2.vehicleid IS NULL
Take note that the NOT EXISTS is rewritten as the non-hit of the join condition.
Based on this answer: https://stackoverflow.com/a/36445233/5000827
and reading about NOT IN, NOT EXISTS and LEFT JOIN (where the join is NULL):
For PostgreSQL, NOT EXISTS and LEFT JOIN are both anti-joins and work the same way. (This is the reason why @CountZukula's answer's result is almost the same as mine.)
The problem was in the kind of join operation chosen: nested loop or hash.
So, based on this: https://www.postgresql.org/docs/9.6/static/routine-vacuuming.html
PostgreSQL's VACUUM command has to process each table on a regular basis for several reasons:
To recover or reuse disk space occupied by updated or deleted rows.
To update data statistics used by the PostgreSQL query planner.
To update the visibility map, which speeds up index-only scans.
To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
I ran a VACUUM ANALYZE on the messages table and the same query now runs much faster.
So, after the VACUUM, PostgreSQL can choose a better plan.

Concurrency scenarios with INSERTs

I'm designing a booking system in PHP + PostgreSQL.
I'm not able to find a clean solution to a concurrency problem based on INSERTs operations.
The DB system is mainly made of these tables:
CREATE TABLE booking (
booking_id INT,
user_id INT,
state SMALLINT,
nb_coupons INT
);
CREATE TABLE booking_state_history (
booking_state_history_id INT,
timestamp TIMESTAMP,
booking_id INT,
state SMALLINT);
CREATE TABLE coupon_purchase(
coupon_purchase_id INT,
user_id INT,
nb INT,
value MONEY);
CREATE TABLE coupon_refund(
coupon_refund_id INT,
user_id INT,
nb INT,
value MONEY);
CREATE TABLE booking_payment(
booking_payment_id INT,
user_id INT,
booking_id INT,
nb INT,
value MONEY);
A booking must be paid with coupons that have been previously purchased by the user. Some coupons may have been refunded. All these operations are stored in the two corresponding tables to keep a history and to be able to compute the coupon balance.
Constraint: the coupon balance cannot be negative at any time.
A booking is finalized when it is paid with coupons.
Then the following operations happen:
BEGIN;
(1) Check there are enough coupons remaining to pay the booking. (SELECT)
(2) Decide which coupons (number and value) will be used to pay the booking
(mainly, higher cost coupon used first. But that is not the issue here.)
(3) Add records to booking_payment (INSERTs)
(4) Move the booking to state="PAID" (integer value representing "PAID") (UPDATE)
(5) Add a record to booking_state_history (INSERT)
COMMIT;
These operations need to be atomic to preserve DB information coherency.
Hence the use of transactions, which allow a COMMIT or ROLLBACK in case of failure, DB exception, PHP exception or any other issue in the middle of the operations.
Scenario 1
Since I'm in a concurrent access environment (web site) nothing prevents the user from (for instance) asking for a coupon refund while doing a booking payment at the same time.
Scenario 2
He can also trigger two concurrent booking payments at the same time in two different transactions.
So the following can happen:
Scenario 1
After (1) is done, the coupon refund is triggered by the user and the subsequent coupon balance is not enough to pay the booking any more.
When it COMMITs the balance becomes negative.
Note:
Even if I do a recheck of coupon balance in a new (6) step, there is a possibility for the coupon refund to happen in the meantime between (6) and COMMIT.
Scenario 2
Two concurrent booking payment transactions whose total number of coupons for payment is too high for the global balance to stay positive. Only one of them should be allowed to happen.
Transaction 1 and transaction 2 are checking for balance and seeing enough coupons for their respective payment in step (1).
They go on with their operations and COMMIT. The new balance is negative and conflicting with the constraint.
Note:
Even if I do a coupon balance recheck in a new (6) step, the transactions cannot see the operations not yet committed by the other one.
So they blindly proceed to COMMIT.
I guess this is a common concurrency case, but I cannot find a pattern to solve it on the internet.
I thought of rechecking the balance after the COMMIT so I could manually UNDO all the operations. But it is not totally safe, since if an exception happens after the commit, the UNDO won't be done.
Any idea to solve this concurrency problem?
Thanks.
Your problem boils down to the question "what should be the synchronization lock?". From your question it seems that the booking is not a booking of a specific item. But let's assume that a user is booking a specific hotel room, so you need to solve two problems:
prevent overbooking (e.g. booking the same thing for two people)
prevent parallel account state miscalculation
So when a user gets to the point where he/she is about to hit the confirm button, this is a possible scenario you can implement:
1. begin transaction
2. lock the user entry so that parallel processes are blocked:
SELECT * FROM user WHERE id = :id FOR UPDATE
3. re-check the account balance and throw an exception / rollback if there are insufficient funds
4. lock the item to be booked to prevent overbooking:
SELECT * FROM room WHERE id = :id FOR UPDATE
5. re-check booking availability and throw an exception / rollback if the item is already booked
6. create the booking entry and subtract the funds from the user's account
7. commit the transaction (all locks will be released)
If, in your case, you don't need to check for overbooking, just skip / ignore steps 4 and 5.
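Mapped onto the tables from the question, the steps above could look roughly like this. This is a sketch only: the "user" table itself, the state code for "PAID", and the literal ids/amounts are assumptions, not part of the question's schema:
BEGIN;

-- step 2: serialize all coupon/booking work for this user
SELECT * FROM "user" WHERE id = 1 FOR UPDATE;

-- step 3: re-check the coupon balance while holding the lock
SELECT (SELECT COALESCE(SUM(nb), 0) FROM coupon_purchase WHERE user_id = 1)
     - (SELECT COALESCE(SUM(nb), 0) FROM coupon_refund   WHERE user_id = 1)
     - (SELECT COALESCE(SUM(nb), 0) FROM booking_payment WHERE user_id = 1) AS balance;
-- the application rolls back here if balance < the number of coupons to spend

-- step 6: record the payment and the state change
INSERT INTO booking_payment (user_id, booking_id, nb) VALUES (1, 1, 3);
UPDATE booking SET state = 2 WHERE booking_id = 1;   -- 2 = "PAID" (assumed)
INSERT INTO booking_state_history (booking_id, state, timestamp)
VALUES (1, 2, current_timestamp);                    -- history id generation omitted

COMMIT;  -- releases the row lock taken on "user"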
Below is the solution I've implemented.
Note: I just treated the coupon transfer part below but it is the same with booking state change and booking_state_history.
The main idea is to preserve this part of the processing as a critical section.
When an INSERT into booking_payment, coupon_purchase or coupon_refund is to be done I prevent other transactions from doing the same by putting a lock on a dedicated table through an UPDATE for the given user_id.
This way, only transactions impacting this given user_id for the same kind of treatment will be locked.
Initialization
DROP TABLE coupon_purchase;
DROP TABLE coupon_refund;
DROP TABLE booking_payment;
DROP TABLE lock_coupon_transaction;
CREATE TABLE coupon_purchase(
coupon_purchase_id SERIAL PRIMARY KEY,
user_id INT,
nb INT);
CREATE TABLE coupon_refund(
coupon_refund_id SERIAL PRIMARY KEY,
user_id INT,
nb INT);
CREATE TABLE booking_payment(
booking_payment_id SERIAL PRIMARY KEY,
user_id INT,
booking_id INT,
nb INT);
CREATE TABLE lock_coupon_transaction (
user_id INT,
timestamp TIMESTAMP);
INSERT INTO coupon_purchase
(user_id, nb) VALUES
(1, 1),
(1, 5);
INSERT INTO coupon_refund
(user_id, nb) VALUES
(1, 3);
INSERT INTO lock_coupon_transaction
(user_id, timestamp) VALUES
(1, current_timestamp);
Transaction 1
BEGIN;
UPDATE lock_coupon_transaction SET timestamp=current_timestamp WHERE user_id='1';
WITH coupon_balance AS (
SELECT
t1.nb_purchased_coupons -
t2.nb_refunded_coupons -
t3.nb_booking_payment_coupons AS total
FROM
(SELECT COALESCE(SUM(nb),0) AS nb_purchased_coupons FROM coupon_purchase WHERE user_id='1' ) t1,
(SELECT COALESCE(SUM(nb),0) AS nb_refunded_coupons FROM coupon_refund WHERE user_id='1' ) t2,
(SELECT COALESCE(SUM(nb),0) AS nb_booking_payment_coupons FROM booking_payment WHERE user_id='1' ) t3
)
INSERT INTO booking_payment
(user_id, booking_id, nb)
SELECT 1::INT, 1::INT, 3::INT
FROM coupon_balance
WHERE (total::INT >= 3::INT);
INSERT 0 1
Transaction 2
BEGIN;
UPDATE lock_coupon_transaction SET timestamp=current_timestamp WHERE user_id='1';
// Transaction is locked waiting for a COMMIT or ROLLBACK from transaction 1.
Transaction 1
COMMIT;
COMMIT
Transaction 2
// Transaction 1 lock has been released so transaction 2 can go on
WITH coupon_balance AS (
SELECT
t1.nb_purchased_coupons -
t2.nb_refunded_coupons -
t3.nb_booking_payment_coupons AS total
FROM
(SELECT COALESCE(SUM(nb),0) AS nb_purchased_coupons FROM coupon_purchase WHERE user_id='1' ) t1,
(SELECT COALESCE(SUM(nb),0) AS nb_refunded_coupons FROM coupon_refund WHERE user_id='1' ) t2,
(SELECT COALESCE(SUM(nb),0) AS nb_booking_payment_coupons FROM booking_payment WHERE user_id='1' ) t3
)
INSERT INTO coupon_refund
(user_id, nb)
SELECT 1::INT, 3::INT
FROM coupon_balance
WHERE (total::INT >= 3::INT);
INSERT 0 0
COMMIT;
COMMIT
The INSERT couldn't be done since there wasn't enough balance on the account. This is the expected behavior.
The previous transaction was committed when the second one proceeded. So transaction 2 could see all changes made by transaction 1.
This way there is no risk of concurrent access corrupting the coupon handling.