book:
  id: primary key, integer
  title: varchar
  borrowed: boolean
  borrowed_by_user_id: foreign key user.id

user:
  id: primary key, integer
  name: varchar
  blocked: boolean
The isolation level is READ COMMITTED, because that is the default level in PostgreSQL (this requirement is not mine).
I am using one database transaction to SELECT ... FOR UPDATE a book and lend it to a user if the book is not borrowed yet. Because the book is selected FOR UPDATE, it cannot be borrowed concurrently.
But there is another problem: we cannot lend a book to a blocked user. How can we ensure that? Even if we check at the beginning that the user is not blocked, the result might not stay correct, because a concurrent transaction could block the user after that check.
For example, a user can be blocked by a concurrent transaction from the admin's panel.
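For illustration, one possible interleaving (ids are made up; table names follow the schema above):
-- session 1: lending transaction
BEGIN;
SELECT blocked FROM "user" WHERE id = 7;   -- returns false, the user looks fine

-- meanwhile, session 2: admin panel blocks the user
BEGIN;
UPDATE "user" SET blocked = true WHERE id = 7;
COMMIT;

-- back in session 1
UPDATE book SET borrowed = true, borrowed_by_user_id = 7 WHERE id = 42;
COMMIT;   -- the book ends up lent to a now-blocked user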
How to solve that issue?
I see that I can use SERIALIZABLE. It requires handling errors, yes?
I am not sure how that CHECK works. Could you say more about it?
These are actually two questions.
About the books:
If you lock the book with SELECT ... FOR UPDATE as soon as you consider lending it out, this is an example of “pessimistic locking” and will block the book for all concurrent activity.
That is fine if the transactions are very short – specifically, if there is no user interaction between the locking and the end of the transaction.
Otherwise you should use “optimistic locking”. This can be done in several ways:
Use REPEATABLE READ transaction isolation. Then updating a book that has been modified since you read its data will lead to a serialization error (see the note at the end).
When selecting books, remember the values of the system columns ctid and xmin. Then update as follows:
UPDATE books SET ...
WHERE id = ...
AND ctid = original_ctid AND xmin = original_xmin;
If no row gets updated, somebody must have modified the book since you looked at it.
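For example, a rough sketch of that variant (the literal ctid and xmin values stand for the ones remembered from the earlier SELECT; ids are made up):
SELECT id, title, ctid, xmin FROM books WHERE id = 42;
-- remember ctid and xmin together with the row data

UPDATE books SET borrowed = true, borrowed_by_user_id = 7
WHERE id = 42
AND ctid = '(0,3)'    -- the remembered ctid
AND xmin = 123456;    -- the remembered xmin
-- zero rows updated means somebody modified the book in the meantime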
About the users:
Three ideas:
You use SERIALIZABLE transaction isolation (see the note at the end).
You maintain a counter on the user that contains the number of books the user has borrowed.
Then you can have a check constraint like
ALTER TABLE users ADD CHECK (NOT blocked OR books_borrowed = 0);
Such a check constraint is evaluated at the end of each statement and has to yield TRUE, else an error is thrown.
So either the transaction that borrows a book or the transaction that blocks the user must fail (both transactions have to modify the user).
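For example, a lending transaction under this counter approach could look like this (books_borrowed is the counter column from above; ids are made up):
BEGIN;
UPDATE books SET borrowed = true, borrowed_by_user_id = 7
WHERE id = 42 AND NOT borrowed;
UPDATE users SET books_borrowed = books_borrowed + 1
WHERE id = 7;   -- the check constraint is evaluated at the end of this statement
COMMIT;

-- the admin transaction that blocks the user also modifies the user row,
-- so if the two conflict, one of them fails on the constraint:
UPDATE users SET blocked = true WHERE id = 7;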
Right before lending a book to a user, you run
SELECT blocked FROM users WHERE id = ... FOR UPDATE;
If you get TRUE, you abort the transaction, otherwise lend out the book.
A concurrent transaction that wants to block the user has to SELECT ... FOR UPDATE on the user as well and only then check if there are any books lent to that user.
That way, no inconsistency can happen: if you want to block a user, all concurrent transactions that want to lend a book to the user must either be completed, so that you see their effect, or they must wait until you are done blocking the user, whereupon they will fail.
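A sketch of both sides of that idea (ids are made up):
-- lending transaction
BEGIN;
SELECT blocked FROM users WHERE id = 7 FOR UPDATE;   -- abort if this returns TRUE
UPDATE books SET borrowed = true, borrowed_by_user_id = 7
WHERE id = 42 AND NOT borrowed;
COMMIT;

-- blocking transaction (e.g. from the admin panel)
BEGIN;
SELECT blocked FROM users WHERE id = 7 FOR UPDATE;          -- serializes against lenders
SELECT count(*) FROM books WHERE borrowed_by_user_id = 7;   -- check for lent books if required
UPDATE users SET blocked = true WHERE id = 7;
COMMIT;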
Note about higher isolation levels:
If you run transactions at an isolation level of REPEATABLE READ or SERIALIZABLE, you can encounter serialization errors. These are not bugs in your program; they are normal and to be expected. If you encounter a serialization error, you have to roll back and try the same transaction again. That is the price you pay for not having to worry about race conditions.
Related
I am building an app that helps people transfer money from one account to another. I have two tables "Users" and "Transactions". The way I am currently handling transfers is by
Check if the sender has enough balance to make a transfer.
Deduct the balance from the sender and update the sender's balance.
Add the amount deducted from the sender's account to the recipient's account and then update the balance of the recipient.
Then finally write the transaction record in the "Transactions" table as a single entry like below:
id | transactionId | senderAccount | recipientAccount | Amount |
---+---------------+---------------+------------------+--------+
 1 | ijiej33       | A             | B                |    100 |
So my question is: is recording a transaction as a single entry like above good practice, or will this kind of database model design produce future challenges?
Thanks
Check if the sender has enough balance to make a transfer.
Deduct the balance from the sender and update the sender's balance.
Yes, but.
If two concurrent connections attempt to deduct money from the sender at the same time, they may both successfully check that there is enough money for each transaction on its own, and then both succeed even though the balance is insufficient to cover both.
You must use a SELECT FOR UPDATE when checking. This will lock the row for the duration of the transaction (until COMMIT or ROLLBACK), and any concurrent connection attempting to also SELECT FOR UPDATE on the same row will have to wait.
Presumably the receiver account can always receive money, so there is no need to lock it explicitly, but the UPDATE will lock it anyway. And locks must always be acquired in the same order or you will get deadlocks.
For example, if one transaction locks rows 1 then 2, while another locks rows 2 then 1: the first one will lock 1, the second will lock 2, then the first will try to lock 2, but it is already locked, and the second will try to lock 1, but it is also already locked by the other transaction. Both transactions will wait for each other until the deadlock detector nukes one of them.
One simple way to dodge this is to use ORDER BY:
SELECT ... FROM users WHERE user_id IN (sender_id,receiver_id)
ORDER BY user_id FOR UPDATE;
This will lock both rows in the order of their user_ids, which will always be the same.
Then you can do the rest of the procedure.
Since it is always a good idea to hold locks for the shortest amount of time, I'd recommend putting the whole thing inside a plpgsql stored procedure, including the COMMIT/ROLLBACK and error handling. Try to make the stored procedure failsafe and atomic.
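A minimal sketch of such a procedure (PostgreSQL 11 or later for CREATE PROCEDURE; it assumes a users table with user_id and balance columns and a transactions table like the one above — all names are illustrative, and error handling/transaction control are left to the caller):
CREATE OR REPLACE PROCEDURE transfer(p_sender bigint, p_receiver bigint, p_amount numeric)
LANGUAGE plpgsql AS $$
DECLARE
    v_balance numeric;
BEGIN
    -- lock both rows in a deterministic order to avoid deadlocks
    PERFORM 1 FROM users
    WHERE user_id IN (p_sender, p_receiver)
    ORDER BY user_id
    FOR UPDATE;

    SELECT balance INTO v_balance FROM users WHERE user_id = p_sender;
    IF v_balance < p_amount THEN
        RAISE EXCEPTION 'insufficient funds';
    END IF;

    UPDATE users SET balance = balance - p_amount WHERE user_id = p_sender;
    UPDATE users SET balance = balance + p_amount WHERE user_id = p_receiver;

    INSERT INTO transactions (sender_account, recipient_account, amount)
    VALUES (p_sender, p_receiver, p_amount);
END;
$$;

-- called as:
CALL transfer(1, 2, 100);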
Note, for security purposes, you should:
Store the balance of both accounts as it was before the money transfer occurred in the transactions table. You're already SELECTing it in the SELECT ... FOR UPDATE, so you might as well use it. It will be useful for auditing.
For security, if a user gets their password stolen there's not much you can do, but if your application gets hacked it would be nice if the hacker was not able to issue global UPDATEs to all the account balances, mess with the audit tables, etc. This means you need to read up on this and create several postgres users/roles with suitable permissions for backup, web application, etc. Some tables and especially the transactions table should have all UPDATE privileges revoked, and INSERT allowed only for the transactions stored procs, for example. The aim is to make the audit tables impossible to modify, basically append-only from the point of view of the application code.
Likewise you can handle updates to the balance via stored procedures and forbid the web application role from messing with it. You could even require a user-specific security token passed as a parameter to the stored proc, to authenticate the app user to the database, so the database only allows transfers from the account of the user who is logged in, not just any user.
Basically if it involves money, then it involves legislation, and you have to think about how not to go to jail when your web app gets hacked.
When implementing a system which creates tasks that need to be resolved by some workers, my idea would be to create a table which would have some task definition along with a status, e.g. for document review we'd have something like reviewId, documentId, reviewerId, reviewTime.
When documents are uploaded to the system we'd just store the documentId along with a generated reviewId and leave the reviewerId and reviewTime empty. When the next reviewer comes along and starts the review, we'd just set their id and the current time to mark the job as "in progress" (I deliberately skip the case where the reviewer takes a long time, or dies during the review).
When implementing such a use case in e.g. PostgreSQL we could use something like
UPDATE review SET reviewerId = :reviewerId, reviewTime = :reviewTime
WHERE reviewId = (SELECT reviewId FROM review
                  WHERE reviewerId IS NULL AND reviewTime IS NULL
                  LIMIT 1
                  FOR UPDATE SKIP LOCKED)
RETURNING reviewId, documentId, reviewerId, reviewTime
(so basically update the first non-taken row, using SKIP LOCKED to skip any already in-processing rows).
But when moving from the native solution to JDBC and beyond, I'm having trouble implementing this:
Spring Data JPA and Spring Data JDBC don't allow the @Modifying query to return anything other than void/boolean/int and force us to perform 2 queries in a single transaction - one for the first pending row, and a second one with the update
one alternative would be to use a stored procedure, but I really hate the idea of storing such logic so far away from the code
another alternative would be to use a persistent queue and skip the database altogether, but this introduces additional infrastructure components that need to be maintained and learned. Any suggestions are welcome though.
Am I missing something? Is it possible to have it all or do we have to settle for multiple queries or stored procedures?
Why doesn't Spring Data support returning an entity for modifying queries?
Because it seems like a rather special thing to do and Spring Data JDBC tries to focus on the essential stuff.
Is it possible to have it all or do we have to settle for multiple queries or stored procedures?
It is certainly possible to do this.
You can implement a custom method using an injected JdbcTemplate.
If I've got an environment with multiple instances of the same client connecting to a MongoDB server and I want a simple locking mechanism to ensure single client access for a short time, can I safely use my own lock object?
Say I have one object with a lockState that can be "locked" or "unlocked" and the plan is everyone checks that it is "unlocked" before doing "stuff". To lock the system I say:
db.collection.update( { "lockState": "unlocked" }, { "lockState": "locked" })
(aka UPDATE lockObj SET lockState = 'locked' WHERE lockState = 'unlocked')
If two clients try to lock the system at the same time, is it possible that both clients can end up thinking they "have the lock"?
Both clients find the record by the query parameter of the update
Client 1 updates the record (which is an atomic operation)
update returns success
Client 2 updates the document (it's already found it before client 1 modified it)
update returns success
I realize this is probably a very contrived case that would be very hard to reproduce, but is it possible or does mongo somehow make client 2's update fail?
Alternative approach
Use insert instead of update. insert is atomic and will fail if the document already exists.
To lock the system: db.locks.insert({someId: 27, state: "locked"}).
If the insert succeeds, I've got the lock, and since the insert was atomic, no one else can have it.
If the insert fails, someone else must have the lock.
If two clients try to lock the system at the same time, is it possible that both clients can end up thinking they "have the lock"?
No, only one client at a time writes to the lock space (Global, Database, Collection or Document, depending on your version and configuration), and the operations on that lock space are sequential and one or the other (read or write, not both) per document, so other connections will not mistakenly pick up a document in an in-between state and think that it is not locked by another client.
All operations on a single document are atomic, whether update or insert.
When a user registers on the website, an e-mail address needs to be provided, which must be unique. I've made a unique index on the schema's email attribute, so if I try to save the document in the database, an error with code 11000 will be returned. My question is, regarding the business layer and the data layer: should I just pass the document to the database and catch/check the error codes it returns, or should I check whether a user with that e-mail exists beforehand? I've been told that data integrity should be checked by the business layer before passing the data to the database, but I don't see the reason why I should do that, since I believe Mongo would be much faster raising the exception itself, given that it has the index. The only disadvantage I see in error code checking is that error codes might change (but I could abstract them) and the syntax might change.
There is the practical matter of speed and the fragility of "check-then-set" systems. If you try to check whether an email exists before you write the document keyed on email, there is a chance that between the time you check and the time you write, the conditions of the unique index are met and your write fails anyhow. This is a classic race condition. Further, it takes 2 queries to do check-then-set but only 1 query to do the insert and handle the failure. In my application I am having success with just letting the failure occur and reacting to the result.
As @JamesWahlin says, it is the difference between doing this all in one operation or causing mixed results (along with the index check) from potential race conditions by adding the extra client read.
Definitely rely on the response of only insert from MongoDB here.
I have a set of tables and am currently trying to set up the correct relationships for the tables. The condensed version is below.
Users
  ID INT NOT NULL

Activities
  ID INT NOT NULL
  UserID INT NULL

Logs
  ID INT NOT NULL
  UserID INT NULL
  ActivityID INT NULL
I have relationships relating UserID from both Activities and Logs back to Users.ID and ActivityID relating back to Activities.ID.
I have set Activities.UserID and Logs.UserID to SET NULL on delete and CASCADE on update. My problem comes when I attempt to set the same update and delete actions on Logs.ActivityID, but I get an error about "may cause cycles or multiple cascade paths". The issue is that Logs require a User and do not require an Activity, but Logs that do have an Activity need to be updated if and when an Activity changes.
What way do I have around this that does not involve having two separate Logs tables and does not involve manually updating the Logs table? Is this even possible in SQL Server 2012?
SQL Server does not support multiple cascade paths. Your options to work around this limitation are:
write your own logic for dealing with multiple cascade paths (recommended); a sketch follows after this list.
change your schema such that multiple cascade paths are not required.
wait for SQL Server to fix this. Don't hold your breath - this has been a limitation for years. See these Connect items:
a. http://connect.microsoft.com/SQLServer/feedback/details/126159/cascade-updates
b. http://connect.microsoft.com/SQLServer/feedback/details/307723/allow-multiple-cascade-paths-for-foreign-key-and-on-dalete-update-cascade
Notice how they keep saying "we don't have time for this now; we'll consider it for the next release"? This isn't the kind of thing that sells software, because people are satisfied - generally - with coding the logic themselves.
migrate to a database platform that supports multiple cascade paths (sounds like you think you have many options for this, but I don't think you do, and I don't know what you sacrifice by switching, not even counting porting your schema and code).
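For the first option, a minimal sketch of doing the cascading yourself inside a single transaction (T-SQL; table and column names follow the schema above, @OldID and @NewID are illustrative):
DECLARE @OldID INT = 42, @NewID INT = 43;

BEGIN TRANSACTION;

-- "ON UPDATE CASCADE" by hand: add the activity under the new key, re-point the logs, drop the old row
INSERT INTO Activities (ID, UserID)
SELECT @NewID, UserID FROM Activities WHERE ID = @OldID;
UPDATE Logs SET ActivityID = @NewID WHERE ActivityID = @OldID;
DELETE FROM Activities WHERE ID = @OldID;

COMMIT TRANSACTION;

-- "ON DELETE SET NULL" by hand would be: set Logs.ActivityID to NULL first, then delete the activity.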