How to lock a row while updating it in SqlAlchemy Core? - postgresql

It is incredibly difficult to unit test race conditions like this. I was hoping to verify this with experts here.
I have the following scenario where I would like to obtain the first VPN Profile that is not assigned to any device.
In the meanwhile any other process trying to obtain the same profile (since it's the next in line), should wait here until my transaction has completed.
I would then update this vpn profile and assign the given device id to it and finish the transaction.
At this point any process that was waiting on the select().first() statement shouldn't be obtaining this particular VPN profile, because it has already been assigned to a device_id. Instead it should obtain the next available one.
After some digging, this is the code I have come up with. I'm using with_for_update() in the select statement and keep everything within the same engine.begin() transaction. I'm using SQLAlchemy Core without ORM.
async with engine.begin() as conn:
query = (
VpnProfileTable.select()
.where(
VpnProfileTable.c.device_id == None,
)
.with_for_update(nowait=False)
)
record = (await conn.execute(query)).first()
if record:
query = (
VpnProfileTable.update()
.where(VpnProfileTable.c.id == record.id)
.values(device_id=device_id)
)
await conn.execute(query)
await conn.commit()
Is my code reflecting what I'm trying to achieve?
Not sure if I need to commit(), since there is already a with engine.begin() statement. It should happen automatically at the end.
Many thanks

Related

How to avoid DataIntegrityViolationException in multithread environment for inserts?

Suppose we have a "devices" table that has a "device_uuid" column on which we have a unique constraint.
The pseudocode for the part of the application looks like
Optional<Device> device = deviceRepository.findByDeviceUUID(deviceUUID)
if (device.isPresent())
device.heartBeat()
else
createNewDevice(deviceUUID)
On high load, when more than one thread tries to add a new device I end up with DataIntegrityViolationException because of the unique constraint violation, which is nice because data is consistent, however, it makes quite a mess in the logs of my application. Is it possible to lock it somehow on the database level?
I have tried already making it in Serializable isolation level, but it is not a solution to this type of problem.
The easiest way is to select the Device with the FOR UPDATE SQL clause. In this case, another transaction won't be able to obtain the same row until the previous one is still holding it.
If you're using JPA, then here is the way.
Device device = (Device) entityManager.createQuery("from Device where deviceId = :id")
.setParameter("id", uuid)
.setLockMode(LockModeType.PESSIMISTIC_WRITE)
.getSingleResult();
for a multi-threaded-environment its always better to generate UUID on centralized-node
here Database can be a good option
We can create a function at DB side and get uuid from there

How to optimise this ef core query?

I'm using EF Core 3.0 code first with MSSQL database. I have big table that has ~5 million records. I have indexes on ProfileId, EventId and UnitId. This query takes ~25-30 seconds to execute. Is it normal or there is a way to optimize it?
await (from x in _dbContext.EventTable
where x.EventId == request.EventId
group x by new { x.ProfileId, x.UnitId } into grouped
select new
{
ProfileId = grouped.Key.ProfileId,
UnitId = grouped.Key.UnitId,
Sum = grouped.Sum(a => a.Count * a.Price)
}).AsNoTracking().ToListAsync();
I tried to loos through profileIds, adding another WHERE clause and removing ProfileId from grouping parameter, but it worked slower.
Capture the SQL being executed with a profiling tool (SSMS has one, or Express Profiler) then run that within SSMS /w execution plan enabled. This may highlight an indexing improvement. If the execution time in SSMS roughly correlates to what you're seeing in EF then the only real avenue of improvement will be hardware on the SQL box. You are running a query that will touch 5m rows any way you look at it.
Operations like this are not that uncommon, just not something that a user would expect to sit and wait for. This is more of a reporting-type request so when faced with requirements like this I would look at options to have users queue up a request where they can receive a notification when the operation completes to fetch the results. This would be set up to prevent users from repeatedly requesting updates ("not sure if I clicked" type spams) or also considerations to ensure too many requests from multiple users aren't kicked off simultaneously. Ideally this would be a candidate to run off a read-only reporting replica rather than the read-write production DB to avoid locks slowing/interfering with regular operations.
Try to remove ToListAsync(). Or replace it with AsQueryableAsync(). Add ToList slow performance down.
await (from x in _dbContext.EventTable
where x.EventId == request.EventId
group x by new { x.ProfileId, x.UnitId } into grouped
select new
{
ProfileId = grouped.Key.ProfileId,
UnitId = grouped.Key.UnitId,
Sum = grouped.Sum(a => a.Count * a.Price)
});

Postgres multi-update with multiple `where` cases

Excuse what seems like it could be a duplicate. I'm familiar with multiple updates in Postgres... but I can't seem to figure out a way around this one...
I have a photos table with the following columns: id (primary key), url, sort_order, and owner_user_id.
We would like to allow our interface to allow the user to reorder their existing photos in a collection view. In which case when a drag-reorder interaction is complete, I am able to send a POST body to our API with the following:
req.body.photos = [{id: 345, order: 1, id: 911, order: 2, ...<etc>}]
In which case I can turn around and run the following query in a loop per each item in the array.
photos.forEach(function (item) {
db.runQuery('update photos set sort_order=$1 where id=$2 and owner_user_id=$3', [item.order, item.id, currentUserId])
})
In general, it's generally frowned upon to run database queries inside loops, so if there's anyway this can be done with 1 query that would be fantastic.
Much thanks in advance.
Running a select query inside of a loop is definitely questionable, but I don't think multiple updates is necessarily frowned upon if the data you are updating doesn't natively reside on the database. To do these as separate transactions, however, might be.
My recommendation would be to wrap all known updates in a single transaction. This is not only kinder to the database (compile once, execute many, commit once), but this is an ACID approach to what I believe you are trying to do. If, for some reason, one of your updates fails, they will all fail. This prevents you from having two photos with an order of "1."
I didn't recognize your language, but here is an example of what this might look like in C#:
NpgSqlConnection conn = new NpgSqlConnection(connectionString);
conn.Open();
NpgSqlTransaction trans = conn.BeginTransaction();
NpgSqlCommand cmd = new NpqSqlCommand("update photos set sort_order=:SORT where id=:ID",
conn, trans);
cmd.Parameters.Add(new NpgSqlParameter("SORT", DbType.Integer));
cmd.Parameters.Add(new NpgSqlParameter("ID", DbType.Integer));
foreach (var photo in photos)
{
cmd.Parameters[0].Value = photo.SortOrder;
cmd.Parameters[1].Value = photo.Id;
cmd.ExecuteNonQuery();
}
trans.Commit();
I think in Perl, for example, it would be even simpler -- turn off DBI AutoCommit and commit after the inserts.
CAVEAT: Of course, add error trapping -- I was just illustrating what it might look like.
Also, I changed you update SQL. If "Id" is the primary key, I don't think you need the additional owner_user_id=$3 clause to make it work.

Postgres 'if not exists' fails because the sequence exists

I have several counters in an application I am building, as am trying to get them to be dynamically created by the application as required.
For a simplistic example, if someone types a word into a script it should return the number of times that word has been entered previously. Here is an example of sql that may be executed if they typed the word example.
CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
SELECT nextval('example')
This would return 1 the first time it ran, 2 the second time, etc.
The problem is when 2 people click the button at the same time.
First, please note that a lot more is happening in my application than just these statements, so the chances of them overlapping is much more significant than it would be if this was all that was happening.
1> BEGIN;
2> BEGIN;
1> CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
2> CREATE SEQUENCE IF NOT EXISTS example START WITH 1; -- is blocked by previous statement
1> SELECT nextval('example') -- returns 1 to user.
1> COMMIT; -- unblocks second connection
2> ERROR: duplicate key value violates unique constraint
"pg_type_typname_nsp_index"
DETAIL: Key (typname, typnamespace)=(example, 109649) already exists.
I was under the impression that by using "IF NOT EXISTS", the statement should just be a no-op if it does exist, but it seems to have this race condition where that is not the case. I say race condition because if these two are not executed at the same time, it works as one would expect.
I have noticed that IF NOT EXISTS is fairly new to postgres, so maybe they haven't worked out all of the kinks yet?
EDIT:
The main reason we were considering doing things this way was to avoid excess locking. The thought being that if two people were to increment at the same time, using a sequence would mean that neither user should have to wait for the other (except, as in this example, for the initial creation of that sequence)
Sequences are part of the database schema. If you find yourself modifying the schema dynamically based on the data stored in the database, you are probably doing something wrong. This is especially true for sequences, which have special properties e.g. regarding their behavior with respect to transactions. Specifically, if you increment a sequence (with the help of nextval) in the middle of a transaction and then you rollback that transaction, the value of the sequence will not be rolled back. So most likely, this kind of behavior is something that you don't want with your data. In your example, imagine that a user tries to add word. This results in the corresponding sequence being incremented. Now imagine that the transaction does not complete for reason (e.g. maybe the computer crashes) and it gets rolled back. You would end up with the word not being added to the database but with the sequence being incremented.
For the particular example that you mentioned, there is an easy solution; create an ordinary table to store all the "sequences". Something like that would do it:
CREATE TABLE word_frequency (
word text NOT NULL UNIQUE,
frequency integer NOT NULL
);
Now I understand that this is just an example, but if this approach doesn't work for your actual use case, let us know and we can adjust it to your needs.
Edit: Here's how you the above solution works. If a new word is added, run the following query ("UPSERT" syntax in postgres 9.5+ only):
INSERT INTO word_frequency(word,frequency)
VALUES ('foo',1)
ON CONFLICT (word)
DO UPDATE
SET frequency = word_frequency.frequency + excluded.frequency
RETURNING frequency;
This query will insert a new word in word_frequency with frequency 1, or if the word exists already it will increment the existing frequency by 1. Now what happens if two transaction try to do that at the same time? Consider the following scenario:
client 1 client 2
-------- --------
BEGIN
BEGIN
UPSERT ('foo',1)
UPSERT ('foo',1) <====
COMMIT
COMMIT
What will happen is that as soon as client 2 tries increment the frequency for foo (marked with the arrow above), that operation will block because the row was modified by a different transaction. When client 1 commits, client 2 will get unblocked and continue without any errors. This is exactly how we wanted it to work. Also note, that postgresql will use row-level locking to implement this behavior, so other insertions will not be blocked.
EDIT: The main reason we were considering doing things this way was to
avoid excess locking. The thought being that if two people were to
increment at the same time, using a sequence would mean that neither
user should have to wait for the other (except, as in this example,
for the initial creation of that sequence)
It sounds like you're optimizing for a problem that likely does not exist. Sure, if you have 100,000 simultaneous users that are only inserting rows (since a sequence will only be used then normally) there is the possibility of some contention with the sequence but realistically there will be other bottle necks long before the sequence gets in the way.
I'd advise you to first prove that the sequence is an issue. With a proper database design (which dynamic DDL is not) the sequence will not be the bottle neck.
As a reference, DDL is not transaction safe in most databases.

Row-Level Update Lock using System.Transactions

I have a MSSQL procedure with the following code in it:
SELECT Id, Role, JurisdictionType, JurisdictionKey
FROM
dbo.SecurityAssignment WITH(UPDLOCK, ROWLOCK)
WHERE Id = #UserIdentity
I'm trying to move that same behavior into a component that uses OleDb connections, commands, and transactions to achieve the same result. (It's a security component that uses the SecurityAssignment table shown above. I want it to work whether that table is in MSSQL, Oracle, or Db2)
Given the above SQL, if I run a test using the following code
Thread backgroundThread = new Thread(
delegate()
{
using (var transactionScope = new TrasnsactionScope())
{
Subject.GetAssignmentsHavingUser(userIdentity);
Thread.Sleep(5000);
backgroundWork();
transactionScope.Complete();
}
});
backgroundThread.Start();
Thread.Sleep(3000);
var foregroundResults = Subject.GetAssignmentsHavingUser(userIdentity);
Where
Subject.GetAssignmentsHavingUser
runs the sql above and returns a collection of results and backgroundWork is an Action that updates rows in the table, like this:
delegate
{
Subject.UpdateAssignment(newAssignment(user1, role1));
}
Then the foregroundResults returned by the test should reflect the changes made in the backgroundWork action.
That is, I retrieve a list of SecurityAssignment table rows that have UPDLOCK, ROWLOCK applied by the SQL, and subsequent queries against those rows don't return until that update lock is released - thus the foregroundResult in the test includes the updates made in the backgroundThread.
This all works fine.
Now, I want to do the same with database-agnostic SQL, using OleDb transactions and isolation levels to achieve the same result. And I can't for the life of me, figure out how to do it. Is it even possible, or does this row-level locking only apply at the db level?