Sequelize transaction retry doesn't work as expected - postgresql

I don't understand how transaction retry works in Sequelize.
I am using a managed transaction, though I also tried an unmanaged one with the same outcome:
await sequelize.transaction({ isolationLevel: Sequelize.Transaction.ISOLATION_LEVELS.REPEATABLE_READ }, async (t) => {
  user = await User.findOne({
    where: { id: authenticatedUser.id },
    transaction: t,
    lock: t.LOCK.UPDATE,
  });
  user.activationCodeCreatedAt = new Date();
  user.activationCode = activationCode;
  await user.save({ transaction: t });
});
Now if I run this while the row is already locked, I get
DatabaseError [SequelizeDatabaseError]: could not serialize access due to concurrent update
which is expected. This is my retry configuration:
retry: {
  match: [
    /concurrent update/,
  ],
  max: 5
}
I want Sequelize to retry the whole transaction at this point. But instead I see that right after the failed SELECT ... FOR UPDATE it issues another SELECT ... FOR UPDATE inside the same aborted transaction, which causes another error:
DatabaseError [SequelizeDatabaseError]: current transaction is aborted, commands ignored until end of transaction block
How can I use Sequelize's internal retry mechanism to retry the whole transaction?

Manual retry workaround function
Since the Sequelize devs simply haven't been interested in patching this for some reason after many years, here's my workaround:
async function transactionWithRetry(sequelize, transactionArgs, cb) {
  let done = false
  while (!done) {
    try {
      await sequelize.transaction(transactionArgs, cb)
      done = true
    } catch (e) {
      if (
        sequelize.options.dialect === 'postgres' &&
        e instanceof Sequelize.DatabaseError &&
        e.original.code === '40001'
      ) {
        // Serialization failure: end the aborted transaction,
        // then loop around to retry the whole thing.
        await sequelize.query(`ROLLBACK`)
      } else {
        // Error that we don't know how to handle.
        throw e;
      }
    }
  }
}
Sample usage:
const { Transaction } = require('sequelize');
await transactionWithRetry(sequelize,
  { isolationLevel: Transaction.ISOLATION_LEVELS.SERIALIZABLE },
  async t => {
    const rows = await sequelize.models.MyInt.findAll({ transaction: t })
    await sequelize.models.MyInt.update({ i: newI }, { where: {}, transaction: t })
  }
)
The error code 40001 is documented at https://www.postgresql.org/docs/13/errcodes-appendix.html, and it's the only one I've managed to observe so far on serialization failures (see: What are the conditions for encountering a serialization failure?). Let me know if you find any others that should be auto-retried and I'll patch them in.
Here's a full runnable test for it which seems to indicate that it is working fine: https://github.com/cirosantilli/cirosantilli.github.io/blob/dbb2ec61bdee17d42fe7e915823df37c4af2da25/sequelize/parallel_select_and_update.js
Tested on:
"pg": "8.5.1",
"pg-hstore": "2.3.3",
"sequelize": "6.5.1",
PostgreSQL 13.5, Ubuntu 21.10.
Infinite list of related requests
https://github.com/sequelize/sequelize/issues/1478 from 2014. Original issue was MySQL but thread diverged everywhere.
https://github.com/sequelize/sequelize/issues/8294 from 2017. Also asked on Stack Overflow, but it got the Tumbleweed badge and the question appears to have been auto-deleted; I can't find it in search. Mentions MySQL. The thread is a bit of a mess, as it also covers connection errors, which are not clear-cut retry cases like PostgreSQL serialization failures.
https://github.com/sequelize/sequelize/issues/12608 mentions Postgres
https://github.com/sequelize/sequelize/issues/13380 by the OP of this question
Meaning of current transaction is aborted, commands ignored until end of transaction block
The error is pretty explicit, but just to clarify for other PostgreSQL newbies: in PostgreSQL, when a query fails in the middle of a transaction, Postgres automatically errors out every following query until a ROLLBACK or COMMIT ends the transaction.
The DB client code is then supposed to notice that and just re-run the transaction.
These errors are therefore benign, and ideally Sequelize should not raise on them. They are actually expected when using ISOLATION LEVEL SERIALIZABLE or ISOLATION LEVEL REPEATABLE READ, and are precisely what prevents concurrency anomalies from happening.
But unfortunately Sequelize raises them just like any other error, so our workaround inevitably needs a while/try/catch loop.
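Stripped of the Sequelize specifics, the workaround's retry pattern is just a bounded loop around the transaction body. Here is a minimal sketch in plain JavaScript; `withRetry`, `runTransaction` and `isRetryable` are illustrative names, not Sequelize APIs (in the real workaround, `runTransaction` is `sequelize.transaction(...)` and `isRetryable` checks for PostgreSQL error code 40001):

```javascript
// Retry a transaction-like async function until it succeeds, the error
// is not retryable, or the attempt budget runs out.
async function withRetry(runTransaction, isRetryable, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await runTransaction();
    } catch (e) {
      // Re-throw unknown errors immediately, and give up after the last attempt.
      if (!isRetryable(e) || attempt === maxAttempts) throw e;
    }
  }
}
```

A serialization failure then simply causes the whole body to run again from the top, which is exactly what PostgreSQL expects the client to do.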

Related

Trying to use Knex onConflict times out my Cloud Function

I am trying to insert geoJSON data into a PostGIS instance on a regular schedule, and there is usually duplicate data each time it runs. I am looping through this geoJSON data and trying to use the Knex.js onConflict modifier to ignore rows when a duplicate key is found, but it times out my cloud function.
async function insertFeatures() {
  try {
    const results = await getGeoJSON();
    pool = pool || (await createPool());
    const st = knexPostgis(pool);
    for (const feature of results.features) {
      const { geometry, properties } = feature;
      const { region, date, type, name, url } = properties;
      const point = st.geomFromGeoJSON(geometry);
      await pool('observations')
        .insert({
          region: region,
          url: url,
          date: date,
          name: name,
          type: type,
          geom: point,
        })
        .onConflict('url')
        .ignore();
    }
  } catch (error) {
    console.log(error);
    return res.status(500).json({
      message: error + "Poop"
    });
  }
}
The timeout error could have a variety of causes: the batch size your function is processing, the connection pool size, or database server limitations. One knex-specific pitfall to check in your cloud function: when setting up the pool, knex lets you optionally register an afterCreate callback. If you add one, make sure you call the done callback that is passed as the last parameter to your registered callback, otherwise no connection will ever be acquired, leading to a timeout.
One way to see what knex is doing internally is to set the DEBUG=knex:* environment variable before running the code, so that knex outputs information about queries, transactions, and pool connections while the code executes. It is also advisable to set batch sizes, the connection pool size, and connection limits on the database server to match the workload you are pushing to it; this avoids the basic timeout issues.
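One concrete way to reduce round trips is to insert rows in fixed-size batches rather than one per await. The helper below is a plain JavaScript sketch (the `batchSize` value is illustrative); with knex you would pass each chunk to a single `insert(chunk).onConflict('url').ignore()` call instead of inserting one feature at a time:

```javascript
// Split a list of rows into fixed-size batches so each database call
// carries many rows instead of one.
function chunk(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

For example, 500 features with a batch size of 100 become 5 insert calls instead of 500, which also keeps each statement well under typical statement timeouts.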
Also check for similar examples here:
Knex timeout error acquiring connection
When trying to mass insert timeout occurs for knexjs error
Having timeout error after upgrading knex
Knex timeout acquiring a connection

How to execute different error messages depending on where a query failed in a transaction in pg-promise?

How can I execute varying error messages depending on which query failed (triggering a rollback) in my transaction?
I'll be using the sample code from the documentation:
db.tx(t => {
    // creating a sequence of transaction queries:
    const q1 = t.none(query);
    const q2 = t.one(query);
    const q3 = t.one(query);
    // returning a promise that determines a successful transaction:
    return t.batch([q1, q2, q3]); // all of the queries are to be resolved;
})
    .then(data => {
        // success, COMMIT was executed
    })
    .catch(error => {
        // failure, ROLLBACK was executed
    });
Preferred output is the following:
if the transaction failed in q1:
res.json({error: true, message:"q1 failed"})
if the transaction failed in q2:
res.json({error: true, message:"q2 failed"})
if the transaction failed in q3:
res.json({error: true, message:"q3 failed"}), etc.
What I'm thinking is using a Switch statement to determine what error message to execute, although I don't have an idea on how to know what query failed in the transaction.
Thank you for your help!
P.S. I recently migrated from node-pg to pg-promise (which is why I'm a bit new to the API) after having a hard time with transactions, as recommended in my previous posts. pg-promise made a lot of things easier; the one day spent refactoring the code was worth it.
Since you are using method batch, you get BatchError thrown when the method fails, which has useful property data, among others:
.catch(err => {
    // find index of the first failed query:
    const errIdx = err.data.findIndex(e => !e.success);
    // do what you want here, based on the index;
});
Note that inside such an error handler, err.data[errIdx].result is the same as err.first, representing the first error that occurred.
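To make the index-to-message mapping concrete, here is a small sketch that mimics the shape of BatchError's data property (an array of { success, result } entries; the data and labels below are fabricated for illustration):

```javascript
// Given batch result entries and query labels, return a message for
// the first failed query, or null if all succeeded.
function firstFailureMessage(data, labels) {
  const errIdx = data.findIndex(e => !e.success);
  return errIdx === -1 ? null : `${labels[errIdx]} failed`;
}
```

In the catch handler you would call something like `firstFailureMessage(err.data, ['q1', 'q2', 'q3'])` and pass the result to res.json, instead of a switch statement.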

Grails Job | Multiple updates in mongodb always throw optimistic locking exception, how to handle it?

I have a Grails job scheduled to run every night to update stats for all users: firstOrderDate, lastOrderDate and totalOrders.
Have a look at the code.
void updateOrderStatsForAllUsers(DateTime date) {
    List<Order> usersByOrders = Delivery.findAllByDeliveryDateAndStatus(date, "DELIVERED")*.order
    List<User> customers = usersByOrders*.customer.unique()
    for (User u in customers) {
        List<Order> orders = new ArrayList<Order>();
        orders = u.orders?.findAll { it.status.equals("DELIVERED") }?.sort { it?.dateCreated }
        if (orders?.size() > 0) {
            u.firstOrderDate = orders?.first()?.dateCreated
            u.lastOrderDate = orders?.last()?.dateCreated
            u.totalOrders = orders.size()
            u.save(flush: true)
        }
    }
}
and the job that runs this code is
def execute(){
    long jobStartTime = System.currentTimeMillis()
    emailService.sendJobStatusEmail(JOB_NAME, "STARTED", 0, null)
    try {
        // Daily job for updating user orders
        DateTime yesterday = new DateTime().withZone(DateTimeZone.getDefault()).withTimeAtStartOfDay().minusDays(1)
        userService.updateOrderStatsForAllUsers(yesterday)
        emailService.sendJobStatusEmail(JOB_NAME, "FINISHED", jobStartTime, null)
    }
    catch (Exception e) {
        emailService.sendJobStatusEmail(JOB_NAME, "FAILED", jobStartTime, e)
    }
}
So I am sending a mail for any exception that occurs. The issue is that I always get a failure mail with "Error: OptimisticLockingException" at u.save(). For a given date I have around 400 users.
I know why optimistic locking happens, but as you can see I am not updating the same user record repeatedly in the loop; I have a list of different users and I am iterating over them to update each one. So how can I get an optimistic locking exception at user save? Help!
Optimistic locking is a Hibernate error; MongoDB has nothing to do with it.
Which entity is throwing the optimistic locking exception: customer, order or delivery?
How do you ensure none of these entities are being updated elsewhere in the app while this job is running?
How do you ensure this job is only triggered once at a time?
Try adding some logging to see whether the issue is repeatable by triggering the job again once the previous execution has completed.
More debugging may help resolve the issue.
Quartz jobs usually do not provide a transaction context for their operations, so you should wrap your method in a transaction by hand:
def execute(){
    ...
    User.withTransaction { tx ->
        userService.updateOrderStatsForAllUsers(yesterday)
    }
    ....
}

multiple tasks in nodeunit with mongo fail

I've taken How do I get an asynchronous result back with node unit and mongoose? and very slightly modified it to be simpler and show my failure.
var mongoose = require('mongoose');
var db;

module.exports = {
  setUp: function(callback) {
    try {
      //db.connection.on('open', function() {
      mongoose.connection.on('open', function() {
        console.log('Opened connection');
        callback();
      });
      db = mongoose.connect('mongodb://localhost/test_1');
      console.log('Started connection, waiting for it to open');
    } catch (err) {
      console.log('Setting up failed:', err.message);
      test.done();
      callback(err);
    }
  },
  tearDown: function(callback) {
    console.log('In tearDown');
    try {
      console.log('Closing connection');
      db.disconnect();
      callback();
    } catch (err) {
      console.log('Tearing down failed:', err.message);
      test.done();
      callback(err);
    }
  },
  test1: function(test) {
    test.ifError(null);
    test.done();
  },
  test2: function(test) {
    test.ifError(null);
    test.done();
  }
};
When running this with nodeunit I get the following:
stam2_test.js
Started connection, waiting for it to open
Opened connection
In tearDown
Closing connection
✔ test1
Started connection, waiting for it to open
Opened connection
FAILURES: Undone tests (or their setups/teardowns):
- test2
To fix this, make sure all tests call test.done()
Some more info:
If in the setUp/tearDown I don't use mongo but just test code, like increasing a counter, it all works.
If I have only one test, everything works.
Adding another test AND having mongo in the setup consistently fails it, so I guess I'm doing something wrong in the setup.
Thank you in advance.
The reason for the failure seems to be that the event subscription from mongoose.connection.on('open', ...) remains bound to the callback from test1 even after the disconnect and reconnect for test2. The extra call to the previous callback is what causes the trouble.
You should make sure to remove the subscription when you are done with it. Since the mongoose connection is based on the Node.js EventEmitter, a simple solution is to replace the call
mongoose.connection.on('open'...)
with
mongoose.connection.once('open'...)
but you could also use the general addListener()/removeListener() as needed.
On a different note, it seems unnecessary to connect and disconnect in each test of a unit test. You could connect just once, for example by requiring a module that connects to your test database, as in require('db_connect_test'); the module db_connect_test would just call mongoose.connect(...), and all the tests would run with the same connection (or pool, as mongoose creates one).
Have a good one!

Code First - Retrieve and Update Record in a Transaction without Deadlocks

I have a EF code first context which represents a queue of jobs which a processing application can retrieve and run. These processing applications can be running on different machines but pointing at the same database.
The context provides a method that returns a QueueItem if there is any work to do, or null, called CollectQueueItem.
To ensure no two applications can pick up the same job, the collection takes place in a transaction with an isolation level of REPEATABLE READ. This means that if two attempts to pick up the same job happen at the same time, one will be chosen as the deadlock victim and rolled back. We can handle this by catching the DbUpdateException and returning null.
Here is the code for the CollectQueueItem method:
public QueueItem CollectQueueItem()
{
    using (var transaction = new TransactionScope(TransactionScopeOption.Required,
        new TransactionOptions { IsolationLevel = IsolationLevel.RepeatableRead }))
    {
        try
        {
            var queueItem = this.QueueItems.FirstOrDefault(qi => !qi.IsLocked);
            if (queueItem != null)
            {
                queueItem.DateCollected = DateTime.UtcNow;
                queueItem.IsLocked = true;
                this.SaveChanges();
                transaction.Complete();
                return queueItem;
            }
        }
        catch (DbUpdateException) //we might have been the deadlock victim. No matter.
        { }
        return null;
    }
}
I ran a test in LinqPad to check that this is working as expected. Here is the test below:
var ids = Enumerable.Range(0, 8).AsParallel().SelectMany(i =>
    Enumerable.Range(0, 100).Select(j => {
        using (var context = new QueueContext())
        {
            var queueItem = context.CollectQueueItem();
            return queueItem == null ? -1 : queueItem.OperationId;
        }
    })
);
var sw = Stopwatch.StartNew();
var results = ids.GroupBy(i => i).ToDictionary(g => g.Key, g => g.Count());
sw.Stop();
Console.WriteLine("Elapsed time: {0}", sw.Elapsed);
Console.WriteLine("Deadlocked: {0}", results.Where(r => r.Key == -1).Select(r => r.Value).SingleOrDefault());
Console.WriteLine("Duplicates: {0}", results.Count(r => r.Key > -1 && r.Value > 1));
//IsolationLevel = IsolationLevel.RepeatableRead:
//Elapsed time: 00:00:26.9198440
//Deadlocked: 634
//Duplicates: 0
//IsolationLevel = IsolationLevel.ReadUncommitted:
//Elapsed time: 00:00:00.8457558
//Deadlocked: 0
//Duplicates: 234
I ran the test a few times. Without the REPEATABLE READ isolation level, the same job is retrieved by different threads (seen in the 234 duplicates). With REPEATABLE READ, jobs are only retrieved once, but performance suffers and there are 634 deadlocked transactions.
My question is: is there a way to get this behaviour in EF without the risk of deadlocks or conflicts? I know in real life there will be less contention, as the processors won't be continually hitting the database, but nonetheless: is there a way to do this safely without having to handle the DbUpdateException? Can I get performance closer to the version without the REPEATABLE READ isolation level? Or are deadlocks not that bad after all, so that I can safely ignore the exception, let the processor retry after a few millis, and accept that performance will be OK as long as not all the transactions happen at the same time?
Thanks in advance!
I'd recommend a different approach.
a) sp_getapplock
Use a SQL stored procedure that provides the application lock feature.
This gives you unique app behaviour, which might involve reads from the DB or whatever other activity you need to control. It also lets you use EF in a normal way.
OR
b) Optimistic concurrency
http://msdn.microsoft.com/en-us/data/jj592904
//Object Property:
public byte[] RowVersion { get; set; }
//Object Configuration:
Property(p => p.RowVersion).IsRowVersion().IsConcurrencyToken();
A logical extension to the app lock, or usable just by itself, is a rowversion concurrency field on the DB. Allow the dirty read, BUT when someone goes to update the record as collected, the update fails if someone beat them to it. Out-of-the-box EF optimistic locking.
You can delete "collected" job records later easily.
This might be the better approach unless you expect high levels of concurrency.
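The optimistic-concurrency idea is language-agnostic, so here is a minimal in-memory sketch of it in JavaScript (not EF itself; the "table", field names and function names are illustrative). The conditional update plays the role of `UPDATE ... WHERE Id = @id AND IsLocked = @original`; zero rows affected means another processor changed the token first, which is what EF surfaces as DbUpdateConcurrencyException:

```javascript
// Compare-and-set update on the concurrency token: only succeeds if
// isLocked still has the value we originally read.
function conditionalUpdate(table, id, expectedLocked) {
  const row = table.find(r => r.id === id);
  if (!row || row.isLocked !== expectedLocked) return false; // token changed: lose the race
  row.isLocked = true;
  row.dateCollected = new Date();
  return true;
}

// Read an unlocked item, then try to claim it with the conditional update.
function collectQueueItem(table) {
  const item = table.find(r => !r.isLocked); // the SELECT
  if (!item) return null;                    // queue is empty
  // On failure, someone else grabbed the job; return null instead of the item.
  return conditionalUpdate(table, item.id, false) ? item : null;
}
```

Each job is claimed exactly once without any blocking lock: losers of the race simply get null and move on, mirroring the catch-and-return-null handling of DbUpdateConcurrencyException below.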
As suggested by Phil, I used optimistic concurrency to ensure the job could not be processed more than once. I realised that rather than having to add a dedicated rowversion column I could use the IsLocked bit column as the ConcurrencyToken. Semantically, if this value has changed since we retrieved the row, the update should fail since only one processor should ever be able to lock it. I used the fluent API as below to configure this, although I could also have used the ConcurrencyCheck data annotation.
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<QueueItem>()
        .Property(p => p.IsLocked)
        .IsConcurrencyToken();
}
I was then able to simplify the CollectQueueItem method, losing the TransactionScope entirely and catching the more specific DbUpdateConcurrencyException.
public OperationQueueItem CollectQueueItem()
{
    try
    {
        var queueItem = this.QueueItems.FirstOrDefault(qi => !qi.IsLocked);
        if (queueItem != null)
        {
            queueItem.DateCollected = DateTime.UtcNow;
            queueItem.IsLocked = true;
            this.SaveChanges();
            return queueItem;
        }
    }
    catch (DbUpdateConcurrencyException) //someone else grabbed the job.
    { }
    return null;
}
I reran the tests, and you can see it's a great compromise: no duplicates, nearly 100x faster than with REPEATABLE READ, and no deadlocks, so the DBAs won't be on my case. Awesome!
//Optimistic Concurrency:
//Elapsed time: 00:00:00.5065586
//Deadlocked: 624
//Duplicates: 0