How to execute different error messages depending on where a query failed in a transaction in pg-promise?

How can I return different error messages depending on which query failed (and triggered a rollback) in my transaction?
I'll be using the sample code from the documentation:
db.tx(t => {
    // creating a sequence of transaction queries:
    const q1 = t.none(query);
    const q2 = t.one(query);
    const q3 = t.one(query);
    // returning a promise that determines a successful transaction:
    return t.batch([q1, q2, q3]); // all of the queries are to be resolved;
})
    .then(data => {
        // success, COMMIT was executed
    })
    .catch(error => {
        // failure, ROLLBACK was executed
    });
Preferred output is the following:
if the transaction failed in q1:
    res.json({error: true, message: "q1 failed"})
if the transaction failed in q2:
    res.json({error: true, message: "q2 failed"})
if the transaction failed in q3:
    res.json({error: true, message: "q3 failed"}), etc.
What I'm thinking of is using a switch statement to pick the error message, although I don't know how to tell which query failed within the transaction.
Thank you for your help!
P.S. I recently migrated from node-pg to pg-promise (which is why I'm a bit new to the API) because I was having a hard time with transactions, as recommended in my previous posts. pg-promise made a lot of things easier, and the one day of refactoring was well worth it.

Since you are using method batch, a BatchError is thrown when the method fails; among other things, it has a useful data property:
.catch(err => {
    // find index of the first failed query:
    const errIdx = err.data.findIndex(e => !e.success);
    // do what you want here, based on the index;
});
Note that inside such an error handler, err.data[errIdx].result is the same as err.first, representing the first error that occurred.
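Putting it together with the switch idea from the question, a minimal sketch could look like this (the res object and the message strings come from the question, not from pg-promise itself):
db.tx(t => {
    const q1 = t.none(query);
    const q2 = t.one(query);
    const q3 = t.one(query);
    return t.batch([q1, q2, q3]);
})
    .then(data => {
        // success, COMMIT was executed
        res.json({error: false, data});
    })
    .catch(err => {
        // err.data is only present on BatchError, so guard against other failures:
        const errIdx = err.data ? err.data.findIndex(e => !e.success) : -1;
        switch (errIdx) {
            case 0:
                res.json({error: true, message: 'q1 failed'});
                break;
            case 1:
                res.json({error: true, message: 'q2 failed'});
                break;
            case 2:
                res.json({error: true, message: 'q3 failed'});
                break;
            default:
                res.json({error: true, message: 'transaction failed'});
        }
    });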

Related

Trying to use Knex onConflict times out my Cloud Function

I am trying to insert geoJSON data into a PostGIS instance on a regular schedule, and there is usually duplicate data each time it runs. I am looping through this geoJSON data and trying to use the Knex.js onConflict modifier to ignore rows where a duplicate key field is found, but it times out my cloud function.
async function insertFeatures() {
    try {
        const results = await getGeoJSON();
        pool = pool || (await createPool());
        const st = knexPostgis(pool);
        for (const feature of results.features) {
            const { geometry, properties } = feature;
            const { region, date, type, name, url } = properties;
            const point = st.geomFromGeoJSON(geometry);
            await pool('observations')
                .insert({
                    region: region,
                    url: url,
                    date: date,
                    name: name,
                    type: type,
                    geom: point,
                })
                .onConflict('url')
                .ignore();
        }
    } catch (error) {
        console.log(error);
        return res.status(500).json({
            message: error + "Poop"
        });
    }
}
The timeout could have a variety of causes: the batch size your function is processing, the connection pool size, or database server limits. In your cloud function, check how the pool is set up: Knex lets you optionally register an afterCreate callback, and if you add one you must call the done callback that is passed as its last parameter, otherwise no connection will ever be acquired and acquisition will time out.
Also, one way to see what Knex is doing internally is to set the DEBUG=knex:* environment variable before running the code, so that Knex logs information about queries, transactions and pool connections while the code executes. It is also advisable to set batch sizes, the connection pool size and the database server's connection limits to match the workload you are pushing to the server; this rules out the most basic causes of timeouts.
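As a minimal sketch (the connection string and the SET timezone query are placeholders, not part of the question), a pool configured with an afterCreate callback that correctly calls done, plus explicit pool bounds and an acquire timeout, would look roughly like this:
// Run with DEBUG=knex:* to see queries, transactions and pool activity.
const knex = require('knex')({
    client: 'pg',
    connection: process.env.DATABASE_URL, // placeholder connection string
    pool: {
        min: 2,
        max: 10,
        // If afterCreate is registered, done(err, conn) must be called,
        // otherwise no connection is ever handed back to the pool and
        // acquisition times out.
        afterCreate: (conn, done) => {
            conn.query('SET timezone = "UTC";', (err) => {
                done(err, conn);
            });
        },
    },
    acquireConnectionTimeout: 60000, // ms to wait for a free connection
});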
Also check for similar examples here:
Knex timeout error acquiring connection
When trying to mass insert timeout occurs for knexjs error
Having timeout error after upgrading knex
Knex timeout acquiring a connection

Sequelize transaction retry doesn't work as expected

I don't understand how transaction retry works in sequelize.
I am using a managed transaction, though I also tried an unmanaged one with the same outcome.
await sequelize.transaction({ isolationLevel: Sequelize.Transaction.ISOLATION_LEVELS.REPEATABLE_READ }, async (t) => {
    user = await User.findOne({
        where: { id: authenticatedUser.id },
        transaction: t,
        lock: t.LOCK.UPDATE,
    });
    user.activationCodeCreatedAt = new Date();
    user.activationCode = activationCode;
    await user.save({ transaction: t });
});
Now if I run this when the row is already locked, I am getting
DatabaseError [SequelizeDatabaseError]: could not serialize access due to concurrent update
which is normal. This is my retry configuration:
retry: {
    match: [
        /concurrent update/,
    ],
    max: 5
}
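For context, this block is passed in the Sequelize constructor options; a minimal sketch with placeholder credentials (not the original code):
const { Sequelize } = require('sequelize');

const sequelize = new Sequelize('mydb', 'user', 'password', { // placeholder credentials
    host: 'localhost',
    dialect: 'postgres',
    retry: {
        match: [
            /concurrent update/,
        ],
        max: 5,
    },
});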
At this point I want Sequelize to retry the transaction. But instead I see that, right after the failing SELECT... FOR UPDATE, it immediately issues another SELECT... FOR UPDATE inside the same aborted transaction. This causes another error
DatabaseError [SequelizeDatabaseError]: current transaction is aborted, commands ignored until end of transaction block
How do I use Sequelize's internal retry mechanism to retry the whole transaction?
Manual retry workaround function
Since Sequelize devs simply aren't interested in patching this for some reason after many years, here's my workaround:
async function transactionWithRetry(sequelize, transactionArgs, cb) {
    let done = false
    while (!done) {
        try {
            await sequelize.transaction(transactionArgs, cb)
            done = true
        } catch (e) {
            if (
                sequelize.options.dialect === 'postgres' &&
                e instanceof Sequelize.DatabaseError &&
                e.original.code === '40001'
            ) {
                await sequelize.query(`ROLLBACK`)
            } else {
                // Error that we don't know how to handle.
                throw e;
            }
        }
    }
}
Sample usage:
const { Transaction } = require('sequelize');

await transactionWithRetry(sequelize,
    { isolationLevel: Transaction.ISOLATION_LEVELS.SERIALIZABLE },
    async t => {
        const rows = await sequelize.models.MyInt.findAll({ transaction: t })
        await sequelize.models.MyInt.update({ i: newI }, { where: {}, transaction: t })
    }
)
The error code 40001 is documented at https://www.postgresql.org/docs/13/errcodes-appendix.html, and it is the only one I've managed to observe so far for serialization failures (see: What are the conditions for encountering a serialization failure?). Let me know if you find any others that should be retried automatically and I'll patch them in.
Here's a full runnable test for it which seems to indicate that it is working fine: https://github.com/cirosantilli/cirosantilli.github.io/blob/dbb2ec61bdee17d42fe7e915823df37c4af2da25/sequelize/parallel_select_and_update.js
Tested on:
"pg": "8.5.1",
"pg-hstore": "2.3.3",
"sequelize": "6.5.1",
PostgreSQL 13.5, Ubuntu 21.10.
Infinite list of related requests
https://github.com/sequelize/sequelize/issues/1478 from 2014. Original issue was MySQL but thread diverged everywhere.
https://github.com/sequelize/sequelize/issues/8294 from 2017. Also asked on Stack Overflow, but it got the Tumbleweed badge and the question appears to have been auto-deleted; I can't find it on search. Mentions MySQL. It is a bit of a mess, as it also includes connection errors, which are not clear-cut retry cases like PostgreSQL serialization failures.
https://github.com/sequelize/sequelize/issues/12608 mentions Postgres
https://github.com/sequelize/sequelize/issues/13380 by the OP of this question
Meaning of current transaction is aborted, commands ignored until end of transaction block
The error is pretty explicit, but just to clarify to other PostgreSQL newbies: in PostgreSQL, when you get a failure in the middle of a transaction, Postgres just auto-errors any following queries until a ROLLBACK or COMMIT happens and ends the transaction.
The DB client code is then supposed to notice that and just re-run the transaction.
These errors are therefore benign, and ideally Sequelize should not raise on them: they are actually expected when using ISOLATION LEVEL SERIALIZABLE and ISOLATION LEVEL REPEATABLE READ, and they are how those isolation levels prevent concurrency anomalies.
But unfortunately Sequelize raises them just like any other error, so our workaround inevitably needs a while/try/catch loop.
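To see the behaviour in isolation, here is a minimal node-postgres sketch (not Sequelize; assumes a reachable database configured via the usual PG* environment variables): after the first failing statement, every further query fails with the same "current transaction is aborted" error until ROLLBACK ends the transaction.
const { Client } = require('pg');

async function demo() {
    const client = new Client(); // connection settings via PG* env vars (placeholder)
    await client.connect();
    await client.query('BEGIN');
    try {
        await client.query('SELECT 1/0'); // fails: division by zero
    } catch (e) {
        console.log(e.message);
    }
    try {
        await client.query('SELECT 1'); // also fails, even though the query itself is fine
    } catch (e) {
        console.log(e.message); // current transaction is aborted, commands ignored...
    }
    await client.query('ROLLBACK'); // ends the transaction; queries work again
    await client.end();
}

demo();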

Sequential queries in MongoDB not working properly sometimes

I am executing 2 update queries sequentially. I am using a generator function and yield to handle JavaScript's asynchronous behaviour.
var result = yield db.tasks.update({
    "_id": task._id,
    "taskLog": {$elemMatch: {"currentApproverRole": vcurrentApproverRole,
        "currentApprover": new RegExp(employeeCode, 'i')}}
}, {
    $set: {
        "taskPendingAt": vnextApproverEmpCode,
        "status": vactionTaken,
        "lastUpdated": vactionTakenTime,
        "lastUpdatedBy": employeeCode,
        "shortPin": shortPin,
        "workFlowDetails": task.workFlowDetails,
        "taskLog.$.reason": reason,
        "taskLog.$.actionTakenBy": employeeCode,
        "taskLog.$.actionTakenByName": loggedInUser.firstName + " " + loggedInUser.lastName,
        "taskLog.$.actionTaken": vactionTaken,
        "taskLog.$.actionTakenTime": vactionTakenTime
    }
});
var vstatus = vactionTaken;
// Below is the query that is not working properly sometimes
yield db.groupPicnic.update({"gppTaskId": task.workFlowDetails.gppTaskId, "probableParticipantList.employeeCode": task.createdBy},
    {
        $set: {
            'probableParticipantList.$.applicationStatus': vactionTaken
        }
    })
The second update operation sometimes does not execute (it works 9 out of 10 times). I can't figure out how to handle this issue.
ES6 generators are supposed to provide a simple way of writing iterators.
An iterator is just a sequence of values - like an array, but consumed dynamically and produced lazily.
Currently your code does this:
let imAnUnresolvedPromise = co().next();
// exiting app, promise might not resolve in time
By moving on and not waiting on the promise (assuming your app then exits), you can't guarantee that it will execute in time, hence the unstable behaviour you're experiencing.
All you have to change is to wait on the promise to resolve.
let resolveThis = await co().next();
EDIT:
Without async/await syntax you'll have to use nested callbacks to guarantee the correct order, like so:
co().next().then((promiseResolved) => {
    co().next().then((promiseTwoResolved) => {
        console.log("I'm done")
    })
});
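As an alternative sketch (not the original generator code), the same ordering guarantee can be had with async/await, so the second update only starts once the first has resolved. Here db, task and vactionTaken are assumed to be the same variables as in the question, and the filters and update fields are abbreviated:
// Minimal sketch; assumes db.tasks.update and db.groupPicnic.update return promises.
async function applyAction(db, task, vactionTaken) {
    // First update: awaited before anything else happens.
    const result = await db.tasks.update(
        { "_id": task._id /* ...same filter as in the question */ },
        { $set: { "status": vactionTaken /* ...same fields as in the question */ } }
    );

    // Second update: only starts after the first one has resolved.
    await db.groupPicnic.update(
        { "gppTaskId": task.workFlowDetails.gppTaskId, "probableParticipantList.employeeCode": task.createdBy },
        { $set: { 'probableParticipantList.$.applicationStatus': vactionTaken } }
    );

    return result;
}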

Replay subject subscription behaviour

The following code works as expected but I am puzzled by the way it behaves when I uncomment the line 'o.OnCompleted();'
The code joins all subscribers to the result of a single long operation and caches the result for further subscribers for 2 seconds. Any subscription after this time starts the process again.
Subscriptions will come from other threads (simulated with the thread pool).
var obs = Observable.Create((IObserver<Guid> o) =>
{
    Console.WriteLine("Start");
    Thread.Sleep(1000); // process
    Console.WriteLine("End");
    o.OnNext(Guid.NewGuid());
    //o.OnCompleted(); // <-- uncomment this
    return Disposable.Empty;
})
.Replay(TimeSpan.FromSeconds(2))
.RefCount()
.Take(1);

ThreadPool.QueueUserWorkItem(delegate
{
    // simulate request from threadpool
    obs.Subscribe(x => Console.WriteLine($"1: {x}"), () => Console.WriteLine($"1: complete"));
});
ThreadPool.QueueUserWorkItem(delegate
{
    obs.Subscribe(x => Console.WriteLine($"2: {x}"), () => Console.WriteLine($"2: complete"));
});

Thread.Sleep(4000);

ThreadPool.QueueUserWorkItem(delegate
{
    obs.Subscribe(x => Console.WriteLine($"3: {x}"), () => Console.WriteLine($"3: complete"));
});
Here is the result:
Start
End
1: 255BEFDC-2F14-40AD-AE77-2B005C5A3AA9
2: 255BEFDC-2F14-40AD-AE77-2B005C5A3AA9
1: complete
2: complete
Start
End
3: 1214DC63-F688-475A-9CB7-C3784054A4AC
3: complete
The odd behaviour is if I uncomment the line 'o.OnCompleted()' the result changes to this:
Start
End
1: 255BEFDC-2F14-40AD-AE77-2B005C5A3AA9
2: 255BEFDC-2F14-40AD-AE77-2B005C5A3AA9
1: complete
2: complete
Start
End
3: complete
The 3rd subscriber causes another subscription to the root observable, but the result is missing. It appears the ReplaySubject caches the fact that the previous observable completed, yet it still triggers a new subscription. This seems unintuitive. I would like to understand why it doesn't work.
Note: I originally tried this using Defer instead of Create, which had the same result as the second run above (for obvious reasons).
When you use the Replay/RefCount pair you create an observable that shares a common subscription to the source observable.
From the source:
Returns a connectable observable sequence that shares a single subscription to the underlying sequence replaying all notifications.
Now, it's important to remember that an observable produces a series of zero or more values, followed by either a complete or error signal. It cannot produce values after a complete or error is produced.
Since you are sharing a common subscription to the source, once the source produces a complete it cannot produce any more values. When you call o.OnCompleted() you are doing exactly that.
Also, as a side note, you should avoid ever writing return Disposable.Empty; inside a Create. It means you're creating an observable that can complete before the subscription has returned, which can lead to race conditions.
The way to write your code without it is:
var obs =
    Observable
        .Defer(() => Observable.Return(Guid.NewGuid()).Concat(Observable.Never<Guid>()))
        .Replay(TimeSpan.FromSeconds(2.0))
        .RefCount()
        .Take(1);
But this is the same as not calling o.OnCompleted().

Code First - Retrieve and Update Record in a Transaction without Deadlocks

I have an EF code-first context which represents a queue of jobs which a processing application can retrieve and run. These processing applications can be running on different machines but pointing at the same database.
The context provides a method that returns a QueueItem if there is any work to do, or null, called CollectQueueItem.
To ensure no two applications can pick up the same job, the collection takes place in a transaction with an ISOLATION LEVEL of REPEATABLE READ. This means that if there are two attempts to pick up the same job at the same time, one will be chosen as the deadlock victim and rolled back. We can handle this by catching the DbUpdateException and returning null.
Here is the code for the CollectQueueItem method:
public QueueItem CollectQueueItem()
{
    using (var transaction = new TransactionScope(TransactionScopeOption.Required, new TransactionOptions { IsolationLevel = IsolationLevel.RepeatableRead }))
    {
        try
        {
            var queueItem = this.QueueItems.FirstOrDefault(qi => !qi.IsLocked);
            if (queueItem != null)
            {
                queueItem.DateCollected = DateTime.UtcNow;
                queueItem.IsLocked = true;
                this.SaveChanges();
                transaction.Complete();
                return queueItem;
            }
        }
        catch (DbUpdateException) //we might have been the deadlock victim. No matter.
        { }
        return null;
    }
}
I ran a test in LinqPad to check that this is working as expected. Here is the test below:
var ids = Enumerable.Range(0, 8).AsParallel().SelectMany(i =>
    Enumerable.Range(0, 100).Select(j => {
        using (var context = new QueueContext())
        {
            var queueItem = context.CollectQueueItem();
            return queueItem == null ? -1 : queueItem.OperationId;
        }
    })
);

var sw = Stopwatch.StartNew();
var results = ids.GroupBy(i => i).ToDictionary(g => g.Key, g => g.Count());
sw.Stop();

Console.WriteLine("Elapsed time: {0}", sw.Elapsed);
Console.WriteLine("Deadlocked: {0}", results.Where(r => r.Key == -1).Select(r => r.Value).SingleOrDefault());
Console.WriteLine("Duplicates: {0}", results.Count(r => r.Key > -1 && r.Value > 1));
//IsolationLevel = IsolationLevel.RepeatableRead:
//Elapsed time: 00:00:26.9198440
//Deadlocked: 634
//Duplicates: 0
//IsolationLevel = IsolationLevel.ReadUncommitted:
//Elapsed time: 00:00:00.8457558
//Deadlocked: 0
//Duplicates: 234
I ran the test a few times. Without the REPEATABLE READ isolation level, the same job is retrieved by different threads (seen in the 234 duplicates). With REPEATABLE READ, jobs are only retrieved once, but performance suffers and there are 634 deadlocked transactions.
My question is: is there a way to get this behaviour in EF without the risk of deadlocks or conflicts? I know in real life there will be less contention, as the processors won't be continually hitting the database, but nonetheless, is there a way to do this safely without having to handle the DbUpdateException? Can I get performance closer to that of the version without the REPEATABLE READ isolation level? Or are deadlocks in fact not that bad, such that I can safely ignore the exception, let the processor retry after a few millis, and accept that performance will be OK if not all the transactions happen at the same time?
Thanks in advance!
I'd recommend a different approach.
a) sp_getapplock
Use a SQL stored procedure that exposes the application-lock feature (sp_getapplock).
That way you can serialize whatever application behaviour you need to control, whether it involves reading from the DB or any other activity, and it still lets you use EF in the normal way.
OR
b) Optimistic concurrency
http://msdn.microsoft.com/en-us/data/jj592904
//Object Property:
public byte[] RowVersion { get; set; }
//Object Configuration:
Property(p => p.RowVersion).IsRowVersion().IsConcurrencyToken();
A logical extension to the app lock, or usable just by itself, is a rowversion concurrency field on the DB. Allow the dirty read, but when someone goes to update the record as collected, the update fails if someone beat them to it. That's EF's out-of-the-box optimistic locking.
You can delete "collected" job records later easily.
This might be a better approach unless you expect high levels of concurrency.
As suggested by Phil, I used optimistic concurrency to ensure the job could not be processed more than once. I realised that rather than having to add a dedicated rowversion column I could use the IsLocked bit column as the ConcurrencyToken. Semantically, if this value has changed since we retrieved the row, the update should fail since only one processor should ever be able to lock it. I used the fluent API as below to configure this, although I could also have used the ConcurrencyCheck data annotation.
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<QueueItem>()
        .Property(p => p.IsLocked)
        .IsConcurrencyToken();
}
I was then able to simplify the CollectQueueItem method, losing the TransactionScope entirely and catching the more specific DbUpdateConcurrencyException.
public OperationQueueItem CollectQueueItem()
{
    try
    {
        var queueItem = this.QueueItems.FirstOrDefault(qi => !qi.IsLocked);
        if (queueItem != null)
        {
            queueItem.DateCollected = DateTime.UtcNow;
            queueItem.IsLocked = true;
            this.SaveChanges();
            return queueItem;
        }
    }
    catch (DbUpdateConcurrencyException) //someone else grabbed the job.
    { }
    return null;
}
I reran the tests, and you can see it's a great compromise. No duplicates, nearly 100x faster than with REPEATABLE READ, and no deadlocks, so the DBAs won't be on my case. Awesome! (Note that the "Deadlocked" figure below now simply counts the -1 results, i.e. concurrency conflicts where another processor grabbed the job first, not actual database deadlocks.)
//Optimistic Concurrency:
//Elapsed time: 00:00:00.5065586
//Deadlocked: 624
//Duplicates: 0