OrientDB keeps locking records indefinitely

I'm running OrientDB (community edition 2.2.9) in distributed mode with multiple nodes.
After a couple of minutes, I start getting the following error on my queries:
com.orientechnologies.orient.server.distributed.task.ODistributedRecordLockedException: Timeout (1500ms) on acquiring lock on record #1010:2651. It is locked by request 3.1000 DB name="MyDatabase"
The query in this instance looks like this:
UPDATE #1010:2651 SET name='foo';
The record remains locked and I can't run the query until I restart the database.
If I don't run the server in distributed mode I don't get this error, so it must have something to do with distributed mode.
Here is my default-distributed-db-config.json:
{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": 1,
  "executionMode": "asynchronous",
  "readYourWrites": true,
  "servers": {
    "*": "master"
  },
  "clusters": {
    "internal": {},
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}
I was using the following configuration in my orientdb-server-config.xml:
....
<handler class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
  <parameters>
    ....
    <parameter value="com.orientechnologies.orient.server.distributed.conflict.ODefaultReplicationConflictResolver" name="conflict.resolver.impl"/>
    ....
  </parameters>
</handler>
...
After removing the "ODefaultReplicationConflictResolver" parameter from the config, the locking issue happens less frequently.
Why are the records locking up like this and how can I avoid it?

Using asynchronous execution mode may cause this problem. See: Asynchronous replication mode.
You can try changing the execution mode, or try adding a retry to your query. Using Java, it is possible to catch command events during asynchronous replication, thanks to the following methods of OCommandSQL:
onAsyncReplicationOk(), to catch the event when the asynchronous replication succeeds
onAsyncReplicationError(), to catch the event when the asynchronous replication returns an error
Example retrying up to 3 times in case of a concurrent modification exception on edge creation:
g.command(new OCommandSQL("create edge Own from (select from User) to (select from Post)")
    .onAsyncReplicationError(new OAsyncReplicationError() {
        @Override
        public ACTION onAsyncReplicationError(Throwable iException, int iRetry) {
            System.err.println("Error, retrying...");
            return iException instanceof ONeedRetryException && iRetry <= 3 ? ACTION.RETRY : ACTION.IGNORE;
        }
    })
    .onAsyncReplicationOk(new OAsyncReplicationOk() {
        @Override
        public void onAsyncReplicationOk() {
            System.out.println("OK");
        }
    })
).execute();
Or add the retry in a SQL batch:
begin
let upd = UPDATE #1010:2651 SET name='foo'
commit retry 100
return $upd
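If you are issuing that batch from Java, a minimal sketch might look like this (assuming the 2.2.x document API; the connection URL and credentials are placeholders):
import com.orientechnologies.orient.core.command.script.OCommandScript;
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;

// Sketch: run the retrying SQL batch through the Java API.
ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/MyDatabase").open("admin", "admin");
try {
    String batch = "begin\n"
            + "let upd = UPDATE #1010:2651 SET name='foo'\n"
            + "commit retry 100\n"
            + "return $upd";
    Object updated = db.command(new OCommandScript("sql", batch)).execute();
} finally {
    db.close();
}
Alternatively, switching "executionMode" from "asynchronous" to "synchronous" in default-distributed-db-config.json avoids the asynchronous replication path altogether, at the cost of each write waiting on the write quorum.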
Hope it helps.

Related

Mongo change-Stream with Spring resumeAt vs startAfter and fault tolerance in case of connection loss

I can't find an answer on Stack Overflow, nor in any documentation.
I have the following change stream code (it listens to a whole DB, not a specific collection).
Mongo version is 4.2.
@Configuration
public class DatabaseChangeStreamListener {
    // Constructor, fields etc...

    @PostConstruct
    public void initialize() {
        MessageListenerContainer container = new DefaultMessageListenerContainer(mongoTemplate, new SimpleAsyncTaskExecutor(), this::onException);
        ChangeStreamRequest.ChangeStreamRequestOptions options =
                new ChangeStreamRequest.ChangeStreamRequestOptions(mongoTemplate.getDb().getName(), null, buildChangeStreamOptions());
        container.register(new ChangeStreamRequest<>(this::onDatabaseChangedEvent, options), Document.class);
        container.start();
    }

    private ChangeStreamOptions buildChangeStreamOptions() {
        return ChangeStreamOptions.builder()
                .returnFullDocumentOnUpdate()
                .filter(newAggregation(match(where(OPERATION_TYPE).in(INSERT.getValue(), UPDATE.getValue(), REPLACE.getValue(), DELETE.getValue()))))
                .resumeAt(Instant.now().minusSeconds(1))
                .build();
    }
    // more code
}
I want the stream to start listening from system initiation time only, without picking up anything earlier from the oplog. Will .resumeAt(Instant.now().minusSeconds(1)) work?
Do I need to use the startAfter method? If so, how can I find the latest resumeToken in the DB?
Or is it ready out of the box, so I don't need to add any resume/start lines?
Second question: I never stop the container (it should live for as long as the app is running). In case of a disconnection from MongoDB followed by a reconnection, will the listener in the current configuration continue to consume messages? (I am having a hard time simulating a DB disconnection.)
If it does not resume handling events, what do I need to change in the configuration so that the change stream continues and picks up all the events from the last received resumeToken prior to the disconnection?
I have read this great article on Medium about change streams in production,
but it uses the cursor directly, and I want to use the Spring DefaultMessageListenerContainer, as it is much more elegant.
So I will answer my own questions (some more dumb, some less :)...):
When no resumeAt timestamp is provided, the change stream will start from the current time and will not replay any previous events.
The difference between resuming after an event and resuming from a timestamp can be found in this stackOverflow answer.
But keep in mind that a timestamp is inclusive of the event, so if you want to start from the next event (in Java), do:
private BsonTimestamp getNextEventTimestamp(BsonTimestamp timestamp) {
    return new BsonTimestamp(timestamp.getValue() + 1);
}
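For example, a usage sketch (assuming your Spring Data MongoDB version exposes resumeAt(BsonTimestamp) on the options builder, alongside the resumeAt(Instant) used above):
// Sketch: build options that resume strictly after a previously saved cluster time.
private ChangeStreamOptions buildResumeOptions(BsonTimestamp lastSeenTimestamp) {
    return ChangeStreamOptions.builder()
            .returnFullDocumentOnUpdate()
            .resumeAt(getNextEventTimestamp(lastSeenTimestamp))
            .build();
}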
In case of a network disconnection, the change stream will not resume on its own, so I recommend the following approach on error:
private void onException() {
    ScheduledExecutorService executorService = newSingleThreadScheduledExecutor();
    executorService.scheduleAtFixedRate(() -> recreateChangeStream(executorService), 0, 1, TimeUnit.SECONDS);
}

private void recreateChangeStream(ScheduledExecutorService executorService) {
    try {
        mongoTemplate.getDb().runCommand(new BasicDBObject("ping", "1"));
        container.stop();
        startNewContainer();
        executorService.shutdown();
    } catch (Exception ignored) {
    }
}
First I create a scheduled task that runs repeatedly (but only one at a time, thanks to newSingleThreadScheduledExecutor()). In it I try to ping the DB; after a successful ping I stop the old container and start a new one. You can also pass the last timestamp you saved, so that you can pick up any events you might have missed.
Timestamp retrieval from an event:
BsonTimestamp resumeAtTimestamp = changeStreamDocument.getClusterTime();
Then I shut down the task.
Also make sure the resumeAtTimestamp exists in the oplog...
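For completeness, here is a hypothetical sketch of the startNewContainer() referenced above; lastSeenTimestamp is an assumed field that you keep updated from each event in onDatabaseChangedEvent:
// Hypothetical sketch, reusing the fields of the listener shown earlier.
private void startNewContainer() {
    container = new DefaultMessageListenerContainer(mongoTemplate, new SimpleAsyncTaskExecutor(), this::onException);
    ChangeStreamRequest.ChangeStreamRequestOptions options =
            new ChangeStreamRequest.ChangeStreamRequestOptions(
                    mongoTemplate.getDb().getName(), null,
                    ChangeStreamOptions.builder()
                            .returnFullDocumentOnUpdate()
                            .resumeAt(getNextEventTimestamp(lastSeenTimestamp)) // resume after the last processed event
                            .build());
    container.register(new ChangeStreamRequest<>(this::onDatabaseChangedEvent, options), Document.class);
    container.start();
}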

Sequelize transaction retry doesn't work as expected

I don't understand how transaction retry works in Sequelize.
I am using a managed transaction, though I also tried an unmanaged one with the same outcome:
await sequelize.transaction({ isolationLevel: Sequelize.Transaction.ISOLATION_LEVELS.REPEATABLE_READ }, async (t) => {
  user = await User.findOne({
    where: { id: authenticatedUser.id },
    transaction: t,
    lock: t.LOCK.UPDATE,
  });
  user.activationCodeCreatedAt = new Date();
  user.activationCode = activationCode;
  await user.save({ transaction: t });
});
Now if I run this when the row is already locked, I am getting
DatabaseError [SequelizeDatabaseError]: could not serialize access due to concurrent update
which is normal. This is my retry configuration (passed in the Sequelize constructor options):
retry: {
  match: [
    /concurrent update/,
  ],
  max: 5
}
At this point I want Sequelize to retry the transaction. But instead I see that right after the SELECT ... FOR UPDATE it issues the SELECT ... FOR UPDATE again, which causes another error:
DatabaseError [SequelizeDatabaseError]: current transaction is aborted, commands ignored until end of transaction block
How do I use Sequelize's internal retry mechanism to retry the whole transaction?
Manual retry workaround function
Since Sequelize devs simply aren't interested in patching this for some reason after many years, here's my workaround:
async function transactionWithRetry(sequelize, transactionArgs, cb) {
  let done = false
  while (!done) {
    try {
      await sequelize.transaction(transactionArgs, cb)
      done = true
    } catch (e) {
      if (
        sequelize.options.dialect === 'postgres' &&
        e instanceof Sequelize.DatabaseError &&
        e.original.code === '40001'
      ) {
        await sequelize.query(`ROLLBACK`)
      } else {
        // Error that we don't know how to handle.
        throw e;
      }
    }
  }
}
Sample usage:
const { Transaction } = require('sequelize');
await transactionWithRetry(sequelize,
  { isolationLevel: Transaction.ISOLATION_LEVELS.SERIALIZABLE },
  async t => {
    const rows = await sequelize.models.MyInt.findAll({ transaction: t })
    await sequelize.models.MyInt.update({ i: newI }, { where: {}, transaction: t })
  }
)
The error code 40001 is documented at https://www.postgresql.org/docs/13/errcodes-appendix.html, and it's the only one I've managed to observe so far on serialization failures: What are the conditions for encountering a serialization failure? Let me know if you find any others that should be auto-looped and I'll patch them in.
Here's a full runnable test for it which seems to indicate that it is working fine: https://github.com/cirosantilli/cirosantilli.github.io/blob/dbb2ec61bdee17d42fe7e915823df37c4af2da25/sequelize/parallel_select_and_update.js
Tested on:
"pg": "8.5.1",
"pg-hstore": "2.3.3",
"sequelize": "6.5.1",
PostgreSQL 13.5, Ubuntu 21.10.
Infinite list of related requests
https://github.com/sequelize/sequelize/issues/1478 from 2014. Original issue was MySQL but thread diverged everywhere.
https://github.com/sequelize/sequelize/issues/8294 from 2017. Also asked on Stack Overflow, but it got the Tumbleweed badge and the question appears to have been auto-deleted; I can't find it in search. Mentions MySQL. The thread is a bit of a mess, as it also includes connection errors, which are not clear-cut retry cases like PostgreSQL serialization failures.
https://github.com/sequelize/sequelize/issues/12608 mentions Postgres
https://github.com/sequelize/sequelize/issues/13380 by the OP of this question
Meaning of current transaction is aborted, commands ignored until end of transaction block
The error is pretty explicit, but just to clarify for other PostgreSQL newbies: in PostgreSQL, when you get a failure in the middle of a transaction, Postgres auto-errors any following queries until a ROLLBACK or COMMIT happens and ends the transaction.
The DB client code is then supposed to notice that and just re-run the transaction.
These errors are therefore benign, and ideally Sequelize should not raise on them. They are actually expected when using ISOLATION LEVEL SERIALIZABLE or ISOLATION LEVEL REPEATABLE READ, and are what prevents concurrency errors from happening.
But unfortunately Sequelize does raise them just like any other error, so our workaround inevitably needs a while/try/catch loop.

Sequential queries in MongoDB not working properly sometimes

I am executing 2 update queries sequentially. I am using a generator function and yield to handle the asynchronous behaviour of JavaScript.
var result = yield db.tasks.update({
    "_id": task._id,
    "taskLog": { $elemMatch: { "currentApproverRole": vcurrentApproverRole,
                               "currentApprover": new RegExp(employeeCode, 'i') } }
}, {
    $set: {
        "taskPendingAt": vnextApproverEmpCode,
        "status": vactionTaken,
        "lastUpdated": vactionTakenTime,
        "lastUpdatedBy": employeeCode,
        "shortPin": shortPin,
        "workFlowDetails": task.workFlowDetails,
        "taskLog.$.reason": reason,
        "taskLog.$.actionTakenBy": employeeCode,
        "taskLog.$.actionTakenByName": loggedInUser.firstName + " " + loggedInUser.lastName,
        "taskLog.$.actionTaken": vactionTaken,
        "taskLog.$.actionTakenTime": vactionTakenTime
    }
});
var vstatus = vactionTaken;
// Below is the query that is not working properly sometimes
yield db.groupPicnic.update({
    "gppTaskId": task.workFlowDetails.gppTaskId,
    "probableParticipantList.employeeCode": task.createdBy
}, {
    $set: {
        'probableParticipantList.$.applicationStatus': vactionTaken
    }
})
The second update operation sometimes does not execute (it works 9 times out of 10). I can't figure out how to handle this issue.
ES6 generators are supposed to provide a simple way for writing iterators.
An iterator is just a sequence of values - like an array, but consumed dynamically and produced lazily.
Currently your code does this:
let imAnUnresolvedPromise = co().next();
// exiting app, promise might not resolve in time
By moving forward and not waiting on the promise (assuming your app then exits), you can't guarantee that it will execute in time, hence the unstable behaviour you're experiencing.
All you have to change is to wait for the promise to resolve:
let resolveThis = await co().next();
EDIT:
Without async/await syntax you'll have to use nested callbacks to guarantee the correct order, like so:
co().next().then((promiseResolved) => {
  co().next().then((promiseTwoResolved) => {
    console.log("I'm done")
  })
});

Grails Job | Multiple updates in mongodb always throw optimistic locking exception, how to handle it?

I have a Grails job which is scheduled to run every night to update the stats of all users: firstOrderDate, lastOrderDate and totalOrders.
Have a look at the code.
void updateOrderStatsForAllUsers(DateTime date) {
    List<Order> usersByOrders = Delivery.findAllByDeliveryDateAndStatus(date, "DELIVERED")*.order
    List<User> customers = usersByOrders*.customer.unique()
    for (User u in customers) {
        List<Order> orders = new ArrayList<Order>();
        orders = u.orders?.findAll { it.status.equals("DELIVERED") }?.sort { it?.dateCreated }
        if (orders?.size() > 0) {
            u.firstOrderDate = orders?.first()?.dateCreated
            u.lastOrderDate = orders?.last()?.dateCreated
            u.totalOrders = orders.size()
            u.save(flush: true)
        }
    }
}
and the job that runs this code is
def execute(){
    long jobStartTime = System.currentTimeMillis()
    emailService.sendJobStatusEmail(JOB_NAME, "STARTED", 0, null)
    try {
        // Daily job for updating user orders
        DateTime yesterday = new DateTime().withZone(DateTimeZone.getDefault()).withTimeAtStartOfDay().minusDays(1)
        userService.updateOrderStatsForAllUsers(yesterday)
        emailService.sendJobStatusEmail(JOB_NAME, "FINISHED", jobStartTime, null)
    }
    catch (Exception e) {
        emailService.sendJobStatusEmail(JOB_NAME, "FAILED", jobStartTime, e)
    }
}
So I am sending a mail for any exception that occurs, and the issue is that I always get a failure mail with "Error: OptimisticLockingException" at u.save(). For a given date I have around 400 users.
I know why optimistic locking happens, but as you can see I am not updating the same user record in the loop; I have a list of distinct users and I am iterating over it to update each of them. Then how come I get an optimistic locking exception on the user save? Help!
Optimistic locking is a Hibernate error; MongoDB has nothing to do with this.
Which entity is throwing the optimistic locking exception: is it Customer, Order or Delivery?
How do you ensure none of these entities is being updated elsewhere in the app while this job is running?
How do you ensure this job is triggered only once at a time?
Try adding some logging to see whether it's a repeatable issue by triggering the job again once the previous execution has completed.
More debugging may help resolve the issue.
Quartz jobs usually do not provide the TX context for their operations, so you should wrap your method in a transaction by hand:
def execute(){
    ...
    User.withTransaction { tx ->
        userService.updateOrderStatsForAllUsers(yesterday)
    }
    ....
}

How can MongoDB java driver determine if replica set is in the process of automatic failover?

Our application is built upon a MongoDB replica set.
I'd like to catch all exceptions thrown during the time frame when the replica set is in the process of automatic failover.
I will make the application retry or wait for the failover to complete, so that the failover won't affect users.
I found a document describing the behavior of the Java driver here: https://jira.mongodb.org/browse/DOCS-581
I wrote a test program to find all possible exceptions; they are all MongoException but with different messages:
MongoException.Network: "Read operation to server /10.11.0.121:27017 failed on database test"
MongoException: "can't find a master"
MongoException: "not talking to master and retries used up"
MongoException: "No replica set members available in [ here is replica set status ] for { "mode" : "primary"}"
Maybe more...
I'm confused, and not sure whether it is safe to distinguish them by error message.
Also, I don't want to catch all MongoExceptions.
Any suggestions?
Thanks
I am now of the opinion that Mongo in Java is particularly weak in this regard. I don't think your strategy of interpreting the error messages scales well or will survive driver evolution. This is, of course, opinion.
The good news is that the Mongo driver provides a way to get the status of a replica set: http://api.mongodb.org/java/2.11.1/com/mongodb/ReplicaSetStatus.html. You can use it directly to figure out whether there is a master visible to your application. If that is all you want to know, http://api.mongodb.org/java/2.11.1/com/mongodb/Mongo.html#getReplicaSetStatus() is all you need. Grab that and check for a not-null master and you are on your way.
ReplicaSetStatus rss = mongo.getReplicaSetStatus();
boolean driverInFailover = rss.getMaster() == null;
If what you really need is to figure out if the ReplSet is dead, read-only, or read-write, this gets more difficult. Here is the code that kind-of works for me. I hate it.
@Override
public ReplSetStatus getReplSetStatus() {
    ReplSetStatus rss = ReplSetStatus.DOWN;
    MongoClient freshClient = null;
    try {
        if ( mongo != null ) {
            ReplicaSetStatus replicaSetStatus = mongo.getReplicaSetStatus();
            if ( replicaSetStatus != null ) {
                if ( replicaSetStatus.getMaster() != null ) {
                    rss = ReplSetStatus.ReadWrite;
                } else {
                    /*
                     * When mongo.getReplicaSetStatus().getMaster() returns null, it takes a
                     * fresh client to assert whether the ReplSet is read-only or completely
                     * down. I freaking hate this, but take it up with 10gen.
                     */
                    freshClient = new MongoClient( mongo.getAllAddress(), mongo.getMongoClientOptions() );
                    replicaSetStatus = freshClient.getReplicaSetStatus();
                    if ( replicaSetStatus != null ) {
                        rss = replicaSetStatus.getMaster() != null ? ReplSetStatus.ReadWrite : ReplSetStatus.ReadOnly;
                    } else {
                        log.warn( "freshClient.getReplicaSetStatus() is null" );
                    }
                }
            } else {
                log.warn( "mongo.getReplicaSetStatus() returned null" );
            }
        } else {
            throw new IllegalStateException( "mongo is null?!?" );
        }
    } catch ( Throwable t ) {
        log.error( "Ignore unexpected error", t );
    } finally {
        if ( freshClient != null ) {
            freshClient.close();
        }
    }
    log.debug( "getReplSetStatus(): {}", rss );
    return rss;
}
I hate it because it doesn't follow the Mongo Java Driver convention that your application only needs a single Mongo, and through this singleton you connect to the rest of the Mongo data structures (DB, Collection, etc.). I have only been able to observe this working by new'ing up a second Mongo during the check, so that I can rely upon the ReplicaSetStatus null check to discriminate between "ReplSet down" and "read-only".
What is really needed in this driver is some way to ask direct questions of the Mongo to see if the ReplSet can be expected, at this moment, to support each of the WriteConcerns or ReadPreferences. Something like...
/**
 * @return true if current state of Client can support readPreference, false otherwise
 */
boolean mongo.canDoRead( ReadPreference readPreference )

/**
 * @return true if current state of Client can support writeConcern; false otherwise
 */
boolean mongo.canDoWrite( WriteConcern writeConcern )
This makes sense to me because it acknowledges the fact that the ReplSet may have been great when the Mongo was created, but conditions right now mean that Read or Write operations of a specific type may fail due to changing conditions.
In any event, maybe http://api.mongodb.org/java/2.11.1/com/mongodb/ReplicaSetStatus.html gets you what you need.
When Mongo is failing over, there are no nodes in a PRIMARY state. You can just get the replica set status via the replSetGetStatus command and look for a master node. If you don't find one, you can assume that the cluster is in a failover transition state, and can retry as desired, checking the replica set status on each failed connection.
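As a hedged sketch against the 2.x driver discussed in this thread (the method and variable names are mine, not from the driver):
import com.mongodb.BasicDBList;
import com.mongodb.CommandResult;
import com.mongodb.DB;
import com.mongodb.DBObject;

// Sketch: run replSetGetStatus on the admin database and look for a PRIMARY member.
boolean hasPrimary(DB adminDb) {
    CommandResult status = adminDb.command("replSetGetStatus");
    BasicDBList members = (BasicDBList) status.get("members");
    if (members == null) {
        return false; // command failed, or this is not a replica set
    }
    for (Object m : members) {
        if ("PRIMARY".equals(((DBObject) m).get("stateStr"))) {
            return true; // a master exists, so no failover is in progress
        }
    }
    return false; // no PRIMARY: assume a failover transition and retry later
}
You would obtain adminDb via mongo.getDB("admin").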
I don't know the Java driver implementation itself, but I'd catch all MongoExceptions, then filter them on a getCode() basis. If the error code does not apply to replica set failures, I'd rethrow the MongoException.
The problem is that, to my knowledge, there is no error-code reference in the documentation. Well, there is a stub here, but it is fairly incomplete. The only way is to read the code of the Java driver to know what codes it uses…
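For illustration, a minimal sketch of that catch-and-filter loop; withFailoverRetry is a name I made up, and the set of failover-related codes is a placeholder you would have to assemble yourself from the driver/server sources:
import com.mongodb.MongoException;
import java.util.Set;
import java.util.concurrent.Callable;

// Hypothetical: retry an operation while its error code looks failover-related.
static <T> T withFailoverRetry(Callable<T> op, Set<Integer> failoverCodes, int maxRetries) throws Exception {
    for (int attempt = 0; ; attempt++) {
        try {
            return op.call();
        } catch (MongoException e) {
            if (!failoverCodes.contains(e.getCode()) || attempt >= maxRetries) {
                throw e; // not a replica set failure, or retries exhausted
            }
            Thread.sleep(1000); // give the failover time to settle, then retry
        }
    }
}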