How can the MongoDB Java driver determine if a replica set is in the process of automatic failover?

Our application is built on a MongoDB replica set.
I'd like to catch all exceptions thrown during the window when the replica set is in the process of automatic failover, so the application can retry or wait for the failover to complete. That way the failover won't affect users.
I found a document describing the behavior of the Java driver here: https://jira.mongodb.org/browse/DOCS-581
I wrote a test program to find all possible exceptions; they are all MongoException but with different messages:
MongoException.Network: "Read operation to server /10.11.0.121:27017 failed on database test"
MongoException: "can't find a master"
MongoException: "not talking to master and retries used up"
MongoException: "No replica set members available in [ here is replica set status ] for { "mode" : "primary"}"
Maybe more...
I'm not sure it is safe to distinguish these cases by error message, and I don't want to catch every MongoException.
Any suggestions?
Thanks

I am now of the opinion that Mongo in Java is particularly weak in this regard. I don't think your strategy of interpreting the error messages scales well or will survive driver evolution. This is, of course, opinion.
The good news is that the Mongo driver provides a way to get the status of a replica set: http://api.mongodb.org/java/2.11.1/com/mongodb/ReplicaSetStatus.html. You can use it directly to figure out whether there is a master visible to your application. If that is all you want to know, http://api.mongodb.org/java/2.11.1/com/mongodb/Mongo.html#getReplicaSetStatus() is all you need. Grab it, check for a non-null master, and you are on your way.
ReplicaSetStatus rss = mongo.getReplicaSetStatus();
boolean driverInFailover = rss.getMaster() == null;
If what you really need is to figure out whether the replica set is down, read-only, or read-write, it gets more difficult. Here is code that kind of works for me. I hate it.
@Override
public ReplSetStatus getReplSetStatus() {
    ReplSetStatus rss = ReplSetStatus.DOWN;
    MongoClient freshClient = null;
    try {
        if ( mongo != null ) {
            ReplicaSetStatus replicaSetStatus = mongo.getReplicaSetStatus();
            if ( replicaSetStatus != null ) {
                if ( replicaSetStatus.getMaster() != null ) {
                    rss = ReplSetStatus.ReadWrite;
                } else {
                    /*
                     * When mongo.getReplicaSetStatus().getMaster() returns null, it takes a
                     * fresh client to assert whether the ReplSet is read-only or completely
                     * down. I freaking hate this, but take it up with 10gen.
                     */
                    freshClient = new MongoClient( mongo.getAllAddress(), mongo.getMongoClientOptions() );
                    replicaSetStatus = freshClient.getReplicaSetStatus();
                    if ( replicaSetStatus != null ) {
                        rss = replicaSetStatus.getMaster() != null ? ReplSetStatus.ReadWrite : ReplSetStatus.ReadOnly;
                    } else {
                        log.warn( "freshClient.getReplicaSetStatus() is null" );
                    }
                }
            } else {
                log.warn( "mongo.getReplicaSetStatus() returned null" );
            }
        } else {
            throw new IllegalStateException( "mongo is null?!?" );
        }
    } catch ( Throwable t ) {
        log.error( "Ignoring unexpected error", t );
    } finally {
        if ( freshClient != null ) {
            freshClient.close();
        }
    }
    log.debug( "getReplSetStatus(): {}", rss );
    return rss;
}
I hate it because it doesn't follow the Mongo Java Driver convention that your application needs only a single Mongo instance, through which you reach the rest of the Mongo data structures (DB, Collection, etc.). I have only been able to observe this working by newing up a second Mongo during the check, so that I can rely on the ReplicaSetStatus null check to discriminate between "ReplSet down" and "read-only".
What is really needed in this driver is some way to ask direct questions of the Mongo to see whether the replica set can be expected, at this moment, to support each of the WriteConcerns or ReadPreferences. Something like...
/**
 * @return true if current state of Client can support readPreference, false otherwise
 */
boolean mongo.canDoRead( ReadPreference readPreference )

/**
 * @return true if current state of Client can support writeConcern; false otherwise
 */
boolean mongo.canDoWrite( WriteConcern writeConcern )
This makes sense to me because it acknowledges the fact that the ReplSet may have been great when the Mongo was created, but conditions right now mean that Read or Write operations of a specific type may fail due to changing conditions.
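In the meantime, here is a rough approximation under the current API. This is my own sketch, not a driver feature; it can only speak to primary availability via the same getMaster() check shown earlier, so treat the result as a hint rather than a guarantee.

// Hypothetical helper: approximates the proposed canDoWrite() using only
// ReplicaSetStatus.getMaster(), the one signal the 2.x driver exposes.
public boolean canProbablyWrite( Mongo mongo ) {
    ReplicaSetStatus rss = mongo.getReplicaSetStatus();
    return rss != null && rss.getMaster() != null;
}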
In any event, maybe http://api.mongodb.org/java/2.11.1/com/mongodb/ReplicaSetStatus.html gets you what you need.

When Mongo is failing over, there are no nodes in a PRIMARY state. You can just get the replica set status via the replSetGetStatus command and look for a master node. If you don't find one, you can assume that the cluster is in a failover transition state, and can retry as desired, checking the replica set status on each failed connection.
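For example, with the Java driver you can run replSetGetStatus against the admin database and scan the members array for a PRIMARY. A minimal sketch; the field names follow the replSetGetStatus command output, and the cast of "members" to BasicDBList is an assumption about the returned document shape:

// Sketch: run replSetGetStatus and look for a PRIMARY member.
// Uses com.mongodb classes: DB, CommandResult, BasicDBObject, BasicDBList, DBObject.
DB admin = mongo.getDB( "admin" );
CommandResult status = admin.command( new BasicDBObject( "replSetGetStatus", 1 ) );
boolean hasPrimary = false;
BasicDBList members = (BasicDBList) status.get( "members" );
if ( members != null ) {
    for ( Object m : members ) {
        if ( "PRIMARY".equals( ((DBObject) m).get( "stateStr" ) ) ) {
            hasPrimary = true;
            break;
        }
    }
}
// If hasPrimary is false, assume a failover is in progress and retry later.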

I don't know the Java driver implementation itself, but I'd catch all MongoExceptions, then filter them on a getCode() basis. If the error code does not correspond to a replica-set failure, I'd rethrow the MongoException.
The problem is that, to my knowledge, there is no error-code reference in the documentation. Well, there is a stub here, but it is fairly incomplete. The only way is to read the Java driver's source to know which codes it uses…
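As a sketch of that approach (which codes count as failover-related is an assumption you would have to verify against the driver source; the set below is deliberately left empty):

// Hypothetical: populate from the error codes you find in the driver source.
private static final Set<Integer> FAILOVER_CODES = new HashSet<Integer>();

try {
    collection.insert( doc );
} catch ( MongoException e ) {
    if ( FAILOVER_CODES.contains( e.getCode() ) ) {
        // likely a failover in progress: wait and retry
    } else {
        throw e; // unrelated to a replica set failure, rethrow
    }
}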

Related

Mongo change-Stream with Spring resumeAt vs startAfter and fault tolerance in case of connection loss

I can't find an answer on Stack Overflow, nor in any documentation.
I have the following change stream code (listening to a DB, not a specific collection).
Mongo version is 4.2.
@Configuration
public class DatabaseChangeStreamListener {
    //Constructor, fields etc...

    @PostConstruct
    public void initialize() {
        MessageListenerContainer container = new DefaultMessageListenerContainer(mongoTemplate, new SimpleAsyncTaskExecutor(), this::onException);
        ChangeStreamRequest.ChangeStreamRequestOptions options =
                new ChangeStreamRequest.ChangeStreamRequestOptions(mongoTemplate.getDb().getName(), null, buildChangeStreamOptions());
        container.register(new ChangeStreamRequest<>(this::onDatabaseChangedEvent, options), Document.class);
        container.start();
    }

    private ChangeStreamOptions buildChangeStreamOptions() {
        return ChangeStreamOptions.builder()
                .returnFullDocumentOnUpdate()
                .filter(newAggregation(match(where(OPERATION_TYPE).in(INSERT.getValue(), UPDATE.getValue(), REPLACE.getValue(), DELETE.getValue()))))
                .resumeAt(Instant.now().minusSeconds(1))
                .build();
    }
    //more code
}
I want the stream to start listening from system initialization time only, without picking up anything earlier from the oplog. Will .resumeAt(Instant.now().minusSeconds(1)) work?
Do I need to use the startAfter method? If so, how can I find the latest resumeToken in the DB?
Or does it work out of the box, without adding any resume/start lines?
Second question: I never stop the container (it should live as long as the app is running). In case of a disconnection from MongoDB and a subsequent reconnection, will the listener in the current configuration continue to consume messages? (I am having a hard time simulating a DB disconnection.)
If it will not resume handling events, what do I need to change in the configuration so that the change stream continues and picks up all the events from the last received resumeToken prior to the disconnection?
I have read this great Medium article on change streams in production, but it uses the cursor directly, and I want to use the Spring DefaultMessageListenerContainer, as it is much more elegant.
So I will answer my own questions (some more dumb, some less :)...):
When no resumeAt timestamp is provided, the change stream starts from the current time and does not replay any previous events.
The difference between resuming after an event vs. a timestamp can be found here: stackOverflow answer.
But keep in mind that a timestamp is inclusive of the event, so if you want to start from the next event (in Java), do:
private BsonTimestamp getNextEventTimestamp(BsonTimestamp timestamp) {
    return new BsonTimestamp(timestamp.getValue() + 1);
}
In case of a network disconnection the change stream will not resume, so I recommend the following approach in case of error:
private void onException() {
    ScheduledExecutorService executorService = newSingleThreadScheduledExecutor();
    executorService.scheduleAtFixedRate(() -> recreateChangeStream(executorService), 0, 1, TimeUnit.SECONDS);
}

private void recreateChangeStream(ScheduledExecutorService executorService) {
    try {
        mongoTemplate.getDb().runCommand(new BasicDBObject("ping", "1"));
        container.stop();
        startNewContainer();
        executorService.shutdown();
    } catch (Exception ignored) {
        // ping failed; leave the scheduled task running and try again on the next tick
    }
}
First I create a scheduled task that runs repeatedly (but only one at a time, thanks to newSingleThreadScheduledExecutor()). It pings the DB; after a successful ping I stop the old container and start a new one. You can also pass along the last timestamp you received, so that you can catch up on any events you might have missed.
Timestamp retrieval from an event:
BsonTimestamp resumeAtTimestamp = changeStreamDocument.getClusterTime();
Then I shut the task down.
Also make sure the resumeAtTimestamp still exists in the oplog...
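For completeness, startNewContainer() is never shown above. A minimal sketch of what it might look like, assuming the same fields used in initialize(), that container is stored in a field (recreateChangeStream() already references it as one), and a lastSeenTimestamp field captured from the last handled event (these names are my own):

private void startNewContainer() {
    container = new DefaultMessageListenerContainer(mongoTemplate, new SimpleAsyncTaskExecutor(), this::onException);
    ChangeStreamOptions options = ChangeStreamOptions.builder()
            .returnFullDocumentOnUpdate()
            // resume just past the last event handled before the disconnect
            .resumeAt(getNextEventTimestamp(lastSeenTimestamp))
            .build();
    ChangeStreamRequest.ChangeStreamRequestOptions requestOptions =
            new ChangeStreamRequest.ChangeStreamRequestOptions(mongoTemplate.getDb().getName(), null, options);
    container.register(new ChangeStreamRequest<>(this::onDatabaseChangedEvent, requestOptions), Document.class);
    container.start();
}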

Mongoose how to listen for collection changes

I need to build a Mongo updater process to download MongoDB data to local IoT devices (configuration data, etc.).
My goal is to watch some Mongo collections at a fixed interval (1 minute, for example). If a collection has changed (deletion, insertion, or update) I will download the full collection to my device. The collections will have no more than a few hundred simple records, so it's not going to be a lot of data to download.
Is there any mechanism to find out that a collection has changed since the last poll? Which Mongo features should be used in that case?
To listen for changes to your MongoDB collection, set up a Mongoose Model.watch.
const PersonModel = require('./models/person')
const personEventEmitter = PersonModel.watch()
personEventEmitter.on('change', change => console.log(JSON.stringify(change)))
const person = new PersonModel({name: 'Thabo'})
person.save()
// Triggers console log on change stream
// {_id: '...', operationType: 'insert', ...}
Note: This functionality is only available on a MongoDB replica set.
See the Mongoose Model docs for more.
If you want to listen for changes to your DB, use Connection.watch.
See the Mongoose Connection docs for more.
These functions listen for change events from MongoDB Change Streams, available as of MongoDB 3.6.
I think the best solution would be to use post-update middleware.
You can read more about that here: http://mongoosejs.com/docs/middleware.html
I have the same requirement on an embedded device that works quite autonomously; it always needs to auto-adjust its operating parameters without having to reboot the system.
For this I created a configuration-manager class, and in its constructor I coded a "parameter monitor" that checks in the database only the parameters that are flagged for it. Of course, if a new configuration needs to be monitored, I tell the config-manager in another part of the code to reload that update.
As you can see, the process is very simple, and of course it can be improved to avoid overloading the config-manager with many updates and to prevent them from overlapping within a very small interval.
Since there are many settings to be read, I open a cursor for a query as soon as the database is connected and opened. As the data stream sends me new documents, I create a proxy for each one so that it can be manipulated according to the type and internal details of the config-manager. I then check whether the property needs to be monitored; if so, I call an inner function named watch that I created to handle this. It reads the sub-property of the same name to see what default interval to use when checking the database for updates, and registers a timeout for that task; each check recreates the timeout with the updated interval, or stops checking if watch no longer exists.
this.connection.once('open', () => {
    let cursor = Config.find({}).cursor();
    cursor.on('data', (doc) => {
        this.config[doc.parametro] = criarProxy(doc.parametro, doc.valor);
        if (doc.watch) {
            console.log(sprintf("Preparing to monitor %s", doc.parametro));
            function watch(configManager, doc) {
                console.log("Monitoring parameter: %s", doc.parametro);
                if (doc.watch) setTimeout(() => {
                    Config.findOne({
                        parametro: doc.parametro
                    }).then((doc) => {
                        console.dir(doc);
                        if (doc) {
                            if (doc.valor != configManager.config[doc.parametro]) {
                                console.log("Monitored parameter %(parametro)s has changed!", doc);
                                configManager.config[doc.parametro] = criarProxy(doc.parametro, doc.valor);
                            } else
                                console.log("Monitored parameter %{parametro}s has not changed", doc);
                            watch(configManager, doc);
                        } else
                            console.log("Check the parameter: %s")
                    })
                },
                doc.watch)
            }
            watch(this, doc);
        }
    });
    cursor.on('close', () => {
        if (process.env.DEBUG_DETAIL > 2) console.log("ConfigManager closed cursor data");
        resolv(); // resolv() presumably resolves an enclosing Promise
    });
    cursor.on('end', () => {
        if (process.env.DEBUG_DETAIL > 2) console.log("ConfigManager end data");
    });
});
As you can see, the code can be improved a lot. If you want to suggest improvements, for your environment or in general, please use the gist: https://gist.github.com/carlosdelfino/929d7918e3d3a6172fdd47a59d25b150

how to implement high availability service (2 nodes) using zookeeper

I need to provide an H/A mechanism.
I understand that ZooKeeper can be chosen for leader election.
I'm looking for the right pattern for this flow:
I need to implement a service that invokes a flow.
When starting the flow [the flow is a looping flow], it must validate that it is the leader (say, by its IP address).
I understand that I can put a value into ZooKeeper that identifies the entering instance, and dispose of it when one loop ends, or after a period of time.
Is that the right pattern?
It also seems there are race-condition issues if I use something like:
...
...
List<String> names = zk.getChildren( path, false );
String id = null;
// See whether we have already run for election in this process
for ( String name : names ) {
    if ( name.startsWith( myIP ) ) {
        id = name;
        break;
    }
}
if ( id == null ) {
    id = zk.create( path + "/" + myIP, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL );
}
boolean isLeader = id != null;
For example:
Two services read null, then the second overwrites the first one's entry, and both of them run the task.
Can you help?
Thanks
Using ZooKeeper correctly can be difficult and takes some studying of its APIs and semantics. There are some nice high-level libraries such as Curator that already have implemented common algorithms such as leader election. The ZooKeeper documentation also has a recipe for leader election.
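For illustration, a minimal LeaderLatch sketch with Curator (from the curator-framework and curator-recipes artifacts; the connection string and latch path are placeholders). Internally the recipe avoids the race described above by creating ephemeral sequential znodes and treating the lowest sequence number as the leader, so two instances can never both win:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
client.start();
LeaderLatch latch = new LeaderLatch(client, "/myapp/leader", myIP);
latch.start();

// At the top of each loop iteration:
if (latch.hasLeadership()) {
    // run the flow; only one participant sees hasLeadership() == true at a time
}

// On shutdown, release leadership and the connection:
latch.close();
client.close();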

Squeryl - HikariCP - mySql - Distributing Read Traffic to Slaves

I'm trying to follow the steps listed at http://dev.mysql.com/doc/connector-j/en/connector-j-master-slave-replication-connection.html which states
To enable this functionality, use the com.mysql.jdbc.ReplicationDriver
class when configuring your application server's connection pool
From https://github.com/brettwooldridge/HikariCP - it says
HikariCP will attempt to resolve a driver through the DriverManager
based solely on the jdbcUrl
So is this configuration all that's needed?
db.default.url=jdbc:mysql:replication ...
Squeryl has a number of DB adapters, but my understanding is that these are unrelated?
http://squeryl.org/api/index.html#org.squeryl.adapters.MySQLInnoDBAdapter
Sorry for the keyword loading - I'm just not too sure where I need to focus.
Thanks
Brent
For people hitting this in 2020: Hikari uses
com.mysql.jdbc.jdbc2.optional.MysqlDataSource
as a data source. If I look at the code of the above class, it has a method named getConnection which returns a Connection instance.
protected Connection getConnection(Properties props) throws SQLException {
    String jdbcUrlToUse = null;
    if (!this.explicitUrl) {
        StringBuffer jdbcUrl = new StringBuffer("jdbc:mysql://");
        if (this.hostName != null) {
            jdbcUrl.append(this.hostName);
        }
        jdbcUrl.append(":");
        jdbcUrl.append(this.port);
        jdbcUrl.append("/");
        if (this.databaseName != null) {
            jdbcUrl.append(this.databaseName);
        }
        jdbcUrlToUse = jdbcUrl.toString();
    } else {
        jdbcUrlToUse = this.url;
    }

    Properties urlProps = mysqlDriver.parseURL(jdbcUrlToUse, (Properties)null);
    urlProps.remove("DBNAME");
    urlProps.remove("HOST");
    urlProps.remove("PORT");
    Iterator keys = urlProps.keySet().iterator();

    while(keys.hasNext()) {
        String key = (String)keys.next();
        props.setProperty(key, urlProps.getProperty(key));
    }

    return mysqlDriver.connect(jdbcUrlToUse, props);
}
where mysqlDriver is an instance of
protected static final NonRegisteringDriver mysqlDriver;
If I check the connect method of the NonRegisteringDriver class, it looks like this:
public Connection connect(String url, Properties info) throws SQLException {
    if (url != null) {
        if (StringUtils.startsWithIgnoreCase(url, "jdbc:mysql:loadbalance://")) {
            return this.connectLoadBalanced(url, info);
        }
        if (StringUtils.startsWithIgnoreCase(url, "jdbc:mysql:replication://")) {
            return this.connectReplicationConnection(url, info);
        }
    }

    Properties props = null;
    if ((props = this.parseURL(url, info)) == null) {
        return null;
    } else if (!"1".equals(props.getProperty("NUM_HOSTS"))) {
        return this.connectFailover(url, info);
    } else {
        try {
            com.mysql.jdbc.Connection newConn = ConnectionImpl.getInstance(this.host(props), this.port(props), props, this.database(props), url);
            return newConn;
        } catch (SQLException var6) {
            throw var6;
        } catch (Exception var7) {
            SQLException sqlEx = SQLError.createSQLException(Messages.getString("NonRegisteringDriver.17") + var7.toString() + Messages.getString("NonRegisteringDriver.18"), "08001", (ExceptionInterceptor)null);
            sqlEx.initCause(var7);
            throw sqlEx;
        }
    }
}
After looking at the code, it looks like replication URLs are supported. I haven't tried it yet; I will try and report back from personal experience. From the code, it looks directly feasible.
Squeryl offers different MySQL adapters because InnoDB supports referential keys while MyISAM does not. It seems like what you're doing should be handled at the connection-pool level, so I don't think your Squeryl configuration will have an effect.
I've never configured Hikari for replicated MySQL, but if it requires an alternative JDBC driver I'd be surprised if you can provide a JDBC URL and everything just works. I'm guessing that Hikari's default functionality is to pick the plain vanilla MySQL JDBC driver unless you tell it otherwise. Luckily, Hikari has quite a few config options including the ability to set a specific driverClassName.
Replication allows for a different URL:
jdbc:mysql:replication://[server1],[server2],[server2]/[database]
I've never tried it, but I assume this will resolve to the ReplicationDriver.
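For reference, pinning the driver class in Hikari would look something like this. This is only a sketch (URL, hosts, and credentials are placeholders), and note the caveat below that the replication driver reportedly does not work with HikariCP:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
// Placeholder hosts/database; first host is the master, the rest are slaves.
config.setJdbcUrl("jdbc:mysql:replication://master-host,slave-host/mydb");
config.setDriverClassName("com.mysql.jdbc.ReplicationDriver");
config.setUsername("user");
config.setPassword("secret");
HikariDataSource ds = new HikariDataSource(config);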
And I find myself back here - please note, Hikari doesn't support the ReplicationDriver.
https://github.com/brettwooldridge/HikariCP/issues/625#issuecomment-251613688
MySQL Replication Driver simply does NOT work together with HikariCP.
And
https://groups.google.com/forum/#!msg/hikari-cp/KtKgzR8COrE/higEHoPkAwAJ
... nobody running anything resembling a mission critical application takes MySQL's driver-level replication support seriously.

Code First - Retrieve and Update Record in a Transaction without Deadlocks

I have an EF Code First context which represents a queue of jobs that processing applications can retrieve and run. These processing applications can be running on different machines but point at the same database.
The context provides a method that returns a QueueItem if there is any work to do, or null, called CollectQueueItem.
To ensure no two applications can pick up the same job, the collection takes place in a transaction with an ISOLATION LEVEL of REPEATABLE READ. This means that if there are two attempts to pick up the same job at the same time, one will be chosen as the deadlock victim and be rolled back. We can handle this by catching the DbUpdateException and return null.
Here is the code for the CollectQueueItem method:
public QueueItem CollectQueueItem()
{
    using (var transaction = new TransactionScope(TransactionScopeOption.Required, new TransactionOptions { IsolationLevel = IsolationLevel.RepeatableRead }))
    {
        try
        {
            var queueItem = this.QueueItems.FirstOrDefault(qi => !qi.IsLocked);
            if (queueItem != null)
            {
                queueItem.DateCollected = DateTime.UtcNow;
                queueItem.IsLocked = true;
                this.SaveChanges();
                transaction.Complete();
                return queueItem;
            }
        }
        catch (DbUpdateException) //we might have been the deadlock victim. No matter.
        { }
        return null;
    }
}
I ran a test in LinqPad to check that this is working as expected. Here is the test below:
var ids = Enumerable.Range(0, 8).AsParallel().SelectMany(i =>
    Enumerable.Range(0, 100).Select(j => {
        using (var context = new QueueContext())
        {
            var queueItem = context.CollectQueueItem();
            return queueItem == null ? -1 : queueItem.OperationId;
        }
    })
);

var sw = Stopwatch.StartNew();
var results = ids.GroupBy(i => i).ToDictionary(g => g.Key, g => g.Count());
sw.Stop();

Console.WriteLine("Elapsed time: {0}", sw.Elapsed);
Console.WriteLine("Deadlocked: {0}", results.Where(r => r.Key == -1).Select(r => r.Value).SingleOrDefault());
Console.WriteLine("Duplicates: {0}", results.Count(r => r.Key > -1 && r.Value > 1));
//IsolationLevel = IsolationLevel.RepeatableRead:
//Elapsed time: 00:00:26.9198440
//Deadlocked: 634
//Duplicates: 0
//IsolationLevel = IsolationLevel.ReadUncommitted:
//Elapsed time: 00:00:00.8457558
//Deadlocked: 0
//Duplicates: 234
I ran the test a few times. Without the REPEATABLE READ isolation level, the same job is retrieved by different threads (seen in the 234 duplicates). With REPEATABLE READ, jobs are only retrieved once, but performance suffers and there are 634 deadlocked transactions.
My question is: is there a way to get this behaviour in EF without the risk of deadlocks or conflicts? I know in real life there will be less contention, as the processors won't be continually hitting the database; but nonetheless, is there a way to do this safely without having to handle the DbUpdateException? Can I get performance closer to that of the version without the REPEATABLE READ isolation level? Or are deadlocks not that bad in fact, and can I safely ignore the exception, let the processor retry after a few millis, and accept that the performance will be OK if not all the transactions happen at the same time?
Thanks in advance!
I'd recommend a different approach.
a) sp_getapplock
Use a SQL stored procedure that provides an application-lock feature, so you can have unique app behaviour, which might involve reading from the DB or whatever other activity you need to control. It also lets you use EF in a normal way.
OR
b) Optimistic concurrency
http://msdn.microsoft.com/en-us/data/jj592904
//Object Property:
public byte[] RowVersion { get; set; }
//Object Configuration:
Property(p => p.RowVersion).IsRowVersion().IsConcurrencyToken();
A logical extension to the app lock, or usable just by itself, is a rowversion concurrency field in the DB. Allow the dirty read, BUT when someone goes to update the record as collected, it fails if someone beat them to it. Out-of-the-box EF optimistic locking.
You can easily delete "collected" job records later.
This might be the better approach unless you expect high levels of concurrency.
As suggested by Phil, I used optimistic concurrency to ensure the job could not be processed more than once. I realised that rather than having to add a dedicated rowversion column I could use the IsLocked bit column as the ConcurrencyToken. Semantically, if this value has changed since we retrieved the row, the update should fail since only one processor should ever be able to lock it. I used the fluent API as below to configure this, although I could also have used the ConcurrencyCheck data annotation.
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<QueueItem>()
        .Property(p => p.IsLocked)
        .IsConcurrencyToken();
}
I was then able to simplify the CollectQueueItem method, losing the TransactionScope entirely and catching the more specific DbUpdateConcurrencyException.
public OperationQueueItem CollectQueueItem()
{
    try
    {
        var queueItem = this.QueueItems.FirstOrDefault(qi => !qi.IsLocked);
        if (queueItem != null)
        {
            queueItem.DateCollected = DateTime.UtcNow;
            queueItem.IsLocked = true;
            this.SaveChanges();
            return queueItem;
        }
    }
    catch (DbUpdateConcurrencyException) //someone else grabbed the job.
    { }
    return null;
}
I reran the tests; you can see it's a great compromise. No duplicates, nearly 100x faster than with REPEATABLE READ, and no deadlocks, so the DBAs won't be on my case. Awesome!
//Optimistic Concurrency:
//Elapsed time: 00:00:00.5065586
//Deadlocked: 624
//Duplicates: 0