I'm using the Mongo Java driver (2.8.0) to connect to a Mongo instance.
I noticed that if I restart mongod, the first operation after the restart (even a simple count()) always fails with an EOFException or a broken pipe error.
I'm using the following Mongo options:
MongoOptions opts = new MongoOptions();
opts.autoConnectRetry = true;
opts.maxAutoConnectRetryTime = 2000L;
opts.connectTimeout = 30000;
opts.socketTimeout = 60000;
Is there a way to tell the driver to try to re-establish the connections? I thought that "autoConnectRetry" would do that, but it only kicks in after the connection is "discovered" (through a single failed operation) to be broken.
The autoConnectRetry option will retry when opening a connection to the server, but it doesn't guarantee you won't get a read exception. You still need to handle exceptions in your application and retry the operation if appropriate.
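For example, a minimal application-level retry could look something like this (the method name, retry count, and back-off are just illustrative, and this assumes the 2.x driver, where network failures surface as a MongoException):

import com.mongodb.DBCollection;
import com.mongodb.MongoException;

// Plain application-level retry: the first call after a mongod restart may
// hit a dead pooled socket and throw; the retry then runs on a fresh socket.
long countWithRetry(DBCollection collection, int maxAttempts) {
    MongoException lastError = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return collection.count();
        } catch (MongoException e) {
            lastError = e;
            // back off briefly before retrying
            try {
                Thread.sleep(500);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break;
            }
        }
    }
    throw lastError;
}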
Blurb from the docs:
If true, the driver will keep trying to connect to the same server in case the socket cannot be established. There is a maximum amount of time to keep retrying, which is 15s by default. This can be useful to avoid some exceptions being thrown when a server is down temporarily, by blocking the operations. It can also be useful to smooth the transition to a new master (so that a new master is elected within the retry time). Note that when using this flag:
- for a replica set, the driver will keep trying to connect to the old master for that time, instead of failing over to the new one right away
- this does not prevent exceptions from being thrown in read/write operations on the socket, which must be handled by the application.
Even if this flag is false, the driver already has mechanisms to automatically recreate broken connections and retry the read operations. Default is false.
Related
We have mongodb with the mgo driver for golang. There are two app servers connecting to mongodb, running alongside the apps (golang binaries). mongodb runs as a replica set, and each server connects to the primary or a secondary depending on the replica set's current state.
We experienced the SocketException handling request, closing client connection: 9001 socket exception on one of the mongo servers, which caused the connection to mongodb from our apps to die. After that, the replica set remained functional, but on our second server (where the error did not happen) the connection died as well.
In the golang logs it was manifested as:
read tcp 10.10.0.5:37698->10.10.0.7:27017: i/o timeout
Why did this happen? How can this be prevented?
As I understand it, mgo connects to the whole replica set from a single instance's URL (it detects the whole topology from that one instance), but why did the dying of the connection on one of the servers kill it on the second one?
Edit:
Full package path that is used "gopkg.in/mgo.v2"
Unfortunately I can't share the mongo log files here. But besides the SocketException, the mongo logs don't contain anything useful. There is an indication of some degree of lock contention, where the lock-acquired time is quite high at times, but nothing beyond that.
MongoDB does some heavy indexing at times, but there weren't any unusual spikes recently, so it's nothing beyond normal.
First, the mgo driver you are using: gopkg.in/mgo.v2 developed by Gustavo Niemeyer (hosted at https://github.com/go-mgo/mgo) is not maintained anymore.
Instead use the community supported fork github.com/globalsign/mgo. This one continues to get patched and evolve.
Its changelog includes "Improved connection handling", which seems to be directly related to your issue.
The details can be read here: https://github.com/globalsign/mgo/pull/5, which points to the original pull request https://github.com/go-mgo/mgo/pull/437:
If mongoServer fails to dial the server, it will close all sockets that are alive, whether they're currently in use or not.
There are two cons:
In-flight requests will be interrupted rudely.
All sockets are closed at the same time, and are likely to dial the server at the same time. Any occasional failure in the massive dial requests (high-concurrency scenario) will make all sockets close again, and repeat... (It happened in our production environment.)
So I think sockets currently in use should only be closed once they become idle.
Note that github.com/globalsign/mgo has a backward-compatible API: it basically just adds a few new things / features (besides the fixes and patches), which means you should be able to just change the import paths and everything should work without further changes.
I run 3 processes at the same time, all of them using the same DB (RDS Postgres).
All of them are Java applications that use JDBC to connect to the DB.
I am using PGPoolingDataSource in every process as a connection pool for the DB.
Every request is handled by the book and ends with:
finally {
    connection.close();
}
Main problems:
1. I run out of connections really fast because I do massive work with the DB at the beginning (which is OK), but the pool never shrinks.
2. I get some exceptions in the code because there are not enough connections, and I wish I could extend the timeout when requesting a connection.
My insights:
The PGPoolingDataSource never shrinks by definition! I couldn't find any documentation about it, but I assume this is the case. So I tried the Apache DBCP pool, and I am having the same problem again.
I think there must be a timeout when waiting for a connection. I would guess that this timeout can be configured, but I couldn't find such a setting on either pool.
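For illustration, this is roughly the kind of configuration I was hoping to find. I'm using Apache Commons DBCP2 setter names here as an assumption about a newer pool version; the URL and credentials are placeholders:

import org.apache.commons.dbcp2.BasicDataSource;

public class PoolConfig {
    public static BasicDataSource createPool() {
        BasicDataSource ds = new BasicDataSource();
        ds.setUrl("jdbc:postgresql://my-rds-host:5432/mydb"); // placeholder URL
        ds.setUsername("app");                                // placeholder credentials
        ds.setPassword("secret");

        ds.setMaxTotal(20);          // hard cap on connections for this process
        ds.setMinIdle(2);            // keep a couple of warm connections
        ds.setMaxIdle(5);            // let the pool shrink back down to this
        ds.setMaxWaitMillis(10000);  // how long getConnection() blocks before failing

        // evictor thread that actually shrinks the pool when load drops
        ds.setTimeBetweenEvictionRunsMillis(30000);
        ds.setMinEvictableIdleTimeMillis(60000);
        return ds;
    }
}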
My questions:
1. Why does the pool never shrink?
2. How do I determine how many connections to allocate for each pool/process (here every process has one pool)?
3. What happens if I don't close the pool (not the connections) and the app dies? Are the connections in the pool still alive? This happens a lot: when I update the application I stop and start it, so I never close the pool.
4. What would be a good JDBC connection pool that works well with Postgres and has an option to set a timeout for getConnection?
Thanks
Our MongoDB setup uses three replica set shards. Each webserver runs a mongos instance locally, and the client node.js processes connect through that using Mongoose (3.6.20) and node-mongodb-native. So node-mongodb-native just connects to mongos on localhost.
When a replica set primary goes down hard (we can simulate this by doing 'ifdown eth0' on the primary) mongos properly detects this, and also detects that a new primary has been elected. So far, so good. But node-mongodb-native's connections to the mongos instance are still open but not functional, and a restart of the node procs is required.
Our assumption was that mongos would just kill any established connections to the dead primary and node-mongodb-native would reconnect, but that seems to not be the case; both the server and the OS think these connections are open. By contrast, on primary stepDown, the clients fail over fine, connections are closed and reopened.
We are looking at socketTimeoutMS, but that seems incorrect since it causes disconnects for connections that are merely idle.
Are we missing configuration to our client or mongos, or do we have to implement our own pinging?
Based on experimentation and the following MongoDB bug, this appears to just be a shortcoming of mongos (or, if you prefer, of the client libraries) at this point. Right now it looks like 'write your own pinging logic in your app and trigger a reconnect when that fails', so that's what we are doing.
https://jira.mongodb.org/browse/SERVER-9041
I'm upgrading a sharded cluster and want to turn one of three mongos instances off. I've guaranteed that new incoming connections will not take place because I disabled the box in my load balancer. However, I'm concerned there might still be existing connections active on the mongos instance.
I've run the following on the Mongo instance:
db._adminCommand("connPoolStats");
Do you have any tips on interpreting the result? Is this the correct command?
The cursorInfo command should work. If there are no more cursors, then it's ok to shut off the mongos. Any connections that still exist will simply fail over to another mongos through the load balancer when they try to reconnect (assuming they have an appropriate reconnection policy in place). The only thing you need to worry about is cursors, since they have state, which is taken care of by cursorInfo.
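If you would rather script that check than eyeball the shell output, here is a rough sketch using the 2.x Java driver (the host name is a placeholder, and I'm deliberately printing the raw command results rather than assuming particular field names in them):

import com.mongodb.CommandResult;
import com.mongodb.DB;
import com.mongodb.Mongo;

public class MongosDrainCheck {
    public static void main(String[] args) throws Exception {
        // connect to the mongos you want to take out of rotation
        Mongo mongo = new Mongo("my-mongos-host", 27017);
        DB admin = mongo.getDB("admin");

        // same commands as db._adminCommand(...) in the shell
        CommandResult cursorInfo = admin.command("cursorInfo");
        CommandResult connPoolStats = admin.command("connPoolStats");

        System.out.println(cursorInfo);     // check the open cursor counts here
        System.out.println(connPoolStats);
        mongo.close();
    }
}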
I have never really done much hands-on coding. I have a question regarding MongoDB replica sets.
below is the situation
I have an alert monitoring application.
It is using mongodb with replica set with 3 nodes.
The application's Java code base keeps connecting to the primary and doing some transactions.
Now my question is:
If the primary server goes down, how will it affect the application server?
I mean, would the app server log errors such as "connection failed"?
OR
Will the replica set automatically pick one of the secondaries as the new primary and let the application server carry on with its activity? How will that happen?
Thanks & Regards,
UDAY
The replica set will try to pick another server as the new primary. If you have three nodes and one goes down, the other two will negotiate which one becomes the new master. If two go down, or communication between the remaining nodes somehow breaks down, there will be no new master until the situation is recovered.
The official drivers support this automatic fail-over, as does the mongos routing server if you use it. So the application code does not need to do anything here.
There may be connection errors during the brief period this fail-over negotiation takes (you will probably get errors for a few seconds).
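If you want to be defensive about those few seconds, here is a sketch of what the Java side could look like (2.x driver class names; the host names, database, collection, and retry limits are all made up for illustration):

import java.util.Arrays;

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;
import com.mongodb.MongoException;
import com.mongodb.ServerAddress;

public class FailoverDemo {
    public static void main(String[] args) throws Exception {
        // List several replica set members; the driver discovers the rest
        // and follows the primary as elections happen.
        Mongo mongo = new Mongo(Arrays.asList(
                new ServerAddress("node1", 27017),
                new ServerAddress("node2", 27017),
                new ServerAddress("node3", 27017)));
        DBCollection alerts = mongo.getDB("monitoring").getCollection("alerts");

        // During the few seconds of an election, writes may throw; a simple
        // bounded retry rides out the fail-over window.
        BasicDBObject doc = new BasicDBObject("msg", "disk full").append("level", "critical");
        for (int attempt = 1; ; attempt++) {
            try {
                alerts.insert(doc);
                break;
            } catch (MongoException e) {
                if (attempt >= 5) throw e;
                Thread.sleep(1000); // wait for the new primary to be elected
            }
        }
        mongo.close();
    }
}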