MongoDB connection fails on multiple app servers

We use MongoDB with the mgo driver for Go. There are two app servers, and MongoDB runs alongside the apps (Go binaries) on each of them. MongoDB runs as a replica set, and each server connects to the primary or a secondary depending on the replica set's current state.
We experienced a SocketException handling request, closing client connection: 9001 socket exception on one of the mongo servers, which caused the connections from our apps to MongoDB to die. After that the replica set remained functional, but the connection died on our second server as well (the one on which the error did not happen).
In the golang logs it was manifested as:
read tcp 10.10.0.5:37698->10.10.0.7:27017: i/o timeout
Why did this happen? How can this be prevented?
As I understand it, mgo connects to the whole replica set given a single instance's URL (it detects the full topology from that one instance), so why did the death of the connection on one server kill it on the second one as well?
Edit:
Full package path that is used: "gopkg.in/mgo.v2"
Unfortunately I can't share the mongo log files here. Besides the SocketException, the mongo logs don't contain anything useful. There is an indication of some degree of lock contention, where the lock-acquired time is quite high at times, but nothing beyond that.
MongoDB does some heavy indexing at times, but there weren't any unusual spikes recently, so nothing out of the ordinary.

First, the mgo driver you are using, gopkg.in/mgo.v2, developed by Gustavo Niemeyer (hosted at https://github.com/go-mgo/mgo), is not maintained anymore.
Instead, use the community-supported fork github.com/globalsign/mgo. This one continues to get patched and evolve.
Its changelog includes "Improved connection handling", which seems directly related to your issue.
The details can be read at https://github.com/globalsign/mgo/pull/5, which points to the original pull request https://github.com/go-mgo/mgo/pull/437:
If mongoServer fails to dial the server, it will close all sockets that are alive, whether they're currently in use or not.
There are two cons:
In-flight requests will be rudely interrupted.
All sockets are closed at the same time, and will likely re-dial the server at the same time. Any occasional failure among the massive dial requests (a high-concurrency scenario) will cause all sockets to be closed again, and so on... (it happened in our production environment).
So I think sockets currently in use should only be closed once they become idle.
Note that github.com/globalsign/mgo has a backward-compatible API: besides the fixes and patches, it basically just adds a few new features. This means you should be able to simply change the import path from gopkg.in/mgo.v2 to github.com/globalsign/mgo and everything should keep working without further changes.

Related

Under which circumstances can documents inserted with insert_many not appear in the DB

We are using pymongo 3.12, Python 3.12, and MongoDB 4.2. We write the results of our tasks from a Celery worker process into MongoDB using pymongo. The MongoClient is not instantiated on each call; we rely on connection pooling and reuse connections. There are multiple instances of Celery workers competing to run a job, so our Celery server has multiple connections to MongoDB.
The problem: sometimes the results of one particular operation are not in MongoDB, yet no error is logged, and our code captures all exceptions, so it looks like no exception ever happens. We use a plain insert_many with default parameters, which means the insert is ordered and any failure would trigger an error. Only this particular operation fails; others that read or write data from/to the same or another MongoDB instance work fine. The problem can be reproduced on different systems. We added the maxIdleTimeMS parameter to the MongoDB connection in order to close idle connections, but it did not help.
Is there a way to tell programmatically which local port is used by the pymongo connection that is going to serve my request?
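
pymongo does not expose the local socket port of the connection that will serve a given request. The closest built-in tool is command monitoring, which reports, per command, which server received it and whether it was acknowledged. Below is a minimal sketch of that, combined with a majority write concern so that writes rolled back during a fail-over would surface as errors; the database and collection names here are placeholders, not taken from the question.

# Sketch: log which server each insert goes to and whether it succeeds.
# "mydb"/"results" and the URI are placeholders.
from pymongo import MongoClient, WriteConcern, monitoring

class CommandLogger(monitoring.CommandListener):
    def started(self, event):
        if event.command_name == "insert":
            # connection_id is the (host, port) of the *server* side;
            # pymongo does not expose the local port of the socket.
            print(f"insert req={event.request_id} -> server={event.connection_id}")

    def succeeded(self, event):
        if event.command_name == "insert":
            print(f"insert req={event.request_id} acknowledged")

    def failed(self, event):
        if event.command_name == "insert":
            print(f"insert req={event.request_id} FAILED: {event.failure}")

client = MongoClient("mongodb://localhost:27017", event_listeners=[CommandLogger()])
coll = client.mydb.get_collection("results", write_concern=WriteConcern(w="majority"))
result = coll.insert_many([{"task_id": 1}, {"task_id": 2}])  # ordered by default
print(result.inserted_ids)

If an insert that later "disappears" is still acknowledged under w="majority", a replica set rollback can be ruled out and the search can move to the application side.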

Why does my mongoDB account have 292 connections?

I only write data into my mongoDB database once a day and I am not currently writing any data into it but there have been a consistent 292 connections into my database for the past three hours. No reads or writes, just connections and a consistent 29 commands per second since this started.
Concerned by this, I adjusted settings to only allow access from one specific IP, and changed all my passwords but the number hasn't changed, still 292 connections and 29 commands per second. Any idea what is causing this or perhaps how I can dig in further?
The number of connections depends on the cluster setup. A connection can be external (e.g. your app or monitoring tools) or internal (e.g. to replicate your data to secondary nodes or a backup process).
You can use db.currentOp(true) to list the operations of all connections, including idle ones.
Consider that your app instance(s) may not open just one connection but several, depending on the driver that connects to the DB and how it handles connection pooling. The connection pool size can be thought of as the maximum number of concurrent requests that your driver can service. For example, the default connection pool size for the Node.js MongoDB driver is 5. If you have set a high pool size, either in the driver or in the connection string, your app may open many connections to process write commands concurrently.
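
For illustration, the pool cap can be set either in the connection string or as a client option. A minimal pymongo sketch (host and size are made up):

# Sketch: two equivalent ways to cap the connection pool at 5 sockets.
from pymongo import MongoClient

# via the connection string...
client = MongoClient("mongodb://db.example.com:27017/?maxPoolSize=5")
# ...or via a keyword option
client = MongoClient("mongodb://db.example.com:27017", maxPoolSize=5)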
You can start by process of elimination (a small watcher sketch follows these steps):
Completely cut your app off from the DB. There is a keep-alive time, so connections won't close immediately unless the driver closes them explicitly; you may have to wait a while, depending on the keep-alive setting. You can also restart your cluster and see how many connections there are initially.
Connect your app to the DB and check how the connection count changes with each request. Check whether your app properly closes its connections to the DB at some point after opening them.
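
A minimal pymongo sketch for watching the counters during that experiment (the host is a placeholder; db.serverStatus().connections in the shell shows the same numbers):

# Sketch: poll the server's connection counters every 10 seconds while
# cutting the app off and reconnecting it.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://db.example.com:27017")
while True:
    status = client.admin.command("serverStatus")
    print(status["connections"])  # e.g. {'current': 292, 'available': ...}
    time.sleep(10)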

Broken connection after Mongod restart

I'm using the Mongo Java driver (2.8.0) to connect to a Mongo instance.
I noticed that if I restart mongod, the first operation after the restart (even a simple count()) always fails with an EOFException or a broken pipe.
I'm using the following Mongo options:
MongoOptions opts = new MongoOptions(); // driver 2.8.0 options object
opts.autoConnectRetry = true;           // retry opening a connection
opts.maxAutoConnectRetryTime = 2000L;   // keep retrying for up to 2 s
opts.connectTimeout = 30000;            // connect timeout, ms
opts.socketTimeout = 60000;             // socket read timeout, ms
Is there a way to tell the driver to try to re-establish the connections? I thought that autoConnectRetry would do that, but it only kicks in after the connection is "discovered" to be broken (through a single failed operation).
The AutoConnectRetry option will retry when opening a connection to the server, but doesn't guarantee you won't get a read exception. You still need to handle exceptions in your application and retry if appropriate.
Blurb from the docs:
If true, the driver will keep trying to connect to the same server in case the socket cannot be established. There is a maximum amount of time to keep retrying, which is 15 s by default. This can be useful to avoid some exceptions being thrown when a server is down temporarily, by blocking the operations. It can also be useful to smooth the transition to a new master (so that a new master is elected within the retry time). Note that when using this flag: for a replica set, the driver will keep trying to connect to the old master for that time, instead of failing over to the new one right away; and it does not prevent exceptions from being thrown in read/write operations on the socket, which must be handled by the application. Even if this flag is false, the driver already has mechanisms to automatically recreate broken connections and retry the read operations. Default is false.
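
The application-level retry the answer describes is plain, driver-agnostic code. A minimal sketch of the pattern, written with pymongo rather than the 2.8.0 Java driver from the question (the retry count and delay are arbitrary):

# Sketch: retry a read that may hit a dead socket right after a mongod
# restart; the driver reconnects under the hood, so a retry usually works.
import time
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

client = MongoClient("mongodb://localhost:27017")

def count_with_retry(coll, retries=3, delay=1.0):
    for attempt in range(retries):
        try:
            return coll.count_documents({})
        except AutoReconnect:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

print(count_with_retry(client.mydb.mycoll))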

How is the primary server going down handled automatically in MongoDB replication?

I don't have much hands-on coding experience, and I have a question about MongoDB replica sets.
Below is the situation:
I have an alert monitoring application.
It uses MongoDB with a replica set of 3 nodes.
The application's Java code base keeps connecting to the primary and performing transactions.
Now my question is:
if the primary server goes down, how will it affect the application server?
I mean, will the app server log errors saying something like "connection failed"?
Or will the replica set automatically pick one of the secondaries as the new primary and let the application server carry on with its activity? How does that happen?
Thanks & Regards,
UDAY
The replica set will try to pick another server as the new primary. If you have three nodes and one goes down, the other two will negotiate which of them becomes the new primary. If two go down, or communication between the remaining members breaks down, there will be no new primary until the situation is recovered.
The official drivers support this automatic fail-over, as does the mongos routing server if you use it. So the application code does not need to do anything here.
I am not sure whether there will be connection errors during the brief period that this fail-over negotiation takes (you will probably get errors for a few seconds).
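
As an illustration of what the drivers do for you, a minimal pymongo sketch (the thread's Java code is not shown, and the host names and replica set name here are placeholders):

# Sketch: connect to the whole replica set; the driver discovers the
# topology and re-routes writes to the new primary after an election.
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

client = MongoClient(
    "mongodb://node1.example.com,node2.example.com,node3.example.com"
    "/?replicaSet=rs0"
)

try:
    client.mydb.alerts.insert_one({"level": "warning"})
except AutoReconnect:
    # Raised during the brief election window; by the time this retry
    # runs, the driver has usually found the new primary.
    client.mydb.alerts.insert_one({"level": "warning"})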

Too many open mongoDB connections when using Celery

I'm using Celery to download feeds and resize images. The feeds and image paths are then stored in MongoDB using mongoengine. When I check the current connections (db.serverStatus()["connections"]) after running the tasks, I have between 50 and 80 "current" connections, which remain open until I shut down celeryd. Has anyone experienced this issue, and/or do you know what I can do to solve it?
Thanks,
Kenzic
This just means that there are between 50 and 80 connections open to the MongoDB server, and it isn't cause for concern. PyMongo (and therefore MongoEngine) maintains an internal pool of connections (that is, sockets) to mongod, so even when nothing is happening (no active queries, commands, etc.), the connections remain open to the database for the next time they are used. By default, PyMongo attempts to retain no more than 10 open connections per Connection object.
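
For reference, a minimal sketch of capping and releasing the pool, using current pymongo names (the old Connection class took max_pool_size instead of maxPoolSize; values here are illustrative):

# Sketch: pooled sockets stay open for reuse; close the client to release
# them, e.g. in a Celery worker-shutdown handler.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017", maxPoolSize=10)
client.mydb.feeds.insert_one({"url": "http://example.com/feed"})
# The pool's sockets remain open here even while nothing is running...
client.close()  # ...until the client is closed, e.g. at worker shutdown.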
Are you experiencing any specific problems due to the number of open connections?