MongoDB i/o timeout when using clustered Mongo instances

I have an application that uses a package to communicate with a Mongo database server (the package is a fairly simple wrapper around mgo). The application creates a session in the main thread on start-up, and then each individual goroutine that needs to make requests to the Mongo server calls Clone on that session and does a defer Close() on the resulting value. As far as I can tell, this is all standard operating procedure.
This setup works without any errors in our development environments, where we use either a locally run MongoDB or a sandbox instance on MongoLab. Recently we promoted the application to our staging environment, where the application talks to a Shared Cluster instance of MongoDB on MongoLab (the cheapest, $15 option). This is where the weirdness starts. The first request that goes through (from the first goroutine that gets invoked) comes back with the expected response, but all subsequent ones return
read tcp <ip address>:47112: i/o timeout
This happens both from our local development machines pointed at the cluster and from the AWS host for the staging environment. Since the Mongo cluster is from MongoLab, I am going to assume that they've configured everything correctly on their end.
The code is somewhat boring TBH: It literally just opens the session in the main function and maintains a reference to it, and then there are multiple goroutines with this basic structure:
sess := session.Clone()
defer sess.Close()
// make requests to Mongo
During testing, I even restricted it to run only one thing at once (i.e. only one goroutine is active at any given time), and it still fails in the same fashion.
Has anybody run into this before? Do I need to configure the package in a specific fashion? Should I use mgo directly? I am at my wits' end with this :(

After a rather long and grueling process, we finally tracked down where this issue, and similar ones, came from in our program. It turned out to be a session leak in v1 of the library. The bug and fix are outlined here, but v1 of this library is horribly outdated at this point, and later versions do not exhibit this issue.
I doubt this answer will be useful to anybody this late in the game (especially since we ourselves solved it about 3 years ago), but I wanted to leave the answer here for completeness.
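To make the "session leak" explanation concrete, here is a toy Go model. It is not mgo's or the wrapper's actual code: the pool, its deliberately tiny size of one, and the timeout are all hypothetical. A cloned session that is never released keeps its slot, so every later Clone waits and eventually fails with a timeout, which matches the symptom of only the first request succeeding.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// pool is a hypothetical stand-in for a driver's session pool: it hands out
// at most cap(slots) sessions and makes callers wait (up to a timeout) for one
// to be returned. It is NOT mgo's implementation.
type pool struct {
	slots chan struct{}
}

func newPool(size int) *pool {
	p := &pool{slots: make(chan struct{}, size)}
	for i := 0; i < size; i++ {
		p.slots <- struct{}{}
	}
	return p
}

// session holds one pool slot until Close is called.
type session struct{ p *pool }

var errTimeout = errors.New("i/o timeout")

// Clone takes a slot from the pool, or gives up after the timeout.
func (p *pool) Clone(timeout time.Duration) (*session, error) {
	select {
	case <-p.slots:
		return &session{p: p}, nil
	case <-time.After(timeout):
		return nil, errTimeout
	}
}

// Close returns the slot to the pool.
func (s *session) Close() { s.p.slots <- struct{}{} }

// handle mimics one goroutine's work; leak=true simulates a library bug
// where the cloned session is never released.
func handle(p *pool, leak bool) error {
	s, err := p.Clone(50 * time.Millisecond)
	if err != nil {
		return err
	}
	if !leak {
		defer s.Close()
	}
	// ... make requests to Mongo here ...
	return nil
}

func main() {
	leaky := newPool(1)
	fmt.Println(handle(leaky, true), handle(leaky, true)) // prints: <nil> i/o timeout

	healthy := newPool(1)
	fmt.Println(handle(healthy, false), handle(healthy, false)) // prints: <nil> <nil>
}
```

With a real driver the pool is much larger, so a leak may take longer to surface, but the failure mode (requests blocking until a timeout) is the same idea.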


Two master instances on the same database

I want to use PostgreSQL on Windows Server 2012 R2 for one of our projects, which requires 24/7 uptime.
I would like to ask the community whether I can have 2 master instances on 2 different servers, A and B, that both 'work' on the same DB located on shared file storage in the LAN. One master instance on server A will always be online. When it goes offline for some reason, a PowerShell script (I suppose) will recognize that the PostgreSQL service has stopped and will start the service on server B. The same script will continuously check that only one service on servers A and B is running, to avoid conflicts.
I'd like to ask whether this is possible, or whether there is a better approach for my configuration.
(I can't use replication, because when server A shuts down, server B is in read-only mode, which I don't want.)
If you manage to start two instances of PostgreSQL on the same data directory, serious data corruption will happen.
Normally there is a file that prevents that, but a PostgreSQL server process on a different machine that accesses the same file system will happily unlink that after spewing some log messages, thinking it was left behind from a crash.
So you are really walking on thin ice with a solution like that.
One other issue that you didn't think of is the script that is supposed to check whether the server is still running. What if that script fails, for example because the network connection between the two servers is down, while the server is still up and running happily? Such a “split brain” scenario will cause data corruption with your setup.
Another word of caution: since you seem to be using Windows (Powershell?), you probably envision a CIFS file system when you are talking of shared storage. A Windows “network share” is not a reliable file system — last time I checked, it did not honor _commit.
Creating a reliable failover cluster is harder than you think, and I'd recommend that you check existing solutions before you try to roll your own.
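The lock-file problem described above can be modelled in a few lines of Go. This is a deliberately simplified, hypothetical model, not PostgreSQL's actual startup code: each host can only see its own process table, so a PID-based "is the lock owner still alive?" check on server B concludes that server A's lock is stale, and both servers end up using the same data directory.

```go
package main

import "fmt"

// host is a simplified model of one server: the PIDs it can see locally.
type host struct {
	name string
	pids map[int]bool
}

// lockFile models a lock file on the shared data directory that records the
// owning server's PID. Hypothetical simplification, not PostgreSQL's code.
type lockFile struct {
	held     bool
	ownerPID int
}

// start tries to take the lock. Like a typical PID-file check, it treats the
// lock as stale if the recorded PID is not a live process on THIS host.
func (h *host) start(l *lockFile, pid int) bool {
	if l.held && h.pids[l.ownerPID] {
		return false // owner is visibly alive: refuse to start
	}
	if l.held {
		fmt.Printf("%s: PID %d not running here, removing stale lock\n", h.name, l.ownerPID)
	}
	l.held, l.ownerPID = true, pid
	h.pids[pid] = true
	return true
}

func main() {
	shared := &lockFile{}
	a := &host{name: "A", pids: map[int]bool{}}
	b := &host{name: "B", pids: map[int]bool{}}

	fmt.Println("A started:", a.start(shared, 1000))
	// A is still running, but B cannot see A's process table.
	fmt.Println("B started:", b.start(shared, 2000))
	// Both servers now believe they own the same data directory.
}
```

The same blind spot applies to the watchdog script: anything that decides "the other server is down" from a single host's point of view cannot distinguish a dead server from a broken network path.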

MongoDB sometimes stops

I have a Linux server (CentOS) with 32GB RAM.
I installed MongoDB along with a Java application. But sometimes MongoDB stops working, so I need to restart it.
I already used the ulimit Linux command to change the open files limit to 64000, but the problem still happens.
I'd like to know if somebody has some experience with MongoDB and can give me some tips about this problem.
If you're getting "too many connections," it's quite possible you're opening too many MongoClients. In general, you need only open the one client and its internal connection pool will manage everything for you. This isn't always possible, though, so you'll want to make sure you properly manage the scope of each MongoClient and call close() on it when you're done.

MongoDB replica set in Azure "Waiting for role to start... Calling OnRoleStart()"

I have a problem trying to implement a mongodb replica set as a worker role instance in Windows Azure. In the Windows Azure portal, one of the instances is shown as busy with the status:
Waiting for role to start... Calling OnRoleStart()
I have checked all the settings and everything seems to be ok, what could the problem be?
Denis Markelov's blog post helped me solve this problem. The solution is mainly his, however I had to take an extra step to get it to work and thought others might find it useful.
Solution from blog:
Windows Azure reuses virtual machines for roles, so after a fresh deployment you can find files on the hard drive that were created during previous sessions. If MongoDB was terminated improperly, there might be a lock file (a "persisted mutex" analogue) that prevents MongoDB from starting. It is located on the drive labeled "WindowsAzureDrive" (say it is F:), at the path:
In the case of production use, this situation might require recovery procedures, but if you are just in the process of initial setup, it is safe to remove this file and let MongoDB start.
I was having this problem and did as suggested, but I still had the same problem. So I looked at the log file at
And saw that another file was also giving an error. In order to fix the problem, you also have to delete the file local.ns in the same directory as mongod.lock.

Get Chef to execute a mongodb script after mongodb has started

We're currently using Chef to provision our servers, and we want our recipe/cookbook to automatically add some data to the Mongo database once it's installed and running.
This is where we start to run into problems. We were using an execute resource to run the mongo script like this:
execute "install-mongodb-config" do
command "mongo #{node[:mongodb][:mongo_db_host]}/#{node[:mongodb][:mongo_db]} \"#{node[:mongodb][:mongo_add_config_script]}\""
action :run
end
This part of the recipe always failed no matter what we tried! I won't get into the details of everything we tried here (unless I need to), but let's just say that I've exhausted all possibilities of subscribes and notifies (I think).
The problem originates from the fact that we are using mongodb::10gen_repo to install MongoDB. The recipe exits when apt-get installs the package, and then Chef continues on to execute more resources.
We have tried executing the above resource directly after mongodb::10gen_repo but it doesn't seem like mongodb is available and the mongo shell cannot connect and run the script. The error we see is somewhat like this:
MongoDB shell version: 2.0.2
Thu Sep 6 18:40:45 ReferenceError: setTimeout is not defined mongotest.js:2
failed to load: mongoAddConfig.js
Nothing we have tried has been able to get around this in a nice Chef way. What we resorted to was replacing the execute resource with the following:
execute "install-mongodb-config" do
command "sleep 60; mongo #{node[:mongodb][:mongo_db_host]}/#{node[:mongodb][:mongo_db]} \"#{node[:mongodb][:mongo_add_config_script]}\""
action :run
end
This just makes the command sleep for 60 seconds before the mongo script is run. I know this isn't the Right way to do it, but it works for now.
Can anyone suggest the Right way to do this? I have a feeling that I will need to talk to the guys that created the mongodb chef script and request a feature!
First of all, remove this "sleep 60". Chef can do this for you: all resources have common attributes, and "retries" and "retry_delay" are among them. So the easiest way would be:
execute "install-mongodb-config" do
command "mongo some_command"
action :run
retries 6
retry_delay 10
end
If you have more than 2-3 places where you have to run some command against the Mongo database, consider creating an LWRP, similar to the one created in this mongodb cookbook (in particular, check the libraries/mongodb.rb file). You can hide the logic that waits for the server to respond there.
Is it important that the same Chef run that installs the software also injects the initial configuration? The 'chefly' way to construct cookbooks and recipes is to make them idempotent, so that they can be run over and over again without producing unintended results.
In this particular case, I would limit the first recipe to only just installing and starting up mongodb. This recipe would do nothing if it saw that mongodb was already running on the host. Then, I'd have another recipe that would run only if it saw that mongo had been setup and was running. It would query the mongodb to see if the initial configuration had been done. If so, it would simply return. If not, it would run your configuration routine.
In this way, these recipes could run all the time, anytime, on your machine. Even if someone uninstalled mongodb, chef would get around to ensuring that it was set back up again and pristine.
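The check-then-act pattern described above can be shown in a small Go sketch. The configStore type is a hypothetical stand-in for querying MongoDB for the initial configuration; in Chef terms the check would typically live in a guard such as not_if on the execute resource.

```go
package main

import "fmt"

// configStore is a hypothetical stand-in for the database: it only tracks
// whether the initial configuration has been loaded.
type configStore struct {
	configured bool
	applied    int // how many times the config routine actually ran
}

// ensureInitialConfig is idempotent: it inspects the current state first and
// runs the configuration routine only if it has not been done yet.
func ensureInitialConfig(db *configStore) (changed bool) {
	if db.configured {
		return false // already configured: nothing to do
	}
	db.applied++ // stands in for running the mongo config script
	db.configured = true
	return true
}

func main() {
	db := &configStore{}
	for run := 1; run <= 3; run++ {
		fmt.Printf("chef run %d changed=%v\n", run, ensureInitialConfig(db))
	}
	fmt.Println("config applied", db.applied, "time(s)")
}
```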
So, I don't know much at all about chef. But your problem seems to be that you try to immediately connect after bringing the server up.
Servers are not immediately available when you bring them up, since there is a bit of overhead that goes into electing a primary, getting all the server statuses, etc.
You can recreate this without chef by trying to bring up a replica set and immediately trying to connect to it in a simple script. So it's not chef specific.
Not sure if there is a way around the server startup lag since bringing up a primary is expected to be a relatively infrequent occurrence compared to just adding nodes to a set.
The only potential solution I see that is cleaner is adding a longer Timeout for the connection to be formed in the configuration. You can find how to do this in the mongodb documentation here:
The flag of interest for you is likely connectTimeoutMS.

Database migrations: manage with build script or automatic on app startup?

I'm in the process of developing a deployment system for a new web app and I'm wondering where the best point in the process to manage database migrations is (the question of how to do the migrations is another problem entirely).
It seems there are two ways to go:
1. Use a migration script that can either be run manually from the command line or as part of the automatic deployment/build process
2. Run the migrations when the app starts up (I'm using ASP.NET, so this can be done easily enough without causing a long-running user request)
Does anyone have any suggestions/insight/experience with these approaches? Any other suggestions?
I can see why #1 might be more attractive - it gives me complete control over when the DB is updated. However, I quite like #2 as it allows me to quickly iterate between deployments and reduces the manual process. #2 could also be used on my development machine to allow even quicker iterations. Hmm, starting to think having both might be a good thing...
We have a sales-force system with ~100 clients, and we update the database at application startup (granted, ours is a desktop application). I like this approach: it's safe and iterative when we have a nondeterministic starting point (is the client database new, or only updated to version x.y.z?).
But on the server side I prefer your option #1: we create a SQL query file on our virtual machine (based on a copy of the original database) and run this query against the real server.
Disconnected clients: startup, iterative scripts
Server: query created on VM based on the actual and real database
So I'm interested in this problem too, and have found some (half-)frameworks such as RikMigrations. After some googling, a good starting point for DB versioning/migration frameworks is: .NET Database Migration Tool Roundup. Not necessarily the documentation, but the team blogs can be interesting.
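The "iterative from an unknown starting point" idea behind upgrading at startup can be sketched as: read the database's current version, then apply every newer migration in order. This Go sketch uses hypothetical migration steps; in a real system the version would be persisted in a version table, ideally in the same transaction as each step.

```go
package main

import (
	"fmt"
	"sort"
)

// migration upgrades the schema to `version` from version-1.
// Versions and steps here are hypothetical placeholders.
type migration struct {
	version int
	apply   func() error
}

// migrate applies, in version order, every migration newer than `current`,
// so the same code works for a brand-new database (version 0) and for one
// already upgraded to some intermediate version. It returns the new version.
func migrate(current int, steps []migration) (int, error) {
	ordered := append([]migration(nil), steps...) // don't reorder caller's slice
	sort.Slice(ordered, func(i, j int) bool { return ordered[i].version < ordered[j].version })
	for _, m := range ordered {
		if m.version <= current {
			continue // already applied to this database
		}
		if err := m.apply(); err != nil {
			return current, fmt.Errorf("migration %d: %w", m.version, err)
		}
		current = m.version // in practice, persist this in a version table
	}
	return current, nil
}

func main() {
	var applied []int
	step := func(v int) migration {
		return migration{version: v, apply: func() error { applied = append(applied, v); return nil }}
	}
	// A client database already at version 1 only needs migrations 2 and 3.
	v, err := migrate(1, []migration{step(1), step(2), step(3)})
	fmt.Println(v, err, applied) // prints: 3 <nil> [2 3]
}
```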
I like option #1 better as it seems much more flexible. In lieu of actually performing migrations on each app start, I think I would verify that the database schema (version number?) matches the code, and if not, throw a warning or error about a mismatched database schema.
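The check-instead-of-migrate approach suggested above might look like this Go sketch. The expected version constant and the error wording are hypothetical; the database version would come from something like a one-row version table.

```go
package main

import "fmt"

// expectedSchemaVersion is the schema version this build of the app was
// written against (hypothetical constant, bumped with each migration).
const expectedSchemaVersion = 7

// checkSchema compares the version stored in the database with the version
// the code expects and refuses to proceed instead of migrating automatically.
func checkSchema(dbVersion int) error {
	switch {
	case dbVersion < expectedSchemaVersion:
		return fmt.Errorf("database schema v%d is older than v%d required by this build; run the migration script",
			dbVersion, expectedSchemaVersion)
	case dbVersion > expectedSchemaVersion:
		return fmt.Errorf("database schema v%d is newer than v%d expected by this build; deploy the matching app version",
			dbVersion, expectedSchemaVersion)
	}
	return nil
}

func main() {
	for _, v := range []int{6, 7, 8} {
		fmt.Printf("db v%d: %v\n", v, checkSchema(v))
	}
}
```

Failing fast on a mismatch keeps the decision of when to migrate with the deployment script (option #1) while still catching a forgotten migration at startup.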
I'd prefer option #1 for a number of reasons. First, integration tests usually require your DB schema to be up to date, and launching a website just to upgrade the schema would be a huge time-waster. Second, you cannot change the database schema while your site is running (say, add a couple of indexes to speed things up).
As for the production side of things, upgrading your database in a transactional, MSI-style installation is much better than attempting to upgrade at each app startup, since otherwise you can potentially end up with desynchronized database and application versions.
And if you're looking for the migration framework, take a look at Wizardby.
If the application ever has to run on a customer's machine, then migrating at startup can prevent a lot of support calls, assuming you can do a seamless migration without user intervention (I hope you aren't normally running your web app with permission to modify the database schema).
If the application always runs under your control automatic migration is less of an issue - but still can be a good feature, especially if you want to minimize downtime and manual deployment steps.