MongoS sharding metadata manager failed, asking for the instance to be manually reset - MongoDB

My mongos servers are not starting; they are logging this error:
SHARDING [Balancer] caught exception while doing balance: Server's
sharding metadata manager failed to initialize and will remain in
this state until the instance is manually reset :: caused by ::
HostNotFound: unable to resolve DNS for host confserv_1.xyz.com
2016-05-02T17:57:06.612+0530 I SHARDING [Balancer] about to log metadata event into actionlog: { _id: "DB2255-2016-05-02T17:57:06.611+0530-5727479aa1051c5fb04fcc49", server: "mongoS1", clientAddr: "", time: new Date(1462192026611), what: "balancer.round", ns: "", details: { executionTimeMillis: 35, errorOccured: true, errmsg: "Server's sharding metadata manager failed to initialize and will remain in this state until the instance is manually reset :: caused by :: HostNotFoun..." } }
When I connect to the config server using its host name, it works fine.
I tried to restart the mongos server, but it is not coming up.
I checked the Mongo code and found this error mentioned in
https://github.com/mongodb/mongo/blob/master/src/mongo/db/s/sharding_state.cpp
// TODO: remove after v3.4.
// This is for backwards compatibility with old style initialization through metadata
// commands/setShardVersion. As well as all assignments to _initializationStatus and
// _setInitializationState_inlock in this method.
if (_getInitializationState() == InitializationState::kInitializing) {
    auto waitStatus = _waitForInitialization_inlock(deadline, lk);
    if (!waitStatus.isOK()) {
        return waitStatus;
    }
}

if (_getInitializationState() == InitializationState::kError) {
    return {ErrorCodes::ManualInterventionRequired,
            str::stream() << "Server's sharding metadata manager failed to initialize and will "
                             "remain in this state until the instance is manually reset"
                          << causedBy(_initializationStatus)};
}
But it does not mention what manual intervention is required.
Current Mongo version is 3.2.6

I just ran into this problem while trying to harden the security configuration. As in your case, I was able to connect to the config servers from all mongos instances.
In my case I was also testing a setup with replica-set members in different datacenters, and I had the problem only after stepping down some primaries.
I eventually noticed that, contrary to what the error message suggests, the issue was happening on some primaries in one datacenter, which were not able to route back to the config server. After fixing the routing problem (via /etc/hosts in the end), no more problems occurred on the Mongo side.
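For example, if confserv_1.xyz.com does not resolve from an affected member, a minimal sketch of the fix is a static entry in that member's /etc/hosts (the IP below is a placeholder for your config server's real address):

10.0.0.5    confserv_1.xyz.com

Once the name resolves again, restarting the affected mongos should let the sharding state initialize normally.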

Related

Unable to connect to MongoDb using MongoClientSettings as parameter to MongoClient

I am developing a C# MVC Web API which uses MongoDB as the backend. I tried connecting to my MongoDB database using
MongoClient mongoClient = new MongoClient(connectionString)
where connectionString is in the format: mongodb://Username:Password@hostname.eastus.cloudapp.azure.com
MongoDB is hosted in a virtual machine in Azure. I am able to connect to the database and everything works fine, but I am getting frequent exceptions:
"MongoDb.driver.MongoConnectionException".An exception occurred while
receivinf a message from server--->System.IO.IOException:Unable to
read data from the transport connection : A connection attempt failed
because the connected party did not properly respond after a period of
time,......"
So after a bit of research I have learnt that Azure is killing idle connections and I have to set MaxConnectionIdleTime.
In order to set MaxConnectionIdleTime, I decided to connect to MongoDB in the way below:
var credential = MongoCredential.CreateCredential("dbname", "UserName", "Password");
var settings = new MongoClientSettings
{
    Credentials = new[] { credential },
    Server = new MongoServerAddress("HostName", 27017),
    // Recycle connections idle for 3 minutes, before Azure drops them.
    MaxConnectionIdleTime = new TimeSpan(0, 3, 0)
};
MongoClient mongoClient = new MongoClient(settings);
In this case I am using the same username and password combination given in the connection string that I used to connect before.
While trying to connect this way, I am getting the inner exception:
MongoDB.Driver.MongoAuthenticationException: "Unable to authenticate
using sasl protocol mechanism SCRAM-SHA-1".
"MongoDb.driver.MongoConnectionException".An exception occurred while receivinf a message from server--->System.IO.IOException:Unable to read data from the transport connection : A connection attempt failed because the connected party did not properly respond after a period of time,....
The reason behind this exception: when hosted in Azure, Azure kills idle connections, but the C# driver is not aware of this. The driver tries to execute queries on the killed connections without knowing that they no longer exist.
The solution that worked for me is to set maxIdleTimeMS=45000 in the connection string.
This way the driver will not use a connection that has been idle for a long time.
Here is the connection string that worked for me:
connectionString="Username:Password#hostname.eastus.cloudapp.azure.com/?connectTimeoutMS=30000&socketTimeoutMS=30000&waitQueueTimeoutMS=30000&maxIdleTimeMS=45000"
I have had a similar error with my Azure-hosted MongoDB (Cosmos DB). It turned out to be the network settings: I had blocked all access. Changing it to allow access from "All networks" fixed the issue.
The error is very misleading; I would have expected a connection timeout.
A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }.
Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/XXXX.documents.azure.com:10255" }", EndPoint: "Unspecified/XXXX.documents.azure.com:10255", State: "Disconnected", Type: "Unknown", HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server. ---> MongoDB.Driver.MongoAuthenticationException: Unable to authenticate using sasl protocol mechanism SCRAM-SHA-1. ---> MongoDB.Driver.MongoCommandException: Command saslContinue failed: Not Authenticated.
To troubleshoot, I tried from MongoDB Compass as well, and that didn't work either, showing me it wasn't the code.

Check mongo status on Meteor?

I'm trying to create an alarm system for my application, that will trigger when one of the services (e.g. MongoDB) is not working.
What I'm doing is: once the application is started, I shut down my MongoDB server and try to connect to it, but instead of receiving an error my application just gets stuck executing the method. The server console looks as if something is still running.
My current code (coffeescript) is:
checkMongoService: ()->
  mongo = Npm.require 'mongodb'
  assert = Npm.require 'assert'
  url = 'mongodb://....'
  mongo.connect url, (err, db) ->
    assert.equal null, err
    console.log 'Connected correctly to server'
    db.close()
    return
I've also tried doing a simple
Meteor.users.find().count();
or using MongoInternals with
testConnection = new MongoInternals.RemoteCollectionDriver("mongodb://...");
but still the same issue: when mongo is not running, no error is thrown and the console stops working. If I then start Mongo again, it just returns the result (in this case the log 'Connected correctly to server').
Something I've noticed: if I try to execute testConnection = new MongoInternals.RemoteCollectionDriver("mongodb://..."); with meteor shell, I get the error "Error: failed to connect to [127.0.0.1:27017]"
TL;DR
Do you have any idea how I can check whether mongo is reachable, or do you know if I'm doing something wrong in the code above?
Try setting the timeouts to be a bit shorter than the default 30 seconds:
mongo.connect(url, {
  connectTimeoutMS: 1000,
  socketTimeoutMS: 1000,
  reconnectTries: 1
}, function(err, db) {...});
(Full set of connection params are here)
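If you wrap that in the question's checkMongoService, a plain-JS sketch could look like this (notifyAlarm is a hypothetical alerting hook; url is your connection string):

function checkMongoService(url, notifyAlarm) {
  var mongo = Npm.require('mongodb');
  // With short timeouts, an unreachable mongod fails the callback quickly
  // instead of blocking, so the alarm can be raised there.
  mongo.connect(url, {
    connectTimeoutMS: 1000,
    socketTimeoutMS: 1000,
    reconnectTries: 1
  }, function (err, db) {
    if (err) {
      notifyAlarm('MongoDB unreachable: ' + err.message); // hypothetical hook
      return;
    }
    console.log('Connected correctly to server');
    db.close();
  });
}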
Meteor.status().status
From the docs:
This method returns the status of the connection between the client and the server. The return value is an object with the following fields:
connected (Boolean)
True if currently connected to the server. If false, changes and method invocations will be queued up until the connection is reestablished.
status (String)
Describes the current reconnection status. The possible values are connected (the connection is up and running), connecting (disconnected and trying to open a new connection), failed (permanently failed to connect; e.g., the client and server support different versions of DDP), waiting (failed to connect and waiting to try to reconnect) and offline (user has disconnected the connection).
https://docs.meteor.com/api/connections.html
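For example, a minimal client-side sketch using this (Tracker.autorun re-runs reactively when the status changes; raiseAlarm is a hypothetical hook in your app):

Tracker.autorun(function () {
  var status = Meteor.status();
  // Alarm when the client has lost its connection to the server.
  if (!status.connected && status.status !== 'connecting') {
    raiseAlarm('Lost connection to server, status: ' + status.status); // hypothetical hook
  }
});

Note that this reports the client-to-server (DDP) connection, not the server's own link to MongoDB.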

mongodb grails simple application times out

I'm having an issue with mongodb 2.6.5 and grails 2.4.4 that I can't resolve. For the sake of isolating the problem I created a simple 2.4.4 grails app, installed the grails mongodb plugin (compile ":mongodb:3.0.2"), commented out the hibernate dependencies, added my mongodb datasource, and set up a simple domain class (com.nerds.Nerd). When I run generate-all and then start the app and navigate to the NerdController CRUD page, I get the following error every time:
MongoTimeoutException occurred when processing request: [GET] /MONGO/nerd/index
Timed out while waiting to connect after 10000 ms. Stacktrace follows:
com.mongodb.MongoTimeoutException: Timed out while waiting to connect after 10000 ms
I can access mongo via http using http://localhost:28017/
I have also tested manually adding data and querying from mongo. This all works fine.
In the debug log prior to the timeout, it looks like GORM acquired a mongo session and then tried rolling back a transaction.
DatastoreTransactionManager:128 - Found thread-bound Session [org.grails.datastore.mapping.mongo.MongoSession#e47ee6] for Datastore transaction
DatastoreTransactionManager:128 - Creating new transaction with name [null]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT,readOnly
DatastoreTransactionManager:128 - Initiating transaction rollback
DatastoreTransactionManager:128 - Rolling back Datastore transaction on Session [org.grails.datastore.mapping.mongo.MongoSession#e47ee6]
DatastoreTransactionManager:128 - Resuming suspended transaction after completion of inner transaction
Any insight would be helpful. Thanks
edit: The mongo datasource is pretty simple. I'm using the correct port.
From the mongo log:
2014-11-18T13:10:13.388-0900 [initandlisten] MongoDB starting : pid=17275 port=27017 dbpath=/var/lib/mongodb 32-bit host=enterprise
From DataSource.groovy:
grails {
    mongo {
        host = 'localhost'
        port = 27017
        databaseName = 'mydb'
    }
}
I'm fairly certain the issue was on the mongod side. I stopped the mongo daemon, put it into high-verbosity debug mode (using the mongod -vvvv command), and when I tried to replicate the issue while watching the console output, the issue did not happen. I'm not entirely sure what the exact cause of the timeout was, but it's not happening now. Thanks for the responses.

Sails.js multiple connections on start

I've got an odd problem: on start of my sails app (which connects to postgres and is deployed on heroku) there are multiple connections (around 10) to the database, and since it's a free account, if I then try to launch the app on localhost to test some new code I get the error "too many connections for a role". Does anyone know why there are so many connections to the database, and can I change it to have only one connection per app?
EDIT:
Error creating a connection to Postgresql: error: too many connections for role "xwoellnkvjcupt"
Error creating a connection to Postgresql: error: too many connections for role "xwoellnkvjcupt"
error: Hook failed to load: orm (error: too many connections for role "xwoellnkvjcupt")
error: Error encountered while loading Sails core!
error: error: too many connections for role "xwoellnkvjcupt"
    at Connection.parseE (C:\Studia\szachman2\node_modules\sails-postgresql\node_modules\pg\lib\connection.js:561:11)
    at Connection.parseMessage (C:\Studia\szachman2\node_modules\sails-postgresql\node_modules\pg\lib\connection.js:390:17)
    at null.<anonymous> (C:\Studia\szachman2\node_modules\sails-postgresql\node_modules\pg\lib\connection.js:98:18)
    at CleartextStream.EventEmitter.emit (events.js:95:17)
    at CleartextStream.<anonymous> (_stream_readable.js:746:14)
    at CleartextStream.EventEmitter.emit (events.js:92:17)
    at emitReadable_ (_stream_readable.js:408:10)
    at _stream_readable.js:401:7
    at process._tickDomainCallback (node.js:459:13)
This is an error I often get when trying to test some new code on localhost.
@jantar @sgress454 I just added a troubleshooting message in sails-postgresql to try and make this better. Here's what it says:
-> Maybe your poolSize configuration is set too high? e.g. If your Postgresql database only supports 20 concurrent connections, you should make sure you have your poolSize set as something < 20. The default poolSize is 10.
To override the default poolSize, specify a poolSize property on the relevant Postgresql "connection" config object. If you're using Sails, this is generally located in config/connections.js, or wherever your environment-specific database configuration is set.
-> Do you have multiple Sails instances sharing the same Postgresql database? Each Sails instance may use up to the configured poolSize # of connections. Assuming all of the Sails instances are just copies of one another (a reasonable best practice) we can calculate the actual # of Postgresql connections used (C) by multiplying the configured poolSize (P) by the number of Sails instances (N). If the actual number of connections (C) exceeds the total # of AVAILABLE connections to your Postgresql database (V), then you have problems. If this applies to you, try reducing your poolSize configuration. A reasonable poolSize setting would be V/N.
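As a worked example of that calculation (hypothetical numbers, using the config/connections.js layout mentioned above): with V = 20 available Postgres connections shared by N = 4 Sails instances, each instance should get a poolSize of at most 20 / 4 = 5:

// config/connections.js -- example values only
module.exports.connections = {
  somePostgresqlServer: {
    adapter: 'sails-postgresql',
    host: 'localhost',
    user: 'dbuser',
    password: 'dbpassword',
    database: 'mydb',
    poolSize: 5 // V / N = 20 / 4
  }
};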
This is due to Sails's auto-migration feature, which attempts to keep your models and database synced up. It's not intended to be used in production. You can turn auto-migration off for a single model by adding migrate: 'safe' to the model definition:
module.exports = {
  migrate: 'safe',
  attributes: {...}
}
You can turn auto-migration off for all models by adding a model config, usually in your config/locals.js:
module.exports = {
  model: {
    migrate: 'safe'
  },
  environment: 'production',
  ...other local config...
}
A little update for v1: your adapter in config/datastores.js should look like this if you want to set a maximum size for the connection pool:
{
  adapter: 'sails-postgresql',
  url: 'yourconnectionurl',
  max: 1 // This is the important part for poolSize; I set 1 because I don't want more than 1 connection ^^
}
If you want to know all the settings you can use, look here: https://github.com/sailshq/machinepack-postgresql/blob/176413efeab90dc5099dc60718e8b520942ce3be/machines/create-manager.js , at line 162:
// Basic:
'host', 'port', 'database', 'user', 'password', 'ssl',
// Advanced Client Config:
'application_name', 'fallback_application_name',
// General Pool Config:
'max', 'min', 'refreshIdle', 'idleTimeoutMillis',
// Advanced Pool Config:
// These should only be used if you know what you are doing.
// https://github.com/coopernurse/node-pool#documentation
'name', 'create', 'destroy', 'reapIntervalMillis', 'returnToHead',
'priorityRange', 'validate', 'validateAsync', 'log'

SocketException in Mongo

I just set up a replica set in Mongo (prod environment). I'm now getting a lot of exceptions like the one below (clipped).
I went into mongo and ran a serverStatus command on my primary mongo node; it only has about 300 connections going, so it's hardly under load.
Below are my connection option settings in my server code:
auto_connect_retry = false   // do not automatically retry failed connections
connections_per_host = 10    // connection pool size per host
threads_multiplier = 10      // threads allowed to block per pooled connection
max_wait_time = 120000       // ms a thread waits for a pooled connection
connect_timeout = 10000      // ms to wait when establishing a connection
socket_timeout = 0           // 0 = no socket read timeout (wait indefinitely)
Do I have something mis-configured?
Sep 9, 2013 8:31:26 PM com.mongodb.DBPortPool gotError
WARNING: emptying DBPortPool to /10.0.8.10:27017 b/c of error
java.net.SocketException: Connection timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.bson.io.Bits.readFully(Bits.java:46)
at org.bson.io.Bits.readFully(Bits.java:33)
at org.bson.io.Bits.readFully(Bits.java:28)
at com.mongodb.Response.<init>(Response.java:40)
at com.mongodb.DBPort.go(DBPort.java:142)
at com.mongodb.DBPort.call(DBPort.java:92)
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:244)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:216)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:288)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:273)
at com.mongodb.DBCollection.findOne(DBCollection.java:347)
at com.mongodb.DBCollection.findOne(DBCollection.java:332)
at com.mongodb.casbah.MongoCollectionBase$class.findOneByID(MongoCollection.scala:232)
at com.mongodb.casbah.MongoCollection.findOneByID(MongoCollection.scala:866)
at com.novus.salat.dao.SalatDAO.findOneById(SalatDAO.scala:353)
at com.novus.salat.dao.ModelCompanion$class.findOneById(ModelCompanion.scala:173)
Generally, a connection timeout in a replica set occurs for one of the following reasons:
1) The members are not able to communicate with each other
2) A program is writing to the replica set and cannot reach the primary, due to overload or to (1)
3) The replicas are not in sync and one is lagging too far behind
4) A leader election is in progress but has not completed for some reason
Please check that your replica set is consistent and all nodes are working by issuing rs.status() on the primary node (a quick sketch follows below); also, as suggested earlier, check the primary's logs for more information.
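For example, a minimal mongo-shell sketch (the fields come from the standard rs.status() output):

// Run on the primary: print each member's state, health and optime so
// unreachable or lagging members stand out.
rs.status().members.forEach(function (m) {
  print(m.name + '  state=' + m.stateStr + '  health=' + m.health + '  optime=' + m.optimeDate);
});

Any member that is not PRIMARY or SECONDARY, reports health 0, or shows an optimeDate far behind the primary's is a likely cause of the timeouts.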