I have a MongoDB replica set running in a Docker container (mongo:3.0.11) in an AWS VPC (for this specific case just one node, the primary).
This server is shut down every night and started again the next morning.
After a few months of running seamlessly, I've started getting errors in the past few weeks. Once or twice a week the mongo startup fails.
rs.status() returns stateStr: REMOVED
with the error message: errmsg : "Our replica set config is invalid or we are not a member of it"
Looking at the mongo logs I have:
2016-06-07T12:01:48.724+0000 W NETWORK [ReplicationExecutor] getaddrinfo("database.my_vpc_dns.net") failed: Name or service not known
When this error happens, a simple restart of the Docker container fixes it, but I'm struggling to understand what causes this error to happen occasionally.
The replica probably loses its configuration during the restart. It is possible that the member cannot resolve its DNS name at startup, which is why it fails to come up when the server is started.
What you can do is point the member at the machine directly (e.g. via domain.my-machine) instead of the VPC DNS name; running db.isMaster() on the primary will show which host the member is configured with, without needing a restart.
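For reference, a minimal sketch of re-pointing the member from the mongo shell; the hostname and port below are placeholders, not values from the question, and the forced reconfig is my own suggestion rather than part of the original answer:
// rs.conf() can fail while the node reports REMOVED, so read the stored config directly
cfg = db.getSiblingDB("local").system.replset.findOne()
// placeholder address that the container can always resolve
cfg.members[0].host = "domain.my-machine:27017"
// force is required because the node is not currently PRIMARY
rs.reconfig(cfg, { force: true })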
Related
I am trying to run a PostgreSQL replica DB on Fargate. For storage I mounted an EFS volume and put the DB on it via the PGDATA environment variable. When the Fargate task starts, the sync with the main DB starts and everything seems to work fine for some minutes until:
FATAL: connection to client lost
LOG: could not send data to client: Broken pipe
This error then occurs every other minute until the health checks eventually fail.
Does anyone have an idea what this error means in this context? And/or how to fix it?
We've set up a sharded cluster with a replica set for each shard in our production environment. Last night we encountered an issue while connecting to the cluster and issuing an index build command through the mongo shell, connecting via the mongos instance rather than directly to a specific shard.
The issue is: once the index build starts, the number of connections created from mongos to this shard increases rapidly, and 'too many connections' errors show up in the primary shard's log file very soon.
Below is a summary of the primary shard's log.
At the very beginning of the index build:
Then very soon, the connection count reached 10,000:
Connection limit exceeded
From the three mongos' logs, all the connections are initiated by mongos. We googled and found a related issue: https://jira.mongodb.org/browse/SERVER-28822
But it does not give the trigger conditions. At the same time, I tried to reproduce the issue in our test environment, but it did not occur again. So, please help.
Here is the configuration for mongos:
mongos' configuration
and here is the configuration for the primary shard:
primary shard's configuration
Found the answer.
It was because the index creation issued by the mongorestore command ran in the foreground, not the background. I had misunderstood how mongorestore handles index builds and had not checked the metadata file for the collection schema.
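For anyone hitting the same thing, a hedged illustration of the difference in the mongo shell (the database, collection and key names are made up); on these pre-4.2 versions an index can be requested explicitly as a background build, whereas the restore issued default foreground builds:
// foreground build (the default, and what the restore issued): blocks other operations
db.getSiblingDB("mydb").mycoll.createIndex({ userId: 1 })
// background build: slower, but does not block the shard while it runs
db.getSiblingDB("mydb").mycoll.createIndex({ userId: 1 }, { background: true })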
I was using MongoDB version 2.6.6 on Google Compute Engine, deployed with the click-to-deploy method.
rs0:SECONDARY> db.createUser({user:"admin", pwd:"secret_password", roles:[{role:"root", db:"admin"}]})
2015-07-13T15:02:28.434+0000 Error: couldn't add user: not master at src/mongo/shell/db.js:1004
rs0:SECONDARY> use admin
switched to db admin
rs0:SECONDARY> db.createUser({user:"admin", pwd:"secret_password", roles:["root"]})
2015-07-13T15:13:28.591+0000 Error: couldn't add user: not master at src/mongo/shell/db.js:1004
I had a similar problem with mongo 3.2:
Error: couldn't add user: not master
when trying to create a new user with the root role.
I was using only a local copy of mongo.
In my mongod.conf file I had the following uncommented:
replication:
replSetName: <node name>
Commenting that out and restarting fixed the problem. I guess mongo thought it was part of a replica set and was confused as to who the master was.
Edit:
I've also found that if you ARE trying to set up a replica set and you get the above error, then run:
rs.initiate()
This will start a replica set and set the current node as PRIMARY.
Exit, and then log back in and you should see:
PRIMARY>
Now create users as needed.
I ran into this error when scripting replica set creation.
The solution was to add a delay between rs.initiate() and db.createUser().
Replica set creation is seemingly done in the background, and it takes time for the primary node to actually become primary. In interactive use this doesn't cause a problem because there is a delay while typing the next command, but when scripting the interactions the delay may need to be forced, as in the sketch below.
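A minimal sketch of the scripted flow, polling until the node reports itself as primary instead of using a fixed delay; the admin user details are example values, not from the original script:
rs.initiate()
// wait until the freshly initiated node has actually become PRIMARY
while (!db.isMaster().ismaster) {
  sleep(1000)  // mongo shell helper, milliseconds
}
db.getSiblingDB("admin").createUser({
  user: "admin",
  pwd: "secret_password",
  roles: [{ role: "root", db: "admin" }]
})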
MongoDB will be deployed in a cluster of Compute Engine instances (also known as a MongoDB replica set). Each instance will use a boot disk and separate disk for database files.
Primary and master nodes are the nodes that can accept writes. MongoDB’s replication is “single-master:” only one node can accept write operations at a time.
Secondary and slave nodes are read-only nodes that replicate from the primary.
Your error message looks like you are trying to add the user on the secondary. Try adding the user in the primary.
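If you are not sure which member that is, the shell can tell you (the host:port shown is just an example):
db.isMaster().primary   // e.g. "mongo-1:27017", the member to connect to for createUser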
I ran into this issue when I thought I was running mongo 3.4 but it was mongo 3.6. Uninstalling 3.6 and installing 3.4 fixed my issue.
I have a simple replica set configured as follows:
mongo1 (primary)
mongo2 (secondary)
mongo3 (arbiter)
It functioned correctly for around a month and then we started seeing intermittent exceptions along the lines of:
Moped::Errors::ReplicaSetReconfigured: The operation: #<Moped::Protocol::Command
#length=179 #request_id=1400 #response...>{:order=>"SwimSet"}, :update=>{"$inc"=>
{:next=>1}}, :new=>true, :upsert=>true} #fields=nil> failed with error "not master"
The key bit being "failed with error not master". This happens sporadically when trying to write to a collection. It is not during or immediately after a failover. Shutting the secondary down but leaving the arbiter running resolves the error, but leaves us without any redundancy.
What we've tried:
Rebuilding the secondary and re-adding it to the cluster
Failing over to the newly built node, then rebuilding the old primary
Upgrading to Mongo 2.6.4
Current Versions:
Mongo Server: 2.6.4
Mongoid: 3.1.6
Moped: 1.5.2
Any suggestions very much appreciated, as I've been battling with this on and off for nearly a month now.
I have a standalone Mongo instance running a replica set. I can't seem to connect and run any queries in the Mongo shell however. I get the following:
error: { "$err" : "not master and slaveOk=false", "code" : 13435 }
I set SlaveOk like so:
db.getMongo().setSlaveOk()
..but I still get an error:
error: {
"$err" : "not master or secondary; cannot currently read from this replSet member",
"code" : 13436
}
I can't seem to find a straight answer on Google: how do I connect to my replica set using the mongo shell?
I got the same problem and solved it using
rs.initiate()
If you are connecting to a node in a replica set that is not the master, you need to explicitly tell the client that it is OK that you are not connected to a master.
You can do this by calling
rs.slaveOk()
You can then perform your query.
Note that you will only be able to perform queries, not make changes to the data, when connected to a slave node.
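A short example of the whole sequence in the shell, with a made-up collection name:
rs.slaveOk()                       // allow reads on this secondary for this connection
db.mycollection.find()             // reads now succeed
db.mycollection.insert({ x: 1 })   // writes still fail with "not master"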
You are connected to a node that is in neither the SECONDARY nor the PRIMARY state. This node could be an arbiter, or possibly a secondary in recovery mode. For example, if I had a replica set of 3 nodes (one primary, one secondary and one arbiter), I would get the same error if I connected to the arbiter and issued a query, even after setting slaveOk to true. The shell's command line prompt should indicate what state the node you are connected to is in:
foo:ARBITER> db.test.find()
error: {
"$err" : "not master or secondary; cannot currently read from this replSet member",
"code" : 13436
}
Have you tried: db.getMongo().setSlaveOk(true)
I also got the error, but when I tried connecting to the secondary node using the machine name instead of 'localhost' or 127.0.0.1, the error went away.
This error is only displayed when you are running an instance that's part of a replica set in standalone mode without completely removing it from the replica set.
e.g. you restart your instance on a different port but don't remove the --replSet option when starting it.
This starts it, but as neither a primary nor a secondary, hence the error:
not master or secondary;
Depending on what you intended to do initially, either restart the instance on the correct port and with the correct --replSet option; this adds it back to the replica set and gets rid of this error.
Or, if you intended to run the instance as standalone for some time (say, to create an index), start it on a different port WITHOUT the --replSet option.
I got the same error while running aggregate() on a staging server with two replica sets. I think you need to change the read preference to 'secondaryPreferred'.
Just put .read('secondaryPreferred') after the query function.
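For what it's worth, the rough shell-side equivalent when testing the same read preference (the collection name is made up; .read(...) itself is the Mongoid query syntax):
db.mycollection.find().readPref("secondaryPreferred")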