Our application stopped connecting to the secondaries of our MongoDB replica set. We have the read preference set to secondary.
mongodb://db0.example.com,db1.example.com,db2.example.com/?replicaSet=myRepl&readPreference=secondary&maxStalenessSeconds=120
Connections always go to the primary, overloading the primary node. This issue started after the servers were patched and restarted.
I tried connecting with the mongo shell using the URI above, but the command was abruptly terminated. I can still see the process for that connection attempt on the server with ps -ef|grep mongo.
Has anyone faced this issue? Any troubleshooting tips are appreciated. The logs aren't showing anything related to the terminated/stopped connection process.
We were able to fix the issue. It was an issue on the spring boot side. When the right bean (we have two beans - one for primary and one for secondary connections) was injected, the connection was established to the secondary node for heavy reading and reporting purposes.
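A wiring mistake like this can be caught early by logging, at startup, which read preference each client's connection string actually asks for. The following is only a minimal sketch in plain Java: `MongoUriOptions` is a hypothetical helper, not part of Spring or the MongoDB driver, and it does simple string parsing only (no URL decoding, no validation).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not a driver API: reads the options after '?' in a
// MongoDB connection string so the effective read preference can be logged.
final class MongoUriOptions {

    // Returns the options as a map with lower-cased keys.
    static Map<String, String> parse(String uri) {
        Map<String, String> options = new LinkedHashMap<>();
        int q = uri.indexOf('?');
        if (q < 0 || q == uri.length() - 1) {
            return options; // no options present
        }
        for (String pair : uri.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                String key = pair.substring(0, eq).toLowerCase(Locale.ROOT);
                options.put(key, pair.substring(eq + 1));
            }
        }
        return options;
    }

    // MongoDB reads from the primary when no read preference is given.
    static String readPreference(String uri) {
        return parse(uri).getOrDefault("readpreference", "primary");
    }
}
```

Logging `MongoUriOptions.readPreference(uri)` next to the bean name for both the primary and the secondary bean would show immediately if the reporting code had been given the primary one.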
I'm fairly new to MongoDB, just a couple of months in. I just converted my MongoDB database to a replica set so I can watch collections. I only added one secondary, which I'm now guessing may not be best after reading that you should have an odd number of members, but it is all on localhost on one machine. I went through the instructions and had replication working fine for half a day while running my programs. But for some reason the member on port 27017 has recently switched from primary to secondary. The primary was previously on localhost:27017 and the secondary on localhost:27027. Now my normal program can't connect to localhost:27017 without an error, which I believe is because it is now a secondary when it was the primary before, assuming you can only connect to a primary. Here is the error message.
Exception in thread "main" com.mongodb.MongoNotPrimaryException: Command failed with error 10107 (NotWritablePrimary): 'not master' on server localhost:27017.
1. I'm perplexed why MongoDB switched the replica set primary in the first place. I doubt an error occurred, although it is certainly possible; I haven't had a single "localhost" error in months of development.
2. For now, how would I switch 27017 back to being the primary so my existing programs can function again?
3. Eventually, in production, what is the best way to handle this, assuming a lookup of a DNS entry that resolves to an IP address, when the primary suddenly changes because of a failover?
4. Given that question 3 is a bit more involved, is there something I can do in my development environment to better simulate a production environment?
I use StackOverflow extensively but this is my first post, so thanks to anyone who can provide advice.
Without knowing more about the replica set configuration and the circumstances of the switchover, I'm not sure anyone could confidently answer question 1, but it may not be important compared to question 3.
When you want to manually switch the primary you can manipulate the priority settings:
https://docs.mongodb.com/manual/tutorial/force-member-to-be-primary/
Or run manual commands to freeze or step down the current primary:
https://docs.mongodb.com/manual/tutorial/force-member-to-be-primary/#force-a-member-to-be-primary-using-database-commands
The safest option is to ensure your application is aware of all members of the replica set. Then, when something unexpected like this happens, the application will fail over to a writable member without any issues.
https://mongodb.github.io/mongo-java-driver/3.4/driver/tutorials/connect-to-mongodb/#connect-to-a-replica-set
I can only suggest setting up some VMs or containers as replica set members to better represent a production environment.
https://hub.docker.com/_/mongo
I was able to solve the problem by using a connection string that includes both replica set members, which I was unaware I needed to do. For example, in Java:
mongoClient = MongoClients.create("mongodb://localhost:27017,localhost:27027");
This also worked for MongoDB Compass, so I was able to connect to the secondary database. I didn't know you needed to provide the addresses of all replica set members when connecting, but in retrospect it makes good sense in case one of them is down.
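For anyone wanting to keep the member list in configuration rather than hard-coding it, here is a small sketch that assembles such a seed-list connection string. `SeedList` is a hypothetical helper and the replica set name `rs0` used below is an assumption; use whatever name the set was actually initiated with.

```java
import java.util.List;

// Hypothetical helper: joins replica set members into one connection string.
final class SeedList {

    // members are "host:port" entries; replicaSetName must match the
    // name the replica set was initiated with.
    static String connectionString(List<String> members, String replicaSetName) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("at least one member is required");
        }
        return "mongodb://" + String.join(",", members)
                + "/?replicaSet=" + replicaSetName;
    }
}
```

Passing `SeedList.connectionString(List.of("localhost:27017", "localhost:27027"), "rs0")` to `MongoClients.create(...)` should behave like the hand-written string above, with the replica set name made explicit.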
If you need a replica set for testing, you can create a single-node RS. Follow the instructions for creating a RS but only add one node.
I am running a couple of spring boot apps in Kubernetes. Each app is using spring JPA to connect to a Postgresql 10.6 database. What I have noticed is when the pods are killed unexpectedly the connections to the database are not released.
Running SELECT sum(numbackends) FROM pg_stat_database; on the database returns, let's say, 50. After killing a couple of pods running the Spring app and rerunning the query, this number jumps to 60. Eventually this causes the number of connections to PostgreSQL to exceed the maximum and prevents the restarted pods' applications from connecting to the database.
I have experimented with the PostgreSQL option idle_in_transaction_session_timeout, setting it to 15s, but this does not drop the old connections and the number keeps increasing.
I am using the com.zaxxer.hikari.HikariDataSource datasource for my Spring apps and was wondering if there is a way to prevent this from happening, either on the PostgreSQL side or on the Spring Boot side.
Any advice or suggestions are welcome.
Spring boot version: 2.0.3.RELEASE
Java version: 1.8
Postgresql version 10.6
This issue can arise not only with Kubernetes pods but also with a simple application running on a server that is killed forcibly (like kill -9 pid on Linux) and not given the opportunity to clean up via Spring Boot's shutdown hook. I think in this case nothing on the application side can help. You can, however, try cleaning up inactive connections on the database side by several methods, as mentioned here.
How do you set up MongoDB replication in production environments? I started using CloudFormation with this template, but it crashes halfway. I want to set up Mongo so that it has one primary and two secondaries.
I haven't found a good tutorial on how to set up Mongo replication.
Some other questions I have are:
How does failover work? If I have three EC2 instances, each running mongo, and the primary fails, another instance becomes the primary, but how do my clients (PyMongo and Scala Mongo) know the IP address of the new primary?
Let's say the primary goes down for 1 hour and there are 2,000 writes. When it comes back up, how does it get updated? Do I need a script for this?
I am trying to do this with flask PyMongo
I ended up testing this on my local machine; here is what I found.
From the client's point of view, failover is handled by the driver: in the Mongo URI you specify all your replica set members, and when PyMongo connects it checks which one is the primary and writes to that one.
When the failed database comes back up, the members sync so that they all hold the same records.
Readthedocs has a step-by-step manual on setting up a MongoDB cluster on different platforms, including AWS EC2:
https://mongodb-documentation.readthedocs.io/en/latest/ecosystem/tutorial/install-mongodb-on-amazon-ec2.html#deploy-a-multi-node-replica-set
To provide your clients with working mongo instance you can employ several different strategies. For example:
Set up Route53 failover. Route53 will monitor the health of the primary node's instance and change the DNS record to point to a secondary in case of failure.
Use service discovery. Consul, etcd, ZooKeeper and doozerd are worth exploring.
If a MongoDB node fails and then comes back, it will receive the latest data from the other nodes; that's just what a replica set does.
Our MongoDB setup uses three replica set shards. Each webserver runs a mongos instance locally, and the client node.js processes connect through that using Mongoose (3.6.20) and node-mongodb-native. So node-mongodb-native just connects to mongos on localhost.
When a replica set primary goes down hard (we can simulate this by doing 'ifdown eth0' on the primary) mongos properly detects this, and also detects that a new primary has been elected. So far, so good. But node-mongodb-native's connections to the mongos instance are still open but not functional, and a restart of the node procs is required.
Our assumption was that mongos would just kill any established connections to the dead primary and node-mongodb-native would reconnect, but that seems to not be the case; both the server and the OS think these connections are open. By contrast, on primary stepDown, the clients fail over fine, connections are closed and reopened.
We are looking at socketTimeoutMS, but that seems incorrect since it causes disconnects for connections that are merely idle.
Are we missing configuration to our client or mongos, or do we have to implement our own pinging?
Based on experimentation and the following MongoDB bug, this appears to just be a shortcoming of mongos (or, if you prefer, of the client libraries) at this point. Right now it looks like 'write your own pinging logic in your app and trigger a reconnect when that fails', so that's what we are doing.
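The "own pinging logic" idea can be sketched roughly as follows. Our app is Node.js, but the pattern is language-agnostic, so this is plain Java for illustration only. `ConnectionWatchdog`, its `ping` and `reconnect` callbacks and the failure threshold are all hypothetical names, not driver APIs; the ping callback is assumed to enforce its own timeout, since a call on a dead socket may otherwise hang.

```java
import java.util.function.BooleanSupplier;

// Hypothetical watchdog: triggers a reconnect after N consecutive failed pings.
final class ConnectionWatchdog {
    private final BooleanSupplier ping;   // returns true if the database answered
    private final Runnable reconnect;     // tears down and reopens the connection
    private final int maxFailures;
    private int consecutiveFailures = 0;

    ConnectionWatchdog(BooleanSupplier ping, Runnable reconnect, int maxFailures) {
        if (maxFailures < 1) {
            throw new IllegalArgumentException("maxFailures must be >= 1");
        }
        this.ping = ping;
        this.reconnect = reconnect;
        this.maxFailures = maxFailures;
    }

    // Runs one health check; returns true if a reconnect was triggered.
    boolean check() {
        boolean ok;
        try {
            ok = ping.getAsBoolean();
        } catch (RuntimeException e) {
            ok = false; // treat an exception from the ping as a failed ping
        }
        if (ok) {
            consecutiveFailures = 0;
            return false;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= maxFailures) {
            consecutiveFailures = 0;
            reconnect.run();
            return true;
        }
        return false;
    }
}
```

In practice `check()` would be called on a timer (for example from a `ScheduledExecutorService`). Requiring more than one consecutive failure avoids reconnecting on a single slow response.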
https://jira.mongodb.org/browse/SERVER-9041
I have no hands-on coding experience. I have a question about MongoDB replica sets.
Below is the situation:
I have an alert monitoring application.
It is using MongoDB with a replica set of 3 nodes.
The application's Java code base keeps connecting to the primary and performing transactions.
Now my question is:
If the primary server is down, how will it affect the application server?
I mean, would the app server write errors saying something like "connection failed"?
OR
Will the replica set automatically pick one of the slaves as the new master and let the application server carry on with its activity? How will that happen?
Thanks & Regards,
UDAY
The replica set will try to pick another server as the new primary. If you have three nodes and one goes down, the other two will negotiate which one becomes the new master. If two go down, or communication between the remaining members somehow breaks down, there will be no new master until the situation is recovered.
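The rule behind this is that a primary can only be elected while a majority of the voting members can reach each other. A minimal sketch of that arithmetic follows; `Quorum` is a hypothetical illustration, not MongoDB code.

```java
// Hypothetical illustration of the majority rule for elections.
final class Quorum {

    // Smallest number of members that forms a strict majority.
    static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    // True if the members that can still reach each other may elect a primary.
    static boolean canElectPrimary(int votingMembers, int reachable) {
        return reachable >= majority(votingMembers);
    }
}
```

With three members, `canElectPrimary(3, 2)` is true and `canElectPrimary(3, 1)` is false. With only two members, the majority is also two, so losing either one leaves no primary, which is why an odd number of members is usually recommended.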
The official drivers support this automatic fail-over, as does the mongos routing server if you use it. So the application code does not need to do anything here.
I am not sure if there will be connection errors during the brief period of time this fail-over negotiation takes (you will probably get errors for a few seconds).
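If errors during that short election window are a concern, one common approach is to retry the failed operation a few times with a short pause. Below is a rough sketch of that idea, not something the driver requires: `FailoverRetry` is a hypothetical helper, and catching every `RuntimeException` is a simplification. Real code would retry only on the driver's transient or not-primary errors, and only for operations that are safe to repeat.

```java
import java.util.function.Supplier;

// Hypothetical helper: retries an operation a fixed number of times.
final class FailoverRetry {

    static <T> T withRetry(Supplier<T> operation, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while retrying", ie);
                    }
                }
            }
        }
        throw last; // every attempt failed
    }
}
```

The attempts multiplied by the delay should comfortably cover the few seconds an election usually takes. Recent official drivers also offer built-in retryable writes, which may make a hand-rolled wrapper like this unnecessary.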