We have a MongoDB cluster with 2 shards; each shard has the following servers:
Shard 1: Master, running MongoD and Config server
Shard 1-s1: Slave, running MongoD and MongoS server
Shard 1-s2: Slave, running MongoD and MongoS and Arbiter server
Shard 2: Master, running MongoD and Config Server
Shard 2-s1: Slave, running MongoD and Config and MongoS server
Shard 2-s2: Slave, running MongoD and MongoS and Arbiter server
But MongoDB has kept failing in recent days. After days of searching, I found that the MongoD running on Shard 1 (Master) always goes down after receiving too many connections; the other MongoD instances don't have this problem.
Once the Shard 1 Master's MongoD has been running with too many connections for about 2 hours, the 4 MongoS servers shut down one by one. Here is the MongoS error log (10.81.4.72:7100 runs MongoD):
Tue Aug 20 20:01:52 [conn8526] DBClientCursor::init call() failed
Tue Aug 20 20:01:52 [conn3897] ns: user.dev could not initialize cursor across all shards because : stale config detected for ns: user.dev ParallelCursor::_init # s01/10.36.31.36:7100,10.42.50.24:7100,10.81.4.72:7100 attempt: 0
Tue Aug 20 20:01:52 [conn744] ns: user.dev could not initialize cursor across all shards because : stale config detected for ns: user.dev ParallelCursor::_init # s01/10.36.31.36:7100,10.42.50.24:7100,10.81.4.72:7100 attempt: 0
I don't know why this mongod received so many connections; the chunk distribution shows the sharding is working well.
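For reference, one way to see where that load sits is to check the connection counters on the overloaded mongod and the chunk counts per shard. This is only a generic sketch using standard shell commands, nothing specific to this cluster:
// In a mongo shell connected to the overloaded mongod (here 10.81.4.72:7100):
db.serverStatus().connections      // reports { current, available, totalCreated }
// In a mongo shell connected to a mongos, to confirm the chunks really are balanced:
use config
db.chunks.aggregate([{ $group: { _id: "$shard", chunks: { $sum: 1 } } }])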
Related
I am trying to create a sharded cluster on my local machine. When I start a MongoDB shard server from the bin directory with the command
start mongod --shardsvr --port 2005 -logpath C:\data\shard\s0\log\s0.log --dbpath C:\data\shard\s0
nothing happens, so I went on to create the mongos router with
mongos --port 2007 --configdb test/localhost:2001
Mongos was created successfully, and I then added the shard to mongos with sh.addShard("localhost:2005"), but this error appears:
"errmsg" : "failed to run command { isMaster: 1 } when attempting to add shard localhost:2005 :: caused by :: HostUnreachable: Error connecting to localhost:2005 (127.0.0.1:2005) :: caused by :: No connection could be made because the target machine actively refused it."
I don't know how to deal with this error.
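For context, the usual sequence on a recent MongoDB release (where the config server and each shard must run as replica sets) looks roughly like the sketch below; the names cfgrs/s0rs, the ports, and the paths are placeholders, not the setup from the question:
REM start a config server replica set member, then initiate it
mongod --configsvr --replSet cfgrs --port 2001 --dbpath C:\data\config --logpath C:\data\config\cfg.log
REM in a shell connected to localhost:2001:
REM   rs.initiate({ _id: "cfgrs", configsvr: true, members: [{ _id: 0, host: "localhost:2001" }] })
REM start the shard server and initiate its replica set
mongod --shardsvr --replSet s0rs --port 2005 --dbpath C:\data\shard\s0 --logpath C:\data\shard\s0\log\s0.log
REM in a shell connected to localhost:2005:
REM   rs.initiate({ _id: "s0rs", members: [{ _id: 0, host: "localhost:2005" }] })
REM start the router against the config replica set, then add the shard
mongos --port 2007 --configdb cfgrs/localhost:2001
REM in a shell connected to the mongos (localhost:2007):
REM   sh.addShard("s0rs/localhost:2005")
The HostUnreachable error itself only means that nothing is listening on port 2005, so whatever stopped the shard mongod from starting (for example, a missing dbpath or log directory) has to be fixed before sh.addShard can succeed.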
Here is the topology I have, running on self-hosted Ubuntu machines on AWS (EC2):
MongoDB version 5.0.3
Mongos - 1 server
Config Servers (3 servers)
Shards (3), each a ReplicaSet with 3 members, therefore 9 data nodes
For various reasons, which don't seem to be in the logs, all 3 members of Shard 3 went down. Restarting the processes on these 3 members was the most obvious step; however, this was the output:
ubuntu@XXXXXXXX:~$ mongosh
Current Mongosh Log ID: 619aa17d4fac3c77a109b11b
Connecting to: mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000
MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27017
Eventually, one member came back as SECONDARY, but restarting the mongod processes on the other members seemed to cascade and kill the first member, or at least so it appeared.
Following this guide, I entered mongosh on Shard 3, Member 1 (3-1) and then removed the other members from the ReplicaSet. After this, Member 1 refuses to start. The process logs show this:
ubuntu@XXXXXXX:~$ sudo systemctl status mongod
● mongod.service - MongoDB Database Server
Loaded: loaded (/lib/systemd/system/mongod.service; enabled; vendor preset: enabled)
Active: failed (Result: signal) since Sun 2021-11-21 20:02:31 UTC; 798ms ago
Docs: https://docs.mongodb.org/manual
Process: 1094 ExecStart=/usr/bin/mongod --config /etc/mongod.conf (code=killed, signal=ABRT)
Main PID: 1094 (code=killed, signal=ABRT)
Nov 21 20:02:26 ip-172-31-32-42 systemd[1]: Started MongoDB Database Server.
Nov 21 20:02:31 ip-172-31-32-42 systemd[1]: mongod.service: Main process exited, code=killed, sta>
Nov 21 20:02:31 ip-172-31-32-42 systemd[1]: mongod.service: Failed with result 'signal'.
Is it possible to get any of these members in RS3 back up?
Is it possible to restore the data that was sharded onto Shard 3? Of course this is only part of the data; Shard 1 and Shard 2 (and their ReplicaSets) are okay.
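For what it's worth, the member-removal step described above is usually a forced reconfig on the surviving member. A generic sketch follows; the host string is a placeholder, and this is not necessarily the exact procedure from the guide that was followed:
// In mongosh on member 3-1:
cfg = rs.conf()
cfg.members = cfg.members.filter(m => m.host === "<host-of-3-1>:27017")   // keep only the surviving member; placeholder host
rs.reconfig(cfg, { force: true })
rs.status()   // the surviving member should eventually report PRIMARY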
I want to use a sharded MongoDB for my project. I created a Helm chart https://github.com/b-rohit/mongodb-shard-chart to deploy it to a Kubernetes cluster.
I use a kind cluster running locally to test it. The config and shard servers are running properly, and I am able to execute commands in their mongo shells. The mongos server, however, is not able to connect to the config server replica set. I get the following error messages in mongos:
2020-04-17T13:33:31.579+0000 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: FailedToSatisfyReadPreference: Could not find host matching read preference { mode: "primary" } for set mongors1conf
2020-04-17T13:33:31.579+0000 W SHARDING [mongosMain] Error initializing sharding state, sleeping for 2 seconds and trying again :: caused by :: FailedToSatisfyReadPreference: Error loading clusterID :: caused by :: Could not find host matching read preference { mode: "nearest" } for set mongors1conf
2020-04-17T13:33:31.579+0000 I SHARDING [shard-registry-reload] Periodic reload of shard registry failed :: caused by :: FailedToSatisfyReadPreference: could not get updated shard list from config server :: caused by :: Could not find host matching read preference { mode: "nearest" } for set mongors1conf; will retry after 30s
The config server logs show the following:
2020-04-17T13:33:11.578+0000 I NETWORK [listener] connection accepted from 10.244.0.6:34400 #5 (1 connection now open)
2020-04-17T13:33:11.578+0000 I NETWORK [conn5] received client metadata from 10.244.0.6:34400 conn5: { driver: { name: "NetworkInterfaceTL", version: "4.2.5" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2020-04-17T13:33:11.589+0000 I ACCESS [conn5] Successfully authenticated as principal __system on local from client 10.244.0.6:34400
2020-04-17T13:33:38.197+0000 I SHARDING [replSetDistLockPinger] Marking collection config.lockpings as collection version: <unsharded>
2020-04-17T13:33:38.202+0000 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed: findAndModify query predicate didn't match any lock document
2020-04-17T13:44:39.743+0000 I CONTROL [LogicalSessionCacheRefresh] Failed to create config.system.sessions: Cannot create config.system.sessions until there are shards, will try again at the next refresh interval
2020-04-17T13:44:39.743+0000 I CONTROL [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Cannot create config.system.sessions until there are shards
2020-04-17T13:44:39.743+0000 I SH_REFR [ConfigServerCatalogCacheLoader-1] Refresh for collection config.system.sessions took 0 ms and found the collection is not sharded
2020-04-17T13:44:39.743+0000 I CONTROL [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: Collection config.system.sessions is not sharded.
2020-04-17T13:44:42.570+0000 I NETWORK [conn5] end connection 10.244.0.10:37922 (0 connections now open)
I am new to MongoDB, and it took a lot of time to put this chart together. I also checked other similar questions, such as "could not find host matching read preferences in mongodb", but I am not able to debug it further.
Your config server replica set is either:
not running (not all nodes are up),
not actually a replica set (replSetInitiate not executed, or it failed),
referenced incorrectly from the shard nodes (wrong host, IP, or replica set name), or
up and running but not reachable from your shards due to firewall rules.
Ensure you can access the replica set nodes from a mongo shell on the machines where the shard mongods are running.
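A minimal check along those lines, assuming the chart exposes the config servers through Kubernetes services (the service names below are placeholders; the set name mongors1conf is taken from the error message):
# from the mongos pod or a shard pod
mongo --host "mongors1conf/<cfg-svc-0>:27017,<cfg-svc-1>:27017,<cfg-svc-2>:27017" \
      --eval 'printjson(db.isMaster())'
# the connection should succeed and the output should list a "primary" for the set;
# a failure here points at one of the causes listed above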
I am new to MongoDB. I have configured 1 config server and 2 shards, and all are connected with the mongod client. MongoS is also up and running.
I am currently implementing sharding using MongoDB zones, as I have to store data according to country/zone, but I am facing the error below on the shards:
2018-05-03T16:22:40.200+0530 W SHARDING [conn13] Chunk move failed :: caused by :: NoShardingEnabled: Cannot accept sharding commands if not started with --shardsvr
2018-05-03T16:27:11.223+0530 I SHARDING [Balancer] Balancer move testdb.user: [{ zipcode: "380001" }, { zipcode: "39000" }), from rs1, to rs0 failed :: caused by :: NoShardingEnabled: Cannot accept sharding commands if not started with --shardsvr
I have already started the server with --shardsvr and also set it in the config file, but I am still facing the issue.
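For reference, the config-file equivalent of --shardsvr is the sharding.clusterRole setting, and it has to be present on every mongod of the shard replica set (followed by a restart of those processes). A sketch of the relevant mongod.conf section, with an example replica set name:
# mongod.conf (YAML)
sharding:
  clusterRole: shardsvr
replication:
  replSetName: rs1        # example name; must match the shard's actual replica set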
Please help me get this resolved.
Thanks in advance.
Ankit
I started one Mongos server with the following command:
nohup mongos --port 57017 --configdb configsvr1:47017,configsvr2:47017,configsvr3:47017 > s.nohup 2>&1 &
configsvr* are the configuration servers. My problem is that the Mongos instance always exits after a few hours.
The Mongos log shows the following:
W SHARDING [LockPinger] Error encountered while stopping ping on mongos_server:57017:1472436263:-1365656364 :: caused by :: 17382 Can't use connection pool during shutdown
I SHARDING [signalProcessingThread] dbexit: rc:0
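The question does not say which MongoDB version this is; if the config servers run as a replica set (CSRS, MongoDB 3.2+), mongos is normally pointed at them with the replica-set form of --configdb. A sketch, where the set name configReplSet is an assumption and not taken from the question:
nohup mongos --port 57017 \
      --configdb configReplSet/configsvr1:47017,configsvr2:47017,configsvr3:47017 \
      > s.nohup 2>&1 &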