MongoDB balancer has a big chunk difference - mongodb

I have a sharded MongoDB environment. Everything was fine, but recently I noticed that the shards have a big difference:
chunks:
ProductionShardC 939
ProductionShardB 986
ProductionShardA 855
edPrimaryShard 1204
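These totals are the per-shard chunk counts reported by sh.status(); as a sketch, an equivalent count can also be pulled straight from the config database:
use config
db.chunks.aggregate({ $group : { _id : "$shard", nChunks : { $sum : 1 } } })   // chunk count per shard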
The balancer is running, and I can also see it in the locks collection:
db.locks.find( { _id : "balancer" } ).pretty()
{
"_id" : "balancer",
"process" : "ip-10-0-0-100:27017:1371132087:1804289383",
"state" : 2,
"ts" : ObjectId("51e1e5d75e1777de5f007ea5"),
"when" : ISODate("2013-07-13T23:42:15.660Z"),
"who" : "ip-10-0-0-100:27017:1371132087:1804289383:Balancer:846930886",
"why" : "doing balance round"
}
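The balancer state can also be checked directly from a mongos shell; a minimal sketch using the standard sh helpers:
sh.getBalancerState()      // true if the balancer is enabled
sh.isBalancerRunning()     // true while a balancing round is in progress
sh.status()                // prints the full sharding status, including chunk counts per shard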
Here is /var/log/mongo/mongos.log from the mongos:
cat mongos.log
Sun Aug 4 15:33:29.859 [mongosMain] MongoS version 2.4.4 starting: pid=8520 port=27017 64-bit host=ip-10-0-0-100 (--help for usage)
Sun Aug 4 15:33:29.859 [mongosMain] git version: 4ec1fb96702c9d4c57b1e06dd34eb73a16e407d2
Sun Aug 4 15:33:29.859 [mongosMain] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Sun Aug 4 15:33:29.859 [mongosMain] options: { configdb: "10.0.1.200:27019,10.0.1.201:27019,10.0.1.202:27019", keyFile: "/media/Data/db/mongoKeyFile", logpath: "/var/log/mongo/mongos.log" }
Sun Aug 4 15:33:30.078 [mongosMain] SyncClusterConnection connecting to [10.0.1.200:27019]
Sun Aug 4 15:33:30.079 [mongosMain] SyncClusterConnection connecting to [10.0.1.201:27019]
Sun Aug 4 15:33:30.080 [mongosMain] SyncClusterConnection connecting to [10.0.1.202:27019]
Sun Aug 4 15:33:30.092 [mongosMain] SyncClusterConnection connecting to [10.0.1.200:27019]
Sun Aug 4 15:33:30.093 [mongosMain] SyncClusterConnection connecting to [10.0.1.201:27019]
Sun Aug 4 15:33:30.093 [mongosMain] SyncClusterConnection connecting to [10.0.1.202:27019]
Sun Aug 4 15:33:30.809 [mongosMain] waiting for connections on port 27017
Sun Aug 4 15:33:30.809 [Balancer] about to contact config servers and shards
Sun Aug 4 15:33:30.810 [websvr] admin web console waiting for connections on port 28017
Sun Aug 4 15:33:30.810 [Balancer] starting new replica set monitor for replica set edPrimaryShard with seed of 10.0.1.150:27017,10.0.1.151:27017,10.0.1.152:27017
Sun Aug 4 15:33:30.811 [Balancer] successfully connected to seed 10.0.1.150:27017 for replica set edPrimaryShard
Sun Aug 4 15:33:30.811 [Balancer] changing hosts to { 0: "10.0.1.150:27017", 1: "10.0.1.152:27017", 2: "10.0.1.151:27017" } from edPrimaryShard/
Sun Aug 4 15:33:30.811 [Balancer] trying to add new host 10.0.1.150:27017 to replica set edPrimaryShard
Sun Aug 4 15:33:30.812 [Balancer] successfully connected to new host 10.0.1.150:27017 in replica set edPrimaryShard
Sun Aug 4 15:33:30.812 [Balancer] trying to add new host 10.0.1.151:27017 to replica set edPrimaryShard
Sun Aug 4 15:33:30.813 [Balancer] successfully connected to new host 10.0.1.151:27017 in replica set edPrimaryShard
Sun Aug 4 15:33:30.813 [Balancer] trying to add new host 10.0.1.152:27017 to replica set edPrimaryShard
Sun Aug 4 15:33:30.813 [Balancer] successfully connected to new host 10.0.1.152:27017 in replica set edPrimaryShard
Sun Aug 4 15:33:31.013 [Balancer] Primary for replica set edPrimaryShard changed to 10.0.1.150:27017
Sun Aug 4 15:33:31.019 [Balancer] replica set monitor for replica set edPrimaryShard started, address is edPrimaryShard/10.0.1.150:27017,10.0.1.151:27017,10.0.1.152:27017
Sun Aug 4 15:33:31.019 [ReplicaSetMonitorWatcher] starting
Sun Aug 4 15:33:31.021 [Balancer] starting new replica set monitor for replica set ProductionShardA with seed of 10.0.1.160:27017,10.0.1.161:27017,10.0.1.162:27017
Sun Aug 4 15:33:31.021 [Balancer] successfully connected to seed 10.0.1.160:27017 for replica set ProductionShardA
Sun Aug 4 15:33:31.022 [Balancer] changing hosts to { 0: "10.0.1.160:27017", 1: "10.0.1.162:27017", 2: "10.0.1.161:27017" } from ProductionShardA/
Sun Aug 4 15:33:31.022 [Balancer] trying to add new host 10.0.1.160:27017 to replica set ProductionShardA
Sun Aug 4 15:33:31.022 [Balancer] successfully connected to new host 10.0.1.160:27017 in replica set ProductionShardA
Sun Aug 4 15:33:31.022 [Balancer] trying to add new host 10.0.1.161:27017 to replica set ProductionShardA
Sun Aug 4 15:33:31.023 [Balancer] successfully connected to new host 10.0.1.161:27017 in replica set ProductionShardA
Sun Aug 4 15:33:31.023 [Balancer] trying to add new host 10.0.1.162:27017 to replica set ProductionShardA
Sun Aug 4 15:33:31.024 [Balancer] successfully connected to new host 10.0.1.162:27017 in replica set ProductionShardA
Sun Aug 4 15:33:31.187 [Balancer] Primary for replica set ProductionShardA changed to 10.0.1.160:27017
Sun Aug 4 15:33:31.232 [Balancer] replica set monitor for replica set ProductionShardA started, address is ProductionShardA/10.0.1.160:27017,10.0.1.161:27017,10.0.1.162:27017
Sun Aug 4 15:33:31.234 [Balancer] starting new replica set monitor for replica set ProductionShardB with seed of 10.0.1.170:27017,10.0.1.171:27017,10.0.1.172:27017
Sun Aug 4 15:33:31.235 [Balancer] successfully connected to seed 10.0.1.170:27017 for replica set ProductionShardB
Sun Aug 4 15:33:31.237 [Balancer] changing hosts to { 0: "10.0.1.170:27017", 1: "10.0.1.172:27017", 2: "10.0.1.171:27017" } from ProductionShardB/
Sun Aug 4 15:33:31.237 [Balancer] trying to add new host 10.0.1.170:27017 to replica set ProductionShardB
Sun Aug 4 15:33:31.237 [Balancer] successfully connected to new host 10.0.1.170:27017 in replica set ProductionShardB
Sun Aug 4 15:33:31.237 [Balancer] trying to add new host 10.0.1.171:27017 to replica set ProductionShardB
Sun Aug 4 15:33:31.238 [Balancer] successfully connected to new host 10.0.1.171:27017 in replica set ProductionShardB
Sun Aug 4 15:33:31.238 [Balancer] trying to add new host 10.0.1.172:27017 to replica set ProductionShardB
Sun Aug 4 15:33:31.238 [Balancer] successfully connected to new host 10.0.1.172:27017 in replica set ProductionShardB
Sun Aug 4 15:33:31.361 [Balancer] Primary for replica set ProductionShardB changed to 10.0.1.170:27017
Sun Aug 4 15:33:31.379 [Balancer] replica set monitor for replica set ProductionShardB started, address is ProductionShardB/10.0.1.170:27017,10.0.1.171:27017,10.0.1.172:27017
Sun Aug 4 15:33:31.383 [Balancer] starting new replica set monitor for replica set ProductionShardC with seed of 10.0.1.180:27017,10.0.1.181:27017,10.0.1.182:27017
Sun Aug 4 15:33:31.383 [Balancer] successfully connected to seed 10.0.1.180:27017 for replica set ProductionShardC
Sun Aug 4 15:33:31.384 [Balancer] changing hosts to { 0: "10.0.1.180:27017", 1: "10.0.1.182:27017", 2: "10.0.1.181:27017" } from ProductionShardC/
Sun Aug 4 15:33:31.384 [Balancer] trying to add new host 10.0.1.180:27017 to replica set ProductionShardC
Sun Aug 4 15:33:31.385 [Balancer] successfully connected to new host 10.0.1.180:27017 in replica set ProductionShardC
Sun Aug 4 15:33:31.385 [Balancer] trying to add new host 10.0.1.181:27017 to replica set ProductionShardC
Sun Aug 4 15:33:31.385 [Balancer] successfully connected to new host 10.0.1.181:27017 in replica set ProductionShardC
Sun Aug 4 15:33:31.385 [Balancer] trying to add new host 10.0.1.182:27017 to replica set ProductionShardC
Sun Aug 4 15:33:31.386 [Balancer] successfully connected to new host 10.0.1.182:27017 in replica set ProductionShardC
Sun Aug 4 15:33:31.499 [Balancer] Primary for replica set ProductionShardC changed to 10.0.1.180:27017
Sun Aug 4 15:33:31.510 [Balancer] replica set monitor for replica set ProductionShardC started, address is ProductionShardC/10.0.1.180:27017,10.0.1.181:27017,10.0.1.182:27017
Sun Aug 4 15:33:31.513 [Balancer] config servers and shards contacted successfully
Sun Aug 4 15:33:31.513 [Balancer] balancer id: ip-10-0-0-100:27017 started at Aug 4 15:33:31
Sun Aug 4 15:33:31.513 [Balancer] SyncClusterConnection connecting to [10.0.1.200:27019]
Sun Aug 4 15:33:31.514 [Balancer] SyncClusterConnection connecting to [10.0.1.201:27019]
Sun Aug 4 15:33:31.514 [Balancer] SyncClusterConnection connecting to [10.0.1.202:27019]
Sun Aug 4 15:33:31.537 [LockPinger] creating distributed lock ping thread for 10.0.1.200:27019,10.0.1.201:27019,10.0.1.202:27019 and process ip-10-0-0-100:27017:1375619611:1804289383 (sleeping for 30000ms)
Sun Aug 4 15:33:35.777 [mongosMain] connection accepted from 84.108.44.142:50916 #1 (1 connection now open)
Sun Aug 4 15:33:35.963 [conn1] authenticate db: admin { authenticate: 1, user: "root", nonce: "50c90ba9496d0a2d", key: "52390c478fffe89d03b776dd14e7c0d6" }
Sun Aug 4 15:33:37.704 [conn1] ChunkManager: time to load chunks for profiles.devices: 104ms sequenceNumber: 2 version: 2898|1177||51bb0e3a5e1777de5ffbf898 based on: (empty)
Sun Aug 4 15:33:37.712 [conn1] ChunkManager: time to load chunks for profiles.user_devices: 4ms sequenceNumber: 3 version: 92|25||51bb10be5e1777de5ffbf8d5 based on: (empty)
Sun Aug 4 15:33:37.715 [conn1] creating WriteBackListener for: 10.0.1.150:27017 serverID: 51fe4a1a309fab9136fcd24a
Sun Aug 4 15:33:37.715 [conn1] creating WriteBackListener for: 10.0.1.151:27017 serverID: 51fe4a1a309fab9136fcd24a
Sun Aug 4 15:33:37.715 [conn1] creating WriteBackListener for: 10.0.1.152:27017 serverID: 51fe4a1a309fab9136fcd24a
Sun Aug 4 15:33:37.718 [conn1] creating WriteBackListener for: 10.0.1.160:27017 serverID: 51fe4a1a309fab9136fcd24a
Sun Aug 4 15:33:37.718 [conn1] creating WriteBackListener for: 10.0.1.161:27017 serverID: 51fe4a1a309fab9136fcd24a
Sun Aug 4 15:33:37.718 [conn1] creating WriteBackListener for: 10.0.1.162:27017 serverID: 51fe4a1a309fab9136fcd24a
Sun Aug 4 15:33:37.722 [conn1] creating WriteBackListener for: 10.0.1.170:27017 serverID: 51fe4a1a309fab9136fcd24a
Sun Aug 4 15:33:37.722 [conn1] creating WriteBackListener for: 10.0.1.171:27017 serverID: 51fe4a1a309fab9136fcd24a
Sun Aug 4 15:33:37.722 [conn1] creating WriteBackListener for: 10.0.1.172:27017 serverID: 51fe4a1a309fab9136fcd24a
Sun Aug 4 15:33:37.725 [conn1] creating WriteBackListener for: 10.0.1.180:27017 serverID: 51fe4a1a309fab9136fcd24a
Sun Aug 4 15:33:37.725 [conn1] creating WriteBackListener for: 10.0.1.181:27017 serverID: 51fe4a1a309fab9136fcd24a
Sun Aug 4 15:33:37.725 [conn1] creating WriteBackListener for: 10.0.1.182:27017 serverID: 51fe4a1a309fab9136fcd24a
Sun Aug 4 15:33:39.468 [conn1] warning: mongos collstats doesn't know about: systemFlags
Sun Aug 4 15:33:39.468 [conn1] warning: mongos collstats doesn't know about: userFlags
Sun Aug 4 15:33:39.469 [conn1] warning: mongos collstats doesn't know about: systemFlags
Sun Aug 4 15:33:39.469 [conn1] warning: mongos collstats doesn't know about: userFlags
Sun Aug 4 15:33:39.470 [conn1] warning: mongos collstats doesn't know about: systemFlags
Sun Aug 4 15:33:39.470 [conn1] warning: mongos collstats doesn't know about: userFlags
Sun Aug 4 15:33:39.470 [conn1] warning: mongos collstats doesn't know about: systemFlags
Sun Aug 4 15:33:39.470 [conn1] warning: mongos collstats doesn't know about: userFlags
Why is there such a big difference? One shard has 855 chunks and another 1204.
How can I fix it?

Related

Mongo Crashes Periodically

We have a 3-node replica set that periodically crashes and is unable to recover. Looking through our PRIMARY server's mongod.log file, I see multiple errors. I'm not sure where to begin or even what to include in this post, but I'll start with the errors I am receiving. If I'm missing something, please let me know and I'll edit the post to include it. Can anyone shed any light on why this is happening?
Thu Feb 27 14:09:47.790 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.790 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.790 [rsBackgroundSync] replSet syncing to: mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.791 [rsBackgroundSync] repl: couldn't connect to server mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.792 [conn152] end connection xx.xxx.xxx.107:43904 (71 connections now open)
Thu Feb 27 14:09:48.077 [rsHealthPoll] DBClientCursor::init call() failed
Thu Feb 27 14:09:48.077 [rsHealthPoll] replset info mongos2i.hostname.com:27017 heartbeat failed, retrying
Thu Feb 27 14:09:48.078 [rsHealthPoll] replSet info mongos2i.hostname.com:27017 is down (or slow to respond):
Thu Feb 27 14:09:48.078 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state DOWN
Thu Feb 27 14:09:48.080 [rsMgr] not electing self, mongos1i.hostname.com:27017 would veto with 'mongom1i.hostname.com:27017 is trying to elect itself but mongos2i.hostname.com:27017 is already primary and more up-to-date'
Thu Feb 27 14:09:49.079 [conn153] replSet info voting yea for mongos1i.hostname.com:27017 (1)
Thu Feb 27 14:09:50.080 [rsHealthPoll] replSet member mongos1i.hostname.com:27017 is now in state PRIMARY
Thu Feb 27 14:09:50.081 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is up
Thu Feb 27 14:09:50.082 [initandlisten] connection accepted from xx.xxx.xxx.107:43907 #154 (72 connections now open)
Thu Feb 27 14:09:50.082 [conn154] end connection xx.xxx.xxx.107:43907 (71 connections now open)
Thu Feb 27 14:09:50.086 [initandlisten] connection accepted from xx.xxx.xxx.107:43909 #155 (72 connections now open)
Thu Feb 27 14:09:50.792 [rsBackgroundSync] replSet syncing to: mongos1i.hostname.com:27017
Thu Feb 27 14:09:52.082 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state SECONDARY
Thu Feb 27 14:10:04.090 [conn155] end connection xx.xxx.xxx.107:43909 (71 connections now open)
Thu Feb 27 14:10:04.091 [initandlisten] connection accepted from xx.xxx.xxx.107:43913 #156 (72 connections now open)
Thu Feb 27 14:10:10.731 [conn153] end connection xx.xxx.xxx.97:52297 (71 connections now open)
Thu Feb 27 14:10:10.732 [initandlisten] connection accepted from xx.xxx.xxx.97:52302 #157 (72 connections now open)
Thu Feb 27 14:10:29.706 [initandlisten] connection accepted from 127.0.0.1:56436 #158 (73 connections now open)
Thu Feb 27 14:10:34.100 [conn156] end connection xx.xxx.xxx.107:43913 (72 connections now open)
Thu Feb 27 14:10:34.101 [initandlisten] connection accepted from xx.xxx.xxx.107:43916 #159 (73 connections now open)
Thu Feb 27 14:10:40.743 [conn157] end connection xx.xxx.xxx.97:52302 (72 connections now open)
Thu Feb 27 14:10:40.744 [initandlisten] connection accepted from xx.xxx.xxx.97:52309 #160 (73 connections now open)
Thu Feb 27 14:11:04.110 [conn159] end connection xx.xxx.xxx.107:43916 (72 connections now open)
Thu Feb 27 14:11:04.111 [initandlisten] connection accepted from xx.xxx.xxx.107:43918 #161 (73 connections now open)
Thu Feb 27 14:11:09.191 [conn161] end connection xx.xxx.xxx.107:43918 (72 connections now open)
Thu Feb 27 14:11:09.452 [initandlisten] connection accepted from xx.xxx.xxx.107:43919 #162 (73 connections now open)
Thu Feb 27 14:11:09.453 [conn162] end connection xx.xxx.xxx.107:43919 (72 connections now open)
Thu Feb 27 14:11:09.456 [initandlisten] connection accepted from xx.xxx.xxx.107:43921 #163 (73 connections now open)
Thu Feb 27 14:11:10.111 [rsHealthPoll] DBClientCursor::init call() failed
Thu Feb 27 14:11:10.111 [rsHealthPoll] replset info mongos2i.hostname.com:27017 heartbeat failed, retrying
Thu Feb 27 14:11:10.113 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state STARTUP2
Thu Feb 27 14:11:10.755 [conn160] end connection xx.xxx.xxx.97:52309 (72 connections now open)
Thu Feb 27 14:11:10.757 [initandlisten] connection accepted from xx.xxx.xxx.97:52311 #164 (73 connections now open)
Thu Feb 27 14:11:12.113 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state SECONDARY
Thu Feb 27 14:11:23.462 [conn163] end connection xx.xxx.xxx.107:43921 (72 connections now open)
Thu Feb 27 14:11:23.463 [initandlisten] connection accepted from xx.xxx.xxx.107:43925 #165 (73 connections now open)
Thu Feb 27 14:11:31.831 [conn158] end connection 127.0.0.1:56436 (72 connections now open)
Thu Feb 27 14:11:40.768 [conn164] end connection xx.xxx.xxx.97:52311 (71 connections now open)
Thu Feb 27 14:11:40.769 [initandlisten] connection accepted from xx.xxx.xxx.97:52315 #166 (72 connections now open)
Thu Feb 27 14:11:53.082 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
Thu Feb 27 14:11:53.082 [signalProcessingThread] now exiting
Thu Feb 27 14:11:53.082 dbexit:
We are using CentOS and Mongo 2.4.9.
Thanks in advance for the help.
The log output you have posted shows that your MongoDB instance did not crash. It exited normally.
Consider the following lines:
Thu Feb 27 14:11:53.082 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
Thu Feb 27 14:11:53.082 [signalProcessingThread] now exiting
Thu Feb 27 14:11:53.082 dbexit:
The first line above indicates that your MongoDB instance received signal 15 (SIGTERM) from your OS. This led to MongoDB terminating. SIGTERM is the default signal for the kill command and for the stop action of an init script in most Linux distros.

Can't initialise replica set on Debian (open/create failed in createPrivateMap)

I am trying to set up MongoDB on my new virtual server running Debian 7.3. If I try to configure the replica set with
hosts = {
"_id" : "rs0",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "localhost:27017"
}
]
}
rs.initiate(hosts);
MongoDB crashes with the following exceptions:
Tue Jan 21 00:10:24.599 [initandlisten] MongoDB starting : pid=3616 port=27017 dbpath=/var/lib/mongodb 64-bit host=lvps176-28-17-95.dedicated.hosteurope.de
Tue Jan 21 00:10:24.599 [initandlisten]
Tue Jan 21 00:10:24.600 [initandlisten] ** WARNING: You are running in OpenVZ. This is known to be broken!!!
Tue Jan 21 00:10:24.600 [initandlisten]
Tue Jan 21 00:10:24.600 [initandlisten] db version v2.4.7
Tue Jan 21 00:10:24.600 [initandlisten] git version: 0161738abf06c1f067b56a465b706efd6f4bf2aa
Tue Jan 21 00:10:24.600 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Tue Jan 21 00:10:24.600 [initandlisten] allocator: tcmalloc
Tue Jan 21 00:10:24.600 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/var/lib/mongodb", logappend: "true", logpath: "/var/log/mongodb/mongodb.log", replSet: "rs0" }
Tue Jan 21 00:10:24.609 [initandlisten] journal dir=/var/lib/mongodb/journal
Tue Jan 21 00:10:24.609 [initandlisten] recover : no journal files present, no recovery needed
Tue Jan 21 00:10:24.740 [initandlisten] preallocateIsFaster=true 2.38
Tue Jan 21 00:10:24.780 [initandlisten] waiting for connections on port 27017
Tue Jan 21 00:10:24.780 [websvr] admin web console waiting for connections on port 28017
Tue Jan 21 00:10:24.786 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Tue Jan 21 00:10:24.786 [rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done
Tue Jan 21 00:10:27.429 [initandlisten] connection accepted from 127.0.0.1:50602 #1 (1 connection now open)
Tue Jan 21 00:10:34.786 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Tue Jan 21 00:10:40.160 [conn1] replSet replSetInitiate admin command received from client
Tue Jan 21 00:10:40.163 [conn1] replSet replSetInitiate config object parses ok, 1 members specified
Tue Jan 21 00:10:40.164 [conn1] replSet replSetInitiate all members seem up
Tue Jan 21 00:10:40.164 [conn1] ******
Tue Jan 21 00:10:40.164 [conn1] creating replication oplog of size: 24630MB...
Tue Jan 21 00:10:40.165 [FileAllocator] allocating new datafile /var/lib/mongodb/local.1, filling with zeroes...
Tue Jan 21 00:10:40.165 [FileAllocator] creating directory /var/lib/mongodb/_tmp
Tue Jan 21 00:10:40.205 [FileAllocator] done allocating datafile /var/lib/mongodb/local.1, size: 2047MB, took 0.036 secs
Tue Jan 21 00:10:40.206 [FileAllocator] allocating new datafile /var/lib/mongodb/local.2, filling with zeroes...
Tue Jan 21 00:10:40.233 [FileAllocator] done allocating datafile /var/lib/mongodb/local.2, size: 2047MB, took 0.027 secs
Tue Jan 21 00:10:40.234 [FileAllocator] allocating new datafile /var/lib/mongodb/local.3, filling with zeroes...
Tue Jan 21 00:10:40.255 [FileAllocator] done allocating datafile /var/lib/mongodb/local.3, size: 2047MB, took 0.02 secs
Tue Jan 21 00:10:40.256 [FileAllocator] allocating new datafile /var/lib/mongodb/local.4, filling with zeroes...
Tue Jan 21 00:10:40.275 [FileAllocator] done allocating datafile /var/lib/mongodb/local.4, size: 2047MB, took 0.019 secs
Tue Jan 21 00:10:40.276 [FileAllocator] allocating new datafile /var/lib/mongodb/local.5, filling with zeroes...
Tue Jan 21 00:10:40.355 [FileAllocator] done allocating datafile /var/lib/mongodb/local.5, size: 2047MB, took 0.079 secs
Tue Jan 21 00:10:40.356 [FileAllocator] allocating new datafile /var/lib/mongodb/local.6, filling with zeroes...
Tue Jan 21 00:10:40.372 [FileAllocator] done allocating datafile /var/lib/mongodb/local.6, size: 2047MB, took 0.014 secs
Tue Jan 21 00:10:40.372 [FileAllocator] allocating new datafile /var/lib/mongodb/local.7, filling with zeroes...
Tue Jan 21 00:10:40.498 [FileAllocator] done allocating datafile /var/lib/mongodb/local.7, size: 2047MB, took 0.121 secs
Tue Jan 21 00:10:40.499 [FileAllocator] allocating new datafile /var/lib/mongodb/local.8, filling with zeroes...
Tue Jan 21 00:10:40.546 [FileAllocator] done allocating datafile /var/lib/mongodb/local.8, size: 2047MB, took 0.046 secs
Tue Jan 21 00:10:40.546 [conn1] ERROR: mmap private failed with out of memory. (64 bit build)
Tue Jan 21 00:10:40.546 [conn1] Assertion: 13636:file /var/lib/mongodb/local.8 open/create failed in createPrivateMap (look in log for more information)
0xde0151 0xda188b 0xda1dcc 0xa5a63b 0xa5af9a 0xaba3b1 0x8d518d 0x8d5698 0x8d577f 0x8d5a1e 0xabbb00 0xac1429 0xa75908 0xc10af1 0x8dd4da 0x8de04d 0x8df582 0xa81f00 0xa867cc 0x9fa469
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde0151]
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xda188b]
/usr/bin/mongod() [0xda1dcc]
/usr/bin/mongod(_ZN5mongo8MongoMMF13finishOpeningEv+0x1fb) [0xa5a63b]
/usr/bin/mongod(_ZN5mongo8MongoMMF6createERKSsRyb+0x5a) [0xa5af9a]
/usr/bin/mongod(_ZN5mongo13MongoDataFile4openEPKcib+0x141) [0xaba3b1]
/usr/bin/mongod(_ZN5mongo8Database7getFileEiib+0xbd) [0x8d518d]
/usr/bin/mongod(_ZN5mongo8Database8addAFileEib+0x38) [0x8d5698]
/usr/bin/mongod(_ZN5mongo8Database12suitableFileEPKcibb+0xaf) [0x8d577f]
/usr/bin/mongod(_ZN5mongo8Database11allocExtentEPKcibb+0x9e) [0x8d5a1e]
/usr/bin/mongod(_ZN5mongo13_userCreateNSEPKcRKNS_7BSONObjERSsPb+0x7a0) [0xabbb00]
/usr/bin/mongod(_ZN5mongo12userCreateNSEPKcNS_7BSONObjERSsbPb+0x2b9) [0xac1429]
/usr/bin/mongod(_ZN5mongo11createOplogEv+0xa78) [0xa75908]
/usr/bin/mongod(_ZN5mongo18CmdReplSetInitiate3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x1da1) [0xc10af1]
/usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0x8dd4da]
/usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x71d) [0x8de04d]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x5f2) [0x8df582]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x40) [0xa81f00]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd7c) [0xa867cc]
/usr/bin/mongod() [0x9fa469]
Tue Jan 21 00:10:40.563 [conn1] replSet replSetInitiate exception: file /var/lib/mongodb/local.8 open/create failed in createPrivateMap (look in log for more information)
Tue Jan 21 00:10:40.563 [conn1] command admin.$cmd command: { replSetInitiate: { _id: "rs0", version: 1.0, members: [ { _id: 0.0, host: "localhost:27017" } ] } } ntoreturn:1 keyUpdates:0 locks(micros) W:401$
Tue Jan 21 00:10:44.787 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Tue Jan 21 00:10:53.835 [conn1] replSet replSetInitiate admin command received from client
Tue Jan 21 00:10:53.835 [conn1] replSet replSetInitiate config object parses ok, 1 members specified
Tue Jan 21 00:10:53.835 [conn1] replSet replSetInitiate all members seem up
Tue Jan 21 00:10:53.835 [conn1] replSet info saving a newer config version to local.system.replset
Tue Jan 21 00:10:54.564 Invalid access at address: 0x18 from thread: conn1
Tue Jan 21 00:10:54.564 Got signal: 11 (Segmentation fault).
Why does MongoDB try to create 8 files of 2 GB each? My machine runs with 16 GB of RAM; may that be the problem? The error "[conn1] ERROR: mmap private failed with out of memory. (64 bit build)" makes it look like my machine ran out of memory. But I only created a replica set with one member and an empty database. Does somebody know this bug?
It's not RAM but disk space that is the problem, and it's not a bug either.
Tue Jan 21 00:10:40.164 [conn1] creating replication oplog of size: 24630MB...
From MongoDB docs:
The oplog (operations log) is a special capped collection that keeps a
rolling record of all operations that modify the data stored in your
databases. MongoDB applies database operations on the primary and then
records the operations on the primary’s oplog. The secondary members
then copy and apply these operations in an asynchronous process. All
replica set members contain a copy of the oplog, allowing them to
maintain the current state of the database.
For 64-bit Linux, Solaris, FreeBSD, and Windows systems, MongoDB
allocates 5% of the available free disk space to the oplog. If this
amount is smaller than a gigabyte, then MongoDB allocates 1 gigabyte
of space.
Above via http://docs.mongodb.org/manual/core/replica-set-oplog/
The oplog is needed for replication (it's a capped collection of a fixed size) and is created automatically when you create a replica set. oplogSize can be set via the configuration options (if you would just like to experiment with the setup and can't free up more disk space).
Here's a doc on it: http://docs.mongodb.org/manual/reference/configuration-options/#oplogSize
However:
Once the mongod has created the oplog for the first time, changing
oplogSize will not affect the size of the oplog.
via http://docs.mongodb.org/manual/reference/configuration-options/#oplogSize
If you would like to change oplogSize after it has been already created you could use this tutorial: http://docs.mongodb.org/manual/tutorial/change-oplog-size/
However, if this is your "playground" installation, it is better to delete the contents of your old MongoDB data directory (/var/lib/mongodb), change the config file /etc/mongodb.conf (or pass the --oplogSize param to mongod when it starts), and have a "fresh start" with a smaller oplog, or point your MongoDB dbpath to a place where it has more disk space.
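As a quick sanity check after such a restart, the effective oplog size can be read back from the mongo shell; a minimal sketch:
db.printReplicationInfo()   // prints the configured oplog size and the time window the oplog currently covers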

mongodb replica set unreachable

I am trying to configure a standalone mongodb replica set with 3 instances. I seem to have gotten into a funky state. Two of my instances went down, and I was left with all secondary nodes. I tried to follow this: http://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/
I got this error though:
rs0:SECONDARY> rs.reconfig(cfg, {force : true})
{
"errmsg" : "exception: need most members up to reconfigure, not ok : obfuscated_hostname:27019",
"code" : 13144,
"ok" : 0
}
When I look at the logs I see this:
Fri Aug 2 20:45:11.895 [initandlisten] options: { config: "/etc/mongodb1.conf",
dbpath: "/var/lib/mongodb1", logappend: "true", logpath: "/var/log/mongodb/mongodb1.log",
port: 27018, replSet: "rs0" }
Fri Aug 2 20:45:11.897 [initandlisten] journal dir=/var/lib/mongodb1/journal
Fri Aug 2 20:45:11.897 [initandlisten] recover begin
Fri Aug 2 20:45:11.897 [initandlisten] recover lsn: 0
Fri Aug 2 20:45:11.897 [initandlisten] recover /var/lib/mongodb1/journal/j._0
Fri Aug 2 20:45:11.899 [initandlisten] recover cleaning up
Fri Aug 2 20:45:11.899 [initandlisten] removeJournalFiles
Fri Aug 2 20:45:11.899 [initandlisten] recover done
Fri Aug 2 20:45:11.923 [initandlisten] waiting for connections on port 27018
Fri Aug 2 20:45:11.925 [websvr] admin web console waiting for connections on port 28018
Fri Aug 2 20:45:11.927 [rsStart] replSet I am hostname_obfuscated:27018
Fri Aug 2 20:45:11.927 [rsStart] replSet STARTUP2
Fri Aug 2 20:45:11.929 [rsHealthPoll] replset info hostname_obf:27017 thinks that we are down
Fri Aug 2 20:45:11.929 [rsHealthPoll] replSet member hostname_obf:27017 is up
Fri Aug 2 20:45:11.929 [rsHealthPoll] replSet member hostname_obf:27017 is now in state SECONDARY
Fri Aug 2 20:45:12.587 [initandlisten] connection accepted from ip_obf:52446 #1 (1 connection now open)
Fri Aug 2 20:45:12.587 [initandlisten] connection accepted from ip_obf:52447 #2 (2 connections now open)
Fri Aug 2 20:45:12.588 [conn1] end connection ip_obf:52446 (1 connection now open)
Fri Aug 2 20:45:12.928 [rsSync] replSet SECONDARY
I'm unable to connect to the mongo instances, even though the logs say they are up and running. Any ideas on what to do here?
You did not mention which version of mongodb you are using, but I assume it is post-2.0.
I think the problem with your forced reconfiguration is that after this reconfiguration, you still need to have the minimum number of nodes for a functioning replica set, i.e. 3. But since you originally had 3 members and lost 2, there is no way you could turn that single surviving node into a functioning replica set.
Your only option for recovery would be to bring up the surviving node as a stand-alone server, backup the database, and then create a new 3-node replica set with that data.
Yes, you can promote a single surviving secondary to primary if that secondary server is running fine. Follow the simple steps below:
Step 1: Connect to the member and check the current configuration.
rs.conf()
Step 2: Save the current configuration to another variable.
x = rs.conf()
Step 3: Select the _id, host and port of the member that is to be made primary.
x.members = [{"_id":1,"host" : "localhost.localdomain:27017"}]
Step 4: Reconfigure the new replica set by force.
rs.reconfig(x, {force:true})
Now the desired member will be promoted as the primary.
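Put together, the sequence run against the surviving member looks roughly like this (the _id, hostname and port are placeholders for your own member):
x = rs.conf()
x.members = [ { "_id" : 1, "host" : "localhost.localdomain:27017" } ]
rs.reconfig(x, { force : true })
rs.status()    // the remaining member should report itself as PRIMARY shortly afterwards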

Mongodb gives error during startup

Whenever I try to play with mongo's interactive shell, it dies:
somekittens#DLserver01:~$ mongo
MongoDB shell version: 2.2.2
connecting to: test
Mon Dec 17 13:14:16 DBClientCursor::init call() failed
Mon Dec 17 13:14:16 Error: Error during mongo startup. :: caused by :: 10276 DBClientBase::findN: transport error: 127.0.0.1:27017 ns: admin.$cmd query: { whatsmyuri: 1 } src/mongo/shell/mongo.js:91
exception: connect failed
I'm able to repair the install (deleting mongodb.lock, etc) and get back to this point, but it'll only die again.
/var/log/mongodb/mongodb.log
Mon Dec 17 13:14:03
Mon Dec 17 13:14:03 warning: 32-bit servers don't have journaling enabled by default. Please use --journal if you want durability.
Mon Dec 17 13:14:03
Mon Dec 17 13:14:03 [initandlisten] MongoDB starting : pid=2674 port=27017 dbpath=/var/lib/mongodb 32-bit host=DLserver01
Mon Dec 17 13:14:03 [initandlisten]
Mon Dec 17 13:14:03 [initandlisten] ** NOTE: when using MongoDB 32 bit, you are limited to about 2 gigabytes of data
Mon Dec 17 13:14:03 [initandlisten] ** see http://blog.mongodb.org/post/137788967/32-bit-limitations
Mon Dec 17 13:14:03 [initandlisten] ** with --journal, the limit is lower
Mon Dec 17 13:14:03 [initandlisten]
Mon Dec 17 13:14:03 [initandlisten] db version v2.2.2, pdfile version 4.5
Mon Dec 17 13:14:03 [initandlisten] git version: d1b43b61a5308c4ad0679d34b262c5af9d664267
Mon Dec 17 13:14:03 [initandlisten] build info: Linux domU-12-31-39-01-70-B4 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686 BOOST_LIB_VERSION=1_49
Mon Dec 17 13:14:03 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/var/lib/mongodb", logappend: "true", logpath: "/var/log/mongodb/mongodb.log" }
Mon Dec 17 13:14:03 [initandlisten] Unable to check for journal files due to: boost::filesystem::basic_directory_iterator constructor: No such file or directory: "/var/lib/mongodb/journal"
Mon Dec 17 13:14:03 [initandlisten] couldn't unlink socket file /tmp/mongodb-27017.sockerrno:1 Operation not permitted skipping
Mon Dec 17 13:14:03 [initandlisten] waiting for connections on port 27017
Mon Dec 17 13:14:03 [websvr] admin web console waiting for connections on port 28017
Mon Dec 17 13:14:16 [initandlisten] connection accepted from 127.0.0.1:57631 #1 (1 connection now open)
Mon Dec 17 13:14:16 Invalid operation at address: 0x819bb23 from thread: conn1
Mon Dec 17 13:14:16 Got signal: 4 (Illegal instruction).
Mon Dec 17 13:14:16 Backtrace:
0x8759eaa 0x817033a 0x81709ff 0x20e40c 0x819bb23 0x854cd54 0x85377d1 0x846b594 0x83e5591 0x83e6c15 0x81902b4 0x8746731 0x49ad4c 0x34ed3e
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x2a) [0x8759eaa]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x3ba) [0x817033a]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x2af) [0x81709ff]
[0x20e40c]
/usr/bin/mongod(_ZNK5mongo7BSONObj4copyEv+0x33) [0x819bb23]
/usr/bin/mongod(_ZN5mongo11ParsedQuery4initERKNS_7BSONObjE+0x494) [0x854cd54]
/usr/bin/mongod(_ZN5mongo11ParsedQueryC1ERNS_12QueryMessageE+0x91) [0x85377d1]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x34) [0x846b594]
/usr/bin/mongod() [0x83e5591]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x3d5) [0x83e6c15]
/usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x84) [0x81902b4]
/usr/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x421) [0x8746731]
/lib/i386-linux-gnu/libpthread.so.0(+0x6d4c) [0x49ad4c]
/lib/i386-linux-gnu/libc.so.6(clone+0x5e) [0x34ed3e]
Connecting using node's shell:
> mdb.open(function(err, db) { console.log(err) });
[Error: failed to connect to [localhost:27017]]
I've searched around for this error and found nothing of use. This is running on a fairly old server (Ubuntu 12.04 32-bit, 640MB RAM, 500MHz P2). How can I fix this?
There is an issue, "Invalid operation at address: 0x819b263 from thread: TTLMonitor", in the MongoDB JIRA list. I think that is about your case.
A new server may be the easiest solution; otherwise you have to download the source code, make some modifications and compile it yourself.

How can I fix "EMPTYUNREACHABLE" on deploying a test replset on my mac?

I'm trying to deploy a development/test replica set on my MacBook Pro using this document:
http://docs.mongodb.org/manual/tutorial/deploy-replica-set/
I started 3 instances of mongod, on ports 10000, 10001 and 10002.
I used a configuration file to start each mongod. The configuration files are below:
rs0:
dbpath = /Users/Thomas/mongodb/data/rs0/
port = 10000
logpath = /Users/Thomas/mongodb/log/rs0.log
logappend = true
replSet = rs0
rs1:
dbpath = /Users/Thomas/mongodb/data/rs1/
port = 10001
logpath = /Users/Thomas/mongodb/log/rs1.log
logappend = true
replSet = rs0
rs2:
dbpath = /Users/Thomas/mongodb/data/rs2/
port = 10002
logpath = /Users/Thomas/mongodb/log/rs2.log
logappend = true
replSet = rs0
And used the following commands to start them:
mongod -f config/rs0.conf
mongod -f config/rs1.conf
mongod -f config/rs2.conf
Then I connected to one of them using mongo: mongo localhost:10001. But when I use the rs.initiate() command to initialize the replica set, it fails:
> rs.initiate()
{
"startupStatus" : 4,
"info" : "rs0",
"errmsg" : "all members and seeds must be reachable to initiate set",
"ok" : 0
}
and checked with rs.status():
> rs.status()
{
"startupStatus" : 4,
"errmsg" : "can't currently get local.system.replset config from self or any seed (EMPTYUNREACHABLE)",
"ok" : 0
}
And the log shows it is an EMPTYUNREACHABLE error. How can I solve it?
***** SERVER RESTARTED *****
Mon Oct 15 22:02:31 [initandlisten] MongoDB starting : pid=568 port=10000 dbpath=/Users/Thomas/mongodb/data/rs0/ 64-bit host=bogon
Mon Oct 15 22:02:31 [initandlisten]
Mon Oct 15 22:02:31 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
Mon Oct 15 22:02:31 [initandlisten] db version v2.2.0, pdfile version 4.5
Mon Oct 15 22:02:31 [initandlisten] git version: f5e83eae9cfbec7fb7a071321928f00d1b0c5207
Mon Oct 15 22:02:31 [initandlisten] build info: Darwin bs-osx-106-x86-64-1.local 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386 BOOST_LIB_VERSION=1_49
Mon Oct 15 22:02:31 [initandlisten] options: { config: "config/rs0.conf", dbpath: "/Users/Thomas/mongodb/data/rs0/", logappend: "true", logpath: "/Users/Thomas/mongodb/log/rs0.log", port: 10000, replSet: "rs0", rest: "true" }
Mon Oct 15 22:02:31 [initandlisten] journal dir=/Users/Thomas/mongodb/data/rs0/journal
Mon Oct 15 22:02:31 [initandlisten] recover : no journal files present, no recovery needed
Mon Oct 15 22:02:31 [websvr] admin web console waiting for connections on port 11000
Mon Oct 15 22:02:31 [initandlisten] waiting for connections on port 10000
Mon Oct 15 22:02:37 [rsStart] trying to contact bogon:10000
Mon Oct 15 22:02:43 [rsStart] couldn't connect to bogon:10000: couldn't connect to server bogon:10000
Mon Oct 15 22:02:49 [rsStart] replSet can't get local.system.replset config from self or any seed (yet)
Mon Oct 15 22:03:05 [rsStart] trying to contact bogon:10000
Mon Oct 15 22:03:11 [rsStart] couldn't connect to bogon:10000: couldn't connect to server bogon:10000
Mon Oct 15 22:03:17 [rsStart] replSet can't get local.system.replset config from self or any seed (yet)
Mon Oct 15 22:03:33 [rsStart] trying to contact bogon:10000
Mon Oct 15 22:03:39 [rsStart] couldn't connect to bogon:10000: couldn't connect to server bogon:10000
Mon Oct 15 22:03:45 [rsStart] replSet can't get local.system.replset config from self or any seed (yet)
Mon Oct 15 22:04:01 [rsStart] trying to contact bogon:10000
Mon Oct 15 22:04:07 [rsStart] couldn't connect to bogon:10000: couldn't connect to server bogon:10000
Mon Oct 15 22:04:13 [rsStart] replSet can't get local.system.replset config from self or any seed (yet)
Try setting bind_ip to 127.0.0.1, or add an entry for 'bogon' to your /etc/hosts.
mongod appears to be using the hostname() of the local system, but it is not resolvable.
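If editing /etc/hosts is not an option, passing an explicit config to rs.initiate() with addresses that are already resolvable avoids the hostname lookup altogether; a sketch for the three local instances, assuming they are all reachable on localhost:
rs.initiate({
    _id : "rs0",
    members : [
        { _id : 0, host : "localhost:10000" },
        { _id : 1, host : "localhost:10001" },
        { _id : 2, host : "localhost:10002" }
    ]
})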
I faced the exact same issue today. I was able to start mongodb but not the replica set. I had to delete the following line from the mongod.conf file:
'bind_ip: "127.0.0.1"'
I also figured out that the mongod.conf files for mongodb and the replica set were different and stored in different locations, maybe because I brew-installed the latest version of MongoDB. I found my replica set config file at "/usr/local/etc/mongod.conf".
FYI, I believe you hit this bug in 2.2.0:
https://jira.mongodb.org/browse/SERVER-7367
It is now fixed and scheduled for release in 2.2.1. The workaround for 2.2.0 is basically to make sure that you use resolvable and reachable addresses for your set members, because 2.2.0 tries to reach even local instances over the network.