Why has my newly created mongodb local database grown to 24GB? - mongodb

I set up a MongoDB replica set a few days ago, ran some small tests on it, and everything worked well. Today I found that its local database has grown to 24 GB:
rs0:PRIMARY> show dbs
local 24.06640625GB
test 0.203125GB
The other collections look normal except "oplog.rs":
rs0:PRIMARY> db.oplog.rs.stats()
{
  "ns" : "local.oplog.rs",
  "count" : 322156,
  "size" : 119881336,
  "avgObjSize" : 372.12200300475547,
  "storageSize" : 24681987920,
  "numExtents" : 12,
  "nindexes" : 0,
  "lastExtentSize" : 1071292416,
  "paddingFactor" : 1,
  "systemFlags" : 0,
  "userFlags" : 0,
  "totalIndexSize" : 0,
  "indexSizes" : {
  },
  "capped" : true,
  "max" : NumberLong("9223372036854775807"),
  "ok" : 1
}
This is my mongodb.conf
dbpath=/data/db
#where to log
logpath=/var/log/mongodb/mongodb.log
logappend=true
port = 27017
fork = true
replSet = rs0
How can I solve it? Many thanks.

The oplog, which keeps a rolling log of the operations on the primary and is used by the replication mechanism, is allocated by default at 5% of the available free disk space (on Linux/Unix systems; not on Mac OS X or Windows). So if you have a lot of free disk space, MongoDB will create a large oplog, which in turn gives you a large time window within which you could, for instance, restore to any point in time. Once the oplog reaches its maximum size, it simply rolls over like a ring buffer.
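The sizing rule above can be sketched in Python. Note the 990 MB / 50 GB clamp bounds are assumptions drawn from the documentation of that era and vary by MongoDB version and platform:

```python
# Hypothetical helper: approximates the default MMAPv1 oplog allocation
# on 64-bit Linux (5% of free disk, clamped to roughly [990 MB, 50 GB];
# the exact bounds are assumptions and differ across versions/platforms).
def default_oplog_size_bytes(free_disk_bytes):
    MB, GB = 1024 ** 2, 1024 ** 3
    size = int(free_disk_bytes * 0.05)
    return max(990 * MB, min(size, 50 * GB))

# Roughly 480 GB of free disk yields the ~24 GB oplog seen in the question.
print(default_oplog_size_bytes(480 * 1024 ** 3) / 1024 ** 3)  # → 24.0
```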
You can specify the size of the oplog when first initializing the replica set using the oplogSize option; see http://docs.mongodb.org/manual/core/replica-set-oplog/
Bottom line: unless you are really short on disk space (which apparently you aren't, otherwise the oplog wouldn't have been created this big), don't worry about it. It provides extra safety.
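If you ever do want a smaller oplog, note that the setting only takes effect when the oplog is first created; with the ini-style config from the question it would look like the sketch below (10240 is just an example value, in megabytes):

```
# mongodb.conf (ini style); oplogSize is in megabytes
replSet = rs0
oplogSize = 10240
```

Shrinking an already-created oplog instead requires the oplog-resizing procedure described in the MongoDB manual.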

Related

How to migrate from MMAPv1 to WiredTiger with minimal downtime without mongodump/mongorestore

Most guides recommend using mongodump/mongorestore, but for large production databases the downtime can be very long.
You can use replication and an additional server for this, or the same server if the load allows.
You need three running MongoDB instances:
The server you want to upgrade (remember that WiredTiger is supported since 3.0).
A second MongoDB instance, which can run on an additional server. The database will be temporarily copied to it by replication.
A third MongoDB instance, the arbiter, which stores no data and only participates in the election of the primary. The arbiter can run on the additional server on a separate port.
In any case, you need to back up your database first. You can run mongodump without parameters, and a ./dump directory will be created with the database dump. You can use the --gzip parameter to compress the result:
mongodump --gzip
Just in case, the command to restore:
mongorestore --gzip
It should be run in the same directory that contains ./dump, and the --gzip parameter should be added if it was used with mongodump.
Begin by configuring the additional server. My target system is Red Hat Linux without Internet access, so I downloaded and installed MongoDB manually via RPM. Add this section to /etc/mongod.conf:
replication:
  oplogSizeMB: 10240
  replSetName: REPLICA
Check that the net section looks like this, to allow access from other servers:
net:
  bindIp: 0.0.0.0
  port: 27017
and run:
service mongod start
Run the third MongoDB instance, the arbiter. It can work on the additional server on a different port. Create a temporary directory for the arbiter's database:
mkdir /tmp/mongo
chmod 777 -R /tmp/mongo
and run:
mongod --dbpath /tmp/mongo --port 27001 --replSet REPLICA \
--fork --logpath /tmp/mongo/db1.log
Now configure the main server. Edit /etc/mongod.conf
replication:
  oplogSizeMB: 10240
  replSetName: REPLICA
and restart MongoDB on the main server:
service mongod restart
Important! After restarting the main server, read operations may be unavailable. I was getting the following error:
{ "ok" : 0, "errmsg" : "node is recovering", "code" : 13436 }
So, as quickly as possible, connect to MongoDB on the main server via the mongo console and run the following command to configure replication:
rs.initiate(
  {
    _id: "REPLICA",
    members: [
      { _id: 0, host: "<IP address of main server>:27017", priority: 1.0 },
      { _id: 1, host: "<IP address of additional server>:27017", priority: 0.5 },
      { _id: 2, host: "<IP address of additional server (the arbiter)>:27001",
        arbiterOnly: true, priority: 0.5 }
    ]
  }
)
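For reference, the shape of that configuration document can be sketched in plain Python (a simplified model; the host strings below are placeholder IPs standing in for the template above):

```python
# Sketch of an rs.initiate()-style config document (hosts are placeholders).
def make_replica_config(name, hosts):
    """hosts: list of (host:port, priority, arbiter_only); _ids assigned in order."""
    return {
        "_id": name,
        "members": [
            {"_id": i, "host": host, "priority": prio, "arbiterOnly": arb}
            for i, (host, prio, arb) in enumerate(hosts)
        ],
    }

cfg = make_replica_config("REPLICA", [
    ("10.0.0.1:27017", 1.0, False),  # main server
    ("10.0.0.2:27017", 0.5, False),  # additional server
    ("10.0.0.2:27001", 0.5, True),   # arbiter
])
print(cfg["members"][2]["arbiterOnly"])  # → True
```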
After this operation, all MongoDB operations become available again and data synchronization starts.
I don't recommend using rs.initiate() on the main server without parameters, as most tutorials do, because the name of the main server will then default to the DNS name from /etc/hostname. That is not very convenient for me, because I use IP addresses for communication in my projects.
To check the synchronization progress you can call from “mongo” console:
rs.status()
Result example:
{
  "set" : "REPLICA",
  "date" : ISODate("2017-01-19T14:30:34.292Z"),
  "myState" : 1,
  "term" : NumberLong(1),
  "heartbeatIntervalMillis" : NumberLong(2000),
  "members" : [
    {
      "_id" : 0,
      "name" : "<IP address of main server>:27017",
      "health" : 1.0,
      "state" : 1,
      "stateStr" : "PRIMARY",
      "uptime" : 165,
      "optime" : {
        "ts" : Timestamp(6377323060650835, 3),
        "t" : NumberLong(1)
      },
      "optimeDate" : ISODate("2017-01-19T14:30:33.000Z"),
      "infoMessage" : "could not find member to sync from",
      "electionTime" : Timestamp(6377322974751490, 1),
      "electionDate" : ISODate("2017-01-19T14:30:13.000Z"),
      "configVersion" : 1,
      "self" : true
    },
    {
      "_id" : 1,
      "name" : "<IP address of additional server>:27017",
      "health" : 1.0,
      "state" : 5,
      "stateStr" : "STARTUP2",
      "uptime" : 30,
      "optime" : {
        "ts" : Timestamp(0, 0),
        "t" : NumberLong(-1)
      },
      "optimeDate" : ISODate("1970-01-01T00:00:00.000Z"),
      "lastHeartbeat" : ISODate("2017-01-19T14:30:33.892Z"),
      "lastHeartbeatRecv" : ISODate("2017-01-19T14:30:34.168Z"),
      "pingMs" : NumberLong(3),
      "syncingTo" : "<IP address of main server>:27017",
      "configVersion" : 1
    },
    {
      "_id" : 2,
      "name" : "<IP address of additional server (the arbiter)>:27001",
      "health" : 1.0,
      "state" : 7,
      "stateStr" : "ARBITER",
      "uptime" : 30,
      "lastHeartbeat" : ISODate("2017-01-19T14:30:33.841Z"),
      "lastHeartbeatRecv" : ISODate("2017-01-19T14:30:30.158Z"),
      "pingMs" : NumberLong(0),
      "configVersion" : 1
    }
  ],
  "ok" : 1.0
}
Once the additional server's "stateStr" changes from "STARTUP2" to "SECONDARY", our servers are synchronized.
While we wait for the synchronization to finish, we need to modify the client applications a little so they can work with all servers in the replica set.
If you use the ConnectionString, you should replace it with something like:
mongodb://<IP address of main server>:27017,<IP address of additional server>:27017,<IP address of additional server (the arbiter)>:27001/?replicaSet=REPLICA
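Assembling such a replica-set connection string can be sketched with a small helper (the IP addresses below are placeholders, not from the original setup):

```python
def replica_set_uri(hosts, replica_set, db=""):
    """Join host:port pairs into a mongodb:// URI with the replicaSet option."""
    return "mongodb://{}/{}?replicaSet={}".format(",".join(hosts), db, replica_set)

uri = replica_set_uri(
    ["10.0.0.1:27017", "10.0.0.2:27017", "10.0.0.2:27001"], "REPLICA")
print(uri)
# → mongodb://10.0.0.1:27017,10.0.0.2:27017,10.0.0.2:27001/?replicaSet=REPLICA
```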
If you use the legacy C++ mongo-cxx-driver, as I do, you should use mongo::DBClientReplicaSet instead of mongo::DBClientConnection and list all three servers in the connection parameters, including the arbiter.
There is a third option: you can simply change the MongoDB server IP in the clients after the PRIMARY-SECONDARY switch, but it's not a very clean approach.
After the synchronization has finished and the additional server's status has settled as SECONDARY, we need to swap PRIMARY and SECONDARY by executing the following commands in the mongo console on the main server. This is important because the commands will not work on the additional server.
cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 1
cfg.members[2].priority = 0.5
rs.reconfig(cfg)
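The intent of the priority swap can be modeled with a simplified sketch: among the non-arbiter members, the one with the highest priority is preferred as primary. This deliberately ignores the many other conditions of the real election protocol (health, oplog freshness, votes), and the host names are placeholders:

```python
def preferred_primary(members):
    """Pick the non-arbiter member with the highest priority (simplified model;
    the real election also weighs health, oplog freshness, votes, etc.)."""
    eligible = [m for m in members if not m.get("arbiterOnly")]
    return max(eligible, key=lambda m: m["priority"])["host"]

members = [
    {"host": "main:27017", "priority": 0.5},        # was 1.0 before reconfig
    {"host": "additional:27017", "priority": 1.0},  # was 0.5 before reconfig
    {"host": "arbiter:27001", "priority": 0.5, "arbiterOnly": True},
]
print(preferred_primary(members))  # → additional:27017
```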
Then check server status by executing:
rs.status()
Stop MongoDB on the main server:
service mongod stop
and simply delete the entire contents of the database directory. This is safe because we have a working copy on the additional server and made a backup at the beginning. Be careful: MongoDB does not create the database directory itself. If you have deleted it, you need not only to recreate it:
mkdir /var/lib/mongo
but also to set its owner:
chown -R mongod:mongod /var/lib/mongo
Check that the wiredTiger storage engine is configured in /etc/mongod.conf (since 3.2 it is the default):
storage:
  ...
  engine: wiredTiger
  ...
And run MongoDB:
service mongod start
The main server will get the configuration from the secondary server automatically, and the data will be synced back into WiredTiger storage.
After the synchronization has finished, switch the PRIMARY back. This operation must be performed on the additional server, because it is the PRIMARY now.
cfg = rs.conf()
cfg.members[0].priority = 1
cfg.members[1].priority = 0.5
cfg.members[2].priority = 0.5
rs.reconfig(cfg)
Return the old version of database clients or change ConnectionString back.
Now turn off replication, if necessary. Remove the two replica members from the main server:
rs.remove("<IP address of additional server>:27017")
rs.remove("<IP address of additional server (the arbiter)>:27001")
Remove the entire replication section from /etc/mongod.conf and restart MongoDB:
service mongod restart
After this, we get the following warning when connecting via the mongo console:
2017-01-19T12:26:51.948+0300 I STORAGE [initandlisten] ** WARNING: mongod started without --replSet yet 1 documents are present in local.system.replset
2017-01-19T12:26:51.948+0300 I STORAGE [initandlisten] ** Restart with --replSet unless you are doing maintenance and no other clients are connected.
2017-01-19T12:26:51.948+0300 I STORAGE [initandlisten] ** The TTL collection monitor will not start because of this.
To get rid of it, you need to remove the local database. In its default state it contains only the startup_log collection, so you can do this without fear via the mongo console:
use local
db.dropDatabase()
and restart MongoDB:
service mongod restart
If you remove the local database before removing the replication section from /etc/mongod.conf, it is immediately recreated. That is why I could not manage with only one MongoDB restart.
On the additional server, perform the same actions:
remove “replication“ section from /etc/mongod.conf
restart MongoDB
drop the “local“ database
again restart
The arbiter can simply be stopped and removed:
pkill -f /tmp/mongo
rm -r /tmp/mongo

MongoDB replica heartbeat request time exceeded

I have replica set (hosted on amazon) which has:
primary
secondary
arbiter
All of them are version 3.2.6, and this replica set forms one shard of my sharded cluster (in case that is important, although I think it is not).
When I run rs.status() on the primary, it says it cannot reach the secondary (the same happens on the arbiter):
{
  "_id" : 1,
  "name" : "secondary-ip:27017",
  "health" : 0,
  "state" : 8,
  "stateStr" : "(not reachable/healthy)",
  "uptime" : 0,
  "optime" : {
    "ts" : Timestamp(0, 0),
    "t" : NumberLong(-1)
  },
  "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
  "lastHeartbeat" : ISODate("2016-07-20T15:40:50.479Z"),
  "lastHeartbeatRecv" : ISODate("2016-07-20T15:40:51.793Z"),
  "pingMs" : NumberLong(0),
  "lastHeartbeatMessage" : "Couldn't get a connection within the time limit",
  "configVersion" : -1
}
(By the way, look at that optimeDate!)
Error in my log is:
[ReplicationExecutor] Error in heartbeat request to secondary-ip:27017; ExceededTimeLimit: Couldn't get a connection within the time limit
The strange thing is that when I go to the secondary and run rs.status(), everything looks OK. I am also able to connect to the secondary from my primary instance (with mongo --host secondary), so I guess it is not a network issue. Yesterday it was all working fine.
TL;DR: my primary cannot see the secondary, the arbiter cannot see the secondary, yet my secondary sees the primary; it was all working fine just a day ago, and I can manually connect to the secondary from the primary instance.
Anyone has an idea what could go wrong?
Tnx,
Ivan
It seems the secondary's optimeDate is responsible for the error. The best way to find the reason for this wrong optimeDate is to check the secondary machine's current date and time, which could be wrong as well. Not sure if you are still looking for an answer, but the optimeDate is the problem; it is not the connection between your replica set machines.

MongoDB not showing collection information even though I am sure it's there

So I am using MongoDB version 3.2.
I created a database and its collections via a Clojure wrapper called monger.
But when I connect via the mongo shell and check whether the collections were created, I can't see them.
Here's the code:
Primary> use db_name
PRIMARY> db.version()
3.2.3
PRIMARY> db.stats()
{
  "db" : "db_name",
  "collections" : 4,
  "objects" : 0,
  "avgObjSize" : 0,
  "dataSize" : 0,
  "storageSize" : 16384,
  "numExtents" : 0,
  "indexes" : 9,
  "indexSize" : 36864,
  "ok" : 1
}
PRIMARY> show collections
PRIMARY> db.coll1.getIndexes()
[ ]
PRIMARY> db.getCollectionInfos()
Tue May 24 16:29:44 TypeError: db.getCollectionInfos is not a function (shell):1
PRIMARY>
But when I check whether the collections were created via Clojure, I can see the information:
user=> (monger.db/get-collection-names mongo-db*)
#{"coll1" "coll2" "coll3" "coll4"}
What is going on?
Found the issue. It turns out that if the mongo shell and the running mongod instance are of two different versions, then db.getCollectionNames() and db.collection.getIndexes() will return no output.
This can happen if you are connecting to a remote mongo instance with, say, a 2.x shell (you can see the shell version when you start it) while the running mongod is version 3.x.
According to the documentation:
For MongoDB 3.0 deployments using the WiredTiger storage engine, if you run db.getCollectionNames() and db.collection.getIndexes() from a version of the mongo shell before 3.0 or a version of the driver prior to 3.0 compatible version, db.getCollectionNames() and db.collection.getIndexes() will return no data, even if there are existing collections and indexes. For more information, see WiredTiger and Driver Version Compatibility.
I spent almost an hour trying to figure this out, so I thought this might be helpful to others.
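The compatibility rule quoted above boils down to a version comparison; here is a hedged sketch that models only the 3.0/WiredTiger threshold described in the documentation (a simplification, not the shell's actual logic):

```python
def listing_helpers_work(shell_version, server_version):
    """True if db.getCollectionNames()/db.collection.getIndexes() should return
    data (simplified: only models the 3.0/WiredTiger threshold from the docs)."""
    parse = lambda v: tuple(int(x) for x in v.split("."))
    if parse(server_version) < (3, 0):
        return True  # pre-3.0 servers are not affected by this rule
    return parse(shell_version) >= (3, 0)

print(listing_helpers_work("2.6.12", "3.2.3"))  # → False
print(listing_helpers_work("3.2.3", "3.2.3"))   # → True
```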

Can't figure out why mongo database becomes bigger after migration?

I'm new to MongoDB. I have a local server and a remote server. After migrating the mongo database from the local server to the remote server using the mongodump/mongorestore tools, I found that the database became bigger on the remote server.
Here is my sample :
on local server (Ubuntu 14.04.2 LTS, mongo 3.0.5):
> show dbs
Daily_data 7.9501953125GB
Monthly_data 0.453125GB
Weekly_data 1.953125GB
on remote server (CentOS 6.7, mongo 2.4.3):
> show dbs
Daily_data 9.94921875GB
Monthly_data 0.953125GB
Weekly_data 3.9521484375GB
I also checked the stats of one collection to compare; the count is the same, but the sizes (indexSize, totalIndexSize, etc.) have changed.
This is the stats output for the collection on the local server:
> db.original_prices.stats()
{
  "ns" : "Daily_data.original_prices",
  "count" : 9430984,
  "size" : 2263436160,
  "avgObjSize" : 240,
  "numExtents" : 21,
  "storageSize" : 2897301504,
  "lastExtentSize" : 756662272,
  "paddingFactor" : 1,
  "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
  "userFlags" : 1,
  "capped" : false,
  "nindexes" : 2,
  "indexDetails" : {
  },
  "totalIndexSize" : 627777808,
  "indexSizes" : {
    "_id_" : 275498496,
    "symbol_1_dateTime_1" : 352279312
  },
  "ok" : 1
}
This is the stats output for the collection on the remote server:
> db.original_prices.stats()
{
  "ns" : "Daily_data.original_prices",
  "count" : 9430984,
  "size" : 1810748976,
  "avgObjSize" : 192.00000508960676,
  "storageSize" : 2370023424,
  "numExtents" : 19,
  "nindexes" : 2,
  "lastExtentSize" : 622702592,
  "paddingFactor" : 1,
  "systemFlags" : 1,
  "userFlags" : 0,
  "totalIndexSize" : 639804704,
  "indexSizes" : {
    "_id_" : 305994976,
    "symbol_1_dateTime_1" : 333809728
  },
  "ok" : 1
}
Is mongodump/mongorestore a good and safe way to migrate a mongo database?
The problem here, as you seem to have already noticed, is the indexes: the stats clearly show that it is the index size that has grown, and there is a perfectly logical explanation.
When running the restore, the indexes are rebuilt, but in a way that avoids blocking the other write operations happening during the restore. This is similar to the process employed by Build Indexes in the Background as described in the documentation; not exactly the same, but close.
To get the most compact index size, it is best to first drop the indexes from the target database and use the --noIndexRestore option with the mongorestore command, as this prevents index building during the data load.
Then, when complete, you can run a regular createIndex, excluding any use of the "background" option, so the indexes are created in the foreground. The result is that the database will be blocked from reads and writes during index creation, but the resulting indexes will be smaller.
As for general practice, you will note that the other data sizes in fact come out smaller, since during the rebuild any slack space present in the source is not recreated when the data is restored.
The data from mongodump is in a binary format and should always be preferred over the textual format of mongoexport and the related mongoimport when taking data from one MongoDB instance to use on another, since that is not the purpose of those tools.
Other alternatives are filesystem copies, such as an LVM snapshot, which will of course restore in exactly the same state as when the backup copy was made.
Factors that can affect the disk size of your collection include the underlying hardware, filesystem, and configuration. In your case, the prevailing factor seems to be a difference in the storage engine versions used on the local and remote servers: your local server is running Mongo 3.0 while the remote is running an older version. This is apparent from the presence of the paddingFactorNote property; however, you can confirm it by running db.version() in both environments.
Between Mongo 2.4/2.6 and Mongo 3.0 there were several important changes to how collections are stored, not least the addition of the WiredTiger storage engine as an alternative to the default mmapv1 storage engine. There were also changes to how the mmapv1 engine (which you are using) pads documents during allocation to accommodate growth in document size.
The other major reason for the size differences comes from your use of mongorestore. During normal usage, mongo databases are not stored in a way that minimizes disk usage. However, mongorestore rebuilds the database/collection in a compact way, which is why for the collection you posted, the remote storageSize is smaller.
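The numbers in the question are consistent with this: dividing the posted figures shows the restored copy is packed more tightly per document, while the rebuilt indexes came out slightly larger (by about 12 MB):

```python
# Figures copied from the two db.original_prices.stats() outputs above.
local = {"count": 9430984, "size": 2263436160, "storageSize": 2897301504,
         "totalIndexSize": 627777808}
remote = {"count": 9430984, "size": 1810748976, "storageSize": 2370023424,
          "totalIndexSize": 639804704}

for name, s in (("local", local), ("remote", remote)):
    avg_obj = s["size"] / s["count"]      # bytes per document
    slack = s["storageSize"] - s["size"]  # allocated but unused bytes
    print(name, round(avg_obj), slack)
```

The local copy averages 240 bytes per document versus 192 on the remote, matching the avgObjSize fields in the stats.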

MongoDB, all replica set members stuck at STARTUP2

I have a MongoDB replica set with 2 nodes (node0, node1). One day one of them (node1) crashed.
Since deleting all of node1's data and restarting it would take a long time, I shut down node0 and rsynced its data over to node1.
After that, I started node0 and node1. Both replica set members are stuck at STARTUP2; below is some log output:
Sat Feb 8 13:14:22.031 [rsMgr] replSet I don't see a primary and I can't elect myself
Sat Feb 8 13:14:24.888 [rsStart] replSet initial sync pending
Sat Feb 8 13:14:24.889 [rsStart] replSet initial sync need a member to be primary or secondary to do our initial sync
How to solve this problem?
EDIT 10/29/15: I found there is actually an easier way to get your primary back, using rs.reconfig with the option {force: true}. You can find the detailed documentation here. Use it with caution, though; as mentioned in the documentation, it may cause a rollback.
You should never build a 2-member replica set, because once one of them is down, the other cannot tell whether its peer is down or it has itself been cut off from the network. As a solution, add an arbiter node for voting.
So your problem is: when you restart node0 while node1 is already dead, no other node votes for it, and it no longer knows whether it is suitable to run as a primary. Thus it falls back to secondary, which is why you see the message:
Sat Feb 8 13:14:24.889 [rsStart] replSet initial sync need a member to be primary or secondary to do our initial sync
As far as I know, there is no official way to resolve this issue other than rebuilding the replica set (but you can find some tricks later). Follow these steps:
Stop node0
Go to the data folder of node0 (on my machine it's /var/lib/mongodb; find yours in the config file located at /etc/mongodb.conf)
Delete local.* from the folder. Note that:
this cannot be undone, even if you backed up these files;
you'll lose all the users in the local database.
Start node0 and you shall see it running as a standalone node.
Then follow the MongoDB manual to recreate the replica set:
run rs.initiate() to initialize replica set
add node1 to replica set: rs.add("node1 domain name");
I'm afraid you'll have to spend a long time waiting for the sync to finish, but then you are good to go.
I strongly recommend adding an arbiter to avoid this situation again.
So, the above is the official way to resolve your issue, and what follows is how I did it with MongoDB 2.4.8. I didn't find any documentation to back it up, so there is absolutely NO guarantee; you do it at your own risk. Anyway, if it doesn't work for you, just fall back to the official way. Worth trying ;)
Make sure that during the whole process no application is trying to modify your database; otherwise those modifications will not be synced to the secondary server.
Restart your server without the replSet=[set name] parameter, so that it runs standalone and you can make modifications to it.
Go to the local database and delete node1 from db.system.replset. For example, on my machine it originally looked like:
{
  "_id": "rs0",
  "version": 5,
  "members": [{
    "_id": 0,
    "host": "node0"
  }, {
    "_id": 1,
    "host": "node1"
  }]
}
You should change it to:
{
  "_id": "rs0",
  "version": 5,
  "members": [{
    "_id": 0,
    "host": "node0"
  }]
}
Restart with replSet=[set name] and you are supposed to see node0 become primary again.
Add node1 to the replica set with rs.add command.
That's all. Let me know if you have any questions.
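The edit to the db.system.replset document in the steps above is just a filter over the members array; modeled in Python (using the node0/node1 names from the example):

```python
def drop_member(replset_doc, host):
    """Return a copy of a system.replset document with one member removed."""
    doc = dict(replset_doc)
    doc["members"] = [m for m in replset_doc["members"] if m["host"] != host]
    return doc

original = {"_id": "rs0", "version": 5,
            "members": [{"_id": 0, "host": "node0"}, {"_id": 1, "host": "node1"}]}
print(drop_member(original, "node1"))
# → {'_id': 'rs0', 'version': 5, 'members': [{'_id': 0, 'host': 'node0'}]}
```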
I had the same issue when using MMS. I created a new replica set of 3 machines (2 data + 1 arbiter, which is tricky to set up on MMS, by the way), and they were all stuck in STARTUP2 with "initial sync need a member to be primary or secondary to do our initial sync":
myReplicaSet:STARTUP2> rs.status()
{
  "set" : "myReplicaSet",
  "date" : ISODate("2015-01-17T21:20:12Z"),
  "myState" : 5,
  "members" : [
    {
      "_id" : 0,
      "name" : "server1.mydomain.com:27000",
      "health" : 1,
      "state" : 5,
      "stateStr" : "STARTUP2",
      "uptime" : 142,
      "optime" : Timestamp(0, 0),
      "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
      "lastHeartbeat" : ISODate("2015-01-17T21:20:12Z"),
      "lastHeartbeatRecv" : ISODate("2015-01-17T21:20:11Z"),
      "pingMs" : 0,
      "lastHeartbeatMessage" : "initial sync need a member to be primary or secondary to do our initial sync"
    },
    {
      "_id" : 1,
      "name" : "server2.mydomain.com:27000",
      "health" : 1,
      "state" : 5,
      "stateStr" : "STARTUP2",
      "uptime" : 142,
      "optime" : Timestamp(0, 0),
      "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
      "infoMessage" : "initial sync need a member to be primary or secondary to do our initial sync",
      "self" : true
    },
    {
      "_id" : 3,
      "name" : "server3.mydomain.com:27000",
      "health" : 1,
      "state" : 5,
      "stateStr" : "STARTUP2",
      "uptime" : 140,
      "lastHeartbeat" : ISODate("2015-01-17T21:20:12Z"),
      "lastHeartbeatRecv" : ISODate("2015-01-17T21:20:10Z"),
      "pingMs" : 0
    }
  ],
  "ok" : 1
}
To fix it, I used yaoxing's answer. I had to shut down the replica set on MMS and wait for all members to be stopped. It took a while...
Then, on all of them, I removed the contents of the data directory:
sudo rm -Rf /var/data/*
And only after that, I turned the replica set back on, and all was fine.