Taking a look at the serverStatus command, I see the following data.
> db.runCommand({ serverStatus: 1 })
...
"mem" : {
    "bits" : 64,
    "resident" : 2138, // Mongo uses 2 GB RAM
    "virtual" : 33272,
    "supported" : true,
    "mapped" : 16489, // equals db.coll.totalSize()
    "mappedWithJournal" : 32978
},
Mongo recommends that the working set fit in RAM.
If I understand correctly, 16.4 GB of Mongo documents/indexes are memory mapped. Since Mongo is only using 2 GB of RAM, whenever Mongo needs to access an address outside of that 2 GB, it has to fetch the contents of that address from disk and load them into memory?
Is this explanation the main reason that the working set must fit in RAM?
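For reference, those fields can be pulled straight out of serverStatus in the shell (a small sketch; the mem values are reported in megabytes):
var mem = db.serverStatus().mem;
// Pages mapped into the mongod address space vs. pages actually resident in physical RAM.
print("mapped (MB):   " + mem.mapped);
print("resident (MB): " + mem.resident);
// Rough upper bound on how much mapped data could still cause a page fault on first access.
print("not yet resident (MB): " + (mem.mapped - mem.resident));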
Related
db.collection.stats()
Response:
"count" : 20696445,
"size" : NumberLong("1478263842661"),
"storageSize" : 334732324864,
"totalIndexSize" : 676327424,
"indexSizes" : {
    "_id_" : 377094144,
    "leadID_1" : 128049152,
    "leadID_hashed" : 171184128
},
"avgObjSize" : 71425.97884134208
My actual on-disk size matches storageSize, so what do size and the other keys represent?
You haven't mentioned the version of MongoDB server you are using but given the size of your data is much larger than the storageSize on disk, I'm assuming you are using the WiredTiger storage engine which compresses data and indexes by default. The WiredTiger storage engine was first available as an option in the MongoDB 3.0 production series and became the default storage engine for new deployments in MongoDB 3.2+.
In your example output it looks like you have 1.4TB of uncompressed data which is currently occupying 334GB on disk (the storageSize value). Storage space used by indexes for this collection is reported separately under indexSizes and summed up as totalIndexSize.
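As a rough check, the effective compression ratio can be computed from those same stats (a sketch; the collection name is a placeholder):
var s = db.getCollection("yourCollection").stats(); // placeholder collection name
// size = uncompressed data size, storageSize = compressed size on disk
print("compression ratio: " + (s.size / s.storageSize).toFixed(2) + "x"); // ~4.4x for the numbers in the question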
The output of collection.stats() will vary depending on your MongoDB server version and configured storage engine, but is generally described in the MongoDB manual as part of the output of the collStats command which is called by the db.collection.stats() shell helper.
Note: MongoDB documentation is versioned so you should always make sure you are referencing documentation that matches your release series of MongoDB (i.e. 3.2, 3.4, ...). Default documentation links will point to the current production release.
From the collStats documentation:
collStats.size
The total size in memory of all records in a collection. This value does not include the record header, which is 16 bytes per record, but does include the record’s padding. Additionally size does not include the size of any indexes associated with the collection, which the totalIndexSize field reports.
The scale argument affects this value.
collStats.storageSize
The total amount of storage allocated to this collection for document storage. The scale argument affects this value.
storageSize does not include index size. See totalIndexSize for index sizing.
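For example, to report these values in megabytes rather than bytes you can pass a scale factor to the shell helper (here collection is a placeholder, as in the documentation):
db.collection.stats(1024 * 1024) // size, storageSize, totalIndexSize and indexSizes reported in MB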
As far as I understand, the storage size for MongoDB should always be larger than data size. However, after upgrading to Mongo 3.0 and using WiredTiger, I start seeing that the data size is larger than the storage size.
Here's from one of the databases:
{
    "db" : "Results",
    "collections" : NumberInt(1),
    "objects" : NumberInt(251816),
    "avgObjSize" : 804.4109548241573,
    "dataSize" : NumberInt(202563549),
    "storageSize" : NumberInt(53755904),
    "numExtents" : NumberInt(0),
    "indexes" : NumberInt(5),
    "indexSize" : NumberInt(41013248),
    "ok" : NumberInt(1)
}
Note that 202563549 > 53755904 by a wide margin. I am confused as to how this can be. Is the way to read db.stats() output different in Mongo 3.0?
The storageSize metric is equal to the size (in bytes) of all the data extents in the database. Without compression, this number is larger than dataSize because it includes yet-unused space (in data extents) and space vacated by deleted or moved documents within extents. However, as you are using the WiredTiger storage engine, data is compressed on the disk and is therefore smaller than the dataSize.
MongoDB 3.0 with WiredTiger engine uses 'snappy' compression by default.
If this affects your DB performance, you can consider turning it off (blockCompressor: none) in the mongod.conf file:
storage:
  engine: wiredTiger
  wiredTiger:
    collectionConfig:
      blockCompressor: none
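Note that changing blockCompressor in mongod.conf only applies to collections created after the restart; existing collections keep the compressor they were created with. If you only want to disable compression for specific collections, one option (a sketch; the collection name is a placeholder) is to set the block compressor at creation time:
db.createCollection("results_uncompressed", {
    storageEngine: { wiredTiger: { configString: "block_compressor=none" } }
})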
We have been running MongoDB as a single unsharded instance with just one database. The size of the data files was 0.45 GB. When I looked at the storageSize of all the collections, the total was ~85 MB. In a bid to reclaim unused space, we ran repairDatabase(), with the understanding that file sizes grow from 64 to 128 to 256 MB and so on up to 2 GB. Since the mongo object data we have (85 MB) can be accommodated in the 64 + 128 MB files, we were expecting the 256 MB file to be reclaimed. However, to our surprise, no space was reclaimed.
Can someone explain the logic for working out how much space will be reclaimed? Essentially, given the total disk space a database takes and the total mongo object data size, can one accurately estimate how much space would be reclaimed?
The following is the db.stats() output as requested in a comment:
> db.stats()
{
    "db" : "analytics_data_1",
    "collections" : 12,
    "objects" : 207223,
    "avgObjSize" : 353.6659347659285,
    "dataSize" : 73287716,
    "storageSize" : 84250624,
    "numExtents" : 43,
    "indexes" : 26,
    "indexSize" : 21560112,
    "fileSize" : 469762048,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 5
    },
    "ok" : 1
}
>
The storage FAQ explains that an extra file is always pre-allocated and as soon as you start writing to it, mongod will preallocate the next file.
Repair won't reclaim any space that would normally exist - it can only help if you've deleted a lot of data or dropped some collections.
Disabling preallocation can save you space but will cost you in performance, as files will only be allocated when they are actually needed for writes - and that will slow down inserts.
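As a back-of-the-envelope check against the db.stats() output above (only a sketch, not an exact formula, since extents are allocated in fixed sizes and the next data file is preallocated in full):
var s = db.stats();
// Bytes allocated to data files that are not currently used by document or index extents.
var slack = s.fileSize - s.storageSize - s.indexSize;
print("allocated but unused: " + (slack / 1024 / 1024).toFixed(0) + " MB");
// For the numbers above: 469762048 - 84250624 - 21560112 ≈ 347 MB, much of which is
// the preallocated next data file rather than space that repairDatabase() can return.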
With an 11 GB working set (db.records.totalSize()), I ran the touch command to get Mongo to use as much memory as possible on my 16 GB RAM box. Before running touch, the serverStatus command showed that Mongo's mem.resident equaled 5800 (roughly 6 GB of RAM).
db.runCommand({ touch: "records", data: true, index: true })
{ "ok" : 1 }
But, after running touch, Mongo's using roughly the same amount of RAM.
"mem" : {
"bits" : 64,
"resident" : 5821, /* only a 21 MB increase */
"virtual" : 29010,
"supported" : true,
"mapped" : 14362,
"mappedWithJournal" : 28724
},
Why did the touch command hardly increase how much RAM Mongo uses (mem.resident)?
The way the MongoDB db.serverStatus() command reports resident memory is by counting how many pages in physical RAM have actually been accessed by the mongod process.
This means that even if your collection and indexes were read into RAM, they won't show up in the "res" value until you actually start querying against them.
You can verify that the data was read into RAM (if it was definitely cold before) simply by looking at how much RAM the mongod process is using (as opposed to its virtual memory).
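A minimal sketch of that before/after comparison from the shell (the collection name is taken from the question):
var before = db.serverStatus().mem.resident;
db.runCommand({ touch: "records", data: true, index: true });
var after = db.serverStatus().mem.resident;
// If touch only pulled the pages into the file system cache, "resident" barely moves;
// it is actually querying the collection that makes mongod touch those pages.
print("resident before: " + before + " MB, after: " + after + " MB");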
Well, I am new to Mongo and this morning I had a (bad) idea. I was playing around with indexes from the shell and decided to create a large collection with many documents (100 million). So I executed the following command:
for (i = 1; i <= 100; i++) {
    for (j = 100; j > 0; j--) {
        for (k = 1; k <= 100; k++) {
            for (l = 100; l > 0; l--) {
                db.testIndexes.insert({a:i, b:j, c:k, d:l})
            }
        }
    }
}
However, things didn't go as I expected:
It took 45 minutes to complete the request.
It created 16 GB of data on my hard disk.
It used 80% of my RAM (8 GB total) and it wouldn't release it until I restarted my PC.
As you can see in the photo below, as the number of documents in the collection grew, the time to insert documents grew as well. I infer that from the last modification times of the data files:
Is this an expected behavior? I don't think that 100 million simple documents are too much.
P.S. I am now really afraid to run an ensureIndex command.
Edit:
I executed the following command:
> db.testIndexes.stats()
{
    "ns" : "test.testIndexes",
    "count" : 100000000,
    "size" : 7200000056,
    "avgObjSize" : 72.00000056,
    "storageSize" : 10830266336,
    "numExtents" : 28,
    "nindexes" : 1,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 3248014112,
    "indexSizes" : {
        "_id_" : 3248014112
    },
    "ok" : 1
}
So, the default index on _id alone is more than 3 GB in size.
It took 45 minutes to complete the request.
Not surprised.
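Much of that time goes to issuing one insert operation per document from the shell. A hedged sketch of the same experiment with the documents sent in batches instead (assumes a shell recent enough that insert() accepts an array, i.e. 2.6+; the batch size of 1000 is an arbitrary choice):
var batch = [];
for (var i = 1; i <= 100; i++) {
    for (var j = 100; j > 0; j--) {
        for (var k = 1; k <= 100; k++) {
            for (var l = 100; l > 0; l--) {
                batch.push({ a: i, b: j, c: k, d: l });
                if (batch.length === 1000) {
                    db.testIndexes.insert(batch); // one insert call per 1000 documents
                    batch = [];
                }
            }
        }
    }
}
if (batch.length > 0) db.testIndexes.insert(batch); // flush the final partial batch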
It created 16 GB of data on my hard disk.
As @Abhishek states, everything seems fine; MongoDB does use a fair amount of space without compression currently (that's hopefully coming later).
The data size is about 7.2 GB with an average object size of 72 bytes, which checks out (100 million documents × 72 bytes ≈ 7.2 GB); add the roughly 3 GB of overhead for the _id index and the storage size of about 10 GB fits quite well.
Though I am concerned that it has used about 6 GB more on disk than the statistics say it needs, and that might need more looking into. I am guessing it is down to how MongoDB wrote to the data files; it might even be because you were not using an acknowledged write concern (w > 0), i.e. the inserts were fire-and-forget. All in all: hmmm.
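For reference, the same breakdown can be printed directly from the stats shown in the edit above (a sketch, just re-reading the numbers):
var s = db.testIndexes.stats();
print("data:    " + (s.size / 1e9).toFixed(1) + " GB");           // ~7.2 GB
print("indexes: " + (s.totalIndexSize / 1e9).toFixed(1) + " GB"); // ~3.2 GB
print("storage: " + (s.storageSize / 1e9).toFixed(1) + " GB");    // ~10.8 GB allocated for documents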
It used 80% of my RAM (8 GB total) and it wouldn't release it until I restarted my PC.
MongoDB will try to take as much RAM as the OS will let it; if the OS lets it take 80%, then 80% it will take. This is actually a good sign - it shows that MongoDB has the right configuration values to store your working set efficiently.
When running ensureIndex, mongod will never free up RAM on its own; it simply has no hooks for that. Instead, the OS will shrink mongod's allocated pages to make room for other processes when they need memory (or at least it should).
This is expected behavior. MongoDB data files start at 16 MB (test.0) and grow up to 2 GB, after which each new file stays at a constant 2 GB.
100 million documents (16 GB) is nothing.
You can run ensureIndex; it shouldn't take much time.
You don't need to restart your PC; the moment another process needs RAM, mongod will free it.
FYI: test.12 is completely empty.
I am guessing you are not worried about the 16 GB size just for 100 million documents?