Different relationship between dataSize and storageSize in different databases - MongoDB

I am using MongoDB 4.0 and I have 2 different databases, as below.
The first one has these stats:
"collections" : 527,
"views" : 0,
"objects" : 20512406,
"avgObjSize" : 145.463036271805,
"dataSize" : 2983796858.0,
"storageSize" : 10980642816.0,
"numExtents" : 0,
"indexes" : 2335,
"indexSize" : 7409999872.0
and the other one has these stats:
"collections" : 483,
"views" : 0,
"objects" : 11765584,
"avgObjSize" : 6324.48132315404,
"dataSize" : 74411216264.0,
"storageSize" : 30270824448.0,
"numExtents" : 0,
"indexes" : 1632,
"indexSize" : 939061248.0,
I am using the WiredTiger storage engine and I know that it compresses data and keeps it on disk. My question is: why is storageSize larger than dataSize in the first database, but dataSize larger than storageSize in the second one?
And one more question: why is numExtents 0? I know it is supposed to count data extents and index extents, so why does it show 0?
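As a starting point for digging into this, here is a minimal shell sketch (an assumption-flagged sketch: it relies on the WiredTiger "block-manager" metric name as it appears in collStats output, and it skips error handling) that prints each collection's uncompressed data size, its on-disk storage size, and the space WiredTiger has marked as reusable:
db.getCollectionNames().forEach(function(name) {
    var s = db.getCollection(name).stats();
    // "file bytes available for reuse" is space inside the collection's file that
    // WiredTiger can recycle but has not yet returned to the operating system
    var reusable = (s.wiredTiger && s.wiredTiger["block-manager"])
        ? s.wiredTiger["block-manager"]["file bytes available for reuse"]
        : 0;
    print(name + "  dataSize=" + s.size + "  storageSize=" + s.storageSize + "  reusable=" + reusable);
});
Collections whose storageSize stays well above their dataSize despite compression usually carry a lot of reusable space left behind by deletes and updates.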

Related

What are the meanings of "storageSize" and "size" for a collection's sizes in MongoDB 3.2?

About the statistics of a collection
From MongoDB: The Definitive Guide, 2nd edition, by Kristina Chodorow (2013), which I guess covers MongoDB 2.4.0, it says that "storageSize" is greater than "size":
For seeing information about a whole collection, there is a stats function:
> db.boards.stats()
{
    "ns" : "brains.boards",
    "count" : 12,
    "size" : 32292,
    "avgObjSize" : 2691,
    "storageSize" : 270336,
    "numExtents" : 3,
    "nindexes" : 2,
    "lastExtentSize" : 212992,
    "paddingFactor" : 1.0099999999999825,
    "flags" : 1,
    "totalIndexSize" : 16352,
    "indexSizes" : {
        "_id_" : 8176,
        "username_1_slug_1" : 8176
    },
    "ok" : 1
}
"size" is what you’d get if you called Object.bsonsize() on each
element in the collection and added up all the sizes: it’s the actual
number of bytes the document in the collection are taking up.
Equivalently, if you take the "avgObjSize" and multiply it by "count",
you’ll get "size".
As mentioned above, a total count of the documents’ bytes leaves out
some important space a collection uses: the padding around each
document and the indexes. "storageSize" not only includes those, but
also empty space that has been set aside for the collection but not
yet used. Collections always have empty space at the “end” so that new
documents can be added quickly.
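As a quick sanity check, the relationship quoted above can be reproduced from the shell; this sketch uses the boards collection from the book example and the standard Object.bsonsize() helper:
// sum the BSON size of every document and compare it with the stats fields
var total = 0;
db.boards.find().forEach(function(doc) { total += Object.bsonsize(doc); });
var s = db.boards.stats();
print("summed bsonsize:    " + total);
print("size:               " + s.size);
print("avgObjSize * count: " + (s.avgObjSize * s.count));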
On my local computer, I experimented with MongoDB 3.2 to get the statistics of a collection, and found that "storageSize" is smaller than "size":
> db.c20160712.stats(1024)
{
    "ns" : "TVFS.c20160712",
    "count" : 2305,
    "size" : 231,
    "avgObjSize" : 102,
    "storageSize" : 80,
    ...
Do the meanings of "storageSize" and "size" change in MongoDB 3.2 from 2.x?
If yes, what do they mean in 3.2?
Thanks.

How can I find Mongo's unused data space

In Mongo, the storage size is pre-allocated. For example, in the db.stats() output below, the storageSize of 65536 is not fully used by the documents, but how can I tell how much free space is still available out of the pre-allocated storageSize?
"127.0.0.1:27018" : {
"db" : "test",
"collections" : 1,
"objects" : 10,
"avgObjSize" : 53.08618233618234,
"dataSize" : 530,
"storageSize" : 65536,
"numExtents" : 0,
"indexes" : 1,
"indexSize" : 532,
"ok" : 1
If I understand correctly, the answer should be equal to storageSize - dataSize?
This post can be really helpful: http://blog.mongolab.com/2014/01/how-big-is-your-mongodb/
And the db.stats() documentation for MongoDB: https://docs.mongodb.org/manual/reference/command/dbStats/, where you can find the meaning of every value returned by dbStats.
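As a small shell sketch of that calculation (hedged: the subtraction is only a rough estimate, and for WiredTiger collections the "file bytes available for reuse" figure from collStats is the more direct number; "myColl" below is a placeholder name):
var s = db.stats();
// rough estimate of allocated-but-unused space for the whole database
print("storageSize - dataSize = " + (s.storageSize - s.dataSize) + " bytes");
// per collection under WiredTiger:
// db.myColl.stats().wiredTiger["block-manager"]["file bytes available for reuse"]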

MongoDB runs much slower on a real server with WiredTiger than in Vagrant with MMAPv1

I'm having a weird problem. I'm measuring performance on a similar dataset with similar indexes (in fact, I just mongodumped/mongorestored it).
One instance is running locally in Vagrant (1 core, 4 GB, etc.); the other is running on a real server.
The version of MongoDB is 3.0.6.
So I ran that import on both the Vagrant instance and the server and got noticeably different performance results. In fact, the Vagrant instance executes the same query 3 to 5 times faster than the real server.
So I checked the db.stats() output. Here are the differences:
Real-life server:
> db.stats()
{
    "db" : "komparu_product_acc",
    "collections" : 1,
    "objects" : 30235,
    "avgObjSize" : 147517.09485695386,
    "dataSize" : 4460179363,
    "storageSize" : 1610596352,
    "numExtents" : 0,
    "indexes" : 16,
    "indexSize" : 2682880,
    "ok" : 1
}
And here's the Vagrant instance:
> db.stats()
{
    "db" : "komparu_product_dev",
    "collections" : 4,
    "objects" : 30273,
    "avgObjSize" : 261799.2074786113,
    "dataSize" : 7925447408,
    "storageSize" : 9727320048,
    "numExtents" : 27,
    "indexes" : 17,
    "indexSize" : 11233824,
    "fileSize" : 36423335936,
    "nsSizeMB" : 16,
    "extentFreeList" : {
        "num" : 52,
        "totalSize" : 24781381472
    },
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 22
    },
    "ok" : 1
}
Here are the stats for the live server collection:
http://pastebin.com/9vipBmQm
The same for the Vagrant one:
http://pastebin.com/HbbSi0Pu
As you can see, the live server runs the WiredTiger storage engine, and it compresses the data really tightly (compared to MMAPv1).
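(A quick way to put a number on that observation, using only the two db.stats() outputs above: divide dataSize by storageSize.)
// live server, WiredTiger: the data is ~2.8x larger than what ends up on disk
print(4460179363 / 1610596352)   // ~2.77
// Vagrant, MMAPv1: storage is larger than the data, as expected without compression
print(7925447408 / 9727320048)   // ~0.81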
For further context, here's the explain output for the same query on both environments:
Vagrant: http://pastebin.com/8h3iL5Fh
Live: http://pastebin.com/2DjX4YTg
How can I boost the performance of WiredTiger? Or should I switch back to MMAPv1?
Any hints are helpful!
Thank you!

How do I know the size of my MongoDB collections so that I can compare it with RAM size

When I googled for MongoDB best practices, I found that the size of a collection in MongoDB should be smaller than the server's available RAM.
I have got 6 collections in my MongoDB database.
Please tell me how I can find the size of the collections present in MongoDB.
The stats for one of my collections are:
db.chains.stats()
{
    "ns" : "at.chains",
    "count" : 2967,
    "size" : 89191980,
    "avgObjSize" : 30061.33468149646,
    "storageSize" : 335118080,
    "numExtents" : 18,
    "nindexes" : 3,
    "lastExtentSize" : 67136000,
    "paddingFactor" : 1.0099999999999996,
    "flags" : 1,
    "totalIndexSize" : 34742272,
    "indexSizes" : {
        "_id_" : 155648,
        "symbol_1" : 172032,
        "unique_symbol_1" : 34414592
    },
    "ok" : 1
}
Do I need to sum up the sizes of all 6 collections and compare that with the RAM size? Or is there any other way?
Thanks in advance.
You just need to call db.stats() in the MongoDB console; here is the MongoDB documentation about your question.
> db.stats();
{
    "db" : "test",
    "collections" : 5,
    "objects" : 24,
    "avgObjSize" : 67.33333333333333,
    "dataSize" : 1616,
    "storageSize" : 28672,
    "numExtents" : 5,
    "indexes" : 4,
    "indexSize" : 32704,
    "fileSize" : 201326592,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 5
    },
    "ok" : 1
}
You can calculate the size of your collection (chains) by looking at the "size" value. The 89191980 is in bytes, so it's roughly 85 MB. You can find the documentation here: http://docs.mongodb.org/manual/reference/command/collStats/
It'd be a good idea to take a look at this SO thread; it does a good job of covering RAM size and the "working set":
What does it mean to fit "working set" into RAM for MongoDB?
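If it helps, here is a small shell sketch (the field names are taken from the collStats output above) that sums data and index sizes across all collections, which is the figure usually compared against available RAM:
var totalData = 0, totalIndex = 0;
db.getCollectionNames().forEach(function(name) {
    var s = db.getCollection(name).stats();
    totalData  += s.size;            // "size" from collStats, in bytes
    totalIndex += s.totalIndexSize;  // "totalIndexSize" from collStats, in bytes
});
print("data:    " + (totalData / 1024 / 1024).toFixed(1) + " MB");
print("indexes: " + (totalIndex / 1024 / 1024).toFixed(1) + " MB");
Note that db.stats() already reports the same totals for the whole database as dataSize and indexSize, so the loop is only needed if you want the per-collection breakdown.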

Shards are not equally balanced in cluster

I have 2 shards.
One is on a standalone server and the other is on a replica set:
mongos> db.runCommand({listshards:1})
{
    "shards" : [
        {
            "_id" : "shard0000",
            "host" : "mongo3:10001"
        },
        {
            "_id" : "set1",
            "host" : "set1/mongo1:10001,mongo2:10001"
        }
    ],
    "ok" : 1
}
I've inserted about 30M records.
As far as I understand, Mongo should balance the data equally between the shards, but that does not happen:
mongos> db.stats()
{
    "raw" : {
        "set1/mongo1:10001,mongo2:10001" : {
            "db" : "my_ginger",
            "collections" : 3,
            "objects" : 5308714,
            "avgObjSize" : 811.9953284354742,
            "dataSize" : 4310650968,
            "storageSize" : 4707774464,
            "numExtents" : 23,
            "indexes" : 2,
            "indexSize" : 421252048,
            "fileSize" : 10666115072,
            "nsSizeMB" : 16,
            "ok" : 1
        },
        "mongo3:10001" : {
            "db" : "my_ginger",
            "collections" : 6,
            "objects" : 25162626,
            "avgObjSize" : 1081.6777010475776,
            "dataSize" : 27217851444,
            "storageSize" : 28086624096,
            "numExtents" : 38,
            "indexes" : 6,
            "indexSize" : 1903266512,
            "fileSize" : 34276900864,
            "nsSizeMB" : 16,
            "ok" : 1
        }
    },
    "objects" : 30471340,
    "avgObjSize" : 1034.6936633571088,
    "dataSize" : 31528502412,
    "storageSize" : 32794398560,
    "numExtents" : 61,
    "indexes" : 8,
    "indexSize" : 2324518560,
    "fileSize" : 44943015936,
    "ok" : 1
}
What am I doing wrong?
Thanks.
According to the sh.status() output in the comments, you have 164 chunks on shard0000 (the single host) and 85 on set1 (the replica set). There are a couple of common reasons this kind of imbalance can happen:
You picked a bad shard key (monotonically increasing or similar)
All your data was initially on a single shard and is being rebalanced
The balancer will be continuously attempting to move chunks from the high shard to the low one, while at the same time moving the max-chunk around (for people that pick the aforementioned monotonically increasing keys, this helps). However, there can only be one migration at a time, so this will take a while, especially if you continue writing to and reading from the shards at the same time. If things are really bad, and you did pick a poor shard key, then this may persist for some time.
If all your data was on one shard first and you then added another, you have a similar problem - it will take a while for the chunk count to stabilise because half the data has to be moved from the original shard (in addition to its other activities) to balance things out. The balancer will generally pick low-range chunks to move first, so if these are less likely to be in memory (back to the poor shard key again), they will have to be paged in before they can be migrated.
To check the balancer is running:
http://docs.mongodb.org/manual/reference/method/sh.setBalancerState/#sh.getBalancerState
Then, to see what it has been up to, connect to a mongos (last 10 operations):
use config
db.changelog.find().sort({$natural:-1}).limit(10).pretty()
Similarly, you will see messages in the primary logs of each shard relating to the migrations, how long they are taking, etc., if you want to see their performance.
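To see the current chunk distribution per shard that the answer refers to, here is a short sketch against the config database (sh.status() shows the same information in more detail):
use config
db.chunks.aggregate([
    { $group : { _id : "$shard", chunks : { $sum : 1 } } }
])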