I have two shards.
One is a standalone server and the other is a replica set:
mongos> db.runCommand({listshards:1})
{
"shards" : [
{
"_id" : "shard0000",
"host" : "mongo3:10001"
},
{
"_id" : "set1",
"host" : "set1/mongo1:10001,mongo2:10001"
}
],
"ok" : 1
}
I've inserted about 30M records.
As far as I understand, MongoDB should balance the data equally between the shards, but that is not happening:
mongos> db.stats()
{
"raw" : {
"set1/mongo1:10001,mongo2:10001" : {
"db" : "my_ginger",
"collections" : 3,
"objects" : 5308714,
"avgObjSize" : 811.9953284354742,
"dataSize" : 4310650968,
"storageSize" : 4707774464,
"numExtents" : 23,
"indexes" : 2,
"indexSize" : 421252048,
"fileSize" : 10666115072,
"nsSizeMB" : 16,
"ok" : 1
},
"mongo3:10001" : {
"db" : "my_ginger",
"collections" : 6,
"objects" : 25162626,
"avgObjSize" : 1081.6777010475776,
"dataSize" : 27217851444,
"storageSize" : 28086624096,
"numExtents" : 38,
"indexes" : 6,
"indexSize" : 1903266512,
"fileSize" : 34276900864,
"nsSizeMB" : 16,
"ok" : 1
}
},
"objects" : 30471340,
"avgObjSize" : 1034.6936633571088,
"dataSize" : 31528502412,
"storageSize" : 32794398560,
"numExtents" : 61,
"indexes" : 8,
"indexSize" : 2324518560,
"fileSize" : 44943015936,
"ok" : 1
}
What am I doing wrong?
Thanks.
According to the sh.status() output in the comments, you have 164 chunks on shard0000 (the single host) and 85 on set1 (the replica set). There are a couple of common reasons for this kind of imbalance:
You picked a bad shard key (monotonically increasing or similar)
All your data was initially on a single shard and is being rebalanced
The balancer will continuously attempt to move chunks from the shard with more chunks to the shard with fewer, while at the same time moving the max chunk around (for people who pick the aforementioned monotonically increasing keys, this helps). However, there can only be one migration at a time, so this will take some time, especially if you keep writing to and reading from the shards at the same time. If things are really bad, and you did pick a poor shard key, the imbalance may persist for quite a while.
If all your data was on one shard first and you then added another, you have a similar problem: it will take a while for the chunk counts to stabilise, because half the data has to be moved off the original shard (in addition to its other activities) to balance things out. The balancer generally picks low-range chunks to move first, so if those are less likely to be in memory (back to the poor shard key again), they will have to be paged in before they can be migrated.
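You can also watch the chunk counts converge without re-running sh.status() by counting chunks per shard straight from the config database; a minimal sketch, run against a mongos:
mongos> use config
mongos> db.chunks.aggregate([ { $group : { _id : "$shard", nChunks : { $sum : 1 } } } ])
Once the two nChunks values are roughly equal the balancer has done its job; the raw data sizes can still differ, since your avgObjSize values differ between the shards.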
To check whether the balancer is running:
http://docs.mongodb.org/manual/reference/method/sh.setBalancerState/#sh.getBalancerState
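For example, from a mongos (a quick sketch):
mongos> sh.getBalancerState()      // true means the balancer is enabled
mongos> sh.isBalancerRunning()     // true means a balancing round is in progress right now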
Then, to see what it has been up to, connect to a mongos (last 10 operations):
use config
db.changelog.find().sort({$natural:-1}).limit(10).pretty()
Similarly, you will see messages in the primary log of each shard relating to the migrations, how long they are taking, etc., if you want to gauge their performance.
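If you prefer to stay in the shell, each primary's recent log lines can also be pulled with the getLog command and filtered for migration activity; a rough sketch (run it on the shard primary, not the mongos):
> var recent = db.adminCommand({ getLog : "global" })     // the last ~1024 log lines held in memory
> recent.log.filter(function (line) { return /moveChunk/.test(line); })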
About the statistics of a collection
MongoDB: The Definitive Guide, 2nd edition, by Kristina Chodorow (2013), which I guess covers MongoDB 2.4.0, says that "storageSize" is greater than "size":
For seeing information about a whole collection, there is a stats function:
> db.boards.stats()
{
"ns" : "brains.boards",
"count" : 12,
"size" : 32292,
"avgObjSize" : 2691,
"storageSize" : 270336,
"numExtents" : 3,
"nindexes" : 2,
"lastExtentSize" : 212992,
"paddingFactor" : 1.0099999999999825,
"flags" : 1,
"totalIndexSize" : 16352,
"indexSizes" : {
"_id_" : 8176,
"username_1_slug_1" : 8176
},
"ok" : 1
}
"size" is what you’d get if you called Object.bsonsize() on each
element in the collection and added up all the sizes: it’s the actual
number of bytes the document in the collection are taking up.
Equivalently, if you take the "avgObjSize" and multiply it by "count",
you’ll get "size".
As mentioned above, a total count of the documents’ bytes leaves out
some important space a collection uses: the padding around each
document and the indexes. "storageSize" not only includes those, but
also empty space that has been set aside for the collection but not
yet used. Collections always have empty space at the “end” so that new
documents can be added quickly.
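For instance, you can check that relationship yourself in the shell; a small sketch against the boards collection from the excerpt above:
> var total = 0;
> db.boards.find().forEach(function (doc) { total += Object.bsonsize(doc); });
> total                                                    // should be close to "size" (32292 here)
> db.boards.stats().avgObjSize * db.boards.stats().count   // 2691 * 12 = 32292, i.e. "size"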
On my local computer, I experimented with MongoDB 3.2 to get the statistics of a collection, and found that "storageSize" is smaller than "size":
> db.c20160712.stats(1024)
{
"ns" : "TVFS.c20160712",
"count" : 2305,
"size" : 231,
"avgObjSize" : 102,
"storageSize" : 80,
...
Do the meanings of "storageSize" and "size" change in MongoDB 3.2 from 2.x?
If yes, what do they mean in 3.2?
Thanks.
I'm having a weird problem. I'm measuring performance on a similar dataset with similar indexes (in fact I just mongodumped/mongorestored it).
One instance is running locally in Vagrant (1 core, 4 GB of RAM, etc.); the other is running on a server.
The version of MongoDB is 3.0.6.
So I ran the import on both and got noticeably different performance. In fact, the Vagrant instance executes the same query 3 to 5 times faster than the real server.
So I checked db.stats() output. And here are the differences:
Real-life server:
> db.stats()
{
"db" : "komparu_product_acc",
"collections" : 1,
"objects" : 30235,
"avgObjSize" : 147517.09485695386,
"dataSize" : 4460179363,
"storageSize" : 1610596352,
"numExtents" : 0,
"indexes" : 16,
"indexSize" : 2682880,
"ok" : 1
}
And here's the Vagrant instance:
> db.stats()
{
"db" : "komparu_product_dev",
"collections" : 4,
"objects" : 30273,
"avgObjSize" : 261799.2074786113,
"dataSize" : 7925447408,
"storageSize" : 9727320048,
"numExtents" : 27,
"indexes" : 17,
"indexSize" : 11233824,
"fileSize" : 36423335936,
"nsSizeMB" : 16,
"extentFreeList" : {
"num" : 52,
"totalSize" : 24781381472
},
"dataFileVersion" : {
"major" : 4,
"minor" : 22
},
"ok" : 1
}
Here are the stats for the live server collection:
http://pastebin.com/9vipBmQm
Same for the Vagrant one:
http://pastebin.com/HbbSi0Pu
As you can see, the live server runs the WiredTiger storage engine, and I can see that it compresses the data really tightly (compared to MMAPv1).
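A quick way to confirm which engine and compressor each side is actually using (a sketch; the collection name below is just a placeholder for one of yours):
> db.serverStatus().storageEngine.name                            // "wiredTiger" on the live server, "mmapv1" in Vagrant
> db.getCollection("products").stats().wiredTiger.creationString  // includes e.g. block_compressor=snappy when compression is on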
For completeness, here is the explain output for the same query on both environments:
Vagrant: http://pastebin.com/8h3iL5Fh
Live: http://pastebin.com/2DjX4YTg
How can I boost the performance of wiredTiger? Or should I switch back to MMAPv1?
Any hints are helpful!
Thank you!
I created a database named test in MongoDB, containing a collection named col. The command show dbs gives:
admin (empty)
local 0.078GB
test 1.953GB
(I really don't know why the size is >1.9 GB, since there are only 21 small documents in the collection col.)
The command db.stats() tells me that there are 3 collections:
{
"db" : "test",
"collections" : 3,
"objects" : 25,
"avgObjSize" : 96.64,
"dataSize" : 2416,
"storageSize" : 857456640,
"numExtents" : 19,
"indexes" : 1,
"indexSize" : 8176,
"fileSize" : 2080374784,
"nsSizeMB" : 16,
"dataFileVersion" : {
"major" : 4,
"minor" : 5
},
"extentFreeList" : {
"num" : 2,
"totalSize" : 139264
},
"ok" : 1
}
But when I type show collections, only two collections are listed:
col
system.indexes
So, where does this third collection come from?
And does it explain why the test database is 1.9 GB large? db.col.stats() tells me that a lot of space is used, but the 21 documents are really small:
{
"ns" : "test.col",
"count" : 21,
"size" : 2160,
"avgObjSize" : 102,
"storageSize" : 857440256,
"numExtents" : 17,
"nindexes" : 1,
"lastExtentSize" : 227803136,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 8176,
"indexSizes" : {
"_id_" : 8176
},
"ok" : 1
}
The size
Ok, first, as for the size: show dbs shows the size of the actual data files, not their content.
MongoDB preallocates data files of fixed sizes, usually beginning with 64 MB (plus a namespace file) and doubling with each step until 2 GB is reached; each subsequent data file is 2 GB in size. A new data file is allocated as soon as the last allocated data file receives its first entry, thereby eliminating data-file allocation latency for requests.
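A quick sanity check against your numbers, assuming the default preallocation sequence (a sketch you can paste into the shell):
> [64, 128, 256, 512, 1024].reduce(function (a, b) { return a + b; })   // data files allocated so far, in MB
1984
> 1984 * 1024 * 1024          // matches "fileSize" : 2080374784 from your db.stats()
2080374784
> (1984 + 16) / 1024          // plus the 16 MB namespace file ≈ the 1.953 GB shown by show dbs
1.953125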
It may well be the case that your test database (being the default database) was much bigger at some point, but documents or even whole collections were deleted. However, as of the time of this writing, mongod never reclaims disk space automatically; it keeps the allocated data files for future use.
The "Ghost" collection
As for the "Ghost" collection: Yes, there is one. It is the namespace collection of the database (not surprisingly called system.namespaces), which is implicit. Have a look at the output of
db.system.namespaces.find()
That being said: Don't fiddle with any of the system.* collections. They don't have their name for fun.
When I googled for MongoDB best practices, I found out that the size of a collection in MongoDB should be smaller than the amount of RAM in the machine.
I have 6 collections in my MongoDB database.
Please tell me how I can find out the size of the collections present in MongoDB.
The stats for one of my collections are:
db.chains.stats()
{
"ns" : "at.chains",
"count" : 2967,
"size" : 89191980,
"avgObjSize" : 30061.33468149646,
"storageSize" : 335118080,
"numExtents" : 18,
"nindexes" : 3,
"lastExtentSize" : 67136000,
"paddingFactor" : 1.0099999999999996,
"flags" : 1,
"totalIndexSize" : 34742272,
"indexSizes" : {
"_id_" : 155648,
"symbol_1" : 172032,
"unique_symbol_1" : 34414592
},
"ok" : 1
}
Do I need to sum up the sizes of all 6 collections and compare that with the RAM size?
Or is there another way?
Thanks in advance.
You just need to call db.stats() in the MongoDB console; the MongoDB documentation describes this command and its output.
> db.stats();
{
"db" : "test",
"collections" : 5,
"objects" : 24,
"avgObjSize" : 67.33333333333333,
"dataSize" : 1616,
"storageSize" : 28672,
"numExtents" : 5,
"indexes" : 4,
"indexSize" : 32704,
"fileSize" : 201326592,
"nsSizeMB" : 16,
"dataFileVersion" : {
"major" : 4,
"minor" : 5
},
"ok" : 1
You can calculate the size of your collection (chains) by looking at the "size" value. The 89191980 is in bytes, so it's roughly 85 MB. You can find the documentation here: http://docs.mongodb.org/manual/reference/command/collStats/
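If you want the numbers in friendlier units, or a single total across all 6 collections, here is a small sketch (stats() accepts a scale argument, and db.getCollectionNames() also returns system collections):
> db.chains.stats(1024 * 1024).size          // "size" scaled to MB
> var totalBytes = 0;
> db.getCollectionNames().forEach(function (name) { totalBytes += db.getCollection(name).stats().size; });
> totalBytes / (1024 * 1024)                 // data size of all collections, in MB
That said, the "dataSize" field of db.stats() already gives you the same total in one call.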
It'd be a good idea to take a look at this SO thread; it does a good job of covering RAM size and the "working set":
What does it mean to fit "working set" into RAM for MongoDB?
I have 57M documents in my MongoDB collection, which is 19 GB of data.
My indexes are taking up 10 GB. Does this sound normal, or could I be doing something very wrong? My primary key (_id) index alone is 2 GB.
{
"ns" : "myDatabase.logs",
"count" : 56795183,
"size" : 19995518140,
"avgObjSize" : 352.0636272974065,
"storageSize" : 21217578928,
"numExtents" : 39,
"nindexes" : 4,
"lastExtentSize" : 2146426864,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 10753999088,
"indexSizes" : {
"_id_" : 2330814080,
"type_1_playerId_1" : 2999537296,
"type_1_time_-1" : 2344582464,
"type_1_tableId_1" : 3079065248
},
"ok" : 1
}
The index size is determined by the number of documents being indexed, as well as by the size of the key (compound keys store more information and will be larger). In this case, the _id index size divided by the number of documents comes out to roughly 40 bytes per document, which seems reasonable.
If you run db.collection.getIndexes(), you can find the index version. If it is {v : 0}, the index was created prior to MongoDB 2.0, in which case you should upgrade it to {v : 1}. This process is documented here: http://www.mongodb.org/display/DOCS/Index+Versions
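For example, to see the version of each index on your collection (a sketch; look at the "v" field in each index document):
> db.logs.getIndexes().forEach(printjson)
If any turn out to be {v : 0}, rebuilding the index recreates it in the current format; db.logs.reIndex() is one way to do that, but it blocks the database while it runs, so check the linked documentation and plan for a maintenance window.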