Why are my mongodb indexes so large - mongodb

I have 57M documents in my mongodb collection, which is 19G of data.
My indexes are taking up 10G. Does this sound normal, or could I be doing something very wrong? My primary key index is 2G.
{
"ns" : "myDatabase.logs",
"count" : 56795183,
"size" : 19995518140,
"avgObjSize" : 352.0636272974065,
"storageSize" : 21217578928,
"numExtents" : 39,
"nindexes" : 4,
"lastExtentSize" : 2146426864,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 10753999088,
"indexSizes" : {
"_id_" : 2330814080,
"type_1_playerId_1" : 2999537296,
"type_1_time_-1" : 2344582464,
"type_1_tableId_1" : 3079065248
},
"ok" : 1
}

The index size is determined by the number of documents being indexed and by the size of each key (compound keys store more information, so their indexes are larger). In this case, the size of the _id index divided by the number of documents is about 41 bytes per document, which seems reasonable.
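As a rough sketch of that per-document arithmetic, the snippet below divides each index size by the document count. The numbers are copied from the stats() output in the question; the function name bytesPerDocument is just an illustrative choice, and in the mongo shell you could presumably pass the result of db.logs.stats() instead of the hard-coded object.

```javascript
// Values copied from the stats() output in the question.
const logsStats = {
  count: 56795183,
  indexSizes: {
    "_id_": 2330814080,
    "type_1_playerId_1": 2999537296,
    "type_1_time_-1": 2344582464,
    "type_1_tableId_1": 3079065248
  }
};

// Average number of index bytes per document, for each index.
// (bytesPerDocument is a hypothetical helper, not a MongoDB function.)
function bytesPerDocument(stats) {
  const result = {};
  for (const [name, size] of Object.entries(stats.indexSizes)) {
    result[name] = size / stats.count;
  }
  return result;
}

for (const [name, bytes] of Object.entries(bytesPerDocument(logsStats))) {
  console.log(name + ": " + bytes.toFixed(1) + " bytes per document");
}
```

The compound indexes come out larger per document than _id_, which matches the point about compound keys storing more information.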
If you run db.collection.getIndexes(), you can find the index version. If it shows {v : 0}, the index was created prior to MongoDB 2.0, in which case you should upgrade it to {v : 1}. This process is documented here: http://www.mongodb.org/display/DOCS/Index+Versions
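A minimal sketch of that version check, assuming the array below resembles what getIndexes() returns. The sample entries and the helper name legacyIndexNames are made up for illustration; in the shell you would pass the real getIndexes() result.

```javascript
// Hypothetical getIndexes()-style output; names and versions are invented.
const sampleIndexes = [
  { v: 1, key: { _id: 1 }, name: "_id_" },
  { v: 0, key: { type: 1, playerId: 1 }, name: "type_1_playerId_1" }
];

// Return the names of indexes still on the old {v : 0} format.
// (legacyIndexNames is a hypothetical helper, not a MongoDB function.)
function legacyIndexNames(indexes) {
  return indexes
    .filter((index) => index.v === 0)
    .map((index) => index.name);
}

console.log(legacyIndexNames(sampleIndexes));
```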

Related

What are the meanings of "storageSize" and "size" for a collection's sizes in MongoDB 3.2?

About the statistics of a collection
MongoDB: The Definitive Guide, 2nd ed., by Kristina Chodorow (2013),
which I guess uses MongoDB 2.4.0, says that "storageSize" is
greater than "size":
For seeing information about a whole collection, there is a stats function:
> db.boards.stats()
{
"ns" : "brains.boards",
"count" : 12,
"size" : 32292,
"avgObjSize" : 2691,
"storageSize" : 270336,
"numExtents" : 3,
"nindexes" : 2,
"lastExtentSize" : 212992,
"paddingFactor" : 1.0099999999999825,
"flags" : 1,
"totalIndexSize" : 16352,
"indexSizes" : {
"_id_" : 8176,
"username_1_slug_1" : 8176
},
"ok" : 1
}
"size" is what you’d get if you called Object.bsonsize() on each
element in the collection and added up all the sizes: it’s the actual
number of bytes the documents in the collection are taking up.
Equivalently, if you take the "avgObjSize" and multiply it by "count",
you’ll get "size".
As mentioned above, a total count of the documents’ bytes leaves out
some important space a collection uses: the padding around each
document and the indexes. "storageSize" not only includes those, but
also empty space that has been set aside for the collection but not
yet used. Collections always have empty space at the “end” so that new
documents can be added quickly.
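The relationship between "avgObjSize", "count", and "size" described in the quote can be checked against the boards example with a small sketch. The helper name sizeFromAverage is an illustrative assumption; the numbers come from the stats() output quoted above.

```javascript
// Values copied from the db.boards.stats() output quoted from the book.
const boardsStats = { count: 12, size: 32292, avgObjSize: 2691 };

// Reconstruct "size" from the average document size and the document count.
// (sizeFromAverage is a hypothetical helper, not a MongoDB function.)
function sizeFromAverage(stats) {
  return stats.avgObjSize * stats.count;
}

console.log(sizeFromAverage(boardsStats) === boardsStats.size); // prints true
```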
On my local computer, I experimented with MongoDB 3.2 to get the
statistics of a collection, and found that "storageSize" is smaller
than "size":
> db.c20160712.stats(1024)
{
"ns" : "TVFS.c20160712",
"count" : 2305,
"size" : 231,
"avgObjSize" : 102,
"storageSize" : 80,
...
Do the meanings of "storageSize" and "size" change in MongoDB 3.2 from 2.x?
If yes, what do they mean in 3.2?
Thanks.

Size of MongoDB capped collection is not right?

I have a capped collection. When I call stats() on it I get following output:
/* 0 */
{
"ns" : "log_db.access_logs",
"count" : 42088,
"size" : 13602632,
"avgObjSize" : 323,
"storageSize" : 100003840,
"numExtents" : 1,
"nindexes" : 2,
"lastExtentSize" : 100003840,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 0,
"totalIndexSize" : 3932656,
"indexSizes" : {
"_id_" : 1389920,
"apikey_1_ts_1" : 2542736
},
"capped" : true,
"max" : NumberLong(9223372036854775807),
"ok" : 1
}
I don't know what the 'max' parameter is showing. I created this collection using the db.runCommand('convertToCapped' ... ) construct and set the 'size' parameter to 100 million, but there is no mention of it (size) here.
Can someone explain the meaning of the 'max' field here, and also how to find the correct size of a capped collection?
I had this exact question once, since it was not in the documentation. This is the answer I got from a 10gen (MongoDB Inc) employee in the thread "mongodb capped collection is not using all available space":
The max property is an (optional) maximum number of documents to allow in the capped collection (see db.createCollection()). If you don't specify a max value, it will be set to MAXINT (in your example, the maximum positive value for a signed Int64). Since space for capped collections is always preallocated, the size limit takes precedence over the max number of documents.
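One way to read the max value programmatically is sketched below. It treats the signed Int64 maximum from the output above as "no document limit", and also treats 2147483647 (the signed Int32 maximum) the same way; that second sentinel is an assumption based on output from older versions. The helper name hasDocumentLimit is hypothetical. The Int64 value is passed as a string because a plain JavaScript number cannot represent it exactly.

```javascript
// Sentinel values assumed to mean "no max was specified".
const INT64_MAX = 9223372036854775807n; // seen in the stats() output above
const INT32_MAX = 2147483647n;          // assumption: reported by some older versions

// True when max looks like a real document limit rather than a sentinel.
// (hasDocumentLimit is a hypothetical helper, not a MongoDB function.)
function hasDocumentLimit(max) {
  const value = BigInt(max);
  return value !== INT64_MAX && value !== INT32_MAX;
}

console.log(hasDocumentLimit("9223372036854775807")); // sentinel from the question
console.log(hasDocumentLimit(5000));                  // an explicit limit
```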

MongoDB - totalSize of a collection

Assume a collection name test has the following data
{ a : 1}
{ a : 2}
Also, it is indexed on {a : 1}
> db.test.stats()
{
"ns" : "mydb.test",
"count" : 2,
"size" : 96,
"avgObjSize" : 48,
"storageSize" : 8192,
"numExtents" : 1,
"nindexes" : 2,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 16352,
"indexSizes" : {
"_id_" : 8176,
"a_1" : 8176
},
"ok" : 1
}
> db.test.totalSize()
24544
As per the documentation, it returns "the total size of the data in the collection plus the size of every index on the collection". How? From the data above:
total size of the data -> "size" : 96
size of all indexes -> 8176 * 2 -> 16352
Total -> 16352 + 96 = 16448
Why is there a difference? What am I missing?
The totalSize is equal to storageSize + totalIndexSize. As you will notice, these add up to exactly 24544.
To avoid constant reallocation of new hard-drive space and the resulting filesystem fragmentation when new documents are added to a collection, MongoDB overallocates storage space for each collection in advance. As a result, the storage space allocated to a collection here is larger than the total size of its documents.
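The sum can be checked with a short sketch using the numbers from the stats() output in the question. The helper name totalSizeFromStats is an illustrative assumption, not part of the shell.

```javascript
// Values copied from the db.test.stats() output in the question.
const testStats = { size: 96, storageSize: 8192, totalIndexSize: 16352 };

// totalSize as described in the answer: allocated storage plus all indexes.
// (totalSizeFromStats is a hypothetical helper, not a MongoDB function.)
function totalSizeFromStats(stats) {
  return stats.storageSize + stats.totalIndexSize;
}

console.log(totalSizeFromStats(testStats)); // prints 24544
```

Using "size" (96) in place of "storageSize" (8192) gives 16448, which is why the calculation in the question does not match.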

How do i know the size of my mongodb collections so that i can compare that with RAM Size

When I googled MongoDB best practices, I found that the size of a collection in MongoDB should be smaller than the RAM size of the machine.
I have 6 collections in my MongoDB database.
Please tell me how I can find the size of the collections in MongoDB.
The stats for one of my collections are:
db.chains.stats()
{
"ns" : "at.chains",
"count" : 2967,
"size" : 89191980,
"avgObjSize" : 30061.33468149646,
"storageSize" : 335118080,
"numExtents" : 18,
"nindexes" : 3,
"lastExtentSize" : 67136000,
"paddingFactor" : 1.0099999999999996,
"flags" : 1,
"totalIndexSize" : 34742272,
"indexSizes" : {
"_id_" : 155648,
"symbol_1" : 172032,
"unique_symbol_1" : 34414592
},
"ok" : 1
}
Do I need to sum up the sizes of all 6 collections and compare that with the RAM size?
Or is there another way?
Thanks in advance.
You just need to call db.stats(); in the MongoDB console. The MongoDB website documents this command.
> db.stats();
{
"db" : "test",
"collections" : 5,
"objects" : 24,
"avgObjSize" : 67.33333333333333,
"dataSize" : 1616,
"storageSize" : 28672,
"numExtents" : 5,
"indexes" : 4,
"indexSize" : 32704,
"fileSize" : 201326592,
"nsSizeMB" : 16,
"dataFileVersion" : {
"major" : 4,
"minor" : 5
},
"ok" : 1
}
You can calculate the size of your collection (chains) by looking at the "size" value. The 89191980 is in bytes, so it's roughly 85 MB. You can find the documentation here: http://docs.mongodb.org/manual/reference/command/collStats/
It'd be a good idea to take a look at this SO thread, they do a good job of covering RAM size & "working set".
What does it mean to fit "working set" into RAM for MongoDB?
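The byte-to-megabyte conversion from the answer, plus a rough total to compare against RAM, might look like the sketch below. Adding "size" and "totalIndexSize" across collections is an assumption about what to compare (a crude upper bound, not the working set itself). The helper names toMiB and dataPlusIndexBytes are hypothetical, and only the chains numbers come from the question.

```javascript
// Values copied from the db.chains.stats() output in the question.
const chainsStats = { size: 89191980, totalIndexSize: 34742272 };

// Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
function toMiB(bytes) {
  return bytes / (1024 * 1024);
}

// Sum data and index bytes over a list of per-collection stats objects.
// (Assumption: data + indexes is a crude upper bound to compare with RAM.)
function dataPlusIndexBytes(statsList) {
  return statsList.reduce(
    (total, stats) => total + stats.size + stats.totalIndexSize,
    0
  );
}

console.log("chains data: " + toMiB(chainsStats.size).toFixed(1) + " MiB");
console.log(
  "chains data + indexes: " +
    toMiB(dataPlusIndexBytes([chainsStats])).toFixed(1) + " MiB"
);
```

With all six collections, you would pass each collection's stats object in the array.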

What does the max field mean in the output of db.<collectionname>.stats( )?

I am looking at the output of db.system.profile.stats() and I'm curious about what the max field means in the returned document (running mongodb 2.2.2).
Here's an example:
> db.system.profile.stats()
{
"ns" : "mydb.system.profile",
"count" : 2476,
"size" : 1012284,
"avgObjSize" : 408.83844911147014,
"storageSize" : 1052672,
"numExtents" : 2,
"nindexes" : 0,
"lastExtentSize" : 4096,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 0,
"totalIndexSize" : 0,
"indexSizes" : {
},
"capped" : true,
"max" : 2147483647,
"ok" : 1
}
There is no mention of max in the official MongoDB documentation of db.collection.stats().
Perhaps it has something to do with the fact that system.profile is a capped collection. Although max is definitely not the maximum size of the capped collection because (1) the max shown is a huge number and (2) my collection doesn't get larger than 2500 or so documents and the total size is much less than this.
Any thoughts?
Thanks,
Kevin
max is an optional setting for a capped collection to also limit the number of documents in the collection, instead of just limiting by number of bytes (size).
See docs here.