Mongo is taking a HUGE amount of space to save this data? - mongodb

The gist at https://gist.github.com/1173528#comments shows the structure of the data file.
The short version is:
{ "img_ref" : {
"$ref" : "mapimage",
"$id" : ObjectId("4e454599f404e8d51c000002")
},
"scale" : 128, "image" : "4e454599f404e8d51c000002", "tile_i" : 0, "tile_j" : 9, "w" : 9, "e" : 10, "n" : 0, "s" : 0,
"heights" : [
[
0,
2,
0,
1,
515,
0,
256,
...], [...]
], "_id" : ObjectId("...") }
The stats() output for this collection is:
{
"ns" : "ac2.mapimage_tile",
"count" : 18443,
"size" : 99513670744,
"avgObjSize" : 5395742.056281516,
"storageSize" : 100336473712,
"numExtents" : 74,
"nindexes" : 4,
"lastExtentSize" : 2146426864,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 5832704,
"indexSizes" : {
"_id_" : 786432,
"img_ref_1_tile_i_1_tile_j_1" : 2236416,
"image_1" : 1212416,
"image_1_tile_i_1_tile_j_1_scale_1" : 1597440
},
"ok" : 1
}
Note the average object size, 5,395,742 bytes - or 5 MB! 5 MB for storing 16,384 ints seems pretty extreme!

See http://bsonspec.org/#/specification for how things get serialized in MongoDB. Arrays are actually very space-inefficient, especially because the index number is stored as a string key for each element. This is less of a problem for small arrays of large elements like strings or objects, but for large arrays of 32-bit ints it is very expensive.
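The per-element overhead described above can be estimated by hand. A minimal sketch, assuming a flat array of int32 values encoded as the BSON specification describes (an int32 length prefix, then for each element one type byte, the index as a NUL-terminated string, and the 4-byte value, then a trailing zero byte). The function name `bsonInt32ArraySize` is made up for this illustration. It shows the key overhead only and does not by itself reproduce the 5 MB figure from the question.

```javascript
// Estimated size in bytes of a BSON array of n int32 values.
// Layout per the BSON spec: int32 total length, the elements, a 0x00 terminator.
// Each element: 1 type byte + index as a C string (digits + NUL) + 4 value bytes.
function bsonInt32ArraySize(n) {
  let total = 4 + 1; // length prefix + terminator
  for (let i = 0; i < n; i++) {
    total += 1 + String(i).length + 1 + 4;
  }
  return total;
}

const elementCount = 16384;            // number of ints mentioned in the question
const encodedBytes = bsonInt32ArraySize(elementCount);
const rawBytes = elementCount * 4;     // the same ints as a packed 32-bit buffer
console.log(encodedBytes, rawBytes, (encodedBytes / rawBytes).toFixed(2));
```

Under these assumptions the keys and type bytes cost more than the values themselves, which is why packing the ints into a binary field is a common way to shrink documents like this.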

MongoDB pre-allocates space for its databases: http://www.mongodb.org/display/DOCS/Developer+FAQ#DeveloperFAQ-Whyaremydatafilessolarge%3F
What you are likely seeing is that pre-allocation. If you add further items, you probably will not see a further increase in space usage for a long while.
See also: http://www.mongodb.org/display/DOCS/Excessive+Disk+Space

Related

MongoDB - totalSize of a collection

Assume a collection named test has the following data:
{ a : 1}
{ a : 2}
Also, it is indexed on {a : 1}
> db.test.stats()
{
"ns" : "mydb.test",
"count" : 2,
"size" : 96,
"avgObjSize" : 48,
"storageSize" : 8192,
"numExtents" : 1,
"nindexes" : 2,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 16352,
"indexSizes" : {
"_id_" : 8176,
"a_1" : 8176
},
"ok" : 1
}
> db.test.totalSize()
24544
As per the documentation, it returns the total size of the data in the collection plus the size of every index on the collection. How? From the data above:
total size of the data -> "size" : 96
size of every index -> 8176 * 2 -> 16352
Total -> 16352 + 96 = 16448
Why is there a difference? What am I missing?
The totalSize is equal to storageSize + totalIndexSize. As you will notice, these add up to exactly 24544.
To avoid constant reallocation of new hard-drive space and the resulting filesystem fragmentation when new documents are added to a collection, MongoDB overallocates storage space for each collection in advance. As a result, the total space used by a collection is always larger than the sum of its data.
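The arithmetic in this answer can be checked directly against the stats output from the question. A small sketch in plain JavaScript; the `collTotalSize` helper is a name made up here and only mirrors the sum described above, not the shell's actual implementation.

```javascript
// Values copied from the db.test.stats() output in the question.
const testStats = {
  size: 96,
  storageSize: 8192,
  totalIndexSize: 16352,
  indexSizes: { _id_: 8176, a_1: 8176 },
};

// Hypothetical helper mirroring the answer: allocated storage plus all indexes.
function collTotalSize(stats) {
  return stats.storageSize + stats.totalIndexSize;
}

// What the asker expected instead: document bytes plus all indexes.
function dataPlusIndexes(stats) {
  return stats.size + stats.totalIndexSize;
}

console.log(collTotalSize(testStats), dataPlusIndexes(testStats));
```

The gap between the two results is storageSize minus size, that is, the space allocated for the collection but not yet filled with documents.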

exception: BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB)

I'm trying to use full-text search: http://docs.mongodb.org/manual/tutorial/search-for-text/
db['Item'].runCommand( 'text', { search: 'deep voice' , language: 'english' } )
It works well,
but when I add conditions
db['Item'].runCommand( 'text', { search: 'deep voice' , language: 'english' , filter: {"$and":[{"_extendedBy":{"$in":["Voiceover"]}},{"$and":[{"$or":[{"removed":null},{"removed":{"$exists":false}}]},{"category":ObjectId("51bc464ab012269e23278d55")},{"active":true},{"visible":true}]}]} } )
I receive an error:
{
"queryDebugString" : "deep|voic||||||",
"language" : "english",
"errmsg" : "exception: BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: _extendedBy: \"Voiceover\"",
"code" : 10334,
"ok" : 0
}
If I delete the word "voice":
db['Item'].runCommand( 'text', { search: 'deep' , language: 'english' , filter: {"$and":[{"_extendedBy":{"$in":["Voiceover"]}},{"$and":[{"$or":[{"removed":null},{"removed":{"$exists":false}}]},{"category":ObjectId("51bc464ab012269e23278d55")},{"active":true},{"visible":true}]}]} } );
I receive a normal response:
...... ......
],
"stats" : {
"nscanned" : 87,
"nscannedObjects" : 87,
"n" : 18,
"nfound" : 18,
"timeMicros" : 1013
},
"ok" : 1
}
I can't understand why the error occurs.
The database is not large ("storageSize" : 2793472):
db.Item.stats()
{
"ns" : "internetjock.Item",
"count" : 616,
"size" : 2035840,
"avgObjSize" : 3304.935064935065,
"storageSize" : 2793472,
"numExtents" : 5,
"nindexes" : 12,
"lastExtentSize" : 2097152,
"paddingFactor" : 1.0000000000001221,
"systemFlags" : 0,
"userFlags" : 1,
"totalIndexSize" : 7440160,
"indexSizes" : {
"_id_" : 24528,
"modlrHff22a60ae822e1e68ba919bbedcb8957d5c5d10f" : 40880,
"modlrH6f786b134a46c37db715aa2c831cfbe1fadb9d1d" : 40880,
"modlrI467f6180af484be29ee9258920fc4837992c825e" : 24528,
"modlrI5cb302f507b9d0409921ac0c51f7d9fc4fd5d2ee" : 40880,
"modlrI6393f31b5b6b4b2cd9517391dabf5db6d6dd3c28" : 8176,
"modlrI1c5cbf0ce48258a5a39c1ac54a1c1a038ebe1027" : 32704,
"modlrH6e623929cc3867746630bae4572b9dbe5bd3b9f7" : 40880,
"modlrH72ea9b8456321008fd832ef9459d868800ce87cb" : 40880,
"modlrU821e16c04f9069f8d0b705d78d8f666a007c274d" : 24528,
"modlrT88fc09e54b17679b0028556344b50c9fe169bdb5" : 7080416,
"modlrIefa804b72cc346d66957110e286839a3f42793ef" : 40880
},
"ok" : 1
}
I had the same problem with mongo 3.0.0 and 3.1.9 on a relatively small database (12 GB).
After wasting roughly 4 hours on this, I found a workaround using a hidden parameter:
mongorestore --batchSize=10
where the number varies depending on the nature of your data. Start with 1000.
The result document returned by the first query is apparently greater than 16 MB. MongoDB has a maximum document size of 16 MB. The second query returns a document that is smaller than 16 MB, hence no error.
There's no way around this. Here's the link to the documentation:
http://docs.mongodb.org/manual/reference/limits/
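The two numbers in the error message can be checked with a little arithmetic. A minimal sketch with made-up names (`asInt32`, `sizeLooksValid`, `REPORTED_UPPER_BOUND`), not MongoDB internals: it reads the bit pattern 0xEEEEEEEE as a signed 32-bit integer and compares it with the bounds quoted in the message. Whether the real bounds are inclusive is an assumption here.

```javascript
// Reinterpret an unsigned 32-bit pattern as a signed 32-bit integer.
function asInt32(bits) {
  return bits | 0; // JavaScript's bitwise OR applies ToInt32
}

// Upper bound quoted in the error message: 16 MiB plus 16 KiB.
const REPORTED_UPPER_BOUND = 16 * 1024 * 1024 + 16 * 1024;

// Hypothetical check mirroring the wording "between 0 and 16793600".
function sizeLooksValid(size) {
  return size >= 0 && size <= REPORTED_UPPER_BOUND;
}

const reportedSize = asInt32(0xEEEEEEEE);
console.log(reportedSize, REPORTED_UPPER_BOUND, sizeLooksValid(reportedSize));
```

Note that the reported size is negative, so it fails the lower bound rather than the upper one. That may point at bytes that were never a valid BSON length, which would fit the index-rebuild fix reported in the next answer.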
Recreating the text index fixed it, and everything works :-)
db.Item.dropIndex('modlrT88fc09e54b17679b0028556344b50c9fe169bdb5');
db.Item.ensureIndex({'keywords':'text'},{'name':'modlrT88fc09e54b17679b0028556344b50c9fe169bdb5'})
db.Item.stats()
...
"modlrT88fc09e54b17679b0028556344b50c9fe169bdb5" : 7080416, //before
...
"modlrT88fc09e54b17679b0028556344b50c9fe169bdb5" : 2518208 //after recreating the text index

How do I know the size of my MongoDB collections so that I can compare it with RAM size?

When I googled for MongoDB best practices, I found that the size of a collection in MongoDB should be smaller than the RAM size of the machine.
I have 6 collections in my MongoDB database.
Please tell me how I can find the size of the collections present in MongoDB.
The stats for one of my collections are:
db.chains.stats()
{
"ns" : "at.chains",
"count" : 2967,
"size" : 89191980,
"avgObjSize" : 30061.33468149646,
"storageSize" : 335118080,
"numExtents" : 18,
"nindexes" : 3,
"lastExtentSize" : 67136000,
"paddingFactor" : 1.0099999999999996,
"flags" : 1,
"totalIndexSize" : 34742272,
"indexSizes" : {
"_id_" : 155648,
"symbol_1" : 172032,
"unique_symbol_1" : 34414592
},
"ok" : 1
}
Do I need to sum up the sizes of all 6 collections and compare that with the RAM size?
Or is there another way?
Thanks in advance.
You just need to call db.stats() in the MongoDB console; the MongoDB documentation covers this.
> db.stats();
{
"db" : "test",
"collections" : 5,
"objects" : 24,
"avgObjSize" : 67.33333333333333,
"dataSize" : 1616,
"storageSize" : 28672,
"numExtents" : 5,
"indexes" : 4,
"indexSize" : 32704,
"fileSize" : 201326592,
"nsSizeMB" : 16,
"dataFileVersion" : {
"major" : 4,
"minor" : 5
},
"ok" : 1
}
You can read the size of your collection (chains) from the "size" value. The value 89191980 is in bytes, so it's roughly 85 MB. You can find the documentation here: http://docs.mongodb.org/manual/reference/command/collStats/
It would also be a good idea to take a look at this SO thread, which does a good job of covering RAM size and "working set":
What does it mean to fit "working set" into RAM for MongoDB?
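Summing the collections by hand, as the question suggests, could look roughly like this. A sketch in plain JavaScript over stats documents; `sumFootprint` and `toMiB` are names invented here, and only the chains values come from the question.

```javascript
// Convert bytes to mebibytes.
const toMiB = (bytes) => bytes / (1024 * 1024);

// Hypothetical helper: add up document bytes and index bytes
// across a list of collStats-style documents.
function sumFootprint(statsList) {
  return statsList.reduce(
    (acc, s) => ({
      data: acc.data + s.size,
      indexes: acc.indexes + s.totalIndexSize,
    }),
    { data: 0, indexes: 0 }
  );
}

// Only one of the six collections is shown in the question.
const chainsStats = { ns: "at.chains", size: 89191980, totalIndexSize: 34742272 };
const footprint = sumFootprint([chainsStats]);
console.log(toMiB(footprint.data).toFixed(1), toMiB(footprint.indexes).toFixed(1));
```

In the mongo shell the list could be built with `db.getCollectionNames()` and `db.getCollection(name).stats()`. The totals are an upper bound on what needs to be in RAM; the working set discussed in the linked thread is usually smaller.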

What does the max field mean in the output of db.<collectionname>.stats( )?

I am looking at the output of db.system.profile.stats() and I'm curious about what the max field means in the returned document (running mongodb 2.2.2).
Here's an example:
> db.system.profile.stats()
{
"ns" : "mydb.system.profile",
"count" : 2476,
"size" : 1012284,
"avgObjSize" : 408.83844911147014,
"storageSize" : 1052672,
"numExtents" : 2,
"nindexes" : 0,
"lastExtentSize" : 4096,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 0,
"totalIndexSize" : 0,
"indexSizes" : {
},
"capped" : true,
"max" : 2147483647,
"ok" : 1
}
There is no mention of max in the official MongoDB documentation of db.collection.stats().
Perhaps it has something to do with the fact that system.profile is a capped collection. However, max is definitely not the maximum size of the capped collection, because (1) the max shown is a huge number and (2) my collection doesn't get larger than 2500 or so documents and the total size is much less than this.
Any thoughts?
Thanks,
Kevin
max is an optional setting for a capped collection to also limit the number of documents in the collection, instead of just limiting by number of bytes (size).
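To make the two limits concrete, here is a toy in-memory model, not MongoDB's implementation: the class name `CappedModel` and the explicit per-document byte count are assumptions for illustration. The oldest entries are dropped once either the byte limit (size) or the optional document limit (max) is exceeded.

```javascript
// Toy model of a capped collection with a byte limit and an optional count limit.
class CappedModel {
  constructor({ size, max }) {
    this.size = size;   // limit in bytes (always present)
    this.max = max;     // optional limit on number of documents
    this.docs = [];     // oldest first
    this.bytes = 0;
  }

  // docBytes is supplied by the caller; real sizes would come from BSON encoding.
  insert(doc, docBytes) {
    this.docs.push({ doc, docBytes });
    this.bytes += docBytes;
    // Evict oldest documents while either limit is exceeded.
    while (
      this.docs.length > 0 &&
      (this.bytes > this.size ||
        (this.max !== undefined && this.docs.length > this.max))
    ) {
      this.bytes -= this.docs.shift().docBytes;
    }
  }
}

const byCount = new CappedModel({ size: 1000, max: 3 });
for (let i = 1; i <= 5; i++) byCount.insert({ n: i }, 10);
console.log(byCount.docs.map((d) => d.doc.n)); // only the newest entries remain
```

With a very large max, as in the stats output from the question, only the byte limit ever takes effect in this model.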
See docs here.

Why are my mongodb indexes so large

I have 57M documents in my mongodb collection, which is 19G of data.
My indexes are taking up 10G. Does this sound normal, or could I be doing something very wrong? My primary key index is 2G.
{
"ns" : "myDatabase.logs",
"count" : 56795183,
"size" : 19995518140,
"avgObjSize" : 352.0636272974065,
"storageSize" : 21217578928,
"numExtents" : 39,
"nindexes" : 4,
"lastExtentSize" : 2146426864,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 10753999088,
"indexSizes" : {
"_id_" : 2330814080,
"type_1_playerId_1" : 2999537296,
"type_1_time_-1" : 2344582464,
"type_1_tableId_1" : 3079065248
},
"ok" : 1
}
The index size is determined by the number of documents being indexed, as well as the size of the key (compound keys store more information and will be larger). In this case, the _id index divided by the number of documents is about 41 bytes per document, which seems relatively reasonable.
If you run db.collection.getIndexes(), you can find the index version. If {v : 0}, the index was created prior to mongo 2.0, in which case you should upgrade to {v:1}. This process is documented here: http://www.mongodb.org/display/DOCS/Index+Versions
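The per-document figure quoted above can be computed for every index from the stats output in the question. A minimal sketch; `bytesPerDocument` is a name made up here, and it assumes each index holds one entry per document (no multikey or sparse indexes).

```javascript
// Values copied from the stats output in the question.
const logsStats = {
  count: 56795183,
  indexSizes: {
    "_id_": 2330814080,
    "type_1_playerId_1": 2999537296,
    "type_1_time_-1": 2344582464,
    "type_1_tableId_1": 3079065248,
  },
};

// Hypothetical helper: index bytes divided by document count, per index.
function bytesPerDocument(stats) {
  const result = {};
  for (const [name, bytes] of Object.entries(stats.indexSizes)) {
    result[name] = bytes / stats.count;
  }
  return result;
}

const perDoc = bytesPerDocument(logsStats);
for (const [name, value] of Object.entries(perDoc)) {
  console.log(name, value.toFixed(1));
}
```

Comparing the compound indexes with _id_ in this way shows how much the extra key fields add per entry.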