How to improve/optimize speed of MongoDB query?

I have a small Mongo database with ~30k records.
A simple query that uses 5-6 parameters takes almost a second, even though the entire DB fits in RAM.
Can anyone suggest what I'm doing wrong?
2015-11-26T18:41:29.540+0200 [conn3] command vvpilotdb2.$cmd command:
count { count: "TestResults", query: { Test: 5.0, IsAC: true,
InputMode: 0.0, IsOfficialTest: true, IsSanity: false, IsStress:
false, IsUnderNoise: false, MetalRodSize: 9.0 }, fields: {} }
planSummary: COLLSCAN keyUpdates:0 numYields:1 locks(micros) r:1397227
reslen:48 944ms
Here is db.stats(). I haven't created any indexes myself; all settings are at their defaults:
> db.stats()
{
"db" : "vvpilotdb2",
"collections" : 5,
"objects" : 28997,
"avgObjSize" : 7549.571610856296,
"dataSize" : 218914928,
"storageSize" : 243347456,
"numExtents" : 17,
"indexes" : 3,
"indexSize" : 964768,
"fileSize" : 469762048,
"nsSizeMB" : 16,
"dataFileVersion" : {
"major" : 4,
"minor" : 5
},
"extentFreeList" : {
"num" : 0,
"totalSize" : 0
},
"ok" : 1
}

In MongoDB, the _id field is indexed by default.
You should index the fields you use in your queries.
Compound indexes can also be created on multiple fields, specifying a sort order (ascending/descending) for each.
Here's the documentation:
https://docs.mongodb.org/manual/indexes/
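For example, a compound index over the equality-matched fields in your count query should let the planner avoid the COLLSCAN shown in your log. A sketch (the field order inside the index is a tuning choice, e.g. most selective first):
db.TestResults.ensureIndex({ Test: 1, IsAC: 1, InputMode: 1, IsOfficialTest: 1, IsSanity: 1, IsStress: 1, IsUnderNoise: 1, MetalRodSize: 1 })
After the index builds, re-run the count and check the log or explain() output for an index scan instead of COLLSCAN.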

Related

MongoDB optimization

I need to optimize my MongoDB performance but can't figure out how. Maybe there are some tips, or maybe I should use another storage engine. Any ideas are welcome.
I have the following log output, which describes the query:
2015-08-04T15:09:56.226+0300 [conn129682] command mongodb_db1.$cmd command: aggregate { aggregate: "collection", pipeline: [ { $match: { _id.index_id_1: 4931359 } } ] } keyUpdates:0 numYields:39 locks(micros) r:83489 reslen:177280 286ms
I have a collection named collection with the following document structure:
{
"_id" : {
"x" : "x",
"index_id_1" : NumberLong(5617088)
},
"value" : {
"value_1" : 1.0000000000000000,
"value_2" : 0.0000000000000000,
"value_3" : 1.0000000000000000
}
}
Querying the collection stats gives me the following details:
{
"ns" : "mongodb_db1.collection",
"count" : 2.07e+007,
"size" : 4968000000.0000000000000000,
"avgObjSize" : 240,
"storageSize" : 5524459408.0000000000000000,
"numExtents" : 25,
"nindexes" : 3,
"lastExtentSize" : 5.36601e+008,
"paddingFactor" : 1.0000000000000000,
"systemFlags" : 0,
"userFlags" : 1,
"totalIndexSize" : 4475975728.0000000000000000,
"indexSizes" : {
"_id_" : 2884043120.0000000000000000,
"_id.x.index_id_1" : 1.07118e+009,
"_id.index_id_1" : 5.20754e+008
},
"ok" : 1.0000000000000000
}
Running on a single node (no shards).
MongoDB version: 2.4.
Installed RAM (MB): 24017 (index size ~120GB)
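One quick check before tuning anything: verify that the $match is actually hitting the _id.index_id_1 index. On 2.4 the aggregation pipeline does not support an explain option, but the equivalent find() can be explained instead (a sketch using the query from the log above):
db.collection.find({ "_id.index_id_1": 4931359 }).explain()
If the cursor is a BtreeCursor on _id.index_id_1, the index is being used, and the 286ms is mostly spent fetching and returning the ~177KB result (reslen in the log).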
10gen / MongoDB are running a series of FREE online courses that cover all you need to know (the latest iteration starts today). Simply head over and sign up for the DBA course, and if you're feeling brave, a couple of the others, but there is a lot of common/duplicated material between all the variants at the beginning.

Getting rid of _id in mongodb collection

I know it is not possible to remove the _id field in a MongoDB collection. However, my collection is so large that the index on the _id field prevents me from loading the other indexes into RAM. My machine has 125GB of RAM and my collection stats are as follows:
db.call_records.stats()
{
"ns" : "stc_cdrs.call_records",
"count" : 1825338618,
"size" : 438081268320,
"avgObjSize" : 240,
"storageSize" : 468641284752,
"numExtents" : 239,
"nindexes" : 3,
"lastExtentSize" : 2146426864,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 1,
"totalIndexSize" : 165290709024,
"indexSizes" : {
"_id_" : 73450862016,
"caller_id_1" : 45919923504,
"receiver_id_1" : 45919923504
},
"ok" : 1
}
When I do a query like the following:
db.call_records.find({ "$or" : [ { "caller_id": 125091840205 }, { "receiver_id" : 125091840205 } ] }).explain()
{
"clauses" : [
{
"cursor" : "BtreeCursor caller_id_1",
"isMultiKey" : false,
"n" : 401,
"nscannedObjects" : 401,
"nscanned" : 401,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"caller_id" : [
[
125091840205,
125091840205
]
]
}
},
{
"cursor" : "BtreeCursor receiver_id_1",
"isMultiKey" : false,
"n" : 383,
"nscannedObjects" : 383,
"nscanned" : 383,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"receiver_id" : [
[
125091840205,
125091840205
]
]
it takes more than 15 seconds on average to return the results. The indexes for caller_id and receiver_id together are around 90GB, which is OK. However, the 73GB index on _id makes this query very slow.
You are correct that you cannot remove the _id field from your documents. You also cannot remove the index on this field, so it is something you have to live with.
For some reason you start with the assumption that the _id index makes your query slow, which is unjustified and most probably wrong. That index is not used by this query and just sits there untouched.
A few things I would try in your situation:
You have almost 2 billion documents in your collection (count is ~1.8 billion in your stats); have you thought that this is the right time to start sharding your database? In my opinion you should.
Use explain() with your query to actually figure out what slows it down.
Looking at your query, I would also try to do the following:
change your document from
{
... something else ...
receiver_id: 234,
caller_id: 342
}
to
{
... something else ...
participants: [342, 234]
}
where participants is [caller_id, receiver_id] in this order; then you can put just one index on this field. I know it will not make your indexes smaller, but I hope that because you no longer need the $or clause, you will get results faster. P.S. If you do this, do not do it straight in production: test whether it gives you a significant improvement and only then change prod.
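A minimal sketch of that approach (assuming the reshaped documents above; an index on an array field is multikey, so a single equality match finds the value in either position and replaces the $or):
db.call_records.ensureIndex({ participants: 1 })
db.call_records.find({ participants: 125091840205 }).explain()
One index scan instead of two, and no deduplication pass over the $or clauses.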
There are a lot of potential issues here.
The first is that your indexes do not include all of the data returned. This means Mongo is getting the _id from the index and then using it to retrieve and return the document in question. So removing the _id index, even if you could, would not help.
Second, the query includes an $or. This forces Mongo to scan both indexes and then retrieve the documents in question.
To improve performance, I think you have just a few choices:
Add the additional elements to the indexes and restrict the data returned to what is available in the index (this would flip indexOnly to true in the explain results); see the sketch after this list.
Explore sharding, as Skooppa.com mentioned.
Rework the query and/or the document to eliminate the $or condition.
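For the first choice, a covered-query sketch (assuming the existing caller_id_1 index and that you only need the caller_id values back; the projection must exclude _id, or the query cannot be covered):
db.call_records.find(
    { caller_id: 125091840205 },
    { caller_id: 1, _id: 0 }
).explain()
With only indexed fields returned, explain should report indexOnly: true for this clause.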

mongodb query should be covered by index but is not

the query:
db.myColl.find({"M.ST": "mostrepresentedvalueinthecollection", "M.TS": new Date(2014,2,1)}).explain()
explain output:
{
"cursor" : "BtreeCursor M.ST_1_M.TS_1",
"isMultiKey" : false,
"n" : 587606,
"nscannedObjects" : 587606,
"nscanned" : 587606,
"nscannedObjectsAllPlans" : 587606,
"nscannedAllPlans" : 587606,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 9992,
"nChunkSkips" : 0,
"millis" : 174820,
"indexBounds" : {
"M.ST" : [
[
"mostrepresentedvalueinthecollection",
"mostrepresentedvalueinthecollection"
]
],
"M.TS" : [
[
ISODate("2014-03-01T00:00:00Z"),
ISODate("2014-03-01T00:00:00Z")
]
]
},
"server" : "myServer"
Additional details: myColl contains about 40M documents; the average object size is 300 bytes.
I don't get why indexOnly is not set to true; I have a compound index on {"M.ST": 1, "M.TS": 1}.
The mongo host is a Unix box with 16GB RAM and 500GB disk space (spinning disk).
The total index size of the database is 10GB. We get around 1k upserts/sec; of those, 20 are inserts and the rest are increments.
We have another query that adds a third field to the find (called "M.X"), with a compound index on "M.ST", "M.X", "M.TS". That one is lightning fast and scans only 330 documents.
Any idea what could be wrong?
Thanks.
EDIT: here's the structure of a sample document:
{
"_id" : "somestring",
"D" : {
"20140301" : {
"IM" : {
"CT" : 143
}
},
"20140302" : {
"IM" : {
"CT" : 44
}
},
"20140303" : {
"IM" : {
"CT" : 206
}
},
"20140314" : {
"IM" : {
"CT" : 5
}
}
},
"Y" : "someotherstring",
"IM" : {
"CT" : 1
},
"M" : {
"X" : 99999,
"ST" : "mostrepresentedvalueinthecollection",
"TS" : ISODate("2014-03-01T00:00:00.000Z")
},
}
The idea is to store some analytics metrics by month; the "D" field is an embedded document holding the data for each day of the month.
EDIT:
This feature is not currently implemented. The corresponding JIRA ticket is SERVER-2104. You can upvote it, but for now, to get covered index queries you need to avoid dot notation/embedded documents.
I think you need to set a projection on that query, to tell Mongo which fields the index covers.
Try this:
db.myColl.find({"M.ST": "mostrepresentedvalueinthecollection", "M.TS": new Date(2014,2,1)}, { "M.ST": 1, "M.TS": 1, _id: 0 }).explain()
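Per the edit above, though, even with that projection the query will not be covered, because the indexed fields live inside the embedded document M (SERVER-2104). A workaround sketch, assuming you are able to promote the two fields to top level in the schema:
db.myColl.ensureIndex({ ST: 1, TS: 1 })
db.myColl.find(
    { ST: "mostrepresentedvalueinthecollection", TS: new Date(2014,2,1) },
    { ST: 1, TS: 1, _id: 0 }
).explain()
With no dotted paths involved, explain should then show indexOnly: true.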

exception: BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB)

I'm trying to use the full-text search: http://docs.mongodb.org/manual/tutorial/search-for-text/
db['Item'].runCommand('text', { search: 'deep voice', language: 'english' })
It works well,
but when I add conditions:
db['Item'].runCommand( 'text', { search: 'deep voice' , language: 'english' , filter: {"$and":[{"_extendedBy":{"$in":["Voiceover"]}},{"$and":[{"$or":[{"removed":null},{"removed":{"$exists":false}}]},{"category":ObjectId("51bc464ab012269e23278d55")},{"active":true},{"visible":true}]}]} } )
I receive an error
{
"queryDebugString" : "deep|voic||||||",
"language" : "english",
"errmsg" : "exception: BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: _extendedBy: \"Voiceover\"",
"code" : 10334,
"ok" : 0
}
If I delete the word "voice":
db['Item'].runCommand( 'text', { search: 'deep' , language: 'english' , filter: {"$and":[{"_extendedBy":{"$in":["Voiceover"]}},{"$and":[{"$or":[{"removed":null},{"removed":{"$exists":false}}]},{"category":ObjectId("51bc464ab012269e23278d55")},{"active":true},{"visible":true}]}]} } );
I receive a normal response to the request ...... ......
],
"stats" : {
"nscanned" : 87,
"nscannedObjects" : 87,
"n" : 18,
"nfound" : 18,
"timeMicros" : 1013
},
"ok" : 1
}
I can't understand why the error occurs.
The database is not large ("storageSize" : 2793472):
db.Item.stats()
{
"ns" : "internetjock.Item",
"count" : 616,
"size" : 2035840,
"avgObjSize" : 3304.935064935065,
"storageSize" : 2793472,
"numExtents" : 5,
"nindexes" : 12,
"lastExtentSize" : 2097152,
"paddingFactor" : 1.0000000000001221,
"systemFlags" : 0,
"userFlags" : 1,
"totalIndexSize" : 7440160,
"indexSizes" : {
"_id_" : 24528,
"modlrHff22a60ae822e1e68ba919bbedcb8957d5c5d10f" : 40880,
"modlrH6f786b134a46c37db715aa2c831cfbe1fadb9d1d" : 40880,
"modlrI467f6180af484be29ee9258920fc4837992c825e" : 24528,
"modlrI5cb302f507b9d0409921ac0c51f7d9fc4fd5d2ee" : 40880,
"modlrI6393f31b5b6b4b2cd9517391dabf5db6d6dd3c28" : 8176,
"modlrI1c5cbf0ce48258a5a39c1ac54a1c1a038ebe1027" : 32704,
"modlrH6e623929cc3867746630bae4572b9dbe5bd3b9f7" : 40880,
"modlrH72ea9b8456321008fd832ef9459d868800ce87cb" : 40880,
"modlrU821e16c04f9069f8d0b705d78d8f666a007c274d" : 24528,
"modlrT88fc09e54b17679b0028556344b50c9fe169bdb5" : 7080416,
"modlrIefa804b72cc346d66957110e286839a3f42793ef" : 40880
},
"ok" : 1
}
I had the same problem with Mongo 3.0.0 and 3.1.9 on a relatively small database (12GB).
After wasting roughly 4 hours on this, I found a workaround using a hidden parameter:
mongorestore --batchSize=10
where the number varies depending on the nature of your data. Start with 1000.
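An illustrative invocation (the database name and dump path here are placeholders, not from the original post):
mongorestore --batchSize=1000 --db internetjock /path/to/dump/internetjock
If the BSONObj size error reappears, lower the batch size and retry.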
The result document returned by the first query is apparently greater than 16MB, and MongoDB has a maximum document size of 16MB. The second query returns a document smaller than 16MB, hence no errors.
There's no way around this. Here's the link to the documentation:
http://docs.mongodb.org/manual/reference/limits/
Recreate the Text Index and everything works :-)
db.Item.dropIndex('modlrT88fc09e54b17679b0028556344b50c9fe169bdb5');
db.Item.ensureIndex({'keywords':'text'},{'name':'modlrT88fc09e54b17679b0028556344b50c9fe169bdb5'})
db.Item.stats()
...
"modlrT88fc09e54b17679b0028556344b50c9fe169bdb5" : 7080416, //before
...
"modlrT88fc09e54b17679b0028556344b50c9fe169bdb5" : 2518208 //after Recreated the Text Index

Why are my mongodb indexes so large

I have 57M documents in my MongoDB collection, which is 19GB of data.
My indexes are taking up 10GB. Does this sound normal, or could I be doing something very wrong? My primary key (_id) index is 2GB.
{
"ns" : "myDatabase.logs",
"count" : 56795183,
"size" : 19995518140,
"avgObjSize" : 352.0636272974065,
"storageSize" : 21217578928,
"numExtents" : 39,
"nindexes" : 4,
"lastExtentSize" : 2146426864,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 10753999088,
"indexSizes" : {
"_id_" : 2330814080,
"type_1_playerId_1" : 2999537296,
"type_1_time_-1" : 2344582464,
"type_1_tableId_1" : 3079065248
},
"ok" : 1
}
The index size is determined by the number of documents being indexed, as well as the size of the key (compound keys store more information and will be larger). In this case, the _id index size divided by the number of documents comes to about 40 bytes per document, which seems relatively reasonable.
If you run db.collection.getIndexes(), you can find the index version. If it shows {v: 0}, the index was created prior to Mongo 2.0, in which case you should upgrade to {v: 1}. This process is documented here: http://www.mongodb.org/display/DOCS/Index+Versions
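A quick sketch of that check and rebuild (reIndex rebuilds every index on the collection at the current index version; it takes a write lock, so treat it as a maintenance operation):
db.logs.getIndexes()   // inspect the "v" field of each index
db.logs.reIndex()      // drops and rebuilds all indexes on the collection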