I have a MongoDB collection for weather data, with each document consisting of about 50 different weather parameter fields. Simple example below:
{
"wind":7,
"swell":6,
"temp":32,
...
"50th_field":32
}
If I only need one field from all documents, say temp, my query would be this:
db.weather.find({},{ temp: 1})
So internally, does MongoDB have to fetch the entire document for just the one field that was requested (projected)? Wouldn't that be an expensive operation?
I tried benchmarking the timings with MongoDB Compass, but the time required was <1 ms, so I couldn't tell.
MongoDB will read the whole document, but only the field temp (and _id) will be transmitted over the network to the client. If your documents are rather big, the overall performance should be better when you project only the fields you need.
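A quick way to see the difference on the wire is to compare BSON sizes in the mongo shell. A minimal sketch (the byte counts in the comments are illustrative, not measured):
var full = db.weather.findOne()
Object.bsonsize(full) // size of the whole document, e.g. ~1200 bytes for ~50 fields
var slim = db.weather.findOne({}, { temp: 1 })
Object.bsonsize(slim) // just _id and temp, e.g. ~40 bytes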
Yes. This is how to avoid it:
Create an index on temp.
Filter on temp in the query.
Turn off _id in the projection (necessary, because _id is not part of the index).
Run:
db.coll.find({ temp: { $ne: null } }, { temp: 1, _id: 0 })
An empty filter {} triggers a collection scan, because the planner matches the query fields against the projection.
With a filter on temp and the projection { temp: 1, _id: 0 } it says: "Oh, I only need temp".
It should also be smart enough to tell that {} with { temp: 1, _id: 0 } only needs the index, but it's not.
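Putting those steps together as a sketch (assuming the weather collection from the question; verify against your own data):
db.weather.createIndex({ temp: 1 })
// The filter touches only the indexed field and _id is excluded,
// so the query can be covered entirely by the index:
db.weather.find({ temp: { $ne: null } }, { temp: 1, _id: 0 }).explain('executionStats')
// In a covered plan, executionStats.totalDocsExamined should be 0.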
Basically, using a projection to limit the fields is always faster than fetching the full document. You can even use a covered query to avoid examining the documents at all (no disk I/O) and achieve better performance.
Check the executionStats of the demo below: totalDocsExamined was 0! But you must exclude the _id field in the projection, because it's not included in the index.
See also:
https://docs.mongodb.com/manual/core/query-optimization/#covered-query
> db.test.insertOne({name: 'TJT'})
{
"acknowledged" : true,
"insertedId" : ObjectId("5faa0c8469dffee69357dde3")
}
> db.test.createIndex({name: 1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
db.test.explain('executionStats').find({name: 'TJT'}, {_id: 0, name: 1})
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "memo.test",
"indexFilterSet" : false,
"parsedQuery" : {
"name" : {
"$eq" : "TJT"
}
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"_id" : 0,
"name" : 1
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"TJT\", \"TJT\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 0,
"executionStages" : {
"stage" : "PROJECTION",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"transformBy" : {
"_id" : 0,
"name" : 1
},
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"TJT\", \"TJT\"]"
]
},
"keysExamined" : 1,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
}
}
Assuming I have only males and females in my user collection, is the following:
User.find({ gender: { $in: ['male','female'] }})
slower than this one:
User.find()
I feel like it would be, but I don't really know how MongoDB works internally. Both requests return the entire collection. I'm building a filter feature, and I'd like to simplify my API code by assuming that every call is filtered somehow.
It is a good question, as it touches on basic query planning capabilities.
Comparing the explain results, we can see that using $in invokes a collection scan filtered by the specified query parameter, which is more expensive than a plain document dump when querying without parameters.
db.User.find({ gender: { $in: ['male','female'] }}).explain("executionStats")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.User",
"indexFilterSet" : false,
"parsedQuery" : {
"gender" : {
"$in" : [
"female",
"male"
]
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"gender" : {
"$in" : [
"female",
"male"
]
}
},
"direction" : "forward"
},
"rejectedPlans" : []
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 24,
"executionTimeMillis" : 0,
"totalKeysExamined" : 0,
"totalDocsExamined" : 24,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"gender" : {
"$in" : [
"female",
"male"
]
}
},
"nReturned" : 24,
"executionTimeMillisEstimate" : 0,
"works" : 26,
"advanced" : 24,
"needTime" : 1,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 24
}
},
"serverInfo" : {
"host" : "greg",
"port" : 27017,
"version" : "3.2.3",
"gitVersion" : "b326ba837cf6f49d65c2f85e1b70f6f31ece7937"
},
"ok" : 1
}
db.User.find().explain("executionStats")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.User",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : []
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : []
},
"direction" : "forward"
},
"rejectedPlans" : []
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 24,
"executionTimeMillis" : 0,
"totalKeysExamined" : 0,
"totalDocsExamined" : 24,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : []
},
"nReturned" : 24,
"executionTimeMillisEstimate" : 0,
"works" : 26,
"advanced" : 24,
"needTime" : 1,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 24
}
},
"serverInfo" : {
"host" : "greg",
"port" : 27017,
"version" : "3.2.3",
"gitVersion" : "b326ba837cf6f49d65c2f85e1b70f6f31ece7937"
},
"ok" : 1
}
When querying without a condition, it returns all the documents without checking them. But if you add a condition, it compiles the condition into BSON and matches it against the data in the database, which is slower. However, if you create an index on gender, you will not see any difference in time between the two cases.
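A minimal sketch of that last point, using the collection from the question:
db.User.createIndex({ gender: 1 })
db.User.find({ gender: { $in: ['male', 'female'] } }).explain('executionStats')
// The winning plan should now be an IXSCAN with bounds
// ["female", "female"] and ["male", "male"] instead of a COLLSCAN.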
I have a MongoDB collection with about 7 million documents that represent places.
I run a query that searches for places whose name starts with a given prefix, near a specific location.
We have a compound index, described below, to speed up the search.
When the search query finds a match (even just one), it executes very fast (~20 ms). But when there is no match, it can take 30 seconds for the query to execute.
Please assist.
In detail:
Each place (geoData) has the following fields:
"loc" - a GeoJSON point that represent the location
"categoriesIds" - array of int ids
"name" - the name of the placee
The following index is defined on this collection:
{
"loc" : "2dsphere",
"categoriesIds" : 1,
"name" : 1
}
The query is:
db.geoData.find({
"loc":{
"$near":{
"$geometry":{
"type": "Point" ,
"coordinates": [ -0.10675191879272461 , 51.531600743186644]
},
"$maxDistance": 5000.0
}
},
"categoriesIds":{
"$in": [ 1 , 2 , 71 , 70 , 74 , 72 , 73 , 69 , 44 , 26 , 27 , 33 , 43 , 45 , 53 , 79]
},
"name":{ "$regex": "^Cafe Ne"}
})
Execution stats
(Link to the whole explain result)
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 169,
"totalKeysExamined" : 14333,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "GEO_NEAR_2DSPHERE",
"nReturned" : 1,
"executionTimeMillisEstimate" : 60,
"works" : 14354,
"advanced" : 1,
"needTime" : 14351,
"needFetch" : 0,
"saveState" : 361,
"restoreState" : 361,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"loc" : "2dsphere",
"categoriesIds" : 1,
"name" : 1
},
"indexName" : "loc_2dsphere_categoriesIds_1_name_1",
"searchIntervals" : [
{
"minDistance" : 0,
"maxDistance" : 3408.329295346151,
"maxInclusive" : false
},
{
"minDistance" : 3408.329295346151,
"maxDistance" : 5000,
"maxInclusive" : true
}
],
"inputStages" : [
{
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 20,
"works" : 6413,
"advanced" : 1,
"needTime" : 6411,
"needFetch" : 0,
"saveState" : 361,
"restoreState" : 361,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"TwoDSphereKeyInRegionExpression" : true
},
"nReturned" : 1,
"executionTimeMillisEstimate" : 20,
"works" : 6413,
"advanced" : 1,
"needTime" : 6411,
"needFetch" : 0,
"saveState" : 361,
"restoreState" : 361,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"loc" : "2dsphere",
"categoriesIds" : 1,
"name" : 1
},
"indexName" : "loc_2dsphere_categoriesIds_1_name_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"loc" : [
"[\"2f1003230\", \"2f1003230\"]",
"[\"2f10032300\", \"2f10032300\"]",
"[\"2f100323000\", \"2f100323000\"]",
"[\"2f1003230001\", \"2f1003230001\"]",
"[\"2f10032300012\", \"2f10032300013\")",
"[\"2f1003230002\", \"2f1003230002\"]",
"[\"2f10032300021\", \"2f10032300022\")",
"[\"2f10032300022\", \"2f10032300023\")",
"[\"2f100323003\", \"2f100323003\"]",
"[\"2f1003230031\", \"2f1003230031\"]",
"[\"2f10032300311\", \"2f10032300312\")",
"[\"2f10032300312\", \"2f10032300313\")",
"[\"2f10032300313\", \"2f10032300314\")",
"[\"2f1003230032\", \"2f1003230032\"]",
"[\"2f10032300320\", \"2f10032300321\")",
"[\"2f10032300321\", \"2f10032300322\")"
],
"categoriesIds" : [
"[1.0, 1.0]",
"[2.0, 2.0]",
"[26.0, 26.0]",
"[27.0, 27.0]",
"[33.0, 33.0]",
"[43.0, 43.0]",
"[44.0, 44.0]",
"[45.0, 45.0]",
"[53.0, 53.0]",
"[69.0, 69.0]",
"[70.0, 70.0]",
"[71.0, 71.0]",
"[72.0, 72.0]",
"[73.0, 73.0]",
"[74.0, 74.0]",
"[79.0, 79.0]"
],
"name" : [
"[\"Cafe Ne\", \"Cafe Nf\")",
"[/^Cafe Ne/, /^Cafe Ne/]"
]
},
"keysExamined" : 6412,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 1
}
},
{
"stage" : "FETCH",
"nReturned" : 0,
"executionTimeMillisEstimate" : 40,
"works" : 7922,
"advanced" : 0,
"needTime" : 7921,
"needFetch" : 0,
"saveState" : 261,
"restoreState" : 261,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 0,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"TwoDSphereKeyInRegionExpression" : true
},
"nReturned" : 0,
"executionTimeMillisEstimate" : 40,
"works" : 7922,
"advanced" : 0,
"needTime" : 7921,
"needFetch" : 0,
"saveState" : 261,
"restoreState" : 261,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"loc" : "2dsphere",
"categoriesIds" : 1,
"name" : 1
},
"indexName" : "loc_2dsphere_categoriesIds_1_name_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"loc" : [
"[\"2f1003230\", \"2f1003230\"]",
"[\"2f10032300\", \"2f10032300\"]",
"[\"2f100323000\", \"2f100323000\"]",
"[\"2f1003230001\", \"2f1003230001\"]",
"[\"2f10032300011\", \"2f10032300012\")",
"[\"2f10032300012\", \"2f10032300013\")",
"[\"2f1003230002\", \"2f1003230002\"]",
"[\"2f10032300021\", \"2f10032300022\")",
"[\"2f10032300022\", \"2f10032300023\")",
"[\"2f100323003\", \"2f100323003\"]",
"[\"2f1003230031\", \"2f1003230032\")",
"[\"2f1003230032\", \"2f1003230032\"]",
"[\"2f10032300320\", \"2f10032300321\")",
"[\"2f10032300321\", \"2f10032300322\")",
"[\"2f10032300322\", \"2f10032300323\")"
],
"categoriesIds" : [
"[1.0, 1.0]",
"[2.0, 2.0]",
"[26.0, 26.0]",
"[27.0, 27.0]",
"[33.0, 33.0]",
"[43.0, 43.0]",
"[44.0, 44.0]",
"[45.0, 45.0]",
"[53.0, 53.0]",
"[69.0, 69.0]",
"[70.0, 70.0]",
"[71.0, 71.0]",
"[72.0, 72.0]",
"[73.0, 73.0]",
"[74.0, 74.0]",
"[79.0, 79.0]"
],
"name" : [
"[\"Cafe Ne\", \"Cafe Nf\")",
"[/^Cafe Ne/, /^Cafe Ne/]"
]
},
"keysExamined" : 7921,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0
}
}
]
},
Execution stats when searching for "CafeNeeNNN" instead of "Cafe Ne"
(Link to the whole explain result)
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 0,
"executionTimeMillis" : 2537,
"totalKeysExamined" : 232259,
"totalDocsExamined" : 162658,
"executionStages" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"name" : /^CafeNeeNNN/
},
{
"categoriesIds" : {
"$in" : [
1,
2,
26,
27,
33,
43,
44,
45,
53,
69,
70,
71,
72,
73,
74,
79
]
}
}
]
},
"nReturned" : 0,
"executionTimeMillisEstimate" : 1330,
"works" : 302752,
"advanced" : 0,
"needTime" : 302750,
"needFetch" : 0,
"saveState" : 4731,
"restoreState" : 4731,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 70486,
"alreadyHasObj" : 70486,
"inputStage" : {
"stage" : "GEO_NEAR_2DSPHERE",
"nReturned" : 70486,
"executionTimeMillisEstimate" : 1290,
"works" : 302751,
"advanced" : 70486,
"needTime" : 232264,
"needFetch" : 0,
"saveState" : 4731,
"restoreState" : 4731,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"loc" : "2dsphere"
},
"indexName" : "loc_2dsphere",
"searchIntervals" : [
{
"minDistance" : 0,
"maxDistance" : 3408.329295346151,
"maxInclusive" : false
},
{
"minDistance" : 3408.329295346151,
"maxDistance" : 5000,
"maxInclusive" : true
}
],
"inputStages" : [
{
"stage" : "FETCH",
"nReturned" : 44540,
"executionTimeMillisEstimate" : 110,
"works" : 102690,
"advanced" : 44540,
"needTime" : 58149,
"needFetch" : 0,
"saveState" : 4731,
"restoreState" : 4731,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 44540,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"TwoDSphereKeyInRegionExpression" : true
},
"nReturned" : 44540,
"executionTimeMillisEstimate" : 90,
"works" : 102690,
"advanced" : 44540,
"needTime" : 58149,
"needFetch" : 0,
"saveState" : 4731,
"restoreState" : 4731,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"loc" : "2dsphere"
},
"indexName" : "loc_2dsphere",
"isMultiKey" : false,
"direction" : "forward",
"indexBounds" : {
"loc" : [
"[\"2f1003230\", \"2f1003230\"]",
"[\"2f10032300\", \"2f10032300\"]",
"[\"2f100323000\", \"2f100323000\"]",
"[\"2f1003230001\", \"2f1003230001\"]",
"[\"2f10032300012\", \"2f10032300013\")",
"[\"2f1003230002\", \"2f1003230002\"]",
"[\"2f10032300021\", \"2f10032300022\")",
"[\"2f10032300022\", \"2f10032300023\")",
"[\"2f100323003\", \"2f100323003\"]",
"[\"2f1003230031\", \"2f1003230031\"]",
"[\"2f10032300311\", \"2f10032300312\")",
"[\"2f10032300312\", \"2f10032300313\")",
"[\"2f10032300313\", \"2f10032300314\")",
"[\"2f1003230032\", \"2f1003230032\"]",
"[\"2f10032300320\", \"2f10032300321\")",
"[\"2f10032300321\", \"2f10032300322\")"
]
},
"keysExamined" : 102689,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 44540
}
},
{
"stage" : "FETCH",
"nReturned" : 47632,
"executionTimeMillisEstimate" : 250,
"works" : 129571,
"advanced" : 47632,
"needTime" : 81938,
"needFetch" : 0,
"saveState" : 2556,
"restoreState" : 2556,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 47632,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"filter" : {
"TwoDSphereKeyInRegionExpression" : true
},
"nReturned" : 47632,
"executionTimeMillisEstimate" : 230,
"works" : 129571,
"advanced" : 47632,
"needTime" : 81938,
"needFetch" : 0,
"saveState" : 2556,
"restoreState" : 2556,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"loc" : "2dsphere"
},
"indexName" : "loc_2dsphere",
"isMultiKey" : false,
"direction" : "forward",
"indexBounds" : {
"loc" : [
"[\"2f1003230\", \"2f1003230\"]",
"[\"2f10032300\", \"2f10032300\"]",
"[\"2f100323000\", \"2f100323000\"]",
"[\"2f1003230001\", \"2f1003230001\"]",
"[\"2f10032300011\", \"2f10032300012\")",
"[\"2f10032300012\", \"2f10032300013\")",
"[\"2f1003230002\", \"2f1003230002\"]",
"[\"2f10032300021\", \"2f10032300022\")",
"[\"2f10032300022\", \"2f10032300023\")",
"[\"2f100323003\", \"2f100323003\"]",
"[\"2f1003230031\", \"2f1003230032\")",
"[\"2f1003230032\", \"2f1003230032\"]",
"[\"2f10032300320\", \"2f10032300321\")",
"[\"2f10032300321\", \"2f10032300322\")",
"[\"2f10032300322\", \"2f10032300323\")"
]
},
"keysExamined" : 129570,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 47632
}
}
]
}
},
Indexes on the collection
{
"0" : {
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "wego.geoData"
},
"1" : {
"v" : 1,
"key" : {
"srcId" : 1
},
"name" : "srcId_1",
"ns" : "wego.geoData"
},
"2" : {
"v" : 1,
"key" : {
"loc" : "2dsphere"
},
"name" : "loc_2dsphere",
"ns" : "wego.geoData",
"2dsphereIndexVersion" : 2
},
"3" : {
"v" : 1,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "wego.geoData"
},
"4" : {
"v" : 1,
"key" : {
"loc" : "2dsphere",
"categoriesIds" : 1,
"name" : 1
},
"name" : "loc_2dsphere_categoriesIds_1_name_1",
"ns" : "wego.geoData",
"2dsphereIndexVersion" : 2
},
"5" : {
"v" : 1,
"key" : {
"loc" : "2dsphere",
"categoriesIds" : 1,
"keywords" : 1
},
"name" : "loc_2dsphere_categoriesIds_1_keywords_1",
"ns" : "wego.geoData",
"2dsphereIndexVersion" : 2
}
}
Collection stats link
I am going to speculate here a bit, and then comment on your design.
First, when you create an index on a key whose value is an array, an index entry is created for each element of the array:
To index a field that holds an array value, MongoDB creates an index
key for each element in the array.
This is from MongoDB's own documentation about indexes.
So, if your typical record has more than a handful of categories and you have 7 million records, your index is huge, and it also takes time to scan the index itself just to find out that it does not contain what you are looking for. That is still faster than a collection scan, but it is awfully slow compared to how quickly an existing record is found.
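For illustration, a single document like the following (values invented) produces one entry per element of categoriesIds in each multikey index, i.e. 16 entries here:
db.geoData.insertOne({
    name: "Cafe Ne",
    loc: { type: "Point", coordinates: [ -0.1067, 51.5316 ] },
    categoriesIds: [ 1, 2, 26, 27, 33, 43, 44, 45, 53, 69, 70, 71, 72, 73, 74, 79 ]
})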
Now, let me comment on your schema design. This is a matter of style, so feel free to ignore this part.
You have a record which might be in 17 categories. That is a bit overwhelming, and overuses the term category. A category is a specific division, a way to quickly associate a thing with a group of things. What is a thing that belongs to so many groups?
Let's take for example your record Cafe Ne. I assume that in the real world - and please remember, programming and applications are at their best when they solve real-world problems - Cafe Ne is either a restaurant, a cafe, a jazz bar, or a diner. It's surely not a garage (unless cafe means cars in some language I don't know). I can hardly imagine it's a bank or a dental clinic. I'd have to really make an effort to find more than 10 meaningful categories that users would search a cafe by.
My point is: even though MongoDB allows you to design things like that, it does not mean you have to. Try to narrow down the number of categories you have and the ones you search for, and you will get much better performance.
As JohnnyHK suggested in comments, and Oz123 pointed to in his answer, the issue here appears to be an index that has grown so large that it fails to perform well as an index. I believe that in addition to the category expansion issue that has already been pointed out, the ordering of fields in your index creates trouble. Compound indexes are built according to the order of fields, and putting name after categoriesIds makes it more costly to query on name.
It's clear that you need to tune your indexes. Exactly how you tune them depends on the types of queries you expect to support. In particular, I'm not sure whether you'll see better performance from a compound index on loc and name, or from individual indexes, one for loc and one for name. MongoDB's own guidance is a little vague on when it's best to use a compound index and when it's best to use individual indexes and rely on index intersection.
My intuition says that individual indexes will perform better, but I'd test both scenarios.
If you anticipate needing to query by category as well, without name or loc fields that could narrow the query down, it's probably best to create a separate categoriesIds index.
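For reference, the scenarios above as shell sketches (per the getIndexes() output, standalone loc and name indexes already exist):
// (a) one compound index on location plus name:
db.geoData.createIndex({ loc: "2dsphere", name: 1 })
// (b) individual indexes, relying on index intersection:
db.geoData.createIndex({ loc: "2dsphere" }) // already present as loc_2dsphere
db.geoData.createIndex({ name: 1 }) // already present as name_1
// plus, if you query by category alone:
db.geoData.createIndex({ categoriesIds: 1 })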
The order of the fields in a compound index is very important. It's hard to diagnose without having access to the real data and usage patterns, but this key might increase the odds of matching (or not) the document using only the index:
{
"loc" : "2dsphere",
"name" : 1,
"categoriesIds" : 1
}
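A sketch of creating that index and re-checking the slow no-match case (the index name is generated by the server):
db.geoData.createIndex({ loc: "2dsphere", name: 1, categoriesIds: 1 })
db.geoData.find({
    loc: { $near: { $geometry: { type: "Point",
            coordinates: [ -0.10675191879272461, 51.531600743186644 ] },
            $maxDistance: 5000.0 } },
    name: { $regex: "^CafeNeeNNN" },
    categoriesIds: { $in: [ 1, 2, 71, 70, 74, 72, 73, 69, 44, 26, 27, 33, 43, 45, 53, 79 ] }
}).explain("executionStats")
// Compare totalKeysExamined / totalDocsExamined against the stats above.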
Not sure if it is exactly the same issue, but we had a similar problem: a multikey index performing poorly when no results were found.
It is actually a Mongo bug that was fixed in v3.3.8.
https://jira.mongodb.org/browse/SERVER-15086
We fixed our problems after upgrading Mongo and rebuilding the index.
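For completeness, a sketch of what rebuilding the index can look like (key taken from the getIndexes() output above); alternatively, db.geoData.reIndex() rebuilds all indexes on the collection:
db.geoData.dropIndex("loc_2dsphere_categoriesIds_1_name_1")
db.geoData.createIndex({ loc: "2dsphere", categoriesIds: 1, name: 1 })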
I have a collection with around 50 lakh (5 million) records. Below is one sample document:
{
"_id" : NumberLong(4253223),
"locId" : 59,
"startIpNum" : NumberLong("3287940726"),
"endIpNum" : NumberLong("3287940761"),
"maxmind_location" : {
"locId" : 59,
"country" : "DK",
"region" : "",
"city" : "",
"postalCode" : "",
"latitude" : "56.0000",
"longitude" : "10.0000",
"metroCode" : "",
"areaCode" : "\n"
}
}
Below is the query I am trying to perform. I want to find the last record matching the condition.
find({
$and: [
{startIpNum: { $lte: 459950297 }},
{endIpNum: { $gte: 459950297 }}
]
}).sort({_id : -1}).limit(1)
I have separate ascending indexes on startIpNum and endIpNum. I have replaced _id with an incremental id value, like in MySQL.
When I run the query without the sort and with limit 1, it gives me the result in 0 ms. As soon as I add the sort (which I need, since I want the last matching record), the query hangs forever.
I have also tried the query below, with a compound index on { startIpNum: 1, endIpNum: 1, _id: -1 } and the sort including _id, but it takes around 700 ms.
find({
startIpNum : { $lte: 459950297 },
endIpNum : { $gte: 459950297 }
}).sort({
startIpNum :1,
endIpNum :1 ,
_id : -1
}).limit(1).explain({ verbose : true});
How can I achieve the sort in my first approach?
Here is the explain output. It is still examining 370,061 index keys for:
db.maxmind.find({startIpNum : { $lte: 459950297 }, endIpNum : { $gte: 459950297 } }).sort({startIpNum :1, endIpNum :1 , _id : -1 }).limit(1).hint("startIpNum_1_endIpNum_1__id_-1").explain( { verbose: true } );
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "yogeshTest.maxmind",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"startIpNum" : {
"$lte" : 459950297
}
},
{
"endIpNum" : {
"$gte" : 459950297
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 0,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"startIpNum" : 1,
"endIpNum" : 1,
"_id" : -1
},
"indexName" : "startIpNum_1_endIpNum_1__id_-1",
"isMultiKey" : false,
"direction" : "forward",
"indexBounds" : {
"startIpNum" : [
"[-inf.0, 459950297.0]"
],
"endIpNum" : [
"[459950297.0, inf.0]"
],
"_id" : [
"[MaxKey, MinKey]"
]
}
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 433,
"totalKeysExamined" : 370061,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "LIMIT",
"nReturned" : 1,
"executionTimeMillisEstimate" : 430,
"works" : 370062,
"advanced" : 1,
"needTime" : 370060,
"needFetch" : 0,
"saveState" : 2891,
"restoreState" : 2891,
"isEOF" : 1,
"invalidates" : 0,
"limitAmount" : 0,
"inputStage" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 420,
"works" : 370061,
"advanced" : 1,
"needTime" : 370060,
"needFetch" : 0,
"saveState" : 2891,
"restoreState" : 2891,
"isEOF" : 0,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 410,
"works" : 370061,
"advanced" : 1,
"needTime" : 370060,
"needFetch" : 0,
"saveState" : 2891,
"restoreState" : 2891,
"isEOF" : 0,
"invalidates" : 0,
"keyPattern" : {
"startIpNum" : 1,
"endIpNum" : 1,
"_id" : -1
},
"indexName" : "startIpNum_1_endIpNum_1__id_-1",
"isMultiKey" : false,
"direction" : "forward",
"indexBounds" : {
"startIpNum" : [
"[-inf.0, 459950297.0]"
],
"endIpNum" : [
"[459950297.0, inf.0]"
],
"_id" : [
"[MaxKey, MinKey]"
]
},
"keysExamined" : 370061,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0
}
}
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "cus360-H81M-S",
"port" : 27017,
"version" : "3.0.3",
"gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105"
},
"ok" : 1
}
Before you post the output of db.collection.getIndexes() and the explain of your query, let us try the following. What I suspect is that your { startIpNum: 1, endIpNum: 1, _id: -1 } index does not win as a query plan.
So what you can try is to force MongoDB to use that index by hinting:
find({
startIpNum : { $lte: 459950297 },
endIpNum : { $gte: 459950297 }
}).sort({
startIpNum :1,
endIpNum :1 ,
_id : -1
}).limit(1).hint({startIpNum :1 , endIpNum : 1 , _id : -1 })
Currently it seems like your query fetches all the matching documents, loads them into memory, and sorts them there. With hinting, using your index, it will just pick your documents in the right order initially.
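To confirm, run explain() on the hinted query; if the index supplies the sort order, there should be no in-memory SORT stage in the winning plan:
db.maxmind.find({
    startIpNum: { $lte: 459950297 },
    endIpNum: { $gte: 459950297 }
}).sort({ startIpNum: 1, endIpNum: 1, _id: -1 })
    .limit(1)
    .hint({ startIpNum: 1, endIpNum: 1, _id: -1 })
    .explain("executionStats")
// Expect LIMIT -> FETCH -> IXSCAN, with no SORT stage.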
I want to know why the following search in MongoDB (C#) takes 50 seconds to execute.
I followed the basic idea of http://calv.info/indexing-schemaless-documents-in-mongo/
I have 100,000 records in a collection (captures). On each document I have a list of SearchTerm objects:
public class SearchTerm
{
public string Key { get; set; }
public object Value { get; set; }
}
public class Capture
{
//Some other fields
public IList<SearchTerm> SearchTerms { get; set; }
}
I have also defined an index like so:
var capturesCollection = database.GetCollection<Capture>("captures");
capturesCollection.CreateIndex("SearchTerms.Key", "SearchTerms.Value");
But the following query takes 50 seconds to execute
var query = Query.Or(Query.And(Query.EQ("SearchTerms.Key", "ClientId"), Query.EQ("SearchTerms.Value", selectedClient.Id)), Query.And(Query.EQ("SearchTerms.Key", "CustomerName"), Query.EQ("SearchTerms.Value", "Jan")));
var selectedCapture = capturesCollection.Find(query).ToList();
Edit: As asked, my explain output:
clauses: [{ "cursor" : "BtreeCursor SearchTerms.Key_1_SearchTerms.Value_1", "isMultiKey" : true, "n" : 10003, "nscannedObjects" : 100000, "nscanned" : 100000, "scanAndOrder" : false, "indexOnly" : false, "nChunkSkips" : 0, "indexBounds" : { "SearchTerms.Key" : [["ClientId", "ClientId"]], "SearchTerms.Value" : [[{ "$minElement" : 1 }, { "$maxElement" : 1 }]] } }, { "cursor" : "BtreeCursor SearchTerms.Key_1_SearchTerms.Value_1", "isMultiKey" : true, "n" : 70328, "nscannedObjects" : 90046, "nscanned" : 211653, "scanAndOrder" : false, "indexOnly" : false, "nChunkSkips" : 0, "indexBounds" : { "SearchTerms.Key" : [["CustomerName", "CustomerName"]], "SearchTerms.Value" : [[{ "$minElement" : 1 }, { "$maxElement" : 1 }]] } }]
cursor: QueryOptimizerCursor
n: 73219
nscannedObjects: 190046
nscanned: 311653
nscannedObjectsAllPlans: 190046
nscannedAllPlans: 311653
scanAndOrder: false
nYields: 2436
nChunkSkips: 0
millis: 5196
server: piro-pc:27017
filterSet: false
stats: { "type" : "KEEP_MUTATIONS", "works" : 311655, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 73219, "needTime" : 238435, "needFetch" : 0, "isEOF" : 1, "children" : [{ "type" : "OR", "works" : 311655, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 73219, "needTime" : 238435, "needFetch" : 0, "isEOF" : 1, "dupsTested" : 80331, "dupsDropped" : 7112, "locsForgotten" : 0, "matchTested_0" : 0, "matchTested_1" : 0, "children" : [{ "type" : "FETCH", "works" : 100001, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 10003, "needTime" : 89997, "needFetch" : 0, "isEOF" : 1, "alreadyHasObj" : 0, "forcedFetches" : 0, "matchTested" : 10003, "children" : [{ "type" : "IXSCAN", "works" : 100000, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 100000, "needTime" : 0, "needFetch" : 0, "isEOF" : 1, "keyPattern" : "{ SearchTerms.Key: 1, SearchTerms.Value: 1 }", "boundsVerbose" : "field #0['SearchTerms.Key']: [\"ClientId\", \"ClientId\"], field #1['SearchTerms.Value']: [MinKey, MaxKey]", "isMultiKey" : 1, "yieldMovedCursor" : 0, "dupsTested" : 100000, "dupsDropped" : 0, "seenInvalidated" : 0, "matchTested" : 0, "keysExamined" : 100000, "children" : [] }] }, { "type" : "FETCH", "works" : 211654, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 70328, "needTime" : 141325, "needFetch" : 0, "isEOF" : 1, "alreadyHasObj" : 0, "forcedFetches" : 0, "matchTested" : 70328, "children" : [{ "type" : "IXSCAN", "works" : 211653, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 90046, "needTime" : 121607, "needFetch" : 0, "isEOF" : 1, "keyPattern" : "{}", "boundsVerbose" : "field #0['SearchTerms.Key']: [\"CustomerName\", \"CustomerName\"], field #1['SearchTerms.Value']: [MinKey, MaxKey]", "isMultiKey" : 1, "yieldMovedCursor" : 0, "dupsTested" : 211653, "dupsDropped" : 121607, "seenInvalidated" : 0, "matchTested" : 0, "keysExamined" : 211653, "children" : [] }] }] }] }
Thanks for posting the explain. Let's address the problems one at a time.
First, I don't think this query does what you think it does / want it to do. Let me show you by example using the mongo shell. Your query, translated into the shell, is
{ "$or" : [
{ "$and" : [
{ "SearchTerms.Key" : "ClientId" },
{ "SearchTerms.Value" : "xxx" }
]},
{ "$and" : [
{ "SearchTerms.Key" : "CustomerName" },
{ "SearchTerms.Value" : "Jan" }
]}
]}
This query finds documents where either some Key has the value "ClientId" and some Value has the value "xxx", or some Key has the value "CustomerName" and some Value has the value "Jan". The key and the value don't need to be part of the same array element. For example, the following document matches your query:
{ "SearchTerms" : [
{ "Key" : "ClientId", "Value" : 691 },
{ "Key" : "banana", "Value" : "xxx" }
]
}
I'm guessing your desired behavior is to match exactly the documents that contain the Key and Value in the same array element. The $elemMatch operator is the tool for the job:
{ "$or" : [
{ "SearchTerms" : { "$elemMatch" : { "Key" : "ClientId", "Value" : "xxx" } } },
{ "SearchTerms" : { "$elemMatch" : { "Key" : "CustomerName", "Value" : "Jan" } } }
]}
Second, I don't think this schema is what you are looking for. You don't describe your use case so I can't be confident, but the situation described in that blog post is a very rare situation where you need to store and search on arbitrary key-value pairs that can change from one document to the next. This is like letting users put in custom metadata. Almost no applications want or need to do this. It looks like your application is storing information about customers, probably for an internal system. You should be able to define a data model for your customers that looks like
{
"CustomerId" : 1234,
"CustomerName" : "Jan",
"ClientId" : "xpj1234",
...
}
This will simplify and improve things dramatically. I think the wires got crossed here because sometimes people call MongoDB "schemaless" and the blog post talks about "schemaless" documents. The blog post really is talking about schemaless documents where you don't know what is going to go in there. Most applications should know pretty much exactly what the general structure of the documents in a collection will be.
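Under that flattened model, the query from the question reduces to plain predicates that ordinary indexes can serve. A sketch in the mongo shell (values assumed from the example above):
db.captures.createIndex({ ClientId: 1 })
db.captures.createIndex({ CustomerName: 1 })
// Each $or branch can use its own index:
db.captures.find({ $or: [ { ClientId: "xpj1234" }, { CustomerName: "Jan" } ] })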
Finally, I think on the basis of this we can disregard the issue with the slow query for now. Feel free to ask another question or edit this one with extra explanation if you need more help or if the problem doesn't go away once you've taken into account what I've said here.
1) Please take a look at the MongoDB log file and see what query actually gets generated against the database.
2) Enter that query into the mongo shell and add ".explain()" at the end, and see if your index is actually being used (does it say BasicCursor or BtreeCursor?). A sketch of this check follows below.
3) If your index is used, what's the value of the "nscanned" attribute? Perhaps your index does not have enough "value diversity" in it?
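A sketch of steps 2 and 3 in the shell (collection and field names taken from the question; "xxx" stands in for a real value):
db.captures.find({ "SearchTerms.Key": "ClientId", "SearchTerms.Value": "xxx" }).explain()
// "cursor" : "BtreeCursor SearchTerms.Key_1_SearchTerms.Value_1" -> index used
// "cursor" : "BasicCursor" -> no index used
// Then compare "nscanned" with "n": a large gap suggests low value diversity.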