MongoDB picking wrong index

MongoDB picking wrong index - mongodb

The following document is stored in a collection:
"ldr": {
"d": NumberInt(318),
"w": NumberInt(46),
"m": NumberInt(10),
"pts": [
{
"lid": ObjectId("47cc67093475061e3d95369d"),
"dPts": NumberLong(110),
"wPts": NumberLong(110),
"mPts": NumberLong(220),
"aPts": NumberLong(3340)
},
{
"lid": ObjectId("56316279be4f0eda62ebfee0"),
"dPts": NumberInt(0),
"wPts": NumberInt(0),
"mPts": NumberInt(0),
"aPts": NumberInt(0)
}
]
}
I have 4 indexes on a collection:
ldr.pts.lid_1_ldr.d_1_ldr.pts.dPts_-1
ldr.pts.lid_1_ldr.w_1_ldr.pts.wPts_-1
ldr.pts.lid_1_ldr.m_1_ldr.pts.mPts_-1
ldr.pts.lid_1_ldr.pts.aPts_-1
I use the following query:
db.my_collection.find({"ldr.pts.lid":ObjectId("47cc67093475061e3d95369d"), "ldr.w": NumberInt(46)},{"ldr":1}).sort({"ldr.pts.wPts":-1}).explain()
Note: I have run this query with the {ldr:1} left out with the same result.
I would expect the query above to use the following index:
ldr.pts.lid_1_ldr.w_1_ldr.pts.wPts_-1
However, this is the result of the explain:
{
"cursor" : "BtreeCursor ldr.pts.lid_1_ldr.d_1_ldr.pts.dPts_-1",
"isMultiKey" : true,
"n" : 3,
"nscannedObjects" : 4,
"nscanned" : 4,
"nscannedObjectsAllPlans" : 16,
"nscannedAllPlans" : 16,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"ldr.pts.lid" : [
[
ObjectId("47cc67093475061e3d95369d"),
ObjectId("47cc67093475061e3d95369d")
]
],
"ldr.d" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"ldr.pts.dPts" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "Beast-PC:27017",
"filterSet" : false
}
As you can see the first index is being picked.
I've tried using a hint and supplying the correct index but that still results in indexOnly being false and in scanAndOrder being true.
Any ideas?

Sorting on a field within an array isn't likely to produce what you're expecting as your descending sort on ldr.pts.wPts will sort based on the max of all the wPts values from each document's pts array, rather than just the wPts value from the matching pts array element.
That's at the root of why your query can't use an index for the sorting.

Related

Querying and sorting indexed collection in MongoDB results in data overflow

"events" is a capped collection that stores user click events on a webpage. A document looks like this:
{
"event_name" : "click",
"user_id" : "ea0b4027-05f7-4902-b133-ff810b5800e1",
"object_type" : "ad",
"object_id" : "ea0b4027-05f7-4902-b133-ff810b5822e5",
"object_properties" : { "foo" : "bar" },
"event_properties" : {"foo" : "bar" },
"time" : ISODate("2014-05-31T22:00:43.681Z")
}
Here's a compound index for this collection:
db.events.ensureIndex({object_type: 1, time: 1});
This is how I am querying:
db.events.find( {
$or : [ {object_type : 'ad'}, {object_type : 'element'} ],
time: { $gte: new Date("2013-10-01T00:00:00.000Z"), $lte: new Date("2014-09-01T00:00:00.000Z") }},
{ user_id: 1, event_name: 1, object_id: 1, object_type : 1, obj_properties : 1, time:1 } )
.sort({time: 1});
This is causing: "too much data for sort() with no index. add an index or specify a smaller limit" in mongo 2.4.9 and "Overflow sort stage buffered data usage of 33554618 bytes exceeds internal limit of 33554432 bytes" in Mongo 2.6.3. I'm using Java MongoDB driver 2.12.3. It throws the same error when I use "$natural" sorting. It seems like MongoDB is not really using the index defined for sorting, but I can't figure out why (I read MongoDB documentation on indexes). I appreciate any hints.
Here is the result of explain():
{
"clauses" : [
{
"cursor" : "BtreeCursor object_type_1_time_1",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"object_type" : [
[
"element",
"element"
]
],
"time" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
},
{
"cursor" : "BtreeCursor object_type_1_time_1",
"isMultiKey" : false,
"n" : 399609,
"nscannedObjects" : 399609,
"nscanned" : 399609,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"object_type" : [
[
"ad",
"ad"
]
],
"time" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
},
"cursor" : "QueryOptimizerCursor",
"n" : 408440,
"nscannedObjects" : 409686,
"nscanned" : 409686,
"nscannedObjectsAllPlans" : 409686,
"nscannedAllPlans" : 409686,
"scanAndOrder" : false,
"nYields" : 6402,
"nChunkSkips" : 0,
"millis" : 2633,
"server" : "MacBook-Pro.local:27017",
"filterSet" : false
}

According to explain(), When the mongo run the query it did use the compound index. The problem is the sort({time:1}).
Your index is {object_type:1, time:1}, it means the query results are ordered by object_type first, if the object_type is same, then ordered by time.
For the sort {time:1}, mongo have to load all the matched objects(399609) into the memory to sort by time due to the order is not the same to the index({object_type:1, time:1}). Assume that the avg size of object is 100 bytes, then the limit would be exceeded.
more info: http://docs.mongodb.org/manual/core/index-compound/
For instance, there are 3 objects with index {obj_type:1, time:1}:
{"obj_type": "a", "time" : ISODate("2014-01-31T22:00:43.681Z")}
{"obj_type": "c", "time" : ISODate("2014-02-31T22:00:43.681Z")}
{"obj_type": "b", "time" : ISODate("2014-03-31T22:00:43.681Z")}
db.events.find({}).sort({"obj_type":1, "time":1}).limit(2)
{"obj_type": "a", "time" : ISODate("2014-01-31T22:00:43.681Z")}
{"obj_type": "b", "time" : ISODate("2014-03-31T22:00:43.681Z")}
"nscanned" : 2 (This one use index order, which is sorted by {obj_type:1, time:1})
db.events.find({}).sort({"time":1}).limit(2)
{"obj_type": "a", "time" : ISODate("2014-01-31T22:00:43.681Z")}
{"obj_type": "c", "time" : ISODate("2014-02-31T22:00:43.681Z")}
"nscanned" : 3 (This one will load all the matched results and then sort)

Why is this $elemMatch query not using my index?

My query:
{
"unique_contact_method.enrichments": {
"$not": {
"$elemMatch": {
"created_by.name": "fullcontact"
}
}
}
}
My Index:
{
v: 1,
name: "unique_contact_method.enrichments.created_by.name_1",
key: {
"unique_contact_method.enrichments.created_by.name": 1
},
ns: "app27434806.unique_contact_methods",
background: true,
safe: true
}
The .explain() result:
Why no index?

The use of the $not operator here is what makes index usage impossible. There is one statement in the documentation that "implies" this, if not completely clearly:
"Remember that the $not operator only affects other operators and cannot check fields and documents independently. So, use the $not operator for logical disjunctions and the $ne operator to test the contents of fields directly."
The essential phrase there is "cannot check fields", which means it does not actually "test" the value of the field as can be done with an index. A simple document explains this the best:
{
"_id" : ObjectId("53f3e414deee3a78e47e57e2"),
"created" : [ { "name" : "Bill" }, { "name" : "Ted" } ]
}
Where of course an index is created on "created.name".
Now consider the following query and explain output:
db.doctest.find({ "created": { "$elemMatch": { "name": "Bill" } } }).explain()
{
"cursor" : "BtreeCursor created.name_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"created.name" : [
[
"Bill",
"Bill"
]
]
},
"server" : "ubuntu:27017",
"filterSet" : false
}
That simply selects the index and shows the index bounds as expected.
Not look at this with $not, and I'm going to "force" the index with .hint():
db.doctest.find({ "created": { "$not": { "$elemMatch": { "name": "Bill" } } } }).hint({ "created.name": 1 }).explain()
{
"cursor" : "BtreeCursor created.name_1",
"isMultiKey" : true,
"n" : 0,
"nscannedObjects" : 1,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"created.name" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "ubuntu:27017",
"filterSet" : false
}
The important part to look at here is "indexBounds". This explains why without the hint the index would not be used, as simply put there are no "bounds" to select by. The $not operation basically says:
"Look at every value tested by the condition and if it is true then consider it false or essentially the reverse"
The end evaluation here is that "Ted" is not "Bill" therefore the condition is true, but there is no way to "look for that" using an index.
So the consideration here is how do you do the same thing and use an index? The passage from the documentation tells you that in order to consider the "field" you need to use the $ne operator instead:
db.doctest.find({ "created": { "$elemMatch": { "name": { "$ne": "Bill" } } } }).explain()
{
"cursor" : "BtreeCursor created.name_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"created.name" : [
[
{
"$minElement" : 1
},
"Bill"
],
[
"Bill",
{
"$maxElement" : 1
}
]
]
},
"server" : "ubuntu:27017",
"filterSet" : false
}
Now the "indexBounds" shows you that the index is used to essentially "filter out" the values that were supplied. So the index is used to pull any other value than "Bill".
The conclusion here is that $not has it's logical uses, but in many cases what you actually want is $ne instead. Where $not must be applied, take into consideration that and index for the field values will not be used to make the comparison.

Occasionally I find the index has been used in query automatically even though operator $not joins the action. It let me recall
this question which also confused me on a long moment. I try on the new clue and find something different. And I think I find the answer finally. Welcome to everyone to comment here if find something else different.
Run on mongo shell, V2.6.4
Initialize data as below:
> db.a.drop();
false
> db.a.insert({_id:1, a:[1,2,3], b:[{x:1, y:2}, {x:4, y:4}], c:1});
WriteResult({ "nInserted" : 1 })
> db.a.insert({_id:2, a:[4,2,3], b:[{x:1, y:2}, {x:4, y:4}], c:1});
WriteResult({ "nInserted" : 1 })
> db.a.ensureIndex({a:1}, {name:"a"});
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.a.ensureIndex({"b.x":1}, {name:"bx"});
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}
> db.a.ensureIndex({c:1}, {name:"c"});
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 3,
"numIndexesAfter" : 4,
"ok" : 1
}
> db.a.getIndexes();
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.a"
},
{
"v" : 1,
"key" : {
"a" : 1
},
"name" : "a",
"ns" : "test.a"
},
{
"v" : 1,
"key" : {
"b.x" : 1
},
"name" : "bx",
"ns" : "test.a"
},
{
"v" : 1,
"key" : {
"c" : 1
},
"name" : "c",
"ns" : "test.a"
}
]
> db.a.find();
{ "_id" : 1, "a" : [ 1, 2, 3 ], "b" : [ { "x" : 1, "y" : 2 }, { "x" : 2, "y" : 3 } ], "c" : 1 }
{ "_id" : 2, "a" : [ 4, 2, 3 ], "b" : [ { "x" : 1, "y" : 2 }, { "x" : 4, "y" : 4 } ], "c" : 1 }
This block just simply proves that index will be properly used automatically even though $not joins the query action.
> db.a.find({c:{$not:{$gte:1}}}).explain();
{
"cursor" : "BtreeCursor c",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"c" : [
[
{
"$minElement" : 1
},
1
],
[
Infinity,
{
"$maxElement" : 1
}
]
]
},
"server" : "Duke-PC:27017",
"filterSet" : false
}
This is the style that the original question mentioned. Index has been used automatically.
> db.a.find({b:{$elemMatch:{x:{$gte:1}}}}).explain();
{
"cursor" : "BtreeCursor bx", // attention on this line
"isMultiKey" : true,
"n" : 2,
"nscannedObjects" : 2,
"nscanned" : 4,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 4,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 9,
"indexBounds" : {
"b.x" : [
[
1,
Infinity
]
]
},
"server" : "Duke-PC:27017",
"filterSet" : false
}
Index doesn't work when use operator $not preceding $elemMatch. It's the core of this question.
> db.a.find({b:{$not:{$elemMatch:{x:{$gte:1}}}}}).explain();
{
"cursor" : "BasicCursor", // attention on this line
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 2,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"server" : "Duke-PC:27017",
"filterSet" : false
}
This block: find some way to explain the mechanics of index on array filed.
Totally two documents, but nscanned: 6. This tells us something how the index has been structured on array type. That is, index node is on every element of array but not the array itself. I imagine the index structure on field a like this:
BTree: Node(value:1, entry:[entry({_id:1})]), Node(value:2, entry:[entry({_id:1}), entry({_id:2})]), ...
Of course, this is only my imagination for explanation. :)
> db.a.find({a:{$gte:1}}).explain();
{
"cursor" : "BtreeCursor a",
"isMultiKey" : true,
"n" : 2,
"nscannedObjects" : 2,
"nscanned" : 6, // attention on this line
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 6,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"a" : [
[
1,
Infinity
]
]
},
"server" : "Duke-PC:27017",
"filterSet" : false
}
When use operator $not, the relevant index has been adopted automatically. And the field "indexBounds" tells us how $not handles the query.
> db.a.find({a:{$not:{$gte:2}}},{_id:0,a:1}).explain();
{
"cursor" : "BtreeCursor a",
"isMultiKey" : true,
"n" : 0,
"nscannedObjects" : 1, // attention on this field
"nscanned" : 2, // attention on this field
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : { // attention on this field
"a" : [
[
{
"$minElement" : 1
},
2
],
[
Infinity,
{
"$maxElement" : 1
}
]
]
},
"server" : "Duke-PC:27017",
"filterSet" : false
}
Insert a new document with same field name a but not array.
> db.a.insert({a:1});
WriteResult({ "nInserted" : 1 })
> db.a.find();
{ "_id" : 1, "a" : [ 1, 2, 3 ], "b" : [ { "x" : 1, "y" : 2 }, { "x" : 2, "y" : 3 } ], "c" : 1 }
{ "_id" : 2, "a" : [ 4, 2, 3 ], "b" : [ { "x" : 1, "y" : 2 }, { "x" : 4, "y" : 4 } ], "c" : 1 }
{ "_id" : ObjectId("541e4fcbb65042180c128280"), "a" : 1 }
Please read this block comparing with just above content.
> db.a.find({a:{$not:{$gte:2}}},{_id:0,a:1}).explain();
{
"cursor" : "BtreeCursor a",
"isMultiKey" : true, // This tells engine there are repeated array elements on index.
"n" : 1,
"nscannedObjects" : 2, // The third document should only access the index to fetch data
// since it has enough information.
// But here engine still read from the collection. My unstanding is the engine
// can not distinguish whether this index field is an array element or not,
// so it has to access the collection to find more information.
"nscanned" : 3,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 3,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 25,
"indexBounds" : {
"a" : [
[
{
"$minElement" : 1
},
2
],
[
Infinity,
{
"$maxElement" : 1
}
]
]
},
"server" : "Duke-PC:27017",
"filterSet" : false
}
Conclusion:
elemMatch is very special:
$elemMatch explicitly tells that the field "b" is an array.
And according to the query definition on this operator, any element found matching the query then true can be returned immediately. But only completing to scan all elements of the array and not finding any satisfying one, then false can be returned.
But index structure (think about my imagination above) on array can not support this kind of operation because engine can not determine which nodes on index are exactly from a certain array, if only by index. This is the most important point to explain this question.
Other operators have not this limit from their own query definition, such as $gte, $lt, ..., because only one matching can judge it's matched or not, which can be satisfied by index directly.
Finally, there is a way to solve the original question, but not perfectly because the whole element must be provided.
Index on the array field, not the element.
> db.a.ensureIndex({b:1});
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 4,
"numIndexesAfter" : 5,
"ok" : 1
}
> db.a.find({b:{$ne:{x:2, y:3}}}).explain();
{
"cursor" : "BtreeCursor b_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 2,
"nscanned" : 4,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 4,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 33,
"indexBounds" : {
"b" : [
[
{
"$minElement" : 1
},
{
"x" : 2,
"y" : 3
}
],
[
{
"x" : 2,
"y" : 3
},
{
"$maxElement" : 1
}
]
]
},
"server" : "Duke-PC:27017",
"filterSet" : false
}

MongoDB embedded secondary compound index covered slow query

I have following embedded secondary compound index:
db.people.ensureIndex({"sources_names.source_id":1,"sources_names.value":1})
Here is part of db.people.getIndexes():
{
"v" : 1,
"key" : {
"sources_names.source_id" : 1,
"sources_names.value" : 1
},
"ns" : "diglibtest.people",
"name" : "sources_names.source_id_1_sources_names.value_1"
}
So I run following index covered query:
db.people.find({ "sources_names.source_id": ObjectId('5166d57f7a8f348676000001'), "sources_names.value": "Ulrike Weiland" }, {"sources_names.source_id":1, "sources_names.value":1, "_id":0} ).pretty()
{
"sources_names" : [
{
"value" : "Ulrike Weiland",
"source_id" : ObjectId("5166d57f7a8f348676000001")
}
]
}
It took about 5 seconds. So I run explain:
db.people.find({ "sources_names.source_id": ObjectId('5166d57f7a8f348676000001'), "sources_names.value": "Ulrike Weiland" }, {"sources_names.source_id":1, "sources_names.value":1, "_id":0 }).explain()
{
"cursor" : "BtreeCursor sources_names.source_id_1_sources_names.value_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 1260353,
"nscanned" : 1260353,
"nscannedObjectsAllPlans" : 1260353,
"nscannedAllPlans" : 1260353,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 4308,
"indexBounds" : {
"sources_names.source_id" : [
[
ObjectId("5166d57f7a8f348676000001"),
ObjectId("5166d57f7a8f348676000001")
]
],
"sources_names.value" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "dash-pc.local:27017"
}
But why this index-covered-query goes through whole database? How should I create index to boost performance?
Thanks!

You are using a multikey index (i.e. sources_names.source_id) in multiple places, from the docs ( http://docs.mongodb.org/manual/tutorial/create-indexes-to-support-queries/#create-indexes-that-support-covered-queries ):
An index cannot cover a query if:
any of the indexed fields in any of the documents in the collection includes an array.
If an indexed field is an array, the index becomes a multi-key index index and cannot
support a covered query.
You can tell this is a multikey index here form the explain:
"isMultiKey" : true,
Basically the dot notation is classed as multikey because sources_names is an array as such the index contains an array.
As for improving the speed: I have not looked in this but your problem is here:
"sources_names.value" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
Whereby the index is not being optimally used to find the sources_names.value.
Edit
I thought that the answer I just gave was a bit weird, since this should not be a multikey index, so I actually went off and tested this:
> db.gh.ensureIndex({'d.id':1,'d.g':1})
> db.gh.find({'d.id':5, 'd.g':'d'})
{ "_id" : ObjectId("516826e5f44947064473a00a"), "d" : { "id" : 5, "g" : "d" } }
> db.gh.find({'d.id':5, 'd.g':'d'}).explain()
{
"cursor" : "BtreeCursor d.id_1_d.g_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"d.id" : [
[
5,
5
]
],
"d.g" : [
[
"d",
"d"
]
]
},
"server" : "ubuntu:27017"
}
It seems my original thoughts where right, this shouldn't be a multikey index. You have some dirty data in value me thinks and it is causing you problems.
I would go through your database and make sure that your records are correctly entered.
You most likely have something like:
{
"sources_names" : [
{
"value" : ["Ulrike Weiland", 1],
"source_id" : ObjectId("5166d57f7a8f348676000001")
}
]
}
Some where.

MongoDB index not helping query with multikey index

I have a collection of documents with a multikey index defined. However, the performance of the query is pretty poor for just 43K documents. Is ~215ms for this query considered poor? Did I define the index correctly if nscanned is 43902 (which equals the total documents in the collection)?
Document:
{
"_id": {
"$oid": "50f7c95b31e4920008dc75dc"
},
"bank_accounts": [
{
"bank_id": {
"$oid": "50f7c95a31e4920009b5fc5d"
},
"account_id": [
"ff39089358c1e7bcb880d093e70eafdd",
"adaec507c755d6e6cf2984a5a897f1e2"
]
}
],
"created_date": "2013,01,17,09,50,19,274089",
}
Index:
{ "bank_accounts.bank_id" : 1 , "bank_accounts.account_id" : 1}
Query:
db.visitor.find({ "bank_accounts.account_id" : "ff39089358c1e7bcb880d093e70eafdd" , "bank_accounts.bank_id" : ObjectId("50f7c95a31e4920009b5fc5d")}).explain()
Explain:
{
"cursor" : "BtreeCursor bank_accounts.bank_id_1_bank_accounts.account_id_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 43902,
"nscanned" : 43902,
"nscannedObjectsAllPlans" : 43902,
"nscannedAllPlans" : 43902,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 213,
"indexBounds" : {
"bank_accounts.bank_id" : [
[
ObjectId("50f7c95a31e4920009b5fc5d"),
ObjectId("50f7c95a31e4920009b5fc5d")
]
],
"bank_accounts.account_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "Not_Important"
}

I see three factors in play.
First, for application purposes, make sure that $elemMatch isn't a more appropriate query for this use-case. http://docs.mongodb.org/manual/reference/operator/elemMatch/. It seems like it would be bad if the wrong results came back due to multiple subdocuments satisfying the query.
Second, I imagine the high nscanned value can be accounted for by querying on each of the field values independently. .find({ bank_accounts.bank_id: X }) vs. .find({"bank_accounts.account_id": Y}). You may see that nscanned for the full query is about equal to nscanned of the largest subquery. If the index key were being evaluated fully as a range, this would not be expected, but...
Third, the { "bank_accounts.account_id" : [[{"$minElement" : 1},{"$maxElement" : 1}]] } clause of the explain plan shows that no range is being applied to this portion of the key.
Not really sure why, but I suspect it has something to do with account_id's nature (an array within a subdocument within an array). 200ms seems about right for an nscanned that high.
A more performant document organization might be to denormalize the account_id -> bank_id relationship within the subdocument, and store:
{"bank_accounts": [
{
"bank_id": X,
"account_id: Y,
},
{
"bank_id": X,
"account_id: Z,
}
]}
instead of:
{"bank_accounts": [{
"bank_id": X,
"account_id: [Y, Z],
}]}
My tests below show that with this organization, the query optimizer gets back to work and exerts a range on both keys:
> db.accounts.insert({"something": true, "blah": [{ a: "1", b: "2"} ] })
> db.accounts.ensureIndex({"blah.a": 1, "blah.b": 1})
> db.accounts.find({"blah.a": 1, "blah.b": "A RANGE"}).explain()
{
"cursor" : "BtreeCursor blah.a_1_blah.b_1",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 0,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"blah.a" : [
[
1,
1
]
],
"blah.b" : [
[
"A RANGE",
"A RANGE"
]
]
}
}

Mongo Index not being used

I created an index around several items for a particular query I am doing:
{
"v" : 1,
"key" : {
"MODIFIED" : -1,
"state" : 1,
"fail" : 1,
"generated" : 1
},
"ns" : "foo.bar",
"name" : "MODIFIED_-1_state_1_fail_1_generated"
}
However when I execute my query, it doesn't apear to be using my index. Could you please provide some insite into what I'm doing wrong?
Thank you!
db.foo.find( {
"$or": [
{
"MODIFIED": {
"$gt": {
"sec": 1321419600,
"usec": 0
}
}
},
{
"$or": [
{"state": "ca"},
{"state": "ok"}
]
}
],
"$and": [
{"fail": {"$ne": 1}},
{"generated": {"$exists": false}}
]
}).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 464215,
"nscannedObjects" : 464215,
"n" : 0,
"millis" : 7549,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}

There's a good reason your index cannot be used for your query and I also think there are some issues with the query itself. The reason it's not hitting an index is because of the nested $or operator by the way but I think your actual problem is a lack of understanding on all the operators available to you in MongoDB :
First of all, your nested $or to check if the state is either "ca" or "ok" is not necessary and ( since it's the main reason you're not hitting your index) can be replaced with state:{$in:["ca", "ok"]} which does the exact same thing. Now your query is :
db.foo.find( {
"$or": [
{
"MODIFIED": {
"$gt": {
"sec": 1321419600,
"usec": 0
}
}
},
{
state:{$in:["ca", "ok"]}
}
],
"$and": [
{"fail": {"$ne": 1}},
{"generated": {"$exists": false}}
]
}).explain();
And it will hit your index. Your second issue is that a top-level $and clause is not necessary. Note that AND(OR(A, B), AND(C, D)) = AND(OR(A, B), C, D). This query does the same :
db.foo.find( {
"$or": [
{
"MODIFIED": {
"$gt": {
"sec": 1321419600,
"usec": 0
}
}
},
{
state:{$in:["ca", "ok"]}
}
],
"fail": {"$ne": 1},
"generated": {"$exists": false}
}).explain();
Which still hits the index :
{
"clauses" : [
{
"cursor" : "BtreeCursor MODIFIED_-1_state_1_fail_1_generated_1 multi",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"MODIFIED" : [
[
{
"$maxElement" : 1
},
{
"sec" : 1321419600,
"usec" : 0
}
]
],
"state" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"fail" : [
[
{
"$minElement" : 1
},
1
],
[
1,
{
"$maxElement" : 1
}
]
],
"generated" : [
[
null,
null
]
]
}
},
{
"cursor" : "BasicCursor",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
],
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1
}
Hope that helps! By the way it's slightly more conventional to start the first key in your compound index with order 1 and the second with -1. Note that the -1 is only used to determine the direction relative to the previous field.