Mongo compound index is not chosen - mongodb

I have the following schema:
{
score : { type : Number},
edges : [{name : { type : String }, rank : { type : Number }}],
disable : {type : Boolean},
}
I tried to build an index for the following query:
db.test.find( {
disable: false,
edges: { $all: [
{ $elemMatch:
{ name: "GOOD" } ,
},
{ $elemMatch:
{ name: "GREAT" } ,
},
] },
}).sort({"score" : 1})
.limit(40)
.explain()
First try
When I created an index named "score":
{
"edges.name" : 1,
"score" : 1
}
The 'explain' returned :
{
"cursor" : "BtreeCursor score",
....
}
Second try
When I changed the "score" index to:
{
"disable" : 1,
"edges.name" : 1,
"score" : 1
}
The 'explain' returned :
"clauses" : [
{
"cursor" : "BtreeCursor name",
"isMultiKey" : true,
"n" : 0,
"nscannedObjects" : 304,
"nscanned" : 304,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"edges.name" : [
[
"GOOD",
"GOOD"
]
]
}
},
{
"cursor" : "BtreeCursor name",
"isMultiKey" : true,
"n" : 0,
"nscannedObjects" : 304,
"nscanned" : 304,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"edges.name" : [
[
"GOOD",
"GOOD"
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
....
}
Where the 'name' index is :
{
'edges.name' : 1
}
Why does Mongo refuse to use the 'disable' field in the index?
(I've also tried fields other than 'disable', but got the same problem.)

It looks like the query optimiser is choosing the most efficient index and the index on edges.name works best. I recreated your collection by inserting a couple documents according to your schema. I then created the compound index below.
db.test.ensureIndex({
"disable" : 1,
"edges.name" : 1,
"score" : 1
});
When running an explain on the query you specified, the index was used.
> db.test.find({ ... }).sort({ ... }).explain()
{
"cursor" : "BtreeCursor disable_1_edges.name_1_score_1",
"isMultiKey" : true,
...
}
However, as soon as I added the index on edges.name, the query optimiser chose that index for the query.
> db.test.find({ ... }).sort({ ... }).explain()
{
"cursor" : "BtreeCursor edges.name_1",
"isMultiKey" : true,
...
}
You can still hint the other index though, if you want the query to use the compound index.
> db.test.find({ ... }).sort({ ... }).hint("disable_1_edges.name_1_score_1").explain()
{
"cursor" : "BtreeCursor disable_1_edges.name_1_score_1",
"isMultiKey" : true,
...
}
[EDIT: Added possible explanation related to the use of $all.]
Note that if you run the query without $all, the query optimiser uses the compound index.
> db.test.find({
"disable": false,
"edges": { "$elemMatch": { "name": "GOOD" }}})
.sort({"score" : 1}).explain();
{
"cursor" : "BtreeCursor disable_1_edges.name_1_score_1",
"isMultiKey" : true,
...
}
I believe the issue here is that you are using $all. As you can see in your explain output, there are clauses, each pertaining to one of the values you are searching for with $all. So the query has to find all pairs of disable and edges.name for each of the clauses. My intuition is that these two runs with the compound index make it slower than a query that looks directly at edges.name and then weeds out disable. This might be related to this issue and this issue, which you might want to look into.
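To make the clause-per-value behaviour concrete, here is a plain JavaScript sketch of what the filter asks for. It only models the matching semantics on an in-memory array (the matchesQuery helper and the sample documents are made up for illustration); it says nothing about how the query optimiser actually walks an index.

```javascript
// Plain-JavaScript model of the filter; not MongoDB's implementation.
function matchesQuery(doc, requiredNames) {
  if (doc.disable !== false) return false;
  const edges = Array.isArray(doc.edges) ? doc.edges : [];
  // $all: every listed $elemMatch must be satisfied by some array element,
  // so each required name is effectively a separate lookup.
  return requiredNames.every((name) => edges.some((edge) => edge.name === name));
}

// Hypothetical sample documents following the schema in the question.
const docs = [
  { score: 3, disable: false, edges: [{ name: "GOOD", rank: 1 }, { name: "GREAT", rank: 2 }] },
  { score: 1, disable: false, edges: [{ name: "GOOD", rank: 1 }] },
  { score: 2, disable: true, edges: [{ name: "GOOD", rank: 1 }, { name: "GREAT", rank: 2 }] },
];

// Equivalent of find(...).sort({ score: 1 }).limit(40) on the sample data.
const result = docs
  .filter((doc) => matchesQuery(doc, ["GOOD", "GREAT"]))
  .sort((a, b) => a.score - b.score)
  .slice(0, 40);

// Only the first sample document has both names and disable: false.
console.log(result.map((doc) => doc.score));
```

Each name in the $all list needs its own pass over the array (or, in MongoDB's case, its own index range), which is consistent with the separate clauses in the explain output.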

Related

index for gte, lte and sort in different fields

My query to mongodb is:
db.records.find({ from_4: { '$lte': 7495 }, to_4: { '$gte': 7495 } }).sort({ from_11: 1 }).skip(60000).limit(100).hint("from_4_1_to_4_-1_from_11_1").explain()
I expect it to use the index from_4_1_to_4_-1_from_11_1
{
"from_4": 1,
"to_4": -1,
"from_11": 1
}
But I got an error:
error: {
"$err" : "Runner error: Overflow sort stage buffered data usage of 33555322 bytes exceeds internal limit of 33554432 bytes",
"code" : 17144
} at src/mongo/shell/query.js:131
How can I avoid this error?
Maybe I should create another index that better fits my query.
I tried an index with all ascending fields too ...
{
"from_4": 1,
"to_4": 1,
"from_11": 1
}
... but the same error.
P.S. I noticed that when I remove the skip command ...
> db.records.find({ from_4: { '$lte': 7495 }, to_4: { '$gte': 7495 } }).sort({ from_11: 1 }).limit(100).hint("from_4_1_to_4_-1_from_11_1").explain()
...it's OK: I get explain output, but it says that I don't use the index: "indexOnly" : false
{
"clauses" : [
{
"cursor" : "BtreeCursor from_4_1_to_4_-1_from_11_1",
"isMultiKey" : false,
"n" : 100,
"nscannedObjects" : 61868,
"nscanned" : 61918,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"from_4" : [
[
-Infinity,
7495
]
],
"to_4" : [
[
Infinity,
7495
]
],
"from_11" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
},
{
"cursor" : "BtreeCursor ",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"from_4" : [
[
-Infinity,
7495
]
],
"to_4" : [
[
Infinity,
7495
]
],
"from_11" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
"n" : 100,
"nscannedObjects" : 61868,
"nscanned" : 61918,
"nscannedObjectsAllPlans" : 61868,
"nscannedAllPlans" : 61918,
"scanAndOrder" : false,
"nYields" : 832,
"nChunkSkips" : 0,
"millis" : 508,
"server" : "myMac:27026",
"filterSet" : false
}
P.P.S. I have read the MongoDB tutorial about sort indexes and I think I am doing everything right.
Update
According to #dark_shadow's advice, I created 2 more indexes:
db.records.ensureIndex({from_11: 1})
db.records.ensureIndex({from_11: 1, from_4: 1, to_4: 1})
and the index from db.records.ensureIndex({from_11: 1}) turned out to be what I need:
db.records.find({ from_4: { '$lte': 7495 }, to_4: { '$gte': 7495 } }).sort({ from_11: 1 }).skip(60000).limit(100).explain()
{
"cursor" : "BtreeCursor from_11_1",
"isMultiKey" : false,
"n" : 100,
"nscannedObjects" : 90154,
"nscanned" : 90155,
"nscannedObjectsAllPlans" : 164328,
"nscannedAllPlans" : 164431,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1284,
"nChunkSkips" : 0,
"millis" : 965,
"indexBounds" : {
"from_11" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "myMac:27025",
"filterSet" : false
}
When you use range queries (and you are), MongoDB does not use the index for sorting anyway. You can check this by looking at the "scanAndOrder" value of your explain() once you test your query. If that value exists and is true, it means the result set is sorted in memory (scan and order) rather than read in index order. This is why you are getting the error in your first query.
As the Mongodb documentation says,
For in-memory sorts that do not use an index, the sort() operation is significantly slower. The sort() operation will abort when it uses 32 megabytes of memory.
You can check the value of scanAndOrder for your first query by running it with limit(100) only, so that the in-memory sort can complete.
Your second query works because you have used limit so it will sort only 100 documents which can be done in memory.
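The numbers in the error message line up with that 32 megabyte figure. Below is a quick arithmetic check in plain JavaScript, using only values from the question. The skip + limit bound is my reading of why dropping skip(60000) makes the sort fit; the error message itself does not state it.

```javascript
// Figures copied from the error message in the question.
const limitBytes = 33554432;
const usedBytes = 33555322;

// The documented 32 megabyte cap, expressed in bytes.
const thirtyTwoMiB = 32 * 1024 * 1024;

// To serve skip(60000).limit(100), an in-memory sort has to retain at least
// the first skip + limit documents in sort order before it can discard any.
const skip = 60000;
const limit = 100;
const docsToKeep = skip + limit;

console.log("limit matches 32 MiB:", limitBytes === thirtyTwoMiB);
console.log("bytes over the limit:", usedBytes - limitBytes);
console.log("documents to retain with skip:", docsToKeep);
console.log("documents to retain without skip:", limit);
```

So the first query tips over the cap by a small margin while holding tens of thousands of documents, whereas the query without skip only ever needs to hold 100.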
Why "indexOnly" : false ?
This simply indicates that not all the fields you wish to return are in the index; the BtreeCursor indicates that the index was used for the query (a BasicCursor would mean it had not). For this to be an indexOnly query, you would need to return only those fields that are in the index (that is: {_id : 0, from_4 : 1, to_4 : 1, from_11 : 1}) in your projection. That would mean that it would never have to touch the data itself and could return everything you need from the index alone. You can also check this with explain once you have modified your query to return only the mentioned fields.
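As a rough illustration of that rule, here is a small JavaScript check of whether a projection stays within an index's keys. The isProjectionCovered helper is hypothetical and deliberately simplified: it ignores the query filter and multikey indexes, both of which also matter for a real covered query.

```javascript
// Hypothetical helper: does the projection only return fields present in the index?
function isProjectionCovered(indexKeys, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  const included = Object.keys(projection).filter((field) => projection[field]);
  // Without any included field, whole documents are returned.
  if (included.length === 0) return false;
  // _id is returned by default unless the projection mentions it.
  const returned = "_id" in projection ? included : included.concat("_id");
  return returned.every((field) => indexed.has(field));
}

// Index from the question.
const index = { from_4: 1, to_4: -1, from_11: 1 };

console.log(isProjectionCovered(index, { _id: 0, from_4: 1, to_4: 1, from_11: 1 }));
console.log(isProjectionCovered(index, { from_4: 1, to_4: 1, from_11: 1 }));
console.log(isProjectionCovered(index, {}));
```

The second call differs from the first only by leaving _id switched on, which is enough to force a document fetch because _id is not part of this index.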
Now you may be confused: does it use the index or not? For sorting it won't use the index, but for querying it does. That's the reason you get a BtreeCursor (you should also have seen your index name in it).
Now, to solve your problem, you can either create two indexes:
{
"from_4": 1,
"to_4": 1,
}
{
"from_11" : 1
}
and then see whether it still gives the error or uses your index for sorting, by carefully observing the scanAndOrder value.
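One thing to watch when reading scanAndOrder: in the explain output from the question, the top-level value is false while both clauses report true. A small sketch of a check that looks at both levels follows; sortsInMemory is a made-up helper name, and the two sample objects are trimmed copies of the explain outputs shown in the question.

```javascript
// Hypothetical helper: true if the plan or any of its clauses sorted in memory.
function sortsInMemory(explainOutput) {
  const plans = [explainOutput].concat(explainOutput.clauses || []);
  return plans.some((plan) => plan.scanAndOrder === true);
}

// Trimmed from the explain output for the hinted compound index.
const withCompoundIndex = {
  cursor: "QueryOptimizerCursor",
  scanAndOrder: false,
  clauses: [
    { cursor: "BtreeCursor from_4_1_to_4_-1_from_11_1", scanAndOrder: true },
    { cursor: "BtreeCursor ", scanAndOrder: true },
  ],
};

// Trimmed from the explain output after adding the { from_11: 1 } index.
const withFrom11Index = { cursor: "BtreeCursor from_11_1", scanAndOrder: false };

console.log(sortsInMemory(withCompoundIndex));
console.log(sortsInMemory(withFrom11Index));
```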
There is one more workaround:
Change the order of the compound index:
{
"from_11" : 1,
"from_4": 1,
"to_4": 1,
}
I am not sure about this approach, but it should hopefully work.
Looking at what you are trying to get, you can also do sort with {from_11:-1}.limit(1868).
I hope I have made the things a bit clearer now. Please do some testing based on my suggestions. If you face any issues, please let me know. We can work on it.
Thanks

Speed up MongoDB aggregation

I have a sharded collection "my_collection" with the following structure:
{
"CREATED_DATE" : ISODate(...),
"MESSAGE" : "Test Message",
"LOG_TYPE": "EVENT"
}
The MongoDB environment is sharded with 2 shards. The above collection is sharded using a hashed shard key on LOG_TYPE. There are 7 other possible values for the LOG_TYPE attribute.
I have 1 million documents in "my_collection" and I am trying to find the count of documents based on the LOG_TYPE using the following query:
db.my_collection.aggregate([
{ "$group" :{
"_id": "$LOG_TYPE",
"COUNT": { "$sum":1 }
}}
])
But this returns the result in about 3 seconds. Is there any way to improve this? Also, when I run the explain command, it shows that no index has been used. Does the group command not use an index?
There are currently some limitations in what the aggregation framework can do to improve the performance of your query, but you can help it in the following way:
db.my_collection.aggregate([
{ "$sort" : { "LOG_TYPE" : 1 } },
{ "$group" :{
"_id": "$LOG_TYPE",
"COUNT": { "$sum":1 }
}}
])
By adding a sort on LOG_TYPE you will be "forcing" the optimizer to use an index on LOG_TYPE to get the documents in order. This will improve the performance in several ways, but differently depending on the version being used.
On real data, if the data coming into the $group stage is sorted, the totals can be accumulated more efficiently. You can see the different query plans, where with $sort it will use the shard key index. The improvement this gives in actual performance will depend on the number of values in each "bucket". In general, LOG_TYPE having only seven distinct values makes it an extremely poor shard key, but it does mean that in all likelihood the following code will be a lot faster than even an optimized aggregation:
db.my_collection.distinct("LOG_TYPE").forEach(function(lt) {
print(db.my_collection.count({"LOG_TYPE":lt}));
});
There are a limited number of things that you can do in MongoDB, at the end of the day this might be a physical problem that extends beyond MongoDB itself, maybe latency causing configsrvs to respond untimely or results to be brought back from shards too slowly.
However, you might be able to solve some performance problems by using a covered query. Since you are in fact sharding on LOG_TYPE, you will already have an index on it (required before you can shard on it); not only that, but the aggregation framework will automatically add a projection, so that won't help.
MongoDB is likely having to communicate to every shard for the results, otherwise called a scatter and gather operation.
$group on its own will not use an index.
These are my results on 2.4.9:
> db.t.ensureIndex({log_type:1})
> db.t.runCommand("aggregate", {pipeline: [{$group:{_id:'$log_type'}}], explain: true})
{
"serverPipeline" : [
{
"query" : {
},
"projection" : {
"log_type" : 1,
"_id" : 0
},
"cursor" : {
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
},
"allPlans" : [
{
"cursor" : "BasicCursor",
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"indexBounds" : {
}
}
],
"server" : "ubuntu:27017"
}
},
{
"$group" : {
"_id" : "$log_type"
}
}
],
"ok" : 1
}
This is the result from 2.6:
> use gthtg
switched to db gthtg
> db.t.insert({log_type:"fdf"})
WriteResult({ "nInserted" : 1 })
> db.t.ensureIndex({log_type: 1})
{ "numIndexesBefore" : 2, "note" : "all indexes already exist", "ok" : 1 }
> db.t.runCommand("aggregate", {pipeline: [{$group:{_id:'$log_type'}}], explain: true})
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"fields" : {
"log_type" : 1,
"_id" : 0
},
"plan" : {
"cursor" : "BasicCursor",
"isMultiKey" : false,
"scanAndOrder" : false,
"allPlans" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"scanAndOrder" : false
}
]
}
}
},
{
"$group" : {
"_id" : "$log_type"
}
}
],
"ok" : 1
}

MongoDB - Index Intersection with two multikey indexes

I have two arrays in my collection (one is an embedded document and the other one is just a simple collection of strings). A document for example:
{
"_id" : ObjectId("534fb7b4f9591329d5ea3d0c"),
"_class" : "discussion",
"title" : "A",
"owner" : "1",
"tags" : ["tag-1", "tag-2", "tag-3"],
"creation_time" : ISODate("2014-04-17T11:14:59.777Z"),
"modification_time" : ISODate("2014-04-17T11:14:59.777Z"),
"policies" : [
{
"participant_id" : "2",
"action" : "CREATE"
}, {
"participant_id" : "1",
"action" : "READ"
}
]
}
Since some of the queries will include only the policies and some will include the tags and the participants arrays, and considering that I can't create a multikey index with two arrays, I thought this would be a classic scenario for index intersection.
I'm executing a query, but I can't see the intersection kick in.
Here are the indexes:
db.discussion.getIndexes()
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test-fw.discussion"
},
{
"v" : 1,
"key" : {
"tags" : 1,
"creation_time" : 1
},
"name" : "tags",
"ns" : "test-fw.discussion",
"dropDups" : false,
"background" : false
},
{
"v" : 1,
"key" : {
"policies.participant_id" : 1,
"policies.action" : 1
},
"name" : "policies",
"ns" : "test-fw.discussion"
}
Here is the query:
db.discussion.find({
"$and" : [
{ "tags" : { "$in" : [ "tag-1" , "tag-2" , "tag-3"] }},
{ "policies" : { "$elemMatch" : {
"$and" : [
{ "participant_id" : { "$in" : [
"participant-1",
"participant-2",
"participant-3"
]}},
{ "action" : "READ"}
]
}}}
]
})
.limit(20000).sort({ "creation_time" : 1 }).explain();
And here is the result of the explain:
"clauses" : [
{
"cursor" : "BtreeCursor tags",
"isMultiKey" : true,
"n" : 10000,
"nscannedObjects" : 10000,
"nscanned" : 10000,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"tags" : [
[
"tag-1",
"tag-1"
]
],
"creation_time" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
},
{
"cursor" : "BtreeCursor tags",
"isMultiKey" : true,
"n" : 10000,
"nscannedObjects" : 10000,
"nscanned" : 10000,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"tags" : [
[
"tag-2",
"tag-2"
]
],
"creation_time" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
},
{
"cursor" : "BtreeCursor tags",
"isMultiKey" : true,
"n" : 10000,
"nscannedObjects" : 10000,
"nscanned" : 10000,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"tags" : [
[
"tag-3",
"tag-3"
]
],
"creation_time" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
"n" : 20000,
"nscannedObjects" : 30000,
"nscanned" : 30000,
"nscannedObjectsAllPlans" : 30203,
"nscannedAllPlans" : 30409,
"scanAndOrder" : false,
"nYields" : 471,
"nChunkSkips" : 0,
"millis" : 165,
"server" : "User-PC:27017",
"filterSet" : false
Each of the tags in the query (tag-1, tag-2 and tag-3) has 10K documents.
Each of the policies ({participant-1,READ},{participant-2,READ},{participant-3,READ}) has 10K documents.
The AND operator results in 20K documents.
As I said earlier, I can't see why the intersection of the two indexes (I mean the policies and the tags indexes) doesn't kick in.
Can someone please shed some light on what I'm missing?
There are two things that are actually important to your understanding of this.
The first point is that the query optimizer can only use one index when resolving the query plan and cannot use both of the indexes you have specified. As such, it picks the one that is the best "fit" by its own determination, unless you explicitly specify this with a hint. Intersection somewhat suits, but now for the next point:
The second point is documented in the limitations of compound indexes. This actually points out that even if you were to "try" to create a compound index that included both of the array fields you want, then you could not. The problem here is that as an array this introduces too many possibilities for the bounds keys, and a multi-key index already introduces a fair level of complexity when used in compound with a standard field.
The limitation on combining the two multi-key indexes is the main problem here: much as it is on creation, the complexity of "combining" the two produces too many permutations to make it a viable option.
It might just be the case that the policies index is actually going to be the better one to use for this type of search, and you could probably amend this by specifying that field in the query first:
db.discussion.find({
"policies" : { "$elemMatch" : {
"participant_id" : { "$in" : [
"participant-1",
"participant-2",
"participant-3"
]},
"action" : "READ"
}},
"tags" : { "$in" : [ "tag-1" , "tag-2" , "tag-3"] }
})
That is if that will select the smaller range of data, which it probably does. Otherwise use the hint modifier as mentioned earlier.
If that does not actually directly help results, it might be worth re-considering the schema to something that would not involve having those values in array fields or some other type of "meta" field that could be easily looked up with an index.
Also note in the edited form that all the wrapping $and statements should not be required as "and" is implicit in MongoDB queries. As a modifier it is only required if you want two different conditions on the same field.
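The point about implicit "and" can be sketched in plain JavaScript. The flattenAnd helper below is hypothetical: it merges the clauses of a top-level $and into a single filter object and leaves the filter alone when a field appears twice, since an object literal can only hold one entry per key.

```javascript
// Hypothetical helper: fold a top-level $and into one filter object when safe.
function flattenAnd(filter) {
  if (!Array.isArray(filter.$and)) return filter;
  const merged = {};
  for (const [key, value] of Object.entries(filter)) {
    if (key !== "$and") merged[key] = value;
  }
  for (const clause of filter.$and) {
    for (const [field, condition] of Object.entries(clause)) {
      // Same field twice: the explicit $and is still needed.
      if (field in merged) return filter;
      merged[field] = condition;
    }
  }
  return merged;
}

// Different fields: the wrapper is redundant.
const redundant = { $and: [{ tags: { $in: ["tag-1"] } }, { owner: "1" }] };
console.log(JSON.stringify(flattenAnd(redundant)));

// Two conditions on the same field: the wrapper has to stay.
const needed = { $and: [{ tags: "tag-1" }, { tags: "tag-2" }] };
console.log(flattenAnd(needed) === needed);
```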
After doing a little testing, I believe Mongo can, in fact, use two multikey indexes in an intersection. I created a collection with the following structure:
{
"_id" : ObjectId("54e129c90ab3dc0006000001"),
"bar" : [
"hgcmdflitt",
...
"nuzjqxmzot"
],
"foo" : [
"bxzvqzlpwy",
...
"xcwrwluxbd"
]
}
I created indexes on foo and bar and then ran the following query. Note the "true" passed in to explain. This enables verbose mode.
db.col.find({"bar":"hgcmdflitt", "foo":"bxzvqzlpwy"}).explain(true)
In the verbose results, you can find the "allPlans" section of the response, which will show you all of the query plans mongo considered.
"allPlans" : [
{
"cursor" : "BtreeCursor bar_1",
...
},
{
"cursor" : "BtreeCursor foo_1",
...
},
{
"cursor" : "Complex Plan"
...
}
]
If you see a plan with "cursor" : "Complex Plan" that means mongo considered using an index intersection. To find the reasons why mongo might not have decided to actually use that query plan, see this answer: Why doesn't MongoDB use index intersection?
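If you save the verbose explain result, that check is easy to script. A minimal sketch follows; consideredIntersection is a made-up helper name, and the sample object only mirrors the trimmed "allPlans" section shown above.

```javascript
// Hypothetical helper: did any considered plan report "Complex Plan"?
function consideredIntersection(explainOutput) {
  const plans = explainOutput.allPlans || [];
  return plans.some((plan) => plan.cursor === "Complex Plan");
}

// Trimmed stand-in for the verbose explain output above.
const verboseExplain = {
  allPlans: [
    { cursor: "BtreeCursor bar_1" },
    { cursor: "BtreeCursor foo_1" },
    { cursor: "Complex Plan" },
  ],
};

console.log(consideredIntersection(verboseExplain));
console.log(consideredIntersection({ allPlans: [{ cursor: "BtreeCursor bar_1" }] }));
```

Note that this only tells you the plan was considered, not that it was the one chosen.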

MongoDB embedded secondary compound index covered slow query

I have the following embedded secondary compound index:
db.people.ensureIndex({"sources_names.source_id":1,"sources_names.value":1})
Here is part of db.people.getIndexes():
{
"v" : 1,
"key" : {
"sources_names.source_id" : 1,
"sources_names.value" : 1
},
"ns" : "diglibtest.people",
"name" : "sources_names.source_id_1_sources_names.value_1"
}
So I ran the following index-covered query:
db.people.find({ "sources_names.source_id": ObjectId('5166d57f7a8f348676000001'), "sources_names.value": "Ulrike Weiland" }, {"sources_names.source_id":1, "sources_names.value":1, "_id":0} ).pretty()
{
"sources_names" : [
{
"value" : "Ulrike Weiland",
"source_id" : ObjectId("5166d57f7a8f348676000001")
}
]
}
It took about 5 seconds. So I ran explain:
db.people.find({ "sources_names.source_id": ObjectId('5166d57f7a8f348676000001'), "sources_names.value": "Ulrike Weiland" }, {"sources_names.source_id":1, "sources_names.value":1, "_id":0 }).explain()
{
"cursor" : "BtreeCursor sources_names.source_id_1_sources_names.value_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 1260353,
"nscanned" : 1260353,
"nscannedObjectsAllPlans" : 1260353,
"nscannedAllPlans" : 1260353,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 4308,
"indexBounds" : {
"sources_names.source_id" : [
[
ObjectId("5166d57f7a8f348676000001"),
ObjectId("5166d57f7a8f348676000001")
]
],
"sources_names.value" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "dash-pc.local:27017"
}
But why does this index-covered query go through the whole database? How should I create the index to boost performance?
Thanks!
You are using a multikey index (i.e. sources_names.source_id) in multiple places, from the docs ( http://docs.mongodb.org/manual/tutorial/create-indexes-to-support-queries/#create-indexes-that-support-covered-queries ):
An index cannot cover a query if:
any of the indexed fields in any of the documents in the collection includes an array.
If an indexed field is an array, the index becomes a multi-key index and cannot
support a covered query.
You can tell this is a multikey index here from the explain:
"isMultiKey" : true,
Basically, the dot notation is classed as multikey because sources_names is an array; as such, the index contains an array.
As for improving the speed: I have not looked into this, but your problem is here:
"sources_names.value" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
That is, the index is not being used optimally to find sources_names.value.
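One thing worth keeping in mind when reading those open bounds, offered as an observation about query semantics rather than a confirmed diagnosis: two dotted predicates on an array of subdocuments do not have to be satisfied by the same array element. The plain JavaScript sketch below models that difference; the helper names are made up, and short strings stand in for the ObjectId values.

```javascript
// Hypothetical document: no single element has source_id "A" and the searched name.
const doc = {
  sources_names: [
    { source_id: "A", value: "Someone Else" },
    { source_id: "B", value: "Ulrike Weiland" },
  ],
};

// Dotted predicates: each condition may be satisfied by a different element.
function matchesDotted(d, sourceId, value) {
  return (
    d.sources_names.some((entry) => entry.source_id === sourceId) &&
    d.sources_names.some((entry) => entry.value === value)
  );
}

// $elemMatch semantics: one element has to satisfy both conditions.
function matchesElemMatch(d, sourceId, value) {
  return d.sources_names.some(
    (entry) => entry.source_id === sourceId && entry.value === value
  );
}

console.log(matchesDotted(doc, "A", "Ulrike Weiland"));
console.log(matchesElemMatch(doc, "A", "Ulrike Weiland"));
```

Because the dotted form can pair values from different elements, a tight range on the first key does not by itself say which index entries hold the matching value. Whether rewriting the query with $elemMatch tightens the bounds on your server version is something to verify with explain.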
Edit
I thought that the answer I just gave was a bit weird, since this should not be a multikey index, so I actually went off and tested this:
> db.gh.ensureIndex({'d.id':1,'d.g':1})
> db.gh.find({'d.id':5, 'd.g':'d'})
{ "_id" : ObjectId("516826e5f44947064473a00a"), "d" : { "id" : 5, "g" : "d" } }
> db.gh.find({'d.id':5, 'd.g':'d'}).explain()
{
"cursor" : "BtreeCursor d.id_1_d.g_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"d.id" : [
[
5,
5
]
],
"d.g" : [
[
"d",
"d"
]
]
},
"server" : "ubuntu:27017"
}
It seems my original thoughts were right: this shouldn't be a multikey index. I think you have some dirty data in value, and it is causing you problems.
I would go through your database and make sure that your records are correctly entered.
You most likely have something like:
{
"sources_names" : [
{
"value" : ["Ulrike Weiland", 1],
"source_id" : ObjectId("5166d57f7a8f348676000001")
}
]
}
somewhere.

MongoDB two seemingly identical queries - one uses an index the other doesn't

I noticed in Mongo logs that some queries were taking longer than expected.
Fri Jan 4 08:53:39 [conn587] query mydb.User query: { query: { someField: "eu", lastRecord.importantValue: { $ne: nan.0 }, lastRecord.otherValue: { $gte: 1000 } }, orderby: { lastRecord.importantValue: -1 } } ntoreturn:50 ntoskip:0 nscanned:40681 scanAndOrder:1 keyUpdates:0 numYields: 1 locks(micros) r:2649788 nreturned:50 reslen:334041 1575ms
Considering that I had built an index on {someField : 1, "lastRecord.otherValue" : 1, "lastRecord.importantValue" : -1} I went to investigate.
During that I noticed that what seem like two identical queries to me, just phrased differently syntactically and what return identical values, are executed differently by MongoDB - one uses the index as expected, while the other doesn't.
And my web application invokes the version that doesn't use indexes.
I'm obviously misunderstanding something fundamental here.
Index used fine:
> db.User.find({}, {_id:1}).sort({"lastRecord.importantValue" : -1}).limit(5).explain()
{
"cursor" : "BtreeCursor lastRecord.importantValue_-1",
"isMultiKey" : false,
"n" : 5,
"nscannedObjects" : 5,
"nscanned" : 5,
"nscannedObjectsAllPlans" : 5,
"nscannedAllPlans" : 5,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"lastRecord.importantValue" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "whatever"
}
Index not used:
> db.User.find({$query: {}, $orderby : {"lastRecord.importantValue": -1}}, {_id:1}).limit(5).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 67281,
"nscanned" : 67281,
"nscannedObjectsAllPlans" : 67281,
"nscannedAllPlans" : 67281,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 133,
"indexBounds" : {
},
"server" : "whatever"
}
Hint doesn't help:
> db.User.find({$query: {}, $orderby : {"lastRecord.importantValue": -1}, $hint : {"lastRecord.importantValue" : -1}}, {_id:1}).limit(5).explain()
{
"cursor" : "BasicCursor",
// snip
}
However, the returned values are the same (as expected):
> db.User.find({}, {_id:1}).sort({"lastRecord.importantValue" : -1}).limit(5)
{ "_id" : NumberLong(500280899) }
{ "_id" : NumberLong(500335132) }
{ "_id" : NumberLong(500378261) }
{ "_id" : NumberLong(500072584) }
{ "_id" : NumberLong(500071366) }
> db.User.find({$query: {}, $orderby : {"lastRecord.importantValue": -1}}, {_id:1}).limit(5)
{ "_id" : NumberLong(500280899) }
{ "_id" : NumberLong(500335132) }
{ "_id" : NumberLong(500378261) }
{ "_id" : NumberLong(500072584) }
{ "_id" : NumberLong(500071366) }
Index is present (this one I created to test the simpler queries, the compound index is also still there):
> db.User.getIndexes()
[
// snip other indexes
{
"v" : 1,
"key" : {
"lastRecord.importantValue" : -1
},
"ns" : "mydb.User",
"name" : "lastRecord.importantValue_-1"
}
]
Morphia code just FYI (not sure if I can get exact command it generates):
ds.find(User.class).filter("someField =", v1)
.filter("lastRecord.importantValue !=", Double.NaN)
.filter("lastRecord.otherValue >=", v2)
.order("-lastRecord.importantValue")
.limit(50);
Any ideas?
Edit 6-Jan:
Just noticed another instance of this in the logs:
TOKEN ip-10-212-234-60 Sun Jan 6 09:20:54 [conn249] query mydb.User query: { query: { someField: "eu", lastRecord.otherValue: { $gte: 1000 } }, orderby: { lastRecord.importantValue: -1 } } cursorid:9069510232503855502 ntoreturn:50 ntoskip:0 nscanned:2042 keyUpdates:0 locks(micros) r:118923 nreturned:50 reslen:344959 118ms
Note I have since removed the $ne from the query. So it executes in 118 ms and (if I interpret right) scans 2042 rows only.
However if I do the following from console on the same server:
> db.User.find({$query: { someField: "eu", "lastRecord.otherValue": { $gte: 1000 } }, $orderby: { "lastRecord.importantValue": -1 } }).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 70308,
"nscanned" : 70308,
"nscannedObjectsAllPlans" : 70308,
"nscannedAllPlans" : 70308,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 847,
"indexBounds" : {
},
"server" : "ip-whatever:27017"
}
So could it really be just a bug in explain?
Edit - further update 6-Jan:
On the other hand on my local system (same indexes, including "{someField : 1, "lastRecord.otherValue" : 1, "lastRecord.importantValue" : -1}") I managed to obtain the following under load:
Sun Jan 06 17:43:56 [conn33] query mydb.User query: { query: { someField: "eu", lastRecord.otherValue: { $gte: 1000 } }, orderby: { lastRecord.importantValue: -1 } }
cursorid:76077040284571 ntoreturn:50 ntoskip:0 nscanned:183248 keyUpdates:0 numYields: 2318 locks(micros) r:285016301 nreturned:50 reslen:341500 148567ms
148567ms :(
In fact, the problem is mixing up the two syntaxes.
According to the documentation : http://docs.mongodb.org/manual/reference/operator/query/
So when you use .explain() in:
db.User.find({}, {_id:1}).sort({"lastRecord.importantValue" : -1}).limit(5).explain()
it works fine, but when you use this:
db.User.find({$query: {}, $orderby : {"lastRecord.importantValue": -1}}, {_id:1}).limit(5).explain()
explain() belongs to the first type of syntax: with the second, you should put $explain: 1 inside the $query-style document and not call .explain() afterwards.
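To show the shape of the second syntax, here is a small JavaScript sketch that builds the wrapper document. The toWrappedQuery helper is made up for illustration, and the $query, $orderby and $explain meta-operators are the ones used in this thread for the MongoDB versions discussed here; check your server's documentation before relying on them.

```javascript
// Hypothetical helper: build the wrapped form of a query with its modifiers.
function toWrappedQuery(filter, sort, explain) {
  const wrapped = { $query: filter, $orderby: sort };
  // In this syntax explain is a modifier inside the document, not a cursor method.
  if (explain) wrapped.$explain = 1;
  return wrapped;
}

const wrapped = toWrappedQuery({}, { "lastRecord.importantValue": -1 }, true);
console.log(JSON.stringify(wrapped));
```

The resulting object is what would be passed as the first argument to find(), with no .explain() call chained afterwards.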
See this question also : MongoDB $query operator ignores index?
It is about much the same issue.