I created a compound index on several fields for a particular query I am running:
{
"v" : 1,
"key" : {
"MODIFIED" : -1,
"state" : 1,
"fail" : 1,
"generated" : 1
},
"ns" : "foo.bar",
"name" : "MODIFIED_-1_state_1_fail_1_generated_1"
}
However, when I execute my query, it doesn't appear to be using my index. Could you please provide some insight into what I'm doing wrong?
Thank you!
db.foo.find( {
"$or": [
{
"MODIFIED": {
"$gt": {
"sec": 1321419600,
"usec": 0
}
}
},
{
"$or": [
{"state": "ca"},
{"state": "ok"}
]
}
],
"$and": [
{"fail": {"$ne": 1}},
{"generated": {"$exists": false}}
]
}).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 464215,
"nscannedObjects" : 464215,
"n" : 0,
"millis" : 7549,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
There's a good reason your index cannot be used for this query, and I also think there are some issues with the query itself. The index isn't being hit because of the nested $or operator, but I think the underlying problem is not knowing all the operators MongoDB makes available to you:
First of all, your nested $or that checks whether state is either "ca" or "ok" is unnecessary. Since it's the main reason you're not hitting your index, replace it with state:{$in:["ca", "ok"]}, which does exactly the same thing. Now your query is:
db.foo.find( {
"$or": [
{
"MODIFIED": {
"$gt": {
"sec": 1321419600,
"usec": 0
}
}
},
{
state:{$in:["ca", "ok"]}
}
],
"$and": [
{"fail": {"$ne": 1}},
{"generated": {"$exists": false}}
]
}).explain();
This version will hit your index. Your second issue is that the top-level $and clause is unnecessary, because the conditions in a query document are already ANDed together. Note that AND(OR(A, B), AND(C, D)) = AND(OR(A, B), C, D). This query does the same thing:
db.foo.find( {
"$or": [
{
"MODIFIED": {
"$gt": {
"sec": 1321419600,
"usec": 0
}
}
},
{
state:{$in:["ca", "ok"]}
}
],
"fail": {"$ne": 1},
"generated": {"$exists": false}
}).explain();
This version still hits the index:
{
"clauses" : [
{
"cursor" : "BtreeCursor MODIFIED_-1_state_1_fail_1_generated_1 multi",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"MODIFIED" : [
[
{
"$maxElement" : 1
},
{
"sec" : 1321419600,
"usec" : 0
}
]
],
"state" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"fail" : [
[
{
"$minElement" : 1
},
1
],
[
1,
{
"$maxElement" : 1
}
]
],
"generated" : [
[
null,
null
]
]
}
},
{
"cursor" : "BasicCursor",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
],
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1
}
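The two rewrites described in this answer (collapsing an $or of equalities on one field into $in, and hoisting the $and clauses to the top level) can be sketched as a small query-rewriting function. This is a hypothetical helper written for illustration, not part of MongoDB or any driver. It assumes query documents are plain JavaScript objects, does not recurse into nested $and clauses, and keeps a clause under $and whenever its field is already constrained at the top level, so nothing is silently overwritten.

```javascript
// Hypothetical helper (not a MongoDB API): rewrite {$or: [{f: v1}, {f: v2}]}
// into {f: {$in: [v1, v2]}} when every branch is a plain equality on the same field.
function collapseOrToIn(clause) {
  const branches = clause.$or;
  if (!Array.isArray(branches) || branches.length === 0 || Object.keys(clause).length !== 1) {
    return clause;
  }
  const field = Object.keys(branches[0])[0];
  const isPlainEquality = (b) => {
    const keys = Object.keys(b);
    const value = b[keys[0]];
    // Only scalar equality values qualify; operator objects such as {$gt: ...} do not.
    return keys.length === 1 && keys[0] === field && (value === null || typeof value !== "object");
  };
  if (field === undefined || field.startsWith("$") || !branches.every(isPlainEquality)) {
    return clause;
  }
  return { [field]: { $in: branches.map((b) => b[field]) } };
}

// Hypothetical helper: apply collapseOrToIn to each $or branch and hoist the
// clauses of a top-level $and into the query document itself (implicit AND).
function rewriteQuery(query) {
  const out = {};
  const keptUnderAnd = [];
  for (const [key, value] of Object.entries(query)) {
    if (key === "$or") out.$or = value.map(collapseOrToIn);
    else if (key !== "$and") out[key] = value;
  }
  for (const clause of query.$and || []) {
    for (const [key, value] of Object.entries(clause)) {
      // A field that is already constrained stays under $and instead of being overwritten.
      if (key in out) keptUnderAnd.push({ [key]: value });
      else out[key] = value;
    }
  }
  if (keptUnderAnd.length > 0) out.$and = keptUnderAnd;
  return out;
}

const original = {
  $or: [
    { MODIFIED: { $gt: { sec: 1321419600, usec: 0 } } },
    { $or: [{ state: "ca" }, { state: "ok" }] },
  ],
  $and: [{ fail: { $ne: 1 } }, { generated: { $exists: false } }],
};

console.log(JSON.stringify(rewriteQuery(original)));
```

Applied to the original query, the helper produces the same filter as the final query in this answer: the MODIFIED clause is left alone, the nested $or becomes state: {$in: ["ca", "ok"]}, and fail and generated move to the top level.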
Hope that helps! By the way, it's slightly more conventional to give the first key in a compound index order 1 and to use -1 on a later key. The -1 only determines the direction relative to the previous field.
The following document is stored in a collection:
"ldr": {
"d": NumberInt(318),
"w": NumberInt(46),
"m": NumberInt(10),
"pts": [
{
"lid": ObjectId("47cc67093475061e3d95369d"),
"dPts": NumberLong(110),
"wPts": NumberLong(110),
"mPts": NumberLong(220),
"aPts": NumberLong(3340)
},
{
"lid": ObjectId("56316279be4f0eda62ebfee0"),
"dPts": NumberInt(0),
"wPts": NumberInt(0),
"mPts": NumberInt(0),
"aPts": NumberInt(0)
}
]
}
I have 4 indexes on a collection:
ldr.pts.lid_1_ldr.d_1_ldr.pts.dPts_-1
ldr.pts.lid_1_ldr.w_1_ldr.pts.wPts_-1
ldr.pts.lid_1_ldr.m_1_ldr.pts.mPts_-1
ldr.pts.lid_1_ldr.pts.aPts_-1
I use the following query:
db.my_collection.find({"ldr.pts.lid":ObjectId("47cc67093475061e3d95369d"), "ldr.w": NumberInt(46)},{"ldr":1}).sort({"ldr.pts.wPts":-1}).explain()
Note: I have also run this query without the {ldr:1} projection, with the same result.
I would expect the query above to use the following index:
ldr.pts.lid_1_ldr.w_1_ldr.pts.wPts_-1
However, this is the result of the explain:
{
"cursor" : "BtreeCursor ldr.pts.lid_1_ldr.d_1_ldr.pts.dPts_-1",
"isMultiKey" : true,
"n" : 3,
"nscannedObjects" : 4,
"nscanned" : 4,
"nscannedObjectsAllPlans" : 16,
"nscannedAllPlans" : 16,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"ldr.pts.lid" : [
[
ObjectId("47cc67093475061e3d95369d"),
ObjectId("47cc67093475061e3d95369d")
]
],
"ldr.d" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"ldr.pts.dPts" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "Beast-PC:27017",
"filterSet" : false
}
As you can see, the first index is picked instead.
I've tried hinting the correct index, but that still results in indexOnly being false and scanAndOrder being true.
Any ideas?
Sorting on a field within an array isn't likely to produce what you're expecting: your descending sort on ldr.pts.wPts sorts each document by the maximum of all the wPts values in its pts array, rather than by the wPts value of the matching pts array element.
That's at the root of why your query can't use an index for the sorting.
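A small simulation makes the difference concrete. The sketch below is plain JavaScript, not MongoDB: it models the documented rule that a descending sort on an array field compares documents by the largest element of the array, and compares that with sorting by the matching element's value. The two documents and the lid values "x" and "y" are made up for illustration.

```javascript
// Simplified model (plain JavaScript, not MongoDB) of sorting on "ldr.pts.wPts".
// The documents and lid values are made up for illustration.
const docs = [
  { _id: "A", ldr: { pts: [{ lid: "x", wPts: 5 }, { lid: "y", wPts: 100 }] } },
  { _id: "B", ldr: { pts: [{ lid: "x", wPts: 50 }] } },
];

// Documented MongoDB behaviour: a descending sort on an array field
// compares documents by the largest value in the array.
const maxWPts = (doc) => Math.max(...doc.ldr.pts.map((p) => p.wPts));

// What the question expects: the wPts of the element whose lid matched the query.
const matchedWPts = (doc, lid) => doc.ldr.pts.find((p) => p.lid === lid).wPts;

const lid = "x";
const hits = docs.filter((d) => d.ldr.pts.some((p) => p.lid === lid));

const mongoOrder = [...hits].sort((a, b) => maxWPts(b) - maxWPts(a)).map((d) => d._id);
const expectedOrder = [...hits]
  .sort((a, b) => matchedWPts(b, lid) - matchedWPts(a, lid))
  .map((d) => d._id);

console.log(mongoOrder);    // [ 'A', 'B' ]
console.log(expectedOrder); // [ 'B', 'A' ]
```

Document A sorts first under MongoDB's rule only because of its unrelated lid "y" element, even though its matching element has the smaller wPts.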
I've spent the better part of this morning re-reading the MongoDB docs, blogs and other answers on Stack Overflow, and I'm still missing something that I hope is painfully obvious to others.
EDIT: I've changed the schema of the document so that it no longer has sub-documents (metadata.*), and I'm still having problems with the query not being covered by the index. I've dropped the existing indexes and re-indexed with new ones.
So now I've got:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.daily"
},
{
"v" : 1,
"key" : {
"host" : 1,
"cid" : 1,
"title" : 1,
"urls" : 1,
"global" : -1,
"current" : -1,
"total" : -1
},
"name" : "byHostTotals",
"ns" : "test.daily"
},
{
"v" : 1,
"key" : {
"host" : 1,
"cid" : 1,
"title" : 1,
"urls" : 1,
"total" : -1,
"global" : -1,
"current" : -1
},
"name" : "byHostCurrents",
"ns" : "test.daily"
}
]
And given this query:
db.daily.find({'host': 'example.com'}, {'_id': 0, 'cid': 1, 'title': 1, 'current': 1}).hint("byHostCurrents").sort({'current': -1}).limit(10).explain()
is not showing up as covered by the index named "byHostCurrents":
{
"clauses" : [
{
"cursor" : "BtreeCursor byHostCurrents",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 1090,
"nscanned" : 1111,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"host" : [
[
"example.com",
"example.com"
]
],
"cid" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"title" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"total" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
],
"global" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
],
"current" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
}
},
{
"cursor" : "BtreeCursor ",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"host" : [
[
"usatoday.com",
"usatoday.com"
]
],
"cid" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"title" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"total" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
],
"global" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
],
"current" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
"n" : 10,
"nscannedObjects" : 1090,
"nscanned" : 1111,
"nscannedObjectsAllPlans" : 1090,
"nscannedAllPlans" : 1111,
"scanAndOrder" : false,
"nYields" : 8,
"nChunkSkips" : 0,
"millis" : 9,
"server" : "ubuntu:27017",
"filterSet" : false
}
MongoDB version is: 2.6.3.
So here's the skinny...
Take this query:
db.daily.find({'host': 'example.com'}, {'_id': 0, 'cid': 1, 'title': 1, 'current': 1}).hint("byHostCurrents").sort({'current': -1}).limit(10);
Without the .sort() it would use the index, but because the query sorts, the ORDER of the indexed fields becomes important.
For the above query to use an index, I'd need to make a new index like this:
db.daily.ensureIndex({'current': -1, 'host': 1, 'cid': 1, 'title': 1});
With this index in place we get indexOnly: true, since the index is walked in descending order of current and we only have to scan as many index entries as are needed to satisfy the 'host' = 'example.com' filter and the limit.
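The conditions for a covered query can be written down as a rough check. This is a hypothetical, simplified model rather than MongoDB's planner logic: it assumes inclusion-style projections and flat field names, and encodes the documented rules that every filtered and returned field must be in the index, that _id must be excluded unless the index contains it, and that (in the 2.x releases discussed here) a multikey index cannot cover a query.

```javascript
// Hypothetical helper: rough check of the covered-query conditions for one index.
function isCoveredQuery(indexKey, filter, projection, isMultiKey) {
  const indexed = new Set(Object.keys(indexKey));
  if (isMultiKey) return false; // multikey indexes cannot cover queries (2.x behaviour)
  // Every field used in the filter must be part of the index.
  const filterOk = Object.keys(filter).every((f) => indexed.has(f));
  // Every returned field must be part of the index (inclusion projection only).
  const returned = Object.keys(projection).filter((f) => f !== "_id" && projection[f]);
  const projectionOk = returned.length > 0 && returned.every((f) => indexed.has(f));
  // _id is returned by default, so it must be excluded or be in the index.
  const idOk = projection._id === 0 || projection._id === false || indexed.has("_id");
  return filterOk && projectionOk && idOk;
}

const currentHostIndex = { current: -1, host: 1, cid: 1, title: 1 };
const projection = { _id: 0, cid: 1, title: 1, current: 1 };

console.log(isCoveredQuery(currentHostIndex, { host: "example.com" }, projection, false)); // true
// Without {_id: 0} the query must fetch documents to return _id.
console.log(isCoveredQuery(currentHostIndex, { host: "example.com" }, { cid: 1, title: 1, current: 1 }, false)); // false
```

Under this model the a_1_c_1__id_1_d_1 index from the last question also covers find({}, {d:1}), because _id is part of that index. Note also that the byHostCurrents explain output above reports isMultiKey: true, which under these rules would on its own prevent a covered query.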
So in total I had to have 4 additional indexes to support my queries:
one to find the content ids with the most current people on it (the above index)
one to find the content ids that have had the most people on it (like the above but total: -1 rather than current: -1)
one to find content by host sorted by current (see index below) and,
one to find content by host sorted by total (like the one below)
db.daily.ensureIndex({'host': 1, 'current': -1, 'cid': 1, 'title': 1});
The MongoDB docs are not very clear on these things, especially when it comes to sorting. What they fail to say is that if you are going to sort, the sort fields must come directly after the index fields matched by equality in your query, or the sort must include all the preceding prefix fields.
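The rule can be sketched as a check on the index key pattern. This is a simplified, hypothetical model, not MongoDB's actual planner code: the sort fields must appear as a consecutive run of index fields, every index field in front of that run must be pinned by an equality match in the query, and the directions must either all match the index or all be reversed, since an index can be walked backwards. The real planner handles more cases (for example equality matches on fields inside the sort run), which this sketch ignores.

```javascript
// Hypothetical helper: simplified model of when a compound index can return
// results in the requested sort order without an in-memory sort (scanAndOrder).
function indexSupportsSort(indexKey, equalityFields, sortSpec) {
  const idx = Object.entries(indexKey); // [[field, direction], ...] in index order
  const sort = Object.entries(sortSpec);
  const pinned = new Set(equalityFields);
  for (let start = 0; start + sort.length <= idx.length; start++) {
    // Every index field in front of the sort fields must be pinned by an equality match.
    if (!idx.slice(0, start).every(([field]) => pinned.has(field))) break;
    const run = idx.slice(start, start + sort.length);
    if (!run.every(([field], k) => field === sort[k][0])) continue;
    // Directions must all match the index, or all be reversed (index walked backwards).
    const same = run.every(([, dir], k) => dir === sort[k][1]);
    const reversed = run.every(([, dir], k) => dir === -sort[k][1]);
    if (same || reversed) return true;
  }
  return false;
}

const byHostCurrentsKey = { host: 1, cid: 1, title: 1, urls: 1, total: -1, global: -1, current: -1 };
const hostCurrentKey = { host: 1, current: -1, cid: 1, title: 1 };
const currentHostKey = { current: -1, host: 1, cid: 1, title: 1 };

console.log(indexSupportsSort(byHostCurrentsKey, ["host"], { current: -1 })); // false
console.log(indexSupportsSort(hostCurrentKey, ["host"], { current: -1 }));    // true
console.log(indexSupportsSort(currentHostKey, ["host"], { current: -1 }));    // true
```

In the first case cid, title, urls, total and global sit between the equality-matched host and the sort field current, which is why byHostCurrents cannot supply the sort order for this query.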
For example given my original index from my question:
db.daily.ensureIndex({"host" : 1, "cid" : 1, "title" : 1, "urls" : 1, "global" : -1, "current" : -1, "total" : -1});
If I wanted a query to be covered by the index then I'd have to change from this:
db.daily.find({'host': 'example.com'}, {'_id': 0, 'cid': 1, 'title': 1, 'current': 1}).hint("byHostCurrents").sort({'current': -1}).limit(10);
To this:
db.daily.find({'host': 'example.com'}, {'_id': 0, 'cid': 1, 'title': 1, 'current': 1}).hint("byHostCurrents").sort({'cid':1, 'title':1, 'urls': 1, 'global': 1, 'current': -1}).limit(10);
which is not what I wanted.
Hope this helps someone in the future.
It seems to me that the following two queries should have exactly the same "explain" output:
Query 1:
{
$and: [
{ $or: [
{ Foo: "123" },
{ Bar: "456" }
] },
{ Baz: { $in: ["abc", "def"] } }
]
}
Query 2:
{
$or: [
{ Foo: "123" },
{ Bar: "456" }
],
Baz: { $in: ["abc", "def"] }
}
Note that I have indexes on { Foo: -1, Baz: -1 } and { Bar: -1, Baz: -1 }, so this is optimized for the $or operator. In fact, in the explain output for Query 2, I see two clauses, both with appropriate index bounds, one for (Foo, Baz) and one for (Bar, Baz). MongoDB is doing exactly what it's supposed to.
But in the first version (Query 1), there are no clauses anymore. It gives me a BasicCursor with no index bounds specified.
What's the difference between these two queries? Why does Mongo seem to be able to optimize #2 but not #1?
Right now I'm testing these queries using MongoVue, so I have control over the JSON, but ultimately I'm going to be using the C# driver, and I'm pretty sure it will always emit the syntax in #1 and not #2, so it's important to find out what's going on...
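Logically the two forms are the same filter, which a tiny evaluator can confirm. The sketch below is a hypothetical matcher supporting only equality, $in, $or and $and on flat fields (far less than MongoDB's real query language), run against a few made-up documents. It says nothing about index selection, only that both queries select the same documents.

```javascript
// Hypothetical matcher: supports only equality, $in, $or and $and on flat fields.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$and") return cond.every((q) => matches(doc, q));
    if (key === "$or") return cond.some((q) => matches(doc, q));
    if (cond !== null && typeof cond === "object" && "$in" in cond) {
      return cond.$in.includes(doc[key]);
    }
    return doc[key] === cond; // plain equality
  });
}

const query1 = {
  $and: [{ $or: [{ Foo: "123" }, { Bar: "456" }] }, { Baz: { $in: ["abc", "def"] } }],
};
const query2 = {
  $or: [{ Foo: "123" }, { Bar: "456" }],
  Baz: { $in: ["abc", "def"] },
};

// Made-up sample documents.
const sampleDocs = [
  { Foo: "123", Baz: "abc" },
  { Bar: "456", Baz: "def" },
  { Foo: "123", Baz: "xyz" },
  { Foo: "999", Bar: "000", Baz: "abc" },
];

const r1 = sampleDocs.map((d) => matches(d, query1));
const r2 = sampleDocs.map((d) => matches(d, query2));
console.log(r1); // [ true, true, false, false ]
console.log(r2); // [ true, true, false, false ]
```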
This seems to be a bug in MongoDB. What version are you using?
According to that bug report the issue is resolved in 2.5.3.
Until we move to the later versions (I am at 2.4.6) we will have to be careful with the $and operator.
I am going to try it in 2.6 as well.
UPDATE:
Indeed, it is fixed in 2.6.3, which I am now running.
> db.test.find()
{ "_id" : 1, "Fields" : { "K1" : 123, "K2" : 456 } }
{ "_id" : 2, "Fields" : { "K1" : 456, "K2" : 123 } }
> db.test.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.test"
},
{
"v" : 1,
"key" : {
"Fields.K1" : 1
},
"name" : "Fields.K1_1",
"ns" : "test.test"
},
{
"v" : 1,
"key" : {
"Fields.K2" : 1
},
"name" : "Fields.K2_1",
"ns" : "test.test"
}
]
> db.test.find({"$and" : [{ "Fields.K1" : 123, "Fields.K2" : 456}]}).explain()
{
"cursor" : "BtreeCursor Fields.K1_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 4,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"Fields.K1" : [
[
123,
123
]
]
},
"server" : "benihime:27017",
"filterSet" : false
}
> db.test.find({ "Fields.K1" : 123, "Fields.K2" : 456}).explain()
{
"cursor" : "BtreeCursor Fields.K1_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 4,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"Fields.K1" : [
[
123,
123
]
]
},
"server" : "benihime:27017",
"filterSet" : false
}
I have 200k records in my collection. My data model looks like this:
{
"_id" : ObjectId("51750ec159dcef125863b7c4"),
"DateAdded" : ISODate("2013-04-22T00:00:00.000Z"),
"DateRemoved" : ISODate("2013-12-22T00:00:00.000Z"),
"DealerID" : ObjectId("51750bd559dcef07ec964a41"),
"ExStockID" : "8324482",
"Make" : "Mazda",
"Model" : "3",
"Price" : 11479,
"Year" : 2012,
"Variant" : "1.6d (115) TS 5dr",
"Turnover": 150
}
I have several indexes on the collection; the one used by the aggregation framework is:
{
"DealerID" : 1,
"DateRemoved" : -1,
"Price" : 1,
"Turnover" : 1
}
The aggregate query which is being used:
db.stats.aggregate([
{
"$match": {
"DealerID": {
"$in": [
ObjectId("523325ac59dcef1b90a3d446"),
....
// here is specified more than 150 ObjectIds
]
},
"DateRemoved": {
"$gte": ISODate("2013-12-01T00:00:00Z"),
"$lt": ISODate("2014-01-01T00:00:00Z")
}
}
},
{ "$project" : { "Price":1, "Turnover":1 } },
{
"$group": {
"_id": null,
"Price": {
"$avg": "$Price"
},
"Turnover": {
"$avg": "$Turnover"
}
}
}]);
and this query takes between 30 and 200 seconds to execute.
How can I optimize this?
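It can help to spell out what the pipeline actually has to read. The plain-JavaScript sketch below is a hypothetical in-memory model of the $match/$project/$group stages, not MongoDB itself. It uses strings in place of ObjectIds and assumes every document has numeric Price and Turnover values (MongoDB's $avg skips non-numeric values, which the sketch does not model). It touches only DealerID, DateRemoved, Price and Turnover, so in principle an index holding exactly those four fields can answer the aggregation without reading the documents.

```javascript
// Hypothetical in-memory model of the pipeline (not MongoDB itself).
function averagePriceAndTurnover(docs, dealerIds, from, to) {
  const dealers = new Set(dealerIds);
  // $match: DealerID in the list and from <= DateRemoved < to
  const matched = docs.filter(
    (d) => dealers.has(d.DealerID) && d.DateRemoved >= from && d.DateRemoved < to
  );
  // $group over zero documents produces no output document
  if (matched.length === 0) return null;
  // $project keeps only Price and Turnover; $group with _id: null averages them
  const avg = (field) => matched.reduce((sum, d) => sum + d[field], 0) / matched.length;
  return { _id: null, Price: avg("Price"), Turnover: avg("Turnover") };
}

// Made-up sample data.
const sample = [
  { DealerID: "d1", DateRemoved: new Date("2013-12-22T00:00:00Z"), Price: 11479, Turnover: 150 },
  { DealerID: "d1", DateRemoved: new Date("2013-12-05T00:00:00Z"), Price: 10000, Turnover: 50 },
  { DealerID: "d2", DateRemoved: new Date("2013-12-10T00:00:00Z"), Price: 1, Turnover: 1 },
  { DealerID: "d1", DateRemoved: new Date("2014-02-01T00:00:00Z"), Price: 5, Turnover: 5 },
];

const december = [new Date("2013-12-01T00:00:00Z"), new Date("2014-01-01T00:00:00Z")];
console.log(averagePriceAndTurnover(sample, ["d1"], ...december));
// { _id: null, Price: 10739.5, Turnover: 100 }
```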
You can try to run explain on the aggregation pipeline, but as I don't have your full dataset, I can't try it out properly:
p = [
{
"$match": {
"DealerID": {
"$in": [
ObjectId("51750bd559dcef07ec964a41"),
ObjectId("51750bd559dcef07ec964a44"),
]
},
"DateRemoved": {
"$gte": ISODate("2013-12-01T00:00:00Z"),
"$lt": ISODate("2014-01-01T00:00:00Z")
}
}
},
{ "$project" : { "Price":1, "Turnover":1 } },
{
"$group": {
"_id": null,
"Price": {
"$avg": "$Price"
},
"Turnover": {
"$avg": "$Turnover"
}
}
}];
db.s.runCommand('aggregate', { pipeline: p, explain: true } );
I would suggest removing the fields that are not part of the $match (Price and Turnover). Also, I think you should switch the order of DealerID and DateRemoved, since you want to do one range search and then include all the dealers from that range. Doing it the other way around means that you can really only use the index for the 150 individual dealer values, and then you still need to do a range search.
Using @Derick's answer I found the index that prevented the covered index from being used. As far as I can tell, the query optimizer uses the first index that covers just the query itself, so I changed the order of the indexes. Here is the outcome before and after.
Before:
{
"serverPipeline" : [
{
"query" : {...},
"projection" : { "Price" : 1, "Turnover" : 1, "_id" : 0 },
"cursor" : {
"cursor" : "BtreeCursor DealerIDDateRemoved multi",
"isMultiKey" : false,
"n" : 11036,
"nscannedObjects" : 11008,
"nscanned" : 11307,
"nscannedObjectsAllPlans" : 11201,
"nscannedAllPlans" : 11713,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 58,
"indexBounds" : {...},
"allPlans" : [...],
"oldPlan" : {...},
"server" : "..."
}
},
{
"$group" : {...}
}
],
"ok" : 1
}
After these changes, indexOnly shows true, which means the query is now covered by the index:
{
"serverPipeline" : [
{
"query" : {...},
"projection" : { "Price" : 1, "Turnover" : 1, "_id" : 0 },
"cursor" : {
"cursor" : "BtreeCursor DealerIDDateRemovedPriceTurnover multi",
"isMultiKey" : false,
"n" : 11036,
"nscannedObjects" : 0,
"nscanned" : 11307,
"nscannedObjectsAllPlans" : 285,
"nscannedAllPlans" : 11713,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 58,
"indexBounds" : {...},
"allPlans" : [...],
"server" : "..."
}
},
{
"$group" : {...}
}
],
"ok" : 1
}
The query now runs in approximately 0.085-0.300 seconds. For additional information about covered queries, see "Create Indexes that Support Covered Queries" in the MongoDB documentation.
I've got a question on how to write an index properly to avoid resorting to a hint.
Sample "Test" Collection Schema
{
_id: ObjectId(<whatever>),
a: <whatever>,
b: <whatever>,
c: <whatever>,
d: <whatever>,
e: {
f: <whatever>,
g: <whatever>
}
}
Index on "Test"
db.test.ensureIndex( { "a": NumberInt(1), "c": NumberInt(1), "_id": NumberInt(1), "d": NumberInt(1) },
{ name: "a_1_c_1__id_1_d_1", background: true } );
Query without hint and query with hint...
> db.test.find({},{d:1}).explain();
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 752,
"nscannedObjects" : 752,
"nscanned" : 752,
"nscannedObjectsAllPlans" : 752,
"nscannedAllPlans" : 752,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 5,
"indexBounds" : {
},
"server" : <whatever>
}
> db.test.find({},{d:1}).hint("a_1_c_1__id_1_d_1").explain();
{
"cursor" : "BtreeCursor a_1_c_1__id_1_d_1",
"isMultiKey" : false,
"n" : 752,
"nscannedObjects" : 752,
"nscanned" : 752,
"nscannedObjectsAllPlans" : 752,
"nscannedAllPlans" : 752,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"a" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"c" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"d" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : <whatever>
}
I'd (obviously) like the query to use the covered index but I don't know how to get there without using the hint. Is it possible? I'd prefer to manipulate the index vs. changing the query but changing the query is an option, if need be.
Turns out this is a known issue. Apologies for the post.
https://jira.mongodb.org/browse/SERVER-2109