Increase Aggregation Framework Performance in MongoDB

I have 200k records in my collection. My data model looks as follows:
{
"_id" : ObjectId("51750ec159dcef125863b7c4"),
"DateAdded" : ISODate("2013-04-22T00:00:00.000Z"),
"DateRemoved" : ISODate("2013-12-22T00:00:00.000Z"),
"DealerID" : ObjectId("51750bd559dcef07ec964a41"),
"ExStockID" : "8324482",
"Make" : "Mazda",
"Model" : "3",
"Price" : 11479,
"Year" : 2012,
"Variant" : "1.6d (115) TS 5dr",
"Turnover": 150
}
I have several indexes on the collection. The one intended for the aggregation framework is:
{
"DealerID" : 1,
"DateRemoved" : -1,
"Price" : 1,
"Turnover" : 1
}
The aggregate query which is being used:
db.stats.aggregate([
{
"$match": {
"DealerID": {
"$in": [
ObjectId("523325ac59dcef1b90a3d446"),
....
// here is specified more than 150 ObjectIds
]
},
"DateRemoved": {
"$gte": ISODate("2013-12-01T00:00:00Z"),
"$lt": ISODate("2014-01-01T00:00:00Z")
}
}
},
{ "$project" : { "Price":1, "Turnover":1 } },
{
"$group": {
"_id": null,
"Price": {
"$avg": "$Price"
},
"Turnover": {
"$avg": "$Turnover"
}
}
}]);
This query takes between 30 and 200 seconds to execute.
How can I optimize this?

You can try to run explain on the aggregation pipeline, but as I don't have your full dataset, I can't try it out properly:
p = [
{
"$match": {
"DealerID": {
"$in": [
ObjectId("51750bd559dcef07ec964a41"),
ObjectId("51750bd559dcef07ec964a44"),
]
},
"DateRemoved": {
"$gte": ISODate("2013-12-01T00:00:00Z"),
"$lt": ISODate("2014-01-01T00:00:00Z")
}
}
},
{ "$project" : { "Price":1, "Turnover":1 } },
{
"$group": {
"_id": null,
"Price": {
"$avg": "$Price"
},
"Turnover": {
"$avg": "$Turnover"
}
}
}];
db.s.runCommand('aggregate', { pipeline: p, explain: true } );
I would suggest removing from the index the fields that are not part of the $match (Price and Turnover). Also, I think you should switch the order of DealerID and DateRemoved in the index, as you want to do one range search and then, from that range, include all the dealers. Doing it the other way around means that you can really only use the index for the 150 single items, and then you need to do a range search for each of them.

Using @Derick's answer, I found the index that was preventing the covered index from being used. As far as I can see, the query optimizer uses the first index that covers just the query itself, so I changed the order of the indexes. Here is the outcome before and after.
Before:
{
"serverPipeline" : [
{
"query" : {...},
"projection" : { "Price" : 1, "Turnover" : 1, "_id" : 0 },
"cursor" : {
"cursor" : "BtreeCursor DealerIDDateRemoved multi",
"isMultiKey" : false,
"n" : 11036,
"nscannedObjects" : 11008,
"nscanned" : 11307,
"nscannedObjectsAllPlans" : 11201,
"nscannedAllPlans" : 11713,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 58,
"indexBounds" : {...},
"allPlans" : [...],
"oldPlan" : {...},
"server" : "..."
}
},
{
"$group" : {...}
}
],
"ok" : 1
}
After these changes the indexOnly field shows true, which means the query is now covered by the index:
{
"serverPipeline" : [
{
"query" : {...},
"projection" : { "Price" : 1, "Turnover" : 1, "_id" : 0 },
"cursor" : {
"cursor" : "BtreeCursor DealerIDDateRemovedPriceTurnover multi",
"isMultiKey" : false,
"n" : 11036,
"nscannedObjects" : 0,
"nscanned" : 11307,
"nscannedObjectsAllPlans" : 285,
"nscannedAllPlans" : 11713,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 58,
"indexBounds" : {...},
"allPlans" : [...],
"server" : "..."
}
},
{
"$group" : {...}
}
],
"ok" : 1
}
The query now takes approximately 0.085 to 0.300 seconds. For additional information about covered queries, see "Create Indexes that Support Covered Queries".
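The covered-index condition described above can be checked mechanically: a query can only be answered from the index alone when every field it filters on and every field it returns is a key of that index, which is why _id has to be excluded from the projection here. Below is a minimal sketch in plain JavaScript that runs without a server. `canCover` is a hypothetical helper, not a MongoDB API, and the two key patterns are inferred from the index names in the explain output, which is an assumption. It ignores other restrictions such as multikey indexes.

```javascript
// Hypothetical helper, not part of MongoDB: reports whether an index's keys
// include every field the query filters on and every field it returns.
function canCover(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  const filterFields = Object.keys(filter);
  // Fields returned: those projected with 1; _id is returned unless set to 0.
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");
  return [...filterFields, ...returned].every((f) => indexed.has(f));
}

// Only the field names matter here, so the values are placeholders.
const filter = {
  DealerID: { $in: ["<dealer ids>"] },
  DateRemoved: { $gte: "<start>", $lt: "<end>" },
};
const projection = { Price: 1, Turnover: 1, _id: 0 };

// Assumed key patterns, inferred from the index names in the explain output.
const narrowIndex = { DealerID: 1, DateRemoved: -1 };
const wideIndex = { DealerID: 1, DateRemoved: -1, Price: 1, Turnover: 1 };

console.log("narrow index can cover:", canCover(narrowIndex, filter, projection));
console.log("wide index can cover:", canCover(wideIndex, filter, projection));
```

Only the wider index contains Price and Turnover, which matches the change from indexOnly false to indexOnly true in the two explain outputs.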

Related

MongoDB picking wrong index

The following document is stored in a collection:
"ldr": {
"d": NumberInt(318),
"w": NumberInt(46),
"m": NumberInt(10),
"pts": [
{
"lid": ObjectId("47cc67093475061e3d95369d"),
"dPts": NumberLong(110),
"wPts": NumberLong(110),
"mPts": NumberLong(220),
"aPts": NumberLong(3340)
},
{
"lid": ObjectId("56316279be4f0eda62ebfee0"),
"dPts": NumberInt(0),
"wPts": NumberInt(0),
"mPts": NumberInt(0),
"aPts": NumberInt(0)
}
]
}
I have 4 indexes on a collection:
ldr.pts.lid_1_ldr.d_1_ldr.pts.dPts_-1
ldr.pts.lid_1_ldr.w_1_ldr.pts.wPts_-1
ldr.pts.lid_1_ldr.m_1_ldr.pts.mPts_-1
ldr.pts.lid_1_ldr.pts.aPts_-1
I use the following query:
db.my_collection.find({"ldr.pts.lid":ObjectId("47cc67093475061e3d95369d"), "ldr.w": NumberInt(46)},{"ldr":1}).sort({"ldr.pts.wPts":-1}).explain()
Note: I have run this query with the {ldr:1} left out with the same result.
I would expect the query above to use the following index:
ldr.pts.lid_1_ldr.w_1_ldr.pts.wPts_-1
However, this is the result of the explain:
{
"cursor" : "BtreeCursor ldr.pts.lid_1_ldr.d_1_ldr.pts.dPts_-1",
"isMultiKey" : true,
"n" : 3,
"nscannedObjects" : 4,
"nscanned" : 4,
"nscannedObjectsAllPlans" : 16,
"nscannedAllPlans" : 16,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"ldr.pts.lid" : [
[
ObjectId("47cc67093475061e3d95369d"),
ObjectId("47cc67093475061e3d95369d")
]
],
"ldr.d" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"ldr.pts.dPts" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "Beast-PC:27017",
"filterSet" : false
}
As you can see the first index is being picked.
I've tried using a hint and supplying the correct index but that still results in indexOnly being false and in scanAndOrder being true.
Any ideas?
Sorting on a field within an array is unlikely to produce what you expect: your descending sort on ldr.pts.wPts orders each document by the maximum of all the wPts values in its pts array, not by the wPts value of the matching pts element.
That's at the root of why your query can't use an index for the sorting.
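The sort behaviour described in this answer can be illustrated without a server. The sketch below emulates the comparison key in plain JavaScript. `descSortKey` is a hypothetical helper, the documents are made up, and string ids stand in for the ObjectIds.

```javascript
// Emulates the key compared for sort({"ldr.pts.wPts": -1}) as described above:
// for a descending sort on an array field, the largest element value is used.
function descSortKey(doc) {
  return Math.max(...doc.ldr.pts.map((p) => p.wPts));
}

const docs = [
  // The element for "A" has only 10 points, but another element has 900.
  { _id: 1, ldr: { pts: [{ lid: "A", wPts: 10 }, { lid: "B", wPts: 900 }] } },
  // The element for "A" has 500 points.
  { _id: 2, ldr: { pts: [{ lid: "A", wPts: 500 }, { lid: "B", wPts: 0 }] } },
];

// Both documents match lid "A"; order them by the emulated sort key.
const sorted = docs
  .filter((d) => d.ldr.pts.some((p) => p.lid === "A"))
  .sort((a, b) => descSortKey(b) - descSortKey(a));

console.log(sorted.map((d) => d._id));
```

Document 1 sorts first even though its "A" element has fewer points than document 2's, because the key comes from the whole array rather than from the matched element.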

Why does MongoDB's "$and" operator sometimes use a different plan vs. specifying the criteria inline?

It seems to me that the following two queries should have exactly the same "explain" output:
Query 1:
{
$and: [
{ $or: [
{ Foo: "123" },
{ Bar: "456" }
] },
{ Baz: { $in: ["abc", "def"] } }
]
}
Query 2:
{
$or: [
{ Foo: "123" },
{ Bar: "456" }
],
Baz: { $in: ["abc", "def"] }
}
Note that I have indexes on { Foo: -1, Baz: -1 } and { Bar: -1, Baz: -1 }, so this is optimized for the $or operator. And in fact, in the version for Query 2, in the explain output, I see two clauses, both with appropriate index bounds, one for (Foo, Baz) and one for (Bar, Baz). MongoDB is doing exactly what it's supposed to.
But in the first version (Query 1), there are no clauses anymore. It gives me a BasicCursor with no index bounds specified.
What's the difference between these two queries? Why does Mongo seem to be able to optimize #2 but not #1?
Right now I'm testing these queries using MongoVue, so I have control over the JSON, but ultimately I'm going to be using the C# driver, and I'm pretty sure it will always emit the syntax in #1 and not #2, so it's important to find out what's going on...
This seems to be a bug of some kind in mongodb. What version are you using?
According to that bug report the issue is resolved in 2.5.3.
Until we move to the later versions (I am at 2.4.6) we will have to be careful with the $and operator.
I am going to try it in 2.6 as well.
UPDATE:
Indeed, it is fixed in 2.6.3, which I am running now.
> db.test.find()
{ "_id" : 1, "Fields" : { "K1" : 123, "K2" : 456 } }
{ "_id" : 2, "Fields" : { "K1" : 456, "K2" : 123 } }
> db.test.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.test"
},
{
"v" : 1,
"key" : {
"Fields.K1" : 1
},
"name" : "Fields.K1_1",
"ns" : "test.test"
},
{
"v" : 1,
"key" : {
"Fields.K2" : 1
},
"name" : "Fields.K2_1",
"ns" : "test.test"
}
]
> db.test.find({"$and" : [{ "Fields.K1" : 123, "Fields.K2" : 456}]}).explain()
{
"cursor" : "BtreeCursor Fields.K1_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 4,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"Fields.K1" : [
[
123,
123
]
]
},
"server" : "benihime:27017",
"filterSet" : false
}
> db.test.find({ "Fields.K1" : 123, "Fields.K2" : 456}).explain()
{
"cursor" : "BtreeCursor Fields.K1_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 4,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"Fields.K1" : [
[
123,
123
]
]
},
"server" : "benihime:27017",
"filterSet" : false
}
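On versions affected by the bug, one possible workaround is to flatten a top-level $and into the inline form before sending the query. The sketch below is plain JavaScript, not the C# driver. `flattenAnd` is a hypothetical helper, and it only handles the simple case where no key appears in more than one clause.

```javascript
// Hypothetical helper: merges the clauses of a top-level $and into one
// inline query document. Returns the original query when two clauses use
// the same key, because an object cannot hold a key twice.
function flattenAnd(query) {
  if (!Array.isArray(query.$and)) return query;
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const merged = {};
  for (const [key, value] of Object.entries(query)) {
    if (key !== "$and") merged[key] = value;
  }
  for (const clause of query.$and) {
    for (const [key, value] of Object.entries(clause)) {
      if (has(merged, key)) return query; // conflict: keep the explicit $and
      merged[key] = value;
    }
  }
  return merged;
}

// Query 1 from the question.
const query1 = {
  $and: [
    { $or: [{ Foo: "123" }, { Bar: "456" }] },
    { Baz: { $in: ["abc", "def"] } },
  ],
};

console.log(JSON.stringify(flattenAnd(query1)));
```

Applied to Query 1, this yields the same document as Query 2, the form for which the question reports per-clause index bounds.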

MongoDB two seemingly identical queries - one uses an index the other doesn't

I noticed in Mongo logs that some queries were taking longer than expected.
Fri Jan 4 08:53:39 [conn587] query mydb.User query: { query: { someField: "eu", lastRecord.importantValue: { $ne: nan.0 }, lastRecord.otherValue: { $gte: 1000 } }, orderby: { lastRecord.importantValue: -1 } } ntoreturn:50 ntoskip:0 nscanned:40681 scanAndOrder:1 keyUpdates:0 numYields: 1 locks(micros) r:2649788 nreturned:50 reslen:334041 1575ms
Considering that I had built an index on {someField : 1, "lastRecord.otherValue" : 1, "lastRecord.importantValue" : -1} I went to investigate.
While doing so, I noticed that two queries that look identical to me, differ only in syntax, and return identical values are executed differently by MongoDB: one uses the index as expected, while the other doesn't.
And my web application invokes the version that doesn't use indexes.
I'm obviously misunderstanding something fundamental here.
Index used fine:
> db.User.find({}, {_id:1}).sort({"lastRecord.importantValue" : -1}).limit(5).explain()
{
"cursor" : "BtreeCursor lastRecord.importantValue_-1",
"isMultiKey" : false,
"n" : 5,
"nscannedObjects" : 5,
"nscanned" : 5,
"nscannedObjectsAllPlans" : 5,
"nscannedAllPlans" : 5,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"lastRecord.importantValue" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "whatever"
}
Index not used:
> db.User.find({$query: {}, $orderby : {"lastRecord.importantValue": -1}}, {_id:1}).limit(5).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 67281,
"nscanned" : 67281,
"nscannedObjectsAllPlans" : 67281,
"nscannedAllPlans" : 67281,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 133,
"indexBounds" : {
},
"server" : "whatever"
}
Hint doesn't help:
> db.User.find({$query: {}, $orderby : {"lastRecord.importantValue": -1}, $hint : {"lastRecord.importantValue" : -1}}, {_id:1}).limit(5).explain()
{
"cursor" : "BasicCursor",
// snip
}
However, the returned values are the same (as expected):
> db.User.find({}, {_id:1}).sort({"lastRecord.importantValue" : -1}).limit(5)
{ "_id" : NumberLong(500280899) }
{ "_id" : NumberLong(500335132) }
{ "_id" : NumberLong(500378261) }
{ "_id" : NumberLong(500072584) }
{ "_id" : NumberLong(500071366) }
> db.User.find({$query: {}, $orderby : {"lastRecord.importantValue": -1}}, {_id:1}).limit(5)
{ "_id" : NumberLong(500280899) }
{ "_id" : NumberLong(500335132) }
{ "_id" : NumberLong(500378261) }
{ "_id" : NumberLong(500072584) }
{ "_id" : NumberLong(500071366) }
Index is present (this one I created to test the simpler queries, the compound index is also still there):
> db.User.getIndexes()
[
// snip other indexes
{
"v" : 1,
"key" : {
"lastRecord.importantValue" : -1
},
"ns" : "mydb.User",
"name" : "lastRecord.importantValue_-1"
}
]
Morphia code just FYI (not sure if I can get exact command it generates):
ds.find(User.class).filter("someField =", v1)
.filter("lastRecord.importantValue !=", Double.NaN)
.filter("lastRecord.otherValue >=", v2)
.order("-lastRecord.importantValue")
.limit(50);
Any ideas?
Edit 6-Jan:
Just noticed another instance of this in the logs:
TOKEN ip-10-212-234-60 Sun Jan 6 09:20:54 [conn249] query mydb.User query: { query: { someField: "eu", lastRecord.otherValue: { $gte: 1000 } }, orderby: { lastRecord.importantValue: -1 } } cursorid:9069510232503855502 ntoreturn:50 ntoskip:0 nscanned:2042 keyUpdates:0 locks(micros) r:118923 nreturned:50 reslen:344959 118ms
Note I have since removed the $ne from the query. So it executes in 118 ms and (if I interpret right) scans 2042 rows only.
However if I do the following from console on the same server:
> db.User.find({$query: { someField: "eu", "lastRecord.otherValue": { $gte: 1000 } }, $orderby: { "lastRecord.importantValue": -1 } }).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 70308,
"nscanned" : 70308,
"nscannedObjectsAllPlans" : 70308,
"nscannedAllPlans" : 70308,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 847,
"indexBounds" : {
},
"server" : "ip-whatever:27017"
}
So could it really be just a bug in explain?
Edit - further update 6-Jan:
On the other hand on my local system (same indexes, including "{someField : 1, "lastRecord.otherValue" : 1, "lastRecord.importantValue" : -1}") I managed to obtain the following under load:
Sun Jan 06 17:43:56 [conn33] query mydb.User query: { query: { someField: "eu", lastRecord.otherValue: { $gte: 1000 } }, orderby: { lastRecord.importantValue: -1 } }
cursorid:76077040284571 ntoreturn:50 ntoskip:0 nscanned:183248 keyUpdates:0 numYields: 2318 locks(micros) r:285016301 nreturned:50 reslen:341500 148567ms
148567ms :(
In fact, the problem is that you are mixing up the two syntaxes.
See the documentation: http://docs.mongodb.org/manual/reference/operator/query/
Calling .explain() on the cursor-method form works fine:
db.User.find({}, {_id:1}).sort({"lastRecord.importantValue" : -1}).limit(5).explain()
But it does not work on the $query form:
db.User.find({$query: {}, $orderby : {"lastRecord.importantValue": -1}}, {_id:1}).limit(5).explain()
.explain() belongs to the first type of syntax. With the second, you should put $explain: 1 in the same document as $query instead of calling .explain() afterwards.
See also this question: MongoDB $query operator ignores index?
It covers much the same issue.
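To avoid mixing the two syntaxes, a wrapped query can be split back into the pieces that the cursor methods expect. The sketch below is plain JavaScript and runs without a server. `unwrapLegacyQuery` is a hypothetical helper, not part of any driver.

```javascript
// Hypothetical helper: splits a legacy { $query, $orderby, $hint } wrapper
// into the separate pieces used by the cursor-method syntax.
function unwrapLegacyQuery(wrapped) {
  if (!Object.prototype.hasOwnProperty.call(wrapped, "$query")) {
    // Already a plain filter document.
    return { filter: wrapped, sort: null, hint: null };
  }
  return {
    filter: wrapped.$query,
    sort: wrapped.$orderby || null,
    hint: wrapped.$hint || null,
  };
}

const parts = unwrapLegacyQuery({
  $query: { someField: "eu", "lastRecord.otherValue": { $gte: 1000 } },
  $orderby: { "lastRecord.importantValue": -1 },
});

// In the mongo shell (db is not defined in this sketch) the pieces would be
// used as: db.User.find(parts.filter).sort(parts.sort).explain()
console.log(JSON.stringify(parts));
```

With the filter and sort separated this way, .explain() is applied to the first type of syntax only.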

MongoDB indexes for $elemMatch

The indexes help page at http://www.mongodb.org/display/DOCS/Indexes doesn't mention $elemMatch and since it says to add an index on my 2M+ object collection I thought I'd ask this:
I am doing a query like:
{ lc: "eng", group: "xyz", indices: { $elemMatch: { text: "as", pos: { $gt: 1 } } } }
If I add an index
{lc:1, group:1, "indices.text":1, "indices.pos":1}
will this query with the $elemMatch component be able to be fully run through the index?
Based on your query, I imagine that your documents look something like this:
{
"_id" : 1,
"lc" : "eng",
"group" : "xyz",
"indices" : [
{
"text" : "as",
"pos" : 2
},
{
"text" : "text",
"pos" : 4
}
]
}
I created a test collection with documents of this format, created the index, and ran the query that you posted with the .explain() option.
The index is being used as expected:
> db.test.ensureIndex({"lc":1, "group":1, "indices.text":1, "indices.pos":1})
> db.test.find({ lc: "eng", group: "xyz", indices: { $elemMatch: { text: "as", pos: { $gt: 1 } } } }).explain()
{
"cursor" : "BtreeCursor lc_1_group_1_indices.text_1_indices.pos_1",
"isMultiKey" : true,
"n" : NumberLong(1),
"nscannedObjects" : NumberLong(1),
"nscanned" : NumberLong(1),
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : NumberLong(0),
"millis" : 0,
"indexBounds" : {
"lc" : [
[
"eng",
"eng"
]
],
"group" : [
[
"xyz",
"xyz"
]
],
"indices.text" : [
[
"as",
"as"
]
],
"indices.pos" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "Marcs-MacBook-Pro.local:27017"
}
The documentation on the .explain() feature may be found here: http://www.mongodb.org/display/DOCS/Explain
.explain() may be used to display information about a query, including which (if any) index is used.
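For reference, the matching rule that $elemMatch applies in the query above can be emulated in plain JavaScript: a single array element has to satisfy both conditions, whereas separate dotted-field conditions may each be satisfied by a different element. This sketch only illustrates the semantics and says nothing about index selection. Both function names and the second test document are made up for the illustration.

```javascript
// Emulates { indices: { $elemMatch: { text: "as", pos: { $gt: 1 } } } }:
// one array element must satisfy both conditions.
function elemMatch(doc) {
  return doc.indices.some((e) => e.text === "as" && e.pos > 1);
}

// Emulates { "indices.text": "as", "indices.pos": { $gt: 1 } }:
// each condition may be satisfied by a different element.
function dottedMatch(doc) {
  return (
    doc.indices.some((e) => e.text === "as") &&
    doc.indices.some((e) => e.pos > 1)
  );
}

// Same shape as the example document above.
const sameElement = { indices: [{ text: "as", pos: 2 }, { text: "text", pos: 4 }] };
// Made-up variant: "as" is at pos 1, and only the other element has pos > 1.
const splitElements = { indices: [{ text: "as", pos: 1 }, { text: "text", pos: 4 }] };

console.log(elemMatch(sameElement), dottedMatch(sameElement));
console.log(elemMatch(splitElements), dottedMatch(splitElements));
```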

Mongo Index not being used

I created an index around several items for a particular query I am doing:
{
"v" : 1,
"key" : {
"MODIFIED" : -1,
"state" : 1,
"fail" : 1,
"generated" : 1
},
"ns" : "foo.bar",
"name" : "MODIFIED_-1_state_1_fail_1_generated"
}
However, when I execute my query, it doesn't appear to be using my index. Could you please provide some insight into what I'm doing wrong?
Thank you!
db.foo.find( {
"$or": [
{
"MODIFIED": {
"$gt": {
"sec": 1321419600,
"usec": 0
}
}
},
{
"$or": [
{"state": "ca"},
{"state": "ok"}
]
}
],
"$and": [
{"fail": {"$ne": 1}},
{"generated": {"$exists": false}}
]
}).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 464215,
"nscannedObjects" : 464215,
"n" : 0,
"millis" : 7549,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
There's a good reason your index cannot be used for your query, and I also think there are some issues with the query itself. The index is not being hit because of the nested $or operator, but I think the underlying problem is unfamiliarity with some of the operators available in MongoDB:
First of all, your nested $or that checks whether the state is either "ca" or "ok" is not necessary and (since it's the main reason you're not hitting your index) can be replaced with state:{$in:["ca", "ok"]}, which does exactly the same thing. Now your query is:
db.foo.find( {
"$or": [
{
"MODIFIED": {
"$gt": {
"sec": 1321419600,
"usec": 0
}
}
},
{
state:{$in:["ca", "ok"]}
}
],
"$and": [
{"fail": {"$ne": 1}},
{"generated": {"$exists": false}}
]
}).explain();
And it will hit your index. Your second issue is that the top-level $and clause is not necessary. Note that AND(OR(A, B), AND(C, D)) = AND(OR(A, B), C, D). This query does the same:
db.foo.find( {
"$or": [
{
"MODIFIED": {
"$gt": {
"sec": 1321419600,
"usec": 0
}
}
},
{
state:{$in:["ca", "ok"]}
}
],
"fail": {"$ne": 1},
"generated": {"$exists": false}
}).explain();
Which still hits the index :
{
"clauses" : [
{
"cursor" : "BtreeCursor MODIFIED_-1_state_1_fail_1_generated_1 multi",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"MODIFIED" : [
[
{
"$maxElement" : 1
},
{
"sec" : 1321419600,
"usec" : 0
}
]
],
"state" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"fail" : [
[
{
"$minElement" : 1
},
1
],
[
1,
{
"$maxElement" : 1
}
]
],
"generated" : [
[
null,
null
]
]
}
},
{
"cursor" : "BasicCursor",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
],
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1
}
Hope that helps! By the way, it's slightly more conventional to give the first key in a compound index order 1 and the second -1. Note that the -1 only determines the direction relative to the previous field.
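The first rewrite in this answer, collapsing an $or of equality tests on one field into $in, can be sketched as a small transformation in plain JavaScript. `orToIn` is a hypothetical helper, not a MongoDB or driver function, and it deliberately leaves anything more complex untouched.

```javascript
// Hypothetical helper: if the clause is only an $or whose branches are all
// equality tests on the same single field, returns the equivalent
// { field: { $in: [...] } } form; otherwise returns the input unchanged.
function orToIn(clause) {
  const branches = clause.$or;
  if (Object.keys(clause).length !== 1) return clause;
  if (!Array.isArray(branches) || branches.length === 0) return clause;
  const field = Object.keys(branches[0])[0];
  const simple = branches.every((b) => {
    const keys = Object.keys(b);
    return (
      keys.length === 1 &&
      keys[0] === field &&
      // Plain values only: operator documents such as { $gt: 1 } are skipped.
      (b[field] === null || typeof b[field] !== "object")
    );
  });
  if (!simple) return clause;
  return { [field]: { $in: branches.map((b) => b[field]) } };
}

// The nested $or from the question.
const nestedOr = { $or: [{ state: "ca" }, { state: "ok" }] };

console.log(JSON.stringify(orToIn(nestedOr)));
```

A clause that mixes fields or uses operators, such as the outer $or on MODIFIED and state, is returned unchanged.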