MongoDB indexes for $elemMatch - mongodb

The indexes help page at http://www.mongodb.org/display/DOCS/Indexes doesn't mention $elemMatch and since it says to add an index on my 2M+ object collection I thought I'd ask this:
I am doing a query like:
{ lc: "eng", group: "xyz", indices: { $elemMatch: { text: "as", pos: { $gt: 1 } } } }
If I add an index
{lc:1, group:1, indices.text:1, indices.pos:1}
will this query with the $elemMatch component be able to be fully run through the index?

Based on your query, I imagine that your documents look something like this:
{
"_id" : 1,
"lc" : "eng",
"group" : "xyz",
"indices" : [
{
"text" : "as",
"pos" : 2
},
{
"text" : "text",
"pos" : 4
}
]
}
I created a test collection with documents of this format, created the index, and ran the query that you posted with the .explain() option.
The index is being used as expected:
> db.test.ensureIndex({"lc":1, "group":1, "indices.text":1, "indices.pos":1})
> db.test.find({ lc: "eng", group: "xyz", indices: { $elemMatch: { text: "as", pos: { $gt: 1 } } } }).explain()
{
"cursor" : "BtreeCursor lc_1_group_1_indices.text_1_indices.pos_1",
"isMultiKey" : true,
"n" : NumberLong(1),
"nscannedObjects" : NumberLong(1),
"nscanned" : NumberLong(1),
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : NumberLong(0),
"millis" : 0,
"indexBounds" : {
"lc" : [
[
"eng",
"eng"
]
],
"group" : [
[
"xyz",
"xyz"
]
],
"indices.text" : [
[
"as",
"as"
]
],
"indices.pos" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "Marcs-MacBook-Pro.local:27017"
}
The documentation on the .explain() feature may be found here: http://www.mongodb.org/display/DOCS/Explain
.explain() may be used to display information about a query, including which (if any) index is used.

Related

When using $elemMatch, why is MongoDB performing a FETCH stage if index is covering the filter?

I have the following table :
> db.foo.find()
{ "_id" : 1, "k" : [ { "a" : 50, "b" : 10 } ] }
{ "_id" : 2, "k" : [ { "a" : 90, "b" : 80 } ] }
With an compound index on k field :
"key" : {
"k.a" : 1,
"k.b" : 1
},
"name" : "k.a_1_k.b_1"
If I run the following query :
db.foo.aggregate([
{ $match: { "k.a" : 50 } },
{ $project: { _id : 0, "dummy": {$literal:""} }}
])
The index if used (make sense) and there is no need of FETCH stage :
"winningPlan" : {
"stage" : "COUNT_SCAN",
"keyPattern" : {
"k.a" : 1,
"k.b" : 1
},
"indexName" : "k.a_1_k.b_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"k.a" : [ ],
"k.b" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"indexBounds" : {
"startKey" : {
"k.a" : 50,
"k.b" : { "$minKey" : 1 }
},
"startKeyInclusive" : true,
"endKey" : {
"k.a" : 50,
"k.b" : { "$maxKey" : 1 }
},
"endKeyInclusive" : true
}
}
However, If I run the following query that use $elemMatch :
db.foo.aggregate([
{ $match: { k: {$elemMatch: {a : 50, b : { $in : [5, 6, 10]}}}} },
{ $project: { _id : 0, "dummy" : {$literal : ""}} }
])
There is a FETCH stage (which AFAIK is not necessary) :
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"k" : {
"$elemMatch" : {
"$and" : [
{ "a" : { "$eq" : 50 } },
{ "b" : { "$in" : [ 5, 6, 10 ] } }
]
}
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"k.a" : 1,
"k.b" : 1
},
"indexName" : "k.a_1_k.b_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"k.a" : [ ],
"k.b" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"k.a" : [
"[50.0, 50.0]"
],
"k.b" : [
"[5.0, 5.0]",
"[6.0, 6.0]",
"[10.0, 10.0]"
]
}
}
},
I am using MongoDB 3.4.
I ask this because I have a DB with lot of documents and there is a query that use aggregate() and $elemMatch (it perform more useful things than projecting nothing as in this question OFC, but theoretically things does not require a FETCH stage). I found out main reason of query being slow if is the FETCH stage, which AFAIK is not needed.
Is there some logic that force MongoDB to use FETCH when $elemMatch is used, or is it just a missing optimization ?
Looks like even a single "$elemMatch" forces mongo to do FETCH -> COUNT instead of COUNT_SCAN. Opened a ticket in their Jira with simple steps to reproduce - https://jira.mongodb.org/browse/SERVER-35223
TLDR: this is the expected behaviour of a multikey index combined with an $elemMatch.
From the covered query section of the multikey index documents:
Multikey indexes cannot cover queries over array field(s).
Meaning all information about a sub-document is not in the multikey index.
Let's imagine the following scenario:
//doc1
{
"k" : [ { "a" : 50, "b" : 10 } ]
}
//doc2
{
"k" : { "a" : 50, "b" : 10 }
}
Because Mongo "flattens" the array it indexes, once the index is built Mongo cannot differentiate between these 2 documents, $elemMatch specifically requires an array object to match (i.e doc2 will never match an $elemMatch query).
Meaning Mongo is forced to FETCH the documents to determine which docs will match, this is the premise causing the behaviour you see.

Why does MongoDB's "$and" operator sometimes use a different plan vs. specifying the criteria inline?

It seems to me that the following two queries should have exactly the same "explain" output:
Query 1:
{
$and: [
{ $or: [
{ Foo: "123" },
{ Bar: "456" }
] },
{ Baz: { $in: ["abc", "def"] } }
]
}
Query 2:
{
$or: [
{ Foo: "123" },
{ Bar: "456" }
],
Baz: { $in: ["abc", "def"] } }
}
Note that I have indexes on { Foo: -1, Baz: -1 } and { Bar: -1, Baz: -1 }, so this is optimized for the $or operator. And in fact, in the version for Query 2, in the explain output, I see two clauses, both with appropriate index bounds, one for (Foo, Baz) and one for (Bar, Baz). MongoDB is doing exactly what it's supposed to.
But in the first version (Query 1), there are no clauses anymore. It gives me a BasicCursor with no index bounds specified.
What's the difference between these two queries? Why does Mongo seem to be able to optimize #2 but not #1?
Right now I'm testing these queries using MongoVue, so I have control over the JSON, but ultimately I'm going to be using the C# driver, and I'm pretty sure it will always emit the syntax in #1 and not #2, so it's important to find out what's going on...
This seems to be a bug of some kind in mongodb. What version are you using?
According to that bug report the issue is resolved in 2.5.3.
Until we move to the later versions (I am at 2.4.6) we will have to be careful with the $and operator.
I am going to try it in 2.6 as well.
UPDATE:
Indeed it is fixed in 2.6.3 that I am now.
> db.test.find()
{ "_id" : 1, "Fields" : { "K1" : 123, "K2" : 456 } }
{ "_id" : 2, "Fields" : { "K1" : 456, "K2" : 123 } }
> db.test.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.test"
},
{
"v" : 1,
"key" : {
"Fields.K1" : 1
},
"name" : "Fields.K1_1",
"ns" : "test.test"
},
{
"v" : 1,
"key" : {
"Fields.K2" : 1
},
"name" : "Fields.K2_1",
"ns" : "test.test"
}
]
> db.test.find({"$and" : [{ "Fields.K1" : 123, "Fields.K2" : 456}]}).explain()
{
"cursor" : "BtreeCursor Fields.K1_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 4,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"Fields.K1" : [
[
123,
123
]
]
},
"server" : "benihime:27017",
"filterSet" : false
}
> db.test.find({ "Fields.K1" : 123, "Fields.K2" : 456}).explain()
{
"cursor" : "BtreeCursor Fields.K1_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 4,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"Fields.K1" : [
[
123,
123
]
]
},
"server" : "benihime:27017",
"filterSet" : false
}

force use of index on complex MongoDB query?

i have a large collection of "messages" with 'to', 'from', 'type', and 'visible_to' fields that I want to query against with a fairly complex query that pulls only the messages to/from a particular user of a particular set of types that are visible to that user. Here is an actual example:
{
"$and": [
{
"$and": [
{
"$or": [
{
"to": "52f65f592f1d88ebcb00004f"
},
{
"from": "52f65f592f1d88ebcb00004f"
}
]
},
{
"$or": [
{
"type": "command"
},
{
"type": "image"
}
]
}
]
},
{
"$or": [
{
"public": true
},
{
"visible_to": "52f65f592f1d88ebcb00004f"
}
]
}
]
}
With indexes:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"expires" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "expires_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"from" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "from_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"type" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "type_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"ts" : 1,
"type" : -1
},
"ns" : "n2-mongodb.messages",
"name" : "ts_1_type_-1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"to" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "to_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"visible_to" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "visible_to_1",
"background" : true,
"safe" : null
},
{
"v" : 1,
"key" : {
"public" : 1,
"visible_to" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "public_1_visible_to_1"
},
{
"v" : 1,
"key" : {
"to" : 1,
"from" : 1
},
"ns" : "n2-mongodb.messages",
"name" : "to_1_from_1"
}
]
And here is the explain(true) output from our MongoDB 2.2.2 instance, which looks like a full scan:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 35702,
"nscanned" : 35702,
"nscannedObjectsAllPlans" : 35702,
"nscannedAllPlans" : 35702,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 85,
"indexBounds" : {
},
"allPlans" : [
{
"cursor" : "BasicCursor",
"n" : 0,
"nscannedObjects" : 35702,
"nscanned" : 35702,
"indexBounds" : {
}
}
],
"server" : "XXXXXXXX"
}
Looking at the explain output, MongoDB is not using any indexes for this - is there a way to get it to use at least the compound index {to: 1, from: 1} to dramatically narrow the search space? Or is there a better way to optimize this query? Or is MongoDB wholly unsuited for a query like this?
To force the MongoDB query optimizer to adopt a specific approach, you can use the $hint operator.
From the docs,
The $hint operator forces the query optimizer to use a specific index to fulfill the query. Specify the index either by the index name or by document.
The query optimizer in MongoDB 2.6 will include support for applying indexes to complex queries.

Increase Aggregation Framework Performance in MongoDb

I have 200k records in my collection. My data model looks like as follows:
{
"_id" : ObjectId("51750ec159dcef125863b7c4"),
"DateAdded" : ISODate("2013-04-22T00:00:00.000Z"),
"DateRemoved" : ISODate("2013-12-22T00:00:00.000Z"),
"DealerID" : ObjectId("51750bd559dcef07ec964a41"),
"ExStockID" : "8324482",
"Make" : "Mazda",
"Model" : "3",
"Price" : 11479,
"Year" : 2012,
"Variant" : "1.6d (115) TS 5dr",
"Turnover": 150
}
I have several indexes for the collection, one of those used for aggregation framework is:
{
"DealerID" : 1,
"DateRemoved" : -1,
"Price" : 1,
"Turnover" : 1
}
The aggregate query which is being used:
db.stats.aggregate([
{
"$match": {
"DealerID": {
"$in": [
ObjectId("523325ac59dcef1b90a3d446"),
....
// here is specified more than 150 ObjectIds
]
},
"DateRemoved": {
"$gte": ISODate("2013-12-01T00:00:00Z"),
"$lt": ISODate("2014-01-01T00:00:00Z")
}
}
},
{ "$project" : { "Price":1, "Turnover":1 } },
{
"$group": {
"_id": null,
"Price": {
"$avg": "$Price"
},
"Turnover": {
"$avg": "$Turnover"
}
}
}]);
and the time for this query executions resides between 30-200 seconds.
How can I optimize this?
You can try to run explain on the aggregation pipeline, but as I don't have your full dataset, I can't try it out properly:
p = [
{
"$match": {
"DealerID": {
"$in": [
ObjectId("51750bd559dcef07ec964a41"),
ObjectId("51750bd559dcef07ec964a44"),
]
},
"DateRemoved": {
"$gte": ISODate("2013-12-01T00:00:00Z"),
"$lt": ISODate("2014-01-01T00:00:00Z")
}
}
},
{ "$project" : { "Price":1, "Turnover":1 } },
{
"$group": {
"_id": null,
"Price": {
"$avg": "$Price"
},
"Turnover": {
"$avg": "$Turnover"
}
}
}];
db.s.runCommand('aggregate', { pipeline: p, explain: true } );
I would suggest to remove the fields that are not part of the $match (Price and Turnover). Also, I think you should switch the order of DealerId and DateRemoved as you want to do one range search, and from that range then include all the dealers. Doing it the other way around means that you can really only use the index for the 150 single items, and then you need to do a range search.
Using #Derick's answer I have found the index which prevented to create the covered index. As far as I can see query optimizer uses the first index which covers just the query itself, so I have changed the order of indexes. So here is outcome before and after.
Before:
{
"serverPipeline" : [
{
"query" : {...},
"projection" : { "Price" : 1, "Turnover" : 1, "_id" : 0 },
"cursor" : {
"cursor" : "BtreeCursor DealerIDDateRemoved multi",
"isMultiKey" : false,
"n" : 11036,
"nscannedObjects" : 11008,
"nscanned" : 11307,
"nscannedObjectsAllPlans" : 11201,
"nscannedAllPlans" : 11713,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 58,
"indexBounds" : {...},
"allPlans" : [...],
"oldPlan" : {...},
"server" : "..."
}
},
{
"$group" : {...}
}
],
"ok" : 1
}
After these changes indexOnly param now shows true, this means we have just created the covered index:
{
"serverPipeline" : [
{
"query" : {...},
"projection" : { "Price" : 1, "Turnover" : 1, "_id" : 0 },
"cursor" : {
"cursor" : "BtreeCursor DealerIDDateRemovedPriceTurnover multi",
"isMultiKey" : false,
"n" : 11036,
"nscannedObjects" : 0,
"nscanned" : 11307,
"nscannedObjectsAllPlans" : 285,
"nscannedAllPlans" : 11713,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 58,
"indexBounds" : {...},
"allPlans" : [...],
"server" : "..."
}
},
{
"$group" : {...}
],
"ok" : 1
}
Now the query works approximately between 0.085-0.300 seconds. Additional information about covered queries Create Indexes that Support Covered Queries

Mongo Index not being used

I created an index around several items for a particular query I am doing:
{
"v" : 1,
"key" : {
"MODIFIED" : -1,
"state" : 1,
"fail" : 1,
"generated" : 1
},
"ns" : "foo.bar",
"name" : "MODIFIED_-1_state_1_fail_1_generated"
}
However when I execute my query, it doesn't apear to be using my index. Could you please provide some insite into what I'm doing wrong?
Thank you!
db.foo.find( {
"$or": [
{
"MODIFIED": {
"$gt": {
"sec": 1321419600,
"usec": 0
}
}
},
{
"$or": [
{"state": "ca"},
{"state": "ok"}
]
}
],
"$and": [
{"fail": {"$ne": 1}},
{"generated": {"$exists": false}}
]
}).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 464215,
"nscannedObjects" : 464215,
"n" : 0,
"millis" : 7549,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
There's a good reason your index cannot be used for your query and I also think there are some issues with the query itself. The reason it's not hitting an index is because of the nested $or operator by the way but I think your actual problem is a lack of understanding on all the operators available to you in MongoDB :
First of all, your nested $or to check if the state is either "ca" or "ok" is not necessary and ( since it's the main reason you're not hitting your index) can be replaced with state:{$in:["ca", "ok"]} which does the exact same thing. Now your query is :
db.foo.find( {
"$or": [
{
"MODIFIED": {
"$gt": {
"sec": 1321419600,
"usec": 0
}
}
},
{
state:{$in:["ca", "ok"]}
}
],
"$and": [
{"fail": {"$ne": 1}},
{"generated": {"$exists": false}}
]
}).explain();
And it will hit your index. Your second issue is that a top-level $and clause is not necessary. Note that AND(OR(A, B), AND(C, D)) = AND(OR(A, B), C, D). This query does the same :
db.foo.find( {
"$or": [
{
"MODIFIED": {
"$gt": {
"sec": 1321419600,
"usec": 0
}
}
},
{
state:{$in:["ca", "ok"]}
}
],
"fail": {"$ne": 1},
"generated": {"$exists": false}
}).explain();
Which still hits the index :
{
"clauses" : [
{
"cursor" : "BtreeCursor MODIFIED_-1_state_1_fail_1_generated_1 multi",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"MODIFIED" : [
[
{
"$maxElement" : 1
},
{
"sec" : 1321419600,
"usec" : 0
}
]
],
"state" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"fail" : [
[
{
"$minElement" : 1
},
1
],
[
1,
{
"$maxElement" : 1
}
]
],
"generated" : [
[
null,
null
]
]
}
},
{
"cursor" : "BasicCursor",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
],
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1
}
Hope that helps! By the way it's slightly more conventional to start the first key in your compound index with order 1 and the second with -1. Note that the -1 is only used to determine the direction relative to the previous field.