When using $elemMatch, why is MongoDB performing a FETCH stage if the index covers the filter?

I have the following collection:
> db.foo.find()
{ "_id" : 1, "k" : [ { "a" : 50, "b" : 10 } ] }
{ "_id" : 2, "k" : [ { "a" : 90, "b" : 80 } ] }
With a compound index on the k field:
"key" : {
"k.a" : 1,
"k.b" : 1
},
"name" : "k.a_1_k.b_1"
If I run the following query:
db.foo.aggregate([
{ $match: { "k.a" : 50 } },
{ $project: { _id : 0, "dummy": {$literal:""} }}
])
The index is used (makes sense) and there is no need for a FETCH stage:
"winningPlan" : {
"stage" : "COUNT_SCAN",
"keyPattern" : {
"k.a" : 1,
"k.b" : 1
},
"indexName" : "k.a_1_k.b_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"k.a" : [ ],
"k.b" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"indexBounds" : {
"startKey" : {
"k.a" : 50,
"k.b" : { "$minKey" : 1 }
},
"startKeyInclusive" : true,
"endKey" : {
"k.a" : 50,
"k.b" : { "$maxKey" : 1 }
},
"endKeyInclusive" : true
}
}
However, if I run the following query that uses $elemMatch:
db.foo.aggregate([
{ $match: { k: {$elemMatch: {a : 50, b : { $in : [5, 6, 10]}}}} },
{ $project: { _id : 0, "dummy" : {$literal : ""}} }
])
There is a FETCH stage (which AFAIK is not necessary):
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"k" : {
"$elemMatch" : {
"$and" : [
{ "a" : { "$eq" : 50 } },
{ "b" : { "$in" : [ 5, 6, 10 ] } }
]
}
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"k.a" : 1,
"k.b" : 1
},
"indexName" : "k.a_1_k.b_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"k.a" : [ ],
"k.b" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"k.a" : [
"[50.0, 50.0]"
],
"k.b" : [
"[5.0, 5.0]",
"[6.0, 6.0]",
"[10.0, 10.0]"
]
}
}
},
I am using MongoDB 3.4.
I am asking because I have a DB with lots of documents, and there is a query that uses aggregate() and $elemMatch (it does more useful things than projecting nothing as in this question, of course, but in theory it does not require a FETCH stage). I found that the main reason the query is slow is the FETCH stage, which AFAIK is not needed.
Is there some logic that forces MongoDB to use FETCH when $elemMatch is used, or is it just a missing optimization?

Looks like even a single $elemMatch forces Mongo to do FETCH -> COUNT instead of COUNT_SCAN. I opened a ticket in their Jira with simple steps to reproduce: https://jira.mongodb.org/browse/SERVER-35223

TLDR: this is the expected behaviour of a multikey index combined with an $elemMatch.
From the covered query section of the multikey index documents:
Multikey indexes cannot cover queries over array field(s).
Meaning that not all the information about a sub-document is present in the multikey index.
Let's imagine the following scenario:
//doc1
{
"k" : [ { "a" : 50, "b" : 10 } ]
}
//doc2
{
"k" : { "a" : 50, "b" : 10 }
}
Because Mongo "flattens" the array it indexes, once the index is built Mongo cannot differentiate between these 2 documents, $elemMatch specifically requires an array object to match (i.e doc2 will never match an $elemMatch query).
Meaning Mongo is forced to FETCH the documents to determine which docs will match, this is the premise causing the behaviour you see.
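This is easy to see in the shell (a minimal sketch; the collection name demo is made up for illustration). Both documents below produce the identical index key { "k.a" : 50, "k.b" : 10 }, so the index alone cannot distinguish them, yet only the first one may match $elemMatch:
db.demo.createIndex({ "k.a" : 1, "k.b" : 1 })
db.demo.insertMany([
{ "_id" : 1, "k" : [ { "a" : 50, "b" : 10 } ] },  // array of sub-documents (multikey)
{ "_id" : 2, "k" : { "a" : 50, "b" : 10 } }       // plain sub-document, same index key
])
db.demo.find({ "k.a" : 50, "k.b" : 10 })                      // returns both documents
db.demo.find({ "k" : { $elemMatch : { "a" : 50, "b" : 10 } } }) // returns only _id 1, hence the FETCH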

Related

intersecting two multi key indexes for two fields (holding array values) in a document in mongo DB

When using the find method, only one of the indexes (the first or the last) is being applied. How can I overcome the scenario below?
Sample document:
{
"_id":"5a67xxxxxx",
"name":"abc",
"address":"Newyork",
"type":[ "perm", "vend", "contract" ],
"region":[ "USA", "MEA", "UK" ]
}
Indexes created for type and region, named type_1 and region_1:
{
"name" : "region_1",
"key" : { "region" : 1 },
"type" : "REGULAR",
"ns" : "xxx.sample"
},
{
"name" : "type_1",
"key" : { "type" : 1 },
"type" : "REGULAR",
"ns" : "xxx.sample"
}
Find query:
db.sample.find({ type: { $in: [ "perm", "vend" ] }, region: { $in: [ "USA", "MEA" ] } })
On executing the query, only region_1 is applied. How can I apply both indexes while executing the query?
A multikey index is an index on a field whose value is an array. MongoDB doesn't allow creating one index on two array fields (a compound index with two array fields). But you can create separate indexes, as in this case: two single-field indexes, one on each of the array fields.
The reason MongoDB doesn't allow a compound index on two array fields is that such an index would need a key for every combination of elements from the two arrays, so the number of keys and the size of the index would be huge, and using such an index without performance problems would be difficult.
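For illustration (a hypothetical collection whose documents hold arrays in both a and b, like the one in the [EDIT ADD] below), attempting to create such a compound index fails outright:
db.multi.createIndex({ a : 1, b : 1 })
// fails with: "cannot index parallel arrays [b] [a]"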
What about index intersection, i.e. two single multikey indexes used to filter documents within the same query?
I created a small collection of documents with a structure similar to the one in the question post, plus the two indexes, and ran some similar queries with explain(), varying the filter values across queries. Interestingly, the query plan sometimes showed the optimizer using the index on field A only, and sometimes field B only, but never both at the same time. The query plans also sometimes showed that index intersection was considered but rejected by the optimizer; that is, the rejected plan showed an index intersection of the two multikey indexes.
How do we know whether a query used index intersection? The explain output shows an AND_SORTED stage.
The index intersection documentation says that:
To determine if MongoDB used index intersection, run explain(); the
results of explain() will include either an AND_SORTED stage or an
AND_HASH stage.
I have posted the "rejectedPlans" details at the bottom of this post.
Generally, you can force the query optimizer to select a particular index in favor of the others by using hint(). But in this case that does not work.
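For completeness, this is what pinning an index looks like (using the sample collection and the auto-generated index names from the [EDIT ADD] below); hint() accepts a key pattern or an index name, but it can only pin a single index, never an intersection of two:
db.multi.find({ a : { $in : [ 2 ] }, b : { $in : [ 2 ] } }).hint({ a : 1 })
db.multi.find({ a : { $in : [ 2 ] }, b : { $in : [ 2  ] } }).hint("b_1")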
"rejectedPlans" : [
{
"stage" : "FETCH",
"filter" : {
"a" : {
"$eq" : 2
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"b" : 1
},
"indexName" : "b_1",
"isMultiKey" : true,
"multiKeyPaths" : {
"b" : [
"b"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"b" : [
"[2.0, 2.0]"
]
}
}
},
{
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"a" : {
"$eq" : 2
}
},
{
"b" : {
"$eq" : 2
}
}
]
},
"inputStage" : {
"stage" : "AND_SORTED",
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"a" : 1
},
"indexName" : "a_1",
"isMultiKey" : true,
"multiKeyPaths" : {
"a" : [
"a"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"a" : [
"[2.0, 2.0]"
]
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"b" : 1
},
"indexName" : "b_1",
"isMultiKey" : true,
"multiKeyPaths" : {
"b" : [
"b"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"b" : [
"[2.0, 2.0]"
]
}
}
]
}
}
]
[EDIT ADD]:
The documents, indexes and queries:
{ _id: 1, a: [ 1, 2 ], b: [ 2, 8 ], c: "Both arrays" },
{ _id: 2, a: [ 1, 5, 9 ], b: [ 4, 8, 2 ], c: "AB arrays" }
db.multi.createIndex({ a: 1})
db.multi.createIndex({ b: 1})
db.multi.find( { a: { $in: [ 5, 9 ] }, b: { $in: [ 4 ] } } )
db.multi.find( { a: { $in: [ 2] }, b: { $in: [ 2 ] } } )
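To check which plan won, and whether AND_SORTED shows up among the rejected plans, the same queries can be run through explain, for example:
db.multi.find( { a: { $in: [ 2 ] }, b: { $in: [ 2 ] } } ).explain("allPlansExecution")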

How does the Mongo query planner choose its index?

Mongo version: 3.4.3-10-g865d2fb
I have a query like this:
db.getCollection('c_zop_operations').find({
"a":{
"$in":[
"O",
"S",
"P"
]
},
"$or":[
{
"b":"008091",
"c":"1187",
"d":"F",
"e":ISODate("2018-07-22T22:00:00.000Z")
},
... x 39 elements in $or statement
]
}).explain("executionStats")
The query runs for 16 seconds, and explain returns these results (155,769 documents examined!):
{
"queryPlanner" : {
"plannerVersion" : 1,
...
"indexFilterSet" : false,
"parsedQuery" : {
...
},
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"$or" : [
{
"$and" : [
{
"c" : {
"$eq" : "1187"
}
},
{
"d" : {
"$eq" : "F"
}
},
{
"b" : {
"$eq" : "008091"
}
},
{
"e" : {
"$eq" : ISODate("2018-07-22T22:00:00.000Z")
}
}
]
},
x 39 times
...
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"a" : 1
},
"indexName" : "a",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"a" : [
"[\"O\", \"O\"]",
"[\"P\", \"P\"]",
"[\"S\", \"S\"]"
]
}
}
},
"rejectedPlans" : [
...
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 0,
"executionTimeMillis" : 16010,
"totalKeysExamined" : 155769,
"totalDocsExamined" : 155769,
...
}
...
}
My collection has a lot of indexes (65) and contains 3 million documents.
Only two indexes interest me here:
aIndex : { "a" : 1 }
And
beIndex: { "b" : 1, "e" : 1 }
By default, Mongo uses { "a" : 1 } and the query takes 16 seconds.
If I use hint(beIndex), the query takes 0.011 seconds, with totalKeysExamined = 0 and totalDocsExamined = 0.
Why doesn't MongoDB use the beIndex, which is far more effective?
This behavior is a known bug. See SERVER-13732. This was fixed in MongoDB 3.6.
As a workaround, distributing the top-level filter on a into each $or clause should allow the query planner to make a better index choice.
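A sketch of that workaround on this query (only one of the 39 $or branches shown; the rest follow the same pattern). With the a predicate repeated inside every branch, each branch can be answered by an index such as beIndex:
db.getCollection('c_zop_operations').find({
"$or":[
{
"a":{ "$in":[ "O", "S", "P" ] },
"b":"008091",
"c":"1187",
"d":"F",
"e":ISODate("2018-07-22T22:00:00.000Z")
},
... x 38 more elements, each repeating the "a" predicate
]
})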

MongoDB performs slow query on sum() based on monthly groups

This is what I tried so far with an aggregation query:
db.getCollection('storage').aggregate([
{
"$match": {
"user_id": 2
}
},
{
"$project": {
"formattedDate": {
"$dateToString": { "format": "%Y-%m", "date": "$created_on" }
},
"size": "$size"
}
},
{ "$group": {
"_id" : "$formattedDate",
"size" : { "$sum": "$size" }
} }
])
This is the result:
/* 1 */
{
"_id" : "2018-02",
"size" : NumberLong(10860595386)
}
/* 2 */
{
"_id" : "2017-12",
"size" : NumberLong(524288)
}
/* 3 */
{
"_id" : "2018-01",
"size" : NumberLong(21587971)
}
And this is the document structure:
{
"_id" : ObjectId("5a59efedd006b9036159e708"),
"user_id" : NumberLong(2),
"is_transferred" : false,
"is_active" : false,
"process_id" : NumberLong(0),
"ratio" : 0.000125759169459343,
"type_id" : 201,
"size" : NumberLong(1687911),
"is_processed" : false,
"created_on" : ISODate("2018-01-13T11:39:25.000Z"),
"processed_on" : ISODate("1970-01-01T00:00:00.000Z")
}
And last, the explain result:
/* 1 */
{
"stages" : [
{
"$cursor" : {
"query" : {
"user_id" : 2.0
},
"fields" : {
"created_on" : 1,
"size" : 1,
"_id" : 1
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "data.storage",
"indexFilterSet" : false,
"parsedQuery" : {
"user_id" : {
"$eq" : 2.0
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"user_id" : 1
},
"indexName" : "user_id",
"isMultiKey" : false,
"multiKeyPaths" : {
"user_id" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"user_id" : [
"[2.0, 2.0]"
]
}
}
},
"rejectedPlans" : []
}
}
},
{
"$project" : {
"_id" : true,
"formattedDate" : {
"$dateToString" : {
"format" : "%Y-%m",
"date" : "$created_on"
}
},
"size" : "$size"
}
},
{
"$group" : {
"_id" : "$formattedDate",
"size" : {
"$sum" : "$size"
}
}
}
],
"ok" : 1.0
}
The problem:
I can navigate and get all results almost instantly, in about 0.002s. However, when I specify user_id and sum the sizes grouped by month, the result takes between 0.300s and 0.560s. When I do several similar tasks in one request, it takes more than a second to finish.
What I tried so far:
I've added an index for user_id
I've added an index for created_on
I used more $match conditions. However, this made things even worse.
This collection currently has almost 200,000 documents, and approximately 150,000 of them belong to user_id = 2.
How can I minimize the response time for this query?
Note: MongoDB 3.4.10 used.
Pratha,
try adding a sort on the "created_on" and "size" fields as the first stage in the aggregation pipeline.
db.getCollection('storage').aggregate([
{
"$sort": {
"created_on": 1, "size": 1
}
}, ....
Before that, add a compound index:
db.getCollection('storage').createIndex({created_on:1,size:1})
Sorting the data before the $group stage improves the efficiency of accumulating the totals.
Note about the $sort aggregation stage:
The $sort stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $sort will produce an error. To allow for the handling of large datasets, set the allowDiskUse option to true to enable $sort operations to write to temporary files.
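For example (a sketch based on the pipeline above), the option goes into the options document passed as the second argument of aggregate():
db.getCollection('storage').aggregate([
{
"$sort": {
"created_on": 1, "size": 1
}
}, ....
], { allowDiskUse: true })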
P.S.
To test performance, try removing the $match stage on user_id, or add user_id to the compound index as well.
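The second suggestion would look like this (a sketch; whether the planner picks it depends on the data distribution, so verify with explain()):
db.getCollection('storage').createIndex({ user_id: 1, created_on: 1, size: 1 })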

Tune Up Mongo Query

I am new to Mongo and was trying to get a distinct count of users. The fields Id and Status are not individually indexed, but there is a compound index on both fields. My current query is something like this, where the match conditions change depending on the requirements.
DBQuery.shellBatchSize = 1000000;
db.getCollection('username').aggregate([
{$match:
{ Status: "A"
} },
{"$group" : {_id:"$Id", count:{$sum:1}}}
]);
Is there any way we can optimize this query further, or run it in parallel on the collection, so that we can get results faster?
Regards
You can tune your aggregation pipelines by passing the option explain: true to the aggregate method.
db.getCollection('username').aggregate([
{$match: { Status: "A" } },
{"$group" : {_id:"$Id", count:{$sum:1}}}],
{ explain: true });
This will then output the following to work with:
{
"stages" : [
{
"$cursor" : {
"query" : {
"Status" : "A"
},
"fields" : {
"Id" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.usernames",
"indexFilterSet" : false,
"parsedQuery" : {
"Status" : {
"$eq" : "A"
}
},
"winningPlan" : {
"stage" : "EOF"
},
"rejectedPlans" : [ ]
}
}
},
{
"$group" : {
"_id" : "$Id",
"count" : {
"$sum" : {
"$const" : 1
}
}
}
}
],
"ok" : 1
}
To speed up the query we need an index to help the match part of the pipeline, so let's create an index on Status:
> db.usernames.createIndex({Status:1})
{
"createdCollectionAutomatically" : true,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
If we now run the explain again, we'll get the following results:
{
"stages" : [
{
"$cursor" : {
"query" : {
"Status" : "A"
},
"fields" : {
"Id" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.usernames",
"indexFilterSet" : false,
"parsedQuery" : {
"Status" : {
"$eq" : "A"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"Status" : 1
},
"indexName" : "Status_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"Status" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"Status" : [
"[\"A\", \"A\"]"
]
}
}
},
"rejectedPlans" : [ ]
}
}
},
{
"$group" : {
"_id" : "$Id",
"count" : {
"$sum" : {
"$const" : 1
}
}
}
}
],
"ok" : 1
}
We can now see straight away that this is using an index.
https://docs.mongodb.com/manual/reference/explain-results/
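One further option, offered as a suggestion since the question says a compound index on both fields already exists: if that index has Status as its leading key, for example the one below, the cursor stage only needs Status and Id and can be served from the index keys alone, replacing the FETCH stage with a covered plan:
db.usernames.createIndex({ Status: 1, Id: 1 })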

MongoDB optimize indexes for aggregation

I have an aggregation on a collection with about 1.6M records. This query is a simplified example of more complex ones, but it illustrates what I consider poor index selection.
db.getCollection('cbAlters').runCommand("aggregate", {pipeline: [
{
$match: { cre_carteraId: "31" }
},
{
$group: { _id: { ca_tramomora: "$cre_tramoMora" },
count: { $sum: 1 } }
}
]})
That query took about 5 seconds. The collection has 25 indexes configured for different queries. The one used, according to the query explain, is:
{
"v" : 1,
"key" : {
"cre_carteraId" : 1,
"cre_periodo" : 1,
"cre_tramoMora" : 1,
"cre_inactivo" : 1
},
"name" : "cartPerTramInact",
"ns" : "basedatos.cbAlters"
},
I created an index adjusted to this particular query:
{
"v" : 1,
"key" : {
"cre_carteraId" : 1,
"cre_tramoMora" : 1
},
"name" : "cartPerTramTest",
"ns" : "basedatos.cbAlters"
}
The query optimizer rejects this index and suggests using the initial index instead. The output of my query explain looks like this:
{
"waitedMS" : NumberLong(0),
"stages" : [
{
"$cursor" : {
"query" : {
"cre_carteraId" : "31"
},
"fields" : {
"cre_tramoMora" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "basedatos.cbAlters",
"indexFilterSet" : false,
"parsedQuery" : {
"cre_carteraId" : {
"$eq" : "31"
}
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"cre_tramoMora" : 1,
"_id" : 0
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"cre_carteraId" : 1,
"cre_periodo" : 1,
"cre_tramoMora" : 1,
"cre_inactivo" : 1
},
"indexName" : "cartPerTramInact",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"cre_carteraId" : [
"[\"31\", \"31\"]"
],
"cre_periodo" : [
"[MinKey, MaxKey]"
],
"cre_tramoMora" : [
"[MinKey, MaxKey]"
],
"cre_inactivo" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [
{
"stage" : "PROJECTION",
"transformBy" : {
"cre_tramoMora" : 1,
"_id" : 0
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"cre_carteraId" : 1,
"cre_tramoMora" : 1
},
"indexName" : "cartPerTramTest",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"cre_carteraId" : [
"[\"31\", \"31\"]"
],
"cre_tramoMora" : [
"[MinKey, MaxKey]"
]
}
}
}
]
}
}
},
{
"$group" : {
"_id" : {
"ca_tramomora" : "$cre_tramoMora"
},
"count" : {
"$sum" : {
"$const" : 1.0
}
}
}
}
],
"ok" : 1.0
}
So why does the optimizer prefer the less specific index? Should indexFilterSet be true for this aggregate?
How can I improve this index, or is something wrong with the query?
I do not have much experience with MongoDB; I appreciate any help.
As long as you have the index cartPerTramInact, the optimizer won't use your cartPerTramTest index, because the leading fields are the same and in the same order.
This goes for other indexes too. When there are indexes sharing the same keys in the same order (like a.b.c.d, a.b.d, a.b) and your query uses fields a.b, the optimizer will favour a.b.c.d. In any case you don't need the index on a.b, because you already have two indexes whose prefixes cover a.b (a.b.c.d and a.b.d).
The index a.b.d is used only when you query on the fields a.b.d. But if a.b is already very selective, it is probably faster to scan with the index a.b.c.d using only the a.b part and then do a "full table scan" of the remaining documents to find that d.
There is a hint option for aggregations that can help with the index...
See https://www.mongodb.com/docs/upcoming/reference/method/db.collection.aggregate/#mongodb-method-db.collection.aggregate
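For example (a sketch of the same aggregation written with the aggregate() method instead of runCommand; the hint option requires MongoDB 3.6 or later, which is newer than the server apparently used above):
db.getCollection('cbAlters').aggregate([
{
$match: { cre_carteraId: "31" }
},
{
$group: { _id: { ca_tramomora: "$cre_tramoMora" },
count: { $sum: 1 } }
}
], { hint: "cartPerTramTest" })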