MongoDB - count() takes too long despite using an index - mongodb

I have a collection with 62k documents in it. The collection also has a number of indexes, most of them simple single-field ones. What I am observing is that the following query takes extremely long to return:
{"status":"complete","$or":[{"groups":{"$exists":false}},{"groups":{"$size":0}},{"groups":{"$in":["5e65ffc2a1e6ef0007bc5fa8"]}}]}
The executionStats for the above query are as follows:
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
"$or" : [
"groups" : {
"$size" : 0
"groups" : {
"$eq" : "5e65ffc2a1e6ef0007bc5fa8"
"$nor" : [
"groups" : {
"$exists" : true
"status" : {
"$eq" : "complete"
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"$or" : [
"groups" : {
"$size" : 0
"groups" : {
"$eq" : "5e65ffc2a1e6ef0007bc5fa8"
"$nor" : [
"groups" : {
"$exists" : true
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"status" : 1,
"groups" : 1
"indexName" : "status_1_groups_1",
"isMultiKey" : true,
"multiKeyPaths" : {
"status" : [ ],
"groups" : [
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"status" : [
"[\"complete\", \"complete\"]"
"groups" : [
"[MinKey, MaxKey]"
"rejectedPlans" : [
"stage" : "FETCH",
"filter" : {
"$or" : [
"groups" : {
"$size" : 0
"groups" : {
"$eq" : "5e65ffc2a1e6ef0007bc5fa8"
"$nor" : [
"groups" : {
"$exists" : true
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"status" : 1
"indexName" : "status_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"status" : [ ]
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"status" : [
"[\"complete\", \"complete\"]"
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 62092,
"executionTimeMillis" : 9992,
"totalKeysExamined" : 62092,
"totalDocsExamined" : 62092,
"executionStages" : {
"stage" : "FETCH",
"filter" : {
"$or" : [
"groups" : {
"$size" : 0
"groups" : {
"$eq" : "5e65ffc2a1e6ef0007bc5fa8"
"$nor" : [
"groups" : {
"$exists" : true
"nReturned" : 62092,
"executionTimeMillisEstimate" : 9929,
"works" : 62093,
"advanced" : 62092,
"needTime" : 0,
"needYield" : 0,
"saveState" : 682,
"restoreState" : 682,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 62092,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 62092,
"executionTimeMillisEstimate" : 60,
"works" : 62093,
"advanced" : 62092,
"needTime" : 0,
"needYield" : 0,
"saveState" : 682,
"restoreState" : 682,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"status" : 1,
"groups" : 1
"indexName" : "status_1_groups_1",
"isMultiKey" : true,
"multiKeyPaths" : {
"status" : [ ],
"groups" : [
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"status" : [
"[\"complete\", \"complete\"]"
"groups" : [
"[MinKey, MaxKey]"
"keysExamined" : 62092,
"seeks" : 1,
"dupsTested" : 62092,
"dupsDropped" : 0,
"seenInvalidated" : 0
"serverInfo" : {
"host" : "xxxxxxx",
"port" : 27017,
"version" : "3.6.15",
"gitVersion" : "xxxxxx"
"ok" : 1}
What I am trying to understand is why the FETCH stage takes 10 seconds when the index scan in its inputStage takes 60 ms. Since I am eventually doing a count(), I don't really need MongoDB to return the documents; I only need it to sum up the number of matching keys and give me the grand total.
Any idea what I am doing wrong?

The query explained there was not a count, it returned quite a few documents:
"nReturned" : 62092,
The executionTimeMillisEstimate reported for each stage suggests that the index scan took about 60 ms, and that fetching the documents from disk accounted for the remaining 9.8 seconds or so.
There are a couple of reasons this count required fetching the documents:
Key existence cannot be fully determined from the index
The {"$exists":false} predicate is troublesome. When building an index, the entry for a document contains the value of each indexed field. There is no value for "nonexistent", so the index uses null. Since a document that contains a field whose value is explicitly set to null should not match {"$exists":false}, the query executor must load each document from disk in order to determine whether the field was null or nonexistent. This means that a COUNT_SCAN stage cannot be used, which further means that all of the documents to be counted must be loaded from disk.
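To make the point above concrete, here is a minimal sketch in plain JavaScript. It is a simplified model, not MongoDB's actual index code: the helper indexKeyFor is hypothetical and only mimics the rule that a missing field and an explicit null end up with the same key.

```javascript
// Simplified model of how an index key is derived for one field.
// Assumption: a missing field and an explicit null both yield the key null.
function indexKeyFor(doc, field) {
  return Object.prototype.hasOwnProperty.call(doc, field) ? doc[field] : null;
}

const docs = [
  { _id: 1, status: "complete" },               // groups is missing
  { _id: 2, status: "complete", groups: null }, // groups is explicitly null
];

// Both documents produce the same key, so the index alone cannot
// distinguish "missing" from "null".
const keys = docs.map((d) => indexKeyFor(d, "groups"));
console.log(keys);

// Telling the two cases apart requires looking at the documents themselves,
// which is what the FETCH stage does.
const missingIds = docs.filter((d) => !("groups" in d)).map((d) => d._id);
console.log(missingIds);
```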
The $or predicate does not ensure exclusivity
The query executor cannot know ahead of time that the clauses in the $or are mutually exclusive. They are in your query, but in the general case it is possible for a single document to match more than one clause in the $or, so the query executor must load the documents to ensure deduplication.
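The deduplication concern can be illustrated with a small in-memory sketch. The two clauses below are hypothetical overlapping predicates chosen for illustration; they are not the clauses from the question, which happen to be mutually exclusive.

```javascript
// Three sample documents; _id 3 matches both clauses below.
const docs = [
  { _id: 1, groups: [] },
  { _id: 2, groups: ["a"] },
  { _id: 3, groups: ["a", "b"] },
];

// Two hypothetical $or clauses that can overlap.
const clauseA = (d) => d.groups.includes("a");
const clauseB = (d) => d.groups.includes("b");

// Adding per-clause counts double-counts documents matching both clauses.
const naiveSum = docs.filter(clauseA).length + docs.filter(clauseB).length;

// Counting distinct documents requires knowing which document each match is.
const matchedIds = new Set(
  [...docs.filter(clauseA), ...docs.filter(clauseB)].map((d) => d._id)
);

console.log(naiveSum, matchedIds.size);
```

Here the naive sum is 3 while only 2 distinct documents match, which is why a general $or cannot simply add up per-clause key counts.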
So how to eliminate the fetch stage?
If you were to query with only the $in clause, or with only the $size clause you should find the count is derived from the index scan, without needing to load any documents.
That is, if you were to run these queries separately from the client and sum the results, you should find that the overall execution time is less than that of the query that requires fetching:
{"status":"complete","groups":{"$size":0}}
{"status":"complete","groups":{"$in":["5e65ffc2a1e6ef0007bc5fa8"]}}
For the {"groups":{"$exists":false}} predicate, you might modify the data slightly, for example by ensuring that the field always exists and assigning it a value that means "undefined" but can still be indexed and queried.
As an example, if you were to run an update with the following filter and modification, the groups field would then exist in all documents:
{"groups":{"$exists":false}}, {"$set":{"groups":false}}
You could then get the equivalent of the above count by running these 2 queries, which should both be covered by an index scan and should run faster together than the query that requires loading documents:
{"status":"complete","groups":{"$size":0}}
{"status":"complete","groups":{"$in":[false, "5e65ffc2a1e6ef0007bc5fa8"]}}
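A rough sketch of the "run two counts and add them" approach follows. The helper names buildCountFilters and totalCount are made up for illustration, and countFn is a hypothetical stand-in for whatever count call your shell or driver provides. Adding the results is only valid because the two filters cannot match the same document.

```javascript
// Build the two filters described above. missingMarker is the placeholder
// value written by the update (false in the example above).
function buildCountFilters(status, groupId, missingMarker) {
  return [
    { status: status, groups: { $size: 0 } },
    { status: status, groups: { $in: [missingMarker, groupId] } },
  ];
}

// countFn is a hypothetical callback wrapping the real count call,
// e.g. (f) => db.mycoll.count(f) in the mongo shell (mycoll is a placeholder).
function totalCount(filters, countFn) {
  return filters.reduce((sum, f) => sum + countFn(f), 0);
}

const filters = buildCountFilters("complete", "5e65ffc2a1e6ef0007bc5fa8", false);
console.log(JSON.stringify(filters, null, 2));
```

Whether each of these counts really avoids the FETCH stage is worth confirming with explain("executionStats") on your own data.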


If you can somehow avoid the empty array case, then the following query can be used:
{"status":"complete", "groups": { $in: [ null, "5e65ffc2a1e6ef0007bc5fa8" ] } }
Matching on null also matches documents in which the field does not exist, so it covers the $exists: false case.
Also: I'd suggest using ObjectId instead of string as the type for the groups field.
$size never uses an index!
You can use the following query:{"status":"complete","$or":[
{"groups": {$in: [ null, "5e65ffc2a1e6ef0007bc5fa8" ]}


MongoDB slow with index and sort

I have a compound index:
{
    "hidden" : 1,
    "country" : 1,
    "edited" : 1,
    "changeset.when" : -1
}
And a query:
{
    "country" : "ua",
    "edited" : true,
    "hidden" : false,
    "changeset.when" : { "$lt" : ISODate("5138-11-16T09:46:40Z") }
}
It works well and fast. Now I want to sort the result by { "changeset.when" : -1 }, and it slows down a lot: from hundreds of milliseconds to 15 seconds.
And here is explain for query with sorting:
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"changeset.when" : -1
"limitAmount" : 15,
"inputStage" : {
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"hidden" : 1,
"country" : 1,
"edited" : 1,
"changeset.when" : -1
"indexName" : "edited_news",
"isMultiKey" : true,
"multiKeyPaths" : {
"hidden" : [ ],
"country" : [ ],
"edited" : [ ],
"changeset.when" : [
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"hidden" : [
"[false, false]"
"country" : [
"[\"ua\", \"ua\"]"
"edited" : [
"[true, true]"
"changeset.when" : [
"(new Date(100000000000000), true)"
Why is it so slow? The explain output shows that it successfully uses the intended index, and the field changeset.when is in descending order.
If you have a compound index, try to list the keys in your query in the same sequence as in the index. It should improve performance.
You don't need an additional sort on the result; by default the result will be sorted according to the index (in your case, descending by changeset.when).
For more info, please share some documents from your collection.
If you have any questions, feel free to ask.

Improve slow query count mongodb

I'm trying to improve the performance of a count query (used to calculate pagination to display on a screen) on a collection of 1138633 documents. The query examines 391232 documents and returns 364497, but it takes ~2 s to execute, which I think is too long.
My query looks like this:
"$or" : [
"field_1" : {
"$lte" : 1.0
{"field_1" : {
"$eq" : null
"field_2" : {
"$eq" : false
"field_3" : {
"$ne" : true
"field_4" : {
"$eq" : "fr-FR"
"field_5" : {
"$ne" : null
"field_6" : {
"$ne" : null
"field_7" : {
"$gte" : ISODate("2016-10-14T00:00:00.000Z")
field_1 is a number, field_2 and field_3 are booleans, field_5 is a string, and field_6 is an ObjectId that refers to a collection of 2 documents.
Here are my indexes (db.myCollection.getIndexes() ) :
"v" : 2,
"key" : {
"_id" : 1
"name" : "_id_",
"ns" : "db.myCollection"
"v" : 2,
"key" : {
"field_6" : 1,
"field_7" : -1
"name" : "field_6_1_field_7_-1",
"ns" : "db.myCollection",
"background" : true
"v" : 2,
"key" : {
"field_7" : 1
"name" : "field_7_1",
"background" : true,
"ns" : "db.myCollection"
"v" : 2,
"key" : {
"field_6" : 1
"name" : "field_6_1",
"ns" : "db.myCollection",
"background" : true
"v" : 2,
"key" : {
"field_1" : 1.0
"name" : "field_1_1",
"ns" : "db.myCollection"
I tried everything, such as forcing an index with hint and changing the order of the query (and the order of the keys in the compound index), but nothing works.
Does anyone have an idea of what I can try to improve the execution time of this query? Do you need more details, such as the executionStats output?
EDIT: More detail. I counted how many documents match each clause, and here are my results:
field 6 : 391232
field 1 lte 1 :721005
field 1 eq null : 417625
field 5 : 819688
field 4: 1123301
field 2 : 1138620
field 7: 1138630 (all documents)
field 3: 1138630 (all documents)
I reordered my query in the order above and now get ~1.82 s (a gain of 0.2 s).
I assume the problem is that my indexes are wrong.
For the index details in explain, do you know which section I should check? Here is what I found in the execution plan about my indexes:
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 391232,
"executionTimeMillisEstimate" : 427,
"works" : 391234,
"advanced" : 391232,
"needTime" : 1,
"needYield" : 0,
"saveState" : 3060,
"restoreState" : 3060,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"field_6" : 1,
"field_7" : -1
"indexName" : "field_6_1_field_7_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"field_6" : [],
"field_7" : []
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"field_6" : [
"[MinKey, null)",
"(null, MaxKey]"
"field_7" : [
"[new Date(9223372036854775807), new Date(1491350400000)]"
"keysExamined" : 391233,
"seeks" : 2,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0

MongoDB Shard-Key Performance Simple

I have a cluster with 3 config dbs and 3 shards. I am querying against a db with 106M records, each with 410 fields. I imported this data using a shard key of:
{state: 1, zipCode: 1}.
When I run the following queries individually each one completes in less than 5sec. (SC = 1.6M records, NC = 5.2M records)
db.records.find( { "state" : "NC" } ).count()
db.records.find( { "state" : "SC" } ).count()
db.records.find( { "state" : { $in : ["NC"] } } ).count()
db.records.find( { "state" : { $in : ["SC"] } } ).count()
However when I query both states using an $in or an $or the query takes over an hour to complete.
db.records.find( { "state" : { $in : [ "NC" , "SC" ] } } ).count()
db.records.find( { $or : [ { "state" : "NC" }, { "state" : "SC" } ] } ).count()
The entirety of both states exists on 1 shard. Below are the results of .explain() using the $in query:
"queryPlanner" : {
"mongosPlannerVersion" : 1,
"winningPlan" : {
"stage" : "SINGLE_SHARD",
"shards" : [
"shardName" : "s2",
"connectionString" : "s2/,",
"serverInfo" : {
"host" : "MonDbShard2",
"port" : 27000,
"version" : "3.4.7",
"gitVersion" : "cf38c1b8a0a8dca4a11737581beafef4fe120bcd"
"plannerVersion" : 1,
"namespace" : "DBNAME.records",
"indexFilterSet" : false,
"parsedQuery" : {
"state" : {
"$in" : [
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "SHARDING_FILTER",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"state" : 1,
"zipCode" : 1
"indexName" : "state_1_zipCode_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"state" : [ ],
"zipCode" : [ ]
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"state" : [
"[\"NC\", \"NC\"]",
"[\"SC\", \"SC\"]"
"zipCode" : [
"[MinKey, MaxKey]"
"rejectedPlans" : [ ]
"ok" : 1
Why would querying two states at one time cause such a drastic difference in completion time? Also, querying a single zipCode without also including the corresponding state, results in the same drastic difference in completion time. I feel like I am misunderstanding how the shard-key actually operates. Any thoughts?

Sorting with $in not returning all docs

I have the following query.
db.getCollection('logs').find({'uid.$id': {
'$in': [
]}, levelno: { '$gte': 10 }
}).sort({_id: 1})
This should return 1847 documents. However, when executing it, I only get 1000 documents, which is the cursor's batchSize and then the cursor closes (setting its cursorId to 0), as if all documents were returned.
If I take out the sorting, then I get all 1847 documents.
So my question is, why does it silently fail when using sorting with the $in operator?
Using explain gives the following output
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "session.logs",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
"levelno" : {
"$gte" : 10
"uid.$id" : {
"$in" : [
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"_id" : 1
"inputStage" : {
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"uid.$id" : 1,
"levelno" : 1,
"_id" : 1
"indexName" : "uid.$id_1_levelno_1__id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"uid.$id" : [
"[ObjectId('580e3397812de36b86d68c04'), ObjectId('580e3397812de36b86d68c04')]",
"[ObjectId('580e339a812de36b86d68c08'), ObjectId('580e339a812de36b86d68c08')]",
"[ObjectId('580e339a812de36b86d68c09'), ObjectId('580e339a812de36b86d68c09')]",
"[ObjectId('580e33a9812de36b86d68c0a'), ObjectId('580e33a9812de36b86d68c0a')]",
"[ObjectId('580e33a9812de36b86d68c0b'), ObjectId('580e33a9812de36b86d68c0b')]",
"[ObjectId('580e33bd812de36b86d68c11'), ObjectId('580e33bd812de36b86d68c11')]",
"[ObjectId('580e33c0812de36b86d68c13'), ObjectId('580e33c0812de36b86d68c13')]"
"levelno" : [
"[10.0, inf.0]"
"_id" : [
"[MinKey, MaxKey]"
"rejectedPlans" : [
"stage" : "SORT",
"sortPattern" : {
"_id" : 1
"inputStage" : {
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"levelno" : 1,
"_id" : 1,
"uid.$id" : 1
"indexName" : "levelno_1__id_1_uid.$id_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"levelno" : [
"[10.0, inf.0]"
"_id" : [
"[MinKey, MaxKey]"
"uid.$id" : [
"[ObjectId('580e3397812de36b86d68c04'), ObjectId('580e3397812de36b86d68c04')]",
"[ObjectId('580e339a812de36b86d68c08'), ObjectId('580e339a812de36b86d68c08')]",
"[ObjectId('580e339a812de36b86d68c09'), ObjectId('580e339a812de36b86d68c09')]",
"[ObjectId('580e33a9812de36b86d68c0a'), ObjectId('580e33a9812de36b86d68c0a')]",
"[ObjectId('580e33a9812de36b86d68c0b'), ObjectId('580e33a9812de36b86d68c0b')]",
"[ObjectId('580e33bd812de36b86d68c11'), ObjectId('580e33bd812de36b86d68c11')]",
"[ObjectId('580e33c0812de36b86d68c13'), ObjectId('580e33c0812de36b86d68c13')]"
"stage" : "FETCH",
"filter" : {
"$and" : [
"levelno" : {
"$gte" : 10
"uid.$id" : {
"$in" : [
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"_id" : 1
"indexName" : "_id_",
"isMultiKey" : false,
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"_id" : [
"[MinKey, MaxKey]"
"ok" : 1
What's happening is that this sorted query must be performed in-memory as it's not supported by an index, and this limits the results to 32 MB. This behavior is documented here, with a JIRA about addressing this here.
Furthermore, you can't define an index to support this query as you're sorting on a field that isn't part of the query, and neither of these cases apply:
If the sort keys correspond to the index keys or an index prefix,
MongoDB can use the index to sort the query results. A prefix of a
compound index is a subset that consists of one or more keys at the
start of the index key pattern.
An index can support sort operations on a non-prefix subset of the
index key pattern. To do so, the query must include equality
conditions on all the prefix keys that precede the sort keys.
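The two quoted rules can be approximated with a small checker. This is a deliberately simplified sketch, not the query planner's logic: the function name indexCanProvideSort is invented here, and it ignores multikey indexes and planner optimizations such as merging several equality bounds.

```javascript
// indexKeys: array of [field, direction] pairs in index order.
// equalityFields: fields the query constrains to a single equality value.
// sortSpec: array of [field, direction] pairs in sort order.
function indexCanProvideSort(indexKeys, equalityFields, sortSpec) {
  const eq = new Set(equalityFields);
  for (let start = 0; start + sortSpec.length <= indexKeys.length; start++) {
    const slice = indexKeys.slice(start, start + sortSpec.length);
    const sameFields = slice.every(([f], i) => f === sortSpec[i][0]);
    const forward = slice.every(([, d], i) => d === sortSpec[i][1]);
    const reverse = slice.every(([, d], i) => d === -sortSpec[i][1]);
    if (sameFields && (forward || reverse)) return true;
    // The window may only move right past a key that has an equality condition.
    if (!eq.has(indexKeys[start][0])) break;
  }
  return false;
}

// Index from the explain output above.
const logsIndex = [["uid.$id", 1], ["levelno", 1], ["_id", 1]];

// $in with several values and $gte are not single-value equalities.
const asAsked = indexCanProvideSort(logsIndex, [], [["_id", 1]]);

// With equality on both leading keys, the index order could be reused.
const withEqualities = indexCanProvideSort(logsIndex, ["uid.$id", "levelno"], [["_id", 1]]);

console.log(asAsked, withEqualities);
```

Under this model the query as asked yields false, which is consistent with the SORT stage in the winning plan.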
You should be able to work around the limitation by using the aggregation framework which can be instructed to use temporary files for its pipeline stage outputs if required via the allowDiskUse: true option:
db.getCollection('logs').aggregate([
    {$match: {'uid.$id': {
        '$in': [
        ]}, levelno: { '$gte': 10 }
    }},
    {$sort: {_id: 1}}
], { allowDiskUse: true })
You can use the objsLeftInBatch() method to determine how many objects are left in the current batch and iterate over them.
You can override the size and limit of the cursor batch size using cursor.batchSize(size) and cursor.limit(limit)

MongoDB optimize indexes for aggregation

I have an aggregation on a collection with about 1.6M records. This query is a simplified example of a more complex one, but in my opinion it illustrates the poor index selection.
db.getCollection('cbAlters').runCommand("aggregate", {pipeline: [
    { $match: { cre_carteraId: "31" } },
    { $group: { _id: { ca_tramomora: "$cre_tramoMora" },
                count: { $sum: 1 } } }
]})
That query took about 5 seconds. The collection has 25 indexes configured for different queries. The one used, according to the query explain, is:
{
    "v" : 1,
    "key" : {
        "cre_carteraId" : 1,
        "cre_periodo" : 1,
        "cre_tramoMora" : 1,
        "cre_inactivo" : 1
    },
    "name" : "cartPerTramInact",
    "ns" : "basedatos.cbAlters"
}
I created an index tailored to this particular query:
{
    "v" : 1,
    "key" : {
        "cre_carteraId" : 1,
        "cre_tramoMora" : 1
    },
    "name" : "cartPerTramTest",
    "ns" : "basedatos.cbAlters"
}
The query optimizer rejects this index and chooses the initial index instead. The output of my query explain looks like this:
"waitedMS" : NumberLong(0),
"stages" : [
"$cursor" : {
"query" : {
"cre_carteraId" : "31"
"fields" : {
"cre_tramoMora" : 1,
"_id" : 0
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "basedatos.cbAlters",
"indexFilterSet" : false,
"parsedQuery" : {
"cre_carteraId" : {
"$eq" : "31"
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"cre_tramoMora" : 1,
"_id" : 0
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"cre_carteraId" : 1,
"cre_periodo" : 1,
"cre_tramoMora" : 1,
"cre_inactivo" : 1
"indexName" : "cartPerTramInact",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"cre_carteraId" : [
"[\"31\", \"31\"]"
"cre_periodo" : [
"[MinKey, MaxKey]"
"cre_tramoMora" : [
"[MinKey, MaxKey]"
"cre_inactivo" : [
"[MinKey, MaxKey]"
"rejectedPlans" : [
"stage" : "PROJECTION",
"transformBy" : {
"cre_tramoMora" : 1,
"_id" : 0
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"cre_carteraId" : 1,
"cre_tramoMora" : 1
"indexName" : "cartPerTramTest",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"cre_carteraId" : [
"[\"31\", \"31\"]"
"cre_tramoMora" : [
"[MinKey, MaxKey]"
"$group" : {
"_id" : {
"ca_tramomora" : "$cre_tramoMora"
"count" : {
"$sum" : {
"$const" : 1.0
"ok" : 1.0
So why does the optimizer prefer a less tailored index? Should indexFilterSet (result filtered by index) be true for this aggregation?
How can I improve this index, or is something wrong with the query?
I do not have much experience with MongoDB, so I appreciate any help.
As long as you have the index cartPerTramInact, the optimizer won't use your cartPerTramTest index, because their first fields are the same and in the same order.
The same goes for other indexes. When there are indexes that share the same leading keys in the same order (such as a.b.c.d, a.b.d and a.b) and your query uses fields a and b, the optimizer will favour a.b.c.d. In any case, you don't need the a.b index, because you already have two indexes that cover a.b (a.b.c.d and a.b.d).
Index a.b.d is used only when your query uses the fields a, b and d. BUT if a.b is already very selective, it's probably faster to use index a.b.c.d for the a.b part only and then scan the matching entries (a "full table scan") to find d.
There is a hint option for aggregations that can help with the index...
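As a sketch of what that might look like: the pipeline and index name below are taken from the question, and passing hint in the aggregate options assumes a server version that supports it (3.6 or later, as far as I know, which is newer than the version the question appears to be running). The guard lets the snippet run outside the mongo shell, where db is not defined.

```javascript
// Pipeline from the question.
const pipeline = [
  { $match: { cre_carteraId: "31" } },
  { $group: { _id: { ca_tramomora: "$cre_tramoMora" }, count: { $sum: 1 } } },
];

// Ask the planner to use the narrower index by name.
// Assumption: the server accepts the hint option on aggregate.
const options = { hint: "cartPerTramTest" };

// Only executes inside the mongo shell, where the global db exists.
if (typeof db !== "undefined") {
  printjson(db.getCollection("cbAlters").aggregate(pipeline, options).toArray());
}

console.log(JSON.stringify({ pipeline: pipeline, options: options }));
```

Comparing the explain output and timings with and without the hint would show whether the narrower index actually helps for this query.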