MongoDB query excessively slow

I have a collection of tweets, with indexes on userid and tweeted_at (date). I want to find the dates of the oldest and newest tweets in the collection for a user, but the query runs very slowly.
I used explain, and here's what I got. I tried reading the documentation for explain, but I don't understand what is going on here. Is the explain just on the sort? If so, why does it take so long when it's using the index?
> db.tweets.find({userid:50263}).sort({tweeted_at:-1}).limit(1).explain(1)
{
"cursor" : "BtreeCursor tweeted_at_1 reverse",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 12705,
"nscanned" : 12705,
"nscannedObjectsAllPlans" : 12705,
"nscannedAllPlans" : 12705,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 188,
"nChunkSkips" : 0,
"millis" : 7720,
"indexBounds" : {
"tweeted_at" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"allPlans" : [
{
"cursor" : "BtreeCursor tweeted_at_1 reverse",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 12705,
"nscanned" : 12705,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"tweeted_at" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
}
}
],
"server" : "adams-server:27017",
"filterSet" : false,
"stats" : {
"type" : "LIMIT",
"works" : 12807,
"yields" : 188,
"unyields" : 188,
"invalidates" : 0,
"advanced" : 0,
"needTime" : 12705,
"needFetch" : 101,
"isEOF" : 1,
"children" : [
{
"type" : "FETCH",
"works" : 12807,
"yields" : 188,
"unyields" : 188,
"invalidates" : 0,
"advanced" : 0,
"needTime" : 12705,
"needFetch" : 101,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 0,
"children" : [
{
"type" : "IXSCAN",
"works" : 12705,
"yields" : 188,
"unyields" : 188,
"invalidates" : 0,
"advanced" : 12705,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ tweeted_at: 1.
0 }",
"boundsVerbose" : "field #0['twe
eted_at']: [MaxKey, MinKey]",
"isMultiKey" : 0,
"yieldMovedCursor" : 0,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 12705,
"children" : [ ]
}
]
}
]
}
}
>
> db.tweets.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "honeypot.tweets"
},
{
"v" : 1,
"unique" : true,
"key" : {
"tweet_id" : 1
},
"name" : "tweet_id_1",
"ns" : "honeypot.tweets",
"dropDups" : true
},
{
"v" : 1,
"key" : {
"tweeted_at" : 1
},
"name" : "tweeted_at_1",
"ns" : "honeypot.tweets"
},
{
"v" : 1,
"key" : {
"keywords" : 1
},
"name" : "keywords_1",
"ns" : "honeypot.tweets"
},
{
"v" : 1,
"key" : {
"user_id" : 1
},
"name" : "user_id_1",
"ns" : "honeypot.tweets"
}
]
>

By looking at the cursor field you can see which index was used:
"cursor" : "BtreeCursor tweeted_at_1 reverse",
BtreeCursor indicates that the query used an index; tweeted_at_1 is the name of the index that was used, and reverse means the index was traversed in reverse order (to satisfy the descending sort).
You should check the explain documentation for a detailed description of each field.
Your query took 7720 ms (millis) and 12705 documents were scanned (nscanned).
The query is slow because MongoDB scanned the entire tweeted_at_1 index: the index was used only for sorting, not for filtering on userid, so every document had to be fetched and tested against your criteria.
To create an index that will be used for both querying and sorting, you should create a compound index. A compound index is a single index structure that references multiple fields (up to 31). You can create one like this (the order of the fields is important):
db.tweets.ensureIndex({userid: 1, tweeted_at: -1});
This index will be used for filtering on the userid field and for sorting by the tweeted_at field.
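For example, with this index in place, each end of the date range from the original question (oldest and newest tweet for a user) becomes a single index walk; a minimal sketch, assuming the compound index above:
// newest tweet for the user (index traversed in reverse)
db.tweets.find({userid: 50263}).sort({tweeted_at: -1}).limit(1)
// oldest tweet for the user
db.tweets.find({userid: 50263}).sort({tweeted_at: 1}).limit(1)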
You can read and see more examples about adding indexes for sorting here.
Edit
If you have other indexes, MongoDB may be choosing one of them instead. When you're testing query performance you can use hint() to force a specific index.
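For example, a sketch of pinning the compound index above while testing (hint() accepts a key pattern or an index name):
db.tweets.find({userid: 50263}).sort({tweeted_at: -1}).limit(1).hint({userid: 1, tweeted_at: -1}).explain()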
When testing the performance of your queries you should always run multiple tests and average the results.
Also, if your queries are slow even when using indexes, check whether the server has enough memory. Loading data from disk is an order of magnitude slower than loading it from memory. You should always ensure that you have enough RAM, so that all of your data and indexes fit in memory.

Looks like you need to create a compound index on userid and tweeted_at, with the equality field first:
db.tweets.ensureIndex({'userid': 1, 'tweeted_at': 1})
That should make the query very quick indeed (but at the cost of extra storage and slower inserts).

Related

Mongo not using index

I have the following indexes within a collection:
db.JobStatusModel.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "jobs.JobStatusModel"
},
{
"v" : 1,
"key" : {
"peopleId" : 1,
"jobId" : 1
},
"name" : "peopleId_jobId_compounded",
"ns" : "jobs.JobStatusModel"
},
{
"v" : 1,
"key" : {
"jobId" : 1
},
"name" : "jobId_1",
"ns" : "jobs.JobStatusModel",
"background" : true
},
{
"v" : 1,
"key" : {
"peopleId" : 1,
"disInterested" : 1
},
"name" : "peopleId_1_disInterested_1",
"ns" : "jobs.JobStatusModel",
"background" : true
}
]
Trying to work out some slow running queries running against the compound indexes, however, even simple queries aren't making use of indexes:
db.JobStatusModel.find({ jobId : '1f940601ff7385931ec04dca88c853dd' }).explain(true)
{
"cursor" : "BtreeCursor jobId_1",
"isMultiKey" : false,
"n" : 221,
"nscannedObjects" : 221,
"nscanned" : 221,
"nscannedObjectsAllPlans" : 221,
"nscannedAllPlans" : 221,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 1,
"indexBounds" : {
"jobId" : [
[
"1f940601ff7385931ec04dca88c853dd",
"1f940601ff7385931ec04dca88c853dd"
]
]
},
"allPlans" : [
{
"cursor" : "BtreeCursor jobId_1",
"isMultiKey" : false,
"n" : 221,
"nscannedObjects" : 221,
"nscanned" : 221,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"jobId" : [
[
"1f940601ff7385931ec04dca88c853dd",
"1f940601ff7385931ec04dca88c853dd"
]
]
}
}
],
"server" : "mongo3.pilot.dice.com:27017",
"filterSet" : false,
"stats" : {
"type" : "FETCH",
"works" : 222,
"yields" : 1,
"unyields" : 1,
"invalidates" : 0,
"advanced" : 221,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 0,
"children" : [
{
"type" : "IXSCAN",
"works" : 222,
"yields" : 1,
"unyields" : 1,
"invalidates" : 0,
"advanced" : 221,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ jobId: 1.0 }",
"isMultiKey" : 0,
"boundsVerbose" : "field #0['jobId']: [\"1f940601ff7385931ec04dca88c853dd\", \"1f940601ff7385931ec04dca88c853dd\"]",
"yieldMovedCursor" : 0,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 221,
"children" : [ ]
}
]
}
}
As we can see, the output contains "indexOnly" : false, meaning the query cannot be answered from the index alone even though my field is indexed. How can I ensure queries are running only against indexes?
even simple queries aren't making use of indexes:
Your query did use an index as indicated by the IXSCAN stage and index cursor ("cursor" : "BtreeCursor jobId_1",).
Trying to work out some slow running queries running against the compound indexes, however,
Based on the provided getIndexes() output, your query on the single field jobId only has one candidate index to consider: {jobId:1}. This query ran in 1 millisecond ("millis" : 1) and returned 221 documents looking at 221 index keys -- an ideal 1:1 hit ratio for key comparisons to matches.
The compound index of {peopleId:1, jobId:1} would only be considered if you also provided a peopleId value in your query. However, you could potentially create a compound index with these fields in the opposite order if you sometimes query solely on jobId but also frequently query on both peopleId and jobId. A compound index on {jobId:1, peopleId:1} would obviate the need for the {jobId:1} index since it could satisfy the same queries.
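A sketch of that reordering, assuming you do want to keep serving jobId-only queries:
// the {jobId:1} prefix of this index covers jobId-only queries
db.JobStatusModel.ensureIndex({ jobId: 1, peopleId: 1 })
// the single-field index becomes redundant and can be dropped
db.JobStatusModel.dropIndex("jobId_1")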
For more information see Create Indexes to Support Your Queries in the MongoDB manual and the blog post Optimizing MongoDB Compound Indexes.
Note: You haven't mentioned what version of MongoDB server you are using but the format of your explain() output indicates that you're running an older version of MongoDB that has reached End-of-Life (i.e. anything older than MongoDB 3.0 as at Jan-2017). I strongly recommend upgrading to a newer and supported version (eg. MongoDB 3.2 or 3.4) as there are significant improvements. End-of-Life server release series are no longer maintained and may potentially expose your application to known bugs and vulnerabilities that have been addressed in subsequent production releases.
As we can see, the output contains "indexOnly" : false, meaning the query cannot be answered from the index alone even though my field is indexed. How can I ensure queries are running only against indexes?
The indexOnly value will only be true in the special case of a covered query. A covered query is one where all of the fields in the query are part of an index and all of the fields projected in the results are in the same index. Typically indexed queries are not covered: index lookups are used to find matching documents which are then retrieved and filtered to the fields requested in the query projection.
To be sure you get indexOnly, you need to return only fields that are in the index by using a projection:
db.collection.find( <query filter>, <projection> )
db.JobStatusModel.find({ jobId : '1f940601ff7385931ec04dca88c853dd' }, {jobId:1, _id:0})
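Re-running explain() on the projected query should now report "indexOnly" : true, confirming that it is covered:
db.JobStatusModel.find({ jobId : '1f940601ff7385931ec04dca88c853dd' }, { jobId : 1, _id : 0 }).explain()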

MongoDB find() query scans documents twice (duplicate cursor used) when using limit() + sort()?

I'm fairly new to MongoDB, and I haven't been able to find an explanation for what I'm seeing.
I have a small dataset of about 200 documents, when I run the following query:
db.tweets.find({user:22438186})
I get n / nscannedObjects / nscanned / nscannedObjectsAllPlans / nscannedAllPlans all at 9. The cursor is BtreeCursor user_1. All good.
Introducing Sort()
If I append a sort to the query:
db.tweets.find({user:22438186}).sort({created_at:1})
nscannedObjectsAllPlans / nscannedAllPlans have increased to 30. I can see under the allPlans field:
[
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
},
{
"cursor" : "BtreeCursor created_at_1",
"isMultiKey" : false,
"n" : 2,
"nscannedObjects" : 21,
"nscanned" : 21,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"created_at" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
]
BtreeCursor created_at_1 scanned 21 documents and matched 2? I'm not sure what is going on here, as I thought sort() was applied to the documents returned by find(), which appears to be 9 from the user_1 index. In writing this up, I gather from the allPlans field that it's also trying my created_at_1 index for some reason.
Limit(>n) combined with Sort() == duplicate cursor & document scans?
When I append limit(10) or higher, n remains at 9, nscannedObjects / nscanned are both at 18, and nscannedObjectsAllPlans / nscannedAllPlans now return 60. Why has everything but n doubled? The cursor is now QueryOptimizerCursor, and there is a clauses field in my explain(true) results. Both child objects are exactly the same; was the same cursor used twice, causing the duplication? Is this behaviour normal?
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
}
I've tried a few different limit values and noticed that with a limit of 9, nscannedObjects / nscanned both return to values of 9, and nscannedObjectsAllPlans / nscannedAllPlans drop down to 29, decrementing by 1 as I decrement the limit.
Under clauses, however, the 2nd child object is not the same as for limit queries of 10 and higher. The cursor field now displays BtreeCursor, omitting user_1 for some reason, and all the n fields have a value of 0 instead of 9; besides that, the rest of the object is the same. For all of these limit queries the allPlans field lists the clauses field and another entry for BtreeCursor created_at_1 (which is used as the cursor for a query with a limit of 1).
Actual Question
So what exactly is causing my documents to be scanned twice when limit() and sort() are both used in a find()? The issue only seems to happen if the limit exceeds either nscannedObjects or nscanned. When querying with only limit() or sort(), documents are not scanned twice.
Update
Sorry for the confusion: the first code block shows cursor data under the allPlans field. The actual cursor used was *BtreeCursor user_1*.
The 2nd code block is from a query with limit() and sort(). I am providing the cursor data listed under clauses; the clauses field lists the same cursor information twice (a duplicate). The actual cursor field for that query was *QueryOptimizerCursor*. The duplicate cursors under clauses are *BtreeCursor user_1*.
I've since added a compound index {user:1, created_at:1}. The n fields now show 9, and the AllPlans variants show 18, regardless of the limit() value or its use with sort(). For some reason, under allPlans my original user_1 index is still being run alongside the new compound index. If a limit is applied to the query, then instead of the user_1 index (BtreeCursor user_1) being used, a QueryOptimizerCursor with the two cursors in clauses is used.
I've been looking into this further, and it seems the query planner runs other candidate indexes in parallel and selects the one that produces the optimal result. I'm not sure whether this 'competition' occurs each time I perform the query or whether the winning plan is cached.
db.tweets.find({user:22438186}).sort({created_at:1}).limit(10)
Running the query without the compound index produces the following:
{
"clauses" : [
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
},
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
"n" : 9,
"nscannedObjects" : 18,
"nscanned" : 18,
"nscannedObjectsAllPlans" : 60,
"nscannedAllPlans" : 60,
"scanAndOrder" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"allPlans" : [
{
"clauses" : [
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
},
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
"n" : 9,
"nscannedObjects" : 18,
"nscanned" : 18,
"scanAndOrder" : false,
"nChunkSkips" : 0
},
{
"cursor" : "BtreeCursor created_at_1",
"isMultiKey" : false,
"n" : 3,
"nscannedObjects" : 42,
"nscanned" : 42,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"created_at" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
],
"server" : "HOME-PC:27017",
"filterSet" : false,
"stats" : {
"type" : "KEEP_MUTATIONS",
"works" : 43,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 32,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "OR",
"works" : 42,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 32,
"needFetch" : 0,
"isEOF" : 1,
"dupsTested" : 18,
"dupsDropped" : 9,
"locsForgotten" : 0,
"matchTested_0" : 0,
"matchTested_1" : 0,
"children" : [
{
"type" : "SORT",
"works" : 21,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 10,
"needFetch" : 0,
"isEOF" : 1,
"forcedFetches" : 0,
"memUsage" : 6273,
"memLimit" : 33554432,
"children" : [
{
"type" : "FETCH",
"works" : 10,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 0,
"children" : [
{
"type" : "IXSCAN",
"works" : 10,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ user: 1 }",
"isMultiKey" : 0,
"boundsVerbose" : "field #0['user']: [22438186.0, 22438186.0]",
"yieldMovedCursor" : 0,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 9,
"children" : []
}
]
}
]
},
{
"type" : "SORT",
"works" : 21,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 10,
"needFetch" : 0,
"isEOF" : 1,
"forcedFetches" : 0,
"memUsage" : 6273,
"memLimit" : 33554432,
"children" : [
{
"type" : "FETCH",
"works" : 10,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 0,
"children" : [
{
"type" : "IXSCAN",
"works" : 10,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ user: 1 }",
"isMultiKey" : 0,
"boundsVerbose" : "field #0['user']: [22438186.0, 22438186.0]",
"yieldMovedCursor" : 0,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 9,
"children" : []
}
]
}
]
}
]
}
]
}
}
With the compound index:
{
"cursor" : "BtreeCursor user_1_created_at_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"nscannedObjectsAllPlans" : 18,
"nscannedAllPlans" : 18,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
],
"created_at" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"allPlans" : [
{
"cursor" : "BtreeCursor user_1_created_at_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
],
"created_at" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
},
{
"clauses" : [
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
},
{
"cursor" : "BtreeCursor ",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
"n" : 0,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : false,
"nChunkSkips" : 0
}
],
"server" : "HOME-PC:27017",
"filterSet" : false,
"stats" : {
"type" : "LIMIT",
"works" : 11,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "FETCH",
"works" : 11,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 0,
"children" : [
{
"type" : "IXSCAN",
"works" : 10,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ user: 1, created_at: 1 }",
"isMultiKey" : 0,
"boundsVerbose" : "field #0['user']: [22438186.0, 22438186.0], field #1['created_at']: [MinKey, MaxKey]",
"yieldMovedCursor" : 0,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 9,
"children" : []
}
]
}
]
}
}
Hope that clears up the confusion.
If you look at the explain() plan, you can see that:
db.tweets.find({user:22438186})
uses the user_1 index.
db.tweets.find({user:22438186}).sort({created_at:1}) also trials the created_at_1 index (visible under allPlans).
The optimizer tries created_at_1 because sort operations perform better when they can use an index, and the sort here is on the created_at field. That plan ignores the user_1 index and walks the entire created_at_1 index instead, fetching each document to test the user predicate.
So we need to define our indexes carefully in these cases. If we have a compound index on both user and created_at, no full index scan occurs, and MongoDB will choose the index that supports both the find and the sort operations, which in this case is the compound index.
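A sketch of creating that compound index (it matches the one added in the question's update):
db.tweets.ensureIndex({ user: 1, created_at: 1 })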
JIRA has a beautiful explanation of why MongoDB uses the QueryOptimizerCursor.
nscannedObjectsAllPlans / nscannedAllPlans drop down to 29
You should not worry about these two parameters; they represent the combined scans made by all the plans that MongoDB executed in order to select the appropriate index.
nscannedObjectsAllPlans is a number that reflects the total number of documents scanned for all query plans during the database operation.
nscannedAllPlans is a number that reflects the total number of documents or index entries scanned for all query plans during the database operation.
These lines are from the docs.
So what exactly is causing my documents to be scanned twice when limit() and sort() are both used in a find()?
As said, the documents are not scanned twice; they are scanned in parallel by two different plans executed by MongoDB to select the appropriate index. If you have two different indexes, two plans may be run in parallel, and so on.
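If you want to suppress those parallel trial plans while testing, you can pin one index with hint(); a sketch using the compound index from the update:
db.tweets.find({user: 22438186}).sort({created_at: 1}).limit(10).hint({user: 1, created_at: 1}).explain()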

MongoDB object field and range query index

I have the following structure in the database:
{
"_id" : {
"user" : 14197,
"date" : ISODate("2014-10-24T00:00:00.000Z")
},
...
}
I have a performance problem when I try to select data by user and date range: Mongo doesn't use the index and runs a full scan over the collection.
db.timeuse.daily.find({ "_id.user": 289006, "_id.date" : {$gt: ISODate("2014-10-23T00:00:00Z"), $lte: ISODate("2014-10-30T00:00:00Z")}}).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 6,
"nscannedObjects" : 66967,
"nscanned" : 66967,
"nscannedObjectsAllPlans" : 66967,
"nscannedAllPlans" : 66967,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 523,
"nChunkSkips" : 0,
"millis" : 1392,
"server" : "mongo-shard0003:27018",
"filterSet" : false,
"stats" : {
"type" : "COLLSCAN",
"works" : 66969,
"yields" : 523,
"unyields" : 523,
"invalidates" : 16,
"advanced" : 6,
"needTime" : 66962,
"needFetch" : 0,
"isEOF" : 1,
"docsTested" : 66967,
"children" : [ ]
},
"millis" : 1392
}
So far I found only one way - use $in.
db.timeuse.daily.find({"_id": { $in: [
{"user": 289006, "date": ISODate("2014-10-23T00:00:00Z")},
{"user": 289006, "date": ISODate("2014-10-24T00:00:00Z")}
]}}).explain()
{
"cursor" : "BtreeCursor _id_",
"isMultiKey" : false,
"n" : 2,
"nscannedObjects" : 2,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"_id" : [
[
{
"user" : 289006,
"date" : ISODate("2014-10-23T00:00:00Z")
},
{
"user" : 289006,
"date" : ISODate("2014-10-23T00:00:00Z")
}
],
[
{
"user" : 289006,
"date" : ISODate("2014-10-24T00:00:00Z")
},
{
"user" : 289006,
"date" : ISODate("2014-10-24T00:00:00Z")
}
]
]
},
Is there a more elegant way to run this kind of query?
TL;DR: Don't put your data in the _id field; use a compound index instead: db.timeuse.daily.ensureIndex( { "user" : 1, "date": 1 } ).
Explanation:
You're abusing the _id key convention, or more precisely the fact that MongoDB can index entire objects. What you want to achieve requires either two separate indexes that can be combined (a feature called index intersection, which newer MongoDB versions support, with limitations) or a single index over the set of keys, which in MongoDB is called a compound index.
The _id field is indexed by default, but it's indexed as a whole, i.e. the _id index will only support equality queries on the entire object, rather than on parts of the object. That also explains why the $in query works.
In general, that data structure with the default index will behave oddly. Consider this:
> db.sort.insert({"_id" : {"name" : "foo", value : 1} });
> db.sort.insert({"_id" : {"name" : "foo", value : 1, bla : "foo"} });
> db.sort.find();
{ "_id" : { "name" : "foo", "value" : 4343 } }
{ "_id" : { "name" : "foo", "value" : 4343, "bla" : "fooffo" } }
> db.sort.find({"_id" : {"name" : "foo", value : 4343} });
{ "_id" : { "name" : "foo", "value" : 4343 } }
// no second result here...
Imagine MongoDB basically hashed the entire object and simply looked up the object hash: such an index can't support range queries based on some part of the object.
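To make the TL;DR concrete: promote the fields out of _id and index them with the compound index, after which the original range query becomes a straight index scan. A sketch, assuming documents are rewritten with top-level user and date fields:
// assumed document shape: { _id: ..., user: 14197, date: ISODate("2014-10-24T00:00:00Z"), ... }
db.timeuse.daily.ensureIndex({ "user": 1, "date": 1 })
db.timeuse.daily.find({
    "user": 289006,
    "date": { $gt: ISODate("2014-10-23T00:00:00Z"), $lte: ISODate("2014-10-30T00:00:00Z") }
}).explain()
// expect "cursor" : "BtreeCursor user_1_date_1" instead of "BasicCursor"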

MongoDB Slow Query

I'm running a MongoDB query and it's taking too long. I'm querying the collection "play_sessions" for the data of 9 users, as seen in (1). My documents contain data for a gameplay session for a user, as seen in (2). I have an index on "user_id" and this index is being used, as seen in the .explain() output in (3). My indexes in the .stats() output are shown in (4).
The MongoDB version is 2.6.1. There are approximately 4 million entries in "play_sessions" and 43,000 distinct users.
This example query takes around 2 min, and the actual query over 800 users takes a lot longer. I'd like to know why this query is slow and what I can do to speed it up.
(1) The query:
db.play_sessions.find({user_id: {$in: users}}, {play_data: -1})
(2) Example document:
{
"_id" : 1903200,
"score" : 1,
"user_id" : 60538,
"time" : ISODate("2014-02-12T03:49:59.919Z"),
"level" : 1,
"user_attempt_no" : 2,
"game_id" : 181,
"play_data" : [
**Some JSON in here**
],
"time_sec" : 7.989
}
(3) .explain() output
{
"cursor" : "BtreeCursor user_id_1",
"isMultiKey" : false,
"n" : 13724,
"nscannedObjects" : 13724,
"nscanned" : 13732,
"nscannedObjectsAllPlans" : 14128,
"nscannedAllPlans" : 14140,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 4463,
"nChunkSkips" : 0,
"millis" : 123631,
"indexBounds" : {
"user_id" : [
[
41930,
41930
],
...,
[
67112,
67112
]
]
},
"server" : "...",
"filterSet" : false
}
(4) .stats() output for the collection:
{
"ns" : "XXX.play_sessions",
"count" : 3957328,
"size" : 318453446112,
"avgObjSize" : 80471,
"storageSize" : 319917328096,
"numExtents" : 169,
"nindexes" : 10,
"lastExtentSize" : 2146426864,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 1962280880,
"indexSizes" : {
"_id_" : 184205280,
"game_id_1" : 167681584,
"user_id_1" : 113997968,
"user_id_1_game_id_1_level_1_time_1" : 288972544,
"game_id_1_level_1" : 141027824,
"game_id_1_level_1_user_id_1_time_1" : 301645344,
"user_id_1_game_id_1_level_1" : 228674544,
"game_id_1_level_1_user_id_1" : 245549808,
"user_id_1_user_attempt_no_1" : 135958704,
"user_id_1_time_1" : 154567280
},
"ok" : 1
}

MongoDb search performance

I want to know why the following search in MongoDB (C#) takes 50 seconds to execute.
I followed the basic idea of http://calv.info/indexing-schemaless-documents-in-mongo/
I have 100,000 records in a collection (captures). On each document I have a list of SearchTerms:
public class SearchTerm
{
public string Key { get; set; }
public object Value { get; set; }
}
public class Capture
{
//Some other fields
public IList<SearchTerm> SearchTerms { get; set; }
}
I have also defined an index like so:
var capturesCollection = database.GetCollection<Capture>("captures");
capturesCollection.CreateIndex("SearchTerms.Key", "SearchTerms.Value");
But the following query takes 50 seconds to execute
var query = Query.Or(Query.And(Query.EQ("SearchTerms.Key", "ClientId"), Query.EQ("SearchTerms.Value", selectedClient.Id)), Query.And(Query.EQ("SearchTerms.Key", "CustomerName"), Query.EQ("SearchTerms.Value", "Jan")));
var selectedCapture = capturesCollection.Find(query).ToList();
Edit: As asked, here is my explain() output:
clauses: [{ "cursor" : "BtreeCursor SearchTerms.Key_1_SearchTerms.Value_1", "isMultiKey" : true, "n" : 10003, "nscannedObjects" : 100000, "nscanned" : 100000, "scanAndOrder" : false, "indexOnly" : false, "nChunkSkips" : 0, "indexBounds" : { "SearchTerms.Key" : [["ClientId", "ClientId"]], "SearchTerms.Value" : [[{ "$minElement" : 1 }, { "$maxElement" : 1 }]] } }, { "cursor" : "BtreeCursor SearchTerms.Key_1_SearchTerms.Value_1", "isMultiKey" : true, "n" : 70328, "nscannedObjects" : 90046, "nscanned" : 211653, "scanAndOrder" : false, "indexOnly" : false, "nChunkSkips" : 0, "indexBounds" : { "SearchTerms.Key" : [["CustomerName", "CustomerName"]], "SearchTerms.Value" : [[{ "$minElement" : 1 }, { "$maxElement" : 1 }]] } }]
cursor: QueryOptimizerCursor
n: 73219
nscannedObjects: 190046
nscanned: 311653
nscannedObjectsAllPlans: 190046
nscannedAllPlans: 311653
scanAndOrder: false
nYields: 2436
nChunkSkips: 0
millis: 5196
server: piro-pc:27017
filterSet: false
stats: { "type" : "KEEP_MUTATIONS", "works" : 311655, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 73219, "needTime" : 238435, "needFetch" : 0, "isEOF" : 1, "children" : [{ "type" : "OR", "works" : 311655, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 73219, "needTime" : 238435, "needFetch" : 0, "isEOF" : 1, "dupsTested" : 80331, "dupsDropped" : 7112, "locsForgotten" : 0, "matchTested_0" : 0, "matchTested_1" : 0, "children" : [{ "type" : "FETCH", "works" : 100001, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 10003, "needTime" : 89997, "needFetch" : 0, "isEOF" : 1, "alreadyHasObj" : 0, "forcedFetches" : 0, "matchTested" : 10003, "children" : [{ "type" : "IXSCAN", "works" : 100000, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 100000, "needTime" : 0, "needFetch" : 0, "isEOF" : 1, "keyPattern" : "{ SearchTerms.Key: 1, SearchTerms.Value: 1 }", "boundsVerbose" : "field #0['SearchTerms.Key']: [\"ClientId\", \"ClientId\"], field #1['SearchTerms.Value']: [MinKey, MaxKey]", "isMultiKey" : 1, "yieldMovedCursor" : 0, "dupsTested" : 100000, "dupsDropped" : 0, "seenInvalidated" : 0, "matchTested" : 0, "keysExamined" : 100000, "children" : [] }] }, { "type" : "FETCH", "works" : 211654, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 70328, "needTime" : 141325, "needFetch" : 0, "isEOF" : 1, "alreadyHasObj" : 0, "forcedFetches" : 0, "matchTested" : 70328, "children" : [{ "type" : "IXSCAN", "works" : 211653, "yields" : 2436, "unyields" : 2436, "invalidates" : 0, "advanced" : 90046, "needTime" : 121607, "needFetch" : 0, "isEOF" : 1, "keyPattern" : "{}", "boundsVerbose" : "field #0['SearchTerms.Key']: [\"CustomerName\", \"CustomerName\"], field #1['SearchTerms.Value']: [MinKey, MaxKey]", "isMultiKey" : 1, "yieldMovedCursor" : 0, "dupsTested" : 211653, "dupsDropped" : 121607, "seenInvalidated" : 0, "matchTested" : 0, "keysExamined" : 211653, "children" : [] }] }] }] }
Thanks for posting the explain. Let's address the problems one at a time.
First, I don't think this query does what you think it does / want it to do. Let me show you by example using the mongo shell. Your query, translated into the shell, is
{ "$or" : [
{ "$and" : [
{ "SearchTerms.Key" : "ClientId" },
{ "SearchTerms.Value" : "xxx" }
]},
{ "$and" : [
{ "SearchTerms.Key" : "CustomerName" },
{ "SearchTerms.Value" : "Jan" }
]}
]}
This query finds documents where either some Key has the value "ClientId" and some Value has the value "xxx" or some Key has the value "CustomerName" and some Value the value "Jan". The key and the value don't need to be part of the same array element. For example, the following document matches your query
{ "SearchTerms" : [
{ "Key" : "ClientId", "Value" : 691 },
{ "Key" : "banana", "Value" : "xxx" }
]
}
I'm guessing your desired behavior is to match exactly the documents that contain the Key and Value in the same array element. The $elemMatch operator is the tool for the job:
{ "$or" : [
{ "SearchTerms" : { "$elemMatch" : { "Key" : "ClientId", "Value" : "xxx" } } },
{ "SearchTerms" : { "$elemMatch" : { "Key" : "CustomerName", "Value" : "Jan" } } }
]}
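As a bonus, with $elemMatch the planner can build tight bounds on both SearchTerms.Key and SearchTerms.Value, instead of the [MinKey, MaxKey] bounds on Value visible in your explain output; you can verify this in the shell:
db.captures.find({ SearchTerms : { $elemMatch : { Key : "ClientId", Value : "xxx" } } }).explain()
// indexBounds should now constrain SearchTerms.Value as well, not just SearchTerms.Key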
Second, I don't think this schema is what you are looking for. You don't describe your use case so I can't be confident, but the situation described in that blog post is a very rare situation where you need to store and search on arbitrary key-value pairs that can change from one document to the next. This is like letting users put in custom metadata. Almost no applications want or need to do this. It looks like your application is storing information about customers, probably for an internal system. You should be able to define a data model for your customers that looks like
{
"CustomerId" : 1234,
"CustomerName" : "Jan",
"ClientId" : "xpj1234",
...
}
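With a flat model like that, the same lookup becomes a pair of ordinary indexed clauses; a minimal sketch in the shell, assuming the field names above:
db.captures.ensureIndex({ ClientId : 1 })
db.captures.ensureIndex({ CustomerName : 1 })
db.captures.find({ $or : [ { ClientId : "xpj1234" }, { CustomerName : "Jan" } ] })
// $or can use a separate index for each clause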
This will simplify and improve things dramatically. I think the wires got crossed here because sometimes people call MongoDB "schemaless" and the blog post talks about "schemaless" documents. The blog post really is talking about schemaless documents where you don't know what is going to go in there. Most applications should know pretty much exactly what the general structure of the documents in a collection will be.
Finally, I think on the basis of this we can disregard the issue with the slow query for now. Feel free to ask another question or edit this one with extra explanation if you need more help or if the problem doesn't go away once you've taken into account what I've said here.
1) Please take a look at the mongodb log file and see what query gets generated against the database.
2) Enter that query into the mongo shell and add ".explain()" at the end, and see whether your index is actually being used (does it say BasicCursor or BtreeCursor?).
3) If your index is used, what's the value of the "nscanned" attribute? Perhaps your index does not have enough "value diversity" in it.