MongoDB: best schema design for indexing key/value pairs

I'm having some difficulty finding an answer to this question on my own:
What is the best way to store key/value pairs in a MongoDB document AND index them so that multi-criteria searches can be performed?
I have documents stored in a MongoDB database that look like this:
{
  _id : xxxxxxxxxx,
  name : "x1",
  tags : [
    { key : "color", value : "blue" },
    { key : "size", value : "L" },
    { key : "weight", value : 5 }
  ]
}
Each key in the "tags" array is unique (I cannot have more than one color or size per document) and each key is optional (I can have a document without "color" specified).
According to the "Indexes FAQ", I created an index like this:
{
  "tags.key" : 1,
  "tags.value" : 1
}
To query against this index (to return all blue items), I use:
.find({
  tags : { $elemMatch : { "key" : "color", "value" : "blue" } }
})
An explain() shows that the query uses the index. Good!
But now, if I want to perform a multi-criteria search (to return all blue items with size "L"), I use this query:
.find({
  $and : [
    { tags : { $elemMatch : { "key" : "color", "value" : "blue" } } },
    { tags : { $elemMatch : { "key" : "size", "value" : "L" } } }
  ]
})
This query works, but the execution plan shows an IXSCAN that scans more documents than the number returned:
{
"cursor" : "BtreeCursor Keys",
"isMultiKey" : true,
"n" : 22,
"nscannedObjects" : 53,
"nscanned" : 53,
"nscannedObjectsAllPlans" : 53,
"nscannedAllPlans" : 109,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 2,
"indexBounds" : {
"tags.key" : [
[
"color",
"color"
]
],
"tags.value" : [
[
"blue",
"blue"
]
]
},
"server" : "BLABLABLA:27017",
"filterSet" : false,
"stats" : {
"type" : "PROJECTION",
"works" : 55,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 22,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "KEEP_MUTATIONS",
"works" : 55,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 22,
"needTime" : 31,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "FETCH",
"works" : 54,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 22,
"needTime" : 31,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 22,
"children" : [
{
"type" : "IXSCAN",
"works" : 53,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 53,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ tags.key: 1.0, tags.value: 1.0 }",
"boundsVerbose" : "field #0['tags.key']: [\"color\", \"userid\"], field #1['Keys.v']: [\"4189\", \"4189\"]",
"isMultiKey" : 1,
"yieldMovedCursor" : 0,
"dupsTested" : 53,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 53,
"children" : []
}
]
}
]
}
]
}
}
It seems that only the first $elemMatch is used for the IXSCAN index bounds.
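To see why, it helps to simulate what a multikey index does: each array element gets its own index entry, so the bounds can constrain only one $elemMatch clause, and the other clause can only be checked after the document is fetched. A minimal Python sketch of this (my own simulation with invented sample documents, not MongoDB internals):

```python
# Simulate a multikey index on {"tags.key": 1, "tags.value": 1}: each array
# element gets its own index entry pointing back at its document. Invented
# sample documents; doc ids stand in for record locations.
docs = {
    1: {"tags": [{"key": "color", "value": "blue"}, {"key": "size", "value": "L"}]},
    2: {"tags": [{"key": "color", "value": "blue"}, {"key": "size", "value": "M"}]},
    3: {"tags": [{"key": "color", "value": "red"},  {"key": "size", "value": "L"}]},
}

index = sorted(
    (tag["key"], tag["value"], doc_id)
    for doc_id, doc in docs.items()
    for tag in doc["tags"]
)

# IXSCAN: the bounds cover only the first $elemMatch (color == blue) ...
candidates = [doc_id for key, value, doc_id in index
              if (key, value) == ("color", "blue")]

# ... so the second $elemMatch (size == L) can only be checked after FETCH:
def has_tag(doc, key, value):
    return any(t["key"] == key and t["value"] == value for t in doc["tags"])

results = [d for d in candidates if has_tag(docs[d], "size", "L")]
print(len(candidates), len(results))  # 2 1 -- more documents scanned than returned
```

This mirrors the explain output above, where nscanned (53) exceeds n (22) because the size clause is applied only after the fetch.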
So, is there a way to improve this, either by changing the query or by changing the design of my document?
Your help is welcome!
Fred

Related

count slow in mongodb 2.6.8

I have a find query that takes 0.031 s, but the same query with a count takes over 1 s.
I tried different indexes, but it's always the same problem: count is still slow.
Any ideas?
Volume: 1,600,000 documents
My query:
db.books.find(
  {
    "categories" : { $eq : null },
    "theme" : "comics"
  }
)
My index:
{
  "categories" : 1,
  "theme" : 1
}
Explain
{
"cursor" : "BtreeCursor categories_1_theme_1",
"isMultiKey" : false,
"n" : 353912,
"nscannedObjects" : 353912,
"nscanned" : 353912,
"nscannedObjectsAllPlans" : 354821,
"nscannedAllPlans" : 354821,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 2771,
"nChunkSkips" : 0,
"millis" : 1111,
"indexBounds" : {
"theme" : [
[
"comics",
"comics"
]
],
"categories" : [
[
null,
null
]
]
},
"server" : "xxxmongoxxx:27017",
"filterSet" : false,
"stats" : {
"type" : "KEEP_MUTATIONS",
"works" : 353913,
"yields" : 2771,
"unyields" : 2771,
"invalidates" : 0,
"advanced" : 353912,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "FETCH",
"works" : 353913,
"yields" : 2771,
"unyields" : 2771,
"invalidates" : 0,
"advanced" : 353912,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 353912,
"children" : [
{
"type" : "IXSCAN",
"works" : 353913,
"yields" : 2771,
"unyields" : 2771,
"invalidates" : 0,
"advanced" : 353912,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ categories: 1, theme: 1 }",
"isMultiKey" : 0,
"boundsVerbose" : "field #0['categories']: [\"comics\", \"comics\"], field #1['theme']: [null, null]",
"yieldMovedCursor" : 0,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 353912,
"children" : []
}
]
}
]
}
}
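A likely explanation, given the FETCH stage's matchTested : 353912 above: a predicate on null cannot be verified from index keys alone, because a missing field and an explicit null are indexed identically, so the count has to fetch and re-test every candidate document instead of being answered from the index. A small Python sketch of the ambiguity (my own simulation with invented documents, not MongoDB internals):

```python
# Simulation: with the index { categories: 1, theme: 1 }, a query on
# { categories: null } finds its candidates in the index, but a null key is
# ambiguous (missing field vs. explicit null), so every candidate document
# must still be fetched and tested -- one fetch per counted document.
docs = [
    {"theme": "comics", "categories": None},  # explicit null  -> matches
    {"theme": "comics"},                      # missing field  -> also matches
    {"theme": "comics", "categories": "sf"},  # real value     -> does not match
]

# dict.get() returns None for a missing field, mimicking how MongoDB
# indexes both cases as null.
index_keys = [(d.get("categories"), d["theme"]) for d in docs]
candidates = [k for k in index_keys if k == (None, "comics")]

# The count still has to fetch and re-test each candidate document:
fetched = [d for d in docs if d.get("categories") is None]
print(len(candidates), len(fetched))  # 2 2 -- one fetch per counted document
```

At ~354k fetches, that document-by-document work plausibly accounts for the one-second count.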

MongoDB find() query scans documents twice (duplicate cursor used) when using limit() + sort()?

I'm fairly new to MongoDB, and I haven't been able to find an explanation for what I'm seeing.
I have a small dataset of about 200 documents, when I run the following query:
db.tweets.find({user:22438186})
I get n / nscannedObjects / nscanned / nscannedObjectsAllPlans / nscannedAllPlans all at 9. The cursor is BtreeCursor user_1. All good.
Introducing sort()
If I append a sort to the query:
db.tweets.find({user:22438186}).sort({created_at:1})
nscannedObjectsAllPlans / nscannedAllPlans have increased to 30. I can see under the allPlans field:
[
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
},
{
"cursor" : "BtreeCursor created_at_1",
"isMultiKey" : false,
"n" : 2,
"nscannedObjects" : 21,
"nscanned" : 21,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"created_at" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
]
BtreeCursor created_at_1 scanned 21 documents and matched 2? I'm not sure what is going on here, as I thought sort() was applied to the documents returned by find(), which appears to be 9 from the user_1 index. In writing this up, I gather from the allPlans field that it's also using my created_at_1 index for some reason.
Limit(>n) combined with Sort() == duplicate cursor & document scans?
When I append limit(10) or higher, n remains at 9, nscannedObjects / nscanned are both 18, and nscannedObjectsAllPlans / nscannedAllPlans now return 60. Why have all but n doubled? The cursor is now QueryOptimizerCursor, and there is a clauses field in my explain(true) results; both child objects are exactly the same. Was the same cursor used twice, causing the duplication? Is this behaviour normal?
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
}
I've tried a few different limit values and noticed that with a limit of 9, nscannedObjects / nscanned both return to 9 and nscannedObjectsAllPlans / nscannedAllPlans drop to 29, decrementing by 1 as I decrement the limit.
Under clauses, however, the second child object is not the same as for limit queries of 10 and higher. The cursor field now displays BtreeCursor, omitting user_1 for some reason, and all the n fields have a value of 0 instead of 9; besides that, the rest of the object is the same. For all of these limit queries, the allPlans field lists the clauses field and another entry for BtreeCursor created_at_1 (which is used as the cursor for a query with a limit of 1).
Actual Question
So what exactly is causing my documents to be scanned twice when limit() and sort() are both used in a find()? The issue only seems to happen if the limit exceeds either nscannedObjects or nscanned. When querying with only limit() or sort() documents are not scanned twice.
Update
Sorry for the confusion, the first code block shows cursor data under the allPlans field. The actual cursor used was *BtreeCursor user_1*.
The 2nd code block is from a query with limit() and sort(). I am providing cursor data listed under clauses, the clauses field lists the same cursor information twice (duplicate). The actual cursor field for that query was *QueryOptimizerCursor*. The duplicate cursors under clauses are *BtreeCursor user_1*.
I've since added a compound index {user:1, created_at:1}. The n fields now show 9 and the AllPlans fields 18, regardless of the limit() value or its use with sort(). For some reason, under allPlans my original user_1 index is still being run alongside the new compound index. If a limit is applied to the query, then instead of the user_1 index (BtreeCursor user_1) being used, a QueryOptimizerCursor with the two cursors in clauses is used.
I've been looking into this further, and it seems the query planner runs other indexes in parallel and selects the optimal result. I'm not sure whether this 'competition' occurs each time I perform the query or whether the outcome is cached.
db.tweets.find({user:22438186}).sort({created_at:1}).limit(10)
Running the query without the compound index produces the following:
{
"clauses" : [
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
},
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
"n" : 9,
"nscannedObjects" : 18,
"nscanned" : 18,
"nscannedObjectsAllPlans" : 60,
"nscannedAllPlans" : 60,
"scanAndOrder" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"allPlans" : [
{
"clauses" : [
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
},
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
"n" : 9,
"nscannedObjects" : 18,
"nscanned" : 18,
"scanAndOrder" : false,
"nChunkSkips" : 0
},
{
"cursor" : "BtreeCursor created_at_1",
"isMultiKey" : false,
"n" : 3,
"nscannedObjects" : 42,
"nscanned" : 42,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"created_at" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
],
"server" : "HOME-PC:27017",
"filterSet" : false,
"stats" : {
"type" : "KEEP_MUTATIONS",
"works" : 43,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 32,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "OR",
"works" : 42,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 32,
"needFetch" : 0,
"isEOF" : 1,
"dupsTested" : 18,
"dupsDropped" : 9,
"locsForgotten" : 0,
"matchTested_0" : 0,
"matchTested_1" : 0,
"children" : [
{
"type" : "SORT",
"works" : 21,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 10,
"needFetch" : 0,
"isEOF" : 1,
"forcedFetches" : 0,
"memUsage" : 6273,
"memLimit" : 33554432,
"children" : [
{
"type" : "FETCH",
"works" : 10,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 0,
"children" : [
{
"type" : "IXSCAN",
"works" : 10,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ user: 1 }",
"isMultiKey" : 0,
"boundsVerbose" : "field #0['user']: [22438186.0, 22438186.0]",
"yieldMovedCursor" : 0,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 9,
"children" : []
}
]
}
]
},
{
"type" : "SORT",
"works" : 21,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 10,
"needFetch" : 0,
"isEOF" : 1,
"forcedFetches" : 0,
"memUsage" : 6273,
"memLimit" : 33554432,
"children" : [
{
"type" : "FETCH",
"works" : 10,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 0,
"children" : [
{
"type" : "IXSCAN",
"works" : 10,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ user: 1 }",
"isMultiKey" : 0,
"boundsVerbose" : "field #0['user']: [22438186.0, 22438186.0]",
"yieldMovedCursor" : 0,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 9,
"children" : []
}
]
}
]
}
]
}
]
}
}
With the compound index:
{
"cursor" : "BtreeCursor user_1_created_at_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"nscannedObjectsAllPlans" : 18,
"nscannedAllPlans" : 18,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
],
"created_at" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"allPlans" : [
{
"cursor" : "BtreeCursor user_1_created_at_1",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
],
"created_at" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
},
{
"clauses" : [
{
"cursor" : "BtreeCursor user_1",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
},
{
"cursor" : "BtreeCursor ",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"user" : [
[
22438186,
22438186
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
"n" : 0,
"nscannedObjects" : 9,
"nscanned" : 9,
"scanAndOrder" : false,
"nChunkSkips" : 0
}
],
"server" : "HOME-PC:27017",
"filterSet" : false,
"stats" : {
"type" : "LIMIT",
"works" : 11,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "FETCH",
"works" : 11,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 0,
"children" : [
{
"type" : "IXSCAN",
"works" : 10,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 9,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ user: 1, created_at: 1 }",
"isMultiKey" : 0,
"boundsVerbose" : "field #0['user']: [22438186.0, 22438186.0], field #1['created_at']: [MinKey, MaxKey]",
"yieldMovedCursor" : 0,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 9,
"children" : []
}
]
}
]
}
}
Hope that clears up the confusion.
Looking at the explain() plan, you can see that:
db.tweets.find({user:22438186})
uses the user_1 index.
db.tweets.find({user:22438186}).sort({created_at:1}) uses the created_at_1 index.
This indicates that MongoDB has chosen created_at_1 over user_1 because sort operations perform better when they can use an index, and the sort here is on the created_at field. That makes MongoDB ignore the user_1 index and walk the entire created_at_1 index, touching far more documents than necessary.
So we need to define our indexes carefully in these cases. With a compound index on both user and created_at, no full scan occurs: MongoDB chooses the index that supports both the find and the sort operations, which in this case is the compound index.
The MongoDB JIRA has a good explanation of why MongoDB uses the QueryOptimizerCursor.
nscannedObjectsAllPlans / nscannedAllPlans drop down to 29
You should not worry about these two parameters; they represent the combined scans made by all the plans that MongoDB executed in order to select the appropriate index.
nscannedObjectsAllPlans is a number that reflects the total number of
documents scanned for all query plans during the database operation
nscannedAllPlans is a number that reflects the total number of
documents or index entries scanned for all query plans during the
database operation.
These lines are from the docs.
So what exactly is causing my documents to be scanned twice when limit() and sort() are both used in a find()?
As said, the documents are not scanned twice; they are scanned in parallel by two different plans that MongoDB executes to select the appropriate index. If you have two candidate indexes, two plans may be run in parallel, and so on.
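The plan competition can be sketched as follows. This is my own Python simulation with toy numbers mirroring the sort()-only explain above, not MongoDB internals: every candidate plan's scans are added to the AllPlans totals, and a cursor that reads the same clause twice still deduplicates by record id, so n stays correct.

```python
# Toy simulation of pre-3.0 plan ranking (invented numbers mirroring the
# sort()-only explain above -- not MongoDB internals).
matching_docs = set(range(9))     # the 9 tweets for this user

plan_a_scans = list(range(9))     # user_1: scans exactly the 9 matches
plan_b_scans = list(range(21))    # created_at_1: walks the index in sort order

# Every competing plan's scans are added to the AllPlans totals:
nscanned_all_plans = len(plan_a_scans) + len(plan_b_scans)

# A QueryOptimizerCursor that lists the same clause twice still deduplicates
# results by record id, so n stays correct even though scans double:
seen, n = set(), 0
for doc_id in plan_a_scans + plan_a_scans:  # like dupsTested : 18 above
    if doc_id in matching_docs and doc_id not in seen:
        seen.add(doc_id)                    # like dupsDropped : 9 above
        n += 1

print(nscanned_all_plans, n)  # 30 9
```

The 30 matches the question's nscannedAllPlans for the sort()-only query, and the dedup step matches the dupsTested/dupsDropped counters in the OR stage.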

Why indexOnly==false

I have a collection with index:
{
"UserId" : 1,
"ShareId" : 1,
"ParentId" : 1,
"DeletedDate" : 1
}
If I run this query:
db.Files.find(
  { "UserId" : ObjectId("5450d837f32a1e098c844e2a"),
    "ShareId" : ObjectId("5450d879f32a1e098c844e94"),
    "ParentId" : ObjectId("5450d8af6a092a0b74a44026"),
    "DeletedDate" : null },
  { _id : 0, ShareId : 1 }
).explain()
the output says "indexOnly" : false:
{
"cursor" : "BtreeCursor UserId_1_ShareId_1_ParentId_1_DeletedDate_1",
"isMultiKey" : false,
"n" : 2120,
"nscannedObjects" : 2120,
"nscanned" : 2120,
"nscannedObjectsAllPlans" : 2318,
"nscannedAllPlans" : 2320,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 21,
"nChunkSkips" : 0,
"millis" : 42,
"indexBounds" : {
"UserId" : [
[
ObjectId("5450d837f32a1e098c844e2a"),
ObjectId("5450d837f32a1e098c844e2a")
]
],
"ShareId" : [
[
ObjectId("5450d879f32a1e098c844e94"),
ObjectId("5450d879f32a1e098c844e94")
]
],
"ParentId" : [
[
ObjectId("5450d8af6a092a0b74a44026"),
ObjectId("5450d8af6a092a0b74a44026")
]
],
"DeletedDate" : [
[
null,
null
]
]
},
"server" : "mongowecntprod:27017",
"filterSet" : false,
"stats" : {
"type" : "PROJECTION",
"works" : 2124,
"yields" : 21,
"unyields" : 21,
"invalidates" : 0,
"advanced" : 2120,
"needTime" : 0,
"needFetch" : 2,
"isEOF" : 1,
"children" : [
{
"type" : "KEEP_MUTATIONS",
"works" : 2124,
"yields" : 21,
"unyields" : 21,
"invalidates" : 0,
"advanced" : 2120,
"needTime" : 1,
"needFetch" : 2,
"isEOF" : 1,
"children" : [
{
"type" : "FETCH",
"works" : 2124,
"yields" : 21,
"unyields" : 21,
"invalidates" : 0,
"advanced" : 2120,
"needTime" : 1,
"needFetch" : 2,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 2120,
"children" : [
{
"type" : "IXSCAN",
"works" : 2121,
"yields" : 21,
"unyields" : 21,
"invalidates" : 0,
"advanced" : 2120,
"needTime" : 1,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ UserId: 1, ShareId: 1, ParentId: 1, DeletedDate: 1 }",
"isMultiKey" : 0,
"boundsVerbose" : "field #0['UserId']: [ObjectId('5450d837f32a1e098c844e2a'), ObjectId('5450d837f32a1e098c844e2a')], field #1['ShareId']: [ObjectId('5450d879f32a1e098c844e94'), ObjectId('5450d879f32a1e098c844e94')], field #2['ParentId']: [ObjectId('5450d8af6a092a0b74a44026'), ObjectId('5450d8af6a092a0b74a44026')], field #3['DeletedDate']: [null, null]",
"yieldMovedCursor" : 0,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 2120,
"children" : []
}
]
}
]
}
]
}
}
But if I run the query without DeletedDate:
db.Files.find(
  { "UserId" : ObjectId("5450d837f32a1e098c844e2a"),
    "ShareId" : ObjectId("5450d879f32a1e098c844e94"),
    "ParentId" : ObjectId("5450d8af6a092a0b74a44026") },
  { _id : 0, ShareId : 1 }
).explain()
then "indexOnly" is true.
How can I change the first query to make indexOnly true?
Let me give you a simple example that will hopefully demonstrate what you're seeing when you are querying for a field being null:
db.nullexplain.find()
{ "_id" : ObjectId("5456759f51a9d5271dc55bba"), "a" : 1 }
{ "_id" : ObjectId("545675a251a9d5271dc55bbb"), "a" : null }
{ "_id" : ObjectId("545675a551a9d5271dc55bbc") }
db.nullexplain.ensureIndex({a:1})
db.nullexplain.find({a:1}).count()
1
db.nullexplain.find({a:null}).count()
2
Do you see the issue? When "a" is present and explicitly set to null, it's indexed as null.
When "a" is not present in the document, it's also indexed as null.
When you query:
db.nullexplain.find({a:null},{_id:0,a:1})
{ "a" : null }
{ }
How can we derive from the index only whether the return document should have the field "a" set to null or if the field should not be present at all?
The answer is we cannot and therefore we must examine the document itself.
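The ambiguity can be reproduced with a tiny Python simulation (invented documents, not MongoDB internals): both documents produce the identical index key null, yet their projected results differ, so the projection cannot be computed from the index key alone.

```python
# Both documents index "a" as null, but their projections differ, so a
# projection of {a: 1} cannot be answered from the index key alone.
docs = [
    {"_id": 1, "a": None},  # explicit null  -> projection is {"a": None}
    {"_id": 2},             # missing field  -> projection is {}
]

index_keys = [d.get("a") for d in docs]  # both entries are None (i.e. null)
projections = [{k: d[k] for k in ("a",) if k in d} for d in docs]

assert index_keys[0] == index_keys[1]    # the index cannot tell them apart
assert projections[0] != projections[1]  # yet the results must differ
print(projections)  # [{'a': None}, {}]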
db.nullexplain.find({a:null},{_id:0,a:1}).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 2,
"nscannedObjects" : 3,
"nscanned" : 3,
"nscannedObjectsAllPlans" : 3,
"nscannedAllPlans" : 3,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 3,
"server" : "Asyas-MacBook-Pro.local:27017",
"filterSet" : false
}
Hope this helps you understand why querying for DeletedDate:null has to check the document and cannot be answered from the index.

MongoDb Java driver iterating over cursor very slow

First of all, I googled and searched this forum but found no direct answer to my question, so I decided to ask a new one.
I have a "sensors" collection containing about 20k sensors. My query is very simple:
QueryBuilder qb = new QueryBuilder().and(
    new QueryBuilder().put("q1").lessThan(rb).get(),
    new QueryBuilder().put("q3").greaterThan(ra).get()
);
DBCursor cursor = sensorColl.find(qb.get());

begin = System.currentTimeMillis();
while (cursor.hasNext()) {
    cursor.next();
}
long totalSearchTime = System.currentTimeMillis() - begin;
logger.debug("totalSearchTime: {}", totalSearchTime);
which shows that totalSearchTime is 316647 ms! I repeated this code snippet multiple times and on average it takes a similar time to complete. Here is the explain() for the query:
{
"cursor" : "BtreeCursor q1_1_q3_-1",
"isMultiKey" : false,
"n" : 16905,
"nscannedObjects" : 16905,
"nscanned" : 16905,
"nscannedObjectsAllPlans" : 16905,
"nscannedAllPlans" : 16905,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 132,
"nChunkSkips" : 0,
"millis" : 102,
"indexBounds" : {
"q1" : [
[
-Infinity,
30
]
],
"q3" : [
[
Infinity,
10
]
]
},
"server" : "localhost:27017",
"filterSet" : false,
"stats" : {
"type" : "FETCH",
"works" : 16907,
"yields" : 132,
"unyields" : 132,
"invalidates" : 0,
"advanced" : 16905,
"needTime" : 1,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 0,
"children" : [
{
"type" : "IXSCAN",
"works" : 16906,
"yields" : 132,
"unyields" : 132,
"invalidates" : 0,
"advanced" : 16905,
"needTime" : 1,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ q1: 1.0, q3: -1.0 }",
"isMultiKey" : 0,
"boundsVerbose" : "field #0['q1']: [-inf.0, 30.0), field #1['q3']: [inf.0, 10.0)",
"yieldMovedCursor" : 0,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 16905,
"children" : []
}
]
}
}
And here is the stats()
{
"ns" : "sensor_db.sensors",
"count" : 16999,
"size" : 272697744,
"avgObjSize" : 16041,
"storageSize" : 315080704,
"numExtents" : 15,
"nindexes" : 2,
"lastExtentSize" : 84451328,
"paddingFactor" : 1.006,
"systemFlags" : 0,
"userFlags" : 1,
"totalIndexSize" : 1888656,
"indexSizes" : {
"_id_" : 981120,
"q1_1_q3_-1" : 907536
},
"ok" : 1
}
I run my app on my test system (my laptop) with an Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80 GHz (running at 800 MHz) and 1.5 GB RAM (of which ~700 MB were free at the time the query was run). 58 GB of disk space is free, so that is plenty.
I hope this information is enough for analysis. Thanks a lot for any suggestions!
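One thing worth checking, from the stats() above: the server-side scan took only 102 ms, but iterating the cursor ships every full document to the client. A back-of-the-envelope calculation in Python, using only the numbers quoted above (the Java projection at the end is a hypothetical example, not code from the question):

```python
# Rough numbers taken from the explain()/stats() output quoted above.
n_docs = 16905        # n / nscanned from explain()
avg_obj_size = 16041  # avgObjSize in bytes, from stats()

transferred_mb = n_docs * avg_obj_size / (1024 * 1024)
print(round(transferred_mb))  # roughly 259 MB pulled over the cursor

# With ~700 MB of free RAM and a 315 MB storageSize, much of this data is
# read from a laptop disk and then shipped to the client, which easily
# dominates the 102 ms server-side scan. Projecting only the fields you
# need -- e.g. (hypothetical, legacy Java driver):
#   sensorColl.find(qb.get(), new BasicDBObject("q1", 1).append("q3", 1));
# -- would shrink the transfer dramatically.
```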

MongoDB indexOnly false for covered query with sharding [duplicate]

The collection is a sharded collection over the hashed field.
The following query should definitely be indexOnly, but explain shows otherwise.
db.collection.ensureIndex({ field : "hashed" })
db.collection.ensureIndex({ field : 1, "field2" : 1, "field3" : 1 })
db.collection.find(
  { field : 100 },
  { field : 1, _id : 0 }
)
//.hint({ "field" : 1, "field2" : 1, "field3" : 1 })
//.hint({ "field" : "hashed" })
.explain()
"cursor" : "BtreeCursor field_hashed",
"nscannedObjects" : 1,
"nscanned" : 1,
"indexOnly" : false,
I tried hinting both indexes, but neither generates a covered query.
I would appreciate any help or suggestions.
explain():
{
"clusteredType" : "ParallelSort",
"shards" : {
"repset12" : [
{
"cursor" : "BtreeCursor field_hashed",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"field" : [
[
NumberLong(5346856657151215906),
NumberLong(5346856657151215906)
]
]
},
"server" : "server",
"filterSet" : false,
"stats" : {
"type" : "PROJECTION",
"works" : 3,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 1,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "KEEP_MUTATIONS",
"works" : 3,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 1,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "SHARDING_FILTER",
"works" : 2,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 1,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"chunkSkips" : 0,
"children" : [
{
"type" : "FETCH",
"works" : 1,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 1,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 1,
"children" : [
{
"type" : "IXSCAN",
"works" : 1,
"yields" : 0,
"unyields" : 0,
"invalidates" : 0,
"advanced" : 1,
"needTime" : 0,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ field: \"hashed\" }",
"boundsVerbose" : "field #0['field']: [5346856657151215906, 5346856657151215906]",
"isMultiKey" : 0,
"yieldMovedCursor" : 0,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 1,
"children" : []
}
]
}
]
}
]
}
]
}
}
]
},
"cursor" : "BtreeCursor field_hashed",
"n" : 1,
"nChunkSkips" : 0,
"nYields" : 0,
"nscanned" : 1,
"nscannedAllPlans" : 2,
"nscannedObjects" : 1,
"nscannedObjectsAllPlans" : 2,
"millisShardTotal" : 0,
"millisShardAvg" : 0,
"numQueries" : 1,
"numShards" : 1,
"indexBounds" : {
"field" : [
[
NumberLong(5346856657151215906),
NumberLong(5346856657151215906)
]
]
},
"millis" : 1
}
As of MongoDB 2.6, you won't get a fully covered sharded query, because an extra check is needed to confirm that the shard in question actually owns each matching document (see SERVER-5022 in the MongoDB issue tracker).
The SHARDING_FILTER stage (visible in the explain output above) drops documents that are found on a shard but that should not live there according to the sharded cluster metadata.
Documents can exist on more than one shard if:
There is a chunk migration in progress: documents are copied from a donor shard to a destination shard and are not removed from the donor shard until the chunk migration successfully completes.
Documents have been "orphaned" on a shard as a result of a failed migration or incomplete clean up. There is a cleanupOrphaned admin command in MongoDB 2.6 which can be run against a sharded mongod to delete orphaned documents.
This covered query limitation is noted in the Limits: Covered Queries in Sharded Clusters section of the MongoDB documentation but should also be highlighted in the tutorial on Creating Covered Queries. I've raised DOCS-3820 to make this more obvious.
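The ownership check can be sketched like this. This is a simplified Python simulation with invented chunk ranges and documents, not MongoDB internals: because the filter tests the fetched document against the chunk ranges the shard owns, the FETCH stage cannot be skipped, so the query is not covered.

```python
# Simplified sketch: the shard keeps only documents whose shard-key value
# falls in a chunk range it owns, dropping orphans left behind by migrations.
owned_ranges = [(0, 100)]  # this shard owns shard-key values in [0, 100)

docs_on_shard = [
    {"field": 42,  "payload": "ok"},      # owned by this shard
    {"field": 250, "payload": "orphan"},  # left behind by a failed migration
]

def owns(shard_key):
    return any(lo <= shard_key < hi for lo, hi in owned_ranges)

results = [d for d in docs_on_shard if owns(d["field"])]
print([d["payload"] for d in results])  # ['ok']
```

For a hashed shard key the ranges are over hashed values rather than raw ones, but the principle is the same.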