Why is a covered query sometimes slower in MongoDB?

I was under the impression that covered queries were always faster than scanning the collection itself. So why is this covered query slower?
Covered query:
> db.group_panel_responses.find({}, {_id: 0, _panel_id: 1, _group_id: 1, response_count: 1}).hint({_panel_id: 1, _group_id: 1, response_count: -1}).explain()
{
"cursor" : "BtreeCursor _panel_id_1__group_id_1_response_count_-1",
"isMultiKey" : false,
"n" : 20000,
"nscannedObjects" : 0,
"nscanned" : 20000,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 20000,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 156,
"nChunkSkips" : 0,
"millis" : 44,
"indexBounds" : {
"_panel_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"_group_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"response_count" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "DAAVID.local:27017",
"filterSet" : false
}
Same query but without hinting at the index, so not a covered query:
> db.group_panel_responses.find({}, {_id: 0, _panel_id: 1, _group_id: 1, response_count: 1}).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 20000,
"nscannedObjects" : 20000,
"nscanned" : 20000,
"nscannedObjectsAllPlans" : 20000,
"nscannedAllPlans" : 20000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 156,
"nChunkSkips" : 0,
"millis" : 40,
"server" : "DAAVID.local:27017",
"filterSet" : false
}

Keep in mind that a query that is covered by an index is generally faster than a regular query because you can retrieve the fields for each document in one step, as opposed to two in a regular query, where you hit the index to find the location of the document and then hit the collection itself to retrieve the fields.
To oversimplify a bit: in a normal case where my selection criteria retrieve 20000 documents, a regular query would make 40000 accesses (20000 to the index and 20000 to the collection), while a covered query would make only 20000.
In your test case, however, you have no selection criteria, so both queries scan everything: the covered query walks the entire index, and the uncovered query walks the entire collection. In this case you lose almost all of the performance boost that a covered query with selection criteria would give you.
If you really want to test the value of a covered query, I'd use a much larger collection and a much more selective query. If the test you are using is representative of your actual production usage, I would not expect any performance boost at all.
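If it helps, here is a rough sketch of such a test in the mongo shell. The collection and index match the question, but the document count, field values, and padding field are invented for illustration:
// Populate a larger collection (hypothetical data; "payload" just adds bulk).
for (var i = 0; i < 100000; i++) {
    db.group_panel_responses.insert({
        _panel_id: i % 500,
        _group_id: i % 50,
        response_count: i,
        payload: new Array(200).join("x")
    });
}
db.group_panel_responses.ensureIndex({_panel_id: 1, _group_id: 1, response_count: -1});
// Covered and selective: the filter and projection both stay inside the index,
// so only ~200 index entries are read and no documents are fetched.
db.group_panel_responses.find(
    {_panel_id: 42},
    {_id: 0, _panel_id: 1, _group_id: 1, response_count: 1}
).explain()
// Uncovered: same filter, but returning whole documents forces a fetch
// from the collection for every match.
db.group_panel_responses.find({_panel_id: 42}).explain()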

Related

MongoDB: degraded query performance

I have a users collection in MongoDB with over 2.5 million records, which amount to 30 GB. I have about 4 to 6 GB of indexes. It's in a sharded environment with two shards, each consisting of a replica set. The servers are dedicated to Mongo with no other load. Total RAM is over 10 GB, which is more than enough for the kind of queries I am performing (shown below).
My concern is that despite having indexes on the appropriate fields, the time to retrieve results is huge (2 minutes to a whopping 30 minutes), which is not acceptable. I am a newbie to MongoDB and really confused as to why this is happening.
Sample schema is:
user:
{
_id: UUID (indexed by default),
name: string,
dob: ISODate,
addr: string,
createdAt: ISODate (indexed),
.
.
.,
transaction:[
{
firstTransaction: ISODate(indexed),
lastTransaction: ISODate(indexed),
amount: float,
product: string (indexed),
.
.
.
},...
],
other sub documents...
}
The subdocument array length varies from 0 to 50 or so.
The queries I performed are:
1) db.user.find().min({createdAt:ISODate("2014-12-01")}).max({createdAt:ISODate("2014-12-31")}).explain()
This query was slow at first, but then became lightning fast (I guess because of warm-up).
2) db.user.find({transaction:{$elemMatch:{product:'mobile'}}}).explain()
This query took over 30 mins, and warming up didn't help; the performance was the same every time. It returned over half of the collection.
3) db.user.find({transaction:{$elemMatch:{product:'mobile'}}, firstTransaction:{$in:[ISODate("2015-01-01"),ISODate("2015-01-02")]}}).explain()
This is the main query that I want to be performant, but unfortunately it takes more than 30 mins. I tried many versions of it, such as this:
db.user.find({transaction:{$elemMatch:{product:'mobile'}}}).min({transaction:{$elemMatch:{firstTransaction:ISODate("2015-01-01")}}}).max({transaction:{$elemMatch:{firstTransaction:ISODate("2015-01-02")}}}).explain()
This query gave me the error:
planner returned error: unable to find relevant index for max/min query
and with hint():
planner returned error: hint provided does not work with min query
I used the min/max functions because of uncertainty about range queries in MongoDB with the $lt and $gt operators, which sometimes ignore one of the bounds and end up scanning more documents than needed.
I used indexes such as:
db.user.ensureIndex({createdAt: 1})
db.user.ensureIndex({"transaction.firstTransaction":1})
db.user.ensureIndex({"transaction.lastTransaction":1})
db.user.ensureIndex({"transaction.product":1})
I tried to use a compound index for the 3rd query:
db.user.ensureIndex({"transaction.firstTransaction":1, "transaction.product":1})
But this seems to give me no result: the query gets stuck and never returns. I mean it, never, as if deadlocked. I don't know why. So I dropped this index and got the result after waiting for over half an hour (really frustrating).
Please help me out; I am really desperate for a solution and out of ideas.
This output might help:
Query:
db.user.find({transaction:{$elemMatch:{product:"mobile", firstTransaction:{$gte:ISODate("2015-01-01"), $lt:ISODate("2015-01-02")}}}}).hint("transaction.firstTransaction_1_transaction.product_1").explain()
Output:
{
"clusteredType" : "ParallelSort",
"shards" : {
"test0/mrs00.test.com:27017,mrs01.test.com:27017" : [
{
"cursor" : "BtreeCursor transaction.product_1_transaction.firstTransaction_1",
"isMultiKey" : true,
"n" : 622,
"nscannedObjects" : 350931,
"nscanned" : 352000,
"nscannedObjectsAllPlans" : 350931,
"nscannedAllPlans" : 352000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 119503,
"nChunkSkips" : 0,
"millis" : 375693,
"indexBounds" : {
"transaction.product" : [
[
"mobile",
"mobile"
]
],
"transaction.firstTransaction" : [
[
true,
ISODate("2015-01-02T00:00:00Z")
]
]
},
"server" : "ip-12-0-0-31:27017",
"filterSet" : false
}
],
"test1/mrs10.test.com:27017,mrs11.test.com:27017" : [
{
"cursor" : "BtreeCursor transaction.product_1_transaction.firstTransaction_1",
"isMultiKey" : true,
"n" : 547,
"nscannedObjects" : 350984,
"nscanned" : 352028,
"nscannedObjectsAllPlans" : 350984,
"nscannedAllPlans" : 352028,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 132669,
"nChunkSkips" : 0,
"millis" : 891898,
"indexBounds" : {
"transaction.product" : [
[
"mobile",
"mobile"
]
],
"transaction.firstTransaction" : [
[
true,
ISODate("2015-01-02T00:00:00Z")
]
]
},
"server" : "ip-12-0-0-34:27017",
"filterSet" : false
}
]
},
"cursor" : "BtreeCursor transaction.product_1_transaction.firstTransaction_1",
"n" : 1169,
"nChunkSkips" : 0,
"nYields" : 252172,
"nscanned" : 704028,
"nscannedAllPlans" : 704028,
"nscannedObjects" : 701915,
"nscannedObjectsAllPlans" : 701915,
"millisShardTotal" : 1267591,
"millisShardAvg" : 633795,
"numQueries" : 2,
"numShards" : 2,
"millis" : 891910
}
Query:
db.user.find({transaction:{$elemMatch:{product:'mobile'}}}).explain()
Output:
{
"clusteredType" : "ParallelSort",
"shards" : {
"test0/mrs00.test.com:27017,mrs01.test.com:27017" : [
{
"cursor" : "BtreeCursor transaction.product_1",
"isMultiKey" : true,
"n" : 553072,
"nscannedObjects" : 553072,
"nscanned" : 553072,
"nscannedObjectsAllPlans" : 553072,
"nscannedAllPlans" : 553072,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 164888,
"nChunkSkips" : 0,
"millis" : 337909,
"indexBounds" : {
"transaction.product" : [
[
"mobile",
"mobile"
]
]
},
"server" : "ip-12-0-0-31:27017",
"filterSet" : false
}
],
"test1/mrs10.test.com:27017,mrs11.test.com:27017" : [
{
"cursor" : "BtreeCursor transaction.product_1",
"isMultiKey" : true,
"n" : 554176,
"nscannedObjects" : 554176,
"nscanned" : 554176,
"nscannedObjectsAllPlans" : 554176,
"nscannedAllPlans" : 554176,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 107496,
"nChunkSkips" : 0,
"millis" : 327928,
"indexBounds" : {
"transaction.product" : [
[
"mobile",
"mobile"
]
]
},
"server" : "ip-12-0-0-34:27017",
"filterSet" : false
}
]
},
"cursor" : "BtreeCursor transaction.product_1",
"n" : 1107248,
"nChunkSkips" : 0,
"nYields" : 272384,
"nscanned" : 1107248,
"nscannedAllPlans" : 1107248,
"nscannedObjects" : 1107248,
"nscannedObjectsAllPlans" : 1107248,
"millisShardTotal" : 665837,
"millisShardAvg" : 332918,
"numQueries" : 2,
"numShards" : 2,
"millis" : 337952
}
Please let me know if I have missed any of the details.
Thanks.
1st: Your queries are overly complicated; you're using $elemMatch way too often.
2nd: If you can include your shard key in the query, it will drastically improve speed.
I'm going to optimize your queries for you:
db.user.find({
createdAt: {
$gte: ISODate("2014-12-01"),
$lte: ISODate("2014-12-31")
}
}).explain()
db.user.find({
'transaction.product':'mobile'
}).explain()
db.user.find({
'transaction.product':'mobile',
firstTransaction:{
$in:[
ISODate("2015-01-01"),
ISODate("2015-01-02")
]
}
}).explain()
Bottom line: including your shard key every time is a time saver.
It might even save time to loop through your shard keys and make the same query multiple times.
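The question never says what the shard key actually is, so purely as an illustration, suppose it were createdAt; putting it in the filter lets mongos target only the shards that own that range instead of broadcasting to all of them:
// Hypothetical: assumes createdAt is the shard key, which the question
// does not confirm. With the key in the filter, mongos can route the
// query to a subset of shards.
db.user.find({
    'transaction.product': 'mobile',
    createdAt: {
        $gte: ISODate("2014-12-01"),
        $lte: ISODate("2014-12-31")
    }
}).explain()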
The reason for the performance degradation was the large working set. For some queries (mainly range queries) the working set exceeded physical memory and page faults occurred, which degraded performance.
One solution was to apply extra filters to the query to limit the result set, and to perform equality checks instead of range queries (iterating over the range).
Those tweaks worked for me. Hope it helps others too.
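As a concrete sketch of the "equality instead of range" idea, one equality query per day (field names from the question; the day list is made up):
// Iterate over the range: an equality match per day instead of one
// $gte/$lt range, so the multikey index bounds stay tight.
var days = [ISODate("2015-01-01"), ISODate("2015-01-02")];
days.forEach(function (day) {
    db.user.find({
        transaction: {$elemMatch: {product: "mobile", firstTransaction: day}}
    }).forEach(function (doc) {
        // process each matching user here
    });
});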

index for gte, lte and sort in different fields

My query to MongoDB is:
db.records.find({ from_4: { '$lte': 7495 }, to_4: { '$gte': 7495 } }).sort({ from_11: 1 }).skip(60000).limit(100).hint("from_4_1_to_4_-1_from_11_1").explain()
I expect it to use the index from_4_1_to_4_-1_from_11_1:
{
"from_4": 1,
"to_4": -1,
"from_11": 1
}
But I got this error:
error: {
"$err" : "Runner error: Overflow sort stage buffered data usage of 33555322 bytes exceeds internal limit of 33554432 bytes",
"code" : 17144
} at src/mongo/shell/query.js:131
How can I avoid this error?
Maybe I should create another index that better fits my query.
I tried an index with all ascending fields too ...
{
"from_4": 1,
"to_4": 1,
"from_11": 1
}
... but the same error.
P.S. I noticed that when I remove the skip command ...
> db.records.find({ from_4: { '$lte': 7495 }, to_4: { '$gte': 7495 } }).sort({ from_11: 1 }).limit(100).hint("from_4_1_to_4_-1_from_11_1").explain()
... it's OK and I get explain output, but it says that I am not using the index: "indexOnly" : false
{
"clauses" : [
{
"cursor" : "BtreeCursor from_4_1_to_4_-1_from_11_1",
"isMultiKey" : false,
"n" : 100,
"nscannedObjects" : 61868,
"nscanned" : 61918,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"from_4" : [
[
-Infinity,
7495
]
],
"to_4" : [
[
Infinity,
7495
]
],
"from_11" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
},
{
"cursor" : "BtreeCursor ",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"scanAndOrder" : true,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"from_4" : [
[
-Infinity,
7495
]
],
"to_4" : [
[
Infinity,
7495
]
],
"from_11" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
"n" : 100,
"nscannedObjects" : 61868,
"nscanned" : 61918,
"nscannedObjectsAllPlans" : 61868,
"nscannedAllPlans" : 61918,
"scanAndOrder" : false,
"nYields" : 832,
"nChunkSkips" : 0,
"millis" : 508,
"server" : "myMac:27026",
"filterSet" : false
}
P.P.S. I have read the MongoDB tutorial about sort indexes and think that I am doing everything right.
Update
According to @dark_shadow's advice I created 2 more indexes:
db.records.ensureIndex({from_11: 1})
db.records.ensureIndex({from_11: 1, from_4: 1, to_4: 1})
and the index db.records.ensureIndex({from_11: 1}) turns out to be what I need:
db.records.find({ from_4: { '$lte': 7495 }, to_4: { '$gte': 7495 } }).sort({ from_11: 1 }).skip(60000).limit(100).explain()
{
"cursor" : "BtreeCursor from_11_1",
"isMultiKey" : false,
"n" : 100,
"nscannedObjects" : 90154,
"nscanned" : 90155,
"nscannedObjectsAllPlans" : 164328,
"nscannedAllPlans" : 164431,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1284,
"nChunkSkips" : 0,
"millis" : 965,
"indexBounds" : {
"from_11" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "myMac:27025",
"filterSet" : false
}
When you use range queries (and you are), MongoDB doesn't use the index for sorting anyway. You can check this by looking at the "scanAndOrder" value in your explain() output when you test your query. If that value exists and is true, it means the result set is sorted in memory (scan and order) rather than read from the index in sorted order. This is why you are getting the error in your first query.
As the MongoDB documentation says,
For in-memory sorts that do not use an index, the sort() operation is significantly slower. The sort() operation will abort when it uses 32 megabytes of memory.
You can check the value of scanAndOrder for your first query by adding limit(100), which makes the in-memory sort small enough to run.
Your second query works because you used limit, so it only has to sort 100 documents, which can be done in memory.
Why "indexOnly" : false ?
This simply indicates that all the fields you wish to return are not in the index, the BtreeCursor indicates that the index was used for the query (a BasicCursor would mean it had not). For this to be an indexOnly query, you would need to be returning only the those fields in the index (that is: {_id : 0,from_4 :1, to_4:1, from_11 :1 }) in your projection. That would mean that it would never have to touch the data itself and could return everything you need from the index alone. You can check this also using the explain once you have modified your query for returning only mentioned fields.
Now, you will be confused. It uses index or not ? For sorting, it won't use the index but for querying it is using the index. That's the reason you get BtreeCusrsor (you should have seen your index name also in that).
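For reference, the covered form of the query, with the projection described above, would look like this (a sketch, untested against your data):
// Return only the indexed fields so the query can be answered from the index.
db.records.find(
    {from_4: {$lte: 7495}, to_4: {$gte: 7495}},
    {_id: 0, from_4: 1, to_4: 1, from_11: 1}
).sort({from_11: 1}).limit(100).explain()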
Now, to solve your problem, you can either create two indexes:
{
"from_4": 1,
"to_4": 1,
}
{
"from_11" : 1
}
and then see whether it still gives the error, or uses your index for sorting, by carefully observing the scanAndOrder value.
There is one more workaround:
Change the order of the compound index:
{
"from_11": 1,
"from_4": 1,
"to_4": 1
}
I'm not sure about this approach, but it should hopefully work.
Looking at what you are trying to get, you could also sort with {from_11: -1} and use limit(1868).
I hope I have made things a bit clearer now. Please do some testing based on my suggestions. If you face any issues, let me know and we can work on it.
Thanks

MongoDB - cannot get a covered query

So I have an empty database 'tests' and a collection named 'test'.
First I ensured that my index was set correctly.
db.test.ensureIndex({t:1})
db.test.getIndices()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "tests.test"
},
{
"v" : 1,
"key" : {
"t" : 1
},
"name" : "t_1",
"ns" : "tests.test"
}
]
After that I inserted some test records.
db.test.insert({t:1234})
db.test.insert({t:5678})
When I query the DB with the following command and let Mongo explain the results, I get the following output:
db.test.find({t:1234},{_id:0}).explain()
{
"cursor" : "BtreeCursor t_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"t" : [
[
1234,
1234
]
]
},
"server" : "XXXXXX:27017",
"filterSet" : false
}
Can anyone please explain to me why indexOnly is false?
Thanks in advance.
For a covered index query, you need to retrieve only those fields that are in the index:
> db.test.find({ t: 1234 },{ _id: 0, t: 1}).explain()
{
"cursor" : "BtreeCursor t_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 0,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"t" : [
[
1234,
1234
]
]
},
"server" : "ubuntu:27017",
"filterSet" : false
}
Essentially this means that only the index is used to retrieve the data, without the need to go back to the actual document for further information. This can be as many fields as you need (within reason), but they do need to be included within the index and be the only fields that are returned.
Hmm, the reason has not been clearly explained (it actually confused me), so here is my effort.
Essentially, in order for MongoDB to know that said index covers the query, it has to know which fields you want.
If you just say you don't want _id how can it know that * - _id = t without looking?
Here * represents all fields, like it does in SQL.
The answer is that it cannot. That is why you need to provide the full field/select/projection definition (whatever word they use for it), so that MongoDB can know that what you return fits within the index.

MongoDB index not helping query with multikey index

I have a collection of documents with a multikey index defined. However, the performance of the query is pretty poor for just 43K documents. Is ~215ms for this query considered poor? Did I define the index correctly if nscanned is 43902 (which equals the total documents in the collection)?
Document:
{
"_id": {
"$oid": "50f7c95b31e4920008dc75dc"
},
"bank_accounts": [
{
"bank_id": {
"$oid": "50f7c95a31e4920009b5fc5d"
},
"account_id": [
"ff39089358c1e7bcb880d093e70eafdd",
"adaec507c755d6e6cf2984a5a897f1e2"
]
}
],
"created_date": "2013,01,17,09,50,19,274089",
}
Index:
{ "bank_accounts.bank_id" : 1 , "bank_accounts.account_id" : 1}
Query:
db.visitor.find({ "bank_accounts.account_id" : "ff39089358c1e7bcb880d093e70eafdd" , "bank_accounts.bank_id" : ObjectId("50f7c95a31e4920009b5fc5d")}).explain()
Explain:
{
"cursor" : "BtreeCursor bank_accounts.bank_id_1_bank_accounts.account_id_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 43902,
"nscanned" : 43902,
"nscannedObjectsAllPlans" : 43902,
"nscannedAllPlans" : 43902,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 213,
"indexBounds" : {
"bank_accounts.bank_id" : [
[
ObjectId("50f7c95a31e4920009b5fc5d"),
ObjectId("50f7c95a31e4920009b5fc5d")
]
],
"bank_accounts.account_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "Not_Important"
}
I see three factors in play.
First, for application purposes, make sure that $elemMatch isn't a more appropriate query for this use-case. http://docs.mongodb.org/manual/reference/operator/elemMatch/. It seems like it would be bad if the wrong results came back due to multiple subdocuments satisfying the query.
Second, I imagine the high nscanned value can be accounted for by querying on each of the field values independently: .find({"bank_accounts.bank_id": X}) vs. .find({"bank_accounts.account_id": Y}). You may see that nscanned for the full query is about equal to nscanned of the largest subquery. If the index key were being evaluated fully as a range, this would not be expected, but...
Third, the { "bank_accounts.account_id" : [[{"$minElement" : 1},{"$maxElement" : 1}]] } clause of the explain plan shows that no range is being applied to this portion of the key.
Not really sure why, but I suspect it has something to do with account_id's nature (an array within a subdocument within an array). 200ms seems about right for an nscanned that high.
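For reference, the $elemMatch form of the original query (same values as in the question) would be the following; whether it tightens the bounds on account_id in this schema is something you would have to test:
// $elemMatch requires both conditions to hold in the same array element,
// which also lets the planner keep both index bounds together.
db.visitor.find({
    bank_accounts: {
        $elemMatch: {
            bank_id: ObjectId("50f7c95a31e4920009b5fc5d"),
            account_id: "ff39089358c1e7bcb880d093e70eafdd"
        }
    }
}).explain()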
A more performant document organization might be to denormalize the account_id -> bank_id relationship within the subdocument, and store:
{"bank_accounts": [
{
"bank_id": X,
"account_id: Y,
},
{
"bank_id": X,
"account_id: Z,
}
]}
instead of:
{"bank_accounts": [{
"bank_id": X,
"account_id: [Y, Z],
}]}
My tests below show that with this organization, the query optimizer gets back to work and exerts a range on both keys:
> db.accounts.insert({"something": true, "blah": [{ a: "1", b: "2"} ] })
> db.accounts.ensureIndex({"blah.a": 1, "blah.b": 1})
> db.accounts.find({"blah.a": 1, "blah.b": "A RANGE"}).explain()
{
"cursor" : "BtreeCursor blah.a_1_blah.b_1",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 0,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"blah.a" : [
[
1,
1
]
],
"blah.b" : [
[
"A RANGE",
"A RANGE"
]
]
}
}

Improve querying for field existence in MongoDB

I'm in the process of evaluating MongoDB for our customers. Per requirements, we need to associate with some entity ent a variable set of name-value pairs.
db.ent.insert({'a':5775, 'b':'b1'})
db.ent.insert({'c':'its a c', 'b':'b2'})
db.ent.insert({'a':7557, 'c':'its a c'})
After this I need to query ent intensively for the presence of fields:
db.ent.find({'a':{$exists:true}})
db.ent.find({'c':{$exists:false}})
Per MongoDB docs:
$exists is not very efficient even with an index, and esp. with {$exists:true} since it will effectively have to scan all indexed values.
Can the experts here suggest a more efficient way (even with a paradigm shift) to deal quickly with varying name-value pairs?
You can redesign your schema like this:
{
pairs:[
{k: "a", v: 5775},
{k: "b", v: "b1"},
]
}
Then you index the key:
db.ent.ensureIndex({"pairs.k" : 1})
After this you will be able to search by exact match:
db.ent.find({'pairs.k':"a"})
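If you also need to match on the value, a compound index over both fields of the pair is the natural extension (my addition, not part of the original schema proposal):
// $elemMatch keeps k and v bound to the same pair element.
db.ent.ensureIndex({"pairs.k": 1, "pairs.v": 1})
db.ent.find({pairs: {$elemMatch: {k: "a", v: 5775}}}).explain()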
If you go with sparse indexes on your current schema, as proposed by @WesFreeman, you will need to create an index on each key you want to search. This can hurt write performance and won't be acceptable if your keys are not static.
Simply redesign your schema so that the query is indexable. Your use case is in fact analogous to the first example application given in MongoDB: The Definitive Guide.
If you want/need the convenience of result.a, just store the keys somewhere indexable.
Instead of the existing:
db.ent.insert({a:5775, b:'b1'})
do
db.ent.insert({a:5775, b:'b1', index: ['a', 'b']})
That's then an indexable query:
db.ent.find({index: "a"}).explain()
{
"cursor" : "BtreeCursor index_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"index" : [
[
"a",
"a"
]
]
}
}
Or, if you're ever likely to also query by value:
db.ent.insert({
a:5775,
b:'b1',
index: [
{name: 'a', value: 5775},
{name: 'b', value: 'b1'}
]
})
That's also an indexable query:
db.end.find({"index.name": "a"}).explain()
{
"cursor" : "BtreeCursor index.name_",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"index.name" : [
[
"a",
"a"
]
]
}
}
I think a sparse index is the answer to this, although you'll need an index for each field. http://www.mongodb.org/display/DOCS/Indexes#Indexes-SparseIndexes
Sparse indexes should help with $exists:true queries.
Even so, if your field is not really sparse (meaning it's mostly set), it's not going to help you that much.
Update: I guess I'm wrong. It looks like there's still an open issue (https://jira.mongodb.org/browse/SERVER-4187) where $exists doesn't use sparse indexes. However, you can do something like this with find and sort, which does appear to use the sparse index properly:
db.ent.find({}).sort({a:1});
Here's a full demonstration of the difference, using your example values:
> db.ent.insert({'a':5775, 'b':'b1'})
> db.ent.insert({'c':'its a c', 'b':'b2'})
> db.ent.insert({'a':7557, 'c':'its a c'})
> db.ent.ensureIndex({a:1},{sparse:true});
Note that find({}).sort({a:1}) uses the index (BtreeCursor):
> db.ent.find({}).sort({a:1}).explain();
{
"cursor" : "BtreeCursor a_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"a" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
And find({a:{$exists:true}}) does a full scan:
> db.ent.find({a:{$exists:true}}).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
Looks like you can also use .hint({a:1}) to force it to use the index.
> db.ent.find().hint({a:1}).explain();
{
"cursor" : "BtreeCursor a_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"a" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
How about setting the fields that don't exist to null? Then you can query them with {field: {$ne: null}}.
db.ent.insert({'a':5775, 'b':'b1', 'c': null})
db.ent.insert({'a': null, 'b':'b2', 'c':'its a c'})
db.ent.insert({'a':7557, 'b': null, 'c':'its a c'})
db.ent.ensureIndex({"a" : 1})
db.ent.ensureIndex({"b" : 1})
db.ent.ensureIndex({"c" : 1})
db.ent.find({'a':{$ne: null}}).explain()
Here's the output:
{
"cursor" : "BtreeCursor a_1 multi",
"isMultiKey" : false,
"n" : 4,
"nscannedObjects" : 4,
"nscanned" : 5,
"nscannedObjectsAllPlans" : 4,
"nscannedAllPlans" : 5,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"a" : [
[
{
"$minElement" : 1
},
null
],
[
null,
{
"$maxElement" : 1
}
]
]
},
"server" : "my-laptop"
}