MongoDB not using even the simplest index - mongodb

Please look at the following example. It seems to me that the query should be covered by the index {a: 1}, but explain() reports indexOnly: false. What am I doing wrong?
> db.foo.save({a: 1, b: 2});
> db.foo.save({a: 2, b: 3});
> db.foo.ensureIndex({a: 1});
> db.foo.find({a: 1}).explain();
{
"cursor" : "BtreeCursor a_1",
"nscanned" : 6,
"nscannedObjects" : 6,
"n" : 6,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"a" : [
[
1,
1
]
]
}
}

Index only denotes a covered query ( http://docs.mongodb.org/manual/applications/indexes/#indexes-covered-queries ) whereby the query and its sort and data can all be found within a single index.
The problem with your query:
db.foo.find({a: 1}).explain();
is that it must return the full document, so it cannot get all of its data from the index. Instead you can use:
db.foo.find({a: 1}, {_id:0,a:1}).explain();
This projects only the a field (and excludes _id, which is returned by default and is not part of the index), so the entire query fits into the index and indexOnly becomes true.
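To make the rule concrete, here is a minimal sketch in plain JavaScript of the check described above. It is a simplified model, not MongoDB's actual planner: the helper name isCovered and its arguments are hypothetical, and it only understands inclusion-style projections on top-level fields.

```javascript
// Simplified model of the covered-query rule (hypothetical helper, not a MongoDB API).
// indexKeys:   field names in the index, e.g. ["a"]
// queryFields: field names used in the filter
// projection:  the projection document passed to find(), or undefined
function isCovered(indexKeys, queryFields, projection) {
  // No projection means the whole document is returned, so it must be fetched.
  if (!projection) return false;

  // Fields explicitly included (other than _id).
  const included = Object.keys(projection).filter(
    (f) => f !== "_id" && projection[f]
  );
  // Exclusion-only projections are not modelled here; treat them as not covered.
  if (included.length === 0) return false;

  // _id is returned by default unless it is explicitly excluded.
  const idExcluded = projection._id === 0 || projection._id === false;
  const returned = idExcluded ? included : [...included, "_id"];

  const inIndex = (f) => indexKeys.includes(f);
  return queryFields.every(inIndex) && returned.every(inIndex);
}

console.log(isCovered(["a"], ["a"], undefined));        // find({a: 1})
console.log(isCovered(["a"], ["a"], { a: 1 }));         // _id is still returned
console.log(isCovered(["a"], ["a"], { _id: 0, a: 1 })); // only indexed fields
```

Only the last call reports true in this model, which mirrors why the projection has to exclude _id explicitly.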

Related

mongodb query not using index

I have a index:
{
"sourceName" : 1,
"addedDate" : 1,
"sourceKey" : 1,
"appKey" : 1,
}
But when I try to do
db.myCollection.find({and:[
{sourceName: "mySourceName"},
{addedDate: 1414878162405},
{sourceKey:"mySource Key"},
{appKey: "test"}]
}).explain()
It shows that the cursor is BasicCursor, i.e. it is not using the index:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 500,
"nscanned" : 500,
"nscannedObjectsAllPlans" : 500,
"nscannedAllPlans" : 500,
"scanAndOrder" : false,
"indexOnly" : false,
...
}
Can anyone please explain why my query is not using the defined index?
Your query object uses and instead of the $and operator, so it's looking for a field named 'and' in your documents that contains your query values.
But you don't need to be using $and anyway, as multiple query terms are implicitly ANDed so you can just do:
db.myCollection.find({
sourceName: "mySourceName",
addedDate: 1414878162405,
sourceKey:"mySource Key",
appKey: "test"
}).explain()
That should be able to use your index just fine.
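As a rough illustration of the difference, here is a toy matcher in plain JavaScript. It is only a sketch of the semantics under simplifying assumptions (top-level fields, equality only); the function name matches is hypothetical and this is not how MongoDB evaluates queries internally.

```javascript
// Toy equality matcher (hypothetical; models only top-level fields and $and).
function matches(doc, query) {
  return Object.keys(query).every((key) => {
    if (key === "$and") {
      // $and takes an array of sub-queries that must all match.
      return query[key].every((sub) => matches(doc, sub));
    }
    // Any other key, including a bare "and", is treated as a field name.
    return JSON.stringify(doc[key]) === JSON.stringify(query[key]);
  });
}

const sample = { sourceName: "mySourceName", appKey: "test" };

// Multiple terms in one query object are implicitly ANDed.
const implicitAnd = matches(sample, { sourceName: "mySourceName", appKey: "test" });

// The explicit operator gives the same result.
const explicitAnd = matches(sample, {
  $and: [{ sourceName: "mySourceName" }, { appKey: "test" }],
});

// Without the $, the query asks for a field literally named "and".
const typoAnd = matches(sample, {
  and: [{ sourceName: "mySourceName" }, { appKey: "test" }],
});

console.log(implicitAnd, explicitAnd, typoAnd);
```

Since no document has a field called and, the last form matches nothing, and no index on sourceName and friends can help with it.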

Why explicit hint provides better performance?

I am a bit confused about how indexes work. I fill up a database with documents with keys a, b, and c, each of which has a random value (except c, which has an incrementing value).
Here is the Python code I used:
from pymongo import MongoClient
from random import Random

r = Random()
client = MongoClient("server")
test_db = client.test
fubar_col = test_db.fubar
# Insert 100000 documents: a and b are random, c increments.
for i in range(100000):
    doc = {'a': r.randint(10000, 99999), 'b': r.randint(100000, 999999), 'c': i}
    # insert() is the legacy call; PyMongo 3+ uses insert_one(doc) instead.
    fubar_col.insert(doc)
Then I create an index {c: 1}
Now, if I perform
>db.fubar.find({'a': {$lt: 50000}, 'b': {$gt: 500000}}, {a: 1, c: 1}).sort({c: -1}).explain()
I got
{
"cursor" : "BtreeCursor c_1 reverse",
"isMultiKey" : false,
"n" : 24668,
"nscannedObjects" : 100000,
"nscanned" : 100000,
"nscannedObjectsAllPlans" : 100869,
"nscannedAllPlans" : 100869,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 478,
"indexBounds" : {
"c" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "nuclight.org:27017"
}
As you can see, MongoDB uses the c_1 index and the query takes about 478 milliseconds. But if I specify which index I want to use (via hint({c: 1})):
> db.fubar.find({'a': {$lt: 50000}, 'b': {$gt: 500000}}, {a: 1, c: 1}).sort({c: -1}).hint({c:1}).explain()
It takes only about 167 milliseconds. Why does this happen?
Here is a link to a JSON dump of the fubar collection: fubar.tgz
P.S. I performed these queries several times and the results are the same.
explain forces MongoDB to re-evaluate all query plans. In a 'normal' query, the cached fastest query plan will be used. From the documentation (emphasis mine):
The explain() operation evaluates the set of query plans and reports
on the winning plan for the query. In normal operations the query
optimizer caches winning query plans and uses them for similar related
queries in the future. As a result MongoDB may sometimes select query
plans from the cache that are different from the plan displayed using
explain().
Unless you really need to iterate the entire result set for a typical query, you might want to include limit() in your query. In your particular example, with limit(100) and no hint, explain() reports a BasicCursor rather than the index (the first output below is the hinted query, the second the unhinted one):
> db.fubar.find({'a': {$lt: 50000}, 'b': {$gt: 500000}}).sort({c: -1}).hint({c:1}).limit(100).explain();
{
"cursor" : "BtreeCursor c_1 reverse",
"n" : 100,
"nscanned" : 432,
"nscannedAllPlans" : 432,
"scanAndOrder" : false,
"millis" : 3,
"indexBounds" : {
"c" : [[{"$maxElement" : 1}, {"$minElement" : 1}]]
},
}
>
> db.fubar.find({'a': {$lt: 50000}, 'b': {$gt: 500000}}).sort({c: -1}).limit(100).explain();
{
"cursor" : "BasicCursor",
"n" : 100,
"nscanned" : 431,
"nscannedAllPlans" : 863,
"scanAndOrder" : true,
"millis" : 12,
"indexBounds" : { },
}
Note that this is a somewhat pathological case, because using the index doesn't help too much (compare nscanned).

mongodb compound index over extending

I have a question regarding compound indexes that I can't seem to find an answer to, or maybe I have just misunderstood.
Let's say I have created a compound index {a:1, b:1, c:1}. According to
http://docs.mongodb.org/manual/core/indexes/#compound-indexes
this should make the following queries fast:
db.test.find({a:"a", b:"b",c:"c"})
db.test.find({a:"a", b:"b"})
db.test.find({a:"a"})
As I understand it, the order of the query is very important, but is it only the order of that explicit subset {a:"a", b:"b",c:"c"} that is important?
Let's say I do a query
db.test.find({d:"d",e:"e",a:"a", b:"b",c:"c"})
or
db.test.find({a:"a", b:"b",c:"c",d:"d",e:"e"})
Will these render that specific compound index useless?
Compound indexes in MongoDB work on a prefix mechanism whereby {a} and {a, b} are considered prefixes, in order, of the compound index {a, b, c}. However, the order of the fields in the query itself does not normally matter.
So let's take your examples:
db.test.find({d:"d",e:"e",a:"a", b:"b",c:"c"})
Will actually use an index:
db.ghghg.find({d:1,e:1,a:1,c:1,b:1}).explain()
{
"cursor" : "BtreeCursor a_1_b_1_c_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"a" : [
[
1,
1
]
],
"b" : [
[
1,
1
]
],
"c" : [
[
1,
1
]
]
},
"server" : "ubuntu:27017"
}
Since a and b are there.
db.test.find({a:"a", b:"b",c:"c",d:"d",e:"e"})
Depends upon the selectivity and cardinality of d and e. It will use the compound index, but whether it will use it effectively enough to give the query decent performance depends heavily upon what's in there.
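The prefix rule from this answer can be sketched in a few lines of plain JavaScript. This is a simplified model for equality queries only: the helper usablePrefix is hypothetical, and it ignores sorts, range bounds, and whatever cost-based choices the real planner makes.

```javascript
// Hypothetical helper: which leading fields of a compound index does a query constrain?
// indexKeys: ordered field names of the index, e.g. ["a", "b", "c"]
// query:     a query document; only its key names are inspected
function usablePrefix(indexKeys, query) {
  const queried = new Set(Object.keys(query));
  const prefix = [];
  for (const key of indexKeys) {
    // The prefix ends at the first index field the query does not mention.
    if (!queried.has(key)) break;
    prefix.push(key);
  }
  return prefix;
}

const index = ["a", "b", "c"];

// Field order inside the query object does not change the result.
console.log(usablePrefix(index, { d: "d", e: "e", a: "a", b: "b", c: "c" }));
console.log(usablePrefix(index, { a: "a", b: "b", c: "c", d: "d", e: "e" }));

// Only the leading field is constrained here.
console.log(usablePrefix(index, { a: "a", c: "c" }));

// No leading field at all, so the prefix is empty.
console.log(usablePrefix(index, { b: "b", c: "c" }));
```

In this model the extra fields d and e do not stop the index from being used; they just have to be checked against the documents the index scan finds.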

Understanding an index on an array of subdocuments

I've been looking into array (multi-key) indexing on MongoDB and I have the following questions that I haven't been able to find much documentation on:
Indexes on an array of subdocuments
So if I have an array field that looks something like:
{field : [
{a : "1"},
{b : "2"},
{c : "3"}
]
}
I am querying only on field.a and field.c individually (not both together), I believe I have a choice between the following alternatives:
db.Collection.ensureIndex({field : 1});
db.Collection.ensureIndex({"field.a" : 1});
db.Collection.ensureIndex({"field.c" : 1});
That is: an index on the entire array; or two indexes on the embedded fields. Now my questions are:
How do you visualize an index on the entire array in option 1 (is it even useful)? What queries is such an index useful for?
Given the querying situation I have described, which of the above two options is better, and why?
You are correct that if you are querying only on the value of a in the field array, both indexes will, in a sense, help you make your query more performant.
However, have a look at the following 3 queries:
> db.zaid.save({field : [{a: 1}, {b: 2}, {c: 3}] });
> db.zaid.ensureIndex({field:1});
> db.zaid.ensureIndex({"field.a":1});
#Query 1
> db.zaid.find({"field.a":1})
{ "_id" : ObjectId("50b4be3403634cff61158dd0"), "field" : [ { "a" : 1 }, { "b" : 2 }, { "c" : 3 } ] }
> db.zaid.find({"field.a":1}).explain();
{
"cursor" : "BtreeCursor field.a_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"field.a" : [
[
1,
1
]
]
}
}
#Query 2
> db.zaid.find({"field.b":1}).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 0,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
#Query 3
> db.zaid.find({"field":{b:1}}).explain();
{
"cursor" : "BtreeCursor field_1",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"field" : [
[
{
"b" : 1
},
{
"b" : 1
}
]
]
}
}
Notice that the second query doesn't use an index, even though you indexed the array, but the third query does. Choosing your indexes based on how you intend to query your data is as important as considering whether the index itself is what you need. In Mongo, the structure of your index can and does make a very large difference to the performance of your queries if you aren't careful. I think that explains your first question.
Your second question is a bit more open ended, but I think the answer, again, lies in how you expect to query your data. If you will only ever be interested in matching on values of "field.a", then index only that field and save room in memory for other indexes which you might need down the road. If, however, you are equally likely to query on any of those items in the array, and you are reasonably certain that the array will not grow without bound, then you should index the full array. (Never index an array that could grow over time to an unbounded size. The index will be unable to index documents once the array reaches 1024 bytes in BSON.) An example of this might be a document for a hand of playing cards which contains an array describing each card in a user's hand. You can index this array without fear of overflowing the index size limit, since a hand could never have more than 52 cards.
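One way to visualize the two kinds of index is to list the keys each one would hold for the sample document. The sketch below is a conceptual model in plain JavaScript, not MongoDB's key generator: the helper indexEntries is hypothetical, it handles at most one level of dotted path, and it simply skips array elements that lack the sub-field (an assumption of this model).

```javascript
// Hypothetical helper: list the keys a multikey index would hold for one document.
// path is either "field" (whole elements) or "field.sub" (one sub-field of each element).
function indexEntries(doc, path) {
  const [arrayField, subField] = path.split(".");
  const value = doc[arrayField];
  // A multikey index gets one entry per array element.
  const elements = Array.isArray(value) ? value : [value];
  const keys = [];
  for (const el of elements) {
    if (subField === undefined) {
      // Index on the whole array: each element (a whole subdocument) is a key.
      keys.push(JSON.stringify(el));
    } else if (typeof el === "object" && el !== null && subField in el) {
      // Index on an embedded field: the key is that field's value.
      keys.push(JSON.stringify(el[subField]));
    }
  }
  return keys;
}

const handDoc = { field: [{ a: 1 }, { b: 2 }, { c: 3 }] };

console.log(indexEntries(handDoc, "field"));   // keys are whole subdocuments
console.log(indexEntries(handDoc, "field.a")); // keys are values of a
```

Seen this way, the index on field can only answer lookups for a whole subdocument such as {field: {b: 1}}, while a query on "field.b" needs keys that only an index on "field.b" would contain. That matches the difference between Query 2 and Query 3 above.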

search time with index > without index

I have one collection "numbers" with 200000 documents of the form {number: i}, for i = 1 to 200000.
Without any index, $gt: 10000 gives nscanned 200000 and takes 115 ms.
With an index on number, $gt: 10000 gives nscanned 189999 and takes 355 ms.
Why does it take more time with the index?
> db.numbers.find({number: {$gt: 10000}}).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 200000,
"nscannedObjects" : 200000,
"n" : 189999,
"millis" : 115,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
> db.numbers.ensureIndex({number: 1})
> db.numbers.find({number: {$gt: 10000}}).explain()
{
"cursor" : "BtreeCursor number_1",
"nscanned" : 189999,
"nscannedObjects" : 189999,
"n" : 189999,
"millis" : 355,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"number" : [
[
10000,
1.7976931348623157e+308
]
]
}
}
In this case, the index doesn't help because your matching result set consists of almost the entire collection. That means it has to load into RAM and traverse most of the index, as well as load into RAM and traverse the documents themselves.
Without the index, it would just do a table scan, inspecting each document and returning if matched.
In cases like this where a query is going to return almost an entire collection, an index may not be helpful.
Adding a .limit() will speed the query up. You can also force the query optimizer to not use the index with .hint():
db.collection.find().hint({$natural:1})
You could also force the query to provide the result values directly from the index itself by limiting the selected fields to only the ones you've indexed. This allows it to avoid the need to load any documents after doing the index scan. Note that _id is returned by default, so it has to be excluded explicitly.
Try this and see if the explain output indicates "indexOnly":true
db.numbers.find({number: {$gt: 10000}}, {_id: 0, number: 1}).explain()
Details here:
http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields#RetrievingaSubsetofFields-CoveredIndexes
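A back-of-the-envelope count shows why the index loses here. The sketch below is a crude counting model in plain JavaScript, not MongoDB's real cost accounting: the helper touches is hypothetical, and it assumes a collection scan reads every document once, an index scan reads one index entry plus one document per match, and a covered query reads index entries only.

```javascript
// Crude counting model (hypothetical): how many items each strategy has to read.
function touches(totalDocs, matchingDocs) {
  return {
    collectionScan: totalDocs,       // every document, once
    indexScan: matchingDocs * 2,     // one index entry + one document per match
    coveredIndexScan: matchingDocs,  // index entries only, no document fetch
  };
}

// Numbers from the question: 200000 documents, 189999 of them match.
const bigRange = touches(200000, 189999);
console.log(bigRange);

// A selective query is where the index pays off.
const smallRange = touches(200000, 100);
console.log(smallRange);
```

Under these assumptions the indexed query reads 379998 items against 200000 for the plain scan, which points the same way as the measured 355 ms versus 115 ms. With only 100 matches the index reads 200 items instead of 200000.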