MongoDB not using indexes

I have a collection with these indexes:
db.colaboradores.getIndexKeys()
[ { "_id" : 1 }, { "nome" : 1 }, { "sobrenome" : 1 } ]
and a query like
db.colaboradores.find({_id: ObjectId("5040e298914224dca3000006")}).explain();
that works fine with the index:
{
"cursor" : "BtreeCursor _id_",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 0,
}
but when I run:
db.colaboradores.find({nome: /^Administrador/}).explain()
MongoDB does not use the index any more:
{
"cursor" : "BtreeCursor nome_1",
"nscanned" : 10000,
"nscannedObjects" : 10000,
"n" : 10000,
"millis" : 25,
}
any solutions?
Thanks!

The behaviour you're seeing is expected from MongoDB. This is generally true for any query where you are using a compound index -- one with multiple fields.
The rules of thumb are:
If you have an index on {a:1, b:1, c:1}, then the following queries will be able to use the index efficiently:
find(a)
find(a,b)
find(a,b,c)
find(a).sort(a)
find(a).sort(b)
find(a,b).sort(b)
find(a,b).sort(c)
However, the following queries will not be able to take full advantage of the index:
find(b)
find(c)
find(b,c)
find(b,c).sort(a)
The reason is the way that MongoDB creates compound indexes. The indexes are btrees, and the nodes are present in the btree in sorted order, with the left-most field being the major sort, the next field being the secondary sort, and so on.
If you skip the leading member of the index, then the index traversal will have to skip lots of blocks. If that performance is slow, then the query optimizer will choose to use a full-collection scan rather than use the index.
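To make this concrete, here is a minimal sketch in the mongo shell (the collection name and values are made up; the exact plan the optimizer picks will still depend on your data):
> db.things.ensureIndex({a: 1, b: 1, c: 1})
> db.things.find({a: 5, b: 7}).sort({c: 1}).explain()   // "cursor" : "BtreeCursor a_1_b_1_c_1"
> db.things.find({b: 7, c: 9}).explain()                // typically "BasicCursor": no condition on the leading field a, so the index is of little help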
For more information about MongoDB indexes, see this excellent article here:
http://kylebanker.com/blog/2010/09/21/the-joy-of-mongodb-indexes/

It did use an index - you can tell because the cursor was a BtreeCursor. You have a lot (10000) of documents in your collection where 'nome' begins with 'Administrador'.
An explanation of the output:
"cursor" : "Btree_Cursor nome_1" means that the database used an ascending index on "nome" to satisfy the query. If no index were used, the cursor would be "BasicCursor".
"nscanned" : The number of documents that the database had to check ("nscannedObjects" is basically the same thing for this query)
"n" : The number of documents returned. The fact that this is the same as "nscanned" means that the index is efficient - it didn't have to check any documents that didn't match the query.

Related

Indexing MongoDB for quicker find() with sort(), on different fields

I'm running lots of queries of such type:
db.mycollection.find({a:{$gt:10,$lt:100}, b:4}).sort({c:-1, a:-1})
What sort of index should I use to speed it up? I think I'll need to have both {a:1, b:1} and {c:-1, a:-1}, am I right? Or these indexes will somehow interfere with each other at no performance gain?
EDIT: The actual problem for me is that I run many queries in a loop, some of them over small range, others over large range. If I put index on {a:1, b:1}, it selects small chunks very quickly, but when it comes to large range I see an error "too much data for sort() with no index". If, otherwise, I put index on {c:-1, a:-1}, there is no error, but the smaller chunks (and there are more of those) are processed much slower. So, how is it possible to keep the quickness of selection for smaller ranges, but not get error on large amount of data?
If it matters, I run queries through Python's pymongo.
If you had read the documentation you would have seen that using two indexes here would be useless, since MongoDB only uses one index per query (unless it is an $or) until https://jira.mongodb.org/browse/SERVER-3071 is implemented.
Not only that, but when using a compound sort the order of the fields in the index must match the sort order for the index to be used correctly. As for:
Or these indexes will somehow interfere with each other at no performance gain?
If index intersection were implemented, no, they would not interfere; but {a:1,b:1} does not match the sort, and {c:-1,a:-1} is sub-optimal for answering the find(), plus a is not a prefix of that compound index.
So a first iteration of an optimal index would be:
{a:-1,b:1,c:-1}
But this isn't the full story. Since $gt and $lt are range operators, they suffer the same problem with indexes as $in does; this article should provide the answer: http://blog.mongolab.com/2012/06/cardinal-ins/ and I don't really see any reason to repeat its content.
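A rough sketch of trying that first iteration (nothing here is guaranteed optimal for your data; compare the explain() output with your current plan and with whatever the cardinal-ins article leads you to):
> db.mycollection.ensureIndex({a: -1, b: 1, c: -1})
> db.mycollection.find({a: {$gt: 10, $lt: 100}, b: 4}).sort({c: -1, a: -1}).explain()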
Disclaimer: For MongoDB v2.4
Using hint is a nice solution, since it will force the query to use indexes that you chose, so you can optimize the query with different indexes until you are satisfied. The downside is that you are setting your own index per request.
I prefer to set the indexes on the entire collection and let Mongo choose the correct (fastest) index for me, especially for queries that are used repeatedly.
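For example, a hinted version of the query from the question could look like this (assuming an index named c_-1_a_-1_b_1 already exists; hint() accepts either the index name or its key pattern):
> db.mycollection.find({a: {$gt: 10, $lt: 100}, b: 4}).sort({c: -1, a: -1}).hint('c_-1_a_-1_b_1').explain()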
You have two problems in your query:
Never sort on params that are not indexed. You will get the error "too much data for sort() with no index" if the number of documents returned by your .find() is very large; the exact threshold depends on the version of Mongo you use. This means that you must have indexes on A and C in order for your query to work.
Now for the bigger problem. You are performing a range query ($lt and $gt on param A), which does not combine well with the sort: MongoDB only uses one index at a time, and a range on one field plus a sort on another field is hard to serve from a single index. There are several ways to deal with it in your code:
r = range( 11,100 )
db.mycollection.find({a:{$in: r }, b:4}).sort({c:-1, a:-1})
Use only $lt or $gt in your query,
db.mycollection.find({ a: { $lt:100 }, b:4}).sort({c:-1, a:-1})
Get the results and filter them in your python code.
This solution will return more data, so if you have millions of results that are less than A=11, don't use it!
If you choose this option, make sure you use a compound key with A and B.
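A sketch of this option in the shell (the compound key mentioned above; the client-side filtering of the a <= 10 results would live in your Python code):
> db.mycollection.ensureIndex({a: 1, b: 1})
> db.mycollection.find({a: {$lt: 100}, b: 4}).sort({c: -1, a: -1})   // then drop documents with a <= 10 on the client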
Pay attention when using $or in your queries, since $or is less efficiently optimized than $in with its usage of indexes.
If you define an index on {c:-1,a:-1,b:1} it will help, with some considerations.
With this option the index will be fully scanned, but based on the index values only the appropriate documents will be visited, and they will be visited in the right order, so no ordering phase is needed after getting the results. If the index is huge I do not know how it will behave, but I assume that when the result set is small it will be slower, and when the result set is big it will be faster.
About prefix matching: if you hint the index and the lower levels are usable to serve the query, those levels will be used. To demonstrate this behaviour I made a short test.
I prepared test data with:
> db.createCollection('testIndex')
{ "ok" : 1 }
> db.testIndex.ensureIndex({a:1,b:1})
> db.testIndex.ensureIndex({c:-1,a:-1})
> db.testIndex.ensureIndex({c:-1,a:-1,b:1})
> for(var i=1;i++<500;){db.testIndex.insert({a:i,b:4,c:i+5});}
> for(var i=1;i++<500;){db.testIndex.insert({a:i,b:6,c:i+5});}
The result of the query with hint:
> db.testIndex.find({a:{$gt:10,$lt:100}, b:4}).hint('c_-1_a_-1_b_1').sort({c:-1, a:-1}).explain()
{
"cursor" : "BtreeCursor c_-1_a_-1_b_1",
"isMultiKey" : false,
"n" : 89,
"nscannedObjects" : 89,
"nscanned" : 588,
"nscannedObjectsAllPlans" : 89,
"nscannedAllPlans" : 588,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 1,
"indexBounds" : {
"c" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
],
"a" : [
[
100,
10
]
],
"b" : [
[
4,
4
]
]
},
"server" :""
}
Explanation of the output: the whole index is scanned, which is why nscanned is 588 (the number of scanned index entries), while nscannedObjects is the number of scanned documents. So, based on the index, Mongo only reads those documents which match the criteria (the index is partially covering, so to speak). As you can see, scanAndOrder is false, so there is no sorting phase (which implies that if the index is in memory, this will be fast).
Along with the article that others linked (http://blog.mongolab.com/wp-content/uploads/2012/06/IndexVisitation-4.png): you have to put the sort keys first in the index and the query keys after them; if they overlap, you have to include the shared keys in the very same order as they appear in the sorting criteria (while the order does not matter for the query part).
I think it would be better to change the order of the fields in find:
db.mycollection.find({b:4, a:{$gt:10,$lt:100}}).sort({c:-1, a:-1})
and then you add an index
{b:1,a:-1,c:-1}
I tried two different indexes.
One index was created with db.mycollection.ensureIndex({a:1,b:1,c:-1})
and the explain plan was as below:
{
"cursor" : "BtreeCursor a_1_b_1_c_-1",
"nscanned" : 9542,
"nscannedObjects" : 1,
"n" : 1,
"scanAndOrder" : true,
"millis" : 36,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"a" : [
[
3,
10000
]
],
"b" : [
[
4,
4
]
],
"c" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
}
}
and the other index with db.mycollection.ensureIndex({b:1,c:-1,a:-1}):
> db.mycollection.find({a:{$gt:3,$lt:10000},b:4}).sort({c:-1, a:-1}).explain()
{
"cursor" : "BtreeCursor b_1_c_-1_a_-1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 8,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"b" : [
[
4,
4
]
],
"c" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
],
"a" : [
[
10000,
3
]
]
}
}
>
Since you are querying 'a' over a range of values and 'b' for a specific value, I believe the second option is more appropriate: nscanned dropped from 9542 to 1.

nscanned, nscannedObjects and n attribute values are equal in mongo's explain command

I am using MongoDB for our application.
I used the MongoDB profiler by setting the profiling level to 2, did all my application operations, exported all the records from system.profile, and finally set the indexes on the collections based upon the profiler results.
Now when I ran explain on those queries:
Query 1.
db.stocks.find({ symbol: "GOOG", date: "2013-09-13",type: "O",Mini: false, rootsymbol: "GOOG" }).sort( { "price": 1,"call":1} ).explain();
{
"cursor" : "BtreeCursor symbol_1_date_1_type_1_Mini_1_rootsymbol_1_price_1",
"nscanned" : 80,
"nscannedObjects" : 80,
"n" : 80,
"scanAndOrder" : true,
"millis" : 2,
Query 2.
db.stocks.find({ symbol: "NVO" }).explain()
{
"cursor" : "BtreeCursor symbol_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"indexBounds" : {
"symbol" : [
[
"NVO",
"NVO"
]
]
}
}
I was confused by the results, as nscanned, nscannedObjects and n are always equal on all of my queries.
Is it an issue if nscanned, nscannedObjects and n have equal values?
Please let me know if I am missing something or if this is an issue.
The fact that they are all equal means you have good, though not covered, index usage.
I shall break down the results:
"nscanned" : 80,
This is the number of index entries scanned.
"nscannedObjects" : 80,
This is the number of documents loaded after scanning the index; if this figure were much higher than n, you could have bad index usage (it depends on the scenario).
The times nscannedObjects might be lower than nscanned are with a covered index or (http://docs.mongodb.org/manual/reference/method/cursor.explain/#explain.nscannedObjects):
in the case of a multikey index on an array field with duplicate documents.
"n" : 80,
This is the number of documents returned.
However, this does not mean you had a good result for the sort as shown in the first explain by:
"scanAndOrder" : true,
Your first query is good - it is using the index to find the 80 matches that exist. "n" is the number of documents returned; "nscannedObjects" is the number of documents it needed to examine during the query (it could be less than "nscanned" if you have a covered query). "nscanned" is the number of index entries scanned (if an index could be used). Sorting 80 records should be fairly quick; you don't have an index that could be used for the sort (from what you've shown us).
The second query is also great - it uses an index, and finds the one document that matches.
For more details on explain(), see http://docs.mongodb.org/manual/reference/method/cursor.explain/#explain-output-fields-core.

Is there a way to force MongoDB to store a certain index in RAM?

I have a collection with a relatively big index (but smaller than the available RAM). Looking at the performance of find on this collection and the amount of free RAM reported by htop, it seems that Mongo is not storing the full index in RAM. Is there a way to force Mongo to keep this particular index in RAM?
Example query:
> db.barrels.find({"tags":{"$all": ["avi"]}}).explain()
{
"cursor" : "BtreeCursor tags_1",
"nscanned" : 300393,
"nscannedObjects" : 300393,
"n" : 300393,
"millis" : 55299,
"indexBounds" : {
"tags" : [
[
"avi",
"avi"
]
]
}
}
Not all objects are tagged with the "avi" tag:
> db.barrels.find().explain()
{
"cursor" : "BasicCursor",
"nscanned" : 823299,
"nscannedObjects" : 823299,
"n" : 823299,
"millis" : 46270,
"indexBounds" : {
}
}
Without "$all":
db.barrels.find({"tags": ["avi"]}).explain()
{
"cursor" : "BtreeCursor tags_1 multi",
"nscanned" : 300393,
"nscannedObjects" : 300393,
"n" : 0,
"millis" : 43440,
"indexBounds" : {
"tags" : [
[
"avi",
"avi"
],
[
[
"avi"
],
[
"avi"
]
]
]
}
}
Also, this happens when I search for two or more tags (it scans every item as if there were no index):
> db.barrels.find({"tags":{"$all": ["avi","mp3"]}}).explain()
{
"cursor" : "BtreeCursor tags_1",
"nscanned" : 300393,
"nscannedObjects" : 300393,
"n" : 6427,
"millis" : 53774,
"indexBounds" : {
"tags" : [
[
"avi",
"avi"
]
]
}
}
No. MongoDB allows the system to manage what is stored in RAM.
With that said, you should be able to keep the index in RAM by periodically running queries against the index (check out query hinting) to keep its pages from being evicted from memory.
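If you are on MongoDB 2.2 or newer (with the MMAPv1 storage engine), the touch command is another best-effort warm-up: it pre-loads a collection's data and/or index blocks into memory, though the OS remains free to evict those pages later:
> db.runCommand({ touch: "barrels", data: false, index: true })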
Useful References:
Checking Server Memory Usage
Indexing Advice and FAQ
Additionally, Kristina Chodorow provides this excellent answer regarding the relationship between MongoDB Indexes and RAM
UPDATE:
After the update providing the .explain() output, I see the following:
The query is hitting the index.
nscanned is the number of items (docs or index entries) examined.
nscannedObjects is the number of docs scanned
n is the number of docs that match the specified criteria
Your result set is 300393 entries, which is both the number of index entries scanned and the number of matching results.
I may be reading this wrong, but what I'm reading is that all of the items in your collection are valid results. Without knowing your data, it would seem that every item contains the tag "avi". The other thing that this means is that this index is almost useless; indexes provide the most value when they work to narrow the resultant field as much as possible.
From MongoDB's "Indexing Advice and FAQ" page:
Understanding explain's output. There are three main fields to look for when examining the explain command's output:
cursor: the value for cursor can be either BasicCursor or BtreeCursor. The second of these indicates that the given query is using an index.
nscanned: the number of documents scanned.
n: the number of documents returned by the query. You want the value of n to be close to the value of nscanned. What you want to avoid is doing a collection scan, that is, where every document in the collection is accessed. This is the case when nscanned is equal to the number of documents in the collection.
millis: the number of milliseconds required to complete the query. This value is useful for comparing indexing strategies, indexed vs. non-indexed queries, etc.
Is there a way to force mongo to store this particular index in the ram?
Sure, you can walk the index with an index-only query. That will force MongoDB to load every block of the index. But it has to be "index-only", otherwise you will also load all of the associated documents.
The only benefit this will provide is to make some potential future queries faster if those parts of the index are required.
However, if there are parts of the index that are not being accessed by the queries already running, why change this?
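For reference, such an index-only walk can look like the sketch below. Note that a multikey index (such as the one on the array field tags above) cannot be covered, so this only works for indexes on non-array fields; the field name here is hypothetical:
// assumes a scalar (non-array) field "name" indexed with {name: 1}
> db.barrels.find({}, {name: 1, _id: 0}).hint({name: 1}).explain()   // indexOnly should come back true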

MongoDB not using indexes when sorting?

I have a collection with these indexes:
> db.message.getIndexKeys()
[
{
"_id" : 1
},
{
"msgid" : 1
},
{
"keywords" : 1,
"msgid" : 1
}
]
and a query like
db.message.find({'keywords': {'$all': ['apple', 'banana']}}).limit(30).explain()
works fine with the index:
{
"cursor" : "BtreeCursor keywords_1_msgid_1",
"nscanned" : 96,
"nscannedObjects" : 96,
...
}
but when sorting with msgid:
db.message.find({'keywords': {'$all': ['apple', 'banana']}})
.sort({msgid:-1})
.limit(30).explain()
MongoDB does not use the index any more:
{
"cursor" : "BtreeCursor msgid_1 reverse",
"nscanned" : 1784455,
"nscannedObjects" : 1784455,
...
}
any solutions?
Mongo actually is using an index (which you can tell by seeing BtreeCursor in the explain), just not the compound one.
It's important to keep in mind that direction matters when you have a compound index.
Try: db.message.ensureIndex({ keywords: 1, msgid: -1 })
Mongo chooses to use the msgid index in reverse in your example because it's faster to retrieve the results in sorted order and then match in O(n) time than to match the results and then sort in O(n log n) time.
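A quick way to check that suggestion (whether the optimizer then prefers the new index still depends on your data):
> db.message.ensureIndex({keywords: 1, msgid: -1})
> db.message.find({'keywords': {'$all': ['apple', 'banana']}}).sort({msgid: -1}).limit(30).explain()
// look for "cursor" : "BtreeCursor keywords_1_msgid_-1" and a much smaller nscanned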
It is using an index -- the index on msgid. MongoDB chooses an index to use for a query by trying all possible indexes, and using whichever one finishes first. This result is cached for 1,000 queries, or until a certain number of modifications to the collection are made (data changes, new indexes, etc).
You can see all the query plans tried by passing true to explain().
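For example, for the query in this question, the verbose form shows every candidate plan in the allPlans section of the output:
> db.message.find({'keywords': {'$all': ['apple', 'banana']}}).sort({msgid: -1}).limit(30).explain(true)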
For more details, see http://www.mongodb.org/display/DOCS/Query+Optimizer.

Why does Mongo hint make a query run up to 10 times faster?

If I run a Mongo query from the shell with explain(), get the name of the index used, and then run the same query again but with hint() specifying that same index, the "millis" field in the explain plan decreases significantly.
for example
no hint provided:
>>db.event.find({ "type" : "X", "active" : true, "timestamp" : { "$gte" : NumberLong("1317498259000") }, "count" : { "$gte" : 0 } }).limit(3).sort({"timestamp" : -1 }).explain();
{
"cursor" : "BtreeCursor my_super_index",
"nscanned" : 599,
"nscannedObjects" : 587,
"n" : 3,
"millis" : 24,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : { ... }
}
hint provided:
>>db.event.find({ "type" : "X", "active" : true, "timestamp" : { "$gte" : NumberLong("1317498259000") }, "count" : { "$gte" : 0 } }).limit(3).sort({"timestamp" : -1 }).hint("my_super_index").explain();
{
"cursor" : "BtreeCursor my_super_index",
"nscanned" : 599,
"nscannedObjects" : 587,
"n" : 3,
"millis" : 2,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : { ... }
}
The only difference is the "millis" field.
Does anyone know why that is?
UPDATE: "Selecting which index to use" doesn't explain it, because Mongo, as far as I know, selects the index once every X (100?) runs, so it should be as fast as with hint for the next (X-1) runs.
Mongo uses an algorithm to determine which index to use when no hint is provided, and then caches the index used for similar queries for the next 1000 calls.
But whenever you explain a Mongo query it will always run the index selection algorithm, so explain() with hint will always take less time compared with explain() without hint.
A similar question was answered here:
Understanding mongo db explain
Mongo did the same search both times, as you can see from the number of scanned objects. You can also see that the index used was the same (take a look at the "cursor" entry): both already used your my_super_index index.
"hint" only tells Mongo to use that specific index which it already automatically did in the first query.
The second search was simply faster because all the data was probably already in the cache.
I struggled to find the reason for the same thing. I found that when we have lots of indexes, Mongo indeed takes more time than it does with hint. Mongo basically spends a lot of time deciding which index to use. Think of a scenario where you have 40 indexes and you run a query. The first thing Mongo needs to do is work out which index is best suited for that particular query. This implies Mongo needs to scan all the candidate index keys and do some computation on every scan to estimate how the query would perform if that index were used. hint will definitely speed things up, since that index key scanning is saved.
I will tell you how to find out why it's faster:
1) Without an index
It will pull every document into memory to get the result.
2) With an index
If you have a lot of indexes for that collection, it will take the index from the cache memory.
3) With .hint(_index)
It will take the specific index which you have mentioned.
With hint() and without hint(),
run .explain("executionStats") both times.
With hint() you can check the totalKeysExamined value; that value will match totalDocsExamined.
Without hint() you can see that the totalKeysExamined value is greater than totalDocsExamined.
totalDocsExamined will perfectly match the result count most of the time.
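For the query in the earlier question, that comparison would look roughly like the following on MongoDB 3.0+ (where the old nscanned/nscannedObjects fields are replaced by totalKeysExamined/totalDocsExamined under executionStats):
> db.event.find({ "type" : "X", "active" : true, "timestamp" : { "$gte" : NumberLong("1317498259000") }, "count" : { "$gte" : 0 } }).limit(3).sort({"timestamp" : -1}).hint("my_super_index").explain("executionStats")
// compare executionStats.totalKeysExamined and executionStats.totalDocsExamined with and without the hint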