MongoDB Indexing and Projection - mongodb

I have a few questions about MongoDB:
(1) Does indexing help with projection?
(2) I have assigned a collection a number of indexes and tried to run a find with sort, and then use explain, it shows BtreeCursor index on the sorted field.
Could it be that the other indexes have helped in the query part and explain just didn't show it because it shows only the last index that helped the find?
Or explain should show all indexes that assisted in querying, sorting, etc?
Thanks.

Does indexing helps in projection?
I believe the only time it will really help (defined by performance etc) is if the query is "covered": http://docs.mongodb.org/manual/tutorial/create-indexes-to-support-queries/
So for example, if you wanted to query on {d:1, e:2} and get back {_id, t, e}, you would do:
db.t.ensureIndex({d:1 , e:1, _id:1, t:1});
db.t.find({d:1, e:2}, {_id:1, t:1, e:1});
And that query's explain() output would show indexOnly as true meaning that it never loaded documents from disk to return a response.
So yes, indexes can help with projection under certain circumstances.
I have assigned a collection a number of indexes and tried to run a find with sort, and then use explain, it shows BtreeCursor index on the sorted field.
Yes it does.
Could it be that the other indexes have helped in the query part and explain just didn't show it because it shows only the last index that helped the find?
If you are a victim of index intersectioning then you would use an explain(true) to show all index plans that were used.
It is good to note that separate indexes are not used for find and sort with intersectioning, so the answer here is actually no: http://docs.mongodb.org/manual/core/index-intersection/#index-intersection-and-sort

Related

Mongodb searching on array / indexing

I'm using the airbnb sample set and it has a field that looks like:
"amenities": ["TV", "Cable TV", "Wifi"....
So I'm trying to do a case-INsensitive, wildcard search (on one or more values passed in).
Only thing I've found that works is:
{ amenities: { $in: [ /wi/ ] }}
Is that the best way?
So I ran it in Compass as the dataset was imported (5600 docs), and the Explain says it took ~20ms on my machine and warned there was no index. I then created an index on the amenities column and the same search jumped up to ~100ms. I just created the index through the Compass UI, so not sure why its taking 5x as long with an index? Or if there is a better way to do this?
The way to run that query is:
{ amenities: /wi/i }
//better but not always useful
{ amenities: /wi/i }, { amenities:1, _id:0 }
It already traverses the array, and to be case insensitive it must be on the options.
For multikey indexes the second query won't be a covered query. Otherwise, it would be blazing fast.
I've tested a similar search with and without index though. Exec. time is reduced 10X. (1500ms to 150ms, in a huge collection). Measure with Mongo Hacker.
As you report executionTimeMilliseconds is not that different. But still smaller.
The reason why you don't see a huge decrease in time is because the index stores each array entry separately. When it finds a match, it comes back to collection to fetch the whole array field, instead of using the indexes.
Probably indexes aren't very useful for arrays.
When querying with an unanchored regex, the query executor will have to scan every index key to see if there is a match.
You might find a collated index to be helpful.
Create an index with the appropriate collation, like:
(strength 1 and 2 are case-insensitive)
db.collection.createIndex({amenities:1},{collation:{locale:"en",strength:1}})
Then query using the same collation:
db.collection.find({amenities:"wifi"}).collation({locale:"en",strength:1})
The search will be case insensitive, and it can efficiently use the index.

MongoDB: Indexes, Sorting

After having read the official documentations on indexes, sort, intersection, i'm a little bit confuse on how everything work together.
I've trouble making my query use the indexes i've created. I work on a mongodb 3.0.3, on a collection having ~4millions of document.
To simplify, let's say my document is composed of 6 fields:
{
a:<text>,
b:<boolean>,
c:<text>,
d:<boolean>,
e:<date>,
f:<date>
}
The query I want to achieve is the following :
db.mycoll.find({ a:"OK", b:true, c:"ProviderA", d:true, e:{ $gte:ISODate("2016-10-28T12:00:01Z"),$lt:ISODate("2016-10-28T12:00:02") } }).sort({f:1});
So intuitively I've created two indexes
db.mycoll.createIndex({a: 1, b: 1, c: 1, d:1, e:1 }, {background: true,name: "test1"})
db.mycoll.createIndex({f:1}, {background: true,name: "test2"})
But the explain() give me that the first index is not used at all.
I known there is some kind of limitation when there is ranges in play in the filter (in the e field), but I can't find my way around it.
Also instead of having a single index on f, I try a compound index on {e:1,f:1} but it didn't change anything.
So What I have misunderstood?
Thanks for your support.
Update: also I find some time the following predicate for mongodb 2.6 :
A good rule of thumb for queries with sort is to order the indexed fields in this order:
First, the field(s) on which you will query for exact values.
Second, the field(s) on which you will sort.
Finally, field(s) on which you will query for a range of values (e.g., $gt, $lt, $in)
An example of using this rule of thumb is in the section on “Sorting the results of a complex query on a range of values” below, including a link to further reading.
Does this also apply for 3.X version?
Update 2: following above predicate, I created the following index
db.mycoll.createIndex({a: 1, b: 1, c: 1, d:1 , f:1, e:1}, {background: true,name: "test1"})
And for the same query :
db.mycoll.find({ a:"OK", b:true, c:"ProviderA", d:true, e:{ $gte:ISODate("2016-10-28T12:00:01Z"),$lt:ISODate("2016-10-28T12:00:02") } }).sort({f:1});
the index is indeed used. However too much keys seems to be scan, I may need to find a better order the fields in the query/index.
Mongo acts sometimes a bit strange when it comes to the index selection.
Mongo automagically decides what index to use. The smaller an index is the more likely it is used (especially indexes with only one field) - this is my experience. May be this happens because it is more often already loaded in RAM? To find out what index to use when Mongo performs test queries when it is idle. However the result is sometimes unexpected.
Therefore if you know what index to use you can force a query to use a specific index using the $hint option. You should try that.
Your two indexes used in the query and the sort does not overlap so MongoDB can not use them for index intersection:
Index intersection does not apply when the sort() operation requires an index completely separate from the query predicate.

How do i create an index in mongodb on a WHERE and ORDER query?

In mongo, When creating an index I am trying to figure out whether the following query would have an index on a) category_ids and status, OR b) category_ids, status and name???
Source.where(category_ids: [1,2,3], status: Status::ACTIVE).order_by(:name) # ((Ruby/Mongoid code))
Essentially, I am trying to figure out whether indexes should include the ORDER_BY columns? or only the WHERE clauses? Where could I read some more about this?
Yes, an index on thius particular query would be beneficial to the speed of the query. However there is one caveat here, the order of the index fields.
I have noticed you are using an $in there on category_ids. This link is particularly useful in understanding a little complexity which exists from using an $in with an index on the sort (or a sort in general in fact): http://blog.mongolab.com/2012/06/cardinal-ins/
Towards the end it gives you an indea of an optimal index order for your type of query:
The order of fields in an index should be:
First, fields on which you will query for exact values.
Second, fields on which you will sort.
Finally, fields on which you will query for a range of values.
For reference a couple of other helpful links are as follows:
http://docs.mongodb.org/manual/applications/indexes/
http://docs.mongodb.org/manual/faq/indexes/#how-do-you-determine-what-fields-to-index
http://jasonwilder.com/blog/2012/02/08/optimizing-mongodb-indexes/
why does direction of index matter in MongoDB?
And, http://www.slideshare.net/kbanker/mongo-indexoptimizationprimer
These will help you get started on optimising your indexes and making them work for your queries.

Mongo: Quick way to find the last n inserts into a collection

I am running a query like the following:
db.sales.find({location: "LLLA",
created_at: {$gt: ISODate('2012-07-27T00:00:00Z')}}).
sort({created_at: -1}).limit(3)
This query is working as expected, but its not performing fast enough. I have an index on the location field, and a separate index on the created_at field. Please let me know if you see me missing something obvious here to make it faster.
Thanks for your time on this.
Mongo won't use two indexes for a given query, if you want filter by location and sort by created_at you can add a compound index:
db.sales.ensureIndex({location: 1, created_at: -1})
You should run explain on your queries when you have issues like that. Sort in particular causes non-intuitive complications with indexing, and the explain command can occasionally make it obvious why.
It's also worth noting that you can usually sort by _id to approximate insert time (assuming auto generated ObjectId values), but that won't help you as much as a compound index will in this case since you're filtering on an additional field.
just use IDs:
db.sales.find({location: "LLLA"}).sort({_id: -1}).limit(3)

how to structure a compound index in mongodb

I need some advice in creating and ordering indexes in mongo.
I have a post collection with 5 properties:
Posts
status
start date
end date
lowerCaseTitle
sortOrder
Almost all the posts will have the same status of 1 and only a handful will have a rejected status. All my queries will filter on status, start and end dates, and sort on sortOrder. I also will have one query that does a regex search on the title.
Should I set up a compound key on {status:1, start:1, end:1, sort:1}? Does it matter which order I put the fields in the compound index - should I put status first in the compound index since it's the most broad? Is it better to do a compound index rather than a single index on each property? Does mongo only use a single index on any given query?
Are there any hints for indexes on lowerCaseTitle if I'm doing a regex query on that?
sample queries are:
db.posts.find({status: {$gte:0}, start: {$lt: today}, end: {$gt: today}}).sort({sortOrder:1})
db.posts.find( {lowerCaseTitle: /japan/, status:{$gte:0}, start: {$lt: today}, end: {$gt: today}}).sort({sortOrder:1})
That's a lot of questions in one post ;) Let me go through them in a practical order :
Every query can use at most one index (with the exception of top level $or clauses and such). This includes any sorting.
Because of the above you will definitely need a compound index for your problem rather than seperate per-field indexes.
Low cardinality fields (so, fields with very few unique values across your dataset) should usually not be in the index since their selectivity is very limited.
Order of the fields in your compound index matter, and so does the relative direction of each field in your compound index (e.g. "{name:1, age:-1}"). There's a lot of documentation about compound indexes and index field directions on mongodb.org so I won't repeat all of it here.
Sorts will only use the index if the sort field is in the index and is the field in the index directly after the last field that was used to select the resultset. In most cases this would be the last field of the index.
So, you should not include status in your index at all since once the index walk has eliminated the vast majority of documents based on higher cardinality fields it will at most have 2-3 documents left in most cases which is hardly optimized by a status index (especially since you mentioned those 2-3 documents are very likely to have the same status anyway).
Now, the last note that's relevant in your case is that when you use range queries (and you are) it'll not use the index for sorting anyway. You can check this by looking at the "scanAndOrder" value of your explain() once you test your query. If that value exists and is true it means it'll sort the resultset in memory (scan and order) rather than use the index directly. This cannot be avoided in your specific case.
So, your index should therefore be :
db.posts.ensureIndex({start:1, end:1})
and your query (order modified for clarity only, query optimizer will run your original query through the same execution path but I prefer putting indexed fields first and in order) :
db.posts.find({start: {$lt: today}, end: {$gt: today}, status: {$gte:0}}).sort({sortOrder:1})