Where do compound indexes in MongoDB come into play?

What advantages do we get from compound indexes? Suppose I have a collection in which I need to index two fields, say key1 and key2. How is that different from having a compound index {key1:1, key2:1}? What is the problem with having two separate indexes? Can't MongoDB make use of two or more indexes to satisfy a query?

As of MongoDB 2.2:
Every query, including update operations, uses one and only one index.
The query optimizer selects the index empirically by occasionally running alternate query plans and by selecting the plan with the best response time for each query type.
An exception to the above rule is $or queries; each clause is executed in parallel and can use a separate index.
For more information see:
Indexing Overview
Query Optimizer
Explain

Related

How does mongodb decide which index to use for a query?

When a certain query is done on a mongodb collection, if there are multiple indexes that can be used to perform the query, how does mongodb choose the index for the query?
For example, in an 'order' collection, if there are two indexes on the fields 'customer' and 'vendor', and a query is issued with both customer and vendor specified, how does MongoDB decide whether to use the customer index or the vendor index?
Is there a way to instruct mongodb to prefer a certain index over another, for a given query?
"When a certain query is done on a mongodb collection, if there are multiple indexes that can be used to perform the query, how does mongodb choose the index for the query?"
You can generate a query plan for the query you are analyzing to see which indexes are used and how they are used. Use the explain method for this, e.g. db.collection.explain().find(). explain takes a parameter with the values "queryPlanner" (the default), "executionStats" and "allPlansExecution". Each of these produces different plan output.
The query optimizer generates plans for all the indexes that could be used for a given query. In your example order collection, the two single-field indexes (one each on the fields customer and vendor) are both candidates for a query whose filter includes both fields. The optimizer runs each candidate plan for a trial period and chooses the best-performing one (determined by factors such as which plan returned the most documents in the least time). It then reports the winning and rejected plans, which you can view in the plan output. You will see one of the indexes in the winning plan and the other in a rejected plan.
MongoDB caches the plans for a given query shape. Query plans are cached so that plans need not be generated and compared against each other every time a query is executed.
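As a rough illustration of reading that output, the sketch below walks an explain document and lists the stages of the winning and rejected plans. The helper names (stagesOf, summarizePlans) and the sample document are hypothetical: the sample is hand-written and heavily abbreviated, and it assumes the classic explain layout with queryPlanner.winningPlan and queryPlanner.rejectedPlans, which newer server versions may nest differently.

```javascript
// Collect stage names (with index names where present) from one plan tree.
function stagesOf(plan) {
  const stages = [];
  const visit = (node) => {
    if (!node) return;
    stages.push(node.indexName ? `${node.stage}(${node.indexName})` : node.stage);
    visit(node.inputStage);                   // single child stage
    (node.inputStages || []).forEach(visit);  // multiple children, e.g. for $or
  };
  visit(plan);
  return stages;
}

// Summarize the winning plan and every rejected plan.
function summarizePlans(explainOutput) {
  const planner = explainOutput.queryPlanner;
  return {
    winning: stagesOf(planner.winningPlan),
    rejected: (planner.rejectedPlans || []).map(stagesOf),
  };
}

// Hand-written, abbreviated sample (not real server output).
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "customer_1" },
    },
    rejectedPlans: [
      { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "vendor_1" } },
    ],
  },
};

console.log(summarizePlans(sampleExplain));
```

In the shell, the input would be the document returned by something like db.order.find({ ... }).explain().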
"Is there a way to instruct mongodb to prefer a certain index over another, for a given query?"
There are a couple of ways to do this:
Force MongoDB to use a specific index using the hint() method.
Set Index Filters to specify which indexes the optimizer will evaluate for a query shape. Note that this setting is not persisted after a server shutdown.
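A minimal sketch of both options follows, using the 'order' collection from the question. The filter values ("c1", "v1") are placeholders invented for illustration; only the shape of the query matters for an index filter. The hint call is shown as a comment because it needs a live shell session.

```javascript
// Option 1 (mongo shell): force a specific index for one query.
//   db.order.find({ customer: "c1", vendor: "v1" }).hint({ customer: 1 })

// Option 2: an index filter for the query shape, expressed as the
// planCacheSetFilter command document. Values here are placeholders.
const setFilterCommand = {
  planCacheSetFilter: "order",              // collection name
  query: { customer: "c1", vendor: "v1" },  // defines the query shape
  indexes: [{ customer: 1 }],               // indexes the optimizer may evaluate
};

// In the shell this would be submitted with: db.runCommand(setFilterCommand)
console.log(JSON.stringify(setFilterCommand));
```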
Their official website states:
MongoDB uses multikey indexes to index the content stored in arrays. If you index a field that holds an array value, MongoDB creates separate index entries for every element of the array. These multikey indexes allow queries to select documents that contain arrays by matching on element or elements of the arrays.
You can check out this article for more information.
For your second question, you can try creating custom indexes for your documents. Check out their documentation on this.

In mongodb, what does it mean for an index to support a query?

I have the following definition about indexes in mongodb:
An index supports a query when the index contains all the fields scanned by the query. The query scans the index and not the collection. Creating indexes that support queries results in greatly increased query performance.
Does it imply that an index is taken into account for a query execution ONLY if it contains ALL the fields requested by the query? So that, for example, if my query is searching for fields (a,b,c) and the only index in the collection was created on (b), it won't be used at all for the execution?
It depends on the query. From the Query Plans page:
For a query, the MongoDB query optimizer chooses and caches the most efficient query plan given the available indexes.
Implicit in that statement is that the query you submit may not be the query that is executed; MongoDB may rewrite your query in multiple ways during the evaluation process. Use cursor.explain() to view the query plans considered by MongoDB and see which was chosen to execute your specific query (and why it was chosen).
The diagram below is from version 4.0 of the Query Plans page but I think it does a good job illustrating the query planner logic.

Optimizing MongoDB indexing (two fields query)

I have two fields scheduledStamp and email in a mongodb collection called inventory.
Having the following jpa query:
fun findAllByScheduledStampAfterAndEmailEquals(scheduledStamp:Long,email:String):List<Inventory>
What is the best way to index this collection?
I want to have as few indexes as possible, avoiding unnecessary ones.
Knowing that:
This collection can have more than million entries (index is needed)
Querying by:
db.inventory.find({ scheduledStamp: {$gt:1594048295294}})
is certain to return only a few entries
Querying by:
db.inventory.find({ email: "abc#gmail.com"})
is certain to return only a few entries
If you need to support queries on email only, an index on email is a must.
If you need to support queries on scheduledStamp only, an index on scheduledStamp is a must.
If you want to query on both, a third index would be required. However, you can create a compound index that covers this query and one of the above queries.
Since MongoDB can use a prefix of a compound index:
You may have indexes on {"email":1} and {"scheduledStamp":1,"email":1}
OR
You may have indexes on {"scheduledStamp":1} and {"email":1,"scheduledStamp":1}
But since you said each of these fields returns few documents:
Just having two indexes, on {"email":1} and {"scheduledStamp":1}, may perform well, if not optimally.
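The prefix rule can be sketched as a small check. This is a deliberately simplified model, not MongoDB's planner: the helper name queryMatchesIndexPrefix is made up for illustration, and it only asks whether the queried fields are exactly the first N keys of the index.

```javascript
// Simplified model of the index-prefix rule (an assumption, not the real planner):
// the queried fields must be exactly the first N keys of the index, in any order.
function queryMatchesIndexPrefix(indexSpec, queryFields) {
  const indexKeys = Object.keys(indexSpec);
  if (queryFields.length === 0 || queryFields.length > indexKeys.length) {
    return false;
  }
  const prefix = indexKeys.slice(0, queryFields.length);
  return queryFields.every((field) => prefix.includes(field));
}

const compound = { email: 1, scheduledStamp: 1 };

console.log(queryMatchesIndexPrefix(compound, ["email"]));                   // true
console.log(queryMatchesIndexPrefix(compound, ["scheduledStamp"]));          // false
console.log(queryMatchesIndexPrefix(compound, ["scheduledStamp", "email"])); // true
```

Under this model, {email: 1, scheduledStamp: 1} serves the email-only query and the two-field query, which is why only the scheduledStamp-only query needs its own index.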

Why does mongodb not use index scan but collection scan with find()?

I am using mongodb 3.2.4
When I execute db.mytable.find().explain(), the winning plan is 'COLLSCAN'.
But when I execute db.mytable.find().hint({_id:1}).explain(), the winning plan is 'IXSCAN'.
So here comes a question: since _id is the default index of a table, why does MongoDB not use this index for the query?
An index can be used when there is a filter or a sort operation, that is, when the fields in the index are used in the filter predicate and/or the sort. In your case, the find method has neither a filter nor a sort, so no index is used, and you can see that in the query plan as a collection scan. This is expected. But when you provide a hint to the find method, the query optimizer tries to use the index, and in your case it did (you see it in the query plan as an IXSCAN). In either case, with or without the hint, the find has to scan all the documents or all the keys in the index.
The _id field has a default unique index, yes, but unless you use the _id field in the query filter predicate or in a sort, the query will not use it (unless you explicitly request the index with a hint). You can verify this with the queries db.mytable.find( { _id: 123 } ) or db.mytable.find( { } ).sort( { _id: -1 } ): the query planner will show an index scan even though you do not specify the hint.
The main purpose of the indexes is to make your queries run fast; it is about query performance. It has to be a query with filter predicate and/or a sort operation to use an index (and the fields used in the filter or sort must be indexed for performance). With the find method, in your case, without any of the two you are just accessing all the documents as they are in the collection and the index is of no use (and the query optimizer shows that in the plan).
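The reasoning above can be condensed into a toy predicate. It is only a sketch under a strong simplifying assumption (an index is a candidate when its leading key appears in the filter or the sort), and the function name indexIsCandidate is hypothetical; the real planner considers far more than this.

```javascript
// Toy model (an assumption, not MongoDB's planner): without a hint, an index is
// a candidate only if its leading key is used in the filter or in the sort.
function indexIsCandidate(indexSpec, filter = {}, sort = {}) {
  const leadingKey = Object.keys(indexSpec)[0];
  return leadingKey in filter || leadingKey in sort;
}

const idIndex = { _id: 1 };

console.log(indexIsCandidate(idIndex));                   // find(): false
console.log(indexIsCandidate(idIndex, { _id: 123 }));     // find({ _id: 123 }): true
console.log(indexIsCandidate(idIndex, {}, { _id: -1 }));  // find({}).sort({ _id: -1 }): true
```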

How do i create an index in mongodb on a WHERE and ORDER query?

In MongoDB, when creating an index, I am trying to figure out whether the following query should have an index on a) category_ids and status, or b) category_ids, status and name?
Source.where(category_ids: [1,2,3], status: Status::ACTIVE).order_by(:name) # ((Ruby/Mongoid code))
Essentially, I am trying to figure out whether indexes should include the ORDER_BY columns? or only the WHERE clauses? Where could I read some more about this?
Yes, an index for this particular query would improve its speed. However, there is one caveat here: the order of the index fields.
I have noticed you are using an $in there on category_ids. This link is particularly useful in understanding a little complexity which exists from using an $in with an index on the sort (or a sort in general in fact): http://blog.mongolab.com/2012/06/cardinal-ins/
Towards the end it gives you an idea of an optimal index order for your type of query:
The order of fields in an index should be:
First, fields on which you will query for exact values.
Second, fields on which you will sort.
Finally, fields on which you will query for a range of values.
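That ordering can be sketched as a helper that assembles an index key document. The function name buildIndexSpec is hypothetical, and updated_at is an invented range field added only to show the third position; the blog post linked above covers where an $in field such as category_ids fits, so it is left out here.

```javascript
// Build an index key document in the quoted order:
// exact-match fields first, then sort fields, then range fields.
function buildIndexSpec({ equality = [], sort = {}, range = [] }) {
  const spec = {};
  for (const field of equality) spec[field] = 1;
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in spec)) spec[field] = direction;
  }
  for (const field of range) {
    if (!(field in spec)) spec[field] = 1;
  }
  return spec;
}

// status and name come from the question; updated_at is a made-up range field.
const spec = buildIndexSpec({
  equality: ["status"],
  sort: { name: 1 },
  range: ["updated_at"],
});

console.log(JSON.stringify(spec)); // {"status":1,"name":1,"updated_at":1}
```

The resulting document is what would be passed to createIndex in the shell.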
For reference a couple of other helpful links are as follows:
http://docs.mongodb.org/manual/applications/indexes/
http://docs.mongodb.org/manual/faq/indexes/#how-do-you-determine-what-fields-to-index
http://jasonwilder.com/blog/2012/02/08/optimizing-mongodb-indexes/
why does direction of index matter in MongoDB?
And, http://www.slideshare.net/kbanker/mongo-indexoptimizationprimer
These will help you get started on optimising your indexes and making them work for your queries.