Optimizing MongoDB indexing (two fields query) - mongodb

I have two fields scheduledStamp and email in a mongodb collection called inventory.
Having the following jpa query:
fun findAllByScheduledStampAfterAndEmailEquals(scheduledStamp:Long,email:String):List<Inventory>
What is the best way to index this collection?
I want to have less indexes as possible, avoiding unnecessary indexes.
Knowing that:
This collection can have more than million entries (index is needed)
Querying by:
db.inventory.find({ scheduledStamp: {$gt:1594048295294}})
for sure results few entries
Querying by:
db.inventory.find({ email: "abc#gmail.com"})
for sure results few entries

If you need to support query only on email : Indexing email is must
If you need to support query only on scheduledStamp: Indexing scheduledStamp is must
If you want of query on both, a third index is required. But you can create a compound index to cover this query and one of the above queries.
Since Mongo follows prefix match for selecting index:
You may have index on {"email":1} and {"scheduledStamp:1","email":1}
OR
You may have index on {"scheduledStamp":1} and {"email:1","scheduledStamp":1}
But since you said these fields return few documents:
Just having 2 indexes on {"email":1} and {"scheduledStamp":1} may perform good if not optimum.

Related

How does mongodb decide which index to use for a query?

When a certain query is done on a mongodb collection, if there are multiple indexes that can be used to perform the query, how does mongodb choose the index for the query?
for an example, in a 'order' collection, if there are two indexes for columns 'customer' and 'vendor', and a query is issued with both customer and vendor specified, how does mongodb decide whether to use the customer index or the vendor index?
Is there a way to instruct mongodb to prefer a certain index over another, for a given query?
When a certain query is done on a mongodb collection, if there are
multiple indexes that can be used to perform the query, how does
mongodb choose the index for the query?
You can generate a query plan for a query you are trying to analyze - see what indexes are used and how they are used. Use the explain method for this; e.g. db.collection.explain().find(). The explain takes a parameter with values "queryPlanner" (the default), "executionStats" and "allPlansExecution". Each of these have different plan output.
The query optimizer generates plans for all the indexes that could be used for a given query. In your example order collection, the two single field indexes (one each for the fields customer and vendor) are possible candidates (for a query filter with both the fields). The optimizer uses each of the plans and executes them for a certain period of time and chooses the best performing candidate (this is determined based upon factors like - which returned most documents in least time, and other factors). Based upon this it will output the winning and rejected plans and these can be viewed in the plan output. You will see one of the indexes in the winning plan and the other in the rejected plan in the output.
MongoDB caches the plans for a given query shape. Query plans are cached so that plans need not be generated and compared against each other every time a query is executed.
Is there a way to instruct mongodb to prefer a certain index over
another, for a given query?
There are couple of ways you can use:
Force MongoDB to use a specific index using the hint() method.
Set Index Filters to specify which indexes the optimizer will evaluate for a query shape. Note that this setting is not persisted after a server shutdown.
Their official website states:
MongoDB uses multikey indexes to index the content stored in arrays. If you index a field that holds an array value, MongoDB creates separate index entries for every element of the array. These multikey indexes allow queries to select documents that contain arrays by matching on element or elements of the arrays.
You can checkout This article for more information
For your second query, you can try creating custom indexes for documents. Checkout their Documentation for the same

MongoDB Indexing a field which may not exist

I have a collection which has an optional field xy_id. About 10% of the documents (out of 500k) does not have this xy_id field.
I have quite a lot of queries to this collection like find({xy_id: <id>}).
I tried indexing it normally (.createIndex({xy_id: 1}, {"background": true})) and it does improve the query speed.
Is this the correct way to index the field in this case? or should I be using a sparse index or another way?
Yes, this is the correct way. The default behaviour of MongoDB is serving well in this case. You can see in the docs that index creation supports an unique flag, which is false by default. All your documents missing the index key will be indexed under a single index entry. Queries can use this index in all cases because all the documents are indexed.
On the other hand, if you use sparse index the documents missing the index key will not be indexed at all. Some operations such as count, sort and other queries will not be able to use the sparse index unless explicitly hinted to do so. If explicitly hinted, you should be okay with incorrect results - the entries not in the index will be omitted in the result. You can read about it here.

Fundamental misunderstanding of MongoDB indices

So, I read the following definition of indexes from [MongoDB Docs][1].
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures that store a small portion of the
collection’s data set in an easy to traverse form. The index stores
the value of a specific field or set of fields, ordered by the value
of the field. The ordering of the index entries supports efficient
equality matches and range-based query operations. In addition,
MongoDB can return sorted results by using the ordering in the index.
I have a sample database with a collection called pets. Pets have the following structure.
{
"_id": ObjectId(123abc123abc)
"name": "My pet's name"
}
I created an index on the name field using the following code.
db.pets.createIndex({"name":1})
What I expect is that the documents in the collection, pets, will be indexed in ascending order based on the name field during queries. The result of this index can potentially reduce the overall query time, especially if a query is strategically structured with available indices in mind. Under that assumption, the following query should return all pets sorted by name in ascending order, but it doesn't.
db.pets.find({},{"_id":0})
Instead, it returns the pets in the order that they were inserted. My conclusion is that I lack a fundamental understanding of how indices work. Can someone please help me to understand?
Yes, it is misunderstanding about how indexes work.
Indexes don't change the output of a query but the way query is processed by the database engine. So db.pets.find({},{"_id":0}) will always return the documents in natural order irrespective of whether there is an index or not.
Indexes will be used only when you make use of them in your query. Thus,
db.pets.find({name : "My pet's name"},{"_id":0}) and db.pets.find({}, {_id : 0}).sort({name : 1}) will use the {name : 1} index.
You should run explain on your queries to check if indexes are being used or not.
You may want to refer the documentation on how indexes work.
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/

Where does compound indexes in mongodb come into play

What are the advantages we get from compound indexes. I mean suppose we have a collection, in which I have to index over 2 fields say key1 and key2. How different is it from having a compound index {key1:1, key2:1}. Whats the problem with having 2 separate indexes. Can't mongodb make use of 2 or more indexes to satisfy a query.
As at MongoDB 2.2:
Every query, including update operations, use one and only one index.
The query optimizer selects the index empirically by occasionally running alternate query plans and by selecting the plan with the best response time for each query type.
An exception to the above rule is $or queries; each clause is executed in parallel and can use a separate index.
For more information see:
Indexing Overview
Query Optimizer
Explain

how to structure a compound index in mongodb

I need some advice in creating and ordering indexes in mongo.
I have a post collection with 5 properties:
Posts
status
start date
end date
lowerCaseTitle
sortOrder
Almost all the posts will have the same status of 1 and only a handful will have a rejected status. All my queries will filter on status, start and end dates, and sort on sortOrder. I also will have one query that does a regex search on the title.
Should I set up a compound key on {status:1, start:1, end:1, sort:1}? Does it matter which order I put the fields in the compound index - should I put status first in the compound index since it's the most broad? Is it better to do a compound index rather than a single index on each property? Does mongo only use a single index on any given query?
Are there any hints for indexes on lowerCaseTitle if I'm doing a regex query on that?
sample queries are:
db.posts.find({status: {$gte:0}, start: {$lt: today}, end: {$gt: today}}).sort({sortOrder:1})
db.posts.find( {lowerCaseTitle: /japan/, status:{$gte:0}, start: {$lt: today}, end: {$gt: today}}).sort({sortOrder:1})
That's a lot of questions in one post ;) Let me go through them in a practical order :
Every query can use at most one index (with the exception of top level $or clauses and such). This includes any sorting.
Because of the above you will definitely need a compound index for your problem rather than seperate per-field indexes.
Low cardinality fields (so, fields with very few unique values across your dataset) should usually not be in the index since their selectivity is very limited.
Order of the fields in your compound index matter, and so does the relative direction of each field in your compound index (e.g. "{name:1, age:-1}"). There's a lot of documentation about compound indexes and index field directions on mongodb.org so I won't repeat all of it here.
Sorts will only use the index if the sort field is in the index and is the field in the index directly after the last field that was used to select the resultset. In most cases this would be the last field of the index.
So, you should not include status in your index at all since once the index walk has eliminated the vast majority of documents based on higher cardinality fields it will at most have 2-3 documents left in most cases which is hardly optimized by a status index (especially since you mentioned those 2-3 documents are very likely to have the same status anyway).
Now, the last note that's relevant in your case is that when you use range queries (and you are) it'll not use the index for sorting anyway. You can check this by looking at the "scanAndOrder" value of your explain() once you test your query. If that value exists and is true it means it'll sort the resultset in memory (scan and order) rather than use the index directly. This cannot be avoided in your specific case.
So, your index should therefore be :
db.posts.ensureIndex({start:1, end:1})
and your query (order modified for clarity only, query optimizer will run your original query through the same execution path but I prefer putting indexed fields first and in order) :
db.posts.find({start: {$lt: today}, end: {$gt: today}, status: {$gte:0}}).sort({sortOrder:1})