MongoDB: Use several single indexes or a compound index

I am a beginner at MongoDB.
I am using version 3.2.
I read in several places that MongoDB can use only one index per query, but the information I found seems a little bit outdated, and I couldn't find anything definitive in the official docs.
I have a collection of ~500M products with this form:
{_id: ObjectId('574d92332a2b10d7618b4575'), title: "A", category_id: ObjectId('574d92332a2b10d7618b4575'), price: 30.23, rating: 5},
{_id: ObjectId('574d92332a2b10d7618b4575'), title: "B", category_id: ObjectId('574d92332a2b10d7618b4575'), price: 20.23, rating: 3},
{_id: ObjectId('574d92332a2b10d7618b4575'), title: "C", category_id: ObjectId('574d92332a2b10d7618b4575'), price: 10.23, rating: 4}
I need to find all products in a category, sorted by rating and then by price, but the end user may also want to sort by price directly.
Every single query will need the category_id to be passed; it is compulsory.
I created 3 indexes: {category_id:1}, {rating:1} and {price:1}.
These queries are fast:
Most expensive products per category
db.products.find({category_id:ObjectId('574d92332a2b10d7618b4575')}).sort({price:-1})
Best products per category
db.products.find({category_id:ObjectId('574d92332a2b10d7618b4575')}).sort({rating:-1})
Worst products per category
db.products.find({category_id:ObjectId('574d92332a2b10d7618b4575')}).sort({rating:1})
But this query is incredibly slow
Best products per category, then cheapest
db.products.find({category_id:ObjectId('574d92332a2b10d7618b4575')}).sort({rating:-1, price:1})
If you were me, which indexes would you create, and why?
I'm starting to think that indexing price and rating alone is pointless, because every query will need the category_id, so maybe my indexes should include category_id. What confuses me is the last paragraph of the official docs about compound indexes.
I already read this whole section on the official page of MongoDB but I can't find an answer to my specific problem.

You should create compound indexes to satisfy your queries, and they should in most cases include your query terms and your sort criteria.
The confusing paragraph that I believe you are referring to is about when there are multiple sort criteria, i.e. a compound sort. When you have a compound sort, both the order and the direction of the index entries do matter. If you're only sorting by a single value, the direction of the index (1 or -1, ascending or descending) does not matter.
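For the slow query in the question (filter on category_id, then sort by rating descending and price ascending), a compound index covering both the filter and the compound sort would look roughly like this (a sketch based on the example data; the relative directions must mirror the sort):
db.products.createIndex({category_id: 1, rating: -1, price: 1})
The exact inverse, {category_id: -1, rating: 1, price: -1}, would serve the same sort, because for a compound sort only the relative directions of the fields matter.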
See this SO question for more details and examples. Another good resource is this Optimizing Compound Indexes blog post.
You might want to consider whether you really need to allow such a compound sort; for your example, it seems more common on most e-commerce sites to sort by either rating or price, but not both.

Use a compound index. Only one index is considered by Mongo at query execution time (unless there is an $or condition); use .explain("executionStats") to see which one.
db.collection.find({your query}).explain("executionStats")
If you execute the above query, you will find the "queryPlanner" object in the result, which contains the winningPlan (the plan/index that was actually used) and rejectedPlans (the candidate plans that were considered but not chosen).
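For example, to check which plan wins for the slow compound-sort query from the question:
db.products.find({category_id: ObjectId('574d92332a2b10d7618b4575')}).sort({rating: -1, price: 1}).explain("executionStats")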

Related

Which MongoDB indexes should be created for different sorting and filtering conditions to improve performance?

I have MongoDB collection with ~100,000,000 records.
On the website, users search for these records with "Refinement search" functionality, where they can filter by multiple criteria:
by country, state, region;
by price range;
by industry;
Also, they can review search results sorted:
by title (asc/desc),
by price (asc/desc),
by bestMatch field.
I need to create indexes to avoid a full scan for any combination above (because users use most of the combinations). Following the Equality-Sort-Range rule for creating indexes, I would have to create a lot of indexes:
all filter combinations × all sortings × all range filters, like the following:
country_title
state_title
region_title
title_price
industry_title
country_title_price
country_industry_title
state_industry_title
...
country_price
state_price
region_price
...
country_bestMatch
state_bestMatch
region_bestMatch
...
In reality, I have more criteria (both equality and range) and more sortings. For example, I have multiple price fields and users can sort by any of those prices, so I would have to create all the filtering indexes for each price field in case the user sorts by that one.
We use MongoDB 4.0.9, only one server yet.
Before I had sorting, it was easier: at least I could have one compound index like country_state_region and always include country & state in the query when searching by region. But with a sort field at the end, I cannot do that anymore - I have to create separate indexes even for location (country/state/region) for every sorting combination.
Also, not all products have a price, so I cannot just sort by the price field. Instead, I have to create two indexes: {hasPrice: -1, price: 1} and {hasPrice: -1, price: -1} (here, hasPrice is -1 so that records with hasPrice=true always come first, regardless of the price sort direction).
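As a minimal illustration, these are the query shapes those two indexes are meant to serve (the empty filter is only for the example):
db.collection.find({}).sort({hasPrice: -1, price: 1})   // priced products first, cheapest first
db.collection.find({}).sort({hasPrice: -1, price: -1})  // priced products first, most expensive first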
Currently, I use NodeJS code to generate indexes similar to the following (that's a simplified example):
const _ = require('lodash'); // used for _.fromPairs below

// getAllCombinationsOf is the asker's helper that yields every subset of the filter fields
for (const filterFields of getAllCombinationsOf(['country', 'state', 'region', 'industry', 'price'])) {
  for (const sortingField of ['name', 'price', 'bestMatch']) {
    // index spec: every filter field ascending, plus the sort field last
    const index = {
      ...(_.fromPairs(filterFields.map(x => [x, 1]))),
      [sortingField]: 1
    };
    await collection.ensureIndex(index); // createIndex() in newer drivers
  }
}
So, the code above generates more than 90 indexes. And in my real task, this number is even higher.
Is it possible somehow to decrease the number of indexes without reducing the query performance?
Thanks!
Firstly, in MongoDB (refer: https://docs.mongodb.com/manual/reference/limits/), a single collection can have no more than 64 indexes. Also, you should never get anywhere near 64 indexes unless the collection sees no writes, or only very few.
Is it possible somehow to decrease the number of indexes without reducing the query performance?
Without sacrificing either functionality or query performance, you can't.
A few things you can do (assuming you are using pagination to show results):
Create a separate (not compound) index on each field and let the MongoDB query planner choose an index based on the meta-information (cardinality, etc.) it has. Of course, there will be a performance hit.
Based on your judgment and some analytics, create compound indexes only for the combinations that are used most frequently.
Most important - while creating compound indexes you can leave off the sort column. Say you are filtering on industry and sorting on price. If you have a compound index (industry, price) then everything will work fine. But if you have an index only on industry (assuming paginated results), the query will be quite fast for the first few pages but will keep degrading as you move on to later pages. Generally, users don't navigate past 5-6 pages. Also, keep in mind that for larger skip values the query will start to fail because of the 32MB memory limit for in-memory sorting. This can be overcome by using aggregation (instead of a plain find) with allowDiskUse enabled.
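A rough sketch of that aggregation fallback, assuming an industry filter and a price sort (the field values and page size are illustrative):
db.collection.aggregate(
  [
    { $match: { industry: "software" } },
    { $sort: { price: 1 } },
    { $skip: 1000 },
    { $limit: 20 }
  ],
  { allowDiskUse: true }
)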
Check whether keyset pagination (also called the seek method) can be used in your use case.
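A minimal sketch of the idea, with illustrative field names: instead of skip(), remember the sort key and _id of the last document on the previous page and continue after it.
function nextPage(lastPrice, lastId) {
  return db.collection.find({
    industry: "software",
    $or: [
      { price: { $gt: lastPrice } },
      { price: lastPrice, _id: { $gt: lastId } }
    ]
  }).sort({ price: 1, _id: 1 }).limit(20);
}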

MongoDB Single or Compound index for sorted query?

I'm using mongoose and I have a query like this:
const stories = await Story.find({ genre: 'romance' }).sort({ createdAt: -1 })
I want to set an index on Story so that this kind of query becomes faster.
Which one of these is the best approach and why:
1. Create one Compound index with both fields:
Story.createIndex({genre: 1, createdAt: -1})
2. Create two separate indexes on each field:
Story.createIndex({genre: 1})
Story.createIndex({createdAt: -1})
If "genre" is always going to be part of the search criteria, using a compound index will always result in better performance.
1.) A compound index consisting of the field being searched on and the field being sorted on can satisfy both conditions.
2.) Creating more than one index assumes that both indexes will be used while fulfilling the query, which is not true. Index intersection is only applicable in a few circumstances. In this particular instance, since one field is in the search criteria and the other is in the sort, index intersection will not be employed by Mongo. (Link)
So in this situation, I would go with the compound index.
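If you take the compound-index route with mongoose, it can also be declared at the schema level so it is built when the model is compiled (a sketch; storySchema is assumed to be the schema behind the Story model):
storySchema.index({ genre: 1, createdAt: -1 }); // filter field first, then the sort field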
As long as all your queries that need to use the createdAt field also use the genre field, you should use the compound index.
Let's compare the two options:
Queries: As long as what I stated above holds, both options will behave the same; there is no difference between the two when it comes to query execution speed.
Memory: A compound index will use less memory, which is crucial if you have limited RAM space. Let's see the difference with an example.
Let's take 3 documents:
{ name: "john", last_name: "mayer" }
{ name: "john", last_name: "cake" }
{ name: "banana", last_name: "pie" }
Now if we run db.collection.stats() with option 1 (the compound index) we get:
totalIndexSize: 53248.0
whereas for option 2 (the two separate indexes):
totalIndexSize: 69632.0
Inserting: full disclosure, I have no idea how each option is affected. From small tests it seems that a compound index is slightly quicker; however, I could not really find documentation on this, nor did I investigate deeper.

Mongoose indexes at both field and schema levels

I understand that indexing can be a valuable tool for quickly retrieving data, if implemented properly. I would like to be able to scan my documents for a certain field value or a combination of field values.
There are two fields I would be indexing (category, tags). Category is a string and tags is an array. I need to be able to query for items in a specific category and/or items that contain a specific tag.
Here are three examples:
Show me all of the documents in the category: "cars"
Show me all of the documents that contain the tag: "electric"
Show me all of the documents in the "cars" category that contain the "electric" tag
Will a schema level index for both fields suffice for all three scenarios?
docSchema.index({category:1, tags:1});
Or do I also need to define them at the field level, to support the scenarios when I am only searching through a single field?
docSchema = mongoose.Schema({
  category: {
    type: String,
    index: true
  },
  tags: {
    type: [String],
    index: true
  }
});
docSchema.index({category:1, tags:1}); is a compound index.
This compound index supports the scenarios 1 and 3:
-> Show me all of the documents in the category: "cars"
-> Show me all of the documents in the "cars" category that contain the "electric" tag
To support scenario 2 you will need to define an additional single index on the tags field.
docSchema.index({tags:1});
A compound index supports queries that involve all fields in the compound index as well as queries that involve a prefix of the compound index. In this case your compound index supports queries involving both categories and tags as well as queries involving just categories.
To better understand the logic, please take a look at the Compound Indexes article on the MongoDB documentation site. Pay special attention to the section that talks about Prefixes.
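As a quick sketch of the prefix rule applied to the compound index from the question (the docs collection name is illustrative):
db.docs.find({category: "cars"})                    // uses the index via its {category: 1} prefix
db.docs.find({category: "cars", tags: "electric"})  // uses the full compound index
db.docs.find({tags: "electric"})                    // cannot use it, because tags is not a prefix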
You need a single-field index on category and a multikey index on tags. You might be tempted to use a compound index instead of one of them, but that is not mandatory if you are using MongoDB >= 2.6, as it has a nice feature called index intersection.
Show me all of the documents in the category: "cars"
Show me all of the documents that contain the tag: "electric"
Show me all of the documents in the "cars" category that contain the "electric" tag
(1) will use the index on category (incl. any index having category as a prefix)
(2) will use the index on tags (incl. any index having tags as a prefix)
(3) will use the index on tags or the index on category or the index intersection of both of them (depending the choice of the query planner).
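In mongoose terms, that means keeping the two single-field declarations (a sketch; schema-level declarations are equivalent to the field-level index: true options from the question):
docSchema.index({category: 1}); // single-field index on category
docSchema.index({tags: 1});     // multikey index, because tags is an array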
As a reference, there is a nice discussion about index intersection in the MongoDB blog. Worth reading the entire article. But to quote the conclusion, mostly comparing index intersection to compound indexes:
To be clear, compound indexing will ALWAYS be more performant [than index intersection] IF you know what you are going to be querying on and can create one ahead of time. Furthermore, if your working set is entirely in memory, then you will not reap any of the benefits of Index Intersection as it is primarily based on reducing IO. But in a more ad-hoc case where one cannot predict the shape of the queries and the working set is much larger than available memory, index intersection will automatically take over and choose the most performant path.

searching with multiple parameters with mongodb

How is fine-grained search achievable with MongoDB, without the use of external engines? Take this object as an example:
{
  genre: 'comedy',
  pages: 380,
  year: 2013,
  bestseller: true,
  author: 'John Doe'
}
That is being searched by the following:
db.books.find({
  pages: { $gt: 100 },
  year: { $gt: 2000 },
  bestseller: true,
  author: "John Doe"
});
Pretty straightforward so far. Now suppose there are a few more fields in the document, that I am making more refined searches, and that I have a pretty big collection.
The first thing I would do is create indexes. But how does that work? I have read that index intersection, as described here https://jira.mongodb.org/browse/SERVER-3071, is not doable. That means that if I create indexes on "year" and "pages" I will not really optimize the AND operations in searches.
So how can the searches be optimized for having many parameters?
Thanks in advance.
It seems like you are asking about compound indexes in mongodb. Compound indexes allow you to create a single index on multiple fields in a document. By creating compound indexes you can make these large/complex queries while still using an index.
On a more general note, if you create a basic index on a field that is highly selective, your search can end up being very quick. Using your example, if you had an index on author, the query engine would use that index to find all the entries where author == "John Doe". Presumably there are not that many books with that specific author relative to the number of books in the entire collection. So, even if the rest of your query is fairly complex, it is only evaluated over those few documents with the matching author. Thus, by structuring your indexes properly you can get a significant performance gain without having to have any complex indexes.
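For the example query above, a compound index built with the equality fields first and the range fields last (the usual equality-before-range rule of thumb) might look like this sketch; the exact field order should be tuned to the real selectivity of your data:
db.books.createIndex({ author: 1, bestseller: 1, year: 1, pages: 1 })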

how to structure a compound index in mongodb

I need some advice in creating and ordering indexes in mongo.
I have a post collection with 5 properties:
Posts
status
start date
end date
lowerCaseTitle
sortOrder
Almost all the posts will have the same status of 1 and only a handful will have a rejected status. All my queries will filter on status, start and end dates, and sort on sortOrder. I also will have one query that does a regex search on the title.
Should I set up a compound key on {status:1, start:1, end:1, sort:1}? Does it matter which order I put the fields in the compound index - should I put status first in the compound index since it's the most broad? Is it better to do a compound index rather than a single index on each property? Does mongo only use a single index on any given query?
Are there any hints for indexes on lowerCaseTitle if I'm doing a regex query on that?
sample queries are:
db.posts.find({status: {$gte:0}, start: {$lt: today}, end: {$gt: today}}).sort({sortOrder:1})
db.posts.find( {lowerCaseTitle: /japan/, status:{$gte:0}, start: {$lt: today}, end: {$gt: today}}).sort({sortOrder:1})
That's a lot of questions in one post ;) Let me go through them in a practical order:
Every query can use at most one index (with the exception of top level $or clauses and such). This includes any sorting.
Because of the above you will definitely need a compound index for your problem rather than separate per-field indexes.
Low cardinality fields (so, fields with very few unique values across your dataset) should usually not be in the index since their selectivity is very limited.
The order of the fields in your compound index matters, and so does the relative direction of each field (e.g. {name: 1, age: -1}). There's a lot of documentation about compound indexes and index field directions on mongodb.org so I won't repeat all of it here.
Sorts will only use the index if the sort field is in the index and is the field in the index directly after the last field that was used to select the resultset. In most cases this would be the last field of the index.
So, you should not include status in your index at all: once the index walk has eliminated the vast majority of documents based on the higher-cardinality fields, it will usually have only 2-3 documents left, which is hardly improved by a status index (especially since you mentioned those documents are very likely to have the same status anyway).
Now, the last note that's relevant in your case is that when you use range queries (and you are), MongoDB will not use the index for sorting anyway. You can check this by looking at the "scanAndOrder" value in the explain() output once you test your query. If that value exists and is true, it means the resultset will be sorted in memory (scan and order) rather than using the index directly. This cannot be avoided in your specific case.
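For example (today being the variable from your own queries), you can run:
db.posts.find({start: {$lt: today}, end: {$gt: today}, status: {$gte: 0}}).sort({sortOrder: 1}).explain()
and look for scanAndOrder: true (or, depending on your MongoDB version, an in-memory SORT stage) in the output.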
So, your index should therefore be:
db.posts.ensureIndex({start:1, end:1})
and your query (order modified for clarity only; the query optimizer will run your original query through the same execution path, but I prefer putting indexed fields first and in order):
db.posts.find({start: {$lt: today}, end: {$gt: today}, status: {$gte:0}}).sort({sortOrder:1})