We have a collection with millions of documents. The data is rendered in the UI for statistics, so rendering time is of key importance.
The queries to render the data involve the below fields:
field_a and field_t
field_b and field_t
field_c and field_t
As we are querying millions of documents, we want to use compound indexes to speed up the queries.
To do so, we can simply add 3 different compound indexes as below:
db.mycollection.createIndex( { "field_a": 1, "field_t": 1 } )
db.mycollection.createIndex( { "field_b": 1, "field_t": 1 } )
db.mycollection.createIndex( { "field_c": 1, "field_t": 1 } )
The ESR rule is respected when creating the indexes, as field_a, field_b and field_c are equality checks and field_t is a range check.
Please note that field_t is common to all 3 indexes.
Instead of creating 3 different indexes, is there a better approach?
Does MongoDB provide a more efficient way to handle this scenario, where the same field is used in multiple compound indexes?
Better or more efficient in what regard?
Having the three indexes that you mentioned is the most efficient approach in terms of query performance. They will allow the database to process only the data that is relevant for each query and nothing else. Any other approach would reduce read efficiency (and speed) which may not be a good tradeoff.
Most databases, MongoDB included, typically use a single index when executing a query. This is mostly a consequence of how indexes work. Typically indexes use a B-tree like data structure, which is an ordered set of information. When following the ESR rule (placing equality conditions before range conditions), all of the information for a specific query is contained within a single bounded subtree in the index which can be directly traversed. It loses the ability to do this when the index is not structured in this way (including putting range keys first).
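To make the bounded-subtree point concrete, here is a minimal sketch in plain JavaScript. It is not how MongoDB is implemented: it models an index as a sorted array of key tuples and counts the entries between the first and last matching key. The data set, the helper names (makeIndex, keysExamined) and the query values are all hypothetical. Real engines can sometimes skip ahead inside such a slice, so treat the range-first number as a rough upper bound rather than a prediction.

```javascript
// Toy model: an index is a sorted array of key tuples.
// Hypothetical data: 3 values of field_a, field_t from 0 to 99.
const docs = [];
for (const a of ["x", "y", "z"]) {
  for (let t = 0; t < 100; t++) {
    docs.push({ field_a: a, field_t: t });
  }
}

// Compare two key tuples element by element.
function compareKeys(k1, k2) {
  for (let i = 0; i < k1.length; i++) {
    if (k1[i] < k2[i]) return -1;
    if (k1[i] > k2[i]) return 1;
  }
  return 0;
}

// Build the sorted list of keys for the given field order.
function makeIndex(fields) {
  return docs.map((d) => fields.map((f) => d[f])).sort(compareKeys);
}

// Count entries between the first and last matching entry (inclusive):
// a rough stand-in for the contiguous slice a scan has to walk.
function keysExamined(index, matches) {
  let first = -1;
  let last = -1;
  index.forEach((key, i) => {
    if (matches(key)) {
      if (first === -1) first = i;
      last = i;
    }
  });
  return first === -1 ? 0 : last - first + 1;
}

// Query: field_a == "y" and 10 <= field_t < 20 (10 matching documents).
const esr = keysExamined(
  makeIndex(["field_a", "field_t"]),
  ([a, t]) => a === "y" && t >= 10 && t < 20
);
const rangeFirst = keysExamined(
  makeIndex(["field_t", "field_a"]),
  ([t, a]) => a === "y" && t >= 10 && t < 20
);
console.log({ esr, rangeFirst }); // { esr: 10, rangeFirst: 28 }
```

With the equality key first, the 10 matching entries sit next to each other. With the range key first, they are interleaved with entries for the other field_a values.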
Other potential approaches using single field indexes would be things like:
Index intersection - where you create (in this case) 4 single field indexes and have the database use 2 for each query. MongoDB rarely chooses this approach, as it results in scanning larger portions of the indexes than the compound index approach above.
Using 1 single field index for each query - the database would end up retrieving documents to filter on the other field which could be quite inefficient depending on the selectivity of the other field.
While these may reduce the overall size of the collective indexes, they increase the cost (and decrease the efficiency) of executing the queries. Depending on what you are optimizing for, the approach you've outlined would be considered a best practice in terms of query efficiency.
Related
A MongoDB document example has two fields, userid and score. When the index is created, the value 1 is given for userid and -1 for score, i.e. { userid: 1, score: -1 }. Please explain the difference between 1 and -1 in MongoDB indexing.
I tried indexing in mongoDB to make queries faster but with less knowledge of indexing in mongoDB I am not getting expected output
As @deceze suggests in their comment, the basic answer to the question is mentioned in the documentation. The answer is that the value determines whether the key is ascending (1) or descending (-1). But what does this actually mean and is it important?
Many databases use a type of B-tree data structure as the backbone for their (standard) indexes. These data structures are ordered by nature. This specific property is one of the things that make them ideal for supporting a wide range of database queries. For example, it allows them to bound a range scan (such as $lte) as it knows that there will be no more matching values later in the index once it scans to the first key that exceeds the requested maximum value. Similarly, if the definition of the index is compatible with the sort order requested by the client then the database can satisfy the sort for "free" as it scans the index.
For single field indexes as well as sorts on a single field, the direction specified in the index definition doesn't matter. This is because indexes can be traversed in either direction. So if your index is on { timestamp: 1 } but the sort is on { timestamp: -1 }, then the database can choose to traverse the index backwards to satisfy the requested sort.
Directionality becomes important when compound sorts are involved. If the index is defined as { score: 1, timestamp: 1 } then the database will either be less efficient or completely unable to use it to satisfy a sort on { score: 1, timestamp: -1 }. This is a direct consequence of the structure of the underlying b-tree data structure that the index is built on. More information can be found here in the documentation.
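The direction rule can be written down as a small check. This is a simplified sketch, not MongoDB's planner logic: it only covers a sort on the leading fields of an index and ignores the case where equality predicates on leading fields let a later field provide the sort. The function name indexCanProvideSort and the array-of-pairs representation are assumptions made for illustration.

```javascript
// indexSpec and sortSpec are arrays of [field, direction] pairs,
// e.g. [["score", 1], ["timestamp", -1]]; direction is 1 or -1.
function indexCanProvideSort(indexSpec, sortSpec) {
  if (sortSpec.length === 0 || sortSpec.length > indexSpec.length) return false;
  // The sort must use the index's leading fields, in the same order.
  const sameFields = sortSpec.every(([field], i) => field === indexSpec[i][0]);
  if (!sameFields) return false;
  // Either every direction matches (forward scan) or every direction
  // is flipped (backward scan). A mix cannot be read in one pass.
  const forward = sortSpec.every(([, dir], i) => dir === indexSpec[i][1]);
  const backward = sortSpec.every(([, dir], i) => dir === -indexSpec[i][1]);
  return forward || backward;
}

const compound = [["score", 1], ["timestamp", 1]];
console.log(indexCanProvideSort([["timestamp", 1]], [["timestamp", -1]])); // true
console.log(indexCanProvideSort(compound, [["score", -1], ["timestamp", -1]])); // true
console.log(indexCanProvideSort(compound, [["score", 1], ["timestamp", -1]])); // false
```

The last call is the case described above: one key ascending and the other descending does not line up with either traversal direction of { score: 1, timestamp: 1 }.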
I tried indexing in mongoDB to make queries faster but with less knowledge of indexing in mongoDB I am not getting expected output
This is a much different and more specific question than the one that you have actually asked. If there is a particular query that you are trying to improve the performance of, then it may be beneficial to post a question with that query (and the associated .explain() output). That would allow us to advise on actionable steps to help you improve the performance of the operation (or understand why it cannot be done), as opposed to having broader discussions about index definitions which may or may not be relevant to your issue.
I have read about indexing in databases. The point I don't understand is why we need to specify an index on a particular column: if it makes searches fast, why isn't it applied to all columns by default? Is it possible to create multiple indexes on one table?
To summarize:
Using indexes improves the performance of read queries, because MongoDB doesn't read the entire collection when searching for documents. It also improves sorting.
But these improvements have a cost:
Indexes use disk/memory space.
Delete, insert and update operations take longer: on each insert, delete or update operation, MongoDB must update all the affected indexes.
There are multiple index types, and some of them (compound indexes, for example) can have plenty of combinations.
For these reasons (among others), only the index on the _id field (which needs to be unique) is created by default when a collection is created.
If a collection has n indexes, a save/update operation that modifies indexed keys has to update up to n indexes, which produces excessive write locking. Under such a write-heavy load, you may observe that the workload runs faster with no indexes than with too many indexes that must be updated constantly. So keep track of the indexes you create, otherwise you will run into serious performance issues (RAM usage and write locks).
I want to create an index on a MongoDB database to improve performance. Could you please help me with how to do it?
Your help will be appreciated here.
If you want to index the email field on the users collection:
db.users.createIndex({"email":1}, {background:true})
Before applying indexing in mongodb collections you need to understand the following aspects of indexing:
Indexing strategy:
Check what types of queries your application sends to MongoDB.
List all such queries.
Based on the number and type of operations, decide on the index type.
Choose the correct type of index for your application's needs. The type can be a single-field index, compound index, partial index, TTL index and so on.
Do your queries involve the sort operations? Follow this guide on indexing for operations with sort.
A more detailed guide on indexing strategy is here.
Test your indexes:
Once you have the list of indexes to be applied, test their performance using explain.
Generate sample application calls against your database and enable the profiler (in dev or staging) to check how your indexes are performing.
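As a rough illustration of what to look at in the explain output, the sketch below reduces an explain("executionStats") result to a few numbers. The helper name summarizeExplain and the sample object are hypothetical; the sample is trimmed to the three executionStats fields the helper reads and does not come from a real server.

```javascript
// `explainResult` is assumed to be the object returned by something like
//   db.mycollection.find(query).explain("executionStats")
function summarizeExplain(explainResult) {
  const stats = explainResult.executionStats;
  const returned = stats.nReturned;
  return {
    returned,
    keysExamined: stats.totalKeysExamined,
    docsExamined: stats.totalDocsExamined,
    // Close to 1 means the index narrowed the work to what was returned.
    docsPerResult: returned === 0 ? null : stats.totalDocsExamined / returned,
  };
}

// Hypothetical, hand-written sample: 5000 documents fetched to return 50.
const sample = {
  executionStats: { nReturned: 50, totalKeysExamined: 5000, totalDocsExamined: 5000 },
};
console.log(summarizeExplain(sample));
```

A high docsPerResult, as in this made-up sample, suggests that the chosen index is not selective enough for the query and that documents are being fetched only to be filtered out.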
How to index:
Create indexes in the background so that the index build does not block other operations. (From MongoDB 4.2 onward the background option is ignored, because all index builds use a process that holds an exclusive lock only at the beginning and end of the build.)
Depending on your data size, if the indexes are to be created on large collections, consider doing it during low-traffic hours or in a scheduled maintenance window.
You may need to consider a rolling index build in certain use cases to minimize the impact of indexing.
Keep track of indexes you create:
Document your indexes. This may include when you have created those indexes, why and so on.
Measure your index usage stats in production:
Once you have applied these indexes in production, check their usage stats after a week or two to see whether they're really being used.
Consider dropping the indexes if they're not used at all.
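One way to run this check is the $indexStats aggregation stage. The sketch below assumes its output has already been collected into an array and that accesses.ops has been converted to a plain number; the helper name findUnusedIndexes and the sample data are hypothetical. Keep in mind that these counters are kept per server and start again from zero after a restart, so a zero is only meaningful over a long enough observation period.

```javascript
// `stats` is assumed to be the array produced by something like
//   db.users.aggregate([{ $indexStats: {} }]).toArray()
function findUnusedIndexes(stats) {
  return stats
    .filter((s) => s.name !== "_id_") // the _id index cannot be dropped
    .filter((s) => Number(s.accesses.ops) === 0)
    .map((s) => s.name);
}

// Hypothetical sample, trimmed to the fields used above.
const indexStats = [
  { name: "_id_", accesses: { ops: 0 } },
  { name: "email_1", accesses: { ops: 1520 } },
  { name: "username_1", accesses: { ops: 0 } },
];
console.log(findUnusedIndexes(indexStats)); // [ 'username_1' ]
```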
Caution:
Indexes add a performance penalty to write operations. Design and apply only the indexes that are a must for your application.
The basic syntax is:
db.collection.createIndex(keys, options)
So, for example:
db.users.createIndex({"username" : 1})
See MongoDB Indexes for the full details.
I'm looking for advice about which indexing strategy to use in MongoDB 3.4.
Let's suppose we have a people collection of documents with the following shape:
{
  _id: 10,
  name: "Bob",
  age: 32,
  profession: "Hacker"
}
Let's imagine that a web API to query the collection is exposed and that the only possible filters are by name or by age.
A sample call to the api will be something like: http://myAwesomeWebSite/people?name="Bob"&age=25
Such a call will be translated into the following query: db.people.find({name: "Bob", age: 25}).
To better clarify our scenario, consider that:
the field name was already in our documents and we already have an index on that field
we are going to add the new field age due to some new features of our application
the database is only accessible via the web api mentioned above and the most important requirement is to expose a super fast web api
all the calls to the web api will apply a filter on both the fields name and age (put another way, all the calls to the web api will have the same pattern, which is the one shown above)
That said, we have to decide which of the following indexes offers the best performance:
One compound index: {name: 1, age: 1}
Two single-field indexes: {name: 1} and {age: 1}
According to some simple tests, it seems that the single compound index is much more performant than the two single-field indexes.
When executing a single query via the mongo shell, the explain() method suggests that with a single compound index you can query the database nearly ten times faster than with two single-field indexes.
This difference seems to be less dramatic in a more realistic scenario, where instead of executing a single query via the mongo shell, multiple calls are made to two different URLs of a Node.js web application. Both URLs execute a query against the database and return the fetched data as a JSON array, one using a collection with the single compound index and the other using a collection with two single-field indexes (both collections having exactly the same documents).
In this test the single compound index still seems to be the best choice in terms of performance, but this time the difference is less marked.
According to the test results, we are considering using the single compound index approach.
Does anyone have experience with this topic? Are we missing any important consideration (maybe some disadvantage of big compound indexes)?
Given a plain standard query (with no limit() or sort() or anything fancy applied) that has a filter condition on two fields (as in name and age in your example), in order to find the resulting documents, MongoDB will either:
1. do a full collection scan (read every document in the entire collection, parse the BSON, find the values in question, test them against the input and return/discard each document): This is super I/O intense and hence slow.
2. use one index that holds one of the fields (use index tree to locate relevant subset of documents followed by a scan of them): Depending on your data distribution/index selectivity this can be very fast or barely provide any benefit (imagine an index on age in a dataset of millions of people between 30 and 40 years --> every lookup would still yield an endless number of documents).
3. use two indexes that together contain both fields in question (load both indexes, perform key lookups, then calculate the intersection of the results): Again, depending on your data distribution, this may or may not give you great(er) performance. It should, however, in most cases be faster than #2. I would, however, be surprised if it was really 10x slower than #4 (as you mentioned).
4. use a compound index (two subsequent key lookups immediately lead to the required documents): This will be the fastest option of all given that it requires the fewest and cheapest operations to get to the right documents. In order to ensure the greatest level of reuse (not performance, which won't be affected by this) you should in general start with the most selective field first, so in your case probably name and not age given that a lot of people will have the same age (so low selectivity) compared to name (higher selectivity). But that choice also depends on your concrete scenario and the queries you intend to run against your database. There is a pretty good article on the web about how to best define a compound index taking various aspects of your specific situation into account: https://emptysqua.re/blog/optimizing-mongodb-compound-indexes
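The four options can be compared with a toy count of index keys read and documents fetched. This is a minimal sketch over a made-up data set, not a benchmark and not MongoDB's cost model: the data distribution, the variable names and the way intersection is counted (all keys from both single-field indexes, then only the matching documents) are assumptions chosen for illustration.

```javascript
// Hypothetical data set: 10,000 people, 500 distinct names, ages 30 to 39.
const people = [];
for (let i = 0; i < 10000; i++) {
  people.push({ name: "name" + (i % 500), age: 30 + (Math.floor(i / 500) % 10) });
}
const query = { name: "name0", age: 35 };

const count = (pred) => people.filter(pred).length;
const byName = count((p) => p.name === query.name);
const byAge = count((p) => p.age === query.age);
const byBoth = count((p) => p.name === query.name && p.age === query.age);

// Work per strategy in this toy model: index keys read and documents fetched.
const work = {
  collectionScan: { keys: 0, docs: people.length },
  singleIndexOnAge: { keys: byAge, docs: byAge },
  singleIndexOnName: { keys: byName, docs: byName },
  intersection: { keys: byName + byAge, docs: byBoth },
  compoundIndex: { keys: byBoth, docs: byBoth },
};
console.log(work);
```

In this data set the query matches 2 documents. The compound index touches only those, the index on name fetches 20 documents, the index on age fetches 1000, and the collection scan reads all 10,000, which is the selectivity effect described above.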
Other aspects to consider are: Index updates come at a certain price. However, if all you care about is raw read speed and you only have a few updates every now and again, then you should go for more/bigger indexes.
And last but not least (!) the well over-used bottom line advice: Profile the hell out of your system using real data and perhaps even realistic load scenarios. And also keep measuring as your data/system changes over time.
Additional reads:
https://docs.mongodb.com/manual/core/query-optimization/index.html
https://dba.stackexchange.com/questions/158240/mongodb-index-intersection-does-not-eliminate-the-need-for-creating-compound-in
Index intersection vs. compound index?
mongodb compund index vs. index intersect
How does the order of compound indexes matter in MongoDB performance-wise?
In MongoDB, I am running a large query. How should I create a compound index or a single-field index so that my response time improves?
I am designing my database with MongoDB with future scalability in mind. My main concern right now is how to design the indexes since, as I have read, they are a crucial factor when scaling huge collections, in terms of RAM consumption and sharding efficiency.
For simplicity, I have two different collections: a user collection, which stores the username, email, and some metadata, and a devices collection, which contains a device name and some metadata, and must be related to its owner. One user can have millions of devices (so it is not worthwhile to store them all in a single user document).
The devices collection should support queries by the whole device identifier (username, device_name), or by username alone.
In this case I see some different approaches for storing the indexes:
1. Use a secondary compound index on username and device_name (in this order)
2. Use the primary index, with an _id containing a string of the form username#device_name
3. Use an object in the _id field with both values: {owner: username, device: device_name}
To test these indexes, I ran some load tests on the server. I created three collections, one per alternative, and filled each with 5M documents. Some data:
1. I do not use the automatically generated _id created by MongoDB, as all my queries require username/device, so this approach takes some extra space for indexing. The index size is 524MB. It is efficient when querying both by user and by user/device.
2. As I am replacing the _id with my own string, the index takes less space: 352MB in this case. I am still able to query efficiently by user (with a regex like /^username#/, explain() reports almost the same results as in 1) and by the exact username/device.
3. The _id index cannot be changed to a compound index, so a secondary compound index on {_id.owner, _id.device} is required. This results in a huge index size of 1059MB! Queries perform as well as in the previous cases.
So, I can discard alternative 3, as it is not very efficient. Between alternatives 1 and 2, I prefer 1 as that approach is cleaner, but it keeps an _id field I will not use. So at this moment, the winning approach seems to be number 2, as it allows me to query efficiently by username or username/device, and it also takes less index space.
Is there a good reason not to use number 2 and go with number 1 instead, for example when selecting the shard key? Is there something I am missing? I am new to MongoDB and do not want to have problems when scaling my schema.
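For reference, the two query shapes used against alternative 2 can be sketched as follows. The helper names are hypothetical, and the sketch assumes that usernames never contain the # separator (otherwise the prefix for one user could also match another user's devices). The username is escaped so that characters such as . are matched literally, and the pattern stays anchored and case-sensitive, which is the form of regex that can be served by a range of the _id index.

```javascript
// Escape regex metacharacters so the username is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Filter for "all devices of one user" when _id is "username#device_name".
function devicesOfUserFilter(username) {
  return { _id: { $regex: "^" + escapeRegex(username) + "#" } };
}

// Filter for one exact device.
function deviceFilter(username, deviceName) {
  return { _id: username + "#" + deviceName };
}

// These objects would be passed to something like db.devices.find(...).
console.log(devicesOfUserFilter("bob"));
console.log(deviceFilter("bob", "phone"));
```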