Does MongoDB maintain index statistics (data distribution for index key column)? - mongodb

SQL Server uses index statistics to decide whether to use an index or to perform a full table scan, based on the selectivity of the WHERE criteria. Statistics help the query optimizer choose a table scan over an index seek/scan when the selectivity is very low.
Does MongoDB maintain index statistics the way SQL Server does? Does performance suffer in MongoDB when the selectivity of the find criteria is very low? If so, is there a way to deal with such queries?

As of the current version of MongoDB (2.4), no statistics are kept about the distribution of index keys.
MongoDB's query optimizer takes a different approach to selecting which index to use (or whether to do a collection scan). The first time you run a particular query, if several indexes could be used for it, the query engine tries all of them in parallel and the one that finishes first wins (the others get killed off). This is a simplification, but in a nutshell that is how a query plan is selected and then reused for a number of subsequent queries (query plans are periodically re-evaluated at various points).
You can read more about this in MongoDB documentation of indexes.
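The "race" described above can be sketched as a toy model in plain JavaScript. This is an illustration only, not MongoDB's actual implementation: the generators, the `racePlans` helper, the sample documents, and the target of 20 results are all assumptions made up for the example.

```javascript
// Toy model of a "plan race": NOT MongoDB's real implementation.
// Each candidate plan is a generator that yields once per unit of work:
// a matching document, or null when that unit produced no match.

function* collectionScan(docs, predicate) {
  for (const doc of docs) {
    yield predicate(doc) ? doc : null; // one work unit per document read
  }
}

function* indexScan(index, key) {
  // `index` is a Map from field value to the documents holding that value.
  for (const doc of index.get(key) || []) {
    yield doc; // every work unit produces a match
  }
}

// Advance all plans one work unit at a time. The first plan to collect
// `target` matches (or to run out of work) wins the race.
function racePlans(plans, target) {
  const state = plans.map(({ name, iterator }) => ({ name, iterator, found: 0 }));
  for (let works = 1; ; works++) {
    for (const plan of state) {
      const step = plan.iterator.next();
      if (step.done) return { winner: plan.name, works };
      if (step.value !== null) plan.found++;
      if (plan.found >= target) return { winner: plan.name, works };
    }
  }
}

// Hypothetical data: 1,000 documents, every 10th one has status "open".
const docs = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  status: i % 10 === 0 ? "open" : "closed",
}));
const statusIndex = new Map();
for (const doc of docs) {
  if (!statusIndex.has(doc.status)) statusIndex.set(doc.status, []);
  statusIndex.get(doc.status).push(doc);
}

const result = racePlans(
  [
    { name: "collection scan", iterator: collectionScan(docs, (d) => d.status === "open") },
    { name: "index on status", iterator: indexScan(statusIndex, "open") },
  ],
  20
);
console.log(result); // { winner: 'index on status', works: 20 }
```

In this model the index plan wins because every unit of work yields a match, while the collection scan only finds a match once every ten documents. No data distribution statistics are consulted; the choice comes purely from trying the plans.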

Related

Why $nin is slower than $in, Mongodb

I have a collection with 5M documents and the correct indexes. $in works perfectly, but the same query with $nin is super slow. What is the reason for this?
Super fast:
{'tech': {'$in': ['Wordpress', 'wordpress', 'WORDPRESS']}}
Super slow:
{'tech': {'$nin': ['Wordpress', 'wordpress', 'WORDPRESS']}}
The following explanation is accurate only for Mongo versions prior to 3.2.
Mongo v3.2 brought all kinds of storage engine changes which improved performance on this issue.
$nin has one important property: it is not a selective query. First, let's understand what selectivity means:
Selectivity is the ability of a query to narrow results using the index. Effective indexes are more selective and allow MongoDB to use the index for a larger portion of the work associated with fulfilling the query.
The documentation even states this itself:
For instance, the inequality operators $nin and $ne are not very selective since they often match a large portion of the index. As a result, in many cases, a $nin or $ne query with an index may perform no better than a $nin or $ne query that must scan all documents in a collection.
Back then, selectivity was a big deal performance-wise. This all leads us to your question: why isn't the index being used?
When Mongo is asked to create a query plan, it performs a "race" between all available query plans, one of which is a COLLSCAN (collection scan), and the first plan to find 101 documents wins. Because non-selective queries use the index so inefficiently, the winning plan is often the COLLSCAN (and it is usually genuinely faster, depending on the index and the values in the query). The MongoDB documentation on query plans covers this in more detail.
When you have an index (no matter whether you are talking about MongoDB or any other database), it is always faster to search for a certain value than to search for everything that is not that value.
For a "not in" or "not equal" condition the database has to scan the entire index, and often the index is not used at all. Have a look at the execution plan with explain().
Some databases (e.g. Oracle) provide so-called Bitmap Indexes. They work differently, and usually an IN operation is as fast as a NOT IN operation. But, as usual, they have other drawbacks compared to B*Tree Indexes. To my knowledge, Oracle Database is the only major RDBMS which supports Bitmap Indexes.
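To make the selectivity argument concrete, here is a minimal sketch in plain JavaScript that counts how many index entries each kind of query has to read. It is a toy model, not MongoDB's B-tree: the `Map`-based index, the helper names, and the 5,000 sample documents (a scaled-down stand-in for the 5M in the question) are assumptions for illustration.

```javascript
// Toy single-field index: each distinct key maps to the number of
// documents holding it. Not MongoDB's real index structure.
function buildIndex(values) {
  const index = new Map();
  for (const v of values) index.set(v, (index.get(v) || 0) + 1);
  return index;
}

// $in: one lookup per listed value; only entries for those keys are read.
function entriesExaminedIn(index, wanted) {
  let examined = 0;
  for (const key of new Set(wanted)) examined += index.get(key) || 0;
  return examined;
}

// $nin: every key that is NOT excluded has to be walked.
function entriesExaminedNin(index, excluded) {
  const skip = new Set(excluded);
  let examined = 0;
  for (const [key, count] of index) {
    if (!skip.has(key)) examined += count;
  }
  return examined;
}

// Hypothetical data: 5,000 documents, every 50th one uses "Wordpress".
const others = ["Drupal", "Joomla", "Rails", "Django"];
const techValues = Array.from({ length: 5000 }, (_, i) =>
  i % 50 === 0 ? "Wordpress" : others[i % 4]
);
const techIndex = buildIndex(techValues);
const listed = ["Wordpress", "wordpress", "WORDPRESS"];

const inCount = entriesExaminedIn(techIndex, listed);
const ninCount = entriesExaminedNin(techIndex, listed);
console.log({ inCount, ninCount }); // { inCount: 100, ninCount: 4900 }
```

With this data the $in query touches 2% of the index entries, while the $nin query touches the other 98%, which is roughly the work of reading the whole collection.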

How reliable is MongoDB's query optimizer?

According to MongoDB docs:
For a query, the MongoDB query optimizer chooses and caches the most
efficient query plan given the available indexes.
So if the query optimizer chooses one index over the other according to indexStats, is this a good enough evaluation to delete the unused index and keep only the preferred one?
Or are there edge cases where it makes sense to keep the index that is not preferred by the query optimizer and delete the preferred one?

How, When and Where Should MongoDB Index Types be Used?

Can anyone explain when it is important to use a MongoDB index and where it can be used? I also need the advantages and disadvantages of using MongoDB indexes.
Can anyone help me when it is important to use MongoDB Index and where it can be used?
Indexes provide efficient access to your data.
Without indexes in place for your queries, a query can scan many more documents than it is expected to return. Good indexes avoid collection scans and avoid examining more documents than are required for the result.
A well-designed set of indexes that caters to the queries coming into your database can significantly improve its performance.
Also, I need disadvantages of using MongoDB Index
Indexes need memory and disk space. If the indexes are part of your working set, they will be held in memory, which means you need enough memory for the indexes along with the frequently accessed data.
Every insert, update and delete that touches an indexed key also has to update the index data structure. Having too many indexes on a collection therefore adds a penalty to write operations.
A large number of compound indexes also makes it take longer to rebuild indexes when restoring large datasets.

How can we create an Index on MongoDB?

I want to create an index on a Mongo database to improve performance. Could you please show me how to do it?
Your help will be appreciated here.
If you want to index on field email on users collection:
db.users.createIndex({"email":1}, {background:true})
Before applying indexing in mongodb collections you need to understand the following aspects of indexing:
Indexing strategy:
Check what types of queries your application sends to mongodb.
List all such possible queries.
Define the index type based on the number and type of operations.
Choose the correct type of index for the application's needs. The type can be a single-field index, compound index, partial index, TTL index and so on.
Do your queries involve sort operations? Follow the MongoDB guide on indexing for operations with sort.
A more detailed guide on indexing strategy is available in the MongoDB documentation.
Test your indexes:
Once you have the list of indexes to apply, test their performance using explain.
Generate sample application calls against your database and enable the profiler (in dev or staging) to check how your indexes are performing.
How to index:
Create indexes in the background. This makes sure that the create index operation does not block other operations. (From MongoDB 4.2 onwards the background option is ignored, because all index builds use a process that only locks briefly at the start and end.)
Depending on your data size, if the indexes are to be created on large collections, consider doing it during low-traffic hours or in a scheduled maintenance window.
In certain use cases you may need to consider a rolling index build to minimize the impact of indexing.
Keep track of indexes you create:
Document your indexes. This may include when you have created those indexes, why and so on.
Measure your index usage stats in production:
Once you have applied these indexes in production, check their usage stats after a week or two to see whether they're really being used.
Consider dropping the indexes if they're not used at all.
Caution:
Indexes add a performance penalty to write operations. Design and apply only the indexes that are a must for your application.
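The usage check in the steps above can be sketched as follows. In the shell, `db.users.aggregate([{ $indexStats: {} }])` returns one document per index with an `accesses.ops` counter and an `accesses.since` timestamp. The helper below is a hypothetical function (not part of MongoDB) that filters such output for indexes that have never been used; the sample array is made-up data, and real output may represent `ops` as a 64-bit integer type that needs converting first.

```javascript
// Return the names of indexes with zero recorded operations.
// `stats` is assumed to be an array shaped like $indexStats output:
//   { name: "...", key: {...}, accesses: { ops: <number>, since: <date> } }
function findUnusedIndexes(stats) {
  return stats
    .filter((s) => s.name !== "_id_") // the _id index cannot be dropped
    .filter((s) => Number(s.accesses.ops) === 0)
    .map((s) => s.name);
}

// Hypothetical sample, using plain numbers for the counters.
const sampleStats = [
  { name: "_id_", key: { _id: 1 }, accesses: { ops: 0, since: new Date("2024-01-01") } },
  { name: "email_1", key: { email: 1 }, accesses: { ops: 1532, since: new Date("2024-01-01") } },
  { name: "status_1", key: { status: 1 }, accesses: { ops: 0, since: new Date("2024-01-01") } },
];

const unused = findUnusedIndexes(sampleStats);
console.log(unused); // [ 'status_1' ]
```

Check `accesses.since` before acting on the result: the counters are reset when the server restarts, so a zero only means "unused since that date".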
The basic syntax is:
db.collection.createIndex(keys, options)
So, for example:
db.users.createIndex({"username" : 1})
See MongoDB Indexes for the full details.

Index Vs Query Criteria

I am new to MongoDB. I have read that Indexes limit the documents to be scanned when we query for some data.
Reference:-http://docs.mongodb.org/manual/core/indexes-introduction
I am confusing this with query criteria, since they also limit the data.
For example:- db.users.find({score:{"$lt":30}}).
In the manual this example is given and explained in the sense of indexes. What are indexes and how are they different than Query Criteria?
Thank you
Indexes in MongoDB are similar, but not identical, to indexes in relational databases, so to get a basic feel you can think of the two as equivalent. Query criteria define which subset of documents your query is interested in. An index may be able to USE the query criteria to answer the query faster.
Suppose you have a collection with no indexes, and you do db.users.find({score:{$lt:30}}). With no index, you will need to scan the entire collection to answer the query, processing all documents regardless of their value. With an index on 'score', the query will be able to use the index to drill down on ONLY the documents that match your query, thereby executing faster.
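The difference can be sketched in plain JavaScript by counting how many documents each approach has to examine for the same criteria (score below 30). This is a toy model under stated assumptions: a sorted array stands in for the B-tree index, and the helper names and the 1,000 sample users are made up for the example.

```javascript
// Collection scan: every document is examined, whatever its score.
function scanCollection(users, limit) {
  let examined = 0;
  const matches = [];
  for (const user of users) {
    examined++;
    if (user.score < limit) matches.push(user);
  }
  return { matches, examined };
}

// Toy index: the documents sorted by score (a stand-in for a B-tree).
function buildScoreIndex(users) {
  return [...users].sort((a, b) => a.score - b.score);
}

// Index scan: binary-search for the first entry with score >= limit,
// then read only the entries before it.
function scanIndex(index, limit) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].score < limit) lo = mid + 1;
    else hi = mid;
  }
  return { matches: index.slice(0, lo), examined: lo };
}

// Hypothetical data: 1,000 users with scores 0..99, ten users per score.
const users = Array.from({ length: 1000 }, (_, i) => ({ _id: i, score: i % 100 }));

const full = scanCollection(users, 30);
const viaIndex = scanIndex(buildScoreIndex(users), 30);
console.log(full.matches.length, full.examined);         // 300 1000
console.log(viaIndex.matches.length, viaIndex.examined); // 300 300
```

Both approaches return the same 300 documents because the query criteria are the same. Only the amount of data examined to find them differs, and that is the part the index controls.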
Query criteria limit the data that is sent from the server to the client, but on their own (without an index) the server still has to scan each and every document to find the matches. An index, on the other hand, limits the number of documents scanned by using a special data structure (a B-tree in MongoDB).
Ref:-http://docs.mongodb.org/manual/core/indexes-introduction