Does using a partial index over a regular field index affect overall insert performance for mongo collections? - mongodb

I want to implement a partial index that indexes a subset of documents in my collection where documents in this subset contain a shared field not used in other documents outside the subset. Call this shared field "subset_field".
I know that using a partial index reduces the size of the index in memory, but when I want to insert new documents that do not have the shared "subset_field", will the partial index slow down the insert operation as much as a regular field index would?

The documentation on partial indexes only mentions insertation in regards of unique fields:
A partial index with a unique constraint does not prevent the insertion of documents that do not meet the unique constraint if the documents do not meet the filter criteria.
But the doc on Write Op Performance does have a big note saying
MongoDB only updates a sparse or partial index if the documents involved in a write operation are included in the index.
So, the partial index is being used to check if it applies to the document but if it doesn't apply then the index update is skipped hence your performance improves.

Related

How does mongodb decide which index to use for a query?

When a certain query is done on a mongodb collection, if there are multiple indexes that can be used to perform the query, how does mongodb choose the index for the query?
for an example, in a 'order' collection, if there are two indexes for columns 'customer' and 'vendor', and a query is issued with both customer and vendor specified, how does mongodb decide whether to use the customer index or the vendor index?
Is there a way to instruct mongodb to prefer a certain index over another, for a given query?
When a certain query is done on a mongodb collection, if there are
multiple indexes that can be used to perform the query, how does
mongodb choose the index for the query?
You can generate a query plan for a query you are trying to analyze - see what indexes are used and how they are used. Use the explain method for this; e.g. db.collection.explain().find(). The explain takes a parameter with values "queryPlanner" (the default), "executionStats" and "allPlansExecution". Each of these have different plan output.
The query optimizer generates plans for all the indexes that could be used for a given query. In your example order collection, the two single field indexes (one each for the fields customer and vendor) are possible candidates (for a query filter with both the fields). The optimizer uses each of the plans and executes them for a certain period of time and chooses the best performing candidate (this is determined based upon factors like - which returned most documents in least time, and other factors). Based upon this it will output the winning and rejected plans and these can be viewed in the plan output. You will see one of the indexes in the winning plan and the other in the rejected plan in the output.
MongoDB caches the plans for a given query shape. Query plans are cached so that plans need not be generated and compared against each other every time a query is executed.
Is there a way to instruct mongodb to prefer a certain index over
another, for a given query?
There are couple of ways you can use:
Force MongoDB to use a specific index using the hint() method.
Set Index Filters to specify which indexes the optimizer will evaluate for a query shape. Note that this setting is not persisted after a server shutdown.
Their official website states:
MongoDB uses multikey indexes to index the content stored in arrays. If you index a field that holds an array value, MongoDB creates separate index entries for every element of the array. These multikey indexes allow queries to select documents that contain arrays by matching on element or elements of the arrays.
You can checkout This article for more information
For your second query, you can try creating custom indexes for documents. Checkout their Documentation for the same

MongoDB Indexing a field which may not exist

I have a collection which has an optional field xy_id. About 10% of the documents (out of 500k) does not have this xy_id field.
I have quite a lot of queries to this collection like find({xy_id: <id>}).
I tried indexing it normally (.createIndex({xy_id: 1}, {"background": true})) and it does improve the query speed.
Is this the correct way to index the field in this case? or should I be using a sparse index or another way?
Yes, this is the correct way. The default behaviour of MongoDB is serving well in this case. You can see in the docs that index creation supports an unique flag, which is false by default. All your documents missing the index key will be indexed under a single index entry. Queries can use this index in all cases because all the documents are indexed.
On the other hand, if you use sparse index the documents missing the index key will not be indexed at all. Some operations such as count, sort and other queries will not be able to use the sparse index unless explicitly hinted to do so. If explicitly hinted, you should be okay with incorrect results - the entries not in the index will be omitted in the result. You can read about it here.

Get Number of Documents in MongoDB Index

As per the title, I would like to know if there is a way to get the number of documents in a MongoDB index.
To be clear, I am not looking for either of the following:
How to get the number of documents in a collection -- .count().
How to get the size of an index -- .stats().
An index references all of the documents in its collection unless the index is a sparse index or a partial index. From the docs:
Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field. The index is “sparse” because it does not include all documents of a collection. By contrast, non-sparse indexes contain all documents in a collection, storing null values for those documents that do not contain the indexed field.
Partial indexes only index the documents in a collection that meet a specified filter expression
So ...
The answer for non sparse and non partial indexes is db.collection.count()
The answer for sparse and partial indexes could be inferred by running a query with no criteria, hinting on that index and then counting the results. For example:
db.collection.find().hint('index_name_here').count()

Fundamental misunderstanding of MongoDB indices

So, I read the following definition of indexes from [MongoDB Docs][1].
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures that store a small portion of the
collection’s data set in an easy to traverse form. The index stores
the value of a specific field or set of fields, ordered by the value
of the field. The ordering of the index entries supports efficient
equality matches and range-based query operations. In addition,
MongoDB can return sorted results by using the ordering in the index.
I have a sample database with a collection called pets. Pets have the following structure.
{
"_id": ObjectId(123abc123abc)
"name": "My pet's name"
}
I created an index on the name field using the following code.
db.pets.createIndex({"name":1})
What I expect is that the documents in the collection, pets, will be indexed in ascending order based on the name field during queries. The result of this index can potentially reduce the overall query time, especially if a query is strategically structured with available indices in mind. Under that assumption, the following query should return all pets sorted by name in ascending order, but it doesn't.
db.pets.find({},{"_id":0})
Instead, it returns the pets in the order that they were inserted. My conclusion is that I lack a fundamental understanding of how indices work. Can someone please help me to understand?
Yes, it is misunderstanding about how indexes work.
Indexes don't change the output of a query but the way query is processed by the database engine. So db.pets.find({},{"_id":0}) will always return the documents in natural order irrespective of whether there is an index or not.
Indexes will be used only when you make use of them in your query. Thus,
db.pets.find({name : "My pet's name"},{"_id":0}) and db.pets.find({}, {_id : 0}).sort({name : 1}) will use the {name : 1} index.
You should run explain on your queries to check if indexes are being used or not.
You may want to refer the documentation on how indexes work.
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/

DB Compound indexing best practices Mongo DB

How costly is it to index some fields in MongoDB,
I have a table where i want uniqueness combining two fields, Every where i search they suggested compound index with unique set to true. But what i was doing is " Appending both field1_field2 and making it a key, so that field2 will be always unique for field1.(and add Application logic) As i thought indexing is costly.
And also as MongoDB documentation advices us not to use Custom Object ID like auto incrementing number, I end up giving big numbers to Models like Classes, Students etc, (where i could have used easily used 1,2,3 in sql lite), I didn't think to add a new field for numbering and index that field for querying.
What are the best practices advice for production
The advantage of using compound indexes vs your own indexed field system is that compound indexes allows sorting quicker than regular indexed fields. It also lowers the size of every documents.
In your case, if you want to get the documents sorted with values in field1 ascending and in field2 descending, it is better to use a compound index. If you only want to get the documents that have some specific value contained in field1_field2, it does not really matter if you use compound indexes or a regular indexed field.
However, if you already have field1 and field2 in seperate fields in the documents, and you also have a field containing field1_field2, it could be better to use a compound index on field1 and field2, and simply delete the field containing field1_field2. This could lower the size of every document and ultimately reduce the size of your database.
Regarding the cost of the indexing, you almost have to index field1_field2 if you want to go down that route anyways. Queries based on unindexed fields in MongoDB are really slow. And it does not take much more time adding a document to a database when the document has an indexed field (we're talking 1 millisecond or so). Note that adding an index on many existing documents can take a few minutes. This is why you usually plan the indexing strategy before adding any documents.
TL;DR:
If you have limited disk space or need to sort the results, go with a compound index and delete field1_field2. Otherwise, use field1_field2, but it has to be indexed!