Sparse Index in MongoDB not working? - mongodb

I'm trying to understand sparse index in MongoDB. I understand that if I do this :
> db.check.ensureIndex({"id":1},{sparse:true, unique:true})
I can only insert documents where the id field is not a duplicate and is also not absent.
Hence, I tried,
> db.check.insert({id:1})
> db.check.insert({id:1})
which, as I expected, gave :
E11000 duplicate key error index: test.check.$id_1 dup key: { : 1.0 }
However, inserting a document with a nonexistent id field:
> db.check.insert({})
works! What is going wrong?

A sparse unique index means that a document doesn't need to have the indexed field(s), but when it has that field, it must be unique.
You can add any number of documents to the collection when the field is absent. Note that when you insert an empty document, the _id field gets an auto-generated ObjectId, which is (as good as guaranteed) unique.

That's pretty much what sparse means. From the docs:
Sparse indexes only contain entries for documents that have the indexed field. Any document that is missing the field is not indexed.
In other words, your missing id field makes the index not even consider that entry for a unique check.
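To make the behaviour concrete, here is a minimal in-memory sketch of how a sparse unique index treats documents. It is a toy model, not MongoDB's implementation; the class name SparseUniqueIndex and its insert method are made up for illustration.

```javascript
// Toy in-memory model of a sparse unique index; NOT MongoDB's implementation.
class SparseUniqueIndex {
  constructor(field) {
    this.field = field;
    this.keys = new Set(); // indexed values seen so far
  }

  // Returns true if the document got an index entry, false if it was skipped.
  insert(doc) {
    // Sparse: a document without the field gets no index entry,
    // so the uniqueness check never runs for it.
    if (!Object.prototype.hasOwnProperty.call(doc, this.field)) {
      return false;
    }
    const key = JSON.stringify(doc[this.field]);
    if (this.keys.has(key)) {
      throw new Error(`duplicate key: ${key}`);
    }
    this.keys.add(key);
    return true;
  }
}

const index = new SparseUniqueIndex("id");
index.insert({ id: 1 }); // indexed
index.insert({});        // skipped, allowed
index.insert({});        // skipped again, still allowed

let rejected = false;
try {
  index.insert({ id: 1 }); // second id: 1 violates uniqueness
} catch (err) {
  rejected = true;
}
console.log(rejected, index.keys.size);
```

Only documents that actually carry the field ever reach the duplicate check, which is why the empty insert in the question succeeds.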

Related

How to index and sorting with Pagination using custom field in MongoDB ex: name instead of id

https://scalegrid.io/blog/fast-paging-with-mongodb/
Example : {
_id,
name,
company,
state
}
I've gone through the two scenarios explained in the above link, and it says that sorting by ObjectId gives good performance when retrieving and sorting results. Instead of the default sort on ObjectId, I want to index my own custom fields "name" and "company" and sort and paginate on these two fields (both hold string values).
I am not sure how to use $gt or $lt on a name, and I'm currently blocked on how to provide pagination when a user sorts by name.
How do I index and paginate on two fields?
The answer to your question is:
db.Example.createIndex( { name: 1, company: 1 } )
As for pagination, the explanation in the link you shared in your question is good enough. For example, to filter:
db.Example.find({ name: "John", company: "Acme" }).limit(10);
For sorting, skip the documents before the requested page and limit the result to the page size:
db.Example.find().sort({ name: 1, company: 1 }).skip(userPassedSkip).limit(userPassedPageSize);
If the user requests documents 21-30 after sorting on name and then company, both in ascending order:
db.Example.find().sort({ name: 1, company: 1 }).skip(20).limit(10);
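The question also asks how $gt can apply to a name. Strings are ordered too, so range-based paging can work on (name, company) much as it does on _id. The sketch below builds such a filter; nextPageFilter is a hypothetical helper, and it assumes that (name, company) pairs are unique. If they are not, _id would be needed as an extra tie-breaker.

```javascript
// Hypothetical helper: builds the filter for the page that follows the
// last document the user saw, when sorting by { name: 1, company: 1 }.
// Assumes (name, company) pairs are unique; otherwise add _id as a tie-breaker.
function nextPageFilter(last) {
  if (!last) {
    return {}; // first page: no lower bound
  }
  return {
    $or: [
      // a later name
      { name: { $gt: last.name } },
      // the same name, but a later company
      { name: last.name, company: { $gt: last.company } },
    ],
  };
}

const filter = nextPageFilter({ name: "John", company: "Acme" });
console.log(JSON.stringify(filter));

// In the mongo shell the filter would be used roughly like this:
// db.Example.find(filter).sort({ name: 1, company: 1 }).limit(10)
```

Unlike skip, this approach does not have to walk past all earlier documents, but it only supports moving from one page to the next rather than jumping to an arbitrary page.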
For a basic understanding of indexing in MongoDB:
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures, that store a small portion of the collection’s data set in an easy to traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field.
Default _id Index
MongoDB creates a unique index on the _id field during the creation of a collection. The _id index prevents clients from inserting two documents with the same value for the _id field. You cannot drop this index on the _id field.
Create an Index
Syntax to execute on Mongo Shell
db.collection.createIndex( <key and index type specification>, <options> )
Ex:
db.collection.createIndex( { name: -1 } )
Use 1 for ascending order and -1 for descending order.
The createIndex() method only creates an index if an index of the same specification does not already exist.
Index Types
MongoDB provides different index types to support specific types of data and queries, but I would like to mention two important ones:
1. Single Field
In addition to the MongoDB-defined _id index, MongoDB supports the creation of user-defined ascending/descending indexes on a single field of a document.
2. Compound Index
MongoDB also supports user-defined indexes on multiple fields, i.e. compound indexes.
The order of fields listed in a compound index has significance. For instance, if a compound index consists of { name: 1, company: 1 }, the index sorts first by name and then, within each name value, sorts by company.
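As an illustration of that ordering, the comparator below sorts plain objects the way the keys of a { name: 1, company: 1 } index are ordered. The sample documents are invented, and the plain string comparison is an assumption of this sketch (no collation).

```javascript
// Compare two documents the way a { name: 1, company: 1 } index orders its
// keys: by name first, then by company within equal names.
// Uses plain string comparison, which is an assumption of this sketch.
function byNameThenCompany(a, b) {
  if (a.name !== b.name) return a.name < b.name ? -1 : 1;
  if (a.company !== b.company) return a.company < b.company ? -1 : 1;
  return 0;
}

// Invented sample data.
const docs = [
  { name: "Mia", company: "Zeta" },
  { name: "Ann", company: "Orbit" },
  { name: "Mia", company: "Acme" },
  { name: "Ann", company: "Birch" },
];

const ordered = [...docs].sort(byNameThenCompany);
console.log(ordered.map((d) => `${d.name}/${d.company}`).join(", "));
// prints "Ann/Birch, Ann/Orbit, Mia/Acme, Mia/Zeta"
```

Because company is only ordered within each name, such an index can serve a sort on name alone or on name and company together, but not a sort on company alone.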
Source for my understanding and answer, and the place to learn more: the MongoDB indexing documentation.

Get Number of Documents in MongoDB Index

As per the title, I would like to know if there is a way to get the number of documents in a MongoDB index.
To be clear, I am not looking for either of the following:
How to get the number of documents in a collection -- .count().
How to get the size of an index -- .stats().
An index references all of the documents in its collection unless the index is a sparse index or a partial index. From the docs:
Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field. The index is “sparse” because it does not include all documents of a collection. By contrast, non-sparse indexes contain all documents in a collection, storing null values for those documents that do not contain the indexed field.
Partial indexes only index the documents in a collection that meet a specified filter expression
So ...
The answer for non-sparse, non-partial indexes is db.collection.count()
The answer for sparse and partial indexes could be inferred by running a query with no criteria, hinting on that index and then counting the results. For example:
db.collection.find().hint('index_name_here').count()
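To see which documents such a count would include, here is a small in-memory model of the two rules quoted above. It is only a sketch over plain objects: countSparseEntries and countPartialEntries are hypothetical helpers, and the predicate passed to the second one stands in for a partial index's filter expression.

```javascript
// Toy model: how many documents would get an entry in a sparse index on `field`?
// A field that is present but null still counts; only a missing field is skipped.
function countSparseEntries(docs, field) {
  return docs.filter((doc) =>
    Object.prototype.hasOwnProperty.call(doc, field)
  ).length;
}

// Toy model of a partial index: `matches` stands in for the filter expression.
function countPartialEntries(docs, matches) {
  return docs.filter(matches).length;
}

// Invented sample data.
const sample = [
  { _id: 1, email: "a@example.com", age: 30 },
  { _id: 2, email: null, age: 17 },
  { _id: 3, age: 45 },
];

console.log(countSparseEntries(sample, "email"));                  // 2
console.log(countPartialEntries(sample, (doc) => doc.age >= 18));  // 2
```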

Mongodb indexing optional fields

I have some fields in my mongodb collection that are optional parts of a search. How can I index this query consistently (i.e. every query, regardless of parameters will use an index) if I don't know what fields the user might be querying?
You can use a Sparse Index
Sparse indexes only contain entries for documents that have the
indexed field, even if the index field contains a null value. The
index skips over any document that is missing the indexed field. The
index is “sparse” because it does not include all documents of a
collection. By contrast, non-sparse indexes contain all documents in a
collection, storing null values for those documents that do not
contain the indexed field.
db.addresses.createIndex( { "xmpp_id": 1 }, { sparse: true } )

Add _id when ensuring index?

I am building a webapp using Codeigniter (PHP) and MongoDB.
I am creating indexes and have one question.
If I am querying on three fields (_id, status, type) and want to
create an index do I need to include _id when ensuring the index like this:
db.comments.ensureIndex({_id: 1, status : 1, type : 1});
or will this do?
db.comments.ensureIndex({status : 1, type : 1});
You would need to explicitly include _id in your ensureIndex call if you wanted to include it in your compound index. But because filtering by _id already provides selectivity of a single document that's very rarely the right thing to do. I think it would only make sense if your documents are very large and you're trying to use covered indexes.
MongoDB will currently only use one index per query with the exception of $or queries. If your common query will always be searching on those three fields (_id, status, type) then a compound index would be helpful.
From within the DB shell you can use the explain() command on your query to get information on the indexes used.
You don't need to explicitly create an index on the _id field; it's done automatically. See the MongoDB documentation:
The _id Index
For all collections except capped collections, an index is automatically created for the _id field. This index is special and cannot be deleted. The _id index enforces uniqueness for its keys (except for some situations with sharding).

sparse indexes and null values in mongo

I'm not sure I understand sparse indexes correctly.
I have a sparse unique index on fbId
{
"ns" : "mydb.users",
"key" : {
"fbId" : 1
},
"name" : "fbId_1",
"unique" : true,
"sparse" : true,
"background" : false,
"v" : 0
}
And I was expecting that would allow me to insert records with null as the fbId, but that throws a duplicate key exception. It only allows me to insert if the fbId property is removed completely.
Isn't a sparse index supposed to deal with that?
Sparse indexes do not contain documents that lack the indexed field. However, if the field exists and has a value of null, it will still be indexed. So, if the absence of the field and a null value mean the same thing to your application and you want to maintain uniqueness of fbId, just don't insert the field until you have a value for it.
You need sparse indexes when you have a large number of documents, but only a small portion of them contain some field, and you want to be able to quickly find documents by that field. Creating a normal index would be too expensive: you would just waste precious RAM on indexing documents you're not interested in.
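One way to follow the advice about leaving the field out is to strip null values on the application side before inserting. The sketch below does that for top-level fields only; omitNullFields is a hypothetical helper, not part of any MongoDB driver.

```javascript
// Hypothetical helper: drop top-level fields whose value is null or undefined,
// so the document gets no entry in a sparse index on those fields.
function omitNullFields(doc) {
  return Object.fromEntries(
    Object.entries(doc).filter(
      ([, value]) => value !== null && value !== undefined
    )
  );
}

const user = omitNullFields({ name: "Ann", fbId: null });
console.log(JSON.stringify(user)); // {"name":"Ann"}

// The cleaned document could then be inserted, e.g. db.users.insert(user)
```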
To get the most out of an index, you may want to leave out the documents that do NOT contain the field you are indexing. For this, MongoDB has the sparse property, which works as follows:
db.addresses.ensureIndex( { "secondAddress": 1 }, { sparse: true } );
This index will omit all documents that do not contain the secondAddress field, and when a query uses the index, those documents will never be scanned.
Let me share this article about basic indexes and some of their properties:
Geospatial, Text, Hash indexes and unique and sparse properties: http://mongodbspain.com/en/2014/02/03/mongodb-indexes-part-2-geospatial-2d-2dsphere/
{a:1, b:5, c:2}
{a:8, b:15, c:7}
{a:4, b:7}
{a:3, b:10}
Let's assume that we wish to create a unique index on the above documents. Creating it on a or b will not be a problem. But what if we need one on c? The unique constraint will fail for c because two documents lack the field, so the null value is duplicated. The solution in this case is to use the sparse option, which tells the database not to include documents that are missing the key. The command in question is db.collectionName.createIndex({c:1}, {unique:true, sparse:true}). The sparse index uses less space as well.
Note that even with a sparse index, the database may still scan all documents, especially when sorting, because the sparse index alone would give an incomplete result. This can be seen in the winning plan section of explain's output.
Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field. The index is "sparse" because it does not include all documents of a collection.