MongoDB sparse index and general index

I have created a collection with 100 documents (fields x & y), and created a normal index on field x and a sparse index on field y, as shown below:
for (i = 1; i <= 100; i++) db.coll.insert({x: i, y: i})
db.coll.createIndex({x:1})
db.coll.createIndex({y:1},{sparse:true})
Then, I added a few docs without fields x & y as shown below:
for (i = 1; i <= 100; i++) db.coll.insert({z: "stringggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg"})
Looking at db.coll.stats(), I found the sizes of the indexes:
storageSize:36864
_id:32768
x_1:32768
y_1:16384
As per the definition of sparse index, only documents containing the indexed field y are considered, hence y_1 occupies less space. But _id & x_1 indexes seem to contain all the documents in them.
If I run the query db.coll.find({z:99}).explain('executionStats'), it does a COLLSCAN to fetch the record. If this is the case, I am not clear on why MongoDB stores all the documents in the _id & x_1 indexes, as it seems like a waste of storage space. Please help me understand. Pardon my ignorance if I missed something.
Thank you for your help.

In a "normal" index, missing fields are indexed with a null value. For example, if you have index of {a:1} and you insert {b:10} into the collection, the document will be indexed as a: null.
You can see this behaviour using a unique index:
> db.test.createIndex({a:1}, {unique:true})
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
> db.test.insert({b:1})
WriteResult({ "nInserted" : 1 })
> db.test.insert({c:1})
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 11000,
        "errmsg" : "E11000 duplicate key error collection: test.test index: a_1 dup key: { : null }"
    }
})
Both {b:1} and {c:1} are indexed as a: null, hence the duplicate key error message.
In your collection, you have 200 documents:
100 documents with {x:..., y:...}
100 documents with {z:...}
And your indexes are:
{x:1} (normal index)
{y:1} (sparse index)
The documents will be indexed as follows:
200 documents will be in the _id index, which is always created by MongoDB
200 documents will be in the {x:1} index, from {x:.., y:..} and {z:..} documents
100 documents will be in the {y:1} index
Note that the index sizes you posted show the same ratio as the numbers above.
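If you want to double-check this yourself, one rough way (a sketch, not from the original post) is to force each index with hint() and count what comes back:
// Documents without x are indexed under x: null, so forcing the x_1 index
// still finds the {z: ...} documents:
db.coll.find({x: null}).hint({x: 1}).itcount()   // ~100

// The sparse y_1 index simply has no entries for documents missing y:
db.coll.find({y: null}).hint({y: 1}).itcount()   // 0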
Regarding your questions:
The _id index is for MongoDB internal use, see Default _id index. You cannot drop this index, and attempts to remove it could render your database inaccessible.
The x_1 index is there because you told MongoDB to build it. It contains all the documents in your collection because it's a normal index. In the case of your collection, half of the values in the index are null.
The sparse y_1 index is half the size of the x_1 index because only 100 out of 200 documents contain the y field.
The query db.coll.find({z:99}) does not use any index because you don't have an index on the z field, hence it's doing a collection scan.
For more information about indexing, please see Create Indexes to Support Your Queries.
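If you actually need to query on z, a minimal sketch (not part of the original question) is to index it and confirm the plan changes:
db.coll.createIndex({z: 1})

// The winning plan should now be an IXSCAN on z_1 instead of a COLLSCAN:
db.coll.find({z: 99}).explain('executionStats')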

Related

If I have both simple and compound index on a field, which one gets used in queries containing that field?

I have a field "productLowerCase" in my mongo documents. I created 2 indices
1. simple
{"productLowerCase" : 1}
2. compound
{"productLowerCase" : 1, "timestamp.milliseconds" : -1}
So If I run a query which has only productLowerCase, say:
db.coll.find({"productLowerCase" : {$regex : /^cap/}})
Which index will get used?
In this case Mongo will use the {"productLowerCase" : 1} index, but you can remove that index: once you have the compound index, you can query on its first field alone without any loss of performance.
Besides this, you can use explain() to check which index your query actually uses.
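For example, a rough sketch of that check, assuming both indexes from the question already exist:
// The simple index is redundant: the compound index's prefix already covers it.
db.coll.dropIndex({"productLowerCase": 1})

// explain() should still report an IXSCAN using the compound index for this query:
db.coll.find({"productLowerCase": {$regex: /^cap/}}).explain("queryPlanner")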

Mongo unique compound text index

I'm trying to create a Mongo index on 2 text fields where either field on its own may repeat across documents, but the same pair cannot. I am familiar with this concept in MySQL, but do not understand it in Mongo.
I would like to create a unique index on the symbol and date fields of these documents:
db.earnings_quotes.insert({"symbol":"UNF","date":"2017-01-04","quote":{"price": 5000}});
db.earnings_quotes.createIndex({symbol: 'text', date: 'text'}, {unique: true})
db.earnings_quotes.insert({symbol: 'HAL', date: '2018-01-22', quote: { "price": 10000 }});
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 11000,
        "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: sample.earnings_quotes.$symbol_text_date_text dup key: { : \"01\", : 0.6666666666666666 }"
    }
})
I don't understand the error message here... In this case, neither symbol, nor date overlap with the first record.
A text index actually behaves a bit like a multikey index: it cuts text into tokens that can then be queried using the dedicated text search operators. Also, the order of the fields in a text index doesn't really matter (unlike in a normal compound index); MongoDB simply goes through all the values in both symbol and date and indexes the resulting tokens separately.
In this case I believe Mongo splits the date strings on the hyphens, so both "2017-01-04" and "2018-01-22" produce an "01" token, and that token collides in the unique index (hence the dup key: { : "01", ... } in the error).
I don't think you really want a text index in your case; it's made for searching through long text, not fields holding single values.
And also, the multikey nature of the text index makes it really hard to keep it unique.
My advice would be to go like this:
db.earnings_quotes.createIndex({symbol: 1, date: 1}, {unique: true})
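A quick sketch of how that behaves, reusing the documents from the question and assuming the text index has been dropped first:
// Drop the unique text index so it no longer interferes (name taken from the
// error message above):
db.earnings_quotes.dropIndex("symbol_text_date_text")

// With the {symbol: 1, date: 1} unique index in place, a new (symbol, date)
// pair inserts fine...
db.earnings_quotes.insert({symbol: "HAL", date: "2018-01-22", quote: {price: 10000}})

// ...while repeating an existing pair fails with E11000, which is exactly
// the constraint you wanted:
db.earnings_quotes.insert({symbol: "UNF", date: "2017-01-04", quote: {price: 9999}})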
By default Mongo uses _id as a unique key and index, so one solution to your problem is to save your data in the _id field.
e.g.:
{
    "_id" : {
        "symbol" : "xyz",
        "date" : "12-12-20"
    }
    // other fields in the collection
}
This will create a composite key.
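A small sketch of that approach with placeholder values; note that subdocument equality is field-order sensitive, so the _id has to be built with its fields in the same order every time:
db.earnings_quotes.insert({_id: {symbol: "UNF", date: "2017-01-04"}, quote: {price: 5000}})

// The built-in unique _id index now rejects the same (symbol, date) pair:
db.earnings_quotes.insert({_id: {symbol: "UNF", date: "2017-01-04"}, quote: {price: 9999}})   // E11000

// Beware: {_id: {date: ..., symbol: ...}} with the fields swapped would be
// treated as a different _id, so keep the field order consistent.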

Sparse index does not improve sort in MongoDB?

I have a collection with >100k documents.
A sample document looks like this:
{
    "created_at" : 1545039649,
    "priority" : 3,
    "id" : 68,
    "name" : "document68"
}
db.mycol.find().sort({created_at:1})
and
db.mycol.find().sort({priority:1})
both result in the following error:
Error: error: {
    "ok" : 0,
    "errmsg" : "Executor error during find command: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.",
    "code" : 96,
    "codeName" : "OperationFailed"
}
Then I indexed these fields.
db.mycol.createIndex({'priority':1})
db.mycol.createIndex({'created_at':1}, {sparse:true})
Added sparse index to created_at as it is a mandatory field.
Now
db.mycol.find().sort({priority:1})
gives the result. But
db.mycol.find().sort({created_at:1})
still results in the same error.
The sparse index can only be used when you filter by created_at: {$exists: true}.
The reason is that all the other records are not part of the index, yet they are still supposed to appear in the result (probably at the end).
Maybe you don't have to make the index sparse at all? A sparse index only makes sense when most of the records do not have the field (otherwise you don't save much index storage anyway), and created_at sounds like a field most records would have.
Added sparse index to created_at as it is a mandatory field.
Actually, it is the other way around: You only want a sparse index when the field is optional (and quite rare).
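Putting that together, a rough sketch of the two options (collection and field names as in the question):
// Option 1: keep the sparse index, but only sort documents that actually
// have the field, so the sparse index can satisfy the sort:
db.mycol.find({created_at: {$exists: true}}).sort({created_at: 1})

// Option 2: replace the sparse index with a regular one, so every document
// is in the index and the plain sort can use it:
db.mycol.dropIndex({created_at: 1})
db.mycol.createIndex({created_at: 1})
db.mycol.find().sort({created_at: 1})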

Created indexes on a mongodb collection, still fails while sorting a large data set

My Query below:
db.chats.find({ bid: 'someID' }).sort({start_time: 1}).limit(10).skip(82560).pretty()
I have indexes on chats collection on the fields in this order
{
    "cid" : 1,
    "bid" : 1,
    "start_time" : 1
}
I am trying to perform a sort, but when I run the query and check the result of explain(), I still get this winningPlan:
{
    "stage" : "SKIP",
    "skipAmount" : 82560,
    "inputStage" : {
        "stage" : "SORT",
        "sortPattern" : {
            "start_time" : 1
        },
        "limitAmount" : 82570,
        "inputStage" : {
            "stage" : "SORT_KEY_GENERATOR",
            "inputStage" : {
                "stage" : "COLLSCAN",
                "filter" : {
                    "ID" : {
                        "$eq" : "someID"
                    }
                },
                "direction" : "forward"
            }
        }
    }
}
I was expecting not to have a SORT stage in the winning plan, as I have indexes created for that collection.
Having no indexes results in the following error:
MongoError: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM
However, I managed to make the sort work by increasing the in-memory sort allocation from 32 MB to 64 MB. I am looking for help with adding the indexes properly.
The order of fields in an index matters. To sort query results by a field that is not at the start of the index key pattern, the query must include equality conditions on all the prefix keys that precede the sort keys. The cid field is neither in the query nor used for sorting, so leave it out. Put the bid field first in the index definition, since you use it in an equality condition, and put start_time after it so it can be used for sorting. The index should therefore look like this:
{"bid" : 1, "start_time" : 1}
See the documentation for further reference.
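A minimal sketch of that change and how to verify it (field names as in the question):
db.chats.createIndex({bid: 1, start_time: 1})

// With the equality on bid and the sort on start_time both covered by the
// index prefix, explain() should now show an IXSCAN instead of a COLLSCAN
// followed by an in-memory SORT:
db.chats.find({bid: 'someID'})
        .sort({start_time: 1})
        .skip(82560)
        .limit(10)
        .explain('executionStats')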

MongoDB can not create unique sparse index (duplicate key)

I want to create a unique index over two columns where the index should allow multiple null values for the second part of the index. But:
db.model.ensureIndex({userId : 1, name : 1},{unique : true, sparse : true});
Throws a duplicate key exception: E11000 duplicate key error index: devmongo.model.$userId_1_name_1 dup key: { : "-1", : null }. I thought that because of the sparse:true option the index would allow this combination? How can I achieve this? I am using MongoDB 2.6.5.
Sparse compound indexes will create an index entry for a document if any of the fields exist, setting the value to null in the index for any fields that do not exist in the document. Put another way: a sparse compound index will only skip a document if all of the index fields are missing from the document.
As of v3.2, partial indexes can be used to accomplish what you're trying to do. You could use:
db.model.ensureIndex({userId : 1, name : 1}, { partialFilterExpression: { name: { $exists: true } }, unique: true });
which will only index documents that have a name field.
NB: This index cannot be used by Mongo to handle a query by userId alone, as it does not contain all of the documents in the collection. Also, an explicit null in a document is considered a value: a field set to null still exists as far as the partial filter is concerned.
The compound index should be considered as a whole: unique requires that the (userId, name) pair be unique across the collection, and sparse only means that a document missing both userId and name is skipped by the index. The error message shows that there are at least two documents whose (userId, name) pairs are equivalent (if a field is missing, its value is treated as null).
In my case, it turned out that field names are case sensitive, so creating a compound index on {field1 : 1, field2 : 1} is not the same as creating one on {Field1 : 1, Field2 : 1}.
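A small illustration of that pitfall (hypothetical field names):
// These create two different indexes, because index keys are matched against
// document field names case-sensitively:
db.model.createIndex({field1: 1, field2: 1})
db.model.createIndex({Field1: 1, Field2: 1})
db.model.getIndexes()   // lists both field1_1_field2_1 and Field1_1_Field2_1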