MongoDB can not create unique sparse index (duplicate key) - mongodb

I want to create a unique index over two columns where the index should allow multiple null values for the second part of the index. But:
db.model.ensureIndex({userId : 1, name : 1},{unique : true, sparse : true});
Throws a duplicate key exception: E11000 duplicate key error index: devmongo.model.$userId_1_name_1 dup key: { : "-1", : null }. I thought because of the sparse=true option the index should allow this constellation? How can I achieve this? I use MongoDB 2.6.5

Sparse compound indexes will create an index entry for a document if any of the fields exist, setting the value to null in the index for any fields that do not exist in the document. Put another way: a sparse compound index will only skip a document if all of the index fields are missing from the document.
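To make that rule concrete, here is a minimal plain-JavaScript sketch of the decision described above. It is a toy model, not MongoDB's implementation; the helper name sparseCompoundKey is hypothetical and the field names are taken from the question.

```javascript
// Toy model of how a sparse compound index decides whether to index a document.
// Not MongoDB's implementation; sparseCompoundKey is a hypothetical helper.
function sparseCompoundKey(doc, fields) {
  const anyPresent = fields.some((f) => f in doc);
  if (!anyPresent) return null; // every indexed field is missing: document is skipped
  // Missing fields are stored as null in the index entry.
  return fields.map((f) => (f in doc ? doc[f] : null));
}

const indexFields = ["userId", "name"];
const keyA = sparseCompoundKey({ userId: "-1" }, indexFields);
const keyB = sparseCompoundKey({ userId: "-1", other: 1 }, indexFields);
const skipped = sparseCompoundKey({ other: 1 }, indexFields);

// Two identical keys would violate a unique constraint.
const collides = JSON.stringify(keyA) === JSON.stringify(keyB);
console.log(keyA, keyB, skipped, collides);
```

Both documents with only userId produce the key ("-1", null), which is the pair reported in the E11000 message.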
As of v3.2, partial indexes can be used to accomplish what you're trying to do. You could use:
db.model.createIndex({userId : 1, name : 1}, { partialFilterExpression: { name: { $exists: true } }, unique: true });
which will only index documents that have a name field.
NB: MongoDB cannot use this index to handle a query by userId alone, as the index does not contain all of the documents in the collection. Also, a null in the document is considered a value: a field that has a null value exists, so such documents are still indexed.
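The effect of filtering on the existence of name can be sketched in plain JavaScript as follows. This is only a toy model under the assumption that the filter is name exists and the key is (userId, name); buildUniquePartialIndex is a hypothetical helper and the error is simulated.

```javascript
// Toy model of a unique partial index whose filter is "name exists".
// buildUniquePartialIndex is a hypothetical helper, not a MongoDB API.
function buildUniquePartialIndex(docs) {
  const seen = new Set();
  for (const doc of docs) {
    if (!("name" in doc)) continue; // filter not matched: document is not indexed
    const key = JSON.stringify([doc.userId, doc.name]);
    if (seen.has(key)) throw new Error("duplicate key (simulated): " + key);
    seen.add(key);
  }
  return seen.size;
}

// Documents without a name field never collide, because they are not indexed.
const indexedCount = buildUniquePartialIndex([
  { userId: "-1" },
  { userId: "-1" },
  { userId: "-1", name: "a" },
]);

// An explicit null counts as an existing field, so these two still collide.
let nullCollision = false;
try {
  buildUniquePartialIndex([
    { userId: "-1", name: null },
    { userId: "-1", name: null },
  ]);
} catch (e) {
  nullCollision = true;
}
console.log(indexedCount, nullCollision);
```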

A compound index should be considered as a whole: unique requires that each (userId, name) pair be unique in the collection, and sparse only exempts documents in which both userId and name are missing. The error message shows that at least two documents have the same (userId, name) pair (a missing field is treated as null).

In my case, it turns out field names are case sensitive.
So creating a compound index on {field1 : 1, field2 : 1} is not the same as {Field1 : 1, Field2 : 1}

Related

Compound index where one field can be null MongoDB

How can I create a compound index in Mongo where one of the fields may be missing or null?
For example, suppose I create a compound index on name+age for the documents below. How can I still achieve this when age is missing or null in some documents?
{
name: "Anurag",
age: "21",
},
{
name: "Nitin",
},
You can create a partial index as follows:
db.contacts.createIndex(
{ name: 1, age: 1 },
{ partialFilterExpression: { age: { $exists: true } } }
)
Explained:
As per the documentation, partial indexes only index the documents in a collection that meet a specified filter expression. By indexing a subset of the documents in a collection, partial indexes have lower storage requirements and reduced performance costs for index creation and maintenance. In this particular case, imagine your collection has 100k documents but only 5 of them have the "age" field: the partial index will include only those 5 documents, reducing index storage space and maintenance cost.
For the query optimizer to choose this partial index, the query predicate must include a condition on the name field as well as a non-null match on the age field.
The following example queries will be able to use the index:
db.contacts.find({name:"John",age:{$gt:20}})
db.contacts.find({name:"John",age:30})
The following example query can additionally be a "covered query", provided age is one of the index keys alongside name:
db.contacts.find({name:"John",age:30},{_id:0,name:1,age:1})
(this query will be highly efficient since it returns the data directly from the index)
The following example queries will not be able to use the index:
db.contacts.find({name:"John"})
db.contacts.find({name:"John",age:{$exists:false}})
db.contacts.find({name:"John",age:null})
db.contacts.find({age:20})
(the first three can match documents that have no age value and are therefore absent from the partial index; the last has no condition on name, the leading index key)
Please note that you should analyze whether you always search on the age field together with name. The name field has very good selectivity, and this index will not be used if you search only by age. If searching by age alone is a realistic use case, a good option may be to create an additional sparse/partial index on the age field only, so you can fetch the list of contacts of a certain age.

Mongo unique compound text index

I'm trying to create a Mongo index with 2 text fields, whereby either field can have a value in another document, but the same pair cannot. I am familiar with this concept in MySQL, but do not understand it in Mongo.
I would like to create a unique index on the symbol and date fields of these documents:
db.earnings_quotes.insert({"symbol":"UNF","date":"2017-01-04","quote":{"price": 5000}});
db.earnings_quotes.createIndex({symbol: 'text', date: 'text'}, {unique: true})
db.earnings_quotes.insert({symbol: 'HAL', date: '2018-01-22', quote: { "price": 10000 }});
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 11000,
"errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: sample.earnings_quotes.$symbol_text_date_text dup key: { : \"01\", : 0.6666666666666666 }"
}
})
I don't understand the error message here... In this case, neither symbol, nor date overlap with the first record.
A text index actually behaves a bit like a multikey index, it tries to cut text into bits that can be then queried using specific text search operators. Also, the order of the fields in the text index doesn't really matter (compared to a normal compound index), MongoDB will just go through all the values in both symbol and date and index those separately.
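In this case I believe that mongo splits each date on the hyphens and indexes the pieces separately, so the 01 in 2017-01-04 and the 01 in 2018-01-22 end up as the same index term, which is the "01" shown in the dup key.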
I don't think in your case you really want to do a text index, it's made for searching through long texts, not fields with single values in them.
And also, the multikey nature of the text index makes it really hard to stay unique.
My advice would be to go like this:
db.earnings_quotes.createIndex({symbol: 1, date: 1}, {unique: true})
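To illustrate why the text index collides while the regular compound index above does not, here is a rough plain-JavaScript sketch. It assumes that text indexing splits values on non-alphanumeric characters; MongoDB's real tokenizer also applies stemming and stop words, so treat the terms helper as a hypothetical approximation.

```javascript
// Rough approximation of text tokenization: split on non-alphanumeric characters.
// This is an assumption for illustration, not MongoDB's actual tokenizer.
function terms(value) {
  return value.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function textIndexTerms(doc) {
  // Field order does not matter: terms from both fields go into one set.
  return new Set([...terms(doc.symbol), ...terms(doc.date)]);
}

const firstTerms = textIndexTerms({ symbol: "UNF", date: "2017-01-04" });
const secondTerms = textIndexTerms({ symbol: "HAL", date: "2018-01-22" });

// Terms shared by both documents would break a unique text index.
const sharedTerms = [...secondTerms].filter((t) => firstTerms.has(t));

// A regular compound key compares the whole (symbol, date) pair instead.
const pairsDiffer =
  JSON.stringify(["UNF", "2017-01-04"]) !== JSON.stringify(["HAL", "2018-01-22"]);
console.log(sharedTerms, pairsDiffer); // [ '01' ] true
```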
By default, mongo uses _id as a unique key with its own index, so one solution to your problem is to store these values in the _id field.
e.g.:
{
"_id":{
"symbol" :"xyz" ,
"date" :"12-12-20" ,
}
//Other fields in collection
}
This will create a composite key.
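A small plain-JavaScript sketch of how such a composite _id behaves. It is a toy model with a hypothetical insert helper; JSON.stringify is used because it preserves field order, which mirrors the fact that MongoDB compares embedded documents field by field, in order.

```javascript
// Toy model of uniqueness on an embedded-document _id (not a MongoDB API).
const seenIds = new Set();

function insert(doc) {
  const key = JSON.stringify(doc._id); // order-sensitive, like embedded-document comparison
  if (seenIds.has(key)) return false;  // simulated duplicate key rejection
  seenIds.add(key);
  return true;
}

const r1 = insert({ _id: { symbol: "UNF", date: "2017-01-04" }, price: 5000 });
const r2 = insert({ _id: { symbol: "HAL", date: "2018-01-22" }, price: 10000 });
const r3 = insert({ _id: { symbol: "UNF", date: "2017-01-04" }, price: 1 });
// Same values but different field order: treated as a different key.
const r4 = insert({ _id: { date: "2017-01-04", symbol: "UNF" }, price: 1 });
console.log(r1, r2, r3, r4); // true true false true
```

The last insert is the caveat of this design: the application has to write the _id fields in a consistent order, otherwise logically equal pairs are stored as distinct keys.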

Created indexes on a mongodb collection, still fails while sorting a large data set

My Query below:
db.chats.find({ bid: 'someID' }).sort({start_time: 1}).limit(10).skip(82560).pretty()
I have indexes on chats collection on the fields in this order
{
"cid" : 1,
"bid" : 1,
"start_time" : 1
}
I am trying to perform sort, but when I write a query and check the result of explain(), I still get the winningPlan as
{
"stage":"SKIP",
"skipAmount":82560,
"inputStage":{
"stage":"SORT",
"sortPattern":{
"start_time":1
},
"limitAmount":82570,
"inputStage":{
"stage":"SORT_KEY_GENERATOR",
"inputStage":{
"stage":"COLLSCAN",
"filter":{
"ID":{
"$eq":"someID"
}
},
"direction":"forward"
}
}
}
}
I was expecting not to have a sort stage in the winning plan as I have indexes created for that collection.
Having no indexes results in the following error:
MongoError: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM
However, I managed to make the sort work by increasing the RAM allocation for sorts from 32 MB to 64 MB. I am looking for help with adding indexes properly.
The order of fields in an index matters. To sort query results by a field that is not at the start of the index key pattern the query must include equality conditions on all the prefix keys that precede the sort keys. The cid field is not in the query nor used for sorting, so you must leave it out. Then you put the bid field first in the index definition as you use it in the equality condition. The start_time goes after that to be used in sorting. Finally, the index must look like this:
{"bid" : 1, "start_time" : 1}
See the documentation for further reference.
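The prefix rule can be sketched as a small plain-JavaScript check. This is a simplification that ignores sort direction and multi-field sorts; indexSupportsSort is a hypothetical helper, not something MongoDB exposes.

```javascript
// Simplified check of the rule above: every index key that precedes the sort key
// must appear in the query as an equality condition.
function indexSupportsSort(indexKeys, equalityFields, sortField) {
  const pos = indexKeys.indexOf(sortField);
  if (pos === -1) return false; // sort field is not in the index at all
  return indexKeys.slice(0, pos).every((k) => equalityFields.includes(k));
}

// Original index: cid precedes start_time but is not in the query.
const originalOk = indexSupportsSort(["cid", "bid", "start_time"], ["bid"], "start_time");
// Suggested index: bid (equality) first, then start_time (sort).
const suggestedOk = indexSupportsSort(["bid", "start_time"], ["bid"], "start_time");
console.log(originalOk, suggestedOk); // false true
```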

Mongodb sparse index and general index

I have created a collection with 100 documents (fields x & y), and created a normal index on field x and a sparse index on field y, as shown below:
for(i=1;i<100;i++)db.coll.insert({x:i,y:i})
db.coll.createIndex({x:1})
db.coll.createIndex({y:1},{sparse:true})
Then, I added a few docs without fields x & y as shown below:
for(i=1;i<100;i++)db.coll.insert({z:"stringggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg"})
Looking at db.coll.stats(), I found the sizes of the indexes:
storageSize:36864
_id:32768
x_1:32768
y_1:16384
As per the definition of sparse index, only documents containing the indexed field y are considered, hence y_1 occupies less space. But _id & x_1 indexes seem to contain all the documents in them.
If I perform a query - db.coll.find({z:99}).explain('executionStats')
It is doing a COLLSCAN and fetching the record. If this is the case, I am not clear on why MongoDB stores all the documents under the _id & x_1 indexes, as it seems a waste of storage space. Please help me understand. Pardon my ignorance if I missed something.
Thank you for your help.
In a "normal" index, missing fields are indexed with a null value. For example, if you have index of {a:1} and you insert {b:10} into the collection, the document will be indexed as a: null.
You can see this behaviour using a unique index:
> db.test.createIndex({a:1}, {unique:true})
{
"createdCollectionAutomatically" : true,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.test.insert({b:1})
WriteResult({ "nInserted" : 1 })
> db.test.insert({c:1})
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 11000,
"errmsg" : "E11000 duplicate key error collection: test.test index: a_1 dup key: { : null }"
}
})
Both {b:1} and {c:1} are indexed as a: null, hence the duplicate key error message.
In your collection, you have 200 documents:
100 documents with {x:..., y:...}
100 documents with {z:...}
And your indexes are:
{x:1} (normal index)
{y:1} (sparse index)
The documents will be indexed as follows:
200 documents will be in the _id index, which is always created by MongoDB
200 documents will be in the {x:1} index, from {x:.., y:..} and {z:..} documents
100 documents will be in the {y:1} index
Note that the index sizes you posted show the same ratio as the numbers above.
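The counts above can be reproduced with a toy model in plain JavaScript. It only counts which documents would get an index entry; entryCount is a hypothetical helper and the z value is shortened for readability.

```javascript
// Toy model: a normal index gets an entry for every document (missing fields
// are indexed as null), while a sparse index only covers documents that have the field.
function entryCount(docs, field, sparse) {
  return docs.filter((d) => !sparse || field in d).length;
}

const docs = [];
for (let i = 1; i <= 100; i++) docs.push({ x: i, y: i });
for (let i = 1; i <= 100; i++) docs.push({ z: "string" });

const normalX = entryCount(docs, "x", false); // all documents
const sparseY = entryCount(docs, "y", true);  // only documents with y
console.log(normalX, sparseY); // 200 100
```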
Regarding your questions:
The _id index is for MongoDB internal use, see Default _id index. You cannot drop this index, and attempts to remove it could render your database inaccessible.
The x_1 index is there because you told MongoDB to build it. It contains all the documents in your collection because it's a normal index. In the case of your collection, half of the values in the index are null.
The sparse y_1 index is half the size of the x_1 index because only 100 out of 200 documents contain the y field.
The query db.coll.find({z:99}) does not use any index because you don't have an index on the z field, hence it's doing a collection scan.
For more information about indexing, please see Create Indexes to Support Your Queries

MongoDB Covered Query For Two Fields Without Compound Index

Can you perform a MongoDB covered query for two fields, for example
db.collection.find( { _id: 1, a: 2 } )
without having a compound index such as
db.collection.ensureIndex( { _id: 1, a: 1 } )
but instead having only one index for _id (you get that by default) and another index for field "a", as in
db.collection.ensureIndex( { a: 1 } )
In other words, I'd like to know if in order to perform a covered query for two fields I need a compound index vs. needing only two single (i.e., not compound) indexes, one for each field.
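Queries generally use only one index, and a covered query in particular needs all of the queried and returned fields to be in that same index, so two separate single-field indexes will not cover it.
Your example shows _id as one of the elements of your index? _id needs to be unique in a collection, so it wouldn't make sense to make a compound index of _id and something else.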
If you instead had:
db.collection.ensureIndex( { a: 1, b: 1 })
You could then use this index for queries on a alone (a is a prefix of the index), or as a compound index for queries on a together with b.
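As a rough plain-JavaScript sketch of the covering condition: a query can only be covered when every queried field and every returned field is a key of one and the same index. The canCover helper is hypothetical and ignores further restrictions such as array fields.

```javascript
// Simplified covering check against a single index (hypothetical helper).
function canCover(indexKeys, queryFields, returnedFields) {
  const inIndex = (f) => indexKeys.includes(f);
  return queryFields.every(inIndex) && returnedFields.every(inIndex);
}

// Two separate single-field indexes: neither one contains both _id and a.
const byIdIndex = canCover(["_id"], ["_id", "a"], ["_id", "a"]);
const byAIndex = canCover(["a"], ["_id", "a"], ["_id", "a"]);
// A compound index on a and b covers a query on a that returns a and b.
const compound = canCover(["a", "b"], ["a"], ["a", "b"]);
console.log(byIdIndex, byAIndex, compound); // false false true
```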