Mongo sparse index is not working as I expected - mongodb

I said 'as I expected', because I might be misunderstanding how it should work.
I have a model containing objects like this one :
{
"_id" : ObjectId("56408d76ef82679937000008"),
"_type" : "ford",
"year" : 1986,
"model" : "sierra",
"model_unique" : 1,
"__v" : 0
}
I need a compound unique index that will not allow to insert two objects with the same _type and model combination unless specified.
The way I thought I could specify that, was using the model_unique column and make the index sparse, so adding the former document twice should fail, whereas the following should be allowed (note that there is no model_unique field):
{
"_id" : ObjectId("56408e0d636779c83700000a"),
"_type" : "veridianDynamics",
"year" : 1986,
"model" : "sierra",
"__v" : 0
}
{
"_id" : ObjectId("another ID"),
"_type" : "veridianDynamics",
"year" : 2003,
"model" : "sierra",
"__v" : 0
}
I thought this would work with this index:
Schema.index({"_type": 1, "model": 1, "model_unique": 1}, { unique: true, sparse: true });
But it is actually failing with:
[MongoError: insertDocument :: caused by :: 11000 E11000 duplicate key error index: mongoose-schema-extend.vehicles.$_type_1_model_1_model_unique_1 dup key: { : "veridianDynamics", : "sierra", : null }]
So apparently it is considering that the undefined fields have a null value.
I'm using mongod --version
db version v2.6.11
And npm -v mongoose
2.14.4

From the documentation on sparse compound indexes:
Sparse compound indexes that only contain ascending/descending index
keys will index a document as long as the document contains at least
one of the keys.
What this means in your case is that only when all three components of the compound index are missing from the document, will the document be excluded from the index, and thus exempt from the unique constraint.
So the sparse index you're trying to add would allow multiple docs without any of the three keys, but for all other cases, the combination of all three fields must be unique, with any missing fields getting a value of null.
In your example docs, they both would look like the following from the perspective of the unique index:
{
"_type" : "veridianDynamics",
"model" : "sierra",
"model_unique : null
}
And thus, not unique.
FYI, there are exceptions to this rule where the existence of a geospatial or text index in your compound, sparse index changes the rules to only consider that specially indexed field when determining whether to include the document in the index.

According to the unique index documentation for missing fields
Unique Index and Missing Field
If a document does not have a value for a field, the index entry for that item will be null in any index that includes it. Thus, in many situations you will want to combine the unique constraint with the sparse option. Sparse indexes skip over any document that is missing the indexed field, rather than storing null for the index entry. Since unique indexes cannot have duplicate values for a field, without the sparse option, MongoDB will reject the second document and all subsequent documents without the indexed field. Consider the following prototype.
Therefore, it seems legitimate to think that this will also work on compound indexes.
This was reported as a bug on jira.
MongoDB developers decided not to include this functionality and closed the request
It makes more sense to exclude documents from the index if ALL fields in the index are missing. Compound indexes also serve queries on the first, first+second, etc fields in the index, and so an index on a,b,c should be able to find all the documents where a=1, not only the ones where b and/or c also have values. This is more intuitive, and should be the default behavior.
Although some suggestions were made in an effort to define a proper semantics to differentiate the two possible cases
{sparse : true, sparseIfAnyValueMissing : true}
It could be useful not only for what I describe in the question, but also for document inheritance and support partial indexing
I have the situation where one of my columns is null when I first create it, but may get set to an ID later. And when it's set to an ID it needs to be unique with another column.
Unfortunately I can't enforce this using a unique index because it will fail since many rows may have null in one of the columns. If I were using a regular RDBMS with a sparse unique multi-column index, this would work fine. Unfortunately Mongo has chosen to work in a different way from all of the RDBMS' out there and cannot support this scenario.
Given that partial indexes are not a quick thing to add and don't seem like they will be added anytime soon, why is this issue closed? Please reopen and consider implementing this issue.
Unfortunately, it is not possible yet (I hope it will be at some point)

Related

Mongodb set _id as decreasing index

I want to use mongodb's default _id but in decreasing order. I want to store posts and want to get the latest posts at the start when I use find(). I am using mongoose. I tried with
postSchema.index({_id:-1})
but it didn't work
> db.posts.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "mean-dev.posts"
}]
I dropped the database and restarted mongod. No luck with that.
Is there any way to set _id as a decreasing index at the sametime using mongodb's default index? I don't want to use sort() to sort the result according to _id decreasingly.
Short answer
You cannot a descending index on _id field. You also don't need it. Mongo will use the existing default index when doing a descending sort on the _id field.
Long answer
As stated in the documentation MongoDB index on _id field is created automatically as an ascending unique index and you can't remove it.
You also don't need to create an additional descending index on _id field because MongoDB can use the default index for sorting.
To verify that MongoDB is using index for your sorting you can use explain command:
db.coll.find().sort({_id : -1}).explain();
In the output explain command, the relevant part is
"cursor" : "BtreeCursor _id_ reverse"
which means that MongoDB is using index for sorting your query in reverse order.
actually you can use this index, just put .sort({"_id":-1}) at the end of you query
ObjectId values do not represent a strict insertion order.
From documentation: http://docs.mongodb.org/manual/reference/object-id/
IMPORTANT The relationship between the order of ObjectId values and
generation time is not strict within a single second. If multiple
systems, or multiple processes or threads on a single system generate
values, within a single second; ObjectId values do not represent a
strict insertion order. Clock skew between clients can also result in
non-strict ordering even for values, because client drivers generate
ObjectId values, not the mongod process.

mongodb: create a top-level index for a nested document instead of having to index each individual sublevel?

This question is about how I can use indexes in MongoDB to look something up in nested documents, without having to index each individual sublevel.
I have a collection "test" in MongoDB which basically goes something like this:
{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : {
"1" : { [1,2,3] },
"2" : { [4,5,6] }
}}
Scenario has multiple keys, each document can have any subset of the scenarios (i.e. from none to a subset to all). Also: Scenario can't be an array because i need it as a dictionary in Python. I created an index on the "scenario" field.
My issue is that i want to select on the collection, filtering for documents that have a certain value. So this works fine functionally:
db.test.find({"scenario.1": {$exists: true}})
However, it won't use any index i've put on scenario. Only if i put an index on the "scenario.1" an index is used. But I can have thousands (or more) scenarios (and the collection itself has 100.000s of records), so i would prefer not to!
So i tried alternatives:
db.test.find({"scenario": "1"})
This will use the index on scenario, but won't return results. Making scenario an array still gives the same index issue.
Is my question clear? Can anyone give a pointer on how I could achieve the best performance here?
P.s. I have seen this: How to Create a nested index in MongoDB? but that solution is not possible in my case (due to the amount of scenarios)
Putting an index on a subobject like scenario is useless in this case as it would only be used when you're filtering on complete scenario objects rather than individual fields (think of it as a binary blob comparison).
You either need to add an index on each of your possible fields ("scenario.1", "sceanario.2", etc.) or rework your schema to get rid of the dynamic keys by doing something like this:
{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : [
{ id: "1", value: [1,2,3] },
{ id: "2", value: [4,5,6] }
}}
Then you can add a single index to scenario.id to support the queries you need to perform.
I know you said you need scenario to be a dict and not an array, but I don't see how you have much choice.
Johnny HK's answer is a nice explained answer and should be used in general cases. I will just suggest a workaround for you to solve your issue if you have to have many scenarios and don't need complex querying. Instead of keeping values under scenario field, just hold the id of the scenario under that field, and hold the values as another field in the document and use the scenario id as the key of this field.
Example:
{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : [ "1", "2"],
"scenario_1": [1,2,3],
"scenario_2": [4,5,6]
}}
With this schema you can use index on scenario to find specific scenarios. But if you need to query for specific scenario values, you again need to have an index on each scenario value field i.e scenario_1, scenario_2, etc.. If you need to have indexes for each field, then don't change your original schema and use sparse indexes for each nested field and that might help reduce the size of your indexes.

Adding an index to a MongoDB collection hash field

I have a MongoDB collection that I would like to add an index on. For the purpose of this post, let's say the collection name is Cats. I have a hash key on the Cats collection so if you do db.cats.findOne(); it'll look like the following:
> db.cats.findOne();
{
"_id" : ObjectId("4f248f8ae4b0b775c9eb002d"),
"metaData" : {
"type" : "cute",
"id" : "4ed3b6c599114b488be52bc3"
},
....
}
I query very often (using Mongoid), with something like this:
Cat.first(:conditions => { "metaData.id" => an_id }
I'd really like to be able to take advantages of indexes here, but I'm not entirely sure if I should index all of metaData or just metaData.id (I query against id specifically, and very often).
Would love any solution to this problem because I think I can dramatically speed up queries if I do the right thing here. Also, this is a unique index.
also metaData is not an embedded document. it does not have its own collection. it is simply a hash with a 1:1 mapping in each cats object.
You can just define an index on the embedded document. This is covered here:
http://www.mongodb.org/display/DOCS/Indexes#Indexes-UsingDocumentsasKeys
For your specific example, this would be:
db.Cats.ensureIndex({ "metaData.id" : 1}, {unique : true})
To compare your results do some of your standard queries in the shell with a .explain() to compare the speed with and without the index. If you are not doing a lot of queries you might need to hint the index to use so that it doesn't cache the "best" index (don't forget there is one on _id by default). More explain info here:
http://www.mongodb.org/display/DOCS/Explain

sparse indexes and null values in mongo

I'm not sure I understand sparse indexes correctly.
I have a sparse unique index on fbId
{
"ns" : "mydb.users",
"key" : {
"fbId" : 1
},
"name" : "fbId_1",
"unique" : true,
"sparse" : true,
"background" : false,
"v" : 0
}
And I was expecting that would allow me to insert records with null as the fbId, but that throws a duplicate key exception. It only allows me to insert if the fbId property is removed completely.
Isn't a sparse index supposed to deal with that?
Sparse indexes do not contain documents that miss indexed field. However, if field exists and has value of null, it will still be indexed. So, if absense of the field and its equality to null look the same for your application and you want to maintain uniqueness of fbId, just don't insert it until you have a value for it.
You need sparse indexes when you have a large number of documents, but only a small portion of them contains some field, and you want to be able to quickly find documents by that field. Creating a normal index would be too expensive, you would just waste precious RAM on indexing documents you're not interested in.
To ensure maximum performance of the indexes, we may want to omit from indexing those documents NOT containing the field on which you are performing an index. To do this MongoDB has the sparse property that works as follows:
db.addresses.ensureIndex( { "secondAddress": 1 }, { sparse: true } );
This index will omit all the documents not containing the secondAddress field and when performing a query, those document will never be scanned.
Let me share this article about basic indexes and some of their properties:
Geospatial, Text, Hash indexes and unique and sparse properties: http://mongodbspain.com/en/2014/02/03/mongodb-indexes-part-2-geospatial-2d-2dsphere/
{a:1, b:5, c:2}
{a:8, b:15, c:7}
{a:4, b:7}
{a:3, b:10}
Let's assume that we wish to create an index on the above documents. Creating index on a & b will not be a problem. But what if we need to create an index on c. The unique constraint will not work for c keys because null value is duplicated for 2 documents. The solution in this case is to use sparse option. This option tells the database to not include the documents which misses the key. The command in concern is db.collectionName.createIndex({thing:1}, {unique:true, sparse:true}). The sparse index lets us use less space as well.
Notice that even if we have a sparse index, the database performs all documents scan especially when doing sort. This can be seen in the winning plan section of explain's result.
Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field. The index is "sparse" because it does not include all documents of a collection.

MongoDB : Indexes order and query order must match?

This question concern the internal method to manage indexes and serching Bson Documents.
When you create a multiple indexes like "index1", "index2", "index3"...the index are stored to be used during queries, but what about the order of queries and the performance resulting.
sample
index1,index2,index3----> query in the same order index1,index2,index3 (best case)
index1,index2,index3----> query in another order index2,index1,index3 (the order altered)
Many times you use nested queries including these 3 index and others items or more indexes. The order of the queries would implicate some time lost?. Must passing the queries respecting the indexes order defined or the internal architecture take care about this order search? I searching to know if i do take care about this or can make my queries in freedom manier.
Thanks.
The order of the conditions in your query does not affect whether it can use an index or no.
e.g.
typical document structure:
{
"FieldA" : "A",
"FieldB" : "B"
}
If you have an compound index on A and B :
db.MyCollection.ensureIndex({FieldA : 1, FieldB : 1})
Then both of the following queries will be able to use that index:
db.MyCollection.find({FieldA : "A", FieldB : "B"})
db.MyCollection.find({FieldB : "B", FieldA : "A"})
So the ordering of the conditions in the query do not prevent the index being used - which I think is the question you are asking.
You can easily test this out by trying the 2 queries in the shell and adding .explain() after the find. I just did this to confirm, and they both showed that the compound index was used.
however, if you run the following query, this will NOT use the index as FieldA is not being queried on:
db.MyCollection.find({FieldB : "B"})
So it's the ordering of the fields in the index that defines whether it can be used by a query and not the ordering of the fields in the query itself (this was what Lucas was referring to).
From http://www.mongodb.org/display/DOCS/Indexes:
If you have a compound index on
multiple fields, you can use it to
query on the beginning subset of
fields. So if you have an index on
a,b,c
you can use it query on
a
a,b
a,b,c
So yes, order matters. You should clarify your question a bit if you need a more precise answer.