Adding unique index in MongoDB ignoring nulls - mongodb

I'm trying to add a unique index on a group of fields in MongoDB. Not all of those fields are present in every document, and I'd like to index only the documents that have all of the fields.
So I'm trying to run this:
db.mycollection.ensureIndex({date:1, type:1, reference:1}, {sparse: true, unique: true})
But I get an E11000 duplicate key error on a document that is missing the 'type' field (there are many such documents and they count as duplicates, but I just want to ignore them).
Is this possible in MongoDB, or is there a workaround?

Several people want this feature, and because there is no workaround for it, I would recommend voting up the feature request tickets at jira.mongodb.org:
SERVER-785 - support filtered (partial) indexes
SERVER-2193 - sparse indexes only support single field
Note that because 785 would provide a way to enforce this feature, 2193 is marked "won't fix" so it may be more productive to vote up and add your comments to 785.
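For readers on newer servers: SERVER-785 was eventually resolved, and MongoDB 3.2 and later support partial indexes through the partialFilterExpression option of createIndex. A minimal sketch under that assumption follows. The keys and options objects are what would be passed to db.mycollection.createIndex(keys, options); wouldBeIndexed is a hypothetical in-memory helper, not a MongoDB API, and only illustrates which documents the filter covers.

```javascript
// Key pattern and options for a unique partial index (MongoDB 3.2+).
// In the mongo shell: db.mycollection.createIndex(keys, options)
const keys = { date: 1, type: 1, reference: 1 };
const options = {
  unique: true,
  partialFilterExpression: {
    date: { $exists: true },
    type: { $exists: true },
    reference: { $exists: true },
  },
};

// Hypothetical helper (not a MongoDB API): mimics the $exists: true filter
// in plain JavaScript to show which documents the index would cover.
function wouldBeIndexed(doc) {
  return Object.keys(options.partialFilterExpression).every((field) =>
    Object.prototype.hasOwnProperty.call(doc, field)
  );
}

console.log(wouldBeIndexed({ date: "2013-06-05", type: "A", reference: 7 })); // true
console.log(wouldBeIndexed({ date: "2013-06-05", reference: 7 })); // false
```

Documents that do not satisfy the filter are left out of the index, so the unique constraint is not enforced on them.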

You can guarantee uniqueness by using an upsert operation instead of an insert. If a matching document already exists it is updated; if it does not exist, a new one is inserted.
test:Mongo > db.test4.ensureIndex({ a : 1, b : 1, c : 1}, {sparse : 1})
test:Mongo > db.test4.update({a : 1, b : 1}, {$set : { d : 1}}, true, false)
test:Mongo > db.test4.find()
{ "_id" : ObjectId("51ae978960d5a3436edbaf7d"), "a" : 1, "b" : 1, "d" : 1 }
test:Mongo > db.test4.update({a : 1, b : 1, c : 1}, {$set : { d : 1}}, true, false)
test:Mongo > db.test4.find()
{ "_id" : ObjectId("51ae978960d5a3436edbaf7d"), "a" : 1, "b" : 1, "d" : 1 }
{ "_id" : ObjectId("51ae97b960d5a3436edbaf7e"), "a" : 1, "b" : 1, "c" : 1, "d" : 1 }

Related

MongoDB - How do index prefixes work?

I have read this documentation: "Sort and Non-prefix Subset of an Index".
With that information I am trying to answer this MongoDB mock test question. The question is:
You have the following indexes on the things collection:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.things"
},
{
"v" : 1,
"key" : {
"a" : 1
},
"name" : "a_1",
"ns" : "test.things"
},
{
"v" : 1,
"key" : {
"c" : 1,
"b" : 1,
"a" : 1
},
"name" : "c_1_b_1_a_1",
"ns" : "test.things"
}
]
Question:
Which of the following queries will require that you load every document into RAM in order to fulfill the query? Assume that no data is being written during the query. Check all that apply.
db.things.find( { b : 1 } ).sort( { c : 1, a : 1 } )
db.things.find( { c : 1 } ).sort( { a : 1, b : 1 } )
db.things.find( { a : 1 } ).sort( { b : 1, c : 1 } )
The answer they give is...
db.things.find( { b: 1} ).sort( {c: 1, a: 1} )
Can someone help me understand why the other two options are not correct, i.e. how they use an index or an index prefix? My understanding is that the sort part has to match the order of the indexed fields. Also, the suggested correct answer does not seem to follow the rule (per the documentation) either.
I believe that, given the options and the answer, the emphasis is on this part of the question:
Which of the following queries will require that you load every document into RAM in order to fulfill the query?
So the sorting is a red herring.
find({ b: 1 }) can't use any of the indexes provided
find({ c: 1 }) can use index c_1_b_1_a_1 since it matches the prefix
find({ a: 1 }) can use index a_1
Since options #2 and #3 can use an index, they will not load every document in order to sort them, just the ones found via the index. Option #1 will have to do a full collection scan to find documents where b is 1.
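The reasoning above can be sketched in plain JavaScript. This is a simplified model, not the query planner: it assumes an index can serve the find() filter only when the filter includes the index's leading field. The name indexesForFilter is hypothetical.

```javascript
// Index key patterns from the question, listed in key order.
const indexes = {
  _id_: ["_id"],
  a_1: ["a"],
  c_1_b_1_a_1: ["c", "b", "a"],
};

// Simplified rule (an assumption of this sketch): an index can serve the
// find() filter only if the filter includes the index's leading field.
function indexesForFilter(filter) {
  const fields = Object.keys(filter);
  return Object.keys(indexes).filter((name) =>
    fields.includes(indexes[name][0])
  );
}

console.log(indexesForFilter({ b: 1 })); // []
console.log(indexesForFilter({ c: 1 })); // [ 'c_1_b_1_a_1' ]
console.log(indexesForFilter({ a: 1 })); // [ 'a_1' ]
```

Only the filter on b comes back with no candidate index, which corresponds to the full collection scan in option #1.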

MongoDB $or + sort + index. How to avoid sorting in memory?

I'm having trouble creating an index for my mongo query that would avoid the SORT stage. I am not even sure whether that is possible in my case. Here is my query with execution stats:
db.getCollection('test').find(
{
"$or" : [
{
"a" : { "$elemMatch" : { "_id" : { "$in" : [4577] } } },
"b" : { "$in" : [290] },
"c" : { "$in" : [35, 49, 57, 101, 161, 440] },
"d" : { "$lte" : 399 }
},
{
"e" : { "$elemMatch" : { "numbers" : { "$in" : ["1K0407151AC", "0K20N51150A"] } } },
"d" : { "$lte" : 399 }
}]
})
.sort({ "X" : 1, "d" : 1, "Y" : 1, "Z" : 1 }).explain("executionStats")
The fields 'm', 'a' and 'e' are arrays, that is why 'm' is not included in any index.
If you check the execution stats screenshot, you will see that memory usage is pretty close to maximum and unfortunately I had cases where the query failed to execute because of the 32MB limit.
Index for the first part of the $or query:
{
"a._id" : 1,
"X" : 1,
"d" : 1,
"Y" : 1,
"Z" : 1,
"b" : 1,
"c" : 1
}
Index for the second part of the $or query:
{
"e.numbers" : 1,
"X" : 1,
"d" : 1,
"Y" : 1,
"Z" : 1
}
The indexes are used by the query, but not for sorting. Instead of a SORT stage I would like to see a SORT_MERGE stage, but no success so far. If I run the two parts of the $or as separate queries, they are able to use the index to avoid sorting in memory. That is acceptable as a workaround, but I would then need to merge and re-sort the results in the application.
MongoDB version is 3.4.2. I checked that and that question. My query is the result. Probably I missed something?
Edit: the mongo documents look like this:
{
"_id" : "290_440_K760A03",
"Z" : "K760A03",
"c" : 440,
"Y" : "NPS",
"b" : 290,
"X" : "Schlussleuchte",
"e" : [
{
"..." : 184,
"numbers" : [
"0K20N51150A"
]
}
],
"a" : [
{
"_id" : 4577,
"..." : [
{
"..." : [
{
"..." : "R",
}
]
}
]
},
{
"_id" : 4578
}
],
"d" : 101,
"m" : [
"AT",
"BR",
"CH"
],
"moreFields":"..."
}
Edit 2: removed the field "m" from the query to reduce complexity and attached a test collection dump for anyone who wants to help :)
Here is the solution-
I just added one document in my test collection as shown in your question (edit part). Then I created below four indices-
1. {"m":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1}
2. {"a._id":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1}
3. {"m":1,"X":1,"d":1,"Y":1,"Z":1}
4. {"e.numbers":1,"X":1,"d":1,"Y":1,"Z":1}
When I then ran the given query with execution stats, it showed the SORT_MERGE stage as expected.
Here is the explanation-
MongoDB has a guideline called equality-sort-range, which says a lot about how indices should be built. I followed this rule and kept the index fields in that order, so the index should be {Equality fields, "X":1,"d":1,"Y":1,"Z":1, Range fields}. The query has a range condition on field "d" only ("d" : { "$lte" : 101 }), but "d" is already covered by the sort fields of the index ("X":1,"d":1,"Y":1,"Z":1), so the range part (i.e. field "d") can be dropped from the end of the index.
If "d" had NOT been in the sort or equality predicates, I would have added it as the range field and the index would have looked like {Equality fields, "X":1,"Y":1,"Z":1,"d":1}.
Now the index is {Equality fields, "X":1,"d":1,"Y":1,"Z":1} and only the equality fields remain to be determined. To find them I checked the find predicates of the query, which are two conditions combined by the OR operator.
The first condition has equality on "a._id", "b", "c", "m" ("d" has a range, not equality). So I would need an index like "a._id":1,"m":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1, but creating it fails because it contains two array fields, "a._id" and "m", and Mongo does not allow a compound index on parallel arrays. So I created two separate indices and let the query planner choose between them. Those are the first and second indices.
The second condition of the OR operator has "e.numbers" and "m". Both are array fields, so I had to create two indices as for the first condition, and that is how I got the third and fourth indices.
Each branch of the OR operator is planned on its own and can use its own index, so I need these indices to make sure every branch has one that starts with its equality fields and continues with the sort fields.
Note: If you are concerned about size of index then you can keep only one index from first two and one from last two. Or you can also keep all four and hint mongo to use proper index if you know it well before query planner.
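The equality-sort-range ordering described above can be sketched as a small helper. esrIndex is a hypothetical name used only for illustration; it is not part of MongoDB. Following the answer, fields queried with $in or $elemMatch are treated as equality fields here, which is an assumption of this sketch.

```javascript
// Hypothetical helper: builds an index key pattern in equality-sort-range
// order, skipping any field that an earlier group already placed.
function esrIndex(equalityFields, sortSpec, rangeFields) {
  const key = {};
  for (const field of equalityFields) key[field] = 1;
  for (const [field, dir] of Object.entries(sortSpec)) {
    if (!(field in key)) key[field] = dir;
  }
  for (const field of rangeFields) {
    if (!(field in key)) key[field] = 1;
  }
  return key;
}

const sortSpec = { X: 1, d: 1, Y: 1, Z: 1 };

// First $or branch: "d" is a range field but already appears in the sort.
console.log(JSON.stringify(esrIndex(["a._id", "b", "c"], sortSpec, ["d"])));
// {"a._id":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1}

// Second $or branch.
console.log(JSON.stringify(esrIndex(["e.numbers"], sortSpec, ["d"])));
// {"e.numbers":1,"X":1,"d":1,"Y":1,"Z":1}
```

The two results correspond to the second and fourth indices listed in the answer.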

mongodb compound index and single index performance

Consider the following scenario:
My queries always include a, and sometimes also b.
90% of the time the query will be:
{a:"somevalue"}
and 10% of the time it will be:
{a:"somevalue",b:"somevalue"}
What would be the downside (if any) of satisfying this with a compound index only, like so:
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1
},
"name" : "a_1_b_1",
"ns" : "foo.bar"
}
Or would I benefit from adding a second index that serves only the queries on a?
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1
},
"name" : "a_1_b_1",
"ns" : "foo.bar"
},
{
"v" : 1,
"key" : {
"a" : 1
},
"name" : "a_1",
"ns" : "foo.bar"
}
The manual has this to say:
If you have a collection that has a compound index on { a: 1, b: 1 }, as well as an index that consists of the prefix of that index, i.e. { a: 1 }, assuming none of the index has a sparse or unique constraints, then you can drop the { a: 1 } index.
MongoDB will be able to use the compound index in all of situations that it would have used the { a: 1 } index.
In other words, you'll most likely get as good or better performance using a single index, since MongoDB does not have to cache two indexes in memory or update two separate indexes on every insert.
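As a rough illustration of the rule quoted from the manual, the following sketch checks whether one key pattern is a prefix of another. isIndexPrefix is a hypothetical helper, and it deliberately ignores the sparse and unique caveats mentioned in the quote.

```javascript
// Returns true when `candidate` is a prefix of (or identical to) `compound`:
// same leading fields, in the same order, with the same directions.
function isIndexPrefix(candidate, compound) {
  const c = Object.entries(candidate);
  const full = Object.entries(compound);
  if (c.length > full.length) return false;
  return c.every(
    ([field, dir], i) => full[i][0] === field && full[i][1] === dir
  );
}

console.log(isIndexPrefix({ a: 1 }, { a: 1, b: 1 })); // true
console.log(isIndexPrefix({ b: 1 }, { a: 1, b: 1 })); // false
```

Here { a: 1 } is a prefix of { a: 1, b: 1 }, so the separate a_1 index in the question adds nothing for these queries.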

MongoDB Why this error : can't append to array using string field name: comments

I have a DB structure like below:
{
"_id" : 1,
"comments" : [
{
"_id" : 2,
"content" : "xxx"
}
]
}
I push a new subdocument into the comments field of the matched comment. That works.
db.test.update(
{"_id" : 1, "comments._id" : 2},
{$push : {"comments.$.comments" : {_id : 3, content:"xxx"}}}
)
After that, the DB structure is:
{
"_id" : 1,
"comments" : [
{
"_id" : 2,
"comments" : [
{
"id" : 3,
"content" : "xxx"
}
],
"content" : "xxx"
}
]
}
But when I try to push a new subdocument into the comments field of the comment whose _id is 3, there is an error:
db.test.update(
{"_id" : 1, "comments.comments.id" : 3},
{$push : {"comments.comments.$.comments" : {id : 4, content:"xxx"}}}
)
error message:
can't append to array using string field name: comments
Well, it makes total sense if you think about it. MongoDB has the advantage, and the disadvantage, of resolving certain things magically.
When you query the database for a specific regular field like this:
{ field : "value" }
the query {field:"value"} is straightforward. It would not be if the value were part of an array, but Mongo handles that for you, so when the structure is:
{ field : ["value", "anothervalue"] }
Mongo iterates through all the elements and matches "value" against them, and you don't have to think about it. It works perfectly, but only at one level, because it is impossible to guess what you want to do when there are multiple levels.
In your case the first query works because it is the single-level case:
db.test.update(
{"_id" : 1, "comments._id" : 2},
{$push : {"comments.$.comments" : {_id : 3, content:"xxx"}}}
)
This matches _id at the first level and comments._id at the second level. The match goes through one array, and Mongo is able to resolve it.
But in the second case, think about what you are asking for. Let's isolate the where clause:
{"_id" : 1, "comments.comments.id" : 3},
"Give me the records from the main collection with _id:1" (one doc)
"and the comments whose nested comments have id=3" (array * array)
The first level, comments.id, is resolved easily. The second is not, because comments is an array and one more level down gives an array of arrays. Mongo gets an array of arrays as the result, and it cannot push a document into every element of it.
The solution is to narrow your where clause so that it selects a single document in comments (it could be the first one). That is not a good solution, though, because you never know the position of the document you are looking for. Using the shell, I think the only accurate option is to do it in two steps. The following query works (it is not a real solution) and "solves" the multiple-array part by pinning it to the first record:
db.test.update(
{"_id" : 1, "comments.0.comments._id" : 3},
{$push : {"comments.0.comments.$.comments" : {id : 4, content:"xxx"}}}
)
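The two-step idea mentioned above could look roughly like this: read the document first, find the position of the outer comment in application code, and then build the update with that position filled in. buildNestedPush is a hypothetical helper, and the sketch assumes the inner comments carry an id field, as in the structure shown in the question after the first update.

```javascript
// Step 1 (not shown): fetch the document, e.g. with db.test.findOne({_id: 1}).
// Step 2: locate the outer comment that contains the inner comment, then
// build the filter and $push update with the outer position filled in.
function buildNestedPush(doc, innerId, newComment) {
  const outer = doc.comments.findIndex(
    (c) => Array.isArray(c.comments) && c.comments.some((cc) => cc.id === innerId)
  );
  if (outer === -1) return null; // no outer comment holds that inner comment
  return {
    filter: { _id: doc._id, [`comments.${outer}.comments.id`]: innerId },
    update: {
      $push: { [`comments.${outer}.comments.$.comments`]: newComment },
    },
  };
}

const doc = {
  _id: 1,
  comments: [
    { _id: 2, comments: [{ id: 3, content: "xxx" }], content: "xxx" },
  ],
};

const op = buildNestedPush(doc, 3, { id: 4, content: "xxx" });
console.log(JSON.stringify(op.filter));
// {"_id":1,"comments.0.comments.id":3}
console.log(Object.keys(op.update.$push)[0]);
// comments.0.comments.$.comments
```

The resulting filter and update would then be passed to db.test.update(op.filter, op.update). Because the position is read in a separate step, the array could change between the read and the write, so this remains a workaround rather than a robust solution.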

mongodb - how it processes $lt on subdocuments

I have a collection where one of the fields is a subdocument. I am confused about how MongoDB supports the $lt and $gt query operators on the complete subdocument.
sample:
db.test.insert({a:1, subdocA:{x:4, y:7, z:10}, b:10})
db.test.insert({a:9, subdocA:{x:2, y:70, z:5}, b:9})
db.test.insert({a:4, subdocA:{x:8, y:2, z:45}, b:19})
In the above collection, I see that mongodb supports a query like:
db.test.find({subdocA:{$lt:{x:6, y:5, z:25}}})
In fact it also supports similar queries with $gt operator. It also supports sort({subdocA:1}) on the query.
I would like to know the "logic" it uses to compare the subdocuments and thereby process the $lt, $gt operators.
I see mongodb documentation about how exact matches are processed with subdocuments. But I don't see any documentation on how $lt, $gt are handled with subdocuments.
Thanks.
You have to specify the operator for each field, naming the field with a dot (.) to reach inside the embedded document. The documentation about $gt hints at this.
So to query for subdocuments with z lower than 20, you actually search for subdocA.z being lower than 20, like this:
> db.test.find({'subdocA.z':{$lt:20}}, {_id:0})
{ "a" : 1, "subdocA" : { "x" : 4, "y" : 7, "z" : 10 }, "b" : 10 }
{ "a" : 9, "subdocA" : { "x" : 2, "y" : 70, "z" : 5 }, "b" : 9 }
You can add other criteria in the same way, here with subdocA.x lower than 3 :
> db.test.find({'subdocA.z':{$lt:20}, 'subdocA.x':{$lt:3}}, {_id:0})
{ "a" : 9, "subdocA" : { "x" : 2, "y" : 70, "z" : 5 }, "b" : 9 }
Finally, you can mix and match fields from the "base" document :
> db.test.find({'subdocA.z':{$lt:20}, 'a':{$gt:3}}, {_id:0})
{ "a" : 9, "subdocA" : { "x" : 2, "y" : 70, "z" : 5 }, "b" : 9 }
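As for the whole-subdocument comparison the question asks about: as far as the documented BSON comparison order goes, embedded documents are compared pair by pair in stored field order, first by field name and then by value, and a document that runs out of fields sorts before a longer one. The sketch below is a simplified model of that rule in plain JavaScript. compareSubdocs is a hypothetical helper, and it assumes every value is a number, whereas the real comparison also takes value types into account.

```javascript
// Simplified sketch of embedded-document ordering. Assumes every value is a
// number; real BSON comparison also takes value types into account.
function compareSubdocs(left, right) {
  const l = Object.entries(left);
  const r = Object.entries(right);
  for (let i = 0; i < Math.min(l.length, r.length); i++) {
    const [lk, lv] = l[i];
    const [rk, rv] = r[i];
    if (lk !== rk) return lk < rk ? -1 : 1; // field names differ
    if (lv !== rv) return lv < rv ? -1 : 1; // first differing value decides
  }
  return l.length - r.length; // a document that runs out of fields sorts first
}

const bound = { x: 6, y: 5, z: 25 };
const docs = [
  { a: 1, subdocA: { x: 4, y: 7, z: 10 } },
  { a: 9, subdocA: { x: 2, y: 70, z: 5 } },
  { a: 4, subdocA: { x: 8, y: 2, z: 45 } },
];

const matches = docs.filter((d) => compareSubdocs(d.subdocA, bound) < 0);
console.log(matches.map((d) => d.a)); // [ 1, 9 ]
```

Under this model the $lt query from the question is decided by x alone for the sample data, because x is the first field and differs from the bound in every document; y and z would only matter when x is equal.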