MongoDB - How Index prefixe works?

MongoDB - How Index prefixe works? - mongodb

I have read this documentation : "Sort and Non-prefix Subset of an Index"
With that info. I am trying to answer this MongoDB mock test question, the question they have is
You have the following indexes on the things collection:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.things"
},
{
"v" : 1,
"key" : {
"a" : 1
},
"name" : "a_1",
"ns" : "test.things"
},
{
"v" : 1,
"key" : {
"c" : 1,
"b" : 1,
"a" : 1
},
"name" : "c_1_b_1_a_1",
"ns" : "test.things"
}
]
Question:
Which of the following queries will require that you load every document into RAM in order to fulfill the query? Assume that no data is being written during the query. Check all that apply.
db.things.find( { b : 1 } ).sort( { c : 1, a : 1 } )
db.things.find( { c : 1 } ).sort( { a : 1, b : 1 } )
db.things.find( { a : 1 } ).sort( { b : 1, c : 1 } )
The answer they give is...
db.things.find( { b: 1} ).sort( {c: 1, a: 1} )
Can someone help me understand why the other 2 option are not correct i.e how are they using index/Index-prefix. My understanding is the SORT part has to match indexed-column-order. Also, the suggested correct answer does not seem to meet the rule (per documentation) either.

I believe, given the options and answer, the emphasis is to find:
Which of the following queries will require that you load every document into RAM in order to fulfill the query?
So the sorting is a red-herring.
find({ b: 1 }) can't use any of the indexes provided
find({ c: 1 }) can use index c_1_b_1_a_1 since it matches the prefix
find({ a: 1 }) can use index a_1
Since options #2 and #3 can use an index, they will not load every document in order to sort them, just the ones found via the index. Option #1 will have to do a full collection scan to find documents where b is 1.

Related

MongoDB $or + sort + index. How to avoid sorting in memory?

I have an issue to generate proper index for my mongo query, which would avoid SORT stage. I am not even sure if that is possible in my case. So here is my query with execution stats:
db.getCollection('test').find(
{
"$or" : [
{
"a" : { "$elemMatch" : { "_id" : { "$in" : [4577] } } },
"b" : { "$in" : [290] },
"c" : { "$in" : [35, 49, 57, 101, 161, 440] },
"d" : { "$lte" : 399 }
},
{
"e" : { "$elemMatch" : { "numbers" : { "$in" : ["1K0407151AC", "0K20N51150A"] } } },
"d" : { "$lte" : 399 }
}]
})
.sort({ "X" : 1, "d" : 1, "Y" : 1, "Z" : 1 }).explain("executionStats")
The fields 'm', 'a' and 'e' are arrays, that is why 'm' is not included in any index.
If you check the execution stats screenshot, you will see that memory usage is pretty close to maximum and unfortunately I had cases where the query failed to execute because of the 32MB limit.
Index for the first part of the $or query:
{
"a._id" : 1,
"X" : 1,
"d" : 1,
"Y" : 1,
"Z" : 1,
"b" : 1,
"c" : 1
}
Index for the second part of the $or query:
{
"e.numbers" : 1,
"X" : 1,
"d" : 1,
"Y" : 1,
"Z" : 1
}
The indexes are used by the query, but not for sorting. Instead of SORT stage I would like too see SORT_MERGE stage, but no success for now. If I run the part queries inside $or separately, they are able to use the index to avoid sorting in a memory. As a workaround it is ok, but I would need to merge and resort the results by the application.
MongoDB version is 3.4.2. I checked that and that question. My query is the result. Probably I missed something?
Edit: mongo documents look like that:
{
"_id" : "290_440_K760A03",
"Z" : "K760A03",
"c" : 440,
"Y" : "NPS",
"b" : 290,
"X" : "Schlussleuchte",
"e" : [
{
"..." : 184,
"numbers" : [
"0K20N51150A"
]
}
],
"a" : [
{
"_id" : 4577,
"..." : [
{
"..." : [
{
"..." : "R",
}
]
}
]
},
{
"_id" : 4578
}
],
"d" : 101,
"m" : [
"AT",
"BR",
"CH"
],
"moreFields":"..."
}
Edit 2: removed the filed "m" from query to decrease complexity and attached test collection dump for someone, who wants to help :)

Here is the solution-
I just added one document in my test collection as shown in your question (edit part). Then I created below four indices-
1. {"m":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1}
2. {"a._id":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1}
3. {"m":1,"X":1,"d":1,"Y":1,"Z":1}
4. {"e.numbers":1,"X":1,"d":1,"Y":1,"Z":1}
And when I executed given query for execution stats then it shows me the SORT_MERGE state as expected.
Here is the explanation-
MongoDB has a thing called equality-sort-range which tells a lot how we should create our indices. I just followed this rule and kept the index in that order. So Here the index should be {Equality fields, "X":1,"d":1,"Y":1,"Z":1, Range fields}. You can see that the query has range on field "d" only ("d" : { "$lte" : 101 }) but "d" is already covered in SORT fields of index ("X":1,"d":1,"Y":1,"Z":1) so we can skip range part (i.e. field "d") from the end of index.
If "d" had NOT been in sort/equality predicate then I would have taken it in index for range index field and my index would have looked like {Equality fields, "X":1,"Y":1,"Z":1,"d":1}.
Now my index is {Equality fields, "X":1,"d":1,"Y":1,"Z":1} and I am just concerned about equality fields. So to figure out equality fields I just checked the query find predicates and I found there are two conditions combined by OR operator.
The first condition has equality on "a._id", "b", "c", "m" ("d" has range, not equality). So I need to create an index like "a._id":1,"m":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1 but this will give error because it has two array fields "a_id" and "m". And as we know Mongo doesn't allow compound index on parallel arrays so it will fail. So I created two separate index just to allow Mongo to use whatever is chosen by query planner. And hence I created first and second index.
The second condition of OR operator has "e.numbers" and "m". Both are arrays fields so I had to create two indices as done for first condition and that's how I got my third and fourth index.
Now we know that at a time a single query can use only and only one index so I need to create these indices because I don't know which branch of OR operator will be executed.
Note: If you are concerned about size of index then you can keep only one index from first two and one from last two. Or you can also keep all four and hint mongo to use proper index if you know it well before query planner.

Optimize query performance in MongoDB

I have a collection named App and need to query those active (active: true) apps that belong to a particular user (user_id) or are available to all users (by their _id). I use query like this
{
"active" : true,
"$or" : [
{
"user_id" : "111111111111111111111111"
},
{
"_id" : {
"$in" : [
ObjectId("222222222222222222222222"),
ObjectId("333333333333333333333333"),
ObjectId("444444444444444444444444")
]
}
}
]
}
However in db.currentOp(true) I see that this query is running very slowly: lockStats.timeLockedMicros.r is about 3000.
How can I optimize performance of this query? I already have the following indexes on App:
> db.App.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "mydb.App"
},
{
"v" : 1,
"key" : {
"active" : 1,
"created_at" : -1
},
"name" : "active_1_created_at_-1",
"ns" : "mydb.App",
"background" : true
},
{
"v" : 1,
"key" : {
"active" : 1,
"user_id" : 1
},
"name" : "active_1_user_id_1",
"ns" : "mydb.App",
"background" : true
}
]

Two issues I see here:
1) You would not need index on the boolean field active as it would have low selectivity and not benefiting query performance.
"If overall selectivity is low, and if MongoDB must read a number of documents to return results, then some queries may perform faster without indexes." source
2) You need an index for user_id because user_id cannot use the compound index you created for active_1_user_id_1
Edit: You can always check index efficiency by doing a explain(true) and look at which indexes are used for that query.

I would try to do the following:
remove all your indexes, your active field has a low cardinality (boolean) and does not help you at all, you are not using created_at, so there is no reason for it.
add an index only on user_id key
change your strings as numbers to numbers.

mongodb compound index and single index performance

Consider the following scenario:
100% of the time my query will include a in the query, and sometimes also b.
90% the query will be:
{a:"somevalue"}
and 10% it will be
{a:"somevalue",b:"somevalue"}
What would be the downside to satisfy this with a compound index only (if any)?, like so:
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1
},
"name" : "a_1_b_1",
"ns" : "foo.bar"
}
Or would i benefit from adding a second index satisfying only queries on a
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1
},
"name" : "a_1_b_1",
"ns" : "foo.bar"
},
{
"v" : 1,
"key" : {
"a" : 1
},
"name" : "a_1",
"ns" : "foo.bar"
}

The manual has this to say;
If you have a collection that has a compound index on { a: 1, b: 1 }, as well as an index that consists of the prefix of that index, i.e. { a: 1 }, assuming none of the index has a sparse or unique constraints, then you can drop the { a: 1 } index.
MongoDB will be able to use the compound index in all of situations that it would have used the { a: 1 } index.
In other words, you'll most likely get as good or better performance using a single index, since MongoDB does not have to cache two indexes in memory or update two separate indexes on every insert.

Return range of documents around ID in MongoDB

I have an ID of a document and need to return the document plus the 10 documents that come before and the 10 documents after it. 21 docs total.
I do not have a start or end value from any key. Only the limit in either direction.
Best way to do this? Thank you in advance.

Did you know that ObjectID's contain a timestamp? And that therefore they always represent the natural insertion order. So if you are looking for documents before an after a known document _id you can do this:
Our documents:
{ "_id" : ObjectId("5307f2d80f936e03d1a1d1c8"), "a" : 1 }
{ "_id" : ObjectId("5307f2db0f936e03d1a1d1c9"), "b" : 1 }
{ "_id" : ObjectId("5307f2de0f936e03d1a1d1ca"), "c" : 1 }
{ "_id" : ObjectId("5307f2e20f936e03d1a1d1cb"), "d" : 1 }
{ "_id" : ObjectId("5307f2e50f936e03d1a1d1cc"), "e" : 1 }
{ "_id" : ObjectId("5307f2e90f936e03d1a1d1cd"), "f" : 1 }
{ "_id" : ObjectId("5307f2ec0f936e03d1a1d1ce"), "g" : 1 }
{ "_id" : ObjectId("5307f2ee0f936e03d1a1d1cf"), "h" : 1 }
{ "_id" : ObjectId("5307f2f10f936e03d1a1d1d0"), "i" : 1 }
{ "_id" : ObjectId("5307f2f50f936e03d1a1d1d1"), "j" : 1 }
{ "_id" : ObjectId("5307f3020f936e03d1a1d1d2"), "j" : 1 }
So we know the _id of "f", get it and the next 2 documents:
> db.items.find({ _id: {$gte: ObjectId("5307f2e90f936e03d1a1d1cd") } }).limit(3)
{ "_id" : ObjectId("5307f2e90f936e03d1a1d1cd"), "f" : 1 }
{ "_id" : ObjectId("5307f2ec0f936e03d1a1d1ce"), "g" : 1 }
{ "_id" : ObjectId("5307f2ee0f936e03d1a1d1cf"), "h" : 1 }
And do the same in reverse:
> db.items.find({ _id: {$lte: ObjectId("5307f2e90f936e03d1a1d1cd") } })
.sort({ _id: -1 }).limit(3)
{ "_id" : ObjectId("5307f2e90f936e03d1a1d1cd"), "f" : 1 }
{ "_id" : ObjectId("5307f2e50f936e03d1a1d1cc"), "e" : 1 }
{ "_id" : ObjectId("5307f2e20f936e03d1a1d1cb"), "d" : 1 }
And that's a much better approach than scanning a collection.

Neil's answer is a good answer to the question as stated (assuming that you are using automatically generated ObjectIds), but keep in mind that there's some subtlety around the concept of the 10 documents before and after a given document.
The complete format for an ObjectId is documented here. Note that it consists of the following fields:
timestamp to 1-second resolution,
machine identifier
process id
counter
Generally if you don't specify your own _ids they are automatically generated by the driver on the client machine. So as long as the ObjectIds are generated on a single process on a client single machine, their order does indeed reflect the order in which they were generated, which in a typical application will also be the insertion order (but need not be). However if you have multiple processes or multiple client machines, the order of the ObjectIds for objects generated within a given second by those multiple sources has an unpredictable relationship to the insertion order.

Adding unique index in MongoDB ignoring nulls

I'm trying to add unique index on a group of fields in MongoDB. Not all of those fields are available in all of the documents and I'd like to index only those which have all of the fields.
So, I'm trying to run this:
db.mycollection.ensureIndex({date:1, type:1, reference:1}, {sparse: true, unique: true})
But I get an error E11000 duplicate key error index on a field which misses 'type' field (there are many of them and they are duplicate, but I just want to ignore them).
Is it possible in MongoDB or there is some workaround?

There are multiple people who want this feature and because there is no workaround for this, I would recommend voting up feature request Jira tickets in jira.mongodb.org:
SERVER-785 - support filtered (partial) indexes
SERVER-2193 - sparse indexes only support single field
Note that because 785 would provide a way to enforce this feature, 2193 is marked "won't fix" so it may be more productive to vote up and add your comments to 785.

The uniqueness, you can guarantee, using upsert operation instead of doing insert. This will make sure that if some document already exist then it will update or insert if document don't exist
test:Mongo > db.test4.ensureIndex({ a : 1, b : 1, c : 1}, {sparse : 1})
test:Mongo > db.test4.update({a : 1, b : 1}, {$set : { d : 1}}, true, false)
test:Mongo > db.test4.find()
{ "_id" : ObjectId("51ae978960d5a3436edbaf7d"), "a" : 1, "b" : 1, "d" : 1 }
test:Mongo > db.test4.update({a : 1, b : 1, c : 1}, {$set : { d : 1}}, true, false)
test:Mongo > db.test4.find()
{ "_id" : ObjectId("51ae978960d5a3436edbaf7d"), "a" : 1, "b" : 1, "d" : 1 }
{ "_id" : ObjectId("51ae97b960d5a3436edbaf7e"), "a" : 1, "b" : 1, "c" : 1, "d" : 1 }

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse