mongodb compound index and single index performance

Consider the following scenario:
My queries always include a, and sometimes also b.
90% of the time the query will be:
{a:"somevalue"}
and 10% of the time it will be:
{a:"somevalue",b:"somevalue"}
What would be the downside, if any, of serving both queries with a single compound index, like so:
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1
},
"name" : "a_1_b_1",
"ns" : "foo.bar"
}
Or would I benefit from adding a second index that serves only the queries on a?
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1
},
"name" : "a_1_b_1",
"ns" : "foo.bar"
},
{
"v" : 1,
"key" : {
"a" : 1
},
"name" : "a_1",
"ns" : "foo.bar"
}

The manual has this to say:
If you have a collection that has a compound index on { a: 1, b: 1 }, as well as an index that consists of the prefix of that index, i.e. { a: 1 }, assuming none of the index has a sparse or unique constraints, then you can drop the { a: 1 } index.
MongoDB will be able to use the compound index in all of situations that it would have used the { a: 1 } index.
In other words, you'll most likely get as good or better performance using a single index, since MongoDB does not have to cache two indexes in memory or update two separate indexes on every insert.
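To make the prefix rule quoted above concrete, here is a minimal sketch in plain JavaScript that needs no database. The helper name usablePrefixLength is hypothetical (it is not a MongoDB API), and it models only the simplified rule that a query can use an index when it constrains the leading key or keys; the real query planner weighs more than this.

```javascript
// Hypothetical helper (not a MongoDB API): count how many leading keys of an
// index are constrained by a query filter. Zero means no usable prefix.
function usablePrefixLength(indexKey, filter) {
  const indexFields = Object.keys(indexKey); // insertion order = index key order
  let n = 0;
  while (n < indexFields.length && indexFields[n] in filter) {
    n += 1;
  }
  return n;
}

const compoundKey = { a: 1, b: 1 };
const prefixResults = {
  onlyA: usablePrefixLength(compoundKey, { a: "somevalue" }),
  aAndB: usablePrefixLength(compoundKey, { a: "somevalue", b: "somevalue" }),
  onlyB: usablePrefixLength(compoundKey, { b: "somevalue" }),
};
console.log(prefixResults);
```

Both query shapes from the question constrain at least the leading key a, which is why the separate { a: 1 } index adds nothing here. A query on b alone constrains no leading key.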

Related

Is searching by _id in mongoDB more efficient?

In my use case, I want to look up a document in MongoDB by a given unique string. However, I want my queries to be fast, and searching by _id would add some overhead. Is there any benefit in MongoDB to searching for a document by _id rather than by any other unique value?
To my knowledge, an ObjectId behaves like any other unique value in a document [as far as searching is concerned].
As for the overhead, you can assume I am caching the string-to-ObjectId mapping and that the cache is very small and in memory [almost negligible], though the DB is large.

Analyzing your query performance
I advise you to use .explain(), provided by MongoDB, to analyze your query performance.
Let's say we are trying to execute this query:
db.inventory.find( { quantity: { $gte: 100, $lte: 200 } } )
This would be the result of the query execution
{ "_id" : 2, "item" : "f2", "type" : "food", "quantity" : 100 }
{ "_id" : 3, "item" : "p1", "type" : "paper", "quantity" : 200 }
{ "_id" : 4, "item" : "p2", "type" : "paper", "quantity" : 150 }
If we call .explain() this way
db.inventory.find(
{ quantity: { $gte: 100, $lte: 200 } }
).explain("executionStats")
It will return the following result:
{
"queryPlanner" : {
"plannerVersion" : 1,
...
"winningPlan" : {
"stage" : "COLLSCAN",
...
}
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 3,
"executionTimeMillis" : 0,
"totalKeysExamined" : 0,
"totalDocsExamined" : 10,
"executionStages" : {
"stage" : "COLLSCAN",
...
},
...
},
...
}
More details about this can be found here
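As a rough aid for reading output like the above, the following plain JavaScript sketch condenses an explain document into a few numbers. The helper names planStages and summarizeExplain are hypothetical, the input is a trimmed copy of the output shown above, and the plan walk follows only single inputStage links (plans with several branches, such as $or plans, are not handled).

```javascript
// Hypothetical helpers (not part of MongoDB).
// Collect stage names by following the single-child inputStage chain.
function planStages(plan) {
  const stages = [];
  for (let node = plan; node; node = node.inputStage) {
    stages.push(node.stage);
  }
  return stages;
}

// Condense an explain("executionStats") document into a short summary.
function summarizeExplain(explainDoc) {
  const stats = explainDoc.executionStats;
  const stages = planStages(explainDoc.queryPlanner.winningPlan);
  return {
    stages,
    collectionScan: stages.includes("COLLSCAN"),
    // Documents read per document returned; values near 1 suggest a good index.
    docsPerResult: stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : null,
    keysExamined: stats.totalKeysExamined,
  };
}

// Trimmed copy of the explain output shown above.
const sampleExplain = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { nReturned: 3, totalKeysExamined: 0, totalDocsExamined: 10 },
};
const explainSummary = summarizeExplain(sampleExplain);
console.log(explainSummary);
```

Here the query read 10 documents to return 3 and examined no index keys, which is what a collection scan looks like.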
How efficient is search by _id and indexes
To answer your question, an indexed lookup is almost always more efficient than a collection scan. Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. Because MongoDB indexes _id by default, lookups by _id are efficient.
Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement.
So, YES, using indexes like _id is better!
You can also create your own indexes by using createIndex():
db.collection.createIndex( <key and index type specification>, <options> )
Optimize your MongoDB query
In case you want to optimize your query, there are multiple ways to do that.
Creating custom indexes to support your queries
Limit the Number of Query Results to Reduce Network Demand
db.posts.find().sort( { timestamp : -1 } ).limit(10)
Use Projections to Return Only Necessary Data
db.posts.find( {}, { timestamp : 1 , title : 1 , author : 1 , abstract : 1} ).sort( { timestamp : -1 } )
Use $hint to Select a Particular Index
db.users.find().hint( { age: 1 } )

Short answer, yes _id is the primary key and it's indexed. Of course it's fast.
But you can use an index on the other fields too and get more efficient queries.

MongoDB - How do index prefixes work?

I have read this documentation: "Sort and Non-prefix Subset of an Index".
With that information, I am trying to answer the following MongoDB mock test question:
You have the following indexes on the things collection:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.things"
},
{
"v" : 1,
"key" : {
"a" : 1
},
"name" : "a_1",
"ns" : "test.things"
},
{
"v" : 1,
"key" : {
"c" : 1,
"b" : 1,
"a" : 1
},
"name" : "c_1_b_1_a_1",
"ns" : "test.things"
}
]
Question:
Which of the following queries will require that you load every document into RAM in order to fulfill the query? Assume that no data is being written during the query. Check all that apply.
db.things.find( { b : 1 } ).sort( { c : 1, a : 1 } )
db.things.find( { c : 1 } ).sort( { a : 1, b : 1 } )
db.things.find( { a : 1 } ).sort( { b : 1, c : 1 } )
The answer they give is...
db.things.find( { b: 1} ).sort( {c: 1, a: 1} )
Can someone help me understand why the other two options are not correct, i.e. how they use an index or index prefix? My understanding is that the sort part has to match the order of the indexed fields. Also, the suggested correct answer does not seem to meet the rule (per the documentation) either.

I believe, given the options and answer, the emphasis is to find:
Which of the following queries will require that you load every document into RAM in order to fulfill the query?
So the sorting is a red herring.
find({ b: 1 }) can't use any of the indexes provided
find({ c: 1 }) can use index c_1_b_1_a_1 since it matches the prefix
find({ a: 1 }) can use index a_1
Since options #2 and #3 can use an index, they will not load every document in order to sort them, just the ones found via the index. Option #1 will have to do a full collection scan to find documents where b is 1.
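The distinction between "the filter can use an index" and "the sort can use an index" can be sketched in plain JavaScript. The helper names are hypothetical (not MongoDB APIs) and the rules are simplified: a filter can use an index if it constrains the leading key, and a sort can use an index if, after skipping the leading keys matched by the filter, the sort keys come next in the same order. Sort direction and other planner details are ignored.

```javascript
// Indexes on the things collection, as listed in the question.
const thingsIndexes = [{ _id: 1 }, { a: 1 }, { c: 1, b: 1, a: 1 }];

// Hypothetical helper: the filter constrains the leading key of the index.
function filterCanUse(indexKey, filter) {
  return Object.keys(indexKey)[0] in filter;
}

// Hypothetical helper: sort keys directly follow the filter-matched prefix.
function sortCanUse(indexKey, filter, sort) {
  const keys = Object.keys(indexKey);
  let i = 0;
  while (i < keys.length && keys[i] in filter) i += 1; // skip matched prefix
  return Object.keys(sort).every((k, j) => keys[i + j] === k);
}

function analyze(filter, sort) {
  return {
    filterIndexed: thingsIndexes.some((ix) => filterCanUse(ix, filter)),
    sortIndexed: thingsIndexes.some((ix) => sortCanUse(ix, filter, sort)),
  };
}

const option1 = analyze({ b: 1 }, { c: 1, a: 1 });
const option2 = analyze({ c: 1 }, { a: 1, b: 1 });
const option3 = analyze({ a: 1 }, { b: 1, c: 1 });
// An extra query (not from the mock test) whose sort does follow the index order.
const extra = analyze({ c: 1 }, { b: 1, a: 1 });
console.log({ option1, option2, option3, extra });
```

Under this simplified model none of the three options gets an index-backed sort, but only option 1 also lacks an index for its filter, so only option 1 has to read every document.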

MongoDB $or + sort + index. How to avoid sorting in memory?

I am having trouble creating a proper index for my Mongo query that would avoid the SORT stage. I am not even sure that is possible in my case. Here is my query with execution stats:
db.getCollection('test').find(
{
"$or" : [
{
"a" : { "$elemMatch" : { "_id" : { "$in" : [4577] } } },
"b" : { "$in" : [290] },
"c" : { "$in" : [35, 49, 57, 101, 161, 440] },
"d" : { "$lte" : 399 }
},
{
"e" : { "$elemMatch" : { "numbers" : { "$in" : ["1K0407151AC", "0K20N51150A"] } } },
"d" : { "$lte" : 399 }
}]
})
.sort({ "X" : 1, "d" : 1, "Y" : 1, "Z" : 1 }).explain("executionStats")
The fields 'm', 'a' and 'e' are arrays, that is why 'm' is not included in any index.
If you check the execution stats screenshot, you will see that memory usage is pretty close to maximum and unfortunately I had cases where the query failed to execute because of the 32MB limit.
Index for the first part of the $or query:
{
"a._id" : 1,
"X" : 1,
"d" : 1,
"Y" : 1,
"Z" : 1,
"b" : 1,
"c" : 1
}
Index for the second part of the $or query:
{
"e.numbers" : 1,
"X" : 1,
"d" : 1,
"Y" : 1,
"Z" : 1
}
The indexes are used by the query, but not for sorting. Instead of a SORT stage I would like to see a SORT_MERGE stage, but no success so far. If I run the parts of the $or separately, each can use its index to avoid sorting in memory. That is acceptable as a workaround, but I would then need to merge and re-sort the results in the application.
The MongoDB version is 3.4.2. I checked that and that question, and my query is the result. Did I miss something?
Edit: mongo documents look like that:
{
"_id" : "290_440_K760A03",
"Z" : "K760A03",
"c" : 440,
"Y" : "NPS",
"b" : 290,
"X" : "Schlussleuchte",
"e" : [
{
"..." : 184,
"numbers" : [
"0K20N51150A"
]
}
],
"a" : [
{
"_id" : 4577,
"..." : [
{
"..." : [
{
"..." : "R",
}
]
}
]
},
{
"_id" : 4578
}
],
"d" : 101,
"m" : [
"AT",
"BR",
"CH"
],
"moreFields":"..."
}
Edit 2: removed the field "m" from the query to decrease complexity and attached a test collection dump for anyone who wants to help :)

Here is the solution:
I added one document to my test collection, as shown in the edit part of your question. Then I created the following four indices:
1. {"m":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1}
2. {"a._id":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1}
3. {"m":1,"X":1,"d":1,"Y":1,"Z":1}
4. {"e.numbers":1,"X":1,"d":1,"Y":1,"Z":1}
When I then ran the given query with execution stats, it showed the SORT_MERGE stage as expected.
Here is the explanation:
MongoDB index design follows the equality-sort-range (ESR) rule, which says a lot about how index keys should be ordered. I followed this rule, so the index should be {Equality fields, "X":1,"d":1,"Y":1,"Z":1, Range fields}. The query has a range only on field "d" ("d" : { "$lte" : 399 }), but "d" is already covered by the sort fields of the index ("X":1,"d":1,"Y":1,"Z":1), so we can omit the range part (i.e. field "d") from the end of the index.
If "d" had NOT been part of the sort or equality predicates, I would have added it as a range field and the index would have looked like {Equality fields, "X":1,"Y":1,"Z":1,"d":1}.
So my index is {Equality fields, "X":1,"d":1,"Y":1,"Z":1} and only the equality fields remain to be decided. To find them I looked at the find predicates, which are two conditions combined by the OR operator.
The first condition has equality on "a._id", "b", "c", "m" ("d" has a range, not equality). That suggests an index like "a._id":1,"m":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1, but creating it fails because it contains two array fields, "a._id" and "m", and Mongo does not allow a compound index on parallel arrays. So I created two separate indices and left the choice to the query planner. That is how I got the first and second index.
The second condition of the OR operator has "e.numbers" and "m". Both are array fields, so I again had to create two indices, as for the first condition, and that is how I got the third and fourth index.
Each branch of an $or is planned separately and can use its own index, but a branch uses only one index at a time, so I created these indices to give the query planner a suitable candidate for every branch.
Note: If you are concerned about index size, you can keep only one index from the first two and one from the last two. Alternatively, keep all four and hint Mongo to use the proper index if you know the right one in advance.
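The key ordering described in this answer can be sketched as a small plain JavaScript function. The name buildEsrIndexKey is hypothetical (it is not a MongoDB API), and, like the answer, the sketch treats the $in and $elemMatch conditions as equality fields. It only orders keys; it does not check the parallel-arrays restriction.

```javascript
// Hypothetical helper (not a MongoDB API): order index keys as equality fields,
// then sort fields, then any range fields not already covered by the sort.
function buildEsrIndexKey(equalityFields, sortSpec, rangeFields) {
  const key = {};
  for (const field of equalityFields) key[field] = 1;
  for (const [field, direction] of Object.entries(sortSpec)) {
    if (!(field in key)) key[field] = direction;
  }
  for (const field of rangeFields) {
    if (!(field in key)) key[field] = 1; // "d" is skipped: the sort already has it
  }
  return key;
}

const querySort = { X: 1, d: 1, Y: 1, Z: 1 };
const firstBranchKey = buildEsrIndexKey(["a._id", "b", "c"], querySort, ["d"]);
const secondBranchKey = buildEsrIndexKey(["e.numbers"], querySort, ["d"]);
console.log(JSON.stringify(firstBranchKey));
console.log(JSON.stringify(secondBranchKey));
```

The two results correspond to the second and fourth indices in the list above.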

Optimize query performance in MongoDB

I have a collection named App and need to query the active (active: true) apps that belong to a particular user (user_id) or are available to all users (by their _id). I use a query like this:
{
"active" : true,
"$or" : [
{
"user_id" : "111111111111111111111111"
},
{
"_id" : {
"$in" : [
ObjectId("222222222222222222222222"),
ObjectId("333333333333333333333333"),
ObjectId("444444444444444444444444")
]
}
}
]
}
However in db.currentOp(true) I see that this query is running very slowly: lockStats.timeLockedMicros.r is about 3000.
How can I optimize performance of this query? I already have the following indexes on App:
> db.App.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "mydb.App"
},
{
"v" : 1,
"key" : {
"active" : 1,
"created_at" : -1
},
"name" : "active_1_created_at_-1",
"ns" : "mydb.App",
"background" : true
},
{
"v" : 1,
"key" : {
"active" : 1,
"user_id" : 1
},
"name" : "active_1_user_id_1",
"ns" : "mydb.App",
"background" : true
}
]

I see two issues here:
1) You do not need an index on the boolean field active: it has low selectivity and does little for query performance.
"If overall selectivity is low, and if MongoDB must read a number of documents to return results, then some queries may perform faster without indexes." source
2) You need an index on user_id, because a query on user_id alone cannot use the compound index active_1_user_id_1 (user_id is not a prefix of it).
Edit: You can always check index efficiency by running an explain(true) and looking at which indexes are used for that query.
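To illustrate what low selectivity means here, this plain JavaScript sketch computes the fraction of documents an equality predicate matches. The helper matchFraction and the four sample documents are made up for illustration; they are not taken from the asker's collection.

```javascript
// Hypothetical helper: fraction of documents whose field equals the given value.
// A value close to 1 means the predicate filters out very little.
function matchFraction(docs, field, value) {
  if (docs.length === 0) return 0;
  return docs.filter((doc) => doc[field] === value).length / docs.length;
}

// Made-up sample data, for illustration only.
const sampleApps = [
  { user_id: "u1", active: true },
  { user_id: "u2", active: true },
  { user_id: "u3", active: true },
  { user_id: "u4", active: false },
];

const activeFraction = matchFraction(sampleApps, "active", true);
const userFraction = matchFraction(sampleApps, "user_id", "u1");
console.log({ activeFraction, userFraction });
```

In this made-up sample, active: true still matches three quarters of the documents while user_id narrows the set to a single document, which is the intuition behind indexing user_id rather than active.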

I would try the following:
remove your existing secondary indexes: the active field has low cardinality (it is a boolean) and does not help you at all, and you are not using created_at, so there is no reason to index it.
add an index on the user_id key only
store values that are really numbers as numbers rather than as strings.

Calling ensureIndex with compound key results in _id field in index object

When I call ensureIndex from the mongo shell on a collection for a compound index an _id field of type ObjectId is auto-generated in the index object.
> db.system.indexes.find();
{ "name" : "_id_", "ns" : "database.coll", "key" : { "_id" : 1 } }
{ "_id" : ObjectId("4ea78d66413e9b6a64c3e941"), "ns" : "database.coll", "key" : { "a.b" : 1, "a.c" : 1 }, "name" : "a.b_1_a.c_1" }
This makes intuitive sense as all documents in a collection need an _id field (even system.indexes, right?), but when I check the indexes generated by morphia's ensureIndex call for the same collection *there is no _id property*.
Looking at morphia's source code, it's clear that it's calling the same code that the shell uses, but for some reason (whether it's the fact that I'm creating a compound index or indexing an Embedded document or both) they produce different results. Can anyone explain this behavior to me?

Not exactly sure how you managed to get an _id field in the indexes collection but both shell and Morphia originated ensureIndex calls for compound indexes do not put an _id field in the index object :
> db.test.ensureIndex({'a.b':1, 'a.c':1})
> db.system.indexes.find({})
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.test", "name" : "_id_" }
{ "v" : 1, "key" : { "a.b" : 1, "a.c" : 1 }, "ns" : "test.test", "name" : "a.b_1_a.c_1" }
>
Upgrade to 2.x if you're running an older version to avoid running into now resolved issues. And judging from your output you are running 1.8 or earlier.
