Mongo $in with compound index - mongodb

How to efficiently do an $in lookup on a collection with a compound index?
Index is on fields a and b per example below. EG: db.foo.createIndex({a: 1, b: 1})
Example in SQL:
SELECT *
FROM foo
WHERE (a,b)
IN (
("aVal1", "bVal1"),
("aVal2", "bVal2")
);
I know you can do something like:
db.foo.find( {
$or: [
{ a: "aVal1", b: "bVal1" },
{ a: "aVal2", b: "bVal2" },
]
} )
Is there a more performant way to do this using the $in operator?

Since you already created a compound index on (a, b), every clause of your $or expression is supported by an index, so MongoDB will use index scans instead of a collection scan. That is probably fast enough.
Reference: $or Clauses and Indexes
When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.
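A minimal sketch of building that $or filter from a list of (a, b) pairs. The helper name pairsToOrFilter is made up for illustration; only the field names a and b come from the question.

```javascript
// Hypothetical helper: turn [[aVal, bVal], ...] into an $or filter with one
// equality clause per pair, so each clause can be served by the {a: 1, b: 1} index.
function pairsToOrFilter(pairs) {
  return { $or: pairs.map(([a, b]) => ({ a, b })) };
}

const orFilter = pairsToOrFilter([
  ["aVal1", "bVal1"],
  ["aVal2", "bVal2"],
]);

console.log(JSON.stringify(orFilter));
```

In the shell you could then run db.foo.find(orFilter).explain("executionStats") and check that each clause shows an IXSCAN rather than a COLLSCAN.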
Now about your question
Is there a more performant way to do this using the $in operator?
$in matches against the entire value of a field. If you want to match (a, b) pairs, then (a, b) must become an embedded object that you can search with $in.
I'm not sure whether an embedded object fits your current schema or requirements. If it does, $in is known to perform better than $or:
When using $or with <expressions> that are equality checks for the value of the same field, use the $in operator instead of the $or operator.
In this case, if you have an embedded object like {e: {a: 'x', b: 'y'}}, then db.collections.createIndex({e: 1}) paired with $in will speed things up.
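A rough sketch of the $in variant, assuming the documents really do store the pair as an embedded object under a field named e (as in the example above). The helper name pairsToInFilter is hypothetical. One thing to keep in mind: an equality match on an embedded document is an exact match, including field order, so the helper always emits a before b.

```javascript
// Hypothetical helper: build {e: {$in: [{a, b}, ...]}} from [[aVal, bVal], ...].
// Each subdocument is written as {a, b} in that order, because embedded-document
// equality in MongoDB also compares the order of the fields.
function pairsToInFilter(pairs) {
  return { e: { $in: pairs.map(([a, b]) => ({ a, b })) } };
}

const inFilter = pairsToInFilter([
  ["aVal1", "bVal1"],
  ["aVal2", "bVal2"],
]);

console.log(JSON.stringify(inFilter));
```

This only works if the stored subdocuments have exactly the fields a and b in that order; any extra field in e would make the equality match fail.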

Related

Mongodb compound index not being used

I have a MongoDB collection with close to 100k documents. On each document, there are the following 3 fields.
arrayX: [ObjectId]
someID: ObjectId
timestamp: Date
I have created a compound index for the 3 fields in that order.
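When I then run an aggregate query like the one below (A and Y stand in for the actual ObjectId values):
db.collection.aggregate([
  // match documents whose arrayX contains A and whose someID equals Y
  { $match: { arrayX: { $elemMatch: { $eq: A } }, someID: Y } },
  // then sort by timestamp, ascending
  { $sort: { timestamp: 1 } }
])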
it does not end up using the compound index.
The way I know this is that when I use .explain(), the winningPlan stage is FETCH, the inputStage is IXSCAN, and the indexName is timestamp_1,
which means it's only using the other single-key index I created for the timestamp field.
What's interesting is that if I remove the sort stage, and keep everything the exact same, mongodb ends up using the compound index.
What gives?
Multi-key indexes are not useful for sorting. I would expect that a plan using the other index was listed in rejectedPlans.
If you run explain with the allPlansExecution option, the response will also show you the execution times for each plan, among other things.
Since the multi-key index can't be used for sorting the results, that plan would require a blocking sort stage. This means that all of the matching documents must be retrieved and then sorted in memory before sending a response.
On the other hand, using the timestamp_1 index means that documents will be encountered in a presorted order while traversing the index. The tradeoff here is that there is no blocking sort stage, but every document must be examined to see if it matches the query.
For data sets that are not huge, or when the query will match a significant fraction of the collection, the plan without a blocking sort will return results faster.
You might test creating another index on { someID:1, timestamp:1 } as this might reduce the number of documents scanned while still avoiding the blocking sort.
The compound index is selected when you remove the sort stage probably because sorting accounts for the majority of the execution time.
The fields in the executionStats section of the explain output are explained in Explain Results. Comparing the estimated execution times for each stage may help you determine where you can tune the queries.
I am using documents like this (based on the question post) for discussion:
{
_id: 1,
fld: "One",
arrayX: [ ObjectId("5e44f9ed221e963909537848"), ObjectId("5e44f9ed221e963909537849") ],
someID: ObjectId("5e44f9e7221e963909537845"),
timestamp: ISODate("2020-02-12T01:00:00.0Z")
}
The Indexes:
I created two indexes, as mentioned in the question post:
{ timestamp: 1 } and { arrayX:1, someID:1, timestamp:1 }
The Query:
db.test.find(
{
someID: ObjectId("5e44f9e7221e963909537845"),
arrayX: ObjectId("5e44f9ed221e963909537848")
}
).sort( { timestamp: 1 } )
In the above query I am not using $elemMatch. A query filter using $elemMatch with single field equality condition can be written without the $elemMatch. From $elemMatch Single Query Condition:
If you specify a single query predicate in the $elemMatch expression,
$elemMatch is not necessary.
The Query Plan:
I ran the query with explain, and found that the query uses the arrayX_1_someID_1_timestamp_1 index. The index is used for the filter as well as the sort operations of the query.
Sample plan details:
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"arrayX" : 1,
"someID" : 1,
"timestamp" : 1
},
"indexName" : "arrayX_1_someID_1_timestamp_1",
...
The IXSCAN stage shows that the query uses the index. The FETCH stage shows that the full document is then retrieved through the reference stored in each index entry. This means that both the query's filter and the sort use the index. The way to know that the sort uses an index is that the plan does not have a SORT stage, as in this case.
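If you want to check this programmatically, here is a small sketch that walks a winningPlan object and lists its stages. The helper names planStages and hasBlockingSort are made up; the field names stage, inputStage and indexName are the ones that appear in the explain output above. It only follows a single inputStage chain, so plans with several inputStages (for example $or plans) are not handled.

```javascript
// Hypothetical helper: list the stages of a plan from the top stage down to the leaf.
function planStages(plan) {
  const stages = [];
  for (let node = plan; node; node = node.inputStage) {
    stages.push(node.indexName ? `${node.stage}(${node.indexName})` : node.stage);
  }
  return stages;
}

// A plan needs an in-memory sort if one of its stages is exactly "SORT".
const hasBlockingSort = (plan) => planStages(plan).includes("SORT");

// Sample plan copied from the explain output shown above.
const winningPlan = {
  stage: "FETCH",
  inputStage: {
    stage: "IXSCAN",
    keyPattern: { arrayX: 1, someID: 1, timestamp: 1 },
    indexName: "arrayX_1_someID_1_timestamp_1",
  },
};

console.log(planStages(winningPlan));
console.log(hasBlockingSort(winningPlan));
```

For the sample plan this lists FETCH followed by the IXSCAN on arrayX_1_someID_1_timestamp_1, and reports no blocking sort.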
Reference:
From Sort and Non-prefix Subset of an Index:
An index can support sort operations on a non-prefix subset of the
index key pattern. To do so, the query must include equality
conditions on all the prefix keys that precede the sort keys.

Mongo Partial Compound Unique Index | Not Used In Query

I am facing a strange issue. I have a partial, compound, unique index with the definition:
createIndex({a: 1, b:1, c: 1}, {unique:1, partialFilterExpression: {c: {$type: "string"}}})
Now when I perform a query, this index is never used according to the explain plan, even though there are documents matching the query.
Changing the same index to sparse instead of partial fixes the above issue, but sparse, compound, unique indexes have the following issue:
dealing-with-mongodb-unique-sparse-compound-indexes
As noted in the query coverage documentation for partial indexes:
MongoDB will not use the partial index for a query or sort operation if using the index results in an incomplete result set.
To use the partial index, a query must contain the filter expression (or a modified filter expression that specifies a subset of the filter expression) as part of its query condition.
In your set up you create a partial index filtering on {c: {$type: "string"}}.
Your query conditions are {a:"1", b:"p", c:"2"}, or a query shape of three equality comparisons ({a: eq, b: eq, c: eq}). Since this query shape does not include a $type filter on c, the query planner has to consider that queries fitting the shape should match values of any data type and the partial index is not a viable candidate for complete results.
Some example queries that would use your partial index (tested with MongoDB 3.4.5):
// Search on {a, b} with c criteria matching the index filter
db.mydb.find({a:"1", b:"p", c: { $type: "string" } })
// Search on {a,b,c} and use $and to include the type of c
db.mydb.find({a:"1", b:"p", $and: [{ c: "2"} , {c: { $type: "string" }}]})
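If you build such queries in application code, a small helper can add the type condition for you. This is only a sketch: the name withTypeGuard is made up, and it produces the same shape as the second example query above.

```javascript
// Hypothetical helper: add {field: {$type: bsonType}} to a filter so that the
// query includes the partialFilterExpression of the index.
function withTypeGuard(filter, field, bsonType) {
  const { [field]: value, ...rest } = filter;
  const guard = { [field]: { $type: bsonType } };
  // No condition on the field yet: the type check becomes the only condition on it.
  if (value === undefined) return { ...rest, ...guard };
  // Otherwise keep the original condition and add the type check via $and.
  return { ...rest, $and: [...(rest.$and ?? []), { [field]: value }, guard] };
}

const guarded = withTypeGuard({ a: "1", b: "p", c: "2" }, "c", "string");
console.log(JSON.stringify(guarded));
```

You would then pass the result to find(), for example db.mydb.find(guarded), and confirm with explain() that the partial index is picked up.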

MongoDB index suggestion

I have the following query:
a : true AND (b : 1 OR b : 2) AND ( c: null OR (c > startDate AND c <endDate))
So basically I am thinking of a compound index on all three fields, because I have no sorting at all. In the first step, with the index on the boolean field, I will eliminate the largest portion of documents.
Then, with the index on the second field, I saw that the OR clause creates two separate queries and then combines them while removing duplicates. So this should be pretty fast and efficient.
The last condition is a simple range of dates, so I think that adding the field to the index will be a good option.
Any suggestions on my thoughts? Thanks.
This query:
a : true AND (b : 1 OR b : 2) AND ( c: null OR (c > startDate AND c <endDate))
could otherwise be translated as:
db.collection.find({
a:true,
b:{$in:[1,2]},
$or: [
{c:null},
{c: {$gt: startDate, $lt: endDate}}
]
})
Because of that $or you will most likely need two indexes. However, since the $or covers only c, the first one only needs to be an index on c. So that's our first index:
db.collection.ensureIndex({c:1})
Now, we cannot use the $or with a compound index, because compound indexes work on a prefix basis and each $or clause is evaluated as a completely separate query. As such, it would be best to use a and b as the prefix of our index here.
This means you just need an index to cover the other part of your query:
db.collection.ensureIndex({b:1,a:1})
We put b first because a is a boolean; the index should perform better with b first.
Note: I am unsure about an index on a at all due to its low cardinality.

Does order of indexes matter in MongoDB?

Is tip #25 in Tips and Tricks for MongoDB Developers correct?
It says that this query:
collection.find({"x" : criteria, "y" : criteria, "z" : criteria})
can be optimized with
collection.ensureIndex({"y" : 1, "z" : 1, "x" : 1})
I think it's false because for this to work, x should be in front. I thought the order of indexes matter.
So where did I go wrong?
The order of the fields in the index only matters if the query doesn't include all of the fields in the index. This query is referencing all three fields so the order of the fields in the index doesn't matter.
See more details in the docs on compound indexes.
The order of the fields in the find query object is not relevant.
For beginners who want to understand it better:
The MongoDB docs say: "The index contains references to documents sorted first by the values of the item field and, within each value of the item field, sorted by values of the stock field." What does this mean?
Let's create a compound index on fields a, b, c, and d in ascending order (1):
Model.createIndex({ a: 1, b: 1, c: 1, d: 1 });
I visualize it as:
at level-1, a list of references sorted in the specified order (1) based on the value of the first index field (a).
at level-2, each reference at level-1 holds another set of references sorted in the specified order (1) based on the value of the second field in the chain (b).
at level-3, each reference at level-2 holds another set of references sorted in the specified order (1) based on the value of the third field in the chain (c).
at level-4, each reference at level-3 holds another set of references sorted in the specified order (1) based on the value of the fourth field in the chain (d).
This chain forms a tree structure, which is why it is stored in a B-tree data structure.
I would love to call this storage system a compound-index-chain in this context.
Normally we build indexes to support two types of operations: 1. query operations like find() and 2. non-query operations like sort().
Now you have created a compound index on { a: 1, b: 1, c: 1, d: 1 }. But creating the index is not enough. It becomes inefficient, and sometimes useless, if you don't structure your database operations (find and sort) in a way that uses the index.
Let's dig deeper into which kinds of queries are supported by which parts of the index.
1. find():
The compound index also supports indexed find() queries on the fields of any of its prefixes:
{a:1},
{a:1, b:1},
{a:1, b:1, c:1}
// Index prefixes are the beginning subsets of indexed fields
@JohnnyHK already said "The order of the fields in the find query object is not relevant."
The fields can be in ANY ORDER, like {b:1, a:1} instead of {a:1, b:1}. The index will still be used as long as the find() operates on the fields of the compound index or of one of its prefixes.
However, two queries that use the same index may still perform differently, depending on how selective the condition on the leading index field is compared with the conditions on the subsequent fields.
Meaning, with one index {a:1, b:1}, a query such as find({a: 'red', b: 'tshirt'}) whose condition on the first field a has LOW SELECTIVITY (it matches many documents) may be less efficient than a query such as find({a: 'tshirt', b: 'red'}) whose condition on a has HIGH SELECTIVITY, even though both queries use the same index.
However, even the LESS SELECTIVE query will perform better than having no index at all.
I think @Sushil tried to touch on this topic.
In case you are still wondering, query selectivity refers to how well the query predicate excludes or filters out documents in a collection. Query selectivity can determine whether or not queries can use indexes effectively or even use indexes at all.
Now let's come to the prefixes of compound indexes.
Note: find() behaves differently on the fields {a:1, c:1}, which are not a prefix of the compound index {a:1, b:1, c:1, d:1}.
In this case, the find() operation will not be able to use our compound index efficiently.
What happens is that only the a:1 part of the index can narrow the search. The c:1 part cannot narrow it in the same way, because the compound-index-chain is broken by the absence of the b:1 field in between.
So if a find() query operates on the a and c fields together, the index scan (IXSCAN) is bounded efficiently only by a; the condition on c cannot narrow that range the way it could if c directly followed a in the index. Meaning the query will be slower than with a separate compound index on {a:1, c:1} but faster than with no index at all.
The conclusion is: index fields are parsed in order; if a query omits a particular index prefix, it is unable to make use of any index fields that follow that prefix.
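To make the prefix idea concrete, here is a toy model in plain JavaScript. It is not how MongoDB stores anything; it just sorts a few made-up documents the way a two-field index {a: 1, b: 1} would order its entries, then shows where the matches for each condition sit in that order.

```javascript
// Made-up sample documents.
const docs = [
  { _id: 1, a: 2, b: "x" },
  { _id: 2, a: 1, b: "y" },
  { _id: 3, a: 2, b: "w" },
  { _id: 4, a: 1, b: "x" },
  { _id: 5, a: 3, b: "x" },
];

const cmp = (p, q) => (p < q ? -1 : p > q ? 1 : 0);

// Toy "index": entries sorted by a first, then by b within each value of a.
const indexEntries = docs
  .map(({ _id, a, b }) => ({ a, b, _id }))
  .sort((p, q) => cmp(p.a, q.a) || cmp(p.b, q.b));

// Positions (in index order) of the entries that satisfy a predicate.
const matchingPositions = (entries, predicate) =>
  entries.flatMap((entry, i) => (predicate(entry) ? [i] : []));

// Condition on the prefix field a: matches sit next to each other.
const onPrefix = matchingPositions(indexEntries, (e) => e.a === 2);
// Condition on b only: matches are scattered across the whole index.
const onNonPrefix = matchingPositions(indexEntries, (e) => e.b === "x");

console.log(onPrefix, onNonPrefix);
```

The matches for a === 2 form one contiguous run, which is what lets an index scan jump straight to them. The matches for b === "x" are spread out, so the order of the index does not help locate them.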
2. Sort():
For non-query operations (i.e. sort), the sort fields must be in the same order as in the index, and the direction of every sort field must be either the same as, or the exact opposite of, the direction specified for that field when the compound index was created.
Let's see how our compound index { a: 1, b: 1, c: 1, d: 1 }, with ascending direction, behaves with the sort() operation:
Let's look at the direction of the indexed fields in sorting.
As we know, a single-field index on {a:1} can support a sort on {a:1} (same direction) and on {a:-1} (reverse direction).
Compound indexes follow the same rules while sorting.
{a:1, b:1, c:1, d:1} // in same-direction as of our compound index
{a:-1, b:-1, c:-1, d:-1 } // in reverse-direction of our compound index
// But these fields are in neither the same direction nor the reverse direction; they are ARBITRARY/MIXED.
// So the index will not be used for sorting with these fields and directions
{a:-1, b:1, c:1, d:1}
Another example: a compound index on {a:1, b:-1} can support indexed sorting on {a:1, b:-1} (same direction) and on {a:-1, b:1} (reverse direction) BUT NOT on {a:-1, b:-1}.
Now let's look at the order of the indexed fields in sorting
OPTIMUM SORTING:
When a sort operation uses the compound index or a prefix of the compound index, sorting the result set in memory (RAM) is not needed. Such a sort is satisfied solely by the order of the entries in the index, which gives optimum sorting performance.
For instance:
// compound index
{a:1, b:1, c:1, d:1}
// prefix of the compound index
{a:1},
{a:1, b:1},
{a:1, b:1, c:1}
Compound-Index-Chain-Break:
When a sort operation is only partially covered by the compound index, the matched result set may need to be sorted in memory.
Model.find({ a: 2 }).sort({ c: 1 }); // uses the index for finding, but not for sorting on field c
Model.find({ a: { $gt: 2 } }).sort({ c: 1 }); // uses the index for finding, but not for sorting
// because the compound-index-chain is broken: field b of the prefix {a:1, b:1, c:1} of our compound index {a:1, b:1, c:1, d:1} is absent
Sort on Non-prefix Subset:
When the sort keys are a non-prefix subset of the index, the index fields that precede the sort keys MUST have equality conditions in the query (an explicit $eq or the plain {field: value} form; range operators such as $gt, $gte or $lte are not equality conditions). So:
A compound index can support indexed queries on its index prefixes as well.
Model.find({ c: 5 }).sort({ c: 1 }); // will not use the index at all, because c alone is not a prefix of our compound index
Model.find({ b: 3, a: 4 }).sort({ c: 1 }); // will use the index for both finding and sorting: equality on a and b, and the sort key c follows them in the prefix {a:1, b:1, c:1}
Model.find({ a: 4 }).sort({ a: 1, b: 1 }); // will use the index for both finding and sorting, because the sort keys are the prefix {a:1, b:1}
Model.find({ a: { $gt: 4 } }).sort({ a: 1, b: 1 }); // will use the index for both finding and sorting: the sort keys are the prefix {a:1, b:1}, so a range condition on a is fine
Model.find({ a: 5, b: 3 }).sort({ b: 1 }); // will use the index for both finding and sorting: equality on a precedes the sort key b
Model.find({ a: 5, b: { $lt: 3 } }).sort({ b: 1 }); // will use the index for both finding and sorting
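The direction and ordering rules for sorting can be written down as a small check. This is a simplified model for building intuition, not MongoDB's query planner: the function name canIndexSort is made up, it ignores multikey indexes and collation, and it only looks at which fields have equality conditions.

```javascript
// Simplified model: can `index` hand back documents already ordered by `sort`,
// given the list of fields that have equality conditions in the query?
function canIndexSort(index, equalityFields, sort) {
  const indexKeys = Object.entries(index);
  const sortKeys = Object.entries(sort);
  const equal = new Set(equalityFields);

  for (let start = 0; start + sortKeys.length <= indexKeys.length; start++) {
    // Every index field before `start` must be pinned by an equality condition.
    if (start > 0 && !equal.has(indexKeys[start - 1][0])) break;

    const slice = indexKeys.slice(start, start + sortKeys.length);
    const sameFields = slice.every(([field], i) => field === sortKeys[i][0]);
    const sameDirection = slice.every(([, dir], i) => dir === sortKeys[i][1]);
    const reverseDirection = slice.every(([, dir], i) => dir === -sortKeys[i][1]);
    if (sameFields && (sameDirection || reverseDirection)) return true;
  }
  return false;
}

const index = { a: 1, b: 1, c: 1, d: 1 };

console.log(canIndexSort(index, ["a"], { c: 1 }));                  // false: b is skipped
console.log(canIndexSort(index, ["a", "b"], { c: 1 }));             // true: equality on a and b
console.log(canIndexSort(index, [], { a: -1, b: -1, c: -1, d: -1 })); // true: full reverse
console.log(canIndexSort(index, [], { a: -1, b: 1 }));              // false: mixed directions
```

Always confirm with explain() on real data; the absence of a SORT stage in the winning plan is what tells you the index provided the order.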
Hope this helps somebody
The book describes the scenario below.
You have 3 queries to run:
Collection.find({x:criteria, y:criteria,z:criteria})
Collection.find({z:criteria, y:criteria,w:criteria})
Collection.find({y:criteria, w:criteria })
To use
collection.ensureIndex({y:1,z:1,x:1})
it considers how often each field occurs. Since y occurs most often, it wants all the queries to hit y first, followed by z. x comes last because you will be running the first query a thousand times more often than the other two. If that were not the case and you ran all 3 queries equally often, the suggestion would be
Collection.ensureIndex({y:1,w:1,z:1})
Moreover, as per the MongoDB documentation, “The order of fields in a compound index is very important.” But the scenario above is a different use case: it is trying to optimize all of the queries with one index.

Expressions sequence in OR query of MongoDB

Reading the documentation on MongoDB's OR queries, the syntax of OR is:
Syntax: { $or: [ { <expression1> }, { <expression2> }, ... , { <expressionN> } ] }
What I didn't understand is the sequence in which these expressions execute, i.e.
get documents matching expression-1, and if nothing is found get documents matching expression-2, etc. Is it true that the expressions execute one after the other?
And what does this mean:
"When using indexes with $or queries, remember that each clause of an $or query will execute in parallel."
There is no particular order to a query's execution, whether it be:
db.col.find({ name: 's', c: 1 });
Or:
db.col.find({$or: [{name: 's'}, {c: 1}]})
In an $or MongoDB will effectively make x queries based on the number of conditions you have in your $or, return a result for each clause, merge duplicates and then return the result (simply put). Due to this behaviour MongoDB can actually use multiple indexes for an $or clause, and at the minute only an $or clause.
This is important to note when making compound indexes. Indexes sometimes require a certain structure in the query to work at an optimal pace. As such, if you have a compound index of:
{name: 1, c: 1}
it may not match both clauses of:
db.col.find({$or: [{name: 's'}, {c: 1}]})
in a performant manner. So this is something you must bear in mind with the parallelism of $or clauses. If you wish to check that all clauses use indexes, you can add explain() on the end, which will break down the clauses and their index usage for you.
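As a rough illustration of why the second clause is the problem, here is a simplified check. It is not the real planner: the helper name clausesUsingIndex is made up, and the only rule it applies is that a clause needs a condition on the leading field of the index to be bounded by it.

```javascript
// Simplified rule: an $or clause can be bounded by a compound index only if
// it has a condition on the index's leading (first) field.
function clausesUsingIndex(index, orClauses) {
  const [leadingField] = Object.keys(index);
  return orClauses.map((clause) => leadingField in clause);
}

const usage = clausesUsingIndex({ name: 1, c: 1 }, [{ name: "s" }, { c: 1 }]);
console.log(usage);
```

Here the {name: 's'} clause passes the check and the {c: 1} clause does not, which suggests that a separate index starting with c would be needed for the second clause.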