Reading the MongoDB documentation on $or queries, I see that the syntax is:
Syntax: { $or: [ { <expression1> }, { <expression2> }, ... , { <expressionN> } ] }
What I don't understand is the order in which these expressions execute, i.e.
does it get the documents matching expression 1 and, only if nothing is found, the documents matching expression 2, etc.? Do the expressions execute one after the other, or not?
And what does this mean?
"When using indexes with $or queries, remember that each clause of an $or query will execute in parallel."
There is no particular order to a query's execution, whether it be:
db.col.find({ name: 's', c: 1 });
Or:
db.col.find({$or: [{name: 's'}, {c: 1}]})
In an $or MongoDB will effectively make x queries based on the number of conditions you have in your $or, return a result for each clause, merge duplicates and then return the result (simply put). Due to this behaviour MongoDB can actually use multiple indexes for an $or clause, and at the minute only an $or clause.
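The merge behaviour described above can be modelled in a few lines of plain JavaScript. This is only an illustrative sketch, not how MongoDB is implemented: runOr, matchesEquality, and the sample documents are made-up names, and the clauses are limited to simple equality.

```javascript
// Illustrative model only: each $or clause is evaluated as its own query,
// and the per-clause results are merged with duplicates (same _id) removed.
function matchesEquality(doc, clause) {
  return Object.keys(clause).every((field) => doc[field] === clause[field]);
}

function runOr(docs, clauses) {
  const seen = new Set();
  const merged = [];
  for (const clause of clauses) {
    for (const doc of docs.filter((d) => matchesEquality(d, clause))) {
      if (!seen.has(doc._id)) {
        seen.add(doc._id);
        merged.push(doc);
      }
    }
  }
  return merged;
}

// Hypothetical sample data.
const sampleDocs = [
  { _id: 1, name: "s", c: 1 },
  { _id: 2, name: "s", c: 2 },
  { _id: 3, name: "t", c: 1 },
  { _id: 4, name: "t", c: 2 },
];

const orResult = runOr(sampleDocs, [{ name: "s" }, { c: 1 }]);
console.log(orResult.map((d) => d._id)); // [ 1, 2, 3 ]
```

Document 1 matches both clauses but appears only once in the merged result.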
This is important to note when creating compound indexes. An index sometimes requires the query to have a certain structure in order to be used efficiently. For example, if you have a compound index of:
{name: 1, c: 1}
It may not match both clauses of:
db.col.find({$or: [{name: 's'}, {c: 1}]})
in a performant manner, because the second clause, {c: 1}, does not include the index prefix name. So this is something you must bear in mind with the parallelism of $or clauses. If you wish to check that all clauses use indexes, you can append explain() to the query, which will break down the clauses and their index usage for you.
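A rough way to reason about this per clause is to check whether each clause constrains the leading field of the compound index. The sketch below is a simplified model under that assumption (clauseUsesPrefix is a hypothetical helper, and the real planner considers more than this); explain() remains the authoritative check.

```javascript
// Simplified model: a clause can seek into a compound index only if it
// constrains the index's leading (prefix) field.
function clauseUsesPrefix(indexSpec, clause) {
  const leadingField = Object.keys(indexSpec)[0];
  return Object.prototype.hasOwnProperty.call(clause, leadingField);
}

const compoundIndex = { name: 1, c: 1 };
const orClauses = [{ name: "s" }, { c: 1 }];

const prefixReport = orClauses.map((clause) => ({
  clause,
  usesPrefix: clauseUsesPrefix(compoundIndex, clause),
}));
console.log(prefixReport.map((r) => r.usesPrefix)); // [ true, false ]
```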
I have a MongoDB collection with close to 100k documents. Each document has the following 3 fields.
arrayX: [ObjectId]
someID: ObjectId
timestamp: Date
I have created a compound index for the 3 fields in that order.
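When I then run an aggregate query like the one below (A and Y are placeholder values, and db.collection stands in for the real collection),
db.collection.aggregate([
{ $match: { arrayX: { $elemMatch: { $eq: A } }, someID: Y } },
{ $sort: { timestamp: 1 } }
])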
it does not end up using the compound index.
I know this because when I use .explain(), the winningPlan stage is FETCH, the inputStage is IXSCAN, and the indexName is timestamp_1,
which means it is only using the other, single-key index I created on the timestamp field.
What's interesting is that if I remove the sort stage and keep everything else exactly the same, MongoDB ends up using the compound index.
What gives?
Multi-key indexes are not useful for sorting. I would expect that a plan using the other index was listed in rejectedPlans.
If you run explain with the allPlansExecution option, the response will also show you the execution times for each plan, among other things.
Since the multi-key index can't be used for sorting the results, that plan would require a blocking sort stage. This means that all of the matching documents must be retrieved and then sorted in memory before sending a response.
On the other hand, using the timestamp_1 index means that documents will be encountered in a presorted order while traversing the index. The tradeoff here is that there is no blocking sort stage, but every document must be examined to see if it matches the query.
For data sets that are not huge, or when the query will match a significant fraction of the collection, the plan without a blocking sort will return results faster.
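The trade-off between the two plans can be illustrated with a toy model in plain JavaScript. This is not MongoDB's executor: the documents, the matches predicate, and both function names are made up for illustration, and the "examined" counts only stand in for the work each plan does.

```javascript
// Toy model of the two candidate plans; hypothetical data and names.
const events = [
  { _id: 1, someID: "Y", timestamp: 30 },
  { _id: 2, someID: "Z", timestamp: 10 },
  { _id: 3, someID: "Y", timestamp: 20 },
  { _id: 4, someID: "Z", timestamp: 40 },
];
const matches = (doc) => doc.someID === "Y";

// Plan A: locate only the matching documents (as a filtering index would),
// then sort them all in memory. Nothing can be returned until the sort is done.
function filterThenSort(docs) {
  const matched = docs.filter(matches);
  matched.sort((a, b) => a.timestamp - b.timestamp);
  return { examined: matched.length, result: matched };
}

// Plan B: walk the documents in timestamp order (as a timestamp index would
// present them) and test each one. No sort step, but every document is examined.
function scanInSortedOrder(docs) {
  const inIndexOrder = [...docs].sort((a, b) => a.timestamp - b.timestamp);
  let examined = 0;
  const result = [];
  for (const doc of inIndexOrder) {
    examined += 1;
    if (matches(doc)) result.push(doc);
  }
  return { examined, result };
}

const planA = filterThenSort(events);
const planB = scanInSortedOrder(events);
console.log(planA.examined, planB.examined); // 2 4
```

Both plans return the same documents in the same order; they differ only in where the work is spent.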
You might test creating another index on { someID:1, timestamp:1 } as this might reduce the number of documents scanned while still avoiding the blocking sort.
When you remove the sort stage, the compound index is selected, probably because the sort accounted for the majority of that plan's execution time.
The fields in the executionStats section of the explain output are explained in Explain Results. Comparing the estimated execution times for each stage may help you determine where you can tune the queries.
I am using documents like this (based on the question post) for discussion:
{
_id: 1,
fld: "One",
arrayX: [ ObjectId("5e44f9ed221e963909537848"), ObjectId("5e44f9ed221e963909537849") ],
someID: ObjectId("5e44f9e7221e963909537845"),
timestamp: ISODate("2020-02-12T01:00:00.0Z")
}
The Indexes:
I created two indexes, as mentioned in the question post:
{ timestamp: 1 } and { arrayX:1, someID:1, timestamp:1 }
The Query:
db.test.find(
{
someID: ObjectId("5e44f9e7221e963909537845"),
arrayX: ObjectId("5e44f9ed221e963909537848")
}
).sort( { timestamp: 1 } )
In the above query I am not using $elemMatch. A query filter using $elemMatch with single field equality condition can be written without the $elemMatch. From $elemMatch Single Query Condition:
If you specify a single query predicate in the $elemMatch expression,
$elemMatch is not necessary.
The Query Plan:
I ran the query with explain, and found that the query uses the arrayX_1_someID_1_timestamp_1 index. The index is used for the filter as well as the sort operation of the query.
Sample plan details:
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"arrayX" : 1,
"someID" : 1,
"timestamp" : 1
},
"indexName" : "arrayX_1_someID_1_timestamp_1",
...
The IXSCAN stage shows that the query uses the index. The FETCH stage retrieves the full documents that the matching index entries point to. This means that both the query's filter and its sort use the index. The way to know that a sort uses an index is that the plan has no SORT stage, as in this case.
Reference:
From Sort and Non-prefix Subset of an Index:
An index can support sort operations on a non-prefix subset of the
index key pattern. To do so, the query must include equality
conditions on all the prefix keys that precede the sort keys.
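The quoted rule can be expressed as a small check. The sketch below is a simplification written for this discussion (indexCanSort is a hypothetical helper that ignores sort direction and multikey details); it only encodes "every index key before the sort keys has an equality condition".

```javascript
// Simplified check of the quoted rule: the sort keys must appear in the index
// directly after a run of leading keys that all have equality conditions.
// Sort direction is deliberately ignored in this sketch.
function indexCanSort(indexKeys, equalityFields, sortKeys) {
  const start = indexKeys.indexOf(sortKeys[0]);
  if (start === -1) return false;
  const prefixAllEquality = indexKeys
    .slice(0, start)
    .every((key) => equalityFields.includes(key));
  const sortKeysContiguous = sortKeys.every(
    (key, i) => indexKeys[start + i] === key
  );
  return prefixAllEquality && sortKeysContiguous;
}

const indexKeys = ["arrayX", "someID", "timestamp"];
const withBothEqualities = indexCanSort(indexKeys, ["arrayX", "someID"], ["timestamp"]);
const withOneEquality = indexCanSort(indexKeys, ["arrayX"], ["timestamp"]);
console.log(withBothEqualities, withOneEquality); // true false
```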
I have a collection like below
transfer_collection
- type
- value
- from
- to
- timestamp
What I want to extract from the collection is the 100 highest values of a certain type within a certain time period.
For example,
db.getCollection('transfer_collection').find({$and:[{"type":"normal"}, {"timestamp":{$gte:ISODate("2018-01-10T00:00:00.000Z")}}, {"timestamp":{$lt:ISODate("2018-01-11T00:00:00.000Z")}}]}).sort({"value":-1}).limit(100)
My question is: for query performance, how should I build the index?
{timestamp:1, type:1, value:-1}
{type:1, timestamp:1, value:-1}
{type:1, value:-1, timestamp:1}
Or something else?
Thank you in advance.
For this query, the compound index on { type: 1, timestamp: 1, value: -1 } looks like the obvious choice, but it is not.
A key in a compound index can be used for the query's sort only if the index keys before it have equality conditions, not range conditions (operators like $gte, $lt, etc.). In this case the key before the sort key has a range condition ("timestamp":{$gte:ISODate....
This requires organizing the index as: { type: 1, value: -1, timestamp: 1 }
This concept is called Equality, Sort, Range: the keys of the compound index should be in that order, with the type field for the equality condition, the value field for the sort operation, and the timestamp field for the range condition.
Verify this by running the explain() function with the query. Use "executionStats" mode and check the results. The query plan should have a winningPlan with IXSCAN and there should not be a SORT stage (a sort operation that uses an index will not have the sort stage).
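As a small illustration of the Equality, Sort, Range ordering, the sketch below assembles an index key pattern from the three groups of fields. buildEsrIndex is a hypothetical helper written for this example, not a MongoDB API; it relies on JavaScript objects keeping insertion order for string keys.

```javascript
// Build a compound index key pattern in Equality, Sort, Range order.
// Equality and range keys default to ascending (1) in this sketch.
function buildEsrIndex({ equality, sort, range }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) keys[field] = direction;
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1;
  }
  return keys;
}

const esrIndex = buildEsrIndex({
  equality: ["type"],
  sort: { value: -1 },
  range: ["timestamp"],
});
console.log(JSON.stringify(esrIndex)); // {"type":1,"value":-1,"timestamp":1}
```

The resulting object has the same shape as the key pattern that would be passed to createIndex on transfer_collection.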
Note About Query Filter Document:
The query filter: { $and: [ { "type":"normal" }, {"timestamp":{ $gte:ISODate("2018-01-10T00:00:00.000Z") } }, { "timestamp": { $lt:ISODate("2018-01-11T00:00:00.000Z") } } ] }
You don't need the $and operator in this query. The filter can be written more simply as follows:
find( { "type":"normal", "timestamp": { $gte:ISODate("2018-01-10T00:00:00.000Z"), $lt:ISODate("2018-01-11T00:00:00.000Z") } } ).sort(...)...
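The same simplification can be done mechanically for simple cases. The sketch below is a hypothetical helper (flattenAnd is not part of MongoDB or its drivers), assuming every clause is a plain { field: value } or { field: { $op: ... } } document; it refuses to merge anything that would still need an explicit $and.

```javascript
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// True for condition documents such as { $gte: x } whose keys are all operators.
function isOperatorObject(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    !Array.isArray(value) &&
    !(value instanceof Date) &&
    Object.keys(value).length > 0 &&
    Object.keys(value).every((key) => key.startsWith("$"))
  );
}

// Merge the clauses of a simple $and into a single filter document.
function flattenAnd(filter) {
  const merged = {};
  for (const clause of filter.$and) {
    for (const [field, condition] of Object.entries(clause)) {
      if (!hasOwn(merged, field)) {
        merged[field] = isOperatorObject(condition) ? { ...condition } : condition;
        continue;
      }
      const existing = merged[field];
      const mergeable =
        isOperatorObject(existing) &&
        isOperatorObject(condition) &&
        Object.keys(condition).every((op) => !hasOwn(existing, op));
      if (!mergeable) {
        throw new Error(`conditions on "${field}" still need an explicit $and`);
      }
      Object.assign(existing, condition);
    }
  }
  return merged;
}

const dayStart = new Date("2018-01-10T00:00:00.000Z");
const dayEnd = new Date("2018-01-11T00:00:00.000Z");
const flatFilter = flattenAnd({
  $and: [
    { type: "normal" },
    { timestamp: { $gte: dayStart } },
    { timestamp: { $lt: dayEnd } },
  ],
});
console.log(Object.keys(flatFilter)); // [ 'type', 'timestamp' ]
```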
How to efficiently do an $in lookup on a collection with a compound index?
The index is on fields a and b, e.g. db.foo.createIndex({a: 1, b: 1})
Example in SQL:
SELECT *
FROM foo
WHERE (a,b)
IN (
("aVal1", "bVal1"),
("aVal2", "bVal2")
);
I know you can do something like:
db.foo.find( {
$or: [
{ a: "aVal1", b: "bVal1" },
{ a: "aVal2", b: "bVal2" },
]
} )
Is there a more performant way to do this using the $in operator?
Since you have already created a compound index on (a, b), all of your clause expressions are supported by an index, so MongoDB will use index scans instead of a collection scan. That is probably fast enough.
Reference: $or Clauses and Indexes
When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.
Now about your question
Is there a more performant way to do this using the $in operator?
$in matches against the entire field value. If you want to match on the pair (a, b), then (a, b) must become an embedded object to be searched with $in.
I'm not sure whether an embedded object fits your current schema or requirements. But if it does, $in is known to perform better than $or:
When using $or with <expressions> that are equality checks for the value of the same field, use the $in operator instead of the $or operator.
In this case, if you have an embedded object like {e: {a: 'x', b: 'y'}}, then db.collections.createIndex({e: 1}) paired with $in will speed things up.
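To make the two shapes concrete, the sketch below builds both filters from the same list of (a, b) pairs. The helper names are hypothetical, and the embedded field e is taken from the example above. One caveat to keep in mind: equality on a whole embedded document is an exact match that includes field order, so the stored documents would need to be written as {a, b} in that order.

```javascript
// Build the $or form: one equality clause per (a, b) pair.
function pairsToOr(pairs) {
  return { $or: pairs.map(([a, b]) => ({ a, b })) };
}

// Build the $in form against a hypothetical embedded field e: { a, b }.
// Field order inside each embedded document matters for the match.
function pairsToEmbeddedIn(pairs) {
  return { e: { $in: pairs.map(([a, b]) => ({ a, b })) } };
}

const lookupPairs = [
  ["aVal1", "bVal1"],
  ["aVal2", "bVal2"],
];
const orFilter = pairsToOr(lookupPairs);
const inFilter = pairsToEmbeddedIn(lookupPairs);
console.log(JSON.stringify(inFilter));
// {"e":{"$in":[{"a":"aVal1","b":"bVal1"},{"a":"aVal2","b":"bVal2"}]}}
```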
I am facing a strange issue. I have a partial, compound, unique index with the definition:
createIndex({a: 1, b:1, c: 1}, {unique:1, partialFilterExpression: {c: {$type: "string"}}})
Now when I perform a query, this index is never used according to the explain plan, even though there are document(s) matching the query.
Changing the same index to sparse instead of partial fixes the above issue, but sparse, compound, unique indexes have the following issue:
dealing-with-mongodb-unique-sparse-compound-indexes
As noted in the query coverage documentation for partial indexes:
MongoDB will not use the partial index for a query or sort operation if using the index results in an incomplete result set.
To use the partial index, a query must contain the filter expression (or a modified filter expression that specifies a subset of the filter expression) as part of its query condition.
In your setup you created a partial index filtering on {c: {$type: "string"}}.
Your query conditions are {a:"1", b:"p", c:"2"}, or a query shape of three equality comparisons ({a: eq, b: eq, c: eq}). Since this query shape does not include a $type filter on c, the query planner has to consider that queries fitting the shape should match values of any data type and the partial index is not a viable candidate for complete results.
Some example queries that would use your partial index (tested with MongoDB 3.4.5):
// Search on {a, b} with c criteria matching the index filter
db.mydb.find({a:"1", b:"p", c: { $type: "string" } })
// Search on {a,b,c} and use $and to include the type of c
db.mydb.find({a:"1", b:"p", $and: [{ c: "2"} , {c: { $type: "string" }}]})
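If many queries need this treatment, the second form can be generated from a plain equality query. The helper below is a hypothetical sketch (withTypeFilter is not a MongoDB function) and assumes the incoming query has no $and of its own; it only builds the filter document and says nothing about whether a given server version will pick the index.

```javascript
// Wrap an equality query so that it also carries the partial index's
// $type filter on the given field. Assumes the query has no existing $and.
function withTypeFilter(query, field, bsonType) {
  const { [field]: value, ...rest } = query;
  const typeClause = { [field]: { $type: bsonType } };
  if (value === undefined) return { ...rest, ...typeClause };
  return { ...rest, $and: [{ [field]: value }, typeClause] };
}

const partialQuery = withTypeFilter({ a: "1", b: "p", c: "2" }, "c", "string");
console.log(JSON.stringify(partialQuery));
// {"a":"1","b":"p","$and":[{"c":"2"},{"c":{"$type":"string"}}]}
```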
I have the following query:
a : true AND (b : 1 OR b : 2) AND ( c: null OR (c > startDate AND c <endDate))
So basically I am thinking of a compound index on all three fields, because I have no sorting at all. In the first step, using the index on the boolean field, I will eliminate the largest portion of documents.
Then, for the second field, I saw that an OR clause creates two separate queries and then combines them while removing duplicates, so this should be pretty fast and efficient.
The last condition is a simple date range, so I think that adding that field to the index will be a good option.
Any suggestions on my thoughts? Thanks.
This query:
a : true AND (b : 1 OR b : 2) AND ( c: null OR (c > startDate AND c <endDate))
could otherwise be translated as:
db.collection.find({
a:true,
b:{$in:[1,2]},
$or: [
{c:null},
{c: {$gt: startDate, $lt: endDate}}
]
})
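To sanity-check the translation, the same condition can be written as a plain JavaScript predicate and tried against a few sample documents. This is only a sketch for reasoning about the logic (matchesQuery and the dates are made up); it also treats a missing c like c: null, since a {c: null} filter matches documents where the field does not exist.

```javascript
// Plain JavaScript equivalent of the filter above, for checking the logic.
function matchesQuery(doc, startDate, endDate) {
  const cIsNullOrMissing = doc.c === null || doc.c === undefined;
  const cInRange =
    doc.c instanceof Date && doc.c > startDate && doc.c < endDate;
  return doc.a === true && [1, 2].includes(doc.b) && (cIsNullOrMissing || cInRange);
}

// Hypothetical date range and documents.
const rangeStart = new Date("2020-01-01T00:00:00Z");
const rangeEnd = new Date("2020-02-01T00:00:00Z");

const candidates = [
  { a: true, b: 1, c: null },
  { a: true, b: 2 },
  { a: true, b: 2, c: new Date("2020-01-15T00:00:00Z") },
  { a: true, b: 3, c: null },
  { a: false, b: 1, c: null },
  { a: true, b: 1, c: new Date("2020-03-01T00:00:00Z") },
];
const verdicts = candidates.map((doc) => matchesQuery(doc, rangeStart, rangeEnd));
console.log(verdicts); // [ true, true, true, false, false, false ]
```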
Because of that $or you will most likely need two indexes. However, since the $or covers only c, that part only needs an index on c. So that is our first index:
db.collection.createIndex({c:1})
We cannot use the $or with a compound index, because compound indexes are used by prefix and each $or clause is evaluated as a completely separate query. As such, it would be best to use a and b as the prefix of a second index.
This means you just need an index to cover the other part of your query:
db.collection.createIndex({b:1,a:1})
We put b first because a is a boolean; the index should perform better with b first.
Note: I am unsure about an index on a at all due to its low cardinality.