This is a question about how to create efficient indexes when a query contains $or. Without $or, I know how to create an efficient index.
This is my query.
db.collection.find({
    'msg.sendTime': {$gt: 1},
    'msg.msgType': {$in: ["chat", "g_card"]},
    $or: [{'msg.recvId': {$in: ['xm80049258']}}, {'msg.userId': 'xm80049258'}],
    $orderby: {'msg.sendTime': -1}
})
After reading some articles, I created two single-field indexes, on msg.recvId and msg.userId, and this makes sense.
I want to know: when MongoDB executes the $or, does it split up all the documents at the very start, and then apply msg.sendTime and msg.msgType?
How do I create efficient indexes in this case? Should I create the indexes (msg.sendTime: 1, msg.msgType: 1, msg.recvId: 1) and (msg.sendTime: 1, msg.msgType: 1, msg.userId: 1)?
Thanks very much.
Paraphrasing from $or Clauses and Indexes:
When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes.
Also from Indexing Strategies:
Generally, MongoDB only uses one index to fulfill most queries. However, each clause of an $or query may use a different index
Here is what those paragraphs mean for $or queries:
In a find() query, only one index can be used, so it's best to create an index that aligns with the fields in your query. Otherwise, MongoDB will do a collection scan.
The exception is an $or query, where MongoDB can use one index per $or term.
In combination: if you have $or in your query, it's best to make the $or the top-level term, and create an index for each term separately.
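As a toy illustration (plain JavaScript, not a MongoDB API — the helper name is made up), distributing the non-$or conditions into each $or branch looks like this:

```javascript
// Hypothetical helper: distribute the top-level conditions into each
// $or branch, so every branch is a complete query that can be matched
// against its own index.
function liftOr(query) {
  const { $or, ...rest } = query;
  if (!$or) return query;
  return { $or: $or.map((branch) => ({ ...rest, ...branch })) };
}

// {a: 1, $or: [{b: 1}, {b: 2}]}  →  {$or: [{a: 1, b: 1}, {a: 1, b: 2}]}
const lifted = liftOr({ a: 1, $or: [{ b: 1 }, { b: 2 }] });
```

The rewritten form is logically equivalent, but gives the planner two complete branches to index separately.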
So to answer your question:
I want to know: when MongoDB executes the $or, does it split up all the documents at the very start, and then apply msg.sendTime and msg.msgType?
If your query has a top-level $or clause, MongoDB can use one index per clause. Otherwise, it will do a collection scan, or a semi-collection scan. For example, if you have an index:
db.collection.createIndex({a: 1, b: 1})
There are two general types of query you can create:
1. $or NOT on the top level of the query
This query can use the index, but will not be performant:
db.collection.find({a: 1, $or: [{b: 1}, {b: 2}]})
since the explain() output of the query is:
> db.collection.explain().find({a: 1, $or: [{b: 1}, {b: 2}]})
{
    "queryPlanner": {
        ...
        "indexBounds": {
            "a": [
                "[1.0, 1.0]"
            ],
            "b": [
                "[MinKey, MaxKey]"
            ]
        ...
Note that the query planner cannot use a tight boundary for the b field; it is doing a semi-collection scan, since it searches for b from MinKey to MaxKey, i.e. everything. The query planner result above is basically saying: "Find documents where a = 1, and scan all of them for b having a value of 1 or 2".
2. $or on the top level of the query
However, pulling the $or clause to the top-level:
db.collection.find({$or: [{a: 1, b: 1}, {a: 1, b: 2}]})
will result in this query plan:
> db.collection.explain().find({$or: [{a: 1, b: 1}, {a: 1, b: 2}]})
{
    "queryPlanner": {
        ...
        "winningPlan": {
            "stage": "SUBPLAN",
            ...
            "inputStages": [
                {
                    "stage": "IXSCAN",
                    ...
                    "indexBounds": {
                        "a": [
                            "[1.0, 1.0]"
                        ],
                        "b": [
                            "[1.0, 1.0]"
                        ]
                    }
                },
                {
                    "stage": "IXSCAN",
                    ...
                    "indexBounds": {
                        "a": [
                            "[1.0, 1.0]"
                        ],
                        "b": [
                            "[2.0, 2.0]"
                        ]
Note that each term of the $or is treated as a separate query, each with a tight boundary. As such, the query plan above is saying: "Find documents where a = 1, b = 1 or a = 1, b = 2". As you can imagine, this query will be much more performant compared to the earlier query.
For your second question:
How do I create efficient indexes in this case? Should I create the indexes (msg.sendTime: 1, msg.msgType: 1, msg.recvId: 1) and (msg.sendTime: 1, msg.msgType: 1, msg.userId: 1)?
As explained above, you need to combine the proper query with the proper index to achieve the best result. MongoDB can use the two indexes you proposed, and they will work best if you rearrange your query so that the $or is at the top level.
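As a sketch (plain JavaScript objects, with the filter values taken from your question), the rearranged query repeats the shared conditions in each $or branch, so that each branch lines up with one of your two proposed compound indexes:

```javascript
// Shared conditions duplicated into each $or branch, so that each branch
// on its own matches one of the proposed compound indexes.
const shared = {
  "msg.sendTime": { $gt: 1 },
  "msg.msgType": { $in: ["chat", "g_card"] },
};

const query = {
  $or: [
    { ...shared, "msg.recvId": { $in: ["xm80049258"] } }, // (sendTime, msgType, recvId)
    { ...shared, "msg.userId": "xm80049258" },            // (sendTime, msgType, userId)
  ],
};

// In the shell you would then run:
//   db.collection.find(query).sort({ "msg.sendTime": -1 })
```

Check the explain() output after the rewrite to confirm each branch gets its own IXSCAN.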
I encourage you to understand the explain() output of MongoDB, since it's the best tool to find out if your queries are using the proper indexes or not.
Relevant resources that you may find useful are:
Explain Results
Create Indexes to Support Your Queries
Indexing Strategies
I have a lot of documents looking like this:
[{
    "title": "Luxe [daagse] [verzorging] # Egypte! Incl. vluchten, transfers & 4* ho",
    "price": 433,
    "automatic": false,
    "destination": "5d26fc92f72acc7a0b19f2c4",
    "date": "2020-01-19T00:00:00.000+00:00",
    "days": 8,
    "arrival_airport": "5d1f5b407ec7385fa2963623",
    "departure_airport": "5d1f5adb7ec7385fa2963307",
    "board_type": "5d08e1dfff6c4f13f6db1e6c"
}, {
    "title": "Luxe [daagse] [verzorging] # Egypte! Incl. vluchten, transfers & 4* ho",
    "automatic": true,
    "destination": "5d26fc92f72acc7a0b19f2c4",
    "prices": [{
        "price": 433,
        "date_from": "2020-01-19T00:00:00.000+00:00",
        "date_to": "2020-01-28T00:00:00.000+00:00",
        "day_count": 8,
        "arrival_airport": "5d1f5b407ec7385fa2963623",
        "departure_airport": "5d1f5adb7ec7385fa2963307",
        "board_type": "5d08e1dfff6c4f13f6db1e6c"
    }, {
        "price": 899,
        "date_from": "2020-04-19T00:00:00.000+00:00",
        "date_to": "2020-04-28T00:00:00.000+00:00",
        "day_count": 19,
        "arrival_airport": "5d1f5b407ec7385fa2963623",
        "departure_airport": "5d1f5adb7ec7385fa2963307",
        "board_type": "5d08e1dfff6c4f13f6db1e6c"
    }]
}]
As you can see, automatic deals have multiple prices (there can be a lot, between 1000 and 4000) and do not have the original top-level fields available.
Now I need to search in the original documents as well as in the subdocuments to look for a match.
This is the aggregation I use to search through the documents:
[{
    "$match": {
        "destination": { "$in": ["5d26fc9af72acc7a0b19f313"] }
    }
}, {
    "$match": {
        "$or": [{
            "prices": {
                "$elemMatch": {
                    "price": { "$lte": 1500, "$gte": 400 },
                    "date_to": { "$lte": "2020-04-30T22:00:00.000Z" },
                    "date_from": { "$gte": "2020-03-31T22:00:00.000Z" },
                    "board_type": { "$in": ["5d08e1bfff6c4f13f6db1e68"] }
                }
            }
        }, {
            "price": { "$lte": 1500, "$gte": 400 },
            "date": {
                "$lte": "2020-04-30T22:00:00.000Z",
                "$gte": "2020-03-31T22:00:00.000Z"
            },
            "board_type": { "$in": ["5d08e1bfff6c4f13f6db1e68"] }
        }]
    }
}, {
    "$limit": 20
}]
I would like to speed things up, because it can be quite slow. I was wondering, what is the best index strategy for this aggregate, what fields do I use? Is this the best way of doing it or is there a better way?
From Mongo's $or docs:
When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.
So, with that in mind, in order to avoid a collection scan in this pipeline you have to create a compound index containing both the price and prices fields.
Remember that order matters in compound indexes, so the order of the fields should vary depending on how you expect to use them.
It seems to me that the index you want to create looks something like:
{destination: 1, date: 1, board_type: 1, price: 1, prices: 1}
A compound index including the match filter fields is required to make the aggregation run fast. In aggregation queries, a $match stage early in the pipeline (preferably the first stage) utilizes indexes, if any are defined on the filter fields. That is the case in the posted query, so defining the right indexes is all that's needed for a fast query. But an index on what fields?
The index is going to be a compound index, i.e., an index on multiple fields of the query criteria. The index prefix starts with the destination field. The remaining index fields are to be determined. What are the remaining fields?
Most of these fields are sub-document fields of the prices array: price, date_from, date_to and board_type. There is also the date field from the main document. Which of these fields need to be included in the compound index?
Defining indexes on array elements (or on fields of sub-documents in an array) creates lots of index keys. This means lots of storage and, when the index is used, lots of memory (RAM). This is an important consideration. Indexes on array elements are called multikey indexes. For an index to be properly utilized, the collection's documents and the index being used by the query (together called the working set) must fit into RAM.
Another aspect to consider is query selectivity: how many documents get selected by a filter that uses an index field. To be effective, a filter field must select a small subset of the input documents. See Create Queries that Ensure Selectivity.
It is difficult to determine which other fields need to be included (surely some of the prices fields) based on the above two factors. So, the index is going to be something like this:
{ destination: 1, fld1: 1, fld2: 1, ... }
The fld1, fld2, ... are going to be fields of the prices array's sub-documents and/or the date field. I think only one set of date fields can be used with the index. An example index can be one of these:
{ destination: 1, date: 1, "prices.price": 1, "prices.board_type": 1}
{ destination: 1, "prices.price": 1, "prices.date_from": 1, "prices.date_to": 1, "prices.board_type": 1}
Note that the index key order, and whether price, date_from, date_to and board_type are all necessary, are to be determined based upon the two main factors, the working set requirement and the query selectivity; this is important.
NOTES: A trial on a small sample data set with a similar structure showed usage of the compound index with the primary destination field and two fields from prices (one with an equality condition and one with a range condition). The query plan from explain showed an IXSCAN (index scan) on the compound index, and using an index will surely improve the query performance.
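For instance, the second candidate could be created in the mongo shell like this (a sketch; whether these exact fields are right must be verified with explain on your own data):

```javascript
// Candidate multikey compound index from the discussion above.
db.collection.createIndex({
  destination: 1,
  "prices.price": 1,
  "prices.date_from": 1,
  "prices.date_to": 1,
  "prices.board_type": 1
})

// Then confirm the winning plan contains an IXSCAN stage:
// db.collection.explain().aggregate(/* your pipeline */)
```

Drop the experiment index again if the working set grows too large to fit in RAM.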
I'm using MongoDB version 4.2.0. I have a collection with the following indexes:
{uuid: 1},
{unique: true, name: "uuid_idx"}
and
{field1: 1, field2: 1, _id: 1},
{unique: true, name: "compound_idx"}
When executing this query
aggregate([
{"$match": {"uuid": <uuid_value>}}
])
the planner correctly selects uuid_idx.
When adding this sort clause
aggregate([
{"$match": {"uuid": <uuid_value>}},
{"$sort": {"field1": 1, "field2": 1, "_id": 1}}
])
the planner selects compound_idx, which makes the query slower.
I would expect the sort clause to not make a difference in this context. Why does Mongo not use the uuid_idx index in both cases?
EDIT:
A little clarification, I understand there are workarounds to use the correct index, but I'm looking for an explanation of why this does not happen automatically (if possible with links to the official documentation). Thanks!
Why is this happening?
Let's understand how Mongo chooses which index to use, as explained here.
If a query can be satisfied by multiple indexes defined in the collection ("satisfied" is used loosely here, since Mongo actually gathers all possibly relevant indexes), MongoDB will trial all the applicable plans in parallel. The first index that returns 101 results is selected by the query planner.
Meaning that, for that particular query, that index actually wins the race.
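A toy model (plain JavaScript, not MongoDB internals; the document counts are made up) of why the sort matters in this race: a plan that needs a blocking SORT stage cannot yield its first result until it has consumed every matching document, while a plan reading an index that already matches the sort order streams results immediately:

```javascript
// Toy cost model: units of work before a plan has yielded `target` results.
function roundsToYield(target, plan) {
  if (plan.blockingSort) {
    // Must read all matches, sort them, then emit the first `target` results.
    return plan.totalMatches + target;
  }
  return target; // streams one in-order result per unit of work
}

const uuidPlan     = { name: "uuid_idx",     blockingSort: true,  totalMatches: 500 };
const compoundPlan = { name: "compound_idx", blockingSort: false, totalMatches: 500 };

const uuidRounds     = roundsToYield(101, uuidPlan);     // → 601
const compoundRounds = roundsToYield(101, compoundPlan); // → 101
```

Under this model the streaming plan reaches 101 results first and wins the trial, even if it touches more index keys overall.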
What can we do?:
We can use hint, which forces Mongo to use a specific index. However, this is not recommended, because if the data or the workload changes, Mongo will not adapt its plan choice.
The query:
aggregate(
    [
        { $match: { uuid: "some_value" } },
        { $sort: { field1: 1, field2: 1, _id: 1 } }
    ]
)
doesn't use the index "uuid_idx".
There are a couple of options you can work with for using indexes on both the match and sort operations:
(1) Define a new compound index: { uuid: 1, field1: 1, field2: 1, _id: 1 }
Both the match and match+sort queries will use this index (for both the match and sort operations).
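Option (1) as a shell command (a sketch; the index name "uuid_sort_idx" is an assumption, pick whatever fits your naming scheme):

```javascript
// New compound index covering the equality match on uuid plus the sort keys,
// so the match+sort query needs no blocking SORT stage.
db.collection.createIndex(
  { uuid: 1, field1: 1, field2: 1, _id: 1 },
  { name: "uuid_sort_idx" }
)
```
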
(2) Use the hint on the uuid index (using existing indexes)
Both the match and match+sort queries will use this index (for both the match and sort operations).
aggregate(
    [
        { $match: { uuid: "some_value" } },
        { $sort: { field1: 1, field2: 1, _id: 1 } }
    ],
    { hint: "uuid_idx" }
)
If you can use find instead of aggregate, it will use the right index; so this still seems to be a problem specific to the aggregation pipeline.
I'm using mongo 2.6.8 and have the following problem:
Collection users has indexes _id_1 and b_1. When I perform query
db.users.find({"$and": [
    {"b": {"$gt": ISODate("somedate")}},
    {"b": {"$lt": ISODate("anotherdate")}},
    {"_id": {"$gt": "somevalue"}},
    {"_id": {"$lt": "anothervalue"}}
]})
I expect that Mongo will perform index intersection and use the intersected index, but it chooses only the b_1 index. When executing explain on this query, the allPlans section doesn't even contain an intersection plan, only _id_1 and b_1.
Why does mongo not perform index intersection?
I think this could result from the fact that you have two restrictions on the same indexed key in your query ($gt and $lt on b, and the same for _id). What happens to your explain output if you change your query to the following? If it uses intersection, I would be right:
db.users.find({"$and": [
    {"b": {"$gt": ISODate("somedate")}},
    {"_id": {"$gt": "somevalue"}}
]})
In this case, using both restrictions on one index could be faster than using only one restriction from each of the two indexes and intersecting the results.
Let's say I have the following document structure:
{ _id : 1,
items: [ {n: "Name", v: "Kevin"}, ..., {n: "Age", v: 100} ],
records : [ {n: "Environment", v: "Linux"}, ... , {n: "RecordNumber", v: 555} ]
}
If I create 2 compound indexes on items.n-items.v and records.n-records.v, I could perform an $all query:
db.collection.find( {"items" : {$all : [{ $elemMatch: {n: "Name", v: "Kevin"} },
{$elemMatch: {n: "Age", v: 100} } ]}})
I could also perform a similar search on records.
db.collection.find( {"records" : {$all : [{ $elemMatch: {n: "Environment", v: "Linux"} },
{$elemMatch: {n: "RecordNumber", v: 555} } ]}})
Can I somehow perform a query that uses the index(es) to search for a document based on the items and records field?
find all documents where item.n = "Name" and item.v = "Kevin" AND record.n="RecordNumber" and record.v = 100
I'm not sure that this is possible using $all.
You can use an index to query on one array, but not both. Per the documentation: "While you can create multikey compound indexes, at most one field in a compound index may hold an array."
Practically:
You can use a Compound index to index multiple fields.
You can use a Multikey index to index all the elements of an array.
You can use a Multikey index as one element of a compound Index
You CANNOT use more than one multikey (array) field in a single compound index
The documentation lays out the reason for this pretty clearly:
MongoDB does not index parallel arrays because they require the index to include each value in the Cartesian product of the compound keys, which could quickly result in incredibly large and difficult to maintain indexes.
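A quick illustration (plain JavaScript, not MongoDB internals) of why the Cartesian product blows up: indexing both arrays of one document would need one index key per (items, records) pair:

```javascript
// Number of index keys a single document would need if both arrays were
// indexed together: every element of one array paired with every element
// of the other.
function parallelArrayKeyCount(doc) {
  return doc.items.length * doc.records.length;
}

const doc = {
  items:   [{ n: "Name", v: "Kevin" }, { n: "Age", v: 100 }],
  records: [{ n: "Environment", v: "Linux" }, { n: "RecordNumber", v: 555 }],
};

parallelArrayKeyCount(doc); // → 4 keys for this tiny document
// With 100 items and 100 records it would already be 10,000 keys per document.
```
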
I have a question about MongoDB sparse indexes.
I have a collection (post) with very small documents (6K the biggest) that can embed a sub-document in this way:
{
"a": "a-val",
"b": "b-val",
"meta": {
"urls": [ "url1", "url2" ... ],
"field1": "value1",
...
}
}
The fields "a" and "b" are always present, but "meta.urls" may not exist!
Now, I have inserted just one document with a "meta.urls" value, and then I did:
db.post.ensureIndex({"a": 1, "b": 1, "meta.urls": 1}, {sparse: true});
post stats gives me a "strange" result: the index is about 97 MB!
How is that possible? Only one document with "meta.urls" was inserted, yet the index size is 97 MB?
So, I tried to create only the "meta.urls" index, like this:
db.post.ensureIndex({"meta.urls": 1}, {sparse: true});
I now have a "meta.urls_1" index with just 1 document.
But if I explain a simple query like this:
db.post.find({"meta.urls": {$exists: true}}).hint("meta.urls_1").explain({verbose: true});
I have another "strange" result:
"n" : 1,
"nscannedObjects" : 5,
"nscanned" : 5,
Why does Mongo scan 5 docs, and not just the one in the index?
If I query for an exact match on "meta.urls", the single sparse index works correctly.
Example:
db.post.find({"meta.urls": "url1"}).hint("meta.urls_1") // 1 document
For your first question: you can use a compound index to search on a prefix of the keys it indexes. For example, your first index would be used if you searched on just a, or on both a and b. Thus, the sparse option will only fail to index docs where a is null.
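A toy check (plain JavaScript, not MongoDB's actual planner) of the prefix rule for the compound index {a: 1, b: 1, "meta.urls": 1}:

```javascript
// A query can use a compound index when its fields form a prefix of the
// index's key list (this simplification ignores sort order and range bounds).
function usesPrefix(indexFields, queryFields) {
  return queryFields.every((field, i) => indexFields[i] === field);
}

const indexFields = ["a", "b", "meta.urls"];

const q1 = usesPrefix(indexFields, ["a"]);      // → true
const q2 = usesPrefix(indexFields, ["a", "b"]); // → true
const q3 = usesPrefix(indexFields, ["b"]);      // → false: b alone is not a prefix
```

Since a is always present in your documents, every document gets indexed, which explains the large index size.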
I don't have an answer for your second question, but you should try updating MongoDB and trying again; it's moving pretty quickly, and sparse indexes have gotten better in the past few months.