I am using this solution for indexing messages with many varying fields. Specifically, I am using Solution#2.
The example of the range syntax given there is:
db.generic2.find({"props": { $elemMatch: {$gte: {"prop1": 6}, $lt: {"prop1": 99999999 } }}})
I have never seen this syntax in the MongoDB docs; everywhere I look, I see syntax like this instead:
db.generic2.find({"props": { $elemMatch: {"prop1": {$gte: 6, $lt: 99999999 }}}})
What is the difference? Curiously, with the first one I get a fast query that uses the index, while with the second I get a slow query with a collection scan. Both queries return valid results, but the results differ.
I have a query in SQL that I want to translate into a MongoDB query.
The statement is:
select * from TBA where a/b < c/d
a, b and c are columns in the table TBA, and d is a constant.
How can I rewrite this statement into the MongoDb query language?
I have a document collection called "TBA" where all documents are stored. Now I want to find out which documents fulfill the condition "a/b < c/d".
Thank you in advance.
Best regards,
user12682244
If you want to do a calculation using the values stored in the document, you need to use a pipeline:
db.collection.aggregate([
  {$match: {
    $expr: {
      $lt: [
        {$divide: ["$a", "$b"]},
        {$divide: ["$c", d]}
      ]
    }
  }}
])
See how it works on the playground example
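As a minimal sketch of how the constant d could be passed in, the pipeline above can be wrapped in a small builder function. The names buildRatioPipeline, matchesRatio and sampleDocs are hypothetical and only serve the illustration; the plain-JavaScript predicate mirrors the comparison so it can be checked without a database.

```javascript
// Hypothetical helper: builds the $match stage for "a/b < c/d",
// where a, b, c are document fields and d is a constant supplied by the caller.
function buildRatioPipeline(d) {
  return [
    {
      $match: {
        $expr: {
          $lt: [
            { $divide: ["$a", "$b"] },
            { $divide: ["$c", d] },
          ],
        },
      },
    },
  ];
}

// Plain-JavaScript mirror of the same comparison, for checking sample data only.
function matchesRatio(doc, d) {
  return doc.a / doc.b < doc.c / d;
}

// Made-up sample documents (assumption, not from the original question).
const sampleDocs = [
  { a: 1, b: 2, c: 3 }, // 0.5 < 0.75, so this one matches
  { a: 3, b: 2, c: 3 }, // 1.5 < 0.75 is false, so this one does not
];
const d = 4;
const pipeline = buildRatioPipeline(d);
const matched = sampleDocs.filter((doc) => matchesRatio(doc, d));
```

In the shell this would be used as db.TBA.aggregate(buildRatioPipeline(4)). Note that $divide raises an error when the divisor is zero, so documents with b equal to 0 would likely need to be handled separately.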
I'm using MongoDB version 4.2.0. I have a collection with the following indexes:
{uuid: 1},
{unique: true, name: "uuid_idx"}
and
{field1: 1, field2: 1, _id: 1},
{unique: true, name: "compound_idx"}
When executing this query
aggregate([
{"$match": {"uuid": <uuid_value>}}
])
the planner correctly selects uuid_idx.
When adding this sort clause
aggregate([
{"$match": {"uuid": <uuid_value>}},
{"$sort": {"field1": 1, "field2": 1, "_id": 1}}
])
the planner selects compound_idx, which makes the query slower.
I would expect the sort clause to not make a difference in this context. Why does Mongo not use the uuid_idx index in both cases?
EDIT:
A little clarification, I understand there are workarounds to use the correct index, but I'm looking for an explanation of why this does not happen automatically (if possible with links to the official documentation). Thanks!
Why is this happening?
Let's look at how Mongo chooses which index to use, as explained here.
A query can often be satisfied by multiple indexes defined on the collection ("satisfied" is used loosely, as Mongo actually considers all possibly relevant indexes).
MongoDB will then test all the applicable indexes in parallel. The first index that can return 101 results will be selected by the query planner.
This means that, for that particular query, that index actually wins the trial.
What can we do?
We can use $hint. A hint basically forces Mongo to use a specific index; however, this is not recommended, because if the data or the indexes change, Mongo will not adapt to those changes.
The query:
aggregate(
[
{ $match : { uuid : "some_value" } },
{ $sort : { fld1: 1, fld2: 1, _id: 1 } }
],
)
doesn't use the index "uuid_idx".
There are a couple of options for making the match and sort operations use indexes:
(1) Define a new compound index: { uuid: 1, fld1: 1, fld2: 1, _id: 1 }
Both the match and match+sort queries will use this index (for both the match and sort operations).
(2) Use a hint on the uuid index (using the existing indexes)
Both the match and match+sort queries will then use this index for the match operation; since uuid_idx does not contain the sort fields, the sort is applied to the matched documents afterwards.
aggregate(
[
{ $match : { uuid : "some_value" } },
{ $sort : { fld1: 1, fld2: 1, _id: 1 } }
],
{ hint: "uuid_idx"}
)
If you can use find instead of aggregate, it will use the right index. So this still seems to be a problem specific to the aggregation pipeline.
This is a question about how to create efficient indexes when a query has an "or". Without the "or", I know how to create an efficient index.
This is my query.
db.collection.find({
'msg.sendTime':{$gt:1},
'msg.msgType':{$in:["chat","g_card"]},
$or:[{'msg.recvId':{$in:['xm80049258']}},{'msg.userId':'xm80049258'}],
$orderby:{'msg.sendTime':-1}})
After reading some articles, I created two single-field indexes on msg.recvId and msg.userId, and this makes sense.
I want to know: when MongoDB executes the "or", does it split up all the documents first, and then apply msg.sendTime and msg.msgType?
How do I create efficient indexes in this case? Should I create the indexes (msg.sendTime:1,msg.msgType:1,msg.recvId:1) and (msg.sendTime:1,msg.msgType:1,msg.userId:1)?
Thanks very much.
Paraphrasing from $or Clauses and Indexes:
When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes.
Also from Indexing Strategies:
Generally, MongoDB only uses one index to fulfill most queries. However, each clause of an $or query may use a different index
What those paragraphs mean for $or queries is:
In a find() query, only one index can be used. Therefore it's best to create an index that aligns with the fields in your query. Otherwise, MongoDB will do a collection scan.
Except when the query is an $or query, where MongoDB can use one index per $or term
In combination, if you have $or in your query, it's best to put the $or term as the top-level term, and create an index for each term separately
So to answer your question:
I want to know: when MongoDB executes the "or", does it split up all the documents first, and then apply msg.sendTime and msg.msgType?
If your query has a top-level $or clause, MongoDB can use one index per clause. Otherwise, it will do a collection scan, or a semi-collection scan. For example, if you have an index:
db.collection.createIndex({a: 1, b: 1})
There are two general types of query you can create:
1. $or NOT on the top level of the query
This query can use the index, but will not be performant:
db.collection.find({a: 1, $or: [{b: 1}, {b: 2}]})
since the explain() output of the query is:
> db.collection.explain().find({a: 1, $or: [{b: 1}, {b: 2}]})
{
"queryPlanner": {
...
"indexBounds": {
"a": [
"[1.0, 1.0]"
],
"b": [
"[MinKey, MaxKey]"
]
...
Note that the query planner cannot use the proper boundary for the b field, where it is doing a semi-collection scan (since it's searching for b from MinKey to MaxKey, i.e. everything). The query planner result above is basically saying: "Find documents where a = 1, and scan all of them for b having value of 1 or 2"
2. $or on the top level of the query
However, pulling the $or clause to the top-level:
db.collection.find({$or: [{a: 1, b: 1}, {a: 1, b: 2}]})
will result in this query plan:
> db.collection.explain().find({$or: [{a: 1, b: 1}, {a: 1, b: 2}]})
{
"queryPlanner": {
...
"winningPlan": {
"stage": "SUBPLAN",
...
"inputStages": [
{
"stage": "IXSCAN",
...
"indexBounds": {
"a": [
"[1.0, 1.0]"
],
"b": [
"[1.0, 1.0]"
]
}
},
{
"stage": "IXSCAN",
...
"indexBounds": {
"a": [
"[1.0, 1.0]"
],
"b": [
"[2.0, 2.0]"
]
Note that each term of the $or is treated as a separate query, each with a tight boundary. As such, the query plan above is saying: "Find documents where a = 1, b = 1 or a = 1, b = 2". As you can imagine, this query will be much more performant compared to the earlier query.
For your second question:
How do I create efficient indexes in this case? Should I create the indexes (msg.sendTime:1,msg.msgType:1,msg.recvId:1) and (msg.sendTime:1,msg.msgType:1,msg.userId:1)?
As explained above, you need to combine the proper query with the proper index to achieve the best result. The two indexes you proposed will be able to be used by MongoDB and will work best if you rearrange your query to have the $or in the top-level of your query.
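The rearrangement described here could be sketched as follows. hoistOr is a hypothetical helper, not a MongoDB API: it copies the shared conditions into every $or branch so that the $or ends up at the top level. It assumes that no branch reuses a key from the shared part, and it leaves out the sort, which would be applied separately with .sort({'msg.sendTime': -1}).

```javascript
// Hypothetical helper: moves a nested $or to the top level by copying the
// shared conditions into every branch.
// Assumption: no branch uses a key that also appears in the shared part.
function hoistOr(filter) {
  const { $or, ...shared } = filter;
  if (!Array.isArray($or)) return filter; // nothing to hoist
  return { $or: $or.map((branch) => ({ ...shared, ...branch })) };
}

// The filter from the question, without the $orderby part.
const original = {
  "msg.sendTime": { $gt: 1 },
  "msg.msgType": { $in: ["chat", "g_card"] },
  $or: [
    { "msg.recvId": { $in: ["xm80049258"] } },
    { "msg.userId": "xm80049258" },
  ],
};

// Each branch now carries sendTime, msgType and its own id condition.
const rewritten = hoistOr(original);
```

Each branch of the rewritten filter then lines up with one of the two proposed compound indexes. Whether the planner actually picks them is best confirmed with explain().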
I encourage you to understand the explain() output of MongoDB, since it's the best tool to find out if your queries are using the proper indexes or not.
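As a rough aid for reading explain() output, the sketch below walks a winningPlan tree and collects the index names of its IXSCAN stages. indexNamesUsed and samplePlan are hypothetical names, and the sample plan is hand-written to resemble the structure shown above; it is not real server output.

```javascript
// Collects the names of all indexes used by IXSCAN stages in a winningPlan tree.
// Child stages are found under "inputStage" (single) or "inputStages" (array).
function indexNamesUsed(stage, names = []) {
  if (!stage || typeof stage !== "object") return names;
  if (stage.stage === "IXSCAN" && stage.indexName) names.push(stage.indexName);
  if (stage.inputStage) indexNamesUsed(stage.inputStage, names);
  for (const child of stage.inputStages || []) indexNamesUsed(child, names);
  return names;
}

// Hand-written sample shaped like a winningPlan for a top-level $or query
// (assumption: illustrative only).
const samplePlan = {
  stage: "SUBPLAN",
  inputStage: {
    stage: "FETCH",
    inputStage: {
      stage: "OR",
      inputStages: [
        { stage: "IXSCAN", indexName: "a_1_b_1" },
        { stage: "IXSCAN", indexName: "a_1_b_1" },
      ],
    },
  },
};

const usedIndexes = indexNamesUsed(samplePlan);
const noIndexes = indexNamesUsed({ stage: "COLLSCAN" });
```

In the shell, the helper could be applied to db.collection.find(...).explain().queryPlanner.winningPlan. An empty result suggests that the plan contains no index scan at all.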
Relevant resources that you may find useful are:
Explain Results
Create Indexes to Support Your Queries
Indexing Strategies
Is it possible to find only those documents in a collection that have the same value in two given fields?
{
_id: 'fewSFDewvfG20df',
start: 10,
end: 10
}
As here start and end have the same value, this document would be selected.
I think about something like...
Collection.find({ start: { $eq: end } })
... which wouldn't work, as end has to be a value.
In MongoDB 3.6 and later, you can use $expr to compare two fields from the same document.
db.collection.find({ "$expr": { "$eq": ["$start", "$end"] } })
or with aggregation
db.collection.aggregate([
{ "$match": { "$expr": { "$eq": ["$start", "$end"] }}}
])
You have two options here. The first one is to use the $where operator.
Collection.find( { $where: "this.start === this.end" } )
The second option is to use the aggregation framework and the $redact operator.
Collection.aggregate([
{ "$redact": {
"$cond": [
{ "$eq": [ "$start", "$end" ] },
"$$KEEP",
"$$PRUNE"
]
}}
])
Which one is better?
The $where operator evaluates JavaScript and can't take advantage of indexes, so a query using $where can degrade your application's performance. See considerations. With $where, each document is converted from BSON to a JavaScript object before the expression is evaluated, which adds further overhead. Of course, the query can be improved if another, indexed condition filters the documents first. There is also a security risk if you build the query dynamically based on user input.
Like $where, $redact doesn't use indexes and performs a collection scan, but query performance is better with $redact because it uses standard MongoDB operators rather than JavaScript. That being said, the aggregation option is far better because you can always filter your documents first using the $match operator.
$where here is fine but could be avoided. Also, I believe that you only need $where when you have a schema design problem. For example, adding another indexed boolean field to the document can be a good option here.
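The extra boolean field mentioned above might look like the following sketch. The field name startEqualsEnd and the helper withEqualityFlag are assumptions made for illustration; the flag would have to be recomputed on every write that changes start or end.

```javascript
// Hypothetical field name; not part of the original schema.
const FLAG = "startEqualsEnd";

// Returns a copy of the document with the precomputed equality flag added.
// Intended to be called before every insert or update of start/end.
function withEqualityFlag(doc) {
  return { ...doc, [FLAG]: doc.start === doc.end };
}

const stored = withEqualityFlag({ _id: "fewSFDewvfG20df", start: 10, end: 10 });
const other = withEqualityFlag({ _id: "hypotheticalId", start: 10, end: 12 });

// Filter that an index on the flag field could support.
const flagQuery = { [FLAG]: true };
```

With an index on the flag field, Collection.find(flagQuery) becomes an ordinary equality match and needs neither $where nor $redact.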
This query is fast, since the fewest function calls are involved:
Collection.find("this.start == this.end");
What's the correct way to use operations such as $not or $ne with complex values? I mean values, which are also computed with some operations. I've tried {$not: {$and: [{field1: 'a'}, {field2: 'b'}]}} and {$not: [{$and: [{field1: 'a'}, {field2: 'b'}]}]}, but none of them seem to work correctly. The same with $ne: {$ne: [field1, field2]}. The documentation shows their usage examples as field1: {$not: {$gt: 5}}, and it's nice for so simple cases, but how to deal with more complex ones?
If it makes a difference, I want to use them in a $match clause of the aggregation framework, not just in a find().
UPD:
For example, I'd want to run a query such as db.test.aggregate({$match: {$not: {$and: [{f1: 'a'}, {f2: 'b'}]}}}), but it gives the error "invalid operator: $and" (the same code without $not works). To test that query, insert these documents first: db.test.insert({f1:'a', f2:'b'}); db.test.insert({f1:'b', f2:'c'}).
$not and $ne are field-specific operators, so you can't apply them to a multi-field query operation. I don't think you can construct a generalized 'negative' query like you're trying to do.
Instead, you'd need to invert your logic field by field to use a query like:
db.test.aggregate({$match: {$or: [{f1: {$ne: 'a'}}, {f2: {$ne: 'b'}}]}})
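The inversion relies on De Morgan's law: NOT (A AND B) is the same as (NOT A) OR (NOT B). A small plain-JavaScript check on the two test documents from the question illustrates the equivalence; the predicate names are made up for this sketch.

```javascript
// The two test documents from the question.
const docs = [
  { f1: "a", f2: "b" },
  { f1: "b", f2: "c" },
];

// What the question wanted to express: NOT (f1 == 'a' AND f2 == 'b').
const notBoth = (doc) => !(doc.f1 === "a" && doc.f2 === "b");

// The field-by-field inversion used in the $or query above.
const eitherDiffers = (doc) => doc.f1 !== "a" || doc.f2 !== "b";

// Both predicates agree on every document.
const sameResult = docs.every((doc) => notBoth(doc) === eitherDiffers(doc));

// Only the second document survives the filter.
const kept = docs.filter(eitherDiffers);
```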