Index intersection issue in mongo - mongodb

I'm using MongoDB 2.6.8 and have the following problem.
The collection users has the indexes _id_1 and b_1. When I run the query
db.users.find({"$and": [
{"b": {"$gt": ISODate("somedate")}},
{"b": {"$lt": ISODate("anotherdate")}},
{"_id": {"$gt": "somevalue"}},
{"_id": {"$lt": "anothervalue"}},
]})
I expect MongoDB to perform index intersection and use the intersected index, but it chooses only the b_1 index. When I run explain on this query, the allPlans section does not even contain an intersected plan, only _id_1 and b_1.
Why does MongoDB not perform index intersection?

I think this could result from the fact that your query has two restrictions on the same indexed key ($gt and $lt on b, and the same for _id). What does explain show if you change your query to the following? If it uses index intersection, my guess is right:
db.users.find({"$and": [
{"b": {"$gt": ISODate("somedate")}},
{"_id": {"$gt": "somevalue"}},
]})
With your original query, applying both restrictions on a single index could be faster than applying only one restriction from each index and intersecting the results.

Related

MongoDb Query Filtering with arithmetic expression

I have a query in SQL that I want to translate into a MongoDB query.
The statement is:
select * from TBA where a/b < c/d
a, b, and c are columns in the table TBA, and d is a constant.
How can I rewrite this statement in the MongoDB query language?
I have a collection called "TBA" where all the documents are stored. Now I want to find out which documents fulfill the condition "a/b < c/d".
Thank you in advance.
Best regards,
user12682244
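If you want to do a calculation using the values stored in the document, you need an aggregation expression via $expr, for example in the $match stage of a pipeline (d below stands for the value of your constant):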
db.collection.aggregate([
{$match: {
$expr: {
$lt: [
{$divide: ["$a", "$b"]},
{$divide: ["$c", d]}
]
}
}
}
])
See how it works on the playground example
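Since d is a constant and not a field, it has to be a concrete number by the time the pipeline is sent to the server. Here is a minimal sketch of that, assuming a hypothetical helper named buildRatioPipeline (it is not part of MongoDB) and the field names a, b, c from the question:

```javascript
// Hypothetical helper (not a MongoDB API): builds the pipeline for
// "a / b < c / d", where a, b, c are document fields and d is a constant.
function buildRatioPipeline(d) {
  if (typeof d !== "number" || d === 0) {
    // Guard the constant on the client side before building the pipeline.
    throw new Error("d must be a non-zero number");
  }
  return [
    {
      $match: {
        $expr: {
          $lt: [
            { $divide: ["$a", "$b"] }, // a / b, evaluated per document
            { $divide: ["$c", d] }     // c / d, with d inlined as a literal
          ]
        }
      }
    }
  ];
}

const pipeline = buildRatioPipeline(4);
console.log(JSON.stringify(pipeline));
// In the mongo shell this could then be run as: db.TBA.aggregate(pipeline)
```

This sketch only guards the constant d. Documents where b is 0 are not handled and may need to be filtered out in an earlier stage.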

MongoDB - Safely sort inner array after group

I'm trying to look up all records that match a certain condition, in this case _id being certain values, and then return only the top 2 results, sorted by the name field.
This is what I have
db.getCollection('col1').aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$sort: {fk: 1, name: -1}},
{$group: {_id: "$fk", items: {$push: "$$ROOT"} }},
{$project: {items: {$slice: ["$items", 2]} }}
])
and it works, BUT, it's not guaranteed. According to this Mongo thread $group does not guarantee document order.
This would also mean that all of the suggested solutions here and elsewhere, which recommend using $unwind, followed by $sort, and then $group, would also not work, for the same reason.
What is the best way to accomplish this with Mongo (any version)? I've seen suggestions that this could be accomplished in the $project phase, but I'm not quite sure how.
You are correct in saying that the result of $group is never sorted.
$group does not order its output documents.
Hence sorting with
{$sort: {fk: 1}}
and then grouping with
{$group: {_id: "$fk", ... }},
is wasted effort.
But there is a silver lining to sorting by name: -1 before the $group stage. Since you are using $push (not $addToSet), the pushed objects keep the order in which they arrived, so the newly created items array in the $group result stays sorted. You can see this behaviour here (copy of your pipeline)
The items array will always contain
"items": [
{
..
"name": "Michael"
},
{
..
"name": "George"
}
]
in the same order, so sorting the nested array is a non-issue! Though I am unable to find an exact quote in the documentation to confirm this behaviour, you can check
this,
or this, where it is confirmed.
Also, in the accumulator operator list for $group, $addToSet has "Order of the array elements is undefined." in its description, whereas the similar operator $push does not, which might be indirect evidence? :)
A simple modification of your pipeline, where you move the fk: 1 sort from before the $group stage to after it:
db.getCollection('col1').aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$sort: {name: -1}},
{$group: {_id: "$fk", items: {$push: "$$ROOT"} }},
{$sort: {_id: 1}},
{$project: {items: {$slice: ["$items", 2]} }}
])
should be sufficient to have the main result array order fixed as well. Check it on mongoplayground
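To make the ordering argument concrete, here is a small in-memory sketch in plain JavaScript that imitates the stages of that pipeline. It only illustrates the ordering logic and is not how the server executes the aggregation. The sample documents and the function name topTwoPerGroup are made up for the example.

```javascript
// Sample documents are made up for illustration.
const docs = [
  { fk: 2, name: "Adam" },
  { fk: 1, name: "George" },
  { fk: 1, name: "Anna" },
  { fk: 2, name: "Zoe" },
  { fk: 1, name: "Michael" },
  { fk: 2, name: "Bob" }
];

function topTwoPerGroup(input) {
  // Imitates {$sort: {name: -1}}: copy first so the input is not mutated.
  const sorted = [...input].sort((x, y) =>
    x.name < y.name ? 1 : x.name > y.name ? -1 : 0
  );

  // Imitates {$group: {_id: "$fk", items: {$push: "$$ROOT"}}}:
  // pushing keeps each group's items in arrival (sorted) order.
  const groups = new Map();
  for (const doc of sorted) {
    if (!groups.has(doc.fk)) groups.set(doc.fk, []);
    groups.get(doc.fk).push(doc);
  }

  // Imitates {$sort: {_id: 1}} followed by {$slice: ["$items", 2]}.
  return [...groups.entries()]
    .sort(([k1], [k2]) => k1 - k2)
    .map(([fk, items]) => ({ _id: fk, items: items.slice(0, 2) }));
}

const result = topTwoPerGroup(docs);
console.log(JSON.stringify(result, null, 2));
```

Note that the groups themselves come out of the Map in arrival order (fk 2 before fk 1 with this data), which is why the explicit sort on _id after grouping is still needed.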
$group doesn't guarantee the order of its output documents, but it does keep the grouped documents in sorted order within each bucket. So in your case, even though the documents after the $group stage are not sorted by fk, each group (items) is sorted by name descending. If you would like to keep the documents sorted by fk, you can add {$sort: {_id: 1}} after the $group stage (fk becomes _id there).
If you need to, you can also sort by the order of the values passed in your match query by adding an extra field to each document. Something like
db.getCollection('col1').aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$addFields:{ifk:{$indexOfArray:[[1, 2],"$fk"]}}},
{$sort: {ifk: 1, name: -1}},
{$group: {_id: "$ifk", items: {$push: "$$ROOT"}}},
{$sort: {_id : 1}},
{$project: {items: {$slice: ["$items", 2]}}}
])
Update, to sort an array without the $group operator: I've found the Jira ticket that will allow sorting an array.
In the meantime, you could try the $project stage below to sort the array. There may be various ways to do it. This one sorts by name in descending order; it works, but it is a slower solution.
{"$project":{"items":{"$reduce":{
"input":"$items",
"initialValue":[],
"in":{"$let":{
"vars":{"othis":"$$this","ovalue":"$$value"},
"in":{"$let":{
"vars":{
//return 0 as the index when the accumulator array is still empty (the initial value); otherwise count the elements of the accumulator array that are greater than the current value, which gives the position where the current value belongs.
"index":{"$cond":{
"if":{"$eq":["$$ovalue",[]]},
"then":0,
"else":{"$reduce":{
"input":"$$ovalue",
"initialValue":0,
"in":{"$cond":{
"if":{"$lt":["$$othis.name","$$this.name"]},
"then":{"$add":["$$value",1]},
"else":"$$value"}}}}
}}
},
//insert the current value at the found index
"in":{"$concatArrays":[
{"$slice":["$$ovalue","$$index"]},
["$$othis"],
{"$slice":["$$ovalue",{"$subtract":["$$index",{"$size":"$$ovalue"}]}]}]}
}}}}
}}}}
A simple example demonstrating how each iteration works:
db.b.insert({"items":[2,5,4,7,6,3]});
othis  ovalue       index  concat arrays (parts with counts)  return value
2      []           0      [],0         [2]      [],0         [2]
5      [2]          0      [],0         [5]      [2],-1       [5,2]
4      [5,2]        1      [5],1        [4]      [2],-1       [5,4,2]
7      [5,4,2]      0      [],0         [7]      [5,4,2],-3   [7,5,4,2]
6      [7,5,4,2]    1      [7],1        [6]      [5,4,2],-3   [7,6,5,4,2]
3      [7,6,5,4,2]  4      [7,6,5,4],4  [3]      [2],-1       [7,6,5,4,3,2]
Reference - Sorting Array with JavaScript reduce function
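For comparison, the same insertion logic can be sketched with JavaScript's own reduce. This is an illustrative stand-in for the $reduce stage, not the pipeline itself; the function name sortDescending is made up, and plain numbers are used as in the table above.

```javascript
// Illustrative sketch: insertion sort (descending) built from two reduces,
// mirroring the outer and inner $reduce of the pipeline stage.
function sortDescending(items) {
  return items.reduce((sorted, current) => {
    // Inner reduce: count how many already-placed values are greater than
    // the current one. That count is the index where it belongs.
    const index = sorted.reduce(
      (count, existing) => (current < existing ? count + 1 : count),
      0
    );
    // Equivalent of $concatArrays: head + [current] + tail.
    return [...sorted.slice(0, index), current, ...sorted.slice(index)];
  }, []);
}

console.log(sortDescending([2, 5, 4, 7, 6, 3]));
```

The pipeline expresses the tail with a negative $slice count (index minus size) because it takes the last elements of the array, whereas sorted.slice(index) here takes everything from index onward. Both select the same elements.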
There is a bit of a red herring in the question: $group does guarantee that it processes incoming documents in order (which is why you have to sort them before $group to get ordered arrays). However, there is an issue with the way you propose doing it: pushing all the documents into a single grouping is (a) inefficient and (b) could potentially exceed the maximum document size.
Since you only want top two, for each of the unique fk values, the most efficient way to accomplish it is via a "subquery" using $lookup like this:
db.coll.aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$group:{_id:"$fk"}},
{$sort: {_id: 1}},
{$lookup:{
from:"coll",
as:"items",
let:{fk:"$_id"},
pipeline:[
{$match:{$expr:{$eq:["$fk","$$fk"]}}},
{$sort:{name:-1}},
{$limit:2},
{$project:{_id:0, fk:1, name:1}}
]
}}
])
Assuming you have an index on {fk:1, name:-1} as you must to get efficient sort in your proposed code, the first two stages here will use that index via DISTINCT_SCAN plan which is very efficient, and for each of them, $lookup will use that same index to filter by single value of fk and return results already sorted and limited to first two. This will be the most efficient way to do this at least until https://jira.mongodb.org/browse/SERVER-9377 is implemented by the server.

$elemMatch range query syntax

I am using this solution for indexing messages with many varying fields. Specifically, I am using Solution#2.
The example of range syntax
db.generic2.find({"props": { $elemMatch: {$gte: {"prop1": 6}, $lt: {"prop1": 99999999 } }}})
I have never seen this syntax in the MongoDB docs; instead, everywhere I see syntax like
db.generic2.find({"props": { $elemMatch: {"prop1": {$gte: 6, $lt: 99999999 }}}})
What is the difference? Oddly, with the first one I get a fast query that uses the index, while with the second I get a slow query with a collection scan. Both results are correct, yet the two queries behave differently.

How can a MongoDB query that includes "or" use an index

This is a question about how to create efficient indexes when a query contains "or". Without "or", I know how to create an efficient index.
This is my query.
db.collection.find({
'msg.sendTime':{$gt:1},
'msg.msgType':{$in:["chat","g_card"]},
$or:[{'msg.recvId':{$in:['xm80049258']}},{'msg.userId':'xm80049258'}],
$orderby:{'msg.sendTime':-1}})
After reading some articles, I created two single-field indexes on msg.recvId and msg.userId, and this makes sense.
I want to know: when MongoDB executes "or", does it first divide up all the documents and then apply msg.sendTime and msg.msgType?
How should I create efficient indexes in this case? Should I create the indexes (msg.sendTime:1,msg.msgType:1,msg.recvId:1) and
(msg.sendTime:1,msg.msgType:1,msg.userId:1)?
Thanks very much.
Paraphrasing from $or Clauses and Indexes:
When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes.
Also from Indexing Strategies:
Generally, MongoDB only uses one index to fulfill most queries. However, each clause of an $or query may use a different index
What those paragraphs mean for $or queries is:
In a find() query, only one index can be used, so it's best to create an index that aligns with the fields in your query. Otherwise, MongoDB will do a collection scan.
The exception is an $or query, where MongoDB can use one index per $or term.
In combination: if you have $or in your query, it's best to put the $or term at the top level and create an index for each term separately.
So to answer your question:
I want to know: when MongoDB executes "or", does it first divide up all the documents and then apply msg.sendTime and msg.msgType?
If your query has a top-level $or clause, MongoDB can use one index per clause. Otherwise, it will do a collection scan, or a semi-collection scan. For example, if you have an index:
db.collection.createIndex({a: 1, b: 1})
There are two general types of query you can write:
1. $or NOT on the top level of the query
This query can use the index, but will not be performant:
db.collection.find({a: 1, $or: [{b: 1}, {b: 2}]})
since the explain() output of the query is:
> db.collection.explain().find({a: 1, $or: [{b: 1}, {b: 2}]})
{
"queryPlanner": {
...
"indexBounds": {
"a": [
"[1.0, 1.0]"
],
"b": [
"[MinKey, MaxKey]"
]
...
Note that the query planner cannot use a tight bound for the b field, so it does a semi-collection scan (it searches for b from MinKey to MaxKey, i.e. everything). The query planner result above is basically saying: "Find documents where a = 1, and scan all of them for b having a value of 1 or 2"
2. $or on the top level of the query
However, pulling the $or clause to the top-level:
db.collection.find({$or: [{a: 1, b: 1}, {a: 1, b: 2}]})
will result in this query plan:
> db.test.explain().find({$or: [{a: 1, b: 1}, {a: 1, b: 2}]})
{
"queryPlanner": {
...
"winningPlan": {
"stage": "SUBPLAN",
...
"inputStages": [
{
"stage": "IXSCAN",
...
"indexBounds": {
"a": [
"[1.0, 1.0]"
],
"b": [
"[1.0, 1.0]"
]
}
},
{
"stage": "IXSCAN",
...
"indexBounds": {
"a": [
"[1.0, 1.0]"
],
"b": [
"[2.0, 2.0]"
]
Note that each term of the $or is treated as a separate query, each with a tight boundary. As such, the query plan above is saying: "Find documents where a = 1, b = 1 or a = 1, b = 2". As you can imagine, this query will be much more performant compared to the earlier query.
For your second question:
How to create efficient indexes in this case? Should I create indexes (msg.sendTime:1,msg.msgType:1,msg.recvId:1) and (msg.sendTime:1,msg.msgType:1,msg.userId:1)
As explained above, you need to combine the proper query with the proper index to achieve the best result. The two indexes you proposed will be able to be used by MongoDB and will work best if you rearrange your query to have the $or in the top-level of your query.
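One way to do that rearrangement is to copy the shared conditions into every $or branch, so that each branch lines up with one of the two proposed indexes. Here is a minimal sketch, assuming a hypothetical helper named pushOrToTopLevel (not a MongoDB function) and assuming the branches do not reuse any of the shared field names:

```javascript
// Hypothetical helper (not a MongoDB API): copies the shared conditions into
// each $or branch so that $or becomes the top-level operator.
// Assumption: no branch uses a field name that also appears in `shared`.
function pushOrToTopLevel(shared, branches) {
  return { $or: branches.map((branch) => ({ ...shared, ...branch })) };
}

// Conditions taken from the query in the question.
const shared = {
  "msg.sendTime": { $gt: 1 },
  "msg.msgType": { $in: ["chat", "g_card"] }
};

const filter = pushOrToTopLevel(shared, [
  { "msg.recvId": { $in: ["xm80049258"] } },
  { "msg.userId": "xm80049258" }
]);

console.log(JSON.stringify(filter, null, 2));
// In the mongo shell this could be run as:
//   db.collection.find(filter).sort({ "msg.sendTime": -1 })
```

Running explain() on the resulting query is still the way to confirm which index each branch actually uses.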
I encourage you to understand the explain() output of MongoDB, since it's the best tool to find out if your queries are using the proper indexes or not.
Relevant resources that you may find useful are:
Explain Results
Create Indexes to Support Your Queries
Indexing Strategies

Conditions in MongoDb

What's the correct way to use operators such as $not or $ne with complex values? I mean values that are themselves computed by other operators. I've tried {$not: {$and: [{field1: 'a'}, {field2: 'b'}]}} and {$not: [{$and: [{field1: 'a'}, {field2: 'b'}]}]}, but neither seems to work correctly. The same goes for $ne: {$ne: [field1, field2]}. The documentation shows usage examples such as field1: {$not: {$gt: 5}}, which is fine for such simple cases, but how do I deal with more complex ones?
If it makes a difference, I want to use them in a $match clause of the aggregation framework, not just in a find().
UPD:
For example, I'd like to run a query such as db.test.aggregate({$match: {$not: {$and: [{f1: 'a'}, {f2: 'b'}]}}}), but it gives the error "invalid operator: $and" (the same code without $not works). To test that query, insert these documents first: db.test.insert({f1:'a', f2:'b'}); db.test.insert({f1:'b', f2:'c'}).
$not and $ne are field-specific operators, so you can't apply them to a multi-field query operation. I don't think you can construct a generalized 'negative' query like you're trying to do.
Instead, you'd need to invert your logic field by field to use a query like:
db.test.aggregate({$match: {$or: [{f1: {$ne: 'a'}}, {f2: {$ne: 'b'}}]}})
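The inversion used here is De Morgan's law: NOT (A AND B) is the same as (NOT A) OR (NOT B). Below is a small sketch in plain JavaScript that checks this on the two sample documents from the question. The predicates are stand-ins for the query operators and ignore MongoDB specifics such as missing fields or array values.

```javascript
// The two sample documents from the question.
const docs = [
  { f1: "a", f2: "b" },
  { f1: "b", f2: "c" }
];

// NOT (f1 = 'a' AND f2 = 'b'): the query the question wanted to write.
const negatedAnd = (doc) => !(doc.f1 === "a" && doc.f2 === "b");

// (f1 != 'a') OR (f2 != 'b'): the shape of the $or / $ne query above.
const orOfNegations = (doc) => doc.f1 !== "a" || doc.f2 !== "b";

const viaNot = docs.filter(negatedAnd);
const viaOr = docs.filter(orOfNegations);

// Both predicates keep the same documents.
console.log(JSON.stringify(viaNot), JSON.stringify(viaOr));
```

MongoDB also has a $nor operator that takes an array of query expressions, which may be worth checking against the documentation for this kind of whole-expression negation.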