Query returns more than expected results - mongodb

Bear with me, this is not really my question. Just trying to get someone to understand.
Author's note:
The solution in the possible duplicate question can rely on $elemMatch because all of the values there are arrays. This case is a little different.
So, in the accepted answer the main point has been brought up. This behavior is well
documented and you should not "compare 'apples' with 'oranges'". The fields are of
different types, and while there is a workaround for this, the best solution for the real
world is: don't do this.
Happy reading :)
I have a collection of documents I am trying to search, the collection contains the following:
{ "_id" : ObjectId("52faa8a695fa10cc7d2b7908"), "x" : 1 }
{ "_id" : ObjectId("52faa8ab95fa10cc7d2b7909"), "x" : 5 }
{ "_id" : ObjectId("52faa8ad95fa10cc7d2b790a"), "x" : 15 }
{ "_id" : ObjectId("52faa8b095fa10cc7d2b790b"), "x" : 25 }
{ "_id" : ObjectId("52faa8b795fa10cc7d2b790c"), "x" : [ 5, 25 ] }
So I want to find the results where x falls between the values of 10 and 20. So this is the query that seemed logical to me:
db.collection.find({ x: {$gt: 10, $lt: 20} })
But the problem is this returns two documents in the result:
{ "_id" : ObjectId("52faa8ad95fa10cc7d2b790a"), "x" : 15 }
{ "_id" : ObjectId("52faa8b795fa10cc7d2b790c"), "x" : [ 5, 25 ] }
I am not expecting to see the second result as none of the values are between 10 and 20.
Can someone explain why I do not get the result I expect? I think { "x": 15 } should be the only document returned.
So furthermore, how can I get what I expect?

This behaviour is expected and is explained in the MongoDB documentation:
Query a Field that Contains an Array
If a field contains an array and your query has multiple conditional
operators, the field as a whole will match if either a single array
element meets the conditions or a combination of array elements
meet the conditions.
Mongo seems to be willing to play "smug", by giving back results when a combination of array elements match all conditions independently.
In our example, 5 matches the $lt:20 condition and 25 matches the $gt:10 condition. So, it's a match.
Both of the following will return the [5,25] result:
db.collection.find({ x: {$gt: 10, $lt: 20} })
db.collection.find({ $and : [{x: {$gt: 10}},{x:{ $lt: 20}} ] })
Opinions can vary on whether this is the behaviour users expect, but it is documented, so it should be anticipated.
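The rule quoted from the documentation can be modelled in a few lines of plain JavaScript. This is only a sketch of the documented semantics, not MongoDB's implementation; the helper name matchesField and the ops table are made up for illustration, and the sample documents are the ones from the question with _id omitted.

```javascript
// Plain-JavaScript model of the documented rule; not MongoDB's own code.
const ops = {
  $gt: (value, bound) => value > bound,
  $lt: (value, bound) => value < bound,
};

// Each operator is checked on its own. For an array field an operator passes
// as soon as ANY element satisfies it, so different elements may satisfy
// different operators.
function matchesField(fieldValue, conditions) {
  const candidates = Array.isArray(fieldValue) ? fieldValue : [fieldValue];
  return Object.entries(conditions).every(([op, bound]) =>
    candidates.some((value) => ops[op](value, bound))
  );
}

const docs = [{ x: 1 }, { x: 5 }, { x: 15 }, { x: 25 }, { x: [5, 25] }];
const matched = docs.filter((doc) => matchesField(doc.x, { $gt: 10, $lt: 20 }));

console.log(JSON.stringify(matched)); // [{"x":15},{"x":[5,25]}]
```

In this model the array [5, 25] passes because 25 satisfies $gt: 10 and 5 satisfies $lt: 20, even though no single element lies between 10 and 20.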
Edit, for Neil's sadistic yet highly educational edit to original answer, asking for a solution:
Use of the $elemMatch can make "stricter" element comparisons for arrays only.
db.collection.find({ x: { $elemMatch:{ $gt:10, $lt:20 } } })
Note: this will match both x:[11,12] and x:[11,25], because in each case at least one element (11) satisfies both conditions.
I believe that when a query like this is needed, a combination of two queries is required, with the results merged. Below is a query that returns the correct results for documents where x is not an array:
db.collection.find( { $where : "!Array.isArray(this.x)", x: {$gt: 10, $lt: 20} } )
But the best approach in this case is to change the type of x to always be an array, even when it only contains one element. Then, only the $elemMatch query is required to get correct results, with expected behaviour.
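To make the difference between the two checks concrete, here is a minimal plain-JavaScript sketch. The function names elemMatchRange and scalarRange are invented for illustration, and the code only models the semantics described above; it does not talk to MongoDB.

```javascript
// Range test shared by both checks: strictly between 10 and 20.
const inRange = (value) => value > 10 && value < 20;

// $elemMatch-style check: only arrays qualify, and ONE element must satisfy
// all conditions at once.
function elemMatchRange(fieldValue) {
  return Array.isArray(fieldValue) && fieldValue.some(inRange);
}

// Scalar-only check, mirroring the $where guard in the query above.
function scalarRange(fieldValue) {
  return !Array.isArray(fieldValue) && inRange(fieldValue);
}

const samples = [15, [5, 25], [11, 12], [11, 25]];
const report = samples.map((x) => ({
  x,
  elemMatch: elemMatchRange(x),
  scalar: scalarRange(x),
}));

console.log(JSON.stringify(report));
```

The scalar 15 is only found by the scalar check, the arrays [11,12] and [11,25] are only found by the $elemMatch-style check, and [5,25] is found by neither, which is why the two result sets have to be combined when x can be either type.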

You can first check that the field is not an array and then filter on the desired values:
db.collection.find(
{
$and :
[
{ $where : "!Array.isArray(this.x)" },
{ x: { $gt: 10, $lt: 20 } }
]
}
)
which returns:
{ "_id" : ObjectId("52fb4ec1cfe34ac4b9bab163"), "x" : 15 }

Related

Need for using $and

In MongoDB, I have this following code:
db.products.find({name: "Postcard", status: "Available"})
But isn't that the same as using $and? If not, what is the difference?
Another example...
Where the status equals "Available" and either qty is greater than ($gt) 100 or name starts with the characters "Po":
db.products.find( {status:"Available", $or:[{qty:{$gt:100 }},{item:/^Po/}]})
So seems as if there is no need of using $and in these two examples. So why or when would $and be used?
In both your examples it is superfluous to use $and because using ',' to specify match conditions on several different fields accomplishes it just the same.
One case where you do need it is when you have to specify multiple conditions on the same field. Here is an example (straight from the MongoDB tutorial videos).
db.movieDetails.find({"$and": [{"metacritic": {"$ne": null}},
{"metacritic": {"$exists": true}}]})
The explanation provided was that the keys in a JSON document must be unique, so if the above query were specified without $and, only the last "metacritic" value would be used.
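The "last key wins" behaviour is easy to see in plain JavaScript, which is also what the mongo shell uses to build query objects. The variable names withoutAnd and withAnd below are just for illustration.

```javascript
// Parsing a document that repeats the "metacritic" key: the later value
// silently replaces the earlier one.
const withoutAnd = JSON.parse(
  '{"metacritic": {"$ne": null}, "metacritic": {"$exists": true}}'
);
console.log(JSON.stringify(withoutAnd)); // {"metacritic":{"$exists":true}}

// Wrapping each condition in its own object inside $and keeps both.
const withAnd = {
  $and: [{ metacritic: { $ne: null } }, { metacritic: { $exists: true } }],
};
console.log(withAnd.$and.length); // 2
```

For simple operators on one field, another option is to put both operators in a single sub-document, as in { metacritic: { $ne: null, $exists: true } }, since the operator names themselves are distinct keys.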
Mongodb documentation specifies another example listed with a similar explanation. Notice $or operator being specified twice.
db.inventory.find( {
$and : [
{ $or : [ { price : 0.99 }, { price : 1.99 } ] },
{ $or : [ { sale : true }, { qty : { $lt : 20 } } ] }
]
} )

Counting entries of subdocument in MongoDB documents

I have a document structure like so
{
"_id" : "3:/content/somepath/test.txt",
"_revisions" : {
"r152f47f1daf-0-2" : "c",
"r152f48413c1-0-2" : "c",
"r152f4851bf7-0-1" : "c"
}
}
My task is to find all documents with the following conditions:
The "_id" needs to start with "5:"
The number of revisions needs to be strictly greater than 3
The first part is easy, I have solved it with
db.nodes.find( {'_id': /^5:/} )
But I am struggling with the second part, where I am supposed to use $gt.
Since I am new to MongoDB, I first looked at $size, but _revisions is not an array, it is a subdocument, right?
Was also looking at $unwind and then counting the results, but that does not make sense either, since my result need to be the documents that match the above two conditions.
Any pointers highly appreciated.
Using the $where operator.
db.nodes.find(function() {
return (/^5:/.test(this._id) && Object.keys(this._revisions).length > 3 );
})
The problem with this as mentioned in the documentation is that:
$where evaluates JavaScript and cannot take advantage of indexes. Therefore, query performance improves when you express your query using the standard MongoDB operators (e.g., $gt, $in).
You should definitely consider changing the _revisions field to an array of sub-documents like this:
{
"_id" : "3:/content/somepath/test.txt",
"_revisions" : [
{
"rev": "r152f47f1daf-0-2",
"value": "c"
},
{
"rev": "r152f48413c1-0-2",
"value": "c"
},
{
"rev": "r152f4851bf7-0-1",
"value": "c"
}
]
}
And use the $exists operator.
db.nodes.find({ "_id": /^5:/, "_revisions.3": { "$exists": true } } )
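As a rough sketch of the suggested restructuring, the subdocument can be turned into an array with Object.entries. The helper name toRevisionArray is hypothetical, and the code works on a plain in-memory copy of the sample document rather than on the collection itself.

```javascript
const doc = {
  _id: "3:/content/somepath/test.txt",
  _revisions: {
    "r152f47f1daf-0-2": "c",
    "r152f48413c1-0-2": "c",
    "r152f4851bf7-0-1": "c",
  },
};

// Turn each key/value pair of the subdocument into a { rev, value } element.
function toRevisionArray(revisions) {
  return Object.entries(revisions).map(([rev, value]) => ({ rev, value }));
}

const converted = { ...doc, _revisions: toRevisionArray(doc._revisions) };

// Index 3 is the fourth element, so it exists only when the array holds
// more than three revisions. That is the idea behind "_revisions.3".
const hasMoreThanThree = converted._revisions[3] !== undefined;

console.log(converted._revisions.length, hasMoreThanThree); // 3 false
```

With three revisions the sample document would not match, which is the intended result for a "strictly greater than 3" condition.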

Is there a way to prevent mongo queries "branching" on arrays?

If I have the following documents:
{a: {x:1}} // without array
{a: [{x:1}]} // with array
Is there a way to query for {'a.x':1} that will return the first one but not the second one? IE, I want the document where a.x is 1, and a is not an array.
Please note that a future version of MongoDB will incorporate the $isArray aggregation expression. In the meantime...
...the following code will do the trick as the $elemMatch operator matches only documents having an array field:
> db.test.find({"a.x": 1, "a": {$not: {$elemMatch: {x:1}}}})
Given that dataset:
> db.test.find({},{_id:0})
{ "a" : { "x" : 1 } }
{ "a" : [ { "x" : 1 } ] }
{ "a" : [ { "x" : 0 }, { "x" : 1 } ] }
It will return:
> db.test.find({"a.x": 1, "a": {$not: {$elemMatch: {x:1}}}}, {_id:0})
{ "a" : { "x" : 1 } }
Please note this should be considered a short-term solution. The MongoDB team took great care to ensure that [{x:1}] and {x:1} behave the same (see dot-notation or $type for arrays). So you should consider that at some point in the future, $elemMatch might be updated (see JIRA issue SERVER-6050). In the meantime, it may be worth fixing your data model so that it is no longer necessary to distinguish between an array containing one subdocument and a bare subdocument.
You can do this by adding a second term that ensures a has no elements. That second term will always be true when a is a plain subdoc, and always false when a is an array (as otherwise the first term wouldn't have matched).
db.test.find({'a.x': 1, 'a.0': {$exists: false}})
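The idea behind the 'a.0' test can be sketched in plain JavaScript. The function hasNoIndexZero is a made-up model, not a MongoDB API, and the last case rests on the assumption that dot notation treats a numeric path component as either an array index or a field name, so it is worth checking against your own data.

```javascript
// Model of { 'a.0': { $exists: false } }: true when `a` has neither an
// element at index 0 nor a field literally named "0".
function hasNoIndexZero(a) {
  return !(a !== null && typeof a === "object" && "0" in a);
}

console.log(hasNoIndexZero({ x: 1 }));            // true:  plain subdocument
console.log(hasNoIndexZero([{ x: 1 }]));          // false: array with an element
console.log(hasNoIndexZero({ 0: "zero", x: 1 })); // false: subdocument with a "0" key
```

The third case is the one to watch: a plain subdocument that happens to have a field named "0" would be excluded along with the arrays.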

Mongo find query for longest arrays inside object

I currently have objects in mongo set up like this for my application (simplified example, I removed some irrelevant fields for clarity here):
{
"_id" : ObjectId("529159af5b508dd71500000a"),
"c" : "somecontent",
"l" : [
{
"d" : "2013-11-24T01:43:11.367Z",
"u" : "User1"
},
{
"d" : "2013-11-24T01:43:51.206Z",
"u" : "User2"
}
]
}
What I would like to do is run a find query to return the objects which have the highest array length under "l" and sort highest->lowest, limit to 25 results. Some objects may have 1 object in the array, some may have 100. I'd like to find out which ones have the most under "l". I'm new to mongo and got everything else to work up until this point, but I just can't figure out the right parameters to get this specific query. Where I'm getting confused is how to handle counting the length of the array, sorting, etc. I could manually code this by parsing everything in the collection, but I'm sure there has to be a way for mongo to do this far more efficiently. I'm not against learning, if anyone knows any resources for more advanced queries or could help me out I'd really be thankful as this is the last piece! :-)
As a side note, node.js and mongo together is amazing and I wish I started using them in conjunction a long time ago.
Use the aggregation framework. Here's how:
db.collection.aggregate( [
{ $unwind : "$l" },
{ $group : { _id : "$_id", len : { $sum : 1 } } },
{ $sort : { len : -1 } },
{ $limit : 25 }
] )
There is no easy way to do this with find() and your existing schema, because a find query has no operator that returns the length of an array. There is a $size query operator, but it only matches arrays of one specific length.
So you cannot sort your find query by the length of the array. The only reasonable way out is to add a field to your schema that holds the length of the array (something like "l_length : 3" in addition to the existing fields of every document). The good news is that this is easy to do by following this relevant answer; after that you just need to make sure to increment or decrement the value whenever you modify the array.
Once you have added this field, you can easily sort on it and, moreover, take advantage of indexes.
There is no direct way to do this, but you can add a size field to each document with $size:
$addFields to add a new field, total, holding the number of elements in the l array
$sort by total in descending order
$limit to keep the top 25 documents
$project to remove the total field if you don't need it
db.collection.aggregate([
{ $addFields: { total: { $size: "$l" } } },
{ $sort: { total: -1 } },
{ $limit: 25 }
// { $project: { total: 0 } }
])
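What the pipeline computes can be mirrored in plain JavaScript on an in-memory array. This is only a sketch: topByArrayLength and the sample documents are invented for illustration, and it assumes every document has an array under "l".

```javascript
// Mirror of the stages: add `total`, sort descending, keep the first `limit`.
function topByArrayLength(docs, limit) {
  return docs
    .map((doc) => ({ ...doc, total: doc.l.length })) // like $addFields + $size
    .sort((a, b) => b.total - a.total)               // like $sort: { total: -1 }
    .slice(0, limit);                                // like $limit
}

// Hypothetical sample data with 2, 3 and 1 entries under "l".
const sampleDocs = [
  { _id: "a", l: [{ u: "User1" }, { u: "User2" }] },
  { _id: "b", l: [{ u: "User1" }, { u: "User2" }, { u: "User3" }] },
  { _id: "c", l: [{ u: "User1" }] },
];

const top = topByArrayLength(sampleDocs, 2);
console.log(top.map((doc) => `${doc._id}:${doc.total}`).join(" ")); // b:3 a:2
```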

Mongodb query with fields in the same documents

I have the following json:
{
"a1": {"a": "b"},
"a2": {"a": "c"}
}
How can I request all documents where a1 and a2 are not equal in the same document?
You could use $where:
db.myCollection.find( { $where: "this.a1.a != this.a2.a" } )
However, be aware that this won't be very fast, because it has to spin up the JavaScript engine, iterate over every document, and check the condition for each one.
If you need to run this query on large collections, or very often, it's best to introduce a denormalized flag, like areEqual. Still, such low-selectivity fields don't yield good index performance, because the candidate set is still large.
update
using the new $expr operator available as of mongo 3.6 you can use aggregate expressions in find query like this:
db.myCollection.find({$expr: {$ne: ["$a1.a", "$a2.a"] } });
Although this comment solves the problem, I think a better match for this use case would be to use $addFields operator available as of version 3.4 instead of $project.
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$addFields": {
"aEq": {"$eq":["$a1.a","$a2.a"]}
}
},
{"$match":{"aEq": false}}
]);
To avoid JavaScript use the aggregation framework:
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$project": {
"a1":1,
"a2":1,
"aCmp": {"$cmp":["$a1.a","$a2.a"]}
}
},
{"$match":{"aCmp":0}}
])
On our development server the equivalent JavaScript query takes 7x longer to complete.
Update (10 May 2017)
I just realized my answer didn't answer the question, which wanted values that are not equal (sometimes I'm really slow). This will work for that:
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$project": {
"a1":1,
"a2":1,
"aEq": {"$eq":["$a1.a","$a2.a"]}
}
},
{"$match":{"aEq": false}}
])
$ne could be used in place of $eq if the match condition was changed to true but I find using $eq with false to be more intuitive.
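$where evaluates JavaScript, and in JavaScript == on two objects compares references rather than contents, so
{"a": "b"} == {"a": "b"}
would be false.
So to compare the documents you would have to compare the values themselves: a1.a == a2.a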
To do this in MongoDB you would use the $where operator
db.myCollection.find({$where: "this.a1.a != this.a2.a"});
This assumes that each embedded document will have a property "a". If that isn't the case things get more complicated.
Starting in Mongo 4.4, for those that want to compare sub-documents and not only primitive values (since {"a": "b"} == {"a": "b"} is false), we can use the new $function aggregation operator that allows applying a custom javascript function:
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 1, "y" : 2 } }
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 3, "y" : 2 } }
db.collection.aggregate(
{ $match:
{ $expr:
{ $function: {
body: function(a1, a2) { return JSON.stringify(a1) != JSON.stringify(a2); },
args: ["$a1", "$a2"],
lang: "js"
}}
}
}
)
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 3, "y" : 2 } }
$function takes 3 parameters:
body, which is the function to apply, whose parameters are the two fields to compare.
args, which contains the fields from the record that the body function takes as parameter. In our case, both "$a1" and "$a2".
lang, which is the language in which the body function is written. Only js is currently available.
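One thing to keep in mind about the JSON.stringify comparison used in the body function is that it is sensitive to key order. The plain-JavaScript sketch below, with made-up variable names, shows the three cases.

```javascript
const first = { x: 1, y: 2 };
const sameOrder = { x: 1, y: 2 };
const otherOrder = { y: 2, x: 1 };

// == compares object references, so two separate objects are never equal.
console.log(first == sameOrder); // false

// Serialising both sides compares contents...
console.log(JSON.stringify(first) === JSON.stringify(sameOrder)); // true

// ...but the same fields in a different order serialise differently.
console.log(JSON.stringify(first) === JSON.stringify(otherOrder)); // false
```

So with this approach two sub-documents that hold the same fields in a different order would be reported as not equal.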
Thanks all for solving my problem -- concerning the answers that use aggregate(), one thing that confused me at first is that $eq (or $in, or lots of other operators) has different meaning depending on where it is used. In a find(), or the $match phase of aggregation, $eq takes a single value, and selects matching documents:
db.items.aggregate([{$match: {_id: {$eq: ObjectId("5be5feb45da16064c88e23d4")}}}])
However, in the $project phase of aggregation, $eq takes an Array of 2 expressions, and makes a new field with value true or false:
db.items.aggregate([{$project: {new_field: {$eq: ["$_id", "$foreignID"]}}}])
In passing, here's the query I used in my project to find all items whose list of linked items (due to a bug) linked to themselves:
db.items.aggregate([{$project: {idIn: {$in: ["$_id","$header.links"]}, "header.links": 1}}, {$match: {idIn: true}}])