Check last element in array matches a condition - mongodb

I have an array of numbers in my mongodb documents and need to check if the last number in that array meets my conditions.
My documents are stored like this:
{
name: String,
data: {
dates: Array,
numbers: Array
}
}
and I need to check if the last number in numbers "lies between" two other numbers.
Any suggestions on how to do this would be appreciated.

Right now the most efficient way to do this is the JavaScript evaluation of $where, as you can simply find the value of the last array element and test it programmatically.
With sample documents:
{ "a": [1,2,3] },
{ "a": [1,2,4] },
{ "a": [1,2,5] }
And to query:
db.collection.find(function() { var a = this.a.pop(); return ( a > 2 ) && ( a < 5 ) })
Or simply presented with $where as a string for evaluation:
Model.find(
{
"$where": "var a = this.a.pop(); return ( a > 2 ) && ( a < 5 )"
},
function(err,results) {
// handling here
}
);
This is a really simple way to do it and avoids "overhead" such as the $unwind stage that the aggregation framework needs in order to "denormalize" and process arrays, which is not really efficient here.
In the "future" however, it will be. As is currently available in development releases, there is a $slice operator for the aggregation framework. This operator will allow easy access to the "last" array element for testing.
Since the aggregation framework operators run in "native code" and are not JavaScript to be interpreted, a single pipeline stage becomes more efficient than the JavaScript form, even though the listing looks longer in submission:
db.collection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$anyElementTrue": {
"$map": {
"input": { "$slice": ["$a",-1] },
"as": "el",
"in":{
"$and": [
{ "$gt": [ "$$el", 2 ] },
{ "$lt": [ "$$el", 5 ] }
]
}
}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
The $redact operator that already exists is used to "logically filter" with a comparison expression here. Based on the true/false match conditions it either "keeps" or "prunes" the document from the results respectively.
The $slice operator in its aggregation framework form will still ultimately return an array, albeit a single-element array in this case. This is why $map is used to "transform" each element into a true/false condition, and the $anyElementTrue operator reduces that "array" to a singular response, as is expected by $cond.
So when that is released, it will be the most efficient way to do this. But until then, stick with the JavaScript as it is presently the fastest way to do this evaluation.
Both query forms return just the first two documents of the sample here:
{ "a": [1,2,3] },
{ "a": [1,2,4] }
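As a hedged alternative to the $slice / $map / $anyElementTrue combination: on servers that support $expr in queries and the $arrayElemAt operator (an assumption here, MongoDB 3.6 or later, which postdates this answer), the last element can be read directly with an index of -1. The helper name lastElementBetween below is hypothetical and only builds the filter document; it is a sketch, not a benchmarked recommendation.

```javascript
// Hypothetical helper: builds a find() filter that tests the last element
// of an array field. Assumes a server that supports $expr and $arrayElemAt.
function lastElementBetween(field, low, high) {
  // An index of -1 asks $arrayElemAt for the last element of the array.
  const last = { $arrayElemAt: ["$" + field, -1] };
  return {
    $expr: {
      $and: [
        { $gt: [last, low] },
        { $lt: [last, high] }
      ]
    }
  };
}

// Filter for the sample documents above: last element of "a" between 2 and 5.
const filter = lastElementBetween("a", 2, 5);
console.log(JSON.stringify(filter, null, 2));
// In the mongo shell this would be passed as: db.collection.find(filter)
```

For the document layout in the question, the field argument would be "data.numbers".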

MongoDB aggregate may be a feasible way, assuming the name field in your documents is unique.
Suppose you have this sample document:
{
name: "allen",
data: {
dates: ["2015-08-08"],
numbers: [20, 21, 22, 23]
}
}
The following code does the check. The db.collection.aggregate() method returns a cursor, so we can use the cursor's hasNext() to decide whether the last number lies between the two given numbers.
var result = db.last_one.aggregate(
[
{
// deconstruct the array field numbers
$unwind: "$data.numbers"
},
{
$group: {
_id: "$name",
// lastNumber is 23 in this case
lastNumber: { $last: "$data.numbers" }
}
},
{
$match: {
lastNumber: { $gt: num1, $lt: num2 }
}
}
]
).hasNext()
if (result) print("matched"); else print("not matched")
For example, if num1 is 22, num2 is 24, the result is matched; if num1 is 21, num2 is 22, the result is not matched.
But actually, grouping on name is not a good idea. It's much better if each document has a unique ObjectId, so that we can group on that _id instead.
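To make the three stages easier to follow, here is a minimal in-memory sketch of what the pipeline computes. It is plain JavaScript over an array, not a MongoDB call, and the function name lastNumberMatches is made up for illustration.

```javascript
// In-memory sketch of the pipeline: $unwind, $group with $last, then $match.
const docs = [
  { name: "allen", data: { dates: ["2015-08-08"], numbers: [20, 21, 22, 23] } }
];

function lastNumberMatches(docs, num1, num2) {
  // $unwind: one record per array element, in array order
  const unwound = docs.flatMap(d =>
    d.data.numbers.map(n => ({ name: d.name, number: n }))
  );
  // $group with $last: keep the last value seen for each name
  const lastByName = new Map();
  for (const row of unwound) lastByName.set(row.name, row.number);
  // $match: keep groups whose last number lies strictly between the bounds
  const matched = [...lastByName.values()].filter(n => n > num1 && n < num2);
  // hasNext(): true when at least one group survived the $match
  return matched.length > 0;
}

console.log(lastNumberMatches(docs, 22, 24)); // true  (23 is between 22 and 24)
console.log(lastNumberMatches(docs, 21, 22)); // false (23 is not below 22)
```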

Related

if mongodb match inside aggregation returns nothing, how to make a new query?

I use match to select some documents from the collection, and then output all other documents except those found.
If match doesn't find any documents, then I need to display all available documents from the collection.
How can this be done?
Without an example I don't know if I've understood correctly, but you can try this aggregation query (or add these aggregation stages to your query).
The idea is to use $facet to create two branches:
First branch: match the value
Second branch: get everything
Then use $project with $cond and $size to output one of these options.
In the $project, if the array returned by the "exists" branch has size 0 (no results), the result is no_exists (i.e. all documents); otherwise it is the exists array.
db.collection.aggregate([
{
"$facet": {
"exists": [
{
"$match": {
// your match
}
}
],
"no_exists": []
}
},
{
"$project": {
"result": {
"$cond": {
"if": {
"$eq": [
{
"$size": "$exists"
},
0
]
},
"then": "$no_exists",
"else": "$exists"
}
}
}
}
])
When the value exists, the output is only the matching documents; when it does not exist, the output is the whole collection.
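The fallback logic of the two facets can be sketched in plain JavaScript. This is only an in-memory illustration of the $cond decision, with a hypothetical function name matchOrAll and made-up sample documents.

```javascript
// In-memory sketch of the $facet + $cond fallback.
function matchOrAll(docs, predicate) {
  const exists = docs.filter(predicate); // the "exists" facet ($match)
  const noExists = docs;                 // the "no_exists" facet (no stages)
  // $cond: if $size of "exists" is 0, fall back to everything
  return exists.length === 0 ? noExists : exists;
}

// Made-up sample data for illustration only
const docs = [{ key: 1 }, { key: 2 }, { key: 3 }];

console.log(matchOrAll(docs, d => d.key === 2));  // [ { key: 2 } ]
console.log(matchOrAll(docs, d => d.key === 99)); // all three documents
```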

Query on all nested docs inside a nested doc in MongoDB

I have a collection with documents that look like this:
{ keyA1: "stringVal",
keyA2: "stringVal",
keyA3: { keyB1: { feild1: intVal,
feild2: intVal},
keyB2: { feild1: intVal,
feild2: intVal}
}
}
Currently the [keyB1, keyB2, ...] set is 7 keys, the same for all documents in the collection. I want to query the intVals on specific fields for all keyB's. So, for example, I might want to find all documents where field2 has a value greater than 100 regardless of which keyB it falls in.
For any one specific keyB, I simply use the dot notation: {"keyA3.keyB2.field2": {$gte: 100}}. Right now, I have the option of looping over all keyB's, but this may not be the case in the future, when more keyB values can be added. I don't want to have to modify the code then, and would like to avoid hardcoding those values in any way. I also need the solution to be fairly fast, as the final deployment is expected to have over 20M documents.
How can I write a query that can "skip" the keyB field in the dot notation and just go through all the embedded docs?
FWIW, I'm implementing this in python using pymongo. Thanks.
First, convert the keyA3 object to an array with $objectToArray and store it in a new field using $addFields.
Then filter that array to keep the entries whose field2 value is greater than 100, and store the size of the filtered array.
Finally, match the documents where that size is greater than 0 and remove the extra fields that were added.
db.collection.aggregate([
{
"$addFields": {
"arr": {
"$objectToArray": "$keyA3"
}
}
},
{
"$addFields": {
"matchArrSize": {
$size: {
"$filter": {
"input": "$arr",
"as": "z",
"cond": {
$gt: [
"$$z.v.feild2",
100
]
}
}
}
}
}
},
{
$match: {
matchArrSize: {
$gt: 0
}
}
},
{
$unset: [
"arr",
"matchArrSize"
]
}
])
https://mongoplayground.net/p/VumwL9y7Km1
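As a rough in-memory equivalent of these stages, the sketch below does the same check in plain JavaScript. The function name anyNestedFieldGreaterThan and the sample values are made up; the field name is spelled as in the sample document from the question.

```javascript
// In-memory sketch of $objectToArray + $filter + $size + $match.
function anyNestedFieldGreaterThan(doc, field, threshold) {
  // $objectToArray: turn { keyB1: {...}, keyB2: {...} } into [{ k, v }, ...]
  const arr = Object.entries(doc.keyA3).map(([k, v]) => ({ k, v }));
  // $filter + $size: count the entries whose field exceeds the threshold
  const matchArrSize = arr.filter(z => z.v[field] > threshold).length;
  // $match: keep the document when at least one entry matched
  return matchArrSize > 0;
}

// Made-up sample documents for illustration only
const docs = [
  { keyA1: "a", keyA3: { keyB1: { feild1: 1, feild2: 50 }, keyB2: { feild1: 2, feild2: 150 } } },
  { keyA1: "b", keyA3: { keyB1: { feild1: 3, feild2: 10 }, keyB2: { feild1: 4, feild2: 20 } } }
];

const result = docs.filter(d => anyNestedFieldGreaterThan(d, "feild2", 100));
console.log(result.map(d => d.keyA1)); // [ 'a' ]
```

No keyB name appears in the code, so adding more keys later does not require a change.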

How to use $regex inside $or as an Aggregation Expression

I have a query which allows the user to filter by some string field using a format that looks like: "Where description of the latest inspection is any of: foo or bar". This works great with the following query:
db.getCollection('permits').find({
'$expr': {
'$let': {
vars: {
latestInspection: {
'$arrayElemAt': ['$inspections', {
'$indexOfArray': ['$inspections.inspectionDate', {
'$max': '$inspections.inspectionDate'
}]
}]
}
},
in: {
'$in': ['$$latestInspection.description', ['Fire inspection on property', 'Health inspection']]
}
}
}
})
What I want is for the user to be able to use wildcards which I turn into regular expressions: "Where description of the latest inspection is any of: Health inspection or Found a * at the property".
The regex I get, don't need help with that. The problem I'm facing is, apparently the aggregation $in operator does not support matching by regular expressions. So I thought I'd build this using $or since the docs don't say I can't use regex. This was my best attempt:
db.getCollection('permits').find({
'$expr': {
'$let': {
vars: {
latestInspection: {
'$arrayElemAt': ['$inspections', {
'$indexOfArray': ['$inspections.inspectionDate', {
'$max': '$inspections.inspectionDate'
}]
}]
}
},
in: {
'$or': [{
'$$latestInspection.description': {
'$regex': /^Found a .* at the property$/
}
}, {
'$$latestInspection.description': 'Health inspection'
}]
}
}
}
})
Except I'm getting the error:
"Unrecognized expression '$$latestInspection.description'"
I'm thinking I can't use $$latestInspection.description as an object key but I'm not sure (my knowledge here is limited) and I can't figure out another way to do what I want. So you see I wasn't even able to get far enough to see if I can use $regex in $or. I appreciate all the help I can get.
Everything inside $expr is an aggregation expression, and while the documentation may not explicitly "say you cannot", the lack of any named operator and the JIRA issue SERVER-11947 certainly say that. So if you need a regular expression then you really have no other option than using $where instead:
db.getCollection('permits').find({
"$where": function() {
var description = this.inspections
.sort((a,b) => b.inspectionDate.valueOf() - a.inspectionDate.valueOf())
.shift().description;
return /^Found a .* at the property$/.test(description) ||
description === "Health inspection";
}
})
You can still use $expr and aggregation expressions for an exact match, or just keep the comparison within the $where anyway. But at this time the only regular expression form MongoDB understands is $regex within a "query" expression.
If you did actually "require" an aggregation pipeline expression that precludes you from using $where, then the only current valid approach is to first "project" the field separately from the array and then $match with the regular query expression:
db.getCollection('permits').aggregate([
{ "$addFields": {
"lastDescription": {
"$arrayElemAt": [
"$inspections.description",
{ "$indexOfArray": [
"$inspections.inspectionDate",
{ "$max": "$inspections.inspectionDate" }
]}
]
}
}},
{ "$match": {
"lastDescription": {
"$in": [/^Found a .* at the property$/,/Health inspection/]
}
}}
])
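To illustrate what the $addFields stage extracts before the $match runs, here is an in-memory sketch in plain JavaScript. The function name lastDescription and the sample inspections are invented for illustration; the descriptions use the spellings from the question.

```javascript
// In-memory sketch of $max + $indexOfArray + $arrayElemAt, then a regex $in.
function lastDescription(doc) {
  const dates = doc.inspections.map(i => i.inspectionDate.getTime());
  const maxDate = Math.max(...dates);        // $max over the dates
  const index = dates.indexOf(maxDate);      // $indexOfArray
  return doc.inspections[index].description; // $arrayElemAt on descriptions
}

// Patterns as they would be passed to $in in the $match stage
const patterns = [/^Found a .* at the property$/, /Health inspection/];

// Made-up sample document for illustration only
const permit = {
  inspections: [
    { inspectionDate: new Date("2018-01-01"), description: "Health inspection" },
    { inspectionDate: new Date("2018-03-01"), description: "Found a rat at the property" },
    { inspectionDate: new Date("2018-02-01"), description: "Fire inspection on property" }
  ]
};

const latest = lastDescription(permit);
console.log(latest);                              // Found a rat at the property
console.log(patterns.some(p => p.test(latest)));  // true
```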
Which leads us to the fact that you appear to be looking for the item in the array with the maximum date value. The JavaScript syntax should be making it clear that the correct approach here is instead to $sort the array on "update". In that way the "first" item in the array can be the "latest". And this is something you can do with a regular query.
To maintain the order, ensure new items are added to the array with $push and $sort like this:
db.getCollection('permits').updateOne(
{ "_id": _idOfDocument },
{
"$push": {
"inspections": {
"$each": [{ /* Detail of inspection object */ }],
"$sort": { "inspectionDate": -1 }
}
}
}
)
In fact with an empty array argument to $each an updateMany() will update all your existing documents:
db.getCollection('permits').updateMany(
{ },
{
"$push": {
"inspections": {
"$each": [],
"$sort": { "inspectionDate": -1 }
}
}
}
)
These really only should be necessary when you in fact "alter" the date stored during updates, and those updates are best issued with bulkWrite() to effectively do "both" the update and the "sort" of the array:
db.getCollection('permits').bulkWrite([
{ "updateOne": {
"filter": { "_id": _idOfDocument, "inspections._id": identifierForArrayElement },
"update": {
"$set": { "inspections.$.inspectionDate": new Date() }
}
}},
{ "updateOne": {
"filter": { "_id": _idOfDocument },
"update": {
"$push": { "inspections": { "$each": [], "$sort": { "inspectionDate": -1 } } }
}
}}
])
However if you did not ever actually "alter" the date, then it probably makes more sense to simply use the $position modifier and "pre-pend" to the array instead of "appending", and avoiding any overhead of a $sort:
db.getCollection('permits').updateOne(
{ "_id": _idOfDocument },
{
"$push": {
"inspections": {
"$each": [{ /* Detail of inspection object */ }],
"$position": 0
}
}
}
)
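The difference between the two update styles can be sketched on a plain array. This is an in-memory illustration only, with hypothetical function names pushSorted and pushFront and made-up data; it assumes, as the text does, that the new item always carries the latest date.

```javascript
// $push with $each + $sort: append, then re-sort by date descending
function pushSorted(inspections, item) {
  return [...inspections, item].sort(
    (a, b) => b.inspectionDate.getTime() - a.inspectionDate.getTime()
  );
}

// $push with $each + $position: 0: simply prepend, no sorting work
function pushFront(inspections, item) {
  return [item, ...inspections];
}

// Made-up array that is already stored "latest first"
const existing = [
  { id: "mar", inspectionDate: new Date("2018-03-01") },
  { id: "jan", inspectionDate: new Date("2018-01-01") }
];
const newest = { id: "may", inspectionDate: new Date("2018-05-01") };

console.log(pushSorted(existing, newest).map(i => i.id)); // [ 'may', 'mar', 'jan' ]
console.log(pushFront(existing, newest).map(i => i.id));  // [ 'may', 'mar', 'jan' ]
```

Both give the same order when the new item is the latest; prepending just skips the sort.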
With the array permanently sorted or at least constructed so the "latest" date is actually always the "first" entry, then you can simply use a regular query expression:
db.getCollection('permits').find({
"inspections.0.description": {
"$in": [/^Found a .* at the property$/,/Health inspection/]
}
})
So the lesson here is don't try and force calculated expressions upon your logic where you really don't need to. There should be no compelling reason why you cannot order the array content as "stored" to have the "latest date first", and even if you thought you needed the array in any other order then you probably should weigh up which usage case is more important.
Once reordered you can even take advantage of an index to some extent, as long as the regular expressions are either anchored to the beginning of the string or at least something else in the query expression does an exact match.
In the event you feel you really cannot reorder the array, then the $where query is your only present option until the JIRA issue resolves. Which is hopefully actually for the 4.1 release as currently targeted, but that is more than likely 6 months to a year at best estimate.

Getting only documents that meet a criteria based on values in a field containing an array of objects

I have documents like this:
{
name: 'john',
array: [{foo: 3, bar: 1},{foo:1, bar: 0},...]
}
I would like to find all the documents that have a difference between foo and bar smaller than some value in one of the entries in the array. I am currently trying to use the $where query. I get back an empty list. Is my issue with the way I am using promises or with the way I am using $where?
Code:
MongoClient.connect(config.database)
.then(function(db) {
return db.collection('MyCollection')
})
.then(function (collection) {
return collection.find(
{ $where:
function() {
for(var i = 0; i < this.array.length; i++) {
if((this.array[i].foo - this.array[i].bar) < 2) {
return true;
}
}
return false;
}
}
)
})
.then(function(cursor) {
return cursor.toArray()
})
.then(function(arr) {
console.log(arr)
})
.catch(function(err) {
throw err;
});
Using the aggregation framework with the $redact pipeline operator allows you to process the logical condition with the $cond operator, using the special system variables $$KEEP to "keep" the document where the logical condition is true or $$PRUNE to "remove" the document where the condition is false.
This operation is similar to having a $project pipeline that selects the fields in the collection and creates a new field that holds the result from the logical condition query, followed by a subsequent $match, except that $redact uses a single pipeline stage, which is more efficient.
Consider the following example, which demonstrates the above concept:
db.collection.aggregate([
{
"$redact": {
"$cond": [
{
"$anyElementTrue": {
"$map": {
"input": "$array",
"as": "el",
"in": {
"$lt": [
{ "$subtract": ["$$el.foo", "$$el.bar"] },
2
]
}
}
}
},
"$$KEEP",
"$$PRUNE"
]
}
}
])
In the above example, the $anyElementTrue and $map combo works in such a way that if any of the elements in the array actually had the difference between its foo and bar values less than 2, then this is a true match and the document is "kept". Otherwise it is "pruned" and discarded.
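For comparison, the same keep/prune decision written as plain JavaScript over in-memory documents looks like this. It is only an illustration of the logic, with a hypothetical function name and made-up sample data, not something to run on the server.

```javascript
// In-memory sketch of $redact with $anyElementTrue + $map.
function keepDocument(doc, limit) {
  // $map computes (foo - bar) < limit per element; $anyElementTrue is some()
  return doc.array.some(el => el.foo - el.bar < limit);
}

// Made-up sample documents for illustration only
const docs = [
  { name: "john", array: [{ foo: 3, bar: 1 }, { foo: 1, bar: 0 }] }, // differences 2 and 1
  { name: "jane", array: [{ foo: 5, bar: 1 }] }                      // difference 4
];

const kept = docs.filter(d => keepDocument(d, 2));
console.log(kept.map(d => d.name)); // [ 'john' ]
```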
As a result, your refactored code should look like
MongoClient.connect(config.database)
.then(function(db) {
return db.collection('MyCollection')
})
.then(function (collection) {
return collection.aggregate([
{
"$redact": {
"$cond": [
{
"$anyElementTrue": {
"$map": {
"input": "$array",
"as": "el",
"in": {
"$lt": [
{ "$subtract": ["$$el.foo", "$$el.bar"] },
2
]
}
}
}
},
"$$KEEP",
"$$PRUNE"
]
}
}
]);
})
.then(function(cursor) {
return cursor.toArray()
})
.then(function(arr) {
console.log(arr)
})
.catch(function(err) {
throw err;
});
and this should improve performance significantly, because the $redact operator uses MongoDB's native operators, whilst a query with the $where operator calls the JavaScript engine to evaluate JavaScript code on every document and check the condition for each.
The $where form is very slow because it cannot use an index itself. MongoDB does evaluate non-$where query operations before $where expressions, and those non-$where conditions may use an index.
So if you must use $where, it is advisable to combine it with indexed query conditions so that the query may be faster. However, JavaScript expressions and the $where operator are recommended only as a last resort, when you can't structure the data in any other way or when you are dealing with a small subset of data.

Finding documents based on the minimum value in an array

my document structure is something like :
{
_id: ...,
key1: ....
key2: ....
....
min_value: //should be the minimum of all the values in options
options: [
{
source: 'a',
value: 12,
},
{
source: 'b',
value: 10,
},
...
]
},
{
_id: ...,
key1: ....
key2: ....
....
min_value: //should be the minimum of all the values in options
options: [
{
source: 'a',
value: 24,
},
{
source: 'b',
value: 36,
},
...
]
}
The values of the various sources in options will keep getting updated on a frequent basis (every few minutes or hours).
Assume the size of the options array doesn't change, i.e. no extra elements are added to the list.
My queries are of the following type:
- find all documents where the min_value of all the options falls between some limits.
I could first do an unwind on options (and then take the min) and then run comparison queries, but I am new to mongo and not sure how performance
is affected by the unwind operation. The number of documents of this type would be about a few million.
Or does anyone have any suggestions for changing the document structure which could help me simplify this query? (Apart from creating separate documents per source, which would involve a lot of data duplication.)
Thanks!
Using $unwind is indeed quite expensive, most notably so with larger arrays, but there is a cost in all cases of usage. There are a couple of ways to avoid $unwind here without real structural changes.
Pure Aggregation
In the basic case, as of the MongoDB 3.2.x release series the $min operator can work directly on an array of values in a "projection" sense, in addition to its standard grouping accumulator role. This means that with the help of the related $map operator for processing elements of an array, you can get the minimal value without using $unwind:
db.collection.aggregate([
// Still makes sense to use an index to select only possible documents
{ "$match": {
"options": {
"$elemMatch": {
"value": { "$gte": minValue, "$lt": maxValue }
}
}
}},
// Provides a logical filter to remove non-matching documents
{ "$redact": {
"$cond": {
"if": {
"$let": {
"vars": {
"min_value": {
"$min": {
"$map": {
"input": "$options",
"as": "option",
"in": "$$option.value"
}
}
}
},
"in": { "$and": [
{ "$gte": [ "$$min_value", minValue ] },
{ "$lt": [ "$$min_value", maxValue ] }
]}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
// Optionally return the min_value as a field
{ "$project": {
"min_value": {
"$min": {
"$map": {
"input": "$options",
"as": "option",
"in": "$$option.value"
}
}
}
}}
])
The basic step is to get the "minimum" value from the array. This is done inside $let, since we want to use the result "twice" in the logical conditions and not repeat ourselves. The first part is to extract the "value" data from the "options" array, which is done using $map.
The output of $map is an array with just those values, so this is supplied as the argument to $min, which then returns the minimum value for that array.
Using $redact is sort of like a $match pipeline stage with the difference that rather than needing a field to be "present" in the document being examined, you instead just form a logical condition with calculations.
In this case the condition is $and where "both" the logical forms of $gte and $lt return true against the calculated value ( from $let as "$$min_value" ).
The $redact stage then has the special arguments to apply to $$KEEP the document when the condition is true or $$PRUNE the document from results when it is false.
It's all very much like doing $project and then $match to actually project the value into the document before filtering in another stage, but all done in one stage. Of course you might actually want to $project the resulting field in what you return, but it generally cuts the workload if you remove non-matched documents "first" using $redact instead.
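The calculation that $map, $min and the $redact condition perform together can be sketched over in-memory documents as follows. This is plain JavaScript for illustration; the helper names and the bounds are made up, and the values echo the sample documents from the question.

```javascript
// In-memory sketch of $map + $min inside $let, then the $redact condition.
function minOptionValue(doc) {
  const values = doc.options.map(option => option.value); // $map
  return Math.min(...values);                             // $min
}

function keepDocument(doc, minValue, maxValue) {
  const min = minOptionValue(doc);         // "$$min_value" from $let
  return min >= minValue && min < maxValue; // $and of $gte and $lt
}

const docs = [
  { _id: 1, options: [{ source: "a", value: 12 }, { source: "b", value: 10 }] },
  { _id: 2, options: [{ source: "a", value: 24 }, { source: "b", value: 36 }] }
];

// Hypothetical bounds: keep documents whose minimum is in [5, 15)
const kept = docs.filter(d => keepDocument(d, 5, 15));
console.log(kept.map(d => d._id)); // [ 1 ]
```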
Updating Documents
Of course I think the best option is to actually keep the "min_value" field in the document rather than work it out at run-time. So this is a very simple thing to do when adding to or altering array items during update.
For this there is the $min "update" operator. Use it when appending with $push:
db.collection.update(
{ "_id": id },
{
"$push": { "options": { "source": "a", "value": 9 } },
"$min": { "min_value": 9 }
}
)
Or when updating a value of an element:
db.collection.update(
{ "_id": id, "options.source": "a" },
{
"$set": { "options.$.value": 9 },
"$min": { "min_value": 9 }
}
)
If the current "min_value" in the document is greater than the argument in $min, or the key does not yet exist, then the value given will be written. If the argument is greater than the current value, the existing value stays in place since it is already the smaller value.
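That rule for the $min update operator can be written out as a small in-memory function. The name applyMin is hypothetical; this only mirrors the behaviour described above for numeric values and does not touch a database.

```javascript
// In-memory sketch of the $min update operator on a single numeric field.
function applyMin(doc, field, value) {
  // Write when the field is missing or the new value is smaller
  if (!(field in doc) || value < doc[field]) {
    doc[field] = value;
  }
  return doc;
}

console.log(applyMin({ min_value: 10 }, "min_value", 9));  // { min_value: 9 }
console.log(applyMin({ min_value: 10 }, "min_value", 12)); // { min_value: 10 }
console.log(applyMin({}, "min_value", 9));                 // { min_value: 9 }
```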
You can even set all your existing data with a simple "bulk" operations update:
var ops = [];
db.collection.find({ "min_value": { "$exists": false } }).forEach(function(doc) {
// Queue operations
ops.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$min": {
"min_value": Math.min.apply(
null,
doc.options.map(function(option) {
return option.value
})
)
}
}
}
});
// Write once in 1000 documents
if ( ops.length == 1000 ) {
db.collection.bulkWrite(ops);
ops = [];
}
});
// Clear any remaining operations
if ( ops.length > 0 )
db.collection.bulkWrite(ops);
Then with a field in place, it is just a simple range selection:
db.collection.find({
"min_value": {
"$gte": minValue, "$lt": maxValue
}
})
So it really should be in your best interests to keep a field ( or fields if you regularly need different conditions ) in the document since that provides the most efficient query.
Of course, the new functions of aggregation $min along with $map also make this viable to use without a field, if you prefer more dynamic conditions.