How to use $regex inside $or as an Aggregation Expression - mongodb

I have a query which allows the user to filter by some string field using a format that looks like: "Where description of the latest inspection is any of: foo or bar". This works great with the following query:
db.getCollection('permits').find({
  '$expr': {
    '$let': {
      vars: {
        latestInspection: {
          '$arrayElemAt': ['$inspections', {
            '$indexOfArray': ['$inspections.inspectionDate', {
              '$max': '$inspections.inspectionDate'
            }]
          }]
        }
      },
      in: {
        '$in': ['$$latestInspection.description', ['Fire inspection on property', 'Health inspection']]
      }
    }
  }
})
What I want is for the user to be able to use wildcards which I turn into regular expressions: "Where description of the latest inspection is any of: Health inspection or Found a * at the property".
I already have the regex, so I don't need help with that. The problem I'm facing is that, apparently, the aggregation $in operator does not support matching by regular expressions. So I thought I'd build this using $or, since the docs don't say I can't use regex there. This was my best attempt:
db.getCollection('permits').find({
  '$expr': {
    '$let': {
      vars: {
        latestInspection: {
          '$arrayElemAt': ['$inspections', {
            '$indexOfArray': ['$inspections.inspectionDate', {
              '$max': '$inspections.inspectionDate'
            }]
          }]
        }
      },
      in: {
        '$or': [{
          '$$latestInspection.description': {
            '$regex': /^Found a .* at the property$/
          }
        }, {
          '$$latestInspection.description': 'Health inspection'
        }]
      }
    }
  }
})
Except I'm getting the error:
"Unrecognized expression '$$latestInspection.description'"
I'm thinking I can't use $$latestInspection.description as an object key, but I'm not sure (my knowledge here is limited) and I can't figure out another way to do what I want. So you see I wasn't even able to get far enough to find out whether I can use $regex inside $or. I appreciate all the help I can get.

Everything inside $expr is an aggregation expression, and while the documentation may not explicitly say you cannot use a regular expression there, the lack of any named operator for it and the JIRA issue SERVER-11947 certainly say that. So if you need a regular expression then you really have no other option than using $where instead:
db.getCollection('permits').find({
  "$where": function() {
    var description = this.inspections
      .sort((a,b) => b.inspectionDate.valueOf() - a.inspectionDate.valueOf())
      .shift().description;
    return /^Found a .* at the property$/.test(description) ||
      description === "Health inspection";
  }
})
You can still use $expr and aggregation expressions for an exact match, or just keep the comparison within the $where anyway. But at this time the only regular expression matching MongoDB understands is $regex within a "query" expression.
If you did actually "require" an aggregation pipeline expression that precludes you from using $where, then the only current valid approach is to first "project" the field separately from the array and then $match with the regular query expression:
db.getCollection('permits').aggregate([
  { "$addFields": {
    "lastDescription": {
      "$arrayElemAt": [
        "$inspections.description",
        { "$indexOfArray": [
          "$inspections.inspectionDate",
          { "$max": "$inspections.inspectionDate" }
        ]}
      ]
    }
  }},
  { "$match": {
    "lastDescription": {
      "$in": [/^Found a .* at the property$/, /Health inspection/]
    }
  }}
])
Which leads us to the fact that you appear to be looking for the item in the array with the maximum date value. The JavaScript syntax should be making it clear that the correct approach here is instead to $sort the array on "update". In that way the "first" item in the array can be the "latest". And this is something you can do with a regular query.
To maintain the order, ensure new items are added to the array with $push and $sort like this:
db.getCollection('permits').updateOne(
  { "_id": _idOfDocument },
  {
    "$push": {
      "inspections": {
        "$each": [{ /* Detail of inspection object */ }],
        "$sort": { "inspectionDate": -1 }
      }
    }
  }
)
In fact, with an empty array argument to $each, an updateMany() will apply the $sort to all your existing documents:
db.getCollection('permits').updateMany(
  { },
  {
    "$push": {
      "inspections": {
        "$each": [],
        "$sort": { "inspectionDate": -1 }
      }
    }
  }
)
These should really only be necessary when you in fact "alter" the stored date during updates, and such updates are best issued with bulkWrite() to effectively do "both" the update and the "sort" of the array:
db.getCollection('permits').bulkWrite([
  { "updateOne": {
    "filter": { "_id": _idOfDocument, "inspections._id": identifierForArrayElement },
    "update": {
      "$set": { "inspections.$.inspectionDate": new Date() }
    }
  }},
  { "updateOne": {
    "filter": { "_id": _idOfDocument },
    "update": {
      "$push": { "inspections": { "$each": [], "$sort": { "inspectionDate": -1 } } }
    }
  }}
])
However, if you never actually "alter" the date, then it probably makes more sense to simply use the $position modifier and "prepend" to the array instead of "appending", avoiding any overhead of a $sort:
db.getCollection('permits').updateOne(
  { "_id": _idOfDocument },
  {
    "$push": {
      "inspections": {
        "$each": [{ /* Detail of inspection object */ }],
        "$position": 0
      }
    }
  }
)
With the array permanently sorted or at least constructed so the "latest" date is actually always the "first" entry, then you can simply use a regular query expression:
db.getCollection('permits').find({
  "inspections.0.description": {
    "$in": [/^Found a .* at the property$/, /Health inspection/]
  }
})
So the lesson here is: don't try to force calculated expressions upon your logic where you really don't need to. There should be no compelling reason why you cannot order the array content as "stored" so the "latest date is first", and even if you thought you needed the array in another order, you should probably weigh up which usage case is more important.
Once reordered you can even take advantage of an index to some extent, as long as the regular expressions are either anchored to the beginning of the string or at least something else in the query expression does an exact match.
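For instance, a minimal sketch of supporting that query with an index (the index itself is an assumption, not from the original question, and whether the planner can use the positional path depends on your data and version):

// Hypothetical index on the first element's description
db.getCollection('permits').createIndex({ "inspections.0.description": 1 })

The anchored /^Found a .* at the property$/ can then constrain index bounds, while an unanchored pattern on its own cannot.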
In the event you feel you really cannot reorder the array, then the $where query is your only present option until the JIRA issue resolves. Which is hopefully actually for the 4.1 release as currently targeted, but that is more than likely 6 months to a year at best estimate.

Related

Mongoose - Find object with any key and specific subkey

Let's say I have a Mongo database that contains objects such as :
[
  {
    "test": {
      "123123": {
        "someField": null
      }
    }
  },
  {
    "test": {
      "323143": {
        "someField": "lalala"
      },
      "121434": {
        "someField": null
      }
    }
  },
  {
    "test": {
      "4238023": {
        "someField": "afafa"
      }
    }
  }
]
As you can see, the keys right under "test" can vary.
I want to find all documents that have at least one someField that is not null.
Something like find: "test.*.someField": { $ne: null } (where * represents any key here)
How can I do this in Mongoose? I'm thinking an aggregation pipeline will be needed here, but I'm not exactly sure how.
Constraints:
- I don't have much control over the db schema in this scenario.
- Ideally I don't want to have to do this logic in nodeJS; I would like to query directly via the db.
The trickiest part here is that you cannot search keys that match a pattern. Luckily there is a workaround. Yes, you do need an aggregation pipeline.
Let's look at an individual document:
{
  "test": {
    "4238023": {
      "someField": "afafa"
    }
  }
}
We need to query someField, but to get to it, we need to somehow circumvent 4238023 because it varies with each document. What if we could break that test object down and look at it presented like so:
{
  "k": "4238023",
  "v": {
    "someField": "afafa"
  }
}
Suddenly, it gets a heck of a lot easier to query. Well, MongoDB aggregation offers an operator called $objectToArray which does exactly that.
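To see that in isolation, a quick sketch (generic collection name assumed) of what $objectToArray yields for the sample document above:

db.collection.aggregate([
  { "$project": { "arr": { "$objectToArray": "$test" } } }
])
// => { "_id": ..., "arr": [ { "k": "4238023", "v": { "someField": "afafa" } } ] }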
So what we are going to do is:
1. Convert the test object into an array for each document.
2. Match only documents where AT LEAST ONE v.someField is not null.
3. Put it back together to look like your original documents, minus the ones that do not match the null criterion.
So, here is the pipeline you need:
db.collection.aggregate([
  {
    "$project": {
      "arr": {
        "$objectToArray": "$$ROOT.test"
      }
    }
  },
  {
    "$match": {
      arr: {
        $elemMatch: {
          "v.someField": {
            $ne: null
          }
        }
      }
    }
  },
  {
    "$project": {
      "_id": 1,
      "test": {
        $arrayToObject: "$arr"
      }
    }
  }
])
Playground: https://mongoplayground.net/p/b_VNuOLgUb2
Note that in Mongoose you will run this aggregation the same way you would in a terminal... well, plus the .then:
YourCollection.aggregate([
...
...
])
.then(result => console.log(result))

MongoDB: adding fields based on partial match query - expression vs query

So I have one collection that I'd like to query/aggregate. The query is made up of several parts that are OR'ed together. For every part of the query, I have a specific set of fields that need to be shown.
So my hope was to do this with an aggregation that will $match the queries OR'ed together all at once, and then use $project with $cond to see which fields are needed. The problem here is that $cond uses expressions, while $match uses queries. Which is a problem, since some query features are not available as an expression. So a simple conversion is not an option.
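To illustrate the mismatch with a hypothetical field (not from the original post): a predicate like

{ "$match": { "name": { "$regex": /^foo/ } } }

is valid because $match takes query predicates, but $cond takes aggregation expressions, and (as the first question above shows) there was no $regex expression operator to translate it to.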
So I need another solution...
- I could just make a separate aggregate per query, because there I know which fields to match, and then merge the results together. But this will not work if I use pagination in the queries (limit/skip etc.).
- Find some other way to tag every document so I can (afterwards) remove any fields not needed. It might not be super efficient, but it would work. No clue yet how to do that.
- Figure out a way to make queries that consist only of expressions. For my purpose that might be good enough, though it would mean a rewrite of the query parser. It could work, but is not ideal.
So this is the next incarnation right here. It will deduplicate and merge records and finally transform the result back into something resembling a normal query result:
db.getCollection('somecollection').aggregate(
  [
    {
      "$facet": {
        "f1": [
          {
            "$match": { <some query 1> }
          },
          {
            "$project": { <some fixed field projection> }
          }
        ],
        "f2": [
          {
            "$match": { <some query 2> }
          },
          {
            "$project": { <some fixed field projection> }
          }
        ]
      }
    },
    {
      $project: {
        "rt": { $concatArrays: [ "$f1", "$f2" ] }
      }
    },
    { $unwind: { path: "$rt" } },
    { $replaceRoot: { newRoot: "$rt" } },
    { $group: { _id: "$_id", items: { $push: { item: "$$ROOT" } } } },
    {
      $project: {
        "rt": { $mergeObjects: "$items" }
      }
    },
    { $replaceRoot: { newRoot: "$rt.item" } }
  ]
);
There might still be some optimisation to be done, so any comments are welcome.
I found an extra option using $facet. This way, I can make a facet for every group of fields/subqueries. This seems to work fine, except that the result is a single document with a bunch of arrays; I'm not yet sure how to convert that back to multiple documents.
Okay, so now I have it figured out. I'm not sure yet about all of the intricacies of this solution, but it seems to work in general. Here is an example:
db.getCollection('somecollection').aggregate(
  [
    {
      "$facet": {
        "f1": [
          {
            "$match": { <some query 1> }
          },
          {
            "$project": { <some fixed field projection> }
          }
        ],
        "f2": [
          {
            "$match": { <some query 2> }
          },
          {
            "$project": { <some fixed field projection> }
          }
        ]
      }
    },
    {
      $project: {
        "rt": { $concatArrays: [ "$f1", "$f2" ] }
      }
    },
    { $unwind: { path: "$rt" } },
    { $replaceRoot: { newRoot: "$rt" } }
  ]
);

Finding documents based on the minimum value in an array

My document structure is something like:
{
  _id: ...,
  key1: ....,
  key2: ....,
  ....
  min_value: // should be the minimum of all the values in options
  options: [
    {
      source: 'a',
      value: 12,
    },
    {
      source: 'b',
      value: 10,
    },
    ...
  ]
},
{
  _id: ...,
  key1: ....,
  key2: ....,
  ....
  min_value: // should be the minimum of all the values in options
  options: [
    {
      source: 'a',
      value: 24,
    },
    {
      source: 'b',
      value: 36,
    },
    ...
  ]
}
The value of the various sources in options will keep getting updated on a frequent basis (every few minutes or hours).
Assume the size of the options array doesn't change, i.e. no extra elements are added to the list.
My queries are of the following type:
- find all documents where the min_value of all the options falls between some limits.
I could first do an $unwind on options (and then take the min) and then run comparison queries, but I am new to Mongo and not sure how performance is affected by the unwind operation. The number of documents of this type would be about a few million.
Or does anyone have any suggestions around changing the document structure which could help me simplify this query? (Apart from creating separate documents per source - that would involve a lot of data duplication.)
Thanks!
Using $unwind is indeed quite expensive, most notably so with larger arrays, but there is a cost in all cases of usage. There are a couple of ways to approach not needing $unwind here without real structural changes.
Pure Aggregation
In the basic case, as of the MongoDB 3.2.x release series, the $min operator can work directly on an array of values in a "projection" sense, in addition to its standard grouping accumulator role. This means that with the help of the related $map operator for processing elements of an array, you can get the minimal value without using $unwind:
db.collection.aggregate([
  // Still makes sense to use an index to select only possible documents
  { "$match": {
    "options": {
      "$elemMatch": {
        "value": { "$gte": minValue, "$lt": maxValue }
      }
    }
  }},
  // Provides a logical filter to remove non-matching documents
  { "$redact": {
    "$cond": {
      "if": {
        "$let": {
          "vars": {
            "min_value": {
              "$min": {
                "$map": {
                  "input": "$options",
                  "as": "option",
                  "in": "$$option.value"
                }
              }
            }
          },
          "in": { "$and": [
            { "$gte": [ "$$min_value", minValue ] },
            { "$lt": [ "$$min_value", maxValue ] }
          ]}
        }
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  // Optionally return the min_value as a field
  { "$project": {
    "min_value": {
      "$min": {
        "$map": {
          "input": "$options",
          "as": "option",
          "in": "$$option.value"
        }
      }
    }
  }}
])
The basic case is to get the "minimum" value from the array (done inside of $let, since we want to use the result "twice" in the logical conditions; that helps us not repeat ourselves). The first step is to extract the "value" data from the "options" array, which is done using $map.
The output of $map is an array with just those values, so it is supplied as the argument to $min, which then returns the minimum value of that array.
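As a hand-worked trace against the first sample document above:

// For options: [ { source: 'a', value: 12 }, { source: 'b', value: 10 } ]
// { "$map": { "input": "$options", "as": "option", "in": "$$option.value" } }
//   => [ 12, 10 ]
// { "$min": [ 12, 10 ] }
//   => 10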
Using $redact is sort of like a $match pipeline stage, with the difference that rather than needing a field to be "present" in the document being examined, you instead just form a logical condition with calculations.
In this case the condition is $and, where "both" the logical forms of $gte and $lt return true against the calculated value (from $let as "$$min_value").
The $redact stage then has the special arguments to $$KEEP the document when the condition is true, or to $$PRUNE the document from the results when it is false.
It's all very much like doing $project and then $match to actually project the value into the document before filtering in another stage, but done in a single stage. Of course you might actually want to $project the resulting field in what you return, but it generally cuts the workload if you remove non-matched documents "first" using $redact instead. For comparison, the two-stage form is sketched below.
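A minimal sketch of that $project then $match equivalent (same minValue/maxValue placeholders as above):

db.collection.aggregate([
  { "$project": {
    "options": 1,
    "min_value": {
      "$min": {
        "$map": { "input": "$options", "as": "option", "in": "$$option.value" }
      }
    }
  }},
  { "$match": {
    "min_value": { "$gte": minValue, "$lt": maxValue }
  }}
])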
Updating Documents
Of course I think the best option is to actually keep the "min_value" field in the document, rather than working it out at run-time. This is a very simple thing to do when adding to or altering array items during updates.
For this there is the $min "update" operator. Use it when appending with $push:
db.collection.update(
  { "_id": id },
  {
    "$push": { "options": { "source": "a", "value": 9 } },
    "$min": { "min_value": 9 }
  }
)
Or when updating a value of an element:
db.collection.update(
  { "_id": id, "options.source": "a" },
  {
    "$set": { "options.$.value": 9 },
    "$min": { "min_value": 9 }
  }
)
If the current "min_value" in the document is greater than the argument given to $min, or the key does not yet exist, then the given value will be written. If the existing value is already smaller than or equal to the argument, it stays in place, since it is already the smaller value.
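A small worked illustration of that behaviour (hypothetical document and values):

// Starting from { "_id": 1, "min_value": 10 }:
db.collection.update({ "_id": 1 }, { "$min": { "min_value": 9 } })   // writes 9, since 9 < 10
db.collection.update({ "_id": 1 }, { "$min": { "min_value": 12 } })  // no change, since 12 > 9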
You can even set all your existing data with a simple "bulk" operations update:
var ops = [];
db.collection.find({ "min_value": { "$exists": false } }).forEach(function(doc) {
  // Queue operations
  ops.push({
    "updateOne": {
      "filter": { "_id": doc._id },
      "update": {
        "$min": {
          "min_value": Math.min.apply(
            null,
            doc.options.map(function(option) {
              return option.value
            })
          )
        }
      }
    }
  });
  // Write once per 1000 documents
  if ( ops.length == 1000 ) {
    db.collection.bulkWrite(ops);
    ops = [];
  }
});
// Clear any remaining operations
if ( ops.length > 0 )
  db.collection.bulkWrite(ops);
Then with a field in place, it is just a simple range selection:
db.collection.find({
  "min_value": {
    "$gte": minValue, "$lt": maxValue
  }
})
So it really should be in your best interests to keep a field (or fields, if you regularly need different conditions) in the document, since that provides the most efficient query.
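A supporting index (an assumption on my part, not from the original post) then makes that range selection as cheap as possible:

db.collection.createIndex({ "min_value": 1 })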
Of course, the new aggregation functions of $min along with $map also make this viable without a stored field, if you prefer more dynamic conditions.

Check last element in array matches a condition

I have an array of numbers in my mongodb documents and need to check if the last number in that array meets my conditions.
My documents are stored like this:
{
  name: String,
  data: {
    dates: Array,
    numbers: Array
  }
}
and I need to check if the last number in numbers "lies between" two other numbers.
Any suggestions on how to do this would be appreciated.
Right now the most efficient way you have of doing this is using the JavaScript evaluation of $where, as you can simply find the value of the last array element and test it programmatically.
With sample documents:
{ "a": [1,2,3] },
{ "a": [1,2,4] },
{ "a": [1,2,5] }
And to query:
db.collection.find(function() { var a = this.a.pop(); return ( a > 2 ) && ( a < 5 ) })
Or simply presented with $where as a string for evaluation:
Model.find(
  {
    "$where": "var a = this.a.pop(); return ( a > 2 ) && ( a < 5 )"
  },
  function(err, results) {
    // handling here
  }
);
Which is a really simple way to do this and does not have "overhead" such as $unwind in the aggregation framework, which was created to "denormalize" and process arrays. Not really efficient there.
In the "future" however, it will be. As is currently available in development releases, there is a $slice operator for the aggregation framework. This operator will allow easy access to the "last" array element for testing.
Since the aggregation framework operators are "native code" and not JavaScript to be interpreted, a single pipeline stage becomes more efficient than the JavaScript form, though the listing to do this looks longer:
db.collection.aggregate([
  { "$redact": {
    "$cond": {
      "if": {
        "$anyElementTrue": {
          "$map": {
            "input": { "$slice": [ "$a", -1 ] },
            "as": "el",
            "in": {
              "$and": [
                { "$gt": [ "$$el", 2 ] },
                { "$lt": [ "$$el", 5 ] }
              ]
            }
          }
        }
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }}
])
The $redact operator that already exists is used to "logically filter" with a comparison expression here. Based on the true/false match condition it either "keeps" or "prunes" the document from the results respectively.
The $slice operator in its aggregation framework form will still ultimately return an array, albeit a single-element array in this case. This is why $map is used to "transform" each element into a true/false condition, and the $anyElementTrue operator reduces the "array" to a singular response, as is expected by $cond.
So when that is released, it will be the most efficient way to do this. But until then, stick with the JavaScript, as it is presently the fastest way to do this evaluation.
Both query forms return just the first two documents of the sample here:
{ "a": [1,2,3] },
{ "a": [1,2,4] }
MongoDB aggregation may be a feasible way, assuming the name field in your document is unique.
Say you have the sample document:
{
  name: "allen",
  data: {
    dates: ["2015-08-08"],
    numbers: [20, 21, 22, 23]
  }
}
The following code is used to do the check. As the db.collection.aggregate() method returns a cursor, we can use the cursor's hasNext() to decide whether the last number lies between the given two numbers.
var result = db.last_one.aggregate(
  [
    {
      // deconstruct the array field numbers
      $unwind: "$data.numbers"
    },
    {
      $group: {
        _id: "$name",
        // lastNumber is 23 in this case
        lastNumber: { $last: "$data.numbers" }
      }
    },
    {
      $match: {
        lastNumber: { $gt: num1, $lt: num2 }
      }
    }
  ]
).hasNext()
if (result) print("matched"); else print("not matched")
For example, if num1 is 22, num2 is 24, the result is matched; if num1 is 21, num2 is 22, the result is not matched.
But actually, grouping on name is not a good idea. It's much better if your documents have a unique ObjectId, so we can group on that _id, as sketched below.
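A minimal sketch of the same pipeline grouped on _id instead:

db.last_one.aggregate([
  { $unwind: "$data.numbers" },
  { $group: { _id: "$_id", lastNumber: { $last: "$data.numbers" } } },
  { $match: { lastNumber: { $gt: num1, $lt: num2 } } }
]).hasNext()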

mongodb query with comparison of property of itself

I have documents like this:
{
  "_id": ObjectId("524a498ee4b018b89437f88a"),
  "counter": {
    "0": {
      "date": "2013.9",
      "counter": NumberInt(1425)
    },
    "1": {
      "date": "2013.10",
      "counter": NumberInt(1425)
    }
  },
  "profile": ObjectId("510576242b5e30877c654aff")
}
and I wanted to search for those where counter.0.counter does not equal counter.1.counter.
I tried:
db.counter.find({ "profile": ObjectId("510576242b5e30877c654aff"), "counter.0.counter": { $ne: "counter.1.counter" } });
but it says it's not a valid JSON query :/
Any help?
Two things.
You cannot actually compare like this unless you resort to JavaScript or use the aggregation framework. The form with aggregate is the better option:
db.collection.aggregate([
  { "$project": {
    "counter": 1,
    "matched": { "$ne": [
      "$counter.0.counter",
      "$counter.1.counter"
    ]}
  }},
  { "$match": { "matched": true } }
])
Or with the (bad) use of JavaScript:
db.collection.find({
  "$where": function() {
    return this.counter["0"].counter != this.counter["1"].counter;
  }
})
Note that dot notation like this.counter.0.counter is not valid JavaScript for numeric keys, hence the bracket notation.
So those are the ways this can be done.
The big problems with the JavaScript $where operator are:
- It invokes the JavaScript interpreter to evaluate every result document, and is not native code.
- It removes any opportunity to use an index to find the results. With other methods you can actually use an index with a separate "match" condition, but this operator removes that chance.
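For instance, a sketch (reusing the profile value from the question) where a leading indexed $match narrows the documents before the computed comparison runs:

db.collection.aggregate([
  // This regular query stage can use an index on "profile"
  { "$match": { "profile": ObjectId("510576242b5e30877c654aff") } },
  { "$project": {
    "counter": 1,
    "matched": { "$ne": [ "$counter.0.counter", "$counter.1.counter" ] }
  }},
  { "$match": { "matched": true } }
])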