Custom functions calculated columns mongodb projection - mongodb

I am trying to use projection to get a column calculated by a custom function on columns in the collection, but I couldn't figure out a way to do it. What I could do is this:
db.collection.aggregate([{ $project: { column1: 1, calculatedCol: { $literal: [ jaro_Winkler("how to access column name") ] } } }])
The code might have syntax error because I don't have the code with me right now.

You seem to think it is possible to call a JavaScript function in the aggregation pipeline, but you cannot do this. You are mistaking what is actually "interpolation" of a variable from a function result for execution within the pipeline.
For instance, if I do this:
var getNumbers = function() { return [ 1,2,3 ] };
Then I call this:
db.collection.aggregate([
{ "$project": {
"mynums": getNumbers()
}}
])
What actually happens is that the JavaScript shell "interpolates" the values before the instruction is sent to the server, like this:
db.collection.aggregate([
{ "$project": {
"mynums": [1,2,3]
}}
])
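You can check this outside MongoDB. The plain Node.js sketch below needs no database, and JSON.stringify stands in for the BSON serialization the shell actually performs. It shows that the pipeline object already holds the literal array by the time it would be sent:

```javascript
// Plain Node.js sketch (no MongoDB involved): getNumbers() runs on the client
// while the pipeline object is being built, before anything is sent anywhere.
var getNumbers = function () { return [1, 2, 3]; };

const pipeline = [
  { "$project": { "mynums": getNumbers() } }
];

// JSON.stringify stands in for the shell's BSON serialization:
// the function call has already been replaced by its result.
const sent = JSON.stringify(pipeline);
console.log(sent); // [{"$project":{"mynums":[1,2,3]}}]
```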
To further demonstrate that, store a function "only" on the server:
db.system.js.save({ "_id": "hello", "value": function() { return "hello" } })
Then try to run the aggregation statement:
db.collection.aggregate([
{ "$project": {
"greeting": hello()
}}
])
And that will result in an exception:
E QUERY [main] ReferenceError: hello is not defined at (shell):1:69
This is because the execution happens on the client and not the server, and the function does not exist on the client.
The aggregation framework cannot run JavaScript, as it has no provision to do so (at the time of writing; MongoDB 4.4 later added a $function operator for server-side JavaScript). All operations are performed in native code, with no JavaScript engine being invoked, so you use the operators instead:
db.collection.aggregate([
{ "$project": {
"total": { "$add": [ 1, 2 ] },
"field_total": { "$subtract": [ "$gross", "$tax" ] }
}}
])
If you cannot use the operators to achieve the results, then the only way you can run JavaScript code is to run mapReduce instead, which of course uses a JavaScript engine to interface with the data from the collection. From there you can also reference a server-side function inside your logic if you need to. Given these documents:
{ "key": 1, "value": 1 },
{ "key": 1, "value": 2 },
{ "key": 1, "value": 3 }
db.system.js.save({ "_id": "square", "value": function(num) { return num * num } })
db.collection.mapReduce(
function() {
emit(this.key,square(this.value))
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
)
Returns:
{
"_id": 1,
"value": 14
}
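For readers new to mapReduce, the flow above can be imitated in memory. This is a simplified sketch in plain JavaScript, not the server's implementation. The real engine may skip reduce for keys that have only one value and may call reduce repeatedly on partial results. The mapReduce helper here is hypothetical:

```javascript
// In-memory sketch of the map/reduce flow above (not the MongoDB engine).
const docs = [
  { key: 1, value: 1 },
  { key: 1, value: 2 },
  { key: 1, value: 3 }
];

// Stand-in for the server-side "square" function stored in system.js
const square = (num) => num * num;

// Hypothetical helper: map phase collects emitted values per key,
// reduce phase produces one { _id, value } document per key.
function mapReduce(documents, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // map runs with "this" bound to each document, as in MongoDB
  for (const doc of documents) map.call(doc, emit);
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const results = mapReduce(
  docs,
  function (emit) { emit(this.key, square(this.value)); },
  (key, values) => values.reduce((a, b) => a + b, 0) // like Array.sum(values)
);
console.log(results); // [ { _id: 1, value: 14 } ]
```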
So this is not about "how to pass in a field value" but really about the fact that the aggregation framework does not support JavaScript in any way, and that what you thought was happening is not actually the case.

Related

Mongoose - Find object with any key and specific subkey

Let's say I have a Mongo database that contains objects such as:
[
{
"test": {
"123123": {
"someField": null
}
}
},
{
"test": {
"323143": {
"someField": "lalala"
},
"121434": {
"someField": null
}
}
},
{
"test": {
"4238023": {
"someField": "afafa"
}
}
}
]
As you can see, the keys right under "test" can vary.
I want to find all documents that have at least one someField that is not null.
Something like find({ "test.*.someField": { $ne: null } }), where * represents any key.
How can I do this in Mongoose? I'm thinking an aggregation pipeline will be needed here, but I'm not exactly sure how.
Constraints :
I don't have much control over the db schema in this scenario.
Ideally I don't want to do this logic in Node.js; I would like to query the database directly.
The trickiest part here is that you cannot search keys that match a pattern. Luckily there is a workaround. Yes, you do need an aggregation pipeline.
Let's look at an individual document:
{
"test": {
"4238023": {
"someField": "afafa"
}
}
}
We need to query someField, but to get to it, we need to somehow circumvent 4238023 because it varies with each document. What if we could break that test object down and look at it presented like so:
{
"k": "4238023",
"v": {
"someField": "afafa"
}
}
Suddenly, it gets a heck of a lot easier to query. MongoDB aggregation offers an operator called $objectToArray which does exactly that.
So what we are going to do is:
Convert the test object into an array for each document.
Match only documents where AT LEAST ONE v.someField is not null.
Put it back together to look like your original documents, minus the ones that do not match the null criterion.
So, here is the pipeline you need:
db.collection.aggregate([
{
"$project": {
"arr": {
"$objectToArray": "$$ROOT.test"
}
}
},
{
"$match": {
arr: {
$elemMatch: {
"v.someField": {
$ne: null
}
}
}
}
},
{
"$project": {
"_id": 1,
"test": {
$arrayToObject: "$arr"
}
}
}
])
Playground: https://mongoplayground.net/p/b_VNuOLgUb2
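To make the three stages concrete, here is a plain JavaScript sketch that applies the same steps to the sample documents. The _id values are added for illustration, and Object.entries / Object.fromEntries stand in for $objectToArray / $arrayToObject. It shows the logic only; the pipeline itself is what runs on the server:

```javascript
// Plain-JS sketch of the three pipeline stages (not MongoDB itself).
const docs = [
  { _id: 1, test: { "123123": { someField: null } } },
  { _id: 2, test: { "323143": { someField: "lalala" }, "121434": { someField: null } } },
  { _id: 3, test: { "4238023": { someField: "afafa" } } }
];

const result = docs
  // Stage 1 ($objectToArray): { test: {...} } -> { arr: [ { k, v }, ... ] }
  .map(doc => ({
    _id: doc._id,
    arr: Object.entries(doc.test).map(([k, v]) => ({ k, v }))
  }))
  // Stage 2 ($elemMatch + $ne: null): at least one element with a non-null
  // someField; != null also rejects a missing field, as $ne: null does
  .filter(doc => doc.arr.some(el => el.v.someField != null))
  // Stage 3 ($arrayToObject): rebuild the original "test" object
  .map(doc => ({
    _id: doc._id,
    test: Object.fromEntries(doc.arr.map(({ k, v }) => [k, v]))
  }));

console.log(result.map(d => d._id)); // [ 2, 3 ]
```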
Note that in Mongoose you run this aggregation the same way you would in the shell, plus the .then():
YourCollection.aggregate([
...
...
])
.then(result => console.log(result))

How to use $regex inside $or as an Aggregation Expression

I have a query which allows the user to filter by some string field using a format that looks like: "Where description of the latest inspection is any of: foo or bar". This works great with the following query:
db.getCollection('permits').find({
'$expr': {
'$let': {
vars: {
latestInspection: {
'$arrayElemAt': ['$inspections', {
'$indexOfArray': ['$inspections.inspectionDate', {
'$max': '$inspections.inspectionDate'
}]
}]
}
},
in: {
'$in': ['$$latestInspection.description', ['Fire inspection on property', 'Health inspection']]
}
}
}
})
What I want is for the user to be able to use wildcards which I turn into regular expressions: "Where description of the latest inspection is any of: Health inspection or Found a * at the property".
Building the regex is not the problem; I don't need help with that. The problem is that the aggregation $in operator apparently does not support matching by regular expressions, so I thought I'd build this using $or, since the docs don't say I can't use a regex there. This was my best attempt:
db.getCollection('permits').find({
'$expr': {
'$let': {
vars: {
latestInspection: {
'$arrayElemAt': ['$inspections', {
'$indexOfArray': ['$inspections.inspectionDate', {
'$max': '$inspections.inspectionDate'
}]
}]
}
},
in: {
'$or': [{
'$$latestInspection.description': {
'$regex': /^Found a .* at the property$/
}
}, {
'$$latestInspection.description': 'Health inspection'
}]
}
}
}
})
Except I'm getting the error:
"Unrecognized expression '$$latestInspection.description'"
I'm thinking I can't use $$latestInspection.description as an object key but I'm not sure (my knowledge here is limited) and I can't figure out another way to do what I want. So you see I wasn't even able to get far enough to see if I can use $regex in $or. I appreciate all the help I can get.
Everything inside $expr is an aggregation expression, and while the documentation may not explicitly say you cannot use a regex there, the lack of any named operator and the JIRA issue SERVER-11947 certainly do. So if you need a regular expression, you really have no option other than using $where instead:
db.getCollection('permits').find({
"$where": function() {
// Sort newest first, so shift() returns the latest inspection
var latest = this.inspections
.sort((a,b) => b.inspectionDate.valueOf() - a.inspectionDate.valueOf())
.shift();
if (!latest) return false; // no inspections at all
return /^Found a .* at the property$/.test(latest.description) ||
latest.description === "Health inspection";
}
})
You can still use $expr and aggregation expressions for an exact match, or just keep the comparison within the $where anyway. But at this time, the only regular expression support MongoDB has is $regex within a query expression.
If you did actually "require" an aggregation pipeline expression that precludes you from using $where, then the only current valid approach is to first "project" the field separately from the array and then $match with the regular query expression:
db.getCollection('permits').aggregate([
{ "$addFields": {
"lastDescription": {
"$arrayElemAt": [
"$inspections.description",
{ "$indexOfArray": [
"$inspections.inspectionDate",
{ "$max": "$inspections.inspectionDate" }
]}
]
}
}},
{ "$match": {
"lastDescription": {
"$in": [/^Found a .* at the property$/,/^Health inspection$/]
}
}}
])
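The logic of this pipeline can be sketched in plain JavaScript. It picks the inspection with the maximum date, then treats the $in list of regular expressions as "any pattern matches". The permits data below is hypothetical and is only meant to illustrate the behaviour:

```javascript
// Plain-JS sketch of the $addFields + $match stages above; not MongoDB itself.
const permits = [
  { _id: "a", inspections: [
    { inspectionDate: new Date("2018-01-01"), description: "Health inspection" },
    { inspectionDate: new Date("2018-03-01"), description: "Found a rat at the property" }
  ]},
  { _id: "b", inspections: [
    { inspectionDate: new Date("2018-02-01"), description: "Fire inspection on property" }
  ]}
];

const patterns = [/^Found a .* at the property$/, /^Health inspection$/];

// Mirrors $max over the dates, then $indexOfArray, then $arrayElemAt
function lastDescription(inspections) {
  const dates = inspections.map(i => i.inspectionDate.valueOf());
  const idx = dates.indexOf(Math.max(...dates));
  return idx === -1 ? undefined : inspections[idx].description;
}

const matches = permits
  .map(p => ({ ...p, lastDescription: lastDescription(p.inspections) }))
  // $in with regular expressions: a match on any one pattern is enough
  .filter(p => typeof p.lastDescription === "string" &&
               patterns.some(re => re.test(p.lastDescription)));

console.log(matches.map(p => p._id)); // [ 'a' ]
```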
This brings us to the fact that you appear to be looking for the array item with the maximum date value. The $where version above should make it clear that the better approach is to $sort the array on update, so that the first item in the array is always the latest. That is something a regular query can work with.
To maintain the order, ensure new items are added to the array with $push and $sort like this:
db.getCollection('permits').updateOne(
{ "_id": _idOfDocument },
{
"$push": {
"inspections": {
"$each": [{ /* Detail of inspection object */ }],
"$sort": { "inspectionDate": -1 }
}
}
}
)
In fact, with an empty array argument to $each, an updateMany() will re-sort the array in all your existing documents:
db.getCollection('permits').updateMany(
{ },
{
"$push": {
"inspections": {
"$each": [],
"$sort": { "inspectionDate": -1 }
}
}
}
)
This should really only be necessary when you actually alter the stored date during updates, and such updates are best issued with bulkWrite() so that both the update and the re-sort of the array are sent together:
db.getCollection('permits').bulkWrite([
{ "updateOne": {
"filter": { "_id": _idOfDocument, "inspections._id": identifierForArrayElement },
"update": {
"$set": { "inspections.$.inspectionDate": new Date() }
}
}},
{ "updateOne": {
"filter": { "_id": _idOfDocument },
"update": {
"$push": { "inspections": { "$each": [], "$sort": { "inspectionDate": -1 } } }
}
}}
])
However, if you never actually alter the date, then it probably makes more sense to simply use the $position modifier and prepend to the array instead of appending, avoiding any overhead of a $sort:
db.getCollection('permits').updateOne(
{ "_id": _idOfDocument },
{
"$push": {
"inspections": {
"$each": [{ /* Detail of inspection object */ }],
"$position": 0
}
}
}
)
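As a rough in-memory illustration of the two $push variants, here is a sketch of the assumed semantics, not server code. $each with $sort appends and then orders the whole array newest first. An empty $each simply re-sorts. $position: 0 prepends without sorting. The helper names pushSorted and pushFront are hypothetical:

```javascript
// Hypothetical helpers mirroring the array effect of each $push form.
function pushSorted(arr, items) {
  // $push + $each + $sort: { inspectionDate: -1 } -> append, then newest first
  return [...arr, ...items].sort((a, b) => b.inspectionDate - a.inspectionDate);
}

function pushFront(arr, items) {
  // $push + $each + $position: 0 -> prepend, no sorting
  return [...items, ...arr];
}

const existing = [
  { inspectionDate: new Date("2018-01-01"), description: "Health inspection" },
  { inspectionDate: new Date("2018-03-01"), description: "Fire inspection on property" }
];
const latest = { inspectionDate: new Date("2018-05-01"), description: "Found a cat at the property" };

// An empty $each with $sort just re-sorts the existing array
const resorted = pushSorted(existing, []);
const sorted = pushSorted(existing, [latest]);
// Prepending to an already newest-first array keeps it newest first
const prepended = pushFront(resorted, [latest]);

console.log(resorted[0].description);    // Fire inspection on property
console.log(sorted[0].description);      // Found a cat at the property
console.log(prepended[0] === sorted[0]); // true
```

Either way, the latest inspection ends up at index 0, which is what makes the "inspections.0.description" query possible.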
With the array permanently sorted or at least constructed so the "latest" date is actually always the "first" entry, then you can simply use a regular query expression:
db.getCollection('permits').find({
"inspections.0.description": {
"$in": [/^Found a .* at the property$/,/^Health inspection$/]
}
})
So the lesson here is: don't try to force calculated expressions into your logic where you really don't need them. There should be no compelling reason why you cannot store the array ordered with the latest date first, and even if you thought you needed the array in some other order, you should probably weigh up which use case is more important.
Once reordered, you can even take advantage of an index to some extent, as long as the regular expressions are either anchored to the beginning of the string or something else in the query expression does an exact match.
If you feel you really cannot reorder the array, then the $where query is your only present option until the JIRA issue is resolved. That is hopefully targeted for the 4.1 release, but it is more than likely 6 months to a year away at best. (SERVER-11947 has since been resolved: MongoDB 4.2 added the $regexMatch, $regexFind and $regexFindAll aggregation operators.)

mongodb count two condition one time?

I need to run two counts like the ones below:
1. data[k]['success'] = common.find({'url': {"$regex": k}, 'result.titile':{'$ne':''}}).count()
2. data[k]['fail'] = common.find({'url': {"$regex": k}, 'result.titile':''}).count()
I think it would be more efficient if mongodb can work like below:
result = common.find({'url': {"$regex": k}})
count1 = result.find({'result.titile':{'$ne':''}})
count2 = result.count() - count1
//result do not have find or count method, just for example
Both counts are based on the same search condition, {'url': {"$regex": k}}, split by whether {'result.titile': {'$ne': ''}} holds or not.
Is there a built-in way to do this without writing custom JS?
The async (parallel) approach would be preferable if your client supports it.
You could also aggregate as below:
$match the docs which have the urls.
$group with _id set to null, and take the $sum of all documents. We still need the documents themselves to count those that do have a title, so accumulate them using the $push operator.
$unwind the documents.
$match those which have a title.
$group, and get the $sum.
$project the desired result.
sample code:
db.t.aggregate([
{$match:{"url":{"$regex":k}}},
{$group:{"_id":null,
"count_of_url_matching_docs":{$sum:1},
"docs":{$push:"$$ROOT"}}},
{$unwind:"$docs"},
{$match:{"docs.result.titile":{$ne:""}}},
{$group:{"_id":null,
"count_of_url_matching_docs":{$first:"$count_of_url_matching_docs"},
"count_of_docs_with_titles":{$sum:1}}},
{$project:{"_id":0,
"count_of_docs_with_titles":"$count_of_docs_with_titles",
"count_difference":{$subtract:[
"$count_of_url_matching_docs",
"$count_of_docs_with_titles"]}}}
])
Test data:
db.t.insert([
{"url":"s","result":{"titile":1}},
{"url":"s","result":{"titile":""}},
{"url":"s","result":{"titile":""}},
{"url":"s","result":{"titile":2}}
])
Test Result:
{ "count_of_docs_with_titles" : 2, "count_difference" : 2 }
Use .aggregate() with a conditional key for grouping via $cond:
common.aggregate([
{ "$match": { "url": { "$regex": k } } },
{ "$group": {
"_id": {
"$cond": {
"if": { "$ne": [ "$result.title", "" ] },
"then": "success",
"else": "fail"
}
},
"count": { "$sum": 1 }
}}
])
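The $cond grouping amounts to putting each matched document into a "success" or "fail" bucket in a single pass. Below is a plain JavaScript sketch of that idea. It reuses the test documents from the first answer, with the field renamed to title to match this pipeline, and is an illustration rather than what the server executes:

```javascript
// Plain-JS sketch of { $match } + { $group: { _id: { $cond: ... } } }.
const docs = [
  { url: "s", result: { title: 1 } },
  { url: "s", result: { title: "" } },
  { url: "s", result: { title: "" } },
  { url: "s", result: { title: 2 } }
];
const k = "s"; // the URL pattern, as in the question

const counts = {};
for (const doc of docs) {
  if (!new RegExp(k).test(doc.url)) continue;                  // $match on url
  const bucket = doc.result.title !== "" ? "success" : "fail"; // $cond
  counts[bucket] = (counts[bucket] || 0) + 1;                   // $sum: 1
}
console.log(counts); // { success: 2, fail: 2 }
```

As with the real $group, a bucket only appears in the output if at least one document falls into it.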
However, it is actually more efficient to run both queries in parallel if your environment supports it, as with Node.js:
var async = require("async");
async.parallel(
[
function(callback) {
common.count({
"url": { "$regex": k },
"result.title": { "$ne": "" }
}, function(err,count) {
callback(err,{ "success": count });
});
},
function(callback) {
common.count({
"url": { "$regex": k },
"result.title": ""
}, function(err,count) {
callback(err,{ "fail": count });
});
}
],
function(err,results) {
if (err) throw err;
console.log(results);
}
)
This makes sense, since no per-document $cond test is needed and both counts can actually run on the server at the same time.
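In current Node.js, the same parallel pattern can be written with Promise.all instead of the async library. In this sketch, count is a hypothetical in-memory stand-in that takes a predicate. With the official Node.js driver you would instead await collection.countDocuments(filter), which also returns a Promise:

```javascript
// Hypothetical in-memory data and count helper standing in for a driver call.
const docs = [
  { url: "s", result: { title: 1 } },
  { url: "s", result: { title: "" } },
  { url: "x", result: { title: "" } }
];

async function count(predicate) {
  return docs.filter(predicate).length;
}

async function successAndFail(k) {
  const re = new RegExp(k);
  // Both counts are started before either is awaited, so they run concurrently
  const [success, fail] = await Promise.all([
    count(d => re.test(d.url) && d.result.title !== ""),
    count(d => re.test(d.url) && d.result.title === "")
  ]);
  return { success, fail };
}

successAndFail("s").then(r => console.log(r)); // { success: 1, fail: 1 }
```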

mongodb query with comparison of property of itself

I have documents like this:
{
"_id": ObjectId("524a498ee4b018b89437f88a"),
"counter": {
"0": {
"date": "2013.9",
"counter": NumberInt(1425)
},
"1": {
"date": "2013.10",
"counter": NumberInt(1425)
}
},
"profile": ObjectId("510576242b5e30877c654aff")
}
and I want to find those where counter.0.counter does not equal counter.1.counter.
I tried
db.counter.find({"profile":ObjectId("510576242b5e30877c654aff"),"counter.0.counter":{$ne:"counter.1.counter"} });
but it says it's not a valid JSON query :/
Any help?
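There are two ways to do this. You cannot compare two fields of the same document in a plain query unless you resort to JavaScript or use the aggregation framework, and the aggregate form is the better option (MongoDB 3.6 and later also accept such aggregation expressions in find() via $expr):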
db.collection.aggregate([
{ "$project": {
"counter": 1,
"differs": { "$ne": [
"$counter.0.counter",
"$counter.1.counter"
]}
}},
{ "$match": { "differs": true } }
])
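The per-document comparison can be sketched in plain JavaScript as follows (the second document is hypothetical). Keys that look like numbers, such as "0", need bracket notation in JavaScript. That is why dot notation like this.counter.0.counter is a syntax error inside a $where function:

```javascript
// Plain-JS sketch: compute whether the two counters differ, then keep those.
const docs = [
  { _id: "x", counter: { "0": { date: "2013.9", counter: 1425 }, "1": { date: "2013.10", counter: 1425 } } },
  { _id: "y", counter: { "0": { date: "2013.9", counter: 1400 }, "1": { date: "2013.10", counter: 1425 } } }
];

const differing = docs
  // like a $project stage computing a boolean field
  .map(d => ({ _id: d._id, differs: d.counter["0"].counter !== d.counter["1"].counter }))
  // like { $match: { differs: true } }
  .filter(d => d.differs === true);

console.log(differing.map(d => d._id)); // [ 'y' ]
```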
Or, with the ill-advised use of JavaScript:
db.collection.find({
"$where": function() {
return this.counter["0"].counter != this.counter["1"].counter;
}
})
So those are the ways this can be done.
The big problems with the JavaScript $where operator are:
Invokes the JavaScript interpreter to evaluate every result document and is not native code.
Removes any opportunity to use an index to find the results as needed. With the other methods you can still use an index for a separate "match" condition, but this operator removes that chance.

MongoDB: how to do $or with $where (it doesn't do logical OR)

How can we use $or with such a $where clause?
This query should always be returning all records (because of the date in 2015), but it doesn't return anything.
In parts, it works, but when trying to apply the $or to the Date or $where, it doesn't work as intended.
Thanks to Sammaye for fixing my previous version of this into the following (still not working, though):
db.turnys.find({
$or:[
{ start:{
$lte:new Date("2015-03-31T09:52:29.338Z")
} },
{ $where:"this.users.length == this.seats" }
]
});
How can I accomplish the intended $or?
Here is a sample of the turnys collection:
[
{
"gId": "5335e4a7b8cf51bcd054b423",
"seats": 2,
"start": "2014-03-31T08:47:48.946Z",
"end": "2014-03-31T08:49:48.946Z",
"rMin": 800,
"rMax": 900,
"users": [],
"_id": "53392bb42b70450000a834d8"
},
{
"gId": "5335e4a7b8cf51bcd054b423",
"seats": 2,
"start": "2014-03-31T08:47:48.946Z",
"end": "2014-03-31T08:49:48.946Z",
"rMin": 1000,
"rMax": 1100,
"users": [],
"_id": "53392bb42b70450000a834da"
}
]
Thanks!
The problem is that $ors do not work that way, in reality what you need is:
db.turnys.find({
$or:[
{ start:{
$lte:new Date("2015-03-31T09:52:29.338Z")
} },
{ $where:"this.users.length == this.seats" }
]
});
That will now create an $or query with two clauses. Each element of the $or array is a separate clause, and the conditions within a single element are implicitly ANDed.
As I mentioned on your other question, the $where operator should be avoided, for the reasons given there.
So again, as shown there, what you should be doing is maintaining a total_users value within your document, using the $inc operator on updates. Your query should then look like this, using .aggregate():
db.collection.aggregate([
{ "$project": {
"gId": 1,
"start": 1,
"alloc": { "$eq": [ "$total_users", "$seats" ] }
}},
{ "$match": {
"$or": [
{ "alloc": true },
{ "start": { "$lte": new Date("2015-03-31T09:52:29.338Z") } }
]
}}
])
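Here is a plain JavaScript sketch of the two stages. It assumes each document carries the maintained total_users count described above, and the sample data is hypothetical. alloc is the boolean produced by $eq, and $or keeps a document if either clause holds:

```javascript
// Plain-JS sketch of the $project + $match ($or) stages; hypothetical data.
const turnys = [
  { gId: "g1", seats: 2, total_users: 0, start: new Date("2014-03-31T08:47:48.946Z") },
  { gId: "g2", seats: 2, total_users: 2, start: new Date("2016-01-01T00:00:00Z") },
  { gId: "g3", seats: 2, total_users: 1, start: new Date("2016-01-01T00:00:00Z") }
];
const cutoff = new Date("2015-03-31T09:52:29.338Z");

const selected = turnys
  // $project: alloc is a boolean, so the $match must test for true, not 1
  .map(t => ({ gId: t.gId, start: t.start, alloc: t.total_users === t.seats }))
  // $match with $or: either clause is enough
  .filter(t => t.alloc === true || t.start <= cutoff);

console.log(selected.map(t => t.gId)); // [ 'g1', 'g2' ]
```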
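Or you could possibly use the "array size" form mentioned there, with more recent versions of MongoDB (still to be released as of writing).
But also, to clarify: you need to make sure your test data is actually valid. In the sample above, start appears as a string; if it is really stored that way rather than as a BSON date, comparing it with a Date will never match.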