MongoDB $cond in aggregation framework

I have a collection with documents that look like the following:
{
ipAddr: '1.2.3.4',
"results" : [
{
"Test" : "Sight",
"Score" : "FAIL",
"Reason" : "S1002"
},
{
"Test" : "Speed",
"Score" : "FAIL",
"Reason" : "85"
},
{
"Test" : "Sound",
"Score" : "FAIL",
"Reason" : "A1001"
}
],
"finalGrade" : "FAILED"
}
Here's the aggregation query I'm trying to write. What I want to do (see the commented-out line) is create a grouped field, per ipAddr, of the 'Reason / Error' codes, but only for Reason codes that begin with a specific letter, and adding each code only once. I tried the following:
db.aggregate([
{$group:
{ _id: "$ipAddr",
attempts: {$sum:1},
results: {$push: "$finalGrade"},
// errorCodes: {$addToSet: {$cond: ["$results.Reason": /[A|B|S|N.*/, "$results.Reason", ""]}},
finalResult: {$last: "$finalGrade"} }
}
]);
Everything works except the commented-out 'errorCodes' line. The logic I'm attempting to create is:
"Add to the errorCodes set the value of results.Reason IF it begins with an A, B, S, or N; otherwise there is nothing to add".
For the Record above, the errorCodes set should contain:
...
errorCodes: [S1002,A1001],
...

That line fails because $cond expects a boolean expression as its first argument, and a field/regex pair such as the one in that line is not a valid expression. $project is the stage where you can transform the original document based on $cond expressions (among other things).
You need two steps in the aggregation pipeline before you can $group - first you need to $unwind the results array, and next you need to $match to filter out the results you don't care about.
That does the simple thing of throwing out the results with error codes you don't care about. But it sounds like you want to count the total number of failures including all error codes, and then add only particular ones to the output array. There isn't a straightforward way to do that; you would have to make two $unwind / $group passes in the pipeline.
Something similar to this will do it:
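// db.collection is a placeholder for your collection name
db.collection.aggregate([
{$unwind : "$results"},
// first pass: count every result and keep them all for the second pass
{$group:
{ _id: "$ipAddr",
attempts: {$sum:1},
results: {$push : "$results"},
finalGrade: {$last : "$finalGrade" }
}
},
{$unwind: "$results"},
{$match: {"results.Reason":/yourMatchExpression/} },
// second pass: collect only the matching error codes
{$group:
{ _id: "$ipAddr",
attempts: {$last:"$attempts"},
errorCodes: {$addToSet: "$results.Reason"},
finalResult: {$last: "$finalGrade"}
}
}
]);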
If you only want to count attempts that have the matching error code then you can do that with a single $group - you will need to do $unwind, $match and $group. You could use $project with $cond as you had it, but then your array of errorCodes will have an empty string entry along with all the proper error codes.
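To make the $unwind, $match and $group/$addToSet sequence concrete, here is a minimal plain JavaScript sketch of what it computes for the sample document. It is not MongoDB code: errorCodesByIp is a hypothetical helper, and the anchored pattern /^[ABSN]/ is an assumption based on the letters named in the question.

```javascript
// Plain JavaScript model of $unwind -> $match -> $group with $addToSet.
// errorCodesByIp is a hypothetical helper, not part of any MongoDB API.
function errorCodesByIp(docs, pattern) {
  const byIp = new Map();
  for (const doc of docs) {
    // $unwind: visit one array element at a time
    for (const result of doc.results || []) {
      // $match: skip reasons that do not satisfy the pattern
      if (!pattern.test(result.Reason)) continue;
      // $group by ipAddr with $addToSet: a Set ignores duplicates
      if (!byIp.has(doc.ipAddr)) byIp.set(doc.ipAddr, new Set());
      byIp.get(doc.ipAddr).add(result.Reason);
    }
  }
  return [...byIp].map(([ip, codes]) => ({ _id: ip, errorCodes: [...codes] }));
}

const sample = [{
  ipAddr: "1.2.3.4",
  results: [
    { Test: "Sight", Score: "FAIL", Reason: "S1002" },
    { Test: "Speed", Score: "FAIL", Reason: "85" },
    { Test: "Sound", Score: "FAIL", Reason: "A1001" }
  ],
  finalGrade: "FAILED"
}];

console.log(JSON.stringify(errorCodesByIp(sample, /^[ABSN]/)));
```

The sketch keeps insertion order because a JavaScript Set does; MongoDB does not guarantee any order for $addToSet.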

As of MongoDB 2.4, $regex can be used for pattern matching in a query, but not as an expression returning a boolean, which is what $cond requires.
So you can either use a $match stage with the $regex operator:
http://mongotry.herokuapp.com/#?bookmarkId=52fb39e207fc4c02006fcfed
[
{
"$unwind": "$results"
},
{
"$match": {
"results.Reason": {
"$regex": "^[SA]"
}
}
},
{
"$group": {
"_id": "$ipAddr",
"attempts": {
"$sum": 1
},
"results": {
"$push": "$finalGrade"
},
"finalResult": {
"$last": "$finalGrade"
},
"errorCodes": {
"$addToSet": "$results.Reason"
}
}
}
]
Or you can use $substr, as your pattern matching is very simple:
http://mongotry.herokuapp.com/index.html#?bookmarkId=52fb47bc7f295802001baa38
[
{
"$unwind": "$results"
},
{
"$group": {
"_id": "$ipAddr",
"errorCodes": {
"$addToSet": {
"$cond": [
{
"$or": [
{
"$eq": [
{
"$substr": [
"$results.Reason",
0,
1
]
},
"A"
]
},
{
"$eq": [
{
"$substr": [
"$results.Reason",
0,
1
]
},
"S"
]
}
]
},
"$results.Reason",
"null"
]
}
}
}
}
]
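One consequence of wrapping $addToSet in $cond is that the fallback value also lands in the set. The following plain JavaScript sketch models that behaviour; condAddToSet is a hypothetical helper and not MongoDB code, and the fallback string "null" simply mirrors the pipeline above.

```javascript
// Plain JavaScript model of $addToSet combined with $cond and $substr.
// condAddToSet is a hypothetical helper, not part of any MongoDB API.
function condAddToSet(reasons, letters, fallback) {
  const set = new Set();
  for (const reason of reasons) {
    // $substr: [reason, 0, 1] takes the first character
    const first = reason.substring(0, 1);
    // $cond: keep the reason when the prefix matches, else use the fallback
    set.add(letters.includes(first) ? reason : fallback);
  }
  return [...set];
}

const withPlaceholder = condAddToSet(["S1002", "85", "A1001"], ["A", "S"], "null");
// The placeholder has to be filtered out afterwards to get only real codes.
const cleaned = withPlaceholder.filter((code) => code !== "null");

console.log(withPlaceholder, cleaned);
```

This is why filtering with $match before grouping is usually the tidier choice when the non-matching entries are not needed at all.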

Related

MongoDB - Cannot divide by zero error [duplicate]

Given the following record in my MongoDB table:
{
"_id" : ObjectId("5a00c1c71680084c55811ae2"),
"name" : "test",
"tenantId" : "paul",
"price" : 300,
"deposits" : [
{
"amount" : 100,
"date" : ISODate("2017-11-07T14:08:19.324Z"),
"_id" : ObjectId("5a01be55424b0f8922a5b472")
},
{
"amount" : 50,
"date" : ISODate("2017-11-87T14:08:19.324Z"),
"_id" : ObjectId("5a01be55424b0f8922a5b473")
}
],
"attention" : "",
"due" : ISODate("2017-10-26T22:00:00.000Z")
}
I would like to filter all the records with a specific tenantId, and then subtract the SUM of my amounts in the subdocument.
I found out how to Sum the Subdocument:
db.table.aggregate( [
{ $match : { tenantId: "paul" } },
{ $unwind:{ path: "$deposits", preserveNullAndEmptyArrays: true }},
{ $group: {
_id: '$_id',
deposits: { $sum: '$deposits.amount' },
} }
] );
but when I try to subtract the $sum from $price like
deposits: { $subtract: [ $price , $sum: '$deposits.amount' ] },
then I get an error saying
Error: Line 6: Unexpected token :
Actually you can simply do:
db.table.aggregate( [
{ "$match" : { "tenantId": "paul" } },
//{ $unwind:{ path: "$deposits", preserveNullAndEmptyArrays: true }},
{ "$project": {
"deposits": { "$subtract": ["$price", { "$sum": "$deposits.amount" } ] }
}}
])
Since MongoDB 3.2 you can use $sum in $project, with either a list of arguments or an array, and therefore do not need to $unwind at all.
Changed in version 3.2: $sum is available in the $group and $project stages. In previous versions of MongoDB, $sum is available in the $group stage only.
When used in the $project stage, $sum returns the sum of the specified expression or list of expressions for each document ...
The "long" way, which is the "old" way is to actually use $unwind, but you would then actually add a $project following the $group:
db.table.aggregate( [
{ "$match" : { "tenantId": "paul" } },
{ $unwind:{ path: "$deposits", preserveNullAndEmptyArrays: true }},
{ "$group": {
"_id": "$_id",
"price": { "$first": "$price" },
"deposits": { "$sum": "$deposits.amount" }
}},
{ "$project": {
"deposits": { "$subtract": [ "$price", "$deposits" ] }
}}
])
And of course you then need the $first accumulator in order to return the "price" field from the $group stage so it can be used in the following stage.
But if you can do preserveNullAndEmptyArrays, then you actually have MongoDB 3.2, and therefore are better off using the statement without the $unwind at all, since it's much faster to do it that way.
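As a quick check of the arithmetic, this plain JavaScript sketch mirrors what the $project version computes per document. remainingBalance is a hypothetical helper (not MongoDB code), and it assumes every amount is a number.

```javascript
// Plain JavaScript model of { $subtract: ["$price", { $sum: "$deposits.amount" }] }.
// remainingBalance is a hypothetical helper, not part of any MongoDB API.
function remainingBalance(doc) {
  // $sum over "$deposits.amount": a missing or empty array sums to 0
  const paid = (doc.deposits || []).reduce((total, d) => total + d.amount, 0);
  // $subtract: price minus the summed deposits
  return doc.price - paid;
}

const rental = {
  name: "test",
  tenantId: "paul",
  price: 300,
  deposits: [{ amount: 100 }, { amount: 50 }]
};

console.log(remainingBalance(rental));                        // 150
console.log(remainingBalance({ price: 300, deposits: [] }));  // 300
```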

How can I get max value in nested documents?

I have a collection(named menucategories) in MongoDB 3.2.11:
{
"_id" : ...
"menus" : [
{
"code":0
},
{
"code":1
},
{
"code":2
},
{
"code":3
}
]
},
{
"_id" : ...
"menus" : [
{
"code":4
},
{
"code":5
},
{
"code":6
},
{
"code":7
}
]
},
{
"_id" : ...
"menus" : [
{
"code":8
},
{
"code":9
},
{
"code":10
},
{
"code":11
}
]
}
Every menucategory has an array named menus, and every menu (element of the array) has a code. The code is unique across all menus. I want to get the maximum value of the menus' code (in this case, 11). How can I achieve this?
If you want to find the maximum code across all menus, the query could look like this:
db.menucategories.aggregate([
{ $unwind: '$menus' },
{ $group: { _id: null, max: { $max: '$menus.code' } } },
{ $project: { max: 1, _id:0 } }
])
See the MongoDB documentation for $unwind, $group and $project for more information on these operators.
You don't need to use the $unwind aggregation pipeline operator here because starting from MongoDB 3.2, some accumulator expressions are available in the $project stage.
db.collection.aggregate([
{"$project": {"maxPerDoc": {"$max": "$menus.code"}}},
{"$group": {"_id": null, "maxValue": {"$max": "$maxPerDoc"}}}
])
In response to a previous, now deleted comment: you don't need to put your pipeline in an array, so the following query will work as well.
db.collection.aggregate(
{"$project": {"maxPerDoc": {"$max": "$menus.code"}}},
{"$group": {"_id": null, "maxValue": {"$max": "$maxPerDoc"}}}
)
Try with aggregation:
db.collection.aggregate({ $group : { _id: 1, max: { $max: {$max : "$menus.code"}}}});
There is no need for any $unwind if you only need the maximum value.
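The two-step idea (maximum per document, then maximum across documents) can be sketched in plain JavaScript as follows. maxMenuCode is a hypothetical helper, not MongoDB code, and it assumes every menus array is non-empty.

```javascript
// Plain JavaScript model of $project with $max followed by $group with $max.
// maxMenuCode is a hypothetical helper, not part of any MongoDB API.
function maxMenuCode(categories) {
  // $project stage: maximum code inside each document's menus array
  const perDoc = categories.map((c) => Math.max(...c.menus.map((m) => m.code)));
  // $group stage with _id: null: maximum across all documents
  return Math.max(...perDoc);
}

const menucategories = [
  { menus: [{ code: 0 }, { code: 1 }, { code: 2 }, { code: 3 }] },
  { menus: [{ code: 4 }, { code: 5 }, { code: 6 }, { code: 7 }] },
  { menus: [{ code: 8 }, { code: 9 }, { code: 10 }, { code: 11 }] }
];

console.log(maxMenuCode(menucategories)); // 11
```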

MongoDB - Operations with nested fields

I have twitter data that looks like this:
db.users.findOne()
{
"_id" : ObjectId("578ffa8e7eb9513f4f55a935"),
"user_name" : "koteras",
"retweet_count" : 0,
"tweet_followers_count" : 461,
"source" : "Twitter for iPhone",
"coordinates" : null,
"tweet_mentioned_count" : 1,
"tweet_ID" : "755891629932675072",
"tweet_text" : "RT #ochocinco: I beat them all for 10 straight hours #FIFA16KING",
"user" : {
"CreatedAt" : ISODate("2011-12-27T09:04:01Z"),
"FavouritesCount" : 5223,
"FollowersCount" : 461,
"FriendsCount" : 619,
"UserId" : 447818090,
"Location" : "501"
}
}
For example, I want to find the number of users that have "FollowersCount" greater than "FavouritesCount". How can I do that?
The $where operator is specifically designed for this.
db.users.find( { $where: function() { return (this.user.FollowersCount > this.user.FavouritesCount) } } );
But keep in mind that this would run single threaded JS code, and will be slower.
Another option is to use an aggregation pipeline projecting the difference, and then having a $match on the difference
db.users.aggregate([
{$project: {
diff: {$subtract: ["$user.FollowersCount", "$user.FavouritesCount"]},
// project remaining fields here
}
},
{$match: {diff: {$gt: 0}}}
])
In my experience I have found the second one to be much faster than the first.
To get the number of users that have "FollowersCount" greater than "FavouritesCount", you could use the aggregation framework which has some operators that you can apply.
The first approach uses the $cmp comparison operator in a $project stage, followed by a $match stage that filters documents on the $cmp result. You can then get the final user count with a $group stage that counts the filtered documents:
db.users.aggregate([
{
"$project": {
"hasMoreFollowersThanFavs": {
"$cmp": [ "$user.FollowersCount", "$user.FavouritesCount" ]
}
}
},
{ "$match": { "hasMoreFollowersThanFavs": 1 } },
{
"$group": {
"_id": null,
"count": { "$sum": 1 }
}
}
])
Another option is a single $redact stage, which combines the functionality of $project and $match above: it keeps documents that match the condition using the $$KEEP system variable and discards the rest using $$PRUNE:
db.users.aggregate([
{
"$redact": {
"$cond": [
{
"$eq": [
{ "$cmp": [ "$user.FollowersCount", "$user.FavouritesCount" ] },
1
]
},
"$$KEEP",
"$$PRUNE"
]
}
},
{
"$group": {
"_id": null,
"count": { "$sum": 1 }
}
}
])
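For reference, the comparison and count that both pipelines perform can be modelled in plain JavaScript. This is only a sketch: cmp and countMoreFollowersThanFavs are hypothetical helpers, and the two extra users are invented test data, not part of the original collection.

```javascript
// Plain JavaScript model of $cmp followed by a count of the matches.
// cmp mirrors the -1 / 0 / 1 result of the $cmp operator.
function cmp(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical helper, not part of any MongoDB API.
function countMoreFollowersThanFavs(users) {
  // Keep documents where $cmp is 1, then count them ($group with $sum: 1)
  return users.filter(
    (u) => cmp(u.user.FollowersCount, u.user.FavouritesCount) === 1
  ).length;
}

const users = [
  { user_name: "koteras", user: { FollowersCount: 461, FavouritesCount: 5223 } },
  // The next two entries are invented for illustration
  { user_name: "hypothetical_a", user: { FollowersCount: 900, FavouritesCount: 12 } },
  { user_name: "hypothetical_b", user: { FollowersCount: 30, FavouritesCount: 30 } }
];

console.log(countMoreFollowersThanFavs(users)); // 1
```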

Mongodb aggregation $project get array position element field value

Document:
{
"_id" : ObjectId("560dcd15491a065d6ab1085c"),
"title" : "example title",
"views" : 1,
"messages" : [
{
"authorId" : ObjectId("560c24b853b558856ef193a3"),
"authorName" : "Karl Morrison",
"created" : ISODate("2015-10-02T00:17:25.119Z"),
"message" : "example message"
}
]
}
Project:
$project: {
_id: 1,
title: 1,
views: 1,
updated: '$messages[$messages.length-1].created' // <--- ReferenceError: $messages is not defined
}
I am trying to get the last element's created value from the array inside the document. I was reading the documentation, but it falls short for this specific task.
I've learnt it has to do with dot notation. However, it doesn't state how to get the last element.
You cannot just extract properties or basically change the result from a basic .find() query beyond simple top level field selection as it simply is not supported. For more advanced manipulation you can use the aggregation framework.
However, without even touching .aggregate() the $slice projection operator gets you most of the way there:
db.collection.find({},{ "messages": { "$slice": -1 } })
You cannot alter the structure, but it is the last array element with little effort.
Before MongoDB 3.2, the aggregation framework needs to $unwind the array in order to get at the "last" element, which you can select with the $last grouping accumulator:
db.collection.aggregate([
{ "$unwind": "$messages" },
{ "$group": {
"_id": "$_id",
"title": { "$last": "$title" },
"views": { "$last": "$views" },
"created": { "$last": "$messages.created" }
}}
])
MongoDB 3.2 and later have $slice and $arrayElemAt in aggregation, which can handle this directly. One way is to set a variable with $let to address the dot-notated field:
[
{ "$project": {
"title": 1,
"views": 1,
"created": {
"$let": {
"vars": {
"message": {
"$arrayElemAt": [
{ "$slice": [ "$messages", -1 ] },
0
]
}
},
"in": "$$message.created"
}
}
}}
]
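What the $slice / $arrayElemAt / $let combination extracts can be illustrated with a short plain JavaScript sketch. lastMessageCreated is a hypothetical helper and not MongoDB code; returning undefined for an empty array is this sketch's way of modelling a missing field.

```javascript
// Plain JavaScript model of taking the last array element and reading a field.
// lastMessageCreated is a hypothetical helper, not part of any MongoDB API.
function lastMessageCreated(doc) {
  const messages = doc.messages || [];
  // $slice: ["$messages", -1] plus $arrayElemAt index 0 -> the last element
  const last = messages[messages.length - 1];
  // "$$message.created": read the field from that element
  return last ? last.created : undefined;
}

const thread = {
  title: "example title",
  views: 1,
  messages: [
    {
      authorName: "Karl Morrison",
      created: new Date("2015-10-02T00:17:25.119Z"),
      message: "example message"
    }
  ]
};

console.log(lastMessageCreated(thread).toISOString());
```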

mongodb $aggregate empty array and multiple documents

mongodb has below document:
> db.test.find({name:{$in:["abc","abc2"]}})
{ "_id" : 1, "name" : "abc", "scores" : [ ] }
{ "_id" : 2, "name" : "abc2", "scores" : [ 10, 20 ] }
I want to get the length of the scores array for each document. How should I do that?
Tried below command:
db.test.aggregate({$match:{name:"abc2"}}, {$unwind: "$scores"}, {$group: {_id:null, count:{$sum:1}}} )
Result:
{ "_id" : null, "count" : 2 }
But below command:
db.test.aggregate({$match:{name:"abc"}}, {$unwind: "$scores"}, {$group: {_id:null, count:{$sum:1}}} )
returns nothing. Questions:
How can I get the length of scores for two or more documents in one command?
Why does the second command return nothing, and how should I check whether the array is empty?
So this is actually a common problem. When the array is "empty", the $unwind stage in an aggregation pipeline "removes" the document from the pipeline results.
In order to return a count of "0" for such an "empty" array, you need to do something like the following.
In MongoDB 2.6 or greater, just use $size:
db.test.aggregate([
{ "$match": { "name": "abc" } },
{ "$group": {
"_id": null,
"count": { "$sum": { "$size": "$scores" } }
}}
])
In earlier versions you need to do this:
db.test.aggregate([
{ "$match": { "name": "abc" } },
{ "$project": {
"name": 1,
"scores": {
"$cond": [
{ "$eq": [ "$scores", [] ] },
{ "$const": [false] },
"$scores"
]
}
}},
{ "$unwind": "$scores" },
{ "$group": {
"_id": null,
"count": { "$sum": {
"$cond": [
"$scores",
1,
0
]
}}
}}
])
The modern operation is simple since $size will just "measure" the array. In the latter case you need to "replace" the array with a single false value when it is empty, so that $unwind does not "destroy" the document.
So replacing with false allows the $cond ternary to choose whether to add 1 or 0 to the $sum of the overall statement.
That is how you get the length of "empty arrays".
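The way $unwind drops documents with empty arrays can be seen in a small plain JavaScript model. unwind here is a hypothetical helper, not MongoDB code, and it assumes the field is either an array or missing.

```javascript
// Plain JavaScript model of $unwind on an array field.
// unwind is a hypothetical helper, not part of any MongoDB API.
function unwind(docs, field) {
  // One output document per array element; an empty array yields none,
  // so the parent document disappears from the result.
  return docs.flatMap((doc) =>
    (doc[field] || []).map((value) => ({ ...doc, [field]: value }))
  );
}

const scoresDocs = [
  { _id: 1, name: "abc", scores: [] },
  { _id: 2, name: "abc2", scores: [10, 20] }
];

const unwound = unwind(scoresDocs, "scores");
console.log(unwound); // only documents derived from _id 2 remain
```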
To get the length of scores in 2 or more documents you just need to change the _id value in the $group pipeline which contains the distinct group by key, so in this case you need to group by the document _id.
Your second aggregation returns nothing because the $match stage passed a document with an empty scores array, which $unwind then dropped. To check that the array is not empty, your match query should be
{'scores.0': {$exists: true}} or {scores: {$not: {$size: 0}}}
Overall, your aggregation should look like this:
db.test.aggregate([
{ "$match": {"scores.0": { "$exists": true } } },
{ "$unwind": "$scores" },
{
"$group": {
"_id": "$_id",
"count": { "$sum": 1 }
}
}
])
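Note that the pipeline above filters out documents with empty arrays entirely. If the goal is one length per document, including zero, the $size operator mentioned earlier can be applied per document. The plain JavaScript sketch below models that result; scoreCounts is a hypothetical helper, not MongoDB code, and it assumes scores is always an array.

```javascript
// Plain JavaScript model of a per-document array length (as $size would give).
// scoreCounts is a hypothetical helper, not part of any MongoDB API.
function scoreCounts(docs) {
  // One output per input document, so empty arrays report a count of 0
  return docs.map((doc) => ({ _id: doc._id, count: doc.scores.length }));
}

const testDocs = [
  { _id: 1, name: "abc", scores: [] },
  { _id: 2, name: "abc2", scores: [10, 20] }
];

console.log(JSON.stringify(scoreCounts(testDocs)));
```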