MongoDB aggregation $project: get the field value of an array element by position

Document:
{
"_id" : ObjectId("560dcd15491a065d6ab1085c"),
"title" : "example title",
"views" : 1,
"messages" : [
{
"authorId" : ObjectId("560c24b853b558856ef193a3"),
"authorName" : "Karl Morrison",
"created" : ISODate("2015-10-02T00:17:25.119Z"),
"message" : "example message"
}
]
}
Project:
$project: {
_id: 1,
title: 1,
views: 1,
updated: '$messages[$messages.length-1].created' // <--- ReferenceError: $messages is not defined
}
I am trying to get the last element's created value from the array inside the document. I was reading the documentation, but it falls short on this specific task.
I've learnt it has to do with dot notation; however, the documentation doesn't state how to get the last element.

You cannot reshape the result of a basic .find() query beyond simple top-level field selection; it simply is not supported. For more advanced manipulation you can use the aggregation framework.
However, without even touching .aggregate() the $slice projection operator gets you most of the way there:
db.collection.find({},{ "messages": { "$slice": -1 } })
You cannot alter the structure, but it returns the last array element with little effort.
Until a newer release of MongoDB (as of writing), the aggregation framework still needs to $unwind the array in order to get at the "last" element, which you can then select with the $last grouping accumulator:
db.collection.aggregate([
{ "$unwind": "$messages" },
{ "$group": {
"_id": "$_id",
"title": { "$last": "$title" },
"views": { "$last": "$views" },
"created": { "$last": "$messages.created" }
}}
])
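Against the sample document above, that pipeline returns a reshaped result containing only the created value of the last message, something like:
{
"_id" : ObjectId("560dcd15491a065d6ab1085c"),
"title" : "example title",
"views" : 1,
"created" : ISODate("2015-10-02T00:17:25.119Z")
}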
Future releases have $slice and $arrayElemAt in aggregation, which can handle this directly, but you also need to set a variable with $let in order to address the dot-notated field:
[
{ "$project": {
"name": 1,
"views": 1,
"created": {
"$let": {
"vars": {
"message": {
"$arrayElemAt": [
{ "$slice": [ "$messages", -1 ] },
0
]
}
},
"in": "$$message.created"
}
}
}}
]
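For what it's worth, on MongoDB 3.2 and later $arrayElemAt also accepts a negative index, so a shorter sketch of the same projection (same sample fields assumed) would be:
db.collection.aggregate([
{ "$project": {
"title": 1,
"views": 1,
"created": { "$arrayElemAt": [ "$messages.created", -1 ] }
}}
])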

Related

Performing $lookup based on matching object attribute in other collection's array

I am trying to perform a $lookup on a collection with conditions; the problem I am facing is that I would like to match the text field of all objects inside an array (the accounts array) in the other (plates) collection.
I have tried using $map as well as $in and $setIntersection, but nothing seems to work, and I am unable to find a way to match the text fields of each of the objects in the array.
My document structures are as follows:
plates collection:
{
"_id": "Batch 1",
"rego" : "1QX-WA-123",
"date" : 1516374000000.0
"accounts": [{
"text": "Acc1",
"date": 1516374000000
},{
"text": "Acc2",
"date": 1516474000000
}]
}
accounts collection:
{
"_id": "Acc1",
"date": 1516374000000
"createdAt" : 1513810712802.0
}
I am trying to achieve something like this:
{
$lookup: {
from: 'plates',
let: { 'accountId': '$_id' },
pipeline: [{
'$match': {
'$expr': { '$and': [
{ '$eq': [ '$account.text', '$$accountId' ] },
{ '$gte': [ '$date', ISODate ("2016-01-01T00:00:00.000Z").getTime() ] },
{ '$lte': [ '$date', ISODate ("2019-01-01T00:00:00.000Z").getTime() ] }
]}
}
}],
as: 'cusips'
}
},
The output I am trying to get is:
{
"_id": "Acc1",
"date": 1516374000000
"createdAt" : 1513810712802.0,
"plates": [{
"_id": "Batch 1",
"rego": "1QX-WA-123"
}]
}
Personally I would initiate the aggregation from the "plates" collection instead, where the initial $match conditions can filter the date range more cleanly. Getting your desired output is then a simple matter of "unwinding" the resulting "accounts" matches and "inverting" the content.
This is easy enough with the MongoDB 3.6 features, which you must have anyway in order to use $lookup with $expr. We don't even need that form of $lookup here:
db.plates.aggregate([
{ "$match": {
"date": {
"$gte": new Date("2016-01-01").getTime(),
"$lte": new Date("2019-01-01").getTime()
}
}},
{ "$lookup": {
"from": "accounts",
"localField": "accounts.text",
"foreignField": "_id",
"as": "accounts"
}},
{ "$unwind": "$accounts" },
{ "$group": {
"_id": "$accounts",
"plates": { "$push": { "_id": "$_id", "rego": "$rego" } }
}},
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": ["$_id", { "plates": "$plates" }]
}
}}
])
This of course is an "INNER JOIN", which would only return "accounts" entries where matching "plates" were found.
Doing the "join" from the "accounts" collection means you need additional handling to remove the non-matching entries from the "accounts" array within the "plates" collection:
db.accounts.aggregate([
{ "$lookup": {
"from": "plates",
"let": { "account": "$_id" },
"pipeline": [
{ "$match": {
"date": {
"$gte": new Date("2016-01-01").getTime(),
"$lte": new Date("2019-01-01").getTime()
},
"$expr": { "$in": [ "$$account", "$accounts.text" ] }
}},
{ "$project": { "_id": 1, "rego": 1 } }
],
"as": "plates"
}}
])
Note that the $match on the "date" properties should be expressed as a regular query condition instead of within the $expr block for optimal performance of the query.
The $in is used to compare the "array" of "$accounts.text" values to the local variable defined for the "_id" value of the "accounts" document being joined to. So the first argument to $in is the "single" value and the second is the "array" of just the "text" values which should be matching.
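As a tiny illustration using the sample data, the comparison that $in evaluates for the "Acc1" account against the sample plate is:
{ "$in": [ "Acc1", [ "Acc1", "Acc2" ] ] }   // true, so the plate is joined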
This is also notably a "LEFT JOIN", which returns all "accounts" regardless of whether there are any "plates" matching the conditions, and therefore you can possibly end up with an empty "plates" array in the results returned. You can filter those out if you don't want them, but where that is the case the former query form is really far more efficient than this one, since it starts from the defined relation and only ever deals with "plates" that meet the criteria.
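If you did want to drop the documents with an empty "plates" array, one option (a sketch, appended as a trailing stage to the pipeline above) is to match on the first array position:
{ "$match": { "plates.0": { "$exists": true } } }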
Either method returns the same response from the data provided in the question:
{
"_id" : "Acc1",
"date" : 1516374000000,
"createdAt" : 1513810712802,
"plates" : [
{
"_id" : "Batch 1",
"rego" : "1QX-WA-123"
}
]
}
Which direction you actually take that from really depends on whether the "LEFT" or "INNER" join form is what you really want and also where the most efficient query conditions can be made for the items you actually want to select.
Hmm, not sure how you tried $in, but it works for me:
{
$lookup: {
from: 'plates',
let: { 'accountId': '$_id' },
pipeline: [{
'$match': {
'$expr': { '$and': [
{ '$in': [ '$$accountId', '$accounts.text'] },
{ '$gte': [ '$date', ISODate ("2016-01-01T00:00:00.000Z").getTime() ] },
{ '$lte': [ '$date', ISODate ("2019-01-01T00:00:00.000Z").getTime() ] }
]}
},
}],
as: 'cusips'
}
}

Calculating average in mongo doesn't return the expected result [duplicate]

I want to calculate the rating_average field of this object with the rating fields inside the array ratings. Can you help me to understand how to use aggregation with $avg?
{
"title": "The Hobbit",
"rating_average": "???",
"ratings": [
{
"title": "best book ever",
"rating": 5
},
{
"title": "good book",
"rating": 3.5
}
]
}
The aggregation framework in MongoDB 3.4 and newer offers the $reduce operator, which efficiently calculates the total without the need for extra pipeline stages. Consider using it as an expression to return the total of the ratings, and get the number of ratings using $size. Together with $addFields, the average can then be calculated using the arithmetic operator $divide, as in the formula average = total of ratings / number of ratings:
db.collection.aggregate([
{
"$addFields": {
"rating_average": {
"$divide": [
{ // expression returns total
"$reduce": {
"input": "$ratings",
"initialValue": 0,
"in": { "$add": ["$$value", "$$this.rating"] }
}
},
{ // expression returns ratings count
"$cond": [
{ "$ne": [ { "$size": "$ratings" }, 0 ] },
{ "$size": "$ratings" },
1
]
}
]
}
}
}
])
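With the sample ratings this works out to (5 + 3.5) / 2 = 4.25, which matches the rating_average in the sample output below.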
Sample Output
{
"_id" : ObjectId("58ab48556da32ab5198623f4"),
"title" : "The Hobbit",
"ratings" : [
{
"title" : "best book ever",
"rating" : 5.0
},
{
"title" : "good book",
"rating" : 3.5
}
],
"rating_average" : 4.25
}
With older versions, you would first need to apply the $unwind operator to the ratings array field as your initial aggregation pipeline step. This deconstructs the ratings array from the input documents and outputs one document per element, with each output document replacing the array with a single element value.
The second pipeline stage would be the $group operator, which groups the input documents by an identifier expression made up of the _id and title keys and applies the $avg accumulator expression to each group to calculate the average. Another accumulator operator, $push, preserves the original ratings array field by returning an array of all the values that result from applying an expression to each document in the group.
The final pipeline step is the $project operator, which then reshapes each document in the stream, for example by adding the new field ratings_average.
So, if for instance you have a sample document in your collection (as from above and so below):
db.collection.insert({
"title": "The Hobbit",
"ratings": [
{
"title": "best book ever",
"rating": 5
},
{
"title": "good book",
"rating": 3.5
}
]
})
To calculate the average of the ratings array and project the value into another field, ratings_average, you can then apply the following aggregation pipeline:
db.collection.aggregate([
{
"$unwind": "$ratings"
},
{
"$group": {
"_id": {
"_id": "$_id",
"title": "$title"
},
"ratings":{
"$push": "$ratings"
},
"ratings_average": {
"$avg": "$ratings.rating"
}
}
},
{
"$project": {
"_id": 0,
"title": "$_id.title",
"ratings_average": 1,
"ratings": 1
}
}
])
Result:
/* 1 */
{
"result" : [
{
"ratings" : [
{
"title" : "best book ever",
"rating" : 5
},
{
"title" : "good book",
"rating" : 3.5
}
],
"ratings_average" : 4.25,
"title" : "The Hobbit"
}
],
"ok" : 1
}
This really could be written much more briefly, and that was even true at the time of writing. If you want an "average", simply use $avg:
db.collection.aggregate([
{ "$addFields": {
"rating_average": { "$avg": "$ratings.rating" }
}}
])
The reason for this is that as of MongoDB 3.2 the $avg operator gained "two" things:
The ability to process an "array" of arguments in "expression" form, rather than solely as an accumulator to $group
The benefit of the MongoDB 3.2 features that allow the "shorthand" notation of array expressions, being either in composition:
{ "array": [ "$fielda", "$fieldb" ] }
or in notating a single property from the array as an array of the values of that property:
{ "$avg": "$ratings.rating" } // equal to { "$avg": [ 5, 3.5 ] }
In earlier releases you would have to use $map in order to access the "rating" property inside each array element. Now you don't.
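For illustration only, the more verbose $map form of the same expression looks like the sketch below; note that $avg as an expression and $addFields themselves still require MongoDB 3.2 and 3.4 respectively, so this just shows the equivalence rather than a pre-3.2 recipe:
db.collection.aggregate([
{ "$addFields": {
"rating_average": {
"$avg": {
"$map": {
"input": "$ratings",
"as": "r",
"in": "$$r.rating"
}
}
}
}}
])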
For the record, even the $reduce usage can be simplified:
db.collection.aggregate([
{ "$addFields": {
"rating_average": {
"$reduce": {
"input": "$ratings",
"initialValue": 0,
"in": {
"$add": [
"$$value",
{ "$divide": [
"$$this.rating",
{ "$size": { "$ifNull": [ "$ratings", [] ] } }
]}
]
}
}
}
}}
])
Yes, as stated, this really just re-implements the existing $avg functionality, and since that operator is available it is the one that should be used.
As the data you want to average is in an array, you first need to unwind it. Do this using $unwind in your aggregation pipeline:
{$unwind: "$ratings"}
Then you can access each element of the array as an embedded document under the ratings key in the resulting documents of the aggregation. You then just need to $group by title and calculate the $avg:
{$group: {_id: "$title", ratings: {$push: "$ratings"}, average: {$avg: "$ratings.rating"}}}
Then just recover your title field:
{$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}
So here is your resulting aggregation pipeline:
db.yourCollection.aggregate([
{$unwind: "$ratings"},
{$group: {_id: "$title",
ratings: {$push: "$ratings"},
average: {$avg: "$ratings.rating"}
}
},
{$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}
])
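Against the sample document, this returns something like:
{
"ratings" : [
{ "title" : "best book ever", "rating" : 5 },
{ "title" : "good book", "rating" : 3.5 }
],
"average" : 4.25,
"title" : "The Hobbit"
}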

MongoDB aggregation, finding within an array of values

I have a schema that creates documents using the following structure:
{
"_id" : "2014-07-16:52TEST",
"date" : ISODate("2014-07-16T23:52:59.811Z"),
"name" : "TEST"
"values" : [
[
1405471921000,
0.737121
],
[
1405471922000,
0.737142
],
[
1405471923000,
0.737142
],
[
1405471924000,
0.737142
]
]
}
In the values, the first index is a timestamp. What I'm trying to do is query a specific timestamp to find the closest value ($gte).
I've tried the following aggregate query:
[
{ "$match": {
"values": {
"$elemMatch": { "0": {"$gte": 1405471923000} }
},
"name" : 'TEST'
}},
{ "$project" : {
"name" : 1,
"values" : 1
}},
{ "$unwind": "$values" },
{ "$match": { "values.0": { "$gte": 1405471923000 } } },
{ "$limit" : 1 },
{ "$sort": { "values.0": -1 } },
{ "$group": {
"_id": "$name",
"values": { "$push": "$values" },
}}
]
This seems to work, but it doesn't pull the closest value. It seems to pull anything greater than or equal to the timestamp, and the sort doesn't seem to get applied, so it will pull a timestamp far in the future.
Any suggestions would be great!
Thank you
There are a couple of things wrong with the approach here, even though it is a fair effort. You are right that you need to $sort here, but the problem is that you cannot "sort" on an inner element of an array. In order to get a value that can be sorted you must $unwind the array first, as it otherwise will not sort on an array position.
You also certainly do not want $limit in the pipeline. You might be testing this against a single document, but "limit" will actually act on the entire set of documents in the pipeline. So if more than one document matched your condition, the rest would be thrown away.
The key thing you want to do here is use $first in your $group stage, which is applied once you have sorted to get the "closest" element that you want.
db.collection.aggregate([
// Documents that have an array element matching the condition
{ "$match": {
"values": { "$elemMatch": { "0": {"$gte": 1405471923000 } } }
}},
// Unwind the top level array
{ "$unwind": "$values" },
// Filter just the elements that match the condition
{ "$match": { "values.0": { "$gte": 1405471923000 } } },
// Take a copy of the inner array
{ "$project": {
"date": 1,
"name": 1,
"values": 1,
"valCopy": "$values"
}},
// Unwind the inner array copy
{ "$unwind": "$valCopy" },
// Filter the inner elements
{ "$match": { "valCopy": { "$gte": 1405471923000 } }},
// Sort on the now "timestamp" values ascending for nearest
{ "$sort": { "valCopy": 1 } },
// Take the "first" values
{ "$group": {
"_id": "$_id",
"date": { "$first": "$date" },
"name": { "$first": "$name" },
"values": { "$first": "$values" },
}},
// Optionally push back to array to match the original structure
{ "$group": {
"_id": "$_id",
"date": { "$first": "$date" },
"name": { "$first": "$name" },
"values": { "$push": "$values" },
}}
])
And this produces your document with just the "nearest" timestamp value matching the original document form:
{
"_id" : "2014-07-16:52TEST",
"date" : ISODate("2014-07-16T23:52:59.811Z"),
"name" : "TEST",
"values" : [
[
1405471923000,
0.737142
]
]
}
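On later MongoDB releases (3.4+), a hedged alternative avoids the double $unwind entirely, assuming the inner values entries are already sorted ascending by timestamp as in the sample: $filter keeps only the entries at or after the target timestamp and $slice takes the first of them:
db.collection.aggregate([
{ "$match": {
"name": "TEST",
"values": { "$elemMatch": { "0": { "$gte": 1405471923000 } } }
}},
{ "$addFields": {
"values": {
"$slice": [
{ "$filter": {
"input": "$values",
"as": "v",
"cond": { "$gte": [ { "$arrayElemAt": [ "$$v", 0 ] }, 1405471923000 ] }
}},
1
]
}
}}
])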

MongoDB $cond in aggregation framework

I have a collection with documents that look like the following:
{
ipAddr: '1.2.3.4',
"results" : [
{
"Test" : "Sight",
"Score" : "FAIL",
"Reason" : "S1002"
},
{
"Test" : "Speed",
"Score" : "FAIL",
"Reason" : "85"
},
{
"Test" : "Sound",
"Score" : "FAIL",
"Reason" : "A1001"
}
],
"finalGrade" : "FAILED"
}
Here's the aggregation query I'm trying to write. What I want to do (see the commented-out piece) is to create a grouped field, per ipAddr, of the 'Reason / Error' code, but only if the Reason code begins with a specific letter, and only add each code once. I tried the following:
db.aggregate([
{$group:
{ _id: "$ipAddr",
attempts: {$sum:1},
results: {$push: "$finalGrade"},
// errorCodes: {$addToSet: {$cond: ["$results.Reason": /[A|B|S|N.*/, "$results.Reason", ""]}},
finalResult: {$last: "$finalGrade"} }
}
]);
Everything works excluding the commented-out 'errorCodes' line. The logic I'm attempting to create is:
"Add to the errorCodes set the value of the results.Reason code IF it begins with an A, B, S, or N; otherwise there is nothing to add".
For the Record above, the errorCodes set should contain:
...
errorCodes: [S1002,A1001],
...
$group cannot take conditional expressions, which is why that line is not working. $project is the phase where you can transform the original document based on conditional expressions such as $cond (among other things).
You need two steps in the aggregation pipeline before you can $group - first you need to $unwind the results array, and next you need to $match to filter out the results you don't care about.
That would do the simple thing of just throwing out the results with error codes you don't care about keeping, but it sounds like you want to count the total number of failures including all error codes, and then only add particular ones to the output array? There isn't a straightforward way to do that; you would have to make two $unwind/$group passes in the pipeline.
Something similar to this will do it:
db.aggregate([
{$unwind : "$results"},
{$group:
{ _id: "$ipAddr",
attempts: {$sum:1},
results: {$push : "$results"},
finalGrade: {$last : "$finalGrade" }
}
},
{$unwind: "$results"},
{$match: {"results.Reason":/yourMatchExpression/} },
{$group:
{ _id: "$ipAddr",
attempts: {$last:"$attempts"},
errorCodes: {$addToSet: "$results.Reason"},
finalResult: {$last: "$finalGrade"}
}
}
]);
If you only want to count attempts that have the matching error code then you can do that with a single $group - you will need to do $unwind, $match and $group. You could use $project with $cond as you had it, but then your array of errorCodes will have an empty string entry along with all the proper error codes.
As of Mongo 2.4, $regex can be used for pattern matching, but not as an expression returning a boolean, which is what's required by $cond.
Then, you can either use a $match stage with the $regex keyword:
http://mongotry.herokuapp.com/#?bookmarkId=52fb39e207fc4c02006fcfed
[
{
"$unwind": "$results"
},
{
"$match": {
"results.Reason": {
"$regex": "[SA].*"
}
}
},
{
"$group": {
"_id": "$ipAddr",
"attempts": {
"$sum": 1
},
"results": {
"$push": "$finalGrade"
},
"undefined": {
"$last": "$finalGrade"
},
"errorCodes": {
"$addToSet": "$results.Reason"
}
}
}
]
or you can use $substr, as your pattern matching is very simple:
http://mongotry.herokuapp.com/index.html#?bookmarkId=52fb47bc7f295802001baa38
[
{
"$unwind": "$results"
},
{
"$group": {
"_id": "$ipAddr",
"errorCodes": {
"$addToSet": {
"$cond": [
{
"$or": [
{
"$eq": [
{
"$substr": [
"$results.Reason",
0,
1
]
},
"A"
]
},
{
"$eq": [
{
"$substr": [
"$results.Reason",
0,
1
]
},
"S"
]
}
]
},
"$results.Reason",
"null"
]
}
}
}
}
]
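For what it's worth, much newer MongoDB releases (4.2+) added $regexMatch, an expression that does return a boolean and can therefore drive $cond directly. A minimal sketch of that idea follows; note that non-matching elements would still contribute a null placeholder to the set unless they are filtered out first:
db.collection.aggregate([
{ "$unwind": "$results" },
{ "$group": {
"_id": "$ipAddr",
"attempts": { "$sum": 1 },
"results": { "$push": "$finalGrade" },
"errorCodes": {
"$addToSet": {
"$cond": [
{ "$regexMatch": { "input": "$results.Reason", "regex": "^[ABSN]" } },
"$results.Reason",
null
]
}
},
"finalResult": { "$last": "$finalGrade" }
}}
])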