MongoDB - average on array values

I'm trying to compute an average for each position of an array field across all the documents in my collection.
Document structure
{
myVar: myValue,
[...]
myCoordinates: [
myLng,
myLat
]
}
So I tried to compute the average of the myLng and myLat values of the myCoordinates array over the whole collection with this query:
myColl.aggregate([{
$group: {
_id: 0,
lngAvg: { $avg: "$myCoordinates.0" },
latAvg: { $avg: "$myCoordinates.1" }
}
}])
Unfortunately, it doesn't work: it returns 0 for both the lngAvg and latAvg fields.
Do you have any ideas? Is this even feasible?

Positional notation in aggregation still seems to be unsupported; check out this ticket.
As @Sammaye says, you'd need to either unwind the array first or replace your coordinates array with an embedded lng/lat document, which would make this trivial.
Given the array structure, you might unwind and project the lat/lng like this:
myColl.aggregate([
// unwind the coordinates into separate docs
{$unwind: "$myCoordinates"},
// group back into single docs, projecting the first and last
// coordinates as lng and lat, respectively
{$group: {
_id: "$_id",
lng: {$first: "$myCoordinates"},
lat: {$last: "$myCoordinates"}
}},
// then group as normal for the averaging
{$group: {
_id: 0,
lngAvg: {$avg: "$lng"},
latAvg: {$avg: "$lat"}
}}
]);
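On newer servers the unwind step may not be needed. The following is a minimal sketch, assuming MongoDB 3.2 or later, where the $arrayElemAt operator can read an array element by index. The `averageCoordinates` helper and the sample documents are hypothetical and only illustrate in plain JavaScript what the pipeline is meant to compute.

```javascript
// Sketch: the same average using $arrayElemAt (assumes MongoDB 3.2 or later).
const pipeline = [
  // Pull each coordinate out of the array by index
  { $project: {
      lng: { $arrayElemAt: ["$myCoordinates", 0] },
      lat: { $arrayElemAt: ["$myCoordinates", 1] }
  } },
  // Average across the whole collection
  { $group: { _id: 0, lngAvg: { $avg: "$lng" }, latAvg: { $avg: "$lat" } } }
];

// Plain-JavaScript equivalent, only to show what the pipeline computes.
function averageCoordinates(docs) {
  const sum = docs.reduce(
    (acc, d) => [acc[0] + d.myCoordinates[0], acc[1] + d.myCoordinates[1]],
    [0, 0]
  );
  return { lngAvg: sum[0] / docs.length, latAvg: sum[1] / docs.length };
}

// Hypothetical sample documents
const sample = [
  { myCoordinates: [10, 40] },
  { myCoordinates: [20, 50] }
];
console.log(averageCoordinates(sample)); // { lngAvg: 15, latAvg: 45 }
// In the shell: myColl.aggregate(pipeline)
```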

Related

How to addfields in MongoDB retrospectively

I have a schema that has a like array. This array stores all the people who have liked my post.
I just added a likeCount field as well, but the likeCount field's default value is 0.
How can I use $addFields in MongoDB to set likeCount to the length of the like array?
I am on a MERN stack.
I am assuming you have a data structure like this:
{
postId: "post1",
likes: [ "ID1", "ID2", "ID3" ]
}
There is almost no reason to add a likeCount field. You should take the length of the likes array itself. Some examples:
db.foo.insert([
{'post':"P1", likes: ["ID1","ID2","ID3"]},
{'post':"P2", likes: ["ID1","ID2","ID3"]},
{'post':"P3", likes: ["ID4","ID2","ID6","ID7"]}
]);
// Which post has the most likes?
db.foo.aggregate([
{$addFields: {N: {$size: "$likes"}}},
{$sort: {"N":-1}}
//, {$limit: 2} // optionally limit to whatever
]);
// Is ID6 in likes?
// $match of a scalar to an input field ('likes') acts like
// $in for convenience:
db.foo.aggregate([ {$match: {'likes':'ID6'}} ]);
// Is ID6 OR ID3 in likes?
db.foo.aggregate([ {$match: {'likes':{$in:['ID6','ID3']}}} ]);
// Is ID2 AND ID7 in likes?
// This is a fancier way of doing set-to-set compares instead
// of a bunch of expressions passed to $and:
var targets = ['ID7','ID2'];
db.foo.aggregate([
{$project: {X: {$eq:[2, {$size:{$setIntersection: ['$likes', targets]}} ]} }}
]);
// Who likes the most across all posts?
db.foo.aggregate([
{$unwind: '$likes'},
{$group: {_id: '$likes', N:{$sum:1}} },
{$sort: {'N':-1}}
]);
This is how to update all the documents with their respective likeCount values the first time (pipeline-style updates require MongoDB 4.2 or later):
db.collection.update({},
[
{
$addFields: {
likeCount: {
$size: "$like"
}
}
}
],
{
multi: true
})
After that, each time one or more people are added to the like array, you can either set likeCount again with $size or increment it with $inc.
Of course, as @Buzz pointed out, it is best to leave the array count to the read code, since updating likeCount on every like is an extra write that can hurt performance under heavy load.
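If you do keep a stored counter, one way to keep it in step with the array is to push and increment in the same update. This is only a sketch: the `buildLikeUpdate` helper is hypothetical, the `like` and `likeCount` field names follow the question, and using `_id` as the post key is an assumption.

```javascript
// Sketch: build the filter and update for "userId likes postId".
// `like` and `likeCount` follow the question; `_id` as the post key is an assumption.
function buildLikeUpdate(postId, userId) {
  return {
    // Only match the post if this user has not liked it yet
    filter: { _id: postId, like: { $ne: userId } },
    // Append the user and bump the counter in the same atomic update
    update: { $push: { like: userId }, $inc: { likeCount: 1 } }
  };
}

const likeOp = buildLikeUpdate("post1", "ID4");
console.log(JSON.stringify(likeOp.filter));
console.log(JSON.stringify(likeOp.update));
// In the shell: db.collection.updateOne(likeOp.filter, likeOp.update)
```

Because the filter excludes posts the user already liked, a repeated like matches nothing and the counter is not incremented twice.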
playground

What is the actual difference between $project and $group?

I have read the docs and I'm still not quite following. According to them, $project returns documents shaped according to my own specification. For $group, they say pretty much the same thing: "Groups documents by some specified expression and outputs to the next stage a document for each distinct grouping"
So what is the following code actually doing? It seems redundant to me.
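BillingCycle.aggregate([{
// per document: add up the values inside the credits and debts arrays
$project: {credit: {$sum: "$credits.value"}, debt: {$sum: "$debts.value"}}
}, {
$group: {
_id: null,
// across all documents: add up the per-document totals
credit: {$sum: "$credit"}, debt: {$sum: "$debt"}
}
}, {
$project: {_id: 0, credit: 1, debt: 1 }
}]);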
"Groups documents by some specified expression and outputs to the next stage a document for each distinct grouping"
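The purpose of $group is not only to pass some fields to the next stage but to gather documents together according to the key given in the _id field and to accumulate values per group.
$project, on the other hand, includes or excludes fields (or adds computed fields) on each document passed to the next stage. As the documentation puts it: "Passes along the documents with the requested fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields."
There is one special case: if _id in $group is null, the accumulated values are calculated over all the input documents as a whole. That can look similar to $project, although it outputs a single document rather than one per input document.
For your query, the first $project stage is redundant as long as credits.value and debts.value resolve to plain numbers. If credits and debts are arrays, keep it: inside $group, $sum treats an array operand as non-numeric, so the per-document $sum in $project is what adds up the array elements.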
BillingCycle.aggregate([ {
$group: {
_id: null,
credit: {$sum: "$credits.value"}, debt: {$sum: "$debts.value"}
}
}, {
$project: {_id: 0, credit: 1, debt: 1 }
}]);
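To make the difference concrete, here is a plain-JavaScript sketch of what the two stages do conceptually. It does not call MongoDB; the sample billing cycles are hypothetical and assume that credits and debts are arrays of objects with a numeric value field.

```javascript
// Hypothetical billing cycles with array-valued credits and debts
const cycles = [
  { credits: [{ value: 10 }, { value: 5 }], debts: [{ value: 3 }] },
  { credits: [{ value: 20 }], debts: [{ value: 4 }, { value: 6 }] }
];

const sumValues = items => items.reduce((total, item) => total + item.value, 0);

// Like $project: one output document per input document
const projected = cycles.map(c => ({
  credit: sumValues(c.credits),
  debt: sumValues(c.debts)
}));

// Like $group with _id: null: every document folded into a single result
const grouped = projected.reduce(
  (acc, d) => ({ credit: acc.credit + d.credit, debt: acc.debt + d.debt }),
  { credit: 0, debt: 0 }
);

console.log(projected); // [ { credit: 15, debt: 3 }, { credit: 20, debt: 10 } ]
console.log(grouped);   // { credit: 35, debt: 13 }
```

The map step keeps the number of documents unchanged, while the reduce step collapses them into one, which is the essential difference between the two stages.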

MongoDB, right projection subfield [duplicate]

Is it possible to rename the fields returned by a find query? I would like to use something like $rename, but I don't want to change the documents I'm accessing. I just want to retrieve them differently, something that works like SELECT COORDINATES AS COORDS in SQL.
What I do now:
db.tweets.findOne({}, {'level1.level2.coordinates': 1, _id:0})
{'level1': {'level2': {'coordinates': [10, 20]}}}
What I would like to be returned is:
{'coords': [10, 20]}
So basically using .aggregate() instead of .find():
db.tweets.aggregate([
{ "$project": {
"_id": 0,
"coords": "$level1.level2.coordinates"
}}
])
And that gives you the result that you want.
MongoDB 2.6 and above versions return a "cursor" just like find does.
See $project and other aggregation framework operators for more details.
For most cases you should simply rename the fields as returned from .find() when processing the cursor. For JavaScript as an example, you can use .map() to do this.
From the shell:
db.tweets.find({},{'level1.level2.coordinates': 1, _id:0}).map( doc => {
doc.coords = doc['level1']['level2'].coordinates;
delete doc['level1'];
return doc;
})
Or more inline:
db.tweets.find({},{'level1.level2.coordinates': 1, _id:0}).map( doc =>
({ coords: doc['level1']['level2'].coordinates })
)
This avoids any additional overhead on the server and should be used where the extra server-side processing would outweigh the gain from reducing the size of the data retrieved. In this case (and most others) that gain would be minimal, so it is better to restructure the cursor result on the client.
As mentioned by @Neil Lunn, this can be achieved with an aggregation pipeline.
Starting with MongoDB 4.2, the $replaceWith aggregation stage can be used to replace a document with a sub-document:
// { level1: { level2: { coordinates: [10, 20] }, b: 4 }, a: 3 }
db.collection.aggregate([
{ $replaceWith: { coords: "$level1.level2.coordinates" } }
])
// { "coords" : [ 10, 20 ] }
Since you mention findOne, you can also limit the number of resulting documents to 1 as such:
db.collection.aggregate([
{ $replaceWith: { coords: "$level1.level2.coordinates" } },
{ $limit: 1 }
])
Prior to MongoDB 4.2 and starting with MongoDB 3.4, $replaceRoot can be used in place of $replaceWith:
db.collection.aggregate([
{ $replaceRoot: { newRoot: { coords: "$level1.level2.coordinates" } } }
])
In general, the $project stage takes field names with 1 or 0 (true or false) to include them in or exclude them from the output. We can also assign a value to a field instead of true or false, which effectively renames it. For example:
db.test_collection.aggregate([
{$group: {
_id: '$field_to_group',
totalCount: {$sum: 1}
}},
{$project: {
_id: false,
renamed_field: '$_id', // here assigning a value instead of 0 or 1 / true or false effectively renames the field.
totalCount: true
}}
])
Stages (>= 4.2)
{$addFields: {"New": "$Old"}},
{$unset: "Old"}

Meteor + Mongo (2.6.7) Pushing Document to Array in Sorted Order

I have a document with an array (which should be denormalised, but can't be because the reactive events will fire "add" too many times at client startup).
I need to be able to push a document to that array, and keep it in sorted (or roughly sorted) order. I've tried this query:
{ $push: {
'events': {
$each: [{'id': new Mongo.ObjectID, 'start':startDate,...}],
$sort: {'start': 1},
$slice: -1
}
}
}
But it requires the $slice operator to be present... I don't want to delete all my old data, I just want to be able to insert data into an array, and then have that array be sorted so that I can query the array later and say "slice greater than or equal to time X".
Is this possible?
Edit:
This mongo aggregate query nearly works, except for one level of document in the result array, but aggregating is not reactive (probably because they're expensive computations). Here is the aggregate query if anyone can see how to translate it to a find, or why it can't be translated:
Coll.aggregate({$unwind: '$events'},
{$sort: {'events.start':1}},
{$match: {'events.start': {$gte: new Date()}}},
{$group: {_id: '$_id', 'events': {$push: '$events'} }})
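As a possible direction, the MongoDB 2.6 documentation describes $sort in $push as no longer requiring $slice, so a push that only sorts may be accepted when it is executed by the server. The sketch below assumes that setup; the `buildSortedPush` helper and the sample event are hypothetical, and whether a client-side collection implementation accepts the same modifier is not something this example verifies.

```javascript
// Sketch: a sorted push without $slice.
// Assumes the update is executed by a MongoDB 2.6+ server (for example from server-side code).
function buildSortedPush(event) {
  return {
    $push: {
      events: {
        $each: [event],      // the new array element(s)
        $sort: { start: 1 }  // re-sort the whole array by start, ascending
      }
    }
  };
}

// Hypothetical event
const sortedPush = buildSortedPush({ id: "e1", start: new Date("2015-03-01T00:00:00Z") });
console.log(JSON.stringify(sortedPush));
```

With the array kept sorted on write, a later read can filter the events by start time without re-sorting them.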

Mongodb aggregate query help - grouping with multiple fields and converting to an array

I have the following document in the mongodb collection
[{quarter:'Q1',project:'project1',user:'u1',cost:'100'},
{quarter:'Q2',project:'project1',user:'u2',cost:'100'},
{quarter:'Q3',project:'project1',user:'u1',cost:'200'},
{quarter:'Q1',project:'project2',user:'u2',cost:'200'},
{quarter:'Q2',project:'project2',user:'u1',cost:'300'},
{quarter:'Q3',project:'project2',user:'u2',cost:'300'}]
I need to generate output that sums the cost by quarter and project, in a format that can be rendered in an Ext JS chart:
[{quarter:'Q1','project1':100,'project2':200,'project3':300},
{quarter:'Q2','project1':100,'project2':200,'project3':300},
{quarter:'Q3','project1':100,'project2':200,'project3':300}]
I have tried various permutations and combinations of aggregates but couldn't really come up with a pipeline. Your help or direction is greatly appreciated.
Your cost data appears to be strings, which isn't helping ($sum ignores non-numeric values), but assuming you can get around that:
The main component is the $cond operator in the second grouping stage, assuming your data is larger and you want to group the results:
db.mstats.aggregate([
// Optionally match first, depending on what you are doing
// Sum up cost for each quarter and project
{$group: {_id: { quarter: "$quarter", project: "$project" }, cost: {$sum: "$cost" }}},
// Change the "projection" in $group, using $cond to add a key per "project" value
// We use $sum and the false case of 0 to fill in values not in the row.
// These will then group on the key adding the real cost and 0 together.
{$group: {
_id: "$_id.quarter",
project1: {$sum: {$cond:[ {$eq: [ "$_id.project", "project1" ]}, "$cost", 0 ]}},
project2: {$sum: {$cond:[ {$eq: [ "$_id.project", "project2" ]}, "$cost", 0 ]}}
}},
// Change the document to have the "quarter" key
{$project: { _id:0, quarter: "$_id", project1: 1, project2: 1}},
// Optionally sort by quarter
{$sort: {quarter: 1 }}
])
So after the initial grouping, the document is reshaped with $cond, which decides whether a value goes into each new key. Essentially it asks: if the current value of project is "project1", put the cost value into the project1 key, otherwise put 0. And so on for each project.
As we put a 0 value into this new document key when there was no match, we need to group the results again in order to merge the documents. Sorting is optional, but probably what you want for a chart.
Naturally you will have to build this up dynamically and probably query for the project keys that you want. But otherwise this should be what you are looking for.
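Building the stages dynamically could look something like the sketch below. The `buildPivotPipeline` helper is hypothetical, and it assumes the project names are safe to use as field names (no dots or leading dollar signs) and that cost has already been stored as a number.

```javascript
// Sketch: build the pivot pipeline for any list of project names.
function buildPivotPipeline(projects) {
  const pivot = { _id: "$_id.quarter" };
  const shape = { _id: 0, quarter: "$_id" };
  for (const name of projects) {
    // One conditional sum per project, mirroring the hand-written stage
    pivot[name] = { $sum: { $cond: [{ $eq: ["$_id.project", name] }, "$cost", 0] } };
    shape[name] = 1;
  }
  return [
    { $group: { _id: { quarter: "$quarter", project: "$project" }, cost: { $sum: "$cost" } } },
    { $group: pivot },
    { $project: shape },
    { $sort: { quarter: 1 } }
  ];
}

const pivotPipeline = buildPivotPipeline(["project1", "project2"]);
console.log(JSON.stringify(pivotPipeline[1], null, 2));
// In the shell: db.mstats.aggregate(pivotPipeline)
// The names could come from db.mstats.distinct("project")
```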