mongodb: aggregates over a list contained in each document - mongodb

I am currently reading the MongoDB aggregation introduction. The examples show how powerful aggregation is, for example for summing certain values across a subset of documents in a collection.
What I need is a bit different: I need to perform the same operation within a list contained in each document of a collection. I would still get one element per document in the collection, but the list in each document would be collapsed by summing a certain field of its sub-documents.
Is this possible with normal pipeline/aggregation operations in MongoDB?

I discovered that the $unwind operator expands a list contained in a document into several documents.
For example, the following query expands the sessions list into several documents, which can then be aggregated over the Ts field:
db.userStats.aggregate([
{ $match: {"u":{ "$in": [1,2,3,4,5] }}},
{ $unwind: "$sessions" },
{ $group: { _id:"$u" , total: { $sum: "$sessions.Ts"}}},
])

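It sounds like you want a $project stage, possibly followed by a $group if you'd prefer to collapse the results into one document per user. In MongoDB 3.2 and later, $sum inside $project adds up the values of an array expression, so no $unwind is needed. Note that $project must keep u, otherwise the $group stage has nothing to group on. Something like:
db.userStats.aggregate([
{ $match: {"u":{ "$in": [1,2,3,4,5] }}},
{ $project: { u: 1, total: { $sum: "$sessions.Ts"}}},
{ $group: { _id:"$u" , total: { $first: "$total" }}},
])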
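To make the difference between the two pipelines concrete, here is a plain-JavaScript sketch of what each one computes. It runs on in-memory data and does not talk to MongoDB; the sample documents and the helper names sumPerDocument and sumPerUser are assumptions made for illustration only.

```javascript
// Hypothetical sample documents shaped like the userStats collection.
const userStats = [
  { _id: "a", u: 1, sessions: [{ Ts: 10 }, { Ts: 5 }] },
  { _id: "b", u: 2, sessions: [{ Ts: 7 }] },
  { _id: "c", u: 2, sessions: [] },
];

// Mimics a $project stage that keeps u and computes { $sum: "$sessions.Ts" }:
// one output element per input document, with the list collapsed to a sum.
function sumPerDocument(docs) {
  return docs.map((doc) => ({
    _id: doc._id,
    u: doc.u,
    total: doc.sessions.reduce((acc, s) => acc + s.Ts, 0),
  }));
}

// Mimics $unwind followed by $group on "$u": one output element per user.
// Documents with an empty sessions array contribute nothing, just as they
// vanish at the $unwind stage by default.
function sumPerUser(docs) {
  const totals = new Map();
  for (const doc of docs) {
    for (const s of doc.sessions) {
      totals.set(doc.u, (totals.get(doc.u) ?? 0) + s.Ts);
    }
  }
  return [...totals].map(([u, total]) => ({ _id: u, total }));
}

console.log(sumPerDocument(userStats)); // totals 15, 7 and 0
console.log(sumPerUser(userStats));     // totals 15 (u = 1) and 7 (u = 2)
```

The per-document variant keeps document "c" with a total of 0, which is the behaviour the question asks for; the $unwind variant drops it.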

Related

Project nested array element to top level using MongoDB aggregation pipeline

I have a groups collection with documents of the form
{
"_id": "g123",
...,
"invites": [
{
"senderAccountId": "a456",
"recipientAccountId": "a789"
},
...
]
}
I want to be able to list all the invites received by a user.
I thought of using an aggregation pipeline on the groups collection that filters the groups, returning only those to which the user has been invited.
db.groups.aggregate([
{
$match: {
"invites.recipientAccountId": "<user-id>"
}
}
])
Lastly I want to project this array of groups to end up with an array of the form
[
{
"senderAccountId": "a...",
"recipientAccountId": "<user-id>",
"groupId": "g...", // Equal to "_id" field of document.
},
...
]
But I'm missing the "project" step in my aggregation pipeline to bring to the top-level the nested senderAccountId and recipientAccountId fields. I have seen examples online of projections in MongoDB queries and aggregation pipelines but I couldn't find examples for projecting the previously matched element of an array field of a document to the top-level.
I've thought of using Array Update Operators to reference the matched element but couldn't get any meaningful progress using this method.
There are multiple ways to do this; a combination of $unwind and $project works. $unwind emits one document per element of the invites array, and $project lets you choose how to structure the result from the fields of the current document.
db.collection.aggregate([
{
"$unwind": "$invites"
},
{
"$match": {
"invites.recipientAccountId": "a789"
}
},
{
"$project": {
recipientAccountId: "$invites.recipientAccountId",
senderAccountId: "$invites.senderAccountId",
groupId: "$_id",
_id: 0 // don't show _id key:value
}
}
])
You can also use nimrod serok's $replaceRoot stage in place of the $project stage:
{$replaceRoot: {newRoot: {$mergeObjects: ["$invites", {group: "$_id"}]}}}
nimrod serok's solution might be a bit better because mine unwinds the array first and only then matches, but I believe mine is more readable.
I think what you want is $replaceRoot:
db.collection.aggregate([
{$match: {"invites.recipientAccountId": "a789"}},
{$set: {
invites: {$first: {
$filter: {
input: "$invites",
cond: {$eq: ["$$this.recipientAccountId", "a789"]}
}
}}
}},
{$replaceRoot: {newRoot: {$mergeObjects: ["$invites", {group: "$_id"}]}}}
])
See how it works on the playground example
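As a rough illustration of the result both answers aim for, the plain-JavaScript sketch below flattens matching invites to the top level and tags each with its group. It works on in-memory data rather than MongoDB; the sample groups and the function name invitesFor are hypothetical.

```javascript
// Hypothetical sample documents shaped like the groups collection.
const groups = [
  {
    _id: "g1",
    invites: [
      { senderAccountId: "a456", recipientAccountId: "a789" },
      { senderAccountId: "a111", recipientAccountId: "a222" },
    ],
  },
  { _id: "g2", invites: [{ senderAccountId: "a333", recipientAccountId: "a789" }] },
  { _id: "g3", invites: [{ senderAccountId: "a456", recipientAccountId: "a999" }] },
];

// Mimics $unwind + $match + $project: every matching invite becomes its own
// top-level object, tagged with the _id of the group it came from.
function invitesFor(userId, docs) {
  return docs.flatMap((group) =>
    group.invites
      .filter((invite) => invite.recipientAccountId === userId)
      .map((invite) => ({ ...invite, groupId: group._id }))
  );
}

console.log(invitesFor("a789", groups));
// two invites: one from group "g1", one from group "g2"
```

One difference worth keeping in mind: the $unwind pipeline returns every matching invite in a group, while the $filter plus $first pipeline keeps only the first match per group.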

How to access overall document count during arithmetic aggregation expression

I have a collection of documents in this format:
{
_id: ObjectId,
items: [
{
defindex: number,
...
},
...
]
}
Parts of the schema that are not relevant are omitted. Each item's defindex is guaranteed to be unique within its items array: the same defindex can occur in the items arrays of different documents, but at most once in each array.
I currently run $unwind on the items field, followed by $sortByCount on items.defindex, to get a list of items sorted by count, highest first.
I now want to use $set to add a new field called usage to this final sorted list, showing the item's count as a fraction of the total number of documents in the collection before the $unwind.
(i.e. if the item's count is 1300, and the overall document count pre-$unwind was 2600, the usage value will be 0.5)
My initial plan was to use $facet upon the initial collection, creating a document like so:
{
total: number (achieved using $count),
documents: [{...}] (achieved using an empty $set)
}
And then calling $unwind on the documents field to add the total document count to each document. Calculating the usage value is then trivial using $set, since the total count is a field in the document itself.
This approach ran into memory issues though, since $facet returns its result as a single document and my collection is far larger than the 16MB document limit.
How would I solve this?
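One way to do it is to use $setWindowFields (available since MongoDB 5.0) to attach the total document count to every document before unwinding: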
db.collection.aggregate([
{
$setWindowFields: {
output: {
totalCount: {$count: {}}
}
}
},
{
$unwind: "$items"
},
{
$group: {
_id: "$items.defindex",
count: {$sum: 1},
totalCount: {$first: "$totalCount"}
}
},
{
$project: {
count: 1,
usage: {$divide: ["$count", "$totalCount"]}
}
},
{$sort: {count: -1}}
])
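The arithmetic behind the usage field can be sketched in plain JavaScript on in-memory data. This is only an illustration of what the pipeline computes, not MongoDB code; the sample documents and the function name usageByDefindex are assumptions.

```javascript
// Hypothetical sample documents; only the defindex of each item matters here.
const docs = [
  { _id: 1, items: [{ defindex: 10 }, { defindex: 20 }] },
  { _id: 2, items: [{ defindex: 10 }] },
  { _id: 3, items: [{ defindex: 10 }, { defindex: 30 }] },
  { _id: 4, items: [{ defindex: 20 }] },
];

function usageByDefindex(documents) {
  // Taken before flattening, like totalCount from $setWindowFields.
  const totalCount = documents.length;

  // $unwind + $group: count how many documents contain each defindex.
  const counts = new Map();
  for (const doc of documents) {
    for (const item of doc.items) {
      counts.set(item.defindex, (counts.get(item.defindex) ?? 0) + 1);
    }
  }

  // $project + $sort: attach the usage ratio and order by count, descending.
  return [...counts]
    .map(([defindex, count]) => ({ _id: defindex, count, usage: count / totalCount }))
    .sort((a, b) => b.count - a.count);
}

console.log(usageByDefindex(docs));
// defindex 10: count 3, usage 0.75
// defindex 20: count 2, usage 0.5
// defindex 30: count 1, usage 0.25
```

The key point is that the total is captured before the array is flattened, so it reflects the number of documents rather than the number of items.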

How to sum the length of particular array field present in all objects inside an array in mongodb document

This is a sample of my MongoDB document (try jsonformatter.com to inspect it):
{"_id":"6278686","playerName":"Rohit Lal","tournamentId":"197831","score":[{"_id":"1611380","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":["Mohit Mishra"],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1602732","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1536514","runsScored":1,"ballFaced":3,"fours":0,"sixes":0,"strikeRate":33.33,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"run out Sameer Baveja","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1536474","runsScored":2,"ballFaced":7,"fours":0,"sixes":0,"strikeRate":28.57,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"c Rajesh b Prasad Naik","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1536467","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1500825","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1461428","runsScored":18,"ballFaced":6,"fours":1,"sixes":2,"strikeRate":300,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"not out","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1461408","runsScored":0,"ballFaced":1,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"c Sudhir b Vinay Kasat 
*vk*","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1451175","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1451146","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1392796","runsScored":0,"ballFaced":1,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"c †Vinay Kedia b Lalit","catches":[],"stumping":[],"runout":[],"participatedRunout":[]}],"__v":0}
I want to sum the lengths of the catches arrays across all objects inside the score array. I know I can achieve this with the aggregation framework, but I am a beginner in MongoDB and don't know many aggregation operators. Here is the aggregation pipeline I have tried, but it returns the number of catches arrays (one per score element), not the sum of their lengths:
[
{
$project: { // stage name assumed; the original attempt omitted it
"totalCatches": {
$size: "$score.catches"
}
}
}
]
$unwind - Deconstruct the score array field into multiple documents.
$group - Group by null (all documents together), then $sum the $size of score.catches.
db.collection.aggregate([
{
$unwind: "$score"
},
{
$group: {
_id: null,
"totalCatches": {
$sum: {
$size: "$score.catches"
}
}
}
}
])
Note: If you want one result per document (rather than one combined result for all documents), change the $group stage's _id to:
{
$group: {
_id: "$_id",
...
}
}
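The following plain-JavaScript sketch shows why $size on "$score.catches" gives the wrong number and what the $unwind plus $group pipeline computes instead. It uses in-memory data only; the sample documents (apart from the one catch name taken from the question) and the function names are hypothetical.

```javascript
// Hypothetical sample documents with only the fields that matter here.
const players = [
  {
    _id: "p1",
    score: [
      { catches: ["Mohit Mishra"] },
      { catches: [] },
      { catches: ["X", "Y"] },
      { catches: [] },
    ],
  },
  { _id: "p2", score: [{ catches: [] }, { catches: ["Z"] }] },
];

// What { $size: "$score.catches" } measures: "$score.catches" resolves to an
// array of arrays, so its size is the number of score elements.
function naiveSize(doc) {
  return doc.score.map((s) => s.catches).length;
}

// Mimics $unwind + $group with _id: "$_id": sum of the lengths per document.
// (A document with an empty score array would yield 0 here, whereas it would
// disappear at the $unwind stage.)
function catchesPerDocument(docs) {
  return docs.map((doc) => ({
    _id: doc._id,
    totalCatches: doc.score.reduce((acc, s) => acc + s.catches.length, 0),
  }));
}

// Mimics $unwind + $group with _id: null: one total across all documents.
function totalCatches(docs) {
  return catchesPerDocument(docs).reduce((acc, d) => acc + d.totalCatches, 0);
}

console.log(naiveSize(players[0]));       // 4, the number of score elements
console.log(catchesPerDocument(players)); // p1: 3, p2: 1
console.log(totalCatches(players));       // 4
```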

MongoDB: how to aggregate the number of occurrences (count) of distinct values?

I have a set with 2M hashtags, but only around 200k are distinct values. I want to know which hashtags are repeated most in my data.
I used this to find how many times each hashtag is repeated on my dataset:
db.hashtags.aggregate([{ "$group": {"_id": "$hashtag","count": { "$sum": 1 }}}]);
However, I would like to save the results in a separate collection containing only the unique values and their corresponding number of occurrences.
How should I do that?
Please, if possible, give me enough information that I can UNDERSTAND how to do it, not only the code.
Thank you.
You can use the $out pipeline stage to write the output of the aggregation to another collection. If that collection already exists, $out replaces its contents.
db.hashtags.aggregate([
{ "$group": {"_id": "$hashtag", "count": { "$sum": 1 }}},
{ "$out": "newcoll" }
]);
Note that this feature was added in MongoDB 2.6.
Using the aggregation framework, the following returns each hashtag that occurs in more than one record, together with its record count:
db.hashtags.aggregate([
{
$group: {
_id: "$hashtag",
count: { $sum: 1 }
}
},
{ $match: { count: { $gt: 1 } } },
{ $sort : { count : -1} },
{ $limit : 200 },
{ $out: "duphashtags" }
])
The $sum accumulator adds up the values passed to it, in this case the constant 1, thereby counting the number of grouped records into the count field. The $match stage keeps only groups with a count greater than 1, i.e. duplicates. $sort puts the most frequent duplicates first, and $limit restricts the results to the top 200. The $out stage writes the documents returned by the aggregation pipeline to the specified collection, here "duphashtags".
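For readers who want to see the stages step by step, here is a plain-JavaScript sketch of the same logic on an in-memory array. It is not MongoDB code and writes nothing to a collection; the sample data and the function name topDuplicates are assumptions for illustration.

```javascript
// Hypothetical sample records shaped like the hashtags collection.
const hashtags = [
  { hashtag: "#mongodb" },
  { hashtag: "#nosql" },
  { hashtag: "#mongodb" },
  { hashtag: "#sql" },
  { hashtag: "#nosql" },
  { hashtag: "#mongodb" },
];

function topDuplicates(docs, limit) {
  // $group with { $sum: 1 }: add 1 to a per-hashtag counter for every record.
  const counts = new Map();
  for (const { hashtag } of docs) {
    counts.set(hashtag, (counts.get(hashtag) ?? 0) + 1);
  }

  return [...counts]
    .map(([hashtag, count]) => ({ _id: hashtag, count }))
    .filter((group) => group.count > 1) // $match: keep duplicates only
    .sort((a, b) => b.count - a.count)  // $sort: most frequent first
    .slice(0, limit);                   // $limit: keep the top entries
}

console.log(topDuplicates(hashtags, 200));
// "#mongodb" with count 3, then "#nosql" with count 2
```

In the real pipeline, the array returned here is what $out would store in the target collection.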

Get single array from mongoDB collection where the status is current

I want to find the accepted body parts that have status current.
I tried this:
db.patients.find({
"injury.injurydata.injuryinformation.dateofinjury": {
"$gte": ISODate("2014-05-21T08:00:00Z"),
"$lt": ISODate("2014-06-03T08:00:00Z")
}
},
{
"injury.injurydata.acceptedbodyparts": 1,
"injury.injurydata.injuryinformation.dateofinjury": 1,
"injury": {
$elemMatch: {
"injury.injurydata.acceptedbodyparts.status": "current"
}
}
})
but I still get both array elements.
If acceptedbodyparts is an array, you can't query acceptedbodyparts.status. If status is a field on the documents contained in the array, you would need to use another $elemMatch clause in your query. So the last part would look something like this:
{"injury":{ "$elemMatch": { "injurydata.acceptedbodyparts": {"$elemMatch": {"status":"current"} }} }}
I also removed the injury. prefix in the first $elemMatch because you're querying data within the injury array.
Note that this will return the entire document with the full array, as long as it contains the document you're searching for. If your intention is to retrieve a particular element in an array, $elemMatch is the wrong approach.
Standard projection cannot filter elements of nested arrays or limit the fields returned from inside them. For that you need the aggregation framework:
db.patients.aggregate([
// First match, Matches documents
{ "$match": {
"injury.injurydata.injuryinformation.dateofinjury": {
"$gte": ISODate("2014-05-21T08:00:00Z"),
"$lt": ISODate("2014-06-03T08:00:00Z")
}
}},
// Un-wind the arrays
{ "$unwind": "$injury" },
{ "$unwind": "$injury.injurydata" },
{ "$unwind": "$injury.injurydata.acceptedbodyparts" },
// Now match the required data in the array
{ "$match": {
"injury.injurydata.acceptedbodyparts.status": "current"
}},
// Group only wanted fields
{ "$group": {
"_id": "$_id",
"acceptedbodyparts": {
"$push": "$injury.injurydata.acceptedbodyparts"
}
}}
])
You can add in other fields from outside the array either by using $first or by making them part of the _id in the grouping.
This is simply outside the scope of what standard projection can do; the aggregation framework, with its extended manipulation capabilities, solves it.
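The unwind, match and regroup sequence can be pictured with a plain-JavaScript sketch on in-memory data. It is not MongoDB code and skips the date filter; the sample patients, the bodypart field and the function name currentBodyParts are hypothetical, and injurydata is assumed to be an array.

```javascript
// Hypothetical sample documents; the real schema is only partly known.
const patients = [
  {
    _id: "p1",
    injury: [
      {
        injurydata: [
          {
            acceptedbodyparts: [
              { bodypart: "knee", status: "current" },
              { bodypart: "wrist", status: "previous" },
            ],
          },
        ],
      },
      { injurydata: [{ acceptedbodyparts: [{ bodypart: "ankle", status: "current" }] }] },
    ],
  },
  {
    _id: "p2",
    injury: [
      { injurydata: [{ acceptedbodyparts: [{ bodypart: "elbow", status: "previous" }] }] },
    ],
  },
];

function currentBodyParts(docs) {
  return docs
    .map((doc) => ({
      _id: doc._id,
      acceptedbodyparts: doc.injury
        .flatMap((inj) => inj.injurydata)               // the first two $unwind stages
        .flatMap((data) => data.acceptedbodyparts)      // the third $unwind stage
        .filter((part) => part.status === "current"),   // the second $match
    }))
    // A patient with no matching element produces no group at all.
    .filter((doc) => doc.acceptedbodyparts.length > 0);
}

console.log(JSON.stringify(currentBodyParts(patients)));
// only "p1" remains, with its knee and ankle entries
```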