MongoDB two groups Aggregate - mongodb

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.
I would like to transform this:
{
"_id" : ObjectId("5836b919885383034437d4a7"),
"Identificador" : "G-3474",
"Miembros" : [
{
"_id" : ObjectId("5836b916885383034437d238"),
"Nombre" : "Pilar",
"Email" : "pcarrillocasa@gmail.es",
"Edad" : 24,
"País" : "España",
"Tipo" : "Usuario individual",
"Apellidos" : "Carrillo Casa",
"Teléfono" : 637567234,
"Ciudad" : "Santander",
"Identificador" : "U-3486",
"Información_creación" : {
"Fecha_creación" : {
"Mes" : 4,
"Día" : 22,
"Año" : 2016
},
"Hora_creación" : {
"Hora" : 15,
"Minutos" : 34,
"Segundos" : 20
}
}
}
]
}
into this:
{
"Nombre_Grupo" : "Amigo invisible",
"Ciudades" : [
{
"Ciudad" : "Madrid",
"Miembros": 30
},
{
"Ciudad" : "Almería",
"Miembros": 10
},
{
"Ciudad" : "Badajoz",
"Miembros": 20
}
]
}
with MongoDB.
I tried this:
db.Grupos_usuarios.aggregate([
{ $group: { _id: "$Nombre_Grupo",total: { $sum: "$amount" } },
$group: { _id: "$Ciudad",total: { $sum: "$amount" } } }
])
but I could not get what I needed.
Could somebody help me understand what I am doing wrong?

The following aggregation gets the output you are looking for.
The $unwind stage deconstructs the Miembros array and outputs one document per element. The first $group stage groups these documents by Miembros.Ciudad and counts the members in each Ciudad. The second $group stage pivots the data, pushing every Ciudad from the previous grouping into a single array. The final $project stage only formats the output.
db.test.aggregate( [
{
$unwind: "$Miembros"
},
{
$group: {
_id: "$Miembros.Ciudad",
total: { $sum: 1 }
}
},
{
$group: {
_id: "Amigo invisible",
Ciudades: { $push: { Ciudad: "$_id", Miembros: "$total"} }
}
},
{
$project: {
Nombre_Grupo: "$_id",
Ciudades: 1,
_id: 0
}
}
] )
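To make the stage-by-stage flow concrete, here is a minimal plain JavaScript sketch that mimics what the pipeline computes on an in-memory array. It does not talk to MongoDB; the helper name `countMembersByCity` and the sample documents are hypothetical and only carry the fields the pipeline reads.

```javascript
// Plain JavaScript emulation of $unwind -> $group -> $group/$push -> $project.
// Hypothetical sample data, not taken from the question.
const grupos = [
  { Identificador: "G-3474", Miembros: [{ Ciudad: "Santander" }, { Ciudad: "Madrid" }] },
  { Identificador: "G-3475", Miembros: [{ Ciudad: "Madrid" }] },
];

function countMembersByCity(docs, groupName) {
  // $unwind: one entry per element of the Miembros array
  const unwound = docs.flatMap((doc) => doc.Miembros);

  // first $group: count the members per Ciudad
  const totals = new Map();
  for (const miembro of unwound) {
    totals.set(miembro.Ciudad, (totals.get(miembro.Ciudad) || 0) + 1);
  }

  // second $group + $project: collect the per-city counts into one array
  const Ciudades = [...totals].map(([Ciudad, Miembros]) => ({ Ciudad, Miembros }));
  return { Nombre_Grupo: groupName, Ciudades };
}

const resultado = countMembersByCity(grupos, "Amigo invisible");
console.log(JSON.stringify(resultado, null, 2));
```

As in the pipeline, the group name is passed in as a constant, so every document ends up in the same result; if each document carried its own Nombre_Grupo field, grouping on that field instead would presumably be the next step.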

Related

Mongo Aggregation does not analyze all documents

I'm trying to get some statistics using Mongo's aggregation Framework but my queries do not seem to work right. So I have a collection of documents, each document having the structure below:
{
"_id" : ObjectId("5e46af306d0f5d63d4de6d0f"),
"homeEmperor" : {
"name" : "Home",
"units" : {
"Mage" : 15,
"Warrior" : 15,
"Swordmaster" : 15,
"Rogue" : 15,
"Warlock" : 15
}
},
"awayEmperor" : {
"name" : "Away",
"units" : {
"Druid" : 15,
"Ranger" : 15,
"Priest" : 15,
"Monk" : 15,
"Dragon" : 15
}
},
"dateCreated" : ISODate("2020-02-06T00:00:00.000Z"),
"winner" : "away",
"battleType" : "mutual",
"turns" : 45,
"_class" : "com.deathstar.Datahouse.domain.mongo.HistoricMongoRecord"
}
So what I'm doing at the moment is using this query to get the number of wins each unit has
db.getCollection('WarHistory').aggregate([
{ $match: { battleType: "mutual" } },
{ $project: {"winners": {$cond: { if: { $eq: [ "$winner", "home" ] }, then: "$homeEmperor.units", else: "$awayEmperor.units"} }} },
{ $project: {"winnerUnits": { $objectToArray: "$winners" }} },
{ $group: { _id: {unitType: "$winnerUnits.k"}} },
{ $unwind: "$_id.unitType" },
{ $group: { _id: {unitType: "$_id.unitType"}, count:{$sum:1} }},
{ $sort : { count : -1 }}
])
and this query to get each unit's participation
db.getCollection('WarHistory').aggregate([
{ $match: { battleType: "mutual" } },
{ $project: { "participants": { $mergeObjects: [ "$homeEmperor.units", "$awayEmperor.units" ] } } },
{ $project: {"participantUnits": { $objectToArray: "$participants" }} },
{ $group: { _id: {unitType: "$participantUnits.k"}} },
{ $unwind: "$_id.unitType" },
{ $group: { _id: {unitType: "$_id.unitType"}, count:{$sum:1} }},
{ $sort : { count : -1 }}
])
The output of both queries is as shown below which is the way I want it:
{
"_id" : {
"unitType" : "Pirate"
},
"count" : 1331.0
}
After that I divide them and so on. I also have a Java service which produces these documents.
The problem is that the queries seem to work fine at first, but after some point they stop counting all documents: I see the "Count" counter of the collection increasing, but the query results remain exactly the same.
I've checked, and the documents keep the same structure throughout.
I also tried the "allowDiskUse: true" option in case the problem was memory capacity (I have 16GB RAM), but it made no difference.
Can anyone please confirm that the queries above can do what I want them to?
Any help would be really appreciated! Thanks for your time!
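One possible reading of the pipelines above, offered as an observation rather than a confirmed diagnosis: the first $group stage uses the whole array of unit names as its _id, so documents with an identical list of units collapse into a single group before anything is counted. The plain JavaScript sketch below emulates that behaviour on hypothetical data; it does not talk to MongoDB.

```javascript
// Plain JavaScript emulation of: $group by the array of unit names,
// then $unwind, then $group with $sum: 1. Hypothetical sample data.
const winnerUnitKeys = [
  ["Mage", "Warrior"], // battle 1
  ["Mage", "Warrior"], // battle 2, same composition as battle 1
  ["Mage", "Rogue"],   // battle 3
];

// First $group: one group per distinct array of keys, so duplicates collapse.
const distinctCompositions = [
  ...new Set(winnerUnitKeys.map((keys) => JSON.stringify(keys))),
].map((serialized) => JSON.parse(serialized));

// $unwind + second $group: count the distinct compositions containing each unit.
const counts = {};
for (const unit of distinctCompositions.flat()) {
  counts[unit] = (counts[unit] || 0) + 1;
}

// Mage won 3 battles here, but is only counted once per distinct composition.
console.log(counts);
```

If per-battle counts are what is intended, unwinding "$winnerUnits" first and then using a single $group on "$winnerUnits.k" might be worth trying; this is a suggestion under the assumption above, not something verified against the real collection.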

How can I count total documents and also grouped counts simultaneously in MongoDB aggregation?

I have a dataset in a MongoDB collection named visitorsSession, like:
{ip : '192.2.1.1', country : 'US', type : 'Visitors', date : '2019-12-15T00:00:00.359Z'},
{ip : '192.3.1.8', country : 'UK', type : 'Visitors', date : '2019-12-15T00:00:00.359Z'},
{ip : '192.5.1.4', country : 'UK', type : 'Visitors', date : '2019-12-15T00:00:00.359Z'},
{ip : '192.8.1.7', country : 'US', type : 'Visitors', date : '2019-12-15T00:00:00.359Z'},
{ip : '192.1.1.3', country : 'US', type : 'Visitors', date : '2019-12-15T00:00:00.359Z'}
I am using this MongoDB aggregation:
[{$match: {
nsp : "/hrm.sbtjapan.com",
creationDate : {
$gte: "2019-12-15T00:00:00.359Z",
$lte: "2019-12-20T23:00:00.359Z"
},
type : "Visitors"
}}, {$group: {
_id : "$country",
totalSessions : {
$sum: 1
}
}}, {$project: {
_id : 0,
country : "$_id",
totalSessions : 1
}}, {$sort: {
country: -1
}}]
Using the above aggregation I am getting results like this:
[{country : 'US',totalSessions : 3},{country : 'UK',totalSessions : 2}]
But I also want the total number of visitors along with the result, like totalVisitors : 5.
How can I do this in a MongoDB aggregation?
You can use the $facet aggregation stage to calculate the total visitors as well as the visitors by country in a single pass:
db.visitorsSession.aggregate( [
{
$match: {
nsp : "/hrm.sbtjapan.com",
creationDate : {
$gte: "2019-12-15T00:00:00.359Z",
$lte: "2019-12-20T23:00:00.359Z"
},
type : "Visitors"
}
},
{
$facet: {
totalVisitors: [
{
$count: "count"
}
],
countrySessions: [
{
$group: {
_id : "$country",
sessions : { $sum: 1 }
}
},
{
$project: {
country: "$_id",
_id: 0,
sessions: 1
}
}
],
}
},
{
$addFields: {
totalVisitors: { $arrayElemAt: [ "$totalVisitors.count" , 0 ] },
}
}
] )
The output:
{
"totalVisitors" : 5,
"countrySessions" : [
{
"sessions" : 2,
"country" : "UK"
},
{
"sessions" : 3,
"country" : "US"
}
]
}
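As a rough illustration of what the two $facet branches compute, here is a plain JavaScript sketch over the five sample documents from the question (IPs quoted, date fields omitted). It does not talk to MongoDB, and the helper name `summarize` is hypothetical.

```javascript
// Plain JavaScript emulation of the two $facet branches on the sample data.
const visitorSessions = [
  { ip: "192.2.1.1", country: "US", type: "Visitors" },
  { ip: "192.3.1.8", country: "UK", type: "Visitors" },
  { ip: "192.5.1.4", country: "UK", type: "Visitors" },
  { ip: "192.8.1.7", country: "US", type: "Visitors" },
  { ip: "192.1.1.3", country: "US", type: "Visitors" },
];

function summarize(docs) {
  // countrySessions branch: $group by country with $sum: 1
  const perCountry = new Map();
  for (const doc of docs) {
    perCountry.set(doc.country, (perCountry.get(doc.country) || 0) + 1);
  }
  return {
    // totalVisitors branch: $count over the same input documents
    totalVisitors: docs.length,
    countrySessions: [...perCountry].map(([country, count]) => ({
      sessions: count,
      country,
    })),
  };
}

const summary = summarize(visitorSessions);
console.log(summary);
```

Both values are derived from the same filtered input, which is the point of $facet: each branch sees the documents that survived the $match stage. The order of the entries in countrySessions is not guaranteed by $group, so a $sort inside that branch may be needed if order matters.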
You could be better off using two queries for this.
To save the two database round trips, the following aggregation can be used, although IMO it is rather verbose (and might be somewhat expensive if the documents are very large) for just counting the documents.
The idea is to put a $group at the top that counts the documents and preserves the originals using $push and $$ROOT, and then to $unwind the resulting array of original documents before any other match/filter stages.
db.collection.aggregate([
{
$group: {
_id: null,
docsCount: {
$sum: 1
},
originals: {
$push: "$$ROOT"
}
}
},
{
$unwind: "$originals"
},
{ $match: "..." }, //and other stages on `originals` which contains the source documents
{
$group: {
_id: "$originals.country",
totalSessions: {
$sum: 1
},
totalVisitors: {
$first: "$docsCount"
}
}
}
]);
Sample output:
[
{
"_id": "UK",
"totalSessions": 2,
"totalVisitors": 5
},
{
"_id": "US",
"totalSessions": 3,
"totalVisitors": 5
}
]

Mongodb splitting aggregation result

I'm currently trying to split an aggregation result into two different arrays using only MongoDB.
My main goal is to create two subsets of users with the same distribution regarding the number of interactions they have made. For this I'm currently making this request:
db.getCollection('Interaction').aggregate([
{ $group : { _id : "$userId", count: { $sum: 1 }}},
{ $sort : { count : -1 }},
{ $group : { _id :{$mod : [_rand() * 2, 2]}, ids : { $push: "$_id"}}}
])
My main issue is that the _rand() function is called only once during the aggregation execution, so all my results end up in a single array.
Also, a random distribution is not so good. Is there a way to use the index of each result?
Edit 1:
After @dnickless's answer I still have an issue with the distribution in the groupBy part. Ideally I would like to do something like this:
db.getCollection('Interaction').aggregate([
{ $group : { _id : "$userId", count: { $sum: 1 }}},
{ $sort : { count : -1 }},
{ $bucket: {
groupBy: { $mod: [ { $indexOfArray : ??? }, 2 ] },
boundaries: [ 0, 1 ],
default: 2,
output: {
"users": { $push: "$_id"}
}
}
}
],
{ allowDiskUse: true })
That could split even and odd indexes into two separate arrays. But I would like to apply $indexOfArray to the current aggregation result.
To give you more context, here is my Interaction object model:
{ "_id" : ObjectId("5af01..."), "name" : "WATCH", "date" : ISODate("2018-05-07T09:32:53.219Z") }
Without the bucket part I have this result :
{ "_id" : "5b1e7f...", "count" : 43.0 }
{ "_id" : "5b1e75...", "count" : 41.0 }
{ "_id" : "5b1e7a...", "count" : 40.0 }
...
I would like my answer to look like this:
[
{ "_id" : 0, "users" : [ "5b1e7f...", "5b1e7a...", ... ] }, // even index results
{ "_id" : 1, "users" : [ "5b1e75...", ... ] } // odd index results
]
My end goal is to split my users into 2 groups with evenly distributed numbers of interactions.
Edit 2 :
Finally found a solution to resolve my problem :
db.getCollection('Interaction').aggregate([
{ $group : { _id : "$userId", count: { $sum: 1 }}},
{ $sort : { count : -1 }},
{ $group : { _id : "whatever" , user : { $push : { _id : "$_id" , count : "$count"}}}},
{ $unwind : { path : "$user" , "includeArrayIndex" : "rank"}},
{ $bucket: {
groupBy: { $mod: [ "$rank" , 2 ] },
boundaries: [ 0, 1 ],
default: 2,
output: {
"users": { $push: "$user._id"}
}
}
}
],
{ allowDiskUse: true })
Probably not the most optimized solution, but it does the job :)
If you have any advice to improve it, I'm still interested.
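The idea behind the Edit 2 pipeline (sort, attach a rank, split on rank parity) can be sketched in plain JavaScript as follows. This runs on a hypothetical in-memory array rather than against MongoDB, and the helper name `splitByRankParity` is made up for illustration.

```javascript
// Plain JavaScript emulation of: sort by count, attach a rank, split on rank % 2.
// Hypothetical per-user counts, shaped like the output of the first $group stage.
const userCounts = [
  { _id: "u1", count: 40 },
  { _id: "u2", count: 43 },
  { _id: "u3", count: 41 },
  { _id: "u4", count: 12 },
];

function splitByRankParity(rows) {
  // $sort: { count: -1 }
  const sorted = [...rows].sort((a, b) => b.count - a.count);
  const halves = [[], []];
  // the array index plays the role of includeArrayIndex: "rank"
  sorted.forEach((row, rank) => {
    halves[rank % 2].push(row._id);
  });
  return halves;
}

const [evenRanked, oddRanked] = splitByRankParity(userCounts);
console.log(evenRanked, oddRanked);
```

One detail worth checking in the pipeline itself: with boundaries [ 0, 1 ] only the value 0 falls inside a bucket, so the odd ranks land in the default bucket and are labelled with _id 2. Using boundaries [ 0, 1, 2 ] should label that bucket 1 instead, if that matters for the consumer.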
I don't fully understand what exactly you are trying to achieve here without seeing some sample input and output. However, have you tried using $bucketAuto? Something like this:
db.getCollection('Interaction').aggregate([
{ $group : { _id : "$userId", count: { $sum: 1 }}},
{ $bucketAuto : {
groupBy : "$count",
buckets : 2, // number of buckets goes here
output : {
ids : { $push : "$_id" }
}
}
}])
If you want to go more sophisticated regarding the distribution aspect you could perhaps try something like this which would throw all even counts into one pot and all odd ones into another:
$bucket: {
groupBy: { $mod: [ "$count", 2 ] },
boundaries: [ 0, 1 ],
default: 2,
output: {
"docs": { $push: "$$ROOT" }
}
}
Depending on the type of your userId field you could perhaps come up with a more "random" distribution.
Lastly, I am not sure what exactly you mean by
"Is there a way to use the index of each result ?"
Perhaps something like $size, $arrayElemAt and/or $indexOfArray...?
Alternatively, you could perhaps $slice the sorted array into two equally sized parts (using $size and $divide by 2), then $reverseArray one of them, and then $zip both arrays up again, which should give a result similar to shuffling a deck of playing cards. After that, you would need to flatten the nested array into a single one again (using $reduce and $concatArrays or so) and then slice it into two parts again, which should be what you are looking for, if I am not too tired by now to think through the statistical parts here.
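The slice, reverse, zip and re-slice idea from the last paragraph can be sketched in plain JavaScript like this. It is only an in-memory illustration of the steps, not an aggregation pipeline; the helper name `interleaveSplit` and the sample ids are hypothetical.

```javascript
// Plain JavaScript sketch of: slice in two, reverse one half, zip, flatten, slice again.
// Input is assumed to be user ids already sorted by interaction count, descending.
const sortedIds = ["a", "b", "c", "d", "e", "f"];

function interleaveSplit(ids) {
  const half = Math.ceil(ids.length / 2);
  const top = ids.slice(0, half);            // $slice: first half
  const bottom = ids.slice(half).reverse();  // $slice + $reverseArray: second half

  // $zip followed by flattening ($reduce + $concatArrays in the pipeline version).
  // For odd-length input the extra element of the first half is kept here.
  const shuffled = [];
  for (let i = 0; i < top.length; i++) {
    shuffled.push(top[i]);
    if (i < bottom.length) shuffled.push(bottom[i]);
  }

  // final $slice into two parts
  const mid = Math.ceil(shuffled.length / 2);
  return [shuffled.slice(0, mid), shuffled.slice(mid)];
}

const [groupA, groupB] = interleaveSplit(sortedIds);
console.log(groupA, groupB);
```

Each group ends up with a mix of high-count and low-count users, which is the point of pairing the top of the ranking with the bottom. Whether the resulting totals are balanced enough depends on the actual distribution of counts.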

Need to sum from array object value in mongodb

I am trying to calculate the total of a value where that value exists, but my query is not working 100%. Can somebody help me solve this problem? Here are my sample documents; I have attached two. Please look at these documents and suggest the best solution.
Document : 1
{
"_id" : 1,
"message_count" : 4,
"messages" : {
"data" : [
{
"id" : "11",
"saleValue": 1000
},
{
"id" : "112",
"saleValue": 1400
},
{
"id" : "22",
},
{
"id" : "234",
"saleValue": 111
}
],
},
"createdTime" : ISODate("2018-03-18T10:18:48.000Z")
}
Document : 2
{
"_id" : 444,
"message_count" : 4,
"messages" : {
"data" : [
{
"id" : "444",
"saleValue" : 2060
},
{
"id" : "444",
},
{
"id" : 234,
"saleValue" : 260
},
{
"id" : "34534",
}
]
},
"createdTime" : ISODate("2018-03-18T03:11:50.000Z")
}
Needed Output:
{
total : 4831
}
My query :
db.getCollection('myCollection').aggregate([
{
"$group": {
"_id": "$Id",
"totalValue": {
$sum: {
$sum: "$messages.data.saleValue"
}
}
}
}
])
So please if possible help me to solve this problem. Thanks in advance
It's not working correctly because it is aggregating all the documents in the collection: you are grouping on the constant "_id": "tempId". You just need to reference the correct key by adding the $ prefix:
db.getCollection('myCollection').aggregate([
{ "$group": {
"_id": "$tempId",
"totalValue": {
"$sum": { "$sum": "$messages.data.saleValue" }
}
} }
])
This is in essence a single-stage version of a two-stage pipeline that first stores the per-document sum in an extra field and then sums that field in the $group stage.
The above works because, from MongoDB 3.2 onwards, $sum is available in both the $project and $group stages; when used in the $project stage, $sum returns the sum of a list of expressions. The expression "$messages.data.saleValue" resolves to the list of values present in the array, for example [1000, 1400, 111] for Document 1, which is then summed:
db.getCollection('myCollection').aggregate([
{ "$project": {
"values": { "$sum": "$messages.data.saleValue" },
"tempId": 1
} },
{ "$group": {
"_id": "$tempId",
"totalValue": { "$sum": "$values" }
} }
])
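As a cross-check of the expected total, here is a plain JavaScript sketch that sums saleValue over the two sample documents from the question, skipping the array elements that have no saleValue. It runs in memory only; the helper name `sumSaleValues` is hypothetical.

```javascript
// Plain JavaScript sketch: sum saleValue across all messages, ignoring missing values.
// The data mirrors the two sample documents from the question (other fields omitted).
const documents = [
  {
    _id: 1,
    messages: {
      data: [
        { id: "11", saleValue: 1000 },
        { id: "112", saleValue: 1400 },
        { id: "22" },
        { id: "234", saleValue: 111 },
      ],
    },
  },
  {
    _id: 444,
    messages: {
      data: [
        { id: "444", saleValue: 2060 },
        { id: "444" },
        { id: 234, saleValue: 260 },
        { id: "34534" },
      ],
    },
  },
];

function sumSaleValues(docs) {
  let total = 0;
  for (const doc of docs) {
    for (const message of doc.messages.data) {
      // like $sum, skip elements where the field is missing or not a number
      if (typeof message.saleValue === "number") total += message.saleValue;
    }
  }
  return total;
}

const total = sumSaleValues(documents);
console.log({ total }); // { total: 4831 }
```

The result matches the "Needed Output" in the question (2511 from the first document plus 2320 from the second).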
You can add a $unwind stage before your $group; that way you deconstruct the data array and can then group properly:
db.myCollection.aggregate([
{
"$unwind": "$messages.data"
},
{
"$group": {
"_id": "tempId",
"totalValue": {
$sum: {
$sum: "$messages.data.saleValue"
}
}
}
}
])
Output for the two sample documents:
{ "_id" : "tempId", "totalValue" : 4831 }
db.getCollection('myCollection').aggregate([
// each pipeline stage must be its own object
{ $unwind: "$messages.data" },
{
$group: {
"_id": "tempId",
"totalValue": { $sum: "$messages.data.saleValue" }
}
}
])
Based on the description in the question above, please try executing the following aggregate query:
db.myCollection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: {
path: '$messages.data'
}
},
// Stage 2
{
$group: {
_id: {
pageId: '$pageId'
},
total: {
$sum: '$messages.data.saleValue'
}
}
},
// Stage 3
{
$project: {
pageId: '$_id.pageId',
total: 1,
_id: 0
}
}
]
);
You can do it without using $group. Grouping changes the shape of the documents, so the other fields then have to be managed and carried through separately. I therefore prefer using $sum and $map, as shown below:
db.getCollection('myCollection').aggregate([
{
$addFields: {
total: {
$sum: {
$map: {
input: "$messages.data",
as: "message",
in: "$$message.saleValue",
},
},
},
},
}
])

MongoDB order by a sum on a subset

I have the following collection:
error_reports
[
{
"_id":{
"$oid":"5184de1261"
},
"date":"29/04/2013",
"errors":[
{
"_id":"10",
"failures":2,
"alerts":1,
},
{
"_id":"11",
"failures":7,
"alerts":4,
}
]
},
{
"_id":{
"$oid":"5184de1262"
},
"date":"30/04/2013",
"errors":[
{
"_id":"15",
"failures":3,
"alerts":2,
},
{
"_id":"16",
"failures":9,
"alerts":1,
}
]
}
]
Is it possible to retrieve the list of documents with failures and alerts sum sorted by failures in descending order? I am new to mongodb, I have been searching for 2 days but I can't figure out what is the proper query...
I tried something like this :
db.error_reports.aggregate(
{ $sort : { failures: -1} },
{ $group:
{ _id: "$_id",
failures: { "$sum": "$errors.failures" }
}
}
);
But it didn't work. I think it is because of the $sum: $errors.failures part: I would like to sum this attribute over every item of the day_hours subcollection, but I don't know how to do this in a query...
You were very close with your attempt. The only thing missing is the $unwind aggregation stage. $unwind outputs one document per element of an array field, each carrying a single element in place of the array. So before you group the failures and alerts, you unwind the errors, like so:
db.error_reports.aggregate([
{ $unwind : '$errors' },
{ $group : {
_id : '$_id',
'failures' : { $sum : '$errors.failures' },
'alerts' : { $sum : '$errors.alerts' }
} },
{ $sort : { 'failures': -1 } }
]);
Which gives you the following result:
{
"result" : [
{
"_id" : ObjectId("5184de1262"),
"failures" : 12,
"alerts" : 3
},
{
"_id" : ObjectId("5184de1261"),
"failures" : 9,
"alerts" : 5
}
],
"ok" : 1
}
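For comparison, the same unwind, group and sort logic can be sketched in plain JavaScript over the two sample reports. This is only an in-memory illustration; the helper name `totalsPerReport` is hypothetical and the ObjectId values are represented as plain strings.

```javascript
// Plain JavaScript emulation of $unwind -> $group (per report) -> $sort by failures.
const errorReports = [
  {
    _id: "5184de1261",
    date: "29/04/2013",
    errors: [
      { _id: "10", failures: 2, alerts: 1 },
      { _id: "11", failures: 7, alerts: 4 },
    ],
  },
  {
    _id: "5184de1262",
    date: "30/04/2013",
    errors: [
      { _id: "15", failures: 3, alerts: 2 },
      { _id: "16", failures: 9, alerts: 1 },
    ],
  },
];

function totalsPerReport(reports) {
  return reports
    .map((report) => ({
      _id: report._id,
      // summing over the array elements is what $unwind + $group/$sum achieves
      failures: report.errors.reduce((sum, e) => sum + e.failures, 0),
      alerts: report.errors.reduce((sum, e) => sum + e.alerts, 0),
    }))
    .sort((a, b) => b.failures - a.failures); // $sort: { failures: -1 }
}

const totals = totalsPerReport(errorReports);
console.log(totals);
```

The sort has to come after the totals are computed, which is why the original attempt, with $sort placed first on a field that did not exist yet, had no effect.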