MongoDB count distinct with multiple group fields

I have a transactions table that is populated with holidays taken by employees.
I need help expressing the following SQL in MongoDB:
select employee, month, year, count(distinct holiday_type) from
transactions group by employee, month, year
I need to use aggregation in MongoDB and wrote the query below, but it gives the wrong result:
db.transactions.aggregate([
{ "$group": {
"_id": {
"Month": { "$month" : "$date" },
"Year": { "$year" : "$date" },
"employee" : "$employee",
"holiday_type" : "$holiday_type"
},
"Count_of_Transactions" : { "$sum" : 1 }
}}
]);
I am confused about how to express count distinct logic in MongoDB. Any suggestions would be helpful.

You are part of the way there, but you need to get the "distinct" values for "holiday_type" first and then $group again:
db.transactions.aggregate([
{ "$group": {
"_id": {
"employee" : "$employee",
"Month": { "$month" : "$date" },
"Year": { "$year" : "$date" },
"holiday_type" : "$holiday_type"
},
}},
{ "$group": {
"_id": {
"employee" : "$_id.employee",
"Month": "$_id.Month",
"Year": "$_id.Year"
},
"count": { "$sum": 1 }
}}
], { "allowDiskUse": true }
);
That is the general process, since "distinct" in SQL is itself a kind of grouping operation. So it takes a double $group operation to get the correct result.
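To see why two grouping passes give a distinct count, here is a minimal sketch in plain JavaScript that emulates the two $group stages on an in-memory array. The sample rows and employee names are made up for illustration, and this is not how the server evaluates the pipeline, only the same logic spelled out.

```javascript
// Hypothetical sample rows standing in for the "transactions" collection.
const transactions = [
  { employee: "alice", date: new Date("2015-01-05T00:00:00Z"), holiday_type: "annual" },
  { employee: "alice", date: new Date("2015-01-12T00:00:00Z"), holiday_type: "annual" },
  { employee: "alice", date: new Date("2015-01-20T00:00:00Z"), holiday_type: "sick" },
  { employee: "bob", date: new Date("2015-01-07T00:00:00Z"), holiday_type: "sick" },
];

// Pass 1 (first $group): key on all four fields so duplicates collapse.
const distinctKeys = new Set(
  transactions.map((t) =>
    JSON.stringify([
      t.employee,
      t.date.getUTCMonth() + 1,
      t.date.getUTCFullYear(),
      t.holiday_type,
    ])
  )
);

// Pass 2 (second $group): drop holiday_type from the key and count what is left.
const counts = new Map();
for (const key of distinctKeys) {
  const [employee, month, year] = JSON.parse(key);
  const outerKey = JSON.stringify([employee, month, year]);
  counts.set(outerKey, (counts.get(outerKey) || 0) + 1);
}

console.log([...counts.entries()]);
```

The two "annual" rows for the same employee and month collapse into one key in the first pass, so the second pass counts holiday types rather than transactions.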

Related

Group by months and get their counts in MongoDB

I have a document with the following structure.
{
"_id" : ObjectId(""),
"review_id" : "1",
"product_id" : "1",
"date" : 1638869377,
"rating" : 5,
"title" : "lorem",
"review" : "lorem",
"updated_at" : ISODate("2021-12-07T07:10:55.732Z"),
"created_at" : ISODate("2021-12-07T05:04:11.750Z")
}
I managed to get the number of user comments by month, but I want to get it separately for each rating.
For example, if the 10th month has 7 comments, how can I find out that 2 of them are 1 star, 3 are 4 stars and 2 are 5 stars?
{
"$group": {
"_id": {
"month": {
"$month": {
"$toDate": "$date"
}
}
},
"reviews": {
"$sum": 1
}
}
}
You can add everything you want to group by inside the _id field. Adding a new field with the rating does the job.
{
"$group": {
"_id": {
"month": {
"$month": {
"$toDate": { "$multiply": [ "$date", 1000 ] }
}
},
"rating": "$rating"
},
"reviews": {
"$sum": 1
}
}
}
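Note: I would also group by year so that each month is separated properly. If you don't, Mongo will group, for example, every January of every year together. Also keep in mind that $toDate interprets a number as milliseconds since the epoch, so a date stored in seconds, as in the sample document, needs to be multiplied by 1000 first.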
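As a rough illustration of the note about the year, the following plain JavaScript sketch counts a few made-up reviews with and without the year in the key. The sample values, the countBy helper and the key format are assumptions for the example; only the first date value comes from the question.

```javascript
// Hypothetical reviews; "date" holds epoch seconds, as in the sample document.
const reviews = [
  { date: 1638869377, rating: 5 },                    // December 2021 (value from the sample)
  { date: Date.UTC(2021, 11, 20) / 1000, rating: 5 }, // December 2021
  { date: Date.UTC(2020, 11, 7) / 1000, rating: 5 },  // December 2020
  { date: Date.UTC(2021, 11, 9) / 1000, rating: 1 },  // December 2021
];

// Count rows per key; keyFn decides which fields make up the group key.
function countBy(rows, keyFn) {
  const counts = {};
  for (const row of rows) {
    const key = keyFn(row);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Seconds must be scaled to milliseconds before building a Date.
const toDate = (seconds) => new Date(seconds * 1000);

// Key on month and rating only: Decembers of different years are merged.
const byMonthOnly = countBy(
  reviews,
  (r) => `${toDate(r.date).getUTCMonth() + 1}/${r.rating}`
);

// Key on year, month and rating: each December stays separate.
const byYearMonth = countBy(reviews, (r) => {
  const d = toDate(r.date);
  return `${d.getUTCFullYear()}-${d.getUTCMonth() + 1}/${r.rating}`;
});

console.log(byMonthOnly);
console.log(byYearMonth);
```

With the month-only key the 5-star review from December 2020 is counted together with the two from December 2021, which is usually not what you want.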
You just want to $group by both fields, the month and the rating, like so:
db.collection.aggregate([
{
"$group": {
"_id": {
"month": {
"$month": {
"$toDate": { "$multiply": [ "$date", 1000 ] }
}
},
rating: "$rating"
},
"reviews": {
"$sum": 1
}
}
}
])
Or my recommendation just to add a little more structuring to the output is to use this pipeline.

Aggregating by field value on MongoDB

I have a collection composed of documents similar to one below:
{
"_id" : ObjectId("5dc916a72440b14b3f0ec096"),
"date" : ISODate("2019-11-11T11:07:03.968+03:00"),
"actions" : [
{
"type" : "Type1",
"action" : true
},
{
"type" : "Type2",
"action" : true
},
{
"type" : "Type3",
"action" : false
}
]
}
I am trying to count all the action types based on the boolean value of the actions.action property.
This is how far I have come:
db.Actions.aggregate(
{
$group: {
_id: {
year: { $year: "$date" },
month: { $month: "$date" },
day: { $dayOfMonth: "$date" },
},
count: { $sum: 1 }
}
}
);
As you can see this only gives me the count of the documents in the collection grouped by the action date.
What I need is something like this:
{
"_id" : {
"year" : 2019,
"month" : 10,
"day" : 13
},
"Type1": 300,
"Type2": 200,
"Type3": 120,
"count" : 305
}
Is this possible with a query, or should I go in the direction of creating a cursor and aggregating the values with it?
db.Actions.aggregate([
// Unwind to de-normalize the array
{ "$unwind": "$actions" },
// Group on both day and "type"
{ "$group": {
"_id": {
"date": {
"$toDate": {
"$subtract": [
{ "$toLong": "$date" },
{ "$mod": [{ "$toLong": { "$toDate": "$date" } }, 1000 * 60 * 60 * 24 ] }
]
}
},
"type": "$actions.type"
},
"total": { "$sum": { "$toLong": "$actions.action" } }
}},
// Roll-up the grouping to just by "day"
{ "$group": {
"_id": "$_id.date",
"data": { "$push": { "k": "$_id.type", "v": "$total" } }
}},
// Convert to key/value output
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{ "_id": "$_id", "count": { "$sum": "$data.v" } },
{ "$arrayToObject": "$data" }
]
}
}}
])
To summarize:
The $unwind is needed simply because you want to "group" on a value which is inside an array of a document. This "de-normalizes" the array, essentially turning each array element into a new document that carries the same "parent" properties as the document in which the array resides. Put simply, you get a "copy" of the containing document for every array member.
The next $group uses a "date math" approach to round to a single day. This is a bit prettier than operators like $year and $month, and it actually returns a Date object, which your client language of choice will understand.
This is a compound grouping key, the other part being the type field from the array of actions. And since you only want true results to count, we apply $toLong again to translate the Boolean into a numeric value to $sum (which amounts to a "count" when the value is 0 or 1). In older releases you could also do this using $cond, but the simple type conversion is a lot easier to read for intent.
The rest is about translating to the expected "key/value" output of the question. Really, you got the desired result in the very first $group operation, but to be "key/value" you need to put all those results into an array (by "date", of course) using $push, and then convert that array into the root document using the $arrayToObject operator.
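If it helps to see the steps outside the pipeline syntax, here is a minimal plain JavaScript sketch of the same idea on an in-memory array. The two sample documents are made up, and the two $group stages are folded into one loop for brevity, so treat it as an illustration of the logic rather than a translation of the pipeline.

```javascript
const DAY_MS = 1000 * 60 * 60 * 24;

// Hypothetical documents shaped like the one in the question.
const docs = [
  {
    date: new Date("2019-11-11T08:07:03Z"),
    actions: [
      { type: "Type1", action: true },
      { type: "Type2", action: true },
      { type: "Type3", action: false },
    ],
  },
  {
    date: new Date("2019-11-11T20:00:00Z"),
    actions: [
      { type: "Type1", action: true },
      { type: "Type2", action: false },
    ],
  },
];

// $unwind: one row per array element, carrying the parent's date along.
const rows = docs.flatMap((d) => d.actions.map((a) => ({ date: d.date, ...a })));

// $group on (day, type): truncate to midnight UTC and sum the Boolean as 0 or 1.
const perDay = new Map();
for (const row of rows) {
  const ms = row.date.getTime();
  const day = ms - (ms % DAY_MS);
  const types = perDay.get(day) || {};
  types[row.type] = (types[row.type] || 0) + Number(row.action);
  perDay.set(day, types);
}

// $replaceRoot: lift the per-type totals to top-level keys and add an overall count.
const result = [...perDay].map(([day, types]) => ({
  _id: new Date(day),
  count: Object.values(types).reduce((a, b) => a + b, 0),
  ...types,
}));

console.log(result);
```

Note that "count" here is the sum of the per-type totals, the same as in the pipeline, not the number of source documents.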

MongoDB aggregate - average on specific values in array of documents

I'm currently working on a database with the following structure:
{"_id" : ObjectId("1abc2"),
"startdatetime" : ISODate("2016-09-11T18:00:37Z"),
"diveValues" : [
{
"temp" : 15.269,
"depth" : 0.0,
},
{
"temp" : 14.779257384,
"depth" : 1.0,
},
{
"temp" : 14.3940253165,
"depth" : 2.0,
},
{
"temp" : 13.9225795455,
"depth" : 3.0,
},
{
"temp" : 13.8214431818,
"depth" : 4.0,
},
{
"temp" : 13.6899553571,
"depth" : 5.0,
}
]}
The database has information about depth in metres in water, and the temperature at the given depth. This is stored in the "diveValues" array. I have been successful in averaging over all depths between two dates, both as a monthly and as a daily average. What I'm having a serious issue with is getting the average between two depths, say between 1 and 4 metres, for every month of the last 6 months.
Here is an example of average temperature for each month from January to June, for all depths:
db.collection.aggregate(
[
{$unwind:"$diveValues"},
{$match:
{'startdatetime':
{$gt:new ISODate("2016-01-10T06:00:29Z"),
$lt:new ISODate("2016-06-10T06:00:29Z")}
}
},
{$group:
{_id:
{ year: { $year: "$startdatetime" },
month: { $month: "$startdatetime" }},
avgTemp: { $avg: "$diveValues.temp" }}
},
{$sort:{_id:1}}
]
)
Resulting in:
{ "_id" : { "year" : 2016, "month" : 1 }, "avgTemp" : 7.575706502958313 }
{ "_id" : { "year" : 2016, "month" : 3 }, "avgTemp" : 6.85037457740135 }
{ "_id" : { "year" : 2016, "month" : 4 }, "avgTemp" : 7.215702831902588 }
{ "_id" : { "year" : 2016, "month" : 5 }, "avgTemp" : 9.153453683614638 }
{ "_id" : { "year" : 2016, "month" : 6 }, "avgTemp" : 11.497953009390237 }
Now, I can not seem to figure out how to get average temperature between 1 and 4 metres for the same period.
I have been trying to group the values by wanted depths, but have not managed it - more often than not ending up with bad syntax. Also, if I'm not wrong, the $match pipeline would return all depths as long as the dive has values for 1 and 4 metres, so that will not work.
With the find() tool I am using $slice to return the values I intend from the array - but have not been successful along with the aggregate() function.
Is there a way to solve this? Thanks in advance, much appreciated!
You should place your $match stage before $unwind to optimize the aggregation. Running $unwind on the whole collection can cause performance problems, since it produces a copy of each document per array entry; that uses more memory (pipeline stages are subject to a memory limit, 100 MB of RAM per stage in current releases unless allowDiskUse is enabled) and takes time both to produce the flattened documents and to process them. Hence it is better to limit the number of documents entering the pipeline before they are flattened.
db.collection.aggregate([
{
"$match": {
"startdatetime": {
"$gt": new ISODate("2016-01-10T06:00:29Z"),
"$lt": new ISODate("2016-06-10T06:00:29Z")
},
"diveValues.depth": { "$gte": 1, "$lte": 4 }
}
},
{ "$unwind": "$diveValues" },
{ "$match": { "diveValues.depth": { "$gte": 1, "$lte": 4 } } },
{
"$group": {
"_id": {
"year": { "$year": "$startdatetime" },
"month": { "$month": "$startdatetime" }
},
"avgTemp": { "$avg": "$diveValues.temp" }
}
}
])
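The reason the depth condition appears twice can be sketched in plain JavaScript. The dives below are invented, and the first filter uses a per-element reading of the condition; the real query without $elemMatch can match more loosely, since different array elements may satisfy each bound. Either way the point is the same: a document-level match keeps the whole array.

```javascript
// Hypothetical dives; only the shape follows the question.
const dives = [
  {
    name: "shallow and deep",
    diveValues: [
      { depth: 0, temp: 15.2 },
      { depth: 2, temp: 14.4 },
      { depth: 5, temp: 13.7 },
    ],
  },
  { name: "surface only", diveValues: [{ depth: 0, temp: 15.0 }] },
];

const inRange = (v) => v.depth >= 1 && v.depth <= 4;

// First $match: keeps or drops whole documents, never individual array elements.
const matchedDocs = dives.filter((d) => d.diveValues.some(inRange));

// $unwind: every element of the surviving documents comes through, in range or not.
const unwound = matchedDocs.flatMap((d) => d.diveValues);

// Second $match: only now are the out-of-range elements removed.
const filtered = unwound.filter(inRange);

console.log(matchedDocs.length, unwound.length, filtered.length);
```

The first filter only reduces how much data reaches $unwind; the second one is what actually restricts the average to the chosen depths.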
If you want the results to contain the average temperature for all depths as well as for the 1-4 m depth range, you can run the following pipeline, which uses the $cond ternary operator to feed $avg only the temperatures within the depth range; $avg ignores the null values produced for the other depths:
db.collection.aggregate([
{
"$match": {
"startdatetime": {
"$gt": new ISODate("2016-01-10T06:00:29Z"),
"$lt": new ISODate("2016-06-10T06:00:29Z")
}
}
},
{ "$unwind": "$diveValues" },
{
"$group": {
"_id": {
"year": { "$year": "$startdatetime" },
"month": { "$month": "$startdatetime" }
},
"avgTemp": { "$avg": "$diveValues.temp" },
"avgTempDepth1-4": {
"$avg": {
"$cond": [
{
"$and": [
{ "$gte": [ "$diveValues.depth", 1 ] },
{ "$lte": [ "$diveValues.depth", 4 ] }
]
},
"$diveValues.temp",
null
]
}
}
}
}
])
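The trick with $cond and null can be illustrated with a small plain JavaScript sketch. The readings are made up, and the avg helper is a stand-in written for this example that skips non-numeric values, which is the behaviour the pipeline relies on.

```javascript
// Hypothetical unwound readings for a single month.
const readings = [
  { depth: 0, temp: 15 },
  { depth: 1, temp: 14 },
  { depth: 2, temp: 13 },
  { depth: 5, temp: 10 },
];

// Stand-in for $avg: non-numeric inputs such as null are skipped, not treated as 0.
function avg(values) {
  const nums = values.filter((v) => typeof v === "number");
  return nums.length ? nums.reduce((a, b) => a + b, 0) / nums.length : null;
}

// Plain average over every depth.
const avgTemp = avg(readings.map((r) => r.temp));

// $cond emulation: pass the temperature through when in range, null otherwise.
const avgTempDepth1to4 = avg(
  readings.map((r) => (r.depth >= 1 && r.depth <= 4 ? r.temp : null))
);

console.log(avgTemp, avgTempDepth1to4);
```

If the out-of-range readings were replaced with 0 instead of null they would drag the average down, which is why null is used as the "else" branch.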
First of all, the date $match stage should be placed at the beginning of the pipeline so that indexes can be used.
Now, to the question, you just need to filter the depth interval like you did with the dates:
db.col.aggregate([
{"$match": {
'startdatetime': {
"$gt": new ISODate("2016-01-10T06:00:29Z"),
"$lt": new ISODate("2016-11-10T06:00:29Z")
}
}},
{"$unwind": "$diveValues"},
{"$match": {
"diveValues.depth": {
"$gte": 1.0,
"$lte": 4.0
}
}},
{"$group": {
"_id": {
"year": {"$year": "$startdatetime" },
"month": {"$month": "$startdatetime" }
},
"avgTemp": { "$avg": "$diveValues.temp" }}
}
])
This will give you the average only for the chosen depth interval.

Is it possible to compare two months' data in a single collection in MongoDB?

I have a collection with 10 000 000 call records.
I want to compare the call usage of one month with that of the next month.
Example documents from the collection:
{
"_id" : ObjectId("54ed74d76c68d23af73e230a"),
"msisdn" : "9818441000",
"callType" : "ISD",
"duration" : 10.109999656677246,
"charges" : 200,
"traffic" : "Voice",
"Date" : ISODate("2014-01-05T19:51:01.928Z")
}
{
"_id" : ObjectId("54ed74d76c68d23af73e230b"),
"msisdn" : "9818843796",
"callType" : "Local",
"duration" : 1,
"charges" : 150,
"traffic" : "Voice",
"Date" : ISODate("2014-02-04T14:25:35.861Z")
}
Duration is my usage.
I want to compare the duration at ISODate("2014-01-04T14:25:35.861Z") with that of the next month, ISODate("2014-02-04T14:25:35.861Z"), for all records.
All msisdn numbers are the same in both months.
The obvious approach here is to aggregate the data, which the MongoDB aggregation framework is well suited to. The following uses the general-purpose fields I can see in the documents. And yes, we generally talk in terms of discrete months rather than some value assumed to be one month from the current point in time:
db.collection.aggregate([
{ "$match": {
"msisdn": "9818441000",
"Date": {
"$gte": new Date("2014-01-01"),
"$lt": new Date("2014-03-01")
}
}},
{ "$group": {
"_id": {
"year": { "$year": "$Date" },
"month": { "$month": "$Date" },
"callType": "$callType",
"traffic": "$traffic"
},
"charges": { "$sum": "$charges" },
"duration": { "$sum": "$duration" }
}},
{ "$sort": { "_id": 1 } }
])
The intent there is to produce one record per month in the response (for each combination of callType and traffic), with each month represented as a distinct value.
You can basically take those two results and compare the difference between them in client code.
Or you can do this over all "MSISDN" values with months grouped into pairs within the document:
db.collection.aggregate([
{ "$match": {
"Date": {
"$gte": new Date("2014-01-01"),
"$lt": new Date("2014-03-01")
}
}},
{ "$group": {
"_id": {
"year": { "$year": "$Date" },
"month": { "$month": "$Date" },
"msisdn": "$msisdn",
"callType": "$callType",
"traffic": "$traffic"
},
"charges": { "$sum": "$charges" },
"duration": { "$sum": "$duration" }
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": {
"msisdn": "$_id.msisdn",
"callType": "$_id.callType",
"traffic": "$_id.traffic"
},
"data": { "$push": {
"year": "$_id.year",
"month": "$_id.month",
"charges": "$charges",
"duration": "$duration"
}}
}}
])
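For the client-side comparison, a minimal sketch in plain JavaScript might look like the following. It assumes result documents shaped like the output of the second pipeline; the numbers are invented, and the durationChange field name is just a label chosen for this example.

```javascript
// Hypothetical pipeline output: one document per msisdn/callType/traffic,
// with one entry per month pushed into "data".
const grouped = [
  {
    _id: { msisdn: "9818441000", callType: "ISD", traffic: "Voice" },
    data: [
      { year: 2014, month: 2, charges: 150, duration: 4 },
      { year: 2014, month: 1, charges: 200, duration: 10 },
    ],
  },
];

const changes = grouped.map((doc) => {
  // Sort a copy by year and month so the order of "data" does not matter.
  const months = [...doc.data].sort(
    (a, b) => a.year - b.year || a.month - b.month
  );
  const [previous, next] = months;
  return {
    ...doc._id,
    // null when a number has usage in only one of the two months.
    durationChange: next ? next.duration - previous.duration : null,
  };
});

console.log(changes);
```

A negative durationChange means usage dropped from the first month to the second.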

Aggregate Query in Mongodb returns specific field

Document Sample:
{
"_id" : ObjectId("53329dfgg43771e49538b4567"),
"u" : {
"_id" : ObjectId("532a435gs4c771edb168c1bd7"),
"n" : "Salman khan",
"e" : "salman#gmail.com"
},
"ps" : 0,
"os" : 1,
"rs" : 0,
"cd" : 1395685800,
"ud" : 0
}
Query:
db.collectiontmp.aggregate([
{$match: {os:1}},
{$project : { name:{$toUpper:"$u.e"} , _id:0 } },
{$group: { _id: "$u._id",total: {$sum:1} }},
{$sort: {total: -1}}, { $limit: 10 }
]);
I need following things from the above query:
Group by u._id
Returns total number of records and email from the record, as shown below:
{
"result":
[
{
"email": "",
"total": ""
},
{
"email": "",
"total": ""
}
],
"ok":
1
}
The first thing you are doing wrong here is not understanding how $project is intended to work. Pipeline stages such as $project and $group will only output the fields that are "explicitly" identified. So only the fields you say to output will be available to the following pipeline stages.
Specifically here you "project" only part of the "u" field in your document and you therefore removed the other data from being available. The only present field here now is "name", which is the one you "projected".
Perhaps it was really your intention to do something like this:
db.collectiontmp.aggregate([
{ "$group": {
"_id": {
"_id": "$u._id",
"email": { "$toUpper": "$u.e" }
},
"total": { "$sum": 1 },
}},
{ "$project": {
"_id": 0,
"email": "$_id.email",
"total": 1
}},
{ "$sort": { "total": -1 } },
{ "$limit": 10 }
])
Or even:
db.collectiontmp.aggregate([
{ "$group": {
"_id": "$u._id",
"email": { "$first": { "$toUpper": "$u.e" } },
"total": { "$sum": 1 },
}},
{ "$project": {
"_id": 0,
"email": 1,
"total": 1
}},
{ "$sort": { "total": -1 } },
{ "$limit": 10 }
])
That gives you the sort of output you are looking for.
Remember that since this is a "pipeline", only the "output" of a prior stage is available to the "next" stage. There is no "global" concept of the document, as this is not a declarative statement such as in SQL, but a "pipeline".
So think of the Unix pipe "|" operator, or look it up if it is unfamiliar. Then your thinking will fall into place.
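To make the pipeline idea concrete, here is a small plain JavaScript sketch that chains array operations the way stages are chained. The documents and addresses are made up, and the groupCount helper is written for this example; in MongoDB a missing field in the group key shows up as null rather than undefined.

```javascript
// Hypothetical documents shaped like the sample in the question.
const docs = [
  { u: { _id: "a", e: "x@example.com" }, os: 1 },
  { u: { _id: "a", e: "x@example.com" }, os: 1 },
  { u: { _id: "b", e: "y@example.com" }, os: 1 },
];

// Count rows per key, like a $group with { $sum: 1 }.
function groupCount(rows, keyFn) {
  const counts = new Map();
  for (const row of rows) {
    const key = keyFn(row);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// $project emulation: only "name" survives, "u" is no longer there.
const projected = docs.map((d) => ({ name: d.u.e.toUpperCase() }));

// Grouping on u._id after the projection: the field is gone, so every row
// lands in the same group.
const broken = groupCount(projected, (r) => (r.u ? r.u._id : undefined));

// Grouping on the original documents works as intended.
const fixed = groupCount(docs, (r) => r.u._id);

console.log(broken.size, fixed.size);
```

The "broken" variant ends up with a single group holding all three rows, which is the same effect the original query ran into.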