Is it possible to compare two months of data in a single collection in MongoDB?

I have a collection with 10,000,000 call records.
I want to compare call usage in one month with usage in the following month.
Example documents from the collection:
{
"_id" : ObjectId("54ed74d76c68d23af73e230a"),
"msisdn" : "9818441000",
"callType" : "ISD",
"duration" : 10.109999656677246,
"charges" : 200,
"traffic" : "Voice",
"Date" : ISODate("2014-01-05T19:51:01.928Z")
}
{
"_id" : ObjectId("54ed74d76c68d23af73e230b"),
"msisdn" : "9818843796",
"callType" : "Local",
"duration" : 1,
"charges" : 150,
"traffic" : "Voice",
"Date" : ISODate("2014-02-04T14:25:35.861Z")
}
Duration is my usage.
For all records, I want to compare the duration at ISODate("2014-01-04T14:25:35.861Z") with the duration in the next month, at ISODate("2014-02-04T14:25:35.861Z").
All msisdn numbers are the same in both months.

The obvious approach here is to aggregate the data, which is what the MongoDB aggregation framework is well suited to. Taking the general fields I see present here, and working in terms of discrete calendar months rather than a span of one month from some arbitrary point in time:
db.collection.aggregate([
{ "$match": {
"msisdn": "9818441000",
"Date": {
"$gte": new Date("2014-01-01"),
"$lt": new Date("2014-03-01")
}
}},
{ "$group": {
"_id": {
"year": { "$year": "$Date" },
"month": { "$month": "$Date" },
"callType": "$callType",
"traffic": "$traffic"
},
"charges": { "$sum": "$charges" },
"duration": { "$sum": "$duration" }
}},
{ "$sort": { "_id": 1 } }
])
The intent there is to produce one result per month (for each callType and traffic combination), so each month appears as a distinct value.
You can then take those results and compare the difference between them in client code.
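A minimal client-side sketch of that comparison in plain Python, assuming the pipeline output has already been fetched into a list of dicts (for example via PyMongo's `collection.aggregate(...)`). The helper `month_over_month` and the sample figures are hypothetical; only the document shape is taken from the pipeline above.

```python
from typing import Dict, List, Tuple


def month_over_month(results: List[dict]) -> Dict[Tuple[str, str], dict]:
    """Compare the last two months for each (callType, traffic) pair.

    `results` mimics the first pipeline's output: each item has an `_id`
    with year/month/callType/traffic plus summed `charges` and `duration`.
    """
    # Bucket the monthly totals by (callType, traffic), keyed by (year, month).
    buckets: Dict[Tuple[str, str], Dict[Tuple[int, int], dict]] = {}
    for doc in results:
        key = (doc["_id"]["callType"], doc["_id"]["traffic"])
        month = (doc["_id"]["year"], doc["_id"]["month"])
        buckets.setdefault(key, {})[month] = doc

    diffs = {}
    for key, months in buckets.items():
        ordered = sorted(months)  # chronological (year, month) order
        if len(ordered) < 2:
            continue  # only one month present, nothing to compare against
        prev, curr = months[ordered[-2]], months[ordered[-1]]
        diffs[key] = {
            "duration_change": curr["duration"] - prev["duration"],
            "charges_change": curr["charges"] - prev["charges"],
        }
    return diffs
```

Groups that only appear in one of the two months are skipped rather than treated as zero usage; that is a design choice you may want to change.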
Or you can do this over all "msisdn" values, with the months grouped into pairs within each result document:
db.collection.aggregate([
{ "$match": {
"Date": {
"$gte": new Date("2014-01-01"),
"$lt": new Date("2014-03-01")
}
}},
{ "$group": {
"_id": {
"year": { "$year": "$Date" },
"month": { "$month": "$Date" },
"msisdn": "$msisdn",
"callType": "$callType",
"traffic": "$traffic"
},
"charges": { "$sum": "$charges" },
"duration": { "$sum": "$duration" }
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": {
"msisdn": "$_id.msisdn",
"callType": "$_id.callType",
"traffic": "$_id.traffic"
},
"data": { "$push": {
"year": "$_id.year",
"month": "$_id.month",
"charges": "$charges",
"duration": "$duration"
}}
}}
])
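With that output, each document already carries its months in `data`, so the comparison becomes a walk over consecutive entries. A plain-Python sketch, assuming the results were fetched into a list of dicts; `duration_changes` is a hypothetical helper, and it re-sorts `data` itself rather than relying on the order `$push` produced.

```python
from typing import Dict, List, Tuple


def duration_changes(grouped_docs: List[dict]) -> Dict[Tuple[str, str, str], List[float]]:
    """Month-over-month duration change for each (msisdn, callType, traffic).

    Each input document mimics the second pipeline's output:
    _id = {msisdn, callType, traffic}, data = [{year, month, charges, duration}, ...]
    """
    changes = {}
    for doc in grouped_docs:
        key = (doc["_id"]["msisdn"], doc["_id"]["callType"], doc["_id"]["traffic"])
        # Sort defensively instead of relying on the order $push produced.
        months = sorted(doc["data"], key=lambda m: (m["year"], m["month"]))
        # Pair each month with the one after it and take the difference.
        changes[key] = [b["duration"] - a["duration"] for a, b in zip(months, months[1:])]
    return changes
```

A group with only one month yields an empty list, which makes missing months easy to spot.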

Related

Group by months and get their counts in MongoDB

I have a document with the following structure.
{
"_id" : ObjectId(""),
"review_id" : "1",
"product_id" : "1",
"date" : 1638869377,
"rating" : 5,
"title" : "lorem",
"review" : "lorem",
"updated_at" : ISODate("2021-12-07T07:10:55.732Z"),
"created_at" : ISODate("2021-12-07T05:04:11.750Z")
}
I managed to get the number of user comments by month. But I want to get it separately for each rating.
For example, in October there are 7 comments; how can I get a breakdown showing that 2 of them are 1 star, 3 are 4 stars, and 2 are 5 stars?
{
"$group": {
"_id": {
"month": {
"$month": {
"$toDate": "$date"
}
}
},
"reviews": {
"$sum": 1
}
}
}
You could add everything you want to group by inside the _id field. Just adding a new field with the rating would do the job.
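{
"$group": {
"_id": {
"month": {
"$month": {
"$toDate": { "$multiply": ["$date", 1000] }
}
},
"rating": "$rating"
},
"reviews": {
"$sum": 1
}
}
}
Note: I would also group by year, so that each month is separated properly. If you don't, Mongo will group every January of every year together, for example. Also, `date` holds Unix time in seconds, while `$toDate` treats a number as milliseconds since the epoch, so it is multiplied by 1000 first.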
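To check the expected counts on a small sample outside the database, here is a plain-Python sketch of the same grouping with year included. It assumes, as the sample document suggests, that `date` is Unix time in seconds; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def reviews_per_month_and_rating(reviews: Iterable[dict]) -> Counter:
    """Count reviews per (year, month, rating), like a $group whose _id holds
    $year, $month and rating. Assumes `date` is Unix time in seconds."""
    counts: Counter = Counter()
    for review in reviews:
        # fromtimestamp() takes seconds; use UTC to match MongoDB's date operators.
        when = datetime.fromtimestamp(review["date"], tz=timezone.utc)
        counts[(when.year, when.month, review["rating"])] += 1
    return counts
```

Because the key includes the year, December 2020 and December 2021 land in different buckets.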
You just want to $group by both fields, the month and the rating, like so:
db.collection.aggregate([
{
"$group": {
"_id": {
"month": {
"$month": {
// "date" is Unix time in seconds; $toDate expects milliseconds
"$toDate": { "$multiply": ["$date", 1000] }
}
},
rating: "$rating"
},
"reviews": {
"$sum": 1
}
}
}
])
Or, to add a little more structure to the output, my recommendation is to use this pipeline.

How can I alter this query to return the average over all hours in the DB?

db.temperature.aggregate([
{ "$match": {
"$and": [
{ "date": { "$gte": ISODate("2017-10-12T22:00:00Z") }},
{ "date": { "$lt": ISODate("2017-10-12T23:00:00Z") }}
]
}},
{ "$group": {
"_id": { "$hour": "$date" },
"temperature": {
"$avg": "$temperature"
}
}}
])
The data looks like this:
{
"_id" : ObjectId("5df25dd648bfdfee3906e0cd"),
"date" : ISODate("2017-10-12T22:00:00Z"),
"power" : 39
}
There is a record for every minute, and I am trying to get the average for every hour in the database. This query only returns the average for one specific hour.
You can simply remove the $match part of your query:
db.temperature.aggregate([
{ "$group": {
"_id": { "$hour": "$date" },
"temperature": {
"$avg": "$temperature"
}
}}
])
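Grouping on `{ "$hour": "$date" }` alone puts hour 22 of every day into the same bucket, so the result is an average per hour of day across the whole database; one average per calendar hour would also need the date in the group key. The plain-Python sketch below contrasts the two bucketings. The `(datetime, value)` tuples stand in for the documents, and which field holds the reading is an assumption (the sample shows `power`, while the query averages `$temperature`).

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Hashable, Iterable, List, Tuple


def hourly_averages(readings: Iterable[Tuple[datetime, float]],
                    per_calendar_hour: bool = False) -> Dict[Hashable, float]:
    """Average readings per hour.

    per_calendar_hour=False mirrors grouping on {"$hour": "$date"}: the same
    hour of every day shares one bucket. True keys on (date, hour) instead.
    """
    buckets: Dict[Hashable, List[float]] = defaultdict(list)
    for when, value in readings:
        key = (when.date(), when.hour) if per_calendar_hour else when.hour
        buckets[key].append(value)
    return {key: mean(values) for key, values in buckets.items()}
```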

MongoDB - Aggregate by distinct field then count per day

I have a MongoDB database that collects device data.
An example document:
{
"_id" : ObjectId("5c125a185dea1b0252c5352"),
"time" : ISODate("2018-12-13T15:09:42.536Z"),
"mac" : "10:06:21:3e:0a:ff",
}
The goal would be to count the unique mac values per day, from the first document in the db to the last document in the db.
I've been playing around and came to the conclusion that I would need multiple $group as well as $project stages in my aggregation.
This is what I tried - not sure if it's in the right direction or not or just completely messed up.
pipeline = [
{"$project": {
"_id": 1,
"mac": 1,
"day": {
"$dayOfMonth":"$time"
},
"month": {
"$month":"$time"
},
"year": {
"$year":"$time"
}
}
},
{
"$project": {
"_id": 1,
"mac": 1,
"time": {
"$concat": [{
"$substr":["$year", 0, 4]
},
"-", {
"$substr": ["$month", 0, 2]
},
"-",
{
"$substr":["$day", 0, 2]
}]
}
}
},
{
"$group": {
"_id": {
"time": "$time",
"mac": "$mac"
}
},
"$group": {
"_id": "$_id.time",
"count":{"$sum": 1},
}
}
]
data = list(collection.aggregate(pipeline, allowDiskUse=True))
The output doesn't look like it did any aggregation:
[{"_id": null, "count": 751050}]
I'm using PyMongo as my driver, with MongoDB 4.
Ideally it should just show the date and count (e.g. { "_id" : "2018-12-13", "count" : 2 }).
I would love some feedback and advice.
Thanks in advance.
I prefer to minimize the number of stages, and especially to avoid unnecessary $group stages. So I would do it with the following pipeline:
pipeline = [
{ '$group' : {
'_id': { '$dateToString': { 'format': "%Y-%m-%d", 'date': "$time" } },
'macs':{ '$addToSet': '$mac' }
} },
{ '$addFields': { 'macs': { '$size': '$macs' } } }
]
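A plain-Python sketch of what that `$addToSet` + `$size` pipeline computes: one set of MAC addresses per `"%Y-%m-%d"` day key, counted at the end. The input dicts mimic the sample document, the function name is hypothetical, and treating naive datetimes as UTC is an assumption that matches `$dateToString`'s default time zone.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set


def unique_macs_per_day(docs: Iterable[dict]) -> Dict[str, int]:
    """Count distinct `mac` values per "%Y-%m-%d" day, like $group with
    $dateToString + $addToSet followed by $size. Naive datetimes = UTC."""
    macs_by_day: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        when: datetime = doc["time"]
        day = when.strftime("%Y-%m-%d")  # same format string as the pipeline
        macs_by_day[day].add(doc["mac"])  # a set stores each MAC once
    return {day: len(macs) for day, macs in sorted(macs_by_day.items())}
```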
There's an operator called "$dateToString", which would solve most of your problems.
Edit: I didn't read the question carefully; thank you, @Asya Kamsky, for pointing that out. Here's the new answer:
pipeline = [
{
"$group": {
"_id": {
"date": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$time"
}
},
"mac": "$mac"
}
}
},
{
"$group": {
"_id": "$_id.date",
"count": {
"$sum": 1
}
}
}
]
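Why the question's pipeline returned a single `{"_id": null, ...}` document: both `"$group"` keys sit inside one Python dict literal, and Python keeps only the last value for a duplicated key. Only the second `$group` ran, and since `$_id.time` does not exist on the projected documents, every document fell into one null group. A tiny demonstration of that dict behavior, using the question's own stage contents:

```python
# One dict containing the key "$group" twice: Python keeps only the last value.
broken_stage = {
    "$group": {"_id": {"time": "$time", "mac": "$mac"}},
    "$group": {"_id": "$_id.time", "count": {"$sum": 1}},
}

# Correct form: every stage is a separate dict in the pipeline list.
fixed_stages = [
    {"$group": {"_id": {"time": "$time", "mac": "$mac"}}},
    {"$group": {"_id": "$_id.time", "count": {"$sum": 1}}},
]

print(len(broken_stage))              # 1
print(broken_stage["$group"]["_id"])  # $_id.time
print(len(fixed_stages))              # 2
```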
from bson.son import SON

pipeline = [
{
"$project": {
"_id": 1,
"mac": 1,
"time": { "$dateToString": { "format": "%Y-%m-%d", "date": "$time", "timezone": "Africa/Johannesburg" } }
}
},
{
"$group": {
"_id": {
"time": "$time",
"mac": "$mac"
}
}
},
{
"$group": {
"_id": "$_id.time",
"count": { "$sum": 1 }
}
},
{ "$sort": SON([("_id", -1)]) }
]
Does exactly what it should do.
Thanks. :)

How to aggregate in MongoDB

I have a collection called user.monthly, in which we store clicks per day as 'day' : number of clicks.
Here are two samples for different dates.
For January:
{
name : "devid",
date : ISODate("2014-01-21T11:32:42.392Z"),
daily: {'1':12,'9':13,'30':13}
}
For February:
{
name : "devid",
date : ISODate("2014-02-21T11:32:42.392Z"),
daily: {'3':12,'12':13,'25':13}
}
How can I aggregate this and get total clicks for January and February ?
Please help me to resolve my problem.
Your current schema is not helping you here, as the "daily" field (which holds a click count per key, here the day of the month) is represented as a sub-document, which means that you need to explicitly name the path to each field in order to do something with it.
A better approach would be to put this information in an array:
{
"name" : "devid",
"date" : ISODate("2014-02-21T11:32:42.392Z"),
"daily": [
{ "type": "3", "clicks": 12 },
{ "type": "12", "clicks": 13 },
{ "type": "25", "clicks": 13 }
]
}
Then you have an aggregation statement that goes like this:
db.collection.aggregate([
// Just match the dates in January and February
{ "$match": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
}},
// Unwind the "daily" array
{ "$unwind": "$daily" },
// Group the values together by "type" on "January" and "February"
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
"type": "$daily.type"
},
"clicks": { "$sum": "$daily.clicks" }
}},
// Sort the result nicely
{ "$sort": {
"_id.year": 1,
"_id.month": 1,
"_id.type": 1
}}
])
That form is pretty simple. And if you do not care about grouping by type and just want the monthly totals:
db.collection.aggregate([
{ "$match": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
}},
{ "$unwind": "$daily" },
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
},
"clicks": { "$sum": "$daily.clicks" }
}},
{ "$sort": { "_id.year": 1, "_id.month": 1 }}
])
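To make the `$unwind` + `$group` step concrete, here is a plain-Python sketch of the same month totals over the two sample documents already reshaped into the recommended array form. The `$match` date filter is left out, and the function name is hypothetical, not part of any MongoDB API.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def monthly_clicks(docs: Iterable[dict]) -> Dict[Tuple[int, int], int]:
    """Sum clicks per (year, month) for array-form documents, mirroring
    $unwind on "daily" followed by $group on $year/$month with $sum."""
    totals: Counter = Counter()
    for doc in docs:
        when: datetime = doc["date"]
        for entry in doc["daily"]:  # one pass per "unwound" array element
            totals[(when.year, when.month)] += entry["clicks"]
    return dict(totals)
```

For the two samples this gives 12 + 13 + 13 = 38 clicks in each of January and February 2014.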
But with the sub-document form you currently have, this becomes ugly:
db.collection.aggregate([
{ "$match": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
}},
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
},
"clicks": {
"$sum": {
"$add": [
{ "$ifNull": ["$daily.1", 0] },
{ "$ifNull": ["$daily.3", 0] },
{ "$ifNull": ["$daily.9", 0] },
{ "$ifNull": ["$daily.12", 0] },
{ "$ifNull": ["$daily.25", 0] },
{ "$ifNull": ["$daily.30", 0] },
]
}
}
}}
])
That shows that you have no option other than to specify essentially every possible field under "daily" (probably many more than these). Each one also has to be checked, because the key may not exist in a given document, in which case a default value is returned instead.
For example, your first document has no key "daily.3", so without the $ifNull check that path would be missing, $add would return null for that document, and since $sum ignores non-numeric values, that document would contribute 0 to the total.
Grouping on those keys as in the first aggregate example gets even worse:
db.collection.aggregate([
// Just match the dates in January and February
{ "$match": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
}},
// Project with an array to match all possible values
{ "$project": {
"date": 1,
"daily": 1,
"type": { "$literal": ["1", "3", "9", "12", "25", "30" ] }
}},
// Unwind the "type" array
{ "$unwind": "$type" },
// Project values onto the "type" while grouping
{ "$group" : {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
"type": "$type"
},
"clicks": { "$sum": { "$cond": [
{ "$eq": [ "$type", "1" ] },
"$daily.1",
{ "$cond": [
{ "$eq": [ "$type", "3" ] },
"$daily.3",
{ "$cond": [
{ "$eq": [ "$type", "9" ] },
"$daily.9",
{ "$cond": [
{ "$eq": [ "$type", "12" ] },
"$daily.12",
{ "$cond": [
{ "$eq": [ "$type", "25" ] },
"$daily.25",
"$daily.30"
]}
]}
]}
]}
]}}
}},
{ "$sort": {
"_id.year": 1,
"_id.month": 1,
"_id.type": 1
}}
])
This builds one big nested $cond evaluation that matches each value to its "type", where all possible "type" values were projected into an array using the $literal operator.
If you do not have MongoDB 2.6 or greater, you can use this in place of the $literal statement:
"type": { "$cond": [1, ["1", "3", "9", "12", "25", "30" ], 0] }
Here $cond always takes its true branch and returns the declared array as a "literal" value. There is also a hidden $const operator that is undocumented, but is now exposed as $literal.
As you can see, this structure is doing you no favors, so the best option is to change it. But if you cannot, and otherwise find the aggregation approach too hard to handle, mapReduce offers an alternative, although the processing will be much slower (and map-reduce is deprecated as of MongoDB 5.0):
db.collection.mapReduce(
function () {
for ( var k in this.daily ) {
emit(
{
year: this.date.getFullYear(),
month: this.date.getMonth() + 1,
type: k
},
this.daily[k]
);
}
},
function(key,values) {
return Array.sum( values );
},
{
"query": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
},
"out": { "inline": 1 }
}
)
The general lesson here is that you will get the cleanest and fastest results by changing the document format and using the aggregation framework. But all the ways to do this are listed above.
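If you can change the schema, a one-off reshaping of each document could look like the following plain-Python sketch, which turns `daily` from `{key: clicks}` into the array of `{ "type", "clicks" }` entries recommended earlier. The helper is hypothetical, it assumes every key is a numeric string, and writing the result back to the collection is not shown.

```python
from typing import Any, Dict


def to_array_schema(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of `doc` whose `daily` sub-document
    {key: clicks} becomes [{"type": key, "clicks": clicks}, ...]."""
    reshaped = dict(doc)  # leave the original document untouched
    reshaped["daily"] = [
        {"type": key, "clicks": clicks}
        # Assumes numeric-string keys; sort numerically so "3" comes before "12".
        for key, clicks in sorted(doc["daily"].items(), key=lambda kv: int(kv[0]))
    ]
    return reshaped
```

Applied to the February sample, this produces exactly the array shown in the recommended document.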

Mongodb count distinct with multiple group fields

I have a transactions table which is populated with holidays taken by employees.
I need help translating the following SQL into MongoDB:
select employee, month, year, count(distinct holiday_type) from
transactions group by employee, month, year
I need to use aggregation in MongoDB, and I wrote this query, but it gives me the wrong result:
db.transactions.aggregate([
{ "$group": {
"_id": {
"Month": { "$month" : "$date" },
"Year": { "$year" : "$date" },
"employee" : "$employee",
"holiday_type" : "$holiday_type"
},
"Count_of_Transactions" : { "$sum" : 1 }
}}
]);
I am confused about how to express count distinct logic in MongoDB. Any suggestion would be helpful.
You are part of the way there, but you need to get the "distinct" values for "holiday_type" first, then $group again:
db.transactions.aggregate([
{ "$group": {
"_id": {
"employee" : "$employee",
"Month": { "$month" : "$date" },
"Year": { "$year" : "$date" },
"holiday_type" : "$holiday_type"
},
}},
{ "$group": {
"_id": {
"employee" : "$_id.employee",
"Month": "$_id.Month",
"Year": "$_id.Year"
},
"count": { "$sum": 1 }
}}
], { "allowDiskUse": true }
);
That is the general process, as "distinct" in SQL is a grouping operation in itself. So it takes a double $group operation to get the correct result.
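The same two-step idea in plain Python, as a sketch for checking results on a small sample: first reduce to distinct (employee, year, month, holiday_type) tuples, then count them per (employee, year, month). Field names follow the question; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple


def distinct_holiday_types(transactions: Iterable[dict]) -> Dict[Tuple[str, int, int], int]:
    # Step 1 (first $group): keep each (employee, year, month, holiday_type) once.
    distinct: Set[Tuple[str, int, int, str]] = set()
    for t in transactions:
        when: datetime = t["date"]
        distinct.add((t["employee"], when.year, when.month, t["holiday_type"]))

    # Step 2 (second $group): count the distinct holiday types per group.
    counts: Dict[Tuple[str, int, int], int] = defaultdict(int)
    for employee, year, month, _holiday_type in distinct:
        counts[(employee, year, month)] += 1
    return dict(counts)
```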