I have a collection of transaction data in mongodb, like this:
[
{timestamp: ISODate("2015-11-10T11:33:41.075Z"), nominal: 25.121},
{timestamp: ISODate("2015-11-22T11:33:41.075Z"), nominal: 25.121},
{timestamp: ISODate("2015-11-23T11:33:41.075Z"), nominal: 26.121},
{timestamp: ISODate("2015-12-03T11:33:41.075Z"), nominal: 30.121},
]
How can I use MongoDB's aggregate to calculate my total transactions for each month?
I tried:
db.getCollection('transaction').aggregate([
{ $group: {_id: "$timestamp", total: {$sum: "$nominal"} } }
])
But that didn't work, since it groups by the full timestamp instead of by month. I don't want to add another month field to the transaction data. I'm thinking of something like a custom function in the $group pipeline that returns the month value.
You need a preliminary $project stage where you use the $month operator to return the "month".
db.transaction.aggregate([
{ "$project": {
"nominal": 1,
"month": { "$month": "$timestamp" }
}},
{ "$group": {
"_id": "$month",
"total": { "$sum": "$nominal" }
}}
])
Which returns:
{ "_id" : 12, "total" : 30.121 }
{ "_id" : 11, "total" : 76.363 }
In case you want to group per year-month (to avoid months from different years being grouped together), you can use $dateToString:
// { timestamp: ISODate("2015-11-10T11:33:41.075Z"), nominal: 25.121 }
// { timestamp: ISODate("2015-11-22T11:33:41.075Z"), nominal: 25.121 }
// { timestamp: ISODate("2015-11-23T11:33:41.075Z"), nominal: 26.121 }
// { timestamp: ISODate("2015-12-03T11:33:41.075Z"), nominal: 30.121 }
db.collection.aggregate([
{ $group: {
_id: { $dateToString: { date: "$timestamp", format: "%Y-%m" } },
total: { $sum: "$nominal" }
}}
])
// { _id: "2015-12", total: 30.121 }
// { _id: "2015-11", total: 76.363 }
Related
My aggregation gets document counts per week. In this case I'm getting data from May 18 to 24:
{ "_id" : 20, "count" : 795 }
{ "_id" : 21, "count" : 221 }
Since $week in MongoDB starts weeks on Sunday, the data from Sundays creates a new week (in this case 21). Is there any way I can move the Sunday data into the week before, or the other way around?
The result would be:
{ "_id" : 20, "count" : 1016 }
Aggregation:
[{
$match: {
start_date: {
$gte: ISODate('2020-05-18T00:00:01'),
$lte: ISODate('2020-05-24T23:59:59')
}
}
}, {
$project: {
week: {
$week: '$start_date'
},
solved: '$solved',
survey: '$survey'
}
}, {
$group: {
_id: '$week',
count: {
$sum: 1
}
}
}, {
$sort: {
_id: 1
}
}]
UPDATE:
I think the below query will do the trick.
The timezone key in the $cond's if expression can be removed if your week-wise grouping is independent of the time zone of the ISODate values in the DB.
db.<Collection-Name>.aggregate([
{
$match: {
start_date: {
$gte: ISODate('2020-05-18T00:00:01'),
$lte: ISODate('2020-05-24T23:59:59')
}
}
}, {
$project: {
week: {
"$cond": {
"if": {"$eq": [{"$dayOfWeek": {"date": "$start_date", "timezone": "-0500"}}, 1]},
"then": {"$subtract": [{"$week": '$start_date'}, 1]},
"else": {"$week": '$start_date'}
}
},
solved: '$solved',
survey: '$survey'
}
}, {
$group: {
_id: '$week',
count: {
$sum: 1
}
}
}, {
$sort: {
_id: 1
}
}
])
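If you are on MongoDB 3.6 or newer, another option (not what the query above uses) is $isoWeek, which numbers weeks starting on Monday, so Sunday documents fall into the same week as the preceding Monday-Saturday without any $cond. Note that the ISO week numbers themselves differ from $week's numbering. A minimal sketch, assuming the same start_date field:

db.<Collection-Name>.aggregate([
    {
        $match: {
            start_date: {
                $gte: ISODate('2020-05-18T00:00:01'),
                $lte: ISODate('2020-05-24T23:59:59')
            }
        }
    },
    { $group: { _id: { $isoWeek: '$start_date' }, count: { $sum: 1 } } },
    { $sort: { _id: 1 } }
])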
I have the following documents stored in a collection:
{
"REQUESTTIMESTAMP" : "26-JUN-19 01.34.10.095000000 AM",
"UNHANDLED_INTENT" : 0,
"USERID" : "John",
"START_OF_INTENT_SKILL_CONVERSATION" : 0,
"PROPERTYCODE" : ""
}
I want to group this by the hour (which we will get from 'REQUESTTIMESTAMP').
Earlier, I had this document stored in the collection in a different way, where I had a separate field for hours, and used that hours field to group:
Previous aggregation query :
collection.aggregate([
{'$match': query}, {
'$group': {
"_id": {
"hour": "$hour",
"sessionId": "$sessionId"
}
}
}, {
"$group": {
"_id": "$_id.hour",
"count": {
"$sum": 1
}
}
}
])
Previous collection structure:
{
"timestamp" : "1581533210921",
"date" : "12-02-2020",
"hour" : "13",
"month" : "02",
"time" : "13:46:50",
"weekDay" : "Wednesday",
"__v" : 0
}
How can I do the same previous aggregation query with the new document structure (after extracting the hour from the 'REQUESTTIMESTAMP' field)?
You should convert your timestamp to a Date object, then take the hour from that Date object.
db.collection.aggregate([{
'$match': query
}, {
$project: {
date: {
$dateFromString: {
dateString: '$REQUESTTIMESTAMP',
format: "%m-%d-%Y" //This should be your date format
}
}
}
}, {
$group: {
_id: {
hour: {
$hour: "$date"
}
}
}
}])
The problem is that month names (like "JUN") are not supported by MongoDB's date parsing. Either you write a lot of code or you use a library like moment.js. First update your REQUESTTIMESTAMP to a proper Date object, then you can group on it.
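// Note: run this where moment.js is available (for example a Node.js script using the driver); moment is not bundled with the mongo shell.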
db.collection.find().forEach(function (doc) {
var d = moment(doc.REQUESTTIMESTAMP, "DD-MMM-YY hh.mm.ss.SSS a");
db.collection.updateOne(
{ _id: doc._id },
{ $set: { date: d.toDate() } }
);
})
db.collection.aggregate([
{
$group: {
_id: { $hour: "$date" },
count: { $sum: 1 }
}
}
])
In case you're not able to update the DB with an actual date field and still want to proceed with the existing format, try this query; it adds an hour field extracted from the REQUESTTIMESTAMP string field:
Query :
db.collection.aggregate([
{
$addFields: {
hour: {
$let: {
/** split string into three parts date + hours + AM/PM */
vars: { hour: { $slice: [{ $split: ["$REQUESTTIMESTAMP", " "] }, 1, 2] } },
in: {
$cond: [{ $in: ["AM", "$$hour"] }, // Check AM exists in array
{ $toInt: { $substr: [{ $arrayElemAt: ["$$hour", 0] }, 0, 2] } }, // If yes then return int of first 2 letters of first element in hour array
{ $add: [{ $toInt: { $substr: [{ $arrayElemAt: ["$$hour", 0] }, 0, 2] } }, 12] } ] // If PM add 12 to int of first 2 letters of first element in hour array
}
}
}
}
}
])
Test : MongoDB-Playground
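One caveat with the string-parsing sketch above: "12 ... AM" parses to 12 and "12 ... PM" becomes 24, whereas $hour-style output would be 0 and 12. If those hours occur in your data, one possible adjustment (a sketch, not tested against your data) is to take the hour modulo 12 before adding the PM offset:

db.collection.aggregate([
    {
        $addFields: {
            hour: {
                $let: {
                    vars: { hour: { $slice: [{ $split: ["$REQUESTTIMESTAMP", " "] }, 1, 2] } },
                    in: {
                        $add: [
                            { $mod: [{ $toInt: { $substr: [{ $arrayElemAt: ["$$hour", 0] }, 0, 2] } }, 12] }, // "12" -> 0
                            { $cond: [{ $in: ["PM", "$$hour"] }, 12, 0] } // add 12 only for PM hours
                        ]
                    }
                }
            }
        }
    }
])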
I have a document which describes counts of different things observed by a camera within a 15 minute period. It looks like this:
{
"_id" : ObjectId("5b1a709a83552d002516ac19"),
"start" : ISODate("2018-06-08T11:45:00.000Z"),
"end" : ISODate("2018-06-08T12:00:00.000Z"),
"recording" : ObjectId("5b1a654683552d002516ac16"),
"data" : {
"counts" : {
"5b434d05da1f0e00252566be" : 12,
"5b434d05da1f0e00252566cc" : 4,
"5b434d05da1f0e00252566ca" : 1
}
}
}
The keys inside the data.counts object change with each document and refer to additional data that is fetched at a later date. There is an unlimited number of keys inside data.counts (but usually about 20).
I am trying to aggregate all these 15-minute documents up into daily aggregated documents.
I have this query at the moment to do that:
db.getCollection("segments").aggregate([
{$match:{
"recording": ObjectId("5bf7f68ad8293a00261dd83f")
}},
{$project:{
"start": 1,
"recording": 1,
"data": 1
}},
{$group:{
_id: { $dateToString: { format: "%Y-%m-%d", date: "$start" } },
"segments": { $push: "$$ROOT" }
}},
{$sort: {_id: -1}},
]);
This does the grouping and returns all the segments in an array.
I want to also aggregate the information inside data.counts, so that I get the sum of values for all keys that are the same within the daily group.
This would save me from having another service loop through each 15 minute segment summing values with the same keys. E.g. the query would return something like this:
{
"_id" : "2019-02-27",
"counts" : {
"5b434d05da1f0e00252566be" : 351,
"5b434d05da1f0e00252566cc" : 194,
"5b434d05da1f0e00252566ca" : 111
... any other keys that were found within a day
}
}
How might I amend the query I already have, or use a different query?
Thanks!
You could use the $facet pipeline stage to create two sub-pipelines; one for segments and another for counts. These sub-pipelines can be joined by using $zip to stitch them together and $map to merge each 2-element array produced from zip. Note this will only work correctly if the sub-pipelines output sorted arrays of the same size, which is why we group and sort by start_date in each sub-pipeline.
Here's the query:
db.getCollection("segments").aggregate([{
$match: {
recording: ObjectId("5b1a654683552d002516ac16")
}
}, {
$project: {
start: 1,
recording: 1,
data: 1,
start_date: { $dateToString: { format: "%Y-%m-%d", date: "$start" }}
}
}, {
$facet: {
segments_pipeline: [{
$group: {
_id: "$start_date",
segments: {
$push: {
start: "$start",
recording: "$recording",
data: "$data"
}
}
}
}, {
$sort: {
_id: -1
}
}],
counts_pipeline: [{
$project: {
start_date: "$start_date",
count: { $objectToArray: "$data.counts" }
}
}, {
$unwind: "$count"
}, {
$group: {
_id: {
start_date: "$start_date",
count_id: "$count.k"
},
count_sum: { $sum: "$count.v" }
}
}, {
$group: {
_id: "$_id.start_date",
counts: {
$push: {
$arrayToObject: [[{
k: "$_id.count_id",
v: "$count_sum"
}]]
}
}
}
}, {
$project: {
counts: { $mergeObjects: "$counts" }
}
}, {
$sort: {
_id: -1
}
}]
}
}, {
$project: {
result: {
$map: {
input: { $zip: { inputs: ["$segments_pipeline", "$counts_pipeline"] }},
in: { $mergeObjects: "$$this" }
}
}
}
}, {
$unwind: "$result"
}, {
$replaceRoot: {
newRoot: "$result"
}
}])
Try it out here: Mongoplayground.
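If the segments array itself isn't needed in the output and you only want the per-day counts object, a single pipeline (no $facet) along the lines of the counts_pipeline above should be enough. A minimal sketch, reusing the recording filter from the question:

db.getCollection("segments").aggregate([
    { $match: { recording: ObjectId("5bf7f68ad8293a00261dd83f") } },
    { $project: {
        day: { $dateToString: { format: "%Y-%m-%d", date: "$start" } },
        count: { $objectToArray: "$data.counts" }
    }},
    { $unwind: "$count" },
    { $group: {
        _id: { day: "$day", key: "$count.k" },
        total: { $sum: "$count.v" }
    }},
    { $group: {
        _id: "$_id.day",
        counts: { $push: { k: "$_id.key", v: "$total" } }
    }},
    { $project: { counts: { $arrayToObject: "$counts" } } },
    { $sort: { _id: -1 } }
])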
Collection
{
"_id" : ObjectId("5a143a79ca78479b1dc90161"),
"createdAt" : ISODate("2017-11-21T14:38:49.375Z"),
"amount" : 227.93359186,
"pair" : "ant_eth"
}
Expected output
{
"12-12-2012": [
{
"pair": "ant_eth",
"sum": "sum of amounts in 12-12-2012"
},
{
"pair": "new_pair",
"sum": "sum of amounts in 12-12-2012"
},
],
"13-12-2012": [{
"pair": "ant_eth",
"sum": "sum of amounts in 13-12-2012"
}]
}
What I have achieved so far is:
const criteria = [
{ $group: {
_id: '$pair',
totalAmount: { $sum: '$amount' } } }
]
Any help to achieve the expected output is much appreciated.
OK, so you want to sum up amount by just the date portion of a datetime plus pair, and then "organize" all the pair+sum results by date. You can do this by "regrouping" as follows. The first $group creates the sums but leaves you with repeating dates. The second $group fixes up the output to almost what you want, except that the dates remain as values of the _id instead of becoming field names themselves.
db.foo.aggregate([
{
$group: {
_id: {d: {$dateToString: { format: "%Y-%m-%d", date: "$createdAt"}}, pair: "$pair"},
n: {$sum: "$amount"}
}
},
{
$group: {
_id: "$_id.d",
items: {$push: {pair: "$_id.pair", sum: "$n"}}
}
}
]);
If you REALLY want to have field names, then add these two stages after the second $group:
,{$project: {x: [["$_id","$items"]] }}
,{$replaceRoot: { newRoot: {$arrayToObject: "$x"} }}
This is what I could get to:
db.collection.aggregate([{
$group: {
_id: {
year: {
"$year": "$createdAt"
},
month: {
"$month": "$createdAt"
},
day: {
"$dayOfMonth": "$createdAt"
},
pair: "$pair"
},
sum: {
$sum: "$amount"
}
}
}])
For the rest, you will probably need to do app-side parsing to generate the output you want, for example as shown below.
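A hypothetical sketch of that app-side step, in plain JavaScript on the driver side; the results variable is assumed to hold the aggregation output from the pipeline above:

const byDate = {};
results.forEach(({ _id, sum }) => {
    const { year, month, day, pair } = _id;
    const pad = n => String(n).padStart(2, "0");
    const key = `${pad(day)}-${pad(month)}-${year}`; // e.g. "12-12-2012"
    (byDate[key] = byDate[key] || []).push({ pair, sum });
});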
Here's my problem:
Model:
{ application: "abc", date: Time.now, status: "1" user_id: [ id1, id2,
id4] }
{ application: "abc", date: Time.yesterday, status: "1", user_id: [
id1, id3, id5] }
{ application: "abc", date: Time.yesterday-1, status: "1", user_id: [
id1, id3, id5] }
I need to count the number of unique user_ids in a period of time.
Expected result:
{ application: "abc", status: "1", unique_id_count: 5 }
I'm currently using the aggregation framework and counting the ids outside mongodb.
{ $match: { application: "abc" } },
{ $unwind: "$users" },
{ $group: { _id: { status: "$status" }, users: { $addToSet: "$users" } } }
My arrays of user ids are very large, so I have to iterate over the dates or I hit the maximum document size limit (16 MB).
I could also $group by
{ year: { $year: "$date" }, month: { $month: "$date" }, day: { $dayOfMonth: "$date" } }
but I also get the document size limitation.
Is it possible to count the set size in mongodb?
thanks
The following will return the number of unique users per application. It applies a group operation to the result of another group operation, using MongoDB's pipeline feature.
{ $match: { application: "abc" } },
{ $unwind: "$users" },
{ $group: { _id: "$status", users: { $addToSet: "$users" } } },
{ $unwind:"$users" },
{ $group : {_id : "$_id", count : {$sum : 1} } }
Hopefully this will be possible in an easier way in upcoming releases of MongoDB, via an operator that gives the size of an array in a projection: {$project: {id: "$_id", count: {$size: "$uniqueUsers"}}}
https://jira.mongodb.org/browse/SERVER-4899
Cheers
Sorry I'm a little late to the party. Simply grouping on the 'user_id' and counting the result with a trivial group works just fine and doesn't run into doc size limits.
[
{$match: {application: 'abc', date: {$gte: startDate, $lte: endDate}}},
{$unwind: '$user_id'},
{$group: {_id: '$user_id'}},
{$group: {_id: 'singleton', count: {$sum: 1}}}
];
Use $size to get the size of the set.
[
{
$match: {"application": "abc"}
},
{
$unwind: "$user_id"
},
{
$group: {
"_id": "$status",
"application": "$application",
"unique_user_id": {$addToSet: "$user_id"}
}
},
{
$project:{
"_id": "$_id",
"application": "$application",
"count": {$size: "$unique_user_id"}
}
}
]