Mongo group aggregation with priority of record - mongodb

I am trying to perform a MongoDB 3.6 aggregation and I can't figure out the right way.
The problem is as follows. After several aggregation stages I end up with a result set like this:
[
{ _id: { month: 1, type: 'estimate' }, value: 50 },
{ _id: { month: 2, type: 'estimate' }, value: 40 },
{ _id: { month: 3, type: 'estimate' }, value: 35 },
{ _id: { month: 3, type: 'exact' }, value: 33.532 },
{ _id: { month: 4, type: 'estimate' }, value: 10 },
{ _id: { month: 4, type: 'exact' }, value: 11.244 },
]
It contains values grouped by month. The value for each month can be of type 'estimate' or 'exact'. Now I would like to reduce this result to achieve this:
[
{ _id: { month: 1 }, value: 50 },
{ _id: { month: 2 }, value: 40 },
{ _id: { month: 3 }, value: 33.532 },
{ _id: { month: 4 }, value: 11.244 },
]
Basically, I want to use the 'exact' value whenever possible and only fall back to the 'estimate' value in months where the 'exact' one is not known.
Any help or tip will be greatly appreciated. I would like to perform this aggregation in the database, not on the application server.

You can simply $sort by type and then use $first in the next $group stage, which will give you exact if it exists and estimate otherwise. Try:
db.col.aggregate([
{
$sort: { "_id.type": -1 }
},
{
$group:{
_id: "$_id.month",
value: { $first: "$value" }
}
},
{
$sort: { _id: 1 }
}
])
Prints:
{ "_id" : 1, "value" : 50 }
{ "_id" : 2, "value" : 40 }
{ "_id" : 3, "value" : 33.532 }
{ "_id" : 4, "value" : 11.244 }
So sorting by type acts as the prioritization here: lexically, 'exact' sorts after 'estimate' ('x' comes after 's'), so a descending sort puts exact first. You can also be more explicit: add an extra field called weight (computed with the $cond operator) and then sort by that weight:
db.col.aggregate([
{
$addFields: {
weight: { $cond: [ { $eq: [ "$_id.type", "exact" ] }, 2, 1 ] }
}
},
{
$sort: { "weight": -1 }
},
{
$group:{
_id: "$_id.month",
value: { $first: "$value" }
}
},
{
$sort: { _id: 1 }
}
])
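To see why either pipeline picks the exact value, here is a minimal in-memory sketch in plain JavaScript (not MongoDB) that applies the same steps to the sample rows from the question. The helper names weight and preferExact are hypothetical; they only mirror the $sort, $group/$first and final $sort stages.

```javascript
// Plain-JavaScript sketch (no MongoDB involved) of "sort by priority, then
// keep the first row per month", using the sample rows from the question.
const rows = [
  { _id: { month: 1, type: "estimate" }, value: 50 },
  { _id: { month: 2, type: "estimate" }, value: 40 },
  { _id: { month: 3, type: "estimate" }, value: 35 },
  { _id: { month: 3, type: "exact" }, value: 33.532 },
  { _id: { month: 4, type: "estimate" }, value: 10 },
  { _id: { month: 4, type: "exact" }, value: 11.244 },
];

// Same rule as the $cond weight: 'exact' gets 2, anything else gets 1.
const weight = (row) => (row._id.type === "exact" ? 2 : 1);

function preferExact(input) {
  // $sort: { weight: -1 } (copy first so the caller's array is not reordered)
  const sorted = [...input].sort((a, b) => weight(b) - weight(a));
  // $group with $first: the first row seen for a month wins
  const byMonth = new Map();
  for (const row of sorted) {
    if (!byMonth.has(row._id.month)) byMonth.set(row._id.month, row.value);
  }
  // $sort: { _id: 1 }
  return [...byMonth.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([month, value]) => ({ _id: month, value }));
}

// The lexical comparison the type-based sort relies on:
console.log("exact" > "estimate"); // true
console.log(preferExact(rows));
```

The weight variant is more robust than relying on string order: adding a third type later only requires another $cond branch, not a check of how the new name sorts.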

Related

MongoDB get count of field per season from MM/DD/YYYY date field

I am facing a problem in MongoDB. Suppose, I have the following collection.
{ id: 1, issueDate: "07/05/2021", code: "31" },
{ id: 2, issueDate: "12/11/2020", code: "14" },
{ id: 3, issueDate: "02/11/2021", code: "98" },
{ id: 4, issueDate: "01/02/2021", code: "14" },
{ id: 5, issueDate: "06/23/2020", code: "14" },
{ id: 6, issueDate: "07/01/2020", code: "31" },
{ id: 7, issueDate: "07/05/2022", code: "14" },
{ id: 8, issueDate: "07/02/2022", code: "20" },
{ id: 9, issueDate: "07/02/2022", code: "14" }
The date field is in the format MM/DD/YYYY. My goal is to get the count of items per season: spring (March-May), summer (June-August), autumn (September-November) and winter (December-February).
The result I'm expecting is:
The count of documents per season:
{ "_id" : "Summer", "count" : 6 }
{ "_id" : "Winter", "count" : 3 }
The top 2 codes (first and second most recurring) per season:
{ "_id" : "Summer", "codes" : ["14", "31"] }
{ "_id" : "Winter", "codes" : ["14", "98"] }
How can this be done?
You should never store date/time values as strings; always store proper Date objects.
You can use the $setWindowFields stage (MongoDB 5.0+) for that:
db.collection.aggregate([
// Convert string into Date
{ $set: { issueDate: { $dateFromString: { dateString: "$issueDate", format: "%m/%d/%Y" } } } },
// Determine the season (0..3)
{
$set: {
season: { $mod: [{ $toInt: { $divide: [{ $add: [{ $subtract: [{ $month: "$issueDate" }, 1] }, 1] }, 3] } }, 4] }
}
},
// Count codes per season
{
$group: {
_id: { season: "$season", code: "$code" },
count: { $count: {} },
}
},
// Rank occurrence of codes per season
{
$setWindowFields: {
partitionBy: "$_id.season",
sortBy: { count: -1 },
output: {
rank: { $denseRank: {} },
count: { $sum: "$count" }
}
}
},
// Get only top 2 ranks
{ $match: { rank: { $lte: 2 } } },
// Final grouping
{
$group: {
_id: "$_id.season",
count: { $first: "$count" },
codes: { $push: "$_id.code" }
}
},
// Some cosmetic for output
{
$set: {
season: {
$switch: {
branches: [
{ case: { $eq: ["$_id", 0] }, then: 'Winter' },
{ case: { $eq: ["$_id", 1] }, then: 'Spring' },
{ case: { $eq: ["$_id", 2] }, then: 'Summer' },
{ case: { $eq: ["$_id", 3] }, then: 'Autumn' },
]
}
}
}
}
])
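The season arithmetic and the counting can be checked outside MongoDB with a minimal plain-JavaScript sketch over the question's sample documents. The helper names (seasonIndex, monthOf, seasonStats) are hypothetical, and instead of $denseRank the sketch simply keeps the first two codes after sorting by frequency, so tied codes are not all kept the way a dense rank would keep them.

```javascript
// Sample documents from the question (issueDate as MM/DD/YYYY strings).
const docs = [
  { id: 1, issueDate: "07/05/2021", code: "31" },
  { id: 2, issueDate: "12/11/2020", code: "14" },
  { id: 3, issueDate: "02/11/2021", code: "98" },
  { id: 4, issueDate: "01/02/2021", code: "14" },
  { id: 5, issueDate: "06/23/2020", code: "14" },
  { id: 6, issueDate: "07/01/2020", code: "31" },
  { id: 7, issueDate: "07/05/2022", code: "14" },
  { id: 8, issueDate: "07/02/2022", code: "20" },
  { id: 9, issueDate: "07/02/2022", code: "14" },
];

const SEASONS = ["Winter", "Spring", "Summer", "Autumn"];

// Same arithmetic as the pipeline: truncate month / 3, then take it modulo 4.
const seasonIndex = (month) => Math.trunc(month / 3) % 4;

// Extracts the month (1..12) from an "MM/DD/YYYY" string.
const monthOf = (issueDate) => Number(issueDate.split("/")[0]);

function seasonStats(input, topN = 2) {
  const stats = new Map(); // season -> { count, codes: Map(code -> occurrences) }
  for (const doc of input) {
    const season = SEASONS[seasonIndex(monthOf(doc.issueDate))];
    if (!stats.has(season)) stats.set(season, { count: 0, codes: new Map() });
    const s = stats.get(season);
    s.count += 1;
    s.codes.set(doc.code, (s.codes.get(doc.code) ?? 0) + 1);
  }
  return [...stats.entries()].map(([season, s]) => ({
    _id: season,
    count: s.count,
    // Most frequent codes first (sort is stable, so ties keep first-seen order)
    codes: [...s.codes.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, topN)
      .map(([code]) => code),
  }));
}

console.log(seasonStats(docs));
```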
I will give you some clues:
After converting issueDate to a Date, you need to $group with _id set to the $month of issueDate and use the $sum accumulator to get a per-month count.
You can divide the month by 3 and truncate the result with $toInt (together with $divide and $mod) to get a season index, then map it to a category using $cond.
Another option:
db.collection.aggregate([
{
$addFields: {
// Map the "MM" prefix of the MM/DD/YYYY string to a season
"season": {
$switch: {
branches: [
{ case: { $in: [{ $substr: ["$issueDate", 0, 2] }, ["06", "07", "08"]] }, then: "Summer" },
{ case: { $in: [{ $substr: ["$issueDate", 0, 2] }, ["03", "04", "05"]] }, then: "Spring" },
{ case: { $in: [{ $substr: ["$issueDate", 0, 2] }, ["09", "10", "11"]] }, then: "Autumn" },
{ case: { $in: [{ $substr: ["$issueDate", 0, 2] }, ["12", "01", "02"]] }, then: "Winter" }
],
default: "No date found."
}
}
}
},
{
$group: {
_id: {
s: "$season",
c: "$code"
},
cnt1: {
$sum: 1
}
}
},
{
$sort: {
cnt1: -1
}
},
{
$group: {
_id: "$_id.s",
codes: {
$push: "$_id.c"
},
cnt: {
$sum: "$cnt1"
}
}
},
{
$project: {
_id: 0,
season: "$_id",
count: "$cnt",
codes: {
"$slice": [
"$codes",
2
]
}
}
}
])
Explained:
Add one more field, season, based on a $switch per month (extracted from the issueDate string).
Group to count per season/code.
$sort by that count in DESCENDING order.
Group per season to form an array with the most recurring codes in descending order.
Project the fields to the desired output and $slice the codes to keep only the first two most recurring.
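For a complete mapping, the $switch needs to cover all twelve month prefixes, including September to November (Autumn); otherwise those dates fall through to the default. The lookup below is a plain-JavaScript sketch of that prefix mapping; PREFIX_TO_SEASON and seasonFromPrefix are hypothetical names.

```javascript
// Hypothetical lookup table mirroring the $switch branches: the first two
// characters of an "MM/DD/YYYY" string select the season.
const PREFIX_TO_SEASON = {
  "12": "Winter", "01": "Winter", "02": "Winter",
  "03": "Spring", "04": "Spring", "05": "Spring",
  "06": "Summer", "07": "Summer", "08": "Summer",
  "09": "Autumn", "10": "Autumn", "11": "Autumn",
};

// Equivalent of { $substr: ["$issueDate", 0, 2] } followed by the $switch,
// falling back to the same default string for unknown prefixes.
function seasonFromPrefix(issueDate) {
  return PREFIX_TO_SEASON[issueDate.slice(0, 2)] ?? "No date found.";
}

console.log(seasonFromPrefix("07/05/2021")); // Summer
console.log(seasonFromPrefix("10/01/2020")); // Autumn
```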
Comment:
Indeed, keeping dates as strings is not a good idea in general ...

MongoDB sum of all fields with integer values

Inside the aggregation framework, is it possible in some way, for each document like the one below:
{
"Title": "Number orders",
"2021-03-16": 3,
"2021-03-15": 6,
"2021-03-19": 1,
"2021-03-14": 19
}
Obtain a new document like this?
{
"Title": "Number orders",
"2021-03-16": 3,
"2021-03-15": 6,
"2021-03-19": 1,
"2021-03-14": 19,
"Total": 29
}
Basically, I want a new field containing the sum of the values of all the integer fields.
Another thing to take into consideration is that the date fields are dynamic: one week they could be like the ones in the example, and the following week the fields would become like:
{
"Title": "Number orders",
"2021-03-23": 3,
"2021-03-22": 6,
"2021-03-26": 1,
"2021-03-21": 19
}
Thanks!
Demo - https://mongoplayground.net/p/724nerJUQtK
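$$ROOT is the entire document: convert it to an array of k/v pairs with $objectToArray, add the total with $addFields and $sum (which ignores non-numeric values such as Title), then remove the helper allData field with $unset (a stage available since MongoDB 4.2).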
db.collection.aggregate([
{ $addFields: { allData: { "$objectToArray": "$$ROOT" } } },
{ $addFields: { "total": { $sum: "$allData.v" } } },
{ $unset: "allData" }
])
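Because the date keys change every week, the sum must not depend on field names. A minimal plain-JavaScript analogue of the pipeline above (the helper name addTotal is hypothetical) iterates over all values and, like $sum, skips anything that is not a number.

```javascript
// Plain-JavaScript analogue of $objectToArray + $sum on "$allData.v":
// like $sum, it skips values that are not numbers (e.g. the "Title" string).
function addTotal(doc) {
  const total = Object.values(doc)
    .filter((v) => typeof v === "number")
    .reduce((acc, v) => acc + v, 0);
  return { ...doc, total };
}

const week = {
  Title: "Number orders",
  "2021-03-16": 3,
  "2021-03-15": 6,
  "2021-03-19": 1,
  "2021-03-14": 19,
};

console.log(addTotal(week).total); // 29
```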
Based on your older question, I think this might help:
db.collection.aggregate([
{
$group: {
_id: {
dDate: "$deliveryDay",
name: "$plate.name"
},
v: { $sum: "$plate.quantity" }
}
},
{
$group: {
_id: "$_id.name",
Total: { $sum: "$v" },
array: {
$push: { k: "$_id.dDate", v: "$v" }
}
}
},
{
$addFields: {
array: {
$concatArrays: [
[{ k: "Title", v: "Number orders" }],
"$array",
[{ k: "Total", v: "$Total" }]
]
}
}
},
{
$replaceRoot: {
newRoot: { $arrayToObject: "$array" }
}
}
])
Output:
/* 1 */
{
"Title" : "Number orders",
"2021-01-16" : 2,
"Total" : 2
},
/* 2 */
{
"Title" : "Number orders",
"2021-01-14" : 1,
"2021-01-16" : 3,
"Total" : 4
}
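The k/v reshaping in this answer can be sketched in plain JavaScript, where Object.fromEntries plays the role of $arrayToObject. The input rows and the helper name toWeeklyDocs are hypothetical; the rows are made up so that the result has the same shape as the output shown above.

```javascript
// Hypothetical rows shaped like the output of the first $group stage
// ({ _id: { dDate, name }, v }); the values are made up for illustration.
const grouped = [
  { _id: { dDate: "2021-01-16", name: "A" }, v: 2 },
  { _id: { dDate: "2021-01-14", name: "B" }, v: 1 },
  { _id: { dDate: "2021-01-16", name: "B" }, v: 3 },
];

function toWeeklyDocs(rows) {
  // Second $group: per name, a running Total and a list of { k, v } pairs
  const byName = new Map();
  for (const row of rows) {
    const entry = byName.get(row._id.name) ?? { Total: 0, array: [] };
    entry.Total += row.v;
    entry.array.push({ k: row._id.dDate, v: row.v });
    byName.set(row._id.name, entry);
  }
  // $concatArrays + $arrayToObject: Title first, then the dates, then Total
  return [...byName.values()].map(({ Total, array }) =>
    Object.fromEntries(
      [{ k: "Title", v: "Number orders" }, ...array, { k: "Total", v: Total }].map(({ k, v }) => [k, v])
    )
  );
}

console.log(toWeeklyDocs(grouped));
```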

MongoDb aggregate pipeline with multiple groupings

I'm trying to get my head around an aggregate pipeline in MongoDb with multiple groups.
I have the following data: https://gist.github.com/bomortensen/36e6b3fbc987a096be36a66bbfe30d82
Expected data would be: https://gist.github.com/bomortensen/7b220df1f1da83be838acfb2ed79a2ee (total quantity sum based on highest version, hourly)
I need to write a query which does the following:
Group the data by the field MeterId to get unique meter groups.
In each group I then need to group by the StartDate's year, month, day and hour: every object's StartDate is stored in quarter-hour intervals, but I need to aggregate them into whole hours.
Finally, I need to keep only the highest version from the Versions array, by VersionNumber.
I've tried the following query, but must admit I'm stuck:
mycollection.aggregate([
{ $group: {
_id : { ediel: "$_id.MeterId", start: "$_id.StartDate" },
versions: { $push: "$Versions" }
}
},
{ $unwind: { path: "$versions" } },
{ $group: {
_id: {
hour: { $hour: "$_id.start.DateTime" },
key: "$_id"
},
quantitySum: { $sum: "$Versions.Quantity" }
}
},
{ $sort: { "_id.hour": -1 } }
]);
Does anyone know how I should do this? :-)
The pipeline would be:
One $project stage: get the $hour from the date and create a maxVersion field per record
One $unwind stage to flatten the Versions array
One $project stage to add a boolean keep field that tells whether the record should be kept
One $match stage that keeps only the highest version number, i.e. keep == true
One $group stage that groups by id/hour and sums the quantity
One $project stage to produce your required format
The query is:
db.mycollection.aggregate([{
$project: {
_id: 1,
Versions: 1,
hour: {
"$hour": "$_id.StartDate"
},
maxVersion: { $max: "$Versions.VersionNumber" }
}
}, {
$unwind: "$Versions"
}, {
$project: {
_id: 1,
Versions: 1,
hour: 1,
maxVersion: 1,
keep: { $eq: ["$Versions.VersionNumber", "$maxVersion"] }
}
}, {
$match: { "keep": true }
}, {
$group: {
_id: { _id: "$_id.MeterId", hour: "$hour" },
StartDate: { $first: "$_id.StartDate" },
QuantitySum: { $sum: "$Versions.Quantity" }
}
}, {
$project: {
_id: { _id: "$_id._id", StartDate: "$StartDate" },
hour: "$_id.hour",
QuantitySum: 1
}
}])
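The keep/maxVersion logic of this query can be sketched in plain JavaScript. The real data is only in the linked gist, so the records, values and the helper name hourlyMaxVersionSums below are hypothetical. Like $hour with its default time zone, the sketch reads the hour in UTC, and like the query it groups only by MeterId and hour of day.

```javascript
// Hypothetical quarter-hour records; the real data lives in the linked gist.
const records = [
  {
    _id: { MeterId: "1234", StartDate: new Date("2016-09-20T02:00:00Z") },
    Versions: [{ VersionNumber: 1, Quantity: 5 }, { VersionNumber: 2, Quantity: 7.5 }],
  },
  {
    _id: { MeterId: "1234", StartDate: new Date("2016-09-20T02:15:00Z") },
    Versions: [{ VersionNumber: 1, Quantity: 6 }],
  },
];

function hourlyMaxVersionSums(docs) {
  const sums = new Map(); // "MeterId|hour" -> summed quantity
  for (const doc of docs) {
    // $max on Versions.VersionNumber, computed per record
    const maxVersion = Math.max(...doc.Versions.map((v) => v.VersionNumber));
    // $unwind + $match { keep: true }
    const kept = doc.Versions.filter((v) => v.VersionNumber === maxVersion);
    // $hour (UTC)
    const key = `${doc._id.MeterId}|${doc._id.StartDate.getUTCHours()}`;
    // $group + $sum on Versions.Quantity
    const total = kept.reduce((acc, v) => acc + v.Quantity, 0);
    sums.set(key, (sums.get(key) ?? 0) + total);
  }
  return sums;
}

console.log(hourlyMaxVersionSums(records));
```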
In your example output you only take into account the first entry with the highest VersionNumber: for hour 2 and id 1234 you have both { "VersionNumber" : 2, "Quantity" : 7.5 } and { "VersionNumber" : 2, "Quantity" : 8.4 }, but you only take { "VersionNumber" : 2, "Quantity" : 7.5 }.
I don't know whether this is intended, but if you want to take only the first max-version entry, add these stages after the $match:
One $group stage that pushes the previously filtered versions into an array
One $project stage that $slices this array to keep only the first element
One $unwind stage to flatten this array (which contains only one element)
The query that matches your output is:
db.mycollection.aggregate([{
$project: {
_id: 1,
Versions: 1,
hour: {
"$hour": "$_id.StartDate"
},
maxVersion: { $max: "$Versions.VersionNumber" }
}
}, {
$unwind: "$Versions"
}, {
$project: {
_id: 1,
Versions: 1,
hour: 1,
maxVersion: 1,
keep: { $eq: ["$Versions.VersionNumber", "$maxVersion"] }
}
}, {
$match: { "keep": true }
}, {
$group: {
_id: { _id: "$_id.MeterId", StartDate: "$_id.StartDate" },
Versions: { $push: "$Versions" },
hour: { "$first": "$hour" }
}
}, {
$project: {
_id: 1,
hour: 1,
Versions: { $slice: ["$Versions", 1] }
}
}, {
$unwind: "$Versions"
}, {
$sort: {
_id: 1
}
}, {
$group: {
_id: { _id: "$_id._id", hour: "$hour" },
StartDate: { $first: "$_id.StartDate" },
QuantitySum: { $sum: "$Versions.Quantity" }
}
}, {
$project: {
_id: { MeterId: "$_id._id", StartDate: "$StartDate" },
Hour: "$_id.hour",
QuantitySum: 1
}
}])
Output:
{ "_id" : { "MeterId" : "4567", "StartDate" : ISODate("2016-09-20T03:00:00Z") }, "QuantitySum" : 25.9, "Hour" : 3 }
{ "_id" : { "MeterId" : "4567", "StartDate" : ISODate("2016-09-20T02:00:00Z") }, "QuantitySum" : 25.9, "Hour" : 2 }
{ "_id" : { "MeterId" : "1234", "StartDate" : ISODate("2016-09-20T03:00:00Z") }, "QuantitySum" : 25.9, "Hour" : 3 }
{ "_id" : { "MeterId" : "1234", "StartDate" : ISODate("2016-09-20T02:00:00Z") }, "QuantitySum" : 25.9, "Hour" : 2 }
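The extra $group/$slice/$unwind stages amount to "among the versions sharing the highest VersionNumber, keep the first one in array order". A minimal plain-JavaScript sketch of that rule (firstMaxVersion is a hypothetical name; the two version-2 entries come from the example above, the version-1 entry is made up):

```javascript
// Mirrors the $push + $slice: [..., 1] step: among the versions that share the
// highest VersionNumber, keep only the first one in array order.
function firstMaxVersion(versions) {
  const maxVersion = Math.max(...versions.map((v) => v.VersionNumber));
  return versions.find((v) => v.VersionNumber === maxVersion);
}

const versions = [
  { VersionNumber: 1, Quantity: 6 }, // hypothetical older version
  { VersionNumber: 2, Quantity: 7.5 },
  { VersionNumber: 2, Quantity: 8.4 },
];

console.log(firstMaxVersion(versions)); // { VersionNumber: 2, Quantity: 7.5 }
```

Note that "first" here depends on array order, just as $first/$slice depend on the order documents reach the stage.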
Sorry, I just can't find a straightforward way to round to the hour. You can try the following: unwind the versions so you can group per meter and hour, collecting the max version and pushing the versions for the next step; then a $project filters the versions that match the max version, and a final $project sums the quantities of those max versions. Right now the start date is the minimum StartDate in the group, so you should be fine as long as you have versions at the top of the hour.
db.collection.aggregate([{
$unwind: {
path: "$Versions"
}
}, {
$group: {
_id: {
MeterId: "$_id.MeterId",
start: {
$hour: "$_id.StartDate"
}
},
startDate: {
$min: "$_id.StartDate"
},
maxVersion: {
$max: "$Versions.VersionNumber"
},
Versions: {
$push: "$Versions"
}
}
}, {
$sort: {
"_id.start": -1
}
}, {
$project: {
_id: {
MeterId: "$_id.MeterId",
StartDate: "$startDate"
},
hour: "$_id.start",
Versions: {
$filter: {
input: "$Versions",
as: "version",
cond: {
$eq: ["$maxVersion", "$$version.VersionNumber"]
}
}
}
}
}, {
$project: {
_id: 1,
hour: 1,
QuantitySum: {
$sum: "$Versions.Quantity"
}
}
}]);
Sample Output
{
"_id": {
"MeterId": "1234",
"StartDate": ISODate("2016-09-20T02:00:00Z")
},
"QuantitySum": 15,
"hour": 2
}
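Note that this pipeline computes maxVersion across the whole MeterId/hour group rather than per quarter-hour record, so a quarter whose only version is older than the group maximum contributes nothing. The plain-JavaScript sketch below shows that behaviour on hypothetical records (the data and the helper name hourlySumAcrossGroup are made up).

```javascript
// Hypothetical quarter-hour records for one meter and one hour.
const records = [
  {
    _id: { MeterId: "1234", StartDate: new Date("2016-09-20T02:00:00Z") },
    Versions: [{ VersionNumber: 1, Quantity: 5 }, { VersionNumber: 2, Quantity: 7.5 }],
  },
  {
    _id: { MeterId: "1234", StartDate: new Date("2016-09-20T02:15:00Z") },
    Versions: [{ VersionNumber: 1, Quantity: 6 }],
  },
];

// The maximum VersionNumber is taken over the whole MeterId/hour group,
// then only versions equal to that maximum are summed.
function hourlySumAcrossGroup(docs) {
  const groups = new Map(); // "MeterId|hour" -> all unwound versions
  for (const doc of docs) {
    const key = `${doc._id.MeterId}|${doc._id.StartDate.getUTCHours()}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(...doc.Versions); // $unwind + $push
  }
  const result = new Map();
  for (const [key, versions] of groups) {
    const maxVersion = Math.max(...versions.map((v) => v.VersionNumber)); // $max
    const kept = versions.filter((v) => v.VersionNumber === maxVersion); // $filter
    result.set(key, kept.reduce((acc, v) => acc + v.Quantity, 0)); // $sum
  }
  return result;
}

console.log(hourlySumAcrossGroup(records));
```

With these records the 02:15 quarter is dropped (its only version is 1 while the hour's maximum is 2), whereas a per-record maximum would have kept its quantity.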

Grouping and counting across documents?

I have a collection with documents similar to the following format:
{
departure:{name: "abe"},
arrival:{name: "tom"}
},
{
departure:{name: "bob"},
arrival:{name: "abe"}
}
And to get output like so:
{
name: "abe",
departureCount: 1,
arrivalCount: 1
},
{
name: "bob",
departureCount: 1,
arrivalCount: 0
},
{
name: "tom",
departureCount: 0,
arrivalCount: 1
}
I'm able to get the counts individually by doing a query for the specific data like so:
db.sched.aggregate([
{
"$group":{
_id: "$departure.name",
departureCount: {$sum: 1}
}
}
])
But I haven't figured out how to merge the arrival and departure name into one document along with counts for both. Any suggestions on how to accomplish this?
You should use $map to split each document into two entries (one for the departure, one for the arrival), then $unwind and $group:
[
{
$project: {
dep: '$departure.name',
arr: '$arrival.name'
}
},
{
$project: {
f: {
$map: {
input: {
$literal: ['dep', 'arr']
},
as: 'el',
in : {
type: '$$el',
name: {
$cond: [{
$eq: ['$$el', 'dep']
}, '$dep', '$arr']
}
}
}
}
}
},
{
$unwind: '$f'
}, {
$group: {
_id: {
'name': '$f.name'
},
departureCount: {
$sum: {
$cond: [{
$eq: ['$f.type', 'dep']
}, 1, 0]
}
},
arrivalCount: {
$sum: {
$cond: [{
$eq: ['$f.type', 'arr']
}, 1, 0]
}
}
}
}, {
$project: {
_id: 0,
name: '$_id.name',
departureCount: 1,
arrivalCount: 1
}
}
]
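The split-then-count idea can be sketched in plain JavaScript on the two sample documents from the question. The helper name countByName is hypothetical; each document yields one "dep" and one "arr" entry, and each entry increments only its matching counter, like the $sum/$cond pair.

```javascript
// Sample documents from the question.
const trips = [
  { departure: { name: "abe" }, arrival: { name: "tom" } },
  { departure: { name: "bob" }, arrival: { name: "abe" } },
];

function countByName(docs) {
  const counts = new Map(); // name -> { departureCount, arrivalCount }
  for (const doc of docs) {
    // $map over ['dep', 'arr'] + $unwind: one entry per role
    const entries = [
      { type: "dep", name: doc.departure.name },
      { type: "arr", name: doc.arrival.name },
    ];
    for (const { type, name } of entries) {
      const c = counts.get(name) ?? { departureCount: 0, arrivalCount: 0 };
      // $sum with $cond: add 1 to the matching counter only
      if (type === "dep") c.departureCount += 1;
      else c.arrivalCount += 1;
      counts.set(name, c);
    }
  }
  return [...counts.entries()].map(([name, c]) => ({ name, ...c }));
}

console.log(countByName(trips));
```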

How to group date quarterly wise?

I have documents which contain a date, and I'm wondering how to group them on a quarterly basis.
My schema is:
var ekgsanswermodel = new mongoose.Schema({
userId: {type: Schema.Types.ObjectId},
topicId : {type: Schema.Types.ObjectId},
ekgId : {type: Schema.Types.ObjectId},
answerSubmitted :{type: Number},
dateAttempted : { type: Date},
title : {type: String},
submissionSessionId : {type: String}
});
The 1st quarter contains months 1, 2, 3, the 2nd quarter months 4, 5, 6, and so on up to the 4th quarter.
My final result should be:
"result" : [
{ _id: { quarter: ... } },
{ _id: { quarter: ... } },
{ _id: { quarter: ... } },
{ _id: { quarter: ... } }
]
You could make use of the $cond operator to check:
if the $month is <= 3, project a field named quarter with the value "first";
else if the $month is <= 6, project quarter with the value "second";
else if the $month is <= 9, project quarter with the value "third";
else the value of quarter is "fourth".
Then $group by the quarter field.
Code:
db.collection.aggregate([
{
$project: {
date: 1,
quarter: {
$cond: [
{ $lte: [{ $month: "$date" }, 3] },
"first",
{
$cond: [
{ $lte: [{ $month: "$date" }, 6] },
"second",
{
$cond: [{ $lte: [{ $month: "$date" }, 9] }, "third", "fourth"],
},
],
},
],
},
},
},
{ $group: { _id: { quarter: "$quarter" }, results: { $push: "$date" } } },
]);
Specific to your schema:
db.collection.aggregate([
{
$project: {
dateAttempted: 1,
userId: 1,
topicId: 1,
ekgId: 1,
title: 1,
quarter: {
$cond: [
{ $lte: [{ $month: "$dateAttempted" }, 3] },
"first",
{
$cond: [
{ $lte: [{ $month: "$dateAttempted" }, 6] },
"second",
{
$cond: [
{ $lte: [{ $month: "$dateAttempted" }, 9] },
"third",
"fourth",
],
},
],
},
],
},
},
},
{ $group: { _id: { quarter: "$quarter" }, results: { $push: "$$ROOT" } } },
]);
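The nested $cond is just a chain of "less than or equal" checks. A minimal plain-JavaScript sketch of the same mapping and of the $group/$push step (quarterName and groupByQuarter are hypothetical names; months are read in UTC, matching the default time zone of $month):

```javascript
// Mirrors the nested $cond: months 1-3 -> "first", 4-6 -> "second",
// 7-9 -> "third", everything else -> "fourth".
function quarterName(month) {
  return month <= 3 ? "first" : month <= 6 ? "second" : month <= 9 ? "third" : "fourth";
}

// Groups dates by quarter like the $group stage with $push.
function groupByQuarter(dates) {
  const groups = {};
  for (const date of dates) {
    const q = quarterName(date.getUTCMonth() + 1); // getUTCMonth() is 0-based
    if (!groups[q]) groups[q] = [];
    groups[q].push(date);
  }
  return groups;
}

const dates = ["2012-10-11", "2013-02-27", "2013-01-12", "2013-03-11", "2013-07-14"].map(
  (s) => new Date(s + "T00:00:00Z")
);
console.log(groupByQuarter(dates));
```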
You could use the following $project stage to compute each document's quarter number (1 to 4), then $group on dateQuarter:
{
$project : {
dateAttempted : 1,
dateQuarter: {
$trunc : { $add: [{ $divide: [{ $subtract: [{ $month: "$dateAttempted" }, 1] }, 3] }, 1] }
}
}
}
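The arithmetic in that stage is trunc((month - 1) / 3 + 1). A quick plain-JavaScript check of the formula over all twelve months (quarterNumber is a hypothetical name):

```javascript
// Same arithmetic as the $trunc/$add/$divide/$subtract expression.
const quarterNumber = (month) => Math.trunc((month - 1) / 3 + 1);

console.log(Array.from({ length: 12 }, (_, i) => quarterNumber(i + 1)));
// [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```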
Starting in MongoDB 5.0, this is a perfect use case for the new $dateTrunc aggregation operator:
// { date: ISODate("2012-10-11") }
// { date: ISODate("2013-02-27") }
// { date: ISODate("2013-01-12") }
// { date: ISODate("2013-03-11") }
// { date: ISODate("2013-07-14") }
db.collection.aggregate([
{ $group: {
_id: { $dateTrunc: { date: "$date", unit: "quarter" } },
total: { $count: {} }
}}
])
// { _id: ISODate("2012-10-01"), total: 1 }
// { _id: ISODate("2013-01-01"), total: 3 }
// { _id: ISODate("2013-07-01"), total: 1 }
$dateTrunc truncates each date to the beginning of its quarter (the truncation unit); it works a bit like a modulo on dates per quarter.
Quarters in the output are identified by their first day (Q3 2013 is 2013-07-01). You can always reformat them, for instance with a $dateToString projection.
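To make the truncation concrete, here is a plain-JavaScript sketch that snaps each date to the first day of its quarter in UTC (the default time zone of $dateTrunc) and counts per quarter start, using the sample dates above. quarterStart and countByQuarter are hypothetical names.

```javascript
// Truncates a Date to the first day (00:00 UTC) of its quarter.
function quarterStart(date) {
  const firstMonthOfQuarter = Math.floor(date.getUTCMonth() / 3) * 3; // 0, 3, 6 or 9
  return new Date(Date.UTC(date.getUTCFullYear(), firstMonthOfQuarter, 1));
}

// Counts dates per quarter start, mirroring the $group with $count.
function countByQuarter(dates) {
  const counts = new Map();
  for (const date of dates) {
    const key = quarterStart(date).toISOString().slice(0, 10);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const dates = ["2012-10-11", "2013-02-27", "2013-01-12", "2013-03-11", "2013-07-14"].map(
  (s) => new Date(s + "T00:00:00Z")
);
console.log(countByQuarter(dates));
```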