MongoDB get count of field per season from MM/DD/YYYY date field

I am facing a problem in MongoDB. Suppose I have the following collection:
{ id: 1, issueDate: "07/05/2021", code: "31" },
{ id: 2, issueDate: "12/11/2020", code: "14" },
{ id: 3, issueDate: "02/11/2021", code: "98" },
{ id: 4, issueDate: "01/02/2021", code: "14" },
{ id: 5, issueDate: "06/23/2020", code: "14" },
{ id: 6, issueDate: "07/01/2020", code: "31" },
{ id: 7, issueDate: "07/05/2022", code: "14" },
{ id: 8, issueDate: "07/02/2022", code: "20" },
{ id: 9, issueDate: "07/02/2022", code: "14" }
The date field is in the format MM/DD/YYYY. My goal is to get the count of items for each season: spring (March-May), summer (June-August), autumn (September-November) and winter (December-February).
The result I'm expecting is:
count of fields for each season:
{ "_id" : "Summer", "count" : 6 }
{ "_id" : "Winter", "count" : 3 }
top 2 codes (first and second most recurring) per season:
{ "_id" : "Summer", "codes" : {14, 31} }
{ "_id" : "Winter", "codes" : {14, 98} }
How can this be done?

You should never store date/time values as strings; always store proper Date objects.
You can use the $setWindowFields operator for that:
db.collection.aggregate([
// Convert string into Date
{ $set: { issueDate: { $dateFromString: { dateString: "$issueDate", format: "%m/%d/%Y" } } } },
// Determine the season (0..3)
{
$set: {
season: { $mod: [{ $toInt: { $divide: [{ $add: [{ $subtract: [{ $month: "$issueDate" }, 1] }, 1] }, 3] } }, 4] }
}
},
// Count codes per season
{
$group: {
_id: { season: "$season", code: "$code" },
count: { $count: {} },
}
},
// Rank occurrence of codes per season
{
$setWindowFields: {
partitionBy: "$_id.season",
sortBy: { count: -1 },
output: {
rank: { $denseRank: {} },
count: { $sum: "$count" }
}
}
},
// Get only top 2 ranks
{ $match: { rank: { $lte: 2 } } },
// Final grouping
{
$group: {
_id: "$_id.season",
count: { $first: "$count" },
codes: { $push: "$_id.code" }
}
},
// Cosmetic: map the season index to a readable name
{
$set: {
season: {
$switch: {
branches: [
{ case: { $eq: ["$_id", 0] }, then: 'Winter' },
{ case: { $eq: ["$_id", 1] }, then: 'Spring' },
{ case: { $eq: ["$_id", 2] }, then: 'Summer' },
{ case: { $eq: ["$_id", 3] }, then: 'Autumn' },
]
}
}
}
}
])
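The season arithmetic in the pipeline above can be checked outside MongoDB. This is only a plain JavaScript sketch, not part of the answer: seasonIndex and SEASON_NAMES are hypothetical names, and it assumes $toInt truncates the quotient the way Math.trunc does.

```javascript
// Mirrors the $mod / $toInt / $divide expression from the pipeline.
// The "- 1" and "+ 1" there cancel out, leaving trunc(month / 3) mod 4.
const SEASON_NAMES = ["Winter", "Spring", "Summer", "Autumn"];

function seasonIndex(month) {
  return Math.trunc(month / 3) % 4;
}

// Months 1..12 mapped to season names, in calendar order.
const seasons = [];
for (let month = 1; month <= 12; month++) {
  seasons.push(SEASON_NAMES[seasonIndex(month)]);
}
console.log(seasons.join(" "));
```

December gives 4 before the modulo, which is why the final mod 4 is needed to fold it back into winter.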

Here are some hints:
Use $group with _id set to the $month of issueDate (after converting the string to a date), and use the $sum accumulator to get the count per month.
To turn a month into a season, divide the month by 3 with $divide and truncate the result with $toInt (0 and 4 are both winter), then assign the category using $cond.
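To make the hints concrete, here is a rough simulation in plain JavaScript rather than a pipeline. It uses the issueDate values from the question; perMonth, perSeason and names are made-up identifiers for illustration only.

```javascript
// issueDate strings from the question (MM/DD/YYYY).
const issueDates = [
  "07/05/2021", "12/11/2020", "02/11/2021", "01/02/2021", "06/23/2020",
  "07/01/2020", "07/05/2022", "07/02/2022", "07/02/2022",
];

// Step 1: count per month (the role of $group on $month with $sum: 1).
const perMonth = new Map();
for (const d of issueDates) {
  const month = Number(d.slice(0, 2));
  perMonth.set(month, (perMonth.get(month) || 0) + 1);
}

// Step 2: bucket each month into a season and add up the monthly counts.
const names = ["Winter", "Spring", "Summer", "Autumn"];
const perSeason = {};
for (const [month, count] of perMonth) {
  const season = names[Math.trunc(month / 3) % 4];
  perSeason[season] = (perSeason[season] || 0) + count;
}
console.log(perSeason);
```

For the sample data this gives 6 for Summer and 3 for Winter, matching the counts the question asks for.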

Another option:
db.collection.aggregate([
{
$addFields: {
"season": {
$switch: {
branches: [
{
case: {
$in: [
{
$substr: [
"$issueDate",
0,
2
]
},
[
"06",
"07",
"08"
]
]
},
then: "Summer"
},
{
case: {
$in: [
{
$substr: [
"$issueDate",
0,
2
]
},
[
"03",
"04",
"05"
]
]
},
then: "Spring"
},
{
case: {
$in: [
{
$substr: [
"$issueDate",
0,
2
]
},
[
"12",
"01",
"02"
]
]
},
then: "Winter"
},
{
case: {
$in: [
{
$substr: [
"$issueDate",
0,
2
]
},
[
"09",
"10",
"11"
]
]
},
then: "Autumn"
}
],
default: "No date found."
}
}
}
},
{
$group: {
_id: {
s: "$season",
c: "$code"
},
cnt1: {
$sum: 1
}
}
},
{
$sort: {
cnt1: -1
}
},
{
$group: {
_id: "$_id.s",
codes: {
$push: "$_id.c"
},
cnt: {
$sum: "$cnt1"
}
}
},
{
$project: {
_id: 0,
season: "$_id",
count: "$cnt",
codes: {
"$slice": [
"$codes",
2
]
}
}
}
])
Explained:
Add a season field with $switch, based on the month extracted from the issueDate string.
Group by season and code to count each combination.
$sort by that count in descending order.
Group by season to build an array of codes ordered from most to least recurring, and sum the counts.
Project the fields into the desired output and $slice the codes array to keep only the first two, i.e. the most recurring ones.
Comment:
Indeed keeping dates in string is not a good idea in general ...
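The sort-then-slice idea can be tried on the question's data without a database. This is just an illustrative sketch in plain JavaScript: seasonOf, counts and topCodes are hypothetical names, and the Autumn case (months 09 to 11) is included here for completeness.

```javascript
// Sample documents from the question (only the fields that matter here).
const docs = [
  { issueDate: "07/05/2021", code: "31" }, { issueDate: "12/11/2020", code: "14" },
  { issueDate: "02/11/2021", code: "98" }, { issueDate: "01/02/2021", code: "14" },
  { issueDate: "06/23/2020", code: "14" }, { issueDate: "07/01/2020", code: "31" },
  { issueDate: "07/05/2022", code: "14" }, { issueDate: "07/02/2022", code: "20" },
  { issueDate: "07/02/2022", code: "14" },
];

// Same idea as the $switch on the first two characters of issueDate.
function seasonOf(issueDate) {
  const mm = issueDate.slice(0, 2);
  if (["03", "04", "05"].includes(mm)) return "Spring";
  if (["06", "07", "08"].includes(mm)) return "Summer";
  if (["09", "10", "11"].includes(mm)) return "Autumn";
  if (["12", "01", "02"].includes(mm)) return "Winter";
  return "No date found.";
}

// Count per season and code, like the first $group.
const counts = {};
for (const { issueDate, code } of docs) {
  const season = seasonOf(issueDate);
  counts[season] = counts[season] || {};
  counts[season][code] = (counts[season][code] || 0) + 1;
}

// Sort codes by count descending and keep the first two, like $sort + $slice.
const topCodes = {};
for (const [season, byCode] of Object.entries(counts)) {
  topCodes[season] = Object.entries(byCode)
    .sort((a, b) => b[1] - a[1])
    .slice(0, 2)
    .map(([code]) => code);
}
console.log(topCodes);
```

Ties between codes with the same count are not broken in any particular way here, and the pipeline has the same limitation.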

Related

Need help to MongoDB aggregate $group state

I have a collection of 1000 documents like this:
{
"_id" : ObjectId("628b63d66a5951db6bb79905"),
"index" : 0,
"name" : "Aurelia Gonzales",
"isActive" : false,
"registered" : ISODate("2015-02-11T04:22:39.000+0000"),
"age" : 41,
"gender" : "female",
"eyeColor" : "green",
"favoriteFruit" : "banana",
"company" : {
"title" : "YURTURE",
"email" : "aureliagonzales#yurture.com",
"phone" : "+1 (940) 501-3963",
"location" : {
"country" : "USA",
"address" : "694 Hewes Street"
}
},
"tags" : [
"enim",
"id",
"velit",
"ad",
"consequat"
]
}
I want to group those by year and gender. For example, in 2014 there were 105 male registrations and 131 female registrations. The final result should look like this:
{
_id:2014,
male:105,
female:131,
total:236
},
{
_id:2015,
male:136,
female:128,
total:264
}
I have got as far as grouping by registered year and gender, like this:
db.persons.aggregate([
{ $group: { _id: { year: { $year: "$registered" }, gender: "$gender" }, total: { $sum: NumberInt(1) } } },
{ $sort: { "_id.year": 1,"_id.gender":1 } }
])
which returns documents like this:
{
"_id" : {
"year" : 2014,
"gender" : "female"
},
"total" : 131
}
{
"_id" : {
"year" : 2014,
"gender" : "male"
},
"total" : 105
}
Please guide me on how to get from this to the desired result.
db.collection.aggregate([
{
"$group": { //Group things
"_id": "$_id.year",
"gender": {
"$addToSet": {
k: "$_id.gender",
v: "$total"
}
},
sum: { //Sum it
$sum: "$total"
}
}
},
{
"$project": {//Reshape it
g: {
"$arrayToObject": "$gender"
},
_id: 1,
sum: 1
}
},
{
"$project": { //Reshape it
_id: 1,
"g.female": 1,
"g.male": 1,
sum: 1
}
}
])
Just add one more group stage to your aggregation pipeline, like this:
db.persons.aggregate([
{ $group: { _id: { year: { $year: "$registered" }, gender: "$gender" }, total: { $sum: NumberInt(1) } } },
{ $sort: { "_id.year": 1,"_id.gender":1 } },
{
$group: {
_id: "$_id.year",
male: {
$sum: {
$cond: {
if: {
$eq: [
"$_id.gender",
"male"
]
},
then: "$total",
else: 0
}
}
},
female: {
$sum: {
$cond: {
if: {
$eq: [
"$_id.gender",
"female"
]
},
then: "$total",
else: 0
}
}
},
total: {
$sum: "$total"
}
},
}
]);
In this last step we group by year and compute the male and female counts conditionally; total is simply the sum of the counts irrespective of gender.
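As a quick sanity check of the conditional sum, the same fold can be written in plain JavaScript over the two intermediate documents shown in the question. This is only a sketch; rows and totals are made-up names.

```javascript
// Output of the first $group stage, as shown in the question.
const rows = [
  { _id: { year: 2014, gender: "female" }, total: 131 },
  { _id: { year: 2014, gender: "male" }, total: 105 },
];

// Second $group: one document per year, adding total to the matching
// gender field and 0 to the other one (the role of $cond inside $sum).
const totals = {};
for (const { _id, total } of rows) {
  if (!totals[_id.year]) {
    totals[_id.year] = { _id: _id.year, male: 0, female: 0, total: 0 };
  }
  const out = totals[_id.year];
  out.male += _id.gender === "male" ? total : 0;
  out.female += _id.gender === "female" ? total : 0;
  out.total += total;
}
console.log(Object.values(totals));
```

For 2014 this produces male 105, female 131 and total 236, which is the shape the question asks for.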
Besides the solution with two $group stages that @Gibbs proposed in the comment, you can achieve the result as below:
1. $group - Group by year of registered. Add gender value into genders array.
2. $sort - Order by _id.
3. $project - Decorate output documents.
3.1. male - Get the size of array from $filter the value of "male" in "genders" array.
3.2. female - Get the size of array from $filter the value of "female" in "genders" array.
3.3. total - Get the size of "genders" array.
Use this method if you only need to count and return the "male" and "female" gender fields.
db.collection.aggregate([
{
$group: {
_id: {
$year: "$registered"
},
genders: {
$push: "$gender"
}
}
},
{
$sort: {
"_id": 1
}
},
{
$project: {
_id: 1,
male: {
$size: {
$filter: {
input: "$genders",
cond: {
$eq: [
"$$this",
"male"
]
}
}
}
},
female: {
$size: {
$filter: {
input: "$genders",
cond: {
$eq: [
"$$this",
"female"
]
}
}
}
},
total: {
$size: "$genders"
}
}
}
])
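The $filter plus $size combination behaves much like filter().length on an array. Here is a small plain JavaScript sketch of the three steps; the four people are invented sample data, and gendersByYear and report are hypothetical names.

```javascript
// Hypothetical sample: only the two fields the pipeline reads.
const people = [
  { registered: new Date("2014-03-01T00:00:00Z"), gender: "male" },
  { registered: new Date("2014-07-15T00:00:00Z"), gender: "female" },
  { registered: new Date("2014-11-30T00:00:00Z"), gender: "female" },
  { registered: new Date("2015-02-11T04:22:39Z"), gender: "female" },
];

// $group: collect every gender value per registration year (UTC year,
// which is what $year returns when no timezone is given).
const gendersByYear = new Map();
for (const { registered, gender } of people) {
  const year = registered.getUTCFullYear();
  gendersByYear.set(year, [...(gendersByYear.get(year) || []), gender]);
}

// $sort + $project: count matches with filter().length, like $size over $filter.
const report = [...gendersByYear.entries()]
  .sort(([a], [b]) => a - b)
  .map(([year, genders]) => ({
    _id: year,
    male: genders.filter((g) => g === "male").length,
    female: genders.filter((g) => g === "female").length,
    total: genders.length,
  }));
console.log(report);
```

Note that this approach holds every gender value of a year in one array, so for very large groups the two-stage $group version may be the lighter option.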

MongoDB - Query calculation and group multiple items

Let's say I have this data:
{"Plane":"5546","Time":"55.0", City:"LA"}
{"Plane":"5548","Time":"25.0", City:"CA"}
{"Plane":"5546","Time":"6.0", City:"LA"}
{"Plane":"5548","Time":"5.0", City:"CA"}
{"Plane":"5555","Time":"15.0", City:"XA"}
{"Plane":"5555","Time":"8.0", City:"XA"}
and more; this is just a sample of the data.
I want to group by plane and city and calculate over the times. This is the expected output:
{"_id":["5546","LA"],"Sum":2,"LateRate":1,"Prob":0.5}
Sum is the sum over all the times, Late is the sum over all the times with time > "15", and Prob is Late/Sum.
The code I have tried but it still is missing something:
db.Collection.aggregate([
{
$project: {
Sum: 1,
Late: {
$cond: [{ $gt: ["$Time", 15.0] }, 1, 0]
},
prob:1
}
},
{
$group:{
_id:{Plane:"$Plane", City:"$City"},
Sum: {$sum:1},
Late: {$sum: "$Late"}
}
},
{
$addFields: {
prob: {
"$divide": [
"$Late",
"$Sum"
]
}
}
},
])
db.collection.aggregate([
{
$project: {
Time: 1,
Late: {
$cond: [
{
$gt: [
{
$toDouble: "$Time"
},
15.0
]
},
"$Time",
0
]
},
prob: 1,
Plane: 1,
City: 1
}
},
{
$group: {
_id: {
Plane: "$Plane",
City: "$City"
},
Sum: {
$sum: {
"$toDouble": "$Time"
}
},
Late: {
$sum: {
$toDouble: "$Late"
}
}
}
},
{
$addFields: {
prob: {
"$divide": [
"$Late",
"$Sum"
]
}
}
}
])
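$project limits the fields passed to the next stage, so Plane, City and Time must be included explicitly for the later $group to see them.
Time is stored as a string, and not all relational and arithmetic operations work on strings, so convert it with $toDouble first.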
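For comparison, here is a plain JavaScript sketch of the same grouping on the sample data. It is an assumption-laden variant: it counts records instead of summing the time values, because the expected output in the question (Sum 2, LateRate 1, Prob 0.5) looks like counts. The names flights, groups and stats are made up.

```javascript
// Sample data from the question; Time is a string, as stored.
const flights = [
  { Plane: "5546", Time: "55.0", City: "LA" },
  { Plane: "5548", Time: "25.0", City: "CA" },
  { Plane: "5546", Time: "6.0", City: "LA" },
  { Plane: "5548", Time: "5.0", City: "CA" },
  { Plane: "5555", Time: "15.0", City: "XA" },
  { Plane: "5555", Time: "8.0", City: "XA" },
];

// Group by (Plane, City); convert Time to a number before comparing,
// which is the job $toDouble does in the pipeline.
const groups = new Map();
for (const { Plane, City, Time } of flights) {
  const key = `${Plane}|${City}`;
  const g = groups.get(key) || { _id: { Plane, City }, Sum: 0, Late: 0 };
  g.Sum += 1;
  g.Late += Number(Time) > 15 ? 1 : 0;
  groups.set(key, g);
}

// Equivalent of the final $addFields with $divide.
const stats = [...groups.values()].map((g) => ({ ...g, prob: g.Late / g.Sum }));
console.log(stats);
```

If the sums of the time values are wanted instead, as in the pipeline above, replace the two increments with Number(Time) and the conditional Number(Time).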

MongoDB sum of all fields with integer values

Inside the aggregation framework, is it possible in some way, for each document like the one below:
{
"Title": "Number orders",
"2021-03-16": 3,
"2021-03-15": 6,
"2021-03-19": 1,
"2021-03-14": 19
}
Obtain a new document like this?
{
"Title": "Number orders",
"2021-03-16": 3,
"2021-03-15": 6,
"2021-03-19": 1,
"2021-03-14": 19,
"Total": 29
}
Basically, I want a new field that holds the sum of all the values of the fields that are integers.
Another thing to take into consideration is that the date fields are dynamic, so one week could be like the one in the example, and the following week the fields would become like
{
"Title": "Number orders",
"2021-03-23": 3,
"2021-03-22": 6,
"2021-03-26": 1,
"2021-03-21": 19
}
Thanks!
Demo - https://mongoplayground.net/p/724nerJUQtK
$$ROOT is the entire document. Convert it to an array of key/value pairs with $objectToArray inside $addFields, use $sum to add the values up into total, and remove allData using $unset.
db.collection.aggregate([
{ $addFields: { allData: { "$objectToArray": "$$ROOT" } } },
{ $addFields: { "total": { $sum: "$allData.v" } } },
{ $unset: "allData" }
])
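The same three steps have a close analogue in plain JavaScript, which may help to see what each stage does. This is only a sketch on the document from the question; allData mirrors the temporary field, and withTotal is a made-up name.

```javascript
// The example document from the question.
const doc = {
  Title: "Number orders",
  "2021-03-16": 3,
  "2021-03-15": 6,
  "2021-03-19": 1,
  "2021-03-14": 19,
};

// Like $objectToArray on $$ROOT: one { k, v } pair per field.
const allData = Object.entries(doc).map(([k, v]) => ({ k, v }));

// Like $sum over "$allData.v": non-numeric values such as Title are skipped.
const total = allData.reduce(
  (acc, { v }) => (typeof v === "number" ? acc + v : acc),
  0
);

// The temporary array is never attached to the result, like $unset: "allData".
const withTotal = { ...doc, total };
console.log(withTotal);
```

Because the field names are never listed explicitly, the sum keeps working when the date keys change from week to week.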
Based on your older question, I think this might help:
db.collection.aggregate([
{
$group: {
_id: {
dDate: "$deliveryDay",
name: "$plate.name"
},
v: { $sum: "$plate.quantity" }
}
},
{
$group: {
_id: "$_id.name",
Total: { $sum: "$v" },
array: {
$push: { k: "$_id.dDate", v: "$v" }
}
}
},
{
$addFields: {
array: {
$concatArrays: [
[{ k: "Title", v: "Number orders" }],
"$array",
[{ k: "Total", v: "$Total" }]
]
}
}
},
{
$replaceRoot: {
newRoot: { $arrayToObject: "$array" }
}
}
])
Output:
/* 1 */
{
"Title" : "Number orders",
"2021-01-16" : 2,
"Total" : 2
},
/* 2 */
{
"Title" : "Number orders",
"2021-01-14" : 1,
"2021-01-16" : 3,
"Total" : 4
}

mongoDB aggregate with two percent by $group

My dataset :
{
"codepostal": 84000,
"siren": 520010234,
"type": "home"
},
{
"codepostal": 84000,
"siren": 0,
"type": "home"
},
{
"codepostal": 84000,
"siren": 450123003,
"type": "appt"
} ...
My pipeline (total is an integer) :
var pipeline = [
{
$match: { codepostal: 84000 }
},
{
$group: {
_id: { type: "$type" },
count: { $sum: 1 }
}
},
{
$project: {
percentage: { $multiply: ["$count", 100 / total] }
}
},
{
$sort: { _id: 1 }
}
];
Results :
[ { _id: { type: 'appt' }, percentage: 66 },
{ _id: { type: 'home' }, percentage: 34 } ]
The expected result should also count whether "siren" is set to 0 or to another number.
Count siren=0 => part
Count siren!=0 => pro
[ { _id: { type: 'appt' }, totalPercent: 66, proPercent: 20, partPercent: 80},
{ _id: { type: 'home' }, totalPercent: 34, proPercent: 45, partPercent: 55 } ]
Thanks a lot for your help !!
You can use $cond to get 0 or 1 for pro/part documents depending on the value of the siren field. Then it's easy to calculate totals for each type of document:
[
{
$match: { codepostal: 84000 }
},
{
$group: {
_id: { type: "$type" },
count: { $sum: 1 },
countPro: { $sum: {$cond: [{$eq:["$siren",0]}, 0, 1]} },
countPart: {$sum: {$cond: [{$eq:["$siren",0]}, 1, 0]} }
}
},
{
$project: {
totalPercent: { $multiply: ["$count", 100 / total] },
proPercent: { $multiply: ["$countPro", {$divide: [100, "$count"]}] },
partPercent: { $multiply: ["$countPart", {$divide: [100, "$count"]}] }
}
},
{
$sort: { _id: 1 }
}
]
Note that I used $divide to calculate pro/part percentage relative to the count of document within type group.
For your sample documents (total = 3) output will be:
[
{
"_id" : { "type" : "appt" },
"totalPercent" : 33.3333333333333,
"proPercent" : 100,
"partPercent" : 0
},
{
"_id" : { "type" : "home" },
"totalPercent" : 66.6666666666667,
"proPercent" : 50,
"partPercent" : 50
}
]
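The numbers above can be reproduced by hand. Here is a plain JavaScript sketch of the same arithmetic on the three sample documents; byType and percentages are hypothetical names, and total is taken as the number of sample documents.

```javascript
// The three sample documents from the question.
const docs = [
  { codepostal: 84000, siren: 520010234, type: "home" },
  { codepostal: 84000, siren: 0, type: "home" },
  { codepostal: 84000, siren: 450123003, type: "appt" },
];
const total = docs.length; // 3 for this sample

// $group: per type, count all documents and split them by siren === 0.
const byType = {};
for (const { siren, type } of docs) {
  if (!byType[type]) byType[type] = { count: 0, countPro: 0, countPart: 0 };
  const g = byType[type];
  g.count += 1;
  if (siren === 0) g.countPart += 1;
  else g.countPro += 1;
}

// $project + $sort: totalPercent is relative to all documents,
// proPercent and partPercent are relative to the type's own count.
const percentages = Object.entries(byType)
  .map(([type, g]) => ({
    _id: { type },
    totalPercent: g.count * (100 / total),
    proPercent: g.countPro * (100 / g.count),
    partPercent: g.countPart * (100 / g.count),
  }))
  .sort((a, b) => a._id.type.localeCompare(b._id.type));
console.log(percentages);
```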

How to simplify this Aggregation Framework Query (with Date Formatting & Comparisons)

I already have a working query for what I need (included below), but I can't help but feel that there must be a better way to accomplish this. My requirements are fairly simple, but the resulting query itself is the definition of eye-bleed code.
Here's a sample document that we're iterating over (with irrelevant properties removed):
> db.Thing.find().limit(1).pretty()
{
"_id": ObjectId(...),
"created": ISODate(...),
"updated": ISODate(...)
}
My requirements for the query are:
Only match on Things where created > updated.
Group on the YYYY-MM value of the created field, and reduce to a count.
Output should look like the following:
{ "count": 93592, "month": "2014-06" },
{ "count": 81629, "month": "2014-07" },
{ "count": 126183, "month": "2014-08" },
...
Again, this feels like it should be really simple. Here's my correctly functioning query that currently does this:
db.Thing.aggregate([
{ $project: {
cmpDates: { $cmp: ['$created', '$updated'] },
created: '$created'
}},
{ $match: {
cmpDates: { $ne: 0 }
}},
{ $project: {
month: {
$concat: [
{ $substr: [ { $year: '$created' }, 0, 4 ] },
'-',
{ $cond: [
{ $lte: [ { $month: '$created' }, 9 ] },
{ $concat: [
'0',
{ $substr: [ { $month: '$created' }, 0, 2 ] }
]},
{ $substr: [ { $month: '$created' }, 0, 2 ] }
] }
]
},
_id: 0
}},
{ $group: {
_id: '$month',
count: { $sum: 1 }
}},
{ $project: {
month: '$_id',
count: 1,
_id: 0
}},
{ $sort: { month: 1 } }
]);
My question: Can this query be simplified, and if so, how?
Thanks!
Try this:
db.test.aggregate([
{ "$project" : {
"cmpDates" : { "$cmp" : ["$created", "$updated"] },
"createdYear" : { "$year" : "$created" },
"createdMonth" : { "$month" : "$created" }
} },
{ "$match" : { "cmpDates" : { "$ne" : 0 } } },
{ "$group" : {
"_id" : { "y" : "$createdYear", "m" : "$createdMonth" },
"count" : { "$sum" : 1 }
} }
])
The big difference is that I used a compound key for the group, so I'm grouping the pair (year, month) instead of constructing a string value YYYY-MM to accomplish the same purpose.
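If the YYYY-MM string is still wanted in the final output, the compound key can be formatted afterwards on the client. A minimal sketch in plain JavaScript, assuming the grouped documents have the { y, m } key from the pipeline above; monthLabel is a hypothetical helper, and the counts are the ones quoted in the question.

```javascript
// Documents shaped like the output of the $group stage with the compound key.
const grouped = [
  { _id: { y: 2014, m: 8 }, count: 126183 },
  { _id: { y: 2014, m: 6 }, count: 93592 },
  { _id: { y: 2014, m: 7 }, count: 81629 },
];

// Hypothetical helper: turn { y, m } into a zero-padded "YYYY-MM" label.
function monthLabel({ y, m }) {
  return `${y}-${String(m).padStart(2, "0")}`;
}

// Reshape to { count, month } and sort by month, as in the desired output.
const report = grouped
  .map(({ _id, count }) => ({ count, month: monthLabel(_id) }))
  .sort((a, b) => a.month.localeCompare(b.month));
console.log(report);
```

Where the server version supports it, $dateToString with the format "%Y-%m" is another way to produce the same label inside the pipeline.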