MongoDB query group with 'sub' group - mongodb

From a product stock log I have created a MongoDB collection. The relevant fields are: sku, stock and date. Every time a product's stock is updated, a new entry is added with the total stock.
The skus are made up of two parts: a parent part, say 'A', and a variant or child part, say '1', '2', '3', etc. So a sku might look like this: 'A2'.
I can query for a single product's stock, grouped by day, with this query:
[{
    $match: {
        sku: 'A2'
    }
},
{
    $group: {
        _id: {
            year: {$year: '$date'},
            day: {$dayOfYear: '$date'}
        },
        stock: {
            $min: '$stock'
        },
        date: {
            $first: '$date'
        }
    }
},
{
    $sort: {
        date: 1
    }
}]
Note: I want the minimum stock for each day.
But what I need is the minimum stock of every variant, added up per day. I can change the $match stage to select all variants:
{
    $match: {
        sku: /^A/
    }
}
How do I create a 'sub' group in the $group stage?
EDIT:
The data looks like this:
{
sku: 'A1',
date: '2015-01-01',
stock: 15
}
{
sku: 'A1',
date: '2015-01-01',
stock: 14
}
{
sku: 'A2',
date: '2015-01-01',
stock: 20
}
That is two stock entries for 'A1' and one for 'A2' on a single day. My query (all skus grouped by day) would give me a stock of 14 (the $min of the three values). But I want the result to be 34: 20 (min for A2) plus 14 (min for A1).

If you add the sku to the _id field in the $group stage, it will aggregate on that as well, i.e. group per sku, year and day, which gives you the minimum stock per sku per day.
db.stocks.aggregate([
    {
        $group: {
            _id: {
                sku: '$sku',
                year: {$year: '$date'},
                day: {$dayOfYear: '$date'}
            },
            stock: {
                $min: '$stock'
            },
            date: {
                $first: '$date'
            }
        }
    },
    {
        $sort: {
            date: 1
        }
    }
])
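The grouping above yields one minimum per sku per day. To arrive at the single daily figure the question asks for (34 in the sample data), those per-sku minimums still have to be added up. A minimal sketch of one way to do that, assuming the date field holds real BSON dates (the sample data shows strings) and reusing the stocks collection name from the snippet above: add a second $group stage that drops sku from the key and sums the minimums. The pipeline constant is only there for illustration.

```javascript
// Sketch: per-day sum of each variant's minimum stock.
const pipeline = [
  // Keep only the variants of parent 'A'.
  { $match: { sku: /^A/ } },
  // Stage 1: minimum stock per sku per day.
  {
    $group: {
      _id: {
        sku: "$sku",
        year: { $year: "$date" },
        day: { $dayOfYear: "$date" }
      },
      stock: { $min: "$stock" },
      date: { $first: "$date" }
    }
  },
  // Stage 2: drop sku from the key and add the per-sku minimums together.
  {
    $group: {
      _id: { year: "$_id.year", day: "$_id.day" },
      stock: { $sum: "$stock" },
      date: { $first: "$date" }
    }
  },
  { $sort: { date: 1 } }
];

// In the mongo shell (not run here): db.stocks.aggregate(pipeline)
```

The first $group does the 'sub' grouping, the second one collapses it again, so no nested group is needed inside a single stage.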

Related

How to create time series of paying customers with MongoDB Aggregate?

I have a customers model:
const CustomerSchema = new Schema({
...
activeStartDate: Date,
activeEndDate: Date
...
});
Now I want to create an aggregation that produces a time series of active customers, with output like:
[
    {
        _id: {year: 2022, month: 7},
        activeCustomers: 500
    },
    ...
]
The issue I can't figure out is how to get one customer document to count in multiple groups. A customer could be active for years, and should therefore appear in multiple time frames.
One option is:
1. Create a list of dates according to the difference in months between the start and end dates
2. $unwind to create a document per month
3. $group by year and month and count the number of customers
db.collection.aggregate([
{$set: {
months: {$map: {
input: {
$range: [
0,
{$add: [
{$dateDiff: {
startDate: "$activeStartDate",
endDate: "$activeEndDate",
unit: "month"
}},
1]}
]
},
in: {$dateAdd: {
startDate: {$dateTrunc: {date: "$activeStartDate", unit: "month"}},
unit: "month",
amount: "$$this"
}}
}}
}},
{$unwind: "$months"},
{$group: {
_id: {year: {$year: "$months"}, month: {$month: "$months"}},
activeCustomers: {$sum: 1}
}}
])
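To make the expansion step more concrete, here is a small client-side JavaScript sketch of the same idea: turn each customer into one entry per active month, then count entries per month. The helper names activeMonths and countActiveByMonth and the sample data are hypothetical; the code only mirrors the pipeline's logic on in-memory objects (using UTC months) and is not what the server runs.

```javascript
// Hypothetical helper: list every calendar month (UTC) from start to end, inclusive.
function activeMonths(start, end) {
  const first = start.getUTCFullYear() * 12 + start.getUTCMonth();
  const last = end.getUTCFullYear() * 12 + end.getUTCMonth();
  const months = [];
  for (let m = first; m <= last; m++) {
    months.push({ year: Math.floor(m / 12), month: (m % 12) + 1 });
  }
  return months;
}

// Hypothetical helper: the $unwind + $group steps, done on in-memory objects.
function countActiveByMonth(customers) {
  const counts = new Map();
  for (const c of customers) {
    for (const { year, month } of activeMonths(c.activeStartDate, c.activeEndDate)) {
      const key = `${year}-${String(month).padStart(2, "0")}`;
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return counts;
}

// Made-up sample: one customer active May to July 2022, one active only in July 2022.
const sample = [
  { activeStartDate: new Date(Date.UTC(2022, 4, 15)), activeEndDate: new Date(Date.UTC(2022, 6, 2)) },
  { activeStartDate: new Date(Date.UTC(2022, 6, 10)), activeEndDate: new Date(Date.UTC(2022, 6, 20)) }
];
const counts = countActiveByMonth(sample);
console.log(counts.get("2022-07")); // 2
```

The first customer is counted in three months and the second in one, which is exactly the "one document, several groups" effect that the $unwind stage produces in the pipeline.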

mongodb: Select the latest entry in an embedded array

Given the following data of Items with a price history:
{
item: "Item A",
priceHistory: [
{
date: ISODate("2021-04-01T08:32:45.561Z"),
value: 100
},
{
date: ISODate("2021-04-02T08:32:45.561Z"),
value: 200
},
{
date: ISODate("2021-04-04T08:32:45.561Z"),
value: 400
},
{
date: ISODate("2021-04-03T08:32:45.561Z"),
value: 300
},
]
},{
item: "Item B",
priceHistory: [
{
date: ISODate("2021-04-01T08:32:45.561Z"),
value: 1
}
]
}, ...
Note that the priceHistory field is not sorted.
I want to find the latest price for each item:
{
item: "Item A",
price: 400
},{
item: "Item B",
price: 1
}, ...
Now I'm struggling to select the LATEST entry of the priceHistory.
What I tried already:
- I know that I can use { $unwind: "$priceHistory" } to get a result for each entry in priceHistory.
- With $max: "$priceHistory.date" I can find the latest date.
- I know that since MongoDB 4.4 there is $last to get the last item in an array, but that is not useful here since the items are not in order.
But I struggle to bring it all together.
On a side note, maybe the problem lies within the data model itself? Would it make sense to segregate price history into its own collection, and only store the latest price on the item itself?
Demo - https://mongoplayground.net/p/-LQPcTn_-Aj
db.collection.aggregate([
{ $unwind: "$priceHistory" }, // unwind to individual documents
{ $sort: { "priceHistory.date": -1 } }, // sort by priceHistory.date to get max date at the top (descending)
{
$group: {
_id: "$_id", // group by id back and get priceHistory sorted in descending order by date
price: { $first: "$priceHistory.value" }, // get the first price which is for max date record
item: { $first: "$item"}
}
}
])
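A possible alternative that avoids the $unwind/$sort/$group round trip builds on the $max idea from the question. This is a sketch that swaps in a different technique ($let, $filter, $map and $arrayElemAt inside a single $project stage), not the approach of the answer above. It assumes every item has a non-empty priceHistory; the variable names latest and entry are arbitrary.

```javascript
// Sketch: pick the value of the priceHistory entry carrying the latest date.
const pipeline = [
  {
    $project: {
      item: 1,
      price: {
        $let: {
          // Latest date found anywhere in this item's priceHistory.
          vars: { latest: { $max: "$priceHistory.date" } },
          in: {
            // Take the first matching entry's value.
            $arrayElemAt: [
              {
                $map: {
                  // Keep only the entries dated exactly `latest` ...
                  input: {
                    $filter: {
                      input: "$priceHistory",
                      as: "entry",
                      cond: { $eq: ["$$entry.date", "$$latest"] }
                    }
                  },
                  as: "entry",
                  // ... and reduce each of them to its value.
                  in: "$$entry.value"
                }
              },
              0
            ]
          }
        }
      }
    }
  }
];

// In the mongo shell (not run here): db.collection.aggregate(pipeline)
```

Because everything happens per document, no sort across the whole collection is required; whether that matters in practice depends on the data size.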

Add numbers in an array in MongoDB

I have inserted the following two documents into my collection. I have been trying to find a way to show the total number of domestic students and the total number of international students over the recorded years for every university. I tried aggregating with $sum in a $project stage, but I just get the answer 0. I am also not sure whether it adds up all the domestic students from each year and all the international students from each year.
db.universities.insertMany([
{country: "Australia", city: "Melbourne", name: "SUT",
domestic_students : [
{ year: 2014, number: 24774 },
{ year: 2015, number: 23166 },
{ year: 2016, number: 21913 },
{ year: 2017, number: 21715}],
international_students : [
{ year: 2014, number: 32178 },
{ year: 2015, number: 36780 },
{ year: 2016, number: 67899 },
{ year: 2017, number: 65321 }]
},
{country: "Australia", city: "Sydney", name: "UTS",
domestic_students : [
{ year: 2014, number: 67891 },
{ year: 2015, number: 56312 },
{ year: 2016, number: 45679 },
{ year: 2017, number: 71235}]
}
]);
You can sum over an array of numbers, but you can't sum over an array of subdocuments: $sum ignores non-numeric values, which would explain the 0. One way to get at the numbers is $unwind.
$unwind "explodes" the array into separate documents (mid-aggregation). So if you do:
$unwind: {
path: '$domestic_students',
preserveNullAndEmptyArrays: false
}
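You'll end up with several documents, each holding a single domestic_students subdocument (not an array of subdocuments).
I think this does what you want. It unwinds and sums one array at a time, because unwinding both arrays before grouping pairs every domestic row with every international row and inflates both totals:
db.universities.aggregate([{
    $unwind: {
        path: '$domestic_students',
        preserveNullAndEmptyArrays: true // keep universities without this array
    }
}, {
    $group: {
        _id: '$_id', // one group per university document
        name: { $first: '$name' },
        international_students: { $first: '$international_students' },
        dtotal: { $sum: '$domestic_students.number' }
    }
}, {
    $unwind: {
        path: '$international_students',
        preserveNullAndEmptyArrays: true // UTS has no international_students
    }
}, {
    $group: {
        _id: '$_id',
        name: { $first: '$name' },
        dtotal: { $first: '$dtotal' },
        itotal: { $sum: '$international_students.number' }
    }
}])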
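As an alternative that skips $unwind altogether, the sketch below swaps in a different technique. $sum cannot add up the subdocuments themselves, but a dotted path such as '$domestic_students.number' resolves to a plain array of numbers, which $sum inside a $project stage totals per document. The field names come from the inserted documents above; the pipeline constant is only there for illustration.

```javascript
// Sketch: per-university totals without $unwind.
const pipeline = [
  {
    $project: {
      _id: 0,
      name: 1,
      // "$domestic_students.number" resolves to an array of numbers per document.
      dtotal: { $sum: "$domestic_students.number" },
      // A missing array (as for UTS in the sample) sums to 0.
      itotal: { $sum: "$international_students.number" }
    }
  }
];

// In the mongo shell (not run here): db.universities.aggregate(pipeline)
```

Since each university is a single document here, no $group stage is needed; one would only be required to roll the totals up further, for example per country.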
I like using MongoDB Compass to help with aggregations, because I can see the stages and their outcome on a sample.

MongoDB aggregation: $unwind after grouping by date

I have this model for purchases:
{
purchase_date: 2018-03-11 00:00:00.000,
total_cost: 400,
items: [
{
title: 'Pringles',
price: 200,
quantity: 2,
category: 'Snacks'
}
]
}
What I'm trying to do is, first of all, group the purchases by date, like so:
{$group: {
    _id: {
        date: '$purchase_date',
        items: '$items'
    }
}}
However, what I want to do now is group each day's purchases by items[].category and calculate how much was spent per category on that day. I was able to do that for one day, but once I grouped the purchases by date I was no longer able to $unwind the items.
I tried passing the path $items and it isn't found at all. If I try $_id.$items or _id.$items, in both cases I get an error stating that it is not a valid path for $unwind.
You can use purchase_date and items.category as the grouping _id, but you need to $unwind items first. Then you can add another $group to collect all the category groups per day:
db.col.aggregate([
{ $unwind: "$items" },
{
$group: {
_id: {
purchase_date: "$purchase_date",
category: "$items.category",
},
total: { $sum: { $multiply: [ "$items.price", "$items.quantity" ] } }
}
},
{
$group: {
_id: "$_id.purchase_date",
categories: { $push: { name: "$_id.category", total: "$total" } }
}
}
])

MongoDB Aggregate for a sum on a per week basis for all prior weeks

I've got a series of docs in MongoDB. An example doc would be
{
createdAt: Mon Oct 12 2015 09:45:20 GMT-0700 (PDT),
year: 2015,
week: 41
}
Imagine these span all weeks of the year and there can be many in the same week. I want to aggregate them in such a way that the resulting values are a sum of each week and all its prior weeks counting the total docs.
So if there were something like 10 in the first week of the year and 20 in the second, the result could be something like
[{ week: 1, total: 10, weekTotal: 10},
{ week: 2, total: 30, weekTotal: 20}]
Creating an aggregation to find the weekTotal is easy enough. Including a projection to show the first part
db.collection.aggregate([
{
$project: {
"createdAt": 1,
year: {$year: "$createdAt"},
week: {$week: "$createdAt"},
_id: 0
}
},
{
$group: {
_id: {year: "$year", week: "$week"},
weekTotal : { $sum : 1 }
}
},
]);
But getting past this to sum based on that week and those weeks preceding is proving tricky.
The aggregation framework is not able to do this as all operations can only effectively look at one document or grouping boundary at a time. In order to do this on the "server" you need something with access to a global variable to keep the "running total", and that means mapReduce instead:
db.collection.mapReduce(
function() {
Date.prototype.getWeekNumber = function(){
var d = new Date(+this);
d.setHours(0,0,0);
d.setDate(d.getDate()+4-(d.getDay()||7));
return Math.ceil((((d-new Date(d.getFullYear(),0,1))/8.64e7)+1)/7);
};
emit({ year: this.createdAt.getFullYear(), week: this.createdAt.getWeekNumber() }, 1);
},
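// reduce is called with (key, values)
function(key, values) {
return Array.sum(values);
},
{
out: { inline: 1 },
scope: { total: 0 },
// finalize is called with (key, reducedValue); "total" comes from scope
finalize: function(key, value) {
total += value;
return { total: total, weekTotal: value }
}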
}
)
If you can live with the operation occurring on the "client", then you need to loop through the aggregation result and similarly sum up the totals:
var total = 0;
db.collection.aggregate([
{ "$group": {
"_id": {
"year": { "$year": "$createdAt" },
"week": { "$week": "$createdAt" }
},
"weekTotal": { "$sum": 1 }
}},
{ "$sort": { "_id": 1 } }
]).map(function(doc) {
total += doc.weekTotal;
doc.total = total;
return doc;
});
Whether this should happen on the server or on the client is a matter of what makes the most sense for you. But since the aggregation pipeline has no such "globals", you probably should not be looking at it for any further processing without outputting to another collection anyway.
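On newer servers there may be a third option. Assuming MongoDB 5.0 or later (an assumption about the deployment; the answer above predates that release), the $setWindowFields stage can compute a cumulative sum inside the pipeline itself. A minimal sketch, reusing the createdAt field and the weekTotal and total names from above:

```javascript
// Sketch: running total per week via a window over all preceding documents.
const pipeline = [
  // Count documents per (year, week), as in the client-side version.
  {
    $group: {
      _id: { year: { $year: "$createdAt" }, week: { $week: "$createdAt" } },
      weekTotal: { $sum: 1 }
    }
  },
  // For each week, sum weekTotal over every earlier week plus the current one.
  {
    $setWindowFields: {
      sortBy: { "_id.year": 1, "_id.week": 1 },
      output: {
        total: {
          $sum: "$weekTotal",
          window: { documents: ["unbounded", "current"] }
        }
      }
    }
  },
  // Make the output order explicit.
  { $sort: { "_id.year": 1, "_id.week": 1 } }
];

// In the mongo shell (not run here): db.collection.aggregate(pipeline)
```

As written, the total keeps accumulating across years. Adding partitionBy: "$_id.year" to the $setWindowFields stage should restart it at the beginning of each year, if that is the intended behaviour.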