How to create time series of paying customers with MongoDB Aggregate?

I have a customers model:
const CustomerSchema = new Schema({
...
activeStartDate: Date,
activeEndDate: Date
...
});
Now I want to create an aggregation that produces a time series of active customers. The output should look like:
[
{
_id: {year: 2022, month: 7},
activeCustomers: 500
},
...
]
The issue I can't figure out is how to get one customer document to count in multiple groups. A customer could be active for years, and should therefore appear in multiple time frames.

One option is:
1. Create a list of dates, one per month between the start and end dates.
2. $unwind the list to create one document per month.
3. $group by year and month and count the number of customers.
db.collection.aggregate([
  {$set: {
    months: {$map: {
      input: {
        $range: [
          0,
          {$add: [
            {$dateDiff: {
              startDate: "$activeStartDate",
              endDate: "$activeEndDate",
              unit: "month"
            }},
            1
          ]}
        ]
      },
      in: {$dateAdd: {
        startDate: {$dateTrunc: {date: "$activeStartDate", unit: "month"}},
        unit: "month",
        amount: "$$this"
      }}
    }}
  }},
  {$unwind: "$months"},
  {$group: {
    _id: {year: {$year: "$months"}, month: {$month: "$months"}},
    activeCustomers: {$sum: 1}
  }}
])
See how it works on the playground example
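To make the expand-then-group idea concrete, here is a minimal plain-JavaScript sketch of the same logic outside MongoDB. The helper names monthStarts and countActiveByMonth and the sample customers are made up for illustration; the sketch assumes every customer has both dates set and works in UTC.

```javascript
// Plain-JavaScript sketch of the expand-then-group idea (no MongoDB needed).
// monthStarts() and countActiveByMonth() are hypothetical helper names.

// Return the first day (UTC) of every calendar month touched by [start, end].
function monthStarts(start, end) {
  const months = [];
  let y = start.getUTCFullYear();
  let m = start.getUTCMonth(); // 0-based month
  const endY = end.getUTCFullYear();
  const endM = end.getUTCMonth();
  while (y < endY || (y === endY && m <= endM)) {
    months.push(new Date(Date.UTC(y, m, 1)));
    m += 1;
    if (m === 12) { m = 0; y += 1; }
  }
  return months;
}

// Count each customer once in every month they were active.
function countActiveByMonth(customers) {
  const counts = new Map();
  for (const c of customers) {
    for (const d of monthStarts(c.activeStartDate, c.activeEndDate)) {
      const key = `${d.getUTCFullYear()}-${d.getUTCMonth() + 1}`;
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return [...counts].map(([key, activeCustomers]) => {
    const [year, month] = key.split("-").map(Number);
    return { _id: { year, month }, activeCustomers };
  });
}

// Made-up sample data: one customer spans May to July, one only July.
const sampleCustomers = [
  { activeStartDate: new Date("2022-05-15T00:00:00Z"), activeEndDate: new Date("2022-07-02T00:00:00Z") },
  { activeStartDate: new Date("2022-07-10T00:00:00Z"), activeEndDate: new Date("2022-07-20T00:00:00Z") },
];
const series = countActiveByMonth(sampleCustomers);
```

In the pipeline itself, customers that are still active (no activeEndDate) would likely need handling before the $dateDiff, for example by substituting the current date with something like {$ifNull: ["$activeEndDate", "$$NOW"]}.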

Related

How can I aggregate MongoDB documents by hour?

I have documents in a collection called "files" in MongoDB 5.0.2. This collection is for user uploaded content. I want to get a count of the number of files uploaded within the last 24 hours at a 1 hour interval.
For example, user uploaded 3 files at 13:00, 2 files at 14:00, 0 files at 15:00, and so on.
I already store a "timestamp" field in my Mongo documents, which is an ISODate object created from JavaScript.
I have looked at countless Stack Overflow questions but I cannot find anything that fits my needs or that I understand.
You could use this:
{ $match: { timestamp: { $gt: moment().startOf('hour').subtract(24, 'hours').toDate() } } },
{ $group: {
  _id: {
    y: {$year: "$timestamp"},
    m: {$month: "$timestamp"},
    d: {$dayOfMonth: "$timestamp"},
    h: {$hour: "$timestamp"}
  },
  count: ...
} }
or, in MongoDB 5.0 and later, you can do
{ $group: {
  _id: { $dateTrunc: { date: "$timestamp", unit: "hour" } },
  count: ...
} }
For any datetime related operations I recommend the moment.js library
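A fuller sketch of the $dateTrunc variant, written without moment.js, might look like the following. The collection name files and the timestamp field come from the question; the helpers hourlyUploadsPipeline and fillMissingHours are hypothetical, and { $sum: 1 } is an assumption for the elided count (one document per uploaded file). Because $group only emits hours that contain at least one document, the second helper adds the empty hours on the client. Hours are truncated in UTC.

```javascript
// Sketch only: helper names are hypothetical, not part of any library.
const HOUR_MS = 60 * 60 * 1000;

// Truncate a Date to the start of its hour (UTC).
function startOfHour(date) {
  return new Date(Math.floor(date.getTime() / HOUR_MS) * HOUR_MS);
}

// Build the aggregation pipeline for the last 24 hours, bucketed per hour.
function hourlyUploadsPipeline(now = new Date()) {
  const since = new Date(startOfHour(now).getTime() - 24 * HOUR_MS);
  return [
    { $match: { timestamp: { $gte: since } } },
    { $group: {
        _id: { $dateTrunc: { date: "$timestamp", unit: "hour" } },
        count: { $sum: 1 }, // assumption: one document per uploaded file
    } },
    { $sort: { _id: 1 } },
  ];
}

// The pipeline skips hours without uploads, so add them with count 0.
function fillMissingHours(rows, since, hours) {
  const byHour = new Map(rows.map((r) => [r._id.getTime(), r.count]));
  const out = [];
  for (let i = 0; i < hours; i++) {
    const t = since.getTime() + i * HOUR_MS;
    out.push({ _id: new Date(t), count: byHour.get(t) || 0 });
  }
  return out;
}

// In mongosh this would be run as: db.files.aggregate(hourlyUploadsPipeline())
```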

MongoDB query group with 'sub' group

From a product stocks log I have created a MongoDB collection. The relevant fields are: sku, stock and date. Every time a product's stock is updated there is a new entry with the total stock.
The skus are made up of two parts: a parent part, say 'A', and a variant or child part, say '1', '2', '3', etc. So a sku might look like this: 'A2'.
I can query for a single product's stock, grouped by day, with this query:
[{
$match: {
sku: 'A2'
}
},
{
$group: {
_id: {
year: {$year: '$date'},
day: {$dayOfYear: '$date'}
},
stock: {
$min: '$stock'
},
date: {
$first: '$date'
}
}
},
{
$sort: {
date: 1
}
}]
Note: I want the minimum stock for each day.
But I need the minimum stocks of all variants added up. I can change the $match stage to:
[{
$match: {
sku: /^A/
}
}
How do I create a 'sub' group in the $group stage?
EDIT:
The data looks like this:
{
sku: 'A1',
date: '2015-01-01',
stock: 15
}
{
sku: 'A1',
date: '2015-01-01',
stock: 14
}
{
sku: 'A2',
date: '2015-01-01',
stock: 20
}
Two stocks for 'A1' and one for 'A2' on a single day. My query (all skus grouped by day) would give me stock 14 as a result ($min of the 3 values). But I want the result to be 34: 20 (min for A2) plus 14 (min for A1).
If you add the sku to the _id field in the group phase it will aggregate on that as well, i.e. group per sku, year & day.
db.stocks.aggregate(
[
{
$group: {
_id: {
sku: '$sku',
year: {$year: '$date'},
day: {$dayOfYear: '$date'}
},
stock: {
$min: '$stock'
},
date: {
$first: '$date'
}
}
},
{
$sort: {
date: 1
}
}]
)
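To get from the per-sku minimums to the single daily total the question asks for (34 in the example), one possible extension is a second $group that sums the first stage's results per day. The sketch below is an assumption layered on the answer above, not part of it; it also assumes date is stored as a BSON Date (the sample documents show strings, which $year would not accept). The function sumOfDailyMinimums is a hypothetical plain-JavaScript mirror used only to check the arithmetic on the sample data.

```javascript
// Hypothetical two-stage pipeline: min per sku per day, then sum per day.
const dailyTotalPipeline = [
  { $match: { sku: /^A/ } },
  // Stage 1: minimum stock for each sku on each day
  { $group: {
      _id: { sku: "$sku", year: { $year: "$date" }, day: { $dayOfYear: "$date" } },
      stock: { $min: "$stock" },
      date: { $first: "$date" },
  } },
  // Stage 2: add the per-sku minimums together for each day
  { $group: {
      _id: { year: "$_id.year", day: "$_id.day" },
      stock: { $sum: "$stock" },
      date: { $first: "$date" },
  } },
  { $sort: { date: 1 } },
];

// Plain-JavaScript mirror of the same two steps, for checking sample data.
function sumOfDailyMinimums(docs) {
  const minPerSkuDay = new Map();
  for (const { sku, date, stock } of docs) {
    const key = `${date}|${sku}`;
    const prev = minPerSkuDay.get(key);
    minPerSkuDay.set(key, prev === undefined ? stock : Math.min(prev, stock));
  }
  const totalPerDay = new Map();
  for (const [key, stock] of minPerSkuDay) {
    const day = key.split("|")[0];
    totalPerDay.set(day, (totalPerDay.get(day) || 0) + stock);
  }
  return totalPerDay;
}

// Sample documents from the question.
const sampleStocks = [
  { sku: "A1", date: "2015-01-01", stock: 15 },
  { sku: "A1", date: "2015-01-01", stock: 14 },
  { sku: "A2", date: "2015-01-01", stock: 20 },
];
const totals = sumOfDailyMinimums(sampleStocks);
```

In mongosh the pipeline would be passed as db.stocks.aggregate(dailyTotalPipeline).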

Is it possible to use a MongoDB aggregate query with date matching?

I'm trying to get the count of certain items grouped on certain dates.
This is working using the following aggregate query:
// this query works, without matching dates
[
{'$match': {
'some_id': ObjectId('foobar'),
'some_boolean_value': true
}
},
{'$project':
{'day':
{'$substr': ['$some_date', 0, 10]}}
},
{'$group': {_id: '$day', count: { '$sum': 1 }}},
{'$sort': {_id: -1}}
]
The next step is that I want to use this query but with date limits.
I want the count, grouped per day, between certain date limits.
// the query below does not work as soon as date matching is added
// this query always returns 0 documents
[
{'$match': {
'some_id': ObjectId('foobar'),
'some_boolean_value': true,
'some_date':
{
'$gte': '2015-08-01T00:00:00.000Z',
'$lte': '2015-08-31T23:59:59.999Z'
}
}
},
{'$project':
{'day':
{'$substr': ['$some_date', 0, 10]}}
},
{'$group': {_id: '$day', count: { '$sum': 1 }}},
{'$sort': {_id: -1}}
]
You want to filter documents and match only those in a specified datetime window. But you use string comparison instead of date comparison.
Therefore replace this:
'$gte': '2015-08-01T00:00:00.000Z',
'$lte': '2015-08-31T23:59:59.999Z'
with this:
'$gte': new Date('2015-08-01T00:00:00.000Z'),
'$lte': new Date('2015-08-31T23:59:59.999Z')
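Put together, a corrected pipeline could be sketched as below. MongoDB's comparison operators generally only match values of the same BSON type as the query value, which is why string bounds never match a stored Date. The helper name dailyCountPipeline is hypothetical, the id value is a placeholder, and two substitutions are mine rather than the answer's: an exclusive upper bound ($lt the first instant of the next month) instead of 23:59:59.999, and $dateToString instead of $substr for the day key.

```javascript
// Hypothetical helper: count matching documents per day within [from, toExclusive).
function dailyCountPipeline(someId, from, toExclusive) {
  return [
    { $match: {
        some_id: someId,
        some_boolean_value: true,
        // Date objects, not strings, so the comparison is date vs. date
        some_date: { $gte: from, $lt: toExclusive },
    } },
    { $group: {
        _id: { $dateToString: { format: "%Y-%m-%d", date: "$some_date" } },
        count: { $sum: 1 },
    } },
    { $sort: { _id: -1 } },
  ];
}

const augustPipeline = dailyCountPipeline(
  "<some id>", // placeholder; pass the real ObjectId here
  new Date("2015-08-01T00:00:00.000Z"),
  new Date("2015-09-01T00:00:00.000Z")
);
```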

Doing a sum with mongo db aggregation framework

I have the following kind of docs in a MongoDB collection:
{ _id:xx,
iddoc:yy,
type1:"sometype1",
type2:"sometype2",
date:
{
year:2015,
month:4,
day:29,
type:"day"
},
count:23
}
I would like to do a sum over the field count grouping by iddoc for all docs where:
type1 in ["type1A","type1B",...]
where type2 in ["type2A","type2B",...]
date.year: 2015,
date.month: 4,
date.type: "day"
date.day between 4 and 7
I would like then to sort these sums.
I think this is probably easy to do within the MongoDB aggregation framework, but I am new to it and would appreciate a tip to get started.
This is straightforward to do with an aggregation pipeline:
db.test.aggregate([
// Filter the docs based on your criteria
{$match: {
type1: {$in: ['type1A', 'type1B']},
type2: {$in: ['type2A', 'type2B']},
'date.year': 2015,
'date.month': 4,
'date.type': 'day',
'date.day': {$gte: 4, $lte: 7}
}},
// Group by iddoc and sum the count field
{$group: {
_id: '$iddoc',
sum: {$sum: '$count'}
}},
// Sort by sum, descending
{$sort: {sum: -1}}
])
If I understood you correctly:
db.col.aggregate([
  {
    $match: {
      type1: {$in: ["type1A", "type1B", ...]},
      type2: {$in: ["type2A", "type2B", ...]},
      "date.year": 2015,
      "date.month": 4,
      "date.day": {$gte: 4, $lte: 7},
      "date.type": "day"
    }
  },
  {
    $group: {
      _id: "$iddoc",
      total_count: {$sum: "$count"}
    }
  },
  {$sort: {total_count: 1}}
])
This filters on date.day between 4 and 7 inclusive (use $gt and $lt instead to exclude the endpoints). It sorts the results in ascending order, from lower to higher; for a descending sort, use:
{ $sort: {total_count: -1}}
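For checking what the pipeline should return on a handful of documents, the same match, group and sort steps can be mirrored in plain JavaScript. This is only an illustrative sketch: sumCountsByIddoc and the sample documents are made up, and the day range is inclusive to match $gte/$lte.

```javascript
// Hypothetical mirror of $match + $group + $sort (descending) in plain JavaScript.
function sumCountsByIddoc(docs, { type1, type2, year, month, fromDay, toDay }) {
  const totals = new Map();
  for (const d of docs) {
    const matches =
      type1.includes(d.type1) &&
      type2.includes(d.type2) &&
      d.date.type === "day" &&
      d.date.year === year &&
      d.date.month === month &&
      d.date.day >= fromDay && d.date.day <= toDay; // inclusive, like $gte/$lte
    if (!matches) continue;
    totals.set(d.iddoc, (totals.get(d.iddoc) || 0) + d.count);
  }
  return [...totals]
    .map(([_id, total_count]) => ({ _id, total_count }))
    .sort((a, b) => b.total_count - a.total_count); // like {$sort: {total_count: -1}}
}

// Made-up sample documents in the shape shown in the question.
const day = (n) => ({ year: 2015, month: 4, day: n, type: "day" });
const sampleDocs = [
  { iddoc: "a", type1: "type1A", type2: "type2A", date: day(4), count: 23 },
  { iddoc: "a", type1: "type1B", type2: "type2B", date: day(7), count: 2 },
  { iddoc: "b", type1: "type1A", type2: "type2B", date: day(5), count: 30 },
  { iddoc: "b", type1: "type1A", type2: "type2A", date: day(8), count: 100 }, // outside 4..7
  { iddoc: "c", type1: "other", type2: "type2A", date: day(5), count: 50 },   // type1 not in list
];
const sums = sumCountsByIddoc(sampleDocs, {
  type1: ["type1A", "type1B"],
  type2: ["type2A", "type2B"],
  year: 2015, month: 4, fromDay: 4, toDay: 7,
});
```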

Analog for group concat in sql

In an aggregation process I've got this data:
{
"_id" : "billing/DefaultController/actionIndex",
"min_time" : 0.033,
"max_time" : 5.25,
"exec_time" : 555.490999999997,
"qt" : 9059,
"count" : 2,
"date" : [
ISODate("2014-02-10T00:00:00.000Z"),
ISODate("2014-02-11T00:00:00.000Z")
]
},
How can I change my query:
db.page_speed_reduced.aggregate([
{$group: {
_id: "$value.route",
min_time: {$min: "$value.min_time"},
max_time: {$max: "$value.max_time"},
exec_time: {$sum: "$value.exec_time"},
qt: {$sum: "$value.qt"},
count: {$sum: NumberInt(1)},
date: {$push: "$_id.date"},
}}
]);
to get "$date" as a concatenated string:
2014-02-10, 2014-02-11
UPDATE:
I tried this variant, but MongoDB generated an error:
db.page_speed_reduced.aggregate([
{$group: {
_id: "$value.route",
min_time: {$min: "$value.min_time"},
max_time: {$max: "$value.max_time"},
exec_time: {$sum: "$value.exec_time"},
qt: {$sum: "$value.qt"},
count: {$sum: NumberInt(1)},
date: {$push: "test sting"},
}},
{$project: {
'date': {$concat: ['$date']}
//'date': {$concat: '$date'} //some error
}}
]);
uncaught exception: aggregate failed: {
"errmsg" : "exception: $concat only supports strings, not Array",
"code" : 16702,
"ok" : 0
}
'date': {$concat: '$date'}
As per comments so far it is unclear what you are grouping or what you want as the end result, other than to say that you want to get your dates concatenated into something like "just the day" with no hours or minutes together. Presumably you want those distinct days for some purpose.
There are various Date Operators in the pipeline you can use on dates, and there is the $concat operator as well. Unfortunately all of the Date Operators produce an integer as their result, and for the sort of date string you want, $concat will only work with strings. The other problem is that you cannot cast the integer into a string type within aggregation.
But you can use sub-documents, here we'll just work with the date:
db.record.aggregate([
// Unwind the array to work with it
{$unwind: "$date"},
// project into our new 'day' document
{$project:{
day: {
year: {$year: "$date"},
month: {$month: "$date"},
day: {$dayOfMonth: "$date"}
}
} },
// optionally sort if date order is important [ newest -> oldest ]
{$sort: { "day.year": -1, "day.month": -1, "day.day": -1}},
// Wind back unique values into the array
{$group: {_id:"$_id", days: {$addToSet: "$day"} }}
])
So it's not a string, but it can easily be post-processed into one and, most importantly, it's grouped and sortable.
The principles remain the same if you want the unique dates this way as an array at the end or whether you want to group totals by those dates. So primarily keep in mind the $unwind and $project parts using the date operators.
--EDIT--
With thanks to the community: as shown in this post, $substr has an undocumented behavior in which integers can be cast to strings.
db.record.aggregate([
// Unwind the array to work with it
{$unwind: "$date"},
// project into our new 'day' document
{$project:{
day: {
year: {$year: "$date"},
month: {$month: "$date"},
day: {$dayOfMonth: "$date"}
}
} },
// optionally sort if date order is important [ newest -> oldest ]
{$sort: { "day.year": -1, "day.month": -1, "day.day": -1}},
// now we are going to project to a string ** magic #heinob **
{$project: {
day: {$concat: [
{$substr: [ "$day.year", 0, 4 ]},
"-",
{$substr: [ "$day.month", 0, 2 ]},
"-",
{$substr: [ "$day.day", 0, 2 ]}
]}
}},
// Wind back unique values into the array
{$group: {_id:"$_id", days: {$addToSet: "$day"} }}
])
And now the days are strings. As I noted before, if the ordering is important to you then the best approach is to project into a document type as has been done and sort on the numeric keys. Naturally the $project that transforms the date can be wound into the $group stage for brevity, which is probably what you want to do when working with the whole document.
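On newer servers the same result can probably be had without the $substr trick: $dateToString (available since MongoDB 3.0) formats a date as a string, and $reduce (since 3.4) can join an array of strings. The stage below is a sketch under those assumptions, meant to follow the $group from the question that pushes the dates into a date array; the names joinDatesStage and joinDays are hypothetical. joinDays is a plain-JavaScript equivalent for post-processing on the client instead.

```javascript
// Hypothetical $project stage: turn an array of dates into "YYYY-MM-DD, YYYY-MM-DD".
const joinDatesStage = {
  $project: {
    date: {
      $reduce: {
        // Format every date in the array as a day string first
        input: {
          $map: {
            input: "$date",
            as: "d",
            in: { $dateToString: { format: "%Y-%m-%d", date: "$$d" } },
          },
        },
        initialValue: "",
        // Append ", " before every element except the first
        in: {
          $concat: [
            "$$value",
            { $cond: [{ $eq: ["$$value", ""] }, "", ", "] },
            "$$this",
          ],
        },
      },
    },
  },
};

// Client-side equivalent: format each Date as its UTC day and join.
function joinDays(dates) {
  return dates.map((d) => d.toISOString().slice(0, 10)).join(", ");
}

const joined = joinDays([
  new Date("2014-02-10T00:00:00.000Z"),
  new Date("2014-02-11T00:00:00.000Z"),
]);
```

Note that a $project like this keeps only _id and date; to keep the other grouped fields, list them in the $project or use $addFields instead.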
This link might give you a hint:
http://docs.mongodb.org/manual/reference/operator/aggregation/concat/
year: {$concat: [ $year ]}