Save each month, each week etc. MongoDB logic - mongodb

I'm having a hard time wrapping my head around how to logically solve an issue of mine regarding data that is being read from an API and inserted into MongoDB.
Let's say I have a field called "apples" whose amount changes from month to month due to seasonal effects, and I want to record these changes up to 6 months back. What do I do? Obviously I can't save new values for months that have passed, but looking forward, what can I do to save November's value for November and then December's value for December?
I would like to use NodeJS for this btw.
Sorry if I am unclear, it was even hard to explain!
Kind regards,
Erik

It sounds like you want to group things together. MongoDB has an aggregation framework for this.
It can do many things, and one of them is grouping.
You can read more about that in the $group documentation.
You can insert each apple reading (document) separately for the given date.
So for example:
In "2017-11-26T16:00:00Z" we have 6 apples and price 15
In "2017-11-25T16:00:00Z" we have 4 apples and price 16
In "2017-10-25T16:00:00Z" we have 9 apples and price 30
1
Lets say we have these three entries:
/* 1 */
{
"_id" : ObjectId("5a1adc774d8a2fe38bec83e4"),
"date" : ISODate("2017-11-26T16:00:00.000Z"),
"apples" : 6,
"price" : 15
}
/* 2 */
{
"_id" : ObjectId("5a1adc924d8a2fe38bec83e8"),
"date" : ISODate("2017-11-25T16:00:00.000Z"),
"apples" : 4,
"price" : 16
}
/* 3 */
{
"date" : ISODate("2017-10-25T16:00:00.000Z"),
"apples" : 9,
"price" : 30
}
If we want to group them by month and sum the apples per month, we could do the following:
db.yourCollection.aggregate([
  {
    $project: {
      month: { $month: "$date" }, // extract the month number (1-12) from the date
      apples: 1, // pass the apples value through unchanged
      price: 1 // pass the price value through unchanged
    }
  },
  {
    $group: { // grouping phase
      _id: "$month", // this is what we group by
      monthApples: { $sum: "$apples" }, // sum the apples per month
      monthPrice: { $sum: "$price" } // sum the price per month
    }
  }
])
In the $project we can make use of date aggregation operators.
The above aggregation pipeline would produce this result:
/* 1 */
{
"_id" : 10, // month (October)
"monthApples" : 9, // sum of apples
"monthPrice" : 30 // sum of price for month 10
}
/* 2 */
{
"_id" : 11, // month (November)
"monthApples" : 10, // sum of apples
"monthPrice" : 31 // sum of price for month 11
}
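One caveat with grouping on the month number alone: the same month from different years lands in the same bucket, which matters once more than twelve months of data are stored. Below is a minimal sketch in plain JavaScript of what grouping on year plus month computes for the three sample entries. The function name sumByYearMonth and the "YYYY-MM" key format are assumptions made for illustration, not part of the answer above.

```javascript
// Hypothetical helper: sums apples and price per calendar month (UTC),
// keyed by "YYYY-MM" so that the same month of different years stays separate.
function sumByYearMonth(docs) {
  const buckets = {};
  for (const doc of docs) {
    const year = doc.date.getUTCFullYear();
    const month = String(doc.date.getUTCMonth() + 1).padStart(2, "0");
    const key = `${year}-${month}`;
    if (!buckets[key]) {
      buckets[key] = { monthApples: 0, monthPrice: 0 };
    }
    buckets[key].monthApples += doc.apples;
    buckets[key].monthPrice += doc.price;
  }
  return buckets;
}

// The three sample entries from above.
const entries = [
  { date: new Date("2017-11-26T16:00:00Z"), apples: 6, price: 15 },
  { date: new Date("2017-11-25T16:00:00Z"), apples: 4, price: 16 },
  { date: new Date("2017-10-25T16:00:00Z"), apples: 9, price: 30 }
];

console.log(sumByYearMonth(entries));
```

Inside the pipeline, the equivalent would be to project both { $year: "$date" } and { $month: "$date" } and group on the pair.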
2
Now imagine we have the apple type also saved in the database.
/* 1 */
{
"_id" : ObjectId("5a1adc774d8a2fe38bec83e4"),
"date" : ISODate("2017-11-26T16:00:00.000Z"),
"apples" : 6,
"price" : 15,
"appleType" : "Goldrush"
}
/* 2 */
{
"_id" : ObjectId("5a1adc924d8a2fe38bec83e8"),
"date" : ISODate("2017-11-25T16:00:00.000Z"),
"apples" : 4,
"price" : 16,
"appleType" : "Pink Lady"
}
/* 3 */
{
"_id" : ObjectId("5a1b1c144d8a2fe38bec8a56"),
"date" : ISODate("2017-10-25T16:00:00.000Z"),
"apples" : 9,
"price" : 30,
"appleType" : "Pink Lady"
}
We could, for example, group by apple type like this:
db.yourCollection.aggregate([
{
$project:
{
apples: 1, // here we just assign the value of apples. There is no change here
price: 1, // also just assigning the value to price. Nothing is happening here.
appleType: 1
}
},
{
$group: // grouping phase
{
_id: "$appleType", // group by appletype
monthApples: {$sum: "$apples"}, // here we sum the apples per apple type
monthPrice: {$sum: "$price"} // here we sum the price per apple type
}
}
])

One possible way to model this data is to create a document for each product that stores its pricing history for a month:
{
product: "apple",
amount:[
{day: ISODate("2017-11-01T00:00:00.000Z"), price: 24},
{day: ISODate("2017-11-02T00:00:00.000Z"), price: 20},
{day: ISODate("2017-11-03T00:00:00.000Z"), price: 19},
{day: ISODate("2017-11-03T00:00:00.000Z"), price: 25}
],
quality: "best"
}
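To save November's value under November and December's under December going forward, each incoming reading can be appended to the document for its own month. Below is a sketch of how the arguments for such an upsert might be built in Node.js. The month field, its "YYYY-MM" format and the function name are assumptions added for illustration; the amount array is the one from the document above.

```javascript
// Hypothetical helper: builds the filter/update/options triple that could be
// passed to the Node.js driver's collection.updateOne(filter, update, options).
function buildMonthlyUpsert(product, day, price) {
  // "YYYY-MM" bucket key derived from the reading's date in UTC (assumed field).
  const month = day.toISOString().slice(0, 7);
  return {
    filter: { product: product, month: month },
    update: { $push: { amount: { day: day, price: price } } },
    options: { upsert: true } // creates the month's document on its first reading
  };
}

const op = buildMonthlyUpsert("apple", new Date("2017-11-02T00:00:00Z"), 20);
console.log(JSON.stringify(op, null, 2));
```

With a connected collection this could be applied as collection.updateOne(op.filter, op.update, op.options). Reading the last six months then becomes a query on the six most recent month keys.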

Related

How to find the amount of objects that user have in particular date?

I am learning mongo and I am trying to provide two metrics for given user in given time range. Precisely I need to calculate this kind of array of objects that represents state of the backpack for the particular day:
{
data: [
{ date: '2020-01-01', itemsCount: 1, itemsSize: 5 },
{ date: '2020-01-02', itemsCount: 3, itemsSize: 12 },
...
]
}
where itemsCount is the total number of all user items and itemsSize is the sum of sizes of all items.
I have a mongodb collection of four types of events with the structure as below:
{
type: "backpack.created", // type of event
backpackId: 1,
timestamp: 1604311699, // timestamp in seconds when event occurred
ownerId: 1,
size: 15, // sum of sizes of all items located in the backpack
itemsCount: 5 // number of items in the backpack
}
{
type: "backpack.owner.changed",
timestamp: 1604311699,
newOwnerId: 2,
backpackId: 1,
}
{
type: "backpack.deleted",
backpackId: 1,
timestamp: 1604311699,
}
{
type: "backpack.updated",
backpackId: 1,
size: 5,
itemsCount: 25,
timestamp: 1604311699,
}
My first idea was to load all the events for the given user and time range into memory and do the calculations there, but that sounds terrible for memory usage. So I am wondering how to write a query that will give me these metrics. Is it possible to do this with MongoDB? I do not know how to handle ownership changes here.
Note: a backpack created and deleted on the same day means its contribution for that day is 0.
I do not believe what you wish to do, which is create a cross-backpack position by day, is fully served by a mongodb pipeline. The reason is that you need to track state day over day so that when, say, 3 days from now a backpack.deleted event occurs, you know how much to delete from the running aggregate position.
That said, mongodb can help you in 2 ways:
Act as a master filter of events for a range and excluding owner.changed which does not affect position.
A convenient "last event" of the day generator. Since update has new total levels, not incremental, the last update of the day is the new position; if the last event is delete, the position for that backpack becomes zero.
var sdate = new ISODate("2020-11-01");
var edate = new ISODate("2020-12-01");
c=db.foo.aggregate([
// Convert timestamp into something more filterable:
{$addFields: {D: {$toDate: {$multiply:[1000,"$timestamp"]} } }}
// Use DB to do what it does best: filter!
,{$match: {type: {$ne: 'backpack.owner.changed'},
D: {$gte: sdate, $lt: edate}
}}
// Ensure material is coming out date DESCENDING (most recent first)
// to properly set up for the $group/$first to follow:
,{$sort: {D:-1}}
// Since the timestamps include hours/mins/seconds and we only
// care about day, just turn it into string. In mongodb 5.0,
// you should use $dateTrunc to set H:M:S to 00:00:00.
,{$group: {_id: {
D: {$dateToString: {format: '%Y-%m-%d', date:'$D'}},
B: '$backpackId'
}
// Thanks to the $sort above, regardless of the $group set
// ordering of date + backpackId, taking the $first is the
// last one for that particular day:
, Lsize: {$first: '$size'}
, LitemsCount: {$first: '$itemsCount'}
, Laction: {$first: '$type'}
}}
// Now, group *again* to reorganize the content by date alone.
// This makes it easy for the client to pick up a cursor of
// dates which is the intent of the day-to-day position
// building:
,{$group: {_id: '$_id.D',
X: {$push: {B:'$_id.B'
, Lsize: '$Lsize'
, LitemsCount: '$LitemsCount'
, Laction: '$Laction'}
}
}}
// ...and of course sort by date so the client can easily
// walk forward on the cursor by date:
,{$sort: {'_id':1}}
]);
At this point you end up with something like this (there are more events in this output than the OP from my tests):
{
"_id" : "2020-11-02",
"X" : [
{
"B" : 3,
"Lsize" : 3,
"LitemsCount" : 35,
"Laction" : "backpack.created"
},
{
"B" : 2,
"Lsize" : 13,
"LitemsCount" : 9,
"Laction" : "backpack.created"
},
{
"B" : 1,
"Lsize" : 8,
"LitemsCount" : 28,
"Laction" : "backpack.updated"
}
]
}
{
"_id" : "2020-11-03",
"X" : [
{
"B" : 2,
"Lsize" : 7,
"LitemsCount" : 11,
"Laction" : "backpack.updated"
}
]
}
{
"_id" : "2020-11-04",
"X" : [
{
"B" : 1,
"Lsize" : null,
"LitemsCount" : null,
"Laction" : "backpack.deleted"
}
]
}
{
"_id" : "2020-11-05",
"X" : [
{
"B" : 3,
"Lsize" : null,
"LitemsCount" : null,
"Laction" : "backpack.deleted"
}
]
}
It is left as an exercise to the reader to walk this cursor and for each date+backpackId, accumulate a sum of size and itemsCount by backpackId. Any time a deleted event is hit, on that day the sum goes to zero. To get size and itemsCount from all the backpacks, simply ask for all the sums on a given date.
Moving the agg logic out of MongoDB also makes it easier to represent date aggregates for which there is no material, e.g.:
{ date: '2020-01-01', itemsCount: 1, itemsSize: 5 },
{ date: '2020-01-02', itemsCount: 0, itemsSize: 0 },
{ date: '2020-01-03', itemsCount: 0, itemsSize: 0 },
{ date: '2020-01-04', itemsCount: 6, itemsSize: 21},
...
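The cursor walk described above could look roughly like this in JavaScript. It is a sketch under stated assumptions: the input is the array of per-day documents produced by the pipeline (fields _id, X, B, Lsize, LitemsCount, Laction), the function name dailyPositions is made up, backpacks that already existed before the start of the range are not accounted for, and ownership is ignored.

```javascript
// Walks the per-day output in date order and keeps the latest known
// position of every backpack, emitting the totals after each day.
function dailyPositions(days) {
  const state = new Map(); // backpackId -> { size, itemsCount }
  const result = [];
  for (const day of days) {
    for (const ev of day.X) {
      if (ev.Laction === "backpack.deleted") {
        state.delete(ev.B); // a deleted backpack no longer contributes
      } else {
        // created/updated events carry new total levels, not increments
        state.set(ev.B, { size: ev.Lsize, itemsCount: ev.LitemsCount });
      }
    }
    let itemsCount = 0;
    let itemsSize = 0;
    for (const pos of state.values()) {
      itemsCount += pos.itemsCount;
      itemsSize += pos.size;
    }
    result.push({ date: day._id, itemsCount: itemsCount, itemsSize: itemsSize });
  }
  return result;
}

// First two days of the sample output shown earlier.
const days = [
  { _id: "2020-11-02", X: [
    { B: 3, Lsize: 3, LitemsCount: 35, Laction: "backpack.created" },
    { B: 2, Lsize: 13, LitemsCount: 9, Laction: "backpack.created" },
    { B: 1, Lsize: 8, LitemsCount: 28, Laction: "backpack.updated" }
  ] },
  { _id: "2020-11-03", X: [
    { B: 2, Lsize: 7, LitemsCount: 11, Laction: "backpack.updated" }
  ] }
];

console.log(dailyPositions(days));
```

Days without any event are not emitted by this sketch; the client can repeat the previous day's totals for them.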

Convert month from number to string question in Mongodb query

I am trying to get some avg number per month in the financial year. The collection is called test and the month data comes from CreateDate field. I want to get the avg price per month. The collection data is like below:
{
"_id" : ObjectId("5fd289a93f7cf02c36837ca7"),
"ClientName" : "John",
"OrderNumber" : "12345A",
"Price" : 10,
"CreateDate" : ISODate("2020-09-20T06:00:00.000Z"),
}
{
"_id" : ObjectId("5fd289a93f7cf02c36837cc7"),
"ClientName" : "John",
"OrderNumber" : "12345",
"Price" : 20,
"CreateDate" : ISODate("2020-09-12T06:00:00.000Z"),
}
I wrote the following query to get the average price per month within the financial year (from Sep to Aug):
db.test.aggregate([
{
$match: {
"CreateDate": {
$lt: ISODate("2021-08-31T00:00:00.000Z"),
$gte: ISODate("2020-09-01T00:00:00.000Z")
}
}
},
{
$group: {
_id: {$month: "$CreateDate"},
"AvgPrice": {
"$avg": "$Price",
}
}
},
{ $project:{ _id : 0 , Month: '$_id' , "AvgPrice ": '$AvgPrice' } }
])
The result I am getting is with the following format:
{
"Month" : 9,
"AvgPrice " : 15.0
}
{
"Month" : 10,
"AvgPrice " : 18.6666666666667
}
How can I display the month as a string instead of a number? For example, the following is the ideal return:
{
"Month" : "Sep",
"AvgPrice" : 15.0
}
{
"Month" : "Oct",
"AvgPrice" : 18.6666666666667
}
I also have two more questions:
I am using MongoDB 3.6. Is there any way to round the average price to two digits after the decimal point? For example, the value above would be "18.67" instead of "18.66666". MongoDB 4.2 has $round, but 3.6 does not seem to have this function.
If I want to break down by client and have the result returned like below:
{
"ClientName" : "John",
"Month" : "Sep",
"AvgPrice" : 15.0
}
{
"ClientName" : "Mary",
"Month" : "Oct",
"AvgPrice" : 18.6666666666667
}
How do I add another level to the group to break down by client and then by month?
Any help will be appreciated!
If I want to break down by client
You can add the ClientName field to the _id of the $group stage:
{
$group: {
_id: {
ClientName: "$ClientName",
month: { $month: "$CreateDate" }
},
AvgPrice: { $avg: "$Price" }
}
},
How can I display the month as a string instead of a number?
There is no direct way to get the month name in MongoDB 3.6, but you can prepare an array of month names and use $arrayElemAt to select the name by its month number:
{
$project: {
_id: 0,
ClientName: "$_id.ClientName",
Month: {
$arrayElemAt: [
["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],
"$_id.month"
]
},
AvgPrice: 1
}
}
I am using the Mongodb 3.6 version, is there any way to round up the avg price to two digit after the decimal point?
MongoDB 3.6 and below have no built-in option for this; as you already know, $round is available from MongoDB 4.2.
You can refer to the question "Rounding to 2 decimal places using MongoDB aggregation framework", which covers several tricks.
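One possible trick for 3.6, sketched here and not necessarily the one from the linked question: multiply by 100, add 0.5, take $floor, and divide by 100. The operators $multiply, $add, $floor and $divide are all available in 3.6. The stage below assumes the AvgPrice field from the $group stage and rounds half up for non-negative values only. The plain JavaScript function mirrors the same arithmetic so it can be checked outside the database; both names are made up for illustration.

```javascript
// Projection stage that could follow the $group stage (assumes an AvgPrice field).
const roundStage = {
  $project: {
    AvgPrice: {
      $divide: [
        { $floor: { $add: [{ $multiply: ["$AvgPrice", 100] }, 0.5] } },
        100
      ]
    }
  }
};

// Same arithmetic in plain JavaScript, valid for non-negative inputs.
function roundTo2(value) {
  return Math.floor(value * 100 + 0.5) / 100;
}

console.log(roundTo2(18.6666666666667)); // 18.67
```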

Group and sum day by day

This is how my collection structure looks like:
{
"_id" : ObjectId("57589d2a9108dace306602b8"),
"IDproject" : NumberLong(53),
"email" : "john.doe#gmail.com",
"dc" : ISODate("2016-06-06T22:33:13.000Z")
}
{
"_id" : ObjectId("57589d2a9108dace306602b8"),
"IDproject" : NumberLong(53),
"email" : "david.doe#gmail.com",
"dc" : ISODate("2016-06-07T22:33:13.000Z")
}
{
"_id" : ObjectId("57589d2a9108dace306602b8"),
"IDproject" : NumberLong(53),
"email" : "elizabeth.doe#gmail.com",
"dc" : ISODate("2016-06-07T22:33:13.000Z")
}
As you can see, there are two customers added on June 7th and one on June 6th. I would like to group and sum these results for the last 30 days.
It should look something like this:
{
"dc" : "2016-06-05"
"total" : 0
}
{
"dc" : "2016-06-06"
"total" : 1
}
{
"dc" : "2016-06-07"
"total" : 2
}
As you can see, there are no records on June 5th, so it's zero. It should be zero for June 4th, etc.
That would be case #1; case #2 would give the following results:
{
"dc" : "2016-06-05"
"total" : 0
}
{
"dc" : "2016-06-06"
"total" : 1
}
{
"dc" : "2016-06-07"
"total" : 3
}
I've tried this:
db.getCollection('customer').aggregate([
{$match : { IDproject : 53}},
{ $group: { _id: "$dc", total: { $sum: "$dc" } } }, ]);
But it seems complicated. This is my first time working with a NoSQL database.
Thanks.
Here's how you can get daily counts (the common idiom for a row count is {$sum: 1}).
However, you cannot obtain zeros for days that lack data, because there are no documents that would supply the grouping key for those days. You must handle these cases in PHP by generating a list of the desired dates and then checking whether there is data for each date.
db.getCollection('customer').aggregate([
  {$match: {IDproject: 53}},
  {$group: {
    // group by calendar day (UTC)
    _id: {year: {$year: "$dc"}, month: {$month: "$dc"}, day: {$dayOfMonth: "$dc"}},
    // count one per document
    total: {$sum: 1}
  }}
]);
Note that MongoDB stores dates in UTC, and the $year, $month and $dayOfMonth operators return the date parts in UTC by default, which may not be the same as in the local timezone. Before MongoDB 3.6 there were no aggregation pipeline operators that could convert timestamps to local timezones reliably; from 3.6 onward the date operators accept an optional timezone argument. Other solutions include:
saving timestamps in the local timezone (= lying to MongoDB that they are in UTC),
saving the timezone offset with the timestamp,
saving the local year, month and dayOfMonth with the timestamp.
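The zero-filling step mentioned above (generating the list of desired dates and looking up each one) is sketched here in JavaScript rather than PHP. It assumes rows shaped like the aggregation output, with _id holding year, month and day in UTC; the function name and its parameters are hypothetical.

```javascript
// Hypothetical helper: returns one entry per day starting at startIso
// ("YYYY-MM-DD"), using 0 for days the aggregation returned no row for.
function fillMissingDays(rows, startIso, numDays) {
  const totals = new Map();
  for (const row of rows) {
    const month = String(row._id.month).padStart(2, "0");
    const day = String(row._id.day).padStart(2, "0");
    totals.set(`${row._id.year}-${month}-${day}`, row.total);
  }
  const out = [];
  const cursor = new Date(`${startIso}T00:00:00Z`);
  for (let i = 0; i < numDays; i++) {
    const key = cursor.toISOString().slice(0, 10);
    out.push({ dc: key, total: totals.get(key) || 0 });
    cursor.setUTCDate(cursor.getUTCDate() + 1); // step one day forward in UTC
  }
  return out;
}

const rows = [
  { _id: { year: 2016, month: 6, day: 6 }, total: 1 },
  { _id: { year: 2016, month: 6, day: 7 }, total: 2 }
];
console.log(fillMissingDays(rows, "2016-06-05", 3));
```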

Counting number of records where date is in date range?

I have a collection with documents like below:
{startDate: ISODate("2016-01-02T00:00:00Z"), endDate: ISODate("2016-01-05T00:00:00Z")},
{startDate: ISODate("2016-01-02T00:00:00Z"), endDate: ISODate("2016-01-08T00:00:00Z")},
{startDate: ISODate("2016-01-05T00:00:00Z"), endDate: ISODate("2016-01-08T00:00:00Z")},
{startDate: ISODate("2016-01-05T00:00:00Z"), endDate: ISODate("2016-01-10T00:00:00Z")},
{startDate: ISODate("2016-01-07T00:00:00Z"), endDate: ISODate("2016-01-10T00:00:00Z")}
I would like to return a record for every date between the minimum startDate and the maximum endDate. Along with each of these records I would like to return a count of the number of records where the startDate and endDate contain this date.
So for my above example the min startDate is 1/2/2016 and the max endDate is 1/10/2016 so I would like to return all dates between those two along with the counts. See desired output below:
{date: ISODate("2016-01-02T00:00:00Z"), count: 2}
{date: ISODate("2016-01-03T00:00:00Z"), count: 2}
{date: ISODate("2016-01-04T00:00:00Z"), count: 2}
{date: ISODate("2016-01-05T00:00:00Z"), count: 4}
{date: ISODate("2016-01-06T00:00:00Z"), count: 3}
{date: ISODate("2016-01-07T00:00:00Z"), count: 4}
{date: ISODate("2016-01-08T00:00:00Z"), count: 4}
{date: ISODate("2016-01-09T00:00:00Z"), count: 2}
{date: ISODate("2016-01-10T00:00:00Z"), count: 2}
Please let me know if this doesn't make sense and I can try to explain in more detail.
I am able to do this using a loop like below:
var startDate = ISODate("2016-01-02T00:00:00Z")
var endDate = ISODate("2016-02-10T00:00:00Z")
while(startDate < endDate){
var counts = db.data.find(
{
startDate: {$lte: startDate},
endDate: {$gte: startDate}
}
).count()
print(startDate, counts)
startDate.setDate(startDate.getDate() + 1)
}
But I'm wondering if there is a way to do this using the aggregation framework. I come from a mostly SQL background, where looping to get data is often a bad idea. Does this same rule apply for MongoDB? Should I be concerned about using looping here and try to use the aggregation framework, or is this a valid solution?
Your best bet here is mapReduce. This is because you can loop values in between "startDate" and "endDate" within each document and emit for each day ( or other required interval ) between those values. Then it is just a matter of accumulating per emitted date key from all data:
db.collection.mapReduce(
function() {
for( var d = this.startDate.valueOf(); d <= this.endDate.valueOf(); d += 1000 * 60 * 60 * 24 ) {
emit(new Date(d), 1)
}
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
)
This produces results like this:
{
"results" : [
{
"_id" : ISODate("2016-01-02T00:00:00Z"),
"value" : 2
},
{
"_id" : ISODate("2016-01-03T00:00:00Z"),
"value" : 2
},
{
"_id" : ISODate("2016-01-04T00:00:00Z"),
"value" : 2
},
{
"_id" : ISODate("2016-01-05T00:00:00Z"),
"value" : 4
},
{
"_id" : ISODate("2016-01-06T00:00:00Z"),
"value" : 3
},
{
"_id" : ISODate("2016-01-07T00:00:00Z"),
"value" : 4
},
{
"_id" : ISODate("2016-01-08T00:00:00Z"),
"value" : 4
},
{
"_id" : ISODate("2016-01-09T00:00:00Z"),
"value" : 2
},
{
"_id" : ISODate("2016-01-10T00:00:00Z"),
"value" : 2
}
],
"timeMillis" : 35,
"counts" : {
"input" : 5,
"emit" : 25,
"reduce" : 9,
"output" : 9
},
"ok" : 1
}
Your dates are rounded to a day in the sample, but if they were not in real data then it is just a simple matter of date math to be applied in order to round per interval.
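For reference, the same emit-per-day and sum-per-key logic can be sketched in plain JavaScript over an in-memory array, which may help when checking the mapReduce output. The function name countPerDay and the string date keys are assumptions for illustration; like the sample data, it assumes the dates are already rounded to midnight UTC.

```javascript
const DAY_MS = 1000 * 60 * 60 * 24;

// Emits one count for every day between startDate and endDate (inclusive)
// of each document, then returns the per-day totals sorted by date.
function countPerDay(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const end = doc.endDate.getTime();
    for (let t = doc.startDate.getTime(); t <= end; t += DAY_MS) {
      const key = new Date(t).toISOString().slice(0, 10);
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .map(([date, count]) => ({ date: date, count: count }));
}

// The five sample documents from the question.
const docs = [
  { startDate: new Date("2016-01-02T00:00:00Z"), endDate: new Date("2016-01-05T00:00:00Z") },
  { startDate: new Date("2016-01-02T00:00:00Z"), endDate: new Date("2016-01-08T00:00:00Z") },
  { startDate: new Date("2016-01-05T00:00:00Z"), endDate: new Date("2016-01-08T00:00:00Z") },
  { startDate: new Date("2016-01-05T00:00:00Z"), endDate: new Date("2016-01-10T00:00:00Z") },
  { startDate: new Date("2016-01-07T00:00:00Z"), endDate: new Date("2016-01-10T00:00:00Z") }
];
console.log(countPerDay(docs));
```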
In MongoDB's aggregation framework there are stages instead of loops. It is a pipeline, and documents pass through each stage until they reach the last stage specified. That is why you see a [] when using the aggregation framework. There are several stages, to name a few: $match, $group and $project. Take a look at the documentation; it is quite simple. Anyway, that was very brief. As for your question, here is my proposal:
I have not tried this. If you can, try it and let me know if it works:
First, keep only the documents with dates in the range you want using $match. Then follow that with the $group stage.
Example:
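db.collection.aggregate([
  // keep only documents that fall inside the desired range
  {$match: {
    $and: [
      {startDate: {$gte: ISODate("2016-01-02T00:00:00Z")}},
      {endDate: {$lte: ISODate("2016-02-10T00:00:00Z")}}
    ]
  }},
  // count documents per distinct startDate/endDate pair
  {$group: {
    _id: {startDate: "$startDate", endDate: "$endDate"},
    count: {$sum: 1}
  }}
])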
If you want to group using only startDate, as in your example, replace
_id: {startDate:"$startDate",endDate:"$endDate"}
with this:
_id: "$startDate"
I hope that helps

mongo query select only first of month

Is it possible to query only the first (or last, or any single) day of the month of a Mongo date field?
I use the date aggregation operators regularly, but within a $group clause.
Basically I have a field that is already aggregated (averaged) for each day of the month. I want to select only one of these days (with its value as a representative of the entire month).
Following is a sample of a record set from Jan 1, 2014 to Feb 1, 2015, with price as the daily price and 28day_avg as the trailing monthly average for 28 days.
{ "date" : ISODate("2014-01-01T00:00:00Z"), "_id" : ObjectId("533b3697574e2fd08f431cff"), "price": 59.23, "28day_avg": 54.21}
{ "date" : ISODate("2014-01-02T00:00:00Z"), "_id" : ObjectId("533b3697574e2fd08f431cff"), "price": 58.75, "28day_avg": 54.15}
...
{ "date" : ISODate("2015-02-01T00:00:00Z"), "_id" : ObjectId("533b3697574e2fd08f431cff"), "price": 123.50, "28day_avg": 122.25}
Method 1.
I'm currently running an aggregation using $month data (and summing the price), but one issue is that I'm seeking to retrieve the underlying date value ISODate("2015-02-01T00:00:00Z") rather than the 0, 1, 2 value that comes with several of the date aggregations (that loop at the first of the week, month, year). mod(28) on a date?
Method 2.
I'd like to simply pluck out a single record of the 28day_avg as representative of the period. The 1st of the month would be adequate.
The desired output is...
_id: ISODate("2015-02-01T00:00:00Z"), value: 122.25,
_id: ISODate("2015-01-01T00:00:00Z"), value: 120.78,
_id: ISODate("2014-12-01T00:00:00Z"), value: 118.71,
...
_id: ISODate("2014-01-01T00:00:00Z"), value: 53.21,
Of course, the value will vary from method 1 to method 2, but that is fine. One is 28 days trailing while the other will account for 28, 30, 31 day months... I don't care about that so much.
A non-aggregation query would be OK too, but this one doesn't work: {"date": { "$mod": [ 28, 0 ]} }
To pick the first of the month for each month (method 2), use the following aggregation:
db.test.aggregate([
{ "$project" : { "_id" : "$date", "day" : { "$dayOfMonth" : "$date" }, "28day_avg" : 1 } },
{ "$match" : { "day" : 1 } }
])
You can't use an index for the match, so this is not efficient. I'd suggest adding another field to each document that holds the $dayOfMonth value, so you can index it and do a simple find:
{
"date" : ISODate("2014-01-01T00:00:00Z"),
"price" : 59.23,
"28day_avg" : 54.21,
"dayOfMonth" : 1
}
db.test.createIndex({ "dayOfMonth" : 1 })
db.test.find({ "dayOfMonth" : 1 }, { "_id" : 0, "date" : 1, "28day_avg" : 1 })
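If documents are inserted from application code, the extra field can be computed just before the insert. Here is a minimal sketch, assuming date is a JavaScript Date and that days are counted in UTC, as $dayOfMonth does by default; the helper name withDayOfMonth is made up.

```javascript
// Hypothetical helper: returns a copy of the document with a dayOfMonth
// field (1-31, UTC) derived from its date, ready to be inserted.
function withDayOfMonth(doc) {
  return Object.assign({}, doc, { dayOfMonth: doc.date.getUTCDate() });
}

const record = withDayOfMonth({
  date: new Date("2014-01-01T00:00:00Z"),
  price: 59.23,
  "28day_avg": 54.21
});
console.log(record.dayOfMonth); // 1
```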