Aggregating last n days in MongoDB

I'm trying to build a query in MongoDB (to be run with pymongo) to get a sum by groups for the last 30 days. I'm really struggling to combine both the aggregate function and date differences. The SQL equivalent of the query would be:
SELECT item, sum(volume) from table
where date >= DATEADD(DAY, -30, now())
group by item
Can anyone help?

This is quite simple: first use $match to select the documents by their date field, then use $group to calculate the sum per item.
First, the setup the code needs:
import pymongo
from datetime import datetime, timedelta, timezone

connection = pymongo.MongoClient("connection url")
collection = connection["db_name"]["collection_name"]
Main Aggregation:
# BSON cannot encode datetime.date, so use a full datetime (UTC here)
thirty_days_ago = datetime.now(timezone.utc) - timedelta(days=30)
results = list(collection.aggregate([
    {
        "$match": {
            # $gte mirrors the SQL "date >= ..." condition
            "date": {"$gte": thirty_days_ago}
        }
    },
    {
        "$group": {
            # "$item" (with the $ prefix) groups by the field's value
            "_id": "$item",
            "sum": {"$sum": "$volume"}
        }
    }
]))
I had to guess some of the field names as you did not provide a schema, but you should be able to easily adapt it to your needs.
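To make the window length a parameter (the "last n days" of the title), the pipeline can be built by a small helper. This is a minimal sketch under the same guessed field names (date, item, volume); build_last_n_days_pipeline is a hypothetical helper name, not part of pymongo, and the `now` argument exists only to make the cutoff reproducible.

```python
from datetime import datetime, timedelta, timezone


def build_last_n_days_pipeline(n_days, now=None):
    """Return a pipeline that sums `volume` per `item` over the last n_days."""
    # Assumption: `date` is stored as a BSON date, so a datetime compares correctly.
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=n_days)
    return [
        {"$match": {"date": {"$gte": cutoff}}},
        {"$group": {"_id": "$item", "sum": {"$sum": "$volume"}}},
    ]


pipeline = build_last_n_days_pipeline(30)
# With a live connection: results = list(collection.aggregate(pipeline))
```

Keeping the $match first lets MongoDB use an index on the date field before any grouping happens.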

Related

Converting StringDate to a queryable representation. Group and project sum of today,yesterday, week, month

I am trying to figure out how to solve the problem in the title.
It consists of two parts: first, my collection's dates are stored as plain strings in MySQL format (YYYY-mm-dd HH:mm:ss); second, how to project the summaries (today, yesterday, 7 days, month).
I have been experimenting around and this is what I came up with.
Pipe 1.
$match - nothing fancy there just a simple field = value.
Pipe 2.
$addFields - trying to parse the string date into an ISO date, I believe? I am not sure.
{
expired: {
$dateFromString: {
dateString: '$expired',
timezone: 'America/New_York'
}
}
}
Pipe 3.
$match - commented out; I wanted to select only a specific range (no more than 30 days back), but it doesn't work.
expired: {
$gt: ISODate(new Date(new Date(ISODate().getTime() - 1000*60*60*24*30)))
}
Pipe 4.
$group - Here I group and sum everything per day. So an output is
_id: 2021-09-27, theVal : 100
{
_id: {
$dateToString: {
date: { $toDate: "$expired" },
format: "%Y-%m-%d" }
},
theVal : {$sum:{$first:"$values.quantity"}} // as $values is an array [0].quantity,[1].quantity,[2].quantity - I am just interested in the first element.
}
Pipe 5.
$project - getting rid of the _id field - making it date name field, keeping theVal.
{
"date": "$_id",
"theVal": 1,
"_id": 0
}
theVal is a sum of integers within a day.
Questions
Between Pipe 1 and 2 ( temporary 3 ) I should be able to match dates
within the last 30 days to reduce the processing?
How to get a desired output like this:
{
today : 100,
yesterday : 10,
7days : 220,
month: 1000,
}
Really appreciate any help here.
I tried to "replicate" what you intend to do, as you didn't provide sample test data.
You may want to do the following in an aggregation pipeline:
1. $match : filter out the ids you want - same as your pipe 1
2. $dateFromString : use "format": "%Y-%m-%d %H:%M:%S", "timezone": "America/New_York"
3. $match : filter out records that are within 30 days with $expr and $$NOW
4. $group : group by date without time; achieved by converting to a date string with the date part only
5. $addFields : project flags that determine whether the record is within today, yesterday, 7days, month
6. $group : As you didn't say what today, yesterday, 7days and month mean, I made the assumption that they are the cumulative sums over those ranges. A simple conditional $sum will do the summation with the help of the flags from step 5
Here is a Mongo playground for your reference.
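The steps above can be sketched as pipeline stages built in Python. This is only an illustration, not the playground's actual code: the function names are hypothetical, the field names (expired, values.quantity, theVal) come from the question, $arrayElemAt stands in for "first element of the array", and steps 5 and 6 are folded into a single $group whose conditional sums compare the per-day "YYYY-MM-DD" keys as strings. It also assumes "7days" means today plus the previous 6 days and "month" means today plus the previous 29 days.

```python
from datetime import date, timedelta

MS_PER_DAY = 24 * 60 * 60 * 1000
TZ = "America/New_York"  # timezone taken from the question


def parse_filter_and_group_stages():
    """Steps 2-4: parse the string date, keep the last 30 days, sum per day."""
    return [
        {"$addFields": {"expired": {"$dateFromString": {
            "dateString": "$expired",
            "format": "%Y-%m-%d %H:%M:%S",
            "timezone": TZ,
        }}}},
        # Subtracting milliseconds from $$NOW yields a date 30 days in the past.
        {"$match": {"$expr": {"$gte": [
            "$expired", {"$subtract": ["$$NOW", 30 * MS_PER_DAY]},
        ]}}},
        {"$group": {
            "_id": {"$dateToString": {
                "date": "$expired", "format": "%Y-%m-%d", "timezone": TZ,
            }},
            "theVal": {"$sum": {"$arrayElemAt": ["$values.quantity", 0]}},
        }},
    ]


def summary_stage(today):
    """Steps 5-6: one $group whose conditional sums act as the range flags.

    `today` must be the calendar date in the same timezone as TZ, because
    the per-day keys produced above are strings in that timezone.
    """
    def day(days_back):
        return (today - timedelta(days=days_back)).isoformat()

    def total_if(condition):
        return {"$sum": {"$cond": [condition, "$theVal", 0]}}

    return {"$group": {
        "_id": None,
        "today": total_if({"$eq": ["$_id", day(0)]}),
        "yesterday": total_if({"$eq": ["$_id", day(1)]}),
        "7days": total_if({"$gte": ["$_id", day(6)]}),
        "month": total_if({"$gte": ["$_id", day(29)]}),
    }}


pipeline = parse_filter_and_group_stages() + [summary_stage(date.today())]
```

String comparison works here only because the "%Y-%m-%d" format sorts lexicographically in date order.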

Mongodb query to get document count for each month of a specific year

I have a time field as below in my collection.
"time" : NumberLong(1531958400000)
I first want to query documents by the current year(using $match) and then get the document count for each month.
I have managed to match the year using the below query.
db.myCollection.aggregate([
{$project: {
year: { "$year":{"$add":[new Date(0),"$time"]}}
}
},
{$match: {year: 2021}}
])
How can I write a mongodb query for the mentioned scenario?
Thanks in advance!
You can do the following:
parse the timestamp using $toDate
compare the year of the parsed date field with the current year
$group by the $month value to get the count
Here is the Mongo playground for your reference.
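A rough pymongo-style sketch of this idea follows. It is a variation rather than the playground's code: instead of comparing the parsed year, it matches the raw millisecond `time` field against the year's bounds (computed in UTC, which is an assumption), so an index on `time` can be used, and then groups by $month of the parsed date. monthly_count_pipeline is a hypothetical helper name.

```python
from datetime import datetime, timezone


def year_bounds_ms(year):
    """Return [start, end) of the given year as epoch milliseconds in UTC."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


def monthly_count_pipeline(year):
    """Count documents per month for one year of a millisecond `time` field."""
    start_ms, end_ms = year_bounds_ms(year)
    return [
        {"$match": {"time": {"$gte": start_ms, "$lt": end_ms}}},
        # $toDate converts the NumberLong milliseconds into a date for $month.
        {"$group": {"_id": {"$month": {"$toDate": "$time"}}, "count": {"$sum": 1}}},
        {"$sort": {"_id": 1}},
    ]


pipeline = monthly_count_pipeline(datetime.now(timezone.utc).year)
# With a live connection: list(db.myCollection.aggregate(pipeline))
```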

Looking for function similar to "BETWEEN" in MongoDB

I need to get all records from MongoDB collection "employee" where joining_date is between current_date and current_date + 5 days. I couldn't find anything similar to BETWEEN operator in MongoDB documentation. Below query works fine in Google BigQuery. Looking for similar solution in MongoDB.
select * from employee where joining_date BETWEEN current_date() and DATE_ADD(current_date(), interval 5 DAY);
The $gte and $lte comparison query operators can be used to find matches within a range of dates (BETWEEN is inclusive on both ends, hence $gte/$lte rather than $gt/$lt). Here's one approach.
db.employee.find({
  "joining_date": {
    $gte: new Date(),
    $lte: new Date(new Date().setDate(new Date().getDate() + 5))
  }
})
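For readers using pymongo, the same inclusive range can be written as a plain filter dictionary. A minimal sketch, assuming joining_date is stored as a BSON date; between_filter is a hypothetical helper, not a MongoDB or pymongo function.

```python
from datetime import datetime, timedelta, timezone


def between_filter(field, start, days):
    """Build an inclusive range filter: start <= field <= start + days."""
    return {field: {"$gte": start, "$lte": start + timedelta(days=days)}}


now = datetime.now(timezone.utc)
query = between_filter("joining_date", now, 5)
# With a live connection: employees = list(db.employee.find(query))
```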

MongoDB query to retrieve distinct documents by date

I have documents in the database with a dateTime value like so:
{
"_id" : ObjectId("5a66fa22d29dbd0001521023"),
"exportSuccessful" : true,
"month" : 0,
"week" : 4,
"weekDay" : "Mon",
"dateTime" : ISODate("2018-01-22T09:02:26.525Z"),
"__v" : 0
}
I'd like to:
query the database for a given date and have it return the document that contains the dateTime if the date matches (I don't care about the time). This is mainly to test before inserting a document that there isn't already one for this date. In the above example, if my given date is 2018-01-22 I'd like the document to be returned.
retrieve all documents with a distinct date from the database (again, I don't care about the time portion). If there are two documents with the same date (but different times), just return the first one.
From what I understand Mongo's ISODate type does not allow me to store only a date, it will always have to be a dateTime value. And on my side, I don't have control over what goes in the database.
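Try a range query whose start datetime is the start of the day and whose end datetime is the end of the day. So basically create dates a day apart.
Something like
// For a given date, use e.g. moment.utc('2018-01-22') instead of moment().utc().
// Moment objects are converted to native Date values for the query.
var start = moment().utc().startOf('day').toDate();
var end = moment().utc().endOf('day').toDate();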
db.collection.find({
dateTime: {
$gte: start,
$lte: end
}
})
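The same day-range idea can be sketched in Python without moment. This version assumes the day boundaries are meant in UTC and uses a half-open range (start of the day inclusive, start of the next day exclusive), a small variation that avoids having to pick an "end of day" instant; day_range_filter is a hypothetical helper name.

```python
from datetime import date, datetime, time, timedelta, timezone


def day_range_filter(day, field="dateTime"):
    """Match any timestamp that falls on the given calendar day (UTC)."""
    start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    return {field: {"$gte": start, "$lt": start + timedelta(days=1)}}


query = day_range_filter(date(2018, 1, 22))
# With a live connection: existing = db.collection.find_one(query)
```

If find_one returns a document, one already exists for that date and the insert can be skipped.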
Get all distinct dates documents:
db.collection.aggregate([
{"$group":{
"_id":{
"$dateToString":{"format":"%Y-%m-%d","date":"$dateTime"}
},
"first":{
"$first":"$$ROOT"
}
}}])

Find rows between two dates that are an interval n apart

Say I have an entry for every day in the year (or possibly every hour, every minute, ...). What I'd like to do is query all rows that are in between the range of two dates and only return one entry for every interval n (e.g. one entry each week or one entry every second day, ...)
For a more specific example, my database has entries like this:
{ _id: ..., date: ISODate("2014-07-T01:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-07-02T12:00:00Z"), values: ... }
...
{ _id: ..., date: ISODate("2015-03-17T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2015-03-18T12:00:00Z"), values: ... }
I want every result between 2014-12-05 and 2015-02-05 but only one every 3 days. The result set should look like this:
{ _id: ..., date: ISODate("2014-12-05T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-08T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-11T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-14T12:00:00Z"), values: ... }
...
Can this be done somehow?
Using the aggregation framework (and an awfully complicated query), you can achieve your goal. Something along the lines of the following:
db.coll.aggregate([
{$match: {
date: {
$gte: ISODate("2014-12-08T12:00:00.000Z"),
$lt: ISODate("2014-12-12T00:00:00.000Z")
}
}},
{$project:
{ date:1,
value: 1,
grp: { $let:
{
vars: { delta:{$subtract:["$date", ISODate("2014-12-08T12:00:00.000Z")]}},
in: {$subtract:["$$delta", {$mod:["$$delta",3*24*3600*1000]}]}
}
}
}
},
{$sort: { date: 1 }},
{$group: {_id:"$grp", date: {$first:"$date"}, value: {$first: "$value"}}}
])
the $match step will keep only rows in the desired range;
the project step will keep date and value, and will compute a "group number" based on the date. delta is the time difference in ms between the given date and some arbitrary application dependent origin. As MongoDB does not have the integer division operator, I use a substitute: delta-mod(delta, 3*24*3600*1000). This will change every 3 days (3 days × 24 hours × 3600 sec × 1000 ms);
the $sort step may not be required, depending on your use case. I use it in order to ensure a deterministic result when keeping the first date and value of each group in the next step;
finally (!) $group will group documents by the grp value calculated before, keeping only the first date and value of each group.
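The bucket arithmetic described above can be checked outside MongoDB. The sketch below mirrors delta - mod(delta, interval) in plain Python and keeps the first date of each bucket; the function names are illustrative, and the origin is assumed to be the start of the requested range (2014-12-05T12:00Z from the question).

```python
from datetime import datetime, timedelta, timezone

INTERVAL_MS = 3 * 24 * 3600 * 1000  # 3 days in milliseconds


def bucket(d, origin):
    """Mirror of the $project step: delta - (delta mod interval)."""
    delta_ms = (d - origin) // timedelta(milliseconds=1)
    return delta_ms - delta_ms % INTERVAL_MS


def first_per_bucket(dates, origin):
    """Mirror of $sort + $group with $first: keep the earliest date per bucket."""
    firsts = {}
    for d in sorted(dates):
        firsts.setdefault(bucket(d, origin), d)
    return list(firsts.values())


origin = datetime(2014, 12, 5, 12, tzinfo=timezone.utc)
daily = [origin + timedelta(days=i) for i in range(10)]
picked = first_per_bucket(daily, origin)
```

With one entry per day, this keeps the entries for December 5, 8, 11 and 14, which matches the result set asked for in the question.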
You can query for ranges using the following syntax:
db.collection.find( { field: { $gt: value1, $lt: value2 } } );
In your case, field would be the date field and this question may help you format the values:
return query based on date
Edit: I did not see the requirement for retrieving every nth document. In that case, I'm not sure MongoDB has built in support for that. You may have to manipulate the returned array yourself. In this case, once you get the range you can filter by index. Here's some boilerplate (I couldn't figure out an efficient use of Array.prototype.filter since that function removes the need for indices -- the opposite of what you want.):
function everyThird(inputArray) {
  var result = [];
  for (var i = 0; i < inputArray.length; i += 3) {
    result.push(inputArray[i]);
  }
  return result;
}
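If the results are fetched with pymongo instead, the same client-side thinning is a list slice. A small sketch, assuming the rows are already sorted by date and evenly spaced (one per day); every_nth is a hypothetical helper name.

```python
def every_nth(rows, n):
    """Keep rows at indices 0, n, 2n, ... of an already sorted list."""
    return rows[::n]


sample = list(range(10))
thinned = every_nth(sample, 3)
```

Unlike the aggregation approach, this only works when no days are missing, because it counts positions rather than elapsed time.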