Get number of daily records, per month - mongodb

I am trying to get some data visualization for an application I am making and I am currently having an issue.
The current query I am using to get the documents grouped by month is the following:
# Generating our pipeline
pipeline = [
{"$match": query_match
},
{"$group": {
'_id': {
'$dateTrunc': {
'date': "$date", 'unit': "month"
}
},
"total": {
"$sum": 1
}
}
},
{'$sort': {
'_id': 1
}
}
]
This, however, returns the total number of documents for each month.
I want to take this a step further and calculate the average number of documents per day, but only for the days on which I actually have documents.
As an example, the above query currently returns something like this:
Index _id total_documents
0 2022-07-01 10425
1 2022-08-01 27981
2 2022-09-01 24872
3 2022-10-01 1633
What I want is this: for 2022-07, for example, I have documents for only 20 of the month's 31 days, so I want to return 10425 / 20 rather than 10425 / 31, which would technically be the daily average for that month.
Is there a way to do this in a single aggregation or would I have to use an additional query to determine how many days I have documents for first?
Thanks
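One way to do this in a single aggregation is to group twice: first by day, then by month. Here is a minimal sketch, assuming MongoDB 5.0 or later (for `$dateTrunc`) and the same `date` field as in the query above. The function name `build_daily_average_pipeline` and the output fields `daily_total`, `active_days` and `daily_average` are names I made up for illustration.

```python
def build_daily_average_pipeline(query_match):
    """Build a pipeline that averages documents per day over days that have documents."""
    return [
        {"$match": query_match},
        # First pass: one result per calendar day that has at least one document.
        {"$group": {
            "_id": {"$dateTrunc": {"date": "$date", "unit": "day"}},
            "daily_total": {"$sum": 1},
        }},
        # Second pass: roll the days up into months.
        {"$group": {
            "_id": {"$dateTrunc": {"date": "$_id", "unit": "month"}},
            "total": {"$sum": "$daily_total"},           # documents in the month
            "active_days": {"$sum": 1},                  # days that had documents
            "daily_average": {"$avg": "$daily_total"},   # total / active_days
        }},
        {"$sort": {"_id": 1}},
    ]


# An empty filter stands in for query_match here.
pipeline = build_daily_average_pipeline({})
```

Days without documents never produce a group in the first pass, so they cannot count towards the average. Note that `$dateTrunc` works in UTC unless a `timezone` is given, so day boundaries may need adjusting for local time.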

Related

How to aggregate across a $substr in MongoDB

I have a collection, ledger, with the following document format:
{
_id: ###,
month: 202112,
name: 'XXXXXXXXXXXX',
gross_revenue: 482.28
}
The month field actually contains both the year and the month, as YYYYMM, and there are multiple entries per 'month'. What I want to do is sum the gross_revenue values per year: take a $substr of month to get the year, then sum up gross_revenue. The result would ideally look like this:
2019: 99999.99,
2020: 88888.88,
.
.
I can aggregate for a given month, and I can get the substring, but I can't figure out how to combine the two to aggregate by year.
db.ledger.aggregate([ { $match: {month: 202111} },{ $group: { _id: null, total: { $sum: "$gross_revenue" } } } ] )
db.ledger.aggregate([{$project: { year: {$substr:["$month", 0,4]}}}])
Any help is appreciated.
Query
group by year (we can use any aggregation operator to group)
replace root to make the expected output
aggregate(
[{"$group":
{"_id":{"$substrCP":["$month", 0, 4]},
"sum":{"$sum":"$gross_revenue"}}},
{"$replaceRoot":
{"newRoot":{"$arrayToObject":[[{"k":"$_id", "v":"$sum"}]]}}}])
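To make the grouping step concrete, here is the same logic in plain Python: take the first four characters of the month, then sum gross_revenue per year. This is only an illustration of what the pipeline computes. The function name `sum_by_year` and the sample values are invented, and the result is merged into a single dict for brevity, whereas the aggregation returns one document per year.

```python
from collections import defaultdict


def sum_by_year(documents):
    """Sum gross_revenue per year, where the year is the first 4 characters of month."""
    totals = defaultdict(float)
    for doc in documents:
        year = str(doc["month"])[:4]  # 202112 -> "2021", like $substrCP with (0, 4)
        totals[year] += doc["gross_revenue"]
    return dict(totals)


# Invented sample data in the shape of the ledger documents.
sample = [
    {"month": 201911, "gross_revenue": 100.0},
    {"month": 201912, "gross_revenue": 50.5},
    {"month": 202001, "gross_revenue": 25.25},
]
yearly_totals = sum_by_year(sample)
```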

Mongodb selecting every nth of a given sorted aggregation

I want to be able to retrieve every nth item of a given collection which is quite large (millions of records)
Here is a sample of my collection
{
_id: ObjectId("614965487d5d1c55794ad324"),
hour: ISODate("2021-09-21T17:21:03.259Z"),
searches: [
ObjectId("614965487d5d1c55794ce670")
]
}
My start of aggregation is like so
[
{
$match: {
searches: {
$in: [ObjectId('614965487d5d1c55794ce670')],
},
},
},
{ $sort: { hour: -1 } },
{ $project: { hour: 1 } },
...
]
I have tried many things, including:
$sample, which does not pick records in sorted order
$skip, which becomes very slow as the number to skip grows
_id instead of $skip, but my ids are unfortunately not created in order
My goal is thus to retrieve the hour of every 20000th record, so that I can then retrieve the data in chunks of approximately 20000 records.
I imagine it would be possible to sort and number all the records, then keep only the 1st, the 20000th, the 40000th, ..., and the last.
Thanks for your help and let me know if you need more information
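One option that matches the "sort and number every record" idea is `$setWindowFields` with `$documentNumber`, available from MongoDB 5.0. This is a rough sketch, not something I have measured against millions of documents: an index on `hour` and `allowDiskUse` may well be needed. The names `build_every_nth_pipeline`, `kept_positions` and `position` are made up, and the sketch does not add the very last record, which would need a separate query.

```python
def build_every_nth_pipeline(search_id, step=20000):
    """Build a pipeline that keeps every step-th document in descending hour order.

    Assumes step >= 2.
    """
    return [
        {"$match": {"searches": {"$in": [search_id]}}},
        # Number the documents 1, 2, 3, ... in sorted order.
        {"$setWindowFields": {
            "sortBy": {"hour": -1},
            "output": {"position": {"$documentNumber": {}}},
        }},
        # Numbering starts at 1, so remainder 1 keeps 1, step + 1, 2 * step + 1, ...
        {"$match": {"position": {"$mod": [step, 1]}}},
        {"$project": {"hour": 1, "position": 1}},
    ]


def kept_positions(total, step):
    """Plain-Python view of which positions the $mod filter keeps (step >= 2)."""
    return [n for n in range(1, total + 1) if n % step == 1]


# A placeholder string stands in for the ObjectId used in the question.
pipeline = build_every_nth_pipeline("search-id-placeholder", step=20000)
```

Each returned `hour` can then serve as the boundary of one chunk of roughly `step` records.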

I need to return the average hourly rate between two dates in MongoDB

I need to write an aggregate query that returns the average hourly rate between two dates in MongoDB.
I found in my research the following code
db.transactions.aggregate([
{
$match: {
transactionDate: {
$gte: ISODate("2017-01-01T00:00:00.000Z"),
$lt: ISODate("2017-01-31T23:59:59.000Z")
}
}
}, {
$group: {
_id: null,
average_transaction_amount: {
$avg: "$amount"
}
}
}
]);
The code above returns a single average value between the two dates, but I need the average per hour.
A sample document is:
{"_id":"5a4e1fa1e4b02d76985c39b1",
"temp1":4,
"temp2":3,
"temp3":2,
"created_on":"2018-01-04T12:35:45.833Z"}
Result for example :
{
{temp1:5,temp2:6,temp3:6}, (average for one hour)
{temp1:2,temp2:4,temp3:6}, (average for the next hour)
{temp1:5,temp2:7,temp3:9},
{temp1:8,temp2:4,temp3:7},
{temp1:4,temp2:2,temp3:6},
{temp1:9,temp2:6,temp3:4}
}
There are a lot of values within every hour, so I need to calculate the average per hour.
Please help
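A possible approach is to group on the timestamp truncated to the hour and take `$avg` of each field. The sketch below assumes MongoDB 5.0 or later (for `$dateTrunc`) and that `created_on` is stored as an ISO string, as in the sample document, so it is converted with `$toDate` first. If the field is already a date, the `$addFields` stage can go and the `$match` can use `created_on` directly, which also lets it use an index. The names `build_hourly_average_pipeline` and `created_at` and the date range are made up.

```python
from datetime import datetime, timezone


def build_hourly_average_pipeline(start, end):
    """Build a pipeline that averages temp1..temp3 per hour between start and end."""
    return [
        # Convert the ISO string into a real date so it can be compared and truncated.
        {"$addFields": {"created_at": {"$toDate": "$created_on"}}},
        {"$match": {"created_at": {"$gte": start, "$lt": end}}},
        # One group per hour, with the average of each temperature field.
        {"$group": {
            "_id": {"$dateTrunc": {"date": "$created_at", "unit": "hour"}},
            "temp1": {"$avg": "$temp1"},
            "temp2": {"$avg": "$temp2"},
            "temp3": {"$avg": "$temp3"},
        }},
        {"$sort": {"_id": 1}},
    ]


# Example range: all of January 2018, in UTC.
pipeline = build_hourly_average_pipeline(
    datetime(2018, 1, 1, tzinfo=timezone.utc),
    datetime(2018, 2, 1, tzinfo=timezone.utc),
)
```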

Mongoose: Score query then sort by score - non text fields

In my db, I have a collection of books.
Each has:
a count of upvotes
a count of downvotes
a count of views
I would like to sort my db by scoring as follows:
upvote: 8 points
downvote: -4 points
view: 1/2 point
So the score will be:
(NumberOfViews*(1/2)) + (NumberOfDownvotes*-4)+ (NumberOfUpvotes*8)
So if I have:
book1 = {name:'book1', views:3000,upvotes:340, downvotes:120}
book2 = {name:'book2', views:9000,upvotes:210, downvotes:620}
book3 = {name:'book3', views:7000,upvotes:6010, downvotes:2}
The score should be:
book1Score = 3740
book2Score = 3700
book3Score = 51572
And the query should output
book3,book1,book2
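As a quick check of the arithmetic, the scoring rule and the expected order can be reproduced in a few lines of plain Python. The helper name `score` is just for illustration.

```python
def score(book):
    """Apply the rule: view = 1/2 point, downvote = -4 points, upvote = 8 points."""
    return book["views"] * 0.5 + book["downvotes"] * -4 + book["upvotes"] * 8


books = [
    {"name": "book1", "views": 3000, "upvotes": 340, "downvotes": 120},
    {"name": "book2", "views": 9000, "upvotes": 210, "downvotes": 620},
    {"name": "book3", "views": 7000, "upvotes": 6010, "downvotes": 2},
]

# Highest score first.
ranking = [book["name"] for book in sorted(books, key=score, reverse=True)]
```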
How can I achieve such a thing in mongoose?
Bonus: What if I want records that are more recent to rank higher than older records on that same query?
Thanks
Well I ended up doing it all inside mongoose.
I run this query every 24 hours to re-score my collection.
Book.aggregate(
[
//I match my query
{$match:query},
{
$project: {
//take the id for reference
_id: 1,
//calculate the score of the views
viewScore: {
$multiply: [ "$views", 0.5 ]
},
//calculate the score of the upvotes
upvoteScore: {
$multiply: [ {$size: '$upvotes'}, 8 ]
},
//calculate the score of the downvotes
downvoteScore: {
$multiply: [ {$size: '$downvotes'}, -4 ]
}
}
},
{
//project a second time
$project: {
//take my id for reference
_id: 1,
//get my total score
score: {
$add:['$viewScore','$upvoteScore','$downvoteScore']
},
}
},
//sort by the score.
{$sort : {'score' : -1}},
]
)
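If upvotes and downvotes are stored as plain counts, as described in the question, rather than as arrays, the two `$project` stages could probably be collapsed into a single `$addFields`. A minimal sketch of the stages as Python data follows. The same stage list should carry over to `Book.aggregate(...)`, and the name `build_score_pipeline` is made up.

```python
def build_score_pipeline(query):
    """Build a pipeline that scores each book and sorts by score, highest first."""
    return [
        {"$match": query},
        # Assumes views, upvotes and downvotes are numeric counts, not arrays.
        {"$addFields": {"score": {"$add": [
            {"$multiply": ["$views", 0.5]},
            {"$multiply": ["$upvotes", 8]},
            {"$multiply": ["$downvotes", -4]},
        ]}}},
        {"$sort": {"score": -1}},
    ]


# An empty filter stands in for the real query.
score_pipeline = build_score_pipeline({})
```

Unlike `$project`, `$addFields` keeps the other fields of each book, so the name is still there after sorting.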
I think the best way would be to query mongoose for the list of books and then do the sorting yourself.
Something like:
// Get query results from mongoose then ...
books.sort((a, b) => {
  // Score a book: view = 1/2 point, downvote = -4 points, upvote = 8 points
  const score = (book) => book.views * (1 / 2) + book.downvotes * -4 + book.upvotes * 8;
  return score(a) - score(b);
});
This sorts the books in ascending order of score.
EDIT: The above is for sorting after you've received the query results. (I also just realized you want descending order, so swap the operands to b - a.)
If you want the results to arrive already sorted, you could instead calculate the score when you insert the book and store it as a field. Then use mongoose's Query#sort, which would look something like this:
query.sort({ score: 'desc'});
More info on Query#sort: http://mongoosejs.com/docs/api.html#query_Query-sort

Date day/minute in mongodb queries

I have time series data stored in a mongodb database, where one of the fields is an ISODate object. I'm trying to retrieve all items for which the ISODate object has a zero value for minutes and seconds. That is, all the objects that have a timestamp at a round hour.
Is there any way to do that, or do I need to create separate fields for hour, min, second, and query for them directly by doing, e.g., find({"minute":0, "second":0})?
Thanks!
You could do this as @Devesh says, or, if it fits better, you could use the aggregation framework:
db.col.aggregate([
{$project: {_id:1, date: {mins: {$minute: '$dateField'}, secs: {$second: '$dateField'}}}},
{$match: {'date.mins': 0, 'date.secs': 0}}
]);
Like so.
Use the $expr operator along with the date aggregation operators $minute and $second in your find query:
db.collection.find({
'$expr': {
'$and': [
{ '$eq': [ { '$minute': '$dateField' }, 0 ] },
{ '$eq': [ { '$second': '$dateField' }, 0 ] },
]
}
})
Could you add one more field to the collection that contains only the datetime without its minutes and seconds? It would make your query faster and easier to write.
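If you go the extra-field route, the value can be computed at insert time. Here is a minimal sketch in plain Python. The helper names and the stored fields `hour_bucket` and `on_the_hour` are hypothetical, and a boolean flag is added alongside the truncated datetime because it can be queried and indexed directly, e.g. `find({"on_the_hour": True})`.

```python
from datetime import datetime


def truncate_to_hour(timestamp):
    """Return the timestamp with minutes, seconds and microseconds set to zero."""
    return timestamp.replace(minute=0, second=0, microsecond=0)


def is_round_hour(timestamp):
    """True when the timestamp has zero minutes and zero seconds."""
    return timestamp.minute == 0 and timestamp.second == 0


def with_hour_fields(document, date_field="dateField"):
    """Return a copy of the document with the two hypothetical helper fields added."""
    timestamp = document[date_field]
    return {
        **document,
        "hour_bucket": truncate_to_hour(timestamp),
        "on_the_hour": is_round_hour(timestamp),
    }


example = with_hour_fields({"dateField": datetime(2021, 9, 21, 17, 21, 3)})
```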