List days occurring in dataset - mongodb

I have this huge dataset for which every entry has a datetime field. The data was inserted irregularly. For example:
2015-04-20 : 500 entries,
2015-04-23 : 300 entries,
2015-05-01 : 600 entries
The thing is, I do not know when these active days are. What I would like is a mongodb query which returns some sort of array containing all days which occur in the database, like so:
['2015-04-20',
'2015-04-23',
'2015-04-23',
'2015-04-25',
'2015-05-01',
'2015-05-05',
'2015-05-09']
Is this possible, and if so: how can I achieve this?

There is a "distinct" command with a shell wrapper, which can be used like this:
db.collection.distinct(dateFieldName, query)
If you are not running it from the shell, check whether your driver wraps this command. If not, you can run the command directly:
{ distinct: "<collection>", key: "<field>", query: <query> }
http://docs.mongodb.org/manual/reference/command/distinct/#dbcmd.distinct
If your timestamp field needs some additional processing, you can use the aggregation framework:
db.collection.aggregate([{$group: {_id: {$substr: ["$timestamp", 0, 10]}}}])
http://docs.mongodb.org/v2.6/core/aggregation-introduction/

Assuming a field named dateField that contains Date values, you can use the aggregation date operators with $group to do this.
It's easiest if you're using Mongo 3.x where the $dateToString operator is available:
db.dates.aggregate([
{$group: {
_id: {$dateToString: {format: '%Y-%m-%d', date: '$dateField'}},
count: {$sum: 1}
}},
{$sort: {count: -1}}
])
Prior to 3.0 you need to use multiple date operators to piece together the date into the _id when grouping:
db.dates.aggregate([
{$group: {
_id: {
year: {$year: '$dateField'},
month: {$month: '$dateField'},
day: {$dayOfMonth: '$dateField'}
},
count: {$sum: 1}
}},
{$sort: {count: -1}}
])
In both cases, note the use of $sort to order the results by the number of docs on each day, descending.
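The pre-3.0 pipeline returns each day as a {year, month, day} object rather than the 'YYYY-MM-DD' strings the question asks for. A minimal client-side sketch of that last conversion follows; the helper name dayKeyToString and the sample results are made up for illustration, and it runs in plain JavaScript without a database.

```javascript
// Hypothetical helper: turn a {year, month, day} group key into 'YYYY-MM-DD'.
function dayKeyToString(id) {
  const pad = (n) => String(n).padStart(2, "0");
  return `${id.year}-${pad(id.month)}-${pad(id.day)}`;
}

// Made-up sample of the pre-3.0 pipeline's output shape
// (dates and counts borrowed from the question's example).
const results = [
  { _id: { year: 2015, month: 5, day: 1 }, count: 600 },
  { _id: { year: 2015, month: 4, day: 20 }, count: 500 },
  { _id: { year: 2015, month: 4, day: 23 }, count: 300 },
];

// Zero-padded date strings sort chronologically as plain strings.
const days = results.map((r) => dayKeyToString(r._id)).sort();
console.log(days);
```

Sorting on the client is only one option; replacing the count sort with {$sort: {_id: 1}} in the $dateToString version should give the same chronological order on the server.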

Related

How to aggregate across a $substr in MongoDB

I have a collection, ledger, with the following document format:
{
_id: ###,
month: 202112,
name: 'XXXXXXXXXXXX',
gross_revenue: 482.28
}
The month actually contains both the year and month, YYYYMM. And there are multiple entries per 'month'. What I want to do is sum the gross_revenue values across the years. So take a $substr of month to get the year and then sum up gross_revenue. The result would ideally look like this:
2019: 99999.99,
2020: 88888.88,
.
.
I can aggregate for a given month, and I can get the substr, but can't figure out how to combine them to aggregate by year.
db.ledger.aggregate([ { $match: {month: 202111} },{ $group: { _id: null, total: { $sum: "$gross_revenue" } } } ] )
db.ledger.aggregate([{$project: { year: {$substr:["$month", 0,4]}}}])
Any help is appreciated.
Query
Group by year (any aggregation expression can be used as the group key).
Replace the root to produce the expected output.
db.ledger.aggregate(
[{"$group":
{"_id":{"$substrCP":["$month", 0, 4]},
"sum":{"$sum":"$gross_revenue"}}},
{"$replaceRoot":
{"newRoot":{"$arrayToObject":[[{"k":"$_id", "v":"$sum"}]]}}}])
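To make the result shape of the two stages concrete, here is a rough plain-JavaScript equivalent that runs without MongoDB. The sample documents are invented, and the code only mirrors what $group and $replaceRoot with $arrayToObject are doing; it is not how the server implements them.

```javascript
// Made-up sample documents in the question's shape (month is a YYYYMM number).
const ledger = [
  { month: 201911, gross_revenue: 100 },
  { month: 201912, gross_revenue: 50 },
  { month: 202001, gross_revenue: 25 },
];

// $group equivalent: key on the first four characters of month and sum.
const totals = new Map();
for (const doc of ledger) {
  const year = String(doc.month).slice(0, 4);
  totals.set(year, (totals.get(year) || 0) + doc.gross_revenue);
}

// $replaceRoot equivalent: one single-key document per year,
// like $arrayToObject applied to [{k: year, v: sum}].
const output = [...totals].map(([year, sum]) => ({ [year]: sum }));
console.log(output);
```

Note that this produces one document per year rather than a single document holding every year.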

MongoDB - filter in find function

I am writing a query in MongoDB. How do I use a substring function inside find? I have a date string, and I want to filter the date by year and then group by month.
This is what I have tried:
db.booking.find({$substr:["$bookingdatetime",5,2]:"14"}).aggregate([ {$group: { _id: {$substr: ['$bookingdatetime', 5, 2]}, numberofbookings: {$sum: 1} }} ])
Where am I going wrong?
$substr can be used only as a string aggregation operator. You cannot use it as an operator in a find query. You can form the aggregation pipeline as below:
db.booking.aggregate([
{$project:{"year":{$substr:["$bookingdatetime",5,2]},
"numberofbookings":1,
"bookingdatetime":1}},
// $substr returns a string, so match "14" rather than the number 14
{$match:{"year":"14"}},
{$group:{"_id":{$substr: ['$bookingdatetime', 5, 2]},
numberofbookings: {$sum: 1}}}
])
Or use a regex to match the date pattern for a specific year.
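A sketch of the regex alternative, assuming the same positions as the $substr call above (two characters starting at offset 5) and an ASCII date string. The helper regexAtOffset and the pipeline variable are illustrative names, not part of any MongoDB API.

```javascript
// Build a regex that matches when `value` sits at character offset `start`,
// mirroring the check {$substr: [field, start, value.length]} == value.
// Assumes `value` has no regex metacharacters (plain digits here).
function regexAtOffset(start, value) {
  return new RegExp(`^.{${start}}${value}`);
}

const yearFilter = regexAtOffset(5, "14");

// Hypothetical pipeline: one $match on the regex replaces the
// $project + $match pair, and the $group stage stays as before.
const pipeline = [
  { $match: { bookingdatetime: yearFilter } },
  {
    $group: {
      _id: { $substr: ["$bookingdatetime", 5, 2] },
      numberofbookings: { $sum: 1 },
    },
  },
];

console.log(yearFilter.source);
```

The same regex can also be passed straight to find, for example db.booking.find({bookingdatetime: /^.{5}14/}), if no grouping is needed.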

Mongodb aggregate query help - grouping with multiple fields and converting to an array

I have the following documents in a MongoDB collection:
[{quarter:'Q1',project:'project1',user:'u1',cost:'100'},
{quarter:'Q2',project:'project1',user:'u2',cost:'100'},
{quarter:'Q3',project:'project1',user:'u1',cost:'200'},
{quarter:'Q1',project:'project2',user:'u2',cost:'200'},
{quarter:'Q2',project:'project2',user:'u1',cost:'300'},
{quarter:'Q3',project:'project2',user:'u2',cost:'300'}]
I need to generate output that sums the cost by quarter and project, in a format that can be rendered in an ExtJS chart:
[{quarter:'Q1','project1':100,'project2':200,'project3':300},
{quarter:'Q2','project1':100,'project2':200,'project3':300},
{quarter:'Q3','project1':100,'project2':200,'project3':300}]
I have tried various permutations and combinations of aggregates but couldn't come up with a pipeline. Your help or direction is greatly appreciated.
Your cost data appears to be strings, which isn't helping, but assuming you can work around that:
The main component is the $cond operator in the document projection. Assuming your data set is larger and you want to group the results:
db.mstats.aggregate([
// Optionally match first depending on what you are doing
// Sum up cost for each quarter and project
{$group: {_id: { quarter: "$quarter", project: "$project" }, cost: {$sum: "$cost" }}},
// Change the "projection" in $group, using $cond to add a key per "project" value
// We use $sum and the false case of 0 to fill in values not in the row.
// These will then group on the key adding the real cost and 0 together.
{$group: {
_id: "$_id.quarter",
project1: {$sum: {$cond:[ {$eq: [ "$_id.project", "project1" ]}, "$cost", 0 ]}},
project2: {$sum: {$cond:[ {$eq: [ "$_id.project", "project2" ]}, "$cost", 0 ]}}
}},
// Change the document to have the "quarter" key
{$project: { _id:0, quarter: "$_id", project1: 1, project2: 1}},
// Optionally sort by quarter
{$sort: {quarter: 1 }}
])
So after the initial grouping, the document is altered with $cond to decide whether a value goes into the new key. Essentially, this says: if the current value of project is "project1", put the cost value into the project1 key. And so on.
As we put a 0 value into this new document key when there was no match, we need to group the results again in order to merge the documents. Sorting is optional, but probably what you want for a chart.
Naturally you will have to build this up dynamically and probably query for the project keys that you want. But otherwise this should be what you are looking for.
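One way to build the pipeline dynamically is sketched below. The function name buildPivotPipeline is made up, and the list of project names is assumed to come from a separate query (for example a distinct on "project"); the stages it emits are the same ones shown above.

```javascript
// Build the pivot pipeline for an arbitrary list of project names.
// Assumes each name is a valid field name (no dots, no leading "$").
function buildPivotPipeline(projects) {
  const pivot = { _id: "$_id.quarter" };
  const projection = { _id: 0, quarter: "$_id" };
  for (const name of projects) {
    // One conditional sum per project: the cost if the row belongs to it, else 0.
    pivot[name] = {
      $sum: { $cond: [{ $eq: ["$_id.project", name] }, "$cost", 0] },
    };
    projection[name] = 1;
  }
  return [
    { $group: { _id: { quarter: "$quarter", project: "$project" }, cost: { $sum: "$cost" } } },
    { $group: pivot },
    { $project: projection },
    { $sort: { quarter: 1 } },
  ];
}

const pipeline = buildPivotPipeline(["project1", "project2"]);
console.log(JSON.stringify(pipeline[1], null, 2));
```

The returned array can then be passed to db.mstats.aggregate as is.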

How do you get DayHours from Mongo date field?

I am trying to group by DayHours in a mongo aggregate function to get the past 24 hours of data.
For example: if the time of an event was 6:00 Friday the "DayHour" would be 6-5.
I'm easily able to group by hour with the following query:
db.api_log.aggregate([
{ '$group': {
'_id': {
'$hour': '$time'
},
'count': {
'$sum':1
}
}
},
{ '$sort' : { '_id': -1 } }
])
I feel like there is a better way to do this. I've tried concatenation in the $project statement, but apparently you can only concatenate strings in mongo.
I effectively just need to end up grouping by day and hour, however it gets done. Thank You.
I assume that the time field contains an ISODate.
If you want only the last 24 hours, you can use this:
var yesterday = new Date((new Date).setDate(new Date().getDate() - 1));
// Pass the stages as an array so the call works across shell versions
db.api_log.aggregate([
{$match: {time: {$gt: yesterday}}},
{$group: {
_id: {
hour: {$hour: "$time"},
day: {$dayOfMonth: "$time"}
},
count: {$sum: 1}
}}
])
If you want general grouping by day and hour, you can use this:
db.api_log.aggregate([
{$group: {
_id: {
hour: {$hour: "$time"},
day: {$dayOfMonth: "$time"},
month: {$month: "$time"},
year: {$year: "$time"}
},
count: {$sum: 1}
}}
])
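The general day-hour grouping above returns its buckets as {hour, day, month, year} objects in no particular order. A small client-side sketch for turning each key back into a Date, so the buckets can be sorted or charted, might look like this; bucketStart and the sample results are hypothetical, and the date operators are assumed to have run in UTC (their default).

```javascript
// Rebuild a UTC Date from a {year, month, day, hour} group key.
// $month is 1-based while JavaScript months are 0-based, hence the "- 1".
function bucketStart(id) {
  return new Date(Date.UTC(id.year, id.month - 1, id.day, id.hour));
}

// Made-up results in the shape the general pipeline returns.
const buckets = [
  { _id: { hour: 6, day: 20, month: 4, year: 2015 }, count: 12 },
  { _id: { hour: 23, day: 19, month: 4, year: 2015 }, count: 7 },
];

// Sort oldest first, keeping the count alongside each bucket start.
const ordered = buckets
  .map((b) => ({ start: bucketStart(b._id), count: b.count }))
  .sort((a, b) => a.start - b.start);

console.log(ordered.map((b) => b.start.toISOString()));
```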
This is not an answer per se (I do not have MongoDB at hand to work one out), but I think you cannot do this with the aggregation framework alone (I might be wrong, so let me explain).
You can obtain date and time information from the mongo _id using the .getTimestamp method. The problem is that you cannot output this information in a mongo query (something like db.find({},{_id.getTimestamp}) does not work). You also cannot search by this field (except by using a $where clause).
So if it can be achieved at all, it can only be done with mapreduce, where in the reduce function you group based on the output of getTimestamp.
If this is a query you are going to run quite often, I would recommend actually adding a date field to your document. With this field you will be able to aggregate your data properly, and you can also use indexes so that you do not scan the whole collection (as you are doing with $sort -1) but instead $match only the part that is newer than the current date minus 24 hours.
I hope this helps even without code. If no one else is able to answer this, I will try to play with it tomorrow.

Mongodb aggregation and date manipulation

Consider a collection which contains documents with a date and a count field :
{ _id: ObjectId("..."), date: ISODate("..."), count: 3}
I would like to query the count by week, so I have to group the data by a date truncated to the beginning of the week.
But it seems there is no way to achieve that with the mongodb aggregation framework.
I was expecting to be able to do something like this ($dateOfWeek is a date operator I imagined to truncate the date at the beginning of the week) :
db.data.aggregate([ {$project: { date: {$dateOfWeek: '$date'}, count: 1}},
{$group: {_id: '$date', count: {$sum: '$count'}}} ])
But I didn't find a suitable date operator to do it.
I know I can do it with mapreduce, but it would be so much more elegant to have a date operator rather than writing JavaScript code.
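For reference, "truncated to the beginning of the week" could be sketched in plain JavaScript as below. This is the kind of logic a mapreduce or client-side approach would need, not an aggregation operator. It assumes weeks start on Monday and that dates are handled in UTC; startOfWeekUTC and the sample documents are made up for illustration.

```javascript
// Truncate a Date to 00:00 UTC on the first day of its week.
// Assumption: weeks start on Monday. For Sunday starts, subtract
// d.getUTCDay() directly instead of daysSinceMonday.
function startOfWeekUTC(date) {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d;
}

// Made-up sample documents with the question's date and count fields.
const docs = [
  { date: new Date("2015-04-20T09:00:00Z"), count: 3 }, // a Monday
  { date: new Date("2015-04-23T18:30:00Z"), count: 2 }, // Thursday, same week
  { date: new Date("2015-05-01T07:15:00Z"), count: 5 }, // Friday, a later week
];

// Sum count per week, keyed by the week's start date as 'YYYY-MM-DD'.
const perWeek = {};
for (const doc of docs) {
  const key = startOfWeekUTC(doc.date).toISOString().slice(0, 10);
  perWeek[key] = (perWeek[key] || 0) + doc.count;
}
console.log(perWeek);
```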