Need some help completing this aggregation pipeline - mongodb

I have an analytics collection where I store queries as individual documents. I want to count the number of queries taking place over the past day (24 hours). Here's the aggregation command as it is:
db.analytics.aggregate([{$group:{_id:{"day":{$dayOfMonth:"$datetime"},"hour":{$hour:"$datetime"}},"count":{$sum:1}}},{$sort:{"_id.day":1,"_id.hour":1}}])
The result looks like:
.
.
.
{
"_id" : {
"day" : 17,
"hour" : 19
},
"count" : 8
},
{
"_id" : {
"day" : 17,
"hour" : 22
},
"count" : 1
},
{
"_id" : {
"day" : 18,
"hour" : 0
},
"count" : 1
}
.
.
.
Originally, my plan was to add a $limit operation to simply take the last 24 results. That's a great plan until you realize that there are some hours without any queries at all. So the last 24 documents could go back more than a single day. I thought of using $match, but I'm just not sure how to go about constructing it. Any ideas?

First, determine the day you want, either from the current date or from the most recent document in the collection. Then query for that specific day:
db.analytics.aggregate([
{$project:{datetime:"$datetime",day:{$dayOfMonth:"$datetime"}}},
{$match:{day:3}},
{$group:{_id:{"hour":{$hour:"$datetime"}},"count":{$sum:1}}},
{$sort:{"_id.hour":1}}
]);
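Here 3 in {$match:{day:3}} is the day of the month.
The idea is to add a day field so that we can filter by it, then group that day's documents by hour and sort. Note that $dayOfMonth ignores the month and year, so this matches day 3 of every month present in the collection.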
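Since the goal in the question is the trailing 24 hours rather than a calendar day, another option is a range $match on the datetime field placed before the $group. The following is only a minimal sketch under that assumption: buildLastDayPipeline is a hypothetical helper name, while the collection and field names are taken from the question.

```javascript
// Hypothetical helper: builds a pipeline that counts queries per hour
// over the 24 hours preceding `now`. Assumes `datetime` holds BSON dates.
function buildLastDayPipeline(now) {
  const since = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  return [
    // Range filter first, so an index on `datetime` can be used.
    { $match: { datetime: { $gte: since, $lte: now } } },
    { $group: {
        _id: { day: { $dayOfMonth: "$datetime" }, hour: { $hour: "$datetime" } },
        count: { $sum: 1 }
    } },
    { $sort: { "_id.day": 1, "_id.hour": 1 } }
  ];
}

// In the mongo shell (not executed here):
// db.analytics.aggregate(buildLastDayPipeline(new Date()))
```

Sorting by _id.day orders the buckets incorrectly when the window crosses a month boundary; grouping by a fuller key (year, month, day, hour) would avoid that.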

Related

MongoDB - Get aggregated difference between two date fields

I have one collection called lists with following fields:
{ "_id" : ObjectId("5a7c9f60c05d7370232a1b73"), "created_date" : ISODate("2018-11-10T04:40:11Z"), "processed_date" : ISODate("2018-11-10T04:40:10Z") }
{ "_id" : ObjectId("5a7c9f85c05d7370232a1b74"), "created_date" : ISODate("2018-11-10T04:40:11Z"), "processed_date" : ISODate("2018-11-10T04:41:10Z") }
{ "_id" : ObjectId("5a7c9f89c05d7370232a1b75"), "created_date" : ISODate("2018-11-10T04:40:11Z"), "processed_date" : ISODate("2018-11-10T04:42:10Z") }
{ "_id" : ObjectId("5a7c9f8cc05d7370232a1b76"), "created_date" : ISODate("2018-11-10T04:40:11Z"), "processed_date" : ISODate("2018-11-10T04:42:20Z") }
I need an aggregated result in the following format (based on the difference between processed_date and created_date):
[{
"30Sec":count_for_diffrence_1,
"<=60Sec":count_for_diffrence_2,
"<=90Sec":count_for_diffrence_3
}]
I would also like to find out how many items took 30 sec, 60 sec and so on. The count for <=60Sec should not be included again in <=90Sec.
Any help will be appreciated.
You can try the aggregation query below in version 3.6.
$match with $expr limits the documents to those where the time difference is 90 seconds or less.
$group with $sum counts the occurrences in each time slice.
db.collection.aggregate([
{"$match":{"$expr":{"$lte":[{"$subtract":["$processed_date","$created_date"]},90000]}}},
{"$group":{
"_id":null,
"30Sec":{"$sum":{"$cond":{"if":{"$eq":[{"$subtract":["$processed_date","$created_date"]},30000]},"then":1,"else":0}}},
"<=60Sec":{"$sum":{"$cond":{"if":{"$lte":[{"$subtract":["$processed_date","$created_date"]},60000]},"then":1,"else":0}}},
"<=90Sec":{"$sum":{"$cond":{"if":{"$lte":[{"$subtract":["$processed_date","$created_date"]},90000]},"then":1,"else":0}}}
}}
])
Note that if the created date can be later than the processed date (as in the first sample document), you may want to add a condition that only counts differences between 0 and the requested time slice.
Something like:
{$and:[{"$gte":[{"$subtract":["$processed_date","$created_date"]},0]}, {"$lte":[{"$subtract":["$processed_date","$created_date"]},60000]}]}
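The same lower-plus-upper-bound pattern can also keep the slices from overlapping, which is what the question asks for. A minimal sketch, assuming each slice should cover the range above the previous boundary up to and including its own; sliceCount is a hypothetical helper, and the "30Sec" key is read here as "up to 30 seconds":

```javascript
// Difference between the two dates in milliseconds, as in the query above.
const diffMs = { $subtract: ["$processed_date", "$created_date"] };

// Hypothetical helper: counts documents with lowMs < diff <= highMs.
function sliceCount(lowMs, highMs) {
  return { $sum: { $cond: [
    { $and: [ { $gt: [diffMs, lowMs] }, { $lte: [diffMs, highMs] } ] },
    1,
    0
  ] } };
}

const slicePipeline = [
  { $group: {
      _id: null,
      // Differences are whole milliseconds, so "> -1" means ">= 0".
      "30Sec": sliceCount(-1, 30000),
      "<=60Sec": sliceCount(30000, 60000),
      "<=90Sec": sliceCount(60000, 90000)
  } }
];

// In the mongo shell (not executed here): db.lists.aggregate(slicePipeline)
```

With these bounds a document is counted in exactly one slice, and negative differences or differences above 90 seconds are counted nowhere.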

Getting data from outside of group

I have a lot of devices non-periodically inserting data into mongo.
I need to get statistics from this data (value by day/month/year). Currently I do this by adding fields where I parse the date into day, month and year using $dayOfMonth, $month and $year, then grouping by these values. The problem is when I get no data (or only one document) in a day. Then I can't get the actual value for that day, because I need two values to subtract.
Is there a way to get the document closest by day to this group, in one query?
Let's say I have this data:
{id : 1, ts : "2017-12-15T10:00:00.000Z", value : 10}
{id : 2, ts : "2017-12-15T17:00:00.000Z", value : 10}
{id : 2, ts : "2017-12-14T12:00:00.000Z", value : 6}
{id : 1, ts : "2017-12-14T15:00:00.000Z", value : 10}
{id : 1, ts : "2017-12-14T10:00:00.000Z", value : 10}
{id : 2, ts : "2017-12-14T09:00:00.000Z", value : 3}
Explanation of problem:
The value is the actual reading from the meter, for example consumed energy. If a device consumes 4W/min, the reading will be 4 after 1 minute and 8 after 2 minutes, so the delta between the 1st and 2nd minute is 4. If I have a record at 2017-12-14T23:58:00.000Z of, say, 10W, and at 23:59 it is 14W, then dValue should be 4. At 00:00 the next day I am not able to calculate dValue, because that is the first and only record in its group.
If I group this data by day I can calculate the value difference only for 2017-12-14.
For now I am using this query:
{
$addFields : {
month : {$month : "$ts"},
year : {$year : "$ts"},
day : {$dayOfMonth : "$ts"}
}
},
{
$group : {
_id : {
year : "$year",
month : "$month",
day : "$day",
id : "$id"
},
first : {$min : "$$ROOT"},
last : {$max : "$$ROOT"},
}
},
{
$addFields : {
dValue: {$subtract : ["$last.value", "$first.value"]} // delta value
}
},
This query works, but only if there is more than one document in a day. If there is only one document I can't get accurate data. I want to do this in one query, because I have a lot of these devices and the number is only going to increase; if I have to run a query for every device I end up with an insane number of queries to the database. Is there a way to solve this?

Can sorting before grouping improve query performance in Mongo using the aggregate framework?

I'm trying to aggregate data for 100 accounts for a 14-15 month period, grouping by year and month.
However, the query performance is horrible as it takes 22-27 seconds. There are currently over 15 million records in the collection and I've got an index on the match criteria and can see using explain() that the optimizer uses it.
I tried adding another index on the sort criteria in the query below and after adding the index, the query now takes over 50 seconds! This happens even after I remove the sort from the query.
I'm extremely confused. I thought that, because grouping can't utilize an index, grouping would be much faster if the collection was sorted beforehand. Is this assumption correct? If not, what other options do I have? I can tolerate a query time of up to 5 seconds, but nothing more than that.
//Document Structure
{
Acc: 1,
UIC: true,
date: ISODate("2015-12-01T05:00:00Z"),
y: 2015,
mm: 12,
value: 22.3
}
//Query
db.MyCollection.aggregate([
{ "$match" : { "UIC" : true, "Acc" : { "$in" : [1, 2, 3, ..., 99, 100] }, "date" : { "$gte" : ISODate("2015-12-01T05:00:00Z"), "$lt" : ISODate("2017-02-01T05:00:00Z") } } },
//{ "$sort" : { "UIC" : 1, "Acc" : 1, "y" : -1, "mm" : 1 } },
{ "$group" : { "_id" : { "Num" : "$Num", "Year" : "$y", "Month" : "$mm" }, "Sum" : { "$sum" : "$value" } } }
])
I would suggest writing a script (it can be in Node.js) that aggregates the data into a separate collection. For long-running queries like this, it is advisable to maintain a separate collection containing the pre-aggregated data and query that instead.
My second piece of advice is to use a composite key in this aggregated collection and search it by regular expression. In your case I would build a key of the form accountId:period. For example, for account 1 and February 2016, the key would be something like 1:201602.
You can then query by account and period using regular expressions. For example, to get the records for account 1 in 2016, you could do something like the following (the ^ anchor keeps the pattern from also matching accounts such as 11):
db.aggregatedCollection.find({_id : /^1:2016/})
Hope my answer was helpful
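The accountId:period key described above could be built along these lines. This is just a sketch: periodKey and yearPattern are hypothetical helper names, and using the UTC year and month for the period is an assumption.

```javascript
// Hypothetical helper: builds a key like "1:201602" from an account
// number and a date (UTC year and month).
function periodKey(acc, date) {
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${acc}:${date.getUTCFullYear()}${month}`;
}

// Hypothetical helper: anchored prefix pattern for one account and year.
function yearPattern(acc, year) {
  return new RegExp(`^${acc}:${year}`);
}

const exampleKey = periodKey(1, new Date(Date.UTC(2016, 1, 15))); // "1:201602"
const examplePattern = yearPattern(1, 2016);                       // /^1:2016/

// In the mongo shell (not executed here):
// db.aggregatedCollection.find({ _id: examplePattern })
```

Anchoring the pattern at the start of the key means "11:201602" is not matched when searching for account 1, and a prefix pattern like this is the kind of regular expression that can make use of the _id index.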

mongo query select only first of month

Is it possible to query only the first (or last, or any single) day of the month from a Mongo date field?
I use the date aggregation operators regularly, but within a $group clause.
Basically I have a field that is already aggregated (averaged) for each day of the month. I want to select only one of these days (with its value as representative of the entire month).
Following is a sample of a record set from Jan 1, 2014 to Feb 1, 2015, with price as the daily price and 28day_avg as the trailing 28-day average.
{ "date" : ISODate("2014-01-01T00:00:00Z"), "_id" : ObjectId("533b3697574e2fd08f431cff"), "price": 59.23, "28day_avg": 54.21}
{ "date" : ISODate("2014-01-02T00:00:00Z"), "_id" : ObjectId("533b3697574e2fd08f431cff"), "price": 58.75, "28day_avg": 54.15}
...
{ "date" : ISODate("2015-02-01T00:00:00Z"), "_id" : ObjectId("533b3697574e2fd08f431cff"), "price": 123.50, "28day_avg": 122.25}
Method 1.
I'm currently running an aggregation using $month (and summing the price), but one issue is that I'm seeking to retrieve the underlying date value ISODate("2015-02-01T00:00:00Z") rather than the 0, 1, 2 values that come with several of the date aggregations (which restart at the first of the week, month, year). mod(28) on a date?
Method 2.
I'd like to simply pluck out a single record of the 28day_avg as representative of the period. The 1st of the month would be adequate.
The desired output is...
_id: ISODate("2015-02-01T00:00:00Z"), value: 122.25,
_id: ISODate("2015-01-01T00:00:00Z"), value: 120.78,
_id: ISODate("2014-12-01T00:00:00Z"), value: 118.71,
...
_id: ISODate("2014-01-01T00:00:00Z"), value: 53.21,
Of course, the value will vary from method 1 to method 2, but that is fine. One is 28 days trailing while the other will account for 28-, 30- and 31-day months; I don't care about that so much.
A non-aggregation query is OK too, but this doesn't work either: {"date": { "$mod": [ 28, 0 ]} }
To pick the first of the month for each month (method 2), use the following aggregation:
db.test.aggregate([
{ "$project" : { "_id" : "$date", "day" : { "$dayOfMonth" : "$date" }, "28day_avg" : 1 } },
{ "$match" : { "day" : 1 } }
])
The $match here filters on a computed field, so it can't use an index and this is not efficient. I'd suggest adding another field to each document that holds the $dayOfMonth value, so you can index it and do a simple find:
{
"date" : ISODate("2014-01-01T00:00:00Z"),
"price" : 59.23,
"28day_avg" : 54.21,
"dayOfMonth" : 1
}
db.test.createIndex({ "dayOfMonth" : 1 })
db.test.find({ "dayOfMonth" : 1 }, { "_id" : 0, "date" : 1, "28day_avg" : 1 })
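The extra field has to be filled in by the application when documents are written. A minimal sketch of that step, assuming the day should be taken in UTC (which is what $dayOfMonth uses by default); withDayOfMonth is a hypothetical helper name:

```javascript
// Hypothetical helper: returns a copy of the document with a
// dayOfMonth field derived from its date (UTC).
function withDayOfMonth(doc) {
  return Object.assign({}, doc, { dayOfMonth: doc.date.getUTCDate() });
}

const sampleDoc = withDayOfMonth({
  date: new Date("2014-01-01T00:00:00Z"),
  price: 59.23,
  "28day_avg": 54.21
});

// In the mongo shell (not executed here): db.test.insertOne(sampleDoc)
```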

mongodb $dayOfYear equivalent Unix epoch time aggregation

Is there a method of grouping a Unix epoch time by day, equivalent to $dayOfYear,
or a process for aggregating floats and ints (into quartiles, hundreds, thousands, %)?
I'm trying to avoid map reduce, but an example of it would be awesome.
You can almost, but not quite, use Unix time in seconds in the aggregation pipeline by utilizing the $mod and $divide operators.
The math is Unix time seconds / 86400 to convert seconds into days since the epoch. Then take that result modulo 365.25 for the approximate day of the year (a leap year every 4).
So the full aggregation for $dayOfYear using seconds is almost as simple as
db.MyCollection.aggregate([ {$project : {"day" : {$mod : [ {$divide : ["$unix_seconds", 86400] } , 365.25] } } }, { $group : { _id : "$day" , num : { $sum : 1 } } } , {$sort : {_id : 1}} ])
The above also sorts the results by day of year.
The problem is that $divide produces a fractional number of days and $mod keeps the fractional part, and there is no way of rounding or truncating it. Therefore the results are grouped by whole number plus fraction.
{
"_id" : 235.1864887063916,
"num" : 1
},
{
"_id" : 235.24300889818738,
"num" : 1
},
{
"_id" : 235.60299520864623,
"num" : 3
},
{
"_id" : 235.66453935674085,
"num" : 1
},
{
"_id" : 235.79900382758004,
"num" : 1
},
{
"_id" : 235.80265845312474,
"num" : 1
},
.. when clearly we want only the whole number
{
"_id" : 235,
"num" : 8
},
What would be nice is a $trunc, or a modulo returning only the whole part ($modw) and a mod returning only the remainder ($modr), as operators in mongo.
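One way to get whole numbers with only $mod, $subtract and $divide is to subtract the remainder before dividing, so the division comes out exact. The sketch below assumes unix_seconds holds non-negative whole seconds; wholeDaysExpr, dayPipeline and wholeDays are hypothetical names. Note that it groups by whole days since the epoch rather than by day of the year, so the same calendar day in different years stays in separate groups.

```javascript
// Whole days since the epoch without a truncation operator:
// subtract the remainder first, then divide.
const wholeDaysExpr = {
  $divide: [
    { $subtract: ["$unix_seconds", { $mod: ["$unix_seconds", 86400] }] },
    86400
  ]
};

const dayPipeline = [
  { $project: { day: wholeDaysExpr } },
  { $group: { _id: "$day", num: { $sum: 1 } } },
  { $sort: { _id: 1 } }
];

// The same arithmetic in plain JavaScript, for checking by hand.
function wholeDays(unixSeconds) {
  return (unixSeconds - (unixSeconds % 86400)) / 86400;
}

// In the mongo shell (not executed here): db.MyCollection.aggregate(dayPipeline)
```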
JavaScript has the Date object which would be available to any server side JavaScript processing for MapReduce functions.
You seem to be aware of the $dayOfYear operator in the aggregation pipeline. There are other operators there for processing dates.
Unless your needs are very specific you should be using the aggregation pipeline. It is very flexible and in most cases will be considerably faster than the equivalent actions run under mapReduce.
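For completeness, the Date-based calculation mentioned above might look like this in plain JavaScript. It is only a sketch: dayOfYearUtc is a hypothetical helper, and it assumes the timestamp is in seconds and that days are counted in UTC, with 1 January as day 1 (the same range $dayOfYear uses).

```javascript
// Hypothetical helper: day of the year (1-366, UTC) for a Unix
// timestamp in seconds, using the built-in Date object.
function dayOfYearUtc(unixSeconds) {
  const d = new Date(unixSeconds * 1000);
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  return Math.floor((d.getTime() - yearStart) / 86400000) + 1;
}
```

Unlike the modulo 365.25 approximation, this accounts for leap years exactly, because the start of the year is taken from the calendar.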