How can I alter this query to return the average over all hours in the db?

db.temperature.aggregate([
{ "$match": {
"$and": [
{ "date": { "$gte": ISODate("2017-10-12T22:00:00Z") }},
{ "date": { "$lt": ISODate("2017-10-12T22:59:99Z") }}
]
}},
{ "$group": {
"_id": { "$hour": "$date" },
"temperature": {
"$avg": "$temperature"
}
}}
])
The data looks like
{
"_id" : ObjectId("5df25dd648bfdfee3906e0cd"),
"date" : ISODate("2017-10-12T22:00:00Z"),
"power" : 39
}
There is a record for every minute and I am trying to get the average for every hour in the database. This query only returns the average over a specific hour.

You can simply remove the $match part of your query:
db.temperature.aggregate([
{ "$group": {
"_id": { "$hour": "$date" },
"temperature": {
"$avg": "$temperature"
}
}}
])
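Note that grouping on { "$hour": "$date" } alone folds the same hour of day from different days into a single bucket. If you instead want a separate average for every hour of every calendar day, a compound grouping key works; here is a minimal sketch of that variation, assuming the same temperature field as above:
db.temperature.aggregate([
    { "$group": {
        "_id": {
            "year": { "$year": "$date" },
            "month": { "$month": "$date" },
            "day": { "$dayOfMonth": "$date" },
            "hour": { "$hour": "$date" }
        },
        "temperature": { "$avg": "$temperature" }
    }},
    { "$sort": { "_id": 1 } }
])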

Related

Using the aggregation framework to compare array element overlap

I have a collection with documents structured like below:
{
carrier: "abc",
flightNumber: 123,
dates: [
ISODate("2015-01-01T00:00:00Z"),
ISODate("2015-01-02T00:00:00Z"),
ISODate("2015-01-03T00:00:00Z")
]
}
I would like to search the collection to see if there are any documents with the same carrier and flightNumber that also have overlapping dates in the dates array. For example:
{
carrier: "abc",
flightNumber: 123,
dates: [
ISODate("2015-01-01T00:00:00Z"),
ISODate("2015-01-02T00:00:00Z"),
ISODate("2015-01-03T00:00:00Z")
]
},
{
carrier: "abc",
flightNumber: 123,
dates: [
ISODate("2015-01-03T00:00:00Z"),
ISODate("2015-01-04T00:00:00Z"),
ISODate("2015-01-05T00:00:00Z")
]
}
If the above records were present in the collection I would like to return them because they both have carrier: abc, flightNumber: 123 and they also have the date ISODate("2015-01-03T00:00:00Z") in the dates array. If this date were not present in the second document then neither should be returned.
Typically I would do this by grouping and counting like below:
db.flights.aggregate([
{
$group: {
_id: { carrier: "$carrier", flightNumber: "$flightNumber" },
uniqueIds: { $addToSet: "$_id" },
count: { $sum: 1 }
}
},
{
$match: {
count: { $gt: 1 }
}
}
])
But I'm not sure how I could modify this to look for array overlap. Can anyone suggest how to achieve this?
You $unwind the array if you want to look at the contents as "grouped" within them:
db.flights.aggregate([
{ "$unwind": "$dates" },
{ "$group": {
"_id": { "carrier": "$carrier", "flightnumber": "$flightnumber", "date": "$dates" },
"count": { "$sum": 1 },
"_ids": { "$addToSet": "$_id" }
}},
{ "$match": { "count": { "$gt": 1 } } },
{ "$unwind": "$_ids" },
{ "$group": { "_id": "$_ids" } }
])
That does in fact tell you the documents where the "overlap" resides, because the "same dates", along with the other grouping key values you are concerned about, have a "count" which occurs more than once, indicating the overlap.
Anything after the $match is really just for "presentation" as there is no point reporting the same _id value for multiple overlaps if you just want to see the overlaps. In fact if you want to see them together it would probably be best to leave the "grouped set" alone.
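For the two example documents, the single group that survives the $match stage would look roughly like this (using the _id values from the sample output further down):
{
    "_id" : {
        "carrier" : "abc",
        "flightNumber" : 123,
        "date" : ISODate("2015-01-03T00:00:00Z")
    },
    "count" : 2,
    "_ids" : [
        ObjectId("5977f9187dcd6a5f6a9b4b96"),
        ObjectId("5977f9187dcd6a5f6a9b4b97")
    ]
}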
Now you could add a $lookup to that if retrieving the actual documents was important to you:
db.flights.aggregate([
{ "$unwind": "$dates" },
{ "$group": {
"_id": { "carrier": "$carrier", "flightnumber": "$flightnumber", "date": "$dates" },
"count": { "$sum": 1 },
"_ids": { "$addToSet": "$_id" }
}},
{ "$match": { "count": { "$gt": 1 } } },
{ "$unwind": "$_ids" },
{ "$group": { "_id": "$_ids" } },
}},
{ "$lookup": {
"from": "flights",
"localField": "_id",
"foreignField": "_id",
"as": "_ids"
}},
{ "$unwind": "$_ids" },
{ "$replaceRoot": {
"newRoot": "$_ids"
}}
])
The final $replaceRoot (or a $project) is what makes it return the whole document. Or you could even have done $addToSet with $$ROOT if the size was not a problem.
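For reference, a minimal sketch of that $addToSet with $$ROOT idea, copying the root into a field before the $unwind so the original array is preserved (the "reporting" variant below uses the same trick); a document that overlaps on several dates would come out once per overlapping date unless you add a further $group on _id:
db.flights.aggregate([
    { "$addFields": { "copy": "$$ROOT" } },
    { "$unwind": "$dates" },
    { "$group": {
        "_id": { "carrier": "$carrier", "flightNumber": "$flightNumber", "date": "$dates" },
        "count": { "$sum": 1 },
        "_docs": { "$addToSet": "$copy" }
    }},
    { "$match": { "count": { "$gt": 1 } } },
    { "$unwind": "$_docs" },
    { "$replaceRoot": { "newRoot": "$_docs" } }
])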
But the overall point is covered in the first three pipeline stages, or mostly in just the "first". If you want to work with arrays "across documents", then the primary operator is still $unwind.
Alternatively, for a more "reporting"-like format:
db.flights.aggregate([
{ "$addFields": { "copy": "$$ROOT" } },
{ "$unwind": "$dates" },
{ "$group": {
"_id": {
"carrier": "$carrier",
"flightNumber": "$flightNumber",
"dates": "$dates"
},
"count": { "$sum": 1 },
"_docs": { "$addToSet": "$copy" }
}},
{ "$match": { "count": { "$gt": 1 } } },
{ "$group": {
"_id": {
"carrier": "$_id.carrier",
"flightNumber": "$_id.flightNumber",
},
"overlaps": {
"$push": {
"date": "$_id.dates",
"_docs": "$_docs"
}
}
}}
])
Which would report the overlapped dates within each group and tell you which documents contained the overlap:
{
"_id" : {
"carrier" : "abc",
"flightNumber" : 123.0
},
"overlaps" : [
{
"date" : ISODate("2015-01-03T00:00:00.000Z"),
"_docs" : [
{
"_id" : ObjectId("5977f9187dcd6a5f6a9b4b97"),
"carrier" : "abc",
"flightNumber" : 123.0,
"dates" : [
ISODate("2015-01-03T00:00:00.000Z"),
ISODate("2015-01-04T00:00:00.000Z"),
ISODate("2015-01-05T00:00:00.000Z")
]
},
{
"_id" : ObjectId("5977f9187dcd6a5f6a9b4b96"),
"carrier" : "abc",
"flightNumber" : 123.0,
"dates" : [
ISODate("2015-01-01T00:00:00.000Z"),
ISODate("2015-01-02T00:00:00.000Z"),
ISODate("2015-01-03T00:00:00.000Z")
]
}
]
}
]
}

Aggregation query in mongo and spring-data-mongo

Hi everyone, I have a big problem querying my data. I have documents like this:
{
"_id" : NumberLong(999789748357864),
"text" : "#asd #weila #asd2 welcome in my house",
"date" : ISODate("2016-12-13T21:44:37.000Z"),
"dateString" : "2016-12-13",
"hashtags" : [
"asd",
"weila",
"asd2"
]
}
and I want to build two queries:
1) Count for each day the number of occurrences of each hashtag, and get output like this, for example:
{_id:"2016-12-13",
hashtags:[
{hashtag:"asd",count:20},
{hashtag:"weila",count:18},
{hashtag:"asd2",count:10},
....
]
}
{_id:"2016-12-14",
hashtags:[
{hashtag:"asd",count:18},
{hashtag:"asd2",count:14},
{hashtag:"weila",count:10},
....
]
}
2) Another is the same, but I want to set a period from 2016-12-13 to 2016-12-17.
For the first one I wrote this query and I get what I'm looking for, but I don't know how to write it in Spring Data Mongo.
db.comment.aggregate([
{$unwind:"$hashtags"},
{"$group":{
"_id":{
"date" : "$dateString",
"hashtag": "$hashtags"
},
"count":{"$sum":1}
}
},
{"$group":{
"_id": "$_id.date",
"hashtags": {
"$push": {
"hashtag": "$_id.hashtag",
"count": "$count"
}},
"count": { "$sum": "$count" }
}},
{"$sort": { count: -1}},
{"$unwind": "$hashtags"},
{"$sort": { "count": -1, "hashtags.count": -1}},
{"$group": {
"_id": "$_id",
"hashtags": { "$push": "$hashtags" },
"count": { "$first": "$count" }
}},
{$project:{name:1,hashtags: { $slice: ["$hashtags", 2 ]}}}
]);
You can still use a fraction of the same aggregation operation, minus the pipeline steps after the second group stage, but for the filtering aspect you'd have to introduce a date range query in an initial $match pipeline step.
The following mongo shell examples show how you filter the aggregates for a particular date range:
1) Set a period from 2016-12-13 to 2016-12-14:
var startDate = new Date("2016-12-13");
startDate.setHours(0,0,0,0);
var endDate = new Date("2016-12-14");
endDate.setHours(23,59,59,999);
var pipeline = [
{
"$match": {
"date": { "$gte": startDate, "$lte": endDate }
}
},
{ "$unwind": "$hashtags" },
{
"$group": {
"_id": {
"date": "$dateString",
"hashtag": "$hashtags"
},
"count": { "$sum": 1 }
}
},
{
"$group": {
"_id": "$_id.date",
"hashtags": {
"$push": {
"hashtag": "$_id.hashtag",
"count": "$count"
}
}
}
}
]
db.comment.aggregate(pipeline)
2) Set a period from 2016-12-13 to 2016-12-17:
var startDate = new Date("2016-12-13");
startDate.setHours(0,0,0,0);
var endDate = new Date("2016-12-17");
endDate.setHours(23,59,59,999);
// run the same pipeline as above but with the date range query set as required
Spring Data Equivalent (untested):
import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;
Aggregation agg = newAggregation(
match(Criteria.where("date").gte(startDate).lte(endDate)),
unwind("hashtags"),
group("dateString", "hashtags").count().as("count"),
group("_id.dateString")
.push(new BasicDBObject("hashtag", "$_id.hashtags")
.append("count", "$count")).as("hashtags")
);
AggregationResults<Comment> results = mongoTemplate.aggregate(agg, Comment.class);
List<Comment> comments = results.getMappedResults();

Is it possible to compare two months' data in a single collection in MongoDB?

I have a collection with 10,000,000 call records.
I want to compare the call usage of the previous month with the next month.
Example of collection document
{
"_id" : ObjectId("54ed74d76c68d23af73e230a"),
"msisdn" : "9818441000",
"callType" : "ISD"
"duration" : 10.109999656677246,
"charges" : 200,
"traffic" : "Voice",
"Date" : ISODate("2014-01-05T19:51:01.928Z")
}
{
"_id" : ObjectId("54ed74d76c68d23af73e230b"),
"msisdn" : "9818843796",
"callType" : "Local",
"duration" : 1,
"charges" : 150,
"traffic" : "Voice",
"Date" : ISODate("2014-02-04T14:25:35.861Z")
}
Duration is my usage.
I want to compare the duration of ISODate("2014-01-04T14:25:35.861Z") with the next month, ISODate("2014-02-04T14:25:35.861Z"), across all records.
All msisdn numbers are the same in both months.
The obvious call here seems to be to aggregate the data, which the MongoDB aggregation framework is well suited to, taking the general use case fields present here. And yes, we generally talk in terms of discrete months rather than some value assumed to be one month from the current point in time:
db.collection.aggregate([
{ "$match": {
"msisdn": "9818441000",
"Date": {
"$gte": new Date("2014-01-01"),
"$lt": new Date("2014-03-01")
}
}},
{ "$group": {
"_id": {
"year": { "$year": "$Date" },
"month": { "$month": "$Date" },
"callType": "$callType",
"traffic": "$traffic"
},
"charges": { "$sum": "$charges" },
"duration": { "$sum": "$duration" }
}},
{ "$sort": { "_id": 1 } }
])
The intent there is to produce two records in the response representing each month as a distinct value.
You can basically take those two results and compare the difference between them in client code.
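As a rough illustration of that client-side comparison in the shell, reusing the pipeline above and assuming the msisdn has a single callType and traffic combination, so exactly two month buckets come back in sorted order:
// Illustrative only: diff the two monthly buckets returned by the pipeline above
var months = db.collection.aggregate([
    { "$match": {
        "msisdn": "9818441000",
        "Date": { "$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01") }
    }},
    { "$group": {
        "_id": {
            "year": { "$year": "$Date" },
            "month": { "$month": "$Date" },
            "callType": "$callType",
            "traffic": "$traffic"
        },
        "charges": { "$sum": "$charges" },
        "duration": { "$sum": "$duration" }
    }},
    { "$sort": { "_id": 1 } }
]).toArray();
if (months.length === 2) {
    print("duration change: " + (months[1].duration - months[0].duration));
    print("charges change: " + (months[1].charges - months[0].charges));
}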
Or you can do this over all "MSISDN" values with months grouped into pairs within the document:
db.collection.aggregate([
{ "$match": {
"Date": {
"$gte": new Date("2014-01-01"),
"$lt": new Date("2014-03-01")
}
}},
{ "$group": {
"_id": {
"year": { "$year": "$Date" },
"month": { "$month": "$Date" },
"msisdn": "$msisdn",
"callType": "$callType",
"traffic": "$traffic"
},
"charges": { "$sum": "$charges" },
"duration": { "$sum": "$duration" }
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": {
"msisdn": "$_id.msisdn",
"callType": "$_id.callType",
"traffic": "$_id.traffic"
},
"data": { "$push": {
"year": "$_id.year",
"month": "$_id.month",
"charges": "$charges",
"duration": "$duration"
}}
}}
])

Correct query for group by user, per month

I have a MongoDB collection that stores documents in this format:
{
"name" : "Username",
"timeOfError" : ISODate("...")
}
I'm using this collection to keep track of who got an error and when it occurred.
What I want to do now is create a query that retrieves errors per user, per month or something similar. Something like this:
{
"result": [
{
"_id": "$name",
"errorsPerMonth": [
{
"month": "0",
"errorsThisMonth": 10
},
{
"month": "1",
"errorsThisMonth": 20
}
]
}
]
}
I have tried several different queries, but none have given the desired result. The closest result came from this query:
db.collection.aggregate(
[
{
$group:
{
_id: { $month: "$timeOfError"},
name: { $push: "$name" },
totalErrorsThisMonth: { $sum: 1 }
}
}
]
);
The problem here is that the $push just adds the username for each error. So I get an array with duplicate names.
You need to compound the _id value in $group:
db.collection.aggregate([
{ "$group": {
"_id": {
"name": "$name",
"month": { "$month": "$timeOfError" }
},
"totalErrors": { "$sum": 1 }
}}
])
The _id is essentially the "grouping key", so whatever elements you want to group by need to be a part of that.
If you want a different order then you can change the grouping key precedence:
db.collection.aggregate([
{ "$group": {
"_id": {
"month": { "$month": "$timeOfError" },
"name": "$name"
},
"totalErrors": { "$sum": 1 }
}}
])
Or, if you wanted to sort, or had other conditions in your pipeline with different fields, just add a $sort pipeline stage at the end:
db.collection.aggregate([
{ "$group": {
"_id": {
"month": { "$month": "$timeOfError" },
"name": "$name"
},
"totalErrors": { "$sum": 1 }
}},
{ "$sort": { "_id.name": 1, "_id.month": 1 } }
])
Where you can essentially $sort on whatever you want.
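If you also want the nested errorsPerMonth array from the desired output in the question, a second $group that pushes the per-month counts for each user will reshape the result; a sketch building on the pipeline above:
db.collection.aggregate([
    { "$group": {
        "_id": {
            "name": "$name",
            "month": { "$month": "$timeOfError" }
        },
        "errorsThisMonth": { "$sum": 1 }
    }},
    { "$sort": { "_id.name": 1, "_id.month": 1 } },
    { "$group": {
        "_id": "$_id.name",
        "errorsPerMonth": {
            "$push": {
                "month": "$_id.month",
                "errorsThisMonth": "$errorsThisMonth"
            }
        }
    }}
])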

How to aggregate in MongoDB

I have a collection called user.monthly, in which we store 'day' : no. of clicks.
Here I have given two samples for different dates.
For month January
{
name : "devid",
date : ISODate("2014-01-21T11:32:42.392Z"),
daily: {'1':12,'9':13,'30':13}
}
For month February
{
name : "devid",
date : ISODate("2014-02-21T11:32:42.392Z"),
daily: {'3':12,'12':13,'25':13}
}
How can I aggregate this and get the total clicks for January and February?
Please help me to resolve my problem.
Your current schema is not helping you here, as the "daily" field (which we presume is your clicks per type or something like that) is represented as a sub-document, which means that you need to explicitly name the path to each field in order to do something with it.
A better approach would be to put this information in an array:
{
"name" : "devid",
"date" : ISODate("2014-02-21T11:32:42.392Z"),
"daily": [
{ "type": "3", "clicks": 12 },
{ "type": "12", "clicks": 13 },
{ "type": "25", "clicks": 13 }
]
}
Then you have an aggregation statement that goes like this:
db.collection.aggregate([
// Just match the dates in January and February
{ "$match": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
}},
// Unwind the "daily" array
{ "$unwind": "$daily" },
// Group the values together by "type" on "January" and "February"
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
"type": "$daily.type"
},
"clicks": { "$sum": "$daily.clicks" }
}},
// Sort the result nicely
{ "$sort": {
"_id.year": 1,
"_id.month": 1,
"_id.type": 1
}}
])
That form is pretty simple. Or, even simpler, if you do not care about the type as a grouping and just want the month totals:
db.collection.aggregate([
{ "$match": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
}},
{ "$unwind": "$daily" },
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
},
"clicks": { "$sum": "$daily.clicks" }
}},
{ "$sort": { "_id.year": 1, "_id.month": 1 }}
])
But with the sub-document form you currently have, this becomes ugly:
db.collection.aggregate([
{ "$match": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
}},
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
},
"clicks": {
"$sum": {
"$add": [
{ "$ifNull": ["$daily.1", 0] },
{ "$ifNull": ["$daily.3", 0] },
{ "$ifNull": ["$daily.9", 0] },
{ "$ifNull": ["$daily.12", 0] },
{ "$ifNull": ["$daily.25", 0] },
{ "$ifNull": ["$daily.30", 0] },
]
}
}
}}
])
That shows that you have no option here other than to specify what is essentially every possible field under "daily" (so probably a much larger list in practice). Then we have to allow for the fact that a key may not exist in a given document and return a default value when it does not.
For example, your first document has no key "daily.3" so without the $ifNull check the returned value would be null and invalidate the whole $sum process so that the total would be "0".
Grouping on those keys as in the first aggregate example gets even worse:
db.collection.aggregate([
// Just match the dates in January and February
{ "$match": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
}},
// Project with an array to match all possible values
{ "$project": {
"date": 1,
"daily": 1,
"type": { "$literal": ["1", "3", "9", "12", "25", "30" ] }
}},
// Unwind the "type" array
{ "$unwind": "$type" },
// Project values onto the "type" while grouping
{ "$group" : {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
"type": "$type"
},
"clicks": { "$sum": { "$cond": [
{ "$eq": [ "$type", "1" ] },
"$daily.1",
{ "$cond": [
{ "$eq": [ "$type", "3" ] },
"$daily.3",
{ "$cond": [
{ "$eq": [ "$type", "9" ] },
"$daily.9",
{ "$cond": [
{ "$eq": [ "$type", "12" ] },
"$daily.12",
{ "$cond": [
{ "$eq": [ "$type", "25" ] },
"$daily.25",
"$daily.30"
]}
]}
]}
]}
]}}
}},
{ "$sort": {
"_id.year": 1,
"_id.month": 1,
"_id.type": 1
}}
])
This creates one big conditional evaluation using $cond to match the values to the "type", for which we projected all the possible values into an array using the $literal operator.
If you do not have MongoDB 2.6 or greater you can always do this in place of the $literal operator statement:
"type": { "$cond": [1, ["1", "3", "9", "12", "25", "30" ], 0] }
Where essentially the true evaluation from $cond returns a "literal" declared value, which is how you specify an array. There is also the hidden $const operator that is not documented, but now exposed as $literal.
As you can see, the structure here is doing you no favors, so the best option is to change it. But if you cannot, or otherwise find the aggregation approach too hard to handle, then mapReduce offers an alternative, though the processing will be much slower:
db.collection.mapReduce(
function () {
for ( var k in this.daily ) {
emit(
{
year: this.date.getFullYear(),
month: this.date.getMonth() + 1,
type: k
},
this.daily[k]
);
}
},
function(key,values) {
return Array.sum( values );
},
{
"query": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
},
"out": { "inline": 1 }
}
)
The general lesson here is that you will get the cleanest and fastest results by altering the document format and using the aggregation framework. But all the ways to do this are listed here.
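If altering the format is an option, a rough one-off migration in the shell could convert the existing "daily" sub-document into the suggested array form. This is only a sketch, using the collection and field names from the examples above:
// Illustrative migration: turn the "daily" sub-document into an array of { type, clicks }
db.collection.find({}).forEach(function (doc) {
    if (doc.daily && !Array.isArray(doc.daily)) {
        var asArray = [];
        for (var k in doc.daily) {
            asArray.push({ "type": k, "clicks": doc.daily[k] });
        }
        db.collection.update({ "_id": doc._id }, { "$set": { "daily": asArray } });
    }
});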