I have a time series dataset with a few hundred thousand records. I am trying to write an aggregation query in MongoDB that groups this data into fixed intervals and averages the price within each interval.
Ideally I want 10-minute intervals (600000 ms) with the average price for each. I'm not sure how to carry on from where I am.
Sample document (the collection has a few hundred thousand of these):
{
"time" : 1391485215000,
"price" : "0.00133355",
}
query = [
{
"$project": {
"_id":"$_id",
"price":"$price",
"time": {
xxxx
}
}
},
{
"$group": {xxxx}
}
]
So it would appear that I had a fundamental flaw in my schema. I was using an epoch timestamp instead of MongoDB's Date type, and I was storing the other numbers as strings instead of doubles. I tried a few workarounds, but it doesn't look like you can use the built-in date and accumulator operators unless the fields are of the correct type.
query = [
{
$project: {
year: { $year: '$time'},
month: { $month: '$time'},
day: { $dayOfMonth: '$time'},
hour: { $hour: '$time'},
price: 1,
total: 1,
amount: 1
}
},
{
$group : {
_id: { year: '$year', month: '$month', day: '$day', hour: '$hour' },
price:{
$avg: "$price"
},
high:{
$max: "$price"
},
low:{
$min: "$price"
},
amount:{
$sum: "$amount"
},
total:{
$sum: "$total"
}
}
}
]
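The pipeline above groups by hour, whereas the question asked for 10-minute intervals. Here is a minimal sketch of one way to get those buckets without changing the schema, assuming `time` holds epoch milliseconds as a number, `price` is a numeric string, and the server is MongoDB 4.0 or later (needed for `$toDouble`). The names `BUCKET_MS`, `bucketStart`, `avgPrice`, and the collection name `trades` are illustrative and not from the original post.

```javascript
// Bucket width: 10 minutes in milliseconds.
const BUCKET_MS = 10 * 60 * 1000;

// Plain-JS version of the bucketing arithmetic, handy for checking by hand.
function bucketStart(timeMs, bucketMs = BUCKET_MS) {
  return timeMs - (timeMs % bucketMs);
}

// The same arithmetic expressed as aggregation stages.
const pipeline = [
  {
    $project: {
      // Floor the epoch timestamp to the start of its 10-minute bucket.
      bucket: { $subtract: ["$time", { $mod: ["$time", BUCKET_MS] }] },
      // Convert the string price to a number so $avg can use it.
      price: { $toDouble: "$price" }
    }
  },
  {
    $group: {
      _id: "$bucket",
      avgPrice: { $avg: "$price" },
      count: { $sum: 1 }
    }
  },
  { $sort: { _id: 1 } }
];

// In the shell (the collection name is hypothetical):
// db.trades.aggregate(pipeline)
```

If the field has already been migrated to a Date, `$dateTrunc` with `unit: "minute"` and `binSize: 10` (MongoDB 5.0 or later) should give the same buckets.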
How is change in a value over time calculated in MongoDB?
I am streaming values over time.
The streaming data is collected at random sub-second intervals into individual documents.
The streamed data is grouped and averaged over one-minute groups.
The goal is to compare each value to the one-minute average one hour later.
Example data:
[
{
_id: ObjectId("63a318c36ccc42d2330fae5e"),
timestamp: ISODate("2022-12-21T14:30:31.172Z"),
value: 3.8
},
{
_id: ObjectId("63a318c46ccc42d2330fae8d"),
timestamp: ISODate("2022-12-21T14:30:32.189Z"),
value: 4.0
},
{
_id: ObjectId("63a318c36ccc42d2330fae5e"),
timestamp: ISODate("2022-12-21T15:30:14.025Z"),
value: 5.0
},
{
_id: ObjectId("63a318c36ccc42d2330fae5e"),
timestamp: ISODate("2022-12-21T15:30:18.025Z"),
value: 5.5
}
]
Values grouped and averaged in one-minute groups:
{$group:{_id:{
"code": "$code",
"year": { "$year": "$timestamp" },
"dayOfYear": { "$dayOfYear": "$timestamp" },
"hour": { "$hour": "$timestamp" },
"minute":{$minute:"$timestamp"}
},
value:{$avg:"$value"},
timestamp:{$first:"$timestamp"},
}
},
This gets close to the goal, but it aggregates all the values over a one-hour interval:
{$group:{_id:{
"code": "$code",
"year": { "$year": "$timestamp" },
"dayOfYear": { "$dayOfYear": "$timestamp" },
"hour": { "$hour": "$timestamp" }
},
value:{$first:"$value"},
valueLast:{$last:"$value"},
timestamp:{$first:"$timestamp"},
}
},
Instead, I want to look at the change for the individual documents.
That is, what is the 14:30 value at 15:30, and what is the 15:35 value at 16:35?
How do I compare a value to one hour later for each document? Desired output:
[
{
_id: ObjectId("63a318c36ccc42d2330fae5e"),
timestamp: ISODate("2022-12-21T14:30:31.172Z"),
value: 3.8,
valueLast: 5.25,
gainPct: .382
},
{
_id: ObjectId("63a318c46ccc42d2330fae8d"),
timestamp: ISODate("2022-12-21T14:30:32.189Z"),
value: 4.0,
valueLast: 5.25,
gainPct: .313
},
]
One option is to use $setWindowFields with a time range window.
It lets you partition by code, sort by cleanTimeStamp, and apply an accumulator ($avg) to all documents within a time range relative to the current document (each document in turn):
db.collection.aggregate([
{$set: {
cleanTimeStamp: {
$dateTrunc: {
date: "$timestamp",
unit: "minute"
}
}
}},
{$setWindowFields: {
partitionBy: "$code",
sortBy: {cleanTimeStamp: 1},
output: {
valueLast: {
$avg: "$value",
window: {range: [59, 60], unit: "minute"}
}
}
}},
{$set: {
gainPct: {$round: [{$divide: [{$subtract: ["$valueLast", "$value"]}, "$value"]}, 3]},
cleanTimeStamp: "$$REMOVE"
}
}
])
See how it works on the playground example
It is not clear to me if you want the result for each document or for a specific timestamp. If you only want the query to return results for a specific minute, you can add one more step of $match, as a first step, to limit the context of your documents to be between the wanted timestamp and 1 hour after it.
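To make the window arithmetic concrete, here is a plain JavaScript sketch that mimics what the `$setWindowFields` stage computes, run against the sample data from the question. It is only an illustration under some assumptions: all documents are treated as one partition (the sample has no `code` field), the gain is left unrounded, and the helper names `truncToMinute` and `compareToHourLater` are made up for this example.

```javascript
const MINUTE_MS = 60 * 1000;

// Truncate a Date to the start of its minute, like $dateTrunc with unit "minute".
function truncToMinute(date) {
  return Math.floor(date.getTime() / MINUTE_MS) * MINUTE_MS;
}

// For each document, average the values whose truncated minute lies
// 59 to 60 minutes later (inclusive), then compute the relative gain.
function compareToHourLater(docs) {
  return docs.map((doc) => {
    const base = truncToMinute(doc.timestamp);
    const later = docs.filter((d) => {
      const diff = truncToMinute(d.timestamp) - base;
      return diff >= 59 * MINUTE_MS && diff <= 60 * MINUTE_MS;
    });
    if (later.length === 0) {
      // Nothing in the window yet: no average, no gain.
      return { ...doc, valueLast: null, gainPct: null };
    }
    const valueLast = later.reduce((sum, d) => sum + d.value, 0) / later.length;
    return { ...doc, valueLast, gainPct: (valueLast - doc.value) / doc.value };
  });
}

const sample = [
  { timestamp: new Date("2022-12-21T14:30:31.172Z"), value: 3.8 },
  { timestamp: new Date("2022-12-21T14:30:32.189Z"), value: 4.0 },
  { timestamp: new Date("2022-12-21T15:30:14.025Z"), value: 5.0 },
  { timestamp: new Date("2022-12-21T15:30:18.025Z"), value: 5.5 }
];

const result = compareToHourLater(sample);
// Both 14:30 documents see the two 15:30 documents in their window,
// so their valueLast is (5.0 + 5.5) / 2 = 5.25.
```

The two 15:30 documents have nothing one hour after them in the sample, so they come back with `valueLast: null`, which is also what `$avg` yields for an empty window.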
I am spinning my wheels on this. I need to find all documents in a collection whose timestamps ("createdTs") are within 3 seconds of one another (to be clear: month, day, year, and time all the same, apart from those few seconds). An example of the createdTs field (its type is Date): 2021-04-26T20:39:01.851Z
db.getCollection("CollectionName").aggregate([
{ $match: { memberId: ObjectId("1234") } },
{
$project:
{
year: { $year: "$createdTs" },
month: { $month: "$createdTs" },
day: { $dayOfMonth: "$createdTs" },
hour: { $hour: "$createdTs" },
minutes: { $minute: "$createdTs" },
seconds: { $second: "$createdTs" },
milliseconds: { $millisecond: "$createdTs" },
dayOfYear: { $dayOfYear: "$createdTs" },
dayOfWeek: { $dayOfWeek: "$createdTs" },
week: { $week: "$createdTs" }
}
}
])
I've tried a lot of different variations. Where I'm struggling is how to compare these results to one another. I'd also prefer to search the entire collection rather than match on the "memberId" field: just collect any documents whose createdTs values are less than 3 seconds apart, and group/display those.
Is this possible? Newer to Mongo, and spun my wheels on this for two days now. Any advice would be greatly appreciated, thank you!
I saw this on another post, but not sure how to utilize it since I'm wanting to compare the same field:
db.collection.aggregate([
{ "$project": {
"difference": {
"$divide": [
{ "$subtract": ["$logoutTime", "$loginTime"] },
60 * 1000 * 60
]
}
}},
{ "$group": {
"_id": "$studentID",
"totalDifference": { "$sum": "$difference" }
}},
{ "$match": { "totalDifference": { "$gte": 20 }}}
])
Also am trying...
db.getCollection("CollectionName").aggregate([
{ $match: { memberId: ObjectId("1234") } },
{
$project:
{
year: { $year: "$createdTs" },
month: { $month: "$createdTs" },
total:
{ $add: ["$year", "$month"] }
}
}
])
But this returns a total of null. Not sure if it's because $year and $month are different? The types are both int32, so I thought that would work. I was wondering whether there's a way, when using $group, to check that the differences in all the other fields are 0 and that the difference in seconds is no more than 3, and then go from there.
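The null total is most likely because "$year" and "$month" inside $add refer to fields of the incoming document, not to the sibling fields being computed in the same $project; those only exist for the next stage, and $add returns null when an operand is missing. For the actual goal, one possible approach (a sketch, assuming MongoDB 5.0 or later) is to let `$setWindowFields` count how many documents fall within 3 seconds of each document and keep those with at least one neighbour. The field name `nearbyCount` and the helper `hasCloseNeighbour` are hypothetical; the helper just mirrors the same logic in plain JavaScript so it can be checked without a server.

```javascript
const WINDOW_SECONDS = 3;

const pipeline = [
  {
    $setWindowFields: {
      // No partitionBy: the whole collection is compared, not one memberId.
      sortBy: { createdTs: 1 },
      output: {
        // Counts the current document plus any document within +/- 3 seconds.
        nearbyCount: {
          $count: {},
          window: { range: [-WINDOW_SECONDS, WINDOW_SECONDS], unit: "second" }
        }
      }
    }
  },
  // A count above 1 means at least one other document is that close.
  { $match: { nearbyCount: { $gt: 1 } } }
];

// Plain-JS equivalent of the check, for a list of Date objects.
function hasCloseNeighbour(dates, windowMs = WINDOW_SECONDS * 1000) {
  return dates.map((d, i) =>
    dates.some((other, j) => i !== j && Math.abs(other - d) <= windowMs)
  );
}

// In the shell: db.getCollection("CollectionName").aggregate(pipeline)
```

A later `$group` (for example on a truncated timestamp) could then collect the matching documents into clusters for display.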
Collection
{
"_id" : ObjectId("5a143a79ca78479b1dc90161"),
"createdAt" : ISODate("2017-11-21T14:38:49.375Z"),
"amount" : 227.93359186,
"pair" : "ant_eth"
}
Expected output
{
"12-12-2012": [
{
"pair": "ant_eth",
"sum": "sum of amounts in 12-12-2012"
},
{
"pair": "new_pair",
"sum": "sum of amounts in 12-12-2012"
},
],
"13-12-2012": [{
"pair": "ant_eth",
"sum": "sum of amounts in 13-12-2012"
}]
}
What I have achieved so far, to the best of my knowledge, is:
const criteria = [
{ $group: {
_id: '$pair',
totalAmount: { $sum: '$amount' } } }
]
Any help to achieve the expected output is much appreciated.
OK, so you want to sum amount by pair and by just the date portion of the datetime, and then organize all the pair+sum entries by date. You can do this by regrouping, as follows. The first $group creates the sums but leaves you with repeating dates. The second $group reshapes the output into almost what you want, except that the dates remain values of _id instead of becoming field names themselves.
db.foo.aggregate([
{
$group: {
_id: {d: {$dateToString: { format: "%Y-%m-%d", date: "$createdAt"}}, pair: "$pair"},
n: {$sum: "$amount"}
}
},
{
$group: {
_id: "$_id.d",
items: {$push: {pair: "$_id.pair", sum: "$n"}}
}
}
]);
If you REALLY want to have field names, then add these two stages after the second $group:
,{$project: {x: [["$_id","$items"]] }}
,{$replaceRoot: { newRoot: {$arrayToObject: "$x"} }}
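Note that the two extra stages produce one document per date. If a single document keyed by date is wanted, as in the expected output, a possible variant is to push key/value pairs into one array and convert it with `$arrayToObject`, which also accepts an array of `{k, v}` documents. The sketch below makes two assumptions that are not in the answer: the format string is switched to `%d-%m-%Y` to match the "12-12-2012" style of the question, and the intermediate field is named `pairs`.

```javascript
const pipeline = [
  {
    // Sum amounts per (day, pair).
    $group: {
      _id: {
        d: { $dateToString: { format: "%d-%m-%Y", date: "$createdAt" } },
        pair: "$pair"
      },
      n: { $sum: "$amount" }
    }
  },
  {
    // Regroup so each day holds its list of pair sums.
    $group: {
      _id: "$_id.d",
      items: { $push: { pair: "$_id.pair", sum: "$n" } }
    }
  },
  {
    // Collapse all days into one array of key/value pairs.
    $group: {
      _id: null,
      pairs: { $push: { k: "$_id", v: "$items" } }
    }
  },
  // Turn the pairs into field names on a single result document.
  { $replaceRoot: { newRoot: { $arrayToObject: "$pairs" } } }
];

// In the shell: db.foo.aggregate(pipeline)
```

Since everything ends up in one document, the result is subject to the 16 MB BSON document limit, so this only suits a modest number of dates.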
This is what I could get to:
db.collection.aggregate([{
$group: {
_id: {
year: {
"$year": "$createdAt"
},
month: {
"$month": "$createdAt"
},
day: {
"$dayOfMonth": "$createdAt"
},
pair: "$pair"
},
sum: {
$sum: "$amount"
}
}
}])
For the rest, you will probably need to do app-side parsing to generate the output you want.
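A minimal sketch of what that app-side step could look like in JavaScript, assuming the rows have the shape produced by the $group above (`_id` with year, month, day, and pair, plus `sum`). The function name `reshapeByDate`, the `pad2` helper, and the example rows are made up for illustration.

```javascript
// Pads a number to two digits, e.g. 3 becomes "03".
const pad2 = (n) => String(n).padStart(2, "0");

// Turns [{ _id: { year, month, day, pair }, sum }, ...] into
// { "dd-mm-yyyy": [{ pair, sum }, ...], ... }.
function reshapeByDate(rows) {
  const out = {};
  for (const row of rows) {
    const { year, month, day, pair } = row._id;
    const key = `${pad2(day)}-${pad2(month)}-${year}`;
    if (!out[key]) out[key] = [];
    out[key].push({ pair, sum: row.sum });
  }
  return out;
}

// Hypothetical aggregation output, for illustration only.
const rows = [
  { _id: { year: 2012, month: 12, day: 12, pair: "ant_eth" }, sum: 10 },
  { _id: { year: 2012, month: 12, day: 12, pair: "new_pair" }, sum: 5 },
  { _id: { year: 2012, month: 12, day: 13, pair: "ant_eth" }, sum: 7 }
];

const byDate = reshapeByDate(rows);
```

Here `byDate` has the keys "12-12-2012" (two entries) and "13-12-2012" (one entry), matching the layout of the expected output.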
Just starting with MongoDB. I used this post to do a similar query, but I need the actual date, like mm/dd/yyyy, instead of just the day of the year. Help!
How to group by multiple fields in MongoDB when one is a date field
Here is the query I have (almost the same as the post above):
db.col.aggregate(
{ $group: {
_id: {
status: "$status",
dayOfYear: { $dayOfYear: "$datetime" }
},
hits: { $sum: "$hits" }
} }
)
Here is sample data:
{
"_id" : ObjectId("45f5ed29f4e1a522bfe53f13"),
"hits" : 2,
"status" : 400,
"datetime" : ISODate("2014-01-10T10:17:57.216Z")
}
You can add more date columns or remove them to get different groupings:
db.col.aggregate(
{ $group: {
_id: {
status: "$status",
month: { $month: "$datetime" },
day: { $dayOfMonth: "$datetime" },
year: { $year: "$datetime" }
},
hits: { $sum: "$hits" }
} }
)
What this means is: for each unique status that appeared on a particular day (ignore time) sum up all hits.
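If the date is wanted as a single mm/dd/yyyy string rather than separate parts, one option is `$dateToString`. The following is a sketch of that alternative, not the answer's own method; the key name `date` and the helper `formatMdY` are illustrative, and the helper only shows in plain JavaScript what the resulting key looks like (both work in UTC).

```javascript
const pipeline = [
  {
    $group: {
      _id: {
        status: "$status",
        // One string key per calendar day, formatted as mm/dd/yyyy.
        date: { $dateToString: { format: "%m/%d/%Y", date: "$datetime" } }
      },
      hits: { $sum: "$hits" }
    }
  }
];

// Plain-JS illustration of the same formatting, using UTC fields.
function formatMdY(date) {
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `${mm}/${dd}/${date.getUTCFullYear()}`;
}

// In the shell: db.col.aggregate(pipeline)
```

For the sample document, whose datetime is 2014-01-10T10:17:57.216Z, the key would be "01/10/2014".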
I will try to keep this simple. I have user objects, and each one has an array field that contains the ISODates of the days the user has logged in. I would like to count how many users logged in on each date, for all dates that exist.
Sample user:
{
"_id": "some_id",
"name": "bob",
"logins": [isodate, isodate, isodate...],
//...
}
I'd like an output that tells me something like:
{
"date": ISODate,
"number_of_users_logged_in": 10
}
Is this possible? How would I go about doing it?
You need to use the $unwind operation to explode the array, then $group by date (using the granularity that you want), and finally $project only the date and count, as below:
db.user.aggregate({
$unwind: "$logins"
},
{
$group: {
_id: {
year: {
$year: "$logins"
},
month: {
$month: "$logins"
},
day: {
$dayOfMonth: "$logins"
},
hour: {
$hour: "$logins"
}
},
date: {
$first: "$logins"
},
count: {
$sum: 1
}
}
},
{
$project: {
_id : 0,
date: "$date",
number_of_users_logged_in: "$count"
}
})
I grouped by year/month/day/hour.
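One thing to be aware of: the pipeline above counts login entries per hour, so a user with two logins in the same bucket is counted twice. If the aim is the number of distinct users per day, a possible variant (a sketch, with the day key as a `YYYY-MM-DD` string instead of an ISODate) collects user ids with `$addToSet` and takes the `$size`. The helper `countUsersPerDay` and the sample users are hypothetical and only mirror the pipeline in plain JavaScript.

```javascript
const pipeline = [
  { $unwind: "$logins" },
  {
    $group: {
      // One group per calendar day (UTC).
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$logins" } },
      // A set, so each user appears at most once per day.
      users: { $addToSet: "$_id" }
    }
  },
  {
    $project: {
      _id: 0,
      date: "$_id",
      number_of_users_logged_in: { $size: "$users" }
    }
  },
  { $sort: { date: 1 } }
];

// Plain-JS equivalent for an array of user objects with Date logins.
function countUsersPerDay(users) {
  const perDay = new Map();
  for (const user of users) {
    for (const login of user.logins) {
      const day = login.toISOString().slice(0, 10);
      if (!perDay.has(day)) perDay.set(day, new Set());
      perDay.get(day).add(user._id);
    }
  }
  return [...perDay]
    .map(([date, ids]) => ({ date, number_of_users_logged_in: ids.size }))
    .sort((a, b) => a.date.localeCompare(b.date));
}

// Made-up sample data: bob logs in twice on the same day.
const sampleUsers = [
  { _id: "u1", name: "bob", logins: [new Date("2014-01-10T08:00:00Z"), new Date("2014-01-10T19:00:00Z")] },
  { _id: "u2", name: "ann", logins: [new Date("2014-01-10T09:30:00Z"), new Date("2014-01-11T09:30:00Z")] }
];

const counts = countUsersPerDay(sampleUsers);
// In the shell: db.user.aggregate(pipeline)
```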