MongoDB time series operations and generation

I've got a MongoDB collection with documents like this:
{
"_id" : ObjectId("53cb898bed4bd6c24ae07a9f"),
"account" : "C1",
"created_on" : ISODate("2014-10-01T01:23:00.000Z"),
"value" : 253
}
and
{
"_id" : ObjectId("52cb898bed4bd6c24ae06a9e"),
"account" : "C2",
"created_on" : ISODate("2014-10-01T01:23:00.000Z"),
"value" : 9381
}
There is a document every minute for C1 and C2.
I would like to generate data for another account, "C0", whose value is equal to (C2 - C1) * 0.25.
So the aim is to generate a C0 document for every minute in the collection.
Is it possible to do that in the mongo shell?
Thank you very much :)

The logic to solve this problem is as follows:
a) Group all the records by their created_on date.
b) Get the values of both documents in each group.
c) Calculate the difference between the C2 and C1 documents for each group.
d) If one of the documents is missing, use the value of the existing document as the difference.
e) Project a document whose value is (difference * 0.25) for each group.
f) Insert the projected document into the collection.
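The steps above can also be sketched outside the database. The snippet below is a minimal in-memory illustration in plain JavaScript, not the aggregation itself; the function name buildC0 and the sample array are assumptions made for this example. It keeps the sign of (C2 - C1) and falls back to the existing value when one account is missing.

```javascript
// Hypothetical helper: derive C0 documents from C1/C2 documents held in an array.
function buildC0(docs) {
  const byMinute = new Map();
  for (const doc of docs) {
    if (doc.account !== "C1" && doc.account !== "C2") continue;
    // Group by created_on (ISO string used as the map key).
    const key = doc.created_on.toISOString();
    if (!byMinute.has(key)) byMinute.set(key, { created_on: doc.created_on });
    // Remember the value of each account within the group.
    byMinute.get(key)[doc.account] = doc.value;
  }
  const result = [];
  for (const group of byMinute.values()) {
    let diff;
    if (group.C1 !== undefined && group.C2 !== undefined) {
      diff = group.C2 - group.C1;                          // both present: C2 - C1
    } else {
      diff = group.C1 !== undefined ? group.C1 : group.C2; // only one present
    }
    // Build the C0 document for this minute.
    result.push({ account: "C0", created_on: group.created_on, value: diff * 0.25 });
  }
  return result;
}

// Sample data taken from the question.
const sampleDocs = [
  { account: "C1", created_on: new Date("2014-10-01T01:23:00.000Z"), value: 253 },
  { account: "C2", created_on: new Date("2014-10-01T01:23:00.000Z"), value: 9381 }
];
console.log(buildC0(sampleDocs)); // one C0 document with value 2282
```

In a real run the resulting documents would still have to be inserted into the collection, as in the last step of the list.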
I would like to propose two solutions. The first one relies on your assumption:
There is a document every minute for C1 and C2.
So for every created_on time, there would be only two documents, C1 and C2.
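db.time.aggregate([ {
    $match : {
        "account" : {
            $in : [ "C1", "C2" ]
        }
    }
}, {
    // $first and $last are only meaningful on ordered input, so sort
    // to make C1 come before C2 within each minute.
    $sort : {
        "created_on" : 1,
        "account" : 1
    }
}, {
    $group : {
        "_id" : "$created_on",
        "first" : {
            $first : "$value"   // C1 when both documents exist
        },
        "second" : {
            $last : "$value"    // C2 when both documents exist
        },
        "count" : {
            $sum : 1
        }
    }
}, {
    $project : {
        "_id" : 0,
        "value" : {
            $multiply : [ {
                // only one document in the group: use its value as the difference
                $cond : [ {
                    $lte : [ "$count", 1 ]
                }, "$first", {
                    $subtract : [ "$second", "$first" ]   // C2 - C1
                } ]
            }, 0.25 ]
        },
        "created_on" : "$_id",
        "account" : {
            $literal : "C0"
        }
    }
} ]).forEach(function(doc) {
    db.time.insert(doc);
});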
The second solution is based on real-world scenarios. For a particular created_on time, there can be n C1 documents and m C2 documents with different values, but we need only one C0 document representing the difference for that created_on time. You need an extra $group pipeline stage, as below:
db.time.aggregate([ {
    $match : {
        "account" : {
            $in : [ "C1", "C2" ]
        }
    }
}, {
    // Sum the values per minute and per account.
    $group : {
        "_id" : {
            "created_on" : "$created_on",
            "account" : "$account"
        },
        "created_on" : {
            $first : "$created_on"
        },
        "values" : {
            $sum : "$value"
        }
    }
}, {
    // Order the per-account sums so that C1 precedes C2 within each minute.
    $sort : {
        "created_on" : 1,
        "_id.account" : 1
    }
}, {
    $group : {
        "_id" : "$created_on",
        "first" : {
            $first : "$values"   // C1 sum when both accounts exist
        },
        "second" : {
            $last : "$values"    // C2 sum when both accounts exist
        },
        "count" : {
            $sum : 1
        }
    }
}, {
    $project : {
        "_id" : 0,
        "value" : {
            $multiply : [ {
                $cond : [ {
                    $lte : [ "$count", 1 ]
                }, "$first", {
                    $subtract : [ "$second", "$first" ]   // C2 - C1
                } ]
            }, 0.25 ]
        },
        "created_on" : "$_id",
        "account" : {
            $literal : "C0"
        }
    }
} ]).forEach(function(doc) {
    db.time.insert(doc);
});
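For the case with several C1 and C2 documents per minute, the same idea can be sketched in plain JavaScript: sum per account first, then combine per minute. This is only an in-memory illustration under assumed names (buildC0FromMany and the sample array are hypothetical), not a replacement for the pipeline.

```javascript
// Hypothetical helper: sum values per (minute, account), then build one C0 per minute.
function buildC0FromMany(docs) {
  const sums = new Map(); // minute (ISO string) -> { C1: sum, C2: sum }
  for (const { account, created_on, value } of docs) {
    if (account !== "C1" && account !== "C2") continue;
    const key = created_on.toISOString();
    const entry = sums.get(key) || {};
    entry[account] = (entry[account] || 0) + value;
    sums.set(key, entry);
  }
  return Array.from(sums, ([minute, s]) => {
    const both = s.C1 !== undefined && s.C2 !== undefined;
    // Both accounts present: C2 - C1; otherwise use the sum that exists.
    const diff = both ? s.C2 - s.C1 : (s.C1 !== undefined ? s.C1 : s.C2);
    return { account: "C0", created_on: new Date(minute), value: diff * 0.25 };
  });
}

// Made-up sample: two C1 documents and one C2 document in the same minute.
const manyDocs = [
  { account: "C1", created_on: new Date("2014-10-01T01:23:00.000Z"), value: 100 },
  { account: "C1", created_on: new Date("2014-10-01T01:23:00.000Z"), value: 50 },
  { account: "C2", created_on: new Date("2014-10-01T01:23:00.000Z"), value: 400 }
];
console.log(buildC0FromMany(manyDocs)); // value is (400 - 150) * 0.25 = 62.5
```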

Related

MongoDB filtering out subdocuments with lookup aggregation

Our project database has a capped collection called values which gets updated every few minutes with new data from sensors. These sensors all belong to a single sensor node, and I would like to query the latest data from these nodes in a single aggregation. The problem I am having is keeping just the latest value for each sensor type while still having only one (efficient) query. I looked around and found the $group stage, but I can't figure out how to use it correctly in this case.
The database is structured as follows:
nodes:
{
"_id": 681,
"sensors": [
{
"type": "foo"
},
{
"type": "bar"
}
]
}
values:
{
"_id" : ObjectId("570cc8b6ac55850d5740784e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "foo",
"nodeid" : 681,
"value" : 10
}
{
"_id" : ObjectId("190ac8b6ac55850d5740776e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "bar",
"nodeid" : 681,
"value" : 20
}
{
"_id" : ObjectId("167bc997bb66750d5740665e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "bar",
"nodeid" : 200,
"value" : 20
}
{
"_id" : ObjectId("110cc9c6ac55850d5740784e"),
"timestamp" : ISODate("2016-04-09T12:06:46.344Z"),
"type" : "foo",
"nodeid" : 681,
"value" : 12
}
So let's imagine I want the data from node 681; I would want a structure like this:
nodes:
{
"_id": 681,
"sensors": [
{
"_id" : ObjectId("570cc8b6ac55850d5740784e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "foo",
"nodeid" : 681,
"value" : 10
},
{
"_id" : ObjectId("190ac8b6ac55850d5740776e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "bar",
"nodeid" : 681,
"value" : 20
}
]
}
Notice how one value of foo is not returned, because I only want the latest value when there is more than one (which is always going to be the case). The collection is already ordered by timestamp because it is capped.
I have this query, but it just gets all the values from the database (which is waaay too much to do in a lifetime, let alone one request of the web app), so I was wondering how I would filter it before it gets aggregated.
query:
db.nodes.aggregate(
[
{
$unwind: "$sensors"
},
{
$match:{
nodeid: 681
}
},
{
$lookup:{
from: "values", localField: "sensors.type", foreignField: "type", as: "sensors"
}
}
]
)
Try this pipeline on the values collection:
// Pipeline
[
// Stage 1 - sort newest first so that $first picks the latest reading
{
$sort: {
"timestamp":-1
}
},
// Stage 2 - group by type & nodeid, keeping the first (latest) item in each group
{
$group: {
"_id":{type:"$type",nodeid:"$nodeid"},
"sensors": {"$first":"$$CURRENT"} // use $last instead if you sort ascending
}
},
// Stage 3 - project the desired fields
{
$project: {
"_id":"$sensors._id",
"timestamp":"$sensors.timestamp",
"type":"$sensors.type",
"nodeid":"$sensors.nodeid",
"value":"$sensors.value"
}
},
// Stage 4 - group and push it to array sensors
{
$group: {
"_id":{nodeid:"$nodeid"},
"sensors": {"$addToSet":"$$CURRENT"}
}
}
]
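To make the intent of the pipeline concrete, here is a minimal in-memory sketch in plain JavaScript of "latest reading per sensor type for one node". The function name latestReadings is an assumption for this example, and it compares timestamps explicitly instead of relying on document order.

```javascript
// Hypothetical helper: keep only the newest reading per sensor type for one node.
function latestReadings(values, nodeid) {
  const latest = new Map(); // type -> newest reading seen so far
  for (const reading of values) {
    if (reading.nodeid !== nodeid) continue;
    const current = latest.get(reading.type);
    if (!current || reading.timestamp > current.timestamp) {
      latest.set(reading.type, reading);
    }
  }
  return { _id: nodeid, sensors: Array.from(latest.values()) };
}

// Sample data from the question (ObjectIds omitted).
const sensorValues = [
  { type: "foo", nodeid: 681, value: 10, timestamp: new Date("2016-04-12T12:06:46.344Z") },
  { type: "bar", nodeid: 681, value: 20, timestamp: new Date("2016-04-12T12:06:46.344Z") },
  { type: "bar", nodeid: 200, value: 20, timestamp: new Date("2016-04-12T12:06:46.344Z") },
  { type: "foo", nodeid: 681, value: 12, timestamp: new Date("2016-04-09T12:06:46.344Z") }
];
console.log(latestReadings(sensorValues, 681)); // foo -> value 10, bar -> value 20
```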
As far as I understand the document structure, there is no need to use $lookup, as all the data is in the readings (values) collection.
Please see the proposed solution:
db.readings.aggregate([{
$match : {
nodeid : 681
}
},
{
$group : {
_id : {
type : "$type",
nodeid : "$nodeid"
},
readings : {
$push : {
timestamp : "$timestamp",
value : "$value",
id : "$_id"
}
}
}
}, {
$project : {
_id : "$_id",
readings : {
$slice : ["$readings", -1]
}
}
}, {
$unwind : "$readings"
}, {
$project : {
_id : "$readings.id",
type : "$_id.type",
nodeid : "$_id.nodeid",
timestamp : "$readings.timestamp",
value : "$readings.value",
}
}, {
$group : {
_id : "$nodeid",
sensors : {
$push : {
_id : "$_id",
timestamp : "$timestamp",
value : "$value",
type:"$type"
}
}
}
}
])
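and the output. Note that $slice with -1 keeps the last reading pushed for each type, so the result depends on the order in which the documents are read; in the sample data the 2016-04-09 foo reading comes last: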
{
"_id" : 681,
"sensors" : [
{
"_id" : ObjectId("110cc9c6ac55850d5740784e"),
"timestamp" : ISODate("2016-04-09T12:06:46.344Z"),
"value" : 12,
"type" : "foo"
},
{
"_id" : ObjectId("190ac8b6ac55850d5740776e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"value" : 20,
"type" : "bar"
}
]
}
Any comments welcome!

Compare two collections in MongoDB

I have two different collections, book and music, in JSON. First, here is an example of the book collection:
{
"_id" : ObjectId("b1"),
"author" : [
"Mary",
],
"title" : "Book1",
}
{
"_id" : ObjectId("b2"),
"author" : [
"Joe",
"Tony",
"Mary"
],
"title" : "Book2",
}
{
"_id" : ObjectId("b3"),
"author" : [
"Joe",
"Mary"
],
"title" : "Book3",
}
.......
Mary wrote 3 books, Joe wrote 2 books, and Tony wrote 1 book. Second, here is an example of the music collection:
{
"_id" : ObjectId("m1"),
"author" : [
"Tony"
],
"title" : "Music1",
}
{
"_id" : ObjectId("m2"),
"author" : [
"Joe",
"Tony"
],
"title" : "Music2",
}
.......
Tony has 2 music pieces, Joe has 1, and Mary has none.
I want to get the number of authors who wrote more books than music.
Thus, Mary (3 > 0) and Joe (2 > 1) should be counted, but not Tony (1 < 2), so the final result should be 2 (Mary and Joe).
I wrote the following code, but I don't know how to compare the results:
db.book.aggregate([
{ $project:{ _id:0, author:1}},
{ $unwind:"$author" },
{$group:{_id:"$author", count:{$sum:1}}}
]
)
db.music.aggregate([
{ $project:{ _id:0, author:1}},
{ $unwind:"$author" },
{$group:{_id:"$author", count:{$sum:1}}}
]
)
Is this right so far? How do I do the comparison? Thanks.
To solve this problem, we can use the $out stage to store the result of each query in an intermediate collection, and then use an aggregation query to join them with $lookup.
db.book.aggregate([{
$project : {
_id : 0,
author : 1
}
}, {
$unwind : "$author"
}, {
$group : {
_id : "$author",
count : {
$sum : 1
}
}
}, {
$project : {
_id : 0,
author : "$_id",
count : 1
}
}, {
$out : "bookAuthors"
}
])
db.music.aggregate([{
$project : {
_id : 0,
author : 1
}
}, {
$unwind : "$author"
}, {
$group : {
_id : "$author",
count : {
$sum : 1
}
}
}, {
$project : {
_id : 0,
author : "$_id",
count : 1
}
}, {
$out : "musicAuthors"
}
])
db.bookAuthors.aggregate([{
$lookup : {
from : "musicAuthors",
localField : "author",
foreignField : "author",
as : "music"
}
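}, {
$unwind : {
path : "$music",
// keep authors who have no music at all (e.g. Mary)
preserveNullAndEmptyArrays : true
}
}, {
$project : {
_id : "$author",
result : {
$gt : ["$count", { $ifNull : ["$music.count", 0] }]
},
count : 1,
}
}, {
$match : {
result : true
}
}
])
EDIT CHANGES:
used the author field instead of _id
added a logical expression embedded in the document in the $project stage:
result : { $gt : ["$count", { $ifNull : ["$music.count", 0] }] }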
Any questions welcome!
Have fun!
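As a cross-check of the expected answer, the comparison can be sketched in plain JavaScript on the sample data from the question. This is only an in-memory illustration; the helper names countByAuthor and authorsWithMoreBooks are assumptions for this example. Authors with no music are treated as having a count of 0.

```javascript
// Count how many titles each author appears on.
function countByAuthor(items) {
  const counts = new Map();
  for (const item of items) {
    for (const author of item.author) {
      counts.set(author, (counts.get(author) || 0) + 1);
    }
  }
  return counts;
}

// Hypothetical helper: authors with strictly more books than music.
function authorsWithMoreBooks(books, music) {
  const bookCounts = countByAuthor(books);
  const musicCounts = countByAuthor(music);
  return Array.from(bookCounts.keys())
    .filter((a) => bookCounts.get(a) > (musicCounts.get(a) || 0));
}

// Sample data from the question.
const bookDocs = [
  { title: "Book1", author: ["Mary"] },
  { title: "Book2", author: ["Joe", "Tony", "Mary"] },
  { title: "Book3", author: ["Joe", "Mary"] }
];
const musicDocs = [
  { title: "Music1", author: ["Tony"] },
  { title: "Music2", author: ["Joe", "Tony"] }
];
const winners = authorsWithMoreBooks(bookDocs, musicDocs);
console.log(winners, winners.length); // Mary and Joe, so the count is 2
```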

MongoDB $sum and $avg of sub documents

I need to get the $sum and $avg of subdocuments: I would like the $sum and $avg of Channels[0] and of the other channels as well.
My data structure looks like this:
{
_id : ...,
Location : 1,
Channels : [
{ _id: ...,
Value: 25
},
{
_id: ... ,
Value: 39
},
{
_id: ..,
Value: 12
}
]
}
In order to get the sum and average of the Channels.Value elements for each document in your collection, you will need to use MongoDB's aggregation framework. Further, since Channels is an array, you will need to use the $unwind operator to deconstruct the array.
Assuming that your collection is called example, here's how you could get both the document sum and average of the Channels.Value fields:
db.example.aggregate( [
{
"$unwind" : "$Channels"
},
{
"$group" : {
"_id" : "$_id",
"documentSum" : { "$sum" : "$Channels.Value" },
"documentAvg" : { "$avg" : "$Channels.Value" }
}
}
] )
The output from your post's data would be:
{
"_id" : SomeObjectIdValue,
"documentSum" : 76,
"documentAvg" : 25.333333333333332
}
If you have more than one document in your collection then you will see a result row for each document containing a Channels array.
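The arithmetic behind that output can be checked with a short plain JavaScript sketch. It is only an in-memory illustration of what the $unwind and $group stages compute for one document; the function name channelStats is an assumption for this example.

```javascript
// Hypothetical helper: sum and average of Channels[].Value for one document.
function channelStats(doc) {
  const values = doc.Channels.map((c) => c.Value);
  const sum = values.reduce((acc, v) => acc + v, 0);
  return {
    documentSum: sum,
    // Guard against an empty Channels array to avoid dividing by zero.
    documentAvg: values.length ? sum / values.length : null
  };
}

// Sample document from the question (ids omitted).
const exampleDoc = { Location: 1, Channels: [{ Value: 25 }, { Value: 39 }, { Value: 12 }] };
console.log(channelStats(exampleDoc)); // documentSum is 76
```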
Solution 1: using two groups, based on the example from a previous question:
db.records.aggregate(
[
{ $unwind: "$Channels" },
{ $group: {
_id: {
"loc" : "$Location",
"cId" : "$Channels.Id"
},
"value" : {$sum : "$Channels.Value" },
"average" : {$avg : "$Channels.Value"},
"maximum" : {$max : "$Channels.Value"},
"minimum" : {$min : "$Channels.Value"}
}},
{ $group: {
_id : "$_id.loc",
"ChannelsSummary" : { $push :
{ "channelId" : '$_id.cId',
"value" :'$value',
"average" : '$average',
"maximum" : '$maximum',
"minimum" : '$minimum'
}}
}
}
]
)
Solution 2:
There is a property I didn't show in my original question that might help: "Channels.Id", which is independent of "Channels._Id".
db.records.aggregate( [
{
"$unwind" : "$Channels"
},
{
"$group" : {
"_id" : "$Channels.Id",
"documentSum" : { "$sum" : "$Channels.Value" },
"documentAvg" : { "$avg" : "$Channels.Value" }
}
}
] )

mongodb aggregation find min value and other fields in nested array

Is it possible to find the max date in a nested array and show its price, along with a parent field such as the actual price?
I want the result to look like this:
{
"_id" : ObjectId("5547e45c97d8b2c816c994c8"),
"actualPrice":19500,
"lastModifDate" :ISODate("2015-05-04T22:53:50.583Z"),
"price":"16000"
}
The data :
db.adds.findOne()
{
"_id" : ObjectId("5547e45c97d8b2c816c994c8"),
"addTitle" : "Clio pack luxe",
"actualPrice" : 19500,
"fistModificationDate" : ISODate("2015-05-03T22:00:00Z"),
"addID" : "1746540",
"history" : [
{
"price" : 18000,
"modifDate" : ISODate("2015-05-04T22:01:47.272Z"),
"_id" : ObjectId("5547ec4bfeb20b0414e8e51b")
},
{
"price" : 16000,
"modifDate" : ISODate("2015-05-04T22:53:50.583Z"),
"_id" : ObjectId("5547f87e83a1dae00bc033fa")
},
{
"price" : 19000,
"modifDate" : ISODate("2015-04-04T22:53:50.583Z"),
"_id" : ObjectId("5547f87e83a1dae00bc033fe")
}
],
"__v" : 1
}
My query:
db.adds.aggregate(
[
{ $match:{addID:"1746540"}},
{ $unwind:"$history"},
{ $group:{
_id:0,
lastModifDate:{$max:"$history.modifDate"}
}
}
])
I don't know how to include the other fields. I used $project but I get errors.
Thanks for helping.
You could try the following aggregation pipeline, which does not need the $group stage: $project shapes the fields, and $sort with $limit picks the history entry with the latest modifDate:
db.adds.aggregate([
{
"$match": {"addID": "1746540"}
},
{
"$unwind": "$history"
},
{
"$project": {
"actualPrice": 1,
"lastModifDate": "$history.modifDate",
"price": "$history.price"
}
},
{
"$sort": { "lastModifDate": -1 }
},
{
"$limit": 1
}
])
Output
/* 1 */
{
"result" : [
{
"_id" : ObjectId("5547e45c97d8b2c816c994c8"),
"actualPrice" : 19500,
"lastModifDate" : ISODate("2015-05-04T22:53:50.583Z"),
"price" : 16000
}
],
"ok" : 1
}
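The same selection can be sketched in plain JavaScript for a single document already loaded in memory. This is only an illustration of the logic (pick the history entry with the greatest modifDate and keep the parent's actualPrice); the function name latestHistory is an assumption, and the _id is written as a plain string here.

```javascript
// Hypothetical helper: return the newest history entry together with the parent price.
function latestHistory(ad) {
  if (!ad.history || ad.history.length === 0) return null;
  const newest = ad.history.reduce((best, h) => (h.modifDate > best.modifDate ? h : best));
  return {
    _id: ad._id,
    actualPrice: ad.actualPrice,
    lastModifDate: newest.modifDate,
    price: newest.price
  };
}

// Sample document from the question (only the fields used here).
const adDoc = {
  _id: "5547e45c97d8b2c816c994c8",
  actualPrice: 19500,
  history: [
    { price: 18000, modifDate: new Date("2015-05-04T22:01:47.272Z") },
    { price: 16000, modifDate: new Date("2015-05-04T22:53:50.583Z") },
    { price: 19000, modifDate: new Date("2015-04-04T22:53:50.583Z") }
  ]
};
console.log(latestHistory(adDoc)); // price 16000, actualPrice 19500
```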

$avg in mongodb aggregation

Document looks like this:
{
"_id" : ObjectId("361de42f1938e89b179dda42"),
"user_id" : "u1",
"evaluator_id" : "e1",
"candidate_id" : ObjectId("54f65356294160421ead3ca1"),
"OVERALL_SCORE" : 150,
"SCORES" : [
{ "NAME" : "asd", "OBTAINED_SCORE" : 30}, { "NAME" : "acd", "OBTAINED_SCORE" : 36}
]
}
Aggregation function:
db.coll.aggregate([ {$unwind:"$SCORES"}, {$group : { _id : { user_id : "$user_id", evaluator_id : "$evaluator_id"}, AVG_SCORE : { $avg : "$SCORES.OBTAINED_SCORE" }}} ])
Suppose there are two documents with the same "user_id" (say u1) and different "evaluator_id" values (say e1 and e2).
For example:
1) The average works like this: (30 + 20) / 2 = 25. This is working for me.
2) But if the score for { "NAME" : "asd" } is 30 in the { evaluator_id : "e1" } document and 0 in the { evaluator_id : "e2" } document, I want the AVG_SCORE to be 30 only (not (30 + 0) / 2 = 15).
Is it possible through aggregation?
Could anyone help me out?
It's possible by placing a $match stage between the $unwind and $group stages. It filters the unwound SCORES elements so that only those whose obtained score is not equal to 0 are included in the average: "SCORES.OBTAINED_SCORE" : { $ne : 0 }
db.coll.aggregate([
{
$unwind: "$SCORES"
},
{
$match : {
"SCORES.OBTAINED_SCORE" : { $ne : 0 }
}
},
{
$group : {
_id : {
user_id : "$user_id",
evaluator_id : "$evaluator_id"
},
AVG_SCORE : {
$avg : "$SCORES.OBTAINED_SCORE"
}
}
}
])
For example, the aggregation result for this document:
{
"_id" : ObjectId("5500aaeaa7ef65c7460fa3d9"),
"user_id" : "u1",
"evaluator_id" : "e1",
"candidate_id" : ObjectId("54f65356294160421ead3ca1"),
"OVERALL_SCORE" : 150,
"SCORES" : [
{
"NAME" : "asd",
"OBTAINED_SCORE" : 0
},
{
"NAME" : "acd",
"OBTAINED_SCORE" : 36
}
]
}
will yield:
{
"result" : [
{
"_id" : {
"user_id" : "u1",
"evaluator_id" : "e1"
},
"AVG_SCORE" : 36
}
],
"ok" : 1
}
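For comparison, here is a minimal plain JavaScript sketch of the same computation on documents held in memory: unwind SCORES, skip zero scores, then average per (user_id, evaluator_id). The function name averageNonZero is an assumption for this example, and the check only skips scores that are exactly 0.

```javascript
// Hypothetical helper: average OBTAINED_SCORE per (user_id, evaluator_id), skipping zeros.
function averageNonZero(docs) {
  const groups = new Map();
  for (const doc of docs) {
    for (const score of doc.SCORES) {
      if (score.OBTAINED_SCORE === 0) continue; // mirrors the { $ne : 0 } filter
      const key = JSON.stringify([doc.user_id, doc.evaluator_id]);
      const g = groups.get(key) ||
        { user_id: doc.user_id, evaluator_id: doc.evaluator_id, sum: 0, count: 0 };
      g.sum += score.OBTAINED_SCORE;
      g.count += 1;
      groups.set(key, g);
    }
  }
  return Array.from(groups.values(), (g) => ({
    _id: { user_id: g.user_id, evaluator_id: g.evaluator_id },
    AVG_SCORE: g.sum / g.count
  }));
}

// Sample document from the answer (only the fields used here).
const scoreDocs = [
  {
    user_id: "u1",
    evaluator_id: "e1",
    SCORES: [
      { NAME: "asd", OBTAINED_SCORE: 0 },
      { NAME: "acd", OBTAINED_SCORE: 36 }
    ]
  }
];
console.log(averageNonZero(scoreDocs)); // AVG_SCORE is 36 for u1 / e1
```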