MongoDB Aggregation Data

I am new to the MongoDB aggregation framework and have this data:
{
"_id": {
"$oid": "5654a8f0d487dd1434571a6e"
},
"ValidationDate": {
"$date": "2015-11-24T13:06:19.363Z"
},
"DataRaw": " WL 00100100012015-08-28 02:44:17+0000+ 16.81 8.879 1084.00",
"ReadingsAreValid": true,
"locationID": " WL 001",
"Readings": {
"pH": {
"value": 8.879
},
"SensoreDate": {
"value": {
"$date": "2015-08-28T02:44:17.000Z"
}
},
"temperature": {
"value": 16.81
},
"Conductivity": {
"value": 1084
}
},
"HMAC":"ecb98d73fcb34ce2c5bbcc9c1265c8ca939f639d791a1de0f6275e2d0d71a801"
}
My goal is to calculate the average temperature for every two-hour interval, then per month, year and week. I have tried various queries but no luck. This is what I have tried so far:
data.aggregate([{"$unwind":"$Readings"},
{"$project":{"HourRecord":{"$hour":"Readings.SensoreDate.value"},
"YearRecord":{"$year":"$Readings.SensoreDate.value"}}},
{'$group' : {'_id' : "$locationID",
'AverageTemp' : { '$avg' : '$Readings.temperature.value'}}}
])
and I got an empty result like this:
{u'ok': 1.0, u'waitedMS': 0L, u'result': [{u'AverageTemp': None, u'_id': None}]}
I have tried several other combinations but still got empty results.
The following queries return the required result per hour, month, etc., but how do I group by a two-hour interval instead of one hour?
test_Agg.aggregate([{"$unwind":"$Readings"},
{"$project":{ "HourRecord": { "$hour":"$Readings.SensoreDate.value"},
"YearRecord": {"$year":"$Readings.SensoreDate.value"},
"MonthRecord": {"$month":"$Readings.SensoreDate.value"},
"locationID" : 1,
"Readings.pH.value":1,
'Readings.temperature.value' : 1}
},
{'$group' : {'_id' :"$HourRecord",
'AverageTemp' : { '$avg' : '$Readings.temperature.value'}}
}])

The $project in the second stage of your query didn't project the fields used in the subsequent $group stage, hence the empty result.
The required fields have been added to the $project below. Try this query:
db.temperature.aggregate([{"$unwind":"$Readings"},
{"$project":{ "HourRecord": { "$hour":"$Readings.SensoreDate.value"},
"YearRecord": {"$year":"$Readings.SensoreDate.value"},
"locationID" : 1,
'Readings.temperature.value' : 1}
},
{'$group' : {'_id' : "$locationID",
'AverageTemp' : { '$avg' : '$Readings.temperature.value'}}}
]);
Output for the one document provided in the post:
{
"_id" : " WL 001",
"AverageTemp" : 16.81
}
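To address the remaining part of the question, grouping by a two-hour interval instead of one hour: one option (a rough sketch, not tested against the original data) is to compute the bucket as the hour minus hour mod 2, so hours 0 and 1 fall into bucket 0, hours 2 and 3 into bucket 2, and so on. Readings is an embedded document rather than an array, so this sketch also skips the $unwind stage:
db.temperature.aggregate([
{"$project": {
    // 0, 2, 4, ... 22: the start of each two-hour window
    "TwoHourBucket": {"$subtract": [
        {"$hour": "$Readings.SensoreDate.value"},
        {"$mod": [{"$hour": "$Readings.SensoreDate.value"}, 2]}
    ]},
    "YearRecord": {"$year": "$Readings.SensoreDate.value"},
    "MonthRecord": {"$month": "$Readings.SensoreDate.value"},
    "locationID": 1,
    "Readings.temperature.value": 1}
},
{"$group": {
    "_id": {"location": "$locationID", "year": "$YearRecord",
            "month": "$MonthRecord", "twoHourBucket": "$TwoHourBucket"},
    "AverageTemp": {"$avg": "$Readings.temperature.value"}}
}
]);
The same pattern works for per-month, per-year or per-week averages by keeping only the corresponding fields ($week is also available) in the group _id.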

Related

How to combine Documents in aggregation pipeline with MongoDB Java driver 3.6?

I am using an aggregation pipeline with the MongoDB Java driver version 3.6. If I have documents that look something like:
doc1 --
{
"CAR": {
"VIN": "ASDF1234",
"YEAR": "2018",
"MAKE": "Honda",
"MODEL": "Accord"
},
"FEATURES": [
{
"AUDIO": "MP3",
"TIRES": "All Season",
"BRAKES": "ABS"
}
]
}
doc2 --
{
"CAR": {
"VIN": "ASDF1234",
"AVAILABILITY": "In Stock"
}
}
And if I submit a query like:
collection.aggregate(
Arrays.asList(
Aggregates.match(
and(
in("CAR.VIN", vinList),
or(
eq("CAR.MAKE", carMake),
eq("CAR.AVAILABILITY", carAvailability),
)
)
)
)
)
Let us assume that there are exactly two different records for which the "CAR.VIN" criteria match for every VIN, and I am going to get two results. Rather than deal with two results each time, I would like to merge the documents so that the result looks like this:
{
"CAR": {
"VIN": "ASDF1234",
"YEAR": "2018",
"MAKE": "Honda",
"MODEL": "Accord",
"AVAILABILITY": "In Stock"
},
"FEATURES": [
{
"AUDIO": "MP3",
"TIRES": "All Season",
"BRAKES": "ABS"
}
]
}
The example where I have two and only two results trivializes my need for this. Imagine that vinList is a list of 10000 values, and it might return 2 x 10000 documents. When I return an AggregateIterable to the client that is calling my code, I do not want to impose the requirement that they have to group or collate the results in any way, but that they will receive one document for each result that has all of the information that they will want to parse, cleanly and easily.
Of course, people will suggest that the data is simply combined into one document with all of the data in the MongoDB collection. For reasons that I cannot control, there are two separate documents corresponding to each VIN in the same collection, and that is something that I am unable to change. There is a value in our system that makes this more reasonable than it might seem, so please don't focus on this apparent problem with the data.
I am trying, with not much luck, to utilize the Aggregates.group() operation to merge the fields in my aggregation pipeline. Accumulators.push seems to be the closest operation to what I need, but I do not want to complicate the document structure with extra arrays, etc. Is there a straightforward approach that I am not seeing?
You can try $mergeObjects, added in MongoDB v3.6:
db.cc.aggregate(
[
{
$group: {
_id : "$CAR.VIN",
CAR : {$mergeObjects : "$CAR"},
FEATURES : {$mergeObjects : {$arrayElemAt : ["$FEATURES", 0 ]}}
}
}
]
).pretty()
result
{
"_id" : "ASDF1234",
"CAR" : {
"VIN" : "ASDF1234",
"YEAR" : "2018",
"MAKE" : "Honda",
"MODEL" : "Accord",
"AVAILABILITY" : "In Stock"
},
"FEATURES" : {
"AUDIO" : "MP3",
"TIRES" : "All Season",
"BRAKES" : "ABS"
}
}
To get FEATURES as an array:
db.cc.aggregate(
[
{
$group: {
_id : "$CAR.VIN",
CAR : {$mergeObjects : "$CAR"},
FEATURES : {$push : {$arrayElemAt : ["$FEATURES", 0 ]}}
}
}
]
).pretty()
result
{
"_id" : "ASDF1234",
"CAR" : {
"VIN" : "ASDF1234",
"YEAR" : "2018",
"MAKE" : "Honda",
"MODEL" : "Accord",
"AVAILABILITY" : "In Stock"
},
"FEATURES" : [
{
"AUDIO" : "MP3",
"TIRES" : "All Season",
"BRAKES" : "ABS"
},
null
]
}
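If the null entry (coming from the document that has no FEATURES array) is unwanted, one option is to filter it out in an extra stage. This is just a sketch, assuming MongoDB 3.4+ for $addFields:
db.cc.aggregate(
[
    {
        $group: {
            _id : "$CAR.VIN",
            CAR : {$mergeObjects : "$CAR"},
            FEATURES : {$push : {$arrayElemAt : ["$FEATURES", 0 ]}}
        }
    },
    {
        // drop the null pushed for documents without a FEATURES array
        $addFields: {
            FEATURES : {$filter : {input : "$FEATURES", as : "f", cond : {$ne : ["$$f", null]}}}
        }
    }
]
).pretty()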

Query to count number of occurrence in array grouped by day

I have the following document structure:
(trackerEventsCollection) =
{
"_id" : ObjectId("5b26c4fb7c696201040c8ed1"),
"trackerId" : ObjectId("598fc51324h51901043d76de"),
"trackingEvents" : [
{
"type" : "checkin",
"eventSource" : "app",
"timestamp" : ISODate("2017-08-25T06:34:58.964Z")
},
{
"type" : "power",
"eventSource" : "app",
"timestamp" : ISODate("2017-08-25T06:51:23.795Z")
},
{
"type" : "position",
"eventSource" : "app",
"timestamp" : ISODate("2017-08-25T06:51:23.985Z")
}
]
}
I would like to write a query that counts the number of trackingEvents with "type" : "power", grouped by day. This seems quite tricky to me because the parent document does not have a date, and I have to rely on the timestamp field that belongs to the trackingEvents array members.
I'm not a very experienced MongoDB user and couldn't figure out how this can be achieved so far.
Would really appreciate any help, thanks.
To process your nested array as separate documents you need to use $unwind. In the next stage you can use $match to filter by type. Then you can group by single days, counting occurrences. The point is that you have to build a grouping key containing year, month and day, as in the following code:
db.trackerEvents.aggregate([
{ $unwind: "$trackingEvents" },
{ $match: { "trackingEvents.type": "power" } },
{
$group: {
_id: {
year: { $year:"$trackingEvents.timestamp" },
month:{ $month:"$trackingEvents.timestamp" },
day: { $dayOfMonth:"$trackingEvents.timestamp" }
},
count: { $sum: 1 }
}
}
])
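An alternative sketch (assuming MongoDB 3.0+ for $dateToString, and not tested against your collection) builds the day key as a single sortable string instead of a year/month/day sub-document:
db.trackerEvents.aggregate([
{ $unwind: "$trackingEvents" },
{ $match: { "trackingEvents.type": "power" } },
{
    $group: {
        // e.g. "2017-08-25"
        _id: { $dateToString: { format: "%Y-%m-%d", date: "$trackingEvents.timestamp" } },
        count: { $sum: 1 }
    }
},
{ $sort: { _id: 1 } }
])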

Project with Match in aggregate not working after using substr in MongoDB

I am facing an issue with MongoDB.
Below is my sample record:
{
"_id" : ObjectId("56fa21da0be9b4e3328b4567"),
"us_u_id" : "1459169911J4gPxpYQ7A",
"us_dealer_u_id" : "1459169911J4gPxpYQ7A",
"us_corporate_dealer_u_id" : "1459169173rgSdxVeMLa",
"us_oem_u_id" : "1459169848CK5yOpXito",
"us_part_number" : "E200026",
"us_sup_part_number" : "",
"us_alter_part_number" : "",
"us_qty" : 0,
"us_sale_qty" : 2,
"us_date" : "20160326",
"us_source_name" : "BOMAG",
"us_source_address" : "",
"us_source_city" : "",
"us_source_state" : "",
"us_zip_code" : "",
"us_alternet_source_code" : "",
"updated_at" : ISODate("2016-03-29T06:34:02.728Z"),
"created_at" : ISODate("2016-03-29T06:34:02.728Z")
}
I am trying to get all records having a unique date,
so I have made the below query using aggregate:
.aggregate(
[
{
"$match":{
"yearSubstring":"2016",
"monthSubstring":"03",
"us_dealer_u_id":"1459169911J4gPxpYQ7A"
}
},
{
"$project":
{
"yearSubstring":{"$substr":["$us_date",0,4]},
"monthSubstring":{"$substr":["$us_date",4,2]},
"daySubstring":{"$substr":["$us_date",6,2]}
}
},
{
"$group":
{
"_id":{"monthSubstring":"$monthSubstring",
"yearSubstring":"$yearSubstring",
"daySubstring":"$daySubstring"
},
"daySubstring":{"$last":"$daySubstring"}
}
},
{"$sort":{"us_date":1}}
]
)
I have tried both ways of passing year and month (as a string and as an int),
but I get a blank result.
If I remove month and year from the condition then records come back.
I have tried all the different solutions I could find, but the result is the same.
Thanks in advance.
You have written an incorrect query.
You don't have the yearSubstring and monthSubstring fields at this stage, because this $match runs before the $project that creates them:
{
"$match":{
"yearSubstring":"2016",
"monthSubstring":"03",
"us_dealer_u_id":"1459169911J4gPxpYQ7A"
}
},
You should write it as follows, matching on the computed fields after the $project stage:
.aggregate(
[
{
"$match":{
"us_dealer_u_id":"1459169911J4gPxpYQ7A"
}
},
{
"$project":
{
"yearSubstring":{"$substr":["$us_date",0,4]},
"monthSubstring":{"$substr":["$us_date",4,2]},
"daySubstring":{"$substr":["$us_date",6,2]}
}
},
{
"$match":{
"yearSubstring":"2016",
"monthSubstring":"03"
}
},
{
"$group":
{
"_id":{"monthSubstring":"$monthSubstring",
"yearSubstring":"$yearSubstring",
"daySubstring":"$daySubstring"
},
"daySubstring":{"$last":"$daySubstring"}
}
},
{"$sort":{"us_date":1}}
]
)
If you want to get other fields, you should include them in the projection stage (and note that the final $sort on us_date has no effect, since us_date is not carried through the $project and $group stages). A sketch is shown below.
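For illustration only (us_part_number is just an example field, and $addToSet is one way to carry its values through the $group), such a sketch might look like this, sorting on the computed day instead of the dropped us_date:
.aggregate(
[
    {
        "$match":{
            "us_dealer_u_id":"1459169911J4gPxpYQ7A"
        }
    },
    {
        "$project":
        {
            "yearSubstring":{"$substr":["$us_date",0,4]},
            "monthSubstring":{"$substr":["$us_date",4,2]},
            "daySubstring":{"$substr":["$us_date",6,2]},
            "us_part_number":1
        }
    },
    {
        "$match":{
            "yearSubstring":"2016",
            "monthSubstring":"03"
        }
    },
    {
        "$group":
        {
            "_id":{"monthSubstring":"$monthSubstring",
                   "yearSubstring":"$yearSubstring",
                   "daySubstring":"$daySubstring"
            },
            "daySubstring":{"$last":"$daySubstring"},
            "partNumbers":{"$addToSet":"$us_part_number"}
        }
    },
    {"$sort":{"daySubstring":1}}
]
)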

Sorting by maximum array field, ascending or descending

In my Meteor app, I have a collection of documents with an array of subdocuments that look like this:
/* 1 */
{
"_id" : "5xF9iDTj3reLDKNHh",
"name" : "Lorem ipsum",
"revisions" : [
{
"number" : 0,
"comment" : "Dolor sit amet",
"created" : ISODate("2016-02-11T01:22:45.588Z")
}
],
"number" : 1
}
/* 2 */
{
"_id" : "qTF8kEphNoB3eTNRA",
"name" : "Consecitur quinam",
"revisions" : [
{
"comment" : "Hoste ad poderiquem",
"number" : 1,
"created" : ISODate("2016-02-11T23:25:46.033Z")
},
{
"number" : 0,
"comment" : "Fagor questibilus",
"created" : ISODate("2016-02-11T01:22:45.588Z")
}
],
"number" : 2
}
What I want to do is query this collection and sort the result set by the maximum date in the created field of the revisions array. Something I haven't been able to pull off yet. Some constraints I have are:
Just sorting by revisions.created doesn't cut it, because the date used from the collection depends on the sort direction. I have to use the maximum date in the set regardless of sort order.
I cannot rely on post-query manipulation of an unsorted result set, so, this must be done by a proper query or aggregation by the database.
There's no guarantee that the revisions array will be pre-sorted.
There may be extra fields in some documents and those have to come along, so careful with $project.
Meteor is still using MongoDB 2.6, newer API features are no good :(
The basic problem with what you are asking here comes down to the fact that the data in question is within an "array", and therefore there are some basic assumptions made by MongoDB as to how this gets handled.
If you applied a sort in "descending order", then MongoDB will do exactly what you ask and sort the documents by the "largest" value of the specified field within the array:
.sort({ "revisions.created": -1 ))
But if instead you sort in "ascending" order then of course the reverse is true and the "smallest" value is considered.
.sort({ "revisions.created": 1 })
So the only way of doing this means working out which is the maximum date from the data in the array, and then sorting on that result. This basically means applying .aggregate(), which for Meteor is a server-side operation, being unfortunately something like this:
Collection.aggregate([
{ "$unwind": "$revisions" },
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"revisions": { "$push": "$revisions" },
"number": { "$first": "$number" }
"maxDate": { "$max": "$revisions.created" }
}},
{ "$sort": { "maxDate": 1 }
])
Or at best with MongoDB 3.2, where $max can be applied directly to an array expression:
Collection.aggregate([
{ "$project": {
"name": 1,
"revisions": 1,
"number": 1,
"maxDate": {
"$max": {
"$map": {
"input": "$revisions",
"as": "el",
"in": "$$el.created"
}
}
}
}},
{ "$sort": { "maxDate": 1 } }
])
But really both are not that great, even if the MongoDB 3.2 approach has way less overhead than what is available to prior versions, it's still not as good as you can get in terms of performance due to the need to pass through the data and work out the value to sort on.
So for best performance, "always" keep such data you are going to need "outside" of the array. For this there is the $max "update" operator, which will only replace a value within the document "if" the provided value is "greater than" the existing value already there. i.e:
Collection.update(
{ "_id": "qTF8kEphNoB3eTNRA" },
{
"$push": {
"revisions": { "created": new Date("2016-02-01") }
},
"$max": { "maxDate": new Date("2016-02-01") }
}
)
This means that the value you want will "always" be already present within the document with the expected value, so it is just now a simple matter of sorting on that field:
.sort({ "maxDate": 1 })
So for my money, I would go through the existing data with either of the .aggregate() statements available, and use those results to update each document to contain a "maxDate" field. Then change the coding of all additions and revisions of array data to apply that $max "update" on every change.
Having a solid field rather than a calculation always makes much more sense if you are using it often enough. And the maintenance is quite simple.
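As a rough sketch of that one-time backfill in the shell (assuming the collection is called Collection, and reusing the first aggregation to compute the value):
db.Collection.aggregate([
    { "$unwind": "$revisions" },
    { "$group": {
        "_id": "$_id",
        "maxDate": { "$max": "$revisions.created" }
    }}
]).forEach(function(doc) {
    // write the computed maximum back onto each document
    db.Collection.update(
        { "_id": doc._id },
        { "$set": { "maxDate": doc.maxDate } }
    );
});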
In any case, considering the above applied example date, which is "less than" the other maximum dates present, all forms would return this for me:
{
"_id" : "5xF9iDTj3reLDKNHh",
"name" : "Lorem ipsum",
"revisions" : [
{
"number" : 0,
"comment" : "Dolor sit amet",
"created" : ISODate("2016-02-11T01:22:45.588Z")
}
],
"number" : 1,
"maxDate" : ISODate("2016-02-11T01:22:45.588Z")
}
{
"_id" : "qTF8kEphNoB3eTNRA",
"name" : "Consecitur quinam",
"revisions" : [
{
"comment" : "Hoste ad poderiquem",
"number" : 1,
"created" : ISODate("2016-02-11T23:25:46.033Z")
},
{
"number" : 0,
"comment" : "Fagor questibilus",
"created" : ISODate("2016-02-11T01:22:45.588Z")
},
{
"created" : ISODate("2016-02-01T00:00:00Z")
}
],
"number" : 2,
"maxDate" : ISODate("2016-02-11T23:25:46.033Z")
}
Which correctly places the first document at the top of the sort order with consideration to the "maxDate".

mongo - convert the value of one field to create a datetime during projection

I am using the MongoDB aggregate framework, and this is what my normal object looks like:
{
"_id" : "6b109972c9bd9d16a09b70b96686f691bfe2f9b6",
"history" : [
{
"dtEntry" : 1428929906,
"type" : "I",
"refname" : "ref1"
},
{
"dtEntry" : 1429082064,
"type" : "U",
"refname" : "ref1"
}
],
"c" : "SomeVal",
"p" : "anotherVal"
}
Here history.dtEntry is an epoch value (please don't advise me to change this to an ISODate before it's inserted; that's out of my scope).
I want to project c, p, history.type, and history.dtEntry as (day of month).
db.mydataset.aggregate({$project:{c:1,p:1,type:"$history.type",DtEntry:"$history.dtEntry",dater:{$dayOfMonth:new Date(DtEntry)}}})
If I use an epoch value directly, the day of month comes out just fine, but I have to pass the value of dtEntry, and none of these ways seem to work for me.
I have tried
$dayOfMonth:new Date(DtEntry)
$dayOfMonth:new Date(history.DtEntry)
$dayOfMonth:new Date("history.DtEntry")
$dayOfMonth:new Date("$history.DtEntry")
I found another way; maybe this will help you. Check the aggregation query below:
db.collectionName.aggregate({
"$unwind": "$history"
}, {
"$project": {
"c": 1,
"p": 1,
"type": "$history.type",
"dater": {
"$dayOfMonth": {
"$add": [new Date(0), {
"$multiply": ["$history.dtEntry", 1000]
}]
}
}
}
})
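The trick here is that $multiply turns the epoch seconds into milliseconds, and $add-ing a number to new Date(0) (the Unix epoch) yields a BSON date in the aggregation pipeline, which $dayOfMonth can then read. The same expression can feed $year, $month and the other date operators if you need more parts of the date.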