MongoDB $avg aggregation calculation out by a few decimals

We have a collection in MongoDB which saves a value linked to a timestamp.
Our document looks as follows (I have pasted an actual one here):
{
"_id" : ObjectId("5a99596b0155fe271cfcf41d"),
"Timestamp" : ISODate("2018-03-02T16:00:00.000Z"),
"TagID" : ObjectId("59f8609eefbb4102f4c249e3"),
"Value" : 71.3,
"FileReferenceID" : ObjectId("000000000000000000000000"),
"WasValueInterpolated" : 0
}
What we then do is calculate the average between two intervals for a given period; in more basic terms, we work out an aggregated profile.
The aggregation code we use is:
[{ "$match" :
{
"TagID" : ObjectId("59f8609eefbb4102f4c249e3") }
},
{
"$match" : { "Timestamp" : { "$gte" : ISODate("2018-03-12T00:00:00.001Z") } }
},
{
"$match" : { "Timestamp" : { "$lte" : ISODate("2018-03-13T00:00:00.001Z") } }
},
{
"$group" :
{
"_id" : { "GroupedMillisecond" :
{
"$let" :
{
"vars" :
{ "newMillisecondField" :
{
"$subtract" : ["$Timestamp", ISODate("2018-03-12T00:00:00.001Z")]
}
},
"in" : { "$subtract" : ["$$newMillisecondField", { "$mod" : ["$$newMillisecondField", NumberLong(1800000)] }] }
}
} }, "AverageValue" : { "$avg" : "$Value" }
}
}, { "$sort" : { "_id.GroupedMillisecond" : 1 } }
]
The problem is this: the value it should give back is 71.3, but we get back 71.299999999999997
In the case posted above we are calculating the average value, aggregated half-hourly, for a day. There is only one value logged per half hour (I checked this in the database). The value is also logged as a constant; as far back as I manually checked (a few months), it is 71.3
So my question is why does the value differ?
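For context, the figure above is consistent with how MongoDB stores ordinary numbers: 71.3 has no exact IEEE 754 double representation, so the stored value (and any $avg over it) is the nearest double, and printing that double to 17 significant digits shows exactly the number in the question. A quick check in plain JavaScript, which uses the same double type:

```javascript
// "Value" : 71.3 is stored as an IEEE 754 double; the nearest double to
// 71.3 is slightly below it, and printing 17 significant digits exposes
// the gap -- the same figure the $avg result shows.
const stored = 71.3;
console.log(stored.toPrecision(17)); // "71.299999999999997"
console.log(stored === 71.3);        // true -- it is the same double
```

If exact decimal behaviour matters, the usual remedies are storing the values as NumberDecimal (decimal128) or rounding the aggregate with $round (available from MongoDB 4.2).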

Mongo pipeline projection on sub array

I have the following document:
{
  id: "myId",
  boundedPlan: {
    plannedWeeks: [
      {
        weekStartDate: date,
        weekEndDate: date,
        plannedDays: []
      },
      ...
    ]
  },
  unboundedPlan: {
    plannedWeeks: [
      {
        weekStartDate: date,
        weekEndDate: date,
        plannedDays: []
      },
      ...
    ]
  }
}
This document represents a plan for some number of weeks in the future. A plan is either bounded or unbounded.
(I have the same structure in two different fields because, in the code, they correspond to two different classes with different behavior.)
I now have to do the following query.
"Get the current plan week given a date"
I wrote the following pipeline:
[
{ "$match" : { "ownerId" : "defaultOwnerId"}},
{ "$project" : {
"boundedPlan" : 1,
"unboundedPlan" : 1,
"plannedWeeks" : {
"$cond" : {
"if" : { "$ne" : ["$boundedPlan", null]}, "then" : "$boundedPlan.plannedWeeks",
"else" : "$unboundedPlan.plannedWeeks"}
}
}
},
{ "$match" : {
"boundedPlan.plannedWeeks" : {
"$elemMatch" : { "weekStart" : { "$lte" : { "$date" : "2021-03-10T00:00:00Z"}}, "weekEnd" : { "$gte" : { "$date" : "2021-03-10T00:00:00Z"}}}},
"$or" : [{
"unboundedPlan.plannedWeeks" : {
"$elemMatch" : { "weekStart" : { "$lte" : { "$date" : "2021-03-10T00:00:00Z"}}, "weekEnd" : { "$gte" : { "$date" : "2021-03-10T00:00:00Z"}}}}
}]}
}
]
The problem is the following:
knowing that I'm operating over a plan with an unbounded plan, explicitly setting the second match:
"$match" : {
"unboundedPlan.plannedWeeks" : {
"$elemMatch" : { "weekStart" : { "$lte" : { "$date" : "2021-03-10T00:00:00Z"}}, "weekEnd" : { "$gte" : { "$date" : "2021-03-10T00:00:00Z"}}}},
}
works.
Of course, I don't know whether the plan is in the unbounded or the bounded field, so I tried to add the $or operator, which causes no selection at all.
Is there something I'm missing?
(Working with Spring Data MongoDB.)
Thank you
OK, found it out... I was using orOperator from Spring Data MongoDB wrongly:
new Criteria("field1").orOperator(new Criteria("field2"))
is not the same as
new Criteria().orOperator(new Criteria("field1"), new Criteria("field2"))

Mongo aggregation groups and subgroup

Hi, I have a Mongo aggregation:
db.collection.aggregate(
[
{
"$match" : {
"dateTime" : {
"$gte" : ISODate("2017-01-01T00:00:00.000+0000"),
"$lt" : ISODate("2018-01-01T00:00:00.000+0000")
}
}
},
{
"$group" : {
"_id" : "dateTime",
"totals" : {
"$sum" : "$payment.totalAmount"
},
"count" : {
"$sum" : 1.0
}
}
}
],
{
"allowDiskUse" : false
}
);
This works fine. It aggregates and sums by the date range I supplied, and I get output as follows.
{
"_id" : "dateTime",
"totals" : 2625293.825017198,
"count" : 12038.0
}
However, I also want to refine the groupings further.
I have a field called 'companyId', and I want to calculate the sum and count by each companyId for the given time range.
I would like output similar to this, where I get a sum and count for each companyId in the date range I queried, not just a sum/count of all the data:
[
{
"companyId" : "Acme Co",
"totals" : 2625293.825017198,
"count" : 12038.0
},
{
"companyId" : "Beta Co",
"totals" : 162593.82198,
"count" : 138.0
},
{
"companyId" : "Cel Co",
"totals" : 593.82,
"count" : 38.0
}
]
How do I do this? I have not been able to find a good example online.
Thanks

MongoDB embedded document - aggregation query

I have the below documents in a Mongo database:
db.totaldemands.insert({ "data" : "UKToChina", "demandPerCountry" :
    { "from" : "UK", "to" : "China",
      "demandPerItem" : [ { "item" : "apples", "demand" : 200 },
                          { "item" : "plums", "demand" : 100 }
      ] } });
db.totaldemands.insert({ "data" : "UKToSingapore", "demandPerCountry" :
    { "from" : "UK", "to" : "Singapore",
      "demandPerItem" : [ { "item" : "apples", "demand" : 100 },
                          { "item" : "plums", "demand" : 50 }
      ] } });
I need to write a query to find the total count of apples exported from the UK to any country.
I have tried the following query:
db.totaldemands.aggregate(
{ $match : { "demandPerCountry.from" : "UK" ,
"demandPerCountry.demandPerItem.item" : "apples" } },
{ $unwind : "$demandPerCountry.demandPerItem" },
{ $group : { "_id" : "$demandPerCountry.demandPerItem.item",
"total" : { $sum : "$demandPerCountry.demandPerItem.demand"
} } }
);
But it gives me output with both apples and plums, like below:
{ "_id" : "apples", "total" : 300 }
{ "_id" : "plums", "total" : 150 }
But my expected output is:
{ "_id" : "apples", "total" : 300 }
So, how can I modify the above query to return only the count of apples exported from the UK?
Also, is there any better way to achieve this output without unwinding?
You can add another $match stage to get only apples.
As you have an embedded document structure and are performing an aggregation, $unwind is required here. The alternative would be map-reduce; however, $unwind is the most suitable here.
If you are thinking about performance, $unwind shouldn't cause a performance issue.
db.totaldemands.aggregate(
{ $match : { "demandPerCountry.from" : "UK" ,
"demandPerCountry.demandPerItem.item" : "apples" } },
{ $unwind : "$demandPerCountry.demandPerItem" },
{ $group : { "_id" : "$demandPerCountry.demandPerItem.item",
"total" : { $sum : "$demandPerCountry.demandPerItem.demand"
} } },
{$match : {"_id" : "apples"}}
);

MongoDB aggregation - ignore key names

I have a query:
db.events.aggregate(
{ $match: { "camera._id": "1NJE48", "start_timestamp": { $lte: 1407803834.07 } } },
{ $sort: { "start_timestamp": -1 } },
{ $limit: 2 },
{ $project: { "_id": 0, "snapshots": 1 } }
)
It returns data like so:
/* 0 */
{
"result" : [
{
"snapshots" : {
"1401330834010" : {
"uploaded_timestamp" : 1401330895,
"filename_timestamp" : 1401330834.01,
"timestamp" : 1401330834.01
},
"1401330835010" : {
"uploaded_timestamp" : 1401330896,
"filename_timestamp" : 1401330835.01,
"timestamp" : 1401330835.01
},
"1401330837010" : {
"uploaded_timestamp" : 1401330899,
"filename_timestamp" : 1401330837.01,
"timestamp" : 1401330837.01
}
}
},
{
"snapshots" : {
"1401319837010" : {
"uploaded_timestamp" : 1401319848,
"filename_timestamp" : 1401319837.01,
"timestamp" : 1401319837.01
},
"1401319838010" : {
"uploaded_timestamp" : 1401319849,
"filename_timestamp" : 1401319838.01,
"timestamp" : 1401319838.01
},
"1401319839010" : {
"uploaded_timestamp" : 1401319850,
"filename_timestamp" : 1401319839.01,
"timestamp" : 1401319839.01
}
}
}
],
"ok" : 1
}
I would like an array of snapshots:
/* 0 */
{
"result" : [
{
"uploaded_timestamp" : 1401330895,
"filename_timestamp" : 1401330834.01,
"timestamp" : 1401330834.01
},
{
"uploaded_timestamp" : 1401330896,
"filename_timestamp" : 1401330835.01,
"timestamp" : 1401330835.01
},
{
"uploaded_timestamp" : 1401330899,
"filename_timestamp" : 1401330837.01,
"timestamp" : 1401330837.01
},
{
"uploaded_timestamp" : 1401319848,
"filename_timestamp" : 1401319837.01,
"timestamp" : 1401319837.01
},
{
"uploaded_timestamp" : 1401319849,
"filename_timestamp" : 1401319838.01,
"timestamp" : 1401319838.01
},
{
"uploaded_timestamp" : 1401319850,
"filename_timestamp" : 1401319839.01,
"timestamp" : 1401319839.01
}
],
"ok" : 1
}
I.e. no key names. I'm struggling to understand how to deal with the aggregation framework when the key names are unique like they are here.
The problem is that the only way you know the key names is by looking at the document itself. MongoDB does not handle this type of situation well, in general. You are expected to know the structure of your own documents, i.e. to know what the keys are and what their types should be.
I don't know your use case and there's no sample document so I can't evaluate your data model, but having keys-as-values is generally a bad idea as you will run into a host of limitations whenever you can't say what the keys on a document should be a priori. Consider using an array instead of an embedded object for snapshots, or using an array of key-value pairs pattern like
{
...
"result" : [
{
"snapshots" : [
{
"key" : "1401330834010",
"value" : {
"uploaded_timestamp" : 1401330895,
"filename_timestamp" : 1401330834.01,
"timestamp" : 1401330834.01
},
}
]
},
...
}
If you provide a sample document and some detail about what you're trying to accomplish I'd be happy to provide more complete advice.
Came up with a stop-gap solution. We will store an array of the snapshot keys on each event; it essentially acts as an index. We can then perform two queries: one to fetch the keys and do a filter, and another to fetch exactly the single snapshot we need.
It's not pretty, nor backwards compatible, but it will hopefully speed things up.

Mongo map-reduce output, how to read results back?

I have a map-reduce query that "works" and does what I want. However, I have so far spectacularly failed to make use of my output data because I cannot work out how to read it back... let me explain. Here is my emit:
emit( { jobid: this.job_id, type: this.type}, { count: 1 })
and the reduce function:
reduce: function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].count;
    }
    // reduce must return the same shape as the emitted values;
    // `this` is not the source document inside reduce.
    return { count: total };
},
It functions and the output I get in the results collection looks like this:
{ "_id" : { "jobid" : "5051ef142a120", "type" : 3 }, "value" : { "count" : 1 } }
{ "_id" : { "jobid" : "5051ef142a120", "type" : 5 }, "value" : { "count" : 43 } }
{ "_id" : { "jobid" : "5051f1a9d5442", "type" : 2 }, "value" : { "count" : 1 } }
{ "_id" : { "jobid" : "5051f1a9d5442", "type" : 3 }, "value" : { "count" : 1 } }
{ "_id" : { "jobid" : "5051f299340b1", "type" : 2 }, "value" : { "count" : 1 } }
{ "_id" : { "jobid" : "5051f299340b1", "type" : 3 }, "value" : { "count" : 1 } }
BUT HOW the hell do I issue a query that says: find me all entries by "jobid" whilst ignoring the type? I tried this initially, expecting two rows of output, but got none!
db.mrtest.find( { "_id": { "jobid" : "5051f299340b1" }} );
and whilst:
db.mrtest.find( { "_id" : { "jobid" : "5051f299340b1", "type" : 2 }} )
does produce one row of output as hoped for, changing it to this again fails to produce anything:
db.mrtest.find( { "_id" : { "jobid" : "5051f299340b1", "type" : { $in: [2] }}} )
I get the impression that you can't do such things with the _id field, or can you? I am thinking I need to reorganise my map-reduce output instead, but that feels like failing somehow?!
Help!
PS: If anybody can explain why the count is contained in a field called "value", that would also be welcome!
Have you tried:
db.mrtest.find( { "_id.jobid": "506ea3a85e126" })
That works for me!