Exclude 0 values from MongoDB $avg but keep other fields

I run some aggregation queries on MongoDB 3.2.
I would like to group documents by a field with an average on another numeric field.
I need the average to ignore the 0 values.
The problem is that I can't filter those documents out entirely, because there is another field I need for a count.
To illustrate, this is the structure of my documents:
{"stringToGroupByOn":"foo", "valueToAvg":42, "valueToSum":21}
{"stringToGroupByOn":"foo", "valueToAvg":0, "valueToSum":13}
I can't just filter like this:
db.foobar.aggregate([
{
$match : { valueToAvg : { $gt : 0 } }
},
{
$group : {
_id : '$stringToGroupByOn',
avg : { $avg : '$valueToAvg' },
count : { $sum : '$valueToSum' }
}
}
])
Doing that loses the value 13 from the count.
Is there a way to do this in a single query?

You can use $cond in a $project stage to replace 0 with null, because $avg ignores null (non-numeric) values.
db.avg.aggregate([
{$project:{
_id:1,
valueToSum:1,
stringToGroupByOn:1,
valueToAvg:{$cond:
{ if: { $eq: [ "$valueToAvg", 0 ] },
then: null,
else: "$valueToAvg" }}
}},
{
$group : {
_id : '$stringToGroupByOn',
avg : { $avg : '$valueToAvg' },
count : { $sum : '$valueToSum' }
}
}
])
output:
{
"_id" : "foo",
"avg" : 42.0,
"count" : 34.0
}
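To check the $cond/null trick without a server, here is a minimal sketch in plain JavaScript (Node.js). The pipeline is kept as an array you could pass to aggregate(); the names zeroToNullPipeline and groupAvgIgnoringZeros are hypothetical. The helper only models the pipeline in memory under the assumption that the two sample documents from the question are the whole collection: zeros become null, null is skipped by the average, and the sum still sees every document.

```javascript
// Same pipeline as the answer above, usable as db.foobar.aggregate(zeroToNullPipeline)
// in the shell or collection.aggregate(zeroToNullPipeline) in the Node.js driver.
const zeroToNullPipeline = [
  {
    $project: {
      stringToGroupByOn: 1,
      valueToSum: 1,
      // Replace 0 with null so that $avg skips it.
      valueToAvg: {
        $cond: { if: { $eq: ["$valueToAvg", 0] }, then: null, else: "$valueToAvg" },
      },
    },
  },
  {
    $group: {
      _id: "$stringToGroupByOn",
      avg: { $avg: "$valueToAvg" },
      count: { $sum: "$valueToSum" },
    },
  },
];

// Hypothetical in-memory model of the pipeline, for illustration only.
function groupAvgIgnoringZeros(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // $project stage: 0 becomes null, any other value is kept.
    const avgInput = doc.valueToAvg === 0 ? null : doc.valueToAvg;
    if (!groups.has(doc.stringToGroupByOn)) {
      groups.set(doc.stringToGroupByOn, { total: 0, n: 0, count: 0 });
    }
    const g = groups.get(doc.stringToGroupByOn);
    // $avg only considers numeric values, so null is skipped.
    if (typeof avgInput === "number") {
      g.total += avgInput;
      g.n += 1;
    }
    // $sum still sees every document, including the one whose valueToAvg was 0.
    g.count += doc.valueToSum;
  }
  return [...groups].map(([id, g]) => ({
    _id: id,
    avg: g.n > 0 ? g.total / g.n : null,
    count: g.count,
  }));
}

const docs = [
  { stringToGroupByOn: "foo", valueToAvg: 42, valueToSum: 21 },
  { stringToGroupByOn: "foo", valueToAvg: 0, valueToSum: 13 },
];
console.log(groupAvgIgnoringZeros(docs)); // [ { _id: 'foo', avg: 42, count: 34 } ]
```

The numbers match the output shown in the answer: the average only sees 42, while the sum still adds 21 + 13.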

Related

MongoDB distinct problems

Here is one of my documents in JSON format:
{
"_id" : ObjectId("5bfdb412a80939b6ed682090"),
"accounts" : [
{
"_id" : ObjectId("5bf106eee639bd0df4bd8e05"),
"accountType" : "DDA",
"productName" : "DDA1"
},
{
"_id" : ObjectId("5bf106eee639bd0df4bd8df8"),
"accountType" : "VSA",
"productName" : "VSA1"
},
{
"_id" : ObjectId("5bf106eee639bd0df4bd8df9"),
"accountType" : "VSA",
"productName" : "VSA2"
}
]
}
I want a query that returns all productName values (without duplicates) where accountType = VSA.
I wrote this mongo query:
db.Collection.distinct("accounts.productName", {"accounts.accountType": "VSA" })
I expect: ['VSA1', 'VSA2']
I get: ['DDA1', 'VSA1', 'VSA2']
Does anybody know why the filter doesn't work with distinct?
The second parameter of the distinct method is described as:
A query that specifies the documents from which to retrieve the distinct values.
The filter selects whole documents, not individual array elements. Your document has a nested accounts array, and because at least one element matches "accounts.accountType": "VSA", the whole document is selected; distinct then collects productName from every element of that array, including DDA1.
To fix that, use the aggregation framework: $unwind the nested array before you apply the filter, then use $group with $addToSet to get the unique values. Try:
db.col.aggregate([
{
$unwind: "$accounts"
},
{
$match: {
"accounts.accountType": "VSA"
}
},
{
$group: {
_id: null,
uniqueProductNames: { $addToSet: "$accounts.productName" }
}
}
])
which prints:
{ "_id" : null, "uniqueProductNames" : [ "VSA2", "VSA1" ] }
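The difference between the two approaches can be modelled in a few lines of plain JavaScript. This is a sketch, not MongoDB code: the helper names distinctLike and unwindMatchAddToSet are hypothetical, the ObjectIds are omitted, and a JavaScript Set keeps insertion order, whereas $addToSet in MongoDB does not guarantee any order (hence "VSA2", "VSA1" in the output above).

```javascript
// Sample document from the question (ObjectIds omitted for brevity).
const doc = {
  accounts: [
    { accountType: "DDA", productName: "DDA1" },
    { accountType: "VSA", productName: "VSA1" },
    { accountType: "VSA", productName: "VSA2" },
  ],
};

// Model of distinct("accounts.productName", {"accounts.accountType": "VSA"}):
// the filter selects whole documents, then every productName in them is collected.
function distinctLike(docs) {
  const out = new Set();
  for (const d of docs) {
    const matches = d.accounts.some((a) => a.accountType === "VSA");
    if (matches) d.accounts.forEach((a) => out.add(a.productName));
  }
  return [...out];
}

// Model of $unwind -> $match -> $group with $addToSet:
// each array element becomes its own document before filtering.
function unwindMatchAddToSet(docs) {
  const out = new Set();
  for (const d of docs) {
    for (const a of d.accounts) {      // $unwind
      if (a.accountType === "VSA") {   // $match
        out.add(a.productName);        // $addToSet
      }
    }
  }
  return [...out];
}

console.log(distinctLike([doc]));        // [ 'DDA1', 'VSA1', 'VSA2' ]
console.log(unwindMatchAddToSet([doc])); // [ 'VSA1', 'VSA2' ]
```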

MongoDB: how do I check that all array entries are unique in the entire collection?

A little brainteaser for mongo users.
I have a collection of documents like
{
"_id" : ObjectId("19628f4f0545a733185b672f"),
"name" : "hello",
"items" : [
{
"itemNumber" : 12512,
"value" : "let"
},
{
"itemNumber" : 2546,
"value" : "put"
}
]
}
I need to make sure that every item's itemNumber is unique globally in the collection.
In a SQL database I would have a separate table for items, and the query to check that the numbers are unique would be something like
select count(1)
from (
select itemNumber, count(itemNumber) as cnt
from items
group by itemNumber) sel
where cnt>1;
Resulting 0 would mean that all itemNumbers are unique. (Probably there are better ways to make that check in SQL)
With MongoDB, the only solution I can come up with is
a) use forEach to extract all items to separate collection
b) make a simple aggregation
db.items.aggregate(
{ $group : { _id : '$itemNumber', count : {$sum : 1} } },
{ $out : "cnt" }
)
c) db.cnt.find({count: {$gt: 1}}).count()
Is there any one-query way to do it?
Performance note: the collection has about 3M documents, 2.2 KB each. I have noticed that aggregations containing $group run practically forever on this collection.
How about something like this:
db.items.aggregate(
{ $unwind: "$items" } ,
{ $group : { _id : '$items.itemNumber', count : { $sum : 1 } } },
{ $match: { "count": { $gt: 1 } } }
)
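Here is a small plain-JavaScript model of that pipeline, to show what it returns: an empty result means every itemNumber is unique. The helper findDuplicateItemNumbers and the second sample document are hypothetical. Note that an itemNumber repeated inside a single document's own items array is also reported, since $unwind turns each element into its own document. The allowDiskUse option mentioned in the comment is an assumption about your setup: it only helps if $group hits its 100 MB memory limit, it does not make the scan itself faster.

```javascript
// Same pipeline as above, as an array. For a large collection you may need:
// db.items.aggregate(duplicateItemsPipeline, { allowDiskUse: true })
const duplicateItemsPipeline = [
  { $unwind: "$items" },
  { $group: { _id: "$items.itemNumber", count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } },
];

// Hypothetical in-memory model of the pipeline.
function findDuplicateItemNumbers(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const item of doc.items) {                                        // $unwind
      counts.set(item.itemNumber, (counts.get(item.itemNumber) || 0) + 1); // $group, $sum: 1
    }
  }
  return [...counts]
    .filter(([, count]) => count > 1)                                      // $match count > 1
    .map(([itemNumber, count]) => ({ _id: itemNumber, count }));
}

const sampleDocs = [
  { name: "hello", items: [{ itemNumber: 12512, value: "let" }, { itemNumber: 2546, value: "put" }] },
  // Hypothetical second document that reuses itemNumber 2546.
  { name: "world", items: [{ itemNumber: 2546, value: "get" }, { itemNumber: 7, value: "set" }] },
];
console.log(findDuplicateItemNumbers(sampleDocs)); // [ { _id: 2546, count: 2 } ]
```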

MongoDB aggregate $avg always returns zero

I'm trying to get an average over a set of records from MongoDB, but it is always returning zero.
db.entries.aggregate([
{ $match : { "date" : { $gte : 1465672815466 }, "type" : "sgv" } },
{ $group : { _id : null, avgBG : { $avg: "sgv" } } }
])
I'm very new to MongoDB, and I'm not sure whether this is caused by records coming back without an "sgv" value or whether I'm doing something else wrong.
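The thread as captured here has no answer, so the following is only a likely cause, not a confirmed diagnosis. In an aggregation expression, a string without a leading $ is a literal, so { $avg: "sgv" } averages the constant string "sgv" rather than the field; a field path needs the $ prefix, as in { $avg: "$sgv" }. $avg skips non-numeric and missing values, so records without an sgv field are simply ignored. The sketch below models that in plain JavaScript; the helpers resolve and avg are hypothetical, handle top-level fields only, and model "no numeric input" as null.

```javascript
// Corrected pipeline: field path "$sgv" instead of the literal string "sgv".
const avgPipeline = [
  { $match: { date: { $gte: 1465672815466 }, type: "sgv" } },
  { $group: { _id: null, avgBG: { $avg: "$sgv" } } },
];

// Hypothetical model of operand resolution: "$name" reads a top-level field,
// anything else is treated as a literal value.
function resolve(expr, doc) {
  return typeof expr === "string" && expr.startsWith("$") ? doc[expr.slice(1)] : expr;
}

// Hypothetical model of $avg: non-numeric and missing values are skipped;
// with no numeric input at all, this model returns null.
function avg(expr, docs) {
  const nums = docs.map((d) => resolve(expr, d)).filter((v) => typeof v === "number");
  return nums.length ? nums.reduce((a, b) => a + b, 0) / nums.length : null;
}

// Made-up entries for illustration; the third one has no sgv field.
const entries = [{ sgv: 100 }, { sgv: 140 }, {}];
console.log(avg("sgv", entries));  // null: every input is the string "sgv"
console.log(avg("$sgv", entries)); // 120: the missing sgv is ignored
```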

MongoDB - how to query for the id with the oldest date in an array field

Let's say I have a collection called phone_audit whose documents have the following form: _id, which holds the phone number, and value, which contains items, an array that always has 2 entries (an id and a date).
See below:
{
"_id" : {
"phone_number" : "+012345678"
},
"value" : {
"items" : [
{
"_id" : "c14b4ac1db691680a3fb65320fba7261",
"updated_at" : ISODate("2016-03-14T12:35:06.533Z")
},
{
"_id" : "986b58e55f8606270f8a43cd7f32392b",
"updated_at" : ISODate("2016-07-23T11:17:53.552Z")
}
]
}
},
......
I need to get a list of _id values for every entry in that collection representing the older of the two items in each document.
So in the above - result would be [c14b4ac1db691680a3fb65320fba7261,...]
Any pointers to the type of query to execute would be very helpful, even if the exact syntax is not correct.
With aggregate(), you can $unwind value.items, $sort by updated_at, then use $first to get the oldest:
[
{
"$unwind": "$value.items"
},
{
"$sort": { "value.items.updated_at": 1 }
},
{
"$group":{
_id: "$_id.phone_number",
oldest:{$first:"$value.items"}
}
},
{
"$project":{
value_id: "$oldest._id"
}
}
]
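The question asks for a single list of ids, so here is a sketch that appends one more stage to the answer's pipeline: a $group on _id: null with $push, which is an addition of mine, not part of the answer, and the order of ids in that list is not guaranteed. The plain-JavaScript helper oldestItemIds is a hypothetical in-memory model of the "earliest updated_at per document" logic, run on the sample document from the question.

```javascript
// Answer's pipeline plus an optional final stage that collapses the
// per-phone results into one array of ids.
const oldestItemPipeline = [
  { $unwind: "$value.items" },
  { $sort: { "value.items.updated_at": 1 } },
  { $group: { _id: "$_id.phone_number", oldest: { $first: "$value.items" } } },
  { $project: { value_id: "$oldest._id" } },
  { $group: { _id: null, ids: { $push: "$value_id" } } }, // optional addition
];

// Hypothetical model: for each document, keep the item with the earliest updated_at.
function oldestItemIds(docs) {
  return docs.map((doc) =>
    doc.value.items.reduce((oldest, item) =>
      item.updated_at < oldest.updated_at ? item : oldest
    )._id
  );
}

const phoneAudit = [
  {
    _id: { phone_number: "+012345678" },
    value: {
      items: [
        { _id: "c14b4ac1db691680a3fb65320fba7261", updated_at: new Date("2016-03-14T12:35:06.533Z") },
        { _id: "986b58e55f8606270f8a43cd7f32392b", updated_at: new Date("2016-07-23T11:17:53.552Z") },
      ],
    },
  },
];

console.log(oldestItemIds(phoneAudit)); // [ 'c14b4ac1db691680a3fb65320fba7261' ]
```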

MongoDB aggregation: group by date

I'm trying to group by timestamp in the collection named "foos" { _id, TimeStamp }:
db.foos.aggregate(
[
{$group : { _id : new Date (Date.UTC({ $year : '$TimeStamp' },{ $month : '$TimeStamp' },{$dayOfMonth : '$TimeStamp'})) }}
])
I expect many dates, but the result is just one date. The data I'm using is correct (there are many foos with different dates, none of them in 1970). There is some problem in the date parsing, but I haven't been able to solve it yet.
{
"result" : [
{
"_id" : ISODate("1970-01-01T00:00:00.000Z")
}
],
"ok" : 1
}
I tried this one:
db.foos.aggregate(
[
{$group : { _id : { year : { $year : '$TimeStamp' }, month : { $month : '$TimeStamp' }, day : {$dayOfMonth : '$TimeStamp'} }, count : { $sum : 1 } }},
{$project : { parsedDate : new Date('$_id.year', '$_id.month', '$_id.day') , count : 1, _id : 0} }
])
Result:
uncaught exception: aggregate failed: {
"errmsg" : "exception: disallowed field type Date in object expression (at 'parsedDate')",
"code" : 15992,
"ok" : 0
}
And this one:
db.foos.aggregate(
[
{$group : { _id : { year : { $year : '$TimeStamp' }, month : { $month : '$TimeStamp' }, day : {$dayOfMonth : '$TimeStamp'} }, count : { $sum : 1 } }},
{$project : { parsedDate : Date.UTC('$_id.year', '$_id.month', '$_id.day') , count : 1, _id : 0} }
])
I cannot see dates in the result:
{
"result" : [
{
"count" : 412
},
{
"count" : 1702
},
{
"count" : 422
}
],
"ok" : 1
}
db.foos.aggregate(
[
{ $project : { day : {$substr: ["$TimeStamp", 0, 10] }}},
{ $group : { _id : "$day", number : { $sum : 1 }}},
{ $sort : { _id : 1 }}
]
)
Grouping by date can be done in two steps in the aggregation framework; a third step is needed only if you want the result sorted:
$project in combination with $substr takes the first 10 characters (YYYY-MM-DD) of the ISODate object from each document (the result is a stream of documents with the fields "_id" and "day");
$group groups by day, adding (summing) the number 1 for each matching document;
$sort sorts ascending by "_id", which is the day from the previous aggregation step; this step is optional and only needed if a sorted result is desired.
This solution cannot take advantage of indexes like db.twitter.ensureIndex( { TimeStamp: 1 } ), because it converts the ISODate object to a string on the fly. For large collections (millions of documents) this could become a performance bottleneck, and more sophisticated approaches should be used.
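As a rough illustration of what the three stages compute, here is a plain-JavaScript model. It assumes, as the answer does, that the first 10 characters of the date's string form are YYYY-MM-DD in UTC; toISOString() is used here as a stand-in for the server-side conversion, and countByDay is a hypothetical helper name. The sample timestamps are made up.

```javascript
// Hypothetical model of: $project day = $substr [TimeStamp, 0, 10]
// -> $group by day with $sum: 1 -> $sort by _id ascending.
function countByDay(timestamps) {
  const counts = new Map();
  for (const ts of timestamps) {
    // toISOString() is UTC, e.g. "2013-11-13T19:15:05.600Z"; the first 10 chars are "YYYY-MM-DD".
    const day = ts.toISOString().substring(0, 10);
    counts.set(day, (counts.get(day) || 0) + 1);
  }
  return [...counts]
    .map(([day, number]) => ({ _id: day, number }))
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

const sample = [
  new Date("2013-11-13T19:15:05.600Z"),
  new Date("2013-11-13T08:00:00Z"),
  new Date("2014-02-01T10:00:00Z"),
];
// Two groups: 2013-11-13 with number 2, then 2014-02-01 with number 1.
console.log(countByDay(sample));
```

Because the string is compared as text, "YYYY-MM-DD" sorts chronologically, which is why the optional $sort on _id gives days in date order.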
It depends on whether you want the date as an ISODate in the final output. If so, you can do one of two things:
1. Extract $year, $month and $dayOfMonth from your timestamp and then reconstruct a new date from them (you are already trying to do that, but with syntax that doesn't work in the aggregation framework).
2. If the original TimeStamp is of type ISODate(), use date arithmetic to subtract the hours, minutes, seconds and milliseconds from it, giving a new date that is "rounded" to the day.
There is an example of 2 here.
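Since the linked example for option 2 is not preserved here, this is a sketch of one common way to express it, assuming TimeStamp is a BSON date after 1970-01-01: subtracting two dates gives milliseconds, $mod by one day gives the milliseconds since UTC midnight, and subtracting that number from the date gives a date again. The stage and helper names are hypothetical, and roundToUtcDay replays the same arithmetic in plain JavaScript.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Stage sketch: TimeStamp minus (milliseconds since the epoch mod one day) = UTC midnight.
const roundToDayStage = {
  $project: {
    tsDate: {
      $subtract: [
        "$TimeStamp",
        { $mod: [{ $subtract: ["$TimeStamp", new Date(0)] }, MS_PER_DAY] },
      ],
    },
  },
};

// Plain-JavaScript equivalent of the same arithmetic (assumes dates after 1970-01-01,
// because % keeps the sign of negative epoch values).
function roundToUtcDay(ts) {
  return new Date(ts.getTime() - (ts.getTime() % MS_PER_DAY));
}

console.log(roundToUtcDay(new Date("2013-11-13T19:15:05.600Z")).toISOString());
// 2013-11-13T00:00:00.000Z
```

Following that stage with { $group: { _id: "$tsDate", count: { $sum: 1 } } } would group by day while keeping the key as a date.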
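Here is how you would do 1. I'm assuming that none of your dates are earlier than 2013 (the base year is hard-coded), but you can easily adjust the math to accommodate your oldest date. Note that the year arithmetic counts every year as 365 days, so each leap year crossed shifts the result one day early (with base 2013, dates from 2017 onward are affected).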
project1={$project:{_id:0,
y:{$subtract:[{$year:"$TimeStamp"}, 2013]},
d:{$subtract:[{$dayOfYear:"$TimeStamp"},1]},
TimeStamp:1,
jan1:{$literal:new ISODate("2013-01-01T00:00:00")}
} };
project2={$project:{tsDate:{$add:[
"$jan1",
{$multiply:["$y", 365*24*60*60*1000]},
{$multiply:["$d", 24*60*60*1000]}
] } } };
Sample data:
db.foos.find({},{_id:0,TimeStamp:1})
{ "TimeStamp" : ISODate("2013-11-13T19:15:05.600Z") }
{ "TimeStamp" : ISODate("2014-02-01T10:00:00Z") }
Aggregation result:
> db.foos.aggregate(project1, project2)
{ "tsDate" : ISODate("2013-11-13T00:00:00Z") }
{ "tsDate" : ISODate("2014-02-01T00:00:00Z") }
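To check the project1/project2 arithmetic outside the server, here is a replay of it in plain JavaScript. The helper names dayOfYearUTC and tsDateFromParts are hypothetical; dayOfYearUTC stands in for $dayOfYear, both computed in UTC. The first two calls reproduce the aggregation result above, and the last one illustrates the 365-day caveat.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const JAN1_2013 = Date.UTC(2013, 0, 1); // the hard-coded base date from project1

// Stand-in for $dayOfYear (1 = January 1st), computed in UTC.
function dayOfYearUTC(ts) {
  const y = ts.getUTCFullYear();
  const midnight = Date.UTC(y, ts.getUTCMonth(), ts.getUTCDate());
  return (midnight - Date.UTC(y, 0, 1)) / MS_PER_DAY + 1;
}

// Same arithmetic as project1 + project2: jan1 + y * 365 days + d days.
function tsDateFromParts(ts) {
  const y = ts.getUTCFullYear() - 2013; // project1: y
  const d = dayOfYearUTC(ts) - 1;       // project1: d
  return new Date(JAN1_2013 + y * 365 * MS_PER_DAY + d * MS_PER_DAY);
}

console.log(tsDateFromParts(new Date("2013-11-13T19:15:05.600Z")).toISOString()); // 2013-11-13T00:00:00.000Z
console.log(tsDateFromParts(new Date("2014-02-01T10:00:00Z")).toISOString());     // 2014-02-01T00:00:00.000Z
// Caveat: 2016 has 366 days, so a 2017 date comes out one day early.
console.log(tsDateFromParts(new Date("2017-03-01T12:00:00Z")).toISOString());     // 2017-02-28T00:00:00.000Z
```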
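This is what I use in one of my projects:
collection.aggregate([
// group results by date
{$group : {
_id : { date : "$date" }
// do whatever you want here, like $push, $sum...
}},
// _id is the date; sort newest first
{$sort : { _id : -1 }}
])
.toArray()
Here "$date" refers to a field holding a Date value, and I get results keyed by date. Note that grouping on the raw Date merges only documents with identical timestamps, so this groups by day only if the stored dates are already truncated to midnight.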