In MongoDB, I need to group on event type and return multiple attribute fields.
My goal is to show, for each group, the associated attribute fields together with sum totals of the numeric fields.
MongoDB's aggregate() has $group and $project stages, and I expected $project to let me show the original fields alongside the fields computed in $group.
My $group works fine by itself, i.e., without the $project. When I add $project before or after the $group, I receive the following error:
query failed: (Location40323) A pipeline stage specification object must contain exactly one field.
My code is as follows:
db.collection.aggregate([
{
$project: {
eventName: "$EVENT_TYPE",
registeredDate: "$BEGIN_DATE_TIME",
stateName: "$STATE",
damageCosts: "$DAMAGE_PROPERTY",
peopleCosts: "$DEATHS_DIRECT",
injuryCosts: "$INJURIES_DIRECT",
cropCosts: "$DAMAGE_CROPS"
},
$group: {
_id: "$EVENT_TYPE",
totalPropCost: {
$sum: "$DAMAGE_PROPERTY"
},
totalDeaths: {
$sum: "$DEATHS_DIRECT"
},
totalInjury: {
$sum: "$INJURIES_DIRECT"
},
totalCropCost: {
$sum: "$DAMAGE_CROPS"
}
}
}
])
Alternatively, I tried the $push accumulator, and that version runs without errors:
db.collection.aggregate([
{
$group: {
_id: "$EVENT_TYPE",
totalPropCost: {
$sum: "$DAMAGE_PROPERTY"
},
totalDeaths: {
$sum: "$DEATHS_DIRECT"
},
totalInjury: {
$sum: "$INJURIES_DIRECT"
},
totalCropCost: {
$sum: "$DAMAGE_CROPS"
},
events: {
$push: {
name: "$EVENT_TYPE",
date: "$BEGIN_DATE_TIME",
state: "$STATE"
}
}
}
}
])
I use the following sample data for this query; without the $project, the $sum works correctly.
[
{
"BEGIN_YEARMONTH": 201007,
"BEGIN_DAY": 7,
"END_YEARMONTH": 201007,
"END_DAY": 7,
"END_TIME": 1630,
"STATE": "NEW HAMPSHIRE",
"YEAR": 2010,
"EVENT_TYPE": "Heat",
"BEGIN_DATE_TIME": "07-JUL-10 12:51:00",
"END_DATE_TIME": "07-JUL-10 16:30:00",
"INJURIES_DIRECT": 0,
"DEATHS_DIRECT": 0,
"DAMAGE_PROPERTY": "0.00K",
"DAMAGE_CROPS": "0.00K"
},
{
"BEGIN_YEARMONTH": 201001,
"BEGIN_DAY": 17,
"END_YEARMONTH": 201001,
"END_DAY": 18,
"END_TIME": 1500,
"STATE": "NEW HAMPSHIRE",
"YEAR": 2010,
"MONTH_NAME": "January",
"EVENT_TYPE": "Heavy Snow",
"BEGIN_DATE_TIME": "17-JAN-10 23:00:00",
"END_DATE_TIME": "18-JAN-10 15:00:00",
"INJURIES_DIRECT": 0,
"DEATHS_DIRECT": 0,
"DAMAGE_PROPERTY": "0.00K",
"DAMAGE_CROPS": "0.00K"
}
]
For this investigation, after many tries, the following gives me adequate results (other ideas are welcome, especially alternative solutions using $project). The $push operator appears to work best. If anyone knows how to get $project working, please advise and explain. For now, $push works as one solution for a group plus a list of attribute fields.
db.collection.aggregate([
{
$group: {
_id: "$STATE",
totalPropCost: {
$sum: "$DAMAGE_PROPERTY"
},
totalDeaths: {
$sum: "$DEATHS_DIRECT"
},
totalInjury: {
$sum: "$INJURIES_DIRECT"
},
totalCropCost: {
$sum: "$DAMAGE_CROPS"
},
events: {
$push: {
eventName: "$EVENT_TYPE",
CzName: "$CZ_NAME",
cZTimeZone: "$CZ_TIMEZONE",
eventDate: "$BEGIN_DATE_TIME",
eventMonth: "$MONTH_NAME",
state: "$STATE",
observer: "$SOURCE"
}
}
}
}
])
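The Location40323 error comes from the first attempt putting $project and $group inside one object: each pipeline stage must be its own object with exactly one top-level field. A second point is that once $project renames fields, later stages only see the new names, so $group has to reference eventName, damageCosts and so on rather than EVENT_TYPE and DAMAGE_PROPERTY. The sketch below is one way to arrange it, written as a plain JavaScript array so its shape can be checked without a server; the everyStageHasOneField helper is hypothetical and only illustrates the rule. Also note that $sum ignores non-numeric values, so string fields such as "0.00K" contribute 0 to the totals unless they are converted to numbers first.

```javascript
// Sketch: the question's stages, each in its own object.
// $project runs first and renames fields, so $group refers to the new names.
const pipeline = [
  {
    $project: {
      eventName: "$EVENT_TYPE",
      registeredDate: "$BEGIN_DATE_TIME",
      stateName: "$STATE",
      damageCosts: "$DAMAGE_PROPERTY",
      peopleCosts: "$DEATHS_DIRECT",
      injuryCosts: "$INJURIES_DIRECT",
      cropCosts: "$DAMAGE_CROPS"
    }
  },
  {
    $group: {
      _id: "$eventName",
      totalPropCost: { $sum: "$damageCosts" },
      totalDeaths: { $sum: "$peopleCosts" },
      totalInjury: { $sum: "$injuryCosts" },
      totalCropCost: { $sum: "$cropCosts" },
      // Keep per-document attributes alongside the totals.
      events: { $push: { date: "$registeredDate", state: "$stateName" } }
    }
  }
];

// Hypothetical helper: checks the rule behind Location40323,
// i.e. every stage object has exactly one top-level field.
function everyStageHasOneField(stages) {
  return stages.every((stage) => Object.keys(stage).length === 1);
}

console.log(everyStageHasOneField(pipeline)); // true
// In mongosh the same array can be passed directly: db.collection.aggregate(pipeline)
```

Placing $project after $group works the same way, except that $project then sees only the $group output fields (_id, totalPropCost, events and so on).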
Related
I've done this sometime last year, but now I really can't recall and can't find any helpful resources.
I want to get the statistics of my collection based on types.
This is my data object
{
"_id": {
"$oid": "63bfc374378c59a5328f229e"
},
"amountEarned": 11500,
"amountPaid": 10350,
"relianceCommission": 1150,
"receiverType": "RESTAURANT",
"__v": 0
}
I just need the sum of amountPaid for each receiverType, which can be STORE, RESTAURANT or SHOPPER. I also need the sum of relianceCommission across all documents, resulting in a shape like
{
storeEarnings: 500,
restaurantEarnings: 30,
shopperEarnings: 40,
totalRelianceCommission: 45
}
I've tried
db.collection.aggregate([
{
$group: {_id: "$receiverType", total: {$sum: "$amountPaid"}}
}
])
And then joining with another pipeline to calculate totalRelianceCommission, but I feel there should be a neater way to do it. I'm also not sure how to do the projections to result in the desired shape. Please help.
You need a conditional sum.
db.collection.aggregate([
{
$group: {
_id: null,
storeEarnings: {
$sum: {
$cond: [{$eq: ["$receiverType","STORE"]},"$amountPaid",0]
}
},
restaurantEarnings: {
$sum: {
$cond: [{$eq: ["$receiverType","RESTAURANT"]},"$amountPaid",0]
}
},
shopperEarnings: {
$sum: {
$cond: [{$eq: ["$receiverType","SHOPPER"]},"$amountPaid",0]
}
},
totalRelianceCommission: {
$sum: "$relianceCommission"
}
}
}
])
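To see what each $cond-inside-$sum accumulator computes, the plain JavaScript sketch below folds an in-memory array the same way. It only illustrates the arithmetic, not MongoDB itself; the RESTAURANT document comes from the question, while the STORE and SHOPPER documents are hypothetical.

```javascript
// The first document is from the question; the other two are hypothetical.
const docs = [
  { amountPaid: 10350, relianceCommission: 1150, receiverType: "RESTAURANT" },
  { amountPaid: 500, relianceCommission: 50, receiverType: "STORE" },
  { amountPaid: 40, relianceCommission: 4, receiverType: "SHOPPER" }
];

// Mirrors { $sum: { $cond: [{ $eq: ["$receiverType", type] }, "$amountPaid", 0] } }:
// matching documents add amountPaid, all others add 0.
function conditionalSum(documents, type) {
  return documents.reduce(
    (acc, d) => acc + (d.receiverType === type ? d.amountPaid : 0),
    0
  );
}

const result = {
  storeEarnings: conditionalSum(docs, "STORE"),
  restaurantEarnings: conditionalSum(docs, "RESTAURANT"),
  shopperEarnings: conditionalSum(docs, "SHOPPER"),
  // Plain { $sum: "$relianceCommission" } over every document.
  totalRelianceCommission: docs.reduce((acc, d) => acc + d.relianceCommission, 0)
};
console.log(result);
```

In the real pipeline the output still carries _id: null; appending a { $project: { _id: 0 } } stage removes it to match the requested shape.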
query:
{
$group: {
_id: "$receiverType",
total: {
$sum: "$amountPaid"
},
commissions: {
$sum: "$relianceCommission"
}
}
}
result:[
{
"_id": "STORE",
"commissions": 1150,
"total": 10350
},
{
"_id": "RESTAURANT",
"commissions": 2300,
"total": 20700
}
]
Then loop through the array to get the sum of commissions.
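A minimal sketch of that client-side loop in JavaScript, using the result shown above. The mapping from receiverType to field names such as storeEarnings is an assumption based on the desired shape in the question.

```javascript
// Output of the $group query above: one document per receiverType.
const grouped = [
  { _id: "STORE", commissions: 1150, total: 10350 },
  { _id: "RESTAURANT", commissions: 2300, total: 20700 }
];

// Start from zero so types missing from the result (here SHOPPER) still appear.
const summary = {
  storeEarnings: 0,
  restaurantEarnings: 0,
  shopperEarnings: 0,
  totalRelianceCommission: 0
};

for (const row of grouped) {
  // "STORE" -> "storeEarnings", "RESTAURANT" -> "restaurantEarnings", ...
  summary[row._id.toLowerCase() + "Earnings"] = row.total;
  // Accumulate commissions across all receiver types.
  summary.totalRelianceCommission += row.commissions;
}

console.log(summary);
```

This keeps the database query simple at the cost of a small amount of application code; the conditional-sum answer above does the same work inside a single $group instead.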
Collection
{
"_id" : ObjectId("5a143a79ca78479b1dc90161"),
"createdAt" : ISODate("2017-11-21T14:38:49.375Z"),
"amount" : 227.93359186,
"pair" : "ant_eth"
}
Expected output
{
"12-12-2012": [
{
"pair": "ant_eth",
"sum": "sum of amounts in 12-12-2012"
},
{
"pair": "new_pair",
"sum": "sum of amounts in 12-12-2012"
}
],
"13-12-2012": [{
"pair": "ant_eth",
"sum": "sum of amounts in 13-12-2012"
}]
}
What I have achieved so far:
const criteria = [
{ $group: {
_id: '$pair',
totalAmount: { $sum: '$amount' } } }
]
Any help to achieve the expected output is much appreciated.
OK, so you want to sum up amount by the date portion of a datetime and by pair, and then organize all the pair+sum entries by date. You can do this by regrouping as follows. The first $group creates the sums but leaves you with repeating dates. The second $group shapes the output into almost what you want, except that the dates remain values of _id instead of becoming field names themselves.
db.foo.aggregate([
{
$group: {
_id: {d: {$dateToString: { format: "%Y-%m-%d", date: "$createdAt"}}, pair: "$pair"},
n: {$sum: "$amount"}
}
},
{
$group: {
_id: "$_id.d",
items: {$push: {pair: "$_id.pair", sum: "$n"}}
}
}
]);
If you REALLY want to have field names, then add these two stages after the second $group:
,{$project: {x: [["$_id","$items"]] }}
,{$replaceRoot: { newRoot: {$arrayToObject: "$x"} }}
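The two $group passes can be mimicked in plain JavaScript to see the intermediate shapes. In this sketch only the first document comes from the question; the other three are hypothetical, and toISOString().slice(0, 10) stands in for $dateToString with "%Y-%m-%d" (both use UTC).

```javascript
const docs = [
  { createdAt: new Date("2017-11-21T14:38:49.375Z"), amount: 227.93359186, pair: "ant_eth" },
  // Hypothetical extra documents.
  { createdAt: new Date("2017-11-21T18:00:00Z"), amount: 10, pair: "ant_eth" },
  { createdAt: new Date("2017-11-21T19:00:00Z"), amount: 5, pair: "new_pair" },
  { createdAt: new Date("2017-11-22T09:00:00Z"), amount: 1, pair: "ant_eth" }
];

// First $group: sum amount per (day, pair). The compound key is JSON-encoded.
const sums = new Map();
for (const doc of docs) {
  const day = doc.createdAt.toISOString().slice(0, 10);
  const key = JSON.stringify([day, doc.pair]);
  sums.set(key, (sums.get(key) || 0) + doc.amount);
}

// Second $group plus $arrayToObject: collect { pair, sum } under each day.
const byDay = {};
for (const [key, sum] of sums) {
  const [day, pair] = JSON.parse(key);
  if (!byDay[day]) byDay[day] = [];
  byDay[day].push({ pair, sum });
}

console.log(JSON.stringify(byDay));
```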
This is what I could get to:
db.collection.aggregate([{
$group: {
_id: {
year: {
"$year": "$createdAt"
},
month: {
"$month": "$createdAt"
},
day: {
"$dayOfMonth": "$createdAt"
},
pair: "$pair"
},
sum: {
$sum: "$amount"
}
}
}])
For the rest, you will probably need to parse the results on the application side to produce the output you want.
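A sketch of that app-side step in JavaScript, assuming rows shaped like the output of the $group above. The sample rows are hypothetical, and the sketch assumes the "12-12-2012" keys in the question mean day-month-year.

```javascript
// Hypothetical rows shaped like the $group output: _id holds year, month, day and pair.
const rows = [
  { _id: { year: 2017, month: 11, day: 21, pair: "ant_eth" }, sum: 237.5 },
  { _id: { year: 2017, month: 11, day: 21, pair: "new_pair" }, sum: 5 },
  { _id: { year: 2017, month: 11, day: 22, pair: "ant_eth" }, sum: 1 }
];

// Zero-pad so that day 2 of month 3 becomes "02-03-...".
const pad = (n) => String(n).padStart(2, "0");

// Build one array of { pair, sum } per "DD-MM-YYYY" key.
const output = {};
for (const { _id, sum } of rows) {
  const key = `${pad(_id.day)}-${pad(_id.month)}-${_id.year}`;
  if (!output[key]) output[key] = [];
  output[key].push({ pair: _id.pair, sum });
}

console.log(JSON.stringify(output));
```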
I am trying to calculate the average number of flights per month, but I am receiving an error:
"A pipeline stage specification object must contain exactly one field.",
db.Flights.aggregate([
{$unwind: "$flights"},
{$project:
{_id: 0,
status: 1,
flights: 1
},
$match: {"status": "active"},
$group: {_id: {"flights" : "$flights.flight_id", "Month": "$depart_info.month_name_long"},
avg_flights: {$avg: "$flights.count"}}}
])
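Your aggregation pipeline is somewhat malformed, specifically the $match and $group stages. Each stage must be its own document containing exactly one field (the stage operator); in your version, $project, $match and $group all sit inside a single object. Try the following: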
db.Flights.aggregate([
{
$unwind: "$flights"
},
{
$project: {
_id: 0,
status: 1,
flights: 1,
depart_info: 1 // $group below reads $depart_info.month_name_long
},
},
{
$match: {
"status": "active"
}
},
{
$group: {
_id: {
"flights": "$flights.flight_id",
"Month": "$depart_info.month_name_long"
},
avg_flights: {
$avg: "$flights.count"
}
}
}
])
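The stages can be traced in plain JavaScript to check what $avg ends up averaging. Note that $group reads $depart_info.month_name_long, so that field has to survive the $project; a $project that includes only status and flights drops it, and Month comes out missing. The documents below are hypothetical and only contain the fields the pipeline reads.

```javascript
// Hypothetical documents with the fields the pipeline reads.
const docs = [
  { status: "active", depart_info: { month_name_long: "January" },
    flights: [{ flight_id: "A1", count: 4 }, { flight_id: "B2", count: 2 }] },
  { status: "active", depart_info: { month_name_long: "January" },
    flights: [{ flight_id: "A1", count: 6 }] },
  { status: "cancelled", depart_info: { month_name_long: "January" },
    flights: [{ flight_id: "A1", count: 100 }] }
];

// $unwind: one row per element of the flights array.
const unwound = docs.flatMap((d) => d.flights.map((f) => ({ ...d, flights: f })));

// $match: keep active documents only.
const active = unwound.filter((d) => d.status === "active");

// $group with $avg: running total and count per (flight_id, month).
const groups = new Map();
for (const d of active) {
  const key = JSON.stringify({ flights: d.flights.flight_id, Month: d.depart_info.month_name_long });
  const g = groups.get(key) || { total: 0, n: 0 };
  g.total += d.flights.count;
  g.n += 1;
  groups.set(key, g);
}
const result = [...groups].map(([key, g]) => ({ _id: JSON.parse(key), avg_flights: g.total / g.n }));

console.log(JSON.stringify(result));
```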
I have a few documents that look like this:
{
'page_id': 123131,
'timestamp': ISODate('2014-06-10T12:13:59'),
'processed': false
}
The documents have other fields, but these are the only ones relevant for this purpose. The collection also has an index on these fields:
{
'page_id': 1,
'timestamp': -1
}
I run a mapreduce that returns distinct (page_id, day) results, with day being the date-portion of the timestamp (in the above, it would be 2014-06-10).
This is done with the following mapreduce:
function() {
emit({
site_id: this.page_id,
day: Date.UTC(this.timestamp.getUTCFullYear(),
this.timestamp.getUTCMonth(),
this.timestamp.getUTCDate())
}, {
count: 1
});
}
The reduce-function basically just returns { count: 1 } as I am not really interested in the number, just unique tuples.
I wish to make this more efficient. I tried adding sort: { 'page_id' }, but it triggers an error. Searching suggests that I can only sort by the key, but since this is not a "raw" key, how does that work?
Also, is there a faster alternative to this mapReduce? I know MongoDB has distinct, but from what I can gather it only works on one field. Might the group aggregate function be relevant?
The aggregation framework would seem more appropriate, since it runs in native code whereas mapReduce runs under a JavaScript interpreter instance. MapReduce has its uses, but generally the aggregation framework is best suited to common tasks that do not need the specific processing control that only JavaScript methods allow:
db.collection.aggregate([
{ "$group": {
"_id": {
"page": "$page_id",
"day": {
"year": { "$year": "$timestamp" },
"month": { "$month": "$timestamp" },
"day": { "$dayOfMonth": "$timestamp" },
}
},
"count": { "$sum": 1 }
}}
])
This largely makes use of the date aggregation operators. See other aggregation framework operators for more details.
Of course if you wanted to reverse sort those unique dates (which is the opposite of what mapReduce will do) or other fields then just add a $sort to the end of the pipeline for what you want:
db.collection.aggregate([
{ "$group": {
"_id": {
"page": "$page_id",
"day": {
"year": { "$year": "$timestamp" },
"month": { "$month": "$timestamp" },
"day": { "$dayOfMonth": "$timestamp" },
}
},
"count": { "$sum": 1 }
}},
{ "$sort": {
"_id.day.year": -1, "_id.day.month": -1, "_id.day.day": -1
}}
])
You might want to look at the aggregation framework.
A query like this:
collection.aggregate([
{$group:
{
_id: {
year: { $year: [ "$timestamp" ] },
month: { $month: [ "$timestamp" ] },
day: { $dayOfMonth: [ "$timestamp" ] },
pageId: "$page_id"
}
}
}
])
will give you all unique combinations of the fields you're looking for.
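The effect of grouping on page plus the date parts can be checked in plain JavaScript: documents collapse to one entry per (page_id, UTC day), which is the set of distinct tuples the mapReduce was producing. Apart from the first document (from the question), the data below is hypothetical. The date operators work in UTC by default, hence the getUTC* calls, and $month is 1-based while getUTCMonth() is 0-based.

```javascript
const docs = [
  { page_id: 123131, timestamp: new Date("2014-06-10T12:13:59Z") },
  // Hypothetical extra documents.
  { page_id: 123131, timestamp: new Date("2014-06-10T20:00:00Z") },
  { page_id: 123131, timestamp: new Date("2014-06-11T01:00:00Z") },
  { page_id: 999, timestamp: new Date("2014-06-10T08:00:00Z") }
];

// Same key as the $group _id: page plus UTC year, month (1-12) and day.
const seen = new Map();
for (const d of docs) {
  const t = d.timestamp;
  const id = { page: d.page_id, year: t.getUTCFullYear(), month: t.getUTCMonth() + 1, day: t.getUTCDate() };
  seen.set(JSON.stringify(id), id); // duplicates overwrite, like $group collapsing them
}
const distinct = [...seen.values()];

console.log(distinct.length); // 3
```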
Here's my problem:
Model:
{ application: "abc", date: Time.now, status: "1", user_id: [id1, id2, id4] }
{ application: "abc", date: Time.yesterday, status: "1", user_id: [id1, id3, id5] }
{ application: "abc", date: Time.yesterday-1, status: "1", user_id: [id1, id3, id5] }
I need to count the unique number of user_ids in a period of time.
Expected result:
{ application: "abc", status: "1", unique_id_count: 5 }
I'm currently using the aggregation framework and counting the ids outside mongodb.
{ $match: { application: "abc" } }, { $unwind: "$users" }, { $group:
{ _id: { status: "$status"},
users: { $addToSet: "$users" } } }
My arrays of user ids are very large, so I have to iterate over the dates or I hit the maximum document size (16 MB).
I could also $group by
{ year: { $year: "$date" }, month: { $month: "$date" }, day: {
$dayOfMonth: "$date" }
but I also get the document size limitation.
Is it possible to count the set size in mongodb?
thanks
The following returns the number of unique users per status for application "abc". It applies a second $group to the result of the first one, using MongoDB's aggregation pipeline.
{ $match: { application: "abc" } },
{ $unwind: "$users" },
{ $group: { _id: "$status", users: { $addToSet: "$users" } } },
{ $unwind:"$users" },
{ $group : {_id : "$_id", count : {$sum : 1} } }
Hopefully a future MongoDB release will make this easier with an operator that returns the size of an array inside a projection, e.g. {$project: {id: "$_id", count: {$size: "$users"}}}
https://jira.mongodb.org/browse/SERVER-4899
Cheers
Sorry I'm a little late to the party. Simply grouping on the 'user_id' and counting the result with a trivial group works just fine and doesn't run into doc size limits.
[
{$match: {application: 'abc', date: {$gte: startDate, $lte: endDate}}},
{$unwind: '$user_id'},
{$group: {_id: '$user_id'}},
{$group: {_id: 'singleton', count: {$sum: 1}}}
];
Use $size to get the size of the set.
[
{
$match: {"application": "abc"}
},
{
$unwind: "$user_id"
},
{
$group: {
"_id": "$status",
"application": {$first: "$application"}, // non-_id fields in $group need an accumulator
"unique_user_id": {$addToSet: "$user_id"}
}
},
{
$project:{
"_id": "$_id",
"application": "$application",
"count": {$size: "$unique_user_id"}
}
}
]
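What the $unwind + $addToSet + $size combination computes can be reproduced in plain JavaScript with a Set. The string ids stand in for the question's id1 to id5 placeholders, and the dates are left out.

```javascript
const docs = [
  { application: "abc", status: "1", user_id: ["id1", "id2", "id4"] },
  { application: "abc", status: "1", user_id: ["id1", "id3", "id5"] },
  { application: "abc", status: "1", user_id: ["id1", "id3", "id5"] }
];

// $match + $unwind + $addToSet: a Set keeps each user id once per status.
const uniqueByStatus = new Map();
for (const doc of docs.filter((d) => d.application === "abc")) {
  if (!uniqueByStatus.has(doc.status)) uniqueByStatus.set(doc.status, new Set());
  for (const id of doc.user_id) uniqueByStatus.get(doc.status).add(id);
}

// $size: the number of distinct ids in each set.
const counts = [...uniqueByStatus].map(([status, ids]) => ({
  application: "abc",
  status,
  unique_id_count: ids.size
}));

console.log(counts);
```

Holding every id in one set is what runs into the 16 MB document limit on large data; grouping on user_id first and then counting the groups gives the same number without building that array.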