MongoDB group and merge all sub documents into parent document

I want to group documents and merge the list under each group into a parent document, using one of the fields in the sub-document as the key.
Here are my documents:
{ "_id" : "1", "a" : "11", "b" : "o11", "c" : "s11" }
{ "_id" : "2", "a" : "22", "b" : "o22", "c" : "s22" }
{ "_id" : "3", "a" : "11", "b" : "o12", "c" : "s12" }
{ "_id" : "4", "a" : "11", "b" : "o13", "c" : "s13" }
{ "_id" : "5", "a" : "22", "b" : "o23", "c" : "s23" }
I want to group by 'a' and merge the list of sub-documents into one document, like below:
{a:"11", o11: { "_id" : "1", "a" : "11", "b" : "o11", "c" : "s11" }, o12: { "_id" : "3", "a" : "11", "b" : "o12", "c" : "s12" }, o13: { "_id" : "4", "a" : "11", "b" : "o13", "c" : "s13" }},
{a:"22", o22: { "_id" : "2", "a" : "22", "b" : "o22", "c" : "s22" }, o23: { "_id" : "5", "a" : "22", "b" : "o23", "c" : "s23" }}
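One possible approach, assuming MongoDB 3.6 or later (for $mergeObjects and $arrayToObject) and a placeholder collection name coll: push each group's documents as key/value pairs, then turn those pairs into fields. The mergeByA helper below is hypothetical and not part of MongoDB; it only mirrors the pipeline in plain JavaScript so the resulting shape can be checked without a server.

```javascript
// Sketch only: "coll" is a placeholder collection name, not taken from the question.
const pipeline = [
  // Collect each group's documents as { k, v } pairs keyed by their "b" value.
  { $group: { _id: "$a", items: { $push: { k: "$b", v: "$$ROOT" } } } },
  // Turn the pairs into fields and merge them with the group key.
  { $replaceRoot: { newRoot: { $mergeObjects: [
      { a: "$_id" },
      { $arrayToObject: "$items" }
  ] } } }
];
// In the mongo shell: db.coll.aggregate(pipeline)

// Plain-JavaScript mirror of the same logic (hypothetical helper, no server needed).
function mergeByA(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.a)) groups.set(doc.a, { a: doc.a });
    groups.get(doc.a)[doc.b] = doc; // the "b" value becomes the field name
  }
  return [...groups.values()];
}

const sampleDocs = [
  { _id: "1", a: "11", b: "o11", c: "s11" },
  { _id: "2", a: "22", b: "o22", c: "s22" },
  { _id: "3", a: "11", b: "o12", c: "s12" },
  { _id: "4", a: "11", b: "o13", c: "s13" },
  { _id: "5", a: "22", b: "o23", c: "s23" }
];
const merged = mergeByA(sampleDocs);
console.log(JSON.stringify(merged));
```

If two documents in the same group share a "b" value, the later one overwrites the earlier one in this sketch, so the key field should be unique within each group.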


How to query: aggregate.$match -> $lookup -> where x=11

I have two mongo collections that are related to each other.
contacts:
{ "contact_id" : "1", "e" : "" }
{ "contact_id" : "2", "e" : "" }
{ "contact_id" : "3", "e" : "" }
contacts_groups:
{ "contact_id" : "2", "group_id" : "1" }
{ "contact_id" : "3", "group_id" : "1" }
{ "contact_id" : "3", "group_id" : "2" }
So it is a typical many-to-many relation (I also have a groups_collection). Every contact can be in multiple groups.
When I query:
db.contacts.aggregate([{ "$match" : { "a" : false, "e" : /ab/ } }, { "$lookup" : { "from" : "contacts_groups", "localField" : "contact_id", "foreignField" : "contact_id", "as" : "details" } }])
I get:
1) { "contact_id" : "1", "e" : "", "details" : [{ "contact_id" : "1", "group_id" : "1" }, { "contact_id" : "1", "group_id" : "2" }] }
2) { "contact_id" : "2", "e" : "", "details" : [{ "contact_id" : "2", "group_id" : "2" }, { "contact_id" : "2", "group_id" : "3" }] }
3) { "contact_id" : "3", "e" : "", "details" : [] }
But I would like to get only the second row (2), that is, only contacts that have "group_id" : "3".
How can I do this?
Is this 'join and where' approach the most efficient way to do it?
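One way to express the "where" part is a second $match after the $lookup that tests the joined array. The sketch below assumes the collection and field names from the query above; contactsInGroup is a hypothetical plain-JavaScript mirror of the $lookup and the second $match (it skips the first $match), and the sample data is chosen to reproduce the joined result shown in the question.

```javascript
const pipeline = [
  { $match: { a: false, e: /ab/ } },
  { $lookup: { from: "contacts_groups", localField: "contact_id",
               foreignField: "contact_id", as: "details" } },
  // Keep only contacts whose joined array contains group "3".
  { $match: { "details.group_id": "3" } }
];
// In the mongo shell: db.contacts.aggregate(pipeline)

// Hypothetical mirror of the $lookup plus the second $match, for illustration.
function contactsInGroup(contacts, links, groupId) {
  return contacts
    .map((c) => ({ ...c, details: links.filter((l) => l.contact_id === c.contact_id) }))
    .filter((c) => c.details.some((d) => d.group_id === groupId));
}

// Sample data matching the joined output listed in the question.
const contacts = [
  { contact_id: "1", e: "" },
  { contact_id: "2", e: "" },
  { contact_id: "3", e: "" }
];
const links = [
  { contact_id: "1", group_id: "1" }, { contact_id: "1", group_id: "2" },
  { contact_id: "2", group_id: "2" }, { contact_id: "2", group_id: "3" }
];
const found = contactsInGroup(contacts, links, "3");
console.log(found.map((c) => c.contact_id));
```

Whether this is the most efficient form depends on the data and indexes. If only a few contacts belong to the group, it may be cheaper to start from contacts_groups, match on group_id first and then $lookup into contacts; that is something to measure rather than assume.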

MongoDB Distinct Count issue

I have a collection with the following data (the collection contains more than 10 million records):
> db.LogBuff.find()
{ "_id" : ObjectId("578899d5d2b76f77d083f16c"), "SUBJECT" : "DD", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f16d"), "SUBJECT" : "AA", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f16e"), "SUBJECT" : "BB", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f16f"), "SUBJECT" : "AA", "SYS" : "C" }
{ "_id" : ObjectId("578899d5d2b76f77d083f170"), "SUBJECT" : "BB", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f171"), "SUBJECT" : "BB", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f172"), "SUBJECT" : "CC", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f173"), "SUBJECT" : "AA", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f174"), "SUBJECT" : "CC", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f175"), "SUBJECT" : "DD", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f176"), "SUBJECT" : "AA", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f177"), "SUBJECT" : "BB", "SYS" : "C" }
{ "_id" : ObjectId("578899d5d2b76f77d083f178"), "SUBJECT" : "CC", "SYS" : "D" }
{ "_id" : ObjectId("578899d5d2b76f77d083f179"), "SUBJECT" : "DD", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17a"), "SUBJECT" : "AA", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17b"), "SUBJECT" : "BB", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17c"), "SUBJECT" : "AA", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17d"), "SUBJECT" : "CC", "SYS" : "C" }
I want to get the following kind of output
{ "_id" : { "SUBJECT" : "CC", "SYS" : "C" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "DD", "SYS" : "A" }, "COUNT" : 3 }
{ "_id" : { "SUBJECT" : "AA", "SYS" : "B" }, "COUNT" : 2 }
{ "_id" : { "SUBJECT" : "AA", "SYS" : "C" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "CC", "SYS" : "B" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "BB", "SYS" : "A" }, "COUNT" : 3 }
{ "_id" : { "SUBJECT" : "BB", "SYS" : "C" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "AA", "SYS" : "A" }, "COUNT" : 3 }
{ "_id" : { "SUBJECT" : "CC", "SYS" : "A" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "CC", "SYS" : "D" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "BB", "SYS" : "B" }, "COUNT" : 1 }
This is my code
emit( { SUBJECT : this.SUBJECT, SYS : this.SYS } , this.SYS);
return $count:1 <-stuck here
Due to some limitations I can't use the aggregation method. I tried the following aggregation code:
db.LogBuff.aggregate([ {"$group" : {_id:{SUBJECT:"$SUBJECT",SYS:"$SYS"},COUNT:{$sum:1}}}, {$sort:{_id:1}},])
While this works for a limited number of records, for large amounts it returns this error (note - I am not the root user, therefore I can't change the configuration):
assert: command failed: { "ok" : 0, "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", "code" : 16819 } :
aggregate failed _getErrorWithCode#src/mongo/shell/utils.js:25:13
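Try the allowDiskUse option, which lets the aggregation write temporary files to disk when a stage exceeds the memory limit named in the error: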
db.LogBuff.aggregate([ {"$group" : {_id:{SUBJECT:"$SUBJECT",SYS:"$SYS"},COUNT:{$sum:1}}}, {$sort:{_id:1}}], {allowDiskUse: true})
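If map-reduce is still preferred, as in the attempt in the question, a minimal sketch could look like the following. The names mapFn, reduceFn and runLocally are hypothetical; runLocally only imitates the server's grouping so the two functions can be tried without a database. Note that mapReduce is deprecated in newer MongoDB versions (since 5.0) in favour of the aggregation pipeline.

```javascript
// Map: emit the (SUBJECT, SYS) pair as the key with a count of 1.
function mapFn() {
  emit({ SUBJECT: this.SUBJECT, SYS: this.SYS }, 1);
}

// Reduce: add up the counts emitted for one key.
function reduceFn(key, values) {
  return values.reduce(function (sum, n) { return sum + n; }, 0);
}
// In the mongo shell (the output collection name is a placeholder):
//   db.LogBuff.mapReduce(mapFn, reduceFn, { out: "subject_sys_counts" })

// Local stand-in for the server: groups emitted values by key, then reduces them.
function runLocally(docs) {
  const emitted = new Map();
  globalThis.emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!emitted.has(id)) emitted.set(id, { key, values: [] });
    emitted.get(id).values.push(value);
  };
  docs.forEach((doc) => mapFn.call(doc));
  return [...emitted.values()].map(({ key, values }) => ({
    _id: key,
    value: reduceFn(key, values)
  }));
}

// A few documents from the question's sample.
const logDocs = [
  { SUBJECT: "DD", SYS: "A" },
  { SUBJECT: "AA", SYS: "B" },
  { SUBJECT: "BB", SYS: "A" },
  { SUBJECT: "AA", SYS: "C" },
  { SUBJECT: "BB", SYS: "A" }
];
const counts = runLocally(logDocs);
console.log(JSON.stringify(counts));
```

The reduce function returns the same kind of value that the map function emits (a number), which matters because the server may call reduce more than once for the same key.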

How to count all occurrences of an element in MongoDB?

Let's say I have the following entries in my MongoDB.
{ "_id" : ObjectId("5474af69d4b28042fb63b81b"), "name" : "a", "time" : NumberLong("1412774562000"), "location" : "DE" }
{ "_id" : ObjectId("5474af69d4b28042fb63b81c"), "name" : "b", "time" : NumberLong("1412774562020"), "location" : "DE" }
{ "_id" : ObjectId("5474af69d4b28042fb63b81d"), "name" : "c", "time" : NumberLong("1412774562040"), "location" : "US" }
{ "_id" : ObjectId("5474af69d4b28042fb63b81e"), "name" : "d", "time" : NumberLong("1412774562060"), "location" : "AU" }
{ "_id" : ObjectId("5474af69d4b28042fb63b81f"), "name" : "e", "time" : NumberLong("1412774562080"), "location" : "CN" }
As a result, I need to know how often each specific "location" can be found in the database e.g.
{"DE": "2",
"US": "1",
"AU": "1",
"CN": "1"}
I have no information about all the different locations in the database, so querying for a known location, for example
db.c.find({"location": "DE"})
would not solve my problem.
You need to use the aggregation pipeline: a $group stage counts the documents per location, and a $project stage can reshape the result.
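A minimal sketch of such a pipeline, assuming the collection is called c as in the find() call above. The countByLocation helper is hypothetical; it mirrors the two stages in plain JavaScript so the result shape can be checked without a server.

```javascript
const pipeline = [
  // One output document per distinct location, with the number of occurrences.
  { $group: { _id: "$location", count: { $sum: 1 } } },
  // Rename _id to location for readability.
  { $project: { _id: 0, location: "$_id", count: 1 } }
];
// In the mongo shell: db.c.aggregate(pipeline)

// Plain-JavaScript mirror of the $group and $project stages.
function countByLocation(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.location, (totals.get(doc.location) || 0) + 1);
  }
  return [...totals].map(([location, count]) => ({ location, count }));
}

const entries = [
  { name: "a", location: "DE" },
  { name: "b", location: "DE" },
  { name: "c", location: "US" },
  { name: "d", location: "AU" },
  { name: "e", location: "CN" }
];
const perLocation = countByLocation(entries);
// Fold the rows into a single { location: count } object on the client.
const asObject = Object.fromEntries(perLocation.map((r) => [r.location, r.count]));
console.log(asObject);
```

The pipeline returns one document per location rather than the single object shown in the question; folding the rows into one object on the client, as in the last lines, is one simple way to get that shape.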

Filling in with documents with default values after find/aggregate

I have a collection:
{ "name" : "A", "value" : 1, "date" : ISODate("2014-01-01T00:00:00.000Z") }
{ "name" : "B", "value" : 7, "date" : ISODate("2014-01-01T00:00:00.000Z") }
{ "name" : "A", "value" : 3, "date" : ISODate("2014-01-02T00:00:00.000Z") }
{ "name" : "B", "value" : 8, "date" : ISODate("2014-01-02T00:00:00.000Z") }
{ "name" : "B", "value" : 8, "date" : ISODate("2014-01-03T00:00:00.000Z") }
{ "name" : "A", "value" : 5, "date" : ISODate("2014-01-04T00:00:00.000Z") }
{ "name" : "A", "value" : 4, "date" : ISODate("2014-01-05T00:00:00.000Z") }
The document for A on 3rd Jan 2014 is not available. When I do a find/aggregate on A, I would like the document to appear in my result set with a default value (or better, value to be same as previous date). For example:
{ "name" : "A", "value" : 1, "date" : ISODate("2014-01-01T00:00:00.000Z") }
{ "name" : "A", "value" : 3, "date" : ISODate("2014-01-02T00:00:00.000Z") }
{ "name" : "A", "value" : 3 (or default value -1), "date" : ISODate("2014-01-03T00:00:00.000Z") }
{ "name" : "A", "value" : 5, "date" : ISODate("2014-01-04T00:00:00.000Z") }
{ "name" : "A", "value" : 4, "date" : ISODate("2014-01-05T00:00:00.000Z") }
How can this be done?
One thing you need in order to do this in the aggregation framework is an array of the dates that you want your report to cover. For example, for the input that you show, you might have an array:
days = [ ISODate("2014-01-01T00:00:00Z"), ISODate("2014-01-02T00:00:00Z"),
ISODate("2014-01-03T00:00:00Z"), ISODate("2014-01-04T00:00:00Z"),
ISODate("2014-01-05T00:00:00Z"), ISODate("2014-01-06T00:00:00Z") ];
to indicate that you want every one of these six days represented.
Here is the aggregation that you would run:
db.coll.aggregate( [
{$group : {_id:{name:"$name",date:"$date"},value:{$sum:"$value"}}},
{$group : {_id:"$_id.name", days:{$addToSet:"$_id.date"},docs:{$push:"$$ROOT"}}},
{$project : {missingDays:{$setDifference:[days,"$days"]},docs:1}},
{$unwind : "$missingDays"},
{$unwind : "$docs"},
{$group : {
    _id:"$_id",
    days:{$addToSet:{date:"$docs._id.date",value:"$docs.value"}},
    missingDays:{$addToSet:{date:"$missingDays",value:{$literal:0}}}
} },
{$project : {_id:0, name:"$_id", date:{$setUnion:["$days","$missingDays"]}}},
{$unwind : "$date"},
{$sort : {date:1,name:1}}
] )
On your sample input with days defined as above it outputs:
{ "name" : "A", "date" : { "date" : ISODate("2014-01-01T00:00:00Z"), "value" : 1 } }
{ "name" : "A", "date" : { "date" : ISODate("2014-01-02T00:00:00Z"), "value" : 3 } }
{ "name" : "A", "date" : { "date" : ISODate("2014-01-03T00:00:00Z"), "value" : 0 } }
{ "name" : "A", "date" : { "date" : ISODate("2014-01-04T00:00:00Z"), "value" : 5 } }
{ "name" : "A", "date" : { "date" : ISODate("2014-01-05T00:00:00Z"), "value" : 4 } }
{ "name" : "A", "date" : { "date" : ISODate("2014-01-06T00:00:00Z"), "value" : 0 } }
{ "name" : "B", "date" : { "date" : ISODate("2014-01-01T00:00:00Z"), "value" : 7 } }
{ "name" : "B", "date" : { "date" : ISODate("2014-01-02T00:00:00Z"), "value" : 8 } }
{ "name" : "B", "date" : { "date" : ISODate("2014-01-03T00:00:00Z"), "value" : 8 } }
{ "name" : "B", "date" : { "date" : ISODate("2014-01-04T00:00:00Z"), "value" : 0 } }
{ "name" : "B", "date" : { "date" : ISODate("2014-01-05T00:00:00Z"), "value" : 0 } }
{ "name" : "B", "date" : { "date" : ISODate("2014-01-06T00:00:00Z"), "value" : 0 } }
The first $group stage may not be necessary in your case. It is there in case there are multiple documents for the same name and date, in which case you would want to add their values together. The second $group and the $project stage work out the difference between the days present for each name and the array of days you want covered, creating missingDays, whose entries get the value 0 in the next $group stage. That $group stage creates, for each name, an array of dates that have data and an array of missing dates that don't. It structures them the same way so that the following $project stage can create a union of them using the $setUnion operator. After that, all that's left is to $unwind the array of dates and sort it whichever way you want.
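The question also asks for the previous day's value instead of a fixed default. One simple option, shown here as a client-side sketch and not as part of the answer's pipeline, is to carry the last seen value forward after fetching the documents for one name. The fillForward and day helpers are hypothetical, and plain Date objects stand in for ISODate values.

```javascript
// Carry the last known value forward over the requested days.
// Assumes "days" is sorted ascending and "docs" all belong to the same name.
function fillForward(name, docs, days, defaultValue) {
  const byDay = new Map(docs.map((d) => [d.date.getTime(), d.value]));
  let last = defaultValue; // used until the first real value is seen
  return days.map((date) => {
    if (byDay.has(date.getTime())) last = byDay.get(date.getTime());
    return { name, value: last, date };
  });
}

// Helper to build a UTC midnight date in January 2014.
const day = (n) => new Date(Date.UTC(2014, 0, n));

// The documents for "A" from the question (3 January is missing).
const docsForA = [
  { name: "A", value: 1, date: day(1) },
  { name: "A", value: 3, date: day(2) },
  { name: "A", value: 5, date: day(4) },
  { name: "A", value: 4, date: day(5) }
];
const wantedDays = [1, 2, 3, 4, 5].map(day);
const filled = fillForward("A", docsForA, wantedDays, -1);
console.log(filled.map((d) => d.value)); // [ 1, 3, 3, 5, 4 ]
```

Doing this on the client keeps the server query a plain find on the name, at the cost of handling the gap filling in application code.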

Get all array subdocuments with a conditional in MongoDB

I have a collection with a lot of documents like this:
{
    "Items" : [],
    "Technicians" : [],
    "_id" : ObjectId("537b5ea4c61b1d1743f4341f"),
    "budgets" : [
        {
            "concepts" : [
                { "position" : 0, "description" : "A", "price" : "1", "qty" : "11", "total" : 11 },
                { "position" : 1, "description" : "A", "price" : "2", "qty" : "22", "total" : 44 },
                { "position" : 2, "description" : "A", "price" : "3", "qty" : "33", "total" : 99 },
                { "position" : 3, "description" : "A", "price" : "4", "qty" : "44", "total" : 176 }
            ],
            "date" : "2014-05-21T10:20:48.696Z",
            "id" : 9989,
            "status" : "joooder",
            "total" : 500
        },
        {
            "id" : 260,
            "date" : "2014-05-22T11:12:40.260Z",
            "concepts" : [
                { "position" : 0, "description" : "Nueva", "price" : "1", "qty" : "1", "total" : 1 }
            ],
            "total" : 1,
            "status" : "pending"
        },
        {
            "id" : 111,
            "date" : "2014-05-22T13:36:28.111Z",
            "concepts" : [
                { "position" : 0, "description" : "Blabla", "price" : "1", "qty" : "11", "total" : 11 }
            ],
            "total" : 11,
            "status" : "pending"
        }
    ]
}
Now I want to get only the budgets whose status equals "pending", and I need to search across all documents in the collection.
I tried this:
{ $unwind : "$budgets" },
{ $match : {
    "budgets.status": "pending"
} }
But this isn't ok...
Edit two:
With this:
{ $project : {
    budgets : 1
} },
{ $match : {
    "budgets.status": "pending"
} }
I get this:
{
    "result" : [
        {
            "_id" : ObjectId("537b5ea4c61b1d1743f4341f"),
            "budgets" : [
                {
                    "concepts" : [
                        { "position" : 0, "description" : "A", "price" : "1", "qty" : "11", "total" : 11 },
                        { "position" : 1, "description" : "A", "price" : "2", "qty" : "22", "total" : 44 },
                        { "position" : 2, "description" : "A", "price" : "3", "qty" : "33", "total" : 99 },
                        { "position" : 3, "description" : "A", "price" : "4", "qty" : "44", "total" : 176 }
                    ],
                    "date" : "2014-05-21T10:20:48.696Z",
                    "id" : 9989,
                    "status" : "joooder",
                    "total" : 500
                },
                {
                    "id" : 260,
                    "date" : "2014-05-22T11:12:40.260Z",
                    "concepts" : [
                        { "position" : 0, "description" : "Nueva", "price" : "1", "qty" : "1", "total" : 1 }
                    ],
                    "total" : 1,
                    "status" : "pending"
                },
                {
                    "id" : 111,
                    "date" : "2014-05-22T13:36:28.111Z",
                    "concepts" : [
                        { "position" : 0, "description" : "Blabla", "price" : "1", "qty" : "11", "total" : 11 }
                    ],
                    "total" : 11,
                    "status" : "pending"
                }
            ]
        }
    ],
    "ok" : 1
}
The first status item isn't pending.
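The non-pending budget appears because a $match on "budgets.status" selects whole documents that contain at least one matching element; it does not trim the array itself. One possible fix, assuming MongoDB 3.2 or later (for $filter) and a placeholder collection name collection, is to filter the array inside a $project. The pendingBudgets helper is hypothetical and only mirrors the pipeline in plain JavaScript.

```javascript
const pipeline = [
  // Skip documents that have no pending budget at all.
  { $match: { "budgets.status": "pending" } },
  // Keep only the pending elements of each remaining budgets array.
  { $project: { budgets: { $filter: {
      input: "$budgets",
      as: "budget",
      cond: { $eq: ["$$budget.status", "pending"] }
  } } } }
];
// In the mongo shell: db.collection.aggregate(pipeline)

// Plain-JavaScript mirror of the two stages, for illustration.
function pendingBudgets(docs) {
  return docs
    .filter((doc) => doc.budgets.some((b) => b.status === "pending"))
    .map((doc) => ({
      _id: doc._id,
      budgets: doc.budgets.filter((b) => b.status === "pending")
    }));
}

// Trimmed-down version of the document in the question (concepts omitted).
const budgetDocs = [
  { _id: 1, budgets: [
    { id: 9989, status: "joooder", total: 500 },
    { id: 260, status: "pending", total: 1 },
    { id: 111, status: "pending", total: 11 }
  ] }
];
const pending = pendingBudgets(budgetDocs);
console.log(pending[0].budgets.map((b) => b.id)); // [ 260, 111 ]
```

On versions without $filter, a similar result can be obtained with $unwind on budgets, a $match on "budgets.status", and a $group that pushes the remaining budgets back into an array per _id.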