Aggregation on the basis of the set of nested docs - mongodb

Let's say I have the following 5 docs:
{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }
I want to group the students (with their _id) by the set (combination) of courses they take, and count how many students are in each set.
In the example above there are 3 sets (combinations) of courses, with the following numbers of students:
1 - [ "A", "B" ] <- 2 students take this combination
2 - [ "A", "B", "C" ] <- 2 students
3 - [ "A", "B", "D" ] <- 1 student
I feel like this is more of a MapReduce task than an aggregation... not sure...
UPDATE 1
Thanks a lot to @ExplosionPills.
So the following aggregation command:
db.students.aggregate([{
$group: {
_id: "$courses",
count: {$sum: 1},
students: {$push: "$_id"}
}
}])
gives me the following output:
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }
It groups by the set of courses, counts the number of students belonging to each set, and collects their _ids.
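To make the grouping semantics concrete, here is a small in-memory JavaScript sketch (plain Node.js, no database) of what this $group stage does: documents whose courses arrays are equal element by element, in the same order, land in the same group. The helper name groupByCourses is hypothetical, and using JSON.stringify as the group key is an assumption standing in for MongoDB's own array comparison.

```javascript
// In-memory sketch of:
//   {$group: {_id: "$courses", count: {$sum: 1}, students: {$push: "$_id"}}}
// Assumption: JSON.stringify(courses) serves as the group key, so two arrays
// only match when they hold the same elements in the same order.
function groupByCourses(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = JSON.stringify(doc.courses);
    if (!groups.has(key)) {
      groups.set(key, { _id: doc.courses, count: 0, students: [] });
    }
    const group = groups.get(key);
    group.count += 1;             // {$sum: 1}
    group.students.push(doc._id); // {$push: "$_id"}
  }
  return [...groups.values()];
}

const studentDocs = [
  { _id: "1", student: "Oscar", courses: ["A", "B"] },
  { _id: "2", student: "Alan", courses: ["A", "B", "C"] },
  { _id: "3", student: "Kate", courses: ["A", "B", "D"] },
  { _id: "4", student: "John", courses: ["A", "B", "C"] },
  { _id: "5", student: "Bema", courses: ["A", "B"] },
];

console.log(groupByCourses(studentDocs));
```

Unlike MongoDB, which returns groups in no guaranteed order, this sketch keeps the order in which each group was first seen.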
UPDATE 2
I found out, the aggregation above treats combination [ "C", "A", "B" ] as different from [ "A", "B", "C" ]. But I need these 2 count as same.
So let's look at the following documents:
{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }
{ "_id" : "6", "student" : "Alex", "courses" : [ "C", "A", "B" ] }
Here is the output:
{ "_id" : [ "C", "A", "B" ], "count" : 1, "students" : [ "6" ] }
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }
Look at lines 1 and 3 of the output: this is not what I wanted.
So, to treat [ "C", "A", "B" ] and [ "A", "B", "C" ] as the same combination, I changed the aggregation as follows:
db.students.aggregate([
{$unwind: "$courses" },
{$sort : {"courses": 1}},
{$group: {_id: "$_id", courses: {$push: "$courses"}}},
{$group: {_id: "$courses", count: {$sum:1}, students: {$push: "$_id"}}}
])
Output:
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "5", "1" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 3, "students" : [ "6", "4", "2" ] }
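The same idea in application code: a minimal JavaScript sketch that, like the $unwind/$sort/$group pipeline above, canonicalizes each courses array by sorting a copy of it before grouping, so [ "C", "A", "B" ] and [ "A", "B", "C" ] share a key. The function name groupByCourseSet is hypothetical, and it runs in plain Node.js without MongoDB. (On MongoDB 5.2 or later, the $sortArray expression may let you sort the array directly inside the $group key, but that variant is not tested here.)

```javascript
// Group documents by their *set* of courses, ignoring order.
// Sorting a copy (slice) leaves the original documents untouched.
function groupByCourseSet(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const sorted = doc.courses.slice().sort(); // canonical order, like $unwind + $sort + $push
    const key = JSON.stringify(sorted);
    if (!groups.has(key)) {
      groups.set(key, { _id: sorted, count: 0, students: [] });
    }
    const group = groups.get(key);
    group.count += 1;
    group.students.push(doc._id);
  }
  return [...groups.values()];
}

const courseDocs = [
  { _id: "1", student: "Oscar", courses: ["A", "B"] },
  { _id: "2", student: "Alan", courses: ["A", "B", "C"] },
  { _id: "3", student: "Kate", courses: ["A", "B", "D"] },
  { _id: "4", student: "John", courses: ["A", "B", "C"] },
  { _id: "5", student: "Bema", courses: ["A", "B"] },
  { _id: "6", student: "Alex", courses: ["C", "A", "B"] },
];

console.log(groupByCourseSet(courseDocs));
```

Note that Array.prototype.sort compares elements as strings by default, which suits course codes like these but not numeric values.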

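This can be done with an aggregation that uses $group:
db.students.aggregate([{
$group: {
// The group key: documents with identical courses arrays
// (same elements in the same order) end up in the same group.
// The "$" prefix refers to the value of the courses field.
_id: "$courses",
// Add 1 for each document in the group (effectively a counter)
count: {$sum: 1}
}
}]);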
EDIT:
If the courses can be in any order, you can $unwind, $sort, and $group again as suggested in the edited question. It's also possible to do this via mapReduce, but I'm not sure which is faster.
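db.students.mapReduce(
  function () {
    // Use a sorted copy of the courses as the key, so that
    // [ "C", "A", "B" ] and [ "A", "B", "C" ] map to the same key
    emit(this.courses.slice().sort(), {students: [this._id], count: 1});
  },
  function (key, values) {
    // reduce may be called again on its own output, so it must
    // return the same shape that map emits
    var result = {students: [], count: 0};
    values.forEach(function (v) {
      result.students = result.students.concat(v.students);
      result.count += v.count;
    });
    return result;
  },
  {out: {inline: 1}}
)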
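Why the reduce function has to return the same shape as the emitted values: MongoDB may call reduce several times for one key, feeding earlier reduce results back in, and it does not call reduce at all for a key with a single value. The plain JavaScript simulation below is a hypothetical stand-in for that behaviour; the names simulateMapReduce, mapFn and reduceFn, and the split of the documents into two batches, are assumptions for illustration, not how MongoDB actually partitions its work. (Also note that mapReduce is deprecated starting in MongoDB 5.0, so the aggregation pipeline is the safer choice on current versions.)

```javascript
// Sketch only: a local stand-in for MongoDB's mapReduce, not the real engine.
// mapFn emits {students, count}, and reduceFn returns that same shape, so a
// reduce result can safely be fed back into reduce (a "re-reduce").
function mapFn(doc, emit) {
  // Keys are stringified only because JavaScript Maps compare arrays by reference
  emit(JSON.stringify(doc.courses.slice().sort()), { students: [doc._id], count: 1 });
}

function reduceFn(key, values) {
  const result = { students: [], count: 0 };
  for (const v of values) {
    result.students = result.students.concat(v.students);
    result.count += v.count;
  }
  return result;
}

// Hypothetical driver: reduces each batch on its own, then re-reduces the
// partial results per key. Keys with a single value skip reduce entirely.
function simulateMapReduce(batches) {
  const partials = new Map();
  for (const batch of batches) {
    const emitted = new Map();
    for (const doc of batch) {
      mapFn(doc, (key, value) => {
        if (!emitted.has(key)) emitted.set(key, []);
        emitted.get(key).push(value);
      });
    }
    for (const [key, values] of emitted) {
      const out = values.length === 1 ? values[0] : reduceFn(key, values);
      if (!partials.has(key)) partials.set(key, []);
      partials.get(key).push(out);
    }
  }
  const results = {};
  for (const [key, values] of partials) {
    results[key] = values.length === 1 ? values[0] : reduceFn(key, values);
  }
  return results;
}

const batch1 = [
  { _id: "1", courses: ["A", "B"] },
  { _id: "2", courses: ["A", "B", "C"] },
  { _id: "3", courses: ["A", "B", "D"] },
];
const batch2 = [
  { _id: "4", courses: ["A", "B", "C"] },
  { _id: "5", courses: ["A", "B"] },
  { _id: "6", courses: ["C", "A", "B"] },
];

console.log(simulateMapReduce([batch1, batch2]));
```

A reduce that returns {students, count} while map emits a bare _id breaks in both places this simulation exercises: the re-reduce step receives objects where it expects ids, and a single-value key comes back as a bare _id instead of {students, count}.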


Group using the value from two possible fields

Let's say I have a match collection in the following format:
{user1: "a", user2: "b"},
{user1: "a", user2: "c"},
{user1: "b", user2: "d"},
{user1: "b", user2: "c"},
{user1: "b", user2: "e"},
{user1: "c", user2: "f"}
I would like to know which user appears most often (in either user1 or user2). The result should be in this format, ordered by the number of occurrences:
{"user": "b", count:4},
{"user": "c", count:3},
{"user": "a", count:2},
{"user": "d", count:1},
{"user": "f", count:1},
{"user": "e", count:1}
Is there a way I can group on the value of two fields?
Something like match.aggregate({$group: {_id: {$or: ["user1", "user2"]}, count: {$sum: 1}}})
db.match.aggregate([
{$project: { user: [ "$user1", "$user2" ]}},
{$unwind: "$user"},
{$group: {_id: "$user", count: {$sum:1}}},
// Order by number of occurrences, highest first, as the question asks
{$sort: {count: -1}}
])
The first stage projects each document into an array of users:
{user: ["a", "b"]},
{user: ["a", "c"]},
{user: ["b", "d"]},
...
Next, we unwind the arrays:
{user:"a"},
{user:"b"},
{user:"a"},
{user:"c"},
{user:"b"},
...
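And finally, a simple $group counts the occurrences of each user.
Another approach is to $map over a two-element array, taking user1 for the first element and user2 for the second, and work from there: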
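An in-memory JavaScript sketch of the same three stages, plus the final ordering the question asks for (in a pipeline, a $sort: {count: -1} stage provides that). The function name countUsers is hypothetical, and it runs in plain Node.js without MongoDB.

```javascript
// Count how often each user appears in either user1 or user2.
function countUsers(matches) {
  const counts = new Map();
  for (const m of matches) {
    // $project: {user: ["$user1", "$user2"]} followed by $unwind: "$user"
    for (const user of [m.user1, m.user2]) {
      counts.set(user, (counts.get(user) || 0) + 1); // $group with {$sum: 1}
    }
  }
  // Reshape to {user, count} and order by count, highest first
  return [...counts]
    .map(([user, count]) => ({ user, count }))
    .sort((x, y) => y.count - x.count);
}

const matchDocs = [
  { user1: "a", user2: "b" },
  { user1: "a", user2: "c" },
  { user1: "b", user2: "d" },
  { user1: "b", user2: "c" },
  { user1: "b", user2: "e" },
  { user1: "c", user2: "f" },
];

console.log(countUsers(matchDocs));
```

Users with equal counts (d, e and f here) may come out in a different relative order than in the question, both in this sketch and in MongoDB, unless you add a secondary sort key.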
db.collection.aggregate([
{ "$project": {
"_id": 0,
"user": { "$map": {
"input": ["A","B"],
"as": "el",
"in": {
"$cond": {
"if": { "$eq": [ "$$el", "A" ] },
"then": "$user1",
"else": "$user2"
}
}
}}
}},
{ "$unwind": "$user" },
{ "$group": {
"_id": "$user",
"count": { "$sum": 1 }
}}
])
Let's take an example and walk through it:
db.users_data.find();
{
"_id" : 1,
"user1" : "a",
"user2" : "aa",
"status" : "NEW",
"createdDate" : ISODate("2016-05-03T08:52:32.434Z")
},
{
"_id" : 2,
"user1" : "a",
"user2" : "ab",
"status" : "NEW",
"createdDate" : ISODate("2016-05-03T09:52:32.434Z")
},
{
"_id" : 3,
"user1" : "b",
"user2" : "aa",
"status" : "NEW",
"createdDate" : ISODate("2016-05-03T10:52:32.434Z")
},
{
"_id" : 4,
"user1" : "b",
"user2" : "ab",
"status" : "NEW",
"createdDate" : ISODate("2016-05-03T10:52:32.434Z")
},
{
"_id" : 5,
"user1" : "a",
"user2" : "aa",
"status" : "OLD",
"createdDate" : ISODate("2015-05-03T08:52:32.434Z")
},
{
"_id" : 6,
"user1" : "a",
"user2" : "ab",
"status" : "OLD",
"createdDate" : ISODate("2015-05-03T08:52:32.434Z")
}
Then
db.users_data.aggregate([
{"$group" : {_id:{user1:"$user1",user2:"$user2"}, count:{$sum:1}}}
])
will give these results:
{ "_id" : { "user1" : "a", "user2" : "aa" }, "count" : 2}
{ "_id" : { "user1" : "a", "user2" : "ab" }, "count" : 2}
{ "_id" : { "user1" : "b", "user2" : "aa" }, "count" : 1}
{ "_id" : { "user1" : "b", "user2" : "ab" }, "count" : 1}
Thus, grouping by multiple fields (using a compound _id) is possible.
Now one more variation, also grouping by status:
db.users_data.aggregate([
{"$group" : {_id:{user1:"$user1",user2:"$user2",status:"$status"}, count:{$sum:1}}}
])
will give these results:
{ "_id" : { "user1" : "a", "user2" : "aa","status":"NEW" }, "count" : 1}
{ "_id" : { "user1" : "a", "user2" : "ab","status":"NEW" }, "count" : 1}
{ "_id" : { "user1" : "b", "user2" : "aa","status":"NEW" }, "count" : 1}
{ "_id" : { "user1" : "b", "user2" : "ab","status":"NEW" }, "count" : 1}
{ "_id" : { "user1" : "a", "user2" : "aa","status":"OLD" }, "count" : 1}
{ "_id" : { "user1" : "a", "user2" : "ab","status":"OLD" }, "count" : 1}
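Grouping on a compound _id behaves like grouping on a tuple of field values. A minimal JavaScript sketch of both variations above, assuming the documents are already loaded into an array (the helper name groupByFields and the stringified key are hypothetical; createdDate is omitted because it does not affect these groupings):

```javascript
// Group by several fields at once, like {_id: {user1: "$user1", user2: "$user2"}}
function groupByFields(docs, fields) {
  const groups = new Map();
  for (const doc of docs) {
    const id = {};
    for (const f of fields) id[f] = doc[f];
    const key = JSON.stringify(id); // same field values -> same key
    if (!groups.has(key)) groups.set(key, { _id: id, count: 0 });
    groups.get(key).count += 1;     // {$sum: 1}
  }
  return [...groups.values()];
}

const usersData = [
  { _id: 1, user1: "a", user2: "aa", status: "NEW" },
  { _id: 2, user1: "a", user2: "ab", status: "NEW" },
  { _id: 3, user1: "b", user2: "aa", status: "NEW" },
  { _id: 4, user1: "b", user2: "ab", status: "NEW" },
  { _id: 5, user1: "a", user2: "aa", status: "OLD" },
  { _id: 6, user1: "a", user2: "ab", status: "OLD" },
];

console.log(groupByFields(usersData, ["user1", "user2"]));
console.log(groupByFields(usersData, ["user1", "user2", "status"]));
```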

How to aggregate 2 lists if at least one element matches?

For example, I have 6 items in a collection:
{ _id: 1, list: ["A", "B"] }
{ _id: 2, list: ["C", "A"] }
{ _id: 3, list: ["E", "F"] }
{ _id: 4, list: ["E", "D"] }
{ _id: 5, list: ["U", "I"] }
{ _id: 6, list: ["D", "K"] }
I would like a query that merges all the items whose lists have at least 1 matching element. So the result would be:
{ _id: 7, list: ["A", "B", "C"] }
{ _id: 8, list: ["E", "F", "D", "K"] }
I'm new to MongoDB, so can anyone help me with this query? Thanks a lot.
I found this solution which almost solves your problem.
db.lists.aggregate([
{$unwind:"$list"},
{$group:{_id:"$list", merged:{$addToSet:"$_id"}, size:{$sum:1}}},
{$match:{size: {$gt: 1}}},
{$project:{_id: 1, merged:1, size: 1, merged1: "$merged"}},
{$unwind:"$merged"},
{$unwind:"$merged1"},
{$group:{_id:"$merged", letter:{$first:"$_id"}, size:{$sum: 1}, set: {$addToSet:"$merged1"}}},
{$sort:{size:1}},
{$group:{_id: "$letter", matchedIds:{$last:"$set"}, size:{$sum:1}}},
{$match: {size:{$gt:1}}}
])
I tested this in my mongo shell, and it gives the following output:
{ "_id" : "E", "matchedIds" : [ 6, 3, 4 ], "size" : 2 }
{ "_id" : "A", "matchedIds" : [ 1, 2 ], "size" : 2 }
matchedIds contains the _ids of the docs that have a common value in their list arrays.
I think the above aggregation could be optimized, but this is what I found initially; I will try to find other ways. In addition, you can use a $lookup stage at the end of the aggregation pipeline to match the ids with the set values. I couldn't test this because my MongoDB version doesn't support $lookup. Alternatively, you can fetch those values manually in a loop if you use Node.js or something similar.
Edited
This algorithm only works if the number of intersecting lists in each group is no more than 3.
For example this will work:
{ "_id" : 1, "list" : [ "A", "B" ] }
{ "_id" : 2, "list" : [ "C", "A" ] }
{ "_id" : 3, "list" : [ "E", "F" ] }
{ "_id" : 4, "list" : [ "E", "D" ] }
{ "_id" : 5, "list" : [ "U", "I" ] }
{ "_id" : 6, "list" : [ "D", "K" ] }
{ "_id" : 7, "list" : [ "A", "L" ] }
but this will not:
{ "_id" : 1, "list" : [ "A", "B" ] }
{ "_id" : 2, "list" : [ "C", "A" ] }
{ "_id" : 3, "list" : [ "E", "F" ] }
{ "_id" : 4, "list" : [ "E", "D" ] }
{ "_id" : 5, "list" : [ "U", "I" ] }
{ "_id" : 6, "list" : [ "D", "K" ] }
{ "_id" : 7, "list" : [ "L", "K" ] }
Here the lists with ids 7, 6, 4 and 3 are connected through intersections, so the number of intersecting lists is 4, and in this case the provided algorithm does not work. It works only if fewer than 4 lists are connected through intersections.
Final notice
It seems you can't achieve your desired result by doing the merge computation in the MongoDB layer. If you are building an application, it is better to do this computation in the application layer.
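Following that advice, here is a sketch of the merge done in application code (plain JavaScript, e.g. in Node.js after fetching the documents). It uses a union-find (disjoint-set) structure, a technique swapped in here rather than taken from the answer above: each document is unioned with the first document seen for each of its elements, so chains like 3-4-6-7 collapse into one group regardless of their length. The function name mergeOverlappingLists is hypothetical, and the output keeps the original _ids rather than generating new ones like 7 and 8 in the question.

```javascript
// Merge documents whose lists share at least one element (transitively).
function mergeOverlappingLists(docs) {
  const parent = new Map(); // doc _id -> parent _id in the disjoint-set forest
  const find = (x) => {
    while (parent.get(x) !== x) {
      parent.set(x, parent.get(parent.get(x))); // path halving
      x = parent.get(x);
    }
    return x;
  };
  const union = (a, b) => { parent.set(find(a), find(b)); };

  const owner = new Map(); // element -> first doc _id that contained it
  for (const doc of docs) {
    parent.set(doc._id, doc._id);
    for (const el of doc.list) {
      if (owner.has(el)) union(doc._id, owner.get(el));
      else owner.set(el, doc._id);
    }
  }

  // Collect the merged ids and the union of their elements per root
  const groups = new Map();
  for (const doc of docs) {
    const root = find(doc._id);
    if (!groups.has(root)) groups.set(root, { ids: [], list: new Set() });
    const g = groups.get(root);
    g.ids.push(doc._id);
    doc.list.forEach((el) => g.list.add(el));
  }
  return [...groups.values()].map((g) => ({ ids: g.ids, list: [...g.list] }));
}

const lists6 = [
  { _id: 1, list: ["A", "B"] },
  { _id: 2, list: ["C", "A"] },
  { _id: 3, list: ["E", "F"] },
  { _id: 4, list: ["E", "D"] },
  { _id: 5, list: ["U", "I"] },
  { _id: 6, list: ["D", "K"] },
];

console.log(mergeOverlappingLists(lists6));
```

Document 5 shares nothing with the others, so it comes back as a group of its own; append .filter((g) => g.ids.length > 1) if you only want the merged groups, as in the question's expected result.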

MongoDB group query for embedded docs

Is it possible to query value counts for a particular key inside embedded documents?
Here are my documents:
{ "_id" : 1, "drives" : [ {"fw": "A"}, {"fw": "B"} ] }
{ "_id" : 2, "drives" : [ {"fw": "B"}, {"fw": "C"} ] }
{ "_id" : 3, "drives" : [ {"fw": "A"}, {"fw": "C"} ] }
{ "_id" : 4, "drives" : [ {"fw": "A"}, {"fw": "D"} ] }
And I would like to get the count of each "fw" value:
Output:
counts : {"A": 3, "B": 2, "C": 2, "D": 1 }
Aggregation will do it. First $unwind the array to get one document per drive, then $group by the value of fw along with a $sum of 1 to get the count per fw value:
db.test.aggregate([
{ $unwind: "$drives" },
{ $group: { _id: "$drives.fw", cnt: {$sum: 1} } }
])
# { "_id" : "D", "cnt" : 1 }
# { "_id" : "C", "cnt" : 2 }
# { "_id" : "B", "cnt" : 2 }
# { "_id" : "A", "cnt" : 3 }
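For comparison, the same $unwind + $group logic as a plain JavaScript sketch that produces the { "A": 3, ... } object shape requested in the question. The function name countFirmware is hypothetical; it runs in Node.js without MongoDB.

```javascript
// Count fw values across all embedded drive documents.
function countFirmware(docs) {
  const counts = {};
  for (const doc of docs) {
    for (const drive of doc.drives) {                 // $unwind: "$drives"
      counts[drive.fw] = (counts[drive.fw] || 0) + 1; // $group on "$drives.fw" with {$sum: 1}
    }
  }
  return counts;
}

const drivesDocs = [
  { _id: 1, drives: [{ fw: "A" }, { fw: "B" }] },
  { _id: 2, drives: [{ fw: "B" }, { fw: "C" }] },
  { _id: 3, drives: [{ fw: "A" }, { fw: "C" }] },
  { _id: 4, drives: [{ fw: "A" }, { fw: "D" }] },
];

console.log(countFirmware(drivesDocs)); // { A: 3, B: 2, C: 2, D: 1 }
```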

Is there a `$slice` like comparison for MongoDB's filters?

In MongoDB there is a projection operator $slice which allows projecting a subarray.
Is there any way to filter by an array slice as well? Something like:
db.testdb.find( {arrayofstring: { $eqSlice: {$slice: [0,1], $val: [ "a" ] } } }, {...})
Edit: An example and its expected output
> db.studentsTestDataTypes.find({},{ _id: 1, int: 1, arraystring: 1})
{ "_id" : ObjectId("56977186756088b586154f9d"), "int" : 2001, "arraystring" : [ "a", "b", "c" ] }
{ "_id" : ObjectId("56977186756088b586154f9e"), "int" : 2002, "arraystring" : [ "d", "e", "f" ] }
Example of expected result: Filtering by those entries with value "a" at the first position of arraystring:
{ "_id" : ObjectId("56977186756088b586154f9d"), "int" : 2001, "arraystring" : [ "a", "b", "c" ] }
Suppose you have the following documents in your collection:
{ "_id" : ObjectId("56977186756088b586154f9d"), "int" : 2001, "arraystring" : [ "a", "b", "c" ] }
{ "_id" : ObjectId("56977186756088b586154f9e"), "int" : 2002, "arraystring" : [ "d", "e", "f" ] }
{ "_id" : ObjectId("56978e21ae9bb55c0d7cdc67"), "int" : 2001, "arraystring" : [ "b", "a", "c" ] }
The easiest way is to use dot notation with an array index (0 is the first position):
db.collection.find({ "arraystring.0": "a" } )
Which yields:
{
"_id" : ObjectId("56977186756088b586154f9d"),
"int" : 2001,
"arraystring" : [
"a",
"b",
"c"
]
}
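Dot notation also covers a multi-element "slice" comparison: one condition per index. Below is a small JavaScript sketch of a hypothetical helper, prefixFilter, that builds such a filter for a prefix of the array, plus a simplified in-memory check (startsWith, also hypothetical) of which arrays that filter is meant to match. The in-memory check uses strict equality per position, so it only mirrors the query for scalar elements.

```javascript
// Build a filter such as {"arraystring.0": "a", "arraystring.1": "b"}
// that matches documents whose array starts with the given values.
function prefixFilter(field, values) {
  const filter = {};
  values.forEach((v, i) => {
    filter[`${field}.${i}`] = v;
  });
  return filter;
}

// Simplified in-memory equivalent, for scalar elements only
function startsWith(arr, values) {
  return values.every((v, i) => arr[i] === v);
}

const filter = prefixFilter("arraystring", ["a", "b"]);
console.log(filter); // { 'arraystring.0': 'a', 'arraystring.1': 'b' }
// In the mongo shell, the same object can be passed to db.collection.find(...)
```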

Count and group by with mongo db

I'm currently facing a problem with MongoDB.
I need to display some statistics:
- A treatment is a record that contains a date, the user who handled it, and a list of anomalies
Can you help me with the query to get:
"The number of anomalies per user"?
Thanks for all :D
db.treatment.aggregate(
{
$group : {_id : "$anomalies", totalUser : { $sum : 1 }}
}
);
Note: change the collection and field names if I got them wrong.
Source : http://www.mkyong.com/mongodb/mongodb-aggregate-and-group-example/
So, if your collection had the following documents:
> db.treatments.find()
{ "_id" : 1, "date" : ISODate("2014-08-29T15:44:45.843Z"), "user" : "A", "anomalies" : [ "a", "b", "c" ] }
{ "_id" : 2, "date" : ISODate("2014-08-29T15:45:01.782Z"), "user" : "A", "anomalies" : [ "e", "f", "g" ] }
{ "_id" : 3, "date" : ISODate("2014-08-29T15:45:34.889Z"), "user" : "B", "anomalies" : [ "a", "b", "c", "e", "f", "g" ] }
{ "_id" : 4, "date" : ISODate("2014-08-29T15:48:01.860Z"), "user" : "B", "anomalies" : [ "a", "b", "c", "e", "f", "g" ] }
{ "_id" : 5, "date" : ISODate("2014-08-29T15:48:28.937Z"), "user" : "A", "anomalies" : [ "x", "y", "z" ] }
You can use a $group stage to $sum the $size of the anomalies array:
> db.treatments.aggregate([ { $group: { _id: "$user", allAnomalies: { $sum: { $size: "$anomalies" } } } } ] )
{ "_id" : "B", "allAnomalies" : 12 }
{ "_id" : "A", "allAnomalies" : 9 }
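The same per-user total as a plain JavaScript sketch, handy for checking the pipeline's output against a few fetched documents. The function name anomaliesPerUser is hypothetical, and the dates are omitted because they do not affect the totals. One deliberate difference, which is an assumption you may not want: $size raises an error when anomalies is missing or not an array, while this sketch counts such a document as 0.

```javascript
// Sum the number of anomaly entries per user.
function anomaliesPerUser(treatments) {
  const totals = {};
  for (const t of treatments) {
    // Like $size, but tolerant of a missing or non-array anomalies field
    const n = Array.isArray(t.anomalies) ? t.anomalies.length : 0;
    totals[t.user] = (totals[t.user] || 0) + n; // $sum per $group on "$user"
  }
  return totals;
}

const treatmentDocs = [
  { _id: 1, user: "A", anomalies: ["a", "b", "c"] },
  { _id: 2, user: "A", anomalies: ["e", "f", "g"] },
  { _id: 3, user: "B", anomalies: ["a", "b", "c", "e", "f", "g"] },
  { _id: 4, user: "B", anomalies: ["a", "b", "c", "e", "f", "g"] },
  { _id: 5, user: "A", anomalies: ["x", "y", "z"] },
];

console.log(anomaliesPerUser(treatmentDocs)); // { A: 9, B: 12 }
```

Like the pipeline, this counts anomaly entries, not distinct anomalies: user B reports the same six anomalies twice and gets 12.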