Count and group by with MongoDB

I'm currently facing a problem with MongoDB.
I need to display some statistics:
- A treatment is a record that contains a date, the user who performed the treatment, and a list of anomalies.
Can you help me with the query to get:
"The number of anomalies per user"?
Thanks in advance :D

db.treatment.aggregate(
{
$group : {_id : "$anomalies", totalUser : { $sum : 1 }}
}
);
Note: change the collection and field names if I got them wrong.
Source : http://www.mkyong.com/mongodb/mongodb-aggregate-and-group-example/

So, if your collection had the following documents:
> db.treatments.find()
{ "_id" : 1, "date" : ISODate("2014-08-29T15:44:45.843Z"), "user" : "A", "anomalies" : [ "a", "b", "c" ] }
{ "_id" : 2, "date" : ISODate("2014-08-29T15:45:01.782Z"), "user" : "A", "anomalies" : [ "e", "f", "g" ] }
{ "_id" : 3, "date" : ISODate("2014-08-29T15:45:34.889Z"), "user" : "B", "anomalies" : [ "a", "b", "c", "e", "f", "g" ] }
{ "_id" : 4, "date" : ISODate("2014-08-29T15:48:01.860Z"), "user" : "B", "anomalies" : [ "a", "b", "c", "e", "f", "g" ] }
{ "_id" : 5, "date" : ISODate("2014-08-29T15:48:28.937Z"), "user" : "A", "anomalies" : [ "x", "y", "z" ] }
You can use a $group stage to $sum the $size of each user's anomalies arrays:
> db.treatments.aggregate([ { $group: { _id: "$user", allAnomalies: { $sum: { $size: "$anomalies" } } } } ] )
{ "_id" : "B", "allAnomalies" : 12 }
{ "_id" : "A", "allAnomalies" : 9 }
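To make it easier to see what that $group stage computes, here is a minimal plain-JavaScript sketch that mimics it on an in-memory copy of the sample documents. It does not talk to MongoDB; the names `treatments`, `anomaliesPerUser` and `perUser` are made up for this illustration, and the dates are left out because they play no part in the count.

```javascript
// In-memory copy of the sample documents (dates omitted for brevity).
const treatments = [
  { _id: 1, user: "A", anomalies: ["a", "b", "c"] },
  { _id: 2, user: "A", anomalies: ["e", "f", "g"] },
  { _id: 3, user: "B", anomalies: ["a", "b", "c", "e", "f", "g"] },
  { _id: 4, user: "B", anomalies: ["a", "b", "c", "e", "f", "g"] },
  { _id: 5, user: "A", anomalies: ["x", "y", "z"] },
];

// Mimics: { $group: { _id: "$user", allAnomalies: { $sum: { $size: "$anomalies" } } } }
// (anomaliesPerUser is a hypothetical helper, not a MongoDB API)
function anomaliesPerUser(docs) {
  const totals = new Map();
  for (const doc of docs) {
    // $size of the array, added to the running $sum for this user
    totals.set(doc.user, (totals.get(doc.user) || 0) + doc.anomalies.length);
  }
  return [...totals].map(([user, allAnomalies]) => ({ _id: user, allAnomalies }));
}

const perUser = anomaliesPerUser(treatments);
console.log(perUser);
```

Note that this counts every array entry, so an anomaly that appears in two treatments of the same user is counted twice, just as in the pipeline above.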

Related

Group using the value from two possible fields

Let's say I have a match collection in the following format
{user1: "a", user2: "b"},
{user1: "a", user2: "c"},
{user1: "b", user2: "d"},
{user1: "b", user2: "c"},
{user1: "b", user2: "e"},
{user1: "c", user2: "f"}
I would like to know which user appears most often (either in user1 or user2). The result should be in this format, ordered by the number of occurrences.
{"user": "b", count:4},
{"user": "c", count:3},
{"user": "a", count:2},
{"user": "d", count:1},
{"user": "f", count:1},
{"user": "e", count:1}
Is there a way I can group on the value of two fields?
Something like match.aggregate({$group: {_id: {$or: ["user1", "user2"]}, count: {$sum: 1}}})
db.match.aggregate([
{$project: { user: [ "$user1", "$user2" ]}},
{$unwind: "$user"},
{$group: {_id: "$user", count: {$sum:1}}}
])
The first stage projects each document into an array of users:
{user: ["a", "b"]},
{user: ["a", "c"]},
{user: ["b", "d"]},
...
Next we unwind the arrays:
{user:"a"},
{user:"b"},
{user:"a"},
{user:"c"},
{user:"b"},
...
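And finally a simple grouping by user. To get the ordering asked for in the question, you can append a {$sort: {count: -1}} stage.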
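The project, unwind and group steps described above can be mimicked in plain JavaScript, which may help when checking the expected counts by hand. This is only an in-memory sketch under the assumption that every document has both user1 and user2; `matches` and `countAppearances` are names invented for the example, and the final sort is added here to get the ordering the question asks for.

```javascript
// In-memory copy of the sample "match" collection.
const matches = [
  { user1: "a", user2: "b" },
  { user1: "a", user2: "c" },
  { user1: "b", user2: "d" },
  { user1: "b", user2: "c" },
  { user1: "b", user2: "e" },
  { user1: "c", user2: "f" },
];

// countAppearances is a hypothetical helper, not a MongoDB API.
function countAppearances(docs) {
  // $project + $unwind: one entry per user mention
  const users = docs.flatMap((doc) => [doc.user1, doc.user2]);

  // $group with { $sum: 1 }
  const counts = new Map();
  for (const user of users) {
    counts.set(user, (counts.get(user) || 0) + 1);
  }

  // Equivalent of a trailing { $sort: { count: -1 } } stage
  return [...counts]
    .map(([user, count]) => ({ user, count }))
    .sort((x, y) => y.count - x.count);
}

const appearances = countAppearances(matches);
console.log(appearances);
```

Users with the same count (d, e and f here) have no guaranteed relative order in the real pipeline unless you add a second sort key.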
Basically the concept is to $map onto an array and work from there:
db.collection.aggregate([
{ "$project": {
"_id": 0,
"user": { "$map": {
"input": ["A","B"],
"as": "el",
"in": {
"$cond": {
"if": { "$eq": [ "$$el", "A" ] },
"then": "$user1",
"else": "$user2"
}
}
}}
}},
{ "$unwind": "$user" },
{ "$group": {
"_id": "$user",
"count": { "$sum": 1 }
}}
])
Let us walk through an example:
db.users_data.find();
{
"_id" : 1,
"user1" : "a",
"user2" : "aa",
"status" : "NEW",
"createdDate" : ISODate("2016-05-03T08:52:32.434Z")
},
{
"_id" : 2,
"user1" : "a",
"user2" : "ab",
"status" : "NEW",
"createdDate" : ISODate("2016-05-03T09:52:32.434Z")
},
{
"_id" : 3,
"user1" : "b",
"user2" : "aa",
"status" : "NEW",
"createdDate" : ISODate("2016-05-03T10:52:32.434Z")
},
{
"_id" : 4,
"user1" : "b",
"user2" : "ab",
"status" : "NEW",
"createdDate" : ISODate("2016-05-03T10:52:32.434Z")
},
{
"_id" : 5,
"user1" : "a",
"user2" : "aa",
"status" : "OLD",
"createdDate" : ISODate("2015-05-03T08:52:32.434Z")
},
{
"_id" : 6,
"user1" : "a",
"user2" : "ab",
"status" : "OLD",
"createdDate" : ISODate("2015-05-03T08:52:32.434Z")
},
Then
db.users_data.aggregate([
{"$group" : {_id:{user1:"$user1",user2:"$user2"}, count:{$sum:1}}}
])
will give the results as
{ "_id" : { "user1" : "a", "user2" : "aa" }, "count" : 2}
{ "_id" : { "user1" : "a", "user2" : "ab" }, "count" : 2}
{ "_id" : { "user1" : "b", "user2" : "aa" }, "count" : 1}
{ "_id" : { "user1" : "b", "user2" : "ab" }, "count" : 1}
Thus grouping by multiple fields is possible.
Now one more variation:
db.users_data.aggregate([
{"$group" : {_id:{user1:"$user1",user2:"$user2",status:"$status"}, count:{$sum:1}}}
])
will give the results as
{ "_id" : { "user1" : "a", "user2" : "aa","status":"NEW" }, "count" : 1}
{ "_id" : { "user1" : "a", "user2" : "ab","status":"NEW" }, "count" : 1}
{ "_id" : { "user1" : "b", "user2" : "aa","status":"NEW" }, "count" : 1}
{ "_id" : { "user1" : "b", "user2" : "ab","status":"NEW" }, "count" : 1}
{ "_id" : { "user1" : "a", "user2" : "aa","status":"OLD" }, "count" : 1}
{ "_id" : { "user1" : "a", "user2" : "ab","status":"OLD" }, "count" : 1}
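As a rough illustration of what grouping on a compound _id means, the sketch below builds the same groups in plain JavaScript by using the selected field values as a combined key. It is an in-memory approximation, not MongoDB code; `usersData` and `groupCount` are hypothetical names, and createdDate is left out because it is not part of either group key.

```javascript
// In-memory copy of the sample documents (createdDate omitted).
const usersData = [
  { _id: 1, user1: "a", user2: "aa", status: "NEW" },
  { _id: 2, user1: "a", user2: "ab", status: "NEW" },
  { _id: 3, user1: "b", user2: "aa", status: "NEW" },
  { _id: 4, user1: "b", user2: "ab", status: "NEW" },
  { _id: 5, user1: "a", user2: "aa", status: "OLD" },
  { _id: 6, user1: "a", user2: "ab", status: "OLD" },
];

// Mimics { $group: { _id: { f1: "$f1", f2: "$f2", ... }, count: { $sum: 1 } } }
// (groupCount is a hypothetical helper, not a MongoDB API)
function groupCount(docs, fields) {
  const groups = new Map();
  for (const doc of docs) {
    // Build the compound key object from the chosen fields
    const id = {};
    for (const field of fields) id[field] = doc[field];

    // Serialise it so equal keys land in the same Map entry
    const key = JSON.stringify(id);
    const group = groups.get(key) || { _id: id, count: 0 };
    group.count += 1;
    groups.set(key, group);
  }
  return [...groups.values()];
}

const byPair = groupCount(usersData, ["user1", "user2"]);
const byPairAndStatus = groupCount(usersData, ["user1", "user2", "status"]);
console.log(byPair);
console.log(byPairAndStatus);
```

Adding a field to the key can only split groups further, which is why the second call yields six groups of one document each.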

What aggregate should I do in MongoDB?

I have a collection that contains data as below sample
{ "_id" : "...." , "team" : [ {"name" : "A", "state" : "Active"} , {"name" : "B", "state" : "Deactive"} , {"name" : "C", "state" : "Unknown"} ]},
{ "_id" : "...." , "team" : [ {"name" : "A", "state" : "Unknown"} , {"name" : "B", "state" : "Deactive"} , {"name" : "C", "state" : "Unknown"} ]},
{ "_id" : "...." , "team" : [ {"name" : "A", "state" : "Active"} , {"name" : "B", "state" : "Deactive"} , {"name" : "C", "state" : "Unknown"} ]}
I filter by "team.name" and want to know which states occur and how often each one occurs, as in the sample result below:
{ "name" : "A", "Active" : 2, "Unknown" : 1},
{ "name" : "B", "Deactive" : 3},
{ "name" : "C", "Unknown" : 3 }
Is it possible to do this using only MongoDB's aggregation functions, without any application code?
"Is it possible to do this using only MongoDB's aggregation functions, without any application code?"
Yes.
TL;DR unwind and group
db.getCollection('foo').aggregate([
{$unwind:"$team"},
{
$group: {
_id: "$team.name",
"Active": {$sum: {$cond:[{$eq:["$team.state","Active"]},1,0]}},
"Deactive": {$sum: {$cond:[{$eq:["$team.state","Deactive"]},1,0]}},
"Unknown": {$sum: {$cond:[{$eq:["$team.state","Unknown"]},1,0]}}
}
},
{$project: {_id:0, name:"$_id", Active:1, Deactive:1, Unknown:1}},
{$sort: {name:1}}
])
Output for your sample:
[
{ "name" : "A", "Active" : 2, "Deactive" : 0, "Unknown" : 1 },
{ "name" : "B", "Active" : 0, "Deactive" : 3, "Unknown" : 0 },
{ "name" : "C", "Active" : 0, "Deactive" : 0, "Unknown" : 3 }
]
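The pipeline above lists the three states explicitly, so every row carries all three counters. If the set of states is not known up front, or you want only the states that actually occur (as in the result shape the question asks for), the idea can be sketched in plain JavaScript as follows. This is an in-memory illustration only; `teams` and `stateCountsByName` are invented names, and the placeholder _id values from the question are left out.

```javascript
// In-memory copy of the sample documents (placeholder _id values omitted).
const teams = [
  { team: [{ name: "A", state: "Active" }, { name: "B", state: "Deactive" }, { name: "C", state: "Unknown" }] },
  { team: [{ name: "A", state: "Unknown" }, { name: "B", state: "Deactive" }, { name: "C", state: "Unknown" }] },
  { team: [{ name: "A", state: "Active" }, { name: "B", state: "Deactive" }, { name: "C", state: "Unknown" }] },
];

// stateCountsByName is a hypothetical helper, not a MongoDB API.
function stateCountsByName(documents) {
  const byName = new Map();
  for (const doc of documents) {
    // Equivalent of $unwind: visit each team member separately
    for (const member of doc.team) {
      const row = byName.get(member.name) || { name: member.name };
      // Count the state under its own key, creating it on first sight
      row[member.state] = (row[member.state] || 0) + 1;
      byName.set(member.name, row);
    }
  }
  // Equivalent of { $sort: { name: 1 } }
  return [...byName.values()].sort((x, y) => x.name.localeCompare(y.name));
}

const stateCounts = stateCountsByName(teams);
console.log(stateCounts);
```

States that never occur for a name are simply absent from its row instead of being reported as 0.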

Is there a `$slice` like comparison for MongoDB's filters?

In MongoDB there is a projection operator $slice which allows projecting a subarray.
Is there any way to filter by an array slice as well? Something like:
db.testdb.find( {arrayofstring: { $eqSlice: {$slice: [0,1], $val: [ "a" ] } } }, {...})
Edit: An example and its expected output
> db.studentsTestDataTypes.find({},{ _id: 1, int: 1, arraystring: 1})
{ "_id" : ObjectId("56977186756088b586154f9d"), "int" : 2001, "arraystring" : [ "a", "b", "c" ] }
{ "_id" : ObjectId("56977186756088b586154f9e"), "int" : 2002, "arraystring" : [ "d", "e", "f" ] }
Example of expected result: Filtering by those entries with value "a" at the first position of arraystring:
{ "_id" : ObjectId("56977186756088b586154f9d"), "int" : 2001, "arraystring" : [ "a", "b", "c" ] }
Suppose you have the following documents in your collection:
{ "_id" : ObjectId("56977186756088b586154f9d"), "int" : 2001, "arraystring" : [ "a", "b", "c" ] }
{ "_id" : ObjectId("56977186756088b586154f9e"), "int" : 2002, "arraystring" : [ "d", "e", "f" ] }
{ "_id" : ObjectId("56978e21ae9bb55c0d7cdc67"), "int" : 2001, "arraystring" : [ "b", "a", "c" ] }
The easiest and best way is to use dot notation, where "arraystring.0" refers to the first element of the array:
db.collection.find({ "arraystring.0": "a" } )
Which yields:
{
"_id" : ObjectId("56977186756088b586154f9d"),
"int" : 2001,
"arraystring" : [
"a",
"b",
"c"
]
}
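For readers who want to see the positional match spelled out, here is a small in-memory JavaScript sketch of the same check, plus a prefix comparison in the spirit of the hypothetical $eqSlice from the question. It is not MongoDB code: `students`, `firstIsA` and `startsWith` are made-up names, and the short numeric _id values stand in for the ObjectIds. It also assumes arraystring is always an array of plain strings.

```javascript
// In-memory stand-ins for the three sample documents.
const students = [
  { _id: 1, int: 2001, arraystring: ["a", "b", "c"] },
  { _id: 2, int: 2002, arraystring: ["d", "e", "f"] },
  { _id: 3, int: 2001, arraystring: ["b", "a", "c"] },
];

// Mimics find({ "arraystring.0": "a" }): compare the element at index 0 only
const firstIsA = students.filter((doc) => doc.arraystring[0] === "a");

// Hypothetical helper: does the array begin with the given elements?
function startsWith(array, prefix) {
  return prefix.every((value, index) => array[index] === value);
}

// Comparable to combining positions: { "arraystring.0": "a", "arraystring.1": "b" }
const firstTwoAreAB = students.filter((doc) => startsWith(doc.arraystring, ["a", "b"]));

console.log(firstIsA);
console.log(firstTwoAreAB);
```

The third document contains "a" but not at position 0, so it is excluded, which is the difference from a plain { arraystring: "a" } query.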

Aggregation on the basis of the set of nested docs

Let's say I have the following 5 docs:
{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }
I want to group the students (by their _id) by the set (combination) of courses they take, and count how many students are in each set.
In the example above I have 3 sets (combinations) of courses, with the following numbers of students:
1 - [ "A", "B" ] <- 2 students take this combination
2 - [ "A", "B", "C" ] <- 2 students
3 - [ "A", "B", "D" ] <- 1 student
I feel like this is more of a MapReduce task than an aggregation one... not sure...
UPDATE 1
Thanks a lot to @ExplosionPills.
So the following aggregation command:
db.students.aggregate([{
$group: {
_id: "$courses",
count: {$sum: 1},
students: {$push: "$_id"}
}
}])
gives me the following output:
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }
It groups by set of courses and returns the number of students in each set along with their _ids.
UPDATE 2
I found out that the aggregation above treats the combination [ "C", "A", "B" ] as different from [ "A", "B", "C" ], but I need these two to count as the same.
So let's look at the following documents:
{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }
{ "_id" : "6", "student" : "Alex", "courses" : [ "C", "A", "B" ] }
Here is the output:
{ "_id" : [ "C", "A", "B" ], "count" : 1, "students" : [ "6" ] }
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }
See lines 1 and 3: this is not what I wanted.
So, to treat [ "C", "A", "B" ] and [ "A", "B", "C" ] as the same combination, I changed the aggregation as follows:
db.students.aggregate([
{$unwind: "$courses" },
{$sort : {"courses": 1}},
{$group: {_id: "$_id", courses: {$push: "$courses"}}},
{$group: {_id: "$courses", count: {$sum:1}, students: {$push: "$_id"}}}
])
Output:
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "5", "1" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 3, "students" : [ "6", "4", "2" ] }
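The effect of sorting the courses before grouping can be checked with a short plain-JavaScript sketch. It works on an in-memory copy of the six documents and is only an approximation of the pipeline above; `studentDocs` and `groupByCourseSet` are hypothetical names. Unlike the pipeline, it keeps the students in input order inside each group.

```javascript
// In-memory copy of the six sample documents.
const studentDocs = [
  { _id: "1", student: "Oscar", courses: ["A", "B"] },
  { _id: "2", student: "Alan", courses: ["A", "B", "C"] },
  { _id: "3", student: "Kate", courses: ["A", "B", "D"] },
  { _id: "4", student: "John", courses: ["A", "B", "C"] },
  { _id: "5", student: "Bema", courses: ["A", "B"] },
  { _id: "6", student: "Alex", courses: ["C", "A", "B"] },
];

// groupByCourseSet is a hypothetical helper, not a MongoDB API.
function groupByCourseSet(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Sort a copy so that ["C", "A", "B"] and ["A", "B", "C"] give the same key
    const courses = [...doc.courses].sort();
    const key = JSON.stringify(courses);

    const group = groups.get(key) || { _id: courses, count: 0, students: [] };
    group.count += 1;             // { $sum: 1 }
    group.students.push(doc._id); // { $push: "$_id" }
    groups.set(key, group);
  }
  return [...groups.values()];
}

const courseSets = groupByCourseSet(studentDocs);
console.log(courseSets);
```

If a student could list the same course twice, you would also need to remove duplicates before building the key; the sample data does not need that.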
This is an aggregate operation using grouping:
db.students.aggregate([{
$group: {
// _id is the group key: documents with the same value end up in the same group.
// The "$courses" syntax refers to the value of the courses field.
_id: "$courses",
// Add 1 for each document in the group (effectively a counter)
count: {$sum: 1}
}
}]);
EDIT:
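If the courses can be in any order, you can $unwind, $sort, and $group again as suggested in the edited question. It's also possible to do this via mapReduce, but I'm not sure which is faster.
db.students.mapReduce(
function () {
// Use a sorted copy of the courses as the key, and emit a value
// that has the same shape as the reduce result
emit(this.courses.slice().sort(), {students: [this._id], count: 1});
},
function (key, values) {
// reduce must return the same shape that map emits: it can be called
// more than once per key and is skipped for keys with a single value
var result = {students: [], count: 0};
values.forEach(function (v) {
result.students = result.students.concat(v.students);
result.count += v.count;
});
return result;
},
{out: {inline: 1}}
)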

MongoDB with flexible fields. How to find all records with specific field name?

I have a schema like this:
{
"_id" : ObjectId("50ec1d93ba02ece1979ee4a5"),
"url" : "google.com",
"results" : [
{ "1357651347" : { "data1" : "a", "data2" : "b", "data3" : "c" }},
{ "1357651706" : { "data1" : "d", "data2" : "e", "data3" : "f" }},
{ "1357651772" : { "data1" : "g", "data2" : "h", "data3" : "i" }}
]
}
I'm interested in the results with id 1357651706. How do I get them (in PHP)?
You can check whether a field exists, or whether it is null (or not null).
So for $exists ( http://docs.mongodb.org/manual/reference/operator/exists/ ):
db.col.find({"results.1357651706": {$exists:true}})
And for checking if something is not null:
db.col.find({ "results.1357651706": {$ne: null} })
Note: it is usually better to run the null query the other way around, that is, to check whether the field is null, and then make the decision in your application. That way you can also use a sparse index to keep the query lean.
+1 to Sammaye's answer, but consider reworking your schema to get rid of the dynamic field names, which make queries like this awkward.
Something like this instead:
{
"_id" : ObjectId("50ec1d93ba02ece1979ee4a5"),
"url" : "google.com",
"results" : [
{ id: 1357651347, "data1" : "a", "data2" : "b", "data3" : "c" },
{ id: 1357651706, "data1" : "d", "data2" : "e", "data3" : "f" },
{ id: 1357651772, "data1" : "g", "data2" : "h", "data3" : "i" }
]
}
Then you can query for the doc containing the result you're looking for like this:
db.col.find({'results.id': 1357651706})
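To show why the reworked schema is easier to work with, here is a small in-memory JavaScript sketch of the lookup. It is not MongoDB or PHP code; `pages`, `findByResultId` and `pickResult` are names invented for this illustration, and the ObjectId is left out.

```javascript
// In-memory copy of the reworked document (ObjectId omitted).
const pages = [
  {
    url: "google.com",
    results: [
      { id: 1357651347, data1: "a", data2: "b", data3: "c" },
      { id: 1357651706, data1: "d", data2: "e", data3: "f" },
      { id: 1357651772, data1: "g", data2: "h", data3: "i" },
    ],
  },
];

// Mimics find({ "results.id": id }): keep documents where any element matches
// (findByResultId is a hypothetical helper, not a MongoDB API)
function findByResultId(docs, id) {
  return docs.filter((doc) => doc.results.some((result) => result.id === id));
}

// Hypothetical helper: pull the single matching element out of a document
function pickResult(doc, id) {
  return doc.results.find((result) => result.id === id);
}

const found = findByResultId(pages, 1357651706);
const wanted = pickResult(found[0], 1357651706);
console.log(wanted);
```

As in the find call above, the match returns the whole document; picking out only the matching array element is a separate step.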