Aggregate on array of embedded documents - mongodb

I have a MongoDB collection with multiple documents. Each document has an array with multiple subdocuments (or embedded documents, I guess?). Each of these subdocuments is in this format:
{
"name": string,
"count": integer
}
Now I want to aggregate these subdocuments to find:
1. The top X counts and their names.
2. Same as 1., but the names have to match a regex before sorting and limiting.
I have tried the following for 1. already. It does return the top X, but unordered, so I'd have to order them again, which seems somewhat inefficient.
[{
$match: {
_id: id
}
}, {
$unwind: {
path: "$array"
}
}, {
$sort: {
'count': -1
}
}, {
$limit: x
}]
Since I'm rather new to MongoDB, this is pretty confusing for me. Happy for any help. Thanks in advance.

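The sort has to include the array name. After $unwind, each subdocument is still nested under the array field, so the sort key must be "array.count" rather than "count". Sorting on "count" refers to a top-level field that does not exist, so the $sort has no effect and the $limit keeps arbitrary elements rather than the top X. With the right key, no additional sort is needed later on.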
Given the following document to work with:
{
students: [{
count: 4,
name: "Ann"
}, {
count: 7,
name: "Brad"
}, {
count: 6,
name: "Beth"
}, {
count: 8,
name: "Catherine"
}]
}
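As an example, the following aggregation query will match any name containing the letter "h" or "e" (the character class [he] matches either one). This $match has to come after the $unwind step: placed before it, it would select whole documents that have at least one matching name, and the unwound output would still include the non-matching students.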
db.tests.aggregate([
{$match: {
_id: ObjectId("5c1b191b251d9663f4e3ce65")
}},
{$unwind: {
path: "$students"
}},
{$match: {
"students.name": /[he]/
}},
{$sort: {
"students.count": -1
}},
{$limit: 2}
])
This is the output given the above mentioned input:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 6, "name" : "Beth" } }
Both names contain the letters "h" and "e", and the output is sorted from high to low.
When setting the limit to 1, the output is limited to:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
In this case only the highest count has been kept after having matched the names.
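To make the stage order concrete, here is a small plain-JavaScript model of what the pipeline above computes on the sample document. It is a sketch, not MongoDB code: the _id string stands in for the ObjectId, and topStudents is a hypothetical helper name.

```javascript
// Plain-JavaScript model (not the MongoDB API) of:
// $unwind "students" -> $match "students.name" -> $sort "students.count" desc -> $limit
const studentDoc = {
  _id: "5c1b191b251d9663f4e3ce65", // stands in for the ObjectId
  students: [
    { count: 4, name: "Ann" },
    { count: 7, name: "Brad" },
    { count: 6, name: "Beth" },
    { count: 8, name: "Catherine" },
  ],
};

// Hypothetical helper; pass nameRegex = null to skip the name filter (use case 1).
function topStudents(document, nameRegex, limit) {
  return document.students
    // $unwind: one result per element, still nested under "students"
    .map((student) => ({ _id: document._id, students: student }))
    // $match on the unwound results, so individual students are dropped
    .filter((d) => nameRegex === null || nameRegex.test(d.students.name))
    // $sort: { "students.count": -1 }
    .sort((a, b) => b.students.count - a.students.count)
    // $limit
    .slice(0, limit);
}

console.log(topStudents(studentDoc, /[he]/, 2).map((d) => d.students.name)); // [ 'Catherine', 'Beth' ]
```

Without the regex (null), the top two are Catherine and Brad. If a name really must contain both letters, a lookahead pattern such as /(?=.*h)(?=.*e)/ expresses that; MongoDB's PCRE-based regex support should accept it, but it is worth checking against your server version.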
=====================
Edit for the extra question:
Yes, the first $match can be changed to filter on specific universities.
{$match: {
university: "University X"
}},
That will give one or more matching documents (in case you have a document per year or so) and the rest of the aggregation steps would still be valid.
The following match would retrieve the students for the given university for a given academic year in case that would be needed.
{$match: {
university: "University X",
academic_year: "2018-2019"
}},
That should narrow it down to get the correct documents.

Related

How to aggregate all existing field in my document [duplicate]

I have a problem when using db.collection.aggregate in MongoDB.
I have a data structure like:
_id:...
Segment:{
"S1":1,
"S2":5,
...
"Sn":10
}
That is, Segment may contain several sub-attributes with numeric values, and I'd like to sum them up as 1 + 5 + ... + 10.
The problem is that I don't know the sub-attribute names in advance, since the segment numbers differ from document to document, so I cannot list each segment name. I just want something like a for loop that sums all the values together.
I tried queries like:
db.collection.aggregate([
{$group:{
_id:"$Account",
total:{$sum:"$Segment.$"}
])
but it doesn't work.
You have made the classic mistake of using arbitrary field names. MongoDB is "schema-free", but that doesn't mean you don't need to think about your schema. Key names should be descriptive; in your case, "S2", for example, does not really mean anything. To do most kinds of queries and operations, you will need to redesign your schema to store your data like this:
_id:...
Segment:[
{ field: "S1", value: 1 },
{ field: "S2", value: 5 },
{ field: "Sn", value: 10 },
]
You can then run your query like:
db.collection.aggregate( [
{ $unwind: "$Segment" },
{ $group: {
_id: '$_id',
sum: { $sum: '$Segment.value' }
} }
] );
This then results in something like the following (for the single document in your question):
{
"result" : [
{
"_id" : ObjectId("51e4772e13573be11ac2ca6f"),
"sum" : 16
}
],
"ok" : 1
}
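As a sanity check of the redesigned schema, the following plain-JavaScript sketch models what $unwind followed by $group with $sum does. sumSegments is a hypothetical name and the _id value is made up.

```javascript
// Plain-JavaScript model (not the MongoDB API) of:
// { $unwind: "$Segment" }, { $group: { _id: "$_id", sum: { $sum: "$Segment.value" } } }
function sumSegments(docs) {
  const sums = new Map();
  for (const doc of docs) {
    // $unwind emits one row per array element; an empty Segment array emits none,
    // so such documents never reach the $group stage.
    for (const segment of doc.Segment) {
      // $group by _id, adding up Segment.value
      sums.set(doc._id, (sums.get(doc._id) ?? 0) + segment.value);
    }
  }
  return [...sums].map(([_id, sum]) => ({ _id, sum }));
}

const segmentDocs = [
  {
    _id: "doc1", // made-up id
    Segment: [
      { field: "S1", value: 1 },
      { field: "S2", value: 5 },
      { field: "Sn", value: 10 },
    ],
  },
];
console.log(sumSegments(segmentDocs)); // [ { _id: 'doc1', sum: 16 } ]
```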
Starting with MongoDB 3.4.4, this can be achieved with inline expression operators, which avoids expensive operations such as $group:
// { _id: "xx", segments: { s1: 1, s2: 3, s3: 18, s4: 20 } }
db.collection.aggregate([
{ $addFields: {
total: { $sum: {
$map: { input: { $objectToArray: "$segments" }, as: "kv", in: "$$kv.v" }
}}
}}
])
// { _id: "xx", total: 42, segments: { s1: 1, s2: 3, s3: 18, s4: 20 } }
The idea is to transform the object (containing the numbers to sum) into an array. This is the role of $objectToArray, which, starting with MongoDB 3.4.4, transforms { s1: 1, s2: 3, ... } into [ { k: "s1", v: 1 }, { k: "s2", v: 3 }, ... ]. This way, we don't need to care about the field names, since we can access the values through their "v" fields.
Having an array rather than an object is a first step towards being able to sum its elements. But the elements obtained with $objectToArray are objects, not plain numbers. We can get past this by mapping (the $map operation) these array elements to the value of their "v" field, which in our case produces this array: [1, 3, 18, 20].
Finally, it's a simple matter of summing elements within this array, using the $sum operation.
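The three steps can be modelled in plain JavaScript to see what each operator contributes. This is a sketch of the semantics, not MongoDB code; totalOfFields is a hypothetical name.

```javascript
// Plain-JavaScript model (not the MongoDB API) of:
// { $sum: { $map: { input: { $objectToArray: "$segments" }, as: "kv", in: "$$kv.v" } } }
function totalOfFields(segments) {
  // $objectToArray: { s1: 1, ... } -> [ { k: "s1", v: 1 }, ... ]
  const kvPairs = Object.entries(segments).map(([k, v]) => ({ k, v }));
  // $map: keep only each pair's value
  const values = kvPairs.map((kv) => kv.v);
  // $sum over an array: non-numeric values are ignored, an empty array gives 0
  return values
    .filter((v) => typeof v === "number")
    .reduce((total, v) => total + v, 0);
}

console.log(totalOfFields({ s1: 1, s2: 3, s3: 18, s4: 20 })); // 42
```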
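Given
Segment: {s1: 10, s2: 4, s3: 12}
the following two stages:
{$set: {"new_array": {$objectToArray: "$Segment"}}}, // turns every field into a {k: <name>, v: <value>} pair
{$project: {_id: 0, total: {$sum: "$new_array.v"}}}
produce a "total" of 26.
$set is an alias for $addFields that was added in MongoDB 4.2 (the version I'm using); $addFields still works.
The intermediate "new_array" field looks like this: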
"new_array": [
{
"k": "s1",
"v": 10
},
{
"k": "s2",
"v": 4
},
{
"k": "s3",
"v": 12
}
]
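You can also use regular expressions on the field names, e.g. /^s/i to keep only fields whose names start with "s" (for instance by filtering new_array with $filter and $regexMatch on "$$this.k", which requires MongoDB 4.2+).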
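A plain-JavaScript sketch of summing only the fields whose names match /^s/i. On the server, I'd expect the equivalent to be a $filter over the $objectToArray result with a cond of { $regexMatch: { input: "$$this.k", regex: /^s/i } } (MongoDB 4.2+), but treat that as an assumption to verify. sumMatchingFields is a hypothetical name.

```javascript
// Plain-JavaScript model (not the MongoDB API): $objectToArray, then a filter on the
// key name with a regex, then $sum over the remaining values.
function sumMatchingFields(segment, keyRegex) {
  return Object.entries(segment) // [[k, v], ...] like $objectToArray's {k, v} pairs
    .filter(([k]) => keyRegex.test(k)) // keep fields whose name matches
    .map(([, v]) => v)
    .filter((v) => typeof v === "number") // $sum ignores non-numeric values
    .reduce((total, v) => total + v, 0);
}

console.log(sumMatchingFields({ s1: 10, s2: 4, s3: 12, other: 100 }, /^s/i)); // 26
```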

(mongo) How could i get the documents that have a value in array along with size

I have a mongo collection with something like the below:
{
"_id" : ObjectId("59e013e83260c739f029ee21"),
"createdAt" : ISODate("2017-10-13T01:16:24.653+0000"),
"updatedAt" : ISODate("2017-11-11T17:13:52.956+0000"),
"age" : NumberInt(34),
"attributes" : [
{
"year" : "2017",
"contest" : [
{
"name" : "Category1",
"division" : "Department1"
},
{
"name" : "Category2",
"division" : "Department1"
}
]
},
{
"year" : "2016",
"contest" : [
{
"name" : "Category2",
"division" : "Department1"
}
]
},
{
"year" : "2015",
"contest" : [
{
"name" : "Category1",
"division" : "Department1"
}
]
}
],
"name" : {
"id" : NumberInt(9850214),
"first" : "john",
"last" : "afham"
}
}
Now, how can I get the number of documents that have a contest named "Category1" more than once, more than twice, and so on?
I tried using $size and $gt but couldn't get a correct result.
Assuming that a single contest array will never contain the same name value (e.g. "Category1") more than once, here is what you can do.
Since this avoids $unwind entirely, it should perform better, in particular on big collections or when the attributes arrays have many entries.
db.collection.aggregate([{
$project: {
"numberOfOccurrences": {
$size: { // count the matching attributes entries
$filter: { // keep only attributes entries whose contest array has an entry named "Category1"
input: "$attributes",
cond: { $in: [ "Category1", "$$this.contest.name" ] }
}
}
}
}
}, {
$match: { // filter the documents by number of occurrences
"numberOfOccurrences": {
$gt: 1 // put your desired threshold here ($gt: 1 means "more than once")
}
}
}, {
$count: "numberOfDocuments" // count the number of matching documents
}])
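The counting logic of this pipeline can be sketched in plain JavaScript on the sample document from the question (trimmed to the fields the pipeline reads). This models the semantics only; countMatchingDocs is a hypothetical name.

```javascript
// Plain-JavaScript model (not the MongoDB API) of the $project/$filter/$size,
// $match and $count pipeline above.
function countMatchingDocs(docs, contestName, moreThan) {
  return docs
    // $filter + $size: number of attributes entries whose contest has the name
    .map((doc) =>
      doc.attributes.filter((attr) =>
        attr.contest.map((c) => c.name).includes(contestName) // $in on "$$this.contest.name"
      ).length
    )
    // $match: { numberOfOccurrences: { $gt: moreThan } }
    .filter((occurrences) => occurrences > moreThan)
    // $count
    .length;
}

const sampleDocs = [
  {
    attributes: [
      { year: "2017", contest: [{ name: "Category1" }, { name: "Category2" }] },
      { year: "2016", contest: [{ name: "Category2" }] },
      { year: "2015", contest: [{ name: "Category1" }] },
    ],
  },
];
console.log(countMatchingDocs(sampleDocs, "Category1", 1)); // 1
```

One difference from the real pipeline: when no document matches, $count outputs no document at all instead of a count of 0.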
Try this on for size.
db.foo.aggregate([
// Start with breaking down attributes:
{$unwind: "$attributes"}
// Next, extract only name = Category1 from the contest array. This will yield
// an array of 0 or 1 because I am assuming that contest names WITHIN
// the contest array are unique. If found and we get an array of 1, turn that
// into a single doc instead of an array of a single doc by taking arrayElemAt 0.
// Otherwise, "x" is not set into the doc AT ALL. All other vars in the doc
// will go away after $project; if you want to keep them, change this to
// $addFields:
,{$project: {x: {$arrayElemAt: [ {$filter: {
input: "$attributes.contest",
as: "z",
cond: {$eq: [ "$$z.name", "Category1" ]}
}}, 0 ]}
}}
// We split up attributes before, creating multiple docs with the same _id. We
// must now "recombine" these _id (OP said he wants # of docs with name).
// We now have to capture all the single "x" that we created above; docs without
// Category1 will have NO "x" and we don't want to include them in the count.
// Also, we KNOW that name can only be Category1 but division could vary, so
// let's capture that in the $push in case we might want it:
,{$group: {_id: "$_id", x: {$push: "$x.division"}}}
// One more pass to compute length of array:
,{$addFields: {len: {$size: "$x"}} }
// And lastly, the filter for one time or two times or n times:
,{$match: {len: {$gt: 2} }}
]);
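Here is a plain-JavaScript sketch of what that pipeline computes per document, using the sample document from the question. It models the semantics only; docsWithCategory is a hypothetical name, and the _id string stands in for the ObjectId.

```javascript
// Plain-JavaScript model (not the MongoDB API) of the pipeline above:
// $unwind attributes, pick the first matching contest entry per attributes entry,
// $group the divisions back per _id, then $size and $match on the length.
function docsWithCategory(docs, name, moreThan) {
  return docs
    .map((doc) => {
      const x = [];
      for (const attr of doc.attributes) { // $unwind: "$attributes"
        // $filter + $arrayElemAt 0: first contest entry with that name, if any
        const match = attr.contest.find((c) => c.name === name);
        // entries without a match add nothing to x, as the answer relies on
        if (match !== undefined) x.push(match.division); // $push: "$x.division"
      }
      return { _id: doc._id, x, len: x.length }; // $group + $addFields { len: { $size: "$x" } }
    })
    .filter((d) => d.len > moreThan); // $match: { len: { $gt: moreThan } }
}

const personDocs = [
  {
    _id: "59e013e83260c739f029ee21", // stands in for the ObjectId
    attributes: [
      { year: "2017", contest: [
        { name: "Category1", division: "Department1" },
        { name: "Category2", division: "Department1" },
      ] },
      { year: "2016", contest: [{ name: "Category2", division: "Department1" }] },
      { year: "2015", contest: [{ name: "Category1", division: "Department1" }] },
    ],
  },
];
console.log(docsWithCategory(personDocs, "Category1", 1).length); // 1
```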
First, we flatten each document by unwinding the attributes and attributes.contest arrays. Then we group by the original document _id and the contest name, counting the contests along the way. Finally, we filter the result.
db.person.aggregate([
{ $unwind: "$attributes" },
{ $unwind: "$attributes.contest" },
{$group: {
_id: {initial_id: "$_id", contest: "$attributes.contest.name"},
count: {$sum: 1}
}
},
{$match: {$and: [{"_id.contest": "Category1"}, {"count": {$gt: 1}}]}}]);
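The double unwind and grouping can be modelled in plain JavaScript as follows. This is a sketch of the semantics, not MongoDB code; contestCounts and the "p1" id are hypothetical.

```javascript
// Plain-JavaScript model (not the MongoDB API) of the double $unwind, the $group
// on { initial_id, contest } and the final $match above.
function contestCounts(docs) {
  const counts = new Map(); // key: JSON of [initial_id, contest name]
  for (const doc of docs) {
    for (const attr of doc.attributes) { // $unwind: "$attributes"
      for (const contest of attr.contest) { // $unwind: "$attributes.contest"
        const key = JSON.stringify([doc._id, contest.name]);
        counts.set(key, (counts.get(key) ?? 0) + 1); // count: { $sum: 1 }
      }
    }
  }
  return [...counts].map(([key, count]) => {
    const [initial_id, contest] = JSON.parse(key);
    return { _id: { initial_id, contest }, count };
  });
}

const people = [
  {
    _id: "p1",
    attributes: [
      { year: "2017", contest: [{ name: "Category1" }, { name: "Category2" }] },
      { year: "2016", contest: [{ name: "Category2" }] },
      { year: "2015", contest: [{ name: "Category1" }] },
    ],
  },
];
// $match: { "_id.contest": "Category1", count: { $gt: 1 } }
const matches = contestCounts(people).filter(
  (g) => g._id.contest === "Category1" && g.count > 1
);
console.log(matches.length); // 1
```

Unlike the $filter/$size approach, this counts contest entries rather than attributes entries, so a name repeated inside one contest array would be counted twice. The result is also one group per matching document, so a final $count stage would be needed to get a single number.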

How to query items in array alone in Mongodb

My collection is named employee, and its documents look as follows:
{
"Title":"IssueFixingTeam",
"TeamLead":"Mr.Bean",
"workers":["xxx","yyy","zzz"]
},
{
"Title":"DevelopmentTeam",
"TeamLead":"Mr.John Doe",
"workers":["aa","dd","ss"]
}
How can I query how many workers there are under TeamLead "Mr.Bean"?
Thanks in advance
If you are interested in just one record belonging to "Mr.Bean" (otherwise, see the answer by #felix), then this gives you the required count:
db.employee.findOne({'TeamLead': 'Mr.Bean'}).workers.length
Use $match to filter on TeamLead: "Mr.Bean", and use the $size operator in $project to get the size of the array:
db.collection.aggregate([{
$match: {
TeamLead: "Mr.Bean"
}
}, {
$project: {
"TeamLead":1,
workers: {
$size: "$workers"
}
}
}])
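The $match + $size combination can be sketched in plain JavaScript on the two documents from the question. This models the semantics only; workerCounts is a hypothetical name, and _id is left out because the sample documents don't show one.

```javascript
// Plain-JavaScript model (not the MongoDB API) of
// { $match: { TeamLead } } followed by { $project: { TeamLead: 1, workers: { $size: "$workers" } } }
function workerCounts(docs, teamLead) {
  return docs
    .filter((doc) => doc.TeamLead === teamLead) // $match
    .map((doc) => {
      // $size fails in MongoDB if workers is missing or not an array; mirror that here
      if (!Array.isArray(doc.workers)) throw new TypeError("workers must be an array");
      return { TeamLead: doc.TeamLead, workers: doc.workers.length }; // $project + $size
    });
}

const teams = [
  { Title: "IssueFixingTeam", TeamLead: "Mr.Bean", workers: ["xxx", "yyy", "zzz"] },
  { Title: "DevelopmentTeam", TeamLead: "Mr.John Doe", workers: ["aa", "dd", "ss"] },
];
console.log(workerCounts(teams, "Mr.Bean")); // [ { TeamLead: 'Mr.Bean', workers: 3 } ]
```

If some documents may lack workers, wrapping the field as { $size: { $ifNull: ["$workers", []] } } is a common way to avoid the $size error.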
You can use the aggregation framework.
In case you are only interested in the documents of a specific TeamLead and the count per document:
db.foo.aggregate([{$match: {"TeamLead": "Mr.Bean"}},
{$project: {"num_workers": {$size: "$workers"}}}])
Output:
{ "_id" : ObjectId("58c6a5ef9bc86fa5c7e4fa50"), "num_workers" : 3 }
If you want to group documents by TeamLead and get the number of unique workers under each TeamLead, unwind the workers first so that $addToSet collects individual names rather than whole arrays:
db.foo.aggregate([{$unwind: "$workers"},
{$group: {"_id": "$TeamLead", "workers": {$addToSet: "$workers"}}},
{$project: {"num_workers": {$size: "$workers"}}}])
Output:
{ "_id" : "Mr.John Doe", "num_workers" : 3 }
{ "_id" : "Mr.Bean", "num_workers" : 3 }
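A plain-JavaScript sketch of counting unique workers per TeamLead (unwind the workers, then collect them into a set per lead). The third document is hypothetical, added only to show that a worker shared between two of Mr.Bean's teams is counted once; uniqueWorkersPerLead is also a hypothetical name.

```javascript
// Plain-JavaScript model (not the MongoDB API) of
// { $unwind: "$workers" }, { $group: { _id: "$TeamLead", workers: { $addToSet: "$workers" } } },
// { $project: { num_workers: { $size: "$workers" } } }
function uniqueWorkersPerLead(docs) {
  const byLead = new Map();
  for (const doc of docs) {
    for (const worker of doc.workers) { // $unwind
      if (!byLead.has(doc.TeamLead)) byLead.set(doc.TeamLead, new Set());
      byLead.get(doc.TeamLead).add(worker); // $addToSet keeps each name once
    }
  }
  return [...byLead].map(([lead, workers]) => ({ _id: lead, num_workers: workers.size }));
}

const teamDocs = [
  { TeamLead: "Mr.Bean", workers: ["xxx", "yyy", "zzz"] },
  { TeamLead: "Mr.John Doe", workers: ["aa", "dd", "ss"] },
  { TeamLead: "Mr.Bean", workers: ["xxx", "qqq"] }, // hypothetical second team, overlapping worker
];
console.log(uniqueWorkersPerLead(teamDocs));
```

Note that $group does not guarantee output order in MongoDB; the Map here simply keeps first-seen order.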

How to aggregate with group by and sort correctly

I'm using Mongodb.
Consider the following documents:
{ uid: 1, created: ISODate("2014-05-02..."), another_col : "x" },
{ uid: 1, created: ISODate("2014-05-05..."), another_col : "y" },
{ uid: 2, created: ISODate("2014-05-10..."), another_col : "z" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col : "w" },
{ uid: 1, created: ISODate("2014-05-01..."), another_col : "f" },
{ uid: 2, created: ISODate("2014-05-22..."), another_col : "a" }
What I'm trying to do is a simple group by on uid, sorting created in descending order, so that I get the first row for each uid.
An example for an expected output
{ uid: 1, created: ISODate("2014-05-05..."), another_col: "y" },
{ uid: 2, created: ISODate("2014-05-22..."), another_col: "a" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col: "w" }
The best I could get is:
db.mycollection.aggregate( {$group: {_id: "$uid", rows: {$push: { "created" : "$created" }}}}, sort { // doesnt work well } )
Anyone can guide me for the right combination of group by and sorting?
It just doesn't work as I was expecting.
(note: I have checked many threads, but I'm unable to find the correct answer for my case)
There are a few catches here to understand.
When you use $group, the order of its output is not guaranteed; only a $sort stage after it guarantees an order. Without one, the groups may, for example, come back in the order in which they were discovered. So if your documents were originally in an order like this:
{ uid: 1, created: ISODate("2014-05-02..."), another_col : "x" },
{ uid: 1, created: ISODate("2014-05-05..."), another_col : "y" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col : "w" },
{ uid: 2, created: ISODate("2014-05-10..."), another_col : "z" },
Then just using $group without a $sort at the end of the pipeline could return results like this:
{ uid: 1, created: ISODate("2014-05-05..."), another_col : "y" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col : "w" },
{ uid: 2, created: ISODate("2014-05-10..."), another_col : "z" },
That is one concept, but what you actually seem to expect in your results is the "last" values of the other fields for each uid, with the documents sorted by uid and created. In that case, the way to get your result is to $sort first and then use the $last operator:
db.mycollection.aggregate([
// Sorts everything first by uid and created
{ "$sort": { "uid": 1, "created": 1 } },
// Group with the $last results from each boundary
{ "$group": {
"_id": "$uid",
"created": { "$last": "$created" },
"another_col": { "$last": "$another_col" }
}},
// Sort the groups by uid, since $group does not guarantee its output order
{ "$sort": { "_id": 1 } }
])
Or essentially apply the sort to what you want.
The difference between $last and $max is that $max picks the highest value of the given field within each grouping _id, regardless of whether the documents are sorted, and it does so independently for each field, so different fields can come from different documents. $last, on the other hand, takes the value from the last document of each group in the current order, so all $last fields come from the same "row".
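The difference is easiest to see on the question's own documents. The following plain-JavaScript sketch models $sort + $last and plain $max; it is not MongoDB code, and lastPerUid and maxPerUid are hypothetical names.

```javascript
// Plain-JavaScript model (not the MongoDB API) contrasting $last on sorted input
// with $max, using the documents from the question.
const rows = [
  { uid: 1, created: new Date("2014-05-02"), another_col: "x" },
  { uid: 1, created: new Date("2014-05-05"), another_col: "y" },
  { uid: 2, created: new Date("2014-05-10"), another_col: "z" },
  { uid: 3, created: new Date("2014-05-05"), another_col: "w" },
  { uid: 1, created: new Date("2014-05-01"), another_col: "f" },
  { uid: 2, created: new Date("2014-05-22"), another_col: "a" },
];

// $sort { uid: 1, created: 1 } then $group with $last: the last row of each group wins,
// so every field comes from that same row.
function lastPerUid(docs) {
  const sorted = [...docs].sort((a, b) => a.uid - b.uid || a.created - b.created);
  const groups = new Map();
  for (const d of sorted) {
    groups.set(d.uid, { _id: d.uid, created: d.created, another_col: d.another_col });
  }
  return [...groups.values()];
}

// $group with $max on each field: every field is maximised on its own,
// so the values may come from different rows.
function maxPerUid(docs) {
  const groups = new Map();
  for (const d of docs) {
    const g = groups.get(d.uid);
    if (g === undefined) {
      groups.set(d.uid, { _id: d.uid, created: d.created, another_col: d.another_col });
    } else {
      if (d.created > g.created) g.created = d.created;
      if (d.another_col > g.another_col) g.another_col = d.another_col;
    }
  }
  return [...groups.values()];
}

console.log(lastPerUid(rows).map((g) => g.another_col)); // [ 'y', 'a', 'w' ]
```

For uid 2, $last returns created 2014-05-22 with another_col "a", while $max returns created 2014-05-22 with another_col "z": the two $max values come from different documents.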
If you were actually looking to sort the values of an array then the approach is similar. Keeping the array members in "created" order you would also sort first:
db.mycollection.aggregate([
// Sorts everything first by uid and created
{ "$sort": { "uid": 1, "created": 1 } },
// Group, pushing each row onto the array in sorted order
{ "$group": {
"_id": "$uid",
"row": {
"$push": {
"created": "$created",
"another_col": "$another_col"
}
}
}}
])
And the documents with those fields will be added to the array with the order they were already sorted by.
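A plain-JavaScript sketch of that behaviour on the question's documents: sort first, then push each row onto its group's array in arrival order. It models the semantics only; rowsPerUid is a hypothetical name.

```javascript
// Plain-JavaScript model (not the MongoDB API) of
// { $sort: { uid: 1, created: 1 } } followed by
// { $group: { _id: "$uid", row: { $push: { created: "$created", another_col: "$another_col" } } } }
function rowsPerUid(docs) {
  const sorted = [...docs].sort((a, b) => a.uid - b.uid || a.created - b.created);
  const groups = new Map();
  for (const d of sorted) {
    if (!groups.has(d.uid)) groups.set(d.uid, { _id: d.uid, row: [] });
    // $push appends in the order the documents arrive, i.e. the sorted order
    groups.get(d.uid).row.push({ created: d.created, another_col: d.another_col });
  }
  return [...groups.values()];
}

const uidRows = [
  { uid: 1, created: new Date("2014-05-02"), another_col: "x" },
  { uid: 1, created: new Date("2014-05-05"), another_col: "y" },
  { uid: 2, created: new Date("2014-05-10"), another_col: "z" },
  { uid: 3, created: new Date("2014-05-05"), another_col: "w" },
  { uid: 1, created: new Date("2014-05-01"), another_col: "f" },
  { uid: 2, created: new Date("2014-05-22"), another_col: "a" },
];
console.log(rowsPerUid(uidRows)[0].row.map((r) => r.another_col)); // [ 'f', 'x', 'y' ]
```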
Using $project along with this
db.mycollection.aggregate([{$group: {_id: "$uid", rows: {$max:"$created"}}}])
should help you; refer to these links:
http://docs.mongodb.org/manual/reference/operator/aggregation/project/
Mongodb group and project operators
mongodb aggregation framework group + project
If all you're looking for is the first row that means you're looking for the max. Just use the built-in $max accumulator.
db.mycollection.aggregate([{$group: {_id: "$uid", rows: {$max:"$created"}}}])
You would use the $push accumulator if you needed to process all the creation dates. For more information on the accumulators see: http://docs.mongodb.org/manual/reference/operator/aggregation/group/
From your comments, if you want the full documents returned and want to be able to iterate over all of them, then you really don't need to aggregate the results. Something like this should get you what you want:
db.mycollection.find({}).sort({uid: 1, created: -1})
