Getting aggregate count for nested fields in mongo - mongodb

I have a Mongo collection mimicking this Java class. A student can be taught a number of subjects across campuses.
class Students {
    String studentName;
    Map<String, List<String>> subjectsByCampus;
}
So a structure will look like this
{
    _id: ObjectId("someId"),
    studentName: 'student1',
    subjectByCampusName: {
        campus1: ['subject1', 'subject2'],
        campus2: ['subject3']
    },
    _class: 'fqnOfTheEntity'
}
I want to find the count of subjects offered by each campus or be able to query the count of subjects offered by a specific campus. Is there a way to get it through query?

As mentioned in the comments, the schema here does not appear to be particularly well suited to gathering the data requested in the question. For an individual student, this is doable via $size by processing the object (as an array) via $map. For example, to produce your desired output of campus1: 2, campus2: 1 for the sample document in a countsByCampus field, a pipeline might look as follows:
[
    {
        "$addFields": {
            "countsByCampus": {
                "$arrayToObject": {
                    "$map": {
                        "input": { "$objectToArray": "$subjectByCampusName" },
                        "in": {
                            "$mergeObjects": [
                                "$$this",
                                { v: { $size: "$$this.v" } }
                            ]
                        }
                    }
                }
            }
        }
    }
]
Playground demonstration here with an output document of:
{
    "_class": "fqnOfTheEntity",
    "_id": "someId",
    "countsByCampus": {
        "campus1": 2,
        "campus2": 1
    },
    "studentName": "student1",
    "subjectByCampusName": {
        "campus1": ["subject1", "subject2"],
        "campus2": ["subject3"]
    }
}
Doing that across the entire collection would involve $grouping the results together. This can be done, but it would be an extremely resource-intensive and likely slow operation.
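For reference, the per-document reshaping above can be sketched in plain JavaScript (outside of MongoDB) to show what the pipeline computes; the document values follow the sample from the question:

```javascript
// Plain-JS equivalent of the $objectToArray -> $map ($size) -> $arrayToObject
// pipeline: count the subjects per campus for a single student document.
const doc = {
  _id: "someId",
  studentName: "student1",
  subjectByCampusName: {
    campus1: ["subject1", "subject2"],
    campus2: ["subject3"]
  }
};

const countsByCampus = Object.fromEntries(
  Object.entries(doc.subjectByCampusName)  // ~ $objectToArray
    .map(([k, v]) => [k, v.length])        // ~ $map with $size
);                                         // ~ $arrayToObject

console.log(countsByCampus); // { campus1: 2, campus2: 1 }
```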

Related

how to query through hashmap data using mongo terminal

I have data stored in the following format in MongoDB. Help me in knowing how I can write a query to find whether the database has a particular id stored in one of the sheets or not.
The structure of the data is like the following:
{
    "name": "abc",
    "linked_data": {
        "sheet 1": [
            "7d387e05d3f8180a",
            "8534sfjog904395"
        ],
        "sheet 2": [
            "7d387e05d3f8180a",
            "54647sgdsevey67r34"
        ]
    }
}
For example, if the id "8534sfjog904395" is mapped to "sheet 1", then it should return me this data where the id is mapped to the sheet.
I am passing the id in the query and want to search inside linked_data and then all the sheets.
Using a dynamic value as a field name is considered an anti-pattern and introduces unnecessary complexity to queries. Nevertheless, you can use $objectToArray to convert linked_data into an array of k-v tuples, $filter to keep only the sheets you want, and finally revert back to the original structure using $arrayToObject.
db.collection.aggregate([
    {
        "$set": {
            "linked_data": { "$objectToArray": "$linked_data" }
        }
    },
    {
        "$set": {
            "linked_data": {
                "$filter": {
                    "input": "$linked_data",
                    "as": "ld",
                    "cond": { "$in": ["8534sfjog904395", "$$ld.v"] }
                }
            }
        }
    },
    {
        "$set": {
            "linked_data": { "$arrayToObject": "$linked_data" }
        }
    },
    {
        "$match": {
            "linked_data": { "$ne": {} }
        }
    }
])
Mongo Playground
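The same transformation can be sketched in plain JavaScript to show what those three stages compute, using the sample document and target id from the question:

```javascript
// Plain-JS sketch of the $objectToArray -> $filter ($in) -> $arrayToObject
// stages: keep only the sheets whose array contains the target id.
const doc = {
  name: "abc",
  linked_data: {
    "sheet 1": ["7d387e05d3f8180a", "8534sfjog904395"],
    "sheet 2": ["7d387e05d3f8180a", "54647sgdsevey67r34"]
  }
};

const targetId = "8534sfjog904395";

const linked = Object.fromEntries(               // ~ $arrayToObject
  Object.entries(doc.linked_data)                // ~ $objectToArray
    .filter(([, ids]) => ids.includes(targetId)) // ~ $filter with $in
);

console.log(linked); // { 'sheet 1': [ '7d387e05d3f8180a', '8534sfjog904395' ] }
```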

Merging / aggregating multiple documents in MongoDB

I have a little problem with my aggregations. I have a large collection of flat documents with the following schema:
{
    _id: ObjectId("5dc027d38da295b969eca568"),
    emp_no: 10001,
    salary: 60117,
    from_date: "1986-06-26",
    to_date: "1987-06-26"
}
It's all about annual employee salaries. The data was exported from a relational database, so there are multiple documents with the same value of "emp_no" but the rest of their attributes vary. I need to aggregate them by the value of the "emp_no" attribute, so that as a result I will have something like this:
//one document
{
    _id: ObjectId("5dc027d38da295b969eca568"),
    emp_no: 10001,
    salaries: [
        { salary: 60117, from_date: "1986-06-26", to_date: "1987-06-26" },
        { salary: 62102, from_date: "1987-06-26", to_date: "1988-06-25" },
        ...
    ]
}
//another document
{
    _id: ObjectId("5dc027d38da295b969eca579"),
    emp_no: 10002,
    salaries: [
        { salary: 65828, from_date: "1996-08-03", to_date: "1997-08-03" },
        ...
    ]
}
//and so on
Last but not least, there are almost 2.9M documents, so aggregating by "emp_no" manually would be a bit of a problem. Is there a way I can aggregate them using just Mongo queries? How do I do this kind of thing? Thank you in advance for any help.
The $group stage of the aggregation pipeline can be used to get this type of aggregate. Specify the attribute you want to group by as the value of the _id field in the group stage.
How does the below query work for you?
db.collection.aggregate([
    {
        "$group": {
            "_id": "$emp_no",
            "salaries": {
                "$push": {
                    "salary": "$salary",
                    "from_date": "$from_date",
                    "to_date": "$to_date"
                }
            },
            "emp_no": { "$first": "$emp_no" },
            "first_document_id": { "$first": "$_id" }
        }
    },
    {
        "$project": {
            "_id": "$first_document_id",
            "salaries": 1,
            "emp_no": 1
        }
    }
])
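To see what the $group stage with $push and $first is doing, here is a plain-JavaScript sketch of the same bucketing over a few sample documents (the _id values are shortened placeholders):

```javascript
// Plain-JS sketch of $group by emp_no: $push collects the salary fields
// into an array per employee; $first keeps the first document's _id.
const docs = [
  { _id: "a1", emp_no: 10001, salary: 60117, from_date: "1986-06-26", to_date: "1987-06-26" },
  { _id: "a2", emp_no: 10001, salary: 62102, from_date: "1987-06-26", to_date: "1988-06-25" },
  { _id: "b1", emp_no: 10002, salary: 65828, from_date: "1996-08-03", to_date: "1997-08-03" }
];

const grouped = {};
for (const d of docs) {
  if (!grouped[d.emp_no]) {
    // first document seen for this emp_no            // ~ $first
    grouped[d.emp_no] = { _id: d._id, emp_no: d.emp_no, salaries: [] };
  }
  grouped[d.emp_no].salaries.push({                   // ~ $push
    salary: d.salary, from_date: d.from_date, to_date: d.to_date
  });
}

console.log(grouped[10001].salaries.length); // 2
```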

MongoDb - How to only return field of nested subdocument when using lookup aggregation?

I'm very new to MongoDB, so I'm used to SQL.
Right now I have two collections in my database:
1) Series (which has nested subdocuments)
2) Review (decided to reference the episode subdocument because there will be a lot of reviews)
See this picture for a better understanding.
Now I want to achieve the following. For every review (two in this case), I want to get the episode name.
I tried the following:
db.review.aggregate([
    {
        $lookup: {
            from: "series",
            localField: "episode",
            foreignField: "seasons.episodes._id",
            as: "episode_entry"
        }
    }
]).pretty()
The problem is that this returns (of course) not only the title of the referenced episode, but the whole season document.
See the picture below for my current output.
I don't know how to achieve it. Please help me.
I'm using Mongo 3.4.9
I would recommend the following series structure, which unwinds the seasons array into multiple documents, one for each season.
This will help you with inserting/updating the episodes directly.
Something like
db.series.insertMany([
    {
        "title": "Sherlock Holmes",
        "nr": 1,
        "episodes": [
            { "title": "A Study in Pink", "nr": 1 },
            { "title": "The Blind Banker", "nr": 2 }
        ]
    },
    {
        "title": "Sherlock Holmes",
        "nr": 2,
        "episodes": [
            { "title": "A Scandal in Belgravia", "nr": 1 },
            { "title": "The Hounds of Baskerville", "nr": 2 }
        ]
    }
])
The lookup query will do something like this
episode: { $in: [ episodes._id1, episodes._id2, ... ] }
From the docs
If the field holds an array, then the $in operator selects the
documents whose field holds an array that contains at least one
element that matches a value in the specified array (e.g. <value1>, <value2>, etc.)
So lookup will return all episodes when there is a match. You can then filter to keep only the one matching your review's episode.
So the query will look like
db.review.aggregate([
    {
        "$lookup": {
            "from": "series",
            "localField": "episode",
            "foreignField": "episodes._id",
            "as": "episode_entry"
        }
    },
    {
        "$addFields": {
            "episode_entry": [
                {
                    "$arrayElemAt": [
                        {
                            "$filter": {
                                "input": {
                                    "$let": {
                                        "vars": {
                                            "season": { "$arrayElemAt": ["$episode_entry", 0] }
                                        },
                                        "in": "$$season.episodes"
                                    }
                                },
                                "as": "result",
                                "cond": { "$eq": ["$$result._id", "$episode"] }
                            }
                        },
                        0
                    ]
                }
            ]
        }
    }
])
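The $addFields step is the subtle part, so here is a plain-JavaScript sketch of what it computes; the review and season data are hypothetical stand-ins (plain string ids rather than ObjectIds):

```javascript
// Plain-JS sketch of the $addFields stage: take the first looked-up season
// ($arrayElemAt [.., 0]), then keep only the episode whose _id matches the
// review's episode reference ($filter + $arrayElemAt).
const review = { episode: "ep2" };
const episodeEntry = [            // what $lookup put into episode_entry
  {
    title: "Sherlock Holmes",
    nr: 1,
    episodes: [
      { _id: "ep1", title: "A Study in Pink", nr: 1 },
      { _id: "ep2", title: "The Blind Banker", nr: 2 }
    ]
  }
];

const season = episodeEntry[0];                 // ~ $arrayElemAt ["$episode_entry", 0]
const matched = season.episodes
  .filter(e => e._id === review.episode)[0];    // ~ $filter, then element 0

console.log(matched.title); // "The Blind Banker"
```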

Turn _ids into Keys of New Object

I have a huge bunch of documents as such:
{
    _id: '1abc',
    colors: [
        { value: 'red', count: 2 },
        { value: 'blue', count: 3 }
    ]
},
{
    _id: '2abc',
    colors: [
        { value: 'red', count: 7 },
        { value: 'blue', count: 34 },
        { value: 'yellow', count: 12 }
    ]
}
Is it possible to make use of aggregate() to get the following?
{
    _id: 'null',
    colors: {
        "1abc": [
            { value: 'red', count: 2 },
            { value: 'blue', count: 3 }
        ],
        "2abc": [
            { value: 'red', count: 7 },
            { value: 'blue', count: 34 },
            { value: 'yellow', count: 12 }
        ]
    }
}
Basically, is it possible to turn all of the original documents' _ids into keys of a new object in the singular new aggregated document?
So far, when trying to use $group, I have not been able to use a variable value, e.g. $_id, on the left-hand side of an assignment. Am I missing something, or is it simply impossible?
I can do this easily using Javascript but it is unbearably slow. Hence why I am looking to see if it is possible using mongo native aggregate(), which will probably be faster.
If impossible... I would appreciate any kind suggestions that could point towards a sufficient alternative (change structure, etc.?). Thank you!
Like I said in the comments, whilst there are things you can do with the aggregation framework or even mapReduce to make the "server" reshape the response, it's kind of silly to do so.
Let's consider the cases:
Aggregate
db.collection.aggregate([
    { "$match": { "_id": { "$in": ["1abc", "2abc"] } } },
    { "$group": {
        "_id": null,
        "result": { "$push": "$$ROOT" }
    }},
    { "$project": {
        "colors": {
            "1abc": {
                "$arrayElemAt": [
                    { "$map": {
                        "input": {
                            "$filter": {
                                "input": "$result",
                                "as": "r",
                                "cond": { "$eq": ["$$r._id", "1abc"] }
                            }
                        },
                        "as": "r",
                        "in": "$$r.colors"
                    }},
                    0
                ]
            },
            "2abc": {
                "$arrayElemAt": [
                    { "$map": {
                        "input": {
                            "$filter": {
                                "input": "$result",
                                "as": "r",
                                "cond": { "$eq": ["$$r._id", "2abc"] }
                            }
                        },
                        "as": "r",
                        "in": "$$r.colors"
                    }},
                    0
                ]
            }
        }
    }}
])
So the aggregation framework purely does not dynamically generate "keys" of a document. If you want to process this way, then you need to know all of the "values" that you are going to use to make the keys in the result.
After putting everything into one document with $group, you can then work with the result array to extract data for your "keys". The basic operators here are:
$filter to get the matched element of the array for the "value" that you want.
$map to return just the specific property from the filtered array
$arrayElemAt to just grab the single element that was filtered out of the resulting mapped array
So it really isn't practical in a lot of cases, and the coding of the statement is fairly involved.
MapReduce
db.collection.mapReduce(
    function() {
        var obj = { "colors": {} };
        obj.colors[this._id] = this.colors;
        emit(null, obj);
    },
    function(key, values) {
        var obj = { "colors": {} };
        values.forEach(function(value) {
            Object.keys(value.colors).forEach(function(key) {
                obj.colors[key] = value.colors[key];
            });
        });
        return obj;
    },
    { "out": { "inline": 1 } }
)
Since it is actually written in a "language", you have the ability to loop over structures and "build things" in a more dynamic way.
However, close inspection should tell you that the "reducer" function here is not doing anything more than processing "all the results" that have been "stuffed into it" by each emitted document.
That means that "iterating the values" fed to the reducer is really no different to "iterating the cursor", and that leads to the next conclusion.
Cursor Iteration
var result = { "colors": {} };
db.collection.find().forEach(function(doc) {
    result.colors[doc._id] = doc.colors;
});
printjson(result)
The simplicity of this should really speak volumes. It is, after all, doing exactly what you are trying to "shoehorn" into a server operation and nothing more; it just simply "rolls up its sleeves" and gets on with the task at hand.
The key point here is none of the process requires any "aggregation" in a real sense, that cannot be equally achieved by simply iterating the cursor and building up the response document.
This is really why you always need to look at what you are doing and choose the right method. "Server side" aggregation has a primary task of "reducing" a result so you would not need to iterate a cursor. But nothing here "reduces" anything. It's just all of the data, transformed into a different format.
Therefore the simple approach for this type of "transform" is to just iterate the cursor and build up your transformed version of "all the results" anyway.

aggregating metrics data in mongodb

I'm trying to pull report data out of a realtime metrics system inspired by the NYC MUG/SimpleReach schema, and maybe my mind is still stuck in SQL mode.
The data is stored in a document like so...
{
    "_id": ObjectId("5209683b915288435894cb8b"),
    "account_id": 922,
    "project_id": 22492,
    "stats": {
        "2009": {
            "04": {
                "17": {
                    "10": { "sum": { "impressions": 11 } },
                    "11": { "sum": { "impressions": 603 } }
                }
            }
        }
    }
}
and I've been trying different variations of the aggregation pipeline with no success.
db.metrics.aggregate({
    $match: {
        'project_id': 22492
    }
}, {
    $group: {
        _id: "$project_id",
        'impressions': {
            //This works, but doesn't sum up the data...
            $sum: '$stats.2009.04.17.10.sum.impressions'
            /* none of these work:
            $sum: ['$stats.2009.04.17.10.sum.impressions',
                   '$stats.2009.04.17.11.sum.impressions']
            $sum: {'$stats.2009.04.17.10.sum.impressions',
                   '$stats.2009.04.17.11.sum.impressions'}
            $sum: '$stats.2009.04.17.10.sum.impressions',
                  '$stats.2009.04.17.11.sum.impressions'
            */
        }
    }
})
any help would be appreciated.
(ps. does anyone have any ideas on how to do date range searches using this document schema? )
$group is designed to be applied to many documents, but here we only have one matched document.
Instead, $project could be used to sum up specific fields, like this:
db.metrics.aggregate(
    { $match: {
        'project_id': 22492
    }},
    { $project: {
        'impressions': {
            $add: [
                '$stats.2009.04.17.10.sum.impressions',
                '$stats.2009.04.17.11.sum.impressions'
            ]
        }
    }}
)
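A plain-JavaScript sketch of what that $project/$add computes, using the hour buckets from the sample document:

```javascript
// Plain-JS sketch of the $project/$add stage: sum the two hour buckets
// reached by the dotted paths stats.2009.04.17.10 and .11.
const doc = {
  project_id: 22492,
  stats: {
    "2009": {
      "04": {
        "17": {
          "10": { sum: { impressions: 11 } },
          "11": { sum: { impressions: 603 } }
        }
      }
    }
  }
};

const day = doc.stats["2009"]["04"]["17"];
const impressions = day["10"].sum.impressions + day["11"].sum.impressions; // ~ $add

console.log(impressions); // 614
```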
I don't think there is an elegant way to do date range searches with this schema, because MongoDB operators/predicates are designed to be applied to values, rather than keys, in a document. If I understand correctly, the most interesting point in the slides you mentioned is to cache/pre-aggregate metrics when updating. That's a good idea, but it could be implemented with another schema. For example, using date and time fields with indexes, which are supported by MongoDB, might be a good choice for range searches. Even the aggregation framework supports date operations, giving more flexibility.