I have a list of students in one collection and their grades in another collection. The schemas (stripped of other details) look like this:
Students
{
    _id: 1234,
    student: {
        code: "WUKD984KUK"
    }
}
Grades
{
    _id: 3456,
    grade: 24,
    studentCode: "WUKD984KUK"
}
There can be multiple grade entries for each student. I need a total count of students who are present in the grades collection but not in the students collection, and I also need a count of grades for each such student. The following is the query I have written:
var existingStudents = db.students.find({}, {_id: 0, 'student.code': 1});
db.grades.aggregate(
    {$match: {'studentCode': {'$nin': existingStudents}}},
    {$group: {_id: '$studentCode', count: {$sum: 1}}},
    {$project: {tmp: {code: '$_id', count: '$count'}}},
    {$group: {_id: null, total: {$sum: 1}, data: {$addToSet: '$tmp'}}}
);
But this returns all of the student details, as if the $match is not working. When I run just the first part of this query, I get the student details as
{ "student" : { "code" : "A210225504104" } }
I suspect the match isn't working because the returned value is nested two levels deep. What is the right way to do this?
The problem is that find() returns a cursor of documents shaped like {student: {code: ...}}, not an array of plain code strings, so $nin has nothing to compare studentCode against. Build the array of code strings first and pass that to $nin:
var existingStudents = [];
// Collect the plain code strings so $nin can compare studentCode against them
db.students.find({}, {_id: 0, 'student.code': 1}).forEach(function (doc) {
    existingStudents.push(doc.student.code);
});
db.grades.aggregate(
    {$match: {'studentCode': {'$nin': existingStudents}}},
    // Count grades per unknown student code
    {$group: {_id: '$studentCode', count: {$sum: 1}}},
    {$project: {tmp: {code: '$_id', count: '$count'}}},
    // Total number of unknown students plus the per-student counts
    {$group: {_id: null, total: {$sum: 1}, data: {$addToSet: '$tmp'}}}
);
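Alternatively, since the codes already have to fit in memory for $nin to work, you could build the same array in one step with distinct, which is just a shorter equivalent of the forEach loop above:
var existingStudents = db.students.distinct('student.code');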
I have a collection of documents (grades) with some missing keys like this:
Name  | ID | Github | Points
------|----|--------|-------
Peter | 1  |        | 123
      | 1  | PPane  | 456
Alice |    | Alice1 | 234
      | 2  | Alice1 | 567
I want to group this data by matching any of Name, ID or Github together and collecting the points.
The result should look like this:
_id                | Points
-------------------|-----------
[Peter, 1, PPane]  | [123, 456]
[Alice, 2, Alice1] | [234, 567]
Right now I am doing this in the backend like this:
const students = new Map<string, CourseStudent>();
const keys = ['Name', 'ID', 'Github'];
for (const grade of grades) {
    let student: CourseStudent | undefined = undefined;
    for (const key of keys) {
        const value = grade[key];
        if (value && (student = students.get(value))) {
            break;
        }
    }
    if (!student) {
        const {Name, ID, Github} = grade;
        student = {_id: {Name, ID, Github}, points: []};
    }
    for (const key of keys) {
        const value = grade[key];
        if (value) {
            students.set(value, student);
        }
    }
    student.points.push(grade.points);
}
return Array.from(students.values());
The data sizes in my use case are 1000-10000 grades (100-1000 students x 10 assignments).
The actual "grade" data contains a lot more fields, most of which are not used for the final result, but keeping all of it in memory can be costly.
Is there a way to achieve this in the database with an aggregation pipeline, e.g. using $group?
To start with, here is a non-working aggregation, because it requires ALL fields to match instead of just one:
{$group: {_id: ['$Name', '$ID', '$Github'], points: {$push: '$Points'}}},
Since there are only 3 keys, grouping by one of them and collecting the other two means at most one key can be missing:
1. $group by ID and collect the other keys. Assuming at least one document per user contains that user's ID (as in your example), this step produces one document per user plus one extra document (with _id: null) covering all documents that have no ID.
2. $match to keep only the per-user documents.
3. For each user document, fetch all matching original documents with $lookup; this gives us all the data needed for the result.
4. Group and format the results.
db.collection.aggregate([
    {$group: {_id: "$ID", Name: {$first: "$Name"}, Github: {$first: "$Github"}}},
    {$match: {_id: {$ne: null}}},
    {$lookup: {
        from: "collection",
        let: {github: "$Github", iD: "$_id"},
        pipeline: [
            {$match: {
                $expr: {$or: [
                    {$eq: ["$$github", "$Github"]},
                    {$eq: ["$$iD", "$ID"]}
                ]}
            }},
            {$group: {
                _id: 0,
                Name: {$addToSet: "$Name"},
                Github: {$addToSet: "$Github"},
                ID: {$addToSet: "$ID"},
                Points: {$push: "$Points"}
            }}
        ],
        as: "docs"
    }},
    {$replaceRoot: {newRoot: {$first: "$docs"}}},
    {$project: {
        _id: [{$first: "$Name"}, {$first: "$ID"}, {$first: "$Github"}],
        Points: 1
    }}
])
See how it works on the playground example
I have a table Thread:
{
    userId: String,
    messageId: String
}
Now I have an array of userIds and I need to query 20 messageIds for each of them. I can do it with a loop:
const messageIds = {}
for (const userId of userIds) {
const results = await Thread.find({ userId }).sort({ _id: -1 }).limit(20).exec()
messageIds[userId] = results.map(result => result.messageId)
}
But of course this doesn't perform well. Is there a better solution?
The problem with your approach is that you issue a separate query to MongoDB for every user.
The simplest workaround is the $push and $slice approach, but it has the problem that the intermediate step would create a huge array per group.
Another way is to use $facet as part of an aggregation query.
So you need a $facet stage in the aggregation like this:
[
    {$facet: {
        'userId1': [
            {$match: {userId: 'userId1'}},
            {$limit: 20},
            {$group: {_id: '', msg: {$push: '$messageId'}}}
        ],
        'userId2': [
            {$match: {userId: 'userId2'}},
            {$limit: 20},
            {$group: {_id: '', msg: {$push: '$messageId'}}}
        ],
        .... (for each userId in array)
    }}
]
You can easily generate this query by iterating over the list of users and adding a key for each one; a sketch follows below.
You end up with an object where the key is the userId and the value is the sub-pipeline output, so the messages for a user are at obj[userId][0].msg.
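A minimal sketch of generating that stage, assuming the userIds array and the Thread model from the question:
// Build one $facet sub-pipeline per userId, each returning at most 20 messageIds.
// Add {$sort: {_id: -1}} before $limit if, as in the original loop, you need the newest 20.
const facetStage = {$facet: {}};
for (const userId of userIds) {
    facetStage.$facet[userId] = [
        {$match: {userId: userId}},
        {$limit: 20},
        {$group: {_id: '', msg: {$push: '$messageId'}}}
    ];
}
const results = await Thread.aggregate([facetStage]).exec();
// $facet returns a single document: results[0][userId][0].msg holds the messageIds
// (the inner array is empty for users with no threads).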
You can use aggregation to group threads by userId, and return the top 20:
db.threads.aggregate([
{$match: {userId:{$in: userIds}}},
{$sort: {_id: -1}},
{$group: {_id: "$userId", threads: {$push: "$$ROOT"}}},
{$project: {_id:0, userId:"$_id", threads: {$slice:["$threads", 20]}}}
])
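To get back the same shape as the original messageIds object, you could post-process the result. A sketch, assuming the Mongoose Thread model and userIds array from the question:
const docs = await Thread.aggregate([
    {$match: {userId: {$in: userIds}}},
    {$sort: {_id: -1}},
    {$group: {_id: "$userId", threads: {$push: "$$ROOT"}}},
    {$project: {_id: 0, userId: "$_id", threads: {$slice: ["$threads", 20]}}}
]).exec();
// One document per userId; map each group's threads back to their messageIds.
const messageIds = {};
for (const doc of docs) {
    messageIds[doc.userId] = doc.threads.map(thread => thread.messageId);
}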
I have a collection of documents with the fields (name, firstName, age).
This command gives me the distinct names in the collection:
db.getCollection('persons').distinct("name")
How do I get the corresponding firstNames?
Thanks!
You could try an aggregation query that groups by name and firstName. Optionally, you could also add a count to see which combinations are repeated (but that is not necessary).
Here is an example:
db.test1.aggregate(
    [
        {
            $group: {_id: {name: "$name", firstName: "$firstName"}, count: {$sum: 1}}
        }
    ]
)
Here is another option, which shows an aggregated list of firstNames per name:
db.test1.aggregate(
    [
        {
            $group: {_id: {name: "$name"}, firstName: {$push: "$firstName"}}
        }
    ]
)
Following this question's answer (https://stackoverflow.com/a/20817040/2656506) I was able to group a field based on its first character with this command:
db.kits.aggregate({ $group: {_id: {$substr: ['$kit', 0, 1]}, count: {$sum: 1}}})
But I can't figure out how to additionally restrict the grouping to only those documents that match another condition, such as _id: 'abc', in the same query. Can it be done in one query?
Thanks in advance!
Add a $match pipeline stage to your aggregation query:
db.kits.aggregate(
    [
        {
            $match: {
                _id: 'abc'
            }
        },
        {
            $group: {
                _id: {
                    $substr: ['$kit', 0, 1]
                },
                count: {$sum: 1}
            }
        }
    ]
)
I have a collection comments. Each comment has an authorId.
I want to group the comments collection into 'threads' by the authorId, and attach the 5 most recent comments.
So far I have tried this:
db.comments.aggregate([{$group: { _id: "$authorId", recentComments: { $push: "$$ROOT"} }}])
But this attaches all comments. I then tried to add a limit like this:
db.comments.aggregate([{$group: { _id: "$authorId", recentComments: { $push: "$$ROOT"} }}, {$limit: 5}])
But this doesn't limit the number of comments within each group; it limits the number of groups instead.
Any ideas?
Adding a project worked for me...
db.comments.aggregate([
    {$group: {_id: "$authorId", recentComments: {$push: "$$ROOT"}}},
    {$project: {_id: 1, title: 1, recentComments: {$slice: ['$recentComments', 0, 5]}}}
])
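One caveat: $push collects documents in whatever order they reach the $group stage, so to make the five kept comments genuinely the most recent you would normally sort first. A sketch, assuming _id order reflects creation time:
db.comments.aggregate([
    // Newest first, so $push accumulates each author's comments in recency order
    {$sort: {_id: -1}},
    {$group: {_id: "$authorId", recentComments: {$push: "$$ROOT"}}},
    // Keep only the first (most recent) 5 per author
    {$project: {_id: 1, recentComments: {$slice: ["$recentComments", 5]}}}
])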