I have a document (name, firstName, age).
This command gives me the different names of the document:
db.getCollection('persons').distinct("name")
How do I get the corresponding firstNames?
Thanks!
You could try an aggregation query that groups by name and firstName. Optionally, you can add a count to see which combinations are repeated.
Here is an example:
db.test1.aggregate(
  [
    {
      $group: {_id: {name: "$name", firstName: "$firstName"}, count: {$sum: 1}}
    }
  ]
)
Here is another option, which collects the firstNames for each name into a list:
db.test1.aggregate(
  [
    {
      $group: {_id: {name: "$name"}, firstName: { $push: "$firstName" }}
    }
  ]
)
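To make the output shape of the second pipeline concrete, here is a minimal in-memory sketch in plain JavaScript. The sample documents and the helper name groupFirstNamesByName are made up for illustration; only the field names come from the question. Note that MongoDB does not guarantee the order of $group results, whereas this sketch keeps insertion order.

```javascript
// Hypothetical sample data; only the field names follow the question.
const persons = [
  { name: "Smith", firstName: "Anna", age: 31 },
  { name: "Smith", firstName: "Bob", age: 45 },
  { name: "Jones", firstName: "Cara", age: 27 },
];

// In-memory counterpart of
// {$group: {_id: {name: "$name"}, firstName: {$push: "$firstName"}}}.
function groupFirstNamesByName(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.name)) {
      groups.set(doc.name, { _id: { name: doc.name }, firstName: [] });
    }
    // $push appends every value, including repeated ones.
    groups.get(doc.name).firstName.push(doc.firstName);
  }
  return Array.from(groups.values());
}

console.log(JSON.stringify(groupFirstNamesByName(persons), null, 2));
```

Each result document carries one name in _id and the list of firstNames seen with it. If repeated firstNames should appear only once, $addToSet can be used in place of $push.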
Related
I have a collection of documents (grades) with some missing keys like this:
Name | ID | Github | Points
Peter | 1 | - | 123
- | 1 | PPane | 456
Alice | - | Alice1 | 234
- | 2 | Alice1 | 567
(- marks a missing key)
I want to group this data by matching any of Name, ID or Github together and collecting the points.
The result should look like this:
_id | Points
[Peter, 1, PPane] | [123, 456]
[Alice, 2, Alice1] | [234, 567]
Right now I am doing this in the backend like this:
const students = new Map<string, CourseStudent>();
const keys = ['Name', 'ID', 'Github'];
for (const grade of grades) {
  // Look for a student already registered under any of this grade's keys.
  let student: CourseStudent | undefined = undefined;
  for (const key of keys) {
    const value = grade[key];
    if (value && (student = students.get(value))) {
      break;
    }
  }
  // No match: start a new student from this grade.
  if (!student) {
    const {Name, ID, Github} = grade;
    student = {_id: {Name, ID, Github}, points: []};
  }
  // Register the student under every key value present on this grade.
  for (const key of keys) {
    const value = grade[key];
    if (value) {
      students.set(value, student);
    }
  }
  student.points.push(grade.points);
}
return Array.from(students.values());
The data sizes in my use case are 1000-10000 grades (100-1000 students x 10 assignments).
The actual "grade" data contains a lot more fields, most of which are not used for the final result, but keeping all of it in memory can be costly.
Is there a way to achieve this in the database with an aggregation pipeline, e.g. using $group?
To start with, here is a non-working aggregation, because it requires ALL fields to match instead of just one:
{$group: {_id: ['$Name', '$ID', '$Github'], points: {$push: '$Points'}}},
Since you have only 3 keys, grouping by one of them and collecting the other two leaves at most one key missing:
1. $group by ID and collect the other keys. Assuming each user has at least one document containing the user's ID (as in your example), this step produces one document per user, plus one extra document that gathers all documents without an ID. Each user document contains the user's ID and a Name or Github.
2. $match to keep only the user documents.
3. For each user document, fetch all matching original documents with $lookup; the keys collected in step 1 are what we match on.
4. Group and format the results.
db.collection.aggregate([
  {$group: {_id: "$ID", Name: {$first: "$Name"}, Github: {$first: "$Github"}}},
  {$match: {_id: {$ne: null}}},
  {$lookup: {
      from: "collection",
      let: {github: "$Github", iD: "$_id"},
      pipeline: [
        {$match: {
            $expr: {$or: [
                {$eq: ["$$github", "$Github"]},
                {$eq: ["$$iD", "$ID"]}
            ]}
        }},
        {$group: {
            _id: 0,
            Name: {$addToSet: "$Name"},
            Github: {$addToSet: "$Github"},
            ID: {$addToSet: "$ID"},
            Points: {$push: "$Points"}
        }}
      ],
      as: "docs"
    }
  },
  {$replaceRoot: {newRoot: {$first: "$docs"}}},
  {$project: {
      _id: [{$first: "$Name"}, {$first: "$ID"}, {$first: "$Github"}],
      Points: 1
  }}
])
See how it works on the playground example
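For checking a pipeline's output against the expected table from the question, an in-memory reference can be handy. The following is a minimal sketch in plain JavaScript, not a translation of the pipeline above: it links documents that share any key value using a small union-find. The function name groupByAnyKey is hypothetical, the sample rows are the four from the question, and, as an assumption, values are only compared within the same field (a Name is never matched against a Github).

```javascript
// The four sample rows from the question; missing keys are simply absent.
const grades = [
  { Name: "Peter", ID: 1, Points: 123 },
  { ID: 1, Github: "PPane", Points: 456 },
  { Name: "Alice", Github: "Alice1", Points: 234 },
  { ID: 2, Github: "Alice1", Points: 567 },
];
const KEYS = ["Name", "ID", "Github"];

function groupByAnyKey(docs) {
  // Union-find over document indexes.
  const parent = docs.map((_, i) => i);
  const find = (i) => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving
      i = parent[i];
    }
    return i;
  };

  // Link every document to the first document seen with the same key value.
  const firstSeen = new Map(); // "Field:value" -> document index
  docs.forEach((doc, i) => {
    for (const key of KEYS) {
      if (doc[key] == null) continue; // skip missing keys
      const tag = key + ":" + doc[key];
      if (firstSeen.has(tag)) parent[find(i)] = find(firstSeen.get(tag));
      else firstSeen.set(tag, i);
    }
  });

  // Collect keys and points per connected component.
  const groups = new Map();
  docs.forEach((doc, i) => {
    const root = find(i);
    if (!groups.has(root)) groups.set(root, { ids: {}, Points: [] });
    const group = groups.get(root);
    for (const key of KEYS) {
      if (doc[key] != null && group.ids[key] === undefined) group.ids[key] = doc[key];
    }
    group.Points.push(doc.Points);
  });

  return Array.from(groups.values(), (g) => ({
    _id: KEYS.map((key) => g.ids[key]),
    Points: g.Points,
  }));
}

console.log(JSON.stringify(groupByAnyKey(grades)));
```

On the sample rows this yields the two groups from the expected table. Unlike the pipeline, it also follows chains of links through any of the three keys, so results can differ on data where a user is only connected through Name.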
I have a database of about 50k "company" records.
I want to find duplicates by matching:
1. name and street fields
OR
2. phone field
(I consider both #1 and #2 unique identifiers, so either can be used to find duplicates.)
I am able to write the $group statement to match based on #1:
_id: {
  name: '$name',
  street: '$street'
},
uniqueIds: {
  $addToSet: '$_id'
},
count: {
  $sum: 1
}
I tried something like this to match one or the other:
_id: {
  $or: [
    {name: '$name', street: '$street'},
    {phone: '$phone'}
  ]
}...
But that just returns a boolean.
How can I group by either #1 or #2 above in the same aggregation?
One option is to use $facet:
db.company.aggregate([
{ $facet:{
by_name_street:[ {$group:{ _id:{n:"$name",str:"$street" }, cnt:{$sum:1} }} ] ,
by_phone:[ {$group:{ _id:"$phone" , cnt:{$sum:1} }} ]
} }
])
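The $facet output contains every group, including those with a count of 1, so duplicates still have to be filtered out. As a rough in-memory illustration of what the two branches compute once only repeated keys are kept, here is a plain JavaScript sketch. The sample records and the helper duplicatesBy are hypothetical; in the pipeline itself, the corresponding step would presumably be the uniqueIds: {$addToSet: '$_id'} accumulator from the question plus a {$match: {cnt: {$gt: 1}}} stage at the end of each branch.

```javascript
// Hypothetical sample records; only the field names follow the question.
const companies = [
  { _id: 1, name: "Acme", street: "1 Main St", phone: "555-0100" },
  { _id: 2, name: "Acme", street: "1 Main St", phone: "555-0199" },
  { _id: 3, name: "Acme Corp", street: "9 Side Rd", phone: "555-0100" },
  { _id: 4, name: "Zenith", street: "2 High St", phone: "555-0142" },
];

// Group documents by a derived key and keep only keys shared by 2+ documents.
function duplicatesBy(docs, keyOf) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc._id);
  }
  return Array.from(groups.entries())
    .filter(([, ids]) => ids.length > 1)
    .map(([key, ids]) => ({ key, ids }));
}

// One independent grouping per criterion, mirroring the two $facet branches.
const report = {
  by_name_street: duplicatesBy(companies, (c) => JSON.stringify([c.name, c.street])),
  by_phone: duplicatesBy(companies, (c) => c.phone),
};

console.log(JSON.stringify(report, null, 2));
```

Here records 1 and 2 are reported under by_name_street and records 1 and 3 under by_phone. The two lists are independent, so merging them into one set of duplicates is a separate step.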
Following this question's answer (https://stackoverflow.com/a/20817040/2656506), I was able to group a field based on its first character with this command:
db.kits.aggregate({ $group: {_id: {$substr: ['$kit', 0, 1]}, count: {$sum: 1}}})
But I can't figure out how I can additionally group only those documents which match an additional condition like _id: 'abc' in the same query. Can it be done in one query?
Thanks in advance!
Add a $match pipeline stage before the $group stage in your aggregation query:
db.kits.aggregate(
  [
    {
      $match: {
        _id: 'abc'
      }
    },
    {
      $group: {
        _id: {
          $substr: ['$kit', 0, 1]
        },
        count: {$sum: 1}
      }
    }
  ]
)
I have a collection comments. Each comment has an authorId.
I want to group the comments collection into 'threads' by the authorId, and attach the 5 most recent comments.
So far I have tried this:
db.comments.aggregate([{$group: { _id: "$authorId", recentComments: { $push: "$$ROOT"} }}])
But this attaches all comments. I then tried to add a limit like this:
db.comments.aggregate([{$group: { _id: "$authorId", recentComments: { $push: "$$ROOT"} }}, {$limit: 5}])
But this doesn't limit the number of comments in each group; it limits the number of groups instead.
Any ideas?
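Adding a $project stage with $slice worked for me:
db.comments.aggregate([
  {$group: { _id: "$authorId", recentComments: { $push: "$$ROOT"} }},
  // $slice keeps the first 5 array elements; sort before $group if they must be the most recent ones.
  {$project: {_id: 1, title: 1, recentComments: {$slice: ['$recentComments', 0, 5]}}}
])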
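Because $slice only keeps the first 5 elements of the pushed array, the result is "most recent" only if the comments reach $group in newest-first order. The plain JavaScript sketch below illustrates that sort, group, slice sequence in memory. The createdAt field, the sample comments and the helper recentCommentsByAuthor are assumptions for illustration; the question does not say which field holds the comment date.

```javascript
// Hypothetical sample comments; `createdAt` is an assumed field name.
const comments = [
  { authorId: "a", text: "a-old", createdAt: 1 },
  { authorId: "b", text: "b-old", createdAt: 2 },
  { authorId: "a", text: "a-mid", createdAt: 3 },
  { authorId: "b", text: "b-new", createdAt: 4 },
  { authorId: "a", text: "a-new", createdAt: 5 },
];

function recentCommentsByAuthor(docs, limit) {
  // Newest first, like a {$sort: {createdAt: -1}} stage placed before $group.
  const sorted = [...docs].sort((x, y) => y.createdAt - x.createdAt);

  // Like {$group: {_id: "$authorId", recentComments: {$push: "$$ROOT"}}}.
  const threads = new Map();
  for (const doc of sorted) {
    if (!threads.has(doc.authorId)) threads.set(doc.authorId, []);
    threads.get(doc.authorId).push(doc);
  }

  // Like {$slice: ["$recentComments", 0, limit]}: keep the first `limit` entries.
  return Array.from(threads, ([authorId, list]) => ({
    _id: authorId,
    recentComments: list.slice(0, limit),
  }));
}

console.log(JSON.stringify(recentCommentsByAuthor(comments, 2), null, 2));
```

With a limit of 2, author "a" keeps the comments from times 5 and 3, and author "b" keeps those from times 4 and 2. The limit applies per author, not to the number of authors.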
I have a list of students in one collection and their grades in another collection. The schema (stripped of other details) looks like this:
Students
{
  _id: 1234,
  student: {
    code: "WUKD984KUK"
  }
}
Grades
{
  _id: 3456,
  grade: 24,
  studentCode: "WUKD984KUK"
}
There can be multiple grade entries for each student. I need the total number of students who are present in the grades collection but not in the students collection. I also need the number of grades for each of those students. This is the query I wrote:
var existingStudents = db.students.find({}, {_id: 0, 'student.code': 1});
db.grades.aggregate(
{$match: { 'studentCode': {'$nin': existingStudents}}},
{$group: {_id: '$studentCode', count:{$sum: 1}}},
{$project: {tmp: {code: '$_id', count: '$count'}}},
{$group: {_id: null, total:{$sum:1}, data:{$addToSet: '$tmp'}}}
);
But this returns all of the students' details, as if the match were not working. When I run just the first part of this query, I get student details like this:
{ "student" : { "code" : "A210225504104" } }
I suspect the match isn't working because the returned value is nested two levels deep. What is the right way to do this?
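find() returns a cursor of documents such as { student: { code: ... } }, not an array of codes, so $nin never matches a studentCode string. Collect the codes into a plain array first and pass that array to $nin:
var existingStudents = [];
// Flatten each { student: { code } } document into its code string.
db.students.find({}, {_id: 0, 'student.code': 1}).forEach(function (doc) {
  existingStudents.push(doc.student.code);
});
db.grades.aggregate([
  {$match: { 'studentCode': {'$nin': existingStudents}}},
  {$group: {_id: '$studentCode', count: {$sum: 1}}},
  {$project: {tmp: {code: '$_id', count: '$count'}}},
  {$group: {_id: null, total: {$sum: 1}, data: {$addToSet: '$tmp'}}}
]);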
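The difference between passing nested documents and passing plain code strings can be reproduced in memory. The following plain JavaScript sketch is only an illustration: the sample documents and the helper countOrphanGrades are hypothetical, and a Set lookup stands in for $nin.

```javascript
// Hypothetical documents shaped like the projection result in the question.
const studentDocs = [
  { student: { code: "WUKD984KUK" } },
  { student: { code: "A210225504104" } },
];
const grades = [
  { _id: 3456, grade: 24, studentCode: "WUKD984KUK" },
  { _id: 3457, grade: 18, studentCode: "UNKNOWN01" },
  { _id: 3458, grade: 30, studentCode: "UNKNOWN01" },
  { _id: 3459, grade: 12, studentCode: "UNKNOWN02" },
];

// Count grades per studentCode for codes that are not in `knownCodes`.
function countOrphanGrades(gradeDocs, knownCodes) {
  const known = new Set(knownCodes);
  const counts = new Map();
  for (const g of gradeDocs) {
    if (known.has(g.studentCode)) continue; // stands in for $nin
    counts.set(g.studentCode, (counts.get(g.studentCode) || 0) + 1);
  }
  const data = Array.from(counts, ([code, count]) => ({ code, count }));
  return { total: data.length, data };
}

// Nested documents never equal a code string, so nothing is excluded.
const withNestedDocs = countOrphanGrades(grades, studentDocs);

// Flattened to plain strings, the known student is excluded as intended.
const existingCodes = studentDocs.map((doc) => doc.student.code);
const withPlainCodes = countOrphanGrades(grades, existingCodes);

console.log(withNestedDocs.total, withPlainCodes.total);
```

With the nested documents all three student codes are reported, which mirrors the behaviour described in the question. With the flattened codes only the two unknown codes remain, with counts of 2 and 1.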