I have a collection of documents (grades) with some missing keys like this:
Name   ID  Github  Points
Peter  1   -       123
-      1   PPane   456
Alice  -   Alice1  234
-      2   Alice1  567
(- means the key is missing from the document)
I want to group this data by matching any of Name, ID or Github together and collecting the points.
The result should look like this:
_id                 Points
[Peter, 1, PPane]   [123, 456]
[Alice, 2, Alice1]  [234, 567]
Right now I am doing this in the backend like this:
const students = new Map<string, CourseStudent>();
const keys = ['Name', 'ID', 'Github'];
for (const grade of grades) {
  // Look for a student already registered under any of this grade's keys.
  let student: CourseStudent | undefined = undefined;
  for (const key of keys) {
    const value = grade[key];
    if (value && (student = students.get(value))) {
      break;
    }
  }
  if (!student) {
    const {Name, ID, Github} = grade;
    student = {_id: {Name, ID, Github}, points: []};
  }
  // Register the student under every key this grade carries.
  for (const key of keys) {
    const value = grade[key];
    if (value) {
      students.set(value, student);
    }
  }
  student.points.push(grade.Points);
}
// The same student is stored under several keys, so de-duplicate the values.
return Array.from(new Set(students.values()));
The data sizes in my use case are 1000-10000 grades (100-1000 students x 10 assignments).
The actual "grade" data contains a lot more fields, most of which are not used for the final result, but keeping all of it in memory can be costly.
Is there a way to achieve this in the database with an aggregation pipeline, e.g. using $group?
To start with, here is a non-working aggregation, because it requires ALL fields to match instead of just one:
{$group: {_id: ['$Name', '$ID', '$Github'], points: {$push: '$Points'}}},
Since there are only three keys, grouping by one of them and collecting the other two leaves at most one key missing per group:
1. $group by ID and collect the other keys. Assuming each user has at least one document that contains the user's ID (as in your example), this step produces one document per user plus one extra. Each user document holds the user's ID together with a Name or Github; the extra document collects all documents that have no ID.
2. $match to keep only the user documents.
3. For each user document, use $lookup to fetch all matching original documents, now that we have the keys needed to find them.
4. $group and format the results.
db.collection.aggregate([
  {$group: {_id: "$ID", Name: {$first: "$Name"}, Github: {$first: "$Github"}}},
  {$match: {_id: {$ne: null}}},
  {$lookup: {
      from: "collection",
      let: {github: "$Github", iD: "$_id"},
      pipeline: [
        {$match: {
            $expr: {$or: [
                {$eq: ["$$github", "$Github"]},
                {$eq: ["$$iD", "$ID"]}
            ]}
        }},
        {$group: {
            _id: 0,
            Name: {$addToSet: "$Name"},
            Github: {$addToSet: "$Github"},
            ID: {$addToSet: "$ID"},
            Points: {$push: "$Points"}
        }}
      ],
      as: "docs"
    }
  },
  {$replaceRoot: {newRoot: {$first: "$docs"}}},
  {$project: {
      _id: [{$first: "$Name"}, {$first: "$ID"}, {$first: "$Github"}],
      Points: 1
  }}
])
See how it works on the playground example
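To make the four steps easier to follow without a database, here is a minimal in-memory sketch of the same idea in plain JavaScript. It is not the pipeline itself: the `grades` array and the `groupByAnyKey` helper are hypothetical names, the sample data is copied from the question, and a missing key is modelled as an absent property. Like the pipeline, it seeds one group per ID and performs a single matching hop, so a student with no ID-bearing document would be dropped.

```javascript
// Hypothetical sample data mirroring the table in the question.
const grades = [
  { Name: "Peter", ID: 1, Points: 123 },
  { ID: 1, Github: "PPane", Points: 456 },
  { Name: "Alice", Github: "Alice1", Points: 234 },
  { ID: 2, Github: "Alice1", Points: 567 },
];

function groupByAnyKey(docs) {
  // Steps 1-2: one seed per non-missing ID, keeping the Github value of the
  // first document seen for that ID (it may be undefined).
  const seeds = new Map();
  for (const doc of docs) {
    if (doc.ID != null && !seeds.has(doc.ID)) {
      seeds.set(doc.ID, doc.Github);
    }
  }

  // Steps 3-4: for each seed, collect every document sharing its ID or Github.
  const result = [];
  for (const [id, github] of seeds) {
    const matches = docs.filter(
      (doc) => doc.ID === id || (github != null && doc.Github === github)
    );
    const firstDefined = (key) =>
      matches.map((doc) => doc[key]).find((value) => value != null);
    result.push({
      _id: [firstDefined("Name"), firstDefined("ID"), firstDefined("Github")],
      Points: matches.map((doc) => doc.Points),
    });
  }
  return result;
}

console.log(JSON.stringify(groupByAnyKey(grades)));
```

With the sample data this yields one entry per student, each carrying the merged keys and the collected points.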
Related
How to get one random row at a time from one collection after looking up entries in another collection?
Users collection
1) _id: abc, name: abc, group: 1
2) _id: xyz, name: xyyy, group: 3
3) _id: 123, name: yyy, group: 1
4) _id: rrr, name: tttt, group: 1
5) _id: eee, name: uuu, group: 1
Partnership Collection
1) _id: abc_123, fromUser: abc, toUser: 123
I need a Mongo query that finds a random user from the users collection where:
1) _id is not req.query.Id (for example, not abc), AND
2) group matches the group of the user req.query.Id, AND
3) no entry with both users already exists in the Partnership collection. In the example above, user 123 is ignored because it is already in the Partnership collection together with abc, either as fromUser or toUser.
I started a query but need some help to proceed:
const users = await req.db
  .collection('users')
  .aggregate([
    {
      $match: {
        group: group // group of req.user._id
      },
    },
    {
      $lookup: {
        from: "partnership",
        let: {
          userId: "$fromUser"
        },
        as: "userDetails",
        pipeline: [
          {
            $match: {
              $expr: {
                $eq: [
                  "$$userId",
                  "$_id"
                ],
              }
            }
          },....
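One option to continue is:
1. In the $lookup, fetch only the partnership documents that contain both users, i.e. the forbidden connections.
2. $match only users with no forbidden connection. Now only valid candidates remain.
3. Use $rand to pick a random document. ({$sample: {size: 1}} is a shorter alternative. Its optimized pseudo-random cursor applies only when $sample is the first pipeline stage, the collection has more than 100 documents, and the sample size is under 5% of the collection, so here it would also end up doing a random sort.)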
db.users.aggregate([
  {$match: {
      group: group,
      _id: {$ne: user_id}
    }
  },
  {$lookup: {
      from: "partnership",
      let: {userId: "$_id"},
      as: "prob",
      pipeline: [
        {$set: {users: ["$fromUser", "$toUser"]}},
        {$match: {
            $expr: {
              $and: [{$in: ["$$userId", "$users"]}, {$in: [user_id, "$users"]}]
            }
          }
        }
      ]
    }
  },
  {$match: {"prob.0": {$exists: false}}},
  {$set: {prob: {$rand: {}}}},
  {$sort: {prob: 1}},
  {$limit: 1},
  {$unset: "prob"}
])
See how it works on the playground example
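The three steps can also be sketched in memory, which may help when checking the filter logic against test data. This is only an illustration under assumptions: `users`, `partnerships`, `eligiblePartners`, and `pickRandom` are hypothetical names, and the arrays are small copies of the collections in the question.

```javascript
// Hypothetical sample data mirroring the two collections in the question.
const users = [
  { _id: "abc", group: 1 },
  { _id: "xyz", group: 3 },
  { _id: "123", group: 1 },
  { _id: "rrr", group: 1 },
  { _id: "eee", group: 1 },
];
const partnerships = [{ _id: "abc_123", fromUser: "abc", toUser: "123" }];

// Steps 1-2: same group, not the requesting user, and no partnership
// document that contains both users (in either direction).
function eligiblePartners(allUsers, allPartnerships, userId) {
  const me = allUsers.find((user) => user._id === userId);
  if (!me) return [];
  return allUsers.filter(
    (user) =>
      user._id !== userId &&
      user.group === me.group &&
      !allPartnerships.some((p) => {
        const pair = [p.fromUser, p.toUser];
        return pair.includes(user._id) && pair.includes(userId);
      })
  );
}

// Step 3: pick one candidate at random; `random` is injectable for testing.
function pickRandom(items, random = Math.random) {
  if (items.length === 0) return undefined;
  return items[Math.floor(random() * items.length)];
}

console.log(pickRandom(eligiblePartners(users, partnerships, "abc")));
```

For user abc the candidates are rrr and eee: xyz is in another group and 123 already has a partnership with abc.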
For example:
animals = ['cat','mat','rat'];
The collection contains only 'cat' and 'mat'.
I want the query to return 'rat', which is not in the collection.
collection contains
[
  {
    _id:objectid,
    animal:'cat'
  },
  {
    _id:objectid,
    animal:'mat'
  }
]
db.collection.find({'animal':{$nin:animals}})
(or)
db.collection.find({'animal':{$nin:['cat','mat','rat']}})
EDIT:
One option is:
1. Use $facet to $group all existing values into a set. Using $facet lets the pipeline continue even if the collection is empty, as @leoll2 mentioned.
2. $project with $cond to handle both cases: with or without data.
3. Find the set difference.
db.collection.aggregate([
  {$facet: {data: [{$group: {_id: 0, animals: {$addToSet: "$animal"}}}]}},
  {$project: {
      data: {
        $cond: [{$gt: [{$size: "$data"}, 0]}, {$first: "$data"}, {animals: []}]
      }
  }},
  {$project: {data: "$data.animals"}},
  {$project: {_id: 0, missing: {$setDifference: [animals, "$data"]}}}
])
See how it works on the playground example - with data or playground example - without data
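As a cross-check of what the pipeline is expected to return, the same set difference can be sketched in plain JavaScript. The `missingValues` helper and the sample documents are hypothetical; the sketch also covers the empty-collection case that the $facet and $cond stages handle.

```javascript
// Returns the wanted values that no document carries in `field`.
// Duplicates in `wanted` are dropped, as a set difference would do.
function missingValues(wanted, docs, field) {
  const existing = new Set(docs.map((doc) => doc[field]));
  return [...new Set(wanted)].filter((value) => !existing.has(value));
}

const animals = ["cat", "mat", "rat"];
const docs = [{ animal: "cat" }, { animal: "mat" }];

console.log(missingValues(animals, docs, "animal"));
console.log(missingValues(animals, [], "animal"));
```

The first call reports only 'rat'; the second, run against an empty collection, reports all three values.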
I have a table Thread:
{
  userId: String,
  messageId: String
}
Now I have an array of userIds, and I need to query 20 messageIds for each of them. I can do it with a loop:
const messageIds = {}
for (const userId of userIds) {
  const results = await Thread.find({ userId }).sort({ _id: -1 }).limit(20).exec()
  messageIds[userId] = results.map(result => result.messageId)
}
But of course this doesn't perform well. Is there a better solution?
The problem with your approach is that you are issuing a separate query to MongoDB for each user.
The simplest workaround is the $push and $slice approach, but it has the drawback that the intermediate step creates a potentially huge array per user.
Another way is to use $facet as part of an aggregation query.
You need a $facet stage in the aggregation like this:
[
  {$facet: {
    'userId1': [
      {$match: {userId: 'userId1'} },
      {$limit: 20},
      {$group: {_id: '', msg: {$push: '$messageId'} } }
    ],
    'userId2': [
      {$match: {userId: 'userId2'} },
      {$limit: 20},
      {$group: {_id: '', msg: {$push: '$messageId'} } }
    ],
    .... (for each userId in array)
  }}
]
You can generate this query by iterating over the list of users and adding one key per user.
The result is a single document whose keys are the userIds. Each value is an array holding one grouped document, so the messages are at obj[userId][0].msg (the array is empty for a user with no threads).
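Generating the $facet stage from the array of user ids could look like the sketch below. `buildFacetPipeline` is a hypothetical helper, not part of any driver. Two things in it go beyond the answer and are assumptions: a $sort on _id is added so that each user's 20 entries are the newest ones, as in the question's loop, and a leading $match with $in is added because $facet sub-pipelines cannot use indexes. User ids are used directly as output field names, so this assumes they contain no dots and do not start with $.

```javascript
// Builds an aggregation pipeline with one $facet branch per user id.
function buildFacetPipeline(userIds, limit = 20) {
  const facets = {};
  for (const userId of userIds) {
    facets[userId] = [
      { $match: { userId } },
      { $sort: { _id: -1 } }, // assumption: newest first, as in the question
      { $limit: limit },
      { $group: { _id: "", msg: { $push: "$messageId" } } },
    ];
  }
  return [
    // Assumption: narrow the input first, since $facet branches cannot use indexes.
    { $match: { userId: { $in: userIds } } },
    { $facet: facets },
  ];
}

const pipeline = buildFacetPipeline(["userId1", "userId2"]);
console.log(JSON.stringify(pipeline, null, 2));
```

The returned array can then be passed to the collection's aggregate call, for example Thread.aggregate(pipeline) if Thread is a Mongoose model as the question suggests.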
You can use aggregation to group threads by userId, and return the top 20:
db.threads.aggregate([
  {$match: {userId: {$in: userIds}}},
  {$sort: {_id: -1}},
  {$group: {_id: "$userId", threads: {$push: "$$ROOT"}}},
  {$project: {_id: 0, userId: "$_id", threads: {$slice: ["$threads", 20]}}}
])
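The aggregation above returns one document per user with a `threads` array, whereas the question wants an object keyed by userId. A small conversion step could look like this sketch; `toMessageIdMap` and the sample `rows` are hypothetical, and users without any thread are given an empty array as an assumption about the desired output.

```javascript
// Converts [{userId, threads: [{messageId, ...}]}] into {userId: [messageId, ...]}.
function toMessageIdMap(rows, userIds) {
  // Start with an empty list for every requested user.
  const messageIds = Object.fromEntries(userIds.map((id) => [id, []]));
  for (const row of rows) {
    messageIds[row.userId] = row.threads.map((thread) => thread.messageId);
  }
  return messageIds;
}

// Hypothetical aggregation output for two of three requested users.
const rows = [
  { userId: "u1", threads: [{ messageId: "m3" }, { messageId: "m1" }] },
  { userId: "u2", threads: [{ messageId: "m2" }] },
];

console.log(toMessageIdMap(rows, ["u1", "u2", "u3"]));
```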
I have a document (name, firstName, age).
This command gives me the different names of the document:
db.getCollection('persons').distinct("name")
How do I get the corresponding firstNames?
Thanks!
You could try an aggregation query which groups the name and firstName. Eventually you could also add a count to it to see which combinations are repeated (but that is not necessary).
Here is an example:
db.test1.aggregate(
  [
    {
      $group: {_id: {name: "$name", firstName: "$firstName"}, count: {$sum: 1}}
    }
  ]
)
Here is another option that shows an aggregated list:
db.test1.aggregate(
  [
    {
      $group: {_id: {name: "$name"}, firstName: { $push: "$firstName" }}
    }
  ]
)
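Note that $push keeps repeated first names; the $addToSet accumulator can be used in its place when each firstName should appear only once per name. The plain JavaScript sketch below shows what that de-duplicated grouping computes. `firstNamesByName` and the sample `persons` array are hypothetical and only serve as an illustration.

```javascript
// Groups persons by name and collects the distinct first names of each group.
function firstNamesByName(persons) {
  const byName = new Map();
  for (const { name, firstName } of persons) {
    if (!byName.has(name)) byName.set(name, new Set());
    byName.get(name).add(firstName);
  }
  return [...byName].map(([name, firstNames]) => ({
    _id: { name },
    firstName: [...firstNames],
  }));
}

// Hypothetical sample documents.
const persons = [
  { name: "Smith", firstName: "Anna", age: 30 },
  { name: "Smith", firstName: "Bob", age: 41 },
  { name: "Smith", firstName: "Anna", age: 52 },
  { name: "Jones", firstName: "Carl", age: 25 },
];

console.log(JSON.stringify(firstNamesByName(persons)));
```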
I have a list of students in one collection and their grades in another collection. The schema (stripped of other details) looks like this:
Students
{
_id: 1234,
student: {
code: "WUKD984KUK"
}
}
Grades
{
_id: 3456,
grade: 24,
studentCode: "WUKD984KUK"
}
There can be multiple grade entries for each student. I need a total count of students who are present in the grades collection but not in the students collection. I also need a count of grades for each student who is not in the students collection. The following is the query I have written:
var existingStudents = db.students.find({}, {_id: 0, 'student.code': 1});
db.grades.aggregate(
  {$match: { 'studentCode': {'$nin': existingStudents}}},
  {$group: {_id: '$studentCode', count:{$sum: 1}}},
  {$project: {tmp: {code: '$_id', count: '$count'}}},
  {$group: {_id: null, total:{$sum:1}, data:{$addToSet: '$tmp'}}}
);
But this returns all of the students, as if the match were not working. When I run just the first part of this query, I get the student details as
{ "student" : { "code" : "A210225504104" } }
I suspect the match isn't working because the returned value is nested two levels deep. What is the right way to do this?
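find() returns a cursor of documents shaped like { student: { code: ... } }, not an array of code strings, so $nin has no list of codes to compare against. Collect the codes into a plain array first, and pass the stages to aggregate as an array:
var existingStudents = [];
db.students.find({}, {_id: 0, 'student.code': 1}).forEach(function (doc) {
  existingStudents.push(doc.student.code);
});
db.grades.aggregate([
  {$match: { 'studentCode': {'$nin': existingStudents}}},
  {$group: {_id: '$studentCode', count: {$sum: 1}}},
  {$project: {tmp: {code: '$_id', count: '$count'}}},
  {$group: {_id: null, total: {$sum: 1}, data: {$addToSet: '$tmp'}}}
]);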
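To check the expected numbers on a small data set, the same logic can be sketched in memory. `orphanGradeCounts` and the sample arrays are hypothetical; the function mirrors the flattening of student.code into a plain list followed by the $nin match and the two $group stages. As a side note, db.students.distinct('student.code') is another way to obtain the list of codes as an array in the shell.

```javascript
// Counts grades per studentCode for codes that have no matching student.
function orphanGradeCounts(students, grades) {
  // Flatten the nested student.code values into a set of plain strings.
  const known = new Set(students.map((doc) => doc.student.code));
  const counts = new Map();
  for (const grade of grades) {
    if (known.has(grade.studentCode)) continue;
    counts.set(grade.studentCode, (counts.get(grade.studentCode) ?? 0) + 1);
  }
  const data = [...counts].map(([code, count]) => ({ code, count }));
  return { total: data.length, data };
}

// Hypothetical sample documents shaped like the schema in the question.
const students = [{ _id: 1234, student: { code: "WUKD984KUK" } }];
const grades = [
  { _id: 1, grade: 24, studentCode: "WUKD984KUK" },
  { _id: 2, grade: 30, studentCode: "MISSING1" },
  { _id: 3, grade: 12, studentCode: "MISSING1" },
  { _id: 4, grade: 18, studentCode: "MISSING2" },
];

console.log(JSON.stringify(orphanGradeCounts(students, grades)));
```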