I have a Thread collection:
{
  userId: String,
  messageId: String
}
Now, given an array of userIds, I need to query the latest 20 messageIds for each of them. I can do it with a loop:
const messageIds = {}
for (const userId of userIds) {
  const results = await Thread.find({ userId }).sort({ _id: -1 }).limit(20).exec()
  messageIds[userId] = results.map(result => result.messageId)
}
But of course this doesn't perform well. Is there a better solution?
The problem with your approach is that you are issuing a separate query to MongoDB for each user.
The simplest workaround is the $push and $slice approach, but it has the drawback that the intermediate step creates a huge array per user.
Another way is to use $facet as part of an aggregation query.
So you need a $facet stage in the aggregation, like this:
[
  {$facet: {
    'userId1': [
      {$match: {userId: 'userId1'}},
      {$limit: 20},
      {$group: {_id: '', msg: {$push: '$messageId'}}}
    ],
    'userId2': [
      {$match: {userId: 'userId2'}},
      {$limit: 20},
      {$group: {_id: '', msg: {$push: '$messageId'}}}
    ],
    ... (one entry for each userId in the array)
  }}
]
You can easily generate this query by iterating over the list of users and adding a key for each one.
You end up with an object where the key is the userId and the value holds the array of messages (obj[userId][0].msg, since each facet key maps to an array of output documents).
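A minimal sketch of generating that stage, assuming userIds holds your array (a $sort is added to keep the newest-first order from the question, which the outline above omits):

```javascript
// Build the $facet stage programmatically, one sub-pipeline per user.
function buildFacetPipeline(userIds) {
  const facet = {};
  for (const userId of userIds) {
    facet[userId] = [
      {$match: {userId}},
      {$sort: {_id: -1}},   // newest first, as in the original loop
      {$limit: 20},
      {$group: {_id: '', msg: {$push: '$messageId'}}}
    ];
  }
  return [{$facet: facet}];
}

// Usage sketch (shell or driver):
// const [result] = await Thread.aggregate(buildFacetPipeline(userIds));
// result['userId1'][0].msg  -> array of messageIds for userId1
```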
You can use aggregation to group threads by userId, and return the top 20:
db.threads.aggregate([
{$match: {userId:{$in: userIds}}},
{$sort: {_id: -1}},
{$group: {_id: "$userId", threads: {$push: "$$ROOT"}}},
{$project: {_id:0, userId:"$_id", threads: {$slice:["$threads", 20]}}}
])
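If you are on MongoDB 5.2 or newer, the $topN accumulator can replace the $push/$slice pair and avoids accumulating every document per user before slicing. A sketch, assuming the same threads collection (the example userIds are placeholders):

```javascript
// $topN keeps only the 20 newest messageIds per group (MongoDB 5.2+).
const userIds = ['userId1', 'userId2'];  // example input
const pipeline = [
  {$match: {userId: {$in: userIds}}},
  {$group: {
    _id: '$userId',
    messageIds: {$topN: {n: 20, sortBy: {_id: -1}, output: '$messageId'}}
  }}
];
// run it: db.threads.aggregate(pipeline)
```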
I have a collection of documents (grades) with some missing keys like this:
| Name  | ID | Github | Points |
|-------|----|--------|--------|
| Peter | 1  |        | 123    |
|       | 1  | PPane  | 456    |
| Alice |    | Alice1 | 234    |
|       | 2  | Alice1 | 567    |
I want to group this data by matching any of Name, ID or Github together and collecting the points.
The result should look like this:
| _id                | Points     |
|--------------------|------------|
| [Peter, 1, PPane]  | [123, 456] |
| [Alice, 2, Alice1] | [234, 567] |
Right now I am doing this in the backend like this:
const students = new Map<string, CourseStudent>();
const keys = ['Name', 'ID', 'Github'];
for (const grade of grades) {
  let student: CourseStudent | undefined = undefined;
  for (const key of keys) {
    const value = grade[key];
    if (value && (student = students.get(value))) {
      break;
    }
  }
  if (!student) {
    const {Name, ID, Github} = grade;
    student = {_id: {Name, ID, Github}, points: []};
  }
  for (const key of keys) {
    const value = grade[key];
    if (value) {
      students.set(value, student);
    }
  }
  student.points.push(grade.Points);
}
return Array.from(students.values());
The data sizes in my use case are 1000-10000 grades (100-1000 students x 10 assignments).
The actual "grade" data contains a lot more fields, most of which are not used for the final result, but keeping all of it in memory can be costly.
Is there a way to achieve this in the database with an aggregation pipeline, e.g. using $group?
To start with, here is a non-working aggregation, because it requires ALL fields to match instead of just one:
{$group: {_id: ['$Name', '$ID', '$Github'], points: {$push: '$Points'}}},
Since you have only 3 keys to match on, grouping by one of them and collecting the other two leaves at most one key unmatched:
$group by ID and collect the other keys. Assuming at least one document per user contains that user's ID (as in your example), this step yields one document per user, plus a single extra document (with a null _id) covering all documents that have no ID.
$match to keep only the per-user documents.
For each user document, fetch all the matching original documents using $lookup; we now have all the data needed.
Group and format the results.
db.collection.aggregate([
  {$group: {_id: "$ID", Name: {$first: "$Name"}, Github: {$first: "$Github"}}},
  {$match: {_id: {$ne: null}}},
  {$lookup: {
    from: "collection",
    let: {github: "$Github", iD: "$_id"},
    pipeline: [
      {$match: {
        $expr: {$or: [
          {$eq: ["$$github", "$Github"]},
          {$eq: ["$$iD", "$ID"]}
        ]}
      }},
      {$group: {
        _id: 0,
        Name: {$addToSet: "$Name"},
        Github: {$addToSet: "$Github"},
        ID: {$addToSet: "$ID"},
        Points: {$push: "$Points"}
      }}
    ],
    as: "docs"
  }},
  {$replaceRoot: {newRoot: {$first: "$docs"}}},
  {$project: {
    _id: [{$first: "$Name"}, {$first: "$ID"}, {$first: "$Github"}],
    Points: 1
  }}
])
For example:
animals = ['cat', 'mat', 'rat'];
The collection contains only 'cat' and 'mat'.
I want the query to return 'rat', which is not in the collection.
The collection contains:
[
{
_id:objectid,
animal:'cat'
},
{
_id:objectid,
animal:'mat'
}
]
db.collection.find({'animal':{$nin:animals}})
(or)
db.collection.find({'animal':{$nin:['cat','mat','rat']}})
EDIT:
One option is:
Use $facet to $group all existing values into a set. Using $facet allows the pipeline to continue even if the collection is empty, as @leoll2 mentioned.
$project with $cond to handle both cases: with or without data.
Find the set difference.
db.collection.aggregate([
  {$facet: {data: [{$group: {_id: 0, animals: {$addToSet: "$animal"}}}]}},
  {$project: {
    data: {
      $cond: [{$gt: [{$size: "$data"}, 0]}, {$first: "$data"}, {animals: []}]
    }
  }},
  {$project: {data: "$data.animals"}},
  {$project: {_id: 0, missing: {$setDifference: [animals, "$data"]}}}
])
Is it possible to runCommand distinct with substr on the key I'm targeting?
I keep getting "missing : after property id":
db.runCommand(
{
distinct: "mycollection",
key: {"myfield" : { $substr: { "$myfield", 0, 10 } }},
}
)
You can't do this with runCommand distinct. You need to use the aggregation framework to process the field and then get the distinct values with $group:
db.foo.aggregate([
{$group: {_id: {$substr: [ "$myfield",0,10]} }}
]);
Very often it is useful to get the count of those distinct values:
db.foo.aggregate([
{$group: {_id: {$substr: ["$myfield",0,10]}, count: {$sum:1} }}
]);
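As a shorthand for the grouped count, $sortByCount groups by an expression, counts each value, and sorts by count descending in a single stage:

```javascript
// Equivalent to the $group + $sum above, plus a sort by count (descending).
const pipeline = [
  {$sortByCount: {$substr: ['$myfield', 0, 10]}}
];
// run it: db.foo.aggregate(pipeline)
```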
I am new to MongoDB, and to NoSQL in general, and I am trying to use MongoDB's aggregate function to aggregate data from one collection for insertion into another. An example of the original collection would be this:
Original Collection
{
  supplier: 'aldi',
  timestamp: '1492807458',
  user: 'eddardstark@gmail.com',
  hasBeenAggregated: false,
  items: [{
    name: 'butter',
    supplier: 'aldi',
    expiry: '1492807458',
    amount: 454,
    measureSymbol: 'g',
    cost: 2.19
  }, {
    name: 'milk',
    supplier: 'aldi',
    expiry: '1492807458',
    amount: 2000,
    measureSymbol: 'ml',
    cost: 1.49
  }]
}
An example of the output I am trying to achieve would be:
New Collection
{
  user: 'eddardstark@gmail.com',
  amount: 3.68,
  isIncome: false,
  title: 'food_shopping',
  timestamp: '1492807458'
}
The aggregation function that I am using is:
Aggregation
var result = db.runCommand({
  aggregate: 'food_transactions',
  pipeline: [
    {$match: {hasBeenAggregated: false}},
    {$unwind: '$items'},
    {$group: {_id: '$_id', amount: {$sum: '$items.cost'}}},
    {$project: {
      _id: 0,
      user: 1,
      amount: 1,
      isIncome: {$literal: false},
      title: {$literal: 'food_shopping'},
      timestamp: 1
    }}
  ]
});
printjson(result)
printjson(result)
This aggregation function does not return the user or timestamp fields. Instead, I get the following output:
Output
{
  "amount" : 3.6799999999999997,
  "isIncome" : false,
  "title" : "food_shopping"
}
If I don't group the results and instead perform the calculations in the $project stage, the fields are all projected correctly, but obviously a new document is created for each sub-document in the items array, which rather defeats the purpose of the aggregation.
What am I doing wrong?
Update your $group pipeline to include all the fields you wish to project further down the pipeline.
To include the user and timestamp fields you can use $first:
{$group: {_id: '$_id', user: {$first: '$user'}, amount: {$sum: '$items.cost'}, timestamp: {$first: '$timestamp'}}},
Additionally, if you are on version 3.4+, you can simplify your aggregation as below.
Use $reduce to sum all the items' costs within a single document; to combine across documents you can add a $group after the $reduce.
db.collection.aggregate([
  {$match: {hasBeenAggregated: false}},
  {$project: {
    _id: 0,
    user: 1,
    amount: {
      $reduce: {
        input: "$items",
        initialValue: 0,
        in: {$add: ["$$value", "$$this.cost"]}
      }
    },
    isIncome: {$literal: false},
    title: {$literal: 'food_shopping'},
    timestamp: 1
  }}
])
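To actually land the results in the second collection, you can append a $merge stage (MongoDB 4.2+). A sketch; the target collection name "expenses" is an assumption, and on older versions $out works instead but replaces the entire target collection:

```javascript
// Same projection as above, plus a $merge stage that writes the
// aggregated documents into a second collection ("expenses" is a
// hypothetical name).
const pipeline = [
  {$match: {hasBeenAggregated: false}},
  {$project: {
    _id: 0,
    user: 1,
    amount: {
      $reduce: {
        input: '$items',
        initialValue: 0,
        in: {$add: ['$$value', '$$this.cost']}
      }
    },
    isIncome: {$literal: false},
    title: {$literal: 'food_shopping'},
    timestamp: 1
  }},
  {$merge: {into: 'expenses'}}
];
// run it: db.food_transactions.aggregate(pipeline)
```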
I have a list of students in one collection and their grades in another collection. The schemas (stripped of other details) look like:
Students
{
  _id: 1234,
  student: {
    code: "WUKD984KUK"
  }
}
Grades
{
  _id: 3456,
  grade: 24,
  studentCode: "WUKD984KUK"
}
There can be multiple grade entries for each student. I need a total count of students who are present in the grades collection but not in the students collection, as well as a count of grades for each such student. The following is the query I had written:
var existingStudents = db.students.find({}, {_id: 0, 'student.code': 1});
db.grades.aggregate(
{$match: { 'studentCode': {'$nin': existingStudents}}},
{$group: {_id: '$studentCode', count:{$sum: 1}}},
{$project: {tmp: {code: '$_id', count: '$count'}}},
{$group: {_id: null, total:{$sum:1}, data:{$addToSet: '$tmp'}}}
);
But this returns all of the student details, as if the match is not working. When I run just the first part of this query, I get the student details as
{ "student" : { "code" : "A210225504104" } }
I suspect the match isn't working because the returned value is nested two levels deep. What is the right way to do this?
Use this code. find() returns documents (here, nested {student: {code}} objects), while $nin needs a plain array of values, so build the array of codes first:
var existingStudents=[];
db.students.find({}, {_id: 0, 'student.code': 1}).forEach(function(doc){existingStudents.push(doc.student.code)})
db.grades.aggregate(
{$match: { 'studentCode': {'$nin': existingStudents}}},
{$group: {_id: '$studentCode', count:{$sum: 1}}},
{$project: {tmp: {code: '$_id', count: '$count'}}},
{$group: {_id: null, total:{$sum:1}, data:{$addToSet: '$tmp'}}}
);
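Alternatively, a sketch that stays entirely in the database: $lookup each grade's studentCode against the students collection, then keep only the grades with no match. This avoids pulling all student codes into the client first, at the cost of a lookup per grade (an index on student.code helps):

```javascript
// Grades whose studentCode has no matching student, counted per code.
const pipeline = [
  {$lookup: {
    from: 'students',
    localField: 'studentCode',
    foreignField: 'student.code',
    as: 'student'
  }},
  {$match: {student: {$size: 0}}},   // keep grades with no matching student
  {$group: {_id: '$studentCode', count: {$sum: 1}}},
  {$group: {
    _id: null,
    total: {$sum: 1},
    data: {$addToSet: {code: '$_id', count: '$count'}}
  }}
];
// run it: db.grades.aggregate(pipeline)
```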