MongoDB aggregate match and group documents, with additional check

So I have 2 models: Question and Answer.
Answer has: questionId, userId, answer (String).
I need an aggregation pipeline that will:
1. match all answers by questionId
2. check whether the current user has already voted (is their id among the matched documents?)
3. group the answers and count them
I implemented steps 1 and 3 like this:
const q = ObjectId('5d6e52a68558b63fb9302efd');
const user = ObjectId('5d0b3f7daceeb50c477b49e0');
Answer.aggregate([
{ $match: { questionId: q } },
{ $group: { _id: '$answer', count: { $sum: 1 } } },
])
I am missing a step between those two pipeline stages, where I would iterate through the matched documents and check whether userId matches user.
I would like to get some object like this:
{
didIVote: true,
result: [ { _id: 'YES', count: 5 }, { _id: 'NO', count: 2 } ]
}
Or maybe even like this:
[
{ _id: 'YES', count: 5, didIVote: true },
{ _id: 'NO', count: 2, didIVote: false },
]

In the $group stage, build an array of the users who voted for each answer.
Then add a $project stage that checks whether the current user is in that array.
const q = ObjectId('5d6e52a68558b63fb9302efd');
const user = ObjectId('5d0b3f7daceeb50c477b49e0');
Answer.aggregate([
{ $match: { questionId: q } },
{
$group: {
_id: '$answer',
count: { $sum: 1 },
voted: { $addToSet: "$userId" }
}
},
{
$project: {
count: 1,
didIVote: { $in: [ user, "$voted" ] },
}
}
]);
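For the first result shape in the question (a single object with didIVote and result), one option is to regroup the per-answer rows into one document. The following is a minimal sketch that has not been run against a live server. buildVoteSummaryPipeline is a hypothetical helper name, and the sketch relies on $max over booleans returning true when any value is true, since false sorts before true in BSON comparison order.

```javascript
// Hypothetical helper: returns the pipeline stages as a plain array.
function buildVoteSummaryPipeline(questionId, userId) {
  return [
    { $match: { questionId: questionId } },
    {
      $group: {
        _id: '$answer',
        count: { $sum: 1 },
        voted: { $addToSet: '$userId' }
      }
    },
    {
      // Collapse the per-answer rows into a single summary document.
      $group: {
        _id: null,
        result: { $push: { _id: '$_id', count: '$count' } },
        // true if the user appears in the voter set of any answer
        didIVote: { $max: { $in: [userId, '$voted'] } }
      }
    },
    { $project: { _id: 0, didIVote: 1, result: 1 } }
  ];
}

// Usage with the names from the question:
// Answer.aggregate(buildVoteSummaryPipeline(q, user));
```

If no answers match the question, this pipeline returns an empty array rather than an object with didIVote: false, so the caller would need to handle that case.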

Related

Mongoose - filter matched documents and assign the resultant length to a field

I have this collection(some irrelevant fields were omitted for brevity):
clients: {
userId: ObjectId,
clientSalesValue: Number,
currentDebt: Number,
}
Then I have this query, which matches all the clients for a specific user and computes the sum of all debts and the sum of all sales, each in its own field:
await clientsCollection.aggregate([
{
$match: { userId: new ObjectId(userId) }
},
{
$group: {
_id: null,
totalSalesValue: { $sum: '$clientSalesValue' },
totalDebts: { $sum: '$currentDebt' },
}
},
{
$unset: ['_id']
}
]).exec();
This works as expected: it returns an array with a single object. Now I also need that object to include a field for the number of debtors, that is, the number of clients that have currentDebt > 0. How can I do that in the same query? Is it possible?
PS: I cannot modify the $match condition; it always needs to return all the clients for the corresponding user.
To include a count of how many matching documents have a positive currentDebt, you can use the $sum and $cond operators like so:
await clientsCollection.aggregate([
{
$match: { userId: new ObjectId(userId) }
},
{
$group: {
_id: null,
totalSalesValue: { $sum: '$clientSalesValue' },
totalDebts: { $sum: '$currentDebt' },
numDebtors: {
$sum: {
$cond: [{ $gt: ['$currentDebt', 0] }, 1, 0]
}
},
}
},
{
$unset: ['_id']
}
]).exec();
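To make the $sum plus $cond pattern concrete, here is a plain JavaScript sketch of what the $group stage computes. It is an in-memory illustration only, not how MongoDB runs the stage; summarizeClients and the sample data are made up, and the fields are assumed to always be numeric.

```javascript
// In-memory equivalent of the $group stage, for illustration only.
function summarizeClients(clients) {
  return clients.reduce(
    (acc, c) => ({
      totalSalesValue: acc.totalSalesValue + c.clientSalesValue,
      totalDebts: acc.totalDebts + c.currentDebt,
      // mirrors $cond: [{ $gt: ['$currentDebt', 0] }, 1, 0]
      numDebtors: acc.numDebtors + (c.currentDebt > 0 ? 1 : 0)
    }),
    { totalSalesValue: 0, totalDebts: 0, numDebtors: 0 }
  );
}

// Made-up sample data.
const sampleClients = [
  { clientSalesValue: 100, currentDebt: 0 },
  { clientSalesValue: 250, currentDebt: 40 },
  { clientSalesValue: 50, currentDebt: 10 }
];

console.log(summarizeClients(sampleClients));
// { totalSalesValue: 400, totalDebts: 50, numDebtors: 2 }
```

Each client contributes 1 or 0 to numDebtors, so the sum ends up being a conditional count while the other two sums still see every matched client.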

Mongodb find document in collection from field in another collection

I have two collections: Sharing and Material.
Sharing:
{
from_id: 2
to_id: 1
material_id: material1
}
Material:
{
_id: material1
organization_id: 2
},
{
_id: material2
organization_id: 1
},
{
_id: material3
organization_id: 1
},
--Edit:
There are three materials: two belong to organization_id 1 and one belongs to organization_id 2. material1 does not have organization_id 1 (material2 and material3 do), but the Sharing document that references material1 has a to_id of 1. When such a match exists, I'd like to find the Material document whose _id equals the material_id of Sharing AND the Material documents whose organization_id equals 1.
I'd like to check if a field in Sharing (to_id) has a value that is equal to a field in Material (organization_id) AND check if organization_id is equal to 1. If there is a document that exists from this, do another check to find whether the _id of Material is equal to the material_id of Sharing and return all documents & the total count.
If there is no equal value, I'd like to omit that result and send the object with only organization_id equal to 1 and get the total count of this result.
Right now, I do it in a very inefficient way using .map() to find this. Below is my code:
export const getMaterials = async (req, res) => {
const sharing = await Sharing.find({to_id: 1});
let doneLoad;
try {
if (sharing && sharing.length>0) {
const sharingTotal = await Material.find( {$or: [ {organization_id: 1}, {_id: sharing.map((item) => item.material_id)} ] } ).countDocuments();
const sharingMats = await Material.find( {$or: [ {organization_id: 1}, {_id: sharing.map((item) => item.material_id)} ] } );
res.status(200).json({data: sharingMats});
doneLoad= true;
}
else if (!doneLoad) {
const materialTotal = await Material.find({organization_id: 1}).countDocuments();
const materials = await Material.find({organization_id: 1});
res.status(200).json({data: materials});
}
} catch (error) {
res.status(404).json({ message: error.message });
}
}
I have tried using aggregation to get my desired result but I cannot find any solution that fits my requirements. Any help would be great as I am quite new to using mongodb. Thanks.
Edit (desired result):
Materials: [
{
_id: material1,
organization_id: 1
},
{
_id: material2,
organization_id: 1
},
{
_id: material3,
organization_id: 1
}
]
You can use a sub-pipeline in $lookup to perform the filtering, then add the count with $addFields and $size.
db.Sharing.aggregate([
{
"$match": {
to_id: 1
}
},
{
"$lookup": {
"from": "Material",
"let": {
to_id: "$to_id",
material_id: "$material_id"
},
"pipeline": [
{
"$match": {
$expr: {
$or: [
{
$eq: [
"$$to_id",
"$organization_id"
]
},
{
$eq: [
"$$material_id",
"$_id"
]
}
]
}
}
},
{
"$addFields": {
"organization_id": 1
}
}
],
"as": "materialLookup"
}
},
{
"$addFields": {
"materialCount": {
$size: "$materialLookup"
}
}
}
])
Here is the Mongo playground for your reference.
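If the two-step approach from the question is kept instead of $lookup, it can likely be reduced to one Sharing query and one Material query. The sketch below only builds the filter; buildMaterialFilter is a hypothetical helper, and the sample Sharing document is copied from the question.

```javascript
// Hypothetical helper: one filter for "owned by orgId OR shared with orgId".
function buildMaterialFilter(orgId, sharings) {
  const sharedIds = sharings.map((item) => item.material_id);
  return {
    $or: [
      { organization_id: orgId },
      // $in with an empty array matches nothing, so this branch is harmless
      // when there are no Sharing documents.
      { _id: { $in: sharedIds } }
    ]
  };
}

// Sample Sharing document from the question.
const filter = buildMaterialFilter(1, [
  { from_id: 2, to_id: 1, material_id: 'material1' }
]);
console.log(JSON.stringify(filter));
```

With Mongoose this would be used as const materials = await Material.find(filter), and materials.length gives the total, so the separate countDocuments() call and the if/else branch from the question should no longer be needed.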

MongoDB: How to speed up my data reorganisation query/operation?

I'm trying to analyse some data and I thought my queries would be faster ultimately by storing a relationship between my collections instead. So I wrote something to do the data normalisation, which is as follows:
var count = 0;
db.Interest.find({'PersonID':{$exists: false}, 'Data.DateOfBirth': {$ne: null}})
.toArray()
.forEach(function (x) {
if (null != x.Data.DateOfBirth) {
var peep = { 'Name': x.Data.Name, 'BirthMonth' :x.Data.DateOfBirth.Month, 'BirthYear' :x.Data.DateOfBirth.Year};
var person = db.People.findOne(peep);
if (null == person) {
peep._id = db.People.insertOne(peep).insertedId;
//print(peep._id);
}
db.Interest.updateOne({ '_id': x._id }, {$set: { 'PersonID':peep._id }})
++count;
if ((count % 1000) == 0) {
print(count + ' updated');
}
}
})
This script is just passed to mongo.exe.
Basically, I attempt to find an existing person, if they don't exist create them. In either case, link the originating record with the individual person.
However this is very slow! There's about 10 million documents and at the current rate it will take about 5 days to complete.
Can I speed this up simply? I know I can multithread it to cut it down, but have I missed something?
To insert the new persons into the People collection, use this pipeline:
db.Interest.aggregate([
{
$project: {
Name: "$Data.Name",
BirthMonth: "$Data.DateOfBirth.Month",
BirthYear: "$Data.DateOfBirth.Year",
_id: 0
}
},
{
$merge: {
into: "People",
// requires a unique index on {Name: 1, BirthMonth: 1, BirthYear: 1}
on: ["Name", "BirthMonth", "BirthYear"]
}
}
])
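The $merge stage above fails unless a unique index covers the fields listed in on, so that index has to exist before the pipeline runs. A minimal sketch of creating it from the shell follows; indexSpec is just an illustrative variable name, and the guard only exists so the snippet also loads outside the shell.

```javascript
// Key document for the index; it must cover the same fields as $merge's `on`.
const indexSpec = { Name: 1, BirthMonth: 1, BirthYear: 1 };

// `db` only exists inside the mongo shell / mongosh, so guard the call.
if (typeof db !== 'undefined') {
  db.People.createIndex(indexSpec, { unique: true });
}
```

If People already contains duplicate Name/BirthMonth/BirthYear combinations, creating the unique index will fail until those duplicates are cleaned up.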
To update PersonID in the Interest collection, use this pipeline:
db.Interest.aggregate([
{
$lookup: {
from: "People",
let: {
name: "$Data.Name",
month: "$Data.DateOfBirth.Month",
year: "$Data.DateOfBirth.Year"
},
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ["$Name", "$$name"] },
{ $eq: ["$BirthMonth", "$$month"] },
{ $eq: ["$BirthYear", "$$year"] }
]
}
}
},
{ $project: { _id: 1 } }
],
as: "interests"
}
},
{
$set: {
PersonID: { $first: "$interests._id" },
interests: "$$REMOVE"
}
},
{ $merge: { into: "Interest" } }
])

find duplicates in array per document in mongodb

Let's say that I have some document with this structure:
{
_id: ObjectId('444455'),
name: 'test',
email: 'email',
points: {
spendable: 23,
history: [
{
comment: 'Points earned by transaction #1234',
points: 1
},
{
comment: 'Points earned by transaction #456',
points: 3
},
{
comment: 'Points earned by transaction #456',
points: 3
}
]
}
}
My problem is that some documents contain duplicate objects in the points.history array.
Is there an easy way to find all those duplicates with a query?
I already tried the query from "Find duplicate records in MongoDB",
but that shows the total count of every duplicated entry across all documents. I need an overview of the duplicates per document, like this:
{
_id: ObjectId('444455'), // _id of the document, not of the array item itself
duplicates: [
{
comment: 'Points earned by transaction #456'
}
]
}, {
_id: ObjectId('444456'), // _id of the document, not of the array item itself
duplicates: [
{
comment: 'Points earned by transaction #66234'
},
{
comment: 'Points earned by transaction #7989'
}
]
}
How can I achieve that?
Try the aggregation pipeline below:
collectionName.aggregate([
{
$unwind: "$points.history"
},
{
$group: {
_id: {
id: "$_id",
comment: "$points.history.comment",
points: "$points.history.points"
},
sum: {
$sum: 1
},
}
},
{
$match: {
sum: {
$gt: 1
}
}
},
{
$project: {
_id: "$_id.id", // the $group key stores the document _id under "id"
duplicates: {
comment: "$_id.comment"
}
}
}
])
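The pipeline above emits one result per duplicated entry. To get one result per document with a duplicates array, as in the desired output, a second $group keyed on the original document _id could be used instead of the final $project. This is a sketch under that assumption, with duplicatesPerDocument as an illustrative name:

```javascript
const duplicatesPerDocument = [
  { $unwind: '$points.history' },
  {
    // One group per (document, comment, points) combination.
    $group: {
      _id: {
        id: '$_id',
        comment: '$points.history.comment',
        points: '$points.history.points'
      },
      sum: { $sum: 1 }
    }
  },
  // Keep only the combinations that occur more than once.
  { $match: { sum: { $gt: 1 } } },
  {
    // Regroup by the original document _id and collect its duplicated entries.
    $group: {
      _id: '$_id.id',
      duplicates: { $push: { comment: '$_id.comment' } }
    }
  }
];

// Usage: collectionName.aggregate(duplicatesPerDocument)
```

Documents without any duplicated history entry do not appear in the result at all, because they are filtered out by the $match stage.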

MongoDB sum with match

I have a collection with the following data structure:
{
_id: ObjectId,
text: 'This contains some text',
type: 'one',
category: {
name: 'Testing',
slug: 'test'
},
state: 'active'
}
What I'm ultimately trying to do is get a list of categories and counts. I'm using the following:
const query = [
{
$match: {
state: 'active'
}
},
{
$project: {
_id: 0,
categories: 1
}
},
{
$unwind: '$categories'
},
{
$group: {
_id: { category: '$categories.name', slug: '$categories.slug' },
count: { $sum: 1 }
}
}
]
This returns all categories (that are active) and the total counts for documents matching each category.
The problem is that I want to introduce two additional $match that should still return all the unique categories, but only affect the counts. For example, I'm trying to add a text search (which is indexed on the text field) and also a match for type.
I can't do this at the top of the pipeline because it would then only return categories that match, not only affect the $sum. So basically it would be like being able to add a $match within the $group only for the $sum. Haven't been able to find a solution for this and any help would be greatly appreciated. Thank you!
You can use $cond inside of your $group statement:
{
$group: {
_id: { category: '$categories.name', slug: '$categories.slug' },
// `type` is a top-level field in the sample document, so the earlier $project must keep it as well (type: 1)
count: { $sum: { $cond: [ { $eq: [ "$type", "one" ] }, 1, 0 ] } }
}
}
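Putting this into the full pipeline with both conditions might look like the sketch below. Several things here are assumptions: type and text are taken to be top-level fields as in the sample document, the $project stage is widened so those fields survive until $group, and buildCategoryCountPipeline with its parameters is a hypothetical helper. As far as I know, $text is a query operator that is only allowed in a $match stage and cannot be evaluated inside $cond, so the sketch swaps in $regexMatch (MongoDB 4.2+), which does not use the text index.

```javascript
// Hypothetical helper; `type` and `pattern` are illustrative parameter names.
function buildCategoryCountPipeline(type, pattern) {
  return [
    { $match: { state: 'active' } },
    // Keep the fields that the conditional count needs later.
    { $project: { _id: 0, categories: 1, type: 1, text: 1 } },
    { $unwind: '$categories' },
    {
      $group: {
        _id: { category: '$categories.name', slug: '$categories.slug' },
        // Every category is still returned; only matching documents add 1.
        count: {
          $sum: {
            $cond: [
              {
                $and: [
                  { $eq: ['$type', type] },
                  { $regexMatch: { input: '$text', regex: pattern, options: 'i' } }
                ]
              },
              1,
              0
            ]
          }
        }
      }
    }
  ];
}

// Usage: collection.aggregate(buildCategoryCountPipeline('one', 'some text'))
```

Categories whose documents all fail the conditions still show up, with a count of 0, which is the behaviour the question asks for.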