MongoDB use aggregate to format data from multiple collections

I have data in MongoDB collections as below:
Users:
{ name: String, email: String }
Books:
{ name: String, author: ref -> Users }
Chapters:
{ name: String, book: ref -> Books }
Paragraphs:
{ text: String, chapter: ref -> Chapters, created: Date, updated: Date, isRemoved: boolean }
I am trying to get some kind of statistical data in the following format:
[
{
book: { _id, name },
chaptersCount: 10,
paragraphs: {
count, mostRecent: { updated, created }},
author: { name, email },
},
{
...
},
...
]
So far, I have been able to get some data using the aggregate pipeline, but I am lost at this point. I have no idea how to convert it into the format I wish it to be. I can do it programmatically, but filters/sorting need to be applied on each of the final fields and it will be difficult to do that for a huge number of records (say a million).
Here is what I have so far:
const data = await ParagraphsDB.aggregate([
{ $match: { isRemoved: { $exists: false }, project: { $exists: true } } },
{ $lookup: { from: 'chapters', localField: 'chapter', foreignField: '_id', as: 'chapterDoc' }},
{ $unwind: '$chapterDoc' },
{ $lookup: { from: 'books', localField: 'chapterDoc.book', foreignField: '_id', as: 'bookDoc' }},
{ $unwind: '$bookDoc' },
{
$facet: {
paragraphCount: [
{ $count: 'value' },
],
pipelineResults: [
{ $project: { _id: 1, 'chapterDoc._id': 1, 'chapterDoc.name': 1, 'bookDoc._id': 1, 'bookDoc.name': 1 } },
],
},
},
{ $unwind: '$pipelineResults' },
{ $unwind: '$paragraphCount' },
{
$replaceRoot: {
newRoot: {
$mergeObjects: [ '$pipelineResults', { paragraphCount: '$paragraphCount.value' } ],
},
},
},
]);
I started with the Paragraph data because it is the smallest unit that could be sorted upon. How do I achieve the desired result?
Also, once I have formatted the data in the desired format, how can I sort by one of those fields?
Any help will be highly appreciated. Thanks.
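One possible direction, sketched under assumptions rather than tested against the real data: start the aggregation from the books collection so that each output document corresponds to one book, join the author, the chapters, and a summary of the paragraphs, and only then sort on a computed field. The collection names `users` and `paragraphs`, the variable name `bookStatsPipeline`, and the reading of `mostRecent` as the latest `updated` and `created` values are assumptions; `chapters` and `books` come from the pipeline above.

```javascript
// Sketch only: meant to be passed to aggregate() on the books collection.
const bookStatsPipeline = [
  // Join the author (Books.author -> Users._id).
  { $lookup: { from: 'users', localField: 'author', foreignField: '_id', as: 'authorDoc' } },
  { $unwind: '$authorDoc' },
  // Join every chapter of the book (Chapters.book -> Books._id).
  { $lookup: { from: 'chapters', localField: '_id', foreignField: 'book', as: 'chapterDocs' } },
  // Summarise the non-removed paragraphs of those chapters in a sub-pipeline.
  {
    $lookup: {
      from: 'paragraphs',
      let: { chapterIds: '$chapterDocs._id' },
      pipeline: [
        { $match: { isRemoved: { $exists: false }, $expr: { $in: ['$chapter', '$$chapterIds'] } } },
        { $group: { _id: null, count: { $sum: 1 }, updated: { $max: '$updated' }, created: { $max: '$created' } } },
      ],
      as: 'paragraphStats',
    },
  },
  // Shape each book into the requested format.
  {
    $project: {
      _id: 0,
      book: { _id: '$_id', name: '$name' },
      chaptersCount: { $size: '$chapterDocs' },
      paragraphs: {
        // A book without paragraphs yields an empty paragraphStats array, hence $ifNull.
        count: { $ifNull: [{ $arrayElemAt: ['$paragraphStats.count', 0] }, 0] },
        mostRecent: {
          updated: { $arrayElemAt: ['$paragraphStats.updated', 0] },
          created: { $arrayElemAt: ['$paragraphStats.created', 0] },
        },
      },
      author: { name: '$authorDoc.name', email: '$authorDoc.email' },
    },
  },
  // Sorting (or a further $match) can target any of the computed fields.
  { $sort: { 'paragraphs.count': -1 } },
];
```

Because the final `$sort` runs after `$project`, it can reference any output field, for example `'author.name'` or `chaptersCount`. Whether this performs acceptably on a million paragraphs would need to be checked with `explain()`.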

Related

Boost search score from data in another collection

I use Atlas Search to return a list of documents (using Mongoose):
const searchResults = await Resource.aggregate()
.search({
text: {
query: searchQuery,
path: ["title", "tags", "link", "creatorName"],
},
}
)
.match({ approved: true })
.addFields({
score: { $meta: "searchScore" }
})
.exec();
These resources can be up and downvoted by users (like questions on Stackoverflow). I want to boost the search score depending on these votes.
I can use the boost operator for that.
Problem: The votes are not a property of the Resource document. Instead, they are stored in a separate collection:
const resourceVoteSchema = mongoose.Schema({
_id: { type: String },
userId: { type: mongoose.Types.ObjectId, required: true },
resourceId: { type: mongoose.Types.ObjectId, required: true },
upDown: { type: String, required: true },
});
After I get my search results above, I fetch the votes separately and add them to each search result:
for (const resource of searchResults) {
const resourceVotes = await ResourceVote.find({ resourceId: resource._id }).exec();
resource.votes = resourceVotes
}
I then subtract the downvotes from the upvotes on the client and show the final number in the UI.
How can I incorporate this vote points value into the score of the search results? Do I have to reorder them on the client?
Edit:
Here is my updated code. The only part that's missing is letting the resource votes boost the search score, while at the same time keeping all resource-votes documents in the votes field so that I can access them later. I'm using Mongoose syntax but an answer with normal MongoDB syntax will work for me:
const searchResults = await Resource.aggregate()
.search({
compound: {
should: [
{
wildcard: {
query: queryStringSegmented,
path: ["title", "link", "creatorName"],
allowAnalyzedField: true,
}
},
{
wildcard: {
query: queryStringSegmented,
path: ["topics"],
allowAnalyzedField: true,
score: { boost: { value: 2 } },
}
}
,
{
wildcard: {
query: queryStringSegmented,
path: ["description"],
allowAnalyzedField: true,
score: { boost: { value: .2 } },
}
}
]
}
}
)
.lookup({
from: "resourcevotes",
localField: "_id",
foreignField: "resourceId",
as: "votes",
})
.addFields({
searchScore: { $meta: "searchScore" },
})
.facet({
approved: [
{ $match: matchFilter },
{ $skip: (page - 1) * pageSize },
{ $limit: pageSize },
],
resultCount: [
{ $match: matchFilter },
{ $group: { _id: null, count: { $sum: 1 } } }
],
uniqueLanguages: [{ $group: { _id: null, all: { $addToSet: "$language" } } }],
})
.exec();
It could be done with one query only, looking similar to:
Resource.aggregate([
{
$search: {
text: {
query: "searchQuery",
path: ["title", "tags", "link", "creatorName"]
}
}
},
{$match: {approved: true}},
{$addFields: {score: {$meta: "searchScore"}}},
{
$lookup: {
from: "ResourceVote",
localField: "_id",
foreignField: "resourceId",
as: "votes"
}
}
])
This uses a $lookup stage to fetch the votes from the ResourceVote collection.
If you want to use the votes to boost the score, you can replace the above $lookup step with something like:
{
$lookup: {
from: "ResourceVote",
let: {resourceId: "$_id"},
pipeline: [
{
$match: {$expr: {$eq: ["$resourceId", "$$resourceId"]}}
},
{
$group: {
_id: 0,
sum: {$sum: {$cond: [{$eq: ["$upDown", "up"]}, 1, -1]}}
}
}
],
as: "votes"
}
},
{$addFields: { votes: {$arrayElemAt: ["$votes", 0]}}},
{
$project: {
"wScore": {
$ifNull: [
{$multiply: ["$score", "$votes.sum"]},
"$score"
]
},
createdAt: 1,
score: 1
}
}
As you can see on this playground example
EDIT: If you want to keep the votes on the results, you can do something like:
db.searchResults.aggregate([
{
$lookup: {
from: "ResourceVote",
localField: "_id",
foreignField: "resourceId",
as: "votes"
}
},
{
"$addFields": {
"votesCount": {
$reduce: {
input: "$votes",
initialValue: 0,
in: {$add: ["$$value", {$cond: [{$eq: ["$$this.upDown", "up"]}, 1, -1]}]}
}
}
}
},
{
$addFields: {
"wScore": {
$add: [{$multiply: ["$votesCount", 0.1]}, "$score"]
}
}
}
])
As can be seen here
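To make the arithmetic of the last pipeline concrete, the following plain-JavaScript sketch mirrors its `$reduce` and `$add` stages outside the database. The function names `netVotes`, `weightedScore`, and `rankByWeightedScore` are illustrative, the sample data is hypothetical, and the 0.1 weight is simply the factor used above. It models the computation and is not a replacement for the pipeline.

```javascript
// Mirrors the $reduce stage: +1 for each "up" vote, -1 for anything else.
function netVotes(votes) {
  return votes.reduce((sum, vote) => sum + (vote.upDown === 'up' ? 1 : -1), 0);
}

// Mirrors the $add/$multiply stage: wScore = score + weight * votesCount.
function weightedScore(score, votes, weight = 0.1) {
  return score + weight * netVotes(votes);
}

// Orders results by wScore, highest first, without mutating the input array.
function rankByWeightedScore(results) {
  return results
    .map((r) => ({ ...r, wScore: weightedScore(r.score, r.votes) }))
    .sort((a, b) => b.wScore - a.wScore);
}

// Hypothetical sample data, for illustration only.
const ranked = rankByWeightedScore([
  { title: 'A', score: 1.0, votes: [{ upDown: 'down' }, { upDown: 'down' }] },
  { title: 'B', score: 0.9, votes: [{ upDown: 'up' }, { upDown: 'up' }, { upDown: 'up' }] },
]);
// B now ranks above A even though its raw search score is lower.
console.log(ranked.map((r) => r.title));
```

Inside the pipeline, the equivalent of the final sort would be a `{ $sort: { wScore: -1 } }` stage appended after the last `$addFields`, which keeps the reordering on the server instead of the client.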

MongoDB aggregate performance

I have two collections, bids and auctions. The bids collection holds almost one million records and the auctions collection about 500k. I can get the count of a customer's bids from the bids collection, and I need this same result fetched quickly.
The query currently takes almost 29 seconds to respond, which is too slow.
{ $match: { customer: '00000000823026' } },
{ $group: {
_id: '$auctioncode',
data: {
$last: '$$ROOT'
}
}
},
{
$lookup: {
from: 'auctions',
let: { auctioncode: '$_id' },
pipeline: [
{
$match: {
$expr: {
$and: [{ $eq: ['$_id', '$$auctioncode'] }],
},
},
},
],
as: 'auction',
},
},
{ $match: { auction: { $exists: true, $not: { $size: 0 } } } },
{
$addFields: {
_id: '$data._id',
auctioncode: '$data.auctioncode',
amount: '$data.amount',
customer: '$data.customer',
customerName: '$data.customerName',
maxBid: '$data.maxBid',
stockcode: '$data.stockcode',
watchlistHidden: '$data.watchlistHidden',
winner: '$data.winner'
}
},
{
$match: {
$and: [
{
// to get only RUNNING auctions in bid history
'auction.status': { $ne: 'RUNNING'},
// to filter auctions based on status
// tslint:disable-next-line:max-line-length
},
],
},
},
{ $sort: { 'auction.enddate': -1 } },
{ $count: 'totalCount'}
The current result is totalCount: 2640.
How can I optimize this query to improve its performance in MongoDB?
If all you need is the count of results, the code below is more efficient.
Index the customer key for even better execution time.
Note: You can use the pipeline form of $lookup if you are on MongoDB version >= 5.0, as it makes use of indexes, unlike lower versions.
db.collection.aggregate([
{
$match: {
customer: '00000000823026'
}
},
{
$group: {
_id: '$auctioncode',
data: {
$last: '$$ROOT'
}
}
},
{
$lookup: {
from: 'auctions',
localField: '_id',
foreignField: '_id',
as: 'auction',
},
},
{
$match: {
// auction: { $ne: [] }, // uncomment to drop bids whose auction was not found
// same status filter as in the question: keeps auctions that are not RUNNING
'auction.status': { $ne: 'RUNNING' },
},
},
// The $addFields and $sort stages from the question are not required for a count.
{ $count: 'totalCount' }
], {
allowDiskUse: true
})
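The indexing advice can be sketched as follows. This assumes a connected `Db` handle from the Node.js driver and a collection named `bids` (the name is taken from the question's description, not from its code); `ensureBidIndexes` is an illustrative name.

```javascript
// Creates a single-field index that supports the initial { customer: ... } $match.
// `db` is assumed to be a connected Db instance from the Node.js driver.
async function ensureBidIndexes(db) {
  // createIndex is a no-op if an identical index already exists.
  return db.collection('bids').createIndex({ customer: 1 });
}
```

Calling `await ensureBidIndexes(db)` once, for example at deployment time, should be enough. Running the aggregation with `explain()` before and after is a reasonable way to confirm that the `$match` stage actually uses the index.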

Merge $lookup value inside objects nested in array mongoose

So I have 2 models user & form.
User Schema
firstName: {
type: String,
required: true,
},
lastName: {
type: String,
required: true,
},
email: {
type: String,
required: true,
}
Form Schema
approvalLog: [
{
attachments: {
type: [String],
},
by: {
type: ObjectId,
},
comment: {
type: String,
},
date: {
type: Date,
},
},
],
userId: {
type: ObjectId,
required: true,
},
... other form parameters
When returning a form, I'm trying to aggregate the user info of every user in the approvalLog into their respective objects as below.
{
...other form info
approvalLog: [
{
attachments: [],
_id: '619cc4953de8413b548f61a6',
by: '619cba9cd64af530448b6347',
comment: 'visit store for disburement',
date: '2021-11-23T10:38:13.565Z',
user: {
_id: '619cba9cd64af530448b6347',
firstName: 'admin',
lastName: 'user',
email: 'admin@mail.com',
},
},
{
attachments: [],
_id: '619cc4ec3ea3e940a42b2d01',
by: '619cbd7b3de8413b548f61a0',
comment: '',
date: '2021-11-23T10:39:40.168Z',
user: {
_id: '619cbd7b3de8413b548f61a0',
firstName: 'sam',
lastName: 'ben',
email: 'sb@mail.com',
},
},
{
attachments: [],
_id: '61a9deab8f472c52d8bac095',
by: '61a87fd93dac9b209096ed94',
comment: '',
date: '2021-12-03T09:08:59.479Z',
user: {
_id: '61a87fd93dac9b209096ed94',
firstName: 'john',
lastName: 'doe',
email: 'jd@mail.com',
},
},
],
}
My current code is
Form.aggregate([
{
$lookup: {
from: 'users',
localField: 'approvalLog.by',
foreignField: '_id',
as: 'approvedBy',
},
},
{ $addFields: { 'approvalLog.user': { $arrayElemAt: ['$approvedBy', 0] } } },
])
but it only returns the same user for all objects. How do I attach the matching user for each index?
I've also tried
Form.aggregate([
{
$lookup: {
from: 'users',
localField: 'approvalLog.by',
foreignField: '_id',
as: 'approvedBy',
},
},
{
$addFields: {
approvalLog: {
$map: {
input: { $zip: { inputs: ['$approvalLog', '$approvedBy'] } },
in: { $mergeObjects: '$$this' },
},
},
},
},
])
This adds the right user to each object, but the user's fields are merged into the root of the object rather than under a new key.
You can try this approach:
$map to iterate over the approvalLog array
$filter to iterate over the approvedBy array and find the user whose _id matches by
$arrayElemAt to get the first element of the filtered result
$mergeObjects to merge the current approvalLog entry with the matched user
$$REMOVE to drop approvedBy, which is no longer needed
await Form.aggregate([
{
$lookup: {
from: "users",
localField: "approvalLog.by",
foreignField: "_id",
as: "approvedBy"
}
},
{
$addFields: {
approvalLog: {
$map: {
input: "$approvalLog",
as: "a",
in: {
$mergeObjects: [
"$$a",
{
user: {
$arrayElemAt: [
{
$filter: {
input: "$approvedBy",
cond: { $eq: ["$$a.by", "$$this._id"] }
}
},
0
]
}
}
]
}
}
},
approvedBy: "$$REMOVE"
}
}
])
Playground
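For readers who find the nested operators hard to follow, here is the same per-entry matching written in plain JavaScript. It only models what `$map`, `$filter`, `$arrayElemAt`, and `$mergeObjects` do in the pipeline above; `attachUsers` is an illustrative name, the sample data is hypothetical, and ids are compared as strings for simplicity, whereas the database compares ObjectId values.

```javascript
function attachUsers(approvalLog, approvedBy) {
  // $map: produce one output entry per approvalLog entry.
  return approvalLog.map((entry) => {
    // $filter + $arrayElemAt: first looked-up user whose _id equals entry.by.
    const user = approvedBy.filter((u) => String(u._id) === String(entry.by))[0];
    // $mergeObjects: copy the entry and add the user under a new key.
    // When no user matches, the key is simply left out.
    return user === undefined ? { ...entry } : { ...entry, user };
  });
}

// Hypothetical sample data, for illustration only.
const merged = attachUsers(
  [{ by: 'u2', comment: 'second' }, { by: 'u1', comment: 'first' }, { by: 'u9', comment: 'orphan' }],
  [{ _id: 'u1', firstName: 'admin' }, { _id: 'u2', firstName: 'sam' }],
);
console.log(merged.map((entry) => (entry.user ? entry.user.firstName : null)));
```

The point of matching by id, rather than by position as in the `$zip` attempt, is that the order of the looked-up users does not have to line up with the order of the log entries.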
The second approach uses $unwind:
$unwind to deconstruct the approvalLog array
$lookup with the users collection
$addFields and $arrayElemAt to get the first element of the lookup result
$group by _id to reconstruct the approvalLog array and take the first value of the other required properties
await Form.aggregate([
{ $unwind: "$approvalLog" },
{
$lookup: {
from: "users",
localField: "approvalLog.by",
foreignField: "_id",
as: "approvalLog.user"
}
},
{
$addFields: {
"approvalLog.user": {
$arrayElemAt: ["$approvalLog.user", 0]
}
}
},
{
$group: {
_id: "$_id",
approvalLog: { $push: "$approvalLog" },
userId: { $first: "$userId" },
// add your other properties like userId
}
}
])
Playground

aggregation lookup and match a nested array

Hello, I am trying to join two collections.
#COLLECTION 1
const valuesSchema= new Schema({
value: { type: String },
})
const categoriesSchema = new Schema({
name: { type: String },
values: [valuesSchema]
})
mongoose.model('categories', categoriesSchema )
#COLLECTION 2
const productsSchema = new Schema({
name: { type: String },
description: { type: String },
categories: [{
type: mongoose.Schema.Types.ObjectId,
ref: 'categories',
}]
})
mongoose.model('productos', productsSchema )
Now, what I intend to do is join these collections and get an output like this.
#Example Product Document
{
name: 'My laptop',
description: 'Very ugly laptop',
categories: ['5f55949054f3f31db0491b5c','5f55949054f3f31db0491b5b'] // these are _id of valuesSchema
}
#Expected Output
{
name: 'My laptop',
description: 'Very ugly laptop',
categories: [{value: 'Laptop'}, {value: 'PC'}]
}
This is what I tried.
{
$lookup: {
from: "categories",
let: { "categories": "$categories" },
as: "categories",
pipeline: [
{
$match: {
$expr: {
$in: [ '$values._id','$$categories']
},
}
},
]
}
}
but this query is not matching... Any help please?
You can try:
$lookup with categories
$unwind to deconstruct the values array
$match the product's category ids against each value's _id
$project to show the required field
db.products.aggregate([
{
$lookup: {
from: "categories",
let: { cat: "$categories" },
as: "categories",
pipeline: [
{ $unwind: "$values" },
{ $match: { $expr: { $in: ["$values._id", "$$cat"] } } },
{
$project: {
_id: 0,
value: "$values.value"
}
}
]
}
}
])
Playground
Since you are using the pipeline form of $lookup, you can achieve this with $unwind to flatten the values array, followed by $match. $group then rebuilds the values array per category, and $reduce walks the looked-up categories and concatenates their values into a single array.
[
{
$lookup: {
from: "categories",
let: {
"categories": "$categories"
},
as: "categories",
pipeline: [
{
$unwind: "$values"
},
{
$match: {
$expr: {
$in: [
"$values._id",
"$$categories"
]
},
}
},
{
$group: {
_id: "$_id",
values: {
$addToSet: "$values"
}
}
}
]
}
},
{
$project: {
categories: {
$reduce: {
input: "$categories",
initialValue: [],
in: {
$concatArrays: [
"$$this.values",
"$$value"
]
}
}
}
}
}
]
Working Mongo template

MongoDB aggregate returning only specific fields

I have the following code:
const profiles = await Profile.aggregate([
{
$lookup: {
from: "users",
localField: "user",
foreignField: "_id",
as: "user",
},
},
{
$unwind: "$user",
},
{
$match: {
"user.name": {
$regex: q.trim(),
$options: "i",
},
},
},
{
$skip: req.params.page ? (req.params.page - 1) * 10 : 0,
},
{
$limit: 11,
},
{
$group: {
_id: "$_id",
skills:{skills}
user: { name: "$name" },
user: { avatar: "$avatar" },
},
},
]);
I want to return only specific fields like skills _id and user.name and user.avatar, but this doesn't work. I'm pretty sure that the problem is in $group. I want to receive only these fields
[
{
_id: 5ef78d005d23020ca847aa76,
skills: [ 'asd' ],
user: {
_id: 5ef78c7c5d23020ca847aa75,
name: 'Simeon Lazarov',
avatar: 'uploads\\1593286096227 - background.jpg',
}
}
]
You can make use of $project to get specific fields.
After grouping add the below:
{
$project: {_id:1, skills:1, user:1}
}
A projection value of 0 excludes the field; a value of 1 includes it.
Document reference: https://docs.mongodb.com/manual/reference/operator/aggregation/project/
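Applied to the pipeline in the question, one way to read this advice is to replace the `$group` stage with a `$project` that lists the nested fields by dotted path. The sketch below builds such a pipeline; the function name `buildProfilePipeline` and its parameters are illustrative, and `page` is assumed to be already parsed into a number.

```javascript
function buildProfilePipeline(q, page) {
  return [
    { $lookup: { from: 'users', localField: 'user', foreignField: '_id', as: 'user' } },
    { $unwind: '$user' },
    { $match: { 'user.name': { $regex: q.trim(), $options: 'i' } } },
    { $skip: page ? (page - 1) * 10 : 0 },
    { $limit: 11 },
    // Keep only the wanted fields; dotted paths select inside the joined user.
    { $project: { _id: 1, skills: 1, 'user._id': 1, 'user.name': 1, 'user.avatar': 1 } },
  ];
}
```

It would then be used as `Profile.aggregate(buildProfilePipeline(q, Number(req.params.page)))`. Any other field of the joined user document is dropped, because an inclusion projection keeps only the listed paths.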