MongoDB aggregation: group by and count with a join

I have a MongoDB model called candidates:
appliedJobs: [
{
job: { type: Schema.ObjectId, ref: "JobPost" },
date:Date
},
],
A candidate may have multiple records in the appliedJobs array, each of which refers to a jobPost.
The jobPost has the companyName property:
companyName: String,
What I want is to get the company names with the number of job applications sent to each. For example:
|Company|Applications|
|--------|---------------|
|Facebook|10 applications|
|Google|5 applications|
I created this query
Candidate.aggregate([
{
$match: {
appliedJobs: { $exists: true },
},
},
{ $group: { _id: '$companyName', count: { $sum: 1 } } },
])
The problem is that I can't access companyName like this, because it lives in another collection. How do I solve this?

To get data from another collection you can use $lookup (more efficient) or populate (Mongoose; considered more organized), so one option is:
db.candidate.aggregate([
{$match: {appliedJobs: {$exists: true}}},
{$unwind: "$appliedJobs"},
{$lookup: {
from: "JobPost",
localField: "appliedJobs.job",
foreignField: "_id",
as: "appliedJobs"
}
},
{$project: {companyName: {$first: "$appliedJobs.companyName"}}},
{$group: {_id: {candidate: "$_id", company: "$companyName"}, count: {$sum: 1}}},
{$group: {
_id: "$_id.candidate",
appliedJobs: {$push: {k: "$_id.company", v: "$count"}}
}},
{$project: {appliedJobs: {$arrayToObject: "$appliedJobs"}}}
])
See how it works on the playground example
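To make the output shape of that pipeline concrete, here is a plain JavaScript sketch that mimics its stages on in-memory arrays. The sample documents and the helper name `countsPerCandidate` are invented for illustration, and skipping unmatched job ids is an assumption of the sketch; it is not how MongoDB executes the pipeline.

```javascript
// Hypothetical sample data standing in for the two collections.
const jobPosts = [
  { _id: "j1", companyName: "Facebook" },
  { _id: "j2", companyName: "Google" },
  { _id: "j3", companyName: "Facebook" },
];
const candidates = [
  { _id: "c1", appliedJobs: [{ job: "j1" }, { job: "j3" }, { job: "j2" }] },
  { _id: "c2", appliedJobs: [{ job: "j2" }] },
];

// Mimics $unwind + $lookup + the two $group stages + $arrayToObject:
// one result per candidate, holding a { companyName: count } object.
function countsPerCandidate(candidates, jobPosts) {
  const companyByJobId = new Map(jobPosts.map((p) => [p._id, p.companyName]));
  return candidates
    // $unwind drops candidates whose appliedJobs is missing or empty.
    .filter((c) => Array.isArray(c.appliedJobs) && c.appliedJobs.length > 0)
    .map((c) => {
      const counts = {};
      for (const applied of c.appliedJobs) {
        const company = companyByJobId.get(applied.job);
        if (company === undefined) continue; // assumption: skip unknown job ids
        counts[company] = (counts[company] || 0) + 1;
      }
      return { _id: c._id, appliedJobs: counts };
    });
}

console.log(JSON.stringify(countsPerCandidate(candidates, jobPosts)));
```

Note that this answer counts per candidate; the table in the question asks for totals per company across all candidates.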

Simply $unwind the appliedJobs array. Perform $lookup to get the companyName. Then, $group to get count of applications by company.
db.Candidate.aggregate([
{
$match: {
appliedJobs: {
$exists: true
}
}
},
{
$unwind: "$appliedJobs"
},
{
"$lookup": {
"from": "JobPost",
"localField": "appliedJobs.job",
"foreignField": "_id",
"as": "JobPostLookup"
}
},
{
$unwind: "$JobPostLookup"
},
{
"$group": {
"_id": "$JobPostLookup.companyName",
"Applications": {
"$sum": 1
}
}
}
])
Here is the Mongo Playground for your reference.
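As a rough illustration of what the $unwind, $lookup and $group steps compute, the following plain JavaScript sketch does the same join and count on in-memory arrays, joining on the job reference from the question's schema. The sample data and the function name `applicationsByCompany` are made up for this example; it only models the result, not MongoDB's execution.

```javascript
// Hypothetical sample data; ids and names are invented.
const jobPosts = [
  { _id: "j1", companyName: "Facebook" },
  { _id: "j2", companyName: "Google" },
  { _id: "j3", companyName: "Facebook" },
];
const candidates = [
  { _id: "c1", appliedJobs: [{ job: "j1" }, { job: "j2" }] },
  { _id: "c2", appliedJobs: [{ job: "j1" }, { job: "j3" }] },
  { _id: "c3" }, // no appliedJobs: filtered out, like the $match stage
];

function applicationsByCompany(candidates, jobPosts) {
  const postById = new Map(jobPosts.map((p) => [p._id, p]));
  const counts = new Map();
  for (const candidate of candidates) {
    // $unwind: visit every element of appliedJobs separately.
    for (const applied of candidate.appliedJobs || []) {
      // $lookup: find the job post the application refers to.
      const post = postById.get(applied.job);
      if (!post) continue; // the second $unwind drops applications with no match
      // $group with $sum: 1, keyed by company name.
      counts.set(post.companyName, (counts.get(post.companyName) || 0) + 1);
    }
  }
  return [...counts].map(([company, n]) => ({ _id: company, Applications: n }));
}

console.log(applicationsByCompany(candidates, jobPosts));
```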

Related

Find MongoDB documents that are not contained across arrays

MongoDB Collection A contains documents with an array with some document ids of collection B:
Collection A:
{
some_ids_of_b: ["id1", ...]
}
Collection B:
{
_id: "id1"
},
{
_id: "id2"
},
...
How do I query all documents from B whose _ids are NOT contained in the some_ids_of_b arrays of the documents of A?
Do a simple $lookup from collection B to A, then keep only those documents for which no match was found.
db.collb.aggregate([
{
"$lookup": {
"from": "colla",
"localField": "_id",
"foreignField": "some_ids_of_b",
"as": "a"
}
},
{
$match: {
$expr: {
$eq: [{$size: "$a"}, 0]
}
}
}
])
Demo
One option is:
db.collectionB.aggregate([
{$lookup: {
from: "collectionA",
let: {my_id: "$_id"},
pipeline: [
{$match: {$and: [
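// collADocId is a placeholder for the _id of one specific document in A;
// drop this condition to check against every document in A
{_id: collADocId},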
{$expr: {$in: ["$$my_id", "$some_ids_of_b"]}}
]}},
{$project: {_id: 1}}
],
as: "some_ids_of_b"
}},
{$match: {"some_ids_of_b.0": {$exists: false}}},
{$unset: "some_ids_of_b"}
])
See how it works on the playground example
You can do it with Aggregation Framework:
$group and $addToSet - To get all $some_ids_of_b from all the documents in A collection.
$set with $reduce - To create an array with all unique values of the IDs from the B collection.
$lookup - To fetch the documents from the B collection, where the _id of the document is not present in the $b_ids array.
$project - To project data as expected output.
db.A.aggregate([
{
"$group": {
"_id": null,
"b_ids": {
"$addToSet": "$some_ids_of_b"
}
}
},
{
"$set": {
b_ids: {
$reduce: {
input: "$b_ids",
initialValue: [],
in: {
$setUnion: [
"$$value",
"$$this"
]
}
}
}
}
},
{
"$lookup": {
from: "B",
let: {
b_ids: "$b_ids"
},
pipeline: [
{
"$match": {
"$expr": {
$ne: [
{
"$in": [
"$_id",
"$$b_ids"
]
},
true
]
}
}
}
],
as: "data"
}
},
{
"$project": {
data: 1,
_id: 0
}
}
])
Working Example
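All three answers compute the same thing: the documents of B whose _id is not referenced by any array in A. A plain JavaScript sketch of that set difference, on invented sample data and with a hypothetical helper name `unreferencedInB`, might look like this:

```javascript
// Hypothetical sample data shaped like the collections in the question.
const collectionA = [
  { _id: "a1", some_ids_of_b: ["id1", "id3"] },
  { _id: "a2", some_ids_of_b: ["id3"] },
  { _id: "a3" }, // a document without the array is treated as referencing nothing
];
const collectionB = [{ _id: "id1" }, { _id: "id2" }, { _id: "id3" }, { _id: "id4" }];

function unreferencedInB(collectionA, collectionB) {
  // Like $group + $addToSet + $setUnion: every id mentioned anywhere in A.
  const referenced = new Set(collectionA.flatMap((a) => a.some_ids_of_b || []));
  // Like the final $match: keep B documents whose _id is not in that set.
  return collectionB.filter((b) => !referenced.has(b._id));
}

console.log(unreferencedInB(collectionA, collectionB));
```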

Change element name from the result set of Mongo DB Query

I have a collection named "FormData" that looks like this:
{
"_id": ObjectId("5e3c27bf1ef77236945ef07b"),
"eed12747-0923-4290-b09c-5a05107f5609": "20200206",
"bd637691-782d-4cfd-8624-feeedfe11b3e": "20200206_1@mail.com"
}
I have another collection named "Form", which holds the titles of the fields:
{
"_id": ObjectId("5e3c27bf1ef77236945ef07b"),
"Fields":[
{
"FieldID": "eed12747-0923-4290-b09c-5a05107f5609",
"Title": "Phone"
},
{
"FieldID": "bd637691-782d-4cfd-8624-feeedfe11b3e",
"Title": "Email"
}]
}
Now I have to map each element name to the matching Form field title, and I need a result like this:
{
"_id": ObjectId("5e3c27bf1ef77236945ef07b"),
"Phone": "20200206",
"Email": "20200206_1@mail.com"
}
Please help me to solve this.
Thanks in advance!
You can:
$objectToArray to convert the $$ROOT document into an array of k-v pairs for future lookups
use a sub-pipeline in $lookup to find the value by the uuid
use $mergeObjects to combine the original values (i.e. "20200206"...) with the new field name looked up (i.e. "Phone"...)
wrangle the result back into original form using $arrayToObject and $replaceRoot
db.FormData.aggregate([
{
$match: {
"_id": ObjectId("5e3c27bf1ef77236945ef07b")
}
},
{
$project: {
kv: {
"$objectToArray": "$$ROOT"
}
}
},
{
$unwind: "$kv"
},
{
"$lookup": {
"from": "Form",
"let": {
uuid: "$kv.k"
},
"pipeline": [
{
$match: {
"_id": ObjectId("5e3c27bf1ef77236945ef07b")
}
},
{
"$unwind": "$Fields"
},
{
$match: {
$expr: {
$eq: [
"$$uuid",
"$Fields.FieldID"
]
}
}
},
{
$project: {
_id: false,
k: "$Fields.Title"
}
}
],
"as": "formLookup"
}
},
{
$unwind: "$formLookup"
},
{
$project: {
kv: {
"$mergeObjects": [
"$kv",
"$formLookup"
]
}
}
},
{
$group: {
_id: "$_id",
kv: {
$push: "$kv"
}
}
},
{
"$project": {
newDoc: {
"$arrayToObject": "$kv"
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{
"_id": "$_id"
},
"$newDoc"
]
}
}
}
])
Mongo Playground
Another option is to start from Form collection and avoid $unwind:
$match and $lookup to get all needed data into one document
$objectToArray to get known keys for FormData
Match the items using $indexOfArray and $arrayElemAt and merge them using $mergeObjects. Then use $arrayToObject to format the response
db.Form.aggregate([
{$match: {_id: ObjectId("5e3c27bf1ef77236945ef07b")}},
{$lookup: {
from: "FormData",
localField: "_id",
foreignField: "_id",
as: "formLookup",
pipeline: [{$project: {_id: 0}}]
}},
{$set: {formLookup: {$objectToArray: {$first: "$formLookup"}}}},
{$replaceRoot: {
newRoot: {
$mergeObjects: [
{$arrayToObject: {
$map: {
input: "$formLookup",
in: {$mergeObjects: [
{v: "$$this.v"},
{k: {$getField: {
input: {$arrayElemAt: [
"$Fields",
{$indexOfArray: ["$Fields.FieldID", "$$this.k"]}
]},
field: "Title"
}}}
]}
}
}},
{_id: "$_id"}
]
}
}}
])
See how it works on the playground example
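Both pipelines boil down to renaming each key of the FormData document to the Title of the Form field with the same FieldID. The plain JavaScript sketch below shows that mapping on in-memory objects. The short ids, the values and the function name `renameByTitle` are invented, and dropping keys that have no matching field is an assumption of the sketch.

```javascript
// Hypothetical documents; short string ids replace the UUIDs and ObjectIds.
const form = {
  _id: "form-1",
  Fields: [
    { FieldID: "field-a", Title: "Phone" },
    { FieldID: "field-b", Title: "Email" },
  ],
};
const formData = {
  _id: "form-1",
  "field-a": "20200206",
  "field-b": "someone@example.com",
  "field-x": "no matching field",
};

function renameByTitle(formData, form) {
  // Equivalent of looking up Fields by FieldID.
  const titleById = new Map(form.Fields.map((f) => [f.FieldID, f.Title]));
  const renamed = { _id: formData._id };
  // Object.entries plays the role of $objectToArray.
  for (const [key, value] of Object.entries(formData)) {
    if (key === "_id") continue;
    const title = titleById.get(key);
    if (title !== undefined) renamed[title] = value; // assumption: unknown keys are dropped
  }
  return renamed;
}

console.log(renameByTitle(formData, form));
```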

MongoDB: slow performance pipeline lookup compared to basic lookup

I have two collections:
matches:
[{
date: "2020-02-15T17:00:00Z",
players: [
{_id: "5efd9485aba4e3d01942a2ce"},
{_id: "5efd9485aba4e3d01942a2cf"}
]
},
{...}]
and players:
[{
_id: "5efd9485aba4e3d01942a2ce",
name: "Rafa Nadal"
},
{
_id: "5efd9485aba4e3d01942a2cf",
name: "Roger Federer"
},
{...}]
I need to use lookup pipeline because I'm building a graphql resolver with recursive functions and I need nested lookup. I've followed this example https://docs.mongodb.com/datalake/reference/pipeline/lookup-stage#nested-example
My problem is that the pipeline lookup takes 11 seconds, while the basic lookup takes only 0.67 seconds. And my test database is very small: about 1300 players and 700 matches.
This is my pipeline lookup (11 seconds to resolve)
db.collection('matches').aggregate([{
$lookup: {
from: 'players',
let: { ids: '$players' },
pipeline: [{ $match: { $expr: { $in: ['$_id', '$$ids' ] } } }],
as: 'players'
}
}]);
And this my basic lookup (0.67 seconds to resolve)
db.collection('matches').aggregate([{
$lookup: {
from: "players",
localField: "players",
foreignField: "_id",
as: "players"
}
}]);
Why is there such a difference? How can I make the pipeline lookup faster?
When a $lookup uses a pipeline with a $match stage, an index is used only for fields compared with the $eq operator; for any other operator the index is not used.
The pipeline from your example has to be written like this (again, no index is used here, because the comparison is not $eq):
db.matches.aggregate([
{
$lookup: {
from: "players",
let: {
ids: {
$map: {
input: "$players",
in: "$$this._id"
}
}
},
pipeline: [
{
$match: {
$expr: {
$in: [
"$_id",
"$$ids"
]
}
}
}
],
as: "players"
}
}
])
Because players is an array of objects, it needs to be mapped to an array of ids first.
MongoDB Playground
As @namar sood comments, there are several tickets that refer to this issue:
https://jira.mongodb.org/browse/SERVER-37470
https://jira.mongodb.org/browse/SERVER-32549
Meanwhile, a workaround could be the following (it also works nested):
db.collection('matches').aggregate([
{ $unwind: '$players' },
{
$lookup: {
from: 'players',
let: { id: '$players' },
pipeline: [{ $match: { $expr: { $eq: ['$_id', '$$id' ] } } }],
as: 'players'
}
},
{ $unwind: '$players' },
{
$group: {
"_id": "$_id",
"data": { "$first": "$$ROOT" },
"players": {$push: "$players"}
}
},
{ $addFields: {"data.players": "$players"} },
{ $replaceRoot: { newRoot: "$data" }}
]);
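The idea behind this workaround is to turn one array membership test per match into one equality probe per player id, and then regroup. The plain JavaScript sketch below mimics that shape on in-memory arrays. It assumes, as the workaround's `let: { id: '$players' }` does, that `players` holds plain ids; the data and the name `resolvePlayers` are hypothetical, and the Map merely stands in for the `_id` index.

```javascript
// Hypothetical sample data; players in each match are stored as plain ids.
const players = [
  { _id: "p1", name: "Rafa Nadal" },
  { _id: "p2", name: "Roger Federer" },
];
const matches = [
  { _id: "m1", date: "2020-02-15T17:00:00Z", players: ["p1", "p2"] },
  { _id: "m2", date: "2020-02-16T17:00:00Z", players: ["p2"] },
];

function resolvePlayers(matches, players) {
  // Stand-in for the _id index: one equality probe per id.
  const playerById = new Map(players.map((p) => [p._id, p]));
  // First $unwind: one row per (match, player id) pair.
  const rows = matches.flatMap((m) => m.players.map((id) => ({ match: m, id })));
  // $lookup with $eq, then $group by match _id, pushing the resolved players.
  const grouped = new Map();
  for (const { match, id } of rows) {
    const player = playerById.get(id);
    if (!player) continue; // the second $unwind drops ids with no player
    if (!grouped.has(match._id)) grouped.set(match._id, { ...match, players: [] });
    grouped.get(match._id).players.push(player);
  }
  return [...grouped.values()];
}

console.log(JSON.stringify(resolvePlayers(matches, players), null, 2));
```

As in the pipeline, a match whose player ids resolve to nothing disappears from the result, so this shape suits data where every id is expected to exist.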

Aggregate function two match with lookup mongodb

db.setting.aggregate([
{
$match: {
status: true,
deleted_at: 0,
_id: {
$in: [
ObjectId("5c4ee7eea4affa32face874b"),
ObjectId("5ebf891245aa27c290672325")
]
}
}
},
{
$lookup: {
from: "site",
localField: "_id",
foreignField: "admin_id",
as: "data"
}
},
{
$project: {
name: 1,
status: 1,
numberOfRecord: {
$size: "$data"
}
}
},
{
$sort: {
numberOfRecord: 1
}
}
])
I would like to fetch each record together with the number of related records whose createdAt is greater than or equal to 2020-01-01. I tried adding the code below, but without success.
How can I make this work? Please guide me; thanks in advance. Here is the playground: https://mongoplayground.net/p/GU8WbTVqo2I
{
$match: {
"data.createdAt": {
$gte: new Date("2020-01-01")
}
}
},
Output should be
[
{
"_id": ObjectId("5ebf891245aa27c290672325"),
"name": "Menz",
"numberOfRecord": 0,
"status": true
},
{
"_id": ObjectId("5c4ee7eea4affa32face874b"),
"name": "Dave",
"numberOfRecord": 1, // 1 instead of 2, because only one record is gte "2020-01-01"
"status": true
}
]
You can use $unwind to split up the array, but a subsequent $match would completely eliminate documents that have no matching entries, so you would need to use $group with a conditional count, perhaps:
db.setting.aggregate([
{$match: {
status: true,
deleted_at: 0,
_id: {
$in: [
ObjectId("5c4ee7eea4affa32face874b"),
ObjectId("5ebf891245aa27c290672325")
]
}
}},
{$lookup: {
from: "site",
localField: "_id",
foreignField: "admin_id",
as: "data"
}},
{$unwind: {
path: "$data",
preserveNullAndEmptyArrays: true
}},
{$group: {
_id: "$_id",
name: {$first: "$name"},
status: {$first: "$status"},
numberOfRecord: {
$sum:{
$cond:{
if:{
$gte:[
"$data.createdAt",
new Date("2020-01-01")
]
},
then: 1,
else: 0
}
}
}
}},
{$sort: { numberOfRecord: 1 }}
])
Playground
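The point of the conditional $sum is that a setting with no recent sites still appears, with a count of 0. The plain JavaScript sketch below models that behaviour on in-memory arrays. The sample data, the string ids and the function name `countRecentSites` are invented for illustration.

```javascript
// Hypothetical sample data; string ids replace the ObjectIds.
const settings = [
  { _id: "s1", name: "Dave", status: true },
  { _id: "s2", name: "Menz", status: true },
];
const sites = [
  { admin_id: "s1", createdAt: new Date("2019-06-01") },
  { admin_id: "s1", createdAt: new Date("2020-03-01") },
];

function countRecentSites(settings, sites, since) {
  return settings
    .map((s) => ({
      _id: s._id,
      name: s.name,
      status: s.status,
      // Like $sum over $cond: count only the joined sites created on or after `since`.
      numberOfRecord: sites.filter(
        (site) => site.admin_id === s._id && site.createdAt >= since
      ).length,
    }))
    // Like the final $sort: { numberOfRecord: 1 }.
    .sort((a, b) => a.numberOfRecord - b.numberOfRecord);
}

console.log(countRecentSites(settings, sites, new Date("2020-01-01")));
```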

Using $group in aggregate Spring Data equivalent

Following is the MongoDB query. I am stuck on finding the right Spring API equivalent for the group and project.
db.users.aggregate([
{ $lookup: { from: "organizations", localField: "orgId", foreignField: "_id", as: "organization" } },
{ $unwind: "$organization" },
{ $group: { _id: "$userOrgMap.orgId", userCount: { $sum: 1 }, name: { $first: "$organization.name" }, status: { $first: "$organization.status" } } },
{ $project: { _id: 1, name: 1, status: 1, userCount: 1 } },
{ $sort: { userCount: -1 } }
]);
I need help with the details of the Spring Data MongoDB API that should be used for the $group and $project stages.