Complex aggregation query with in clause from document array - mongodb

Below is the sample MongoDB Data Model for a user collection:
{
"_id": ObjectId('58842568c706f50f5c1de662'),
"userId": "123455",
"user_name":"Bob",
"interestedTags": [
"music",
"cricket",
"hiking",
"F1",
"Mobile",
"racing"
],
"listFriends": [
"123456",
"123457",
"123458"
]
}
listFriends is an array of userId values of other users.
For a particular userId, I need to extract the listFriends (userIds), and for those userIds I need to aggregate the interestedTags and count each tag.
I can achieve this by splitting the query into two parts:
1.) Extract the listFriends for a particular userId,
2.) Use this list in an aggregate() function, something like this:
db.user.aggregate([
{ $match: { userId: { $in: [ "123456","123457","123458" ] } } },
{ $unwind: '$interestedTags' },
{ $group: { _id: '$interestedTags', countTags: { $sum : 1 } } }
])
My question: is there a way to achieve the above functionality (both steps 1 and 2) in a single aggregate function?

You could use $lookup to look up the friend documents. This stage is usually used to join two different collections, but it can also join a collection with itself, which should work fine in your case:
db.user.aggregate([{
$match: {
userId: '123455', // the user whose friends' tags we want
}
}, {
$unwind: '$listFriends',
}, {
$lookup: {
from: 'user',
localField: 'listFriends',
foreignField: 'userId', // listFriends holds userId values, not _id
as: 'friend',
}
}, {
$project: {
friend: {
$arrayElemAt: ['$friend', 0]
}
}
}, {
$unwind: '$friend.interestedTags'
}, {
$group: {
_id: '$friend.interestedTags',
count: {
$sum: 1
}
}
}]);
Note: This pipeline uses $lookup and $arrayElemAt, which are only available in MongoDB 3.2 or newer, so check your MongoDB version before using it.
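To see what the pipeline computes, here is a minimal in-memory sketch in plain JavaScript, with no MongoDB involved. It assumes friends are looked up by userId, as in the question's data model. The sample friend documents and the function name friendTagCounts are made up for illustration.

```javascript
// Hypothetical sample data shaped like the question's user collection.
const users = [
  { userId: "123455", interestedTags: ["music"], listFriends: ["123456", "123457"] },
  { userId: "123456", interestedTags: ["music", "hiking"], listFriends: [] },
  { userId: "123457", interestedTags: ["music", "F1"], listFriends: [] },
];

// Emulates $match -> $unwind listFriends -> $lookup -> $unwind tags -> $group.
function friendTagCounts(allUsers, userId) {
  const me = allUsers.find((u) => u.userId === userId); // $match
  if (!me) return {};
  const counts = {};
  for (const friendId of me.listFriends) { // $unwind: '$listFriends'
    // $lookup followed by $arrayElemAt 0: take the first matching friend.
    const friend = allUsers.find((u) => u.userId === friendId);
    if (!friend) continue; // no match: the later $unwind drops this document
    for (const tag of friend.interestedTags) { // $unwind: '$friend.interestedTags'
      counts[tag] = (counts[tag] || 0) + 1; // $group with $sum: 1
    }
  }
  return counts;
}

console.log(friendTagCounts(users, "123455"));
// { music: 2, hiking: 1, F1: 1 }
```

The user's own tags are not counted, only the tags of the users listed in listFriends.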

Related

How to aggregate two collections and match field with array

I need to group the results of two collections candidatos and ofertas, and then "merge" those groups to return an array with matched values.
I've created this example with the aggregate and similar data to make this easier to test:
https://mongoplayground.net/p/m0PUfdjEye4
This is the explanation of the problem that I'm facing.
I can get both groups with the desired results independently:
ofertas collection:
db.getCollection('ofertas').aggregate([
{"$group" : {_id:"$ubicacion_puesto.provincia", countProvinciaOferta:{$sum:1}}}
]);
This is the result...
candidatos collection:
db.getCollection('candidatos').aggregate([
{"$group" : {_id:"$que_busco.ubicacion_puesto_trabajo.provincia", countProvinciaCandidato:{$sum:1}}}
]);
This is the result...
What I need to do is combine those groups, merging their results where the _id values match. I think I'm on the right track with the next aggregate, but the field countOfertas always returns 0.0. I think there is something wrong in the $cond of my $project, but I don't know what it is. This is the aggregate:
db.getCollection('candidatos').aggregate([
{"$group" : {_id:"$que_busco.ubicacion_puesto_trabajo.provincia", countProvinciaCandidato:{$sum:1}}},
{
$lookup: {
from: 'ofertas',
let: {},
pipeline: [
{"$group" : {_id:"$ubicacion_puesto.provincia", countProvinciaOferta:{$sum:1}}}
],
as: 'ofertas'
}
},
{
$project: {
_id: 1,
countProvinciaCandidato: 1,
countOfertas: {
$cond: {
if: {
$eq: ['$ofertas._id', "$_id"]
},
then: '$ofertas.countProvinciaOferta',
else: 0,
}
}
}
},
{ $sort: { "countProvinciaCandidato": -1}},
{ $limit: 20 }
]);
And this is the result; as you can see, the field countOfertas is always 0.
Any kind of help is welcome.
Your attempt is appreciated, and it is close. In $project, though, you need $reduce, which iterates over the ofertas array and applies the condition to each element.
Here is the code:
db.candidatos.aggregate([
{
"$group": {
_id: "$que_busco.ubicacion_puesto_trabajo.provincia",
countProvinciaCandidato: { $sum: 1 }
}
},
{
$lookup: {
from: "ofertas",
let: {},
pipeline: [
{
"$group": {
_id: "$ubicacion_puesto.provincia",
countProvinciaOferta: { $sum: 1 }
}
}
],
as: "ofertas"
}
},
{
$project: {
_id: 1,
countProvinciaCandidato: 1,
countOfertas: {
"$reduce": {
"input": "$ofertas",
initialValue: 0,
"in": {
$cond: [
{ $eq: [ "$$this._id", "$_id" ] },
{ $add: [ "$$value", "$$this.countProvinciaOferta" ] }, // add the matching group's count
"$$value"
]
}
}
}
}
},
{ $sort: { "countProvinciaCandidato": -1 } },
{ $limit: 20 }
])
Working Mongo playground
Note: If you need to do this with aggregation only, this is fine. Personally, though, I don't think it is a good approach. My suggestion is to run the two group aggregations concurrently as separate calls and merge the results programmatically, because $lookup is expensive and performance will degrade as the data grows.
The $eq in the $cond compares an array to a single _id value, so it never matches.
The $lookup stage puts its results in the ofertas field as an array of documents, so '$ofertas._id' is an array of all the _id values.
You will probably need to use $unwind or $reduce after the $lookup.
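The mismatch can be reproduced in plain JavaScript. The sketch below uses made-up province names and counts, and assumes the goal is the offer count of the matching province. It mirrors how '$ofertas._id' resolves to an array of all _id values, and how walking the array picks out the matching group.

```javascript
// Hypothetical document as it looks after the $group and $lookup stages.
const doc = {
  _id: "Madrid",
  countProvinciaCandidato: 5,
  ofertas: [
    { _id: "Sevilla", countProvinciaOferta: 2 },
    { _id: "Madrid", countProvinciaOferta: 7 },
  ],
};

// '$ofertas._id' resolves to an array of every _id in the ofertas array.
const ofertaIds = doc.ofertas.map((o) => o._id); // ["Sevilla", "Madrid"]

// Comparing that array to a single value is never true, so $cond takes "else".
const naiveMatch = ofertaIds === doc._id;

// Walking the array element by element finds the matching group.
const countOfertas = doc.ofertas.reduce(
  (total, o) => (o._id === doc._id ? total + o.countProvinciaOferta : total),
  0
);

console.log(naiveMatch, countOfertas); // false 7
```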

Array is reordered when using $lookup

I have this aggregation:
db.getCollection("users").aggregate([
{
"$match": {
"_id": "5a708a38e6a4078bd49f01d5"
}
},
{
"$lookup": {
"from": "user-locations",
"localField": "locations",
"as": "locations",
"foreignField": "_id"
}
}
])
It works well, but there is one small thing that I don't understand and I can't fix.
In the query output, the locations array is reordered by ObjectId and I really need to keep the original order of data.
Here is what the locations array from the users collection looks like:
'locations' : [
ObjectId("5b55e9820b720a1a7cd19633"),
ObjectId("5a708a38e6a4078bd49ef13f")
],
And here is the result after the aggregation:
'locations' : [
{
'_id' : ObjectId("5a708a38e6a4078bd49ef13f"),
'name': 'Location 2'
},
{
'_id' : ObjectId("5b55e9820b720a1a7cd19633"),
'name': 'Location 1'
}
],
What am I missing here? I really have no idea how to proceed with this issue.
Could you give me a push?
$lookup does not guarantee the order of the result documents. You can try the following approach to preserve the original order of the array:
$unwind deconstructs the locations array and adds an index field with each element's position, starting from 0
$lookup with locations
$set to select the first element from locations
$sort by the index field in ascending order
$group by _id and reconstruct the locations array
db.users.aggregate([
{ $match: { _id: "5a708a38e6a4078bd49f01d5" } },
{
$unwind: {
path: "$locations",
includeArrayIndex: "index"
}
},
{
$lookup: {
from: "user-locations",
localField: "locations",
foreignField: "_id",
as: "locations"
}
},
{ $set: { locations: { $arrayElemAt: ["$locations", 0] } } },
{ $sort: { index: 1 } },
{
$group: {
_id: "$_id",
locations: { $push: "$locations" }
}
}
])
Playground
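The idea behind these steps (remember each id's original position, then put the documents back in that order) can also be applied in application code after a plain $lookup returns. A minimal sketch, assuming ids are compared by their string form; restoreOrder is a hypothetical helper, not part of any MongoDB driver.

```javascript
// Reorder looked-up documents to follow the order of the original id array.
function restoreOrder(originalIds, docs) {
  // Index the documents by the string form of their _id.
  const byId = new Map(docs.map((d) => [String(d._id), d]));
  return originalIds
    .map((id) => byId.get(String(id)))
    .filter((d) => d !== undefined); // skip ids with no matching document
}

// Plain strings stand in for ObjectId values here.
const originalIds = ["5b55e9820b720a1a7cd19633", "5a708a38e6a4078bd49ef13f"];
const lookedUp = [
  { _id: "5a708a38e6a4078bd49ef13f", name: "Location 2" },
  { _id: "5b55e9820b720a1a7cd19633", name: "Location 1" },
];

console.log(restoreOrder(originalIds, lookedUp).map((d) => d.name));
// [ 'Location 1', 'Location 2' ]
```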
From this closed bug report:
When using $lookup, the order of the documents returned is not guaranteed. The documents are returned in "natural order" - as they are encountered in the database. The only way to get a guaranteed consistent order is to add a $sort stage to the query.
Basically, any Mongo query or pipeline returns documents in the order they were matched, so the "right" order is not guaranteed, especially if index usage is involved.
What you should do is add a $sort stage as suggested, like so:
db.collection.aggregate([
{
"$match": {
"_id": "5a708a38e6a4078bd49f01d5"
}
},
{
"$lookup": {
"from": "user-locations",
"let": {
"locations": "$locations"
},
"pipeline": [
{
"$match": {
"$expr": {
"$setIsSubset": [
[
"$_id"
],
"$$locations"
]
}
}
},
{
$sort: {
_id: 1 // any other sort field you want.
}
}
],
"as": "locations",
}
}
])
You can also keep the original $lookup syntax you're using and just $unwind, $sort and then $group to restore the structure.

Optimization and Indexing on Mongo Query

Please help me work out which indexes need to be created and which fields to index.
I have tested multiple indexes, but the query still takes a long time to execute:
db.collection_1.aggregate([{ $match: { $and: [ { date: { $gte: new Date(1593561600000), $lt: new Date(1604966400000) } },
{ type: 0 }, { $or: [ { partysr: 0 }, {} ] }, { $or: [ { code: "******" }, { _id: { $type: -1 } } ] } ] } },
{ $sort: { date: -1 } }, { $skip: 0 }, { $limit: 100 },
{ $lookup: { from: "collection_2", localField: "code", foreignField: "code", as: "j" } },
{ $group: { _id: "$codeTsr" } } ]).explain("executionStats")
You gave us very little information here.
Have you tried testing this query with and without the $lookup stage?
How fast is the query without the lookup?
My first guess is that your collection_2 collection does not have a proper index, which slows the query down. If the query is much faster without the $lookup stage, I would create an index on collection_2 for the "code" field.
Another possible optimization is to run the $group stage first and the $lookup stage after it.
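As a starting point, the index suggestions could be written down as data and applied from the shell. This is only a sketch: the collection_2 index on code is the one suggested above, while the compound index on collection_1 (equality field first, then the sort field) is an assumption based on the $match and $sort stages in the question. The helper applyIndexes is hypothetical; it relies only on db.getCollection(...).createIndex(...) as available in the mongo shell.

```javascript
// Index suggestions as plain data. The collection_2 index follows the answer;
// the collection_1 compound index is an assumption based on the $match/$sort.
const indexSpecs = [
  { collection: "collection_2", keys: { code: 1 } },
  { collection: "collection_1", keys: { type: 1, date: -1 } },
];

// Applies each spec through a mongo-shell-style `db` handle and
// returns whatever createIndex returns for each one.
function applyIndexes(db, specs) {
  return specs.map(({ collection, keys }) =>
    db.getCollection(collection).createIndex(keys)
  );
}

// In the shell: applyIndexes(db, indexSpecs);
```

Whether the compound index helps depends on the data and on the unusual $or clauses in the $match, so compare the explain("executionStats") output before and after creating it.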

How to use NOT IN array condition inside mongodb $lookup aggregate

I have two collections:
Users:
{
_id: ObjectId('5e11d2d8ad9c4b6e05e55b82'),
name:"vijay"
}
Followers :
{
_id:ObjectId('5ed0c8faac47af698ab9f659'),
user_id:ObjectId('5e11d2d8ad9c4b6e05e55b82'),
following:[
ObjectId('5ee5ca5fac47af698ab9f666'),
ObjectId('5df7c0a66243414ad2663088')
],
created_at:"2020-05-29T08:34:02.959+00:00"
}
I need to list all users from the users collection who are not in the following array of a particular user. I've come up with the pipeline below, an aggregate on the followers collection:
[
{
'$match': {
'user_id': new ObjectId('5e11d2d8ad9c4b6e05e55b82')
}
}, {
'$project': {
'user_id': '$user_id',
'following': '$following'
}
}, {
'$lookup': {
'from': 'users',
'pipeline': [
{
'$match': {
'_id': {
'$nin': [
'$following'
]
}
}
}
],
'as': 'result'
}
}
]
but this does not produce the output I need.
Can anyone help me with this?
Thanks
You should use $not with $in inside an $expr expression, because $nin is a query operator, not an aggregation expression operator.
One more fix: create a variable using let: { following: "$following" } and refer to it inside the pipeline as $$following, because the $lookup pipeline cannot access fields of the outer document directly.
{
$lookup: {
from: "users",
let: {
following: "$following"
},
pipeline: [
{
$match: {
$expr: {
$not: {
$in: [
"$_id",
"$$following"
]
}
}
}
}
],
as: "result"
}
}
Working Playground: https://mongoplayground.net/p/08OT6NnuYHx
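For intuition, the filter that the $expr performs can be emulated in plain JavaScript. This is an in-memory sketch with made-up users and short string ids standing in for ObjectIds; notFollowed is a hypothetical helper name.

```javascript
// Hypothetical data with string ids standing in for ObjectIds.
const users = [
  { _id: "u1", name: "vijay" },
  { _id: "u2", name: "asha" },
  { _id: "u3", name: "ravi" },
];
const follower = { user_id: "u1", following: ["u2"] };

// Equivalent of $expr: { $not: { $in: ["$_id", "$$following"] } }
function notFollowed(allUsers, following) {
  const followed = new Set(following);
  return allUsers.filter((u) => !followed.has(u._id));
}

console.log(notFollowed(users, follower.following).map((u) => u.name));
// [ 'vijay', 'ravi' ]
```

Note that the user's own document passes the filter too, since a user is normally not in their own following array. If that is not wanted, it needs to be excluded separately.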

Mongodb: Find one article for each author ID

I have a user with an array of authors that they follow, like this:
"authors" : [
ObjectId("5a66d368486631e55a4ed05c"),
ObjectId("5a6765f5486631e55a564ae2")
]
And I have articles with author ID, like this:
"authorId" : ObjectId("5a66d368486631e55a4ed05c"),
I want to get the latest article for each author without making multiple recursive calls to the database.
Any ideas?
PS: I'm using the mongodb driver; I don't want to use mongoose for this. Thanks
In MongoDB 3.6 and later you can use a custom pipeline in the $lookup stage. In your case, you can use $in inside a $match stage to get the matching articles, then sort them newest first and $group them by authorId, taking the first one in each group (using the $sort stage and the $first operator). You can add $replaceRoot to get back the original shape of the documents from the articles collection.
db.user.aggregate([
{
$match: { userId: "some user Id" }
},
{
$lookup: {
from: "articles",
let: { authors: "$authors" },
pipeline: [
{
$match: {
$expr: {
$in: [ "$authorId", "$$authors" ]
}
}
},
{
$sort: { createdAt: -1 }
},
{
$group: {
_id: "$authorId",
article: { $first: "$$ROOT" }
}
},
{
$replaceRoot: { newRoot: "$article" }
}
],
as: "articles"
}
}
])
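The inner pipeline's logic (filter, sort newest first, keep the first article per author) can be checked with a small in-memory sketch in plain JavaScript. The sample articles, the short string ids, and the helper name latestPerAuthor are made up for illustration; createdAt is assumed to be the article's timestamp field, as in the pipeline above.

```javascript
// Hypothetical articles; string ids stand in for ObjectIds.
const articles = [
  { authorId: "a1", title: "Old", createdAt: new Date("2020-01-01") },
  { authorId: "a1", title: "New", createdAt: new Date("2020-06-01") },
  { authorId: "a2", title: "Only", createdAt: new Date("2020-03-01") },
  { authorId: "a3", title: "Unfollowed", createdAt: new Date("2020-07-01") },
];

function latestPerAuthor(allArticles, followedAuthors) {
  const followed = new Set(followedAuthors);
  const latest = new Map();
  const sorted = allArticles
    .filter((a) => followed.has(a.authorId)) // $match with $in
    .sort((a, b) => b.createdAt - a.createdAt); // $sort: { createdAt: -1 }
  for (const article of sorted) {
    // $group with $first: keep only the first article seen per author.
    if (!latest.has(article.authorId)) latest.set(article.authorId, article);
  }
  return [...latest.values()]; // $replaceRoot: return the article documents
}

console.log(latestPerAuthor(articles, ["a1", "a2"]).map((a) => a.title));
// [ 'New', 'Only' ]
```

Unlike this sketch, MongoDB does not guarantee the order of the documents that $group emits, so add a final $sort if the order of the resulting articles matters.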