MongoDB select distinct where not in select distinct - mongodb

In a MongoDB database, I need to find all the records in one collection that aren't in a different collection.
Say I have 2 collections
1) user_autos
{
make: string,
user_id: objId
}
2) auto_makes
{
mfg: string,
make: string
}
I need to find all the "makes" in user_autos that are not part of the "master makes" list in auto_makes.
I want the equivalent of this SQL:
SELECT DISTINCT
a.make
FROM
user_autos a
WHERE
a.make NOT IN (
SELECT DISTINCT
b.make
FROM
auto_makes b
)
Help please

To achieve this, use an aggregation pipeline with the $lookup stage.
$lookup performs a left outer join between two collections, so any 'user_autos' document
with no match in 'auto_makes' gets an empty array in the joined field. Keep only those
documents, then $group them by 'make' to get the distinct list of unmatched makes.
You can do it as below.
db.user_autos.aggregate([
{$lookup:{
from:"auto_makes",
localField:"make",
foreignField:"make",
as:"m"
}},
//$lookup always creates the "m" field; it is an empty array when nothing
//matched, so test for an empty array rather than a missing field.
{$match:{
m:{$size:0}
}},
{$group:{
_id:"$make"
}},
//if you want to get the distinct 'make' values as an array in a single
//document, add another $group stage.
{$group:{
_id:null,
make_list:{$addToSet:"$_id"}
}}
])
Visit https://docs.mongodb.com/manual/reference/operator/aggregation/group/ ,
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
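To check the logic without a database, here is a minimal in-memory sketch in plain JavaScript of what the $lookup, $match and $group stages compute. The sample documents and the helper name unmatchedMakes are made up for illustration; the sketch only mimics the semantics of the pipeline and says nothing about how MongoDB executes it.

```javascript
// Hypothetical sample data standing in for the two collections.
const userAutos = [
  { make: "Toyota", user_id: 1 },
  { make: "Toyotta", user_id: 2 },
  { make: "Ford", user_id: 3 },
  { make: "Toyotta", user_id: 4 },
];
const autoMakes = [
  { mfg: "Toyota Motor", make: "Toyota" },
  { mfg: "Ford Motor", make: "Ford" },
];

function unmatchedMakes(userDocs, masterDocs) {
  // $lookup: attach the array of matching master documents to each user doc.
  const joined = userDocs.map((doc) => ({
    ...doc,
    m: masterDocs.filter((master) => master.make === doc.make),
  }));
  // $match: keep docs whose joined array is empty (the field itself always exists).
  const orphans = joined.filter((doc) => doc.m.length === 0);
  // $group by make: collapse to distinct values.
  return [...new Set(orphans.map((doc) => doc.make))];
}

const result = unmatchedMakes(userAutos, autoMakes);
console.log(result);
```

With this sample data the only make missing from the master list is the misspelled "Toyotta", and it is reported once even though two documents carry it.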

Related

Mongodb4.4 aggregation sorting random behaviour

I have an aggregation query that I am running on MongoDB 4.4, and I am getting a weird sort order. If two documents have the same sort value, they come back in random order relative to each other. Ideally, if the sort value is the same, the results should be sorted by natural order. The query runs fine on MongoDB 3.6.
db.getCollection('job').aggregate([
{"$match":{"$text":{"$search":"\"Cleaner\""}}},
{"$match":{"active":true}},
{"$match":{"status":"OPEN"}},
{"$project":
{"id":1,"source":1,"feed":1,"cardType":1,"groupCategory":1,"isPremium":1,"premiumTillDate":1,"createdDate":1,"title":1,"featuredImageUrl":1,"companyName":1,"salaryType":1,"contractType":1,"jobDescription":1,"location":1,"scope":1,"microRole":1,"address":1,"minimumSalary":1,"hiringManagerName":1,"hiringManagerImageUrl":1,"createdBy":1,"perks":1,"showMapView":1,"distance":1,"startDate":1,"link":"$externalJobDetail.link","publishDateTime":"$externalJobDetail.publishDateTime","salaryDescription":"$externalJobDetail.salaryDescription","companyJobLogoURL":"$externalJobDetail.companyJobLogoURL","monetisation":"$monetisation.value","order":"$monetisation.value"}},
{"$sort":{"order":-1}},
{"$skip":0},
{"$limit":29}])
Add the _id field to the sort query to achieve a stable sort.
db.getCollection('job').aggregate(
[
// pipeline stages
{ $sort : { order : -1, _id: -1 } }
]
)
From the docs:
If a stable sort is desired, include at least one field in your sort that contains exclusively unique values. The easiest way to guarantee this is to include the _id field in your sort query.
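If the sort specification is built in application code, the tie-breaker can be appended in one place. The following is a small sketch in plain JavaScript; the helper name withTieBreaker is hypothetical and not part of any MongoDB driver.

```javascript
// Hypothetical helper: returns a copy of a $sort specification with a unique
// tie-breaker field appended, unless the spec already sorts on that field.
function withTieBreaker(sortSpec, field = "_id", direction = -1) {
  if (Object.prototype.hasOwnProperty.call(sortSpec, field)) {
    return { ...sortSpec };
  }
  // Key order matters in a $sort document: the spread keeps the original
  // keys first and the tie-breaker goes last.
  return { ...sortSpec, [field]: direction };
}

const sortStage = { $sort: withTieBreaker({ order: -1 }) };
console.log(JSON.stringify(sortStage));
```

The resulting stage is the same { $sort: { order: -1, _id: -1 } } shown above, and a spec that already contains _id is returned unchanged.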

How to filter an array in a MongoDB document based on query on collection?

Suppose the following collection of documents, each with a 'user_id' field and an array of ids that this user follows:
{"user_id": 1 , "follows" : [2,30]},
{"user_id": 2 , "follows" : [1,40]},
{"user_id": 3 , "follows" : [2,50]},
... large collection
I would like to filter out from "follows" the numbers that don't exist in the collection as a user_id. Think about it as a data cleaning procedure, where follows pointing to users that no longer exist need to be deleted. Example output from the input above:
{"user_id": 1 , "follows" : [2]},
{"user_id": 2 , "follows" : [1]},
{"user_id": 3 , "follows" : [2]},
... large collection
I thought about a projection with a "$filter", but I can't find an expression for checking that a document with that id exists in the whole collection (as $filter seems to be limited to the current document).
Then I tried to aggregate a set of all ids to use in an $in condition, but that failed miserably due to the size of the collection (too large object error).
I thought about unwinding, but I'm hitting the same rock: I can't find an expression for $match or $project that answers the question "Does this value of 'follows' exist as a 'user_id' in the collection?"
The only other option I see is doing the filtering client side with a few independent queries, but I wanted to check first with the community in case I'm missing something.
You could do a $lookup, like this:
$lookup: {
from: 'users',
localField: 'follows',
foreignField: 'user_id',
as: 'follows'
}
This will produce a result like { user_id: 1, follows: [ {user_id: 2, follows: [1, 40] } ] }. Then you should be able to get the result you want with $addFields (to map follows to follows.user_id).
$addFields: { follows: "$follows.user_id" }
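As a sanity check, the effect of these two stages can be mimicked in memory with plain JavaScript. This is only a sketch of the semantics on the sample documents from the question; the function name cleanFollows is made up, and the real pipeline runs inside MongoDB rather than over an in-memory array.

```javascript
// Sample documents taken from the question.
const users = [
  { user_id: 1, follows: [2, 30] },
  { user_id: 2, follows: [1, 40] },
  { user_id: 3, follows: [2, 50] },
];

function cleanFollows(docs) {
  return docs.map((doc) => {
    // $lookup with an array localField: collect the documents whose
    // user_id equals any element of follows.
    const matched = docs.filter((other) => doc.follows.includes(other.user_id));
    // $addFields: replace the joined documents with just their user_id values.
    return { user_id: doc.user_id, follows: matched.map((other) => other.user_id) };
  });
}

const cleaned = cleanFollows(users);
console.log(JSON.stringify(cleaned));
```

Ids 30, 40 and 50 disappear because no document has them as user_id. Note that the joined ids come back in the order the matching documents are found, which is not necessarily the order of the original follows array.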

Mongodb group by pair

I have data like this: DATA. I am trying to group by domain name, and I want the result to look like this:
[
{ "domain": "gmail_com_",
"A": 3,
"B": 5,
"C": 3 },
............
]
Here A and B are the lengths of the lists for the matching domain names, and C is the number of duplicated IP addresses. But as you can see in the result, if a domain name is present in more than two different timestamps, it is only grouped with the first two. I want to group two by two with all the possibilities. In my example, facebook is present in 3 different timestamps, so we should have three different pairs. Can someone help me?
Thanks
To get every possible pair of two discrete values from a series of documents, you will need to:
gather the values into an array
assign an index of some sort to identify each
duplicate the array
unwind both duplicates
eliminate pairs with the same index
An aggregation pipeline might look like so:
db.collection.aggregate([
{$group:{
_id:"$domain",
list:{$push:"$ip"}
}},
{$project:{
numberedList:{
$reduce: {
input: "$list",
initialValue: {a:[],c:0},
in:{
a:{$concatArrays:["$$value.a",[{ip:"$$this",idx:"$$value.c"}]]},
c:{$add:["$$value.c",1]}
}}}}},
{$project:{
// numberedList is {a:[...], c:<count>}; the indexed entries live in "a"
left:"$numberedList.a",
right:"$numberedList.a"
}},
{$unwind:"$left"},
{$unwind:"$right"},
{$match:{$expr:{$ne:["$left.idx","$right.idx"]}}}
])
This should leave you with the domain name in _id and a pair of results in left and right, which you can then process as needed.
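The same steps can be sketched in plain JavaScript to see how many pairs come out. The function name indexedPairs and the sample IP addresses are invented for illustration.

```javascript
// Returns every ordered pair of entries taken from different positions.
function indexedPairs(values) {
  // Assign an index to each value (the $reduce step).
  const numbered = values.map((ip, idx) => ({ ip, idx }));
  const pairs = [];
  // The nested loops play the role of the two $unwind stages.
  for (const left of numbered) {
    for (const right of numbered) {
      // Drop pairs built from the same entry (the $match step).
      if (left.idx !== right.idx) pairs.push({ left, right });
    }
  }
  return pairs;
}

const pairs = indexedPairs(["10.0.0.1", "10.0.0.2", "10.0.0.3"]);
// Keep each unordered pair once by requiring left to come before right.
const unordered = pairs.filter((p) => p.left.idx < p.right.idx);
console.log(pairs.length, unordered.length);
```

Comparing the indexes for inequality yields both (a, b) and (b, a), so three entries give six pairs. If each pair should appear only once, which gives three pairs for three entries, compare with "less than" instead, for example $lt in place of $ne in the final $match.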

Mongodb aggregation skip and limit with sorting bring duplicate records in pagination

I have the following query that first sorts the documents, then skips and limits to 10 records:
db.getCollection('jobpostings').aggregate([
{"$match":{
"expireDate":{"$gte": ISODate("2018-08-12T00:00:00.000Z")},
"publishDate":{"$lt": ISODate("2018-08-13T00:00:00.000Z")},
"isPublished":true,
"isDrafted":false,
"deletedAt":{"$eq":null},
"deleted":false,
"blocked":{"$exists":false}
}},
{"$lookup":{"from":"companies","localField":"company.id","foreignField":"_id","as":"companyDetails"}},
{"$match":{"companyDetails":{"$ne":[]}}},
{"$sort":{
"isFeatured":-1,
"refreshes.refreshAt":-1,
"publishDate":-1
}},
{"$skip":0},
{"$limit":10},
{"$project":{
"position":1,"summary":1,"company":1,"publishDate":1,
"expireDate":{"$dateToString":{"format":"%Y-%m-%d","date":"$expireDate"}},
"locations":1,"minimumEducation":1,"workType":1,"skills":1,"contractType":1,
"isExtensible":1,"salary":1,"gender":1,"yearsOfExperience":1,"canApplyOnline":1,"number":1,
"isFeatured":1,"viewsCount":1,
"status":{"$cond":{
"if":{"$and":[
{"$lt":["$publishDate", ISODate("2018-08-13T00:00:00.000Z")]},
{"$gt":["$publishDate", ISODate("2018-08-11T00:00:00.000Z")]}]},"then":"New",
"else":{"$cond":{
"if":{"$lt":["$publishDate",ISODate("2018-08-12T00:00:00.000Z")]},"then":"Old","else":"Future"}}}},
"companyDetails.profilePic":1,"companyDetails.businessUnits":1,"companyDetails.totalRatingAverage":1,
"expiringDuration":{"$floor":{"$divide":[{"$subtract":["$expireDate",ISODate("2018-08-12T00:00:00.000Z")]},
86400000]}},
"companyDetails.totalReviews":{"$size":{"$ifNull":[{"$let":{"vars":{
"companyDetailOne":{"$arrayElemAt":["$companyDetails",0]}},"in":"$$companyDetailOne.reviews"}},[]]}}}}
])
I compared the result with skip and limit commented out against the pages returned for skip=0, limit=10 and for skip=10, limit=10: some documents from the first page show up again as duplicates on the second page.
The same thing happens on other pages, with other documents.
It looks like the three fields you're sorting by are not unique, so the order of documents with equal values can differ between executions. To fix that, you can add an additional field to your $sort. Since _id is always unique, it is a good candidate. Try:
{"$sort":{
"isFeatured":-1,
"refreshes.refreshAt":-1,
"publishDate":-1,
"_id": -1
}}
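To see why the unique last key removes the duplicates, here is a small in-memory sketch in plain JavaScript. The helpers compareBy and page and the sample documents are hypothetical; the comparator only handles flat field names, not dotted paths such as refreshes.refreshAt.

```javascript
// Build a comparator from a $sort-like spec, e.g. { isFeatured: -1, _id: -1 }.
function compareBy(spec) {
  return (a, b) => {
    for (const [field, dir] of Object.entries(spec)) {
      if (a[field] < b[field]) return -dir;
      if (a[field] > b[field]) return dir;
    }
    return 0; // tie: relative order is left to the sort implementation
  };
}

// Sort a copy of the documents, then apply skip and limit.
function page(docs, spec, skip, limit) {
  return [...docs].sort(compareBy(spec)).slice(skip, skip + limit);
}

const jobs = [
  { _id: 1, isFeatured: true },
  { _id: 2, isFeatured: false },
  { _id: 3, isFeatured: true },
  { _id: 4, isFeatured: false },
];
const spec = { isFeatured: -1, _id: -1 };

// The second page is computed from a differently ordered input on purpose.
const firstPage = page(jobs, spec, 0, 2).map((d) => d._id);
const secondPage = page([...jobs].reverse(), spec, 2, 2).map((d) => d._id);
console.log(firstPage, secondPage);
```

Because no two documents compare as equal once _id is part of the spec, the sorted order is the same no matter how the input was ordered, so consecutive pages cannot overlap.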

MongoDB: Get whole document with aggregate method

I'm trying to achieve something like this:
I have a collection of activities that belong to some user.
I want to get the distinct activity names ordered by 'added_time', so I used 'group by' on the activity name and took the max value of 'added_time'.
I also want to sort them by 'added_time', and then get the whole document.
The only thing I have managed so far is to get the name that I grouped by and the 'added_time' property.
This is the query:
db.getCollection('user_activities').aggregate
(
{$match: {'type': 'food', 'user_id': '123'}},
{$group:{'_id':'$name', 'added_time':{$max:'$added_time'}}},
{$sort:{'added_time':-1}},
{$project: {_id: 0,name: "$_id",count: 1,sum: 1, 'added_time': 1}}
)
Can someone help me get the whole document?
Thanks!