MongoDB $lookup with conditional foreignField

Playground: https://mongoplayground.net/p/OxMnsCFZpmQ
My MongoDB version: 4.2.
I have two collections, car_parts and customers.
As the name suggests, car_parts holds car parts; some of them have a sub_parts field, which is a list of the car_parts._id values that the part consists of.
Every customer who bought something from us is stored in customers. The parts field of a customer contains a list of the parts the customer bought together on a certain date.
I would like an aggregate query in MongoDB that returns a mapping of which car parts (bought_parts) were bought by which customers. However, if a car part has the field sub_parts, the customer should show up for the sub-parts only.
The query in the playground already gives almost the correct result, except for the sub_parts handling.
Example for customer_3:
{
"_id": "customer_3",
"parts": [
{
"bought_parts": [
3
],
"date": "15.07.2020"
}
]
}
Since bought_parts has car_parts._id = 3:
{
"_id": 3,
"name": "steering wheel",
"sub_parts": [
1, // other car_parts._id s
2
]
}
The result should show customer_3 as a customer of car parts 1 and 2.
I'm not sure how to accomplish this, but I assume a "temporary" replacement of the id 3 in bought_parts with the actual ids [1,2] might solve it.
Expected output:
[
{
"_id": 1,
"customers": [
"customer_1",
"customer_2",
"customer_3" // <- since customer_3 bought car part 3 which has 1 in sub_parts
]
},
{
"_id": 2,
"customers": [
"customer_3" // <- since customer_3 bought car part 3 which has 2 in sub_parts
]
},
{
"_id": 3,
"customers": [
"customer_1", // <- since car_parts.id = 3 has [1, 2] in sub_parts, show customers of ids [1, 2]
"customer_2",
"customer_3"
]
},
{
"_id": 4,
"customers": [
"customer_1",
"customer_2"
]
}
]
Thanks a lot in advance!

EDIT: One way to do it is:
db.car_parts.aggregate([
{
$project: {
topLevel: {$concatArrays: [{$ifNull: ["$sub_parts", []]}, ["$_id"]]},
sub_parts: 1
}
},
{$unwind: "$topLevel"},
{
$group: {
_id: "$topLevel",
parts: {$push: "$_id"},
sub_parts: {$first: "$sub_parts"}
}
},
{
$project: {
parts: {$concatArrays: [{"$ifNull": ["$sub_parts", []]}, "$parts"]}
}
},
{
$lookup: {
from: "customers",
localField: "parts",
foreignField: "parts.bought_parts",
as: "customers"
}
},
{$project: {customers: "$customers._id"}}
])
You can see it working on this playground.
Since you said there is only one level of sub-parts, I used another idea: creating a top level before the $lookup. Since you want customers who bought part 3, for example, to be registered under parts 1 and 2, which are sub-parts of 3, the idea is to group them. Making this connection after the $lookup is a bit clumsy, but the car_parts collection already tells us, before the $lookup, that parts 1 and 2 are sub-parts of 3. Creating a temporary topLevel field lets us group, in advance, all the parts and sub-parts under which a customer should be registered if he bought any one of them. This makes things much more elegant...
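To make the grouping idea concrete, here is a minimal plain JavaScript sketch that works on in-memory arrays instead of running the aggregation. The sample data is an assumption reconstructed from the expected output in the question, and customersPerPart is a hypothetical helper name. For each part it collects the part itself, any part that lists it as a sub-part, and its own sub-parts, and then finds every customer who bought one of those ids.

```javascript
// Assumed sample data, reconstructed from the question's expected output.
const carParts = [
  { _id: 1 },
  { _id: 2 },
  { _id: 3, sub_parts: [1, 2] },
  { _id: 4 },
];
const customers = [
  { _id: "customer_1", parts: [{ bought_parts: [1, 4] }] },
  { _id: "customer_2", parts: [{ bought_parts: [1, 4] }] },
  { _id: "customer_3", parts: [{ bought_parts: [3] }] },
];

// Hypothetical helper: one level of sub-parts only, as in the question.
function customersPerPart(carParts, customers) {
  return carParts.map((part) => {
    // Parts that contain this part as a sub-part.
    const parents = carParts
      .filter((p) => (p.sub_parts || []).includes(part._id))
      .map((p) => p._id);
    // All ids whose buyers should be listed under this part.
    const ids = new Set([part._id, ...parents, ...(part.sub_parts || [])]);
    const buyers = customers
      .filter((c) =>
        c.parts.some((purchase) =>
          purchase.bought_parts.some((id) => ids.has(id))
        )
      )
      .map((c) => c._id);
    return { _id: part._id, customers: buyers };
  });
}

const result = customersPerPart(carParts, customers);
console.log(JSON.stringify(result));
```

With this assumed data, part 2 lists only customer_3, and part 3 lists all three customers, matching the expected output in the question.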

Related

join two collections by a common field and get only a few fields

I'm new to MongoDB and I have 2 collections, one called "EN" and another one called "csv_import". I just need to join these 2 collections using a common field and get the results. For results, I just need the Part number and product id. The 2 collections structure is as follow:
csv_import:
product_id
part_no
vendor_standard
EN:
under object "ICECAT-interface.Product":
#Prod_id
#Name
(These are the main ones, but there are other unimportant fields; for the sake of this example I'm including only the relevant ones.)
Just as a clarification, "#" is part of the field name.
I'm using this to join the two collections:
db.EN.aggregate([
{
$lookup: {
from: 'csv_import',
localField: 'ICECAT-interface.Product.#Prod_id',
foreignField: 'part_no',
as: 'part_number'
}
}]);
Unfortunately, I get an empty array (see screenshot) when I'm expecting only the results that match. Also, how can I specify which fields I want to get back? I thought adding "as: part_number" would be enough, but that doesn't seem to be the case.
Here's some collection sample (taken from "EN")
[{
"_id": "1414",
"ICECAT-interface": {
"#xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
"#xsi:noNamespaceSchemaLocation": "https://data.icecat.biz/xsd/ICECAT-interface_response.xsd",
"Product": {
"#Code": "1",
"#HighPic": "https://images.icecat.biz/img/norm/high/1414-HP.jpg",
"#HighPicHeight": "400",
"#HighPicSize": "43288",
"#HighPicWidth": "400",
"#ID": "1414",
"#LowPic": "https://images.icecat.biz/img/norm/low/1414-HP.jpg",
"#LowPicHeight": "200",
"#LowPicSize": "17390",
"#LowPicWidth": "200",
"#Name": "C6614NE",
"#IntName": "C6614NE",
"#LocalName": "",
"#Pic500x500": "https://images.icecat.biz/img/gallery_mediums/img_1414_medium_1480667779_072_2323.jpg",
"#Pic500x500Height": "500",
"#Pic500x500Size": "101045",
"#Pic500x500Width": "500",
"#Prod_id": "C6614NE",
Sample collection data taken from "csv_import" collection:
{
"_id": "ObjectId(\"6348339cc6e5c8ce0b7da5a4\")",
"index": 23679,
"product_id": 4019734,
"part_no": "CP-HAR-EP-ADVANCED-REN-1Y",
"vendor_standard": "Check Point"
},
EDIT
I was able to run this:
db.EN.aggregate([
{
$lookup: {
from: "csv_import",
let: { pn: "$ICECAT-interface.Product.#Prod_id" },
pipeline: [
{ $match: { $expr: { $eq: ["$$pn", "$part_no" ] } } }
],
as: "part_number_info"
}
},
{ $match: { part_number_info: { $ne: [] } } }
])
But it takes ages to complete (a lot of CPU processing). I ran explain and it does a COLLSCAN. I do have an index on #Prod_id, so I'm not sure whether I need an additional index.

Windowing function in MongoDB

I have a collection that is made up of companies. Each company has a "number_of_employees" as well as a subdocument of "offices" which includes "state_code" and "country_code". For example:
{
'_id': ObjectId('52cdef7c4bab8bd675297da5'),
'name': 'Technorati',
'number_of_employees': 35,
'offices': [
{'description': '',
'address1': '360 Post St. Ste. 1100',
'address2': '',
'zip_code': '94108',
'city': 'San Francisco',
'state_code': 'CA',
'country_code': 'USA',
'latitude': 37.779558,
'longitude': -122.393041}
]
}
I'm trying to get the number of employees per state across all companies. My latest attempt looks like:
db.research.aggregate([
{ "$match": {"offices.country_code": "USA" } },
{ "$unwind": "$offices" },
{ "$project": { "_id": 1, "number_of_employees": 1, "offices.state_code": 1 } }
])
But now I'm stuck on how to do the $group. Because number_of_employees is at the company level and not the office level, I want to split the employees evenly across the offices. For example, if Technorati has 5 offices in 5 different states, then each state would be allocated 7 employees.
In SQL I could do this easily enough using a windowed function to get average employees across offices by company and then summing those while grouping by state. I can't seem to find any clear examples of similar functionality in MongoDB though.
Note, this is for a school assignment, so the use of third-party libraries isn't feasible. Also, I'm hoping that this can all be done in a simple snippet of code, possibly even one call. I could certainly create new intermediate collections or do this in Python and process data there, but that's probably outside of the scope of the homework.
Anything to point me in the right direction would be greatly appreciated!
You are actually on the right track. You just need to derive an extra field numOfEmpPerOffice by using $divide and $sum it when $group by state.
db.collection.aggregate([
{
"$match": {
"offices.country_code": "USA"
}
},
{
"$addFields": {
"numOfEmpPerOffice": {
"$divide": [
"$number_of_employees",
{
"$size": "$offices"
}
]
}
}
},
{
"$unwind": "$offices"
},
{
$group: {
_id: "$offices.state_code",
totalEmp: {
$sum: "$numOfEmpPerOffice"
}
}
}
])
Here is the Mongo playground for your reference.
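As a rough illustration of what the pipeline above computes, here is a plain JavaScript sketch over in-memory data. The two companies are hypothetical; only the field names come from the question. It mirrors the stages one by one: keep companies with a USA office, divide the head count by the number of offices, then sum the shares per state.

```javascript
// Hypothetical sample data; only the field names come from the question.
const companies = [
  {
    name: "A",
    number_of_employees: 35,
    offices: [
      { state_code: "CA", country_code: "USA" },
      { state_code: "NY", country_code: "USA" },
      { state_code: "TX", country_code: "USA" },
      { state_code: "WA", country_code: "USA" },
      { state_code: "OR", country_code: "USA" },
    ],
  },
  {
    name: "B",
    number_of_employees: 10,
    offices: [
      { state_code: "CA", country_code: "USA" },
      { state_code: "NY", country_code: "USA" },
    ],
  },
];

function employeesPerState(companies) {
  const totals = {};
  for (const company of companies) {
    // Like $match: keep companies with at least one USA office.
    if (!company.offices.some((o) => o.country_code === "USA")) continue;
    // Like $addFields with $divide and $size: an even share per office.
    const perOffice = company.number_of_employees / company.offices.length;
    // Like $unwind followed by $group with $sum.
    for (const office of company.offices) {
      totals[office.state_code] = (totals[office.state_code] || 0) + perOffice;
    }
  }
  return totals;
}

const totals = employeesPerState(companies);
console.log(totals);
```

Here company A contributes 7 employees to each of its five states and company B contributes 5 to CA and NY. Note that, like the pipeline, this sketch counts every office of a matched company, including any office outside the USA.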

How to maintain the top count(s) of array elements in mongoDB?

I am looking for a way to get the top two (or any other number of) counts of a specific element from the given collection.
{"id": "xyz" , "fruits": ["Apple", "Mango"]}
{"id": "abx", "fruits": ["Apple", "Banana"]}
{"id" : "pqr", "fruits": ["Apple", "Mango"]}
For the above example, the result would be Apple and Mango, because Apple occurs most often (three times), followed by Mango (two times). Do I need to go with Mongo's map-reduce functionality?
I lean towards performance and stability of the backend platform. How can I move forward if the number of occurrences changes in real time?
Any help would be appreciated.
You could use aggregate. Here is a simple example which assumes that a fruit value will not be repeated within a single document:
[
{
$unwind: "$fruits"
},
{
$group: {
_id: "$fruits",
count: {$sum: 1}
}
},
{
$sort: {count:-1}
},
{
$limit: 2
}
]
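For reference, the same unwind, group, sort and limit steps can be sketched in plain JavaScript on the three documents from the question. topFruits is a hypothetical helper name, and like the pipeline it assumes a fruit is not repeated within a single document.

```javascript
// The three sample documents from the question.
const docs = [
  { id: "xyz", fruits: ["Apple", "Mango"] },
  { id: "abx", fruits: ["Apple", "Banana"] },
  { id: "pqr", fruits: ["Apple", "Mango"] },
];

// Hypothetical helper mirroring $unwind, $group, $sort and $limit.
function topFruits(docs, n) {
  const counts = new Map();
  for (const doc of docs) {
    for (const fruit of doc.fruits) {
      counts.set(fruit, (counts.get(fruit) || 0) + 1); // one per occurrence
    }
  }
  return [...counts]
    .map(([fruit, count]) => ({ _id: fruit, count }))
    .sort((a, b) => b.count - a.count) // highest count first
    .slice(0, n);
}

const top2 = topFruits(docs, 2);
console.log(top2);
```

For this data the result is Apple with a count of 3, followed by Mango with a count of 2.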

mongoDB find document greatest date and check value

I have a Conversation collection that looks like this:
[
{
"_id": "QzuTQYkGDBkgGnHrZ",
"participants": [
{
"id": "YymyFZ27NKtuLyP2C"
},
{
"id": "d3y7uSA2aKCQfLySw",
"lastVisited": "2016-02-04T02:59:10.056Z",
"lastMessage": "2016-02-04T02:59:10.056Z"
}
]
},
{
"_id": "e4iRefrkqrhnokH7Y",
"participants": [
{
"id": "d3y7uSA2aKCQfLySw",
"lastVisited": "2016-02-04T03:26:33.953Z",
"lastMessage": "2016-02-04T03:26:53.509Z"
},
{
"id": "SRobpwtjBANPe9hXg",
"lastVisited": "2016-02-04T03:26:35.210Z",
"lastMessage": "2016-02-04T03:15:05.779Z"
}
]
},
{
"_id": "twXPHb76MMxQ3MQor",
"participants": [
{
"id": "d3y7uSA2aKCQfLySw"
},
{
"id": "SRobpwtjBANPe9hXg",
"lastMessage": "2016-02-04T03:27:35.281Z",
"lastVisited": "2016-02-04T03:57:51.036Z"
}
]
}
]
Each conversation (document) can have a participant object with the properties of id, lastMessage, lastVisited.
Sometimes, depending on how new the conversation is, some of these values don't exist just yet (such as lastMessage, lastVisited).
What I'm trying to do is compare the participants within each individual conversation (document) and see whether, out of all the participants, the greatest lastMessage value belongs to the logged-in user. If it doesn't, I'm assuming that the conversation has messages that the logged-in user hasn't seen yet. I want to get the count of conversations with messages that the user possibly hasn't seen yet.
In the example above, say we're logged in as d3y7uSA2aKCQfLySw. We can see that he was the last person to send a message for conversation 1, 2 BUT not 3. The count returning for how many updated conversations that d3y7uSA2aKCQfLySw hasn't seen should be 1.
Can someone point me in the right direction? I haven't the slightest clue as to how to approach the issue. My apologies for the lengthy question.
It is always advisable to store dates as ISODate rather than strings to leverage the flexibility provided by various date operators in the aggregation framework.
One way of getting the count is to:
1. $match the conversations in which the user is involved.
2. $unwind the participants field.
3. $sort by the lastMessage field in descending order.
4. $group by the _id to get back the original conversations intact, and get the latest message per group (conversation) using the $first operator.
5. $project a field with value 0 for each group where the topmost record belongs to the user we are looking for, and 1 for the others.
6. $group again to get the total count of the conversations in which he has not been the last one to send a message.
Sample code:
var userId = "d3y7uSA2aKCQfLySw";
db.t.aggregate([
{
$match:{"participants.id":userId}
},
{
$unwind:"$participants"
},
{
$sort:{"participants.lastMessage":-1}
},
{
$group:{"_id":"$_id","lastParticipant":{$first:"$$ROOT.participants"}}
},
{
$project:{
"hasNotSeen":{$cond:[
{$eq:["$lastParticipant.id",userId]},
0,
1
]},
"_id":0}
},
{
$group:{"_id":null,"count":{$sum:"$hasNotSeen"}}
},
{
$project:{"_id":0,"numberOfConversationsNotSeen":"$count"}
}
])
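The logic of these stages can also be checked without a database. Below is a small plain JavaScript sketch, using the three conversations from the question, that picks the participant with the greatest lastMessage in each conversation and counts the conversations where that participant is not the given user. countUnseen is a hypothetical name, and treating a missing lastMessage as the lowest possible value is an assumption that mirrors how the descending sort puts such participants last.

```javascript
// The three conversations from the question.
const conversations = [
  {
    _id: "QzuTQYkGDBkgGnHrZ",
    participants: [
      { id: "YymyFZ27NKtuLyP2C" },
      { id: "d3y7uSA2aKCQfLySw", lastMessage: "2016-02-04T02:59:10.056Z" },
    ],
  },
  {
    _id: "e4iRefrkqrhnokH7Y",
    participants: [
      { id: "d3y7uSA2aKCQfLySw", lastMessage: "2016-02-04T03:26:53.509Z" },
      { id: "SRobpwtjBANPe9hXg", lastMessage: "2016-02-04T03:15:05.779Z" },
    ],
  },
  {
    _id: "twXPHb76MMxQ3MQor",
    participants: [
      { id: "d3y7uSA2aKCQfLySw" },
      { id: "SRobpwtjBANPe9hXg", lastMessage: "2016-02-04T03:27:35.281Z" },
    ],
  },
];

// Hypothetical helper. ISO-8601 strings in the same format and time zone
// compare correctly as plain strings; a missing value is treated as "".
function countUnseen(conversations, userId) {
  return conversations
    .filter((c) => c.participants.some((p) => p.id === userId))
    .filter((c) => {
      const latest = c.participants.reduce((best, p) =>
        (p.lastMessage || "") > (best.lastMessage || "") ? p : best
      );
      return latest.id !== userId;
    }).length;
}

const unseen = countUnseen(conversations, "d3y7uSA2aKCQfLySw");
console.log(unseen);
```

For the user d3y7uSA2aKCQfLySw only the third conversation has a more recent message from someone else, so the count is 1.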
You could try this function:
function findUnseen(uId) {
var numMessages = db.demo.aggregate(
[
{
$project: {
"participants.lastMessage": 1,
"participants.id": 1
}
},
{$unwind: "$participants"},
{$sort: {"participants.lastMessage": -1}},
{
$group: {
_id: "$_id",
participantsId: {$first: "$participants.id"},
lastMessage: {$max: "$participants.lastMessage"}
}
},
{$match: {participantsId: {$ne: uId}}},
]
).toArray().length;
return numMessages;
}
Calling findUnseen("d3y7uSA2aKCQfLySw") will return 1.
I have adapted this function to return just the count, but as you can see, it's easy to tweak it to return all the unseen message metadata too.

Can I use populate before aggregate in mongoose?

I have two models, one is user
userSchema = new Schema({
userID: String,
age: Number
});
and the other is the score recorded several times everyday for all users
ScoreSchema = new Schema({
userID: {type: String, ref: 'User'},
score: Number,
created_date: Date,
....
})
I would like to run some queries/calculations on the scores of users meeting a specific requirement; say, I would like to calculate the day-by-day average score for all users older than 20.
My thought is to first populate Scores with the users' ages and then run the aggregate.
Something like:
Score.
populate('userID','age').
aggregate([
{$match: {'userID.age': {$gt: 20}}},
{$group: ...},
{$group: ...}
], function(err, data){});
Is it OK to use populate before aggregate? Or should I first find all the userIDs meeting the requirement, save them in an array, and then use $in to match the score documents?
No, you cannot call .populate() before .aggregate(), and there is a very good reason why you cannot. But there are different approaches you can take.
The .populate() method works "client side" where the underlying code actually performs additional queries ( or more accurately an $in query ) to "lookup" the specified element(s) from the referenced collection.
In contrast .aggregate() is a "server side" operation, so you basically cannot manipulate content "client side", and then have that data available to the aggregation pipeline stages later. It all needs to be present in the collection you are operating on.
A better approach here is available with MongoDB 3.2 and later, via the $lookup aggregation pipeline operation. Also probably best to handle from the User collection in this case in order to narrow down the selection:
User.aggregate(
[
// Filter first
{ "$match": {
"age": { "$gt": 20 }
}},
// Then join
{ "$lookup": {
"from": "scores",
"localField": "userID",
"foreignField": "userID",
"as": "score"
}},
// More stages
],
function(err,results) {
}
)
This is basically going to include a new field "score" within the User object as an "array" of items that matched on "lookup" to the other collection:
{
"userID": "abc",
"age": 21,
"score": [{
"userID": "abc",
"score": 42,
// other fields
}]
}
The result is always an array, as the general expected usage is a "left join" of a possible "one to many" relationship. If no result is matched then it is just an empty array.
To use the content, just work with an array in any way. For instance, you can use the $arrayElemAt operator in order to just get the single first element of the array in any future operations. And then you can just use the content like any normal embedded field:
{ "$project": {
"userID": 1,
"age": 1,
"score": { "$arrayElemAt": [ "$score", 0 ] }
}}
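To illustrate the "left join" shape described above, here is a plain JavaScript sketch on in-memory arrays rather than a real $lookup. The users and scores are hypothetical sample data, and lookupScores is a made-up helper name. Every matched user gets a score array, which is empty when nothing matches, and taking element 0 plays the role of $arrayElemAt.

```javascript
// Hypothetical sample data using the field names from the schemas.
const users = [
  { userID: "abc", age: 21 },
  { userID: "def", age: 18 },
  { userID: "ghi", age: 40 },
];
const scores = [
  { userID: "abc", score: 42 },
  { userID: "abc", score: 17 },
  { userID: "def", score: 99 },
];

// Hypothetical helper: filter first, then attach the matching scores.
function lookupScores(users, scores) {
  return users
    .filter((u) => u.age > 20) // like the $match stage
    .map((u) => ({
      ...u,
      // Like $lookup: always an array, possibly empty.
      score: scores.filter((s) => s.userID === u.userID),
    }));
}

const joined = lookupScores(users, scores);
// Like $arrayElemAt with index 0: the first match, or undefined if none.
const firstOnly = joined.map((u) => ({ ...u, score: u.score[0] }));
console.log(JSON.stringify(joined));
```

In this sample, "abc" ends up with two score entries, "ghi" with an empty array, and "def" is filtered out before the join.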
If you don't have MongoDB 3.2 available, then your other option to process a query limited by the relations of another collection is to first get the results from that collection and then use $in to filter on the second:
// Match the user collection
User.find({ "age": { "$gt": 20 } },function(err,users) {
// Get id list
var userList = users.map(function(user) {
return user.userID;
});
Score.aggregate(
[
// use the id list to select items
{ "$match": {
"userID": { "$in": userList }
}},
// more stages
],
function(err,results) {
}
);
});
So in earlier releases, the only way to do this is to fetch the list of valid users from the first collection to the client and then feed that list into a query on the other collection.
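The two-step idea can be sketched end to end in plain JavaScript, without a database. The sample data and the helper name averageScoreByDay are hypothetical, and created_date is stored as an ISO string here purely to keep the example self-contained (the schema in the question uses Date). The sketch builds the id list first, filters the scores with it as $in would, and then averages per day as one possible set of "more stages".

```javascript
// Hypothetical sample data; created_date is an ISO string for simplicity.
const users = [
  { userID: "a", age: 21 },
  { userID: "b", age: 19 },
  { userID: "c", age: 30 },
];
const scores = [
  { userID: "a", score: 40, created_date: "2024-01-01T08:00:00Z" },
  { userID: "b", score: 100, created_date: "2024-01-01T09:00:00Z" },
  { userID: "c", score: 60, created_date: "2024-01-01T20:00:00Z" },
  { userID: "a", score: 70, created_date: "2024-01-02T08:00:00Z" },
];

function averageScoreByDay(users, scores, minAge) {
  // Step 1: collect the ids of the users that qualify.
  const userList = new Set(
    users.filter((u) => u.age > minAge).map((u) => u.userID)
  );
  // Step 2: keep only their scores (the $in match), grouped by day.
  const sums = {};
  for (const s of scores) {
    if (!userList.has(s.userID)) continue;
    const day = s.created_date.slice(0, 10); // "YYYY-MM-DD"
    const entry = sums[day] || (sums[day] = { total: 0, count: 0 });
    entry.total += s.score;
    entry.count += 1;
  }
  // Step 3: turn the sums into averages.
  return Object.fromEntries(
    Object.entries(sums).map(([day, { total, count }]) => [day, total / count])
  );
}

const averages = averageScoreByDay(users, scores, 20);
console.log(averages);
```

User "b" is excluded by the age filter, so the first day averages the scores 40 and 60, and the second day has the single score 70.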