join two collections by a common field and get only a few fields - mongodb

I'm new to MongoDB and I have two collections, one called "EN" and another called "csv_import". I just need to join these two collections on a common field and get the results. For the results, I only need the part number and the product ID. The structure of the two collections is as follows:
csv_import:
product_id
part_no
vendor_standard
EN:
under object "ICECAT-interface.Product":
#Prod_id
#Name
(These are the main ones. There are other, unimportant fields, but for the sake of this example I'm including only the relevant ones.)
Just as a clarification, "#" is part of the field name.
I'm using this to join the two collections:
db.EN.aggregate([
{
$lookup: {
from: 'csv_import',
localField: 'ICECAT-interface.Product.#Prod_id',
foreignField: 'part_no',
as: 'part_number'
}
}]);
Unfortunately, I get an empty array when I'm expecting just the results that match. Also, how can I specify which fields I want to get back? I thought adding "as: part_number" would be enough, but that doesn't seem to be the case.
Here's a sample document (taken from "EN"):
[{
"_id": "1414",
"ICECAT-interface": {
"#xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
"#xsi:noNamespaceSchemaLocation": "https://data.icecat.biz/xsd/ICECAT-interface_response.xsd",
"Product": {
"#Code": "1",
"#HighPic": "https://images.icecat.biz/img/norm/high/1414-HP.jpg",
"#HighPicHeight": "400",
"#HighPicSize": "43288",
"#HighPicWidth": "400",
"#ID": "1414",
"#LowPic": "https://images.icecat.biz/img/norm/low/1414-HP.jpg",
"#LowPicHeight": "200",
"#LowPicSize": "17390",
"#LowPicWidth": "200",
"#Name": "C6614NE",
"#IntName": "C6614NE",
"#LocalName": "",
"#Pic500x500": "https://images.icecat.biz/img/gallery_mediums/img_1414_medium_1480667779_072_2323.jpg",
"#Pic500x500Height": "500",
"#Pic500x500Size": "101045",
"#Pic500x500Width": "500",
"#Prod_id": "C6614NE",
Sample document taken from the "csv_import" collection:
{
"_id": "ObjectId(\"6348339cc6e5c8ce0b7da5a4\")",
"index": 23679,
"product_id": 4019734,
"part_no": "CP-HAR-EP-ADVANCED-REN-1Y",
"vendor_standard": "Check Point"
},
EDIT
I was able to run this:
db.EN.aggregate([
{
$lookup: {
from: "csv_import",
let: { pn: "$ICECAT-interface.Product.#Prod_id" },
pipeline: [
{ $match: { $expr: { $eq: ["$$pn", "$part_no" ] } } }
],
as: "part_number_info"
}
},
{ $match: { part_number_info: { $ne: [] } } }
])
But it takes ages to complete (a lot of CPU processing). I ran explain and it does a COLLSCAN. I do have an index on #Prod_id; I'm not sure whether I need an additional index.
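To make the intended result concrete, here is a plain JavaScript sketch (an in-memory model, not the MongoDB driver or shell API) of what the pipeline above computes. It joins each EN document to the csv_import documents whose part_no equals its #Prod_id, drops EN documents with no match, and keeps only part_no and product_id. Note that "as" only names the output array field; it does not choose fields. In the shell, trimming would be an extra stage after the $match, for example { $project: { _id: 0, "part_number_info.part_no": 1, "part_number_info.product_id": 1 } }. The function name joinPartNumbers is hypothetical. The Map stands in for an index on csv_import.part_no, the field the lookup searches. Whether the server can use such an index for the let/pipeline form depends on the server version, so it is worth confirming with explain.

```javascript
// In-memory model of $lookup + $match + $project (not MongoDB code).
// Field names follow the question; joinPartNumbers is a hypothetical name.
function joinPartNumbers(enDocs, csvDocs) {
  // Group csv_import rows by part_no, like an index on that field would.
  const byPartNo = new Map();
  for (const row of csvDocs) {
    if (!byPartNo.has(row.part_no)) byPartNo.set(row.part_no, []);
    byPartNo.get(row.part_no).push(row);
  }

  const results = [];
  for (const doc of enDocs) {
    // Keys containing "-" or "#" need bracket notation in JavaScript.
    const prodId = doc["ICECAT-interface"]?.Product?.["#Prod_id"];
    const matches = byPartNo.get(prodId) ?? [];
    // Same effect as { $match: { part_number_info: { $ne: [] } } }.
    if (matches.length === 0) continue;
    // Same effect as a $project keeping only part_no and product_id.
    results.push({
      part_number_info: matches.map((m) => ({ part_no: m.part_no, product_id: m.product_id })),
    });
  }
  return results;
}
```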

Related

MongoDB $lookup with conditional foreignField

Playground: https://mongoplayground.net/p/OxMnsCFZpmQ
My MongoDB version: 4.2.
I have two collections, car_parts and customers.
As the name suggests, car_parts contains car parts. Some of them have a sub_parts field, which is a list of the car_parts._id values that the part consists of.
Every customer that bought something from us is stored in customers. A customer's parts field contains a list of the parts the customer bought together on a certain date.
I would like an aggregate query in MongoDB that returns a mapping of which car parts (bought_parts) were bought by which customers. However, if a car part has a sub_parts field, the customer should show up for its sub-parts only.
So the query in the playground already gives almost the correct result, except for the sub_parts handling.
Example for customer_3:
{
"_id": "customer_3",
"parts": [
{
"bought_parts": [
3
],
date: "15.07.2020"
}
]
}
Since bought_parts has car_parts._id = 3:
{
"_id": 3,
"name": "steering wheel",
"sub_parts": [
1, // other car_parts._id values
2
]
}
The result should show customer_3 as a customer of car parts 1 and 2.
I'm not sure how to accomplish this, but I assume a "temporary" replacement of the id 3 in bought_parts with the actual ids [1,2] might solve it.
Expected output:
[
{
"_id": 1,
"customers": [
"customer_1",
"customer_2",
"customer_3" // <- since customer_3 bought car part 3 which has 1 in sub_parts
]
},
{
"_id": 2,
"customers": [
"customer_3" // <- since customer_3 bought car part 3 which has 2 in sub_parts
]
},
{
"_id": 3,
"customers": [
"customer_1", // <- since car_parts._id = 3 has [1, 2] in sub_parts, show customers of ids [1, 2]
"customer_2",
"customer_3"
]
},
{
"_id": 4,
"customers": [
"customer_1",
"customer_2"
]
}
]
Thanks a lot in advance!
EDIT: One way to do it is:
db.car_parts.aggregate([
{
$project: {
topLevel: {$concatArrays: [{$ifNull: ["$sub_parts", []]}, ["$_id"]]},
sub_parts: 1
}
},
{$unwind: "$topLevel"},
{
$group: {
_id: "$topLevel",
parts: {$push: "$_id"},
sub_parts: {$first: "$sub_parts"}
}
},
{
$project: {
parts: {$concatArrays: [{"$ifNull": ["$sub_parts", []]}, "$parts"]}
}
},
{
$lookup: {
from: "customers",
localField: "parts",
foreignField: "parts.bought_parts",
as: "customers"
}
},
{$project: {customers: "$customers._id"}}
])
You can see it working on this playground.
Since you said there is only one level of sub-parts, I used a different idea: creating a top level before the $lookup. You want customers who bought part 3, for example, to also be registered under parts 1 and 2, which are sub-parts of 3, so the idea is to group them. Making this connection after the $lookup is a bit clumsy, but the car_parts collection already tells us, before the $lookup, that parts 1 and 2 are sub-parts of 3. Creating a temporary topLevel field lets us group in advance all the parts and sub-parts for which a customer who bought any one of them should be registered under that top-level part. This makes things much more elegant...

Trying to fetch data from Nested MongoDB Database?

I am a beginner in MongoDB and I'm stuck. I am trying to fetch data from a nested array, but it takes a long time because there are around 50K documents, and the results are not accurate either. Below is the schema structure:
{
"_id": {
"$oid": "6001df3312ac8b33c9d26b86"
},
"City": "Los Angeles",
"State":"California",
"Details": [
{
"Name": "Shawn",
"age": "55",
"Gender": "Male",
"profession": " A science teacher with STEM",
"inDate": "2021-01-15 23:12:17",
"Cars": [
"BMW","Ford","Opel"
],
"language": "English"
},
{
"Name": "Nicole",
"age": "21",
"Gender": "Female",
"profession": "Law student",
"inDate": "2021-01-16 13:45:00",
"Cars": [
"Opel"
],
"language": "English"
}
],
"date": "2021-01-16"
}
Here I am trying to filter by date and by Details.Cars, like this:
db.getCollection('news').find({"Details.Cars":"BMW","date":"2021-01-16"})
It also returns the details of other people who don't have a BMW. I only want to display the details of people like Shawn, who have a BMW (or some other specific array value) on that date, and not Nicole. The rest should not appear, but that is not happening.
Any help is appreciated. :)
A combination of $match on the top-level fields and $filter on the array elements will do what you seek.
db.foo.aggregate([
{$match: {"date":"2021-01-16"}}
,{$addFields: {"Details": {$filter: {
input: "$Details",
as: "zz",
cond: { $in: ['BMW','$$zz.Cars'] }
}}
}}
,{$match: {$expr: { $gt:[{$size:"$Details"},0] } }}
]);
Notes:
$unwind is overly expensive for what is needed here and it likely means "reassembling" the data shape later.
We use $addFields where the new field to add (Details) already exists. This effectively means "overwrite in place" and is a common idiom when filtering an array.
The second $match eliminates docs where the date matches but not a single entry in Details.Cars is a BMW, i.e. the array has been filtered down to zero length. Sometimes you want to know this, in which case do not add the final $match.
I recommend you look into using real dates, i.e. ISODate, instead of strings so that you can easily take advantage of MongoDB date math and date formatting functions.
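As a sanity check of the $match / $filter / $size logic, here is a plain JavaScript sketch that applies the same three steps to in-memory documents. It is not MongoDB code, and filterDetails is a hypothetical name. Run against the sample document, it keeps Shawn and drops Nicole.

```javascript
// In-memory model of $match -> $addFields/$filter -> $match on $size (not MongoDB code).
function filterDetails(docs, date, car) {
  return docs
    .filter((doc) => doc.date === date) // first $match on the top-level field
    .map((doc) => ({
      ...doc,
      // $addFields overwrites Details with only the elements whose Cars has `car`
      Details: doc.Details.filter((d) => d.Cars.includes(car)),
    }))
    .filter((doc) => doc.Details.length > 0); // final $match: drop empty arrays
}
```

Leaving out the last .filter mirrors leaving out the final $match: documents whose array was filtered down to [] are then kept.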
It is a common mistake to think that find({"nested.array": value}) will return only the nested object. Actually, this query returns the whole document that contains a nested object with the desired value.
The query returns the whole document in which the value BMW exists in Details.Cars, so Nicole is returned too.
To solve this problem:
To get multiple elements that match the criteria, you can use an aggregation with $unwind to split the array into separate documents and then match on the criteria you want.
db.collection.aggregate([
{
"$match": { "Details.Cars": "BMW", "date": "2021-01-16" }
},
{
"$unwind": "$Details"
},
{
"$match": { "Details.Cars": "BMW" }
}
])
This query first matches on the criteria to avoid running $unwind over the whole collection.
Then it uses $unwind to produce one document per Details element, and $match again to keep only the documents you want.
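The $match, $unwind, $match sequence can be modeled in plain JavaScript to see the output shape: after $unwind, Details is a single object rather than an array. This sketch works on in-memory data only and is not MongoDB code. unwindMatch is a hypothetical name.

```javascript
// In-memory model of $match -> $unwind -> $match (not MongoDB code).
function unwindMatch(docs, date, car) {
  const out = [];
  for (const doc of docs) {
    // First $match: filter whole documents before unwinding.
    if (doc.date !== date || !doc.Details.some((d) => d.Cars.includes(car))) continue;
    // $unwind: one output document per Details element (Details becomes an object).
    for (const detail of doc.Details) {
      // Second $match: keep only the unwound elements that have the car.
      if (detail.Cars.includes(car)) out.push({ ...doc, Details: detail });
    }
  }
  return out;
}
```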
To get only one element (for example, if you match by _id and it's unique), you can use $elemMatch in the projection like this:
db.collection.find({
"Details.Cars": "BMW",
"date": "2021-01-16"
},
{
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
}
})
You can use $elemMatch either in the query or in the projection; the MongoDB docs cover both.
Using $elemMatch in the query looks like this:
db.collection.find({
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
},
"date": "2021-01-16"
},
{
"Details.$": 1
})
The result is the same. In the second case you are using the positional $ operator, which, as the docs say, returns:
The first element that matches the query condition on the array.
That is, the first element where "Cars": "BMW".
You can choose the way you want.
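Both projections keep only the first matching array element, which is the key difference from the $unwind approach. A plain JavaScript model of that behavior is below; it is not MongoDB code, and projectFirstMatch is a hypothetical name. When nothing matches, this sketch simply omits Details.

```javascript
// In-memory model of an $elemMatch / positional $ projection (not MongoDB code):
// only the FIRST element of Details whose Cars contains `car` is kept.
function projectFirstMatch(doc, car) {
  const first = doc.Details.find((d) => d.Cars.includes(car));
  // Keep _id (included by default in MongoDB projections) plus a one-element array.
  return first === undefined ? { _id: doc._id } : { _id: doc._id, Details: [first] };
}
```

If a document has several BMW owners and you need all of them, use the $filter or $unwind approach instead.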

Find documents matching ObjectIDs in a foreign array

I have a collection Users:
{
_id: "5cds8f8rfdshfd",
name: "Ted",
attending: [ObjectId("2cd9fjdkfsld")]
}
I have another collection Events:
{
_id: "2cd9fjdkfsld",
title: "Some Event Attended"
},
{
_id: "34dshfj29jg",
title: "Some Event NOT Attended"
}
I would like to return a list of all events being attended by a given user. However, I need to do this query from the Events collection as this is part of a larger query.
I have gone through the following questions:
$lookup on ObjectId's in an array - This question has the array as a local field; mine is foreign
MongoDB lookup when foreign field is an array of objects - The array is of objects themselves
MongoDB lookup when foreign field is an array
I have tried various ways of modifying the above answers to fit my situation but have been unsuccessful. The second answer to the third question gets me closest, but I would like to filter out non-matching results rather than have them returned with a value of 0.
My desired output:
[
{
_id: "2cd9fjdkfsld",
title: "Some Event Attended"
},
]
One option would be like this:
db.getCollection('Events').aggregate([{
$lookup: // join
{
from: "Users", // on Users collection
let: { eId: "$_id" }, // keep a local variable "eId" that points to the currently looked at event's "_id"
pipeline: [{
$match: { // filter where
"_id": ObjectId("5c6efc937ef75175b2b8e7a4"), // a specific user
$expr: { $in: [ "$$eId", "$attending" ] } // attends the event we're looking at
}
}],
as: "users" // push all matched users into the "users" array
}
}, {
$match: { // remove events that the user does not attend
"users": { $ne: [] }
}
}])
You could obviously get rid of the users field by adding another projection if needed.
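The Events-side logic can be modeled in plain JavaScript to confirm it returns only the attended event. This is an in-memory sketch, not MongoDB code; eventsAttendedBy is a hypothetical name. IDs are plain strings here, whereas in MongoDB the $in comparison only matches if the stored types agree (ObjectId against ObjectId).

```javascript
// In-memory model of the Events $lookup + $match + optional projection (not MongoDB code).
function eventsAttendedBy(events, users, userId) {
  return events
    .map((event) => ({
      ...event,
      // $lookup sub-pipeline: the given user, only if they attend this event
      users: users.filter((u) => u._id === userId && u.attending.includes(event._id)),
    }))
    .filter((event) => event.users.length > 0) // $match: { users: { $ne: [] } }
    .map(({ users, ...event }) => event); // extra projection that drops "users"
}
```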

How to perform equality match for documents of single collection using lookup in mongodb?

I am new to MongoDB, and I am trying to do an equality match, but I keep failing. I have a "user" collection with the following data:
{
uid:"1230",
age:20,
Name:"Alex",
gender:"male",
interestedIn:"female"
}
{
uid:"1231",
age:23,
Name:"Neil",
gender:"male",
interestedIn:"male"
}
{
uid:"1232",
age:20,
Name:"Amy",
gender:"female",
interestedIn:"male"
}
What I want is to find the records whose "gender" is equal to the "interestedIn" of the current user, and vice versa. I.e. when I am accessing as user "Alex", the query should return Amy's record only (as Amy's gender is equal to Alex's "interestedIn" value) and not Neil's.
When I am accessing as user "Neil", no data should be returned, as there is no other user who is male and also interested in males.
I tried to use $lookup, but it's not working at all, and I am struggling to use the $cond operator for this case. Here is my code:
// I am also using the geospatial functions (with hard-coded coordinates), but that's not the problem here; nothing is being returned.
db.users.aggregate([{
"$geoNear": {
"near": {
type: "Point",
coordinates: [31.9686, 99.90]
},
"spherical": true,
"distanceField": "distance",
"maxDistance": 500,
"query": {
uid: {
$ne: "1230"
},
age: {
$gt: 18,
$lt: 25
},
{
$lookup:
{
from: "users",
localField: "gender",
foreignField: "interestedIn",
as: "gender_docs"
}
},
limit: 5
}
},
{
"$project": {
location: 0
}
}
]).pretty()
Could someone please let me know how should I do it properly ?
// My expected output when current user is Alex
{
uid:"1232",
age:20,
Name:"Amy",
gender:"female",
interestedIn:"male"
}
My expected output when the current user is Neil should be empty, as there is no other record whose gender is male and whose interestedIn is male.
You don't need the aggregation framework and $lookup here. I assume you already have the attributes of the current user (e.g. interestedIn) available in your app.
A simple query for your question would be:
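db.users.find({
"gender": currentUserInterestedIn,
"interestedIn": currentUserGender,
"_id": { "$ne": currentUserID },
"location": {
"$near": {
"$geometry": {
"type": "Point",
"coordinates": [currentUserLong, currentUserLat] // GeoJSON order: [longitude, latitude]
},
"$maxDistance": 500 // meters
}
}
})
Where currentUserInterestedIn and currentUserGender should be the actual values of the current user's interest and gender, i.e. "male" or "female". The interestedIn condition covers the "vice versa" part, so for Neil the query returns nothing.
This assumes a user's location is stored as a GeoJSON point in a field named location with a 2dsphere index on it, since $near requires a geospatial index.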
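The two-way condition is easy to check against the sample users with a plain JavaScript filter. This is an in-memory sketch, not MongoDB code; mutualMatches is a hypothetical name, and the geo restriction is left out. For Alex it yields Amy, and for Neil it yields nothing, matching the expected output.

```javascript
// In-memory model of the two-way match (not MongoDB code; no geo filtering).
function mutualMatches(users, current) {
  return users.filter(
    (u) =>
      u.uid !== current.uid && // exclude the current user
      u.gender === current.interestedIn && // they are what the current user wants
      u.interestedIn === current.gender // and the current user is what they want
  );
}
```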

mongodb check regex on fields from one collection to all fields in other collection

After digging through Google and SO for a week, I've ended up asking the question here. Suppose there are two collections:
UsersCollection:
[
{...
name:"James"
userregex: "a|regex|str|here"
},
{...
name:"James"
userregex: "another|regex|string|there"
},
...
]
PostCollection:
[
{...
title:"a string here ..."
},
{...
title: "another string here ..."
},
...
]
I need to get all users whose userregex matches any post.title (I need user_id, post_id groups or something similar).
What I've tried so far:
1. Get all users in the collection and run the regex on all posts. It works, but it's too dirty! It has to execute a query for each user.
2. Same as above, but using a forEach in a Mongo query. It's the same as above, only in the database layer instead of the application layer.
I searched a lot for available methods such as aggregations, $unwind, etc. with no luck.
So is it possible to do this in Mongo? Should I change my database type? If yes, what type would be good? Performance is my first priority. Thanks
It is not possible to reference a regex stored in a document field from the regex operator inside a match expression.
So it can't be done on the MongoDB side with the current structure.
$lookup works well with equality conditions. So one alternative (similar to what Nic suggested) would be to update your post collection to include an extra field called keywords (an array of keyword values it can be searched on) for each title.
db.users.aggregate([
{$lookup: {
from: "posts",
localField: "userregex",
foreignField: "keywords",
as: "posts"
}
}
])
The above query will do something like this (works from 3.4).
keywords: { $in: [ userregex.elem1, userregex.elem2, ... ] }.
From the docs
If the field holds an array, then the $in operator selects the documents whose field holds an array that contains at least one element that matches a value in the specified array (e.g. <value1>, <value2>, etc.)
It looks like earlier versions (tested on 3.2) only match if both arrays have the same values, in the same order, and the same length.
Sample Input:
Users
db.users.insertMany([
{
"name": "James",
"userregex": [
"another",
"here"
]
},
{
"name": "John",
"userregex": [
"another",
"string"
]
}
])
Posts
db.posts.insertMany([
{
"title": "a string here",
"keywords": [
"here"
]
},
{
"title": "another string here",
"keywords": [
"another",
"here"
]
},
{
"title": "one string here",
"keywords": [
"string"
]
}
])
Sample Output:
[
{
"name": "James",
"userregex": [
"another",
"here"
],
"posts": [
{
"title": "another string here",
"keywords": [
"another",
"here"
]
},
{
"title": "a string here",
"keywords": [
"here"
]
}
]
},
{
"name": "John",
"userregex": [
"another",
"string"
],
"posts": [
{
"title": "another string here",
"keywords": [
"another",
"here"
]
},
{
"title": "one string here",
"keywords": [
"string"
]
}
]
}
]
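The 3.4+ matching rule quoted above (an array localField joins when it shares at least one value with the foreign array field) can be modeled in plain JavaScript to reproduce the sample output. This is an in-memory sketch, not MongoDB code; lookupPosts is a hypothetical name. A post whose array is stored under a differently named field never joins, which is why the field name must be exactly keywords in every post.

```javascript
// In-memory model of $lookup with an array localField against an array
// foreignField (not MongoDB code): a post joins if the arrays share a value.
function lookupPosts(users, posts) {
  return users.map((user) => ({
    ...user,
    posts: posts.filter((post) =>
      (post.keywords ?? []).some((k) => user.userregex.includes(k))
    ),
  }));
}
```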
MongoDB is good for your use case, but you need to use an approach different from your current one. Since you are only concerned with whether any post title matches a user's regex, you can store the last result of such a match. Below is some example code:
db.users.find({last_post_id: {$exists: 0}}).forEach(
function(row) {
var regex = new RegExp(row['userregex']);
var found = db.post_collection.findOne({title: regex});
if (found) {
var post_id = found["post_id"];
db.users.updateOne({
user_id: row["user_id"]
}, {
$set :{ last_post_id: post_id}
});
}
}
)
It only processes users that don't have last_post_id set, searches the post records for a title matching that user's regex, and sets last_post_id if a record is found. After running this, you can return the results like this:
db.users.find({last_post_id: {$exists: 1}}, {user_id:1, last_post_id:1, _id:0})
The only thing you need to be concerned about is an edit or deletion of an existing post. After every edit/delete, just run the following so that all matches for that post ID are computed again:
var post_id_changed = 1 // placeholder: the post_id of the edited/deleted post
db.users.updateMany({last_post_id: post_id_changed}, {$unset: {last_post_id: 1}})
This makes sure that the next time you run the update, these users are processed again. The approach does have one drawback: for every user without a matching title, the query runs again and again. You can work around that by using timestamps or a post-count check.
Also, make sure to put an index on post_collection.title.
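The caching idea (match once, store last_post_id, clear it when a post changes) can be modeled in plain JavaScript to see the control flow without a database. This is an in-memory sketch, not MongoDB code. refreshMatches and invalidatePost are hypothetical names, and user_id / post_id are the same assumed fields the shell script uses.

```javascript
// In-memory model of the "cache the last match" approach (not MongoDB code).
function refreshMatches(users, posts) {
  for (const user of users) {
    if (user.last_post_id !== undefined) continue; // already matched earlier
    const regex = new RegExp(user.userregex);
    const found = posts.find((post) => regex.test(post.title)); // like findOne
    if (found) user.last_post_id = found.post_id;
  }
  // Like find({last_post_id: {$exists: 1}}, {user_id: 1, last_post_id: 1, _id: 0})
  return users
    .filter((u) => u.last_post_id !== undefined)
    .map((u) => ({ user_id: u.user_id, last_post_id: u.last_post_id }));
}

// Like updateMany({last_post_id: id}, {$unset: {last_post_id: 1}}).
function invalidatePost(users, postId) {
  for (const user of users) {
    if (user.last_post_id === postId) delete user.last_post_id;
  }
}
```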
I was thinking that if you pre-tokenized your post titles like this:
{
"_id": ...
"title": "Another string there",
"keywords": [
"another",
"string",
"there"
]
}
but unfortunately $lookup requires that foreignField is a single element, so my idea of something like this will not work :( But maybe it will give you another idea?
db.Post.aggregate([
{$lookup: {
from: "Users",
localField: "keywords",
foreignField: "keywords",
as: "users"
}
},
])
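Pre-tokenizing titles is the step both keyword-based answers rely on. A plain JavaScript sketch of deriving such a keywords array is below. tokenizeTitle is a hypothetical helper, and it assumes lowercase ASCII words split on any non-alphanumeric character; real titles with accents or other scripts would need a different rule.

```javascript
// Hypothetical helper: build a "keywords" array from a post title so that an
// equality-based $lookup can replace the stored regex.
function tokenizeTitle(title) {
  const words = title.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  return [...new Set(words)]; // drop duplicates, keep first-seen order
}
```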