I am doing a $lookup where the local field is an ObjectId and the foreign field is an array of ObjectIds. Performing the lookup gives me the error:
arguments to $lookup must be strings
I have done similar lookups where the foreign field is not an array (but an ObjectId), so the error seems ambiguous. My database consists of two collections: Song and Playlist. A Song can belong to many playlists. I am trying to write an aggregation that returns a matching song along with an array of the playlists it belongs to:
Songs:
[
{
songName: "In Da Club",
_id: ObjectId(1)
},
{
songName: "Happy Birthday",
_id: ObjectId(2)
},
{
songName: "Ode to Joy",
_id: ObjectId(3)
}
]
Playlists:
[
{
_id: ObjectId(4),
playlistName: "PlaylistOne",
songs: [ObjectId(1), ObjectId(3)]
},
{
_id: ObjectId(5),
playlistName: "PlaylistTwo",
songs: [ObjectId(1)]
}
]
Desired outcome:
{
songName: "In Da Club",
_id: ObjectId(1),
playlists: [
{
_id: ObjectId(4),
playlistName: "PlaylistOne"
},
{
_id: ObjectId(5),
playlistName: "PlaylistTwo"
}
]
}
The query I tried:
db.songs.aggregate([
{
$match: {
songName: "In Da Club"
}
},
{
$lookup: {
from: 'playlists',
let: { songId: '$_id'},
pipeline: [
{
$match: {
$expr: {
$in: ["$$songId", "$songs"]
}
}
}
],
as: 'playlists'
}
}
])
It seems to be a relatively simple query, and I'm not sure how to get around the "arguments to $lookup must be strings" error, since my lookup is based on ObjectIds. Any help would be greatly appreciated! TIA!
$lookup was introduced in MongoDB 3.2, but a couple of enhancements were added in subsequent releases. As per the docs:
Starting in MongoDB 3.4, if the localField is an array, you can match the array elements against a scalar foreignField without needing an $unwind stage.
MongoDB 3.6 adds support for executing a pipeline on the joined collection, which allows for specifying multiple join conditions as well as uncorrelated sub-queries.
So the error most likely comes from a MongoDB server older than 3.6, which only accepts the from/localField/foreignField/as form (all strings) and rejects let and pipeline. In any case, you don't need a pipeline for this requirement: the basic single-equality $lookup works, because when the foreignField is an array, a playlist matches if any element of its songs array equals the song's _id.
{
$lookup: {
from: "playlists",
localField: "_id", // Scalar value
foreignField: "songs", // Against an array
as: "playlists"
}
}
Test : mongoplayground
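To make the matching rule concrete, here is a plain-JavaScript sketch (no MongoDB involved) of what that $lookup does with the sample data: a playlist joins when any element of its songs array equals the song's _id. The helper lookupInArray is hypothetical, and plain numbers stand in for the ObjectIds.

```javascript
// Hypothetical in-memory model of the answer's stage:
// { $lookup: { from: "playlists", localField: "_id", foreignField: "songs", as: "playlists" } }
// Plain numbers stand in for ObjectId(1), ObjectId(2), ...
const songs = [
  { _id: 1, songName: "In Da Club" },
  { _id: 2, songName: "Happy Birthday" },
  { _id: 3, songName: "Ode to Joy" },
];

const playlists = [
  { _id: 4, playlistName: "PlaylistOne", songs: [1, 3] },
  { _id: 5, playlistName: "PlaylistTwo", songs: [1] },
];

// A foreign document joins when its foreignField equals the local value,
// or, if the foreignField is an array, when any element equals it.
function lookupInArray(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) =>
      Array.isArray(f[foreignField])
        ? f[foreignField].includes(doc[localField])
        : f[foreignField] === doc[localField]
    ),
  }));
}

// $match on songName, then the $lookup.
const joined = lookupInArray(
  songs.filter((s) => s.songName === "In Da Club"),
  playlists,
  "_id",
  "songs",
  "playlists"
);

// Keep only _id and playlistName of each playlist, as in the desired outcome.
const shaped = joined.map((song) => ({
  ...song,
  playlists: song.playlists.map(({ _id, playlistName }) => ({ _id, playlistName })),
}));

console.log(JSON.stringify(shaped, null, 2));
```

In MongoDB itself, dropping the songs array from each joined playlist would need an extra stage such as { $project: { "playlists.songs": 0 } }, which is not part of the answer above.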
I apologize for the vague question description, but I have quite a complex question about filtering in MongoDB aggregations. Please see my data schema to understand the question better:
Company {
_id: ObjectId
name: string
}
License {
_id: ObjectId
companyId: ObjectId
userId: ObjectId
}
User {
_id: ObjectId
companyId: ObjectId
email: string
}
The goal:
I would like to query all non-licensed users of a company. With plain MongoDB queries, this takes the following steps:
const licenses = db.licenses.find({ companyId }); // Get all licenses for a specific company
const userIds = licenses.toArray().map(l => l.userId); // Collect all licensed user ids
const nonLicensedUsers = db.users.find({ companyId, _id: { $nin: userIds } }); // Query the company's users that don't hold a license
The problem:
The code above works, but in our system a company may have hundreds of thousands of users, which makes the first and last steps very expensive. First, a large number of documents has to be fetched from the database and transmitted over the network. Then a huge $nin query has to be sent back to MongoDB over the network, which doubles the overhead.
So I would like to perform all of these operations on the MongoDB side and return only a small slice of the non-licensed users, to avoid the network transmission costs. Any ideas on how to achieve this?
I was able to come pretty close using the following aggregation (pseudo-code):
db.company.aggregate([
{ $match: { _id: id } }, // Step 1. Find the company entity by id
{ $lookup: {...} }, // Step 2. Joins 'users' collection by `companyId` field
{ $lookup: {...} }, // Step 3. Joins 'licenses' collection by `companyId` field
{
$project: {
licensesMap: // Step 4. Convert 'licenses' array to the map with the shape { 'user-id': true }. Could be done with $arrayToObject operator
}
},
{
$project: {
unlicensedUsers: {
$filter: {...} // And this is the place, where I stopped
}
}
}
]);
Let's have a closer look at the final stage of the above aggregation. I tried to use the $filter operator in the following way:
{
$filter: {
input: "$users",
as: "user",
cond: {
$ne: ["$licensesMap[$$user._id]", true]
}
}
}
Unfortunately, that didn't work. It seemed like MongoDB didn't apply any interpolation and just compared the raw "$licensesMap[$$user._id]" string with the boolean value true.
Note #1:
Unfortunately, we're not in a position to change the current data schema. It would be costly for us.
Note #2:
I didn't include this in the aggregation example above, but I converted the Mongo ObjectIds to strings to be able to build licensesMap, and I also stringified the ids in the users list so I could look them up in licensesMap.
Sample data:
Companies collection:
[
{ _id: "1", name: "Acme" }
]
Licenses collection:
[
{ _id: "1", companyId: "1", userId: "1" },
{ _id: "2", companyId: "1", userId: "2" }
]
Users collection:
[
{ _id: "1", companyId: "1" },
{ _id: "2", companyId: "1" },
{ _id: "3", companyId: "1" },
{ _id: "4", companyId: "1" },
]
The expected result is:
[
{
_id: "1", // company id
name: "Acme",
unlicensedUsers: [
{ _id: "3", companyId: "1" },
{ _id: "4", companyId: "1" }
]
}
]
Explanation: the unlicensedUsers list contains the third and fourth users because they have no corresponding entries in the licenses collection.
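What the unfinished $filter stage is meant to compute is set membership rather than a map lookup with a dynamic key (aggregation field paths use dot notation and have no bracket syntax). The plain-JavaScript sketch below shows that idea on the sample data. It assumes the two $lookup stages store their results in fields named users and licenses, and the helper name unlicensedUsers is hypothetical. In aggregation terms the condition would roughly be { $not: [{ $in: ["$$user._id", "$licenses.userId"] }] }, which would also make licensesMap unnecessary.

```javascript
// Hypothetical model of the company document after the two $lookup stages
// (the field names users and licenses are assumptions).
const company = {
  _id: "1",
  name: "Acme",
  users: [
    { _id: "1", companyId: "1" },
    { _id: "2", companyId: "1" },
    { _id: "3", companyId: "1" },
    { _id: "4", companyId: "1" },
  ],
  licenses: [
    { _id: "1", companyId: "1", userId: "1" },
    { _id: "2", companyId: "1", userId: "2" },
  ],
};

// Mirrors a $filter whose cond is
// { $not: [{ $in: ["$$user._id", "$licenses.userId"] }] },
// where "$licenses.userId" resolves to the array of licensed user ids.
function unlicensedUsers(doc) {
  const licensedIds = new Set(doc.licenses.map((l) => l.userId));
  return doc.users.filter((user) => !licensedIds.has(user._id));
}

console.log(unlicensedUsers(company));
```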
How about something simple like:
db.usersCollection.aggregate([
{
$lookup: {
from: "licensesCollection",
localField: "_id",
foreignField: "userId",
as: "licensedUsers"
}
},
{$match: {"licensedUsers.0": {$exists: false}}},
{
$group: {
_id: "$companyId",
unlicensedUsers: {$push: {_id: "$_id", companyId: "$companyId"}}
}
},
{
$lookup: {
from: "companiesCollection",
localField: "_id",
foreignField: "_id",
as: "company"
}
},
{$project: {unlicensedUsers: 1, company: {$arrayElemAt: ["$company", 0]}}},
{$project: {unlicensedUsers: 1, name: "$company.name"}}
])
playground example
The users and licenses collections together hold everything you need about the users. After the first $lookup "merges" them and a simple $match keeps only the unlicensed users, all that's left is reshaping the output into the format you requested.
Bonus: this solution works with any type of id, as long as the user _id and the license userId have the same type. For example: playground
If you're facing a similar situation, bear in mind that the above solution only runs fast with a hashed index on the joined field (userId in the licenses collection).
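The stages of this answer can be traced on the question's sample data with a plain-JavaScript sketch; each step is commented with the stage it imitates. It is an in-memory model for illustration, not the server's implementation.

```javascript
// Sample data from the question.
const companies = [{ _id: "1", name: "Acme" }];
const licenses = [
  { _id: "1", companyId: "1", userId: "1" },
  { _id: "2", companyId: "1", userId: "2" },
];
const users = [
  { _id: "1", companyId: "1" },
  { _id: "2", companyId: "1" },
  { _id: "3", companyId: "1" },
  { _id: "4", companyId: "1" },
];

// $lookup: attach the licenses whose userId equals the user's _id.
const joined = users.map((u) => ({
  ...u,
  licensedUsers: licenses.filter((l) => l.userId === u._id),
}));

// $match { "licensedUsers.0": { $exists: false } }: keep users with no license.
const unlicensed = joined.filter((u) => u.licensedUsers.length === 0);

// $group by companyId, pushing { _id, companyId } for each user.
const groups = new Map();
for (const u of unlicensed) {
  if (!groups.has(u.companyId)) groups.set(u.companyId, []);
  groups.get(u.companyId).push({ _id: u._id, companyId: u.companyId });
}

// $lookup companies + $arrayElemAt + $project: attach the company name.
const result = [...groups].map(([companyId, unlicensedUsers]) => {
  const company = companies.find((c) => c._id === companyId);
  return { _id: companyId, unlicensedUsers, name: company ? company.name : undefined };
});

console.log(JSON.stringify(result, null, 2));
```

To restrict the real pipeline to a single company, a leading { $match: { companyId: <id> } } stage on the users collection would be a natural addition; it is not part of the answer as written.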
I have two collections:
cats
balls
The "cats" collection has documents with a "ballId" key of type string.
The "balls" collection has documents with an "_id" key of type ObjectId.
A $lookup inside an aggregation can retrieve results when the join is done on keys of the same data type. In my case, however, "ballId" and "_id" have different types. This code retrieves the cats but not the related balls:
collection('cats').aggregate([
{ $match:{} },
{
$lookup: {
from: "balls",
localField: "ballId",
foreignField: "_id",
as: "balls"
}
}
]);
How can I use $lookup when the join keys have different data types?
Use $lookup with a pipeline.
Join both collections by converting the balls' _id to a string ($toString, available since MongoDB 4.0) and then comparing both values as strings ($eq).
db.cats.aggregate([
{
$match: {}
},
{
$lookup: {
from: "balls",
let: {
ballId: "$ballId"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
{
"$toString": "$_id"
},
"$$ballId"
]
},
}
}
],
as: "balls"
}
}
])
Sample Mongo Playground
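Why the plain $lookup finds nothing, and why the $toString version works, can be sketched in plain JavaScript. FakeObjectId is a hypothetical stand-in for BSON's ObjectId: like the real one, its toString() yields the 24-character hex string, but an object is never strictly equal to a string, just as $lookup's equality match never pairs an ObjectId with a string. The hex values are made up.

```javascript
// Hypothetical stand-in for a BSON ObjectId (not the real driver class).
class FakeObjectId {
  constructor(hex) {
    this.hex = hex;
  }
  toString() {
    return this.hex; // like ObjectId, stringifies to its hex value
  }
}

const balls = [
  { _id: new FakeObjectId("5f1d7f3e9d3b2c0012345678"), color: "red" },
  { _id: new FakeObjectId("5f1d7f3e9d3b2c0087654321"), color: "blue" },
];
const cats = [{ name: "Tom", ballId: "5f1d7f3e9d3b2c0012345678" }];

// localField/foreignField join: an object compared with a string never matches.
const naive = cats.map((c) => ({
  ...c,
  balls: balls.filter((b) => b._id === c.ballId),
}));

// Pipeline join: { $expr: { $eq: [{ $toString: "$_id" }, "$$ballId"] } }
const converted = cats.map((c) => ({
  ...c,
  balls: balls.filter((b) => b._id.toString() === c.ballId),
}));

console.log(naive[0].balls.length, converted[0].balls.length);
```

Converting the other side instead, e.g. let: { ballId: { $toObjectId: "$ballId" } } with $eq: ["$_id", "$$ballId"], leaves _id untouched, which may let the server use the _id index (worth checking with explain()); note that $toObjectId raises an error for strings that are not valid ObjectId hex values.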
Suppose that in MongoDB I have two collections, "Students" and "Course".
Students has documents such as
{"id":"1","name":"Alex"},..
and Course has documents such as
{"course_id":"111","course_name":"React"},..
There is also a third collection named "students-courses", where I keep each student's id together with the corresponding course id, like this:
{"student_id":"1","course_id":"111"}
I want to query by a student's id so that the output includes his/her enrolled courses, like this:
{
"id": "1",
"name":"Alex",
"taken_courses": [
{"course_id":"111","course_name":"React"},
{"course_id":"112","course_name":"Vue"}
]
}
It's a many-to-many relationship in MongoDB, without using an ORM. How can I write this query?
You need to use $lookup with a pipeline.
First, $group by student_id, since we want each student's courses, and $push all course_id values into course_ids for the lookup in the next step
db.StudentCourses.aggregate([
{
$group: {
_id: "$student_id",
course_ids: {
$push: "$course_id"
}
}
},
$lookup with the Student collection to get the student details into student
$unwind student, because it's an array and we only need the single record for each grouped student
$project the required fields
{
$lookup: {
from: "Student",
localField: "_id",
foreignField: "id",
as: "student"
}
},
{
$unwind: "$student"
},
{
$project: {
id: "$_id",
name: "$student.name",
course_ids: 1
}
},
$lookup the Course collection to get all courses whose course_id is in the course_ids prepared in the $group above
$project the required fields
the course details are stored in taken_courses
{
$lookup: {
from: "Course",
let: {
cId: "$course_ids"
},
pipeline: [
{
$match: {
$expr: {
$in: [
"$course_id",
"$$cId"
]
}
}
},
{
$project: {
_id: 0
}
}
],
as: "taken_courses"
}
},
$project to remove the fields that are no longer required
{
$project: {
_id: 0,
course_ids: 0
}
}
])
Working Playground: https://mongoplayground.net/p/FMZgkyKHPEe
For more details on the syntax and usage, see the aggregation documentation.
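The same group, lookup and project steps can be followed in a plain-JavaScript sketch. Two things here are assumptions: the second enrollment row and the "Vue" course are taken from the expected output in the question (the sample data shows only one row), and the leading filter on student_id stands in for a $match that the answer's pipeline does not have but the question asks for. The function name studentWithCourses is hypothetical.

```javascript
// Sample data. The second enrollment and the "Vue" course are assumptions,
// taken from the expected output in the question.
const students = [{ id: "1", name: "Alex" }];
const courses = [
  { course_id: "111", course_name: "React" },
  { course_id: "112", course_name: "Vue" },
];
const studentCourses = [
  { student_id: "1", course_id: "111" },
  { student_id: "1", course_id: "112" },
];

function studentWithCourses(studentId) {
  // Leading $match on student_id (an addition; the answer returns every student).
  const enrollments = studentCourses.filter((sc) => sc.student_id === studentId);

  // $group: collect this student's course ids into course_ids.
  const courseIds = enrollments.map((sc) => sc.course_id);
  if (courseIds.length === 0) return null; // $group produces no document

  // $lookup Student + $unwind: assumes student ids are unique.
  const student = students.find((s) => s.id === studentId);
  if (!student) return null; // $unwind drops documents with an empty array

  // $lookup Course with $in on course_ids, keeping only the course fields.
  const taken_courses = courses
    .filter((c) => courseIds.includes(c.course_id))
    .map(({ course_id, course_name }) => ({ course_id, course_name }));

  // Final $project: id, name and taken_courses.
  return { id: student.id, name: student.name, taken_courses };
}

console.log(JSON.stringify(studentWithCourses("1"), null, 2));
```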
I'm fairly new to MongoDB and need help doing a select, or perhaps some sort of left join, on one collection based on another collection's data.
I have two collections, animals and meals, and I want to get the animal(s) whose last registered meal was after a certain date (let's say 20171001), to determine whether the animal is still active.
collection animals:
{
name: "mr floof",
id: 12345,
lastMeal: "abcdef"
},
{
name: "pluto",
id: 6789,
lastMeal: "ghijkl"
}
collection meals:
{
id: "abcdef",
created: 20171008,
name: "carrots"
},
{
id: "ghijkl",
created: 20170918,
name: "lettuce"
}
So the expected output of the query in this case would be:
{
name: "mr floof",
id: 12345,
lastMeal: "abcdef"
}
This is because Mr Floof had his last meal on 20171008, i.e. after 20171001.
Hope I was clear enough, but if not, don't hesitate to ask.
You can try the aggregation query below.
db.animals.aggregate([
{
"$lookup": {
"from": "meals",
"localField": "lastMeal",
"foreignField": "id",
"as": "last_meal"
}
},
{
"$unwind": "$last_meal"
},
{
"$match": {
"last_meal.created": {
"$gt": 20171001
}
}
}
])
More info here.
To exclude the joined fields from the response, add a $project with exclusion after the $match stage, for example { $project: { "last_meal": 0 } }.
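The four stages (including the optional $project just mentioned) can be traced in a plain-JavaScript sketch on the question's data. This is an in-memory model, not MongoDB code; activeAnimals is a hypothetical name. The $gt comparison only works because created is stored as a yyyymmdd number, which sorts in the same order as the dates it encodes.

```javascript
const animals = [
  { name: "mr floof", id: 12345, lastMeal: "abcdef" },
  { name: "pluto", id: 6789, lastMeal: "ghijkl" },
];
const meals = [
  { id: "abcdef", created: 20171008, name: "carrots" },
  { id: "ghijkl", created: 20170918, name: "lettuce" },
];

function activeAnimals(cutoff) {
  return (
    animals
      // $lookup: meals whose id equals lastMeal go into the last_meal array.
      .map((a) => ({ ...a, last_meal: meals.filter((m) => m.id === a.lastMeal) }))
      // $unwind: one document per joined meal; animals without a meal are dropped.
      .flatMap((a) => a.last_meal.map((meal) => ({ ...a, last_meal: meal })))
      // $match: { "last_meal.created": { $gt: cutoff } }
      .filter((a) => a.last_meal.created > cutoff)
      // $project: { last_meal: 0 }
      .map(({ last_meal, ...rest }) => rest)
  );
}

console.log(activeAnimals(20171001));
```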
MongoDB supports joins with $lookup. In your case you can use a query like:
db.animals.aggregate([
{
$lookup:
{
from: "meals",
localField: "lastMeal",
foreignField: "id",
as: "last_meal"
}
},
{
$match: {
"last_meal.created": { // the joined meal lives in the last_meal array
$gt: 20171001 // your cutoff date, in the same format as "created"
}
}
}
])
I have two collections
Posts:
{
"_Id": "1",
"_PostTypeId": "1",
"_AcceptedAnswerId": "192",
"_CreationDate": "2012-02-08T20:02:48.790",
"_Score": "10",
...
"_OwnerUserId": "6",
...
},
...
and users:
{
"_Id": "1",
"_Reputation": "101",
"_CreationDate": "2012-02-08T19:45:13.447",
"_DisplayName": "Geoff Dalgas",
...
"_AccountId": "2"
},
...
and I want to find users who have written between 5 and 15 posts.
This is what my query looks like:
db.posts.aggregate([
{
$lookup: {
from: "users",
localField: "_OwnerUserId",
foreignField: "_AccountId",
as: "X"
}
},
{
$group: {
_id: "$X._AccountId",
posts: { $sum: 1 }
}
},
{
$match : {posts: {$gte: 5, $lte: 15}}
},
{
$sort: {posts: -1 }
},
{
$project : {posts: 1}
}
])
and it is terribly slow: for 6k users and 10k posts it takes over 40 seconds to respond, while a relational database responds in a split second.
Where's the problem? I'm just getting started with MongoDB, and it's quite possible that I messed up this query.
From https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
foreignField: Specifies the field from the documents in the from collection. $lookup performs an equality match on the foreignField to the localField from the input documents. If a document in the from collection does not contain the foreignField, the $lookup treats the value as null for matching purposes.
This will be performed the same as any other query.
If you don't have an index on the _AccountId field, it will do a full collection scan (the equivalent of a table scan) for each of the 10,000 posts, and the bulk of the time will be spent in those scans.
db.users.createIndex({ _AccountId: 1 })
speeds up the process, so it does 10,000 index lookups instead of 10,000 collection scans.
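The gap between a scan per post and an index lookup per post can be illustrated with a rough cost model in plain JavaScript. This is a simplification: a Map stands in for the index, the counts are comparisons rather than real I/O, and the data is synthetic, with only the sizes (6,000 users, 10,000 posts) taken from the question.

```javascript
// Synthetic data with the sizes from the question: 6,000 users, 10,000 posts.
const users = Array.from({ length: 6000 }, (_, i) => ({ _AccountId: String(i) }));
const posts = Array.from({ length: 10000 }, (_, i) => ({ _OwnerUserId: String(i % 6000) }));

// No index: each post is compared against every user (a full scan per post).
function joinWithScan(posts, users) {
  let comparisons = 0;
  let matches = 0;
  for (const post of posts) {
    for (const user of users) {
      comparisons++;
      if (user._AccountId === post._OwnerUserId) matches++;
    }
  }
  return { comparisons, matches };
}

// "Index": a Map from _AccountId to users, built once; each post does one probe.
function joinWithIndex(posts, users) {
  const index = new Map();
  for (const user of users) {
    if (!index.has(user._AccountId)) index.set(user._AccountId, []);
    index.get(user._AccountId).push(user);
  }
  let probes = 0;
  let matches = 0;
  for (const post of posts) {
    probes++;
    matches += (index.get(post._OwnerUserId) || []).length;
  }
  return { probes, matches };
}

console.log(joinWithScan(posts, users)); // { comparisons: 60000000, matches: 10000 }
console.log(joinWithIndex(posts, users)); // { probes: 10000, matches: 10000 }
```

Both joins find the same 10,000 matches; only the amount of work differs, which is what the index on users._AccountId suggested above buys you.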
In addition to bauman.space's suggestion to put an index on the _AccountId field (which is critical), you should also do your $match stage as early as possible in the aggregation pipeline (i.e. as the first stage). Even though it won't use any indexes (unless you index the posts field), it will filter the result set before the $lookup (join) stage.
The reason your query is so slow is that for every post it does a non-indexed lookup (sequential read) over every user. That's around 60M reads!
Check out the Pipeline Optimization section of the MongoDB Aggregation Docs.
Use $match first, then $lookup: $match reduces the documents that $lookup has to examine, which is more efficient.
Since you're going to group by user anyway, do the $group by _OwnerUserId first and run the $lookup only after filtering down to accounts with 5 <= postsCount <= 15. This reduces the number of lookups:
db.posts.aggregate([{
$group: {
_id: "$_OwnerUserId",
postsCount: {
$sum: 1
},
posts: {
$push: "$$ROOT"
} //if you need to keep original posts data
}
},
{
$match: {
postsCount: {
$gte: 5,
$lte: 15
}
}
},
{
$lookup: {
from: "users",
localField: "_id",
foreignField: "_AccountId",
as: "X"
}
},
{
$unwind: "$X"
},
{
$sort: {
postsCount: -1
}
},
{
$project: {
postsCount: 1,
X: 1
}
}
])
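The benefit of grouping and filtering before the join shows up in a small plain-JavaScript sketch of this pipeline. The sample (owners "a", "b" and "c" with 6, 2 and 16 posts, and their display names) is made up for illustration; the join key follows the answer, matching the grouped _OwnerUserId against users' _AccountId.

```javascript
// Hypothetical sample: owner "a" has 6 posts, "b" has 2, "c" has 16.
function makePosts(owner, n) {
  return Array.from({ length: n }, (_, i) => ({ _Id: `${owner}-${i}`, _OwnerUserId: owner }));
}
const posts = [...makePosts("a", 6), ...makePosts("b", 2), ...makePosts("c", 16)];
const users = [
  { _AccountId: "a", _DisplayName: "Alice" },
  { _AccountId: "b", _DisplayName: "Bob" },
  { _AccountId: "c", _DisplayName: "Carol" },
];

// $group by _OwnerUserId with postsCount: { $sum: 1 }.
const counts = new Map();
for (const post of posts) {
  counts.set(post._OwnerUserId, (counts.get(post._OwnerUserId) || 0) + 1);
}

// $match: 5 <= postsCount <= 15, before any join happens.
const survivors = [...counts].filter(([, postsCount]) => postsCount >= 5 && postsCount <= 15);

// $lookup + $unwind, then $sort by postsCount descending.
const result = survivors
  .flatMap(([owner, postsCount]) =>
    users.filter((u) => u._AccountId === owner).map((X) => ({ _id: owner, postsCount, X }))
  )
  .sort((a, b) => b.postsCount - a.postsCount);

console.log(`lookups: ${survivors.length} (instead of one per post: ${posts.length})`);
console.log(JSON.stringify(result));
```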