How to join two collections in MongoDB without $lookup - mongodb

I have two collections, named post and comment.
The document structure is shown below.
I want to run an aggregation on post and sort the results by the total number of likes across each post's comments. Currently I can get the like total for the comments of a single post with the query shown below.
My question is how to query post and join the comment collection in MongoDB 2.6. I know that MongoDB 3.2 and later have the $lookup stage.
I want to query the post collection and sort by the like counts of the related comments. Is there a best way to do this in MongoDB 2.6?
post
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
}
comment
/* 1 */
{
"_id": ObjectId("5a595d8c2703892c3d8b4567"),
"uid": "1",
"post_id": "5a39e22c27308912334b4567",
"comment": "hello world",
"like": [
"2"
]
}
/* 2 */
{
"_id": ObjectId("5a595d8c2703892c3d8b4512"),
"uid": "2",
"post_id": "5a39e22c27308912334b4567",
"comment": "hello stackoverflow",
"like": [
"1",
"2"
]
}
Query for the like total of one post's comments
db.getCollection('comment').aggregate([
{
"$match": {
post_id: "5a39e22c27308912334b4567"
}
},
{
"$project": {
"likeLength": {
"$size": "$like"
},
"post_id": "$post_id"
}
},
{
"$group": {
_id: "$post_id",
"likeLengthSum": {
"$sum": "$likeLength"
}
}
}
])

There is no "best" way to query, as it'll really depend on your specific needs, but... you cannot perform a single query across multiple collections (aside from the $lookup aggregation pipeline function in later versions, as you already are aware).
You'll need to make multiple queries: one to your post collection, and one to your comment collection.
If you must perform a single query, then consider storing both types of documents in a single collection (with some identifier property to let you filter on either posts or comments, within your query).
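As a rough sketch of the two-query approach, the join and the sort can be done in application code. The arrays below are hypothetical stand-ins for the two result sets: posts for the result of a find on the post collection, and likeSums for the aggregation from the question with the $match stage removed, so that it groups every post. The helper name sortPostsByLikes is made up for illustration, and ids are plain strings here; real ObjectId values would first need converting to their hex string so they compare equal to post_id.

```javascript
// Hypothetical stand-ins for the results of two separate queries.
const posts = [
  { _id: "5a39e22c27308912334b4568", content: "a post without comments" },
  { _id: "5a39e22c27308912334b4567", content: "what is hello world mean?" }
];
const likeSums = [
  { _id: "5a39e22c27308912334b4567", likeLengthSum: 3 }
];

// Join in application code: index the sums by post id, attach them to
// each post (0 when a post has no comments), then sort descending.
function sortPostsByLikes(posts, likeSums) {
  const sumByPostId = new Map(likeSums.map(s => [s._id, s.likeLengthSum]));
  return posts
    .map(p => ({ ...p, likeLengthSum: sumByPostId.get(p._id) || 0 }))
    .sort((a, b) => b.likeLengthSum - a.likeLengthSum);
}

const sorted = sortPostsByLikes(posts, likeSums);
console.log(sorted.map(p => p.likeLengthSum));
```

Building the Map first keeps the merge at one pass over each result set instead of a nested loop.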

In current MongoDB (v6) there is no way to join collections other than $lookup.
I can think of two likely reasons why you want to avoid it:
The $lookup is slow and expensive - How to improve performance?
$lookup optimization:
Follow the guideline provided in the documentation
Use indexes:
You can index the fields of the referenced collection. Based on your sample data, you could create an index on the post_id field, an index on the uid field, or a compound index on both, depending on your use case
You can read more about How to Improve Performance with Indexes and Document Filters
db.comment.createIndex({ "post_id": -1 });
db.comment.createIndex({ "uid": -1 });
// or
db.comment.createIndex({ "post_id": -1, "uid": -1 });
Document Filters:
Use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline
You can refer to the documentation for more detailed examples
{ $skip: 0 },
{ $limit: 10 } // as per your use case
Limit the $lookup result:
Try to limit the result of the lookup with a $limit stage,
Try to balance the improved query against the needs of the UI and your use cases
You want to avoid $lookup - How to improve the collection schema to avoid $lookup?
Store the analytics/metrics:
If you need the total number of comments on a particular post, store that count in the post collection and update it whenever the post gets a new comment
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
// new fields
"total_comments": 10
}
Store minimum reference data:
If you want to show the comments of a particular post, you can limit the result, for example to 5 comments per post
You can also store up to 5 of the latest comments in the post collection to avoid the $lookup; whenever a new comment arrives, add it and remove the oldest of the 5 stored comments
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
// new fields
"total_comments": 10,
"comments": [
{
"_id": ObjectId("5a595d8c2703892c3d8b4567"),
"uid": "1",
"comment": "hello world"
},
{
"_id": ObjectId("5a595d8c2703892c3d8b4512"),
"uid": "2",
"comment": "hello stackoverflow"
}
]
}
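One possible way to keep both new fields current is a single update on the post whenever a comment is inserted. The sketch below only builds that update document; buildNewCommentUpdate is a hypothetical helper name, and the limit of 5 is the example value from above. It relies on the $push modifiers $each and $slice, where a negative $slice keeps the last N array elements.

```javascript
// Hypothetical helper: builds the update document to apply to the post
// whenever a new comment is stored in the comment collection.
function buildNewCommentUpdate(comment, maxEmbedded) {
  return {
    // keep the stored counter in step with the comment collection
    $inc: { total_comments: 1 },
    $push: {
      comments: {
        // embed only the fields needed for display
        $each: [{ _id: comment._id, uid: comment.uid, comment: comment.comment }],
        // a negative $slice keeps only the last N elements
        $slice: -maxEmbedded
      }
    }
  };
}

const update = buildNewCommentUpdate(
  { _id: "c3", uid: "3", post_id: "p1", comment: "hello again" },
  5
);
console.log(JSON.stringify(update));
```

In the shell this document would be passed as the second argument of an update on the post, for example db.post.update({ _id: postId }, update), where postId is the _id of the post being commented on.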
Must read about Reduce $lookup Operations
Must read about Improve Your Schema

Related

Trying to fetch data from Nested MongoDB Database?

I am a beginner in MongoDB and stuck at one point. I am trying to fetch data from a nested array, but it takes a long time because there are around 50K documents, and the results are not accurate either. The schema structure is below:
{
"_id": {
"$oid": "6001df3312ac8b33c9d26b86"
},
"City": "Los Angeles",
"State":"California",
"Details": [
{
"Name": "Shawn",
"age": "55",
"Gender": "Male",
"profession": " A science teacher with STEM",
"inDate": "2021-01-15 23:12:17",
"Cars": [
"BMW","Ford","Opel"
],
"language": "English"
},
{
"Name": "Nicole",
"age": "21",
"Gender": "Female",
"profession": "Law student",
"inDate": "2021-01-16 13:45:00",
"Cars": [
"Opel"
],
"language": "English"
}
],
"date": "2021-01-16"
}
Here I am trying to filter by date and Details.Cars, like this:
db.getCollection('news').find({"Details.Cars":"BMW","date":"2021-01-16"})
It also returns the details of other persons who do not have the car BMW. I only want to display the details of persons like Shawn, who have a BMW (or some other specific array value) and match the date, and not Nicole. The rest should not appear, but that is not what happens.
Any help is appreciated. :)
A combination of $match on the top-level fields and $filter on the array elements will do what you seek.
db.foo.aggregate([
{$match: {"date":"2021-01-16"}}
,{$addFields: {"Details": {$filter: {
input: "$Details",
as: "zz",
cond: { $in: ['BMW','$$zz.Cars'] }
}}
}}
,{$match: {$expr: { $gt:[{$size:"$Details"},0] } }}
]);
Notes:
$unwind is overly expensive for what is needed here and it likely means "reassembling" the data shape later.
We use $addFields where the new field to add (Details) already exists. This effectively means "overwrite in place" and is a common idiom when filtering an array.
The second $match will eliminate docs where the date matches but not a single entry in Details.Cars is a BMW i.e. the array has been filtered down to zero length. Sometimes you want to know this info so if this is the case, do not add the final $match.
I recommend you look into using real dates i.e. ISODate instead of strings so that you can easily take advantage of MongoDB date math and date formatting functions.
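To make the effect of the three stages concrete, here is a plain JavaScript sketch that mimics them on a trimmed copy of the sample document. It does not talk to MongoDB; filterByCar is a made-up name and the array functions only imitate what the pipeline does.

```javascript
// Trimmed copy of the sample document from the question.
const docs = [{
  City: "Los Angeles",
  date: "2021-01-16",
  Details: [
    { Name: "Shawn", Cars: ["BMW", "Ford", "Opel"] },
    { Name: "Nicole", Cars: ["Opel"] }
  ]
}];

function filterByCar(docs, date, car) {
  return docs
    // first $match: keep documents with the requested date
    .filter(d => d.date === date)
    // $addFields + $filter: overwrite Details with the matching entries
    .map(d => ({ ...d, Details: d.Details.filter(p => p.Cars.includes(car)) }))
    // second $match: drop documents whose Details ended up empty
    .filter(d => d.Details.length > 0);
}

const result = filterByCar(docs, "2021-01-16", "BMW");
console.log(result[0].Details.map(p => p.Name));
```

The document keeps its original shape (City, date, Details); only the Details array is shortened, which is the point of preferring $filter over $unwind here.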
It is a common mistake to think that find({nested.array:value}) will return only the nested object. In fact, this query returns the whole document that contains a nested object with the desired value.
The query returns the whole document where the value BMW exists in the array Details.Cars, so Nicole is returned too.
To solve this problem:
To get multiple elements that match the criteria, you can use an aggregation with $unwind to split the array into separate documents and then match on the criteria you want.
db.collection.aggregate([
{
"$match": { "Details.Cars": "BMW", "date": "2021-01-16" }
},
{
"$unwind": "$Details"
},
{
"$match": { "Details.Cars": "BMW" }
}
])
This query first matches on the criteria to avoid running $unwind over the whole collection.
Then it uses $unwind to produce one document per array element and $match again to keep only the elements you want.
Example here
To get only one element (for example, if you match by _id and it is unique) you can use $elemMatch in this way:
db.collection.find({
"Details.Cars": "BMW",
"date": "2021-01-16"
},
{
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
}
})
Example here
You can use $elemMatch in either the query or the projection. Docs here and here
Using $elemMatch in the query looks like this:
db.collection.find({
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
},
"date": "2021-01-16"
},
{
"Details.$": 1
})
Example here
The result is the same. In the second case you are using the positional operator, which returns, as the docs say:
The first element that matches the query condition on the array.
That is, the first element where "Cars": "BMW".
You can choose the way you want.

Querying the most recent posts in a MongoDB collection

Rather new to Mongodb/Mongoose/Node. Trying to make a query to retrieve the most recent posts (example being the 10 most recent posts) across all documents in a collection.
I tried querying this a few different ways.
MessageboardModel.find({"posts": {"time": {"$gte": ISODate("2014-07-02T00:00:00Z")}}} ...
I tried doing the above just to try getting to the proper nested time property, but everything I was trying throws an error. I'm definitely missing something here...
Here is an example document in the collection:
{
"_id": {
"$oid": "5c435d493dcf9281500cd177"
},
"movie": 433249,
"posts": [
{
"replies": [],
"_id": {
"$oid": "5c435d493dcf9281500cd142"
},
"username": "Username1",
"time": {
"$date": "2019-01-19T17:24:25.204Z"
},
"post": "This is a post title",
"content": "Content here."
},
{
"replies": [],
"_id": {
"$oid": "5c435d493dcf9281500cd123"
},
"username": "Username2",
"time": {
"$date": "2019-01-12T17:24:25.204Z"
},
"post": "This is another post made earlier",
"content": "Content here."
}
],
"__v": 0
}
There are many documents in the collection. I want to get, say the most recent 10 posts, across all of the documents in the entire collection.
Any help?
You can try using aggregation query:
Steps:
1> Match the specific documents.
2> Expand the posts array into one document per element using $unwind.
3> Sort on the time field of the posts.
4> Select fields, if only specific fields need to be shown.
5> Add a limit for how many docs you want.
<YOUR_MODEL>.aggregate([
{$match:{
"movie": 433249 // add any find conditions here; use {} or drop this $match stage to search every document
}},
{$unwind:"$posts"}, // emits one document per element of the posts array
{$sort:{"posts.time":-1}}, // sort on the date field; -1 is newest first, 1 is oldest first
{$project:{posts:1}}, // keep only the posts field (remove or adjust to return other fields)
{$limit:10} // keep only the 10 most recent posts
]).exec();
Let me know if you still have any issues or get any errors.
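For intuition, the unwind, sort and limit steps can be mimicked in plain JavaScript on in-memory data. The documents below are hypothetical and trimmed (the second movie id and Username3 are invented for the example), and latestPosts is a made-up helper name. This only illustrates the logic and is not a replacement for running the pipeline on the server.

```javascript
// Hypothetical, trimmed documents shaped like the one in the question.
const docs = [
  { movie: 433249, posts: [
    { username: "Username1", time: new Date("2019-01-19T17:24:25.204Z") },
    { username: "Username2", time: new Date("2019-01-12T17:24:25.204Z") }
  ]},
  { movie: 1, posts: [
    { username: "Username3", time: new Date("2019-01-15T00:00:00.000Z") }
  ]}
];

function latestPosts(docs, n) {
  return docs
    // $unwind: one entry per element of each posts array
    .reduce((all, d) => all.concat(d.posts.map(p => ({ movie: d.movie, post: p }))), [])
    // $sort descending on time, newest first
    .sort((a, b) => b.post.time - a.post.time)
    // $limit
    .slice(0, n);
}

const latest = latestPosts(docs, 2);
console.log(latest.map(x => x.post.username));
```

Note that no $match step is mimicked here, which corresponds to searching across every document in the collection.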

search in mongodb embedded records

We have a MongoDB document as given below, with a text index configured on the messageTopic, messageTopicQuestion and answer fields. If I search with a text string, I expect only the matched embedded records in the results, not the entire document.
For example, if I search the document below for the word "private", the results should return only the first embedded document, not both records. How can I retrieve only the matched embedded documents and exclude the unmatched ones?
{
"_id": ObjectId("586e8efdde81e56032000084"),
"messageTopic": "My Private",
"messageText": [{
"messageTopicQuestion": "agent private",
"answer": "agent private",
"_id": ObjectId("586e8efdde81e56032000085"),
"keywords": ["private"]
}, {
"messageTopicQuestion": "Greetings Checking",
"answer": "Heloo I am good What about u",
"_id": ObjectId("586fc80ccced739407000f4e"),
"keywords": ["Hi-Good", "Heloo"]
}],
"__v": 3
}
I am using the script below:
db.getCollection('messagetemplates').aggregate([{
$match: {
$text: {$search: 'private'},
visible: 'PUB'
}
},{ $sort: { score: { $meta: "textScore" } } }])
Appreciate help. Thanks.
I believe the question is a variation of this problem: How to get a specific embedded document inside a MongoDB collection?
The issue is how to get the single embedded document and exclude the rest. My suggestion is to use db.collection.find() instead of aggregation.
Something along these lines:
db.collection.find({ 'messageText.keywords': 'private' }, {'messageText.$': 1});
as indicated by the linked answer.
messageText.keywords can be replaced with whichever field you want to search.
I can confirm that this scenario works on my database.

how to restrict $push in mongodb?

I am learning MongoDB and wondering if I can restrict $push by matching values.
For example:
field1 = {
id:123,
title:123,
likes: [{by:1,type:'like'}, {by:2, type:'like'}]
}
Can I restrict push by id in likes?
What you may have already tried was the $addToSet operator, but then found out it does not suit the case here as the combination of "id" and "type" can possibly vary. For instance what you don't want is the same "id" value with both types "like" and "dislike".
This is however a typical "voting" model, and the current structure is not the best one. A better model for this is as so, with the basic fields just for example:
{
"_id": 123,
"likeCount": 2,
"dislikeCount": 0,
"likes": [456,789],
"dislikes": []
}
Having separate arrays is important to the atomic update process, since you cannot both $pull from and $push to the same array in a single update. More than that, it reinforces the logic behind keeping the "count" values, which are useful for simple queries such as sorting, as opposed to calculating the array length.
In order to post a "like" for a user who you don't want to duplicate in the array, the $addToSet operator is still not the best one, despite the values now being truly unique. You want to constrain the "count" as well, so add the conditions to the query in the update instead:
db.collection.update(
{ "_id": 123, "likes": { "$ne": 456 } },
{
"$push": { "likes": 456 },
"$inc": { "likeCount": 1 }
}
)
That way, if the user has already voted their "like" then not only is nothing added but the "count" is kept at the correct total as well. Basically the query condition on the update was not met as there already was an element in the array matching that value. So the document does not match and nothing is updated.
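To illustrate why the count stays correct, here is a small in-memory sketch of the same semantics in plain JavaScript. It is not a MongoDB call; applyLike is a made-up function that imitates what the server does for this one update, and user id 111 is an invented value.

```javascript
// In-memory stand-in for a single document; not a MongoDB call.
function applyLike(doc, userId) {
  // Mirrors the query condition { "likes": { "$ne": userId } }:
  // a document that already holds the user does not match.
  if (doc.likes.includes(userId)) return false;
  doc.likes.push(userId); // $push
  doc.likeCount += 1;     // $inc
  return true;
}

const post = { _id: 123, likeCount: 2, dislikeCount: 0, likes: [456, 789], dislikes: [] };
const repeated = applyLike(post, 456); // user 456 has already liked
const fresh = applyLike(post, 111);    // user 111 has not voted yet
console.log(repeated, fresh, post.likeCount);
```

The repeated vote changes nothing, while the fresh vote adds one array entry and bumps the count by exactly one, so the array length and likeCount never drift apart.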
That is a good approach, but we can make that better still. What if the user already posted to "dislike" and now changes their mind to "like" instead? What you really need here are "two" update statements to cover the possible conditions, and this is where the Bulk Operations API comes in, to handle that logic in a single request:
var bulk = db.collection.initializeOrderedBulkOp();
// match and update where a dislike is present
bulk.find({
"_id": 123,
"likes": { "$ne": 456 },
"dislikes": 456
}).updateOne({
"$push": { "likes": 456 },
"$pull": { "dislikes": 456 },
"$inc": {
"likeCount": 1,
"dislikeCount": -1
}
});
// match and update where no dislike exists
bulk.find({
"_id": 123,
"likes": { "$ne": 456 },
"dislikes": { "$ne": 456 }
}).updateOne({
"$push": { "likes": 456 },
"$inc": { "likeCount": 1 }
});
// Send requests to server and respond
bulk.execute();
In this case if the first statement did not match because there was no dislike then nothing would be updated, but if there was a dislike then the correct adjustments would be made.
With the second request, this one would be applied if there was nothing in the dislikes array to match and there was also not a matching item in the likes array. So this would apply for a new vote and also does not conflict with the previous statement. Despite the two statements, the update is only ever applied once or not at all depending on the state conditions.
That is the basic pattern for handling this kind of voting properly, as you keep lists of each vote type as well as maintaining the counts for ease of access. The "dislikes" process is pretty much just the reverse of the logic for the elements you need to check for, and removing votes has similar conditions as well.
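As a sketch of that "reverse" logic, the two statements can be generated for either vote type by swapping the field names. buildVoteOps is a hypothetical helper, not part of any MongoDB API; it only builds plain objects using the field names from the model above.

```javascript
// Hypothetical helper: builds the two filter/update pairs for a vote.
function buildVoteOps(docId, userId, type) {
  const isLike = type === "like";
  const target = isLike ? "likes" : "dislikes";
  const other = isLike ? "dislikes" : "likes";
  const targetCount = isLike ? "likeCount" : "dislikeCount";
  const otherCount = isLike ? "dislikeCount" : "likeCount";
  return [
    {
      // the user switches from the opposite vote
      filter: { _id: docId, [target]: { $ne: userId }, [other]: userId },
      update: {
        $push: { [target]: userId },
        $pull: { [other]: userId },
        $inc: { [targetCount]: 1, [otherCount]: -1 }
      }
    },
    {
      // the user has not voted either way yet
      filter: { _id: docId, [target]: { $ne: userId }, [other]: { $ne: userId } },
      update: { $push: { [target]: userId }, $inc: { [targetCount]: 1 } }
    }
  ];
}

const ops = buildVoteOps(123, 456, "dislike");
console.log(JSON.stringify(ops, null, 1));
```

Each pair could then be queued the same way as in the listing above, with bulk.find(op.filter).updateOne(op.update), before calling bulk.execute().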

Querying MongoDB (Using Edge Collection - The most efficient way?)

I've written out the Users, Clubs and Followers collections below as an example.
I want to find all user documents in the Users collection that are following "A famous club". How can I find them, and which way is the fastest?
More info about 'what do I want to do - Edge collections'
Users collection
{
"_id": "1",
"fullname": "Jared",
"country": "USA"
}
Clubs collection
{
"_id": "12",
"name": "A famous club"
}
Followers collection
{
"_id": "159",
"user_id": "1",
"club_id": "12"
}
PS: I can get the documents using Mongoose as shown below. However, building the followers array takes about 8 seconds with 150,000 records, and the second find query, which uses the followers array, takes about 40 seconds. Is that normal?
Followers.find(
{ club_id: "12" },
'-_id user_id', // select only one field to better perf.
function(err, docs){
var followers = [];
docs.forEach(function(item){
followers.push(item.user_id)
})
Users.find(
{ _id:{ $in: followers } },
function(error, users) {
console.log(users) // RESULTS
})
})
There is no built-in formula for joining a many-to-many relation in MongoDB, so I combined the collections using embedded documents as shown below. The most important task in this case is creating indexes. For instance, if you want to query by followingClubs you should create an index like schema.index({ 'followingClubs._id':1 }) using Mongoose. And if you want to query by country and followingClubs you should create another index like schema.index({ 'country':1, 'followingClubs._id':1 })
Pay attention when working with Embedded Documents: http://askasya.com/post/largeembeddedarrays
Then you can get your documents quickly. I tried getting the count of 150,000 records this way and it took only 1 second. That's enough for me...
PS: we mustn't forget that in my tests my Users collection has never experienced any data fragmentation, so my queries may have shown better performance than they otherwise would, especially with the followingClubs array of embedded documents.
Users collection
{
"_id": "1",
"fullname": "Jared",
"country": "USA",
"followingClubs": [ {"_id": "12"} ]
}
Clubs collection
{
"_id": "12",
"name": "A famous club"
}