MongoEngine: How to collect embedded documents from referencing document - mongodb

I have a class Post which has a list of embedded documents called "comments".
All I want to do is retrieve the latest comments across all the posts a user has made.
How can I achieve that? In my current code, I just loop through the Post documents for that user and manually collect the comments.
But I also want them sorted by most recently added, so I have a sort function that loops over the manually collected comments and re-sorts them.
This seems very inefficient, so I'm asking for advice. Thanks!

Firstly, if you $push onto the list with an update, the comments will stay in the order they were added.
You can then use the $slice operator to return the last x comments, e.g.:
Post.objects(id=xxx).fields(slice__comments=-5)
However, this schema may not be efficient, especially if the number of comments keeps growing or comments can be unpublished. In that case you may want to split comments out into their own Document model and link each comment to its Post by id. This means two round trips to the database but offers more flexibility; e.g., you could filter on date and published status.
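A minimal sketch of the split-out model, in plain Python with in-memory lists standing in for the two collections. The field names (`post_id`, `created`, `published`) are assumptions for illustration. The two steps mirror the two round trips: first the ids of the user's posts, then the newest published comments on those posts. With MongoEngine the second step would look roughly like `Comment.objects(post__in=user_posts, published=True).order_by('-created').limit(5)`, assuming a hypothetical `Comment` document with a `post` ReferenceField.

```python
from dataclasses import dataclass
from datetime import datetime


# In-memory stand-ins for two collections; all field names are hypothetical.
@dataclass
class Post:
    id: int
    author: str


@dataclass
class Comment:
    post_id: int          # reference to Post.id instead of embedding
    text: str
    created: datetime
    published: bool = True


def latest_comments_for_user(posts, comments, user, limit=5):
    # Round trip 1: ids of the posts written by this user.
    post_ids = {p.id for p in posts if p.author == user}
    # Round trip 2: published comments on those posts, newest first.
    matching = [c for c in comments if c.post_id in post_ids and c.published]
    matching.sort(key=lambda c: c.created, reverse=True)
    return matching[:limit]
```

Because the sort and limit happen in one query over the comments, there is no need to loop over every post and re-sort by hand.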

Related

NoSQL database design - MongoDB

I am trying to build an app where I just have these 3 models:
topic (has just a title (max 100 chars.))
comment (has text (may be very long), author_id, topic_id, createdDate)
author (has just a username)
It's actually a very simple DB structure. A topic may have many comments, which are created by authors, and an author may have many comments.
I am still trying to figure out the best way to design the database structure (documents). At first I thought of putting everything in its own schema like above: 3 documents. But since this is a NoSQL DB, I should actually try to eliminate the need for joins. And now I am seriously thinking of putting everything into a single document, which also sounds crazy.
These are my actual queries from the UI:
Homepage query: Listing all the topics, which have received the most comments today (will run very often)
Auto suggestion list for search field: Listing all the topics, whose title contains string "X"
Main page of a topic query: Listing all the comments of a topic, with their authors' username.
Since most of my queries need data from at least 2 documents, should I really just use them all together in a single document like this:
Comment (text, username, topic_title, createdDate)
This way I won't need any joins, but I'll also store e.g. the topic title multiple times, once in every comment.
I just could not decide.
I appreciate any help.
You can do the second design you suggested, but it all comes down to how you want to use the data. I assume you’re going to be using it for a website.
If you want the comments to be clickable, so that clicking the topic name redirects to the topic’s page or clicking the username redirects to the user’s page where you can see all their comments, I suggest you keep them as IDs, since you can later use Mongoose's .populate("field1 field2") to resolve those IDs into the referenced documents and choose which of their fields to fetch.
Alternatively, you can store both the topic_name and username and their IDs in the same document to reduce queries, but you would end up storing more redundant data.
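To make the trade-off concrete, here is a small Python sketch of a comment that keeps both the IDs and copies of the display values. Plain dicts stand in for documents, and the helper names are hypothetical. Rendering needs no lookup, but renaming a topic has to touch every copy (in MongoDB that would be one `updateMany` filtered on `topic_id`).

```python
def make_comment(text, topic, author):
    """Build a comment that stores the IDs plus copies of the display fields."""
    return {
        "text": text,
        "topic_id": topic["_id"],
        "topic_title": topic["title"],    # redundant copy, saves a lookup
        "author_id": author["_id"],
        "username": author["username"],   # redundant copy, saves a lookup
    }


def render(comment):
    # Everything needed for display (and for building links) is already here.
    return f'{comment["username"]} on "{comment["topic_title"]}": {comment["text"]}'


def rename_topic(topics, comments, topic_id, new_title):
    """The price of the redundancy: every copy must be updated on rename."""
    topics[topic_id]["title"] = new_title
    touched = 0
    for c in comments:
        if c["topic_id"] == topic_id:
            c["topic_title"] = new_title
            touched += 1
    return touched
```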
Revised design:
The three queries (in the question post) are likely to be like this (pseudo-code):
select all topics from comments, where date is today, group by topic and count comments, order by count (desc)
select topics from comments, where topic matches search, group by topic.
select all from comments, where topic matches topic_param, order by comment_date (desc).
So, as you had intended (in your question post), it is likely there will be one main collection, comments.
comments:
  date
  author
  text
  topic
The user and topic collections, with one field each, are optional; they serve to maintain uniqueness.
Note that the group-by queries will be aggregation queries. For example, the main query will look like this:
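db.comments.aggregate( [
  // match every comment made during 2019-11-15, not only those stamped exactly at midnight
  { $match: { date: { $gte: ISODate("2019-11-15"), $lt: ISODate("2019-11-16") } } },
  { $group: { _id: "$topic", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
] )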
This gives you the names of all topics commented on today, with the most-commented topics first.
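To show what each stage of that pipeline does, here is a plain-Python model of it run over in-memory comment dicts (the `date` and `topic` field names follow the schema above). It is only an illustration of the stage semantics, not a replacement for running the aggregation on the server. The date filter is a half-open range covering the whole day.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_topics_for_day(comments, day):
    """Mimic $match (date range) -> $group (count per topic) -> $sort (count desc)."""
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=1)
    # $match: comments whose timestamp falls on the given day
    todays = [c for c in comments if start <= c["date"] < end]
    # $group: { _id: "$topic", count: { $sum: 1 } }
    counts = Counter(c["topic"] for c in todays)
    # $sort: { count: -1 }
    return [{"_id": topic, "count": n} for topic, n in counts.most_common()]
```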
You could also take a slightly different approach. Storing information redundantly is not always a bad thing.
1. Homepage query: Listing all the topics, which have received the most comments today (will run very often)
You could implement this as two extra fields in your Topic entity: one holding the date the last comment was added, and another counting the number of comments added that day. That way you don't need a join and can write a query that only looks at the Topic collection.
You could also store these statistics independently of the other data and update them when required. Think of this as having a document that describes your database's current state (at least the parts relevant to you).
This might cost you some time when storing information, but it improves read times.
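A minimal sketch of the two extra Topic fields, in plain Python. The field names `last_comment_date` and `comments_today` are assumptions. The write path resets the counter on the first comment of a new day and increments it otherwise; against MongoDB that would become an `$inc`, or a `$set` when the stored date is not today. The read path then only touches the Topic collection.

```python
from datetime import date


def record_comment(topic: dict, today: date) -> None:
    """Update the two extra Topic fields when a comment is added."""
    if topic.get("last_comment_date") != today:
        # First comment of a new day: restart the daily counter.
        topic["last_comment_date"] = today
        topic["comments_today"] = 0
    topic["comments_today"] += 1


def homepage_topics(topics: list, today: date) -> list:
    """Read path: only the Topic collection is consulted, no join."""
    active = [t for t in topics if t.get("last_comment_date") == today]
    return sorted(active, key=lambda t: t["comments_today"], reverse=True)
```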
2. Auto suggestion list for search field: Listing all the topics, whose title contains string "X"
As far as I understand, for this one you only need the topic titles, which means you can query the database once and retrieve all of them. If the collection grows so big that this becomes slow, you could instead re-run the retrieval query so that it returns only a subset of matching titles (a user is not likely to go through 100 possible topics).
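A small sketch of the "only return a subset" idea in Python, applied to a list of titles. The function name and the default limit of 10 are assumptions. The server-side counterpart would be a find on `{"title": {"$regex": <escaped text>, "$options": "i"}}` with a limit; note that an unanchored, case-insensitive regex like this generally cannot use an index efficiently.

```python
import re


def suggest_titles(titles, typed, limit=10):
    """Return at most `limit` titles containing `typed`, case-insensitively."""
    # Escape user input so characters like '.' or '(' are matched literally.
    pattern = re.compile(re.escape(typed), re.IGNORECASE)
    matches = []
    for title in titles:
        if pattern.search(title):
            matches.append(title)
            if len(matches) == limit:
                break  # stop early: the user only needs a short list
    return matches
```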
3. Main page of a topic query: Listing all the comments of a topic, with their authors' username.
This is actually the tricky one. If this is really what you want to do, then you are most likely best off storing all the data in one document. However, I would ask you: what is the problem with making more than one query? I doubt you will be showing all comments at once when there are thousands (as you say). Instead of storing each comment in a separate document or throwing them all into one document, you could also bucket them and retrieve only the 20 most recent ones (if you create buckets of size 20), updating the ones shown when required. Read more about the bucket pattern here.
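A minimal Python sketch of the bucket pattern described above. In MongoDB each bucket would be its own document holding a topic id, a sequence number and a comments array; here a dict of lists stands in for that collection, and `BUCKET_SIZE` and the field names are assumptions. Note that the newest bucket may be only partly filled, so the first page can hold fewer than 20 comments.

```python
BUCKET_SIZE = 20


def add_comment(buckets, topic_id, comment):
    """Append to the topic's newest bucket, opening a new one when it is full."""
    topic_buckets = buckets.setdefault(topic_id, [])
    if not topic_buckets or len(topic_buckets[-1]["comments"]) >= BUCKET_SIZE:
        topic_buckets.append(
            {"topic_id": topic_id, "seq": len(topic_buckets), "comments": []}
        )
    topic_buckets[-1]["comments"].append(comment)


def recent_comments(buckets, topic_id, page=0):
    """Page 0 is the newest bucket; each page costs one bucket read."""
    topic_buckets = buckets.get(topic_id, [])
    if page >= len(topic_buckets):
        return []
    bucket = topic_buckets[-1 - page]
    return list(reversed(bucket["comments"]))  # newest first
```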
You said:
"Since most of my queries need data from at least 2 documents, should I really just use them all together in a single document like this..."
I'll make an argument from a domain-driven design point of view.
Given that all your data exists within the same bounded context (business domain), it is acceptable to encapsulate it all within the same document!

MongoDB, sort an array that is the value of a field of a document, with a slice on that array

I have a Mongo collection for profile comments, which is structured like this:
{
"_id": "",
"comments": []
}
The _id refers to the ID of the user in the profiles collection. To reduce server load, I only want to show 10 comments per page. To do this, I use Mongo's $slice operator. Here is my code for the query:
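mongoCols.profileComments.findOne({"_id":doc._id},{comments:{$slice:[(comPage-1)*10,10]}},function(err,doc) {
  // doc.comments holds at most 10 comments: page comPage, counted from the oldest comment
  // ...
});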
Here's the problem: I want to show the comments in reverse order, meaning that the newest comments are on the first page and the oldest comments are on the last. I thought of a few solutions to this, but they aren't very efficient.
1) I could retrieve the entire array, then use JavaScript (I'm using Node.js for this) to sort that array, then only take the 10 elements that I want. This seems inefficient because I'm asking Mongo to retrieve what is potentially a ton of elements from an array when I only need 10.
2) I could make each comment a separate document, with a field saying which user the comment is for. I could then find only the documents where the comment was sent to the requested user, and use the skip and limit options to retrieve only the 10 documents I want. My problem with this is that Mongo will have to go through almost every single comment every time someone requests a user's comments. This seems inefficient, but it is my best solution so far.
I would prefer to keep the structure I currently have, but if I need to change for it to work, then I will comply.
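One way to keep the current structure, sketched in Python under stated assumptions: since $push appends comments in chronological order, a $slice projection with a negative skip counts from the end of the array, so the newest page can be requested directly and reversed on the client. `mongo_slice` below is a plain-Python model of `{$slice: [skip, limit]}`, not a driver call. The sketch also assumes you know the array length (for example, from a hypothetical count field maintained alongside the array), because without it the last, partial page would overlap the previous one.

```python
def mongo_slice(arr, skip, limit):
    """Python model of the {$slice: [skip, limit]} projection.

    A negative skip counts from the end of the array; if it reaches past the
    start, the slice begins at the first element.
    """
    start = max(len(arr) + skip, 0) if skip < 0 else skip
    return arr[start:start + limit]


def newest_first_page(arr, page, size=10):
    """Return page `page` (1-based) of an oldest-first array, newest first."""
    n = len(arr)                      # assumed to be known, e.g. from a count field
    end = n - (page - 1) * size       # one past the newest element on this page
    if end <= 0:
        return []
    start = max(end - size, 0)
    # Equivalent projection: { comments: { $slice: [start - n, end - start] } }
    chunk = mongo_slice(arr, start - n, end - start)
    return chunk[::-1]                # reverse so the newest comment comes first
```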

Determine which Mongo collection a document exists in?

Is there a way in Meteor/MongoDB to do a find that returns the collection a document's _id exists in?
What I am trying to accomplish is a generic Comments framework for my app, where comments can be applied to several different document types that are saved in multiple Mongo collections. For instance, comments can be applied to Pages as well as to Comments. What I need to do is save the comment, then modify the parent document. I can pass in the _id of the parent, but without strong typing I can't figure out whether it is a Page or a Comment (or any other "commentable" type I might come up with).
One solution, I think, would be to store the "parent"'s ID in the comment, but I wanted to try to save an array of comments in the parent instead.
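MongoDB has no single query that searches every collection for an _id, so the literal answer is to check each candidate collection in turn (in Meteor, something like `Pages.findOne(id)` followed by `Comments.findOne(id)`). The Python sketch below models that with dicts keyed by _id; the collection names and helpers are hypothetical. It costs one lookup per collection, and an _id is only guaranteed unique within its own collection, so passing the parent's collection name along with its _id avoids both the extra queries and any ambiguity.

```python
def find_parent(collections, parent_id):
    """Return (collection_name, document) for the first collection holding parent_id."""
    for name, docs in collections.items():
        doc = docs.get(parent_id)
        if doc is not None:
            return name, doc
    return None, None


def add_comment_to_parent(collections, parent_id, comment):
    """Locate the parent, then keep the comment array on it, as the question prefers."""
    name, parent = find_parent(collections, parent_id)
    if parent is None:
        raise KeyError(f"no commentable document with _id {parent_id!r}")
    parent.setdefault("comments", []).append(comment)
    return name
```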

Where to extend collection documents with computed fields in Meteor?

We have information we need on the client that is computed from a document, for example the number of entries in an array.
More concretely, we have a Workshop document that holds an array of participants (users' _ids). Now we want Workshop.numberOfParticipants().
There is no need to transmit the whole array to the client, so where should this value be calculated? Is it possible to add this value to the Workshop document as a field, like the other data?
I'd like to avoid writing a Template.workshop.numberOfParticipants() helper.
One option for the future is MongoDB's oddly-named aggregation framework. Queries written against the aggregate API can return documents with calculated fields.
Meteor core doesn't support aggregate queries yet, but it's on the wishlist.
You'll need to publish a set of documents called NumParticipants and then add an observer that updates a count property or something similar when documents are added (and similarly reduces that property when docs are removed).
An example of how to do this is described in the documentation for publish.
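The idea can be modelled in a few lines of Python, adapted to the array case: the server observes Workshop documents and forwards only a count to a client-side NumParticipants set, so the participants array itself is never sent. In Meteor this would live inside `Meteor.publish`, using `cursor.observeChanges` callbacks together with `this.added`, `this.changed` and `this.removed`; the class below is a hypothetical stand-in, not Meteor code.

```python
class NumParticipantsPublisher:
    """Model of a publication that sends only participant counts to the client.

    `client` stands in for the client-side NumParticipants collection; the
    added/changed/removed methods play the role of observer callbacks.
    """

    def __init__(self):
        self.client = {}

    def added(self, workshop_id, fields):
        # Publish only the length, never the participants array.
        self.client[workshop_id] = {"count": len(fields.get("participants", []))}

    def changed(self, workshop_id, fields):
        # Only react when the participants array is part of the change.
        if "participants" in fields:
            self.client[workshop_id]["count"] = len(fields["participants"])

    def removed(self, workshop_id):
        self.client.pop(workshop_id, None)
```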

MapReduce on child objects not embedded

I am having a problem creating a MapReduce algorithm that will get me the stats I need. I have a user object that can create posts, and a post can have many likes by other users.
User
--Post
----Likes
The Post is not embedded in the User because we access posts separately, not just in a user context. The stat I need is the number of likes an author has received, and I need to get this through the likes on the user's posts. The problem is that because the posts are not embedded, I cannot access them in my map function. Here are the map and reduce functions I currently have:
def reputation_map
  <<-MAP
    function() {
      var posts = db.posts.find({user_id:this._id});
      emit(this._id, {posts:posts});
    }
  MAP
end

def reputation_reduce
  <<-REDUCE
    function(key, values) {
      var count = 0;
      while(values.hasNext()){
        values.next();
        count+=1;
      }
      return {posts:count};
    }
  REDUCE
end
This should only return the number of posts for each user (I have not even gotten to the likes level yet), but instead of a count it only returns a DBQuery for the posts. What is the correct way of doing this?
Map Reduce is really designed to operate on a single collection at a time.
Technically, it is possible to query a separate collection from inside a Map function as you have done, but be careful: this is neither recommended nor supported, and you may run into issues, especially if the collection is sharded.
A similar question was asked a while back: How to call to mongodb inside my map/reduce functions? Is it a good practice?
If you are aggregating results from multiple collections, you may find that the safest and most straightforward way to do it is in the application.
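A minimal sketch of doing this join in the application, in Python. It assumes each post document carries the author's `user_id` and a `likes` array of liking users' ids; those field names are assumptions based on the question's User / Post / Likes structure. In practice the inputs would come from one query on each collection.

```python
def likes_per_author(users, posts):
    """Join users and posts in the application and sum likes per author."""
    totals = {u["_id"]: 0 for u in users}  # authors with no likes still appear
    for post in posts:
        author = post["user_id"]
        totals[author] = totals.get(author, 0) + len(post["likes"])
    return totals
```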
Alternatively, if likes per author is a value that will be searched for with some frequency, it may be preferable to store it as a field in each author's document and spend a little more overhead on each update to increment it, rather than periodically performing a potentially resource-heavy calculation of all the likes per author.
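A sketch of that incremental approach in Python, with dicts standing in for documents and a hypothetical `likes_received` counter on each user. Each like updates the post and bumps the author's counter in the same step; against MongoDB these would be two separate writes (for example `$addToSet` on the post and `$inc` on the user). Duplicate likes are ignored here so the counter stays consistent with the likes arrays.

```python
def like_post(users, post, liker_id):
    """Record a like and bump the author's denormalized counter in one step."""
    if liker_id in post["likes"]:
        return False  # ignore duplicate likes so the counter stays accurate
    post["likes"].append(liker_id)
    users[post["user_id"]]["likes_received"] += 1
    return True
```

Reading an author's total then becomes a single field lookup instead of a scan over all posts.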
Hopefully this will give you some food for thought for retrieving the values that you need to.
If you would like some assistance writing a Map Reduce operation for a single collection, the Community is here to help. Please include a sample input document, and a description of the desired output.
For more information on Map Reduce, the documentation may be found here:
http://www.mongodb.org/display/DOCS/MapReduce
Additionally, there are some good Map Reduce examples in the MongoDB Cookbook:
http://cookbook.mongodb.org/
The "Extras" section of the cookbook article "Finding Max And Min Values with Versioned Documents" http://cookbook.mongodb.org/patterns/finding_max_and_min/ contains a good step-by-step walkthrough of a Map Reduce operation, explaining how the functions are executed.