How to balance quickness and redundancy in MongoDB data structures?

I am creating a MongoDB database with a users collection (with UserFiles in it) and a posts collection. Each post has tags and sharedFrom fields. I eventually plan to have users' search results influenced by the tags they normally post about and by the users from whom they often share posts. Would it be better to:
add a field to each user's UserFile document that lists the IDs of the posts made by that user?
add a field to the UserFile document that lists all the tags the user has used and the other users they have shared from?
have the search function look up the searcher's activity at query time and use it to influence the search results?
do something I haven't thought of?

Related

Firestore security rule to limit reads to use collectionGroup(..)?

Several questions address whether knowing a Firestore uid allows hackers to edit that person's data, like this question and this question. My question is about security rules to filter when users can read another's data.
Specifically, I have a social media app that allows people to post data anonymously. My data model is /users/{user}/posts/{post}. I use db.collectionGroup("posts") to build a timeline of posts (some anonymous, others with users' names).
Posts that are not anonymous have a valid uid, so it wouldn't be tough for a hacker to figure out someone's uid, which I'm not concerned about. My concern is whether a hacker could then query usersRef.document(uid).posts.getDocuments(); to get all the posts of that user, including the anonymous ones?
Because my app builds timelines from users' "posts" collections, I can't write a rule that stops them from reading another user's posts. Can I write a rule that lets them read posts only through collectionGroup?
That's not going to be possible with the way things are structured now. Here's the way you write a rule to allow collection group queries, as described in the documentation:
match /{path=**}/posts/{post} {
  allow read: if ...condition...;
}
The path wildcard in the rule explicitly allows all reads for all collections named "posts". The rule does not limit the reads to only collection group queries - any normal collection query on any "posts" will be allowed.
Bear in mind also that a collection group query would not hide any data from the caller compared to a normal collection query. The query results will still contain a reference to the full path of each document, and that path includes the owning user's uid.

MongoDB design decision for Documents

I'm building an API for a small social network and I came across a design decision that I have to make. I'm working with Express and MongoDB with mongoose to deal with the database.
I have two Documents: Users and Posts. I want the Users to be able to mark Posts as their favorites. I came up with two different ways to implement this:
Option A: Saving the favorites in the User Document. This makes it easy to show all favorite posts of a user. But how would I query the users who have favorited a specific Post?
UserSchema:
favorite_posts: [
  {
    type: mongoose.Schema.Types.ObjectId,
    ref: "posts"
  }
]
Option B: Saving the Users who hit the favorite button in the Post Document. The benefit would be that you can easily display all Users who have favorited a Post. But how do I list all Posts that one specific User has marked as favorites?
PostSchema:
users_favorited: [
  {
    type: mongoose.Schema.Types.ObjectId,
    ref: "users"
  }
]
Can somebody explain to me how to query such things? I'm not getting any smarter from the documentation... :(
As already mentioned in the comments, your best bet would be a join collection (a "join table" in SQL terms) to make an n:m relation work. Mongoose emulates SQL's join functionality through populate() in regular queries, and MongoDB offers the $lookup stage in an aggregation. So basically create a collection called "likes" that only holds refs to the user and the post. Using the aggregation framework, you can then easily query for all likes of a user or all likes on a post: first $match on the user or the post, then $group by that same field and $push the other side into an array, and finally join the needed data onto it with the $lookup stage.
However, you could, as you've described, put all the favorites in an array on either the user or the post documents. Unless you know for sure that these arrays won't grow large, I'd recommend against it, as MongoDB is not designed for this kind of usage and you'll very quickly run into performance problems. See http://www.askasya.com/post/largeembeddedarrays/ for more.
If you are going to query a lot by userid, you can just add a userid field to the favorites document. This would save queries/joins/aggregations.
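The $match, $group, $push, $lookup sequence from the first answer could be sketched roughly as below. The collection names ("likes", "posts", "users") and the field names ("user", "post") are assumptions for illustration, not something the question defines. The functions only build the pipeline arrays, so they can be inspected without a database.

```javascript
// Hypothetical "likes" documents: { user: <ObjectId>, post: <ObjectId> }.
// Assumed index spec that would also prevent duplicate likes when created as unique.
const likesIndex = { user: 1, post: 1 };

// All posts one user has favorited.
function favoritesOfUserPipeline(userId) {
  return [
    { $match: { user: userId } },                          // only this user's likes
    { $group: { _id: "$user", posts: { $push: "$post" } } }, // collect the post ids
    // An array localField is matched element by element against foreignField.
    { $lookup: { from: "posts", localField: "posts", foreignField: "_id", as: "posts" } },
  ];
}

// All users who have favorited one post (the same shape, mirrored).
function fansOfPostPipeline(postId) {
  return [
    { $match: { post: postId } },
    { $group: { _id: "$post", users: { $push: "$user" } } },
    { $lookup: { from: "users", localField: "users", foreignField: "_id", as: "users" } },
  ];
}
```

With a hypothetical Mongoose model named Like, this would be run as Like.aggregate(favoritesOfUserPipeline(userId)). Note that aggregation pipelines are not cast by Mongoose, so userId would need to be an ObjectId already, not a string.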

Twitter-like network using graph theory and MongoDB

My application consists of users posting content and interacting with each other. It's a Twitter-like app with a news feed populated by the people they follow.
I would like to display the posts from the direct followers, and optionally going deeper (followers of followers, etc...).
What kind of implementation, using MongoDB, would you suggest?
It seems like a good candidate for graphs, using the $graphLookup aggregation stage. I thought about using 3 collections:
users: containing each user's name, phone number, etc.
relationships: this collection would contain every user's relationships
posts: containing posts
The idea is to have one collection modelling relationships (relationships), and use it to select any other content from a person's network. In the example it's about posts, but some more content could be added in the future (coming from other collections). The goal is to keep separated the relationships from the other collections.
I'm looking for an efficient way to do this kind of query using MongoDB.
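A minimal sketch of how the three-collection idea might look with $graphLookup, under assumptions the question does not state: each relationships document is taken to be { follower: <userId>, followee: <userId> }, posts are taken to have "author" and "createdAt" fields, and the pipeline is run on the users collection. The functions only build the pipeline and the follow-up filter, so nothing here touches a database.

```javascript
// Walk the follow graph starting from one user.
// extraHops = 0 returns only direct followees; 1 adds followees of followees, and so on.
function networkPipeline(userId, extraHops = 0) {
  return [
    { $match: { _id: userId } },
    {
      $graphLookup: {
        from: "relationships",        // assumed collection name from the question
        startWith: "$_id",            // begin with edges where follower == this user
        connectFromField: "followee", // then continue from each followee...
        connectToField: "follower",   // ...to the edges where they are the follower
        as: "network",
        maxDepth: extraHops,
        depthField: "depth",          // 0 for direct edges, 1 for the next hop, ...
      },
    },
    // Keep just the ids of everyone reached (may contain duplicates).
    { $project: { followeeIds: "$network.followee" } },
  ];
}

// Second step: fetch content from any collection that stores an author id.
function postsByAuthorsFilter(followeeIds) {
  return { author: { $in: followeeIds } };
}
```

Keeping the graph traversal separate from the content query matches the goal of isolating relationships from the other collections: the same list of ids could be fed to a filter on posts today and on some other collection later.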

Posts, tags and mongoose in Express JS - or any other language for that matter

I'm trying to figure out the best approach for tagging posts in my Express application. There are two types of post, say 'Phones' and 'Tablets'. They can both share tags but require different Models to access them (this won't change).
I opened up WordPress to see how it handles tags, but there is a lot of duplicated data in the DB and I don't feel this is right for my application.
Should I store tags as a String with a delimiter and query it within the post? Or should I create a new table for those tags that has a post ID associated with the list of tags so that when I search I only have to search that given table, rather than two different ones?
Thanks
As long as the document will not exceed 16MB, I would keep tags inside the document as an array field.
Then I would create an index on the tags field, to have an easy way to find documents containing a specific tag (MongoDB will index every array entry and provide fast search).
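As a rough illustration of the array-plus-index approach, the sketch below shows an index spec on the tags field and the filter shapes that would use it. The helper names and the tag normalization step are assumptions added for the example; the answer itself only suggests the array field and the index.

```javascript
// Index spec for the tags array; MongoDB builds a multikey index over array fields.
// With two models ('Phones' and 'Tablets'), each collection would need its own index.
const tagsIndex = { tags: 1 };

// Hypothetical helper: trim, lowercase and de-duplicate tags before saving.
function normalizeTags(rawTags) {
  const cleaned = rawTags
    .map((tag) => tag.trim().toLowerCase())
    .filter((tag) => tag.length > 0);
  return [...new Set(cleaned)];
}

// Documents whose tags array contains this tag.
function withTag(tag) {
  return { tags: tag };
}

// Documents that carry at least one of the given tags.
function withAnyTag(tags) {
  return { tags: { $in: tags } };
}

// Documents that carry every one of the given tags.
function withAllTags(tags) {
  return { tags: { $all: tags } };
}
```

Because the filters are plain objects, the same one can be passed to find() on both models, so a search across both post types is two indexed queries with an identical filter.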

How to model a "checkin" style object in MongoDB?

I am making a new application in MongoDB, and I have found that the Document-oriented modeling style fits all of my models quite well.
However, my one stumbling block is a "CheckIn" style action. Users can check in at a location, and I need to store the following for each check in:
User ID
Place ID
Date of checkin
Now normally I'd just store this under the User document as an embed, but I frequently will want to ask the following questions:
What are all the places where a user has checked in?
What are all checkins that have happened at a certain place?
All checkins for a given user-place combo?
All checkins for a user or place in a specific time frame?
In a relational database this screams has-many through, but in Mongo that's not such an obvious relation. Should I just make Checkin a top-level object and take the performance hit of the join-style query? I might also need to add fields to the checkin object over time, so I want to keep the solution flexible.
Yes. If you embed checkins as an array within the user document, then the query "10 most recent checkins for a place" will be nearly impossible. Same if you embed in place, "10 most recent checkins for user" will be very hard. So make checkins its own collection.
If you index both userid and placeid in the checkins collection your queries should be fast enough. For example, to find user Jesse's most recent checkins, look up users by name to find Jesse's _id, and query checkins for that userid. It's just two queries. Same for a place's most recent checkins.
If you query the most recent checkins for a place and want the users' names, you can first query the checkins collection to get the list of userids, and use an $in query to get all the user documents. Again, it's just two queries, and both are fully indexed if you create the proper indexes.
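The separate checkins collection and the two-query pattern could be sketched as follows. The field names (userId, placeId, createdAt) and the choice of compound indexes that include the date are assumptions for illustration; the answer only says to index userid and placeid.

```javascript
// Assumed compound indexes: filter by user or place, newest checkins first.
const checkinIndexes = [
  { userId: 1, createdAt: -1 },
  { placeId: 1, createdAt: -1 },
];

// Build a filter covering the four questions: by user, by place,
// by user-place combo, and any of those within a time frame.
function checkinFilter({ userId, placeId, from, to } = {}) {
  const filter = {};
  if (userId !== undefined) filter.userId = userId;
  if (placeId !== undefined) filter.placeId = placeId;
  if (from !== undefined || to !== undefined) {
    filter.createdAt = {};
    if (from !== undefined) filter.createdAt.$gte = from; // inclusive start
    if (to !== undefined) filter.createdAt.$lt = to;      // exclusive end
  }
  return filter;
}

// Second query of the pattern: load the users behind a list of checkins with $in.
function usersForCheckinsFilter(checkins) {
  const ids = [...new Set(checkins.map((checkin) => checkin.userId))];
  return { _id: { $in: ids } };
}
```

With the Node.js driver, the "10 most recent checkins for a place" query would then look something like db.collection("checkins").find(checkinFilter({ placeId })).sort({ createdAt: -1 }).limit(10).toArray(), followed by a find on users with usersForCheckinsFilter(result). New fields can be added to checkin documents later without changing either helper.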