MongoDB: making my schema structure less relational

I've been chugging along building my mongodb but I just realised I haven't actually taken the time to think of the best strategy for that and I might be misusing the nosql structure and building a more traditional relational structure. I'm building a community forum and so far the collections I have are as follows:
User - stores all user settings/data such as email, name, password, date joined, email/notification preferences, etc..
Profile - stores handle, gender, user location, forum rank, interests and then several arrays containing id's of things like an array of follower id's, array of post ids, array of upload id's, array of club id's, array of posts the user has liked, etc.
Posts - stores comment data, creator user id, category and then has an array of id's to uploaded files and an array of user id's for likes.
Uploads - GridFS schema to use when uploading files
What I'm now realising is that all these arrays of id's of things in other collections are behaving a lot more like a relational db, especially the Profile schema, which is basically just a collection of id's to other collections. Can you give any advice on the type of db I'm creating and how to improve it? For example, should I have a single User schema which contains all posts and profile data directly inside it rather than storing id's to a separate schema in a different collection? As I'm using this project to learn, I'd really like to continue using mongodb rather than moving to something like MySQL.

As Akrion alluded to in the comment above, the first step was to combine the User and Profile schemas. The other thing that I wasn't thinking about before was what information gets called together. The new UserProfile schema shouldn't contain things like comments/posts/uploads/likes, as those things will be better off existing as part of the Post document. My problem was thinking that having a record of those things in my UserProfile would optimise the retrieval of those items when I needed them, but I just ended up bloating my UserProfile schema. Searching the posts collection by creator is no different from searching it by an ID taken from the UserProfile document. The Upload documents are now no longer GridFS schemas but are instead records of files in the file system, but that is less relevant to my original question.
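To illustrate the point about looking posts up by creator, here is a minimal sketch in plain JavaScript that models the two collections as in-memory arrays. The field names (`creator`, `handle`, `rank`, `text`) and the helper `findPostsByCreator` are assumptions made for illustration, not the actual schema; against a real MongoDB collection the equivalent would be a `find` on the posts collection filtered by the creator field, ideally with an index on that field.

```javascript
// In-memory stand-ins for the two collections (illustrative data only).
// Note that the profile documents carry no arrays of post ids.
const userProfiles = [
  { _id: "u1", handle: "alice", rank: "member" },
  { _id: "u2", handle: "bob", rank: "moderator" },
];

const posts = [
  { _id: "p1", creator: "u1", text: "Hello forum", likes: ["u2"] },
  { _id: "p2", creator: "u2", text: "Welcome!", likes: [] },
  { _id: "p3", creator: "u1", text: "A second post", likes: ["u2"] },
];

// Hypothetical helper: find a user's posts by filtering on the creator
// field instead of following ids stored on the profile.
function findPostsByCreator(allPosts, userId) {
  return allPosts.filter((post) => post.creator === userId);
}

const alicePosts = findPostsByCreator(posts, userProfiles[0]._id);
console.log(alicePosts.map((post) => post._id)); // [ 'p1', 'p3' ]
```

The relationship is stored once, on the post, so the profile document stays small no matter how many posts a user writes.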

Related

Database design for queries that has tons of sql-like join

I have a collection named posts consisting of multiple article posts and a collection called users which has a lot of user info. Each post has a field called author that references the post author in the users collection.
On my home page I will query the posts collection and return a list of posts to the client. Since I also want to display the author of the post I need to do sql-like join commands so that all the posts will have author names, ids,...etc.
If I return a list of 40 posts I'd have to do 40 SQL-like joins, which means that each time I will run 41 queries to get a list of posts with author info. That just seems really expensive.
I am thinking to store the author info at the time I am storing the post info. This way I only need to do 1 query to retrieve all posts and author info. However when the user info changes (such as name changes) the list will be outdated and it seems not quite easy to manage lists like this.
So is there a better or standard approach to this?
p.s: I am using mongodb
Mongo is a NoSQL DB. NoSQL solutions generally favour denormalized data (all the data a query needs is stored in the same place).
In your example, the relationship between authors and posts is one-to-many, and the number of authors will be very small compared to the number of posts.
Based on this, you can safely store author info in the posts collection.
If you know that most of your queries will be executed on the posts collection, then it makes sense to store the author in posts. It won't take much space to store one attribute, but it will make a big difference to query performance and to how easy the data is to code against and retrieve.
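As a rough sketch of this approach, and of the rename problem raised in the question, the plain JavaScript below keeps a small copy of the author inside each post and refreshes those copies when the user changes their name. The collections are modelled as in-memory arrays, and the field names and the helpers `listPostsWithAuthors` and `renameUser` are assumptions for illustration. In MongoDB itself the refresh step would typically be a single multi-document update (for example `updateMany`) filtered on the embedded author id.

```javascript
// In-memory stand-ins for the users and posts collections (sample data).
const users = [{ _id: "u1", name: "Marc" }];

const posts = [
  { _id: "p1", title: "First", author: { _id: "u1", name: "Marc" } },
  { _id: "p2", title: "Second", author: { _id: "u1", name: "Marc" } },
];

// One pass over posts is enough to render the list: no per-post lookup.
function listPostsWithAuthors(allPosts) {
  return allPosts.map((post) => `${post.title} by ${post.author.name}`);
}

// Hypothetical helper: when a user is renamed, refresh every embedded copy.
// Returns the number of posts that were touched.
function renameUser(allUsers, allPosts, userId, newName) {
  const user = allUsers.find((u) => u._id === userId);
  if (!user) return 0;
  user.name = newName;
  let updated = 0;
  for (const post of allPosts) {
    if (post.author._id === userId) {
      post.author.name = newName;
      updated += 1;
    }
  }
  return updated;
}

renameUser(users, posts, "u1", "Marcus");
console.log(listPostsWithAuthors(posts)); // [ 'First by Marcus', 'Second by Marcus' ]
```

The trade-off is that reads become a single query while a rename has to touch every post by that author, which is acceptable when names change far less often than posts are listed.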

Mongodb - most efficient way to structure my db?

I'm a bit new to mongodb and I'm trying to setup a simple server where I will have users, posts, comments, like and dislikes, among some things. What I'm wondering is which way this should be setup most efficiently?
Should I have one table for likes where I add userId and postId (more or less same for the dislike and comments table)
Or would it be better if likes, dislikes and comments are parts of the post? Like:
// Post structure
{
  "_id": "kljflskds",
  "field1": "content",
  "field2": "content",
  "likes": [userId, userId, userId],
  "dislikes": [userId, userId, userId],
  "comments": [{comment object}, {comment object}, {comment object}]
}
Because for each post when I retrieve them I would like to know how many likes it has, how many dislikes and how many comments. With the first version I would either need to run multiple queries on the server (unnecessary processor power?) or on the phone (unnecessary bandwidth). But the second would only need one query. I believe the second option, with comments as a part of the posts, seems more efficient, but I'm not a pro so I'd like to hear what other people think of this.
As has been pointed out, there are no tables in a document-oriented database. What you'll also find is that unlike a relational database where there is often a 'right way' to structure the database, the same is not true with MongoDB. Your schema should be structured based on how you're going to access the information most regularly. Documents are extremely flexible, unlike rows in tables.
You could create a comments collection or have them directly in the post documents. Two considerations would be: 1. Will you need to access the comments without accessing the post? and 2. Are your documents going to get too big and unwieldy?
In both of these cases with your blog, it would most likely be better to nest the comments, as most of your traffic will be searching for posts and you'll be pulling all of the comments related to the post. Also, a comment will not be owned by multiple posts; besides, MongoDB isn't meant to be normalized like a relational database, so having duplicate information in multiple documents (e.g. tag names, city names, etc.) is normal.
Also, having a collection for likes is a very 'relational' way of thinking. In MongoDB, I can't think of a use case where you'd want a likes collection. When you're coming from the relational world, you really have to step back and rethink how you're creating your database because you'll be constantly fighting it otherwise.
With only two collections, posts and users, getting the information that you're looking for would be trivial, as you can just get the count of the likes and comments and they're all right there.
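A small sketch of what "they're all right there" means in practice: with likes, dislikes and comments embedded, the counts fall out of one document. This is plain JavaScript over a sample object; the comment fields and the helper `summarize` are assumptions for illustration.

```javascript
// A post document with likes, dislikes and comments embedded (sample data).
const post = {
  _id: "kljflskds",
  field1: "content",
  field2: "content",
  likes: ["u1", "u2", "u3"],
  dislikes: ["u4"],
  comments: [
    { author: "u2", text: "Nice post" },
    { author: "u3", text: "Agreed" },
  ],
};

// Hypothetical helper: derive the counters from the arrays in the document.
function summarize(doc) {
  return {
    likes: doc.likes.length,
    dislikes: doc.dislikes.length,
    comments: doc.comments.length,
  };
}

console.log(summarize(post)); // { likes: 3, dislikes: 1, comments: 2 }
```

One fetch of the post is enough for the client to show all three numbers, which is the single-query behaviour the question was after.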

What is the correct noSQL collection structure for the following case?

As someone who got used to thinking in relational terms, I am trying to get a grasp of thinking in the "noSQL way".
Assume the following scenario:
We have a blog (eg. 9gag.com) with many posts and registered users. Every post can be liked by each user. We would like to build a recommendation engine, so we need to track:
all posts viewed by a user
all posts liked by a user
Posts have: title, body, category. Users have: username, password, email, other data.
In a relational DB we would have something like: posts, users, posts_users_views (post_id, users_id, view_date), posts_users_likes (post_id, user_id, like_date).
Question
What would the "right" structure be in a document/column oriented noSQL database?
Clarification: Should we save an array of all viewed/liked post ids in users (or user ids in posts)? If so, won't we have a problem with a row size getting huge?
In CouchDB you could have separate documents for the user, post, view and like. Showing the views/likes by user can be arranged with a "view" (a materialized map/reduce query) whose map function emits an array key [user_id, post_id]. As a result you get a sorted dictionary (ordered lexicographically by the key), so fetching all the views for user='ID' is a query over the keys from [ID] to [ID,{}]. You can optimize it, but the basic solution is very simple.
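The key-range idea can be sketched without a CouchDB server. The plain JavaScript below is only a simulation: the document fields (`type`, `user_id`, `post_id`) are assumptions, `emit` is passed in as a parameter here whereas CouchDB provides it to the map function itself, and `compareKeys` is a simplified stand-in for CouchDB's collation that handles only arrays of strings plus the `{}` upper bound.

```javascript
// Sample documents; the field names are assumptions for illustration.
const docs = [
  { _id: "v1", type: "view", user_id: "u2", post_id: "p1" },
  { _id: "v2", type: "view", user_id: "u1", post_id: "p3" },
  { _id: "v3", type: "view", user_id: "u1", post_id: "p1" },
  { _id: "l1", type: "like", user_id: "u1", post_id: "p1" },
];

// Shaped like a CouchDB map function: emits [user_id, post_id] per view.
function mapViews(doc, emit) {
  if (doc.type === "view") {
    emit([doc.user_id, doc.post_id], null);
  }
}

// Simplified collation: an object such as {} sorts after any string.
function compareElements(a, b) {
  const aIsObject = typeof a === "object";
  const bIsObject = typeof b === "object";
  if (aIsObject || bIsObject) return Number(aIsObject) - Number(bIsObject);
  return a < b ? -1 : a > b ? 1 : 0;
}

// Arrays compare element by element; a shorter prefix sorts first.
function compareKeys(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i += 1) {
    const order = compareElements(a[i], b[i]);
    if (order !== 0) return order;
  }
  return a.length - b.length;
}

// Build the "materialized view": run the map function, then sort by key.
function buildView(allDocs, mapFn) {
  const rows = [];
  for (const doc of allDocs) {
    mapFn(doc, (key, value) => rows.push({ key, value, id: doc._id }));
  }
  return rows.sort((x, y) => compareKeys(x.key, y.key));
}

// Range query: every row with startkey <= key <= endkey.
function queryRange(rows, startkey, endkey) {
  return rows.filter(
    (row) => compareKeys(row.key, startkey) >= 0 && compareKeys(row.key, endkey) <= 0
  );
}

const view = buildView(docs, mapViews);
const viewsByU1 = queryRange(view, ["u1"], ["u1", {}]);
console.log(JSON.stringify(viewsByU1.map((row) => row.key))); // [["u1","p1"],["u1","p3"]]
```

Because the rows are kept sorted by key, all of one user's views sit next to each other, so the range from `[ID]` to `[ID, {}]` selects exactly that user's rows.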
In the CouchDB wiki there is a comment on using a relationally modeled design and the view collation mechanism (which can substitute for some simple joins). To get some intuition, I would rather advise studying the problem of posts and comments, which is also very simple, but not quite as trivial as views and likes :)
There may be no NoSQL way, but I think most of the map/reduce systems share a similar type of thinking. CouchDB is a good tool to start with, because it is very limited :) It is difficult to write queries that are inefficient in a distributed environment, and its map and reduce query functions cannot have side effects (they generate the materialized view incrementally when the document set changes, and the result should not depend on the order of document updates).

Sharing a document with users

I have to choose a database for implementing a sharing system.
My system will have users and documents. I have to share a document with a few users.
Example:
There are 2 users, and there is one document.
So if I have to share that one document with both the users, I could do these possible solutions:
The current method I'm using is with MySQL (I don't want to use this):
Relational Databases (MySQL)
Users Table = user1, user2
Docs Table = doc1
Docs-User Relation Table = (doc1, user1), (doc1, user2)
And I would like to use something like this:
NoSQL Document Stores (MongoDB)
Users Documents:
{
  _id: "user1",
  docs_i_have_access_to: ["doc1"]
}
{
  _id: "user2",
  docs_i_have_access_to: ["doc1"]
}
Document's Document:
{
  _id: "doc1",
  members_of_this_doc: ["user1", "user2"]
}
And I don't yet know how I would implement this in a key-value store like Redis.
So I just wanted to know, would the MongoDB way I have given above be the best solution?
And is there any other way I could implement this? Maybe with another database solution?
Should I try to implement it with Redis or not?
Which database and which method would be best for sharing the data, and why?
Note: I want something highly scalable and persistent. :D
Thanks. :D
Actually, you need to represent a many-to-many relationship. One user can have several documents. One document can be shared among several users.
See my previous answer to this question: how to have relations many to many in redis
With Redis, representing relationship with the set datatype is a pretty common pattern. You can expect to get better performance than with MongoDB for this kind of data model. And as a bonus, you can easily and efficiently find which users have a given list of documents in common, or which documents are shared by a given set of users.
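The set-based pattern can be sketched in plain JavaScript with a `Map` of `Set` objects standing in for Redis. The functions `sadd` and `sinter` only mimic the behaviour of the Redis commands SADD and SINTER in memory; the key naming scheme (`user:<id>:docs`, `doc:<id>:users`) and the helper `share` are assumptions for illustration, not something the answer prescribes.

```javascript
// Tiny in-memory stand-in for Redis sets: key -> Set of members.
const store = new Map();

// Mimics SADD: add a member to the set stored at key.
function sadd(key, member) {
  if (!store.has(key)) store.set(key, new Set());
  store.get(key).add(member);
}

// Mimics SINTER: members present in every listed set (sorted for display).
function sinter(...keys) {
  const sets = keys.map((key) => store.get(key) || new Set());
  if (sets.length === 0) return [];
  const [first, ...rest] = sets;
  return [...first]
    .filter((member) => rest.every((set) => set.has(member)))
    .sort();
}

// Hypothetical helper: record the many-to-many link in both directions.
function share(docId, userId) {
  sadd(`doc:${docId}:users`, userId);
  sadd(`user:${userId}:docs`, docId);
}

share("doc1", "user1");
share("doc1", "user2");
share("doc2", "user2");

// Documents that user1 and user2 have in common.
console.log(sinter("user:user1:docs", "user:user2:docs")); // [ 'doc1' ]
// Users that have access to both doc1 and doc2.
console.log(sinter("doc:doc1:users", "doc:doc2:users")); // [ 'user2' ]
```

Keeping one set per user and one set per document is what makes both directions of the question ("which documents do these users share?" and "which users share these documents?") a single intersection.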
Considering only this simple example (you just need to keep track of who owns what), SQL seems to be the most appropriate, as it gives you additional options for free, such as reporting who has how many docs, the most popular documents, the most active user, etc., at almost zero cost. The data will also be more consistent (no duplication, possibly foreign keys). This holds unless you have millions of documents, of course.
If I were choosing between a document-oriented and a relational DB, I'd base the decision mostly on the structure of the documents themselves: whether they're all uniform or may have different fields for different types, and whether you need nested sub-documents or arrays with the ability to search by their contents.

Storing secondary documents in MongoDB doc

Say User2 has these objects of data, they are stored in User2.objects as an array of objects.
User2:{objects:[{object1},{object2},{object3}]}
If anyone other than User2 needs to query this data, for example if User1 needs any of those objects that pertain to them, then it should be broken out into its own collection in the MongoDB database, right?
Because say User1 wants to find every object that they are part of. They would need to have a reference to all the User2s they created data with, then look through every object in each user and return the ones they need.
Should I break those objects out into their own collection? Then I can just index user ids and each user can just query once for their own id.
Sorry if this Q is confusing I'm a little lost.
It appears as though there may be some confusion between Mongo Document structure and using authentication with MongoDB.
The documentation on how to set up user authentication for a Mongo Database is here:
http://www.mongodb.org/display/DOCS/Security+and+Authentication
If User2 needs to run a query on a collection that was created by User1, then User2 must have an account with the Database where that collection resides, and must be properly authenticated.
The example document provided is also a little confusing. It is a better idea to use key names that will be the same across all documents. For example:
{userName:"user1", name:"Marc"},
{userName:"user2", name:"Jeff"},
{userName:"user3", name:"Steve"}
is preferable to
{user1:"Marc"},
{user2:"Jeff"},
{user3:"Steve"}
In the second example, the username (user1, user2, etc.) has to be known in advance in order to find out the name of the user, because it is used as the key. MongoDB queries cannot use wildcards in field names.
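The difference between the two layouts above can be seen in a small plain-JavaScript sketch over in-memory arrays. The helpers `findByField` and `findByAnyValue` are hypothetical names used only for illustration.

```javascript
// The two layouts from the example above.
const consistent = [
  { userName: "user1", name: "Marc" },
  { userName: "user2", name: "Jeff" },
  { userName: "user3", name: "Steve" },
];
const inconsistent = [{ user1: "Marc" }, { user2: "Jeff" }, { user3: "Steve" }];

// With shared key names, one filter on a known field works for every document.
function findByField(docs, field, value) {
  return docs.filter((doc) => doc[field] === value);
}

// With per-user key names there is no field to filter on, so the only
// generic option is to inspect every value of every document.
function findByAnyValue(docs, value) {
  return docs.filter((doc) => Object.values(doc).includes(value));
}

console.log(findByField(consistent, "name", "Jeff")); // [ { userName: 'user2', name: 'Jeff' } ]
console.log(findByField(inconsistent, "name", "Jeff")); // []
console.log(findByAnyValue(inconsistent, "Jeff")); // [ { user2: 'Jeff' } ]
```

A filter on a known field is the kind of lookup a database index can serve, while the scan over every value is not, which is the practical reason to keep key names the same across documents.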
The following document structure would be preferable:
{
user: "User2",
objects:[object1,object2,object3]
},
{
user: "User1",
objects:[object1,object2,object3]
}
All of the objects created by User1 could be retrieved with the following query:
> db.<your collection name>.find({user: "User1"}, {objects:1})
For more information on the MongoDB document structure, I recommend reading the following:
http://www.mongodb.org/display/DOCS/Schema+Design - A great introduction to the way data is stored in MongoDB, including example documents, best practices, and an introduction to indexing.
Hopefully the above will put you on the right track in terms of deciding on a schema for your collection and creating users and setting permissions. Authentication is one of MongoDB's more advanced features, so I would begin by focusing on building an efficient schema and organizing your data correctly before worrying about authentication.
If you have any additional questions about these topics, or anything else MongoDB-related, the Community is here to help! Good Luck!