I'm approaching the NoSQL world.
I have studied a bit around the web (not the best way to study!) and I have read the MongoDB documentation.
I wasn't able to find a real-case example anywhere (only fancy flights over big architectures that are not well explained, or examples too basic to be real-world).
So I still have some huge holes in my understanding of NoSQL and MongoDB.
Let me summarise one of them, the worst one actually, below.
Let's imagine the data structure for a post in a simple blog:
{
    "_id": ObjectId(),
    "title": "Title here",
    "body": "text of the post here",
    "date": ISODate("2010-09-24"),
    "author": "author_of_the_post_name",
    "comments": [
        {
            "author": "comment_author_name",
            "text": "comment text",
            "date": ISODate("date")
        },
        {
            "author": "comment_author_name2",
            "text": "comment text",
            "date": ISODate("date")
        },
        ...
    ]
}
So far so good.
All works fine as long as the author of the post never changes his name (leaving aside profile picture and description).
The same goes for all the comment authors.
So if I want to handle that situation I have to use references:
"authorID": <author_of_the_post_id>,
for the post's author, and
"authorID": <comment_author_id>,
for the comment authors.
But MongoDB does not allow joins when querying, so there will be a separate query for each authorID.
So what happens if I have 100 comments on my blog post?
1 query for the post
1 query to retrieve the post author's information
100 queries to retrieve the comment authors' information
**total of 102 queries!!!**
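In mongo-shell terms, the naive pattern I have in mind would look something like this (the posts and users collection names are just for illustration):

var post = db.posts.findOne({ _id: postId });           // 1 query for the post
var author = db.users.findOne({ _id: post.authorID }); // 1 query for its author
post.comments.forEach(function (comment) {             // 1 query per comment author
    comment.author = db.users.findOne({ _id: comment.authorID });
});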
Am I right?
Where is the advantage of using a noSQL here?
In my understanding that's 102 queries vs. one bigger query using joins.
Or am I missing something and there is a different way to model this situation?
Thanks for your contribution!
Have you seen this?
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
It sounds like what you are doing is NOT a good use case for NoSQL: use a relational database for the basic data storage backing the application, and use NoSQL for caching and the like.
NoSQL databases are used for storing non-sensitive data, for instance posts and comments.
You are able to retrieve all the data with one query. For example, you may simply not care about outdated fields such as author_name or profile_picture_url, because it's just a post and it will soon be buried under newer ones anyway. But if you do want those fields kept up to date, you have two options:
The first option is to use some kind of worker service: when a user changes his username or profile picture, you signal the service to traverse all his posts and comments and update those fields with the new values.
The second option is to store authorId instead of the author's name. Naively that turns 2 queries into N + 2 queries (one extra lookup per comment author), but you should paginate anyway: instead of querying for 100 comments, take 10 and show a "load more" button/link, so you make 12 queries at most. The per-author lookups can also be batched into a single $in query, as sketched below.
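A minimal mongo-shell sketch of that second option with batching (collection names are assumptions):

var post = db.posts.findOne({ _id: postId });             // 1 query for the post
var page = post.comments.slice(0, 10);                    // paginate: first 10 comments
var ids = page.map(function (c) { return c.authorId; });
// one batched lookup for all the comment authors, instead of one query each
var authors = db.users.find({ _id: { $in: ids } }).toArray();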
Hope this helps.
Related
I am more used to MySQL, but I decided to go with MongoDB for this project.
Basically it's a social network.
I have a posts collection where documents currently look like this:
{
    "text": "Some post...",
    "user": "3j219dj21h18skd2" // User's "_id"
}
I am looking to implement a replies system. Will it be better to simply embed an array of replies, like so:
{
    "text": "Some post...",
    "user": "3j219dj21h18skd2", // User's "_id"
    "replies": [
        {
            "user": "3j219dj200928smd81",
            "text": "Nice one!"
        },
        {
            "user": "3j219dj2321md81zb3",
            "text": "Wow, this is amazing!"
        }
    ]
}
Or will it be better to have a whole separate "replies" collection with a unique ID for each reply, and then "link" to it by ID in the posts collection?
I am not sure, but it feels like the 1st way is more "NoSQL-like", and the 2nd way is how I would do it in MySQL.
Any inputs are welcome.
This is a typical data modeling question in MongoDB. Since you are planning to store just the _id of the user, the answer is definitely to embed the replies, because they are part of the post object.
If those replies can number in the hundreds or thousands and you are not going to show them by default (for example, you are going to have the users click to load those comments) then it would make more sense to store the replies in a separate collection.
Finally, if you need to store more than the user _id (such as the name) you have to think about maintaining the name in two places (here and in the user maintenance page) as you are duplicating data. This can be manageable or too much work. You have to decide.
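For what it's worth, a short mongo-shell sketch of both options (collection and variable names are assumptions, not from the question):

// Embedded: append a reply directly to the post document
db.posts.updateOne(
    { _id: postId },
    { $push: { replies: { user: userId, text: "Nice one!" } } }
);

// Referenced: a separate replies collection, fetched (and paginated) by post id
db.replies.insertOne({ post: postId, user: userId, text: "Nice one!" });
db.replies.find({ post: postId }).limit(10);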
I'm quite new to the NoSQL world.
If I have a very simple web app with users authenticating and publishing posts, what's the MongoDB (NoSQL) way to store the users and posts in the database?
Do I have to store users and posts each in their own collection (as in relational databases)? Or store them in the same collection, as different documents? Or, finally, with redundant user info (credentials) duplicated on each post the user has published?
A way you could do it is to use two collections, a posts collection and an authors collection. They could look like the following:
Posts
{
    title: "Post title",
    body: "Content of the post",
    author: "author_id",
    date: "...",
    comments: [
        {
            name: "name of the commenter",
            email: "...",
            comment: "..."
        }
    ],
    tags: [
        "tag1", "tag2", "tag3"
    ]
}
Authors
{
    "_id": "author_id",
    "password": "..."
}
Of course, you could put everything in a single collection, but #jcrade mentioned a reason why you would/should use two collections. Remember, that's NoSQL: you should design your database from an application point of view, which means asking yourself what data is consumed, and how.
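With two collections the application performs the join itself; a minimal mongo-shell sketch using the schemas above:

var post = db.posts.findOne({ title: "Post title" });   // first query
var author = db.authors.findOne({ _id: post.author });  // second query, following the reference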
This post says it all:
https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1
It really depends on your application and on how many posts you expect your users to have: if it's a one-to-few relationship, then embedding the documents (inside your users model) is probably the way to go. If it's one-to-many (up to a couple of thousand), then just embed an array of IDs in your users model. If it's more than that, then use the answer provided by Horizon_Net.
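A rough sketch of those three shapes (all field values are purely illustrative):

// one-to-few: embed the posts in the user document
{ "_id": 1, "name": "joe", "posts": [ { "title": "first" }, { "title": "second" } ] }

// one-to-many: embed an array of post ids in the user document
{ "_id": 1, "name": "joe", "posts": [ ObjectId("..."), ObjectId("...") ] }

// more than that: reference the user from each post instead
{ "_id": ObjectId("..."), "title": "first", "user": 1 }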
Read the post, and you get a pretty good idea of what you will have to do. Good luck!
When you are modeling a NoSQL database you should think about three basic ideas:
Denormalization
Copy the same data into multiple documents in order to simplify/optimize query processing, or to fit the user's data into a particular data model.
Aggregation
Embed related data into one document, for example a blog post and its comments. This impacts updates in both performance and consistency, because MongoDB only guarantees atomic operations on a single document at a time.
Application-level joins
Perform joins in application code when it is not a good idea to aggregate the information (for example, embedding a copy of the author in each post document would be really bad, because many documents would need access to the same resource).
To answer your question:
Create two documents: one blogPost with all the comments and tags on it plus the user id, and a second one, User, with all the user information.
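If you also denormalize the commenter's name next to its id for fast reads, a worker can repair the copies when it changes; a sketch with assumed names (arrayFilters requires MongoDB 3.6+):

// a comment carries both the reference and a copy of the name:
// { "authorId": 12, "authorName": "old_name", "text": "..." }

// when user 12 renames, fan the change out to every copy
db.blogPosts.updateMany(
    { "comments.authorId": 12 },
    { $set: { "comments.$[c].authorName": "new_name" } },
    { arrayFilters: [ { "c.authorId": 12 } ] }
);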
I have the following schema for posts. Each post has an embedded author and attachments (array of links / videos / photos etc).
{
    "content": "Pixable tempts Everpix users with quick-import tool for photos ahead of December 15 closure http:\/\/t.co\/tbsSrVYneK by #psawers",
    "author": {
        "username": "TheNextWeb",
        "id": "10876852",
        "name": "The Next Web",
        "photo": "https:\/\/pbs.twimg.com\/profile_images\/378800000147133877\/895fa7d3daeed8d32b7c089d9b3e976e_bigger.png",
        "url": "https:\/\/twitter.com\/account\/redirect_by_id?id=10876852",
        "description": "",
        "serviceName": "twitter"
    },
    "attachments": [
        {
            "title": "Pixable tempts Everpix users with quick-import tool for photos ahead of December 15 closure",
            "description": "Pixable, the SingTel-owned company that organizes your social photos in smart ways, has announced a quick-import tool for Everpix users following the company's decision to close ...",
            "url": "http:\/\/t.co\/tbsSrVYneK",
            "type": "link",
            "photo": "http:\/\/cdn1.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2013\/09\/camera1-.jpg"
        }
    ]
}
Posts are read often (we have a view with 4 tabs, and each tab requires 24 posts). Currently we index these lists in Redis, so querying 4 × 24 posts is as simple as fetching the lists from Redis (each returns a list of Mongo ids) and then querying the posts by those ids.
Updates on the embedded author happen rarely (for example when the author changes his picture). The updates do not have to be instantaneous or even fast.
We're wondering if we should split the author and the post into two different collections, so that a post would hold a reference to its author instead of an embedded / duplicated author. Is a normalized state preferred here, or should we continue with the denormalized state (the author duplicated in every post, resulting in a lot of duplicated data / extra bytes)?
As it seems that you have a few orders of magnitude more reads than writes, it probably makes little sense to split this data into two collections. Especially with so few updates, and since you need almost all of the author information when showing posts, one query is going to be faster than two. You also get data locality, so you would potentially need less data in memory as well, which is another benefit.
However, you can only really find out by benchmarking this with the amount of data that you'd be using in production.
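A sketch of the two access paths under the current layout (variable names are assumptions):

// read path: ids come from the Redis lists, all posts are fetched in one query
var ids = idsFromRedis;  // e.g. the 24 ids for one tab
var posts = db.posts.find({ _id: { $in: ids } }).toArray();

// rare write path: fan a changed profile picture out to every post by that author
db.posts.updateMany(
    { "author.id": "10876852" },
    { $set: { "author.photo": "https://.../new_photo.png" } }
);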
I have a small REST API that is being consumed by a single page web application powered by Backbone.js
There are two resource types that the API provides, and therefore, the Backbone app uses. These are articles and comments. These two resources have different endpoints and there is a link from each of the articles to the location of all the comments for that item.
The problem I'm facing is that on the article list in my web app I would like to display the number of comments for each article. With the current setup that is only possible if I also fetch each comments list, which requires one API request for the initial article list and another one per article just to count its comments. That becomes a problem if, for instance, there are 100 articles: 101 HTTP requests would be necessary to populate one single view.
The solutions I can think of right now are:
1. to include the comments data in the initial articles request, like so:
[
    {
        "id": 1,
        "name": "Article 1",
        ...
        "comments": [
            {
                "id": 1,
                "text": "some comment"
            },
            {
                "id": 2,
                "text": "some comment"
            },
            ...
        ]
    },
    ...
]
The question in this case is: how is it possible to parse "comments" into a separate comments collection and not include it in the article model?
2. to include some metadata inside the articles response like so:
[
    {
        "id": 1,
        "name": "Article 1",
        ...
        "comments": 13
    },
    ...
]
This option raises the question: how should I handle the parsing of the model so that, on one hand, the meta information is available, and on the other hand, "comments" is not an attribute Backbone would try to perform updates on?
I feel there might be another solution, compliant with the REST philosophy, for this that I'm missing, so if you have any other suggestion please let me know.
I think your best bet is to go with your second option, include the number of comments for each article inside your article model.
This option raises the question: how should I handle the parsing of the model so that, on one hand, the meta information is available, and on the other hand, "comments" is not an attribute Backbone would try to perform updates on?
Not sure what your concern is here. Why would you be worried about the comments attribute getting updated?
I can't think of any other "RESTy" way of achieving your desired result.
I would suggest using alternative 2 and have the server return a subset of the article attributes that are deemed useful for applications when dealing with the article collection resource (perhaps reachable at /articles).
The full article member resource, with all its comments (whether or not they are stored in separate tables in the backend), would be available at /articles/:id.
From a Backbone.js point of view you probably want to put the collection resource in, say, an ArticleCollection, which will convert each member (currently with a subset of the attributes) to Article models.
When the user selects to view an article in full, you pull it out of the ArticleCollection and invoke fetch to populate it in full.
Regarding what to do with the extra/virtual attributes that are included in the collection resource (/articles), like the comment count and possibly other useful aggregations, I see a few alternatives:
1. In Article#initialize you can pull those out of the attributes and store them as meta-data on the article. This way the built-in Backbone.Model#toJSON will not see them.
2. Keep them in the attributes section of each model and override Backbone.Model#toJSON to exclude them when "serializing" an Article.
In alternative 1, an Article#commentCount() helper could return this._commentCount || this.get('comments').length to make it work on both partially and fully loaded articles.
For a fully loaded Article you would probably want to convert the nested comments array into a full-blown CommentCollection anyway, and store that in this._comments, so I don't think it is that unusual to have your models store additional stuff directly on the model instance, outside of its attributes hash.
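A minimal Backbone sketch of alternative 1 (the CommentCollection and attribute names are assumptions):

var CommentCollection = Backbone.Collection.extend({});

var Article = Backbone.Model.extend({
    parse: function (response) {
        if (typeof response.comments === 'number') {
            // collection resource: "comments" is just a count -> keep it off the attributes
            this._commentCount = response.comments;
        } else if (response.comments) {
            // member resource: nested comments -> promote them to a real collection
            this._comments = new CommentCollection(response.comments);
        }
        delete response.comments;
        return response;
    },
    commentCount: function () {
        // works for both partially and fully loaded articles
        return this._comments ? this._comments.length : this._commentCount;
    }
});

Remember to pass {parse: true} when constructing Articles from raw server data; since "comments" never lands in the attributes hash, the default toJSON stays clean for saves.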
As a Mongo/NoSQL newbie with an RDBMS background, I wondered how best to proceed.
Currently I've got a large set of documents containing, in some fields, what I consider "reference data".
I need a search interface that summarizes the possible values of those "reference fields", so that the user can then filter the document set with them.
Let's take a very simple and stupid example about nourishment.
Here is an extract of some mongo documents:
{ "_id": 1, "name": "apple", "category": "fruit"}
{ "_id": 1, "name": "orange", "category": "fruit"}
{ "_id": 1, "name": "cucumber", "category": "vegetable"}
In the application I'd like to have a select box displaying all the possible values of "category". Here it would display "fruit" and "vegetable".
What's the best way to proceed?
extract the data from the existing documents?
create some reference documents listing the unique possible values (as I would do in an RDBMS)?
store the reference data in an RDBMS and programmatically link Mongo and the RDBMS?
something else?
The first option is the easiest to implement and should be efficient if you have the proper indexes in place (see the distinct command), so I would go with it.
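For the example documents above that is a single call; a sketch (the collection name food is an assumption):

db.food.createIndex({ category: 1 });   // lets distinct answer straight from the index
db.food.distinct("category");           // -> [ "fruit", "vegetable" ]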
You could also choose the second option (linking to a reference collection, the RDBMS way), which trades performance (you will need more queries to fetch the data) for space (you will need less of it). This option is also preferred if the category is used in other collections as well.
I would advise against using a mixed system (NoSQL + RDBMS) in this case as the other options are better.
You could also store category values directly in application code - depends on your use case. Sometimes it makes sense, although any RDBMS fanatic would burst into tears (or worse) if you tell him that. YMMV. ;)