I have an application where an article can be linked to multiple platforms.
An article contains a list of platforms, and a platform likewise contains a list of articles.
For more detailed information, please look at this Stack Overflow question that I asked a few months ago:
https://stackoverflow.com/a/40377383/5770147
The question was about how to create an article and implement the N-N relationship between articles and platforms.
I have creating and deleting articles set up so that the lists in the platforms update as well.
How do I implement editing an article so I can update which platforms are linked to an article?
For creating and editing the linked platforms I use a dropdown menu in which multiple options can be selected. The necessary code can be found in the question previously linked.
Based on the information that you provided, I would recommend two possible approaches, starting from the same foundation:
Use two collections (articles and platforms) and store only a reference to the platform documents in an array defined on the article documents.
I would recommend this approach if:
You have a high cardinality of both article documents and platform documents
You want to be able to manage both entities independently, while also syncing references between them
// articles collection schema
{
  "_id": ...,
  "title": "I am an article",
  ...
  "platforms": [ "platform_1", "platform_2", "platform_3" ],
  ...
}

// platforms collection schema
{
  "_id": "platform_1",
  "name": "Platform 1",
  "url": "http://right/here",
  ...
},
{
  "_id": "platform_2",
  "name": "Platform 2",
  "url": "http://right/here",
  ...
},
{
  "_id": "platform_3",
  "name": "Platform 3",
  "url": "http://right/here",
  ...
}
Although this approach is quite flexible, it comes at a cost: if you require both article and platform data, you will have to fire more queries at your MongoDB instance, as the data is split across two different collections.
For example, when loading an article page where you also want to display the list of platforms, you would have to query the articles collection first, and then query the platforms collection to retrieve all the platform documents referenced by the members of the platforms array on the article document.
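For illustration, loading one article page under this schema takes two round trips; a minimal shell sketch (someArticleId is a placeholder):

// 1) fetch the article itself
var article = db.articles.findOne({ _id: someArticleId });
// 2) fetch all referenced platforms in a single batch
var platforms = db.platforms.find({ _id: { $in: article.platforms } }).toArray();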
However, if you only have a small subset of frequently accessed platform attributes that you need to have available when loading an article document, you might enhance the platforms array on the articles collection to store those attributes in addition to the _id reference to the platform documents:
// enhanced articles collection schema
{
  "_id": ...,
  "title": "I am an article",
  ...
  "platforms": [
    { "platform_id": "platform_1", "name": "Platform 1" },
    { "platform_id": "platform_2", "name": "Platform 2" },
    { "platform_id": "platform_3", "name": "Platform 3" }
  ],
  ...
}
This hybrid approach would be suitable if the platform attributes that you frequently retrieve to display together with article-specific data do not change that often.
Otherwise, you will have to synchronize every update made to the platform document attributes in the platforms collection with the subset of attributes that you track as part of the platforms array on article documents.
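As a rough sketch of what that synchronization entails under the hybrid schema (arrayFilters requires MongoDB 3.6+; names are placeholders):

// rename the platform in the platforms collection...
db.platforms.updateOne({ _id: "platform_1" }, { $set: { name: "New Name" } });
// ...and propagate the change into every embedded copy on articles
db.articles.updateMany(
  { "platforms.platform_id": "platform_1" },
  { $set: { "platforms.$[p].name": "New Name" } },
  { arrayFilters: [ { "p.platform_id": "platform_1" } ] }
);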
Regarding the management of article lists for individual platforms, I wouldn't recommend storing N-to-N references in both collections, as the aforementioned mechanism already allows you to extract article lists by querying the articles collection using a find query with the _id value of the platform document:
Approach #1:
db.articles.find({ "platforms": "platform_1" });
Approach #2:
db.articles.find({ "platforms.platform_id": "platform_1" });
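If these lookups are frequent, a multikey index on the array keeps them efficient; a sketch, one index per approach:

db.articles.createIndex({ "platforms": 1 });              // approach #1
db.articles.createIndex({ "platforms.platform_id": 1 });  // approach #2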
Having presented two different approaches, what I would recommend now is for you to analyze the query patterns and performance thresholds of your application and make a calculated decision based on the scenarios that you encounter.
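As for the editing part of your question: under approach #1, updating which platforms are linked to an article amounts to replacing the references array with the new selection from the dropdown (a sketch; someArticleId and the platform ids are placeholders):

db.articles.updateOne(
  { _id: someArticleId },
  { $set: { platforms: [ "platform_1", "platform_3" ] } }
);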
When we return each document in our database to be consumed by the client, we must also add a property "isInUse" to that document's response payload, to indicate whether a given document is referenced by other documents.
This is needed because referenced documents cannot be deleted, so a trash-bin button should not be displayed next to its listing entry in the client-side app.
So basically we have relationships where a document can reference another like this:
{
  "_id": "factor:1I9JTM97D",
  "someProp": 1,
  "otherProp": 2,
  "defaultBank": <id of some bank document>
}
Previously we used views and selectors to query for each document's references in other documents; however, this proved to be non-trivial.
So here's how someone on our team has implemented it now: we register all relationships in dedicated "relationship" documents like the one below, and update them every time a document is created/updated/deleted by the server, to reflect any new references or de-references:
{
  "_id": "docInUse:bank",
  "_rev": "7-f30ffb403549a00f63c6425376c99427",
  "items": [
    {
      "id": "bank:1S36U3FDD",
      "usedBy": [
        "factor:1I9JTM97D"
      ]
    },
    {
      "id": "bank:M6FXX6UA5",
      "usedBy": [
        "salesCharge:VDHV2M9I1",
        "salesCharge:7GA3BH32K"
      ]
    }
  ]
}
The question is whether this solution is an anti-pattern and what are the potential drawbacks.
I would say using a single document to record the relationships between all other documents could be problematic because:
the document "docInUse:bank" could end up being updated frequently. Cloudant allows you to update documents, but once you get to many thousands of revisions the document size becomes non-trivial, because all the previous revision tokens are retained
updating a central document invites document conflicts if two processes attempt to update it at the same time. You are allowed to have conflicts, but it is your app's responsibility to manage them
if you have lots of relationships, this document could get very large (I don't know enough about your app to judge)
Another solution is to keep your bank:*, factor:* & salesCharge:* documents the same and create a document per relationship, e.g.:
{
  "_id": "1251251921251251",
  "type": "relationship",
  "doc": "bank:1S36U3FDD",
  "usedby": "factor:1I9JTM97D"
}
You can then find the documents on either side of the "join" by querying on the value of doc or usedby, with a suitable index in place.
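For example, with Cloudant Query (Mango) you could create a JSON index on doc and then select by it; this is a sketch of the general shape, not code from the original answer:

POST /db/_index
{ "index": { "fields": ["doc"] }, "type": "json" }

POST /db/_find
{ "selector": { "type": "relationship", "doc": "bank:1S36U3FDD" } }

An equivalent index on usedby covers lookups from the other side of the join.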
I've also seen implementations, where the document's _id field contains all of the information:
{
  "_id": "bank:1S36U3FDD:factor:1I9JTM97D",
  "added": "2018-02-28 10:24:22"
}
and the primary key helpfully sorts the document IDs for you, allowing judicious use of GET /db/_all_docs?startkey=x&endkey=y to fetch the relationships for a given bank id.
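Concretely, the range query might look like this (a sketch; the quotes must be URL-encoded in a real request, and the high Unicode character \ufff0 serves as an upper bound for the key range):

GET /db/_all_docs?startkey="bank:1S36U3FDD:"&endkey="bank:1S36U3FDD:\ufff0"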
If you need to undo a relationship, just delete the document!
By building a cache of relationships on every document create/update/delete, as you have currently implemented it, you are basically recreating an index manually in the database. This is the reason why I would lean towards calling it an anti-pattern.
One great way to improve your design is to store each relation as a separate document, as Glynn suggested.
If your concern is consistency (which I think might be the case, judging by the document types you mentioned), try to put all the information about a transaction into a single document. You can define the relationships in a consistent place in your documents, so updating the views would not be necessary:
{
  "_id": "salesCharge:VDHV2M9I1",
  "relations": [
    { "type": "bank", "id": "bank:M6FXX6UA5" },
    { "type": "whatever", "id": "whatever:xy" }
  ]
}
Then you can keep your views consistent, and you can rely on CouchDB to keep the "relation cache" up to date.
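For completeness, a minimal sketch of such a view's map function, assuming the relations array above (the view's shape is illustrative):

function (doc) {
  if (doc.relations) {
    doc.relations.forEach(function (rel) {
      // key: the referenced document's id, value: the referencing doc's id
      emit(rel.id, doc._id);
    });
  }
}

Querying this view by key then answers "is this document in use?" without any manually maintained cache document.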
I'm approaching the NoSQL world.
I studied a little around the web (not the best way to study!) and I read the MongoDB documentation.
Around the web I wasn't able to find a real-world example (only fancy tours of big architectures that were not well explained, or examples too basic to be realistic).
So I still have some huge holes in my understanding of NoSQL and MongoDB.
I'll try to summarise one of them, the worst one actually, below:
Let's imagine the data structure for a post of a simple blog structure:
{
  "_id": ObjectId(),
  "title": "Title here",
  "body": "text of the post here",
  "date": ISODate("2010-09-24"),
  "author": "author_of_the_post_name",
  "comments": [
    {
      "author": "comment_author_name",
      "text": "comment text",
      "date": ISODate("date")
    },
    {
      "author": "comment_author_name2",
      "text": "comment text",
      "date": ISODate("date")
    },
    ...
  ]
}
So far so good.
All works fine as long as the author_of_the_post does not change his name (not considering profile picture and description).
The same goes for all comment_authors.
So if I want to handle this situation, I have to use relationships:
"authorID": <author_of_the_post_id>,
for the post's author, and
"authorID": <comment_author_id>,
for comment authors.
But MongoDB does not allow joins when querying. So there will be a different query for each authorID.
So what happens if I have 100 comments on my blog post?
1 query for the post
1 query to retrieve the post author's information
100 queries to retrieve the comment authors' information
**a total of 102 queries!!!**
Am I right?
Where is the advantage of using a noSQL here?
In my understanding 102 queries VS 1 bigger query using joins.
Or am I missing something and there is a different way to model this situation?
Thanks for your contribution!
Have you seen this?
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
It sounds like what you are doing is NOT a good use case for NoSQL. Use a relational database for basic data storage to back applications, and use NoSQL for caching and the like.
NoSQL databases are suited to storing non-sensitive data, for instance posts and comments.
You are able to retrieve all the data with one query. For example, you may not care about outdated fields such as author_name or profile_picture_url, because it's just a post, and in the future it will not be as visible as newer ones. But if you do want those fields kept up to date, you have two options:
The first option is to use some kind of worker service. If a user changes his username or profile picture, you send a signal to your service to traverse all posts and comments and update those fields with the new values.
The second option is to use authorId instead of the author name; then, instead of 2 queries, you will make N+2 queries to fetch each comment author's profile. But use pagination: instead of querying for 100 comments, take 10 and show a "load more" button/link, so you make 12 queries.
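For what it's worth, those per-author lookups can usually be batched into a single query with $in over the collected author IDs, so the page needs only a handful of round trips; a sketch (posts and users are assumed collection names):

var post = db.posts.findOne({ _id: postId });
var ids = post.comments.map(function (c) { return c.authorID; });
var authors = db.users.find({ _id: { $in: ids } }).toArray();
// 1 query for the post + 1 for all comment authors, regardless of N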
Hope this helps.
I'm quite new to the NoSQL world.
If I have a very simple webapp with users authenticating and publishing posts, what's the MongoDB (NoSQL) way to store users and posts in the database?
Do I have to store users and posts each in their own collection (like in relational databases)? Or store them in the same collection, in different documents? Or, finally, with redundant user info (credentials) on each post the user has published?
One way you could do it is to use two collections, a posts collection and an authors collection. They could look like the following:
Posts
{
  "title": "Post title",
  "body": "Content of the post",
  "author": "author_id",
  "date": "...",
  "comments": [
    {
      "name": "name of the commenter",
      "email": "...",
      "comment": "..."
    }
  ],
  "tags": [ "tag1", "tag2", "tag3" ]
}
Authors
{
  "_id": "author_id",
  "password": "..."
}
Of course, you can put it all in a single collection, but #jcrade mentioned a reason why you would/should use two collections. Remember, this is NoSQL: you should design your database from an application point of view, which means asking yourself what data is consumed, and how.
This post says it all:
https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1
It really depends on your application, and on how many posts you expect your users to have: if it's a one-to-few relationship, then embedding the posts as documents inside your users model is probably the way to go. If it's one-to-many (up to a couple of thousand), then just embed an array of post IDs in your users model. If it's more than that, then use the answer provided by Horizon_Net.
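For the middle tier, the users model would simply carry an array of post IDs; a sketch (names are illustrative):

{
  "_id": "user_1",
  "name": "...",
  "posts": [ "post_1", "post_2", "post_3" ]
}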
Read the post, and you get a pretty good idea of what you will have to do. Good luck!
When you are modeling a NoSQL database you should think about three basic ideas:
Denormalization
Copy the same data into multiple documents in order to simplify/optimize query processing, or to fit the user's data into a particular data model.
Aggregation
Embed related data into a single document (for example, a blog post and its comments) to improve both update performance and consistency, since MongoDB guarantees consistency only within a single document at a time.
Application-level joins
Perform joins at the application level when it is not a good idea to aggregate the information (for example, when the referenced resource is accessed and updated on its own, so embedding a copy of it in every document would be a bad idea).
To answer your question:
Create two documents: one is the blog post, with all of its comments and tags embedded in it plus the user ID; the second is the user, with all of the user's information.
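A sketch of those two documents (field names are illustrative, not prescribed):

// blog post: embeds comments and tags, references the author by ID
{
  "_id": ObjectId(),
  "title": "...",
  "userId": "user_1",
  "tags": [ "tag1", "tag2" ],
  "comments": [ { "userId": "user_2", "text": "..." } ]
}

// user: all of the user's information in one place
{
  "_id": "user_1",
  "name": "...",
  "email": "..."
}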
I have the following schema for posts. Each post has an embedded author and attachments (array of links / videos / photos etc).
{
  "content": "Pixable tempts Everpix users with quick-import tool for photos ahead of December 15 closure http:\/\/t.co\/tbsSrVYneK by #psawers",
  "author": {
    "username": "TheNextWeb",
    "id": "10876852",
    "name": "The Next Web",
    "photo": "https:\/\/pbs.twimg.com\/profile_images\/378800000147133877\/895fa7d3daeed8d32b7c089d9b3e976e_bigger.png",
    "url": "https:\/\/twitter.com\/account\/redirect_by_id?id=10876852",
    "description": "",
    "serviceName": "twitter"
  },
  "attachments": [
    {
      "title": "Pixable tempts Everpix users with quick-import tool for photos ahead of December 15 closure",
      "description": "Pixable, the SingTel-owned company that organizes your social photos in smart ways, has announced a quick-import tool for Everpix users following the company's decision to close ...",
      "url": "http:\/\/t.co\/tbsSrVYneK",
      "type": "link",
      "photo": "http:\/\/cdn1.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2013\/09\/camera1-.jpg"
    }
  ]
}
Posts are read often (we have a view with 4 tabs, and each tab requires 24 posts to be shown). Currently we index these lists in Redis, so querying for 4×24 posts is as simple as fetching the lists from Redis (each returning a list of Mongo IDs) and querying the posts by those IDs.
Updates to the embedded author happen rarely (for example, when the author changes his picture). The updates do not have to be instantaneous, or even fast.
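For scale, such a propagation is a single statement; a sketch (posts is an assumed collection name, matching the author.id field in the schema above):

db.posts.updateMany(
  { "author.id": "10876852" },
  { $set: { "author.photo": "https://example.com/new-photo.png" } }
);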
We're wondering if we should split the author and the post into two different collections, so that a post would hold a reference to its author instead of an embedded/duplicated author. Is a normalized state preferred here, given that the author is currently duplicated for every post, resulting in a lot of duplicated data/extra bytes? Or should we continue with the denormalized state?
As it seems that you have a few orders of magnitude more reads than writes, it probably makes little sense to split this data into two collections. Especially with few updates, and since you need almost all the author information when showing posts, one query is going to be faster than two. You also get data locality, so you would potentially need less data in memory, which provides another benefit.
However, you can only really find out by benchmarking this with the amount of data that you'd be using in production.
I have a small REST API that is being consumed by a single page web application powered by Backbone.js
There are two resource types that the API provides, and therefore, the Backbone app uses. These are articles and comments. These two resources have different endpoints and there is a link from each of the articles to the location of all the comments for that item.
The problem that I'm facing is that on the article list in my web app I would like to display the number of comments for each article. Given that this would only be possible if I also fetch the comments list, the current setup would require me to make one API request to get the initial article list and another one for each article to count its comments. That becomes a problem if, for instance, there are 100 articles: 101 HTTP requests would be necessary to populate one single view.
The solutions I can think of right now are:
1. to include the comments data in the initial articles request like so
[
  {
    "id": 1,
    "name": "Article 1",
    ...
    "comments": [
      {
        "id": 1,
        "text": "some comment"
      },
      {
        "id": 2,
        "text": "some comment"
      },
      ...
    ]
  },
  ...
]
The question in this case is: how is it possible to parse the "comments" as a separate comments collection and not include them in the article model?
2. to include some metadata inside the articles response like so:
[
  {
    "id": 1,
    "name": "Article 1",
    ...
    "comments": 13
  },
  ...
]
This option raises the question: how should I handle parsing of the model so that, on one hand, the meta information is available, and on the other hand, the "comments" attribute is not one Backbone would try to perform updates on?
I feel there might be another solution for this, compliant with the REST philosophy, that I'm missing, so if you have any other suggestions please let me know.
I think your best bet is to go with your second option, include the number of comments for each article inside your article model.
This option raises the question: how should I handle parsing of the model so that, on one hand, the meta information is available, and on the other hand, the "comments" attribute is not one Backbone would try to perform updates on?
Not sure what your concern is here. Why would you be worried about the comments attribute getting updated?
I can't think of any other "RESTy" way of achieving your desired result.
I would suggest using alternative 2 and having the server return a subset of the article attributes that are deemed useful for applications when dealing with the article collection resource (perhaps reachable at /articles).
The full article member resource, with all its comments (whether or not they are stored in separate tables in the backend), would be available at /articles/:id.
From a Backbone.js point of view, you probably want to put the collection resource in a, say, ArticleCollection, which will convert each member (currently with a subset of the attributes) to Article models. When the user selects to view an article in full, you pull it out of the ArticleCollection and invoke fetch to populate it in full.
Regarding what to do with the extra/virtual attributes that are included in the collection resource (/articles), like the comment count and possibly other useful aggregations, I see a few alternatives:
1. In Article#initialize you can pull those out of the attributes and store them as meta-data on the article. This way the built-in Backbone.Model#toJSON will not see them.
2. Keep them in the attributes section of each model and override Backbone.Model#toJSON to exclude them when "serializing" an Article.
In alternative 1, an Article#commentCount() helper could return this._commentCount || this.get('comments').length to make it work on both partially and fully loaded articles.
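A minimal sketch of alternative 1 wired together (only the Backbone.Model API is real here; the rest is illustrative):

var Article = Backbone.Model.extend({
  initialize: function () {
    // In the collection resource "comments" is just a count; stash it
    // as meta-data so toJSON()/save() never send it back to the server.
    if (typeof this.get('comments') === 'number') {
      this._commentCount = this.get('comments');
      this.unset('comments', { silent: true });
    }
  },
  commentCount: function () {
    return this._commentCount || this.get('comments').length;
  }
});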
For a fully loaded Article you would probably want to convert the nested comments array into a full-blown CommentCollection anyway and store that in this._comments, so I don't think it is that unusual to have your models store additional stuff directly on the model instance, outside of its attributes hash.