About normalization in Redux - MongoDB

Assume you're making Reddit, where each Subreddit has many Posts and each Post has many Comments. The API response would then probably look like this:
subreddits: [
  {
    title: "food",
    posts: [
      {
        id: "..",
        body: "..",
        comments: [
          { id: "..", body: ".." }
        ]
      },
      // ..morePosts
    ]
  },
  {
    title: "culture",
    posts: [
      {
        id: "..",
        body: "..",
        comments: [
          { id: "..", body: ".." }
        ]
      },
      // ..morePosts
    ]
  }
]
But since Redux discourages such nested state, we normalize the data structure before feeding it into the reducers. The same data can then be represented like this:
subredditByTitle: {
  food: {
    id: subreddit_1,
    title: "food",
    posts: [post_1, post_2]
  },
  culture: {
    id: subreddit_2,
    title: "culture",
    posts: [post_3, post_4]
  }
}
postsById: {
  post_1: {
    body: "..",
    comments: [comment_1, comment_2]
  },
  post_2: {
    body: "..",
    comments: [comment_3, comment_4]
  }
}
commentsById: {
  comment_1: {
    body: ".."
  },
  comment_2: {
    body: ".."
  }
}
But it feels a bit awkward to normalize the backend data like this when I use MongoDB, especially since I am using subdocuments. In a relational DB it makes sense to have a lookup table (e.g. postsById) for every table, but does it make sense to do the same for every MongoDB collection? My gut feeling is that instead of trying to normalize everything, it might be better to have one reducer per document, but I am not sure what the best practice is.

You should really normalize everything and have an entities object in your store where you put all your entities. I tried many approaches, but IMHO this is the only true way.
I am using it for things that would be unthinkable to do without this approach, but they are out of scope of this answer. More common use cases are pagination and fetching data only when you need it; the application feels super snappy when there is no unnecessary data loading.
I highly recommend taking a really good look at this tiny piece of code and at the Redux real-world example as a whole; there is a lot to learn from it. Your entities reducer will obviously look different, but you should be able to write your own to suit your needs.
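For what it's worth, here is a minimal sketch of such an entities reducer; the action shape and entity names are assumptions, modeled on a normalizr-style payload:
// Sketch: a generic entities reducer. Assumes every action that carries
// server data has been run through normalizr and so has an `entities` key.
const initialState = { subreddits: {}, posts: {}, comments: {} };

function entities(state = initialState, action) {
  if (action.entities) {
    // Shallow-merge each collection of freshly received entities into the store
    return {
      subreddits: { ...state.subreddits, ...action.entities.subreddits },
      posts: { ...state.posts, ...action.entities.posts },
      comments: { ...state.comments, ...action.entities.comments }
    };
  }
  return state;
}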

Related

Correct JSON structure to filter through data

What's the "best" JSON structure when you need to "filter" through data in Firebase (in Swift)?
I'm having users sort their questions into:
Business
Entertainment
Other
Is it better to have a separate child for each question genre? If so, how do I get all of the data (when I want it), and then filter it only by "Business" when I want to?
In NoSQL databases you usually end up modeling your data structure for the use cases you want to support in your app.
It's a bit of a learning path, so I'll explain it below in four steps:
Tree by category: Storing the data in a tree by its category, as you seem to be most interested in already.
Flat list of questions, and querying: Storing the data in a flat list, and then using queries to filter.
Flat list and indexes: Combining the above two approaches, to make the result more scalable.
Duplicating data: By duplicating data on top of that, you can reduce code complexity and improve performance further.
Tree by category
If you only want to get the questions by their category, you're best off simply storing each question under its category. In a simple model that'd look like this:
questionsByCategory: {
  Business: {
    question1: { ... },
    question4: { ... }
  },
  Entertainment: {
    question2: { ... },
    question5: { ... }
  },
  Other: {
    question3: { ... },
    question6: { ... }
  }
}
With the above structure, loading the list of questions for a category is a simple, direct-access read for that category: firebase.database().ref("questionsByCategory").child("Business").once("value")....
But if you need a list of all questions, you have to read all categories and denest them client-side. If you want all questions anyway, that is not a real problem, since you need to load them all regardless; but if you want to filter on some condition other than category, this may be wasteful.
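To make that concrete, here is a sketch of both reads against the tree above (the second one loads everything and denests client-side):
// Direct-access read of a single category
firebase.database().ref("questionsByCategory").child("Business").once("value")
  .then(function(category) {
    category.forEach(function(question) {
      console.log(question.key);
    });
  });

// Loading ALL questions means reading every category and denesting in code
firebase.database().ref("questionsByCategory").once("value")
  .then(function(categories) {
    categories.forEach(function(category) {
      category.forEach(function(question) {
        console.log(category.key + "/" + question.key);
      });
    });
  });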
Flat list of questions, and querying
An alternative is to create a flat list of all questions, and use queries to filter the data. In that case your JSON would look like this:
questions: {
  question1: { category: "Business", difficulty: 1, ... },
  question2: { category: "Entertainment", difficulty: 1, ... },
  question3: { category: "Other", difficulty: 2, ... },
  question4: { category: "Business", difficulty: 2, ... },
  question5: { category: "Entertainment", difficulty: 3, ... },
  question6: { category: "Other", difficulty: 1, ... }
}
Now, getting a list of all questions is easy, as you can just read them and loop over the results:
firebase.database().ref("questions").once("value").then(function(result) {
result.forEach(function(snapshot) {
console.log(snapshot.key+": "+snapshot.val().category);
})
})
If we want to get all questions for a specific category, we use a query instead of just the ref("questions"). So:
Get all Business questions:
firebase.database().ref("questions").orderByChild("category").equalTo("Business").once("value")...
Get all questions with difficulty 3:
firebase.database().ref("questions").orderByChild("difficulty").equalTo(3).once("value")...
This approach works quite well, unless you have huge numbers of questions.
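One caveat: for these orderByChild() queries to be executed efficiently on the server, the Realtime Database needs matching indexes defined in its security rules, along these lines:
{
  "rules": {
    "questions": {
      ".indexOn": ["category", "difficulty"]
    }
  }
}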
Flat list and indexes
If you have millions of questions, Firebase database queries may not perform well enough for you anymore. In that case you may need to combine the two approaches above, using a flat list to store the questions and so-called (self-made) secondary indexes to perform the filtered lookups.
If you think you'll ever reach this number of questions, I'd consider using Cloud Firestore, as that does not have the inherent scalability limits that the Realtime Database has. In fact, Cloud Firestore has the unique guarantee that retrieving a certain amount of data takes a fixed amount of time, no matter how much data there is in the database/collection.
In this scenario, your JSON would look like:
questions: {
  question1: { category: "Business", difficulty: 1, ... },
  question2: { category: "Entertainment", difficulty: 1, ... },
  question3: { category: "Other", difficulty: 2, ... },
  question4: { category: "Business", difficulty: 2, ... },
  question5: { category: "Entertainment", difficulty: 3, ... },
  question6: { category: "Other", difficulty: 1, ... }
},
questionsByCategory: {
  Business: {
    question1: true,
    question4: true
  },
  Entertainment: {
    question2: true,
    question5: true
  },
  Other: {
    question3: true,
    question6: true
  }
},
questionsByDifficulty: {
  "1": {
    question1: true,
    question2: true,
    question6: true
  },
  "2": {
    question3: true,
    question4: true
  },
  "3": {
    question5: true
  }
}
You see that we have a single flat list of the questions, plus a separate list for each property we want to filter on, mapping each value to the IDs of the matching questions. Those secondary lists are often called (secondary) indexes, since they effectively serve as indexes on your data.
To load the hard questions in the above, we take a two-step approach:
Load the question IDs with a direct lookup.
Load each question by its ID.
In code:
firebase.database().ref("questionsByDifficulty/3").once("value").then(function(result) {
result.forEach(function(snapshot) {
firebase.database().ref("questions").child(snapshot.key).once("value").then(function(questionSnapshot) {
console.log(questionSnapshot.key+": "+questionSnapshot.val().category);
});
})
})
If you need to wait for all questions before logging (or otherwise processing) them, you'd use Promise.all:
firebase.database().ref("questionsByDifficulty/3").once("value").then(function(result) {
var promises = [];
result.forEach(function(snapshot) {
promises.push(firebase.database().ref("questions").child(snapshot.key).once("value"));
})
Promise.all(promises).then(function(questionSnapshots) {
questionSnapshots.forEach(function(questionSnapshot) {
console.log(questionSnapshot.key+": "+questionSnapshot.val().category);
})
})
})
Many developers assume that this approach is slow, since it needs a separate call for each question. But it's actually quite fast, since Firebase pipelines the requests over its existing connection. For more on this, see "Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly".
Duplicating data
The code for the nested load/client-side join can be a bit tricky to read. If you'd prefer to perform only a single load, you could consider duplicating the data for each question into each secondary index too.
In this scenario, the secondary index would look like this:
questionsByCategory: {
  Business: {
    question1: { category: "Business", difficulty: 1, ... },
    question4: { category: "Business", difficulty: 2, ... }
  },
If you come from a background in relational data modeling, this may look quite unnatural, since we're now duplicating data between the main list and the secondary indexes.
To an experienced NoSQL data modeler, however, this looks completely normal: we're trading some extra storage against the extra time/code it takes to load the data.
This trade-off is common in all areas of computer science, and in NoSQL data modeling you'll fairly often see folks sacrifice space (and thus store duplicate data) to get a simpler and more scalable data model.
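If you do duplicate, the usual way to keep the copies consistent is a single multi-location ("fan-out") update that writes the main record and every index entry together; here is a sketch with assumed key names:
// Write a new question plus all its index entries in one atomic update
var question = { category: "Business", difficulty: 1 };
var updates = {};
updates["questions/question7"] = question;                    // main list
updates["questionsByCategory/Business/question7"] = question; // duplicated copy
updates["questionsByDifficulty/1/question7"] = true;          // plain index entry
firebase.database().ref().update(updates); // all paths commit together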

Loopback's ReferenceMany-like relation with additional fields

I need help with the LoopBack framework.
I have two models: Post and Media.
Examples:
Media
{
  id: ObjectId("...a1"),
  type: "gif",
  path: "some/folder"
},
{
  id: ObjectId("...a2"),
  type: "mp4",
  path: "some/folder"
}
Post
{
  id: ObjectId("...b1"),
  title: "Apollo 13",
  content: [
    {
      mediaId: ObjectId("...a1"),
      header: "header-1",
      description: "descr-1"
    },
    {
      mediaId: ObjectId("...a2"),
      header: "header-2",
      description: "descr-2"
    }
  ]
},
{
  id: ObjectId("...b2"),
  title: "2 seconds to Moon",
  content: [
    {
      mediaId: ObjectId("...a1"),
      header: "header-3",
      description: "descr-3"
    }
  ]
}
As you can guess, I'm going to use MongoDB. I want to describe a relation between these two models, but I'm not sure how to do it the right way.
If I had only an array of mediaIds, I'd do it through referenceMany. Now it looks more like embedsMany, but embedsMany of what?
I even tried to make something like a MediaItem model and give it a transient datasource, but I couldn't make it work right with the REST APIs.
In the end, I want to get one or many posts with the media data (such as the type and path fields) included.
Any thoughts?
You should probably use a HasManyThrough relation (http://loopback.io/doc/en/lb2/HasManyThrough-relations.html) and then an include filter (http://loopback.io/doc/en/lb2/Include-filter.html).
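A sketch of how that could be wired up, with a hypothetical PostMedia "through" model carrying the extra per-link fields; all model and key names here are assumptions:
// common/models/post-media.json -- the "through" model holds header/description
{
  "name": "PostMedia",
  "properties": {
    "header": "string",
    "description": "string"
  },
  "relations": {
    "post": { "type": "belongsTo", "model": "Post", "foreignKey": "postId" },
    "media": { "type": "belongsTo", "model": "Media", "foreignKey": "mediaId" }
  }
}

// in common/models/post.json
"relations": {
  "media": {
    "type": "hasMany",
    "model": "Media",
    "foreignKey": "postId",
    "through": "PostMedia"
  }
}

// querying posts with their related media included, e.g. in server code:
Post.find({ include: "media" }, function (err, posts) {
  // each post now carries its related media (type, path, ...)
});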

Redux normalized state tree for posts and comments

Redux recommends using a normalized app state tree, but I am not sure whether that's the best practice in this case. Assume the following:
Each Circle has_many Posts.
Each Post has_many Comments.
In the database on the backend, each model looks like this:
Circle:
{
  _id: '1',
  title: 'BoyBand'
}
Post:
{
  _id: '1',
  circle_id: '1',
  body: "Some Post"
}
Comment:
{
  _id: '1',
  post_id: '1',
  body: "Some Comment"
}
The app state (the final result of all reducers) on the frontend looks like this:
{
  circles: {
    byId: {
      1: {
        title: 'BoyBand'
      }
    },
    allIds: [1]
  },
  posts: {
    byId: {
      1: {
        circle_id: '1',
        body: 'Some Post'
      }
    },
    allIds: [1]
  },
  comments: {
    byId: {
      1: {
        post_id: '1',
        body: 'Some Comment'
      }
    },
    allIds: [1]
  }
}
Now, when I go to CircleView, I fetch the Circle from the backend, which returns all posts and comments associated with it.
export const fetchCircle = (title) => (dispatch, getState) => {
  dispatch({
    type: constants.REQUEST_CIRCLE,
    data: { title: title }
  })
  request
    .get(`${API_URL}/circles/${title}`)
    .end((err, res) => {
      if (err) {
        return
      }
      // When you fetch a circle from the API, the API returns:
      // {
      //   circle: circleObj,
      //   posts: postsArr,
      //   comments: commentsArr
      // }
      // so it's easier for the reducers to consume the data
      dispatch({
        type: constants.RECEIVE_CIRCLE,
        data: (normalize(res.body.circle, schema.circle))
      })
      dispatch({
        type: 'RECEIVE_POSTS',
        data: (normalize(res.body.posts, schema.arrayOfPosts))
      })
      dispatch({
        type: 'RECEIVE_COMMENTS',
        data: (normalize(res.body.comments, schema.arrayOfComments))
      })
    })
}
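The schema module isn't shown in the question; given the arrayOf-style calls, it might look roughly like this with the normalizr v2 API (a guess, not the author's actual code):
// schema.js -- assumed normalizr (v2-style) schemas referenced above
import { Schema, arrayOf } from 'normalizr';

const circle = new Schema('circles', { idAttribute: '_id' });
const post = new Schema('posts', { idAttribute: '_id' });
const comment = new Schema('comments', { idAttribute: '_id' });

export default {
  circle,
  post,
  comment,
  arrayOfPosts: arrayOf(post),
  arrayOfComments: arrayOf(comment)
};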
Up to this point, I think I did everything in a fairly standard way. However, when I wanted to render each Post component, I realized that populating the posts with their comments became inefficient (O(N^2)) compared to when I kept my state tree in the following format:
{
  circles: {
    byId: {
      1: {
        title: 'BoyBand'
      }
    },
    allIds: [1]
  },
  posts: {
    byId: {
      1: {
        circle_id: '1',
        body: 'Some Post',
        comments: [arrOfComments]
      }
    },
    allIds: [1]
  }
}
This goes against my understanding that in a Redux state tree it's better to keep everything normalized.
Q. Should I in fact keep things denormalized in a case like this? How do I determine what to do?
I'd go for: yes, normalize it, but do it on the backend!
Why?
Deleting is easier
because otherwise you'd have to track down the posts and comments every time you want to delete a circle or post.
Working with the data is easier
because otherwise you'd have to do the same mutations on your data over and over again just to select the dataset related to a particular circle or post.
You don't have any many-to-many relationships
you don't have multiple posts linking to the same comment, so it just makes sense to keep the data normalized.
You shouldn't be limited by an API
If this is a third-party API, then have your backend fetch from the API and normalize the data there. You shouldn't be restricted by the API, and I don't know what kind of data you access, but you can definitely save the user a DNS lookup and serve cached data if the API is unavailable. If you rely on the API being up, you introduce a single point of failure.
As for your performance issues: they should be insignificant if you normalize on the backend; measure first, and take the critical code into a code review.
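As a side note on the O(N^2) concern: you can usually stay normalized and still populate posts cheaply by grouping comments by post_id in a single pass, for example in a memoized selector. A rough sketch (using reselect, which is an assumption, not something the question mentions):
import { createSelector } from 'reselect';

// Build a postId -> [comments] lookup in one O(N) pass,
// recomputed only when the comments slice changes.
const selectCommentsByPostId = createSelector(
  state => state.comments.byId,
  commentsById => {
    const byPostId = {};
    Object.keys(commentsById).forEach(id => {
      const comment = commentsById[id];
      (byPostId[comment.post_id] = byPostId[comment.post_id] || []).push(comment);
    });
    return byPostId;
  }
);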
In my opinion, the list of comments is specific to a post: a user cannot post one comment into multiple posts, and there's nothing wrong with comments being tightly coupled to their post. It's easy to update/remove a specific comment (both postId and commentId are present). Removing a post is trivial, same with a circle. It's only slightly harder to remove all comments of a specific user. And I think there are no strict rules, no one RIGHT way, etc.; more often it depends. KISS ;)
While thinking about how to organize comments on the client side, I was reading this article about possible DB structures for a similar situation: https://docs.mongodb.com/ecosystem/use-cases/storing-comments/

What exactly is "data" that is passed to responses?

I'm writing a custom response that takes data as an input, and I am finding strange properties being added, namely:
add: [Function: add],
remove: [Function: remove]
When I log out some example data, I get:
[ { books:
[ { id: 1,
title: 'A Game of Thrones',
createdAt: '2015-08-04T04:53:38.043Z',
updatedAt: '2015-08-04T04:53:38.080Z',
author: 1 } ],
id: 1,
name: 'George R. R. Martin',
createdAt: '2015-08-04T04:53:38.040Z',
updatedAt: '2015-08-04T04:53:38.073Z' },
{ books:
[ { id: 2,
title: 'Ender\'s Game',
createdAt: '2015-08-04T04:53:38.043Z',
updatedAt: '2015-08-04T04:53:38.080Z',
author: 2 },
{ id: 3,
title: 'Speaker for the Dead',
createdAt: '2015-08-04T04:53:38.043Z',
updatedAt: '2015-08-04T04:53:38.081Z',
author: 2 } ],
id: 2,
name: 'Orson Scott Card',
createdAt: '2015-08-04T04:53:38.042Z',
updatedAt: '2015-08-04T04:53:38.074Z' } ]
Which looks innocent enough, but results in the strange add and remove functions when I use a custom serializer on it. If I take this data and hard-code it straight into the serializer, those are not present. Apparently something is lurking inside of data that's not being printed to the console.
So, what is data?
Edit: So, I'm still not quite sure what other magical properties live in here, but:
Object.keys(data[0].books)
reveals
[ '0', 'add', 'remove' ]
Which is where those are coming from. Why is this included in the data passed to custom responses? And what else might be hiding in there...
More importantly, how do I strip this gunk out and make data a normal object?
JSON.parse(JSON.stringify(data));
That cleans it up nicely, though it feels like a hack. (Actually, it's definitely a hack.)
I assume your data attribute is returned by a database query, e.g.:
Model.find(...).exec(function (err, data) { ... });
But what are these .add() and .remove() methods?
Here is what you can find in the docs:
For the most part, records are just plain old JavaScript objects (aka POJOs). However they do have a few protected (non-enumerable) methods for formatting their wrapped data, as well as a special method (.save()) for persisting programmatic changes to the database.
We can go deeper:
"collection" associations, on the other hand, do have a couple of special (non-enumerable) methods for associating and disassociating linked records. However, .save() must still be called on the original record in order for changes to be persisted to the database.
orders[1].buyers.add({ name: 'Jon Snow' });
orders[1].save(function (err) { ... });
So these methods (.add(), .remove(), .save()) are useful if you play with "collection" associations.
How to remove them?
You'll need to use .toObject(), which returns a cloned model instance stripped of all instance methods.
You might instead want .toJSON(), which also returns a cloned model instance; this one, however, includes all instance methods.
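So instead of the JSON.parse(JSON.stringify(...)) hack, something along these lines should work (a sketch assuming a Waterline query like the one above; the populate('books') call mirrors the association in the logged data):
Model.find().populate('books').exec(function (err, records) {
  if (err) { /* handle the error */ return; }
  // toObject() returns a clone stripped of the non-enumerable
  // add/remove/save helpers, leaving plain JSON-safe objects
  var plainData = records.map(function (record) {
    return record.toObject();
  });
  // hand plainData to the custom response/serializer
});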

Best way to structure relationships in a NoSQL database?

I'm using MongoDB. I know that MongoDB isn't relational, but information sometimes is. So what's the most efficient way to reference these kinds of relationships, to lessen database load and maximize query speed?
Example:
* Tinder-style "matches" *
There are many users in a Users collection. They get matched to each other.
So I'm thinking:
Document 1:
{
  _id: "d3fg45wr4f343",
  firstName: "Bob",
  lastName: "Lee",
  matches: [
    "ferh823u9WURF",
    "8Y283DUFH3FI2",
    "KJSDH298U2F8",
    "shdfy2988U2Ywf"
  ]
}
Document 2:
{
  _id: "ferh823u9WURF",
  firstName: "Cindy",
  lastName: "Doe",
  matches: [
    "d3fg45wr4f343"
  ]
}
Would this work OK if there were, say, 10,000 users and you were on Bob's profile page and you wanted to display the firstName of all of his matches?
Any alternative structures that would work better?
* Online Forum *
I suppose you could have the following collections:
Users
Topics
Users Collection:
{
  _id: "d3fg45wr4f343",
  userName: "aircon",
  avatar: "234232.jpg"
}
{
  _id: "23qdf3a3fq3fq3",
  userName: "spider",
  avatar: "986754.jpg"
}
Topics Collection Version 1
One example document in the Topics Collection:
{
  title: "A spider just popped out of the AC",
  dateTimeSubmitted: 201408201200,
  category: 5,
  posts: [
    {
      message: "I'm going to use a gun.",
      dateTimeSubmitted: 201408201200,
      author: "d3fg45wr4f343"
    },
    {
      message: "I don't think this would work.",
      dateTimeSubmitted: 201408201201,
      author: "23qdf3a3fq3fq3"
    },
    {
      message: "It will totally work.",
      dateTimeSubmitted: 201408201202,
      author: "d3fg45wr4f343"
    },
    {
      message: "ur dumb",
      dateTimeSubmitted: 201408201203,
      author: "23qdf3a3fq3fq3"
    }
  ]
}
Topics Collection Version 2
One example document in the Topics Collection. The author's avatar and userName are now embedded in the document. I know that:
This is not DRY.
If the author changes their avatar or userName, these changes would need to be updated in the Topics Collection and in all of the post documents within it.
BUT it saves the system from querying for all the avatars and userNames via the author IDs every single time this thread is viewed on the client.
{
  title: "A spider just popped out of the AC",
  dateTimeSubmitted: 201408201200,
  category: 5,
  posts: [
    {
      message: "I'm going to use a gun.",
      dateTimeSubmitted: 201408201200,
      author: "d3fg45wr4f343",
      userName: "aircon",
      avatar: "234232.jpg"
    },
    {
      message: "I don't think this would work.",
      dateTimeSubmitted: 201408201201,
      author: "23qdf3a3fq3fq3",
      userName: "spider",
      avatar: "986754.jpg"
    },
    {
      message: "It will totally work.",
      dateTimeSubmitted: 201408201202,
      author: "d3fg45wr4f343",
      userName: "aircon",
      avatar: "234232.jpg"
    },
    {
      message: "ur dumb",
      dateTimeSubmitted: 201408201203,
      author: "23qdf3a3fq3fq3",
      userName: "spider",
      avatar: "986754.jpg"
    }
  ]
}
So yeah, I'm not sure which is best...
If the data is really many-to-many, i.e. in your first example one user can have many matches and can be matched by many, it is usually best to go with relations.
The main arguments against relations stem from MongoDB not being a relational database, so there are no such things as foreign key constraints or join statements.
The trade-off you have to consider in those many-to-many cases (many being much more than two) is to either enforce the key constraints yourself or manage the possible data inconsistencies across the multiple documents (your last example). In most cases the relational approach is much more practical than the embedding approach here.
Exceptions could be read-often, write-seldom cases. For a (very constructed) example: if in your first example matches were recalculated once a day or so by wiping all previous matches and calculating a list of new ones, the data inconsistencies you would introduce could be acceptable, and the read time you save by embedding the first names of the matches could be an advantage.
But usually for many-to-many relations it is best to use a relational approach and make use of the array query features such as {_id: {$in: matches}}.
In the end it all comes down to how many inconsistencies you can live with and how fast you really need to access the data (is it OK for some topics to show an old avatar for a few days if it saves half a second of page load time?).
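For example, showing the firstName of all of Bob's matches is a single query with a projection; "bob" below stands for the already-loaded profile document:
// Fetch only the first names of every user listed in Bob's matches array
db.users.find(
  { _id: { $in: bob.matches } },
  { firstName: 1 } // projection: return firstName (plus _id) only
)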
Edit
The schema design series on the MongoDB blog might be a good read for you: part 1, part 2 and part 3