Advice on structuring data for scalability with large number of nested objects - mongodb

Looking for advice on how best to structure data in MongoDB, particularly for scalability - worried about having an array of potentially thousands of objects within each user object.
I am building a language learning app with a built in flashcard system. I want users to 'unlock' new vocabulary for each level, which automatically gets added to their flashcards, so when you unlock level 4, all the vocabulary attached to level 4 gets added to your flashcards.
For the flashcards themselves, I want a changeable 'due date', so that you get prompted to do certain cards on a certain date - if you're familiar with spaced repetition, that's the plan. So when you get a card, you can say how well you know it and, for example, if you know it well you won't get it for another week, but if you get it wrong you'll get it again the next day.
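A minimal sketch of the due-date rule described above. The rating names and the 3-day interval are assumptions for illustration; only "a week if you know it well" and "the next day if you get it wrong" come from the description.

```javascript
// Hypothetical rating-to-interval table (in days); names and the "unsure" value are assumptions.
const INTERVAL_DAYS = { wrong: 1, unsure: 3, known: 7 };

// Return the next due date for a card, given when it was reviewed and how it went.
function nextDueDate(reviewedAt, rating) {
  const days = INTERVAL_DAYS[rating];
  if (days === undefined) {
    throw new Error(`Unknown rating: ${rating}`);
  }
  const due = new Date(reviewedAt.getTime()); // copy so the input is not mutated
  due.setUTCDate(due.getUTCDate() + days);    // overflow rolls into the next month
  return due;
}

const reviewed = new Date("2021-12-07T00:00:00.000Z");
console.log(nextDueDate(reviewed, "known").toISOString()); // 2021-12-14T00:00:00.000Z
console.log(nextDueDate(reviewed, "wrong").toISOString()); // 2021-12-08T00:00:00.000Z
```

Storing the result as a real date value rather than a formatted string would make "due on or before today" comparisons straightforward.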
I'm using MongoDB for the backend, but am a little unsure about how best to structure my data. Currently, I have two objects: one for the cards, and one for the users.
The cards object looks like this, so there's a nested object for each flashcard, with a unique ID, the level the word appears in, and then the word in both languages.
const CardsList = [
{
id: 1,
level: 1,
gd: "sgìth",
en: "tired",
},
{
id: 2,
level: 2,
gd: "ceist",
en: "question",
},
];
Then each user has an object like the below, with various user data, and a nested array of objects for the cards - with the id of every card they've unlocked, and the date at which that card is next due.
{
id: 1,
name: "gordon",
level: 2,
cards: [
{ id: 1, date: "07/12/2021" },
{ id: 2, date: "09/12/2021" },
],
},
{
id: 2,
name: "mike",
level: 1,
cards: [
{ id: 1, date: "08/12/2021" },
{ id: 2, date: "07/12/2021" },
],
},
This works fine, but I'm a bit concerned about the scalability of it.
The plan is to have about two or three thousand words in total, so if I had, say, fifty users complete the app, that would mean fifty user objects, each with as many as three thousand objects in that nested cards array.
Is that going to be a problem? Would it be a problem if I had a thousand (or more) users, instead of 50? Is there a more sensible way of structuring the data that I'm not spotting?
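One alternative layout, sketched here under assumptions rather than offered as the definitive answer, is to keep one small record per (user, card) pair instead of a growing array inside each user object. The field names `userId`, `cardId` and `due` are hypothetical, and the dates are assumed to be in DD/MM/YYYY format.

```javascript
// Parse the "DD/MM/YYYY" strings (assumed format) into Date objects at UTC midnight.
function parseDayMonthYear(text) {
  const [day, month, year] = text.split("/").map(Number);
  return new Date(Date.UTC(year, month - 1, day));
}

// Turn one user object (with a nested `cards` array) into flat records,
// one per unlocked card.
function toUserCardDocs(user) {
  return user.cards.map((card) => ({
    userId: user.id,
    cardId: card.id,
    due: parseDayMonthYear(card.date),
  }));
}

// Select the records for one user that are due on or before `now`.
function dueCards(docs, userId, now) {
  return docs.filter((doc) => doc.userId === userId && doc.due <= now);
}

const gordon = {
  id: 1,
  name: "gordon",
  level: 2,
  cards: [
    { id: 1, date: "07/12/2021" },
    { id: 2, date: "09/12/2021" },
  ],
};
const userCardDocs = toUserCardDocs(gordon);
const today = new Date(Date.UTC(2021, 11, 8)); // 8 December 2021
console.log(dueCards(userCardDocs, 1, today).map((d) => d.cardId)); // [ 1 ]
```

In MongoDB such records could live in their own collection, where the same filter would be expressed as a query on the user and due-date fields; whether that beats the embedded array depends on how the data is read and updated.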

Related

Best way to store and organize data in MongoDB

I have users in MongoDB, and each user has an interface allowing them to set their current state of hunger as a combination of "hungry", "not hungry", "famished", "starving", or "full".
Each user can enter multiple options for any period of time. For example, one use case would be "in the morning, record how my hunger is", and the user can put "not hungry" and "full". They can record how their hunger is at any time of the day, and as many times as they want.
Should I store the data as single entries, and then group the data by a date in MongoDB later on when I need to show it in a UI? Or should I store the data as an array of the options the user selected along with a date?
It depends on your future queries, and you may want to do both. Disk space is cheaper than processing, so it's generally better to double your disk space than to double your queries.
If you're only going to map by date then you'll want to group all users/states by date. If you're only going to map by user then you'll want to group all dates/states by user. If you're going to query by both, you should just make two Collections to minimize processing. Definitely use an array for the hunger state in either case.
Example structure for date grouping:
{ date: '1494288000',
'time-of-day': [
{ am: [
{ user: 'asdfas', 'hunger-state': ['hungry', 'full'] },
{ user: 'juhags', 'hunger-state': ['full'] }
],
pm: [
{ user: 'asdfas', 'hunger-state': ['hungry', 'full'] },
{ user: 'juhags', 'hunger-state': ['full'] }
]}]}
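If the data is stored as single entries instead, the grouping by date can be done when reading. The following is a rough sketch in plain JavaScript; the entry shape `{ userId, timestamp, hunger }` is an assumption for illustration, not a required schema.

```javascript
// Group single hunger entries by calendar day (UTC) and then by user.
function groupByDay(entries) {
  const byDay = {};
  for (const entry of entries) {
    const day = entry.timestamp.slice(0, 10); // "YYYY-MM-DD" part of an ISO 8601 string
    byDay[day] = byDay[day] || {};
    byDay[day][entry.userId] = byDay[day][entry.userId] || [];
    byDay[day][entry.userId].push(...entry.hunger);
  }
  return byDay;
}

const entries = [
  { userId: "asdfas", timestamp: "2017-05-08T08:00:00.000Z", hunger: ["hungry"] },
  { userId: "asdfas", timestamp: "2017-05-08T17:30:00.000Z", hunger: ["full"] },
  { userId: "juhags", timestamp: "2017-05-09T09:15:00.000Z", hunger: ["full"] },
];
console.log(JSON.stringify(groupByDay(entries)));
```

The same grouping could also be pushed to the database, but doing it in application code keeps the stored documents simple.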
It depends on how you are going to access it. If you want to report on a user's last known state, then the array might be better:
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.000Z',
hunger: ['HUNGRY','FAMISHED'],
}
The timestamps of multiple records might not align perfectly if you are passing in the output from new Date() (note the second record is 99 ms later):
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.000Z',
hunger: 'HUNGRY',
}
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.099Z',
hunger: 'FAMISHED',
}
You should probably look at your data model though and try to get a more deterministic state model. Maybe:
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.000Z',
isHungry: true,
hunger: 'FAMISHED',
}

Storing functions in MongoDB in different collections?

Say that I have a business that represents users who spend a certain amount of time to produce certain quantities of stuff. I want each user to be free to create their own algorithm, or formula, for determining the price that they charge for their work:
Users Collection, with possibly thousands of different users.
{
userId: 'sdf23d23dwew',
price: function(time, qty){
// some algorithm
}
},
{
userId: '23f5gf34f',
price: function(time, qty){
// another algorithm
}
},
{
userId: '7u76565',
price: function(time, qty){
// yet another algorithm
}
},
{
userId: 'w45y65yh4',
price: function(time, qty){
// something else
}
}
//and on and on and on...
Now, JSON doesn't support functions and neither does MongoDB. BUT for this use case, with possibly thousands of users each free to create their own unique method of determining their own prices, it seems to me that being able to store functions inside the user document would be ideal.
I certainly don't feel like it's a good idea to just store all these thousands of functions in a JS file on the server that somehow gets referenced by a userId when it's needed...
Is there a solution for this case?
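One commonly used workaround, sketched here under assumptions, is to store the body of each formula as a string in the user document and rebuild a function from it when needed. The `priceSource` field name and the formula itself are hypothetical.

```javascript
// A user document that keeps the pricing formula as source text (hypothetical shape).
const userDoc = {
  userId: "sdf23d23dwew",
  priceSource: "return time * 20 + qty * 1.5;",
};

// Rebuild a callable function from the stored text.
// WARNING: new Function executes arbitrary code, so this is only
// reasonable when the stored text comes from a trusted source.
function compilePrice(doc) {
  return new Function("time", "qty", doc.priceSource);
}

const price = compilePrice(userDoc);
console.log(price(2, 10)); // 55
```

If the formulas come from untrusted users, a safer variation would be to store only numeric parameters for a fixed set of formulas instead of executable text.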

With meteor.js and mongo, please show me the best way to organize a categories collection

I'm coming from the SQL world, so naturally mongo / noSQL has been an adventure.
I'm building a page to add/edit categories, that "posts" will later be assigned to.
What I've basically created is this:
{
_id: "asdf234ljsf",
title: "CategoryOne",
sortorder: 1,
active: true,
children: [
{
title: "ChildOne",
sortorder: 1,
active: true
},
{
title: "ChildTwo",
sortorder: 2,
active: true
}
]
}
So later, when creating a "post", I would assign that post to one or more parent categories, as well as optionally to one or more child categories within the selected parent categories. If visitors to the site click on a parent category, it would show all posts within that parent category, and if they select a child category, it would only show posts within that child category.
The logic is obvious and simple, but in SQL I would have created tables like this:
table_Category ( CategoryID, Title, Sort, Active )
table_Category_Children ( ChildID, ParentID, Title, Sort, Active )
I've been reading the Discover Meteor book and it mentions that Meteor gives us many tools that work a lot better when operating at the collection level, as well as how the DDP operates at the top level of a document, meaning if something small changed down in a sub collection or array, potentially unneeded data will be sent back to all connected/subscribed clients.
So, this makes me think I should be organizing the categories like this:
Collection for parent categories
{
_id: "someid",
title: "CategoryOne",
sortorder: 1,
active: true
},
{
_id: "someid",
title: "CategoryTwo",
sortorder: 1,
active: true
}
Collection for Child Categories
{
_id: "someid",
parent: "idofparent",
title: "ChildOne",
sortorder: 1,
active: true
},
{
_id: "someid",
parent: "idofparent",
title: "ChildTwo",
sortorder: 1,
active: true
}
Or, perhaps it's better like this:
Collection for parent categories
{
_id: "someid",
title: "CategoryOne",
sortorder: 1,
active: true,
children: [ { id: "childid" }, ... ]
}
I think understanding a best practice/method for Meteor and Mongo in this scenario will help me greatly across the board.
So, in conclusion: I have an admin page where I add/edit these categories. When clients create a post, they'll select the parent and child categories suitable for their post, so I want to make sure that I organize this properly from the beginning. Changing my thinking process from a traditional RDBMS to NoSQL is a big jump.
Thank you!
MongoDB stores all data in documents. This is a fundamental difference from relational (SQL) databases.
Imagine you have 100 parent categories and 1000 child categories: once you update a parent category, it will affect the "idofparent" of all linked child categories, in a reactive way. In short, it's not sustainable.
Try to think of a way to avoid the equivalent of a SQL JOIN in MongoDB.
Restructure your data, perhaps similar to this:
One big collection for all categories:
{
_id: id,
title: title,
sortorder: 1,
active: 1,
class: "parent > child" // make this as a field
...
}
// class can be "parent1", "parent2", "parent1 > child1" ... you get the idea
so each stored document is completely self-contained.
Or, if you absolutely need a relational, JOIN-style data structure, I don't think MongoDB is the right choice for you.
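As a rough illustration of how the `class` path field suggested above could be used, the sketch below selects the children of a parent by matching the path prefix. The sample documents and the `childrenOf` helper are hypothetical, and the filtering is done in plain JavaScript rather than in a database query.

```javascript
// Sample category documents using a "parent > child" path in the `class` field.
const categories = [
  { _id: "a", title: "CategoryOne", sortorder: 1, active: 1, class: "parent1" },
  { _id: "b", title: "ChildOne", sortorder: 1, active: 1, class: "parent1 > child1" },
  { _id: "c", title: "ChildTwo", sortorder: 2, active: 1, class: "parent1 > child2" },
  { _id: "d", title: "CategoryTwo", sortorder: 2, active: 1, class: "parent2" },
];

// Return every category nested under `parentClass`, ordered by sortorder.
function childrenOf(all, parentClass) {
  const prefix = parentClass + " > ";
  return all
    .filter((c) => c.class.startsWith(prefix))
    .sort((x, y) => x.sortorder - y.sortorder);
}

console.log(childrenOf(categories, "parent1").map((c) => c.title)); // [ 'ChildOne', 'ChildTwo' ]
```

Including the separator in the prefix keeps "parent1" from accidentally matching a category such as "parent10".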

Best way to structure relationships in a no-SQL database?

I'm using MongoDB. I know that MongoDB isn't relational but information sometimes is. So what's the most efficient way to reference these kinds of relationships to lessen database load and maximize query speed?
Example:
* Tinder-style "matches" *
There are many users in a Users collection. They get matched to each other.
So I'm thinking:
Document 1:
{
_id: "d3fg45wr4f343",
firstName: "Bob",
lastName: "Lee",
matches: [
"ferh823u9WURF",
"8Y283DUFH3FI2",
"KJSDH298U2F8",
"shdfy2988U2Ywf"
]
}
Document 2:
{
_id: "d3fg45wr4f343",
firstName: "Cindy",
lastName: "Doe",
matches: [
"d3fg45wr4f343"
]
}
Would this work OK if there were, say, 10,000 users and you were on Bob's profile page and you wanted to display the firstName of all of his matches?
Any alternative structures that would work better?
* Online Forum *
I suppose you could have the following collections:
Users
Topics
Users Collection:
{
_id: "d3fg45wr4f343",
userName: "aircon",
avatar: "234232.jpg"
}
{
_id: "23qdf3a3fq3fq3",
userName: "spider",
avatar: "986754.jpg"
}
Topics Collection Version 1
One example document in the Topics Collection:
{
title: "A spider just popped out of the AC",
dateTimeSubmitted: 201408201200,
category: 5,
posts: [
{
message: "I'm going to use a gun.",
dateTimeSubmitted: 201408201200,
author: "d3fg45wr4f343"
},
{
message: "I don't think this would work.",
dateTimeSubmitted: 201408201201,
author: "23qdf3a3fq3fq3"
},
{
message: "It will totally work.",
dateTimeSubmitted: 201408201202,
author: "d3fg45wr4f343"
},
{
message: "ur dumb",
dateTimeSubmitted: 201408201203,
author: "23qdf3a3fq3fq3"
}
]
}
Topics Collection Version 2
One example document in the Topics Collection. The author's avatar and userName are now embedded in the document. I know that:
This is not DRY.
If the author changes their avatar and userName, these changes would need to be made in the Topics Collection, in every post document embedded in it.
BUT it saves the system from querying for all the avatars and userNames via the authors ID every single time this thread is viewed on the client.
{
title: "A spider just popped out of the AC",
dateTimeSubmitted: 201408201200,
category: 5,
posts: [
{
message: "I'm going to use a gun.",
dateTimeSubmitted: 201408201200,
author: "d3fg45wr4f343",
userName: "aircon",
avatar: "234232.jpg"
},
{
message: "I don't think this would work.",
dateTimeSubmitted: 201408201201,
author: "23qdf3a3fq3fq3",
userName: "spider",
avatar: "986754.jpg"
},
{
message: "It will totally work.",
dateTimeSubmitted: 201408201202,
author: "d3fg45wr4f343",
userName: "aircon",
avatar: "234232.jpg"
},
{
message: "ur dumb",
dateTimeSubmitted: 201408201203,
author: "23qdf3a3fq3fq3",
userName: "spider",
avatar: "986754.jpg"
}
]
}
So yeah, I'm not sure which is best...
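To make the update cost of Version 2 concrete, here is a small sketch in plain JavaScript of what has to happen when an author changes their avatar. The `updateEmbeddedAuthor` helper and the `new.jpg` value are hypothetical; a real implementation would issue the equivalent update against the database.

```javascript
// Rewrite the embedded userName/avatar on every post written by `authorId`.
// Returns how many posts were touched, to show the cost of the update.
function updateEmbeddedAuthor(topics, authorId, changes) {
  let touched = 0;
  for (const topic of topics) {
    for (const post of topic.posts) {
      if (post.author === authorId) {
        Object.assign(post, changes);
        touched += 1;
      }
    }
  }
  return touched;
}

const topics = [
  {
    title: "A spider just popped out of the AC",
    posts: [
      { message: "I'm going to use a gun.", author: "d3fg45wr4f343", userName: "aircon", avatar: "234232.jpg" },
      { message: "I don't think this would work.", author: "23qdf3a3fq3fq3", userName: "spider", avatar: "986754.jpg" },
      { message: "It will totally work.", author: "d3fg45wr4f343", userName: "aircon", avatar: "234232.jpg" },
    ],
  },
];
console.log(updateEmbeddedAuthor(topics, "d3fg45wr4f343", { avatar: "new.jpg" })); // 2
```

With Version 1 the same change is a single write to the user document, which is the trade-off being weighed here.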
If the data is really many-to-many, i.e. one user can have many matches and can be matched by many others, as in your first example, it is usually best to go with relations.
The main arguments against relations stem from MongoDB not being a relational database, so there are no such things as foreign key constraints or join statements.
The trade-off you have to consider in those many-to-many cases (many being much more than two) is to either enforce the key constraints yourself or manage the possible data inconsistencies across the multiple documents (your last example). In most cases the relational approach is much more practical than the embedding approach.
Exceptions could be read-often, write-seldom cases. For a (very constructed) example: if the matches in your first example were recalculated once a day or so, by wiping all previous matches and calculating a list of new matches, the data inconsistencies you would introduce could be acceptable, and the read time you save by embedding the first names of the matches could be an advantage.
But usually, for many-to-many relations, it is best to use a relational approach and make use of the array query features such as {_id: {$in: matches}}, where matches is the array of ids.
But in the end it all comes down to how many inconsistencies you can live with and how fast you really need to access the data (is it OK for some topics to have the old avatar for a few days if I save half a second of page load time?).
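The `$in` lookup mentioned above can be sketched as follows. The `buildMatchesQuery` helper and the `users` collection name are assumptions for illustration; the block only builds the filter and options, and the driver call is shown as a comment because it needs a live database.

```javascript
// Build the filter and options for fetching the first names of a user's matches.
// `matches` is already an array of ids, so it is passed to $in directly.
function buildMatchesQuery(user) {
  return {
    filter: { _id: { $in: user.matches } },
    options: { projection: { firstName: 1 } },
  };
}

const bob = {
  _id: "d3fg45wr4f343",
  firstName: "Bob",
  matches: ["ferh823u9WURF", "8Y283DUFH3FI2"],
};
const { filter, options } = buildMatchesQuery(bob);
console.log(JSON.stringify(filter));

// With the MongoDB Node.js driver this would be used roughly as:
//   const names = await db.collection("users").find(filter, options).toArray();
```

Because `_id` is indexed by default, this is one additional query per profile page rather than one query per match.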
Edit
The schema design series on the mongodb blog might be a good read for you: part1, part2 and part3

Select an item from Dojo Grid's store and display one of its attributes (array of objects) on grid

I have a Dojo EnhancedGrid which uses a data store filled with the following data structure:
[
{ id: 1, desc: "Obj Desc", options: [ { txt: "text", value: 0 }, { obj2 }, { objn } ] },
{ id: 2, ... },
{ id: 3, ... },
{ id: n, ... }
]
Currently I'm doing all this with an auxiliary store, but I believe this is far from a good approach to the problem: it's ugly and doesn't work well with editing (because I have to send changes from one store to another).
Instead of displaying all these objects at the same time, I want to select just one object (by its id) and display its options objects in the grid. At the same time, changes made in the grid should take effect in the store, so that they can be saved later.
Is it possible to query the grid's store in order to display just one object? How?
And is it possible to fill the grid with the list of objects in the "options" attribute?