Optimal relational data model in Firestore - google-cloud-firestore

I have a set of static and pre-defined to-do's that each user in my app needs to be able to set as completed on their account.
At the moment, I use a map on the todo item that specifies which users have completed the task. My data model currently looks like this:
- todos (collection)
  - todoA (document)
    - title, description etc
    - completedBy {
        uid1: true,
        uid2: true,
        uid3: false
      }
This allows me to easily set todos as completed/not completed for each user and I can easily filter/query. It does have two drawbacks though:
A single Firestore document can "only" have 20 000 properties. If my app grows large, this will become an issue.
Document size
I was thinking of creating a similar map on my user document instead, setting todo IDs to true/false. This would get rid of the two drawbacks above, but I'd need two database queries whenever I fetch my todo items: one for the todo and one to check whether it's completed.
Is there a better way to achieve the desired functionality in Firestore?

If you are running into either the maximum number of fields or the maximum document size, then typically that means that you should be using a separate collection for "the thing that makes your documents so big".
In your case that'd mean that you store the "user has completed a task" in a separate collection. This can be a subcollection of the user document, a subcollection of the task document, and/or a separate top-level collection. Which one is correct depends on your use-case.
There is no single best data model in NoSQL databases. It all depends on your use-cases, trade-offs, and some personal preferences. For a great introduction read NoSQL data modeling and watch Get to Know Cloud Firestore.
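As a rough sketch of the "separate collection" option, the snippet below models completion state as one small document per (user, todo) pair under a hypothetical path `users/{uid}/completedTodos/{todoId}`, and merges it with the static to-do list on the client. It uses a plain in-memory `Map` in place of Firestore, so the path layout, the sample titles, and the names `completionPath`, `markCompleted`, `markNotCompleted`, and `todosForUser` are assumptions for illustration, not Firestore API calls.

```javascript
// In-memory stand-in for Firestore: document path -> document data.
const db = new Map();

// Static to-dos, shared by every user (sample data).
db.set("todos/todoA", { title: "Verify email" });
db.set("todos/todoB", { title: "Add profile photo" });

// Hypothetical layout: one tiny document per completed to-do, per user.
function completionPath(uid, todoId) {
  return `users/${uid}/completedTodos/${todoId}`;
}

function markCompleted(uid, todoId) {
  db.set(completionPath(uid, todoId), { completed: true });
}

function markNotCompleted(uid, todoId) {
  db.delete(completionPath(uid, todoId));
}

// Two lookups (the to-dos, then this user's completions) merged client-side.
function todosForUser(uid) {
  const result = [];
  for (const [path, data] of db) {
    if (!path.startsWith("todos/")) continue;
    const todoId = path.slice("todos/".length);
    result.push({
      id: todoId,
      ...data,
      completed: db.has(completionPath(uid, todoId)),
    });
  }
  return result;
}

markCompleted("uid1", "todoA");
console.log(todosForUser("uid1"));
```

With this shape no single document grows with the number of users; the price is the second lookup the question already anticipated.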

Related

Should I use one projection per entity or category?

I am coding a new application using CQRS+ES architecture with Event Store DB. In my app, I have the following streams:
user-1
user-2
user-3
...
Each stream contains all events regarding a given user.
I am now creating a projection called user-account, which consists of basic data about my user's account (such as first name and email).
What is the optimal way to design that projection?
Should I have a single projection for each user, with projections called:
user-account-1
user-account-2
user-account-3
...
Or a single projection for all user accounts, stored as a key-value record (which may hold millions of keys in the future)?
You can go with one stream per user. Projections are like dimensions. A user can exist in different "dimensions" (CDC naming) and have a different shape in each.
Read https://www.eventstore.com/blog/the-cost-of-creating-a-stream
First, subscribing to individual streams (aggregate or entity streams) won't ever work. You will end up with thousands of subscriptions that sit there doing nothing (how often do the user details change?).
The category stream is one way to go: you project all the events for all the users. Not only do you need just one subscription for all your users, but you also get more interesting possibilities, like "users pending activation" or "blocked users" projections.
I prefer subscribing to $all and applying server-side filtering if necessary. It might have a bit of overhead as you receive more events than you need, but you get so much more power by combining events from different aggregates.
I wrote a little about it in Eventuous documentation.
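To make the "one projection for all users" idea concrete, here is a minimal sketch of a single handler folding events from every user stream into one key-value read model. It does not use an EventStoreDB client: the `events` array stands in for whatever one category or $all subscription would deliver, and the event types (`UserRegistered`, `EmailChanged`, `UserBlocked`), their fields, and the `streamId` property are hypothetical.

```javascript
// Read model: one key-value store for all users, keyed by user id.
const userAccounts = new Map();

// Applies a single event to the read model. Event names and fields are
// hypothetical; streamId is assumed to look like "user-<id>".
function project(event) {
  // Skip events from other categories (the client-side equivalent of filtering).
  if (!event.streamId.startsWith("user-")) return;
  const userId = event.streamId.slice("user-".length);
  const current = userAccounts.get(userId) ?? {};
  switch (event.type) {
    case "UserRegistered":
      userAccounts.set(userId, {
        ...current,
        firstName: event.data.firstName,
        email: event.data.email,
        blocked: false,
      });
      break;
    case "EmailChanged":
      userAccounts.set(userId, { ...current, email: event.data.email });
      break;
    case "UserBlocked":
      userAccounts.set(userId, { ...current, blocked: true });
      break;
    default:
      // Events this projection does not care about are ignored.
      break;
  }
}

// Stand-in for the events one subscription would deliver (sample data).
const events = [
  { streamId: "user-1", type: "UserRegistered", data: { firstName: "Ann", email: "ann@example.com" } },
  { streamId: "user-2", type: "UserRegistered", data: { firstName: "Bob", email: "bob@example.com" } },
  { streamId: "order-9", type: "OrderPlaced", data: {} },
  { streamId: "user-1", type: "EmailChanged", data: { email: "ann@new.example.com" } },
  { streamId: "user-2", type: "UserBlocked", data: {} },
];
events.forEach(project);

// A "blocked users" view derived from the same read model.
const blockedUsers = [...userAccounts]
  .filter(([, account]) => account.blocked)
  .map(([id]) => id);
console.log(userAccounts.get("1"), blockedUsers);
```

Because every user's events pass through the same handler, cross-user views such as the blocked-users list fall out of the same subscription.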

How to model data in NoSQL Firebase datastore?

I want to store following data:
Users,
Events,
Attendees
(similar to Firebase's example given here:
https://www.youtube.com/watch?v=ran_Ylug7AE)
My Firebase store is like the following:
Users - Collection
{
  "9582940055" :
  {
    "name" : "test"
  }
}
Every user is a different document. Am I doing it correctly?
If yes, I have kept Mobile number of every user as Document Id instead of auto id, as the mobile number is going to be unique and it will help me in querying. Is this right?
Events - Collection
{
  "MkyzuARd8Uelh0qD1WMa" : // auto id for every event
  {
    "name" : "test",
    "attendees" : {
      "user": 'Lakshay'
    }
  }
}
Here, I have kept attendees as a Map inside the Event document. Is it right or should I make Attendees as a collection inside Event document?
Also, "user": 'Lakshay' inside "attendees" is just a string. Is it advisable to use reference data type of Firebase?
Every user is a different document. Am I doing it correctly?
Yes, this is quite common. Initially it may seem a bit weird to have documents with so little data, but over time you'll get used to it (and likely add more data to each user).
I have kept Mobile number of every user as Document Id instead of auto id, as the mobile number is going to be unique and it will help me in querying. Is this right?
If the number is unique for each user in the context of your app, then it can be used to identify users, and thus also as the ID of the user profile documents. It is slightly more idiomatic to use the user's UID for this purpose, since that is the most common way to look up a user. But if you use phone numbers for that and they are unique for each user, you can also use that.
Here, I have kept attendees as a Map inside the Event document. Is it right or should I make Attendees as a collection inside Event document?
That depends...
Storing the events for a user in a single document means you have a limit to how many events you can store for a user, as a document can be no bigger than 1MB.
Storing the events for a user in their document means you always read the data for a user's events, even when you maybe only need to have the user's name. So you'll be reading more data than needed, wasting bandwidth for both you and your users.
Storing the events inside a subcollection allows you to query them, and read a subset of the events of a user.
On the other hand, using a subcollection means you end up reading more, smaller documents. So you'd be paying for more document reads from a subcollection, while paying less for bandwidth.
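Applied to the attendees question, the trade-off can be sketched with a plain in-memory model that counts document reads. This is not Firestore code: the `docs` map, the `readDoc` counter, the collection names, and the extra sample attendees are all assumptions made for illustration.

```javascript
// In-memory stand-in for Firestore: document path -> document data.
const docs = new Map();
let documentReads = 0;

// Every lookup counts as one billed document read in this model.
function readDoc(path) {
  documentReads += 1;
  return docs.get(path);
}

// Option A: attendees embedded as a map inside the event document.
docs.set("eventsEmbedded/e1", {
  name: "test",
  attendees: { u1: "Lakshay", u2: "Asha", u3: "Ravi" },
});

// Option B: attendees stored as documents in a subcollection of the event.
docs.set("events/e1", { name: "test" });
docs.set("events/e1/attendees/u1", { name: "Lakshay" });
docs.set("events/e1/attendees/u2", { name: "Asha" });
docs.set("events/e1/attendees/u3", { name: "Ravi" });

// Option A: a single read, but it always carries every attendee along.
function eventNameEmbedded(eventId) {
  return readDoc(`eventsEmbedded/${eventId}`).name;
}

// Option B: read only the first `limit` attendee documents.
function firstAttendees(eventId, limit) {
  const prefix = `events/${eventId}/attendees/`;
  const paths = [...docs.keys()]
    .filter((path) => path.startsWith(prefix))
    .sort()
    .slice(0, limit);
  return paths.map((path) => readDoc(path).name);
}

const firstTwo = firstAttendees("e1", 2);
console.log(firstTwo, documentReads);
```

The subcollection variant pays one read per attendee it fetches but never transfers attendees it did not ask for; the embedded variant is always one read, however large the map has grown.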
Also, "user": 'Lakshay' inside "attendees" is just a string. Is it advisable to use reference data type of Firebase?
This makes fairly little difference, as there's not a lot of extra functionality that Firestore's DocumentReference field type gives.

MeteorJS + MongoDB: How should I set up my collections when users can have the same document?

I wasn't quite sure how to word my question in one line, but here's a more in depth description.
I'm building a Meteor app where multiple users can "own" the same document. For example, a user has a list of movies they own, and of course multiple people can own the same movie. There are several ways I've thought of structuring my database/collections for this, but I'm not sure which would be best.
I should also note that the movie info comes from an external API, that I'm currently storing into my own database as people find them in my app to speed up the next lookup.
Option 1 (My current config):
One collection (Movies) that stores all the movies and their info. Another collection that basically stores a list of movie ids in each document based on userId. On startup, I get the list of ids, find the movies in my database, and store them in local collections (there are 3 of them). The benefit that I see from this is I only have to store the movie once. The downside that I've run into so far is difficulty in keeping things in sync and properly loading on startup (waiting on the local collections to populate).
Option 2 :
A Movies collection that stores a list of movie objects for each user. This makes the initial lookup and updating very simple, but it means I'll be storing the same fairly large documents multiple times.
Option 3:
A Movies collection with an array of userids on each movie that own that movie. This sounds pretty good too, but when I update the movie with new info, will an upsert work and keep the userids safe?
Option 3 seems sensible. Some of the choice may depend on the scale of each collection or the amount of links (will many users own the same movie, will users own many movies).
Some helpful code snippets for using option 3 (movieId and userId stand for the ids you already have):
Upsert a movie detail (does not affect any other fields on the document if it already exists):
Movies.upsert({name: "Jaws"}, {$set: {year: 1975}});
Set that a user owns a movie (this also leaves other document fields alone; $addToSet will not add the value twice if it is already in the array, whereas $push would create duplicates):
Movies.update({_id: movieId}, {$addToSet: {userIds: userId}});
Set that a user no longer owns a movie:
Movies.update({_id: movieId}, {$pull: {userIds: userId}});
Find all movies that a user owns (Mongo automatically searches inside the field's array value):
Movies.find({userIds: userId});
Find all movies that a user owns, but exclude the userIds field from the result (this keeps the documents small when movie.userIds is a large array, and protects the privacy of other users' ownership):
// In Meteor, field projections are passed through the `fields` option.
Movies.find({userIds: userId}, {fields: {userIds: 0}});
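To see why the `$set` upsert leaves `userIds` alone, here is a small plain-JavaScript model of the three operators used above. It is not Meteor or MongoDB code: `applySet`, `addToSet`, and `pull` are hypothetical helpers that only mimic the operators' behaviour for top-level fields on an in-memory object, and the ids are sample values.

```javascript
// Mimics $set: overwrites only the named top-level fields.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

// Mimics $addToSet: appends the value unless it is already present.
function addToSet(doc, field, value) {
  const current = doc[field] ?? [];
  return current.includes(value)
    ? doc
    : { ...doc, [field]: [...current, value] };
}

// Mimics $pull: removes every occurrence of the value from the array.
function pull(doc, field, value) {
  return { ...doc, [field]: (doc[field] ?? []).filter((v) => v !== value) };
}

let jaws = { _id: "m1", name: "Jaws" };
jaws = addToSet(jaws, "userIds", "alice");
jaws = addToSet(jaws, "userIds", "alice"); // second add is a no-op
jaws = addToSet(jaws, "userIds", "bob");
jaws = applySet(jaws, { year: 1975 });     // userIds is not touched
jaws = pull(jaws, "userIds", "alice");
console.log(jaws);
```

The point of the model is that an update built from operators touches only the fields it names, whereas replacing the whole document with fresh API data would discard the `userIds` array.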

Many to many relationship on Mongodb based e-learning webapp?

I am relatively new to No-SQL databases. I am designing a data structure for an e-learning web app. There would be X quantity of courses and Y quantity of users.
Every user will be able to take any number of courses.
Every course will be composed of many sections (each section may be a video or a quiz).
I will need to keep track of every section a user takes, so I think the whole course should be part of the user set (for each user), like so:
{
  _id: "ed",
  name: "Eduardo Ibarra",
  courses: [
    {
      name: "Node JS",
      progress: "100%",
      section: [
        {name: "Introduction", passed:"100%", field3:"x", field4:""},
        {name: "Quiz 1", passed:"75%", questions:[...], field3:"x", field4:""},
      ]
    },
    {
      name: "MongoDB",
      progress: "65%",
      ...
    }
  ]
}
Is this the best way to do it?
I would say: design your database around your queries. One thing is for sure: you will have to do some embedding.
If you are going to perform more queries on what a user is doing, then make user as the primary entity and embed the courses within it. You don't need to embed the entire course info. The info about a course is static. For ex: the data about Node JS course - i.e. the content, author of the course, exercise files etc - will not change. So you can keep the courses' info separately in another collection. But how much of the course a user has completed is dependent on the individual user. So you should only keep the id of the course (which is stored in the separate 'course' collection) and for each user you can store the information that is related to that (User, Course) pair embedded in the user collection itself.
Now the most important question: what do you do when a query requires a 'join' of the user and course collections? For this you can use JavaScript to first get the courses (and maybe store them in an array or list) and then fetch the users for each of those courses from the users collection, or vice versa. There are a few drivers available online to help you accomplish this. One is UnityJDBC which is available here.
From my experience, knowing what you are going to query from MongoDB is very helpful in designing your database, because the NoSQL nature of MongoDB means there is no single correct design. Any design is wrong if it does not allow you to accomplish your task. So knowing beforehand what you will do with the database (i.e. what you will query) is the only real guide.
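The split described above, with static course data in its own collection and per-user progress embedded in the user document, can be sketched in plain JavaScript as follows. The arrays and maps stand in for MongoDB collections, and the `courseId` field, the numeric `progress` values, the second sample user, and both helper functions are assumptions for illustration.

```javascript
// Static course data lives in its own collection, keyed by course id.
const courses = new Map([
  ["nodejs", { name: "Node JS", sections: ["Introduction", "Quiz 1"] }],
  ["mongodb", { name: "MongoDB", sections: ["Introduction"] }],
]);

// Each user embeds only the per-(user, course) state plus the course id.
const users = [
  {
    _id: "ed",
    name: "Eduardo Ibarra",
    courses: [
      { courseId: "nodejs", progress: 100 },
      { courseId: "mongodb", progress: 65 },
    ],
  },
  { _id: "ana", name: "Ana", courses: [{ courseId: "nodejs", progress: 40 }] },
];

// Application-level "join": attach static course info to a user's progress.
function coursesForUser(user) {
  return user.courses.map((entry) => ({
    ...entry,
    name: courses.get(entry.courseId).name,
  }));
}

// Reverse direction: which users are enrolled in a given course?
function usersInCourse(courseId) {
  return users
    .filter((user) => user.courses.some((entry) => entry.courseId === courseId))
    .map((user) => user._id);
}

console.log(coursesForUser(users[0]));
console.log(usersInCourse("nodejs"));
```

Course content is stored once, and only the small progress records are duplicated per user; the cost is the extra lookup whenever both halves are needed together.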

Links vs References in Document databases

I am confused by the term 'link' for connecting documents.
In OrientDB page http://www.orientechnologies.com/orientdb-vs-mongodb/ it states that they use links to connect documents, while in MongoDB documents are embedded.
Since in MongoDB http://docs.mongodb.org/manual/core/data-modeling-introduction/ documents can be referenced as well, I cannot see the difference between linking documents and referencing them.
The goal of Document Oriented databases is to reduce "Impedance Mismatch" which is the degree to which data is split up to match some sort of database schema from the actual objects residing in memory at runtime. By using a document, the entire object is serialized to disk without the need to split things up across multiple tables and join them back together when retrieved.
That being said, a linked document is the same as a referenced document. They are simply two ways of saying the same thing. How those links are resolved at query time varies from one database implementation to another.
An embedded document, by contrast, is an object that relates to a parent type and is stored inside the parent. For example, I have a class as follows:
class User
{
    string Name
    List<Achievement> Achievements
}
Where Achievement is an arbitrary class (its contents don't matter for this example).
If I were to save this using linked documents, I would save User in a Users collection and Achievement in an Achievements collection with the List of Achievements for the user being links to the Achievement objects in the Achievements collection. This requires some sort of joining procedure to happen in the database engine itself. However, if you use embedded documents, you would simply save User in a Users collection where Achievements is inside the User document.
A JSON representation of the data for an embedded document would look (roughly) like this:
{
  "name":"John Q Taxpayer",
  "achievements":
  [
    {
      "name":"High Score",
      "point":10000
    },
    {
      "name":"Low Score",
      "point":-10000
    }
  ]
}
Whereas a linked document might look something like this:
{
  "name":"John Q Taxpayer",
  "achievements":
  [
    "somelink1", "somelink2"
  ]
}
Inside an Achievements Collection
{
  "somelink1":
  {
    "name":"High Score",
    "point":10000
  },
  "somelink2":
  {
    "name":"Low Score",
    "point":-10000
  }
}
Keep in mind these are just approximate representations.
So to summarize, linked documents function much like RDBMS PK/FK relationships. This allows multiple documents in one collection to reference a single document in another collection, which can help with deduplication of stored data. However, it adds a layer of complexity, requiring the database engine to make multiple disk I/O calls to form the final document returned to user code. An embedded document more closely matches the object in memory, which reduces Impedance Mismatch and (in theory) the number of disk I/O calls.
You can read up on Impedance Mismatch here: http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
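As an illustration of what resolving links involves, the sketch below turns the linked form of the sample user into the embedded form, counting one lookup per link. It is a plain in-memory model rather than MongoDB or OrientDB code, and `resolveLinks` is a hypothetical helper.

```javascript
// Achievements collection: link id -> achievement document.
const achievements = new Map([
  ["somelink1", { name: "High Score", point: 10000 }],
  ["somelink2", { name: "Low Score", point: -10000 }],
]);

// Linked form: the user stores only the ids of its achievements.
const linkedUser = {
  name: "John Q Taxpayer",
  achievements: ["somelink1", "somelink2"],
};

// Resolving links costs one extra lookup per link (the "join" step).
function resolveLinks(user, collection) {
  let lookups = 0;
  const resolved = user.achievements.map((id) => {
    lookups += 1;
    return collection.get(id);
  });
  return { doc: { ...user, achievements: resolved }, lookups };
}

// The result has the same shape as the embedded representation.
const { doc: embeddedUser, lookups } = resolveLinks(linkedUser, achievements);
console.log(JSON.stringify(embeddedUser, null, 2), lookups);
```

An embedded document skips this step entirely, while the linked form lets several users point at the same achievement document without copying it.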
UPDATE
I should add, that choosing the right database to implement for your needs is very important from the start. If you have a lot of questions about each database, it might make sense to contact each supplier and get some of their training material. MongoDB offers 2 free courses you can take to learn more about their product and best uses at MongoDB University. OrientDB does offer training, however it is not free. It might be best to try contacting them directly and getting some sort of pre-sales training (if you are looking to license the db), usually they will put you in touch with some sort of pre-sales consultant to help you evaluate their product.
MongoDB works like RDBMS where the object id is like a foreign key. This means a "JOIN" that is run-time expensive. OrientDB, instead, has direct links that are created only once and have a very low run-time cost.