I need to model a many-to-many relationship in Firestore. A summary of the requirements follows.
A company can hire many contractors for a project. A contractor can work for many companies on different projects at different times.
There should be no limit on the number of contractors or companies, i.e. collections or sub-collections should be used.
A contractor should be able to query by companies; and vice versa, a company should be able to query by contractors. For example, (1) a contractor might ask for a list of companies he/she worked for sorted by project & time, and (2) a company can ask for all contractors who worked for them over a month sorted by project & contractor, and possibly divided by week.
As far as the company is concerned, a contractor can change status, e.g. working, complete. A company changes the status of a contractor during the project lifetime. This status can be used in queries.
Obviously, contractors should not have access to other contractors' information.
A company is represented by only a single user on the mobile app. Similarly, a contractor is represented by only a single user on the mobile app.
The mobile app is built in React Native, which (to the best of my knowledge) is considered by Firestore as a web app.
I am thinking of using a sub-collection of documents for/under each company. Each document represents a project. All contractors' names, their statuses and start and end times are stored on this document.
At the same time, I would keep a duplicate sub-collection of project documents for/under each contractor. Each of these duplicate documents is a partial copy of the project's document (above). This duplicate document stores the company name and the start and end time of the project.
a. Whenever a relationship is established, e.g. a contract is signed, both documents are created in a batch.
b. Status exists only on the 1st copy of the document.
c. In case of any rare changes to the almost static data, e.g. name, phone, both documents are updated.
Does this design make sense?
Any concerns, suggestions, better ideas?
If you agree with the design, I would love to hear from you, maybe you can write in a comment something like sounds good.
There are particular cases when you can use a sub-collection and when not to use sub-collections.
When to use sub-collections:
1) When you don't want to store a lot of fields in a document. Cloud Firestore has a limit of 20,000 fields per document. (If the Company and Contractor information is large enough to exceed 20,000 fields)
2) When updating the parent document is a common operation. Firestore only lets you update a single document at a sustained rate of about 1 write per second. (If the Company and Contractor information is modified very often)
3) When you want to limit the access to particular fields of a document. (If you want to restrict the access to a Company's contractors or if the access to Contractor's companies should be restricted. In this case moving the restricted fields to another document in another collection is also a good idea!)
When not to use sub-collections:
1) When you want to query the collections and sub-collections together. Firestore queries are shallow. So sub-collections won't be queried when you query the parent collection so you have to query them separately. (If you have a case to show all the companies and their contractors in one window)
2) When you want to show the sub-collection when viewing the collection. (When showing a company, you might want to show its contractors. Here the number of reads will increase, because instead of reading one document you are reading one document and its sub-collection every time)
3) When you want to query across collections and sub-collections for something they have in common. (Here you can use the newly announced collection group queries, e.g. to query something that's common across the Companies and Contractors, such as field of work or minimum rate)
4) If you're thinking about querying individual pieces of data, you should put them in a collection. (If the Contractor's particular attributes are usually queried by Companies or a Company's details are looked upon by multiple Contractors)
My Suggestion:
Company collection to store company information on which companies can be searched according to their qualities.
Contractors collection with the same approach since I'm assuming contractors will be queried a lot according to their attributes.
Projects sub-collection for info about the projects on which companies and contractors will collaborate. This can be a sub-collection under the Company collection if only one company will be working on a project. Even if multiple contractors are going to be working on a project for a company, you can store the contractors' IDs in an array in the project document. This will help you avoid the partial Projects sub-collection inside each Company/Contractor collection.
But if you need to query on the project's qualities, it is better to expose them as a separate parent collection. I leave that up to you.
Finally, I would suggest a new collection, Contracts, which can be used to store the relationship between Company, Contractor and Project, along with all the information on which you want to do complex querying. If the same company and contractor have two different projects on which they are collaborating, then these can be two documents in the Contracts collection. This comes in handy when you want to show some dashboards. Using this single collection you can show separate statistics for a Company or a Contractor, and complex statistics involving both.
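A minimal sketch of what such a Contracts collection might hold, using plain in-memory objects rather than the Firestore SDK. The field names (companyId, contractorId, projectId, status, startAt, endAt), the sample data and the queryContracts helper are all assumptions made for illustration:

```javascript
// Hypothetical shape of documents in a top-level Contracts collection.
// One document per (company, contractor, project) relationship.
const contracts = [
  { id: "c1", companyId: "acme", contractorId: "ann", projectId: "p1", status: "working", startAt: "2019-05-01", endAt: "2019-05-20" },
  { id: "c2", companyId: "acme", contractorId: "bob", projectId: "p1", status: "complete", startAt: "2019-05-03", endAt: "2019-05-10" },
  { id: "c3", companyId: "zeta", contractorId: "ann", projectId: "p9", status: "complete", startAt: "2019-03-01", endAt: "2019-03-15" },
];

// In-memory stand-in for "equality filter on one field, then sort by several keys".
function queryContracts(docs, field, value, sortKeys) {
  return docs
    .filter((d) => d[field] === value)
    .sort((a, b) => {
      for (const key of sortKeys) {
        if (a[key] < b[key]) return -1;
        if (a[key] > b[key]) return 1;
      }
      return 0;
    });
}

// (1) Companies a contractor worked for, sorted by project, then start time.
const annHistory = queryContracts(contracts, "contractorId", "ann", ["projectId", "startAt"]);

// (2) Contractors a company hired, sorted by project, then contractor.
const acmeStaff = queryContracts(contracts, "companyId", "acme", ["projectId", "contractorId"]);
```

In Firestore, each of these would be a query against the single Contracts collection, probably backed by a composite index. Because status lives on the same document, it can be used as a further filter, and access rules can be keyed on the companyId and contractorId fields.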
Hope this helps.
Related
I'm having trouble correctly designing the domain that I'm working on.
My straightforward use case is the following:
The user (~5,000 users) can access a list of ads (~5 million).
He can choose to add/remove some of them as favorites.
He can decide to show/hide some of them.
I have a command which will mutate the aggregate state, to set Favorite to TRUE, let's say.
In terms of DDD, how should I design the aggregates?
How should I design the relationship between a user and his selection of favorite ads?
Considering the large number of ads, I cannot duplicate each ad inside a user aggregate root.
Can I design an Ad aggregate root containing a "collection" of users?
And finally, how should I handle the read models?
Thanks in advance
Cheers
Two concepts may help you understand how to model this:
1. Aggregates are Transaction Boundaries.
An aggregate is a cluster of associated objects that are considered as a single unit. All parts of the aggregate are loaded and persisted together.
If you have an aggregate that encloses 1,000 entities, then you have to load all of them into memory. So it follows that you should prefer small aggregates whenever possible.
2. Aggregates are Distinct Concepts.
An Aggregate represents a distinct concept in the domain. Behavior associated with more than one Aggregate (like Favoriting, in your case) is usually an aggregate by itself with its own set of attributes, domain objects, and behavior.
From your example, User is a clear aggregate.
An Ad has a distinct concept associated with it in the domain, so it is an aggregate too. There may be other attributes embedded within the Ad, like valid_until, description, is_active, etc.
The concept of favoriting an Ad links the User and the Ad aggregates. Your question seems to be centered around where this linkage should be preserved. Should it be in the User aggregate (a list of Ads), or should an Ad have a collection of User objects embedded within it?
While both are possibilities, I think FavoriteAd is yet another aggregate, which holds references to both the User aggregate and the Ad aggregate. This way, you don't burden the concepts of User or Ad with favoriting behavior.
Those aggregates will also not be required to load this additional data every time they are loaded into memory. For example, if you are loading an Ad object to edit its contents, you don't want the favorites collection to be loaded into memory by default.
These aggregate structures don't matter as far as read models are concerned. Aggregates only deal with the write side of the domain. You are free to rewire the data any way you want, in multiple forms, on the read side. You can have a subscriber just to listen to the Favorited event (raised after processing the Favorite command) and build a composite data structure containing data from both the User and the Ad aggregates.
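The split between a small FavoriteAd aggregate on the write side and a composite structure on the read side could be sketched roughly as follows. This is only an illustration: the event name AdFavorited, the in-memory Maps standing in for User and Ad lookups, and the createFavoritesProjection helper are all hypothetical:

```javascript
// Write side: a hypothetical FavoriteAd holds only the two ids, not the User or the Ad.
function favoriteAd(userId, adId) {
  return { type: "AdFavorited", userId, adId };
}

// Read side: a subscriber that folds events into a per-user composite view.
// `users` and `ads` stand in for whatever lookups the read side has available.
function createFavoritesProjection(users, ads) {
  const view = new Map(); // userId -> { userName, favorites: [{ adId, title }] }
  return {
    handle(event) {
      if (event.type !== "AdFavorited") return;
      const entry = view.get(event.userId) ?? {
        userName: users.get(event.userId).name,
        favorites: [],
      };
      entry.favorites.push({ adId: event.adId, title: ads.get(event.adId).title });
      view.set(event.userId, entry);
    },
    favoritesOf(userId) {
      return view.get(userId)?.favorites ?? [];
    },
  };
}

const users = new Map([["u1", { name: "Dana" }]]);
const ads = new Map([
  ["a7", { title: "Bike for sale" }],
  ["a9", { title: "Flat to rent" }],
]);

const projection = createFavoritesProjection(users, ads);
projection.handle(favoriteAd("u1", "a7"));
projection.handle(favoriteAd("u1", "a9"));
```

Neither the User nor the Ad object carries a favorites list here; the combined view exists only in the projection.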
I really like the answer given by Subhash Bhushan and I want to add another approach for you to consider.
If you look closely at your question you will see that you've made the assumption that an aggregate can 'see' everything that the user does when they are interacting with the UI. This doesn't need to be so.
Depending on the requirements of the domain you don't need to hold a list of any Ads in the aggregate to favourite them. Here's what I mean:
For this example, it doesn't matter where the 'favourite' ad command sits. It could be on the user aggregate or on a specific aggregate for handling the concept of Favouriting. The command just needs to hold the ids of the User and the Ad they are favouriting.
You may need to handle what happens if a user or ad is deleted but that would just be a case of an event process manager listening to the appropriate events and issuing compensating commands.
This way you don't need to load up 5 million ads. That's a job for the read model and UI, not the domain.
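A rough sketch of that idea, under stated assumptions: the command and event names (FavouriteAd, UnfavouriteAd, AdDeleted), the in-memory Set used as a store, and the "userId:adId" key format are all invented for illustration, and ids are assumed not to contain a colon:

```javascript
// Hypothetical write-side store of favourites, keyed by "userId:adId".
const favourites = new Set();

// Command handler: it needs only the two ids, never the 5 million ads.
function handle(command) {
  const key = `${command.userId}:${command.adId}`;
  switch (command.type) {
    case "FavouriteAd":
      favourites.add(key);
      return [{ type: "AdFavourited", userId: command.userId, adId: command.adId }];
    case "UnfavouriteAd":
      // Emit an event only if there was something to remove.
      return favourites.delete(key)
        ? [{ type: "AdUnfavourited", userId: command.userId, adId: command.adId }]
        : [];
    default:
      throw new Error(`Unknown command: ${command.type}`);
  }
}

// Process manager: reacts to an AdDeleted event by issuing compensating commands.
function onAdDeleted(event) {
  const commands = [];
  for (const key of favourites) {
    const [userId, adId] = key.split(":");
    if (adId === event.adId) commands.push({ type: "UnfavouriteAd", userId, adId });
  }
  return commands;
}

handle({ type: "FavouriteAd", userId: "u1", adId: "a7" });
handle({ type: "FavouriteAd", userId: "u2", adId: "a7" });
handle({ type: "FavouriteAd", userId: "u1", adId: "a9" });

const compensations = onAdDeleted({ type: "AdDeleted", adId: "a7" });
for (const command of compensations) handle(command);
```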
Just a thought.
I'm working on developing an app as part of my college assignment. It's a project management app, and I'm having trouble deciding the best way to store users and teams in my Postgres DB. Basically, users can signup and create/join teams. A user can be a part of multiple teams (each working on multiple projects). Users also have roles in teams (with varying permissions according to the role) and while they have only one role in a given team, they may have a different role in another one. In addition, users can mark some of their teams as favorites for easy access through the front-end.
I've come up with 3 ERDs to solve this.
First, store all users in one table and all teams in another. The users table has all the data pertaining to a user, while the team table has the team data along with the members, roles and whether or not a user has marked this team as a favorite - like below.
This will have a lot of data duplication - if a team has a hundred members, there will be 100 entries where teamid, name, description are the same.
So, in v2 I separated them and added a members table. Now, each team is saved once, and so is each user. A reference to the team and user is made each time a user joins/creates a team and is stored in the members table along with the user's role and whether or not they have favorited the team.
But, I thought it might be bad to save roles as a string. If roles ever need to be changed/updated or I need to add new roles/rename roles, it would be easier with an ID rather than a string (I think).
So, then I came up with this.
Now all roles, users and teams are stored once (it's possible that I've made the roles table into something like a lookup table, which I've heard is bad practice). All of these can be referenced in the members table.
My DBMS concepts are a little weak, though I have tried my best to follow the steps to normalize it and bring it into BCNF. But I'm still unsure if I've done this right, or what to fix if something is wrong.
So essentially, I would like to know:
Is my table structure correct or incorrect?
Should everything be split into multiple tables, or is some data duplication okay (since I can use multiple or creative queries to get whatever I need)?
I like your ERD3 best. I don't think it is overkill, I think it looks fine. Having a "members" table be mostly foreign keys into other tables is a common thing.
It is not necessary to eliminate every trace of commonality in every table - sometimes it is more efficient to put up with a small amount of duplication - but in your example I think your ERD3 looks good.
We recently started to work on a big project and we decided to use MongoDB as our database.
We wrote a lot of code, but the project has started to grow and we found out that we're trying to use joins instead of doing it the NoSQL way, which points to a bad database design.
What I'm trying to ask here is a good design for our project, which, at this point consists of the following:
More than 12,000 Products
More than 2,000 Sellers
Every seller should have their own private area that allows them to create a product catalog based on the 12,000+ "products template list".
The seller should be able to set the price, stock and offers, which will then be reflected only in his public product listing. The template list of products will remain unchanged.
Currently we have two collections. One for the products (which holds the general product information, like name, description, photos, etc...) and one collection in which we store documents that contain the ID of the product from the first collection, an ID that is related to the seller and the stock, price and offers values.
We are using aggregate with $lookup to "emulate" SQL's left join to merge the two collections, but the process is not scaling as we'd like it to and we're hitting serious performance issues.
We're aware that using joins is not the way to go in NoSQL. What should we do? How should we refactor our database design? Should we embed the prices, offers and stock for each seller in each document?
Whether to use embedded documents or joins across two or more collections should depend on how you are going to retrieve the data. If you fetch the seller data every time you fetch a product, it makes sense to embed it rather than keep it in a separate collection. But if you plan to fetch the two entities separately, the only option you are left with is a join.
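As one possible reading of the embedded option for this case, here is a sketch that copies the few template fields a public listing needs into a per-seller listing document, so the listing can be read without a join. The field names, the buildListing helper and the "sellerId:productId" id format are assumptions, not part of the question:

```javascript
// Hypothetical product template, as stored in the shared products collection.
const template = {
  _id: "prod-1",
  name: "Espresso machine",
  description: "15 bar pump",
  photos: ["a.jpg"],
};

// Build the document stored in a per-seller listings collection.
// Only the fields the public listing needs are copied from the template.
function buildListing(template, sellerId, { price, stock, offers = [] }) {
  return {
    _id: `${sellerId}:${template._id}`,
    sellerId,
    productId: template._id, // reference kept for the occasional full-detail read
    name: template.name,
    photo: template.photos[0] ?? null,
    price,
    stock,
    offers,
  };
}

const listing = buildListing(template, "seller-42", { price: 199.0, stock: 3 });
```

The trade-off is that copied fields such as name must be rewritten in every listing if the template ever changes, which matters less if the template list really does stay unchanged.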
I'm developing an app with my team. It's based on Meteor and React. We have 2 collections: Rooms and Locations. Each room has a unique location. We have a page where we list all the rooms and can filter them. This is the most used feature. Inserting a new room or a new location can be done only by the admin.
We are designing our filter (by date, by floor, by time, by location name). All the properties we need are in the Rooms collection, except for the location name. We came up with two solutions:
duplicate the location name used in the filter also for each room in the Rooms collections.
get the list of rooms for each property.
I'm trying to figure out which one is best.
First option:
In that case we only need to query one collection, Rooms, and filtering will cost O(n). The cost of adding the location name to a new room is about the same, since we already need to add the property id. The only extra cost is the MongoDB storage space for the duplicated name.
Second option:
In this solution we have all the data well structured in the DB. But to filter by location name we need to go through each room and find its location in the Locations collection. I think this alone will cost O(n*m).
This is a simple case and we will never scale too much, but since I'm new to Mongo I would like to know which of the two approaches leads to better performance.
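For comparison, here is a small in-memory sketch of both options. The sample data and field names (locationId, locationName) are assumptions. It also shows that the second option does not have to be O(n*m) if the locations are indexed by id first:

```javascript
// Hypothetical sample data; field names are assumptions.
const locations = [
  { _id: "l1", name: "Main Campus" },
  { _id: "l2", name: "Annex" },
];
const rooms = [
  { _id: "r1", floor: 1, locationId: "l1" },
  { _id: "r2", floor: 2, locationId: "l2" },
  { _id: "r3", floor: 1, locationId: "l2" },
];

// Second option without the nested scan: index locations by id once (O(m)),
// then each room lookup is O(1), so the whole filter is O(n + m).
function filterRoomsByLocationName(rooms, locations, name) {
  const nameById = new Map(locations.map((loc) => [loc._id, loc.name]));
  return rooms.filter((room) => nameById.get(room.locationId) === name);
}

// First option: the name is duplicated onto each room when it is saved,
// so filtering later is a single pass over Rooms.
function denormalize(rooms, locations) {
  const nameById = new Map(locations.map((loc) => [loc._id, loc.name]));
  return rooms.map((room) => ({ ...room, locationName: nameById.get(room.locationId) }));
}

const annexRooms = filterRoomsByLocationName(rooms, locations, "Annex");
const flatRooms = denormalize(rooms, locations).filter((r) => r.locationName === "Annex");
```

Both calls select the same rooms; the difference is whether the name lookup happens at read time or once at write time.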
I'm more used to a relational database and am having a hard time thinking about how to design my database in mongoDB, and am even more unclear when taking into account some of the special considerations of database design for meteorjs, where I understand you often prefer separate collections over embedded documents/data in order to make better use of some of the benefits you get from collections.
Let's say I want to track students' progress in high school. They need to complete certain required classes each school year in order to progress to the next year (freshman, sophomore, junior, senior), and they can also complete some electives. I need to track when the students complete each requirement or elective. And the requirements may change slightly from year to year, but I need to remember, for example, that Johnny completed all of the freshman requirements as they existed two years ago.
So I have:
Students
Requirements
Electives
Grades (frosh, etc.)
Years
Mostly, I'm trying to think about how to set up the requirements. In a relational DB, I'd have a table of requirements, with className, grade, and year, and a table of student_requirements, that tracks the students as they complete each requirement. But I'm thinking in MongoDB/meteorjs, I'd have a model for each grade/level that gets stored with a studentID and initially instantiates with false values for each requirement, like:
{
  student: [studentID],
  class: 'freshman',
  year: 2014,
  requirements: {
    class1: false,
    class2: false
  }
}
and as the student completes a requirement, it updates like:
{
  student: [studentID],
  class: 'freshman',
  year: 2014,
  requirements: {
    class1: false,
    class2: [completionDateTime]
  }
}
So in this way, each student will collect four Requirements documents, which are somewhat dictated by their initial instantiation values. And instead of the actual requirements for each grade/year living in the database, they would essentially live in the code itself.
Some of the actions I would like to be able to support are marking off requirements across a set of students at one time, and showing a grid of users/requirements to see who needs what.
Does this sound reasonable? Or is there a better way to approach this? I'm pretty early in this application and am hoping to avoid painting myself into a corner. Any help or suggestion is appreciated. Thanks! :-)
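The document shape and the two actions described above could be sketched in plain JavaScript as below. This is only an illustration: it keeps the requirement definitions as data keyed by grade and year (a variation on having them live in the code), and the names requirementSets, createProgress, markCompleted and openRequirements are hypothetical:

```javascript
// Hypothetical requirement definitions, stored as data and versioned by year.
const requirementSets = [
  { grade: "freshman", year: 2014, classes: ["class1", "class2"] },
  { grade: "freshman", year: 2015, classes: ["class1", "class3"] },
];

// Instantiate a student's progress document from the matching definition,
// with every requirement starting out as false.
function createProgress(studentId, grade, year) {
  const set = requirementSets.find((s) => s.grade === grade && s.year === year);
  if (!set) throw new Error(`No requirements for ${grade} ${year}`);
  const requirements = Object.fromEntries(set.classes.map((c) => [c, false]));
  return { student: studentId, class: grade, year, requirements };
}

// Mark one requirement complete for a set of students at one time.
function markCompleted(docs, studentIds, className, completedAt) {
  for (const doc of docs) {
    if (studentIds.includes(doc.student) && className in doc.requirements) {
      doc.requirements[className] = completedAt;
    }
  }
}

// Grid of who needs what: student -> list of still-open requirements.
function openRequirements(docs) {
  return Object.fromEntries(
    docs.map((d) => [
      d.student,
      Object.keys(d.requirements).filter((c) => d.requirements[c] === false),
    ])
  );
}

const docs = [
  createProgress("s1", "freshman", 2014),
  createProgress("s2", "freshman", 2014),
];
markCompleted(docs, ["s1"], "class2", new Date("2014-11-03T10:00:00Z"));
const grid = openRequirements(docs);
```

Because each progress document records its own year and its own copy of the requirement keys, a later change to the definitions does not alter what an earlier student is recorded as having completed.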
Currently I'm thinking about my application data design too. I've read the examples in the MongoDB manual
look up MongoDB manual data model design - docs.mongodb.org/manual/core/data-model-design/
and here -> MongoDB manual one to one relationship - docs.mongodb.org/manual/tutorial/model-embedded-one-to-one-relationships-between-documents/
(sorry I can't post more than one link at the moment in an answer)
They say:
In general, use embedded data models when:
you have “contains” relationships between entities.
you have one-to-many relationships between entities. In these relationships the “many” or child documents always appear with or are viewed in the context of the “one” or parent documents.
The normalized approach uses a reference in a document, to another document. Just like in the Meteor.js book. They create a web app which shows posts, and each post has a set of comments. They use two collections, the posts and the comments. When adding a comment it's submitted together with the post_id.
So in your example you have a students collection. And each student has to fulfill requirements? And each student has their own requirements, like a post has its own comments?
Then I would handle it like they did in the book. With two collections. I think that should be the normalized approach, not the embedded.
I'm a little confused myself, so maybe you can tell me, if my answer makes sense.
Maybe you can help me too? I'm trying to make an app that manages a flea market.
Users of the app create events.
The creator of the event invites users to be cashiers for that event.
Users create lists of stuff they want to sell. There is a max. number of lists/sellers per event and a max. number of positions on a list (25/50).
Cashiers type in the positions of those lists at the event, to track what is sold.
Event creators make billings for the sold stuff of each list, to hand out the money afterwards.
I'm confused about how to set up the data design. I need Events and Lists. Do I use the normalized approach, or the embedded one?
Edit:
After reading percona.com/blog/2013/08/01/schema-design-in-mongodb-vs-schema-design-in-mysql/ I found the following advice:
If you read people information 99% of the time, having 2 separate collections can be a good solution: it avoids keeping in memory data that is almost never used (passport information) and when you need to have all information for a given person, it may be acceptable to do the join in the application.
Same thing if you want to display the name of people on one screen and the passport information on another screen.
But if you want to display all information for a given person, storing everything in the same collection (with embedding or with a flat structure) is likely to be the best solution