Which method of storing USERS, ROLES & TEAMS in my relational DB is most efficient - postgresql

I'm working on developing an app as part of my college assignment. It's a project management app, and I'm having trouble deciding the best way to store users and teams in my Postgres DB. Basically, users can signup and create/join teams. A user can be a part of multiple teams (each working on multiple projects). Users also have roles in teams (with varying permissions according to the role) and while they have only one role in a given team, they may have a different role in another one. In addition, users can mark some of their teams as favorites for easy access through the front-end.
I've come up with 3 ERDs to solve this.
First, store all users in one table and all teams in another. The users table has all the data pertaining to a user, while the team table has the team data along with the members, roles and whether or not a user has marked this team as a favorite - like below.
This will have a lot of data duplication - if a team has a hundred members, there will be 100 entries where teamid, name, description are the same.
So, in v2 I separated them and added a members table. Now, each team is saved once, and so is each user. A reference to the team and user is made each time a user joins/creates a team and is stored in the members table along with the user's role and whether or not they have favorited the team.
But, I thought it might be bad to save roles as a string. If roles ever need to be changed/updated or I need to add new roles/rename roles, it would be easier with an ID rather than a string (I think).
So, then I came up with this.
Now all roles, users and teams are stored once (it's possible that I've made the roles table into something like a lookup table, which I've heard is a bad practice). All these can be referenced in the members table.
My DBMS concepts are a little weak though I have tried my best to follow steps to normalize it and bring it into BCNF form. But I'm still unsure if I've done this right, or what to fix if something is wrong.
So essentially, I would like to know:
Is my table structure correct or incorrect?
Should everything be split into multiple tables, or is some data duplication okay (since I can use multiple or creative queries to get whatever I need)?

I like your ERD3 best. I don't think it is overkill, I think it looks fine. Having a "members" table be mostly foreign keys into other tables is a common thing.
It is not necessary to eliminate every trace of commonality in every table - sometimes it is more efficient to put up with a small amount of duplication - but in your example I think your ERD3 looks good.
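For illustration, here is a minimal sketch of what an ERD3-style layout could look like. The table and column names are assumptions (the original diagrams are not reproduced here), and SQLite via Python's built-in sqlite3 module stands in for PostgreSQL so the example runs anywhere. The composite primary key on members is what enforces "one role per user per team".

```python
import sqlite3

# Hypothetical table/column names; SQLite stands in for PostgreSQL here.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT NOT NULL, description TEXT);
CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE members (
    user_id     INTEGER NOT NULL REFERENCES users(id),
    team_id     INTEGER NOT NULL REFERENCES teams(id),
    role_id     INTEGER NOT NULL REFERENCES roles(id),
    is_favorite INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, team_id)  -- one role per user per team
);
"""


def build_db():
    """Create an in-memory database with a little sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    conn.executemany(
        "INSERT INTO teams VALUES (?, ?, ?)",
        [(1, "Apollo", "launch"), (2, "Borealis", "research")],
    )
    conn.executemany("INSERT INTO roles VALUES (?, ?)", [(1, "admin"), (2, "viewer")])
    conn.executemany(
        "INSERT INTO members VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 1), (1, 2, 2, 0), (2, 1, 2, 0)],
    )
    conn.commit()
    return conn


def memberships(conn, user_id):
    """Return (team name, role name, is_favorite) rows for one user."""
    return conn.execute(
        """
        SELECT t.name, r.name, m.is_favorite
        FROM members m
        JOIN teams t ON t.id = m.team_id
        JOIN roles r ON r.id = m.role_id
        WHERE m.user_id = ?
        ORDER BY t.name
        """,
        (user_id,),
    ).fetchall()
```

With this layout, renaming a role is a single-row UPDATE on roles, and every membership picks up the new name through the join.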

Related

How to model a domain model - aggregate root

I'm having some issues correctly designing the domain that I'm working on.
My straightforward use case is the following:
The user (~5000 users) can access a list of ads (~5 million)
He can choose to add/remove some of them as favorites.
He can decide to show/hide some of them.
I have a command which will mutate the aggregate state, to set Favorite to TRUE, let's say.
In terms of DDD, how should I design the aggregates?
How should I design the relationship between a user and his selection of favorite ads?
Considering the large number of ads, I cannot duplicate each ad inside a user aggregate root.
Can I design an Ads aggregate root containing a user "collection"?
And finally, how should I handle the read models part?
Thanks in advance
Cheers
Two concepts may help you understand how to model this:
1. Aggregates are Transaction Boundaries.
An aggregate is a cluster of associated objects that are considered as a single unit. All parts of the aggregate are loaded and persisted together.
If you have an aggregate that encloses 1,000 entities, then you have to load all of them into memory. So it follows that you should preferably have small aggregates whenever possible.
2. Aggregates are Distinct Concepts.
An Aggregate represents a distinct concept in the domain. Behavior associated with more than one Aggregate (like Favoriting, in your case) is usually an aggregate by itself with its own set of attributes, domain objects, and behavior.
From your example, User is a clear aggregate.
An Ad has a distinct concept associated with it in the domain, so it is an aggregate too. There may be other attributes embedded within the Ad, like valid_until, description, is_active, etc.
The concept of favoriting an Ad links the User and the Ad aggregates. Your question seems to be centered around where this linkage should be preserved. Should it be in the User aggregate (a list of Ads), or should an Ad have a collection of User objects embedded within it?
While both are possibilities, IMHO FavoriteAd is yet another aggregate, which holds references to both the User aggregate and the Ad aggregate. This way, you don't burden the concepts of User or the Ad with favoriting behavior.
Those aggregates will also not be required to load this additional data every time they are loaded into memory. For example, if you are loading an Ad object to edit its contents, you don't want the favorites collection to be loaded into memory by default.
These aggregate structures don't matter as far as read models are concerned. Aggregates only deal with the write side of the domain. You are free to rewire the data any way you want, in multiple forms, on the read side. You can have a subscriber just to listen to the Favorited event (raised after processing the Favorite command) and build a composite data structure containing data from both the User and the Ad aggregates.
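To make this concrete, here is a minimal sketch of a small FavoriteAd aggregate and a read-side subscriber. Everything apart from the FavoriteAd name and the Favorited event is an assumption for illustration (the AdFavorited class name, the field names, the in-memory read model); it is not tied to any particular DDD framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdFavorited:
    """Event raised after the Favorite command is processed (hypothetical shape)."""
    user_id: str
    ad_id: str


@dataclass
class FavoriteAd:
    """Small aggregate: references the User and Ad aggregates by id only."""
    user_id: str
    ad_id: str
    is_favorite: bool = False

    def favorite(self):
        """Set Favorite to TRUE; return the events raised (none if already set)."""
        if self.is_favorite:
            return []
        self.is_favorite = True
        return [AdFavorited(self.user_id, self.ad_id)]


class FavoritesReadModel:
    """Read side: a composite structure built from events, shaped for queries."""

    def __init__(self, ad_titles):
        self.ad_titles = ad_titles  # assumed lookup of ad id -> title
        self.by_user = {}

    def on_ad_favorited(self, event):
        entry = {"ad_id": event.ad_id, "title": self.ad_titles[event.ad_id]}
        self.by_user.setdefault(event.user_id, []).append(entry)
```

Neither the User nor the Ad aggregate is loaded to process the command; the read model is free to denormalize whatever the UI needs.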
I really like the answer given by Subhash Bhushan and I want to add another approach for you to consider.
If you look closely at your question you will see that you've made the assumption that an aggregate can 'see' everything that the user does when they are interacting with the UI. This doesn't need to be so.
Depending on the requirements of the domain you don't need to hold a list of any Ads in the aggregate to favourite them. Here's what I mean:
For this example, it doesn't matter where the 'favourite' ad command sits. It could be on the user aggregate or a specific aggregate for handling the concept of Favouriting. The command just needs to hold the id of the User and the Ad they are favouriting.
You may need to handle what happens if a user or ad is deleted but that would just be a case of an event process manager listening to the appropriate events and issuing compensating commands.
This way you don't need to load up 5 million ads. That's a job for the read model and UI, not the domain.
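A rough sketch of this idea, with all class and function names invented for illustration: the commands carry only two ids, the handler stores (user id, ad id) pairs, and a process-manager-style function reacts to an ad being deleted by issuing compensating commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FavouriteAd:
    """Command: holds only the ids, never the Ad itself."""
    user_id: str
    ad_id: str


@dataclass(frozen=True)
class UnfavouriteAd:
    """Compensating command."""
    user_id: str
    ad_id: str


@dataclass(frozen=True)
class AdDeleted:
    """Event published when an ad is removed (assumed to exist in the domain)."""
    ad_id: str


class FavouriteHandler:
    """Keeps (user_id, ad_id) pairs; no list of ads is ever loaded."""

    def __init__(self):
        self.pairs = set()

    def handle(self, command):
        pair = (command.user_id, command.ad_id)
        if isinstance(command, FavouriteAd):
            self.pairs.add(pair)
        elif isinstance(command, UnfavouriteAd):
            self.pairs.discard(pair)
        else:
            raise TypeError(f"unsupported command: {command!r}")


def compensate_ad_deleted(handler, event):
    """Process manager: issue an UnfavouriteAd for every user who favourited the ad."""
    commands = [
        UnfavouriteAd(user_id, ad_id)
        for (user_id, ad_id) in sorted(handler.pairs)
        if ad_id == event.ad_id
    ]
    for command in commands:
        handler.handle(command)
    return commands
```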
Just a thought.

Implementing Many to Many Relationships on Firestore

I need to model a many-to-many relation on Firestore. A summary of the requirements follows.
A company can hire many contractors for a project. A contractor can work for many companies on different projects at different times.
There should be no limit on the number of contractors or companies, i.e. collections or sub-collections should be used.
A contractor should be able to query by companies; and vice versa, a company should be able to query by contractors. For example, (1) a contractor might ask for a list of companies he/she worked for sorted by project & time, and (2) a company can ask for all contractors who worked for them over a month sorted by project & contractor, and possibly divided by week.
As far as the company is concerned, a contractor can change status, e.g. working, complete. A company changes the status of a contractor during the project lifetime. This status can be used in queries.
Obviously, contractors should not have access to other contractors' information.
A company is represented by only a single user on the mobile app. Similarly, a contractor is represented by only a single user on the mobile app.
The mobile app is built in React Native, which (to the best of my knowledge) is considered by Firestore as a web app.
I am thinking of using a sub-collection of documents for/under each company. Each document represents a project. All contractors' names, their statuses and start and end times are stored on this document.
At the same time, having a duplicate sub-collection of project documents for/under each contractor. Each of these duplicate documents represents a partial copy of the project's document (above). This duplicate document stores the company name and start and end time of the project.
a. Whenever a relationship is established, e.g. a contract is signed, both documents are created in a batch.
b. Status exists only on the 1st copy of the document.
c. In case of any rare changes to the almost static data, eg. name, phone, both documents are updated.
Does this design make sense?
Any concerns, suggestions, better ideas?
If you agree with the design, I would love to hear from you, maybe you can write in a comment something like sounds good.
There are particular cases when you can use a sub-collection and when not to use sub-collections.
When to use sub-collections:
1) When you don't want to store a lot of fields in a document. Cloud Firestore has a limit of 20,000 fields per document. (This applies if the Company and Contractor information is very large and could exceed 20,000 fields.)
2) When updating the parent document is a common operation. Firestore only lets you update a single document at a sustained rate of about 1 write/second. (This applies if the Company and Contractor information is modified very often.)
3) When you want to limit the access to particular fields of a document. (If you want to restrict the access to a Company's contractors or if the access to Contractor's companies should be restricted. In this case moving the restricted fields to another document in another collection is also a good idea!)
When not to use sub-collections:
1) When you want to query the collections and sub-collections together. Firestore queries are shallow, so sub-collections won't be queried when you query the parent collection; you have to query them separately. (For example, if you need to show all the companies and their contractors in one window.)
2) When you want to show the sub-collection when viewing the collection. (When showing a company, you might want to show its contractors. Here the number of reads will increase, because instead of reading one document you are reading one document and its sub-collection every time.)
3) When you want to query collections and sub-collections together. (You can use the newly announced collection group queries whenever you want to query something that's common across the Companies and Contractors, such as field of work or minimum rate.)
4) If you're thinking about querying individual pieces of data, you should put them in a collection. (If the Contractor's particular attributes are usually queried by Companies or a Company's details are looked upon by multiple Contractors)
My Suggestion:
Company collection to store company information on which companies can be searched according to their qualities.
Contractors collection with the same approach since I'm assuming contractors will be queried a lot according to their attributes.
Projects sub-collection for info about the projects on which companies and contractors will collaborate. This can be a sub-collection under the Company collection if only one company will be working on a project. Even if multiple contractors are going to be working on a project for a company, you can store the contractors' IDs in an array in the Projects collection. This will help you avoid the partial Projects sub-collection inside each Company/Contractor collection.
But if you need to query on the project's qualities, it is better to expose them as a separate parent collection. I leave that up to you.
Finally, I would suggest a new collection, Contracts, which can be used to store the relationship between Company, Contractor and Project, along with all the information on which you need to do complex querying. If the same company and contractor have two different projects on which they are working/collaborating, then these can be two documents in the Contracts collection. This comes in handy when you want to show some dashboards. Using this single collection you can show separate statistics for a Company or a Contractor, and complex statistics involving both.
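As a rough illustration of the Contracts idea, the sketch below models each contract as one flat record and answers both directions of the query from that single collection. It uses plain Python dicts rather than the Firestore SDK, and all field names (company_id, contractor_id, project_id, status, start) are assumptions; in Firestore, the filters and sort orders would become queries with matching composite indexes.

```python
# One document per (company, contractor, project) relationship.
# Field names are hypothetical; dates are ISO strings so they sort correctly.
CONTRACTS = [
    {"company_id": "acme", "contractor_id": "dana", "project_id": "p1",
     "status": "working", "start": "2024-01-08"},
    {"company_id": "acme", "contractor_id": "eli", "project_id": "p1",
     "status": "complete", "start": "2024-01-02"},
    {"company_id": "acme", "contractor_id": "dana", "project_id": "p2",
     "status": "complete", "start": "2023-11-01"},
    {"company_id": "birch", "contractor_id": "dana", "project_id": "p9",
     "status": "working", "start": "2024-02-01"},
]


def contracts_for_company(contracts, company_id, status=None):
    """Company view: its contracts, sorted by project then contractor."""
    rows = [
        c for c in contracts
        if c["company_id"] == company_id
        and (status is None or c["status"] == status)
    ]
    return sorted(rows, key=lambda c: (c["project_id"], c["contractor_id"]))


def companies_for_contractor(contracts, contractor_id):
    """Contractor view: (company, project, start) sorted by project then time."""
    rows = [c for c in contracts if c["contractor_id"] == contractor_id]
    rows.sort(key=lambda c: (c["project_id"], c["start"]))
    return [(c["company_id"], c["project_id"], c["start"]) for c in rows]
```

Because status lives on the contract record, a company can change it without touching the contractor's own document, and access rules can be written per contract.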
Hope this helps.

some questions about designing on OrientDB

We were looking for the most suitable database for our innovative “collaboration application”. Sorry, we don’t know how to name it in a way generally understood. In fact, highly complicated relationships among tenants, roles, users, tasks and bills need to be handled effectively.
After reading about 5 DBs (PostgreSQL, Mongo, Couch, Arango and Neo4J), when the words “… relationships among things are more important than things themselves” came to my eyes, I made up my mind to dig into OrientDB. Both the design philosophy and innovative features of OrientDB (multi-models, cluster, OO, native graph, full graph API, SQL-like, LiveQuery, multi-masters, auditing, simple RID and version number ...) keep intensifying my enthusiasm.
OrientDB enlightens me to re-think and try to model from a totally different viewpoint!
We are now designing the data structure based on OrientDB. However, there are some questions puzzling me.
LINK vs. EDGE
Take the case where a CLIENT may place thousands of ORDERs: how should I choose between LINKs and EDGEs to store the relationships? I prefer EDGEs, but they seem to store thousands of RIDs of ORDERs in the CLIENT record.
Embedded records’ Security
Can an embedded record be authorized independently from its container record?
Record-level Security
How does activating Record-level Security affect the query performance?
I hope I have expressed this clearly. Any words will be truly appreciated.
LINK vs EDGE
If the relationship has no properties, you can use a link; if it does, use an edge. You really need edges if you need to traverse the relationship in both directions; with a linklist you can traverse in only one direction (just like a hyperlink on the web), but without the overhead of edges. Edges are the right choice if you need to walk through a graph, though they require more storage space than a linklist.
Another difference is what happens on deletion. If two vertices are connected through a link, A --> (link) B, and you delete B, the link doesn't disappear: it remains, but points to nothing. It is designed this way because when you delete a document, finding all the other documents that link to it would mean doing a full scan of the database, which typically takes ages to complete. The Graph API, with bi-directional links, is specifically designed to resolve this problem, so in general we suggest that customers use it, or otherwise be careful and manage link consistency at the application level.
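The deletion behaviour described above can be pictured with a small in-memory model. This is not the OrientDB API; the class and method names are made up, and the RIDs are sample values. It only shows why a one-way link can be left dangling while a bi-directional edge can be cleaned up without a full scan.

```python
class LinkStore:
    """Documents with one-way links: the target does not know who points to it."""

    def __init__(self):
        self.docs = {}

    def add(self, rid, links=()):
        self.docs[rid] = {"links": list(links)}

    def delete(self, rid):
        # Cleaning up incoming links here would require scanning every document.
        del self.docs[rid]

    def dangling(self, rid):
        """Links from `rid` whose target no longer exists."""
        return [t for t in self.docs[rid]["links"] if t not in self.docs]


class EdgeStore:
    """Bi-directional edges: each vertex tracks both outgoing and incoming ends."""

    def __init__(self):
        self.out_edges = {}
        self.in_edges = {}

    def add_vertex(self, rid):
        self.out_edges[rid] = set()
        self.in_edges[rid] = set()

    def add_edge(self, src, dst):
        self.out_edges[src].add(dst)
        self.in_edges[dst].add(src)  # extra storage, but enables reverse traversal

    def delete_vertex(self, rid):
        # The incoming index tells us exactly which vertices to fix up.
        for src in self.in_edges.pop(rid):
            self.out_edges[src].discard(rid)
        for dst in self.out_edges.pop(rid):
            self.in_edges[dst].discard(rid)
```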
RECORD - LEVEL SECURITY
Using 1 million vertices and an admin user called Luke, a query like: select from where title = ? with a NOTUNIQUE_HASH_INDEX took 0.027 sec to execute.
OrientDB has the concept of users and roles, as well as Record Level Security. It also supports token based authentication, so it's possible to use OrientDB as your primary means of authorizing/authenticating users.
EMBEDDED RECORD'S SECURITY
I made this example to try to answer your question.
I have this structure:
To access the embedded data, I have to run this command: select prop from User
If I try to access it through the class that defines the embedded type (Car), I get no results:
select from Car
UPDATE
OrientDB supports that kind of authorization/authentication, but it's a little bit different from your example. For example: if a user A, without admin permission, inserts a record, another user B without admin permission can't see the record inserted by user A. A user can see only the records that they have inserted.
Hope it helps

MongoDB: How to organize data

I am a little bit uncertain on how to organize the data when using MongoDB.
I have a user with various data. Say a classified service, with a profile and possibly some items for sale. In a relational database this data would be split up into a profile table and a for-sale table. As I understand it, in MongoDB this would probably all go into one "document" (well, probably except if there is a very large number of items for sale).
But my classified service is a little bit special, as for each item for sale, an administrator (salesman) adds stuff to the item for sale, such as allow the ad to go public, a comment on the item and possibly more. The user should obviously not be able to alter this admin-added info.
What would be the recommended way to deal with this? Can the administrator just change (add to) the users item-document? But I guess the user can then change what the administrator has added, right? So perhaps a better approach would be for the admin to create another document that contains the added data, and these two documents would be merged before being displayed?
Maybe the following may be helpful: http://docs.mongodb.org/manual/applications/data-models/?
Also, http://docs.mongodb.org/manual/data-modeling/
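The two-document idea raised in the question can be sketched as follows. This is only an illustration with plain Python dicts, not MongoDB driver code, and the field names (item_id, public, comment) are assumptions. The user-owned item and the admin-owned review are stored separately, so write permissions can differ, and they are merged just before display.

```python
def merge_for_display(item_doc, admin_doc=None):
    """Combine a user-owned item with its admin-owned review for rendering.

    Admin fields are nested under "admin" so they can never be confused with
    (or overwritten by) fields the user controls.
    """
    merged = dict(item_doc)
    if admin_doc is None:
        # Not reviewed yet: treat the ad as not public.
        merged["admin"] = {"public": False, "comment": ""}
    else:
        merged["admin"] = {k: v for k, v in admin_doc.items() if k != "item_id"}
    return merged


# Hypothetical documents from two separate collections.
item = {"item_id": 7, "owner": "sam", "title": "Old guitar", "price": 120}
review = {"item_id": 7, "public": True, "comment": "Photos verified"}
```

The user only ever writes to the item collection and the salesman only to the review collection, so neither can alter the other's data.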

How to decide whether to use a RDBMS, Doc/Obj ODBMS or Graph?

What I intend to design basically boils down to a list of users, organisations, events, addresses and comments which could quite easily be maintained in a RDBMS such as MySQL. However, if the project takes off I want to add another aspect which is resources - i.e. files, videos, images etc which can belong to either a user, organisation or event. This instantly raises the question of whether to use a RDBMS and store a reference to an external file through a table related to each of the categories previously mentioned or whether to use a Doc/Obj ODBMS such as MongoDB to store these items.
But I also want to be able to link users, organisations and events. i.e. User A owns Org 1 and Org 2. User B owns Org 3 and Org 4. User C owns Org 5. Org 1 has an Event X, held at Addr M on Date R, which Org 3 will also be at. User C intends to attend Event X. Org 2 also has an Event Y at Addr M but on Date T. etc etc. As such, I would suspect that a Graph DBMS such as OrientDB would be the best solution. Either that, or I would have a lot of tables in a RDBMS with a lot of joins, and potentially a lot of queries, or a very strange structure in a Doc/Obj DBMS.
I've looked at InfoGrid, which is a Graph database that can connect to MySQL, which could be a potential way to skin this cat. Has anybody else attempted anything like this? What are your thoughts on how to implement such a system, which needs to be scalable? Suggestions are greatly appreciated.
Your description lends itself to a relational model. RDBMS for this particular setup is the proper way to go.