I have read Eric Evans' Domain Driven Design book and I have been trying to apply some of the concepts.
In his book, Eric talks about aggregates and how aggregate roots should have a unique global id whereas aggregate members should have a unique local id. I have been trying to apply that concept to my database tables and I'm running into some issues.
I have two tables in my PostgreSQL database: facilities and employees where employees can be assigned to a single facility.
In the past, I would lay out the employees table as follows:
CREATE TABLE "employees" (
"employeeid" serial NOT NULL PRIMARY KEY,
"facilityid" integer NOT NULL,
...
FOREIGN KEY ("facilityid") REFERENCES "facilities" ("facilityid")
);
where employeeid is a globally unique id. I would then add code in the backend for access control validation, preventing users of one facility from accessing rows pertaining to other facilities. I have a feeling this might not be the safest way to do it.
What I am now considering is this layout:
CREATE TABLE "employees" (
"employeeid" integer NOT NULL,
"facilityid" integer NOT NULL,
...
PRIMARY KEY ("employeeid", "facilityid"),
FOREIGN KEY ("facilityid") REFERENCES "facilities" ("facilityid")
);
where employeeid is unique (locally) for a given facilityid but needs to be paired with a facilityid to be unique globally.
Concretely, this is what I am looking for:
Employee A (employeeid: 1, facilityid: 1)
Employee B (employeeid: 2, facilityid: 1)
Employee C (employeeid: 1, facilityid: 2)
where A, B and C are 3 distinct employees and...
adding an employee D to facility 1 would give him the keys (employeeid : 3, facilityid: 1)
adding an employee E to facility 2 would give him the keys (employeeid : 2, facilityid: 2)
I see two ways of achieving this:
I could use triggers or stored procedures to automatically generate new employeeids and store the last ids for every facility in another table for quicker access but I am concerned about concurrency issues and ending up with 2 employees from the same facility with the same id.
I could possibly create a new sequence for each facility to manage the employeeids but I fear ending up with thousands of sequences to manage and with procedures to delete those sequences in case a facility is deleted. Is there anything wrong with this? It seems heavy to me.
Which approach should I take? Is there anything I'm missing out on?
I am inferring from your question that you will be running a single database for all facilities, or at least that if you have a local database as the "master" for each facility that the data will need to be combined in a central database without collisions.
I would make the facilityid the high order part of the primary key. You could probably assign new employee numbers using a simple SELECT max(employeeid) + 1 ... WHERE facilityid = n approach, since adding employees to any one facility is presumably not something that happens hundreds of times per second from multiple concurrent sources. There is some chance that this could generate an occasional serialization failure, but it is my opinion that any database access should be through a framework which recognizes those and automatically retries the transaction.
I guess you overstressed the aggregate root concept here. In my understanding of modelling an employee (that depends on your context) an employee is almost always an aggregate root possibly referenced by another aggregate root facility.
Both employee and facility almost always have natural keys. For the employee this is typically some employee id (printed on employee identification badges, or at least maintained in the human resources software system) and facilities have this natural keys too almost always containing some location part and some number like "MUC-1" for facility 1 located in munich. But that all depends on your context. In case employee and facility have this natural keys your database model should be quite clear.
Related
I have 3 fields to store in DynamoDB: identity-1, identity-2, score.
identity-1 and identity-2 are always unique in the table, i.e. no two entries can have same identity-1 or identity-2.
We want to allow entries to either have one of identity-1 or identity-2 or have both. Example:
identity-1
identity-2
score
a1
b1
s1
a2
s2
b3
s3
Access patterns are as follows:
Query identity-2 from identity-1
Query score from identity-1
Query score from identity-2
How do I define primary key in such case?
This is a "many:1" problem and there's a few ways to tackle it with DynamoDB. The simple answer here is to leverage Global Secondary Indexes (GSI). For every "identity" you wanted to do a direct look up from, you'd create a GSI.
GSI-1 would include Identity-1 as the hash key and you'd include Identity-2 and any other identities as a non-key attribute to include. You'd create a GSI for each identity you wanted to query directly on. You could also include the score as a non-key attribute if you wanted to directly look up score from any identity without having to resolve to the primary key (which we'll talk about).
The thing to consider with GSI's, though, is that they consume extra storage and throughput. If you create a GSI which includes all your attributes for every identity, you'd be paying for an additional copy of your table for each identity.
The other issue, so far, is that you haven't chosen a Primary Key for your table. You'll need a field to be your primary key and if none of your identities is non-nullable, you'll need a field which will be. It's often convenient to just call it what it is, so we'll call it pk.
You've got a few choices for pk here. Once is to define pk as a composite of your identities. For example: item.pk = item["identity-1"] || item["identity-2"]. Then you could do a query on the table for the identity == pk and if you don't find anything, you could then look up the index for the given identity. This works fine for your simple example, but as you wanted to do more complex things (such as many different identity types), you might find it to be a bit of a headache.
From past experience, my recommendation would be to adjust your approach slightly, however, and have an "users" table and a "scores" table. "users" would have a pk of a guid unique for every user and all their identities (call it "user_id"), you could then create a GSI for that table for every identity back to user_id. Then scores would then use "user_id" as the pk as well with no need for an index. Your application would always resolve to a "user_id" when a user was logged in or otherwise identified - then you can search for score without needing to track identity and you can look up all the associated identities or other user information without needing to create a very "fat" index of every identity->every other identity.
So in a traditional database I might have 2 tables like users, company
id
username
companyid
email
1
j23
1
something#gmail.com
2
fj222
1
james#aol.com
id
ownerid
company_name
1
1
A Really boring company
This is to say that user 1 and 2 are apart of company 1 (a really boring company) and user 1 is the owner of this company.
I could easily issue an update statement in MySQL or Postgresql to update the company name.
But how could I model the same data from a NoSQL perspective, in something like Dynamodb or Mongodb?
Would each user record (document in NoSQL) contain the same company table data (id, ownerid (or is owner true/false, and company name)? I'm unclear how to update the record for all users containing this data then if the company name needed to be updated.
In case you want to save the company object as JSON in each field (for performance reasons), indeed, you have to update a lot of rows.
But best way to achieve this is to have a similar structure as you have above, in MySQL. NoSql schema depends a lot on the queries you will be making.
For example, the schema above is great for:
Find a particular user by username, along with his company name. First you need to query User by username (you can add an index), get the companyId and do another query on Company to fetch the name.
Let's assume company name changes often
In this case company name update is easy. To execute the read query, you need 2 queries to get your result (but they should execute fast)
Embedded company JSON would work better for:
Find all users from a specific city and show their company name
Let's assume company name changes very rarely
In this case, we can't use the "relational" approach, because we will do 1 query to fetch Users by city and then another query for all users found to fetch the company name
Using embedded approach, we need only 1 query
To update a company name, a full (expensive) scan is needed, but should be ok if done rarely
What if company name changes ofter and I want to get users by city?
This becomes tricky, NoSQL is not a replacement for SQL, it has it's shortcomings. Solution may be a platform dependent feature (from mongo, dynamodb, firestore etc.), an additional layer above (elasticSearch) or no solution at all (consider not using key-value NoSQL)
Depends on the programming language used to handle NoSQL objects/documents you have variety of ORM libraries to model your schema. Eg. for MongoDB plus JS/Typescript I recommend Mongoose and its subdocuments. Here is more about it:
https://mongoosejs.com/docs/subdocs.html
This question already has answers here:
How can you represent inheritance in a database?
(7 answers)
Closed 2 years ago.
I am building a Postgres database which has the following two tables:
Projects (id, startDate, etc...)
and
Employees (id, name, etc...)
I want to keep track of the types of contributions that an employee adds to a project. For example, employee #1 might be an "engineer" on project 1 and a "manager" on project 2. I also don't want to restrict the number of contributions an employee can make to a certain project. So employee #1 could be both a "engineer" and a "manager" for a single project.
My first instinct was just to have a many to many relation between the two titled ProjectEmployees or something and store the projectId, employeeId, and a contributionType as a string which would only take on values from an enum as to not have to deal with misspellings or any related issues.
My main question is just whether or not this is a bad practice. My other thought was to split up each contribution type to its own table. So instead of an EmployeeProjects table, there would be tables such as ProjectEngineers, ProjectManagers, etc... and instead of storing the contributionType as a column, it would be implicit in the table I'm using, and the table only has to store projectId and employeeId. There are many more tables in this database which have a similar sort of relationship where there are many to many relations between different tables, but each relation could be one of many "types" of relations. Is it wiser to split these all into separate tables for each type of relation? Or is it better to just keep track of the relation type in a more general table like my first idea?
My desired result is to just be able to efficiently see which all project contributions (and types) an employee worked on as well as to see all contributors + contributor types for a project.
Use the many to many relation as in your first idea, which in my opinion is a good practice.
Avoid the creation of one table per contribution type as is not scalable and flexible. I.E. if one day you'll have a new contribution type, with the 2nd option you will need each time
to create a new table
to write the new table management logic
proceed with a new deploy of your sw
About the topic of storing the contribution types on a table (with id and description) or as a constraint with contribution types strings enumerated, in my opinion both are valuable solutions.
But if you think to manage contribution types in your software (in a first release or in the future) maybe having a table with contribution types anagraphics can be better. It depends by your design and requirements
Make a table to store contribution types as strings (manager, engineer, etc) and contribution type id (numeric id). This prevents misspellings.
Make a table to store contributions with columns: employee id, project id, contribution type id (you may want other columns there, but it should be unique on the combination of these 3 columns). Do not store contribution types as strings in a table like this, since, as you correctly mentioned, this may allow misspellings. Another reason is to save disk space. An extra join with a small table of contribution types is a small price to pay.
I'm new to database design and I am working on a project that requires the use of a single entity (medication) that could be tied to any number of patients and each patient could have a different dosage. What would be the best way to layout a table for this type of situation. I could use a single table and just store each individual medication and dosage and tie that to the unique patient. But that would give me duplicate entries in the medication table (same med with just different dosage).
What I would like is to have a single entry for each medication name and have each patient have a unique dosage for that particular med. Of course a single patient could also have many different medications so I would have to be able to have a unique dosage for each med for different patients.
I using entity framework model first approach. Would I use a single table T_Patient_Medication and use each of the two table IDs as the primary key combo and then use a dosage field for that combination? If so how would I create the association to tie this table to the other two. Any suggestions?
Off the top of my head:
-a medication table(MedicineId, MedicineName, etc).
-a patient table(PatientId, PatientName, etc)
-a patient-medicine table(MedicineId, PatientId, Dosage, date, notes etc).
In other words, the medication table contains on row per unique med, a patient contains one row for each unique patient.
The patient-medicine table is where these two things meet: it contains a patientId, a medicineId and then anything else unique about that patient getting that medicine (i.e. Dr. name, dosage, date started etc). Personally I would make each row in the patient-medicine table also have its own unqiue ID separate from the combination of the patientid and medicine id (what are you going to do when the same patient goes back on the same medicine at a different time, if your primary key is Patientid+Medicineid). Each record should have its own unique id in my way of thinking.
There would be foreign keys between the tables to enforce this relationship: i.e. you can't add a row to the patient-medicine table unless the patientid exists in the patient table, and the medicine exists in the medicine table; and equally important prevent you from deleting a rows from tables where there are dependent records in other tables. If you take the time and setup all those foreign keys (relationships), it will be a breeze in EF to navigate the related records.
It is without a doubt more complicated than this, but that is the basics idea of a relational table.
I'm willing to give MongoDB and CouchDB a serious try. So far I've worked a bit with Mongo, but I'm also intrigued by Couch's RESTful approach.
Having worked for years with relational DBs, I still don't get what is the best way to get some things done with non relational databases.
For example, if I have 1000 car shops and 1000 car types, I want to specify what kind of cars each shop sells. Each car has 100 features. Within a relational database i'd make a middle table to link each car shop with the car types it sells via IDs. What is the approach of No-sql? If every car shop sells 50 car types, it means replicating a huge amount of data, if I have to store within the car shop all the features of all the car types it sells!
Any help appreciated.
I can only speak to CouchDB.
The best way to stick your data in the db is to not normalize it at all beyond converting it to JSON. If that data is "cars" then stick all the data about every car in the database.
You then use map/reduce to create a normalized index of the data. So, if you want an index of every car, sorted first by shop, then by car-type you would emit each car with an index of [shop, car-type].
Map reduce seems a little scary at first, but you don't need to understand all the complicated stuff or even btrees, all you need to understand is how the key sorting works.
http://wiki.apache.org/couchdb/View_collation
With that alone you can create amazing normalized indexes over differing documents with the map reduce system in CouchDB.
In MongoDB an often used approach would be store a list of _ids of car types in each car shop. So no separate join table but still basically doing a client-side join.
Embedded documents become more relevant for cases that aren't many-to-many like this.
Coming from a HBase/BigTable point of view, typically you would completely denormalize your data, and use a "list" field, or multidimensional map column (see this link for a better description).
The word "column" is another loaded
word like "table" and "base" which
carries the emotional baggage of years
of RDBMS experience.
Instead, I find it easier to think
about this like a multidimensional map
- a map of maps if you will.
For your example for a many-to-many relationship, you can still create two tables, and use your multidimenstional map column to hold the relationship between the tables.
See the FAQ question 20 in the Hadoop/HBase FAQ:
Q:[Michael Dagaev] How would you
design an Hbase table for many-to-many
association between two entities, for
example Student and Course?
I would
define two tables: Student: student
id student data (name, address, ...)
courses (use course ids as column
qualifiers here) Course: course id
course data (name, syllabus, ...)
students (use student ids as column
qualifiers here) Does it make sense?
A[Jonathan Gray] : Your design does
make sense. As you said, you'd
probably have two column-families in
each of the Student and Course tables.
One for the data, another with a
column per student or course. For
example, a student row might look
like: Student : id/row/key = 1001
data:name = Student Name data:address
= 123 ABC St courses:2001 = (If you need more information about this
association, for example, if they are
on the waiting list) courses:2002 =
... This schema gives you fast access
to the queries, show all classes for a
student (student table, courses
family), or all students for a class
(courses table, students family).
In relational database, the concept is very clear: one table for cars with columns like "car_id, car_type, car_name, car_price", and another table for shops with columns "shop_id, car_id, shop_name, sale_count", the "car_id" links the two table together for data Ops. All the columns must well defined in creating the database.
No SQL database systems do not require you pre-define these columns and tables. You just construct your records in a certain format, say JSon, like:
"{car:[id:1, type:auto, name:ford], shop:[id:100, name:some_shop]}",
"{car:[id:2, type:auto, name:benz], shop:[id:105, name:my_shop]}",
.....
After your system is on-line providing service for your management, you may find there are some flaws in your design of db structure, you hope to add one column "employee" of "shop" for your future records. Then your new records coming is as:
"{car:[id:3, type:auto, name:RR], shop:[id:108, name:other_shop, employee:Bill]}",
No SQL systems allow you to do so, but relational database is impossible for this job.