Is it bad practice to add a relation "type" to a many to many table in SQL? [duplicate] - postgresql

This question already has answers here:
How can you represent inheritance in a database?
(7 answers)
Closed 2 years ago.
I am building a Postgres database which has the following two tables:
Projects (id, startDate, etc...)
and
Employees (id, name, etc...)
I want to keep track of the types of contributions that an employee adds to a project. For example, employee #1 might be an "engineer" on project 1 and a "manager" on project 2. I also don't want to restrict the number of contributions an employee can make to a certain project. So employee #1 could be both a "engineer" and a "manager" for a single project.
My first instinct was just to have a many to many relation between the two titled ProjectEmployees or something and store the projectId, employeeId, and a contributionType as a string which would only take on values from an enum as to not have to deal with misspellings or any related issues.
My main question is just whether or not this is a bad practice. My other thought was to split up each contribution type to its own table. So instead of an EmployeeProjects table, there would be tables such as ProjectEngineers, ProjectManagers, etc... and instead of storing the contributionType as a column, it would be implicit in the table I'm using, and the table only has to store projectId and employeeId. There are many more tables in this database which have a similar sort of relationship where there are many to many relations between different tables, but each relation could be one of many "types" of relations. Is it wiser to split these all into separate tables for each type of relation? Or is it better to just keep track of the relation type in a more general table like my first idea?
My desired result is to just be able to efficiently see which all project contributions (and types) an employee worked on as well as to see all contributors + contributor types for a project.

Use the many to many relation as in your first idea, which in my opinion is a good practice.
Avoid the creation of one table per contribution type as is not scalable and flexible. I.E. if one day you'll have a new contribution type, with the 2nd option you will need each time
to create a new table
to write the new table management logic
proceed with a new deploy of your sw
About the topic of storing the contribution types on a table (with id and description) or as a constraint with contribution types strings enumerated, in my opinion both are valuable solutions.
But if you think to manage contribution types in your software (in a first release or in the future) maybe having a table with contribution types anagraphics can be better. It depends by your design and requirements

Make a table to store contribution types as strings (manager, engineer, etc) and contribution type id (numeric id). This prevents misspellings.
Make a table to store contributions with columns: employee id, project id, contribution type id (you may want other columns there, but it should be unique on the combination of these 3 columns). Do not store contribution types as strings in a table like this, since, as you correctly mentioned, this may allow misspellings. Another reason is to save disk space. An extra join with a small table of contribution types is a small price to pay.

Related

Inheriting Parent Table with identifier (Postgres)

Sorry if this is a relatively easy problem to solve; I read the docs on inheritance and I'm still confused on how I would do this.
Let's say I have the parent table being car_model, which has the name of the car and some of it's features as the columns (e.g. car_name, car_description, car_year, etc). Basically a list of cars.
I have the child table being car_user, which has the column user_id.
Basically, I want to link a car to the car_user, so when I call
SELECT car_name FROM car_user WHERE user_id = "name", I could retrieve the car_name. I would need a linking component that links car_user to the car.
How would I do this?
I was thinking of doing something like having car_name column in car_user, so when I create a new data row in car_user, it could link the 2 together.
What's the best way to solve this problem?
Inheritance is something completely different. You should read about foreign keys and joins.
If one user drives only one car, but many users can drive same car, you need to build one-to-many -relation. Add car_name to your user table and JOIN using that field.

Complex and multiple connected database relations in MongoDB

I am currently trying to model a MongoDB database structure where the entities are very complex in relation to each other.
In my current collections, MongoDB queries are difficult or impossible to put into a single aggregation. Incidentally, I'm not a database specialist and have been working with MongoDB for only about half a year.
To keep it as simple as possible but necessary, this is my challenge:
I have newspaper articles that contain simple keywords, works (oevres, books, movies), persons and linked combinations of works and persons. In addition, the same people appear under different names in different articles.
Later, on the person view I want to show the following:
the links of the person with name and work and the respective articles
the articles in which the person appears without a work (by name)
the other keywords that are still in the article
In my structure I want to avoid that entities such as people occur multiple times. So these are my current collections:
Article
id
title
keywordRelations
KeywordRelation
id
type (single or combination)
simpleKeywordId (optional)
personNameConnectionIds (optional)
workIds (optional)
SimpleKeyword
id
value
PersonNameConnection
id
personId
nameInArticleId
Person
id
firstname
lastname
NameInArticle
id
name
type (e.g. abbreviation, synonyme)
Work
id
title
To meet the requirements, I would always have to create queries that range over 3 to 4 tables. Is that possible and useful with MongoDB?
Or is there an easier way and structure to achieve that?

DDD: modelling aggregate entities' unique global/local id in PostgreSQL

I have read Eric Evans' Domain Driven Design book and I have been trying to apply some of the concepts.
In his book, Eric talks about aggregates and how aggregate roots should have a unique global id whereas aggregate members should have a unique local id. I have been trying to apply that concept to my database tables and I'm running into some issues.
I have two tables in my PostgreSQL database: facilities and employees where employees can be assigned to a single facility.
In the past, I would lay out the employees table as follows:
CREATE TABLE "employees" (
"employeeid" serial NOT NULL PRIMARY KEY,
"facilityid" integer NOT NULL,
...
FOREIGN KEY ("facilityid") REFERENCES "facilities" ("facilityid")
);
where employeeid is a globally unique id. I would then add code in the backend for access control validation, preventing users of one facility from accessing rows pertaining to other facilities. I have a feeling this might not be the safest way to do it.
What I am now considering is this layout:
CREATE TABLE "employees" (
"employeeid" integer NOT NULL,
"facilityid" integer NOT NULL,
...
PRIMARY KEY ("employeeid", "facilityid"),
FOREIGN KEY ("facilityid") REFERENCES "facilities" ("facilityid")
);
where employeeid is unique (locally) for a given facilityid but needs to be paired with a facilityid to be unique globally.
Concretely, this is what I am looking for:
Employee A (employeeid: 1, facilityid: 1)
Employee B (employeeid: 2, facilityid: 1)
Employee C (employeeid: 1, facilityid: 2)
where A, B and C are 3 distinct employees and...
adding an employee D to facility 1 would give him the keys (employeeid : 3, facilityid: 1)
adding an employee E to facility 2 would give him the keys (employeeid : 2, facilityid: 2)
I see two ways of achieving this:
I could use triggers or stored procedures to automatically generate new employeeids and store the last ids for every facility in another table for quicker access but I am concerned about concurrency issues and ending up with 2 employees from the same facility with the same id.
I could possibly create a new sequence for each facility to manage the employeeids but I fear ending up with thousands of sequences to manage and with procedures to delete those sequences in case a facility is deleted. Is there anything wrong with this? It seems heavy to me.
Which approach should I take? Is there anything I'm missing out on?
I am inferring from your question that you will be running a single database for all facilities, or at least that if you have a local database as the "master" for each facility that the data will need to be combined in a central database without collisions.
I would make the facilityid the high order part of the primary key. You could probably assign new employee numbers using a simple SELECT max(employeeid) + 1 ... WHERE facilityid = n approach, since adding employees to any one facility is presumably not something that happens hundreds of times per second from multiple concurrent sources. There is some chance that this could generate an occasional serialization failure, but it is my opinion that any database access should be through a framework which recognizes those and automatically retries the transaction.
I guess you overstressed the aggregate root concept here. In my understanding of modelling an employee (that depends on your context) an employee is almost always an aggregate root possibly referenced by another aggregate root facility.
Both employee and facility almost always have natural keys. For the employee this is typically some employee id (printed on employee identification badges, or at least maintained in the human resources software system) and facilities have this natural keys too almost always containing some location part and some number like "MUC-1" for facility 1 located in munich. But that all depends on your context. In case employee and facility have this natural keys your database model should be quite clear.

Manually creating intermediate table in a many to many relationship for Core - Data

I'm currenty working with Core-data for an iPhone project.
But I'm a bit confused about one element.
With Core Data currently you do not need to create the intermediate table when creating many to many relationships (its all handled behind the scenes by core data)
But in my case I actually need some attributes on my many to many relationship!
For example
I have a table called Students
and another table called Lessons
a Student can be in many lessons
and a lesson can have many students
Now a standard many to many relationship will not work for me as I actually need to define more details on the join, i.e. StartDate and LeaveDate.
In a standard sql model for example my join table would be something like
StudentLessons (Studentid, LessonId, StartDate, LeaveDate )
I would need these properties as when i'm querying for information I will need the details from the join to filter my results.
How can I create this in core data and also filter for results?
I've seen folks say that you would actually create the StudentLesson entity manually in core data.
Now if I did this would I just have the attributes (Startdate, LeaveDate) and then a one to many relationship from the Student and then the Lessons table?
Student - > StudentLessons
Lesson - > StudentLessons
I guess I'm a bit confused on how I would go about making sure that the relationships and the content of the relationships are setup correctly. (i.e If I add an Student object to the StudentLessons - how would I then assign/add the Lesson.)
Sorry this is my first time playing with Core Data.
Takes a bit o getting used to when coming from a full on sql background.
You are absolutely right. The correct way to do this is to create a new entity like StudentLessons. Let's call it Attendance. It should have the startDate and endDate, and two relationships.
The relationship to the student can be many-to-many, unless it is foreseeable that startDate and endDate are always different for each student. One Attendance with its dates can have many students in it. One student can have several Attendance duties.
Student <<---->> Attendance
Clearly, the relationship to Lesson should be one-to-many. One Lesson can have different Attendance configurations, with different dates. But each Attendance belongs only to one Lesson.
Lesson <---->> Attendance
To address your question, you can make the Attendance attribute of Lesson non-optional (and vice versa), this way it will ensure that each Lesson has at least one Attendance with appropriate dates, and each Attendance has exactly one Lesson.
I think your can remove the link between Student and Lesson. Just assign an Attendance rather than a lesson. If you want a Lesson assigned to a Student without dates, just allow Attendance to have NULL as those properties.
TheTiger,
Just because Core Data will create a join table for you, that doesn't mean you have to use it. Maintaining which student succeeds with which lesson is just the same except you will create the intermediate entity and then use the appropriate setters to build the relationships.
You will have to use more key paths and do relationship prefetching but those are straightforward to do.
Andrew

No-sql relations question

I'm willing to give MongoDB and CouchDB a serious try. So far I've worked a bit with Mongo, but I'm also intrigued by Couch's RESTful approach.
Having worked for years with relational DBs, I still don't get what is the best way to get some things done with non relational databases.
For example, if I have 1000 car shops and 1000 car types, I want to specify what kind of cars each shop sells. Each car has 100 features. Within a relational database i'd make a middle table to link each car shop with the car types it sells via IDs. What is the approach of No-sql? If every car shop sells 50 car types, it means replicating a huge amount of data, if I have to store within the car shop all the features of all the car types it sells!
Any help appreciated.
I can only speak to CouchDB.
The best way to stick your data in the db is to not normalize it at all beyond converting it to JSON. If that data is "cars" then stick all the data about every car in the database.
You then use map/reduce to create a normalized index of the data. So, if you want an index of every car, sorted first by shop, then by car-type you would emit each car with an index of [shop, car-type].
Map reduce seems a little scary at first, but you don't need to understand all the complicated stuff or even btrees, all you need to understand is how the key sorting works.
http://wiki.apache.org/couchdb/View_collation
With that alone you can create amazing normalized indexes over differing documents with the map reduce system in CouchDB.
In MongoDB an often used approach would be store a list of _ids of car types in each car shop. So no separate join table but still basically doing a client-side join.
Embedded documents become more relevant for cases that aren't many-to-many like this.
Coming from a HBase/BigTable point of view, typically you would completely denormalize your data, and use a "list" field, or multidimensional map column (see this link for a better description).
The word "column" is another loaded
word like "table" and "base" which
carries the emotional baggage of years
of RDBMS experience.
Instead, I find it easier to think
about this like a multidimensional map
- a map of maps if you will.
For your example for a many-to-many relationship, you can still create two tables, and use your multidimenstional map column to hold the relationship between the tables.
See the FAQ question 20 in the Hadoop/HBase FAQ:
Q:[Michael Dagaev] How would you
design an Hbase table for many-to-many
association between two entities, for
example Student and Course?
I would
define two tables: Student: student
id student data (name, address, ...)
courses (use course ids as column
qualifiers here) Course: course id
course data (name, syllabus, ...)
students (use student ids as column
qualifiers here) Does it make sense?
A[Jonathan Gray] : Your design does
make sense. As you said, you'd
probably have two column-families in
each of the Student and Course tables.
One for the data, another with a
column per student or course. For
example, a student row might look
like: Student : id/row/key = 1001
data:name = Student Name data:address
= 123 ABC St courses:2001 = (If you need more information about this
association, for example, if they are
on the waiting list) courses:2002 =
... This schema gives you fast access
to the queries, show all classes for a
student (student table, courses
family), or all students for a class
(courses table, students family).
In relational database, the concept is very clear: one table for cars with columns like "car_id, car_type, car_name, car_price", and another table for shops with columns "shop_id, car_id, shop_name, sale_count", the "car_id" links the two table together for data Ops. All the columns must well defined in creating the database.
No SQL database systems do not require you pre-define these columns and tables. You just construct your records in a certain format, say JSon, like:
"{car:[id:1, type:auto, name:ford], shop:[id:100, name:some_shop]}",
"{car:[id:2, type:auto, name:benz], shop:[id:105, name:my_shop]}",
.....
After your system is on-line providing service for your management, you may find there are some flaws in your design of db structure, you hope to add one column "employee" of "shop" for your future records. Then your new records coming is as:
"{car:[id:3, type:auto, name:RR], shop:[id:108, name:other_shop, employee:Bill]}",
No SQL systems allow you to do so, but relational database is impossible for this job.