I have a question about architecture: I have two entities, DocumentLetter and DocumentOther, and both must be approved by managers.
What would be better: two additional models, DocumentLetterApprove and DocumentOtherApprove, with entity relations, OR one additional table without relations that stores the identity of the model it belongs to (columns ModelName and ModelID)?
Another example is attachments for different documents.
Letter and contract are two different tables, and each should have its own attachments.
Should I use an additional table for each model (one for letters and one for contracts), or create one table with the fields ModelName and ModelID?
Personally, I would favor keeping the separate entities with the relationships if there is any possibility that the related entities (approvals) could differ depending on what they are applied to. I avoid ambiguously linked tables unless they represent a large one-to-many entity that might be associated with one of a number of other entities.
The problem with using something like a "ParentType" + "ParentId" is that you cannot leverage any form of FK constraint between the related tables. It also means you cannot leverage EF relationships, and there will probably be times when you load one of the documents and want to know whether it is approved, along with details from the approval.
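The FK point can be checked directly. Below is a minimal sketch using Python's built-in sqlite3 module rather than EF (the table and column names are hypothetical, chosen to mirror the question). A typed approval table can carry a real foreign key, so the database rejects an approval for a letter that does not exist; the ParentType + ParentId table accepts the same orphan row without complaint.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces FOREIGN KEY clauses when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE document_letter (id INTEGER PRIMARY KEY, subject TEXT NOT NULL);

        -- Typed link: the database guarantees the letter exists.
        CREATE TABLE document_letter_approval (
            id INTEGER PRIMARY KEY,
            document_letter_id INTEGER NOT NULL REFERENCES document_letter(id),
            approved_by TEXT NOT NULL
        );

        -- Ambiguous link: parent_type + parent_id, no FK is possible.
        CREATE TABLE approval_generic (
            id INTEGER PRIMARY KEY,
            parent_type TEXT NOT NULL,
            parent_id INTEGER NOT NULL,
            approved_by TEXT NOT NULL
        );
    """)
    return conn

def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the insert succeeded, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_db()
conn.execute("INSERT INTO document_letter (id, subject) VALUES (1, 'Offer')")

# Try to approve a letter that does not exist (id 99) in both designs.
typed_ok = try_insert(
    conn,
    "INSERT INTO document_letter_approval (document_letter_id, approved_by) VALUES (?, ?)",
    (99, "manager"))
generic_ok = try_insert(
    conn,
    "INSERT INTO approval_generic (parent_type, parent_id, approved_by) VALUES (?, ?, ?)",
    ("DocumentLetter", 99, "manager"))
print(typed_ok, generic_ok)  # False True
```

The orphan in approval_generic is exactly the kind of record that drifts "out of sync with the expected rules" as a system evolves.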
If an Approval for the different document types is expected to be identical then I would sooner declare a common Approval table/Entity and put an ApprovalId on each of the document type tables to establish a many-to-1 relationship from the document to the approval.
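A minimal sketch of that shared-Approval design, again in Python's sqlite3 as a stand-in for the real database (all table and column names are hypothetical). Each document table holds a nullable approval_id FK, so several documents can point at one approval and loading a document together with its approval is a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- One shared approval shape for every document type (hypothetical columns).
    CREATE TABLE approval (
        id INTEGER PRIMARY KEY,
        approved_by TEXT NOT NULL,
        approved_on TEXT NOT NULL
    );
    -- Each document table points at an approval: many documents -> one approval.
    CREATE TABLE document_letter (
        id INTEGER PRIMARY KEY,
        subject TEXT NOT NULL,
        approval_id INTEGER REFERENCES approval(id)  -- NULL means not yet approved
    );
    CREATE TABLE document_other (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        approval_id INTEGER REFERENCES approval(id)
    );
""")
conn.execute("INSERT INTO approval (id, approved_by, approved_on) VALUES (1, 'alice', '2024-01-15')")
conn.execute("INSERT INTO document_letter (id, subject, approval_id) VALUES (10, 'Offer', 1)")
conn.execute("INSERT INTO document_letter (id, subject, approval_id) VALUES (11, 'Reminder', NULL)")
conn.execute("INSERT INTO document_other (id, title, approval_id) VALUES (20, 'Memo', 1)")

def letter_approval(conn: sqlite3.Connection, letter_id: int):
    """Load a letter's subject together with its approver (None if unapproved)."""
    return conn.execute("""
        SELECT l.subject, a.approved_by
        FROM document_letter l
        LEFT JOIN approval a ON a.id = l.approval_id
        WHERE l.id = ?""", (letter_id,)).fetchone()

print(letter_approval(conn, 10))  # ('Offer', 'alice')
print(letter_approval(conn, 11))  # ('Reminder', None)
```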
If an approval is identical and can form a many-to-many relationship, then a suitable many-to-many relation table can be employed: DocumentLetter - DocumentLetterApproval (FKs) - Approval (approval details).
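The many-to-many variant puts only the two FKs in DocumentLetterApproval. A minimal sqlite3 sketch (hypothetical names; the composite primary key is one reasonable way to stop the same link being stored twice):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE document_letter (id INTEGER PRIMARY KEY, subject TEXT NOT NULL);
    CREATE TABLE approval (id INTEGER PRIMARY KEY, approved_by TEXT NOT NULL);
    -- Link table holds only the two FKs; the composite PK prevents duplicate links.
    CREATE TABLE document_letter_approval (
        document_letter_id INTEGER NOT NULL REFERENCES document_letter(id),
        approval_id INTEGER NOT NULL REFERENCES approval(id),
        PRIMARY KEY (document_letter_id, approval_id)
    );
""")
conn.executemany("INSERT INTO document_letter VALUES (?, ?)", [(1, "Offer"), (2, "Contract cover")])
conn.executemany("INSERT INTO approval VALUES (?, ?)", [(100, "alice"), (101, "bob")])
# Letter 1 needs two sign-offs; approval 100 also covers letter 2.
conn.executemany("INSERT INTO document_letter_approval VALUES (?, ?)",
                 [(1, 100), (1, 101), (2, 100)])

def approvers_for_letter(conn: sqlite3.Connection, letter_id: int) -> list:
    """Names of everyone who approved the given letter, alphabetically."""
    rows = conn.execute("""
        SELECT a.approved_by
        FROM document_letter_approval dla
        JOIN approval a ON a.id = dla.approval_id
        WHERE dla.document_letter_id = ?
        ORDER BY a.approved_by""", (letter_id,))
    return [r[0] for r in rows]

print(approvers_for_letter(conn, 1))  # ['alice', 'bob']
print(approvers_for_letter(conn, 2))  # ['alice']
```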
If a letter approval could differ from other approvals, then: DocumentLetter - DocumentLetterApproval (approval details).
Design decisions like this usually come from considerations around DRY (Don't Repeat Yourself). The advice I can give is that KISS (Keep It Stupidly Simple) should trump DRY, and that DRY should only apply to logic/structure that is proven to be identical (not merely expected to be identical, or worse, expected to be similar). DRY should be a refactoring consideration for continual improvement, not an up-front design decision. Coding for DRY too early ends up costing you time when you paint yourself into corners. By keeping code fluid, these relationships can be proven, and if they prove to be identical, they can be refactored into a single entity. Time is still spent refactoring, but it is spent making the code structure better rather than making the code worse to work around design assumptions.
An example where I might consider an ambiguous, loosely linked table would be something like file attachments. I might have several entities that can hold references to one or more attachments. Attachments are not something I would need to link to often; they are reached through an explicit action, for which I can fire off an additional query anyway, since I'm not about to pre-load attachment details when loading a document. In this case an attachment table might have ParentType and ParentId indexed so that I can quickly get the attachments for a particular document or other entity. I would never try to do something like Context.Documents.Include(x => x.Attachments) or the like; no such reference would be available. Attachments would always be accessed for a single document, so I would resort to Context.Attachments.Where(x => x.ParentType == ParentTypes.DocumentLetter && x.ParentId == documentLetterId).ToList();
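The same pattern outside EF, as a minimal sqlite3 sketch: a composite index on (parent_type, parent_id) and an explicit, on-demand query playing the role of the Context.Attachments.Where(...) call. The ParentType values and column names are hypothetical.

```python
import sqlite3
from enum import Enum

class ParentType(str, Enum):
    # Hypothetical discriminator values, stored as text in the attachment table.
    DOCUMENT_LETTER = "DocumentLetter"
    CONTRACT = "Contract"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attachment (
        id INTEGER PRIMARY KEY,
        parent_type TEXT NOT NULL,
        parent_id INTEGER NOT NULL,
        file_name TEXT NOT NULL
    );
    -- Composite index so "attachments for one parent" is an index lookup.
    CREATE INDEX ix_attachment_parent ON attachment (parent_type, parent_id);
""")
conn.executemany(
    "INSERT INTO attachment (parent_type, parent_id, file_name) VALUES (?, ?, ?)",
    [(ParentType.DOCUMENT_LETTER.value, 7, "scan.pdf"),
     (ParentType.DOCUMENT_LETTER.value, 7, "signature.png"),
     (ParentType.CONTRACT.value, 7, "contract.docx")])  # same id, different parent type

def attachments_for(conn: sqlite3.Connection, parent_type: ParentType, parent_id: int) -> list:
    """Explicit, on-demand query; nothing is eager-loaded with the document."""
    rows = conn.execute(
        "SELECT file_name FROM attachment WHERE parent_type = ? AND parent_id = ? ORDER BY id",
        (parent_type.value, parent_id))
    return [r[0] for r in rows]

print(attachments_for(conn, ParentType.DOCUMENT_LETTER, 7))  # ['scan.pdf', 'signature.png']
```

Note that parent_id alone is not enough: both a letter and a contract can have id 7, which is why the discriminator must always be part of the query and the index.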
I have experience working on systems that were designed solely with these types of ambiguously linked tables. They are not only extremely slow as they scale to any reasonable size, but they are also extremely error prone as systems evolve and the nature of the relationships change. Records have a tendency to get out of sync with the expected rules.
In the Parse.com API reference for Swift on iOS, it is very clear when to use the different kinds of One-to-Many relationships, based on the expected size of the Many side.
But I find it less clear on what kind of Many-to-Many relationships to use when both sides could be very large.
In my case, I have a Charity object that my Users can make small (often one-dollar) contributions to, so each User could conceivably make thousands of these contributions, and each Charity could have thousands of Users contributing to it.
The Many-to-Many options listed for this kind of thing are Parse Relations, Join Tables, and Arrays, of which the docs explain:
Arrays should be used when the relationship will reliably include under 100 references, which is very clear and helpful guidance that I should not use Arrays.
The docs say Parse Relations could be used, for instance, to connect Books with multiple Authors and Authors with multiple Books. In that situation a given Book is unlikely to have over 100 Authors, and only rarely will an Author have over 100 Books, so it's unclear whether Relations are appropriate when both sides could be very large, as in my case.
The docs say Join Tables should be used when extra metadata should be attached to each relationship. For one thing, I don't currently have an explicit need for this; for another, the docs don't seem to mention whether the size of each side of the Many-to-Many relationship matters at all.
In the absence of any other information, it looks like I should use Join Tables, but only because the docs don't imply that I shouldn't, and not for the reason the docs say I should.
That seems like a flimsy rationale.
I would greatly appreciate any guidance anyone can give.
Behind the scenes, when you use a Relation, Parse Server automatically creates a join table for you and provides APIs for easily managing and fetching its data. So, in terms of performance, the two approaches should be very similar.
The downside of a Relation is that you cannot add new fields to the join table it creates. So if you need, for example, to store the charities that each user likes, a Relation between User and Charity would be a good fit, because you only need to store that the relation exists and do not need to store any extra information.
On the other hand, if you need to store the donations that each user made to each charity, I'd create a join table called Donation or UserCharity with a pointer to the User class, a pointer to the Charity class, and the value of the donation. In this case a Relation is not a fit, because you need to store the donation value.
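In relational terms, the Donation class described above is a join table with a payload. The sketch below uses Python's sqlite3 as an analogue only; it is not Parse SDK code, and the table names are hypothetical (app_user instead of user, to avoid clashing with reserved words in some databases). One row per contribution, with an index on each side, is what keeps both "all donations by a user" and "all donations to a charity" cheap even when each side has thousands of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE charity  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The join table carries metadata (the amount), which a plain relation cannot.
    CREATE TABLE donation (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES app_user(id),
        charity_id INTEGER NOT NULL REFERENCES charity(id),
        amount_cents INTEGER NOT NULL
    );
    -- Index both sides so either direction of the many-to-many stays fast.
    CREATE INDEX ix_donation_user ON donation (user_id);
    CREATE INDEX ix_donation_charity ON donation (charity_id);
""")
conn.executemany("INSERT INTO app_user VALUES (?, ?)", [(1, "ann"), (2, "ben")])
conn.executemany("INSERT INTO charity VALUES (?, ?)", [(10, "Food Bank"), (11, "Shelter")])
conn.executemany(
    "INSERT INTO donation (user_id, charity_id, amount_cents) VALUES (?, ?, ?)",
    [(1, 10, 100), (1, 10, 100), (2, 10, 100), (1, 11, 500)])

def total_for_charity(conn: sqlite3.Connection, charity_id: int) -> int:
    """Sum of all donations to one charity, in cents."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM donation WHERE charity_id = ?",
        (charity_id,)).fetchone()
    return total

print(total_for_charity(conn, 10))  # 300
print(total_for_charity(conn, 11))  # 500
```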
I work in cattle production and I am learning about database design with PostgreSQL. I am currently working on an entity attribute relationship model for a database that records the allocation of the pastures in which cattle graze. In the logic of this business, an animal can be assigned to several grazing groups during its life. Each grazing group in turn has a duration and is composed of several pastures in which the animals graze according to a rotation calendar. In this way, at any specific time, animals graze in a pasture that is part of a grazing group.
I have a situation in which many grazing groups can be assigned to many animals as well as to many pastures. While trying to model this, I ran into a fan trap, because there are two one-to-many relationships from a single table. So I would like to ask how to deal with this type of relationship, in which one entity relates to two others through many-to-many relationships.
I have attached a diagram of the problem.
[Figure: model diagram]
Thanks
Traditionally, using a link table (the ones you call assignment) between two tables has been the right way to model many-to-many relationships. Other choices include an ARRAY of animal IDs in the grazing group table, JSONB fields, etc. Those might prove problematic later, so I'd recommend going the traditional way.
If you want to keep track of history, you can add an active boolean field (probably to the link table) to indicate which assignment is current, or store a start date and an end date for each assignment. The latter also makes it possible to plan future assignments. To make things easier, create VIEWs that show only current assignments, and further VIEWs that show the JOINed tables.
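A minimal sketch of the two link tables with date ranges, using Python's sqlite3 so it runs self-contained (in PostgreSQL you would more naturally use DATE columns; here ISO date strings compare correctly as text). All table names are hypothetical, and it assumes a group grazes only one pasture at a time. Filtering both link tables by the same date is what resolves the fan trap: the animal -> group -> pasture path becomes unambiguous for a given day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE animal        (id INTEGER PRIMARY KEY, tag TEXT NOT NULL);
    CREATE TABLE grazing_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE pasture       (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Link table 1: which group an animal belongs to, and when.
    CREATE TABLE animal_assignment (
        animal_id INTEGER NOT NULL REFERENCES animal(id),
        grazing_group_id INTEGER NOT NULL REFERENCES grazing_group(id),
        start_date TEXT NOT NULL,
        end_date TEXT                     -- NULL means "still assigned"
    );
    -- Link table 2: the rotation calendar (which pasture a group uses, and when).
    CREATE TABLE pasture_assignment (
        grazing_group_id INTEGER NOT NULL REFERENCES grazing_group(id),
        pasture_id INTEGER NOT NULL REFERENCES pasture(id),
        start_date TEXT NOT NULL,
        end_date TEXT NOT NULL
    );
    -- Convenience view: only assignments that are still open.
    CREATE VIEW current_animal_assignment AS
        SELECT * FROM animal_assignment WHERE end_date IS NULL;
""")
conn.executemany("INSERT INTO animal VALUES (?, ?)", [(1, "A-001"), (2, "A-002")])
conn.executemany("INSERT INTO grazing_group VALUES (?, ?)", [(1, "Heifers"), (2, "Steers")])
conn.executemany("INSERT INTO pasture VALUES (?, ?)", [(1, "North"), (2, "South"), (3, "Creek")])
conn.executemany("INSERT INTO animal_assignment VALUES (?, ?, ?, ?)", [
    (1, 1, "2024-01-01", "2024-03-31"),   # A-001 was in Heifers ...
    (1, 2, "2024-04-01", None),           # ... then moved to Steers
    (2, 2, "2024-01-01", None)])
conn.executemany("INSERT INTO pasture_assignment VALUES (?, ?, ?, ?)", [
    (1, 1, "2024-01-01", "2024-02-15"),
    (1, 2, "2024-02-16", "2024-03-31"),
    (2, 3, "2024-01-01", "2024-06-30")])

def pasture_on(conn: sqlite3.Connection, animal_id: int, day: str):
    """Resolve animal -> group -> pasture through both link tables for one day."""
    row = conn.execute("""
        SELECT p.name
        FROM animal_assignment aa
        JOIN pasture_assignment pa ON pa.grazing_group_id = aa.grazing_group_id
        JOIN pasture p ON p.id = pa.pasture_id
        WHERE aa.animal_id = :a
          AND :d BETWEEN aa.start_date AND COALESCE(aa.end_date, '9999-12-31')
          AND :d BETWEEN pa.start_date AND pa.end_date""",
        {"a": animal_id, "d": day}).fetchone()
    return row[0] if row else None

current = conn.execute(
    "SELECT animal_id, grazing_group_id FROM current_animal_assignment ORDER BY animal_id"
).fetchall()
print(current)                            # [(1, 2), (2, 2)]
print(pasture_on(conn, 1, "2024-02-20"))  # South
print(pasture_on(conn, 1, "2024-05-01"))  # Creek
```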
Since there's no clear question in your post, I'd just say you are going the right way.
I was reading about owned entity types here https://learn.microsoft.com/en-us/ef/core/modeling/owned-entities#feedback and I was wondering when I would use them, especially with .ToTable(). I am also not sure whether ToTable creates a relationship with keys.
I read the entire article, so I understand that it essentially forces you to access the data via navigation properties and prevents the owned table from being treated as an entity. It also says Include() is not needed and the data comes down with every query for the parent table, so it's not as if you are reducing the amount of data that comes back.
So what's the point, exactly? And what's the point of "table splitting"?
It takes the place of complex types, with the option to set it up like a 1-1 relationship with ToTable, while being automatically eager-loaded. This would use the same PK in both tables, as in a 1-1 relationship.
The point of table splitting is that you want an object model that is normalized where the table structure is not. This fits scenarios where you have an existing table structure and want to split off related pieces of that data into sub-entities associated with the main entity. With the ToTable option, it would be similar to a 1-1 relationship, but automatically eager-loaded. However, when considering the reasons to use a 1-1 relationship, I would consider this option a bad choice.
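To make "normalized object model over a non-normalized table" concrete, here is a hand-written Python illustration of the mapping idea only; it is not how EF Core implements owned types or table splitting, and the table, columns and classes are hypothetical. One flat row is split into a parent object plus an owned sub-object that has no identity of its own and is always loaded with its owner.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Address:           # "owned" piece: no key of its own
    street: str
    city: str

@dataclass
class Customer:
    id: int
    name: str
    address: Address     # always loaded together with the customer

conn = sqlite3.connect(":memory:")
# One flat, denormalized table (e.g. an existing legacy schema).
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, street TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme', '1 Main St', 'Springfield')")

def load_customer(conn: sqlite3.Connection, customer_id: int) -> Customer:
    """Split one row into a parent object plus an owned sub-object."""
    cid, name, street, city = conn.execute(
        "SELECT id, name, street, city FROM customer WHERE id = ?", (customer_id,)).fetchone()
    return Customer(cid, name, Address(street, city))

c = load_customer(conn, 1)
print(c.address.city)  # Springfield
```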
Common reasons for using a normal 1-1 relationship include:
Splitting off expensive-to-load, rarely used data (images, binaries, memo fields).
Encapsulating data particular to a single application off of a common entity. E.g. if I have a "Customer" which is used by both a billing system and a CRM, I might have "CustomerBillingData" and "CustomerCRMData" owned by "Customer" rather than inherited BillingCustomer / CRMCustomer types, since there is a "single" customer that may serve one or both systems. Billing doesn't care about CRM data, and CRM doesn't care about billing data. If all data is in "Customer", then both systems potentially need to be updated, and I cannot rely on constraints when the data is optional to the other system. By using composition I can enforce required data for a particular system.
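The composition idea can be shown at the table level with a minimal sqlite3 sketch (hypothetical table and column names): each system's data lives in its own table whose primary key is also the FK to customer, which is the shared-PK 1-1 shape. A column can then be NOT NULL for its own system without forcing it on customers that only use the other system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- 1-1 by shared primary key: the PK is also the FK to customer.
    CREATE TABLE customer_billing_data (
        customer_id INTEGER PRIMARY KEY REFERENCES customer(id),
        billing_account TEXT NOT NULL     -- required, but only for billing customers
    );
    CREATE TABLE customer_crm_data (
        customer_id INTEGER PRIMARY KEY REFERENCES customer(id),
        account_manager TEXT NOT NULL     -- required, but only for CRM customers
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO customer_billing_data VALUES (1, 'ACC-42')")  # billing only, no CRM row

def rejected(sql: str, params: tuple) -> bool:
    """True if the database refused the insert because of a constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

# A CRM row without its required column is refused ...
missing_manager = rejected("INSERT INTO customer_crm_data VALUES (?, ?)", (1, None))
# ... and a second billing row for the same customer violates the 1-1 key.
duplicate_billing = rejected("INSERT INTO customer_billing_data VALUES (?, ?)", (1, "ACC-43"))
print(missing_manager, duplicate_billing)  # True True
```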
In neither of these cases would I want to use table splitting or anything that automatically eager-loads, so owned types with ToTable would not replace 1-1 relationships by any stretch. It's essentially a stricter version of complex types; I'd say it's strictly for entity organization, and not something I'd admit to wanting to use very often.
I'm writing an API with Spring Boot and trying to keep it RESTful, but the structure is quite nested. Say I have:
/api/examboard/{ebid}/qualification/{qid}/subject/{sid}/module/{mid}/
I have a controller for every noun that takes in all the IDs. The problem with this is that I don't really need an ebid or a qid for modules; most of the time they only need to be concerned with subjects. The mapping between them all is quite simple: an examboard has many qualifications, a qualification has many subjects, etc.
Now, say I go for a simpler API design where I only need the parent ID, so the Subject controller will also have:
api/subject/{sid}/module
Then I need to include multiple services in my controller because of the way JPA works, since I need both SubjectEntity-based calls and ModuleEntity-based calls. However, I want to maintain a one-to-one relationship between my controllers and services, and between my services and repositories. This is why I opted for the longer URL mentioned above, but it does seem like overkill. Does anyone have advice on how I should structure an API like this? Most examples are quite small and don't really fit.
Without knowing more about your models and the relations between them, this answer will have to stay a bit diffuse.
First of all, "it depends". I know, but it really does. The way you should design an API depends heavily on your use cases, which define the required access patterns. Do you often need all modules for a subject? Then introduce /subjects/{sid}/modules. If you need the details for a module of a subject in a qualification in an examboard, by all means have /examboards/{ebid}/qualifications/{qid}/subjects/{sid}/modules/{mid}.
As you say, there are many relations between your entities. That is fine, but it does not mean that your API needs to capture each of these relations in a dedicated endpoint. You should distinguish between retrieving and modifying entities here. Below are examples of certain operations you might want to have (not knowing your models, this may not apply; consider it an illustration):
Retrieve qualifications for an examboard
GET /examboards/{ebid}/qualifications plain and simple
GET /qualifications?ebid={ebid} if you feel you might need sophisticated filtering later on
or create a new qualification for an examboard
POST /examboards/{ebid}/qualifications with the details submitted in the body
POST /qualifications with the details submitted in the body and making the associated examboard ebid part of the submitted data
or update an existing qualification
PUT /qualifications/{qid} (if this operation is idempotent)
POST /qualifications/{qid} (if it should not be considered idempotent)
or delete qualifications
DELETE /qualifications/{qid} deletes entities, cascade-deletes associations
DELETE /examboards/{ebid}/qualifications clears all qualifications from an examboard, without actually deleting the qualification entities
There are certainly more ways to let an API do all these things, but this should demonstrate that you need to think of your use cases first and design your API around them.
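A framework-agnostic sketch of the retrieval examples above (plain Python, not Spring; the data, route table and function names are hypothetical). Both URL shapes are thin adapters over one service method, so supporting /examboards/{ebid}/qualifications and /qualifications?ebid={ebid} side by side costs very little.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory data: each qualification knows its examboard id.
QUALIFICATIONS = {1: {"id": 1, "ebid": 10, "name": "GCSE"},
                  2: {"id": 2, "ebid": 10, "name": "A Level"},
                  3: {"id": 3, "ebid": 20, "name": "BTEC"}}

def list_qualifications(ebid=None) -> list:
    """Single service method; both URL styles below delegate to it."""
    return [q for q in QUALIFICATIONS.values() if ebid is None or q["ebid"] == ebid]

ROUTES = [
    # GET /examboards/{ebid}/qualifications  (nested, "plain and simple")
    (re.compile(r"^/examboards/(\d+)/qualifications$"),
     lambda m, query: list_qualifications(int(m.group(1)))),
    # GET /qualifications?ebid={ebid}        (flat collection with an optional filter)
    (re.compile(r"^/qualifications$"),
     lambda m, query: list_qualifications(int(query["ebid"][0]) if "ebid" in query else None)),
]

def handle_get(url: str) -> list:
    """Match the path against the route table and call the first matching handler."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    for pattern, handler in ROUTES:
        m = pattern.match(parsed.path)
        if m:
            return handler(m, query)
    raise LookupError(f"no route for {url}")

nested = handle_get("/examboards/10/qualifications")
flat = handle_get("/qualifications?ebid=10")
print(nested == flat, [q["name"] for q in flat])  # True ['GCSE', 'A Level']
```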
Please note the pluralisation of collection resources in the previous examples. This comes down to personal preference, but I tend to follow the argumentation of Sam Ruby in RESTful Web Services (available as PDF) that collections should be first-class citizens in an API.
Usually, there should be no reason to have 1:1:1 relationships between controllers, services and repositories; often it is not even possible. I don't know why you want this, but following through with it will force you to put a lot of logic into your database queries and models. While this (depending on your setup and skills) may or may not be easily testable, it certainly shifts the required test types from unit tests (simpler, usually faster, more fine-grained) to integration tests (more setup, more complex, usually slower), because the bulk of your business logic ends up in joins and subselects in your repositories instead of in your services.
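A minimal sketch of a controller that depends on two services, written in plain Python rather than Spring (class and method names are hypothetical). The handler for GET /subjects/{sid}/modules asks SubjectService whether the subject exists and asks ModuleService for its modules; neither service needs to know about the other, and each can be unit-tested with in-memory data.

```python
from dataclasses import dataclass, field

@dataclass
class SubjectService:
    subjects: dict = field(default_factory=dict)   # sid -> subject name

    def exists(self, sid: int) -> bool:
        return sid in self.subjects

@dataclass
class ModuleService:
    modules: dict = field(default_factory=dict)    # mid -> (owning sid, module name)

    def for_subject(self, sid: int) -> list:
        return [name for (owner, name) in self.modules.values() if owner == sid]

class NotFound(Exception):
    pass

@dataclass
class SubjectModulesController:
    """Handles GET /subjects/{sid}/modules by composing two services."""
    subjects: SubjectService
    modules: ModuleService

    def get(self, sid: int) -> list:
        if not self.subjects.exists(sid):        # subject-level concern
            raise NotFound(f"subject {sid}")
        return self.modules.for_subject(sid)     # module-level concern

controller = SubjectModulesController(
    SubjectService({1: "Maths", 2: "English"}),
    ModuleService({100: (1, "Algebra"), 101: (1, "Geometry"), 102: (2, "Poetry")}))
print(controller.get(1))  # ['Algebra', 'Geometry']
```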
I will only address your REST API structure question.
As you already pointed out
The problem with this is that I don't really need an ebid or a qid for modules, they only really need to be concerned with subjects most of the time
You need to think of your entities as resources. If an entity can stand on its own, give it its own top-level resource. If instead an entity exists only as part of another entity, build a subresource below its parent. This should correspond to the association types aggregation and composition in your object model design.
Otherwise every entity that is part of a many relationship should also be accessible via a subresource on the other side of the relationship.
As I understand it, you have a OneToMany relationship between examboard and qualification, so we get:
api/examboards/{eid}/qualifications
api/qualifications/{qid}/examboard
You could also remove the examboard subresource and include the examboard in the qualification response.
For ManyToMany relationships you need two subresources:
api/foos/{fid}/bars
api/bars/{bid}/foos
And another resource to manipulate the relationship itself.
api/foosToBars/{fid}+{bid}
Or something similar.
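The many-to-many layout above can be sketched with an in-memory store of (fid, bid) pairs. This is plain Python; the store and function names are hypothetical, and mapping PUT and DELETE onto the relationship resource is an assumption, since the answer does not name HTTP methods. Deleting a link removes only the relationship, never the foo or bar itself.

```python
# Hypothetical in-memory store of foo <-> bar links as (fid, bid) pairs.
links: set[tuple[int, int]] = {(1, 10), (1, 11), (2, 10)}

def bars_of_foo(fid: int) -> list[int]:
    """GET api/foos/{fid}/bars"""
    return sorted(b for f, b in links if f == fid)

def foos_of_bar(bid: int) -> list[int]:
    """GET api/bars/{bid}/foos"""
    return sorted(f for f, b in links if b == bid)

def parse_link_id(link_id: str) -> tuple[int, int]:
    """Split the '{fid}+{bid}' segment of api/foosToBars/{fid}+{bid}."""
    fid, bid = link_id.split("+")
    return int(fid), int(bid)

def put_link(link_id: str) -> None:
    """PUT api/foosToBars/{fid}+{bid}: create the relationship (idempotent)."""
    links.add(parse_link_id(link_id))

def delete_link(link_id: str) -> None:
    """DELETE api/foosToBars/{fid}+{bid}: remove only the relationship."""
    links.discard(parse_link_id(link_id))

put_link("2+11")
delete_link("1+10")
print(bars_of_foo(2), foos_of_bar(10))  # [10, 11] [2]
```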
I'm new to Entity Data Models and I'm looking for advice on how to organize my entity model.
Should I create one entity model file (.edmx) containing all the tables in my database, or should I break it into logical files for users, orders, products, etc.?
Please let me know which one is better and what the pros and cons (if any) of each alternative are.
Thanks.
I'm going to go against the grain here. I've now built two large applications with EF, one with a single edmx and one with several. There are pros and cons, but generally I found life to be much easier with one edmx. The reason is that there is almost never a real separation of domains within an app, even if there appears to be one at the start. New requirements pop up asking you to relate entities in different edmx files, and then you have to refactor and continually move things around.
All arguments for dividing will soon be obsolete when EF 5 introduces multiple diagrams, which address the only real benefit of dividing edmx files in the first place: you don't want to have to see everything you're not working on, and you don't want a performance impact.
In my app with the divided edmx files, we now have some duplicate entities to get the benefit of navigation properties. Maybe your app has a true separation of domains, but usually everything connects to the user. I'm considering merging two of them now, but it will be a lot of work. So I would say keep them together until it becomes a problem.
Having one big EDM containing all the entities is generally NOT good practice and is not recommended. You should come up with different sets of domain models, each containing related objects, while each set is unrelated to and disconnected from the others.
Take a look at this post where I explained this matter in detail:
Does it make sense to create a single diagram for all entities?
I think we should keep multiple edmx files in our project: one edmx file per aggregate (a collection of related objects). As per DDD (Domain-Driven Design), we can have more than one aggregate in our model, so we can keep one edmx file for each aggregate.