DB Design: more tables vs less tables - entity-framework

Say I want to design a database for a community site with blogs, photos, forums, etc. One way to do this is to single out the concept of a "post", since a blog entry, a blog comment, a photo, a photo comment, and a forum post can all be thought of as a post. So I could have one table named Post [PostID, PostType, Title, Body .... ], where PostType tells what type of post it is.
Or I could design this whole thing with more tables (BlogPost, PhotoPost, ForumPost) and leave Comment as its own table with a CommentType column.
Or have a Post table for all types of post, but have a separate Comment table.
To be complete, I'm using ADO.NET Entity Framework to implement my DAL.
Now the question: what are the implications of each route described above for DB performance and manageability, middle-tier design and code cleanliness, EF performance, etc.?
Thank you very much!
Ray.

Let me ask you this:
What happens if two years from now you decide to add a 'music post' as a blog type? Do you have to create a new table for MusicPost, and then re-code your application to integrate it? Or would you rather log on to your blog admin panel, add a blog type in a drop-down box called 'Music', and be on your merry way?
In this case, less tables!

Generally, life will be easier if you can have all the posts in one table:
fewer joins to perform
fewer tables to maintain
common attributes are not repeated between tables
code is more generic
However, you could run into some issues:
if each subtype has a lot of its own attributes, you could end up with many columns - maybe too many for your DBMS
if a subtype has an attribute (e.g. a stored picture) that is expensive for your DBMS to maintain even when unused, you might not want that column in all rows
Should you run into such an issue, you can create a new table just for the specific attributes of that post subtype - for example:
create table posts (post_id number primary key,
post_date date,
post_title ...); /* All the common attributes */
create table photo_post (post_id references posts, photograph ...);
In many cases, no such issues arise and a single table for all will suffice.
I can't think of any merit in creating a distinct table for every subtype.
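To make the "common table plus one extra table for a subtype" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the post_type values and the sample rows are assumptions made for illustration; the original snippet leaves those details elided, and SQLite stands in for whatever DBMS you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    -- All the common attributes live in one table.
    CREATE TABLE posts (
        post_id    INTEGER PRIMARY KEY,
        post_type  TEXT NOT NULL,
        post_title TEXT NOT NULL
    );
    -- Only the expensive, subtype-specific attribute is split out.
    CREATE TABLE photo_post (
        post_id    INTEGER PRIMARY KEY REFERENCES posts(post_id),
        photograph BLOB NOT NULL
    );
""")

conn.execute("INSERT INTO posts VALUES (1, 'blog', 'Hello')")
conn.execute("INSERT INTO posts VALUES (2, 'photo', 'Sunset')")
conn.execute("INSERT INTO photo_post VALUES (2, ?)", (b"\x89PNG",))

# One query still lists every post; the join is only needed for photo data.
rows = conn.execute("""
    SELECT p.post_id, p.post_type, p.post_title, ph.photograph IS NOT NULL
    FROM posts p
    LEFT JOIN photo_post ph ON ph.post_id = p.post_id
    ORDER BY p.post_id
""").fetchall()
print(rows)
```

Rows without a photo simply have no matching photo_post row, so the wide or costly column never appears on ordinary posts.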

The problem is similar to the question of how deep your hierarchy should be in an OO design.
A simple approach in OO terms would be to have a base class Post and children for BlogPost, ForumPost and so on. Comment could either be a child of Post or its own hierarchy, depending on your requirements.
Then how this is going to be mapped to DB tables is an entirely different question. This classical essay by Scott Ambler deals with the different mapping strategies and explains their advantages and disadvantages in a rather detailed way.

Related

Parse Platform on iOS: Relations, Joins, or Arrays for Large Many-to-Many?

In the Parse.com API reference for Swift on iOS, it is very clear when to use the different kinds of One-to-Many relationships, based on the expected size of the Many side.
But I find it less clear on what kind of Many-to-Many relationships to use when both sides could be very large.
In my case, I have a Charity object that my Users can make small (often one-dollar) contributions to--so each User could conceivably make thousands of these contributions, and each Charity could have thousands of Users making contributions to it.
The Many-to-Many options listed for this kind of thing are Parse Relations, Join Tables, and Arrays, of which the docs explain:
Arrays should be used when the relationship will reliably include under 100 references, which is very clear and helpful guidance that I should not use Arrays.
The docs say Parse Relations could be used, for instance, to connect Books with multiple Authors and Authors with multiple Books--a situation in which a given Book is unlikely to have over 100 Authors, and only rarely will an Author have over 100 Books--so it's unclear if this is appropriate when both sides could be very large, as in my case.
The docs say Join Tables should be used when extra metadata should be attached to each relationship, so for one thing, I don't at present have an explicit need for this, and for another, the docs don't seem to even mention anything about how or if it matters how large each side of the Many-to-Many relationship is.
In the absence of any other information, it looks like I should use Join Tables, but only because the docs don't imply that I shouldn't, and not for the reason the docs say I should.
Which seems like a flimsy rationale.
I would greatly appreciate any guidance anyone can give.
Behind the scenes, when you use Relation, Parse Server automatically creates a Join Table for you and provides some APIs for easily managing and fetching its data. So, in terms of performance, the two should be very similar.
The downside of Relation is that you cannot add new fields to the "Join Table" it creates. So, if you need, for example, to store the charities that each of the users likes, a relation between User and Charity would be a good fit, because you just need to store that the relation exists and do not need to store any extra information.
On the other hand, if you need to store the donations that each user made to each of the charities, I'd create a Join Table called Donation or UserCharity with a pointer to the User class, a pointer to the Charity class, and the value of the donation. In this case, Relation is not a fit because you need to store the donation value.
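As a rough relational analogue of that Donation join table (this is not the Parse SDK; it is a sketch in Python's sqlite3, and the table names, the amount_cents column and the sample data are all assumptions), each donation becomes one row holding two references plus the extra value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE app_user (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE charity  (charity_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The join table: one row per donation, with its own metadata.
    CREATE TABLE donation (
        donation_id  INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL REFERENCES app_user(user_id),
        charity_id   INTEGER NOT NULL REFERENCES charity(charity_id),
        amount_cents INTEGER NOT NULL
    );
    -- Index both sides so either side can grow large.
    CREATE INDEX donation_by_user    ON donation(user_id);
    CREATE INDEX donation_by_charity ON donation(charity_id);
""")

conn.execute("INSERT INTO app_user VALUES (1, 'Ann')")
conn.executemany("INSERT INTO charity VALUES (?, ?)",
                 [(1, 'Food Bank'), (2, 'Shelter')])
conn.executemany(
    "INSERT INTO donation (user_id, charity_id, amount_cents) VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 1, 100), (1, 2, 250)],
)

# Total donated by user 1 to each charity.
totals = conn.execute("""
    SELECT c.name, SUM(d.amount_cents)
    FROM donation d
    JOIN charity c ON c.charity_id = d.charity_id
    WHERE d.user_id = 1
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)
```

The point is only the shape: the same user and charity can appear in many rows, and the amount lives on the relationship row rather than on either side.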

Are 2 additional tables better than one with metadata?

I have a question about architecture: I have 2 subjects, DocumentLetter and DocumentOther, and both should be approved by managers.
Which would be better: to use 2 additional models, DocumentLetterApprove and DocumentOtherApprove, with entity relations, OR one additional table without relations that contains info about the model identity (columns ModelName and ModelID)?
Another example is attachments for different documents.
Letter and contract are 2 different tables, and each should have its own attachments.
Should I use an additional table for each model (one for letter and one for contract), or create one table with fields ModelName and ModelID?
Personally, I would favor keeping the separate entities with the relationships if there is any possibility that the related entities (approvals) could be in any way different depending on what they are applied to. I avoid ambiguously linked tables unless they represent a large 1 to many entity that might be associated to one of a number of other entities.
The problem with using something like a "ParentType" + "ParentId" is that you cannot leverage any form of FK constraint between the related tables. This also means you cannot leverage EF relationships, and there will probably be times when you load one of the documents and want to know whether it is approved, along with details from the approval.
If an Approval for the different document types is expected to be identical then I would sooner declare a common Approval table/Entity and put an ApprovalId on each of the document type tables to establish a many-to-1 relationship from the document to the approval.
If an approval is identical and can form a many to many, then a suitable many-to-many relation table DocumentLetter - DocumentLetterApproval (FKs) - Approval (Approval details) can be employed.
If a Letter approval vs. other approval could be different then: DocumentLetter - DocumentLetterApproval (Approval details)
Design decisions like this usually come from considerations around DRY (Don't Repeat Yourself). What advice I can give is that KISS (Keep It Stupidly Simple) should trump DRY, and that DRY should only apply to logic/structure that is proven to be identical. (not merely expected to be identical, or worse, expected to be similar) DRY should be a re-factoring consideration for constant improvement, not an up-front design decision. Coding for DRY too early ends up costing you time when you paint yourself into corners. By keeping code fluid these relationships can be proven, then if they are proven to be identical, re-factored into a single entity. Time is still spent re-factoring, but re-factoring to make code structure better rather than making code worse when having to work around design assumptions.
An example where I might consider an ambiguous, loosely linked table would be something like File Attachments. I might have several entities that can hold references to 1 or more attachments. Attachments are not something I would need to link to often, but rather through an explicit action for which I could fire off an additional query anyway, since I'm not about to pre-load attachment details when loading a document. In this case an attachment table might have ParentType and ParentId indexed so that I can quickly get details for a particular document or other entity. I would never try to do something like Context.Documents.Include(x => x.Attachments) or the like; there would be no such reference available. Attachments would always be accessed by single document, so I would resort to Context.Attachments.Where(x => x.ParentType == ParentTypes.DocumentLetter && x.ParentId == documentLetterId).ToList();
I have experience working on systems that were designed solely with these types of ambiguously linked tables. They are not only extremely slow as they scale to any reasonable size, but they are also extremely error prone as systems evolve and the nature of the relationships change. Records have a tendency to get out of sync with the expected rules.
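The FK point can be illustrated with a small sketch in Python's sqlite3 (not EF; the table and column names here are hypothetical and only loosely follow the ones in the question). A real foreign key rejects an approval for a letter that does not exist, while a ParentType + ParentId table happily stores an orphan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE document_letter (id INTEGER PRIMARY KEY, subject TEXT NOT NULL);
    -- Dedicated approval table with a real FK.
    CREATE TABLE document_letter_approval (
        id                 INTEGER PRIMARY KEY,
        document_letter_id INTEGER NOT NULL REFERENCES document_letter(id),
        approved_by        TEXT NOT NULL
    );
    -- Loosely linked table: no constraint can be declared on parent_id.
    CREATE TABLE attachment (
        id          INTEGER PRIMARY KEY,
        parent_type TEXT NOT NULL,
        parent_id   INTEGER NOT NULL,
        file_name   TEXT NOT NULL
    );
    CREATE INDEX attachment_by_parent ON attachment(parent_type, parent_id);
""")
conn.execute("INSERT INTO document_letter VALUES (1, 'Offer')")

# The FK refuses an approval pointing at a letter that does not exist.
try:
    conn.execute("INSERT INTO document_letter_approval VALUES (1, 999, 'manager')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# The loosely linked table accepts the same mistake silently.
conn.execute("INSERT INTO attachment VALUES (1, 'DocumentLetter', 999, 'orphan.pdf')")
orphans = conn.execute("""
    SELECT a.id
    FROM attachment a
    LEFT JOIN document_letter d ON d.id = a.parent_id
    WHERE a.parent_type = 'DocumentLetter' AND d.id IS NULL
""").fetchall()
print(fk_rejected, orphans)
```

With the loosely linked design, finding such orphans is left to queries like the last one, which is one way records drift out of sync with the expected rules.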

Modeling many to many relations with postgreSQL

I work in cattle production and I am learning about database design with PostgreSQL. I am working on an entity-relationship model for a database that records the allocation of the pastures in which cattle graze. In the logic of this business, an animal can be assigned to several grazing groups during its life. Each grazing group in turn has a duration and is composed of several pastures in which the animals graze according to a rotation calendar. In this way, at a specific time, animals graze in a pasture that is part of a grazing group.
I have a situation in which many grazing groups can be assigned to many animals as well as many pastures. Trying to model this problem I find a fan trap because there are two one-to-many relationships for a single table. According to this, I would like to ask you about how one can deal with this type of relationship in which one entity relates to two others in the form of many-to-many relationships.
I put a diagram on the problem.
model diagram
Thanks
Traditionally, using a link table (the ones you call assignment) between two tables has been the right way to do many-to-many relationships. Other choices include having an ARRAY of animal ids in grazing group, using JSONB fields etc. Those might prove to be problematic later, so I'd recommend going the old way.
If you want to keep track of history, you can add an active boolean field (to the link table probably) to indicate which assignment is current or have a start date and end date for each assignment. This also makes it possible to plan future assignments. To make things easier, make VIEWs showing only current assignment and further VIEWs to show JOINed tables.
Since there's no clear question in your post, I'd just say you are going the right way.
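A minimal sketch of the link-table-with-dates idea, written with Python's sqlite3 rather than PostgreSQL so it runs standalone (table names, the ISO-text date columns and the sample rows are assumptions; in PostgreSQL you would use real DATE columns and CURRENT_DATE). The pasture side would follow the same pattern with its own link table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE animal        (animal_id INTEGER PRIMARY KEY, tag  TEXT NOT NULL);
    CREATE TABLE grazing_group (group_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Link table: one row per assignment, with its validity period.
    CREATE TABLE animal_assignment (
        animal_id  INTEGER NOT NULL REFERENCES animal(animal_id),
        group_id   INTEGER NOT NULL REFERENCES grazing_group(group_id),
        start_date TEXT NOT NULL,   -- ISO date, e.g. '2020-06-01'
        end_date   TEXT,            -- NULL means open-ended
        PRIMARY KEY (animal_id, group_id, start_date)
    );
    -- View showing only assignments in effect today.
    CREATE VIEW current_assignment AS
        SELECT a.tag, g.name AS group_name, s.start_date
        FROM animal_assignment s
        JOIN animal a        ON a.animal_id = s.animal_id
        JOIN grazing_group g ON g.group_id  = s.group_id
        WHERE s.start_date <= date('now')
          AND (s.end_date IS NULL OR s.end_date > date('now'));
""")

conn.execute("INSERT INTO animal VALUES (1, 'A1')")
conn.executemany("INSERT INTO grazing_group VALUES (?, ?)",
                 [(1, 'Spring'), (2, 'North'), (3, 'Planned')])
conn.executemany("INSERT INTO animal_assignment VALUES (?, ?, ?, ?)", [
    (1, 1, '2020-01-01', '2020-06-01'),  # past assignment (history)
    (1, 2, '2020-06-01', None),          # current assignment
    (1, 3, '2999-01-01', None),          # planned future assignment
])

current = conn.execute(
    "SELECT tag, group_name FROM current_assignment ORDER BY tag"
).fetchall()
print(current)
```

History and planned assignments stay in the link table; the view just hides them from everyday queries.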

What is Objectified Relationship?

I am not sure if I should be asking this here or at the programmers site. I came across "objectified relationship" while researching recursive saving in llblgen framework...I then searched stackoverflow (yes, first) and then google. I then came across a brief (related) topic on nHibernate.
I have an idea what it is but is there a detail description or explanation on it?
The relationship is an object itself, not just a connection. In a database the relationship would be represented as a row in a table rather than just as a UID in a column referencing another table. In a graph the relationship would itself be a node rather than 'just' an edge.
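A small illustration of that idea in plain Python (the Person, Company and Employment names are made up for the example and have nothing to do with LLBLGen or NHibernate specifically): instead of a person merely holding a reference to a company, the employment is its own object with its own attributes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Company:
    name: str


@dataclass(frozen=True)
class Employment:
    """The relationship itself, stored as an object with its own data."""
    person: Person
    company: Company
    start: date
    role: str


alice = Person("Alice")
acme = Company("Acme")

# The link between alice and acme is a first-class value we can store,
# query, and attach attributes to (the table row / graph node analogue).
job = Employment(alice, acme, date(2020, 1, 6), "engineer")
print(job.role, job.start.year)
```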

Mapping i18n tables in JPA

I am trying to map the tables from a database (of 60 tables) using JPA. I am doing this for a multilingual application, hence every piece of data has to be available in more than one language.
My database table structure is something like this. I have a Region table, which is related to a RegionLanguage table. The RegionLanguage table actually holds the description for that Region in different languages. You may want to have a look at this diagram:
When it comes to JPA, I find it hard to map it in a way that would require as few associations as possible. I have tried to use the secondary table concept, but it fails on some occasions since this is a @OneToMany relationship. Preferably, I was thinking of a solution that would make these two tables appear as a single object.
Your help is appreciated.
Thanks in advance.
I'm not sure I understand this strange diagram... The RegionLanguage has some kind of foreign key pointing to Region.id, I take it? If Region has only one column (as shown on the 'diagram'), you can simply map only 'RegionLanguage' and you'll have only one entity as you wanted --- no information lost ;).
But seriously, how would you want it mapped? Do you want to have something like this:
class Region {
//.. the missing fields not shown in diagram
List<String> languages; // take only language to avoid creating separate entity for region language
}
or something like this:
class RegionInnerJoinRegionLanguage {
// all fields from Region
// all fields from RegionLanguage
}
In any case you didn't say how the rest of tables are joined with your i18n tables. From your description, I'm guessing, that all tables have fk to RegionLanguage. I'm not sure what the Region table is used for in the grand scheme of things. I guess it's just for grouping of languages... I imagine this 'models' Switzerland (one 'region' 4 languages)... But what will you do with languages spoken in several regions? Are you going to have several French, English etc. languages (one for each region) and all data multiplied for each of those??
I know you didn't ask for this... I just think you oversimplified your data structure for this question... So much so that it's hard to guess what you really want to achieve.
In any case, if you want to use the list of strings approach, you can try this:
@ElementCollection
@CollectionTable(name="RegionLanguage",
    joinColumns=@JoinColumn(name="regionID") // or whatever... it's not on your diagram.
)
@Column(name="language")
private List<String> languages;
I still don't understand why you want to put both tables into a single entity... If it's 'read-only' data, you might try creating a view and mapping it --- always a 'way out' ;). But it's (both of them, in fact) a bit of over-complication, to my mind --- it's going to bite you in future. My advice is to just go with 'simple OneToMany mapping'.