DB Tables which are in 3NF or 4NF but not in DKNF

Are there examples of Relational tables which are in 3NF or 4NF but
not in Domain Key Normal Form?

Edit, Aug 2018 after 9 years
DKNF is the ultimate state of database normalisation, reached once the anomalies targeted by all the previous normal forms have been eliminated
1NF -> 2NF -> 3NF -> BCNF -> 4NF -> 5NF -> 6NF/DKNF
The 6NF/DKNF question (Fagin vs Date) is out of scope here
So the question doesn't really make sense, because any design that is "only" in 3NF or 4NF (and not in a higher normal form) won't be in DKNF (or 6NF)
Most people don't design past BCNF unless they have particularly complex relationships.
New link: https://www.tutorialride.com/dbms/database-normalization.htm

Yes. Domain Key Normal Form is not an enforceable normalization step. DKNF is defined as follows: "If every table has a single theme, then all functional dependencies will be logical consequences of keys. All data value constraints can then be expressed as domain constraints." In other words, if every constraint on the relation is a logical consequence of the definition of keys and domains, then the relation is in DKNF.
DKNF is mistakenly referred to as Sixth Normal Form (6NF) by some in the research community, but this is technically incorrect. C. J. Date covers this in detail, and the article "On DK/NF Normal Form" is where I first learned about DKNF and understood its properties.
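To make the "3NF/4NF but not DKNF" case concrete, here is a minimal textbook-style sketch using Python's built-in `sqlite3` module. The `employee` table, its columns and the "juniors earn less than 50000" rule are hypothetical and do not come from the original answer. The key `name` determines every column, so the table is in BCNF (indeed 5NF). The rule, however, involves two attributes at once, so it is neither a key constraint nor a domain constraint, and that is what rules out DKNF. Splitting the table by type turns the rule into a single-column domain constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- BCNF: the key NAME determines every column. The hypothetical rule
    -- "type = 'junior' implies salary < 50000" spans two columns, so it is
    -- neither a key constraint nor a domain constraint: not in DKNF.
    CREATE TABLE employee (
        name   TEXT PRIMARY KEY,
        type   TEXT NOT NULL CHECK (type IN ('junior', 'senior')),
        salary INTEGER NOT NULL
    );

    -- DKNF-style redesign: the rule becomes a domain constraint on one column.
    CREATE TABLE junior_employee (
        name   TEXT PRIMARY KEY,
        salary INTEGER NOT NULL CHECK (salary < 50000)
    );
    CREATE TABLE senior_employee (
        name   TEXT PRIMARY KEY,
        salary INTEGER NOT NULL
    );
""")


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# The original table silently accepts a row that breaks the business rule...
bad_in_original = try_insert(
    "INSERT INTO employee VALUES (?, ?, ?)", ("Ann", "junior", 90000))
# ...while the redesigned schema rejects it through keys and domains alone.
bad_in_redesign = try_insert(
    "INSERT INTO junior_employee VALUES (?, ?)", ("Ann", 90000))

print(bad_in_original, bad_in_redesign)  # True False
```

A table-level `CHECK (type <> 'junior' OR salary < 50000)` would also enforce the rule in the original table, but that is a general constraint rather than a key or domain constraint, which is exactly the distinction DKNF draws. The split tables also introduce a cross-table rule (a name should not appear in both) that this sketch ignores.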


Regarding 1 to 1 associations in UML class models

In UML modeling we often encounter class models that state a 1 x 1, 1 x 1..*, 1..* x 1 or 1..* x 1..* association between given classes.
Take the example: Player 1..11 x 1 Team.
Wouldn't that impose a practical problem, in which it wouldn't be possible to determine what comes first: the team or a player?
In the example, a team would need at least one player to exist, while a player, to exist, needs the team.
Am I misinterpreting something?
Trying to implement it, you wouldn't be able to instantiate a Team, because you'd need at least one Player, and if you try to instantiate the Player, the Team would be missing.
How are 1 x 1 associations possible?
Thank you for your time!
When a model has a 1..11 relationship (Team - Player), there is no "what comes first". There need to be between 1 and 11 Players, and these can be connected to one Team. Only when all the connections are made do you have a complying model. You can point out that players form a team by adding a composition, but from a programming perspective this usually does not add much semantics. You need to have the instances anyway in order to create the connections. So the Team will likely hold an array of up to 11 Players, and for it to work, none of its entries may be null.
The same goes for 1..1 relations (plug/socket): only when they are connected do you have a complying model. From a modeling perspective, a 1..1 is often used when you need a "rucksack" for one of the two classes: you bind another class to it that carries separate information. This can then be used together with other classes which are only interested in these contents and not in the carrier itself.
You are correct. You must satisfy all the constraints somehow. Either create everything at once or relax your constraints. For example, a team can still exist as a team without any players, but a team must exist for a player to join it.
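In object code, "create everything at once" usually means a single factory that builds both ends of the association and checks the multiplicity before anyone can see a half-built object. The following minimal Python sketch assumes the 1..11 Player multiplicity from the question. The `Team` and `Player` classes and the `Team.create` factory are hypothetical illustrations, not part of any answer.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Player:
    name: str
    # Mandatory back-reference; excluded from repr to avoid Team <-> Player recursion.
    team: "Team" = field(repr=False)


@dataclass(eq=False)
class Team:
    name: str
    players: list = field(default_factory=list)

    @classmethod
    def create(cls, name, player_names):
        """Build a Team and its Players together, so both ends are set at once."""
        if not 1 <= len(player_names) <= 11:
            raise ValueError("a Team needs between 1 and 11 Players")
        team = cls(name)
        # The Team is not handed out until its players are attached.
        team.players = [Player(p, team) for p in player_names]
        return team


reds = Team.create("Reds", [f"Player {i}" for i in range(1, 12)])
print(len(reds.players), all(p.team is reds for p in reds.players))  # 11 True
```

The plain `Team(name)` constructor can still create an empty team. Restricting outside code to `create` is a convention here, not something Python enforces.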
Your question "How are 1 x 1 associations possible?" refers to the issue of mandatory mutually inverse references, or, in DBMS jargon, cyclic foreign keys, which may indeed create an object/row creation or update problem in a data management app or its underlying database (DB).
There are two approaches to dealing with it: 1) relax the mandatory reference constraint in at least one direction, or 2) allow intermediate app/DB states that do not have to satisfy the constraint.
1) While we know that in reality a team always includes more than zero players, we may choose not to implement this constraint for pragmatic reasons, such that we can more easily create a team data object (or DB row) without immediately assigning player objects/rows to it.
2) In our app, we may allow an intermediate state where a team has been created without any players assigned to it, and correspondingly in the underlying DB, we may instruct the transaction manager that the foreign key constraint is only checked when the entire transaction (consisting of first creating an empty team, then creating 11 players, such that each of them is assigned to the team and the team is assigned to them as their team) is completed. This can be achieved with the SQL clause DEFERRABLE INITIALLY DEFERRED, see the section Cyclic Foreign Keys of the post "Deferrable SQL Constraints in Depth".
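Approach 2 can be tried out without a database server, because SQLite also accepts `DEFERRABLE INITIALLY DEFERRED` on foreign keys once `PRAGMA foreign_keys = ON` is set for the connection. The minimal sketch below uses Python's `sqlite3` module. It assumes a hypothetical mandatory `captain_id` column as the team's reference to one of its players, which makes the two foreign keys cyclic. Both are checked only at `COMMIT`.

```python
import sqlite3

# isolation_level=None: no implicit transactions, so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE team (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        -- hypothetical mandatory reference from a team to one of its players
        captain_id INTEGER NOT NULL
            REFERENCES player (id) DEFERRABLE INITIALLY DEFERRED
    );
    CREATE TABLE player (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        team_id INTEGER NOT NULL
            REFERENCES team (id) DEFERRABLE INITIALLY DEFERRED
    );
""")


def create_team(team_id, name, captain_id, player_ids):
    """Insert a team and its players in one transaction; True if it committed."""
    try:
        conn.execute("BEGIN")
        # captain_id may point at a player that is inserted later in the
        # same transaction: deferred FKs are checked only at COMMIT.
        conn.execute("INSERT INTO team VALUES (?, ?, ?)", (team_id, name, captain_id))
        for pid in player_ids:
            conn.execute("INSERT INTO player VALUES (?, ?, ?)",
                         (pid, f"Player {pid}", team_id))
        conn.execute("COMMIT")  # raises IntegrityError if a reference dangles
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False


reds_ok = create_team(1, "Reds", captain_id=1, player_ids=list(range(1, 12)))
blues_ok = create_team(2, "Blues", captain_id=99, player_ids=[])  # no player 99
print(reds_ok, blues_ok)  # True False
```

The second call shows why the check at commit matters. The dangling captain reference is accepted statement by statement but rejected as a whole, so the failed team never becomes visible.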
It depends what kind of model it is. A UML model might be a model of the real world, e.g. a human hand has a 1:1 relationship with a human arm. This 1:1-association models a biological fact (ignoring disabilities).
A UML model might also be a model of the functionality of an application. If the application enforces that every team has 11 players and every player has one team (e.g. they are all created at once by filling out a form with a "Save"-button), then the 1:11-association correctly models the functionality of the application.
A UML model might also be a technical model of classes in some programming language or tables in a database. In that case, the 1:1-associations are only possible if your programming language or database system allows creating the instances on both sides simultaneously, or at least in the same transaction.
Side note: When modeling teams and players, you might consider an association between Team and Person, with multiplicity 0..* and role name 'player' on the Team side.

Which normal form does the ER Diagram guarantee?

I was told that whenever a proper ER diagram is drawn for a database and then mapped to a relational schema, the resulting schema is guaranteed to be in 3NF.
Is this claim true?
If not, can anyone provide a counterexample?
Also, can any normal form be claimed to be strictly followed when a relational schema is mapped from a perfect ER diagram?
The short answer is no. Depending on the analysis and design approach there could be examples of ER models that appear perfectly sound in ER terms but don't necessarily translate to a relational schema in 3NF. ER modelling and notation is not really expressive enough or formal enough to guarantee that all functional dependencies are correctly enforced in database designs. Experienced database designers are conscious of this and apply other techniques to come up with the "proper" design.
Terry Halpin devised a formal method for database design that guarantees a relational schema satisfying 5th Normal Form (see orm.net). He uses the Object Role Modelling approach, not ER modelling.
The diagram just shows what entities and attributes you have and how entities relate to one another. Your attributes can violate the normal forms. An ER diagram is just a representation; it does not enforce any rules.
There is nothing about representing a model in an ER diagram that implies satisfaction of 3NF.
The thinking behind the erroneous claim may be based on the idea that when you, for example, convert a repeating group from columns to rows in a child table, or remove partially dependent columns to another table, you are increasing the normal form of your relations. However, the diagrammatic convention doesn't enforce this in any way.
Let's see an example (in Oracle):
CREATE TABLE STUDENT (
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(64) NOT NULL,
RESIDENCE_STREET VARCHAR2(64),
RESIDENCE_CITY VARCHAR2(64),
RESIDENCE_PROVINCE VARCHAR2(64), -- determined by RESIDENCE_POSTALCODE
RESIDENCE_POSTALCODE VARCHAR2(10) -- text, not NUMBER: keeps leading zeros and letters
);
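In some countries the postal code uses prefixes to identify the region or province, so RESIDENCE_PROVINCE is functionally dependent on RESIDENCE_POSTALCODE. But RESIDENCE_POSTALCODE is a non-prime attribute (it is not part of any candidate key), so RESIDENCE_PROVINCE depends on the key ID only transitively, through a non-key attribute. This simple and common example is perfectly "legal" as an ER model, yet the table is not in 3NF.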
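One way to make the dependency visible is to test it against sample rows. An FD X -> Y holds in a set of rows if no two rows agree on X but differ on Y. Sample data can only refute an FD, never prove it, because an FD is a rule about all legal states, so treat this as a sanity check. The following minimal Python sketch uses made-up postal codes and provinces, and the `fd_holds` helper is hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the lhs columns but differ on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault records the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Made-up rows for the STUDENT table (only the relevant columns).
students = [
    {"ID": 1, "RESIDENCE_POSTALCODE": "1001", "RESIDENCE_PROVINCE": "North"},
    {"ID": 2, "RESIDENCE_POSTALCODE": "1001", "RESIDENCE_PROVINCE": "North"},
    {"ID": 3, "RESIDENCE_POSTALCODE": "2002", "RESIDENCE_PROVINCE": "South"},
]

# ID -> POSTALCODE -> PROVINCE: the province depends on the key only
# transitively, through a non-prime attribute, which is what breaks 3NF.
print(fd_holds(students, ["RESIDENCE_POSTALCODE"], ["RESIDENCE_PROVINCE"]))  # True

# The usual 3NF fix: move the dependency into its own relation keyed by postal code.
postal_codes = {r["RESIDENCE_POSTALCODE"]: r["RESIDENCE_PROVINCE"] for r in students}
student_3nf = [{"ID": r["ID"], "RESIDENCE_POSTALCODE": r["RESIDENCE_POSTALCODE"]}
               for r in students]
```

After the split, the province is stored once per postal code, so it can no longer disagree between two students who share a code.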

How to get EF6 to honor Unique Constraint (on FK) in Association/Relationship multiplicity?

2019 Update / TL;DR: switch to Entity Framework Core (or whatever else)
While missing some "Features", EF Core properly honors Alternate Keys (aka Unique Constraints) in addition to Primary Keys and thus does a much better job of honoring Relational Algebra. YMMV otherwise; at least it supports many more SQL schemas correctly.
This support was added in the (now very outdated) EF Core 1.0 release; it is a bit disappointing that the original EF never had this design(ed!) flaw addressed.
This may be related to my other question, the upshot of which seems to be that either:
Entity Framework is a terrible Relational Algebra mapper [1], or
(which I am hoping for) I am overlooking something with SSDL/CSDL and the EDMX model or EF mappings in general.
I have a Schema First model and the schema looks like this:
ExternalMaps
---
emap_id - PK
Melds
---
meld_id - PK
emap_id - >>UNIQUE INDEX<< over not-null column, FK to ExternalMaps.emap_id
For verification, these are scripted as follows, which should result in a multiplicity of ExternalMaps:1 <-> 0..1:Melds [2].
ALTER TABLE [dbo].[Melds] WITH CHECK ADD CONSTRAINT [FK_Melds_ExternalMaps]
FOREIGN KEY([emap_id]) REFERENCES [dbo].[ExternalMaps] ([emap_id])
CREATE UNIQUE NONCLUSTERED INDEX [IX_Melds] ON [dbo].[Melds] ([emap_id] ASC)
However, when I use the EDMX designer to update from the database (SQL Server 2012), from scratch, it incorrectly creates the Association / Foreign Key relation as ExternalMap:1 <-> M:Meld.
When I try to manually change the multiplicity on the Meld side (via the "Association Set" properties in the designer) to either 1 or 0..1, I get:
Running transformation: Multiplicity is not valid in Role 'Meld' in relationship 'FK_Melds_ExternalMaps'. Because the Dependent Role properties are not the key properties, the upper bound of the multiplicity of the Dependent Role must be *.
(As with my other question, this seems to be related to Unique Constraints not being correctly registered/honored as Candidate Keys.)
How can I get EF to honor the 1 <-> 0..1/1 multiplicity, as established by the model?
[1] While I hope this is not the case, I am having no end of grief when trying to get EF to map onto a perfectly valid RA model: LINQ to SQL (L2S) does not have this problem. Since my other question was not trivially answered for such a popular ORM, I am losing faith in this tooling.
[2] It is by design that the FK is not the other way: "Thou shalt not have nullable foreign keys." It is also not the case that it's a "shared" PK, as this answer from 2009 suggests as a fix.
I am using EF 6.1.1, VS 2013 Ultimate, and am not going to use any "OO subtype features" - if that changes anything.
EDIT sigh:
Multiplicity is not valid because the Dependent Role properties are not the key properties? (from 2011): is this still the case for the EF "Microsoft-endorsed Enterprise-ready" ORM in 2015?
At this rate the next time someone asks why EF wasn't used I'll have a large set of reasons other than "LINQ to SQL works just fine" ..
The problem is that Entity Framework (from EF4 through EF6.1, and who knows how much longer) does not "understand" the notion of Unique Constraints and all that they imply: EF maps Code First, not Relational Algebra *sigh*
This answer for my related question provides a link to a request to add the missing functionality and sums it up:
.. The Entity Framework currently only supports basing referential constraints on primary keys and does not have a notion of a unique constraint.
This can be expanded to pretty much all realms dealing with Unique Constraints and Candidate Keys, including the multiplicity issue brought up in this question.
I would be happy if this severe limitation of EF was discussed openly and made "well known", especially when EF is touted to support Schema First and/or replace L2S. From my viewpoint, EF is centered around mapping (and supporting) only Code First as a first-class citizen. Maybe in another 4 years ..
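Independently of EF6, the database side of the question can be checked directly. A UNIQUE index on a NOT NULL foreign key column is what gives ExternalMaps 1 <-> 0..1 Melds. The following minimal sketch uses Python's `sqlite3` as a stand-in for SQL Server. Table, column and index names follow the question, but the `dbo` schema and the T-SQL syntax are not reproduced, and the `add_meld` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE ExternalMaps (
        emap_id INTEGER PRIMARY KEY
    );
    CREATE TABLE Melds (
        meld_id INTEGER PRIMARY KEY,
        emap_id INTEGER NOT NULL REFERENCES ExternalMaps (emap_id)
    );
    -- The unique index makes emap_id a candidate key of Melds,
    -- so each ExternalMap has at most one Meld.
    CREATE UNIQUE INDEX IX_Melds ON Melds (emap_id);
""")


def add_meld(meld_id, emap_id):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO Melds VALUES (?, ?)", (meld_id, emap_id))
        return True
    except sqlite3.IntegrityError:
        return False


with conn:
    conn.executemany("INSERT INTO ExternalMaps VALUES (?)", [(1,), (2,)])

results = [
    add_meld(10, 1),  # first Meld for ExternalMap 1: accepted
    add_meld(11, 1),  # second Meld for ExternalMap 1: rejected by IX_Melds
    add_meld(12, 3),  # ExternalMap 3 does not exist: rejected by the FK
]
print(results)  # [True, False, False]
```

ExternalMap 2 keeps zero Melds, which covers the "0" in 0..1. This is the candidate-key information the EF6 designer ignores when it infers `*` for the dependent role.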

Converting a logical model to a physical model. Trouble understanding ERD

I am working with an ERD. It is supposedly a logical model and I am to make a physical model from it. I should be formatting in UML and our DBMS is PostgreSQL.
Some of my research (http://www.1keydata.com/datawarehousing/data-modeling-levels.html // http://en.wikipedia.org/wiki/Logical_data_model#Conceptual.2C_Logical_.26_Physical_Data_Model) indicates that this ERD may have too much information in it to be a logical model and that it may actually be closer to the physical.
My questions are as follows:
What do the bold labels mean?
What do the white "N"s and red "U"s at the end of some entries mean?
What is the difference between a dashed line (relationship) and a solid one?
What is the difference between the "crow's foot" and the broken line on either end of the relationship?
Is this closer to the physical model or logical model? What would I have to do to convert it from one to the other?
This is the ERD:
Could bold text indicate primary key attributes?
That's not part of any standard ER modelling notation. By no means certain but my guess would be U means unique, N means nullable.
A solid line means an identifying relationship. Dashed line means a non-identifying one. It's usually not an especially important distinction but look those terms up if you want to know more.
One-to-many relationship. The crow's foot represents the "many" side of the relationship; the short line across is the "one" side. Where the "one" symbol appears at both ends, that's a one-to-one relationship.
In the context of information modelling a logical model means a semantic model - a model that's more about the business domain than about an actual database design. Exactly what goes into the logical model and at what level of detail depends a lot on the intended audience for the model and on how you want to use it. Turning it into a "physical" model means making it into a design for a database with the technical features and any changes you would need for your chosen DBMS platform (specific data types for example).
Logical/physical models in the information modelling sense should not be confused with what are termed the logical level and physical level in DBMS architecture and database theory. In principle relational database tables (AKA relation variables) are always "logical" level constructs but in data modelling terms they are part of a so-called "physical" model. That unfortunate choice of modelling terminology is responsible for a lot of confusion and misunderstandings.
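In DDL terms, the solid-versus-dashed distinction from the answer above usually reads like this. In an identifying relationship, the parent's key is part of the child's primary key. In a non-identifying one, it is an ordinary (possibly nullable) foreign key column. The following small sketch uses hypothetical `customer`, `orders` and `order_line` tables and runs them through Python's `sqlite3` rather than PostgreSQL, so it shows only the structural idea, not PostgreSQL-specific types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY
    );
    -- Non-identifying (dashed line): customer_id is a plain FK column,
    -- not part of the primary key, and may even be NULL.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer (customer_id)
    );
    -- Identifying (solid line): the parent key order_id is part of the
    -- child's primary key, so a line is identified only within its order.
    CREATE TABLE order_line (
        order_id INTEGER NOT NULL REFERENCES orders (order_id),
        line_no  INTEGER NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );
""")


def pk_columns(table):
    """Primary-key column names of a table, in key order."""
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns.
    return [row[1] for row in sorted(info, key=lambda r: r[5]) if row[5] > 0]


print(pk_columns("orders"))      # ['order_id']
print(pk_columns("order_line"))  # ['order_id', 'line_no']
```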

Entity Framework TPC with multiple inheritance

I am using EF with TPC and I have a multiple inheritance hierarchy; let's say I have:
Employee (abstract)
Developer (inherits from Employee)
SeniorDeveloper (inherits from Developer)
I inserted some rows in the database and EF reads them correctly.
BUT
When I insert a new SeniorDeveloper, the values get written to both the SeniorDeveloper AND the Developer database tables, hence querying just the Developers (context.Employees.OfType<Developer>()) also returns the newly added SeniorDevelopers.
Is there a way to tell EF that it should store only in one table, or why does EF fall back to a TPT strategy?
Since it doesn't look like EF supports the multiple inheritance with TPC, we ended up using TPC for Employee to Developer and TPT between Developer and SeniorDeveloper...
I believe there is a reason for this, although I may not see the full picture and might just be speculating.
The situation
Indeed, the only way (that I see) for EF to be able to list only the non-senior developers (your querying use-case) in a TPT scenario by reading only the Developer table would be by using a discriminator, and we know that EF doesn't use one in TPT/TPC strategies.
Why? Well, remember that all senior developers are developers, so it's only natural (and necessary) that they have a Developer record as well as a SeniorDeveloper record.
The only exception is if Developer is an abstract type, in which case you can use a TPC strategy to remove the Developer table altogether. In your case however, Developer is concrete.
The current solution
Remembering this, and without a discriminator in the Developer table, the only way to determine if any developer is a non-senior developer is by checking if it is not a senior developer; in other words, by verifying that there is no record of the developer in the SeniorDeveloper table, or any other subtype table.
That did sound a little obvious, but now we understand why the SeniorDeveloper table must be used and accessed when its base type (Developer) is concrete (non-abstract).
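The reasoning above can be reproduced in plain SQL. The following sketch uses Python's `sqlite3` with a hypothetical TPT-style layout: a `developer` table plus a `senior_developer` table sharing its key. It derives each row's type at query time from whether a matching subtype row exists, using a LEFT OUTER JOIN. It only illustrates the idea and is not the SQL that EF actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE developer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Every senior developer is also a developer: same key, extra columns only.
    CREATE TABLE senior_developer (
        id    INTEGER PRIMARY KEY REFERENCES developer (id),
        level INTEGER NOT NULL
    );
    INSERT INTO developer VALUES (1, 'Ada'), (2, 'Linus'), (3, 'Grace');
    INSERT INTO senior_developer VALUES (3, 2);
""")

# No stored discriminator: the type is computed from the presence of a subtype row.
rows = conn.execute("""
    SELECT d.id, d.name,
           CASE WHEN s.id IS NULL THEN 'Developer' ELSE 'SeniorDeveloper' END AS kind
    FROM developer AS d
    LEFT OUTER JOIN senior_developer AS s ON s.id = d.id
    ORDER BY d.id
""").fetchall()

# "Only non-senior developers" means developers with no row in the subtype table.
plain = [name for _, name, kind in rows if kind == "Developer"]
print(plain)  # ['Ada', 'Linus']
```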
The current implementation
I'm writing this from memory, so I hope it isn't too far off, but this is also what Slauma mentioned in another comment. You probably want to fire up a SQL profiler and verify this.
The way it is implemented is by requesting a UNION of projections of the tables. These projections simply add a discriminator declaring their own type in some encoded way[1]. In the union set, the rows can then be filtered based on this discriminator.
[1] If I remember correctly, it goes something like this: 0X for the base type, 0X0X for the first subtype in the union, 0X1X for the second subtype, and so on.
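The UNION approach described above can be sketched for a TPC-style layout, where each concrete type has its own complete table. The tables are hypothetical, and the 'Developer'/'SeniorDeveloper' literals stand in for whatever encoding EF really uses. The sketch runs on Python's `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- TPC-style: each concrete type stores all of its columns in its own table
    -- (ids are assumed not to collide across the two tables).
    CREATE TABLE developer        (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE senior_developer (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                                   level INTEGER NOT NULL);
    INSERT INTO developer        VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO senior_developer VALUES (3, 'Grace', 2);
""")

# Each branch projects the shared columns and adds a literal type tag,
# i.e. a discriminator generated at query time rather than stored.
ALL_DEVELOPERS = """
    SELECT id, name, 'Developer' AS kind FROM developer
    UNION ALL
    SELECT id, name, 'SeniorDeveloper' AS kind FROM senior_developer
"""

everyone = conn.execute(f"SELECT * FROM ({ALL_DEVELOPERS}) ORDER BY id").fetchall()
seniors = conn.execute(
    f"SELECT name FROM ({ALL_DEVELOPERS}) WHERE kind = ? ORDER BY id",
    ("SeniorDeveloper",),
).fetchall()
print(everyone)  # [(1, 'Ada', 'Developer'), (2, 'Linus', 'Developer'), (3, 'Grace', 'SeniorDeveloper')]
print(seniors)   # [('Grace',)]
```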
Trade-off #1
We can already identify a trade-off: EF can either store a discriminator in the table, or it can "generate one" at "run time".
The disadvantages of a stored discriminator are that it is less space efficient, and possibly "ugly" (if that's an argument). The advantage is better lookup performance in a very specific case (we only want the records of the base type).
The disadvantage of a "run time" discriminator is that lookup performance is not as good for that same use case. The advantage is that it is more space efficient.
At first sight, it would seem that sometimes we might prefer to trade a little bit of space for query performance, and EF wouldn't let us.
In reality, it's not always clear when: by requesting a UNION of two tables, we just look up two indexes instead of one, and the performance difference is negligible. With a single level of inheritance, it can't be worse than 2x (since all subtype sets are disjoint). But wait, there's more.
Trade-off #2
Remember that I said the performance advantage of the stored-discriminator approach would only appear in the specific use case where we look up records of the base type. Why is that?
Well, if you're searching for developers that may or may not be senior developers[2], you're forced to look up the SeniorDeveloper table anyway. While this, again, seems obvious, what may be less obvious is that EF can't know in advance whether the results will all be of one type or another. This means that it would have to issue two queries in the worst case: one on the Developer table and, if there is even one senior developer in the result set, a second one on the SeniorDeveloper table.
Unfortunately, the extra roundtrip probably has a bigger performance impact than a UNION of the two tables. (I say probably; I haven't verified it.) Worse, the cost grows with each subtype for which there is a row in the result set. Imagine a type with 3, or 5, or even 10 subtypes.
And that's your trade-off #2.
[2] Remember that this kind of operation could come from any part of your application(s), while resolving the trade-off must be done globally to satisfy all processes/applications/users. Couple that with the fact that the EF team must make these trade-offs for all EF users (although it is true that they could add some configuration for these kinds of trade-offs).
A possible alternative
By batching SQL queries, it would be possible to avoid the multiple roundtrips. EF would have to send some procedural logic to the server for the conditional lookups (T-SQL). But since we already established in trade-off #1 that the performance advantage is most likely negligible in many cases, I'm not sure this would ever be worth the effort. Maybe someone could open an issue ticket for this to determine if it makes sense.
Conclusion
In the future, maybe someone can optimize a few typical operations in this specific scenario with some creative solutions, then provide some configuration switches when the optimization involves such trade-offs.
Right now however, I think EF has chosen a fair solution. In a strange way, it's almost cleaner.
A few notes
I believe the use of union is an optimization applied in certain cases. In other cases, it would be an outer join, but the use of a discriminator (and everything else) remains the same.
You mentioned multiple inheritance, which sort of confused me initially. In common object-oriented parlance, multiple inheritance is a construct in which a type has multiple base types. Many object-oriented type systems don't support this, including the CTS (used by all .NET languages). You mean something else here.
You also mentioned that EF would "fallback" to a TPT strategy. In the case of Developer/SeniorDeveloper, a TPC strategy would have the same results as a TPT strategy, since Developer is concrete. If you really want a single table, you must then use a TPH strategy.
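For comparison, a TPH-style single table stores the discriminator, which is the stored-discriminator side of trade-off #1. As a result, "only plain developers" becomes a simple filter on one table. The following minimal sketch uses Python's `sqlite3` with a hypothetical `employee` table and `discriminator` column. It does not claim to match the exact column names or values EF would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- TPH-style: one table for the whole hierarchy, with a stored type column.
    -- Subtype-only columns (level) must be nullable for the other types.
    CREATE TABLE employee (
        id            INTEGER PRIMARY KEY,
        discriminator TEXT NOT NULL
            CHECK (discriminator IN ('Developer', 'SeniorDeveloper')),
        name          TEXT NOT NULL,
        level         INTEGER
    );
    INSERT INTO employee VALUES
        (1, 'Developer', 'Ada', NULL),
        (2, 'Developer', 'Linus', NULL),
        (3, 'SeniorDeveloper', 'Grace', 2);
""")

# Only plain developers: one table, one predicate, no join or union needed.
plain = [name for (name,) in conn.execute(
    "SELECT name FROM employee WHERE discriminator = 'Developer' ORDER BY id")]
# All developers, including the senior subtype (as LINQ's OfType<Developer>() would).
all_devs = [name for (name,) in conn.execute(
    "SELECT name FROM employee "
    "WHERE discriminator IN ('Developer', 'SeniorDeveloper') ORDER BY id")]
print(plain)     # ['Ada', 'Linus']
print(all_devs)  # ['Ada', 'Linus', 'Grace']
```

The price is the nullable subtype column and the wider table, which is the space side of the same trade-off.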