Which normal form does the ER Diagram guarantee? - database-normalization

Whenever a proper ER diagram is drawn for a database and then mapped to the relational schema, I was informed that it guarantees 3NF.
Is this claim true?
If not, can anyone provide me a counter example.
Also, please tell me whether any normal form can be claimed to be strictly followed when relational schema is mapped from a perfect ER diagram?

The short answer is no. Depending on the analysis and design approach there could be examples of ER models that appear perfectly sound in ER terms but don't necessarily translate to a relational schema in 3NF. ER modelling and notation is not really expressive enough or formal enough to guarantee that all functional dependencies are correctly enforced in database designs. Experienced database designers are conscious of this and apply other techniques to come up with the "proper" design.
Terry Halpin devised a formal method for database design that guarantees a relational schema satisfying 5th Normal Form (see orm.net). He uses the Object Role Modelling approach, not ER modelling.

The diagram just shows what entities and attributes you have and how entities relate to one-another. Your attributes can violate the normal forms. An ER diagram is just a representation, it does not enforce any rules.
There is nothing about representing a model in an ER diagram that implies satisfaction of 3NF.
The thinking behind the erroneous claim may be based on the idea that when you, for example, convert a repeating group from columns to rows in a child table, or remove partially dependent columns to another table, you are increasing the normal form of your relations. However, the diagrammatic convention doesn't enforce this in any way.

Let's see an example (in oracle):
CREATE TABLE STUDENT (
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(64) NOT NULL,
RESIDENCE_STREET VARCHAR2(64),
RESIDENCE_CITY VARCHAR2(64),
RESIDENCE_PROVINCE VARCHAR2(64),
RESIDENCE_POSTALCODE NUMBER(8)
);
In some countries postal code uses prefixes to identify the region or province, so RESIDENCE_PROVINCE has a functional dependency from RESIDENCE_POSTALCODE. But RESIDENCE_POSTALCODE is a non-prime attribute. Then this easy and common example is "legal" and it is not in 3NF.

Related

ERD diagram conversion into UML diagram

I have an ERD Diagram of an E-commerce with the following entities Product , Tag , ProductTag,Category and other entities of course.
I tried to convert it into class diagram as follows:
1- removed the id
2- converted the foreign key into object of the type i'm refering to(product_id converted into => product: Product)
my question is , is this good approach to follow on all my entities? does it like achieve the SOLID principle? I have a presentation in 2 days and I want to be very sure of what I have made , any comment or modification would be really enough .I also chose these tables because they represent one to many and many to many. thanks in advance.
Basically your approach is correct. It's just a couple of UML specifications you got wrong.
The label in the middle of the connectors is just the name of the connector. Unless you do some OCL wizardry this name is meaningless. There is a way to adorn it with a black triangle to show the reading direction. This sometimes helps business people to understand how classes are related to each other (see Fig. 11.27 on p. 202 of UML 2.5). But usually you would not use it.
The shared aggregation has no semantics (p. 110 of UML: Indicates that the Property has shared aggregation semantics. Precise semantics of shared aggregation varies by application area and modeler.). So leave the open diamond away. Composite (filled diamond) can be used to show responsibility (when I'm killed I will kill my composites first). Usually it adds too little to be really useful, it only heats up the futile composition-discussion.
The navigation-direction is incorrect. The AC in the middle sees both connected classes so it's shown without any arrow. If you have an additional (directed) association you place it as lone (extra) connector. In that case put role names towards any end. That makes navigation clearer than just a simple arrow. I for myself use arrows only on rough sketches on the drawing board.
P.S. Just noticing that you have operations in your classes that have the same name as the class and take one paramter being also the class. I would guess you intend to show a constructor here. In that case you would make it Classname():Classname and provide only the paramaters that are needed for the constructor. Else these opreations don't seem to make much sense. Similarly the CRUD operations seem to work on a list of 'itself' which is also probably not desired. You would have a collection class which handles the base class where these operation make sense. So to summarize: you would only add getter/setter operations for the (private) properties matching the columns from your table.
P.P.S.: As per Christophe's comment it's a good idea to adorn the class instantiation operation with a <<create>> stereotype which highlights its purpose. See p. 196 of UML 2.5:
This stereotype is part of the standard (see p. 677) and the table on p. 678 states:
Specifies that the designated feature creates an instance of the classifier to which the feature is attached.
On the modeling part of your question, there’s already a perfect answer. For the records, I’d nevertheless like to add a complementary answer on the SOLID part:
Single responsibility: your classes have more than one reason to change, because you may want to change Product for what it is (e.g. add more product-related attributes), but you may also want to change the class to add new getByXxx() operations to find products in the database based on other criteria, independently of what a product really is. SO it's not complying.
Open-closed principle: we cannot tell
Liskov substitution principle: in absence of inheritance, this is not relevant. Moreover, you couldn't tell without having precondition, postcondition and invariant constraints.
Interface segregation principe: is probably not compliant, because you impose an implicit interface that all inheriting class would have to provide, even if they don't need it (e.g. products not stored in a database). A first step in the right direction, would be to use an interface for the common database operations.
Dependency inversion: we cannot tell but probably it isn't , because update(), delete(),... probably depends on some database, so that you can't switch it to another database. With DIP, you'd inject the database in the class that use it, so that you could at any moment inject another database that offers the same interface.
You didn't ask, but your design seems to correspond to active records. If you want to go for a cleaner, more SOLID design, you should prefer factor out the database related code to either repositories or table data gateways.

Normalisation of a table

Ask a quite stupid question here...
Based on http://www.sqa.org.uk/e-learning/MDBS01CD/page_30.htm#Non
It said that "Take each non-key attribute in turn and ask the question: is this attribute dependent on one part of the key?(paragraph 5)"
Actually how do I know whether a non-key attribute is dependent on a composite key or it depends on a part of the composite key only?
Could you all provide an explanation?Thanks!
You need to know what business rules your database design is supposed to satisfy. The aim of normalization is to help ensure that the business rules (i.e. the set of dependencies you wish to implement) are properly supported by the keys and other integrity constraints in your data model.
Write down the set of functional dependencies before you start and use that as the basis of your normalization exercise.

is it possible to consider language lookup table as a value object

I have a language table for 2 fields id and language_name. Can I consider this as a value object?
Ex record: 1 EN
2 DE
3 TR
As soon as these values are immutable, I think I don't need to give them Ids and make it as an entity which represented in database directly.
You can consider them as value object, but you don't have to consider everything in DDD way.
According to Martin Fowler's definition:
When we use a Domain Model, we use it because it contains complicated
domain logic. It can be helpful the classify this domain logic into:
validations: checks that input makes sense and objects are properly
suited for further actions.
consequences: initiating some action that will change the state of the world
derivations: figuring out some
information based on information we already have
ValueObject s are good at validates and derivations.
Language tables, on the other hand, are usually used to solve i18n issues(ui/query conern). Generally speaking, there is no domain logic here.This kind of features are easily implemented in simple CRUD style and better for being so. Consider them in DDD add lots of constraints like only aggregates can be returned by the repository or you could only modify an local entity through it's aggregate. For example, Users edit a product, add english description and deutsch description. One can model product as an aggregate and description as value object, but this does not add much value and somtimes annoying(Now a product cannot be edited by a english editor and a deutsch editor at the same time for concurrent modification on an aggregate).
But what if there is some real domain validations and derivations on the product aggregate? like inventory and pricing. This is where bounded context comes to the play. One can have both an inventory/pricing bounded context modeled in DDD and a product description context modeled in CURD.

Converting a logical model to a physical model. Trouble understanding ERD

I am working with an ERD. It is supposedly a logical model and I am to make a physical model from it. I should be formatting in UML and our DBMS is PostgreSQL.
Some of my research (http://www.1keydata.com/datawarehousing/data-modeling-levels.html // http://en.wikipedia.org/wiki/Logical_data_model#Conceptual.2C_Logical_.26_Physical_Data_Model) indicates that this ERD may have too much information in it to be a logical model and that it may actually be closer to the physical.
My questions are as follows:
What do the bold labels mean?
What do the white "N"s and red "U"s at the end of some entries mean?
What is the difference between a dashed line (relationship) and a solid one?
What is the difference between the "crows foot" and the broken line on either end of the relationship?
Is this closer to the physical model or logical model? What would I have to do to convert it from one to the other?
This is the ERD:
Could bold text indicate primary key attributes?
That's not part of any standard ER modelling notation. By no means certain but my guess would be U means unique, N means nullable.
A solid line means an identifying relationship. Dashed line means a non-identifying one. It's usually not an especially important distinction but look those terms up if you want to know more.
One to many relationship. The crows foot represents the "many" side of the relationship; the short line across is the "one" side. Where the "one" symbol appears at both ends that's a one-to-one relationship.
In the context of information modelling a logical model means a semantic model - a model that's more about the business domain than about an actual database design. Exactly what goes into the logical model and at what level of detail depends a lot on the intended audience for the model and on how you want to use it. Turning it into a "physical" model means making it into a design for a database with the technical features and any changes you would need for your chosen DBMS platform (specific data types for example).
Logical/physical models in the information modelling sense should not be confused with what are termed the logical level and physical level in DBMS architecture and database theory. In principle relational database tables (AKA relation variables) are always "logical" level constructs but in data modelling terms they are part of a so-called "physical" model. That unfortunate choice of modelling terminology is responsible for a lot of confusion and misunderstandings.

Entity Framework inheritance: TPT, TPH or none?

I am currently reading about the possibility about using inheritance with Entity Framework. Sometimes I use a approch to type data records and I am not sure if I would use TPT or TPH or none...
For example...
I have a ecommerce shop which adds shipping, billing, and delivery address
I have a address table:
RecordID
AddressTypeID
Street
ZipCode
City
Country
and a table AddressType
RecordID
AddressTypeDescription
The table design differs to the gerneral design when people show off TPT or TPH...
Does it make sense to think about inheritance an when having a approach like this..
I hope it makes sense...
Thanks for any help...
When considering how to represent inheritance in the database, you need to consider a few things.
If you have many different sub classes you can have a lot of extra joins in queries involving those more complex types which can hurt performance. One big advantage of TPH is that you query one table for all types in the hierarchy and this is a boon for performance, particularly for larger hierarchies. For this reason i tend to favour that approach in most scenarioes
However, TPH means that you can no longer have NOT NULL fields for sub types as all fields for all types are in a single table, pushing the responsibility for data integrity towards your application. Although this may sound horrible in practice i haven't found this to be too big a restriction.
However i would tend to use TPT if there were a lot of fields for each type and that the number of types in the hierarchy was likely to be small, meaning that performance was not so much of an issue with the joins, and you get better data integrity.
Note that one of the advantages of EF and other ORMs is that you can change your mind down the track without impacting your application so the decision doesn't need to be completely carved in stone.
In your example, it doesn't appear to have an inheritance relationship, it looks like a one to many from the address type to the addresses
This would be represented between your classes something like the following:
Address.AddressType
AddressType.Addresses
As Keith hints, this article suggests TPT in EF scales horribly, but I haven't tried it myself.