Normalisation of a table - database-normalization

This may be a basic question.
Based on http://www.sqa.org.uk/e-learning/MDBS01CD/page_30.htm#Non
Paragraph 5 says: "Take each non-key attribute in turn and ask the question: is this attribute dependent on one part of the key?"
How do I know whether a non-key attribute depends on the whole composite key or only on part of it?
Could someone explain? Thanks!

You need to know what business rules your database design is supposed to satisfy. The aim of normalization is to help ensure that the business rules (i.e. the set of dependencies you wish to implement) are properly supported by the keys and other integrity constraints in your data model.
Write down the set of functional dependencies before you start and use that as the basis of your normalization exercise.
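Once the functional dependencies are written down, the "does this attribute depend on only part of the key?" question can be checked mechanically. Below is a minimal sketch, assuming a hypothetical order-line table keyed on (order_id, product_id); the attribute names and the two dependencies are invented for illustration and are not taken from the linked tutorial.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, non_key_attrs, fds):
    """Map each non-key attribute to the proper subsets of `key` that determine it."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds)
            for attr in non_key_attrs:
                if attr in determined:
                    found.setdefault(attr, []).append(subset)
    return found


# Hypothetical business rules, written as (determinant, dependent) pairs.
fds = [
    (("order_id", "product_id"), ("quantity",)),
    (("product_id",), ("product_name",)),
]
key = ("order_id", "product_id")
violations = partial_dependencies(key, ["quantity", "product_name"], fds)
print(violations)
```

Here product_name is reported because product_id alone determines it, which is the kind of partial dependency that 2NF asks you to move into a separate table. quantity is not reported because it needs the whole key.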

ERD diagram conversion into UML diagram

I have an ERD of an e-commerce system with the following entities: Product, Tag, ProductTag, Category, and of course others.
I tried to convert it into a class diagram as follows:
1- removed the id
2- converted each foreign key into an object of the type it refers to (product_id became product: Product)
My question is: is this a good approach to follow for all my entities? Does it satisfy the SOLID principles? I have a presentation in 2 days and I want to be very sure of what I have made, so any comment or modification would be really helpful. I also chose these tables because they represent one-to-many and many-to-many relationships. Thanks in advance.
Basically your approach is correct. It's just a couple of UML specifications you got wrong.
The label in the middle of the connectors is just the name of the connector. Unless you do some OCL wizardry this name is meaningless. There is a way to adorn it with a black triangle to show the reading direction. This sometimes helps business people to understand how classes are related to each other (see Fig. 11.27 on p. 202 of UML 2.5). But usually you would not use it.
Shared aggregation has no defined semantics (p. 110 of the UML spec: "Indicates that the Property has shared aggregation semantics. Precise semantics of shared aggregation varies by application area and modeler."). So leave out the open diamond. Composite aggregation (filled diamond) can be used to show responsibility (when I'm killed I will kill my composites first). Usually it adds too little to be really useful and only heats up the futile composition discussion.
The navigation direction is incorrect. The association class (AC) in the middle sees both connected classes, so it's shown without any arrow. If you have an additional (directed) association, you draw it as a separate (extra) connector. In that case, put role names at the ends. That makes navigation clearer than just a simple arrow. Personally, I use arrows only on rough sketches on the drawing board.
P.S. I just noticed that you have operations in your classes that have the same name as the class and take one parameter that is also of the class type. I would guess you intend to show a constructor here. In that case you would write it as Classname():Classname and provide only the parameters that the constructor needs. Otherwise these operations don't seem to make much sense. Similarly, the CRUD operations seem to work on a list of 'itself', which is probably not desired either. You would have a collection class that handles the base class, where these operations make sense. So to summarize: you would only add getter/setter operations for the (private) properties matching the columns from your table.
P.P.S.: As per Christophe's comment it's a good idea to adorn the class instantiation operation with a <<create>> stereotype which highlights its purpose. See p. 196 of UML 2.5:
This stereotype is part of the standard (see p. 677) and the table on p. 678 states:
Specifies that the designated feature creates an instance of the classifier to which the feature is attached.
On the modeling part of your question, there's already a perfect answer. For the record, I'd nevertheless like to add a complementary answer on the SOLID part:
Single responsibility: your classes have more than one reason to change, because you may want to change Product for what it is (e.g. add more product-related attributes), but you may also want to change the class to add new getByXxx() operations to find products in the database based on other criteria, independently of what a product really is. So it does not comply.
Open-closed principle: we cannot tell.
Liskov substitution principle: in the absence of inheritance, this is not relevant. Moreover, you couldn't tell without having precondition, postcondition and invariant constraints.
Interface segregation principle: probably not compliant, because you impose an implicit interface that all inheriting classes would have to provide, even if they don't need it (e.g. products not stored in a database). A first step in the right direction would be to use an interface for the common database operations.
Dependency inversion: we cannot tell, but probably not, because update(), delete(), ... probably depend on some specific database, so that you can't switch to another database. With DIP, you'd inject the database into the class that uses it, so that you could at any moment inject another database that offers the same interface.
You didn't ask, but your design seems to correspond to active records. If you want to go for a cleaner, more SOLID design, you should factor out the database-related code into either repositories or table data gateways.
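To make the repository and dependency-injection suggestion concrete, here is a minimal sketch in Python. All names (ProductStore, InMemoryStore, ProductRepository, cheaper_than) are hypothetical and chosen for illustration only; the in-memory store stands in for whatever database the real application uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass
class Product:
    """Plain domain object: it knows nothing about storage."""
    name: str
    price: float


class ProductStore(Protocol):
    """Hypothetical storage interface that the repository depends on."""
    def save(self, product_id: int, product: Product) -> None: ...
    def load(self, product_id: int) -> Optional[Product]: ...
    def all(self) -> List[Product]: ...


class InMemoryStore:
    """Stand-in backend; a real one would talk to a database."""
    def __init__(self) -> None:
        self._rows: Dict[int, Product] = {}

    def save(self, product_id: int, product: Product) -> None:
        self._rows[product_id] = product

    def load(self, product_id: int) -> Optional[Product]:
        return self._rows.get(product_id)

    def all(self) -> List[Product]:
        return list(self._rows.values())


class ProductRepository:
    """Query and persistence operations live here, not on Product."""
    def __init__(self, store: ProductStore) -> None:
        self._store = store  # injected, so the backend can be swapped

    def add(self, product_id: int, product: Product) -> None:
        self._store.save(product_id, product)

    def get(self, product_id: int) -> Optional[Product]:
        return self._store.load(product_id)

    def cheaper_than(self, limit: float) -> List[Product]:
        return [p for p in self._store.all() if p.price < limit]


repo = ProductRepository(InMemoryStore())
repo.add(1, Product("mug", 8.0))
repo.add(2, Product("lamp", 35.0))
cheap_names = [p.name for p in repo.cheaper_than(10.0)]
print(cheap_names)
```

With this split, adding a new finder changes only the repository, and switching databases means passing a different store object to the constructor.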

Which normal form does the ER Diagram guarantee?

Whenever a proper ER diagram is drawn for a database and then mapped to the relational schema, I was informed that it guarantees 3NF.
Is this claim true?
If not, can anyone provide a counterexample?
Also, please tell me whether any normal form can be claimed to be strictly followed when relational schema is mapped from a perfect ER diagram?
The short answer is no. Depending on the analysis and design approach there could be examples of ER models that appear perfectly sound in ER terms but don't necessarily translate to a relational schema in 3NF. ER modelling and notation is not really expressive enough or formal enough to guarantee that all functional dependencies are correctly enforced in database designs. Experienced database designers are conscious of this and apply other techniques to come up with the "proper" design.
Terry Halpin devised a formal method for database design that guarantees a relational schema satisfying 5th Normal Form (see orm.net). He uses the Object Role Modelling approach, not ER modelling.
The diagram just shows what entities and attributes you have and how entities relate to one-another. Your attributes can violate the normal forms. An ER diagram is just a representation, it does not enforce any rules.
There is nothing about representing a model in an ER diagram that implies satisfaction of 3NF.
The thinking behind the erroneous claim may be based on the idea that when you, for example, convert a repeating group from columns to rows in a child table, or remove partially dependent columns to another table, you are increasing the normal form of your relations. However, the diagrammatic convention doesn't enforce this in any way.
Let's see an example (in Oracle):
CREATE TABLE STUDENT (
    ID                   INTEGER PRIMARY KEY,
    NAME                 VARCHAR2(64) NOT NULL,
    RESIDENCE_STREET     VARCHAR2(64),
    RESIDENCE_CITY       VARCHAR2(64),
    RESIDENCE_PROVINCE   VARCHAR2(64),
    RESIDENCE_POSTALCODE NUMBER(8)
);
In some countries the postal code uses a prefix to identify the region or province, so RESIDENCE_PROVINCE is functionally dependent on RESIDENCE_POSTALCODE. But RESIDENCE_POSTALCODE is a non-prime attribute, so the province depends on the key only transitively. This simple and common example is therefore a "legal" mapping from an ER diagram, yet it is not in 3NF.
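One way to remove that transitive dependency is to move the postal-code-to-province mapping into its own table. The following is a minimal sketch using Python's built-in sqlite3 module rather than Oracle; the postal_area table, its column names, and the sample rows are all made up for illustration, and the sketch addresses only the one dependency discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical lookup table: each postal code maps to exactly one province.
    CREATE TABLE postal_area (
        postalcode INTEGER PRIMARY KEY,
        province   TEXT NOT NULL
    );
    -- STUDENT no longer stores the province; it is derived via the postal code.
    CREATE TABLE student (
        id                   INTEGER PRIMARY KEY,
        name                 TEXT NOT NULL,
        residence_street     TEXT,
        residence_city       TEXT,
        residence_postalcode INTEGER REFERENCES postal_area (postalcode)
    );
    INSERT INTO postal_area VALUES (10001, 'North'), (20002, 'South');
    INSERT INTO student VALUES
        (1, 'Ana',   'First St',  'Northtown', 10001),
        (2, 'Luis',  'Second St', 'Northtown', 10001),
        (3, 'Marta', 'Third St',  'Southport', 20002);
""")

# The province is looked up through the postal code instead of being repeated per student.
rows = conn.execute("""
    SELECT s.name, p.province
    FROM student AS s
    JOIN postal_area AS p ON p.postalcode = s.residence_postalcode
    ORDER BY s.id
""").fetchall()
print(rows)
```

Each province name is now stored once per postal code, so correcting it is a single-row update instead of one update per student.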

Polymorphic association foreign key constraints. Is this a good solution?

We're using polymorphic associations in our application. We've run into the classic problem: we encountered an invalid foreign key reference, and we can't create a foreign key constraint, because it's a polymorphic association.
That said, I've done a lot of research on this. I know the downsides of using polymorphic associations, and the upsides. But I found what seems to be a decent solution:
http://blog.metaminded.com/2010/11/25/stable-polymorphic-foreign-key-relations-in-rails-with-postgresql/
This is nice, because you get the best of both worlds. My concern is the data duplication. I don't have a deep enough knowledge of PostgreSQL to completely understand the cost of this solution.
What are your thoughts? Should this solution be completely avoided? Or is it a good solution?
The only alternative, in my opinion, is to create a foreign key for each association type. But then you run into validating that only one association exists. It's a "pick your poison" situation. Polymorphic associations clearly describe intent, and also make this scenario impossible. In my opinion that is the most important. The database foreign key constraint is a behind the scenes feature, and altering "intent" to work with database limitations feels wrong to me. This is why I'd like to use the above solution, assuming there is not a glaring "avoid" with it.
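For comparison, the "one foreign key per association type" alternative mentioned above can validate that exactly one association exists with a CHECK constraint. This is a minimal sketch using Python's sqlite3 module, with hypothetical post, photo and comment tables; the CHECK expression relies on SQLite treating boolean results as 0 or 1, so other databases would need a different formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE post  (id INTEGER PRIMARY KEY);
    CREATE TABLE photo (id INTEGER PRIMARY KEY);
    CREATE TABLE comment (
        id       INTEGER PRIMARY KEY,
        post_id  INTEGER REFERENCES post (id),
        photo_id INTEGER REFERENCES photo (id),
        -- Exactly one of the two references must be set (SQLite-specific arithmetic).
        CHECK ((post_id IS NOT NULL) + (photo_id IS NOT NULL) = 1)
    );
    INSERT INTO post (id) VALUES (1);
    INSERT INTO photo (id) VALUES (1);
""")


def accepted(post_id, photo_id):
    """Try to insert a comment; report whether the database allowed it."""
    try:
        conn.execute(
            "INSERT INTO comment (post_id, photo_id) VALUES (?, ?)",
            (post_id, photo_id),
        )
    except sqlite3.IntegrityError:
        return False
    return True


only_post = accepted(1, None)   # one association
both = accepted(1, 1)           # two associations
neither = accepted(None, None)  # no association
print(only_post, both, neither)
```

The trade-off is that every new association type adds a column and a change to the CHECK expression.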
The biggest problem I have with PostgreSQL's INHERITS implementation is that you can't set a foreign key reference to the parent table. There are a lot of cases where you need to do that. See the examples at the end of my answer.
The decision to create tables, views, or triggers outside of Rails is the crucial one. Once you decide to do that, then I think you might as well use the very best structure you can find.
I have long used a base parent table, enforcing disjoint subtypes using foreign keys. This structure guarantees only one association can exist, and that the association resolves to the right subtype in the parent table. (In Bill Karwin's slideshow on SQL antipatterns, this approach starts on slide 46.) This doesn't require triggers in the simple cases, but I usually provide one updatable view per subtype, and require client code to use the views. In PostgreSQL, updatable views require writing either triggers or rules. (Versions before 9.1 require rules.)
In the most general case, the disjoint subtypes don't have the same number or kind of attributes. That's why I like updatable views.
Table inheritance isn't portable, but this kind of structure is. You can even implement it in MySQL. In MySQL, you have to replace the CHECK constraints with foreign key references to one-row tables. (MySQL parses and ignores CHECK constraints.)
I don't think you have to worry about data duplication. In the first place, I'm pretty sure data isn't duplicated between parent tables and inheriting tables. It just appears that way. In the second place, duplication or derived data whose integrity is completely controlled by the dbms is not an especially bitter pill to swallow. (But uncontrolled duplication is.)
Give some thought to whether deletes should cascade.
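The parent-table structure described above can be sketched as follows. This is a minimal illustration using Python's sqlite3 module rather than PostgreSQL, and it leaves out the updatable views; the commentable, post, photo and comment tables and the kind column are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    -- Parent table: one row per thing that can be commented on.
    CREATE TABLE commentable (
        id   INTEGER PRIMARY KEY,
        kind TEXT NOT NULL CHECK (kind IN ('post', 'photo')),
        UNIQUE (id, kind)
    );
    -- Each subtype pins its own kind and references the (id, kind) pair.
    CREATE TABLE post (
        id    INTEGER PRIMARY KEY,
        kind  TEXT NOT NULL DEFAULT 'post' CHECK (kind = 'post'),
        title TEXT NOT NULL,
        FOREIGN KEY (id, kind) REFERENCES commentable (id, kind)
    );
    CREATE TABLE photo (
        id   INTEGER PRIMARY KEY,
        kind TEXT NOT NULL DEFAULT 'photo' CHECK (kind = 'photo'),
        url  TEXT NOT NULL,
        FOREIGN KEY (id, kind) REFERENCES commentable (id, kind)
    );
    -- The formerly polymorphic side now has an ordinary foreign key.
    CREATE TABLE comment (
        id             INTEGER PRIMARY KEY,
        commentable_id INTEGER NOT NULL REFERENCES commentable (id),
        body           TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO commentable (id, kind) VALUES (1, 'post')")
conn.execute("INSERT INTO post (id, title) VALUES (1, 'Hello')")
conn.execute("INSERT INTO comment (commentable_id, body) VALUES (1, 'Nice post')")


def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Row 1 is registered as a post, so it cannot also be a photo.
photo_rejected = rejected("INSERT INTO photo (id, url) VALUES (1, 'x.png')")
# A comment cannot point at a parent row that does not exist.
orphan_rejected = rejected(
    "INSERT INTO comment (commentable_id, body) VALUES (99, 'lost')"
)
print(photo_rejected, orphan_rejected)
```

Because the subtype's kind is fixed by its CHECK constraint and the composite foreign key must match the parent's (id, kind), a parent row can belong to only one subtype.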
A publications example with SQL code.
A "parties" example with SQL code.
You cannot easily enforce that in a database, so this is a really bad idea. The best solution is usually the simple one: forget about the polymorphic associations. They smell like an antipattern.

I don't need/want a key!

I have some views that I want to use EF 4.1 to query. These are specific optimized views that will not have keys to speak of; there will be no deletions or updates, just good ol' SELECTs.
But EF wants a key set on the model. Is there a way to tell EF to move on, there's nothing to worry about?
More Details
The main purpose of this is to query against a set of views that have been optimized by size, query parameters and joins. The underlying tables have their PKs, FKs and so on. It's indexed, statiscized (that a word?) and optimized.
I'd like to have a class like (this is a much smaller and simpler version of what I have...):
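public class MyObject // this is a view
{
    // Property types are assumed; the original snippet omitted them.
    public string Name { get; set; }
    public int Age { get; set; }
    public int TotalPimples { get; set; }
}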
and a repository, built off of EF 4.1 CF where I can just
public List<MyObject> GetPimply(int numberOfPimples)
{
    return db.MyObjects.Where(d => d.TotalPimples > numberOfPimples).ToList();
}
I could expose a key, but what's the real purpose of displaying a 2- or 3-column natural key that will never be used?
Current Solution
Seeing as there will be no EF CF solution, I have added a complex key to the model and I am exposing it in the model. While this goes "with the grain" of what one expects a "well designed" db model to look like, in this case, IMHO, it added nothing but more logic to the model builder, more bytes over the wire, and extra properties on a class. These will never be used.
There is no way. EF demands a unique identification of each record: the entity key. That doesn't mean that you must expose any additional column. You can mark all your current properties (or any subset) as the key. That is exactly how EDMX does it when you add a database view to the model: it goes through the columns and marks all non-nullable and non-computed columns as the primary key.
You must be aware of one problem: EF internally uses an identity map, and the entity key is the unique identification in this map (each entity key can be associated with only a single entity instance). This means that if you cannot choose a unique identification of the record and you load multiple records with the same identification (your defined key), they will all be represented by a single entity instance. I'm not sure whether this will cause you any issues if you don't plan to modify these records.
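The effect of a non-unique key on an identity map can be illustrated with a toy model. This is a language-neutral sketch in Python, not EF's actual implementation; the IdentityMap class is hypothetical, and having the first row win for a repeated key is an assumption of the sketch.

```python
class IdentityMap:
    """Toy identity map: at most one object instance per entity key."""

    def __init__(self, key_fields):
        self.key_fields = key_fields
        self._instances = {}

    def materialize(self, row):
        key = tuple(row[field] for field in self.key_fields)
        # The first row seen for a key is kept; later rows with that key reuse it.
        return self._instances.setdefault(key, dict(row))


rows = [
    {"Name": "Ann", "Age": 30, "TotalPimples": 2},
    {"Name": "Ann", "Age": 30, "TotalPimples": 7},
    {"Name": "Bob", "Age": 41, "TotalPimples": 5},
]

# Key that does not uniquely identify a row: both "Ann" rows collapse into one instance.
weak = IdentityMap(["Name", "Age"])
weak_result = [weak.materialize(r)["TotalPimples"] for r in rows]

# Key made of all columns: every distinct row keeps its own instance.
full = IdentityMap(["Name", "Age", "TotalPimples"])
full_result = [full.materialize(r)["TotalPimples"] for r in rows]

print(weak_result, full_result)
```

With the weak key the second row's value of 7 never shows up in the result, which is the kind of silent data distortion to watch for even in a read-only scenario.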
EF is looking for a unique way to identify records. I am not sure if you can force it to go counter to its nature of desiring something unique about objects.
But, this is an answer to the "show me how to solve my problem the way I want to solve it" question and not actually tackling your core business requirement.
If this is a "I don't want to show the user the key", then don't bind it when you bind the data to your form (web or windows). If this is a "I need to share these items, but don't want to give them the keys" issue, then map or surrogate the objects into an external domain model. Adds a bit of weight to the solution, but allows you to still do the heavy lifting with a drag and drop surface (EF).
The question is what is the business requirement that is pushing you to create a bunch of objects without a unique identifier (key).
One way to do this would be not to use views at all.
Just add the tables to your EF model and let EF create the SQL that you are currently writing by hand.

How do I add relationships at runtime using DBIx::Class and Catalyst?

In the application I am building, users can specify relationships between tables.
Since I only determine this at runtime, I can't specify has_many or belongs_to relationships in the schema modules for startup.
So given two tables; system and place, I would like to add the relationship to join records between them.
I have part of the solution below:
$rs = $c->model('DB::system')->result_source;
$rs->add_relationship('locations','DB::place',{'foreign.fk0' => 'self.id'});
So the column fk0 would be the foreign key mapping to the location primary key id.
I know there must be a re-registration to allow future access to the relationship but I can't figure it out.
I don't believe you can re-define these relationships after an application is already running. At least not without discarding any existing DBIC objects, and re-creating them all from scratch. At that point, it would be easier to just re-start your application, I suspect.
If you're content defining these things dynamically at compile time, that is possible... we do something similar in one of our applications.
If that would be useful to you, I can provide some sample code.
The DBIx::Class::ResultSet::View module might provide a rough approximation of what you're looking for, by letting you execute arbitrary code, but retrieving the results as DBIx objects.
My general opinion on things like this, is that any abstraction layer (and an ORM is an abstraction layer), is intended to make life easier. When it gets in the way of making your application do what it wants, it's no longer making life easier, and ought to be discarded (for that specific use--not necessarily for every use). For this reason, I would suggest using DBI, as you suggested in one of your comments. I suspect it will make your life much easier in this case.
I've done this by calling the appropriate methods on the relevant result sources, e.g. $resultset->result_source-><relationship method>. It does work even in an active application.