Full functional dependencies - database-normalization

Alright, I can't seem to wrap my head around the subject of normalization.
I have this table
Now I need to find the full functional dependencies.
FilmID, Actor -> Title, Year, Director
Publisher -> PublisherCity
Actor -> DOB, Country
Can someone tell me if I'm on the right track? If not, any help would be appreciated.

I hope this helps. Tables and their fields:
films: id, title, year, director_id, publisher_id
    One publisher and director only with this setup.
actors: id, name, dob, country
    You missed name from the Actor attributes.
films_actors: film_id, actor_id
    This is a join table of films to actors. It lets you link limitless actors to films (a many-to-many relationship).
publishers: id, name, city
    Self-explanatory.
directors: id, name
    Self-explanatory.
Any questions, just ask.
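The table layout above could be sketched with Python's built-in sqlite3 module as follows. This is only a rough sketch: the column types, the in-memory database, and the placeholder rows ('Film A', 'Actor X', and so on) are assumptions added for illustration and are not part of the original table.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE directors  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE publishers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT);
CREATE TABLE actors     (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         dob TEXT, country TEXT);
CREATE TABLE films (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    year         INTEGER,
    director_id  INTEGER REFERENCES directors(id),
    publisher_id INTEGER REFERENCES publishers(id)
);
-- Join table: one row per (film, actor) pairing.
CREATE TABLE films_actors (
    film_id  INTEGER NOT NULL REFERENCES films(id),
    actor_id INTEGER NOT NULL REFERENCES actors(id),
    PRIMARY KEY (film_id, actor_id)
);
""")

# Placeholder rows (hypothetical data).
conn.execute("INSERT INTO directors VALUES (1, 'Director A')")
conn.execute("INSERT INTO publishers VALUES (1, 'Publisher A', 'City A')")
conn.execute("INSERT INTO films VALUES (1, 'Film A', 2000, 1, 1)")
conn.executemany(
    "INSERT INTO actors VALUES (?, ?, ?, ?)",
    [(1, "Actor X", "1970-01-01", "Country X"),
     (2, "Actor Y", "1980-01-01", "Country Y")],
)
conn.executemany("INSERT INTO films_actors VALUES (?, ?)", [(1, 1), (1, 2)])

# All actors of film 1, resolved through the join table.
cast = [row[0] for row in conn.execute(
    """SELECT a.name
       FROM actors a
       JOIN films_actors fa ON fa.actor_id = a.id
       WHERE fa.film_id = ?
       ORDER BY a.name""",
    (1,),
)]
print(cast)
```

Each non-key column now sits in a table whose id determines it, which is the point of splitting the original table.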

If you are asking how to design relationships, part of the answer is practice and reading the right books. Here is a shortcut: find the cardinality of each relationship.
For example, consider books and authors. One author can write multiple books, so the cardinality is multiple on the book side. This is called a one-to-many relationship. With practice you learn that this can be modelled in a relational database using a foreign key, like director_id in the films table of Isotope's answer.
Now consider that some books (for example, the professional series) can be written by multiple authors. Now the cardinality is multiple on both sides of the relationship. This is known as a many-to-many relationship. You can design this relationship using a films_actors kind of mapping table, as in the answer above. For basic modelling this much is more than enough. Do take a look at this excellent article on InfoQ, which might help you with this.

Related

How to model: Team <-(1) --- (N)-> Employees, with a twist; Better Pattern?

I have to model and create in a SQL Server database a simple relationship...
A Team can have zero or more Employees assigned to it; An Employee can only be assigned to a single team. Simple enough... Here is the twist that I am struggling with...
The Team has a TeamLeader, who is an Employee. A TeamLeader can be assigned to a single Team. So I added a TeamLeaderId column (long) to Team and gave TeamLeaderId a unique index. I created a foreign key relationship between TeamLeaderId in Team and EmployeeId in Employees.
Is this the best model for this situation, or is there a better pattern?
Thank you for your help and guidance,
Mike
For the constraints you laid out this looks right. But it seems you would be better off with a link table between Team and Employee. Why limit an employee to a single team or make it mandatory for an employee to be in a team at all? Today someone (you?) may think this is the only way, tomorrow it may be different.
It is better to remove the TeamLeaderId field from the Team table and create a new table, TeamLeaders, with a unique key [EmployeeId in Employees + TeamId in Teams].
Now you can change your mind and remove team leaders from your business domain model without pain: just drop the TeamLeaders table.
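A rough sketch of the TeamLeaders table idea, using Python's sqlite3 rather than SQL Server. The column types and sample rows are assumptions. Note one deviation: a single composite unique key on (EmployeeId, TeamId) would not by itself stop a team from getting two leaders, so this sketch assumes separate UNIQUE constraints on each column to keep the "one leader per team, one team per leader" rules from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Teams (TeamId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Employees (
    EmployeeId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    TeamId     INTEGER REFERENCES Teams(TeamId)  -- NULL means "no team yet"
);
-- Assumption: UNIQUE on each column, not only on the pair.
CREATE TABLE TeamLeaders (
    TeamId     INTEGER NOT NULL UNIQUE REFERENCES Teams(TeamId),
    EmployeeId INTEGER NOT NULL UNIQUE REFERENCES Employees(EmployeeId),
    PRIMARY KEY (TeamId, EmployeeId)
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO Teams VALUES (?, ?)", [(1, "Team A"), (2, "Team B")])
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(10, "Employee X", 1), (11, "Employee Y", 1)],
)


def try_assign_leader(team_id, employee_id):
    """Return True if the leader row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO TeamLeaders VALUES (?, ?)", (team_id, employee_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_leader = try_assign_leader(1, 10)             # accepted
second_leader_same_team = try_assign_leader(1, 11)  # rejected: team 1 already has a leader
same_leader_other_team = try_assign_leader(2, 10)   # rejected: employee 10 already leads a team
```

Dropping team leaders from the model later is then a single DROP TABLE TeamLeaders, with no change to Teams or Employees.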
I think the answer to this question depends much on the usage of the system:
If the Team is created and employees are assigned, and then finally the team leader is chosen among any of the team members, it's a good choice you have made.
On the other hand, if an employee is hired as a team leader and will always be assigned as a team leader, it is better to add this "type" of information to the Employee table (otherwise adding or removing an employee requires extra, unneeded logic to handle the TeamLeaderId and potential future "types").

What is Objectified Relationship?

I am not sure if I should be asking this here or at the programmers site. I came across "objectified relationship" while researching recursive saving in the LLBLGen framework... I then searched Stack Overflow (yes, first) and then Google. I then came across a brief (related) topic on NHibernate.
I have an idea what it is, but is there a detailed description or explanation of it?
The relationship is an object itself, not just a connection. In a database the relationship would be represented as a row in a table rather than just as a UID in a column referencing another table. In a graph the relationship would itself be a node rather than 'just' an edge.
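To make that concrete, here is a small sketch in Python. The Person, Company and Employment names are hypothetical and are not taken from LLBLGen or NHibernate; the only point is that the link between two entities is itself an object that can carry its own data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Company:
    name: str


@dataclass
class Employment:
    """The relationship itself, promoted to an object with its own attributes."""
    person: Person
    company: Company
    start: date
    role: str


person = Person("Person A")
company = Company("Company A")

# Plain reference: only the fact "person is linked to company" could be stored.
# Objectified relationship: the link is an object, so it can carry start and role.
job = Employment(person, company, start=date(2000, 1, 1), role="Engineer")
```

In a database the Employment objects would map to rows of their own table, rather than to a single foreign key column on Person.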

Copying entities in Core Data

I have a couple of Core Data entities... Student and Exam.
Initially, there is just one Exam object per exam - Maths Exam 3, English Exam 2 etc.
There is a relationship between Students and Exams in my data model (a student can have several exams). But initially, the Exams are just floating free, and not attached to any students.
How would I make a copy of one of the exams and attach it to a student?
If I do something like:
[student addExamsObject:examObject];
...then I think it simply references the original exam to the student, rather than making a copy.
I need a copy because the Exam has a boolean 'hasTaken', which is YES when the student has taken the exam. But if I set that now, it will make it seem like all the students with that exam have taken it.
Clarification: I would rather not restructure my model. The data is taken from a couple of xml files, one each for Students and Exams, which are parsed into the Core Data store. For instance, an Exam object might look like this:
name:Maths 5
class-id:12
year-id:4
student-id:0
..with a Student object looking like
name: Dave
class-id:12
year-id:4
student-id:222
Various rules are meant to guide which exams get attached to which students... for instance if all the Exam's ids are 0 then all students take the exam. If class-id and year-id match, and student-id is 0, then the Exam gets added to students with the same class and year. If the student-ids match, then just that student takes the Exam. etc etc.
I cannot change the way the xml is outputted from the server.
Another issue is that Exam has a to-many relationship to a Question entity... in other words, the questions in the Exam. And I have to store the answers that each student gives to the questions in an exam.
Edit: I wish people would try to answer my question rather than tell me to restructure my whole program. There are reasons why the data model has been structured like it is.
Edit2: Maybe I will have to restructure....
Exam shouldn't have a hasTaken property. Think about it in the real world. An Exam would not know about who has taken it because many people could have taken it. The instance of taking an exam, then, should be a first-class concept in your model.
Consider this:
Exam has many TakenExams, TakenExams belongs to Student http://yuml.me/6627495d
Now that the concept of taking an exam is a real object, you can model association metadata as well, such as dateTaken, score, and so on.
Also remember that Core Data expects you to have all of your inverse associations set up as well.
You don't usually copy an entity. (I'm not sure what happens if you call copy on an NSManagedObject... it's not explained in the documentation, as far as I know. Experts can correct me. )
Just create another entity, or write a method which does just that.
I think another way is to make many-to-many relationships between Exam and Student:
create relationships in Exam called studentsToTakeThisExam and studentsWhoTookThisExam.
create relationships in Student called examsToTake and examsAlreadyTaken.
and set up the inverse relationships accordingly.
I would not argue (as you requested) about whether your modeling is correct or not. The procedure to copy an entity is, in general, quite complex, owing to the fact that, besides attributes, you also need to deal with the entity's relationships and copy them. I cannot post here a huge amount of source code showing how to accomplish this; however, I can point you to a book where this issue is described in detail, with all of the source code you need. The book is Marcus Zarra's "Core Data: Apple’s API for Persisting Data on Mac OS X", published by The Pragmatic Programmers.
You really don't want to copy an Exam in this situation. You'd end up with lots of identically named Exams which didn't have a relationship with each other, and then you'd be forced to group them together (if you wanted to) by their name.
I'd recommend a new entity (perhaps "ExamSitting"?) which represents a Student sitting an Exam. You could then add a to-many from Student to ExamSitting, and a to-many from Exam to ExamSitting. This enables you to have as many attributes on the ExamSitting as you like (hasTaken, grade and so on).
Edit
Okay, given your clarification, I have a point or two to add (although they may not be what you're looking for). I understand that you're loading from files with a particular structure, but that doesn't necessarily have to dictate your structure.
With the XML files laid out as you now describe, I would still use an Exam - Student - ExamSitting model. If I were to implement it, I'd load all the Students, and then, for each record in the Exams file, I'd create one Exam object, and then a number of ExamSitting objects, one for each Student that fits the criteria defined in the record. As I mention above, this enables you to store more information about each event, such as mark, takenDate and so on.
If you're sure there's no requirement to be able to store additional information at this granularity, you could just create a to-many relationship studentsTakingExam. This could be populated as you load each exam record by querying the loaded Student entities.
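The Exam - Student - ExamSitting loading step described in this answer could look roughly like the sketch below. It is written in plain Python rather than Core Data, so the classes are stand-ins for managed objects. The applies_to function covers only the three rules spelled out in the question (the "etc etc" rules are unknown), and treating a non-zero student-id as "only that student" is an assumption about how the rules combine. 'Student B' and two of the exams are hypothetical records.

```python
from dataclasses import dataclass


@dataclass
class Student:
    name: str
    class_id: int
    year_id: int
    student_id: int


@dataclass
class Exam:
    name: str
    class_id: int
    year_id: int
    student_id: int


@dataclass
class ExamSitting:
    """One student sitting one exam; per-student state lives here."""
    exam: Exam
    student: Student
    has_taken: bool = False


def applies_to(exam, student):
    # Rule: a non-zero student-id targets exactly that student (assumed precedence).
    if exam.student_id != 0:
        return exam.student_id == student.student_id
    # Rule: all ids are 0, so every student takes the exam.
    if exam.class_id == 0 and exam.year_id == 0:
        return True
    # Rule: class-id and year-id must both match.
    return exam.class_id == student.class_id and exam.year_id == student.year_id


def create_sittings(exams, students):
    """Create one ExamSitting per (exam, student) pair that the rules allow."""
    return [ExamSitting(e, s) for e in exams for s in students if applies_to(e, s)]


students = [Student("Dave", 12, 4, 222), Student("Student B", 13, 4, 223)]
exams = [
    Exam("Maths 5", 12, 4, 0),         # class 12, year 4 only
    Exam("Exam for all", 0, 0, 0),     # everyone
    Exam("Personal exam", 0, 0, 223),  # one student only
]
sittings = create_sittings(exams, students)

# Marking one sitting as taken does not touch any other student's sitting.
sittings[0].has_taken = True
```

Because has_taken lives on the sitting, there is still exactly one Exam object per exam record, and nothing has to be copied.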

When should I use properties instead of object references?

For example, say I have an Author class and a Book class, independent of each other. We all know an author writes a book.
What I would love to know is when it's best to include the book as a reference object in the Author class, and when to just include the book name.
The reason for this question ties mainly to flexibility and easy maintenance.
Update:
What design patterns should I read up on that relate to this type of issue?
I would generally store the reference to the book, and the book object is therefore readily accessible from the author. If you store the property (name) in this scenario, then some questions are:
is the name unique ?
is it costly to retrieve the book from the author (e.g. do you have to go to the database) ?
If you don't want the whole book object in memory (perhaps storing all authors with all their books consumes huge amounts of resource), then perhaps you want a book placeholder object referenced from the author class. That placeholder would store the book's unique key and can retrieve the book upon demand. It may implement a book interface and thus be indistinguishable from the real book. The downside is that the book still has to be populated upon, or prior to, a request for info.
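A minimal sketch of such a placeholder in Python. All names here (Book, BookPlaceholder, load_book, the in-memory STORE) are hypothetical; in a real program the loader would go to the database or wherever the books live.

```python
class Book:
    def __init__(self, key, title):
        self.key = key
        self.title = title


class BookPlaceholder:
    """Holds only the book's key and loads the real Book on first use."""

    def __init__(self, key, loader):
        self._key = key
        self._loader = loader
        self._book = None

    def __getattr__(self, name):
        # Only called for attributes not found on the placeholder itself,
        # i.e. the attributes of the real Book.
        if self._book is None:
            self._book = self._loader(self._key)
        return getattr(self._book, name)


# Stand-in for an expensive lookup; load_calls records every real load.
STORE = {"key-1": "Title A"}
load_calls = []


def load_book(key):
    load_calls.append(key)
    return Book(key, STORE[key])


class Author:
    def __init__(self, name, books):
        self.name = name
        self.books = books


author = Author("Author A", [BookPlaceholder("key-1", load_book)])
calls_before_access = len(load_calls)   # nothing loaded yet
title = author.books[0].title           # triggers the one and only load
title_again = author.books[0].title     # served from the cached Book
```

Code that uses author.books sees the same attributes either way, which is the "indistinguishable from the real book" property mentioned above.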
The name would be a very bad idea; it's quite possible for two different books to have the same title.
I'd use an object reference (inside some sort of collection, of course, since an author can write more than one book) - that's what they're for. This is certainly flexible and maintainable.
There may be exceptional circumstances where this causes problems, and only then would I consider keeping some sort of unique ID (in the case of books, the ISBN would be the prime candidate) instead of a reference.
It really depends if author is ever going to need access to book information beyond that of just the title. If really never, then ok maybe you only need the title as a String. However, if additional book info is likely to be needed, you need to think about how you are going to access that. In this case it probably makes sense to use a book class.
Having the book name inside the author object will create duplication if a book has many authors. If somehow you need to modify the book name then you will have to go through all the authors collection in order to determine where you need to change.
Having a single book object referenced by all its authors is much simpler. Just change the book name in one place and that's all.
I don't think a structure with unneeded duplicates is easy to maintain or flexible.
EDIT: With duplication of the name you can also end up with inconsistencies. Just imagine what happens if the title of a book with two authors gets modified for just one author.
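A tiny Python sketch of that difference. The class and attribute names are hypothetical; each author keeps both a shared reference and a copied title so the two behaviours can be compared side by side.

```python
class Book:
    def __init__(self, title):
        self.title = title


class Author:
    def __init__(self, name):
        self.name = name
        self.books = []        # references to shared Book objects
        self.book_titles = []  # duplicated strings, kept only for comparison


book = Book("Old Title")
author_a, author_b = Author("Author A"), Author("Author B")
for author in (author_a, author_b):
    author.books.append(book)
    author.book_titles.append(book.title)

# With references, one change is visible through every author.
book.title = "New Title"

# With duplicated names, each copy must be updated; here one is forgotten.
author_a.book_titles[0] = "New Title"

print(author_a.books[0].title, author_b.books[0].title)
print(author_a.book_titles[0], author_b.book_titles[0])
```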
It depends on what you want to do with the information. Sometimes it might be easier to have your list of book names stored in the author, other times you might need the full book information (ISBN, publisher etc.)
In a database you would have an Author table with all of their details, and a book table with all details of the book, then probably a many-to-many relationship table to tie books to authors and vice-versa.
Oh, and properties / object references aren't really mutually exclusive. A property CAN be an object reference. It sounds more like you are asking if you should store the full object information or just the piece you might need immediately.
Option B. Avoid having such references - it will quickly turn your program into a mess of dependencies. Keep the two concepts independent and, if you need to, use some higher-order abstraction to link the two.
Your question is a part of a complex saga known as "object-relational mappings", there's a lot of material on design patterns for that on the web.
A design is object-oriented when each object has references to its best friends, not to their names. There are certain downsides to it; for example, it is notoriously hard to keep track of back-references.
However, the problem calls for a more general solution than a best-practice pattern.
As long as you're using OO to design your application at all, I'd opt to be consistent and keep direct references to your objects.
It really depends on your domain. Most book store databases have some discrepancies in author names between books, and no information on whether a given name is shared by more than one author. So in that domain a book would have a list of author names, and there would be no object identity mapped to that data.
On the other hand, if your domain is a publishing house, you have a very good idea which of the authors John Smith (client number 19024982), J. Henry Smith (client number 19024982) and John Smith (client number 773829) are the same or different authors, and which books those two authors have created, and so using object references for author and book identity would be a good mapping of the domain.

DB Design: more tables vs less tables

Say I want to design a database for a community site with blogs, photos, forums etc., one way to do this is to single out the concept of a "post", as a blog entry, a blog comment, a photo, a photo comment, a forum post all can be thought as a post. So, I could potentially have one table named Post [PostID, PostType, Title, Body .... ], the PostType will tell what type of post it is.
Or I could design this whole thing with more tables, BlogPost, PhotoPost, ForumPost, and I'll leave Comment as its own table with a CommentType column.
Or have a Post table for all types of post, but have a separate Comment table.
To be complete I'm using ADO.NET Entity Framework to implement my DAL.
Now the question: what are the implications of each route described above for DB performance and manageability, middle tier design and code cleanliness, EF performance etc.?
Thank you very much!
Ray.
Let me ask you this:
What happens if two years from now you decide to add a 'music post' as a blog type? Do you have to create a new table for MusicPost, and then re-code your application to integrate it? Or would you rather log on to your blog admin panel, add a blog type in a drop-down box called 'Music', and be on your merry way?
In this case, less tables!
Generally, life will be easier if you can have all the posts in one table:
less joins to perform
less tables to maintain
common attributes are not repeated between tables
code more generic
However, you could run into some issues:
if each subtype has a lot of its own attributes, you could end up with many columns - maybe too many for your DBMS
if a subtype has an attribute (e.g. a stored picture) that is expensive for your DBMS to maintain even when unused, you might not want that column in all rows
Should you run into such an issue, you can create a new table just for the specific attributes of that post subtype - for example:
create table posts (post_id number primary key,
post_date date,
post_title ...); /* All the common attributes */
create table photo_post (post_id references posts, photograph ...);
In many cases, no such issues arise and a single table for all will suffice.
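For a runnable version of the idea, here is a rough sketch using Python's sqlite3. The post_type column, the column types and the sample rows are assumptions made for illustration; the elided columns from the snippet above are not filled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- All the common attributes live in one table.
CREATE TABLE posts (
    post_id    INTEGER PRIMARY KEY,
    post_type  TEXT NOT NULL,
    post_date  TEXT,
    post_title TEXT
);
-- Only photo posts get a row here, so other posts never carry the BLOB column.
CREATE TABLE photo_post (
    post_id    INTEGER PRIMARY KEY REFERENCES posts(post_id),
    photograph BLOB
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO posts VALUES (1, 'blog', '2000-01-01', 'Post A')")
conn.execute("INSERT INTO posts VALUES (2, 'photo', '2000-01-02', 'Post B')")
conn.execute("INSERT INTO photo_post VALUES (2, ?)", (b"\x00\x01",))

# One query lists every post; the subtype table is joined in only where it exists.
rows = conn.execute(
    """SELECT p.post_id, p.post_type, pp.photograph IS NOT NULL
       FROM posts p
       LEFT JOIN photo_post pp ON pp.post_id = p.post_id
       ORDER BY p.post_id"""
).fetchall()
print(rows)
```

Adding a new post type that needs no extra columns is then just a new post_type value, with no schema change.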
I can't think of any merit in creating a distinct table for every subtype.
The problem is similar to the question of how deep your hierarchy should be in an OO design.
A simple approach in OO terms would be to have a base class Post and children for BlogPost, ForumPost and so on. Comment could either be a child of Post or its own hierarchy, depending on your requirements.
How this is then mapped to DB tables is an entirely different question. This classic essay by Scott Ambler deals with the different mapping strategies and explains their advantages and disadvantages in a rather detailed way.