Transitive Dependency issue in Data Normalization

I wish to find the transitive dependencies present in this table.
Here,
SID: Staff ID
PS: Pay scale
S_name: Staff Name
C_P_HR: Charge per hour
CNIC: ID card

Start by listing all the functional dependencies. Since SID is the primary key, every other attribute depends on it. CNIC is also a candidate key.
{SID}+ -> {PS, S_NAME, C_P_HR, CNIC, DESIGNATION, SALARY}
{S_NAME}+ -> {SID, PS, CNIC, C_P_HR, DESIGNATION, SALARY}
{CNIC}+ -> {SID, PS, S_NAME, C_P_HR, DESIGNATION, SALARY}
{PS}+ -> {C_P_HR,DESIGNATION}
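The closures above can be recomputed mechanically. Here is a minimal sketch, assuming the base dependencies are SID -> everything, S_NAME -> SID, CNIC -> SID and PS -> {C_P_HR, DESIGNATION}. That list is read off the closures listed here and is not stated in the question, and the names `closure` and `STAFF_FDS` are only illustrative.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already covered, its dependents follow.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


ALL = {"SID", "PS", "S_NAME", "C_P_HR", "CNIC", "DESIGNATION", "SALARY"}

# Assumed base dependencies (not given explicitly in the question).
STAFF_FDS = [
    ({"SID"}, ALL),
    ({"S_NAME"}, {"SID"}),
    ({"CNIC"}, {"SID"}),
    ({"PS"}, {"C_P_HR", "DESIGNATION"}),
]

for start in ("SID", "S_NAME", "CNIC", "PS"):
    print(start, "->", sorted(closure({start}, STAFF_FDS)))
```

A closure always contains its starting attributes, so the set computed for PS also includes PS itself, which the listing above leaves out.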
Now, a transitive dependency needs to satisfy these criteria:
A → B
It is not the case that B → A
B → C
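Upon inspection of the functional dependencies, one chain does satisfy the criteria: SID → PS, it is not the case that PS → SID, and PS → {C_P_HR, DESIGNATION}. Thus C_P_HR and DESIGNATION depend transitively on SID through PS, so the table does contain a transitive dependency.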

Related

To what level is this schema normalized?

logical design:
Pet(name, type, birthday, cost)
determinants:
name->type
name->birthday
name->cost
Here is some data:
name   type  birthday   cost
Bruno  cat   1/1/1982   free
Poppy  cat   1/2/1982   20.00
Silly  cat   12/2/1995  free
Sam    dog   2/3/1989   100.00
Tuffy  dog   3/3/1974   free
There's repeated data between rows but no duplicate columns. I think it's in BCNF.

Yes, the schema is in BCNF, if the dependencies given are a cover of all the dependencies holding in the schema. In this case, name is the only candidate key, and the left part of each non-trivial dependency (including those implied by the cover) is a super key. So the relation is in BCNF.

Database Normalization mistake

I'm preparing for an exam, and in my textbook I found an example I don't understand.
On the relation R(A,B,C,D,E,F) I have the following functional dependencies:
FD1 A,B -> C
FD2 C -> B
FD3 C,D -> E
FD4 D -> F
Now I think all the FDs are in 3NF (none is in BCNF), but the text says FD1 and FD2 are in 2NF and FD3 and FD4 are in 1NF. Where am I making a mistake (or is the text wrong)?
I found the alternative keys to be ABD and ACD.

Terminology
It is highly improper to say that “a functional dependency is in a certain normal form”, since only a relation schema can be (or not be) in a normal form. What can be said is that a functional dependency violates a certain normal form (so that the schema that contains it is not in that normal form).
Normal forms
It can be shown that a relation schema is in BCNF if every given non-trivial FD has a superkey as its determinant. Since, as you have correctly noted, the only candidate keys here are ABD and ACD, every dependency violates that normal form. So the schema is not in BCNF.
To be in 3NF, a relation schema must have all the given functional dependencies such that either the determinant is a superkey, or every attribute on the right-hand side is a prime attribute, that is, an attribute of some candidate key. In your example this is true for B and C (the right-hand sides of FD2 and FD1), but not for E and F, so FD3 and FD4 violate 3NF. So the schema is not in 3NF either.
2NF, which is only of historical interest and not particularly useful in normalization theory, is a normal form in which the relation schema has no functional dependencies where a non-prime attribute depends on a proper part of a key. This again fails for FD3 (CD is part of the key ACD) and FD4 (D is part of both keys), so the relation is not in 2NF either.
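These checks can be reproduced with a short script. This is only a sketch under stated assumptions: the helper names (`closure`, `candidate_keys`, `violated_forms`) are made up for illustration, the key search is brute force, and the normal-form test looks only at the four listed dependencies rather than at every implied one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for minimal attribute sets that determine everything."""
    keys = []
    for size in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def violated_forms(attributes, fds):
    """For each listed FD, name the normal forms it violates."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    report = []
    for lhs, rhs in fds:
        dependents = rhs - lhs
        non_prime = dependents - prime
        broken = []
        if dependents and closure(lhs, fds) != attributes:
            broken.append("BCNF")  # determinant is not a superkey
            if non_prime:
                broken.append("3NF")  # and a dependent is non-prime
            if non_prime and any(lhs < key for key in keys):
                broken.append("2NF")  # and the determinant is part of a key
        report.append(broken)
    return report


R = set("ABCDEF")
FDS = [
    ({"A", "B"}, {"C"}),  # FD1
    ({"C"}, {"B"}),       # FD2
    ({"C", "D"}, {"E"}),  # FD3
    ({"D"}, {"F"}),       # FD4
]

print(candidate_keys(R, FDS))
print(violated_forms(R, FDS))
```

With these inputs the keys come out as ABD and ACD, FD1 and FD2 violate only BCNF, and FD3 and FD4 violate BCNF, 3NF and 2NF.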

Is an FD(functional dependency) fully fd when x->y and z->y where z is not a subset of x?

I have seen many examples about full functional dependencies, and they usually say that:
x->y such that y shouldn't be determined by any proper subset of x, x has to be a key.
But what if y is determined by an attribute that is not a subset (proper or otherwise) of x?
Suppose that I have a students table which consists of rollno (primary key), name, phone no (unique not null) and email (unique not null).
As rollno is a primary key, let it be x and take name as y.
Now x->y, but phone or email, which are not subsets of x, also determine y (name). Is this still called a full functional dependency?
If yes, should we check the determinants of y which are only subsets of x?
If no, what is the mistake I did?

x->y such that y shouldn't be determined by any proper subset of x, x has to be a key.
You are confusing the definition of "full functional dependency" with the definition of "2NF". The definition of fully functionally dependent has nothing to do with superkeys or candidate keys or primary keys. And for a relation to be in 2NF, if X is a candidate key and Y is non-prime then Y can't be determined by any proper subset of X.
A functional dependency X -> Y is partial when Y is also functionally dependent on a proper/smaller subset of X. Otherwise it is full. It doesn't matter what else is true.
A superkey is a column or set of columns that functionally determines every column. If there is no smaller superkey inside it then it is a candidate key. A relation is in 2NF when every non-prime attribute is fully functionally dependent on every candidate key. It doesn't matter what else is true.
You can pick one candidate key to call primary key. So a primary key is a candidate key. Otherwise the notion of "primary key" is irrelevant to functional dependencies and normalization.
(In SQL primary key means the same as unique not null, namely superkey. Which is a candidate key only if there's no smaller superkey in it. So a set declared primary key might not even be a primary key. And in SQL you can't declare {} as a superkey.)
As rollno is a primary key, let it be x and take name as y.
A primary key is a candidate key, so {rollno} determines every attribute and no proper subset of {rollno} determines every attribute. So {}, the only proper subset of {rollno}, is not a superkey. ({} is a superkey when there can only ever be at most one row in the table.) But it's still possible that {} -> name. (That would be if the name column only contains at most one name at a time.) Then {rollno} -> name would be partial because its proper subset {} determines name.
now x->y, but phone or email also determine y(name) which are not subsets of x. Is this still called a fully functional dependent?
If no proper subset of {rollno} determines name then {rollno} -> name fully, otherwise partially. That's what the definition says. Nothing else matters. But we don't know whether a proper subset of {rollno} determines name because you didn't say whether {} -> name.
If {rollno}, {phoneno} and {email} are candidate keys and {} doesn't determine name then name is fully functionally dependent on all three (because no proper subset of any of them determines name).
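The definition can be tested directly by trying every proper subset of the determinant, including {}. Below is a minimal sketch. The dependency list for the students table is an assumption (three single-column keys and no {} -> name), and `dependency_kind` is a hypothetical helper name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def dependency_kind(lhs, rhs, fds):
    """Classify lhs -> rhs as 'none', 'partial' or 'full'."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return "none"
    # Sizes 0 .. len(lhs)-1 enumerate the proper subsets, {} included.
    for size in range(len(lhs)):
        for combo in combinations(sorted(lhs), size):
            if rhs <= closure(set(combo), fds):
                return "partial"
    return "full"


# Assumed dependencies for the students table.
STUDENT_FDS = [
    ({"rollno"}, {"name", "phoneno", "email"}),
    ({"phoneno"}, {"rollno"}),
    ({"email"}, {"rollno"}),
]

print(dependency_kind({"rollno"}, {"name"}, STUDENT_FDS))
print(dependency_kind({"phoneno"}, {"name"}, STUDENT_FDS))
print(dependency_kind({"rollno", "phoneno"}, {"name"}, STUDENT_FDS))

# If the name column could only ever hold one value, {} -> name would hold.
CONSTANT_NAME_FDS = STUDENT_FDS + [(set(), {"name"})]
print(dependency_kind({"rollno"}, {"name"}, CONSTANT_NAME_FDS))
```

The first two calls report a full dependency even though other keys also determine name, because only proper subsets of the determinant are examined. The last two report a partial dependency: {rollno} alone already determines name in the composite case, and {} does so once {} -> name is added.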

You are saying:
x->y such that y shouldn't be determined by any proper subset of x, x has to be a key.
but this mixes two different concepts, that of “full functional dependency”, and that of “key”.
A functional dependency is full if you cannot remove any element of the left part without losing the property of determining the right part. So if a functional dependency has only one attribute on the left part (like rollno → name), it is always full.
A (candidate) key, on the other hand, is a set of attributes that determines all the attributes of a relation, and such that you cannot remove any attribute from it without losing the property of being a key (so it is a minimal superkey).
In your example there are three different keys, rollno, phone, and email, each of them composed of a single attribute.
Of course, if you know that the set of attributes X is a key, you can write that X → T, where T are all the attributes of the relation, and this functional dependency is complete.

How is every binary relation BCNF?

So, as part of my assignment, I have to prove that any relation with two attributes is in BCNF.
As I understand it, if a relation is in third normal form and a non-key attribute functionally determines a key attribute, it violates BCNF.
Say my relation consists of two attributes, A1 and A2.
Scenario 1 (only one functional dependency):
A1 -> A2 (so A1 is the key, and A2 does not determine A1, so there is no violation)
The same applies for
A2 -> A1
But what if
A1 -> A2 and A2 -> A1
Here the key can be either A1 or A2, and the other, non-key attribute functionally determines the key.

In each functional dependency X -> Y, X and Y are sets of attributes. This requires special attention when either X or Y is an empty set (see note 1 at the end). So, in the example with only two attributes A1 and A2, these are all the possible non-trivial dependencies:
1. {} -> {A1}
2. {} -> {A2}
3. {} -> {A1 A2}
4. {A1} -> {A2}
5. {A2} -> {A1}
All the other possible dependencies are trivial dependencies, i.e. the right set is a subset of the left set (for instance {A1} -> {}, {} -> {}, {A1} -> {A1}, {A1 A2} -> {A1}, etc.). We know that these dependencies always hold, so they are not considered in the definition of the normal forms.
1. When empty sets are excluded from dependencies, the theorem is true
Consider the dependencies 4 and 5. We have four possible cases:
1. Only 4 holds, so we have: {A1} -> {A2}
this means that {A1} is a candidate key (since from {A1} -> {A2} we can derive that {A1}->{A1 A2}), and the BCNF condition is satisfied since each dependency has a superkey as determinant;
2. Only 5 holds, so we have: {A2} -> {A1}
equivalent to the previous case, only the role of A1 and A2 is exchanged;
3. Neither 4 nor 5 holds (no functional dependencies),
so BCNF is formally satisfied (since no dependency violates it); and, finally:
4. Both hold, so we have {A1} -> {A2} and {A2} -> {A1}
also in this case the relation is in BCNF, since {A1} and {A2} are both candidate keys: each determines all the attributes (simply put together cases 1 and 2 above).
2. When we allow the empty set in the functional dependencies, the theorem is not true
Consider a relation R(A1, A2), with a cover F of the dependencies
F = { {}-> {A1} }
The meaning of {} -> {A1}, by recalling the definition of functional dependency, is that the column A1 has a constant value. So we have a relation with two columns, one of which has always the same value. In this case the only candidate key is {A2}, since {A2}+ = {A1 A2}, with {A1 A2} a superkey, and the relation is not in BCNF since a non-trivial functional dependency ({} -> {A1}) has a determinant which is not a superkey.
Note 1: In the scientific literature on the subject (as well as in books on databases) the possibility of empty sets in functional dependencies is sometimes explicitly excluded (for instance, see: Tsou, Don-Min, and Patrick C. Fischer. “Decomposition of a Relation Scheme into Boyce-Codd Normal Form.” ACM SIGACT News 14, no. 3 (July 1, 1982): 23–29. https://doi.org/10.1145/990511.990513), while sometimes it is allowed, or not discussed.
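Both parts of the argument can be checked by brute force. The sketch below enumerates the four cases built from dependencies 4 and 5, then tests the cover F = { {} -> {A1} }. The helper names `closure` and `is_bcnf` are illustrative, and the check relies on the fact that, for the schema as a whole, it is enough to test the dependencies in the given cover.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(attributes, fds):
    """True if every non-trivial FD in `fds` has a superkey as determinant."""
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, ignored by the definition
        if closure(lhs, fds) != attributes:
            return False
    return True


R = {"A1", "A2"}
dep4 = ({"A1"}, {"A2"})
dep5 = ({"A2"}, {"A1"})

# Part 1: no empty determinants. The four cases are none, only 4, only 5, both.
cases = [[], [dep4], [dep5], [dep4, dep5]]
print([is_bcnf(R, case) for case in cases])

# Part 2: an empty determinant, meaning column A1 holds a constant value.
constant_a1 = [(set(), {"A1"})]
print(closure({"A2"}, constant_a1))  # {A2} is the only candidate key
print(is_bcnf(R, constant_a1))
```

All four cases in part 1 pass, while the cover with {} -> {A1} fails because {}+ is just {A1}, which is not the whole relation.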

For a relation to be in BCNF, at least one of the following must hold for every functional dependency X → Y:
X → Y is a trivial functional dependency (Y ⊆ X).
X is a superkey for schema R.
(This is the definition given on Wikipedia.)
For example, take a relation R = {A, B} with two attributes.
The only possible non-trivial FDs are {A} -> {B} and {B} -> {A}.
So there are four possible cases:
1. No FDs hold in R. The only candidate key is AB; since it is an all-key relation, it is always in BCNF.
2. Only A -> B holds. The candidate key is A, and the relation satisfies BCNF.
3. Only B -> A holds. The candidate key is B, and the relation satisfies BCNF.
4. Both A -> B and B -> A hold. There are two candidate keys, A and B, and the relation satisfies BCNF.
Hence, every binary relation (a relation with two attributes) is always in BCNF!

To prove that any relation with two attributes is in BCNF:
Rule for Boyce-Codd Normal Form:
A relation R is in BCNF if R is in third normal form and, for every FD, the LHS is a superkey.
So if A1 and A2 are the only attributes, with A1 -> A2 and A2 -> A1 as functional dependencies, then in both functional dependencies the left-hand side is a superkey, which satisfies the condition of BCNF.

How to normalize a doctor table to follow 2NF?

There is a base table called doctor in my database where I have the columns
Name, d_ID, Age, Gender, Contact_NO, Speciality, beg_Date, End_Date
I wish to normalize my table. I have found the dependencies for doctor table as follows:
Name, d_ID -> Age, Gender, Speciality
d_ID -> Name, Contact_NO, beg_Date, End_Date
There are a few more base tables with a similar structure.
I have computed the closures and found that I have 2 candidate keys which are {d_ID} and {Name,d_ID}. I chose {d_ID} to be the primary key and {Name,d_ID} to be the secondary key.
My question is:
I want to know if my table is in 2NF already. If not, please let me know how to break down the relation.
I have an intermediate table called patient_record, which has doctor id, patient id, nurse id, bed id (foreign key) and so on. My confusion is whether normalization has to be done only on the intermediate tables and not on the other base tables. I believe this because the base tables would only have unique identifiers for their columns, and hence they would automatically fall under 2NF?

i computed the closures and found that i have 2 candidate keys which are {d_ID} and {Name,d_ID} (Please correct me if i am wrong).
No. By definition, candidate keys are irreducible. If d_ID is a candidate key, then {Name, d_ID} is not. {Name, d_ID} is not a candidate key, because it's reducible. Drop the attribute "Name", and you've got a candidate key (d_ID).
1) i want to know if my table is in 2NF already. If not, please let me know how to break down the relation?
It's really hard to say in this case. Although you have a unique ID number for every doctor, in your case it only serves to identify a row, not a doctor. Your table allows this kind of data.
d_ID  Name        Age  Gender  Contact_NO    Speciality  beg_Date    End_Date
--
117   John Smith  45   M       123-456-7890  Cardio      2013-01-01  2015-12-31
199   John Smith  45   M       123-456-7890  Cardio      2013-01-01  2015-12-31
234   John Smith  45   M       123-456-7890  Cardio      2013-01-01  2015-12-31
How many doctors are there? (I made up the data, so I'm really the only one who knows the right answer.) There are two. 234 is an accidental duplicate of 117. 199 is a different doctor than 117; it's just a coincidence that they're both heart specialists at the same hospital, and their hospital privileges start and stop on the same dates.
That's the difference between identifying a row and identifying a doctor.
Whether it's in 2NF depends on other functional dependencies that might not yet be identified. There might be several of these.
2) i have an intermediate table called patient_record which has the doctor id, patient id, nurse id, bed id (foreign key)and so on. i am confused if normalization has to be only done to intermediate tables and not the other base tables.
Normalization is usually done to all tables.
Because base tables would only have unique identifiers for the columns and hence they would automatically fall under 2NF?
No, that's not true. For clarification, see my answer to Learning database normalization, confused about 2NF.
Identifying a row and identifying a thing
It's a subtle point, but it's really, really important.
Let's look at a well-behaved table that has three candidate keys.
create table chemical_elements (
  element_name varchar(35) not null unique,
  symbol varchar(3) not null unique,
  atomic_number integer not null unique
);
All three attributes in that table are declared not null unique, which is the SQL idiom for identifying candidate keys. If you feel uncomfortable not having at least one candidate key declared as primary key, then just pick one. It doesn't really matter which one.
insert into chemical_elements
  (atomic_number, element_name, symbol)
values
  (1, 'Hydrogen', 'H'),
  (2, 'Helium', 'He'),
  (3, 'Lithium', 'Li'),
  (4, 'Beryllium', 'Be'),
  -- [snip]
  (116, 'Ununhexium', 'Uuh'),
  (117, 'Ununseptium', 'Uus'),
  (118, 'Ununoctium', 'Uuo');
Each of the three candidate keys--atomic_number, element_name, symbol--unambiguously identifies an element in the real world. There's only one possible answer to the question, "What is the atomic number for beryllium?"
Now look at the table of doctors. There's more than one possible answer to the question, "What is the ID number of the doctor named 'John Smith'?" In fact, there's more than one possible answer for the very same doctor, because 234 and 117 refer to the same person.
It doesn't help to include more columns, because the data is the same for both doctors. You can get more than one answer to the question, "What's the ID number for the 45-year-old male doctor whose name is 'John Smith', whose phone number is 123-456-7890, and whose specialty is 'Cardio'?"
If you find people making appointments for these two doctors, you'll probably find their ID numbers written on a yellow sticky and stuck on their monitor.
Dr. John Smith who looks like Brad Pitt (!), ID 117.
Other Dr. John Smith, ID 199.
Each ID number unambiguously identifies a row in the database, but each ID number doesn't unambiguously identify a doctor. You know that ID 117 identifies a doctor named John Smith, but if both John Smiths were standing in front of you, you wouldn't be able to tell which one belonged to ID number 117. (Unless you had read that yellow sticky, and you knew what Brad Pitt looked like. But that information isn't in the database.)
What does this have to do with your question?
Normalization is based on functional dependencies. What "function" are we talking about when we talk about "functional dependencies"? We're talking about the identity function.

Here is the normalization process:
1. Identify all the candidate keys of the relation.
2. Identify all the functional dependencies in the relation.
3. Examine the determinants of the functional dependencies. If any determinant is not a candidate key, the relation is not well formed. Then:
i) Place the columns of the functional dependency in a new relation of their own.
ii) Make the determinant of the functional dependency the primary key of the new relation.
iii) Leave a copy of the determinant as a foreign key in the original relation.
iv) Create a referential integrity constraint between the original and the new relation.
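One pass of this procedure might look like the sketch below. Everything in it is an assumption made for illustration: the example reuses the staff table from the first question with a guessed list of base dependencies, the helper name `split_on_violation` is invented, and "the determinant is a candidate key" is approximated by "the determinant determines every attribute".

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def split_on_violation(attributes, fds):
    """Apply one decomposition step, or return None if no FD calls for one.

    Returns (remaining, new_relation, determinant). The determinant is the
    primary key of the new relation and stays behind as a foreign key.
    """
    for lhs, rhs in fds:
        dependents = rhs - lhs
        if not dependents:
            continue  # trivial dependency
        if closure(lhs, fds) != attributes:
            new_relation = lhs | dependents      # steps i and ii
            remaining = attributes - dependents  # step iii keeps lhs here
            return remaining, new_relation, lhs
    return None


STAFF = {"SID", "PS", "S_NAME", "C_P_HR", "CNIC", "DESIGNATION", "SALARY"}

# Assumed base dependencies for the staff table.
STAFF_FDS = [
    ({"SID"}, STAFF),
    ({"S_NAME"}, {"SID"}),
    ({"CNIC"}, {"SID"}),
    ({"PS"}, {"C_P_HR", "DESIGNATION"}),
]

remaining, pay_scale, key = split_on_violation(STAFF, STAFF_FDS)
print("original relation:", sorted(remaining))
print("new relation:", sorted(pay_scale), "with primary key", sorted(key))
```

The referential integrity constraint from the last step would be declared in the database itself (for example as a foreign key from the original table's PS column to the new table), which this sketch does not model.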
