Modifying Relation into BCNF - rdbms

I am learning about DBMSs and normalization, and I came across the following exercise:
Consider the relation R(b,e,s,t,r,o,n,g) with functional dependencies
b,s -> e,r,o,n
b -> t
b -> g
n -> b
o -> r
(a) identify candidate keys
(b) identify prime attributes
(c) state the highest normal form of this table
I think that (a) would be {b, s} since they identify all attributes without redundancy.
(b) would also be {b, s} since they compose the candidate keys of (a).
(c) would be 1-NF for several reasons. It does not satisfy 2-NF since there is a partial dependency, n -> b: this functional dependency involves only b and not s, hence it is a partial dependency. It does not satisfy 3-NF since o -> r indicates that a non-prime attribute depends on another non-prime attribute. BCNF is not satisfied since 3-NF is not satisfied.
Lastly, if I were to modify the table until it is in BCNF, would splitting the relation R into:
R1(b, e, s, r, o, n) with b, s -> e, r, o, n
and
R2(b, t, g) with b -> t and b -> g
while eliminating the n -> b and o -> r satisfy BCNF?
I am most confused on the last part regarding satisfying BCNF. I would greatly appreciate any help/thoughts on all steps!

The schema has two candidate keys: {b, s} and {n, s}. You can verify that both are keys by computing the closures of the two sets of attributes: {b, s}+ and {n, s}+ both contain every attribute of R.
So the prime attributes are b, s, and n.
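As a sanity check on these closures, here is a small Haskell sketch (Haskell because it is the programming language used elsewhere on this page). It assumes single-letter attributes, represents an attribute set as a sorted String and a functional dependency as an (lhs, rhs) pair, and finds candidate keys by brute force over all subsets, which is fine for eight attributes but exponential in general. The helper names (closure, candidateKeys, fdsR) are illustrative, not part of the original answer.

```haskell
import Data.List (nub, sort, subsequences)

-- Assumption: attributes are single letters, so a set of attributes is a String.
type Attrs = String

-- A functional dependency lhs -> rhs.
type FD = (Attrs, Attrs)

-- Sorted, duplicate-free form, so equal sets compare equal.
norm :: Attrs -> Attrs
norm = sort . nub

subset :: Attrs -> Attrs -> Bool
subset xs ys = all (`elem` ys) xs

-- Attribute closure X+: keep adding the rhs of every FD whose lhs is already covered.
closure :: [FD] -> Attrs -> Attrs
closure fds xs
  | next == cur = cur
  | otherwise = closure fds next
  where
    cur = norm xs
    next = norm (cur ++ concat [rhs | (lhs, rhs) <- fds, lhs `subset` cur])

-- Candidate keys: superkeys with no proper subset that is also a superkey (brute force).
candidateKeys :: Attrs -> [FD] -> [Attrs]
candidateKeys r fds =
  [k | k <- superkeys, not (any (\k' -> k' /= k && k' `subset` k) superkeys)]
  where
    superkeys = [x | x <- subsequences (norm r), norm r `subset` closure fds x]

-- R(b,e,s,t,r,o,n,g) and its functional dependencies from the question.
fdsR :: [FD]
fdsR = [("bs", "enor"), ("b", "t"), ("b", "g"), ("n", "b"), ("o", "r")]
```

Evaluating closure fdsR "ns" gives "begnorst" (all of R), and candidateKeys "begnorst" fdsR gives ["bs","ns"], whose union b, n, s is the set of prime attributes.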
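You are correct in saying that the relation is in neither 2NF nor 3NF. Note, though, that the 2NF violation comes from b → t and b → g, where the non-prime attributes t and g depend on only part of the key {b, s}; n → b does not violate 2NF, because b is a prime attribute.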
Your proposed decomposition does not produce subschemas in BCNF, since in R1 the dependency o → r still holds and o is not a superkey of R1 (the same is true of n → b).
The “classical” decomposition algorithm for BCNF produces the following normalized schema:
R1(b g t)
R2(o r)
R3(b n)
R4(e n o s)
but the dependencies
b s → e
b s → n
b s → o
b s → r
are not preserved in the decomposition.
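A sketch of one common variant of the decomposition algorithm, in Haskell: repeatedly find a subset X of a schema Ri whose closure adds attributes of Ri without covering all of Ri, then split Ri into X+ ∩ Ri and X ∪ (Ri − X+). It assumes single-letter attributes and searches subsets exhaustively; the names violation and bcnf are hypothetical. Which violation gets split first depends on the search order, so other runs of the algorithm can give different, equally valid results; with the subset order used here it reproduces the four schemas above, listed in a different order.

```haskell
import Data.List (find, nub, sort, subsequences)

-- Assumption: single-letter attributes; an attribute set is a String.
type Attrs = String
type FD = (Attrs, Attrs)

norm :: Attrs -> Attrs
norm = sort . nub

subset :: Attrs -> Attrs -> Bool
subset xs ys = all (`elem` ys) xs

inter, minus :: Attrs -> Attrs -> Attrs
inter xs ys = filter (`elem` ys) xs
minus xs ys = filter (`notElem` ys) xs

closure :: [FD] -> Attrs -> Attrs
closure fds xs
  | next == cur = cur
  | otherwise = closure fds next
  where
    cur = norm xs
    next = norm (cur ++ concat [rhs | (lhs, rhs) <- fds, lhs `subset` cur])

-- First subset X of the (sorted) schema ri with X -> (X+ ∩ ri) non-trivial and
-- X not a superkey of ri. Closure under the full F, intersected with ri, equals
-- the closure under the projection of F onto ri.
violation :: [FD] -> Attrs -> Maybe Attrs
violation fds ri = find bad (subsequences ri)
  where
    bad x =
      let xp = inter (closure fds x) ri
      in xp /= x && not (ri `subset` xp)

-- Split on the first violation found, then recurse on both parts.
bcnf :: [FD] -> Attrs -> [Attrs]
bcnf fds r =
  case violation fds ri of
    Nothing -> [ri]
    Just x ->
      let xp = inter (closure fds x) ri
      in bcnf fds xp ++ bcnf fds (norm (x ++ minus ri xp))
  where
    ri = norm r

fdsR :: [FD]
fdsR = [("bs", "enor"), ("b", "t"), ("b", "g"), ("n", "b"), ("o", "r")]
```

Here bcnf fdsR "begnorst" returns ["bgt","bn","or","enos"]: it splits on b first, then n, then o.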
A decomposition in 3NF that preserves data and dependencies is the following:
R1(b e n o s)
R2(b g t)
R3(o r)
In this decomposition, R2 and R3 are also in BCNF, while the dependency n → b in R1 violates the BCNF.
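The BCNF and 3NF claims about these three relations can be checked with a brute-force Haskell sketch. It assumes single-letter attributes and evaluates each sub-schema by computing closures under the full set of FDs and intersecting with the sub-schema's attributes, which yields the same closures as the projected dependencies. The names bcnfViolations, keysOf and in3NF are hypothetical.

```haskell
import Data.List (nub, sort, subsequences)

-- Assumption: single-letter attributes; an attribute set is a sorted String.
type Attrs = String
type FD = (Attrs, Attrs)

norm :: Attrs -> Attrs
norm = sort . nub

subset :: Attrs -> Attrs -> Bool
subset xs ys = all (`elem` ys) xs

inter :: Attrs -> Attrs -> Attrs
inter xs ys = filter (`elem` ys) xs

closure :: [FD] -> Attrs -> Attrs
closure fds xs
  | next == cur = cur
  | otherwise = closure fds next
  where
    cur = norm xs
    next = norm (cur ++ concat [rhs | (lhs, rhs) <- fds, lhs `subset` cur])

-- Candidate keys of a sub-schema ri (given sorted).
keysOf :: [FD] -> Attrs -> [Attrs]
keysOf fds ri =
  [k | k <- supers, not (any (\k' -> k' /= k && k' `subset` k) supers)]
  where
    supers = [x | x <- subsequences ri, ri `subset` closure fds x]

-- Every non-trivial X -> a that holds in ri although X is not a superkey of ri.
bcnfViolations :: [FD] -> Attrs -> [(Attrs, Char)]
bcnfViolations fds ri =
  [ (x, a)
  | x <- subsequences ri
  , let xp = inter (closure fds x) ri
  , not (ri `subset` xp)
  , a <- xp
  , a `notElem` x
  ]

-- 3NF tolerates such a violation when the determined attribute is prime.
in3NF :: [FD] -> Attrs -> Bool
in3NF fds ri = all (\(_, a) -> a `elem` prime) (bcnfViolations fds ri)
  where
    prime = concat (keysOf fds ri)

fdsR :: [FD]
fdsR = [("bs", "enor"), ("b", "t"), ("b", "g"), ("n", "b"), ("o", "r")]
```

For R1 = benos every violation found determines only b (for example n -> b), and b is prime in R1 because its keys are bs and ns, so R1 passes in3NF while failing BCNF; R2 = bgt and R3 = or have no violations at all.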

Related

Proof that a decomposition into 3 relations is lossless

Given this relation scheme with set of attributes R and set of dependencies F:
R = (ABCD) F = {AB -> C, B -> D; C -> A}
The functional dependency B -> D violates BCNF because B is not a superkey, so I converted the relation into BCNF by decomposing it into 3 relations using this algorithm:
Given a schema R.
Compute keys for R.
Repeat until all relations are in BCNF.
Pick any R' having an F.D A --> B that violates BCNF.
Decompose R' into R1(A,B) and R2(A,Rest of attributes).
Compute F.D's for R1 and R2.
Compute keys for R1 and R2.
The result I got (which is correct as I checked the available solution) is:
R1:(BD), R2:(CA), R3:(BC).
I know that a property of the conversion algorithm is that the decomposition preserves the data (it is lossless), and I want to prove it as an exercise.
Usually, with a decomposition into two relations R1 and R2, the procedure is: find the attributes R1 and R2 have in common, compute the closure of that set, and if the closure includes all the attributes of either R1 or R2, the decomposition preserves the data; otherwise it does not.
In this exercise there is no attribute common to all three of R1, R2 and R3, so I can't compute a single closure to determine whether the decomposition preserves data, and I don't know how else to proceed. What should I do to prove that the decomposition is lossless?
To show that the decomposition is lossless, you can proceed in two steps, along the lines of the steps of the decomposing algorithm.
Starting from your schema, let’s apply the first step of the algorithm.
(1) R = (ABCD), F = {AB -> C, B -> D, C -> A}
Since B -> D violates BCNF (the candidate keys are AB and BC), we decompose R into:
(2) R1 = (BD), F1 = {B -> D} and R2 = (ABC), F2 = {C -> A, AB -> C}
Here we can prove that R1 and R2 are a lossless decomposition: their intersection is {B}, which is a candidate key of R1 under F1, so the theorem you cited applies.
Now, since R2 is not in BCNF because of C -> A, the algorithm requires us to decompose R2 into R3 = (CA) and R4 = (CB), so the final decomposition is {R1 = (BD), R3 = (CA), R4 = (CB)}. To show that this decomposition of R is lossless, we can use another theorem, which says:
If ρ = {R1, ..., Rm} is a lossless decomposition of R<T,F> (where T is the set of attributes of R and F is a cover of the dependencies of R), and σ = {S1, S2} is a lossless decomposition of R1 with respect to π(T1)(F), then the decomposition {S1, S2, R2, ..., Rm} is lossless with respect to F.
In the theorem, π(T1)(F) is the projection of the dependencies F onto the attributes T1 of R1.
In this case, the relation we decompose is R2(ABC), with π(T2)(F) = {C -> A, AB -> C}, so the theorem can be applied because R3 and R4 are a lossless decomposition with respect to those dependencies: their intersection C determines A, so C is a key of R3 = (CA).
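The binary test used in both steps can be written as a short Haskell sketch. Attributes are assumed to be single letters, and the names losslessPair and fdsABCD are illustrative. The closure is computed under the full F; for attribute sets inside R2(ABC) this gives the same result, restricted to ABC, as the projection π(T2)(F), so one function serves for both steps.

```haskell
import Data.List (nub, sort)

-- Assumption: single-letter attributes; an attribute set is a String.
type Attrs = String
type FD = (Attrs, Attrs)

norm :: Attrs -> Attrs
norm = sort . nub

subset :: Attrs -> Attrs -> Bool
subset xs ys = all (`elem` ys) xs

inter :: Attrs -> Attrs -> Attrs
inter xs ys = filter (`elem` ys) xs

closure :: [FD] -> Attrs -> Attrs
closure fds xs
  | next == cur = cur
  | otherwise = closure fds next
  where
    cur = norm xs
    next = norm (cur ++ concat [rhs | (lhs, rhs) <- fds, lhs `subset` cur])

-- {r1, r2} is lossless if the closure of their shared attributes covers r1 or r2.
losslessPair :: [FD] -> Attrs -> Attrs -> Bool
losslessPair fds r1 r2 = r1 `subset` c || r2 `subset` c
  where
    c = closure fds (inter r1 r2)

-- F = {AB -> C, B -> D, C -> A}
fdsABCD :: [FD]
fdsABCD = [("AB", "C"), ("B", "D"), ("C", "A")]
```

Both losslessPair fdsABCD "BD" "ABC" (step 1) and losslessPair fdsABCD "AC" "BC" (step 2) are True, while the pair "BD" and "AC", which share no attribute, fails the test; that is why the two-relation check cannot be applied to the final three relations in one go.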
A decomposition is lossless if we can recover r by taking the natural join of the decomposed relations: if we get back exactly the instance r, without losing information, it's lossless.
Now, there is a rule for two decomposed relations: intersect R1 with R2, and if the resulting set of attributes is a superkey of at least one of them, the decomposition is lossless.
That's all good, but what do we do when we have more than two decomposed relations? How can we check whether the decomposition is lossless in that case?
Let's take a look at a picture for a second, in which each circle represents a relation.
[Figure: the relations R1, R2 and R3 drawn as circles]
What are we going to do with that picture? We apply the rule stated above about two relations to every pair of circles, and draw a line connecting each pair of relations that fulfils it.
For example, R1 (BD) and R3 (BC) share the attribute B, and B is a superkey of R1; because these conditions are fulfilled, we draw a line connecting R1 and R3.
Next, we look for a common attribute with R2 and see that R3 also has C, and C is a superkey of R2 (CA). Because these conditions are fulfilled, we draw a line connecting R2 and R3.
So in this "graph", we can "travel" to every circle, because the lines connect them and no circle is isolated: we can move from R1 to R3 and on to R2.
Because we got such a graph, we can say that the decomposition is lossless, so R1 ⋈ R3 ⋈ R2 = R.
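The picture can be turned into a small check. This Haskell sketch uses a slightly stricter rule than drawing lines between pairs: it starts from one relation and only attaches another relation R when the attributes R shares with everything joined so far determine either R or the whole joined part, so each step is a lossless two-relation join. The stricter rule matters: with F = {A -> B, B -> A}, the relations AB, AC and BD are each linked to AB by the pairwise rule, yet their join is lossy, and this check rejects them. A False result is inconclusive (another starting relation might succeed). Single-letter attributes are assumed and the names canAttach and sequentialLossless are hypothetical.

```haskell
import Data.List (delete, nub, sort)

-- Assumption: single-letter attributes; an attribute set is a String.
type Attrs = String
type FD = (Attrs, Attrs)

norm :: Attrs -> Attrs
norm = sort . nub

subset :: Attrs -> Attrs -> Bool
subset xs ys = all (`elem` ys) xs

inter :: Attrs -> Attrs -> Attrs
inter xs ys = filter (`elem` ys) xs

closure :: [FD] -> Attrs -> Attrs
closure fds xs
  | next == cur = cur
  | otherwise = closure fds next
  where
    cur = norm xs
    next = norm (cur ++ concat [rhs | (lhs, rhs) <- fds, lhs `subset` cur])

-- r can be joined losslessly onto the attributes joined so far.
canAttach :: [FD] -> Attrs -> Attrs -> Bool
canAttach fds joined r = r `subset` c || joined `subset` c
  where
    c = closure fds (inter r joined)

-- Greedily attach any relation that passes the test until none are left.
sequentialLossless :: [FD] -> [Attrs] -> Bool
sequentialLossless _ [] = True
sequentialLossless fds (r0 : rest) = go r0 rest
  where
    go _ [] = True
    go joined pending =
      case filter (canAttach fds joined) pending of
        [] -> False
        (r : _) -> go (norm (joined ++ r)) (delete r pending)

-- F = {AB -> C, B -> D, C -> A}
fdsABCD :: [FD]
fdsABCD = [("AB", "C"), ("B", "D"), ("C", "A")]
```

For R1 = BD, R2 = CA, R3 = BC, sequentialLossless fdsABCD ["BD","AC","BC"] attaches BC (through B) and then AC (through C), and returns True.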

Understanding the diagrams of Product and Coproduct

I am trying to understand the Product and Coproduct corresponding to the following picture:
[Figure: Product diagram; Coproduct diagram]
As I understand it, an example of a product type in Haskell is:
data Pair = P Int Double
and an example of a sum type is:
data Pair = I Int | D Double
How to understand the images in relation with Sum and Product type?
The images are from http://blog.higher-order.com/blog/2014/03/19/monoid-morphisms-products-coproducts/.
So as far as I can tell, the idea behind these diagrams is that you are given:
types A, B and Z
functions f and g of the indicated types (in the first diagram, f :: Z -> A and g :: Z -> B; in the second the arrows go "the other way", so f :: A -> Z and g :: B -> Z).
I'll concentrate on the first diagram for now, so that I don't have to say everything twice with slight variations.
Anyway, given the above, the idea is that there is a type M together with functions fst :: M -> A, snd :: M -> B, and h :: Z -> M such that, as the mathematicians say, the diagram "commutes". This simply means that, given any two points in the diagram, if you follow the arrows in any way from one to the other, the resulting functions are the same. That is, f is the same as fst . h and g is the same as snd . h.
It is easy to see that, no matter what Z is, the pair type (A, B), together with the usual Haskell functions fst and snd, satisfies this - together with an appropriate choice of h, which is:
h z = (f z, g z)
which trivially satisfies the two required identities for the diagram to commute.
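The construction can be spot-checked in a few lines of Haskell, assuming the hypothetical concrete choices Z = Int, A = Bool, B = String, f = even and g = show. The sketch checks the two commuting equations on a handful of sample inputs; that is a spot check, not a proof.

```haskell
import Control.Arrow ((&&&))

-- Hypothetical example arrows f : Z -> A and g : Z -> B with Z = Int.
f :: Int -> Bool
f = even

g :: Int -> String
g = show

-- The mediating arrow from the text: h z = (f z, g z).
h :: Int -> (Bool, String)
h z = (f z, g z)

-- The two commuting conditions, f = fst . h and g = snd . h, checked pointwise.
commutes :: [Int] -> Bool
commutes = all (\z -> (fst . h) z == f z && (snd . h) z == g z)
```

The base library already packages this pattern for functions: f &&& g from Control.Arrow builds the same pair (f z, g z).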
That's a basic explanation of the diagram. But you may be slightly confused about the role of Z in all this. That arises because what's actually being stated is rather stronger. It is that, given A and B, there is an M together with functions fst and snd such that, for any type Z and any f and g, you can construct such a diagram (which means supplying a function h :: Z -> M as well). And further, that there is only one function h which satisfies the required properties.
It's pretty clear, once you play with it and understand the various requirements, that the pair (A, B), and various other types isomorphic to it (which basically means MyPair A B where you've defined data MyPair a b = MyPair a b), are the only things which satisfy this. There are other types M that satisfy the existence part, but they fail the uniqueness part because they admit several different hs: e.g. take M to be a triple (A, B, Int), with fst and snd extracting ("projecting to" in mathematical terminology) the first and second components; then h z = (f z, g z, x) is such a function for any x :: Int that you care to name.
It's been too long since I studied mathematics, and category theory in particular, to be able to prove that the pair (A, B) is the only type that satisfies the "universal property" we're talking about - but rest assured that it is, and you really don't need to understand that (or really any of this) in order to be able to program with product and sum types in Haskell.
The second diagram is more or less the same, but with all the arrows reversed. In this case the "coproduct" or "sum" M of A and B turns out to be Either A B (or something isomorphic to it), and h :: M -> Z will be defined as:
h (Left a) = f a
h (Right b) = g b
A product (a tuple in Haskell) is an object with two projections: the functions fst and snd, which project the product onto its individual factors.
Conversely, a coproduct (Either in Haskell) is an object with two injections: the functions Left and Right, which inject the individual summands into the sum.
Note that both the product and the coproduct need to satisfy a universal property. I recommend Bartosz Milewski's introduction to the topic, along with his lecture.
One thing not communicated by these diagrams is which pieces are inputs and which are outputs. I'm going to start with products and be extra careful about which things are handed to you, and which you must cook up.
So a product says:
You give me two objects, A and B.
I give you a new object M, and two arrows fst : M -> A and snd : M -> B.
You give me an object Z and two arrows f : Z -> A and g : Z -> B.
I give you an arrow h : Z -> M that makes the diagram commute (...and this arrow is uniquely determined by the choices made so far).
We often pretend that there is a category Hask in which the objects are concrete (monomorphic) types, and the arrows are Haskell functions of the appropriate type. Let's see how the protocol above plays out, and demonstrate that Haskell's data Pair a b = P a b is a product in Hask.
You give me two objects (types), A=a and B=b.
I must produce an object (type) and two arrows (functions). I pick M=Pair a b. Then I must write functions of type Pair a b -> a (for the arrow fst : M -> A) and Pair a b -> b (for the arrow snd : M -> B). I choose:
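-- these shadow the Prelude's fst and snd (in a real module, add: import Prelude hiding (fst, snd))
fst :: Pair a b -> a
fst (P a _) = a
snd :: Pair a b -> b
snd (P _ b) = b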
You give me an object (type) Z=z and two arrows (functions); f will have type z -> a and g will have type z -> b.
I must produce a function h of type z -> Pair a b. I choose:
h = \z -> P (f z) (g z)
This h is required to make the diagram commute. This means that any two paths through the diagram that begin and end at the same object should be equal. For the diagrams given, that means we must show that it satisfies two equations:
f = fst . h
g = snd . h
I'll prove the first; the second is similar.
fst . h
= { definition of h }
fst . (\z -> P (f z) (g z))
= { definition of (.) }
\v -> fst ((\z -> P (f z) (g z)) v)
= { beta reduction }
\v -> fst (P (f v) (g v))
= { definition of fst }
\v -> f v
= { eta reduction }
f
As required.
The story for coproducts is similar, with the slight tweaks to the protocol described below:
You give me two objects, A and B.
I give you a new object W, and two arrows left : A -> W and right : B -> W.
You give me an object Z and arrows f : A -> Z and g : B -> Z.
I give you an arrow h : W -> Z that makes the diagram commute (...and this arrow is uniquely determined by the choices made so far).
It should be straightforward to adapt the discussion above about products and Pair to see how this would apply to coproducts and data Copair a b = L a | R b.
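One way to carry out that adaptation is sketched below in Haskell. The data type Copair and the names left, right and h follow the protocol above; the example arrows describeInt and describeBool are hypothetical. Uniqueness is visible in the definition: every Copair value is either an L or an R, and the equations h f g . left = f and h f g . right = g fix h on each case.

```haskell
-- The coproduct candidate W = Copair a b.
data Copair a b = L a | R b

-- The injections left : A -> W and right : B -> W.
left :: a -> Copair a b
left = L

right :: b -> Copair a b
right = R

-- The mediating arrow h : W -> Z built from f : A -> Z and g : B -> Z.
h :: (a -> z) -> (b -> z) -> Copair a b -> z
h f _ (L a) = f a
h _ g (R b) = g b

-- Hypothetical example arrows into Z = String.
describeInt :: Int -> String
describeInt n = "int " ++ show n

describeBool :: Bool -> String
describeBool b = if b then "yes" else "no"
```

For the built-in sum type, the same h is the Prelude function either f g.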

What are the Equivalent Classes?

Let A = {a, b, c, d, e, f, g, h, i} and R be a relation on A as follows:
R={(a,a), (f,c), (b,b), (c,f), (a,d), (c,c), (c,i), (d,a), (b,e), (i,c), (e,b), (d,d), (e,e), (f,f), (g,g), (h,h), (i,i),
(h,e), (a,g), (g,a), (d,g), (g,d), (b,h), (h,b), (e,h), (f,i), (i,f)}
I know it is an equivalence relation, since it is symmetric, transitive and reflexive, but I am confused about equivalence classes. What are the equivalence classes?
How can I find the equivalence classes of the relation?
As you stated, an equivalence relation is a relation which is symmetric, reflexive, and transitive. Writing a = b to mean that (a,b) is in R, the definitions of those terms are as follows:
Symmetric:
Given a,b in A, if a = b then b = a.
Reflexive:
Given a in A, a = a.
Transitive:
Given a,b,c in A, if a = b and b = c, then a = c.
Using these definitions, we can see that the relation R in your question is indeed an equivalence relation on A. This is because, for every a, b, c in A:
a = a, which is represented by (a,a) in R
if a = b, then b = a, represented by (b,a) and (a,b) both being in R
if a = b and b = c, then a = c, represented by (a,b), (b,c), and (a,c) in R.
You can check this pair by pair: all nine pairs (x,x) are present, every pair (x,y) comes with its reverse (y,x), and every chain (x,y), (y,z) is closed by (x,z). This is what makes R an equivalence relation. Once we have an equivalence relation, we can define an equivalence class as follows:
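The equivalence class of an element a is the set of all elements that are equivalent to a under the given equivalence relation. In formal notation, [a] = {x in S | x ~ a}, where ~ is the equivalence relation. For your R, [a] = [d] = [g] = {a, d, g}, [b] = [e] = [h] = {b, e, h} and [c] = [f] = [i] = {c, f, i}, so R partitions A into three equivalence classes.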
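To see the classes concretely, the following Haskell sketch encodes R as a list of pairs, checks the three properties, and groups the elements of A by their class. The names rel, carrier, classOf and classes are illustrative.

```haskell
import Data.List (nub, sort)

-- The relation R from the question, as a list of ordered pairs.
rel :: [(Char, Char)]
rel =
  [ ('a','a'), ('f','c'), ('b','b'), ('c','f'), ('a','d'), ('c','c'), ('c','i')
  , ('d','a'), ('b','e'), ('i','c'), ('e','b'), ('d','d'), ('e','e'), ('f','f')
  , ('g','g'), ('h','h'), ('i','i'), ('h','e'), ('a','g'), ('g','a'), ('d','g')
  , ('g','d'), ('b','h'), ('h','b'), ('e','h'), ('f','i'), ('i','f') ]

-- The set A.
carrier :: String
carrier = "abcdefghi"

isReflexive, isSymmetric, isTransitive :: Bool
isReflexive = all (\x -> (x, x) `elem` rel) carrier
isSymmetric = all (\(x, y) -> (y, x) `elem` rel) rel
isTransitive = and [(x, z) `elem` rel | (x, y) <- rel, (y', z) <- rel, y == y']

-- The class [x] = { y | (x, y) in R }, sorted so equal classes compare equal.
classOf :: Char -> String
classOf x = sort [y | (x', y) <- rel, x' == x]

-- The distinct classes; for an equivalence relation they partition A.
classes :: [String]
classes = nub (map classOf carrier)
```

Here classes evaluates to ["adg","beh","cfi"], i.e. the three classes {a, d, g}, {b, e, h} and {c, f, i}.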

BCNF decomposition that results in different decompositions (both faithful or faithless)?

This is question 19.10 (4) in the textbook Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke (in case anyone's curious; the answer is not included, of course, which is why I needed to ask).
And I noticed that there are two ways to make the BCNF decomposition:
We're given R = ABCD
and the functional Dependencies: A -> B, B -> C, C -> D
Key: A
We can decompose it into BCNF by using either B -> C or C -> D first.
If I decompose along B -> C first, I get R1 = AB, R2 = BC, R3 = BD (this is not faithful, since C -> D is not preserved)
If I decompose along C -> D first, I get R1 = AB, R2 = BC, R3 = CD (this is faithful)
I'm quite new to doing BCNF decomposition; is this correct? So it's possible to have multiple different BCNF decompositions depending on which FD you start with?
Thanks in advance :)
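Whether a decomposition is faithful (dependency-preserving) can be tested without computing projected FDs: for each X -> Y, grow a set Z from X by repeatedly adding (Z ∩ Ri)+ ∩ Ri for every schema Ri, and the FD is preserved if Y ends up inside Z. Below is a Haskell sketch of that commonly taught test; it assumes single-letter attributes, and the names preserves and fdsABCD are hypothetical.

```haskell
import Data.List (nub, sort)

-- Assumption: single-letter attributes; an attribute set is a String.
type Attrs = String
type FD = (Attrs, Attrs)

norm :: Attrs -> Attrs
norm = sort . nub

subset :: Attrs -> Attrs -> Bool
subset xs ys = all (`elem` ys) xs

inter :: Attrs -> Attrs -> Attrs
inter xs ys = filter (`elem` ys) xs

closure :: [FD] -> Attrs -> Attrs
closure fds xs
  | next == cur = cur
  | otherwise = closure fds next
  where
    cur = norm xs
    next = norm (cur ++ concat [rhs | (lhs, rhs) <- fds, lhs `subset` cur])

-- Is lhs -> rhs preserved? Grow Z from lhs using closures taken inside each schema.
preserves :: [FD] -> [Attrs] -> FD -> Bool
preserves fds schemas (lhs, rhs) = rhs `subset` grow (norm lhs)
  where
    grow z
      | z' == z = z
      | otherwise = grow z'
      where
        z' = norm (z ++ concat [inter (closure fds (inter z ri)) ri | ri <- schemas])

-- F = {A -> B, B -> C, C -> D}
fdsABCD :: [FD]
fdsABCD = [("A", "B"), ("B", "C"), ("C", "D")]
```

With these definitions, preserves fdsABCD ["AB","BC","BD"] ("C","D") is False, while all three FDs are preserved by ["AB","BC","CD"], matching the faithful/not faithful labels above.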

How to solve R(A,B,C,D,E) with FDs (A -> B, AB -> D, AC -> E)

How do I determine whether R(A,B,C,D,E) is in 3NF and BCNF with the following functional dependencies:
(A -> B, AB -> D, AC -> E)
If it is not in 3NF and BCNF, how should I split it?
I think that AC and ACB are keys. ABC are therefore prime attributes and DE are non-prime attributes.
Any help is appreciated!
A relation is in BCNF if, for every one of its functional dependencies X → Y, at least one of the following conditions holds:
X → Y is a trivial functional dependency (Y ⊆ X)
X is a superkey for schema R
Wikipedia
In other words, the relation is in BCNF if, for every non-trivial dependency X → Y (Y not a subset of X), the determinant X is a superkey. Furthermore, if a relation is in BCNF it is also in 3NF.
To determine whether the determinant X is a superkey, compute its attribute closure X+ and check whether it contains every attribute of the relation. I'd recommend this YouTube video if you're not familiar with attribute closure.
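A sketch of that closure test in Haskell, applied to R(A,B,C,D,E). It assumes single-letter attributes and brute-forces candidate keys over all subsets; the names violatingFDs, candidateKeys and fds5 are hypothetical.

```haskell
import Data.List (nub, sort, subsequences)

-- Assumption: single-letter attributes; an attribute set is a String.
type Attrs = String
type FD = (Attrs, Attrs)

norm :: Attrs -> Attrs
norm = sort . nub

subset :: Attrs -> Attrs -> Bool
subset xs ys = all (`elem` ys) xs

closure :: [FD] -> Attrs -> Attrs
closure fds xs
  | next == cur = cur
  | otherwise = closure fds next
  where
    cur = norm xs
    next = norm (cur ++ concat [rhs | (lhs, rhs) <- fds, lhs `subset` cur])

-- Minimal superkeys, found by brute force over all subsets of r.
candidateKeys :: Attrs -> [FD] -> [Attrs]
candidateKeys r fds =
  [k | k <- superkeys, not (any (\k' -> k' /= k && k' `subset` k) superkeys)]
  where
    superkeys = [x | x <- subsequences (norm r), norm r `subset` closure fds x]

-- FDs from the list that break BCNF: non-trivial, and the lhs closure misses part of r.
violatingFDs :: Attrs -> [FD] -> [FD]
violatingFDs r fds =
  [fd | fd@(lhs, rhs) <- fds, not (rhs `subset` lhs), not (norm r `subset` closure fds lhs)]

-- F = {A -> B, AB -> D, AC -> E}
fds5 :: [FD]
fds5 = [("A", "B"), ("AB", "D"), ("AC", "E")]
```

Here A+ = ABD, so A -> B and AB -> D both fail the superkey test and R is not in BCNF. The only candidate key is AC (ABC is a superkey but not minimal), so B and D are non-prime and these two dependencies violate 3NF as well.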