Decomposing tables where some attributes aren't in any minimal, non-trivial functional dependency - database-normalization

While writing a library to automatically decompose a given table, I came across some special cases where the first steps of Bernstein's synthesis for 3NF (ACM 1976) give me results I don't expect.
Take this simple case:
a b
---
1 1
2 1
1 2
2 2
By my understanding, this is the full set of functional dependencies:
{} -> {}
a -> {}
a -> a
b -> {}
b -> b
ab -> {}
ab -> a
ab -> b
ab -> ab
By eye, we can see that both attributes together form a candidate key, and there's no normalisation to be done. However, suppose we take the full FD set above, and try to apply Bernstein to decompose the table. We expect to get the same table back.
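The FD list above can be checked mechanically. The following is a minimal Python sketch, not part of any library mentioned here; the helper names `subsets`, `holds` and `all_fds` are made up for illustration. It enumerates every pair of attribute sets and keeps those that hold in the sample table.

```python
from itertools import combinations

# The sample table from the question, one dict per row.
ROWS = [
    {"a": 1, "b": 1},
    {"a": 2, "b": 1},
    {"a": 1, "b": 2},
    {"a": 2, "b": 2},
]
ATTRS = ("a", "b")


def subsets(attrs):
    """Return every subset of attrs as a tuple, smallest first."""
    return [c for r in range(len(attrs) + 1) for c in combinations(attrs, r)]


def holds(rows, lhs, rhs):
    """True if any two rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[x] for x in lhs)
        val = tuple(row[y] for y in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def all_fds(rows, attrs):
    """List every (lhs, rhs) pair of attribute sets that holds in rows."""
    return [
        (x, y)
        for x in subsets(attrs)
        for y in subsets(attrs)
        if holds(rows, x, y)
    ]


fds = all_fds(ROWS, ATTRS)
# An FD is trivial when its right-hand side is contained in its left-hand side.
nontrivial = [(x, y) for x, y in fds if not set(y) <= set(x)]
```

Under these assumptions `fds` contains the nine dependencies listed above and `nontrivial` is empty, which is what makes this table a special case for the synthesis.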
Bernstein has the following first steps:
Eliminate extraneous attributes. We simplify to
{} -> {} (repeated)
a -> a (repeated)
b -> b (repeated)
ab -> ab
Find a non-redundant covering. ab -> ab is redundant by augmentation, so we have
{} -> {}
a -> a
b -> b
I'd say the latter two are redundant as well, by reflexivity. If we keep them, the remaining steps give two non-equivalent keys, which result in two separate relations after applying the rest of Bernstein synthesis. If we don't keep them, there's nothing to work with, so the remaining steps give no tables.
Where is the problem in the above?

This appears to be solved by an addendum to Bernstein's synthesis that I came across in lecture videos from Gary Boetticher, then at UHCL: if no table in the decomposition contains one of the original table's candidate keys, then adding a table consisting of one of those candidate keys makes the decomposition lossless. In this case, after applying Bernstein's synthesis and getting no tables in return, we can add a table with both attributes a and b. This gives us back the original table, as we'd expect.
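A minimal Python sketch of that extra step, assuming the synthesis result is available as a list of attribute sets and that the candidate keys have already been computed. The function name `ensure_key_relation` is hypothetical, and choosing the first key is an arbitrary choice made here.

```python
def ensure_key_relation(relations, candidate_keys):
    """Return the relations, plus one key relation if none contains a key.

    relations: iterable of attribute sets produced by the synthesis.
    candidate_keys: non-empty iterable of candidate keys of the original table.
    """
    relations = [frozenset(r) for r in relations]
    keys = [frozenset(k) for k in candidate_keys]
    if any(key <= rel for rel in relations for key in keys):
        return relations  # some relation already contains a candidate key
    return relations + [keys[0]]  # any candidate key works; take the first


# The case from the question: the synthesis returned no relations at all.
result = ensure_key_relation([], [{"a", "b"}])
```

Here `result` holds a single relation over a and b, which is the original table.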

Related

Merging Two Forall and Sum Constraints

I'm looking for a way to merge these two constraints, and feel there is a way to use an if statement to do so. My attempt is below, but I couldn't get the constraints to behave correctly. Can someone help? I believe there is a simple way to join them and thus make the model perform more efficiently.
%Constraint 4 - Coaches must have 3 or less Juniors
constraint forall (coach in Coaches where coach != Unallocated)
(sum(coachee in Coachees where Coachee_Grade[coachee]=Junior) (Matched_Coach[coachee,coach]=1) <= 3);
%Constraint 5 - Coaches must have 4 or less Seniors
constraint forall (coach in Coaches where coach != Unallocated)
(sum(coachee in Coachees where Coachee_Grade[coachee]=Senior) (Matched_Coach[coachee,coach]=1) <= 4);
%Constraint 4 + Constraint 5 - Coaches must have 4 or less Seniors, 3 or less Juniors
constraint forall (coachee in Coachees, coach in Coaches where coach != Unallocated)
(if Coachee_Grade[coachee]=Junior then (Matched_Coach[coachee,coach]=1) <= 3)
else (Matched_Coach[coachee,coach]=1) <= 4) endif);
I would suggest explicitly recording, for every coach, whether they are coaching students at junior or senior level. This would simplify the sum constraint and possibly allow you to specify a search strategy that fixes this first, which might be helpful.
array[Coaches] of var bool: coaches_juniors;
% Ensure coaches that teach seniors will never teach juniors
constraint forall(coach in Coaches where not coaches_juniors[coach], coachee in Coachees where Coachee_Grade[coachee]=Junior) (
  Matched_Coach[coachee,coach] = 0
);
% Constrain coaches to teach at most 4 students, or 3 when coaching juniors
constraint forall (coach in Coaches where coach != Unallocated) (
  sum(coachee in Coachees)(Matched_Coach[coachee,coach]=1) + bool2int(coaches_juniors[coach]) <= 4
);
This should capture the constraints mentioned.
Furthermore, you might think about the viewpoint of your model. You have chosen a Boolean matrix for your variables, but in CP it is often worthwhile to describe your model at a higher level. (This very much looks like an integer programming model.) You might instead want to try describing it using:
A set variable for every coach that contains the trainees. (Potentially arrays of 4 instead of sets, if you need more control and know how to eliminate the symmetries.)
Or an integer variable for every student that assigns a coach.
It sounds like an LCG solver like Chuffed or OR-Tools would work well with your model, so using this higher-level view would probably get you better results.
Note that MiniZinc is built to translate high-level models for whichever solver is targeted. Generally it is best to write high-level MiniZinc and let the solver library choose the encoding of the problem that is best.
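To illustrate the "one integer variable per student" viewpoint outside MiniZinc, here is a small Python sketch that checks a fixed assignment against the limits from constraints 4 and 5. It is a checker, not a solver, and the names `within_limits`, `coach_of` and `grade_of` as well as the sample data are assumptions made for this example.

```python
from collections import Counter

# Per-grade limits taken from constraints 4 and 5 in the question.
LIMITS = {"Junior": 3, "Senior": 4}


def within_limits(coach_of, grade_of, unallocated="Unallocated"):
    """Check the per-coach grade limits for a student -> coach assignment.

    coach_of maps each student to one coach (the integer-variable view);
    grade_of maps each student to "Junior" or "Senior".
    """
    counts = Counter(
        (coach, grade_of[student])
        for student, coach in coach_of.items()
        if coach != unallocated  # the Unallocated coach has no limit
    )
    return all(n <= LIMITS[grade] for (_, grade), n in counts.items())


# Hypothetical data: four juniors, first all assigned to the same coach.
grades = {"s1": "Junior", "s2": "Junior", "s3": "Junior", "s4": "Junior"}
too_many = {student: "coach1" for student in grades}
spread = dict(too_many, s4="Unallocated")
```

With one coach per student, both limits become counts over the same mapping, which is the simplification the higher-level view is meant to give.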

Graph database: get common parent node

I want to select the first common boss of two employees in Dgraph.
My model is simple:
name: string
boss_of: uids
Let's assume the following data, where each arrow denotes the boss_of edge:
A -> B
A -> C
B -> D
C -> E
E -> F
E -> G
So, given F and D the query should return A; for F and G the result is obviously E.
I tried using allofterms but found no solution, as there may be a different number of nodes
between the co-workers and their common boss. Is it possible at all to formulate such a query?
I am trying to explore Dgraph (or graph databases in general), so maybe I am just overlooking something.
You can use K-Shortest Path Queries.
The middle one in the response is the closest common entity.
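For comparison, the expected results on the sample data can be reproduced outside the database. This is a plain Python sketch, not a Dgraph query; it assumes every employee has at most one boss, and the names `chain_of_command` and `first_common_boss` are made up for illustration.

```python
# boss_of edges from the question: boss -> direct reports.
BOSS_OF = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "E": ["F", "G"]}

# Inverted edges: employee -> boss (assumes at most one boss per employee).
BOSS = {emp: boss for boss, reports in BOSS_OF.items() for emp in reports}


def chain_of_command(employee):
    """Return the bosses above employee, nearest first."""
    chain = []
    while employee in BOSS:
        employee = BOSS[employee]
        chain.append(employee)
    return chain


def first_common_boss(x, y):
    """Return the nearest boss shared by x and y, or None if there is none."""
    bosses_of_y = set(chain_of_command(y))
    for boss in chain_of_command(x):
        if boss in bosses_of_y:
            return boss
    return None
```

Note that this sketch only considers bosses strictly above both employees, so for E and F it returns C rather than E.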

Database Normalization mistake

I'm preparing an exam and on my texts I found an example I don't understand.
On the Relation R(A,B,C,D,E,F) I got the following functional dependencies:
FD1 A,B -> C
FD2 C -> B
FD3 C,D -> E
FD4 D -> F
Now I think all the FDs are in 3NF (none is in BCNF), but the text says FD1 and FD2 are in 2NF and FD3 and FD4 are in 1NF. Where am I making a mistake (or is the text wrong)?
I found the candidate keys to be ABD and ACD.
Terminology
It is highly improper to say that “a Functional Dependency is in a certain Normal Form”, since only a relation schema can be (or not be) in a Normal Form. What can be said is that a Functional Dependency violates a certain Normal Form (so that the schema that contains it is not in that Normal Form).
Normal forms
It can be shown that a relation schema is in BCNF if every given non-trivial FD has a superkey as its determinant. Since, as you have correctly noted, the only candidate keys here are ABD and ACD, every dependency violates that Normal Form. So, the schema is not in BCNF.
To be in 3NF, a relation schema must have, for every given functional dependency, either a determinant that is a superkey or a right-hand side made up only of prime attributes, that is, attributes of some candidate key. In your example the right-hand sides B and C are prime, but E and F are not, so FD3 and FD4 violate 3NF. So, the schema is not in 3NF either.
2NF, which is only of historical interest and not particularly useful in normalization theory, is a normal form in which the relation schema has no functional dependency where a non-prime attribute depends on a proper part of a key. This again fails for FD3 and FD4, so the relation is not in 2NF either.
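These checks can be reproduced with a short Python sketch based on attribute closure. The helper names are hypothetical, and the normal-form tests look only at the four given FDs, as the discussion above does.

```python
from itertools import combinations

ATTRS = "ABCDEF"
# FD1..FD4 from the question as (determinant, dependent) pairs.
FDS = [("AB", "C"), ("C", "B"), ("CD", "E"), ("D", "F")]


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is every attribute."""
    keys = []
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


KEYS = candidate_keys(ATTRS, FDS)
PRIME = set().union(*KEYS)  # attributes that occur in some candidate key


def violates_bcnf(lhs, rhs):
    """The determinant is not a superkey."""
    return closure(lhs, FDS) != set(ATTRS)


def violates_3nf(lhs, rhs):
    """Violates BCNF and the right-hand side has a non-prime attribute."""
    return violates_bcnf(lhs, rhs) and not (set(rhs) - set(lhs)) <= PRIME


def violates_2nf(lhs, rhs):
    """A non-prime attribute depends on a proper part of some key."""
    partial = any(set(lhs) < key for key in KEYS)
    return partial and bool(set(rhs) - set(lhs) - PRIME)
```

On this input the keys come out as ABD and ACD, every FD violates BCNF, and only FD3 and FD4 violate 3NF and 2NF.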

How is every binary relation BCNF?

So, as part of my assignment, I have to prove that any relation with two attributes is in BCNF.
As per my understanding, if a relation is in third normal form and a non-key attribute functionally determines a key attribute, it violates BCNF.
Say my relation consists of two attributes A1,A2
Scenario1(only one functional dependency)
A1 -> A2 (so A1 is the key, and A2 does not FD A1 : so no violation)
same applies for
A2 -> A1
But what if
A1->A2 and A2->A1
Here the key can be either A1 or A2, and the other, non-key attribute functionally determines the key.
In each functional dependency X -> Y, X and Y are sets of attributes. This requires special attention when either X or Y is an empty set (see note 1). So, in the example with only two attributes A1 and A2, we have all the possible non-trivial dependencies:
1. {} -> {A1}
2. {} -> {A2}
3. {} -> {A1 A2}
4. {A1} -> {A2}
5. {A2} -> {A1}
All the other possible dependencies are trivial dependencies, i.e. the right set is a subset of the left set (for instance {A1} -> {}, {} -> {}, {A1} -> {A1}, {A1 A2} -> {A1}, etc.). We know that these dependencies always hold, so they are not considered in the definition of the normal forms.
1. When empty sets are excluded from dependencies, the theorem is true
Consider the dependencies 4 and 5. We have four possible cases:
1. Only 4 holds, so we have: {A1} -> {A2}
this means that {A1} is a candidate key (since from {A1} -> {A2} we can derive that {A1}->{A1 A2}), and the BCNF condition is satisfied since each dependency has a superkey as determinant;
2. Only 5 holds, so we have: {A2} -> {A1}
equivalent to the previous case, only the role of A1 and A2 is exchanged;
3. Neither 4 nor 5 hold (no functional dependencies),
so the BCNF is formally satisfied (since no dependency violates the BCNF); and, finally:
4. both hold, so we have {A1} -> {A2} and {A2} -> {A1}
also in this case the relation is in BCNF, since {A1} and {A2} are both candidate keys, since they determine all the attributes (simply put together 1 and 2 above).
2. When we allow the empty set in the functional dependencies, the theorem is not true
Consider a relation R(A1, A2), with a cover F of the dependencies
F = { {}-> {A1} }
The meaning of {} -> {A1}, by recalling the definition of functional dependency, is that the column A1 has a constant value. So we have a relation with two columns, one of which has always the same value. In this case the only candidate key is {A2}, since {A2}+ = {A1 A2}, with {A1 A2} a superkey, and the relation is not in BCNF since a non-trivial functional dependency ({} -> {A1}) has a determinant which is not a superkey.
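Both parts of the argument can be checked with a small Python sketch. It tests only the FDs in the given cover, which is enough to decide BCNF for the relation itself, and the names `closure` and `bcnf_violations` are chosen here for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attrs)
    ]


ATTRS = ("A1", "A2")
FD4 = (("A1",), ("A2",))  # dependency 4: {A1} -> {A2}
FD5 = (("A2",), ("A1",))  # dependency 5: {A2} -> {A1}

# Part 1: the four cases without empty determinants.
CASES = [[FD4], [FD5], [], [FD4, FD5]]

# Part 2: the cover F = { {} -> {A1} }, i.e. column A1 is constant.
CONSTANT_A1 = [((), ("A1",))]
```

None of the four cases in part 1 yields a violation, while `bcnf_violations(ATTRS, CONSTANT_A1)` reports {} -> {A1}, because the closure of the empty set is only {A1}.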
Note 1: in the scientific literature on the subject (as well as in books on databases) the possibility of empty sets in functional dependencies is sometimes explicitly excluded (for instance, see: Tsou, Don-Min, and Patrick C. Fischer. “Decomposition of a Relation Scheme into Boyce-Codd Normal Form.” ACM SIGACT News 14, no. 3 (July 1, 1982): 23–29. https://doi.org/10.1145/990511.990513), while sometimes it is allowed, or not discussed.
For a relation to be in BCNF, at least one of the following must hold for every functional dependency X → Y (this is the definition given on Wikipedia):
X → Y is a trivial functional dependency (Y ⊆ X).
X is a superkey for schema R.
For example, take a relation R = {A,B} with two attributes.
The only possible non-trivial FDs are {A}->{B} and {B}->{A}.
So, there are four possible cases:
1. No FD holds in R. {C.K = AB}. Since it is an all-key relation, it is always in BCNF.
2. Only A->B holds. In this case {C.K = A} and the relation satisfies BCNF.
3. Only B->A holds. In this case {C.K = B} and the relation satisfies BCNF.
4. Both A->B and B->A hold. In this case there are two keys {C.K = A and B} and
the relation satisfies BCNF.
Hence, every binary relation (a relation with two attributes) is always in BCNF!
To prove that any relation with two attributes is in BCNF:
Rule for Boyce-Codd Normal Form:
A relation R is in BCNF if R is in Third Normal Form and, for every FD, the LHS is a superkey.
So if A1 and A2 are the only attributes, with A1 -> A2 and A2 -> A1 as functional dependencies, then in both functional dependencies the left-hand side is a superkey, which satisfies the condition of BCNF.

Postgres: n:m intermediate table with type

I have a table called "Tag" which consists of an Id, Name and Description column.
Now let's say I have the tables Character (C), Movie (M), Series (S), etc.
And I want to be able to tag entries in C, M, S with multiple tags and one tag may be used for multiple entries.
So I could realize it like this:
T -> TC <- C
T -> TM <- M
T -> TS <- S
Where TC, TM, TS are the intermediate tables.
I was wondering if I could combine TC, TM, TS into one table with a type column added and still use foreign keys.
As of yet I haven't found a way to do it.
Or is this something I shouldn't be doing?
As the comments above suggested, you can't combine multiple tables into a single one and keep the foreign keys. If you want a single view of the "tag relationships", you can pull the needed information into a view. This way, you only need to write the longer query once and can then use it like a single table. Keep in mind that you can't simply insert data into such a view (there are ways to do so, but they are a little advanced).