Suppose I have a Table with the columns:
person_id (primary key)
first_name
last_name
birthday
I also have a unique constraint on the combination {first_name, last_name} (I know that more people can have the same name, but I want to keep my example simple). I want to know whether this Table is in Third normal form.
My reasoning (before EDIT):
All fields can only contain atomic values, so the Table is in First normal form.
The candidate keys are 1) person_id, 2) [first_name, last_name]
The only non-prime attribute is birthday.
The attribute birthday is not functionally dependent on part of candidate key 1 (which is impossible anyway, since there is only 1 attribute in candidate key 1)
The attribute birthday is not functionally dependent on part of candidate key 2
Therefore, this Table is in Second normal form.
The attribute birthday (is/is not) non-transitively dependent on candidate key 1
The attribute birthday is non-transitively dependent on candidate key 1
The Question (before EDIT):
The question that I cannot answer is whether birthday is non-transitively dependent on person_id. Functionally, there is no relationship at all between this id number and the birthday.
Does this mean that there is a transitive dependency (birthday depends on [first_name, last_name], and each combination [first_name, last_name] maps to an id) and therefore not in 3NF?
Does this mean that there is no dependency at all, and therefore not in 3NF?
Am I misinterpreting the difficult language and is this Table in 3NF?
My reasoning (after EDIT):
If you know the person_id, you know his first name, last name and his birthday, so there are the FDs {person_id} -> {first_name}, {person_id} -> {last_name} and {person_id} -> {birthday}.
If you know a person's first and last name, you know his person_id and birthday, so there are the FDs {first_name, last_name} -> {person_id} and {first_name, last_name} -> {birthday}.
If you know a person's birthday, you don't know anything about his person_id or name, so there are no FDs from birthday to another (set of) attribute(s).
All fields can only contain atomic values, so the Table is in First normal form.
The candidate keys are 1) {person_id}, 2) {first_name, last_name}
The only non-prime attribute is {birthday}.
The attribute {birthday} is not FD on part of CK 1 (which is impossible anyway, since there is only 1 attribute in CK 1)
The attribute {birthday} is not FD on part of CK 2
Therefore, this Table is in Second normal form.
There is an FD {person_id} -> {birthday}, so the attribute {birthday} is non-transitively dependent on CK 1
There is an FD {first_name, last_name} -> {birthday}, so the attribute {birthday} is non-transitively dependent on CK 2
Therefore, this Table is in Third normal form.
There is a dependency {person_id} -> {first_name, last_name} -> {birthday}, but since there is also a direct dependency {person_id} -> {birthday}, this dependency is not transitively.
The Question (after EDIT):
I don't have a predefined set of FDs from a book, so I am not sure whether the FDs are correct. Can someone confirm this, or if they don't look right, show how I can find the FDs in this practical example?
Third reasoning (second EDIT):
FD's:
If you only know a person's person_id, you know his first name, last name and his birthday (there cannot be multiple people with the same person_id)
FD: {person_id} -> {first_name}
FD: {person_id} -> {last_name}
FD: {person_id} -> {birthday}
Supersets including {person_id} no longer need to be considered
If you only know a person's first_name, you don't know any other field of this person (there can be multiple people with the same first_name)
Not FD: {first_name} -> {person_id}
Not FD: {first_name} -> {last_name}
Not FD: {first_name} -> {birthday}
If you only know a person's last_name, you don't know any other field of this person (there can be multiple people with the same last_name)
Not FD: {last_name} -> {person_id}
Not FD: {last_name} -> {first_name}
Not FD: {last_name} -> {birthday}
If you only know a person's birthday, you don't know any other field of this person (there can be multiple people with the same birthday)
Not FD: {birthday} -> {person_id}
Not FD: {birthday} -> {first_name}
Not FD: {birthday} -> {last_name}
If you know a person's first_name and last_name, you know his person_id and his birthday (there cannot be multiple people with the same first_name and last_name)
FD: {first_name, last_name} -> {person_id}
FD: {first_name, last_name} -> {birthday}
Supersets including {first_name, last_name} no longer need to be considered
If you know a person's first_name and birthday, you don't know any other field of this person (there can be multiple people with the same first_name and birthday)
Not FD: {first_name, birthday} -> {person_id}
Not FD: {first_name, birthday} -> {last_name}
If you know a person's last_name and birthday, you don't know any other field of this person (there can be multiple people with the same last_name and birthday)
Not FD: {last_name, birthday} -> {person_id}
Not FD: {last_name, birthday} -> {first_name}
Normal forms:
All attributes can only contain single values, so the Table is in First normal form.
Looking at the FDs, there are two candidate keys: 1) {person_id}, 2) {first_name, last_name}
The only non-prime attribute is {birthday}.
The attribute {birthday} is not FD on part of CK 1 (which is impossible anyway, since there is only 1 attribute in CK 1)
The attribute {birthday} is not FD on part of CK 2 (i.e. there is no FD {first_name} -> {birthday} or FD {last_name} -> {birthday})
Therefore, this Table is in Second normal form.
S transitively determines T when there exists an X such that S -> X and X -> T and not(X -> S)
Let S = CK1 = {person_id} and T = {birthday}. The only X such that S -> X and X -> T is when X = {first_name, last_name}. However, then also X -> S holds. Therefore, S non-transitively determines T.
Let S = CK2 = {first_name, last_name} and T = {birthday}. The only X such that S -> X and X -> T is when X = {person_id}. However, then also X -> S holds. Therefore, S non-transitively determines T.
Therefore, this Table is in Third normal form.
Re your original question:
Your organization and reasoning are unsound. First give all the FDs. Eg this determines the CKs. Eg you cannot reason soundly on just giving the (alleged) CKs (which imply certain FDs) and a couple of non-FDs. Eg "non-transitively dependent" cannot be determined without knowing all the FDs. Only then can you write sound bullets & answer your numbered questions.
But let's assume that {first_name,last_name} and {person_id} really are the only CKs and that there are no FDs other than those implied by the fact that each CK determines every attribute not in it.
Functionally, there is no relationship at all between this id number
and the birthday.
I don't know what you mean by "Functionally, there is no relationship at all between". Maybe you are trying to say that {person_id} does not functionally determine {birthday}. But it does, because a CK determines all attributes not in it. Maybe you mean you don't see an application constraint between people ids and birthdays and/or a table constraint involving a table's person_id and birthday values. But there is: A given person only has one birthday at a time, and in the table a person_id only ever has one birthday at a time. This is a consequence of the meaning of and rules around "people", "birthdays", person_id and birthday. The constraint on person_id and birthday is expressed by "{person_id} -> {birthday}" and you have to know whether it is the case as part of determining the initial list of all FDs (that precedes determining CKs).
S transitively determines T when there exists an X such that S -> X and X -> T and not(X -> S). S non-transitively determines T when it doesn't transitively determine it.
Does this mean that there is a transitive dependency (birthday depends on [first_name, last_name], and each combination [first_name, last_name] maps to an id) and therefore not in 3NF?
I don't know what you are trying to say by "each combination maps to an id" let alone why it implies non-3NF. Maybe you are trying to say that taking {person_id} as S and {birthday} as T and {first_name, last_name} as X we have S -> X and X -> T so (wrongly) a non-prime attribute is transitively dependent on a CK so the relation is not in 3NF. But you did not satisfy not(X -> S).
For {person_id} as S and {birthday} as T the only possibility for X -> T has {first_name,last_name} as X but X -> S because X is a key so S -> T is not transitive.
Similarly for {first_name,last_name} as S and {birthday} as T the only possibility for X -> T has {person_id} as X but X -> S because X is a key so S -> T is not transitive.
Does this mean that there is no dependency at all, and therefore not in 3NF?
Since the relation is in 2NF and every non-prime attribute is non-transitively dependent on every CK, the relation is in 3NF.
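The non-transitivity argument above can be checked mechanically. Here is a minimal Python sketch, assuming the only FDs are the ones implied by the two CKs. The helper names (closure, determines, transitive_witnesses) are made up for illustration. One added assumption: candidate sets X that contain T are skipped, because otherwise X = {birthday} would trivially satisfy all three conditions of the definition.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def determines(lhs, rhs, fds):
    """True when lhs -> rhs follows from fds."""
    return rhs <= closure(lhs, fds)

def transitive_witnesses(s, t, all_attrs, fds):
    """All X with S -> X, X -> T and not (X -> S), skipping X that contain T."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = frozenset(combo)
            if t <= x:
                continue  # assumption: ignore the trivial case X >= T
            if (determines(s, x, fds) and determines(x, t, fds)
                    and not determines(x, s, fds)):
                found.append(x)
    return found

ATTRS = frozenset({"person_id", "first_name", "last_name", "birthday"})
CK1 = frozenset({"person_id"})
CK2 = frozenset({"first_name", "last_name"})
BIRTHDAY = frozenset({"birthday"})
# Assumed FD set: each CK determines every attribute not in it.
FDS = [(CK1, ATTRS - CK1), (CK2, ATTRS - CK2)]

WITNESSES_CK1 = transitive_witnesses(CK1, BIRTHDAY, ATTRS, FDS)
WITNESSES_CK2 = transitive_witnesses(CK2, BIRTHDAY, ATTRS, FDS)
print(WITNESSES_CK1, WITNESSES_CK2)
```

Both lists come out empty under these assumptions: every X that determines {birthday} is a superkey, so it also determines S.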
Am I misinterpreting the difficult language and is this Table in 3NF?
You didn't claim it was or wasn't, did you?
(Please edit your question to use proper technical terms.)
Re your EDIT version
(You acknowledged in comments that your last bullet was supposed to have CK 2 and that it was unsound. And that my guesses at your unclear phrasings were more or less what you meant.)
All fields can only contain atomic values, so the Table is in First normal form.
Normalization only makes sense for relational "tables", ie relations. That means unique unordered attributes ("columns") and tuples ("rows"). With one value per attribute per tuple. All relations are in 1NF:
A relational table is always in 1NF. Each column of a row has a single
value of the column's type. A non-relational database is "normalized"
to tables ie 1NF (first sense of "normalized") which gets rid of
repeating groups. Then those tables/relations are "normalized" to
higher normal forms (second sense of "normalized").
"Atomic" is not helpful: "Atomic" originally meant not a relation:
In Codd's original 1970 paper he explained that "atomic" meant not a
relation (ie not a table):
So far, we have discussed examples of relations which are defined on simple domains--domains whose elements are atomic (nondecomposable) values. Nonatomic values can be discussed within the relational framework. Thus, some domains may have relations as elements.
By the time of Codd's 1990 book The Relational Model for Database Management: Version 2:
From a database perspective, data can be classified into two types:
atomic and compound.
In the relational model there is only one
type of compound data: the relation.
And a relation is a single value so there's nothing wrong with relation-valued attributes. (Pace Codd's changing opinion on that.)
The candidate keys are 1) {person_id}, 2) {first_name, last_name}
The only non-prime attribute is {birthday}.
To normalize you must know for every subset of attributes what attributes are (non-trivially) functionally dependent on it. Although every superset of a determinant determines what it does, so that takes care of a lot of them. You skipped that step.
You cannot show that {first_name,last_name} is a CK without showing that {first_name} and {last_name} aren't CKs via what each determines. Assuming you do, you still won't have considered remaining possible determinants {first_name,birthday} and {last_name,birthday}.
You cannot show that those are the only CKs until you show that there are no other CKs. You must show for every subset of attributes whether it is a CK. Although no superset of a CK is a CK, so that takes care of a lot of them. There are algorithms.
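As a rough illustration of one such algorithm, here is a brute-force sketch in Python. It assumes the FD set is exactly the one implied by the two alleged CKs. The function names closure and candidate_keys are made up for this example. It walks the attribute subsets by increasing size, skips supersets of keys already found, and keeps each subset whose closure is the whole heading.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(all_attrs, fds):
    """Enumerate subsets smallest first and keep the minimal superkeys."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            subset = frozenset(combo)
            if any(key <= subset for key in keys):
                continue  # no superset of a CK is a CK
            if closure(subset, fds) == all_attrs:
                keys.append(subset)
    return keys

ATTRS = frozenset({"person_id", "first_name", "last_name", "birthday"})
# Assumed FDs: only those implied by the two alleged CKs.
FDS = [
    (frozenset({"person_id"}),
     frozenset({"first_name", "last_name", "birthday"})),
    (frozenset({"first_name", "last_name"}),
     frozenset({"person_id", "birthday"})),
]

KEYS = candidate_keys(ATTRS, FDS)
print([sorted(key) for key in KEYS])
# [['person_id'], ['first_name', 'last_name']]
```

This is exponential in the number of attributes, which is fine for four columns. It only confirms the CKs relative to the FDs you feed it, so the FD list still has to be justified first.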
There is an FD {person_id} -> {birthday}, so the attribute {birthday} is non-transitively dependent on CK 1
There is an FD {first_name, last_name} -> {birthday}, so the attribute {birthday} is non-transitively dependent on CK 2
Your new last two bullets are unjustified. Look at my message's definition and use of "(non) transitively dependent"; just knowing S -> T does not tell you enough. When there's a non-transitive FD S -> X -> T it must also be that S -> T; so knowing S -> T alone does not tell you about whether S transitively or non-transitively determines T. "->" does not mean "directly"; non-transitively is the only meaningful notion of "directly".
Maybe by "so" you mean "so as shown below for the first of these two cases"?
There is a dependency {person_id} -> {first_name, last_name} ->
{birthday}, but since there is also a direct dependency {person_id} ->
{birthday}, this dependency is not transitively.
See above: "direct" is a misconception. And as I said in my original answer it's since {first_name, last_name} -> {person_id} for CK1 and {person_id} -> {first_name, last_name} for CK 2.
I don't have a predefined set of FDs from a book, so I am not sure
whether the FDs are correct. Can someone confirm this, or if they
don't look right, show how I can find the FDs in this practical
example?
You have to consider every possible value the table can have due to every possible application situation that can come up and the criterion (predicate) by which you are to put rows into the table vs leave them out. You can probably think of counterexamples to putative FDs, where two rows can share the same value for a putative determinant. Eg for {first_name,birthday} and {last_name,birthday} you can expect two different people to have the same name and birthday. (You can check the last two putative FDs.)
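To make the counterexample idea concrete, here is a small Python sketch that tests a putative FD against sample rows. The rows are invented for illustration and fd_holds is a hypothetical helper. A sample can only refute an FD, never establish it: an FD that holds in these three rows may still fail for some other legal table value.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs while differing on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[name] for name in lhs)
        dependent = tuple(row[name] for name in rhs)
        # Remember the first dependent value seen for each determinant value.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True

# Invented sample data: two different people share a first name and birthday.
ROWS = [
    {"person_id": 1, "first_name": "Ann", "last_name": "Lee",
     "birthday": "1990-05-01"},
    {"person_id": 2, "first_name": "Ann", "last_name": "Kim",
     "birthday": "1990-05-01"},
    {"person_id": 3, "first_name": "Bob", "last_name": "Lee",
     "birthday": "1985-11-23"},
]

REFUTED = not fd_holds(ROWS, ["first_name", "birthday"], ["person_id"])
NOT_REFUTED = fd_holds(ROWS, ["last_name", "birthday"], ["person_id"])
print(REFUTED, NOT_REFUTED)
```

Here {first_name, birthday} -> {person_id} is refuted by rows 1 and 2. {last_name, birthday} -> {person_id} is not refuted by this particular sample, but a fourth row with another Lee born on 1990-05-01 would refute it, which is why the reasoning has to be about every possible table value.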
(Now your language is clearer. Roughly speaking your errors (still) come from not using definitions and skipping steps.)
Re your second EDIT version:
It now seems like you have probably done everything soundly. (Although I can't know for sure because you don't specifically make clear that there are no more 2-element attribute sets & there are no more attribute sets; why that pair is the set of CKs; and the 2NF/3NF "therefore"s.)
Phrasings like "If you know a person's last_name and birthday, you don't know any other field of this person" are problematic. Me: If I only know two fields, of course I don't know others; so there's never a FD? You: For a person. Me: But if I know the person then I know their first_name; so there is an FD? You: If you know first_name and birthday for one person but not who; you don't know any other field. Me: Sometimes I do know other fields; so the implication is false; so there's an FD? It turns out that "know" is a super-confusing word good to avoid. Write, "Given ... there exists ...". As you did in "(there cannot be multiple ...)".
Related
I have R(A, B, C, D, E, F) with following functional dependencies:
{AB}->{D}
{D}->{C}
{AC}->{E}
{E}->{B}
{BC}->{F}
The relation has 4 candidate keys AB, AC, AD and AE.
According to Wikipedia:
A relation is in the second normal form if it fulfils the following two requirements:
It is in first normal form.
It does not have any non-prime attribute that is functionally dependent on any proper subset of any candidate key of the relation.
Thus the above relation should be in 2nd Normal Form, as F is the only non-prime attribute of the relation and it is not dependent on any proper subset of any candidate key.
But one of the answers on Stack Overflow, https://stackoverflow.com/a/10114098/13861908, says that
A table is in 2NF if and only if, it is in 1NF and every non-prime attribute of the table is either dependent on the whole of a candidate key, or on another non-prime attribute.
According to this definition, the relation is not in 2nd Normal Form, as {BC} is neither a superkey nor contains any non-prime attribute.
Clearly, I am missing something. Please help.
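The reading based on the Wikipedia definition can be checked with a short Python sketch. It takes the five FDs and the four candidate keys as stated in the question. The helper names closure and partial_dependencies are made up for illustration. It only tests the "proper subset of a candidate key" wording and says nothing about the other quoted definition.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def partial_dependencies(keys, non_prime, fds):
    """(proper subset of a CK, non-prime attribute) pairs that break 2NF."""
    violations = []
    for key in keys:
        for size in range(1, len(key)):
            for combo in combinations(sorted(key), size):
                part = frozenset(combo)
                for attr in sorted(non_prime & closure(part, fds)):
                    violations.append((part, attr))
    return violations

ATTRS = frozenset("ABCDEF")
FDS = [(frozenset(lhs), frozenset(rhs)) for lhs, rhs in
       [("AB", "D"), ("D", "C"), ("AC", "E"), ("E", "B"), ("BC", "F")]]
# Candidate keys as stated in the question.
KEYS = [frozenset(key) for key in ("AB", "AC", "AD", "AE")]

ALL_ARE_SUPERKEYS = all(closure(key, FDS) == ATTRS for key in KEYS)
NON_PRIME = ATTRS - frozenset().union(*KEYS)
VIOLATIONS = partial_dependencies(KEYS, NON_PRIME, FDS)
print(ALL_ARE_SUPERKEYS, sorted(NON_PRIME), VIOLATIONS)
# True ['F'] []
```

Under that definition no proper subset of a candidate key determines F, so the sketch finds no 2NF violation.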
If there is a relation:
studentColor(studentNumber, favouriteColor)
And I have this dependency:
studentNumber -> favouriteColor
So this means a student can only have one favourite color but a favourite color can be chosen by many students, so I understand that there is a multi-value dependency:
favouriteColor ->> studentNumber
so this relation is only qualified to be in BCNF.
But I was wondering, what if it's:
studentNumber -> favouriteColor
favouriteColor -> studentNumber
this means that if a color is chosen by a student, it can't be picked anymore, so there isn't any multi-value dependency here.
I heard that a relation needs to satisfy these rules to be in 4NF:
It should be in the Boyce-Codd Normal Form (BCNF).
The table should not have any multi-valued dependency.
Does that mean it is in 4NF?
I would add what I was, and still am, being taught in my Database Systems and Management course.
We were told that a relation that is in BCNF is not in 4NF iff
There are at least 3 attributes
There exists A ->-> B and A->->C
B and C are independent.
Here the double arrows (->->) refer to multivalued dependencies.
Thus this directly leads to the conclusion that if a relation consisting of 2 attributes is in BCNF, then it is in 4NF.
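A small Python sketch can illustrate why the two-attribute case is harmless. It checks an MVD against a concrete set of rows using the usual "swap" condition. The function mvd_holds and all sample rows are invented for this example. When X and Y together cover every attribute, the remaining set Z is empty and the check can never fail, so the MVD is trivial.

```python
def mvd_holds(rows, attrs, x, y):
    """Return True if the MVD X ->> Y holds in this set of rows.

    rows are tuples whose positions follow attrs; Z is every attribute
    that is in neither X nor Y.
    """
    position = {name: i for i, name in enumerate(attrs)}
    z = [name for name in attrs if name not in x and name not in y]

    def project(row, names):
        return tuple(row[position[name]] for name in names)

    present = {(project(r, x), project(r, y), project(r, z)) for r in rows}
    for x1, y1, _ in present:
        for x2, _, z2 in present:
            # Rows agreeing on X must also appear "swapped": (X, Y1, Z2).
            if x1 == x2 and (x1, y1, z2) not in present:
                return False
    return True

# Two attributes: Z is empty, so the MVD holds for any rows at all.
STUDENT_COLOR = [(1, "red"), (2, "red"), (3, "blue")]
TWO_COLUMN = mvd_holds(STUDENT_COLOR, ["studentNumber", "favouriteColor"],
                       ["favouriteColor"], ["studentNumber"])

# Three attributes (hypothetical): here the MVD can genuinely fail.
HOBBIES = [(1, "chess", "en"), (1, "golf", "fr")]
HEADING = ["student", "hobby", "language"]
INCOMPLETE = mvd_holds(HOBBIES, HEADING, ["student"], ["hobby"])
COMPLETE = mvd_holds(HOBBIES + [(1, "chess", "fr"), (1, "golf", "en")],
                     HEADING, ["student"], ["hobby"])
print(TWO_COLUMN, INCOMPLETE, COMPLETE)
```

As with FDs, a check on sample rows can only refute a dependency for the schema as a whole. The point of the sketch is the structural one: with only two attributes there is no Z left to vary.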
I know that for a relation to be in 3NF it has to be in 2NF and no transitive dependencies should exist, but I couldn't answer the following question:
For a relationship to be 3NF :
A) All Attributes should depend on the primary key.
B) The relationship should only have one Foreign Key.
C) The relationship should only have one Primary Key.
D) The Relationship's Table should only have atomic values
D applies on a 3NF relationship because it's one of the conditions of 1NF and for a relationship to be 3NF it has to be 2NF and 1NF.
C is too general and doesn't apply just on 3NF but my book has chosen it as the answer!
B is not related to normalization, and A may be considered as 2NF, but they didn't say all non-key attributes, so I don't know. What is the right answer here?
By definition of "superkey", all attributes depend on a superkey. By definition of "CK" (candidate key) as a superkey containing no smaller superkey, all attributes depend on a CK. By definition of "PK" (primary key) as a distinguished CK, all attributes depend on a PK. So A is an answer.
FKs (foreign keys) are irrelevant to normalization. So B is not an answer.
By definition of "PK", a relation/schema can have at most one, which we pick from among the CKs. There can always be a PK, because there is always at least one CK. Whether you must pick a PK depends on your textbook--PKs per se have no role in normalization theory. Unfortunately "should only have one" is not clear, because it might mean exactly one & it might mean at most one. So if it agrees with your textbook, C is an answer; otherwise not. Go with your textbook.
Presentations that talk about "atomic" values require them in either the definition of "relation" or the definition of "1NF" & higher NFs. So for your textbook presumably D is an answer. But actually the notion of atomic values, although ubiquitous, is confused & also "1NF" has no single meaning. Go with your textbook.
(None of the options guarantee 3NF.)
PS Your characterization of 3NF is not correct. Only certain transitive FDs (functional dependencies) matter--3NF is when/iff 2NF & no non-CK attribute is transitively dependent on a CK. (If one's "is in 1NF" is just "is a relation" then one can drop the "2NF &".) And be sure you get the correct definition of "transitive FD"--for sets X & Y, X->Y is transitive when/iff there exists set S where X->S & S->Y & not S->X & not S=Y. Get correct definitions from a good textbook.
I have an entities Group and Person with relationships:
Group:
Group.leader -> Person (To One)
Group.looser -> Person (To One)
Group.others ->> Person (To Many)
In leader, looser and others set I could have different Person entities. Same Person could be leader in one group, looser in second and appears in others set in third group.
in Person entity I have To-Many relationship groups which should connect
Person:
Person.groups ->> Group (should be enough but warnings)
Because I can make only one inverse relationship, I will always have a warning "something should have inverse".
How to deal with relationships like this?
Or:
I have entities Cube, Plane and Line. Cube has relationships x, y, z; Plane has x and y; Line has just x. And I need to share some values between them, sometimes even mixed:
Cube:
Cube.x --> Value
Cube.y --> Value
Cube.z --> Value
Plane:
Plane.x --> Value
Plane.y --> Value
Line:
Line.x --> Value
Value:
Value.counted -->> Line.x or Line.y, Plane.x, Cube.x, y, z, SomeAnotherEntity.neededValue
Apple recommends that every relationship should have an inverse. In your case, that would mean the Person entity would have three relationships:
Person.groupsLed ->> Group (to many) // "groups where this Person is leader"
Person.groupsLost ->> Group (to many) // "groups where this person is the looser"
Person.otherGroups ->> Group (to many) // "other groups with this person as a member"
which does seem rather complicated. One alternative would be to collapse the three relationships into one (for each of Person and Group) with an intermediate entity (Ranking?):
Group.rankings ->> Ranking (to many) // "the ranking of people for this group"
Person.rankings ->> Ranking (to many) // "the ranking of this person in different groups"
In each case the inverse would be to-one:
Ranking.person -> (Person) (to one) // "the person for this ranking"
Ranking.group -> (Group) (to one) // "the group for this ranking"
You can then add an attribute to the Ranking entity to indicate the leaders/loosers/other. That could be a simple string attribute rank which takes the values "leader", "looser" or "other", or an equivalent integer enum. To manage the relationship between a Group and a Person, you add or remove Ranking objects. One downside to all this is that finding the leader or looser involves filtering the rankings, but it does give you a degree of flexibility.
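The shape of that intermediate entity can be sketched outside Core Data. The following is plain Python with dataclasses, not Core Data API: the class names mirror the entities above, and leader_of is a hypothetical helper showing the filtering step mentioned as the downside.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Person:
    name: str
    rankings: list = field(default_factory=list)  # to-many: Person.rankings

@dataclass(eq=False)
class Group:
    name: str
    rankings: list = field(default_factory=list)  # to-many: Group.rankings

@dataclass(eq=False)
class Ranking:
    person: Person  # to-one inverse of Person.rankings
    group: Group    # to-one inverse of Group.rankings
    rank: str       # "leader", "looser" or "other"

    def __post_init__(self):
        # Keep both sides of each relationship in step.
        self.person.rankings.append(self)
        self.group.rankings.append(self)

def leader_of(group):
    """Filter the group's rankings for the one marked as leader."""
    return next((r.person for r in group.rankings if r.rank == "leader"), None)

ALICE, BOB = Person("Alice"), Person("Bob")
CHESS, GOLF = Group("Chess"), Group("Golf")
Ranking(ALICE, CHESS, "leader")
Ranking(BOB, CHESS, "looser")
Ranking(ALICE, GOLF, "other")

print(leader_of(CHESS).name, leader_of(GOLF))
```

The same Person can hold a different rank in each Group, and every relationship has exactly one inverse, which is the property the warning is asking for.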
So I looked into neo4j, and I may be using it in an upcoming project since its data model might fit my project very well. I looked through the docs but I still need an answer to this question:
Can I set relationships to be one-directional?
It seems that the neo4j people like movies so let's continue with that. If I have a graph like this:
Actor A -> [:Acts in] -> Movie B
then direction is obvious, because the nodes are different types.
But I like horror movies so...
Person A -> [:wants_to_kill] -> Person B
I need this relationship to be one-directional so if I query "Who does Person A want to kill?" I get Person B, if I query "Who does Person B want to kill?" I get nothing.
Sometimes I still need relationships to be two directional
Like:
Person A <-[:has_met] -> Person B
...which is obvious.
The documentation says:
Relationships are equally well traversed in either direction. This means that there is
no need to add duplicate relationships in the opposite direction (with regard to
traversal or performance).
While relationships always have a direction, you can ignore the direction where it is
not useful in your application.
So the docs say relationships have a direction by default, and I can ignore that if I wish.
Now this is where things get complicated:
Consider the following graph (and note the arrows)
Person A <- [:wants_to_kill] -> Person B
Person B -> [:wants_to_kill] -> Person C
Person C -> [:wants_to_kill] -> Person A
If I ignore the Directions for all [:wants_to_kill] I get false results
for "Who does Person A / C want to kill?"
If I knew which ones I had to ignore, I would not do the query.
So can I somehow set relationships to be two-directional (when creating them), or should I model this with two relationships (between Person A & B)?
A relationship in Neo4j always has a direction. If a relationship type's semantics do not incorporate a direction, e.g. has_met from your example, then it's best practice to apply an arbitrary direction on creation of the relationship. Querying is then done by using the "both directions" (there is no "larger/smaller than" character) notation in cypher:
start ... match (a)-[:HAS_MET]-(b) ....
In contrast, if the semantics of a relationship do have a direction, like your wants_to_kill, you need to use two relationships to indicate that a wants to kill b and vice versa. For the example above, you need to have 4 relationships:
Person A -[:wants_to_kill]-> Person B
Person B -[:wants_to_kill]-> Person A
Person B -[:wants_to_kill]-> Person C
Person C -[:wants_to_kill]-> Person A
To find all the persons that A wants to kill:
start a=node:node_auto_index(name='A') match a-[:wants_to_kill]->victims_of_a return victims_of_a
To find all persons who want to kill A:
start a=node:node_auto_index(name='A') match murderer_of_a-[:wants_to_kill]->a return murderer_of_a
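The difference between following a direction and ignoring it can also be sketched without a database. This is plain Python, not the Neo4j API: the edge set mirrors the four wants_to_kill relationships above plus one has_met relationship stored with an arbitrary direction, and the three helper functions are made up for illustration.

```python
# Each relationship is stored once, with a direction: (start, type, end).
EDGES = {
    ("A", "wants_to_kill", "B"),
    ("B", "wants_to_kill", "A"),
    ("B", "wants_to_kill", "C"),
    ("C", "wants_to_kill", "A"),
    ("A", "has_met", "B"),  # direction chosen arbitrarily
}

def outgoing(node, rel_type):
    """Follow relationships from node, like (a)-[:T]->(x)."""
    return sorted(end for start, rel, end in EDGES
                  if start == node and rel == rel_type)

def incoming(node, rel_type):
    """Follow relationships into node, like (x)-[:T]->(a)."""
    return sorted(start for start, rel, end in EDGES
                  if end == node and rel == rel_type)

def either(node, rel_type):
    """Ignore direction, like (a)-[:T]-(x)."""
    return sorted(set(outgoing(node, rel_type)) | set(incoming(node, rel_type)))

VICTIMS_OF_A = outgoing("A", "wants_to_kill")
MURDERERS_OF_A = incoming("A", "wants_to_kill")
MET_BY_B = either("B", "has_met")
print(VICTIMS_OF_A, MURDERERS_OF_A, MET_BY_B)
# ['B'] ['B', 'C'] ['A']
```

Ignoring direction for wants_to_kill would return both B and C for A, which is the false result described in the question. For has_met it is exactly what you want.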