Confused on how to go from 2NF to 3NF - database-normalization

As the title states. I have read many articles trying to wrap my head around this, but am still not sure if I am doing it right or not. I think I am getting the hang of it, but wanted to get some more opinions in case I do need to correct what I am doing. Example is below.
Thanks!
1NF: Employee_ID, Last Name, First Name, Street, City, Zip, D.O.B., Age, Degree required
2NF: Employee_ID, Last Name, First Name, D.O.B., Age, Degrees Received
     Location_ID, Street, City, Zip
3NF: Employee_ID, Last Name, First Name, Age
     Birth date, D.O.B.
     Location_ID, Street
     Zip Code, City

Summarised:
2NF is when every non-key attribute depends on the whole primary key. So imagine a CD. The CD has an ID number, which is the primary key. The name, artist and gender of the artist are dependent on the primary key. So this is correct:
Table_CD:
CD_ID  Name  Artist   Artist_Gender
1      CD1   Artist1  Male
2      CD2   Artist1  Male
3      CD3   Artist2  Female
This is correct for 2NF because every non-key attribute, including the artist, depends on the key (CD_ID). At this stage we don't check for transitive dependencies.
In 3NF, simply put, a non-key attribute may not depend on anything that is not the key. The gender of the artist depends on the artist, not directly on CD_ID, which is the key. Therefore the table is not in 3NF.
To make it 3NF you must separate out the transitive dependency, here the gender of the artist. Thus:
Table_CD:
CD_ID  Name  Artist
1      CD1   Artist1
2      CD2   Artist1
3      CD3   Artist2
Table_Artist:
Name     Gender
Artist1  Male
Artist2  Female
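The decomposition above can be tried out with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database. The lowercase table names, the column types and the foreign key are assumptions made for illustration and are not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types and the foreign key are assumptions for this sketch.
    CREATE TABLE artist (
        name   TEXT PRIMARY KEY,
        gender TEXT NOT NULL
    );
    CREATE TABLE cd (
        cd_id  INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        artist TEXT NOT NULL REFERENCES artist(name)
    );
    INSERT INTO artist VALUES ('Artist1', 'Male'), ('Artist2', 'Female');
    INSERT INTO cd VALUES (1, 'CD1', 'Artist1'), (2, 'CD2', 'Artist1'), (3, 'CD3', 'Artist2');
""")

# Joining the two 3NF tables rebuilds the rows of the original 2NF table.
rows = conn.execute("""
    SELECT cd.cd_id, cd.name, cd.artist, artist.gender
    FROM cd JOIN artist ON artist.name = cd.artist
    ORDER BY cd.cd_id
""").fetchall()

for row in rows:
    print(row)
```

Each artist's gender is now stored once, and the join shows that no information was lost by splitting the table.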

Related

How can I group many values for a variable in grafana?

I have an SQL table with columns: name and user type.
Name   type
John   student
Tom    teacher
Peter  teacher
Steve  student
I want to have a grafana variable where I can select a user "type" and the variable passes the value of all the "names" corresponding to that "type" to the dashboard.
I found this example - https://grafana.com/blog/2019/07/17/ask-us-anything-how-to-alias-dashboard-variables-in-grafana-in-sql/ which lets me alias a one to one mapping, but I am looking for a one to many mapping.

TigerGraph import: one CSV for each relation type?

I’ve just started using TigerGraph, and I saw in the import example that the edge file (friendship.csv) is separate from the vertex file (person.csv).
Does that mean that if I have, say, 10 edge types, I need (or would be better off with) 10 distinct CSV files, one for each edge type?
It's generally easier, but NOT necessary to have separate files for your edges.
For relationships where edges are one to one, you can get away with using the same file for data and relationships.
Example with 'Person' being a vertex with attribute 'name'. 'friend' is an edge connecting two 'Persons':
Person ID  Name  Friend
person_1   Bill  person_7
person_2   Sue   person_9
person_3   Ann   person_8
If your relationships are one to many, it may make sense to have a separate file for each relationship to prevent data duplication.
For example, you could either use a single file with duplication like this:
Person ID  Name  Friend
person_1   Bill  person_7
person_1   Bill  person_6
person_2   Sue   person_9
person_2   Sue   person_5
Or a separate data and edge file like here:
Data:
Person ID  Name
person_1   Bill
person_2   Sue
person_3   Ann
Edges:
Person ID  Friend
person_1   person_7
person_1   person_6
person_2   person_9
person_2   person_5
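As a rough illustration of the two layouts, the sketch below splits a combined file into a vertex list and an edge list using only Python's csv module. The comma delimiter, the header names person_id, name and friend, and the helper split_vertices_and_edges are assumptions for this example. Nothing here calls TigerGraph itself.

```python
import csv
import io

# The single-file layout with duplication (one row per friendship).
combined = """person_id,name,friend
person_1,Bill,person_7
person_1,Bill,person_6
person_2,Sue,person_9
person_2,Sue,person_5
"""

def split_vertices_and_edges(text):
    """Return (vertices, edges): unique (id, name) pairs and (id, friend) pairs."""
    vertices = {}  # a dict keeps first-seen order and drops repeated persons
    edges = []
    for row in csv.DictReader(io.StringIO(text)):
        vertices[row["person_id"]] = row["name"]
        edges.append((row["person_id"], row["friend"]))
    return list(vertices.items()), edges

vertices, edges = split_vertices_and_edges(combined)
print(vertices)
print(edges)
```

The vertex list holds each person once, while the edge list keeps one entry per friendship, which is the duplication trade-off described above.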

How to find all employees that contain all character sequences in their name?

I'm trying to write a Micronaut Data JPA finder that finds all employees that match to a list of strings.
Given the following employees (first name, last name)
John Doe
Jane Doe
Peter Pan
Silvio Wangler
Franka Potente
Frank Potter
Let's pretend a user queries the system with a query like Silvio; then the result should be
Employee: Silvio Wangler
If the query is Frank the result should be
Employee: Franka Potente
Employee: Frank Potter
And if the query is frank pot, the result should be the same (a case-insensitive LIKE)
Employee: Franka Potente
Employee: Frank Potter
I managed to write a finder for a single string as listed below
Page<Employee> findAllByLastNameIlikeOrFirstNameIlike(
String lastName, String firstName, Pageable pageable);
For a query like frank pot I would like to tokenize/split the string to a list ["frank", "pot"] and was wondering if I could implement something like
Page<Employee> findAllByLastNameIlikeOrFirstNameIlike(
List<String> lastName, List<String> firstName, Pageable pageable);
or, even better, use the JPA Criteria API. How would you implement such a search finder? Can you point me in the right direction?
The first thing I always think of is how to do it in plain SQL.
I found this one: "MySQL Like multiple values".
So I suggest the best way is to use the Criteria API and build your own dynamic query.
Have a look at the first example in https://micronaut-projects.github.io/micronaut-data/latest/guide/#repositories which gives you everything you need from Micronaut.
I also use the Criteria API when I have dynamic queries such as 'give me everything from start to end, but if start is not specified just give me everything before end ...'
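To make the "plain SQL first" idea concrete, here is a minimal sketch of the dynamic query, written in Python against an in-memory SQLite database rather than with Micronaut or the Criteria API. The table name employee, its columns and the helper search are assumptions for illustration. The same AND-of-ORs shape is what a Criteria query would need to build.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("John", "Doe"), ("Jane", "Doe"), ("Peter", "Pan"),
     ("Silvio", "Wangler"), ("Franka", "Potente"), ("Frank", "Potter")],
)

def search(query):
    """Return employees whose first or last name contains every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return []
    # One "(first LIKE ? OR last LIKE ?)" group per token, combined with AND.
    clause = " AND ".join(
        "(LOWER(first_name) LIKE ? OR LOWER(last_name) LIKE ?)" for _ in tokens
    )
    # Each token is bound twice, once per column in its group.
    params = [f"%{token}%" for token in tokens for _ in range(2)]
    sql = f"SELECT first_name, last_name FROM employee WHERE {clause} ORDER BY rowid"
    return conn.execute(sql, params).fetchall()

print(search("frank pot"))
```

Only the number of placeholder groups is built dynamically; the tokens themselves are passed as bound parameters. The sketch does not escape % or _ inside a token, and it returns no rows for an empty query; both are simplifications.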

Third Normal Form uniqueness constraint

I am trying to put my database into 3NF and I am confused about one thing. In the explanation below, I do not understand how Zip can be the primary key of the address table if the zip can occur more than once. In the Student_Detail table a recurring zip is fine, but as a primary key won't it lose its uniqueness?
Third Normal Form (3NF)
Third Normal form applies that every non-prime attribute of table must be dependent on primary key, or we can say that, there should not be the case that a non-prime attribute is determined by another non-prime attribute. So this transitive functional dependency should be removed from the table and also the table must be in Second Normal form. For example, consider a table with following fields.
Student_Detail Table :
Student_id - Student_name - DOB - Street - city - State - Zip
In this table Student_id is Primary key, but street, city and state depends upon Zip. The dependency between zip and other fields is called transitive dependency. Hence to apply 3NF, we need to move the street, city and state to new table, with Zip as primary key.
New Student_Detail Table :
Student_id - Student_name - DOB - Zip
Address Table :
Zip - Street - city - state
The advantages of removing the transitive dependency are:
Amount of data duplication is reduced.
Data integrity achieved.
Example:
http://www.studytonight.com/dbms/database-normalization.php
I'm assuming this is your question:
"I do not understand how Zip can be the primary key of the address table if the zip can occur more than once."
The reason you don't understand is simply that Zip is a bad example: in real data one zip code covers many streets, so Zip does not actually determine Street.
The explanation itself is correct. If you can infer any "non-prime" attribute based upon another "non-prime" attribute, you have what is called a "transitive dependency". You pull those attributes out into a different table and in their place you insert an FK reference.
Within the Address table, Zip cannot appear more than once, because that attribute is the PK. I believe it is just a bad example, although the explanation is correct. Try to analyse it with different subjects.
Check if this example helps you in any way.
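The difference between Zip as a foreign key and Zip as a primary key can be seen in a short sketch. It uses Python's sqlite3 module, and all sample values (names, dates, the zip 12345, the street names) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (
        zip    TEXT PRIMARY KEY,
        street TEXT, city TEXT, state TEXT
    );
    CREATE TABLE student_detail (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT,
        dob          TEXT,
        zip          TEXT REFERENCES address(zip)
    );
""")

conn.execute("INSERT INTO address VALUES ('12345', 'Main St', 'Springfield', 'XX')")
# Two students may share a zip: in student_detail it is only a foreign key.
conn.executemany(
    "INSERT INTO student_detail VALUES (?, ?, ?, ?)",
    [(1, "Ann", "2000-01-01", "12345"), (2, "Bob", "2000-02-02", "12345")],
)

try:
    # A second address row with the same zip violates the primary key.
    conn.execute("INSERT INTO address VALUES ('12345', 'Oak Ave', 'Springfield', 'XX')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

The rejected insert also shows why the example is a poor one: a second street in the same zip code cannot be stored once Zip is the key of the address table.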

Table with unique identifier in Third normal form?

Suppose, I have a Table with the columns:
person_id (primary key)
first_name
last_name
birthday
I also have a unique constraint on the combination {first_name, last_name} (I know that more people can have the same name, but I want to keep my example simple). I want to know whether this Table is in Third normal form.
My reasoning (before EDIT):
All fields can only contain atomic values, so the Table is in First normal form.
The candidate keys are 1) person_id, 2) [first_name, last_name]
The only non-prime attribute is birthday.
The attribute birthday is not functionally dependent on part of candidate key 1 (which is impossible anyway, since there is only 1 attribute in candidate key 1)
The attribute birthday is not functionally dependent on part of candidate key 2
Therefore, this Table is in Second normal form.
The attribute birthday (is/is not) non-transitively dependent on candidate key 1
The attribute birthday is non-transitively dependent on candidate key 1
The Question (before EDIT):
The question that I cannot answer is if birthday is non-transitively dependent on person_id. Functionally, there is no relationship at all between this id number and the birthday.
Does this mean that there is a transitive dependency (birthday depends on [first_name, last_name], and each combination [first_name, last_name] maps to an id) and therefore not in 3NF?
Does this mean that there is no dependency at all, and therefore not in 3NF?
Am I misinterpreting the difficult language and is this Table in 3NF?
My reasoning (after EDIT):
If you know the person_id, you know his first name, last name and his birthday, so there are the FDs {person_id} -> {first_name}, {person_id} -> {last_name} and {person_id} -> {birthday}.
If you know a person's first and last name, you know his person_id and birthday, so there are the FDs {first_name, last_name} -> {person_id} and {first_name, last_name} -> {birthday}.
If you know a person's birthday, you don't know anything about his person_id or name, so there are no FDs from birthday to another (set of) attribute(s).
All fields can only contain atomic values, so the Table is in First normal form.
The candidate keys are 1) {person_id}, 2) {first_name, last_name}
The only non-prime attribute is {birthday}.
The attribute {birthday} is not FD on part of CK 1 (which is impossible anyway, since there is only 1 attribute in CK 1)
The attribute {birthday} is not FD on part of CK 2
Therefore, this Table is in Second normal form.
There is an FD {person_id} -> {birthday}, so the attribute {birthday} is non-transitively dependent on CK 1
There is an FD {first_name, last_name} -> {birthday}, so the attribute {birthday} is non-transitively dependent on CK 2
Therefore, this Table is in Third normal form.
There is a dependency {person_id} -> {first_name, last_name} -> {birthday}, but since there is also a direct dependency {person_id} -> {birthday}, this dependency is not transitively.
The Question (after EDIT):
I don't have a predefined set of FDs from a book, so I am not sure whether the FDs are correct. Can someone confirm this, or if they don't look right, show how I can find the FDs in this practical example?
Third reasoning (second EDIT):
FD's:
If you only know a person's person_id, you know his first name, last name and his birthday (there cannot be multiple people with the same person_id)
FD: {person_id} -> {first_name}
FD: {person_id} -> {last_name}
FD: {person_id} -> {birthday}
Supersets including {person_id} no longer need to be considered
If you only know a person's first_name, you don't know any other field of this person (there can be multiple people with the same first_name)
Not FD: {first_name} -> {person_id}
Not FD: {first_name} -> {last_name}
Not FD: {first_name} -> {birthday}
If you only know a person's last_name, you don't know any other field of this person (there can be multiple people with the same last_name)
Not FD: {last_name} -> {person_id}
Not FD: {last_name} -> {first_name}
Not FD: {last_name} -> {birthday}
If you only know a person's birthday, you don't know any other field of this person (there can be multiple people with the same birthday)
Not FD: {birthday} -> {person_id}
Not FD: {birthday} -> {first_name}
Not FD: {birthday} -> {last_name}
If you know a person's first_name and last_name, you know his person_id and his birthday (there cannot be multiple people with the same first_name and last_name)
FD: {first_name, last_name} -> {person_id}
FD: {first_name, last_name} -> {birthday}
Supersets including {first_name, last_name} no longer need to be considered
If you know a person's first_name and birthday, you don't know any other field of this person (there can be multiple people with the same first_name and birthday)
Not FD: {first_name, birthday} -> {person_id}
Not FD: {first_name, birthday} -> {last_name}
If you know a person's last_name and birthday, you don't know any other field of this person (there can be multiple people with the same last_name and birthday)
Not FD: {last_name, birthday} -> {person_id}
Not FD: {last_name, birthday} -> {first_name}
Normal forms:
All attributes can only contain single values, so the Table is in First normal form.
Looking at the FDs, there are two candidate keys: 1) {person_id}, 2) {first_name, last_name}
The only non-prime attribute is {birthday}.
The attribute {birthday} is not FD on part of CK 1 (which is impossible anyway, since there is only 1 attribute in CK 1)
The attribute {birthday} is not FD on part of CK 2 (i.e. there is no FD {first_name} -> {birthday} or FD {last_name} -> {birthday})
Therefore, this Table is in Second normal form.
S transitively determines T when there exists an X such that S -> X and X -> T and not(X -> S)
Let S = CK1 = {person_id} and T = {birthday}. The only X such that S -> X and X -> T is when X = {first_name, last_name}. However, then also X -> S holds. Therefore, S non-transitively determines T.
Let S = CK2 = {first_name, last_name} and T = {birthday}. The only X such that S -> X and X -> T is when X = {person_id}. However, then also X -> S holds. Therefore, S non-transitively determines T.
Therefore, this Table is in Third normal form.
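The transitivity check in the bullets above can be sketched mechanically. The code below is a brute-force illustration in Python, not a standard library routine: it computes attribute closures from the listed FDs and then searches every attribute set X for the pattern S -> X, X -> T, not(X -> S). One assumption is added that the quoted definition leaves implicit: X must not contain T, otherwise the trivial choice X = T would always qualify.

```python
from itertools import combinations

ATTRS = ("person_id", "first_name", "last_name", "birthday")
# The FDs marked "FD:" in the list above, grouped by determinant.
FDS = [
    ({"person_id"}, {"first_name", "last_name", "birthday"}),
    ({"first_name", "last_name"}, {"person_id", "birthday"}),
]

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def transitively_determines(s, t, attrs, fds):
    """True if some X has S -> X, X -> T and not (X -> S), with T outside X."""
    for size in range(1, len(attrs) + 1):
        for x in map(set, combinations(attrs, size)):
            if t in x:
                continue  # assumption: skip the trivial case where X contains T
            if x <= closure(s, fds) and t in closure(x, fds) and not s <= closure(x, fds):
                return True
    return False

for ck in ({"person_id"}, {"first_name", "last_name"}):
    print(sorted(ck), transitively_determines(ck, "birthday", ATTRS, FDS))
```

Both candidate keys report no transitive dependency for birthday, which matches the conclusion reached by hand.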
Re your original question:
Your organization and reasoning are unsound. First give all the FDs. E.g. this determines the CKs. E.g. you cannot reason soundly by just giving the (alleged) CKs (which imply certain FDs) and a couple of non-FDs. E.g. "non-transitively dependent" cannot be determined without knowing all the FDs. Only then can you write sound bullets & answer your numbered questions.
But let's assume that {first_name,last_name} and {person_id} really are the only CKs and that there are no FDs other than those implied by the fact that each CK determines every attribute not in it.
"Functionally, there is no relationship at all between this id number and the birthday."
I don't know what you mean by "Functionally, there is no relationship at all between". Maybe you are trying to say that {person_id} does not functionally determine {birthday}. But it does, because a CK determines all attributes not in it. Maybe you mean you don't see an application constraint between people ids and birthdays and/or a table constraint involving a table's person_id and birthday values. But there is: A given person only has one birthday at a time, and in the table a person_id only ever has one birthday at a time. This is a consequence of the meaning of and rules around "people", "birthdays", person_id and birthday. The constraint on person_id and birthday is expressed by "{person_id} -> {birthday}" and you have to know whether it is the case as part of determining the initial list of all FDs (that precedes determining CKs).
S transitively determines T when there exists an X such that S -> X and X -> T and not(X -> S). S non-transitively determines T when it doesn't transitively determine it.
"Does this mean that there is a transitive dependency (birthday depends on [first_name, last_name], and each combination [first_name, last_name] maps to an id) and therefore not in 3NF?"
I don't know what you are trying to say by "each combination maps to an id" let alone why it implies non-3NF. Maybe you are trying to say that taking {person_id} as S and {birthday} as T and {first_name, last_name} as X we have S -> X and X -> T so (wrongly) a non-prime attribute is transitively dependent on a CK so the relation is not in 3NF. But you did not satisfy not(X -> S).
For {person_id} as S and {birthday} as T the only possibility for X -> T has {first_name,last_name} as X but X -> S because X is a key so S -> T is not transitive.
Similarly for {first_name,last_name} as S and {birthday} as T the only possibility for X -> T has {person_id} as X but X -> S because X is a key so S -> T is not transitive.
"Does this mean that there is no dependency at all, and therefore not in 3NF?"
Since the relation is in 2NF and every non-prime attribute is non-transitively dependent on every CK, the relation is in 3NF.
"Am I misinterpreting the difficult language and is this Table in 3NF?"
You didn't claim it was or wasn't, did you?
(Please edit your question to use proper technical terms.)
Re your EDIT version
(You acknowledged in comments that your last bullet was supposed to have CK 2 and that it was unsound. And that my guesses at your unclear phrasings were more or less what you meant.)
"All fields can only contain atomic values, so the Table is in First normal form."
Normalization only makes sense for relational "tables", ie relations. That means unique unordered attributes ("columns") and tuples ("rows"), with one value per attribute per tuple. All relations are in 1NF:
A relational table is always in 1NF. Each column of a row has a single value of the column's type. A non-relational database is "normalized" to tables ie 1NF (first sense of "normalized") which gets rid of repeating groups. Then those tables/relations are "normalized" to higher normal forms (second sense of "normalized").
"Atomic" is not helpful: "atomic" originally meant not a relation:
In Codd's original 1970 paper he explained that "atomic" meant not a relation (ie not a table):
"So far, we have discussed examples of relations which are defined on simple domains--domains whose elements are atomic (nondecomposable) values. Nonatomic values can be discussed within the relational framework. Thus, some domains may have relations as elements."
By the time of Codd's 1990 book The Relational Model for Database Management: Version 2:
"From a database perspective, data can be classified into two types: atomic and compound. In the relational model there is only one type of compound data: the relation."
And a relation is a single value so there's nothing wrong with relation-valued attributes. (Pace Codd's changing opinion on that.)
"The candidate keys are 1) {person_id}, 2) {first_name, last_name}"
"The only non-prime attribute is {birthday}."
To normalize you must know, for every subset of attributes, what attributes are (non-trivially) functionally dependent on it. Every superset of a determinant also determines whatever that determinant does, which takes care of a lot of them. You skipped that step.
You cannot show that {first_name,last_name} is a CK without showing that {first_name} and {last_name} aren't CKs via what each determines. Assuming you do, you still won't have considered remaining possible determinants {first_name,birthday} and {last_name,birthday}.
You cannot show that those are the only CKs until you show that there are no other CKs. You must show for every subset of attributes whether it is a CK. No proper superset of a CK is a CK, which takes care of a lot of them. There are algorithms.
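One such algorithm, in its simplest brute-force form, is sketched below in Python: enumerate attribute subsets from smallest to largest, skip any superset of a key already found, and keep a subset when its closure under the FDs is the whole attribute set. The function names and the encoding of FDs as (determinant, dependents) pairs are assumptions for illustration. The search is exponential in the number of attributes, so it only suits small tables like this one.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the full attribute set."""
    keys = []
    for size in range(1, len(attrs) + 1):  # smallest subsets first
        for subset in map(frozenset, combinations(attrs, size)):
            if any(key <= subset for key in keys):
                continue  # a superset of a known CK is never a CK
            if closure(subset, fds) == set(attrs):
                keys.append(subset)
    return keys

attrs = ("person_id", "first_name", "last_name", "birthday")
fds = [
    ({"person_id"}, {"first_name", "last_name", "birthday"}),
    ({"first_name", "last_name"}, {"person_id", "birthday"}),
]
for key in candidate_keys(attrs, fds):
    print(sorted(key))
```

Because every subset is examined, the result also shows that no other CKs exist, which is the part that cannot be established by naming two keys and stopping.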
"There is an FD {person_id} -> {birthday}, so the attribute {birthday} is non-transitively dependent on CK 1"
"There is an FD {first_name, last_name} -> {birthday}, so the attribute {birthday} is non-transitively dependent on CK 2"
Your new last two bullets are unjustified. Look at my message's definition and use of "(non) transitively dependent"; just knowing S -> T does not tell you enough. Whenever S -> X and X -> T hold, S -> T must hold too; so knowing S -> T alone does not tell you whether S transitively or non-transitively determines T. "->" does not mean "directly"; non-transitively is the only meaningful notion of "directly".
Maybe by "so" you mean "so as shown below for the first of these two cases"?
"There is a dependency {person_id} -> {first_name, last_name} -> {birthday}, but since there is also a direct dependency {person_id} -> {birthday}, this dependency is not transitively."
See above: "direct" is a misconception. And as I said in my original answer, it's because {first_name, last_name} -> {person_id} for CK 1 and {person_id} -> {first_name, last_name} for CK 2.
"I don't have a predefined set of FDs from a book, so I am not sure whether the FDs are correct. Can someone confirm this, or if they don't look right, show how I can find the FDs in this practical example?"
You have to consider every possible value the table can have due to every possible application situation that can come up and the criterion (predicate) by which you are to put rows into the table vs leave them out. You can probably think of counterexamples to putative FDs, where two rows can share the same value for a putative determinant. Eg for {first_name,birthday} and {last_name,birthday} you can expect two different people to have the same name and birthday. (You can check the last two putative FDs.)
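The counterexample idea can be sketched in a few lines of Python. The helper violates_fd and the two sample rows are hypothetical, invented for illustration. A sample can only refute a putative FD; it can never prove one, since the FD must hold for every value the table could legally take.

```python
def violates_fd(rows, determinant, dependent):
    """Return two rows that agree on determinant but differ on dependent, or None."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        if key in seen and seen[key][0] != value:
            return seen[key][1], row
        seen.setdefault(key, (value, row))
    return None

# Hypothetical sample rows: two people sharing a first name and a birthday.
rows = [
    {"person_id": 1, "first_name": "Ann", "last_name": "Lee", "birthday": "1990-05-01"},
    {"person_id": 2, "first_name": "Ann", "last_name": "Kim", "birthday": "1990-05-01"},
]

# Refuted: the two rows share first_name and birthday but differ in last_name.
print(violates_fd(rows, ["first_name", "birthday"], ["last_name"]))
# Not refuted by this sample (which does not prove that the FD holds).
print(violates_fd(rows, ["person_id"], ["birthday"]))
```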
(Now your language is clearer. Roughly speaking your errors (still) come from not using definitions and skipping steps.)
Re your second EDIT version:
It now seems like you have probably done everything soundly. (Although I can't know for sure because you don't specifically make clear that there are no more 2-element attribute sets & there are no more attribute sets; why that pair is the set of CKs; and the 2NF/3NF "therefore"s.)
Phrasings like "If you know a person's last_name and birthday, you don't know any other field of this person" are problematic. Me: If I only know two fields, of course I don't know others; so there's never an FD? You: For a person. Me: But if I know the person then I know their first_name; so there is an FD? You: If you know first_name and birthday for one person but not who, you don't know any other field. Me: Sometimes I do know other fields; so the implication is false; so there's an FD? It turns out that "know" is a super-confusing word that is good to avoid. Write, "Given ... there exists ...". As you did in "(there cannot be multiple ...)".