3NF Normalization Algorithm Step: Relation with SuperKey - database-normalization

I was wondering about the last step of the 3NF normalization algorithm where it states:
4) If none of the relations obtained in previous steps contains a superkey of R, then add a new relation whose schema is a key for R.
My specific question is, what happens with the semantics of that relation? Why have only one relation and not many single-attribute relations (one for each attribute of the key)?
I found that in some examples that extra relation makes sense, but in others it seems to "mix" attributes that aren't related...

The last step of the 3NF normalization algorithm is needed to guarantee that the decomposition generated by the algorithm is lossless.
In fact, there is a theorem stating that if a decomposition preserves the dependencies, and the schema of one of the decomposed relations is a superkey of the original relation, then the decomposition is also lossless.
The previous steps of the algorithm guarantee that every functional dependency is present in some decomposed relation. Adding a relation that contains a key, when no key is already contained in one of the other relations, guarantees that the algorithm produces a decomposition that preserves both data and dependencies.
Added
Here is a simple example that shows the need for this last step. Suppose that an instance of the relation R(A, B, C, D), with A->B, C->D (and key A,C), is:
R
A | B | C | D
-------------
1 | 2 | 2 | 3
1 | 2 | 3 | 4
2 | 3 | 2 | 3
The decomposition into R1(A,B), R2(C,D) is in third normal form but is lossy (additive). In fact, projecting that instance on the decomposition produces:
R1        R2
A | B     C | D
-----     -----
1 | 2     2 | 3
2 | 3     3 | 4
The additive property of this decomposition is clear if we perform a natural join of the decomposed relations, which produces an instance different from the original one:
R1 ⨝ R2 =
A | B | C | D
-------------
1 | 2 | 2 | 3
1 | 2 | 3 | 4
2 | 3 | 2 | 3
2 | 3 | 3 | 4
The situation does not change if you decompose R into R1(A,B), R2(C,D), R3(A), R4(C): in fact, recomposing it with R1 ⨝ R2 ⨝ R3 ⨝ R4 produces exactly the same relation as above with 4 rows:
R1        R2        R3    R4    R1 ⨝ R2 ⨝ R3 ⨝ R4 =
A | B     C | D     A     C     A | B | C | D
-----     -----     --    --    -------------
1 | 2     2 | 3     1     2     1 | 2 | 2 | 3
2 | 3     3 | 4     2     3     1 | 2 | 3 | 4
                                2 | 3 | 2 | 3
                                2 | 3 | 3 | 4
Instead, the situation changes completely with the decomposition into R1(A,B), R2(C,D), R3(A,C). When you recompose with the natural join you obtain the original instance:
R1        R2        R3        R1 ⨝ R2 ⨝ R3 =
A | B     C | D     A | C     A | B | C | D
-----     -----     -----     -------------
1 | 2     2 | 3     1 | 2     1 | 2 | 2 | 3
2 | 3     3 | 4     1 | 3     1 | 2 | 3 | 4
                    2 | 2     2 | 3 | 2 | 3
So, in summary, in the first two cases you have a loss of information (the original instance is not obtained), while in the third case you have a 3NF and a lossless (nonadditive) decomposition.
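The three cases above can be checked mechanically. The following is a minimal Python sketch, not part of the normalization algorithm itself: the helper names `project`, `natural_join` and `rejoin` are made up for this illustration, and relations are modelled simply as sets of (attribute, value) tuples.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, removing duplicates."""
    return {tuple(sorted((a, r[a]) for a in attrs)) for r in rows}

def natural_join(left, right):
    """Natural join of two relations stored as sets of (attr, value) tuples."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # Rows combine only if they agree on every shared attribute.
            if all(ld[k] == rd[k] for k in ld.keys() & rd.keys()):
                out.add(tuple(sorted({**ld, **rd}.items())))
    return out

def rejoin(rows, schemas):
    """Project rows onto each schema, then join all projections back."""
    parts = [project(rows, s) for s in schemas]
    result = parts[0]
    for part in parts[1:]:
        result = natural_join(result, part)
    return result

# The instance of R(A, B, C, D) used in the example above.
R = [
    {"A": 1, "B": 2, "C": 2, "D": 3},
    {"A": 1, "B": 2, "C": 3, "D": 4},
    {"A": 2, "B": 3, "C": 2, "D": 3},
]
original = project(R, "ABCD")

lossy = rejoin(R, ["AB", "CD"])                   # R1, R2
with_singles = rejoin(R, ["AB", "CD", "A", "C"])  # R1, R2, R3(A), R4(C)
with_key = rejoin(R, ["AB", "CD", "AC"])          # R1, R2, R3(A, C)

print(len(original), len(lossy), len(with_singles), len(with_key))
print(with_key == original)
```

Only the decomposition that includes the key relation R3(A, C) gives back the original three rows; the other two produce the extra fourth row.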


SQL aggregate sum produces unexpected output

I don't understand how sum works.
For a PostgreSQL table in dbeaver:
a | b | c | d
-------------
1 | 2 | 3 | 2
1 | 2 | 4 | 3
2 | 1 | 3 | 2
2 | 1 | 4 | 2
3 | 2 | 4 | 2
the query
select a, b, c, d, sum(c) as sum_c, sum(d) as sum_d from abc a group by a, b, c, d
produces
a | b | c | d | sum_c | sum_d
-----------------------------
1 | 2 | 3 | 2 | 3     | 2
1 | 2 | 4 | 3 | 4     | 3
2 | 1 | 3 | 2 | 3     | 2
2 | 1 | 4 | 2 | 4     | 2
3 | 2 | 4 | 2 | 4     | 2
and I don't understand why: I expected sum_c would be 18 in each row, which is the sum of values in c, and sum_d would be 11 for the same reason.
Why do sum_c and sum_d just copy the values from c and d in each row?
You can't get the result that you want with group by.
When you aggregate with group by, you create one group for each distinct combination of values in the columns listed after group by, and you get the aggregated results for each of these groups.
For your sample data, one group is 1,2,3,2, and for this combination of values you get the sum of c, which is 3 since that group contains only 1 row (with c=3).
Use SUM() window function:
SELECT a, b, c, d,
SUM(c) OVER () sum_c,
SUM(d) OVER () sum_d
FROM abc
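
To make the difference concrete, here is a small Python sketch that mimics both behaviours on the sample data. It is only an emulation for illustration (no database involved); the variable names `groups`, `grouped` and `windowed` are made up for this example.

```python
from collections import defaultdict

# The sample rows (a, b, c, d) from the question.
rows = [
    (1, 2, 3, 2),
    (1, 2, 4, 3),
    (2, 1, 3, 2),
    (2, 1, 4, 2),
    (3, 2, 4, 2),
]

# GROUP BY a, b, c, d: each distinct (a, b, c, d) combination is its own group.
groups = defaultdict(list)
for row in rows:
    groups[row].append(row)

# Every group holds a single row here, so the per-group sums just repeat c and d.
grouped = [
    key + (sum(r[2] for r in members), sum(r[3] for r in members))
    for key, members in groups.items()
]

# SUM(...) OVER (): one window spanning all rows, attached to every row.
total_c = sum(r[2] for r in rows)
total_d = sum(r[3] for r in rows)
windowed = [row + (total_c, total_d) for row in rows]

for row in grouped:
    print("group by:", row)
for row in windowed:
    print("window:  ", row)
```

The grouped rows reproduce the output from the question, while every windowed row carries the totals 18 and 11.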

separating the records in a kdb table

There is a table with a column that I would like to break into multiple records. For example
q)tab:([]a:1 2 3;b:(`a;`$"b c";`d);c:2 3 4)
q)tab
a b   c
-------
1 a   2
2 b c 3
3 d   4
There is a space between b and c in the second entry of column b, I would like the table to become
a b c
-----
1 a 2
2 b 3
2 c 3
3 d 4
I tried
" " string vs exec b from tab
but didn't work.
Any idea?
Since b is the column with multiple entries per row, you can count each value and expand the corresponding row entries accordingly. Then ungroup like Terry mentioned should work.
q)t:([]a:1 2 3;b:(`a;`b`c;`d);c:2 3 4)
q)![t;();0b;{x!(enlist({(count each x)#'y};`b)),/:x}cols t]
a   b    c
------------
,1  ,`a  ,2
2 2 `b`c 3 3
,3  ,`d  ,4
q)ungroup ![t;();0b;{x!(enlist({(count each x)#'y};`b)),/:x}cols t]
a b c
-----
1 a 2
2 b 3
2 c 3
3 d 4
EDIT: Realised after your comment that the input is different. I think this is what you want.
q)t:([]a:1 2 3;b:(`a;`$"b c";`d);c:2 3 4)
q)ungroup update`$" "vs'string b from t
a b c
-----
1 a 2
2 b 3
2 c 3
3 d 4
You would normally do this using ungroup:
q)ungroup([]a:1 2 3;b:((),`a;`b`c;(),`d);c:2 3 4)
a b c
-----
1 a 2
2 b 3
2 c 3
3 d 4
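For readers who do not use q, the same split-then-ungroup idea can be sketched in Python. This is only an analogue of the approach above, not q code; the table is modelled as a list of dicts and the names `table` and `expanded` are made up for this example.

```python
# The table from the question, with the space-separated entry in column b.
table = [
    {"a": 1, "b": "a", "c": 2},
    {"a": 2, "b": "b c", "c": 3},
    {"a": 3, "b": "d", "c": 4},
]

# Split b on spaces and emit one output row per piece,
# copying the other columns unchanged (the "ungroup" step).
expanded = [
    {**row, "b": part}
    for row in table
    for part in row["b"].split(" ")
]

for row in expanded:
    print(row["a"], row["b"], row["c"])
```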

Tableau Frequency Distribution - multiple groups

I have data which includes 2 columns, ages, and groups similar to
B 1
B 1
B 1
B 4
B 5
B 8
D 2
D 2
D 3
D 3
D 3
D 4
D 6
D 7
D 9
D 9
In Tableau, I wish to plot a line for each group B and D, % number of records(observations) (of group in group), against the age range, 1 to 9.
So B at age 1 is 3/6*100, B at age 5 is 1/6*100, and D at age 3 is 3/10*100.
Any help or pointers would be really appreciated.
Enda
Drag the 'Age' measure to Columns.
Drag the 'Group' dimension to 'Color'.
Drag the Tableau default measure 'Number of Records' to Rows. Set its aggregation to 'sum', add the quick table calculation 'Percent of Total', and change its 'Compute Using' to 'Age'.
That's it! Hopefully this is what you were trying to do.
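As a sanity check on the numbers the chart should show, here is a short Python sketch of the arithmetic from the question: the count of each (group, age) pair divided by the size of that group. It does not describe how Tableau computes the table calculation internally; the names `records`, `group_sizes` and `percent` are made up for this example.

```python
from collections import Counter

# The (group, age) records from the question.
records = [
    ("B", 1), ("B", 1), ("B", 1), ("B", 4), ("B", 5), ("B", 8),
    ("D", 2), ("D", 2), ("D", 3), ("D", 3), ("D", 3), ("D", 4),
    ("D", 6), ("D", 7), ("D", 9), ("D", 9),
]

group_sizes = Counter(group for group, _ in records)  # records per group
pair_counts = Counter(records)                        # records per (group, age)

# Percent of the group's records that fall on each age.
percent = {
    (group, age): 100 * n / group_sizes[group]
    for (group, age), n in pair_counts.items()
}

for (group, age), value in sorted(percent.items()):
    print(group, age, round(value, 2))
```

The values for B at age 1, B at age 5 and D at age 3 match the three figures given in the question.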

Create a Boolean column displaying comparison between 2 other columns in kdb+

I'm currently learning kdb+/q.
I have a table of data. I want to take 2 columns of data (just numbers), compare them and create a new Boolean column that will display whether the value in column 1 is greater than or equal to the value in column 2.
I am comfortable using the update command to create a new column, but I don't know how to ensure that it is Boolean, how to compare the values and a method to display the "greater-than-or-equal-to-ness" - is it possible to do a simple Y/N output for that?
Thanks.
/ dummy data
q) show t:([] a:1 2 3; b: 0 2 4)
a b
---
1 0
2 2
3 4
/ add column name 'ge' with value from b>=a
q) update ge:b>=a from t
a b ge
------
1 0 0
2 2 1
3 4 1
Use a vector conditional:
http://code.kx.com/q/ref/lists/#vector-conditional
q)t:([]c1:1 10 7 5 9;c2:8 5 3 4 9)
q)r:update goe:?[c1>=c2;1b;0b] from t
c1 c2 goe
-------------
1 8 0
10 5 1
7 3 1
5 4 1
9 9 1
Use meta to confirm the goe column is of boolean type:
q)meta r
c | t f a
-------| -----
c1 | j
c2 | j
goe | b
The operation <= works well with vectors, but in some cases when a function needs atoms as input for performing an operation, you might want to use ' (each-both operator).
e.g. To compare the length of symbol string with another column value
q)f:{x<=count string y}
q)f[3;`ab]
0b
q)t:([] l:1 2 3; s: `a`bc`de)
q)update r:f'[l;s] from t
l s r
------
1 a 1
2 bc 1
3 de 0

How to sum across a row in KDB/Q

I have a table rCom which has various columns. I would like to sum across each row..
for example:
Date TypeA TypeB TypeC TypeD
date1 40.5 23.1 45.1 65.2
date2 23.3 32.2 56.1 30.1
How can I write a q query to add a new column 'Total' that sums across each row?
why not just:
update Total: TypeA+TypeB+TypeC+TypeD from rCom
?
Sum will work just fine:
q)flip`a`b`c!3 3#til 9
a b c
-----
0 3 6
1 4 7
2 5 8
q)update d:sum(a;b;c) from flip`a`b`c!3 3#til 9
a b c d
--------
0 3 6 9
1 4 7 12
2 5 8 15
Sum has map reduce which will be better for a huge table.
One quick point regarding summing across rows: you should be careful about a null in one column resulting in a null result for the sum. Borrowing @WooiKent Lee's example.
We put a null into the first position of the a column. Notice how our sum now becomes null:
q)wn:.[flip`a`b`c!3 3#til 9;(0;`a);first 0#] //with null
q)update d:sum (a;b;c) from wn
a b c d
--------
  3 6
1 4 7 12
2 5 8 15
This is a direct effect of the way nulls in q are treated. If you sum across a simple list, the nulls are ignored
q)sum 1 2 3 0N
6
However, a sum across a general list will not display this behavior
q)sum (),/:1 2 3 0N
,0N
So, for your table situation, you might want to fill in with a zero beforehand
q)update d:sum 0^(a;b;c) from wn
a b c d
--------
  3 6 9
1 4 7 12
2 5 8 15
Or alternatively, make it so that you are actually summing across simple lists rather than general lists.
q)update d:sum each flip (a;b;c) from wn
a b c d
--------
  3 6 9
1 4 7 12
2 5 8 15
For a more complete reference on null treatment please see the reference website
This is what worked:
select Answer:{[x;y;z;a] x+y+z+a }'[TypeA;TypeB;TypeC;TypeD] from
([] dt:2014.01.01 2014.01.02 2014.01.03; TypeA:4 5 6; TypeB:1 2 3; TypeC:8 9 10; TypeD:3 4 5)