use algorithm to generate decompositions in 3NF - database-normalization

I would like to use the algorithm to generate a decomposition into 3NF
that preserves dependencies.
R(ABCDEFGH)
F={a->b, abcd->e,ef->g,ef->h,acdf->e,acdf->g}
The only key is acdf. I apply the algorithm to find the 3NF,
eliminating extraneous attributes and redundant dependencies. I get:
F={a->b, acd->e, ef->g, ef->h}
If I create these tables:
tab1 (AB)
tab2 (ACDE)
tab3 (EFG)
tab4 (EFH)
Am I in third normal form? I think not. It seems to me that I can
never decompose it into 3NF because of a -> b.
So what does that mean: is the algorithm not guaranteed to
find a decomposition in 3NF?
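The reduction described in the question can be checked mechanically with attribute closures. The following is a minimal Python sketch, not part of the original post; the helper names closure and implies and the one-letter lowercase encoding of attributes are assumptions made for illustration. It checks that acdf determines every attribute, that the reduced set implies the same dependencies as the original one, and whether any of the four tables contains the key.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


R = set("abcdefgh")
# Original dependencies from the question.
F = [("a", "b"), ("abcd", "e"), ("ef", "g"), ("ef", "h"), ("acdf", "e"), ("acdf", "g")]
# Reduced set obtained in the question.
G = [("a", "b"), ("acd", "e"), ("ef", "g"), ("ef", "h")]
tables = ["ab", "acde", "efg", "efh"]

acdf_is_superkey = closure("acdf", F) == R
covers_equivalent = (all(implies(G, l, r) for l, r in F)
                     and all(implies(F, l, r) for l, r in G))
key_in_some_table = any(set("acdf") <= set(t) for t in tables)

print(acdf_is_superkey, covers_equivalent, key_in_some_table)
```

Under these assumptions the check reports that acdf is a superkey, that the two sets are equivalent, and that none of the four tables contains acdf. The last check matters because textbook presentations of 3NF synthesis typically add a relation on a candidate key when no synthesized relation contains one.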

Could someone please give me an example of a 3NF *DECOMPOSITION* that is not in BCNF? (I have no problem determining this for non-decompositions.)

It seems to me that Bernstein's synthesis / 3NF synthesis always yields BCNF subrelations, but that's apparently not true.
When one uses 3NF synthesis, one will have subrelations as a result, and they will each consist of either:
just one functional dependency along with all attributes of the schema, so the left side of the lone functional dependency will be a superkey, and that subrelation will therefore be in BCNF.
multiple functional dependencies, each of which has the same left side, so that left side is a superkey, and that subrelation will therefore be in BCNF.
no functional dependency where the schema includes the attributes making up the primary key of the original / non-decomposed relation, which would satisfy BCNF vacuously because of there being no functional dependencies.
What is an example of the 3NF synthesis algorithm yielding a non-BCNF decomposition and why it is so?
Bernstein's algorithm returns (one or more) components in EKNF, which lies between 3NF & BCNF.
Your claims of "that subrelation will therefore be in BCNF" are wrong. The FDs that hold in a component are all the ones in the closure of the original relation's FDs whose attributes are all in the component. So FDs could hold in a component whose determinants are not superkeys of it. (Which by definition of BCNF is just another way of saying a component could be not in BCNF. Obviously--since we are told that the algorithm doesn't always give BCNF.)
Since your reasoning is unsound, finding a counterexample seems moot. But just about any presentation of BCNF gives an example non-BCNF 3NF relation, which it then decomposes to BCNF. You can join the non-BCNF 3NF relation with a projection on attributes of one of its CKs extended by a fresh non-prime attribute, and Bernstein's algorithm can decompose back to the 2 tables.
Chris Date's classic An Introduction to Database Systems has a non-BCNF 3NF schema R(S, J, T) with minimal/irreducible cover
{S, J} -> T
{T} -> J
CKs are {S, J} & {S, T}. Bernstein gives component (S, J, T)--non-BCNF 3NF input R--in which both given FDs hold--plus redundant component (T, J).
For an example with an additional non-redundant component, extend the cover by {T} -> X. CKs are the same. {S, J} -> T again gives (S, J, T)--non-BCNF--plus component (T, J, X).
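The candidate keys and normal-form status of the three-attribute schema R(S, J, T) can be checked by brute force. This is a hedged Python sketch rather than anything from Date's book: the function names are hypothetical, the 3NF test uses the prime-attribute formulation (a non-trivial FD is allowed if its determinant is a superkey or its dependent attributes are prime), and it relies on the standard result that for the whole relation it is enough to test the FDs of a cover.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (each FD is a pair of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for minimal attribute sets whose closure is everything."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if closure(candidate, fds) == attrs and not any(k <= candidate for k in keys):
                keys.append(candidate)
    return keys


def normal_form_flags(attrs, fds):
    """Return (is_3nf, is_bcnf), testing only the FDs of the given cover."""
    prime = set().union(*candidate_keys(attrs, fds))
    is_3nf = is_bcnf = True
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) == attrs:
            continue  # trivial FD, or determinant is a superkey
        is_bcnf = False
        if not (rhs - lhs) <= prime:
            is_3nf = False
    return is_3nf, is_bcnf


attrs = {"S", "J", "T"}
fds = [({"S", "J"}, {"T"}), ({"T"}, {"J"})]
keys = candidate_keys(attrs, fds)
flags = normal_form_flags(attrs, fds)
print(keys, flags)
```

For this input the search finds the candidate keys {S, J} and {S, T}, and the flags come out as 3NF but not BCNF: T → J has a determinant that is not a superkey, but J is prime. The search is exponential in the number of attributes, so it is only meant for small textbook schemas.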
So, could someone please give me an example of the 3NF synthesis algorithm yielding a non-BCNF decomposition and tell why it is so?
A better "So, [...]" would be, So, what is wrong with my reasoning? You would do well to examine the assumptions you made about what FDs could hold in a component. (That article happens to point out (with reference) that "A 3NF table that does not have multiple overlapping candidate keys is guaranteed to be in BCNF.")
There is no "why" in mathematics. We assume things ("assumptions", "axioms", "premises") & other things follow. We can ask for a proof of something, but the proof does not say "why" the something is so, it's a demonstration that it is so. "Why" might be used trying to ask for a proof or for steps that you got wrong in or are missing from whatever almost-proof you have in mind.
PS Such a ubiquitous non-BCNF 3NF relation is Today's Court Bookings in the Wikipedia article on BCNF as I write. But beware that that particular example has perhaps unintuitive FDs. Indeed beware that almost every relational model Wikipedia page--including that one--has errors & misconceptions. So do many, many textbooks, especially re normalization.
The answer of philipxy is correct. Since you are asking for an example, here are a couple of them.
The relation (with a cover of the functional dependencies):
R (A B C D)
A B → C
C → D
D → B
is decomposed by the synthesis algorithm into:
R1 (A B C)
R2 (C D)
R3 (B D)
and R1 is not in BCNF because of the dependency C → B: C is not a superkey of R1, whose candidate keys are AB and AC. Note that C → B is not present in the original cover, but is implied by it (C → D and D → B).
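The point about implied dependencies can be made concrete by projecting the closure onto each component. The sketch below is an illustration in Python under stated assumptions: bcnf_violations is a hypothetical helper, and it enumerates every proper subset of a component, which is exponential and only reasonable for tiny schemas like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (each FD is a pair of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(component, fds):
    """List (X, extra) where X -> extra holds inside the component,
    is non-trivial, and X is not a superkey of the component."""
    comp = set(component)
    found = []
    for size in range(1, len(comp)):
        for combo in combinations(sorted(comp), size):
            x = set(combo)
            # FDs of the original relation restricted to this component.
            determined = closure(x, fds) & comp
            if determined != x and determined != comp:
                found.append((x, determined - x))
    return found


# Cover from the example: A B -> C, C -> D, D -> B.
fds = [({"A", "B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"B"})]
r1 = bcnf_violations("ABC", fds)
r2 = bcnf_violations("CD", fds)
r3 = bcnf_violations("BD", fds)
print(r1, r2, r3)
```

Here only R1 reports a violation, namely C → B, even though no FD of the original cover has C on the left and B on the right.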
Here is another (classical) example:
Phones (AreaCode, PhoneNumber, Subscriber, Town, Street)
AreaCode, PhoneNumber → Town
AreaCode, PhoneNumber → Subscriber
AreaCode, PhoneNumber → Street
Town → AreaCode
Bernstein's synthesis algorithm produces two subschemas:
R1 (AreaCode, PhoneNumber, Subscriber, Town, Street)
AreaCode, PhoneNumber → Town
AreaCode, PhoneNumber → Subscriber
AreaCode, PhoneNumber → Street
and:
R2 (Town, AreaCode)
Town → AreaCode
Since R2 is included in R1, the algorithm eliminates the second relation. The resulting relation is in 3NF but not in BCNF: it has two candidate keys, (AreaCode, PhoneNumber) and (PhoneNumber, Town), and the functional dependency Town → AreaCode violates BCNF because Town is not a superkey.

Pseudo randomization in MATLAB with minimum intervals between stimulus categories

For an experiment I need to pseudo-randomize a vector of 100 trials of stimulus categories, 80% of which are category A, 10% B, and 10% C. The B trials must have at least two non-B trials between each other, and the C trials must come after two A trials and have two A trials following them.
At first I tried building a script that randomized a vector and sort of "popped" out the trials that were not where they should be, and put them in a space in the vector where there was a long series of A trials. I'm worried though that this is overcomplicated and will create an endless series of unforeseen errors that will need to be debugged, as well as it not being random enough.
After that I tried building a script which simply shuffles the vector until it reaches the criteria, which seems to require less code. However now that I have spent several hours on it, I am wondering if these criteria aren't too strict for this to make sense, meaning that it would take forever for the vector to shuffle before it actually met the criteria.
What do you think is the simplest way to handle this problem? Additionally, which would be the best shuffle function to use, since Shuffle in psychtoolbox seems to not be working correctly?
The scope of this question moves much beyond language-specific constructs, and involves a good understanding of probability and permutation/combinations.
An approach to solving this question is:
Create blocks of vectors, such that each block is independent to be placed anywhere.
Randomly allocate these blocks to get a final random vector satisfying all constraints.
Part 0: Category A
Since category A has no constraints imposed on it, we will go to the next category.
Part 1: Make category C independent
The only constraint on category C is that it must have two A's before and after. Hence, we first create random groups of 5 vectors, of the pattern A A C A A.
At this point, we have an array of A vectors (excluding blocks), blocks of A A C A A vectors, and B vectors.
Part 2: Resolving placement of B
The constraint on B is that two consecutive Bs must have at least 2 non-B vectors between them.
Visualize as follows: Let's pool A and A A C A A in one array, X. Let's place all Bs in a row (suppose there are 3 Bs):
s0 B s1 B s2 B s3
Here each s is the number of vectors in that gap. Hence, we require that s1 and s2 be at least 2, and that s0 + s1 + s2 + s3 equal the number of vectors in X.
The task is then to choose random vectors from X and assign them to each s. At the end, we finally have a random vector with all categories shuffled, satisfying the constraints.
P.S. This can be mapped to the classic problem of finding a set of random numbers that add up to a certain sum, with constraints.
It is easier to reduce the constrained sum problem to one with no constraints. This can be done as:
s0 B s1 t1 B s2 t2 B s3
Where t1 and t2 are chosen from X just enough to satisfy constraints on B, and s0 + s1 + s2 + s3 equal to number of vectors in X not in t.
Implementation
Implementing the same in MATLAB could benefit from using cell arrays, and this algorithm for the random numbers of constant sum.
You would also need to maintain separate pools for each category, and keep building blocks and piece them together.
Really, this is not trivial but also not impossible. This is the approach you could try, if you want to step aside from brute-force search like you have tried before.
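The block-and-gap approach above can be sketched as follows. This is written in Python rather than MATLAB, purely to show the logic, and it makes some assumptions that are not in the answer: the function names are hypothetical, every interior gap between two Bs gets two reserved pool items (slightly stricter than necessary, since a single A A C A A block would already be enough), and the spare items are dropped into gaps at random, which does not sample uniformly from all valid sequences.

```python
import random


def make_sequence(n_a=80, n_b=10, n_c=10, seed=None):
    """Build one trial sequence that satisfies the B and C constraints."""
    rng = random.Random(seed)
    # Part 1: every C is wrapped in its own A A C A A block.
    blocks = [["A", "A", "C", "A", "A"] for _ in range(n_c)]
    singles = [["A"] for _ in range(n_a - 4 * n_c)]
    pool = blocks + singles  # the pooled array X from the answer
    rng.shuffle(pool)

    # Part 2: gaps s0 .. s_nb around the Bs; reserve 2 items per interior gap.
    sizes = [0] + [2] * (n_b - 1) + [0]
    spare = len(pool) - sum(sizes)
    if spare < 0:
        raise ValueError("not enough non-B material to separate the Bs")
    for _ in range(spare):
        sizes[rng.randrange(n_b + 1)] += 1

    # Piece the gaps and the Bs together.
    seq, pos = [], 0
    for i, size in enumerate(sizes):
        for item in pool[pos:pos + size]:
            seq.extend(item)
        pos += size
        if i < n_b:
            seq.append("B")
    return seq


def is_valid(seq):
    """Check the spacing rule for B and the A A _ A A rule for C."""
    b_pos = [i for i, t in enumerate(seq) if t == "B"]
    if any(q - p < 3 for p, q in zip(b_pos, b_pos[1:])):
        return False
    for i, t in enumerate(seq):
        if t == "C":
            if i < 2 or seq[i - 2:i] != ["A", "A"] or seq[i + 1:i + 3] != ["A", "A"]:
                return False
    return True


seq = make_sequence(seed=1)
print("".join(seq), is_valid(seq))
```

Because the constraints are satisfied by construction, no reshuffling loop is needed; is_valid is only there as a sanity check, and the same two steps translate to MATLAB cell arrays as the answer suggests.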

Definition of 3NF

I'm rather confused about the definition of 3NF.
Let R be a relation with attribute set X.
Suppose Y -> A is a functional dependency where A is a non-prime attribute and Y is a subset of X.
If Y is a proper subset of any candidate key for R, then the relation is not in 3NF (and not even in 2NF) because this is a partial dependency, which is not permitted in 2NF (and by extension 3NF).
If Y is a non-prime attribute, the relation is not in 3NF because this is a transitive dependency of the non-prime attribute A on any candidate key through the non-prime attribute Y.
But what if Y is a set containing both prime and non-prime attributes? What if A is a subset of Y? What if Y contains only prime attributes, but those prime attributes come from different keys of R so that Y is not a proper subset of any particular key of R? What if Y contains only, but multiple non-prime attributes? Which of these cases violates the requirements of 3NF and why?
TL;DR Get definitions straight.
To know whether a case violates 3NF you have to look at the criteria used in some definition.
Your question is rather like asking, I know an even number is one that is divisible by 2 or one whose decimal representation ends in 0, 2, 4, 6 or 8, but what if it's three times a square? Well, you have to use the definition--show that the given conditions imply that it's divisible by two or that its decimal representation ends in one of those digits. Why do you even care about other properties than the ones in the definition?
When some FDs (functional dependencies) hold, others must also hold. We say the latter are implied by the former. So when given FDs hold usually tons of others also hold. So one or more arbitrary FDs holding doesn't necessarily tell you anything about which normal forms might hold. Eg when U is a superset of V, U → V must hold; such FDs are called trivial because they are implied by any collection of FDs. Eg when U → V, every superset of U determines every subset of V. Armstrong's axioms are some rules that can be mechanically applied to find all FDs that hold. There are algorithms to find a canonical/minimal/irreducible cover for a given set, a set of FDs that imply all those in it with no proper subset that does. There are also algorithms to determine whether a relation satisfies certain NFs (normal forms), and to decompose them into components with higher NFs when they're not.
Sometimes we think there is a case that the definition doesn't handle but really we have got the definition wrong.
The definition you are trying to refer to for a relation being in 3NF actually requires that there be no transitive functional dependence of a non-prime attribute on a candidate key.
In your non-3NF example you should say there is a transitive FD, not "this is a transitive FD", because the violating FD is of the form CK → A not Y → A. Also, U → V is transitive when there is an X where U → X AND X → V AND NOT X → U. It doesn't matter whether X is a prime attribute.
PS It's not very helpful to ask "why" something is or isn't so in mathematics. We describe a situation in terms of some givens, and a bunch of things follow. We can say that if certain of the givens weren't so then that thing wouldn't be so. But if certain other givens weren't so then it might also not be so. We can give a proof that something is or isn't so as "why" but it's not the only proof.

Matlab non-linear binary Minimisation

I have to set up a phoneme table with a specific probability distribution for encoding things.
Now there are 22 base elements (each with an assigned probability, sum 100%), which shall be mapped on a 12 element table, which has desired element probabilities (sum 100%).
So part of the minimisation is to merge several base elements to get 12 table elements. Each base element must occur exactly once.
In addition, the table has 3 rows. So the same 12 element composition of the 22 base elements must minimise the error for 3 target vectors. Let's say the given target vectors are b1,b2,b3 (dimension 12x1), the given base vector is x (dimension 22x1) and they are connected by the unknown matrix A (12x22) by:
b1+err1=Ax
b2+err2=Ax
b3+err3=Ax
To sum it up: A is to be found so that dot_prod(err1+err2+err3, err1+err2+err3)=min (least squares). And - according to the above explanation - A must contain only 1's and 0's, while having exactly one 1 per column.
Unfortunately I have no idea how to approach this problem. Can it be expressed in a way different from the matrix-vector form?
Which tools in matlab could do it?
I think I found the answer while parsing some sections of the Matlab documentation.
First of all, the problem can be rewritten as:
errSum=err1+err2+err3=3Ax-b1-b2-b3
=> dot_prod(errSum, errSum) = min(A)
Applying the dot product (least squares) yields a quadratic scalar expression.
Syntax-wise, the fmincon tool in the Optimization Toolbox comes close: its bound and linear-equality constraint parameters can keep each Aij in [0, 1] and force each column to sum to 1.
But fmincon is built for continuous variables, so it is not suited to binary problems algorithm-wise; the ga tool, which accepts integer constraints and can be called in a similar way, should be used instead.
Since the equation would be very long in my case and needs to be written out, I haven't tried yet. Please correct me, if I'm wrong. Or add further solution-methods, if available.
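One way to avoid writing the equation out is to encode A as an assignment of each column (base element) to one row (table element), which satisfies the "exactly one 1 per column" constraint by construction. The following Python sketch illustrates that encoding with a plain hill-climbing search. It is a swapped-in technique for illustration, not MATLAB's ga or fmincon, it gives no guarantee of reaching the global minimum, and all function names and the toy data are assumptions.

```python
import random


def objective(assign, x, target):
    """Squared norm of 3*A*x - (b1 + b2 + b3), with A encoded as column -> row."""
    sums = [0.0] * len(target)
    for col, row in enumerate(assign):
        sums[row] += x[col]
    return sum((3 * s - t) ** 2 for s, t in zip(sums, target))


def to_matrix(assign, n_rows):
    """Expand the column -> row encoding into a 0/1 matrix with one 1 per column."""
    return [[1 if row == r else 0 for row in assign] for r in range(n_rows)]


def local_search(x, b1, b2, b3, n_iter=20000, seed=0):
    """Hill climbing: move one column to a random row, keep it if not worse."""
    rng = random.Random(seed)
    target = [p + q + r for p, q, r in zip(b1, b2, b3)]
    assign = [rng.randrange(len(target)) for _ in x]
    best = objective(assign, x, target)
    for _ in range(n_iter):
        col = rng.randrange(len(x))
        old = assign[col]
        assign[col] = rng.randrange(len(target))
        cost = objective(assign, x, target)
        if cost <= best:
            best = cost        # keep the move
        else:
            assign[col] = old  # undo the move
    return assign, best


# Toy data (made up): 4 base elements mapped onto 2 table elements.
x = [0.1, 0.2, 0.3, 0.4]
b = [0.3, 0.7]
assign, cost = local_search(x, b, b, b, n_iter=2000)
A = to_matrix(assign, len(b))
print(assign, cost, A)
```

For the real problem x would have 22 entries and each b 12 entries; the same encoding could also be handed to an integer-capable solver instead of the hill climber.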

In Answer Set Programming, what is the difference between a model and a least model?

I'm taking an artificial intelligence class and we are working with Answer Set Programming (Clingo specifically). We're talking mostly theory at the moment and I am having some trouble differentiating between models and least models. I have the following definitions:
Satisfying rules, models, least models and answer sets of definite programs
A program is called definite if it does not have “not” in the body of its rules.
A set S is said to satisfy a rule of the form a :- b1, …, bm, not c1, …, not cn. if its body being satisfied by S (i.e., b1 … bm are in S and none of c1 … cn are in S) implies that its head is satisfied by S (i.e., a is in S).
A set S is said to satisfy a program if it satisfies all rules of that program.
A set S is said to be an answer set of a definite program P if (a) S satisfies P (also referred to as S is a model of P) and (b) no strict subset of S satisfies P (i.e., S is the least model of P).
With the question (pulled from the lecture slides, not homework):
P is defined as:
a :- b,e.
b.
c :- d,b.
d.
Which of the following are models and least models?
{}, {b}, {b,d}, {b,d,c}, {b,d,c,e}, {b,d,c,e,a}
Can anyone let me know what the answer to the above question is? I can probably figure out the difference from there, although if someone could explain the difference in common speak (rather than text-book definition), that would be wonderful. I'm not sure which forum to post this question under - please let me know if it should be posted somewhere else.
Thanks
First of all, note that this section of your slides is talking about the answer sets of a positive program P (also called a definite program), even though it also mentions not. The positive program is the simple case, as for a positive program P there always exists a unique least model LM(P), which is the intersection of all its models.
Allowing not in rule bodies makes things more complex. The body of a rule is the right side of :-.
The answer to the question would be, set by set:
S={} is not a model, since b and d are facts b. d.
S={b} is not a model, since d is a fact d.
S={b,d} is not a model, since c is implied by c :- d,b. and c is not in S
S={b,d,c} is a model
S={b,d,c,e} is not a model, since a is implied by a :- b,e. and a is not in S
S={b,d,c,e,a} is a model
So what is the least model? It's S={b,c,d} since no strict subset of S satisfies P.
We can arrive at the least model of a positive program P in two ways:
Enumerate all models and take their intersection (here {b,c,d}∩{a,b,c,d,e}={b,c,d}).
Starting with the facts (here b. d.) and iteratively adding implied atoms (here c :- b,d.) to S, repeating until S is a model and stopping at that point.
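Both the set-by-set check and the second, iterative construction are easy to mimic outside of Clingo. The following is a small Python sketch, not Clingo code; the encoding of each rule as a (head, body) pair and the function names are assumptions made for illustration, and it only covers positive programs.

```python
# Program P: each rule is (head, list of body atoms); facts have an empty body.
rules = [("a", ["b", "e"]), ("b", []), ("c", ["d", "b"]), ("d", [])]


def is_model(s, rules):
    """S satisfies P if every rule whose body holds in S also has its head in S."""
    return all(head in s for head, body in rules if all(b in s for b in body))


def least_model(rules):
    """Start from nothing and keep adding implied heads until nothing changes."""
    s = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in s and all(b in s for b in body):
                s.add(head)
                changed = True
    return s


candidates = [set(), {"b"}, {"b", "d"}, {"b", "d", "c"},
              {"b", "d", "c", "e"}, {"b", "d", "c", "e", "a"}]
results = [is_model(s, rules) for s in candidates]
lm = least_model(rules)
print(results, sorted(lm))
```

The check marks only {b,d,c} and {b,d,c,e,a} as models, and the iteration stops at {b,c,d}: the facts b and d are added first, then c, and a is never added because e is never derived.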
Like your slides say, the definition of an answer set for a positive program P is: S is an answer set of P if S is the least model of P. To be stricter, this is actually if and only if, since the least model LM(P) is unique.
As a last note, so you are not later confused by them, a constraint :- a, b is actually just shorthand for x :- not x, a, b. Thus programs containing constraints are not positive programs; though they might look like they are at first, since the body of a constraint seemingly doesn't contain a not.