Converting 3NF to BCNF when there is a circular dependency

If we have a relational schema R(A, B, C, D), with the set of dependencies:
ABC -> D
D -> A
How is it possible to decompose R into BCNF relations? The only plausible way seems to be to discard one of the FDs, no matter how I think about it. Is there any other way?

That's right: one can always losslessly decompose to 3NF while preserving FDs, but a BCNF decomposition might not preserve them. Nevertheless it is still a lossless decomposition: the components, if they hold projections of the original, will join back to the original. And whenever the original would have held a given value, the components should be projections of it. (If they're not, an error has been made, so we want the DBMS to constrain the components appropriately.) So it is necessary and sufficient to constrain the components to be projections of the original. ABC is trivially so (because it is a key). This leaves us needing to require that AD = ABCD PROJECT {DA}. We say that the components must satisfy that "equality dependency".
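To make that constraint concrete, here is a small sketch (my own illustration, not part of the original answer) of checking the equality dependency AD = ABCD PROJECT {DA}: the AD component (stored here as (d, a) pairs) must equal the projection of the rejoined ABCD relation onto {D, A}. Integer attribute values and the tuple layout are arbitrary choices.

```haskell
import Data.List (nub, sort)

-- One row of the reconstructed original R(A, B, C, D).
type RowABCD = (Int, Int, Int, Int)

-- ABCD PROJECT {DA}: keep only the D and A attributes.
projectDA :: [RowABCD] -> [(Int, Int)]
projectDA rows = nub [ (d, a) | (a, _, _, d) <- rows ]

-- The "equality dependency": the AD component equals ABCD PROJECT {DA}.
equalityDependencyHolds :: [RowABCD] -> [(Int, Int)] -> Bool
equalityDependencyHolds abcd adComponent =
  sort (projectDA abcd) == sort (nub adComponent)
```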

Why do Calculus of Constructions-based languages use Setoids so much?

One finds that Setoids are widely used in languages such as Agda, Coq, ... Indeed, proponents of languages such as Lean have argued that those languages could help avoid "Setoid Hell". What is the reason for using Setoids in the first place?
Does the move to extensional type theories based on HoTT (such as cubical Agda) reduce the need for Setoids?
As Li-yao Xia's answer describes, setoids are used when we don't have or don't want to use quotients.
In the HoTT book and in Lean quotients are (basically) axiomatized. One difference between Lean and the HoTT book is that the latter has many more higher inductive types, while Lean only has quotients and (regular) inductive types. If we restrict our attention to quotients (set quotients in the HoTT book), they work the same in Lean and in Book HoTT. In this case you just postulate that given a type A and an equivalence relation R on A you have a quotient Q, and a function [-] : A → Q with the property ∀ x y : A, R x y → [x] = [y]. It comes with the following elimination principle: to construct a function g : Q → X for some type X (or hSet X in HoTT) we need a function f : A → X such that we can prove ∀ x y : A, R x y → f x = f y. This comes with the computation rule that states ∀ x : A, g [x] ≡ f x (this is a definitional equality in both Lean and Book HoTT).
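As a concrete sketch of that interface, here is how the pieces look with Lean 4's built-in Quot; the sameParity example is mine, not from the answer.

```lean
-- The relation we quotient by.
def sameParity (m n : Nat) : Prop := m % 2 = n % 2

-- The quotient Q and the map [-] : A → Q.
def Parity : Type := Quot sameParity
def toParity (n : Nat) : Parity := Quot.mk sameParity n

-- R x y → [x] = [y]  (Quot.sound).
theorem toParity_eq {m n : Nat} (h : sameParity m n) : toParity m = toParity n :=
  Quot.sound h

-- Elimination: a function A → X that respects R induces a function Q → X.
def isOdd : Parity → Bool :=
  Quot.lift (fun n => n % 2 == 1) (fun m n h => by
    have h' : m % 2 = n % 2 := h
    simp [h'])

-- Computation rule: g [x] ≡ f x holds definitionally, so rfl suffices.
example : isOdd (toParity 1) = true := rfl
```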
The main disadvantage of this quotient is that it breaks canonicity. Canonicity states that every closed term (that is, a term without free variables) in (say) the natural numbers normalizes to either zero or the successor of some natural number. The reason that this quotient breaks canonicity is that we can apply the elimination principle for = to the new equalities in a quotient, and a term like that will not reduce. In Lean the opinion is that this doesn't matter, since in all cases we care about we can still prove an equality, even though it might not be a definitional equality.
Cubical type theory has a fancy way to work with quotients while retaining canonicity. In CTT equality works differently, and this means that higher inductive types can be introduced while keeping canonicity. Potential disadvantages of CTT are that the type theory is a lot more complicated, and that you have to embrace HoTT (and in particular give up on proof irrelevance).
[The answers by Li-yao Xia and Floris van Doorn are both excellent, so I will try to augment them with additional information.]
Another view is that quotients, while used a lot in classical mathematics, are perhaps not so great after all. Not taking quotients but sticking to Groupoids is exactly where non-commutative geometry starts from! It teaches us that some quotients are incredibly badly behaved, and the last thing we want to do (in those cases!) is to actually quotient. But that the thing itself is not so bad, even quite good, if you treat it using the 'right' tools.
It is arguably also deeply embedded in all of category theory, where one doesn't quotient out equivalent objects. Taking 'skeletons' in category theory is regarded as being in bad taste. The same is true of strictness, and a host of other things, all of which boil down to trying to squish down things that are better left as they are, as they do no harm at all. We're just used to wanting 'uniqueness' to be reflected in our representations - something we should just get over.
Setoid hell arises not because some coherences must be proven (you need to prove them to show you have a proper equivalence, and again whenever you define functions on raw representations instead of on the quotiented version). It arises when you're forced to prove these coherences again and again when defining functions that can't possibly "go wrong". So Setoid hell is actually caused by a failure to provide proper abstraction mechanisms.
In other words, if you're doing sufficiently simple mathematics, where quotients are well-behaved, then there should be some automation that lets you work with them smoothly. Currently, in type theory, working out exactly what that automation could look like is ongoing research. Floris' answer outlines one pitfall well: at some point, you give up on computations being well-behaved, and from then on you are forced to do everything via proofs.
Ideally one would certainly like to be able to treat arbitrary equivalence relations as Leibniz equality (eq), enabling rewriting in arbitrary contexts. That amounts to defining the quotient of a type by an equivalence relation.
I'm not an expert on the topic, but I've been wondering the same for a while, and I think the reliance on setoids stems from the fact that quotients are still a poorly understood concept in type theory.
Setoid Hell is where we're stuck when we don't have/want quotient types.
We can axiomatize quotient types; I believe (though I could be mistaken) that's what Lean does.
We can develop a type theory which can naturally express quotients, that's what HoTT/Cubical TT do with higher inductive types.
Furthermore, quotient types (or my naive imagination of them) force us to package programs and proofs together in a perhaps less-than-ideal way: a function between two quotient types is a plain function together with a proof that it respects the underlying equivalence relation. While one can technically do that, this interleaving of programming and proving is arguably undesirable because it makes programs unreadable: one often seeks either to keep programs and proofs in two completely separate worlds (which mandates setoids, keeping types separate from their equivalence relations), or to change some representations so that the program and the proof are one and the same entity (so we might not even need to explicitly reason about equivalences in the first place).
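To make the contrast concrete, here is a minimal Lean sketch (names are mine) of the setoid-style packaging: the carrier stays separate from its equivalence relation, and a "function" between setoids carries its respectfulness proof alongside the underlying map.

```lean
-- A carrier together with an equivalence relation on it (a setoid).
structure MySetoid where
  carrier : Type
  rel     : carrier → carrier → Prop
  refl    : ∀ x, rel x x
  symm    : ∀ {x y}, rel x y → rel y x
  trans   : ∀ {x y z}, rel x y → rel y z → rel x z

-- A setoid morphism: a plain function plus a proof that it respects the
-- relations. Re-discharging such proofs everywhere is "Setoid Hell".
structure SetoidHom (A B : MySetoid) where
  toFun    : A.carrier → B.carrier
  respects : ∀ {x y}, A.rel x y → B.rel (toFun x) (toFun y)
```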

Are there algebraic data types outside of sum and product?

By most definitions the common or basic algebraic data types in Haskell or Scala are sum and product. Examples: 1, 2.
Sometimes a definition just says algebraic data types are sum and product, perhaps for simplicity.
However, the definitions leave an impression that other algebraic data types are possible, and sum and product are just the most useful to describe selection or combination of elements.
Given that basic algebra also has subtraction, division, and raising to an integer power, is it correct that some implementation of other, alternative algebraic types is possible in programming, but that they are just not useful?
Do any programming languages have algebraic data types implemented that are not sum and product types?
"Algebraic" comes from category theory. Every algebraic data type is an initial algebra of a functor. So you could in principle call anything that comes from a functor in this way algebraic, and I think it's quite a large class.
Interpreting "algebraic" to mean "high-school algebra" (I don't mean to be condescending, that's just how we refer to it) as you have, there are some nice analogies.
Arbitrary powers, not just integer powers, are closely analogous to function types; that is, A -> B is analogous to B^A. In category theory, when you consider a function ("morphism") as an object of a category, it's called an exponential object, and the latter notation is used. For fun, see if you can prove the law C^(A+B) = C^A × C^B by writing a bijection between the corresponding types.
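In case it helps, here is a sketch of that bijection in Haskell (my own code, using Prelude's either): C^(A+B) = C^A × C^B read as types says that a function out of a sum is the same thing as a pair of functions.

```haskell
-- Either a b -> c   is isomorphic to   (a -> c, b -> c)
split :: (Either a b -> c) -> (a -> c, b -> c)
split f = (f . Left, f . Right)

combine :: (a -> c, b -> c) -> (Either a b -> c)
combine (g, h) = either g h
```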
Division is analogous to quotient types, which are a fascinating area of research that reaches into things as hott and trendy as homotopy type theory. The analogy of quotients to division is not as strong as that of product types to multiplication, as you have to divide by an equivalence relation.
At this rate, you would expect subtraction to have some beautiful analogy to go with it, but alas I know of none. Dan Piponi has explored it a little through the antidiagonal, but it is far from a general analogy.
The conventional answer is as given by @luqui, and I have nothing to add to that. The downside is that whilst you distinguish the alternants of the sum by tag ('constructor' in Haskell), you must access the components of the product by position. For homogeneous components, as in a vector/array, that's fine; but for typical data structures (record types) you want to access by 'field label' and abstract away the position. Haskell's record/label system a) does a very bad job of that; and b) is so deeply ingrained into the language that it's proving almost impossible to improve -- see endless proposals and endless discussions resulting so far in no change.
Then an unconventional answer is indexed families, aka 'indexed sets'. The idea has been developed mostly around the 'Set-Theoretic Data Structures' of D. L. Childs, for example this one. Childs' approach got a mention in Codd 1970, the seminal paper on the Relational (database) Model.
The critical feature is that you can use any type to index the collection of components; the components are heterogeneous; and the compiler supports type-safe access (read and update) both by-component and whole-structure. The components might well be organised positionally within the structure, but that's an implementation detail hidden from the programmer. (Haskell's record system fails on this point.)
Do any programming languages have algebraic data types implemented that are not sum and product types?
You might or might not accept that SQL is a programming language. I might but mostly don't accept that SQL 'column names' are an implementation of 'Indexed families'. SQL's columns and rows are far too much oriented to physical layout (and indeed most vendors' SQL still allows positional notation for columns, even though it's been deprecated by the standard). That said, SQL is the nearest you'll find.
There have been a few extensible/anonymous record systems proposed/developed in Haskell (especially HList) or Haskell-like languages (like Ur/Web), or even dear old Hugs' TRex. (See the Gaster & Jones paper for links to other attempts in FP languages.) All of them are limited because they're trying to put lipstick on Haskell's sum-of-product types.
I know of Sum, Product, Exponential and Recursive types.

Skipping steps in Normalization?

Just curious: is there some reason why one cannot do all necessary normalizations in a single step? Isn't normalization ultimately the redrawing of the Functional Dependency (FD) graph? We start out with an FD diagram/graph and we want to end up with a graph (vertices are attributes, with an edge between attributes a, b if b is functionally dependent on a) representing a relation in BCNF?
EDIT: What I mean is: we start with an FD graph, which is a graph pairing attributes a, b iff b is functionally dependent on a, i.e., we join a and b with an edge iff b = f(a).
From this graph we want to obtain a graph (FD)_2 with certain traits that are equivalent to having been fully normalized, i.e., (FD)_2 is in 5NF or 6NF, using the graph-theoretical relation between a graph and a given normal form. If so, we are basically mapping one graph to another graph. Can we use this approach -- drawing (FD)_2 directly, as a function of FD -- to skip normalization steps?
Yes: Normalization can be characterized by rearranging (hyper)graphs. It does not have to be done by moving through normal forms in some order. (It's just a common misconception that it is.)
The normal forms on the continuum from 1NF to 6NF are those dealing with problematic FDs (functional dependencies) and JDs (join dependencies). They can be ordered so that if a relation value or variable satisfies a form then it satisfies the forms before but not necessarily after. Currently: 1NF, 2NF, 3NF, EKNF, BCNF, 4NF, ETNF, RFNF, SKNF, 5NF aka PJ/NF, Overstrong PJ/NF, 6NF. This ordering has nothing to do per se with decomposing to relation values or variables that are in higher normal forms. It is not necessary to decompose through a sequence of forms.
The normal forms are just different conditions that have been found to have helpful properties. Moreover, the normal forms are just those that have been discovered; there may well be other helpful properties yet to be distinguished. We don't pass through them in order to normalize now. ETNF dates from 2012!
As to your graph characterization:
An FD has a set of attributes as determinant, which determines another set. But since the one determines the other if and only if it determines each of the sets that contain exactly one member of the other, informally but unambiguously we also talk about a set of attributes determining an attribute. An FD {...} -> a holds iff a = f(...). (There can be zero or more determinant attributes.) BCNF is the highest normal form re problematic FDs, but there are higher normal forms re problematic JDs. A JD with given components holds in a relation iff the relation is always their join, i.e. its meaning/predicate can be expressed as the AND of the components'. So an FD {...} -> a holds iff a JD holds corresponding to a meaning/predicate with conjunct a = f(...)! An MVD (multi-valued dependency) corresponds to a certain binary JD. 5NF means that every JD that holds is "implied by the keys" (a technical term).
There are algorithms that, starting with FDs, decompose directly to 2NF, directly to 3NF and directly to BCNF (with various other desirable properties, like preservation of FDs). See the Alice book. One can decompose to 6NF simply by decomposing until there are no nontrivial JDs, without regard to FDs.
(See C. J. Date's Database Design and Relational Theory: Normal Forms and All That Jazz.)
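As a rough illustration of such direct decomposition (my own sketch, not taken from the Alice book), here is the textbook BCNF algorithm: compute attribute-set closures, find a nontrivial FD whose determinant is not a superkey of the component, and split on it. Projecting FDs onto components is simplified here to keeping only FDs whose attributes all lie in the component.

```haskell
import Data.List (intersect, nub, sort, union, (\\))

type Attr = Char
type FD   = ([Attr], [Attr])   -- X -> Y

-- Closure of an attribute set under a set of FDs.
closure :: [FD] -> [Attr] -> [Attr]
closure fds = go . sort . nub
  where
    go s =
      let s' = sort (nub (s `union` concat [ ys | (xs, ys) <- fds
                                                , all (`elem` s) xs ]))
      in if s' == s then s else go s'

-- Is xs a superkey of the component rel?
isSuperkey :: [FD] -> [Attr] -> [Attr] -> Bool
isSuperkey fds rel xs = sort rel == sort (closure fds xs `intersect` rel)

-- Simplified projection of FDs onto a component.
project :: [FD] -> [Attr] -> [FD]
project fds rel = [ fd | fd@(xs, ys) <- fds, all (`elem` rel) (xs ++ ys) ]

-- Split on a BCNF-violating FD until none remain.
bcnf :: [FD] -> [Attr] -> [[Attr]]
bcnf fds rel =
  case violations of
    []           -> [rel]
    (xs, ys) : _ ->
      bcnf fds (nub (xs ++ ys)) ++ bcnf fds (rel \\ (ys \\ xs))
  where
    violations =
      [ (xs, ys) | (xs, ys) <- project fds rel
                 , not (null (ys \\ xs))          -- nontrivial
                 , not (isSuperkey fds rel xs) ]

-- The first question's schema: R(A,B,C,D) with ABC -> D and D -> A.
example :: [[Attr]]
example = bcnf [("ABC", "D"), ("D", "A")] "ABCD"   -- ["DA","BCD"]
```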

Existence of a 0- and 1-valent configurations in the proof of FLP impossibility result

In the well-known paper Impossibility of Distributed Consensus with one Faulty Process (JACM85), FLP (Fischer, Lynch and Paterson) proved the surprising result that no completely asynchronous consensus protocol can tolerate even a single unannounced process death.
In Lemma 3, after showing that D contains both 0-valent and 1-valent configurations, it says:
Call two configurations neighbors if one results from the other in a single step. By an easy induction, there exist neighbors C₀, C₁ ∈ C such that Dᵢ = e(Cᵢ) is i-valent, i = 0, 1.
I can follow the whole proof except when they claim the existence of such C₀ and C₁. Could you please give me some hints?
D (the set of possible configurations after applying e to elements of C) contains both 0-valent and 1-valent configurations (and is assumed to contain no bivalent configurations).
That is, e maps every element in C to either a 0-valent or a 1-valent configuration. By definition of C, there must be a root element that is connected to all other elements by a series of "neighbour" relationships, so there must be a boundary point where an element of C that leads to a 0-valent configuration after e is a neighbour of an element of C that leads to a 1-valent configuration after e.
I once went down the path of reading all these papers, only to discover it's a complete waste of time.
The result is not surprising at all.
The paper you mention, "Impossibility of Distributed Consensus with One Faulty Process", is a long list of complex mathematical proofs that simply equate to:
1) Consensus is a deterministic state
2) One (or more) faulty systems within an environment make the environment non-deterministic
3) No deterministic state, action or outcome can ever be reached within a non-deterministic environment.
The end. No further thought is required.
This is how it works in the real world outside of academia.
If you wish for agents to reach consensus, then Synchronous (timing model) approximation constructs have to be added to make the environment deterministic within a given set of constraints. For example, simple constructs like Timeouts, Ack/Nack, Handshake, Witness, or way more complex constructs.
The closer you wish to get to a Synchronous deterministic model, the more complex the constructs become. A hypothetical Synchronous model would have infinitely complex constructs. Bear in mind also that a fully deterministic Synchronous model can never be achieved in a non-trivial distributed system, because in any non-trivial, dynamic, multivariate system with a variable initial state there exists an infinite number of possible states, actions and outcomes at any point in time. Chaos Theory
Consider the complexity of a construct for detecting a dropped TCP packet caused by buffer overflow errors in a router at hop number 21. And the complexity of detecting the same buffer overflow error dropping the detection signal from the construct itself.
Define a mapping f such that f(C) = 0 if e(C) is 0-valent, and f(C) = 1 if e(C) is 1-valent.
Because e(C) cannot be bivalent (we assume that D has no bivalent configuration), f(C) can only be 0 or 1.
Arrange the configurations accessible from the initial bivalent configuration in a tree; there must be two neighbors C₀, C₁ in the tree such that f(C₀) != f(C₁), because otherwise all f(C) would be the same, which would mean that D has only 0-valent configurations or only 1-valent configurations.
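Here is a toy sketch (mine, not from the paper) of that "easy induction": walking any chain of neighbouring configurations whose two ends have different valency must cross an adjacent pair whose members have different valency.

```haskell
-- Given a valency labelling and a path of neighbouring configurations,
-- return the first adjacent pair with different labels, if any.
findBoundary :: Eq b => (a -> b) -> [a] -> Maybe (a, a)
findBoundary valency path =
  case [ (x, y) | (x, y) <- zip path (drop 1 path), valency x /= valency y ] of
    []      -> Nothing
    (p : _) -> Just p

-- If the first and last labels differ, a boundary pair always exists,
-- e.g. findBoundary id [0,0,0,1,1] == Just (0,1)
```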

Algorithm generation

I have a rather large (not too large, but possibly 50+) set of conditions that must be placed on a set of data (or rather, the data should be manipulated to fit the conditions).
For example, suppose I have a sequence of binary numbers of length n;
if n = 5 then an element in the data might be {0,1,1,0,0} or {0,0,0,1,1}, etc.
BUT there might be a set of conditions such as
x_3 + x_4 = 2
sum(x_even) <= 2
x_2*x_3 = x_4 mod 2
etc...
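As an illustration (my reading of the notation, assuming 1-based indices), such conditions could be encoded directly as a predicate on a bit sequence; the difficulty described below is that in practice the conditions are not known up front.

```haskell
-- Hypothetical encoding of the example conditions as a predicate.
satisfies :: [Int] -> Bool
satisfies xs =
  length xs >= 4
    && x 3 + x 4 == 2                                  -- x_3 + x_4 = 2
    && sum [ x i | i <- [2, 4 .. length xs] ] <= 2     -- sum(x_even) <= 2
    && (x 2 * x 3) `mod` 2 == x 4 `mod` 2              -- x_2*x_3 = x_4 mod 2
  where
    x i = xs !! (i - 1)   -- 1-based indexing, as in the question
```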
Because the conditions are quite complex, in that they come from experiment (although they can be written down in logic form) and are hard to diagnose, I would like instead to use a large sample set of valid data, i.e., data that I know satisfies the conditions and is a pretty large set. That is, it is easier to collect the data than it is to deduce the conditions that the data must abide by.
Having said that, basically what I'm doing is very similar to neural networks. The difference is, I would like an actual algorithm, in some sense optimal, in some form of code that I can run instead of the network.
It might not be clear what I'm actually trying to do. What I have is a set of data in some raw format that is unique and unambiguous but not appropriate for my needs (in a sense, the amount of data is too large).
I need to map the data into another set that actually is ambiguous to some degree but also has a certain specific set of constraints that all the data follows (certain things just cannot happen while others are preferred).
The unique constraints and preferences are hard to figure out. That is, the mapping from the non-ambiguous set to the ambiguous set is hard to describe (which is why it is ambiguous). The goal, actually, is to have an unambiguous map by supplying the right constraints, if at all possible.
So, in the vein of my initial example, I'm given (or supply) a set of elements and need some way to derive a list of constraints similar to what I've listed.
In a sense, I simply have a set of valid data and train on it, very much as with neural networks.
Then, after this "training", I'm given the mapping function, which I can use on any element in my dataset; it will produce a new element satisfying the constraints, or, if it can't, will come as close as possible to an unambiguous result.
The main difference between neural networks and what I'm trying to achieve is that I'd like an algorithm, as code, to use instead of having to run a neural network. The difference here is that the algorithm would probably be a lot less complex, not need potential retraining, and be a lot faster.
Here is a simple example.
Suppose my "training set" consists of the binary sequences and mappings
01000 => 10000
00001 => 00010
01010 => 10100
00111 => 01110
then from the "Magical Algorithm Finder" (tm) I would get a mapping out like
f(x) = x rol 1 (rol = rotate left)
or whatever way one would want to express it.
Then I could simply take any other element, such as x = 011100, and apply f to generate a hopefully unambiguous output.
Of course there are many such functions that will work on this example, but the goal is to supply enough of the dataset to narrow it down to, hopefully, a few functions that make the most sense (and that, at the very least, always map the training set correctly).
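To make the toy example executable, here is a sketch (mine, assuming the bit strings are represented as lists) of the hypothesised mapping f(x) = x rol 1, checked against the training pairs above.

```haskell
-- Rotate a bit list left by one position.
rol1 :: [Int] -> [Int]
rol1 []       = []
rol1 (b : bs) = bs ++ [b]

trainingSet :: [([Int], [Int])]
trainingSet =
  [ ([0,1,0,0,0], [1,0,0,0,0])
  , ([0,0,0,0,1], [0,0,0,1,0])
  , ([0,1,0,1,0], [1,0,1,0,0])
  , ([0,0,1,1,1], [0,1,1,1,0])
  ]

-- True iff the candidate function maps every training input to its output.
consistentWithTraining :: Bool
consistentWithTraining = all (\(x, y) -> rol1 x == y) trainingSet
```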
In my specific case I could easily convert my problem into mapping the set of binary digits of length m to the set of base-B digits of length n. The constraints prevent some numbers from having an inverse; i.e., the mapping is injective but not surjective.
My algorithm could be a simple collection of if statements acting on the digits, if need be.
I think what you are looking for here is an application of Learning Classifier Systems, LCS (wiki). There are actually quite a few open-source LCS implementations available, but you may need to experiment with the parameters in order to get a good result.
LCS/XCS/ZCS have the features that you are looking for, including individual rules that can be heavily optimized, pressure to reduce the rule set, and of course a human-readable/understandable set of rules (unlike a neural net).