Database normalization 2NF and 3NF

Assume the relation Appliance(model, year, price, manufacturer, color)
with {model, year} as a key and the following FDs:
model -> manufacturer
model, year -> price
manufacturer -> color
Find 2NF and 3NF.
My solution was this:
Since model -> manufacturer violates 2NF because of a partial dependency, I decomposed Appliance as below:
R1(model, manufacturer)
R2(model, year, price)
R3(manufacturer, color)
Similarly, model -> manufacturer and manufacturer -> color violate 3NF because of a transitive dependency, so I decomposed Appliance as below:
R1(model, manufacturer)
R2(model, year, price)
R3(model, color)
My question is what is wrong with my normalization?

Your 2NF decomposition is right. But think more carefully about what violates a normal form: is it a relation that violates 2NF or 3NF, or a functional dependency? You say both; strictly, it is relations that are or are not in a normal form, and FDs are what cause the violations.
In your 2NF decomposition, R1, R2, and R3 are in 5NF. (So, by definition, they're also in 3NF and 2NF.)
For 3NF, you've lost the FD manufacturer -> color: your R3(model, color) doesn't preserve it. So that's wrong. (Your 2NF decomposition, which keeps R3(manufacturer, color), is already a correct 3NF decomposition.)
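For concreteness, here is a minimal SQL sketch of that FD-preserving decomposition (column types and the foreign keys are illustrative assumptions):
create table r3 (
    manufacturer text primary key,  -- manufacturer -> color
    color        text
);
create table r1 (
    model        text primary key,  -- model -> manufacturer
    manufacturer text references r3 (manufacturer)
);
create table r2 (
    model text references r1 (model),
    year  int,
    price numeric,                  -- model, year -> price
    primary key (model, year)
);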
In the real world, normalizing a relation like "Appliance" might result in more than one 5NF decomposition.

Related

Storing and Using Value Ranges (PostgreSQL)

I have a PostgreSQL database where physical activities store a certain energy decimal value, e.g.
ACTIVITY    ENERGY
------------------
Activity1   0.7
Activity2   1.3
Activity3   4.5
I have a Classification system that classifies each Energy value as
Light: 0 - 2.9
Moderate: 3.0 - 5.9
Vigorous: >= 6.0
The classifications and energy values are subject to change. I need a way to quickly get the type of each activity. But how do I store these in a way that is easy to retrieve?
One solution is to define MIN/MAX lookups with LOOKUP_TYPE "CLASSIFICATION" -- pull up all available classifications, then do a CASE/WHEN to go through each one.
LOOKUP_ID  LOOKUP_NAME   LOOKUP_VALUE  LOOKUP_TYPE
--------------------------------------------------
1          LIGHT_MIN     0             CLASSIFICATION
2          LIGHT_MAX     2.9           CLASSIFICATION
3          MODERATE_MIN  3             CLASSIFICATION
4          MODERATE_MAX  5.9           CLASSIFICATION
5          VIGOROUS_MIN  6             CLASSIFICATION
6          VIGOROUS_MAX  null          CLASSIFICATION
But this doesn't look very easy to me -- if a developer needs to get the current classification they'll have to step through the different cases and compare them.
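For example, the lookup-based query might end up looking something like this (a sketch; the lookup and activity table names are assumptions):
select a.activity,
       a.energy,
       case
         when a.energy <= (select lookup_value from lookup
                           where lookup_name = 'LIGHT_MAX') then 'Light'
         when a.energy <= (select lookup_value from lookup
                           where lookup_name = 'MODERATE_MAX') then 'Moderate'
         else 'Vigorous'
       end as classification
from activity a;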
Is there a better strategy to capture these ranges, or is this the right one?
Use a range type
create table classification
(
    description text,
    energy      numrange
);
insert into classification
    (description, energy)
values
    ('Light',    numrange(0, 3.0, '[)')),
    ('Moderate', numrange(3.0, 6.0, '[)')),
    ('Vigorous', numrange(6.0, null, '[)'));
Then you can join those two tables using the <@ ("is contained by") operator:
select *
from activity a
  join classification c on a.energy <@ c.energy;
With the sample data above, that classifies Activity1 and Activity2 as Light and Activity3 as Moderate.
The nice thing about the range type is that you can prevent inserting overlapping ranges by using an exclusion constraint:
alter table classification
  add constraint check_range_overlap
  exclude using gist (energy with &&);
Given the above sample data, the following insert would be rejected:
insert into classification
(description, energy)
values
('Strenuous', numrange(8.0, 11.0, '[)'));
I don't think the following is a great solution, but it seems preferable to the lookup model above.
Create a table with the ranges and classifications:
create table classification (
    energy_min     numeric,
    energy_max     numeric,
    classification text
);
Then do a join on that table as follows:
select
    a.activity, a.energy, c.classification
from
    activities a
    left join classification c on
        a.energy >= c.energy_min and
        (a.energy <= c.energy_max or c.energy_max is null);
If the set of possible classifications is relatively small, this should work well enough. I don't think it's especially efficient on the back end, as it's likely doing a cross join against the classification table; that said, with three (or even ten) records, that's not a big deal.
It should scale well enough in practice, and it lets you modify the values on the fly and get results quickly.
If you really want to get fancy, you can also include effective-from and effective-thru dates on the "classification" table, which let you change the classifications while still retaining the historical classifications for older records.
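A sketch of what that might look like (the effectivity column names are assumptions):
create table classification (
    energy_min     numeric,
    energy_max     numeric,
    classification text,
    effective_from date not null,
    effective_thru date        -- null means currently effective
);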

Genetic algorithm encoding technique to be used in this scenario

The problem is to find the optimum quantities that incur the minimum total cost across a number of warehouses, using a genetic algorithm.
Let's say there are n warehouses. Associated with each warehouse are a few factors:
LCosti: loading cost for warehouse i
HCosti: holding cost for warehouse i
TCosti: transportation cost for warehouse i
OCosti: ordering cost for warehouse i
Each warehouse has a quantity Qi associated with it that must satisfy these 4 criteria:
loading constraint: Qi * LCosti >= Ai for warehouse i
holding constraint: Qi * HCosti >= Bi for warehouse i
Transportation constraint: Qi * TCosti >= Ci for warehouse i
Ordering constraint: Qi * OCosti >= Di for warehouse i
where A, B, C and D are constants for each of the warehouses.
Another important criterion is that each Qi must satisfy:
Demandi >= Qi
where Demandi is the demand in warehouse i (a constant distinct from the ordering constant Di above).
And the equation of total cost is:
Total cost = sum(Qi * (LCosti + HCosti + TCosti) + OCosti / Qi)
How do I encode a chromosome for this problem? What I am thinking is that, by combining whichever of the four constraints gives the tightest minimum allowable value for Qi with the demand constraint, I can get a range for Qi. Then I can randomly generate values in that range for the initial population. But how do I perform crossover and mutation in the above scenario? How do I encode the chromosomes?
Generally, in constrained problems you basically have three possible approaches (as far as evolutionary algorithms are concerned):
1. Incorporate constraint violation into fitness
You can design your fitness as a sum of the actual objective and penalties for violation of constraints. The extreme case is a "death penalty", i.e. any individual which violates any constraint in any way receives the worst possible fitness.
This approach is usually very easy to implement but has a big drawback: it often penalizes solutions that have good building blocks but violate the constraints too much.
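As a sketch of what that could look like for the warehouse problem above (the penalty weight $\lambda$ is an assumed parameter that needs tuning):
$$f(Q) = \sum_i \left( Q_i \, (LCost_i + HCost_i + TCost_i) + \frac{OCost_i}{Q_i} \right) + \lambda \sum_i \Big( \max(0,\, A_i - Q_i \, LCost_i) + \max(0,\, B_i - Q_i \, HCost_i) + \max(0,\, C_i - Q_i \, TCost_i) + \max(0,\, D_i - Q_i \, OCost_i) + \max(0,\, Q_i - Demand_i) \Big)$$
where $f$ is minimized; the "death penalty" variant simply assigns the worst possible fitness whenever any max term is positive.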
2. Correction operators, resistant encoding
If it is possible for your problem, you can implement "correction operators" - operators that take a solution that violates the constraints and transform it into one that does not, preserving as much structure from the original solution as possible. A similar idea is to use an encoding that guarantees solutions are always feasible, i.e. a decoding algorithm that always produces a valid solution.
If it is possible, this is probably the best approach you can take. However, it is often quite hard to implement, or impossible without major changes to the solutions, which can then significantly slow the search down or even make it useless.
3. Multi-objective approach
Use some multi-objective (MO) algorithm, e.g. NSGA-II, turn your measure(s) of constraint violation into objectives, and optimize all the objectives at once. MO algorithms usually provide a Pareto front of solutions - a set of solutions lying on the frontier of the objective-violation tradeoff space.
Using Differential Evolution you can keep the same representation and avoid the double conversion (integer -> binary, binary -> integer).
The mutation operation is:
V(g+1, i) = X(g, r1) + F ⋅ (X(g, r2) − X(g, r3))
where:
i, r1, r2, r3 are indices of vectors in the population, all distinct from one another;
F is a random constant in the [0, 1.5] range
V (the mutant vector) is recombined with elements of a target vector (X(g, i)) to build a trial vector U(g+1, i). The selection step then keeps the better of the trial vector and the target vector (see the references below for further details).
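For reference, the usual binomial recombination rule is (CR, the crossover rate in [0, 1], is a parameter, and j_rand is a randomly chosen component index that guarantees the trial vector inherits at least one component from the mutant):
$$U_j(g+1, i) = \begin{cases} V_j(g+1, i) & \text{if } rand_j \le CR \text{ or } j = j_{rand} \\ X_j(g, i) & \text{otherwise} \end{cases}$$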
The interesting aspects of this approach are:
you don't have to redesign the code: you need a different mutation / recombination operator and (perhaps) you have to cast some reals to integers, but it's simple and fast;
for constraint management you can adopt the techniques described in zegkljan's answer;
DE has been shown to be effective on a large range of optimization problems and it seems to be suitable for your problem.
References:
the question "Explain the Differential Evolution method" and an old Dr. Dobb's article (by Kenneth Price and Rainer Storn) as introductions;
Storn's page for more details and many code examples.

Responsibility matrix (RACI) in enterprise architecture

I have pools and lanes with some activities within them in a BPMN 2.0 Business Process Diagram. I want to show the lanes (or pools) with their activities in a Relationship Matrix.
I choose lanes (pools) as Source and activities as Target, or vice versa, in the Relationship Matrix, but the relations are not shown.
How can I select the Link Type in the Relationship Matrix? How can I resolve my problem? How should I relate activities to lanes so that the relations show up in the Relationship Matrix?
You can't do it this way. The Relationship Matrix is based on connections established between elements, but what links an activity to its lane is a structural relation, and that is already visible in the project browser. Of course, there you can't play with that relation wizardry (which in the case of pools does not make sense anyway).

Skipping steps in Normalization?

Just curious: is there some reason why one cannot do all necessary normalizations in a single step? Isn't normalization ultimately the redrawing of the functional dependency (FD) graph? We start out with an FD diagram/graph, and we want to end up with a graph (vertices are attributes; there is an edge between attributes a and b if b is functionally dependent on a) representing a relation in (Edit) BCNF?
EDIT: What I mean is: we start with an FD graph, i.e. a graph pairing attributes a and b iff b is functionally dependent on a, i.e. we join a and b with an edge iff b = f(a).
From this graph we want to obtain a graph (FD)_2 with certain traits that are equivalent to having been fully normalized, i.e. (FD)_2 is in 5NF or 6NF, using the graph-theoretical relation between a graph and a given normal form. If so, we are basically mapping one graph to another. Can we use this approach - drawing (FD)_2 directly, as a function of FD - to skip normalization steps?
Yes: normalization can be characterized as the rearranging of (hyper)graphs. It does not have to be done by moving through normal forms in some order. (It's just a common misconception that it does.)
The normal forms on the continuum from 1NF to 6NF are those dealing with problematic FDs (functional dependencies) and JDs (join dependencies). They can be ordered so that if a relation value or variable satisfies a form, then it satisfies the forms before it, but not necessarily those after it. Currently: 1NF, 2NF, 3NF, EKNF, BCNF, 4NF, ETNF, RFNF, SKNF, 5NF aka PJ/NF, Overstrong PJ/NF, 6NF. This ordering has nothing to do per se with decomposing to relation values or variables that are in higher normal forms; it is not necessary to decompose through a sequence of forms.
The normal forms are just different conditions that have been found to have helpful properties. Moreover, they are just the ones that have happened to be discovered; there may well be other helpful properties waiting to be distinguished. We don't normalize by passing through them one at a time. (ETNF dates from 2012!)
As to your graph characterization:
An FD has a set of attributes as its determinant, which determines another set. But since one set determines another if and only if it determines each singleton subset of the other, informally but unambiguously we also talk about a set of attributes determining a single attribute: an FD {...} -> a holds iff a = f(...). (There can be zero or more determinant attributes.) BCNF is the highest normal form re problematic FDs, but there are higher normal forms re problematic JDs. A JD with given components holds in a relation iff the relation is always their join, i.e. its meaning/predicate can be expressed as the AND of the components' meanings/predicates. So an FD {...} -> A holds iff a JD holds corresponding to a meaning/predicate with conjunct A = F(...). An MVD (multi-valued dependency) corresponds to a certain binary JD. 5NF means that every JD that holds is "implied by the keys" (a technical term).
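A concrete instance of the FD-to-JD correspondence is Heath's theorem:
$$\{A\} \to B \ \text{holds in}\ R(A, B, C) \;\Longrightarrow\; R = \pi_{AB}(R) \bowtie \pi_{AC}(R)$$
i.e. whenever the FD holds, the binary JD *{AB, AC} holds too.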
There are algorithms that, starting from FDs, decompose directly to 2NF, directly to 3NF, or directly to BCNF (with various other properties, like preservation of FDs). See the Alice book. One can decompose to 6NF simply by decomposing until there are no nontrivial JDs, without regard to FDs.
(See C. J. Date's Database Design and Relational Theory: Normal Forms and All That Jazz.)

Converting 3NF to BCNF when there is a circular dependency

If we have a relational schema R(A, B, C, D), with the set of dependencies:
ABC -> D
D -> A
How is it possible to decompose R into BCNF relations? The only plausible way seems to be to discard one of the FDs, no matter how I think about it. Is there any other way?
That's right: one can always losslessly decompose to 3NF while preserving FDs, but a BCNF decomposition might not preserve them. Nevertheless it is a lossless decomposition: the components, if holding projections of the original, will join to the original. But whenever the original would have had a given value, the components should be projections of it. (If they're not, an error has been made, so we want the DBMS to constrain the components appropriately.) So it is necessary and sufficient to constrain the components to be projections of the original. ABC is trivially so (because it is a key). This leaves us needing to require that AD = ABCD PROJECT {DA}. We say that the components must satisfy that "equality dependency".
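A sketch of how that equality dependency might be checked in SQL, with hypothetical component tables abc(a, b, c) and ad(d, a):
-- rows of ad that are missing from the {D, A} projection of the
-- reconstructed ABCD; this query should return no rows
select d, a
from ad
except
select d, a
from abc natural join ad;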