Minimal cover FD - database-normalization

I have the following exercise text:
Stack Overflow is a community where users (US), with a specific number of years of experience (Ye), can ask questions and give answers. Based on the given answers, each user earns a reputation (Re) and a badge (Ba), a sort of medal (e.g., bronze, silver and gold badge). The number of years of experience starts from the registration date (Da) on Stack Overflow. Each user can gain some money (Mo) by answering a question. Please note that the description above is not referring to the real Stack Overflow, but it’s useful to define the following exercise. The following FDs are satisfied:
Each user has a specific number of years of experience and a specific badge
The years of experience uniquely determine the user’s reputation and the amount of money for each answer
For each user and for each reputation is defined a unique badge
The registration date determines a single value of years of experience
Each user earns a specific amount of money for each answer
The FDs I defined are the following ones:
YeBa -> US
Ye -> ReMo
Ba -> USRe
Da -> Ye
Mo -> US
I'm not sure about the third one and the last one. For the third one, I don't know whether what I wrote is correct or whether there should be two FDs (Ba -> US, Ba -> Re). For the last one, I'm not sure how to represent it, because the sentence doesn't say exactly what I wrote. I also think it is wrong, because the exercise asks me to compute the minimal cover, and with these FDs that is not possible.

A functional dependency X → Y expresses the fact that values of the set of attributes Y are determined in a unique way from the values of the set of attributes X. In other words, for each combination of X values there is a unique, specific combination of values of Y.
Let's examine the facts described in the exercise.
Each user has a specific number of years of experience and a specific badge
You have represented this fact with the dependency:
Ye Ba → US
but this is exactly the opposite. With the above FD, in fact, you are saying that given a number of years of experience and a certain badge, there is only one user with those years and that badge, but in reality different users can have the same years of experience and badge. The fact specified by the sentence can instead be expressed by the FD:
1. US → Ye Ba
that is, each user determines (has uniquely) a number of years of experience and a badge.
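To make the direction of the dependency concrete, here is a small sketch (my own illustration, not part of the exercise) that checks whether X → Y holds on a sample instance. The rows are hypothetical. Note that a check on one instance can only refute an FD; an FD is a constraint on every legal instance, so passing the check does not prove it.

```python
from typing import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, object]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs is satisfied by the given rows (one dict per tuple)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The same X values must always come with the same Y values.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical instance: two different users share years of experience and badge.
users = [
    {"US": "alice", "Ye": 3, "Ba": "silver"},
    {"US": "bob", "Ye": 3, "Ba": "silver"},
    {"US": "carol", "Ye": 5, "Ba": "gold"},
]

print(fd_holds(users, ["US"], ["Ye", "Ba"]))  # True: each user has one (Ye, Ba)
print(fd_holds(users, ["Ye", "Ba"], ["US"]))  # False: (3, silver) maps to two users
```

The second call fails exactly for the reason given above: the same years of experience and badge can belong to several users.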
Let's see the other facts:
The years of experience uniquely determine the user’s reputation and the amount of money for each answer
2. Ye → Re Mo
(this is correct)
For each user and for each reputation is defined a unique badge
This is somewhat ambiguous, as pointed out in a comment, in the sense that the badge is probably determined by the reputation alone, but if we follow the specification and write the corresponding FD, we get:
3. US Re → Ba
(not vice versa); this will be simplified when we compute a minimal cover.
The registration date determines a single value of years of experience
4. Da → Ye
(correct)
Each user earns a specific amount of money for each answer
5. US → Mo
Again, there is a certain, specific amount of money earned by a user (and not vice versa, since that would mean that each amount of money could be earned by only a single user).
So we now have the dependencies:
US → Ye Ba
Ye → Re Mo
US Re → Ba
Da → Ye
US → Mo
and we can compute a minimal cover from them. Here is a possible computation (note that it is always possible to compute a minimal cover from any set of dependencies).
First we rewrite the dependencies so that each one has a single attribute on the right-hand side:
Re US → Ba
Da → Ye
US → Ba
US → Ye
US → Mo
Ye → Mo
Ye → Re
In Re US → Ba, Re is extraneous, since {US}+ = (Ba Mo Re US Ye) already contains Ba. So we replace this dependency with US → Ba, which is already present, so it simply disappears.
From the remaining dependencies, we can now remove the redundant dependency:
US → Mo
This is because we have US → Ye and Ye → Mo, so US → Mo follows by transitivity.
So, the final set of dependencies (a minimal cover) is:
Da → Ye
US → Ba
US → Ye
Ye → Mo
Ye → Re
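The steps above follow the usual textbook procedure (split right-hand sides, drop extraneous left-hand attributes, drop redundant FDs), which can be sketched in Python as follows. This is an illustrative sketch, not the only way to compute a cover: the result depends on the order in which FDs are examined, and other orders can give a different but equivalent minimal cover. The registration-date dependency is encoded as Da → Ye, as stated in the exercise.

```python
def fd(lhs: str, rhs: str):
    """Build an FD from space-separated attribute names, e.g. fd("US Re", "Ba")."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure X+: keep applying FDs whose left-hand side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split every FD so the right-hand side is a single attribute.
    g = [(lhs, frozenset([a])) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: B is extraneous in X -> A if A is already in (X - {B})+.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # drop duplicates, keep order
    # Step 3: drop every FD that is implied by the remaining ones.
    for item in list(g):
        rest = [f for f in g if f != item]
        if item[1] <= closure(item[0], rest):
            g = rest
    return g


fds = [fd("US", "Ye Ba"), fd("Ye", "Re Mo"), fd("US Re", "Ba"),
       fd("Da", "Ye"), fd("US", "Mo")]

print(sorted(closure({"US"}, fds)))  # ['Ba', 'Mo', 'Re', 'US', 'Ye']
for lhs, rhs in minimal_cover(fds):
    # Prints US -> Ba, US -> Ye, Ye -> Mo, Ye -> Re, Da -> Ye (one per line).
    print(" ".join(sorted(lhs)), "->", " ".join(sorted(rhs)))
```

Because {US}+ already contains Ba, step 2 reduces Re US → Ba to US → Ba, and step 3 removes US → Mo, matching the hand computation.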

Related

Netlogo: Built-in function to calculate the expected profit

Sorry for the long post. I am a newbie in agent-based modelling, so please accept my apology in advance if my question sounds stupid. I am trying to model a scenario where a farmer (i.e. the agent) decides which type of crop should be planted in different types of fields to increase profit. The farmer agent has a budget, i.e. the amount of money that can be spent on farming each time step, equal to $100.
The farmer operates a farm that is subdivided into nine fields of equal size, arranged in a 3x3 cellular grid. Water availability varies spatially across the fields, with a rating of either 1 (driest), 2 (moderate), or 3 (wettest), and is assigned to the fields randomly.
The farmer must choose among three crops. As initial parameter settings, the crops have the following characteristics:
Crop     Yield  Price  Cost  Minimum water req.
Crop 1   300    20     15    3
Crop 2   200    12     10    2
Crop 3   100    7      5     1
Each crop requires a certain amount of water to grow. Crop yields will only be realized if the crop is planted in a field with at least the crop's minimum water requirement.
Now the problem is that I couldn't find any function in NetLogo that calculates the permutations or combinations of crop, field, and water requirement needed to compute the expected profit. Any help would be highly appreciated.
I believe you are describing a linear programming problem.
Useful functions for solving simplex linear programming problems are in the NumAnal extension, which does not come bundled with NetLogo but which you can get as follows:
In NetLogo, under Tools / Extensions..., you can find NumAnal, probably without a green check mark. Select it. On the right, there are buttons to install it and then to add it to your code. When you click those, it should get a green check mark, and a new line "extensions [ numanal ]" should appear in your code. You can then use its commands with the "numanal:" prefix, for example numanal:simplex.
The documentation for it is in the folder where it was installed. But where is that?
Sadly, the documentation for where extensions are downloaded is not current.
https://ccl.northwestern.edu/netlogo/docs/extensions.html#where-extensions-are-located
After an exhaustive search by date modified, I actually found the folder on my Windows 10 laptop here: c:\Users\condor\AppData\Roaming\NetLogo\6.1\extensions
( Note the "\Roaming\" ).
That folder has a README.md text file, a PDF document named "NumAnal-v3.4.0" explaining how to use it, and an examples folder with code. It is a little dense.
The basics of how to formulate a linear programming problem are beyond the scope of Stack Overflow, but you can find help via Google.
Here's one 8-minute video (as of 24-Nov-2019) that might help you figure out whether this is what you need.
Simplex Algorithm Explanation (How to Solve a Linear Program)
https://www.youtube.com/watch?v=RO5477EKlXE
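For a grid this small, the expected profit of every crop/field combination can also simply be enumerated, which may help check an LP formulation. The sketch below is in Python rather than NetLogo and is a brute-force alternative, not the simplex method. Its assumptions are mine, not the question's: revenue is yield × price and is earned only if the field's water rating meets the crop's minimum; the planting cost counts against the $100 budget whether or not the crop succeeds; and a field may be left fallow. The water ratings are hypothetical.

```python
from itertools import product

# Crop parameters from the question's table: (yield, price, cost, minimum water req.).
CROPS = {
    "Crop 1": (300, 20, 15, 3),
    "Crop 2": (200, 12, 10, 2),
    "Crop 3": (100, 7, 5, 1),
}
BUDGET = 100  # money available per time step


def field_profit(crop, water):
    """(profit, cost) of planting `crop` on a field; None means leave it fallow."""
    if crop is None:
        return 0, 0
    yld, price, cost, min_water = CROPS[crop]
    # Assumption: revenue only if the field is wet enough; cost is paid either way.
    revenue = yld * price if water >= min_water else 0
    return revenue - cost, cost


def best_plan(water_ratings, budget=BUDGET):
    """Try every crop/fallow choice per field (4**9 plans for 9 fields)."""
    options = [None, *CROPS]
    table = [[field_profit(c, w) for c in options] for w in water_ratings]
    best_profit, best_choice = float("-inf"), None
    for choice in product(range(len(options)), repeat=len(water_ratings)):
        profit = spent = 0
        for i, k in enumerate(choice):
            p, c = table[i][k]
            profit += p
            spent += c
        if spent <= budget and profit > best_profit:
            best_profit, best_choice = profit, [options[k] for k in choice]
    return best_profit, best_choice


# Hypothetical 3x3 grid of water ratings, listed row by row.
water = [1, 3, 2, 2, 1, 3, 3, 2, 1]
print(best_plan(water))
```

Enumerating 4^9 = 262,144 plans is fine for nine fields but grows exponentially, which is where an LP (or integer programming) solver becomes the better tool.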

Inter annotator agreement when users annotates more than one category for any subject

I want to find the inter-annotator agreement among a few annotators.
Annotators assign a few categories (out of 10 categories) to each subject.
For example, there are 3 annotators, 10 categories, and 100 subjects.
I am aware of http://en.wikipedia.org/wiki/Cohen's_kappa (for two annotators) and http://en.wikipedia.org/wiki/Fleiss%27_kappa (for more than two annotators), but I realized that they may not work if an annotator assigns more than one category to a subject.
Does anyone have an idea for determining inter-annotator agreement in this scenario?
Thanks
I had to do this several years back. I can't recall exactly how I did it (I don't have the code anymore), but I have a worked example I reported to my professor. I was dealing with annotation of comments, with 56 categories and 4 annotators.
Note: at the time I needed a way to detect where annotators disagree most, so that after each annotation session they could focus on why they disagreed and set out reasonable rules to maximize this statistic. It worked well for that purpose.
Let's assume A-D are annotators and 1-5 are categories. This is a possible scenario.
Category  A  B  C  D  Probability of agreement
1         X  X  X  X  4/4
2         X  X  X     3/4
3         X  X        2/4
4         X           1/4
5
A tags this comment with categories 1, 2, 3, 4; B with 1, 2, 3; and so forth.
For each category, the probability of agreement is calculated (the fraction of annotators who chose that category).
These probabilities are summed, and the sum is divided by the number of unique categories tagged for that particular comment.
Therefore, for the example comment, we have (4/4 + 3/4 + 2/4 + 1/4) / 4 = 10/16 as the annotators' agreement. This is a value between 0 and 1.
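The per-comment statistic described above can be sketched in Python like this (a reconstruction from the worked example, since the original code is gone). Input is assumed to be a mapping from each annotator to the set of categories they assigned; raising an error when nobody tagged anything is my own choice.

```python
from fractions import Fraction


def comment_agreement(labels_by_annotator):
    """Agreement for one comment: mean, over tagged categories, of the share of annotators choosing it."""
    n = len(labels_by_annotator)
    tagged = set().union(*labels_by_annotator.values())
    if not tagged:
        raise ValueError("no categories were tagged for this comment")
    # Probability of agreement for each category = fraction of annotators who chose it.
    per_category = [
        Fraction(sum(c in labels for labels in labels_by_annotator.values()), n)
        for c in tagged
    ]
    return sum(per_category) / len(tagged)


example = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {1, 2}, "D": {1}}
print(comment_agreement(example))  # 5/8, i.e. 10/16
```

Averaging this value over all comments gives an overall score, and sorting comments by it shows where annotators disagree most.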
If this doesn't work for you, see http://www.mitpressjournals.org/doi/pdf/10.1162/coli.07-034-R2, p. 567, which is referenced by the case study on p. 587.
Compute agreement on a per-label basis. If you treat one of the annotators as the gold standard, you can then compute recall and precision on label assignments. Another option is label overlap: among the subjects to which either annotator assigned a category, the proportion to which both assigned it (intersection over union).
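A minimal sketch of these per-label measures in Python, assuming each annotator's work is stored as a dict from subject id to the set of labels assigned. Which annotator plays the gold standard, and the sample data, are hypothetical.

```python
def per_label_scores(gold, other, label):
    """Precision, recall and overlap (IoU) of `other` against `gold` for one label.

    gold, other: dicts mapping subject id -> set of labels assigned.
    A score is None when its denominator is empty.
    """
    subjects = gold.keys() | other.keys()
    in_gold = {s for s in subjects if label in gold.get(s, set())}
    in_other = {s for s in subjects if label in other.get(s, set())}
    both = in_gold & in_other
    union = in_gold | in_other
    precision = len(both) / len(in_other) if in_other else None
    recall = len(both) / len(in_gold) if in_gold else None
    overlap = len(both) / len(union) if union else None  # intersection over union
    return precision, recall, overlap


# Hypothetical annotations for three subjects.
gold = {"s1": {"x", "y"}, "s2": {"x"}, "s3": {"y"}}
other = {"s1": {"x"}, "s2": {"x", "y"}, "s3": {"y"}}
for label in ("x", "y"):
    print(label, per_label_scores(gold, other, label))
```

The overlap score is symmetric in the two annotators, so it does not require choosing a gold standard.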

Byzantine Generals

So I was reading Lamport's paper on the Byzantine Generals, in which he proves that for T malicious generals you need 2T+1 generals in a group to reach consensus. However, I don't understand why. If there are T malicious nodes making things up, you just need T+1 votes to outvote them. Why is that not the case?
There is a section on Wikipedia about this:
One solution considers scenarios in which messages may be forged, but which will be Byzantine-fault-tolerant as long as the number of traitorous generals does not equal or exceed one third. The impossibility of dealing with one-third or more traitors ultimately reduces to proving that the 1 Commander + 2 Lieutenants problem cannot be solved, if the Commander is traitorous. The reason is, if we have three commanders, A, B, and C, and A is the traitor: when A tells B to attack and C to retreat, and B and C send messages to each other, forwarding A's message, neither B nor C can figure out who is the traitor, since it isn't necessarily A – the other commander could have forged the message purportedly from A. It can be shown that if n is the number of generals in total, and t is the number of traitors in that n, then there are solutions to the problem only when n is greater than or equal to 3t + 1
you just need T+1 votes to outvote them. Why is that not the case?
This makes sense if all loyal generals produce the same answer, but that's not the case for systems covered by the Byzantine Generals Problem (BGP), where each honest element can give you a different answer.
BGP is for systems where each element sees different information, for example redundant radars. It is not for systems where the elements are mirrored (e.g. redundant hard disks).
Example:
Generals: A, B, C;
Traitor: C;
A says "attack";
B says "retreat";
C says "attack" to A, and "retreat" to B;
Result: A sees two "attack" votes (its own and C's) and B sees two "retreat" votes (its own and C's), so A thinks it has reached agreement and will attack alone.
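The example can be replayed with a naive majority vote, which is exactly the "T+1 votes outvote T traitors" idea from the question. This is only an illustration of why that idea fails; it is not Lamport's oral-messages algorithm.

```python
from collections import Counter


def majority(votes):
    """Most common vote among the ones a general has received."""
    return Counter(votes).most_common(1)[0][0]


# Each loyal general's own opinion, as in the example.
own = {"A": "attack", "B": "retreat"}
# The traitor C tells each loyal general something different.
traitor_says = {"A": "attack", "B": "retreat"}

decisions = {}
for general in own:
    other_loyal = next(g for g in own if g != general)
    votes = [own[general], own[other_loyal], traitor_says[general]]
    decisions[general] = majority(votes)

print(decisions)  # {'A': 'attack', 'B': 'retreat'}
```

There are two loyal generals against one traitor, yet the loyal generals decide differently, because the traitor can send a different vote to each of them.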

Discovering the shortest "path" between two people on Facebook

I am not a very experienced programmer with this kind of thing, but I want to write a program that takes two Facebook profiles and returns the shortest "path" of people between them.
I guess the running time would be huge, but if I were to start writing that program, what language should I choose? What libraries should I use? What general direction should I go in?
When it comes to the language, you should use whatever you are most comfortable with. Facebook has sample code for PHP, for example, so if you know PHP you could start with that. Java would work too.
Now, I don't know whether the Facebook API already has a function that performs this task. But, as you have already alluded to, you want to find the "shortest path." In fact, there are many algorithms that find the shortest path between two nodes of a graph.
You are looking for the shortest path between two nodes of a graph. What's a graph?
A graph is just what it sounds like: a collection of nodes and edges. In this case, each person is a node, and the edges, which connect nodes, are formed by friendships.
So let's say you have Friend X, who has friends {A, B, C, D}, and Friend Y, who has friends {B, D, E, F}. You'd start by creating a graph of all of the friends (that is, take the union of the two sets and add X and Y): {A, B, C, D, E, F, X, Y}. We include X and Y because we ultimately want to find the shortest distance between those two.
Once you get the social graph of each friend (who are their friends, are they friends with each other, etc) then you can place them into a graph structure. I won't talk about how to do that - just going big-picture here.
One way to represent that is with an adjacency matrix:
  A B C D E F X Y
A 1 0 0 0 0 0 1 0
B ...
C
D
E
F
X
Y
That is, look at each cell of the grid. If the two people are friends, put a "1" at their intersection; otherwise, put a "0".
Now apply a shortest-path algorithm to that data. You could use Dijkstra's algorithm to accomplish this; since every friendship edge has the same weight, a plain breadth-first search also finds the shortest path.
So you need a little background on graphs, adjacency matrices, and shortest-path algorithms. There might even be a Java, PHP, or R library that does all this for you. But at a high level, this is what you are trying to accomplish. I'm not even sure whether the Facebook API will give you all the data you need to solve this.
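Building the adjacency matrix from friend lists can be sketched in Python as follows. Only X's and Y's friend lists from the example are known, so friendships among A to F are assumed absent here (a hypothetical simplification). The diagonal is left at 0 because nobody is listed as their own friend, whereas the sample row for A above has a 1 in that cell; a shortest-path search is unaffected either way.

```python
# Known friendships from the example; friendships among A..F are assumed absent.
friends = {
    "X": {"A", "B", "C", "D"},
    "Y": {"B", "D", "E", "F"},
}

people = sorted(set(friends) | set().union(*friends.values()))
index = {p: i for i, p in enumerate(people)}

# matrix[i][j] == 1 if people[i] and people[j] are friends, else 0.
matrix = [[0] * len(people) for _ in people]
for person, their_friends in friends.items():
    for f in their_friends:
        # Friendship is mutual, so fill both cells.
        matrix[index[person]][index[f]] = 1
        matrix[index[f]][index[person]] = 1

print("  " + " ".join(people))
for p in people:
    print(p, " ".join(str(v) for v in matrix[index[p]]))
```

For a social network of realistic size, an adjacency list (a dict of sets) uses far less memory than a full matrix, since most cells would be 0.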
Best of luck!
What language should I choose?
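Any language you are comfortable using.
What libraries should I use? What general direction should I go in?
Try BFS (with a queue), which finds a shortest path in an unweighted graph such as a friendship graph. DFS (with a stack or recursion) also explores the graph, but on its own it does not guarantee the shortest path.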
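A minimal BFS sketch in Python on the X/Y example, storing the graph as adjacency sets. The friendship data is the example's, with friendships among A to F assumed absent. Neighbors are visited in sorted order only to make the output deterministic.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list of names, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


friends = {"X": {"A", "B", "C", "D"}, "Y": {"B", "D", "E", "F"}}
graph = {}
for person, their_friends in friends.items():
    for f in their_friends:
        graph.setdefault(person, set()).add(f)
        graph.setdefault(f, set()).add(person)

print(shortest_path(graph, "X", "Y"))  # ['X', 'B', 'Y']
```

Because BFS explores people in order of distance from the start, the first time it reaches the goal is along a shortest path.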

assignment problem with costs

I have a problem that I'm stuck with and can't find anywhere to start, so I'm hopelessly turning to Stack Overflow.
The problem asks us to find out whether it is NP-hard or polynomial: if it's NP-hard, prove NP-completeness; otherwise, give the algorithm.
The problem is as follows:
A product consists of n modules. There are two companies that can build each module, each with some cost (c_ij, where i is the module number and j is the company number). If modules a and b are built by different companies, there is an additional cost p_ab. The modules a and b do not have to be consecutive; such an additional cost applies to a and c too. As expected, the problem asks us to find the assignment of modules to companies that minimizes the total cost.
Any ideas?
It can be reduced to the minimum cut problem, which can be solved with any max-flow algorithm, so the problem is polynomial.
So what's the network?
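Modules will be the vertices of our graph, and we also add 2 new vertices: a source and a sink.
From the source we add an edge to every module i with capacity c_i1. Similarly, from every module i we add an edge to the sink with capacity c_i2. Also, for any modules i and j we add an edge with capacity p_ij (the graph is directed, so there will be two edges, (i, j) and (j, i)). Modules on the source side of the cut are assigned to the second company and the remaining modules to the first company. The value of the min cut is then the solution of the problem: a module on the source side cuts its edge to the sink (cost c_i2), a module on the sink side cuts its edge from the source (cost c_i1), and every pair of modules split across the cut contributes exactly one edge of capacity p_ij.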
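The construction above can be sketched in Python. The max-flow routine is a plain Edmonds-Karp (BFS augmenting paths) on a capacity matrix; that is my choice for brevity, and any max-flow algorithm works. The source side of the cut is read off as the nodes still reachable from the source in the residual graph. The cost data are hypothetical, and a brute-force checker is included to compare against.

```python
from collections import deque
from itertools import product


def max_flow_min_cut(n, edges, s, t):
    """Edmonds-Karp. edges: (u, v, capacity). Returns (flow value, source side of a min cut)."""
    cap = [[0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no augmenting path: parent marks the nodes reachable from s
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck
    return flow, {i for i in range(n) if parent[i] != -1}


def total_cost(c, p, companies):
    """Cost of an assignment; companies[i] is 1 or 2."""
    n = len(c)
    cost = sum(c[i][companies[i] - 1] for i in range(n))
    cost += sum(p[i][j] for i in range(n) for j in range(i + 1, n)
                if companies[i] != companies[j])
    return cost


def assign_modules(c, p):
    """c[i] = (cost at company 1, cost at company 2); p symmetric extra costs."""
    n = len(c)
    s, t = n, n + 1
    edges = []
    for i in range(n):
        edges.append((s, i, c[i][0]))  # cut iff module i goes to company 1
        edges.append((i, t, c[i][1]))  # cut iff module i goes to company 2
        for j in range(i + 1, n):
            if p[i][j]:
                edges.append((i, j, p[i][j]))
                edges.append((j, i, p[i][j]))
    cost, source_side = max_flow_min_cut(n + 2, edges, s, t)
    # Source-side modules go to the second company, the rest to the first.
    return cost, [2 if i in source_side else 1 for i in range(n)]


def brute_force(c, p):
    return min(total_cost(c, p, a) for a in product((1, 2), repeat=len(c)))


c = [(1, 5), (5, 1), (3, 3)]           # hypothetical module costs
p = [[0, 4, 1], [4, 0, 1], [1, 1, 0]]  # hypothetical extra costs
print(assign_modules(c, p), brute_force(c, p))
```

The brute force is exponential and only serves as a check on small inputs; the min-cut version runs in polynomial time.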