From Discrete Mathematics - Graph theory - discrete-mathematics

Describe a graph model that represents whether each person at a party knows the name of each other person at the party. Should the edges be directed or undirected? Should multiple edges be allowed? Should loops be allowed?

The edges should be directed because it's possible that John knows Mary's name, but Mary does not know John's. This could happen if John is a private citizen in a town and Mary is the mayor of that town.
Multiple edges should not be allowed from one person to another, since a person either knows the other person, or not. The presence or absence of an arc is sufficient to represent this either/or reality. This also suggests that the graph need not be weighted.
Cycles of any length, including length one, should be allowed. Presumably, most individuals know their own names, though this is not necessarily guaranteed (consider a party-goer with total retrograde amnesia). So, many individuals will have self-loops. Also, it's likely pairs of people know each other's name, so loops between pairs of individuals are likely. Larger cycles are also likely.

Related

Methods for searching for people with similar purchasing habits in big data with a given person as the base

I'm looking at finding people with similar purchasing behaviors with a given person or group as a starting point for a market research problem.
I'm going to use vectors and represent every person and their habits as a vector and then compare these vectors to return the base person or group. I'd probably use Faiss. I believe KNN can be used too.
But I'm looking to see is if I can use other methods such as clustering methods like k-means clustering for such a question, and with the presence of a given person or group as the base. I thought the only way clustering algs would work is to first cluster the data, then return the group that the 'base person or group' falls into. However, this would be costly and probably not very accurate. But potentially this technique can be used to reduce the search space.
So, do you know of any other ways? (non-Machine Learning or Information Retrieval methods would be welcomed too :) )

Schema for golf tournament scores

I work for a small non-profit organization and we are looking for a way to quickly tally scores and provide a few statistics for donors at our annual golf tournament. I thought this would be fairly easy, but I'm struggling to come up with a database schema to capture the scores. I can't figure out how the player's score relates to the specific hole on the course.
This is the diagram that I have so far. Am I way off base with this?
The Schema can be found here: https://app.quickdatabasediagrams.com/#/schema/forneGJp40inm7rWlf2Sbg
Perhaps make the Scores table a m:n join table between Players and Holes to capture each player's score on each hole. This is depicted on the diagram below. To get the score for a round you'd sum all scores for all holes with a specific CourseId, for a specific event.
I also denormalised it a little, adding a total score to the Rounds table. This means that you don't need to SUM() the individual scores every time to get the tallies for each Player's Round. That's just a suggestion for performance optimisation.
Source: https://app.quickdatabasediagrams.com/#/schema/x_amshIckkeGp8KAKEAmLQ
If it is possible to play the same course twice in the same event (the first and last matches could be on the same course, for example) then you should provide for that.
I have two other suggestions:
I think the relationship between Events and Venues is in the wrong direction.
I suggest splitting Players into two tables. One representing the human being, and the other representing the human's participation in the round. Perhaps "Person" and "Contestant" would be good names.

Optimized search.How to reduce the complexity ?

Here is a problem I'm trying to solve using graph algorithms. Answer to this question is easy if one is familiar with different graph traversal algorithms. What I want to learn is how can we reduce the complexity of this problem?
Let say we have to traverse in someone's network - Friends, Friends of
Friends (FoF) and FoFoF (1st, 2nd, 3rd Degree.. up to 6th degree) to
search for a particular thing, say 'people living in California'. The
complexity of the problem greatly increases when you have 1000 friends
and your 1000 friends have 1000 friends each and so on.
Let's say we want to do an optimized search, where you know the
destination node (here, a person living in California). How will you
reduce the complexity of the problem?
The program you submit should return the degree by which that person
is connected to you. [where the 'destination node' is your Degree 1st
(Friend), or 2nd (friend of friend) or 3rd Degree (FoFoF) or a Degree
greater than 3rd degree].
Assuming your graph is unweighted, doing Breadth First Search will give you shortest paths (which effectively are the degrees that you need). If the destination is known you can also use Dijkstra's Algorithm to find a shortest path to that specific node, although if the graph is unweighted just doing the BFS will be more efficient as it's complexity is lower than Dijkstra's. Also if I understand correctly your output has to cover only 4 cases: Degrees 1,2,3 or higher than that. If so, you can just BFS the first three levels and store the results. Then you can answer the question in constant time by checking for the existence of such person in the data obtained via BFS.

Heuristics to find "surprising" mutual friends

I have the undirected friends graph of my Facebook friends, G i.e. G[i][j] = G[j][i] = true iff my friend i and friend j are friends with each other. I want to find "surprising" mutual friends i.e. pairs of my friends I normally would not expect to know each other. What are some good heuristics/algorithms I can apply? My initial idea is to run a clustering algorithm (not sure which one is the best) and see if I can find edges going across clusters. Any other ideas? What's a good clustering algorithm I can use that takes in a G and spits out clusters.
Here is my idea. Friendship is an edge. Surprising friendship is an edge, such that if you remove the edge, the distance between the two nodes becomes very large.
The answer by Wu Yongzheng can be tied to an existing network concept that is a robust and perhaps more sensitive measure, i.e. a quantitative take on the distance between the nodes becomes very large. This concept is edge betweenness. In this context one would compute an estimated version. See e.g. https://en.wikipedia.org/wiki/Betweenness_centrality and http://igraph.sourceforge.net/doc/R/betweenness.html.

Facebook Programming Challenge - ByteLand

ByteLand 
Byteland consists of N cities numbered 1..N. There are M roads connecting some pairs of cities. There are two army divisions, A and B, which protect the kingdom. Each city is either protected by army division A or by army division B.
 
You are the ruler of an enemy kingdom and have devised a plan to destroy Byteland. Your plan is to destroy all the roads in Byteland disrupting all communication. If you attack any road, the armies from both the cities that the road connects comes for its defense. You realize that your attack will fail if there are soldiers from both armies A and B defending any road.
 
So you decide that before carrying out this plan, you will attack some of the cities and defeat the army located in the city to make your plan possible. However, this is considerably more difficult. You have estimated that defeating the army located in city i will take up ci amount of resources. Your aim now is to decide which cities to attack so that your cost is minimum and no road should be protected from both armies A and B.
 
----Please tell me if this approach is correct----
We need to sort the cities in terms of resources required to destroy the city. For each city we need to ask the following questions:
1) Did deletion of the previous city NOT result into a state which can destroy Byteland?
2) Does it connect any road?
3) Does it connect any road which is armed by a different city?
If all of these conditions are true, we'll proceed towards destroying the city and record the total cost incurred so far and also determine if destruction of this city will lead to overall destruction of Byteland.
Since the cities are arranged in increasing order of the cost incurred, we can stop wherever we find the desired set of deletions.
You need only care about roads that link two cities with different armies - links between A and B or links between B and A, so let's delete all links from A to A or B to B.
You want to find a set of points such that each link has at least one point on it, which is a minimum weight vertex cover. On an arbitrary graph this would be NP-complete. However, your graph only ever has nodes of type A linked to nodes of type B, or the reverse - it is a bipartite graph with these two types of nodes as the two parties. So you can find a minimum weight vertex cover by using an algorithm for finding minimum weight vertex covers on bipartite graphs. Searching for this, I find e.g. http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-854j-advanced-algorithms-fall-2008/assignments/sol5.pdf
mcdowella,
But the vertices have a cost to them and the minimum vertex cover would not produce the right vertices to remove. Imagine 2 vertices (A army) pointing to the third one (B). First two vertices cost 1 each, where the third one costs 5. A minimum vertex cover would return the third one - but removing the third one costs more than removing both nodes with cost 1 + 1.
We would probably need some modified version of a minimum vertex cover here.