Optimized search: how to reduce the complexity? - facebook

Here is a problem I'm trying to solve using graph algorithms. The answer to this question is easy if one is familiar with different graph traversal algorithms. What I want to learn is how we can reduce the complexity of this problem.
Let's say we have to traverse someone's network - Friends, Friends of
Friends (FoF) and FoFoF (1st, 2nd, 3rd Degree.. up to 6th degree) to
search for a particular thing, say 'people living in California'. The
complexity of the problem greatly increases when you have 1000 friends
and your 1000 friends have 1000 friends each and so on.
Let's say we want to do an optimized search, where you know the
destination node (here, a person living in California). How will you
reduce the complexity of the problem?
The program you submit should return the degree by which that person
is connected to you. [where the 'destination node' is your Degree 1st
(Friend), or 2nd (friend of friend) or 3rd Degree (FoFoF) or a Degree
greater than 3rd degree].

Assuming your graph is unweighted, doing a Breadth First Search will give you shortest paths (which effectively are the degrees that you need). If the destination is known you can also use Dijkstra's Algorithm to find a shortest path to that specific node, although if the graph is unweighted just doing the BFS will be more efficient, as its complexity is lower than Dijkstra's. Also, if I understand correctly, your output has to cover only 4 cases: degrees 1, 2, 3 or higher than that. If so, you can just BFS the first three levels and store the results. Then you can answer the question in constant time by checking for the existence of such a person in the data obtained via BFS.
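A minimal sketch of the depth-limited BFS described above, assuming the network fits in memory as a plain dict mapping each person to a list of friends. The function name, the `max_degree` parameter and the sample data are made up for illustration.

```python
from collections import deque

def degree_of_separation(graph, source, target, max_degree=3):
    """Return the degree at which target is connected to source,
    or None if target is farther away than max_degree."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        person, degree = queue.popleft()
        if degree == max_degree:
            continue  # do not expand beyond the cutoff level
        for friend in graph.get(person, ()):
            if friend in visited:
                continue
            if friend == target:
                return degree + 1
            visited.add(friend)
            queue.append((friend, degree + 1))
    return None

# Hypothetical sample network: a chain me - ann - cat - dan - eve, plus bob.
friends = {
    "me": ["ann", "bob"],
    "ann": ["me", "cat"],
    "bob": ["me"],
    "cat": ["ann", "dan"],
    "dan": ["cat", "eve"],
    "eve": ["dan"],
}
print(degree_of_separation(friends, "me", "dan"))  # 3
print(degree_of_separation(friends, "me", "eve"))  # None (more than 3 degrees away)
```

Returning None stands in for the "greater than 3rd degree" case. Stopping as soon as the target is seen avoids expanding the rest of that level.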

Related

Understanding Titan Traversals

I am trying to write a highly scalable system with titandb. I have a situation where some nodes are highly connected.
Imagine the following example at much larger scale.
Now I have the following situations:
I want to find all the friends of node X.
I want to find a specific friend of node X for example 5.
For scenario 1 I do: g.V(X).out(friend).toList(). For scenario 2 I do: g.V(X).out(friend).hasId(5).next(). Both of these traversals will work but scale poorly as X gets more friends. Can I optimise this situation by putting more information on the edge label? For example, if on the edge between X and 5 I change the label to freind_with_5, will the following be faster:
`g.V(X).out(freind_with_5).next()`
From my understanding this will be faster as only 1 edge will be traversed. However, if I make such a change to my edge labels, how would I find all the friends of X?
You could encode data into your edge label, but you would do so at the cost of complicating your graph schema considerably and, as you note, making it hard to do simple things like "find all my friends". I don't think you should take that approach.
The preferred method for dealing with this is with vertex-centric indices. If you denormalize any data to your edges, you should do it with those indices in mind (and not by encoding that data into the edge label). Put some unique identifier for the friend on the "friend" edge and index that.
If your supernodes are especially large (millions+ edges) you should also consider Titan's vertex partitioning feature.

Heuristics to find "surprising" mutual friends

I have the undirected friends graph of my Facebook friends, G i.e. G[i][j] = G[j][i] = true iff my friend i and friend j are friends with each other. I want to find "surprising" mutual friends i.e. pairs of my friends I normally would not expect to know each other. What are some good heuristics/algorithms I can apply? My initial idea is to run a clustering algorithm (not sure which one is the best) and see if I can find edges going across clusters. Any other ideas? What's a good clustering algorithm I can use that takes in a G and spits out clusters.
Here is my idea. Friendship is an edge. Surprising friendship is an edge, such that if you remove the edge, the distance between the two nodes becomes very large.
The answer by Wu Yongzheng can be tied to an existing network concept that is a robust and perhaps more sensitive measure, i.e. a quantitative take on "the distance between the nodes becomes very large". This concept is edge betweenness. In this context one would compute an estimated version. See e.g. https://en.wikipedia.org/wiki/Betweenness_centrality and http://igraph.sourceforge.net/doc/R/betweenness.html.
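To make the "remove the edge and see how far apart the two nodes become" heuristic concrete, here is a rough sketch in plain Python. It assumes the friends graph is stored as a dict of adjacency sets rather than the G[i][j] matrix, and the function names are made up. It scores each edge by the detour length, which is the simpler distance idea rather than true edge betweenness.

```python
from collections import deque

def detour_length(adj, u, v):
    """Shortest path length from u to v when the direct edge u-v is ignored.
    Returns infinity if the edge is the only connection."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if {node, nxt} == {u, v}:
                continue  # skip the edge under test
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                if nxt == v:
                    return dist[nxt]
                queue.append(nxt)
    return float("inf")

def rank_surprising_edges(adj):
    """Return (detour, a, b) tuples, most surprising friendship first."""
    edges = {tuple(sorted((a, b))) for a in adj for b in adj[a]}
    return sorted(((detour_length(adj, a, b), a, b) for a, b in edges), reverse=True)

# Hypothetical example: two tight triangles joined by a single friendship c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(rank_surprising_edges(adj)[0])  # (inf, 'c', 'd')
```

This runs one BFS per edge, which is fine for a few hundred friends. For larger graphs an approximate edge betweenness from a graph library would probably be the better tool.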

How to generate recommendation with matrix factorization

I've read some papers on matrix factorization (latent factor models) in recommender systems, and I can implement the algorithm. I get RMSE results on the MovieLens dataset similar to those reported in the papers.
However, I find that if I generate a top-K (e.g. K=10) list of recommended movies for every user by ranking the predicted ratings, the movies predicted to be rated highest are the same for all users.
Is that just how it works, or have I got something wrong?
This is a known problem in recommendation.
It is sometimes called the "Harry Potter" effect - (almost) everybody likes Harry Potter.
So most automated procedures will find out which items are generally popular, and recommend those to the users.
You can either filter out very popular items, or multiply the predicted rating by a factor that is lower the more globally popular an item is.
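As a rough illustration of the second option, the sketch below damps each predicted rating by a factor that shrinks as the item's global rating count grows. The particular penalty formula, the `alpha` parameter and the sample numbers are assumptions for the example, not a standard recipe. You would tune them on your own data.

```python
import math

def rerank(predicted, rating_counts, alpha=0.5, k=10):
    """predicted: item -> predicted rating for one user.
    rating_counts: item -> number of ratings the item has overall.
    Returns the top-k items after penalising globally popular ones."""
    def adjusted(item):
        # Hypothetical penalty: 1 for unrated items, smaller for popular ones.
        penalty = 1.0 / (1.0 + alpha * math.log1p(rating_counts.get(item, 0)))
        return predicted[item] * penalty
    return sorted(predicted, key=adjusted, reverse=True)[:k]

# Made-up numbers for one user.
predicted = {"harry_potter": 4.8, "niche_film": 4.5}
counts = {"harry_potter": 50000, "niche_film": 40}
print(rerank(predicted, counts))             # ['niche_film', 'harry_potter']
print(rerank(predicted, counts, alpha=0.0))  # ['harry_potter', 'niche_film']
```

Setting `alpha` to 0 switches the penalty off and gives back the plain ranking by predicted rating.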

Calculation route length

I have a map with about 80 annotations. I would like to do 3 things.
1) From my current location, I would like to know the actual route distance to that position. Not the linear distance.
2) I want to be able to show a list of all the annotations, but for every annotation (having lon/lat) I would like to know the actual route distance from my position to that position.
3) I would like to know the closest annotation to my position using route distance. Not linear distance.
I think the answer to all these three points will be the same. But please keep in mind that I don't want to create a route, I just want to know the distance to the annotation.
I hope someone can help me.
Best regards,
Paul Peelen
From what I understand of your post, I believe you seek the Haversine formula. Luckily for you, there are a number of Objective-C implementations, though writing your own is trivial once the formula's in front of you.
I originally deleted this because I didn't notice that you didn't want linear distance at first, but I'm bringing it back in case you decide that an approximation is good enough at that particular point of the user interaction.
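For reference, the Haversine formula looks roughly like this (sketched in Python rather than Objective-C, for brevity). It gives the great-circle distance, so it is only the straight-line approximation discussed above, not the route distance. The function names and the mean Earth radius of 6371 km are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def closest_annotation(here, annotations):
    """here: (lat, lon); annotations: list of (name, lat, lon) tuples."""
    return min(annotations, key=lambda a: haversine_km(here[0], here[1], a[1], a[2]))
```

With about 80 annotations, sorting the list by this value is cheap, and the exact route distance could then be requested only for the few nearest candidates.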
I think, as pointed out before, your query would be extremely heavy for the Google Maps API if you perform exactly what you are describing. Do you need all that information at once? Maybe it would be good enough at first to query just some of the distances, based on some heuristic or on the user's needs.
To obtain the distances, you could use a Google Maps GDirections object... as pointed out here (at the bottom of the page there's a "Routes and Steps" section, with an advanced example).
"The GDirections object also supports multi-point directions, which can be constructed using the GDirections.loadFromWaypoints() method. This method takes an array of textual input addresses or textual lat/lon points. Each separate waypoint is computed as a separate route and returned in a separate GRoute object, each of which contains a series of GStep objects."
Using the Google Maps API in the iPhone shouldn't be too difficult, and I think your question doesn't cover that, but if you need some basic example, you could look at this question, and scroll to the answer.
Good Luck!
Calculating route distance to about 80 locations is certain to be computationally intensive on Google's part and I can't imagine that you would be able to make those requests to the Google Maps API, were it possible to do so on a mobile device, without being severely limited by either the phone connection or rate limits on the server.
Unfortunately, calculating route distance rather than geometric distance is a very expensive computation involving a lot of data about the area - data you almost certainly don't have. This means, unfortunately, that this isn't something that Core Location or MapKit can help you with.
What problem are you trying to solve, exactly? There may be heuristics other than route distance you can use to approximate some sort of distance ranking.

Dijkstra algorithm for iPhone

It is possible to easily use the GPS functionality in the iPhone since sdk 3.0, but it is explicitly forbidden to use Google's Maps.
This has two implications, I think:
You will have to provide maps yourself
You will have to calculate the shortest routes yourself.
I know that calculating the shortest route has puzzled mathematicians for ages, but both Tom Tom and Google are doing a great job, so that issue seems to have been solved.
Searching on the 'net, not being a mathematician myself, I came across the Dijkstra Algorithm. Is there anyone of you who has successfully used this algorithm in a Maps-like app in the iPhone?
Would you be willing to share it with me/the community?
Would this be the right approach, or are there other options?
Thank you so much for your consideration.
I do not believe Dijkstra's algorithm would be useful for real-world mapping because, as Tom Leys said (I would comment on his post, but lack the rep to do so), it requires a single starting point. If the starting point changes, everything must be recalculated, and I would imagine this would be quite slow on a device like the iPhone for a significantly large data set.
Dijkstra's algorithm is for finding the shortest path to all nodes (from a single starting node). Game programmers use a directed search such as A*. Where Dijkstra processes the node that is closest to the starting position first, A* processes the one that is estimated to be nearest to the end position.
The way this works is that you provide a cheap "estimate" function from any given position to the end point. A good example is how far a bird would fly to get there. A* adds this to the current distance from the start for each node and then chooses the node that seems to be on the shortest path.
The better your estimate, the shorter the time it will take to find a good path. If this time is still too long, you can do a path find on a simple map and then another on a more complex map to find the route between the places you found on the simple map.
Update
After much searching, I have found an article on A* for you to read
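A bare-bones sketch of the directed search described above, using the "as the bird flies" distance as the estimate. It assumes each node has planar (x, y) coordinates and that every edge length is at least the straight-line distance between its endpoints. The data layout and names are invented for the example.

```python
import heapq
import math

def a_star(graph, coords, start, goal):
    """graph: node -> list of (neighbour, edge_length); coords: node -> (x, y).
    Returns (total_length, path), or (inf, []) if goal is unreachable."""
    def estimate(node):
        # Cheap estimate: straight-line distance from node to the goal.
        (x1, y1), (x2, y2) = coords[node], coords[goal]
        return math.hypot(x2 - x1, y2 - y1)

    best = {start: 0.0}   # best known distance from the start
    came_from = {}
    frontier = [(estimate(start), start)]
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return best[goal], path[::-1]
        for nxt, length in graph.get(node, ()):
            cost = best[node] + length
            if cost < best.get(nxt, math.inf):
                best[nxt] = cost
                came_from[nxt] = node
                # Priority = distance so far + estimated distance remaining.
                heapq.heappush(frontier, (cost + estimate(nxt), nxt))
    return math.inf, []

# Hypothetical four-junction map; the C-D road is long and winding.
coords = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1)}
roads = {
    "A": [("B", 1), ("C", 1)],
    "B": [("A", 1), ("D", 1)],
    "C": [("A", 1), ("D", 3)],
    "D": [("B", 1), ("C", 3)],
}
print(a_star(roads, coords, "A", "D"))  # (2.0, ['A', 'B', 'D'])
```

For real map data the estimate would be a great-circle distance (or that distance divided by the top speed if the edge weights are travel times).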
Dijkstra's algorithm is O(m log n) for n nodes and m edges (for a single path) and is efficient enough to be used for network routing. This means that it's efficient enough to be used for a one-off computation.
Briefly, Dijkstra's algorithm works like:
Take the start node
Assign it a depth of zero
Insert it into a priority queue at its depth key
Repeat:
    Pop the node with the lowest depth from the priority queue
    Record the node that you came from so you can track the path back
    Mark the node as having been visited
    If this node is the destination:
        Break
    For each neighbour:
        If the node has not previously been visited:
            Calculate depth as depth of current node + distance to neighbour
            Insert neighbour into the priority queue at the calculated depth.
Return the destination node and list of the nodes through which it was reached.
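The steps above translate fairly directly into Python with `heapq` as the priority queue. This is a sketch, assuming the graph is a dict mapping each node to a list of (neighbour, distance) pairs with non-negative distances and that node names are comparable (e.g. strings). It adds one check the outline leaves implicit: a node can sit in the queue more than once, so entries for already-visited nodes are skipped when popped.

```python
import heapq

def dijkstra(graph, start, destination):
    """Return (depth, path) for the shortest route, or None if unreachable."""
    queue = [(0, start, None)]  # (depth, node, node we came from)
    came_from = {}
    visited = set()
    while queue:
        depth, node, previous = heapq.heappop(queue)
        if node in visited:
            continue  # stale entry: node was already reached by a shorter route
        visited.add(node)
        came_from[node] = previous
        if node == destination:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = came_from[node]
            return depth, path[::-1]
        for neighbour, distance in graph.get(node, ()):
            if neighbour not in visited:
                heapq.heappush(queue, (depth + distance, neighbour, node))
    return None

# Hypothetical street graph with distances between intersections.
streets = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2)],
    "C": [("D", 1)],
}
print(dijkstra(streets, "A", "D"))  # (4, ['A', 'B', 'C', 'D'])
```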
Contrary to popular belief, Dijkstra's algorithm is not necessarily an all-pairs shortest path calculator, although it can be adapted to do this.
You would have to get a graph of the streets and intersections with the distances between the intersections. If you had this data you could use Dijkstra's algorithm to compute a shortest route.
If you look at the technology TomTom calls 'IQ Routes', they measure actual speed and travel time per road stretch per time of day. This makes the arrival time more accurate, so the expected arrival time is more fact-based: http://www.tomtom.com/page/iq-routes
Calculating a route using the A* algorithm is plenty fast enough on an iPhone with offline map data. I have experience of doing this commercially. I use the A* algorithm as documented on Wikipedia, and I keep the road network in memory and re-use it; once it's loaded, routing even over a large area like Spain or the western half of Canada is practically instant.
I take data from OpenStreetMap or elsewhere and convert it into a directed graph, assuming (which is the right way to do it according to those who know) that any two roads sharing a point with the same ID are joined. I assign weights to different types of roads based on expected speeds, and if a portion of a road is one-way I create only a single arc; two-way roads get two arcs, one in each direction. That's pretty much the whole thing apart from some ad-hoc code to prevent dangerous turns, and implementing routing restrictions.
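To give an idea of what that conversion might look like, here is a simplified sketch. The input format (a list of dicts with `nodes`, `lengths_km`, `kind` and `oneway` keys) and the speed table are invented for illustration. Real OpenStreetMap data needs parsing first, and this is not the commercial code mentioned above.

```python
from collections import defaultdict

# Hypothetical expected speeds per road type, in km/h.
SPEED_KMH = {"motorway": 110, "primary": 80, "residential": 30}

def build_graph(ways):
    """Turn a list of ways into a directed graph: node -> [(neighbour, hours)].
    Ways that share a node ID end up joined automatically."""
    graph = defaultdict(list)
    for way in ways:
        speed = SPEED_KMH[way["kind"]]
        nodes = way["nodes"]
        for a, b, km in zip(nodes, nodes[1:], way["lengths_km"]):
            hours = km / speed  # weight = expected travel time
            graph[a].append((b, hours))
            if not way["oneway"]:
                graph[b].append((a, hours))  # two-way roads get an arc in each direction
    return graph

ways = [
    {"nodes": [1, 2, 3], "lengths_km": [11, 22], "kind": "motorway", "oneway": True},
    {"nodes": [3, 4], "lengths_km": [3], "kind": "residential", "oneway": False},
]
graph = build_graph(ways)
```

The resulting dict has the same node-to-(neighbour, weight) shape that a typical A* or Dijkstra implementation expects.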
This was discussed earlier here: What algorithms compute directions from point a to point b on a map?
Have a look at CloudMade. They offer a free service for iPhone and iPad that allows navigation based on your current location. It is built on OpenStreetMap and has some nifty features like making your own map style. It is a little slow from time to time but it's totally free.