Personalized Page Rank - recommendation-engine

I have been trying to wrap my head around the personalized PageRank algorithm and how it works. I came across this paper, which gives a graph (see link to image below) with weights calculated by PPR. I am having trouble reproducing the calculations with the models they give.
Can anyone break it down for me to help me wrap my head around the concept?
Thanks!

The paper is a good reference for personalized PageRank. My basic understanding is that a PPR score tells you the probability that a random walk starting from (and restarting at) the source node ends up at the target node; it is a score that describes the relationship between a specific source node and a specific target node in the graph.
If you have problems reproducing the results, you can use NetworkX in Python: load the graph and compute PPR with
networkx.pagerank(graph, personalization={'a': 0, 's': 1, 'b': 0, ...})
NetworkX uses a power-iteration approach to compute PPR, so you can get exactly the results shown in the example.
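A minimal sketch of that call (the tiny example graph and node names here are made up, not the one from the paper):

import networkx as nx

# hypothetical undirected graph
G = nx.Graph()
G.add_edges_from([('s', 'a'), ('s', 'b'), ('a', 'b'), ('b', 'c')])

# personalization: restart only at the source node 's'
ppr = nx.pagerank(G, alpha=0.85, personalization={'s': 1, 'a': 0, 'b': 0, 'c': 0})
print(ppr)  # dict of node -> PPR score with respect to source 's'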
The author of this thesis has C++ code here: https://github.com/snap-stanford/snap/blob/master/snap-core/randwalk.h Since that method is a random-walk-based approach, you will not get exactly the same results as shown in the example, but the ranking is correct.
Hope that helps.

Related

pgRouting with custom network?

I have a cost network, but it's not a street-mapping network. I know the nodes and edges, as I defined them. pgRouting looks like a good choice, but every single example I can find uses OpenStreetMap as the data. I don't have GPS coordinates: the x1, y1 for nodes makes no sense in my graphs; my nodes have specific ids, not coordinates. The costs aren't calculated from coordinates; they're assigned by me to the various edges based on knowledge specific to my domain.
Are there any examples of how to create a custom network in pgRouting? I'm really struggling because the examples are "and then you use this tool to import OSM data"...which doesn't help me at all.
@Chris Kessel
I don't know if this is still relevant, but it may help others:
Basically, what you need is a table of edges, where the 'source' column holds the id of the node on one end of the edge and the 'target' column holds the id of the node on the other end. You also have to have a defined cost for each edge; I'm not sure what this will be for you - usually it's distance or time units.
Usually this table is built from the geometry using the pgr_createTopology function, but in your case you will need to just create it yourself, I suppose.
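As a rough sketch of what that looks like (table and column names are placeholders, and I'm driving it from Python with psycopg2 only for illustration; the substance is the SQL):

import psycopg2

# hypothetical connection settings
conn = psycopg2.connect("dbname=mynetwork user=postgres")
cur = conn.cursor()

# edge table with your own node ids and hand-assigned costs (no geometry needed)
cur.execute("""
    CREATE TABLE IF NOT EXISTS edges (
        id     serial PRIMARY KEY,
        source bigint,
        target bigint,
        cost   double precision
    );
""")
cur.execute("INSERT INTO edges (source, target, cost) VALUES (1, 2, 5.0), (2, 3, 2.5), (1, 3, 9.0);")
conn.commit()

# shortest path from node 1 to node 3 over the custom network
cur.execute("""
    SELECT * FROM pgr_dijkstra(
        'SELECT id, source, target, cost FROM edges',
        1, 3, false);
""")
print(cur.fetchall())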
I think this link can help you:
https://anitagraser.com/2011/02/07/a-beginners-guide-to-pgrouting/
The answer to the question "Are there any examples of how to create a custom network in pgRouting?" is: yes, there are.

Multiple object tracking using radar data and extended kalman filter

Thanks in advance.
I am new to the field of multiple object tracking and have been working on this for a couple of days. I have developed a first version of a single-object tracker using an extended Kalman filter, estimating position and velocity under a constant acceleration model. My question is: how can I extend the existing model to multiple-object tracking? The main problem is that I am using radar data, so I am not able to find references for developing the tracker. A good example, or the steps to achieve this, would help me understand the concept.
The answer to this question depends on a lot of things. For example, how much control and knowledge do you have over the whole system? If you know how many targets you need to track you can add all of them to the Kalman Filter state and for every measurement you perform data association to find out to which object a given measurement belongs. An easy association metric would be nearest neighbor.
If you don't know how many targets there will be, you will want to implement track management, where each target you are tracking is represented by a track, and you can model birth and death probabilities of targets.
Multi Target Tracking is a vast field and if you want to have an in-depth mathematical introduction I would recommend the 2015 survey paper "Multitarget Tracking" by Ba-Ngu Vo et al. You should be able to find a preprint pdf online.
If you are looking for something more lightweight, it should be possible to find a tutorial or example code online to start from. As mentioned in the first paragraph, nearest-neighbour association for a fixed number of objects might be a good first step.
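As a very rough illustration of that nearest-neighbour idea (not a full tracker; the 2-D position layout and the gate threshold are assumptions for the sketch):

import numpy as np

def associate_nearest_neighbour(track_positions, measurements, gate=5.0):
    """Greedy nearest-neighbour data association.
    track_positions: (n_tracks, 2) predicted x/y positions from each track's EKF
    measurements:    (n_meas, 2) x/y positions derived from the radar returns
    Returns a dict track_index -> measurement_index (unassigned tracks omitted)."""
    assignments = {}
    used = set()
    for t, pos in enumerate(track_positions):
        dists = np.linalg.norm(measurements - pos, axis=1)
        for m in used:              # ignore measurements already claimed by another track
            dists[m] = np.inf
        m_best = int(np.argmin(dists))
        if dists[m_best] < gate:    # simple gating to reject far-away measurements
            assignments[t] = m_best
            used.add(m_best)
    return assignments

# Each assigned measurement is then fed to the corresponding track's EKF update step.
# Unassigned measurements can spawn new tracks, and tracks that go unassigned for
# several scans can be deleted (simple track management).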

Grouping similar words (bad, worse)

I know there are ways to find synonyms, either by using NLTK/pywordnet or the Pattern package in Python, but that isn't solving my problem.
If there are words like
bad,worst,poor
bag,baggage
lost,lose,misplace
I am not able to capture them. Can anyone suggest a possible way?
There has been a lot of research in this area over the past 20 years. Yes, computers don't understand language, but we can train them to find the similarity or difference between two words with the help of some manual effort.
Approaches may be:
Based on manually curated datasets that contain how words in a language are related to each other.
Based on statistical or probabilistic measures of words appearing in a corpus.
Method 1:
Try WordNet. It is a human-curated network of words which preserves the relationships between words according to human understanding. In short, it is a graph whose nodes are 'synsets' (sets of synonyms) and whose edges are relations between them, so any two words that are close to each other in the graph are close in meaning; words that fall within the same synset may mean exactly the same thing. 'Bag' and 'baggage' are close, which you can discover either by iteratively exploring node to node in a breadth-first style - starting with 'bag' and exploring its neighbours in an attempt to find 'baggage'. You'll have to limit this search to a small number of iterations for any practical application. Another style is starting a random walk from one node and trying to reach the other node within a limited number of moves: if you reach 'baggage' from 'bag', say, 500 times out of 1000 within 10 moves, you can be pretty sure they are very similar to each other. Random walks are more helpful in much larger and more complex graphs.
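A quick sketch using NLTK's WordNet interface (you need the wordnet corpus downloaded; path_similarity is just one of several similarity measures, and picking the first synset of each word is a simplification):

import nltk
nltk.download('wordnet')  # one-off download of the corpus
from nltk.corpus import wordnet as wn

bag = wn.synsets('bag')[0]          # first (most common) synset for 'bag'
baggage = wn.synsets('baggage')[0]
print(bag.path_similarity(baggage)) # closer to 1.0 means closer in the WordNet graph

# synonyms of 'bad' via the lemmas of its synsets
print({lemma.name() for syn in wn.synsets('bad') for lemma in syn.lemmas()})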
There are many other similar resources online.
Method 2:
Word2Vec. It is hard to explain here, but it works by creating a vector (with a user-specified number of dimensions) for each word, based on its context in the text. The idea, which has been around for decades, is that words appearing in similar contexts mean similar things; e.g. "I'm gonna check out my bags" and "I'm gonna check out my baggage" might both appear in text. You can read the paper for the explanation (link at the end).
So you can train a Word2Vec model over a large corpus. In the end, you will be able to get a 'vector' for each word. You do not need to understand the significance of this vector; you can use this vector representation to find the similarity or difference between words, or to generate synonyms of any word. The idea is that words which are similar to each other have vectors that are close to each other.
Word2vec came out two years ago and immediately became the thing to use in most NLP applications. The quality of this approach depends on the amount and quality of your data. A Wikipedia dump is generally considered good training data, as it contains articles about almost everything. You can easily find ready-to-use models trained on Wikipedia online.
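Training your own model with gensim looks roughly like this (the two-sentence corpus is a toy placeholder for a real tokenized corpus, and the parameter names differ slightly between gensim versions, e.g. size vs vector_size):

from gensim.models import Word2Vec

# toy corpus: in practice, use millions of tokenized sentences (e.g. from a Wikipedia dump)
sentences = [
    ['i', 'am', 'gonna', 'check', 'out', 'my', 'bags'],
    ['i', 'am', 'gonna', 'check', 'out', 'my', 'baggage'],
]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
print(model.wv.most_similar('bags', topn=3))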
A tiny example from Radim's website:
>>> model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)
[('queen', 0.50882536)]
>>> model.doesnt_match("breakfast cereal dinner lunch".split())
'cereal'
>>> model.similarity('woman', 'man')
0.73723527
The first example gives you the word closest (topn=1) to the words 'woman' and 'king' while also being furthest from the word 'man'; the answer is 'queen'. The second example finds the odd one out. The third tells you how similar the two words are in your corpus.
An easy-to-use tool for Word2vec:
https://radimrehurek.com/gensim/models/word2vec.html
http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf (Warning : Lots of Maths Ahead)

How to generate recommendation with matrix factorization

I've read some papers on matrix factorization (latent factor models) in recommender systems, and I can implement the algorithm. I can get a similar RMSE to what the papers report on the MovieLens dataset.
However, I find that if I try to generate a top-K (e.g. K=10) recommended movie list for every user by ranking the predicted ratings, the movies predicted to be rated highly are the same for all users.
Is that just how it works, or have I got something wrong?
This is a known problem in recommendation.
It is sometimes called "Harry Potter" effect - (almost) everybody likes Harry Potter.
So most automated procedures will find out which items are generally popular, and recommend those to the users.
You can either filter out very popular items, or multiply the predicted rating by a factor that is lower the more globally popular an item is.
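One way to implement the second option, as a rough sketch (the exponent alpha is a hypothetical tuning knob you would pick by validation):

import numpy as np

def rerank_with_popularity_penalty(pred_ratings, item_counts, alpha=0.5, k=10):
    """pred_ratings: (n_items,) predicted ratings for one user
    item_counts:  (n_items,) how often each item was rated in the training set
    alpha:        strength of the popularity penalty (0 = plain top-K by rating)
    Returns the indices of the top-k items after penalizing globally popular ones."""
    scores = pred_ratings / np.power(item_counts + 1.0, alpha)
    return np.argsort(-scores)[:k]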

Dijkstra algorithm for iPhone

It has been possible to easily use the GPS functionality in the iPhone since SDK 3.0, but it is explicitly forbidden to use Google's Maps.
This has two implications, I think:
You will have to provide maps yourself
You will have to calculate the shortest routes yourself.
I know that calculating the shortest route has puzzled mathematicians for ages, but both TomTom and Google are doing a great job, so that issue seems to have been solved.
Searching on the 'net, not being a mathematician myself, I came across the Dijkstra Algorithm. Is there anyone of you who has successfully used this algorithm in a Maps-like app in the iPhone?
Would you be willing to share it with me/the community?
Would this be the right approach, or are there other options?
Thank you so much for your consideration.
I do not believe Dijkstra's algorithm would be useful for real-world mapping because, as Tom Leys said (I would comment on his post, but lack the rep to do so), it requires a single starting point. If the starting point changes, everything must be recalculated, and I would imagine this would be quite slow on a device like the iPhone for a significantly large data set.
Dijkstra's algorithm is for finding the shortest path to all nodes from a single starting node. Game programmers use a directed search such as A*: where Dijkstra processes the node that is closest to the starting position first, A* processes the one that is estimated to be nearest to the end position.
The way this works is that you provide a cheap "estimate" function from any given position to the end point. A good example is how far a bird would fly to get there. A* adds this to the current distance from the start for each node and then chooses the node that seems to be on the shortest path.
The better your estimate, the shorter the time it will take to find a good path. If this time is still too long, you can do a path find on a simple map and then another on a more complex map to find the route between the places you found on the simple map.
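A compact sketch of that idea, using straight-line distance as the "how far a bird would fly" estimate (the graph and coordinate formats are assumptions made for the example):

import heapq, math

def a_star(graph, coords, start, goal):
    """graph: {node: [(neighbour, edge_cost), ...]}, coords: {node: (x, y)}"""
    def h(n):  # heuristic: straight-line distance from n to the goal
        (x1, y1), (x2, y2) = coords[n], coords[goal]
        return math.hypot(x2 - x1, y2 - y1)

    open_set = [(h(start), 0.0, start, [start])]   # (estimated total, cost so far, node, path)
    best = {}
    while open_set:
        est, cost, node, path = heapq.heappop(open_set)
        if node == goal:
            return cost, path
        if cost >= best.get(node, float('inf')):
            continue                               # already reached this node more cheaply
        best[node] = cost
        for nbr, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            heapq.heappush(open_set, (new_cost + h(nbr), new_cost, nbr, path + [nbr]))
    return None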
Update
After much searching, I have found an article on A* for you to read.
Dijkstra's algorithm is O(m log n) for n nodes and m edges (for a single path) and is efficient enough to be used for network routing. This means that it's efficient enough to be used for a one-off computation.
Briefly, Dijkstra's algorithm works like this:
Take the start node
Assign it a depth of zero
Insert it into a priority queue keyed by its depth
Repeat:
  Pop the node with the lowest depth from the priority queue
  Record the node you came from so you can track the path back
  Mark the node as visited
  If this node is the destination:
    Break
  For each neighbour:
    If the neighbour has not previously been visited:
      Calculate its depth as the depth of the current node + the distance to the neighbour
      Insert the neighbour into the priority queue at the calculated depth
Return the destination node and the list of nodes through which it was reached.
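A runnable version of those steps in Python, using heapq as the priority queue (the adjacency-list graph format is an assumption made for the sketch):

import heapq

def dijkstra(graph, start, goal):
    """graph: {node: [(neighbour, distance), ...]} -> (total distance, path) or None."""
    queue = [(0, start, [start])]     # (depth, node, path back to the start)
    visited = set()
    while queue:
        depth, node, path = heapq.heappop(queue)   # node with the lowest depth
        if node in visited:
            continue
        visited.add(node)                          # mark as visited
        if node == goal:                           # destination reached
            return depth, path
        for neighbour, dist in graph.get(node, []):
            if neighbour not in visited:
                # depth of current node + distance to the neighbour
                heapq.heappush(queue, (depth + dist, neighbour, path + [neighbour]))
    return None

# example on a toy graph
roads = {'a': [('b', 2), ('c', 5)], 'b': [('c', 1), ('d', 4)], 'c': [('d', 1)], 'd': []}
print(dijkstra(roads, 'a', 'd'))   # (4, ['a', 'b', 'c', 'd'])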
Contrary to popular belief, Dijkstra's algorithm is not necessarily an all-pairs shortest path calculator, although it can be adapted to do this.
You would have to get a graph of the streets and intersections with the distances between the intersections. If you had this data you could use Dijkstra's algorithm to compute a shortest route.
If you look at the technology TomTom calls 'IQ Routes', they measure actual speeds and travel times per road stretch per time of day. This makes the arrival time more accurate, so the expected arrival time is more fact-based: http://www.tomtom.com/page/iq-routes
Calculating a route using the A* algorithm is plenty fast enough on an iPhone with offline map data. I have experience of doing this commercially. I use the A* algorithm as documented on Wikipedia, and I keep the road network in memory and re-use it; once it's loaded, routing even over a large area like Spain or the western half of Canada is practically instant.
I take data from OpenStreetMap or elsewhere and convert it into a directed graph, assuming (which is the right way to do it, according to those who know) that any two roads sharing a point with the same ID are joined. I assign weights to different types of roads based on expected speeds, and if a portion of a road is one-way I create only a single arc; two-way roads get two arcs, one in each direction. That's pretty much the whole thing, apart from some ad hoc code to prevent dangerous turns and to implement routing restrictions.
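Not their production code, but in outline that conversion looks something like this (a sketch with networkx; the road-type speeds, node ids, and lengths are made up):

import networkx as nx

# assumed per-road-type speeds (km/h) used to weight the arcs
SPEEDS = {'motorway': 110, 'primary': 80, 'residential': 30}

def add_way(graph, node_ids, length_km, highway_type, oneway=False):
    """Add one road ('way') as arcs between consecutive shared node ids."""
    seg_hours = (length_km / (len(node_ids) - 1)) / SPEEDS.get(highway_type, 50)
    for a, b in zip(node_ids, node_ids[1:]):
        graph.add_edge(a, b, weight=seg_hours)
        if not oneway:                      # two-way roads get an arc in each direction
            graph.add_edge(b, a, weight=seg_hours)

G = nx.DiGraph()
add_way(G, [1, 2, 3], length_km=1.2, highway_type='primary')
add_way(G, [3, 4], length_km=0.4, highway_type='residential', oneway=True)
print(nx.shortest_path(G, 1, 4, weight='weight'))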
This was discussed earlier here: What algorithms compute directions from point a to point b on a map?
Have a look at CloudMade. They offer a free service for iPhone and iPad that allows navigation based on your current location. It is built on OpenStreetMap and has some nifty features, like making your own map style. It is a little slow from time to time, but it's totally free.