Finding all shortest paths between two nodes in NetworkX

Assume I have a BA (Barabási–Albert) network with N nodes in which each node has at least 2 edges. The network is unweighted. I am trying to find the shortest paths between every pair of nodes i and j in the network. If there is more than one shortest path between i and j, I need every one of them.
For example, if node 2 can be reached from node 0 by the paths [0,1,2], [0,3,4,2], [0,3,4,5,2], [0,4,5,2] and [0,3,2], I need the list [[0,1,2], [0,3,2]].
Is the only way to do this to compute every path from i to j and keep the shortest ones? Can this be found more efficiently?
Edit: Apparently there's a path-finding function called all_shortest_paths. I will try it and see whether it is efficient.

You can use nx.all_shortest_paths for this:
all_shortest_paths(G, source, target, weight=None, method='dijkstra')
It lets you specify the source and target nodes. Here's a simple example:
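import matplotlib.pyplot as plt
import networkx as nx

plt.figure(figsize=(6,4))
G = nx.from_edgelist([[1,2],[2,3],[7,8],[3,8],[1,8], [2,9],[9,0],[0,7]])
nx.draw(G, with_labels=True, node_color='lightgreen')
# Enumerate every shortest path between nodes 2 and 8
list(nx.all_shortest_paths(G, 2, 8))
# [[2, 1, 8], [2, 3, 8]]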

The Floyd–Warshall algorithm is what you need.
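To make that suggestion concrete, here is a minimal sketch of Floyd–Warshall for an unweighted, undirected graph, written in plain Python. The function name and the small example graph are made up for illustration. Note that it returns only the shortest-path lengths for every pair and runs in O(N^3) time, so the paths themselves still have to be enumerated separately.

```python
from math import inf

def floyd_warshall(nodes, edges):
    """Return dist[u][v], the shortest-path length between every pair of nodes.

    `edges` is an iterable of undirected (u, v) pairs; every edge has length 1.
    """
    dist = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    for u, v in edges:
        if u != v:  # ignore self-loops
            dist[u][v] = dist[v][u] = 1
    # Allow each node k in turn to serve as an intermediate stop.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical 4-cycle: two equally short routes from 0 to 2.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (0, 3), (3, 2)]
dist = floyd_warshall(nodes, edges)
print(dist[0][2])
```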

Related

shortest path algorithm without loading graph in memory

https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.shortest_paths.generic.all_shortest_paths.html#networkx.algorithms.shortest_paths.generic.all_shortest_paths
I need to find all shortest paths given the source and target nodes.
networkx has a function that computes all shortest paths, but it requires constructing the whole graph first.
In many cases the shortest paths are simple. For example, if the input is a TSV file with one edge per row and the source and target nodes already have an edge between them, there is no need to construct the whole graph in networkx first.
Is there an efficient algorithm to find all shortest paths so that the graph is constructed only when it is needed?
EDIT:
For start a and end d.
Input:
a a
a b
b c
c d
b 1
1 d
Output:
a b 1 d
a b c d
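One possible approach, sketched below in plain Python, is a breadth-first search that records every predecessor lying on a shortest route and stops as soon as the target is dequeued. The function name is hypothetical, and the sketch assumes the edges are undirected and whitespace-separated. It still reads the edge list once to build adjacency sets, so it avoids networkx rather than avoiding memory use altogether.

```python
from collections import defaultdict, deque

def all_shortest_paths(edge_lines, source, target):
    """Return every shortest path from source to target, sorted."""
    adj = defaultdict(set)
    for line in edge_lines:
        u, v = line.split()
        if u != v:  # self-loops never help a shortest path
            adj[u].add(v)
            adj[v].add(u)

    dist = {source: 0}
    preds = defaultdict(list)  # node -> predecessors on some shortest path
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break  # every predecessor of target has already been processed
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:
                preds[v].append(u)

    if target not in dist:
        return []

    paths = []

    def walk(node, suffix):
        # Follow predecessor links back to the source.
        if node == source:
            paths.append([source] + suffix)
            return
        for p in preds[node]:
            walk(p, [node] + suffix)

    walk(target, [])
    return sorted(paths)

lines = ["a a", "a b", "b c", "c d", "b 1", "1 d"]
for path in all_shortest_paths(lines, "a", "d"):
    print(" ".join(path))
```

With the input from the question this prints the two expected routes, a b 1 d and a b c d.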

Coloring the cluster with same colors as defined for ground truth for visualization

Example: (Consider the platform = MATLAB)
Ground_Truth_Indices = [ 1, 1, 1, 2, 2, 2, 3, 3, 3];
For each unique index in the GT, I have defined a color array.
Color_Array = [ 0, 255, 0; 255, 0, 0; 0, 0, 255]; %assuming (in this eg.) the max. cluster size is 3
Next, I use a clustering algorithm (DBSCAN in my case) and it gives the following indices:
Clustered_Indices = [2, 2, 2, 3, 3, 3, 1, 1, 1];
Now, I need to visualize the results alongside the ground truth.
But the obtained indices, after clustering, are different from the ground truth indices.
Thus, according to the color array defined, I would not get the same pattern of colors for ground truth and obtained clusters during visualization. Is there any solution so that I could make both the colorings consistent?
Figure with ground truth and obtained clusters
The same is illustrated in the figure linked above (not a MATLAB plot; it was created only for illustration), where Cluster 1 should have the same color in the ground truth and in the obtained clustering. That is not the case here because of the index numbers associated with the color array.
Note: The indices obtained after clustering cannot be predefined; they depend on the clustering algorithm and its input.
You can use the Kuhn-Munkres maximum matching (Hungarian Algorithm) to find the best 1:1 alignment of the cluster labels.
As the generated clustering may have a different number of clusters, you'll need a robust implementation that can find alignments in non-square matrices.
But you may be more interested in visualizing the differences between the clusterings. I've seen this in the following paper, but I am not sure if this is usable beyond toy data sets:
Evaluation of Clusterings -- Metrics and Visual Support
Elke Achtert, Sascha Goldhofer, +2 authors Arthur Zimek
Published in IEEE 28th International… 2012
DOI:10.1109/ICDE.2012.128
(Sorry for the incomplete reference, blame semantic scholar, but that was easiest to link a figure from the paper, I can't take a better screenshot on this device).
This seems to visualize the differences between a k-means and an EM clustering, where grey points are those where they agree on the clustering. This approach seems to work on pairs of points, just as the evaluation measures.
Inspired by the answer to the post "How can I match up cluster labels to my 'ground truth' labels in Matlab", I have the following solution code for my question:
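N = length(Ground_Truth_Indices);
cluster_names = unique(Clustered_Indices);
accuracy = 0;
maxInd = 1;
% Each row of perm is one candidate assignment of ground-truth labels to clusters
perm = perms(unique(Ground_Truth_Indices));
[perm_nrows, perm_ncols] = size(perm);
true_labels = Ground_Truth_Indices;
for i = 1:perm_nrows
    % Relabel the clustering according to the i-th permutation
    flipped_labels = zeros(1,N);
    for cl = 1:perm_ncols
        flipped_labels(Clustered_Indices==cluster_names(cl)) = perm(i,cl);
    end
    % Both vectors are 1-by-N rows, so they are compared element-wise
    testAcc = sum(flipped_labels == Ground_Truth_Indices)/N;
    % Keep the permutation that agrees best with the ground truth
    if testAcc > accuracy
        accuracy = testAcc;
        maxInd = i;
        true_labels = flipped_labels;
    end
end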
where 'true_labels' contains the labels of 'Clustered_Indices' rearranged to agree with 'Ground_Truth_Indices'.
This code, as explained in the original post, uses permutation-based matching. It works well for the example given in this post, and I also tested it with other variations. But when the number of clusters becomes large, the code does not work well, because the number of permutations grows factorially. What do you think about this code? Is there a better way to write it, or to optimize it?
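One way to avoid enumerating permutations is to count how often each cluster label co-occurs with each ground-truth label and match the largest overlaps first. The sketch below is in Python rather than MATLAB, the function name is hypothetical, and it is a greedy heuristic, not the Hungarian algorithm mentioned above, so it is not guaranteed to find the best alignment in every case. Clusters that find no partner receive fresh labels, which assumes integer labels.

```python
from collections import Counter

def align_labels(truth, predicted):
    """Relabel `predicted` so each cluster reuses the truth label it overlaps most."""
    # (cluster label, truth label) -> number of points carrying both
    overlap = Counter(zip(predicted, truth))
    mapping, used = {}, set()
    # Greedy pass: largest overlaps first, each label used at most once.
    for (cluster, label), _count in overlap.most_common():
        if cluster not in mapping and label not in used:
            mapping[cluster] = label
            used.add(label)
    # Any cluster left without a partner gets a new, unused label.
    next_label = max(list(truth) + list(predicted)) + 1
    for cluster in sorted(set(predicted)):
        if cluster not in mapping:
            mapping[cluster] = next_label
            next_label += 1
    return [mapping[c] for c in predicted]

truth = [1, 1, 1, 2, 2, 2, 3, 3, 3]
predicted = [2, 2, 2, 3, 3, 3, 1, 1, 1]
aligned = align_labels(truth, predicted)
print(aligned)
```

The aligned labels can then index the same color array as the ground truth, so matching clusters are drawn in matching colors.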

Will Dijkstra ever return a path with a cycle?

Note: There is no negative cost.
I am considering implementing U-turns in routing that uses Dijkstra.
Will Dijkstra ever recommend the route A-B-C-B-D over A-B-D? When B is encountered for the first time, it is marked as visited after its neighbours have been visited, so the cycle B-C-B will never be considered.
In that case, does Dijkstra never recommend cycles in the result?
Its task is to find the shortest (lowest-cost) path ...
There will be no cycle as long as every edge weight is greater than zero.
With edge weights equal to zero it could happen, but that makes no sense in your case.
TL;DR - It is not possible unless the cost of each edge on the cycle is 0. Otherwise, including the cycle in the shortest path would add unnecessary cost to the shortest path (meaning it would no longer be the shortest path).
Background:
Dijkstra's operates by maintaining two sets of vertices. One set is the vertices that have already been marked and the other set is the vertices that have yet to be marked. Given these two sets, Dijkstra's algorithm looks for the next cheapest element to add to the list of marked vertices and then updates the shortest paths to unmarked vertices.
In the case that A-B-C have been marked and the next edge added is C->B, B would be reached twice and the cost to get to B from A with the cycle included is [x + p + q]. However, the cost of getting to B from A without the cycle would obviously be [x]. Now the shortest path from A to D with the cycle is [x + p + q + r], while the shortest path without the cycle would be [x + r]. If p and q are both greater than 0, we see the path without the cycle will be shorter.
In the general case (with positive costs of edges), a cycle will never be included because the shortest path would contain unnecessary extra cost to get back to the starting point of the cycle.
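The argument can be checked on the A-B-C-B-D example with a small Dijkstra sketch. The weights x = 2, p = q = 1 and r = 3 are made-up values, the function name is hypothetical, and the sketch assumes the target is reachable. Because every vertex stores exactly one predecessor, the reconstructed path cannot visit a vertex twice.

```python
import heapq

def dijkstra_path(graph, source, target):
    """Return (cost, path) for a graph given as {node: {neighbour: weight}}."""
    dist = {source: 0}
    prev = {}
    done = set()  # the "marked" vertices
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph[u].items():
            nd = d + w
            if v not in done and nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk the single-predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

graph = {
    "A": {"B": 2},                  # x = 2
    "B": {"A": 2, "C": 1, "D": 3},  # p = 1, r = 3
    "C": {"B": 1},                  # q = 1
    "D": {"B": 3},
}
cost, path = dijkstra_path(graph, "A", "D")
print(cost, path)
```

Here the result is the direct route A, B, D with cost x + r = 5; the detour through C would cost x + p + q + r = 7.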
If the U-turn is actually the shortest path:
For Dijkstra's to work for a necessary U-turn, you could just start the algorithm over from C and search for the shortest path to D (hence the recalculating notification when routing). Another solution could be to modify the underlying graph ahead of time. For example, the path A-B-C-B-D would become A-B-C-Z-D. Alternatively, the edge from C->B and the edge from B->D could both be removed and replaced with a single edge from C->D.

Shortest Route in a Matrix

I want to find the shortest path in an N×N matrix like the 3×3 matrix below, starting at any row of column 1 and ending at any row of column 3. The shortest path in the matrix A below is 1, 3, 2, 4.
A = [1 3 9;
4 2 4;
5 4 9];
The standard way is to first represent your problem as a graph. In your case, if you treat each cell in your matrix as a vertex, there is an edge from that vertex to the cell on the right, the cell above, and the cell below (any one of which may not exist because they fall off the edge of the matrix). Backtracking to a previous column cannot contribute to a shortest path, so we don't include those edges. If you did include them, the answer would come out the same, it just might take a fraction longer.
Using index numbering, then, we have the following edges and weights:
i j s
[1,2]=4
[1,4]=3
[2,1]=1
[2,3]=5
[2,5]=2
[3,2]=4
[3,6]=4
[4,5]=2
[4,7]=9
[5,4]=3
[5,6]=4
[5,8]=4
[6,5]=2
[6,9]=9
[7,8]=4
[8,7]=9
[8,9]=9
[9,8]=4
(The labels i,j and s are used below.) But this doesn't account for the initial cost of moving to the first column, so we add a new node that has edges to each of those nodes:
i j s
[10,1]=1
[10,2]=4
[10,3]=5
Unfortunately, I haven't found a clever way of doing this, but it's fairly straightforward and once you're finished you have the 3 column vectors you need to create a sparse matrix using:
S = sparse(i,j,s,m,n,nzmax)
where
i,j,s are column vectors as labeled from the edge/weight list above
m = n = numel(A)+1
nzmax = numel(i) = numel(j) = numel(s)
From there you would use Dijkstra's Algorithm to find the single-source shortest path starting from the new node 10. Your answer would be the node on the rightmost column (node 7, 8 or 9, in this case) with the smallest path distance.
If you have the Bioinformatics Toolbox, you can use graphshortestpath. If you don't, there are several implementations of Dijkstra's algorithm on File Exchange.
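The same idea can be sketched without building the sparse matrix explicitly. The Python version below (not MATLAB; the function name is hypothetical) runs Dijkstra directly on the grid: the virtual source is modelled by seeding the queue with every cell of the first column, and each cell has edges to the cell on the right, above and below. It assumes all entries are non-negative.

```python
import heapq

def min_cost_left_to_right(A):
    """Return (cost, values on the path) of the cheapest route across A."""
    rows, cols = len(A), len(A[0])
    settled = set()
    # Virtual source: entering any first-column cell costs that cell's value.
    heap = [(A[r][0], r, 0, (A[r][0],)) for r in range(rows)]
    heapq.heapify(heap)
    while heap:
        d, r, c, path = heapq.heappop(heap)
        if (r, c) in settled:
            continue
        settled.add((r, c))
        if c == cols - 1:
            return d, list(path)  # first settled cell in the last column is optimal
        for nr, nc in ((r, c + 1), (r - 1, c), (r + 1, c)):
            if 0 <= nr < rows and (nr, nc) not in settled:
                step = A[nr][nc]
                heapq.heappush(heap, (d + step, nr, nc, path + (step,)))
    return None

A = [[1, 3, 9],
     [4, 2, 4],
     [5, 4, 9]]
cost, path = min_cost_left_to_right(A)
print(cost, path)
```

For the matrix in the question this gives a total cost of 10 along the cells 1, 3, 2, 4.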

Dijkstra's algorithm with negative weights

Can we use Dijkstra's algorithm with negative weights?
STOP! Before you think "lol nub you can just endlessly hop between two points and get an infinitely cheap path", I'm more thinking of one-way paths.
An application for this would be a mountainous terrain with points on it. Obviously going from high to low doesn't take energy, in fact, it generates energy (thus a negative path weight)! But going back again just wouldn't work that way, unless you are Chuck Norris.
I was thinking of incrementing the weight of all points until they are non-negative, but I'm not sure whether that will work.
As long as the graph does not contain a negative cycle (a directed cycle whose edge weights have a negative sum), it will have a shortest path between any two points, but Dijkstra's algorithm is not designed to find them. The best-known algorithm for finding single-source shortest paths in a directed graph with negative edge weights is the Bellman-Ford algorithm. This comes at a cost, however: Bellman-Ford requires O(|V|·|E|) time, while Dijkstra's requires O(|E| + |V|log|V|) time, which is asymptotically faster for both sparse graphs (where E is O(|V|)) and dense graphs (where E is O(|V|^2)).
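For reference, a minimal Bellman-Ford sketch in Python looks like this. The function name, the node names and the example weights are made up for illustration. It relaxes every edge up to |V| - 1 times and then makes one more pass to detect a negative cycle reachable from the source.

```python
from math import inf

def bellman_ford(nodes, edges, source):
    """Single-source shortest distances; edges are (u, v, weight) triples."""
    dist = {n: inf for n in nodes}
    dist[source] = 0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # distances have converged early
    # One extra pass: any further improvement means a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from the source")
    return dist

# Hypothetical terrain: going downhill has negative cost, going uphill is expensive.
nodes = ["top", "mid", "base"]
edges = [("top", "mid", -2), ("mid", "base", -1), ("top", "base", -1), ("base", "top", 5)]
dist = bellman_ford(nodes, edges, "top")
print(dist)
```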
In your example of a mountainous terrain (necessarily a directed graph, since going up and down an incline have different weights) there is no possibility of a negative cycle, since this would imply leaving a point and then returning to it with a net energy gain - which could be used to create a perpetual motion machine.
Increasing all the weights by a constant value so that they are non-negative will not work. To see this, consider the graph where there are two paths from A to B, one traversing a single edge of length 2, and one traversing edges of length 1, 1, and -2. The second path is shorter, but if you increase all edge weights by 2, the first path now has length 4, and the second path has length 6, reversing the shortest paths. This tactic will only work if all possible paths between the two points use the same number of edges.
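The counterexample is easy to verify numerically. In this short Python check the variable names are arbitrary; the point is that a path with k edges gains k times the offset, so longer paths are penalised more.

```python
direct = [2]          # one edge of length 2
detour = [1, 1, -2]   # three edges, total length 0
offset = 2            # makes every weight non-negative

before = (sum(direct), sum(detour))
after = (sum(w + offset for w in direct), sum(w + offset for w in detour))

print(before)  # (2, 0): the detour is shorter
print(after)   # (4, 6): after shifting, the direct edge looks shorter
```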
If you read the proof of optimality, one of the assumptions made is that all the weights are non-negative. So, no. As Bart recommends, use Bellman-Ford if there are no negative cycles in your graph.
You have to understand that a negative edge isn't just a negative number --- it implies a reduction in the cost of the path. If you add a negative edge to your path, you have reduced the cost of the path --- if you increment the weights so that this edge is now non-negative, it does not have that reducing property anymore and thus this is a different graph.
I encourage you to read the proof of optimality --- there you will see that the assumption that adding an edge to an existing path can only increase (or not affect) the cost of the path is critical.
You can use Dijkstra's on a graph with negative weights, but you first have to find the proper offset for each vertex. That is essentially what Johnson's algorithm does. But that would be overkill, since Johnson's uses Bellman-Ford to find the weight offset(s). Johnson's is designed to find all shortest paths between pairs of vertices.
http://en.wikipedia.org/wiki/Johnson%27s_algorithm
There is actually an algorithm which uses Dijkstra's algorithm in a negative path environment; it does so by removing all the negative edges and rebalancing the graph first. This algorithm is called 'Johnson's Algorithm'.
The way it works is by adding a new node (let's say Q) which has 0 cost to traverse to every other node in the graph. It then runs Bellman-Ford on the graph from point Q, getting a cost for each node with respect to Q, which we will call q[x]. This cost will be either 0 or a negative number (if it used one of the negative paths).
E.g. a -> -3 -> b, therefore if we add a node Q which has 0 cost to all of these nodes, then q[a] = 0, q[b] = -3.
We then rebalance the edges using the formula weight + q[source] - q[destination], so the new weight of a->b is -3 + 0 - (-3) = 0. We do this for all other edges in the graph, then remove Q and its outgoing edges, and voila! We now have a rebalanced graph with no negative edges, on which we can run Dijkstra's!
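The rebalancing step can be sketched in a few lines of Python. The function name and the three-node example are made up, and the virtual node Q is not created explicitly: starting every q[x] at 0 has the same effect as relaxing Q's zero-cost edges once.

```python
def johnson_reweight(nodes, edges):
    """Return (q, reweighted edges) for a list of (u, v, weight) triples."""
    # Equivalent to relaxing the zero-cost edges leaving the virtual node Q.
    q = {n: 0 for n in nodes}
    # Bellman-Ford rounds over the real edges.
    for _ in range(len(nodes) - 1):
        for u, v, w in edges:
            if q[u] + w < q[v]:
                q[v] = q[u] + w
    if any(q[u] + w < q[v] for u, v, w in edges):
        raise ValueError("graph contains a negative cycle")
    # New weight = weight + q[source] - q[destination], never negative.
    reweighted = [(u, v, w + q[u] - q[v]) for u, v, w in edges]
    return q, reweighted

nodes = ["a", "b", "c"]
edges = [("a", "b", -3), ("b", "c", 2), ("a", "c", 1)]
q, reweighted = johnson_reweight(nodes, edges)
print(q)
print(reweighted)
```

After running Dijkstra's on the reweighted graph, a true distance from u to v is recovered as the reweighted distance minus q[u] plus q[v].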
The running time is O(nm) [bellman-ford] + n x O(m log n) [n Dijkstra's] + O(n^2) [weight computation] = O (nm log n) time
More info: http://joonki-jeong.blogspot.co.uk/2013/01/johnsons-algorithm.html
Actually I think it'll work to modify the edge weights. Not with an offset but with a factor. Assume instead of measuring the distance you are measuring the time required from point A to B.
weight = time = distance / velocity
You could even adapt velocity depending on the slope to use the physical one if your task is for real mountains and car/bike.
Yes, you could do that by adding one step at the end, i.e.
If v ∈ Q, Then Decrease-Key(Q, v, v.d)
Else Insert(Q, v) and S = S \ {v}.
An expression tree is a binary tree in which all leaves are operands (constants or variables), and the non-leaf nodes are binary operators (+, -, /, *, ^). Implement this tree to model polynomials with the basic methods of the tree including the following:
A function that calculates the first derivative of a polynomial.
Evaluate a polynomial for a given value of x.
[20] Use the following rules for the derivative:
Derivative(constant) = 0
Derivative(x) = 1
Derivative(P(x) + Q(x)) = Derivative(P(x)) + Derivative(Q(x))
Derivative(P(x) - Q(x)) = Derivative(P(x)) - Derivative(Q(x))
Derivative(P(x) * Q(x)) = P(x)*Derivative(Q(x)) + Q(x)*Derivative(P(x))
Derivative(P(x) / Q(x)) = (Q(x)*Derivative(P(x)) - P(x)*Derivative(Q(x))) / (Q(x) ^ 2)
Derivative(P(x) ^ Q) = Q * (P(x) ^ (Q - 1)) * Derivative(P(x)), where the exponent Q is a constant
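A minimal sketch of such a tree in Python might look as follows. The class and function names are hypothetical, and the sketch uses the standard calculus forms of the quotient rule and of the power rule with a constant exponent. The derivative is returned as a new, unsimplified tree.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Var:
    name: str = "x"

@dataclass(frozen=True)
class BinOp:
    op: str        # one of + - * / ^
    left: object
    right: object

def evaluate(node, x):
    """Evaluate the expression tree at the given value of x."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Var):
        return x
    a, b = evaluate(node.left, x), evaluate(node.right, x)
    return {"+": a + b, "-": a - b, "*": a * b,
            "/": a / b if node.op == "/" else None,
            "^": a ** b if node.op == "^" else None}[node.op]

def derivative(node):
    """Return a new tree representing the first derivative."""
    if isinstance(node, Const):
        return Const(0)
    if isinstance(node, Var):
        return Const(1)
    p, q = node.left, node.right
    if node.op == "^":
        if not isinstance(q, Const):
            raise ValueError("exponent must be a constant")
        power = BinOp("^", p, Const(q.value - 1))
        return BinOp("*", BinOp("*", q, power), derivative(p))
    dp, dq = derivative(p), derivative(q)
    if node.op in "+-":
        return BinOp(node.op, dp, dq)
    if node.op == "*":
        return BinOp("+", BinOp("*", p, dq), BinOp("*", q, dp))
    if node.op == "/":
        top = BinOp("-", BinOp("*", q, dp), BinOp("*", p, dq))
        return BinOp("/", top, BinOp("*", q, q))
    raise ValueError(f"unknown operator {node.op!r}")

# 3*x^2 + 2*x
x = Var()
poly = BinOp("+", BinOp("*", Const(3), BinOp("^", x, Const(2))), BinOp("*", Const(2), x))
print(evaluate(poly, 2), evaluate(derivative(poly), 2))
```

The example builds 3*x^2 + 2*x, whose derivative is 6*x + 2, and evaluates both at x = 2.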