Dijkstra's algorithm with negative weights - dijkstra

Can we use Dijkstra's algorithm with negative weights?
STOP! Before you think "lol nub you can just endlessly hop between two points and get an infinitely cheap path", I'm more thinking of one-way paths.
An application for this would be a mountainous terrain with points on it. Obviously going from high to low doesn't take energy, in fact, it generates energy (thus a negative path weight)! But going back again just wouldn't work that way, unless you are Chuck Norris.
I was thinking of incrementing the weight of all points until they are non-negative, but I'm not sure whether that will work.

As long as the graph does not contain a negative cycle (a directed cycle whose edge weights have a negative sum), it will have a shortest path between any two points, but Dijkstra's algorithm is not designed to find them. The best-known algorithm for finding single-source shortest paths in a directed graph with negative edge weights is the Bellman-Ford algorithm. This comes at a cost, however: Bellman-Ford requires O(|V|·|E|) time, while Dijkstra's requires O(|E| + |V|log|V|) time, which is asymptotically faster for both sparse graphs (where E is O(|V|)) and dense graphs (where E is O(|V|^2)).
In your example of a mountainous terrain (necessarily a directed graph, since going up and down an incline have different weights) there is no possibility of a negative cycle, since this would imply leaving a point and then returning to it with a net energy gain - which could be used to create a perpetual motion machine.
Increasing all the weights by a constant value so that they are non-negative will not work. To see this, consider the graph where there are two paths from A to B, one traversing a single edge of length 2, and one traversing edges of length 1, 1, and -2. The second path is shorter, but if you increase all edge weights by 2, the first path now has length 4, and the second path has length 6, reversing the shortest paths. This tactic will only work if all possible paths between the two points use the same number of edges.

If you read the proof of optimality, one of the assumptions made is that all the weights are non-negative. So, no. As Bart recommends, use Bellman-Ford if there are no negative cycles in your graph.
You have to understand that a negative edge isn't just a negative number --- it implies a reduction in the cost of the path. If you add a negative edge to your path, you have reduced the cost of the path --- if you increment the weights so that this edge is now non-negative, it does not have that reducing property anymore and thus this is a different graph.
I encourage you to read the proof of optimality --- there you will see that the assumption that adding an edge to an existing path can only increase (or not affect) the cost of the path is critical.

You can use Dijkstra's on a negative weighted graph but you first have to find the proper offset for each Vertex. That is essentially what Johnson's algorithm does. But that would be overkill since Johnson's uses Bellman-Ford to find the weight offset(s). Johnson's is designed to all shortest paths between pairs of Vertices.
http://en.wikipedia.org/wiki/Johnson%27s_algorithm

There is actually an algorithm which uses Dijkstra's algorithm in a negative path environment; it does so by removing all the negative edges and rebalancing the graph first. This algorithm is called 'Johnson's Algorithm'.
The way it works is by adding a new node (lets say Q) which has 0 cost to traverse to every other node in the graph. It then runs Bellman-Ford on the graph from point Q, getting a cost for each node with respect to Q which we will call q[x], which will either be 0 or a negative number (as it used one of the negative paths).
E.g. a -> -3 -> b, therefore if we add a node Q which has 0 cost to all of these nodes, then q[a] = 0, q[b] = -3.
We then rebalance out the edges using the formula: weight + q[source] - q[destination], so the new weight of a->b is -3 + 0 - (-3) = 0. We do this for all other edges in the graph, then remove Q and its outgoing edges and voila! We now have a rebalanced graph with no negative edges to which we can run dijkstra's on!
The running time is O(nm) [bellman-ford] + n x O(m log n) [n Dijkstra's] + O(n^2) [weight computation] = O (nm log n) time
More info: http://joonki-jeong.blogspot.co.uk/2013/01/johnsons-algorithm.html

Actually I think it'll work to modify the edge weights. Not with an offset but with a factor. Assume instead of measuring the distance you are measuring the time required from point A to B.
weight = time = distance / velocity
You could even adapt velocity depending on the slope to use the physical one if your task is for real mountains and car/bike.

Yes, you could do that with adding one step at the end i.e.
If v ∈ Q, Then Decrease-Key(Q, v, v.d)
Else Insert(Q, v) and S = S \ {v}.

An expression tree is a binary tree in which all leaves are operands (constants or variables), and the non-leaf nodes are binary operators (+, -, /, *, ^). Implement this tree to model polynomials with the basic methods of the tree including the following:
A function that calculates the first derivative of a polynomial.
Evaluate a polynomial for a given value of x.
[20] Use the following rules for the derivative: Derivative(constant) = 0 Derivative(x) = 1 Derivative(P(x) + Q(y)) = Derivative(P(x)) + Derivative(Q(y)) Derivative(P(x) - Q(y)) = Derivative(P(x)) - Derivative(Q(y)) Derivative(P(x) * Q(y)) = P(x)*Derivative(Q(y)) + Q(x)*Derivative(P(x)) Derivative(P(x) / Q(y)) = P(x)*Derivative(Q(y)) - Q(x)*Derivative(P(x)) Derivative(P(x) ^ Q(y)) = Q(y) * (P(x) ^(Q(y) - 1)) * Derivative(Q(y))

Related

what is the difference between defining a vector using linspace and defining a vector using steps?

i am trying to learn the basics of matlab ,
i wanted to write a mattlab script ,
in this script i defined a vector x with a "d" step that it's length is (2*pi/1000)
and i wanted to plot two sin function according to x :
the first sin is with a frequency of 1, and the second sin frequency 10.3 ..
this is what i did:
d=(2*pi/1000);
x=-pi:d:pi;
first=sin(x);
second=sin(10.3*x);
plot(x,first,x,second);
my question:
what is the different between :
x=linspace(-pi,pi,1000);
and ..
d=(2*pi/1000);
x=-pi:d:pi;
? i am asking because i got confused since i think they both are the same but i think there is something wrong with my assumption ..
also is there is a more sufficient way to write sin function with a giveng frequency ?
The main difference can be summarizes as predefined size vs predefined step. And your example highlights it very well, indeed (1000 elements vs 1001 elements).
The linspace function produces a fixed-length vector (the length being defined by the third input argument, which defaults to 100) whose lower and upper limits are set, respectively, by the first and the second input arguments. The correct step to use is internally computed by the function itself (step = (x2 - x1) / n).
The colon operator defines a vector of elements whose values range between the specified lower and upper limits. The step, which is an optional parameter that defaults to 1, is the discriminant of the vector length. This means that the length of the result is determined by the number of steps that must be accomplished in order to reach the upper limit, starting from the lower one. On an side note, on this MathWorks thread you can find a very interesting discussion concerning the behavior of the colon operator in respect of floating-point management.
Another difference, related to the first one, is that linspace always includes the upper limit value while the colon operator only contains it if the specified step allows it (0:5:14 = [0 5 10]).
As a general rule, I prefer to use the former when I want to produce a vector of a predefined length (pretty obvious, isn't it?), and the latter when I need to create a sequence whose length has only a marginal relevance (or no relevance at all)

Will Dijkstra ever a path with cycle?

Note: There is no negative cost.
I am considering to implement U-turn in routing, which uses Dijkstra.
Will Dijkstra ever recommend route A-B-C-B-D over A-B-D? When encountering B for the first time, B is marked as visited after visiting its neighbours, thus cycle from B-C-B will never be considered
In that case, Dijkstra never recommends cycles in the result?
It's task is to find the shortest (lowest costs) path ...
There will be no cycle in case the edge weight is greater than zero
on edge weights equal to zero it could happen but makes no sence in your case
TL;DR - It is not possible unless the cost of each edge on the cycle is 0. Otherwise, including the cycle in the shortest path would add unnecessary cost to the shortest path (meaning it would no longer be the shortest path).
Background:
Dijkstra's operates by maintaining two sets of vertices. One set is the vertices that have already been marked and the other set is the vertices that have yet to be marked. Given these two sets, Dijkstra's algorithm looks for the next cheapest element to add to the list of marked vertices and then updates the shortest paths to unmarked vertices.
In the case that A-B-C have been marked and the next edge added is C->B, B would be reached twice and the cost to get to B from A with the cycle included is [x + p + q]. However, the cost of getting to B from A without the cycle would obviously be [x]. Now the shortest path from A to D with the cycle is [x + p + q + r], while the shortest path without the cycle would be [x + r]. If p and q are both greater than 0, we see the path without the cycle will be shorter.
In the general case (with positive costs of edges), a cycle will never be included because the shortest path would contain unnecessary extra cost to get back to the starting point of the cycle.
If the U-turn is actually the shortest path:
For Dijkstra's to work for a necessary U-turn, you could just start the algorithm over from C and search for the shortest path to D (hence the recalculating notification when routing). Another solution could be to modify the underlying graph ahead of time. For example, the path A-B-C-B-D would become A-B-C-Z-D. Alternatively, the edge from C->B and the edge from B->D could both be removed and replaced with a single edge from C->D.

Combination of SVD perturbation

To apply the combination of SVD perturbation:
I = imread('image.jpg');
Ibw = single(im2double(I));
[U S V] = svd(Ibw);
% calculate derviced image
P = U * power(S, i) * V'; % where i is between 1 and 2
%To compute the combined image of SVD perturbations:
J = (single(I) + (alpha*P))/(1+alpha); % where alpha is between 0 and 1
I applied this method to a specific face recognition model and I noticed the accuracy was highly increased!! So it is very efficient!. Interestingly, I used the value i=3/4 and alpha=0.25 according to a paper that was published in a journal in 2012 in which the authors used i=3/4 and alpha=0.25. But I didn't make attention that i must be between 1 and 2! (I don't know if the authors make an error of dictation or they in fact used the value 3/4). So I tried to change the value of i to a value greater than 1, the accuracy decreased!!. So can I use the value 3/4 ? If yes, how can I argument therefore my approach?
The paper that I read is entitled "Enhanced SVD based face recognition". In page 3, they used the value i=3/4.
(http://www.oalib.com/paper/2050079)
Kindly I need your help and opinions. Any help will be very appreciated!
The idea to have the value between one and two is to magnify the singular values to make them invariant to illumination changes.
Refer to this paper: A New Face Recognition Method based on SVD Perturbation for Single Example Image per Person: Daoqiang Zhang,Songcan Chen,and Zhi-Hua Zhou
Note that when n equals to 1, the derived image P is equivalent to the original image I . If we
choose n>1, then the singular values satisfying s_i > 1 will be magnified. Thus the reconstructed
image P emphasizes the contribution of the large singular values, while restraining that of the
small ones. So by integrating P into I , we get a combined image J which keeps the main
information of the original image and is expected to work better against minor changes of
expression, illumination and occlusions.
My take:
When you scale the singular values in the exponent, you are basically introducing a non-linearity, so its possible that for a specific dataset, scaling down the singular values may be beneficial. Its like adjusting the gamma correction factor in a monitor.

Spearman's rank correlation significance

I'm calculating Spearman's rank correlation in matlab with the following code:
[RHO,PVAL] = corr(x,y,'Type','Spearman');
RHO =
0.7211
PVAL =
4.9473e-04
and then with different variables
[RHO,PVAL] = corr(x2,y2,'Type','Spearman');
RHO =
0.3277
PVAL =
0.0060
How do you categorize these as p < 0.05, p < 0.01, p < 0.001 etc. Commonly in scientific journals these pvalues are represented as the examples I've shown and not as one number. Would both of these be p < 0.01? When defining whether a correlation is significant to a specific value do you always look for the smallest error i.e if its PVAL = 0.0005, both p > 0.05 and p > 0.001 would be correct here, do we simply write the lowest i.e. p > 0.001?
As Martin Dinov wrote, this is at least partially a matter of journal policy. But, as long as there is no explicit journal convention against it, I would recommend to always report the actual p-value, in this case in the form p = 4.9·10-4 and p = 0.006, respectively. You can then proceed to say that the effect you found is statistically significant, usually based on comparison with a previously chosen significance level, typically 0.05, unless you need to correct for multiple comparisons.
The reason is that the commonly used significance levels are purely a matter of convention. By only saying that p is below one conventional threshold means to withhold valuable information from the reader, which she might use to make up her own mind about the result – and this truncation is not even justified by relevant saving of print space.
You should also, of course, report the value of the correlation coefficient itself (which in this case doubles as a test statistic and an effect size) as well as the sample size.
At least for the field of psychology, these are official recommendations:
Hypothesis tests. It is hard to imagine a situation in which a dichotomous accept-reject decision is better than reporting an actual p value or, better still, a confidence interval.
…
Effect sizes. Always present effect sizes for primary outcomes. If the units of measurement are meaningful on a practical level (e.g., number of cigarettes smoked per day), then we usually prefer an unstandardized measure (regression coefficient or mean difference) to a standardized measure (r or d).
L. Wilkinson and the Task Force on Statistical Inference, "Statistical Methods in Psychology Journals. Guidelines and Explanations"
You mean pval is < 0.05 and also < 0.001 and not >. In general, you do want to show that it is smaller than the smallest significance level (alpha) threshold that you can. So yes, it is best to say for the second example that the p-value is < 0.001. Depending on the journal convention, it may be preferable to put the actual p-value in (so, for the first example, 4.9473e-04) or just that it's < some good alpha (0.0001 for the first case).

Clusters merge threshold

I'm working with Mean shift, this procedure calculates where every point in the data set converges. I can also calculate the euclidean distance between the coordinates where 2 distinct points converged but I have to give a threshold, to say, if (distance < threshold) then this points belong to the same cluster and I can merge them.
How can I find the correct value to use as threshold??
(I can use every value and from it depends the result, but I need the optimal value)
I've implemented mean-shift clustering several times and have run into this same issue. Depending on how many iterations you're willing to shift each point for, or what your termination criteria is, there is usually some post-processing step where you have to group the shifted points into clusters. Points that theoretically shift to the same mode need not practically end up on directly top of each other.
I think the best and most general way to do this is to use a threshold based on the kernel bandwidth, as suggested in the comments. In the past my code to do this post processing has usually looked something like this:
threshold = 0.5 * kernel_bandwidth
clusters = []
for p in shifted_points:
cluster = findExistingClusterWithinThresholdOfPoint(p, clusters, threshold)
if cluster == null:
// create new cluster with p as its first point
newCluster = [p]
clusters.add(newCluster)
else:
// add p to cluster
cluster.add(p)
For the findExistingClusterWithinThresholdOfPoint function I usually use the minimum distance of p to each currently defined cluster.
This seems to work pretty well. Hope this helps.