I am trying to write Ulam's distance function in Python from scratch from this given equation, but I couldn't figure it out.
[Image: Ulam's distance equation]
Let me know if you have a solution, thanks.
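Since the equation image did not come through: Ulam's distance between two permutations of length n is commonly defined as n minus the length of their longest common subsequence. Assuming that definition, here is a minimal Python sketch; for permutations the LCS reduces to a longest increasing subsequence, computable in O(n log n) with patience sorting:

from bisect import bisect_left

def ulam_distance(p, q):
    """Ulam distance between two permutations p and q of the same items:
    n minus the length of their longest common subsequence (LCS).
    For permutations, the LCS reduces to a longest increasing
    subsequence (LIS) after relabeling one permutation by positions
    in the other."""
    n = len(p)
    pos = {v: i for i, v in enumerate(q)}   # position of each value in q
    seq = [pos[v] for v in p]               # p re-expressed in q's ordering
    tails = []   # tails[k] = smallest tail of an increasing subsequence of length k+1
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return n - len(tails)                   # n - LIS length = n - LCS length

print(ulam_distance([0, 1, 2, 3], [0, 1, 2, 3]))  # 0 (identical)
print(ulam_distance([0, 1, 2, 3], [3, 0, 1, 2]))  # 1 (one element moved)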
If I am doing hierarchical clustering with centroid linkage and a distance function other than Euclidean, say a Minkowski distance with an exponent of 3 rather than 2, will that necessarily fail? What would MATLAB do if it were given centroid linkage and a Minkowski distance with an exponent of 3, so that it was not simply Euclidean distance?
You can use the linkage function to define other distances. Examples are included in the linked documentation.
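As a sketch of the same question in Python with SciPy (the MATLAB linkage call is analogous): SciPy's linkage accepts a precomputed condensed distance matrix for any metric, but its documentation notes that the 'centroid', 'median', and 'ward' methods are correctly defined only for Euclidean distances, so centroid linkage with a Minkowski exponent of 3 runs but is not guaranteed to be meaningful.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.random((10, 3))                  # 10 sample points in 3-D

d = pdist(X, metric='minkowski', p=3)    # Minkowski distance, exponent 3

Z_avg = linkage(d, method='average')     # well defined for any metric
Z_cen = linkage(d, method='centroid')    # runs, but only correct for Euclidean distances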
I have found the following formulas for Inter-Cluster and Intra-Cluster distances and I am not sure I understand how they work.
Inter-Cluster Distance: [Image: formula]
Shouldn't there be a square root in the formulas above?
Inter-Cluster and Intra-Cluster: [Image: formulas]
Why does the j index start from N+1 rather than running from 1 to N2?
Which one is the correct one, or are they equivalent? Or should I just go with the distance between centroids for the inter-cluster distance? That seems rather simple. And what about the intra-cluster distance?
I find the Wikipedia formulas (http://en.wikipedia.org/wiki/Cluster_analysis#Internal_evaluation) even harder to understand.
I need to compute these distances in order to properly group colors and create a reduced color palette, so I'm thinking the more accurate these distances are, the more accurate the grouping (hence a full formula rather than just the distance between centroids for the inter-cluster distance). The vectors are 3-dimensional (RGB components).
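Since the formula images did not survive, here is my best reconstruction of the standard average-pairwise forms that match the indices described above (an assumption, not necessarily the exact source formulas), in LaTeX:

D_{\text{intra}}(C) = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} \lVert x_i - x_j \rVert^2

D_{\text{inter}}(C_1, C_2) = \frac{1}{N_1 N_2} \sum_{i=1}^{N_1} \sum_{j=N_1+1}^{N_1+N_2} \lVert x_i - x_j \rVert^2

Here both clusters' points sit in one combined list, so the second cluster occupies indices N_1+1 through N_1+N_2; summing j from N_1+1 to N_1+N_2 is the same as summing j from 1 to N_2 over the second cluster's own list. The missing square root is typical: many such measures work with squared Euclidean distances (variance-like quantities), as the answer below explains.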
A lot of algorithms don't really use "distance".
k-means, for example, minimizes variance, which is the sum of squares you are seeing here. Since the sum of squares is the squared Euclidean distance, one can argue that this algorithm also tries to minimize Euclidean distances; but the "natural" formulation of the algorithm uses the sum of squares, not Euclidean distance. If I'm not mistaken, the same holds for Ward clustering: you should compute it using variance, not Euclidean distance.
Note that if you minimize z^2, and z cannot be negative, then you have also minimized z.
See also: https://stats.stackexchange.com/questions/95793/is-there-an-advantage-to-squaring-dissimilarities-when-using-ward-clustering
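To make the variance-versus-distance point concrete, here is a small NumPy sketch (my illustration) showing that the within-cluster sum of squares k-means minimizes is exactly the sum of squared Euclidean distances to the centroid:

import numpy as np

rng = np.random.default_rng(1)
cluster = rng.random((50, 3))            # e.g., 50 RGB colors in one cluster
centroid = cluster.mean(axis=0)

# Within-cluster sum of squares (what k-means minimizes) ...
wcss = ((cluster - centroid) ** 2).sum()

# ... equals the sum of *squared* Euclidean distances to the centroid.
sq_dists = np.sum((cluster - centroid) ** 2, axis=1)
assert np.isclose(wcss, sq_dists.sum())

# Minimizing squared distances is not the same objective as minimizing
# plain distances in general (the two sums below differ), but for a
# single nonnegative z, minimizing z^2 also minimizes z.
print(wcss, np.sqrt(sq_dists).sum())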
I'm looking for help doing a (simple?) least-squares line fit to a set of points in MATLAB.
I have an image with a set of points that I'm trying to fit a line to, minimizing the distance from each point to the line (a least-squares fit). This seems to work fine with OpenCV's fitLine, but we're doing our research on two platforms, the other being MATLAB, and MATLAB's polyfit doesn't do the same thing as OpenCV's fitLine.
Per "Fitting a line - MatLab disagrees with OpenCV", it seems that polyfit minimizes the vertical (Y) distance to the line, not the perpendicular (total least-squares) distance.
You can write your own distance function and use lsqnonlin.
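For the perpendicular (total least-squares) fit specifically, there is also a closed-form route via the SVD of the centered points; here is a minimal NumPy sketch of that idea (my addition, not part of the original answer):

import numpy as np

def fit_line_tls(points):
    """Fit a line minimizing perpendicular distances (total least squares).
    points: (n, 2) array. Returns a point on the line and its unit direction."""
    centroid = points.mean(axis=0)
    # The best-fit direction is the right singular vector with the
    # largest singular value of the centered data; the line passes
    # through the centroid.
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[0]

pts = np.array([[0.0, 0.1], [1.0, 0.9], [2.0, 2.2], [3.0, 2.8]])
c, d = fit_line_tls(pts)
print("point on line:", c, "direction:", d)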
I would like to use a MATLAB function to find the minimum distance between a point and a curve. The curve is described by a complicated function that is not quite smooth, so I hope to use an existing MATLAB tool to compute this. Do you happen to know of one?
When someone says "it's complicated", the answer is always complicated too, since I never know exactly what you have. So I'll describe some basic ideas.
If the curve is a known nonlinear function, then start with the Symbolic Math Toolbox. For example, consider the function y = x^3 - 3*x + 5 and the point (x0, y0) = (4, 3) in the x,y plane.
Write down the square of the distance. Euclidean distance is easy to write.
(x - x0)^2 + (y - y0)^2 = (x - 4)^2 + (x^3 - 3*x + 5 - 3)^2
So, in MATLAB, I'll do this partly with the Symbolic Math Toolbox. The minimum of the squared distance must occur at a root of its first derivative.
syms x
distpoly = (x - 4)^2 + (x^3 - 3*x + 5 - 3)^2;   % squared distance to (4,3)
r = roots(sym2poly(diff(distpoly)))             % roots of the derivative
r =
-1.9126
-1.2035
1.4629
0.82664 + 0.55369i
0.82664 - 0.55369i
I'm not interested in the complex roots.
r(imag(r) ~= 0) = []
r =
-1.9126
-1.2035
1.4629
Which one is a minimizer of the squared distance?
subs(distpoly, r(1))
ans =
35.5086
subs(distpoly, r(2))
ans =
42.0327
subs(distpoly, r(3))
ans =
6.9875
That is the square of the distance, here minimized by the last root in the list. Given that minimal location for x, of course we can find y by substitution into the expression for y(x)=x^3-3*x+5.
subs(x^3 - 3*x + 5, x, r(3))
ans =
3.7419
So it is fairly easy if the curve can be written in a simple functional form as above. For a curve that is known only from a set of points in the plane, you can use my distance2curve utility. It can find the point on a space curve spline interpolant in n-dimensions that is closest to a given point.
For other curves, say an ellipse, the solution is perhaps most easily found by converting to polar coordinates, where the ellipse is easily written in parametric form as a function of the polar angle. Once that is done, write the distance as I did before, and then solve for a root of the derivative.
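As a sketch of that parametric approach (my illustration, using NumPy/SciPy rather than the Symbolic Math Toolbox; the ellipse axes a, b and the query point are made up for the example):

import numpy as np
from scipy.optimize import minimize_scalar

# Ellipse x = a*cos(t), y = b*sin(t); point to project is (x0, y0).
a, b = 3.0, 1.0
x0, y0 = 4.0, 3.0

def sq_dist(t):
    """Squared distance from (x0, y0) to the ellipse point at angle t."""
    return (a * np.cos(t) - x0) ** 2 + (b * np.sin(t) - y0) ** 2

# Coarse scan first, then local refinement: the squared distance can
# have several local minima, so a global scan makes the start safe.
ts = np.linspace(0, 2 * np.pi, 1000)
t0 = ts[np.argmin(sq_dist(ts))]
res = minimize_scalar(sq_dist, bounds=(t0 - 0.1, t0 + 0.1), method='bounded')

print("closest point:", a * np.cos(res.x), b * np.sin(res.x))
print("minimum distance:", np.sqrt(res.fun))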
A difficult case to solve is where the function is described as not quite smooth. Is this noise or is it a non-differentiable curve? For example, a cubic spline is "not quite smooth" at some level. A piecewise linear function is even less smooth at the breaks. If you actually just have a set of data points that have a bit of noise in them, you must decide whether to smooth out the noise or not. Do you wish to essentially find the closest point on a smoothed approximation, or are you looking for the closest point on an interpolated curve?
For a list of data points, if your goal is not to do any smoothing, then a good choice is again my distance2curve utility, using linear interpolation. If you want to do the computation yourself and you have enough data points, you could get a good approximation by simply choosing the closest data point itself, though that may be a poor approximation if your data is not closely spaced; a sketch of the interpolation idea follows.
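Here is a small NumPy sketch of that linear-interpolation idea (my illustration, not the distance2curve code itself): project the query point onto every segment of the polyline and keep the best candidate.

import numpy as np

def closest_point_on_polyline(points, p):
    """Closest point to p on the piecewise-linear curve through `points`.
    points: (n, 2) array of ordered curve samples; p: (2,) query point."""
    a = points[:-1]                     # segment start points
    b = points[1:]                      # segment end points
    ab = b - a
    # Parameter of the orthogonal projection of p onto each segment's
    # line, clamped to [0, 1] so the projection stays on the segment.
    t = np.einsum('ij,ij->i', p - a, ab) / np.einsum('ij,ij->i', ab, ab)
    t = np.clip(t, 0.0, 1.0)
    proj = a + t[:, None] * ab          # candidate closest points
    d2 = np.sum((proj - p) ** 2, axis=1)
    k = np.argmin(d2)
    return proj[k], np.sqrt(d2[k])

# Example: samples of a curve and a query point
ts = np.linspace(0, 2 * np.pi, 200)
curve = np.column_stack([ts, np.sin(ts)])
pt, dist = closest_point_on_polyline(curve, np.array([3.0, 1.5]))
print(pt, dist)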
If your problem does not lie in one of these classes, you can still often solve it using a variety of methods, but I'd need to know more specifics about the problem to be of more help.
There are two ways you could go about this.
The easy way, which will work if your curve is reasonably smooth and you don't need very high precision, is to evaluate your curve at a dense set of points and simply find the minimum distance:
t = (0:0.1:100)';
% squared distance from each sampled curve point to yourPoint
[minDistSq, idx] = min(sum(bsxfun(@minus, [x(t), y(t)], yourPoint).^2, 2));
minDistance = sqrt(minDistSq);   % idx marks the best sample for refinement
The harder way is to minimize a function of t (or x) that describes the distance:
distance = @(t) sum((yourPoint - [x(t), y(t)]).^2);  % squared distance as a function of t
%# use the best grid sample from above as a decent starting guess
tAtMin = fminsearch(distance, t(idx));
minDistanceFitted = sqrt(distance(tAtMin));