Clusters merge threshold - cluster-analysis

I'm working with mean shift. This procedure calculates where every point in the data set converges. I can also calculate the Euclidean distance between the coordinates where two distinct points converged, but I have to supply a threshold: if (distance < threshold), the two points belong to the same cluster and I can merge them.
How can I find the correct value to use as the threshold?
(I can use any value, and the result depends on it, but I need the optimal value.)

I've implemented mean-shift clustering several times and have run into this same issue. Depending on how many iterations you're willing to shift each point for, or what your termination criterion is, there is usually some post-processing step where you have to group the shifted points into clusters. Points that theoretically shift to the same mode need not end up directly on top of each other in practice.
I think the best and most general way to do this is to use a threshold based on the kernel bandwidth, as suggested in the comments. In the past my code to do this post-processing has usually looked something like this:
threshold = 0.5 * kernel_bandwidth
clusters = []
for p in shifted_points:
    cluster = findExistingClusterWithinThresholdOfPoint(p, clusters, threshold)
    if cluster is None:
        # create new cluster with p as its first point
        newCluster = [p]
        clusters.append(newCluster)
    else:
        # add p to cluster
        cluster.append(p)
For the findExistingClusterWithinThresholdOfPoint function I usually use the minimum distance of p to each currently defined cluster.
This seems to work pretty well. Hope this helps.
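For anyone who wants something they can run directly, here is a minimal sketch of the same grouping step in Python. The function name group_shifted_points and the factor parameter are my own placeholders, and picking the nearest cluster by minimum member distance is just one reading of the description above, so treat it as a starting point rather than a reference implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def group_shifted_points(shifted_points, kernel_bandwidth, factor=0.5):
    """Group converged mean-shift points whose distance is below a threshold.

    `factor` is an assumed tuning knob: threshold = factor * kernel_bandwidth.
    """
    threshold = factor * kernel_bandwidth
    clusters = []
    for p in shifted_points:
        nearest, nearest_distance = None, threshold
        for cluster in clusters:
            # distance from p to a cluster = distance to its closest member
            d = min(dist(p, member) for member in cluster)
            if d < nearest_distance:
                nearest, nearest_distance = cluster, d
        if nearest is None:
            clusters.append([p])   # no cluster close enough: start a new one
        else:
            nearest.append(p)      # merge p into the closest cluster
    return clusters


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.05, 5.0), (0.05, 0.05)]
clusters = group_shifted_points(points, kernel_bandwidth=1.0)
```

With these sample points the three points near the origin end up in one cluster and the two near (5, 5) in another.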

Related

How to define Triangular membership function for fuzzy controller design?

I am designing a fuzzy controller and for that, I have to define 3 triangular function sets. They are:
1 large
2 medium
3 small
But my problem is I have following data only:
Maximum input = 3 Minimum input= 0.1
Maximum output = 5.5 Minimum output= 0.8
How to define 3 triangular set range based on only this given information?
Here is the formula for a triangular membership function
f=0 if x<=a
f=(x-a)/(b-a) if a<=x<=b
f=(c-x)/(c-b) if b<=x<=c
f=0 if x>c
where a is the min, c is the max and b is the midpoint.
In your case, take the top situation where the max is 3 and the min is 0.1. The midpoint is (3+0.1)/2=1.55, so you have
f=0 if x<=0.1
f=(x-0.1)/(1.55-0.1) if 0.1<=x<=1.55
f=(3-x)/(3-1.55) if 1.55<=x<=3
f=0 if x>3
You should be able to take the 2nd example from here, but if not let me know. Something worth pointing out is that the midpoint may not be the ideal b in your situation. Any point between a and c could serve as your b, just know that it is the point where the membership function equals 1.
It is difficult to tell, but it looks like maybe you just have given parameters for two of the functions, perhaps for small and large or medium and large. You may need to use some judgement for the 3rd membership function.
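The formula above translates almost line for line into code. This is a small sketch in Python; the function name triangular is my own, and it assumes a < b < c so that neither slope divides by zero.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge


# Input range from the question: min 0.1, max 3, midpoint 1.55
peak = triangular(1.55, 0.1, 1.55, 3.0)
halfway_up = triangular(0.825, 0.1, 1.55, 3.0)
```

Here peak is the membership at b and halfway_up is the membership halfway along the rising edge.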

SAT based motion planning

SAT BASED MOTION PLANNING ALGORITHM
A simple motion planning problem can be remodelled as a SAT solving problem. Can anyone explain how this is possible?
In this problem, we have to find a collision free path from start to end location.
The simplest example could look like this.
Let's introduce a 2D grid of N rows and M columns. A moving agent A starts at a node (x,y). Its target T has coordinates (x_i, y_j).
To reach the target, the agent should perform several steps in sequence: move left, right, up or down. We don't know how many steps it needs, so we have to limit this number ourselves. Let's say we are searching for a plan that consists of K steps. In this case, we should add N*M*K Boolean variables: N and M represent the coordinates and K the time. If a variable is True, then the agent is at node (x,y) at time k.
Next, we add various constraints:
The agent must change its position at each step (this is optional, actually)
If the agent is at position (x,y) at step k, then at step k+1 it must be at one of the four adjacent nodes
The SAT formula is satisfied if and only if the agent is at the target node at the final step K
I won't discuss a detailed implementation of the constraints here; it's not that difficult. A similar approach could be used for multiagent planning.
This example is just an illustration. People use satplan and STRIPS in real life.
EDIT1
In the case of a collision-free path you should add additional constraints:
If a node contains an obstacle, an agent can't visit it, i.e. the corresponding Boolean variables are False at every timestep
If we are talking about a multiagent system, then two boolean variables, corresponding to two agents being at same timestep at the same node, can't be True simultaneously:
AND (agent1_x_y_t, agent2_x_y_t) <=> False
EDIT2
How to build a formula that would be satisfied: iterate over all nodes and all timesteps, i.e. over each Boolean variable. For each Boolean variable add constraints (I'll use Python-like pseudocode):
formula = TRUE  # empty conjunction
for x in range(N):
    for y in range(M):
        for t in range(K):
            current_var = all_vars[x][y][t]
            # obstacle: the agent can never be at this node
            if obstacle(x, y):
                formula = AND(formula, NOT(current_var))
            # an agent should change its location each step
            if t > 0:
                prev_step = get_prev_step(x, y, t)
                change = NOT(AND(current_var, prev_step))
                formula = AND(formula, change)
            # being here at step t implies being at exactly one adjacent node at step t+1
            if t < K - 1:
                adjacent_nodes = get_adj(x, y, t + 1)
                constr = IMPLIES(current_var, only_one_is_true(adjacent_nodes))
                formula = AND(formula, constr)
satisfy(formula)
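To make the encoding concrete, here is a self-contained sketch in Python that writes the constraints as CNF clauses (lists of signed integers) and solves them with a tiny DPLL search. The names encode, solve and plan are hypothetical, the solver is a toy written only for this illustration, and in practice you would hand the clauses to a real SAT solver instead. It also adds the start and target positions as unit clauses and enforces "at most one node per timestep" pairwise, which the pseudocode above leaves implicit.

```python
from itertools import combinations


def encode(n_rows, n_cols, steps, start, goal, obstacles):
    """Build CNF clauses for a plan with `steps` time points (K in the text)."""
    def var(x, y, t):
        return 1 + t * n_rows * n_cols + x * n_cols + y

    cells = [(x, y) for x in range(n_rows) for y in range(n_cols)]
    clauses = [[var(*start, 0)], [var(*goal, steps - 1)]]
    for t in range(steps):
        # obstacle nodes are never occupied
        clauses += [[-var(x, y, t)] for (x, y) in obstacles]
        # the agent occupies at most one node per timestep
        clauses += [[-var(*a, t), -var(*b, t)] for a, b in combinations(cells, 2)]
        if t == steps - 1:
            continue
        # being at (x, y) at step t implies being at an adjacent node at t + 1
        for x, y in cells:
            adjacent = [(x + dx, y + dy)
                        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                        if 0 <= x + dx < n_rows and 0 <= y + dy < n_cols]
            clauses.append([-var(x, y, t)] + [var(ax, ay, t + 1) for ax, ay in adjacent])
    return clauses, var


def solve(clauses, assignment=None):
    """Toy DPLL solver: returns {variable: bool} or None if unsatisfiable."""
    assignment = dict(assignment or {})
    while True:
        remaining, unit = [], None
        for clause in clauses:
            if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
                continue                       # clause already satisfied
            free = [lit for lit in clause if abs(lit) not in assignment]
            if not free:
                return None                    # every literal is false: conflict
            if len(free) == 1 and unit is None:
                unit = free[0]
            remaining.append(free)
        clauses = remaining
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0       # unit propagation
    if not clauses:
        return assignment
    branch = abs(clauses[0][0])
    for value in (True, False):
        result = solve(clauses, {**assignment, branch: value})
        if result is not None:
            return result
    return None


def plan(n_rows, n_cols, steps, start, goal, obstacles=()):
    """Return the agent's position at each time point, or None if no plan exists."""
    clauses, var = encode(n_rows, n_cols, steps, start, goal, obstacles)
    model = solve(clauses)
    if model is None:
        return None
    return [next((x, y) for x in range(n_rows) for y in range(n_cols)
                 if model.get(var(x, y, t))) for t in range(steps)]


path = plan(3, 3, 5, (0, 0), (2, 2), obstacles=[(1, 1)])
```

On this 3x3 grid with the centre blocked, a plan with 5 time points exists (it has to go around the border), while asking for only 4 time points makes the formula unsatisfiable and plan returns None.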

Match Two Sets of Measurement Data With Different Logging Start Times and End Times

Problem
I have two arrays (Xa and Xb) that contain measurements of the same physical signal, but they are taken at different sample rates. In addition, logging of Xa starts at a different time than logging of Xb, and the two logs also stop at different times.
i.e.
(The following is just a summary of important statements, not code.)
sampleRatea > sampleRateb % Resolution of Xa is greater than that of Xb
t0a ~= t0b % Start times are not equal
t1a ~= t1b % End times are not equal
Objective
Find the necessary shift in indices that will best line up these sets of data.
Approach
Use fmincon to find the index that minimizes the mean squared error (MSE) between versions Xa and Xb that are edited to have the same sample rate (perhaps using the interpolation function).
I have tried to do this, but I always seem to have too many degrees of freedom. Can anyone shed some light on an approach that might make this easier?
Assuming you have two samples with constant frequencies, the problem reduces to something quite simple:
Find scale, location such that:
Xa, at the timestamps corresponding to its indices, makes the best match with Xb at the timestamps corresponding to location + scale * its index.
If you agree with this, you can see that only two degrees of freedom are left. If you know the ratio of the sample rates, it even reduces to just one degree of freedom.
I believe that now the hard part is done, but some work still remains:
Judge how good two samples with timestamps and values match
Find the optimal combination of your location and scale parameter
Note that, assuming you complete these 2 steps properly, the solution should be optimal in terms of timestamps. As you are looking for a shift in (integer) indices, translating these timestamps back to indices may not give the true optimum, but it should be pretty close.
Here is a quick-and-dirty solution that should be enough to get you started. Given your input signals Xa and Xb sampled at sampleRatea and sampleRateb respectively:
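g = gcd(sampleRatea,sampleRateb);        % greatest common divisor of the two rates
Ya = interp(Xa,sampleRateb/g);           % upsample Xa to the common rate Yfs
Yb = interp(Xb,sampleRatea/g);           % upsample Xb to the common rate Yfs
Yfs = sampleRatea*sampleRateb/g;         % common rate = least common multiple of the rates
[acor,lag] = xcorr(Ya,Yb);               % cross-correlation over all lags
time_shift = lag(acor == max(acor))/Yfs; % lag of the correlation peak, in seconds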
The variable time_shift will tell you the time elapsed between the start of A and the start of B. If B starts first, the result will be negative.
If your sampling rates are relatively prime, this will be horribly inefficient. If one is an integer multiple of the other, or they have a relatively large GCD, it will be much better.
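If it helps to see the same idea outside MATLAB, here is a dependency-free Python sketch: resample both logs to a common rate, then pick the lag that maximises the cross-correlation. The function names are my own, the linear interpolation is a simplification (it is not what MATLAB's interp does), the rates are assumed to be integers, and the brute-force correlation is only meant for short signals.

```python
from math import exp, gcd


def resample_linear(samples, rate_in, rate_out):
    """Linearly interpolate `samples` from rate_in to rate_out (integer Hz)."""
    n_out = len(samples) * rate_out // rate_in
    out = []
    for i in range(n_out):
        pos = i * rate_in / rate_out              # fractional index into samples
        lo = min(int(pos), len(samples) - 1)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def best_lag(a, b):
    """Return the lag k that maximises sum_n a[n + k] * b[n]."""
    best_k, best_score = 0, float("-inf")
    for k in range(-(len(b) - 1), len(a)):
        overlap = range(max(0, -k), min(len(b), len(a) - k))
        score = sum(a[n + k] * b[n] for n in overlap)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def estimate_start_offset(xa, rate_a, xb, rate_b):
    """Estimate how many seconds after the start of A the logging of B began."""
    fs = rate_a * rate_b // gcd(rate_a, rate_b)   # common rate (least common multiple)
    ya = resample_linear(xa, rate_a, fs)
    yb = resample_linear(xb, rate_b, fs)
    return best_lag(ya, yb) / fs


def signal(t):
    """Made-up test signal: a single pulse centred at t = 2 s."""
    return exp(-((t - 2.0) / 0.2) ** 2)


xa = [signal(i / 100) for i in range(400)]        # A: starts at t = 0 s, 100 Hz
xb = [signal(0.5 + i / 50) for i in range(150)]   # B: starts at t = 0.5 s, 50 Hz
offset = estimate_start_offset(xa, 100, xb, 50)
```

For this synthetic pulse the estimate should come out close to 0.5 s. With real data you would usually subtract the mean of each signal first, and a library routine for the correlation will be much faster than the double loop here.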

To find the largest edge in the path between two given nodes / vertices

I am trying to update a MST by adding a new vertex in the MST. For this, I have been following "Updating Spanning Tree" by Chin and Houck. http://www.computingscience.nl/docs/vakken/al/WerkC/UpdatingSpanningTrees.pdf
A step in the paper requires me to find the largest edge in the path/paths between two given vertices. My idea is to find all the possible paths between the vertices and then, subsequently find the largest edge from the paths. I have been trying to implement this in MATLAB. However, so far, I have been unsuccessful. Any lead / clear algorithm to find all paths between two vertices or even the largest edge in the path between two given nodes/ vertices would be really welcome.
For reference, I would like to put forward an example. If the graph has following edges 1-2, 1-3, 2-4 and 3-4, the paths between 4 and 4 are:
1) 4-2-1-3-4
2) 4-3-1-2-4
Thank you
The algorithm works by lowering the t value to exclude large edges from the new MST. When the algorithm completes, t will be the lowest edge that remains to be inserted to complete the MST.
The m value represents the largest edge on a path from r to z, local to each run of INSERT. m is lowered at each iteration of the loop if possible, thereby removing the previous m edge as a possible candidate for t.
It's not easy to explain in words, so I recommend doing a run of the algorithm on paper until the steps are clear.
I made a quick attempt to sketch the steps here: http://jacob.midtgaard-olesen.dk/?p=140
But basically, the algorithm adds edges from the old MST unless it finds a smaller edge to add between the new node z and another node in the old MST. In the example, the edge (A,B) is not in the new tree, since a better connection to B was found by the algorithm.
Note that when selecting h and k, if t and (w,r) have equal edge value, I believe you should choose (w,r).
Finally, you should probably go through the proof following the algorithm to understand why the algorithm works. (I didn't read it all :) )
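On the narrower question of finding the largest edge on the path between two given vertices: in a tree (and an MST is one) there is exactly one simple path between any two vertices, so there is no need to enumerate all paths. Below is a minimal Python sketch of that lookup using a depth-first search. It is not the INSERT procedure from the paper, it only applies while the graph is still a tree, and the function name and the (u, v, weight) edge format are my own assumptions.

```python
def max_edge_on_tree_path(edges, source, target):
    """Return the heaviest edge (u, v, weight) on the unique tree path, or None."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    # Iterative DFS; each entry carries the heaviest edge seen on the way there.
    stack = [(source, None, None)]   # (vertex, parent, heaviest edge so far)
    while stack:
        vertex, parent, heaviest = stack.pop()
        if vertex == target:
            return heaviest
        for neighbour, weight in adjacency.get(vertex, []):
            if neighbour == parent:
                continue             # in a tree, skipping the parent avoids revisits
            edge = (vertex, neighbour, weight)
            best = edge if heaviest is None or weight > heaviest[2] else heaviest
            stack.append((neighbour, vertex, best))
    return None                      # target not reachable from source


tree = [(1, 2, 4), (1, 3, 7), (3, 4, 2)]   # made-up weights for illustration
heaviest = max_edge_on_tree_path(tree, 2, 4)
```

For the sample tree the path from 2 to 4 is 2-1-3-4, and the heaviest edge on it is the one between 1 and 3.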

Dijkstra's algorithm with negative weights

Can we use Dijkstra's algorithm with negative weights?
STOP! Before you think "lol nub you can just endlessly hop between two points and get an infinitely cheap path", I'm more thinking of one-way paths.
An application for this would be a mountainous terrain with points on it. Obviously going from high to low doesn't take energy, in fact, it generates energy (thus a negative path weight)! But going back again just wouldn't work that way, unless you are Chuck Norris.
I was thinking of incrementing the weight of all points until they are non-negative, but I'm not sure whether that will work.
As long as the graph does not contain a negative cycle (a directed cycle whose edge weights have a negative sum), it will have a shortest path between any two points, but Dijkstra's algorithm is not designed to find them. The best-known algorithm for finding single-source shortest paths in a directed graph with negative edge weights is the Bellman-Ford algorithm. This comes at a cost, however: Bellman-Ford requires O(|V|·|E|) time, while Dijkstra's requires O(|E| + |V|log|V|) time, which is asymptotically faster for both sparse graphs (where E is O(|V|)) and dense graphs (where E is O(|V|^2)).
In your example of a mountainous terrain (necessarily a directed graph, since going up and down an incline have different weights) there is no possibility of a negative cycle, since this would imply leaving a point and then returning to it with a net energy gain - which could be used to create a perpetual motion machine.
Increasing all the weights by a constant value so that they are non-negative will not work. To see this, consider the graph where there are two paths from A to B, one traversing a single edge of length 2, and one traversing edges of length 1, 1, and -2. The second path is shorter, but if you increase all edge weights by 2, the first path now has length 4, and the second path has length 6, reversing the shortest paths. This tactic will only work if all possible paths between the two points use the same number of edges.
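The counterexample is easy to check numerically. This short Python sketch just sums the edge weights of the two paths before and after adding the constant; the variable and function names are only for illustration.

```python
direct = [2]            # A -> B over a single edge
detour = [1, 1, -2]     # A -> B over three edges
offset = 2              # constant added to every edge to remove the -2


def shifted_length(path, offset):
    """Path length after adding `offset` to every edge weight."""
    return sum(weight + offset for weight in path)


original = (sum(direct), sum(detour))
shifted = (shifted_length(direct, offset), shifted_length(detour, offset))
```

A path with k edges grows by k * offset, so the three-edge detour is penalised three times as much as the single-edge path.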
If you read the proof of optimality, one of the assumptions made is that all the weights are non-negative. So, no. As Bart recommends, use Bellman-Ford if there are no negative cycles in your graph.
You have to understand that a negative edge isn't just a negative number --- it implies a reduction in the cost of the path. If you add a negative edge to your path, you have reduced the cost of the path --- if you increment the weights so that this edge is now non-negative, it does not have that reducing property anymore and thus this is a different graph.
I encourage you to read the proof of optimality --- there you will see that the assumption that adding an edge to an existing path can only increase (or not affect) the cost of the path is critical.
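For reference, here is a compact Python sketch of Bellman-Ford, which handles negative edges as long as no negative cycle is reachable from the source. The function name, the 0-based vertex numbering and the (u, v, weight) edge format are assumptions made for this example.

```python
def bellman_ford(num_vertices, edges, source):
    """Single-source shortest paths; edges are (u, v, weight) and may be negative."""
    dist = [float("inf")] * num_vertices
    dist[source] = 0
    # Relax every edge up to |V| - 1 times.
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break                      # converged early
    # One more pass: any further improvement means a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from the source")
    return dist


# Two routes from vertex 0 to vertex 3: one edge of weight 2, or 1 + 1 - 2
edges = [(0, 3, 2), (0, 1, 1), (1, 2, 1), (2, 3, -2)]
distances = bellman_ford(4, edges, 0)
```

On this small graph the three-edge route wins, so the distance to vertex 3 is 0 rather than 2.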
You can use Dijkstra's on a graph with negative weights, but you first have to find the proper offset for each vertex. That is essentially what Johnson's algorithm does. But that would be overkill, since Johnson's uses Bellman-Ford to find the weight offset(s). Johnson's is designed to find all shortest paths between pairs of vertices.
http://en.wikipedia.org/wiki/Johnson%27s_algorithm
There is actually an algorithm which uses Dijkstra's algorithm in a negative path environment; it does so by removing all the negative edges and rebalancing the graph first. This algorithm is called 'Johnson's Algorithm'.
The way it works is by adding a new node (let's say Q) which has a 0-cost edge to every other node in the graph. It then runs Bellman-Ford on the graph from Q, getting a cost for each node with respect to Q, which we will call q[x]. Each q[x] is either 0 or a negative number (negative if the cheapest route from Q uses one of the negative edges).
E.g. a -> -3 -> b, therefore if we add a node Q which has 0 cost to all of these nodes, then q[a] = 0, q[b] = -3.
We then rebalance the edges using the formula weight + q[source] - q[destination], so the new weight of a->b is -3 + 0 - (-3) = 0. We do this for all other edges in the graph, then remove Q and its outgoing edges and voila! We now have a rebalanced graph with no negative edges, on which we can run Dijkstra's!
The running time is O(nm) [bellman-ford] + n x O(m log n) [n Dijkstra's] + O(n^2) [weight computation] = O (nm log n) time
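Here is a rough Python sketch of the reweighting step followed by a single Dijkstra run. The function names and the (u, v, weight) edge format are my own, and starting every q[x] at 0 stands in for the extra node Q with its 0-cost edges. It runs Dijkstra from one source only; the full algorithm repeats that for every vertex.

```python
import heapq


def johnson_reweight(num_vertices, edges):
    """Return (q, reweighted_edges); every reweighted edge is non-negative."""
    q = [0] * num_vertices             # same as relaxing Q's 0-cost edges once
    for _ in range(num_vertices - 1):
        for u, v, w in edges:
            if q[u] + w < q[v]:
                q[v] = q[u] + w
    if any(q[u] + w < q[v] for u, v, w in edges):
        raise ValueError("graph contains a negative cycle")
    return q, [(u, v, w + q[u] - q[v]) for u, v, w in edges]


def dijkstra(num_vertices, edges, source):
    """Standard heap-based Dijkstra; requires non-negative weights."""
    adjacency = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        adjacency[u].append((v, w))
    dist = [float("inf")] * num_vertices
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                   # stale heap entry
        for v, w in adjacency[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def shortest_paths_from(num_vertices, edges, source):
    """Dijkstra on the reweighted graph, then undo the reweighting."""
    q, reweighted = johnson_reweight(num_vertices, edges)
    shifted = dijkstra(num_vertices, reweighted, source)
    return [d - q[source] + q[v] for v, d in enumerate(shifted)]


# The a -> b example from above, with a = 0 and b = 1
q, reweighted = johnson_reweight(2, [(0, 1, -3)])
```

For the a -> b example this gives q[a] = 0, q[b] = -3 and a reweighted edge of 0, matching the hand calculation above.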
More info: http://joonki-jeong.blogspot.co.uk/2013/01/johnsons-algorithm.html
Actually I think it'll work to modify the edge weights. Not with an offset but with a factor. Assume instead of measuring the distance you are measuring the time required from point A to B.
weight = time = distance / velocity
You could even adapt velocity depending on the slope to use the physical one if your task is for real mountains and car/bike.
Yes, you could do that with adding one step at the end i.e.
If v ∈ Q, Then Decrease-Key(Q, v, v.d)
Else Insert(Q, v) and S = S \ {v}.
An expression tree is a binary tree in which all leaves are operands (constants or variables), and the non-leaf nodes are binary operators (+, -, /, *, ^). Implement this tree to model polynomials with the basic methods of the tree including the following:
A function that calculates the first derivative of a polynomial.
Evaluate a polynomial for a given value of x.
[20] Use the following rules for the derivative:
Derivative(constant) = 0
Derivative(x) = 1
Derivative(P(x) + Q(y)) = Derivative(P(x)) + Derivative(Q(y))
Derivative(P(x) - Q(y)) = Derivative(P(x)) - Derivative(Q(y))
Derivative(P(x) * Q(y)) = P(x)*Derivative(Q(y)) + Q(x)*Derivative(P(x))
Derivative(P(x) / Q(y)) = P(x)*Derivative(Q(y)) - Q(x)*Derivative(P(x))
Derivative(P(x) ^ Q(y)) = Q(y) * (P(x) ^(Q(y) - 1)) * Derivative(Q(y))