How to separate two distinct sets of points within a distribution - MATLAB

I am having difficulty unambiguously distinguishing two or more sets of points within a distribution on a two-dimensional plane. I would like to separate those points into two different arrays on the basis of their position (they form two separate distributions). I was thinking of something like a polar transformation (cart2pol), sorting the points by angle and then using the hist command to see the arrangement, but I cannot work out a stable and unambiguous solution (because the position of those points isn't known beforehand). Thanks for the help.

Related

Label/Color Probability Plot Points by Group

I am plotting a data set consisting of the same kind of data from nominally the same source. I am using MATLAB's probplot command. I would like to do something along the lines of using findgroups to plot the points with a color corresponding to their source. I have attempted, for example,
probplot('lognormal', data, ~findgroups(category))
I am not sure how (or if) one could use the 'params' functionality to pass the findgroups and make this work. I'm interested in any other solutions as well.

Putative correspondences

I am trying to implement the algorithm for estimating the fundamental matrix between two images using RANSAC. So far I have found the interest points using Harris corner detection. I am stuck at computing the putative correspondences using these interest points. I don't want to use a MATLAB toolbox for that; I would like to learn how corresponding points are extracted from two images and how this is implemented. I have read about block matching but have not completely understood the concept. Any samples and guidelines would help me understand this problem better.
Thanks in advance.
There are many ways to search for corresponding interest points, but they're usually based on describing each interest point using the characteristics of the image around it and, for each point in one image, comparing the characteristics of its surroundings to those of the surroundings of the interest points in the other image.
Now assume you've decided to consider only a square region (a block) around each point of interest that contains the intensity values of the image around the point. You can then compare these blocks and match those that are close to each other. The problem is now how to define "close" or, in other words, how to define the distance metric you'll use to compare these blocks.
There are many approaches. For example, you could use the sum of absolute differences (SAD) between two blocks: subtract one block from the other, take the absolute value of each element of the result, and sum all of those values. The resulting scalar represents how close the blocks are. If this distance is less than a given threshold, you can consider the two blocks a match. This is basically what block matching does.
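As a rough illustration of the block matching just described, here is a minimal NumPy sketch (Python rather than MATLAB). The function names, the block half-width of 3 and the threshold of 500 are assumptions made for this example and do not come from the answer; interest points are assumed to be given as (row, col) pairs lying far enough from the image border.

```python
import numpy as np

def extract_block(image, point, half):
    """Return the (2*half+1) x (2*half+1) intensity block centred on point=(row, col)."""
    r, c = point
    return image[r - half:r + half + 1, c - half:c + half + 1].astype(np.float64)

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return float(np.abs(block_a - block_b).sum())

def match_points(img1, pts1, img2, pts2, half=3, threshold=500.0):
    """For each point in pts1, find the point in pts2 with the smallest SAD.

    Points are assumed to lie at least `half` pixels from the image border.
    Returns a list of (index_in_pts1, index_in_pts2, sad_value) tuples,
    keeping only pairs whose SAD is below `threshold` (an assumed value).
    """
    blocks2 = [extract_block(img2, p, half) for p in pts2]
    matches = []
    for i, p in enumerate(pts1):
        block1 = extract_block(img1, p, half)
        distances = [sad(block1, b) for b in blocks2]
        j = int(np.argmin(distances))
        if distances[j] < threshold:
            matches.append((i, j, distances[j]))
    return matches
```

The resulting pairs are only putative correspondences: some will be wrong, which is exactly what the RANSAC step is meant to tolerate. A sensible threshold depends on the block size and the intensity range, so it has to be tuned on real images.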
Similarly, you could define other types of regions to describe your points of interest, for example by changing their shapes, sizes, orientations, etc., and create more complex descriptors for these points, which might capture more distinguishing characteristics (which is highly desirable if you intend to match them later).
If you want to learn more about the topic, I think this presentation can get you started:
http://courses.cs.washington.edu/courses/cse455/09wi/Lects/lect6.pdf

Clusters based on distance

Here is my problem: I have a list of villages. For each village I computed the path distance between them and prepared a distance matrix. Now I want to identify clusters of villages which are close to each other.
I use Python 2.7 and I have already used hierarchical clustering (provided by scipy) to cluster the distance matrix. Looking at the result myself, I can identify the nearest villages, but I need to automate this. I need to get the elements which belong to each cluster.
I was also wondering how to retrieve the clusters once I had created and cut the dendrogram. Since this is unanswered and may come up for others with a similar question, I'll answer according to what I was looking for, making some assumptions since this is an old question.
The first step is to determine where to cut the dendrogram. You can do this in a variety of ways, but I'll assume you already know how, since you're looking at the dendrogram and seem to have satisfied yourself that you have clustered the data. If you don't know where to cut, you could start with something simple like cutting at the max distance. But really, where to cut is a different, very long discussion which I will assume you have figured out (since I had done so at this point in my search).
Now I assume you have a dendrogram, and you know where to cut it, and maybe you even have it plotted with the cut line. But you want to do something more with the clusters, so you need to label the points you clustered. This can be done using the flat cluster (fcluster()) function in scipy.
from scipy.cluster.hierarchy import fcluster
# Z: linkage matrix from linkage(); distance: height at which to cut the dendrogram
clusters = fcluster(Z, distance, criterion='distance')
print(clusters)
Z is the hierarchical linkage matrix (as from scipy's linkage() function) which I assume you had already created. distance is the distance at which you are cutting the dendrogram (but there are other ways to cut the dendrogram, see source for how to do this with fcluster).
This returns a numpy array denoting which observation is in which cluster. Now you can append this to your data as a new column and go to town (or village) with it.
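To make the steps above concrete, here is a minimal end-to-end sketch that starts from a square distance matrix, as in the question. The village names, the distance values, the average-linkage method and the cut height of 3.0 are all made up for illustration, and the sketch is written for Python 3.

```python
from collections import defaultdict

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical village names and symmetric path-distance matrix (illustrative values).
villages = ["A", "B", "C", "D", "E"]
dist = np.array([
    [0.0, 1.0, 1.5, 9.0, 9.5],
    [1.0, 0.0, 1.2, 8.5, 9.0],
    [1.5, 1.2, 0.0, 8.0, 8.5],
    [9.0, 8.5, 8.0, 0.0, 0.8],
    [9.5, 9.0, 8.5, 0.8, 0.0],
])

# linkage() expects the condensed (upper-triangle) form of a distance matrix.
Z = linkage(squareform(dist), method="average")

# Cut the dendrogram at an assumed height of 3.0.
labels = fcluster(Z, t=3.0, criterion="distance")

# Group village names by cluster label.
members = defaultdict(list)
for name, label in zip(villages, labels):
    members[int(label)].append(name)

for label, names in sorted(members.items()):
    print(label, names)
```

Passing the square matrix straight to linkage() would treat its rows as observation vectors rather than as distances, which is why the squareform() conversion is there.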

Data interpolation over a non-homogeneous surface

In my project I deal with large data surfaces.
At a certain point, I have a line across the data, and I need the values at the points on that line.
The grid is non-homogeneous: it does not run from n to m in fixed steps or anything like that.
Let me illustrate!
The figure shows the 2D projection of my data. Each point also carries 3 other data values. I defined an arbitrary red line of the form y=ax+b, where a and b are known.
How can I define, say, 50 points on the line that have not only the x and y coordinates (which is straightforward) but also the 3 data values interpolated from the points around them?
I know this is not an easy question, but I can't seem to make any progress.
PS: note that I DON'T want code written for me, just the idea of how to achieve my objective.
You could use a tool like TriScatteredInterp, which will triangulate the 2-d domain, then interpolate a list of points along your line. The griddata function is also an option.
I have a toolbox for problems like this (of course.) It allows me to build a triangulation of the non-convex domain in the (x,y) plane. Then it can form a completely general slice through that surface, interpolating in z also as it does so. The result will be a 1-manifold, in this case a piecewise linear function along that path in (x,y,z). While those tools are not posted on the file exchange, they are available for the person willing to invest the time to learn to use them.
If the surface you describe is a completely general one in 3-d, which might be fairly complex, then you might need a CRUST-based tool to define that surface triangulation. These can be found online too. Once a triangulation is available, my tools can then be used to slice them. (Sorry, I never did finish that piece.)
What I did was define several points on the crack line and then check, for each of them, which quadrilateral it lies in using MATLAB's inpolygon function (not the fastest way, but it takes less than 2 seconds).
Then I created a triangular plane in each of those quadrilaterals using x, y and z or the other data, which gives a linear interpolation between the data.
Finally, I remove all the points that are 0 or NaN.
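For readers who do want to see the triangulate-then-interpolate idea in code, here is a minimal sketch in Python using SciPy's LinearNDInterpolator, which plays a role similar to the MATLAB tools mentioned above. The function name sample_along_line and its parameters are hypothetical; the sketch assumes the scattered points are given as an (N, 2) coordinate array with K data values per point.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def sample_along_line(xy, values, a, b, x_start, x_end, n=50):
    """Interpolate scattered data at n points on the line y = a*x + b.

    xy     : (N, 2) array of scattered point coordinates
    values : (N, K) array with K data values per point
    Returns (line_xy, line_values); points outside the convex hull are dropped.
    """
    # Delaunay-triangulates xy and interpolates linearly inside each triangle.
    interp = LinearNDInterpolator(xy, values)
    x = np.linspace(x_start, x_end, n)
    line_xy = np.column_stack([x, a * x + b])
    line_values = interp(line_xy)
    # Points outside the triangulated region come back as NaN.
    inside = ~np.isnan(line_values).any(axis=1)
    return line_xy[inside], line_values[inside]
```

Note that the triangulation covers the convex hull of the points, so for a non-convex domain this sketch would also interpolate across gaps in the data; handling that properly needs the kind of dedicated tooling described in the answer above.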

Process for comparing two datasets

I have two datasets at a time (in the form of vectors), and I plot them on the same axes to see how they relate to each other. I specifically look for places where both graphs have a similar shape (i.e. places where both seem to have a positive/negative gradient over approximately the same intervals). Example:
So far I have been working through the data graphically, but the amount of data is so large that plotting each time I want to check how two sets correlate will take far too much time.
Are there any ideas, scripts or functions that might be useful for automating this process somewhat?
The first thing you have to think about is the nature of the criteria you want to apply to establish similarity. There is a wide variety of ways to measure similarity, and the more precisely you can describe what you want "similar" to mean in your problem, the easier it will be to implement, regardless of the programming language.
Having said that, here are some of the things you could look at:
correlation of the two datasets
difference of the derivative of the datasets (but I don't think it would be robust enough)
spectral analysis as mentioned by #thron of three
etc. ...
Knowing the origin of the datasets and their variability can also help a lot in formulating robust enough algorithms.
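As one possible reading of the correlation suggestion in the list above, the following sketch computes the Pearson correlation of the two vectors over consecutive windows, so that stretches with similar shape stand out as values close to 1. It is written in Python with NumPy; the function name and the choice of non-overlapping windows are assumptions made for illustration.

```python
import numpy as np

def windowed_correlation(a, b, window):
    """Pearson correlation of a and b over consecutive non-overlapping windows.

    Returns a list of (start_index, correlation) pairs; windows in which
    either signal is constant are reported as NaN.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    results = []
    for start in range(0, len(a) - window + 1, window):
        wa = a[start:start + window]
        wb = b[start:start + window]
        if wa.std() == 0 or wb.std() == 0:
            # Correlation is undefined when one window has zero variance.
            results.append((start, float("nan")))
        else:
            results.append((start, float(np.corrcoef(wa, wb)[0, 1])))
    return results
```

The window length sets the scale at which "similar shape" is judged, so it should be chosen from what is known about the data, in line with the remark above about knowing the origin of the datasets.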
Sure. Call your two vectors A and B.
1) (Optional) Smooth your data either with a simple averaging filter (Matlab 'smooth'), or the 'filter' command. This will get rid of local changes in velocity ("gradient") that appear to be essentially noise (as in the ascending component of the red trace).
2) Differentiate both A and B. Now you are directly representing the velocity of each vector (Matlab 'diff').
3) Add the two differentiated vectors together (element-wise). Call this C.
4) Look for all points in C whose absolute value is above a certain threshold (you'll have to eyeball the data to get a good idea of what this should be). Points above this threshold indicate highly similar velocity.
5) Now look for where a high positive value in C is followed by a high negative value, or vice versa. In between these two points you will have similar curves in A and B.
Note: a) You could do the smoothing after step 3 rather than after step 1. b) Re 5), you could have a situation in which a 'hill' in your data is at the edge of the vector and so is 'cut in half', and the vectors descend to baseline before ascending in the next hill. Then 5) would misidentify the hill as coming between the initial descent and subsequent ascent. To avoid this, you could also require that the points in A and B in between the two points of velocity similarity have high absolute values.
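Steps 1 to 5 above could be sketched roughly as follows, in Python with NumPy rather than MATLAB. The function names, the filter width and the way step 5 pairs up runs of flagged samples are one interpretation chosen for this example; the extra amplitude check from note b) is not included.

```python
import numpy as np

def moving_average(x, width):
    """Step 1: simple averaging filter (stands in for MATLAB's smooth).

    mode="same" zero-pads, so the first and last width//2 samples are
    biased toward zero.
    """
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

def similar_velocity_mask(a, b, threshold, width=5):
    """Steps 1-4: flag samples where A and B change strongly in the same direction."""
    da = np.diff(moving_average(np.asarray(a, dtype=float), width))  # step 2
    db = np.diff(moving_average(np.asarray(b, dtype=float), width))
    c = da + db                                                      # step 3
    return c, np.abs(c) > threshold                                  # step 4

def similar_segments(c, mask):
    """Step 5: spans from the start of one run of flagged samples to the end
    of the next run of opposite sign (rise then fall, or fall then rise)."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return []
    signs = np.sign(c[idx])
    # Split the flagged samples into runs of constant sign.
    breaks = np.flatnonzero(np.diff(signs) != 0) + 1
    runs = np.split(idx, breaks)
    return [(int(r1[0]), int(r2[-1])) for r1, r2 in zip(runs[:-1], runs[1:])]
```

Because c has one element fewer than the input vectors, the returned indices refer to positions in the differenced data. As in step 4, the threshold has to be chosen by looking at the data.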