Writing ELKI DBSCAN convex hull of clusters to file

I have started using ELKI for data analysis, but one seemingly simple thing I cannot manage is writing the computed convex hull of clusters to a file after running DBSCAN. I can visualize the convex hulls via the visualization GUI, but I cannot generate the KML file. I am also able to write my clustering results to a folder (using the ResultWriter result handler), but no file is generated when I set the KMLOutputHandler. I receive no error message in the log window (even with the verbose parameter set to true).
Is there a trick to generating a KML file in ELKI? Could anyone walk through the steps of doing this?
Any help would be appreciated.
(As an aside, is it possible to generate alpha shapes for DBSCAN results with ELKI? If so, which parameter must be adjusted?)

So that is actually a lot of questions in one...
Convex hulls: they are used in ELKI for visualization, but they are not considered part of the output result, so they are not written to file. One trick you could employ is to save the visualization as SVG and extract the hulls from that file, but they will then be in a different coordinate system.
One of the reasons for this is that the convex hulls are only implemented for 2D Euclidean space. I figure you want to use them for spatial data, where, due to the curvature of the earth's surface, they may not actually be correct convex hulls. Furthermore, many data sets will be of higher dimensionality.
However, you can of course look at the source code and invoke the convex hull algorithm, then write the result to your favorite output format. In general, just as you will need to spend time on preprocessing, you will also need to customize the output.
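For instance, if you export each cluster's members with ResultWriter, a short external script can compute the hulls and write them to KML. A minimal sketch (the file name and column layout are assumptions, and it uses SciPy's hull rather than ELKI's own implementation):

```python
import numpy as np
from scipy.spatial import ConvexHull

# Assumption: cluster_0.txt was written by ResultWriter and has
# longitude and latitude in the first two columns; adjust as needed.
points = np.loadtxt("cluster_0.txt", usecols=(0, 1), comments="#")

hull = ConvexHull(points)
ring = points[hull.vertices]        # hull corners in counter-clockwise order
ring = np.vstack([ring, ring[:1]])  # KML linear rings must be closed

coords = " ".join("%f,%f,0" % (lon, lat) for lon, lat in ring)
kml = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Document><Placemark>'
    '<name>cluster_0 hull</name><Polygon><outerBoundaryIs><LinearRing>'
    '<coordinates>%s</coordinates>'
    '</LinearRing></outerBoundaryIs></Polygon>'
    '</Placemark></Document></kml>' % coords
)
with open("cluster_0_hull.kml", "w") as f:
    f.write(kml)
```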
Which brings me to the second question. The KMLResultHandler is closely tied to the publication of ELKI 0.4.0: Spatial Outlier Detection: Data, Algorithms, Visualizations.
Which pretty much summarizes what this class does: visualize spatial outlier detection. It does not (yet) include code to visualize clusters of spatial data, for example. To get output from the class, you unfortunately need to satisfy a number of restrictions: essentially, if it finds a Polygon relation and an OutlierResult that it can map to each other, it will output these to KML.
It is not yet a class that can write arbitrary results to KML, and it probably needs a lot more documentation, too. Contributions of a more general output tool would be appreciated; but a customizable, automatic, general output to KML is really hard to do. In particular, you may end up having to include projection capabilities for when someone is not processing latitude-longitude data but, e.g., UTM-projected data.
As such, I recommend looking at the source code of the class and customizing it to your needs. In my opinion, visualization to KML will always require a lot of customization.
To generate alpha shapes, you just need to set the -hull.alpha parameter to the desired alpha value. (This gives only the hull, not the extended alpha shape. The optimal visualization of DBSCAN would likely be the alpha shape of the core points only, extended by a radius of epsilon, which should then include the border points; that is on the wish list, but not implemented.) Note that this happens in the visualization projection, not on the raw data: if the axes are scaled differently, the alpha shapes will look different. Again, you may be interested in using the class AlphaShape on the raw data vectors instead of exporting the projected visualization; then you can easily write the resulting polygons to your custom visualization.
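If you work on the raw vectors yourself, a 2D alpha shape can also be derived from a Delaunay triangulation. A sketch of the idea (the circumradius-below-1/alpha rule is the textbook convention; check how ELKI scales -hull.alpha before comparing values):

```python
import numpy as np
from scipy.spatial import Delaunay

def alpha_shape_edges(points, alpha):
    """Boundary edges (index pairs) of the alpha shape of 2D points:
    keep Delaunay triangles with circumradius < 1/alpha, then return
    edges that belong to exactly one kept triangle."""
    tri = Delaunay(points)
    edge_count = {}
    for ia, ib, ic in tri.simplices:
        a, b, c = points[ia], points[ib], points[ic]
        la = np.linalg.norm(b - c)
        lb = np.linalg.norm(c - a)
        lc = np.linalg.norm(a - b)
        s = (la + lb + lc) / 2.0                       # Heron's formula
        area = max(np.sqrt(max(s * (s - la) * (s - lb) * (s - lc), 0.0)), 1e-12)
        circumradius = la * lb * lc / (4.0 * area)
        if circumradius < 1.0 / alpha:
            for edge in ((ia, ib), (ib, ic), (ic, ia)):
                key = tuple(sorted(edge))
                edge_count[key] = edge_count.get(key, 0) + 1
    return [e for e, n in edge_count.items() if n == 1]

# edges = alpha_shape_edges(cluster_points, alpha=2.0)
```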
If you implement such a KML visualization using alpha shapes (or convex hulls) for clusters, I would appreciate if you could contribute this to ELKI to make it available for others as well. Thank you.

Related

Analyze 3D objects by Voxels

I plan to use OpenVDB to analyze 3D objects/meshes. The objective is:
To detect object surface regions with a certain criterion, like slope
Then manipulate those regions
The manipulation might be adding other 3D objects to those regions, for example
OpenVDB has some tools available:
Conversion Tools
Filters
Topological Operations
Level Set Tools
Morphological Operations
Geometric Transforms
Compositing Tools
...
It is a large and confusing set of tools to choose from. Does anybody with OpenVDB experience know:
Is OpenVDB the proper library to achieve my objective?
If so, which OpenVDB tool best suits my needs?
Answer provided by the OpenVDB community:
An important question is what you mean by "3D objects/meshes." OpenVDB is very good at performing those sorts of operations on surfaces by representing them as signed distance fields. But the word "mesh" raises some alarm bells that you may want to maintain topology; in that case another library may be more effective.
It also sounds like you have a problem domain you are trying to explore. For that, I would not go straight to code but instead explore solutions using 3D applications first. My own biased first choice would be Houdini, whose Apprentice version you can get for free. It provides most of the VDB code as separate nodes. So, for example, you can use a File SOP to load a mesh from disk, a VDB From Polygons to convert it to a signed distance field, and then a VDB Analysis to compute the gradient. The gradient, I think, matches what you are looking for as slope, but it is also possible you are looking for curvature...
To return to mesh land, you can use a VDB Convert. Finally, a ROP Geometry can save it out.
Attached is a file showing a network to compute an approximate Y-slope as a volume, apply it back to a mesh, and save to disk.
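If you do want to go straight to code afterwards, the analysis step can be prototyped with OpenVDB's Python bindings. A rough sketch: it uses a level-set sphere as a stand-in for your mesh's SDF (mesh-to-SDF conversion is easiest inside Houdini or via the C++ tools), and hand-rolls the gradient by central differences, since pyopenvdb does not expose the gradient tool:

```python
import numpy as np
import pyopenvdb as vdb

voxel_size = 0.1
# Stand-in for your mesh's signed distance field: a level-set sphere.
grid = vdb.createLevelSetSphere(radius=1.0, center=(0, 0, 0),
                                voxelSize=voxel_size)
acc = grid.getConstAccessor()

def surface_normal(i, j, k):
    """Unit normal at voxel (i, j, k): central-difference gradient of
    the signed distance field, normalized."""
    g = np.array([
        acc.getValue((i + 1, j, k)) - acc.getValue((i - 1, j, k)),
        acc.getValue((i, j + 1, k)) - acc.getValue((i, j - 1, k)),
        acc.getValue((i, j, k + 1)) - acc.getValue((i, j, k - 1)),
    ]) / (2.0 * voxel_size)
    return g / np.linalg.norm(g)

# Slope as the angle between the surface normal and the vertical (y)
# axis; values near 90 degrees flag near-vertical surface regions.
n = surface_normal(10, 0, 0)    # a voxel on the sphere's surface
slope_deg = np.degrees(np.arccos(abs(n[1])))
```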

How to separate a thin strip from the rest of data

I have a dataset that is represented by this picture.
As you can see, there is a thin strip on top of the rest of the data points. The question is how I can separate the strip from the rest, using cluster analysis or any other technique.
I have tried DBSCAN, k-means, and hierarchical clustering, and all gave me similar results, shown by the colors in the graph.
DBSCAN and OPTICS are your best candidates. If the data is not too big, you can also try mean shift. But they will not be able to do it perfectly - some points will be "noise" to them.
It's fairly obvious that k-means and most hierarchical clustering cannot solve this.
Keep minPts small (5 to 10) and focus on choosing epsilon: it must be small enough not to cover the gap. OPTICS will be easier to use, since you only need to give an upper bound on epsilon.
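A quick way to experiment with this advice (a scikit-learn sketch; ELKI's DBSCAN and OPTICS take the same parameters, and the epsilon values here are made up, to be tuned against the gap in your data):

```python
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS

X = np.loadtxt("points.txt")    # placeholder: your (n, 2) data

# minPts small, epsilon smaller than the gap between strip and bulk;
# label -1 marks noise points.
labels = DBSCAN(eps=0.05, min_samples=5).fit_predict(X)

# OPTICS only needs an upper bound on epsilon; inspect its
# reachability plot afterwards to choose the actual cut.
labels_optics = OPTICS(min_samples=5, max_eps=0.2).fit_predict(X)
```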
Also consider manually specifying a model; tweaking parameters until you get the result you want is not any better. Draw a line on your plot with a ruler and turn it into a linear model by reading off the parameters...
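The manual model can be as simple as thresholding the vertical distance to that hand-drawn line (slope, intercept, and threshold below are the made-up values you would read off your plot):

```python
import numpy as np

m, b = 0.5, 2.0    # slope and intercept read off the plot
threshold = 0.1    # maximum distance from the line to count as "strip"

# X as in the previous snippet: an (n, 2) array of points
d = np.abs(X[:, 1] - (m * X[:, 0] + b))   # vertical distance to y = m*x + b
strip, rest = X[d < threshold], X[d >= threshold]
```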

Curve fitting, but I want to guarantee only one inflection point

I often find myself fitting a scatter plot where I know the 'true fit' should have only one inflection point. Any ideas for forcing a fit to obey this?
I am using MATLAB and Microsoft Excel.
Many thanks
Option 1:
I like to use spline smoothing with the Akaike information criterion: while it is a hyper-parametric fit with a large number of analytic candidate inflection points, the smoothed curve at the sample points tends to reveal only what is actually in the data.
If your data doesn't have an inflection point, this will show; if it does, it is usually captured as well. In statistical jargon, an important cousin of this approach is the "non-informative prior".
Try slides 30-31 here: link.
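In code, one rough way to mimic this (a Python sketch; the question mentions MATLAB, but the idea carries over, and the k used below - the number of spline coefficients - is only a crude stand-in for effective degrees of freedom):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def aic_spline(x, y, smoothing_values):
    """Scan the smoothing parameter and keep the spline with the
    lowest (approximate) AIC = n*ln(RSS/n) + 2k."""
    n = len(x)
    best = None
    for s in smoothing_values:
        spl = UnivariateSpline(x, y, s=s)    # x must be increasing
        rss = float(np.sum((spl(x) - y) ** 2))
        k = len(spl.get_coeffs())
        aic = n * np.log(rss / n + 1e-300) + 2 * k
        if best is None or aic < best[0]:
            best = (aic, spl)
    return best[1]

# spl = aic_spline(x, y, np.logspace(-3, 2, 30))
# Count sign changes of the second derivative at the sample points:
# inflections = np.count_nonzero(np.diff(np.sign(spl.derivative(2)(x))))
```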
Option 2:
If you have an older version of MATLAB, you can specify the exact model easily in cftool (not the same as sftool) and then generate an M-file that shows how to call the fit from your own script. Pick a model appropriate to your data.
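"Pick a model appropriate to your data" is also the most direct way to guarantee the constraint: fit a parametric family that has exactly one inflection point by construction. A sketch with a logistic curve (SciPy here; a custom equation in cftool works the same way):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, L, k, x0, c):
    """Logistic curve: exactly one inflection point, at x = x0."""
    return L / (1.0 + np.exp(-k * (x - x0))) + c

# x, y: your scatter data (placeholders); p0 is a rough starting guess
p0 = [y.max() - y.min(), 1.0, np.median(x), y.min()]
params, _ = curve_fit(logistic, x, y, p0=p0, maxfev=10000)
y_fit = logistic(x, *params)
```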

Prediction avoiding landmass

I am working on a project where the following functions have to be implemented.
Predict the location of ships (in a maritime environment) at a future time (can be done with a Kalman filter, an IMM filter, and some other algorithms).
Ships can be in any part of the world.
Avoiding landmass during prediction
Shortest path along the shorelines
I am completely done with the first part, predicting without considering the shoreline information. I have problems with functions 2 and 3.
Problem in function 2
At times, the predicted location can fall on the landmass, which is totally unacceptable.
I am using the following coastal area shp file: http://openstreetmapdata.com/data/coastlines
This file contains converted XY values of the world shoreline data.
I have loaded this shp file into PostgreSQL and use PostGIS to read it from the database.
So my idea is to go through all the polygons (the shoreline is defined as polygons) and check whether the line connecting the present location and the predicted location crosses a polygon. If it does, we have to find where the ship intercepts the shoreline first.
If I follow this approach of going through all the polygons, it is going to take forever (there are around 62,000 polygons, each with thousands of points). Any advice on this? I thought about dividing the world map into hierarchical areas (level 1: 10 polygons; level 2: each polygon contains 10 polygons), but I am not sure how to divide the world map from the above shp file into the levels of polygons I require.
Is there any PostGIS functionality helpful for this, or any other library for this purpose? I believe this kind of functionality should already be available, but I have not been able to figure it out so far.
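For what it's worth, the crossing test itself is easy to prototype in Python with shapely before wiring it into the database. A sketch (polygons is assumed to hold the shoreline polygons loaded from the shp file, e.g. via fiona, and shapely >= 2.0 is assumed for the STRtree API):

```python
from shapely.geometry import LineString, Point
from shapely.ops import nearest_points
from shapely.strtree import STRtree

tree = STRtree(polygons)    # R-tree over bounding boxes; build once

def first_intercept(present, predicted):
    """First point where the segment present->predicted crosses a
    shoreline, or None if the predicted path stays at sea."""
    seg = LineString([present, predicted])
    start = Point(present)
    best = None
    for idx in tree.query(seg):    # bounding-box candidates only
        hit = seg.intersection(polygons[idx].boundary)
        if hit.is_empty:
            continue
        p = nearest_points(start, hit)[1]   # crossing nearest the ship,
        if best is None or start.distance(p) < start.distance(best):
            best = p                        # i.e. first along the segment
    return best
```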
Function 3
Now that we know where the ship first intercepts the shoreline, we can predict its path along the shoreline using a shortest-path algorithm, given that we know the destination. But to do this, we need to divide the above shoreline map into a grid so that a shortest-path search can be used.
So how can I make a grid along the shorelines based on this? I am not doing image processing here; what I have is this shp file. Any advice is appreciated. Or should I go with some image-processing approach to make the grid of shorelines? If so, please provide some links.
First, PostGIS is pretty fast, and with the proper indexes, as long as you keep your polygons reasonably small, you should be able to make up for their number with good indexing and overlapping-operator support (overlapping polygons can use GiST and GIN indexes, with the latter performing better than the former for reads and worse for writes).
62,000 polygons globally is nothing. Write back when you have to check more than a few thousand whose bounding boxes overlap with your line...
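Concretely, the index-backed check can look like this (a sketch; the table and column names, SRID 4326, and the coordinates are placeholders):

```python
import psycopg2

conn = psycopg2.connect("dbname=gis")    # connection details are placeholders
cur = conn.cursor()

lon_now, lat_now, lon_pred, lat_pred = -70.0, 40.0, -69.5, 40.2  # placeholders

# One-time setup so ST_Intersects can use the bounding-box pre-filter:
#   CREATE INDEX coastlines_geom_idx ON coastlines USING GIST (geom);
cur.execute("""
    WITH seg AS (
        SELECT ST_SetSRID(ST_MakeLine(ST_MakePoint(%s, %s),
                                      ST_MakePoint(%s, %s)), 4326) AS g
    )
    SELECT c.id, ST_AsText(ST_Intersection(c.geom, seg.g))
    FROM coastlines c, seg
    WHERE ST_Intersects(c.geom, seg.g);
""", (lon_now, lat_now, lon_pred, lat_pred))
crossings = cur.fetchall()    # polygons the predicted track runs into
```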
For the third problem, you are going in one direction, right? I wonder how hard it would be to write a tangent(point, vector, polygon) function that returns the closest tangent to a polygon along a certain vector (a vector could be represented by a (point, point) tuple). If you combined this with KNN searches, you ought to be able to plot a course using a WITH RECURSIVE query.
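And if you do rasterize the shorelines as described in the question, the shortest path over sea cells becomes an ordinary graph search. A minimal Dijkstra sketch over an 8-connected boolean sea mask (the mask itself - which cells are water - is assumed to come from rasterizing the polygons):

```python
import heapq
import numpy as np

def sea_route(sea, start, goal):
    """Dijkstra over an 8-connected grid. sea is a boolean array
    (True = navigable water); start and goal are (row, col) cells."""
    steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
             if (dr, dc) != (0, 0)]
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue    # stale heap entry
        r, c = cell
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if (0 <= nr < sea.shape[0] and 0 <= nc < sea.shape[1]
                    and sea[nr, nc]):
                nd = d + np.hypot(dr, dc)   # diagonal steps cost sqrt(2)
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in prev and goal != start:
        return None    # unreachable by water
    path, cell = [goal], goal
    while cell != start:
        cell = prev[cell]
        path.append(cell)
    return path[::-1]
```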

Ideas for extracting features of an object using image keypoints

I would appreciate your help in creating a feature vector for a simple object using keypoints. For now, I use the ETH-80 dataset; the objects have an almost-blue background, and pictures are taken from different views. Like this:
After creating a feature vector, I want to train a neural network with this vector and use that network to recognize an input image of an object. I don't want to make it complex; input images will be as simple as the training images.
I asked similar questions before, and someone suggested using the average value of a 20x20 neighborhood around each keypoint. I tried it, but it does not seem to work with the ETH-80 images because of the different views. That is why I am asking another question.
SURF or SIFT. Look for interest point detectors. A MATLAB SIFT implementation is freely available.
Update: Object Recognition from Local Scale-Invariant Features
SIFT and SURF features consist of two parts: the detector and the descriptor. The detector finds a point in some n-dimensional space (4D for SIFT); the descriptor is used to robustly describe the surroundings of those points. The latter is increasingly used for image categorization and identification in what is commonly known as the "bag of words" or "visual words" approach. In the simplest form, you collect all descriptors from all images and cluster them, for example using k-means. Every original image then has descriptors that contribute to a number of clusters; the centroids of those clusters, i.e. the visual words, can be used as a new descriptor for the image. The VLFeat website contains a nice demo of this approach, classifying the Caltech 101 dataset:
http://www.vlfeat.org/applications/apps.html#apps.caltech-101
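Here is what the simplest form of that pipeline looks like in code (a sketch using OpenCV's SIFT and scikit-learn's k-means; the vocabulary size of 100 and the train_paths file list are arbitrary placeholders):

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def image_descriptors(paths):
    """SIFT descriptors per image, skipping images with no keypoints."""
    per_image = []
    for path in paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            per_image.append(desc)
    return per_image

def bow_histograms(per_image, kmeans):
    """Quantize each image's descriptors against the visual vocabulary
    and build a normalized word-count histogram (the feature vector)."""
    k = kmeans.n_clusters
    hists = []
    for desc in per_image:
        words = kmeans.predict(desc)
        h = np.bincount(words, minlength=k).astype(float)
        hists.append(h / h.sum())
    return np.array(hists)

per_image = image_descriptors(train_paths)   # train_paths: your file list
vocab = KMeans(n_clusters=100, n_init=10).fit(np.vstack(per_image))
X = bow_histograms(per_image, vocab)         # feed X to the neural network
```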