Fast libraries for merging two tangled convex hulls accessible from python? - convex-hull

Am looking for something that is incremental (with accessible state). So that likely means some merge method is exposed.
So in general I want to start with a set of points, that has a ConvexHull calculated and add a point to it (which trivially has itself as a convex hull). Was looking for alternatives to BowyerWatson through convex hull merges. Not sure if this is a bad idea. Not sure if this should be a question in CS except it's about finding a real solution in the python echosystem.
I see some related content here.
Merging two tangled convex hulls
And Qhull (scipy Delaunay and ConvexHull use this) has a lot of options I do not yet understand
http://www.qhull.org/html/qh-optq.htm

You can use Andrew's modification of the Graham scan algorithm.
Here is a reference to some short python code implementing it.
What makes it suited for your needs is that after the points are sorted in xy-order, the upper and lower hulls are computed in linear time. Since you already have the convex hull (possibly both convex hulls), the xy-sorting of the convex hull points will take linear time (e.g., reverse the lower hulls and merge-sort four sorted lists). The rest of the algorithm will take linear time (in the number of points on the convex hulls, which may well be much smaller than the original number of points).
All the functionality for this implementation is in the code referenced above, and for the merge you can use the code from this SO answer or implement your own.

Related

Merging two tangled convex hulls

How to merge two tangled convex hulls (like this) to form a Convex Hull using Graham's Scan or anyother algorithm in linear time?
Basically, you use
Andrew's modification of the Graham scan algorithm.
After the points are sorted in xy-order, the upper and lower hulls are computed in linear time. Since you already have the two convex hulls , the xy-sorting of the convex hull points will also take linear time (e.g., reverse the lower hulls and merge-sort four sorted lists).
Since the rest of the algorithm will take linear time, the overall runtime is linear as you requested.
Here is a reference to some short python code implementing Andrew's modification to Graham's scan. See also my answer here.

Qhull(Qhalf) interior point

I am using qhull library for for computing the intersection of half spaces. Although this problem is a dual of convex hull problem, but as its input it needs an interior point of the intersection. As it is stated on their webpage, here, using linear programming we can find such a point. However, even for simple 2D cases, this LP problem does not have a bounded solution. Is there something wrong with the given instruction on the qhull website?
Well I found the answer myself! Yest the LP would be unbounded and we need to set an upper bound, depending on the context of the given problem.

Python Clustering Algorithms

I've been looking around scipy and sklearn for clustering algorithms for a particular problem I have. I need some way of characterizing a population of N particles into k groups, where k is not necessarily know, and in addition to this, no a priori linking lengths are known (similar to this question).
I've tried kmeans, which works well if you know how many clusters you want. I've tried dbscan, which does poorly unless you tell it a characteristic length scale on which to stop looking (or start looking) for clusters. The problem is, I have potentially thousands of these clusters of particles, and I cannot spend the time to tell kmeans/dbscan algorithms what they should go off of.
Here is an example of what dbscan find:
You can see that there really are two separate populations here, though adjusting the epsilon factor (the max. distance between neighboring clusters parameter), I simply cannot get it to see those two populations of particles.
Is there any other algorithms which would work here? I'm looking for minimal information upfront - in other words, I'd like the algorithm to be able to make "smart" decisions about what could constitute a separate cluster.
I've found one that requires NO a priori information/guesses and does very well for what I'm asking it to do. It's called Mean Shift and is located in SciKit-Learn. It's also relatively quick (compared to other algorithms like Affinity Propagation).
Here's an example of what it gives:
I also want to point out that in the documentation is states that it may not scale well.
When using DBSCAN it can be helpful to scale/normalize data or
distances beforehand, so that estimation of epsilon will be relative.
There is a implementation of DBSCAN - I think its the one
Anony-Mousse somewhere denoted as 'floating around' - , which comes
with a epsilon estimator function. It works, as long as its not fed
with large datasets.
There are several incomplete versions of OPTICS at github. Maybe
you can find one to adapt it for your purpose. Still
trying to figure out myself, which effect minPts has, using one and
the same extraction method.
You can try a minimum spanning tree (zahn algorithm) and then remove the longest edge similar to alpha shapes. I used it with a delaunay triangulation and a concave hull:http://www.phpdevpad.de/geofence. You can also try a hierarchical cluster for example clusterfck.
Your plot indicates that you chose the minPts parameter way too small.
Have a look at OPTICS, which does no longer need the epsilon parameter of DBSCAN.

Data interpolation over a non-homogeneous surface

In my project i deal with big data surfaces.
In a certain point, i have a line across the data, and I need the values of the points of the line.
The grid is non,homogeneous, it doesnt go from n:m with fixed steps nor nothing.
Lets ilustrate!
In the figure the 2D proyection of my data can be seen. Each of the points has also other 3 data information. I defined a arbitrary red line with the form y=ax+b. a and b are known.
How can I define i.e. 50 points in the line that has not only the x and y coords (wich is straigforward) but also the interpolation of the 3 data information of each of the points around it.
I know is not an easy question but I can't seem to step forward even a bit.
PD: realize I DONT want code written for me, but the idea of how to achieve my objective.
You could use a tool like triScatteredInterp, which will triangulate the 2-d domain, then interpolate a list of points along your line. Griddata is also an option.
I have a toolbox for problems like this (of course.) It allows me to build a triangulation of the non-convex domain in the (x,y) plane. Then it can form a completely general slice through that surface, interpolating in z also as it does so. The result will be a 1-manifold, in this case a piecewise linear function along that path in (x,y,z). While those tools are not posted on the file exchange, they are available for the person willing to invest the time to learn to use them.
If the surface you describe is a completely general one in 3-d, that might be fairly complex, then you might need a CRUST based tool to define that surface triangulation. These can be found online too. Once a triangulation is available, my tools can then be used to slice them. (Sorry, I never did finish that piece.)
What I did was to define several points in the crack line and then cheack for each one of them in wich quadrilateral it is with inpoligon matlab function (no tthe fastest way but less than 2 secs).
Then I created a triangular plane in the used quadrilaterals using x,y and Z or the othre data , achieving a linear interpolation between the data.
finally i take out all the points that are 0 o Nan.

calculate distance between curves

I need to calculate some kind of distance between to curves.
Those are general curves, and may not be functions - that is, some values of x may be mapped to more then one value.
EDIT
The curves are given as a list of X,Y pairs and the logical curve is the line passing through all the points in the order given. a typical data set will include about 1000 points
as noted, the curve may not be a function, but is usually similar to a function
This issue us what prevents using interp1 or the curve fitting toolbox (in Matlab)
The distance measure I was thinking of the the area of the region between the curves - but any reasonable alternative is ok.
EDIT
a sample illustration of to curves, and the area I want to compute
A Matlab solution is preferred, but other languages are also fine.
If you have functions that are of the type y = f(x) and they are defined over the same domain, then a common way to find the "distance" is to use the L2 norm as explained here http://en.wikipedia.org/wiki/L2_norm#p-norm. This is simply the integral of the absolute value of the difference between the functions squared. If you have parametric curves then you cannot directly employ this approach. If the L2 norm is not good enough for your requirements then you will need to provide a more concrete definition of what you mean by "distance". If you are unclear as to what you need try taking a look at different types of mathematical norm and see if any of the commonly used ones are what you need (ie L1 norm, uniform norm). The wikepedia link above is a good start point. If the L2 is good enough then you need a way to calculate the integral that you have - there are many many numerical integration techniques out there and I suggest google is your friend here (or a good numerical analysis text book).
If you do have parametric type curves then this is very nontrivial. Using the "area" between curves is not a good idea as there is no clear way to define this area and would become even more complicated in the general case where you could have self-intersecting curves. If your curves are parametrized in the same way you could try some very crude measurement where you evaluate points on each curve at equally spaced values over the parameter range, then calculate the length of the distance between each, sum and take the average as a notion of "closeness". i.e. partition your parameter range into a set {u_0, ... , u_n} and evaluate curve1(u_i) and curve2(u_i) for each i to generate a set of n paired points. Then sum the euclidean distances between each pair of points.
This is very very crude though and if the parametrizations are different then it wont be much use.
You need to define what you mean by distance between the curves. If it is the closest approach between two general curves, then it becomes quite difficult to solve the problem.
If the "curves" are not even representable as single valued functions of x, then it becomes more complex yet.
Merely telling us that you need to define "some kind of distance" is too broad of a statement to be on-topic here, and it says that you have not yet thought out the problem you wish to solve.
If all you are willing to tell us is that the curves are two totally general parametric curves, which may be closed or not, or they may not even lie over the same domain, then the question becomes so totally ill-posed as to be impossible to answer. What is the area between two curves in that case?
If the curves are defined over the SAME support, then subtracting them and integration of the absolute value or square of the difference will be adequate. But you have already told us that these "curves" may be multi-valued. In that case, it is essentially impossible to do what you are asking.