Merging two tangled convex hulls - convex-hull

How to merge two tangled convex hulls (like this) to form a Convex Hull using Graham's Scan or anyother algorithm in linear time?

Basically, you use
Andrew's modification of the Graham scan algorithm.
After the points are sorted in xy-order, the upper and lower hulls are computed in linear time. Since you already have the two convex hulls , the xy-sorting of the convex hull points will also take linear time (e.g., reverse the lower hulls and merge-sort four sorted lists).
Since the rest of the algorithm will take linear time, the overall runtime is linear as you requested.
Here is a reference to some short python code implementing Andrew's modification to Graham's scan. See also my answer here.

Related

Fast libraries for merging two tangled convex hulls accessible from python?

Am looking for something that is incremental (with accessible state). So that likely means some merge method is exposed.
So in general I want to start with a set of points, that has a ConvexHull calculated and add a point to it (which trivially has itself as a convex hull). Was looking for alternatives to BowyerWatson through convex hull merges. Not sure if this is a bad idea. Not sure if this should be a question in CS except it's about finding a real solution in the python echosystem.
I see some related content here.
Merging two tangled convex hulls
And Qhull (scipy Delaunay and ConvexHull use this) has a lot of options I do not yet understand
http://www.qhull.org/html/qh-optq.htm
You can use Andrew's modification of the Graham scan algorithm.
Here is a reference to some short python code implementing it.
What makes it suited for your needs is that after the points are sorted in xy-order, the upper and lower hulls are computed in linear time. Since you already have the convex hull (possibly both convex hulls), the xy-sorting of the convex hull points will take linear time (e.g., reverse the lower hulls and merge-sort four sorted lists). The rest of the algorithm will take linear time (in the number of points on the convex hulls, which may well be much smaller than the original number of points).
All the functionality for this implementation is in the code referenced above, and for the merge you can use the code from this SO answer or implement your own.

What's the best way to calculate a numerical derivative in MATLAB?

(Note: This is intended to be a community Wiki.)
Suppose I have a set of points xi = {x0,x1,x2,...xn} and corresponding function values fi = f(xi) = {f0,f1,f2,...,fn}, where f(x) is, in general, an unknown function. (In some situations, we might know f(x) ahead of time, but we want to do this generally, since we often don't know f(x) in advance.) What's a good way to approximate the derivative of f(x) at each point xi? That is, how can I estimate values of dfi == d/dx fi == df(xi)/dx at each of the points xi?
Unfortunately, MATLAB doesn't have a very good general-purpose, numerical differentiation routine. Part of the reason for this is probably because choosing a good routine can be difficult!
So what kinds of methods are there? What routines exist? How can we choose a good routine for a particular problem?
There are several considerations when choosing how to differentiate in MATLAB:
Do you have a symbolic function or a set of points?
Is your grid evenly or unevenly spaced?
Is your domain periodic? Can you assume periodic boundary conditions?
What level of accuracy are you looking for? Do you need to compute the derivatives within a given tolerance?
Does it matter to you that your derivative is evaluated on the same points as your function is defined?
Do you need to calculate multiple orders of derivatives?
What's the best way to proceed?
These are just some quick-and-dirty suggestions. Hopefully somebody will find them helpful!
1. Do you have a symbolic function or a set of points?
If you have a symbolic function, you may be able to calculate the derivative analytically. (Chances are, you would have done this if it were that easy, and you would not be here looking for alternatives.)
If you have a symbolic function and cannot calculate the derivative analytically, you can always evaluate the function on a set of points, and use some other method listed on this page to evaluate the derivative.
In most cases, you have a set of points (xi,fi), and will have to use one of the following methods....
2. Is your grid evenly or unevenly spaced?
If your grid is evenly spaced, you probably will want to use a finite difference scheme (see either of the Wikipedia articles here or here), unless you are using periodic boundary conditions (see below). Here is a decent introduction to finite difference methods in the context of solving ordinary differential equations on a grid (see especially slides 9-14). These methods are generally computationally efficient, simple to implement, and the error of the method can be simply estimated as the truncation error of the Taylor expansions used to derive it.
If your grid is unevenly spaced, you can still use a finite difference scheme, but the expressions are more difficult and the accuracy varies very strongly with how uniform your grid is. If your grid is very non-uniform, you will probably need to use large stencil sizes (more neighboring points) to calculate the derivative at a given point. People often construct an interpolating polynomial (often the Lagrange polynomial) and differentiate that polynomial to compute the derivative. See for instance, this StackExchange question. It is often difficult to estimate the error using these methods (although some have attempted to do so: here and here). Fornberg's method is often very useful in these cases....
Care must be taken at the boundaries of your domain because the stencil often involves points that are outside the domain. Some people introduce "ghost points" or combine boundary conditions with derivatives of different orders to eliminate these "ghost points" and simplify the stencil. Another approach is to use right- or left-sided finite difference methods.
Here's an excellent "cheat sheet" of finite difference methods, including centered, right- and left-sided schemes of low orders. I keep a printout of this near my workstation because I find it so useful.
3. Is your domain periodic? Can you assume periodic boundary conditions?
If your domain is periodic, you can compute derivatives to a very high order accuracy using Fourier spectral methods. This technique sacrifices performance somewhat to gain high accuracy. In fact, if you are using N points, your estimate of the derivative is approximately N^th order accurate. For more information, see (for example) this WikiBook.
Fourier methods often use the Fast Fourier Transform (FFT) algorithm to achieve roughly O(N log(N)) performance, rather than the O(N^2) algorithm that a naively-implemented discrete Fourier transform (DFT) might employ.
If your function and domain are not periodic, you should not use the Fourier spectral method. If you attempt to use it with a function that is not periodic, you will get large errors and undesirable "ringing" phenomena.
Computing derivatives of any order requires 1) a transform from grid-space to spectral space (O(N log(N))), 2) multiplication of the Fourier coefficients by their spectral wavenumbers (O(N)), and 2) an inverse transform from spectral space to grid space (again O(N log(N))).
Care must be taken when multiplying the Fourier coefficients by their spectral wavenumbers. Every implementation of the FFT algorithm seems to have its own ordering of the spectral modes and normalization parameters. See, for instance, the answer to this question on the Math StackExchange, for notes about doing this in MATLAB.
4. What level of accuracy are you looking for? Do you need to compute the derivatives within a given tolerance?
For many purposes, a 1st or 2nd order finite difference scheme may be sufficient. For higher precision, you can use higher order Taylor expansions, dropping higher-order terms.
If you need to compute the derivatives within a given tolerance, you may want to look around for a high-order scheme that has the error you need.
Often, the best way to reduce error is reducing the grid spacing in a finite difference scheme, but this is not always possible.
Be aware that higher-order finite difference schemes almost always require larger stencil sizes (more neighboring points). This can cause issues at the boundaries. (See the discussion above about ghost points.)
5. Does it matter to you that your derivative is evaluated on the same points as your function is defined?
MATLAB provides the diff function to compute differences between adjacent array elements. This can be used to calculate approximate derivatives via a first-order forward-differencing (or forward finite difference) scheme, but the estimates are low-order estimates. As described in MATLAB's documentation of diff (link), if you input an array of length N, it will return an array of length N-1. When you estimate derivatives using this method on N points, you will only have estimates of the derivative at N-1 points. (Note that this can be used on uneven grids, if they are sorted in ascending order.)
In most cases, we want the derivative evaluated at all points, which means we want to use something besides the diff method.
6. Do you need to calculate multiple orders of derivatives?
One can set up a system of equations in which the grid point function values and the 1st and 2nd order derivatives at these points all depend on each other. This can be found by combining Taylor expansions at neighboring points as usual, but keeping the derivative terms rather than cancelling them out, and linking them together with those of neighboring points. These equations can be solved via linear algebra to give not just the first derivative, but the second as well (or higher orders, if set up properly). I believe these are called combined finite difference schemes, and they are often used in conjunction with compact finite difference schemes, which will be discussed next.
Compact finite difference schemes (link). In these schemes, one sets up a design matrix and calculates the derivatives at all points simultaneously via a matrix solve. They are called "compact" because they are usually designed to require fewer stencil points than ordinary finite difference schemes of comparable accuracy. Because they involve a matrix equation that links all points together, certain compact finite difference schemes are said to have "spectral-like resolution" (e.g. Lele's 1992 paper--excellent!), meaning that they mimic spectral schemes by depending on all nodal values and, because of this, they maintain accuracy at all length scales. In contrast, typical finite difference methods are only locally accurate (the derivative at point #13, for example, ordinarily doesn't depend on the function value at point #200).
A current area of research is how best to solve for multiple derivatives in a compact stencil. The results of such research, combined, compact finite difference methods, are powerful and widely applicable, though many researchers tend to tune them for particular needs (performance, accuracy, stability, or a particular field of research such as fluid dynamics).
Ready-to-Go Routines
As described above, one can use the diff function (link to documentation) to compute rough derivatives between adjacent array elements.
MATLAB's gradient routine (link to documentation) is a great option for many purposes. It implements a second-order, central difference scheme. It has the advantages of computing derivatives in multiple dimensions and supporting arbitrary grid spacing. (Thanks to #thewaywewalk for pointing out this glaring omission!)
I used Fornberg's method (see above) to develop a small routine (nderiv_fornberg) to calculate finite differences in one dimension for arbitrary grid spacings. I find it easy to use. It uses sided stencils of 6 points at the boundaries and a centered, 5-point stencil in the interior. It is available at the MATLAB File Exchange here.
Conclusion
The field of numerical differentiation is very diverse. For each method listed above, there are many variants with their own set of advantages and disadvantages. This post is hardly a complete treatment of numerical differentiation.
Every application is different. Hopefully this post gives the interested reader an organized list of considerations and resources for choosing a method that suits their own needs.
This community wiki could be improved with code snippets and examples particular to MATLAB.
I believe there is more in to these particular questions. So I have elaborated on the subject further as follows:
(4) Q: What level of accuracy are you looking for? Do you need to compute the derivatives within a given tolerance?
A: The accuracy of numerical differentiation is subjective to the application of interest. Usually the way it works is, if you are using the ND in forward problem to approximate the derivatives to estimate features from signal of interest, then you should be aware of noise perturbations. Usually such artifacts contain high frequency components and by the definition of the differentiator, the noise effect will be amplified in the magnitude order of $i\omega^n$. So, increasing the accuracy of differentiator (increasing the polynomial accuracy) will no help at all. In this case you should be able to cancelt the effect of noise for differentiation. This can be done in casecade order: first smooth the signal, and then differentiate. But a better way of doing this is to use "Lowpass Differentiator". A good example of MATLAB library can be found here.
However, if this is not the case and you're using ND in inverse problems, such as solvign PDEs, then the global accuracy of differentiator is very important. Depending on what kind of bounady condition (BC) suits your problem, the design will be adapted accordingly. The rule of thump is to increase the numerical accuracy known is the fullband differentiator. You need to design a derivative matrix that takes care of suitable BC. You can find comprehensive solutions to such designs using the above link.
(5) Does it matter to you that your derivative is evaluated on the same points as your function is defined?
A: Yes absolutely. The evaluation of the ND on the same grid points is called "centralized" and off the points "staggered" schemes. Note that using odd order of derivatives, centralized ND will deviate the accuracy of frequency response of the differentiator. Therefore, if you're using such design in inverse problems, this will perturb your approximation. Also, the opposite applies to the case of even order of differentiation utilized by staggered schemes. You can find comprehensive explanation on this subject using the link above.
(6) Do you need to calculate multiple orders of derivatives?
This totally depends on your application at hand. You can refer to the same link I have provided and take care of multiple derivative designs.

Qhull(Qhalf) interior point

I am using qhull library for for computing the intersection of half spaces. Although this problem is a dual of convex hull problem, but as its input it needs an interior point of the intersection. As it is stated on their webpage, here, using linear programming we can find such a point. However, even for simple 2D cases, this LP problem does not have a bounded solution. Is there something wrong with the given instruction on the qhull website?
Well I found the answer myself! Yest the LP would be unbounded and we need to set an upper bound, depending on the context of the given problem.

calculate distance between curves

I need to calculate some kind of distance between to curves.
Those are general curves, and may not be functions - that is, some values of x may be mapped to more then one value.
EDIT
The curves are given as a list of X,Y pairs and the logical curve is the line passing through all the points in the order given. a typical data set will include about 1000 points
as noted, the curve may not be a function, but is usually similar to a function
This issue us what prevents using interp1 or the curve fitting toolbox (in Matlab)
The distance measure I was thinking of the the area of the region between the curves - but any reasonable alternative is ok.
EDIT
a sample illustration of to curves, and the area I want to compute
A Matlab solution is preferred, but other languages are also fine.
If you have functions that are of the type y = f(x) and they are defined over the same domain, then a common way to find the "distance" is to use the L2 norm as explained here http://en.wikipedia.org/wiki/L2_norm#p-norm. This is simply the integral of the absolute value of the difference between the functions squared. If you have parametric curves then you cannot directly employ this approach. If the L2 norm is not good enough for your requirements then you will need to provide a more concrete definition of what you mean by "distance". If you are unclear as to what you need try taking a look at different types of mathematical norm and see if any of the commonly used ones are what you need (ie L1 norm, uniform norm). The wikepedia link above is a good start point. If the L2 is good enough then you need a way to calculate the integral that you have - there are many many numerical integration techniques out there and I suggest google is your friend here (or a good numerical analysis text book).
If you do have parametric type curves then this is very nontrivial. Using the "area" between curves is not a good idea as there is no clear way to define this area and would become even more complicated in the general case where you could have self-intersecting curves. If your curves are parametrized in the same way you could try some very crude measurement where you evaluate points on each curve at equally spaced values over the parameter range, then calculate the length of the distance between each, sum and take the average as a notion of "closeness". i.e. partition your parameter range into a set {u_0, ... , u_n} and evaluate curve1(u_i) and curve2(u_i) for each i to generate a set of n paired points. Then sum the euclidean distances between each pair of points.
This is very very crude though and if the parametrizations are different then it wont be much use.
You need to define what you mean by distance between the curves. If it is the closest approach between two general curves, then it becomes quite difficult to solve the problem.
If the "curves" are not even representable as single valued functions of x, then it becomes more complex yet.
Merely telling us that you need to define "some kind of distance" is too broad of a statement to be on-topic here, and it says that you have not yet thought out the problem you wish to solve.
If all you are willing to tell us is that the curves are two totally general parametric curves, which may be closed or not, or they may not even lie over the same domain, then the question becomes so totally ill-posed as to be impossible to answer. What is the area between two curves in that case?
If the curves are defined over the SAME support, then subtracting them and integration of the absolute value or square of the difference will be adequate. But you have already told us that these "curves" may be multi-valued. In that case, it is essentially impossible to do what you are asking.

how to do clustering with similarity as a measure?

I read about spherical kmeans but i did not come across an implementation.To be clear, similarity is simple the dot product of two document unit vectors.I have read that standard k means uses distance as measure. Is the distance being specified the vector distance just like in coordinate geometry sqrt((x2 -x1)^2 + (y2-y1)^2)?
There are more clustering methods than k-means. The problem with k-means is not so much that is is built on Euclidean distance, but that the mean must reduce the distances for the algorithm to converge.
However, there are tons of other clustering algorithms that do not need to compute a mean or have triangle inequality. If you read the Wikipedia article on DBSCAN, it also mentions a version called GDBSCAN, Generalized DBSCAN. You definitely should be able to plug your similarity function into GDBSCAN. Most likely, you could just use 1/similarity and use it as a distance function, unless the algorithm requires triangle inequality. So this trick should work with DBSCAN and OPTICS, for example. Probably also with hierarchical clustering, k-medians and k-medoids (PAM).