Volume of a 3D shape using numerical integration with scipy

I have written a function for computing the volume of the intersection of a cube and a half-space, and now I'm writing tests for it.
I've tried computing the volume numerically like this:
integral = scipy.integrate.tplquad(
    lambda z, y, x: int(Vector(x, y, z).dot(normal) < distance),
    -0.5, 0.5,
    lambda x: -0.5, lambda x: 0.5,
    lambda x, y: -0.5, lambda x, y: 0.5,
    epsabs=1e-5,
    epsrel=1e-5)
Basically, I integrate over the whole cube, and each point gets the value 1 or 0 depending on whether it is inside the half-space.
This gets very slow (more than several seconds per invocation) and keeps giving me warnings like
scipy.integrate.quadpack.IntegrationWarning: The integral is probably divergent, or slowly convergent
Is there a better way to calculate this volume?

Integration of the characteristic function is mathematically correct, but not practical. That is because most integration schemes are designed to integrate polynomials up to some degree exactly, and, in consequence, all "relatively smooth" functions reasonably well. Characteristic functions, however, are anything but smooth. Polynomial-style integration will get you nowhere.
A much better-suited approach is to build a discretized version of the domain first, and then simply sum up the volumes of the little tetrahedra.
Discretization in 3D can be done, for example, with pygalmesh (a project of mine that interfaces CGAL). The code below discretizes the cut-off cube into tetrahedra.
You can increase the precision by decreasing max_cell_circumradius and/or max_edge_size_at_feature_edges, but meshing will then take longer. Moreover, you could specify "feature edges" to resolve the intersection edges exactly; this would give you the exactly correct result even with the coarsest cell size.
import pygalmesh
import numpy
c = pygalmesh.Cuboid([0, 0, 0], [1, 1, 1])
h = pygalmesh.HalfSpace([1.0, 2.0, 3.0], 4.0, 10.0)
u = pygalmesh.Intersection([c, h])
mesh = pygalmesh.generate_mesh(
    u, max_cell_circumradius=3.0e-2, max_edge_size_at_feature_edges=1.0e-2
)

def compute_tet_volumes(vertices, tets):
    cell_coords = vertices[tets]
    a = cell_coords[:, 1, :] - cell_coords[:, 0, :]
    b = cell_coords[:, 2, :] - cell_coords[:, 0, :]
    c = cell_coords[:, 3, :] - cell_coords[:, 0, :]
    # omega = <a, b x c>
    omega = numpy.einsum("ij,ij->i", a, numpy.cross(b, c))
    # https://en.wikipedia.org/wiki/Tetrahedron#Volume
    return abs(omega) / 6.0
vol = numpy.sum(compute_tet_volumes(mesh.points, mesh.get_cells_type("tetra")))
print(f"{vol:.8e}")
8.04956436e-01

Integration
Integration of a discontinuous function is problematic, especially in multiple dimensions. Some preliminary work, reducing the problem to an integral of a continuous function, is needed. Here I work out the height (top minus bottom) as a function of x and y, and use dblquad for that: it returns in 36.2 ms.
I express the plane equation as a*x + b*y + c*z = distance. Some care is needed with the sign of c, as the plane could form part of the top or of the bottom.
from scipy.integrate import dblquad
distance = 0.1
a, b, c = 3, -4, 2 # normal
zmin, zmax = -0.5, 0.5 # cube bounds
# preprocessing: make sure that c > 0
# by rearranging coordinates, and flipping the signs of all if needed
height = lambda y, x: min(zmax, max(zmin, (distance - a*x - b*y)/c)) - zmin
integral = dblquad(height, -0.5, 0.5,
                   lambda x: -0.5, lambda x: 0.5,
                   epsabs=1e-5, epsrel=1e-5)
Monte Carlo methods
Picking sample points at random (the Monte Carlo method) avoids the issues with discontinuity: the accuracy is about the same for discontinuous functions as for continuous ones, and the error decreases at the rate 1/sqrt(N), where N is the number of sample points.
The polytope package uses it internally. With it, the computation could go as follows:
import numpy as np
import polytope as pc
a, b, c = 3, 4, -5 # normal vector
distance = 0.1
A = np.concatenate((np.eye(3), -np.eye(3), [[a, b, c]]), axis=0)
b = np.array(6*[0.5] + [distance])
p = pc.Polytope(A, b)
print(p.volume)
Here A and b encode the halfspaces as Ax <= b: the first six rows are for the faces of the cube, the last one is for the plane.
To have more control over precision, either implement the Monte Carlo method yourself (easy) or use the mcint package (about as easy).
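If you go the do-it-yourself route, a minimal Monte Carlo sketch (reusing the example normal and distance from the polytope snippet above) could look like this:
import numpy as np

a, b, c = 3, 4, -5        # example normal vector, as above
distance = 0.1

rng = np.random.default_rng(0)
N = 1_000_000
# Uniform samples in the cube [-0.5, 0.5]^3; the cube has volume 1,
# so the fraction of samples inside the half-space estimates the volume.
pts = rng.uniform(-0.5, 0.5, size=(N, 3))
inside = pts @ np.array([a, b, c]) <= distance
print(inside.mean())      # the statistical error decreases like 1/sqrt(N)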
Polytope volume: a task for linear algebra, not for integrators
You want to compute the volume of a polytope, a convex body formed by intersecting halfspaces. This ought to have an algebraic solution. SciPy has a HalfspaceIntersection class for these, but so far (1.0.0) it does not implement finding the volume of such an object. If you could find the vertices of the polytope, then the ConvexHull class could be used to compute the volume. But as is, it seems that the SciPy spatial module is no help. Maybe in a future version of SciPy...
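That said, the suggestion above can be sketched in code: in recent SciPy versions, HalfspaceIntersection exposes the polytope vertices via its intersections attribute, and ConvexHull can take it from there. A rough sketch (mine, assuming the origin is strictly inside the polytope, which holds here because distance > 0):
import numpy as np
from scipy.spatial import ConvexHull, HalfspaceIntersection

a, b, c = 3.0, 4.0, -5.0   # example normal vector, as above
distance = 0.1

# Halfspaces in scipy's format: each row [A, b0] represents A @ x + b0 <= 0.
A = np.concatenate((np.eye(3), -np.eye(3), [[a, b, c]]), axis=0)
offsets = -np.array(6 * [0.5] + [distance])
halfspaces = np.column_stack((A, offsets))

# The origin is strictly interior here because distance > 0.
hs = HalfspaceIntersection(halfspaces, np.zeros(3))

# Vertices of the polytope, then its volume via the convex hull.
print(ConvexHull(hs.intersections).volume)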

If we assume that the boundary of the half-space is given by $\{(x, y, z) \mid ax + by + cz + d = 0 \}$ with $c \not= 0$, and that the half-space of interest is the one below the plane (in the $z$-direction), then your integral is given by
scipy.integrate.tplquad(lambda z, y, x: 1,
                        -0.5, 0.5,
                        lambda x: -0.5, lambda x: 0.5,
                        lambda x, y: -0.5,
                        lambda x, y: max(-0.5, min(0.5, -(b*y + a*x + d)/c)))
Since at least one of $a$, $b$, and $c$ must be non-zero, the case $c = 0$ can be handled by changing coordinates.
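For illustration (a sketch of mine, not part of the answer): if $c = 0$ but $b \not= 0$, you can relabel the $y$ and $z$ axes, since the cube is symmetric under such a swap; the same "below the plane" sign convention then has to hold for the relabelled coefficient. With hypothetical example values:
import scipy.integrate

# Hypothetical example with c == 0: the plane is a*x + b*y + d = 0.
a, b, c, d = 1.0, 2.0, 0.0, -0.3
# Relabel the y and z axes so the (new) z-coefficient is non-zero; the cube
# [-0.5, 0.5]^3 is invariant under the swap, so the volume is unchanged.
a2, b2, c2 = a, c, b
vol, err = scipy.integrate.tplquad(
    lambda z, y, x: 1,
    -0.5, 0.5,
    lambda x: -0.5, lambda x: 0.5,
    lambda x, y: -0.5,
    lambda x, y: max(-0.5, min(0.5, -(b2*y + a2*x + d)/c2)))
print(vol)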


Extrapolate in log scale on y axis

x = [0,1,2,3,4,5,6,7]
y = [0.07, 0.05, 0.03, 0.02, 0.01, 0.005, 0.002, 0.0007]
I want to find what x is when y = 0.000001, and I tried the below, but it gives me a wrong value.
10^(interp1(log10(y),x,10^-6, 'linear','extrap'))
Also, would linear extrapolation be possible if I only had two points like so,
x = [6,7]
y = [0.002, 0.0007]
interp1's 'linear','extrap' option simply extends the last (or first) segment of the piecewise linear fit made from the points. It does not create a least-squares fit. This is very evident in the image below:
How did I get this image? I fixed the following problems with your code:
You are interpolating log10(y) vs x.
So the third argument for interp1 needs to be log10(new_y). For new_y = 10^-6, you actually need to pass -6.
The call to interp1() will give you new_x. You're raising 10 to the result of interp1, which is wrong.
x = [0,1,2,3,4,5,6,7];
y = [0.07, 0.05, 0.03, 0.02, 0.01, 0.005, 0.002, 0.0007];
logy = log10(y);
plot(logy, x, '-x');
hold on;  % keep the data plot so the extrapolated point can be overlaid
new_y = 10^-6;
new_x = interp1(logy, x, log10(new_y), 'linear', 'extrap')
plot(log10([new_y, y(end)]), [new_x, x(end)], '--r');
plot(log10(new_y), new_x, 'or');
xlabel('log10(y)'); ylabel('x');
The short answer to your second question is yes!
Longer answer: replace x and y in my code above and see if it works (spoiler: it does).
Note: I ran this code in Octave Online because I don't have a local MATLAB installation. Shouldn't make a difference to the answer though.

Different python functions to fit cubic splines, finding coefficients

I want to fit a cubic spline in Python to noisy x, y data and extract the spline coefficients for each interval (i.e. I would expect to obtain four spline coefficients for each interval).
So far, I have tried (all from scipy.interpolate):
1) CubicSpline, but this method does not allow me to smooth the spline, resulting in unrealistic, jumpy coefficient data.
2) Combining splrep and splev, e.g.
tck = splrep(x, y, k=3, s=1e25)
where I extract the coefficients/knots using
F = PPoly.from_spline(tck)
coeffs = F.c
knots = F.x
However, I cannot find smooth coefficients over the full x-range (the values jump between close to zero and 1e23, which is unphysical), even if I ramp up the smoothing parameter s to very large numbers. Those ultimately lead to too few knots, since the number of knots decreases with s. It seems I cannot find a suitable s and a suitable number of knots at the same time.
3) I used
UnivariateSpline(x, y, k=3, s=0.03)
Here, I found a better sensitivity to changing s, but the corresponding get_coeffs() method does not provide 4 coefficients for each interval but only one, which I do not understand.
4) I also tried a piecewise ridge linear regression with a third-order polynomial, but this method gives too large percentage errors for the fit, so it would be great to get one of the standard spline methods working.
What am I missing? Can someone help, please?
The concrete issue I see here is that UnivariateSpline does not yield the algebraic coefficients of the various powers of x in the interpolating spline. That is because the coefficients it keeps in the private _data property, which it also returns with the get_coeffs method, are a kind of B-spline coefficients. These coefficients describe the spline without any redundancy (you need N of them for a spline with N degrees of freedom), but the basis splines they are attached to are somewhat complicated.
But you can get the kind of coefficients you want by using the derivatives method of the spline object. It returns all four derivatives at a given point x, from which the Taylor coefficients at that point are easy to find. It is natural to use this method with x being the knots of interpolation, excluding the rightmost one; the coefficients obtained are valid from that knot to the next one. Here is an example, complete with "fancy" formatted output.
import numpy as np
from scipy.interpolate import UnivariateSpline
spl = UnivariateSpline(np.arange(6), np.array([3, 1, 4, 1, 5, 9]), s=0)
kn = spl.get_knots()
for i in range(len(kn)-1):
    cf = [1, 1, 1/2, 1/6] * spl.derivatives(kn[i])
    print("For {0} <= x <= {1}, p(x) = {5}*(x-{0})^3 + {4}*(x-{0})^2 + {3}*(x-{0}) + {2}".format(kn[i], kn[i+1], *cf))
The knots are 0, 2, 3, 5 in this example. The output is:
For 0.0 <= x <= 2.0, p(x) = -3.1222222222222222*(x-0.0)^3 + 11.866666666666667*(x-0.0)^2 + -10.744444444444445*(x-0.0) + 3.000000000000001
For 2.0 <= x <= 3.0, p(x) = 4.611111111111111*(x-2.0)^3 + -6.866666666666667*(x-2.0)^2 + -0.7444444444444436*(x-2.0) + 4.000000000000001
For 3.0 <= x <= 5.0, p(x) = -2.322222222222221*(x-3.0)^3 + 6.966666666666665*(x-3.0)^2 + -0.6444444444444457*(x-3.0) + 1.0000000000000016
Note that for each piece, cf holds the coefficients starting with the lowest degree, so the order is reversed when formatting the string.
(Of course, you'd probably want to do something else with these coefficients)
To check that the formulas are correct, I copy-pasted them for plotting:
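Alternatively, a quick numerical check (a sketch of mine, not the original plot) is to evaluate each reconstructed cubic on its interval and compare it to the spline itself; it repeats the same example spline as above:
import numpy as np
from scipy.interpolate import UnivariateSpline

spl = UnivariateSpline(np.arange(6), np.array([3, 1, 4, 1, 5, 9]), s=0)
kn = spl.get_knots()
for i in range(len(kn) - 1):
    # Taylor coefficients at the left knot, as in the snippet above.
    c0, c1, c2, c3 = [1, 1, 1/2, 1/6] * spl.derivatives(kn[i])
    xs = np.linspace(kn[i], kn[i + 1], 50)
    p = ((c3*(xs - kn[i]) + c2)*(xs - kn[i]) + c1)*(xs - kn[i]) + c0
    # The reconstructed cubic should match the spline to machine precision.
    print(i, np.max(np.abs(p - spl(xs))))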

Showing 3D data on a patch surface with Matlab

I want to show, with Matlab, a temperature distribution on an object surface.
I've got a 3D data in the form of (x, y, z, V) vectors. I would like to show this object in Matlab, with the colour representing the local total "value".
I can export the object as an STL file. It can be shown easily using the STL plotting (see stldemo):
fv = stlread('file.stl');
patch(fv, 'EdgeColor', 'none', 'FaceLighting', 'gouraud', 'AmbientStrength', 0.15, 'FaceColor', [0.8 0.8 1.0]);
camlight('headlight');
material('dull');
To colour it according to (x,y,z,V), I need to attach each (x, y, z) point to a vertex in the patch (the nearest one would work). If there are many (x,y,z) points for which a single STL vertex is the nearest, I add up the corresponding V values for that vertex.
The number of vertices is in the thousands. The number of (x, y, z) points is also large. So looping through the (x, y, z) points with an inner loop over vertices to find the nearest one (which involves calculating distances between points) is out of the question. Is there any smart way to do it quickly?
Note: I cannot control the location of the data points, they are defined by an external program. The STL points are controlled by another external program. So I have to marry two different point sets.
Here is the code illustrating what I want to achieve, with 4 vertices and 3 data points:
% Create patch
figure;
p = patch;
colorbar
p.Vertices = [ ...
    0, 0, 0; ...
    1, 0, 0; ...
    1, 1, 0; ...
    0, 1, 0];
p.Faces = [ ...
    1, 2, 3; ...
    1, 3, 4];
% Data points
x = [0.1, 0.1, 0.25];
y = [0.01, 0.02, 0.75];
z = [0.01, 0.2, -0.01];
v = [1, 1, 1];
p.FaceVertexCData = zeros(size(p.Vertices, 1), 1);
% Point 1 (0.1, 0.01, 0.01) is closest to vertex 1 (0, 0, 0). Its value
% goes to vertex 1.
p.FaceVertexCData(1) = p.FaceVertexCData(1) + v(1);
% Point 2 (0.1, 0.02, 0.2) is also closest to vertex 1 (0, 0, 0). Its
% value also goes to vertex 1
p.FaceVertexCData(1) = p.FaceVertexCData(1) + v(2);
% Point 3 (0.25, 0.75, -0.01) is closest to vertex 4 (0, 1, 0). Its value
% goes to vertex 4.
p.FaceVertexCData(4) = p.FaceVertexCData(4) + v(3);
% Other vertices are left with 0.
p.FaceColor = 'interp';
Attaching a volume scalar value (temperature in your case) of a point to a neighbouring vertex is a tricky exercise: it requires complex for loops and special-case rules (in your case you wanted to attach the values of 2 different points to the same patch vertex; what if the 2 values to attach are different? Do you average? Discard one?).
A safer approach is to re-interpolate your temperature field over your object surface. The function griddata can do that for you.
First I had to define a scalar field. Since I do not have your temperature data, I used the flow function from Matlab. I generated a scalar field the same way as in this article: flow data.
This gave me a scalar field v (a flow value, but let's say it's your temperature) for every coordinate x, y, z.
Then I created and introduced a 3D patch which will be your object. I chose a sphere but any 3D patch will work the same way.
The code to get the sphere as a patch is borrowed from surf2patch.
(You will have to offset and inflate the sphere to get it exactly as in the figure below.)
Now is the interesting bit. In the following code, v is the value of the scalar field (temperature for you) at the coordinates x, y, z.
%% // Extract patch vertices coordinates in separate variables
xp = fv.vertices(:,1) ;
yp = fv.vertices(:,2) ;
zp = fv.vertices(:,3) ;
%% // interpolate the temperature field over the patch coordinates
Tpv = griddata(x,y,z,v,xp,yp,zp) ;
%% // Set the patch color data to the new interpolated temperature
set(hp,'FaceVertexCData',Tpv) ;
And your object surface is now at the right interpolated temperature:
(You can delete the slice plane if you want to observe the patch alone.)

What algorithm can I use to recognize the line in this scatterplot?

I'm creating a program to compare audio files which uses a similar algorithm to the one described here: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf. I am plotting the times of matches between two songs being compared and finding the least-squares line for the plot. Here (http://imgur.com/fGu7jhX&yOeMSK0) is an example plot of matching files. The plot is too messy, and the least-squares regression line does not produce a high correlation coefficient even though there is an obvious line in the graph. What other algorithm can I use to recognize this line?
This is an interesting question, but it's been pretty quiet. Maybe this answer will trigger some more activity.
For identifying lines with arbitrary slopes and intercepts within a collection of points, the Hough transform would be a good place to start. For your audio application, however, it looks like the slope should always be 1, so you don't need the full generality of the Hough transform.
Instead, you can think of the problem as one of clustering the differences x - y, where x and y are the vectors holding the x and y coordinates of the points.
One approach would be to compute a histogram of x - y. Points that are close to lying in the same line with slope 1 will have differences in the same bin in the histogram. The bin with the largest count corresponds to the largest collection of points that are approximately aligned. An issue to deal with in this approach is choosing the boundaries of the histogram bins. A bad choice could result in points that should be grouped together being split into neighboring bins.
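As a rough sketch of that histogram idea (mine, with an arbitrary bin width you would have to tune; x and y are assumed to be the NumPy arrays of match times):
import numpy as np

def best_diagonal_bin(x, y, bin_width=0.75):
    """Histogram the differences x - y and return the indices of the points
    whose difference falls in the most populated bin; these are the best
    candidates for a slope-1 line."""
    d = x - y
    edges = np.arange(d.min(), d.max() + bin_width, bin_width)
    counts, edges = np.histogram(d, bins=edges)
    k = counts.argmax()
    return np.where((d >= edges[k]) & (d < edges[k + 1]))[0]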
A simple brute-force approach is to imagine a diagonal window with a given width, sliding left to right across the (x,y) plane. The best candidate for a line corresponds to the position of the window that contains the most points. This is similar to a histogram of x - y, but instead of having a collection of disjoint bins, there are overlapping bins, one for each point. All the bins have the same width, and each point determines the left edge of a bin.
The function count_diag_groups in the code below does that computation. For each point, it counts how many points are in the diagonal window when the left edge of the window is on that point. The best candidate for a line is the window with the most points. Here's the plot generated by the script: the top is the scatter plot of the data; the bottom is the same scatter plot, with the best candidate points highlighted.
A nice feature of this method is that there is only one parameter, the window width. A not-so-nice feature is that it has time complexity O(n**2), where n is the number of points. There are surely algorithms with better time complexity that could do something similar; the article that you link to discusses this. To judge the quality of an alternative, however, will require more concrete specifications of how "good" or robust the line identification must be.
import numpy as np
import matplotlib.pyplot as plt
def count_diag_groups(x, y, width):
    """
    Returns a list of arrays. The length of the list is the same
    as the length of x. The k-th array holds the indices into x
    (and y) of a set of points that are in a "diagonal" window with
    the given width whose left edge includes the point (x[k], y[k]).
    """
    d = x - y
    result = []
    for i in range(d.size):
        delta = d - d[i]
        neighbors = np.where((delta >= 0) & (delta <= width))[0]
        result.append(neighbors)
    return result
def generate_demo_data():
    # Generate some data.
    np.random.seed(123)
    xmin = 0
    xmax = 100
    ymin = 0
    ymax = 25
    nrnd = 175
    xrnd = xmin + (xmax - xmin)*np.random.rand(nrnd)
    yrnd = ymin + (ymax - ymin)*np.random.rand(nrnd)
    n = 25
    xx = xmin + 0.1*(xmax - xmin) + ymax*np.random.rand(n)
    yy = (xx - xx.min()) + 0.2*np.random.randn(n)
    x = np.concatenate((xrnd, xx))
    y = np.concatenate((yrnd, yy))
    return x, y
def plot_result(x, y, width, selection):
    xmin = x.min()
    xmax = x.max()
    ymin = y.min()
    ymax = y.max()
    xsel = x[selection]
    ysel = y[selection]
    # Plot...
    plt.figure(1)
    plt.clf()
    ax = plt.subplot(2, 1, 1)
    plt.plot(x, y, 'o', mfc='b', mec='b', alpha=0.5)
    plt.xlim(xmin - 1, xmax + 1)
    plt.ylim(ymin - 1, ymax + 1)
    plt.subplot(2, 1, 2, sharex=ax, sharey=ax)
    plt.plot(x, y, 'o', mfc='b', mec='b', alpha=0.5)
    plt.plot(xsel, ysel, 'o', mfc='w', mec='w')
    plt.plot(xsel, ysel, 'o', mfc='r', mec='r', alpha=0.65)
    xi = np.array([xmin, xmax])
    # The left edge of the best window corresponds to the smallest x - y
    # among the selected points; draw both edges of the diagonal window.
    d0 = (xsel - ysel).min()
    yi1 = xi - d0
    yi2 = yi1 - width
    plt.plot(xi, yi1, 'r-', alpha=0.25)
    plt.plot(xi, yi2, 'r-', alpha=0.25)
    plt.xlim(xmin - 1, xmax + 1)
    plt.ylim(ymin - 1, ymax + 1)
    plt.show()
if __name__ == "__main__":
    x, y = generate_demo_data()
    # Find a selection of points that are close to being aligned
    # with a slope of 1.
    width = 0.75
    r = count_diag_groups(x, y, width)
    # Find the largest group.
    sz = np.array(list(len(f) for f in r))
    imax = sz.argmax()
    # selection holds the indices of the selected points.
    selection = r[imax]
    plot_result(x, y, width, selection)
This looks like an excellent example of a task for Random Sample Consensus (RANSAC).
The Wikipedia article even uses your problem as an example!
The rough outline is something like this:
1. Select 2 random points in your data and fit a line to them.
2. For each other point, find the distance to that line. If the distance is below a threshold, it is part of the inlier set.
3. If the final inlier set for this particular line is larger than that of the previously best line, keep the new line as the best candidate.
4. If the decided number of iterations is reached, return the best line found; otherwise go back to 1 and choose new random points.
Check the Wikipedia article for more information. A minimal sketch of this loop is given below.
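The following is my own sketch of that loop, with hypothetical parameter values (iteration count and inlier threshold) that you would tune for your data:
import numpy as np

def ransac_line(x, y, n_iters=500, threshold=0.5, seed=None):
    """Fit a line y = m*x + b by RANSAC; returns (m, b, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(x.size, dtype=bool)
    best_line = (0.0, 0.0)
    for _ in range(n_iters):
        # 1. pick two distinct random points and fit a line through them
        i, j = rng.choice(x.size, size=2, replace=False)
        if x[i] == x[j]:
            continue                     # skip vertical pairs
        m = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - m * x[i]
        # 2. points whose distance to the line is below the threshold are inliers
        dist = np.abs(m * x - y + b) / np.hypot(m, 1.0)
        inliers = dist < threshold
        # 3. keep the line with the largest inlier set
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_line = inliers, (m, b)
    return best_line[0], best_line[1], best_inliers
For the audio case, where the slope is expected to be 1, you could skip the slope estimation entirely and only sample candidate intercepts.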

"Frequency" shift in discrete FFT in MATLAB

(Disclaimer: I thought about posting this on math.statsexchange, but found similar questions there that were moved to SO, so here I am)
The context:
I'm using fft/ifft to determine probability distributions for sums of random variables.
So e.g. I'm having two uniform probability distributions - in the simplest case two uniform distributions on the interval [0,1].
So to get the probability distribution for the sum of two random variables sampled from these two distributions, one can calculate the product of the Fourier transforms of the two probability densities.
Doing the inverse fft on this product, you get back the probability density for the sum.
An example:
function usumdist_example()
    x = linspace(-1, 2, 1e5);
    dx = diff(x(1:2));
    NFFT = 2^nextpow2(numel(x));
    % take two uniform distributions on [0,0.5]
    intervals = [0, 0.5;
                 0, 0.5];
    figure();
    hold all;
    for i = 1:size(intervals,1)
        % construct the prob. dens. function
        P_x = x >= intervals(i,1) & x <= intervals(i,2);
        plot(x, P_x);
        % for each pdf, get the characteristic function fft(pdf,NFFT)
        % and form the product of all char. functions in Y
        if i==1
            Y = fft(P_x,NFFT) / NFFT;
        else
            Y = Y .* fft(P_x,NFFT) / NFFT;
        end
    end
    y = ifft(Y, NFFT);
    x_plot = x(1) + (0:dx:(NFFT-1)*dx);
    plot(x_plot, y / max(y), '.');
end
My issue is, the shape of the resulting prob. dens. function is perfect.
However, the x-axis does not fit to the x I create in the beginning, but is shifted.
In the example, the peak is at 1.5, while it should be 0.5.
The shift changes if I e.g. add a third random variable or if I modify the range of x.
But I can't figure out how.
I'm afraid it might have to do with the fact that I have negative x values, while Fourier transforms usually work in a time/frequency domain where frequencies < 0 don't make sense.
I'm aware I could e.g. find the peak and shift it to its proper place, but that seems nasty and error-prone...
Glad about any ideas!
The problem is that your x origin is -1, not 0. You expect the center of the triangular pdf to be at .5, because that's twice the value of the center of the uniform pdf. However, the correct reasoning is: the center of the uniform pdf is 1.25 above your minimum x, and you get the center of the triangle at 2*1.25 = 2.5 above the minimum x (that is, at 1.5).
In other words: although your original x axis is (-1, 2), the convolution (or the FFT) behaves as if it were (0, 3). In fact, the FFT knows nothing about your x axis; it only uses the y samples. Since your uniform pdf is zero for the first samples, that zero interval of width 1 is doubled in width when you do the convolution (or the FFT). I suggest drawing the convolution on paper to see this (draw the original signal, the signal reflected about the y axis, displace the latter and see when the two begin to overlap). So you need a correction in the x_plot line to compensate for this increased width of the zero interval: use
x_plot = 2*x(1) + (0:dx:(NFFT-1)*dx);
and then plot(x_plot, y / max(y), '.') will give the correct graph: