How does scipy.interpolate.CubicSpline's extrapolate work mathematically?

How does scipy.interpolate.CubicSpline's extrapolate work mathematically? - scipy

i'm trying to understand how scipy.interpolate.CubicSpline(..., extrapolate=True)
works internally: How exactly does Scipy implement the extrapolation of a point x beyond [a, b] mathematically?
I'm looking for a formula such as:
score(x) = exp(-x + b) if x > b (which seems NOT to be how they implement it!).
Tried looking at the source code, was unable to find the out how it's implemented. Documentation does not give any details either.

A spline is a piecewise polynomial, i.e., there are several subintervals of data interval [a, b] on each of which it's just a polynomial. With extrapolate=True the polynomial from the leftmost interval is used for x < a, and the polynomial from the rightmost interval is used for x > b. This is what the documentation describes as
extrapolate to out-of-bounds points based on first and last intervals,
Very simple and very useless. Here is an illustration.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import CubicSpline
f = CubicSpline([-2, -1, 0, 1, 2], [1, 1, 0, 2, 2], extrapolate=True)
t = np.linspace(-4, 4)
plt.plot(t, f(t))
plt.show()
All the given y-values are between 0 and 2, but the cubic polynomials make a lousy extrapolation.

Related

Using numerical methods to plot solution to first-order nonlinear differential equation in Matlab

I have a question about plotting x(t), the solution to the following differential equation knowing that dx/dt equals the expression below. The value of x is 0 at t = 0.
syms x
dxdt = -(1.0*(6.84e+45*x^2 + 5.24e+32*x - 2.49e+42))/(2.47e+39*x + 7.12e+37)
I want to plot the solution of this first-order nonlinear differential equation. The analytical solution involves complex numbers so that's not relevant because this equation models a real-life process, but Matlab can solve the equation using numerical methods and plot it. Can someone please suggest how to do this?

in matlab try this
tspan = [0 10];
x0 = 0;
[t,x] = ode45(#(t,x) -(1.0*(6.84e+45*x^2 + 5.24e+32*x - 2.49e+42))/(2.47e+39*x + 7.12e+37), tspan, x0);
plot(t,x,'b')
i try it and i got this
hope that help you.

I have written an example for how to use Python with SymPy and matplotlib. SymPy can be used to calculate both definite and indefinite integrals. By calculating the indefinite integral and adding a constant to set it to evaluate to 0 at t = 0. Now you have the integral, so just a matter of plotting. I would define an array from a starting point to an endpoint with 1000 points between (could likely be less). You can then calculate the value of the integral with the constant at each time point, which can then be plotted with matplotlib. There are plenty of other questions on how to customize plots with matplotlib.
This displays a basic plot of the indefinite integral of the function dxdt with assumption of x(t) = 0. Variation of the tuple when running Plotting() will set what range of x values to plot. This is set to plot 1000 data points between the minimum and maximum values set when calling the function.
For more information on customizing the plot, I recommend matplotlib documentation. Documentation on the integral can be found in SymPy documentation.
import pandas as pd
from sympy import *
from sympy.abc import x
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
def Plotting(xValues, dxdt):
# Calculate integral
xt = integrate(dxdt,x)
# Convert to function
f = lambdify(x, xt)
C = -f(0)
# Define x values, last number in linspace corresponding to number of points to plot
xValues = np.linspace(xValues[0],xValues[1],500)
yValues = [f(x)+C for x in xValues]
# Initialize figure
fig = plt.figure(figsize = (4,3))
ax = fig.add_axes([0, 0, 1, 1])
# Plot Data
ax.plot(xValues, yValues)
plt.show()
plt.close("all")
# Define Function
dxdt = -(1.0*(6.84e45*x**2 + 5.24e32*x - 2.49e42))/(2.47e39*x + 7.12e37)
# Run Plotting function, with left and right most points defined as tuple, and function as second argument
Plotting((-0.025, 0.05),dxdt)

Volume of 3d shape using numerical integration with scipy

I have written a function for computing volume of intersection of a cube and a half-space and now I'm writing tests for it.
I've tried computing the volume numerically like this:
integral = scipy.integrate.tplquad(lambda z, y, x: int(Vector(x, y, z).dot(normal) < distance),
-0.5, 0.5,
lambda x: -0.5, lambda x: 0.5,
lambda x, y: -0.5, lambda x, y: 0.5,
epsabs=1e-5,
epsrel=1e-5)
... basically I integrate over the whole cube and each point gets value 1 or 0 based on if it is inside the half space.
This gets very slow (more than several seconds per invocation) and keeps giving me warnings like
scipy.integrate.quadpack.IntegrationWarning: The integral is probably divergent, or slowly convergent
Is there a better way to calculate this volume?

Integration of characteristic function is mathematically correct, but not practical. That is because most integration schemes are designed to integrate polynomials to some degree exactly, and in consequence all "relatively smooth" functions reasonably well. Characteristic functions, however, are everything but smooth. Polynomial-style integration will get you nowhere.
A much better-suited approach is to build a discretized version of the domain first, and then simply sum up the volumes of the little tetrahedra.
Discretization in 3D can be done, for example, with pygalmesh (a project of mine interfacing CGAL). The below code discretizes the cut-off cube to
You can increase the precision by decreasing max_cell_circumradius and/or max_edge_size_at_feature_edges, but meshing will take longer then. Moreover, you could specify "feature edges" to resolve the intersection edges exactly. This would give you the exactly correct result even with the coarsest cell size.
import pygalmesh
import numpy
c = pygalmesh.Cuboid([0, 0, 0], [1, 1, 1])
h = pygalmesh.HalfSpace([1.0, 2.0, 3.0], 4.0, 10.0)
u = pygalmesh.Intersection([c, h])
mesh = pygalmesh.generate_mesh(
u, max_cell_circumradius=3.0e-2, max_edge_size_at_feature_edges=1.0e-2
)
def compute_tet_volumes(vertices, tets):
cell_coords = vertices[tets]
a = cell_coords[:, 1, :] - cell_coords[:, 0, :]
b = cell_coords[:, 2, :] - cell_coords[:, 0, :]
c = cell_coords[:, 3, :] - cell_coords[:, 0, :]
# omega = <a, b x c>
omega = numpy.einsum("ij,ij->i", a, numpy.cross(b, c))
# https://en.wikipedia.org/wiki/Tetrahedron#Volume
return abs(omega) / 6.0
vol = numpy.sum(compute_tet_volumes(mesh.points, mesh.get_cells_type("tetra")))
print(f"{vol:.8e}")
8.04956436e-01

Integration
Integration of a discontinuous function is problematic, especially in multiple dimension. Some preliminary work, reducing the problem to an integral of a continuous function, is needed. Here I work out the height (top-bottom) as a function of x and y, and use dblquad for that: it returns in 36.2 ms.
I express the plane equations as a*x + b*y + c*z = distance. Some care is needed with the sign of c, as the plane could be a part of the top or of the bottom.
from scipy.integrate import dblquad
distance = 0.1
a, b, c = 3, -4, 2 # normal
zmin, zmax = -0.5, 0.5 # cube bounds
# preprocessing: make sure that c > 0
# by rearranging coordinates, and flipping the signs of all if needed
height = lambda y, x: min(zmax, max(zmin, (distance-a*x-b*y)/c)) - zmin
integral = dblquad(height, -0.5, 0.5,
lambda x: -0.5, lambda x: 0.5,
epsabs=1e-5, epsrel=1e-5)
Monte Carlo methods
Picking sample points at random (Monte Carlo method) avoids the issues with discontinuity: the accuracy is about the same for discontinuous as for continuous functions, the error decreases at the rate 1/sqrt(N) where N is the number of sample points.
The polytope package uses it internally. With it, a computation could go as
import numpy as np
import polytope as pc
a, b, c = 3, 4, -5 # normal vector
distance = 0.1
A = np.concatenate((np.eye(3), -np.eye(3), [[a, b, c]]), axis=0)
b = np.array(6*[0.5] + [distance])
p = pc.Polytope(A, b)
print(p.volume)
Here A and b encode the halfspaces as Ax<=b: the first fix rows are for faces of the cube, the last is for the plane.
To have more control over precision, either implement Monte-Carlo method yourself (easy) or use mcint package (about as easy).
Polytope volume: a task for linear algebra, not for integrators
You want to compute the volume of a polytope, a convex body formed by intersecting halfspaces. This ought to have an algebraic solution. SciPy has HalfspaceIntersection class for these but so far (1.0.0) does not implement finding the volume of such an object. If you could find the vertices of the polytope, then the ConvexHull class could be used to compute the volume. But as is, it seems that SciPy spatial module is no help. Maybe in a future version of SciPy...

If we assume that boundary of the half-space is given by $\{(x, y, z) \mid ax + by + cz + d = 0 \}$ with $c \not= 0$, and that the half-space of interest is that below the plane (in the $z$-direction), then your integral is given by
scipy.integrate.tplquad(lambda z, y, x: 1,
-0.5, 0.5,
lambda x: -0.5, lambda x: 0.5,
lambda x, y: -0.5, lambda x, y: max(-0.5, min(0.5, -(b*y+a*x+d)/c)))
Since at least one of $a$, $b$, and $c$ must be non-zero, the case $c = 0$ can be handled by changing coordinates.

Different python functions to fit cubic splines, finding coefficients

I want to fit a cubic spline in Python to noisy x, y data and extract the spline coefficients for each interval (i.e. I would expect to obtain four spline coefficients for each interval)
So far, I have tried (all from scipy.interpolate):
1) CubicSpline, but this method does not allow me to smooth the spline, resulting in unrealistic, jumpy coefficient data.
2) Combining splrep and splev, e.g.
tck = splrep(x, y, k=3, s=1e25)
where I extract the coefficients/knots using
F = PPoly.from_spline(tck)
coeffs = F.c
knots = F.x
However, I cannot find smooth coefficients over the full x-range (jumps between values close to zero and 1e23, which is unphysical) even if I ramp up the smoothing parameter s to very large numbers that ultimately lead to too small numbers of knots since the number of knots decreases with s. It seems that I cannot find a suitable parameter s and number of knots at the same time.
3) I used
UnivariateSpline(x, y, k=3, s=0.03)
Here, I found a better sensitivity to changing s, but the corresponding get_coeffs() method does not provide 4 coefficients for each interval but only one, which I do not understand.
4) I also tried a piecewise ridged linear regression with a third order polynomial, but this method provides too large percentage errors for the fit, so it would be great to get one of the standard spline methods working.
What am I missing? Can someone help, please?

The concrete issue I see here is that UnivariateSpline does not yield the algebraic coefficients of various powers of x in the interpolating spline. This is because the coefficients it keeps in the private _data property, which it also returns with get_coeffs method, are a kind of B-spline coefficients. These coefficients describe the spline without any redundancy (you need N of them for a spline with N degrees of freedom), but the basis splines that they are attached to are somewhat complicated.
But you can get the kind of coefficients you want by using the derivatives method of the spline object. It returns all four derivatives at a given point x, from which the Taylor coefficients at that point are easy to find. It is natural to use this method with x being the knots of interpolation, excluding the rightmost one; the coefficients obtained are valid from that knot to the next one. Here is an example, complete with "fancy" formatted output.
import numpy as np
from scipy.interpolate import UnivariateSpline
spl = UnivariateSpline(np.arange(6), np.array([3, 1, 4, 1, 5, 9]), s=0)
kn = spl.get_knots()
for i in range(len(kn)-1):
cf = [1, 1, 1/2, 1/6] * spl.derivatives(kn[i])
print("For {0} <= x <= {1}, p(x) = {5}*(x-{0})^3 + {4}*(x-{0})^2 + {3}*(x-{0}) + {2}".format(kn[i], kn[i+1], *cf))
The knots are 0, 2, 3, 5 in this example. The output is:
For 0.0 <= x <= 2.0, p(x) = -3.1222222222222222*(x-0.0)^3 + 11.866666666666667*(x-0.0)^2 + -10.744444444444445*(x-0.0) + 3.000000000000001
For 2.0 <= x <= 3.0, p(x) = 4.611111111111111*(x-2.0)^3 + -6.866666666666667*(x-2.0)^2 + -0.7444444444444436*(x-2.0) + 4.000000000000001
For 3.0 <= x <= 5.0, p(x) = -2.322222222222221*(x-3.0)^3 + 6.966666666666665*(x-3.0)^2 + -0.6444444444444457*(x-3.0) + 1.0000000000000016
Note that for each piece, cf holds the coefficients starting with the lowest degree, so the order is reversed when formatting the string.
(Of course, you'd probably want to do something else with these coefficients)
To check that the formulas are correct, I copy-pasted them for plotting:

How to extract coefficients of variables in a matrix in Matlab

Suppose I have the matrix below:
syms x y z
M = [x+y-z;2*x+3*y+-5*z;-x-y6*z];
I want to have the a matrix consisting the coefficients of the variables x,y, and z:
CM = [1,1,-1;2,3,-5;-1,-1,6];
If I multiply CM by [x;y;z], I expect to get M.
Edit
I have a system of ODE:
(d/dt)A = B
A and B are square matrices. I want to solve these set of equations. I don't want to use ode solving commands of Matlab.
If I turn the above set of equations into:
(d/dt)a = M*a
then I can solve it easily by the eigen vectors and values of matrix M. Here a is a column vector containing the variables, and M is the matrix of coefficient extracted from B.

Since you seem to be using the Symbolic Math Toolbox, you should diff symbolically, saving the derivative with respect to each variable:
syms x y z;
M=[x+y-z;2*x+3*y-5*z;-x-y+6*z];
Mdiff=[];
for k=symvar(M)
Mdiff=[Mdiff diff(M,k)];
end
Then you get
Mdiff =
[ 1, 1, -1]
[ 2, 3, -5]
[ -1, -1, 6]
If you want to order the columns in a non-lexicographical way, then you need to use a vector of your own instead of symvar.
Update
Since you mentioned that this approach is slow, it might be faster to use coeffs to treat M as a polynomial of its variables:
syms x y z;
M=[x+y-z;2*x+3*y-5*z;-x-y+6*z];
Mdiff2=[];
varnames=symvar(M);
for k=1:length(M)
Mdiff2=[Mdiff2; coeffs(M(k),varnames(end:-1:1))];
end
Note that for some reason (which I don't understand) the output of coeffs is reversed compare to its input variable list, this is why we call it with an explicitly reversed version of symvar(M).
Output:
>> Mdiff2
Mdiff2 =
[ 1, 1, -1]
[ 2, 3, -5]
[ -1, -1, 6]
As #horchler pointed out, this second solution will not work if your symbolic vector has varying number of variables in its components. Since speed only matters if you have to do this operation a lot of times, with many configurations of the parameters in your M, I would suggest constructing M parametrically (so that the coefficients are also syms) is possible, then you only have to perform the first version once. The rest is only substitution into the result.

Using scipy.stats.gaussian_kde with 2 dimensional data

I'm trying to use the scipy.stats.gaussian_kde class to smooth out some discrete data collected with latitude and longitude information, so it shows up as somewhat similar to a contour map in the end, where the high densities are the peak and low densities are the valley.
I'm having a hard time putting a two-dimensional dataset into the gaussian_kde class. I've played around to figure out how it works with 1 dimensional data, so I thought 2 dimensional would be something along the lines of:
from scipy import stats
from numpy import array
data = array([[1.1, 1.1],
[1.2, 1.2],
[1.3, 1.3]])
kde = stats.gaussian_kde(data)
kde.evaluate([1,2,3],[1,2,3])
which is saying that I have 3 points at [1.1, 1.1], [1.2, 1.2], [1.3, 1.3]. and I want to have the kernel density estimation using from 1 to 3 using width of 1 on x and y axis.
When creating the gaussian_kde, it keeps giving me this error:
raise LinAlgError("singular matrix")
numpy.linalg.linalg.LinAlgError: singular matrix
Looking into the source code of gaussian_kde, I realize that the way I'm thinking about what dataset means is completely different from how the dimensionality is calculate, but I could not find any sample code showing how multi-dimension data works with the module. Could someone help me with some sample ways to use gaussian_kde with multi-dimensional data?

This example seems to be what you're looking for:
import numpy as np
import scipy.stats as stats
from matplotlib.pyplot import imshow
# Create some dummy data
rvs = np.append(stats.norm.rvs(loc=2,scale=1,size=(2000,1)),
stats.norm.rvs(loc=0,scale=3,size=(2000,1)),
axis=1)
kde = stats.kde.gaussian_kde(rvs.T)
# Regular grid to evaluate kde upon
x_flat = np.r_[rvs[:,0].min():rvs[:,0].max():128j]
y_flat = np.r_[rvs[:,1].min():rvs[:,1].max():128j]
x,y = np.meshgrid(x_flat,y_flat)
grid_coords = np.append(x.reshape(-1,1),y.reshape(-1,1),axis=1)
z = kde(grid_coords.T)
z = z.reshape(128,128)
imshow(z,aspect=x_flat.ptp()/y_flat.ptp())
Axes need fixing, obviously.
You can also do a scatter plot of the data with
scatter(rvs[:,0],rvs[:,1])

I think you are mixing up kernel density estimation with interpolation or maybe kernel regression. KDE estimates the distribution of points if you have a larger sample of points.
I'm not sure which interpolation you want, but either the splines or rbf in scipy.interpolate will be more appropriate.
If you want one-dimensional kernel regression, then you can find a version in scikits.statsmodels with several different kernels.
update: here is an example (if this is what you want)
>>> data = 2 + 2*np.random.randn(2, 100)
>>> kde = stats.gaussian_kde(data)
>>> kde.evaluate(np.array([[1,2,3],[1,2,3]]))
array([ 0.02573917, 0.02470436, 0.03084282])
gaussian_kde has variables in rows and observations in columns, so reversed orientation from the usual in stats. In your example, all three points are on a line, so it has perfect correlation. That is, I guess, the reason for the singular matrix.
Adjusting the array orientation and adding a small noise, the example works, but still looks very concentrated, for example you don't have any sample point near (3,3):
>>> data = np.array([[1.1, 1.1],
[1.2, 1.2],
[1.3, 1.3]]).T
>>> data = data + 0.01*np.random.randn(2,3)
>>> kde = stats.gaussian_kde(data)
>>> kde.evaluate(np.array([[1,2,3],[1,2,3]]))
array([ 7.70204299e+000, 1.96813149e-044, 1.45796523e-251])

I found it difficult to understand the SciPy manual's description of how gaussian_kde works with 2D data. Here is an explanation which is intended to complement #endolith 's example. I divided the code into several steps with comments to explain the less intuitive bits.
First, the imports:
import numpy as np
import scipy.stats as st
from matplotlib.pyplot import imshow, show
Create some dummy data: these are 1-D arrays of the "X" and "Y" point coordinates.
np.random.seed(142) # for reproducibility
x = st.norm.rvs(loc=2, scale=1, size=2000)
y = st.norm.rvs(loc=0, scale=3, size=2000)
For 2-D density estimation the gaussian_kde object has to be initialised with an array with two rows containing the "X" and "Y" datasets. In NumPy terminology, we "stack them vertically":
xy = np.vstack((x, y))
so the "X" data is in the first row xy[0,:] and the "Y" data are in the second row xy[1,:] and xy.shape is (2, 2000). Now create the gaussian_kde object:
dens = st.gaussian_kde(xy)
We will evaluate the estimated 2-D density PDF on a 2-D grid. There is more than one way of creating such a grid in NumPy. I show here an approach which is different from (but functionally equivalent to) #endolith 's method:
gx, gy = np.mgrid[x.min():x.max():128j, y.min():y.max():128j]
gxy = np.dstack((gx, gy)) # shape is (128, 128, 2)
gxy is a 3-D array, the [i,j]-th element of gxy contains a 2-element list of the corresponding "X" and "Y" values: gxy[i, j] 's value is [ gx[i], gy[j] ].
We have to invoke dens() (or dens.pdf() which is the same thing) on each of the 2-D grid points. NumPy has a very elegant function for this purpose:
z = np.apply_along_axis(dens, 2, gxy)
In words, the callable dens (could have been dens.pdf as well) is invoked along axis=2 (the third axis) in the 3-D array gxy and the values should be returned as a 2-D array. The only glitch is that the shape of z will be (128,128,1) and not (128,128) what I expected. Note that the documentation says that:
The shape of out [the return value, L.D.] is identical to the shape of arr, except along the
axis dimension. This axis is removed, and replaced with new dimensions
equal to the shape of the return value of func1d. So if func1d returns
a scalar out will have one fewer dimensions than arr.
Most likely dens() returned a 1-long tuple and not a scalar which I was hoping for. I didn't investigate the issue any further, because this is easy to fix:
z = z.reshape(128, 128)
after which we can generate the image:
imshow(z, aspect=gx.ptp() / gy.ptp())
show() # needed if you try this in PyCharm
Here is the image. (Note that I have implemented #endolith 's version as well and got an image indistinguishable from this one.)

The example posted in the top answer didn't work for me. I had to tweak it little bit and it works now:
import numpy as np
import scipy.stats as stats
from matplotlib import pyplot as plt
# Create some dummy data
rvs = np.append(stats.norm.rvs(loc=2,scale=1,size=(2000,1)),
stats.norm.rvs(loc=0,scale=3,size=(2000,1)),
axis=1)
kde = stats.kde.gaussian_kde(rvs.T)
# Regular grid to evaluate kde upon
x_flat = np.r_[rvs[:,0].min():rvs[:,0].max():128j]
y_flat = np.r_[rvs[:,1].min():rvs[:,1].max():128j]
x,y = np.meshgrid(x_flat,y_flat)
grid_coords = np.append(x.reshape(-1,1),y.reshape(-1,1),axis=1)
z = kde(grid_coords.T)
z = z.reshape(128,128)
plt.imshow(z,aspect=x_flat.ptp()/y_flat.ptp())
plt.show()

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

How does scipy.interpolate.CubicSpline's extrapolate work mathematically? - scipy

Related

Using numerical methods to plot solution to first-order nonlinear differential equation in Matlab

Volume of 3d shape using numerical integration with scipy

Different python functions to fit cubic splines, finding coefficients

How to extract coefficients of variables in a matrix in Matlab

Using scipy.stats.gaussian_kde with 2 dimensional data

Categories

Resources