Small differences in scipy spline interpolation vs natural and MATLAB splines

I am using scipy.interpolate to make a spline interpolation based on the following data:
xs=[0.041984,0.374045,0.625954,0.874045,1.374045,1.870229,2.362595,2.862595,3.358778,3.854961,4.354961,5.354961,7.343511,8.835877,9.335877,10.33587]
ys=[14.145,14.235,14.275,14.24,13.91,13.7,13.57,13.52,13.55,13.56,13.45,13.44,13.46,13.44,13.45,13.45]
from scipy.interpolate import interp1d
f = interp1d(xs, ys, kind="cubic")
Now f(10) gives 13.4589, while I obtain 13.4525 with the "Numerical Recipes" code in C (natural spline).
Can you please explain the difference?
I also tried Matlab, which gives 13.4583.

Despite their attractive name, "natural" cubic splines are rarely used. The usual way to deal with two extra degrees of freedom in the cubic spline construction is to impose the "not-a-knot" condition, which requires the third derivative to be continuous at the first and last interior knots; in effect, these knots are no longer knots because the polynomial coefficients do not change at those knots.
The cubic spline returned by interp1d is not-a-knot, and so is the spline constructed by Matlab (by default). To construct a natural spline with SciPy, use make_interp_spline with boundary conditions [(2, 0)], [(2, 0)], meaning the 2nd derivative must be zero at both ends. Example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d, make_interp_spline
xs=[0.041984,0.374045,0.625954,0.874045,1.374045,1.870229,2.362595,2.862595,3.358778,3.854961,4.354961,5.354961,7.343511,8.835877,9.335877,10.33587]
ys=[14.145,14.235,14.275,14.24,13.91,13.7,13.57,13.52,13.55,13.56,13.45,13.44,13.46,13.44,13.45,13.45]
spl1 = interp1d(xs, ys, "cubic")
l, r = [(2, 0)], [(2, 0)] # natural spline boundary conditions
spl2 = make_interp_spline(xs, ys, k=3, bc_type=(l, r))
t = np.linspace(min(xs), max(xs), 500)
plt.plot(t, spl1(t) - spl2(t))
plt.show()
The plot of the difference of two splines shows that the difference is only visible near the ends of the interpolation range; this is where the effect of the boundary conditions is strongest.
The values at 10:
13.458277949 for the not-a-knot spline
13.4524921744 for the natural spline
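These values can be reproduced directly from the two spline objects defined above (a minimal check, assuming the example code has been run):
print(float(spl1(10)))  # not-a-knot (interp1d), ~13.4583
print(float(spl2(10)))  # natural (make_interp_spline), ~13.4525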

Related

Create spline from knots and coefficients using scipy

I'm trying to reproduce a function from a paper, which is specified only in terms of spline knots and coefficients. After finding this on Stack Overflow, I can recreate a scipy interpolation from its knots and coefficients, given an existing scipy interpolation object. However, the approach fails for the function specified in the paper. To reproduce a scipy interpolation I can do this:
using PyCall, PyPlot, Random
Random.seed!(5)
sp = pyimport("scipy.interpolate")
x = LinRange(0,1,50)
y = (0.9 .+ 0.1rand(length(x))).*sin.(2*pi*(x.-0.5))
t = collect(x[2:2:end-1]) # knots
s1 = sp.LSQUnivariateSpline(x, y, t)
x2 = LinRange(0, 1, 200) # new x-grid
y2 = s1(x2) # evaluate spline on that new grid
figure()
plot(x, y, label="original")
plot(x2, y2, label="interp", color="k")
knots = s1.get_knots()
c = s1.get_coeffs()
newknots(knots, k) = vcat(fill(knots[1],k),knots,fill(knots[end],k)) # func for boundary knots of order k
forscipyknots = newknots(knots, 3)
s2 = sp.BSpline(forscipyknots, c, 3)
y3 = s2(x2)
plot(x2,y3,"--r", label="reconstructed \nfrom knots and coeff")
legend()
This provides the following, as expected:
I then tried to reproduce a function (image below) with specified knots = [.4,.4,.4,.4,.7] and coefficients c = [2,-5,5,2,-3,-1,2], which is supposed to produce:
With the code below and the knots and coefficients above:
knots = [.4,.4,.4,.4,.7]
c = [2,-5,5,2,-3,-1,2]
forscipyknots = newknots(knots, 3)
s2 = sp.BSpline(forscipyknots, c, 3)
figure()
plot(x2, s2(x2))
I get the following (below) instead. I'm sure I'm messing up the boundary knots - how can I fix this?
Short answer:
The inner-knot sequence t=[0.4,0.4,0.4,0.4,0.7] and the parameters c=[2,-5,5,2,-3,-1,2] do not allow a spline to be constructed; the example contains an error (more on this below). The best you can get out of it is to remove one of the 0.4 knots and construct a quadratic (second-degree) spline as follows:
tt = [0.0,0.0,0.0,0.4,0.4,0.4,0.7,1.0,1.0,1.0]
c = [2,-5,5,2,-3,-1,2]
s2 = sp.BSpline(tt, c, 2)  # sp = pyimport("scipy.interpolate"), as in the question
This produces the following graph
Long answer:
The knot sequence in Example 3 contains only the inner knots; therefore, you need to add the boundary knots. Since you want to evaluate the spline on the interval [0,1], the full knot sequence needs to cover the points 0 and 1. The simplest way is to add 0 to the beginning and 1 to the end of the sequence and replicate them as necessary according to the desired degree of the spline. A cubic (third-degree) spline would require four boundary knots at each end (i.e. four zeros and four ones), and a quadratic spline would require three at each end (three zeros and three ones).
There is a problem, however. A cubic spline would require 9 parameters, while Example 3 only gives you 7. Hence, you cannot construct a cubic spline from this. With the seven parameters given, you could construct a quadratic spline, but then the problem is that for quadratic splines each point can appear at most three times in the inner-knot sequence, and 0.4 appears four times (which would suggest a cubic spline). Hence, all you can do is remove one of the 0.4 knots and construct a second-degree spline, as in the short answer above.
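As a quick check of the counting rule above (the number of coefficients must equal len(t) - k - 1), here is a small sketch in plain Python/scipy, bypassing the PyCall wrapper used in the question:
from scipy.interpolate import BSpline
tt = [0.0, 0.0, 0.0, 0.4, 0.4, 0.4, 0.7, 1.0, 1.0, 1.0]  # quadratic case: 10 knots, k = 2
c = [2, -5, 5, 2, -3, -1, 2]                             # 7 coefficients
k = 2
assert len(c) == len(tt) - k - 1  # 7 == 10 - 2 - 1, so the spline is well defined
spl = BSpline(tt, c, k)
print(spl(0.5))                   # evaluate somewhere inside [0, 1]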
Now I will explain what you did wrong. In the first example you obtained the knot sequence from an existing spline using knots = s1.get_knots(), which gave you knots=[0,0.02,0.04,...,0.98,1]. This sequence contains the boundary knots 0 and 1 (although only once). Hence, to construct a cubic spline, you replicated each of these three times to obtain forscipyknots = [0,0,0,0,0.02,0.04,...,0.98,1,1,1,1]. So far so good.
In Example 3, however, the knot sequence does not contain the boundary points. Since you did the same as before, you ended up replicating the 0.4 and 0.7 knots three times, resulting in forscipyknots = [0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.7,0.7,0.7,0.7]. You cannot construct a spline on this sequence; whatever comes out of it is not a spline. What you needed instead was forscipyknots = [0.0,0.0,0.0,0.0,0.4,0.4,0.4,0.4,0.7,1.0,1.0,1.0,1.0] (which still would not have worked, because you do not have enough coefficients; but you could try it with your own, for instance c = [1,2,-5,5,2,-3,-1,2,1]). To do this, you needed to add 0 to the beginning and 1 to the end of the array and only then apply your newknots function.
Just as an example, a cubic spline could look like this
tt = [0.0,0.0,0.0,0.0,0.4,0.4,0.4,0.4,0.7,1.0,1.0,1.0,1.0]
c = [1,2,-5,5,2,-3,-1,2,1]
s2 = sp.BSpline(tt, c, 3)

How to plot precision and recall of a CNN in MATLAB?

How to plot the precision and recall curves of a CNN?
I have generated the scores from the CNN and want to plot the precision-recall curve, but I am unable to get it.
I have calculated TP, TN, FP, and FN using:
idx = (ACTUAL==1);                        % logical index of the positive class
p = length(ACTUAL(idx));                  % number of positive samples
n = length(ACTUAL(~idx));                 % number of negative samples
N = p+n;
tp = sum(ACTUAL(idx)==PREDICTED(idx));    % true positives
tn = sum(ACTUAL(~idx)==PREDICTED(~idx));  % true negatives
fp = n-tn;                                % false positives
fn = p-tp;                                % false negatives
The formulas for precision and recall are
precision = tp/(tp+fp)
recall = tp/(tp+fn)
but with these, I am getting an undesired plot.
I have obtained scores of the CNN using the following command:
[YTest,score]=classify(convnet,TestData)
MATLAB has a function for creating ROC curves and similar performance curves (such as precision-recall curves) in the Statistics and Machine Learning Toolbox: perfcurve.
By default, the ROC curve is calculated.
The function has the following syntax:
[X, Y] = perfcurve(labels, scores, posclass)
Here, labels is the true label for each sample, scores is the prediction of the CNN (or any other classifier), and posclass is the label of the class you assume to be "positive" - which appears to be 1 in your example. The outputs of the perfcurve function are the (x, y) coordinates of the ROC curve, so you can easily plot it using
plot(X, Y)
To make perfcurve plot the precision-recall curve instead of the ROC curve, you have to set the optional 'XCrit' and 'YCrit' arguments of the function. As described in the documentation, different pre-defined criteria such as number of false positives ('fp'), true positive rate ('tpr'), accuracy ('accu') and many more, or even custom functions can be used.
By setting 'XCrit' to 'tpr' (Recall) and 'YCrit' to 'prec' (Precision), a precision-recall curve is created:
[X, Y] = perfcurve(labels, scores, posclass, 'XCrit', 'tpr', 'YCrit', 'prec');
plot(X, Y);
xlabel('Recall')
ylabel('Precision')
xlim([0, 1])
ylim([0, 1])
For example (using randomly generated data and an SVM):
The answer of hbaderts is correct, but the example at the end of the answer is wrong. For the ROC curve, use
[X,Y] = perfcurve(labels, scores, posclass, 'XCrit', 'fpr', 'YCrit', 'tpr');
Then the generated receiver operating characteristic (ROC) curve is correct.
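Outside MATLAB, an equivalent precision-recall curve can be obtained with scikit-learn's precision_recall_curve; this is a Python sketch added for comparison only (the random labels and scores below are placeholders, not data from the question):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve
np.random.seed(0)
y_true = np.random.randint(0, 2, 200)            # true 0/1 labels
y_score = 0.5*y_true + 0.7*np.random.rand(200)   # noisy scores, higher for class 1
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.show()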

Non-symbolic derivative at all sample points including boundary points

Suppose I have a vector t = [0 0.1 0.9 1 1.4], and a vector x = [1 3 5 2 3]. How can I compute the derivative of x with respect to time that has the same length as the original vectors?
I should not use any symbolic operations. The command diff(x)./diff(t) does not produce a vector of the same length. Should I first interpolate the x(t) function and then take its derivative?
Different approaches exist to calculate the derivative at the same points as your initial data:
Finite differences: Use a central difference scheme at your inner points and a forward/backward scheme at your first/last point
or
Curve fitting: Fit a curve through your points, calculate the derivative of this fitted function and sample them at the same points as the original data. Typical fitting functions are polynomials or spline functions.
Note that the curve fitting approach gives better results, but it requires more tuning and is slower (~100x).
Demonstration
As an example, I will calculate the derivative of a sine function:
t = 0:0.1:1;
y = sin(t);
Its exact derivative is well known:
dy_dt_exact = cos(t);
The derivative can be calculated approximately as follows:
Finite differences:
dy_dt_approx = zeros(size(y));
dy_dt_approx(1) = (y(2) - y(1))/(t(2) - t(1)); % forward difference
dy_dt_approx(end) = (y(end) - y(end-1))/(t(end) - t(end-1)); % backward difference
dy_dt_approx(2:end-1) = (y(3:end) - y(1:end-2))./(t(3:end) - t(1:end-2)); % central difference
or
Polynomial fitting:
p = polyfit(t,y,5); % fit fifth order polynomial
dp = polyder(p); % calculate derivative of polynomial
The results can be visualised as follows:
figure('Name', 'Derivative')
hold on
plot(t, dy_dt_exact, 'DisplayName', 'exact');
plot(t, dy_dt_approx, 'DisplayName', 'finite difference');
plot(t, polyval(dp, t), 'DisplayName', 'polynomial');
legend show
figure('Name', 'Error')
hold on
plot(t, abs(dy_dt_approx - dy_dt_exact)/max(dy_dt_exact), 'DisplayName', 'finite difference');
plot(t, abs(polyval(dp, t) - dy_dt_exact)/max(dy_dt_exact), 'DisplayName', 'polynomial');
legend show
The first graph shows the derivatives themselves, and the second graph plots the relative errors of both methods.
Discussion
One clearly sees that the curve fitting method gives better results than finite differences, but it is ~100x slower. The curve fitting method has a relative error on the order of 10^-5. Note that the finite differences approach becomes better when your data is sampled more densely or when you use a higher-order scheme. The disadvantage of the curve fitting approach is that one has to choose a good polynomial order; spline functions may be better suited in general (see the sketch below).
A dataset sampled 10x more densely, i.e. t = 0:0.01:1;, results in the following graphs:
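As an illustration of the spline option mentioned above, here is a small Python/scipy sketch (a different toolchain than the MATLAB code in this answer, shown only because spline objects expose their derivative directly; np.gradient is the finite-difference counterpart):
import numpy as np
from scipy.interpolate import make_interp_spline
t = np.arange(0, 1.01, 0.1)
y = np.sin(t)
dy_fd = np.gradient(y, t)                     # central differences inside, one-sided at the ends
spl = make_interp_spline(t, y, k=3)           # interpolating cubic spline
dy_spline = spl.derivative()(t)               # differentiate the spline analytically
print(np.max(np.abs(dy_fd - np.cos(t))))      # finite-difference error
print(np.max(np.abs(dy_spline - np.cos(t))))  # spline error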

Different python functions to fit cubic splines, finding coefficients

I want to fit a cubic spline in Python to noisy x, y data and extract the spline coefficients for each interval (i.e. I would expect to obtain four spline coefficients for each interval)
So far, I have tried (all from scipy.interpolate):
1) CubicSpline, but this method does not allow me to smooth the spline, resulting in unrealistic, jumpy coefficient data.
2) Combining splrep and splev, e.g.
tck = splrep(x, y, k=3, s=1e25)
where I extract the coefficients/knots using
F = PPoly.from_spline(tck)
coeffs = F.c
knots = F.x
However, I cannot obtain smooth coefficients over the full x-range (the values jump between near zero and around 1e23, which is unphysical), even if I increase the smoothing parameter s to very large values. Since the number of knots decreases as s grows, very large s ultimately leaves too few knots, so I cannot find a suitable s and a reasonable number of knots at the same time.
3) I used
UnivariateSpline(x, y, k=3, s=0.03)
Here, I found a better sensitivity to changing s, but the corresponding get_coeffs() method does not provide four coefficients for each interval, only one, which I do not understand.
4) I also tried a piecewise ridge linear regression with a third-order polynomial, but this method gives too large a percentage error for the fit, so it would be great to get one of the standard spline methods working.
What am I missing? Can someone help, please?
The concrete issue I see here is that UnivariateSpline does not yield the algebraic coefficients of the various powers of x in the interpolating spline. This is because the coefficients it keeps in the private _data property, and returns from the get_coeffs method, are B-spline coefficients. These coefficients describe the spline without any redundancy (you need N of them for a spline with N degrees of freedom), but the basis splines they are attached to are somewhat complicated.
But you can get the kind of coefficients you want by using the derivatives method of the spline object. It returns the value and the first three derivatives at a given point x (four numbers for a cubic spline), from which the Taylor coefficients at that point are easy to find. It is natural to use this method with x being the knots of interpolation, excluding the rightmost one; the coefficients obtained are valid from that knot to the next one. Here is an example, complete with "fancy" formatted output.
import numpy as np
from scipy.interpolate import UnivariateSpline
spl = UnivariateSpline(np.arange(6), np.array([3, 1, 4, 1, 5, 9]), s=0)
kn = spl.get_knots()
for i in range(len(kn)-1):
    cf = [1, 1, 1/2, 1/6] * spl.derivatives(kn[i])  # Taylor coefficients: f, f', f''/2, f'''/6
    print("For {0} <= x <= {1}, p(x) = {5}*(x-{0})^3 + {4}*(x-{0})^2 + {3}*(x-{0}) + {2}".format(kn[i], kn[i+1], *cf))
The knots are 0, 2, 3, 5 in this example. The output is:
For 0.0 <= x <= 2.0, p(x) = -3.1222222222222222*(x-0.0)^3 + 11.866666666666667*(x-0.0)^2 + -10.744444444444445*(x-0.0) + 3.000000000000001
For 2.0 <= x <= 3.0, p(x) = 4.611111111111111*(x-2.0)^3 + -6.866666666666667*(x-2.0)^2 + -0.7444444444444436*(x-2.0) + 4.000000000000001
For 3.0 <= x <= 5.0, p(x) = -2.322222222222221*(x-3.0)^3 + 6.966666666666665*(x-3.0)^2 + -0.6444444444444457*(x-3.0) + 1.0000000000000016
Note that for each piece, cf holds the coefficients starting with the lowest degree, so the order is reversed when formatting the string.
(Of course, you'd probably want to do something else with these coefficients)
To check that the formulas are correct, I copy-pasted them for plotting:
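A programmatic version of that check (an added sketch, not the author's original plot code; it assumes spl and kn from the snippet above are in scope) compares each cubic piece against the spline itself:
import numpy as np
for i in range(len(kn) - 1):
    cf = [1, 1, 1/2, 1/6] * spl.derivatives(kn[i])
    xx = np.linspace(kn[i], kn[i + 1], 20)
    piece = cf[3]*(xx - kn[i])**3 + cf[2]*(xx - kn[i])**2 + cf[1]*(xx - kn[i]) + cf[0]
    print(np.allclose(piece, spl(xx)))  # expect True on every piece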

Fit a curve in MATLAB where points have specified normals

I have two 2D points p1, p2 in MATLAB, and each point has a normal n1, n2. I wish to find the (cubic) polynomial which joins the two points and agrees with the specified normals at each end. Is there something built-in to MATLAB to do this?
Of course, I could derive the equations for the polynomial manually, but MATLAB's curve fitting toolbox has so much built-in that I assumed it would be possible. I haven't been able to find any examples of curve, spline or polynomial fitting where the normals are specified.
As an extension of this, I would like to fit splines where each data point has a normal specified.
1. If your points are points of a function, then you need cubic Hermite spline interpolation:
In numerical analysis, a cubic Hermite spline or cubic Hermite interpolator is a spline where each piece is a third-degree polynomial specified in Hermite form: that is, by its values and first derivatives at the end points of the corresponding domain interval.
Cubic Hermite splines are typically used for interpolation of numeric data specified at given argument values x(1), x(2), ..., x(n), to obtain a smooth continuous function. The data should consist of the desired function value and derivative at each x(k). (If only the values are provided, the derivatives must be estimated from them.) The Hermite formula is applied to each interval (x(k), x(k+1)) separately. The resulting spline will be continuous and will have continuous first derivative.
Cubic polynomial splines can be specified in other ways, the Bézier form being the most common. However, these two methods provide the same set of splines, and data can be easily converted between the Bézier and Hermite forms; so the names are often used as if they were synonymous.
Specifying the normals at each point is the same as specifying the tangents (slopes, 1st derivatives), because the latter are perpendicular to the former.
In Matlab, the function for calculating the Piecewise Cubic Hermite Interpolating Polynomial is pchip. The only problem is that pchip is a bit too clever:
The careful reader will notice that pchip takes function values as input, but no derivative values. This is because pchip uses the function values f(x) to estimate the derivative values. [...] To do a good derivative approximation, the function has to use an approximation using 4 or more points [...] Luckily, using Matlab we can write our own functions to do interpolation using real cubic Hermite splines.
...the author shows how to do this, using the function mkpp.
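For comparison, scipy offers a direct counterpart that takes values and first derivatives explicitly, CubicHermiteSpline; here is a minimal Python sketch (an addition for readers outside MATLAB, not part of the original answer), using the same example points and normals as the answer code further below and converting normals to slopes via slope = -nx/ny:
import numpy as np
from scipy.interpolate import CubicHermiteSpline
x = np.array([0.0, 2.0])             # x-coordinates of the two points
y = np.array([1.0, 5.0])             # y-values of the two points
normals = np.array([[0.0, 1.0],      # normal at the first point
                    [1.0, 1.0]])     # normal at the second point
# The tangent is perpendicular to the normal, so its slope is -nx/ny (ny must be nonzero)
slopes = -normals[:, 0] / normals[:, 1]
spl = CubicHermiteSpline(x, y, slopes)
print(spl(1.0))                      # value in the middle of the interval
print(spl.derivative()(x))           # slopes at the endpoints: [0, -1]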
2. If your points are not necessarily points of a function, then each interval should be interpolated by a quadratic Bezier curve:
In this example, 3 points are given: the endpoints P(0) and P(2), and P(1), which is the intersection of the tangents at the endpoints. The position of P(1) can be easily calculated from the coordinates of P(0) and P(2), and the normals at these points.
In Matlab, you can use spmak, see the examples here and here.
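To illustrate the construction, the control point P(1) can be computed as the intersection of the two endpoint tangents and the quadratic Bézier then evaluated directly; a small numpy sketch (an added illustration using the same example points and normals as the answer below, and assuming the two tangents are not parallel):
import numpy as np
P0, P2 = np.array([0.0, 1.0]), np.array([2.0, 5.0])  # endpoints
n0, n2 = np.array([0.0, 1.0]), np.array([1.0, 1.0])  # normals at the endpoints
d0 = np.array([n0[1], -n0[0]])                       # tangent direction at P0 (perpendicular to n0)
d2 = np.array([n2[1], -n2[0]])                       # tangent direction at P2
# Solve P0 + s*d0 = P2 + u*d2 for the tangent intersection P1
s, u = np.linalg.solve(np.column_stack([d0, -d2]), P2 - P0)
P1 = P0 + s * d0
# Quadratic Bezier: B(t) = (1-t)^2 P0 + 2 t (1-t) P1 + t^2 P2
t = np.linspace(0, 1, 50)[:, None]
B = (1 - t)**2 * P0 + 2 * t * (1 - t) * P1 + t**2 * P2
print(P1, B[0], B[-1])  # control point, and the curve endpoints (equal to P0 and P2)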
You could do something like this:
function neumann_spline(p, m, q, n)
% example data
p = [0; 1];
q = [2; 5];
m = [0; 1];
n = [1; 1];
% the tangent is perpendicular to the normal, so the prescribed end slope is -nx/ny
if (m(2) ~= 0)
s1 = -m(1)/m(2);
else
s1 = Inf; % vertical tangent; cannot be prescribed as an end slope for csape
end
if (n(2) ~= 0)
s2 = -n(1)/n(2);
else
s2 = Inf; % vertical tangent; cannot be prescribed as an end slope for csape
end
hold on
grid on
axis equal
plot([p(1) p(1)+0.5*m(1)], [p(2) p(2)+0.5*m(2)], 'r', 'Linewidth', 1)
plot([q(1) q(1)+0.5*n(1)], [q(2) q(2)+0.5*n(2)], 'r', 'Linewidth', 1)
sp = csape([p(1) q(1)], [s1 p(2) q(2) s2], [1 1]);
fnplt(sp)
plot(p(1), p(2), 'k.', 'MarkerSize', 16)
plot(q(1), q(2), 'k.', 'MarkerSize', 16)
title('Cubic spline with prescribed normals at the endpoints')
end
The result is