Create spline from knots and coefficients using scipy - scipy

I'm trying to reproduce a function from a paper, which is specified only in terms of spline knots and coefficients. After finding this on stackoverflow, given a scipy interpolation object, from its knots and coefficients, I can recreate the scipy interpolation. However, the approach fails for the function specified in the paper. To reproduce a scipy interpolation I can do this:
using PyCall, PyPlot, Random
Random.seed!(5)
sp = pyimport("scipy.interpolate")
x = LinRange(0,1,50)
y = (0.9 .+ 0.1rand(length(x))).*sin.(2*pi*(x.-0.5))
t = collect(x[2:2:end-1]) # knots
s1 = sp.LSQUnivariateSpline(x, y, t)
x2 = LinRange(0, 1, 200) # new x-grid
y2 = s1(x2) # evaluate spline on that new grid
figure()
plot(x, y, label="original")
plot(x2, y2, label="interp", color="k")
knots = s1.get_knots()
c = s1.get_coeffs()
newknots(knots, k) = vcat(fill(knots[1],k),knots,fill(knots[end],k)) # func for boundary knots of order k
forscipyknots = newknots(knots, 3)
s2 = sp.BSpline(forscipyknots, c, 3)
y3 = s2(x2)
plot(x2,y3,"--r", label="reconstructed \nfrom knots and coeff")
legend()
Which provides the following as expected:
On trying to reproduce a function (image below) with specified knots = [.4,.4,.4,.4,.7] and coefficients c = [2,-5,5,2,-3,-1,2] which is supposed to produce:
With the below code and above knots and coefficients:
knots = [.4,.4,.4,.4,.7]
c = [2,-5,5,2,-3,-1,2]
forscipyknots = newknots(knots, 3)
s2 = sp.BSpline(forscipyknots, c, 3)
figure()
plot(x2, s2(x2))
I get the following (below) instead. I'm sure I'm messing up the boundary knots - how can I fix this?

Short answer:
The inner-knot sequence t=[0.4,0.4,0.4,0.4,0.7] and the parameters c=[2,-5,5,2,-3,-1,2] do not allow a spline to be constructed, the example contains an error (more on this later). The best you can get out of it is to remove one of the 0.4 knots and construct a quadratic (second-degree) spline as follows
tt = [0.0,0.0,0.0,0.4,0.4,0.4,0.7,1.0,1.0,1.0]
c = [2,-5,5,2,-3,-1,2]
s2 = BSpline(tt,c,2)
This produces the following graph
Long answer:
The knot sequence in Example 3 contains only the inner knots, therefore, you need to add the boundary knots. Since you want to evaluate the spline on the interval [0,1] the full knot sequence needs to cover the points 0 and 1. The simplest is to add 0 to the beginning and 1 to the end of the sequence and replicate them as necessary according to the desired degree of the spline. A cubic (third-degree) spline would require four boundary knots (i.e. four zeros and four ones) and a quadratic spline would require three boundary knots (three zeros and three ones).
There is a problem, however. A cubic spline would require 9 parameters, while Example 3 only gives you 7. Hence, you cannot construct a cubic spline from this. With the seven parameters given, you could construct a quadratic spline, but, there, the problem is that for quadratic splines each point can only appear at most three times in the inner-knot sequence. And 0.4 appears four times (which would suggest a cubic spline). Hence, all you can do is to remove one of the 0.4 knots and construct a second-degree spline as in the short answer above.
Now I will explain what you did wrong. In the first example you obtained the knot sequence from an existing spline using knots = s1.get_knots(), which gave you knots=[0,0.02,0.04,...,0.98,1]. This sequence contains the boundary knots 0 and 1 (although only once). Hence, to construct a cubic spline, you replicated each of these three times to obtain forscipyknots = [0,0,0,0,0.02,0.04,...,0.98,1,1,1,1]. So far so good.
In Example 3, however, the knot sequence does not contain the boundary points. As you did the same as before, you ended up replicating the 0.4 and 0.7 knots three times resulting in forscipyknots = [0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.7,0.7,0.7,0.7]. You cannot construct a spline on this sequence, whatever comes out of this is not a spline. What you needed instead was forscipyknots = [0.0,0.0,0.0,0.0,0.4,0.4,0.4,0.4,0.7,1.0,1.0,1.0,1.0] (which would not have worked because you do not have enough coefficients; but you could try this with your own, for instance, c = [1,2,-5,5,2,-3,-1,2,1]). To do this, you needed to add 0 to the beginning and 1 to the end of the array and only then use your newknots function.
Just as an example, a cubic spline could look like this
tt = [0.0,0.0,0.0,0.0,0.4,0.4,0.4,0.4,0.7,1.0,1.0,1.0,1.0]
c = [1,2,-5,5,2,-3,-1,2,1]
s2 = BSpline(tt,c,3)

Related

How can I code a contour integral representation of cosine?

I am pretty new to MATLAB computation and would appreciate any help on this. Firstly, I am trying to integrate a cosine function using the McLaurin expansion [cos(z) = 1 − (z^2)/2 + (z^4)/4 + ....] and lets say plotted over one cycle from 0 to 2π. This first integral (as seen in Figure 1) would serve as a "reference" or "contour" for what I want to do next.
Figure 1 - "The first integral representation"
Now, my next problem comes from writing in MATLAB cos(z) in
terms of an integral in the complex plane.
Figure 2 - "showing cos(z) as an integral in the complex plane"
Where I could choose an equi-sampled set of points 'sn' along the contour, from 'sL' to 'sU'
and separated by '∆s'. This is Euler’s method.
I am trying to write a code to numerically approximate the integral
along a suitable contour and plot the approximation against the
exact value of cos(z). I would like to do this twice - once for
z ∈ [0, 6π] and once for complex valued z in the range
z ∈ [0 + i, 6π + ]. Then to plot both the real and imaginary part of the
computed cos(z)
I am aware of the steps I am looking to implement which I'll bullet-point here.
Choose γ, SL, SU , N.
Step through z from z lower to z upper (use a different
number of steps (other than N) for this).
For each value of z compute cos z by stepping along the contour in
N discrete steps from SL to SU .
For each value of sn along the contour compute the integrand e^(sn-(z^2/4sn))/sqrt(sn) and add it to the rolling sum [I have attached figure 3 showing an image formula of the integrand if its not clear!] Figure 3 - "The exponential integrand I am looking to compute"
Now I will show what I have attempted in MATLAB!
% "Contour Integral Representation of Cosine"
% making 'z' a possible number such as pi
N = 10000000; % example number - meaning sample of steps
z_lower = 0;
z_upper = 6*pi;
%==========================%
z1 = linspace(z_lower,z_upper,N);
y = 1;
Sl = y - 10*1i;
sum = 0.0;
%==========================%
for z = linspace(z_lower,z_upper,N)
for Sn = linspace(Sl,Su,N)
sum = sum + ((exp(Sn) - (z.^2/4*Sn))/sqrt(Sn))*ds;
end
end
sum = sum*(sqrt(pi)/2*pi*1i);
plot(Sn,sum)
Edit1: Hiya, this figure will show what I am expecting - the numerical method to be not exactly the same as the "symbolic" integration lets say. In figure 4, the black cosine wave is the same as in figure 1 and the blue is the numerical integration method.Figure 4 - "End Result of what I expect to plot

finding peaks that have some structure in 1-d

I have signal with peaks that has some structure on top of some background, and I'm trying to find a robust way to locate their positions and amplitudes.
For example, assume the peak has this form:
t=linspace(0,10,1e3);
w=0.25;
rw=#(t0) 2/(sqrt(3*w)*pi^0.25)*(1-((t-t0)/w).^2).*exp(-(t-t0).^2/(2*w.^2));
and I have several peaks a bit too close so their structure starts to interfere with the others, and one separated, so you can see it's structure:
pos=[ 2 2.5 3 8]; % positions
total_signal=0.*t;
for i=1:length(pos)
total_signal=total_signal+rw(pos(i));
end
plot(t,total_signal);
So the goal is to find all peaks found in total_signal and check that their positions agree with the original positions that were used to generate them in pos.
Your signal can be thought of as the convolution of a pulse train with the peak shape. Deconvolution can be used to retrieve the pulse train, whose peaks are the locations you are looking for. This assumes that the peak shape is known and constant.
I will start by rewriting your code as follows:
t = linspace(0,10,1e3);
w = 0.25;
rw = #(t) 2/(sqrt(3*w)*pi^0.25)*(1-((t)/w).^2).*exp(-(t).^2/(2*w.^2));
pos = [2, 2.5, 3, 8];
total_signal = zeros(size(t));
for i=1:length(pos)
total_signal = total_signal + rw(t-pos(i));
end
total_signal = total_signal + randn(size(t))*1e-2;
plot(t,total_signal);
I've changed rw to take t-t0, rather than just t0. This will allow me later to create a clean signal with just one peak in the middle. I have also added noise to the signal, for a more realistic problem. Without the noise, the problem is a lot easier to solve.
Wiener deconvolution is the simplest approach to solve this problem. In short, we assume that
total_signal = conv(pulse_train, shape)
In the frequency domain, this is written as
G = F .* H
(with .* the element-wise multiplication, using MATLAB syntax here). The Wiener deconvolution is:
F = (conj(H) .* G) ./ (abs(H).^2 + k)
with k some constant that we can tune to regularize the solution.
We implement this as follows:
shape = rw(linspace(-5,5,1e3));
G = fft(total_signal);
H = fft(ifftshift(shape)); % ifftshift moves the origin to sample #0, as expected by FFT.
k = 1;
F = (conj(H) .* G) ./ (abs(H).^2 + k);
pulse_train = ifft(F);
Now, findpeaks (requires the Signal Processing Toolbox) can be used to find the prominent peaks:
findpeaks(pulse_train, 'MinPeakProminence', 0.02)
Note how the four peaks are approximately the same height. It's not exact, because we're regularizing to deal with the noise. An exact solution is only possible in the noise-free case. Without noise, k=0, and the expression simplifies to F = G ./ H.
Also, the x-axis is off in the plot produced by findpeaks, but this shouldn't affect the results. The locations returned by it are indices into the array, those same indices can be used to index into t and find the actual locations of the peaks.

Different python functions to fit cubic splines, finding coefficients

I want to fit a cubic spline in Python to noisy x, y data and extract the spline coefficients for each interval (i.e. I would expect to obtain four spline coefficients for each interval)
So far, I have tried (all from scipy.interpolate):
1) CubicSpline, but this method does not allow me to smooth the spline, resulting in unrealistic, jumpy coefficient data.
2) Combining splrep and splev, e.g.
tck = splrep(x, y, k=3, s=1e25)
where I extract the coefficients/knots using
F = PPoly.from_spline(tck)
coeffs = F.c
knots = F.x
However, I cannot find smooth coefficients over the full x-range (jumps between values close to zero and 1e23, which is unphysical) even if I ramp up the smoothing parameter s to very large numbers that ultimately lead to too small numbers of knots since the number of knots decreases with s. It seems that I cannot find a suitable parameter s and number of knots at the same time.
3) I used
UnivariateSpline(x, y, k=3, s=0.03)
Here, I found a better sensitivity to changing s, but the corresponding get_coeffs() method does not provide 4 coefficients for each interval but only one, which I do not understand.
4) I also tried a piecewise ridged linear regression with a third order polynomial, but this method provides too large percentage errors for the fit, so it would be great to get one of the standard spline methods working.
What am I missing? Can someone help, please?
The concrete issue I see here is that UnivariateSpline does not yield the algebraic coefficients of various powers of x in the interpolating spline. This is because the coefficients it keeps in the private _data property, which it also returns with get_coeffs method, are a kind of B-spline coefficients. These coefficients describe the spline without any redundancy (you need N of them for a spline with N degrees of freedom), but the basis splines that they are attached to are somewhat complicated.
But you can get the kind of coefficients you want by using the derivatives method of the spline object. It returns all four derivatives at a given point x, from which the Taylor coefficients at that point are easy to find. It is natural to use this method with x being the knots of interpolation, excluding the rightmost one; the coefficients obtained are valid from that knot to the next one. Here is an example, complete with "fancy" formatted output.
import numpy as np
from scipy.interpolate import UnivariateSpline
spl = UnivariateSpline(np.arange(6), np.array([3, 1, 4, 1, 5, 9]), s=0)
kn = spl.get_knots()
for i in range(len(kn)-1):
cf = [1, 1, 1/2, 1/6] * spl.derivatives(kn[i])
print("For {0} <= x <= {1}, p(x) = {5}*(x-{0})^3 + {4}*(x-{0})^2 + {3}*(x-{0}) + {2}".format(kn[i], kn[i+1], *cf))
The knots are 0, 2, 3, 5 in this example. The output is:
For 0.0 <= x <= 2.0, p(x) = -3.1222222222222222*(x-0.0)^3 + 11.866666666666667*(x-0.0)^2 + -10.744444444444445*(x-0.0) + 3.000000000000001
For 2.0 <= x <= 3.0, p(x) = 4.611111111111111*(x-2.0)^3 + -6.866666666666667*(x-2.0)^2 + -0.7444444444444436*(x-2.0) + 4.000000000000001
For 3.0 <= x <= 5.0, p(x) = -2.322222222222221*(x-3.0)^3 + 6.966666666666665*(x-3.0)^2 + -0.6444444444444457*(x-3.0) + 1.0000000000000016
Note that for each piece, cf holds the coefficients starting with the lowest degree, so the order is reversed when formatting the string.
(Of course, you'd probably want to do something else with these coefficients)
To check that the formulas are correct, I copy-pasted them for plotting:

find the line which best fits to the data

I'm trying to find the line which best fits to the data. I use the following code below but now I want to have the data placed into an array sorted so it has the data which is closest to the line first how can I do this? Also is polyfit the correct function to use for this?
x=[1,2,2.5,4,5];
y=[1,-1,-.9,-2,1.5];
n=1;
p = polyfit(x,y,n)
f = polyval(p,x);
plot(x,y,'o',x,f,'-')
PS: I'm using Octave 4.0 which is similar to Matlab
You can first compute the error between the real value y and the predicted value f
err = abs(y-f);
Then sort the error vector
[val, idx] = sort(err);
And use the sorted indexes to have your y values sorted
y2 = y(idx);
Now y2 has the same values as y but the ones closer to the fitting value first.
Do the same for x to compute x2 so you have a correspondence between x2 and y2
x2 = x(idx);
Sembei Norimaki did a good job of explaining your primary question, so I will look at your secondary question = is polyfit the right function?
The best fit line is defined as the line that has a mean error of zero.
If it must be a "line" we could use polyfit, which will fit a polynomial. Of course, a "line" can be defined as first degree polynomial, but first degree polynomials have some properties that make it easy to deal with. The first order polynomial (or linear) equation you are looking for should come in this form:
y = mx + b
where y is your dependent variable and X is your independent variable. So the challenge is this: find the m and b such that the modeled y is as close to the actual y as possible. As it turns out, the error associated with a linear fit is convex, meaning it has one minimum value. In order to calculate this minimum value, it is simplest to combine the bias and the x vectors as follows:
Xcombined = [x.' ones(length(x),1)];
then utilized the normal equation, derived from the minimization of error
beta = inv(Xcombined.'*Xcombined)*(Xcombined.')*(y.')
great, now our line is defined as Y = Xcombined*beta. to draw a line, simply sample from some range of x and add the b term
Xplot = [[0:.1:5].' ones(length([0:.1:5].'),1)];
Yplot = Xplot*beta;
plot(Xplot, Yplot);
So why does polyfit work so poorly? well, I cant say for sure, but my hypothesis is that you need to transpose your x and y matrixies. I would guess that that would give you a much more reasonable line.
x = x.';
y = y.';
then try
p = polyfit(x,y,n)
I hope this helps. A wise man once told me (and as I learn every day), don't trust an algorithm you do not understand!
Here's some test code that may help someone else dealing with linear regression and least squares
%https://youtu.be/m8FDX1nALSE matlab code
%https://youtu.be/1C3olrs1CUw good video to work out by hand if you want to test
function [a0 a1] = rtlinreg(x,y)
x=x(:);
y=y(:);
n=length(x);
a1 = (n*sum(x.*y) - sum(x)*sum(y))/(n*sum(x.^2) - (sum(x))^2); %a1 this is the slope of linear model
a0 = mean(y) - a1*mean(x); %a0 is the y-intercept
end
x=[65,65,62,67,69,65,61,67]'
y=[105,125,110,120,140,135,95,130]'
[a0 a1] = rtlinreg(x,y); %a1 is the slope of linear model, a0 is the y-intercept
x_model =min(x):.001:max(x);
y_model = a0 + a1.*x_model; %y=-186.47 +4.70x
plot(x,y,'x',x_model,y_model)

Exponential curve fitting without the Curve Fitting toolbox?

I have some data points to which I need to fit an exponential curve of the form
y = B * exp(A/x)
(without the help of Curve Fitting Toolbox).
What I have tried so far to linearize the model by applying log, which results in
log(y/B) = A/x
log(y) = A/x + log(B)
I can then write it in the form
Y = AX + B
Now, if I neglect B, then I am able to solve it with
A = pseudoinverse (X) * Y
but I am stuck with values of B...
Fitting a curve of the form
y = b * exp(a / x)
to some data points (xi, yi) in the least-squares sense is difficult. You cannot use linear least-squares for that, because the model parameters (a and b) do not appear in an affine manner in the equation. Unless you're ready to use some nonlinear-least-squares method, an alternative approach is to modify the optimization problem so that the modified problem can be solved using linear least squares (this process is sometimes called "data linearization"). Let's do that.
Under the assumption that b and the yi's be positive, you can apply the natural logarithm to both sides of the equations:
log(y) = log(b) + a / x
or
a / x + log(b) = log(y)
By introducing a new parameter b2, defined as log(b), it becomes evident that parameters a and b2 appear in a linear (affine, really) manner in the new equation:
a / x + b2 = log(y)
Therefore, you can compute the optimal values of those parameters using least squares; all you have left to do is construct the right linear system and then solve it using MATLAB's backslash operator:
A = [1 ./ x, ones(size(x))];
B = log(y);
params_ls = A \ B;
(I'm assuming x and y are column vectors, here.)
Then, the optimal values (in the least-squares sense) for the modified problem are given by:
a_ls = params_ls(1);
b_ls = exp(params_ls(2));
Although those values are not, in general, optimal for the original problem, they are often "good enough" in practice. If needed, you can always use them as initial guesses for some iterative nonlinear-least-squares method.
Doing the log transform then using linear regression should do it. Wikipedia has a nice section on how to do this:
http://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29#The_general_problem
%MATLAB code for finding the best fit line using least squares method
x=input('enter a') %input in the form of matrix, rows contain points
a=[1,x(1,1);1,x(2,1);1,x(3,1)] %forming A of Ax=b
b=[x(1,2);x(2,2);x(3,2)] %forming b of Ax=b
yy=inv(transpose(a)*a)*transpose(a)*b %computing projection of matrix A on b, giving x
%plotting the best fit line
xx=linspace(1,10,50);
y=yy(1)+yy(2)*xx;
plot(xx,y)
%plotting the points(data) for which we found the best fit line
hold on
plot(x(2,1),x(2,2),'x')
hold on
plot(x(1,1),x(1,2),'x')
hold on
plot(x(3,1),x(3,2),'x')
hold off
I'm sure the code can be cleaned up, but that's the gist of it.