Curve fitting of sine function in Python using scipy is not yielding desired output

I'm trying to fit a sine function to my data. No errors are shown, but it doesn't seem to work.
from scipy.optimize import curve_fit as cf
import numpy as np
from matplotlib import pyplot as plt

def sin_fun(x, a, b):
    return a * np.sin(b * x)

p_opt, p_cov = cf(sin_fun, xdata, ydata)
print(p_opt)
plt.plot(xdata, sin_fun(xdata, *p_opt))
plt.scatter(xdata, ydata)
plt.show()
This is the output I am getting:

I have simulated your data. There are two problems with your code as to why it isn't doing what you want. First, your sin_fun needs a y-offset parameter; otherwise the function will always be symmetrical about y = 0. Second, the fit works better if you can provide curve_fit with a reasonable guess, which is done using the p0 argument. Have a look here:
from scipy.optimize import curve_fit as cf
import numpy as np
from matplotlib import pyplot as plt
# simulate your data
xdata = np.linspace(0, 25000, 256)
ydata = 15000 * np.sin(xdata/2000) + 22000
# add some noise
ydata += np.random.rand(xdata.size) * 2000
# sin function needs a y-offset -> c
def sin_fun(x, a, b, c):
    return a * np.sin(b * x) + c
# need a reasonable guess -> note that the guess is not quite right but curve_fit still works
p_opt,p_cov=cf(sin_fun,xdata,ydata, p0=(10000, 1/2500, 15000))
print(p_opt)
plt.plot(xdata,sin_fun(xdata,*p_opt))
plt.plot(xdata,ydata, 'r.', ms=1)
plt.show()
With these fixes you can get a good fit. You could also add a phase parameter to your function to help fit other sinusoids.
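For instance, a generalized version with a phase term might look like this (a sketch building on the answer's code above; the added parameter d and its initial guess of 0 are assumptions, not from the original):
# sin function with amplitude a, angular frequency b, y-offset c, and phase d
def sin_fun_phase(x, a, b, c, d):
    return a * np.sin(b * x + d) + c

p_opt, p_cov = cf(sin_fun_phase, xdata, ydata, p0=(10000, 1/2500, 15000, 0))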

Related

scipy.special yields fluctuating result for confluent hypergeometric function

The scipy implementation of the confluent hypergeometric function gives me wrong results. Here is a minimal example:
import matplotlib.pyplot as plt
import numpy as np
from scipy import special
x=np.arange(0,1,.001)
f=special.hyp1f1(30,60,-1/x)
plt.scatter(x,f,s=.05)
When I run it, it produces the following plot:
[plot: output of scipy.special.hyp1f1]
I wonder if there is a way to fix these fluctuations, which are definitely not correct. In fact, the function should be strictly positive in that range.
Starting from the explanation at scipy.special.hyp1f1, here is an attempt to approximate the function with a polynomial.
Apparently, hyp1f1(-1/x) works nicely between x=0 and about x=0.2. Note that at x exactly 0, the function isn't properly defined. The approximation with a 5th degree polynomial is much too large for x<0.4. With an 80th degree polynomial, the approximation seems correct starting at x>0.025, but quickly gets out of bounds for smaller x. (With more than 90 terms, the polynomial can't be calculated in this way anymore.)
Probably the best solution would be to use a high degree polynomial for x>=0.1 and the original hyp1f1 when x is smaller.
import matplotlib.pyplot as plt
import numpy as np
from scipy import special
x = np.linspace(0.001, 1, 1000)
f = special.hyp1f1(30, 60, -1 / x)
plt.scatter(x, f, s=1, color='r', label='hyp1f1')
for terms in range(80, 1, -10):
    k10 = np.arange(terms)
    c10 = special.poch(30, k10) / (special.poch(60, k10) * special.factorial(k10))
    poly10 = np.poly1d(c10[::-1])
    plt.scatter(x, poly10(-1 / x), s=1, label=f'{terms} terms', color=plt.cm.Set1(terms / 80))
plt.ylim(-3.5, 3.7)
plt.legend(scatterpoints=10, ncol=3)
plt.show()
Zoomed in: [plot omitted]
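Following that suggestion, here is a sketch of the hybrid (the helper name and the 0.1 cutoff are illustrative assumptions; it reuses x, np and special from the snippet above):
def hyp1f1_hybrid(x, terms=80):
    # truncated series coefficients, as in the comparison loop above
    k = np.arange(terms)
    c = special.poch(30, k) / (special.poch(60, k) * special.factorial(k))
    poly = np.poly1d(c[::-1])
    # high-degree polynomial where it is stable, original hyp1f1 for small x
    # (np.where evaluates both branches, so overflow warnings are expected)
    return np.where(x >= 0.1, poly(-1 / x), special.hyp1f1(30, 60, -1 / x))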

Why does the HMC sampler return negative values for hyperparameters that need to be positive? [older GPflow versions before 1.0]

I'd like to build a GP with marginalized hyperparameters.
I have seen that this is possible with the HMC sampler provided in gpflow from this notebook
However, when I tried to run the following code as a first step of this (NOTE this is on gpflow 0.5, an older version), the returned samples are negative, even though the lengthscale and variance need to be positive (negative values would be meaningless).
import numpy as np
from matplotlib import pyplot as plt
import gpflow
from gpflow import hmc
X = np.linspace(-3, 3, 20)
Y = np.random.exponential(np.sin(X) ** 2)
Y = (Y - np.mean(Y)) / np.std(Y)
k = gpflow.kernels.Matern32(1, lengthscales=.2, ARD=False)
m = gpflow.gpr.GPR(X[:, None], Y[:, None], k)
m.kern.lengthscales.prior = gpflow.priors.Gamma(1., 1.)
m.kern.variance.prior = gpflow.priors.Gamma(1., 1.)
# don't want the likelihood variance to be a hyperparameter now, so it is fixed
m.likelihood.variance = 1e-6
m.likelihood.variance.fixed = True
m.optimize(maxiter=1000)
samples = m.sample(500)
print(samples)
Output:
[[-0.43764571 -0.22753325]
[-0.50418501 -0.11070128]
[-0.5932655 0.00821438]
[-0.70217714 0.05077999]
[-0.77745654 0.09362291]
[-0.79404456 0.13649446]
[-0.83989415 0.27118385]
[-0.90355789 0.29589641]
...
I don't know HMC sampling in much detail, but I would expect the sampled posterior hyperparameters to be positive. I've checked the code and it seems this may be related to the Log1pe transform, though I failed to figure it out myself.
Any hint on this?
It would be helpful if you specified which GPflow version you are using - especially given that, from the output you posted, it looks like you are using a really old version of GPflow (pre-1.0), and this is actually something that has been improved since.
What is happening here (in old GPflow) is that the sample() method returns a single S x P array, where S is the number of samples and P is the number of free parameters [e.g. for an M x M matrix parameter with a lower-triangular transform (such as the Cholesky of the covariance of the approximate posterior, q_sqrt), only M * (M + 1)/2 parameters are actually stored and optimised!]. These are the values in the unconstrained space, i.e. they can take any value whatsoever. Transforms (see the gpflow.transforms module) provide the mapping between this value (between plus and minus infinity) and the constrained value (e.g. gpflow.transforms.positive for lengthscales and variances).
In old GPflow, the model provides a get_samples_df() method that takes the S x P array returned by sample() and returns a pandas DataFrame with columns for all the trainable parameters, which would be what you want. Or, ideally, you would just use a recent version of GPflow, in which the HMC sampler directly returns the DataFrame!
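In code, that old-GPflow route would look roughly like this (a sketch; the exact DataFrame column names depend on the model):
# map the unconstrained S x P samples back through the transforms
sample_df = m.get_samples_df(samples)
print(sample_df.head())  # one column per trainable parameter, positive where required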

Convergence when utilizing the scipy.odr module to find best-fit parameters when there are only horizontal error bars

I am trying to fit a piecewise (otherwise linear) function to a set of experimental data. The form of the data is such that there are only horizontal error bars, no vertical ones. I am familiar with the scipy.optimize.curve_fit module, but that works when there are only vertical error bars corresponding to the dependent variable y. After searching for my specific need, I came across the following post, which explains the possibility of using the scipy.odr module when the error bars are those of the independent variable x. (Correct fitting with scipy curve_fit including errors in x?)
Attached is my version of the code, which tries to find the best-fit parameters using the ODR methodology. It actually draws the best-fit function and seems to work. However, after changing the initial (educated guess) values and trying to extract the best-fit parameters, I am getting back the same guessed parameters I inserted initially. This means that the method is not converging, which you can verify by printing output.stopreason and getting
['Numerical error detected']
So, my question is whether this methodology is consistent with my function being piecewise and, if not, whether there is another correct methodology to adopt in such cases.
from numpy import *
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
from scipy.odr import ODR, Model, Data, RealData
x_array=array([8.2,8.6,9.,9.4,9.8,10.2,10.6,11.,11.4,11.8])
x_err_array=array([0.2]*10)
y_array=array([-2.05179545,-1.64998354,-1.49136169,-0.94200805,-0.60205999,0.,0.,0.,0.,0.])
y_err_array=array([0]*10)
# Linear fitting model: linear below the breakpoint beta[0], zero above it
def func(beta, x):
    return piecewise(x, [x < beta[0]], [lambda x: beta[1]*x - beta[1]*beta[0], lambda x: 0.0])
data = RealData(x_array, y_array, x_err_array, y_err_array)
model = Model(func)
odr = ODR(data, model, [10.1,1.02])
odr.set_job(fit_type=0)
output = odr.run()
f, (ax1) = plt.subplots(1, sharex=True, sharey=True, figsize=(10,10))
ax1.errorbar(x_array, y_array, xerr = x_err_array, yerr = y_err_array, ecolor = 'blue', elinewidth = 3, capsize = 3, linestyle = '')
ax1.plot(x_array, func(output.beta, x_array), 'blue', linestyle = 'dotted', label='Best-Fit')
ax1.legend(loc='lower right', ncol=1, fontsize=12)
ax1.set_xlim([7.95, 12.05])
ax1.set_ylim([-2.1, 0.1])
ax1.yaxis.set_major_locator(MaxNLocator(prune='upper'))
ax1.set_ylabel('$y$', fontsize=12)
ax1.set_xlabel('$x$', fontsize=12)
ax1.set_xscale("linear", nonposx='clip')
ax1.set_yscale("linear", nonposy='clip')
ax1.get_xaxis().tick_bottom()
ax1.get_yaxis().tick_left()
f.subplots_adjust(top=0.98,bottom=0.14,left=0.14,right=0.98)
plt.setp([a.get_xticklabels() for a in f.axes[:-1]], visible=True)
plt.show()
An error of 0 for y is causing problems. Make it small but not zero, e.g. 1e-16; doing so, the fit converges. It also converges if you omit y_err_array when defining RealData, but I am not sure what happens internally in that case.
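Applied to the code above, that is a one-line change (1e-16 is just one example of "small but not zero"):
y_err_array = array([1e-16] * 10)  # tiny but nonzero y-errors
data = RealData(x_array, y_array, x_err_array, y_err_array)
# alternatively, omit the y-errors entirely:
# data = RealData(x_array, y_array, sx=x_err_array)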

Why is the output different for code ported from MATLAB to Python?

EDIT: After some more testing and a response from the scipy mailing list, the issue appears to be with fspecial(). To get the same output I need to generate the same kind of kernel in Python as the MATLAB fspecial command is producing. For now I will try to export the kernel from MATLAB and work from there. Added as an edit since the question has been "closed".
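For reference, a MATLAB-style fspecial('gaussian', ...) kernel can be rebuilt by hand. The sketch below follows MATLAB's documented formula (a sampled Gaussian truncated to the requested shape, then normalized); it is an assumption based on the docs, not verified fspecial output:
import numpy as np

def matlab_gaussian_kernel(shape=(10, 10), sigma=2.5):
    # grid of offsets from the kernel centre
    m, n = [(s - 1) / 2.0 for s in shape]
    y, x = np.ogrid[-m:m + 1, -n:n + 1]
    h = np.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
    h /= h.sum()  # normalize so the kernel sums to 1
    return h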
I am trying to port the following MATLAB code to Python. It seems to work, but the output is different from MATLAB. I think the problem is with applying a "mean" filter to the log(amplitude). Any help appreciated.
The MATLAB code is from: http://www.klab.caltech.edu/~xhou/projects/spectralResidual/spectralresidual.html
%% Read image from file
inImg = im2double(rgb2gray(imread('1.jpg')));
inImg = imresize(inImg, 64/size(inImg, 2));
%% Spectral Residual
myFFT = fft2(inImg);
myLogAmplitude = log(abs(myFFT));
myPhase = angle(myFFT);
mySpectralResidual = myLogAmplitude - imfilter(myLogAmplitude, fspecial('average', 3), 'replicate');
saliencyMap = abs(ifft2(exp(mySpectralResidual + i*myPhase))).^2;
%% After Effect
saliencyMap = mat2gray(imfilter(saliencyMap, fspecial('gaussian', [10, 10], 2.5)));
imshow(saliencyMap);
Here is my attempt in python:
from skimage import img_as_float
from skimage.io import imread
from skimage.color import rgb2gray
from scipy import fftpack, ndimage, misc
from scipy.ndimage import uniform_filter
import matplotlib.pyplot as plt
import numpy as np
# Read image from file
image = img_as_float(rgb2gray(imread('1.jpg')))
image = misc.imresize(image, 64.0 / image.shape[0])
# Spectral Residual
fft = fftpack.fft2(image)
logAmplitude = np.log(np.abs(fft))
phase = np.angle(fft)
avgLogAmp = uniform_filter(logAmplitude, size=3, mode="nearest")  # is this the same as applying a "mean" filter?
spectralResidual = logAmplitude - avgLogAmp
saliencyMap = np.abs(fftpack.ifft2(np.exp(spectralResidual + 1j * phase))) ** 2
# After Effect
saliencyMap = ndimage.gaussian_filter(saliencyMap, sigma=2.5)
plt.imshow(saliencyMap)
plt.show()
For completeness, here are the input image and the outputs from MATLAB and Python: [images omitted]
I doubt anyone will be able to give you a firm answer on this. It could be any number of things... Could be that one FFT is 0-centered while the other isn't, could be a float vs double somewhere, could be mishandling of absolute value, could be a filter setting, ...
If I were you, I'd write out some intermediate values for both computations and find a way to compare them. Start in the middle, if they compare well then move down, if they don't compare well then move up. Maybe write an intermediate value from the python script out to a file, import into matlab, take the element-wise difference, and graph. If they're not the same dimensions, that's clue #1.
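For the file round-trip, something like this would do (a sketch; the variable names are placeholders):
from scipy import io as sio

# dump Python intermediates to a .mat file for element-wise comparison in MATLAB
sio.savemat('intermediates.mat', {'logAmp_py': logAmplitude, 'saliency_py': saliencyMap})
# In MATLAB:
#   load('intermediates.mat');
#   max(abs(myLogAmplitude(:) - logAmp_py(:)))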

Fitting transfer function models in Scipy.signal

I am using curve_fit to fit the step response of a first-order dynamic system to estimate the gain and time constant. I use two approaches. The first approach is to fit, in the time domain, the curve generated from the model function.
import numpy as np
from scipy.optimize import curve_fit

# define the first order dynamics in the time domain
def model(t, gain, tau):
    return gain * (1 - np.exp(-t / tau))

# define the time intervals
time_interval = np.linspace(1, 100, 100)
# generate the output using the model with gain=10 and tau=4
output = model(time_interval, 10, 4)
# fit to the output and estimate the parameters - gain and tau
par = curve_fit(model, time_interval, output)
Now checking par reveals an array of [10, 4], which is perfect.
The second approach is to estimate the gain and time constant by fitting to a step response of an LTI system.
The LTI System is defined as a transfer function with numerator and denominator.
from scipy.signal import lti

# define the function as a step response of an LTI system.
# The argument x has no significance here; it is included only
# because curve_fit requires passing "x" data to the function
def model1(x, gain1, tau1):
    return lti(gain1, [tau1, 1]).step()[1]

# generate output using the above model
output1 = model1(0, 10, 4)
par1 = curve_fit(model1, 1, output1)
Now checking par1 reveals an array of [1.00024827, 0.01071004], which is wrong. What is wrong with my second approach here? Is there a more efficient way of estimating the transfer function coefficients from the data with curve_fit?
Thank you
The first three arguments to curve_fit are the function to be fit, the xdata and the ydata. You have passed xdata=1. Instead you should give it the time values associated with output1. One way to do that is to actually use the first argument in the function model1, like you did in model(). For example:
import numpy as np
from scipy.signal import lti
from scipy.optimize import curve_fit
def model1(x, gain1, tau1):
    y = lti(gain1, [tau1, 1]).step(T=x)[1]
    return y
time_interval = np.linspace(1,100,100)
output1 = model1(time_interval, 10, 4)
par1 = curve_fit(model1, time_interval, output1)
I get [10., 4.] for the parameters, as expected.
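Since curve_fit returns a (popt, pcov) tuple, unpacking it makes the estimates explicit:
popt, pcov = curve_fit(model1, time_interval, output1)
print(popt)  # -> [10.  4.]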