Why does the y-intercept differ between Excel's LINEST function and NumPy's np.linalg.lstsq?

I'm trying to replicate an Excel model in Python, and I've used the Excel LINEST formula to find the y-intercept. The formula is INDEX(LINEST(known y's, known x's, TRUE), 2).
Known y's [Highs]= 79.375, 89.5625, 91.5, 75.125, 72.6875, 70.5, 72.625, 70, 68, 67.1875, 68.625, 65.1875
Known x's = 1, 2, 3,.. 11, 12
The code below tries to replicate the y-intercept, but I'm getting a different output.
import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
drange = 12
A = np.vstack([ np.arange(drange), np.ones(len(x))]).T
df['intercept'] = df['High'].rolling(drange).apply(lambda y: np.linalg.lstsq(A, y, rcond=None)[0][1], raw=False)
In Excel I get 86.82996 for the intercept; in Python I get 84.89503. I'm trying to get the same result as Excel in Python.
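One likely cause, offered as a sketch rather than a confirmed diagnosis: np.arange(drange) produces 0..11, while Excel's LINEST is handed x = 1..12, so the fitted intercept is shifted by one slope unit (the slope here is roughly -1.9, which is roughly the gap between the two results). Building the design matrix from the same x values Excel sees should reconcile them:
import numpy as np
highs = np.array([79.375, 89.5625, 91.5, 75.125, 72.6875, 70.5,
                  72.625, 70, 68, 67.1875, 68.625, 65.1875])
x = np.arange(1, 13)  # 1..12, the same known x's Excel sees
# Design matrix [x, 1], so lstsq returns (slope, intercept)
A = np.vstack([x, np.ones(len(x))]).T
slope, intercept = np.linalg.lstsq(A, highs, rcond=None)[0]
print(slope, intercept)  # the intercept should now agree with Excel's LINEST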

Related

MATLAB giving a wrong result for a mathematical equation

I am running a script in MATLAB R2020b. The script contains an array with the following values:
a=[500, 500, 500, 1000, 750, 750, 567.79 613.04]
The script has an equation:
(a(1)*(a(8)-a(6)) + a(7)*(a(6)-a(2))+ a(5)*(a(2)-a(8)))
When run in MATLAB, the above equation gives the answer -11312 for the values of array a.
But when I calculate each term separately and add them, MATLAB gives a different answer.
a(1)*(a(8)-a(6)) = -68480
a(7)*(a(6)-a(2)) = 1.419e+05
a(5)*(a(2)-a(8)) = -84780
>>(-68480) + (1.419e+05) +(-84780)
the answer for the above is -11310.
Kindly tell me why MATLAB gives these different answers.
The problem is that MATLAB's default format is 'short', and this is not showing you complete precision. Try format long.
>> format long
>> a(7)*(a(6)-a(2))
ans =
1.419475000000000e+05
The two results are not actually different; you are just not seeing all the digits.
If you use format long g you can see the actual values:
format long g
a=[500, 500, 500, 1000, 750, 750, 567.79 613.04]
res1=(a(1)*(a(8)-a(6)) + a(7)*(a(6)-a(2))+ a(5)*(a(2)-a(8)))
a2=a(7)*(a(6)-a(2))
a1=a(1)*(a(8)-a(6))
a3=a(5)*(a(2)-a(8))
res2=a1+a2+a3
results in:
res1 =
-11312.5
a2 =
141947.5
a1 =
-68480
a3 =
-84780
res2 =
-11312.5
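The same display-versus-storage distinction exists in any floating-point environment. As a quick cross-check, here is the identical arithmetic in Python, which prints full precision by default (the comments map the 0-based indices back to MATLAB's 1-based ones):
a = [500, 500, 500, 1000, 750, 750, 567.79, 613.04]
a1 = a[0] * (a[7] - a[5])   # a(1)*(a(8)-a(6))
a2 = a[6] * (a[5] - a[1])   # a(7)*(a(6)-a(2))
a3 = a[4] * (a[1] - a[7])   # a(5)*(a(2)-a(8))
print(a1, a2, a3, a1 + a2 + a3)  # approx. -68480.0, 141947.5, -84780.0, sum -11312.5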

Applying scipy.stats.gaussian_kde to 3D point cloud

I have a set of about 33K (x, y, z) points in a CSV file and would like to convert this to a grid of density values using scipy.stats.gaussian_kde. I have not been able to find a way to convert this point-cloud array into an appropriate input format for the gaussian_kde function (and then take its output and convert it into a density-value grid). Can anyone provide sample code?
Here's an example with some comments which may be of use. gaussian_kde wants the data and points to be row-stacked, i.e. shape (# ndim, # num values), as per the docs. In your case you would row_stack([x, y, z]) such that the shape is (3, 33000).
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt
# simulate some data
n = 33000
x = np.random.randn(n)
y = np.random.randn(n) * 2
# data must be stacked as (# ndim, # n values) as per docs.
data = np.row_stack((x, y))
# perform KDE
kernel = gaussian_kde(data)
# create grid over which to evaluate KDE
s = np.linspace(-8, 8, 128)
grid = np.meshgrid(s, s)
# again KDE needs points to be row_stacked
grid_points = np.row_stack([g.ravel() for g in grid])
# evaluate KDE and reshape result correctly
Z = kernel(grid_points)
Z = Z.reshape(grid[0].shape)
# plot KDE as image and overlay some data points
fig, ax = plt.subplots()
ax.matshow(Z, extent=(s.min(), s.max(), s.min(), s.max()))
ax.plot(x[::10], y[::10], 'w.', ms=1, alpha=0.3)
ax.set_xlim(s.min(), s.max())
ax.set_ylim(s.min(), s.max())
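For the 3-D point cloud in the question, the pattern is identical; here is a sketch (the file name points.csv and its headerless three-column layout are assumptions):
import numpy as np
from scipy.stats import gaussian_kde
# Load the point cloud; assumes a headerless CSV with columns x, y, z
pts = np.loadtxt('points.csv', delimiter=',')   # shape (33000, 3)
kernel = gaussian_kde(pts.T)                    # gaussian_kde wants (ndim, n)
# Evaluate the KDE on a 32x32x32 grid spanning the data
axes = [np.linspace(pts[:, i].min(), pts[:, i].max(), 32) for i in range(3)]
X, Y, Z = np.meshgrid(*axes, indexing='ij')
grid_points = np.vstack([X.ravel(), Y.ravel(), Z.ravel()])
density = kernel(grid_points).reshape(X.shape)  # (32, 32, 32) grid of density values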

Problem with scipy.quad

I have been trying to work out my project using Python code which needs an integration inside a summation (sigma).
I get the following error and, despite trying several approaches, I have not been able to solve it. Below is a shorter version of my code that reproduces the error.
The code runs without any problem if the lower limit of the integral is zero or positive; if it is negative, the code gives this error:
File "C:\Users\AppData\Local\Programs\Python\Python37\lib\site-packages\scipy\integrate\quadpack.py", line 341, in quad
points)
File "C:\Users\AppData\Local\Programs\Python\Python37\lib\site-packages\scipy\integrate\quadpack.py", line 448, in _quad
return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
TypeError: must be real number, not mpc
import numpy as np
from scipy.integrate import quad
from mpmath import besselk, besseli, nsum, inf, exp, log, cos, mp
mp.dps = 3; mp.pretty = True
tt = (np.logspace(0.0001, 10, num=10)).round(2)
length = len(tt)
k0 = lambda u: besselk(0,u)
f = lambda u: u*exp(-2)
Zwn = lambda n: 0.5*(cos(n)*cos(2*n))
Rn = lambda u, n, xD: (1/u)*k0(xD*((f(u) + (n)**2)**0.5))
Lap_Func = lambda u: nsum(lambda n: ((quad(lambda xD: Zwn(n)*Rn(u, n, xD), -10, 10))[0]), [1, 100])
print(Lap_Func((log(2))*1/tt[3]))
quad only deals with floats and does not understand mpmath objects. Either drop mpmath and use NumPy/SciPy functions directly, or convert the mpmath expressions to floats at the end of the computations.
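A sketch of the first option, swapping the mpmath calls inside the integrand for their SciPy equivalents so quad only ever sees floats. The sample values for u and n, and the use of abs(xD) to keep k0's argument real, are assumptions about the intended model (mpmath's besselk returns a complex mpc value for negative arguments, which is exactly what broke quad):
import numpy as np
from scipy.integrate import quad
from scipy.special import k0  # real-valued modified Bessel function K0
f = lambda u: u * np.exp(-2)
Zwn = lambda n: 0.5 * (np.cos(n) * np.cos(2 * n))
# abs(xD) keeps the Bessel argument real on the negative half of the interval
Rn = lambda u, n, xD: (1 / u) * k0(abs(xD) * np.sqrt(f(u) + n ** 2))
u, n = 0.5, 1  # sample values for illustration only
# k0 diverges (integrably) at xD = 0, so point quad at the singularity
val, err = quad(lambda xD: Zwn(n) * Rn(u, n, xD), -10, 10, points=[0])
print(val)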

How to use scipy signal for MIMO systems

I am looking for a way to simulate the output of a system for various input signals. To be more precise, I have a system defined by its transfer function H that takes one input and has one output. I generated several signals (stored in a NumPy array). What I would like to do is get the response of the system to each input signal without using a for loop. Is there a way to proceed? Below is the code I wrote so far.
from __future__ import division
import numpy as np
from scipy import signal
nbr_inputs = 5
t_in = np.arange(0,10,0.2)
dim = (nbr_inputs, len(t_in))
x = np.cumsum(np.random.normal(0,2e-3, dim), axis=1)
H = signal.TransferFunction([1, 3, 3], [1, 2, 1])
t_out, y, _ = signal.lsim(H, x[0], t_in) # here, I would like to simply pass x
Thanks for your help.
This is not a MIMO system; it is a SISO system, but you have multiple inputs.
You can create a MIMO system and apply your inputs all at once; they will be computed channel by channel, but simultaneously. However, you can't use scipy.signal.lsim for MIMO systems yet. You can use other options such as python-control (if you have the slycot extension, otherwise again no MIMO) or harold if you have Python 3.6 or greater (disclaimer: I'm the author).
import numpy as np
from harold import *
import matplotlib.pyplot as plt
nbr_inputs = 5
t_in = np.arange(0,10,0.2)
dim = (nbr_inputs, len(t_in))
x = np.cumsum(np.random.normal(0,2e-3, dim), axis=1)
# Forming a 1x5 system, common denominator will be completed automatically
H = Transfer([[[1, 3, 3]]*nbr_inputs], [1, 2, 1])
The keyword per_channel=True applies the first input to the first channel, the second input to the second, and so on. Otherwise the combined response is returned. You can check the shapes by playing around with it to see what I mean.
# Notice it is x.T below -> input shape = <num samples>, <num inputs>
y, t = simulate_linear_system(H, x.T, t_in, per_channel=True)
plt.plot(t, y)
This gives a plot of the five channel responses.
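For comparison, here is a plain-SciPy fallback that simply loops over the inputs, one lsim call per signal. It is the loop the question hoped to avoid, but it is useful as a correctness check against the harold result:
import numpy as np
from scipy import signal
H = signal.TransferFunction([1, 3, 3], [1, 2, 1])
t_in = np.arange(0, 10, 0.2)
x = np.cumsum(np.random.normal(0, 2e-3, (5, len(t_in))), axis=1)
# lsim returns (t, y, x_state); keep y and stack as (num inputs, num samples)
y_all = np.vstack([signal.lsim(H, xi, t_in)[1] for xi in x])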

Why is the output different for code ported from MATLAB to Python?

EDIT: After some more testing and a response from the scipy mailing list, the issue appears to be with fspecial(). To get the same output I need to generate the same kind of kernel in Python as the MATLAB fspecial command produces. For now I will try to export the kernel from MATLAB and work from there. Added as an edit since the question has been "closed".
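For reference, here is a sketch of how the two fspecial kernels used below might be reproduced in Python. The formulas follow fspecial's documented behaviour (a normalized box for 'average', a sampled and renormalized Gaussian for 'gaussian'), but treat them as an approximation to verify against MATLAB's actual output rather than a guaranteed match:
import numpy as np
def fspecial_average(size=3):
    # fspecial('average', size): normalized box filter
    return np.ones((size, size)) / float(size * size)
def fspecial_gaussian(shape=(10, 10), sigma=2.5):
    # fspecial('gaussian', shape, sigma): sampled Gaussian, renormalized to sum 1
    m, n = [(s - 1.0) / 2.0 for s in shape]
    y, x = np.ogrid[-m:m + 1, -n:n + 1]
    h = np.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
    return h / h.sum()
These kernels would then be applied with scipy.ndimage.correlate using mode='nearest' to mimic imfilter's default correlation and 'replicate' padding.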
I am trying to port the following MATLAB code to Python. It seems to work, but the output is different from MATLAB's. I think the problem is with applying a "mean" filter to the log amplitude. Any help appreciated.
The MATLAB code is from: http://www.klab.caltech.edu/~xhou/projects/spectralResidual/spectralresidual.html
%% Read image from file
inImg = im2double(rgb2gray(imread('1.jpg')));
inImg = imresize(inImg, 64/size(inImg, 2));
%% Spectral Residual
myFFT = fft2(inImg);
myLogAmplitude = log(abs(myFFT));
myPhase = angle(myFFT);
mySpectralResidual = myLogAmplitude - imfilter(myLogAmplitude, fspecial('average', 3), 'replicate');
saliencyMap = abs(ifft2(exp(mySpectralResidual + i*myPhase))).^2;
%% After Effect
saliencyMap = mat2gray(imfilter(saliencyMap, fspecial('gaussian', [10, 10], 2.5)));
imshow(saliencyMap);
Here is my attempt in Python:
import numpy as np
from skimage import img_as_float
from skimage.io import imread
from skimage.color import rgb2gray
from scipy import fftpack, ndimage, misc
from scipy.ndimage import uniform_filter
import matplotlib.pyplot as plt
# Read image from file
image = img_as_float(rgb2gray(imread('1.jpg')))
image = misc.imresize(image, 64.0 / image.shape[1])  # size(inImg, 2) is the width, i.e. shape[1]; note imresize returns uint8 by default
# Spectral Residual
fft = fftpack.fft2(image)
logAmplitude = np.log(np.abs(fft))
phase = np.angle(fft)
avgLogAmp = uniform_filter(logAmplitude, size=3, mode="nearest")  # is this the same as applying a "mean" filter?
spectralResidual = logAmplitude - avgLogAmp
saliencyMap = np.abs(fftpack.ifft2(np.exp(spectralResidual + 1j * phase))) ** 2
# After Effect
saliencyMap = ndimage.gaussian_filter(saliencyMap, sigma=2.5)
plt.imshow(saliencyMap)
plt.show()
For completeness, the input image and the outputs from MATLAB and Python were attached to the original question.
I doubt anyone will be able to give you a firm answer on this. It could be any number of things: one FFT could be zero-centered while the other isn't, there could be a float vs. double mismatch somewhere, a mishandled absolute value, a filter setting, ...
If I were you, I'd write out some intermediate values for both computations and find a way to compare them. Start in the middle: if they compare well, move downstream; if they don't, move upstream. Maybe write an intermediate value from the Python script out to a file, import it into MATLAB, take the element-wise difference, and graph it. If they're not the same dimensions, that's clue #1.