Student CDF and normal PPF precision - scipy

Is there a way to increase the precision of the scipy.stats functions norm.ppf and t.cdf? Because if I do this it works:
from scipy.stats import norm, t
norm.ppf(t.cdf(9, 140))
but if I do the following, I get 1.0 from the cdf and therefore inf from the ppf:
norm.ppf(t.cdf(10, 140))
Thanks in advance
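One possible workaround (a sketch, not from the original thread): stay in the upper tail, where the tiny probability is still representable, using the survival function and its inverse.
from scipy.stats import norm, t

# t.cdf(10, 140) is so close to 1 that it rounds to exactly 1.0 in double
# precision, so norm.ppf(1.0) returns inf. The survival function
# sf = 1 - cdf keeps the tiny upper-tail probability representable, and
# norm.isf is the matching inverse.
print(norm.isf(t.sf(10, 140)))  # a finite value instead of inf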

Related

How to find peak rise and decay times

I have been doing some signal processing and I am new to it. I am using scipy.signal to do the calculations.
I am able to find the peak height and width, but I was wondering if I can also find the peak rise time and decay time. That would be the distance from the left width point to the tallest peak point, and then from the tallest peak point to the right width point.
So far I have this, which is from the tutorial:
import numpy as np
import matplotlib.pyplot as plt
from scipy.misc import electrocardiogram
from scipy.signal import find_peaks, peak_widths

x = electrocardiogram()[2000:4000]
peaks, _ = find_peaks(x, height=0)
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.plot(np.zeros_like(x), "--", color="gray")
plt.show()
results_full = peak_widths(x, peaks, rel_height=1)
I think I am looking for the first moment or derivative
This depends on the type of signal. For this signal in particular, an approach that worked is to find all peaks and then filter them with a prominence threshold, defined as the midpoint of the prominence range.
Once I have the peaks of interest, I use the positions of the previous and next peaks as the left and right points.
import numpy as np
import matplotlib.pyplot as plt
from scipy.misc import electrocardiogram
from scipy.signal import find_peaks, peak_prominences
x = electrocardiogram()[2000:3500]
#b, a = butter(4, 0.001, 'high')
#x = lfilter(b, a, x)
peaks, _ = find_peaks(x)
prominences, _, _ = peak_prominences(x, peaks)
selected = prominences > 0.5 * (np.min(prominences) + np.max(prominences))
left = peaks[:-1][selected[1:]]
right = peaks[1:][selected[:-1]]
top = peaks[selected]
plt.figure(figsize=(14, 4))
plt.plot(x)
plt.plot(top, x[top], "x")
plt.plot(left, x[left], ".", markersize=20)
plt.plot(right, x[right], ".", markersize=20)
plt.show()
If you want to use a height threshold, it helps to first remove frequencies lower than the signal's frequency:
from scipy.signal import butter, lfilter
x = electrocardiogram()
plt.figure(figsize=(14, 4))
b, a = butter(4, 0.01, 'high')
plt.plot(x[2000:10000])
x = lfilter(b, a, x)
plt.plot(x[2000:10000])
plt.legend(['original', 'highpass filtered'])
About coding style: if you are coming from MATLAB you may be used to having everything in the global scope, but I always say that modules are your friends :). I would simply import scipy.signal instead of importing its member functions as global names. You can use an alias for a module, like import matplotlib.pyplot as plt, and you can look up which alias is commonly used for each module. This is more for programmer interoperability than mandatory, which is why I wrote the code in your style.
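As a tiny sketch of that style (reusing the x from the code above):
import scipy.signal

# the same calls, referenced through the module namespace
peaks, _ = scipy.signal.find_peaks(x)
prominences, _, _ = scipy.signal.peak_prominences(x, peaks)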
The derivatives
You can use rise = (x[top] - x[left]) / (top - left) and fall = (x[top] - x[right]) / (top - right). These are not the actual values of the derivatives, but they are related features.
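A minimal sketch of those two quantities (assuming the left, right, and top arrays from the code above line up one-to-one):
# top, left, right are sample indices into x, so the index differences
# stand in for time; these are slopes in amplitude per sample.
rise = (x[top] - x[left]) / (top - left)    # positive: climb towards the peak
fall = (x[top] - x[right]) / (top - right)  # negative: decay after the peak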

scipy.special yields fluctuating result for confluent hypergeometric function

The scipy implementation of the confluent hypergeometric function gives me wrong results. Here is a minimal example:
import matplotlib.pyplot as plt
import numpy as np
from scipy import special
x=np.arange(0,1,.001)
f=special.hyp1f1(30,60,-1/x)
plt.scatter(x,f,s=.05)
When I run it, it produces the following plot:
[plot: output of scipy.special.hyp1f1, showing the fluctuations]
I wonder if there is a way to fix these fluctuations, which are definitely not correct. In fact, the function should be strictly positive in that range.
Starting from the explanation at scipy.special.hyp1f1, here is an attempt to approximate the function with a polynomial.
Apparently, hyp1f1(-1/x) works nicely between x=0 and about x=0.2. Note that at exactly x=0 the function isn't properly defined. The approximation with a 5th-degree polynomial is much too large for x<0.4. With an 80th-degree polynomial, the approximation seems correct for x>0.025 but quickly goes out of bounds for smaller x. (With more than 90 terms, the polynomial can't be calculated in this way anymore.)
Probably the best solution would be to use a high-degree polynomial for x>=0.1 and the original hyp1f1 where x is smaller.
import matplotlib.pyplot as plt
import numpy as np
from scipy import special
x = np.linspace(0.001, 1, 1000)
f = special.hyp1f1(30, 60, -1 / x)
plt.scatter(x, f, s=1, color='r', label='hyp1f1')
for terms in range(80, 1, -10):
    k10 = np.arange(terms)
    c10 = special.poch(30, k10) / (special.poch(60, k10) * special.factorial(k10))
    poly10 = np.poly1d(c10[::-1])
    plt.scatter(x, poly10(-1 / x), s=1, label=f'{terms} terms', color=plt.cm.Set1(terms / 80))
plt.ylim(-3.5, 3.7)
plt.legend(scatterpoints=10, ncol=3)
plt.show()
[plot: zoomed-in view of the same comparison]
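A minimal sketch of the hybrid approach suggested above, reusing the 80-term series polynomial (the 0.1 cutoff is the assumption from the previous paragraph):
import numpy as np
from scipy import special

# 80-term series polynomial for hyp1f1(30, 60, z), evaluated at z = -1/x
k = np.arange(80)
c = special.poch(30, k) / (special.poch(60, k) * special.factorial(k))
poly = np.poly1d(c[::-1])

def hyp1f1_hybrid(x):
    # hyp1f1 is stable for small x, the truncated series for larger x;
    # np.where evaluates both branches, so warnings from the unused
    # branch can be ignored or suppressed
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.1, special.hyp1f1(30, 60, -1 / x), poly(-1 / x))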

Curve fitting of sine function in python using scipy is not yielding desired output

I'm trying to fit a sine function to my data. No errors are shown, but it doesn't seem to work.
from scipy.optimize import curve_fit as cf
import numpy as np
import matplotlib.pyplot as plt

# xdata and ydata hold my measured data
def sin_fun(x, a, b):
    return a * np.sin(b * x)

p_opt, p_cov = cf(sin_fun, xdata, ydata)
print(p_opt)
plt.plot(xdata, sin_fun(xdata, *p_opt))
plt.scatter(xdata, ydata)
plt.show()
This is the output I am getting:
I have simulated your data. There are two problems with your code as to why it isn't doing what you want. The first is that sin_fun needs a y-offset parameter, otherwise the function will always be symmetrical about y = 0. The second is that the fit works better if you provide curve_fit with a reasonable initial guess, which is done using the p0 argument. Have a look here:
from scipy.optimize import curve_fit as cf
import numpy as np
from matplotlib import pyplot as plt
# simulate your data
xdata = np.linspace(0, 25000, 256)
ydata = 15000 * np.sin(xdata/2000) + 22000
# add some noise
ydata += np.random.rand(xdata.size) * 2000
# sin function needs a y-offset -> c
def sin_fun(x, a, b, c):
    return a * np.sin(b * x) + c
# need a reasonable guess -> note that the guess is not quite right but curve_fit still works
p_opt,p_cov=cf(sin_fun,xdata,ydata, p0=(10000, 1/2500, 15000))
print(p_opt)
plt.plot(xdata,sin_fun(xdata,*p_opt))
plt.plot(xdata,ydata, 'r.', ms=1)
plt.show()
With these fixes you can get a good fit. You could also add a phase parameter to your function to help fit other sinusoids.
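A hedged sketch of what that might look like (the phase parameter d is an illustrative name):
# same model with an added phase term, in radians
def sin_fun(x, a, b, c, d):
    return a * np.sin(b * x + d) + c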

Minimal p-value for scipy.stats.pearsonr

I am running scipy.stats.pearsonr on my data, and I get
(0.9672434106763087, 0.0)
It is reasonable that the r-value is high and the p-value is very low.
However, p is obviously not exactly 0, so I would like to know what p=0.0 means. Is it p<10^-10, p<10^-100, or what is the limit?
As pointed out by @MB-F in the comments, it is calculated analytically.
In the code for version 0.19.1, you can isolate that part of the calculation and plot the p-value as a function of r:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import betainc

r = np.linspace(-1, 1, 1000) * (1 - 1e-10)
for n in [10, 100, 1000]:
    df = n - 2
    t_squared = r**2 * (df / ((1.0 - r) * (1.0 + r)))
    prob = betainc(0.5*df, 0.5, df/(df+t_squared))
    plt.semilogy(r, prob, label=f'n={n}')
plt.axvline(0.9672434106763087, ls='--', color='black', label='r value')
plt.legend()
plt.grid()
The current stable version 1.9.3 uses a different formula:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import btdtr

r = np.linspace(-1, 1, 1000) * (1 - 1e-10)
for n in [10, 100, 1000]:
    ab = 0.5*n
    prob = btdtr(ab, ab, 0.5*(1-abs(r)))
    plt.semilogy(r, prob, label=f'n={n}')
plt.axvline(0.9672434106763087, ls='--', color='black', label='r value')
plt.legend()
plt.grid()
but it yields the same results.
You can see that with 1000 points and your correlation, the p-value is smaller than the smallest representable positive float, so it is reported as 0.0.
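As a quick sketch of that claim (n=1000 is an assumption, since the question doesn't state the sample size), evaluate the 1.9.3 formula above at your r and compare with the smallest positive doubles:
import numpy as np
from scipy.special import btdtr

r_obs = 0.9672434106763087
n = 1000                       # assumed sample size
ab = 0.5 * n
p = btdtr(ab, ab, 0.5 * (1 - abs(r_obs)))
print(p)                       # 0.0: the analytic value underflows
print(np.finfo(float).tiny)    # ~2.2e-308, smallest normal double
print(np.nextafter(0.0, 1.0))  # ~4.9e-324, smallest subnormal double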
The beta distribution
SciPy provides a collection of probability distributions, among them the beta distribution.
The line
prob = btdtr(ab, ab, 0.5*(1-abs(r)))
could be replaced by
from scipy.stats import beta
prob = beta(ab, ab).cdf(0.5*(1-abs(r)))
Through the distribution object you can get much more information about it.

Keep the scaling while drawing a weighted networkx graph

When I draw a weighted networkx graph, the edge lengths do not really represent the weights as distances. I was curious whether there is a parameter I am missing or some other problem.
So, I started by making a simulated dataset as follows:
from numpy import vstack, array
from numpy.random import rand
from scipy.spatial.distance import pdist
import networkx as nx
import matplotlib.pyplot as plt

# data generation
data = vstack((rand(5,2) + array([12,12]), rand(5,2)))
a = pdist(data, 'euclidean')

def givexy(index1D, VectorLength):
    # map a flat index to a pair of node indices (integer division)
    return [index1D % VectorLength, index1D // VectorLength]

plt.plot(data[:,0], data[:,1], 'o')
plt.show()
Then I calculate the Euclidean distance among all pairs and use it as the edge weight:
G = nx.empty_graph(1)
for cnt, item in enumerate(a):
    print(cnt)
    G.add_edge(givexy(cnt, 10)[0], givexy(cnt, 10)[1], weight=item, length=0)
pos = nx.spring_layout(G)
nx.draw_networkx(G, pos)
edge_labels = dict([((u, v), "%.2f" % d['weight'])
                    for u, v, d in G.edges(data=True)])
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
#~ nx.draw(G, pos, edge_labels=edge_labels)
plt.show()
exit()
You might get a different plot - the spring layout is randomized. My main problem is the distance between nodes: for example, the distance between nodes 4 and 8 is 0.82, but it looks longer than the distance between nodes 7 and 0.
Any hint?
Thank you.
The spring layout doesn't explicitly use the weights as distances. Higher weight edges produce shorter edges in general.
Though if you want to specify the positions explicitly you can do that:
from numpy import vstack, array
from numpy.random import rand
from scipy.spatial.distance import pdist
import networkx as nx
import matplotlib.pyplot as plt

# data generation
data = vstack((rand(5,2) + array([12,12]), rand(5,2)))
a = pdist(data, 'euclidean')

def givexy(index1D, VectorLength):
    return [index1D % VectorLength, index1D // VectorLength]

plt.plot(data[:,0], data[:,1], 'o')
G = nx.Graph()
for cnt, item in enumerate(a):
    print(cnt)
    G.add_edge(givexy(cnt, 10)[0], givexy(cnt, 10)[1], weight=item, length=0)

# use the original data coordinates as the node positions
pos = {}
for node, row in enumerate(data):
    pos[node] = row
nx.draw_networkx(G, pos)
plt.savefig('drawing.png')
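If you want the layout itself to treat the weights as target distances, a minimal sketch (assuming networkx 2.0 or later) is the Kamada-Kawai layout, which tries to match weighted graph distances to the drawn Euclidean distances:
# kamada_kawai_layout interprets the edge weight as the desired length,
# which fits this graph, where the weight is the Euclidean distance
pos_kk = nx.kamada_kawai_layout(G, weight='weight')
nx.draw_networkx(G, pos_kk)
plt.show()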