How can I draw numbers from a Bernoulli distribution using scipy efficiently?
You can make use of the fact that the Bernoulli distribution is a special case of the Binomial distribution with n=1.
Example:
import numpy as np  # scipy.random was an alias for numpy.random and has since been removed from scipy
p = 0.2
np.random.binomial(1, p)  # returns 1 with probability p, otherwise 0
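If you want a distribution object rather than raw draws, scipy.stats also ships a dedicated Bernoulli distribution; a minimal sketch:
from scipy.stats import bernoulli
samples = bernoulli.rvs(0.2, size=10)  # ten draws, each equal to 1 with probability 0.2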
I have 2x1800-dimensional data. I approximated both features separately with the following distributions:
How can I combine these two distributions to plot them as the surface graph of a multivariate distribution?
For independent random variables the joint distribution is the product of the marginal distributions. Use meshgrid to generate the appropriate indices. Here we assume the marginal distributions are stored in the arrays px and py:
[xidx, yidx] = meshgrid(1:numel(px), 1:numel(py));
pxy = px(xidx) .* py(yidx);  % product of the marginals = joint distribution under independence
surf(xidx, yidx, pxy);
shading('interp');
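The same idea as a sketch in Python/NumPy, assuming px and py are 1-D arrays holding the marginal probabilities (the Dirichlet draws below are just hypothetical placeholders for your fitted marginals):
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older matplotlib
px = np.random.dirichlet(np.ones(50))  # placeholder marginal for feature 1
py = np.random.dirichlet(np.ones(50))  # placeholder marginal for feature 2
xidx, yidx = np.meshgrid(np.arange(px.size), np.arange(py.size))
pxy = px[xidx] * py[yidx]  # product of marginals = joint under independence
ax = plt.figure().add_subplot(111, projection='3d')
ax.plot_surface(xidx, yidx, pxy)
plt.show()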
I want to fit a lognormal distribution in Python. My question is: why should I use scipy.stats.lognorm.fit instead of just doing the following:
from numpy import log
mu = log(data).mean()
sigma = log(data).std()
which gives the MLE of mu and sigma so that the distribution is lognormal(mu, sigma**2)?
Also, once I get mu and sigma, how can I get a scipy object for the distribution lognormal(mu, sigma**2)? The arguments passed to scipy.stats.lognorm are not clear to me.
Thanks
Regarding fitting: you could use scipy.stats.lognorm.fit, you could use scipy.stats.norm.fit applied to log(x), or you could do what you just wrote; you should get pretty much the same result either way.
The only thing I would note is that there are two parameters to fit (mu, sigma), so you have to match two values. Instead of matching mean/stddev, some people might prefer to match the peak, deriving (mu, sigma) from mode/stddev.
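A quick sketch of that equivalence on synthetic data (note that lognorm.fit should be called with floc=0, otherwise scipy also estimates a third location parameter):
import numpy as np
from scipy.stats import lognorm
data = np.random.default_rng(0).lognormal(mean=1.0, sigma=0.5, size=10000)
mu, sigma = np.log(data).mean(), np.log(data).std()  # direct MLE on the log of the data
shape, loc, scale = lognorm.fit(data, floc=0)  # scipy's fit with loc fixed at 0
print(mu, np.log(scale))  # both estimate mu
print(sigma, shape)  # both estimate sigma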
Regarding using lognorm with known mu and sigma (the parameters of the underlying normal distribution): pass sigma as the shape parameter and exp(mu) as the scale. Note that loc merely shifts the distribution along the x-axis and should be left at its default of 0; it does not set the mean.
from scipy.stats import lognorm
import numpy as np
sigma = 0.859455801705594
mu = 0.418749176686875
dist = lognorm(s=sigma, scale=np.exp(mu))  # a lognormal(mu, sigma**2) distribution object
# You can then get the pdf or cdf like this:
import matplotlib.pyplot as plt
x = np.linspace(0, 6, 200)
plt.plot(x, dist.pdf(x))
plt.plot(x, dist.cdf(x))
plt.show()
I want to know how to generate the same normally distributed random numbers in numpy as I do in MATLAB.
As an example, when I do this in MATLAB:
RandStream.setGlobalStream(RandStream('mt19937ar','seed',1));
rand
ans =
0.417022004702574
Now I can reproduce this with numpy:
import numpy as np
np.random.seed(1)
np.random.rand()
0.417022004702574
Which is nice, but when I do the same with the normal distribution I get different numbers.
RandStream.setGlobalStream(RandStream('mt19937ar','seed',1));
randn
ans =
-0.649013765191241
And with numpy
import numpy as np
np.random.seed(1)
np.random.randn()
1.6243453636632417
Both functions say in their documentation that they draw from the standard normal distribution, yet they give me different results. Any idea how I can adjust my Python/numpy code to get the same numbers as MATLAB?
Because someone marked this as a duplicate:
This is about the normal distribution, as I wrote at the beginning and the end.
As I wrote, the uniform distribution works fine; this is about the normal distribution.
None of the answers in the linked thread help with the normal distribution.
My guess would be that MATLAB and numpy use different methods to turn uniform random numbers into normally distributed ones, even when seeded with the same underlying Mersenne Twister stream.
You can avoid this problem by writing a Box-Muller method to generate the random numbers yourself, so both environments apply the identical transformation to the same uniform draws. For Python:
import numpy as np
# Box-Muller normal distribution; note it needs a pair of uniform random numbers as input
def randn_from_rand(rand):
    assert rand.size == 2
    # Use Box-Muller to get normally distributed random numbers
    randn = np.zeros(2)
    randn[0] = np.sqrt(-2. * np.log(rand[0])) * np.cos(2 * np.pi * rand[1])
    randn[1] = np.sqrt(-2. * np.log(rand[0])) * np.sin(2 * np.pi * rand[1])
    return randn
np.random.seed(1)
r = np.random.rand(2)
print(r, randn_from_rand(r))
which gives,
(array([ 0.417022 , 0.72032449]), array([-0.24517852, -1.29966152]))
and for matlab,
% Box-Muller normal distribution; note it needs a pair of uniform random numbers as input
function randn = randn_from_rand(rand)
    % Use Box-Muller to get normally distributed random numbers
    randn(1) = sqrt(-2*log(rand(1))) * cos(2*pi*rand(2));
    randn(2) = sqrt(-2*log(rand(1))) * sin(2*pi*rand(2));
end
which we call with
RandStream.setGlobalStream(RandStream('mt19937ar','seed',1));
r = [rand, rand]
rn = randn_from_rand(r)
with answer,
r =
0.4170 0.7203
rn =
-0.2452 -1.2997
Note, you can check that the output is normally distributed; for Python:
import matplotlib.pyplot as plt
ra = []
np.random.seed(1)
for i in range(1000000):
    rand = np.random.rand(2)
    ra.append(randn_from_rand(rand))
plt.hist(np.array(ra).ravel(), 100)
plt.show()
which gives a histogram with the familiar bell shape of the standard normal distribution.
For a random variable x, I know P(x=i) for each i = 1, 2, ..., 100. How can I sample x from this multinomial distribution in MATLAB, based on the given P(x=i)?
I am allowed to use the MATLAB built-in commands rand and randi, but not mnrnd.
In general, you can sample numbers from any one-dimensional probability distribution X using a uniform random number generator and the inverse cumulative distribution function of X. This is known as inverse transform sampling.
random_x = xcdf_inverse(rand())
How does this apply here? If you have your vector p of probabilities defining your multinomial distribution, F = cumsum(p) gives you a vector that defines the CDF. You can then generate a uniform random number on [0,1] using temp = rand() and find the first entry of F greater than temp; see the sketch below. This is effectively using the inverse CDF of the multinomial distribution.
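A minimal sketch of that cumsum-and-search idea, shown here in Python/NumPy for brevity (the MATLAB version uses cumsum and find in the same way; the uniform p below is a hypothetical placeholder for the given probabilities):
import numpy as np
p = np.full(100, 0.01)  # placeholder for the given P(x=i), i = 1..100
F = np.cumsum(p)  # CDF evaluated at i = 1..100
temp = np.random.rand()
x = np.searchsorted(F, temp) + 1  # first index with F >= temp, shifted to be 1-based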
Be aware, though, that for some distributions (e.g. the gamma distribution) this turns out to be an inefficient way to generate random draws, because evaluating the inverse CDF is slow (if the CDF cannot be expressed analytically, slower numerical methods must be used).
I want to compute the parameters mu and lambda for the Inverse Gaussian Distribution given the CDF.
By 'given the CDF' I mean that I am given the data AND the (estimated) quantiles for the data, i.e.:
Quantile - Value
0.01 - 10
0.5 - 12
0.7 - 13
Now I want to determine the inverse Gaussian distribution for this data, so that I can, for example, look up the quantile for the value 11 based on my distribution.
How can I find out the values mu and lambda?
The only solution I can think of is using gradient descent to find the best mu and lambda, with RMSE as the error measure.
Isn't there a better solution?
Comment: MATLAB's MLE algorithm is not an option, since it does not use the quantile data.
Since all you really want to do is estimate the quantiles of the distribution at unknown values, and you have a lot of data points, you can simply interpolate the values you want to look up:
quantile_estimate = interp1(values, quantiles, value_of_interest);
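The NumPy equivalent, as a sketch using the three quantile/value pairs from the question (np.interp expects the x-coordinates, here values, to be increasing):
import numpy as np
values = np.array([10.0, 12.0, 13.0])
quantiles = np.array([0.01, 0.5, 0.7])
quantile_estimate = np.interp(11.0, values, quantiles)  # linear interpolation at value 11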
Following #mpiktas's suggestion, I implemented a gradient descent algorithm for estimating my mu and lambda; a sketch follows the steps below.
1. Make an initial guess using MLE.
2. Learn mu and lambda using gradient descent with RMSE as the error measure.
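A possible sketch of that quantile-matching approach in Python, using scipy.stats.invgauss and a generic optimizer in place of hand-written gradient descent (scipy parametrizes IG(mu, lambda) as invgauss(mu/lambda, scale=lambda); the starting values in x0 are rough guesses, not MLE estimates):
import numpy as np
from scipy.optimize import minimize
from scipy.stats import invgauss
q = np.array([0.01, 0.5, 0.7])  # quantile levels from the question
v = np.array([10.0, 12.0, 13.0])  # corresponding values
def rmse(params):
    mu, lam = params
    pred = invgauss.ppf(q, mu / lam, scale=lam)  # model quantiles for IG(mu, lambda)
    return np.sqrt(np.mean((pred - v) ** 2))
res = minimize(rmse, x0=[12.0, 100.0], method='Nelder-Mead')
mu_hat, lam_hat = res.x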
The following article explains in detail how to compute quantiles (the inverse CDF) for the inverse Gaussian distribution:
Giner, G., and Smyth, G.K. (2016). statmod: probability calculations for the inverse Gaussian distribution. R Journal. http://arxiv.org/abs/1603.06687
Code for the R language is contained in the R package statmod available from CRAN. For example:
> library(statmod)
> qinvgauss(0.01, lower.tail=FALSE)
[1] 4.98
computes the 0.01 upper tail quantile of the standard IG distribution.
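If you would rather stay in Python, scipy's invgauss should give a closely matching value; the standard IG distribution here is assumed to have mean 1 and shape 1, which in scipy's parametrization is invgauss(1) (i.e. mu = mean/shape, with scale = shape):
from scipy.stats import invgauss
print(invgauss.isf(0.01, 1))  # 0.01 upper-tail quantile; expected to be close to 4.98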