My goal is to replicate this plot in Python:
I'm having difficulty generating light- and heavy-tailed distributions; the rest I know how to generate. What should I do?
You can try, e.g., gennorm, the generalized normal distribution. It has a shape parameter that controls the tails.
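For instance, a minimal sketch with scipy.stats.gennorm (the beta values and sample sizes here are just illustrative):

from scipy.stats import gennorm

# beta = 2 recovers the normal distribution; beta < 2 gives heavier
# tails (beta = 1 is the Laplace), beta > 2 gives lighter tails.
heavy = gennorm.rvs(beta=0.7, size=10_000)   # heavy-tailed
normal = gennorm.rvs(beta=2.0, size=10_000)  # Gaussian
light = gennorm.rvs(beta=8.0, size=10_000)   # light-tailed, near-uniform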
I am trying to fit a custom distribution to a large dataset (~500,000 measurements) using SciPy. I have derived a theoretical PDF based on some other factors, but both by hand and using symbolic integration software I cannot find an exact form of the CDF.
Currently, simply drawing 1000 random samples from my custom distribution is expensive, which I believe is due to the need to numerically invert an unknown CDF. If I cannot find an explicit form of the CDF and its inverse, is there anything else I can do to speed up using this distribution?
I've used Maple, MATLAB, and SymPy to try to determine a CDF, yet none gives a result. I also tried down-sampling my data whilst still retaining the tail behaviour, but this still required so much data that doing anything with the distribution was slow.
My distribution is a subclass of SciPy's rv_continuous class.
Thanks for any advice.
This sounds like you want to sample from a kernel density estimate of the probability distribution. While SciPy does offer a Gaussian KDE (scipy.stats.gaussian_kde), for that many measurements you would be much better off using scikit-learn's implementation. A good resource with code examples can be found on Jake VanderPlas's blog.
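A minimal sketch of that approach (synthetic data standing in for yours; the bandwidth is an assumption you would tune, e.g. by cross-validation):

import numpy as np
from sklearn.neighbors import KernelDensity

# Stand-in for your ~500,000 measurements (heavy-tailed synthetic data).
rng = np.random.default_rng(0)
data = rng.standard_t(df=3, size=500_000)

# Fit a Gaussian KDE; sampling from it sidesteps CDF inversion entirely.
kde = KernelDensity(kernel='gaussian', bandwidth=0.1).fit(data.reshape(-1, 1))

# Drawing new samples is now cheap compared with inverting an unknown CDF.
samples = kde.sample(n_samples=1000).ravel()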
I have already written some Python and MATLAB code for neural networks, without using any framework or automatic differentiation. As we know, Theano and TensorFlow use automatic differentiation: you build a computation graph, and they do the calculation (back-propagation) for you. But sometimes a program runs yet definitely does not behave as I intend. So I wonder: are there methods to make sure my program is correct?
Printing the constructed computation graph? That seems complicated when the number of NN layers is large, e.g., the ImageNet winner used 152 layers.
Or should I write another program in plain MATLAB or Python code, then compare its output with the program that uses the framework?
The standard solution is numerical gradient checking. You can inefficiently compute the gradient by doing forward propagation at two nearby parameter values.
See the section on numerical gradient checking here:
https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf
In TensorFlow this is implemented using compute_numeric_jacobian here.
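A minimal NumPy sketch of the idea (the helper name and tolerance are mine, not from the paper):

import numpy as np

def numeric_gradient(f, x, eps=1e-5):
    # Central-difference approximation of the gradient of a scalar-valued f.
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x.flat[i]
        x.flat[i] = old + eps
        f_plus = f(x)
        x.flat[i] = old - eps
        f_minus = f(x)
        x.flat[i] = old                      # restore the parameter
        grad.flat[i] = (f_plus - f_minus) / (2 * eps)
    return grad

# Example: check the analytic gradient of f(w) = sum(w**2), which is 2*w.
w = np.array([1.0, -2.0, 3.0])
assert np.allclose(numeric_gradient(lambda v: np.sum(v**2), w), 2 * w, atol=1e-6)

You would compare the numeric gradient of your loss against the gradient your back-propagation code produces, one parameter tensor at a time.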
I have a binary image full of noise. I detected the objects circled in red using a median filter: B = medfilt2(A, [m n]) (MATLAB) or medianBlur(src, dst, ksize) (OpenCV).
Could you suggest other methods to detect those objects in a more "academic" way, e.g., a probabilistic method, clustering, etc.?
This example looks like the very scenario DBSCAN was designed for.
Lots of noise, but with a well understood density, and clusters with much higher density but arbitrary shape.
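A minimal scikit-learn sketch of that idea, with a synthetic image standing in for yours (eps and min_samples are assumptions you would tune to your noise density):

import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic stand-in for your binary image: sparse noise plus one dense blob.
rng = np.random.default_rng(0)
img = rng.random((200, 200)) > 0.995
img[50:70, 50:70] |= rng.random((20, 20)) > 0.5

# Cluster the coordinates of the foreground pixels.
ys, xs = np.nonzero(img)
coords = np.column_stack([xs, ys])
labels = DBSCAN(eps=3, min_samples=10).fit_predict(coords)
# Label -1 marks noise; every other label is one detected dense object.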
You can use any sort of clustering here; k-means is a simple one to start with.
You can find a pretty good example from MATLAB to start with.
I just detected faces using the Viola-Jones algorithm. I cropped the faces from frames (or video) and made them my training set. In my video there are 5 different faces. I decided to use eigenfaces for face recognition, and I ended up with a Euclidean distance for an input image. What am I supposed to do now? Do I have to use a classification or a clustering technique? In my project I was told to use k-means clustering. How can that be done using Euclidean distance? Please explain, and give some useful links so that I can understand it better.
When you have 5 different faces, this sounds like classification to me. That is a label column!
I don't think k-means will get you anywhere. On high-dimensional data (such as images), k-means performs about as well as a random convex partitioning, i.e., it's largely useless.
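If you do go the classification route instead, here is a minimal sketch of eigenfaces (PCA) plus a Euclidean nearest-neighbour classifier; X, y, and new_face are hypothetical placeholders for your cropped faces and identity labels:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-ins: X holds flattened face crops, y the 5 identity labels.
rng = np.random.default_rng(0)
X = rng.random((100, 24 * 24))
y = rng.integers(0, 5, size=100)

# Eigenfaces: project the faces onto their principal components...
pca = PCA(n_components=20)
X_proj = pca.fit_transform(X)

# ...then label a new face by Euclidean distance to its nearest training face.
clf = KNeighborsClassifier(n_neighbors=1, metric='euclidean')
clf.fit(X_proj, y)
new_face = rng.random((1, 24 * 24))
predicted = clf.predict(pca.transform(new_face))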
I am trying to fit a distribution to some data I've collected from microscopy images. We know that the peak at about 152 is due to a Poisson process. I'd like to fit a distribution to the large density in the center of the image while ignoring the high-intensity data. I know how to fit a normal distribution to the data (red curve), but it doesn't do a good job of capturing the heavy tail on the right. Although the Poisson distribution should be able to model the tail to the right, it doesn't do a very good job either (green curve), because the mode of the distribution is at 152.
PD = fitdist(data, 'poisson');
The Poisson distribution with lambda = 152 looks very Gaussian-like.
Does anyone have an idea how to fit a distribution that will do a good job of capturing the right-tail of the data?
Link to an image showing the data and my attempts at distribution fitting.
The distribution looks a bit like an ex-Gaussian (see the green line in the first Wikipedia figure), that is, the sum (convolution) of a normal and an exponential random variable.
On a side note, are you aware that, although the event counts of a Poisson process are Poisson distributed, the waiting times between the events are exponentially distributed? If Gaussian noise is added to your measurement, an ex-Gaussian distribution is therefore theoretically possible. (Of course this does not mean that it is also plausible.)
A tutorial on fitting the ex-Gaussian with MATLAB can be found in:
Lacouture, Y., & Cousineau, D. (2008). How to use MATLAB to fit the ex-Gaussian and other probability functions to a distribution of response times. Tutorials in Quantitative Methods for Psychology, 4(1), 35-45. http://www.tqmp.org/Content/vol04-1/p035/p035.pdf
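If Python is an option too, SciPy implements the ex-Gaussian as exponnorm; a minimal fitting sketch with synthetic data standing in for yours:

import numpy as np
from scipy.stats import exponnorm

# Synthetic stand-in for the measurements: a normal plus an exponential component.
rng = np.random.default_rng(0)
data = rng.normal(100, 5, size=10_000) + rng.exponential(20, size=10_000)

# Fit the ex-Gaussian; SciPy's shape K relates to the rate via K = 1 / (sigma * lambda).
K, loc, scale = exponnorm.fit(data)
x = np.linspace(data.min(), data.max(), 500)
pdf = exponnorm.pdf(x, K, loc=loc, scale=scale)  # overlay on a histogram of the data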
Take a look at this: http://blogs.mathworks.com/pick/2012/02/10/finding-the-best/
It reviews the following File Exchange submission about fitting distributions: http://www.mathworks.com/matlabcentral/fileexchange/34943