Is this a good way of plotting a Normal Distribution? On occasion, I get a pdf value (pdf_x) which is greater than 1.
% thresh_strain contains a Normally Distributed set of numbers
[mu_j,sigma_j] = normfit(thresh_strain);
x=linspace(mu_j-4*sigma_j,mu_j+4*sigma_j,200);
pdf_x = 1/sqrt(2*pi)/sigma_j*exp(-(x-mu_j).^2/(2*sigma_j^2));
plot(x,pdf_x);
The integral of a pdf is 1; the value at any single point can be higher. Your plot is correct.
As @Daniel points out in his answer, with continuous random variables the PDF is the derivative of a probability (a measure of intensity), so it can be greater than one. The CDF is a probability and must always lie in [0, 1].
As an example, take the distributions marked below. The area under each curve is 1 (they are valid distributions) yet the density can be above 1.
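A quick numerical check of this, with constants chosen only for illustration: a narrow Gaussian peaks well above 1, yet the area under it is still 1.
mu = 0; sigma = 0.1;                  % a narrow Gaussian
x = linspace(mu-4*sigma, mu+4*sigma, 200);
p = 1/(sigma*sqrt(2*pi)) * exp(-(x-mu).^2/(2*sigma^2));
max(p)        % about 3.99, clearly greater than 1
trapz(x, p)   % approximately 1 (the area under the curve)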
I am trying to plot a Gaussian using Matlab. My code is like this:
Mu = 0; Sigma = 0.1;
X1 = linspace(Mu-4*Sigma, Mu+4*Sigma, 200);
a = 1/(Sigma*sqrt(2*pi));
y1 = a*exp(-((X1-Mu).^2)./(2*Sigma^2));
plot(X1,y1)
My graph looks like the linked image.
It shows the correct shape, but the values on the y axis go up to 4. As far as I know, a Gaussian is a probability distribution function and thus must always return values between 0 and 1, so I am apprehensive that my implementation is wrong.
It is a probability density function, and it is not required to return values between 0 and 1 everywhere. As you can see from the picture below, the shape of the Gaussian curve depends on the variance and the mean.
Your implementation is correct. The Gaussian is a probability DENSITY function, which is different from a probability distribution. The former only needs to be greater than or equal to zero, but when integrated over the entire range of possible X1, the result must equal 1.
Probability distributions are the ones whose values must be lower than or equal to 1.
As a side note, Matlab has both the Gaussian probability density and distribution functions built in, as normpdf and normcdf respectively.
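A small sketch of the distinction using those built-ins; the constants are assumed for illustration:
mu = 0; sigma = 0.1;
x = linspace(mu-4*sigma, mu+4*sigma, 200);
max(normpdf(x, mu, sigma))       % about 3.99: the density exceeds 1
max(normcdf(x, mu, sigma))       % never exceeds 1: the distribution is bounded
trapz(x, normpdf(x, mu, sigma))  % approximately 1: the density integrates to 1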
I am using the below code to calculate the probabilities of pixel intensities for the image given below. However, the total sum of probabilities, sum(sum(Iprob)), is greater than 1.
I'm not sure where the mistake may be. Any help in figuring this out would be greatly appreciated. Thanks in advance.
clear all
clc
close all
I = imread('Images/cameraman.jpg');
I = rgb2gray(I);
imshow(I)
muHist = 134;
sigmaHist = 54;
Iprob = normpdf(double(I), muHist, sigmaHist);
sum(sum(Iprob))
What you are doing is computing the PDF value for every pixel in the image. Iprob is not a normal distribution; you are simply using the image pixels to sample from a distribution with a known mean and standard deviation.
Essentially, you are just performing a data transformation where the image pixel intensities get mapped to values on a normal PDF with a known mean and standard deviation. This is not the same as the PDF of the image, and that's why the sum is not 1. On top of this, the image pixel intensities don't even follow a normal distribution themselves, so there is no reason the sum of the transformed values would be 1.
Not much more to say other than that the output of normpdf is not what you are expecting it to be. You should read the documentation of normpdf more carefully: http://www.mathworks.com/help/stats/normpdf.html
If you want the actual PDF of the image, what you need to do is compute the histogram of the image rather than a data transformation. You can do that with imhist. Then, treating every pixel as one equally weighted observation, divide each histogram entry by the total number of pixels in the image; summing over all bins then gives 1.
Just to verify, let's use the image you provided in your post. We'll read this in from StackOverflow. Once we do that, compute the PDF and then sum over all bins:
%// Load in image
im = rgb2gray(imread('http://i.stack.imgur.com/0XiU5.jpg'));
%// Compute PDF
h = imhist(im) / numel(im);
%// Sum over all bins
fprintf('Total sum over all bins is: %f\n', sum(h));
We get:
Total sum over all bins is: 1.000000
Just to be absolutely sure you understand: this is the PDF of the image. What you did before was perform a data transformation, mapping each image pixel intensity through a Gaussian density with a known mean and standard deviation. That will not give you a sum of 1, as you saw.
Remember that the PDF is only the probability density function $p(x)$. The function that is restricted to the range $[0, 1]$ is its integral, the CDF $F(x) = \int_{-\infty}^{x} p(t)\,dt$.
Refer to the Matlab manual: Y = normpdf(X,mu,sigma) computes the pdf at each of the values in X using the normal distribution with mean mu and standard deviation sigma.
The integral of the pdf is equal to 1.
The sum of the output values is not.
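To make the distinction concrete, here is a small sketch; the evaluation grid is assumed, while mu and sigma are taken from the question. Summing sampled pdf values only approximates the integral after weighting by the grid spacing:
x = 0:0.5:255;             % intensity grid with spacing 0.5
p = normpdf(x, 134, 54);   % mu and sigma from the question
sum(p)                     % about 2, not a probability
sum(p) * 0.5               % weighted by the spacing: close to 1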
If I estimate the entropy of a vector of standard normal random variables using the Matlab entropy() function, I get an answer somewhere in the region of 4, whereas the actual entropy should be 0.5 * log(2*pi*e*sigma^2) which is approximately equal to 1.4.
Does anyone know where the discrepancy is coming from?
Note: to save time, here is the Matlab code:
X = randn(1, 1000);   % 1000 standard normal samples
disp('The entropy of X is')
entropy(X)
Please read the help (help entropy) or documentation for entropy. You'll see that it's designed for images and uses a histogram technique rather than calculating the entropy analytically. You'll need to create your own function if you want the formula from Wikipedia, but as the formula is so simple, that should be no problem.
I believe the reason you're getting such divergent answers is that entropy scales the bins of the histogram by the number of elements. If you want to use such an estimation technique, you'll want to use hist and scale the bins by area, as sketched below. See also this related StackOverflow question.
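A sketch of that histogram approach; the variable names here are my own, not built-ins, and the bin count is an assumed choice:
sigma = 1;
X = sigma * randn(1, 100000);
hAnalytic = 0.5 * log(2*pi*exp(1)*sigma^2)   % about 1.4189 nats

[counts, centers] = hist(X, 100);            % histogram with 100 bins
dx = centers(2) - centers(1);                % bin width
p = counts / (numel(X) * dx);                % scale counts to a density
nz = p > 0;                                  % avoid log(0)
hEstimate = -sum(p(nz) .* log(p(nz))) * dx   % close to the analytic value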
I have a set of samples, S, and I want to find its PDF. The problem is that when I use ksdensity, I get values greater than one!
[f,xi] = ksdensity(S)
In the array f, most of the values are greater than one! Could you please tell me what the problem might be? Thanks for your help.
For example:
S=normrnd(0.3035, 0.0314,1,1000);
ksdensity(S)
ksdensity, as the name says, estimates a probability density function over a continuous variable. Probability densities can be larger than 1; they can take arbitrary values from zero upwards. The constraint on probabilities is that their sum over an exhaustive range of possibilities must be 1. For probability densities, the constraint is that the integral over the whole range of values must be 1.
A crude approximation of an integral of the pdf estimated by ksdensity can be obtained in Matlab like this:
sum(f) * min(diff(xi))
assuming that the values in xi are equally spaced. The value of this expression should be approximately 1.
If in your application you believe this approximation is not close enough to 1, you might want to specify the grid of estimation points (second parameter pts) such that the spacing is finer or the range is wider than the one automatically generated by ksdensity.
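Using the example from the question, a quick check that the density exceeds 1 while the integral stays near 1:
S = normrnd(0.3035, 0.0314, 1, 1000);   % the sample from the question
[f, xi] = ksdensity(S);
max(f)          % well above 1, because sigma is small and the peak is high
trapz(xi, f)    % approximately 1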
The radius r is drawn from a cut-off log-normal distribution, which has the following probability density function:
num = sqrt(2) .* exp(-0.5*(log(r/rch)).^2);
den = sqrt(pi*sigma_nd.^2) .* r .* ...
      (erf(log(rmax/rch)./sqrt(2*sigma_nd.^2)) - erf(log(rmin/rch)./sqrt(2*sigma_nd.^2)));
pdf = num ./ den;
rch, sigma_nd, rmax, and rmin are all constants.
I found an explanation on the net, but it seems difficult to find the integral and then take its inverse in Matlab.
My first instinct (I haven't checked carefully) is that log(r/rch) follows a truncated normal distribution with limits log(rmin/rch) and log(rmax/rch). So you can generate the appropriate truncated normal random variable, say y, and then set r = rch * exp(y).
You can generate truncated normal random variables by generating untruncated values and replacing those that fall outside the limits, as sketched below. Alternatively, you can do it using the CDF, as described by @PengOne, which you can find on the Wikipedia page.
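A minimal sketch of the redraw approach; every constant here is an assumed example value, not something from the question:
rch = 1; sigma_nd = 0.5; rmin = 0.2; rmax = 5;   % assumed example constants
n = 10000;
lo = log(rmin/rch); hi = log(rmax/rch);
y = sigma_nd * randn(n, 1);      % untruncated normal draws
out = y < lo | y > hi;
while any(out)                   % redraw anything outside the limits
    y(out) = sigma_nd * randn(nnz(out), 1);
    out = y < lo | y > hi;
end
r = rch * exp(y);                % log-normal samples cut off at rmin and rmax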
I'm (still) not sure that your p.d.f. is completely correct, but what's most important here is the distribution.
If your PDF is continuous, then you can integrate to get a CDF, then find the inverse of the CDF and evaluate that at the random value.
If your PDF is not continuous, then you can get a discrete CDF using cumsum and invert it with interp1: pass the CDF values as the X input, the points where the PDF was sampled as the Y input, and your array of rand() numbers as the query points.
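A sketch of that interp1 route; the pdf below is an assumed example, not the one from the question:
x = linspace(0.01, 5, 500);        % points where the pdf is sampled
p = exp(-0.5*log(x).^2) ./ x;      % example unnormalised pdf
cdf = cumtrapz(x, p);
cdf = cdf / cdf(end);              % normalise so the CDF ends at 1
u = rand(10000, 1);                % uniform random numbers
samples = interp1(cdf, x, u);      % invert the CDF by interpolation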
It's not clear what your variable is, but I'm assuming it's r.
The simplest way to do this is, as Chris noted, to first get the cdf (note that if r starts at 0, pdf(1) is NaN; change it to 0):
cdf = cumtrapz(pdf);
cdf = cdf / cdf(end);
then draw uniformly distributed random numbers (size_dist indicating the number of elements):
y = rand(size_dist,1);
followed by a method to place the draws along the cdf. Any technique will work, but here is the simplest (albeit inelegant):
x = zeros(size_dist,1);
for i = 1:size_dist
    x(i) = find(y(i) <= cdf, 1);   % index of the first bin whose cdf reaches y(i)
end
and finally, return to the original variable. Use Matlab's numerical indexing to map back. Note: index into r, not pdf:
pdfHist = r(x);
hist (pdfHist,50)
Probably overkill for your distribution, but you can always write a Metropolis sampler.
On the other hand, the implementation is straightforward, so you'd have your sampler up and running very quickly.
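A minimal random-walk Metropolis sketch; the target pdf here is an assumed example (any function proportional to your pdf will do):
target = @(r) (r > 0) .* exp(-0.5*log(max(r, eps)).^2) ./ max(r, eps);
n = 10000;
r = ones(n, 1);              % the chain, started at r = 1
step = 0.5;                  % proposal standard deviation
for i = 2:n
    prop = r(i-1) + step*randn;
    if rand < target(prop) / target(r(i-1))
        r(i) = prop;         % accept the proposal
    else
        r(i) = r(i-1);       % reject, keep the current state
    end
end
hist(r(1001:end), 50)        % discard burn-in and inspect the histogram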