Determining probability density for a Normal Distribution in Matlab

I have the following code, which I use to obtain the graph below. How can I determine the probability density of the values? I want my Y-axis label to be "Probability density", or do I have to normalise the Y-values first?
Thanks
% thresh_strain contains a Normally Distributed set of numbers
[mu_j,sigma_j] = normfit(thresh_strain);
x=linspace(mu_j-4*sigma_j,mu_j+4*sigma_j,200);
pdf_x = 1/sqrt(2*pi)/sigma_j*exp(-(x-mu_j).^2/(2*sigma_j^2));
plot(x,pdf_x);

Your figure as it stands is correct - the area under the curve is 1. It does not need to be normalised.
You can check this by plotting the cumulative distribution function, which should level off at approximately 1:
plot(x,(x(2)-x(1)).*cumsum(pdf_x));
The y-axis in your figure needs to be relabeled as it is not "number of dents". "Probability density" is an acceptable label.
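The area claim can also be checked numerically. This is only a minimal sketch: mu_j and sigma_j below are made-up values standing in for the output of normfit(thresh_strain), since the original data is not shown. trapz integrates the curve over the plotted range of plus or minus 4 sigma, which covers about 99.99% of a normal distribution.

```matlab
% Hypothetical fitted parameters (assumption: stand-ins for normfit output)
mu_j = 0.25;
sigma_j = 0.05;

% Same grid and density formula as in the question
x = linspace(mu_j - 4*sigma_j, mu_j + 4*sigma_j, 200);
pdf_x = 1/sqrt(2*pi)/sigma_j * exp(-(x - mu_j).^2 / (2*sigma_j^2));

area = trapz(x, pdf_x);   % numerical integral over the plotted range
peak = max(pdf_x);        % largest density value on the grid

fprintf('area = %.5f, peak = %.3f\n', area, peak);
```

With these assumed parameters the area comes out just under 1 while the peak is well above 1, so the curve is already a valid density and only the axis label needs changing.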

Related

Why is my Gaussian function giving values out of range?

I am trying to plot Gaussian using Matlab. My code is like this.
a=1/(0.1*sqrt(2*3.14))
y1=a*exp(-1*(((X1-Mu).^2)./(2*(Sigma^2)) ))
plot(X1,y1)
My graph looks like the image in the link.
It shows the correct shape, but the values on the y-axis go up to 4. As far as I know, a Gaussian is a probability distribution function and thus must always return a value between 0 and 1. So I am not sure whether my implementation is correct.
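Yes, it is a probability density function, but it is not required to return a value between 0 and 1 everywhere. The height of the Gaussian curve depends on the variance (the smaller the variance, the narrower and taller the peak), while the mean only shifts it along the x-axis.
Your implementation is correct. The Gaussian is a probability DENSITY function, which is different from a cumulative distribution function. The density only has to be greater than or equal to zero, and it must integrate to 1 over the entire range of possible X1. With a standard deviation of 0.1, as the 0.1 in your code suggests, the peak is 1/(0.1*sqrt(2*pi)), roughly 3.99, which matches the values near 4 that you see.
Probabilities, and therefore the values of the cumulative distribution function, are what must lie between 0 and 1.
As a side note, MATLAB provides both the Gaussian probability density and cumulative distribution functions as normpdf and normcdf respectively (both are part of the Statistics and Machine Learning Toolbox).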
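To see where the value near 4 comes from, here is a small sketch using only base MATLAB functions. Sigma = 0.1 is inferred from the 0.1 in the question's code, and Mu = 0 and the X1 grid are assumptions, since the question does not show them.

```matlab
Sigma = 0.1;   % assumed from the 0.1 hard-coded in the question
Mu = 0;        % assumption: the question does not give Mu
X1 = linspace(Mu - 5*Sigma, Mu + 5*Sigma, 1001);   % assumed grid

% Gaussian density, using pi instead of the rounded 3.14
y1 = 1/(Sigma*sqrt(2*pi)) * exp(-(X1 - Mu).^2 / (2*Sigma^2));

peak = max(y1);         % height of the curve at X1 = Mu
area = trapz(X1, y1);   % numerical integral of the density

fprintf('peak = %.4f, area = %.6f\n', peak, area);
```

The peak is about 3.99 and the area is very close to 1, so a density above 1 and a total probability of 1 are consistent with each other.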

MATLAB how to plot a vector of probability densities on to a histogram?

I currently have a vector of calculated probability densities, i.e.
probden = (0.0008, 0.0016, 0.0048, 0.0064, 0.0072, ... , 1.0936, ... , 0.0072, 0.0064, 0.0048, 0.0016, 0.0008)
The list of calculated probability densities should be in the shape of a normal distribution.
I also have a list of the same length containing the bin for each probability density.
I am trying to create a histogram such that each probability density is shown at its bin on the X-axis.
If I use the function hist, it only shows how many probability density values fall into each bin.
How should I approach this?
Thanks!
The function that goes hand in hand with hist is bar.
In your case, you already have your histogram/distribution values, so there is no need to call hist. You can call bar directly:
bar( YourvectorOfBins , probden )
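A self-contained sketch of that call follows. The bins and probden vectors here are hypothetical stand-ins (a standard normal density sampled at evenly spaced bin centers), not the values from the question.

```matlab
% Hypothetical bin centers and densities (assumption: not the asker's data)
bins = -3:0.5:3;
probden = exp(-bins.^2 / 2) / sqrt(2*pi);

% bar draws one bar per bin, with height equal to the given density
bar(bins, probden);
xlabel('Bin');
ylabel('Probability density');
```

Because bar plots the heights exactly as given, no counting takes place, which is the difference from hist.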

ROC curve and libsvm

Given a ROC curve drawn with plotroc.m (see here):
Theoretical question: How to select the best threshold to be used?
Programming question: How to induce the libsvm classifier to work with the selected (best) threshold?
A ROC curve is generated by plotting the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis. So the coordinates of any point (x,y) on the ROC curve give the FPR and TPR at a particular threshold.
One common choice is the point (x,y) on the ROC curve that is closest to the top-left corner of the plot, i.e. (0,1). The threshold corresponding to that point is the required threshold. Sorry, I am not permitted to post an image, so I couldn't illustrate this with a figure. For more details, see the ROC related help.
Secondly, in libsvm the svmpredict function can return the probability of a data sample belonging to each class when it is called with the '-b 1' option (the model also has to be trained with '-b 1'). If that probability for the positive class is greater than the threshold obtained from the ROC plot, we can classify the sample as positive. These few lines might be useful to you:
[pred_labels,~,p] = svmpredict(target_labels,feature_test,svmStruct,'-b 1');
% where, svmStruct is structure returned by svmtrain function.
op = p(:,svmStruct.Label==1); % This gives probability for positive
% class (i.e whose label is 1 )
Now, if 'op' is greater than the threshold, we can classify the corresponding test sample as positive. This can be done as follows:
op_labels = op>th; % where 'th' is threshold obtained from ROC
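The closest-to-corner rule described above can be sketched as follows. The fpr, tpr and thresholds vectors are made-up illustrative values (in practice they would come from whatever routine computed the ROC curve), and op here is a hypothetical vector of positive-class probabilities rather than real svmpredict output.

```matlab
% Hypothetical ROC points, one per candidate threshold (illustrative only)
fpr        = [0    0.05 0.10 0.25 0.40 1.00];
tpr        = [0    0.50 0.80 0.90 0.95 1.00];
thresholds = [1.0  0.9  0.7  0.5  0.3  0.0];

% Euclidean distance of each ROC point from the ideal corner (0,1)
d = sqrt(fpr.^2 + (1 - tpr).^2);
[~, idx] = min(d);
th = thresholds(idx);   % threshold at the closest point

% Hypothetical positive-class probabilities for four test samples
op = [0.95 0.65 0.72 0.10];
op_labels = op > th;    % logical 1 means "classified as positive"
```

For these made-up numbers the closest point is the third one, so th is 0.7 and the first and third samples are labelled positive. Minimum distance to (0,1) is only one possible criterion; it treats false positives and false negatives as equally costly.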

Plotting a Normal Distribution in Matlab

Is this a good way of plotting a Normal Distribution? On occasion, I get a pdf value (pdf_x) which is greater than 1.
% thresh_strain contains a Normally Distributed set of numbers
[mu_j,sigma_j] = normfit(thresh_strain);
x=linspace(mu_j-4*sigma_j,mu_j+4*sigma_j,200);
pdf_x = 1/sqrt(2*pi)/sigma_j*exp(-(x-mu_j).^2/(2*sigma_j^2));
plot(x,pdf_x);
The integral of a PDF is 1, but at any individual point the value can be higher than 1. Your plot is correct.
As @Daniel points out in his answer, for continuous random variables the PDF is the derivative of the CDF (a measure of probability per unit of x), so it can be greater than one. The CDF is a probability and must always lie in [0, 1].
As an example, take the distributions marked below. The area under each curve is 1 (they are valid distributions) yet the density can be above 1.
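The relationship between the two functions can also be checked numerically. In this sketch the parameters mu = 0 and sigma = 0.1 are arbitrary assumptions chosen so that the density exceeds 1. The CDF is written with erf so that only base MATLAB is needed, and the PDF is recovered as a finite-difference derivative of the CDF.

```matlab
mu = 0;        % assumed mean
sigma = 0.1;   % assumed standard deviation (small, so the peak exceeds 1)
x = linspace(mu - 5*sigma, mu + 5*sigma, 1001);

% Normal CDF expressed through the error function
cdf_x = 0.5 * (1 + erf((x - mu) / (sigma*sqrt(2))));

% Finite-difference derivative of the CDF, evaluated at interval midpoints
pdf_num = diff(cdf_x) ./ diff(x);
xm = (x(1:end-1) + x(2:end)) / 2;

% Analytical density at the same midpoints, for comparison
pdf_exact = exp(-(xm - mu).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));

max_err = max(abs(pdf_num - pdf_exact));
fprintf('max density = %.3f, max error = %.2e\n', max(pdf_num), max_err);
```

The CDF stays within [0, 1] everywhere, while its slope, which is the density, reaches roughly 4 near the mean.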

MATLAB : frequency distribution

I have raw observations of 500 numeric values (ranging from 1 to 25000) in a text file, and I wish to make a frequency distribution in MATLAB. I did try the histogram (hist); however, I would prefer a frequency distribution curve rather than blocks and bars.
Any help is appreciated!
If you pass two output parameters to HIST, you will get both the x-axis and y-axis values. Then you can plot the data as you like. For instance,
[counts, bins] = hist(mydata);
plot(bins, counts); %# get a line plot of the histogram
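You could also try a kernel smoothing density estimate (ksdensity in the Statistics and Machine Learning Toolbox), which gives a smooth curve instead of a piecewise-linear one.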
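Both suggestions can be sketched with base MATLAB functions only. Here mydata is random placeholder data standing in for the 500 values from the text file, the 20 bins are an arbitrary choice, and the kernel estimate is a hand-written Gaussian kernel with a common normal-reference rule-of-thumb bandwidth. It is not the ksdensity implementation, just an illustration of the idea.

```matlab
% Placeholder data (assumption: stands in for the 500 values in the file)
mydata = 1 + 24999 * rand(500, 1);

% Histogram counts rescaled so that the curve has unit area
[counts, bins] = hist(mydata, 20);
binWidth = bins(2) - bins(1);
density = counts / (sum(counts) * binWidth);

% Hand-written Gaussian kernel density estimate
h = 1.06 * std(mydata) * numel(mydata)^(-1/5);   % rule-of-thumb bandwidth
xi = linspace(min(mydata) - 3*h, max(mydata) + 3*h, 400);
u = bsxfun(@minus, xi, mydata) / h;              % 500-by-400 scaled distances
f = mean(exp(-u.^2 / 2), 1) / (h * sqrt(2*pi));  % average of the kernels

plot(bins, density, 'o-');   % line plot of the normalised histogram
hold on;
plot(xi, f);                 % smooth kernel estimate
hold off;
xlabel('Value');
ylabel('Probability density');
```

Dividing the counts by sum(counts)*binWidth converts frequencies into a density, so the histogram line and the kernel curve share the same y-axis scale.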