How to compute the Cumulative Distribution Function of an image in MATLAB - matlab

I need to compute the Cumulative Distribution Function of an image. I normalized the values using the following code:
im = imread('cameraman.tif');
im_hist = imhist(im);
tf = cumsum(im_hist); %transformation function
tf_norm = tf / max(tf);
plot(tf_norm), axis tight
Also, when the CDF is plotted, should the curve ideally be a straight line, indicating that all pixel intensities are equally represented?

You can obtain a CDF very easily by:
A = imread('cameraman.tif');
[counts, bins] = imhist(A);
cdf = cumsum(counts) / sum(counts);
plot(cdf); % If you want to be more precise on the X axis plot it against bins
For the famous cameraman.tif it results in:
As for your second question: when the histogram is perfectly equalized (i.e. when roughly the same number of pixels falls at each intensity), your CDF will look like a straight 45° line.
EDIT: Strictly speaking, cumsum alone does not give a proper CDF. A CDF describes a probability, so it must obey the probability axioms. In particular, its values must lie in the range [0, 1], and cumsum alone does not guarantee that. Dividing by sum(counts), as above, takes care of it.
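A minimal sketch of the normalization and of the straight-line case, assuming base MATLAB only: accumarray stands in for imhist (so no Image Processing Toolbox is needed), and the test image is a synthetic one, made up for illustration, in which every gray level occurs equally often.

```matlab
% Synthetic 8-bit image (hypothetical test data): each gray level 0..255
% appears exactly 256 times, i.e. the histogram is perfectly flat.
im = uint8(repmat(0:255, 256, 1));

% Histogram with one bin per gray level (base-MATLAB stand-in for imhist).
counts = accumarray(double(im(:)) + 1, 1, [256 1]);

% Normalized cumulative sum: non-decreasing and ending at 1.
cdf_img = cumsum(counts) / sum(counts);

% For a flat histogram the CDF is the straight line (level + 1) / 256.
levels = (0:255)';
ideal = (levels + 1) / 256;
max_dev = max(abs(cdf_img - ideal));   % deviation from the ideal line
```

Calling plot(levels, cdf_img) on this image shows the 45° line. For a real image such as cameraman.tif, the curve bends wherever intensities are over- or under-represented.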

function icdf = imgcdf(img)
% Author: Javier Montoya (jmontoyaz#gmail.com).
% http://www.lis.ic.unicamp.br/~jmontoya
%
% IMGCDF calculates the Cumulative Distribution Function of image I.
% Input parameters:
% img: image I (passed as a bidimensional matrix).
% Output parameters:
% icdf: cumulative distribution function.
%
% See also: IMGHIST
%
% Usage:
% I = imread('tire.tif');
% icdf = imgcdf(I);
% figure; stem(icdf); title('Cumulative Distribution Function (CDF)');
if exist('img', 'var') == 0
error('Error: Specify an input image.');
end
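ihist = imghist(img); % companion histogram function (see IMGHIST), not shown here
maxgval = 255;
icdf = zeros(1,maxgval+1); % one entry per gray level 0..maxgval
icdf(1)= ihist(1);
for i=2:1:maxgval+1
icdf(i) = ihist(i) + icdf(i-1); % running sum of the histogram counts
end
% icdf holds cumulative counts; divide by icdf(end) to get values in [0, 1]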
end
It's not my code, but it works for me! Also check the cdf function in the Statistics Toolbox.

Related

How do I obtain the X values corresponding to a given probability value using probplot, Matlab?

I would like to obtain the corresponding X value for a given probability value in the probplot command in Matlab.
% Input Data
X = [ 78572.12124
85385.44766
71947.35964
87050.1572
77451.33935
54705.93013
69341.39769
63182.64207
71262.53291 ];
% Plotting lognormal probability plot with reference line
h1 = probplot('lognormal', X);
% Obtain the X and Y coordinates of the reference line
Xcoord = h1(2).XData';
Ycoord = h1(2).YData';
% Note the Y data is in quantiles; it can be converted to probability
% values
Ycoord_probability = normcdf(Ycoord);
% How do I obtain the Xcoord corresponding to Ycoord_probability = 1e-3
% (say) ??
From a Matlab standpoint, you can do the following.
Xcoord_queried = interp1(Ycoord_probability, Xcoord, Ycoord_probability_queried);
where Ycoord_probability_queried is 1e-3 in your example, and the output is the answer I believe you are looking for. Note that interp1 returns NaN for query points outside the range of Ycoord_probability unless you request extrapolation. See the Matlab documentation on interpolation for more information.
From a math standpoint, you are perhaps better off fitting a lognormal distribution to your data set, using fitdist for example, and retrieving information from the fitted distribution itself.
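As a rough sketch of that second suggestion: for a lognormal fit, the quantile at probability p has the closed form exp(mu + sigma*z), where mu and sigma are the mean and standard deviation of log(X) and z is the standard normal quantile at p. The version below uses only base MATLAB (erfcinv in place of norminv). With the Statistics Toolbox, fitdist(X,'Lognormal') followed by icdf(pd, p) should give a comparable number, though the two have not been compared on this data. The names p_query, x_query and p_check are made up for illustration.

```matlab
% Input data from the question
X = [78572.12124; 85385.44766; 71947.35964; 87050.1572; 77451.33935; ...
     54705.93013; 69341.39769; 63182.64207; 71262.53291];

% Lognormal parameters: mean and sample standard deviation of log(X)
logX  = log(X);
mu    = mean(logX);
sigma = std(logX);

% Standard normal quantile at the queried probability (base-MATLAB norminv)
p_query = 1e-3;
z = -sqrt(2) * erfcinv(2 * p_query);

% X value of the fitted lognormal at that probability
x_query = exp(mu + sigma * z);

% Sanity check: the fitted lognormal CDF at x_query should return p_query
p_check = 0.5 * erfc(-(log(x_query) - mu) / (sigma * sqrt(2)));
```

Unlike interpolating along the plotted reference line, this works for any probability in (0, 1), including values far outside the range of the data.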

Wind rose diagram plot in matlab

I am trying to plot a wind rose diagram in Matlab from wind velocity and direction data for a given period.
After producing several plots of the Weibull distribution, the main program calls another Matlab program to produce the wind rose.
The wind rose program is essentially here: https://www.mathworks.com/matlabcentral/fileexchange/47248-wind-rose
and the main program is mainly based on https://www.mathworks.com/matlabcentral/fileexchange/41996-computing-weibull-distribution-parameters-from-a-wind-speed-time-series?focused=3786165&tab=function
Yesterday I worked with a very old Matlab edition and had serious problems getting the code to run.
Today, with Octave on an Ubuntu machine and after some effort, I managed to get a result with a minor problem: the wind rose did not have all the information it should have.
I then ran the program in a new version of Matlab and got the following message:
Error using WindRose (line 244)
is not a valid property for WindRose function.
Error in octavetestoforiginalprogram (line 184)
[figure_handle,count,speeds,directions,Table] = WindRose(dir,vel,Options);
I don't understand how the program can run in Octave and yet produce such an error in Matlab.
Does anyone have an idea of what this error means?
Note: I am posting the entire code below if anyone wants to read it:
%% EXTRACT AND PLOT RAW DATA
% Extract wind speed data from a file
v = xlsread('1981-1985_timeseries.xlsx');
% Plot the measured wind speed
plot(v)
title('Wind speed time series');
xlabel('Measurement #');
ylabel('Wind speed [m/s]');
%% PROCESS DATA
% Remove nil speed data (to avoid infeasible solutions in the following)
v(v==0) = [];
% Extract the unique values occurring in the series
uniqueVals = unique(v);
uniqueVals(isnan(uniqueVals))=[];
% Get the number of unique values
nbUniqueVals = length(uniqueVals);
% Find the number of occurrences of each unique wind speed value
for i=1:nbUniqueVals
nbOcc = v(find(v==uniqueVals(i)));
N(i) = length(nbOcc);
end
% Get the total number of measurements
nbMeas = sum(N);
% To take into account the measurement resolution
% (i.e., a measured wind speed of 2.0 m/s may actually correspond to a
% real wind speed of 2.05 or 1.98 m/s), compute the delta vector which
% contains the difference between two consecutive unique values
delta(1) = uniqueVals(1);
for i=2:nbUniqueVals
delta(i) = uniqueVals(i) - uniqueVals(i-1);
end
% Get the frequency of occurrence of each unique value
for i=1:nbUniqueVals
prob(i) = N(i)/(nbMeas*delta(i));
end
% Get the cumulated frequency
freq = 0;
for i=1:nbUniqueVals
freq = prob(i)*delta(i) + freq;
cumFreq(i) = freq;
end
%% PLOT THE RESULTING DISTRIBUTION
% Plot the distribution
figure
subplot(2,1,1);
pp=plot(uniqueVals,prob)
title('Distribution extracted from the time series');
xlabel('Wind speed [m/s]');
ylabel('Probability');
% Plot the cumulative distribution
subplot(2,1,2);
plot(uniqueVals,cumFreq)
title('Cumulative distribution extracted from the time series');
xlabel('Wind speed [m/s]');
ylabel('Cumulative probability');
%% EXTRACT THE PARAMETERS USING A GRAPHICAL METHOD
% See the following references for more explanations:
% - Akdag, S.A. and Dinler, A., A new method to estimate Weibull parameters
% for wind energy applications, Energy Conversion and Management,
% 50(7), 1761-1766, 2009
% - Seguro, J.V. and Lambert, T.W., Modern estimation of the parameters of
% the Weibull wind speed distribution for wind energy analysis, Journal of
% Wind Engineering and Industrial Aerodynamics, 85(1), 75-84, 2000
% Linearize distributions (see papers)
ln = log(uniqueVals);
lnln = log(-log(1-cumFreq));
% Check whether the vectors contain infinite values; if so, remove them
% (logical indexing avoids the index shift caused by deleting inside a loop)
test = isinf(lnln);
ln(test) = [];
lnln(test) = [];
% Extract the line parameters (y=ax+b) using the polyfit function
params = polyfit(ln,lnln',1);
a = params(1);
b = params(2);
y=a*ln+b;
% Compare the linearized curve and its fitted line
figure
plot(ln,y,'b',ln,lnln,'r')
title('Linearized curve and fitted line comparison');
xlabel('x = ln(v)');
ylabel('y = ln(-ln(1-cumFreq(v)))');
% Extract the Weibull parameters c and k
k = a
c = exp(-b/a)
%% CHECK RESULTS
% Define the Weibull cumulative distribution function
% F(V) = 1-exp(-((v/c)^k)) = 1-exp(-a2), with a1 = v/c, a2 = (v/c)^k
a1 = uniqueVals/c;
a2 = a1.^k;
cumDensityFunc = 1-exp(-a2);
% Define the Weibull probability density function
%f(v)=k/c*(v/c)^(k-1)*exp(-((v/c)^k))=k2*a3.*exp(-a2),
% with k2 = k/c, a3 = (v/c)^(k-1)
k1 = k-1;
a3 = a1.^k1;
k2 = k/c;
densityFunc = k2*a3.*exp(-a2);
% Plot and compare the obtained Weibull distribution with the frequency plot
figure
subplot(2,2,1);
pp=plot(uniqueVals,prob,'.',uniqueVals,densityFunc, 'r')
title('Weibull probability density function');
xlabel('v');
ylabel('f(v)');
subplot(2,2,3)
h=hist(v);
title('Wind speed time series');
xlabel('Measurement #');
ylabel('Wind speed [m/s]');
h=h/(sum(h)*10);
bar(h)
% Same for the cumulative distribution
subplot(2,2,2);
plot(uniqueVals,cumFreq,'.',uniqueVals,cumDensityFunc, 'r')
title('Cumulative Weibull probability density function');
xlabel('v');
ylabel('F(V)');
%inner
figure
hold on
pp=plot(uniqueVals,prob,'.',uniqueVals,densityFunc, 'r')
title('Weibull probability density function');
xlabel('v');
ylabel('f(v)');
bar(h)
hold off
%inner
%rose
w=xlsread('rose.xlsx');
dir=w(:,2)*10;
vel=w(:,1);
Options = {'anglenorth','FreqLabelAngle',0,'angleeast','FreqLabelAngle',90,'labels',{'N (0)','S (180)','E (90)','W (270)'},'freqlabelangle',45,'nDirections',20,'nFreq',25,'LegendType',1};
[figure_handle,count,speeds,directions,Table] = WindRose(dir,vel,Options);
close all; clear Options;
After a quick read of the script documentation, here is what I found concerning the creation of the windrose plot:
% With options in a cell array:
Options = {'anglenorth',0,'angleeast',90,'labels',{'N (0°)','S (180°)','E (90°)','W (270°)'},'freqlabelangle',45};
[figure_handle,count,speeds,directions,Table] = WindRose(dir,spd,Options);
% With options in a structure:
Options.AngleNorth = 0;
Options.AngleEast = 90;
Options.Labels = {'N (0°)','S (180°)','E (90°)','W (270°)'};
Options.FreqLabelAngle = 45;
[figure_handle,count,speeds,directions,Table] = WindRose(dir,spd,Options);
close all;
% Usual calling:
[figure_handle,count,speeds,directions,Table] = WindRose(dir,spd,'anglenorth',0,'angleeast',90,'labels',{'N (0°)','S (180°)','E (90°)','W (270°)'},'freqlabelangle',45);
Your error is:
Error using WindRose (line 244)
is not a valid property for WindRose function.
Error in octavetestoforiginalprogram (line 184)
[figure_handle,count,speeds,directions,Table] = WindRose(dir,vel,Options);
It is produced by the routine that validates the option arguments. Options must be provided as name-value pairs, and it seems that the script is finding one or more names without a value. Here they are:
Options = {'anglenorth','FreqLabelAngle',0,'angleeast','FreqLabelAngle',90,'labels',{'N (0)','S (180)','E (90)','W (270)'},'freqlabelangle',45,'nDirections',20,'nFreq',25,'LegendType',1};
The two properties 'anglenorth' and 'angleeast' have no value associated with them (unlike the examples in the documentation): each is followed directly by a stray 'FreqLabelAngle'. This probably messes up the whole parametrization process. If the entries are read in order, the pairs become ('anglenorth','FreqLabelAngle'), (0,'angleeast'), ('FreqLabelAngle',90), and so on. The first option extracted is then anglenorth = FreqLabelAngle, which is not correct, and the second "name" is the number 0, which would explain the blank property name in the error message.
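A sketch of how the options might look once each name is paired with a value, assuming the intended settings were AngleNorth = 0 and AngleEast = 90 as in the documentation example (an assumption; the question does not say). The pairing check is a hypothetical illustration, not part of WindRose.

```matlab
% Options with every property name followed by exactly one value.
Options = {'anglenorth',0,'angleeast',90, ...
           'labels',{'N (0)','S (180)','E (90)','W (270)'}, ...
           'freqlabelangle',45,'nDirections',20,'nFreq',25,'LegendType',1};

% Illustrative check: an even number of entries, and a character vector
% in every odd (name) position.
names = Options(1:2:end);
pairs_ok = mod(numel(Options), 2) == 0 && all(cellfun(@ischar, names));

% The same check on the cell array from the question flags the numeric 0
% that ends up in a name position.
Broken = {'anglenorth','FreqLabelAngle',0,'angleeast','FreqLabelAngle',90, ...
          'labels',{'N (0)','S (180)','E (90)','W (270)'}, ...
          'freqlabelangle',45,'nDirections',20,'nFreq',25,'LegendType',1};
broken_ok = mod(numel(Broken), 2) == 0 && ...
            all(cellfun(@ischar, Broken(1:2:end)));

% With WindRose on the path, the call itself stays as in the question:
% [figure_handle,count,speeds,directions,Table] = WindRose(dir,vel,Options);
```

The broken cell array has an even number of entries, so counting elements alone would not catch the problem. Checking the type of each name position does.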

Matlab - Plot normal distribution with unknown mean that is normally distributed with known parameters

How can I use Matlab to plot a univariate normal distribution whose mean is unknown but is itself normally distributed with known mean and variance?
E.g. N(mean, 4) with mean ~ N(2, 8)
Using the law of total probability, one can write
pdf(x) = int(pdf(x | mean) * pdf(mean) dmean)
So, we can calculate it in Matlab as follows:
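% define the constants (4 and 8 are treated as standard deviations here;
% use sqrt(4) and sqrt(8) instead if they denote variances)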
sigma_x = 4;
mu_mu = 2;
sigma_mu = 8;
% define the pdf of a normal distribution using the Symbolic Toolbox
% to be able to calculate the integral
syms x mu sigma
pdf(x, mu, sigma) = 1./sqrt(2*pi*sigma.^2) * exp(-(x-mu).^2/(2*sigma.^2));
% calculate the desired pdf
pdf_x(x) = int(pdf(x, mu, sigma_x) * pdf(mu, mu_mu, sigma_mu), mu, -Inf, Inf);
pdfCheck = int(pdf_x, x, -Inf, Inf) % should be one
% plot the desired pdf (green) and N(2, 4) as reference (red)
xs = -40:0.1:40;
figure
plot(xs, pdf(xs, mu_mu, sigma_x), 'r')
hold on
plot(xs, pdf_x(xs), 'g')
Note that I also checked that the integral of the calculated pdf is indeed equal to one, which is a necessary condition for being a pdf.
The green plot is the requested pdf. The red plot is added as reference and represents the pdf for a constant mean (equal to the average mean).
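The integral also has a well-known closed form, which gives a check that does not need the Symbolic Toolbox: if x given the mean is N(mean, sigma_x) and the mean is N(mu_mu, sigma_mu), then x is normal with mean mu_mu and standard deviation sqrt(sigma_x^2 + sigma_mu^2). The sketch below compares that closed form with numerical integration. It treats 4 and 8 as standard deviations, as the code above does, and npdf and sigma_tot are names made up for illustration.

```matlab
% Same constants as above (used as standard deviations)
sigma_x  = 4;
mu_mu    = 2;
sigma_mu = 8;

% Normal pdf as an anonymous function (illustrative helper)
npdf = @(x, m, s) exp(-(x - m).^2 ./ (2 * s.^2)) ./ (s * sqrt(2*pi));

% Closed form: the marginal is normal with the two variances added
sigma_tot  = sqrt(sigma_x^2 + sigma_mu^2);
xs         = -40:2:40;
pdf_closed = npdf(xs, mu_mu, sigma_tot);

% Numerical version of int(pdf(x | mean) * pdf(mean) dmean) at each x
pdf_numeric = arrayfun(@(x) integral( ...
    @(m) npdf(x, m, sigma_x) .* npdf(m, mu_mu, sigma_mu), -Inf, Inf), xs);

% Largest disagreement between the two
max_err = max(abs(pdf_numeric - pdf_closed));
```

If the closed form is all that is needed, plot(xs, pdf_closed) draws the requested pdf directly.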

Normalize histogram within the range of 0 to 1 [duplicate]

How to normalize a histogram such that the area under the probability density function is equal to 1?
My answer to this is the same as in an answer to your earlier question. For a probability density function, the integral over the entire space is 1. Dividing by the sum will not give you the correct density. To get the right density, you must divide by the area. To illustrate my point, try the following example.
[f, x] = hist(randn(10000, 1), 50); % Create histogram from a normal distribution.
g = 1 / sqrt(2 * pi) * exp(-0.5 * x .^ 2); % pdf of the normal distribution
% METHOD 1: DIVIDE BY SUM
figure(1)
bar(x, f / sum(f)); hold on
plot(x, g, 'r'); hold off
% METHOD 2: DIVIDE BY AREA
figure(2)
bar(x, f / trapz(x, f)); hold on
plot(x, g, 'r'); hold off
You can see for yourself which method agrees with the correct answer (red curve).
Another method (more straightforward than method 2) to normalize the histogram is to divide by sum(f * dx) which expresses the integral of the probability density function, i.e.
% METHOD 3: DIVIDE BY AREA USING sum()
figure(3)
dx = diff(x(1:2))
bar(x, f / sum(f * dx)); hold on
plot(x, g, 'r'); hold off
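The difference between the three methods can also be checked numerically rather than by eye. A small sketch, assuming equal-width bins as hist produces; the area_* names are made up for illustration.

```matlab
% Histogram of standard normal samples, as in the example above
data = randn(10000, 1);
[f, x] = hist(data, 50);
dx = x(2) - x(1);                            % common bin width

% Method 1: dividing by the sum gives bars whose total area is dx, not 1
area_method1 = sum(f / sum(f)) * dx;

% Method 2: dividing by the trapezoidal area gives unit trapz area
area_method2 = trapz(x, f / trapz(x, f));

% Method 3: dividing by sum(f * dx) gives bars whose total area is 1
area_method3 = sum(f / sum(f * dx)) * dx;
```

Method 1 is off by exactly the factor dx, which is why its bars sit below the red curve whenever the bins are narrower than 1.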
Since R2014b, Matlab has these normalization routines embedded natively in the histogram function (see the help file for the 6 normalization options this function offers). Here is an example using the PDF normalization (the total area of the bars is 1).
data = 2*randn(5000,1) + 5; % generate normal random (m=5, std=2)
h = histogram(data,'Normalization','pdf') % PDF normalization
The corresponding PDF is
Nbins = h.NumBins;
edges = h.BinEdges;
x = zeros(1,Nbins);
for counter=1:Nbins
midPointShift = abs(edges(counter)-edges(counter+1))/2;
x(counter) = edges(counter)+midPointShift;
end
mu = mean(data);
sigma = std(data);
f = exp(-(x-mu).^2./(2*sigma^2))./(sigma*sqrt(2*pi));
The two together give
hold on;
plot(x,f,'LineWidth',1.5)
An improvement that might very well be due to the success of the actual question and accepted answer!
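The midpoint loop in this answer can also be written without a loop. A minimal sketch, using a made-up edges vector so that it runs without a histogram object:

```matlab
% Hypothetical bin edges (12 equal-width bins); with a histogram object h
% these would come from h.BinEdges
edges = linspace(-3, 3, 13);
Nbins = numel(edges) - 1;

% Loop version, as in the answer
x_loop = zeros(1, Nbins);
for counter = 1:Nbins
    midPointShift = abs(edges(counter) - edges(counter+1)) / 2;
    x_loop(counter) = edges(counter) + midPointShift;
end

% Vectorized version: left edge plus half the bin width
x_vec = edges(1:end-1) + diff(edges) / 2;

% The two agree
max_diff = max(abs(x_loop - x_vec));
```

The vectorized form also works for unequal bin widths, since diff(edges) is evaluated per bin.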
EDIT - The use of hist and histc is no longer recommended; histogram should be used instead. Beware that none of the 6 ways of creating bins with this new function will produce the bins that hist and histc produce. There is a Matlab script to update former code to fit the way histogram is called (with bin edges instead of bin centers). By doing so, one can compare the pdf normalization methods of @abcd (trapz and sum) and Matlab (pdf).
The 3 pdf normalization methods give nearly identical results (within the range of eps).
TEST:
A = randn(10000,1);
centers = -6:0.5:6;
d = diff(centers)/2;
edges = [centers(1)-d(1), centers(1:end-1)+d, centers(end)+d(end)];
edges(2:end) = edges(2:end)+eps(edges(2:end));
figure;
subplot(2,2,1);
hist(A,centers);
title('HIST not normalized');
subplot(2,2,2);
h = histogram(A,edges);
title('HISTOGRAM not normalized');
subplot(2,2,3)
[counts, centers] = hist(A,centers); %get the count with hist
bar(centers,counts/trapz(centers,counts))
title('HIST with PDF normalization');
subplot(2,2,4)
h = histogram(A,edges,'Normalization','pdf')
title('HISTOGRAM with PDF normalization');
dx = diff(centers(1:2))
normalization_difference_trapz = abs(counts/trapz(centers,counts) - h.Values);
normalization_difference_sum = abs(counts/sum(counts*dx) - h.Values);
max(normalization_difference_trapz)
max(normalization_difference_sum)
The maximum difference between the new PDF normalization and the former one is 5.5511e-17.
hist can not only plot a histogram but also return the count of elements in each bin, so you can get that count, normalize it by dividing each bin by the total, and plot the result using bar. Example:
Y = rand(10,1);
C = hist(Y);
C = C ./ sum(C);
bar(C)
or if you want a one-liner:
bar(hist(Y) ./ sum(hist(Y)))
Documentation:
hist
bar
Edit: This solution answers the question of how to make the sum of all bins equal to 1. This approximation is valid only if your bin size is small relative to the variance of your data. The sum used here corresponds to a simple quadrature formula; more complex ones can be used, like trapz as proposed by R. M.
[f,x]=hist(data)
The area of each individual bar is height*width. Since MATLAB chooses equidistant points for the bars, the width is:
delta_x = x(2) - x(1)
Now if we sum up all the individual bars the total area will come out as
A=sum(f)*delta_x
So the correctly scaled plot is obtained by
bar(x, f/sum(f)/(x(2)-x(1)))
The area of abcd's PDF is not one, which is impossible, as pointed out in many comments.
Assumptions made in many answers here:
Assume a constant distance between consecutive edges.
The probability under the pdf should be 1. The normalization should be done as Normalization with probability, not as Normalization with pdf, in histogram() and hist().
Fig. 1 Output of hist() approach, Fig. 2 Output of histogram() approach
The maximum amplitude differs between the two approaches, which suggests that there is some mistake in hist()'s approach, because histogram()'s approach uses the standard normalization.
I assume the mistake with hist()'s approach here is that the normalization is done partially as pdf, not completely as probability.
Code with hist() [deprecated]
Some remarks
First check: sum(f)/N gives 1 if Nbins is set manually.
The pdf must be multiplied by the width of the bin (dx) in the curve g.
Code
%http://stackoverflow.com/a/5321546/54964
N=10000;
Nbins=50;
[f,x]=hist(randn(N,1),Nbins); % create histogram from ND
%METHOD 4: Count Densities, not Sums!
figure(3)
dx=diff(x(1:2)); % width of bin
g=1/sqrt(2*pi)*exp(-0.5*x.^2) .* dx; % pdf of ND with dx
% 1.0000
bar(x, f/sum(f));hold on
plot(x,g,'r');hold off
Output is in Fig. 1.
Code with histogram()
Some remarks
First check: a) sum(f) is 1 if Nbins is chosen by histogram() with Normalization set to probability, b) sum(f)/N is 1 if Nbins is set manually without normalization.
The pdf must be multiplied by the width of the bin (dx) in the curve g.
Code
%%METHOD 5: with histogram()
% http://stackoverflow.com/a/38809232/54964
N=10000;
figure(4);
h = histogram(randn(N,1), 'Normalization', 'probability') % hist() deprecated!
Nbins=h.NumBins;
edges=h.BinEdges;
x=zeros(1,Nbins);
f=h.Values;
for counter=1:Nbins
midPointShift=abs(edges(counter)-edges(counter+1))/2; % same constant for all
x(counter)=edges(counter)+midPointShift;
end
dx=diff(x(1:2)); % constant for all
g=1/sqrt(2*pi)*exp(-0.5*x.^2) .* dx; % pdf of ND
% Use if Nbins manually set
%new_area=sum(f)/N % diff of consecutive edges constant
% Use if histogram() Normalization probability
new_area=sum(f)
% 1.0000
% No bar() needed here with histogram() Normalization probability
hold on;
plot(x,g,'r');hold off
Output in Fig. 2 and expected output is met: area 1.0000.
Matlab: 2016a
System: Linux Ubuntu 16.04 64 bit
Linux kernel 4.6
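The two normalizations discussed in this thread differ only by the bin width, which may help reconcile the answers: 'probability' heights sum to 1 and are comparable to g .* dx, while 'pdf' heights have unit area and are comparable to g itself. A small sketch using hist and equal-width bins; p_prob, p_pdf and the dev_* names are made up for illustration.

```matlab
% Histogram of standard normal samples
N = 10000;
[f, x] = hist(randn(N, 1), 50);
dx = x(2) - x(1);                       % common bin width

% 'probability'-style heights: fraction of samples per bin
p_prob = f / sum(f);

% 'pdf'-style heights: the same fractions divided by the bin width
p_pdf = p_prob / dx;

% Standard normal density at the bin centers
g = exp(-0.5 * x.^2) / sqrt(2*pi);

sum_prob = sum(p_prob);                 % heights sum to 1
area_pdf = sum(p_pdf) * dx;             % bar area is 1

% Each set of heights is compared with the matching reference curve
dev_prob = max(abs(p_prob - g * dx));
dev_pdf  = max(abs(p_pdf - g));
```

Either pairing is consistent as long as the bars and the curve use the same scaling. Mixing them (for example f/sum(f) against g without dx) is what produces the mismatch in Method 1.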
For some distributions (Cauchy, I think), I have found that trapz will overestimate the area, so the pdf changes depending on the number of bins you select. In that case I do
[N,h]=hist(q_f./theta,30000); % there is a large range but most of the bins will be empty
plot(h,N/(sum(N)*mean(diff(h))),'+r')
There is an excellent three-part guide to Histogram Adjustments in MATLAB (broken original link, archive.org link); the first part is on Histogram Stretching.
