Shouldn't fitdist(data,'Lognormal') give the same result and plot as fitdist(log(data),'Normal') in Matlab? - matlab

I am trying to fit a distribution curve to the histogram of some data. (I have used some model data here instead because it is difficult to upload the actual data. I have included the complete code after my question.)
Because the histogram looks normally distributed when I plotted the x-axis in logscale, I transform the data first before fitting a normal distribution to it and I got the following results:
>>pdn=fitdist(log(data),'Normal')
pdn =
Normal distribution
mu = -0.334458 [-0.34704, -0.321876]
sigma = 0.351478 [0.342804, 0.360605]
When I plotted out the pdf with the histogram, I got this:
The result seems reasonable to me. Then I discovered that in the Matlab fitdist(), it already has a 'Lognormal' option and I don't really need the transform my data first and this is what I got:
>>pdln = fitdist(data,'Lognormal')
pdln =
Lognormal distribution
mu = -0.334458 [-0.34704, -0.321876]
sigma = 0.351478 [0.342804, 0.360605]
Exactly the same mean and standard deviation as I have got before. However, when I plotted it out with the histogram, I got a different curve:
This curve fits better to the data but the positions of the mean and the mean+/-std points are not as I have expected (i.e. mean at the peak and the mean+/-std at the same levels).
Which come to my question, why would fitdist(data,'Lognormal') give the same result as fitdist(log(data),'Normal') but a different plot? I have looked through the Matlab help pages and I still could not understand why, or where are my mistakes, please help.
My aim for all this is to get some numerical parameters about the distributions of my data under different conditions and compare them to see if there is any difference. At the moment, I am not certain which way would give me reliable estimates of the means and standard deviations.
The code for the graphs is below:
%random data in lognormal distribution
mu=-0.335742;
sigma=0.35228;
data=lognrnd(mu,sigma,[3000 1]);
%make histogram
interval=0.1;
svalue=sort(data);
bx(1)=interval/2;
i=2;
while bx(i-1)<=max(svalue)
bx(i)=bx(i-1)+interval;
i=i+1;
end
by=hist(svalue,bx);
subplot(211)
h = bar(bx,by,'hist');
set(h,'FaceColor',[.9 .9 .9]);
set(gca,'xlim',[0.05 10]);
xticks=[0.05 0.1 0.2 0.5 1 2 5 10];
set(gca,'xscale','log','xminortick','on')
set(gca,'xtick',xticks)
ylabel('counts')
subplot(212)
h = bar(bx,by,'hist');
set(h,'FaceColor',[.9 .9 .9]);
set(gca,'xlim',[0.05 10]);
xticks=[0.05 0.1 0.2 0.5 1 2 5 10];
set(gca,'xscale','log','xminortick','on')
set(gca,'xtick',xticks)
ylabel('counts')
% fit distribution curves
pdf_x = 0:0.01:max(data);
max_by=max(by); % for scaling the pdf to the histogram
% case 1 - PDF fitted using fitdist(log(data),'Normal')
subplot(211)
hold on
pdn = fitdist(log(data),'Normal')
pdf_y = pdf(pdn,log(pdf_x));
h1=plot(pdf_x,pdf_y./max(pdf_y).*max_by,'-k');
range=[exp(pdn.mu-pdn.sigma) exp(pdn.mu+pdn.sigma)];
h2=plot(exp(pdn.mu),pdf(pdn,(pdn.mu))./max(pdf_y).*max_by,'sk') ;
h3=plot(range,pdf(pdn,log(range))./max(pdf_y).*max_by,'ok') ;
title('PDF fitted using fitdist(log(data),''Normal'')');
legend([h1 h2 h3],'pdf','mean','meam+/-std');
% case 2 - PDF fitted using fitdist(data,'Lognormal')
subplot(212)
hold on
pdln = fitdist(data,'Lognormal')
pdf_y = pdf(pdln,pdf_x);
h1=plot(pdf_x,pdf_y./max(pdf_y).*max_by,'-b');
range=[exp(pdln.mu-pdln.sigma) exp(pdln.mu+pdln.sigma)];
h2=plot(exp(pdln.mu),pdf(pdln,exp(pdln.mu))./max(pdf_y).*max_by,'sb');
h3=plot(range,pdf(pdln,range)./max(pdf_y).*max_by,'ob') ;
title('PDF fitted using fitdist(data,''Lognormal'')');
legend([h1 h2 h3],'pdf','mean','meam+/-std');

Related

Normalize histogram within the range of 0 to 1 [duplicate]

How to normalize a histogram such that the area under the probability density function is equal to 1?
My answer to this is the same as in an answer to your earlier question. For a probability density function, the integral over the entire space is 1. Dividing by the sum will not give you the correct density. To get the right density, you must divide by the area. To illustrate my point, try the following example.
[f, x] = hist(randn(10000, 1), 50); % Create histogram from a normal distribution.
g = 1 / sqrt(2 * pi) * exp(-0.5 * x .^ 2); % pdf of the normal distribution
% METHOD 1: DIVIDE BY SUM
figure(1)
bar(x, f / sum(f)); hold on
plot(x, g, 'r'); hold off
% METHOD 2: DIVIDE BY AREA
figure(2)
bar(x, f / trapz(x, f)); hold on
plot(x, g, 'r'); hold off
You can see for yourself which method agrees with the correct answer (red curve).
Another method (more straightforward than method 2) to normalize the histogram is to divide by sum(f * dx) which expresses the integral of the probability density function, i.e.
% METHOD 3: DIVIDE BY AREA USING sum()
figure(3)
dx = diff(x(1:2))
bar(x, f / sum(f * dx)); hold on
plot(x, g, 'r'); hold off
Since 2014b, Matlab has these normalization routines embedded natively in the histogram function (see the help file for the 6 routines this function offers). Here is an example using the PDF normalization (the sum of all the bins is 1).
data = 2*randn(5000,1) + 5; % generate normal random (m=5, std=2)
h = histogram(data,'Normalization','pdf') % PDF normalization
The corresponding PDF is
Nbins = h.NumBins;
edges = h.BinEdges;
x = zeros(1,Nbins);
for counter=1:Nbins
midPointShift = abs(edges(counter)-edges(counter+1))/2;
x(counter) = edges(counter)+midPointShift;
end
mu = mean(data);
sigma = std(data);
f = exp(-(x-mu).^2./(2*sigma^2))./(sigma*sqrt(2*pi));
The two together gives
hold on;
plot(x,f,'LineWidth',1.5)
An improvement that might very well be due to the success of the actual question and accepted answer!
EDIT - The use of hist and histc is not recommended now, and histogram should be used instead. Beware that none of the 6 ways of creating bins with this new function will produce the bins hist and histc produce. There is a Matlab script to update former code to fit the way histogram is called (bin edges instead of bin centers - link). By doing so, one can compare the pdf normalization methods of #abcd (trapz and sum) and Matlab (pdf).
The 3 pdf normalization method give nearly identical results (within the range of eps).
TEST:
A = randn(10000,1);
centers = -6:0.5:6;
d = diff(centers)/2;
edges = [centers(1)-d(1), centers(1:end-1)+d, centers(end)+d(end)];
edges(2:end) = edges(2:end)+eps(edges(2:end));
figure;
subplot(2,2,1);
hist(A,centers);
title('HIST not normalized');
subplot(2,2,2);
h = histogram(A,edges);
title('HISTOGRAM not normalized');
subplot(2,2,3)
[counts, centers] = hist(A,centers); %get the count with hist
bar(centers,counts/trapz(centers,counts))
title('HIST with PDF normalization');
subplot(2,2,4)
h = histogram(A,edges,'Normalization','pdf')
title('HISTOGRAM with PDF normalization');
dx = diff(centers(1:2))
normalization_difference_trapz = abs(counts/trapz(centers,counts) - h.Values);
normalization_difference_sum = abs(counts/sum(counts*dx) - h.Values);
max(normalization_difference_trapz)
max(normalization_difference_sum)
The maximum difference between the new PDF normalization and the former one is 5.5511e-17.
hist can not only plot an histogram but also return you the count of elements in each bin, so you can get that count, normalize it by dividing each bin by the total and plotting the result using bar. Example:
Y = rand(10,1);
C = hist(Y);
C = C ./ sum(C);
bar(C)
or if you want a one-liner:
bar(hist(Y) ./ sum(hist(Y)))
Documentation:
hist
bar
Edit: This solution answers the question How to have the sum of all bins equal to 1. This approximation is valid only if your bin size is small relative to the variance of your data. The sum used here correspond to a simple quadrature formula, more complex ones can be used like trapz as proposed by R. M.
[f,x]=hist(data)
The area for each individual bar is height*width. Since MATLAB will choose equidistant points for the bars, so the width is:
delta_x = x(2) - x(1)
Now if we sum up all the individual bars the total area will come out as
A=sum(f)*delta_x
So the correctly scaled plot is obtained by
bar(x, f/sum(f)/(x(2)-x(1)))
The area of abcd`s PDF is not one, which is impossible like pointed out in many comments.
Assumptions done in many answers here
Assume constant distance between consecutive edges.
Probability under pdf should be 1. The normalization should be done as Normalization with probability, not as Normalization with pdf, in histogram() and hist().
Fig. 1 Output of hist() approach, Fig. 2 Output of histogram() approach
The max amplitude differs between two approaches which proposes that there are some mistake in hist()'s approach because histogram()'s approach uses the standard normalization.
I assume the mistake with hist()'s approach here is about the normalization as partially pdf, not completely as probability.
Code with hist() [deprecated]
Some remarks
First check: sum(f)/N gives 1 if Nbins manually set.
pdf requires the width of the bin (dx) in the graph g
Code
%http://stackoverflow.com/a/5321546/54964
N=10000;
Nbins=50;
[f,x]=hist(randn(N,1),Nbins); % create histogram from ND
%METHOD 4: Count Densities, not Sums!
figure(3)
dx=diff(x(1:2)); % width of bin
g=1/sqrt(2*pi)*exp(-0.5*x.^2) .* dx; % pdf of ND with dx
% 1.0000
bar(x, f/sum(f));hold on
plot(x,g,'r');hold off
Output is in Fig. 1.
Code with histogram()
Some remarks
First check: a) sum(f) is 1 if Nbins adjusted with histogram()'s Normalization as probability, b) sum(f)/N is 1 if Nbins is manually set without normalization.
pdf requires the width of the bin (dx) in the graph g
Code
%%METHOD 5: with histogram()
% http://stackoverflow.com/a/38809232/54964
N=10000;
figure(4);
h = histogram(randn(N,1), 'Normalization', 'probability') % hist() deprecated!
Nbins=h.NumBins;
edges=h.BinEdges;
x=zeros(1,Nbins);
f=h.Values;
for counter=1:Nbins
midPointShift=abs(edges(counter)-edges(counter+1))/2; % same constant for all
x(counter)=edges(counter)+midPointShift;
end
dx=diff(x(1:2)); % constast for all
g=1/sqrt(2*pi)*exp(-0.5*x.^2) .* dx; % pdf of ND
% Use if Nbins manually set
%new_area=sum(f)/N % diff of consecutive edges constant
% Use if histogarm() Normalization probability
new_area=sum(f)
% 1.0000
% No bar() needed here with histogram() Normalization probability
hold on;
plot(x,g,'r');hold off
Output in Fig. 2 and expected output is met: area 1.0000.
Matlab: 2016a
System: Linux Ubuntu 16.04 64 bit
Linux kernel 4.6
For some Distributions, Cauchy I think, I have found that trapz will overestimate the area, and so the pdf will change depending on the number of bins you select. In which case I do
[N,h]=hist(q_f./theta,30000); % there Is a large range but most of the bins will be empty
plot(h,N/(sum(N)*mean(diff(h))),'+r')
There is an excellent three part guide for Histogram Adjustments in MATLAB (broken original link, archive.org link),
the first part is on Histogram Stretching.

MATLAB quickie: How to plot markers on a freqs plot?

I haven't used MATLAB in a while and I am stuck on a small detail. I would really appreciate it if someone could help me out!
So I am trying to plot a transfer function using a specific function called freqs but I can't figure out how I can label specific points on the graph.
b = [0 0 10.0455]; % Numerator coefficients
a = [(1/139344) (1/183.75) 1]; % Denominator coefficients
w = logspace(-3,5); % Frequency vector
freqs(b,a,w)
grid on
I want to mark values at points x=600 Hz and 7500 Hz with a marker or to be more specific, points (600,20) and (7500,-71), both of which should lie on the curve. For some reason, freqs doesn't let me do that.
freqs is very limited when you want to rely on it plotting the frequency response for you. Basically, you have no control on how to modify the graph on top of what MATLAB generates for you.
Instead, generate the output response in a vector yourself, then plot the magnitude and phase of the output yourself so that you have full control. If you specify an output when calling freqs, you will get the response of the system.
With this, you can find the magnitude of the output by abs and the phase by angle. BTW, (600,20) and (7500,-71) make absolutely no sense unless you're talking about magnitude in dB.... which I will assume is the case for the moment.
As such, we can reproduce the plot that freqs gives by the following. The key is to use semilogx to get a semi-logarithmic graph on the x-axis. On top of this, declare those points that you want to mark on the magnitude, so (600,20) and (7500,-71):
%// Your code:
b = [0 0 10.0455]; % Numerator coefficients
a = [(1/139344) (1/183.75) 1]; % Denominator coefficients
w = logspace(-3,5); % Frequency vector
%// New code
h = freqs(b,a,w); %// Output of freqs
mag = 20*log10(abs(h)); %// Magnitude in dB
pha = (180/pi)*angle(h); %// Phase in degrees
%// Declare points
wpt = [600, 7500];
mpt = [20, -71];
%// Plot the magnitude as well as markers
figure;
subplot(2,1,1);
semilogx(w, mag, wpt, mpt, 'r.');
xlabel('Frequency');
ylabel('Magnitude (dB)');
grid;
%// Plot phase
subplot(2,1,2);
semilogx(w, pha);
xlabel('Frequency');
ylabel('Phase (Degrees)');
grid;
We get this:
If you check what freqs generates for you, you'll see that we get the same thing, but the magnitude is in gain (V/V) instead of dB. If you want it in V/V, then just plot the magnitude without the 20*log10() call. Using your data, the markers I plotted are not on the graph (wpt and mpt), so adjust the points to whatever you see fit.
There are a couple issues before we attempt to answer your question. First, there is no data-point at 600Hz or 7500Hz. These frequencies fall between data-points when graphed using the freqs command. See the image below, with datatips added interactively. I copy-pasted your code to generate this data.
Second, it does not appear that either (600,20) or (7500,-71) lie on the curves, at least with the data as you entered above.
One solution is to use plot a marker on the desired position, and use a "text" object to add a string describing the point. I put together a script using your data, to generate this figure:
The code is as follows:
b = [0 0 10.0455];
a = [(1/139344) (1/183.75) 1];
w = logspace(-3,5);
freqs(b,a,w)
grid on
figureHandle = gcf;
figureChildren = get ( figureHandle , 'children' ); % The children this returns may vary.
axes1Handle = figureChildren(1);
axes2Handle = figureChildren(2);
axes1Children = get(axes1Handle,'children'); % This should be a "line" object.
axes2Children = get(axes2Handle,'children'); % This should be a "line" object.
axes1XData = get(axes1Children,'xdata');
axes1YData = get(axes1Children,'ydata');
axes2XData = get(axes2Children,'xdata');
axes2YData = get(axes2Children,'ydata');
hold(axes1Handle,'on');
plot(axes1Handle,axes1XData(40),axes1YData(40),'m*');
pointString1 = ['(',num2str(axes1XData(40)),',',num2str(axes1YData(40)),')'];
handleText1 = text(axes1XData(40),axes1YData(40),pointString1,'parent',axes1Handle);
hold(axes2Handle,'on');
plot(axes2Handle,axes2XData(40),axes2YData(40),'m*');
pointString2 = ['(',num2str(axes2XData(40)),',',num2str(axes2YData(40)),')'];
handleText2 = text(axes2XData(40),axes2YData(40),pointString2,'parent',axes2Handle);

Histogram (hist) not starting (and ending) in zero

I'm using the Matlab function "hist" to estimate the probability density function of a realization of a random process I have.
I'm actually:
1) taking the histogram of h0
2) normalizing its area in order to get 1
3) plotting the normalized curve.
The problem is that, no matter how many bins I use, the histogram never start from 0 and never go back to 0 whereas I would really like that kind of behavior.
The code I use is the following:
Nbin = 36;
[n,x0] = hist(h0,Nbin);
edge = find(n~=0,1,'last');
Step = x0(edge)/Nbin;
Scale_factor = sum(Step*n);
PDF_h0 = n/Scale_factor;
hist(h0 ,Nbin) %plot the histogram
figure;
plot(a1,p_rice); %plot the theoretical curve in blue
hold on;
plot(x0, PDF_h0,'red'); %plot the normalized curve obtained from the histogram
And the plots I get are:
If your problem is that the plotted red curve does not go to zero: you can solve that adding initial and final points with y-axis value 0. It seems from your code that the x-axis separation is Step, so it would be:
plot([x0(1)-Step x0 x0(end)+Step], [0 PDF_h0 0], 'red')

Relative Frequency Histograms and Probability Density Functions

The function called DicePlot simulates rolling 10 dice 5000 times.
The function calculates the sum of values of the 10 dice of each roll, which will be a 1 ⇥ 5000 vector, and plot relative frequency histogram with edges of bins being selected in where each bin in the histogram represents a possible value of for the sum of the dice.
The mean and standard deviation of the 1 ⇥ 5000 sums of dice values will be computed, and the probability density function of normal distribution (with the mean and standard deviation computed) on top of the relative frequency histogram will be plotted.
Below is my code so far - What am I doing wrong? The graph shows up but not the extra red line on top? I looked at answers like this, and I don't think I'll be plotting anything like the Gaussian function.
% function[]= DicePlot()
for roll=1:5000
diceValues = randi(6,[1, 10]);
SumDice(roll) = sum(diceValues);
end
distr=zeros(1,6*10);
for i = 10:60
distr(i)=histc(SumDice,i);
end
bar(distr,1)
Y = normpdf(X)
xlabel('sum of dice values')
ylabel('relative frequency')
title(['NumDice = ',num2str(NumDice),' , NumRolls = ',num2str(NumRolls)]);
end
It is supposed to look like
But it looks like
The red line is not there because you aren't plotting it. Look at the documentation for normpdf. It computes the pdf, it doesn't plot it. So you problem is how do you add this line to the plot. The answer to that problem is to google "matlab hold on".
Here's some code to get you going in the right direction:
% Normalize your distribution
normalizedDist = distr/sum(distr);
bar(normalizedDist ,1);
hold on
% Setup your density function using the mean and std of your sample data
mu = mean(SumDice);
stdv = std(SumDice);
yy = normpdf(xx,mu,stdv);
xx = linspace(0,60);
% Plot pdf
h = plot(xx,yy,'r'); set(h,'linewidth',1.5);

How to normalize a histogram in MATLAB?

How to normalize a histogram such that the area under the probability density function is equal to 1?
My answer to this is the same as in an answer to your earlier question. For a probability density function, the integral over the entire space is 1. Dividing by the sum will not give you the correct density. To get the right density, you must divide by the area. To illustrate my point, try the following example.
[f, x] = hist(randn(10000, 1), 50); % Create histogram from a normal distribution.
g = 1 / sqrt(2 * pi) * exp(-0.5 * x .^ 2); % pdf of the normal distribution
% METHOD 1: DIVIDE BY SUM
figure(1)
bar(x, f / sum(f)); hold on
plot(x, g, 'r'); hold off
% METHOD 2: DIVIDE BY AREA
figure(2)
bar(x, f / trapz(x, f)); hold on
plot(x, g, 'r'); hold off
You can see for yourself which method agrees with the correct answer (red curve).
Another method (more straightforward than method 2) to normalize the histogram is to divide by sum(f * dx) which expresses the integral of the probability density function, i.e.
% METHOD 3: DIVIDE BY AREA USING sum()
figure(3)
dx = diff(x(1:2))
bar(x, f / sum(f * dx)); hold on
plot(x, g, 'r'); hold off
Since 2014b, Matlab has these normalization routines embedded natively in the histogram function (see the help file for the 6 routines this function offers). Here is an example using the PDF normalization (the sum of all the bins is 1).
data = 2*randn(5000,1) + 5; % generate normal random (m=5, std=2)
h = histogram(data,'Normalization','pdf') % PDF normalization
The corresponding PDF is
Nbins = h.NumBins;
edges = h.BinEdges;
x = zeros(1,Nbins);
for counter=1:Nbins
midPointShift = abs(edges(counter)-edges(counter+1))/2;
x(counter) = edges(counter)+midPointShift;
end
mu = mean(data);
sigma = std(data);
f = exp(-(x-mu).^2./(2*sigma^2))./(sigma*sqrt(2*pi));
The two together gives
hold on;
plot(x,f,'LineWidth',1.5)
An improvement that might very well be due to the success of the actual question and accepted answer!
EDIT - The use of hist and histc is not recommended now, and histogram should be used instead. Beware that none of the 6 ways of creating bins with this new function will produce the bins hist and histc produce. There is a Matlab script to update former code to fit the way histogram is called (bin edges instead of bin centers - link). By doing so, one can compare the pdf normalization methods of #abcd (trapz and sum) and Matlab (pdf).
The 3 pdf normalization method give nearly identical results (within the range of eps).
TEST:
A = randn(10000,1);
centers = -6:0.5:6;
d = diff(centers)/2;
edges = [centers(1)-d(1), centers(1:end-1)+d, centers(end)+d(end)];
edges(2:end) = edges(2:end)+eps(edges(2:end));
figure;
subplot(2,2,1);
hist(A,centers);
title('HIST not normalized');
subplot(2,2,2);
h = histogram(A,edges);
title('HISTOGRAM not normalized');
subplot(2,2,3)
[counts, centers] = hist(A,centers); %get the count with hist
bar(centers,counts/trapz(centers,counts))
title('HIST with PDF normalization');
subplot(2,2,4)
h = histogram(A,edges,'Normalization','pdf')
title('HISTOGRAM with PDF normalization');
dx = diff(centers(1:2))
normalization_difference_trapz = abs(counts/trapz(centers,counts) - h.Values);
normalization_difference_sum = abs(counts/sum(counts*dx) - h.Values);
max(normalization_difference_trapz)
max(normalization_difference_sum)
The maximum difference between the new PDF normalization and the former one is 5.5511e-17.
hist can not only plot an histogram but also return you the count of elements in each bin, so you can get that count, normalize it by dividing each bin by the total and plotting the result using bar. Example:
Y = rand(10,1);
C = hist(Y);
C = C ./ sum(C);
bar(C)
or if you want a one-liner:
bar(hist(Y) ./ sum(hist(Y)))
Documentation:
hist
bar
Edit: This solution answers the question How to have the sum of all bins equal to 1. This approximation is valid only if your bin size is small relative to the variance of your data. The sum used here correspond to a simple quadrature formula, more complex ones can be used like trapz as proposed by R. M.
[f,x]=hist(data)
The area for each individual bar is height*width. Since MATLAB will choose equidistant points for the bars, so the width is:
delta_x = x(2) - x(1)
Now if we sum up all the individual bars the total area will come out as
A=sum(f)*delta_x
So the correctly scaled plot is obtained by
bar(x, f/sum(f)/(x(2)-x(1)))
The area of abcd`s PDF is not one, which is impossible like pointed out in many comments.
Assumptions done in many answers here
Assume constant distance between consecutive edges.
Probability under pdf should be 1. The normalization should be done as Normalization with probability, not as Normalization with pdf, in histogram() and hist().
Fig. 1 Output of hist() approach, Fig. 2 Output of histogram() approach
The max amplitude differs between two approaches which proposes that there are some mistake in hist()'s approach because histogram()'s approach uses the standard normalization.
I assume the mistake with hist()'s approach here is about the normalization as partially pdf, not completely as probability.
Code with hist() [deprecated]
Some remarks
First check: sum(f)/N gives 1 if Nbins manually set.
pdf requires the width of the bin (dx) in the graph g
Code
%http://stackoverflow.com/a/5321546/54964
N=10000;
Nbins=50;
[f,x]=hist(randn(N,1),Nbins); % create histogram from ND
%METHOD 4: Count Densities, not Sums!
figure(3)
dx=diff(x(1:2)); % width of bin
g=1/sqrt(2*pi)*exp(-0.5*x.^2) .* dx; % pdf of ND with dx
% 1.0000
bar(x, f/sum(f));hold on
plot(x,g,'r');hold off
Output is in Fig. 1.
Code with histogram()
Some remarks
First check: a) sum(f) is 1 if Nbins adjusted with histogram()'s Normalization as probability, b) sum(f)/N is 1 if Nbins is manually set without normalization.
pdf requires the width of the bin (dx) in the graph g
Code
%%METHOD 5: with histogram()
% http://stackoverflow.com/a/38809232/54964
N=10000;
figure(4);
h = histogram(randn(N,1), 'Normalization', 'probability') % hist() deprecated!
Nbins=h.NumBins;
edges=h.BinEdges;
x=zeros(1,Nbins);
f=h.Values;
for counter=1:Nbins
midPointShift=abs(edges(counter)-edges(counter+1))/2; % same constant for all
x(counter)=edges(counter)+midPointShift;
end
dx=diff(x(1:2)); % constast for all
g=1/sqrt(2*pi)*exp(-0.5*x.^2) .* dx; % pdf of ND
% Use if Nbins manually set
%new_area=sum(f)/N % diff of consecutive edges constant
% Use if histogarm() Normalization probability
new_area=sum(f)
% 1.0000
% No bar() needed here with histogram() Normalization probability
hold on;
plot(x,g,'r');hold off
Output in Fig. 2 and expected output is met: area 1.0000.
Matlab: 2016a
System: Linux Ubuntu 16.04 64 bit
Linux kernel 4.6
For some Distributions, Cauchy I think, I have found that trapz will overestimate the area, and so the pdf will change depending on the number of bins you select. In which case I do
[N,h]=hist(q_f./theta,30000); % there Is a large range but most of the bins will be empty
plot(h,N/(sum(N)*mean(diff(h))),'+r')
There is an excellent three part guide for Histogram Adjustments in MATLAB (broken original link, archive.org link),
the first part is on Histogram Stretching.