I'm having trouble fitting a PDF to a histogram in MATLAB. I'm using gmdistribution.fit because my data is multimodal. This is what I have done:
data=[0.35*randn(1,100000), 0.5*randn(1,100000)+5, 1*randn(1,100000)+3]'; %multimodal data
x=min(data):(max(data)-min(data))/10000:max(data);
%Normalized Histogram
[counts,edges]=histcounts(data,500, 'Normalization', 'pdf');
bw=edges(2)-edges(1);
centers=edges(1:end-1)+bw/2; %bin centers sit half a bin width past the left edges
H = bar(centers,counts,'hist');
hold on
%Fitting with gmdistribution
rng default
obj=gmdistribution.fit(data,3,'Replicates',5);
%the PDF
PDF=zeros(1,length(x));
for i=1:obj.NumComponents
    k=obj.ComponentProportion(i); %mixing weight of component i
    u=obj.mu(i);                  %mean of component i
    sigma=obj.Sigma(i);
    PDF=PDF+k*normpdf(x,u,sigma);
end
PDF=PDF/trapz(x,PDF); %normalization (just in case)
plot(x,PDF)
%Fitting with ksdensity (for comparison)
[PDF2,xi]=ksdensity(data,x);
plot(x,PDF2)
legend('Normalized Histogram','gmdistribution','ksdensity')
[Figure: normalized histogram with the gmdistribution and ksdensity PDFs]
As you can see, the Gaussian mixture doesn't fit the histogram properly, while the PDF from the ksdensity function fits much better. I also tried fitting a single Gaussian. If you run the same code as above, using
data=[0.35*randn(1,100000)]';
and
obj=gmdistribution.fit(data,1,'Replicates',5);
you get the following
[Figure: normalized histogram and PDFs for a single Gaussian]
Again, the PDF from gmdistribution doesn't fit the histogram. The problem seems to be related to the scaling factor used to generate the data (the 0.35). What am I doing wrong?
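The Sigma property of the gmdistribution object holds the covariance (for 1-D data, the variance), whereas normpdf expects the standard deviation. The fix is to replace normpdf(x,u,sigma) with normpdf(x,u,sqrt(sigma)) in the for loop. This is also why the 0.35 factor exposes the problem so clearly: the fitted variance is about 0.35^2 ≈ 0.12, so passing it as a standard deviation makes the curve roughly three times too narrow.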
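A minimal sketch of the corrected computation, regenerating the question's data. It assumes the Statistics and Machine Learning Toolbox and uses fitgmdist, the function that replaces gmdistribution.fit in newer MATLAB releases. It also cross-checks the hand-built sum against the object's own pdf method, which avoids the variance/standard-deviation mix-up entirely.

```matlab
% Sketch: corrected Gaussian-mixture PDF (assumes Statistics and Machine Learning Toolbox).
rng default
data = [0.35*randn(1,100000), 0.5*randn(1,100000)+5, 1*randn(1,100000)+3]';
x = linspace(min(data), max(data), 10001);

% fitgmdist is the newer name for gmdistribution.fit
obj = fitgmdist(data, 3, 'Replicates', 5);

PDF = zeros(size(x));
for i = 1:obj.NumComponents
    k     = obj.ComponentProportion(i);   % mixing weight
    u     = obj.mu(i);                    % component mean
    sigma = sqrt(obj.Sigma(:,:,i));       % Sigma stores the variance; normpdf wants the std
    PDF   = PDF + k*normpdf(x, u, sigma);
end

% The gmdistribution object can evaluate its own density directly;
% for a 1-D model the query points must be an n-by-1 column.
PDF2 = pdf(obj, x');
```

With the square root in place, no renormalization with trapz is needed: the mixture already integrates to 1 over the data range.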
Related
I have a data set in Excel, which I imported into MATLAB to draw a histogram and add a Gaussian fit. My code is below:
vData = xlsread("2.xlsx");
figure(1);
hHist = histogram(vData, -2.7:0.001:-2.4);
When I run my code, I get a histogram like this:
[Figure: histogram of my data]
To fit a Gaussian to my histogram, I added this code:
figure(2);
histfit(vData); % I'm not sure this is the right fitting method
But the result looks like this:
[Figure: histfit result on the histogram]
I suspect the bin width and bin edges are not appropriate for my data, because my data is clustered around -2.5.
histfit doesn't take a bin-width or bin-edge parameter (only a number of bins), so I don't think I can use it.
How can I get an appropriate Gaussian fit to my histogram? Thank you for your help.
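One possible approach, sketched under assumptions: keep the custom bin edges in histogram, normalize the bars to a PDF, fit the normal parameters separately with fitdist, and overlay the fitted density. vData below is hypothetical synthetic data centered on -2.5, standing in for the Excel column.

```matlab
% Sketch: Gaussian fit overlaid on a histogram with custom bin edges.
% vData is HYPOTHETICAL synthetic data standing in for the Excel column.
rng default
vData = -2.5 + 0.03*randn(5000,1);

edges = -2.7:0.001:-2.4;                            % bin edges from the question
figure;
histogram(vData, edges, 'Normalization', 'pdf');    % bar areas sum to 1
hold on

pd = fitdist(vData, 'Normal');                      % maximum-likelihood mu and sigma
xg = linspace(edges(1), edges(end), 1000);
plot(xg, pdf(pd, xg), 'r', 'LineWidth', 1.5);       % fitted density on the same scale
hold off
legend('Histogram (pdf-normalized)', ...
       sprintf('Normal fit: \\mu=%.4f, \\sigma=%.4f', pd.mu, pd.sigma));
```

Because the fit comes from the raw data rather than from the bars, the choice of bin edges only affects the display, not the fitted parameters.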
I have an image I whose pixel intensities fall within the range 0-1. I can calculate the image histogram after normalizing it, but I found that the curve is not exactly the same as the histogram of the raw data. This causes problems for the later peak-finding step (see the two attached images).
My question: is there any way in MATLAB to plot the image histogram without normalizing the data, so that the curve shape stays unchanged? This would help with raw images whose pixel intensities are not within the 0-1 range; currently, I cannot calculate their histogram unless I normalize the data.
My MATLAB code for normalization and histogram calculation is below. Any suggestions are appreciated!
h = imhist(mat2gray(I));
The imhist documentation says that the function checks the data type of the input and scales the values accordingly. So, without having tested it on your attached data, this may work:
h = imhist(uint8(I));
Alternatively, you can scale the integer representation to a floating-point representation, either by passing explicit limits to mat2gray:
h = imhist(mat2gray(I, [0,255]));
or by simply dividing by 255:
h = imhist(I/255);
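A small sketch that checks the three variants against each other on hypothetical integer-valued data stored as double (it assumes the Image Processing Toolbox for imhist and mat2gray). The key difference from the question's mat2gray(I) is that fixed limits map each raw intensity to the same bin every time, instead of stretching the image's own min..max onto 0..1.

```matlab
% Sketch: the imhist variants give the same counts for integer-valued data in 0..255.
% I is HYPOTHETICAL test data; requires the Image Processing Toolbox.
rng default
I = double(randi([0 255], 200, 200));   % raw intensities stored as double

h1 = imhist(uint8(I));                  % cast: imhist uses the uint8 range 0..255
h2 = imhist(mat2gray(I, [0 255]));      % rescale with fixed limits, not the image min/max
h3 = imhist(I/255);                     % the same rescaling done by hand

% mat2gray(I) without limits would stretch min(I(:))..max(I(:)) onto 0..1,
% shifting intensities between bins and changing the curve shape.
```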
The imhist answer in this thread, which describes normalizing or casting, is completely correct. Alternatively, you could use MATLAB's histogram function, which works with unnormalized floating-point data:
A = 255*rand(500,500);
histogram(A);
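If you need the counts themselves (for example, to feed a peak finder) rather than just a plot, histcounts accepts the same unnormalized data. A minimal sketch, assuming unit-width bins over 0..256 suit the raw range; A is the same kind of hypothetical random data as above.

```matlab
% Sketch: counts from unnormalized floating-point data with histcounts.
% A is HYPOTHETICAL random data standing in for a raw image.
rng default
A = 255*rand(500,500);

edges = 0:1:256;                                 % unit-width bins spanning the raw range
[counts, edges] = histcounts(A(:), edges);       % no rescaling of the data
centers = edges(1:end-1) + diff(edges)/2;        % one center per bin

figure;
plot(centers, counts);                           % curve shape of the raw data
xlabel('Intensity'); ylabel('Pixel count');
```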
I have 10,000 values of my data in MATLAB. When I plotted the histogram and fitted it with a normal distribution, I got the following figure.
Is there a mistake in this histogram fitting, or what should I do to scale it properly?
Or you could just call
histfit(your_data,num_bins)
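The usual scaling issue is that a PDF integrates to 1 while raw-count bars add up to N observations of width bw each, so the curve must be multiplied by N*bw to sit on the bars. A minimal sketch, with hypothetical normal data standing in for the 10,000 values and a hypothetical choice of 50 bins:

```matlab
% Sketch: scale a fitted normal PDF to match a raw-count histogram.
% your_data is HYPOTHETICAL (10,000 normal samples) standing in for the real data.
rng default
your_data = 3 + 2*randn(10000,1);

nbins = 50;
[counts, edges] = histcounts(your_data, nbins);
bw = edges(2) - edges(1);                  % common bin width
centers = edges(1:end-1) + bw/2;

mu = mean(your_data);
sigma = std(your_data);

% Area under the bars is N*bw, so scale the unit-area PDF by the same factor
xg = linspace(edges(1), edges(end), 500);
scaled = numel(your_data) * bw * normpdf(xg, mu, sigma);

figure;
bar(centers, counts, 1); hold on
plot(xg, scaled, 'r', 'LineWidth', 1.5); hold off
```

The alternative is to leave the PDF alone and normalize the bars instead, for example with histogram(your_data, nbins, 'Normalization', 'pdf').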
I've used gmdistribution to fit data to a Gaussian mixture model. I wanted to draw a contour plot (http://imgur.com/yVE1M), but the contours are obviously missing. For a 1-D problem I found fplot, but now I'm stumped.
I ran into a similar problem when I wrote an EM algorithm for Gaussian mixtures. Here is the snippet of code that fixed it in my case:
for l=1:k
    zz=gmdistribution(MU(l,:),SIG(:,:,l),PI(l)); % one component of the fitted mixture
    ezcontour(@(x,y)pdf(zz,[x y]),[minx1 maxx1],[miny1 maxy1],250); % 250-by-250 grid instead of the default 60
end
The key is to increase N:
ezcontour(...,N) plots FUN over the default domain using an N-by-N grid. The default value for N is 60.
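A related sketch that avoids ezcontour altogether: evaluate the whole mixture density on an explicit meshgrid and pass it to contour, where the grid size N plays the same role as ezcontour's N. The two-component parameters below are hypothetical, chosen only for illustration.

```matlab
% Sketch: contour of a 2-D Gaussian mixture density on an explicit grid.
% MU, SIG and PI are HYPOTHETICAL mixture parameters.
MU  = [0 0; 3 3];                          % one row per component mean
SIG = cat(3, eye(2), [0.5 0.2; 0.2 0.5]);  % d-by-d-by-k covariances
PI  = [0.6 0.4];                           % mixing proportions
gm  = gmdistribution(MU, SIG, PI);

N  = 250;                                  % grid resolution per axis
xv = linspace(-3, 6, N);
yv = linspace(-3, 6, N);
[X, Y] = meshgrid(xv, yv);

% pdf expects an n-by-2 matrix of points; reshape back onto the grid
Z = reshape(pdf(gm, [X(:) Y(:)]), N, N);

figure;
contour(X, Y, Z, 20);                      % 20 contour levels
```

Evaluating all points in one vectorized pdf call is usually much faster than letting a plotting helper call the function point by point.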
I have data in a 100x1 vector. How can I display its empirical PDF in MATLAB? Also, how can I compare the PDFs of three vectors on the same graph?
Right now I am using pdfplot.m to plot my empirical PDF. However, when I try to compare the three distributions using 'hold on', first, it doesn't work, and second, all the distributions are drawn in the same color. Thanks!
EDIT: I don't want to plot cdf.
What you are looking for is kernel density estimation (also known as the Parzen window method). It is implemented in the KSDENSITY function in the Statistics Toolbox:
data = randn(100,1);
ksdensity(data)
The Wikipedia entry above has a MATLAB example that uses a function submitted to the MATLAB File Exchange (FEX).
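For the second part of the question (three vectors on one graph), a minimal sketch: evaluate ksdensity for each vector on a common grid, then plot all three with hold on and explicit colors. The three 100x1 vectors are hypothetical stand-ins for the real data.

```matlab
% Sketch: overlay kernel density estimates of three vectors on one axes.
% a, b and c are HYPOTHETICAL 100x1 vectors standing in for the real data.
rng default
a = randn(100,1);
b = 0.5*randn(100,1) + 2;
c = 1.5*randn(100,1) - 1;

xi = linspace(-6, 6, 400);          % common evaluation grid for all three
fa = ksdensity(a, xi);
fb = ksdensity(b, xi);
fc = ksdensity(c, xi);

figure; hold on
plot(xi, fa, 'b', 'LineWidth', 1.5);   % explicit colors keep the curves distinct
plot(xi, fb, 'r', 'LineWidth', 1.5);
plot(xi, fc, 'g', 'LineWidth', 1.5);
hold off
legend('a', 'b', 'c');
xlabel('x'); ylabel('Estimated density');
```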
You can also use hist:
hist(data)
Or, if you want more control over how it is presented, use:
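[n,x] = hist(data);               % n = counts per bin, x = bin centers
pdfEst = n/(sum(n)*(x(2)-x(1)));  % scale counts so the curve has unit area
plot(x,pdfEst,'rx-');             % just an example: plot the PDF with red x's and a line instead of bars
figure;
plot(x, cumsum(n)/sum(n));        % plot the (binned) empirical CDF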