I need to calculate the probability density of the function 1/2-x where x is a randomly generated number between 0 and 1.
My code looks like this:
n=10^6;
b=10^2;
x=(1./(2-(rand(1,n))));
a=histfit(x,b,'kernel'); % Requires Statistics and Machine Learning Toolbox
xlim([0.5 1.0])
And I get a decent graph that looks like this:
As it may be apparent, there are a couple of problems with this:
MATLAB draws a fit that differs from my histogram, because it counts in the empty space outside the [0.5 1] range of the function as well. This results in a distorted fit towards the edges. (The reason you don't see said empty space is because I entered an xlim there)
I don't know how I could divide every value in the Y-axis by 10^6, which would give me my probability density.
Thanks in advance.
To solve both of your problems, I suggest using hist (Note if you have a version above 2010b, you should use histogram instead) instead of histfit to first get the values of your histogram and then doing a regression and plotting them:
n=10^6;
b=10^2;
x=(1./(2-(rand(1,n))));
[counts,centers]=hist(x,b);
density = counts./trapz(centers, counts);
%// Thanks to #Arpi for this correction
polyFitting = polyfit(centers,density,3)
polyPlot = polyval(polyFitting,centers)
figure
bar(centers,density)
hold on
plot(centers,polyPlot,'r','LineWidth',3)
You can also up the resolution by adjusting b, which is set to 100 currently. Also try different regressions to see which one you prefer.
1. Better result can be obtained by using ksdensity and specifying the support of the distribution.
2. By using hist you have access to the counts and centers, thus the normalization to get density is straightforward.
Code to demonstrate the suggestions:
rng(125)
n=10^6;
b=10^2;
x=(1./(2-(rand(1,n))));
subplot(1,2,1)
a = histfit(x,b,'kernel');
title('Original')
xlim([0.5 1.0])
[f,c] = hist(x,b);
% normalization to get density
f = f/trapz(c,f);
% kernel density
pts = linspace(0.5, 1, 100);
[fk,xk] = ksdensity(x, pts, 'Support', [0.5, 1]);
subplot(1,2,2)
bar(c,f)
hold on
plot(xk,fk, 'red', 'LineWidth', 2)
title('Improved')
xlim([0.5 1.0])
Comparing the results:
EDIT: If you do not like the endings:
pts = linspace(0.5, 1, 500);
[fk,xk] = ksdensity(x, pts, 'Support', [0.5, 1]);
bar(c,f)
hold on
plot(xk(2:end-1),fk(2:end-1), 'red', 'LineWidth', 2)
title('Improved_2')
xlim([0.5 1.0])
Related
I want to represent data with 2 variables in 2D format. The value is represented by color and the 2 variables as the 2 axis. I am using the contourf function to plot my data:
clc; clear;
load('dataM.mat')
cMap=jet(256); %set the colomap using the "jet" scale
F2=figure(1);
[c,h]=contourf(xrow,ycol,BDmatrix,50);
set(h, 'edgecolor','none');
xlim([0.0352 0.3872]);
ylim([0.0352 0.3872]);
colormap(cMap);
cb=colorbar;
caxis([0.7 0.96]);
% box on;
hold on;
Both xrow and ycol are 6x6 matrices representing the coordinates. BDmatrix is the 6x6 matrix representing the corresponding data. However, what I get is this:
The following is the xrow and yrow matices:
The following is the BDmatrix matices:
Would it be possible for the contour color to vary smoothly rather than appearing as straight lines joining the data points? The problem of this figure is the coarse-granularity which is not appealing. I have tried to replace contourf with imagec but it seems not working. I am using MATLAB R2015b.
You can interpolate your data.
newpoints = 100;
[xq,yq] = meshgrid(...
linspace(min(min(xrow,[],2)),max(max(xrow,[],2)),newpoints ),...
linspace(min(min(ycol,[],1)),max(max(ycol,[],1)),newpoints )...
);
BDmatrixq = interp2(xrow,ycol,BDmatrix,xq,yq,'cubic');
[c,h]=contourf(xq,yq,BDmatrixq);
Choose the "smoothness" of the new plot via the parameter newpoints.
To reduce the Color edges, you can increase the number of value-steps. By default this is 10. The following code increases the number of value-steps to 50:
[c,h]=contourf(xq,yq,BDmatrixq,50);
A 3D-surf plot would be more suitable for very smooth color-shading. Just rotate it to a top-down view. The surf plot is also much faster than the contour plot with a lot of value-steps.
f = figure;
ax = axes('Parent',f);
h = surf(xq,yq,BDmatrixq,'Parent',ax);
set(h, 'edgecolor','none');
view(ax,[0,90]);
colormap(Jet);
colorbar;
Note 1: Cubic interpolation is not shape-preserving. That means, the interpolated shape can have maxima which are greater than the maximum values of the original BDmatrix (and minima which are less). If BDmatrix has noisy values, the interpolation might be bad.
Note 2: If you generated xrow and yrow by yourself (and know the limits), than you do not need that min-max-extraction what I did.
Note 3: After adding screenshots of your data matrices to your original posting, one can see, that xrow and ycol come from an ndgrid generator. So we also must use this here in order to be consistent. Since interp2 needs meshgrid we have to switch to griddedInterpolant:
[xq,yq] = ndgrid(...
linspace(min(min(xrow,[],1)),max(max(xrow,[],1)),newpoints ),...
linspace(min(min(ycol,[],2)),max(max(ycol,[],2)),newpoints )...
);
F = griddedInterpolant(xrow,ycol,BDmatrix,'cubic');
BDmatrixq = F(xq,yq);
I am trying to fit a distribution curve to the histogram of some data. (I have used some model data here instead because it is difficult to upload the actual data. I have included the complete code after my question.)
Because the histogram looks normally distributed when I plotted the x-axis in logscale, I transform the data first before fitting a normal distribution to it and I got the following results:
>>pdn=fitdist(log(data),'Normal')
pdn =
Normal distribution
mu = -0.334458 [-0.34704, -0.321876]
sigma = 0.351478 [0.342804, 0.360605]
When I plotted out the pdf with the histogram, I got this:
The result seems reasonable to me. Then I discovered that in the Matlab fitdist(), it already has a 'Lognormal' option and I don't really need the transform my data first and this is what I got:
>>pdln = fitdist(data,'Lognormal')
pdln =
Lognormal distribution
mu = -0.334458 [-0.34704, -0.321876]
sigma = 0.351478 [0.342804, 0.360605]
Exactly the same mean and standard deviation as I have got before. However, when I plotted it out with the histogram, I got a different curve:
This curve fits better to the data but the positions of the mean and the mean+/-std points are not as I have expected (i.e. mean at the peak and the mean+/-std at the same levels).
Which come to my question, why would fitdist(data,'Lognormal') give the same result as fitdist(log(data),'Normal') but a different plot? I have looked through the Matlab help pages and I still could not understand why, or where are my mistakes, please help.
My aim for all this is to get some numerical parameters about the distributions of my data under different conditions and compare them to see if there is any difference. At the moment, I am not certain which way would give me reliable estimates of the means and standard deviations.
The code for the graphs is below:
%random data in lognormal distribution
mu=-0.335742;
sigma=0.35228;
data=lognrnd(mu,sigma,[3000 1]);
%make histogram
interval=0.1;
svalue=sort(data);
bx(1)=interval/2;
i=2;
while bx(i-1)<=max(svalue)
bx(i)=bx(i-1)+interval;
i=i+1;
end
by=hist(svalue,bx);
subplot(211)
h = bar(bx,by,'hist');
set(h,'FaceColor',[.9 .9 .9]);
set(gca,'xlim',[0.05 10]);
xticks=[0.05 0.1 0.2 0.5 1 2 5 10];
set(gca,'xscale','log','xminortick','on')
set(gca,'xtick',xticks)
ylabel('counts')
subplot(212)
h = bar(bx,by,'hist');
set(h,'FaceColor',[.9 .9 .9]);
set(gca,'xlim',[0.05 10]);
xticks=[0.05 0.1 0.2 0.5 1 2 5 10];
set(gca,'xscale','log','xminortick','on')
set(gca,'xtick',xticks)
ylabel('counts')
% fit distribution curves
pdf_x = 0:0.01:max(data);
max_by=max(by); % for scaling the pdf to the histogram
% case 1 - PDF fitted using fitdist(log(data),'Normal')
subplot(211)
hold on
pdn = fitdist(log(data),'Normal')
pdf_y = pdf(pdn,log(pdf_x));
h1=plot(pdf_x,pdf_y./max(pdf_y).*max_by,'-k');
range=[exp(pdn.mu-pdn.sigma) exp(pdn.mu+pdn.sigma)];
h2=plot(exp(pdn.mu),pdf(pdn,(pdn.mu))./max(pdf_y).*max_by,'sk') ;
h3=plot(range,pdf(pdn,log(range))./max(pdf_y).*max_by,'ok') ;
title('PDF fitted using fitdist(log(data),''Normal'')');
legend([h1 h2 h3],'pdf','mean','meam+/-std');
% case 2 - PDF fitted using fitdist(data,'Lognormal')
subplot(212)
hold on
pdln = fitdist(data,'Lognormal')
pdf_y = pdf(pdln,pdf_x);
h1=plot(pdf_x,pdf_y./max(pdf_y).*max_by,'-b');
range=[exp(pdln.mu-pdln.sigma) exp(pdln.mu+pdln.sigma)];
h2=plot(exp(pdln.mu),pdf(pdln,exp(pdln.mu))./max(pdf_y).*max_by,'sb');
h3=plot(range,pdf(pdln,range)./max(pdf_y).*max_by,'ob') ;
title('PDF fitted using fitdist(data,''Lognormal'')');
legend([h1 h2 h3],'pdf','mean','meam+/-std');
I was trying to do histogram image comparison between two RGB images which includes heads of the same persons and non-heads to see the correlation between them. The reason I am doing this is because after performing scanning using HOG to check whether the scanning window is a head or not, I am now trying to track the same head throughout consequent frames and also I want to remove some clear false positives.
I currently tried both RGB and HSV histogram comparison and used Euclidean Distance to check the difference between the histograms. The following is the code I wrote:
%RGB histogram comparison
%clear all;
img1 = imread('testImages/samehead_1.png');
img2 = imread('testImages/samehead_2.png');
img1 = rgb2hsv(img1);
img2 = rgb2hsv(img2);
%% calculate number of bins = root(pixels);
[rows, cols] = size(img1);
no_of_pixels = rows * cols;
%no_of_bins = floor(sqrt(no_of_pixels));
no_of_bins = 256;
%% obtain Histogram for each colour
% -----1st Image---------
rHist1 = imhist(img1(:,:,1), no_of_bins);
gHist1 = imhist(img1(:,:,2), no_of_bins);
bHist1 = imhist(img1(:,:,3), no_of_bins);
hFig = figure;
hold on;
h(1) = stem(1:256, rHist1);
h(2) = stem(1:256 + 1/3, gHist1);
h(3) = stem(1:256 + 2/3, bHist1);
set(h, 'marker', 'none')
set(h(1), 'color', [1 0 0])
set(h(2), 'color', [0 1 0])
set(h(3), 'color', [0 0 1])
hold off;
% -----2nd Image---------
rHist2 = imhist(img2(:,:,1), no_of_bins);
gHist2 = imhist(img2(:,:,2), no_of_bins);
bHist2 = imhist(img2(:,:,3), no_of_bins);
hFig = figure;
hold on;
h(1) = stem(1:256, rHist2);
h(2) = stem(1:256 + 1/3, gHist2);
h(3) = stem(1:256 + 2/3, bHist2);
set(h, 'marker', 'none')
set(h(1), 'color', [1 0 0])
set(h(2), 'color', [0 1 0])
set(h(3), 'color', [0 0 1])
%% concatenate values of a histogram in 3D matrix
% -----1st Image---------
M1(:,1) = rHist1;
M1(:,2) = gHist1;
M1(:,3) = bHist1;
% -----2nd Image---------
M2(:,1) = rHist2;
M2(:,2) = gHist2;
M2(:,3) = bHist2;
%% normalise Histogram
% -----1st Image---------
M1 = M1./no_of_pixels;
% -----2nd Image---------
M2 = M2./no_of_pixels;
%% Calculate Euclidean distance between the two histograms
E_distance = sqrt(sum((M2-M1).^2));
The E_distance consists of an array containing 3 distances which refer to the red histogram difference, green and blue.
The Problem is:
When I compare the histogram of a non-head(eg. a bag) with that of a head..there is a clear difference in the error. So this is acceptable and can help me to remove the false positive.
However! When I am trying to check whether the two images are heads of the same person, this technique did not help at all as the head of another person gave a less Euclidean distance than that of the same person.
Can someone explain to me if I am doing this correctly, or maybe any guidance of what I should do?
PS: I got the idea of the LAB histogram comparison from this paper (Affinity Measures section): People Looking at each other
Color histogram similarity may be used as a good clue for tracking by detection, but don't count on it to disambiguate all possible matches between people-people and people-non-people.
According to your code, there is one thing you can do to improve the comparison: currently, you are working with per-channel histograms. This is not discriminative enough because you do not know when R-G-B components co-occur (e.g. you know how many times the red channel is in range 64-96 and how many times the blue is in range 32-64, but not when these occur simultaneously). To rectify this, you must work with 3D histograms, counting the co-occurrence of colors). For a discretization of 8 bins per channel, your histograms will have 8^3=512 bins.
Other suggestions for improvement:
Weighted assignment to neighboring bins according to interpolation weights. This eliminates the discontinuities introduced by bin quantization
Hierarchical splitting of detection window into cells (1 cell, 4 cells, 16 cells, etc), each with its own histogram, where the histograms of different levels and cells are concatenated. This allows catching local color details, like the color of a shirt, or even more local, a shirt pocket/sleeve.
Working with the Earth Mover's Distance (EMD) instead of the Euclidean metric for comparing histograms. This reduces color quantization effects (differences in histograms are weighted by color-space distance instead of equal weights), and allows for some error in the localization of cells within the detection window.
Use other cues for tracking. You'll be surprised how much the similarity between the HoG descriptors of your detections helps disambiguate matches!
I'm using the Matlab function "hist" to estimate the probability density function of a realization of a random process I have.
I'm actually:
1) taking the histogram of h0
2) normalizing its area in order to get 1
3) plotting the normalized curve.
The problem is that, no matter how many bins I use, the histogram never start from 0 and never go back to 0 whereas I would really like that kind of behavior.
The code I use is the following:
Nbin = 36;
[n,x0] = hist(h0,Nbin);
edge = find(n~=0,1,'last');
Step = x0(edge)/Nbin;
Scale_factor = sum(Step*n);
PDF_h0 = n/Scale_factor;
hist(h0 ,Nbin) %plot the histogram
figure;
plot(a1,p_rice); %plot the theoretical curve in blue
hold on;
plot(x0, PDF_h0,'red'); %plot the normalized curve obtained from the histogram
And the plots I get are:
If your problem is that the plotted red curve does not go to zero: you can solve that adding initial and final points with y-axis value 0. It seems from your code that the x-axis separation is Step, so it would be:
plot([x0(1)-Step x0 x0(end)+Step], [0 PDF_h0 0], 'red')
Assume y is a vector with random numbers following the distribution f(x)=sqrt(4-x^2)/(2*pi). At the moment I use the command hist(y,30). How can I plot the distribution function f(x)=sqrt(4-x^2)/(2*pi) into the same histogram?
Instead of normalizing numerically, you could also do it by finding a theoretical scaling factor as follows.
nbins = 30;
nsamples = max(size(y));
binsize = (max(y)-min(y)) / nsamples
hist(y,nbins)
hold on
x1=linspace(min(y),max(y),100);
scalefactor = nsamples * binsize
y1=scalefactor * sqrt(4-x^2)/(2*pi)
plot(x1,y1)
Update: How it works.
For any dataset that is large enough to give a good approximation to the pdf (call it f(x)), the integral of f(x) over this domain will be approximately unity. However we know that the area under any histogram is precisely equal to the total number of samples times the bin-width.
So a very simple scale factor to bring the pdf into line with the histogram is Ns*Wb, the total number of sample point times the width of the bins.
Let's take an example of another distribution function, the standard normal. To do exactly what you say you want, you do this:
nRand = 10000;
y = randn(1,nRand);
[myHist, bins] = hist(y,30);
pdf = normpdf(bins);
figure, bar(bins, myHist,1); hold on; plot(bins,pdf,'rx-'); hold off;
This is probably NOT what you actually want though. Why? You'll notice that your density function looks like a thin line at the bottom of your histogram plot. This is because a histogram is counts of numbers in bins, while a density function is normalized to integrate to one. If you have hundreds of items in a bin, there is no way that the density function will match that in scale, so you have a scaling or normalization problem. Either you have to normalize the histogram, or plot a scaled distribution function. I prefer to scale the distribution function so that my counts are sensical when I look at the histogram:
normalizedpdf = pdf/sum(pdf)*sum(myHist);
figure, bar(bins, myHist,1); hold on; plot(bins,normalizedpdf,'rx-'); hold off;
Your case is the same, except you'll use the function f(x) you specified instead of the normpdf command.
Let me add another example to the mix:
%# some normally distributed random data
data = randn(1e3,1);
%# histogram
numbins = 30;
hist(data, numbins);
h(1) = get(gca,'Children');
set(h(1), 'FaceColor',[.8 .8 1])
%# figure out how to scale the pdf (with area = 1), to the area of the histogram
[bincounts,binpos] = hist(data, numbins);
binwidth = binpos(2) - binpos(1);
histarea = binwidth*sum(bincounts);
%# fit a gaussian
[muhat,sigmahat] = normfit(data);
x = linspace(binpos(1),binpos(end),100);
y = normpdf(x, muhat, sigmahat);
h(2) = line(x, y*histarea, 'Color','b', 'LineWidth',2);
%# kernel estimator
[f,x,u] = ksdensity( data );
h(3) = line(x, f*histarea, 'Color','r', 'LineWidth',2);
legend(h, {'freq hist','fitted Gaussian','kernel estimator'})