Understanding Matlab example fit a Mixture of Two Normals distribution - matlab

I am following the example to fit a Mixture of Two Normals distribution that
you can find here.
x = [trnd(20,1,50) trnd(4,1,100)+3];
hist(x,-2.25:.5:7.25);
pdf_normmixture = @(x,p,mu1,mu2,sigma1,sigma2) ...
p*normpdf(x,mu1,sigma1) + (1-p)*normpdf(x,mu2,sigma2);
pStart = .5;
muStart = quantile(x,[.25 .75])
sigmaStart = sqrt(var(x) - .25*diff(muStart).^2)
start = [pStart muStart sigmaStart sigmaStart];
lb = [0 -Inf -Inf 0 0];
ub = [1 Inf Inf Inf Inf];
options = statset('MaxIter',300, 'MaxFunEvals',600);
paramEsts = mle(x, 'pdf',pdf_normmixture, 'start',start, ...
'lower',lb, 'upper',ub, 'options',options)
bins = -2.5:.5:7.5;
h = bar(bins,histc(x,bins)/(length(x)*.5),'histc');
h.FaceColor = [.9 .9 .9];
xgrid = linspace(1.1*min(x),1.1*max(x),200);
pdfgrid = pdf_normmixture(xgrid,paramEsts(1),paramEsts(2),paramEsts(3),paramEsts(4),paramEsts(5));
hold on
plot(xgrid,pdfgrid,'-')
hold off
xlabel('x')
ylabel('Probability Density')
Could you please explain why, when it calculates
h = bar(bins,histc(x,bins)/(length(x)*.5),'histc');
it divides by (length(x)*.5)?

The idea is to scale your histogram so that it represents probability density instead of counts. This is the unscaled histogram:
The vertical axis is the count of how many events fall within each bin. You have defined your bin centers to be -2.25:.5:7.25, so your bin width is 0.5. Looking at the first bar of the histogram, it tells us that the number of elements in x (the number of events in your experiment) that fall in the bin from -2.5 to -2 (note the width of 0.5) is 2.
But now you want to compare this with a probability density function, and we know that the integral of a PDF is 1. This is the same as saying the area under the PDF curve is 1. So if we want our histogram's vertical scale to match that of the PDF, as in this second picture,
we need to scale it so that the areas of all the histogram's bars sum to 1. The area of the first bar of the histogram is height times width, which according to our investigation above is 2*0.5. The width is the same for all the bins in the histogram, so we can find the total area by adding up all the bar heights and then multiplying by the width once at the end. The sum of all the heights in the histogram is the total number of events, which is the total number of elements in x, or length(x). Thus the area of the first (unscaled) histogram is length(x)*0.5, and to make this area equal to 1 we need to scale all the bar heights by dividing them by length(x)*0.5.
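The scaling argument can be checked numerically. The following is a minimal sketch, not part of the original example: it assumes stand-in data drawn with randn (so no toolbox is needed) and uses histcounts with hypothetical edges that are wide enough to contain every sample.

```matlab
rng(0);                                   % fixed seed so the run is repeatable
x = [randn(1,50), randn(1,100) + 3];      % stand-in data (assumption, not the trnd data)

binWidth = 0.5;
edges = -10:binWidth:10;                  % assumed range, wide enough for all samples

counts  = histcounts(x, edges);           % raw counts per bin
density = counts / (numel(x) * binWidth); % same scaling as length(x)*.5 above

% Area of the scaled histogram: sum of (height * width) over all bars
totalArea = sum(density * binWidth);
fprintf('Total area of scaled histogram: %.6f\n', totalArea);
```

If your MATLAB version has histogram, calling histogram(x, edges, 'Normalization', 'pdf') should apply the same scaling for you.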

Related

Trying to find the full width at half maximum for a noisy signal

I am currently looking to find the FWHM of a signal.
The peak of the signal is around 1.0 but the lowest value is only around 0.6.
So, in fact, I don't have a half maximum value.
How could I proceed to analyze the curve in a similar fashion?
Here is an image of the curve:
Assuming that the min and max of the signal y are 0.6 and 1 respectively, you can find the FWHM as follows.
idx1 and idx2 each hold the indices of the two points on either side of the half-height crossing (idx1 on the rising edge, idx2 on the falling edge). We can use these points to interpolate the value of x at which y equals the half height.
%height at half
h=(0.6+1)/2;
idx1=find(y>h,1) +[-1 0];
idx2=find(y>h,1,'last') +[0 1];
x1 = interp1(y(idx1),x(idx1),h);
x2 = interp1(y(idx2),x(idx2),h);
w = x2 - x1;
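To see the approach on known data, here is a small sketch that assumes a synthetic, noise-free Gaussian peak sitting on a baseline of 0.6 (the signal, sigma and the grid are made up for illustration). For a Gaussian the width at half height above the baseline is 2*sqrt(2*log(2))*sigma, which gives something to compare against. For a noisy signal you would probably want to smooth y first, since find(y>h,1) picks the first sample above the threshold.

```matlab
% Synthetic test signal (assumption): Gaussian peak of height 0.4 on a 0.6 baseline
x = linspace(-5, 5, 1001);
sigma = 1;
y = 0.6 + 0.4 * exp(-x.^2 / (2*sigma^2));

% Half height measured between the baseline and the peak, not from zero
h = (min(y) + max(y)) / 2;

% Pairs of samples that bracket the crossing on each side of the peak
idx1 = find(y > h, 1) + [-1 0];
idx2 = find(y > h, 1, 'last') + [0 1];

% Linear interpolation of x at height h on each side
x1 = interp1(y(idx1), x(idx1), h);
x2 = interp1(y(idx2), x(idx2), h);
w  = x2 - x1;

expected = 2*sqrt(2*log(2))*sigma;        % analytic width for a Gaussian
fprintf('estimated width = %.4f, analytic width = %.4f\n', w, expected);
```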

Matlab - concentration vs distance plot (2D)

I have several sets of data. Each set is a list of numbers which is the distance from 0 that a particle has travelled. Each set is associated with a finite time, so set 1 is the distances at T=0; set 2 is the distances at T=1 and so on. The size of each set is the total number of particles and the size of each set is the same.
I want to plot a concentration vs distance line.
For example, if there are 1000 particles (the size of the sets); at time T=0 then the plot will just be a straight line x=0 because all the particles are at 0 (the set contains 1000 zeroes). So the concentration at x=0 =100% and is 0% at all other distances
At T=1 and T=2 and so on, the distances will increase (generally) so I might have sets that look like this: (just an example)
T1 = (1.1,2.2,3.0,1.2,3.2,2.3,1.4...) etc T2 = (2.9,3.2,2.6,4.5,4.3,1.4,5.8...) etc
it is likely that each number in each set is unique in that set
The aim is to have several plots (I can eventually plot them on one graph) that show the concentration on the y-axis and the distance on the x-axis. I imagine that as T increases T0, T1, T2 then the plot will flatten until the concentration is roughly the same everywhere.
The x-axis (distance) has a fixed maximum which is the same for each plot. So, for example, some sets will have a curve that hits zero on the y-axis (concentration) at a low value for x (distance) but as the time increases, I envisage a nearly flat line where the line does not cross the x-axis (concentration is non-zero everywhere)
I have tried this with a histogram, but it is not really giving the results I want. I would like a line plot but have to try and put the distances into common-sense sized bins.
thank you W
some rough data
Y1 = 1.0e-09 * [0.3358, 0.3316, 0.3312, 0.3223, 0.2888, 0.2789, 0.2702,...
0.2114, 0.1919, 0.1743, 0.1738, 0.1702, 0.0599, 0.0003, 0, 0, 0, 0, 0, 0];
Y2 = 1.0e-08 * [0.4566, 0.4130, 0.3439, 0.3160, 0.3138, 0.2507, 0.2483,...
0.1714, 0.1371, 0.1039, 0.0918, 0.0636, 0.0502, 0.0399, 0.0350, 0.0182,...
0.0010, 0, 0, 0];
Y3 = 1.0e-07 * [0.2698, 0.2671, 0.2358, 0.2250, 0.2232, 0.1836, 0.1784,...
0.1690, 0.1616, 0.1567, 0.1104, 0.0949, 0.0834, 0.0798, 0.0479, 0.0296,...
0.0197, 0.0188, 0.0173, 0.0029];
These data sets contain the distances of just 20 particles. The Y0 set is zeros. I will be dealing with thousands, so the data sets will be too large.
Thankyou
Well, basically, you are just missing the hold command. But first, put all your data in one matrix, like this:
Y = [1.0e-09 * [0.3358, 0.3316, 0.3312, 0.3223, 0.2888, 0.2789, 0.2702,...
0.2114, 0.1919, 0.1743, 0.1738, 0.1702, 0.0599, 0.0003, 0, 0, 0, 0, 0, 0];
1.0e-08 * [0.4566, 0.4130, 0.3439, 0.3160, 0.3138, 0.2507, 0.2483,...
0.1714, 0.1371, 0.1039, 0.0918, 0.0636, 0.0502, 0.0399, 0.0350, 0.0182,...
0.0010, 0, 0, 0];
1.0e-07 * [0.2698, 0.2671, 0.2358, 0.2250, 0.2232, 0.1836, 0.1784,...
0.1690, 0.1616, 0.1567, 0.1104, 0.0949, 0.0834, 0.0798, 0.0479, 0.0296,...
0.0197, 0.0188, 0.0173, 0.0029]];
Then you need to plot each time step separately, and use hold on to draw them on the same axes:
hold on
for r = size(Y,1):-1:1
histogram(Y(r,:));
end
hold off
T_names = [repmat('T',size(Y,1),1) num2str((size(Y,1):-1:1).')];
legend(T_names)
Which will give you (using the example data):
Notice, that in the loop I iterate on the rows backwards - that's just to make the narrower histograms plot on the wider, so you can see all of them clearly.
EDIT
In case you want continuous lines, and not bins, you have to first get the histogram values with histcounts, then plot them as a line:
hold on
for r = 1:size(Y,1)
[H,E] = histcounts(Y(r,:));
plot(E,[H(1) H])
end
hold off
T_names = [repmat('T',size(Y,1),1) num2str((1:size(Y,1)).')];
legend(T_names)
With your small example data it doesn't look so impressive though:
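Since the question asks for a fixed distance axis and a concentration in percent, one possible variation is to pass the same bin edges to histcounts for every time step and divide by the number of particles. This is only a sketch under assumptions: the data here is synthetic (uniform random distances that spread out over time), and xMax and nBins are hypothetical values you would pick for your own data.

```matlab
rng(0);
nParticles = 1000;
% Synthetic distances (assumption): particles spread further at later times
Y = [1*rand(1,nParticles); 2*rand(1,nParticles); 3*rand(1,nParticles)];

xMax  = 3;                                 % assumed fixed maximum distance
nBins = 15;                                % assumed number of bins
edges   = linspace(0, xMax, nBins + 1);    % shared edges for every time step
centers = edges(1:end-1) + diff(edges)/2;  % plot each bin at its centre

conc = zeros(size(Y,1), nBins);
for r = 1:size(Y,1)
    counts = histcounts(Y(r,:), edges);
    conc(r,:) = 100 * counts / nParticles; % percent of particles in each bin
end

plot(centers, conc.')                      % one line per time step
xlabel('Distance')
ylabel('Concentration (%)')
legend('T1', 'T2', 'T3')
```

With shared edges the curves are directly comparable, and each row of conc sums to 100 as long as every distance lies within [0, xMax].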

In matlab plot scatter points as different sized points that cover the x and y error instead of error bars

In matlab, I want to plot scatter data with both x and y errors, which I can do using the errorbarxy function.
I am wondering, however, if I can use the upper and lower limits of x and y to instead plot the scatter points as different sized semi-transparent points that cover the error 'region' where the error bars would usually cover?
i.e. how can I achieve scatter(x,y,a,c) where a is the area defined by upper and lower limits in each direction?
My code for the normal errorbarxy is:
X = 10 * rand(7,1);
Y = 10 * rand(7,1);
ux = rand(7,1);
uy = rand(7,1);
lx = rand(7,1);
ly = rand(7,1);
errorbarxy(X,Y,ux,uy,lx,ly,'Color','k','LineStyle','none','Marker','o','MarkerFaceColor','w','MarkerSize',11);
set(gca,'YScale','log');
set(gca,'XScale','log');
Note the log scaling.
Thanks for any ideas!
To achieve a scaling of the size of your scatter points you normally subtract the minimum to shift your data to 0, then divide by the maximum to normalise to the interval [0,1]. In this case I'd recommend some increase of interval, say [4,9] to increase the area for visualisation in scatter. So for one dimension:
X = rand(1e3,1)*8+14; %// some random data to make this example work
X = X-min(X); %// shift to 0
X = X/max(X); %// normalise to [0,1]
X = 5*X+4; %// increase area for visualisation purposes
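Applied to the question's data, the same normalisation could be used on the size of each error box before passing it to scatter. This is a rough sketch under assumptions: the error box area (lx+ux).*(ly+uy) is just one way to combine the limits, the target range [50,500] is an arbitrary choice for visibility, and 'MarkerFaceAlpha' requires a MATLAB release whose scatter supports it. Note that scatter sizes are in points squared, so the markers only indicate relative error and do not cover the error region in data units, especially on log axes.

```matlab
rng(1);
X  = 10 * rand(7,1);   Y  = 10 * rand(7,1);
ux = rand(7,1);        uy = rand(7,1);
lx = rand(7,1);        ly = rand(7,1);

% Area of each error box in data units (one possible size measure)
errArea = (lx + ux) .* (ly + uy);

% Shift to 0, normalise to [0,1], then stretch to an assumed range [50,500]
s = errArea - min(errArea);
s = s / max(s);
markerArea = 450 * s + 50;

scatter(X, Y, markerArea, 'k', 'filled', 'MarkerFaceAlpha', 0.4)
set(gca, 'XScale', 'log', 'YScale', 'log')
```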

Efficient inpaint with neighbouring pixels

I am implementing a simple algorithm to do in-painting on a "damaged" image. I have a predefined mask that specifies the area which needs to be fixed. My strategy is to start at the border of the masked area and in-paint each pixel with the central mean of its neighboring non-zero pixels, repeating until there's no unknown pixels left.
function R = inPainting(I, mask)
H = [1 2 1; 2 0 2; 1 2 1];
R = I;
n = 1;
[row,col,~] = find(~mask); %Find zeros in mask (area to be inpainted)
unknown = horzcat(row, col)';
while size(unknown,2) > 0
new_unknown = [];
new_R = R;
for u = unknown
r = u(1);
c = u(2);
nb = R(max((r-n), 1):min((r+n), end), max((c-n),1):min((c+n),end));
nz = nb~=0;
nzs = sum(nz(:));
if nzs ~= 0 %We have non-zero neighbouring pixels. In-paint with average.
new_R(r,c) = sum(nb(:)) / nzs;
else
new_unknown = horzcat(new_unknown, u);
end
end
unknown = new_unknown;
R = new_R;
end
This works well, but it's not very efficient. Is it possible to vectorize such an approach, using mostly matrix operations? Does someone know of a more efficient way to implement this algorithm?
If I understand your problem statement, you are given a mask and you wish to fill in the pixels in this mask with the mean of the neighbourhood pixels that surround each pixel in the mask. Another constraint is that the image is defined such that any pixels that need to be inpainted according to the mask are zero in the image. You are starting from the border of the mask and are propagating information towards the interior of the mask. Given this algorithm, there is unfortunately no way you can do this in a single pass with standard filtering techniques, as the current time step is dependent on the previous time step.
Image filtering mechanisms, like imfilter or conv2 can't work here because of this dependency.
As such, what I can do is help you speed up what is going on inside your loop and hopefully this will give you some speed up overall. I'm going to introduce you to a function called im2col. This is from the image processing toolbox, and given that you can use imfilter, we can use this function.
im2col creates a 2D matrix such that each column is a pixel neighbourhood unrolled into a single vector. How it works is that each pixel neighbourhood in column major order is grabbed, so we get a pixel neighbourhood at the top left corner of the image, then move down one row, and another row and we keep going until we reach the last row. We then move one column over and repeat the same process. For each pixel neighbourhood that we have, it gets unrolled into a single vector, and the output would be a MN x K matrix where you have a neighbourhood size of M x N for each pixel neighbourhood and there are K neighbourhoods.
Therefore, at each iteration of your loop, we can unroll the current inpainted image's pixel neighbourhoods into single vectors, select the neighbourhoods that belong to the unknown pixels and, from there, determine how many non-zero values there are in each of these selected pixel neighbourhoods. After that, we compute the mean for the columns that contain non-zero values, disregarding the zero elements. Once we're done, we update the image and move to the next iteration.
What we're going to need to do first is pad the image with a 1 pixel border so that we're able to grab neighbourhoods that extend beyond the borders of the image. You can use padarray, also from the image processing toolbox.
Therefore, we can simply do this:
function R = inPainting(I, mask)
R = double(I); %// For precision
n = 1;
%// Change - column major indices
unknown = find(~mask); %Find zeros in mask (area to be inpainted)
%// Until we have searched all unknown pixels
while numel(unknown) ~= 0
new_R = R;
%// Change - take image at current iteration and
%// create columns of pixel neighbourhoods
padR = padarray(new_R, [n n], 'replicate');
cols = im2col(padR, [2*n+1 2*n+1], 'sliding');
%// Change - Access the right pixel neighbourhoods
%// denoted by unknown
nb = cols(:,unknown);
%// Get total sum of each neighbourhood
nbSum = sum(nb, 1);
%// Get total number of non-zero elements per pixel neighbourhood
nzs = sum(nb ~= 0, 1);
%// Replace the right pixels in the image with the mean
new_R(unknown(nzs ~= 0)) = nbSum(nzs ~= 0) ./ nzs(nzs ~= 0);
%// Find new unknown pixels to look at
unknown = unknown(nzs == 0);
%// Update image for next iteration
R = new_R;
end
%// Cast back to the right type
R = cast(R, class(I));
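As a possible toolbox-free variation (not the method above), note that within one pass the neighbourhood sums and non-zero counts depend only on the image from the previous pass, so each pass could be computed with two conv2 calls while the outer loop remains. The sketch below assumes a grayscale image whose damaged pixels are zero, and uses a tiny made-up 5x5 test image; zero padding from conv2's 'same' option adds nothing to either the sum or the count.

```matlab
% Tiny test image (assumption): constant value 10 with a 3x3 hole
I = 10 * ones(5);
mask = true(5);
mask(2:4, 2:4) = false;        % zeros in mask mark the area to inpaint
I(~mask) = 0;

R = double(I);
unknown = ~mask;
kernel = ones(3);              % 3x3 neighbourhood, centre included (it is zero)

while any(unknown(:))
    nbSum = conv2(R, kernel, 'same');                 % sum over each neighbourhood
    nzs   = conv2(double(R ~= 0), kernel, 'same');    % number of non-zero neighbours
    fillable = unknown & (nzs > 0.5);                 % unknown pixels with known neighbours
    if ~any(fillable(:))
        break;                 % nothing more can be filled, avoid looping forever
    end
    R(fillable) = nbSum(fillable) ./ nzs(fillable);   % mean of non-zero neighbours
    unknown = unknown & ~fillable;
end
disp(R)
```

On this test image the hole is filled in two passes: first the eight outer hole pixels, then the centre.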

Centering histogram bins and setting percentage range in Matlab

I'm doing data-analysis in Matlab and I'm plotting the frequencies of discrete values (1-15) into a histogram on Matlab. I would like to center the bins so that the center of 1st bin is on value 1, center of the 2nd bin is on value 2, etc.
Also I would like to get percentage range for the Y-axis. Any quick ideas how to do this? Here is a picture highlighting my question:
Start by using hist with your expected centers. Then use bar and ylabel to display the histogram with the y axis the way you want:
dat = randi(15,100,1);
centers = 1:15;
counts = hist(dat,centers);
pcts = 100 * counts / sum(counts);
bar(centers,pcts)
ylabel('%')
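In newer MATLAB releases, where hist is discouraged, a similar result can be sketched with histcounts by placing the bin edges halfway between the integer values. The edges 0.5:1:15.5 and the bar width of 1 are choices made here for illustration.

```matlab
rng(0);
dat = randi(15, 100, 1);          % discrete values 1..15

edges   = 0.5:1:15.5;             % edges halfway between integers
centers = 1:15;                   % so each bin is centred on an integer

counts = histcounts(dat, edges);
pcts   = 100 * counts / sum(counts);

bar(centers, pcts, 1)             % width 1 makes adjacent bars touch
xlim([0.5 15.5])
xlabel('Value')
ylabel('%')
```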