Distribution histogram - matlab

Hi i am trying to make a simple distribution histogram using some code from stack overflow
however i am unable to get it to work. i know that there are is a simple method for this using statistic toolbox but form a learning point of view i prefer a more explanatory code - can any one help me ?
%%
clear all
load('Mini Project 1.mat')
% Set data to var2
data = var2;
% Set the number of bins
nbins = 0:.8:8;
% Create a histogram plot of data sorted into (nbins) equally spaced bins
n = hist(data,nbins);
% Plot a bar chart with y values at each x value.
% Notice that the length(x) and length(y) have to be same.
bar(nbins,n);
MEAN = mean(data);
STD = sqrt(mean((data - MEAN).^2)); % can also use the simple std(data)
f = ( 1/(STD*sqrt(2*pi)) ) * exp(-0.5*((nbins-MEAN)/STD).^2 );
f = f*sum(nbins)/sum(f);
hold on;
% Plots a 2-D line plot(x,y) with the normal distribution,
% c = color cyan , Width of line = 2
plot (data,f, 'c', 'LineWidth', 2);
xlabel('');
ylabel('Firmness of apples after one month of storage')
title('Histogram compared to normal distribution');
hold of

You are confusing
hist
with
histc
Read up on both.
Also, you are not defining the number of bins, you are defining the bins themselves .

I don't have Matlab at hand now, but try the following:
If you want to compare a normal distribution to the bar plot bar(nbins,n), you should first normalize it:
bar(nbins,n/sum(n))
See if this solves your problem.
If not, try also removing the line f = f*sum(nbins)/sum(f);.

Related

MATLAB: combining and normalizing histograms with different sample sizes

I have four sets of data, the distribution of which I would like to represent in MATLAB in one figure. Current code is:
[n1,x1]=hist([dataset1{:}]);
[n2,x2]=hist([dataset2{:}]);
[n3,x3]=hist([dataset3{:}]);
[n4,x4]=hist([dataset4{:}]);
bar(x1,n1,'hist');
hold on; h1=bar(x1,n1,'hist'); set(h1,'facecolor','g')
hold on; h2=bar(x2,n2,'hist'); set(h2,'facecolor','g')
hold on; h3=bar(x3,n3,'hist'); set(h3,'facecolor','g')
hold on; h4=bar(x4,n4,'hist'); set(h4,'facecolor','g')
hold off
My issue is that I have different sampling sizes for each group, dataset1 has an n of 69, dataset2 has an n of 23, dataset3 and dataset4 have n's of 10. So how do I normalize the distributions when representing these three groups together?
Is there some way to..for example..divide the instances in each bin by the sampling for that group?
You can normalize your histograms by dividing by the total number of elements:
[n1,x1] = histcounts(randn(69,1));
[n2,x2] = histcounts(randn(23,1));
[n3,x3] = histcounts(randn(10,1));
[n4,x4] = histcounts(randn(10,1));
hold on
bar(x4(1:end-1),n4./sum(n4),'histc');
bar(x3(1:end-1),n3./sum(n3),'histc');
bar(x2(1:end-1),n2./sum(n2),'histc');
bar(x1(1:end-1),n1./sum(n1),'histc');
hold off
ax = gca;
set(ax.Children,{'FaceColor'},mat2cell(lines(4),ones(4,1),3))
set(ax.Children,{'FaceAlpha'},repmat({0.7},4,1))
However, as you can see above, you can do some more things to make your code more simple and short:
You only need to hold on once.
Instead of collecting all the bar handles, use the axes handle.
Plot the bar in ascending order of the number of elements in the dataset, so all histograms will be clearly visible.
With the axes handle set all properties at one command.
and as a side note - it's better to use histcounts.
Here is the result:
EDIT:
If you want to also plot the pdf line from histfit, then you can save it first, and then plot it normalized:
dataset = {randn(69,1),randn(23,1),randn(10,1),randn(10,1)};
fits = zeros(100,2,numel(dataset));
hold on
for k = numel(dataset):-1:1
total = numel(dataset{k}); % for normalizing
f = histfit(dataset{k}); % draw the histogram and fit
% collect the curve data and normalize it:
fits(:,:,k) = [f(2).XData; f(2).YData./total].';
x = f(1).XData; % collect the bar positions
n = f(1).YData; % collect the bar counts
f.delete % delete the histogram and the fit
bar(x,n./total,'histc'); % plot the bar
end
ax = gca; % get the axis handle
% set all color and transparency for the bars:
set(ax.Children,{'FaceColor'},mat2cell(lines(4),ones(4,1),3))
set(ax.Children,{'FaceAlpha'},repmat({0.7},4,1))
% plot all the curves:
plot(squeeze(fits(:,1,:)),squeeze(fits(:,2,:)),'LineWidth',3)
hold off
Again, there are some other improvements you can introduce to your code:
Put everything in a loop to make thigs more easily changed later.
Collect all the curves data to one variable so you can plot them all together very easily.
The new result is:

Plot confusion matrix

I want to plot a confusion matrix in MATLAB. Here's my code;
data = rand(3, 3)
imagesc(data)
colormap(gray)
colorbar
When I run this, a confusion matrix with a color bar is shown. But usually, I have seen confusion matrix in MATLAB will give counts as well as probabilities. How can I get them? How can I change the class labels which will be shown as 1,2,3, etc.?
I want a matrix like this:
If you do not have the neural network toolbox, you can use plotConfMat. It gets you the following result.
I have also included an independent example below without the need for the function:
% sample data
confmat = magic(3);
labels = {'Dog', 'Cat', 'Horse'};
numlabels = size(confmat, 1); % number of labels
% calculate the percentage accuracies
confpercent = 100*confmat./repmat(sum(confmat, 1),numlabels,1);
% plotting the colors
imagesc(confpercent);
title('Confusion Matrix');
ylabel('Output Class'); xlabel('Target Class');
% set the colormap
colormap(flipud(gray));
% Create strings from the matrix values and remove spaces
textStrings = num2str([confpercent(:), confmat(:)], '%.1f%%\n%d\n');
textStrings = strtrim(cellstr(textStrings));
% Create x and y coordinates for the strings and plot them
[x,y] = meshgrid(1:numlabels);
hStrings = text(x(:),y(:),textStrings(:), ...
'HorizontalAlignment','center');
% Get the middle value of the color range
midValue = mean(get(gca,'CLim'));
% Choose white or black for the text color of the strings so
% they can be easily seen over the background color
textColors = repmat(confpercent(:) > midValue,1,3);
set(hStrings,{'Color'},num2cell(textColors,2));
% Setting the axis labels
set(gca,'XTick',1:numlabels,...
'XTickLabel',labels,...
'YTick',1:numlabels,...
'YTickLabel',labels,...
'TickLength',[0 0]);
If you have the neural network toolbox you can use the function plotconfusion. You can create a copy and edit it to customise it further, for example to print custom labels.

finding minimum reflection points using matlab / octave

I have a dataset the red line.
I'm trying to find the minimum points highlighted in yellow if I take the reflection/mirror image of a data set.
See example code / plot below I'm trying to find a way to find the minimum points highlighted in yellow of the reflection/mirror image of a dataset (the blue line) that is below the reflection line (the black line).
Please note this is just a simple dataset there will be much larger datasets around 100000+
PS: I'm using Octave 3.8.1 which is like matlab
clear all,clf, clc,tic
x1=[0.;2.04;4.08;6.12;8.16;10.2;12.24;14.28;16.32;18.36]
y1=[2;2.86;4;2;1;4;5;2;7;1]
x2=[0.;2.04;4.08;6.12;8.16;10.2;12.24;14.28;16.32;18.36]
y2=abs(y1-max(y1));
data1 = y2;
reflection_line=max(y1)/2
[pks3 idx3] = findpeaks(data1,"DoubleSided","MinPeakHeight",0.1);
line([min(x1) max(x1)], [reflection_line reflection_line]);
hold on;
plot(x1,reflection_line)
hold on;
plot(x1,y1,'-r',x2,y2,'-b')
I am not reflecting your original data, but rather find the local maxima of the original values, that are larger than your given line.
Using a DIY alternative to findpeaks:
A value is a local maximum, if it's larger than (or equal to) its predecessor and successor.
%% Setup
x1 = [0.;2.04;4.08;6.12;8.16;10.2;12.24;14.28;16.32;18.36];
y1 = [2;2.86;4;2;1;4;5;2;7;1];
reflection_line = max(y1)/2;
%% Sort by x value
[x1, I] = sort(x1);
y1 = y1(I);
%% Compute peaks
maxima = #(y) [true; y(2:end)>=y(1:end-1)] & ... % Value larger than predecessor
[y(1:end-1)>=y(2:end); true]; % Value larger than successor
maximaLargerThanLine = maxima(y1) & (y1>reflection_line);
%% Plotting
plot(x1,y1);
hold on;
plot(x1(maximaLargerThanLine),y1(maximaLargerThanLine),'rx');
line([min(x1) max(x1)], [reflection_line reflection_line]);

Symbolic gradient differing wildly from analytic gradient

I am trying to simulate a network of mobile robots that uses artificial potential fields for movement planning to a shared destination xd. This is done by generating a series of m-files (one for each robot) from a symbolic expression, as this seems to be the best way in terms of computational time and accuracy. However, I can't figure out what is going wrong with my gradient computation: the analytical gradient that is being computed seems to be faulty, while the numerical gradient is calculated correctly (see the image posted below). I have written a MWE listed below, which also exhibits this problem. I have checked the file generating part of the code, and it does return a correct function file with a correct gradient. But I can't figure out why the analytic and numerical gradient are so different.
(A larger version of the image below can be found here)
% create symbolic variables
xd = sym('xd',[1 2]);
x = sym('x',[2 2]);
% create a potential function and a gradient function for both (x,y) pairs
% in x
for i=1:size(x,1)
phi = norm(x(i,:)-xd)/norm(x(1,:)-x(2,:)); % potential field function
xvector = reshape(x.',1,size(x,1)*size(x,2)); % reshape x to allow for gradient computation
grad = gradient(phi,xvector(2*i-1:2*i)); % compute the gradient
gradx = grad(1);grady=grad(2); % split the gradient in two components
% create function file names
gradfun = strcat('GradTester',int2str(i),'.m');
phifun = strcat('PotTester',int2str(i),'.m');
% generate two output files
matlabFunction(gradx, grady,'file',gradfun,'outputs',{'gradx','grady'},'vars',{xvector, xd});
matlabFunction(phi,'file',phifun,'vars',{xvector, xd});
end
clear all % make sure the workspace is empty: the functions are in the files
pause(0.1) % ensure the function file has been generated before it is called
% these are later overwritten by a specific case, but they can be used for
% debugging
x = 0.5*rand(2);
xd = 0.5*rand(1,2);
% values for the Stackoverflow case
x = [0.0533 0.0023;
0.4809 0.3875];
xd = [0.4087 0.4343];
xp = x; % dummy variable to keep x intact
% compute potential field and gradient for both (x,y) pairs
for i=1:size(x,1)
% create a grid centered on the selected (x,y) pair
xGrid = (x(i,1)-0.1):0.005:(x(i,1)+0.1);
yGrid = (x(i,2)-0.1):0.005:(x(i,2)+0.1);
% preallocate the gradient and potential matrices
gradx = zeros(length(xGrid),length(yGrid));
grady = zeros(length(xGrid),length(yGrid));
phi = zeros(length(xGrid),length(yGrid));
% generate appropriate function handles
fun = str2func(strcat('GradTester',int2str(i)));
fun2 = str2func(strcat('PotTester',int2str(i)));
% compute analytic gradient and potential for each position in the xGrid and
% yGrid vectors
for ii = 1:length(yGrid)
for jj = 1:length(xGrid)
xp(i,:) = [xGrid(ii) yGrid(jj)]; % select the position
Xvec = reshape(xp.',1,size(x,1)*size(x,2)); % turn the input into a vector
[gradx(ii,jj),grady(ii,jj)] = fun(Xvec,xd); % compute gradients
phi(jj,ii) = fun2(Xvec,xd); % compute potential value
end
end
[FX,FY] = gradient(phi); % compute the NUMERICAL gradient for comparison
%scale the numerical gradient
FX = FX/0.005;
FY = FY/0.005;
% plot analytic result
subplot(2,2,2*i-1)
hold all
xlim([xGrid(1) xGrid(end)]);
ylim([yGrid(1) yGrid(end)]);
quiver(xGrid,yGrid,-gradx,-grady)
contour(xGrid,yGrid,phi)
title(strcat('Analytic result for position ',int2str(i)));
xlabel('x');
ylabel('y');
subplot(2,2,2*i)
hold all
xlim([xGrid(1) xGrid(end)]);
ylim([yGrid(1) yGrid(end)]);
quiver(xGrid,yGrid,-FX,-FY)
contour(xGrid,yGrid,phi)
title(strcat('Numerical result for position ',int2str(i)));
xlabel('x');
ylabel('y');
end
The potential field I am trying to generate is defined by an (x,y) position, in my code called xd. x is the position matrix of dimension N x 2, where the first column represents x1, x2, and so on, and the second column represents y1, y2, and so on. Xvec is simply a reshaping of this vector to x1,y1,x2,y2,x3,y3 and so on, as the matlabfunction I am generating only accepts vector inputs.
The gradient for robot i is being calculated by taking the derivative w.r.t. x_i and y_i, these two components together yield a single derivative 'vector' shown in the quiver plots. The derivative should look like this, and I checked that the symbolic expression for [gradx,grady] indeed looks like that before an m-file is generated.
To fix the particular problem given in the question, you were actually calculating phi in such a way that meant you doing gradient(phi) was not giving the correct results compared to the symbolic gradient. I'll try and explain. Here is how you created xGrid and yGrid:
% create a grid centered on the selected (x,y) pair
xGrid = (x(i,1)-0.1):0.005:(x(i,1)+0.1);
yGrid = (x(i,2)-0.1):0.005:(x(i,2)+0.1);
But then in the for loop, ii and jj were used like phi(jj,ii) or gradx(ii,jj), but corresponding to the same physical position. This is why your results were different. Another problem you had was you used gradient incorrectly. Matlab assumes that [FX,FY]=gradient(phi) means that phi is calculated from phi=f(x,y) where x and y are matrices created using meshgrid. You effectively had the elements of phi arranged differently to that, an so gradient(phi) gave the wrong answer. Between reversing the jj and ii, and the incorrect gradient, the errors cancelled out (I suspect you tried doing phi(jj,ii) after trying phi(ii,jj) first and finding it didn't work).
Anyway, to sort it all out, on the line after you create xGrid and yGrid, put this in:
[X,Y]=meshgrid(xGrid,yGrid);
Then change the code after you load fun and fun2 to:
for ii = 1:length(xGrid) %// x loop
for jj = 1:length(yGrid) %// y loop
xp(i,:) = [X(ii,jj);Y(ii,jj)]; %// using X and Y not xGrid and yGrid
Xvec = reshape(xp.',1,size(x,1)*size(x,2));
[gradx(ii,jj),grady(ii,jj)] = fun(Xvec,xd);
phi(ii,jj) = fun2(Xvec,xd);
end
end
[FX,FY] = gradient(phi,0.005); %// use the second argument of gradient to set spacing
subplot(2,2,2*i-1)
hold all
axis([min(X(:)) max(X(:)) min(Y(:)) max(Y(:))]) %// use axis rather than xlim/ylim
quiver(X,Y,gradx,grady)
contour(X,Y,phi)
title(strcat('Analytic result for position ',int2str(i)));
xlabel('x');
ylabel('y');
subplot(2,2,2*i)
hold all
axis([min(X(:)) max(X(:)) min(Y(:)) max(Y(:))])
quiver(X,Y,FX,FY)
contour(X,Y,phi)
title(strcat('Numerical result for position ',int2str(i)));
xlabel('x');
ylabel('y');
I have some other comments about your code. I think your potential function is ill-defined, which is causing all sorts of problems. You say in the question that x is an Nx2 matrix, but you potential function is defined as
norm(x(i,:)-xd)/norm(x(1,:)-x(2,:));
which means if N was three, you'd have the following three potentials:
norm(x(1,:)-xd)/norm(x(1,:)-x(2,:));
norm(x(2,:)-xd)/norm(x(1,:)-x(2,:));
norm(x(3,:)-xd)/norm(x(1,:)-x(2,:));
and I don't think the third one makes sense. I think this could be causing some confusion with the gradients.
Also, I'm not sure if there is a reason to create the .m file functions in your real code, but they are not necessary for the code you posted.

side by side multiply histogram in matlab

I would like to produce a plot like the following in matlab.
Or may be something like this
You can use bar(...) or hist(...) to get the results you want. Consider the following code with results shown below:
% Make some play data:
x = randn(100,3);
[y, b] = hist(x);
% You can plot on your own bar chart:
figure(82);
bar(b,y, 'grouped');
title('Grouped bar chart');
% Bust histogram will work here:
figure(44);
hist(x);
title('Histogram Automatically Grouping');
% Consider stack for the other type:
figure(83);
bar(b,y,'stacked');
title('Stacked bar chart');
If your data is different sizes and you want to do histograms you could choose bins yourself to force hist(...) results to be the same size then plot the results stacked up in a matrix, as in:
data1 = randn(100,1); % data of one size
data2 = randn(25, 1); % data of another size!
myBins = linspace(-3,3,10); % pick my own bin locations
% Hists will be the same size because we set the bin locations:
y1 = hist(data1, myBins);
y2 = hist(data2, myBins);
% plot the results:
figure(3);
bar(myBins, [y1;y2]');
title('Mixed size result');
With the following results:
Doesn't hist already do the first one?
From help hist:
N = HIST(Y) bins the elements of Y into 10 equally spaced containers
and returns the number of elements in each container. If Y is a
matrix, HIST works down the columns.
For the second look at help bar