Finding where two histograms cross paths - MATLAB - matlab

I am generating two histograms using the histogram function from Matlab that are both normalized using the probability argument.
However, once I generate two histograms as shown below, I'd like to be able to find the exact point at which the histograms would cross paths, assuming the histograms were drawn using lines instead of bars. Unfortunately, this form of histogram doesn't allow for lines, it just has bars. There is a hist function which can be manipulated in Matlab to draw a histogram as lines instead of bars, however, it doesn't easily normalize.
Hence, ideally, I'd like to use histogram() to plot the 2 histograms and find where they cross. See image below:
Here's an example of how the graphs can be created:
x = randn(2000,1);
y = 1 + randn(5000,1);
h1 = histogram(x);
hold on
h2 = histogram(y);
h1.Normalization = 'probability';
h1.BinWidth = 0.25;
h2.Normalization = 'probability';
h2.BinWidth = 0.25;
Now from here, I want to find the point where the two histograms cross paths. Note, the intersection value is the intersection (in the mathematical sense). This is not what I'm looking for. I'm looking for the x coordinate of where the two histograms cross at their outer boundaries. For example, in the attached image, the answer would be ~2.5.

From your example data, with a simple modification:
x = randn(2000,1);
y = 1 + randn(5000,1);
h1 = histogram(x);
hold on
h2 = histogram(y);
h1.Normalization = 'probability';
h1.BinWidth = 0.25;
h1.BinLimits=[min([x(:); y(:)]) max([x(:); y(:)])];
h2.Normalization = 'probability';
h2.BinWidth = 0.25;
h2.BinLimits=[min([x(:); y(:)]) max([x(:); y(:)])];
data1=h1.Values;
data2=h2.Values;
intersection_value=find(data2>data1,1); % this is the index, bad variable name

Related

Interpolation between curves that may overlap

I try to interpolate curves in the gaps of some already existing curves using matlab (and the interp1-function).
Start of edit 1
I already have the data of 5 Torque over rpm curves that I obtained with a number of simulations for each curve. Since simulation time is precious, I would like to save time with the interpolation of curves that "fill the gap" between the already existing ones.
I'm looking to form the following:
Further thoughts of me down below.
End of edit 1
I tried to follow the steps from the question in the thread Interpolation between two curves (matlab) but it does not seem to work with my code. I am not sure if the code is actually applicable since the curves might overlap...
I tried to edit the code from the link above like the following:
% Save the data in one array each. The original data is stored in the
% arrays "x/yOriginal" row-wise.
curve1_data = [xOriginal(:,1) yOriginal(:,1)];
curve2_data = [xOriginal(:,2) yOriginal(:,2)];
% This was to try if the code from the link works
yy = [0:1:60];
xx1 = interp1(curve1_data(:,2),curve1_data(:,1),yy,'spline');
xx2 = interp1(curve2_data(:,2),curve2_data(:,1),yy,'spline');
m = 3; % Curve_Offset
mm(:,m) = xx1 + (xx2-xx1)*(m/(8750-5000));
% With the following I tried to interpolate over x
xx = (xOriginal(1,1):1:xOriginal(1,2));
yy1 = interp1(curve1_data(:,1),curve1_data(:,2),xx,'spline');
yy2 = interp1(curve2_data(:,1),curve2_data(:,2),xx,'spline');
m = 3; % Curve_Offset
% From the original code:
mm(:,m) = xx1 + (xx2-xx1)*(m/(8750-5000));
% Interpolation over x
yINT = yy1 + (yy2-yy1)*(m/8750-5000);
None of the interpolation-techniques worked, the y-values are either 90% negative (with the code from the link) or way too high (10e8 with the interpolation over x).
What I expected was that it creates a curve a little less steep than the curve "to its left" and a bit steeper than the curve "to its right".
My further thoughts:
The existing curves are the product of big 3-dimensional arrays. I.e. it could be that it is more the way to go to interpolate the arrays and then "read out" the Torque-over-rpm-Curves. On the other hand, I don't see a way to interpolate between two 1001-by-7001-by-5 arrays...
Moreover, for the next steps with the programm, the curve-interpolation needs to be quite fine (it is necessary to have way more than 1 interpolated curve between two existing curves) which makes the problem even more difficult.
If I understand right, the plot is generated with something like this:
plot(xOriginal(:,1),yOriginal(:,1))
plot(xOriginal(:,2),yOriginal(:,2))
% ...
If so, you can plot an intermediate curve with
plot((xOriginal(:,1) + xOriginal(:,2))/2, (yOriginal(:,1) + yOriginal(:,2))/2)
That is, the average between each pair of coordinates forms a curve that is exactly half-way between the two original curves.
Use weighted averages to generate more of these curves. This is linear interpolation.
d = 0.2;
plot(d*xOriginal(:,1) + (1-d)*xOriginal(:,2), d*yOriginal(:,1) + (1-d)*yOriginal(:,2))
Setting d = 0.5 we go back to the half-way case above.
Example:
xOriginal(:,1) = linspace(0.2,0.5,100);
yOriginal(:,1) = 3 * cos(xOriginal(:,1)*10-2.5) + 3;
xOriginal(:,2) = linspace(0.6,0.8,100);
yOriginal(:,2) = cos(xOriginal(:,2)*20-13.5) + 1;
clf; hold on
plot(xOriginal(:,1),yOriginal(:,1))
plot(xOriginal(:,2),yOriginal(:,2))
plot((xOriginal(:,1)+xOriginal(:,2))/2,(yOriginal(:,1)+yOriginal(:,2))/2)
(the interpolated line is in orange)

Matlab fit poisson function to histogram

I am trying to fit a Poisson function to a histogram in Matlab: the example calls for using hist() (which is deprecated) so I want to use histogram() instead, especially as you cannot seem to normalize a hist(). I then want to apply a poisson function to it using poisspdf() or any other standard function (preferably no toolboxes!). The histogram is probability scaled, which is where the issue with the poisson function comes from AFAIK.
clear
clc
lambda = 5;
range = 1000;
rangeVec = 1:range;
randomData = poissrnd(lambda, 1, range);
histoFigure = histogram(randomData, 'Normalization', 'probability');
hold on
poissonFunction = poisspdf(randomData, lambda);
poissonFunction2 = poisspdf(histoFigure, lambda);
plot(poissonFunction)
plot(poissonFunction2)
I have tried multiple different methods of creating the poisson function + plotting and neither of them seems to work: the values within this function are not consistent with the histogram values as they differ by several decimals.
This is what the image should look like
however currently I can only get the bar graphs to show up correctly.
You're not specifing the x-data of you're curve. Then the sample number is used and since you have 1000 samples, you get the ugly plot. The x-data that you use is randomData. Using
plot(randomData, poissonFunction)
will lead to lines between different samples, because the samples follow each other randomly. To take each sample only once, you can use unique. It is important that the x and y values stay connected to each other, so it's best to put randomData and poissonFunction in 1 matrix, and then use unique:
d = [randomData;poissonFunction].'; % make 1000x2 matrix to find unique among rows
d = unique(d,'rows');
You can use d to plot the data.
Full code:
clear
clc
lambda = 5;
range = 1000;
rangeVec = 1:range;
randomData = poissrnd(lambda, 1, range);
histoFigure = histogram(randomData, 'Normalization', 'probability');
hold on
poissonFunction = poisspdf(randomData, lambda);
d = [randomData; poissonFunction].';
d = unique(d, 'rows');
plot(d(:,1), d(:,2))
With as result:

How can I specify different points in the plot in matlab

I have generated a data set in matlab then some outliers embedding in the data. I would like to plot it and since I'm new in matlab I don't know how to specify the outliers from inliers by different sign or different color. The points which are outlyingness with respect to the x axis, y axis and both of them. This is the matlab codes for that;
pd = makedist('Normal');
rng(38)
a = random(pd,100,1);
b = datasample(1:100,40,'Replace',false);
pd1 = makedist('Normal','mu',10*sqrt(2),'sigma',0.1);
a(b)=random(pd1,40,1);
a=reshape(a,[50,2]);
plot(a(:,1),a(:,2),'O')
I would be appreciated if you could help me.
In this example I assumed that the points which distance along OX axis is greater than 3 are outliers and marked them red (whereas normal points are marked blue):
centroid = mean(a);
distx = a(:,1) - centroid(1);
disty = a(:,2) - centroid(2);
outliers_x = distx > 3;
plot(centroid(1), centroid(2), 'xk')
hold on
plot(a(outliers_x,1),a(outliers_x,2),'or')
plot(a(~outliers_x,1),a(~outliers_x,2),'ob')
hold off
Note that I've also displayed the centroid as a black "X" mark.
hold on/hold off are used to "stack" several plots (or images) together
You may want to read hold() reference. Also here you'll find which markers and colors are available.
To answer to my question I have written the following codes, in order to specify 4 groups of observations with different color.
pd = makedist('Normal');
rng(38)
a = random(pd,100,1);
b = datasample(1:100,40,'Replace',false);
pd1 = makedist('Normal','mu',10*sqrt(2),'sigma',0.1);
a(b)=random(pd1,40,1);
a=reshape(a,[50,2]);
hold all;
aa=(a >= 10 | a >= 10);
rep=repmat(0, 1, 50);
aaa=[rep',aa];
n=50;
for i=1:n; plot(a(i,1),a(i,2),'o','col',aaa(i,:));
end

Finding 2D area defined by contour lines in Matlab

I am having difficulty with calculating 2D area of contours produced from a Kernel Density Estimation (KDE) in Matlab. I have three variables:
X and Y = meshgrid which variable 'density' is computed over (256x256)
density = density computed from the KDE (256x256)
I run the code
contour(X,Y,density,10)
This produces the plot that is attached. For each of the 10 contour levels I would like to calculate the area. I have done this in some other platforms such as R but am having trouble figuring out the correct method / syntax in Matlab.
C = contourc(density)
I believe the above line would store all of the values of the contours allowing me to calculate the areas but I do not fully understand how these values are stored nor how to get them properly.
This little script will help you. Its general for contour. Probably working for contour3 and contourf as well, with adjustments of course.
[X,Y,Z] = peaks; %example data
% specify certain levels
clevels = [1 2 3];
C = contour(X,Y,Z,clevels);
xdata = C(1,:); %not really useful, in most cases delimters are not clear
ydata = C(2,:); %therefore further steps to determine the actual curves:
%find curves
n(1) = 1; %n: indices where the certain curves start
d(1) = ydata(1); %d: distance to the next index
ii = 1;
while true
n(ii+1) = n(ii)+d(ii)+1; %calculate index of next startpoint
if n(ii+1) > numel(xdata) %breaking condition
n(end) = []; %delete breaking point
break
end
d(ii+1) = ydata(n(ii+1)); %get next distance
ii = ii+1;
end
%which contourlevel to calculate?
value = 2; %must be member of clevels
sel = find(ismember(xdata(n),value));
idx = n(sel); %indices belonging to choice
L = ydata( n(sel) ); %length of curve array
% calculate area and plot all contours of the same level
for ii = 1:numel(idx)
x{ii} = xdata(idx(ii)+1:idx(ii)+L(ii));
y{ii} = ydata(idx(ii)+1:idx(ii)+L(ii));
figure(ii)
patch(x{ii},y{ii},'red'); %just for displaying purposes
%partial areas of all contours of the same plot
areas(ii) = polyarea(x{ii},y{ii});
end
% calculate total area of all contours of same level
totalarea = sum(areas)
Example: peaks (by Matlab)
Level value=2 are the green contours, the first loop gets all contour lines and the second loop calculates the area of all green polygons. Finally sum it up.
If you want to get all total areas of all levels I'd rather write some little functions, than using another loop. You could also consider, to plot just the level you want for each calculation. This way the contourmatrix would be much easier and you could simplify the process. If you don't have multiple shapes, I'd just specify the level with a scalar and use contour to get C for only this level, delete the first value of xdata and ydata and directly calculate the area with polyarea
Here is a similar question I posted regarding the usage of Matlab contour(...) function.
The main ideas is to properly manipulate the return variable. In your example
c = contour(X,Y,density,10)
the variable c can be returned and used for any calculation over the isolines, including area.

Draw Normal Distribution Graph of a Sample in Matlab

I have 100 sampled numbers, and I need to draw the normal distribution curve of them in matlab.
The mean and standard deviation of these sampled data can be calculated easily, but is there any function that plots the normal distribution?
If you have access to Statistics Toolbox, the function histfit does what I think you need:
>> x = randn(10000,1);
>> histfit(x)
Just like with the hist command, you can also specify the number of bins, and you can also specify which distribution is used (by default, it's a normal distribution).
If you don't have Statistics Toolbox, you can reproduce a similar effect using a combination of the answers from #Gunther and #learnvst.
Use hist:
hist(data)
It draws a histogram plot of your data:
You can also specify the number of bins to draw, eg:
hist(data,5)
If you only want to draw the resulting pdf, create it yourself using:
mu=mean(data);
sg=std(data);
x=linspace(mu-4*sg,mu+4*sg,200);
pdfx=1/sqrt(2*pi)/sg*exp(-(x-mu).^2/(2*sg^2));
plot(x,pdfx);
You probably can overlay this on the previous hist plot (I think you need to scale things first however, the pdf is in the range 0-1, and the histogram is in the range: number of elements per bin).
If you want to draw a Gaussian distribution for your data, you can use the following code, replacing mean and standard deviation values with those calculated from your data set.
STD = 1;
MEAN = 2;
x = -4:0.1:4;
f = ( 1/(STD*sqrt(2*pi)) ) * exp(-0.5*((x-MEAN)/STD).^2 );
hold on; plot (x,f);
The array x in this example is the xaxis of your distribution, so change that to whatever range and sampling density you have.
If you want to draw your Gaussian fit over your data without the aid of the signal processing toolbox, the following code will draw such a plot with correct scaling. Just replace y with your own data.
y = randn(1000,1) + 2;
x = -4:0.1:6;
n = hist(y,x);
bar (x,n);
MEAN = mean(y);
STD = sqrt(mean((y - MEAN).^2));
f = ( 1/(STD*sqrt(2*pi)) ) * exp(-0.5*((x-MEAN)/STD).^2 );
f = f*sum(n)/sum(f);
hold on; plot (x,f, 'r', 'LineWidth', 2);