Calculating the sum of two triangular random variables (Matlab)

I would like to calculate the probability that the sum of two triangular random variables falls below a threshold y,
P(x1+x2 < y)
Is there a faster way to implement the sum of two triangular random variables in Matlab?
EDIT: It seems there's possibly a much easier way, as shown in this minitab demonstration. So it's not impossible. It doesn't explain how the PDF was calculated, sadly. Still looking into how I can do this in matlab.
EDIT2: Following advice, I'm using conv function in Matlab to develop the PDF of the sum of two random variables:
clear all;
clc;
pd1 = makedist('Triangular','a',85,'b',90,'c',100);
pd2 = makedist('Triangular','a',90,'b',100,'c',110);
x = linspace(85,290,200);
x1 = linspace(85,100,200);
x2 = linspace(90,110,200);
pdf1 = pdf(pd1,x1);
pdf2 = pdf(pd2,x2);
z = median(diff(x))*conv(pdf1,pdf2,'same');
p1 = trapz(x1,pdf1) %probability P(x1<y)
p2 = trapz(x2,pdf2) %probability P(x2<y)
p12 = trapz(x,z) %probability P(x1+x2 <y)
hold on;
plot(x1,pdf1) %plot pdf of dist. x1
plot(x2,pdf2) %plot pdf of dist. x2
plot(x,z) %plot pdf of x1+x2
hold off;
However, this code has two problems:
The PDF of X1+X2 integrates to much more than 1.
The PDF of X1+X2 varies widely depending on the range of x. Intuitively, if y is larger than 210 (the sum of the upper limits "c" of the two individual triangular distributions, 100 + 110), shouldn't P(X1+X2 < y) equal 1? Also, since the lower limits "a" are 85 and 90, shouldn't P(X1+X2 < 85) equal 0?

The pdf of the sum of independent variables is the convolution of the pdfs of the variables. So you need to compute the convolution of two triangular pdfs. A triangle is piecewise linear, so the convolution of two of them will be piecewise cubic.
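Written out, as a reminder of the standard result for independent X1 and X2 (the symbols s and t are just integration variables chosen here):

```latex
f_{X_1+X_2}(s) = \int_{-\infty}^{\infty} f_{X_1}(t)\, f_{X_2}(s-t)\, dt,
\qquad
P(X_1+X_2 < y) = \int_{-\infty}^{y} f_{X_1+X_2}(s)\, ds .
```

With the supports from the question, [85, 100] and [90, 110], the integrand vanishes unless 175 <= s <= 210, so the probability is 0 for y <= 175 and 1 for y >= 210.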
There are a few ways to go about it. If a numerical result is acceptable: discretize the pdfs and compute the convolution of the discretized pdfs. I believe there is a function conv in Matlab for that. If not, you can take the fast Fourier transform (via fft) of each pdf, compute the product point by point, then take the inverse transform (ifft if I remember correctly), since fft(convolution(f, g)) = fft(f) fft(g) once both are zero-padded to the full output length. You will need to be careful about normalization if you use either conv or fft.
If you must have an exact result, the convolution is just an integral, and if you're careful with the limits of integration, you can figure it out by hand. I don't know if the Matlab symbolic toolbox is available to you, and if so, I don't know if it can handle integrals of functions defined piecewise.
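A minimal sketch of the numerical route, assuming the parameters from the question and a hand-written triangular pdf (tripdf is a helper introduced here for illustration, not a Matlab built-in). It computes the discretized convolution with both conv and fft and applies the dx normalization mentioned above:

```matlab
% Hand-written triangular pdf (a = lower limit, b = mode, c = upper limit).
% tripdf is a hypothetical helper, used so the sketch needs no toolbox.
tripdf = @(x,a,b,c) (x >= a & x <= b) .* 2.*(x-a)./((c-a)*(b-a)) + ...
                    (x >  b & x <= c) .* 2.*(c-x)./((c-a)*(c-b));

dx = 0.01;                       % common grid spacing for both pdfs
x1 = 85:dx:100;
x2 = 90:dx:110;
f1 = tripdf(x1, 85, 90, 100);
f2 = tripdf(x2, 90, 100, 110);

n  = numel(f1) + numel(f2) - 1;  % length of the full linear convolution
xs = linspace(x1(1)+x2(1), x1(end)+x2(end), n);   % support of the sum

% Direct convolution; the factor dx turns the discrete sum into an integral
fsum_conv = conv(f1, f2) * dx;

% Same result via FFT: zero-pad both to length n, multiply, transform back
fsum_fft = real(ifft(fft(f1, n) .* fft(f2, n))) * dx;

total   = trapz(xs, fsum_conv);              % should be close to 1
maxdiff = max(abs(fsum_conv - fsum_fft));    % the two routes should agree
```

The important detail is that the output grid has the same spacing dx as the inputs and starts at the sum of the two lower limits.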

Below is the proper implementation for future users. Many thanks to Robert Dodier for guidance.
clear all;
clc;
min1 = 85;
max1 = 100;
min2 = 90;
max2 = 110;
y = 210;
pd1 = makedist('Triangular','a',min1,'b',90,'c',max1);
pd2 = makedist('Triangular','a',min2,'b',100,'c',max2);
dx = 0.01; % to ensure constant spacing
x1 = min1:dx:max1; % Could include some of the region where
x2 = min2:dx:max2; % the pdf is 0, but we don't have to.
x12 = linspace(...
x1(1) + x2(1) , ...
x1(end) + x2(end) , ...
length(x1)+length(x2)-1);
[c,index] = min(abs(x12-y));
x_short = linspace(min1+min2,x12(index),index);
pdf1 = pdf(pd1,x1);
pdf2 = pdf(pd2,x2);
pdf12 = conv(pdf1,pdf2)*dx;
zz = pdf12(1:index);
zz(index) = 0;
p1 = trapz(x1,pdf1)
p2 = trapz(x2,pdf2)
p12 = trapz(x_short,zz)
plot(x1,pdf1,x2,pdf2,x12,pdf12)
hold on;
fill(x_short,zz,'blue') % plot x1+x2
hold off;
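As a cross-check on the convolution result, here is a rough Monte Carlo sketch. It assumes the same parameters as above and samples by inverting the triangular CDF by hand (triinv is a helper introduced here, not a Matlab function); the threshold y = 190 is an arbitrary example value:

```matlab
% Inverse CDF of a triangular distribution (a = lower limit, b = mode,
% c = upper limit); u is uniform on (0,1). Hypothetical helper.
triinv = @(u,a,b,c) (u <  (b-a)/(c-a)) .* (a + sqrt(u*(c-a)*(b-a))) + ...
                    (u >= (b-a)/(c-a)) .* (c - sqrt((1-u)*(c-a)*(c-b)));

rng(0);                          % fixed seed, for reproducibility
N = 1e6;                         % number of simulated pairs
s = triinv(rand(N,1), 85, 90, 100) + triinv(rand(N,1), 90, 100, 110);

y    = 190;                      % example threshold (arbitrary choice)
p_mc = mean(s < y);              % Monte Carlo estimate of P(X1+X2 < y)
p_lo = mean(s < 175);            % below the smallest possible sum
p_hi = mean(s <= 210);           % at the largest possible sum
```

The estimate p_mc should agree with trapz over the convolved pdf to within sampling noise, while p_lo and p_hi should come out as 0 and 1.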

Related

Vectors must be the same length error in Curve Fitting in Matlab

I'm having problems in curve fitting my randomized data for the function
Here is my code
N = 100;
mu = 5; stdev = 2;
x = mu+stdev*randn(N,1);
bin=mu-6*stdev:0.5:mu+6*stdev;
f=hist(x,bin);
plot(bin,f,'bo'); hold on;
x_ = x(1):0.1:x(end);
y_ = (1./sqrt(8.*pi)).*exp(-((x_-mu).^2)./8);
plot(x_,y_,'b-'); hold on;
It seems like I'm having vector size problems since it is giving me the error
Error using plot
Vectors must be the same length.
Note that I simplified y_ since mu and the standard deviation is known.
Well, first of all, some adjustments to your question:
You are not trying to do curve fitting. What you are trying to do (in my opinion) is to overlay a probability density function on a histogram obtained by drawing random points from the same distribution (a normal distribution with parameters (mu, sigma)). These two curves should indeed overlap, as they represent the same thing; one is analytical and the other is obtained numerically.
As noted in the hist documentation, hist is not recommended and you should use histogram instead.
First step: Generating your random data
Knowing the distribution is the Normal distribution, we can use MATLAB's random function to do that :
N = 150;
rng('default') % For reproducibility
mu = 5;
sigma = 2;
r = random('Normal',mu,sigma,N,1);
Second step: Plot the histogram
Because we don't just want a count of the elements in each bin, but a feel for the probability density function, we can use the 'Normalization','pdf' name-value pair
Nbins = 25;
f=histogram(r,Nbins,'Normalization','pdf');
hold on
Here I'd rather specify a number of bins than specifying the bins themselves, because you never know in advance how far from the mean your data is going to be.
Last step: overlay the probability density function over the histogram
The histogram being already consistent with a probability density function, it is sufficient to just overlay the density function:
x_ = linspace(min(r),max(r),100);
y_ = (1./sqrt(2*sigma^2*pi)).*exp(-((x_-mu).^2)./(2*sigma^2));
plot(x_,y_,'b-');
[Figures: resulting plots with N = 150, with N = 1500, and with N = 150,000 and Nbins = 50]
If for some obscure reason you want to use the old hist() function
The old hist() function can't handle normalization, so you'll have to do it by hand, by scaling your density function to fit your histogram:
N = 1500;
% rng('default') % For reproducibility
mu = 5;
sigma = 2;
r = random('Normal',mu,sigma,1,N);
Nbins = 50;
[~,centers]=hist(r,Nbins);
hist(r,Nbins); hold on
% Width of bins
Widths = diff(centers);
x_ = linspace(min(r),max(r),100);
y_ = N*mean(Widths)*(1./sqrt(2*sigma^2*pi)).*exp(-((x_-mu).^2)./(2*sigma^2));
plot(x_,y_,'r-');
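The scaling factor N*mean(Widths) comes from the relation count = N * bin width * density. A small sketch of the same relation in the other direction, assuming histcounts is available (R2014b or later) and using randn so no toolbox is needed:

```matlab
rng('default');                      % for reproducibility
N = 1500; mu = 5; sigma = 2;
r = mu + sigma*randn(N,1);           % normal sample without the toolbox

[counts, edges] = histcounts(r, 50); % raw counts in 50 bins
widths  = diff(edges);               % bin widths
density = counts ./ (N * widths);    % counts converted to a pdf estimate

total_area = sum(density .* widths); % area under the normalized histogram
```

Dividing the counts by N times the bin width should give what 'Normalization','pdf' produces, and the area under the bars should be 1.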

Producing a histogram in Matlab without using hist

I am using histograms in Matlab to look at the distribution of some data from my experiments. I want to find the mean distribution (mean height of the bars) from a group of tests then produce an average histogram.
By using this code:
data = zeros(26,31);
for i = 1:length(files6)
x = csvread(files6(i).name);
x = x(1:end,:);
time = x(:,1);
variable = x(:,3);
thing(:,1) = x(:,1);
thing(:,2) = x(:,3);
figure()
binCenter = {0:tbinstep:tbinend 0:varbinstep:varbinend};
hist3(thing, 'Ctrs', binCenter, 'CDataMode','auto','FaceColor','interp');
colorbar
[N,C] = hist3(thing, 'Ctrs', binCenter);
data = data + N;
clearvars x time variable
end
avedata = data / i;
I can find the mean of N, which will be the Z value for the plot (histogram) I want, and I have X,Y (which are the same for all tests) from:
x = 0:tbinstep:tbinend;
y = 0:varbinstep:varbinend;
But how do I bring these together to make the graphical output that shows the average height of the bars? I can't use hist3 again as that will just calculate the distribution of avedata.
AT THE RISK OF STARTING AN XY PROBLEM: using bar3 has been suggested, but that raises the question "how do I go from 2 vectors and a matrix to 1 matrix bar3 can handle?", i.e. how do I plot x(1), y(1), avedata(1,1) and so on for all the data points in avedata?
TIA
Looking at the hist3 source code in Matlab R2014b, it has its own plotting implemented inside, which prepares the data and plots it using the surf method. Here is a function, heavily inspired by hist3, that reproduces the same output with your options ('CDataMode','auto','FaceColor','interp'). You can put this in a new file called hist3plot.m:
function [ h ] = hist3plot( N, C )
%HIST3PLOT Summary of this function goes here
% Detailed explanation goes here
xBins = C{1};
yBins = C{2};
% Computing edges and width
nbins = [length(xBins), length(yBins)];
xEdges = [0.5*(3*xBins(1)-xBins(2)), 0.5*(xBins(2:end)+xBins(1:end-1)), 0.5*(3*xBins(end)-xBins(end-1))];
yEdges = [0.5*(3*yBins(1)-yBins(2)), 0.5*(yBins(2:end)+yBins(1:end-1)), 0.5*(3*yBins(end)-yBins(end-1))];
xWidth = xEdges(2:end)-xEdges(1:end-1);
yWidth = yEdges(2:end)-yEdges(1:end-1);
del = .001; % space between bars, relative to bar size
% Build x-coords for the eight corners of each bar.
xx = xEdges;
xx = [xx(1:nbins(1))+del*xWidth; xx(2:nbins(1)+1)-del*xWidth];
xx = [reshape(repmat(xx(:)',2,1),4,nbins(1)); NaN(1,nbins(1))];
xx = [repmat(xx(:),1,4) NaN(5*nbins(1),1)];
xx = repmat(xx,1,nbins(2));
% Build y-coords for the eight corners of each bar.
yy = yEdges;
yy = [yy(1:nbins(2))+del*yWidth; yy(2:nbins(2)+1)-del*yWidth];
yy = [reshape(repmat(yy(:)',2,1),4,nbins(2)); NaN(1,nbins(2))];
yy = [repmat(yy(:),1,4) NaN(5*nbins(2),1)];
yy = repmat(yy',nbins(1),1);
% Build z-coords for the eight corners of each bar.
zz = zeros(5*nbins(1), 5*nbins(2));
zz(5*(1:nbins(1))-3, 5*(1:nbins(2))-3) = N;
zz(5*(1:nbins(1))-3, 5*(1:nbins(2))-2) = N;
zz(5*(1:nbins(1))-2, 5*(1:nbins(2))-3) = N;
zz(5*(1:nbins(1))-2, 5*(1:nbins(2))-2) = N;
% Plot the bars in a light steel blue.
cc = repmat(cat(3,.75,.85,.95), [size(zz) 1]);
% Plot the surface
h = surf(xx, yy, zz, cc, 'CDataMode','auto','FaceColor','interp');
% Setting x-axis and y-axis limits (xx is built from xBins, yy from yBins)
xlim([xBins(1)-xWidth(1) xBins(end)+xWidth(end)]) % x-axis limit
ylim([yBins(1)-yWidth(1) yBins(end)+yWidth(end)]) % y-axis limit
end
You can then call this function when you want to plot outputs from Matlab's hist3 function. Note that this can handle non-uniform positioning of bins:
close all; clear all;
data = rand(10000,2);
xBins = [0,0.1,0.3,0.5,0.6,0.8,1];
yBins = [0,0.1,0.3,0.5,0.6,0.8,1];
figure()
hist3(data, {xBins yBins}, 'CDataMode','auto','FaceColor','interp')
title('Using hist3')
figure()
[N,C] = hist3(data, {xBins yBins});
hist3plot(N, C); % The function is called here
title('Using hist3plot')
Here is a comparison of the two outputs:
So if I understand your question and code correctly, you are plotting the distribution of multiple experiments' data as histograms, and then you want to calculate the average shape of all those histograms.
I usually avoid giving approaches the asker isn't explicitly asking for, but for this one I must comment that it is a very strange thing to do. I've never heard of calculating the average shape of multiple histograms before. So just in case, you could simply append all your experiments' data into a single variable, and plot a normalized histogram of that using histogram2. This code outputs a relative frequency histogram. (Other normalization methods)
% Append all data in a single matrix
x = [];
for i = 1:length(files6)
x = [x; csvread(files6(i).name)];
end
% Plot normalized bivariate histogram (relative frequency)
xEdges = 0:tbinstep:tbinend;
yEdges = 0:varbinstep:varbinend;
histogram2(x(:,1), x(:,3), xEdges, yEdges, 'Normalization', 'probability')
Now, if you really are looking to draw the average shape of multiple histograms, then yes, use bar3. Since bar3 doesn't accept an (x,y) value argument, you can follow the other answer, or modify the XTickLabel and YTickLabel properties of the axes afterwards to match whatever your bin range is.
... % data = yourAverageData;
bar3(data);
% bar3 returns surface handles, so fetch the axes handle separately
ax = gca;
% Rows of data run along the y-axis, columns along the x-axis
ax.YTick = 1:size(data,1);
ax.YTickLabel = 0:tbinstep:tbinend;
ax.XTick = 1:size(data,2);
ax.XTickLabel = 0:varbinstep:varbinend;

Multivariate Normal Distribution Matlab, probability area

I have 2 arrays: one with x-coordinates, the other with y-coordinates.
Both are normally distributed, as the result of a Monte Carlo simulation. I know how to find the sigma and mu for both arrays, and get a 95% confidence interval:
[mu,sigma]=normfit(x_array);
hist(x_array);
x=norminv([0.025 0.975],mu,sigma)
However, the two arrays are correlated with each other. To plot the probability distribution of the combined arrays, I use the multivariate normal distribution. In MATLAB this gives me:
[MuX,SigmaX]=normfit(x_array);
[MuY,SigmaY]=normfit(y_array);
mu = [MuX MuY];
Sigma=cov(x_array,y_array);
x1 = MuX-4*SigmaX:5:MuX+4*SigmaX; x2 = MuY-4*SigmaY:5:MuY+4*SigmaY;
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
surf(x1,x2,F);
caxis([min(F(:))-.5*range(F(:)),max(F(:))]);
set(gca,'Ydir','reverse')
xlabel('x0-as'); ylabel('y0-as'); zlabel('Probability Density');
So far so good. Now I want to calculate the 95% probability area. I am looking for a function like mndinv, analogous to norminv. However, such a function doesn't exist in MATLAB, which makes sense because there are endless possibilities... Does somebody have a tip about how to get a 95% probability area? Thanks in advance.
For the bivariate case you can add the ellipse that encloses 95% of the probability mass, which is the two-dimensional analogue of what norminv gives you. This ellipse is uniquely identified; for a proof, see the first source listed below.
% Suppose you know the distribution params, or you got them from normfit()
mu = [3, 7];
sigma = [1, 2.5
2.5 9];
% X/Y values for plotting grid
x = linspace(mu(1)-3*sqrt(sigma(1)), mu(1)+3*sqrt(sigma(1)),100);
y = linspace(mu(2)-3*sqrt(sigma(end)), mu(2)+3*sqrt(sigma(end)),100);
% Z values
[X1,X2] = meshgrid(x,y);
Z = mvnpdf([X1(:) X2(:)],mu,sigma);
Z = reshape(Z,length(y),length(x));
% Plot
h = pcolor(x,y,Z);
set(h,'LineStyle','none')
hold on
% Add level set
alpha = 0.05;
r = sqrt(-2*log(alpha));
rho = sigma(2)/sqrt(sigma(1)*sigma(end));
M = [sqrt(sigma(1)) rho*sqrt(sigma(end))
0 sqrt(sigma(end)-sigma(end)*rho^2)];
theta = 0:0.1:2*pi;
f = bsxfun(@plus, r*[cos(theta)', sin(theta)']*M, mu);
plot(f(:,1), f(:,2),'--r')
Sources
https://upload.wikimedia.org/wikipedia/commons/a/a2/Cumulative_function_n_dimensional_Gaussians_12.2013.pdf
https://en.wikipedia.org/wiki/Multivariate_normal_distribution
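A quick way to convince yourself that r = sqrt(-2*log(alpha)) gives the 95% region is a simulation. This is only a sketch, assuming the example mu and covariance used above; samples are drawn with randn and a Cholesky factor so no toolbox is needed:

```matlab
rng(1);                                  % fixed seed, for reproducibility
mu    = [3, 7];
Sigma = [1 2.5; 2.5 9];                  % example covariance from above
alpha = 0.05;
r2 = -2*log(alpha);                      % squared radius of the 95% ellipse

R = chol(Sigma);                         % upper triangular, Sigma = R'*R
N = 2e5;
X = bsxfun(@plus, randn(N,2)*R, mu);     % samples from N(mu, Sigma)

D  = bsxfun(@minus, X, mu);              % centered samples
d2 = sum((D / R).^2, 2);                 % squared Mahalanobis distances
frac_inside = mean(d2 <= r2);            % fraction inside the ellipse
```

frac_inside should come out close to 0.95, since the squared Mahalanobis distance of a bivariate normal follows a chi-square distribution with 2 degrees of freedom.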
To get the numerical value of F where the top part lies, you should use top5=prctile(F(:),95). This returns the value of F that separates the bottom 95% of the values on the grid from the top 5%.
Then you can get just the top 5% with
Ftop=zeros(size(F));
Ftop=F>top5;
Ftop=Ftop.*F;
%// optional: Ftop(Ftop==0)=NaN;
surf(x1,x2,Ftop,'LineStyle','none');

How to plot two 1-dimensional Gaussian distributions together with the classification boundary [Matlab]?

I have two classes (normally distributed), C1 and C2, each defined by its mean and standard deviation. I want to be able to visualize the pdf plots of the two normal distributions and the classification boundary between them. Currently I have the code to plot the distributions but I'm not sure how to go about plotting the decision boundary. Any ideas would be appreciated. I have included a sample of what I want to plot.
Many thanks!
This is what I came up with:
% Generate some example data
mu1 = -0.5; sigma1 = 0.7; mu2 = 0.8; sigma2 = 0.5;
x = linspace(-8, 8, 500);
y1 = normpdf(x, mu1, sigma1);
y2 = normpdf(x, mu2, sigma2);
% Plot it
figure; plot(x, [y1; y2])
hold on
% Detect intersection between curves; choose threshold so you get the whole
% intersection (0.0001 should do unless your sigmas are very large)
ind = y1 .* y2 > 0.0001;
% Find the minimum values in range
minVals = min([y1(ind); y2(ind)]);
if ~isempty(minVals)
area(x(ind), minVals)
end
I don't know if this is the best way to do what you want, but it seems to work.
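If what you want is the boundary itself rather than the overlap region, one possible sketch is to solve for the points where the two pdfs are equal. This assumes equal class priors (an assumption, the question doesn't say), in which case taking logs of both pdfs gives a quadratic in x. The gauss helper is defined here so the sketch doesn't need normpdf:

```matlab
mu1 = -0.5; sigma1 = 0.7; mu2 = 0.8; sigma2 = 0.5;   % example values from above

% Normal pdf written out by hand (hypothetical helper, replaces normpdf)
gauss = @(x,mu,sigma) exp(-(x-mu).^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));

% Coefficients of A*x^2 + B*x + C = 0, from log(pdf1) = log(pdf2)
A = 1/(2*sigma2^2) - 1/(2*sigma1^2);
B = mu1/sigma1^2 - mu2/sigma2^2;
C = mu2^2/(2*sigma2^2) - mu1^2/(2*sigma1^2) + log(sigma2/sigma1);
xb = roots([A B C]);                 % boundary location(s)

% Plot both pdfs and mark each boundary with a dashed vertical line
x = linspace(-4, 4, 500);
plot(x, gauss(x,mu1,sigma1), x, gauss(x,mu2,sigma2)); hold on
yl = ylim;
for k = 1:numel(xb)
    plot([xb(k) xb(k)], yl, 'k--');
end
hold off
```

With different sigmas there are two crossing points; with equal sigmas A is zero, roots drops the leading zero, and you get the single midpoint boundary.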

Plotting an ellipse in MATLAB given in matrix form

I have an ellipse in 2 dimensions, defined by a positive definite matrix X as follows: a point x is in the ellipse if x'*X*x <= 1. How can I plot this ellipse in matlab? I've done a bit of searching while finding surprisingly little.
Figured out the answer actually: I'd post this as an answer, but it won't let me (new user):
Figured it out after a bit of tinkering. Basically, we express the points on the ellipse border (x'*X*x = 1) as a weighted combination of the eigenvectors of X, which makes some of the math to find the points easier. We can just write (au+bv)'X(au+bv)=1 and work out the relationship between a,b. Matlab code follows (sorry it's messy, just used the same notation that I was using with pen/paper):
function plot_ellipse(X, varargin)
% Plots an ellipse of the form x'*X*x <= 1
% plot vectors of the form a*u + b*v where u,v are eigenvectors of X
[V,D] = eig(X);
u = V(:,1);
v = V(:,2);
l1 = D(1,1);
l2 = D(2,2);
pts = [];
delta = .1;
for alpha = -1/sqrt(l1)-delta:delta:1/sqrt(l1)+delta
beta = sqrt((1 - alpha^2 * l1)/l2);
pts(:,end+1) = alpha*u + beta*v;
end
for alpha = 1/sqrt(l1)+delta:-delta:-1/sqrt(l1)-delta
beta = -sqrt((1 - alpha^2 * l1)/l2);
pts(:,end+1) = alpha*u + beta*v;
end
plot(pts(1,:), pts(2,:), varargin{:})
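For reference, the relationship between a and b used in the loops can be worked out as follows, assuming X is symmetric so that eig returns orthonormal eigenvectors u, v with eigenvalues l1, l2 (written as lambda_1, lambda_2 here):

```latex
(au + bv)^{\top} X (au + bv)
  = a^{2}\lambda_{1}\, u^{\top}u + 2ab\,\lambda_{2}\, u^{\top}v + b^{2}\lambda_{2}\, v^{\top}v
  = a^{2}\lambda_{1} + b^{2}\lambda_{2} = 1
\quad\Longrightarrow\quad
b = \pm\sqrt{\frac{1 - a^{2}\lambda_{1}}{\lambda_{2}}},
\qquad |a| \le \frac{1}{\sqrt{\lambda_{1}}} .
```

The two loops correspond to the plus and minus branches. Note that the loops start one step delta outside the range of a, where the expression under the square root is negative and beta becomes complex.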
I stumbled across this post while searching for this topic, and even though it's settled, I thought I might provide another, simpler solution for when the matrix is symmetric.
Another way of doing this is to use the Cholesky decomposition of the positive definite matrix E, implemented in Matlab as the chol function. It computes an upper triangular matrix R such that E = R' * R. Using this, x'*E*x = (R*x)'*(R*x) = z'*z, if we define z as R*x.
The curve to plot thus becomes z'*z = 1, and that's a circle. A simple solution is thus z = (cos(t), sin(t)), for 0 <= t <= 2*pi. You then multiply by the inverse of R to get the ellipse.
This is pretty straightforward to translate into the following code:
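function plot_ellipse(E)
% Plots the ellipse x'*E*x = 1 for a symmetric positive definite matrix E
R = chol(E); % upper triangular, E = R'*R
t = linspace(0, 2*pi, 100); % or any high number to make the curve smooth
z = [cos(t); sin(t)]; % points on the unit circle
ellipse = R \ z; % solve R*x = z rather than forming inv(R)
plot(ellipse(1,:), ellipse(2,:))
end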
Hope this might help!
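A small sanity check of the Cholesky approach, using an arbitrary example matrix (the values of E are made up for illustration). Every generated point should satisfy x'*E*x = 1 up to rounding:

```matlab
E = [2 0.5; 0.5 1];              % example symmetric positive definite matrix
R = chol(E);                     % upper triangular, E = R'*R
t = linspace(0, 2*pi, 100);
pts = R \ [cos(t); sin(t)];      % map unit-circle points onto the ellipse

q = sum(pts .* (E*pts), 1);      % x'*E*x evaluated for every column of pts
max_err = max(abs(q - 1));       % deviation from the ellipse equation
```

If chol errors out, the matrix is not positive definite and x'*E*x = 1 does not describe an ellipse in the first place.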