Integration with Probability Density Function - matlab

I need to plot the probability density function {p(z,phi)}and need to integrate it,as shown in the attached eq.#1
enter image description here
where Af and Vf are constants,
phi is angle,
z is distance(numerical value, can be in decimals)
The P(z,phi) will be the force values along with respective different values of z and phi.
Could someone guide me, on MATLAB, how can I write these set of equations?

To intregate your function, you should either create an m-file or anonymous function
f = #(z,phi) P(z,phi) * p(z,phi)
where you construct P and p similarly. Then you will need to use one of the numerical integrators, such as ode45 to integrate f twice... once over f and once over phi.

If I understand you correctly you multiply a uniform probability distribution between -lf/2 and lf/2 with another probability distribution that looks like the first quarter of a sine wave. You want to know the resulting probability distribution.
Basically if lf/2 > pi/2 you end up with the same distribution. The sine-distribution is entirely inside the uniform distribution. If (lf/2)<(pi/2) the uniform-distribution chops of part of your sine-distribution. You then want to divide your probability distribution by the part you choped off so the integral stays one. It must remain a probability distribution.
The integral of sin(x) is cos(x). So in that case you devide by (1-cos(lf/2))
Below is a script that makes it more visible:
lf=2;
xx = linspace(-lf,lf,1E4);
p1 = (xx>-lf/2&xx<lf/2)*(1/lf);
p2 = zeros(size(xx));
p2(xx>0&xx<pi/2) = sin(xx(xx>0&xx<pi/2));
p3 = p2.*p1.*lf;
if lf<pi
p3 = p3./(1-cos(lf/2));
end
plot(xx,p1,xx,p2,xx,p3)
legend({'uniform distribution','sine','result'})
%integrals (actually Riemann sums):
sum(p1.*(xx(2)-xx(1)))
sum(p2.*(xx(2)-xx(1)))
sum(p3.*(xx(2)-xx(1)))

Related

Approximate continuous probability distribution in Matlab

Suppose I have a continuous probability distribution, e.g., Normal, on a support A. Suppose that there is a Matlab code that allows me to draw random numbers from such a distribution, e.g., this.
I want to build a Matlab code to "approximate" this continuous probability distribution with a probability mass function spanning over r points.
This means that I want to write a Matlab code to:
(1) Select r points from A. Let us call these points a1,a2,...,ar. These points will constitute the new discretised support.
(2) Construct a probability mass function over a1,a2,...,ar. This probability mass function should "well" approximate the original continuous probability distribution.
Could you help by providing also an example? This is a similar question asked for Julia.
Here some of my thoughts. Suppose that the continuous probability distribution of interest is one-dimensional. One way to go could be:
(1) Draw 10^6 random numbers from the continuous probability distribution of interest and store them in a column vector D.
(2) Suppose that r=10. Compute the 10-th, 20-th,..., 90-th quantiles of D. Find the median point falling in each of the 10 bins obtained. Call these median points a1,...,ar.
How can I construct the probability mass function from here?
Also, how can I generalise this procedure to more than one dimension?
Update using histcounts: I thought about using histcounts. Do you think it is a valid option? For many dimensions I can use this.
clear
rng default
%(1) Draw P random numbers for standard normal distribution
P=10^6;
X = randn(P,1);
%(2) Apply histcounts
[N,edges] = histcounts(X);
%(3) Construct the new discrete random variable
%(3.1) The support of the discrete random variable is the collection of the mean values of each bin
supp=zeros(size(N,2),1);
for j=2:size(N,2)+1
supp(j-1)=(edges(j)-edges(j-1))/2+edges(j-1);
end
%(3.2) The probability mass function of the discrete random variable is the
%number of X within each bin divided by P
pmass=N/P;
%(4) Check if the approximation is OK
%(4.1) Find the CDF of the discrete random variable
CDF_discrete=zeros(size(N,2),1);
for h=2:size(N,2)+1
CDF_discrete(h-1)=sum(X<=edges(h))/P;
end
%(4.2) Plot empirical CDF of the original random variable and CDF_discrete
ecdf(X)
hold on
scatter(supp, CDF_discrete)
I don't know if this is what you're after but maybe it can help you. You know, P(X = x) = 0 for any point in a continuous probability distribution, that is the pointwise probability of X mapping to x is infinitesimal small, and thus regarded as 0.
What you could do instead, in order to approximate it to a discrete probability space, is to define some points (x_1, x_2, ..., x_n), and let their discrete probabilities be the integral of some range of the PDF (from your continuous probability distribution), that is
P(x_1) = P(X \in (-infty, x_1_end)), P(x_2) = P(X \in (x_1_end, x_2_end)), ..., P(x_n) = P(X \in (x_(n-1)_end, +infty))
:-)

Solving integral in Matlab containing unknown variables?

The task is to create a cone hat in Matlab by creating a developable surface with numerical methods. There are 3 parts of which I have done 2. My question is regarding part 3 where I need to calculate the least rectangular paper surface that can contain the hat. And I need to calculate the material waste of the paper.
YOU CAN MAYBE SKIP THE LONG BACKGROUND AND GO TO LAST PARAGRAPH
BACKGROUND:
The cone hat can be created with a skewed cone with its tip located at (a; 0; b) and with a circle-formed base.
x = Rcos u,
y = Rsin u
z = 0
0<_ u >_2pi
with
known values for R, a and b
epsilon and eta ('n') is the curves x- and y- values when the parameter u goes from 0 to 2pi and alpha is the angle of inclination for the curve at point (epsilon, eta). Starting values at A:
u=0, alhpa=0, epsilon=0, eta=0.
Curve stops at B where the parameter u has reached 2pi.
1.
I plotted the curve by using Runge-Kutta-4 and showed that the tip is located at P = (0, sqrt(b^2 + (R-alpha)^2))
2.
I showed that by using smaller intervals in RK4 I still get quite good accuracy but the problem then is that the curve isn't smooth. Therefore I used Hermite-Interpolation of epsilon and eta as functions of u in every interval to get a better curve.
3.
Ok so now I need to calculate the least rectangular paper surface that can contain the hat and the size of the material waste of the paper. If the end angle alpha(2pi) in the template is pi or pi/2 the material waste will be less. I now get values for R & alpha (R=7.8 and alpha=5.5) and my task is to calculate which height, b the cone hat is going to get with the construction-criteria alpha(2pi)=pi (and then for alpha(2pi)=pi/2 for another sized hat).
So I took the first equation above (the expression containing b) and rewrote it like an integral:
TO THE QUESTION
What I understand is that I need to solve this integral in matlab and then choose b so that alpha(2pi)-pi=0 (using the given criteria above).
The values for R and alpha is given and t is defined as an interval earlier (in part 1 where I did the RK4). So when the integral is solved I get f(b) = 0 which I should be able to solve with for example the secant method? But I'm not able to solve the integral with the matlab function 'integral'.. cause I don't have the value of b of course, that is what I am looking for. So how am I going to go about this? Is there a function in matlab which can be used?
You can use the differential equation for alpha and test different values for b until the condition alpha(2pi)=pi is met. For example:
b0=1 %initial seed
b=fsolve(#find_b,b0) %use the function fsolve or any of your choice
The function to be solved is:
function[obj]=find_b(b)
alpha0=0 %initual valur for alpha at u=0
uspan=[0 2*pi] %range for u
%Use the internal ode solver or any of your choice
[u,alpha] = ode45(#(u,alpha) integrate_alpha(u,alpha,b), uspan, alpha0);
alpha_final=alpha(end) %Get the last value for alpha (at u=2*pi)
obj=alpha_final-pi %Function to be solved
end
And the integration can be done like this:
function[dalpha]=integrate_alpha(u,alpha,b)
a=1; %you can use the right value here
R=1; %you can use the right value here
dalpha=(R-a*cos(u))/sqrt(b^2+(R-a*cos(u))^2);
end

Generate random samples from arbitrary discrete probability density function in Matlab

I've got an arbitrary probability density function discretized as a matrix in Matlab, that means that for every pair x,y the probability is stored in the matrix:
A(x,y) = probability
This is a 100x100 matrix, and I would like to be able to generate random samples of two dimensions (x,y) out of this matrix and also, if possible, to be able to calculate the mean and other moments of the PDF. I want to do this because after resampling, I want to fit the samples to an approximated Gaussian Mixture Model.
I've been looking everywhere but I haven't found anything as specific as this. I hope you may be able to help me.
Thank you.
If you really have a discrete probably density function defined by A (as opposed to a continuous probability density function that is merely described by A), you can "cheat" by turning your 2D problem into a 1D problem.
%define the possible values for the (x,y) pair
row_vals = [1:size(A,1)]'*ones(1,size(A,2)); %all x values
col_vals = ones(size(A,1),1)*[1:size(A,2)]; %all y values
%convert your 2D problem into a 1D problem
A = A(:);
row_vals = row_vals(:);
col_vals = col_vals(:);
%calculate your fake 1D CDF, assumes sum(A(:))==1
CDF = cumsum(A); %remember, first term out of of cumsum is not zero
%because of the operation we're doing below (interp1 followed by ceil)
%we need the CDF to start at zero
CDF = [0; CDF(:)];
%generate random values
N_vals = 1000; %give me 1000 values
rand_vals = rand(N_vals,1); %spans zero to one
%look into CDF to see which index the rand val corresponds to
out_val = interp1(CDF,[0:1/(length(CDF)-1):1],rand_vals); %spans zero to one
ind = ceil(out_val*length(A));
%using the inds, you can lookup each pair of values
xy_values = [row_vals(ind) col_vals(ind)];
I hope that this helps!
Chip
I don't believe matlab has built-in functionality for generating multivariate random variables with arbitrary distribution. As a matter of fact, the same is true for univariate random numbers. But while the latter can be easily generated based on the cumulative distribution function, the CDF does not exist for multivariate distributions, so generating such numbers is much more messy (the main problem is the fact that 2 or more variables have correlation). So this part of your question is far beyond the scope of this site.
Since half an answer is better than no answer, here's how you can compute the mean and higher moments numerically using matlab:
%generate some dummy input
xv=linspace(-50,50,101);
yv=linspace(-30,30,100);
[x y]=meshgrid(xv,yv);
%define a discretized two-hump Gaussian distribution
A=floor(15*exp(-((x-10).^2+y.^2)/100)+15*exp(-((x+25).^2+y.^2)/100));
A=A/sum(A(:)); %normalized to sum to 1
%plot it if you like
%figure;
%surf(x,y,A)
%actual half-answer starts here
%get normalized pdf
weight=trapz(xv,trapz(yv,A));
A=A/weight; %A normalized to 1 according to trapz^2
%mean
mean_x=trapz(xv,trapz(yv,A.*x));
mean_y=trapz(xv,trapz(yv,A.*y));
So, the point is that you can perform a double integral on a rectangular mesh using two consecutive calls to trapz. This allows you to compute the integral of any quantity that has the same shape as your mesh, but a drawback is that vector components have to be computed independently. If you only wish to compute things which can be parametrized with x and y (which are naturally the same size as you mesh), then you can get along without having to do any additional thinking.
You could also define a function for the integration:
function res=trapz2(xv,yv,A,arg)
if ~isscalar(arg) && any(size(arg)~=size(A))
error('Size of A and var must be the same!')
end
res=trapz(xv,trapz(yv,A.*arg));
end
This way you can compute stuff like
weight=trapz2(xv,yv,A,1);
mean_x=trapz2(xv,yv,A,x);
NOTE: the reason I used a 101x100 mesh in the example is that the double call to trapz should be performed in the proper order. If you interchange xv and yv in the calls, you get the wrong answer due to inconsistency with the definition of A, but this will not be evident if A is square. I suggest avoiding symmetric quantities during the development stage.

Finding probability of Gaussian random variable with range

I am a beginner in MATLAB trying to use it to help me understand random signals, I was doing some Normal probability density function calculations until i came across this problem :
Write a MATLAB program to calculate the probability Pr(x1 ≤ X ≤ x2)
if X is a Gaussian random variable for an arbitrary x1 and x2.
Note that you will have to specify the mean and variance of the Gaussian random
variable.
I usually use the built in normpdf function of course selecting my mean and variance, but for this one since it has a range am not sure what I can do to find the answer
Y = normpdf(X,mu,sigma)
If you recall from probability theory, you know that the Cumulative Distribution Function sums up probabilities from -infinity up until a certain point x. Specifically, the CDF F(x) for a probability distribution P with random variable X evaluated at a certain point x is defined as:
Note that I am assuming that we are dealing with the discrete case. Also, let's call this a left-tail sum, because it sums all of the probabilities from the left of the distribution up until the point x. Consequently, this defines the area underneath the curve up until point x.
Now, what your question is asking is to find the probability between a certain range x1 <= x <= x2, not just with using a left-tail sum (<= x). Now, if x1 <= x2, this means that total area where the end point is x2, or the probability of all events up to and including x2, is also part of the area defined by the end point defined at x1. Because you want the probability between a certain range, you need to accumulate all events that happen between x1 and x2, and so you want the area under the PDF curve that is in between this range. Also, you want to have the area that is greater than x1 and less than x2.
Here's a pictorial example:
Source: ReliaWiki
The top figure is the PDF of a Gaussian distribution function, while the bottom figure denotes the CDF of a Gaussian distribution. You see that if x1 <= x2, the area defined by the point at x1 is also captured by the point at x2. Here's a better graph:
Source: Introduction to Statistics
Here the CDF is continuous instead of discrete, but the result is still the same. If you want the area in between two intervals and ultimately the probability in between the two ranges, you need to take the CDF value at x2 and subtract the CDF value at x1. You want the remaining area, and so you just need to subtract the CDF values and ultimately the left-tail areas, and so:
As such, to calculate the CDF of the Gaussian distribution use normcdf and specify the mean and standard deviation of your Gaussian distribution. Therefore, you just need to do this:
y = normcdf(x2, mu, sigma) - normcdf(x1, mu, sigma);
x1 and x2 are the values of the interval that you want to calculate the sum of probabilities under.
You can use erf,
mu = 5;
sigma = 3;
x1 = 3;
x2 = 8;
p = .5*(erf((x2-mu)/sigma/2^.5) - erf((x1-mu)/sigma/2^.5));
error function is defined like this in MATLAB,

Discrete surface integral with cumsum

I have a matrix z(x,y)
This is an NxN abitary pdf constructed from a unique Kernel density estimation (i.e. not a usual pdf and it doesn't have a function). It is multivariate and can't be separated and is discrete data.
I wan't to construct a NxN matrix (F(x,y)) that is the cumulative distribution function in 2 dimensions of this pdf so that I can then randomly sample the F(x,y) = P(x < X ,y < Y);
Analytically I think the CDF of a multivariate function is the surface integral of the pdf.
What I have tried is using the cumsum function in order to calculate the surface integral and tested this with a multivariate normal against the analytical solution and there seems to be some discrepancy between the two:
% multivariate parameters
delta = 100;
mu = [1 1];
Sigma = [0.25 .3; .3 1];
x1 = linspace(-2,4,delta); x2 = linspace(-2,4,delta);
[X1,X2] = meshgrid(x1,x2);
% Calculate Normal multivariate pdf
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
% My attempt at a numerical surface integral
FN = cumsum(cumsum(F,1),2);
% Normalise the CDF
FN = FN./max(max(FN));
X = [X1(:) X2(:)];
% Analytic solution to a multivariate normal pdf
p = mvncdf(X,mu,Sigma);
p = reshape(p,delta,delta);
% Highlight the difference
dif = p - FN;
error = max(max(sqrt(dif.^2)));
% %% Plot
figure(1)
surf(x1,x2,F);
caxis([min(F(:))-.5*range(F(:)),max(F(:))]);
xlabel('x1'); ylabel('x2'); zlabel('Probability Density');
figure(2)
surf(X1,X2,FN);
xlabel('x1'); ylabel('x2');
figure(3);
surf(X1,X2,p);
xlabel('x1'); ylabel('x2');
figure(5)
surf(X1,X2,dif)
xlabel('x1'); ylabel('x2');
Particularly the error seems to be in the transition region which is the most important.
Does anyone have any better solution to this problem or see what I'm doing wrong??
Any help would be much appreciated!
EDIT: This is the desired outcome of the cumulative integration, The reason this function is of value to me is that when you randomly generate samples from this function on the closed interval [0,1] the higher weighted (i.e. the more likely) values appear more often in this way the samples converge on the expected value(s) (in the case of multiple peaks) this is desired outcome for algorithms such as particle filters, neural networks etc.
Think of the 1-dimensional case first. You have a function represented by a vector F and want to numerically integrate. cumsum(F) will do that, but it uses a poor form of numerical integration. Namely, it treats F as a step function. You could instead do a more accurate numerical integration using the Trapezoidal rule or Simpson's rule.
The 2-dimensional case is no different. Your use of cumsum(cumsum(F,1),2) is again treating F as a step function, and the numerical errors resulting from that assumption only get worse as the number of dimensions of integration increases. There exist 2-dimensional analogues of the Trapezoidal rule and Simpson's rule. Since there's a bit too much math to repeat here, take a look here:
http://onestopgate.com/gate-study-material/mathematics/numerical-analysis/numerical-integration/2d-trapezoidal.asp.
You DO NOT need to compute the 2-dimensional integral of the probability density function in order to sample from the distribution. If you are computing the 2-d integral, you are going about the problem incorrectly.
Here are two ways to approach the sampling problem.
(1) You write that you have a kernel density estimate. A kernel density estimate is a special case of a mixture density. Any mixture density can be sampled by first selecting one kernel (perhaps differently or equally weighted, same procedure applies), and then sampling from that kernel. (That applies in any number of dimensions.) Typically the kernels are some relatively simple distribution such as a Gaussian distribution so that it is easy to sample from it.
(2) Any joint density P(X, Y) is equal to P(X | Y) P(Y) (and equivalently P(Y | X) P(X)). Therefore you can sample from P(Y) (or P(X)) and then from P(X | Y). In order to sample from P(X | Y), you will need to integrate P(X, Y) along a line Y = y (where y is the sampled value of Y), but (this is crucial) you only need to integrate along that line; you don't need to integrate over all values of X and Y.
If you tell us more about your problem, I can help with the details.