How to calculate cumulative distribution functions in MATLAB - matlab

How do I solve the following problem using the MATLAB functions binocdf, normcdf, and expcdf? I'm not quite sure how to approach it.
This is given:
X1 ∈ Bin(10, 0.3), X2 ∈ N(5, 3), X3 ∈ Exp(7)
k = 1, 2, 3
What is the probability P(3 < Xk ≤ 4)?
I know that the cumulative distribution function gives you the probability that a stochastic variable is less than or equal to the input, for example:
binocdf(4,10,0.3) = P(X1 ≤ 4)
But how do I use these functions when it's Xk > 3?

If you remember the properties of a CDF, you can find the probability that a random variable falls in an interval (a, b] by evaluating the CDF at each endpoint and subtracting the two values. Concretely, with f the PDF and F the CDF of a random variable X, the probability P(a < X <= b) is:
P(a < X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx
Source: Wikipedia
The integral form applies when X is continuous; the difference F(b) - F(a) also holds for discrete variables such as the binomial.
Therefore, to compute P(3 < X1 <= 4) as for your example, do:
out = binocdf(4,10,0.3) - binocdf(3,10,0.3);
I'll leave it to you to figure out the other ones.
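For reference, the same subtraction applied to all three variables might look like the sketch below. It rests on two assumptions about the notation: that N(5, 3) means mean 5 and standard deviation 3 (normcdf takes the standard deviation), and that Exp(7) means mean 7 (expcdf takes the mean). If your course parameterizes by variance or by rate, pass sqrt(3) or 1/7 instead.

```matlab
% Assumption: N(5, 3) has standard deviation 3, Exp(7) has mean 7.
p1 = binocdf(4, 10, 0.3) - binocdf(3, 10, 0.3);  % P(3 < X1 <= 4), i.e. P(X1 = 4)
p2 = normcdf(4, 5, 3)    - normcdf(3, 5, 3);     % P(3 < X2 <= 4)
p3 = expcdf(4, 7)        - expcdf(3, 7);         % P(3 < X3 <= 4)
```

For the binomial case the interval (3, 4] contains only the integer 4, so p1 is the same as binopdf(4, 10, 0.3).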

Related

Calculate percentiles? (Or more generally, evaluate function implicitly defined by 2 vectors x and y at many values z)

Let's say you have some vector z and you compute [f, x] = ecdf(z);, hence your empirical CDF can be plotted with stairs(x, f).
Is there a simple way to compute what all the percentile scores are for z?
I could do something like:
Loop through z, that is for each entry z(i) of z
Binary search through sorted vector x to find where z(i) is. (find index j such that x(j) = z(i))
Find the corresponding value f(j)
It feels like there should be a simpler, already implemented way to do this...
Let f be a monotone function defined at values x, for which you want to compute the inverse function at values p. In your case f is monotone because it is a CDF; and the values p define the desired quantiles. Then you can simply use interp1 to interpolate x, considered as a function of f, at values p:
z = randn(1,1e5); % example data: normalized Gaussian distribution
[f, x] = ecdf(z); % compute empirical CDF
p = [0.5 0.9 0.95]; % desired values for quantiles
result = interp1(f, x, p);
In an example run of the above code, this produces
result =
0.001706069265714 1.285514249607186 1.647546848952448
For the specific case of computing quantiles p from data z, you can directly use quantile and thus avoid computing the empirical CDF:
result = quantile(z, p)
The results may be slightly different depending on how the empirical CDF has been computed in the first method:
>> quantile(z, p)
ans =
0.001706803588857 1.285515826972878 1.647582486507752
For comparison, the theoretical values for the above example (Gaussian distribution) are
>> norminv(p)
ans =
0 1.281551565544601 1.644853626951472
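Going the other way, from each data point to its percentile score as the question asks, amounts to evaluating the empirical CDF at the data itself. A minimal sketch in base MATLAB, assuming the usual definition F(v) = fraction of samples less than or equal to v (the variable names are illustrative):

```matlab
z = [3 1 2 2];                      % small example data containing a tie
[vals, ~, ic] = unique(z);          % sorted distinct values; ic maps z onto vals
counts = accumarray(ic(:), 1);      % number of occurrences of each distinct value
F = cumsum(counts) / numel(z);      % empirical CDF at each distinct value
pct = reshape(F(ic), size(z));      % percentile score of every entry of z
```

For this z the scores are 1, 0.25, 0.75, 0.75. No loop or binary search is needed, because unique already returns the index of each z(i) among the sorted distinct values.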

Sampling using rejection method

I'm trying to sample 1000 numbers from a distribution with the probability density function f(x) = (1/3)x^2 , -1 < x < 2 using the rejection method. I also want to plot a histogram based on the data.
My textbook gives the following rules for using the rejection method:
1. Find such numbers a, b, and c that 0 ≤ f(x) ≤c for a ≤ x ≤ b. The bounding box
stretches along the x-axis from a to b and along the y-axis from 0 to c.
2. Obtain Standard Uniform random variables U and V from a random number generator
or a table of random numbers.
3. Define X = a+(b−a)U and Y = cV. Then X has Uniform(a,b) distribution, Y is
Uniform(0, c), and the point (X, Y ) is Uniformly distributed in the bounding box.
Based on those rules I wrote the following code, but I believe I'm really far off from a proper solution and could use some guidance:
a=-1; b=2; c=2;
while p < 1000
U = rand; V = rand;
X = a+U*(b-a); Y = c*V; f = (1/3)*X^2;
if Y<=f
x(p)=X;
p = p+1;
end
end
histogram(x);
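Where exactly is p defined? It has to be initialized before the loop. Note that starting at p = 0 makes x(p)=X fail on the first accepted sample, because MATLAB indices start at 1; either start at p = 1 and loop while p <= 1000, or increment p before storing.
As for the algorithm, it looks OK, but it could be made more efficient: f(x) reaches its maximum on [-1, 2] at x = 2, where f(2) = 4/3, so you could set c to 4/3. That raises the fraction of accepted points from 1/6 to 1/4.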
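Putting both remarks together, a corrected version might look like the sketch below. It keeps the question's variable names, uses p as a count of accepted samples, and preallocates x; treat it as one possible fix, not the only one.

```matlab
a = -1; b = 2; c = 4/3;         % bounding box; c = f(2) is the maximum of f on [a, b]
n = 1000;                       % number of samples wanted
x = zeros(n, 1);                % preallocate the output
p = 0;                          % accepted samples so far
while p < n
    X = a + (b - a) * rand;     % candidate, Uniform(a, b)
    Y = c * rand;               % height, Uniform(0, c)
    if Y <= (1/3) * X^2         % accept if the point lies under the density
        p = p + 1;
        x(p) = X;
    end
end
histogram(x);
```

The histogram should be low near 0 and rise towards both ends of the interval, most steeply towards x = 2.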

Monte Carlo integration of exp(-x^2/2) from x=-infinity to x=+infinity

I want to integrate
f(x) = exp(-x^2/2)
from x=-infinity to x=+infinity
by using the Monte Carlo method. I use randn() to generate the x_i, evaluate f(x_i) = exp(-x_i^2/2), and then take the mean of f over [x_1, ..., x_n]. My problem is that the result depends on the values I choose for the borders x1 and x2 (see below): it moves further away from the true value as I increase x1 and x2, whereas it should get better and better.
Does someone see my mistake?
Here is my Matlab code
clear all;
b=10; % border
x1 = -b; % left border
x2 = b; % right border
n = 10^6; % number of random numbers
x = randn(n,1);
f = ones(n,1);
g = exp(-(x.^2)/2);
F = ((x2-x1)/n)*f'*g;
The right value should be ~2.5066.
Thanks
Try this:
clear all;
b=10; % border
x1 = -b; % left border
x2 = b; % right border
n = 10^6; % number of random numbers
x = sort(abs(x1 - x2) * rand(n,1) + x1);
f = exp(-x.^2/2);
F = trapz(x,f)
F =
2.5066
OK, let's start by writing out the general case of MC integration:
I = S f(x) * p(x) dx, x in [a...b]
S here is integral sign.
Usually p(x) is a normalized probability density function and f(x) is the function you want to integrate, and the algorithm is very simple:
set accumulator s to zero
start loop of N events
sample x randomly from p(x)
given x, compute f(x) and add to accumulator
back to start loop if not done
if done, divide accumulator by N and return it
In the simplest textbook case you have
I = S f(x) dx, x in [a...b]
which means the PDF is the uniform one
p(x) = 1/(b-a)
but what you have to sum is actually (b-a)*f(x), because your integral now looks like
I = S (b-a)*f(x) 1/(b-a) dx, x in [a...b]
In general, if both f(x) and p(x) could serve as a PDF, then it is a matter of choice whether you integrate f(x) over p(x), or p(x) over f(x). No difference! (Well, except maybe computation time.)
So, back to the particular integral (which is equal to \sqrt{2\pi}, I believe)
I = S exp(-x^2/2) dx, x in [-infinity...infinity]
You could use the more traditional approach, like @Agriculturist, and write it as
I = S exp(-x^2/2)*(2a) 1/(2a) dx, x in [-a...a]
then sample x uniformly in the [-a...a] interval (a scaled and shifted U(0,1)), compute (2a)*exp(-x^2/2) for each x, and average to get the result.
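A minimal MATLAB sketch of this uniform-sampling estimator, assuming a = 10 as in the question (the variable names are illustrative):

```matlab
a = 10;                                 % half-width of the sampling interval [-a, a]
n = 1e6;                                % number of samples
x = -a + 2*a*rand(n, 1);                % x ~ Uniform(-a, a), so p(x) = 1/(2a)
I_uniform = mean(2*a * exp(-x.^2/2));   % average of f(x)/p(x)
```

With these settings the estimate typically lands within about 0.01 of \sqrt{2\pi}. A larger a wastes more samples in the tails where the integrand is essentially zero, so the noise grows unless n grows with it.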
From what I understand, you want to use exp() as the PDF, so your integral looks like
I = S D * exp(-x^2/2)/D dx, x in [-infinity...infinity]
The PDF has to be normalized, so it must include the normalization factor D, which is exactly equal to \sqrt{2\pi} from the Gaussian integral.
Now f(x) is just a constant equal to D; it doesn't depend on x. That means for each sampled x you should add the CONSTANT value D to the accumulator. After running N samples, the accumulator holds exactly N*D. To find the mean you divide by N and get exactly D, which is \sqrt{2\pi}, which in turn is 2.5066.
Too rusty to write any matlab, and Happy New Year anyway
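For what it's worth, here is a short sketch of that last point. It samples with randn, divides the integrand by the standard normal density written out by hand, and also reproduces the question's estimator for comparison; the variable names are illustrative.

```matlab
n = 1e6;
x = randn(n, 1);                        % x ~ N(0, 1)
p = exp(-x.^2/2) / sqrt(2*pi);          % standard normal density at each x
I_normal = mean(exp(-x.^2/2) ./ p);     % every term equals sqrt(2*pi)
wrong = 20 * mean(exp(-x.^2/2));        % the question's formula with x2 - x1 = 20
```

I_normal matches \sqrt{2\pi} up to rounding. The question's version instead converges to (x2 - x1)/\sqrt{2}, about 14.14 for b = 10, because the mean of exp(-x^2/2) under N(0,1) is 1/\sqrt{2}; that is why its result grows with the borders.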

How do I check to see if three points form a straight line?

Problem: Create the function mylinecheck(a,b,c,d,e,f) which takes six inputs:
a,b,c,d,e,f which are real numbers, and a,c,e are not equal. The function must check if the three points (a,b), (c,d), and (e,f) all lie on the same line. If so, return a 1. If not, return a 0.
I think what I want to do is tell MATLAB to check if coordinates (c,d) and (e,f) are multiples of (a,b), and then if not I will return a 0. If so, I will return a 1. If this is the right thought process, I'm not sure how to command MATLAB to do so. Any advice would be greatly appreciated.
The points (x1,y1), (x2,y2), and (x3,y3) lie on the same line if and only if they satisfy
a x + b y + c = 0
for fixed values of a, b, and c (I cannot get over your notation; sorry for the "confusion"), where a or b are nonzero. Hence they lie on the same line if and only if
a x1 + b y1 + c = 0          [x1 y1 1] [a]   [0]
a x2 + b y2 + c = 0   <=>    [x2 y2 1] [b] = [0]
a x3 + b y3 + c = 0          [x3 y3 1] [c]   [0],
that is, the homogeneous linear system with the matrix
    [x1 y1 1]
X = [x2 y2 1]
    [x3 y3 1]
has a nonzero solution. This is possible only if X is singular. By eliminating the last column of X you can find that X is singular if and only if the matrix
Y = [x2-x1 y2-y1]
    [x3-x1 y3-y1]
is singular.
To reliably check for the singularity of a matrix in Matlab, you can use SVD or, equivalently, the function rank. Hence your function could be implemented as follows:
function [result] = mylinecheck(x1,y1,x2,y2,x3,y3)
result = rank([x2-x1, y2-y1; x3-x1, y3-y1]) < 2;
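To try the rank test quickly without creating a function file, the same expression can be wrapped in an anonymous function. The handle name iscollinear and the sample points below are made up for illustration:

```matlab
% Same rank test as above, written as an anonymous function.
iscollinear = @(x1,y1,x2,y2,x3,y3) rank([x2-x1, y2-y1; x3-x1, y3-y1]) < 2;
on_line  = iscollinear(0,0, 1,2, 2,4);   % three points on y = 2x
off_line = iscollinear(0,0, 1,2, 2,5);   % third point moved off the line
```

Here on_line is true and off_line is false. rank applies its own tolerance to the singular values, which is what makes it more robust than an exact comparison with zero.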
If you want to check if points all fall on the same line (or are collinear), one of the classic methods would be to assume that each point forms a vertex in a triangle. If the three points make the triangle such that the area is equal to 0, then the points would be collinear or form a line. This can be done by checking the determinant of the following matrix:
[a b 1]
[c d 1]
[e f 1]
You can read the article on collinearity on Wolfram Mathworld here: http://mathworld.wolfram.com/Collinear.html (I also linked it above).
As such, your function simply needs to be:
function [out] = mylinecheck(a,b,c,d,e,f)
D = [a b 1; c d 1; e f 1];
out = det(D) == 0;
However, due to numerical imprecision, you may provide floating point numbers where the points are indeed collinear, but you may get a determinant that isn't equal to 0 (actually, perhaps a small number). As such, one thing I can suggest is check to see if the determinant is less than a small number. Something like:
function [out] = mylinecheck(a,b,c,d,e,f)
D = [a b 1; c d 1; e f 1];
out = abs(det(D)) < 1e-10;
1e-10 is a small number, 10^{-10}. We take the abs to account for both positive and negative determinants, so the check reports collinearity when:
-10^{-10} < det(D) < 10^{-10}
However, as Pavel pointed out in the comments, the determinant depends on the scale of the coordinates: if the points fall almost along the same line and we scale the coordinates up, the determinant value grows as well. One suggestion I have is to be more liberal with the threshold and make it larger, perhaps something like 0.1.
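One way to sidestep the scale dependence, offered as a sketch and not something either answer proposes, is to compare the determinant against the product of the two edge lengths so that the tolerance becomes relative. The handle name iscollinear_rel and the tolerance 1e-10 are assumptions chosen for illustration:

```matlab
% The determinant of [a b 1; c d 1; e f 1] equals the cross product of the two
% edge vectors; dividing by the edge lengths gives |sin(angle)|, which does not
% change when all coordinates are multiplied by the same factor.
iscollinear_rel = @(a,b,c,d,e,f) ...
    abs((c-a)*(f-b) - (e-a)*(d-b)) <= 1e-10 * hypot(c-a, d-b) * hypot(e-a, f-b);
small = iscollinear_rel(0,0, 1,1, 2,2);           % collinear
large = iscollinear_rel(0,0, 1e6,1e6, 2e6,2e6);   % same points scaled up
bent  = iscollinear_rel(0,0, 1,0, 2,1e-3);        % third point slightly off the line
```

small and large are both true, and bent is false. Using <= instead of < means coincident points, where both sides are zero, count as collinear.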

Using matlabs regress like polyfit

I have:
x = [1970:1:2000]
y = [data]
size(x) = [30,1]
size(y) = [30,1]
I want:
% Yl = kx + m, where
[k,m] = polyfit(x,y,1)
For some reason I have to use "regress" for this.
Using k = regress(x,y) gives a seemingly random value, and I have no idea where it comes from. How do I do it?
The number of outputs you get in "k" depends on the size of the input X, so you will not get both m and k just by passing in your x and y directly. Note also that the response comes first: regress(y,X), not regress(x,y). From the docs:
b = regress(y,X) returns a p-by-1 vector b of coefficient estimates for a multilinear regression of the responses in y on the predictors in X. X is an n-by-p matrix of p predictors at each of n observations. y is an n-by-1 vector of observed responses.
It is not exactly stated, but the example in the help docs using the carsmall inbuilt dataset shows you how to set this up. For your case, you'd want:
X = [ones(size(x)) x]; % make sure this is 30 x 2
b = regress(y,X); % y should be 30 x 1, b should be 2 x 1
b(1) should then be your m, and b(2) your k.
regress can also provide additional outputs, such as confidence intervals, residuals, statistics such as r-squared, etc. The input remains the same, you'd just change the outputs:
[b,bint,r,rint,stats] = regress(y,X);
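To check the setup against polyfit, a small comparison on made-up data could look like this. The x and y values are hypothetical stand-ins for the question's [data], and regress needs the Statistics and Machine Learning Toolbox:

```matlab
x = (1970:1999)';               % 30-by-1 column of years (hypothetical)
y = 0.8*x - 1500 + sin(x);      % made-up response standing in for [data]
X = [ones(size(x)) x];          % 30-by-2: intercept column, then x
b = regress(y, X);              % b(1) = m (intercept), b(2) = k (slope)
pf = polyfit(x, y, 1);          % pf(1) = k (slope), pf(2) = m (intercept)
```

Both calls solve the same least-squares problem, so b(2) should agree with pf(1) and b(1) with pf(2) up to rounding. Only the order of the coefficients differs.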