Is there anyway to calculate mean of beta distribution with MATLAB? - matlab

This is my code for calculate the mean:
syms a b x
f=1/(beta(a,b))*x^(a-1)*(1-x)^(b-1);
int(x*f,x,0,1)
Warning: Explicit integral could not be found.
ans =
int((x*x^(a - 1)*(1 - x)^(b - 1))/beta(a, b), x == 0..1)
How can I fix this?
This is my result for TryHArd's:
syms x a b
f=int(x^(a-1)*(1-x)^(b-1),x,0,1)
Warning: Explicit integral could not be found.
f =
piecewise([0 < real(a) and 0 < real(b), beta(a, b)], [real(a) <= 0 or real(b) <= 0, int(x^(a - 1)*(1 - x)^(b - 1), x == 0..1)])
My result did not show the same gamma(a)*gamma(b)/gamma(a+b) as TryHard did.

I assume by beta you mean
beta(z,w) = integral from 0 to 1 of t.^(z-1) .* (1-t).^(w-1) dt.
Then break down the problem:
>> int((x^(a-1))*((1-x)^(b-1)),x,0,1)
ans =
gamma(b)*gamma(a)/gamma(a+b)
The desired integral is equivalent to:
>> bint=int((x^(a-1))*((1-x)^(b-1)),x,0,1);
>> int( x*((x^(a-1))*((1-x)^(b-1)))/bint,x,0,1)
ans =
a/(a+b)
(This was computed with an earlier version of the Matlab SMT (on R14), but should serve as a guide.)

The mean of the Beta distritution is 1/(1+b/a). See for example here

Wolfram alpha is capable of it.
Computing the definite integral directly exceeds maximum standard computation time, but it can find the indefinite integral. Limiting that expression to x between 0 and 1, means simply equating x to 1. This results in this expression, the alternate form of which is the standard form of the mean of the beta distribution.
Wolfram Mathematica (& Alpha) is simply better at symbolic math than MuPad. I'd advise to use MATLAB only for what it rocks at: numerical math. Other than perhaps some fringe cases, I do not expect MuPad to ever come close to what has been achieved by Wolfram.

If you need the result for some specific parameters alpha and beta, you can use betastat. This function can be used also with vectors, like that:
A=[4 4 4];
B=[5 6 7];
[m,v]=betastat(A,B)
This will give you a 3-elements vector for both m (mean) and v (variance). For this example, m will be the means of beta(4,5), beta(4,6) and beta(4,7). (Note that "beta" here denotes the distribution, not the beta-function, like beta function on Matlab).
If you need a general (mathematical) solution, see Rody's answer. Wolfram is much appropriate for this.
EDIT: You can define f (in your code) by the function betapdf instead of the whole formula you wrote over there.

Related

Row--wise application of an inline function

I defined an inline function f that takes as argument a (1,3) vector
a = [3;0.5;1];
b = 3 ;
f = #(x) x*a+b ;
Suppose I have a matrix X of size (N,3). If I want to apply f to each row of X, I can simply write :
f(X)
I verified that f(X) is a (N,1) vector such that f(X)(i) = f(X(i,:)).
Now, if I a add a quadratic term :
f = #(x) x*A*x' + x*a + b ;
the command f(X) raises an error :
Error using +
Matrix dimensions must agree.
Error in #(x) x*A*x' + x*a + b
I guess Matlab is considering the whole matrix X as the input to f. So it does not create a vector with each row, i, being equal to f(X(i,:)). How can I do it ?
I found out that there exist a built-in function rowfun that could help me, but it seems to be available only in versions r2016 (I have version r2015a)
That is correct, and expected.
MATLAB tries to stay close to mathematical notation, and what you are doing (X*A*X' for A 3×3 and X N×3) is valid math, but not quite what you intend to do -- you'll end up with a N×N matrix, which you cannot add to the N×1 matrix x*a.
The workaround is simple, but ugly:
f_vect = #(x) sum( (x*A).*x, 2 ) + x*a + b;
Now, unless your N is enormous, and you have to do this billions of times every minute of every day, the performance of this is more than acceptable.
Iff however this really and truly is your program's bottleneck, than I'd suggest taking a look at MMX on the File Exchange. Together with permute(), this will allow you to use those fast BLAS/MKL operations to do this calculation, speeding it up a notch.
Note that bsxfun isn't going to work here, because that does not support mtimes() (matrix multiplication).
You can also upgrade to MATLAB R2016b, which will have built-in implicit dimension expansion, presumably also for mtimes() -- but better check, not sure about that one.

Matlab : Help in modulus operation

I am trying to implement a map / function which has the equation Bernoulli Shift Map
x_n+1 = 2* x_n mod 1
The output of this map will be a binary number which will be either 0/1.
So, I generated the first sample x_1 using rand. The following is the code. The problem is I am getting real numbers. When using a digital calculator, I can get binary, whereas when using Matlab, I am getting real numbers. Please help where I am going wrong. Thank you.
>> x = rand();
>> x
x =
0.1647
>> y = mod(2* x,1)
y =
0.3295
The dyadic transformation seems to be a transformation from [0,1) continuous to [0,1) continuous. I see nothing wrong with your test code if you are trying to implement the dyadic mapping. You should be expecting output in the [0,1)
I misunderstood your question because I focused on the assumption you had that the output should be binary [0 or 1], which is wrong.
To reproduce the output of the dyadic transformation as in the link you provided, your code works fine (for 1 value), and you can use this function to calculate N terms (assuming a starting term x0) :
function x = dyadic(x0,n)
x = zeros(n,1) ; %// preallocate output vector
x(1) = x0 ; %// assign first term
for k=2:n
x(k) = mod( 2*x(k-1) , 1) ; %// calculate all terms of the serie
end
Note that the output does not have to be binary, it has to be between 0 and 1.
In the case of integers, the result of mod(WhateverInteger,1) is always 0, but in the case of Real numbers (which is what you use here), the result of mod(AnyRealNumber,1) will be the fractional part, so a number between 0 and 1. (1 is mathematically excluded, 0 is possible by the mod(x,1) operation, but in the case of your serie it means all the successive term will be zero too).

matlab wrong modulo result when the divident is raised to a power

Just wondering... I tried doing by hand (with the multiply and square method) the operation (111^11)mod143 and I got the result 67. I also checked that this is correct, in many online tools. Yet, in matlab plugging:
mod(111^11,143)
gives 127! Is there any particular reason for this? I didn't find anything in the documentation...
The value of 111^11 (about 3.1518e+022) exceeds the maximum integer that is guaranteed to be represented exactly as a double, which is 2^53 (about 9.0072e+015). So the result is spoilt by insufficient numerical precision.
To achieve the correct result, use symbolic computation:
>> syms x y z
>> r = mod(x^y, z);
>> subs(r, [x y z], [111 11 143])
ans =
67
Alternatively, for this specific operation (modulo of a large number that is expressed as a product of small numbers), you can do the computation very easily using the following fact (where ∗ denotes product):
mod(a∗b, z) = mod(mod(a,z)∗mod(b,z), z)
That is, you can apply the modulo operation to factors of your large number and the final result is unchanged. If you choose factors sufficiently small so that they can be represented exactly as double, you can do the computation numerically without any loss of precision.
For example: using the decomposition 111^11 = 111^4*111^4*111^3, since all factors are small enough, gives the correct result:
>> mod((mod(111^4, 143))^2 * mod(111^3, 143), 143)
ans =
67
Similarly, using 111^2 and 111 as factors,
>> mod((mod(111^2, 143))^5 * mod(111, 143), 143)
ans =
67
from the matlab website they recommend using powermod(b, e, m) (b^e mod m)
"If b and m are numbers, the modular power b^e mod m can also be computed by the direct call b^e mod m. However, powermod(b, e, m) avoids the overhead of computing the intermediate result be and computes the modular power much more efficiently." ...
Another way is to use symfun
syms x y z
f = symfun(mod(x^y,z), [x y z])
f(111,11,143)

determine the frequency of a number if a simulation

I have the following function:
I have to generate 2000 random numbers from this function and then make a histogram.
then I have to determine how many of them is greater that 2 with P(X>2).
this is my function:
%function [ output_args ] = Weibullverdeling( X )
%UNTITLED Summary of this function goes here
% Detailed explanation goes here
for i=1:2000
% x= rand*1000;
%x=ceil(x);
x=i;
Y(i) = 3*(log(x))^(6/5);
X(i)=x;
end
plot(X,Y)
and it gives me the following image:
how can I possibly make it to tell me how many values Do i Have more than 2?
Very simple:
>> Y_greater_than_2 = Y(Y>2);
>> size(Y_greater_than_2)
ans =
1 1998
So that's 1998 values out of 2000 that are greater than 2.
EDIT
If you want to find the values between two other values, say between 1 and 4, you need to do something like:
>> Y_between = Y(Y>=1 & Y<=4);
>> size(Y_between)
ans =
1 2
This is what I think:
for i=1:2000
x=rand(1);
Y(i) = 3*(log(x))^(6/5);
X(i)=x;
end
plot(X,Y)
U is a uniform random variable from which you can get the X. So you need to use rand function in MATLAB.
After which you implement:
size(Y(Y>2),2);
You can implement the code directly (here k is your root, n is number of data points, y is the highest number of distribution, x is smallest number of distribution and lambda the lambda in your equation):
X=(log(x+rand(1,n).*(y-x)).*lambda).^(1/k);
result=numel(X(X>2));
Lets split it and explain it detailed:
You want the k-th root of a number:
number.^(1/k)
you want the natural logarithmic of a number:
log(number)
you want to multiply sth.:
numberA.*numberB
you want to get lets say 1000 random numbers between x and y:
(x+rand(1,1000).*(y-x))
you want to combine all of that:
x= lower_bound;
y= upper_bound;
n= No_Of_data;
lambda=wavelength; %my guess
k= No_of_the_root;
X=(log(x+rand(1,n).*(y-x)).*lambda).^(1/k);
So you just have to insert your x,y,n,lambda and k
and then check
bigger_2 = X(X>2);
which would return only the values bigger than 2 and if you want the number of elements bigger than 2
No_bigger_2=numel(bigger_2);
I'm going to go with the assumption that what you've presented is supposed to be a random variate generation algorithm based on inversion, and that you want real-valued (not complex) solutions so you've omitted a negative sign on the logarithm. If those assumptions are correct, there's no need to simulate to get your answer.
Under the stated assumptions, your formula is the inverse of the complementary cumulative distribution function (CCDF). It's complementary because smaller values of U give larger values of X, and vice-versa. Solve the (corrected) formula for U. Using the values from your Matlab implementation:
X = 3 * (-log(U))^(6/5)
X / 3 = (-log(U))^(6/5)
-log(U) = (X / 3)^(5/6)
U = exp(-((X / 3)^(5/6)))
Since this is the CCDF, plugging in a value for X gives the probability (or proportion) of outcomes greater than X. Solving for X=2 yields 0.49, i.e., 49% of your outcomes should be greater than 2.
Make suitable adjustments if lambda is inside the radical, but the algebra leading to solution is similar. Unless I messed up my arithmetic, the proportion would then be 55.22%.
If you still are required to simulate this, knowing the analytical answer should help you confirm the correctness of your simulation.

MATLAB: Test if anonymous vector is a subset of R^n

I'm trying to use MatLab code as a way to learn math as a programmer.
So reading I'm this post about subspaces and trying to build some simple matlab functions that do it for me.
Here is how far I got:
function performSubspaceTest(subset, numArgs)
% Just a quick and dirty function to perform subspace test on a vector(subset)
%
% INPUT
% subset is the anonymous function that defines the vector
% numArgs is the the number of argument that subset takes
% Author: Lasse Nørfeldt (Norfeldt)
% Date: 2012-05-30
% License: http://creativecommons.org/licenses/by-sa/3.0/
if numArgs == 1
subspaceTest = #(subset) single(rref(subset(rand)+subset(rand))) ...
== single(rref(rand*subset(rand)));
elseif numArgs == 2
subspaceTest = #(subset) single(rref(subset(rand,rand)+subset(rand,rand))) ...
== single(rref(rand*subset(rand,rand)));
end
% rand just gives a random number. Converting to single avoids round off
% errors.
% Know that the code can crash if numArgs isn't given or bigger than 2.
outcome = subspaceTest(subset);
if outcome == true
display(['subset IS a subspace of R^' num2str(size(outcome,2))])
else
display(['subset is NOT a subspace of R^' num2str(size(outcome,2))])
end
And these are the subset that I'm testing
%% Checking for subspaces
V = #(x) [x, 3*x]
performSubspaceTest(V, 1)
A = #(x) [x, 3*x+1]
performSubspaceTest(A, 1)
B = #(x) [x, x^2, x^3]
performSubspaceTest(B, 1)
C = #(x1, x3) [x1, 0, x3, -5*x1]
performSubspaceTest(C, 2)
running the code gives me this
V =
#(x)[x,3*x]
subset IS a subspace of R^2
A =
#(x)[x,3*x+1]
subset is NOT a subspace of R^2
B =
#(x)[x,x^2,x^3]
subset is NOT a subspace of R^3
C =
#(x1,x3)[x1,0,x3,-5*x1]
subset is NOT a subspace of R^4
The C is not working (only works if it only accepts one arg).
I know that my solution for numArgs is not optimal - but it was what I could come up with at the current moment..
Are there any way to optimize this code so C will work properly and perhaps avoid the elseif statments for more than 2 args..?
PS: I couldn't seem to find a build-in matlab function that does the hole thing for me..
Here's one approach. It tests if a given function represents a linear subspace or not. Technically it is only a probabilistic test, but the chance of it failing is vanishingly small.
First, we define a nice abstraction. This higher order function takes a function as its first argument, and applies the function to every row of the matrix x. This allows us to test many arguments to func at the same time.
function y = apply(func,x)
for k = 1:size(x,1)
y(k,:) = func(x(k,:));
end
Now we write the core function. Here func is a function of one argument (presumed to be a vector in R^m) which returns a vector in R^n. We apply func to 100 randomly selected vectors in R^m to get an output matrix. If func represents a linear subspace, then the rank of the output will be less than or equal to m.
function result = isSubspace(func,m)
inputs = rand(100,m);
outputs = apply(func,inputs);
result = rank(outputs) <= m;
Here it is in action. Note that the functions take only a single argument - where you wrote c(x1,x2)=[x1,0,x2] I write c(x) = [x(1),0,x(2)], which is slightly more verbose, but has the advantage that we don't have to mess around with if statements to decide how many arguments our function has - and this works for functions that take input in R^m for any m, not just 1 or 2.
>> v = #(x) [x,3*x]
>> isSubspace(v,1)
ans =
1
>> a = #(x) [x(1),3*x(1)+1]
>> isSubspace(a,1)
ans =
0
>> c = #(x) [x(1),0,x(2),-5*x(1)]
>> isSubspace(c,2)
ans =
1
The solution of not being optimal barely scratches the surface of the problem.
I think you're doing too much at once: rref should not be used and is complicating everything. especially for numArgs greater then 1.
Think it through: [1 0 3 -5] and [3 0 3 -5] are both members of C, but their sum [4 0 6 -10] (which belongs in C) is not linear product of the multiplication of one of the previous vectors (e.g. [2 0 6 -10] ). So all the rref in the world can't fix your problem.
So what can you do instead?
you need to check if
(randn*subset(randn,randn)+randn*subset(randn,randn)))
is a member of C, which, unless I'm mistaken is a difficult problem: Conceptually you need to iterate through every element of the vector and make sure it matches the predetermined condition. Alternatively, you can try to find a set such that C(x1,x2) gives you the right answer. In this case, you can use fminsearch to solve this problem numerically and verify the returned value is within a defined tolerance:
[s,error] = fminsearch(#(x) norm(C(x(1),x(2)) - [2 0 6 -10]),[1 1])
s =
1.999996976386119 6.000035034493023
error =
3.827680714104862e-05
Edit: you need to make sure you can use negative numbers in your multiplication, so don't use rand, but use something else. I changed it to randn.