Draw random numbers from a custom mixture in MATLAB?

I want to generate random numbers in MATLAB from a mixture of Gumbel distributions that differ in location and scale. Could you advise on how to do that?
What I know (very little)
1) In MATLAB there is a built-in facility (gmdistribution) to draw from a mixture of Gaussians. For example,
clear
rng default
m=-3;
mu_a = [m, m, m];
sigma_a = [1 0.1 0.5; 0.1 10 0.9; 0.5 0.9 20];
mu_b = -mu_a;
sigma_b= sigma_a;
MU = [mu_a;mu_b];
SIGMA = cat(3,sigma_a,sigma_b);
w = [1/2 1/2]; %equal weight 0.5
obj = gmdistribution(MU,SIGMA,w);
N = 10^4; %number of draws
values = random(obj,N);
2) In MATLAB there is a built-in function, evrnd, to draw from a Gumbel (extreme value) distribution. See here
In general, I couldn't find any MATLAB code to draw from a custom mixture.

Say you have a mixture of 3 Gumbel distributions, each with their own mu and sigma, and relative weight (weights sum to 1), such that the total distribution is:
weight(1) * Gumbel(mu(1),sigma(1)) + weight(2) * Gumbel(mu(2),sigma(2)) + weight(3) * Gumbel(mu(3),sigma(3))
Then drawing a random value from this distribution is a two-step process:
Randomly select one of the 3 distributions to draw a number from.
Randomly select a value from the given distribution.
You can implement that this way:
mu = [1, 2, 3];
sigma = [0.9, 1.5, 2.1];
weight = [1, 2, 1.5]; weight = weight/sum(weight);
k = rand; % a uniform random value in (0, 1)
k = find(k < cumsum(weight), 1, 'first'); % index of the randomly selected component
random_value = evrnd(mu(k), sigma(k)); % Random value from the Gumbel distribution
The above generalizes to any number of distributions, and any type of distribution.
You can vectorize the above to draw N random values using:
N = 100;
k = rand(N,1); % N uniform random values in (0, 1)
[~, k] = max(k < cumsum(weight), [], 2); % find doesn't vectorize nicely, this is an ugly workaround (needs implicit expansion, R2016b+)
random_value = evrnd(mu(k), sigma(k)); % N random values from the Gumbel distributions
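As a quick sanity check (a sketch; evpdf, like evrnd, comes with the Statistics Toolbox), you can overlay the theoretical mixture density on a normalized histogram of the draws:
N = 1e5;
k = rand(N,1);
[~, k] = max(k < cumsum(weight), [], 2);
samples = evrnd(mu(k), sigma(k));
xg = linspace(min(samples), max(samples), 200);
mixpdf = zeros(size(xg));
for j = 1:numel(weight)
    mixpdf = mixpdf + weight(j)*evpdf(xg, mu(j), sigma(j)); % weighted component density
end
histogram(samples, 'Normalization', 'pdf'); hold on
plot(xg, mixpdf, 'LineWidth', 1.5); hold off % curve should track the histogram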

Related

Generating two correlated uniform random variables in Matlab

Like the title suggests, I am having difficulty understanding how to generate two correlated uniform [0,1] random variables. I am new to the idea of copulas, and I am struggling to write MATLAB code that generates them.
Generating correlated uniform random variables with Gaussian Copula
rho = .75; % Desired target correlation
N = 1000; % Number of samples
Z = mvnrnd([0 0],[1 rho; rho 1], N);
U = normcdf(Z); % Correlated U(0,1) random variables
scatterhist(U(:,1),U(:,2),'Direction','out') % Visualize (change `rho` to see impact)
Note: Method not guaranteed to hit target correlation exactly but should be close enough for many applications.
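If you need to hit a target rank correlation more closely, you can use the known Gaussian-copula relation between Spearman's rho_s and the Gaussian correlation rho, rho_s = (6/pi)*asin(rho/2), and invert it. A sketch:
rho_s = .75; % desired Spearman correlation of the uniforms
rho = 2*sin(pi*rho_s/6); % Gaussian correlation that produces it
Z = mvnrnd([0 0],[1 rho; rho 1], 1000);
U = normcdf(Z);
corr(U(:,1), U(:,2), 'Type', 'Spearman') % should be close to rho_s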
This can be very useful for quickly generating correlated distributions via the inverse transform method (either analytically or numerically). Both use cases are illustrated below.
Analytical approach
lambda = 2; alpha = 2; beta = 3;
rho = -.35; N = 1000;
Z = mvnrnd([0 0],[1 rho; rho 1], N);
U = normcdf(Z);
X = (-1/lambda)*log(U(:,1)); % Inverse Transform for Exponential
Y = beta*(-log(U(:,2))).^(1/alpha); % Inverse Transform for Weibull
corr(X,Y)
scatterhist(X,Y,'Direction','out')
Numerical approach
% Parameters
alpha = 6.7; lambda = 3;
mu = 0.1; sigma = 0.5;
rho = 0.75; N = 1000;
% Make distributions
pd_X = makedist('Gamma','a',alpha,'b',lambda); % makedist takes name-value pairs
pd_Y = makedist('Lognormal','mu',mu,'sigma',sigma);
Z = mvnrnd([0 0],[1 rho; rho 1], N);
U = normcdf(Z);
% Use Inverse Transform for marginal distributions (numerically)
X = icdf(pd_X,U(:,1)); % Inverse CDF for X
Y = icdf(pd_Y,U(:,2)); % Inverse CDF for Y
corr(X,Y)
scatterhist(X,Y,'Direction','out')
References:
Inverse Transform
Copulas
Gaussian copula:
Ross, Sheldon (2013). Simulation, 5th edition. Academic Press, San Diego, CA, pp. 103-105.
Modified related answer from here.

How to draw random numbers from a gamma distribution without the Statistics Toolbox?

I am varying the signal strength for synthetic images. I need the signal to vary between 0 and 0.1, but I need to do this with a gamma distribution so that more of the values fall around the .01/.02 range. The problem is that I am using the 2010 version of MATLAB without the Statistics Toolbox, so the gamrnd function is not available.
Any and all help is greatly appreciated.
You can use the inverse transform sampling method to convert a uniform distribution to any other distribution. Note that gaminv below is itself a Statistics Toolbox function; the edit at the end shows a toolbox-free approach:
P = rand(1000); % a 1000-by-1000 matrix of uniform draws
X = gaminv(P(:),2,2); % inverse gamma CDF with shape k = 2 and scale theta = 2
Here is a little demonstration:
for k = [1 3 9]
    for theta = [0.5 1 2]
        X = gaminv(P(:),k,theta);
        histogram(X,50)
        hold on
    end
end
Which gives: [figure: nine overlaid histograms, one per (k, theta) combination]
Edit:
Without the Statistics Toolbox, you can use Marsaglia and Tsang's simple transformation-rejection method to generate random numbers from a gamma distribution with only rand and randn:
N = 10000; % number of candidate draws
% distribution parameters (shape a, scale b):
a = 0.5;
b = 0.1;
% Marsaglia & Tsang's transformation-rejection is valid for shape >= 1;
% for a < 1, draw from gamma(a+1,b) and apply a boost step afterwards:
a1 = a + (a < 1);
d = a1 - 1/3;
x = randn(N,1);
U = rand(N,1);
v = (1+x./sqrt(9*d)).^3;
accept = (v > 0) & (log(U) < 0.5*x.^2+d-d*v+d*log(v)); % v > 0 guard rejects invalid candidates
Y = d*b*v(accept);
if a < 1
    Y = Y .* rand(size(Y)).^(1/a); % boost: gamma(a) = gamma(a+1) * U^(1/a)
end
Now Y is distributed like gamma(a,b). We can test the result against the gamrnd function (on a machine that does have the Statistics Toolbox):
n = size(Y,1);
X = gamrnd(a,b,n,1);
And the histograms of Y and X are: [figure: histograms of Y and X]
However, keep in mind that the gamma distribution might not fit your needs, because it has no upper bound (its support extends to infinity). So you may want to use a bounded distribution instead, such as a beta distribution scaled into [0, 0.1] (i.e. divided by 10).
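If you go that route, here is one toolbox-free option, a sketch using Johnk's rejection algorithm (my choice of method; the shape parameters below are illustrative) to draw Beta(a,b) samples with plain rand and scale them into [0, 0.1]:
a = 2; b = 8; % illustrative shapes; Beta(2,8)/10 has mean 0.02
N = 1e5; % many candidates; Johnk's method accepts only a small fraction for these shapes
U = rand(N,1).^(1/a);
V = rand(N,1).^(1/b);
ok = (U + V) <= 1; % Johnk's acceptance condition
signal = 0.1 * U(ok)./(U(ok) + V(ok)); % Beta(a,b) draws scaled into [0, 0.1]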

How to generate a probability vector?

I would like to generate a vector of probabilities that is following some known distribution.
For example, when I want the distribution to be uniform, I can do this in MATLAB:
N = 5;
proba = (1/(N))*ones(1, N);
What to do if I want to do it with Poisson distribution or Binomial distribution?
If you're looking for a solution that uses built-in MATLAB functions, you can look into random, which allows you to supply parameters to many types of well-known distributions.
For example, if you want to draw an M x N matrix of values from a binomial distribution with n trials and a p chance of success:
n=3; p=0.5; M=20; N=1;
random('Binomial',n,p,[M,N])
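If instead you want the vector of probabilities itself (the analogue of your uniform example), you can evaluate the PMF over the support; a sketch using binopdf from the Statistics Toolbox:
n = 4; p = 0.5;
proba = binopdf(0:n, n, p); % 1-by-(n+1) probability vector; sums to 1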
If you have a (discrete) probability distribution of your own creation, with the PMF given as a vector, you can sample from it as follows: generate a random number r from a uniform distribution on [0,1] using r = rand(), then pick the first bin of the CMF (cumulative mass function) that exceeds r.
x = [0 1 2 3];
PMF = [0.25 0.2 0.5 0.05];
CMF = cumsum(PMF);
N = 10000;
valuesDrawn = zeros(N,1);
for i = 1:N
    r = rand();
    for j = 1:length(PMF)
        if r < CMF(j)
            valuesDrawn(i) = x(j);
            break;
        end
    end
end
hist(valuesDrawn);
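As a side note, the double loop can be vectorized with the same cumsum/max trick used in the mixture answer above (a sketch; needs implicit expansion, R2016b+):
r = rand(N,1);
[~, idx] = max(r < CMF, [], 2); % first CMF bin exceeding each r
valuesDrawn = reshape(x(idx), [], 1); % same distribution as the loop version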

Euclidean and Mahalanobis classifiers always return the same error?

I have a simple MATLAB script to generate some random data and then use a Euclidean and a Mahalanobis classifier to classify it. The issue I am having is that the error results for the two classifiers are always the same. They both always misclassify the same vectors. But the data is different each time.
The data is created in a simple way to check the results easily. Because we have three classes, all of which are equiprobable, I just generate 333 random values for each class and add them all to X to be classified. Thus the expected labels are 333 of class 1, then 333 of class 2, then 333 of class 3.
I can tell the classifiers work because I can see that the data created by mvnrnd is random each time and the error changes. But between the two classifiers the error does not change.
Can anyone tell why?
% Create some initial values, means, covariance matrix, etc
c = 3;
P = 1/c; % All 3 classes are equiprobable
N = 999;
m1 = [1, 1];
m2 = [12, 8];
m3 = [16, 1];
m = [m1; m2; m3];
S = [4 0; 0 4]; % All share the same covar matrix
% Generate random data for each class
X1 = mvnrnd(m1, S, N*P);
X2 = mvnrnd(m2, S, N*P);
X3 = mvnrnd(m3, S, N*P);
X = [X1; X2; X3];
% Create the solution array xEst to compare results to
xEst = ceil((3/999:3/999:3)); % 333 ones, then 333 twos, then 333 threes
% Do the actual classification for mahalanobis and euclidean
zEuc = euc_mal_classifier(m', S, P, X', c, N, true);
zMal = euc_mal_classifier(m', S, P, X', c, N, false);
% Check the results
numEucErr = 0;
numMalErr = 0;
for i=1:N
if(zEuc(i) ~= xEst(i))
numEucErr = numEucErr + 1;
end
if(zMal(i) ~= xEst(i))
numMalErr = numMalErr + 1;
end
end
% Tell the user the results of the classification
strE = ['Euclidean classifier error percent: ', num2str((numEucErr/N) * 100)];
strM = ['Mahalanob classifier error percent: ', num2str((numMalErr/N) * 100)];
disp(strE);
disp(strM);
And the classifier
function z = euc_mal_classifier(m, S, P, X, c, N, eOrM)
    t = zeros(1,c); % distances from the current sample to each class mean
    z = zeros(1,N); % assigned class labels
    for i = 1:N
        for j = 1:c
            if eOrM % Euclidean distance
                t(j) = sqrt((X(:,i) - m(:,j))'*(X(:,i) - m(:,j)));
            else % Mahalanobis distance
                t(j) = sqrt((X(:,i) - m(:,j))'*inv(S)*(X(:,i) - m(:,j)));
            end
        end
        [~, z(i)] = min(t); % assign the nearest class
    end
end
The reason why there is no difference in classification lies in your covariance matrix.
Assume the offset of a point from the center of a class is [x,y].
The Euclidean distance is then:
sqrt(x*x + y*y)
For Mahalanobis, the inverse of the covariance matrix is:
inv([a,0;0,a]) = [1/a,0;0,1/a]
and the distance is then:
sqrt(x*x*(1/a) + y*y*(1/a)) = (1/sqrt(a)) * sqrt(x*x + y*y)
So the distances will be the same as the Euclidean ones, up to a scale factor. Since the scale factor is the same for all classes and dimensions, you will not find a difference in your class assignments!
Test it with different covariance matrices and you will find your errors to differ.
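Here is a quick numeric sketch of that argument, using the means and covariance from the question and an arbitrary test point (needs implicit expansion, R2016b+):
m = [1 1; 12 8; 16 1]'; % class means as columns (values from the question)
S = 4*eye(2); % shared covariance: a scaled identity
x = [10; 4]; % an arbitrary test point (illustrative choice)
dEuc = sqrt(sum((m - x).^2, 1)); % Euclidean distances to the three means
dMal = sqrt(sum((m - x).*(S\(m - x)), 1)); % Mahalanobis distances
dMal ./ dEuc % constant 0.5 = 1/sqrt(4): the ranking, and thus the argmin, is unchanged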
Because this data has a (scaled) identity covariance matrix, all of these classifiers should give almost the same performance.
Let's look at data with a non-identity covariance matrix, for which the three classifiers lead to different errors:
err_bayesian = 0.0861
err_euclidean = 0.1331
err_mahalanobis = 0.0871
close('all');clear;
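% NOTE (assumption): generate_gauss_classes, plot_data, bayes_classifier,
% Gaussian_ML_estimate, euclidean_classifier and mahalanobis_classifier
% are not built-in MATLAB functions; they appear to come from a pattern
% recognition textbook's companion code and must be on your path.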
% Generate and plot dataset X1
m1=[1, 1]'; m2=[10, 5]';m3=[11, 1]';
m=[m1 m2 m3];
S1 = [7 4 ; 4 5];
S(:,:,1)=S1;
S(:,:,2)=S1;
S(:,:,3)=S1;
P=[1/3 1/3 1/3];
N=1000;
randn('seed',0); % legacy seeding syntax; use rng(0) in newer MATLAB releases
[X,y] =generate_gauss_classes(m,S,P,N);
plot_data(X,y,m,1);
randn('seed',200);
[X4,y1] =generate_gauss_classes(m,S,P,N);
% 2.5_b.1 Applying Bayesian classifier
z_bayesian=bayes_classifier(m,S,P,X4);
% 2.5_b.2 Apply ML estimates of the mean values and covariance matrix (common to all three
% classes) using function Gaussian_ML_estimate
class1_data=X(:,find(y==1));
[m1_hat, S1_hat]=Gaussian_ML_estimate(class1_data);
class2_data=X(:,find(y==2));
[m2_hat, S2_hat]=Gaussian_ML_estimate(class2_data);
class3_data=X(:,find(y==3));
[m3_hat, S3_hat]=Gaussian_ML_estimate(class3_data);
S_hat=(1/3)*(S1_hat+S2_hat+S3_hat);
m_hat=[m1_hat m2_hat m3_hat];
% Apply the Euclidean distance classifier, using the ML estimates of the means, in order to
% classify the data vectors of X1
z_euclidean=euclidean_classifier(m_hat,X4);
% 2.5_b.3 Similarly, for the Mahalanobis distance classifier, we have
z_mahalanobis=mahalanobis_classifier(m_hat,S_hat,X4);
% 2.5_c. Compute the error probability for each classifier
err_bayesian = (1-length(find(y1==z_bayesian))/length(y1))
err_euclidean = (1-length(find(y1==z_euclidean))/length(y1))
err_mahalanobis = (1-length(find(y1==z_mahalanobis))/length(y1))

Obtaining sigma of a Normal distribution in Matlab

I have the mean and the 60th percentile of a Normal distribution. I need to obtain a random array of numbers from it, using those two values, in MATLAB.
Is this somehow possible?
Thanks
I'm not exactly sure what you're looking for. I understood that you want to know the standard deviation corresponding to a given value of the quantile at 0.6 (percentile = quantile * 100)?
From Wikipedia, the quantile function of a Normal distribution with mean mu and variance sigma^2 is:
F^-1(p) = mu + sigma * sqrt(2) * erfinv(2*p - 1)
From this you have all you need:
p = 0.6, and F^-1(p) is your percentile
erf^-1 is erfinv in MATLAB
Solve for sigma: sigma = (F^-1(p) - mu) / (sqrt(2) * erfinv(2*p - 1)); that is your standard deviation. Then, to generate your vector, simply use x = randn(nbSamples,1)*sigma + mean.
m = 2; %// number of rows
n = 5; %// number of columns
mu = 3.2; %// mean
p60 = 0.4; %// 60th percentile, measured relative to the mean (i.e. q(0.6) - mu)
sigma = p60 / norminv(.6); %// solve q(0.6) - mu = sigma*norminv(0.6) for sigma
result = mu + sigma*randn(m,n); %// random numbers with desired distribution
%// Or: result = normrnd(mu, sigma, m, n);
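A quick sanity check (a sketch; quantile is a Statistics Toolbox function): the empirical 0.6-quantile of a large sample should sit about p60 above the mean.
sample = mu + sigma*randn(1e6,1);
quantile(sample, 0.6) - mu %// should be close to p60 = 0.4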