Mixture of Gaussians (EM) how to calculate the responsabilities

Mixture of Gaussians (EM) how to calculate the responsabilities - matlab

I have an assignment to implement MoG with EM in matlab. The assignment:
My code atm;
clear
clc
load('data2')
%% INITIALIZE
K = 20
pi = 0.01:((1-0.01)/K):1;
for k=1:20
sigma{k} = eye(2);
mu(k,:) = [rand(1),rand(1)];
end
%% Posterior over the laten variables
addition = 0;
for k =1:20
addition = addition + (pi(k)*mvnpdf(x,mu(k,:), sigma{k}));
end
test = 0;
for k =1:20
gamma{k} = (pi(k)*mvnpdf(x,mu(k), sigma{k})) ./ addition;
end
data has 1000 rows and 2 columns (so 1000 datapoints). My question is now how do I calculate the responsibilities. When I try to calculate the covariance matrix I get a 1x1000 matrix. While I believe the covariance matrix should be 2x2.

Unfortunately, I don't speak Matlab, so I can't really see where your code is incorrect, but I can answer generally (and maybe someone who knows Matlab can see if your code can be salvaged). Each datapoint has a gamma associated with it, which is the expectation of an indicator variable for each component in the mixture. Calculating them is pretty simple: for the i-th datapoint and the k-th component, gamma_ik is just the density of the k-th component at the i-th point, multiplied by the k-th mixture coefficient (the prior probability that the point came from the k-th component, which is pi in your assignment), normalised by this quantity computed over all k. Thus for each datapoint, you have a vector of responsibilities (of length k) with a sum of one.

Related

Understanding PCA in MATLAB

What are the difference between the following two functions?
prepTransform.m
function [mu trmx] = prepTransform(tvec, comp_count)
% Computes transformation matrix to PCA space
% tvec - training set (one row represents one sample)
% comp_count - count of principal components in the final space
% mu - mean value of the training set
% trmx - transformation matrix to comp_count-dimensional PCA space
% this is memory-hungry version
% commented out is the version proper for Win32 environment
tic;
mu = mean(tvec);
cmx = cov(tvec);
%cmx = zeros(size(tvec,2));
%f1 = zeros(size(tvec,1), 1);
%f2 = zeros(size(tvec,1), 1);
%for i=1:size(tvec,2)
% f1(:,1) = tvec(:,i) - repmat(mu(i), size(tvec,1), 1);
% cmx(i, i) = f1' * f1;
% for j=i+1:size(tvec,2)
% f2(:,1) = tvec(:,j) - repmat(mu(j), size(tvec,1), 1);
% cmx(i, j) = f1' * f2;
% cmx(j, i) = cmx(i, j);
% end
%end
%cmx = cmx / (size(tvec,1)-1);
toc
[evec eval] = eig(cmx);
eval = sum(eval);
[eval evid] = sort(eval, 'descend');
evec = evec(:, evid(1:size(eval,2)));
% save 'nist_mu.mat' mu
% save 'nist_cov.mat' evec
trmx = evec(:, 1:comp_count);
pcaTransform.m
function [pcaSet] = pcaTransform(tvec, mu, trmx)
% tvec - matrix containing vectors to be transformed
% mu - mean value of the training set
% trmx - pca transformation matrix
% pcaSet - output set transforrmed to PCA space
pcaSet = tvec - repmat(mu, size(tvec,1), 1);
%pcaSet = zeros(size(tvec));
%for i=1:size(tvec,1)
% pcaSet(i,:) = tvec(i,:) - mu;
%end
pcaSet = pcaSet * trmx;
Which one is actually doing PCA?
If one is doing PCA, what is the other one doing?

The first function prepTransform is actually doing the PCA on your training data where you are determining the new axes to represent your data onto a lower dimensional space. What it does is that it finds the eigenvectors of the covariance matrix of your data and then orders the eigenvectors such that the eigenvector with the largest eigenvalue appears in the first column of the eigenvector matrix evec and the eigenvector with the smallest eigenvalue appears in the last column. What's important with this function is that you can define how many dimensions you want to reduce the data down to by keeping the first N columns of evec which will allow you to reduce your data down to N dimensions. The discarding of the other columns and keeping only the first N is what is set as trmx in the code. The variable N is defined by the prep_count variable in prepTransform function.
The second function pcaTransform finally transforms data that is defined within the same domain as your training data but not necessarily the training data itself (it could be if you wish) onto the lower dimensional space that is defined by the eigenvectors of the covariance matrix. To finally perform the reduction of dimensions, or dimensionality reduction as it is popularly known, you simply take your training data where each feature is subtracted from its mean and you multiply your training data by the matrix trmx. Note that prepTransform outputting the mean of each feature in the vector mu is important in order to mean subtract your data when you finally call pcaTransform.
How to use these functions
To use these functions effectively, first determine the trmx matrix, which contain the principal components of your data by first defining how many dimensions you want to reduce your data down to as well as the mean of each feature stored in mu:
N = 2; % Reduce down to two dimensions for example
[mu, trmx] = prepTransform(tvec, N);
Next you can finally perform dimensionality reduction on your data that is defined within the same domain as tvec (or even tvec if you wish, but it doesn't have to be) by:
pcaSet = pcaTransform(tvec, mu, trmx);
In terms of vocabulary, pcaSet contain what are known as the principal scores of your data, which is the term used for the transformation of your data to the lower dimensional space.
If I can recommend something...
Finding PCA through the eigenvector approach is known to be unstable. I highly recommend you use the Singular Value Decomposition via svd on the covariance matrix where the V matrix of the result already gives you the eigenvectors sorted which correspond to your principal components:
mu = mean(tvec, 1);
[~,~,V] = svd(cov(tvec));
Then perform the transformation by taking the mean subtracted data per feature and multiplying by the V matrix, once you subset and grab the first N columns of V:
N = 2;
X = bsxfun(#minus, tvec, mu);
pcaSet = X*V(:, 1:N);
X is the mean subtracted data which performs the same thing as doing pcaSet = tvec - repmat(mu, size(tvec,1), 1);, but you are not explicitly replicating the mean vector over each training example but letting bsxfun do that for you internally. However, taking advantage of MATLAB R2016b, this repeating can be done without the explicit call to bsxfun:
X = tvec - mu;
Further Reading
If you fully want to understand the code that was written and the theory behind what it's doing, I recommend the following two Stack Overflow posts that I have written that talk about the topic:
What does selecting the largest eigenvalues and eigenvectors in the covariance matrix mean in data analysis?
How to use eigenvectors obtained through PCA to reproject my data?
The first post brings the code you presented into light which performs PCA using the eigenvector approach. The second post touches base on how you'd do it using the SVD towards the end of the answer. This answer I've written here is a mix between the two posts above.

Unexpected result with DFT in MATLAB

I have a problem when calculate discrete Fourier transform in MATLAB, apparently get the right result but when plot the amplitude of the frequencies obtained you can see values very close to zero which should be exactly zero. I use my own implementation:
function [y] = Discrete_Fourier_Transform(x)
N=length(x);
y=zeros(1,N);
for k = 1:N
for n = 1:N
y(k) = y(k) + x(n)*exp( -1j*2*pi*(n-1)*(k-1)/N );
end;
end;
end
I know it's better to use fft of MATLAB, but I need to use my own implementation as it is for college.
The code I used to generate the square wave:
x = [ones(1,8), -ones(1,8)];
for i=1:63
x = [x, ones(1,8), -ones(1,8)];
end
MATLAB version: R2013a(8.1.0.604) 64 bits
I have tried everything that has happened to me but I do not have much experience using MATLAB and I have not found information relevant to this issue in forums. I hope someone can help me.
Thanks in advance.

This will be a numerical problem. The values are in the range of 1e-15, while the DFT of your signal has values in the range of 1e+02. Most likely this won't lead to any errors when doing further processing. You can calculate the total squared error between your DFT and the MATLAB fft function by
y = fft(x);
yh = Discrete_Fourier_Transform(x);
sum(abs(yh - y).^2)
ans =
3.1327e-20
which is basically zero. I would therefore conclude: your DFT function works just fine.
Just one small remark: You can easily vectorize the DFT.
n = 0:1:N-1;
k = 0:1:N-1;
y = exp(-1j*2*pi/N * n'*k) * x(:);
With n'*k you create a matrix with all combinations of n and k. You then take the exp(...) of each of those matrix elements. With x(:) you make sure x is a column vector, so you can do the matrix multiplication (...)*x which automatically sums over all k's. Actually, I just notice, this is exactly the well-known matrix form of the DFT.

Defining an efficient distance function in matlab

I'm using kNN search function in matlab, but I'm calculating the distance between two objects of my own defined class, so I've written a new distance function. This is it:
function d = allRepDistance(obj1, obj2)
%calculates the min dist. between repr.
% obj2 is a vector, to fit kNN function requirements
n = size(obj2,1);
d = zeros(n,1);
for i=1:n
M = dist(obj1.Repr, [obj2(i,:).Repr]');
d(i) = min(min(M));
end
end
The difference is that obj.Repr may be a matrix, and I want to calculate the minimal distance between all the rows of each argument. But even if obj1.Repr is just a vector, which gives essentially the normal euclidian distance between two vectors, the kNN function is slower by a factor of 200!
I've checked the performance of just the distance function (no kNN). I measured the time it takes to calculate the distance between a vector and the rows of a matrix (when they are in the object), and it work slower by a factor of 3 then the normal distance function.
Does that make any sense? Is there a solution?

You are using dist(), which corresponds to the Euclidean distance weight function. However, you are not weighting your data, i.e. you don't consider that one dimension is more important that others. Thus, you can directly use the Euclidean distance pdist():
function d = allRepDistance(obj1, obj2)
% calculates the min dist. between repr.
% obj2 is a vector, to fit kNN function requirements
n = size(obj2,1);
d = zeros(n,1);
for i=1:n
X = [obj1.Repr, obj2(i,:).Repr'];
M = pdist(X,'euclidean');
d(i) = min(min(M));
end
end
BTW, I don't know your matrix dimensions, so you will need to deal with the concatenation of elements to create X correctly.

Calculating the essential matrix from two sets of corresponding points

I'm trying to reconstruct a 3d image from two calibrated cameras. One of the steps involved is to calculate the 3x3 essential matrix E, from two sets of corresponding (homogeneous) points (more than the 8 required) P_a_orig and P_b_orig and the two camera's 3x3 internal calibration matrices K_a and K_b.
We start off by normalizing our points with
P_a = inv(K_a) * p_a_orig
and
P_b = inv(K_b) * p_b_orig
We also know the constraint
P_b' * E * P_a = 0
I'm following it this far, but how do you actually solve that last problem, e.g. finding the nine values of the E matrix? I've read several different lecture notes on this subject, but they all leave out that crucial last step. Likely because it is supposedly trivial math, but I can't remember when I last did this and I haven't been able to find a solution yet.

This equation is actually pretty common in geometry algorithms, essentially, you are trying to calculate the matrix X from the equation AXB=0. To solve this, you vectorise the equation, which means,
vec() means vectorised form of a matrix, i.e., simply stack the coloumns of the matrix one over the another to produce a single coloumn vector. If you don't know the meaning of the scary looking symbol, its called Kronecker product and you can read it from here, its easy, trust me :-)
Now, say I call the matrix obtained by Kronecker product of B^T and A as C.
Then, vec(X) is the null vector of the matrix C and the way to obtain that is by doing the SVD decomposition of C^TC (C transpose multiplied by C) and take the the last coloumn of the matrix V. This last coloumn is nothing but your vec(X). Reshape X to 3 by 3 matrix. This is you Essential matrix.
In case you find this maths too daunting to code, simply use the following code by Y.Ma et.al:
% p are homogenius coordinates of the first image of size 3 by n
% q are homogenius coordinates of the second image of size 3 by n
function [E] = essentialDiscrete(p,q)
n = size(p);
NPOINTS = n(2);
% set up matrix A such that A*[v1,v2,v3,s1,s2,s3,s4,s5,s6]' = 0
A = zeros(NPOINTS, 9);
if NPOINTS < 9
error('Too few mesurements')
return;
end
for i = 1:NPOINTS
A(i,:) = kron(p(:,i),q(:,i))';
end
r = rank(A);
if r < 8
warning('Measurement matrix rank defficient')
T0 = 0; R = [];
end;
[U,S,V] = svd(A);
% pick the eigenvector corresponding to the smallest eigenvalue
e = V(:,9);
e = (round(1.0e+10*e))*(1.0e-10);
% essential matrix
E = reshape(e, 3, 3);

You can do several things:
The Essential matrix can be estimated using the 8-point algorithm, which you can implement yourself.
You can use the estimateFundamentalMatrix function from the Computer Vision System Toolbox, and then get the Essential matrix from the Fundamental matrix.
Alternatively, you can calibrate your stereo camera system using the estimateCameraParameters function in the Computer Vision System Toolbox, which will compute the Essential matrix for you.

Custom Algorithm for Exp. maximization in Matlab

I try to write an algorithm which determine $\mu$, $\sigma$,$\pi$ for each class from a mixture multivariate normal distribution.
I finish with the algorithm partially, it works when I set the random guess values($\mu$, $\sigma$,$\pi$) near from the real value. But when I set the values far from the real one, the algorithm does not converge. The sigma goes to 0 $(2.30760684053766e-24 2.30760684053766e-24)$.
I think the problem is my covarience calculation, I am not sure that this is the right way. I found this on wikipedia.
I would be grateful if you could check my algorithm. Especially the covariance part.
Have a nice day,
Thanks,
2 mixture gauss
size x = [400, 2] (400 point 2 dimension gauss)
mu = 2 , 2 (1 row = first gauss mu, 2 row = second gauss mu)
for i = 1 : k
gaussEvaluation(i,:) = pInit(i) * mvnpdf(x,muInit(i,:), sigmaInit(i, :) * eye(d));
gaussEvaluationSum = sum(gaussEvaluation(i, :));
%mu calculation
for j = 1 : d
mu(i, j) = sum(gaussEvaluation(i, :) * x(:, j)) / gaussEvaluationSum;
end
%sigma calculation methode 1
%for j = 1 : n
% v = (x(j, :) - muNew(i, :));
% sigmaNew(i) = sigmaNew(i) + gaussEvaluation(i,j) * (v * v');
%end
%sigmaNew(i) = sigmaNew(i) / gaussEvaluationSum;
%sigma calculation methode 2
sub = bsxfun(#minus, x, mu(i,:));
sigma(i,:) = sum(gaussEvaluation(i,:) * (sub .* sub)) / gaussEvaluationSum;
%p calculation
p(i) = gaussEvaluationSum / n;

Two points: you can observe this even when you implement gaussian mixture EM correctly, but in your case, the code does seem to be incorrect.
First, this is just a problem that you have to deal with when fitting mixtures of gaussians. Sometimes one component of the mixture can collapse on to a single point, resulting in the mean of the component becoming that point and the variance becoming 0; this is known as a 'singularity'. Hence, the likelihood also goes to infinity.
Check out slide 42 of this deck: http://www.cs.ubbcluj.ro/~csatol/gep_tan/Bishop-CUED-2006.pdf
The likelihood function that you are evaluating is not log-concave, so the EM algorithm will not converge to the same parameters with different initial values. The link I gave above also gives some solutions to avoid this over-fitting problem, such as putting a prior or regularization term on your parameters. You can also consider running multiple times with different starting parameters and discarding any results with variance 0 components as having over-fitted, or just reduce the number of components you are using.
In your case, your equation is right; the covariance update calculation on Wikipedia is the same as the one on slide 45 of the above link. However, if you are in a 2d space, for each component the mean should be a length 2 vector and the covariance should be a 2x2 matrix. Hence your code (for two components) is wrong because you have a 2x2 matrix to store the means and a 2x2 matrix to store the covariances; it should be a 2x2x2 matrix.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse