How can I compute kernels in Matlab? - matlab

I want to calculate weighted kernels (for using in a SVM classifier) in Matlab but I'm currently compeletely confused.
I would like to implement the following weighted RBF and Sigmoid kernel:
x and y are vectors of size n, gamma and b are constants and w is a vector of size n with weights.
The problem now is that the fitcsvm method from Matlab need two matrices as input, i.e. K(X,Y). For example the not weighted RBF and sigmoid kernel can be computed as follows:
K_rbf = exp(-gamma .* pdist2(X,Y,'euclidean').^2)
K_sigmoid = tanh(gamma*X*Y' + b);
X and Y are matrices where the rows are the data points (vectors).
How can I compute the above weighted kernels efficiently in Matlab?

Simply scale your input by the weights before passing to the kernel equations. Lets assume you have a vector w of weights (of size of the input problem), you have your data in rows of X, and features are columns. Multiply it with broadcasting over rows (for example using bsxfun) with w. Thats all. Do not do the same to Y though, just multiply one of the matrices. This is true for every such "weighted" kernel based on scalar product (like sigmoid); for distance based (like RBF) you want to scale both by sqrt of w.
Short proofs:
scalar based
f(<wx, y>) = f(w<x, y>) (linearity of scalar product)
distance based
f(||sqrt(w)x - sqrt(w)y||^2) = f(SUM_i (sqrt(w_i)(x_i - y_i))^2)
= f(SUM_i w_i (x_i - y_i)^2)

Related

How to reduce dimensions of Gaussian Mixture Model parameters

Assuming I have already built a Gaussian Mixture Model using the fitgmdist function and want to map the multivariate distributions into a subspace with a smaller dimension without having to recreate the model how do I go about it?
In MATLAB terms, I have a GMM, gmm_goal, with gmm_goal.NumComponents = K and gmm_goal.NumVariables = N and want to reduce N to a number n < N.
If code isn't available, an explanation or mathematical derivation will do.
The parameters of the Gaussian Mixture Model effected by the transformation into a subspace are the mean and variance of the Gaussian distributions that form the GMM.
Assuming a linear transformation of your data points x:
y = A*x + b
Because of linearity of expectation, we can calculate the new mean and variance of the subspace from the old ones:
mean_new = A*mean + b
variance_new = A*variance*A'

Generate random samples from arbitrary discrete probability density function in Matlab

I've got an arbitrary probability density function discretized as a matrix in Matlab, that means that for every pair x,y the probability is stored in the matrix:
A(x,y) = probability
This is a 100x100 matrix, and I would like to be able to generate random samples of two dimensions (x,y) out of this matrix and also, if possible, to be able to calculate the mean and other moments of the PDF. I want to do this because after resampling, I want to fit the samples to an approximated Gaussian Mixture Model.
I've been looking everywhere but I haven't found anything as specific as this. I hope you may be able to help me.
Thank you.
If you really have a discrete probably density function defined by A (as opposed to a continuous probability density function that is merely described by A), you can "cheat" by turning your 2D problem into a 1D problem.
%define the possible values for the (x,y) pair
row_vals = [1:size(A,1)]'*ones(1,size(A,2)); %all x values
col_vals = ones(size(A,1),1)*[1:size(A,2)]; %all y values
%convert your 2D problem into a 1D problem
A = A(:);
row_vals = row_vals(:);
col_vals = col_vals(:);
%calculate your fake 1D CDF, assumes sum(A(:))==1
CDF = cumsum(A); %remember, first term out of of cumsum is not zero
%because of the operation we're doing below (interp1 followed by ceil)
%we need the CDF to start at zero
CDF = [0; CDF(:)];
%generate random values
N_vals = 1000; %give me 1000 values
rand_vals = rand(N_vals,1); %spans zero to one
%look into CDF to see which index the rand val corresponds to
out_val = interp1(CDF,[0:1/(length(CDF)-1):1],rand_vals); %spans zero to one
ind = ceil(out_val*length(A));
%using the inds, you can lookup each pair of values
xy_values = [row_vals(ind) col_vals(ind)];
I hope that this helps!
Chip
I don't believe matlab has built-in functionality for generating multivariate random variables with arbitrary distribution. As a matter of fact, the same is true for univariate random numbers. But while the latter can be easily generated based on the cumulative distribution function, the CDF does not exist for multivariate distributions, so generating such numbers is much more messy (the main problem is the fact that 2 or more variables have correlation). So this part of your question is far beyond the scope of this site.
Since half an answer is better than no answer, here's how you can compute the mean and higher moments numerically using matlab:
%generate some dummy input
xv=linspace(-50,50,101);
yv=linspace(-30,30,100);
[x y]=meshgrid(xv,yv);
%define a discretized two-hump Gaussian distribution
A=floor(15*exp(-((x-10).^2+y.^2)/100)+15*exp(-((x+25).^2+y.^2)/100));
A=A/sum(A(:)); %normalized to sum to 1
%plot it if you like
%figure;
%surf(x,y,A)
%actual half-answer starts here
%get normalized pdf
weight=trapz(xv,trapz(yv,A));
A=A/weight; %A normalized to 1 according to trapz^2
%mean
mean_x=trapz(xv,trapz(yv,A.*x));
mean_y=trapz(xv,trapz(yv,A.*y));
So, the point is that you can perform a double integral on a rectangular mesh using two consecutive calls to trapz. This allows you to compute the integral of any quantity that has the same shape as your mesh, but a drawback is that vector components have to be computed independently. If you only wish to compute things which can be parametrized with x and y (which are naturally the same size as you mesh), then you can get along without having to do any additional thinking.
You could also define a function for the integration:
function res=trapz2(xv,yv,A,arg)
if ~isscalar(arg) && any(size(arg)~=size(A))
error('Size of A and var must be the same!')
end
res=trapz(xv,trapz(yv,A.*arg));
end
This way you can compute stuff like
weight=trapz2(xv,yv,A,1);
mean_x=trapz2(xv,yv,A,x);
NOTE: the reason I used a 101x100 mesh in the example is that the double call to trapz should be performed in the proper order. If you interchange xv and yv in the calls, you get the wrong answer due to inconsistency with the definition of A, but this will not be evident if A is square. I suggest avoiding symmetric quantities during the development stage.

Multivariate Gaussian distribution formula implementation

I have a certain problem while implementing multivariate Gaussian distribution for anomaly detection.
I have referred the formula from Andrew Ng notes
http://www.holehouse.org/mlclass/15_Anomaly_Detection.html
below is the problem I face
Suppose I have a data set with 2 features and m number of training set i.e n=2 and wants to determine my multivariate Gaussian probability p(x;mu;sigma) which should be a [m*1] matrix because it produces estimated Gaussian value by feature correlation.
The problem I face is I am unable to use the formula to produce the matrix [m*1].
I am using Octave as IDE to develop the algorithm.
Below is a snapshot showcasing my problem
Considering the multiplication of the Red boundary equation because the LHS of the red boundary is just a real number
PLEASE HELP ME UNDERSTAND WHERE AM I GOING WRONG
Thanks
I think you got the dimensions wrong.
Let's assume you have a 2-dimensional (n=2) data of m instances. We can store this data as a n-by-m matrix in MATLAB (columns are data instances, rows represent features/dimensions). In this case we have:
X the data matrix of size nxm, each instance x = X(:,i) is a vector of size nx1 (column vector in our convention).
mu is the mean vector (mu = mean(X,2)). This is also a column vector of same size as an instance nx1.
sigma is the covariance matrix (sigma = cov(X.')). It has size nxn (it describes how each dimensions co-vary with each other dimension).
So the part that you highlighted in red involves expressions of the following sizes:
= ([nx1] - [nx1])' * [nxn] * ([nx1] - [nx1])
= [1xn] * [nxn] * [nx1]
= 1x1

it is possible determinant of matrix(256*256) be infinite

i have (256*1) vectors of feature come from (16*16) of gray images. number of vectors is 550
when i compute Sample covariance of this vectors and compute covariance matrix determinant
answer is inf
it is possible determinant of finite matrix with finite range (0:255) value be infinite or i mistake some where?
in fact i want classification with bayesian estimation , my distribution is gaussian and when
i compute determinant be inf and ultimate Answer(likelihood) is zero .
some part of my code:
Mean = mean(dataSet,2);
MeanMatrix = Mean*ones(1,NoC);
Xc = double(dataSet)-MeanMatrix; % transform data to the origine
Sigma = (1/NoC) *Xc*Xc'; % calculate sample covariance matrix
Parameters(i).M = Mean';
Parameters(i).C = Sigma;
likelihoods(i) = (1/(2*pi*sqrt(det(params(i).C)))) * (exp(-0.5 * (double(X)-params(i).M)' * inv(params(i).C) * (double(X)-params(i).M)));
variable i show my classes;
variable X show my feature vector;
Can the determinant of such matrix be infinite? No it cannot.
Can it evaluate as infinite? Yes definitely.
Here is an example of a matrix with a finite amount of elements, that are not too big, yet the determinant will rarely evaluate as a finite number:
det(rand(255)*255)
In your case, probably what is happening is that you have too few datapoints to produce a full-rank covariance matrix.
For instance, if you have N examples, each with dimension d, and N<d, then your d x d covariance matrix will not be full rank and will have a determinant of zero.
In this case, a matrix inverse (precision matrix) does not exist. However, attempting to compute the determinant of the inverse (by taking 1/|X'*X|=1/0 -> \infty) will produce an infinite value.
One way to get around this problem is to set the covariance to X'*X+eps*eye(d), where eps is a small value. This technique corresponds to placing a weak prior distribution on elements of X.
no it is not possible. it may be singular but taking elements a large value has will have a determinant value.

Principal Components calculated using different functions in Matlab

I am trying to understand principal component analysis in Matlab,
There seems to be at least 3 different functions that do it.
I have some questions re the code below:
Am I creating approximate x values using only one eigenvector (the one corresponding to the largest eigenvalue) correctly? I think so??
Why are PC and V which are both meant to be the loadings for (x'x) presented differently? The column order is reversed because eig does not order the eigenvalues with the largest value first but why are they the negative of each other?
Why are the eig values not in ordered with the eigenvector corresponding to the largest eigenvalue in the first column?
Using the code below I get back to the input matrix x when using svd and eig, but the results from princomp seem to be totally different? What so I have to do to make princomp match the other two functions?
Code:
x=[1 2;3 4;5 6;7 8 ]
econFlag=0;
[U,sigma,V] = svd(x,econFlag);%[U,sigma,coeff] = svd(z,econFlag);
U1=U(:,1);
V1=V(:,1);
sigma_partial=sigma(1,1);
score1=U*sigma;
test1=score1*V';
score_partial=U1*sigma_partial;
test1_partial=score_partial*V1';
[PC, D] = eig(x'*x)
score2=x*PC;
test2=score2*PC';
PC1=PC(:,2);
score2_partial=x*PC1;
test2_partial=score2_partial*PC1';
[o1 o2 o3]=princomp(x);
Yes. According to the documentation of svd, diagonal elements of the output S are in decreasing order. There is no such guarantee for the the output D of eig though.
Eigenvectors and singular vectors have no defined sign. If a is an eigenvector, so is -a.
I've often wondered the same. Laziness on the part of TMW? Optimization, because sorting would be an additional step and not everybody needs 'em sorted?
princomp centers the input data before computing the principal components. This makes sense as normally the PCA is computed with respect to the covariance matrix, and the eigenvectors of x' * x are only identical to those of the covariance matrix if x is mean-free.
I would compute the PCA by transforming to the basis of the eigenvectors of the covariance matrix (centered data), but apply this transform to the original (uncentered) data. This allows to capture a maximum of variance with as few principal components as possible, but still to recover the orginal data from all of them:
[V, D] = eig(cov(x));
score = x * V;
test = score * V';
test is identical to x, up to numerical error.
In order to easily pick the components with the most variance, let's fix that lack of sorting ourselves:
[V, D] = eig(cov(x));
[D, ind] = sort(diag(D), 'descend');
V = V(:, ind);
score = x * V;
test = score * V';
Reconstruct the signal using the strongest principal component only:
test_partial = score(:, 1) * V(:, 1)';
In response to Amro's comments: It is of course also possible to first remove the means from the input data, and transform these "centered" data. In that case, for perfect reconstruction of the original data it would be necessary to add the means again. The way to compute the PCA given above is the one described by Neil H. Timm, Applied Multivariate Analysis, Springer 2002, page 446:
Given an observation vector Y with mean mu and covariance matrix Sigma of full rank p, the goal of PCA is to create a new set of variables called principal components (PCs) or principal variates. The principal components are linear combinations of the variables of the vector Y that are uncorrelated such that the variance of the jth component is maximal.
Timm later defines "standardized components" as those which have been computed from centered data and are then divided by the square root of the eigenvalues (i.e. variances), i.e. "standardized principal components" have mean 0 and variance 1.