MATLAB: Calculating AIC of a Linear Regression Model

I have a matrix X in which each row is one training example and each column is one feature, and a vector y in which each row holds the corresponding response for X. I can create a linear model like so:
modl = fitlm(X,y)
How can I calculate the AIC value for the above model? Unfortunately, the aic() function in MATLAB is not defined for linear models.
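One possible route, assuming the Statistics and Machine Learning Toolbox is available: the LinearModel object returned by fitlm exposes a ModelCriterion property that includes an AIC field. The sketch below uses synthetic data (the coefficients and noise level are made up for illustration) and also recomputes the value from the model's LogLikelihood. The parameter count k used in the manual formula is an assumption, since conventions differ on whether the noise variance is counted.

```matlab
% Synthetic data, for illustration only
rng(0);
X = randn(50, 3);
y = X * [1; -2; 0.5] + 0.1 * randn(50, 1);

modl = fitlm(X, y);

% AIC as stored on the fitted model
aicBuiltin = modl.ModelCriterion.AIC;

% Manual version: AIC = -2*logL + 2*k
% Assumption: k counts the regression coefficients (including the intercept)
logL = modl.LogLikelihood;
k = modl.NumEstimatedCoefficients;
aicManual = -2 * logL + 2 * k;

fprintf('built-in AIC: %.4f, manual AIC: %.4f\n', aicBuiltin, aicManual);
```

If the two numbers differ by a constant, the difference most likely comes from the parameter-count convention; for comparing models fitted to the same data, either convention ranks them the same way as long as it is used consistently.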

Related

Matlab's Arburg (Autoregression Burg's method) for forecasting time series

MATLAB's arburg function returns a vector of coefficients of the form [1 c(1) c(2) ... c(p)], where p is the model's order. But these are not the coefficients for forecasting; instead they are used with a random input vector to simulate a stochastic AR process. Without forecasting anything on test data, how can I compute the model's error to calculate, say, the AIC? Is there a categorical difference between AR models like this and those used for forecasting?
So I have found that yes, we can indeed use those coefficients (except the first one, which is always 1) to forecast the next time point. To use the coefficients, first remove the first one, then flip and negate the array. The mean absolute one-step-ahead error can be calculated like this:
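coeffs = -flip(coeffs(2:end));   % drop the leading 1, then reverse and negate
p = length(coeffs);              % model order
pred = arrayfun(@(t) coeffs*time_series(t-p:t-1), p+1:length(time_series));  % one-step forecasts
mae = mean(abs(time_series(p+1:end).' - pred))   % mean absolute error over all forecastable t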
where * is the matrix multiplication assuming coeffs is a row vector and time_series is a column vector.
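As a rough end-to-end sketch of the idea above (assuming the Signal Processing Toolbox for arburg): simulate an AR(2) series, fit it, form one-step-ahead forecasts, and derive an error measure. The names x, w, pred and the simulated coefficients are hypothetical, and the AIC expression used here, N*log(MSE) + 2*p, is only one common Gaussian approximation, not something arburg provides.

```matlab
% Simulate a stable AR(2) process: x(t) = 0.6*x(t-1) - 0.3*x(t-2) + e(t)
rng(1);
N = 500;
x = filter(1, [1 -0.6 0.3], randn(N, 1));   % column vector

p = 2;                       % model order (assumed known here)
a = arburg(x, p);            % a = [1 a(2) ... a(p+1)], a row vector

% Forecast weights: drop the leading 1, negate, and flip so that
% w * x(t-p:t-1) predicts x(t)
w = -flip(a(2:end));

pred = zeros(N - p, 1);
for t = p+1:N
    pred(t - p) = w * x(t-p:t-1);
end

res = x(p+1:end) - pred;     % one-step-ahead residuals
mae = mean(abs(res));        % mean absolute error
mse = mean(res.^2);

% Assumption: Gaussian-likelihood approximation of the AIC
aicApprox = numel(res) * log(mse) + 2 * p;
```

Repeating this for several values of p and picking the smallest aicApprox is one way to compare model orders on the same series.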

How to reduce dimensions of Gaussian Mixture Model parameters

Assuming I have already built a Gaussian Mixture Model using the fitgmdist function and want to map the multivariate distributions into a subspace with a smaller dimension, how do I go about it without having to recreate the model?
In MATLAB terms, I have a GMM, gmm_goal, with gmm_goal.NumComponents = K and gmm_goal.NumVariables = N and want to reduce N to a number n < N.
If code isn't available, an explanation or mathematical derivation will do.
The parameters of the Gaussian Mixture Model affected by the transformation into a subspace are the means and covariances of the Gaussian distributions that form the GMM; the mixing proportions stay the same.
Assuming a linear transformation of your data points x:
y = A*x + b
Because of linearity of expectation, we can calculate the new mean and variance of the subspace from the old ones:
mean_new = A*mean + b
variance_new = A*variance*A'
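The two formulas above can be applied component by component. Here is a minimal sketch, assuming the GMM has full, non-shared covariances (so Sigma is N-by-N-by-K) and a MATLAB release with implicit expansion (R2016b or later). The example model, the projection matrix A, and the offset b are made up for illustration; in practice A might come from PCA or any other linear map.

```matlab
% Example GMM with K = 2 components in N = 3 dimensions (illustrative values)
mu = [0 0 0; 3 3 3];
Sigma = cat(3, eye(3), diag([1 2 3]));
gmm_goal = gmdistribution(mu, Sigma, [0.4 0.6]);

% Hypothetical linear map y = A*x + b onto n = 2 dimensions
A = [1 0 0; 0 1 0];          % n-by-N: keeps the first two coordinates
b = zeros(2, 1);

K = gmm_goal.NumComponents;
n = size(A, 1);

% Means are stored as rows, so the map is applied from the right
mu_new = gmm_goal.mu * A.' + b.';

Sigma_new = zeros(n, n, K);
for k = 1:K
    S = A * gmm_goal.Sigma(:, :, k) * A.';
    Sigma_new(:, :, k) = (S + S.') / 2;   % remove round-off asymmetry
end

% Mixing proportions are unchanged by a linear map
gmm_small = gmdistribution(mu_new, Sigma_new, gmm_goal.ComponentProportion);
```

If the model was fitted with diagonal or shared covariances, Sigma has a different shape and would need to be expanded to full matrices first.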

How can I compute kernels in Matlab?

I want to calculate weighted kernels (for use in an SVM classifier) in MATLAB, but I'm currently completely confused.
I would like to implement the following weighted RBF and Sigmoid kernel:
x and y are vectors of size n, gamma and b are constants and w is a vector of size n with weights.
The problem now is that the fitcsvm method from MATLAB expects a kernel that takes two matrices as input, i.e. K(X,Y). For example, the unweighted RBF and sigmoid kernels can be computed as follows:
K_rbf = exp(-gamma .* pdist2(X,Y,'euclidean').^2)
K_sigmoid = tanh(gamma*X*Y' + b);
X and Y are matrices where the rows are the data points (vectors).
How can I compute the above weighted kernels efficiently in Matlab?
Simply scale your input by the weights before passing it to the kernel equations. Let's assume you have a vector w of weights (with one entry per feature), your data in the rows of X, and features in the columns. Multiply X by w with broadcasting over rows (for example using bsxfun). That's all. Do not do the same to Y, though; just multiply one of the matrices. This holds for every such "weighted" kernel based on the scalar product (like sigmoid); for distance-based kernels (like RBF) you want to scale both matrices by the square root of w.
Short proofs (.* is the element-wise product):
scalar based
f(<w.*x, y>) = f(SUM_i w_i x_i y_i) (definition of the scalar product)
distance based
f(||sqrt(w).*x - sqrt(w).*y||^2) = f(SUM_i (sqrt(w_i)(x_i - y_i))^2)
= f(SUM_i w_i (x_i - y_i)^2)
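Put into code, the scaling trick might look like the sketch below. It assumes w is a 1-by-n row vector of non-negative weights, that pdist2 from the Statistics and Machine Learning Toolbox is available, and that the MATLAB release supports implicit expansion (R2016b or later; on older releases bsxfun(@times, X, w) does the same). The data and constants are arbitrary.

```matlab
rng(0);
X = randn(5, 3);             % 5 points, 3 features
Y = randn(4, 3);             % 4 points, 3 features
w = [0.5 2 1];               % one non-negative weight per feature
gamma = 0.1;
b = 0.2;

% Distance-based kernel: scale BOTH matrices by sqrt(w)
Xs = X .* sqrt(w);
Ys = Y .* sqrt(w);
K_rbf = exp(-gamma .* pdist2(Xs, Ys, 'euclidean').^2);

% Scalar-product kernel: scale only ONE matrix by w
K_sigmoid = tanh(gamma * (X .* w) * Y.' + b);

% Spot check of one entry against the element-wise definitions
i = 2; j = 3;
rbf_ij = exp(-gamma * sum(w .* (X(i, :) - Y(j, :)).^2));
sig_ij = tanh(gamma * sum(w .* X(i, :) .* Y(j, :)) + b);
```

Both kernel matrices are 5-by-4 here, with entry (i,j) comparing row i of X to row j of Y.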

How to select first component and calculate percentage of variation in PCA?

I have a matrix M where the columns are data points and the rows are features. Now I want to do PCA and select only the first component which has highest variance.
I know that I can do it in Matlab with [coeff,score,latent] = pca(M'). First I think I have to transpose matrix M.
How can I select now the first component? I'm not sure about the three different output matrices.
Second, I also want to calculate the percentage of variance explained for each component. How can I do this?
Indeed, you should transpose your input to have rows as data points and columns as features:
[coeff, score, latent, ~, explained] = pca(M');
The principal components are given by the columns of coeff in order of descending variance, so the first column holds the most important component. The variances for each component are given in latent, and the percentage of total variance explained is given in explained.
firstCompCoeff = coeff(:,1);
firstCompVar = latent(1);
For more information: pca documentation.
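To make the roles of the outputs concrete, here is a small sketch on synthetic data (the matrix M below is made up, with features in rows as in the question). It pulls out the scores along the first component and checks how explained relates to latent.

```matlab
rng(0);
M = randn(4, 100);           % 4 features (rows), 100 data points (columns)
M(1, :) = 5 * M(1, :);       % give the first feature a much larger variance

[coeff, score, latent, ~, explained] = pca(M');

firstCompCoeff  = coeff(:, 1);     % direction of the first component
firstCompScores = score(:, 1);     % data projected onto that direction
pctFirst        = explained(1);    % percent of total variance it explains

% explained is latent rescaled to percentages
explainedManual = 100 * latent / sum(latent);

% score is the centered data multiplied by coeff
Mc = M' - mean(M', 1);
scoresManual = Mc * firstCompCoeff;
```

Since pca centers the data by default, the scores match the centered data projected onto coeff, which is why scoresManual subtracts the column means first.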
Note that the pca function requires the Statistics Toolbox. If you don't have it, you can either search the internet for an alternative or implement it yourself using svd.
If your matrix has dimensions m x n, where m is cases and n is variables:
% First you might want to normalize the matrix...
M = normalize(M);
% means very close to zero
round(mean(M),10)
% standard deviations all one
round(std(M),10)
% Perform a singular value decomposition of the matrix
[U,S,V] = svd(M);
% First Principal Component is the first column of V
V(:,1)
% Calculate percentage of variation from the squared singular values
s = diag(S);
(s.^2 / sum(s.^2)) * 100
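For reference, a toolbox-free sketch of the same idea that only centers the data (no scaling) might look like this. The data is synthetic, and the variable names latent and explained are chosen only to mirror the outputs of pca; implicit expansion (R2016b or later) is assumed for the centering step.

```matlab
rng(0);
M = randn(100, 4) * diag([5 2 1 0.5]);   % 100 cases (rows), 4 variables (columns)

Mc = M - mean(M, 1);                     % center each column

[~, S, V] = svd(Mc, 'econ');             % economy-size SVD
s = diag(S);                             % singular values, in descending order

latent    = s.^2 / (size(M, 1) - 1);     % variance along each component
explained = 100 * latent / sum(latent);  % percent of total variance

firstComp = V(:, 1);                     % first principal direction
projVar   = var(Mc * firstComp);         % variance of the data along it
```

The variance of the data projected onto the first column of V equals the first entry of latent, which is a quick sanity check that the decomposition was set up correctly.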

Find largest subset of linearly independent vectors with Matlab

I need to create a MATLAB function that finds the largest subset of linearly independent vectors in a matrix A.
Initialize the output of the program to be 0, which corresponds to the empty set (containing no column vectors). Scan the columns of A from left to right, one by one: if adding the current column vector to the set of linearly independent vectors found so far makes the new set linearly DEPENDENT, skip this vector; otherwise add it to the solution set. Then move on to the next column.
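function out = maxindependent(A)
%MAXINDEPENDENT takes a matrix A and produces an array in which the columns
%are a subset of independent vectors with maximum size.
out = zeros(size(A,1), 0);      % start with the empty set of columns
for jj = 1:size(A,2)
    M = [out A(:,jj)];          % candidate set with the current column added
    if rank(M) == size(M,2)     % still linearly independent: keep the column
        out = M;
    end                         % otherwise skip the column
end
if isempty(out)
    out = 0;                    % no independent column found (A is all zeros)
end
end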
The number of linearly independent vectors in a matrix is equal to the rank of the matrix, and a particular subset of linearly independent vectors is not unique. Any 'largest subset' of linearly independent vectors will have size equal to the rank.
There is a function for this in MATLAB:
n = rank(A);
The algorithm you described is not necessary; you should just use the SVD. There is a concise way to do it here: how to get the maximally independent vectors given a set of vectors in MATLAB?
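As one concrete alternative to scanning columns one by one, a QR factorization with column pivoting can pick out a set of independent columns directly. Note that this swaps in QR rather than the SVD mentioned above, and the subset it selects may differ from the left-to-right scan, since such subsets are not unique. The matrix A below is a made-up example of rank 2.

```matlab
% Columns 1 and 2 are parallel, and column 3 = 3*column 1 + column 4
A = [1 2 3 0;
     0 0 1 1;
     1 2 4 1];

r = rank(A);                   % size of any maximal independent subset

% Economy-size QR with column pivoting; perm is a permutation vector
[~, ~, perm] = qr(A, 0);

idx = sort(perm(1:r));         % indices of r independent columns of A
out = A(:, idx);               % the selected columns, in original order
```

For matrices that are close to rank-deficient, the result depends on the tolerance rank uses, so it may be worth passing an explicit tolerance to rank in that case.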