Matlab Covariance Matrix Computation for Different Classes

I have two files. One is an input matrix (X) of size 3823*63 (3823 inputs and 63 features); the other is a class vector (R) of size 3823*1 whose elements take values from 0 to 9 (there are 10 classes).
I have to compute a covariance matrix for every class. So far, I have only managed to compute the mean vector for every class, using many nested loops, and it is wearing me out.
Is there an easier way?
Here is the code that does what I need (thanks to Sam Roberts):
xTra = importdata('optdigits.tra');
xTra = xTra(:,2:64); % first column's inputs are all zero
rTra = importdata('optdigits.tra');
rTra = rTra(:,65); % classes of the data
c = numel(unique(rTra));
for i = 1:c
rTrai = (rTra==i-1); % Get indices of the elements from the ith class
meanvect{i} = mean(xTra(rTrai,:)); % Calculate their mean
covmat{i} = cov(xTra(rTrai,:)); % Calculate their covariance
end

Does this do what you need?
X = rand(3823,63);
R = randi(10,3823,1)-1; % class labels 0..9
numClasses = numel(unique(R));
meanvect = cell(numClasses,1);
covmat = cell(numClasses,1);
for i = 1:numClasses
Ri = (R==i-1); % Logical index of the rows from class i-1 (labels start at 0)
meanvect{i} = mean(X(Ri,:)); % Calculate their mean
covmat{i} = cov(X(Ri,:)); % Calculate their covariance
end
This code loops through each of the classes, selects the rows of R that correspond to observations from that class, and then gets the same rows from X and calculates their mean and covariance. It stores them in cell arrays; because the labels start at 0 and MATLAB indexes from 1, class k is stored in cell k+1:
% Display the mean vector of class 0
meanvect{1}
% Display the covariance matrix of class 1
covmat{2}
Hope that helps!

Don't use mean and sum as variable names, because they are the names of useful MATLAB built-in functions. (Type doc mean or doc sum for usage help.)
Also cov will calculate the covariance matrix for you.
You can use logical indexing to pull out the examples.
m = numel(unique(rTra)); % number of classes (labels 0..m-1)
covarianceMatrices = cell(m,1);
for k=0:m-1
covarianceMatrices{k+1} = cov(xTra(rTra==k,:)); % cell indices start at 1
end
One-liner:
covarianceMatrices = arrayfun(@(k) cov(xTra(rTra==k,:)), 0:m-1, 'UniformOutput', false);
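If the class labels are not guaranteed to be exactly 0..m-1, the third output of unique can serve as a group index instead. The following is only a sketch on synthetic stand-in data; X, R, classes and g are illustrative names, not part of the original code.

```matlab
% Synthetic stand-ins for the data in the question (assumed sizes)
X = rand(200, 5);                 % 200 observations, 5 features
R = randi(10, 200, 1) - 1;        % class labels drawn from 0..9

% g(i) is the position of R(i) within the sorted list of distinct labels
[classes, ~, g] = unique(R);

% One covariance matrix per distinct label, stored in label order
covmat = arrayfun(@(k) cov(X(g == k, :)), 1:numel(classes), 'UniformOutput', false);
```

Here covmat{k} belongs to the label classes(k), so the mapping between labels and cell positions is explicit rather than relying on an offset of 1.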

First construct the data matrix for each class.
Second compute the covariance for each data matrix.
The code below does this.
% assume allData contains all the data you've read in, each row is one data point
% assume classVector contains the class of each data point
numClasses = 10;
data = cell(10,1); %use cells to store each of the data matrices
covariance = cell(10,1); %store each of the class covariance matrices
[numData dummy] = size(allData);
%get the data out of allData and into each class' data matrix
%there is probably a nice matrix way to do this, but this is hopefully clearer
for i = 1:numData
currentClass = classVector(i) + 1; %Matlab indexes from 1
currentData = allData(i,:);
data{currentClass} = [data{currentClass}; currentData];
end
%calculate the covariance matrix for each class
for i = 1:numClasses
covariance{i} = cov(data{i});
end

Related

Optimal way of doing iterative assembly of sparse matrices in Matlab?

My code needs to modify the elements of a sparse matrix inside a loop. When I do this, MATLAB warns me that "This sparse indexing expression is likely to be slow." I am preallocating the sparse arrays using the spalloc function but I am still getting this warning. What is the optimal approach for assembling sparse matrices? This is what I am currently doing:
K=spalloc(n,n,100); f=spalloc(n,1,100);
for i = 1:Nel
[Ke,fe] = myFunction(Ex(i),Ey(i));
inds = data(i,2:end);
K(inds,inds) = K(inds,inds) + Ke;
f(inds) = f(inds)+fe;
end
The indices in inds may appear several times over the course of the loop, meaning an element in K or f may receive multiple contributions. The last two lines within the loop are where I'm getting the warnings.
A common approach is to use the triplet form of the sparse constructor:
S = sparse(i,j,v,m,n)
i and j are row and column index vectors and v is the corresponding data vector. Values corresponding to repeated indices are summed like your code does. So you could instead build up row and column index vectors along with a data vector and then just call sparse with those.
For example something like:
nidx = size(data,2)-1; % number of indices per element
% Triplets for K: each element contributes an nidx-by-nidx block Ke
Krow = zeros(nidx^2,Nel);
Kcol = zeros(nidx^2,Nel);
Kval = zeros(nidx^2,Nel);
% Index and data vectors for f: each element contributes nidx entries
frow = zeros(nidx,Nel);
fval = zeros(nidx,Nel);
for i = 1:Nel
[Ke,fe] = myFunction(Ex(i),Ey(i));
inds = data(i,2:end);
% rows(a,b) = inds(a) and cols(a,b) = inds(b), matching the layout of Ke
[cols,rows] = meshgrid(inds,inds);
Krow(:,i) = rows(:);
Kcol(:,i) = cols(:);
Kval(:,i) = Ke(:);
frow(:,i) = inds(:);
fval(:,i) = fe(:);
end
% sparse sums the values that share the same (row,column) pair
K = sparse(Krow(:),Kcol(:),Kval(:),n,n);
f = sparse(frow(:),1,fval(:),n,1);
Depending on your application, that may or may not actually be faster.
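The summing of repeated indices is easy to confirm on a tiny case. A minimal sketch, with made-up index and value vectors that are not taken from the question:

```matlab
% Three triplets; the first two both target element (1,1)
rows = [1 1 2];
cols = [1 1 2];
vals = [1 2 3];
S = sparse(rows, cols, vals, 2, 2);
disp(full(S));   % element (1,1) holds 1 + 2 = 3, element (2,2) holds 3
```

This accumulation is what lets the element contributions be collected in any order before the single call to sparse.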

Generate a random sparse matrix with N non-zero-elements

I've written a function that generates a sparse matrix of size nxd
and puts in each column 2 non-zero values.
function [M] = generateSparse(n,d)
M = sparse(d,n);
sz = size(M);
nnzs = 2;
val = ceil(rand(nnzs,n));
inds = zeros(nnzs,n); % one column of row indices per matrix column
for i=1:n
ind = randperm(d,nnzs);
inds(:,i) = ind;
end
points = (1:n);
nnzInds = zeros(nnzs,n);
for i=1:nnzs
nnzInd = sub2ind(sz, inds(i,:), points);
nnzInds(i,:) = nnzInd;
end
M(nnzInds) = val;
end
However, I'd like to be able to give the function another parameter, num-nnz, which makes it randomly choose num-nnz cells and put a 1 in each of them.
I can't use sprand because it requires a density, and I need the number of non-zero entries to be independent of the matrix size. Specifying a density inherently depends on the matrix size.
I am a bit confused about how to pick the indices and fill them. I did it with a loop, which is extremely costly, and would appreciate help.
EDIT:
Everything has to be sparse. A big enough matrix will crash in memory if I don't do it in a sparse way.
You seem close!
You could pick num_nnz random (unique) integers between 1 and the number of elements in the matrix, then assign the value 1 to the elements at those linear indices.
To pick the random unique integers, use randperm. To get the number of elements in the matrix use numel.
M = sparse(d, n); % create dxn sparse matrix
num_nnz = 10; % number of non-zero elements
idx = randperm(numel(M), num_nnz); % get unique random indices
M(idx) = 1; % Assign 1 to those indices
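If indexed assignment into a large sparse matrix is a concern, the same idea can be expressed by converting the random linear indices to subscripts and passing them to the triplet form of sparse. This is only a sketch: d, n and num_nnz are example values, and it assumes d*n is small enough to be represented exactly as a double.

```matlab
d = 1000;                          % number of rows (example value)
n = 2000;                          % number of columns (example value)
num_nnz = 10;                      % number of non-zero elements
idx = randperm(d*n, num_nnz);      % unique random linear indices
[r, c] = ind2sub([d n], idx);      % convert to row/column subscripts
M = sparse(r, c, 1, d, n);         % build the matrix in a single call
```

Because the linear indices are unique, no two triplets collide, so every selected cell holds exactly 1.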

Multiply constants by a matrix and convert the results into a block diagonal matrix in matlab

I have constants a1, a2, a3 and a matrix A. I want to compute the three matrices a1*A, a2*A, a3*A and then arrange them into a block diagonal matrix. For three constants this is easy: I can let b1 = a1*A, b2 = a2*A, b3 = a3*A, then use blkdiag(b1, b2, b3) in matlab.
What if I have n constants, a1 ... an? How could I do this without any looping? I know this can be done with the Kronecker product, but that is very time-consuming and performs a lot of unnecessary 0 * constant multiplications.
Thank you.
Discussion and code
This could be one approach with bsxfun(@plus, ...), which facilitates linear indexing, coded in a function format -
function out = bsxfun_linidx(A,a)
%// Get sizes
[A_nrows,A_ncols] = size(A);
N_a = numel(a);
%// Linear indexing offsets between 2 columns in a block & between 2 blocks
off1 = A_nrows*N_a;
off2 = off1*A_ncols+A_nrows;
%// Get the matrix multiplication results
vals = bsxfun(@times,A,permute(a,[1 3 2])); %// OR vals = A(:)*a;
%// Get linear indices for the first block
block1_idx = bsxfun(@plus,[1:A_nrows]',[0:A_ncols-1]*off1); %//'
%// Initialize output array base on fast pre-allocation inspired by -
%// http://undocumentedmatlab.com/blog/preallocation-performance
out(A_nrows*N_a,A_ncols*N_a) = 0;
%// Get linear indices for all blocks and place vals in out indexed by them
out(bsxfun(@plus,block1_idx(:),(0:N_a-1)*off2)) = vals;
return;
How to use: To use the above listed function code, let's suppose you have the a1, a2, a3, ...., an stored in a vector a, then do something like this out = bsxfun_linidx(A,a) to have the desired output in out.
Benchmarking
This section compares or benchmarks the approach listed in this answer against the other two approaches listed in the other answers for runtime performances.
Other answers were converted to function forms, like so -
function B = bsxfun_blkdiag(A,a)
B = bsxfun(@times, A, reshape(a,1,1,[])); %// step 1: compute products as a 3D array
B = mat2cell(B,size(A,1),size(A,2),ones(1,numel(a))); %// step 2: convert to cell array
B = blkdiag(B{:}); %// step 3: call blkdiag with comma-separated list from cell array
and,
function out = kron_diag(A,a_arr)
out = kron(diag(a_arr),A);
For the comparison, four combinations of sizes of A and a were tested, which are -
A as 500 x 500 and a as 1 x 10
A as 200 x 200 and a as 1 x 50
A as 100 x 100 and a as 1 x 100
A as 50 x 50 and a as 1 x 200
The benchmarking code used is listed next -
%// Datasizes
N_a = [10 50 100 200];
N_A = [500 200 100 50];
timeall = zeros(3,numel(N_a)); %// Array to store runtimes
for iter = 1:numel(N_a)
%// Create random inputs
a = randi(9,1,N_a(iter));
A = rand(N_A(iter),N_A(iter));
%// Time the approaches
func1 = @() kron_diag(A,a);
timeall(1,iter) = timeit(func1); clear func1
func2 = @() bsxfun_blkdiag(A,a);
timeall(2,iter) = timeit(func2); clear func2
func3 = @() bsxfun_linidx(A,a);
timeall(3,iter) = timeit(func3); clear func3
end
%// Plot runtimes against size of A
figure,hold on,grid on
plot(N_A,timeall(1,:),'-ro'),
plot(N_A,timeall(2,:),'-kx'),
plot(N_A,timeall(3,:),'-b+'),
legend('KRON + DIAG','BSXFUN + BLKDIAG','BSXFUN + LINEAR INDEXING'),
xlabel('Datasize (Size of A) ->'),ylabel('Runtimes (sec)'),title('Runtime Plot')
%// Plot runtimes against size of a
figure,hold on,grid on
plot(N_a,timeall(1,:),'-ro'),
plot(N_a,timeall(2,:),'-kx'),
plot(N_a,timeall(3,:),'-b+'),
legend('KRON + DIAG','BSXFUN + BLKDIAG','BSXFUN + LINEAR INDEXING'),
xlabel('Datasize (Size of a) ->'),ylabel('Runtimes (sec)'),title('Runtime Plot')
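Runtime plots thus obtained at my end were - (two plots, images not included: runtimes in seconds against the size of A, and against the size of a)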
Conclusions: As you can see, either one of the bsxfun based methods could be looked into, depending on what kind of datasizes you are dealing with!
Here's another approach:
1. Compute the products as a 3D array using bsxfun;
2. Convert into a cell array with one product (matrix) in each cell;
3. Call blkdiag with a comma-separated list generated from the cell array.
Let A denote your matrix, and a denote a vector with your constants. Then the desired result B is obtained as
B = bsxfun(@times, A, reshape(a,1,1,[])); %// step 1: compute products as a 3D array
B = mat2cell(B,size(A,1),size(A,2),ones(1,numel(a))); %// step 2: convert to cell array
B = blkdiag(B{:}); %// step 3: call blkdiag with comma-separated list from cell array
Here's a method using kron which seems to be faster and more memory efficient than Divakar's bsxfun-based solution. I'm not sure if this is different from your method, but the timing seems pretty good. It might be worth doing some testing between the different methods to work out which is more efficient for your problem.
A=magic(4);
a1=1;
a2=2;
a3=3;
kron(diag([a1 a2 a3]),A)
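As a quick sanity check, the kron result can be compared against blkdiag applied to the scaled copies. This is a small sketch using the same example values; the names a, viaKron and viaBlkdiag are only illustrative.

```matlab
A = magic(4);
a = [1 2 3];                                   % the constants a1, a2, a3
viaKron = kron(diag(a), A);                    % Kronecker product approach
viaBlkdiag = blkdiag(a(1)*A, a(2)*A, a(3)*A);  % explicit block diagonal
disp(isequal(viaKron, viaBlkdiag));            % both give the same 12x12 matrix
```

The two agree because diag(a) is zero off the diagonal, so every off-diagonal block of the Kronecker product is 0*A.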

How to see resampled data after BOOTSTRAP

I was trying to resample (with replacement) my database using 'bootstrap' in Matlab as follows:
D = load('Data.txt');
lead = D(:,1);
depth = D(:,2);
X = D(:,3);
Y = D(:,4);
%Bootstrapping to resample 100 times
[resampling100,bootsam] = bootstrp(100,'corr',lead,depth);
%plotting the bootstrapping result as a histogram
hist(resampling100,10);
... ... ...
... ... ...
Though the script written above is correct, I wonder how I can see/load the 100 resampled datasets created through the bootstrap. 'bootsam(:)' displays the indices of the data/values selected for the bootstrap samples, but not the new sample values! Isn't it funny that I'm creating fake data from my original data and I can't even see what is created behind the scenes?
My second question: is it possible to resample the whole matrix (in this case, D) altogether without using any function? However, I know how to create random values from a vector data using 'unidrnd'.
Thanks in advance for your help.
The answer to question 1 is that bootsam provides the indices of the resampled data. Specifically, the nth column of bootsam provides the indices of the nth resampled dataset. In your case, to obtain the nth resampled dataset you would use:
lead_resample_n = lead(bootsam(:, n));
depth_resample_n = depth(bootsam(:, n));
Regarding the second question, I'm guessing what you mean is, how would you just get a re-sampled dataset without worrying about applying a function to the resampled data. Personally, I would use randi, but in this situation, it is irrelevant whether you use randi or unidrnd. An example follows that assumes 4 columns of some data matrix D (as in your question):
%# Build an example dataset
T = 10;
D = randn(T, 4);
%# Obtain a set of random indices, ie indices of draws with replacement
Ind = randi(T, T, 1);
%# Obtain the resampled data
DResampled = D(Ind, :);
To create multiple re-sampled data, you can simply loop over the creation of random indices. Or you could do it in one step by creating a matrix of random indices and using that to index D. With careful use of reshape and permute you can turn this into a T*4*M array, where indexing m = 1, ..., M along the third dimension yields the mth resampled dataset. Example code follows:
%# Build an example dataset
T = 10;
M = 3;
D = randn(T, 4);
%# Obtain a set of random indices, ie indices of draws with replacement
Ind = randi(T, T, M);
%# Obtain the resampled data
DResampled = permute(reshape(D(Ind, :)', 4, T, []), [2 1 3]);
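To confirm that the third dimension really indexes the resampled datasets, one slice can be compared against direct indexing with the matching column of Ind. A small check under the same assumptions (random example data, M = 3); sameSlice is an illustrative name.

```matlab
T = 10;
M = 3;
D = randn(T, 4);
Ind = randi(T, T, M);                 % column m holds the draws for dataset m
DResampled = permute(reshape(D(Ind, :)', 4, T, []), [2 1 3]);

m = 2;                                % any value from 1 to M
% Slice m of the 3D array should equal D indexed by column m of Ind
sameSlice = isequal(DResampled(:, :, m), D(Ind(:, m), :));
disp(sameSlice);
```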

Table of correlation values

If you run the following code you will end up with a cell array composed of a correlation value in CovMatrix(:,3) and the name of the data used in calculating the correlation in CovMatrix(:,1) and CovMatrix(:,2):
clear all
FieldName = {'Name1','Name2','Name3','Name4','Name5'};
Data={rand(12,1),rand(12,1),rand(12,1),rand(12,1),rand(12,1)};
DataCell = [FieldName;Data];%place in a structure - this is the same
%structure that the data for the lakes will be placed in.
DataStructure = struct(DataCell{:});
FieldName = fieldnames(DataStructure);
Combinations = nchoosek (1:numel(FieldName),2);
d1 = cell2mat(struct2cell(DataStructure)');%this will be the surface temperatures
%use the combinations found in 'Combinations' to define which elements to
%use in calculating the coherence.
R = cell(1,size(Combinations,1));%pre-allocate the cell array
Names1 = cell(1,size(Combinations,1));
for j = 1:size(Combinations,1);
[R{j},P{j}] = corrcoef([d1(:,[Combinations(j,1)]),d1(:,[Combinations(j,2)])]);
Names1{j} = ([FieldName([Combinations(j,1)],1),FieldName([Combinations(j,2)],1)]);
end
%only obtain a single value for the correlation and p-value
for i = 1:size(Combinations,1);
R{1,i} = R{1,i}(1,2);
P{1,i} = P{1,i}(1,2);
end
R = R';P = P';
%COVARIANCE MATRIX
CovMatrix=cell(size(Combinations,1),3);%pre-allocate memory
for i=1:size(Combinations,1);
CovMatrix{i,3}=R{i,1};
CovMatrix{i,1}=Names1{1,i}{1,1};
CovMatrix{i,2}=Names1{1,i}{1,2};
end
From this I need to produce a table of the values, preferably in the form of a correlation matrix, similar to jeremytheadventurer.blogspot.com. Would this be possible in MATLAB?
You can compute the correlation matrix of your entire data set in one shot using the corrcoef command:
% d1 can be simply computed as
d1_new = cell2mat(Data);
% Make sure that d1_new is the same matrix as d1
max(abs(d1(:)-d1_new(:)))
% Compute correlation matrix of columns of data in d1_new in one shot
CovMat = corrcoef(d1_new)
% Make sure that entries in CovMat are equivalent to the third column of
% CovMatrix, e.g.
CovMat(1,2)-CovMatrix{1,3}
CovMat(1,4)-CovMatrix{3,3}
CovMat(3,4)-CovMatrix{8,3}
CovMat(4,5)-CovMatrix{10,3}
Because the correlation matrix CovMat is symmetric, this contains the required result if you ignore the upper triangular part.
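To present the result as a labelled table, one option is array2table, assuming a MATLAB release that provides the table data type (R2013b or later). The sketch below uses random stand-in data with the field names from the question; C and T are illustrative names.

```matlab
FieldName = {'Name1','Name2','Name3','Name4','Name5'};
d1 = rand(12, numel(FieldName));   % stand-in for the 12x5 data matrix
C = corrcoef(d1);                  % 5x5 correlation matrix of the columns

% Label both rows and columns with the field names
T = array2table(C, 'VariableNames', FieldName, 'RowNames', FieldName);
disp(T);
```

Each entry of T is the correlation between the series named by its row and its column, so the diagonal is 1 and the table is symmetric.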