Correlation coefficient between rows of two matrices - matlab

I have the following code, how will I be able to simplify it using the function as it currently runs pretty slow, assume X is 10x7 and Y is 4x7 and D is a matrix stores the correlation between each pair of vectors. If the solution is to use the xcorr2 function can someone show me how it is done?
for i = 1:4
for j = 1:10
D(j,i) = corr2(X(j,:),Y(i,:));
end
end

Use pdist2 (Statistics toolbox) with 'correlation' option. It's faster than your code (even with preallocation), and requires just one line:
D = 1-pdist2(X,Y,'correlation');

Here is how I would do it:
First of all, store/process your matrix transposed. This makes for easier use of the correlation function.
Now assuming you have matrices X and Y and want to get the correlations between combinations of columns, this is easily achieved with a single loop:
Take the first column of X
use corrcoef to determine the correlation with all columns of Y at once.
As long as there is one, take the next column of X
Alternate to this, you can check whether it helps to replace corr2 in your original code with corr, xcorr or corrcoef and see which one runs fastest.

With corrcoef you can do this without a loop, and without using a toolbox:
D = corrcoef([X', Y']);
D = D(1 : size(X, 1), end - size(Y, 1) + 1 : end);
A drawback is that more coefficients are computed than necessary.
The transpose is necessary because your matrices do not follow the Matlab convention to enumerate samples with the first index and variables with the second.

Related

How to generate a matrix automatically with given n in matlab

For the linear regression, I want to generate the matrix for polynomials of n degree.
if n is 1
X=[x(:), ones(length(x),1)]
if n is 2
X=[x(:).^2 x(:) ones(length(x),1)]
...
if n is 5
X=[x(:).^5 x(:).^4 x(:).^3 x(:).^2 x(:) ones(length(x),1)]
I do not know how to code with matlab if I set n=6 and it will automatically generate the wanted X matrix. Hope you can help me.
This can be easily done with bsxfun:
X = bsxfun(#power, x(:), n:-1:0);
Or, in Matlab versions from R1016b onwards, you can use implicit expansion:
X = x(:).^(n:-1:0);
Check out the polyval function. I believe that will do what you’re looking for.
To get increasing the polynomial to increase in degree, you can increase the length of your p argument using a loop.
If you write edit polyfit you can see how MATLAB have implemented the polyfit command, which is similar to what you are trying to do. In there you will find the code
% Construct the Vandermonde matrix V = [x.^n ... x.^2 x ones(size(x))]
V(:,n+1) = ones(length(x),1,class(x));
for j = n:-1:1
V(:,j) = x.*V(:,j+1);
end
Which constructs the matrix you are interested in. The benefit of this method over the bsxfun is that you only calculate x(:).^n and then saves the intermediary results. Instead of treating all powers as seperate problems, e.g. x(:)^(n-1) as a seperate problem to x(:).^n.

How to vectorize nested for loop for operations on different indices of the same matrix/array?

I've been trying to vectorize a for loop in Matlab because it bottlenecks the whole program but I have no clue about how to do it when iterating on different rows or columns of a single matrix inside of the loop. Here's the code :
// X is an n*k matrix, W is n*n
// W(i,j) stores the norm of the vector resulting from subtracting
// the j-th line of X from its i-th line.
for i = 1:n
for j = 1:n
W(i,j) = norm(X(i,:) - X(j,:))
end
end
Edit :
I chose Luis Mendo's answer as it was the most convenient for me and the one that's closer to the mathematical concept behind my algorithm, yet all three answers were correct and I'd advise to use the one that's the most convenient depending on which Matlab Toolboxes you have or the way you want to code.
I also noticed that the common point of all answers was to use a different format, for example : using an array that would store the indices, reshaping current arrays, using one more dimension...
So I think that's the right thing to explore if you have a similar problem.
Using combinatory:
% sample data
X = randi(9,5,4);
n = size(X,1);
% row index combinations
combIdx = combvec(1:n,1:n);
% difference between row combinations
D = X(combIdx(1,:),:)-X(combIdx(2,:),:);
% norm of each row
W = diag(sqrt(D*D'));
% reshape
W = reshape(W,n,[]);
Here is another solution that may be more efficient for big matrices.
For Matlab >=R2016 you can simple use
W = sqrt(sum(abs(reshape(X,n,1,k) - reshape(X,1,n,k)).^2,3))
(if your array is real-valued you can also skip the abs). For earlier versions of Matlab you need to add some repmat magic, namely:
W = sqrt(sum(abs(repmat(reshape(X,n,1,k),[1,n,1]) - repmat(reshape(X,1,n,k),[n,1,1])).^2,3));
P.S.: R2017b might make it even more convenient, at least the release notes mention a function called vecnorm that could replace the somewhat ugly sqrt(sum(abs(.).^2))) thing. But the documentation is not up yet so I don't know what exactly it will do. "
If you have the Statistics Toolbox, use pdist:
W = squareform(pdist(X));
No toolbox: good old bsxfun:
W = sqrt(sum(bsxfun(#minus, permute(X, [1 3 2]), permute(X, [3 1 2])).^2, 3));

How to calculate matrix entries efficently using Matlab

I have a cell array myBasis of sparse matricies B_1,...,B_n.
I want to evaluate with Matlab the matrix Q(i,j) = trace (B^T_i * B_j).
Therefore, I wrote the following code:
for i=1:n
for j=1:n
B=myBasis{i};
C=myBasis{j};
Q(i,j)=trace(B'*C);
end
end
Which takes already 68 seconds when n=1226 and B_i has 50 rows, and 50 colums.
Is there any chance to speed this up? Usually I exclude for-loops from my matlab code in a c++ file - but I have no experience how to handle a sparse cell array in C++.
As noted by Inox Q is symmetric and therefore you only need to explicitly compute half the entries.
Computing trace( B.'*C ) is equivalent to B(:).'*C(:):
trace(B.'*C) = sum_i [B.'*C]_ii = sum_i sum_j B_ij * C_ij
which is the sum of element-wise products and therefore equivalent to B(:).'*C(:).
When explicitly computing trace( B.'*C ) you are actually pre-computing all k-by-k entries of B.'*C only to use the diagonal later on. AFAIK, Matlab does not optimize its calculation to save it from computing all the entries.
Here's a way
for ii = 1:n
B = myBasis{ii};
for jj = ii:n
C = myBasis{jj};
t = full( B(:).'*C(:) ); % equivalent to trace(B'*C)!
Q(ii,jj) = t;
Q(jj,ii) = t;
end
end
PS,
It is best not to use i and j as variable names in Matlab.
PPS,
You should notice that ' operator in Matlab is not matrix transpose, but hermitian conjugate, for actual transpose you need to use .'. In most cases complex numbers are not involved and there is no difference between the two operators, but once complex data is introduced, confusing between the two operators makes debugging quite a mess...
Well, a couple of thoughts
1) Basic stuff: A'*B = (B'*A)' and trace(A) = trace(A'). Well, only this trick cut your calculations by almost 50%. Your Q(i,j) matrix is symmetric, and you only need to calculate n(n+1)/2 terms (and not n²)
2) To calculate the trace you don't need to calculate every term of B'*C, just the diagonal. Nevertheless, I don't know if it's easy to create a script in Matlab that is actually faster then just calculating B'*C (MatLab is pretty fast with matrix operations).
But I would definitely implement (1)

Remove duplicates in correlations in matlab

Please see the following issue:
P=rand(4,4);
for i=1:size(P,2)
for j=1:size(P,2)
[r,p]=corr(P(:,i),P(:,j))
end
end
Clearly, the loop will cause the number of correlations to be doubled (i.e., corr(P(:,1),P(:,4)) and corr(P(:,4),P(:,1)). Does anyone have a suggestion on how to avoid this? Perhaps not using a loop?
Thanks!
I have four suggestions for you, depending on what exactly you are doing to compute your matrices. I'm assuming the example you gave is a simplified version of what needs to be done.
First Method - Adjusting the inner loop index
One thing you can do is change your j loop index so that it only goes from 1 up to i. This way, you get a lower triangular matrix and just concentrate on the values within the lower triangular half of your matrix. The upper half would essentially be all set to zero. In other words:
for i = 1 : size(P,2)
for j = 1 : i
%// Your code here
end
end
Second Method - Leave it unchanged, but then use unique
You can go ahead and use the same matrix like you did before with the full two for loops, but you can then filter the duplicates by using unique. In other words, you can do this:
[Y,indices] = unique(P);
Y will give you a list of unique values within the matrix P and indices will give you the locations of where these occurred within P. Note that these are column major indices, and so if you wanted to find the row and column locations of where these locations occur, you can do:
[rows,cols] = ind2sub(size(P), indices);
Third Method - Use pdist and squareform
Since you're looking for a solution that requires no loops, take a look at the pdist function. Given a M x N matrix, pdist will find distances between each pair of rows in a matrix. squareform will then transform these distances into a matrix like what you have seen above. In other words, do this:
dists = pdist(P.', 'correlation');
distMatrix = squareform(dists);
Fourth Method - Use the corr method straight out of the box
You can just use corr in the following way:
[rho, pvals] = corr(P);
corr in this case will produce a m x m matrix that contains the correlation coefficient between each pair of columns an n x m matrix stored in P.
Hopefully one of these will work!
this works ?
for i=1:size(P,2)
for j=1:i
Since you are just correlating each column with the other, then why not just use (straight from the documentation)
[Rho,Pval] = corr(P);
I don't have the Statistics Toolbox, but according to http://www.mathworks.com/help/stats/corr.html,
corr(X) returns a p-by-p matrix containing the pairwise linear correlation coefficient between each pair of columns in the n-by-p matrix X.

Simple way to multiply column elements by corresponding vector elements in MATLAB?

I want to multiply every column of a M × N matrix by corresponding element of a vector of size N.
I know it's possible using a for loop. But I'm seeking a more simple way of doing it.
I think this is what you want:
mat1=randi(10,[4 5]);
vec1=randi(10,[1 5]);
result=mat1.*repmat(vec1,[size(mat1,1),1]);
rempat will replicate vec1 along the rows of mat1. Then we can do element-wise multiplication (.*) to "multiply every column of a M × N matrix by corresponding element of a vector of size N".
Edit: Just to add to the computational aspect. There is an alternative to repmat that I would like you to know. Matrix indexing can achieve the same behavior as repmat and be faster. I have adopted this technique from here.
Observe that you can write the following statement
repmat(vec1,[size(mat1,1),1]);
as
vec1([1:size(vec1,1)]'*ones(1,size(mat1,1)),:);
If you see closely, the expression boils down to vec1([1]'*[1 1 1 1]),:); which is again:
vec1([1 1 1 1]),:);
thereby achieving the same behavior as repmat and be faster. I ran three solutions 100000 times, namely,
Solution using repmat : 0.824518 seconds
Solution using indexing technique explained above : 0.734435 seconds
Solution using bsxfun provided by #LuisMendo : 0.683331 seconds
You can observe that bsxfun is slightly faster.
Although you can do it with repmat (as in #Parag's answer), it's often more efficient to use bsxfun. It also has the advantage that the code (last line) is the same for a row and for a column vector.
%// Example data
M = 4;
N = 5;
matrix = rand(M,N);
vector = rand(1,N); %// or size M,1
%// Computation
result = bsxfun(#times, matrix, vector); %// bsxfun does an "implicit" repmat