MATLAB: Efficient (vectorized) way to apply function on two matrices? - matlab

I have two matrices X and Y, both of order mxn. I want to create a new matrix O of order mxm such that each i,j th entry in this new matrix is computed by applying a function to ith and jth row of X and Y respectively. In my case m = 10000 and n = 500. I tried using a loop but it takes forever. Is there an efficient way to do it?
I am targeting two functions dot product -- dot(row_i, row_j) and exp(-1*norm(row_i-row_j)). But I was wondering if there is a general way so that I can plugin any function.

Solution #1
For the first case, it looks like you can simply use matrix multiplication after transposing Y -
X*Y'
If you are dealing with complex numbers -
conj(X*ctranspose(Y))
Solution #2
For the second case, you need to do a little more work. You need to use bsxfun with permute to re-arrange dimensions and employ the raw form of norm calculations and finally squeeze to get a 2D array output -
squeeze(exp(-1*sqrt(sum(bsxfun(#minus,X,permute(Y,[3 2 1])).^2,2)))
If you would like to avoid squeeze, you can use two permute's -
exp(-1*sqrt(sum(bsxfun(#minus,permute(X,[1 3 2]),permute(Y,[3 1 2])).^2,3)))
I would also advise you to look into this problem - Efficiently compute pairwise squared Euclidean distance in Matlab.
In conclusion, there isn't a common most efficient way that could be employed for every function to ith and jth row of X. If you are still hell bent on that, you can use anonymous function handles with bsxfun, but I am afraid it won't be the most efficient technique.

For the second part, you could also use pdist2:
result = exp(-pdist2(X,Y));

Related

Multiplying multi-dimensional matrices efficiently

I'd love to know if there is a more efficient way to multiply specific elements of multi-dimensional matrices that doesn't require a 'for' loop.
I have a region * time matrix for an individual (say, 50 regions and 1000 timepoints) and I want to multiply each pair of regions at each timepoint to create a new matrix of the products of each region pair at each time point (50 x 50 x 1000). The way that I'm currently running it is:
for t = 1:1000
for i = 1:50
for j = 1:50
new(i,j,t) = old(i,t) .* old(j,t)
As I'm sure you can imagine, this is super slow. Any ideas on how i can fix it up so that it will run more quickly?
%some example data easy to trace
old=[1:5]'
old(:,2)=old*i
%multiplicatiion
a=permute(old,[1,3,2])
b=permute(old,[3,1,2])
bsxfun(#times,a,b)
permute is used to make 3d-matrices with dimensions n*1*m and 1*n*m out of the n*m input matrix. Changing the dimensions this way, new(i,j,k) can be calculated using new(i,j,k)=a(i,1,k)*b(1,j,k). Applying such operations element-by-element is what bsxfun was designed for.
Regarding bsxfun, try to understand simple 2d-examples like bsxfun(#times,[1:7],[1,10,100]') first

Remove duplicates in correlations in matlab

Please see the following issue:
P=rand(4,4);
for i=1:size(P,2)
for j=1:size(P,2)
[r,p]=corr(P(:,i),P(:,j))
end
end
Clearly, the loop will cause the number of correlations to be doubled (i.e., corr(P(:,1),P(:,4)) and corr(P(:,4),P(:,1)). Does anyone have a suggestion on how to avoid this? Perhaps not using a loop?
Thanks!
I have four suggestions for you, depending on what exactly you are doing to compute your matrices. I'm assuming the example you gave is a simplified version of what needs to be done.
First Method - Adjusting the inner loop index
One thing you can do is change your j loop index so that it only goes from 1 up to i. This way, you get a lower triangular matrix and just concentrate on the values within the lower triangular half of your matrix. The upper half would essentially be all set to zero. In other words:
for i = 1 : size(P,2)
for j = 1 : i
%// Your code here
end
end
Second Method - Leave it unchanged, but then use unique
You can go ahead and use the same matrix like you did before with the full two for loops, but you can then filter the duplicates by using unique. In other words, you can do this:
[Y,indices] = unique(P);
Y will give you a list of unique values within the matrix P and indices will give you the locations of where these occurred within P. Note that these are column major indices, and so if you wanted to find the row and column locations of where these locations occur, you can do:
[rows,cols] = ind2sub(size(P), indices);
Third Method - Use pdist and squareform
Since you're looking for a solution that requires no loops, take a look at the pdist function. Given a M x N matrix, pdist will find distances between each pair of rows in a matrix. squareform will then transform these distances into a matrix like what you have seen above. In other words, do this:
dists = pdist(P.', 'correlation');
distMatrix = squareform(dists);
Fourth Method - Use the corr method straight out of the box
You can just use corr in the following way:
[rho, pvals] = corr(P);
corr in this case will produce a m x m matrix that contains the correlation coefficient between each pair of columns an n x m matrix stored in P.
Hopefully one of these will work!
this works ?
for i=1:size(P,2)
for j=1:i
Since you are just correlating each column with the other, then why not just use (straight from the documentation)
[Rho,Pval] = corr(P);
I don't have the Statistics Toolbox, but according to http://www.mathworks.com/help/stats/corr.html,
corr(X) returns a p-by-p matrix containing the pairwise linear correlation coefficient between each pair of columns in the n-by-p matrix X.

How to vectorize the intersection kernel function in MATLAB?

I need to pre-compute the histogram intersection kernel matrices for using LIBSVM in MATLAB.
Assume x, y are two vectors. The kernel function is K(x, y) = sum(min(x, y)). In order to be efficient, the best practice in most cases is to vectorize the operations.
What I want to do is like calculate the kernel matrices like calculating the euclidean distance between two matrices, like pdist2(A, B, 'euclidean'). After defining function 'intKernel', I could calculate the intersection kernel by calling pdist2(A, B, intKernel).
I know the function 'pdist2' may be an option. But I have no idea how to write the self-defined distance function. While, I do not know how to code the intersection kernel between vector(1-by-M) and matrix(M-by-N) in one condense expression.
'repmat' may not be feasible, because the matrix is really large, let us say, 20000-by-360000.
Any help would be appreciated.
Regards,
Peiyun
I think pdist2 is a good option, so I help you to define your distance function.
According to the doc, the self-defined distance function must have 2 inputs: first one is a 1-by-N vector; second one is a M-by-N matrix (be careful of the order!).
To avoid the use of repmat which is indeed memory-consumant, you can use bsxfun to apply some basic operations on data with expansion over singleton dimensions. In your case, you can do the following thing:
distance_kernel = #(x,Y) sum(bsxfun(#min,x,Y),2);
Summation is done over the columns to get a column vector as output.
Then just call pdist2 and you are done.

MATLAB submatrix

MATLAB question:
I have an array A(2,2,2) that is three-dimensional. I would like to define a 2x2 array as a subarray of A, as follows:
B = A(1,:,:).
That is, we are simply projecting on the first component. But matlab will now treat this 2x2 matrix as a 1x2x2 array instead, so that I can't do certain things (like multiply by another 2x2 matrix).
How do I get B as a 2x2 subarray of A?
If you think about a skyscraper, your A(1,:,:) is taking the first floor out and this operation inevitably happens across the 3rd dimension.
You can use reshape(), squeeze() or permute() to get rid of the singleton dimension:
reshape(A(1,:,:),2,2)
squeeze(A(1,:,:))
permute(A(1,:,:),[2,3,1])
squeeze() pretty much does all the job by itself, however it is not an inbuilt function and in fact uses reshape(). The other two alternatives are expected to be faster.
You'd want to use the function squeeze which removes the singleton dimensions:
B = squeeze(A(1,:,:))

Matlab - Matrix Division by Zero - Zeros and NaNs

I am trying to convert a precision matrix sigmaT to a covariance matrix. I've tried two approaches:
covMat = zeros(size(sigmaT));
for i=1:t
covMat(:, :, i) = eye(D)/sigmaT(:,:,i);
end
and
covMat = bsxfun(#rdivide, eye(D), sigmaT);
Some of the elements in sigmaT are zero, so division by zero occurs. The first loop-based solution keeps the elements where division by 0 occurs as 0, the second approach sets the elements to NaN.
My questions would be: why do they behave differently and how can I change the second one-line approach to behave as the loop-based approach? I believe the latter solution should be significantly faster on large matrices.
Your loop based approach is performing matrix division, that is the result for each iteration is the matrix inverse of sigmaT(:,:,i). You can adjust the loop to perform per-element math by using the ./ operator (instead of /).
Your bsxfun based approach is performing per-element division, that is each individual element is inverted. There is no way to use bsxfun to perform a matrix operation on each 2D matrix contained in a 3D array.
These answers are very different. You should use whatever approach is appropriate for your problem. The performance difference between the two is probably relatively small.