How to calculate matrix entries efficiently using Matlab - matlab

I have a cell array myBasis of sparse matrices B_1,...,B_n.
I want to evaluate with Matlab the matrix Q(i,j) = trace (B^T_i * B_j).
Therefore, I wrote the following code:
for i=1:n
    for j=1:n
        B=myBasis{i};
        C=myBasis{j};
        Q(i,j)=trace(B'*C);
    end
end
This already takes 68 seconds when n=1226 and each B_i has 50 rows and 50 columns.
Is there any chance to speed this up? I usually move for-loops out of my Matlab code into a C++ file, but I have no experience handling a sparse cell array in C++.

As noted by Inox Q is symmetric and therefore you only need to explicitly compute half the entries.
Computing trace( B.'*C ) is equivalent to B(:).'*C(:):
trace(B.'*C) = sum_i [B.'*C]_ii = sum_i sum_j B_ji * C_ji
which is the sum of element-wise products and therefore equivalent to B(:).'*C(:).
When explicitly computing trace( B.'*C ) you are actually pre-computing all k-by-k entries of B.'*C only to use the diagonal later on. AFAIK, Matlab does not optimize its calculation to save it from computing all the entries.
Here's a way
Q = zeros(n); % preallocate the result
for ii = 1:n
    B = myBasis{ii};
    for jj = ii:n
        C = myBasis{jj};
        t = full( B(:).'*C(:) ); % equivalent to trace(B.'*C)!
        Q(ii,jj) = t;
        Q(jj,ii) = t;
    end
end
PS,
It is best not to use i and j as variable names in Matlab.
PPS,
Note that the ' operator in Matlab is not the matrix transpose but the Hermitian conjugate (conjugate transpose); for a plain transpose you need to use .'. In most cases complex numbers are not involved and the two operators give the same result, but once complex data is introduced, confusing the two makes debugging quite a mess...
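The element-wise identity also makes it possible to drop both loops: if every B_k is flattened into one column of a sparse matrix V, then Q is the Gram matrix V.'*V. The following is only a minimal sketch, assuming all matrices in myBasis are real and have the same size; the example data and the names cols and V are made up for illustration.

```matlab
% Hypothetical example data: n sparse 50-by-50 matrices in a cell array.
n = 20;
myBasis = cell(1, n);
for k = 1:n
    myBasis{k} = sprandn(50, 50, 0.05);
end

% Flatten each matrix into a sparse column and collect the columns in V.
cols = cellfun(@(B) B(:), myBasis, 'UniformOutput', false);
V = [cols{:}];            % 2500-by-n sparse matrix

% Q(i,j) = V(:,i).' * V(:,j) = trace(B_i.' * B_j) for all pairs at once.
Q = full(V.' * V);
```

Whether this beats the half-loop above probably depends on how sparse the matrices are, so it is worth timing both on the real data.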

Well, a couple of thoughts
1) Basic stuff: A'*B = (B'*A)' and trace(A) = trace(A'). This trick alone cuts your calculations by almost 50%: your Q(i,j) matrix is symmetric, so you only need to calculate n(n+1)/2 terms (and not n²).
2) To calculate the trace you don't need to calculate every term of B'*C, just the diagonal. Nevertheless, I don't know if it's easy to write a script in Matlab that is actually faster than just calculating B'*C (Matlab is pretty fast with matrix operations).
But I would definitely implement (1)

Related

Summation using for loop in MATLAB

J = 0;
sumTerm = 0;
for i=1:m
    sumTerm = sumTerm + ((theta(1)+theta(2)*X(i))-y(i)).^2;
end
J = (1/(2*m))*sumTerm;
Is this the right way to do summation ?
How about this:
J = 0.5 * sum(((theta(1)*ones(size(X))+theta(2)*X)-y).^2)/m
Or as @rayryeng pointed out, you can even drop the ones
J = 0.5 * sum(((theta(1)+theta(2)*X)-y).^2)/m
That's correct, but you'll want to implement that vectorized instead of using loops. You can take advantage of this by using linear algebra to compute the sum for you. You can compute theta(1) + theta(2)*X(i) - y(i) for each term by first creating the matrix X that is a matrix of points where the first column is appended with all ones and the next column contains your single feature / data points. You would finally compute the difference between the output from the prediction line and the true output for each data point by X*theta - y which would thus produce a vector of differences for each data point. This is also assuming that your array of points and theta are both column vectors, and I believe that this is the right structure since this looks like you're implementing the cost function for univariate linear regression from Andrew Ng's Machine Learning course.
You can then compute the dot product of this vector with itself to compute the sum of square differences, then you can divide by 2*m when you're done:
vec = [ones(m,1) X]*theta - y;
J = (vec.'*vec) / (2*m);
The reason why you should pursue a linear algebra solution instead is because native matrix and vector operations in MATLAB are very, very fast and if you can find a solution to your computational problems with linear algebra, it'll be the fastest you can ever get your code to compute things.
For example, see this post on why matrix multiplication in MATLAB is amongst the fastest when benchmarking with other platforms: Why is MATLAB so fast in matrix multiplication?
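To make the equivalence concrete, here is a small sketch that evaluates the cost both ways on made-up data and compares the results. The data, the value of theta and the names Jloop and Jvec are assumptions for illustration only; X, y and theta are taken to be column vectors, as discussed above.

```matlab
% Hypothetical data: m points scattered around the line y = 1 + 2x.
m = 50;
X = linspace(0, 1, m).';          % column vector of inputs
y = 1 + 2*X + 0.1*randn(m, 1);    % column vector of targets
theta = [0.5; 1.5];               % column vector [intercept; slope]

% Loop version: accumulate squared residuals one point at a time.
sumTerm = 0;
for k = 1:m
    sumTerm = sumTerm + (theta(1) + theta(2)*X(k) - y(k))^2;
end
Jloop = sumTerm / (2*m);

% Vectorized version: residual vector, then its dot product with itself.
vec = [ones(m, 1) X]*theta - y;
Jvec = (vec.'*vec) / (2*m);

fprintf('loop: %.12g   vectorized: %.12g\n', Jloop, Jvec);
```

The two values should agree up to floating-point rounding, since only the order of the additions differs.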

MATLAB - Efficient row-vector*Matrix*column-vector

I'm working on a piece of software in MATLAB and I believe I've reached the limit of my knowledge when it comes to optimisation and efficiency. Here's where the expertise of the people on StackOverflow might be helpful.
Using MATLAB's profiler, I've found that the last inefficient line of code is a multiplication of the following form:
function [energy] = getEnergy(S,W)
energy = -(S*W*S');
end
S is a 1xN row vector, W is an NxN matrix (it's not just a diagonal matrix though), and S' is a Nx1 column vector, whose multiplication returns a number.
I understand that this is a primitive operation, but I was wondering whether there is any way to speed this up.
I tried searching Google etc, but unfortunately I do not know the right keywords to search for. I apologise if this is a duplicate.
Thanks in advance.
Your implementation is correct, and the fastest.
You can save ~20-30% of computation time by performing it inline in the main code, without the function call.
>> S = randn(1, 500);
>> W = randn(500);
>> tic; for k = 1 : 10000, e = -(S * W * S'); end; toc
Elapsed time is 0.321595 seconds.
If the bottleneck stems from the fact that you need to repeat this computation for a LOT of different vectors S, then you can do the following vectorization:
% s is k-by-N matrix of k row vectors
energy = -sum( ( s * W ) .* s, 2 ); % note the .* in the middle, and the minus sign from getEnergy
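As a quick sanity check of the batched form, the sketch below compares it against one quadratic form per row, with the sign convention of getEnergy included. The sizes, the random data and the names eLoop and eVec are made up for illustration.

```matlab
N = 500;  k = 200;           % hypothetical sizes
W = randn(N);
s = sign(randn(k, N));       % k row vectors, one per row (made-up data)

% Reference: one quadratic form per row.
eLoop = zeros(k, 1);
for r = 1:k
    eLoop(r) = -(s(r,:) * W * s(r,:)');
end

% Batched: row r of (s*W) is s(r,:)*W; the row-wise sum of the
% element-wise product with s then gives s(r,:)*W*s(r,:)'.
eVec = -sum((s * W) .* s, 2);

maxDiff = max(abs(eLoop - eVec));
```

Here maxDiff should be on the order of floating-point rounding relative to the size of the energies.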

Duplicating a 2d matrix in matlab along a 3rd axis MANY times

I'm looking to duplicate a 784x784 matrix in matlab along a 3rd axis. The following code seems to work:
mat = reshape(repmat(mat, 1,10000),784,784,10000);
Unfortunately, it takes so long to run that it's worthless (changing the 10,000s to 1000 makes it take a few minutes, and using 10,000 practically freezes my whole machine). Is there a faster way to do this?
For reference, I'm looking to use mvnpdf on 10,000 vectors each of length 784, using the same covariance matrix for each. So my final call looks like
mvnpdf(X,mu,mat)
%size(X) = (10000,784), size(mu) = (10000,784), size(mat) = 784,784,10000
If there's a way to do this that's not repeating the covariance matrix 10,000 times, that'd be helpful too. Thanks!
For replication in more than 2 dimensions, you need to supply the replication counts as an array:
out = repmat(mat,[1,1,10000])
Creating a 784x784 matrix 10,000 times isn't going to take advantage of the vectorization in MATLAB, which is going to be more useful for small arrays. Avoiding a for loop also won't help too much, given the following:
The main speedup you can gain here is by computing the inverse of the covariance matrix once, and then computing the pdf yourself. The inverse of sigma takes O(n^3), and you are needlessly doing that 10,000 times. (Also, the square root determinant can be precomputed.) For reference, the PDF of the multivariate normal distribution is computed as follows:
http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Properties
Better to just compute the inverse once, and then compute z = x - mu for each value, then doing z'Sz for each pdf value, and applying a simple function and a constant. But wait! You can vectorize that, too.
I don't have MATLAB in front of me, but this is basically what you need to do, and it'll run in an instant.
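s = inv(sigma); % inverse covariance, computed once
k = size(x, 2); % dimension of each observation (784 here)
c = 0.5*log(det(s)) - (k/2)*log(2*pi); % log of the normalizing constant; det(s) = 1/det(sigma)
z = x - mu; % 10000 x 784 matrix
p = exp( c - 0.5 .* dot(z*s, z, 2) ); % 10000 x 1 vector of pdf values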
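A variant of the same idea, swapped in here as a suggestion rather than taken from the answer, replaces the explicit inverse and determinant with a Cholesky factor and works with log-densities. For 784 dimensions the determinant can overflow or underflow in double precision, which the sum of logs avoids. This is a sketch under the assumption that sigma is symmetric positive definite; the data and sizes below are made up. As far as I know, mvnpdf also accepts a single d-by-d covariance matrix shared by all rows, so the replication may not be needed in the first place.

```matlab
% Hypothetical data: n observations of dimension k with a shared covariance.
n = 1000;  k = 20;
M = randn(k);
sigma = M*M.' + k*eye(k);          % symmetric positive definite by construction
mu = zeros(n, k);
x = mu + randn(n, k);

R = chol(sigma);                   % upper triangular, R.'*R == sigma
z = (x - mu) / R;                  % each row solves z*R = x - mu
logp = -0.5*sum(z.^2, 2) ...       % -0.5 * (x-mu)*inv(sigma)*(x-mu).'
       - sum(log(diag(R))) ...     % -0.5 * log(det(sigma))
       - (k/2)*log(2*pi);
p = exp(logp);                     % n-by-1 vector of densities
```

If the densities themselves underflow, it may be preferable to keep working with logp and skip the final exp.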

MATLAB: Convolution of Matrix Valued Function

I've written this code to perform the 1-d convolution of a 2-d matrix valued function (k is my time index, kend is on the order of 10e3). Is there a faster or cleaner way to do this, perhaps using built in functions?
for k=1:kend
    C(:,:,k)=zeros(3);
    for l=0:k-1
        C(:,:,k)=C(:,:,k)+A(:,:,k-l)*B(:,:,l+1);
    end
end
NEW SOLUTION:
This is a newer solution built on the older solution, which solved the previously given formula. The code in the question is actually a modification of that formula, in which the overlap between the two matrices in the third dimension is repeatedly shifted (it's akin to a convolution along the third dimension of the data). The previous solution I gave only computed the result for the last iteration of the code in the question (i.e. k = kend). So, here's a full solution that should be much more efficient than the code in the question for kend on the order of 1000:
kend = size(A,3); %# Get the value for kend
C = zeros(3,3,kend); %# Preallocate the output
Anew = reshape(flipdim(A,3),3,[]); %# Reshape A into a 3-by-3*kend matrix
Bnew = reshape(permute(B,[1 3 2]),[],3); %# Reshape B into a 3*kend-by-3 matrix
for k = 1:kend
    %# Index Anew and Bnew so they overlap in steps of three
    C(:,:,k) = Anew(:,3*(kend-k)+1:end)*Bnew(1:3*k,:);
end
Even when using just kend = 100, this solution came out to be about 30 times faster for me than the one in the question and about 4 times faster than a pure for-loop-based solution (which would involve 5 loops!). Note that the discussion below of floating-point accuracy still applies, so it is normal and expected that you will see slight differences between the solutions on the order of the relative floating-point accuracy.
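To check the reshaped version against the double loop from the question, something like the following sketch can be used. The size kend = 40 and the random data are made up, and flip(A,3) is used as the newer spelling of flipdim(A,3), which assumes a MATLAB release that provides flip.

```matlab
kend = 40;                         % hypothetical size, small enough to check quickly
A = rand(3, 3, kend);
B = rand(3, 3, kend);

% Reference: the double loop from the question.
Cref = zeros(3, 3, kend);
for k = 1:kend
    for l = 0:k-1
        Cref(:,:,k) = Cref(:,:,k) + A(:,:,k-l)*B(:,:,l+1);
    end
end

% Reshaped version; flip(A,3) reverses the order of the 3-by-3 slices.
Anew = reshape(flip(A, 3), 3, []);
Bnew = reshape(permute(B, [1 3 2]), [], 3);
C = zeros(3, 3, kend);
for k = 1:kend
    C(:,:,k) = Anew(:, 3*(kend-k)+1:end) * Bnew(1:3*k, :);
end

relErr = max(abs(C(:) - Cref(:))) / max(abs(Cref(:)));
```

relErr should be tiny but not necessarily zero, for the rounding reasons discussed below.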
OLD SOLUTION:
Based on the formula you linked to in a comment, it appears that you actually want to do something different than the code you provided in the question. Assuming A and B are 3-by-3-by-k matrices, the result C should be a 3-by-3 matrix and the formula from your link written out as a set of nested for loops would look like this:
%# Solution #1: for loops
k = size(A,3);
C = zeros(3);
for i = 1:3
    for j = 1:3
        for r = 1:3
            for l = 0:k-1
                C(i,j) = C(i,j) + A(i,r,k-l)*B(r,j,l+1);
            end
        end
    end
end
Now, it is possible to perform this operation without any for loops by reshaping and reorganizing A and B appropriately:
%# Solution #2: matrix multiply
Anew = reshape(flipdim(A,3),3,[]); %# Create a 3-by-3*k matrix
Bnew = reshape(permute(B,[1 3 2]),[],3); %# Create a 3*k-by-3 matrix
C = Anew*Bnew; %# Perform a single matrix multiply
You could even rework the code you have in your question to create a solution with a single loop that performs a matrix multiply of your 3-by-3 submatrices:
%# Solution #3: mixed (loop and matrix multiplication)
k = size(A,3);
C = zeros(3);
for l = 0:k-1
    C = C + A(:,:,k-l)*B(:,:,l+1);
end
So now the question: Which one of these approaches is faster/cleaner?
Well, "cleaner" is very subjective, and I honestly couldn't tell you which of the above pieces of code makes it any easier to understand what the operation is doing. All the loops and variables in the first solution make it a little hard to track what's going on, but it clearly mirrors the formula. The second solution breaks it all down into a simple matrix operation, but it's difficult to see how it relates to the original formula. The third solution seems like a middle-ground between the two.
So, let's make speed the tie-breaker. If I time the above solutions for a number of values of k, I get these results (in seconds needed to perform 10,000 iterations of the given solution, MATLAB R2010b):
k | loop | matrix multiply | mixed
-----+--------+-----------------+--------
5 | 0.0915 | 0.3242 | 0.1657
10 | 0.1094 | 0.3093 | 0.2981
20 | 0.1674 | 0.3301 | 0.5838
50 | 0.3181 | 0.3737 | 1.3585
100 | 0.5800 | 0.4131 | 2.7311 * The matrix multiply is now fastest
200 | 1.2859 | 0.5538 | 5.9280
Well, it turns out that for smaller values of k (around 50 or less) the for-loop solution actually wins out, showing once again that for loops are not as "evil" as they used to be considered in older versions of MATLAB. Under certain circumstances, they can be more efficient than a clever vectorization. However, when the value of k is larger than around 100, the vectorized matrix-multiply solution starts to win out, scaling much more nicely with increasing k than the for-loop solution does. The mixed for-loop/matrix-multiply solution scales atrociously for reasons that I'm not exactly sure of.
So, if you expect k to be large, I'd go with the vectorized matrix-multiply solution. One thing to keep in mind is that the results you get from each solution (the matrix C) will differ ever so slightly (on the level of the floating-point precision) since the order of additions and multiplications performed for each solution are different, thus leading to a difference in accumulation of rounding errors. In short, the difference between the results for these solutions should be negligible, but you should be aware of it.
Have you looked into Matlab's conv method?
I can't compare it against your provided code, because what you provided gives me a problem with trying to access the zeroth element of A. (When k=1, k-1=0.)
Have you considered using FFTs to convolve? A convolution operation is simply a point-wise multiplication in the frequency domain. You'll have to take some precaution with finite sequences, as you'll end up with circular convolution if you're not careful (but this is trivial to take care of).
Here's a simple example for a 1D case.
>> a=rand(4,1);
>> b=rand(3,1);
>> c=conv(a,b)
c =
0.1167
0.3133
0.4024
0.5023
0.6454
0.3511
The same using FFTs
>> A=fft(a,6);
>> B=fft(b,6);
>> C=real(ifft(A.*B))
C =
0.1167
0.3133
0.4024
0.5023
0.6454
0.3511
A convolution of an M point vector and an N point vector results in an M+N-1 point vector. So, I've padded each of the vectors a and b with zeros before taking the FFT (this is automatically taken care of when I take the 4+3-1=6 point FFT of it).
EDIT
Although the equation that you showed is similar to a circular convolution, it's not exactly it. So you can ditch the FFT approach, and the built-in conv* functions. To answer your question, here's the same operation done without explicit loops:
dim1=3;dim2=dim1;
dim3=10;
a=rand(dim1,dim2,dim3);
b=rand(dim1,dim2,dim3);
mIndx=cellfun(@(x)(1:x),num2cell(1:dim3),'UniformOutput',false);
fun=@(x)sum(reshape(cell2mat(cellfun(@(y,z)a(:,:,y)*b(:,:,z),num2cell(x),num2cell(fliplr(x)),'UniformOutput',false)),[dim1,dim2,max(x)]),3);
c=reshape(cell2mat(cellfun(@(x)fun(x),mIndx,'UniformOutput',false)),[dim1,dim2,dim3]);
mIndx here is a cell, where the ith cell contains a vector 1:i. This is your l index (as others have noted, please don't use l as a variable name).
The next line is an anonymous function that does the convolution operation, making use of the fact that the k index is just the l index flipped around. The operations are carried out on individual cells, and then assembled.
The last line actually performs the operations on the matrices.
The answer is the same as that obtained with the loops. However, you'll find that the looped solution is actually an order of magnitude faster (I averaged 0.007s for my code and 0.0006s for the loop). This is because the loop is pretty straightforward, whereas with this sort of nested construction, there's plenty of function call overheads and repeated reshaping that slow it down.
MATLAB's loops have come a long way since the early days when loops were dreaded. Certainly, vectorized operations are blazing fast; but not everything can be vectorized, and sometimes, loops are more efficient than such convoluted anonymous functions. I could probably shave off a few more tenths here and there by optimizing my construction (or maybe taking a different approach), but I'm not going to do that.
Remember that good code should be readable as well as efficient, and minor optimization at the cost of readability serves no one. Although I wrote the code above, I certainly would not be able to decipher what it does if I revisited it a month later. Your looped code was clear, readable and fast, and I would suggest that you stick with it.

Faster projected-norm (quadratic-form, metric-matrix...) style computations

I need to perform lots of evaluations of the form
X(:,i)' * A * X(:,i) i = 1...n
where X(:,i) is a vector and A is a symmetric matrix. Ostensibly, I can either do this in a loop
for i=1:n
    z(i) = X(:,i)' * A * X(:,i);
end
which is slow, or vectorise it as
z = diag(X' * A * X)
which wastes RAM unacceptably when X has a lot of columns. Currently I am compromising on
Y = A * X;
for i=1:n
    z(i) = Y(:,i)' * X(:,i);
end
which is a little faster/lighter but still seems unsatisfactory.
I was hoping there might be some matlab/scilab idiom or trick to achieve this result more efficiently?
Try this in MATLAB:
z = sum(X.*(A*X));
This gives results equivalent to Federico's suggestion using the function DOT, but should run slightly faster. This is because the DOT function internally computes the result the same way as I did above using the SUM function. However, DOT also has additional input argument checks and extra computation for cases where you are dealing with complex numbers, which is extra overhead you probably don't want or need.
A note on computational efficiency:
Even though the time difference is small between how fast the two methods run, if you are going to be performing the operation many times over it's going to start to add up. To test the relative speeds, I created two 100-by-100 matrices of random values and timed the two methods over many runs to get an average execution time:
METHOD AVERAGE EXECUTION TIME
--------------------------------------------
Z = sum(X.*Y); 0.0002595 sec
Z = dot(X,Y); 0.0003627 sec
Using SUM instead of DOT therefore reduces the execution time of this operation by about 28% for matrices with around 10,000 elements. The larger the matrices, the more negligible this difference will be between the two methods.
To summarize, if this computation represents a significant bottleneck in how fast your code is running, I'd go with the solution using SUM. Otherwise, either solution should be fine.
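For a quick check that the SUM form matches the loop from the question, a sketch along these lines can be used. The sizes, the random data and the names zLoop and zSum are assumptions for illustration.

```matlab
d = 100;  n = 1000;               % hypothetical sizes
X = randn(d, n);
A = randn(d);  A = (A + A.')/2;   % symmetric, as in the question

% Loop reference.
zLoop = zeros(1, n);
for k = 1:n
    zLoop(k) = X(:,k)' * A * X(:,k);
end

% Vectorized: column sums of X .* (A*X); needs only a d-by-n temporary,
% not the n-by-n matrix that diag(X'*A*X) would build.
zSum = sum(X .* (A*X), 1);

maxDiff = max(abs(zLoop - zSum));
```

Passing the dimension explicitly, as in sum(..., 1), guards against the case where X has a single row, in which sum without a dimension argument would add along the row instead.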
Try this:
z = dot(X, A*X)
I don't have Matlab here to test, but it works on Octave, so I expect Matlab to have an analogous dot() function.
From Octave's help:
-- Function File: dot (X, Y, DIM)
Computes the dot product of two vectors. If X and Y are matrices,
calculate the dot-product along the first non-singleton dimension.
If the optional argument DIM is given, calculate the dot-product
along this dimension.
For completeness, gnovice's answer in Scilab would be
z = sum(X .* Y, 1)'