How can I speed up this simple matrix multiplication in Matlab? - matlab

I do not have much experience with Matlab, which is why I am asking this question here to get some directions to start looking.
I have this code:
A = A - A(:,i)*(A(i,:)/(delta + A(i,i)));
The A matrix is 216x31285, which means this computation is quite expensive for the number of times it is executed. It executes for all rows (216) for each data set (28) so naturally the 0.192 seconds it takes is quite a bit. Any ideas on how I can speed this up?

The easiest way to speed up code in MATLAB is to take advantage of vectorization to get rid of loops. Instead of looping each line or column, attempt to apply whole operations instead, e.g., to multiply all lines of a matrix M with appropriate vector V do:
M = magic(5)
V = rand(5,1)
M = M.*repmat(V,[size(M,1) 1])
will in general work much faster than the equivalent for loop.
The actual implementation of vectorization is specific to each problem but very useful operators are the elementwise operators, e.g.: .* ./ .^, etc. Also the repmat function is extremely useful.
In your case however you are applying a recursive operation to matrix A:
A = A - f(i) = A - (prevA - f(i-1)) = ...
which means that you cannot apply all iterations at once as you would normally do when vectorizing code.
In other words, at each iteration i, matrix A depends on matrix A at the previous iteration and therefore it is impossible to operate all iterations at once using the equation you provide.

An addendum to #Pedro's answer. If you have such a recursive equation you can usually trade space for time.
In the first step you update a matrix, say B (following Pedro's notation):
B = A - f(i)
in the second step you update A:
A = B - f(i)
in the third step, well that's the same as the first step. And so on.

Related

Efficient implementation of a sequence of matrix-vector products / specific "tensor"-matrix product

I have a special algorithm where as one of the lasts steps I need to carry out a multiplication of a 3-D array with a 2-D array such that each matrix-slice of the 3-D array is multiplied wich each column of the 2-D array. In other words, if, say A is an N x N x N matrix and B is an N x N matrix, I need to compute a matrix C of size N x N where C(:,i) = A(:,:,i)*B(:,i);.
The naive way to implement this is a loop, i.e.,
C = zeros(N,N);
for i = 1:N
C(:,i) = A(:,:,i)*B(:,i);
end
However, loops aren't the fastest in Matlab and should be avoided. I'm looking for faster ways of doing this. Right now, what I do is to use the fact that (now Mathjax would be great!):
[A1 b1, A2 b2, ..., AN bN] = [A1, A2, ..., AN]*blkdiag(b1,b2,...,bN)
This allows to get rid of the loop, however, we have to create a block-diagonal matrix of size N^2 x N. I'm making it via sparse to be efficient, i.e., like this:
A_long = reshape(A,N,N^2);
b_cell = mat2cell(B,N,ones(1,N)); % convert matrix to cell array of vectors
b_cell{1} = sparse(b_cell{1}); % make first element sparse, this is enough to trigger blkdiag into sparse mode
B_blk = blkdiag(b_cell{:});
C = A_long*B_blk;
According to my benchmarks, this approach is faster than the loop by a factor of around two (for large N), despite the necessary preparations (the multiplication alone is 3 to 4-fold faster than the loop).
Here is a quick benchmark I did, varying the problem size N and measuring the time for the loop and the alternative approach (with and without the preparation steps). For large N the speedup is around 2...2.5.
Still, this looks awfully complicated to me. Is there a simpler or better way to achieve this? This looks like it's a quite generic/standard problem so I could imagine that solutions are around, I just don't know what to search for really.
P.S.: blkdiag(A1,...,AN)*B is an obvious alternative but here the block diagonal is already N^2 x N^2 so I don't think it can be better than what I did.
edit: Thanks to everyone for commenting! I have carried out a new benchmark on a Matlab R2016b. Unfortunately, I do not have both versions on the same computer so we cannot compare the absolute numbers but the relative comparison is still interesting, since it has changed a bit. Here it is:
And here is a zoom on the high-N area:
Couple of observations:
SumRepDot is the solution proposed by Divakar, namely, to use squeeze(sum(bsxfun(#times,A,permute(B,[3,1,2])),2)) which on R2016b simplifies to squeeze(sum(A.*permute(B,[3,1,2]),2)). It is faster than the loop for high N by a factor of around 1.2...1.4.
The loop is still "slow" in a sense that the multiplication with the sparse block diagonal matrix is much faster.
For the latter, the preparation overhead seems to become negligible for high N which makes it overall a factor of 3...4 faster than the loop. This is a nice result.

Obtaining the constant that makes the integral equal to zero in Matlab

I'm trying to code a MATLAB program and I have arrived at a point where I need to do the following. I have this equation:
I must find the value of the constant "Xcp" (greater than zero), that is the value that makes the integral equal to zero.
In order to do so, I have coded a loop in which the the value of Xcp advances with small increments on each iteration and the integral is performed and checked if it's zero, if it reaches zero the loop finishes and the Xcp is stored with this value.
However, I think this is not an efficient way to do this task. The running time increases a lot, because this loop is long and has the to perform the integral and the integration limits substitution every time.
Is there a smarter way to do this in Matlab to obtain a better code efficiency?
P.S.: I have used conv() to multiply both polynomials. Since cl(x) and (x-Xcp) are both polynomials.
EDIT: Piece of code.
p = [1 -Xcp]; % polynomial (x-Xcp)
Xcp=0.001;
i=1;
found=false;
while(i<=x_te && found~=true) % Xcp is upper bounded by x_te
int_cl_p = polyint(conv(cl,p));
Cm_cp=(-1/c^2)*diff(polyval(int_cl_p,[x_le,x_te]));
if(Cm_cp==0)
found=true;
else
Xcp=Xcp+0.001;
end
end
This is the code I used to run this section. Another problem is that I have to do it for different cases (different cl functions), for this reason the code is even more slow.
As far as I understood, you need to solve the equation for X_CP.
I suggest using symbolic solver for this. This is not the most efficient way for large polynomials, but for polynomials of degree 20 it takes less than 1 second. I do not claim that this solution is fastest, but this provides generic solution to the problem. If your polynomial does not change every iteration, then you can use this generic solution many times and not spend time for calculating integral.
So, generic symbolic solution in terms of xLE and xTE is obtained using this:
syms xLE xTE c x xCP
a = 1:20;
%//arbitrary polynomial of degree 20
cl = sum(x.^a.*randi([-100,100],1,20));
tic
eqn = -1/c^2 * int(cl * (x-xCP), x, xLE, xTE) == 0;
xCP = solve(eqn,xCP);
pretty(xCP)
toc
Elapsed time is 0.550371 seconds.
You can further use matlabFunction for finding the numerical solutions:
xCP_numerical = matlabFunction(xCP);
%// we then just plug xLE = 10 and xTE = 20 values into function
answer = xCP_numerical(10,20)
answer =
19.8038
The slight modification of the code can allow you to use this for generic coefficients.
Hope that helps
If you multiply by -1/c^2, then you can rearrange as
and integrate however you fancy. Since c_l is a polynomial order N, if it's defined in MATLAB using the usual notation for polyval, where coefficients are stored in a vector a such that
then integration is straightforward:
MATLAB code might look something like this
int_cl_p = polyint(cl);
int_cl_x_p = polyint([cl 0]);
X_CP = diff(polyval(int_cl_x_p,[x_le,x_te]))/diff(polyval(int_cl_p,[x_le,x_te]));

How to calculate matrix entries efficently using Matlab

I have a cell array myBasis of sparse matricies B_1,...,B_n.
I want to evaluate with Matlab the matrix Q(i,j) = trace (B^T_i * B_j).
Therefore, I wrote the following code:
for i=1:n
for j=1:n
B=myBasis{i};
C=myBasis{j};
Q(i,j)=trace(B'*C);
end
end
Which takes already 68 seconds when n=1226 and B_i has 50 rows, and 50 colums.
Is there any chance to speed this up? Usually I exclude for-loops from my matlab code in a c++ file - but I have no experience how to handle a sparse cell array in C++.
As noted by Inox Q is symmetric and therefore you only need to explicitly compute half the entries.
Computing trace( B.'*C ) is equivalent to B(:).'*C(:):
trace(B.'*C) = sum_i [B.'*C]_ii = sum_i sum_j B_ij * C_ij
which is the sum of element-wise products and therefore equivalent to B(:).'*C(:).
When explicitly computing trace( B.'*C ) you are actually pre-computing all k-by-k entries of B.'*C only to use the diagonal later on. AFAIK, Matlab does not optimize its calculation to save it from computing all the entries.
Here's a way
for ii = 1:n
B = myBasis{ii};
for jj = ii:n
C = myBasis{jj};
t = full( B(:).'*C(:) ); % equivalent to trace(B'*C)!
Q(ii,jj) = t;
Q(jj,ii) = t;
end
end
PS,
It is best not to use i and j as variable names in Matlab.
PPS,
You should notice that ' operator in Matlab is not matrix transpose, but hermitian conjugate, for actual transpose you need to use .'. In most cases complex numbers are not involved and there is no difference between the two operators, but once complex data is introduced, confusing between the two operators makes debugging quite a mess...
Well, a couple of thoughts
1) Basic stuff: A'*B = (B'*A)' and trace(A) = trace(A'). Well, only this trick cut your calculations by almost 50%. Your Q(i,j) matrix is symmetric, and you only need to calculate n(n+1)/2 terms (and not n²)
2) To calculate the trace you don't need to calculate every term of B'*C, just the diagonal. Nevertheless, I don't know if it's easy to create a script in Matlab that is actually faster then just calculating B'*C (MatLab is pretty fast with matrix operations).
But I would definitely implement (1)

MATLAB: Convolution of Matrix Valued Function

I've written this code to perform the 1-d convolution of a 2-d matrix valued function (k is my time index, kend is on the order of 10e3). Is there a faster or cleaner way to do this, perhaps using built in functions?
for k=1:kend
C(:,:,k)=zeros(3);
for l=0:k-1
C(:,:,k)=C(:,:,k)+A(:,:,k-l)*B(:,:,l+1);
end
end
NEW SOLUTION:
This is a newer solution built on the older solution, which solved the previously given formula. The code in the question is actually a modification of that formula, in which the overlap between the two matrices in the third dimension is repeatedly shifted (it's akin to a convolution along the third dimension of the data). The previous solution I gave only computed the result for the last iteration of the code in the question (i.e. k = kend). So, here's a full solution that should be much more efficient than the code in the question for kend on the order of 1000:
kend = size(A,3); %# Get the value for kend
C = zeros(3,3,kend); %# Preallocate the output
Anew = reshape(flipdim(A,3),3,[]); %# Reshape A into a 3-by-3*kend matrix
Bnew = reshape(permute(B,[1 3 2]),[],3); %# Reshape B into a 3*kend-by-3 matrix
for k = 1:kend
C(:,:,k) = Anew(:,3*(kend-k)+1:end)*Bnew(1:3*k,:); %# Index Anew and Bnew so
end %# they overlap in steps
%# of three
Even when using just kend = 100, this solution came out to be about 30 times faster for me than the one in the question and about 4 times faster than a pure for-loop-based solution (which would involve 5 loops!). Note that the discussion below of floating-point accuracy still applies, so it is normal and expected that you will see slight differences between the solutions on the order of the relative floating-point accuracy.
OLD SOLUTION:
Based on this formula you linked to in a comment:
it appears that you actually want to do something different than the code you provided in the question. Assuming A and B are 3-by-3-by-k matrices, the result C should be a 3-by-3 matrix and the formula from your link written out as a set of nested for loops would look like this:
%# Solution #1: for loops
k = size(A,3);
C = zeros(3);
for i = 1:3
for j = 1:3
for r = 1:3
for l = 0:k-1
C(i,j) = C(i,j) + A(i,r,k-l)*B(r,j,l+1);
end
end
end
end
Now, it is possible to perform this operation without any for loops by reshaping and reorganizing A and B appropriately:
%# Solution #2: matrix multiply
Anew = reshape(flipdim(A,3),3,[]); %# Create a 3-by-3*k matrix
Bnew = reshape(permute(B,[1 3 2]),[],3); %# Create a 3*k-by-3 matrix
C = Anew*Bnew; %# Perform a single matrix multiply
You could even rework the code you have in your question to create a solution with a single loop that performs a matrix multiply of your 3-by-3 submatrices:
%# Solution #3: mixed (loop and matrix multiplication)
k = size(A,3);
C = zeros(3);
for l = 0:k-1
C = C + A(:,:,k-l)*B(:,:,l+1);
end
So now the question: Which one of these approaches is faster/cleaner?
Well, "cleaner" is very subjective, and I honestly couldn't tell you which of the above pieces of code makes it any easier to understand what the operation is doing. All the loops and variables in the first solution make it a little hard to track what's going on, but it clearly mirrors the formula. The second solution breaks it all down into a simple matrix operation, but it's difficult to see how it relates to the original formula. The third solution seems like a middle-ground between the two.
So, let's make speed the tie-breaker. If I time the above solutions for a number of values of k, I get these results (in seconds needed to perform 10,000 iterations of the given solution, MATLAB R2010b):
k | loop | matrix multiply | mixed
-----+--------+-----------------+--------
5 | 0.0915 | 0.3242 | 0.1657
10 | 0.1094 | 0.3093 | 0.2981
20 | 0.1674 | 0.3301 | 0.5838
50 | 0.3181 | 0.3737 | 1.3585
100 | 0.5800 | 0.4131 | 2.7311 * The matrix multiply is now fastest
200 | 1.2859 | 0.5538 | 5.9280
Well, it turns out that for smaller values of k (around 50 or less) the for-loop solution actually wins out, showing once again that for loops are not as "evil" as they used to be considered in older versions of MATLAB. Under certain circumstances, they can be more efficient than a clever vectorization. However, when the value of k is larger than around 100, the vectorized matrix-multiply solution starts to win out, scaling much more nicely with increasing k than the for-loop solution does. The mixed for-loop/matrix-multiply solution scales atrociously for reasons that I'm not exactly sure of.
So, if you expect k to be large, I'd go with the vectorized matrix-multiply solution. One thing to keep in mind is that the results you get from each solution (the matrix C) will differ ever so slightly (on the level of the floating-point precision) since the order of additions and multiplications performed for each solution are different, thus leading to a difference in accumulation of rounding errors. In short, the difference between the results for these solutions should be negligible, but you should be aware of it.
Have you looked into Matlab's conv method?
I can't compare it against your provided code, because what you provided gives me a problem with trying to access the zeroth element of A. (When k=1, k-1=0.)
Have you considered using FFTs to convolve? A convolution operation is simply a point-wise multiplication in the frequency domain. You'll have to take some precaution with finite sequences, as you'll end up with circular convolution if you're not careful (but this is trivial to take care of).
Here's a simple example for a 1D case.
>> a=rand(4,1);
>> b=rand(3,1);
>> c=conv(a,b)
c =
0.1167
0.3133
0.4024
0.5023
0.6454
0.3511
The same using FFTs
>> A=fft(a,6);
>> B=fft(b,6);
>> C=real(ifft(A.*B))
C =
0.1167
0.3133
0.4024
0.5023
0.6454
0.3511
A convolution of an M point vector and an N point vector results in an M+N-1 point vector. So, I've padded each of the vectors a and b with zeros before taking the FFT (this is automatically taken care of when I take the 4+3-1=6 point FFT of it).
EDIT
Although the equation that you showed is similar to a circular convolution, it's not exactly it. So you can ditch the FFT approach, and the built-in conv* functions. To answer your question, here's the same operation done without explicit loops:
dim1=3;dim2=dim1;
dim3=10;
a=rand(dim1,dim2,dim3);
b=rand(dim1,dim2,dim3);
mIndx=cellfun(#(x)(1:x),num2cell(1:dim3),'UniformOutput',false);
fun=#(x)sum(reshape(cell2mat(cellfun(#(y,z)a(:,:,y)*b(:,:,z),num2cell(x),num2cell(fliplr(x)),'UniformOutput',false)),[dim1,dim2,max(x)]),3);
c=reshape(cell2mat(cellfun(#(x)fun(x),mIndx,'UniformOutput',false)),[dim1,dim2,dim3]);
mIndx here is a cell, where the ith cell contains a vector 1:i. This is your l index (as others have noted, please don't use l as a variable name).
The next line is an anonymous function that does the convolution operation, making use of the fact that the k index is just the l index flipped around. The operations are carried out on individual cells, and then assembled.
The last line actually performs the operations on the matrices.
The answer is the same as that obtained with the loops. However, you'll find that the looped solution is actually an order of magnitude faster (I averaged 0.007s for my code and 0.0006s for the loop). This is because the loop is pretty straightforward, whereas with this sort of nested construction, there's plenty of function call overheads and repeated reshaping that slow it down.
MATLAB's loops have come a long way since the early days when loops were dreaded. Certainly, vectorized operations are blazing fast; but not everything can be vectorized, and sometimes, loops are more efficient than such convoluted anonymous functions. I could probably shave off a few more tenths here and there by optimizing my construction (or maybe taking a different approach), but I'm not going to do that.
Remember that good code should be readable, as well as efficient and minor optimization at the cost of readability serves no one. Although I wrote the code above, I certainly will not be able to decipher what it does if I revisited it a month later. Your looped code was clear, readable and fast and I would suggest that you stick with it.

Faster projected-norm (quadratic-form, metric-matrix...) style computations

I need to perform lots of evaluations of the form
X(:,i)' * A * X(:,i) i = 1...n
where X(:,i) is a vector and A is a symmetric matrix. Ostensibly, I can either do this in a loop
for i=1:n
z(i) = X(:,i)' * A * X(:,i)
end
which is slow, or vectorise it as
z = diag(X' * A * X)
which wastes RAM unacceptably when X has a lot of columns. Currently I am compromising on
Y = A * X
for i=1:n
z(i) = Y(:,i)' * X(:,i)
end
which is a little faster/lighter but still seems unsatisfactory.
I was hoping there might be some matlab/scilab idiom or trick to achieve this result more efficiently?
Try this in MATLAB:
z = sum(X.*(A*X));
This gives results equivalent to Federico's suggestion using the function DOT, but should run slightly faster. This is because the DOT function internally computes the result the same way as I did above using the SUM function. However, DOT also has additional input argument checks and extra computation for cases where you are dealing with complex numbers, which is extra overhead you probably don't want or need.
A note on computational efficiency:
Even though the time difference is small between how fast the two methods run, if you are going to be performing the operation many times over it's going to start to add up. To test the relative speeds, I created two 100-by-100 matrices of random values and timed the two methods over many runs to get an average execution time:
METHOD AVERAGE EXECUTION TIME
--------------------------------------------
Z = sum(X.*Y); 0.0002595 sec
Z = dot(X,Y); 0.0003627 sec
Using SUM instead of DOT therefore reduces the execution time of this operation by about 28% for matrices with around 10,000 elements. The larger the matrices, the more negligible this difference will be between the two methods.
To summarize, if this computation represents a significant bottleneck in how fast your code is running, I'd go with the solution using SUM. Otherwise, either solution should be fine.
Try this:
z = dot(X, A*X)
I don't have Matlab here to test, but it works on Octave, so I expect Matlab to have an analogous dot() function.
From Octave's help:
-- Function File: dot (X, Y, DIM)
Computes the dot product of two vectors. If X and Y are matrices,
calculate the dot-product along the first non-singleton dimension.
If the optional argument DIM is given, calculate the dot-product
along this dimension.
For completeness, gnovice's answer in Scilab would be
z = sum(X .* Y, 1)'