I would like to multiply each sub-block of a matrix A (m-by-n) by a matrix B (p-by-q). For example, A can be divided into k sub-blocks, each of size m-by-p (so n = k*p):
A = [A_1 A_2 ... A_k]
The resulting matrix will be C = [A_1*B A_2*B ... A_k*B] and I would like to do it efficiently.
What I have tried so far is:
C = A*kron(eye(k),B)
Edit: Daniel, I think you are right. I tried three different ways; computing a Kronecker product seems to be a bad idea. Even the solution with reshape runs faster than the more compact kron solution.
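For reference, here is a minimal setup (dimensions assumed purely for illustration) under which the three timed variants below can be run:
% Assumed example dimensions; note n = k*p
m = 500; p = 50; q = 40; k = 20;
A = rand(m, k*p);
B = rand(p, q);
C1 = zeros(m, k*q); % pre-allocate for the loop variant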
% Variant 1: loop over the k blocks
tic
for i=1:k
    C1(:,(i-1)*q+1:i*q) = A(:,(i-1)*p+1:i*p)*B;
end
toc
% Variant 2: single multiply with a block-diagonal Kronecker matrix
tic
C2 = A*kron(eye(k),B);
toc
% Variant 3: stack the blocks vertically, multiply once, then restack
tic
Ar = reshape(permute(reshape(A,m,p,[]),[1 3 2]),m*k,[]); % [A_1; A_2; ...; A_k]
C3 = Ar*B;
C3 = reshape(permute(reshape(C3,m,k,[]),[1 3 2]),m,[]); % back to [A_1*B ... A_k*B]
toc
When I look at your matrix multiplication code, the code inside the loop is already optimal: you can't beat a built-in matrix multiplication. All you could cut down is the per-iteration overhead, and compared to the long runtime of a matrix multiplication that overhead has practically no influence.
What you attempted would be the right strategy when the operation inside the loop is trivial but the loop is iterated many times. If you take the following parameters, you will notice that your permute solution actually has its strength, just not at your problem's dimensions:
q = 1; p = 1; n = 1; m = 1;
k = 10^6;
Kron fails completely here. Your permute solution takes 0.006 s, while the loop takes 1.512 s.
Related
I am calculating the solution of a constrained linear least-squares problem as follows:
lb = zeros(7,1);
ub = ones(7,1);
for i = 1:size(b,2)
x(:,i) = lsqlin(C,b(:,i),[],[],[],[],lb,ub);
end
where C is m-by-7 and b is m-by-n. n is quite large, leading to a slow computation. Is there any way to speed up this procedure and get rid of the slow for loop? I am using lsqlin instead of pinv or \ because I need to constrain my solution to the range 0–1 (lb and ub).
The for loop is not necessarily the reason for any slowness – you're not pre-allocating x, and lsqlin is probably printing a lot of output on each iteration. However, you may be able to speed this up by turning your C matrix into a sparse block-diagonal matrix, C2, with n identical blocks (see here). This solves all n problems in one go. If the new C2 were not sparse, you would use a lot more memory and the computation might take much longer than with the for loop.
n = size(b,2);
C2 = kron(speye(n),C);
b2 = b(:);
lb2 = repmat(lb,n,1); % or zeros(7*n,1);
ub2 = repmat(ub,n,1); % or ones(7*n,1);
opts = optimoptions(@lsqlin,'Algorithm','interior-point','Display','off');
x = lsqlin(C2,b2,[],[],[],[],lb2,ub2,[],opts);
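Note that the stacked solve returns x as a 7n-by-1 vector; to recover the 7-by-n layout produced by the original loop, reshape it afterwards:
x = reshape(x, 7, n);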
Using optimoptions, I've specified the algorithm and set 'Display' to 'off' to make sure any outputs and warnings don't slow down the calculations.
On my machine this is 6–10 times faster than using a for loop (with proper pre-allocation and the options set). This approach assumes that the sparse C2 matrix, with its m*n*7 nonzero elements, can fit in memory. If not, a for-loop-based approach is the only option (other than writing your own specialized version of lsqlin or taking advantage of any other sparseness in the problem).
Does MATLAB do a full matrix multiplication when a matrix multiplication is given as an argument to the trace function?
For example, in the code below, does A*B actually happen, or are the columns of B dotted with the rows of A, then summed? Or does something else happen?
A = [2,2;2,2];
B = eye(2);
f = trace(A*B);
Yes, MATLAB calculates the product, but you can avoid it!
First, let's see what MATLAB does if you do f = trace(A*B):
I think the memory trace from my Performance Monitor says it all, really. The first bump is when I created a large A = 2*ones(n), the second, very small bump is the creation of B = eye(n), and the last bump is where f = trace(A*B) is calculated.
Now, let's see what you get if you do it manually:
If you do it manually, you can save a lot of memory, and it's much faster.
tic
n = 6e3;
A = rand(n);
B = rand(n);
f = trace(A*B);
toc
pause(10)
tic
C(n) = 0; % pre-allocate the vector of diagonal entries
for ii = 1:n
    C(ii) = A(ii,:)*B(:,ii); % ii-th diagonal entry of A*B
end
g = sum(C);
toc
abs(f-g) < 1e-10
Elapsed time is 11.982804 seconds.
Elapsed time is 0.540285 seconds.
ans =
1
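As an aside (not in the original answer, just a sketch of the same identity): since trace(A*B) = sum_ij A(i,j)*B(j,i), the loop can be vectorized away entirely:
f2 = sum(sum(A.*B.')); % elementwise product with the transpose, then a full sum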
Now, as you asked about in the comments: "Is this still true if you use it in a function where optimization can kick in?"
This depends on what you mean here, but as a quick example:
Calculating x = inv(A)*b can be done in a few different ways. If you do:
x = A\b;
MATLAB will choose an algorithm that's best suited to your particular matrix/vector. There are many different alternatives here, depending on the structure of the matrix: is it triangular, Hermitian, sparse...? Often it's an upper/lower triangular (LU) factorization. I can pretty much guarantee that you can't write MATLAB code that outperforms the built-in functions here.
However, if you calculate the same thing this way:
x = inv(A)*b;
MATLAB will actually calculate the inverse of A and then multiply it by b, even though the inverse is not stored in the workspace afterwards. This is much slower, and it can also be inaccurate. (In the A\b approach, MATLAB will, if necessary, apply a permutation to ensure numerical stability.)
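A quick sketch to see the difference yourself (test matrix and sizes assumed purely for illustration):
n = 3000;
A = rand(n) + n*eye(n); % diagonally dominant, hence well-conditioned (assumed example)
b = rand(n,1);
tic; x1 = A\b; toc % factorization-based solve
tic; x2 = inv(A)*b; toc % explicit inverse: noticeably slower
norm(x1 - x2) % small here, but backslash is the more accurate route in general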
I was wondering if someone could help me with my problem.
Let's say I have the coordinates of M*N vectors in a tensor r of dimensions [M,N,3]. I would like to save in a 3M-by-3N block matrix all dyadic products r_0'*r_0, where r_0 is the vector r_0 = r(m,n,:) for some m and n, and I would like to do this without using for loops.
In case I haven't explained myself correctly, here is example code that shows what I would like to obtain (using for loops, of course):
N=10;
M=5;
r=rand(M,N,3);
Dyadic=zeros(3*M,3*N);
for m=1:M
a1=3*m-2;
a2=3*m;
for n=1:N
b1=3*n-2;
b2=3*n;
aux(3)=r(m,n,3);
aux(2)=r(m,n,2);
aux(1)=r(m,n,1);
Dyadic(a1:a2,b1:b2)=transpose(aux)*aux;
end
end
Thanks in advance!
You can use bsxfun(@times,...) and then re-arrange the elements to get the desired output -
%// Get the multiplication result
mat_mult = bsxfun(@times,permute(r,[1 2 4 3]),r);
%// OR, if you would like to keep mat_mult 3D, which could potentially be faster -
%// mat_mult = bsxfun(@times,reshape(r,[],3),permute(reshape(r,[],3),[1 3 2]));
%// Re-arrange elements to have them the way you are indexing in the nested loops
Dyadic = reshape(permute(reshape(mat_mult,M,N,3,[]),[3 1 4 2]),M*3,N*3);
The main effort in this solution is really the re-arrangement of elements after we have the multiplication result.
Quick runtime tests with the input r as 1000 x 1000 x 3 sized array, show that this bsxfun based approach gives over 20x speedup over the nested loop code listed in the question!
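On R2016b or newer, implicit expansion lets you drop bsxfun entirely; a sketch of the same computation:
mat_mult = permute(r,[1 2 4 3]) .* r; % implicit expansion in place of bsxfun(@times,...)
Dyadic = reshape(permute(reshape(mat_mult,M,N,3,[]),[3 1 4 2]),M*3,N*3);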
What's the best way to do the following (in MATLAB) if I have two matrices A and B, let's say both of size m-by-n:
C = zeros(m,m);
for t=1:n
C=C+A(:,t)*B(:,t)';
end
This is nothing more than
C = A*B';
where A and B are each m-by-n. I'm not sure that you're going to get more efficient than that unless the matrices have special properties.
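A quick numerical check (a sketch with small assumed dimensions) that the loop and the one-liner agree up to round-off:
m = 4; n = 6;
A = rand(m,n); B = rand(m,n);
C = zeros(m,m);
for t = 1:n
    C = C + A(:,t)*B(:,t)'; % sum of outer products
end
norm(C - A*B','fro') % on the order of 1e-15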
One place where you might get a benefit from using bsxfun for matrix multiplication is when the dimensions are sufficiently large (probably 100-by-100 or more) and one matrix is diagonal, e.g.:
A = rand(1e2);
B = diag(rand(1,1e2));
C = bsxfun(@times,A,diag(B).');
This occurs in many matrix transforms – see the code for sqrtm, for example (type edit sqrtm to view it).
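On R2016b or newer, the same column scaling can be written with implicit expansion instead of bsxfun (a sketch):
C = A .* diag(B).'; % scale column j of A by the j-th diagonal entry of B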
Let's say I have a big matrix M with a lot of zeros, so of course I make it sparse in order to save memory and CPU. After that I do some stuff, and at some point I want the nonzero elements. My code looks something like this:
ind = M ~= 0; % where M is the sparse matrix
This looks rather silly to me, however, since the structure of the sparse matrix should allow the direct extraction of this information.
To clarify: I am not looking for a solution that merely works; rather, I would like to avoid doing the same thing twice. A sparse matrix by definition already knows its nonzero values, so there should be no need to search for them.
The direct way to retrieve the nonzero elements of a sparse matrix is to call nonzeros().
The direct call is obviously the fastest method; still, I benchmarked it against logical indexing on the sparse matrix and on its full() counterpart, and indexing the sparse matrix is the faster of the two (results depend on the sparsity pattern and the dimensions of the matrix).
The sum of times over 100 iterations is:
nonzeros: 0.02657 seconds
sparse idx: 0.52946 seconds
full idx: 2.27051 seconds
The testing suite:
N = 100;
t = zeros(N,3);
for ii = 1:N
s = sprand(10000,1000,0.01);
r = full(s);
% Direct call nonzeros
tic
nonzeros(s);
t(ii,1) = toc;
% Indexing sparse
tic
full(s(s ~= 0));
t(ii,2) = toc;
% Indexing full
tic
r(r~=0);
t(ii,3) = toc;
end
sum(t)
I'm not 100% sure what you're after, but maybe [r, c] = find(M) suits you better?
Note that M(r,c) would extract a whole submatrix rather than the individual values; to get the values at those positions, use v = M(sub2ind(size(M),r,c)), or the three-output form of find shown in the next answer. The best method will surely be dictated by what you intend to do with the data next.
The find function is recommended by the MATLAB documentation:
[row,col] = find(X, ...) returns the row and column indices of the nonzero entries in the matrix X. This syntax is especially useful when working with sparse matrices.
While find has been proposed before, I think this is an important addition:
[r,c,v] = find(M);
This gives you not only the indices r,c, but also the nonzero values v. Using the nonzeros command seems to be a bit faster, but find is in general very useful when dealing with sparse matrices, because the [r,c,v] vectors describe the complete matrix (apart from its dimensions).
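As a quick illustration of that last point (a sketch), the triple plus the matrix dimensions lets you rebuild the matrix exactly:
[r,c,v] = find(M);
M2 = sparse(r,c,v,size(M,1),size(M,2)); % reassemble from the triple
isequal(M,M2) % true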