Matlab: element-wise 3D matrix multiplication

I have two matrices: B of size 9x100x51 and K of size 34x9x100. I want to multiply each vector K(:,i,j) (34 elements) with each scalar B(i,j,b), so as to obtain a final matrix G of size 34x9x100x51.
For example, the element G(:,5,60,25) is composed as follows:
G(:,5,60,25) = K(:,5,60) * B(5,60,25)
I hope that the example helps to understand what I want to do.
Thank you
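To pin down the indexing, here is a rough pure-Python sketch of the requested operation, with tiny stand-in sizes (3x4x2 and 5x3x4 instead of 9x100x51 and 34x9x100). The names NI, NJ, NB, NK are illustrative only, not from the question:

```python
# Rough pure-Python analogue of the requested operation, with tiny
# stand-in sizes (B: 3x4x2, K: 5x3x4) instead of 9x100x51 / 34x9x100.
import random

NI, NJ, NB, NK = 3, 4, 2, 5  # stand-ins for 9, 100, 51, 34

B = [[[random.random() for _ in range(NB)] for _ in range(NJ)] for _ in range(NI)]
K = [[[random.random() for _ in range(NJ)] for _ in range(NI)] for _ in range(NK)]

# G(k,i,j,b) = K(k,i,j) * B(i,j,b), matching G(:,5,60,25) = K(:,5,60)*B(5,60,25)
G = [[[[K[k][i][j] * B[i][j][b] for b in range(NB)]
       for j in range(NJ)]
      for i in range(NI)]
     for k in range(NK)]
```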

Any time you find yourself writing nested loops in matlab, there's a good chance you can speed up quite a bit using the built-in vectorized forms of the functions. The code ends up being quite a bit shorter typically too (but often less immediately clear to a reader, so comment your code!).
In this case, does avoiding the nested loops make a difference? Absolutely! Let's get to work. @slayton has provided a 3-loop solution. We can get faster.
Restating the problem a bit, B has 51 9x100 matrices and K has 34 9x100 matrices. For each combination of 51x34, you want to element-wise multiply the respective 9x100 matrices from B and K.
Element-wise multiplication is a great job for bsxfun, so we can conceptually reduce this problem to working along two dimensions (the third dimension of B, first dimension of K):
Initial, two-loop solution:
B = rand(9,100,51);
K = rand(34,9,100);
G = nan(34,9,100,51);
for b = 1:size(B,3)
    for k = 1:size(K,1)
        G(k,:,:,b) = bsxfun(@times, B(:,:,b), squeeze(K(k,:,:)));
    end
end
Ok, two loops is making progress. Can we do better? Well, let's recognize that the matrices B and K can be replicated along the appropriate dimensions, then element-wise multiplied all at once.
B = rand(9,100,51);
K = rand(34,9,100);
B2 = repmat(permute(B,[4 1 2 3]), [size(K,1) 1 1 1]);
K2 = repmat(K, [1 1 1 size(B,3)]);
G = bsxfun(@times, B2, K2);
So, how do the solutions compare speed-wise? I tested them on the Octave online utility, and didn't include the time to generate the initial B and K matrices. I did include the time to preallocate the G matrix for the solutions that needed preallocation. The code is below.
3 loops (@slayton's answer): 4.024471 s
2 loop solution: 1.616120 s
0-loop repmat/bsxfun solution: 1.211850 s
0-loop repmat/bsxfun solution, no temporaries: 0.605838 s
Caveat: the timing may depend quite a bit on your machine; I wouldn't trust the online utility for rigorous timing tests. Changing the order in which the solutions were run (even taking care not to reuse variables and skew allocation time) did change things a bit: the 2-loop solution was sometimes as fast as the no-loop solution with stored temporaries. Still, the more vectorized you can get, the better off you will be.
Here's the code for the speed test:
B = rand(9,100,51);
K = rand(34,9,100);
tic
G1 = nan(34,9,100,51);
for ii = 1:size(B,1)
    for jj = 1:size(B,2)
        for kk = 1:size(B,3)
            G1(:, ii, jj, kk) = K(:,ii,jj) .* B(ii,jj,kk);
        end
    end
end
t=toc;
printf('Time for 3 loop solution: %f\n' ,t)
tic
G2 = nan(34,9,100,51);
for b = 1:size(B,3)
    for k = 1:size(K,1)
        G2(k,:,:,b) = bsxfun(@times, B(:,:,b), squeeze(K(k,:,:)));
    end
end
t=toc;
printf('Time for 2 loop solution: %f\n' ,t)
tic
B2 = repmat(permute(B,[4 1 2 3]), [size(K,1) 1 1 1]);
K2 = repmat(K, [1 1 1 size(B,3)]);
G3 = bsxfun(@times, B2, K2);
t=toc;
printf('Time for 0-loop repmat/bsxfun solution: %f\n' ,t)
tic
G4 = bsxfun(@times, repmat(permute(B,[4 1 2 3]), [size(K,1) 1 1 1]), repmat(K, [1 1 1 size(B,3)]));
t=toc;
printf('Time for 0-loop repmat/bsxfun solution, no temporaries: %f\n' ,t)
disp('Are the results equal?')
isequal(G1,G2)
isequal(G1,G3)
Time for 3 loop solution: 4.024471
Time for 2 loop solution: 1.616120
Time for 0-loop repmat/bsxfun solution: 1.211850
Time for 0-loop repmat/bsxfun solution, no temporaries: 0.605838
Are the results equal?
ans = 1
ans = 1

You can do this with nested loops, although it probably won't be terribly fast:
B = rand(9,100,51);
K = rand(34,9,100);
G = nan(34,9,100,51);
for ii = 1:size(B,1)
    for jj = 1:size(B,2)
        for kk = 1:size(B,3)
            G(:, ii, jj, kk) = K(:,ii,jj) .* B(ii,jj,kk);
        end
    end
end
It's been a long day and my brain is a bit fried; kudos to anyone who can improve this!

Contracting tensor in Matlab

I am looking for a way to contract two indices of a tensor in Matlab.
Say I have a tensor of dimension [17,10,17,12] I am looking for a function that sums over the first and third dimension with the same index and leaves a matrix of dimension [10,12] (analogous to a trace in two dimensions).
I am currently studying tensor networks and I mainly use the functions "permute" and "reshape". If one is contracting multiple tensors and is not careful from the beginning, one might end up with indices one wants to contract in one tensor of the form [i,j,i,k].
Of course one can go back and contract the tensors in a way such that this does not happen, but I'd nonetheless be interested in a more robust solution.
EDIT:
Something to the effect of:
A = rand(17,10,17,12);
A_contracted = zeros(10,12);
for i = 1:10
    for j = 1:12
        for k = 1:17
            A_contracted(i,j) = A_contracted(i,j) + A(k,i,k,j);
        end
    end
end
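For readers following along outside MATLAB, the same partial-trace loop written as a pure-Python sketch, with the sizes shrunk to 3x2x3x2 so the example is self-contained (purely illustrative):

```python
# Partial trace over dimensions 1 and 3: C[i][j] = sum_k A[k][i][k][j].
# Sizes shrunk to 3 x 2 x 3 x 2 from the 17 x 10 x 17 x 12 in the question.
import random

K, I, J = 3, 2, 2
A = [[[[random.random() for _ in range(J)] for _ in range(K)]
      for _ in range(I)] for _ in range(K)]

A_contracted = [[0.0] * J for _ in range(I)]
for i in range(I):
    for j in range(J):
        for k in range(K):
            A_contracted[i][j] += A[k][i][k][j]
```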
Here's a way to do it:
A_contracted = permute(sum( ...
A.*((1:size(A,1)).'==reshape(1:size(A,3), 1, 1, [])), [1 3]), [2 4 1 3]);
The above uses implicit expansion and the possibility to operate along multiple dimensions at once in sum, which are recent Matlab features. For older Matlab versions,
A_contracted = permute(sum(sum( ...
    A.*bsxfun(@eq, (1:size(A,1)).', reshape(1:size(A,3), 1, 1, [])),1),3), [2 4 1 3]);
[I feel like I'm starting to sound like a broken record...]
You should always implement your code as a loop first, then try to optimize using permute and reshape. But note that permute needs to copy data, so tends to increase the amount of work, rather than decrease it. Recent versions of MATLAB are no longer slow with loops, and thus copying data is no longer always a useful hack to speed up things.
For example, the loop in the question can be simplified to:
A_contracted = zeros(size(A,2),size(A,4));
for k = 1:size(A,1)
    A_contracted = A_contracted + squeeze(A(k,:,k,:));
end
(I've also generalized to arbitrary sizes).
Comparing with Luis' answer, I see the vectorized method winning for small arrays such as the one in the OP (17x10x17x12) with 0.09 ms vs 0.19 ms. But with very small times all around it is likely not worth the effort. However, for larger arrays (I tried 17x100x17x120) I see the loop method winning 1.3 ms vs 2.6 ms.
The more data, the bigger the advantage to using just plain old loops. With 170x100x170x120 it is 0.04 s vs 0.45 s.
Test code:
A = rand(17,100,17,120);
assert(all(method2(A)==method1(A),'all'))
timeit(#()method1(A))
timeit(#()method2(A))
function A_contracted = method1(A)
A_contracted = permute(sum( ...
    A.*((1:size(A,1)).'==reshape(1:size(A,3), 1, 1, [])), [1 3]), [2 4 1 3]);
end
function A_contracted = method2(A)
A_contracted = zeros(size(A,2),size(A,4));
for k = 1:size(A,1)
    A_contracted = A_contracted + squeeze(A(k,:,k,:));
end
end
My professor suggested another solution (in the following denoted by method3) involving reshape and matrix multiplication.
take a unit matrix of the size of the contracted index
reshape it into a vector
reshape the tensor you want to contract accordingly
multiply the vector and the tensor
reshape the contracted tensor
Sample code comparing to Luis' answer (method1) and Cris' answer (method2):
A = rand(17,10,17,10);
timeit(#()method1(A))
timeit(#()method2(A))
timeit(#()method3(A))
function A_contracted = method1(A)
A_contracted = permute(sum( ...
A.*((1:size(A,1)).'==reshape(1:size(A,3), 1, 1, [])), [1 3]), [2 4 1 3]);
end
function A_contracted = method2(A)
A_contracted = zeros(size(A,2),size(A,4));
for k = 1:size(A,1)
    A_contracted = A_contracted + squeeze(A(k,:,k,:));
end
end
function A_contracted = method3(A)
sa_1 = size(A,1);
Unity = eye(sa_1);
Unity = reshape(Unity, [1, sa_1*sa_1]);
A1 = permute(A, [1,3,2,4]);
A2 = reshape(A1, [sa_1*sa_1, size(A1,3)*size(A1,4)]);
UnA = Unity*A2;
A_contracted = reshape(UnA, [size(A1,3), size(A1,4)]);
end
method3 dominates both method1 and method2 by an order of magnitude for small dimensions, and beats method1 for larger dimensions as well; however, for larger dimensions it is itself beaten by the plain for loop (method2) by an order of magnitude.
method3 has the (somewhat personal) advantage of being more intuitive for the application in my physics course in the sense that a contraction is not really in the tensor itself, but with respect to a metric. method3 may be easily adapted to incorporate this feature.
Pretty easy
squeeze(sum(sum(a,3),1))
sum(a,n) sums over the nth dimension of the array, and squeeze removes any singleton dimensions.

Fastest way of finding the only index of vector b where array A(i,j) == b

I have 2 big arrays A and b:
A: 10,000+ rows, 4 columns, non-unique integers
b: vector with 500,000+ elements, unique integers
Due to the uniqueness of the values of b, I need to find the single index of b where b == A(i,j).
What I started with is
[rows, columns] = size(A);
B = zeros(rows, columns);
for i = 1:rows
    for j = 1:columns
        B(i,j) = find(A(i,j) == b, 1);
    end
end
This takes approx. 5.5 seconds to compute, which is way too long, since A and b can be significantly bigger. With that in mind, I tried to speed up the code by using logical indexing and reducing the for-loops:
[rows, columns] = size(A);
B = zeros(rows, columns);
for idx = 1:numel(b)
    B(A == b(idx)) = idx;
end
Sadly this takes even longer: 21 seconds
I even tried to use bsxfun:
for i = 1:columns
    [I, J] = find(bsxfun(@eq, A(:,i), b));
    % ... stitch B together ...
end
but with bigger arrays the maximum array size is quickly exceeded (102.9 GB...).
Can you help me find a faster solution to this? Thanks in advance!
EDIT: I extended the find call to find(A(i,j)==b,1), which speeds up the algorithm by a factor of 2! Thank you, but overall it is still too slow... ;)
The function ismember is the right tool for this:
[~,B] = ismember(A,b);
Test code:
function so
A = rand(1000,4);
b = unique([A(:);rand(2000,1)]);
B1 = op1(A,b);
B2 = op2(A,b);
isequal(B1,B2)
tic;op1(A,b);op1(A,b);op1(A,b);op1(A,b);toc
tic;op2(A,b);op2(A,b);op2(A,b);op2(A,b);toc
end
function B = op1(A,b)
B = zeros(size(A));
for i = 1:numel(A)
    B(i) = find(A(i)==b,1);
end
end
function B = op2(A,b)
[~,B] = ismember(A,b);
end
I ran this on Octave, which is not as fast with loops as MATLAB. It also doesn't have the timeit function, hence the crappy timing using tic/toc (sorry for that). In Octave, op2 is more than 100 times faster than op1. Timings will be different in MATLAB, but ismember should still be the fastest option. (Note I also replaced your double loop with a single loop, this is the same but simpler and probably faster.)
If you want to repeatedly do the search in b, it is worthwhile to sort b first, and implement your own binary search. This will avoid the checks and sorting that ismember does. See this other question.
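As a sketch of that sorted-b idea (in Python, using the standard bisect module; index_in_sorted is a hypothetical helper name, not from any of the answers). Note it returns indices into the sorted copy, so keep the sort permutation around if the original order of b matters:

```python
import bisect

def index_in_sorted(b_sorted, value):
    """Return the 1-based index of value in the sorted list b_sorted,
    or 0 if it is absent (mirroring MATLAB's ismember convention)."""
    pos = bisect.bisect_left(b_sorted, value)
    if pos < len(b_sorted) and b_sorted[pos] == value:
        return pos + 1  # MATLAB-style 1-based index
    return 0

b_sorted = [2, 5, 7, 11, 13]
print(index_in_sorted(b_sorted, 11))  # -> 4
print(index_in_sorted(b_sorted, 6))   # -> 0
```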
Assuming that you have positive integers you can use array indexing:
mm = max(max(A(:)),max(b(:)));
idxs = sparse(b,1,1:numel(b),mm,1);
result = full(idxs(A));
If the range of values is small you can use dense matrix instead of sparse matrix:
mm = max(max(A(:)),max(b(:)));
idx = zeros(mm,1);
idx(b)=1:numel(b);
result = idx(A);

Need help in using bsxfun

I have two arrays in MATLAB:
A; % size(A) = [NX NY NZ 3 3]
b; % size(b) = [NX NY NZ 3 1]
In fact, at each point (i, j, k) of the three-dimensional domain I have two arrays, obtained from the above-mentioned arrays A and b, with sizes [3 3] and [3 1], respectively. For the sake of example, let's call these arrays m and n.
m; % size(m) = [3 3]
n; % size(n) = [3 1]
How can I solve m\n for each point of the domain in a vectorized fashion? I used bsxfun, but I was not successful:
solution = bsxfun( @(A,b) A\b, A, b );
I think the problem is with the expansion of the singleton elements, and I don't know how to fix it.
I tried some solutions; it seems that a for loop is actually the fastest possibility in this case.
A naive approach looks like this:
% iterate
C = zeros(size(B));
for a = 1:size(A,1)
    for b = 1:size(A,2)
        for c = 1:size(A,3)
            C(a,b,c,:) = squeeze(A(a,b,c,:,:)) \ squeeze(B(a,b,c,:));
        end
    end
end
The squeeze is expensive in computation time because it needs some advanced indexing. Swapping the dimensions instead is faster.
A = permute(A,[4,5,1,2,3]);
B = permute(B,[4,1,2,3]);
C2 = zeros(size(B));
for a = 1:size(A,3)
    for b = 1:size(A,4)
        for c = 1:size(A,5)
            C2(:,a,b,c) = A(:,:,a,b,c) \ B(:,a,b,c);
        end
    end
end
C2 = permute(C2,[2,3,4,1]);
The second solution is about 5 times faster.
Update: I found an improved version. Reshaping and using only one large loop increases the speed again. This version is also suitable for use with the Parallel Computing Toolbox; in case you own it, replace the for with a parfor and start the workers.
A = permute(A,[4,5,1,2,3]);
B = permute(B,[4,1,2,3]);
% linearize A and B to get better performance
linA = reshape(A,[size(A,1),size(A,2),size(A,3)*size(A,4)*size(A,5)]);
linB = reshape(B,[size(B,1),size(B,2)*size(B,3)*size(B,4)]);
C3 = zeros(size(linB));
for a = 1:size(linA,3)
    C3(:,a) = linA(:,:,a) \ linB(:,a);
end
% undo linearization
C3 = reshape(C3,size(B));
% undo dimension swap
C3 = permute(C3,[2,3,4,1]);
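The linearize-then-one-loop pattern is language-agnostic. Here is a pure-Python sketch where hypothetical 2x2 systems solved by Cramer's rule stand in for MATLAB's backslash on 3x3 systems (solve2x2, mats, and vecs are illustrative names, not from the answer):

```python
# Solve one small linear system per grid point, looping over the
# linearized list of points. 2x2 systems via Cramer's rule stand in
# for MATLAB's A\b on 3x3 systems.
def solve2x2(m, n):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(n[0] * d - b * n[1]) / det,
            (a * n[1] - n[0] * c) / det]

# One 2x2 matrix and one 2-vector per "grid point" (2 points here).
mats = [[[2.0, 0.0], [0.0, 4.0]],
        [[1.0, 1.0], [0.0, 1.0]]]
vecs = [[2.0, 8.0],
        [3.0, 1.0]]

C = [solve2x2(m, n) for m, n in zip(mats, vecs)]
print(C)  # -> [[1.0, 2.0], [2.0, 1.0]]
```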

MATLAB: Block matrix multiplying without loops

I have a block matrix [A B C...] and a matrix D (all 2-dimensional). D has dimensions y-by-y, and A, B, C, etc are each z-by-y. Basically, what I want to compute is the matrix [D*(A'); D*(B'); D*(C');...], where X' refers to the transpose of X. However, I want to accomplish this without loops for speed considerations.
I have been playing with the reshape command for several hours now, and I know how to use it in other cases, but this use case is different from the other ones and I cannot figure it out. I also would like to avoid using multi-dimensional matrices if at all possible.
Honestly, a loop is probably the best way to do it. In my image-processing work I found a well-written loop that takes advantage of Matlab's JIT compiler is often faster than all the extra overhead of manipulating the data to be able to use a vectorised operation. A loop like this:
[m, n] = size(A);      % each block is m-by-y, so n = y * (number of blocks)
y = size(D, 1);        % block width
T = zeros(n, m);
AT = A';
for ii = 1:y:n
    T(ii:ii+y-1, :) = D * AT(ii:ii+y-1, :);
end
contains only built-in operators and the bare minimum of copying, and given the JIT is going to be hard to beat. Even if you want to factor in interpreter overhead it's still only a single statement with no functions to consider.
The "loop-free" version with extra faffing around and memory copying, is to split the matrix and iterate over the blocks with a hidden loop:
blksize = size(D, 1);
blkcnt = size(A, 2) / blksize;
blocks = mat2cell(A, size(A, 1), repmat(blksize, 1, blkcnt));
blocks = cellfun(@(x) D*x', blocks, 'UniformOutput', false);
T = cell2mat(blocks');
Of course, if you have access to the Image Processing Toolbox, you can also cheat horribly:
T = blockproc(A, size(D), @(x) D*x.data');
Prospective approach & Solution Code
Given:
M is the block matrix [A B C...], where each A, B, C etc. are of size z x y. Let the number of such matrices be num_mat for easy reference later on.
If those matrices are concatenated along the columns, then M would be of size z x num_mat*y.
D is the matrix to be multiplied with each of those matrices A, B, C etc. and is of size y x y.
Now, as stated in the problem, the output you are after is [D*(A'); D*(B'); D*(C');...], i.e. the multiplication results being concatenated along the rows.
If you are okay with those multiplication results to be concatenated along the columns instead i.e. [D*(A') D*(B') D*(C') ...],
you can achieve the same with some reshaping and then performing the
matrix multiplications for the entire M with D and thus have a vectorized no-loop approach. Thus, to get such a matrix multiplication result, you can do -
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
But, if you HAVE to get an output with the multiplication results being concatenated along the rows, you need to do some more reshaping like so -
out = reshape(permute(reshape(mults,y,z,[]),[1 3 2]),[],z);
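For readers following along outside MATLAB, the block operation [D*(A'); D*(B'); ...] itself can be sketched in pure Python with tiny stand-in sizes (transpose and matmul helpers written out; all names here are illustrative):

```python
# Compute [D*A'; D*B'; ...] for blocks A, B, ... of a block matrix M.
def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

z, y = 1, 2                       # each block is z-by-y; D is y-by-y
D = [[1, 2], [3, 4]]
blocks = [[[1, 0]], [[0, 1]]]     # two z-by-y blocks, i.e. M = [A B]

# Stack D * block' vertically, matching [D*(A'); D*(B'); ...]
out = [row for blk in blocks for row in matmul(D, transpose(blk))]
print(out)  # -> [[1], [3], [2], [4]]
```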
Benchmarking
This section covers benchmarking codes comparing the proposed vectorized approach against a naive JIT powered loopy approach to get the desired output. As discussed earlier, depending on how the output array must hold the multiplication results, you can have two cases.
Case I: Multiplication results concatenated along the columns
%// Define size parameters, then define random inputs with those
z = 500; y = 500; num_mat = 500;
M = rand(z,num_mat*y);
D = rand(y,y);
%// Warm up tic/toc.
for k = 1:100000
    tic(); elapsed = toc();
end
disp('---------------------------- With loopy approach')
tic
out1 = zeros(z,y*num_mat);
for k1 = 1:y:y*num_mat
    out1(:,k1:k1+y-1) = D*M(:,k1:k1+y-1).';
end
toc, clear out1 k1
disp('---------------------------- With proposed approach')
tic
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
toc
Case II: Multiplication results concatenated along the rows
%// Define size parameters, then define random inputs with those
z = 500; y = 500; num_mat = 500;
M = rand(z,num_mat*y);
D = rand(y,y);
%// Warm up tic/toc.
for k = 1:100000
    tic(); elapsed = toc();
end
disp('---------------------------- With loopy approach')
tic
out1 = zeros(y*num_mat,z);
for k1 = 1:y:y*num_mat
    out1(k1:k1+y-1,:) = D*M(:,k1:k1+y-1).';
end
toc, clear out1 k1
disp('---------------------------- With proposed approach')
tic
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
out2 = reshape(permute(reshape(mults,y,z,[]),[1 3 2]),[],z);
toc
Runtimes
Case I:
---------------------------- With loopy approach
Elapsed time is 3.889852 seconds.
---------------------------- With proposed approach
Elapsed time is 3.051376 seconds.
Case II:
---------------------------- With loopy approach
Elapsed time is 3.798058 seconds.
---------------------------- With proposed approach
Elapsed time is 3.292559 seconds.
Conclusions
The runtimes suggest a solid ~25% speedup with the proposed vectorized approach. Hopefully this works out for you!
If you want to get A, B, and C from a bigger matrix you can do this, assuming the bigger matrix is called X:
A = X(:, 1:y);
B = X(:, y+1:2*y);
C = X(:, 2*y+1:3*y);
If there are N such matrices, the best way is to use reshape like:
F = reshape(X, z, y, N);
Then use a loop to generate a new matrix I call it F1 as:
F1 = [];
for n = 1:N
    F1 = [F1 F(:,:,n)'];
end
Then compute F2 as:
F2 = D*F1;
and finally get your result as:
R = reshape(F2, N*y, z);
Note: this for loop does not slow you down as it is just to reformat the matrix and the multiplication is done in matrix form.

How to perform a column by column circular shift of a matrix without a loop

I need to circularly shift individual columns of a matrix.
This is easy if you want to shift all the columns by the same amount, however, in my case I need to shift them all by a different amount.
Currently I'm using a loop and if possible I'd like to remove the loop and use a faster, vector based, approach.
My current code
A = randi(2, 4, 2);
B = A;
for i = 1:size(A,2)
    d = randi(size(A,1));
    B(:,i) = circshift(A(:,i), [d, 0]);
end
Is is possible to remove the loop from this code?
Update I tested all three methods and compared them to the loop described in this question. I timed how long it would take to execute a column by column circular shift on a 1000x1000 matrix 100 times. I repeated this test several times.
Results:
My loop took more than 12 seconds
Pursuit's suggestion took less than a second
Zroth's original answer took just over 2 seconds
Ansari's suggestion was slower than the original loop
Edit
Pursuit is right: Using a for-loop and appropriate indexing seems to be the way to go here. Here's one way of doing it:
[m, n] = size(A);
D = randi([0, m - 1], [1, n]);
B = zeros(m, n);
for i = 1:n
    B(:, i) = [A(m - D(i) + 1 : m, i); A(1 : m - D(i), i)];
end
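The concatenation inside that loop maps directly to list slicing in other languages; here is a pure-Python sketch with small stand-in data (A, D, and B here are illustrative, not the benchmark arrays):

```python
# Circularly shift each column of A down by its own amount D[i].
# Python slicing does the [A(m-d+1:m,i); A(1:m-d,i)] concatenation.
A = [[1, 10],
     [2, 20],
     [3, 30],
     [4, 40]]
D = [1, 2]  # shift column 0 down by 1, column 1 down by 2

cols = list(zip(*A))  # column-major view of A
shifted = [col[-d:] + col[:-d] if d % len(col) else col
           for col, d in zip(cols, D)]
B = [list(row) for row in zip(*shifted)]
print(B)  # -> [[4, 30], [1, 40], [2, 10], [3, 20]]
```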
Original answer
I've looked for something similar before, but I never came across a good solution. A modification of one of the algorithms used here gives a slight performance boost in my tests:
[m, n] = size(A);
mtxLinearIndices ...
    = bsxfun(@plus, ...
             mod(bsxfun(@minus, (0 : m - 1)', D), m), ...
             (1 : m : m * n));
C = A(mtxLinearIndices);
Ugly? Definitely. Like I said, it seems to be slightly faster (2--3 times faster for me); but both algorithms are clocking in at under a second for m = 3000 and n = 1000 (on a rather old computer, too).
It might be worth noting that, for me, both algorithms seem to outperform the algorithm provided by Ansari, though his answer is certainly more straightforward. (Ansari's algorithm's output does not agree with the other two algorithms for me; but that could just be a discrepancy in how the shifts are being applied.) In general, arrayfun seems pretty slow when I've tried to use it. Cell arrays also seem slow to me. But my testing might be biased somehow.
Not sure how much faster this would be, but you could try this:
[nr, nc] = size(A);
B = arrayfun(@(i) circshift(A(:, i), randi(nr)), 1:nc, 'UniformOutput', false);
B = cell2mat(B);
You'll have to benchmark it, but using arrayfun may speed it up a little bit.
I suspect your circular-shift operations on the random integer matrix do not make it any more random, since the numbers are uniformly distributed.
So I hope your question is using randi() for demonstration purposes only.