Optimizing tensor multiplications - matlab

I've got a real-time image processing program I'm trying to optimize, and it all boils down to matrix multiplications. Consider 3 tensors I'm calculating in the initialization stage:
A = np.arange(35 * 51 * 59).reshape([35, 51, 59])
B = np.arange(37 * 51 * 51 * 59).reshape([37, 51, 51, 59])
C = np.arange(59 * 27).reshape([59, 27])
Each frame, I get new data in the form of a fourth tensor:
M = np.arange(35 * 37 * 59).reshape([35, 37, 59])
Currently, I'm calculating D = np.einsum('xyf,xtf,ytpf,fr->tpr', M, A, B, C), where D is my desired result, and it's the major bottleneck of the program. There are two directions I'm trying to follow in order to optimize it.
First, I tried to come up with a tensor T, a function of A, B and C that I can pre-calculate, so that it all boils down to D = np.tensordot(M, T, axes=..). I wasn't successful. I spent a lot of time on it; is it even possible at all?
Moreover, the program itself is written in MATLAB. As it doesn't have a built-in tensor multiplication function (an einsum or tensordot equivalent), I'm currently using the tprod toolbox, and doing:
temp1 = etprod('dcb', A, 'abc', M, 'adc');
temp2 = etprod('dbc', B, 'abcd', temp1, 'adb');
D = etprod('cdb', C, 'ab', temp2, 'acd');
As the default dot product function in MATLAB (for 2D matrices) is much faster than etprod, I thought about reshaping A, B, C and M into 2D arrays so that I would be able to multiply 2D matrices using the default function, without hand-written for loops. I wasn't successful with that either.
Any thoughts? thanks!

If this operation is done many times with different values of M, we could precompute a tensor D0 from A, B and C alone:
D0 = np.einsum('xtf,ytpf,fr->tprxyf', A, B, C)
The whole operation could be broken into binary steps:
D0 = np.einsum('xtf,ytpf->xyptf', A, B)
D0 = np.einsum('xyptf,fr->tprxyf', D0, C)
D = np.einsum('tprxyf,xyf->tpr', D0, M)
The final operation uses only D0 and M and can be coded as a matrix-vector product. In MATLAB it would be
D = reshape(D0, [], numel(M)) * M(:);
which could then be reshaped to t-by-p-by-r as desired.
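The precomputation idea can be checked with a small NumPy sketch. The sizes below are small stand-ins, not the ones from the question, and the names T and T2d are only illustrative. The output axes (t, p, r) are placed first and the contracted axes (x, y, f) last, so that a plain reshape yields a 2D matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small stand-in sizes; the question uses x=35, y=37, t=51, p=51, f=59, r=27.
x, y, t, p, f, r = 3, 4, 5, 6, 7, 2
A = rng.standard_normal((x, t, f))
B = rng.standard_normal((y, t, p, f))
C = rng.standard_normal((f, r))
M = rng.standard_normal((x, y, f))

# Precompute once: output axes (t, p, r) first, contracted axes (x, y, f) last.
T = np.einsum('xtf,ytpf,fr->tprxyf', A, B, C)
T2d = T.reshape(t * p * r, x * y * f)

# Per frame: a single matrix-vector product, then restore the (t, p, r) shape.
D = (T2d @ M.ravel()).reshape(t, p, r)

# The same contraction written with tensordot: the last three axes of T
# are contracted with the three axes of M.
D_td = np.tensordot(T, M, axes=3)

# Reference: the single einsum call from the question.
D_ref = np.einsum('xyf,xtf,ytpf,fr->tpr', M, A, B, C)
```

Whether this pays off depends on the size of T, which grows as the product of all six dimensions.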
We could write this order as (((A,B),C),M).
It might be better, however, to use (((M,C),A),B):
D = np.einsum('xyf,fr->xyfr', M, C)
D = np.einsum('xyfr,xtf->ytfr', D, A)
D = np.einsum('ytfr,ytpf->tpr', D, B)
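This ordering of operations has intermediate arrays with only 4 indices rather than one with 6. With the sizes in the question, a 6-index array over (x, y, f, t, p, r) has 35*37*59*51*51*27, about 5.4e9, elements, whereas the largest 4-index intermediate here has about 3e6. On the other hand, nothing that involves M can be precomputed, so all three steps run every frame. If the three small operations together are much faster than the single large one, this may be an advantage.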
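As a sanity check, the pairwise ordering can be compared against the single einsum call. This is a minimal NumPy sketch with small stand-in sizes; the intermediate names MC and MCA are only illustrative. It also tries the optimize argument of np.einsum, which lets NumPy pick a pairwise contraction order itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small stand-in sizes; the question uses x=35, y=37, t=51, p=51, f=59, r=27.
x, y, t, p, f, r = 3, 4, 5, 6, 7, 2
A = rng.standard_normal((x, t, f))
B = rng.standard_normal((y, t, p, f))
C = rng.standard_normal((f, r))
M = rng.standard_normal((x, y, f))

# Single call, as in the question.
D_ref = np.einsum('xyf,xtf,ytpf,fr->tpr', M, A, B, C)

# Pairwise ordering (((M,C),A),B): every intermediate has 4 indices.
MC = np.einsum('xyf,fr->xyfr', M, C)
MCA = np.einsum('xyfr,xtf->ytfr', MC, A)      # sums over x
D_pair = np.einsum('ytfr,ytpf->tpr', MCA, B)  # sums over y and f

# Let NumPy choose a pairwise contraction order on its own.
D_opt = np.einsum('xyf,xtf,ytpf,fr->tpr', M, A, B, C, optimize=True)
```

Timing the three variants on the real sizes would show which one is actually fastest; that is not something this sketch measures.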

Related

Multiply two vectors with dimensions increasing along time

I have two vectors (called A and B) of length N. I need to multiply them as an "integration" process: first A(1)*B(1), then A(1:2)*B(1:2), and so on until A(1:N)*B(1:N). Each product is a single number, since B is transposed to a column vector. I've done it with a for loop:
for k = 1:N
C(k) = A(1:k) * B(1:k).';
end
But I wanted to ask whether this is the best solution or whether there is a more time-efficient option, since N is very large (about 110,000).
C = cumsum(A.*B)
does the same thing without a for loop. As EBH suggested in the comments, if you are not sure whether A and B have the same orientation, then use
C = cumsum(A(:).*B(:))
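The equivalence is easy to confirm numerically. The following NumPy sketch uses a small N and random data, both chosen only for illustration, and compares the growing inner products with a running sum of the elementwise products.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
A = rng.standard_normal(N)
B = rng.standard_normal(N)

# Loop version from the question: one growing inner product per k.
C_loop = np.array([A[:k] @ B[:k] for k in range(1, N + 1)])

# Vectorized version: running sum of the elementwise products.
C_cumsum = np.cumsum(A * B)
```

The loop does work proportional to N^2 in total, while the cumulative sum is a single pass over the data.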

How to get the union of the two 3D matrix?

I have two 3D matrices A and B, both of size 40*40*20 (double). The values in A and B are either 0 or 1. There are 100 ones in A and 50 ones in B, and they may or may not be at the same coordinates. I want to get the union of A and B, called C. The values in the 3D matrix C are either 1 or 0, and the number of ones in C is less than or equal to 150. My question is how to get the 3D matrix C in MATLAB.
You can use the function or, which is a logical OR. So or(A,B) is equivalent to the logical operation A | B.
C = or(A,B);
C = A | B;
| and or are the same operator in MATLAB; they are just two different ways to call it.
I think this is the best solution since it's built into MATLAB. However, there are plenty of other ways to do it.
Just as an example, you can do
C = logical(A+B);
logical is a function that converts every value into a logical value. Long story short, it replaces any nonzero value with 1.
You can approach it in 2 ways. The more efficient one is using vectors but you can also do it in classical nested for loops.
A = rand(40,40,20);
A = A < 0.01; % approximately 320 ones, the rest zeros
B = rand(40,40,20);
B = B < 0.005; % approximately 160 ones, the rest zeros
C = zeros(size(A));
for iter1 = 1:size(A,1)
    for iter2 = 1:size(A,2)
        for iter3 = 1:size(A,3)
            C(iter1,iter2,iter3) = A(iter1,iter2,iter3)|B(iter1,iter2,iter3);
        end
    end
end
This method will be very slow. You can vectorize it to improve performance:
C = A|B;
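For comparison, here is the same elementwise union as a NumPy sketch. The helper random_mask is hypothetical and only builds test data with exactly 100 and 50 ones, matching the counts in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (40, 40, 20)

def random_mask(n_ones):
    """Boolean array of the given shape with exactly n_ones True entries."""
    mask = np.zeros(int(np.prod(shape)), dtype=bool)
    mask[rng.choice(mask.size, size=n_ones, replace=False)] = True
    return mask.reshape(shape)

A = random_mask(100)
B = random_mask(50)

# Elementwise union; np.logical_or(A, B) gives the same result.
C = A | B

# Analogue of logical(A + B): any nonzero sum counts as True.
C_sum = (A.astype(int) + B.astype(int)) > 0
```

The number of ones in C lies between 100 and 150, depending on how many positions A and B share.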

Multidimensional Arrays Multiplication in Matlab

I have the following three arrays in Matlab:
A size: 2xMxN
B size: MxN
C size: 2xN
Is there any way to remove the following loop to speed things up?
D = zeros(2,N);
for i=1:N
D(:,i) = A(:,:,i) * ( B(:,i) - A(:,:,i)' * C(:,i) );
end
Thanks
Yes, it is possible to do without the for loop, but whether this leads to a speed-up depends on the values of M and N.
Your idea of a generalized matrix multiplication is interesting, but it is not exactly to the point here, because through the repeated use of the index i you effectively take a generalized diagonal of a generalized product, which means that most of the multiplication results are not needed.
The trick to implement the computation without a loop is to a) match matrix dimensions through reshape, b) obtain the matrix product through bsxfun(@times, …) and sum, and c) get rid of the resulting singleton dimensions through reshape:
par = B - reshape(sum(bsxfun(@times, A, reshape(C, 2, 1, N)), 1), M, N);
D = reshape(sum(bsxfun(@times, A, reshape(par, 1, M, N)), 2), 2, N);
par is the value of the inner expression in parentheses, D the final result.
As said, the timing depends on the exact values. For M = 100 and N = 1000000 I find a speed-up of about a factor of two; for M = 10000 and N = 10000 the loop-less implementation is actually a bit slower.
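The same "multiply, then sum over one axis" pattern can be written with einsum, where the shared index i appears in every operand and in the output. This is a NumPy sketch with small, arbitrary M and N, not a MATLAB replacement; it only checks that the loop-free form matches the loop.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 8
A = rng.standard_normal((2, M, N))
B = rng.standard_normal((M, N))
C = rng.standard_normal((2, N))

# Loop version from the question.
D_loop = np.empty((2, N))
for i in range(N):
    Ai = A[:, :, i]
    D_loop[:, i] = Ai @ (B[:, i] - Ai.T @ C[:, i])

# Loop-free version: i is kept in the output, so only the "diagonal" is computed.
par = B - np.einsum('kmi,ki->mi', A, C)   # A(:,:,i)' * C(:,i) for every i
D = np.einsum('kmi,mi->ki', A, par)       # A(:,:,i) * par(:,i) for every i
```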
You may find that the following
D=tprod(A,[1 -3 2],B-tprod(A,[-3 1 2],C,[-3 2]),[-3 2]);
cuts the time taken. I did a few tests and found the time was cut in about half.
tprod is available at
http://www.mathworks.com/matlabcentral/fileexchange/16275
tprod requires that A, B and C are full (not sparse).

Accelerate the calculation of inv(X'*X)*Q*inv(X'*X) in Matlab?

I have to calculate Newey-West standard errors for large multiple regression models.
The final step of this calculation is to obtain
nwse = sqrt(diag(N.*inv(X'*X)*Q*inv(X'*X)));
This file exchange contribution implements this as
nwse = sqrt(diag(N.*((X'*X)\Q/(X'*X))));
This looks reasonable, but in my case (5000x5000 sparse Q and X'*X) it's far too slow for my needs (about 30 seconds, and I have to repeat this for about one million different models). Any ideas how to make this line faster?
Please note that I need only the diagonal, not the entire matrix and that both Q and (X'*X) are positive-definite.
I believe you can save a lot of computation time by explicitly doing an LU factorization, [l, u, p, q] = lu(X'*X); and using those factors when doing the calculations. Also, since X is constant for about 100 models, pre-calculating X'*X will most likely save you some time.
Note that in your case, the most time demanding operation might very well be the sqrt-function.
% Constant for every 100 models or so:
M = X'*X;
[l, u, p, q] = lu(M);
% Now, I guess this should be quite a bit faster (I might have messed up the order):
nwse = sqrt(diag(N.*(q * ( u \ (l \ (p * Q))) * q * (u \ (l \ p)))));
The first two terms are commonly used:
l the lower triangular matrix
u the upper triangular matrix
Now, p and q are a bit more uncommon.
p is a row permutation matrix used to obtain numerical stability. There is not much performance gain in doing [l, u, p] = lu(M) compared to [l, u] = lu(M) for sparse matrices.
q however offers a significant performance gain. q is a column permutation matrix that is used to reduce the amount of fill when doing the factorization.
Note that the [l, u, p, q] = lu(M) is only a valid syntax for sparse matrices.
As for why using full pivoting as described above should be faster:
Try the following to see the purpose of the column permutation matrix q. It is easier to work with elements that are aligned around the diagonal.
S = sprand(100,100,0.01);
[l, u, p] = lu(S);
spy(l)
figure
spy(u)
Now, compare it with this:
[ll, uu, pp, qq] = lu(S);
spy(ll);
figure
spy(uu);
Unfortunately, I don't have MATLAB here right now, so I can't guarantee that I put all the arguments in the correct order, but I think it's correct.
Following the helpful answer of Robert_P and the comments of Parag, I found the following to be the fastest for my particular large-scale sparse data:
L=chol(X'*X,'lower'); L=full(L);
invXtX = L'\(L\ speye(size(X,2)));
nwse = sqrt(N.*sum(invXtX.*(Q*invXtX)));
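The last line computes only the diagonal efficiently (idea taken from here): because invXtX is symmetric, diag(invXtX*Q*invXtX) equals the column sums of invXtX.*(Q*invXtX), so the full triple product is never formed.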
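The diagonal trick can be verified on a small dense example. The NumPy sketch below uses random stand-in data for X and Q (dense rather than sparse, and far smaller than 5000x5000) and compares the column-sum shortcut with the diagonal of the full product.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_par = 50, 6
X = rng.standard_normal((n_obs, n_par))
G = rng.standard_normal((n_par, n_par))
Q = G @ G.T + n_par * np.eye(n_par)   # symmetric positive-definite stand-in for Q
N = n_obs

XtX = X.T @ X
L = np.linalg.cholesky(XtX)           # XtX = L @ L.T with L lower triangular
inv_XtX = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n_par)))

# Diagonal only: column sums of inv_XtX * (Q @ inv_XtX).
# This relies on inv_XtX being symmetric.
nwse = np.sqrt(N * np.sum(inv_XtX * (Q @ inv_XtX), axis=0))

# Reference: form the full sandwich matrix and take its diagonal.
nwse_ref = np.sqrt(np.diag(N * inv_XtX @ Q @ inv_XtX))
```

The shortcut needs one matrix product instead of two and never stores the full result.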

Multiplying a 3x3 matrix to 3nx1 array without using loops

In my code, I have to multiply a matrix A (dimensions 3x3) by a vector b1 (dimensions 3x1), resulting in C. So C = A*b1. Now, I need to repeat this process n times, keeping A fixed and updating b to a different (3x1) vector each time. This can be done using loops, but I want to avoid that to save computational cost. Instead I want to do it as a matrix-vector product. Any ideas?
You need to build a matrix of b vectors, eg for n equal to 4:
bMat = [b1 b2 b3 b4];
Then:
C = A * bMat;
provides the solution of size 3x4 in this case. If you want the solution in the form of a vector of length 3n by 1, then do:
C = C(:);
Can we construct bMat for arbitrary n without a loop? That depends on what the form of all your b vectors is. If you let me know in a comment, I can update the answer.
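If the b vectors already sit in one stacked 3n-by-1 array, as the title suggests, a reshape builds bMat without a loop. The NumPy sketch below assumes that layout ([b1; b2; ...; bn]); the name b_stacked is hypothetical. It uses column-major order to mirror MATLAB's reshape(b, 3, n) and C(:).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((3, 3))
b_stacked = rng.standard_normal(3 * n)   # assumed layout: [b1; b2; ...; bn]

# Column j of bMat is the j-th 3-vector (column-major, like reshape(b, 3, n)).
bMat = b_stacked.reshape(3, n, order='F')

# One matrix product for all vectors, then back to a 3n vector, like C(:).
C = (A @ bMat).flatten(order='F')

# Reference: one product per vector.
C_loop = np.concatenate([A @ b_stacked[3 * j:3 * j + 3] for j in range(n)])
```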