MATLAB - apply multiple convolution masks to a single matrix

I need to convolve a matrix with many other matrices, using as few calls to convn as possible.
For example: I have size(MyMat) = [fm, fm, 1, bSize] and size(masks) = [s, s, maskNum].
I want res(:,:,k,:) to be the result of convolving masks(:,:,k) with MyMat:
res(:,:,k,:) = convn(MyMat, masks(:,:,k));
Since the convolution takes up over 80% of the running time of my script and is called hundreds of thousands of times, I don't want to use a loop.
I'm looking for the fastest way to do this. Basically, you could say I have bSize matrices, and I want to apply the convolution masks in masks to all of them with as few calls to convolution as possible.
The matrices are all small and non-sparse, so FFT-based convolution will probably slow it down (as a commenter here verified :) ).
(The reason I have a 1 in the size of MyMat is that I actually have more elements in that dimension, but I compute the convolution for each element of that dimension in a loop.)
The main goal is simply to eliminate the need for the following loop, or make it parallel with very little overhead, if possible:
for i = 1:size(convMask, 3)   % one call to convn per mask
    res(:,:,:,i) = convn(MyArray, convMask(:,:,i));
end
Parallelizing on the GPU would be great if there's a way to do this with less overhead than the usual parfor.
Thank you!

I assume that you are preallocating the array res correctly? Without a simple demo of what you're doing and an idea of the size of fm, s, etc., one can only make guesses to help you. If your matrices are large enough you might look into FFT-based convolution methods (there are some for convn on the MATLAB File Exchange). If the data is sparse (> 50% zeros), you could try converting this to matrix multiplication and use sparse data types. You could also try gpuArray/convn if you have a decent GPU.
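A minimal sketch of that gpuArray/convn suggestion, with made-up sizes and assuming the Parallel Computing Toolbox; it still loops over masks, but keeps the data and the preallocated result on the GPU:
fm = 32; s = 5; maskNum = 8; bSize = 16;                    % illustrative sizes only
MyMat = gpuArray(rand(fm, fm, 1, bSize));
masks = rand(s, s, maskNum);
res = zeros(fm+s-1, fm+s-1, maskNum, bSize, 'gpuArray');    % preallocate on the GPU
for k = 1:maskNum
    res(:,:,k,:) = convn(MyMat, gpuArray(masks(:,:,k)));    % full convolution per mask
end
res = gather(res);                                          % copy back to host memory only if needed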

Related

repmat vs simple matrix multiplication in MATLAB

Let v be a row vector of length n. The goal is to create a matrix A with m rows that are all equal to v.
MATLAB has a function for this that is called repmat. Possible code would be
A = repmat(v,[m 1])
There is an alternative equally concise way using simple matrix operations
A = ones(m,1)*v
Is any of the two methods preferable for large m and n?
Let's compare them!
When testing algorithms, two metrics are important: time and memory.
Let's start with time:
Clearly repmat wins!
Memory:
v = rand(1, 1000);   % example row vector
ii = 0;
profile -memory on
for m = 1000:1000:50000
    f1 = @() repmat(v, [m 1]);
    f2 = @() ones(m, 1)*v;
    ii = ii + 1;
    t1(ii) = timeit(f1);
    t2(ii) = timeit(f2);
end
profreport
It seems that both take the same amount of memory. However, the profiler is known for not showing all the memory, so we cannot fully trust it.
Still, it is clear that repmat is better
You should use repmat().
Matrix multiplication is in general an O(n^3) operation, which is much slower than just replicating data in memory.
On top of that, the second option allocates extra data (the ones(m,1) vector) in addition to the output.
In the case above you multiply by a vector, so the outer product is faster than a general matrix product, yet still not as fast as a plain memory operation.
MATLAB doesn't use the knowledge that all the vector's elements are 1, hence it multiplies each element of v by 1, m times.
Both operations will be mainly memory bound, yet the more efficient, fast and direct method is to go with repmat().
The question is, what do you do afterwards?
Because you may not need repmat() at all, as the sketch below shows.
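As an illustration of that last point, a minimal sketch (assuming implicit expansion, available since R2016b, or bsxfun on older releases): many typical uses of repmat can be skipped entirely:
M = rand(5, 4);
v = rand(1, 4);
R1 = M + repmat(v, size(M, 1), 1);   % explicit replication of v
R2 = M + v;                          % implicit expansion, no replicated copy is built
isequal(R1, R2)                      % true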

Grouping Data in a Matrix in MATLAB

I've got a really big matrix which I should "upscale" (i.e.: create another matrix where the elements of the first are grouped 40-by-40). For every 40-by-40 group I should evaluate a series of parameters (i.e.: frequencies, average and standard deviation).
I'm quite sure I can make such thing with a loop, but I was wondering if there was a more elegant vectorized method...
You might find blockproc useful. This command allows you to apply a function (e.g. @mean, @std, etc.) to each distinct block in a 2D matrix.
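A minimal sketch of that suggestion, assuming the Image Processing Toolbox and an input whose size is an exact multiple of 40:
A = rand(400, 400);                                             % illustrative data
blockMeans = blockproc(A, [40 40], @(blk) mean(blk.data(:)));   % 10-by-10 result, one mean per block
blockStds  = blockproc(A, [40 40], @(blk) std(blk.data(:)));    % 10-by-10 result, one std per block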

Fast way in Matlab to compute inverse of big matrix 10800x10800?

I have a matrix of size 10800x10800 in Matlab and I compute its inverse
directly with the function:
inv(A)
It takes 3 to 4 minutes just one such computation. And that is part of an
iterative algorithm which needs more than 20 iterations, so overall things would
be very slow. Is there a better way to do this? Maybe some mathematical formulas
or maybe a better Matlab function?
Edit: The matrix is diagonal. Each iteration the diagonal elements are updated
based on formulas for fitting a factor analyzer. But that is irrelevant, the
important thing is that it is a diagonal matrix and it changes each iteration.
Thanks
If your matrix is indeed diagonal, you can obviously just do
Ainv = diag(1./diag(A));
which should be very fast.
The backslash operator \ is said to be faster and could also be more accurate. I can't verify it without MATLAB at hand, but you could try running A \ eye(10800) instead of inv(A) and see if it works out.
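A rough way to compare the three suggestions, with n reduced from 10800 so the demo runs quickly (timings are machine dependent):
n = 2000;
A = diag(rand(n, 1) + 1);              % dense storage of a diagonal matrix
tic; I1 = inv(A);              toc     % general dense inverse
tic; I2 = A \ eye(n);          toc     % backslash against the identity
tic; I3 = diag(1./diag(A));    toc     % exploits the diagonal structure
norm(I1 - I3, 'fro')                   % should be ~0 up to rounding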

Matlab dwt across specified dimension

I have a dataset Sig of size 65536 x 192 in Matlab. If I want to take the one-dimensional fft along the second dimension, I could either do a for loop:
% pre-allocate etc.
for i=1:65536
F(i,:) = fft(Sig(i,:));
end
or I could specify the dimension and do it without the for loop:
F = fft(Sig,[],2);
which is about 20 times faster for my dataset.
I have looked for something similar for the discrete wavelet transform (dwt), but been unable to find it. So I was wondering if anyone knows a way to do dwt across a specified dimension in Matlab? Or do I have to use for loops?
In your FFT loop example, it seems you operate on rows. MATLAB uses column-major order, which may explain the difference in performance. Is the performance the same if you operate on columns?
If this is the right explanation, you could still use dwt in a loop, operating column by column.
If you really need performance, a solution is to write your own MEX file that calls a C discrete wavelet transform library exactly the way you want.
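A minimal sketch of that column-wise loop, assuming the Wavelet Toolbox; the 'db1' wavelet and the preallocation-by-probing are arbitrary illustrative choices:
SigT = Sig.';                               % 192 x 65536: each signal is now a column
[cA1, cD1] = dwt(SigT(:, 1), 'db1');        % probe output sizes for preallocation
cA = zeros(numel(cA1), size(SigT, 2));
cD = zeros(numel(cD1), size(SigT, 2));
for k = 1:size(SigT, 2)
    [cA(:, k), cD(:, k)] = dwt(SigT(:, k), 'db1');
end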
I presume you're using the function from the Wavelet Toolbox: http://www.mathworks.co.uk/help/toolbox/wavelet/ref/dwt.html
The documentation doesn't seem to describe acting on an array, so it's probably not supported. If it does allow you to input an array, then it will operate on the first non-singleton dimension or it will ignore the shape and treat it as a vector.

Solving multiple linear systems using vectorization

Sorry if this is obvious, but I searched for a while and did not find anything (or missed it).
I'm trying to solve linear systems of the form Ax=B with A a 4x4 matrix, and B a 4x1 vector.
I know that for a single system I can use mldivide to obtain x: x=A\B.
However I am trying to solve a great number of systems (possibly > 10000) and I am reluctant to use a for loop because I was told it is notably slower than matrix formulation in many MATLAB problems.
My question is then: is there a way to solve Ax=B using vectorization, with A of size 4x4xN and B of size 4xN?
PS: I do not know if it is important but the B vector is the same for all the systems.
You should use a for loop. There might be a benefit in precomputing a factorization and reusing it, if A stays the same and B changes. But for your problem where A changes and B stays the same, there's no alternative to solving N linear systems.
You shouldn't worry too much about the performance cost of loops either: the MATLAB JIT compiler means that loops can often be just as fast on recent versions of MATLAB.
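For the opposite case mentioned above (A fixed and B changing), a minimal sketch: stacking the right-hand sides as columns lets mldivide factor A only once:
A = rand(4);            % hypothetical fixed system matrix
B = rand(4, 10000);     % 10000 right-hand sides as columns
X = A \ B;              % one factorization, all 10000 solutions at once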
I don't think you can optimize this further. As explained by @Tom, since A is the one changing, there is no benefit in factoring the various A's beforehand...
Besides, the looped solution is pretty fast given the dimensions you mention:
A = rand(4,4,10000);
B = rand(4,1);                  % same for all linear systems
tic
X = zeros(4, size(A,3));
for i = 1:size(A,3)
    X(:,i) = A(:,:,i) \ B;
end
toc
Elapsed time is 0.168101 seconds.
Here's the problem:
you're trying to perform a 2D operation (mldivide) on a 3D array. No matter how you look at it, you need to reference the matrices by index, which is where the time penalty kicks in... it's not the for loop that is the problem, it's how people use it.
If you can structure your problem differently, then perhaps you can find a better option, but right now you have a few options:
1 - MEX
2 - parallel processing (write a parfor loop; see the sketch after this list)
3 - CUDA
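A minimal sketch of option 2, assuming the Parallel Computing Toolbox; it only pays off if each solve is expensive enough to amortize the scheduling overhead:
X = zeros(4, size(A, 3));
parfor i = 1:size(A, 3)
    X(:, i) = A(:, :, i) \ B;      % B is broadcast to the workers
end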
Here's a rather esoteric solution that takes advantage of MATLAB's peculiar optimizations. Construct an enormous 4k x 4k sparse matrix with your 4x4 blocks down the diagonal. Then solve all simultaneously.
On my machine this gets the same solution, up to single-precision accuracy, as @Amro's/@Tom's for-loop solution, but faster.
n = size(A,1);
k = size(A,3);
AS = reshape(permute(A,[1 3 2]), n*k, n);
S = sparse( ...
    repmat(1:n*k, n, 1)', ...
    bsxfun(@plus, reshape(repmat(1:n:n*k, n, 1), [], 1), 0:n-1), ...
    AS, ...
    n*k, n*k);
X = reshape(S \ repmat(B, k, 1), n, k);
for a random example:
For k = 10000
For loop: 0.122570 seconds.
Giant sparse system: 0.032287 seconds.
If you know that your 4x4 matrices are positive definite then you can use chol on S to improve the accuracy.
This is silly. But so is how slow MATLAB's for loops still are in 2015, even with the JIT. This solution seems to hit a sweet spot when k is not too large, so everything still fits into memory.
I know this post is years old now, but I'll contribute my two cents anyway. You CAN put all of your A matrices into a bigger block-diagonal matrix, where there will be 4x4 blocks on the diagonal of a big matrix. The right-hand side will be all of your b vectors stacked on top of each other. Once you set this up, it is represented as a sparse system and can be efficiently solved with the algorithms mldivide chooses. The blocks are numerically decoupled, so even if there are singular blocks in there, the answers for the nonsingular blocks should be right when you use mldivide. There is code on MATLAB Central that takes this approach:
http://www.mathworks.com/matlabcentral/fileexchange/24260-multiple-same-size-linear-solver
I suggest experimenting to see if the approach is any faster than looping. I suspect it can be more efficient, especially for large numbers of small systems. In particular, if there are nice formulas for the coefficients of A across the N matrices, you can build the full left-hand side using MATLAB vector operations (without looping), which could give you additional cost savings. As others have noted, vectorized operations aren't always faster, but they often are in my experience.