Generic Matlab code optimization [closed] - matlab

We know that MATLAB for loops are very expensive as far as time is concerned. Colleagues keep telling me to try to replace for loops with matrix operations, in other words to trade time consumption for memory consumption.
In a previous post I asked how to use CUDA to compare two matrices element by element. @Eric suggested that I transform the two for loops I had into matrix operations and then perform those matrix operations on the GPU. It was very intuitive. Inspired by that answer, I started thinking more seriously about MATLAB code optimization.
The reason for this post is to ask whether anyone can give similar intuitive examples, or explain techniques for writing efficient MATLAB code. References to any tutorial or book would be great.
Thank you!!

Vectorize loops
In MATLAB, you can gain a speedup by vectorizing loops. MATLAB is specifically designed to operate on vectors and matrices, so it is usually quicker to perform operations on whole vectors or matrices rather than using a loop. For example:
index = 0;
for time = 0:0.001:60
    index = index + 1;
    waveForm(index) = cos(time);
end
would run considerably faster if replaced with:
time = 0:0.001:60;
waveForm = cos(time);
Functions that you might find useful when using vector operations in place of loops include:
any() – returns true if any element is non-zero.
size() – returns a vector containing the number of elements in an array along each dimension.
find() – returns the indices of any non-zero elements. To get the non-zero values themselves you can use logical indexing, e.g. a(a~=0), or equivalently a(find(a)).
cumsum() – returns a vector containing the cumulative sum of the elements in its argument vector, e.g. cumsum([0:5]) returns [0 1 3 6 10 15].
sum() – returns the sum of all the elements in a vector, e.g. sum([0:5]) returns 15.
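As a quick illustration of how these functions can combine to replace a loop, here is a small sketch (the data and the threshold are made up for the example):

```matlab
a = [0.2 0 1.5 0 3.1 0.9];   % example data
threshold = 1;

mask = a > threshold;    % logical vector, no loop needed
if any(mask)             % true if at least one element exceeds the threshold
    idx   = find(mask);  % indices of those elements: [3 5]
    vals  = a(mask);     % the values themselves: [1.5 3.1]
    total = sum(vals);   % their sum: 4.6
end
running = cumsum(a);     % running total of all elements
```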
Besides replacing for-loops with matrix operations, you can maximize code performance by optimizing memory access:
1. Preallocate arrays before accessing them within loops
When creating or repeatedly modifying arrays within loops, always allocate the arrays beforehand. Of all three techniques, this familiar one can give the biggest performance improvement.
Code segment 2 executes in 99.8% less time (580 times faster) than segment 1 on machine A, and in 99.7% less time (475 times faster) than segment 1 on machine B.
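The article's code segments are not reproduced here; a minimal pair in the same spirit (array size chosen arbitrarily) would look like this:

```matlab
n = 1e5;

% Segment 1 style: x grows on every iteration, forcing MATLAB to
% repeatedly reallocate and copy the array.
x = [];
for k = 1:n
    x(k) = k^2;
end

% Segment 2 style: allocate the array once, then fill it in place.
y = zeros(1, n);
for k = 1:n
    y(k) = k^2;
end
% Both produce the same result; the preallocated version is far faster.
```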
2. Store and access data in columns
When processing 2-D or N-D arrays, access your data in columns and store it so that it is easily accessible by columns.
Code segment 2 executes in 33% less time than segment 1 on machine A, and in 55% less time than segment 1 on machine B.
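A sketch of what such a pair of segments can look like (the matrix size is arbitrary); MATLAB stores arrays column-wise, so the column-order loop reads contiguous memory:

```matlab
A = rand(5000);

% Row-by-row (segment 1 style): strided, cache-unfriendly access.
s1 = 0;
for i = 1:size(A,1)
    s1 = s1 + sum(A(i,:));
end

% Column-by-column (segment 2 style): contiguous, cache-friendly access.
s2 = 0;
for j = 1:size(A,2)
    s2 = s2 + sum(A(:,j));
end
% s1 and s2 agree up to round-off; the column version runs faster.
```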
3. Avoid creating unnecessary variables
When creating new variables or variables that are functions of existing data sets, ensure that they are essential to your algorithm, particularly if your data set is large.
Code segment 2 executes in 40% less time than code segment 1 on machine A and in 96% less time on machine B.
Code segment 4 executes in 40% less time than code segment 3 on machine A and in 59% less time on machine B.
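As an illustrative sketch (the data and the normalization step are made up), the idea is to avoid keeping named intermediate copies of a large array alive:

```matlab
data = rand(4000);

% Wasteful: two extra full-size arrays kept in the workspace.
shifted    = data - mean(data(:));
normalized = shifted / std(data(:));

% Leaner: overwrite the original array, no named intermediates.
data = (data - mean(data(:))) / std(data(:));
```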
References:
Profiling, Optimization, and Acceleration of MATLAB code
Programming Patterns: Maximizing Code Performance by Optimizing Memory Access
Writing Fast MATLAB Code
Introduction to Computers for Engineers

Related

the maximum size of matrix in Matlab [closed]

I want to solve the linear equation Ax=b, where A is a 40000x400000 matrix, x is a 400000x1 vector, and b is a 40000x1 vector. When I use A\b in MATLAB, I get the error "out of memory". Because the matrix is too large, I cannot use A\b. How can I solve my problem without using the command A\b in MATLAB?
Thank you for your help.
The problem is partly MATLAB and partly your machine.
First, think carefully about whether you actually need to solve a 40000x400000 system. Maybe you can simplify it, maybe you can segment the problem, maybe the system is decoupled; check this carefully.
For context, storing a 40000x400000 matrix of 8-byte floats takes around 120 GB. That is likely too much; there is a good chance you don't even have enough free disk space for it.
If this matrix has many zeros, at least many more than non-zeros, then you can exploit MATLAB's sparse matrix functions. They avoid storing the whole matrix and, essentially, operate only on the non-zero entries.
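As a small illustration of the sparse machinery on a square system (the size and structure here are made up, not the question's matrix):

```matlab
n = 1e4;
e = ones(n,1);
% Sparse tridiagonal matrix: only about 3n nonzeros are stored,
% instead of n^2 entries for a full matrix.
A = spdiags([e -2*e e], -1:1, n, n);
b = rand(n,1);
x = A \ b;        % backslash detects and exploits the sparsity
norm(A*x - b)     % small residual confirms the solve
```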
If you are lazy but have a very good machine (say, 1 TB of SSD), you might consider increasing the swap space on Linux (or the paging file on Windows). That means allowing the computer to use a region of the disk as if it were RAM. Memory would no longer run out, but the operations you need would take insanely long, so consider starting with a smaller matrix to gauge the execution time; for a dense solve it grows roughly with the cube of the matrix dimension.
So case in point, try to review the problem you've got at your hands.

how to speed up Matlab nested for loops when I cannot vectorize the calculations?

I have three big 3D arrays of the same size [41*141*12403], named in the MATLAB code below alpha, beta and ni. From them I need to calculate another 3D array of the same size, obtained elementwise from the original matrices through a calculation that combines an infinite sum and definite integral calculations, using the value of each element. It therefore seems inevitable to use several nested loops. The code has already been running for several hours(!) and is still in the first iteration of the outer loop (which needs to be performed 41 times!). By my estimate, at this rate the program would take more than two years to finish! I don't know how to optimize the code. Please help me!
the code I use:
z_len = size(KELDYSH_PARAM_r_z_t,1); % 41 rows
r_len = size(KELDYSH_PARAM_r_z_t,2); % 141 columns
t_len = size(KELDYSH_PARAM_r_z_t,3); % 12403 slices
sumRes = zeros(z_len,r_len,t_len);
for z_ind = 1:z_len
    z_ind % in order to track the advancement of the calculation
    for r_ind = 1:r_len
        for t_ind = 1:t_len
            sumCurrent = 0;
            sumPrevious = inf;
            s = 0;
            while abs(sumPrevious-sumCurrent) > 1e-6
                kapa = kapa_0 + s; % some scalar
                x_of_w = (beta(z_ind,r_ind,t_ind).* ...
                    (kapa-ni(z_ind,r_ind,t_ind))).^0.5;
                sumPrevious = sumCurrent;
                sumCurrent = sumCurrent + exp(-alpha(z_ind,r_ind,t_ind).* ...
                    (kapa-ni(z_ind,r_ind,t_ind))).*(x_of_w.^(2*abs(m)+1)/2).* ...
                    w_m_integral(x_of_w,m);
                s = s + 1;
            end
            sumRes(z_ind,r_ind,t_ind) = sumCurrent;
        end
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function res = w_m_integral(x_of_w,m)
    res = quad(@integrandFun,0,1,1e-6);
    function y = integrandFun(t)
        y = exp(-x_of_w^2*t).*t.^(abs(m))./((1-t).^0.5);
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Option 1 - more vectorising
It's a pretty complex model you're working with and not all the terms are explained, but some parts can still be further vectorised. Your alpha, beta and ni matrices are presumably static and precomputed? Your s value is a scalar and kapa could be either, so you can probably precompute the x_of_w matrix all in one go too. This would give you a very slight speedup all on its own, though you'd be spending memory to get it - 71 million points is doable these days but will call for an awful lot of hardware. Doing it once for each of your 41 rows would reduce the burden neatly.
That leaves the integral itself. The quad function doesn't accept vector inputs (it would be a nightmare, wouldn't it?) and neither does integral, which MathWorks recommends you use instead. But if your integration limits are the same in each case, why not do the integral the old-fashioned way: work out the antiderivative, evaluate it as a whole matrix at 1 and as a whole matrix at 0, and take the difference.
Then you can write a single loop that computes the integral for the whole input space then tests the convergence for all the matrix elements. Make a mask that notes the ones that have not converged and recalculate those with the increased s. Repeat until all have converged (or you hit a threshold for iterations).
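A sketch of that masked-convergence loop, using the question's variable names (F stands in for whatever vectorized evaluation of the integral term you end up with, and maxIter is an assumed safety cap):

```matlab
maxIter = 500;                        % assumed iteration cap
active  = true(size(alpha));          % elements not yet converged
sumPrev = inf(size(alpha));
sumCur  = zeros(size(alpha));
s = 0;
while any(active(:)) && s < maxIter
    kapa   = kapa_0 + s;
    x_of_w = sqrt(beta.*(kapa - ni)); % whole array at once
    term   = exp(-alpha.*(kapa - ni)).*(x_of_w.^(2*abs(m)+1)/2).*F(x_of_w);
    sumPrev(active) = sumCur(active);
    sumCur(active)  = sumCur(active) + term(active);
    active = abs(sumPrev - sumCur) > 1e-6;  % shrink the mask
    s = s + 1;
end
sumRes = sumCur;
```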
Option 2 - parallelise it
It used to be the case that matlab was much faster with vectorised operations than loops. I can't find a source for it now but I think I've read that it's become a lot faster recently with for loops too, so depending on the resources you have available you might get better results by parallelising the code you currently have. That's going to need a bit of refactoring too - the big problems are overheads while copying in data to the workers (which you can fix by chopping the inputs up into chunks and just feeding the relevant one in) and the parfor loop not allowing you to use certain variables, usually ones which cover the whole space. Again chopping them up helps.
But if you have a 2 year runtime you will need a factor of at least 100 I'm guessing, so that means a cluster! If you're at a university or somewhere where you might be able to get a few days on a 500-core cluster then go for that...
If you can write the integral in a closed form then it might be amenable to GPU computation. Those things can do certain classes of computation very fast but you have to be able to parallelise the job and reduce the actual computation to something basic comprised mainly of addition and multiplication. The CUDA libraries have done a lot of the legwork and matlab has an interface to them so have a read about those.
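For reference, MATLAB's GPU interface (in the Parallel Computing Toolbox, with a supported GPU) looks roughly like this; the array contents here are made up:

```matlab
A = gpuArray(rand(4000));   % copy data to the GPU
B = gpuArray(rand(4000));
C = exp(-A).*B.^2;          % elementwise operations run on the GPU
result = gather(C);         % copy the result back to host memory
```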
Option 3 - reduce the scope
Finally, if neither of the above results in sufficient speedups, you may have to reduce the scope of your calculation. Trim the input space as much as you can, and perhaps accept a looser convergence threshold. If you know how many iterations the innermost while loop (the one with the s counter) tends to need, it may turn out that relaxing the convergence criterion reduces that iteration count, which would speed things up. The profiler can help you see where the time is being spent.
The bottom line though is that 71 million points are going to take some time to compute. You can optimise the computation only so far, the odds are that for a problem of this size you will have to throw hardware at it.

Vectorizing nested for loops and pointwise multiplication in matlab

Long time reader, first time inquisitor. So I am currently hitting a serious bottleneck from the following code:
for kk=1:KT
parfor jj=1:KT
umodes(kk,jj,:,:,:) = repmat(squeeze(psi(kk,jj,:,:)), [1,1,no_bands]).*squeeze(umodes(kk,jj,:,:,:));
end
end
In plain language, I need to tile the multi-dimensional array 'psi' across another dimension of length 'no_bands' and then perform pointwise multiplication with the matrix 'umodes'. The issue is that each of the arrays I am working with is large, on the order of 20 GB or more.
What is happening then, I suspect, is that my processors grind to a halt due to cache limits or because data is being paged. I am reasonably convinced there is no practical way to reduce the size of my arrays, so at this point I am trying to reduce computational overhead to a bare minimum.
If that is not possible, it might be time to think of using a proper programming language where I can enforce pass by reference to avoid unnecessary replication of arrays.
Often bsxfun uses less memory than repmat. So you can try:
for kk=1:KT
parfor jj=1:KT
umodes(kk,jj,:,:,:) = bsxfun(@times, squeeze(psi(kk,jj,:,:)), squeeze(umodes(kk,jj,:,:,:)));
end
end
Or you can vectorize the two loops. Vectorizing is usually faster, although not necessarily more memory-efficient, so I'm not sure it helps in your case. In any case, bsxfun benefits from multithreading:
umodes = bsxfun(@times, psi, umodes);
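On MATLAB R2016b or newer, implicit expansion makes the bsxfun call unnecessary; singleton (including missing trailing) dimensions are expanded automatically:

```matlab
% psi's implicit fifth dimension of size 1 is expanded to no_bands.
umodes = psi .* umodes;
```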

MATLAB : Appending to pre-allocated matrix

I have some MATLAB code with mxn matrix.
Initially, I put first row in it and then the code runs through a for loop which appends remaining m-1 rows one by one; one for each iteration of the loop.
As expected, MATLAB recommends me to pre-allocate the matrix because it is expanding with every iteration of loop.
So, if I pre-allocate zeros in all m rows, MATLAB most probably will append rows after the m rows(starting from m+1 for 1st appended row) because m rows are already filled(even though with zeros!)
Is there any way of pre-allocating matrix in this scenario for improving speed?
You cannot pre-allocate a MATLAB array without also setting its size, at least not manually. However, MATLAB has improved automatic array growth performance a lot in recent versions, so you might not see a huge performance hit. Still, best practice is to pre-allocate your array with zeros and index the rows with A(i,:) = rowVec; instead of appending a row (A = [A; rowVec];).
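A sketch of that pattern (m, n and the row computation are placeholders for the question's actual code):

```matlab
m = 1000; n = 50;
A = zeros(m, n);          % pre-allocate the full matrix once
for i = 1:m
    rowVec = rand(1, n);  % stands in for whatever computes row i
    A(i,:) = rowVec;      % write into the pre-allocated row
end
```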
Pre-allocation
If you are determined to squeeze every bit of performance out of MATLAB, Yair Altman has a couple of excellent articles on the topic of memory pre-allocation:
Preallocation performance
Preallocation performance and multithreading
Automatic Array Growth Optimization
If you really want to use dynamic array resizing by growing along a dimension, there are ways to do it right. See this MathWorks blog post by Steve Eddins. The most important thing to note is that you should grow along the last dimension for best performance (i.e. add columns, in your case). Yair also discusses dynamic array resizing in another post on his blog.
Also, there are ways of allocating an array without initializing it using some hairy MEX API acrobatics, but that is rarely worth the trouble.

How does MATLAB vectorized code work "under the hood"?

I understand how using vectorization in a language like MATLAB speeds up the code by removing the overhead of maintaining a loop variable, but how does the vectorization actually take place in the assembly / machine code? I mean there still has to be a loop somewhere, right?
MATLAB's 'vectorization' concept is completely different from the vector-instruction concept, such as SSE. This is a common misunderstanding between two groups of people: MATLAB programmers and C/asm programmers. MATLAB 'vectorization', as the word is commonly used, is only about expressing loops in the form of (vectors of) matrix indices, and sometimes about writing things in terms of basic matrix/vector operations (BLAS) instead of writing the loop itself. MATLAB 'vectorized' code is not necessarily expressed as vectorized CPU instructions. Consider the following code:
A = rand(1000);
B = (A(1:2:end,:)+A(2:2:end,:))/2;
This code computes the mean of each pair of adjacent matrix rows. It is a 'vectorized' MATLAB expression. However, since MATLAB stores matrices column-wise (columns are contiguous in memory), this operation is not trivially turned into SSE vector operations: because we operate row-wise, the data that needs to be loaded into the vector registers is not stored contiguously in memory.
This code on the other hand
A = rand(1000);
B = (A(:,1:2:end)+A(:,2:2:end))/2;
can take advantage of SSE instructions and streaming instructions, since we operate on two adjacent columns at a time.
So, MATLAB 'vectorization' is not equivalent to using CPU vector instructions. It is just a word used to signify the absence of a loop implemented in MATLAB code. To add to the confusion, sometimes people even use the word to say that a loop has been implemented using a built-in function such as arrayfun or bsxfun, which is even more misleading since those functions can be significantly slower than native MATLAB loops. As robince said, not all loops are slow in MATLAB nowadays, though you do need to know when they are and when they aren't.
Either way you always need a loop; it is just implemented in MATLAB's built-in functions / BLAS instead of in the user's MATLAB code.
Yes, there is still a loop. But it runs directly in compiled code. Loops in Fortran (on which MATLAB was originally based), C, or C++ are not inherently slow. That they are slow in MATLAB is a property of the dynamic runtime (they are also slower in other dynamic languages such as Python).
Since MATLAB introduced a just-in-time (JIT) compiler, loop performance has actually increased dramatically, so the old guideline to avoid loops matters less with recent versions than it once did.