I have been working with a few not particularly well designed or documented matlab scripts provided by my prof. I am required to modify them in order to make them do what is required by the specification, but it is not immediately clear how large certain matrices will become. I know if done properly the problems should not consume excessive resources, but more than a few times now I've had matlab hang trying to allocate huge blocks of memory. I suspect this is because I multiplied a matrix I did not fully understand in the wrong order, resulting in a larger matrix, rather than for example a scalar.
Is there a way to have matlab prompt me if it finds it will need to allocate a matrix over a certain number of bytes? That way while I'm debugging/maintaining code I'll have an opportunity to cancel, before it goes berserk.
I've never used this toolbox before, and I have a very large problem (i.e. a large number of variables) to be optimized. I'm aware it's possible to optimize the Hessian computation, which is my issue given the error:
Error using eye
Requested 254016x254016 (480.7GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may
take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information.
But according to this quote (from a forum) it must be possible to optimize the hessian computation:
If you are going to use the trust-region algorithm, you will need to
choose some combination of the options 'Hessian', 'HessMult', and
'HessPattern' to avoid full, explicit computation of the Hessian.
I'm struggling to find examples of these settings. Does anyone know of any?
My problem is a sparse problem, if such information is necessary.
Basically I'm sure there's some extra options to be put in a line like:
option = optimoptions(@fminunc,...
'Display','iter','GradObj','on','MaxIter',30,...
'ObjectiveLimit',10e-10,'Algorithm','quasi-newton');
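You probably need to add 'HessPattern',Hstr to optimoptions, and set 'Algorithm' to 'trust-region', since the quasi-newton algorithm does not use the sparsity pattern. An example is given here (in this example, Hstr is defined in brownhstr.mat; you need to calculate your own Hessian sparsity pattern matrix Hstr).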
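A minimal sketch of building such a pattern, assuming (purely for illustration) that each variable only interacts with its immediate neighbours, so the Hessian is tridiagonal. The size n and the objective name myObjective are hypothetical, the option names follow the legacy style used in the question, and the Optimization Toolbox is required:

```matlab
n = 254016;                              % number of variables, as in the error message
e = ones(n, 1);
% Sparse 0/1 matrix: Hstr(i,j) = 1 wherever the Hessian entry may be nonzero.
% The tridiagonal structure here is an assumption; use your own problem's structure.
Hstr = spdiags([e e e], -1:1, n, n);

options = optimoptions('fminunc', ...
    'Algorithm', 'trust-region', ...     % the algorithm that uses HessPattern
    'GradObj', 'on', ...                 % the objective must also return its gradient
    'HessPattern', Hstr, ...
    'Display', 'iter');

% x = fminunc(@myObjective, x0, options);  % myObjective and x0 are placeholders
```

Stored this way, Hstr needs memory proportional to its number of nonzeros (3n-2 here) rather than n^2.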
I have a matrix X, size 40-by-60000
while writing the SVM, I need to form a linear kernel: K = X'*X
And of course I would get an error
Requested 60000x60000 (26.8GB) array exceeds maximum array size preference.
How is it usually done? The data set is MNIST, so someone must have done this before. In this case rank(K) <= 40, so I need a way to store K and later pass it to quadprog.
How is it usually done?
Usually kernel matrices for big datasets are not precomputed. Since the optimisation methods used (like SMO or gradient descent) only need access to a subset of samples in each iteration, you simply need a data structure which is a lazy kernel matrix. In other words, each time the optimiser requests K[i,j], you compute K(xi,xj) on the spot. Often there are also caching mechanisms to make sure that frequently requested kernel values are already prepared.
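A rough sketch of the lazy idea for the linear kernel, using a smaller random stand-in for the 40-by-60000 data. The name kfun is made up for illustration, and no caching is shown:

```matlab
X = randn(40, 500);                    % D-by-N stand-in for the real data

% Lazy linear kernel: returns the block K(i,j) for index vectors i and j,
% computed only when asked for. The full N-by-N matrix is never stored.
kfun = @(i, j) X(:, i).' * X(:, j);

Kblock = kfun(1:3, [10 20]);           % 3-by-2 block of kernel values
```

Each call costs O(D) per requested entry, which is cheap here because D is only 40.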
If you're willing to commit to a linear kernel (or any other kernel whose corresponding feature transformation is easily computed) you can avoid allocating O(N^2) memory by using a primal optimization method, which does not construct the full kernel matrix K.
Primal methods represent the model using a weighted sum of the training samples' features, and so will only take O(NxD) memory, where N and D are the number of training samples and their feature dimension.
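To see why the primal form is cheaper, here is a small sketch (not a solver) showing that a product with K can be evaluated through a D-by-1 weight vector. The coefficient vector a is random and purely hypothetical:

```matlab
D = 40; N = 500;                 % small stand-ins for 40 and 60000
X = randn(D, N);
a = randn(N, 1);                 % hypothetical per-sample weights

w = X * a;                       % D-by-1 weighted sum of the samples' features
scores = X.' * w;                % N-by-1, the same values as K*a with K = X'*X
```

Only X (N*D numbers) and w (D numbers) are ever held in memory, and the N-by-N matrix never appears.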
You could also use liblinear (if you resolve the C++ issues).
Note this comment from their website: "Without using kernels, one can quickly train a much larger set via a linear classifier."
This problem occurs because the requested array is larger than the RAM available in your system. You'll also want to check whether you are running 32-bit or 64-bit MATLAB: a 32-bit process can only address a few GB, no matter how much RAM is installed.
I have three big 3D arrays of the same size [41*141*12403], named alpha, beta and ni in the Matlab code below. From them I need to calculate another 3D array of the same size, which is obtained elementwise from the original arrays through a calculation that combines an infinite sum and definite integrals, using the value of each element. It therefore seems inevitable to have to use several nested loops to make this calculation. The code has already been running for several hours(!) and it is still in the first iteration of the outer loop (which needs to be performed 41 times!! According to my calculation, at this rate the program will have to run for more than two years!!!). I don't know how to optimize the code. Please help me !!
the code I use:
z_len=size(KELDYSH_PARAM_r_z_t,1); % 41 rows
r_len=size(KELDYSH_PARAM_r_z_t,2); % 141 columns
t_len=size(KELDYSH_PARAM_r_z_t,3); % 12403 slices
sumRes=zeros(z_len,r_len,t_len);
for z_ind=1:z_len
    z_ind % in order to track the advancement of the calculation
    for r_ind=1:r_len
        for t_ind=1:t_len
            sumCurrent=0;
            sumPrevious=inf;
            s=0;
            while abs(sumPrevious-sumCurrent)>1e-6
                kapa=kapa_0+s; %some scalar
                x_of_w=(beta(z_ind,r_ind,t_ind).*(kapa-ni(z_ind,r_ind,t_ind))).^0.5;
                sumPrevious=sumCurrent;
                sumCurrent=sumCurrent+exp(-alpha(z_ind,r_ind,t_ind).* ...
                    (kapa-ni(z_ind,r_ind,t_ind))).*(x_of_w.^(2*abs(m)+1)/2).* ...
                    w_m_integral(x_of_w,m);
                s=s+1;
            end
            sumRes(z_ind,r_ind,t_ind)=sumCurrent;
        end
    end
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function res=w_m_integral(x_of_w,m)
    % integrate over t from 0 to 1 with absolute tolerance 1e-6
    res=quad(@integrandFun,0,1,1e-6);
    function y=integrandFun(t)
        y=exp(-x_of_w^2*t).*t.^(abs(m))./((1-t).^0.5);
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Option 1 - more vectorising
It's a pretty complex model you're working with and not all the terms are explained, but some parts can still be further vectorised. Your alpha, beta and ni matrices are presumably static and precomputed? Your s value is a scalar and kapa could be either, so you can probably precompute the x_of_w matrix all in one go too. This would give you a very slight speedup all on its own, though you'd be spending memory to get it - 71 million points is doable these days but will call for an awful lot of hardware. Doing it once for each of your 41 rows would reduce the burden neatly.
That leaves the integral itself. The quad function expects a scalar-valued integrand - a vectorised version would be a nightmare, wouldn't it? - and integral, which MathWorks recommend you use instead, only handles array-valued integrands through its 'ArrayValued' option. But if your integration limits are the same in each case then why not do the integral the old-fashioned way? If the integrand has a closed-form antiderivative, compute a matrix of its values at 1, compute another matrix of its values at 0 and then take the difference.
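If no closed form is at hand, one alternative (a different technique from the one described above) is a fixed-grid quadrature that handles many x_of_w values in one call. The sketch below assumes the integrand from the question and uses the substitution t = 1 - u^2, which removes the 1/sqrt(1-t) singularity at t = 1. The x values and the grid size are made up, and the accuracy of the fixed grid should be checked against quad for your parameter range:

```matlab
m = 1;                               % same role as m in the question
x = [0 0.5 1 2];                     % several x_of_w values at once (made-up numbers)
u = linspace(0, 1, 2001).';          % fixed quadrature nodes, column vector
w = 1 - u.^2;                        % this is t expressed through u

% Integrand after the substitution: 2*exp(-x^2*t).*t.^abs(m), one column per x.
F = 2 * exp(-w * x.^2) .* (w.^abs(m) * ones(1, numel(x)));

res = trapz(u, F);                   % 1-by-numel(x): integrates down each column
```

For x = 0 and m = 1 the exact value is 4/3, which gives a quick sanity check on the grid.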
Then you can write a single loop that computes the integral for the whole input space then tests the convergence for all the matrix elements. Make a mask that notes the ones that have not converged and recalculate those with the increased s. Repeat until all have converged (or you hit a threshold for iterations).
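A minimal sketch of that masked loop, with a deliberately simple stand-in for the real term (a geometric series, so the result can be checked by hand). The array a and the term formula are hypothetical. The stopping rule mirrors the question's: an element stops once its latest term changes the sum by no more than 1e-6.

```matlab
a = [0.5 1.0; 2.0 3.0];              % hypothetical per-element decay rates
S = zeros(size(a));                  % running sums, one per element
active = true(size(a));              % mask of elements that have not converged
s = 0; tol = 1e-6; maxIter = 1000;

while any(active(:)) && s < maxIter
    term = exp(-a(active) .* s);     % evaluate only the unconverged elements
    S(active) = S(active) + term;
    active(active) = abs(term) > tol;   % drop the elements that just converged
    s = s + 1;
end
```

Elements that converge quickly drop out of the mask early, so later iterations only touch the slow ones.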
Option 2 - parallelise it
It used to be the case that matlab was much faster with vectorised operations than loops. I can't find a source for it now but I think I've read that it's become a lot faster recently with for loops too, so depending on the resources you have available you might get better results by parallelising the code you currently have. That's going to need a bit of refactoring too - the big problems are overheads while copying in data to the workers (which you can fix by chopping the inputs up into chunks and just feeding the relevant one in) and the parfor loop not allowing you to use certain variables, usually ones which cover the whole space. Again chopping them up helps.
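A small sketch of the chopping-up idea, with a toy elementwise calculation standing in for the real one. A3 is a made-up name and size. Indexing by the loop variable lets parfor send each worker only its own slab. As far as I know, parfor simply runs as an ordinary loop when the Parallel Computing Toolbox is not available.

```matlab
A3 = rand(4, 5, 6);                  % toy stand-in for one of the big 3D arrays
out = zeros(size(A3));

parfor z = 1:size(A3, 1)
    slab = A3(z, :, :);              % sliced input: only this slab is sent over
    out(z, :, :) = exp(-slab);       % sliced output, placeholder for the real work
end
```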
But if you have a 2 year runtime you will need a factor of at least 100 I'm guessing, so that means a cluster! If you're at a university or somewhere where you might be able to get a few days on a 500-core cluster then go for that...
If you can write the integral in a closed form then it might be amenable to GPU computation. Those things can do certain classes of computation very fast but you have to be able to parallelise the job and reduce the actual computation to something basic comprised mainly of addition and multiplication. The CUDA libraries have done a lot of the legwork and matlab has an interface to them so have a read about those.
Option 3 - reduce the scope
Finally, if neither of the above two results in sufficient speedups, then you may have to reduce the scope of your calculation. Trim the input space as much as you can and perhaps accept a looser convergence threshold. If you know how many iterations you tend to need inside the innermost while loop (the one with the s counter in it) then it might turn out that loosening the convergence criterion reduces the number of iterations you need, which could speed it up. The profiler can help you see where you're spending your time.
The bottom line though is that 71 million points are going to take some time to compute. You can optimise the computation only so far, the odds are that for a problem of this size you will have to throw hardware at it.
Problem: A is square, full rank, sparse and banded. It has way too many elements to be stored as a single matrix in Matlab (at least ~4.6*10^18 and ideally ~10^40, both of which exceed max array size). EDIT: A is stored as sparse, and the problem is not with limited memory but with limited number of elements. Therefore I have to store it as a collection of smaller arrays (rows/diagonals/columns/blocks).
Looking for: a way to solve Ax=b, with A given as a collection of smaller arrays. Ideally in Matlab but not a must.
Alternatively, if not in Matlab: maybe there's a program that can store and solve such a big A?
Found so far: methods if A is tri/pentadiagonal, but my A has N diagonals. Also found something about partitioning A to blocks, but couldn't find a way to then solve a linear system with these blocks.
p.s. The system is 64-bit.
Thanks everyone!
Not using Matlab would allow you to store larger arrays. ROOT is an open source framework developed at CERN that has C++ and Python interfaces and a variety of solvers. It is also capable of handling huge datasets and has a variety of visualization and analysis tools as well.
If you are interested in writing C or Fortran, BLAS (Basic Linear Algebra Subprograms) and CBLAS would be good options. There are many open source and proprietary implementations of BLAS that should be available for most Linux/UNIX distributions. There are also plenty of examples available online showing how to use the BLAS subroutines in C and Fortran code.
If you have access to MATLAB's Parallel Computing Toolbox together with MATLAB Distributed Computing Server, you may be able to store A as a distributed array, in other words a single array whose elements are distributed across the memories of multiple machines in a cluster. You can call MATLAB's backslash command directly on a distributed array, and MATLAB handles the parallelization for you.
I wanted to put this as a comment, but I think it is better to state it as an answer.
You have a serious problem. It is not only a problem of indexing, it is also a problem of memory: 4.6x10^18 is huge. That is 4.6 exa elements. If you store them as real single precision, you need 4x4.6 = 18.4 exabytes of memory. A computer with such a huge memory does not yet exist, to my knowledge. You would need to gather all the storage (hard disk, not RAM) of a significant proportion of all computers in the world to store such a matrix. Think about it. Going to 10^40 elements is nearly impractical for the time being. With your 64-bit computer, the 64-bit address space can barely address 4.6x10^18 elements. A 64-bit address (or integer) makes it possible to directly index 2^64 elements, which is roughly 1.8x10^19. So you have to think twice.
Going back to the problem itself, there are chances that you can turn your matrix into an implicit operator. By implicit operator, I mean, you do not need to store it, because it has a pattern that you know how to reproduce, or you can apply it to a vector without actually forming the matrix. If you have the matrix in hand, you are very likely in this situation, considering what I said above.
If that is the case, to solve your problem, you simply need to use an iterative solver and provide a black box that does your matrix multiplication. Going to other directions might be a waste of your time.
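A minimal sketch of such a black box, assuming for illustration a tridiagonal operator with 4 on the diagonal and -1 on the two neighbouring diagonals. The operator and its size are made up. pcg is used because this toy operator is symmetric positive definite. gmres and bicgstab also accept a function handle in place of a matrix and do not need symmetry.

```matlab
n = 1000;                            % toy size; the operator is never stored
b = ones(n, 1);

% Black box returning A*x for the banded operator, without forming A.
afun = @(x) 4*x - [x(2:end); 0] - [0; x(1:end-1)];

[x, flag] = pcg(afun, b, 1e-10, 200);    % flag == 0 means it converged
relres = norm(afun(x) - b) / norm(b);    % true relative residual
```

The memory footprint is a handful of length-n vectors, so the limit on the number of matrix elements never comes into play. The vectors themselves still have to fit, though.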
I understand how using vectorization in a language like MATLAB speeds up the code by removing the overhead of maintaining a loop variable, but how does the vectorization actually take place in the assembly / machine code? I mean there still has to be a loop somewhere, right?
The Matlab 'vectorization' concept is completely different from the concept of vector instructions, such as SSE. This is a common misunderstanding between two groups of people: matlab programmers and C/asm programmers. Matlab 'vectorization', as the word is commonly used, is only about expressing loops in the form of (vectors of) matrix indices, and sometimes about writing things in terms of basic matrix/vector operations (BLAS), instead of writing the loop itself. Matlab 'vectorized' code is not necessarily expressed as vectorized CPU instructions. Consider the following code:
A = rand(1000);
B = (A(1:2:end,:)+A(2:2:end,:))/2;
This code computes mean values for two adjacent matrix rows. It is a 'vectorized' matlab expression. However, since matlab stores matrices column-wise (columns are contiguous in memory), this operation is not trivially changed into operations on SSE vectors: since we perform the operations row-wise the data you need to load into the vectors is not stored contiguously in the memory.
This code on the other hand
A = rand(1000);
B = (A(:,1:2:end)+A(:,2:2:end))/2;
can take advantage of SSE instructions and streaming instructions, since we operate on two adjacent columns at a time.
So, matlab 'vectorization' is not equivalent to using CPU vector instructions. It is just a word used to signify the lack of a loop implemented in MATLAB. To add to the confusion, sometimes people even use the word to say that some loop has been implemented using a built-in function, such as arrayfun or bsxfun, which is even more misleading since those functions might be significantly slower than native matlab loops. As robince said, not all loops are slow in matlab nowadays, though you do need to know when they work, and when they don't.
And in any case you always need a loop; it is just implemented in matlab built-in functions / BLAS instead of in the user's matlab code.
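To make that concrete, here is a small sketch that writes out the loops hidden behind the row-averaging expression from earlier and checks that both give the same result. The matrix size and the names Bvec and Bloop are arbitrary:

```matlab
A = rand(6, 4);

% 'Vectorized' form: the loops run inside MATLAB's built-in operators.
Bvec = (A(1:2:end, :) + A(2:2:end, :)) / 2;

% The same computation with the loops written out by hand.
Bloop = zeros(size(A, 1) / 2, size(A, 2));
for j = 1:size(A, 2)
    for i = 1:size(Bloop, 1)
        Bloop(i, j) = (A(2*i - 1, j) + A(2*i, j)) / 2;
    end
end
```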
Yes, there is still a loop. But it is able to loop directly in compiled code. Loops in Fortran (on which Matlab was originally based), C or C++ are not inherently slow. That they are slow in Matlab is a property of the dynamic runtime (they are also slower in other dynamic languages like Python).
Since Matlab introduced a Just-In-Time compiler, loop performance has actually increased dramatically - so the old guidelines to avoid loops are less important with recent versions than they once were.