What's the maximum length of matrix I can store in Matlab - matlab

I am trying to store a matrix of size 4 x 10^6, but the Matlab can't do it when running it, it's like it can't store a matrix with that size or I should use another way to store. The code is as below:
matrix = [];
for j = 1 : 10^6
x = randn(4,1);
matrix = [matrix x];
end
The problem it still running for long time and can't finish it, however when I remove the line matrix = [matrix x]; , it finishes the loop very quickly. So what I need is to have the matrix in file so that I can use it wherever I need.

It is determined by your amount of available RAM. If you store double values, like here, you require 64 bits per number. Thus, storing 4M values requires 4*10^6*64 = 256M bits, which in turn is 32MB RAM.
A = rand(4,1e6);
whos A
Name Size Bytes Class Attributes
A 4x1000000 32000000 double
Thus you only cannot store this if you have less than 32MB RAM free.
The reason your code takes so long, is because you grow your matrix in place. The orange wiggles on the line matrix = [matrix x]; are not because the festive season is almost here, but because it is very bad practise to do this. As the warning tells you: preallocate your matrix. You know how large it will be, so just initialise it as matrix = zeros(4,1e6); instead of growing it.
Of course in this case you can simply do matrix = rand(4,1e6), which is even faster than looping.
For more information about preallocation see the official MATLAB documentation, this question (which I answered), or this one.

Related

Memory error in Matlab while solving a linear equation

I am having Out of Memory error while trying to solve a certain linear equation (I will put the code below). Since I am used to coding in C where you have every control over the objects you create I am wondering if I am using matlab inefficiently. Here is the relevant part of the code
myData(n).AMatrix = sparse(fscanf(fid2, '%f', [2*M, 2*M]));
myData(n).AMatrix = transpose(myData(n).AMatrix);
%Read the covariance^2 matrix
myData(n).CovMatrix = sparse(fscanf(fid2, '%f', [2*M,2*M]));
myData(n).CovMatrix = reshape(myData(n).CovMatrix, [4*M*M,1]);
%Kronecker sum of A with itself
I=sparse(eye(2*M));
myData(n).AA=kron( I, myData(n).AMatrix)+kron( myData(n).AMatrix,I);
myData(n).AMatrix=[];
I=[];
%Solve (A+A)x = Vec(CovMatrix)
x=myData(n).CovMatrix\myData(n).AA;
Trying to use this code I get the error
Error using \
Out of memory. Type HELP MEMORY for your options.
Error in COV (line 62)
x=myData(n).CovMatrix\myData(n).AA;
Before this piece of code I only open some files (which contain two 100x100 array of floats) so I dont think they contribute to this error. The element AMatrix is a 100 x 100 array. So the linear equation in question has dimensions 10000 x 10000. Also AA has one dimensional kernel, I dont know if this affects the numerical computations. Later I project the obtained solution to the orthogonal complement of the kernel to get the "good" solution but it comes after the error. For people who are familiar with it this is just a solution to the Lyapunov equation AX + XA = Cov. The matrix A is sparse, it has 4 50x50 sublocks one of which is all zeros, the other is identity, the other is diagonal and the other has less than 1000 non-zero elements. The matrix CovMatrix is diagonal with 50 non-zero elements in the diagonal.
The problem is at the moment I can only do the calculations on a small personal computer with 2GB RAM with 2.5-6GB of virtual memmory. When I run memmory on matlab it gives
>> memory
Maximum possible array: 311 MB (3.256e+08 bytes) *
Memory available for all arrays: 930 MB (9.749e+08 bytes) **
Memory used by MATLAB: 677 MB (7.102e+08 bytes)
Physical Memory (RAM): 1931 MB (2.025e+09 bytes)
I am not very knowledgable when it comes to memory so I am open to even simple advices. Thanks.
Complex functions usually allocate temp memory during computation. 10000x10000 looks quite large if a temp dense matrix of such size is allocated during the computation. You could try a few smaller problem sizes and find out the upper limit of your current computer.

MATLAB spending an incredible amount of time writing a relatively small matrix

I have a small MATLAB script (included below) for handling data read from a CSV file with two columns and hundreds of thousands of rows. Each entry is a natural number, with zeros only occurring in the second column. This code is taking a truly incredible amount of time (hours) to run what should be achievable in at most some seconds. The profiler identifies that approximately 100% of the run time is spent writing a matrix of zeros, whose size varies depending on input, but in all usage is smaller than 1000x1000.
The code is as follows
function [data] = DataHandler(D)
n = size(D,1);
s = max(D,1);
data = zeros(s,s);
for i = 1:n
data(D(i,1),D(i,2)+1) = data(D(i,1),D(i,2)+1) + 1;
end
It's the data = zeros(s,s); line that takes around 100% of the runtime. I can make the code run quickly by just changing out the s's in this line for 1000, which is a sufficient upper bound to ensure it won't run into errors for any of the data I'm looking at.
Obviously there're better ways to do this, but being that I just bashed the code together to quickly format some data I wasn't too concerned. As I said, I fixed it by just replacing s with 1000 for my purposes, but I'm perplexed as to why writing that matrix would bog MATLAB down for several hours. New code runs instantaneously.
I'd be very interested if anyone has seen this kind of behaviour before, or knows why this would be happening. Its a little disconcerting, and it would be good to be able to be confident that I can initialize matrices freely without killing MATLAB.
Your call to zeros is incorrect. Looking at your code, D looks like a D x 2 array. However, your call of s = max(D,1) would actually generate another D x 2 array. By consulting the documentation for max, this is what happens when you call max in the way you used:
C = max(A,B) returns an array the same size as A and B with the largest elements taken from A or B. Either the dimensions of A and B are the same, or one can be a scalar.
Therefore, because you used max(D,1), you are essentially comparing every value in D with the value of 1, so what you're actually getting is just a copy of D in the end. Using this as input into zeros has rather undefined behaviour. What will actually happen is that for each row of s, it will allocate a temporary zeros matrix of that size and toss the temporary result. Only the dimensions of the last row of s is what is recorded. Because you have a very large matrix D, this is probably why the profiler hangs here at 100% utilization. Therefore, each parameter to zeros must be scalar, yet your call to produce s would produce a matrix.
What I believe you intended should have been:
s = max(D(:));
This finds the overall maximum of the matrix D by unrolling D into a single vector and finding the overall maximum. If you do this, your code should run faster.
As a side note, this post may interest you:
Faster way to initialize arrays via empty matrix multiplication? (Matlab)
It was shown in this post that doing zeros(n,n) is in fact slow and there are several neat tricks to initializing an array of zeros. One way is to accomplish this by empty matrix multiplication:
data = zeros(n,0)*zeros(0,n);
One of my personal favourites is that if you assume that data was not declared / initialized, you can do:
data(n,n) = 0;
If I can also comment, that for loop is quite inefficient. What you are doing is calculating a 2D histogram / accumulation of data. You can replace that for loop with a more efficient accumarray call. This also avoids allocating an array of zeros and accumarray will do that under the hood for you.
As such, your code would basically become this:
function [data] = DataHandler(D)
data = accumarray([D(:,1) D(:,2)+1], 1);
accumarray in this case will take all pairs of row and column coordinates, stored in D(i,1) and D(i,2) + 1 for i = 1, 2, ..., size(D,1) and place all that match the same row and column coordinates into a separate 2D bin, we then add up all of the occurrences and the output at this 2D bin gives you the total tally of how many values at this 2D bin which corresponds to the row and column coordinate of interest mapped to this location.

Reading time series of NII images getting slower

I am developing a program to read a time series of NIfTY format images to a 4D matrix in MATLAB. There are about 60 images in the stack and the program runs without problems until the 28th image. (All the images are approximately same size, same details) But after that the reading get slower and slower.
In fact, the delay is accumulating.
I checked the program again and there are no open files. Everything looks fine.
Can someone give me an advice?
Size of current array (double)
Unless you are running on a machine with more than ~20GB RAM memory your matrix simply becomes too large to handle.
To check the size of the first three dimensions of your matrix:
A = rand(512,512,160);
whos('A')
Output:
Name Size Bytes Class Attributes
A 512x512x160 335544320 double
Now multiply by 60 to obtain the size of your 4D matrix and divide by 1024^3 to obtain GB's:
335544320*60/1024^3 = 18.7500 GB
So yes, your matrix is most likely too large to handle efficiently/effectively.
A matrix exceeding your RAM memory forces MatLab to use the swap file (HDD/SSD) which is orders of magnitude slower than your random access memory (even if you have a SSD).
Switch to different data types
I you do not require double precision, i.e. 16 digits of accuracy, you can always switch to less digits, i.e. single precision floating point numbers. By doing this you can reduce size. You can even reduce size further is the numbers are for example unsigned integers in the range of 0-255. See code below:
% Create doubles
A_double = rand(512,512,160);
S1=whos('A_double');
% Create floats
A_float = single(A_double);
S2=whos('A_float');
% Create unsigned int range 0-255
A_uint=uint8(randi(256,[512,512,160])-1);
S3=whos('A_uint');
fprintf('Size A_double is %4.2f GB\n',(S1.bytes*60)/1024^3)
fprintf('Size A_float is %4.2f GB\n',(S2.bytes*60)/1024^3)
fprintf('Size A_uint is %4.2f GB\n',(S3.bytes*60)/1024^3)
Output:
Size A_double is 18.75 GB
Size A_float is 9.38 GB
Size A_uint is 2.34 GB
Which may just fit inside your RAM. Make sure you indeed pre-allocate memory first, i.e. create an empty matrix using the zeros() function.

Matlab fast neighborhood operation

I have a Problem. I have a Matrix A with integer values between 0 and 5.
for example like:
x=randi(5,10,10)
Now I want to call a filter, size 3x3, which gives me the the most common value
I have tried 2 solutions:
fun = #(z) mode(z(:));
y1 = nlfilter(x,[3 3],fun);
which takes very long...
and
y2 = colfilt(x,[3 3],'sliding',#mode);
which also takes long.
I have some really big matrices and both solutions take a long time.
Is there any faster way?
+1 to #Floris for the excellent suggestion to use hist. It's very fast. You can do a bit better though. hist is based on histc, which can be used instead. histc is a compiled function, i.e., not written in Matlab, which is why the solution is much faster.
Here's a small function that attempts to generalize what #Floris did (also that solution returns a vector rather than the desired matrix) and achieve what you're doing with nlfilter and colfilt. It doesn't require that the input have particular dimensions and uses im2col to efficiently rearrange the data. In fact, the the first three lines and the call to im2col are virtually identical to what colfit does in your case.
function a=intmodefilt(a,nhood)
[ma,na] = size(a);
aa(ma+nhood(1)-1,na+nhood(2)-1) = 0;
aa(floor((nhood(1)-1)/2)+(1:ma),floor((nhood(2)-1)/2)+(1:na)) = a;
[~,a(:)] = max(histc(im2col(aa,nhood,'sliding'),min(a(:))-1:max(a(:))));
a = a-1;
Usage:
x = randi(5,10,10);
y3 = intmodefilt(x,[3 3]);
For large arrays, this is over 75 times faster than colfilt on my machine. Replacing hist with histc is responsible for a factor of two speedup. There is of course no input checking so the function assumes that a is all integers, etc.
Lastly, note that randi(IMAX,N,N) returns values in the range 1:IMAX, not 0:IMAX as you seem to state.
One suggestion would be to reshape your array so each 3x3 block becomes a column vector. If your initial array dimensions are divisible by 3, this is simple. If they don't, you need to work a little bit harder. And you need to repeat this nine times, starting at different offsets into the matrix - I will leave that as an exercise.
Here is some code that shows the basic idea (using only functions available in FreeMat - I don't have Matlab on my machine at home...):
N = 100;
A = randi(0,5*ones(3*N,3*N));
B = reshape(permute(reshape(A,[3 N 3 N]),[1 3 2 4]), [ 9 N*N]);
hh = hist(B, 0:5); % histogram of each 3x3 block: bin with largest value is the mode
[mm mi] = max(hh); % mi will contain bin with largest value
figure; hist(B(:),0:5); title 'histogram of B'; % flat, as expected
figure; hist(mi-1, 0:5); title 'histogram of mi' % not flat?...
Here are the plots:
The strange thing, when you run this code, is that the distribution of mi is not flat, but skewed towards smaller values. When you inspect the histograms, you will see that is because you will frequently have more than one bin with the "max" value in it. In that case, you get the first bin with the max number. This is obviously going to skew your results badly; something to think about. A much better filter might be a median filter - the one that has equal numbers of neighboring pixels above and below. That has a unique solution (while mode can have up to four values, for nine pixels - namely, four bins with two values each).
Something to think about.
Can't show you a mex example today (wrong computer); but there are ample good examples on the Mathworks website (and all over the web) that are quite easy to follow. See for example http://www.shawnlankton.com/2008/03/getting-started-with-mex-a-short-tutorial/

matlab: out of memory when concatenating sparse matrix with a vector

I create a 3560 x 3560 sparse matrix, A. I then create two 1 X 3560 vectors, S and T.
When I run the following code (which concatenates S and T as rows in A and afterwards also as columns in A)
A=[A;S;T];
S=[S 0 0];
T=[T 0 0];
A=[A, S', T'];
The last line produces an out of memory error.
I guess I am running out of memory since I have other variables stored, but it seems odd to me that adding two 3560 vectors would be the point in which I am exactly hitting my limit, so I think (or more accurately, wishfully think) that somehow the concatenations aren't done in a smart way...
Am I right or is there no hope (except for optimizing other pieces in my code)?
EDIT:
At the request of yoda, I am posting the full code.
Basically what it does is get a N X N matrix of edge weights between the nodes of a graph, and adds two vectors that will act as a source and sink in a max flow computation.
nbr_sim(nbr_sim<0.8)=0;
A=sparse(size(nbr_sim,1)+2,size(nbr_sim,2)+2);
nelements=size(nbr_sim,1);
A(nbr_sim>0)=nbr_sim(nbr_sim>0);
clear nbr_sim;
S=abs([1 0 0]*n);
T=abs([0 1 0]*n);
A(1:nelements,end-1)=S';
A(1:nelements,end)=T';
A(end-1,1:nelements)=S;
A(end,1:nelements)=T;
EDIT:
As you say you have used considerable resources before this operation, it is entirely likely that you are close to the tipping point, when MATLAB gives you an out of memory error.
Remember that when you grow matrices on the fly either by concatenating or by indexing out of range, MATLAB creates a copy of the matrix in memory. So you're not just using up resources for that extra row, but for a copy of that entire matrix!
Here's an example on my machine where I try to grow a vector that's large enough to tip it over the memory limit.
clear
a=rand(2*10^9+1,1); %#create a large array
whos a
Name Size Bytes Class Attributes
a 2000000001x1 16000000008 double
%#Now repeat the same, but by growing the array by one element
clear
a=rand(2*10^9,1);
a=[a;0];
??? Error using ==> vertcat
Out of memory. Type HELP MEMORY for your options.
So you see that although MATLAB can create a matrix with 2*10^9+1 elements in one go, when you try to create an array of the same size by append a single element to a 2*10^9 element vector, it runs out of memory.
If S and T are column vectors as you say, then A=[A;S;T] should give you an error:
??? Error using ==> vertcat
CAT arguments dimensions are not consistent.
So you must be doing something else. Concatenating will not change sparseness of the matrix i.e., it won't switch from sparse to full.
A=sprand(3560,3560,0.01); %#test matrices
S=rand(3560,1);
T=rand(3560,1);
B=[A,S,T]; %#join the columns
issparse(B)
ans =
1
Moreover, a 3560x3560 matrix of doubles is only ~97 MB, which shouldn't give you an "out of memory" error...
When dealing with large matrix:
For full matrix, you'd better preallocate memory to avoid memory copy during extending.see why
The sparse case is more complicated, and can be even less efficiency than extending in full matrix, because the elements is stored in a compressed manner. Setting an "inner" entry may cause large memory overwrites(have a look here).
So you'd better edit all the entries in advance and create with sparse() function, rather than call sparse() and then pad the data.