A memory-efficient replacement for meshgrid - matlab

Assume a simple example where I have indices
index_pos = [3,4,5];
index_neg = [1,2];
I would like to have a matrix:
result =
1 3
2 3
1 4
2 4
1 5
2 5
For this purpose I write the following code:
[X,Y] = meshgrid(index_pos,index_neg);
result = [Y(:) X(:)];
I don't think this is a very efficient way. It also uses too much memory on big instances; I get the following error:
Error using repmat
Out of memory. Type "help memory" for your options.
Error in meshgrid (line 58)
xx = repmat(xrow,size(ycol));
Error in FME_funct (line 36)
[X,Y] = meshgrid(index_pos,index_neg);
Is there any 'clever' way to generate this matrix using less memory?
PS: I noticed that what I do is also given here. Most probably I have found this idea from there.

This depends entirely on how big your two variables are in relation to the amount of memory in your computer (plus the types of numbers you're using).
Try this:
res = zeros(numel(index_neg)*numel(index_pos), 2)
If that gives you an out-of-memory error, then you don't have enough memory in your computer to store the result at all, regardless of how efficiently you generate it, so you're stuck. If it does not error, then you could well write a looping algorithm that uses less temporary memory.
That said, by default MATLAB represents numbers with double precision, 8 bytes per number. If your index_ variables happen to contain, say, only positive integers (all less than 65,536) then you could use 16-bit unsigned integers. These are just 2 bytes per number and so take up 4 times less space than doubles. You can test this with:
res = zeros(numel(index_neg)*numel(index_pos), 2, 'uint16')
Finally you can find out how much memory is available to MATLAB with the memory command.
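As a sketch of such a low-temporary approach (assuming your values fit the chosen class; repelem needs R2015a or newer):

```matlab
index_pos = [3 4 5];
index_neg = [1 2];
np = numel(index_pos);
nn = numel(index_neg);
res = zeros(nn*np, 2);                    % the only large allocation
res(:,1) = repmat(index_neg(:), np, 1);   % 1;2;1;2;1;2
res(:,2) = repelem(index_pos(:), nn);     % 3;3;4;4;5;5
```

The temporaries here are each a single column, so peak memory is well below meshgrid's two full intermediate matrices.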

Here is a faster way to generate such a matrix. It avoids meshgrid's intermediate repmat copies by building each column directly:
res2 = [ reshape( bsxfun( @times , index_neg.' , ones(size(index_pos)) ) , [] , 1 ) , ...
reshape( bsxfun( @times , index_pos , ones(size(index_neg)).' ) , [] , 1 ) ] ;
Note that this requires the same amount of memory to hold the main array, so it will not let you generate arrays any larger than your method does (which fails at the meshgrid stage). That maximum size is ultimately dictated by the amount of RAM available to your system.
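If even the final array does not fit in RAM, one option is to never materialize it at all and instead compute each chunk of pairs on the fly from linear indices (a sketch; chunk is an assumed tuning parameter):

```matlab
index_pos = [3 4 5];
index_neg = [1 2];
np = numel(index_pos);
nn = numel(index_neg);
chunk = 1e5;                                 % pairs per chunk, tune to your RAM
for k = 1:chunk:np*nn
    idx = k:min(k+chunk-1, np*nn);           % linear indices of the pairs
    neg = index_neg(mod(idx-1, nn) + 1);     % first column of those pairs
    pos = index_pos(floor((idx-1)/nn) + 1);  % second column of those pairs
    pairs = [neg(:) pos(:)];
    % ... process this chunk of pairs here ...
end
```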

Related

What's the maximum length of matrix I can store in Matlab

I am trying to store a matrix of size 4 x 10^6, but Matlab can't do it when I run the code; it seems it can't store a matrix of that size, or I should use another way to store it. The code is as below:
matrix = [];
for j = 1 : 10^6
x = randn(4,1);
matrix = [matrix x];
end
The problem is that it keeps running for a long time and never finishes; however, when I remove the line matrix = [matrix x]; it finishes the loop very quickly. What I need is to have the matrix in a file so that I can use it wherever I need it.
It is determined by your amount of available RAM. If you store double values, like here, you require 64 bits per number. Thus, storing 4M values requires 4*10^6*64 = 256M bits, which in turn is 32MB RAM.
A = rand(4,1e6);
whos A
Name Size Bytes Class Attributes
A 4x1000000 32000000 double
Thus you cannot store this only if you have less than 32MB of free RAM.
The reason your code takes so long is that you grow your matrix in place. The orange wiggles on the line matrix = [matrix x]; are not because the festive season is almost here, but because it is very bad practice to do this. As the warning tells you: preallocate your matrix. You know how large it will be, so just initialise it as matrix = zeros(4,1e6); instead of growing it.
Of course in this case you can simply do matrix = rand(4,1e6), which is even faster than looping.
For more information about preallocation see the official MATLAB documentation, this question (which I answered), or this one.
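A preallocated version of the loop, plus saving the result to disk (which addresses the "have the matrix in a file" part of the question), might look like this:

```matlab
matrix = zeros(4, 1e6);            % preallocate once: no growing inside the loop
for j = 1:1e6
    matrix(:, j) = randn(4, 1);    % fill column j in place
end
save('matrix.mat', 'matrix');      % write to file; reload later with load('matrix.mat')
```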

Memory error in Matlab while solving a linear equation

I am getting an Out of Memory error while trying to solve a certain linear equation (I will put the code below). Since I am used to coding in C, where you have full control over the objects you create, I am wondering if I am using Matlab inefficiently. Here is the relevant part of the code:
myData(n).AMatrix = sparse(fscanf(fid2, '%f', [2*M, 2*M]));
myData(n).AMatrix = transpose(myData(n).AMatrix);
%Read the covariance^2 matrix
myData(n).CovMatrix = sparse(fscanf(fid2, '%f', [2*M,2*M]));
myData(n).CovMatrix = reshape(myData(n).CovMatrix, [4*M*M,1]);
%Kronecker sum of A with itself
I=sparse(eye(2*M));
myData(n).AA=kron( I, myData(n).AMatrix)+kron( myData(n).AMatrix,I);
myData(n).AMatrix=[];
I=[];
%Solve (A+A)x = Vec(CovMatrix)
x=myData(n).CovMatrix\myData(n).AA;
Trying to use this code I get the error
Error using \
Out of memory. Type HELP MEMORY for your options.
Error in COV (line 62)
x=myData(n).CovMatrix\myData(n).AA;
Before this piece of code I only open some files (which contain two 100x100 arrays of floats), so I don't think they contribute to this error. The element AMatrix is a 100 x 100 array, so the linear equation in question has dimensions 10000 x 10000. Also, AA has a one-dimensional kernel; I don't know if this affects the numerical computations. Later I project the obtained solution onto the orthogonal complement of the kernel to get the "good" solution, but that comes after the error. For people who are familiar with it, this is just a solution to the Lyapunov equation AX + XA = Cov. The matrix A is sparse: it has four 50x50 sub-blocks, one of which is all zeros, one is the identity, one is diagonal, and one has fewer than 1000 non-zero elements. The matrix CovMatrix is diagonal with 50 non-zero elements on the diagonal.
The problem is that at the moment I can only do the calculations on a small personal computer with 2GB RAM and 2.5-6GB of virtual memory. When I run memory in Matlab it gives
>> memory
Maximum possible array: 311 MB (3.256e+08 bytes) *
Memory available for all arrays: 930 MB (9.749e+08 bytes) **
Memory used by MATLAB: 677 MB (7.102e+08 bytes)
Physical Memory (RAM): 1931 MB (2.025e+09 bytes)
I am not very knowledgable when it comes to memory so I am open to even simple advices. Thanks.
Complex functions usually allocate temporary memory during computation. 10000x10000 is quite large if a dense temporary matrix of that size is allocated during the computation. You could try a few smaller problem sizes to find the upper limit on your current computer.
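One way to sidestep the memory problem entirely, as a sketch: for AX + XA = C you can solve on the small 2M-by-2M matrices with sylvester (R2014a+), which solves A*X + X*B = C without ever forming the 4M^2-by-4M^2 Kronecker system. The matrices below are placeholders, not your data; note that since your Kronecker sum has a kernel, sylvester will report a non-unique solution for your actual A, and you would still need your projection step:

```matlab
M = 50;
A = full(sprandn(2*M, 2*M, 0.05)) - 2*eye(2*M);  % placeholder: a well-conditioned sparse-ish A
C = diag(rand(2*M, 1));                          % placeholder diagonal covariance
X = sylvester(A, A, C);                          % solves A*X + X*A = C on 100x100 matrices
```

sylvester requires dense inputs, but a dense 100x100 matrix is tiny compared to the 10000x10000 Kronecker system.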

Unreasonable [positive] log-likelihood values from matlab "fitgmdist" function

I want to fit a data set with a Gaussian mixture model. The data set contains about 120k samples, and each sample has about 130 dimensions. In Matlab I run this script (with cluster number 1000):
gm = fitgmdist(data, 1000, 'Options', statset('Display', 'iter'), 'RegularizationValue', 0.01);
I get the following outputs:
iter log-likelihood
1 -6.66298e+07
2 -1.87763e+07
3 -5.00384e+06
4 -1.11863e+06
5 299767
6 985834
7 1.39525e+06
8 1.70956e+06
9 1.94637e+06
The log-likelihood is bigger than 0! I think this is unreasonable, and I don't know why.
Could somebody help me?
First of all, it is not a problem of how large your dataset is.
Here is some code that produces similar results with a quite small dataset:
options = statset('Display', 'iter');
x = ones(5,2) + (rand(5,2)-0.5)/1000;
fitgmdist(x,1,'Options',options);
this produces
iter log-likelihood
1 64.4731
2 73.4987
3 73.4987
Of course you know that the log function (the natural logarithm) has a range from -inf to +inf. I guess your problem is that you think its input (the a posteriori function) should be bounded by [0,1]. Well, the a posteriori function is a pdf, which means its value can be very large for a very dense dataset.
PDFs must be positive (which is why we can take their log) and must integrate to 1, but they are not bounded by [0,1].
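You can see a pdf value above 1 directly; evaluating the normal density at its mean with a small sigma, using the plain formula (no toolbox needed):

```matlab
sigma = 0.01;
p = 1/(sigma*sqrt(2*pi));   % standard normal pdf at its mean, with sigma = 0.01
% p is about 39.9, so log(p) is about 3.69 -- positive, and perfectly valid
```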
You can verify this by reducing the density in the above code
x = ones(5,2) + (rand(5,2)-0.5)/1;
fitgmdist(x,1,'Options',options);
this produces
iter log-likelihood
1 -8.99083
2 -3.06465
3 -3.06465
So, I would rather assume that your dataset contains several duplicate (or very close) values.

Matlab out of memory error behave differently in one and two dimensional arrays

Today I have the need to allocate a vector with size 100000 in Matlab. I try to do it simply using:
a=ones(100000);
which my Matlab angrily answered with:
Out of memory. Type HELP MEMORY for your options.
Which is strange, since I have 64-bit Matlab running on a 64-bit machine with 8 GB RAM. I have tried many of the "resolving out of memory errors in Matlab" recipes on SO and elsewhere, but no luck so far.
Now I'm more confused when something like:
a=ones(10000,10000);
Runs without problem in my machine.
Does this mean that Matlab has some mechanism that limits the number of elements a one-dimensional vector can have?
Today I have the need to allocate a vector with size 100000 in Matlab.
Now, as noted in the comments and such, the method you tried (a=ones(100000);) creates a 100000x100000 matrix, which is not what you want.
I would suggest you try:
a = ones(1, 100000);
Since that creates a vector rather than a matrix.
Arguments Matter
Calling Matlab's ones() or zeros() or magic() with a single argument n, creates a square matrix with size n-by-n:
>> a = ones(5)
a = 1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
Calling the same functions with 2 arguments (r, c) instead creates a matrix of size r-by-c:
>> a = ones(2, 5)
a = 1 1 1 1 1
1 1 1 1 1
This is all well documented in Matlab's documentation.
Size Matters Too
Doubles
Having said this, when you do a = ones(1e5) (i.e. ones(100000)) you are creating a square matrix with 1e5 * 1e5 = 1e10 elements. Since these are doubles, the total allocated size would be 8 * 1e10 Bytes, which is circa (8 * 1e10) / 1024^3 = 74.5GB. Do you have this much RAM on your machine?
Compare this with a = ones(1, 1e5), which creates a row vector of 1 * 1e5 = 1e5 elements, for a total allocated size of (8 * 1e5) / 1024 = 781.25KB.
Logicals
Logical values, on the other hand, are boolean values that can be set to either 0 or 1, representing False or True. You can allocate matrices of logicals using either false() or true(). The same single-argument rule applies, hence a = false(1e5) creates a square matrix with 1e5 * 1e5 = 1e10 elements. Matlab, like many other programming languages, stores each boolean in a whole Byte. Even though this has a clear cost in memory, it provides significant performance improvements, because accessing individual bits is a slow operation.
The total allocated size of our a = false(1e5) matrix would therefore be 1e10 Bytes, which is circa 1e10 / 1024^3 = 9.31GB.
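You can check the per-class sizes yourself with whos (the byte counts shown are indicative):

```matlab
a = ones(1, 1e5);    % doubles: 8 bytes per element
b = true(1, 1e5);    % logicals: 1 byte per element
whos a b
%  Name   Size       Bytes    Class
%  a      1x100000   800000   double
%  b      1x100000   100000   logical
```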
Well, the first declaration tries to build a matrix of 100000x100000 ones. Stored as 8-byte doubles, that is ~74.5 GB.
The second declares a matrix of 10000 x 10000, which is ~763 MB.

Matlab fast neighborhood operation

I have a problem. I have a matrix A with integer values between 0 and 5,
for example like:
x=randi(5,10,10)
Now I want to apply a filter of size 3x3 that gives me the most common value in each neighborhood.
I have tried 2 solutions:
fun = @(z) mode(z(:));
y1 = nlfilter(x,[3 3],fun);
which takes very long...
and
y2 = colfilt(x,[3 3],'sliding',@mode);
which also takes long.
I have some really big matrices and both solutions take a long time.
Is there any faster way?
+1 to @Floris for the excellent suggestion to use hist. It's very fast. You can do a bit better though. hist is based on histc, which can be used instead. histc is a compiled function, i.e., not written in Matlab, which is why this solution is much faster.
Here's a small function that attempts to generalize what @Floris did (also, that solution returns a vector rather than the desired matrix) and achieve what you're doing with nlfilter and colfilt. It doesn't require that the input have particular dimensions and uses im2col to efficiently rearrange the data. In fact, the first three lines and the call to im2col are virtually identical to what colfilt does in your case.
function a = intmodefilt(a,nhood)
[ma,na] = size(a);
% Zero-pad the input so the sliding window fits at the borders
aa(ma+nhood(1)-1, na+nhood(2)-1) = 0;
aa(floor((nhood(1)-1)/2)+(1:ma), floor((nhood(2)-1)/2)+(1:na)) = a;
% Histogram each window's pixels with histc, then keep the bin with the max count
[~,a(:)] = max(histc(im2col(aa,nhood,'sliding'), min(a(:))-1:max(a(:))));
a = a - 1;
Usage:
x = randi(5,10,10);
y3 = intmodefilt(x,[3 3]);
For large arrays, this is over 75 times faster than colfilt on my machine. Replacing hist with histc is responsible for a factor of two speedup. There is of course no input checking so the function assumes that a is all integers, etc.
Lastly, note that randi(IMAX,N,N) returns values in the range 1:IMAX, not 0:IMAX as you seem to state.
One suggestion would be to reshape your array so each 3x3 block becomes a column vector. If your initial array dimensions are divisible by 3, this is simple; if they aren't, you need to work a little harder. And you need to repeat this nine times, starting at different offsets into the matrix - I will leave that as an exercise.
Here is some code that shows the basic idea (using only functions available in FreeMat - I don't have Matlab on my machine at home...):
N = 100;
A = randi(0,5*ones(3*N,3*N));
B = reshape(permute(reshape(A,[3 N 3 N]),[1 3 2 4]), [ 9 N*N]);
hh = hist(B, 0:5); % histogram of each 3x3 block: bin with largest value is the mode
[mm mi] = max(hh); % mi will contain bin with largest value
figure; hist(B(:),0:5); title 'histogram of B'; % flat, as expected
figure; hist(mi-1, 0:5); title 'histogram of mi' % not flat?...
The strange thing, when you run this code, is that the distribution of mi is not flat, but skewed towards smaller values. When you inspect the histograms, you will see that is because you will frequently have more than one bin with the "max" value in it. In that case, you get the first bin with the max number. This is obviously going to skew your results badly; something to think about. A much better filter might be a median filter - the one that has equal numbers of neighboring pixels above and below. That has a unique solution (while mode can have up to four values, for nine pixels - namely, four bins with two values each).
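If you have the Image Processing Toolbox, a median filter along those lines is a one-liner (medfilt2 pads with zeros by default; 'symmetric' padding is an alternative worth considering at the borders):

```matlab
x = randi(5, 100, 100);
y = medfilt2(x, [3 3]);               % 3x3 sliding median
% y = medfilt2(x, [3 3], 'symmetric') % alternative border handling
```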
Can't show you a mex example today (wrong computer); but there are ample good examples on the Mathworks website (and all over the web) that are quite easy to follow. See for example http://www.shawnlankton.com/2008/03/getting-started-with-mex-a-short-tutorial/