pre-allocation of array size in Matlab - matlab

What is the efficient way of dynamically preallocating the size of an array variable which appears to change its size on every loop iteration in Matlab? It is possible to initialize it using the zeros() matrix but it sometimes is very tricky (for example: in determining the upper and lower limits).

This is the solution that I use for 2D arrays of dynamic size. Say the maximal limit of my array is 2000x2000, then I just preallocate zeros of the 1-D vector analogue (a 2000^2x1 vector) and after the code ends get rid of the zeros (or just ignore them in case the data is going to a histogram), and reshape once if needed after the code ends...
For example:
for n=1:100;
v=zeros(2000^2,1);
v(1:numel(data))=data(:);
% rest of the code here
end
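A minimal sketch of the same idea when the final length is not known in advance: preallocate to an upper bound, keep a running count, and trim once after the loop. The bound `maxN` and the `mod(n, 3)` condition are assumptions made up for illustration; they stand in for whatever decides, in the real code, whether an iteration produces a result.

```matlab
maxN  = 1000;              % assumed upper bound on the number of results
v     = zeros(maxN, 1);    % preallocate once, outside the loop
count = 0;                 % number of slots actually filled

for n = 1:200
    if mod(n, 3) == 0      % hypothetical condition deciding whether to store
        count = count + 1;
        v(count) = n^2;    % write into the next free slot, no resizing
    end
end

v = v(1:count);            % trim the unused tail once, after the loop
```

Trimming by count rather than by removing zeros keeps any legitimate zero values in the data.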

Related

Submatrix based on size vector

It seems like this problem should be common, but I haven't found a good duplicate...
I'm implementing a level 2 S-function with a variable-sized multidimensional output. The state has to be in fixed-size Dwork vectors, so I zero-pad the input matrix to the maximum size allowed for the input and then reshape it to a vector.
When I reshape it back to a matrix for output, I need to trim it back down to the correct size.
The function needs to be general enough to support an arbitrary number of dimensions. The size of the output is stored in a size array.
For example, I may have a 500x500 matrix N, and a size array S = [40 25]. I need a MATLAB expression that would give me N(1:S(1), 1:S(2)), but it needs to work for any number of dimensions so I can't simply hardcode it like that.
Here is a solution in m-code:
%your input
M=rand(10,10,10);
S=[2,3,4]
%generate indices:
Index=arrayfun(@(x)(1:x),S,'uni',0)
%use comma separated list to index:
smallM=M(Index{:})
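The full round trip described in the question (zero-pad, flatten to a vector, reshape, trim) could look roughly like this. The sizes and the variable names `maxSize`, `padded` and `state` are assumptions for illustration, not part of the S-function API.

```matlab
maxSize = [5 6 4];                 % assumed maximum size of the input
X = rand(2, 3, 4);                 % example input, smaller than maxSize
S = size(X);                       % size vector to remember

% One index range per dimension, usable as a comma-separated list
idx = arrayfun(@(x) 1:x, S, 'UniformOutput', false);

padded = zeros(maxSize);           % zero-padded container
padded(idx{:}) = X;                % copy the input into the leading corner
state = padded(:);                 % fixed-size vector (e.g. for a Dwork vector)

restored = reshape(state, maxSize);
trimmed  = restored(idx{:});       % trim back to the original size
```

The same `idx` cell array is used for both writing and reading, so the code does not depend on the number of dimensions.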

Sparse Matrix Assignment becomes very slow in Matlab

I am filling a sparse matrix P (230k,290k) with values coming from a text file which I read line by line, here is the (simplified) code
while ...
C = textscan(text_line,'%d','delimiter',',','EmptyValue', 0);
line_number = line_number+1;
P(line_number,:)=C{1};
end
the problem I have is that while at the beginning the
P(line_number,:)=C{1};
statement is fast, after a few thousand lines it becomes extremely slow, I guess because Matlab needs to find the memory space to allocate every time. Is there a way to pre-allocate memory with sparse matrices? I don't think so but maybe I am missing something. Any other advice which can speed up the operation (e.g. can having a lot of free RAM make a difference?)
There's a sixth input argument to sparse that tells the number of nonzero elements in the matrix. That's used by Matlab to preallocate:
S = sparse(i,j,s,m,n,nzmax) uses vectors i, j, and s to generate an
m-by-n sparse matrix such that S(i(k),j(k)) = s(k), with space
allocated for nzmax nonzeros.
So you could initialize with
P = sparse([],[],[],230e3,290e3,nzmax);
You can make a guess about the number of nonzeros (perhaps checking file size?) and use that as nzmax. If it turns out you need more nonzero elements in the end, Matlab will reallocate on the fly (slowly).
By far the fastest way to generate a sparse matrix within Matlab is to load all the values in at once, then generate the sparse matrix in one call to sparse. You have to load the data and arrange it into vectors defining the row indices, column indices and values of the filled cells. You can then call sparse using the S = sparse(i,j,s,m,n) syntax.
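A rough sketch of that approach, assuming each line holds one comma-separated dense row as in the question. The cell array `lines` is a made-up stand-in for the lines read from the file; the parsing via sscanf is one possible choice, not the asker's code.

```matlab
% Stand-in for the text file: one comma-separated row per line (assumption)
lines = {'0,0,3,0', '1,0,0,0', '0,0,0,0', '0,2,0,4'};
nRows = numel(lines);
nCols = 4;

% Collect row indices, column indices and values per line
I = cell(nRows, 1);
J = cell(nRows, 1);
V = cell(nRows, 1);
for r = 1:nRows
    vals = sscanf(lines{r}, '%f,').';    % parse the line into a row vector
    cols = find(vals);                   % columns of the non-zero entries
    I{r} = repmat(r, 1, numel(cols));
    J{r} = cols;
    V{r} = vals(cols);
end

% Build the sparse matrix in a single call
P = sparse([I{:}], [J{:}], [V{:}], nRows, nCols);
```

Only the small index and value vectors grow during the loop; the sparse matrix itself is assembled once at the end.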

Visualizing a large matrix in matlab

I have a huge sparse matrix (1,000 x 1,000,000) that I cannot load on matlab (not enough RAM).
I want to visualize this matrix to have an idea of its sparsity and of the differences of the values.
Because of the memory constraints, I want to proceed as follows:
1- Divide the matrix into 4 matrices
2- Load each matrix on matlab and visualize it so that the colors give an idea of the values (and of the zeros particularly)
3- "Stick" the 4 images I will get in order to have a global idea for the original matrix
(i) Is it possible to load "part of a matrix" in matlab?
(ii) For the visualization tool, I read about spy (and daspect). However, this function only shows where the non-zero values are, regardless of their magnitude. Is there a way to add a color code?
(iii) How can I "stick" plots in order to make one?
If your matrix is sparse, then it seems that the current method of storing it (as a full matrix in a text file) is very inefficient, and certainly makes loading it into MATLAB very hard. However, I suspect that as long as it is sparse enough, it can still be loaded into MATLAB as a sparse matrix.
The traditional way of doing this would be to load it all in at once, then convert to sparse representation. In your case, however, it would make sense to read in the text file, one line at a time, and convert to a MATLAB sparse matrix on-the-fly.
You can find out if this is possible by estimating the sparsity of your matrix, and using this to see if the whole thing could be loaded into MATLAB's memory as a sparse matrix.
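As a back-of-the-envelope sketch of that estimate: on 64-bit MATLAB a real double sparse matrix needs roughly 16 bytes per stored non-zero plus 8 bytes per column (this matches the whos output for the 300000x300000 matrix further down this page). The 10% density below is an assumption; replace it with your own estimate.

```matlab
num_rows = 1000;
num_cols = 1e6;
density  = 0.10;                              % assumed fraction of non-zeros

nnz_est   = density * num_rows * num_cols;    % estimated number of non-zeros
bytes_est = 16 * nnz_est + 8 * (num_cols + 1);

fprintf('Estimated sparse storage: %.2f GB\n', bytes_est / 1e9);
```

If the estimate is well below the available RAM, loading the whole matrix as sparse is plausible; otherwise it has to be processed in pieces.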
Try something like: (untested code!)
% initialise sparse matrix
sparse_matrix = sparse(num_rows, num_cols);
row_num = 1;
fid = fopen(filename);
% read each line of text file in turn
while ~feof(fid)
this_line = sscanf(fgetl(fid), '%f');  % fgetl reads one line; fscanf with '%f' would read the whole file
% add row to sparse matrix (note transpose, which I think is required)
sparse_matrix(row_num, :) = this_line';
row_num = row_num + 1;
end
fclose(fid)
% visualise using spy
spy(sparse_matrix)
Visualisation
With regards to visualisation: visualising a sparse matrix like this via a tool like imagesc is possible, but I believe it may internally create the full matrix – maybe someone can confirm if this is true or not. If it does, then it's going to cause you memory problems.
All spy is really doing is plotting in 2D the locations of the non-zero elements. You can fairly easily write your own spy function, which can have different coloured or sized points depending on the values at each location. See this answer for some examples.
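A minimal sketch of such a value-coloured spy, using only find and scatter. The random test matrix from sprand and the marker size are assumptions chosen for illustration.

```matlab
S = sprand(200, 300, 0.02);        % small random sparse matrix as a stand-in

[r, c, v] = find(S);               % row, column and value of each non-zero

scatter(c, r, 10, v, 'filled');    % colour each point by its value
axis ij;                           % row 1 at the top, as in spy
axis equal tight;
colorbar;                          % colour scale for the values
xlabel('column');
ylabel('row');
```

Only the non-zero entries are passed to the plot, so no full matrix is created.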
Saving sparse matrices
As I say above, the way your matrix is saved is pretty inefficient: for a matrix with 10% sparsity, around 95% of your text file will be a zero or a space. I don't know where this data has come from, but if you have any control over its creation (e.g. it comes from another program you have written) it would make much more sense to save only the non-zero elements in the format row_idx, col_idx, value.
You can then use spconvert to import the sparse matrix directly.
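A small sketch of what that looks like. The triplet data here is made up; in practice `D` would come from loading the row_idx, col_idx, value file (for example with load).

```matlab
% Each row is [row_idx, col_idx, value]; a final row [m n 0] fixes the size
D = [1 1  2.5
     2 3 -1.0
     4 2  7.0
     4 4  0  ];

S = spconvert(D);                  % 4-by-4 sparse matrix with three non-zeros
```

Without the final `[m n 0]` row, the size is taken from the largest row and column indices that appear in the data.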
One of the simplest methods (if you can actually store the full sparse matrix in RAM) is to use gnuplot to visualize the sparsity pattern.
I was able to spy matrices of size 10-20GB using gnuplot without problems. But make sure you use png or jpeg formats to output the image. Note that you don't need the values of the non-zero entries, only the integer (row, col) pairs. Plot them with: plot "row_col.dat" using 1:2 with points
This uses the rows as the x axis and the columns as the y axis and plots the non-zero entries. It is very easy to do. This is the most scalable solution I know. Gnuplot works at decent speed even for very large datasets (>10GB of [row, col] pairs), whereas Matlab just hangs (with due respect).
I use imagesc() to visualise arrays. It scales the values in the array to span the full range of the current colormap, then displays the array as an image (of course you can change the colormap to make it easier to see detail).

Matlab: 'Block' matrix multiplication without resorting to repmat

I have a matrix A (of dimensions mxn) and vector b (of dimensions nx1).
I would like to construct a vector which is repmat((A*b),[C 1]), where C = n/m. I am using a lot of data and therefore n~100000 and C~10.
As you can see this is really block matrix multiplication without having to explicitly create the full A block matrix (dimensions nxn) as this easily exceeds available memory.
A is sparse and has already been converted using the function sparse().
Is there a better way of doing this? (Considering the speed and memory footprint trade-off, I'd rather have a smaller memory footprint.)
Usually if I was doing elementwise calculations, I would use bsxfun instead of using repmat to minimise memory footprint. As far as I know there is no equivalent bsxfun for matrix multiplication?
It looks like it is time for you to REALLY learn to use sparse matrices instead of wondering how to do this otherwise.
SPARSE block diagonal matrices do NOT take up a lot of memory if you create them correctly. I'll use my blktridiag function, which actually creates block tridiagonal matrices. Here I've used it to create random block diagonal matrices. I've set the off-diagonal elements to zero, so it really is block diagonal.
A = rand(3,3,100000);
tic,M = blktridiag(A,zeros(3,3,99999),zeros(3,3,99999));toc
Elapsed time is 0.478068 seconds.
And, while it is not small, the memory required is not that much more than twice the memory required to store the diagonal elements themselves.
whos A M
  Name           Size                 Bytes  Class     Attributes
  A              3x3x100000         7200000  double
  M         300000x300000          16800008  double    sparse
Here about 17 megabytes, compared to 7 megabytes.
Note that blktridiag explicitly creates a sparse matrix directly.
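For readers without blktridiag, a sparse block diagonal matrix can also be assembled with built-in functions only. This is a sketch of an alternative construction, not how blktridiag works internally; the block count is reduced to keep the example small.

```matlab
A = rand(3, 3, 1000);                     % 1000 blocks of size 3x3
[p, q, n] = size(A);

% Local row, local column and zero-based block number of every element
[ii, jj, kk] = ndgrid(1:p, 1:q, 0:n-1);

rows = ii + p * kk;                       % global row of each element
cols = jj + q * kk;                       % global column of each element

% One call to sparse places every block on the diagonal
M = sparse(rows(:), cols(:), A(:), p * n, q * n);
```

Block k ends up in rows and columns 3*(k-1)+1 to 3*k, so for example M(4:6,4:6) holds A(:,:,2).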

Beginning Matlab question (matrix of zeros)

Why create a matrix of 0's in Matlab? For example,
A=zeros(5,5);
for i = 1:5
A(i)=exp(i);
end
Following on from j_random_hacker's answer, it's much more efficient in MATLAB to pre-allocate an array rather than letting MATLAB expand it. MATLAB can expand arrays if you simply assign elements off the current "end" of the array, like so:
x = []
for ii=1:1e4
x(ii) = 1/ii;
end
That's really inefficient because at each step in the loop, MATLAB will re-allocate "x" to be one element larger than it was previously. The following is much faster:
x = zeros( 1, 1e4 );
for ii=1:1e4
x(ii) = 1/ii;
end
(Probably faster still in this case is: x = 1./(1:1e4);, but the pre-allocation route is what you need when you can't resolve things to a vectorised operation)
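To see the difference on your own machine, the three variants can be timed side by side with tic and toc. This is only a rough sketch: the loop length is an arbitrary choice, and the timings depend heavily on the MATLAB version and hardware, so no numbers are quoted here.

```matlab
n = 1e5;

tic;
x = [];                                  % grows on every iteration
for ii = 1:n
    x(ii) = 1/ii;
end
tGrow = toc;

tic;
y = zeros(1, n);                         % preallocated once
for ii = 1:n
    y(ii) = 1/ii;
end
tPre = toc;

tic;
z = 1 ./ (1:n);                          % vectorised, no loop
tVec = toc;

fprintf('growing: %.4f s, preallocated: %.4f s, vectorised: %.4f s\n', ...
        tGrow, tPre, tVec);
```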
This is identical to asking: Why create a variable with value 0?
Usually you would do this if you plan to accumulate a bunch of results together somehow. In this case, you have to start "somewhere".
Although it is possible to start out with an empty matrix and expand it by concatenating (adding) new elements, vector extension is highly inefficient in MATLAB because it requires new memory every time another element is concatenated. Preallocation establishes a matrix that's the right size in advance, then each zero element can be replaced with the correct value. This method is much more efficient, especially in programs involving looping.
This is helpful if you are going to work on a large matrix or with a sparse matrix. It is also helpful when you use the same vector or matrix again and again.