I have to create a matlab matrix that is much bigger that my phisical memory, and i want to take advantage of the sparsity.
This matrix is really really sparse [say N elements in an NxN matrix], and my ram is enought for this. I create the matrix in this way:
A=sparse(zeros(N));
but it goes out of memory.
Do you know the right way to create this matrix?
zeros(N) is creating an NxN matrix, which is not sparse, hence you are running out of memory. Your code is equivalent to
temp = zeros(N)
A = sparse(temp)
Just do sparse(N,N).
Creating an all zeros sparse matrix, and then modifying it is extremely inefficient in matlab.
Instead of doing something like:
A = sparse(N,N) % or even A = sparse([],[],[],N,N,N)
A(1:N,7) = 1:N
It is much more efficient to construct the matrix in triplet form. That is,
construct the column and row indices and the nonzero entries first, then
form the matrix. For example,
i = 1:N;
j = 7*ones(1,N);
x = 1:N;
A = sparse(i,j,x,N,N);
I'd actually recommend the full syntax of sparse([],[],[],N,N,N).
It's useful to preallocate if you know the maximum number of nonzero elements as otherwise you'll get reallocs when you insert new elements.
Related
I am filling a sparse matrix P (230k,290k) with values coming from a text file which I read line by line, here is the (simplified) code
while ...
C = textscan(text_line,'%d','delimiter',',','EmptyValue', 0);
line_number = line_number+1;
P(line_number,:)=C{1};
end
the problem I have is that while at the beginning the
P(line_number,:)=C{1};
statement is fast, after a few thousands lines become exterely slow, I guess because Matlab need to find the memory space to allocate every time. Is there a way to pre-allocate memory with sparse matrixes? I don't think so but maybe I am missing something. Any other advise which can speed up the operation (e.g. having a lot of free RAM can make the difference?)
There's a sixth input argument to sparse that tells the number of nonzero elements in the matrix. That's used by Matlab to preallocate:
S = sparse(i,j,s,m,n,nzmax) uses vectors i, j, and s to generate an
m-by-n sparse matrix such that S(i(k),j(k)) = s(k), with space
allocated for nzmax nonzeros.
So you could initiallize with
P = sparse([],[],[],230e3,290e3,nzmax);
You can make a guess about the number of nonzeros (perhaps checking file size?) and use that as nzmax. If it turns you need more nonzero elements in the end, Matlab will preallocate on the fly (slowly).
By far the fastest way to generate a sparse matrix wihtin matlab is to load all the values in at once, then generate the sparse matrix in one call to sparse. You have to load the data and arrange it into vectors defining the row and column indices and values for each filled cell. You can then call sparse using the S = sparse(i,j,s,m,n) syntax.
I have N kx1 sparse vectors and I need to multiply each of them by their transpose, creating N square matrices, which I then have to sum over. The desired output is a k by k matrix. I have tried doing this in a loop and using arrayfun, but both solutions are too slow. Perhaps one of you can come up with something faster. Below are specific details about the best solution I've come up with.
mdev_big is k by N sparse matrix, containing each of the N vectors.
fun_sigma_i = #(i) mdev_big(:,i)*mdev_big(:,i)';
sigma_i = arrayfun(fun_sigma_i,1:N,'UniformOutput',false);
sigma = sum(reshape(full([sigma_i{:}]),k,k,N),3);
The slow part of this process is making sigma_i full, but I cannot reshape it into a 3d array otherwise. I've also tried cat instead of reshape (slower), ndSparse instead of full (way slower), and making fun_sigma_i return a full matrix rather than a sparse one (slower).
Thanks for the help! ,
I have to perform this operation:
N = A'*P*A
The structure of the P matrix is block diagonal while the A matrix is largely sparse (also in a banded structure). The multiplication is performed in blocks. But the problem is storage.
The N matrix is too huge to store in full (out of memory when trying to allocate). So, I want to store in a sparse fashion. While the sparse command generates only the values in row,column format, can it be applied to store banded matrices with the row column as the index of the block?
I have tried spalloc given in the this question but it hasnt helped storing the row and index of the block.
Thank you.
Image for A P A' formation
The problem lies in the blocks. The blocks are themselves sparse. So is it possible to make blocks as sparse matrices themselves while saving.
So, if a block has a row = 1 and col = 1, then can this be done?
N(row,col) = sparse(A'*P*A)
There may be some additional tricks to play but the first thing to try is to make sure the full matrix N is never created in memory. The immediate problem is that if you call sparse(A'*P*A) then you multiple A'*P then (A'*P)*A and only then do you make it sparse and take out the zeros. Right before making it sparse, the entire non-sparse matrix representation of N is in memory. To force MATLAB to be smarter do the following:
SA = sparse(A);
N = SA'*sparse(P)*SA;
whos N
You should see that N is sparse but, more importantly, each multiplication result is sparse as well because you are multiplying a sparse matrix times a sparse matrix.
let's say I have a big Matrix X with a lot of zeros, so of course I make it sparse in order to save on memory and CPU. After that I do some stuff and at some point I want to have the nonzero elements. My code looks something like this:
ind = M ~= 0; % Whereby M is the sparse Matrix
This looks however rather silly to me since the structure of the sparse Matrix should allow the direct extraction of the information.
To clarify: I do not look for a solution that works, but rather would like to avoid doing the same thing twice. A sparse Matrix should perdefinition already know it's nonzero values, so there should be no need to search for it.
yours magu_
The direct way to retrieve nonzero elements from a sparse matrix, is to call nonzeros().
The direct way is obviously the fastest method, however I performed some tests against logical indexing on the sparse and its full() counterparty, and the indexing on the former is faster (results depend on the sparsity pattern and dimension of the matrix).
The sum of times over 100 iterations is:
nonzeros: 0.02657 seconds
sparse idx: 0.52946 seconds
full idx: 2.27051 seconds
The testing suite:
N = 100;
t = zeros(N,3);
for ii = 1:N
s = sprand(10000,1000,0.01);
r = full(s);
% Direct call nonzeros
tic
nonzeros(s);
t(ii,1) = toc;
% Indexing sparse
tic
full(s(s ~= 0));
t(ii,2) = toc;
% Indexing full
tic
r(r~=0);
t(ii,3) = toc;
end
sum(t)
I'm not 100% sure what you're after but maybe [r c] = find(M) suits you better?
You can get to the values of M by going M(r,c) but the best method will surely be dictated by what you intend to do with the data next.
find function is recommended by MATLAB:
[row,col] = find(X, ...) returns the row and column indices of the nonzero entries in the matrix X. This syntax is especially useful when working with sparse matrices.
While find has been proposed before, I think this is an important addition:
[r,c,v] = find(M);
Gives you not only the indices r,c, but also the non-zero values v. Using the nonzeros command seems to be a bit faster, but find is in general very useful when dealing with sparse matrices because the [r,c,v] vectors describe the complete matrix (except matrix dimension).
I have an array of valued double M where size(M)=15000
I need to convert this array to a diagonal matrix with command diag(M)
but i get the famous error out of memory
I run matlab with option -nojvm to gain memory space
and with the optin 3GB switch on windows
i tried also to convert my array to double precision
but the problem persist
any other idea?
There are much better ways to do whatever you're probably trying to do than generating the full diagonal matrix (which will be extremely sparse).
Multiplying that matrix, which has 225 million elements, by other matrices will also take a very long time.
I suggest you restructure your algorithm to take advantage of the fact that:
diag(M)(a, b) =
M(a) | a == b
0 | a != b
You'll save a huge amount of time and memory and whoever is paying you will be happier.
This is what a diagonal matrix looks like:
Every entry except those along the diagonal of the matrix (the ones where row index equals the column index) is zero. Relating this example to your provided values, diag(M) = A and M(n) = An
Use saprse matrix
M = spdiags( M, 0, numel(M), numel(M) );
For more info see matlab doc on spdiags and on sparse matrices in general.
If you have an n-by-n square matrix, M, you can directly extract the diagonal elements into a row vector via
n = size(M,1); % Or length(M), but this is more general
D = M(1:n+1:end); % 1-by-n vector containing diagonal elements of M
If you have an older version of Matlab, the above may even be faster than using diag (if I recall, diag wasn't always a compiled function). Then, if you need to save memory and only need the diagonal of M and can get rid of the rest, you can do this:
M(:) = 0; % Zero out M
M(1:n+1:end) = D; % Insert diagonal elements back into M
clear D; % Clear D from memory
This should not allocate much more than about (n^2+n)*8 = n*(n+1)*8 bytes at any one time for double precision values (some will needed for indexing operations). There are other ways to do the above that might save a bit more if you need a (full, non-sparse) n-by-n diagonal matrix, but there's no way to get around that you'll need n^2*8 bytes at a minimum just to store the matrix of doubles.
However, you're still likely to run into problems. I'd investigate sparse datatypes as #user2379182 suggests. Or rework you algorithms. Or better yet, look into obtaining 64-bit Matlab and/or a 64-bit OS!