Creating sparse matrix within a loop - matlab

I want to make a sparse matrix from a graph which is stored in an Mx2 matrix:
for i = 1:m
adj = sparse(Graph(i,1), Graph(i,2), 1);
end
But adj only keeps one value. I don't know how big adj will be prior to the loop.
How can I tell MATLAB to create this sparse matrix?

No for loop is necessary. The sparse function takes in a vector of non-zero row locations, a vector of non-zero column locations and a vector of non-zero values. If all of the values are the same, you can simply use a scalar value to initialize all of them at once.
Simply do this (see note 1):
adj = sparse(Graph(:,1), Graph(:,2), 1);
This accesses all of the row locations with Graph(:,1), the column locations with Graph(:,2) and finally we initialize all values at these locations to 1.
This is also assuming that you have non-duplicate row and column locations in Graph. If you do have duplicate row and column locations, the non-zero values defined at these locations are accumulated into the same locations. For example, if we had three instances of (6,3) in our matrix, the output at this location in the sparse matrix would be 3.
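To make the accumulation behaviour concrete, here is a minimal sketch with a made-up edge list (the values in Graph and the name adjSquare are illustrative assumptions, not from the question). It also shows the optional size arguments of sparse, which are useful if you want a square adjacency matrix.

```matlab
% Hypothetical edge list: the pair (6,3) appears three times
Graph = [1 2; 2 3; 6 3; 6 3; 6 3];

% Duplicate (row, col) pairs are summed by sparse
adj = sparse(Graph(:,1), Graph(:,2), 1);
dupValue = full(adj(6,3));      % 3, because (6,3) occurs three times
adjSize  = size(adj);           % [6 3]: max row index by max column index

% Optional: force a square n-by-n matrix by passing the size explicitly
n = max(Graph(:));
adjSquare = sparse(Graph(:,1), Graph(:,2), 1, n, n);
```

If you only care whether an edge exists, something like adj > 0 gives a sparse logical matrix in which duplicates collapse to a single true entry.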
1. Credit goes to Luis Mendo for initially proposing the answer

Related

Vectorizing a parallel FOR loop across multiple dimensions MATLAB

Please correct me if anything in this question is unclear. I have two 3-dimensional matrices, pop and ben. Call the dimensions c, t, w. I want to repeat the exact same process described below for every index along the c dimension, without using a for loop, as that is slow. For the discussion below, fix a value of c to explain my thinking; later I will give an MWE. With c fixed, I have a 2D matrix of dimension t, w.
Now I repeat the entire process (coming below!) for all of the w dimension.
If the value of u is zero, then I find the next non zero entry in this same t dimension. I save both this entry as well as the corresponding t index. If the value of u is non zero, I simply store this value and the corresponding t index. Call the index as i - note i would be of dimension (c,t,w). The last entry of every u(c,:,w) is guaranteed to be non zero.
Example if the u(c,:,w) vector is [ 3 0 4 2 0 1], then the corresponding i values are [1,3,3,4,6,6].
Now I take these entries and define a new 3D array of dimension (c,t,w) as follows. Taking my B array, I compute (this is not correct syntax, only an illustration) B(c,t,w)/u(c,i(c,t,w),w). That is, I divide each B value by the u value at the non-zero index i that I computed above.
For the above example, the denominator would be [3,4,4,2,1,1]. I hope that makes sense!!
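The single-vector example above can be reproduced with the same find/cummin idea used in the loop further down. This is only a sketch for one row vector; the names idx and denom are illustrative, and it relies on cummin ignoring NaN values by default.

```matlab
u = [3 0 4 2 0 1];

idx = nan(size(u));               % NaN marks positions where u is zero
idx(u ~= 0) = find(u);            % positions of the non-zero entries
idx = cummin(idx, 2, 'reverse');  % fill each NaN with the next non-zero position

denom = u(idx);                   % value of u at the next non-zero position
% idx   is [1 3 3 4 6 6]
% denom is [3 4 4 2 1 1]
```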
QUESTION:
To do this, since the process simply repeats for all c, I can do a very fast vectorized calculation for a single c. But for multiple c I do not know how to avoid the for loop. I don't know how to vectorize calculations across dimensions.
Here is what I did, where c_size is the dimension of c.
for c=c_size:-1:1
uu=squeeze(pop(c,:,:)) ; % EXTRACT A 2D MATRIX FROM pop.
BB=squeeze(B(c,:,:)) ; % EXTRACT A 2D MATRIX FROM B
ii = nan(size(uu)); % Start with all nan values
[dum_row, ~] = find(uu); % Get row indices of non-zero values
ii(uu ~= 0) = dum_row; % Place row indices in locations of non-zero values
ii = cummin(ii, 1, 'reverse'); % Column-wise cumulative minimum, starting from bottom
dum_i = ii+(time_size+1).*repmat(0:(scenario_size-1), time_size+1, 1); % Create linear index
ben(c,:,:) = BB(dum_i)./uu(dum_i);
i(c,:,:) = ii ;
clear dum_i dum_row uu BB ii
end
The central question is to avoid this for loop.
Related questions:
Vectorizable FIND function with if statement MATLAB
Efficiently finding non zero numbers from a large matrix

Scale every row in a sparse matrix by an element of a vector in MATLAB

I have a sparse matrix
obj.resOp = sparse(row,col,val);
and a vector containing the sums of each row in the matrix
sums = sparse(sum(obj.resOp,2));
Now what I want to do is
obj.resOp = obj.resOp ./ sums;
which would scale every row in the matrix so that the rowsum in each row is 1.
However in this last line, MATLAB internally seems to construct a full matrix from obj.resOp and hence I get this error:
Error using ./ Requested 38849x231827 (17.5GB) array exceeds maximum
array size preference. Creation of arrays greater than this limit may
take a long time and cause MATLAB to become unresponsive. See array size limit or preference
panel for more information.
for sufficiently large matrices.
In theory I think that expanding to a full matrix is not necessary. Is there any MATLAB formulation of what I want to achieve while keeping the sparsity of obj.resOp?
You can do this with a method similar to the one described in this answer.
Start with some sparse matrix
% Random sparse matrix: 10 rows, 4 cols, density 20%
S = sprand(10,4, 0.2);
Get the row sums. Note that sum returns a sparse result for sparse input, so your additional conversion is unnecessary (see the sum documentation).
rowsums = sum(S,2);
Find all non-zero indices and their values
[rowidx, colidx, vals] = find(S);
Now create a sparse matrix from the element-wise division
out = sparse(rowidx, colidx, vals./rowsums(rowidx), size(S,1), size(S,2));
The equivalent calculation
obj.resOp = inv(diag(sums)) * obj.resOp;
works smoothly.
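As an aside, the same row scaling can be written without inv by building the diagonal scaling matrix directly with spdiags. The sketch below uses a small made-up matrix; the names scale, nz and D, and the choice to leave all-zero rows untouched, are assumptions rather than part of the answers above.

```matlab
% Small hypothetical sparse matrix; row 4 is empty on purpose
S = sparse([1 1 2 3 3 3], [1 3 2 1 2 4], [2 6 5 1 1 2], 4, 4);

rowsums = full(sum(S, 2));          % n-by-1 vector of row sums

% Reciprocal of each row sum; rows summing to zero keep a scale of 0
scale = zeros(size(rowsums));
nz = rowsums ~= 0;
scale(nz) = 1 ./ rowsums(nz);

n = size(S, 1);
D = spdiags(scale, 0, n, n);        % sparse diagonal matrix
out = D * S;                        % result stays sparse
```

Left-multiplying by a diagonal matrix scales rows, so every non-empty row of out sums to 1 and no full matrix is ever formed.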

How to change the value of a random subset of elements in a matrix without using a loop?

I'm currently attempting to select a subset of 0's in a very large matrix (about 400x300 elements) and change their value to 1. I am able to do this, but it requires using a loop where each instance it selects the next value in a randperm vector. In other words, 50% of the 0's in the matrix are randomly selected, one-at-a-time, and changed to 1:
z=1;
for z=1:(.5*numberofzeroes)
A(zeroposition(rpnumberofzeroes(z),1),zeroposition(rpnumberofzeroes(z),2))=1;
z=z+1;
end
Where 'A' is the matrix, 'zeroposition' is a 2-column-wide matrix with the positions of the 0's in the matrix (the "coordinates" if you like), and 'rpnumberofzeroes' is a randperm vector from 1 to the number of zeroes in the matrix.
So for example, for z=20, the code might be something like this:
A(3557,2684)=1;
...so that the 0 which appears in this location within A will now be a 1.
It performs this loop thousands of times, because .5*numberofzeroes is a very big number. This inevitably takes a long time, so my question is can this be done without using a loop? Or at least, in some way that takes less processing resources/time?
As I said, the only thing that needs to be done is an entirely random selection of 50% (or whatever proportion) of the 0's changed to 1.
Thanks in advance for the help, and let me know if I can clear anything up! I'm new here, so apologies in advance if I've made any faux pas.
That's very easy. I'd like to introduce you to my friend sub2ind. sub2ind allows you to take row and column coordinates of a matrix and convert them into linear column-major indices so that you can access multiple values in a matrix simultaneously in a single call. As such, the equivalent code you want is:
%// rpnumberofzeroes is a randperm vector, so take the first half of it
vals = rpnumberofzeroes(1:floor(0.5*numberofzeroes));
%// Use these as row indices into zeroposition to get the rows and columns
%// of A that we want to access
rows = zeroposition(vals, 1);
cols = zeroposition(vals, 2);
%// Get linear indices via sub2ind
ind1 = sub2ind(size(A), rows, cols);
%// Now set these locations to 1
A(ind1) = 1;
The first statement takes the first half of the random permutation stored in rpnumberofzeroes. Each of these values selects one row of zeroposition, whose first column holds the row coordinate and whose second column holds the column coordinate of a zero in A. This mirrors your loop, where rpnumberofzeroes(z) is used to look up both coordinates in zeroposition. Once that's done, we use these rows and columns to index into A. sub2ind requires three inputs: the size of the matrix you are trying to access (here, size(A)), the row coordinates and the column coordinates. The output is a set of column-major linear indices, one for each row and column pair.
The last piece of the puzzle is to use these to index into A and set the locations to 1.
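For readers who have not met sub2ind before, here is a tiny standalone illustration on a made-up 3-by-4 matrix (the values are assumptions chosen only for the demonstration):

```matlab
A = zeros(3, 4);
rows = [1; 3];
cols = [2; 4];

% Column-major linear index of (r, c) in a 3-row matrix is r + (c-1)*3
ind = sub2ind(size(A), rows, cols);   % [4; 12]

A(ind) = 1;                           % sets only A(1,2) and A(3,4)
```

Note that A(rows, cols) = 1 would be different: it sets every combination of the listed rows and columns (four elements here), which is why the pairs are converted to linear indices first.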
This can be accomplished with linear indexing as well:
% find linear position of all zeros in matrix
ix=find(abs(A)<eps);
% set one half of those, selected at random, to one.
A(ix(randperm(numel(ix), round(numel(ix)*.5)))) = 1;
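A self-contained way to check the linear-indexing approach is sketched below on a small random 0/1 matrix. The matrix size, the seed and the names k, pick and B are illustrative assumptions; the key call is the two-argument randperm(n, k), which returns k distinct values drawn from 1..n.

```matlab
rng(0);                               % fixed seed so the run is repeatable
A = double(rand(6, 5) > 0.5);         % hypothetical 0/1 test matrix

ix = find(A == 0);                    % linear positions of all zeros
k  = round(0.5 * numel(ix));          % how many zeros to flip
pick = ix(randperm(numel(ix), k));    % k distinct zero positions, chosen at random

B = A;
B(pick) = 1;                          % flip the selected zeros
```

Exactly k zeros change, so nnz(B) equals nnz(A) + k, and the entries that were already 1 are left alone.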

Find largest subset of linearly independent vectors with Matlab

I need to create a matlab function that finds the largest subset of linearly independent vectors in a matrix A.
Initialize the output of the program to be 0, which corresponds to the empty set (containing no column vectors). Scanning the columns of A from left to right one by one; if adding the current column vector to the set of linearly independent vectors found so far makes the new set of vectors linearly DEPENDENT, then skip this vector, otherwise add this vector to the solution set; and move to the next column.
function [ out ] = maxindependent(A)
%MAXINDEPENDENT takes a matrix A and produces an array in which the columns
%are a subset of independent vectors with maximum size.
[r c]= size(A);
out=0;
A=A(:,rank(A))
for jj=1:c
M=[A A(:,jj)]
if rank(M)~=size(M,2)
A=A
elseif rank(M)==size(M,2)
A=M
end
end
out=A
if max(out)==0
0;
end
end
The number of linearly independent vectors in a matrix is equal to the rank of the matrix, and a particular subset of linearly independent vectors is not unique. Any 'largest subset' of linearly independent vectors will have size equal to the rank.
There is a function for this in MATLAB:
n = rank(A);
The algorithm you described is not necessary; you should just use the SVD. There is a concise way to do it here: how to get the maximally independent vectors given a set of vectors in MATLAB?
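For reference, the left-to-right scan described in the question can be sketched as follows. This is a hedged illustration rather than the recommended method: the example matrix and the names keep and candidate are assumptions, and rank uses a numerical tolerance, so nearly dependent columns may be classified either way.

```matlab
% Hypothetical input: column 2 is a multiple of column 1,
% and column 4 equals 3*column 1 + column 3
A = [1 2 0 3;
     0 0 1 1;
     0 0 0 0];

keep = [];                                   % indices of accepted columns
for jj = 1:size(A, 2)
    candidate = [keep, jj];
    % Accept column jj only if the enlarged set is still independent
    if rank(A(:, candidate)) == numel(candidate)
        keep = candidate;
    end
end

out = A(:, keep);                            % here keep is [1 3]
```

The number of columns kept equals rank(A), which matches the statement above that any largest independent subset has size equal to the rank.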

Drawing a random non-zero element from a sparse matrix

I have a sparse logical matrix, which is quite large. I would like to draw random non-zero elements from it without storing all of its non-zero elements in a separate vector (eg. by using find command). Is there an easy way to do this?
Currently I am implementing rejection sampling, which is drawing a random element and checking whether that is non-zero or not. But it is not efficient when the ratio of non-zero elements is small.
A sparse logical matrix is not a very practical representation of your data if you want to pick random locations. Rejection sampling and find are the only two ways that make sense to me. Here's how you can do them efficiently (assuming you want to get 4 random locations):
%# using find
idx = find(S);
%# draw 4 without replacement
fourRandomIdx = idx(randperm(length(idx),4));
%# draw 4 with replacement
fourRandomIdx = idx(randi(length(idx),1,4));
%# get row, column values
[row,col] = ind2sub(size(S),fourRandomIdx);
%# using rejection sampling
density = nnz(S)/numel(S);
%# estimate how many samples you need to get at least 4 hits
%# (4/density on average) and multiply by 2 (or 3)
n = ceil(4/density) * 2;
%# random linear indices w/ replacement
randIdx = randi(numel(S),n,1);
%# identify the first four draws that hit a non-zero element
hits = find(S(randIdx),4,'first');
%# get row, column values
[row,col] = ind2sub(size(S),randIdx(hits));
An n x m matrix with nnz non-zero elements requires nnz + m + 1 integers to store the locations of its non-zero entries (MATLAB uses compressed column storage: one row index per non-zero plus m + 1 column pointers). For a logical matrix there is no need to store the value of the non-zero entries: these are all true. Correspondingly, you would do best to convert your logical sparse matrix into a list of the linear indices of its non-zero entries, together with n and m, which requires only nnz + 2 integers of storage. From these (and ind2sub) you can readily reconstruct the subscripts corresponding to any non-zero entry that you choose randomly using randi over the range 1..nnz.
find is the standard interface to get the non-zero elements in a sparse matrix. Have a look here http://www.mathworks.se/help/techdoc/math/f6-9182.html#f6-13040
[i,j,s] = find(S)
find returns the row indices of nonzero values in vector i, the column indices in vector j, and the nonzero values themselves in the vector s.
No need to get s. Just pick a random index in i,j.
By representing the entries in a 3 column format, aka a coordinate list (i, j, value), you can simply select the items from the list. To get this, you can either use your original method for creating the sparse matrix (i.e. the precursor to sparse()), or use the find command, a la [i,j,s] = find(S);
If you don't need the entries, and it seems you don't, you can just extract i and j.
If, for some reason, your matrix is massive and your RAM limitations are severe, you can simply divide the matrix into regions, and let the probability of selecting a given sub-matrix be proportional to the number of non-zero elements (using nnz) in that sub-matrix. You could go so far as to divide the matrix into individual columns, and the rest of the calculation is trivial. NB: by applying sum to the matrix, you can get the per-column counts (assuming your entries are just 1s).
In this way, you need not even bother with rejection sampling (which seems pointless to me in this case, since Matlab knows where all of the non-zero entries are).
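The column-wise idea from this last answer might look like the sketch below. The test matrix, the seed and the names colCounts, edges and k are assumptions made for illustration; only one column's non-zeros are ever extracted with find.

```matlab
rng(1);
S = sprand(50, 40, 0.05) > 0;          % hypothetical sparse logical matrix

colCounts = full(sum(S, 1));           % number of non-zeros in each column
edges = [0, cumsum(colCounts)];        % cumulative counts, edges(end) == nnz(S)

k = randi(edges(end));                 % uniform over all non-zeros: 1..nnz(S)
col = find(k <= edges(2:end), 1, 'first');   % column containing the k-th non-zero

rowsInCol = find(S(:, col));           % non-zero rows of that single column
row = rowsInCol(k - edges(col));       % position of the k-th non-zero within it
```

Because k is uniform over 1..nnz(S), each non-zero entry is selected with equal probability, and the extra memory is one count per column plus the indices of a single column.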