Converting a sparse matrix into cell array efficiently - matlab

I have this sparse matrix of the following form
Lets take an example of 5x10 matrix
1 2 3 4 5 6 7 8 9 10
1 1 1 0 0 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0 0 0
3 .............................
4 .............................
5 .............................
From this sparse matrix, I want to create a cell array C of form
C{1} 1
C{2} = [1,2]
...........
...........
...........
My sparse matrix is high dimensional like 40000 by 790000. How can I do it efficiently in matlab. I can definitely use a loop and do it inefficiently. But I want the most efficient. Suggestions?

Use find to get the indices and accumarray to group them by columns:
[ii, jj] = find(A);
C = accumarray(jj, ii, [], #(v) {v.'});
Benchmarking
%// Random sparse matrix. Code adapted from #teng's answer
sz = [4e4 79e4];
nz = 1e5; %// number of nonzeros
A = sparse(randi(sz(1),[nz 1]),randi(sz(2),[nz 1]),1,sz(1),sz(2));
tic;
[ii, jj] = find(A);
C = accumarray(jj, ii, [], #(v) {v.'});
toc
Results:
For nz = 1e4:
Elapsed time is 0.099657 seconds.
For nz = 1e5:
Elapsed time is 0.756234 seconds.
For nz = 1e6:
Elapsed time is 5.431427 seconds.

Let me get the party started...
let's start with the basics:
tic;
sz = [ 400 7900]; % hehe...
aMat = sparse(randi(sz(1),[1000 1]),randi(sz(2),[1000 1]),1,sz(1),sz(2));
aCell = mat2cell(aMat,ones([sz(1) 1]));
preC = cellfun(#(x) x(x~=0), aCell,'UniformOutput',false);
C = cellfun(#(x) find(x), preC,'UniformOutput',false);
toc

Related

Matlab onehot to integers

I want to convert a onehot array to an array of integer values in MATLAB. Given:
Y = 1 0 0
0 1 0
0 1 0
I want to return:
new_y = 1
2
2
You could use find and return only the column indices like so
Y = [1 0 0; 0 1 0; 0 1 0];
[~, new_y] = find(Y); % output: [1; 2; 2] is the col indices of your 1s
Similarly you can return the row indices if your input was the transpose
[new_y, ~] = find(Y); % output: [1; 2; 3] is the row indices of your 1s
The Neural Network toolbox of MATLAB has built-in functions for converting between one-hot vectors and indices: ind2vec() to create a one-hot matrix, and vec2ind() to convert the one-hot matrix back to a vector of indices.
Note: ind2vec returns a sparse matrix. To convert it to a full matrix, you have to use the full() function.
>> Y = full(ind2vec([1, 2, 3]))
Y =
1 0 0
0 1 0
0 0 1
>> new_y = vec2ind(Y)
new_y =
1 2 3

Extract unique elements from each row of a matrix (Matlab)

I would like to apply the function unique to each row of a given matrix, without involving any for loop. Suppose I have the following 4-by-5 matrix
full(A) = [0 1 0 0 1
2 1 0 3 0
1 2 0 0 2
0 3 1 0 0]
where A is the corresponding sparse matrix. As an example using a for loop, I can do
uniq = cell(4,1);
for i = 1:4
uniq{i} = unique(A(i,:));
end
and I would obtain the cell structure uniq given by
uniq{1} = {1}
uniq{2} = {[1 2 3]}
uniq{3} = {[1 2]}
uniq{4} = {[1 3]}
Is there a quicker way to vectorize this and avoid for loops?
I need to apply this to matrices M-by-5 with M large.
Note that I'm not interested in the number of unique elements per row (I know there are around answers for such a problem).
You can use accumarray with a custom function:
A = sparse([0 1 0 0 1; 2 1 0 3 0; 1 2 0 0 2; 0 3 1 0 0]); % data
[ii, ~, vv] = find(A);
uniq = accumarray(ii(:), vv(:), [], #(x){unique(x.')});
This gives:
>> celldisp(uniq)
uniq{1} =
1
uniq{2} =
1 2 3
uniq{3} =
1 2
uniq{4} =
1 3
you can use num2cell(A,2) to convert each row into cell and then cellfun with unique to get cell array of unique values from each row:
% generate large nX5 matrix
n = 5000;
A = randi(5,n,5);
% convert each row into cell
C = num2cell(A,2);
% take unique values from each cell
U = cellfun(#unique,C,'UniformOutput',0);

Block diagonalize rows of a J-by-2 matrix

Given a J-by-2 matrix, for example
A = [1 2 ; 3 4 ; 5 6]
I want to block diagonalize it. That is, I want:
B = [1 2 0 0 0 0 ; 0 0 3 4 0 0 ; 0 0 0 0 5 6].
One command that does this is:
blkdiag(A(1,:),A(2,:),A(3,:))
This is will be slow and tedious if J is large. Is there a built-in Matlab function that does this?
Here's one hacky solution for J x 2 array case using linear indexing -
%// Get number of rows
N = size(A,1);
%// Get linear indices of the first column elements positions in output array
idx = 1:2*N+1:(N-1)*(2*N+1)+1;
%// Setup output array
out = zeros(N,N*2);
%// Put first and second column elements into idx and idx+N positions
out([idx(:) idx(:)+N]) = A
There is just one function call (neglecting size as it must be minimal) overhead of zeros and even that can be removed with this undocumented zeros initialization trick -
out(N,N*2) = 0; %// Instead of out = zeros(N,N*2);
Sample run -
A =
1 2
3 4
5 6
7 8
out =
1 2 0 0 0 0 0 0
0 0 3 4 0 0 0 0
0 0 0 0 5 6 0 0
0 0 0 0 0 0 7 8
Here's a benchmarking for the posted solutions thus far.
Benchmarking Code
%//Set up some random data
J = 7000; A = rand(J,2);
%// Warm up tic/toc
for k = 1:100000
tic(); elapsed = toc();
end
disp('---------------------------------- With #mikkola solution')
tic
temp = mat2cell(A, ones(J,1), 2);
B = blkdiag(temp{:});
toc, clear B temp
disp('---------------------------------- With #Jeff Irwin solution')
tic
m = size(A, 1);
n = size(A, 2);
B = zeros(m, m * n);
for k = 1: n
B(:, k: n: m * n) = diag(A(:, k));
end
toc, clear B k m n
disp('---------------------------------- With Hacky1 solution')
tic
N = size(A,1);
idx = 1:2*N+1:(N-1)*(2*N+1)+1;
out = zeros(N,N*2);
out([idx(:) idx(:)+N]) = A;
toc, clear out idx N
disp('---------------------------------- With Hacky2 solution')
tic
N = size(A,1);
idx = 1:2*N+1:(N-1)*(2*N+1)+1;
out(N,N*2) = 0;
out([idx(:) idx(:)+N]) = A;
toc, clear out idx N
Runtimes
---------------------------------- With #mikkola solution
Elapsed time is 0.546584 seconds.
---------------------------------- With #Jeff Irwin solution
Elapsed time is 1.330666 seconds.
---------------------------------- With Hacky1 solution
Elapsed time is 0.455735 seconds.
---------------------------------- With Hacky2 solution
Elapsed time is 0.364227 seconds.
Here's one using mat2cell to reshape into an J-by-1 cell array where each element contains a row of A. Then use the {:} operator to push the contents to blkdiag as a comma-separated variable list:
%//Set up some random data
J = 100;
A = rand(J,2);
%// Solution for arbitrary J-by-2 A
temp = mat2cell(A, ones(J,1), 2);
B = blkdiag(temp{:})
Nice solutions! I had a few timing results here, but repeated them running the benchmarking code by #Divakar. Results on my end below.
---------------------------------- With #mikkola solution
Elapsed time is 0.100674 seconds.
---------------------------------- With #Jeff Irwin solution
Elapsed time is 0.283275 seconds.
---------------------------------- With #Divakar Hacky1 solution
Elapsed time is 0.079194 seconds.
---------------------------------- With #Divakar Hacky2 solution
Elapsed time is 0.051629 seconds.
Here's another solution. Not sure how efficient it is, but it works for a matrix A of any size.
A = [1 2; 3 4; 5 6]
m = size(A, 1)
n = size(A, 2)
B = zeros(m, m * n)
for k = 1: n
B(:, k: n: m * n) = diag(A(:, k))
end
I found a way to do this using sparse that beats the other solutions both in memory and clock time:
N = size(A,1);
ind_1 = [1:N].';
ind_2 = [1:2:2*N-1].';
A_1 = sparse(ind_1,ind_2,A(:,1),N,2*N);
ind_2 = [2:2:2*N].';
A_2 = sparse(ind_1,ind_2,A(:,2),N,2*N);
out = A_1 + A_2;
Results are reported below using the same benchmarking code as #Divakar:
---------------------------------- With #mikkola solution
Elapsed time is 0.065136 seconds.
---------------------------------- With #Jeff Irwin solution
Elapsed time is 0.500264 seconds.
---------------------------------- With Hacky1 solution
Elapsed time is 0.200303 seconds.
---------------------------------- With Hacky2 solution
Elapsed time is 0.011991 seconds.
---------------------------------- With #Matt T solution
Elapsed time is 0.000712 seconds.

Create a sequence from vectors of start and end numbers

How is it possible to create a sequence if I have vectors of starting and ending numbers of the subsequences in a vectorized way in Matlab?
Example Input:
A=[12 20 34]
B=[18 25 37]
I want to get (spacing for clarity):
C=[12 13 14 15 16 17 18 20 21 22 23 24 25 34 35 36 37]
Assuming that A and B are sorted ascending and that B(i) < A(i+1) holds then:
idx = zeros(1,max(B)+1);
idx(A) = 1;
idx(B+1) = -1;
C = find(cumsum(idx))
To get around the issue mentioned by Dennis in the comments:
m = min(A)-1;
A = A-m;
B = B-m;
idx = zeros(1,max(B)+1);
idx(A) = 1;
idx(B+1) = -1;
C = find(cumsum(idx)) + m;
cumsum based approach for a generic case (negative numbers or overlaps) -
%// Positions in intended output array at which group shifts
intv = cumsum([1 B-A+1])
%// Values to be put at those places with intention of doing cumsum at the end
put_vals = [A(1) A(2:end) - B(1:end-1)]
%// Get vector of ones and put_vals
id_arr = ones(1,intv(end)-1)
id_arr(intv(1:end-1)) = put_vals
%// Final output with cumsum of id_arr
out = cumsum(id_arr)
Sample run -
>> A,B
A =
-2 -3 1
B =
5 -1 3
>> out
out =
-2 -1 0 1 2 3 4 5 -3 -2 -1 1 2 3
Benchmarking
Here's a runtime test after warming-up tic-toc to compare the various approaches listed to solve the problem for large sized A and B -
%// Create inputs
A = round(linspace(1,400000000,200000));
B = round((A(1:end-1) + A(2:end))/2);
B = [B A(end)+B(1)];
disp('------------------ Divakar Method')
.... Proposed approach in this solution
disp('------------------ Dan Method')
tic
idx = zeros(1,max(B)+1);
idx(A) = 1;
idx(B+1) = -1;
C = find(cumsum(idx));
toc, clear C idx
disp('------------------ Santhan Method')
tic
In = [A;B];
difIn = diff(In);
out1 = bsxfun(#plus, (0:max(difIn)).',A); %//'
mask = bsxfun(#le, (1:max(difIn)+1).',difIn+1); %//'
out1 = out1(mask).'; %//'
toc, clear out1 mask difIn In
disp('------------------ Itamar Method')
tic
C = cell2mat(cellfun(#(a,b){a:b},num2cell(A),num2cell(B)));
toc, clear C
disp('------------------ dlavila Method')
tic
C = cell2mat(arrayfun(#(a,b)a:b, A, B, 'UniformOutput', false));
toc
Runtimes
------------------ Divakar Method
Elapsed time is 0.793758 seconds.
------------------ Dan Method
Elapsed time is 2.640529 seconds.
------------------ Santhan Method
Elapsed time is 1.662889 seconds.
------------------ Itamar Method
Elapsed time is 2.524527 seconds.
------------------ dlavila Method
Elapsed time is 2.096454 seconds.
Alternative using bsxfun
%// Find the difference between Inputs
difIn = B - A;
%// Do colon on A to A+max(difIn)
out = bsxfun(#plus, (0:max(difIn)).',A); %//'
%// mask out which values you want
mask = bsxfun(#le, (1:max(difIn)+1).',difIn+1); %//'
%// getting only the masked values
out = out(mask).'
Results:
>> A,B
A =
-2 -4 1
B =
5 -3 3
>> out
out =
-2 -1 0 1 2 3 4 5 -4 -3 1 2 3
This is one option:
C = cell2mat(cellfun(#(a,b){a:b},num2cell(A),num2cell(B)));
You can do this:
C = cell2mat(arrayfun(#(a,b)a:b, A, B, "UniformOutput", false))
arrayfun(...) create all the subsequences taking pairs of A and B.
cell2mat is used to concatenate each subsequence.

Calculate roots of multiple polynomials

Given a matrix A that represents polynomials in each column. How can the roots of each polynomial be calculated efficiently without loops?
Here's a comparison between 3 methods:
A simple loop through all the rows, using roots on each row.
A completely loopless approach, based on YBE's idea of using a block-diagonal matrix,
using sparse as an intermediate
A simple loop through all the rows, but this time using "inlined" code from roots.
The code:
%// The polynomials
m = 15;
n = 8;
N = 1e3;
X = rand(m,n);
%// Simplest approach
tic
for mm = 1:N
R = zeros(n-1,m);
for ii = 1:m
R(:,ii) = roots(X(ii,:));
end
end
toc
%// Completely loopless approach
tic
for mm = 1:N
%// Indices for the scaled coefficients
ii = repmat(1:n-1:m*(n-1), n-1,1);
jj = 1:m*(n-1);
%// Indices for the ones
kk = bsxfun(#plus, repmat(2:n-1, m,1), (n-1)*(0:m-1).'); %'
ll = kk-1;
%// The block diagonal matrix
coefs = -bsxfun(#rdivide, X(:,2:end), X(:,1)).'; %'
one = ones(n-2,m);
C = full(sparse([ii(:); kk(:)], [jj(:); ll(:)],...
[coefs(:); one(:)]));
%// The roots
R = reshape(eig(C), n-1,m);
end
toc
%// Simple loop, roots() "inlined"
tic
R = zeros(n-1,m);
for mm = 1:N
for ii = 1:m
A = zeros(n-1);
A(1,:) = -X(ii,2:end)/X(ii,1);
A(2:n:end) = 1;
R(:,ii) = eig(A);
end
end
toc
The results:
%// m=15, n=8, N=1e3:
Elapsed time is 0.780930 seconds. %// loop using roots()
Elapsed time is 1.959419 seconds. %// Loopless
Elapsed time is 0.326140 seconds. %// loop over inlined roots()
%// m=150, n=18, N=1e2:
Elapsed time is 1.785438 seconds. %// loop using roots()
Elapsed time is 110.1645 seconds. %// Loopless
Elapsed time is 1.326355 seconds. %// loop over inlined roots()
Of course, your mileage may vary, but the general message should be clear: The old advice of avoiding loops in MATLAB is just that: OLD. It just no longer applies to MATLAB versions R2009 and up.
Vectorization can still be a good thing though, but certainly not always. As in this case: profiling will tell you that most time is spent on computing the eigenvalues for the block-diagonal matrix. The algorithm underlying eig scales as N³ (yes, that is a three), plus it cannot take advantage of sparse matrices in any way (like this block-diagonal one), making the approach a poor choice in this particular context.
Loops are your friend here ^_^
Now, this is of course based on eig() of the companion matrix, which is a nice and simple method to compute all roots in one go. There are of course many more methods to compute roots of polynomials, each with their own advantages/disadvantages. Some are a lot faster, but aren't so good when a few of the roots are complex. Others are a lot faster, but need a fairly good initial estimate for every root, etc. Most other rootfinding methods are usually a lot more complicated, which is why I left these out here.
Here is a nice overview, and here is a more in-depth overview, along with some MATLAB code examples.
If you're smart, you should only dive into this material if you need to do this computation millions of times on a daily basis for at least the next few weeks, otherwise, it's just not worth the investment.
If you're smarter, you'll recognize that this will undoubtedly come back to you at some point, so it's worthwhile to do anyway.
And if you're an academic, you master all the root-finding methods so you'll have a giant toolbox, so you can pick the best tool for the job whenever a new job comes along. Or even dream up your own method :)
You can use arrayfun in combination with roots, which will give you the results in terms of cell arrays.
n = size(A,2);
t = arrayfun(#(x)roots(A(:,x)), 1:n, 'UniformOutput', 0);
You can then use cell2mat to convert it to a matrix. Either: r = cell2mat(t), or
r = cell2mat(arrayfun(#(x)roots(A(:,x)), 1:n, 'UniformOutput', 0));
Practically what roots does is to find the eigenvalues of the companion matrix.
roots(p) = eig(compan(p))
So here is my example that constructs a block-diagonal matrix out of the companion matrices of each polynomial, than finds the eigenvalues of the block-diagonal matrix.
>> p1=[2 3 5 7];
>> roots(p1)
ans =
-0.0272 + 1.5558i
-0.0272 - 1.5558i
-1.4455
>> eig(compan(p1))
ans =
-0.0272 + 1.5558i
-0.0272 - 1.5558i
-1.4455
>> p2=[1 2 9 5];
>> roots(p2)
ans =
-0.6932 + 2.7693i
-0.6932 - 2.7693i
-0.6135
>> p3=[5 1 4 7];
>> roots(p3)
ans =
0.3690 + 1.1646i
0.3690 - 1.1646i
-0.9381
>> A=blkdiag(compan(p1),compan(p2),compan(p3))
A =
-1.5000 -2.5000 -3.5000 0 0 0 0 0 0
1.0000 0 0 0 0 0 0 0 0
0 1.0000 0 0 0 0 0 0 0
0 0 0 -2.0000 -9.0000 -5.0000 0 0 0
0 0 0 1.0000 0 0 0 0 0
0 0 0 0 1.0000 0 0 0 0
0 0 0 0 0 0 -0.2000 -0.8000 -1.4000
0 0 0 0 0 0 1.0000 0 0
0 0 0 0 0 0 0 1.0000 0
>> eig(A)
ans =
-0.0272 + 1.5558i
-0.0272 - 1.5558i
-1.4455
-0.6932 + 2.7693i
-0.6932 - 2.7693i
-0.6135
0.3690 + 1.1646i
0.3690 - 1.1646i
-0.9381