How to deal with nonzero elements from rows of sparse matrix in Matlab? - matlab

I am dealing with quite a big sparse matrix, its size is about 150,000*150,000. I need to access into its rows, extract the non-zero elements and replace these values following the rule as the code below:
tic
H = [];
for i = 1: size(A,2)
[a,b,c] = find(A(i,:)); % extract the rows
add = diff([0 c(2:end) 0]); % the replacing rule
aa = i*ones(1,size(a,2)); % return back the old position of rows
G0 = [aa' b' add']; % put it back the old position with replaced values
H = [H; G0];
end
H1 = H(:,1);
H2 = H(:,2);
H3 = H(:,3);
ADD = sparse(H1,H2,H3,nm,nm,nzmax);
toc
I found that the find function is really time consuming (0.1s/rows) in this code and with this current size of my sparse matrix, it takes me up to about 33 hours for this job. I do believe there is some ways out but I am such a newborn to coding and dealing with sparse matrix is really scary.
Would you drop me some ideas?

You can use the find function once applying it on the whole array then use accumarray to apply the function on each row:
[a b c]=find(A.');
add=accumarray(b,c,[],#(x){diff([0 ;x(2:end) ;0])});
H = [b a vertcat(add{:})];

Related

Optimal way of doing iterative assembly of sparse matrices in Matlab?

My code needs to in a loop modify the elements of a sparse matrix. Doing this matlab warns me that This sparse indexing expression is likely to be slow. I am preallocating the sparse arrays using the Spalloc function but am still getting this warning. What is the optimal approach for assembling of sparse matrices? This is what I am currently doing.
K=spalloc(n,n,100); f=spalloc(n,1,100);
for i = 1:Nel
[Ke,fe] = myFunction(Ex(i),Ey(i));
inds = data(i,2:end);
K(inds,inds) = K(inds,inds) + Ke;
f(inds) = f(inds)+fe;
end
the indices in inds may be appear several times in the loop, meaning an element in K or f may receive multiple contributions. The last two lines within the loop are where I'm getting warnings.
A common approach is to use the triplet form of the sparse constructor:
S = sparse(i,j,v,m,n)
i and j are row and column index vectors and v is the corresponding data vector. Values corresponding to repeated indices are summed like your code does. So you could instead build up row and column index vectors along with a data vector and then just call sparse with those.
For example something like:
nout = Nel*(size(data,2)-1);
% Data vector for K
Kdata = zeros(1,nout);
% Data vector for f
fdata = zeros(1,nout);
% Index vector for K and f
sparseIdxvec = ones(1,nout);
outIdx = 1;
for i = 1:Nel
[Ke,fe] = myFunction(Ex(i),Ey(i));
inds = data(i,2:end);
nidx = numel(inds);
outIdxvec = outIdx:outIdx+nidx-1;
sparseIdxvec(outIdxvec) = inds;
Kdata(outIdxvec) = Ke;
fdata(outIdxvec) = fe;
outIdx = outIdx + nidx;
end
K = sparse(sparseIdxvec,sparseIdxvec,Kdata,n,n);
f = sparse(sparseIdxvec,1,fdata,n,1);
Depending on your application, that may or may not actually be faster.

Assign labels based on given examples for a large dataset effectively

I have matrix X (100000 X 10) and vector Y (100000 X 1). X rows are categorical and assume values 1 to 5, and labels are categorical too (11 to 20);
The rows of X are repetitive and there are only ~25% of unique rows, I want Y to have statistical mode of all the labels for a particular unique row.
And then there comes another dataset P (90000 X 10), I want to predict labels Q based on the previous exercise.
What I tried is finding unique rows of X using unique in MATLAB, and then assign statistical mode of each of these labels for the unique rows. For P, I can use ismember and carry out the same.
The issue is in the size of the dataset and it takes an 1.5-2 hours to complete the process. Is there a vectorize version possible in MATLAB?
Here is my code:
[X_unique,~,ic] = unique(X,'rows','stable');
labels=zeros(length(X_unique),1);
for i=1:length(X_unique)
labels(i)=mode(Y(ic==i));
end
Q=zeros(length(P),1);
for j=1:length(X_unique)
Q(all(repmat(X_unique(j,:),length(P),1)==P,2))=label(j);
end
You will be able to accelerate your first loop a great deal if you replace it entirely with:
labels = accumarray(ic, Y, [], #(y) mode(y));
The second loop can be accelerated by using all(bsxfun(#eq, X_unique(i,:), P), 2) inside Q(...). This is a good vectorized approach assuming your arrays are not extremely large w.r.t. the available memory on your machine. In addition, to save more time, you could use the unique trick you did with X on P, run all the comparisons on a much smaller array:
[P_unique, ~, IC_P] = unique(P, 'rows', 'stable');
EDIT:
to compute Q_unique in the following way: and then convert it back to the full array using:
Q_unique = zeros(length(P_unique),1);
for i = 1:length(X_unique)
Q_unique(all(bsxfun(#eq, X_unique(i,:), P_unique), 2)) = labels(i)
end
and convert back to Q_full to match the original P input:
Q_full = Q_unique(IC_P);
END EDIT
Finally, if memory is an issue, in addition to everything above, you might want you use a semi-vectorized approach inside your second loop:
for i = 1:length(X_unique)
idx = true(length(P), 1);
for j = 1:size(X_unique,2)
idx = idx & (X_unique(i,j) == P(:,j));
end
Q(idx) = labels(i);
% Q(all(bsxfun(#eq, X_unique(i,:), P), 2)) = labels(i);
end
This would take about x3 longer compared with bsxfun but if memory is limited then you gotta pay with speed.
ANOTHER EDIT
Depending on your version of Matlab, you could also use containers.Map to your advantage by mapping textual representations of the numeric sequences to the calculated labels. See example below.
% find unique members of X to work with a smaller array
[X_unique, ~, IC_X] = unique(X, 'rows', 'stable');
% compute labels
labels = accumarray(IC_X, Y, [], #(y) mode(y));
% convert X to cellstr -- textual representation of the number sequence
X_cellstr = cellstr(char(X_unique+48)); % 48 is ASCII for 0
% map each X to its label
X_map = containers.Map(X_cellstr, labels);
% find unique members of P to work with a smaller array
[P_unique, ~, IC_P] = unique(P, 'rows', 'stable');
% convert P to cellstr -- textual representation of the number sequence
P_cellstr = cellstr(char(P_unique+48)); % 48 is ASCII for 0
% --- EDIT --- avoiding error on missing keys in X_map --------------------
% find which P's exist in map
isInMapP = X_map.isKey(P_cellstr);
% pre-allocate Q_unique to the size of P_unique (can be any value you want)
Q_unique = nan(size(P_cellstr)); % NaN is safe to use since not a label
% find the labels for each P_unique that exists in X_map
Q_unique(isInMapP) = cell2mat(X_map.values(P_cellstr(isInMapP)));
% --- END EDIT ------------------------------------------------------------
% convert back to full Q array to match original P
Q_full = Q_unique(IC_P);
This takes about 15 seconds to run on my laptop. Most of which is consumed by computation of mode.

Multiple constant to a matrix and convert them into block diagonal matrix in matlab

I have a1 a2 a3. They are constants. I have a matrix A. What I want to do is to get a1*A, a2*A, a3*A three matrices. Then I want transfer them into a diagonal block matrix. For three constants case, this is easy. I can let b1 = a1*A, b2=a2*A, b3=a3*A, then use blkdiag(b1, b2, b3) in matlab.
What if I have n constants, a1 ... an. How could I do this without any looping?I know this can be done by kronecker product but this is very time-consuming and you need do a lot of unnecessary 0 * constant.
Thank you.
Discussion and code
This could be one approach with bsxfun(#plus that facilitates in linear indexing as coded in a function format -
function out = bsxfun_linidx(A,a)
%// Get sizes
[A_nrows,A_ncols] = size(A);
N_a = numel(a);
%// Linear indexing offsets between 2 columns in a block & between 2 blocks
off1 = A_nrows*N_a;
off2 = off1*A_ncols+A_nrows;
%// Get the matrix multiplication results
vals = bsxfun(#times,A,permute(a,[1 3 2])); %// OR vals = A(:)*a_arr;
%// Get linear indices for the first block
block1_idx = bsxfun(#plus,[1:A_nrows]',[0:A_ncols-1]*off1); %//'
%// Initialize output array base on fast pre-allocation inspired by -
%// http://undocumentedmatlab.com/blog/preallocation-performance
out(A_nrows*N_a,A_ncols*N_a) = 0;
%// Get linear indices for all blocks and place vals in out indexed by them
out(bsxfun(#plus,block1_idx(:),(0:N_a-1)*off2)) = vals;
return;
How to use: To use the above listed function code, let's suppose you have the a1, a2, a3, ...., an stored in a vector a, then do something like this out = bsxfun_linidx(A,a) to have the desired output in out.
Benchmarking
This section compares or benchmarks the approach listed in this answer against the other two approaches listed in the other answers for runtime performances.
Other answers were converted to function forms, like so -
function B = bsxfun_blkdiag(A,a)
B = bsxfun(#times, A, reshape(a,1,1,[])); %// step 1: compute products as a 3D array
B = mat2cell(B,size(A,1),size(A,2),ones(1,numel(a))); %// step 2: convert to cell array
B = blkdiag(B{:}); %// step 3: call blkdiag with comma-separated list from cell array
and,
function out = kron_diag(A,a_arr)
out = kron(diag(a_arr),A);
For the comparison, four combinations of sizes of A and a were tested, which are -
A as 500 x 500 and a as 1 x 10
A as 200 x 200 and a as 1 x 50
A as 100 x 100 and a as 1 x 100
A as 50 x 50 and a as 1 x 200
The benchmarking code used is listed next -
%// Datasizes
N_a = [10 50 100 200];
N_A = [500 200 100 50];
timeall = zeros(3,numel(N_a)); %// Array to store runtimes
for iter = 1:numel(N_a)
%// Create random inputs
a = randi(9,1,N_a(iter));
A = rand(N_A(iter),N_A(iter));
%// Time the approaches
func1 = #() kron_diag(A,a);
timeall(1,iter) = timeit(func1); clear func1
func2 = #() bsxfun_blkdiag(A,a);
timeall(2,iter) = timeit(func2); clear func2
func3 = #() bsxfun_linidx(A,a);
timeall(3,iter) = timeit(func3); clear func3
end
%// Plot runtimes against size of A
figure,hold on,grid on
plot(N_A,timeall(1,:),'-ro'),
plot(N_A,timeall(2,:),'-kx'),
plot(N_A,timeall(3,:),'-b+'),
legend('KRON + DIAG','BSXFUN + BLKDIAG','BSXFUN + LINEAR INDEXING'),
xlabel('Datasize (Size of A) ->'),ylabel('Runtimes (sec)'),title('Runtime Plot')
%// Plot runtimes against size of a
figure,hold on,grid on
plot(N_a,timeall(1,:),'-ro'),
plot(N_a,timeall(2,:),'-kx'),
plot(N_a,timeall(3,:),'-b+'),
legend('KRON + DIAG','BSXFUN + BLKDIAG','BSXFUN + LINEAR INDEXING'),
xlabel('Datasize (Size of a) ->'),ylabel('Runtimes (sec)'),title('Runtime Plot')
Runtime plots thus obtained at my end were -
Conclusions: As you can see, either one of the bsxfun based methods could be looked into, depending on what kind of datasizes you are dealing with!
Here's another approach:
Compute the products as a 3D array using bsxfun;
Convert into a cell array with one product (matrix) in each cell;
Call blkdiag with a comma-separated list generated from the cell array.
Let A denote your matrix, and a denote a vector with your constants. Then the desired result B is obtained as
B = bsxfun(#times, A, reshape(a,1,1,[])); %// step 1: compute products as a 3D array
B = mat2cell(B,size(A,1),size(A,2),ones(1,numel(a))); %// step 2: convert to cell array
B = blkdiag(B{:}); %// step 3: call blkdiag with comma-separated list from cell array
Here's a method using kron which seems to be faster and more memory efficient than Divakar's bsxfun based solution. I'm not sure if this is different to your method, but the timing seems pretty good. It might be worth doing some testing between the different methods to work out which is more efficient for you problem.
A=magic(4);
a1=1;
a2=2;
a3=3;
kron(diag([a1 a2 a3]),A)

Delete rows in a vector starting from a specific index Matlab

I have an nx1 vector in Matlab. I want to delete rows starting from a specific index. For example, if n is 100 and the index is 60 then all rows from 60 till 100 will be removed. I've found this REMOVEROWS but I don't know how to make it do this.
The removerows function is probably overkill for a vector, but here is how it can be used along with the usual linear indexing method:
n = 100;
index = 60;
a = rand(n,1); % An n-by-1 column vector
b1 = a(1:index-1)
b2 = removerows(a,'ind',index:n) % Or removerows(a,'ind',index:size(a,1))
Note that the removerows function is in the Neural Network toolbox and thus not part of core Matlab.
This should the trick:
your_vector(index:end) = [];
If you want to remove from the end that you can do:
your_vector(end-index+1:end) = [];

How can I build a Scilab / MATLAB program that averages a 3D matrix?

I need to make a scilab / MATLAB program that averages the values of a 3D matrix in cubes of a given size(N x N x N).I am eternally grateful to anyone who can help me.
Thanks in advance
In MATLAB, mat2cell and cellfun make a great team for working on N-dimensional non-overlapping blocks, as I think is the case in the question. An example scenario:
[IN]: A = [30x30x30] array
[IN]: bd = [5 5 5], size of cube
[OUT]: B = [6x6x6] array of block means
To accomplish the above, the solution is:
dims = [30 30 30]; bd = [5 5 5];
A = rand(dims);
f = floor(dims./bd);
remDims = mod(dims,bd); % handle dims that are not a multiple of block size
Ac = mat2cell(A,...
[bd(1)*ones(f(1),1); remDims(1)*ones(remDims(1)>0)], ....
[bd(2)*ones(f(2),1); remDims(2)*ones(remDims(2)>0)], ....
[bd(3)*ones(f(3),1); remDims(3)*ones(remDims(3)>0)] );
B = cellfun(#(x) mean(x(:)),Ac);
If you need a full size output with the mean values replicated, there is a straightforward solution involving the 'UniformOutput' option of cellfun followed by cell2mat.
If you want overlapping cubes and the same size output as input, you can simply do convn(A,ones(blockDims)/prod(blockDims),'same').
EDIT: Simplifications, clarity, generality and fixes.
N = 10; %Same as OP's parameter
M = 10*N;%The input matrix's size in each dimensiona, assumes M is an integer multiple of N
Mat = rand(M,M,M); % A random input matrix
avgs = zeros((M/N)^3,1); %Initializing output vector
l=1; %indexing
for i=1:M/N %indexing 1st coord
for j=1:M/N %indexing 2nd coord
for k=1:M/N % indexing third coord
temp = Mat((i-1)*N+1:i*N,(j-1)*N+1:j*N,(k-1)*N+1:k*N); %temporary copy
avg(l) = mean(temp(:)); %averaging operation on the N*N*N copy
l = l+1; %increment indexing
end
end
end
The for loops and copying can be eliminated once you get the gist of indexing.