I have a cell array of 53 different (40,000 x 2000) sparse matrices. I need to take the mean over the third dimension, so that, for example, element (2,5) is averaged across the 53 cells. This should yield a single (40,000 x 2000) output. I think there ought to be a way to do this with cellfun(), but I am not able to write a function that works across cells on the same within-cell indices.
You can convert each sparse matrix to the indices and values of its nonzero entries, and then use sparse to automatically obtain the sum in sparse form:
myCell = {sparse([0 1; 2 0]), sparse([3 0; 4 0])}; %// example
C = numel(myCell);
M = cell(1,C); %// preallocate
N = cell(1,C);
V = cell(1,C);
for c = 1:C
[m n v] = find(myCell{c}); %// rows, columns and values of nonzero entries
M{c} = m.';
N{c} = n.';
V{c} = v.';
end
result = sparse([M{:}],[N{:}],[V{:}])/C; %// "sparse" sums over repeated indices
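For the two-matrix example above, a quick sanity check (the value follows from the element-wise mean of the two example matrices):
full(result) %// returns [1.5 0.5; 3 0]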
This should do the trick: just initialize an array of zeros and sum each element of the cell array into it. I don't see any way around using a for loop, short of concatenating everything into one giant 3D array (which would almost certainly run out of memory).
running_sum = zeros(size(cell_arr{1}));
for i=1:length(cell_arr)
running_sum=running_sum+cell_arr{i};
end
means = running_sum./length(cell_arr);
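Note that zeros(...) makes the accumulator a full matrix, so the running sum becomes dense. Since the matrices in the question are sparse, a minimal tweak (assuming the cell array is named cell_arr as above) is to initialize the accumulator with sparse instead, which keeps every intermediate sum sparse:
running_sum = sparse(size(cell_arr{1},1), size(cell_arr{1},2)); %// all-zero sparse accumulator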
This question generalizes the previous one Any way for matlab to sum an array according to specified bins NOT by for iteration? It would be best if there were a built-in function for this. I tried the answers in the previous post, but they do not seem to work with matrices.
For example, if
A = [7,8,1,1,2,2,2]; % the bins or subscripts
B = [2,1; ...
1,1; ...
1,1; ...
2,0; ...
3,1; ...
0,2; ...
2,4]; % the matrix
then the desired function "binsum" has two outputs: one is the bins, and the other is the accumulated row vectors. It adds up rows of B according to the subscripts in A. For example, for bin 2 the sum is [3,1] + [0,2] + [2,4] = [5,7], and for bin 1 it is [1,1] + [2,0] = [3,1].
[bins, sums] = binsum(A,B);
bins = [1,2,7,8]
sums = [3,1;
5,7;
2,1;
1,1]
The first method, accumarray, says its "val" argument can only be a scalar or vector. The second method, sparse, does not seem to accept a vector as the value "v" for each tuple (i,j) either. So I have to post for help again; it is still not desired to iterate over the columns of B to do this.
I am using MATLAB R2017a. Many thanks again!
A way to do that is using matrix multiplication:
bins = unique(A);
sums = (A==bins.')*B;
The above is memory-expensive, as it builds an intermediate logical matrix of size M×N, where M is the number of bins and N is the length of A. Alternatively, you can build that matrix as sparse logical to save memory:
[bins, ~, labels] = unique(A);
sums = sparse(labels, 1:numel(A), true)*B;
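As a quick check with the A and B from the question, both variants return the desired output:
bins = [1,2,7,8]
sums = [3,1; 5,7; 2,1; 1,1]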
A method based on sort and cumsum:
[s,I] = sort(A); %// sort the bin subscripts, remembering the original order
c = cumsum(B(I,:)); %// cumulative sums of the correspondingly reordered rows of B
k = [s(1:end-1)~=s(2:end) true]; %// marks the last occurrence of each bin in the sorted list
sums = diff([zeros(1,size(B,2)); c(k,:)]) %// per-bin sums from the cumulative sums at bin boundaries
bins = s(k)
I have a 1-by-4 cell array, D. Each of the cell elements contains a 2-by-2 double matrix. I want to randomly permute the elements of each matrix independently, which will give me a cell array of the same size as D whose matrices' elements are permuted, and then apply the inverse permutation in order to obtain the original D again.
For the single-matrix case I have code that works well:
A=rand(3,3)
p=randperm(numel(A));
A(:)=A(p)
[p1,ind]=sort(p);
A(:)=A(ind)
but it doesn't work for a cell array.
The simplest solution for you is to use a loop:
nd = numel(D);
D_permuted{1,nd} = [];
D_ind{1,nd} = [];
for d = 1:nd
A=D{d};
p=randperm(numel(A));
A(:)=A(p);
[~,ind]=sort(p);
D_permuted{d} = A;
D_ind{d} = ind;
end
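To recover the original D afterwards (the inverse step the question asks for), here is a minimal sketch that reuses the stored D_ind from the loop above:
D_restored = cell(1, nd);
for d = 1:nd
A = D_permuted{d};
A(:) = A(D_ind{d}); %// undo the shuffle using the inverse permutation
D_restored{d} = A;
end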
Assuming your D is just a list of identically sized (e.g. 2-by-2) matrices, you could avoid the loop by using a 3D double array instead of the cell array.
For example, if you had a D like this:
n = 5;
D = repmat([1,3;2,4],1,1,n)*10 %// Example data
Then you can do the permutation like this
m = 2*2; %// Here m is the product of the dimensions of each matrix you want to shuffle
[~,I] = sort(rand(m,n)); %// This is just a trick to get the equivalent of a vectorized form of randperm as unfortunately randperm only accepts scalars
idx = reshape(I,2,2,n);
idx = bsxfun(@plus, idx, reshape((0:n-1)*m, 1, 1, n)); %// offset the indices so that slice k indexes into D(:,:,k)
D_shuffled = D(idx);
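A quick sanity check (assuming the offset step above) is that every slice of D_shuffled contains the same values as the corresponding slice of D, just reordered:
isequal(sort(reshape(D,m,n)), sort(reshape(D_shuffled,m,n))) %// returns logical 1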
I have a cell array with x columns, each with a yx1 cell. I would like to randomize the "rows" within the columns. That is, for each yx1 cell with elements a_1, a_2, ... a_y, I would like to apply the same permutation to the indices of a_i.
I've got a function that does this,
function Oarray = shuffleCellArray(Iarray)
len = length(Iarray{1});
width = length(Iarray);
perm = randperm(len);
Oarray = cell(1, width);
for i = 1:width
    for j = 1:len
        Oarray{i}{j} = Iarray{i}{perm(j)};
    end
end
but as you can see it's a bit ugly. Is there a more natural way to do this?
I realize that I'm probably using the wrong data type, but for legacy reasons I'd like to avoid switching. But, if the answer is "switch" then I guess that's the answer.
I'm assuming you have a cell array of column vectors, such as
Iarray = {(1:5).' (10:10:50).' (100:100:500).'};
In that case, you could do it this way:
ind = randperm(numel(Iarray{1})); %// random permutation
Oarray = cellfun(@(x) x(ind), Iarray, 'UniformOutput', 0); %// apply that permutation
%// to each "column"
Or converting to an intermediate matrix and then back to a cell array:
ind = randperm(numel(Iarray{1})); %// random permutation
x = cat(2,Iarray{:}); %// convert to matrix
Oarray = mat2cell(x(ind,:), size(x,1), ones(1,size(x,2))); %// apply permutation to rows
%// and convert back
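For instance, if the random permutation happened to be ind = [3 1 5 2 4] (just a hypothetical draw), either version would give Oarray{1} = [3;1;5;2;4], Oarray{2} = [30;10;50;20;40] and Oarray{3} = [300;100;500;200;400], i.e. the same row permutation applied to every column.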
Say I have an nxm matrix and want to treat each row as a vector in a function. So, if I have a function that adds vectors, finds the Cartesian product of vectors, or for some reason takes several vectors as input, I want that function to treat each row of the matrix as a vector.
This sounds like a very basic operation in MATLAB. You can access the ith row of a matrix A using A(i, :). For example, to add rows i and j, you would do A(i, :) + A(j, :).
Given an nxm matrix A:
If you want to edit a single column/row you could use the following syntax: A(:, i) for the ith-column and A(i, :) for ith-row.
If you want to edit from a column/row i to a column/row j, you could use that syntax: A(:, i:j) or A(i:j, :)
If you want to edit, for example, from the penultimate column/row to the last one, you could use: A(:, end-1:end) or A(end-1:end, :)
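A small illustration of that indexing, using magic(4) as a stand-in matrix:
A = magic(4); %// 4-by-4 example matrix
col2 = A(:, 2); %// the 2nd column
row3 = A(3, :); %// the 3rd row
middle = A(2:3, :); %// rows 2 through 3
lastTwo = A(:, end-1:end); %// the last two columns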
EDIT:
I can't add a comment above because I don't have 50 reputation points, but you should post the function setprod. I think you should be able to do what you want by iterating over the matrix you pass as an argument with a for loop.
I think you're going to have to loop:
Input
M = [1 2;
3 4;
5 6];
Step 1: Generate a list of all possible row pairs (row index numbers)
n = size(M,1);
row_ind = nchoosek(1:n,2)
Step 2: Loop through these indices and generate the product set:
S{n,n} = []; %// Preallocation of cell matrix
for pair = 1:size(row_ind,1)
p1 = row_ind(pair,1);
p2 = row_ind(pair,2);
S{p1,p2} = setprod(M(p1,:), M(p2,:))
end
Transform the matrix into a list of row vectors using these two steps:
Convert the matrix into a cell array of the matrix rows, using mat2cell.
Generate a comma-separated list from the cell array, using linear indexing of the cell contents.
Example: let
v1 = [1 2];
v2 = [10 20];
v3 = [11 12];
M = [v1; v2; v3];
and let fun be a function that accepts an arbitrary number of vectors as its input. Then
C = mat2cell(M, ones(1,size(M,1)));
result = fun(C{:});
is the same as result = fun(v1, v2, v3).
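As a quick check with a concrete function (horzcat, chosen here just for illustration):
fun = @horzcat;
C = mat2cell(M, ones(1,size(M,1)));
result = fun(C{:}) %// returns [1 2 10 20 11 12], the same as fun(v1, v2, v3)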
Assuming I have a series of column vectors of different lengths, what would be the best way, in terms of computation time, to join all of them into one matrix whose size is determined by the longest column, with the elongated columns' cells all filled with NaNs?
Edit: Please note that I am trying to avoid cell arrays, since they are expensive in terms of memory and run time.
For example:
A = [1;2;3;4];
B = [5;6];
C = magicFunction(A,B);
Result:
C =
1 5
2 6
3 NaN
4 NaN
The following code avoids cell arrays except for estimating the number of elements in each input vector, which keeps the code a bit cleaner; the price of using cell arrays for that tiny bit of work shouldn't be too high. Also, varargin gives you the inputs as a cell array anyway. You could avoid cell arrays there too, but it would most probably involve for-loops and a separate variable name for each input, which isn't elegant when creating a function with an unknown number of inputs. Otherwise, the code uses numeric arrays, logical indexing and my favourite bsxfun, which must be cheap in the market of runtimes.
Function Code
function out = magicFunction(varargin)
lens = cellfun(@numel, varargin); %// length of each input vector
out = NaN(max(lens), numel(lens)); %// preallocate output with NaNs
out(bsxfun(@le, [1:max(lens)]', lens)) = vertcat(varargin{:}); %// fill the valid positions column by column
return;
Example
Script -
A1 = [9;2;7;8];
A2 = [1;5];
A3 = [2;6;3];
out = magicFunction(A1,A2,A3)
Output -
out =
9 1 2
2 5 6
7 NaN 3
8 NaN NaN
Benchmarking
As part of the benchmarking, we compare our solution to @gnovice's solution, which is mostly based on cell arrays. Our intention here is to see what speedups, if any, we get after avoiding cell arrays. Here's the benchmarking code with 20 vectors -
%// Let's create row vectors A1,A2,A3.. to be used with @gnovice's solution
num_vectors = 20;
max_vector_length = 1500000;
vector_lengths = randi(max_vector_length,num_vectors,1);
vs = arrayfun(@(x) randi(9,1,vector_lengths(x)),1:numel(vector_lengths),'uni',0);
[A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,A16,A17,A18,A19,A20] = vs{:};
%// Maximally cell-array based approach used in linked @gnovice's solution
disp('--------------------- With @gnovice''s approach')
tic
tcell = {A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,A16,A17,A18,A19,A20};
maxSize = max(cellfun(@numel,tcell)); %# Get the maximum vector size
fcn = @(x) [x nan(1,maxSize-numel(x))]; %# Create an anonymous function
rmat = cellfun(fcn,tcell,'UniformOutput',false); %# Pad each cell with NaNs
rmat = vertcat(rmat{:});
toc, clear tcell maxSize fcn rmat
%// Transpose each of the input vectors to get column vectors as needed
%// for our problem
vs = cellfun(@(x) x',vs,'uni',0); %// transpose each vector
[A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,A16,A17,A18,A19,A20] = vs{:};
%// Our solution
disp('--------------------- With our new approach')
tic
out = magicFunction(A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,...
A11,A12,A13,A14,A15,A16,A17,A18,A19,A20);
toc
Results -
--------------------- With @gnovice's approach
Elapsed time is 1.511669 seconds.
--------------------- With our new approach
Elapsed time is 0.671604 seconds.
Conclusions -
With 20 vectors and a maximum length of 1500000, the speedups are between 2-3x, and the speedups increased as we increased the number of vectors. The results proving that are not shown here to save space, as we have already used quite a lot of it.
If you use a cell array you won't need to fill it with NaNs; just write each array into its own cell and the unused elements stay empty (that would be the space-efficient way). You could either use:
cell_result{1} = A;
cell_result{2} = B;
This would result in a cell array of size 2 which contains all elements of A and B. Or if you want every element saved in its own cell, arranged in rows:
cell_result(1,1:numel(A)) = num2cell(A);
cell_result(2,1:numel(B)) = num2cell(B);
If you need them to be filled with NaNs for future coding, the easiest approach is to find the maximum length among your arrays and create a matrix of size (max_length x number of arrays).
So let's say you have n=5 arrays: A, B, C, D and E.
h=zeros(1,n);
h(1)=numel(A);
h(2)=numel(B);
h(3)=numel(C);
h(4)=numel(D);
h(5)=numel(E);
max_No_Entries=max(h);
result = NaN(max_No_Entries,n);
result(1:numel(A),1)=A;
result(1:numel(B),2)=B;
result(1:numel(C),3)=C;
result(1:numel(D),4)=D;
result(1:numel(E),5)=E;