I have a vector of linear indices and want to increment the matrix element at each index, once for every occurrence of that index. For example:
ind = [1 2 2 5];
m = zeros(3);
m(ind) = m(ind) + 1;
The result is as follows:
m = [1 0 0
1 1 0
0 0 0]
But I need the results to be
m = [1 0 0
2 1 0
0 0 0]
Running time is very important to me, and I can't use a for loop. Thanks.
Here's a way. I haven't timed it.
ind = [1 2 2 5];
N = 3;
m = full(reshape(sparse(ind, 1, 1, N^2, 1), N, N));
Equivalently, you can use
ind = [1 2 2 5];
N = 3;
m = reshape(accumarray(ind(:), 1, [N^2 1]), N, N);
or this variation (thanks to @beaker):
ind = [1 2 2 5];
N = 3;
m = zeros(N);
m(:) = accumarray(ind(:), 1, [N^2 1]);
This one is probably slower than the others:
ind = [1 2 2 5];
N = 3;
m = zeros(N);
[ii, ~, vv] = find(accumarray(ind(:), 1));
m(ii) = vv;
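All of the snippets above build an N-by-N result. As a sketch for a non-square M-by-N target (the sizes and indices below are made-up example values), the same accumarray call works with M*N as the length, followed by a reshape to M-by-N:

```matlab
% Example sizes and indices (illustrative values, not from the question)
M = 2;                      % number of rows in the target matrix
N = 3;                      % number of columns in the target matrix
ind = [1 2 2 5 6 6 6];      % linear indices, repeats allowed

% Count how many times each linear index occurs; the size argument
% [M*N 1] pads the result with zeros up to the full element count.
counts = accumarray(ind(:), 1, [M*N 1]);

% Lay the counts out as an M-by-N matrix (column-major, like linear indexing)
m = reshape(counts, M, N);
disp(m)
```

Because reshape fills column by column, each count lands exactly where its linear index in ind points.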
For a sorted row vector of indices, we can play with diff:
out = zeros(M,N); % Output array of size(M,N)
df = diff([0,ind,ind(end)+1]);
put_idx = diff(find(df)); % gets count of dups
out(ind(df(1:end-1)~=0)) = put_idx;
The basic idea is that we count the duplicates along the length with diff. Those counts are the values to be assigned into the zeros array. The indices at which those values are assigned are simply the unique indices, which can be found by looking for the start of each group of duplicate indices.
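A trace of the diff steps on the question's example, showing the intermediate values. The sort(ind(:).') line is an added assumption for inputs that may be unsorted or stored as a column; the answer itself assumes a sorted row vector:

```matlab
% Small trace of the diff-based approach (example values from the question)
ind = [1 2 2 5];
M = 3; N = 3;

% The method needs a sorted row vector; sorting here is an added step for
% inputs that might not already be sorted.
ind = sort(ind(:).');

out = zeros(M, N);
df = diff([0, ind, ind(end)+1]);   % nonzero wherever a new index value starts
starts = df(1:end-1) ~= 0;         % [1 1 0 1]: first element of each run
put_idx = diff(find(df));          % run lengths = duplicate counts, here [1 2 1]
out(ind(starts)) = put_idx;        % write each count at its unique index
disp(out)
```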
Benchmarking
Function to create a sorted array of indices (create_data.m):
function ind = create_data(M,N, num_unq_ind, max_repeats)
unq_ind = unique(randi([1,M*N],1,num_unq_ind));
num_repeats = randi(max_repeats, [1,numel(unq_ind)]);
ind = repelem(unq_ind, num_repeats);
Benchmarking script (bench1.m) to test various scenarios:
clear all; close all;
M = 5000; % Array size
N = 5000;
% Input params and setup input indices array (edited for various runs)
num_unq_ind = 100000;
max_repeats = 100;
ind = create_data(M,N, num_unq_ind, max_repeats);
num_iter = 100; % No. of iterations to have reliable benchmarking
disp('Input params :')
disp(['num_unq_ind = ' int2str(num_unq_ind)])
disp(['max_repeats = ' int2str(max_repeats)])
disp('------------------ Using diff ----------------')
tic
for i=1:num_iter
out = zeros(M,N);
df = diff([0,ind,ind(end)+1]);
put_idx = diff(find(df));
out(ind(df(1:end-1)~=0)) = put_idx;
end
toc
% Luis's solution
disp('------------------ Using accumarray ----------------')
tic
for i=1:num_iter
m = reshape(accumarray(ind(:), 1, [N^2 1]), N, N);
end
toc
Runs for various scenarios:
>> bench1
Input params :
num_unq_ind = 10000
max_repeats = 10
------------------ Using diff ----------------
Elapsed time is 0.948544 seconds.
------------------ Using accumarray ----------------
Elapsed time is 1.502658 seconds.
>> bench1
Input params :
num_unq_ind = 100000
max_repeats = 10
------------------ Using diff ----------------
Elapsed time is 1.784576 seconds.
------------------ Using accumarray ----------------
Elapsed time is 1.533280 seconds.
>> bench1
Input params :
num_unq_ind = 10000
max_repeats = 100
------------------ Using diff ----------------
Elapsed time is 1.315998 seconds.
------------------ Using accumarray ----------------
Elapsed time is 1.391323 seconds.
>> bench1
Input params :
num_unq_ind = 100000
max_repeats = 100
------------------ Using diff ----------------
Elapsed time is 6.180565 seconds.
------------------ Using accumarray ----------------
Elapsed time is 3.576154 seconds.
With denser data (more unique indices) and more repeats, accumarray seems to do better.
You can use histcounts. Repeating n^2 as the last edge gives index n^2 a bin of its own:
n = 3;
m = reshape(histcounts(ind, [1:n^2 n^2]), n, n);
Related
I'm working with MATLAB R2019b.
x = [10 10 10 20 20 30]';
How do I get a running count of each unique element in x, which should look like:
y = [1 2 3 1 2 1]';
EDIT:
My real array is actually much longer than the example given above. Below are the methods I tested:
x = randi([1 100], 100000, 1);
x = sort(x);
% method 1: check neighboring values in one loop
tic
y = ones(size(x));
for ii = 2:length(x)
if x(ii) == x(ii-1)
y(ii) = y(ii-1) + 1;
end
end
toc
% method 2 (Wolfie): count occurrence of unique values explicitly
tic
u = unique(x);
y = zeros(size(x));
for ii = 1:numel(u)
idx = (x == u(ii));
y(idx) = 1:nnz(idx);
end
toc
% method 3 (Luis Mendo): triangular matrix
tic
y = sum(triu(x==x'))';
toc
Results:
Method 1: Elapsed time is 0.016847 seconds.
Method 2: Elapsed time is 0.037124 seconds.
Method 3: Elapsed time is 10.350002 seconds.
EDIT:
Assuming that x is sorted:
x = [10 10 10 20 20 30].';
x = sort(x);
d = [1 ;diff(x)];
f = find(d);
d(f) = f;
ic = cummax(d);
y = (2 : numel(x) + 1).' - ic;
When x is unsorted use this:
[s, is] = sort(x);
d = [1 ;diff(s)];
f = find(d);
d(f) = f;
ic = cummax(d);
y = zeros(size(x));   % preallocate so y has the same shape as x
y(is) = (2 : numel(s) + 1).' - ic;
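To see why the find/cummax version works, here is a trace on a small unsorted vector (values chosen for illustration). Each run start is replaced by its own position, cummax then carries that start position across the run, and subtracting it from the position gives the rank within the run. The unsorted case relies on sort keeping equal elements in their original order, so the counts follow order of appearance; y is preallocated so it keeps the column shape of x:

```matlab
% Unsorted example (illustrative values)
x = [20 10 30 10 20 10].';

[s, is] = sort(x);          % s = [10 10 10 20 20 30]', is = [2 4 6 1 5 3]'
d = [1; diff(s)];           % nonzero where a new value starts
f = find(d);                % start positions of the runs: [1 4 6]'
d(f) = f;                   % replace each start flag by its own position
ic = cummax(d);             % start position of the run containing each element
y = zeros(size(x));         % preallocate so y keeps the shape of x
y(is) = (2 : numel(s) + 1).' - ic;   % position - run start + 1
disp(y.')
```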
Original answer, which only works in GNU Octave:
Assuming that x is sorted:
x = [10 10 10 20 20 30].';
x = sort(x);
[~, ic] = cummax(x);
y = (2 : numel(x) + 1).' - ic;
When x is unsorted use this:
[s, is] = sort(x);
[~, ic] = cummax(s);
y = zeros(size(x));   % preallocate so y has the same shape as x
y(is) = (2 : numel(s) + 1).' - ic;
You could loop over the unique elements, and set their indices to 1:n each time...
u = unique(x);
y = zeros(size(x));
for ii = 1:numel(u)
idx = (x == u(ii));
y(idx) = 1:nnz(idx);
end
This is a little inefficient because it generates an n-by-n intermediate matrix, when only its upper triangular half is actually needed:
y = sum(triu(x==x.')).';
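A small trace shows what the intermediate matrix holds (values made up for illustration). It has numel(x)^2 entries, which is why this approach becomes slow and memory-hungry for long vectors; on the other hand it does not need x to be sorted. The expression x == x.' relies on implicit expansion (R2016b or later, or GNU Octave):

```matlab
% Tiny example (illustrative values), deliberately not sorted
x = [10 10 20 10].';

E = (x == x.');      % n-by-n: E(i,j) is true when x(i) equals x(j)
U = triu(E);         % keep only pairs with i <= j
y = sum(U).';        % column j counts equal elements at or before position j
disp(y.')
```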
Here's a no-for-loop version. On my machine it's a bit faster than the previous working methods:
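% if x is already sorted, skip the sort and use z directly as the result
[s, is] = sort(x);
[~, ~, iu] = unique(s);          % group number of each sorted element
c = accumarray(iu, 1);           % size of each group
cs = cumsum([0; c]);             % number of elements before each group
z = (1:numel(x))' - repelem(cs(1:end-1), c);
y = zeros(size(x));              % preallocate so y has the same shape as x
y(is) = z;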
I have a big matrix M (nxm). I want to sum groups of its elements whose linear indices are stored as vectors in a cell array. There are many groups of indices, so the cell array has many elements. For example:
M = rand(2103, 2030);
index{1} = [1 3 2 4 53 5 23 3];
index{2} = [2 3 1 3 23 10234 2032];
% ...
index{2032} = ...;
I want to sum all elements at index{1}, all elements at index{2}, and so on. Currently I am using a loop:
sums = zeros(1, 2032);
for n=1:2032
sums(n) = sum(M(index{n}));
end
Is there a one-line command that does this without a loop? The loop is quite slow.
Probably a classic use of cellfun:
sums = cellfun(@(idx) sum(M(idx)), index);
EDIT: here is a benchmark for a large case, showing that this approach is slightly slower than a for loop but faster than Eitan T's method:
M = rand(2103, 2030);
index = cell(1, 2032);
index{1} = [1 3 2 4 53 5 23 3];
index{2} = [2 3 1 3 23 10234 2032];
for n=3:2032
index{n} = randi(numel(M), 1, randi(10000));
end
N = 1e1;
sums = zeros(1, 2032);
tic
for kk = 1:N
for n=1:2032
sums(n) = sum(M(index{n}));
end
end
toc
tic
for kk = 1:N
sums = cellfun(@(idx) sum(M(idx)), index);
end
toc
tic
for kk = 1:N
sums = cumsum(M([index{:}]));
sums = diff([0, sums(cumsum(cellfun('length', index)))]);
end
toc
results in
Elapsed time is 2.072292 seconds.
Elapsed time is 2.139882 seconds.
Elapsed time is 2.669894 seconds.
Perhaps not as elegant as a cellfun one-liner, but runs more than an order of magnitude faster:
sums = cumsum(M([index{:}]));
sums = diff([0, sums(cumsum(cellfun('length', index)))]);
It even runs approximately 4 or 5 times faster than a JIT-accelerated loop for large inputs. Note that when each cell in index contains a vector with more than ~2000 elements, the performance of this approach begins to deteriorate in comparison with a loop (and cellfun).
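A small check of the cumsum/diff trick against the loop (example values only). The running total is taken once over all selected elements, and each group sum is the difference of the running totals at consecutive group ends. Integer-valued data is used so the comparison is exact; with general floating-point data, subtracting running totals can differ from direct sums by rounding. The sketch also assumes the first cell of index is not empty, since its group end would then be 0, which is not a valid index:

```matlab
% Small example (illustrative values)
M = magic(4);
index = {[1 2], 3, [4 5 6]};

% Loop reference
ref = zeros(1, numel(index));
for n = 1:numel(index)
    ref(n) = sum(M(index{n}));
end

% cumsum/diff version
allIdx = [index{:}];                      % all indices in one row: [1 2 3 4 5 6]
cs = cumsum(M(allIdx));                   % running total over every selected element
ends = cumsum(cellfun('length', index));  % position where each group ends: [2 3 6]
sums = diff([0, cs(ends)]);               % group total = difference of running totals
disp(sums)
```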
Benchmark
M = rand(2103, 2030);
I = ceil(numel(M) * rand(2032, 10));
index = mat2cell(I, ones(size(I, 1), 1), size(I, 2));
N = 100;
tic
for k = 1:N
sums = zeros(1, numel(index));
for n = 1:numel(sums)
sums(n) = sum(M(index{n}));
end
end
toc
tic
for k = 1:N
sums = cellfun(@(idx) sum(M(idx)), index);
end
toc
tic
for k = 1:N
sums = cumsum(M([index{:}]));
sums2 = diff([0, sums(cumsum(cellfun('length', index)))]);
end
toc
When executing this in MATLAB R2012a (Windows Server 2008 R2 running on a 2.27GHz 16-core Intel Xeon processor), I got:
Elapsed time is 0.579783 seconds.
Elapsed time is 1.789809 seconds.
Elapsed time is 0.111455 seconds.
Given the matrix:
a =
1 1 2 2
1 1 2 2
3 3 4 4
3 3 4 4
I would like to get the following four 2x2 matrices:
a1 =
1 1
1 1
a2 =
2 2
2 2
a3 =
3 3
3 3
a4 =
4 4
4 4
From there, I would like to take the max of each matrix and then reshape the result into a 2x2 result matrix, like so:
r =
1 2
3 4
The location of the result max values relative to their original position in the initial matrix is important.
Currently, I'm using the following code to accomplish this:
w = 2;
S = zeros(size(A, 1)/w);
for i = 1:size(S, 1)
for j = 1:size(S, 2)
Window = A((i-1)*w+1:i*w, (j-1)*w+1:j*w);
S(i, j) = max(max(Window));
end
end
This works but it seems like there must be a way that doesn't involve iteration (vectorization).
I tried using reshape like so:
reshape(max(max(reshape(A, w, w, []))), w, w, [])
however that takes the max of the wrong values and returns:
ans =
3 4
3 4
Is there any way to accomplish this without iteration or otherwise improve my iterative method?
UPDATE: I'm not sure how I've ended up with the most votes (as of 2012-10-28). For anyone reading this, please see angainor's or Rody's answers for better solutions that don't require any additional toolboxes.
Here is a horse race of every answer thus far (excluding Nate's; sorry, I don't have the requisite toolbox):
Z = 1000;
A = [1 1 2 2; 1 1 2 2; 3 3 4 4; 3 3 4 4];
w = 2;
%Method 1 (OP method)
tic
for z = 1:Z
S = zeros(size(A, 1)/w);
for i = 1:size(S)
for j = 1:size(S)
Window = A(i*w-1:i*w, j*w-1:j*w);
S(i, j) = max(max(Window));
end
end
end
toc
%Method 2 (My double loop with improved indexing)
tic
for z = 1:Z
wm = w - 1;
Soln2 = NaN(w, w);
for m = 1:w:size(A, 2)
for n = 1:w:size(A, 1)
Soln2((m+1)/2, (n+1)/2) = max(max(A(n:n+wm, m:m+wm)));
end
end
Soln2 = Soln2';
end
toc
%Method 3 (My one line method)
tic
for z = 1:Z
Soln = cell2mat(cellfun(@max, cellfun(@max, mat2cell(A, [w w], [w w]), 'UniformOutput', false), 'UniformOutput', false));
end
toc
%Method 4 (Rody's method)
tic
for z = 1:Z
b = [A(1:2,:) A(3:4,:)];
reshape(max(reshape(b, 4,[])), 2,2);
end
toc
The results of the speed test (the loop over z) are:
Elapsed time is 0.042246 seconds.
Elapsed time is 0.019071 seconds.
Elapsed time is 0.165239 seconds.
Elapsed time is 0.011743 seconds.
Drat! It appears that Rody (+1) is the winner. :-)
UPDATE: New entrant to the race angainor (+1) takes the lead!
Not very general, but it works for a:
b = [a(1:2,:) a(3:4,:)];
reshape(max(reshape(b, 4,[])), 2,2).'
The general version of this is a bit *ahum* fuglier:
% window size
W = [2 2];
% number of blocks (rows, cols)
nW = size(a)./W;
% indices to first block
ids = bsxfun(@plus, (1:W(1)).', (0:W(2)-1)*size(a,1));
% indices to all blocks in first block-column
ids = bsxfun(@plus, ids(:), (0:nW(1)-1)*W(1));
% indices to all blocks
ids = reshape(bsxfun(@plus, ids(:), 0:nW(1)*prod(W):numel(a)-1), size(ids,1),[]);
% maxima
M = reshape(max(a(ids)), nW)
It can be done a bit more elegantly:
b = kron(reshape(1:prod(nW), nW), ones(W));
C = arrayfun(@(x) find(b==x), 1:prod(nW), 'uni', false);
M = reshape(max(a([C{:}])), nW)
but I doubt that's gonna be faster...
Another option: slower than the cell2mat(cellfun...) code, but gives the intermediate step:
fun = @(block_struct) reshape(block_struct.data, [], 1);
B = reshape(blockproc(A,[2 2],fun),2,2,[])
r = reshape(max(max(B)),2,[])
B(:,:,1) =
1 1
1 1
B(:,:,2) =
3 3
3 3
B(:,:,3) =
2 2
2 2
B(:,:,4) =
4 4
4 4
r =
1 2
3 4
I'll join the horse race with another non-general (yet ;) solution, based on linear indices:
idx = [1 2 5 6; 3 4 7 8]';
splita = [A(idx) A(idx+8)];
reshape(max(splita), 2, 2);
The times obtained with Colin's code, with my method last:
Elapsed time is 0.039565 seconds.
Elapsed time is 0.021723 seconds.
Elapsed time is 0.168946 seconds.
Elapsed time is 0.011688 seconds.
Elapsed time is 0.006255 seconds.
The idx array can be easily generalized to larger windows and system sizes.
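The block maximum can also be generalized without building an idx array. Here is a sketch using a different technique from the answers above (reshape into four dimensions, permute, and reshape again), assuming the matrix size is an exact multiple of the block size W in both dimensions:

```matlab
% Block maximum for a matrix tiled by W(1)-by-W(2) blocks.
% Assumes size(A) is an exact multiple of W in both dimensions.
A = [1 1 2 2; 1 1 2 2; 3 3 4 4; 3 3 4 4];
W = [2 2];                          % block (window) size

p = size(A, 1) / W(1);              % number of block rows
q = size(A, 2) / W(2);              % number of block columns

T = reshape(A, W(1), p, W(2), q);   % T(r1,bp,c1,bq) = A((bp-1)*W(1)+r1, (bq-1)*W(2)+c1)
T = permute(T, [1 3 2 4]);          % bring the two within-block dimensions together
T = reshape(T, W(1)*W(2), p*q);     % one column per block, column-major over blocks
r = reshape(max(T), p, q);          % max of each block, back on the block grid
disp(r)
```

Because it only uses reshape, permute and max, it needs no toolbox, and the same lines handle non-square blocks such as W = [2 3].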
Note: Nate's solution uses the Image Processing Toolbox function |blockproc|. I would rewrite that:
fun = @(x) max(max(x.data));
r = blockproc(A,[2 2],fun)
Comparing timings across different computers is fraught with difficulties, as is timing, just once, code that runs in a fraction of a second. TIMEIT would be useful here:
http://www.mathworks.com/matlabcentral/fileexchange/18798
But timing this on my computer with tic/toc took 0.008 seconds.
Cheers,
Brett
I have a very large matrix (216 rows, 31286 cols) of doubles. For reasons specific to the data, I want to average every 9 rows to produce one new row. So, the new matrix will have 216/9=24 rows.
I am a Matlab beginner so I was wondering if this solution I came up with can be improved upon. Basically, it loops over every group, sums up the rows, and then divides the new row by 9. Here's a simplified version of what I wrote:
matrix_avg = [];
for group = 1:216/9
new_row = zeros(1, 31286);
idx_low = (group - 1) * 9 + 1;
idx_high = idx_low + 9 - 1;
% Add the 9 rows to new_row
for j = idx_low:idx_high
new_row = new_row + M(j,:);
end
% Compute the mean
new_row = new_row ./ 9;
matrix_avg = [matrix_avg; new_row];
end
You can reshape your big matrix from 216 x 31286 to 9 x (216/9 * 31286).
Then you can use mean, which operates on each column. Since the reshaped matrix has exactly 9 rows, this gives the 9-row averages.
Then you can just reshape your matrix back.
% generate big matrix
M = rand([216 31286]);
n = 9; % want a 9-row average
% reshape
tmp = reshape(M, [n prod(size(M))/n]);
% mean column-wise (and only 9 rows per col)
tmp = mean(tmp);
% reshape back
matrix_avg = reshape(tmp, [ size(M,1)/n size(M,2) ]);
In a one-liner (but why would you?):
matrix_avg = reshape(mean(reshape(M,[n prod(size(M))/n])), [size(M,1)/n size(M,2)]);
Note: this will fail if the number of rows in M isn't exactly divisible by 9, but so will your original code.
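A small trace of the reshape/mean/reshape steps, averaging every n = 3 rows of a 6-by-2 matrix (example values only). Because MATLAB stores matrices column by column, each column of the first reshape holds n consecutive rows from one column of M:

```matlab
% Small example: average every n = 3 rows of a 6-by-2 matrix
M = reshape(1:12, 6, 2);            % columns 1:6 and 7:12
n = 3;

tmp = reshape(M, n, []);            % each column: n consecutive rows of one column of M
tmp = mean(tmp);                    % 1-by-(numel(M)/n) row of group means
matrix_avg = reshape(tmp, size(M,1)/n, size(M,2));
disp(matrix_avg)
```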
I measured the 4 solutions and here are the results:
reshape: Elapsed time is 0.017242 seconds.
blockproc [9 31286]: Elapsed time is 0.242044 seconds.
blockproc [9 1]: Elapsed time is 44.477094 seconds.
accumarray: Elapsed time is 103.274071 seconds.
This is the code I used:
M = rand(216,31286);
fprintf('reshape: ');
tic;
n = 9;
matrix_avg1 = reshape(mean(reshape(M,[n prod(size(M))/n])), [size(M,1)/n size(M,2)]);
toc
fprintf('blockproc [9 31286]: ');
tic;
fun = @(block_struct) mean(block_struct.data);
matrix_avg2 = blockproc(M,[9 31286],fun);
toc
fprintf('blockproc [9 1]: ');
tic;
fun = @(block_struct) mean(block_struct.data);
matrix_avg3 = blockproc(M,[9 1],fun);
toc
fprintf('accumarray: ');
tic;
[nR,nC] = size(M);
n2average = 9;
[xx,yy] = ndgrid(1:nR,1:nC);
x = ceil(xx/n2average); %# output row for each element: 1 (9 times), 2 (9 times), etc.
matrix_avg4 = accumarray([x(:),yy(:)],M(:),[],@mean);
toc
Here's an alternative based on accumarray. You create an array with row and column indices into matrix_avg that tells you which element in matrix_avg a given element in M contributes to, then you use accumarray to average the elements that contribute to the same element in matrix_avg. This solution works even if the number of rows in M is not divisible by 9.
M = rand(216,31286);
[nR,nC] = size(M);
n2average = 9;
[xx,yy] = ndgrid(1:nR,1:nC);
x = ceil(xx/n2average); %# output row for each element: 1 (9 times), 2 (9 times), etc.
matrix_avg = accumarray([x(:),yy(:)],M(:),[],@mean);
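A variant sketch that avoids calling @mean once per group: let accumarray do its default summation for both the values and a count of ones, then divide. It is not timed here, so any speed advantage is an assumption. The example uses 7 rows with n = 3 to show that the last, shorter group is averaged over its own row count:

```matlab
% 7 rows, so the last group has only one row (n does not divide nR)
M = reshape(1:14, 7, 2);
n = 3;

[nR, nC] = size(M);
[rr, cc] = ndgrid(1:nR, 1:nC);
g = ceil(rr / n);                   % output row for each element: 1 1 1 2 2 2 3
subs = [g(:), cc(:)];               % (output row, output column) per element of M

sums   = accumarray(subs, M(:));    % total of each group
counts = accumarray(subs, 1);       % number of rows in each group
matrix_avg = sums ./ counts;
disp(matrix_avg)
```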
I have N^2 matrices.
Each one is a 3x3 matrix.
One way to concatenate them into a 3Nx3N matrix is to write
A(:,:,i) = ...  % 3x3 matrix, i = 1:N^2
B = [A11 A12 ... A1N; A21 ... A2N; ...]
But when N is large, this is tedious.
What do you suggest?
Here's a really fast one-liner that only uses RESHAPE and PERMUTE:
B = reshape(permute(reshape(A,3,3*N,N),[2 1 3]),3*N,3*N).';
And a test:
>> N=2;
>> A = rand(3,3,N^2)
A(:,:,1) =
0.5909 0.6571 0.8082
0.7118 0.6090 0.7183
0.4694 0.9588 0.5582
A(:,:,2) =
0.1791 0.6844 0.6286
0.4164 0.4140 0.5833
0.1380 0.1099 0.8970
A(:,:,3) =
0.2232 0.2355 0.1214
0.1782 0.6873 0.3394
0.5645 0.4745 0.9763
A(:,:,4) =
0.5334 0.7559 0.9984
0.8454 0.7618 0.1065
0.0549 0.5029 0.3226
>> B = reshape(permute(reshape(A,3,3*N,N),[2 1 3]),3*N,3*N).'
B =
0.5909 0.6571 0.8082 0.1791 0.6844 0.6286
0.7118 0.6090 0.7183 0.4164 0.4140 0.5833
0.4694 0.9588 0.5582 0.1380 0.1099 0.8970
0.2232 0.2355 0.1214 0.5334 0.7559 0.9984
0.1782 0.6873 0.3394 0.8454 0.7618 0.1065
0.5645 0.4745 0.9763 0.0549 0.5029 0.3226
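The one-liner hard-codes the block size 3. Here is a sketch with the block size as a parameter b (a name introduced here for illustration), checked against a plain loop. The blocks get distinct entries so that a transposed or misplaced block would be caught:

```matlab
% Hypothetical block size b (the question uses b = 3); here b = 2, N = 2
b = 2;
N = 2;
A = reshape(1:b*b*N^2, b, b, N^2);  % A(:,:,k) is block k, numbered row by row in B

B = reshape(permute(reshape(A, b, b*N, N), [2 1 3]), b*N, b*N).';

% Loop reference: block k goes to block-row floor((k-1)/N), block-column mod(k-1,N)
ref = zeros(b*N);
for k = 1:N^2
    R = floor((k-1)/N);             % zero-based block row
    C = mod(k-1, N);                % zero-based block column
    ref(R*b+(1:b), C*b+(1:b)) = A(:,:,k);
end
disp(B)
```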
Try the following code:
N = 4;
A = rand(3,3,N^2); %# 3-by-3-by-N^2
c1 = squeeze( num2cell(A,[1 2]) );
c2 = cell(N,1);
for i=0:N-1
c2{i+1} = cat(2, c1{i*N+1:(i+1)*N});
end
B = cat(1, c2{:}); %# 3N-by-3N
Another possibility involving mat2cell and reshape
N = 2;
A = rand(3,3,N^2);
C = mat2cell(A,3,3,ones(N^2,1));
C = reshape(C,N,N)'; %'# make a N-by-N cell array and transpose
%# concatenate the cells into a 3N-by-3N matrix
B = cell2mat(C);
Here's the same in one line if you like that better
B = cell2mat(reshape(mat2cell(A,3,3,ones(N^2,1)),N,N)');
For N=2
>> A = rand(3,3,N^2)
A(:,:,1) =
0.40181 0.12332 0.41727
0.075967 0.18391 0.049654
0.23992 0.23995 0.90272
A(:,:,2) =
0.94479 0.33772 0.1112
0.49086 0.90005 0.78025
0.48925 0.36925 0.38974
A(:,:,3) =
0.24169 0.13197 0.57521
0.40391 0.94205 0.05978
0.096455 0.95613 0.23478
A(:,:,4) =
0.35316 0.043024 0.73172
0.82119 0.16899 0.64775
0.015403 0.64912 0.45092
B =
0.40181 0.12332 0.41727 0.94479 0.33772 0.1112
0.075967 0.18391 0.049654 0.49086 0.90005 0.78025
0.23992 0.23995 0.90272 0.48925 0.36925 0.38974
0.24169 0.13197 0.57521 0.35316 0.043024 0.73172
0.40391 0.94205 0.05978 0.82119 0.16899 0.64775
0.096455 0.95613 0.23478 0.015403 0.64912 0.45092
Why not do the old fashioned pre-allocate and loop? Should be pretty fast.
N = 4;
A = rand(3,3,N^2); % Assuming column major order for Aij
B = zeros(3*N, 3*N);
for j = 1:N^2
ix = mod(j-1, N)*3 + 1;
iy = floor((j-1)/N)*3 + 1;
fprintf('%02d - %02d\n', ix, iy);
B(ix:ix+2, iy:iy+2) = A(:,:,j);
end
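Note that this loop places the blocks in column-major order (A(:,:,2) ends up below A(:,:,1)), while the other answers put A(:,:,2) to the right of A(:,:,1), as in the B = [A11 A12 ...] layout. A sketch of the same loop in row-major block order, with the debug fprintf dropped, checked against gnovice's one-liner:

```matlab
N = 4;
A = rand(3,3,N^2);

B = zeros(3*N, 3*N);
for j = 1:N^2
    ix = floor((j-1)/N)*3 + 1;    % first row of block j (row-major block order)
    iy = mod(j-1, N)*3 + 1;       % first column of block j
    B(ix:ix+2, iy:iy+2) = A(:,:,j);
end

% Compare with gnovice's reshape/permute one-liner
B2 = reshape(permute(reshape(A,3,3*N,N),[2 1 3]),3*N,3*N).';
disp(isequal(B, B2))
```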
EDIT: For the speed junkies out there, here are the rankings:
N = 200;
A = rand(3,3,N^2); % test set
@gnovice solution: Elapsed time is 0.013069 seconds.
@Amro solution: Elapsed time is 0.203308 seconds.
@Rich C solution: Elapsed time is 0.887077 seconds.
@Jonas solution: Elapsed time is 7.065174 seconds.