Efficient aggregation of high dimensional arrays - matlab

I have a 3 dimensional (or higher) array that I want to aggregate by another vector. The specific application is to take daily observations of spatial data and average them to get monthly values. So, I have an array with dimensions <Lat, Lon, Day> and I want to create an array with dimensions <Lat, Lon, Month>.
Here is a mock example of what I want. Currently, I can get the correct output using a loop, but in practice, my data is very large, so I was hoping for a more efficient solution than the second loop:
% Make the mock data
A = [1 2 3; 4 5 6];
X = zeros(2, 3, 9);
for j = 1:9
X(:, :, j) = A;
A = A + 1;
end
% Aggregate the X values in groups of 3 -- This is the part I would like help on
T = [1 1 1 2 2 2 3 3 3];
X_agg = zeros(2, 3, 3);
for i = 1:3
X_agg(:,:,i) = mean(X(:,:,T==i),3);
end
In 2 dimensions, I would use accumarray, but that does not accept higher dimension inputs.

Before getting to your answer let's first rewrite your code in a more general way:
ag = 3; % or agg_size
X_agg = zeros(size(X)./[1 1 ag]);
for i = 1:ag
X_agg(:,:,i) = mean(X(:,:,(i-1)*ag+1:i*ag), 3);
end
To avoid using the for loop one idea is to reshape your X matrix to something that you can use the mean function directly on.
splited_X = reshape(X(:), [size(X_agg), ag]);
So now splited_X(:,:,:,i) is the i-th part
that contains all the matrices that should be aggregated which is X(:,:,(i-1)*ag+1:i*ag)) (like above)
Now you just need to find the mean in the 3rd dimension of splited_X:
temp = mean(splited_X, 3);
However this results in a 4D matrix (where its 3rd dimension size is 1). You can again turn it into 3D matrix using reshape function:
X_agg = reshape(temp, size(X_agg))
I have not tried it to see how much more efficient it is, but it should do better for large matrices since it doesn't use for loops.

Related

Matlab: Vectorizing 4 nested for loops

So, I need to vectorize some for loops into a single line. I understand how vectorize one and two for-loops, but am really struggling to do more than that. Essentially, I am computing a "blur" matrix M2 of size (n-2)x(m-2) of an original matrix M of size nxm, where s = size(M):
for x = 0:1
for y = 0:1
m = zeros(1, 9);
k = 1;
for i = 1:(s(1) - 1)
for j = 1:(s(2) - 1)
m(1, k) = M(i+x,j+y);
k = k+1;
end
end
M2(x+1,y+1) = mean(m);
end
end
This is the closest I've gotten:
for x=0:1
for y=0:1
M2(x+1, y+1) = mean(mean(M((x+1):(3+x),(y+1):(3+y))))
end
end
To get any closer to a one-line solution, it seems like there has to be some kind of "communication" where I assign two variables (x,y) to index over M2 and index over M; I just don't see how it can be done otherwise, but I am assured there is a solution.
Is there a reason why you are not using MATLAB's convolution function to help you do this? You are performing a blur with a 3 x 3 averaging kernel with overlapping neighbourhoods. This is exactly what convolution is doing. You can perform this using conv2:
M2 = conv2(M, ones(3) / 9, 'valid');
The 'valid' flag ensures that you return a size(M) - 2 matrix in both dimensions as you have requested.
In your code, you have hardcoded this for a 4 x 4 matrix. To double-check to see if we have the right results, let's generate a random 4 x 4 matrix:
rng(123);
M = rand(4, 4);
s = size(M);
If we run this with your code, we get:
>> M2
M2 =
0.5054 0.4707
0.5130 0.5276
Doing this with conv2:
>> M2 = conv2(M, ones(3) / 9, 'valid')
M2 =
0.5054 0.4707
0.5130 0.5276
However, if you want to do this from first principles, the overlapping of the pixel neighbourhoods is very difficult to escape using loops. The two for loop approach you have is good enough and it tackles the problem appropriately. I would make the size of the input instead of being hard coded. Therefore, write a function that does something like this:
function M2 = blur_fp(M)
s = size(M);
M2 = zeros(s(1) - 2, s(2) - 2);
for ii = 2 : s(1) - 1
for jj = 2 : s(2) - 1
p = M(ii - 1 : ii + 1, jj - 1 : jj + 1);
M2(ii - 1, jj - 1) = mean(p(:));
end
end
The first line of code defines the function, which we will call blur_fp. The next couple lines of code determine the size of the input matrix as well as initialising a blank matrix to store out output. We then loop through each pixel location in the matrix that is possible without the kernel going outside of the boundaries of the image, we grab a 3 x 3 neighbourhood with each pixel location serving as the centre, we then unroll the matrix into a single column vector, find the average and store it in the appropriate output. For small kernels and relatively large matrices, this should perform OK.
To take this a little further, you can use user Divakar's im2col_sliding function which takes overlapping neighbourhoods and unrolls them into columns. Therefore, each column represents a neighbourhood which you can then blur the input using vector-matrix multiplication. You would then use reshape to reshape the result back into a matrix:
T = im2col_sliding(M, [3 3]);
V = ones(1, 9) / 9;
s = size(M);
M2 = reshape(V * T, s(1) - 2, s(2) - 2);
This unfortunately cannot be done in a single line unless you use built-in functions. I'm not sure what your intention is, but hopefully the gamut of approaches you have seen here have given you some insight on how to do this efficiently. BTW, using loops for small matrices (i.e. 4 x 4) may be better in efficiency. You will start to notice performance changes when you increase the size of the input... then again, I would argue that using loops are competitive as of R2015b when the JIT has significantly improved.

MATLAB: Applying vectors of row and column indices without looping

I have a situation analogous to the following
z = magic(3) % Data matrix
y = [1 2 2]' % Column indices
So,
z =
8 1 6
3 5 7
4 9 2
y represents the column index I want for each row. It's saying I should take row 1 column 1, row 2 column 2, and row 3 column 2. The correct output is therefore 8 5 9.
I worked out I can get the correct output with the following
x = 1:3;
for i = 1:3
result(i) = z(x(i),y(i));
end
However, is it possible to do this without looping?
Two other possible ways I can suggest is to use sub2ind to find the linear indices that you can use to sample the matrix directly:
z = magic(3);
y = [1 2 2];
ind = sub2ind(size(z), 1:size(z,1), y);
result = z(ind);
We get:
>> result
result =
8 5 9
Another way is to use sparse to create a sparse matrix which you can turn into a logical matrix and then sample from the matrix with this logical matrix.
s = sparse(1:size(z,1), y, 1, size(z,1), size(z,2)) == 1; % Turn into logical
result = z(s);
We also get:
>> result
result =
8
5
9
Be advised that this only works provided that each row index linearly increases from 1 up to the end of the rows. This conveniently allows you to read the elements in the right order taking advantage of the column-major readout that MATLAB is based on. Also note that the output is also a column vector as opposed to a row vector.
The link posted by Adriaan is a great read for the next steps in accessing elements in a vectorized way: Linear indexing, logical indexing, and all that.
there are many ways to do this, one interesting way is to directly work out the indexes you want:
v = 0:size(y,2)-1; %generates a number from 0 to the size of your y vector -1
ind = y+v*size(z,2); %generates the indices you are looking for in each row
zinv = z';
zinv(ind)
>> ans =
8 5 9

Append vector to 3d matrix in Matlab

I wish to append a row-vector (and later, also a column vector) to an existing x by y by z matrix. So basically "Add a new row (at the "bottom") for each z in the original 3d matrix. Consider the following short Matlab program
appendVector = [1 2 3 4 5]; % Small matrix for brevity. Actual matrices used are much larger.
origMatrix = ones(5,5,3);
appendMatrix = [origMatrix( ... ); appendVector];
My question is: How do I adress (using Matlab-style matrix adressing, not a "manual" C-like loop) origMatrix( ... ) in order to append the vector above? Feel free to also include a suggestion on how to do the same operation for a column-vector (I am thinking that the correct way to do the latter is to simply use the '-operator in Matlab).
A "row" in a 3D matrix is actually a multi-dimensional array.
size(origMatrix(1,:,:))
% 5 3
So to append a row, you would need to append a 5 x 3 array.
toAppend = rand(5, 3);
appendMatrix = cat(1, origMatrix, toAppend);
You could append just a 5 element vector and specify an index for the third dimension. In this case, the value for the "row" for all other indices in the third dimension would be filled with zeros.
appendVector = [1 2 3 4 5];
origMatrix = ones(5,5,3);
appendMatrix = origMatrix;
appendMatrix(end+1, :, 1) = appendVector;
If instead, you want to append the same vector along the third dimension, you could use repmat to turn your vector into a 1 x 5 x 3 array and then append that.
appendVector = repmat([1 2 3 4 5], 1, 1, size(origMatrix, 3));
appendMatrix = cat(1, origMatrix, appendVector);

Sum every n rows of matrix

Is there any way that I can sum up columns values for each group of three rows in a matrix?
I can sum three rows up in a manual way.
For example
% matrix is the one I wanna store the new data.
% data is the original dataset.
matrix(1,1:end) = sum(data(1:3, 1:end))
matrix(2,1:end) = sum(data(4:6, 1:end))
...
But if the dataset is huge, this wouldn't work.
Is there any way to do this automatically without loops?
Here are four other ways:
The obligatory for-loop:
% for-loop over each three rows
matrix = zeros(size(data,1)/3, size(data,2));
counter = 1;
for i=1:3:size(data,1)
matrix(counter,:) = sum(data(i:i+3-1,:));
counter = counter + 1;
end
Using mat2cell for tiling:
% divide each three rows into a cell
matrix = mat2cell(data, ones(1,size(data,1)/3)*3);
% compute the sum of rows in each cell
matrix = cell2mat(cellfun(#sum, matrix, 'UniformOutput',false));
Using third dimension (based on this):
% put each three row into a separate 3rd dimension slice
matrix = permute(reshape(data', [], 3, size(data,1)/3), [2 1 3]);
% sum rows, and put back together
matrix = permute(sum(matrix), [3 2 1]);
Using accumarray:
% build array of group indices [1,1,1,2,2,2,3,3,3,...]
idx = floor(((1:size(data,1))' - 1)/3) + 1;
% use it to accumulate rows (appliead to each column separately)
matrix = cell2mat(arrayfun(#(i)accumarray(idx,data(:,i)), 1:size(data,2), ...
'UniformOutput',false));
Of course all the solution so far assume that the number of rows is evenly divisble by 3.
This one-liner reshapes so that all the values needed for a particular cell are in a column, does the sum, and then reshapes the back to the expected shape.
reshape(sum(reshape(data, 3, [])), [], size(data, 2))
The naked 3 could be changed if you want to sum a different number of rows together. It's on you to make sure the number of rows in each group divides evenly.
Slice the matrix into three pieces and add them together:
matrix = data(1:3:end, :) + data(2:3:end, :) + data(3:3:end, :);
This will give an error if size(data,1) is not a multiple of three, since the three pieces wouldn't be the same size. If appropriate to your data, you might work around that by truncating data, or appending some zeros to the end.
You could also do something fancy with reshape and 3D arrays. But I would prefer the above (unless you need to replace 3 with a variable...)
Prashant answered nicely before but I would have a simple amendment:
fl = filterLength;
A = yourVector (where mod(A,fl)==0)
sum(reshape(A,fl,[]),1).'/fl;
There is the ",1" that makes the line run even when fl==1 (original values).
I discovered this while running it in a for loop like so:
... read A ...
% Plot data
hold on;
averageFactors = [1 3 10 30 100 300 1000];
colors = hsv(length(averageFactors));
clear legendTxt;
for i=1:length(averageFactors)
% ------ FILTERING ----------
clear Atrunc;
clear ttrunc;
clear B;
fl = averageFactors(i); % filter length
Atrunc = A(1:L-mod(L,fl),:);
ttrunc = t(1:L-mod(L,fl),:);
B = sum(reshape(Atrunc,fl,[]),1).'/fl;
tB = sum(reshape(ttrunc,fl,[]),1).'/fl;
length(B)
plot(tB,B,'color',colors(i,:) )
%kbhit ()
endfor

Creating a vector with random sampling of two vectors in matlab

How does one create a vector that is composed of a random sampling of two other vectors?
For example
Vector 1 [1, 3, 4, 7], Vector 2 [2, 5, 6, 8]
Random Vector [random draw from vector 1 or 2 (value 1 or 2), random draw from vector 1 or 2 (value 3 or 5)... etc]
Finally, how can one ask matlab to repeat this process n times to draw a distribution of results?
Thank you,
There are many ways you could do this. One possibility is:
tmp=round(rand(size(vector1)))
res = tmp.*vector1 + (1-tmp).*vector2
To get one mixed sample, you may use the idea of the following code snippet (not the optimal one, but maybe clear enough):
a = [1, 3, 4, 7];
b = [2, 5, 6, 8];
selector = randn(size(a));
sample = a.*(selector>0) + b.*(selector<=0);
For n samples put the above code in a for loop:
for k=1:n
% Sample code (without initial "samplee" assignments)
% Here do stuff with the sample
end;
More generally, if X is a matrix and for each row you want to take a sample from a column chosen at random, you can do this with a loop:
y = zeros(size(X,1),1);
for ii = 1:size(X,1)
y(ii) = X(ii,ceil(rand*size(X,2)));
end
You can avoid the loop using clever indexing via sub2ind:
idx_n = ceil(rand(size(X,1),1)*size(X,2));
idx = sub2ind(size(X),(1:size(X,1))',idx_n);
y = X(idx);
If I understand your question, you are choosing two random numbers. First you decide whether to select vector 1 or vector 2; next you pick an element from the chosen vector.
The following code takes advantage of the fact that vector1 and vector2 are the same length:
N = 1000;
sampleMatrix = [vector1 vector2];
M = numel(sampleMatrix);
randIndex = ceil(rand(1,N)*M); % N random numbers from 1 to M
randomNumbers = sampleMatrix(randIndex); % sample N times from the matrix
You can then display the result with, for instance
figure; hist(randomNumbers); % draw a histogram of numbers drawn
When vector1 and vector2 have different elements, you run into a problem. If you concatenate them, you will end up picking elements from the longer vector more often. One way around this is to create random samplings from both arrays, then choose between them:
M1 = numel(vector1);
M2 = numel(vector2);
r1 = ceil(rand(1,N)*M1);
r2 = ceil(rand(1,N)*M2);
randMat = [vector1(r1(:)) vector2(r2(:))]; % two columns, now pick one or the other
randPick = ceil(rand(1,N)*2);
randomNumbers = [randMat(randPick==1, 1); randMat(randPick==2, 2)];
On re-reading, maybe you just want to pick "element 1 from either 1 or 2", then "element 2 from either 1 or 2", etc for all the elements of the vector. In that case, do
N=numel(vector1);
randPick = ceil(rand(1,N)*2);
randMat=[vector1(:) vector2(:)];
randomNumbers = [randMat(randPick==1, 1); randMat(randPick==2, 2)];
This problem can be solved using the function datasample.
Combine both vectors into one and apply the function. I like this approach more than the handcrafted versions in the other answers. It gives you much more flexibility in choosing what you actually want, while being a one-liner.