MATLAB function for cumulative power

Is there a function in MATLAB that generates the following matrix for a given scalar r:
1 r r^2 r^3 ... r^n
0 1 r r^2 ... r^(n-1)
0 0 1 r ... r^(n-2)
...
0 0 0 0 ... 1
where each row behaves somewhat like a power analog of the CUMSUM function?

You can compute each term directly using implicit expansion and element-wise power, and then apply triu:
n = 5; % size
r = 2; % base
result = triu(r.^max((1:n)-(1:n).',0));
Or, maybe a little faster because it doesn't compute unwanted powers:
n = 5; % size
r = 2; % base
t = (1:n)-(1:n).';
u = find(t>=0);
t = t(u);
result = zeros(n);
result(u) = r.^t;

Using cumprod and triu:
% parameters
n = 5;
r = 2;
% Create a square matrix filled with 1:
A = ones(n);
% Assign the upper triangular part shifted by one with r
A(triu(A,1)==1)=r;
% cumprod along the second dimension and get only the upper triangular part
A = triu(cumprod(A,2))
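As a cross-check on the constructions above, here is a minimal sketch that builds the same matrix with toeplitz and compares it with the implicit-expansion version. Using toeplitz here is an assumption of this sketch and is not taken from the answers; n and r are the same example values.

```matlab
n = 5;  % size
r = 2;  % base
% First column is [1 0 ... 0], first row is [1 r r^2 ... r^(n-1)]
T = toeplitz([1 zeros(1,n-1)], r.^(0:n-1));
% Reference: implicit-expansion version (needs R2016b or newer)
R = triu(r.^max((1:n)-(1:n).',0));
assert(isequal(T, R))
```

Because every diagonal of the target matrix is constant, a Toeplitz construction only needs the first row of powers.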

Well, cumsum accumulates the sum of a vector, but you are asking for a specially designed matrix, so the comparison is a bit problematic.
Anyway, there might be a built-in function for this if it is a common special-case triangular matrix (my mathematical knowledge is limited here, sorry), but we can also build it quite easily (and efficiently):
N = 10;
r = 2;
% allocate array
ary = ones(1,N);
% initialize array
ary(2) = r;
for i = 3:N
ary(i) = ary(i-1)*r;
end
% build matrix i.e. copy the array
M = eye(N);
for i = 1:N
M(i,i:end) = ary(1:end-i+1);
end
This assumes that you want a matrix of size NxN and that r is the base whose powers you want to compute.
FIX: a previous version stated in line 13 M(i,i:end) = ary(i:end);, but the assignment always needs to start at the first position of ary.

Related

How to do a toeplitz matrix efficiently matlab

Let's suppose I have a vector x and 2 constants initialized as follows:
x = [ones(1,21) zeros(1,79)]; %step of 100 components
p = 2; q = 0;
Now, I want to build this matrix:
But in this case, for example, x(q-1) = x(-1) doesn't exist, so I want it to be 0, and I was wondering if there is a way to do it with the minimum number of lines of code. Note that the matrix can be written with the function toeplitz(), but I don't know how to replace nonexistent positions of my vector x with zeros.
I hope someone can help me. Thank you for your answers.
You need to be careful about zero-based or one-based indexing.
In your question, you state that negative indices are invalid - in MATLAB the index 0 is also invalid. The below code assumes your x(q) is zero-based as described, but I do a +1 conversion. Be aware of this if q+p-1 is near numel(x).
x = [ones(1,21) zeros(1,79)]; %step of 100 components
p = 2; q = 0;
% Set up indexing matrix using implicit expansion (R2016b or newer)
m = ( q:-1:q-p+1 ) + ( 0:1:q+p-1 ).';
% Convert from 0-based to 1-based for MATLAB
m = m + 1;
% Set up output matrix, defaulting to zero
M = zeros( size( m ) );
% Put elements where 'm' is valid from 'x' into output 'M'
M( m > 0 ) = x( m( m > 0 ) );
The output is a (q+p) * p matrix.
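To make the indexing concrete, here is a small sketch of the same idea on a short vector. The four-element x is an assumed example, and the extra upper-bound check (m <= numel(x)) is an addition of this sketch that covers the case where q+p-1 is near numel(x).

```matlab
x = [1 2 3 4];                         % short example vector (assumed)
p = 2; q = 0;
% 1-based index matrix, built with implicit expansion (R2016b or newer)
m = (q:-1:q-p+1) + (0:1:q+p-1).' + 1;
valid = m >= 1 & m <= numel(x);        % guard both ends of x
M = zeros(size(m));                    % out-of-range positions stay zero
M(valid) = x(m(valid));
disp(M)
```

For these values m is [1 0; 2 1], so M comes out as [1 0; 2 1]: the position that would read x(-1) in zero-based terms is left at zero.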

adaptive elliptical structuring element in MATLAB

I'm trying to create an adaptive elliptical structuring element for an image to dilate or erode it. I wrote this code, but unfortunately all of the structuring elements are ones(2*M+1).
I = input('Enter the input image: ');
M = input('Enter the maximum allowed semi-major axes length: ');
% determining ellipse parameters from eigenvalue decomposition of LST
row = size(I,1);
col = size(I,2);
SE = cell(row,col);
padI = padarray(I,[M M],'replicate','both');
padrow = size(padI,1);
padcol = size(padI,2);
for m = M+1:padrow-M
for n = M+1:padcol-M
a = (l2(m-M,n-M)+eps/l1(m-M,n-M)+l2(m-M,n-M)+2*eps)*M;
b = (l1(m-M,n-M)+eps/l1(m-M,n-M)+l2(m-M,n-M)+2*eps)*M;
if e1(m-M,n-M,1)==0
phi = pi/2;
else
phi = atan(e1(m-M,n-M,2)/e1(m-M,n-M,1));
end
% defining structuring element for each pixel of image
x0 = m;
y0 = n;
se = zeros(2*M+1);
row_se = 0;
for i = x0-M:x0+M
row_se = row_se+1;
col_se = 0;
for j = y0-M:y0+M
col_se = col_se+1;
x = j-y0;
y = x0-i;
if ((x*cos(phi)+y*sin(phi))^2)/a^2+((x*sin(phi)-y*cos(phi))^2)/b^2 <= 1
se(row_se,col_se) = 1;
end
end
end
SE{m-M,n-M} = se;
end
end
a and b are the semi-major and semi-minor axis lengths, and phi is the angle between a and the x axis.
I used 2 MATLAB functions to compute the Local Structure Tensor of the image, and then its eigenvalues and eigenvectors for each pixel. These are the matrices l1, l2, e1 and e2.
This is the bit of your code I didn't understand:
a = (l2(m-M,n-M)+eps/l1(m-M,n-M)+l2(m-M,n-M)+2*eps)*M;
b = (l1(m-M,n-M)+eps/l1(m-M,n-M)+l2(m-M,n-M)+2*eps)*M;
I simplified the expression for b to (just removing the indexing):
b = (l1+eps/l1+l2+2*eps)*M;
For l1 and l2 in the normal range we get:
b =(approx)= (l1+0/l1+l2+2*0)*M = (l1+l2)*M;
Thus, b can easily be larger than M, which I don't think is your intention. The eps in this case also doesn't protect against division by zero, which is typically the purpose of adding eps: if l1 is zero, eps/l1 is Inf.
Looking at this expression, it seems to me that you intended this instead:
b = (l1+eps)/(l1+l2+2*eps)*M;
Here, you're adding eps to each of the eigenvalues, making them guaranteed non-zero (the structure tensor is symmetric, positive semi-definite). Then you're dividing l1 by the sum of eigenvalues, and multiplying by M, which leads to a value between 0 and M for each of the axes.
So, this seems to be a case of misplaced parentheses.
Just for the record, this is what you need in your code:
a = (l2(m-M,n-M)+eps ) / ( l1(m-M,n-M)+l2(m-M,n-M)+2*eps)*M;
b = (l1(m-M,n-M)+eps ) / ( l1(m-M,n-M)+l2(m-M,n-M)+2*eps)*M;
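(The closing parenthesis after eps and the opening parenthesis before l1 in the denominator are the ones that were added.)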
Note that you can simplify your code by defining, outside of the loops:
[se_x,se_y] = meshgrid(-M:M,-M:M);
The inner two loops, over i and j, to construct se can then be written simply as:
se = ((se_x.*cos(phi)+se_y.*sin(phi)).^2)./a.^2 + ...
((se_x.*sin(phi)-se_y.*cos(phi)).^2)./b.^2 <= 1;
(Note the .* and .^ operators, these do element-wise multiplication and power.)
A further slight improvement comes from realizing that phi is first computed from e1(m,n,1) and e1(m,n,2), and then used in calls to cos and sin. If we assume that the eigenvector is properly normalized, then
cos(phi) == e1(m,n,1)
sin(phi) == e1(m,n,2)
But you can always make sure they are normalized:
cos_phi = e1(m-M,n-M,1);
sin_phi = e1(m-M,n-M,2);
len = hypot(cos_phi,sin_phi);
cos_phi = cos_phi / len;
sin_phi = sin_phi / len;
se = ((se_x.*cos_phi+se_y.*sin_phi).^2)./a.^2 + ...
((se_x.*sin_phi-se_y.*cos_phi).^2)./b.^2 <= 1;
Considering trigonometric operations are fairly expensive, this should speed up your code a bit.
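Putting the pieces together for a single pixel, a minimal self-contained sketch could look like this. The values of M, l1, l2 and the eigenvector v are made up for illustration; in the real code they would come from the structure tensor at the pixel.

```matlab
M  = 5;                 % maximum allowed semi-axis length (assumed value)
l1 = 4; l2 = 1;         % example eigenvalues (assumed)
v  = [1 1];             % example eigenvector, deliberately not normalized

% Axis lengths with the corrected parentheses
a = (l2+eps)/(l1+l2+2*eps)*M;
b = (l1+eps)/(l1+l2+2*eps)*M;

% Coordinate grid, defined once outside any loop
[se_x,se_y] = meshgrid(-M:M,-M:M);

% Normalize the eigenvector instead of calling atan, cos and sin
len = hypot(v(1),v(2));
cos_phi = v(1)/len;
sin_phi = v(2)/len;

se = ((se_x.*cos_phi+se_y.*sin_phi).^2)./a.^2 + ...
     ((se_x.*sin_phi-se_y.*cos_phi).^2)./b.^2 <= 1;
```

With these inputs a and b sum to M, and se is a logical (2*M+1)-by-(2*M+1) mask that is no longer all ones.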

Why does the one-dimensional variant of a 2-d random walk not work?

There is a two-dimensional random walk that one can find here which works perfectly in Octave. However, when I tried to write a one-dimensional random walk program, I got an error. Here is the program:
t=[];
x=[];
for i=1:100000
J=rand;
if J<0.5
x(i+1)=x(i)+1;
t(i+1)=t(i)+1;
else
x(i+1)=x(i)-1;
t(i+1)=t(i)+1;
end
end
plot(t,x)
Here is the error:
error: A(I): index out of bounds; value 1 out of bound 0
Thank you.
No need for a loop:
N = 100000;
t = 1:N;
x = cumsum(2*(rand(1,N)<.5)-1);
plot(t,x)
For the 2D case you could use the same approach:
N = 100000;
%// t = 1:N; it won't be used in the plot, so not needed
x = cumsum(2*(rand(1,N)<.5)-1);
y = cumsum(2*(rand(1,N)<.5)-1);
plot(x,y)
axis square
You get an error because you ask MATLAB to use x(1) in the first iteration when you actually defined x to be of length 0. So you need to either initialize x and t with the proper size:
x=zeros(1,100001);
t=zeros(1,100001);
or change your loop to append the new values at the end of the vectors (x and t must then start with one initial value, e.g. x = 0, so that x(end) exists):
x = [x x(end)+1];
Since t and x are empty, you cannot index them through x(i+1) and x(i).
I believe you should initialize x and t with all zeros.
In the first iteration, i = 1, you have x(2) = x(1) +or- 1 while x is empty. You should define the starting point for x and t, which is usually the origin. You can also change the code a little bit:
x = 0;
N = 100000;
t = 0 : N;
for i = 1 : N
x(i+1) = x(i) + 2 * round(rand) - 1;
end
plot(t,x)
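The loop-free and the start-at-the-origin ideas can also be combined. This is only a sketch: prepending the starting point 0 to the cumulative sum is my own combination of the answers above, and N is reduced to keep the example quick.

```matlab
N = 1000;                          % number of steps (smaller than in the question)
steps = 2*(rand(1,N) < 0.5) - 1;   % each step is +1 or -1 with equal probability
x = [0 cumsum(steps)];             % walk starts at the origin
t = 0:N;                           % time index with the same length as x
plot(t, x)
```

Here x and t both have N+1 elements, so there is never an access to an element that does not exist yet.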

Optimizing repetitive estimation (currently a loop) in MATLAB

I've found myself needing to do a least-squares (or similar matrix-based operation) for every pixel in an image. Every pixel has a set of numbers associated with it, and so it can be arranged as a 3D matrix.
(This next bit can be skipped)
Quick explanation of what I mean by least-squares estimation :
Let's say we have some quadratic system that is modeled by Y = Ax^2 + Bx + C and we're looking for those A,B,C coefficients. With a few samples (at least 3) of X and the corresponding Y, we can estimate them by:
Arrange the (lets say 10) X samples into a matrix like X = [x(:).^2 x(:) ones(10,1)];
Arrange the Y samples into a similar matrix: Y = y(:);
Estimate the coefficients A,B,C by solving: coeffs = (X'*X)^(-1)*X'*Y;
Try this on your own if you want:
A = 5; B = 2; C = 1;
x = 1:10;
y = A*x(:).^2 + B*x(:) + C + .25*randn(10,1); % added some noise here
X = [x(:).^2 x(:) ones(10,1)];
Y = y(:);
coeffs = (X'*X)^-1*X'*Y
coeffs =
5.0040
1.9818
0.9241
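For reference, the same estimate can be written with the backslash operator, which avoids forming the explicit inverse. The sketch below uses noise-free data (an assumption, so that the result is predictable) and compares the normal-equations form with X\Y.

```matlab
A = 5; B = 2; C = 1;
x = 1:10;
y = A*x(:).^2 + B*x(:) + C;          % noise-free samples (assumed for the check)
X = [x(:).^2 x(:) ones(10,1)];
Y = y(:);
coeffs_inv = (X'*X)^-1*X'*Y;         % normal equations with explicit inverse
coeffs_bs  = X\Y;                    % least-squares solve via backslash
disp([coeffs_inv coeffs_bs])
```

Both columns should be close to [5; 2; 1]; the backslash form is usually preferred because it is better conditioned than inverting X'*X.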
START PAYING ATTENTION AGAIN IF I LOST YOU THERE
*MAJOR REWRITE* I've modified the example to bring it as close as possible to the real problem that I have while still keeping it a minimum working example.
Problem Setup
%// Setup
xdim = 500;
ydim = 500;
ncoils = 8;
nshots = 4;
%// matrix size for each pixel is ncoils x nshots (an overdetermined system)
%// each pixel has a matrix stored in the 3rd and 4th dimensions
regressor = randn(xdim,ydim, ncoils,nshots);
regressand = randn(xdim, ydim,ncoils);
So my problem is that I have to do a (X'*X)^-1*X'*Y (least-squares or similar) operation for every pixel in an image. While that itself is vectorized/matrixized the only way that I have to do it for every pixel is in a for loop, like:
Original code style
%// Actual work
tic
estimate = zeros(xdim,ydim);
for col=1:size(regressor,2)
for row=1:size(regressor,1)
X = squeeze(regressor(row,col,:,:));
Y = squeeze(regressand(row,col,:));
B = X\Y;
% B = (X'*X)^(-1)*X'*Y; %// equivalently
estimate(row,col) = B(1);
end
end
toc
Elapsed time = 27.6 seconds
EDITS in response to comments and other ideas
I tried some things:
1. Reshaped into a long vector and removed the double for loop. This saved some time.
2. Removed the squeeze (and in-line transposing) by permute-ing the picture beforehand. This saved a lot more time.
Current example:
%// Actual work
tic
estimate2 = zeros(xdim*ydim,1);
regressor_mod = permute(regressor,[3 4 1 2]);
regressor_mod = reshape(regressor_mod,[ncoils,nshots,xdim*ydim]);
regressand_mod = permute(regressand,[3 1 2]);
regressand_mod = reshape(regressand_mod,[ncoils,xdim*ydim]);
for ind=1:size(regressor_mod,3) % for every pixel
X = regressor_mod(:,:,ind);
Y = regressand_mod(:,ind);
B = X\Y;
estimate2(ind) = B(1);
end
estimate2 = reshape(estimate2,[xdim,ydim]);
toc
Elapsed time = 2.30 seconds (avg of 10)
isequal(estimate2,estimate) == 1;
Rody Oldenhuis's way
N = xdim*ydim*ncoils; %// number of rows
M = xdim*ydim*nshots; %// number of columns
ii = repmat(reshape(1:N,[ncoils,xdim*ydim]),[nshots 1]); %// row indices
jj = repmat(1:M,[ncoils 1]); %// column indices
X = sparse(ii(:),jj(:),regressor_mod(:));
Y = regressand_mod(:);
B = X\Y;
B = reshape(B(1:nshots:end),[xdim ydim]);
Elapsed time = 2.26 seconds (avg of 10)
or 2.18 seconds (if you don't include the definition of N,M,ii,jj)
SO THE QUESTION IS:
Is there an (even) faster way?
(I don't think so.)
You can achieve a ~factor of 2 speed up by precomputing the transposition of X. i.e.
for x=1:size(picture,2) % second dimension b/c already transposed
X = picture(:,x);
XX = X';
Y = randn(n_timepoints,1);
%B = (X'*X)^-1*X'*Y; ;
B = (XX*X)^-1*XX*Y;
est(x) = B(1);
end
Before: Elapsed time is 2.520944 seconds.
After: Elapsed time is 1.134081 seconds.
EDIT:
Your code, as it stands in your latest edit, can be replaced by the following
tic
xdim = 500;
ydim = 500;
n_timepoints = 10; % for example
% Actual work
picture = randn(xdim,ydim,n_timepoints);
picture = reshape(picture, [xdim*ydim,n_timepoints])'; % note transpose
YR = randn(n_timepoints,size(picture,2));
% (XX*X).^-1 = sum(picture.*picture).^-1;
% XX*Y = sum(picture.*YR);
est = sum(picture.*picture).^-1 .* sum(picture.*YR);
est = reshape(est,[xdim,ydim]);
toc
Elapsed time is 0.127014 seconds.
This is an order of magnitude speed up on the latest edit, and the results are all but identical to the previous method.
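The reason this works: when X is a single column, (X'*X)^-1*X'*Y reduces to the scalar sum(X.*Y)/sum(X.*X), so every pixel can be handled with column sums. A small sketch that checks this against a per-column backslash solve (the sizes T and npix are assumed, tiny values):

```matlab
T = 10; npix = 4;                     % tiny sizes (assumed, for illustration)
P  = randn(T,npix);                   % one regressor column per pixel
YR = randn(T,npix);                   % one observation column per pixel
est = sum(P.*YR) ./ sum(P.*P);        % closed form for all pixels at once
ref = zeros(1,npix);
for k = 1:npix
    ref(k) = P(:,k) \ YR(:,k);        % per-pixel least squares
end
disp(max(abs(est - ref)))             % difference is at rounding level
```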
EDIT2:
Okay, so if X is a matrix, not a vector, things are a little more complicated. We basically want to precompute as much as possible outside of the for-loop to keep our costs down. We can also get a significant speed-up by computing XT*X manually - since the result will always be a symmetric matrix, we can cut a few corners to speed things up. First, the symmetric multiplication function:
function XTX = sym_mult(X) % X is a 3-d matrix
n = size(X,2);
XTX = zeros(n,n,size(X,3));
for i=1:n
for j=i:n
XTX(i,j,:) = sum(X(:,i,:).*X(:,j,:));
if i~=j
XTX(j,i,:) = XTX(i,j,:);
end
end
end
Now the actual computation script
xdim = 500;
ydim = 500;
n_timepoints = 10; % for example
Y = randn(10,xdim*ydim);
picture = randn(xdim,ydim,n_timepoints); % 500x500x10
% Actual work
tic % start timing
picture = reshape(picture, [xdim*ydim,n_timepoints])';
% Here we precompute the (XT*Y) calculation to speed things up later
picture_y = [sum(Y);sum(Y.*picture)];
% initialize
est = zeros(size(picture,2),1);
picture = permute(picture,[1,3,2]);
XTX = cat(2,ones(n_timepoints,1,size(picture,3)),picture);
XTX = sym_mult(XTX); % precompute (XT*X) for speed
X = zeros(2,2); % preallocate for speed
XY = zeros(2,1);
for x=1:size(picture,3) % third dimension: one page per pixel after the permute
%For some reason this is a lot faster than X = XTX(:,:,x);
X(1,1) = XTX(1,1,x);
X(2,1) = XTX(2,1,x);
X(1,2) = XTX(1,2,x);
X(2,2) = XTX(2,2,x);
XY(1) = picture_y(1,x);
XY(2) = picture_y(2,x);
% Here we utilise the fact that A\B is faster than inv(A)*B
% We also use the fact that (A*B)*C = A*(B*C) to speed things up
B = X\XY;
est(x) = B(1);
end
est = reshape(est,[xdim,ydim]);
toc % end timing
Before: Elapsed time is 4.56 seconds.
After: Elapsed time is 2.24 seconds.
This is a speed up of about a factor of 2. This code should be extensible to X being any dimensions you want. For instance, in the case where X = [1 x x^2], you would change picture_y to the following
picture_y = [sum(Y);sum(Y.*picture);sum(Y.*picture.^2)];
and change XTX to
XTX = cat(2,ones(n_timepoints,1,size(picture,3)),picture,picture.^2);
You would also change a lot of 2s to 3s in the code, and add XY(3) = picture_y(3,x) to the loop. It should be fairly straight-forward, I believe.
Results
I sped up your original version, since your edit 3 was actually not working (and also does something different).
So, on my PC:
Your (original) version: 8.428473 seconds.
My obfuscated one-liner given below: 0.964589 seconds.
First, for no other reason than to impress, I'll give it as I wrote it:
%%// Some example data
xdim = 500;
ydim = 500;
n_timepoints = 10; % for example
estimate = zeros(xdim,ydim); %// initialization with explicit size
picture = randn(xdim,ydim,n_timepoints);
%%// Your original solution
%// (slightly altered to make my version's results agree with yours)
tic
Y = randn(n_timepoints,xdim*ydim);
ii = 1;
for x = 1:xdim
for y = 1:ydim
X = squeeze(picture(x,y,:)); %// or similar creation of X matrix
B = (X'*X)^(-1)*X' * Y(:,ii);
ii = ii+1;
%// sometimes you keep everything and do
%// estimate(x,y,:) = B(:);
%// sometimes just the first element is important and you do
estimate(x,y) = B(1);
end
end
toc
%%// My version
tic
%// UNLEASH THE FURY!!
estimate2 = reshape(sparse(1:xdim*ydim*n_timepoints, ...
builtin('_paren', ones(n_timepoints,1)*(1:xdim*ydim),:), ...
builtin('_paren', permute(picture, [3 2 1]),:))\Y(:), ydim,xdim).'; %'
toc
%%// Check for equality
max(abs(estimate(:)-estimate2(:))) % (always less than ~1e-14)
Breakdown
First, here's the version that you should actually use:
%// Construct sparse block-diagonal matrix
%// (Type "help sparse" for more information)
N = xdim*ydim; %// number of columns
M = N*n_timepoints; %// number of rows
ii = 1:M; %// one row index per stored element
jj = ones(n_timepoints,1)*(1:N);
s = permute(picture, [3 2 1]);
X = sparse(ii,jj(:), s(:));
%// Compute ALL the estimates at once
estimates = X\Y(:);
%// You loop through the *second* dimension first, so to make everything
%// agree, we have to extract elements in the "wrong" order, and transpose:
estimate2 = reshape(estimates, ydim,xdim).'; %'
Here's an example of what picture and the corresponding matrix X look like for xdim = ydim = n_timepoints = 2:
>> clc, picture, full(X)
picture(:,:,1) =
-0.5643 -2.0504
-0.1656 0.4497
picture(:,:,2) =
0.6397 0.7782
0.5830 -0.3138
ans =
-0.5643 0 0 0
0.6397 0 0 0
0 -2.0504 0 0
0 0.7782 0 0
0 0 -0.1656 0
0 0 0.5830 0
0 0 0 0.4497
0 0 0 -0.3138
You can see why sparse is necessary -- it's mostly zeros, but will grow large quickly. The full matrix would quickly consume all your RAM, while the sparse one will not consume much more than the original picture matrix does.
With this matrix X, the new problem
X·b = Y
now contains all the problems
X1 · b1 = Y1
X2 · b2 = Y2
...
where
b = [b1; b2; b3; ...]
Y = [Y1; Y2; Y3; ...]
so, the single command
X\Y
will solve all your systems at once.
This offloads all the hard work to a set of highly specialized, compiled to machine-specific code, optimized-in-every-way algorithms, rather than the interpreted, generic, always-two-steps-away from the hardware loops in MATLAB.
It should be straightforward to convert this to a version where X is a matrix; you'll end up with something like what blkdiag does, which can also be used by mldivide in exactly the same way as above.
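A rough sketch of that matrix case, under assumed tiny sizes: each pixel contributes one ncoils-by-nshots block, the blocks are placed on the diagonal of a sparse matrix with blkdiag, and a single backslash solves all of them. The names Xs, Ys and Xbig are hypothetical and used only for this illustration.

```matlab
npix = 3; ncoils = 4; nshots = 2;      % tiny sizes (assumed)
Xs = randn(ncoils,nshots,npix);        % one overdetermined system per pixel
Ys = randn(ncoils,npix);

% Build the sparse block-diagonal matrix
blocks = cell(1,npix);
for k = 1:npix
    blocks{k} = sparse(Xs(:,:,k));
end
Xbig = blkdiag(blocks{:});

% Solve all systems at once, then split the result per pixel
Ball = reshape(Xbig \ Ys(:), nshots, npix);

% Reference: solve each system separately
Bref = zeros(nshots,npix);
for k = 1:npix
    Bref(:,k) = Xs(:,:,k) \ Ys(:,k);
end
disp(max(abs(Ball(:) - Bref(:))))      % difference is at rounding level
```

For a real image, building the index vectors directly and calling sparse once (as in the answer above) is likely cheaper than a cell array of blocks; the loop here is only for readability.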
I had a wee play around with an idea, and I decided to post it as a separate answer, as it's a completely different approach to my other idea, and I don't actually condone what I'm about to do. I think this is the fastest approach so far:
Original (unoptimised): 13.507176 seconds.
Fast Cholesky-decomposition method: 0.424464 seconds
First, we've got a function to quickly do the X'*X multiplication. We can speed things up here because the result will always be symmetric.
function XX = sym_mult(X)
n = size(X,2);
XX = zeros(n,n,size(X,3));
for i=1:n
for j=i:n
XX(i,j,:) = sum(X(:,i,:).*X(:,j,:));
if i~=j
XX(j,i,:) = XX(i,j,:);
end
end
end
Then we have a function to do the LDL Cholesky decomposition of a 3D matrix (we can do this because the (X'*X) matrix will always be symmetric) and then do forward and backward substitution to solve the LDL inversion equation
function Y = fast_chol(X,XY)
n=size(X,2);
L = zeros(n,n,size(X,3));
D = zeros(n,n,size(X,3));
B = zeros(n,1,size(X,3));
Y = zeros(n,1,size(X,3));
% These loops compute the LDL decomposition of the 3D matrix
for i=1:n
D(i,i,:) = X(i,i,:);
L(i,i,:) = 1;
for j=1:i-1
L(i,j,:) = X(i,j,:);
for k=1:(j-1)
L(i,j,:) = L(i,j,:) - L(i,k,:).*L(j,k,:).*D(k,k,:);
end
D(i,j,:) = L(i,j,:);
L(i,j,:) = L(i,j,:)./D(j,j,:);
if i~=j
D(i,i,:) = D(i,i,:) - L(i,j,:).^2.*D(j,j,:);
end
end
end
for i=1:n
B(i,1,:) = XY(i,:);
for j=1:(i-1)
B(i,1,:) = B(i,1,:)-D(i,j,:).*B(j,1,:);
end
B(i,1,:) = B(i,1,:)./D(i,i,:);
end
for i=n:-1:1
Y(i,1,:) = B(i,1,:);
for j=n:-1:(i+1)
Y(i,1,:) = Y(i,1,:)-L(j,i,:).*Y(j,1,:);
end
end
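To see what the routine does for one pixel, here is the same LDL idea written out by hand for a single 2-by-2 symmetric matrix. The matrix A and right-hand side b are made-up values; this is a sketch of the technique, not a replacement for the function above.

```matlab
A = [4 2; 2 3];                        % example symmetric matrix (assumed)
b = [2; 1];                            % example right-hand side (assumed)

% LDL' factorization written out for the 2-by-2 case
d1  = A(1,1);
l21 = A(2,1)/d1;
d2  = A(2,2) - l21^2*d1;
L = [1 0; l21 1];
D = diag([d1 d2]);

% Solve A*x = b in three steps: L*z = b, D*y = z, L'*x = y
z = L\b;
y = D\z;
x = L'\y;
disp(norm(L*D*L' - A))                 % factorization error
disp(norm(x - A\b))                    % agreement with backslash
```

fast_chol performs these steps with every scalar replaced by a 1-by-1-by-npix slice, which is why it needs no loop over pixels.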
Finally, we have the main script which calls all of this
xdim = 500;
ydim = 500;
n_timepoints = 10; % for example
Y = randn(10,xdim*ydim);
picture = randn(xdim,ydim,n_timepoints); % 500x500x10
tic % start timing
picture = reshape(picture, [xdim*ydim,n_timepoints])';
% Here we precompute the (XT*Y) calculation
picture_y = [sum(Y);sum(Y.*picture)];
% initialize
est2 = zeros(size(picture,2),1);
picture = permute(picture,[1,3,2]);
% Now we calculate the X'*X matrix
XTX = cat(2,ones(n_timepoints,1,size(picture,3)),picture);
XTX = sym_mult(XTX);
% Call our fast Cholesky decomposition routine
B = fast_chol(XTX,picture_y);
est2 = B(1,:);
est2 = reshape(est2,[xdim,ydim]);
toc
Again, this should work equally well for a Nx3 X matrix, or however big you want.
I use octave, thus I can't say anything about the resulting performance in Matlab, but would expect this code to be slightly faster:
pictureT=picture';
est=arrayfun(@(x)( (pictureT(x,:)*picture(:,x))^-1*pictureT(x,:)*randn(n_timepoints,1)),1:size(picture,2));

Vectorizing sums of different diagonals in a matrix

I want to vectorize the following MATLAB code. I think it must be simple but I'm finding it confusing nevertheless.
r = some constant less than m or n
[m,n] = size(C);
S = zeros(m-r,n-r);
for i=1:m-r+1
for j=1:n-r+1
S(i,j) = sum(diag(C(i:i+r-1,j:j+r-1)));
end
end
The code calculates a table of scores, S, for a dynamic programming algorithm, from another score table, C.
The diagonal summing is to generate scores for individual pieces of the data used to generate C, for all possible pieces (of size r).
Thanks in advance for any answers! Sorry if this one should be obvious...
Note
The built-in conv2 turned out to be faster than convnfft, because my eye(r) is quite small ( 5 <= r <= 20 ). convnfft.m states that r should be > 20 for any benefit to manifest.
If I understand correctly, you're trying to calculate the diagonal sum of every subarray of C, where you have removed the last row and column of C (if you should not remove the row/col, you need to loop to m-r+1, and you need to pass the entire array C to the function in my solution below).
You can do this operation via a convolution, like so:
S = conv2(C(1:end-1,1:end-1),eye(r),'valid');
If C and r are large, you may want to have a look at CONVNFFT from the Matlab File Exchange to speed up calculations.
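A quick way to convince yourself that the convolution gives the same numbers as the double loop is to compare both on a small matrix. In this sketch C = magic(6) and r = 3 are assumed example values, and the full matrix C is passed to conv2 so that the result has size (m-r+1)-by-(n-r+1).

```matlab
C = magic(6);                          % example score table (assumed)
r = 3;
[m,n] = size(C);

% Reference: explicit double loop over all r-by-r blocks
S_loop = zeros(m-r+1,n-r+1);
for i = 1:m-r+1
    for j = 1:n-r+1
        S_loop(i,j) = sum(diag(C(i:i+r-1,j:j+r-1)));
    end
end

% Convolution with the identity sums each block's main diagonal
S_conv = conv2(C, eye(r), 'valid');
disp(max(abs(S_loop(:) - S_conv(:))))
```

conv2 flips its kernel, but eye(r) is unchanged by a 180-degree rotation, so no extra flipping is needed.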
Based on the idea of JS, and as Jonas pointed out in the comments, this can be done in two lines using IM2COL with some array manipulation:
B = im2col(C, [r r], 'sliding');
S = reshape( sum(B(1:r+1:end,:)), size(C)-r+1 );
Basically B contains the elements of all sliding blocks of size r-by-r over the matrix C. Then we take the elements on the diagonal of each of these blocks B(1:r+1:end,:), compute their sum, and reshape the result to the expected size.
Comparing this to the convolution-based solution by Jonas, this does not perform any matrix multiplication, only indexing...
I would think you might need to rearrange C into a 3D matrix before summing it along one of the dimensions. I'll post with an answer shortly.
EDIT
I didn't manage to find a way to vectorise it cleanly, but I did find the function accumarray, which might be of some help. I'll look at it in more detail when I am home.
EDIT#2
Found a simpler solution by using linear indexing, but this could be memory-intensive.
At C(1,1), the indexes we want to sum are 1+[0, m+1, 2*m+2, 3*m+3, 4*m+4, ... ], or (0:r-1)+(0:m:(r-1)*m)
sum_ind = (0:r-1)+(0:m:(r-1)*m);
create S_offset, an (m-r) by (n-r) by r matrix, such that S_offset(:,:,1) = 0, S_offset(:,:,2) = m+1, S_offset(:,:,3) = 2*m+2, and so on.
S_offset = permute(repmat( sum_ind, [m-r, 1, n-r] ), [1, 3, 2]);
create S_base, a matrix of base array addresses from which the offset will be calculated.
S_base = reshape(1:m*n,[m n]);
S_base = repmat(S_base(1:m-r,1:n-r), [1, 1, r]);
Finally, use S_base+S_offset to address the values of C.
S = sum(C(S_base+S_offset), 3);
You can, of course, use bsxfun and other methods to make it more efficient; here I chose to lay it out for clarity. I have yet to benchmark this to see how it compares with the double-loop method though; I need to head home for dinner first!
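For comparison, the linear-indexing idea can be written more compactly with implicit expansion (R2016b or newer). This sketch is a variation, not the code above: it covers the full (m-r+1)-by-(n-r+1) range, and C = magic(6) with r = 3 are assumed example values.

```matlab
C = magic(6);                          % example matrix (assumed)
r = 3;
[m,n] = size(C);

% Linear index of the top-left element of every r-by-r block
base = (1:m-r+1).' + m*(0:n-r);
% Moving one step down the main diagonal adds m+1 to a linear index
offs = reshape((0:r-1)*(m+1), 1, 1, r);

% base + offs expands to an (m-r+1)-by-(n-r+1)-by-r index array
S = sum(C(base + offs), 3);
```

The index array holds r times as many elements as S, so memory use grows with r just as in the repmat version.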
Is this what you're looking for? This function adds the diagonals and puts them into a vector similar to how the function 'sum' adds up all of the columns in a matrix and puts them into a vector.
function [diagSum] = diagSumCalc(squareMatrix, LLUR0_ULLR1)
%
% Input: squareMatrix: A square matrix.
% LLUR0_ULLR1: LowerLeft to UpperRight addition = 0
% UpperLeft to LowerRight addition = 1
%
% Output: diagSum: A vector of the sum of the diagonals of the matrix.
%
% Example:
%
% >> squareMatrix = [1 2 3;
% 4 5 6;
% 7 8 9];
%
% >> diagSum = diagSumCalc(squareMatrix, 0);
%
% diagSum =
%
% 1 6 15 14 9
%
% >> diagSum = diagSumCalc(squareMatrix, 1);
%
% diagSum =
%
% 7 12 15 8 3
%
% Written by M. Phillips
% Oct. 16th, 2013
% MIT Open Source Copyright
% Contact mphillips@hmc.edu fmi.
%
if (nargin < 2)
disp('Error on input. Needs two inputs.');
return;
end
if (LLUR0_ULLR1 ~= 0 && LLUR0_ULLR1~= 1)
disp('Error on input. Only accepts 0 or 1 as input for second condition.');
return;
end
[M, N] = size(squareMatrix);
if (M ~= N)
disp('Error on input. Only accepts a square matrix as input.');
return;
end
diagSum = zeros(1, M+N-1);
if LLUR0_ULLR1 == 1
squareMatrix = rot90(squareMatrix, -1);
end
for i = 1:length(diagSum)
if i <= M
countUp = 1;
countDown = i;
while countDown ~= 0
diagSum(i) = squareMatrix(countUp, countDown) + diagSum(i);
countUp = countUp+1;
countDown = countDown-1;
end
end
if i > M
countUp = i-M+1;
countDown = M;
while countUp ~= M+1
diagSum(i) = squareMatrix(countUp, countDown) + diagSum(i);
countUp = countUp+1;
countDown = countDown-1;
end
end
end
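The same sums can also be obtained without the while loops. The sketch below uses accumarray, grouping elements by i+j-1 (constant along each lower-left to upper-right diagonal); this is an alternative technique assumed for illustration, not part of the function above.

```matlab
A = [1 2 3; 4 5 6; 7 8 9];             % the example from the help text
[M, N] = size(A);
[I, J] = ndgrid(1:M, 1:N);             % row and column index of every element

% Elements with equal i+j-1 lie on the same anti-diagonal
d0 = accumarray(I(:)+J(:)-1, A(:)).';

% For the other direction, rotate first (as the function does)
R  = rot90(A, -1);
d1 = accumarray(I(:)+J(:)-1, R(:)).';

disp(d0)   % 1 6 15 14 9
disp(d1)   % 7 12 15 8 3
```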
Cheers