Generating arrays using bsxfun with an anonymous function and element-wise subtractions - MATLAB

I have the following code:
n = 10000;
s = 100;
Z = rand(n, 2);
x = rand(s, 1);
y = rand(s, 1);
fun = @(a) exp(a);
In principle, the anonymous function fun can have a different form. I need to create two arrays.
First, I need to create an array of size n x s x s with generic elements
fun(Z(i, 1) - x(j)) * fun(Z(i, 2) - y(k))
where i=1,...,n and j,k=1,...,s. What I can easily do is construct the two matrices with bsxfun, e.g.
bsxfun(@(x, y) fun(x - y), Z(:, 1), x');
bsxfun(@(x, y) fun(x - y), Z(:, 2), y');
But then I would need to combine them into a 3D array by multiplying each column of the first matrix element-wise with each column of the second.
In the second step, I need to create an array of size n x 3 x s x s whose n x 3 slice for each pair (i, j) is the matrix
[ones(n, 1), Z(:, 1) - x(i), Z(:, 2) - y(j)]
where i=1,...,s and j=1,...,s. I could loop over the two extra dimensions with something like
A = zeros(n, 3, s, s);
for i = 1:s
for j = 1:s
A(:, :, i, j) = [ones(n, 1), Z(:, 1) - x(i), Z(:, 2) - y(j)];
end
end
Is there a way to avoid loops?
In the third step, suppose that after obtaining the array out1 (the output of the first step), I want to create a new array out3 of size n x n x s x s that contains out1 on its main diagonal, i.e. out3(i,i,j,k) = out1(i,j,k) and out3(i,l,j,k) = 0 for all i~=l. Is there an alternative to diag for creating such "diagonal arrays"? Alternatively, if I create an n x n x s x s array of zeros, is there a way to put out1 on its main diagonal?

Code
exp_Z_x = exp(bsxfun(@minus,Z(:,1),x.'));
exp_Z_y = exp(bsxfun(@minus,Z(:,2),y.'));
out1 = bsxfun(@times,exp_Z_x,permute(exp_Z_y,[1 3 2]));
Z1 = [ones(n,1) Z(:,1) Z(:,2)];
X1 = permute([ zeros(s,1) x zeros(s,1)],[3 2 1]);
Y1 = permute([ zeros(s,1) zeros(s,1) y],[4 2 3 1]);
out2 = bsxfun(@minus,bsxfun(@minus,Z1,X1),Y1);
out3 = zeros(n,n,s,s); % out3(n,n,s,s) = 0; could be used instead for performance
out3(bsxfun(@plus,[1:n+1:n*n]',[0:s*s-1]*n*n)) = out1;
% out1, out2 and out3 are the desired outputs
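For comparison, here is a minimal NumPy sketch of the same three constructions using broadcasting (NumPy comes up later in this thread). It is an illustration under stated assumptions, not a checked translation of the MATLAB answer: the function name build_arrays is hypothetical, axes are 0-based, fun defaults to np.exp, and the diagonal embedding uses paired integer index arrays instead of linear indices.

```python
import numpy as np

def build_arrays(Z, x, y, fun=np.exp):
    """Hypothetical NumPy analogue of out1, out2, out3 (0-based axes)."""
    n, s = Z.shape[0], x.shape[0]
    # fx[i, j] = fun(Z[i, 0] - x[j]) and fy[i, k] = fun(Z[i, 1] - y[k]), both (n, s)
    fx = fun(Z[:, 0, None] - x[None, :])
    fy = fun(Z[:, 1, None] - y[None, :])
    # out1[i, j, k] = fx[i, j] * fy[i, k]: insert singleton axes, then broadcast
    out1 = fx[:, :, None] * fy[:, None, :]                 # (n, s, s)
    # out2[:, :, j, k] = [1, Z[:, 0] - x[j], Z[:, 1] - y[k]]
    out2 = np.empty((n, 3, s, s))
    out2[:, 0] = 1.0
    out2[:, 1] = (Z[:, 0, None] - x[None, :])[:, :, None]  # varies with j only
    out2[:, 2] = (Z[:, 1, None] - y[None, :])[:, None, :]  # varies with k only
    # out3[i, i, j, k] = out1[i, j, k], zero elsewhere
    out3 = np.zeros((n, n, s, s))
    idx = np.arange(n)
    out3[idx, idx] = out1                                  # selects an (n, s, s) view
    return out1, out2, out3

rng = np.random.default_rng(0)
Z = rng.random((5, 2)); x = rng.random(4); y = rng.random(4)
out1, out2, out3 = build_arrays(Z, x, y)
print(out1.shape, out2.shape, out3.shape)  # (5, 4, 4) (5, 3, 4, 4) (5, 5, 4, 4)
```

As with the MATLAB version, out3 has n*n*s*s elements, so for n = 10000 it is far too large to store densely; the sketch is meant for small sizes.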

Related

How to compute the sum of squares of outer products of two matrices minus a common matrix in Matlab?

Suppose there are three n x n matrices X, Y, S. How can I quickly compute the following scalar b?
b = 0;
for i = 1:n
b = b + sum(sum((X(i,:)' * Y(i,:) - S).^2));
end
The computation cost is O(n^3). There is a fast way to compute the sum of the outer products of the rows of two matrices. Specifically, the matrix C
C = zeros(n);
for i = 1:n
C = C + X(i,:)' * Y(i,:);
end
can be calculated without a for loop as C = X.'*Y, a single matrix product that runs much faster. Is there a faster way to compute b as well?
You can use:
X2 = X.^2;
Y2 = Y.^2;
S2 = S.^2;
b = sum(sum(X2.' * Y2 - 2 * (X.' * Y ) .* S + n * S2));
Given your example
b=0;
for i = 1:n
b = b + sum(sum((X(i,:).' * Y(i,:) - S).^2));
end
We can first bring the summation out of the loop:
b=0;
for i = 1:n
b = b + (X(i,:).' * Y(i,:) - S).^2;
end
b=sum(b(:))
Knowing that we can write (a - b)^2 as a^2 - 2*a*b + b^2
b=0;
for i = 1:n
b = b + (X(i,:).' * Y(i,:)).^2 - 2.* (X(i,:).' * Y(i,:)) .*S + S.^2;
end
b=sum(b(:))
And we know that (a * b) ^ 2 is the same as a^2 * b^2:
X2 = X.^2;
Y2 = Y.^2;
S2 = S.^2;
b=0;
for i = 1:n
b = b + (X2(i,:).' * Y2(i,:)) - 2.* (X(i,:).' * Y(i,:)) .*S + S2;
end
b=sum(b(:))
Now we can compute each term separately:
b = sum(sum(X2.' * Y2 - 2 * (X.' * Y ) .* S + n * S2));
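Written out with indices, the derivation above is the following identity, where ∘ denotes the element-wise product and the superscript ∘2 the element-wise square (X.^2); each bracketed term corresponds to one term of the final MATLAB expression:

```latex
b = \sum_{i=1}^{n} \sum_{j,k} \left( X_{ij} Y_{ik} - S_{jk} \right)^2
  = \sum_{j,k} \Big[ \sum_{i=1}^{n} X_{ij}^2 Y_{ik}^2
      - 2\, S_{jk} \sum_{i=1}^{n} X_{ij} Y_{ik} + n\, S_{jk}^2 \Big]
  = \sum_{j,k} \Big[ \big(X^{\circ 2}\big)^{\mathsf{T}} Y^{\circ 2}
      - 2\, \big(X^{\mathsf{T}} Y\big) \circ S + n\, S^{\circ 2} \Big]_{jk}
```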
Here is the result of a test in Octave that compares my method with the two methods provided by @AndrasDeak and with the original loop-based solution, for inputs of size 500x500:
===rahnema1 (B)===
Elapsed time is 0.0984299 seconds.
===Andras Deak (B2)===
Elapsed time is 7.86407 seconds.
===Andras Deak (B3)===
Elapsed time is 2.99158 seconds.
===Loop solution===
Elapsed time is 2.20357 seconds
n=500;
X= rand(n);
Y= rand(n);
S= rand(n);
disp('===rahnema1 (B)===')
tic
X2 = X.^2;
Y2 = Y.^2;
S2 = S.^2;
b=sum(sum(X2.' * Y2 - 2 * (X.' * Y ) .* S + n * S2));
toc
disp('===Andras Deak (B2)===')
tic
b2 = sum(reshape((permute(reshape(X, [n, 1, n]).*Y, [3,2,1]) - S).^2, 1, []));
toc
disp('===Andras Deak (B3)===')
tic
b3 = sum(reshape((reshape(X, [n, 1, n]).*Y - reshape(S.', [1, n, n])).^2, 1, []));
toc
disp('===Loop solution===')
tic
b=0;
for i = 1:n
b = b + sum(sum((X(i,:)' * Y(i,:) - S).^2));
end
toc
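A quick way to convince yourself that the expanded formula matches the loop is to compare both on random data. The following NumPy sketch does that; the helper names b_loop and b_expanded are hypothetical, and the comparison uses np.isclose because the two summation orders round differently. It also works for rectangular inputs (X of size n x p, Y of size n x q, S of size p x q), where n is the number of rows being summed over.

```python
import numpy as np

def b_loop(X, Y, S):
    """Reference: sum over rows i of the squared entries of outer(X[i], Y[i]) - S."""
    b = 0.0
    for i in range(X.shape[0]):
        b += np.sum((np.outer(X[i], Y[i]) - S) ** 2)
    return b

def b_expanded(X, Y, S):
    """Closed form: sum(X2.' * Y2 - 2 * (X.' * Y) .* S + n * S2)."""
    n = X.shape[0]
    return np.sum((X ** 2).T @ (Y ** 2) - 2 * (X.T @ Y) * S + n * S ** 2)

rng = np.random.default_rng(0)
X, Y, S = rng.random((3, 50, 50))   # three independent 50 x 50 matrices
print(np.isclose(b_loop(X, Y, S), b_expanded(X, Y, S)))  # True
```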
You probably can't reduce the time complexity, but you can use vectorization to get rid of the loop and take advantage of low-level code and caching as much as possible. Whether it's actually faster depends on your dimensions, so you need to run some timing tests to see if it is worthwhile:
% dummy data
n = 3;
X = rand(n);
Y = rand(n);
S = rand(n);
% vectorize
b2 = sum(reshape((permute(reshape(X, [n, 1, n]).*Y, [3,2,1]) - S).^2, 1, []));
% check
b - b2 % close to machine epsilon i.e. zero
What happens is that we insert a new singleton dimension in one of the arrays, ending up with an array of size [n, 1, n] against one with [n, n], the latter being implicitly the same as [n, n, 1]. The overlapping first index corresponds to the i in your loop, the remaining two indices correspond to the matrix indices of the dyadic product you have for each i. Then we permute the indices in order to put the "i" index last, so that we can again broadcast the result with S of (implicit) size [n, n, 1]. What we then have is a matrix of size [n, n, n] where the first two indices are matrix indices in your original and the last one corresponds to i. We then just have to take the square and sum each term (instead of summing twice I reshaped the array to a row and summed once).
A slight variation of the above transposes S instead of the 3d array which might be faster (again, you should time it):
b3 = sum(reshape((reshape(X, [n, 1, n]).*Y - reshape(S.', [1, n, n])).^2, 1, []));
In terms of performance, reshape is free (it only reinterprets the data, it doesn't copy it), but permute/transpose will often lead to a performance hit when data gets copied.
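The same broadcasting idea can be sketched in NumPy (an illustration with the hypothetical helper name b_broadcast, not code from the answer). NumPy aligns dimensions from the right, so putting the i index first lets S broadcast against every slice without a permute. Like b2 and b3, it builds an n x n x n temporary, so memory grows as n^3.

```python
import numpy as np

def b_broadcast(X, Y, S):
    """Vectorized b: build all outer products at once, then subtract S from each."""
    # X[:, :, None] is (n, p, 1) and Y[:, None, :] is (n, 1, q), so
    # P[i, j, k] = X[i, j] * Y[i, k]; P[i] is the outer product of row i.
    P = X[:, :, None] * Y[:, None, :]
    # S has shape (p, q) and broadcasts as (1, p, q) against P.
    return np.sum((P - S) ** 2)

rng = np.random.default_rng(1)
X, Y, S = rng.random((3, 20, 20))
print(b_broadcast(X, Y, S))
```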

How can I express this large number of computations without for loops?

I work primarily in MATLAB but I think the answer should not be too hard to carry over from one language to another.
I have a multi-dimensional array X with dimensions [n, p, 3].
I would like to calculate the following multi-dimensional array.
T = zeros(p, p, p)
for i = 1:p
for j = 1:p
for k = 1:p
T(i, j, k) = sum(X(:, i, 1) .* X(:, j, 2) .* X(:, k, 3));
end
end
end
The sum is of the elements of a length-n vector. Any help is appreciated!
You only need some permuting of dimensions and multiplication with singleton expansion:
T = sum(bsxfun(@times, bsxfun(@times, permute(X(:,:,1), [2 4 5 3 1]), permute(X(:,:,2), [4 2 5 3 1])), permute(X(:,:,3), [4 5 2 3 1])), 5);
From R2016b onwards, this can be written more easily as
T = sum(permute(X(:,:,1), [2 4 5 3 1]) .* permute(X(:,:,2), [4 2 5 3 1]) .* permute(X(:,:,3), [4 5 2 3 1]), 5);
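Here is a NumPy sketch of the same singleton-expansion trick, for readers following along in Python. Because NumPy broadcasts from the trailing axes, the summed length-n axis is kept first rather than moved to dimension 5; the function name is hypothetical, and the (n, p, p, p) temporary has the same memory cost as in the MATLAB version.

```python
import numpy as np

def triple_product_broadcast(X):
    """T[a, b, c] = sum over i of X[i, a, 0] * X[i, b, 1] * X[i, c, 2]."""
    X1, X2, X3 = X[..., 0], X[..., 1], X[..., 2]   # each of shape (n, p)
    # Shapes (n, p, 1, 1), (n, 1, p, 1) and (n, 1, 1, p) broadcast to (n, p, p, p).
    prod = X1[:, :, None, None] * X2[:, None, :, None] * X3[:, None, None, :]
    return prod.sum(axis=0)                        # reduce over n -> (p, p, p)

rng = np.random.default_rng(2)
T = triple_product_broadcast(rng.standard_normal((10, 20, 3)))
print(T.shape)  # (20, 20, 20)
```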
As I mentioned in a comment, vectorization is no longer always a huge advantage. Some vectorization methods even slow the code down rather than speed it up, so you must always time your solutions. Vectorization often involves creating large temporary arrays or copying large amounts of data, which loop code avoids. Whether such a solution is faster depends on the architecture, the size of the input, and many other factors.
Nonetheless, in this case it seems vectorization approaches can yield a large speedup.
The first thing to notice about the original code is that X(:, i, 1) .* X(:, j, 2) gets re-computed in the inner loop, though it is a constant value there. Rewriting the inner loop as this will save time:
Y = X(:, i, 1) .* X(:, j, 2);
for k = 1:p
T(i, j, k) = sum(Y .* X(:, k, 3));
end
Now we notice that the inner loop is a dot product, and can be written as follows:
Y = X(:, i, 1) .* X(:, j, 2);
T(i, j, :) = Y.' * X(:, :, 3);
The .' transpose on Y does not copy the data, as Y is a vector. Next, we notice that X(:, :, 3) is indexed repeatedly. Let's move this out of the outer loop. Now I'm left with the following code:
T = zeros(p, p, p);
X1 = X(:, :, 1);
X2 = X(:, :, 2);
X3 = X(:, :, 3);
for i = 1:p
for j = 1:p
Y = X1(:, i) .* X2(:, j);
T(i, j, :) = Y.' * X3;
end
end
It is likely that removing the loop over j is equally easy, which would leave a single loop over i. But this is where I stop.
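The step this answer stops at, removing the loop over j, can be sketched as follows in NumPy (a hypothetical helper, not code from the answer). For a fixed i, the products X1(:, i) .* X2(:, j) for all j form an n x p matrix Y, and the whole p x p slice T(i, :, :) is then the single matrix product Y.' * X3.

```python
import numpy as np

def triple_product_one_loop(X):
    """Single loop over i; the j and k loops become one matrix product per i."""
    p = X.shape[1]
    X1, X2, X3 = X[..., 0], X[..., 1], X[..., 2]   # hoist the slicing out of the loop
    T = np.zeros((p, p, p))
    for i in range(p):
        # Y[:, j] = X1[:, i] * X2[:, j] for every j at once, shape (n, p)
        Y = X1[:, i, None] * X2
        # T[i, j, k] = sum over r of Y[r, j] * X3[r, k], i.e. Y.T @ X3
        T[i] = Y.T @ X3
    return T

rng = np.random.default_rng(3)
T = triple_product_one_loop(rng.standard_normal((50, 100, 3)))
print(T.shape)  # (100, 100, 100)
```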
These are the timings I see, in seconds (R2017a, 3-year-old iMac with 4 cores). For n=10, p=20:
original: 0.0206
moving Y out the inner loop: 0.0100
removing inner loop: 0.0016
moving indexing out of loops: 7.6294e-04
Luis' answer: 1.9196e-04
For a larger array with n=50, p=100:
original: 2.9107
moving Y out the inner loop: 1.3488
removing inner loop: 0.0910
moving indexing out of loops: 0.0361
Luis' answer: 0.1417
"Luis' answer" is this one. It is by far the fastest for small arrays, but for larger arrays it shows the cost of the permutation. Moving the computation of the first product out of the inner loop saves a bit over half the computation cost. Removing the inner loop, however, reduces the cost quite dramatically (which I hadn't expected; I presume the single matrix product can exploit parallelism better than the many small element-wise products). We then get a further time reduction by reducing the number of indexing operations within the loop.
This is the timing code:
function so()
n = 10; p = 20;
%n = 50; p = 100;
X = randn(n,p,3);
T1 = method1(X);
T2 = method2(X);
T3 = method3(X);
T4 = method4(X);
T5 = method5(X);
assert(max(abs(T1(:)-T2(:)))<1e-13)
assert(max(abs(T1(:)-T3(:)))<1e-13)
assert(max(abs(T1(:)-T4(:)))<1e-13)
assert(max(abs(T1(:)-T5(:)))<1e-13)
timeit(@()method1(X))
timeit(@()method2(X))
timeit(@()method3(X))
timeit(@()method4(X))
timeit(@()method5(X))
function T = method1(X)
p = size(X,2);
T = zeros(p, p, p);
for i = 1:p
for j = 1:p
for k = 1:p
T(i, j, k) = sum(X(:, i, 1) .* X(:, j, 2) .* X(:, k, 3));
end
end
end
function T = method2(X)
p = size(X,2);
T = zeros(p, p, p);
for i = 1:p
for j = 1:p
Y = X(:, i, 1) .* X(:, j, 2);
for k = 1:p
T(i, j, k) = sum(Y .* X(:, k, 3));
end
end
end
function T = method3(X)
p = size(X,2);
T = zeros(p, p, p);
for i = 1:p
for j = 1:p
Y = X(:, i, 1) .* X(:, j, 2);
T(i, j, :) = Y.' * X(:, :, 3);
end
end
function T = method4(X)
p = size(X,2);
T = zeros(p, p, p);
X1 = X(:, :, 1);
X2 = X(:, :, 2);
X3 = X(:, :, 3);
for i = 1:p
for j = 1:p
Y = X1(:, i) .* X2(:, j);
T(i, j, :) = Y.' * X3;
end
end
function T = method5(X)
T = sum(permute(X(:,:,1), [2 4 5 3 1]) .* permute(X(:,:,2), [4 2 5 3 1]) .* permute(X(:,:,3), [4 5 2 3 1]), 5);
You mentioned you are open to other languages, and NumPy's syntax is very close to MATLAB's, so here is a NumPy-based solution.
These tensor sum-reductions, especially the matrix-multiplication ones, are easily expressed in Einstein notation, and NumPy has a function for exactly that: np.einsum. Under the hood it is implemented in C and is quite efficient, and it has recently been optimized further to leverage BLAS-based matrix-multiplication implementations.
So a translation of the stated code into NumPy, keeping in mind that NumPy uses 0-based indexing and that its axes are visualized differently from MATLAB's dimensions, would be:
import numpy as np
# X is a NumPy array of shape : (n,p,3). So, a random one could be
# generated with : `X = np.random.rand(n,p,3)`.
T = np.zeros((p, p, p))
for i in range(p):
for j in range(p):
for k in range(p):
T[i, j, k] = np.sum(X[:, i, 0] * X[:, j, 1] * X[:, k, 2])
The einsum way to solve it would be -
np.einsum('ia,ib,ic->abc',X[...,0],X[...,1],X[...,2])
To leverage matrix multiplication, use the optimize flag:
np.einsum('ia,ib,ic->abc',X[...,0],X[...,1],X[...,2],optimize=True)
Timings (with large sizes)
In [27]: n,p = 100,100
...: X = np.random.rand(n,p,3)
In [28]: %%timeit
...: T = np.zeros((p, p, p))
...: for i in range(p):
...: for j in range(p):
...: for k in range(p):
...: T[i, j, k] = np.sum(X[:, i, 0] * X[:, j, 1] * X[:, k, 2])
1 loop, best of 3: 6.23 s per loop
In [29]: %timeit np.einsum('ia,ib,ic->abc',X[...,0],X[...,1],X[...,2])
1 loop, best of 3: 353 ms per loop
In [31]: %timeit np.einsum('ia,ib,ic->abc',X[...,0],X[...,1],X[...,2],optimize=True)
100 loops, best of 3: 10.5 ms per loop
In [32]: 6230.0/10.5
Out[32]: 593.3333333333334
Around 600x speedup there!

How to randomly select multiple small and non-overlapping matrices from a large matrix?

Let's say I have a large N x M matrix A (e.g. 1000 x 1000). Selecting k random elements without replacement from A is relatively straightforward in MATLAB:
A = rand(1000,1000); % Generate random data
k = 5; % Number of elements to be sampled
sizeA = numel(A); % Number of elements in A
idx = randperm(sizeA); % Random permutation
B = A(idx(1:k)); % Random selection of k elements from A
However, I'm looking for a way to expand the above concept so that I could randomly select k non-overlapping n x m -sized sub-matrices (e.g. 5 x 5) from A. What would be the most convenient way to achieve this? I'd very much appreciate any help!
This probably isn't the most efficient way to do this. I'm sure if I (or somebody else) gave it more thought there would be a better way but it should help you get started.
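First I take the original idx(1:k) and reshape it into a 3D array with reshape(idx(1:k), 1, 1, k). Then I extend it to the required size, padding with zeros (idx(k, k, 1) = 0;), and lastly I use two for loops to create the correct indices. Note that this uses k both as the number of sub-matrices and as their side length, and it does not check that the sub-matrices stay inside A or avoid overlapping each other.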
for n = 1:k
for m = 1:k
idx(m, 1:k, n) = size(A, 1)*(m - 1) + idx(1, 1, n):size(A, 1)*(m - 1) + idx(1, 1, n) + k - 1;
end
end
The complete script built onto the end of yours
A = rand(1000, 1000);
k = 5;
idx = randperm(numel(A));
B = A(idx(1:k));
idx = reshape(idx(1:k), 1, 1, k);
idx(k, k, 1) = 0; % Extend padding with zeros
for n = 1:k
for m = 1:k
idx(m, 1:k, n) = size(A, 1)*(m - 1) + idx(1, 1, n):size(A, 1)*(m - 1) + idx(1, 1, n) + k - 1;
end
end
C = A(idx);
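A different, simpler technique does guarantee non-overlap: split A into a grid of n x m tiles and sample k distinct tiles without replacement. The NumPy sketch below assumes it is acceptable to restrict the sub-matrices to grid-aligned positions, which is less random than allowing arbitrary positions; the function name random_disjoint_blocks and its parameters bh, bw (block height and width) are hypothetical.

```python
import numpy as np

def random_disjoint_blocks(A, k, bh, bw, rng=None):
    """Pick k distinct grid-aligned bh x bw blocks of A; distinct tiles never overlap."""
    rng = np.random.default_rng() if rng is None else rng
    nr, nc = A.shape[0] // bh, A.shape[1] // bw     # number of whole tiles per axis
    if k > nr * nc:
        raise ValueError("not enough non-overlapping tiles")
    tiles = rng.choice(nr * nc, size=k, replace=False)   # sample without replacement
    rows, cols = np.divmod(tiles, nc)                     # tile number -> (row, col)
    blocks = np.stack([A[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
                       for r, c in zip(rows, cols)])
    return blocks, rows * bh, cols * bw                   # blocks and top-left corners

rng = np.random.default_rng(0)
A = rng.random((1000, 1000))
B, r0, c0 = random_disjoint_blocks(A, k=5, bh=5, bw=5, rng=rng)
print(B.shape)  # (5, 5, 5)
```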

MATLAB - vectorize iteration over two matrices used in function

I have two matrices X and Y, both of size m x n. I want to create a new m x 1 matrix Z such that its i-th entry is computed by applying a function to the i-th rows of X and Y. In my case m = 100000 and n = 2. I tried using a loop but it takes forever:
for i = 1:m
Z(i) = fxPesoTexturaCN(X(i,:), Y(i,:), img, r, L);
end
Is there an efficient way to vectorize it?
EDIT 1
This is the function
function [peso] = fxPesoTexturaCN(a,b, img, r, L)
ac = num2cell(a);
bc = num2cell(b);
imgint1 = img(sub2ind(size(img),ac{:}));
imgint2 = img(sub2ind(size(img),bc{:}));
peso = (sum((a - b) .^ 2) + (r/L) * (imgint2 - imgint1)) / (2*r^2);
where img, r and L are constants, a is X(i,:) and b is Y(i,:).
And the call of this function is
peso = bsxfun(@(a,b) fxPesoTexturaCN(a,b,img,r,L), a, b);
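bsxfun cannot vectorize this, because fxPesoTexturaCN is written for one pair of rows at a time. Since the function only does indexing and row-wise arithmetic, it can be rewritten to work on all rows at once; in MATLAB that would mean sub2ind(size(img), X(:,1), X(:,2)) and sum(..., 2). Here is a NumPy sketch of that rewrite, assuming X and Y hold integer 1-based (row, column) subscripts as in MATLAB; the name peso_vectorized is hypothetical.

```python
import numpy as np

def peso_vectorized(X, Y, img, r, L):
    """Row-wise fxPesoTexturaCN for all rows of the (m, 2) arrays X and Y at once."""
    # Gather the pixel values for every row in one indexing call (1-based -> 0-based).
    imgint1 = img[X[:, 0] - 1, X[:, 1] - 1]
    imgint2 = img[Y[:, 0] - 1, Y[:, 1] - 1]
    # sum((a - b).^2) for each row becomes a sum along axis 1.
    sq = np.sum((X - Y) ** 2, axis=1)
    return (sq + (r / L) * (imgint2 - imgint1)) / (2 * r ** 2)

rng = np.random.default_rng(0)
img = rng.random((200, 300))
X = np.column_stack([rng.integers(1, 201, 100000), rng.integers(1, 301, 100000)])
Y = np.column_stack([rng.integers(1, 201, 100000), rng.integers(1, 301, 100000)])
Z = peso_vectorized(X, Y, img, r=2.0, L=3.0)
print(Z.shape)  # (100000,)
```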

manipulating indices of matrix in parallel in matlab

Suppose I have an m-by-n-by-p array A in which each element stores a real number. Now I want to create another array B with B(i, j, k) = f(A(i, j, k), i, j, k, otherVars). Is there a faster way to do this in MATLAB than looping through all the elements? (Note that the function requires the indices i, j, k.)
An example is as follows (the actual function f could be more complex):
A = rand(3, 4, 5);
B = zeros(size(A));
C = 10;
for x = 1:size(A, 1)
for y = 1:size(A, 2)
for z = 1:size(A, 3)
B(x, y, z) = A(x,y,z) + x - y * z + C;
end
end
end
I've tried creating a cell "B", and
B{i, j, k} = [A(i, j, k), i, j, k];
I then applied cellfun() to do the parallel computing, but it's even slower than a for-loop over each elements in A.
In my real implementation, the function f is much more complex than B = A + X - Y.*Z + C; it takes four scalar values, and I don't want to modify it since it's a function written in an external package. Any suggestions?
Vectorize it by building an ndgrid of the appropriate values:
[X,Y,Z] = ndgrid(1:size(A,1), 1:size(A,2), 1:size(A,3));
B = A + X - Y.*Z + C;
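The ndgrid idea translates directly to NumPy, shown here as a sketch with a hypothetical helper name. np.meshgrid with indexing="ij" matches ndgrid's ordering, and sparse=True returns broadcastable index vectors instead of three full 3D grids. As with the MATLAB answer, this only helps if f accepts arrays; a function that only takes scalars still has to be called once per element.

```python
import numpy as np

def apply_with_indices(A, C):
    """B[x, y, z] = A[x, y, z] + x - y*z + C, using MATLAB-style 1-based x, y, z."""
    # indexing="ij" gives ndgrid ordering; sparse=True yields shapes
    # (m, 1, 1), (1, n, 1) and (1, 1, p), which broadcast against A.
    X, Y, Z = np.meshgrid(*(np.arange(1, s + 1) for s in A.shape),
                          indexing="ij", sparse=True)
    return A + X - Y * Z + C

B = apply_with_indices(np.zeros((3, 4, 5)), 10)
print(B[2, 3, 4])  # -7.0
```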