I have a MATLAB program with 5 nested for loops and an if condition like this:
for x0=1:N
    for y0=1:N
        for k=1:N
            for x1=1:N
                for y1=1:N
                    if ~((y1-x1>N/2)||(x1-y1>N/2)) && ~((y0-x0>N/2)||(x0-y0>N/2))
                        A(x0,y0)=A(x0,y0)+2^(k*((x0-y0)+(x1-y1)))*B(x1,y1)
                    end
                end
            end
        end
    end
end
where A and B are two matrices. How can I make this program run faster?
I've tried to use meshgrid, but it doesn't seem to work because of the if condition.
Let's be smart about the loops and conditions first, since you are using the loop indices as condition variables.
We start with ~((y1-x1>N/2)||(x1-y1>N/2)), or, way clearer, abs(y1-x1)<=N/2.
Instead of having an if condition, why not enforce that y1 is always in range?
The last loop can be written as y1=max(x1-N/2,1):min(x1+N/2,N), so the first part of the if condition is not needed at all. We can of course do the same for the other variables:
for x0=1:N
    for y0=max(x0-N/2,1):min(x0+N/2,N)
        for k=1:N
            for x1=1:N
                for y1=max(x1-N/2,1):min(x1+N/2,N)
                    A(x0,y0)=A(x0,y0)+2^(k*((x0-y0)+(x1-y1)))*B(x1,y1);
                end
            end
        end
    end
end
Now, for clarity, let's reshuffle and vectorize that k. There is no need for it to be the middle loop; in fact, its only feature as the middle loop is to confuse the person reading the code. But beyond that, there is no need for it to be a loop at all.
k=1:N;
for x0=1:N
    for y0=max(x0-N/2,1):min(x0+N/2,N)
        for x1=1:N
            for y1=max(x1-N/2,1):min(x1+N/2,N)
                A(x0,y0)=A(x0,y0)+sum(2.^(k*((x0-y0)+(x1-y1))))*B(x1,y1);
            end
        end
    end
end
Now, is this faster?
No. MATLAB is really good at optimizing your code, so it is not faster. But at least it's way, way clearer, so I guess you've got that going for you. But you need it faster! Well... I am not sure you can. You have 5 nested loops, and that is just super slow. I don't think you can meshgrid this, even without the conditions, because you intermingle all the loops. meshgrid is good when, well, you do operations on a mesh grid, but in your case you use every x1,y1 for every x0,y0, so it's not a mesh operation.
Here is a vectorized solution:
x0 = (1:N).';
y0 = 1:N;
x1 = (1:N).';
y1 = 1:N;
k = reshape(1:N, 1, 1, N);
conditiona = ~((y0-x0 > N/2) | (x0-y0 > N/2));
conditionb = ~((y1-x1 > N/2) | (x1-y1 > N/2));
a = 2 .^ (k .* (x0-y0)) .* conditiona;
b = 2 .^ (k .* (x1-y1)) .* B .* conditionb;
bsum = squeeze(sum(sum(b, 1), 2));
A = A + reshape(reshape(a, [], N) * bsum, N, N);
Note that two 3D arrays a and b are created that may or may not require a lot of memory. In such a case you need to loop over k in chunks: for example, in the first iteration set k to 1:5, in the second iteration set it to 6:10, and so on, adding the result of each iteration to the previous one to form the final A.
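A minimal sketch of that chunked loop (kChunk is an illustrative chunk size; x0, y0, x1, y1, conditiona and conditionb are as defined above):
kChunk = 5;                                  % illustrative chunk size, tune to available memory
for kStart = 1:kChunk:N
    k = reshape(kStart:min(kStart+kChunk-1, N), 1, 1, []);   % current slice of k values
    a = 2 .^ (k .* (x0-y0)) .* conditiona;                   % N x N x numel(k)
    b = 2 .^ (k .* (x1-y1)) .* B .* conditionb;              % N x N x numel(k)
    bsum = squeeze(sum(sum(b, 1), 2));                       % numel(k) x 1
    A = A + reshape(reshape(a, [], numel(k)) * bsum, N, N);  % accumulate this chunk's contribution
end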
Explanation
This function can be vectorized using implicit expansion (which is more efficient than using meshgrid) and element-wise operators like .^ and .* instead of ^ and *. As a result, a 5D array would be created (because we have 5 loop variables) that should be summed over the 3rd to 5th dimensions to produce the final 2D matrix. However, that may require a lot of memory. Another point is that expressions containing a sum of products can usually be written as an efficient matrix multiplication.
The expression:
2^(k*((x0-y0)+(x1-y1)))*B(x1,y1);
can be written as:
2 .^ (k .* (x0-y0)) .* 2 .^ (k .* (x1-y1)) .* B(x1, y1)
-------- a --------    --------------- b --------------
that is the multiplication of two sub-expressions that each has 3 dimensions, because each contains just 3 loop variables. So the 5D problem is reduced to 3D.
The if condition has also two sub-expressions that each can be multiplied by a and b sub-expressions:
conditiona = ~((y0-x0 > N/2) | (x0-y0 > N/2));
conditionb = ~((y1-x1 > N/2) | (x1-y1 > N/2));
a = 2 .^ (k .* (x0-y0)) .* conditiona;
b = 2 .^ (k .* (x1-y1)) .* B .* conditionb;
A for loop can be formed using just the two loop variables x0 and y0:
for x0=1:N
    for y0=1:N
        A(x0,y0) = A(x0,y0) + sum(sum(sum(a(x0, y0, :) .* b, 3), 2), 1);
        % or simply: A(x0,y0) = A(x0,y0) + sum(a(x0, y0, :) .* b, "all");
    end
end
That can be simplified to the following loop by precomputing sum of b:
bsum = sum(sum(b, 1), 2);
% bsum = sum(b, [1, 2]);
for x0=1:N
    for y0=1:N
        A(x0,y0) = A(x0,y0) + sum(a(x0, y0, :) .* bsum, 3);
        % or as a vector x vector multiplication:
        % A(x0,y0) = A(x0,y0) + squeeze(a(x0, y0, :)).' * squeeze(bsum);
    end
end
Here the loop can be prevented by using the matrix x vector multiplication:
A = A + reshape(reshape(a, [], N) * squeeze(bsum), N, N);
Update: this solution may not be faster under MATLAB, because the execution engine can optimise the loops in the original code. (It does provide a speedup under Octave.)
One trick to deal with if statements within loops is to turn the if statement (or part of it) into a logical matrix. You can then multiply the logical matrix elementwise by the matrix of values you are adding in each step. A false value will evaluate to zero and will not change the result.
This only works if each element can be computed independently of the others.
It will generally make the actual calculation line slower, but in MATLAB this is often outweighed by the huge speed improvement from the removal of the for loops.
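As a toy illustration of the masking idea (my own example, unrelated to the question's matrices):
v = [-2 3 -1 5];
mask = v > 0;            % logical array standing in for the if condition
total = sum(mask .* v);  % entries where the condition is false contribute 0
Here total is 8, the same as summing only the positive entries with an if inside a loop.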
In your example, we can use this idea with meshgrid to remove the loops over x0 and y0.
The calculation line needs to become an element-wise matrix calculation, so the element-wise operators .*, .^ and | replace *, ^ and ||.
% Warning: Y0 and X0 are swapped in this example
[Y0, X0] = meshgrid(1:N,1:N);
% Create a logical matrix which represents part of the if statement
C = ~((Y0-X0>N/2) | (X0-Y0>N/2));
for k=1:N
    for x1=1:N
        for y1=1:N
            if ~((y1-x1>N/2)||(x1-y1>N/2))
                % Multiply by C elementwise
                A = A + C.*2.^(k*((X0-Y0)+(x1-y1)))*B(x1,y1);
            end
        end
    end
end
You could even take this concept further and remove more loops using multidimensional ndgrids, but it becomes more complex (you have to start summing over dimensions) and the multidimensional arrays become unwieldy if N is large.
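For illustration, here is one intermediate step in that direction (my own sketch, reusing the C, X0 and Y0 from above; X1, Y1 and C1 are new names). It is not the full multidimensional version, but since the inner x1/y1 sum does not depend on x0 or y0, those two loops can also be replaced by a masked matrix sum, leaving only the loop over k:
[Y1, X1] = meshgrid(1:N, 1:N);              % same swapped output order as above
C1 = ~((Y1-X1 > N/2) | (X1-Y1 > N/2));      % mask for the x1/y1 part of the if
for k = 1:N
    innerSum = sum(sum(C1 .* 2.^(k*(X1-Y1)) .* B));   % scalar: sum over all valid x1, y1
    A = A + C .* 2.^(k*(X0-Y0)) * innerSum;
end
This removes three of the five loops while keeping memory use at N-by-N.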
Note: be careful with index order. meshgrid defines y as rows and x as columns, so matrix indexing is A(y,x) but you are using A(x,y). So to make your example work I've switched x and y in the output of meshgrid.
Related
I am trying to evaluate the matrices Y(p,k) and Z(p,k) using the following simplified Matlab code.
They depend on some matrices A(j,k), B(j,p) and C(j,k) which I am able to precalculate, so I have just initialised them as random arrays for this MWE. (Note that B is a different size to A and C).
Nj = 5000;
Nk = 1000;
Np = 500; % max loop iterations
A = rand(Nj,Nk); % dummy precalculated matrices
B = rand(Nj,Np);
C = rand(Nj,Nk);
Y = zeros(Np,Nk); % allocate storage
Z = zeros(Np,Nk);
tic
for p = 1:Np
    X = A .* B(:,p);
    Y(p,:) = sum( X , 1 );
    Z(p,:) = sum( C .* X , 1 );
end
toc % Evaluates to 11 seconds on my system
As can be seen above, I am repeating my calculation by looping over index p (because the matrix B depends on p).
I have managed to get this far by moving everything which can be precalculated outside the loop (contained in A, B and C), but on my system this code still takes around 11 seconds to execute. Can anyone see a way in Matlab to speed this up, or perhaps even remove the loop and process all at once?
Thank you
I think the following should be equivalent and much faster:
Y = B' * A;
Z = B' * (A.*C);
Notes:
If B is complex-valued then you should use .' for transposition instead.
You may also want to pre-compute B directly in transposed form (i.e. as a Np by Nj matrix) to avoid the transposition altogether.
If C is not needed anywhere else, then pre-compute it as A.*C instead in order to avoid the extra element-wise multiplication.
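As a quick sanity check on the dummy data above (Yloop and Zloop are names I introduce just for the comparison):
Yloop = zeros(Np, Nk);
Zloop = zeros(Np, Nk);
for p = 1:Np
    X = A .* B(:,p);
    Yloop(p,:) = sum(X, 1);
    Zloop(p,:) = sum(C .* X, 1);
end
Y = B.' * A;
Z = B.' * (A.*C);
max(abs(Y(:) - Yloop(:)))   % should be on the order of floating-point round-off
max(abs(Z(:) - Zloop(:)))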
Consider the preallocation of the following two vectors:
vecCol = NaN( 3, 1 );
vecRow = NaN( 1, 3 );
Now the goal is to assign values to those vectors (e.g. within a loop if vectorization is not possible). Is there a convention or best practice regarding the indexing?
Is the following approach recommended?
for k = 1:3
    vecCol( k, 1 ) = 1; % Row, Column
    vecRow( 1, k ) = 2; % Row, Column
end
Or is it better to code as follows?
for k = 1:3
    vecCol(k) = 1; % Element
    vecRow(k) = 2; % Element
end
It makes no difference functionally. If the context means that the vectors are always 1D (your naming convention in this example helps) then you can just use vecCol(k) for brevity and flexibility. However, there are some advantages to using the vecCol(k,1) syntax:
It's explicitly clear which type of vector you're using. This is good if it matters, e.g. when using linear algebra, but might be irrelevant if direction is arbitrary.
If you forget to initialise (bad, but it happens) then it will ensure the direction is as expected.
It's a good habit to get into so you don't forget when using 2D arrays.
It appears to be slightly quicker. This will be negligible on small arrays, but see the benchmark below for vectors with 10^8 elements, which shows a speed improvement of >10%.
function benchie()
    % Benchmark. Set up large row/column vectors, time value assignment using timeit.
    n = 1e8;
    vecCol = NaN(n, 1); vecRow = NaN(1, n);
    f = @()fullidx(vecCol, vecRow, n);
    s = @()singleidx(vecCol, vecRow, n);
    timeit(f)
    timeit(s)
end
function fullidx(vecCol, vecRow, n)
    % 2D indexing, copied from the example in question
    for k = 1:n
        vecCol(k, 1) = 1; % Row, Column
        vecRow(1, k) = 2; % Row, Column
    end
end
function singleidx(vecCol, vecRow, n)
    % Element indexing, copied from the example in question
    for k = 1:n
        vecCol(k) = 1; % Element
        vecRow(k) = 2; % Element
    end
end
Output (tested on Windows 64-bit R2015b, your mileage may vary!)
% f (full indexing): 2.4874 secs
% s (element indexing): 2.8456 secs
Iterating this benchmark over increasing n produces a plot of the two timings for reference (plot not reproduced here).
A general rule of thumb in programming is "explicit is better than implicit". Since there is no functional difference between the two, I'd say it depends on context which one is cleaner/better:
if the context uses a lot of matrix algebra and the distinction between row and column vectors is important, use the 2-argument indexing to reduce bugs and facilitate reading
if the context doesn't discriminate much between the two and you're just using vectors as simple arrays, using 1-argument indexing is cleaner
So, I need to vectorize some for loops into a single line. I understand how vectorize one and two for-loops, but am really struggling to do more than that. Essentially, I am computing a "blur" matrix M2 of size (n-2)x(m-2) of an original matrix M of size nxm, where s = size(M):
for x = 0:1
    for y = 0:1
        m = zeros(1, 9);
        k = 1;
        for i = 1:(s(1) - 1)
            for j = 1:(s(2) - 1)
                m(1, k) = M(i+x,j+y);
                k = k+1;
            end
        end
        M2(x+1,y+1) = mean(m);
    end
end
This is the closest I've gotten:
for x=0:1
    for y=0:1
        M2(x+1, y+1) = mean(mean(M((x+1):(3+x),(y+1):(3+y))))
    end
end
To get any closer to a one-line solution, it seems like there has to be some kind of "communication" where I assign two variables (x,y) to index over M2 and index over M; I just don't see how it can be done otherwise, but I am assured there is a solution.
Is there a reason why you are not using MATLAB's convolution function to help you do this? You are performing a blur with a 3 x 3 averaging kernel with overlapping neighbourhoods. This is exactly what convolution is doing. You can perform this using conv2:
M2 = conv2(M, ones(3) / 9, 'valid');
The 'valid' flag ensures that you return a size(M) - 2 matrix in both dimensions as you have requested.
In your code, you have hardcoded this for a 4 x 4 matrix. To double-check that we have the right results, let's generate a random 4 x 4 matrix:
rng(123);
M = rand(4, 4);
s = size(M);
If we run this with your code, we get:
>> M2
M2 =
0.5054 0.4707
0.5130 0.5276
Doing this with conv2:
>> M2 = conv2(M, ones(3) / 9, 'valid')
M2 =
0.5054 0.4707
0.5130 0.5276
However, if you want to do this from first principles, the overlapping of the pixel neighbourhoods is very difficult to escape using loops. The two for loop approach you have is good enough and it tackles the problem appropriately. However, I would determine the size from the input instead of hardcoding it. Therefore, write a function that does something like this:
function M2 = blur_fp(M)
s = size(M);
M2 = zeros(s(1) - 2, s(2) - 2);
for ii = 2 : s(1) - 1
    for jj = 2 : s(2) - 1
        p = M(ii - 1 : ii + 1, jj - 1 : jj + 1);
        M2(ii - 1, jj - 1) = mean(p(:));
    end
end
The first line of code defines the function, which we will call blur_fp. The next couple of lines determine the size of the input matrix and initialise a blank matrix to store our output. We then loop through each pixel location that can be visited without the kernel going outside the boundaries of the image, grab the 3 x 3 neighbourhood with that pixel as its centre, unroll it into a single column vector, find the average, and store it in the appropriate output location. For small kernels and relatively large matrices, this should perform OK.
To take this a little further, you can use user Divakar's im2col_sliding function which takes overlapping neighbourhoods and unrolls them into columns. Therefore, each column represents a neighbourhood which you can then blur the input using vector-matrix multiplication. You would then use reshape to reshape the result back into a matrix:
T = im2col_sliding(M, [3 3]);
V = ones(1, 9) / 9;
s = size(M);
M2 = reshape(V * T, s(1) - 2, s(2) - 2);
This unfortunately cannot be done in a single line unless you use built-in functions. I'm not sure what your intention is, but hopefully the gamut of approaches you have seen here has given you some insight into how to do this efficiently. BTW, using loops for small matrices (i.e. 4 x 4) may be better for efficiency. You will start to notice performance changes when you increase the size of the input... then again, I would argue that using loops is competitive as of R2015b, when the JIT was significantly improved.
So I have the following matrices:
A = [1 2 3; 4 5 6];
B = [0.5 2 3];
I'm writing a function in MATLAB that will allow me to multiply a vector and a matrix by element as long as the number of elements in the vector matches the number of columns. In A there are 3 columns:
1 2 3
4 5 6
B also has 3 elements so this should work. I'm trying to produce the following output based on A and B:
0.5 4 9
2 10 18
My code is below. Does anyone know what I'm doing wrong?
function C = lab11(mat, vec)
    C = zeros(2,3);
    [a, b] = size(mat);
    [c, d] = size(vec);
    for i = 1:a
        for k = 1:b
            for j = 1
                C(i,k) = C(i,k) + A(i,j) * B(j,k);
            end
        end
    end
end
MATLAB already has functionality to do this in the bsxfun function. bsxfun will take two matrices and duplicate singleton dimensions until the matrices are the same size, then perform a binary operation on the two matrices. So, for your example, you would simply do the following:
C = bsxfun(@times,mat,vec);
Referencing MrAzzaman, bsxfun is the way to go with this. However, judging from your function name, this looks like it's homework, so let's stick with what you have originally. As such, you only need to write two for loops. You would use the inner for loop to index into both the vector and the columns of the matrix at the same time, while the outermost for loop accesses the rows of the matrix. In addition, you are referencing A and B, which are variables that don't exist in your code. You are also initializing the output matrix C to always be 2 x 3; you want it to be the same size as mat. I also removed your call to size on the vector because you weren't doing anything with the result.
As such:
function C = lab11(mat, vec)
    [a, b] = size(mat);
    C = zeros(a,b);
    for i = 1:a
        for k = 1:b
            C(i,k) = mat(i,k) * vec(k);
        end
    end
end
Take special note of what I did. The outermost for loop accesses the rows of mat, while the innermost loop accesses the columns of mat as well as the elements of vec. Bear in mind that the number of columns of mat needs to be the same as the number of elements in vec. You should probably check for this in your code.
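For example, a minimal check (illustrative wording, placed at the top of the function) could look like this:
if size(mat, 2) ~= numel(vec)
    error('Number of columns of mat must equal the number of elements of vec.');
end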
If you don't like using the bsxfun approach, one alternative is to take the vector vec and make a matrix out of this that is the same size as mat by stacking the vector vec on top of itself for as many times as we have rows in mat. After this, you can do element-by-element multiplication. You can do this stacking by using repmat which repeats a vector or matrices a given number of times in any dimension(s) you want. As such, your function would be simplified to:
function C = lab11(mat, vec)
    rows = size(mat, 1);
    vec_mat = repmat(vec, rows, 1);
    C = mat .* vec_mat;
end
However, I would personally go with the bsxfun route. bsxfun basically does what the repmat paradigm does under the hood. Internally, it ensures that both of your inputs have the same size. If it doesn't, it replicates the smaller array / matrix until it is the same size as the larger array / matrix, then applies an element-by-element operation to the corresponding elements in both variables. bsxfun stands for Binary Singleton EXpansion FUNction, which is a fancy way of saying exactly what I just talked about.
Therefore, your function is further simplified to:
function C = lab11(mat, vec)
    C = bsxfun(@times, mat, vec);
end
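As a small aside: if you happen to be on R2016b or newer, implicit expansion lets the element-wise operators do this replication automatically, so the body of the function can be reduced to a single line:
C = mat .* vec;   % implicit expansion, R2016b and later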
Good luck!
I am writing a program in MATLAB as a part of my project based on DFT.
Let the N x N data matrix be X and the corresponding DFT matrix be Y, then the DFT coefficients can be expressed as
Y(k1,k2) = ∑(n1=0:N-1) ∑(n2=0:N-1) [X(n1,n2) * WN^(n1*k1+n2*k2)],   0 ≤ k1,k2 ≤ N-1   (1)
where WN^k = e^(-j*2*π*k/N).
Since the twiddle factor WN is periodic, (1) can be expressed as
Y(k1,k2) = ∑(n1=0:N-1) ∑(n2=0:N-1) [X(n1,n2) * WN^((n1*k1+n2*k2) mod N)]   (2)
The exponent (n1*k1+n2*k2) mod N = p is satisfied by a set of (n1,n2) for a given (k1,k2). Hence, by grouping such data and applying the property that WN^(p+M) = -WN^p, where M = N/2,
(2) can be expressed as
Y(k1,k2) = ∑(p=0:M-1) [Y(k1,k2,p) * WN^p]   (3)
where
Y(k1,k2,p) = ∑(∀(n1,n2) | z=p) X(n1,n2) - ∑(∀(n1,n2) | z=p+M) X(n1,n2)   (4)
z = (n1*k1+n2*k2) mod N   (5)
I am coding a program to find Y(k1,k2,p), i.e. I need to create slices of 2D matrices (a 3D matrix in which each slice is a 2D matrix) from a given 2D square matrix (the matrix X). The dimensions of X can be up to 512.
Based on the above equations, I have written the code below. I need to vectorise it.
N=size(X,1);
M=N/2;
Y(1:N,1:N,1:M)=0;
for k1 = 1:N
    for k2 = 1:N
        for p= 1:M
            for n1=1:N
                for n2=1:N
                    N1=n1-1; N2=n2-1; P=p-1; K1=k1-1; K2=k2-1;
                    z=mod((N1*K1+N2*K2),N);
                    if (z==P)
                        Y(k1,k2,p)= Y(k1,k2,p)+ X(n1,n2);
                    elseif (z==(P+M))
                        Y(k1,k2,p)= Y(k1,k2,p)- X(n1,n2);
                    end
                end
            end
        end
    end
end
As there are 5 for loops, the execution time is very large for large dimensions of N. Hence please provide me with a solution for eliminating the for loops and vectorising the code. I need to make the code execute as fast as possible. Thanks again.
Here is a first hint to vectorize the most inner loop.
From your code, we can notice that n1, N1, P, K1 and K2 are constant in this loop.
So we can compute z for the whole inner loop at once, as a row vector:
z = mod(N1*K1+K2*(0:N-1), N);
Then your if-statement is equivalent to adding the sum of all elements of X(n1,:) for which z==P, minus the sum of all elements for which z==P+M. Rewriting this is straightforward:
Y(k1,k2,p) = Y(k1,k2,p) + sum(X(n1,z==P)) - sum(X(n1,z==P+M));
So your program can be first written as follows:
N=size(X,1);
M=N/2;
Y(1:N,1:N,1:M)=0;
for k1 = 1:N
    for k2 = 1:N
        for p= 1:M
            for n1=1:N
                N1=n1-1; P=p-1; K1=k1-1; K2=k2-1;
                z=mod(N1*K1+K2*(0:N-1),N);
                Y(k1,k2,p) = Y(k1,k2,p) + sum(X(n1,z==P)) - sum(X(n1,z==P+M));
            end
        end
    end
end
Then you can do the same thing with n1; for that, you need to construct a 2D array for z, such as:
z = mod(K1*repmat((0:N-1).',1,N) + K2*repmat(0:N-1,N,1), N);
Notice that size(z)==size(X). Then the 2D sum for Y becomes:
Y(k1,k2,p) = Y(k1,k2,p) + sum(X(z==P)) - sum(X(z==P+M));
The accumulation into Y(k1,k2,p) is in fact no longer needed, since each element of Y is now written exactly once:
Y(k1,k2,p) = sum(X(z==P)) - sum(X(z==P+M));
And so we discard one more loop:
N=size(X,1);
M=N/2;
Y(1:N,1:N,1:M)=0;
for k1 = 1:N
    for k2 = 1:N
        for p= 1:M
            P=p-1; K1=k1-1; K2=k2-1;
            z = mod(K1*repmat((0:N-1).',1,N) + K2*repmat(0:N-1,N,1), N);
            Y(k1,k2,p) = sum(X(z==P)) - sum(X(z==P+M));
        end
    end
end
Concerning the other loops, I don't think it is worth vectorizing them, as you would have to build a 5D array, which could be huge in memory. My advice is to keep z as a 2D array, as it is the size of X. If that does not fit well in memory, just vectorize the innermost loop.
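If the remaining loops are still too slow, here is one further sketch (my own variant, not part of the answer above): for a fixed (k1,k2), z only takes values 0..N-1, so all values of p can be collected at once with accumarray instead of rescanning X for every p:
N = size(X,1);
M = N/2;
Y = zeros(N, N, M);
n = 0:N-1;
for k1 = 1:N
    for k2 = 1:N
        z = mod((k1-1)*repmat(n.', 1, N) + (k2-1)*repmat(n, N, 1), N);
        s = accumarray(z(:)+1, X(:), [N 1]);   % s(q+1) = sum of X over z == q
        Y(k1,k2,:) = reshape(s(1:M) - s(M+1:N), 1, 1, M);
    end
end
This keeps memory at the size of X while reducing the work for each (k1,k2) pair to a single pass over X.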