I have a probability matrix (glcm) of size 256x256x20. I have reshaped the matrix to
65536x20 so that I can eliminate one loop (along the 3rd dimension).
I want to do the following calculation:
for y = 1:256
for x = 1:256
if (ismember((x + y),(2:2*256)))
p_xplusy((x+y),:) = p_xplusy((x+y),:) + glcm(((y-1)*256+x),:);
end
end
end
So p_xplusy will hold 511 rows of 20 values (x+y runs from 2 to 512), where each row is the sum of one anti-diagonal of the original 256x256x20 matrix, taken slice by slice.
This code block is making my program inefficient and I want to vectorize the loop. Any help would be appreciated.
Your if statement only checks whether x+y lies in 2:2*256, which is always true for x and y in 1:256, so you can drop the test (and the ismember call) entirely:
for y = 1:256
for x = 1:256
p_xplusy((x+y),:) = p_xplusy((x+y),:) + glcm(((y-1)*256+x),:);
end
end
This should provide a noticeable speed-up to your code.
You can reduce the number of loop iterations from N^2 to 2N-1 by summing one whole anti-diagonal per iteration, and thus improve runtime efficiency -
N = 256;
for k1 = 1:N
idx_glcm = k1:N-1:N*(k1-1)+1;
p_xplusy(k1+1,:) = p_xplusy(k1+1,:) + sum(glcm(idx_glcm,:),1);
end
for k1 = 2:N
idx_glcm = k1*N:N-1:N*(N-1)+k1;
p_xplusy(N+k1,:) = p_xplusy(N+k1,:) + sum(glcm(idx_glcm,:),1);
end
Some quick runtime tests seem to confirm our efficiency theory too.
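If you want to remove the loops altogether, one option is to build a sparse selection matrix that maps every row of the reshaped matrix to its x+y value and let a single matrix product do the summing. This is only a sketch: the sizes N and K and the random glcm are stand-ins I made up for a quick check, not the data from the question.

```matlab
% Sketch: sum all anti-diagonals at once with a sparse selection matrix.
N = 8; K = 3;                          % small stand-in sizes (the question uses 256 and 20)
glcm = rand(N*N, K);                   % stand-in for the reshaped N^2-by-K matrix
[x, y] = ndgrid(1:N, 1:N);             % x varies fastest, matching row (y-1)*N + x
S = sparse(x(:) + y(:), (1:N*N).', 1, 2*N, N*N);  % S(s,r) = 1 when row r has x+y == s
p_xplusy = full(S * glcm);             % 2N-by-K; row 1 stays zero because x+y >= 2
```

S depends only on N, so it can be built once and reused for every glcm of the same size.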
The following MATLAB code loops through all elements of a matrix with size 2IJ x 2IJ.
for i=1:(I-2)
for j=1:(J-2)
ij1 = i*J+j+1; % row
ij2 = i*J+j+1 + I*J; % col
D1(ij1,ij1) = 2;
D1(ij1,ij2) = -1;
end
end
Is there any way I can parallelize it using MATLAB's parfor command? You can assume any element not defined is 0, so this matrix ends up being sparse (mostly 0s).
Before using parfor, it is recommended to read the guidelines on deciding when to use parfor. In particular:
Generally, if you want to make code run faster, first try to vectorize it.
Here vectorization can be used effectively to compute the indices of the nonzero elements, which are then passed to the function sparse. To do so, define one of i or j as a column vector and the other as a row vector, so that implicit expansion computes all index combinations.
I = 300;
J = 300;
i = (1:I-2).';
j = 1:J-2;
ij1 = i*J+j+1;
ij2 = i*J+j+1 + I*J;
D1 = sparse(ij1, ij1, 2, 2*I*J, 2*I*J) + sparse(ij1, ij2, -1, 2*I*J, 2*I*J);
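The two sparse calls can also be merged into a single call by stacking the index and value vectors, which avoids adding two large sparse matrices. The sketch below uses small made-up sizes (I = 5, J = 4) so the result can be compared against the original double loop; it builds the index grids with ndgrid instead of implicit expansion.

```matlab
% Sketch: build D1 with a single sparse() call.
I = 5; J = 4;                          % small stand-in sizes for a quick check
[jj, ii] = ndgrid(1:J-2, 1:I-2);       % all (i,j) pairs visited by the loops
ij1 = ii(:)*J + jj(:) + 1;             % diagonal positions
ij2 = ij1 + I*J;                       % matching off-diagonal columns
n = 2*I*J;
vals = [2*ones(size(ij1)); -ones(size(ij1))];
D1 = sparse([ij1; ij1], [ij1; ij2], vals, n, n);
```

Note that sparse adds up values given for the same (row, column) pair. Here every pair occurs only once, so the result matches the assignments in the loop.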
For comparison, however, this is one way of using parfor (not tested):
D1 = sparse (2*I*J, 2*I*J);
parfor i=1:(I-2)
for j=1:(J-2)
ij1 = i*J+j+1;
ij2 = i*J+j+1 + I*J;
D1 = D1 + sparse([ij1;ij1], [ij1;ij2], [2;-1], 2*I*J, 2*I*J) ;
end
end
Here D1 is used as a reduction variable.
I am confused about how to properly set up this equation to find the value of V(i,j). The end result would be a plot of V over time. I understand that loops are needed to make this equation work, but I am lost when it comes to setting them up. Basically, I am trying to take the sum from n=1 to infinity of (1-(-1)^n)/(n^4 *pi^4)*sin((n*pi*c*j)/L)*sin((n*pi*i)/L)
I originally thought that I should use a while loop that increments n by 1 until it reaches 10 or so, just to get an idea of what the output would look like. All of the variables were unknown, so I assigned values to them to see what the plot would look like.
I have written another piece of code where the equation depends only on i and j. However, this n term throws me off. Any advice on setting up the equation would be great. Thank you.
L=10;
x=linspace(0,L,30);
t1= 50;
X=30;
p=1
c=t1/1000;
V=zeros(X,t1);
V(1,:)=0;
V(30,:)=0;
R=((4*p*L^3)/c);
n=1;
t=1:50;
while n < 10
for i=1:31
for j=1:50
V(i,j)=R*sum((1-(-1)^n)/(n^4 *pi^4)*sin((n*pi*c*j)/L)*sin((n*pi*i)/L));
end
end
n=n+1;
end
figure(1)
plot(V(i,j),t)
Various ways of doing so:
1) Computing the sum up to one Nmax in one shot:
Nmax = 30;
Vijn = @(i,j,n) R*((1-(-1)^n)/(n^4 *pi^4)*sin((n*pi*c*j)/L)*sin((n*pi*i)/L));
i = 1:31;
j = 1:50;
n = 1:Nmax;
[I,J,N] = ndgrid(i,j,n);
V = arrayfun(Vijn,I,J,N);
Vc = cumsum(V,3);
% now Vc(:,:,k) is sum_{n=1}^{k} V(i,j,n)
figure(1);clf;imagesc(Vc(:,:,end));
2) Looping indefinitely
n = 1;
V = 0;
i = 1:31;
j = 1:50;
[I,J] = meshgrid(i,j);
while true
V = V + R*((1-(-1)^n)/(n^4 *pi^4)*sin((n*pi*c*J)/L).*sin((n*pi*I)/L));
n = n + 1;
figure(1);clf;
imagesc(V);
title(sprintf('N = %d',n))
drawnow;
pause(0.25);
end
Note that in your example you won't need many terms, since:
Every second term is zero (for even n, the term 1-(-1)^n is zero).
The terms decay with 1/n^4. In norms: n=1 contributes ~2e4, n=3 contributes ~4e2, n=5 contributes 5e1, n=7 contributes ~14, etc. Visually, there is a small difference between n=1 and n=1+n=3 but barely a noticeable one for n=1+n=3+n=5.
Given that so few terms are needed, the first approach is probably the better one. Also, skip the even indices, as you don't need them.
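Following that advice, here is a sketch of the first approach restricted to odd n, using implicit expansion (R2016b or later) instead of arrayfun. The values of L, c and p are copied from the question. The cutoff at n = 29 is my own arbitrary choice.

```matlab
% Sketch: partial sum over odd n only, without arrayfun.
L = 10; c = 0.05; p = 1;               % values from the question (c = t1/1000)
R = (4*p*L^3)/c;
i = (1:31).';                          % column vector: first index of V
j = 1:50;                              % row vector: second index of V
n = reshape(1:2:29, 1, 1, []);         % odd n along the 3rd dimension (cutoff is arbitrary)
coef = 2 ./ (n.^4 * pi^4);             % 1-(-1)^n equals 2 for odd n
V = R * sum(coef .* sin(n*pi*c.*j/L) .* sin(n*pi.*i/L), 3);   % 31-by-50
```

V(i,j) can then be shown with imagesc(V), as in the first approach.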
I have two nested for-loops that are used to format data that I load. The loops have the following construction:
data = magic(20000);
data = data(:,1:3);
for i=0:10
for j=0:10
data_tmp = data((1:100)+100*j+100*10*i,:);
vx(:, i+1,j+1) = data_tmp(:,1);
vy(:, i+1,j+1) = data_tmp(:,2);
vz(:, i+1,j+1) = data_tmp(:,3);
end
end
I do pre-allocate the arrays vx, vy and vz to their desired size. However, is there a way to vectorize the for-loops to increase efficiency? I'm not convinced there is, because of the first line in the inner loop, data((1:100)+100*j+100*10*i,:). Is there a better way to do this?
It turns out that your loop reads some indices twice:
at i=k, j=10 and at i=k+1, j=0, for k<10.
For example, you read 1:100 + 100*10 + 100*10*0 and then 1:100 + 100*0 + 100*10*1, which are identical.
Reshape w/ Repeated index
If this is what you intended, then vectorization needs one more step (index generation).
The following is my suggestion (N=100 and M=10, where N is the length of data_tmp and M is the maximum value of the loop variables):
index = bsxfun(@plus,bsxfun(@plus,(1:N)',reshape(N*(0:M),1,1,M+1)),M*N*(0:M)); %index generation
vx = data(index);
vy = data(index + size(data,1));
vz = data(index + size(data,1)*2);
This is not that elegant, but it will work.
When I tested it on my laptop, it was twice as fast as your original code with pre-allocation. As I increase the size of data, the gap gets smaller and smaller.
Reshape w/o Repeated index
If not (i.e., you want to reshape each column so that the 3rd dimension is filled first and the 2nd dimension last), then the following would work.
Firstly, this is how I interpreted your code:
data = magic(20000);
data = data(:,1:3);
N = 100; M = 10;
for i=0:(M-1)
for j=0:(M-1)
data_tmp = data((1:N)+N*j+N*M*i,:);
vx(:, i+1,j+1) = data_tmp(:,1);
vy(:, i+1,j+1) = data_tmp(:,2);
vz(:, i+1,j+1) = data_tmp(:,3);
end
end
Note that the loops end at (M-1).
The following is my suggestion.
vx = permute(reshape(data(1:N*M*M,1), N, M, M),[1,3,2]);
vy = permute(reshape(data(1:N*M*M,2), N, M, M),[1,3,2]);
vz = permute(reshape(data(1:N*M*M,3), N, M, M),[1,3,2]);
On my laptop, it is 4 times faster than the original code. As I increase the size, the speed-up approaches 2.
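Here is a small check of the non-overlapping interpretation, assuming that block (i,j) covers rows (1:N) + N*j + N*M*i. The tiny sizes and the integer test data are made up so that the result is easy to inspect. As a possible variation, it also handles all three columns with one reshape and one permute.

```matlab
% Sketch: verify reshape + permute against the loop on small made-up data.
N = 3; M = 2;                              % stand-ins for 100 and 10
data = reshape(1:(N*M*M*3), N*M*M, 3);     % distinct integers, one column per component
vx_loop = zeros(N, M, M);
for i = 0:(M-1)
    for j = 0:(M-1)
        vx_loop(:, i+1, j+1) = data((1:N) + N*j + N*M*i, 1);
    end
end
vx = permute(reshape(data(1:N*M*M, 1), N, M, M), [1 3 2]);
% All three columns at once: the 4th dimension indexes the column of data.
v = permute(reshape(data(1:N*M*M, :), N, M, M, 3), [1 3 2 4]);
same = isequal(vx, vx_loop) && isequal(v(:,:,:,1), vx);
```

With this layout, vy and vz are simply v(:,:,:,2) and v(:,:,:,3).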
Just in case you want to stick with the loop, here is a much faster way to do this:
data = randi(100,20000,3);
[vx,vy,vz] = deal(zeros(100,11,11));
[J,I] = ndgrid(1:11,1:11);
c = 1;
for k = 0:100:11000
vx(:,I(c),J(c)) = data((1:100)+k,1);
vy(:,I(c),J(c)) = data((1:100)+k,2);
vz(:,I(c),J(c)) = data((1:100)+k,3);
c = c+1;
end
My guess is that reshape from @Dohyun's answer is what you are looking for (it's about 10x faster than this, and about 10000x faster than your code), but this may be useful the next time you use loops.
And here is another option to do this without reshape, in a similar time to the reshape version:
[vx,vy,vz] = deal(zeros(100,10,11));
vx(:) = data(1:11000,1);
vy(:) = data(1:11000,2);
vz(:) = data(1:11000,3);
vx = permute(vx,[1 3 2]);
vy = permute(vy,[1 3 2]);
vz = permute(vz,[1 3 2]);
The idea is that you define the shape of [vx,vy,vz] while allocating them.
In MATLAB, I have a loop of the form:
a=1;
for (i = 1:N)
a = a * b(i) + c(i);
end
Can this loop be vectorized, or partially unrolled?
For b and c of length 4 each, when the loop is unrolled, you would have -
output = b1b2b3b4 + c1b2b3b4 + c2b3b4 + c3b4 + c4
So, the generic formula would be :
output = b1b2b3...bN + c1b2b3..bN + c2b3..bN + c3b4..bN + ... cN-1bN + cN
The cumulative product of b can be calculated with cumprod, with the elements flipped or "reversed". The rest is elementwise multiplication with the c elements shifted by one place, then adding the remaining scalar terms from the cumulative product and from c, and finally summing everything to get the final scalar output.
So, the coded version would look something like this -
cumb = cumprod(b,'reverse');
a = sum(cumb(2:end).*c(1:end-1)) + cumb(1) + c(end);
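As a quick sanity check of the formula, here is a tiny worked example with made-up values for b and c. It runs both the original loop and the vectorized expression so that the two results can be compared.

```matlab
% Sketch: check the closed form against the loop on a tiny made-up example.
b = [2 3 4 5];
c = [1 1 1 1];
a_loop = 1;
for i = 1:numel(b)
    a_loop = a_loop * b(i) + c(i);     % 3, 10, 41, 206
end
cumb = cumprod(b, 'reverse');          % [120 60 20 5]
a_vec = sum(cumb(2:end).*c(1:end-1)) + cumb(1) + c(end);   % 85 + 120 + 1 = 206
```

Both a_loop and a_vec come out as 206 here.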
Benchmarking
Let's compare the loopy approach from the question against the vectorized one as proposed earlier in this post.
Here are the approaches as functions -
function a = loopy(b,c)
N = numel(b);
a = 1;
for i = 1:N
a = a * b(i) + c(i);
end
return;
function a = vectorized(b,c)
cumb = cumprod(b,'reverse');
a = sum(cumb(2:end).*c(1:end-1)) + cumb(1) + c(end);
return;
Here's the code to benchmark those two approaches -
datasizes = 10.^(1:8);
Nd = numel(datasizes);
time_loopy = zeros(1,Nd);
time_vectorized = zeros(1,Nd);
for k1 = 1:numel(datasizes)
N = datasizes(k1);
b = rand(1,N);
c = rand(1,N);
func1 = @() loopy(b,c);
func2 = @() vectorized(b,c);
time_loopy(k1) = timeit(func1);
time_vectorized(k1) = timeit(func2);
end
figure,
loglog(datasizes,time_loopy,'-rx'), hold on
loglog(datasizes,time_vectorized,'-b+'),
set(gca,'xgrid','on'),set(gca,'ygrid','on'),
xlabel('Datasize (# elements)'), ylabel('Runtime (s)')
legend({'Loop','Vectorized'}),title('Runtime Plot')
figure,
semilogx(datasizes,time_loopy./time_vectorized,'-k.')
set(gca,'xgrid','on'),set(gca,'ygrid','on'),
xlabel('Datasize (# elements)'), ylabel('Speedup (x)')
legend({'Speedup with vectorized method over loopy one'}),title('Speedup Plot')
Here are the runtime and speedup plots -
A few observations
Stage #1: From the smallest datasize up to 1000 elements, the loopy approach has the upper hand, as the vectorized approach isn't getting enough elements to benefit from its everything-in-one-go approach.
Stage #2: From 1000 elements up to 10,000,000 elements, the vectorized method seems to be the better one, as it's getting enough elements to work with.
Stage #3: For the bigger datasizes, it seems the memory bandwidth needed to store and work with millions of elements in the vectorized approach, as opposed to just a scalar value in the loopy approach, might be holding back the vectorized approach.
Conclusions: If performance is the most important criterion, one can go with the vectorized method or stay with the original loopy code, depending on the datasize.
This question is from chegg.com.
Given a vector a of N elements a_{n},n =1,2,...,N, the simple moving average of m sequential elements of this vector is defined as
mu(j) = mu(j-1) + (a(m+j-1)-a(j-1))/m for j = 2,3,...,(N-m+1)
where
mu(1) = sum(a(k))/m for k = 1,2,...,m
Write a script that computes these moving averages when a is given by a=5*(1+rand(N,1)), where rand generates uniformly distributed random numbers. Assume that N=100 and m=6. Plot the results using plot(j,mu(j)) for j=1,2,...,N-m+1.
My current code is below, but I'm not sure where to go from here or if it's even right.
close all
clear all
clc
N = 100;
m = 6;
a = 5*(1+rand(N,1));
mu = zeros(N-m+1,1);
mu(1) = sum(a(1:m));
for j=2
mu(j) = mu(j-1) + (a-a)/m
end
plot(1:N-m+1,mu)
I'll step you through the modifications.
Firstly, mu(1) was not fully defined. Your line omits the division by m that the definition requires, so this is what it should be:
mu(1) = sum(a(1:m))/m;
Then the for loop has to go from j=2 to j=N-m+1:
for j=2:N-m+1
and at each step, mu(j) is given by the same formula as in the question, with the indices written out:
mu(j) = mu(j-1) + (a(m+j-1)-a(j-1))/m;
And that's all you need to change!
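Putting those changes together, the computation could look like the sketch below. The comparison against conv is my own addition as a cross-check and is not part of the assignment. The plot call is left out here; plot(1:N-m+1, mu) from your script works unchanged.

```matlab
% Sketch: recursive moving average, cross-checked against a direct computation.
N = 100;
m = 6;
a = 5*(1 + rand(N,1));
mu = zeros(N-m+1, 1);
mu(1) = sum(a(1:m))/m;                         % average of the first m elements
for j = 2:N-m+1
    mu(j) = mu(j-1) + (a(m+j-1) - a(j-1))/m;   % add the new element, drop the oldest
end
mu_check = conv(a, ones(m,1)/m, 'valid');      % direct m-point averages for comparison
max_diff = max(abs(mu - mu_check));            % should be at rounding-error level
```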