I am implementing a Sum of Squared Differences (SSD) based disparity map function in MATLAB for computer vision. Currently the code has a nested for loop and runs very slowly. Any suggestions on vectorizing it to make it more efficient? Thanks
%im1 and im2 are images; win1 and win2 are half-window sizes (the window is (2*win1+1) x (2*win2+1))
for i=win1+1:1:bottom-win1
parfor j=win2+1:1:right-win2
%j=[win2+1:bottom-win2];
template=im1(i-win1:i+win1,j-win2:j+win2);
arg1=conv2(im2.^2,ones(size(template))/2,'same');
arg2=conv2(im2,rot90(template,2),'same');
arg=arg1-arg2;
[xj]=find(arg==min(arg(:)));
disparityMap(i,j)=1-xj(1);
end
end
Three suggestions to try to speed things up:
Move the parfor to the outer loop to reduce the overhead of the parallel construct;
Compute im2.^2 once before the loop and store it in a temporary variable: it does not depend on the loop variables, so there is no need to compute it again and again. Better still, move the whole computation of arg1 out of the loops, as it depends only on the size of template and not on its values, and if I read the code correctly, that size is constant;
Replace the [xj]=find(arg==min(arg(:))); construct with [~, xj] = min(arg(:)). min already returns the linear index of the first minimum (the same value as xj(1)), so there is no need to call find and rescan the matrix. If you need the row and column instead, use [r, c] = ind2sub(size(arg), xj); with a single output, ind2sub just returns the linear index unchanged.
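A toy check of the min-based replacement, using a made-up cost map arg that has two equal minima: min returns the linear index of the first one, which matches xj(1) from find, and the two-output form of ind2sub turns it into a row and a column.

```matlab
% Made-up cost map standing in for 'arg'; it has two equal minima (value 1).
arg = [5 3 7; 3 9 1; 4 1 8];

% Original approach: min scans the matrix, then find scans it again.
xjFind = find(arg == min(arg(:)));

% Single scan: the second output of min is the linear index of the first minimum.
[~, ind] = min(arg(:));

% Two-output ind2sub converts that linear index to a row and a column.
[row, col] = ind2sub(size(arg), ind);

fprintf('find: %d, min: %d, (row, col) = (%d, %d)\n', xjFind(1), ind, row, col);
```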
Putting these together (untested, but it should give you a start):
% arg1 depends only on im2 and the (constant) window size, so compute it once
arg1=conv2(im2.^2,ones([2*win1+1, 2*win2+1])/2,'same');
disparityMap=zeros(bottom,right);
parfor i=win1+1:1:bottom-win1
for j=win2+1:1:right-win2
template=im1(i-win1:i+win1,j-win2:j+win2);
arg2=conv2(im2,rot90(template,2),'same');
arg=arg1-arg2;
% min returns the linear index of the first minimum, same as xj(1) from find
[~, xj] = min(arg(:));
disparityMap(i,j)=1-xj;
end
end
Also make sure the number of workers is chosen appropriately, and try compiling the code to MEX (for example with MATLAB Coder) to see whether it improves.
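For actual vectorization, a common alternative (swapped in here, and not what either post does) assumes rectified stereo images, so matches lie on the same row, a bounded disparity range 0..maxDisp, and disparity defined as a column offset. The SSD for one candidate d can then be computed for every pixel at once by box-filtering the squared difference between the left image and the right image shifted by d, which reduces the number of conv2 calls from one per pixel to maxDisp+1. The sketch uses synthetic data with a known shift; left, right, trueD, maxDisp and penalty are illustrative names.

```matlab
% Synthetic rectified pair (hypothetical): right(r, c) = left(r, c + trueD),
% so the correct disparity is trueD wherever the window has valid data.
rng(0);
left  = rand(40, 60);
trueD = 3;
right = [left(:, trueD+1:end), rand(40, trueD)];

win1 = 2; win2 = 2;                 % half-window sizes (rows, columns)
maxDisp = 8;                        % assumed search range 0..maxDisp
box = ones(2*win1+1, 2*win2+1);     % box kernel: sums over the window
penalty = 1e6;                      % cost for columns with no partner pixel
[nr, nc] = size(left);

cost = zeros(nr, nc, maxDisp+1);
for d = 0:maxDisp
    % Squared difference between left(r, c) and right(r, c - d), all pixels at once.
    diff2 = penalty * ones(nr, nc);
    diff2(:, d+1:end) = (left(:, d+1:end) - right(:, 1:end-d)).^2;
    % Box-filtering the squared differences gives the SSD of every window.
    cost(:, :, d+1) = conv2(diff2, box, 'same');
end

% Best candidate per pixel: index along the third dimension, minus one.
[~, idx] = min(cost, [], 3);
disparityMap = idx - 1;
```

Columns closer to the left border than maxDisp+win2 cannot see every candidate, which is why their missing partners get a large penalty instead of a spurious zero cost.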
I am trying to use a for loop inside a parfor loop in MATLAB.
The for loop is equivalent to the ballode example here.
Inside the for loop, a function ballBouncing is called, which is a system of 6 differential equations.
What I am trying to do is run the ODE system with 500 different sets of parameter values, but for each parameter set a sudden impulse is added, which is handled by the code in the for loop.
However, I don't understand how to implement this using a parfor and a for loop as below.
I can run this code using two for loops, but when the outer loop is made a parfor it gives the errors:
the PARFOR loop cannot run due to the way variable results is used,
the PARFOR loop cannot run due to the way variable y0 is used and
Valid indices for results are restricted in PARFOR loops
results=NaN(500,100);
x=rand(500,10);
parfor j=1:500
bouncingTimes=[10,50];%at time 10 a sudden impulse is added
refine=2;
tout=0;
yout=y0;%initial conditions of ODE system
paras=x(j,:);%parameter values for the ODE
for i=1:2
tfinal=bouncingTimes(i);
[t,y]=ode45(@(t,y)ballBouncing(t,y,paras),tstart:1:tfinal,y0,options);
nt=length(t);
tout=[tout;t(2:nt)];
yout=[yout;y(2:nt,:)];
y0(1:5)=y(nt,1:5);%updating initial conditions with the impulse
y0(6)=y(nt,6)+paras(10); % paras=x(j,:) is a row vector, so index it with 10 only
options = odeset(options,'InitialStep',t(nt)-t(nt-refine),...
'MaxStep',t(nt)-t(1));
tstart =t(nt);
end
numRows=length(yout(:,1));
results(1:numRows,j)=yout(:,1);
end
results;
Can someone help me implement this using a parfor outer loop?
Fixing the assignment into results is relatively straightforward: you need to ensure you always assign a whole column. Here's how I would do that:
% We will always need the full size of results in dimension 1
numRows = size(results, 1);
parfor j = ...
yout = ...; % variable size
yout(end+1:numRows, :) = NaN; % Pad with NaN if necessary (end+1 keeps the last real row)
results(:, j) = yout(1:numRows, 1); % Shrink 'yout' if necessary
end
However, y0 is harder to deal with: the iterations of your parfor loop are not order-independent, because you pass information (the updated y0, and likewise tstart and options) from one iteration to the next. parfor can only handle loops whose iterations are order-independent.
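A minimal sketch of the whole loop, assuming each parameter set should start from the same initial conditions (if chaining y0 from one j to the next was intended, the loop cannot be parallelized at all). y0, tstart and options are re-initialized at the top of every iteration, so parfor treats them as temporaries, and results is filled one whole column at a time, with one column per parameter set. ballBouncing, y0 and the solver options are not shown in the question, so rhs (six decoupled decaying ODEs), y0Init, optsInit and maxRows are hypothetical stand-ins.

```matlab
% Hypothetical stand-ins for pieces the question does not show.
nSets    = 500;                      % number of parameter sets
maxRows  = 100;                      % upper bound on the rows of yout
x        = rand(nSets, 10);          % one parameter set per row
y0Init   = ones(6, 1);               % hypothetical initial conditions
optsInit = odeset('RelTol', 1e-6);   % hypothetical solver options
rhs = @(t, y, p) -p(1:6).' .* y;     % stand-in for ballBouncing (6 ODEs)

results = NaN(maxRows, nSets);       % one column per parameter set
parfor j = 1:nSets
    paras   = x(j, :);               % sliced input
    % Re-initialize everything the inner loop updates: these are temporaries,
    % so no state leaks from one parfor iteration to another.
    y0      = y0Init;
    options = optsInit;
    tstart  = 0;
    tout    = tstart;
    yout    = y0.';
    bouncingTimes = [10, 50];        % an impulse is added at time 10
    refine  = 2;
    for i = 1:numel(bouncingTimes)
        tfinal = bouncingTimes(i);
        [t, y] = ode45(@(tt, yy) rhs(tt, yy, paras), tstart:1:tfinal, y0, options);
        nt   = length(t);
        tout = [tout; t(2:nt)];
        yout = [yout; y(2:nt, :)];
        y0 = y(nt, :).';             % restart from the last state...
        y0(6) = y0(6) + paras(10);   % ...plus the impulse on component 6
        options = odeset(options, 'InitialStep', t(nt) - t(nt-refine), ...
                         'MaxStep', t(nt) - t(1));
        tstart = t(nt);
    end
    col = NaN(maxRows, 1);           % pad to a fixed length
    col(1:size(yout, 1)) = yout(:, 1);
    results(:, j) = col;             % always assign a whole column
end
```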
We're developing an application that processes medical images of the retina.
We very often iterate directly over pixel indices. Even though the image size is fixed at 1024x768 pixels, this can be CPU-intensive, for example when assigning values to the binarized pixels we need.
lowlayers2 = zeros(img_y_size, img_x_size);
for i=1:numel(lowlayers)
y = rem(lowlayers(i),img_y_size);
x = fix(lowlayers(i)/img_y_size)+1;
lowlayers2(y,x) = 1;
end;
When I try to use parfor in the simple loop above, MATLAB reports that all variables in the loop must be sliced. I guess this is so that the iterations can be divided among the workers more easily.
How can I modify the loop or the variables to be able to use parfor? Can every variable be made a sliced variable (i.e. a multidimensional matrix with 2 or 3 dimensions)?
A sliced variable is a variable that is referenced outside the parfor loop and each of whose elements is accessed by only a single worker (one of parfor's parallel workers).
Sometimes MATLAB doesn't recognize a variable in a parfor loop as a sliced variable,
so you can store intermediate results in a separate variable and apply them after the parfor loop:
lowlayers2 = zeros(img_y_size, img_x_size);
t = zeros(1, numel(lowlayers)); % preallocate the vector of linear indices
parfor i=1:numel(lowlayers)
y = rem(lowlayers(i),img_y_size);
x = fix(lowlayers(i)/img_y_size)+1;
t(i)= sub2ind(size(lowlayers2),y,x);
end
lowlayers2(t)=1;
NOTE 1: It was better to vectorize code in older versions, because loops used to be much slower than they are now in R2017, referring to (this).
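Since lowlayers already holds linear pixel indices, the loop can be replaced by a single indexed assignment, with no parfor at all. The sketch uses hypothetical sizes and indices and compares the result with an explicit ind2sub/sub2ind conversion. The rem/fix arithmetic in the question gives a row of 0 (an invalid index) whenever an index is an exact multiple of img_y_size, such as 768 here; ind2sub handles that case.

```matlab
% Hypothetical image size and a few linear pixel indices.
img_y_size = 768; img_x_size = 1024;
lowlayers = [1 500 768 769 img_y_size*img_x_size];

% Vectorized: use the linear indices directly, no loop needed.
lowlayers2 = zeros(img_y_size, img_x_size);
lowlayers2(lowlayers) = 1;

% Explicit conversion to (row, column) and back, for comparison only.
[y, x] = ind2sub([img_y_size, img_x_size], lowlayers);
check = zeros(img_y_size, img_x_size);
check(sub2ind(size(check), y, x)) = 1;
```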
I have a problem with the MathWorks Parallel Computing Toolbox in MATLAB. See my code below:
for k=1:length(Xab)
n1=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n1)=JXab{k};
MY_j(1,n1)=JYab{k};
MZ_j(1,n1)=Z;
end
for k=length(Xab)+1:length(Xab)+length(Xbc)
n2=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n2)=JXbc{k-length(Xab)};
MY_j(1,n2)=JYbc{k-length(Yab)};
MZ_j(1,n2)=Z;
end
for k=length(Xab)+length(Xbc)+1:length(Xab)+length(Xbc)+length(Xcd)
n3=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n3)=JXcd{k-length(Xab)-length(Xbc)};
MY_j(1,n3)=JYcd{k-length(Yab)-length(Ybc)};
MZ_j(1,n3)=Z;
end
for k=length(Xab)+length(Xbc)+length(Xcd)+1:length(Xab)+length(Xbc)+length(Xcd)+length(Xda)
n4=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n4)=JXda{k-length(Xab)-length(Xbc)-length(Xcd)};
MY_j(1,n4)=JYda{k-length(Yab)-length(Ybc)-length(Ycd)};
MZ_j(1,n4)=Z;
end
If I change the for loops to parfor loops, MATLAB warns me that MX_j is not an efficient variable. I have no idea how to solve this. How can I make these for loops compute in parallel?
To me, it looks like you can combine them into one loop. Create combined cell arrays:
JX = cat(2,JXab, JXbc, JXcd, JXda);
JY = cat(2,JYab, JYbc, JYcd, JYda);
Check the dimension here: if your JXab, JXbc, ... cell arrays are column arrays, use cat(1,...) instead.
After doing that, one single loop should do it:
n = length(Xab)+length(Xbc)+length(Xcd)+length(Xda);
for k=1:n
k2 = length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,k2)=JX{k};
MY_j(1,k2)=JY{k};
MZ_j(1,k2)=Z;
end
Before parallelizing anything, check whether this is still valid; I haven't tested it. If everything works, you can switch to parfor.
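When using parfor, the outputs have to be indexed in a way MATLAB can slice: each iteration should write to row k of a matrix, not to a computed range such as k2 of a vector (k2 is not one of the allowed index forms). The following code could work (untested due to lack of test data):
n = length(Xab)+length(Xbc)+length(Xcd)+length(Xda);
nZ = length(Z);
MX = zeros(n, nZ); % row k holds JX{k}, so MX(k,:) is a valid sliced index
MY = MX;
parfor k=1:n
MX(k,:)=JX{k};
MY(k,:)=JY{k};
end
% Flatten row by row into the original 1 x (n*nZ) layout
MX_j = reshape(MX.', 1, []);
MY_j = reshape(MY.', 1, []);
MZ_j = repmat(Z(:).', 1, n);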
Note: as far as I can see, the parfor loop will be much slower here. You simply assign some values, with no calculation at all, so starting the worker pool and transferring the data to the workers will take almost all of the execution time.
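Because nothing is computed inside the loop, it can be dropped entirely: the comma-separated list JX{:} concatenates all the cells in one step. This sketch assumes every cell holds a row vector of length(Z) (for column vectors, vertcat(JX{:}).' gives the same row). The data are made up, and the single for loop from the answer is kept only as a reference to compare against.

```matlab
% Hypothetical data: each cell holds a row vector with length(Z) elements.
Z = [0 0.5 1];
JXab = {[1 2 3], [4 5 6]};   JYab = {[7 8 9], [1 1 1]};
JXbc = {[2 2 2]};            JYbc = {[3 3 3]};
JXcd = {[5 5 5]};            JYcd = {[6 6 6]};
JXda = {[9 8 7]};            JYda = {[0 0 0]};
JX = cat(2, JXab, JXbc, JXcd, JXda);
JY = cat(2, JYab, JYbc, JYcd, JYda);
n = numel(JX);

% Loop-free: concatenate the cell contents horizontally.
MX_j = [JX{:}];
MY_j = [JY{:}];
MZ_j = repmat(Z, 1, n);

% Reference result from the single for loop in the answer.
ref = zeros(1, n*length(Z));
for k = 1:n
    k2 = length(Z)*(k-1)+1:length(Z)*k;
    ref(1, k2) = JX{k};
end
```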
I have parallelized a for loop in MATLAB, but I get a warning that some of my arrays are broadcast, which leads to unnecessary communication overhead. I'm new to MATLAB and don't know how to solve this issue. Could someone help me with this?
A and Y are broadcast in this code:
parfor k=1:length(gamma)
Lambda=gamma(k);
tmp=zeros(nfolds,Num_Tasks);
for p=1:length(omega)
OmegA= omega(p);
for Fold=1:size(Fold_indices,2)
% Create train and test fold
A_Train=A(logical(Fold_indices(:,Fold)),1:end); % A is broadcast
%size(A_Train)
Y_Train=Y(logical(Fold_indices(:,Fold)),1:Num_Tasks); % Y is broadcast
A_Test=A(~logical(Fold_indices(:,Fold)),1:end);
Y_Test=Y(~logical(Fold_indices(:,Fold)),1:Num_Tasks);
coff=Estimate_x(Y_Train,A_Train,Lambda,OmegA,Binding_matrix) ;
% Do the prediction on the k-th fold and compute the error
% coff
%sum((A_Test*coff-Y_Test).^2) ./ size(A_Test,1)
tmp(Fold,1:end)=sum((A_Test*coff-Y_Test).^2) ./ size(A_Test,1);
tmp;
%Coefficents{:,Fold}=coff;
coff
end
In_Fold_Error{1,p}=tmp;
%In_Fold_Error{2,k}= Lambda;
Coefficents{:,p}=coff;
end
CVErr_twoparam{1,k}=In_Fold_Error;
Coefficents_twoparam=Coefficents;
end
Unless you can compute A_Train and Y_Train directly in the parfor loop, instead of subscripting A and Y to get their values, there's nothing you can do: every iteration needs the whole of A and Y, so these matrices will be broadcast.
If you have enough memory, it may be a better idea to compute A_Train and Y_Train as cell arrays of matrices in a normal for loop, and then use these precomputed values inside the parfor. That way you don't send the entire matrices A and Y to the workers.
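A sketch of that precomputation, with hypothetical data: the sizes, the fold layout of Fold_indices and the ridge-regression stand-in for Estimate_x are all assumptions. Note that the cell arrays are indexed by the fold number, not by the parfor variable k, so MATLAB still broadcasts them to every worker; what moves out of the parfor is the logical subscripting.

```matlab
% Hypothetical data and sizes, for illustration only.
rng(1);
nSamples = 100; nFeatures = 5; Num_Tasks = 3; nfolds = 4;
A = randn(nSamples, nFeatures);
Y = randn(nSamples, Num_Tasks);
gamma = [0.1 1 10];

% Assumed layout: Fold_indices(:, f) is true for the training rows of fold f.
foldId = mod(0:nSamples-1, nfolds).' + 1;
Fold_indices = false(nSamples, nfolds);
for f = 1:nfolds
    Fold_indices(:, f) = foldId ~= f;
end

% Precompute every split once, in an ordinary for loop.
A_Train = cell(1, nfolds); Y_Train = cell(1, nfolds);
A_Test  = cell(1, nfolds); Y_Test  = cell(1, nfolds);
for f = 1:nfolds
    trainRows  = Fold_indices(:, f);
    A_Train{f} = A(trainRows, :);   Y_Train{f} = Y(trainRows, :);
    A_Test{f}  = A(~trainRows, :);  Y_Test{f}  = Y(~trainRows, :);
end

CVErr = zeros(numel(gamma), Num_Tasks);
parfor k = 1:numel(gamma)
    Lambda = gamma(k);
    tmp = zeros(nfolds, Num_Tasks);
    for fold = 1:nfolds
        % Hypothetical stand-in for Estimate_x: ridge regression.
        G = A_Train{fold}.' * A_Train{fold} + Lambda * eye(nFeatures);
        coff = G \ (A_Train{fold}.' * Y_Train{fold});
        tmp(fold, :) = sum((A_Test{fold} * coff - Y_Test{fold}).^2) ./ size(A_Test{fold}, 1);
    end
    CVErr(k, :) = mean(tmp, 1);      % sliced output: row k only
end
```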
I want to parallelize my for loop in MATLAB. I use parfor for that, but I get an error because of the way I use a variable inside the loop. Could someone help me fix it? I'm new to MATLAB.
Here is the problematic part of my attempt:
CV_err=zeros(length(gamma), (Num_Tasks + 2));
parfor k=1:length(gamma)
%block of code
%
CV_err(k,1:Num_Tasks)= sum(In_Fold_Error)./size(In_Fold_Error,1);
CV_err(k,Lambda_location)= Lambda;
CV_err(k,(Num_Tasks +2))= sum(CV_err(k,1:Num_Tasks))/Num_Tasks;
end
Error: parfor loop cannot run due to the way CV_err is used.
CV_err is indexed in different ways, potentially causing dependencies
It seems that valid indices are restricted in parfor.
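Although your variable is effectively sliced (you only access the k-th row in the k-th iteration), the code analyzer does not recognize it: the second index differs between occurrences (1:Num_Tasks, Lambda_location, Num_Tasks+2), and a range such as 1:Num_Tasks is not an allowed index form. Give MATLAB a little help: first put all the data into a temporary vector, then write it to the sliced variable all at once.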
CV_err=zeros(length(gamma), (Num_Tasks + 2));
parfor k=1:length(gamma)
%block of code
%
temp=zeros(1,(Num_Tasks + 2));
temp(1,1:Num_Tasks)= sum(In_Fold_Error)./size(In_Fold_Error,1);
temp(1,Lambda_location)= Lambda;
temp(1,(Num_Tasks +2))= sum(temp(1,1:Num_Tasks))/Num_Tasks;
CV_err(k,:)=temp;
end
The limitation is explained in the documentation:
Form of Indexing. Within the list of indices for a sliced variable, one of these indices is of the form i, i+k, i-k, k+i, or k-i, where i is the loop variable and k is a constant or a simple (nonindexed) broadcast variable; and every other index is a scalar constant, a simple broadcast variable, a nested for-loop index, colon, or end.
Source
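A small runnable illustration of the rule, with made-up sizes: out is indexed by the loop variable plus a nested for-loop index, and outR by the loop variable plus a constant followed by colon, both allowed forms. The commented-out line shows a form the rule rejects.

```matlab
% Made-up sizes, for illustration only.
n = 4; m = 3;
out  = zeros(n, m);       % indexed as out(k, j)
outR = zeros(n + 1, m);   % indexed as outR(k+1, :)
parfor k = 1:n
    for j = 1:m
        out(k, j) = 10*k + j;        % loop variable + nested for-loop index
    end
    outR(k+1, :) = k * ones(1, m);   % loop variable plus a constant, then colon
    % outR(k+1, 1:m) = ...           % rejected: 1:m is a range, not colon/end
end
```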
To fix the preallocation problem, don't preallocate. By preallocating, you're telling MATLAB how it should split the work among workers; parfor doesn't like that.
The answer is: don't make iterations modify shared variables; write each result separately, growing a cell array instead of a matrix, i.e.:
clear CV_err;
parfor k=1:length(gamma)
%// here your other code
this_CV_err = zeros(Num_Tasks+2,1);
this_CV_err(1:Num_Tasks) = sum(In_Fold_Error)./size(In_Fold_Error,1);
this_CV_err(Lambda_location) = Lambda;
this_CV_err(Num_Tasks+2) = mean(this_CV_err(1:Num_Tasks));
CV_err{k} = this_CV_err;
end;
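A self-contained version of the cell-array approach, with hypothetical values standing in for the elided code (In_Fold_Error is filled with Lambda so the expected result is easy to check). After the loop, each cell holds a (Num_Tasks+2)-by-1 column, and [CV_err{:}].' stacks them into the same length(gamma)-by-(Num_Tasks+2) matrix that the preallocated version produces.

```matlab
% Hypothetical values standing in for "here your other code".
gamma = [0.1 1 10];
Num_Tasks = 3;
Lambda_location = Num_Tasks + 1;
nfolds = 4;

clear CV_err;                          % not preallocated, as in the answer
parfor k = 1:numel(gamma)
    Lambda = gamma(k);
    In_Fold_Error = Lambda * ones(nfolds, Num_Tasks);   % made-up fold errors
    this_CV_err = zeros(Num_Tasks+2, 1);
    this_CV_err(1:Num_Tasks) = sum(In_Fold_Error) ./ size(In_Fold_Error, 1);
    this_CV_err(Lambda_location) = Lambda;
    this_CV_err(Num_Tasks+2) = mean(this_CV_err(1:Num_Tasks));
    CV_err{k} = this_CV_err;           % each iteration fills its own cell
end

% Each cell is a (Num_Tasks+2)-by-1 column; stack them as rows of a matrix.
CV_err_matrix = [CV_err{:}].';
```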