Slicing Variable in Parfor loop - matlab

Hi, I want to run the code below using PARFOR. When I try, MATLAB says:
Valid indices for 'A_x' and 'A_y' are restricted in PARFOR loops.
Explanation: For MATLAB to execute parfor loops efficiently, the amount of data sent to the MATLAB workers must be minimal. One of the ways MATLAB achieves this is by restricting the way variables can be indexed in parfor iterations. The indicated variable is indexed in a way that is incompatible with parfor.
Suggested Action: Fix the indexing. For a description of the indexing restrictions, see "Sliced Variables" in the Parallel Computing Toolbox documentation.
N = eveninteger;
H = zeros(N);
V = zeros(N);
A_x = zeros(N);
A_y = zeros(N);
parfor i = 1:N
    for j = 1:N
        if H(i,j) == -2
            t = 0.3;
            As_x = t*(j-i)/a;
            As_y = t*(j-i)/a;
        elseif H(i,j) == -3
            t = 0.8;
            As_x = t*(j-i)/(a*sqrt3);
            As_y = t*(j-i)/(a*sqrt3);
        elseif i == j
            As_x = i;
            As_y = i;
        else
            t = 0;
            As_x = 0;
            As_y = 0;
        end
        for p = 1:N/2
            for q = N/2+1:N
                A_x(p,q) = A_x(p,q) + As_x*(V(i,p)*V(j,q));
                A_y(p,q) = A_y(p,q) + As_y*(V(i,p)*V(j,q));
            end
        end
    end
end
I could not find a solution. Could you offer me one? Thanks in advance.
Erico

It looks like you're trying to perform a "reduction" on A_x and A_y using +. You might be able to work around this by doing something like the following:
parfor i = 1:N
    A_x_tmp = zeros(N);
    A_y_tmp = zeros(N);
    for p = 1:N/2
        for q = N/2+1:N
            A_x_tmp(p,q) = A_x_tmp(p,q) + ...
            A_y_tmp(p,q) = A_y_tmp(p,q) + ...
        end
    end
    A_x = A_x + A_x_tmp;
    A_y = A_y + A_y_tmp;
end
In that way, PARFOR will understand the reduction operations on A_x and A_y.
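For reference, here is how that reduction pattern might look applied to your full loop. This is only a sketch (untested), assuming a and sqrt3 are scalars you define earlier and that H and V hold your real data; PARFOR should then classify A_x_tmp and A_y_tmp as temporaries and A_x and A_y as reduction variables:
parfor i = 1:N
    A_x_tmp = zeros(N);                  % per-iteration accumulators
    A_y_tmp = zeros(N);
    for j = 1:N
        if H(i,j) == -2
            t = 0.3;
            As_x = t*(j-i)/a;
            As_y = t*(j-i)/a;
        elseif H(i,j) == -3
            t = 0.8;
            As_x = t*(j-i)/(a*sqrt3);
            As_y = t*(j-i)/(a*sqrt3);
        elseif i == j
            As_x = i;
            As_y = i;
        else
            As_x = 0;
            As_y = 0;
        end
        for p = 1:N/2
            for q = N/2+1:N
                A_x_tmp(p,q) = A_x_tmp(p,q) + As_x*(V(i,p)*V(j,q));
                A_y_tmp(p,q) = A_y_tmp(p,q) + As_y*(V(i,p)*V(j,q));
            end
        end
    end
    A_x = A_x + A_x_tmp;   % recognized as a reduction on A_x
    A_y = A_y + A_y_tmp;   % recognized as a reduction on A_y
end
Note that each iteration accumulates into its own full-size temporary matrix, so this trades some memory for the ability to parallelize.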

Related

How to find argmin/best fit/optimize for an overdetermined quadratic system for multiple variables in Matlab

I have 100 equations with 5 variables. Is there a function in Matlab which I can use to find the optimal solution of these equations?
My problem is to find the argmin of ||(a-ic)^2 + (b-jd)^2 + e - h(i,j)|| over all i, j from -10 to 10, i.e.:
%% Note: not MATLAB code, just showing the math.
for i = -10:10
    for j = -10:10
        (a - i*c)^2 + (b - j*d)^2 + e = h(i,j)

known: h(i,j) is a 10*10 matrix, and i, j are indices
expected: the optimal values of a, b, c, d, e
You can try using lsqnonlin as follows.
%% Define a helper function in your .m file. Note that h must be visible
%% inside fun, e.g. as a variable of an enclosing function or a parameter.
function f = fun(x)
    a = x(1); b = x(2); c = x(3); d = x(4); e = x(5); % Using variable names from your question. In other situations, be careful when overwriting e.
    f = zeros(21*21, 1); % one residual per (i,j) pair; you could make this size a variable for good practice.
    for i = -10:10
        for j = -10:10
            f(21*(i+10) + (j+10+1)) = (a-i*c)^2 + (b-j*d)^2 + e - h(i,j); % stride of 21, since j takes 21 values
        end
    end
end
(Aside, why is your h(i,j) taking negative indices??)
In your main function you can simply write
function out = myproblem(x0)
    out = lsqnonlin(@fun, x0);
end
At the command prompt, you can call it with a specific initial guess, such as
myproblem([0,0,0,0,0])
I chose a helper function over an anonymous function because, in my experience, helper functions get sped up by the JIT while anonymous functions do not. I also opted to fill the vector inside the loops, as opposed to calling reshape afterwards, because I expect reshape to cost significant extra time. Remember that O(1) in fun is not O(1) in lsqnonlin.
(As always, a solution to a nonlinear problem is not guaranteed.)
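As an equally untested alternative, you could build the residual vector without loops and pass h in explicitly, which avoids the question of how fun sees h. This assumes h is a 21-by-21 matrix whose entry h(i+11, j+11) corresponds to indices i, j = -10..10 (the function name funVectorized is my own):
function f = funVectorized(x, h)
    a = x(1); b = x(2); c = x(3); d = x(4); e = x(5);
    [I, J] = ndgrid(-10:10, -10:10);            % all (i,j) pairs at once
    R = (a - I*c).^2 + (b - J*d).^2 + e - h;    % 21-by-21 residual matrix
    f = R(:);                                   % lsqnonlin expects a vector
end
You would then call, for example, lsqnonlin(@(x) funVectorized(x, h), [0,0,0,0,0]).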

performance difference between subscript indexing and linear indexing

I have a 2D matrix in MATLAB and I use two different ways to access its elements: one based on subscript indexing and the other based on linear indexing. I test both methods with the following code:
N = 512; it = 400; im = zeros(N);

%// linear indexing
[ind_x, ind_y] = ndgrid(1:2:N, 1:2:N);
index = sub2ind(size(im), ind_x, ind_y);
tic
for i = 1:it
    im(index) = im(index) + 1;
end
toc %// costs 0.45 seconds on my machine (MATLAB 2015b, ThinkPad T410)

%// subscript indexing
x = 1:2:N;
y = 1:2:N;
tic
for i = 1:it
    im(x,y) = im(x,y) + 1;
end
toc %// costs 0.12 seconds on my machine (MATLAB 2015b, ThinkPad T410)

%// someone pointed out that double vs. uint32 might be an issue, so we convert both to uint32
%// uint32 for linear indexing
index = uint32(index);
tic
for i = 1:it
    im(index) = im(index) + 1;
end
toc %// costs 0.25 seconds on my machine (MATLAB 2015b, ThinkPad T410)

%// uint32 for subscript indexing
x = uint32(1:2:N);
y = uint32(1:2:N);
tic
for i = 1:it
    im(x,y) = im(x,y) + 1;
end
toc %// costs 0.11 seconds on my machine (MATLAB 2015b, ThinkPad T410)

%% /********************* comparison with others *****************/
%// third way of indexing: loops
tic
for i = 1:it
    for j = 1:2:N
        for k = 1:2:N
            im(j,k) = im(j,k) + 1;
        end
    end
end
toc %// costs 0.74 seconds on my machine (MATLAB 2015b, ThinkPad T410)
It seems that directly using subscript indexing is faster than the linear indexing obtained from sub2ind. Does anyone know why? I thought they were almost the same.
The intuition
As Daniel mentioned in his answer, the linear index takes up more space in RAM while the subscripts are much smaller.
For the subscripted indexing, internally, Matlab will not create the linear index, but it will use a (double) compiled loop to cycle through all elements.
The linearly indexed version, on the other hand, has to loop through all the linear indices passed from outside, which requires more reads from memory and thus takes longer.
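To put a rough number on that size difference, here is a quick check (my own illustration, not part of the original answer) for the N = 512 case from the question:
N = 512;
x = 1:2:N;  y = 1:2:N;                    % subscripts: 2*256 doubles
[ind_x, ind_y] = ndgrid(x, y);
index = sub2ind([N N], ind_x, ind_y);     % linear index: 256*256 doubles
fprintf('subscripts: %d values, linear index: %d values\n', ...
        numel(x) + numel(y), numel(index))   % prints 512 vs 65536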
Claims
(1) Linear indexing is faster...
(2) ...as long as the total number of indices is the same.
Timings
From the timings we see a direct confirmation for the first claim, and we can infer the second with some additional testing (below).
LOOPED
subs assignment: 0.2878s
linear assignment: 0.0812s
VECTORIZED
subs assignment: 0.0302s
linear assignment: 0.0862s
First claim
We can test it with loops. The number of subref operations is the same but the linear index points directly to the element of interest while subscripts, internally, need to be converted.
The functions of interest:
function B = subscriptedIndexing(A, row, col)
    n = numel(row);
    B = zeros(n);
    for r = 1:n
        for c = 1:n
            B(r,c) = A(row(r), col(c));
        end
    end
end

function B = linearIndexing(A, index)
    B = zeros(size(index));
    for ii = 1:numel(index)
        B(ii) = A(index(ii));
    end
end
Second claim
This claim is an inference from the observed difference in speed when using the vectorized approach.
First, the vectorized approach (as opposed to the looped) speeds up the subscripted assignment while linear indexing is slightly slower (probably not statistically significant).
Second, the only difference in the two indexing methods comes from the size of the indices/subscripts. We want to isolate this as the only possible cause of the difference in the timings. One other major player could be JIT optimization.
The testing functions:
function B = subscriptedIndexingVect(A, row, col)
    n = numel(row);
    B = zeros(n);
    B = A(row, col);
end

function B = linearIndexingVect(A, index)
    B = zeros(size(index));
    B = A(index);
end
NOTE: I keep the superfluous preallocation of B, to keep the vectorized and looped approaches comparable. In other words, differences in timings should only come from indexing and the internal implementation of the loops.
All tests are run with:
function testFun(N)
    A = magic(N);
    row = 1:2:N;
    col = 1:2:N;
    [ind_x, ind_y] = ndgrid(row, col);
    index = sub2ind(size(A), ind_x, ind_y);

    % isequal(linearIndexing(A,index),     subscriptedIndexing(A,row,col))
    % isequal(linearIndexingVect(A,index), subscriptedIndexingVect(A,row,col))

    fprintf('LOOPED\n')
    fprintf('   subs assignment:   %.4fs\n',   timeit(@() subscriptedIndexing(A,row,col)))
    fprintf('   linear assignment: %.4fs\n\n', timeit(@() linearIndexing(A,index)))
    fprintf('VECTORIZED\n')
    fprintf('   subs assignment:   %.4fs\n', timeit(@() subscriptedIndexingVect(A,row,col)))
    fprintf('   linear assignment: %.4fs\n', timeit(@() linearIndexingVect(A,index)))
end
Turning JIT on/off has NO impact:
feature accel off
testFun(5e3)
...
VECTORIZED
subs assignment: 0.0303s
linear assignment: 0.0873s
feature accel on
testFun(5e3)
...
VECTORIZED
subs assignment: 0.0303s
linear assignment: 0.0871s
This excludes the possibility that the subscripted assignment's superior speed comes from JIT optimization, which leaves us with the only plausible cause: the number of RAM accesses. It is true that the final matrix has the same number of elements; however, the linear assignment has to read every element of the index array in order to fetch the numbers.
SETUP
Tested on Win7 64 with MATLAB R2015b. Prior versions of Matlab will provide different results due to recent changes in Matlab's execution engine.
In fact, turning JIT off in Matlab R2014a affects timings, but only for the loops (expected result):
feature accel off
testFun(5e3)
LOOPED
subs assignment: 7.8915s
linear assignment: 6.4418s
VECTORIZED
subs assignment: 0.0295s
linear assignment: 0.0878s
This again confirms that the difference in timings between linear and subscripted assignment should come from the number of RAM accesses, since JIT does not play a role in the vectorized approach.
It does not really surprise me that the subscript indexing is much faster here. If you take a look at your input data, the index is much smaller in this case. For the subscript indexing case you have 512 elements while for the linear indexing case you have 65536 elements.
When you apply your example to a vector instead, you will notice that there is no difference between both methods.
Here is the slightly modified code I used to evaluate different matrix sizes:
it = 400; im = zeros(512*512, 1);
x = 1:2:size(im,1);
y = 1:2:size(im,2);

%// linear indexing
[ind_x, ind_y] = ndgrid(x, y);
index = sub2ind(size(im), ind_x, ind_y);
tic
for i = 1:it
    im(index) = im(index) + 1;
end
toc

%// subscript indexing
tic
for i = 1:it
    im(x,y) = im(x,y) + 1;
end
toc
A very good question. Up front: I don't know the correct answer; however, you can analyze the behavior. Save the first toc into t1 and the second one into t2, and at the end calculate t1/t2. You will notice that changing the number of iterations or the size of your matrix (almost) does not change that ratio.
I propose:
The number of iterations only improves the quality of the tic/toc measurement (obvious?).
The size of the matrix has no influence, i.e. the extra time must come from the syntax itself.
I imagine that there is simply an internal check or transformation from linear indexing to subscript indexing, i.e. the internal addition you perform is exactly the same. It appears more natural to use subscript indexing instead of linear indexing, so maybe MathWorks simply optimized the former.
UPDATE:
You can also simply access a single element of your matrix; you will see that using a subscript index is faster than using a linear index. That supports the theory that there is a slow conversion done internally from linear to subscript indexing.
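To try that single-element comparison concretely, here is a small sketch of my own (the element position (200,300) is arbitrary), using timeit:
A = rand(512);
k = sub2ind(size(A), 200, 300);      % precompute the linear index once
tSub = timeit(@() A(200,300));       % subscripted access
tLin = timeit(@() A(k));             % linear-index access
fprintf('subscript: %.3g s, linear: %.3g s\n', tSub, tLin)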
DISCLAIMER: I don't have a MATLAB license at the moment, so the code I provide below is admittedly untested. However, if anyone decides to test, please comment on this answer accordingly.
Depending on your release of MATLAB (are you using R2015b?), there is a possibility that you may not have paid the full upfront cost of preallocation when invoking "zeros". There is a possibility that you are paying for allocation on the first get/set of im, which is causing additional but hidden overhead when you first access the values inside im.
See: http://undocumentedmatlab.com/blog/preallocation-performance
As an initial test, I suggest switching the order in which you profile the code:
N = 512; it = 400; im = zeros(N);

%// subscript indexing
x = 1:2:N;
y = 1:2:N;
tic
for i = 1:it
    im(x,y) = im(x,y) + 1;
end
toc %// What's the cost now?

%// linear indexing
[ind_x, ind_y] = ndgrid(1:2:N, 1:2:N);
index = sub2ind(size(im), ind_x, ind_y);
tic
for i = 1:it
    im(index) = im(index) + 1;
end
toc %// What's the cost now?
To profile subscript vs. linear indexing perhaps more fairly, I suggest one of two possible methods:
Make sure you incur allocation costs on both methods by creating two separate im matrices, im1 and im2, both initially set to zeros(N), and use each matrix for a separate indexing method.
Run a full get/set on each element of im before actually profiling between subscript vs. linear indexing.
Method 1:
N = 512; it = 400; im1 = zeros(N); im2 = zeros(N);

%// subscript indexing
x = 1:2:N;
y = 1:2:N;
tic
for i = 1:it
    im1(x,y) = im1(x,y) + 1;
end
toc %// What's the cost now?

%// linear indexing
[ind_x, ind_y] = ndgrid(1:2:N, 1:2:N);
index = sub2ind(size(im2), ind_x, ind_y);
tic
for i = 1:it
    im2(index) = im2(index) + 1;
end
toc %// What's the cost now?
Method 2:
N = 512; it = 400; im = zeros(N);

%// Run a full get/set on each element to force allocation
tic
for i = 1:N^2
    im(i) = im(i) + 1;
end
toc

%// subscript indexing
x = 1:2:N;
y = 1:2:N;
tic
for i = 1:it
    im(x,y) = im(x,y) + 1;
end
toc %// What's the cost now?

%// linear indexing
[ind_x, ind_y] = ndgrid(1:2:N, 1:2:N);
index = sub2ind(size(im), ind_x, ind_y);
tic
for i = 1:it
    im(index) = im(index) + 1;
end
toc %// What's the cost now?
I have a second hypothesis, which is that you incur some additional overhead when you explicitly declare each and every single element to be accessed, versus if you have MATLAB infer the elements for you. excasa's "duplicate post" reference (not exactly a duplicate in my humble opinion) has the same general insight, but uses different datapoints to come to this conclusion. I won't write examples of this here, but basically, creating a straight up giant array index compared to the smaller subscript indices x and y gives MATLAB less room for internal optimizations. I don't know what inside MATLAB would perform these specific optimizations, but perhaps they come from the black magic that you may know as MATLAB's JIT/LXE. If you honestly want to check if JIT is the culprit here (and are working in 2014b or prior), then you can try disabling it and then running the code above.
There are several ways to disable the JIT:
Use undocumented feature methods.
Copy/paste the commands into the command prompt, as opposed to running them straight from the script editor.
Unfortunately, I do not know of a way to turn off LXE in R2015a and later, and trying to diagnose whether LXE is the culprit may be a bit of an uphill battle. If this is where you are stuck, perhaps you can delve even further via MathWorks' technical support or MathWorks Central. You may be surprised to find some astounding experts from either source.

Might be growing inside the loop. Consider Preallocating for Speed

clear all
k_1 = 37.6;
miu_1 = 41;
Den = 2.7;
N = 100;
n = 1;
phi(1) = 1;
for n = 1:N
    phi(n) = 0.3*(n/N);
    K_s(n) = K_1*(1-(1+(3*k_1)/(4*miu_1))*phi(n));
    miu_s(n) = miu_1*(1-(1+(3*k_1)/(4*miu_1))*phi(n));
    den1(n) = Den*(1-phi(n));
    vp(n) = sqrt((k_s(n)+(4/3)*miu_s(n))/den1(n));
end
figure(1);
plot(phi, miu_s);
figure(2);
plot(phi, vp)
I am new to MATLAB and do not know what the problem with my code is. When I run my program, there is only a beep and nothing happens. Please guide me.
The reason your code doesn't work is case sensitivity. You are using k_1 and K_1, and k_s and K_s (unless that is intentional). When I change that, your code runs OK.
clear all
k_1 = 37.6;
miu_1 = 41;
Den = 2.7;
N = 100;
n = 1;
phi(1) = 1;
for n = 1:N
    phi(n) = 0.3*(n/N);
    k_s(n) = k_1*(1-(1+(3*k_1)/(4*miu_1))*phi(n));
    miu_s(n) = miu_1*(1-(1+(3*k_1)/(4*miu_1))*phi(n));
    den1(n) = Den*(1-phi(n));
    vp(n) = sqrt((k_s(n)+(4/3)*miu_s(n))/den1(n));
end
figure(1);
plot(phi, miu_s);
figure(2);
plot(phi, vp)
When programming in MATLAB, it is usually good practice to preallocate variables instead of growing them inside a loop. That way, MATLAB creates the array just once and only changes individual values inside the loop. Otherwise you will be reallocating the variable and copying all its contents on every loop iteration, which is a costly process. Your code might be working but be extremely slow, leading you to think nothing is happening. Try preallocating all the variables used inside the loop with the zeros() function, like this:
phi = zeros(N,1);
phi(1) = 1;
k_s = zeros(N,1);
%... and so on for all your variables inside the loop
for n = 1:N
    phi(n) = 0.3*(n/N);
    k_s(n) = k_1*(1-(1+(3*k_1)/(4*miu_1))*phi(n));
    miu_s(n) = miu_1*(1-(1+(3*k_1)/(4*miu_1))*phi(n));
    den1(n) = Den*(1-phi(n));
    vp(n) = sqrt((k_s(n)+(4/3)*miu_s(n))/den1(n));
end
Hope that helps
You are doing a lot of unnecessary things here, including that entire loop.
For example:
N = 100;
n = 1;      % this value is never used
phi(1) = 1; % this is overwritten in loop
for n = 1:N
    phi(n) = 0.3*(n/N);
    ... (loop continues)
You don't need a loop here. Instead, work on whole vectors
N = 100;
n = 1:100; %predefine vector
phi = 0.3*(n/N); % outputs vector of phi from 0.003 to 0.3
For cases where you combine multiple vectors, remember to use ./ and .* for element-wise division and multiplication; e.g. the last equation will end up being:
vp=sqrt((k_s+(4/3)*miu_s)./den1);
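Putting that together, a fully vectorized version of the whole script might look like this (a sketch, using the lower-case names from the case-sensitivity fix above; no loop and no preallocation needed):
k_1 = 37.6; miu_1 = 41; Den = 2.7; N = 100;
n = 1:N;                                          % row vector 1..N
phi   = 0.3*(n/N);                                % 0.003 .. 0.3
k_s   = k_1  *(1 - (1 + (3*k_1)/(4*miu_1))*phi);
miu_s = miu_1*(1 - (1 + (3*k_1)/(4*miu_1))*phi);
den1  = Den*(1 - phi);
vp    = sqrt((k_s + (4/3)*miu_s)./den1);
figure(1); plot(phi, miu_s);
figure(2); plot(phi, vp);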

parfor loop has wrong sliced variables

Can anyone explain to me why the following gives an error for u but not for h?
max_X = 100;
max_Y = 100;
h = ones(max_Y, max_X);
u = zeros(max_Y, max_X);
parfor l = 1:max_X*max_Y
    i = mod(l-1, max_X) + 1;
    j = floor((l-1)/max_Y) + 1;
    for k = 1:9
        m = i + floor((k-1)/3) - 1;
        n = j + mod(k,-3) + 1;
        h_average(k) = sqrt(h(i,j)*h(m,n));
        u_average(k) = (u(i,j)*sqrt(h(i,j)) + u(m,n)*sqrt(h(m,n))) / ...
                       (sqrt(h(i,j)) + sqrt(h(m,n)));
    end
end
I can substitute (i,j) with (l), but even if I compute the corresponding linear index, let's call it p, from (m,n) and write u(p) instead of u(m,n), it still gives me an error message.
It only underlines u(m,n) (resp. u(p)), but not h(m,n).
MATLAB says:
Explanation:
For MATLAB to execute parfor loops efficiently, the amount of data sent to the MATLAB workers must be minimal. One of the ways MATLAB achieves this is by restricting the way variables can be indexed in parfor iterations. The indicated variable is indexed in a way that is incompatible with parfor.
Suggested Action
Fix the indexing. For a description of the indexing restrictions, see “Sliced Variables” in the Parallel Computing Toolbox documentation
Any idea, what's wrong here?
The problems with u and h are that they are both being sent as broadcast variables to the PARFOR loop. This is not an error - it's just a warning indicating that more data than might otherwise be necessary is being sent.
The PARFOR loop cannot run because you're indexing, but not slicing, u_average and h_average. It's not clear what outputs you want from this loop, since you overwrite u_average and h_average on every iteration; as written, the PARFOR loop is pointless.
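If you do want to keep the per-cell averages, one possible (untested) workaround is to store one row per value of l, so that h_average and u_average become sliced output variables indexed by the loop variable:
h_average = zeros(max_X*max_Y, 9);
u_average = zeros(max_X*max_Y, 9);
parfor l = 1:max_X*max_Y
    i = mod(l-1, max_X) + 1;
    j = floor((l-1)/max_Y) + 1;
    h_loc = zeros(1,9);              % temporaries local to this iteration
    u_loc = zeros(1,9);
    for k = 1:9
        m = i + floor((k-1)/3) - 1;  % note: m and n can fall outside 1..max
        n = j + mod(k,-3) + 1;       % at the boundaries, as in your original code
        h_loc(k) = sqrt(h(i,j)*h(m,n));
        u_loc(k) = (u(i,j)*sqrt(h(i,j)) + u(m,n)*sqrt(h(m,n))) / ...
                   (sqrt(h(i,j)) + sqrt(h(m,n)));
    end
    h_average(l,:) = h_loc;          % sliced: first index is the loop variable
    u_average(l,:) = u_loc;
end
h and u are still broadcast to the workers, but that only produces the broadcast warning mentioned above, not an error.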

use of parfor in matlab for a lattice boltzmann code

I'm working on the lattice Boltzmann method and I've written a MATLAB code.
I would like to parallelize some parts of the code, but I'm new to this, so I'd appreciate your help.
I'd like to know if it's possible to use parfor for this part (the collision operator):
for i = 1:lx
    for j = 1:ly
        for k = 1:9
            f(k,i,j) = f(k,i,j) .* (1 - omega) + omega .* feq(k,i,j);
        end
    end
end
I've tried to replace the outermost for loop with a parfor, but the code seems to be slower.
Any suggestions? Thanks in advance.
You should be able to do this whole operation with a single line of code without the loops:
f = f.*(1 - omega) + omega .* feq;
On my computer with 2 cores and starting with:
f = rand(9,400,400);
feq = rand(9,400,400);
[lx,ly,lz] = size(f);
omega = rand(1);
your loop takes 0.087933 seconds, the parfor loop takes 1.166662 seconds, and this method takes 0.009388 seconds. If you can, always vectorize your code.
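If you want to reproduce that comparison yourself, here is a rough timing sketch of my own using timeit instead of tic/toc (collisionTiming and collideLoop are hypothetical helper names; put both functions in one file, and expect different numbers on your machine):
function collisionTiming()
    f   = rand(9,400,400);
    feq = rand(9,400,400);
    omega = rand(1);
    fprintf('loop:       %.4f s\n', timeit(@() collideLoop(f, feq, omega)));
    fprintf('vectorized: %.4f s\n', timeit(@() f.*(1 - omega) + omega.*feq));
end

function g = collideLoop(f, feq, omega)
    [~, lx, ly] = size(f);
    g = f;
    for i = 1:lx
        for j = 1:ly
            for k = 1:9
                g(k,i,j) = g(k,i,j)*(1 - omega) + omega*feq(k,i,j);
            end
        end
    end
end
The parfor version is omitted on purpose: for an operation this cheap per iteration, the scheduling and data-transfer overhead of parfor dominates, which is exactly why the parallel loop came out slower in the answer's timings.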