The script is as follows:
Lambdass = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000];
numcores = feature('numcores'); % get the number of cpu cores
num_slices = floor(length(Lambdass)/numcores); % get the number of slices for parallel computing
if mod(length(Lambdass), numcores)~=0
    num_slices = num_slices + 1;
end
for slice_i=1:num_slices
    if slice_i~=num_slices
        Lambdas = Lambdass(((slice_i-1)*numcores+1):((slice_i)*numcores));
    else
        Lambdas = Lambdass(((slice_i-1)*numcores+1):end);
    end
    % start the parallel processing
    myparpool = parpool(length(Lambdas))
    parfor li = 1:length(Lambdas)
        % spmd
        lambda = Lambdas(li);
        save_path1 = sprintf('results/lambda_%f/', lambda);
        if ~exist([save_path1, '/fs_results.mat'], 'file')
            do_something_and_save_results(lambda, save_path1);
        end
    end
    delete(myparpool)
end
This script runs correctly on one computer, but on another computer parfor does not seem to work correctly: the warning below appears, and the parfor loop ends up running without parallelism, apparently like a sequential for loop. Could anyone give some advice, please?
Starting parallel pool (parpool) using the 'local' profile ...
Warning: Could not launch SMPD process manager. Using fallback parallel mechanism.
> In SmpdGateway>SmpdGateway.canUseSmpd at 81
In Local.hSubmitCommunicatingJob at 15
In CJSCommunicatingJob>CJSCommunicatingJob.submitOneJob at 81
In Job.Job>Job.submit at 302
In InteractiveClient>InteractiveClient.start at 327
In Pool.Pool>iStartClient at 537
In Pool.Pool>Pool.hBuildPool at 434
In parpool at 104
I am running multiple simulations using MATLAB's parfor loop. The simulations run in a time-slotted fashion.
In these simulations, flows arrive at a server, finish their session, and leave the server. The arrival process is random (and therefore so is the departure process). Each simulation has 100 arrivals, so under the parfor loop it is unlikely that the 100 arrivals fall in the same slots in two different simulations (in other words, two simulations are unlikely to be perfectly identical).
I calculate some metrics at the end of each simulation. After running 20 simulations, I observe that some simulations produce identical metric values; see rows 2, 4, 6, 8, 12, 15, 17 and 20:
1 12.1380000000000 8.67000000000000e-07 2126951.79378669
2 38.3040000000000 2.73600000000000e-06 964079.569727887
3 7.95200000000000 5.68000000000000e-07 2724654.56890640
4 38.3040000000000 2.73600000000000e-06 964079.569727887
5 21.0080000000000 1.50057142857143e-06 1653785.58341616
6 38.3040000000000 2.73600000000000e-06 964079.569727887
7 21.0080000000000 1.50057142857143e-06 1653785.58341616
8 38.3040000000000 2.73600000000000e-06 964079.569727887
9 11.9820000000000 8.55857142857143e-07 1827114.39301842
10 7.63000000000000 5.45000000000000e-07 2662037.48091595
11 8.28400000000000 5.91714285714286e-07 2584669.40096182
12 38.3040000000000 2.73600000000000e-06 964079.569727887
13 16.7040000000000 1.19314285714286e-06 1745020.01488049
14 20.6480000000000 1.47485714285714e-06 1131378.20827498
15 38.3040000000000 2.73600000000000e-06 964079.569727887
16 9.45400000000000 6.75285714285714e-07 2330783.95992713
17 38.3040000000000 2.73600000000000e-06 964079.569727887
18 20.5960000000000 1.47114285714286e-06 1349336.77768965
19 9.54400000000000 6.81714285714286e-07 2344366.38257795
20 38.3040000000000 2.73600000000000e-06 964079.569727887
Getting identical results makes me think that the simulations themselves were identical, which would mean that the arrivals to and departures from the server occur in the same slots in different simulations.
Why am I getting identical results when the simulations are random?
If the simulations are perfectly identical, they are not useful to me, since I want to use the results to determine confidence intervals...
I tried my best to make the question clear. Unfortunately, I cannot post the code here (1xxx lines of code in the main function).
EDIT
I tried to simulate the arrival process only:
The script to launch simulations in parallel:
numSim = 20;
numSlotPerSec = 10;
arrivalRatesVec = [0.2 0.3];
parfor i = 1:numSim
    [arrivals] = simulationFunction(numSlotPerSec, i, arrivalRatesVec);
end
The function that runs the simulations:
function [arrivals] = simulationFunction(numSlotPerSec, numSim, arrivalRatesVec)
    arrivals = [];
    numSlots = 1;
    nextArrival = 1 + round(numSlotPerSec*exprnd(1/sum(arrivalRatesVec)));
    arrivals = [arrivals nextArrival];
    output_file = ['ResTest_' num2str(numSim)];
    while numSlots < 10000
        numSlots = numSlots + 1;
        nextArrival = nextArrival + max(1, round(numSlotPerSec*exprnd(1/sum(arrivalRatesVec))));
        arrivals = [arrivals nextArrival];
        eval(['arrivals_' num2str(numSim) '= arrivals']);
        save(output_file,['arrivals_', num2str(numSim)]);
    end
end
I noticed that arrivals occur in different slots, but I still don't understand why my metrics are perfectly identical.
Can the function eval cause problems when it is called in a parfor loop? I saw a note from MATLAB that it might not access the correct workspace.
EDIT 2
I launched 20 simulations in parallel again, just to see whether the events arrive in the same slots.
The figures for simulations 2 and 13 show that users arrive at the server in the same slots (see the columns "entry" for arrivals and "exit" for departures),
which means that these simulations are identical.
I also tried to check whether eval causes the problem: I saved one of my metrics without using eval, and it turns out that there are still duplicated values.
I'm using MATLAB 2016a on a 64-bit Windows 10 OS. My program is a fairly complicated simulation of an engineering problem.
The program uses parfor plus two other for loops. I've been careful to keep for loops to a minimum by using array operations and built-in commands such as repmat and bsxfun. When I run the program, it runs fine for quite a while and stores results, but after some iterations it suddenly hits this error:
"All workers aborted during execution of the parfor loop."
and the program terminates. I'm using a powerful system with these specs:
Intel Core i7-4720HQ CPU, 16 GB DDR4 RAM, 8 MB cache, GPU: GeForce GTX 970M.
An example looks like this (the main program is much more demanding in both memory and computation; I've omitted many lines, and the 3 functions it calls are not included here):
lambda = 5e-5;
tau = 10.^((-5:25)*0.1);
tau = 0.3;
eta = 1.5;
b = 0.3;
c = 0.4;
beta = (0:90)';
x = (0.01:1000+0.01)';
r = (80.21:800+80.21)';
h = (10:0.1:30.5)';
Lh = length(h);
Lr = length(r);
Lx = length(x);
N = 6;
binom_coeff = factorial(N)*ones(N,1)./(factorial((1:N)').*factorial(N-(1:N)')); % N choose k for k = 1..N
pdf_x = 2*pi*x*lambda.*exp(-pi*lambda*x.^2);
pdf_R = 2*pi*lambda*r.*exp(-pi*lambda*r.^2);
theta_l = atan(repmat(h,1,Lr)./repmat(r',Lh,1))*180/pi;
ratio = sqrt(repmat(h,1,Lr)+repmat(r',Lh,1));
coverage = zeros(size(beta_m));
Integrand_x = zeros(size(x));
Y = (b*h+c)*(1-a);
for k=1:length(beta_m)
    for thr = 1:length(tau)
        parfor i=1:Lx
            temp = (-1)*eta*tau(thr)*(G_l/G_0.*( ratio/sqrt(x(i)^2+h_0^2)).^(-v));
            temp_N = repmat(temp,1,N).*reshape(repmat(1:N,size(temp,1)*size(temp,2),1),size(temp,1),size(temp,2)*N);
            Integrand = (1-(trapz(h,exp(temp_N).*repmat(Y,1,Lr*N))))';
            Integrand_x(i) = exp(trapz(r,(Integrand * binom_coeff)));
        end
        coverage(thr,k) = trapz(x,pdf_x.*Integrand_x);
    end
end
savepar = ['FinalMainRes_longheiv',num2str(v),'h0',num2str(h_0),'a',num2str(a),'.mat'];
save(savepar)
It's worth mentioning that running with just one worker does not crash (although it took about 4 days to complete the run).
What is the problem, and how can I prevent it? Any help is appreciated.
Thanks in advance.
I've used spmd to run two pieces of code simultaneously. The computer I'm using has a processor with 8 cores, which means the communication overhead should be close to zero.
I compared the running time of this spmd block with the same code outside of spmd, using tic and toc.
When I run the code, the parallel version takes more time than the sequential version.
Any idea why that is?
Here is a sample of the code I'm talking about:
tic;
spmd
    if labindex == 1
        gamma = (alpha*beta);
    end
    if labindex == 2
        for t = 1:T,
            for i1=1:n
                for j1=1:n
                    kesi(i1,j1,t) = (alpha(i1,t) + phi(j1,t));
                end;
            end;
        end;
    end
end
t_spmd = toc;
tic;
gamma2= (alpha * beta);
for t = 1:T,
    for i1=1:n
        for j1=1:n
            kesi2(i1,j1,t) = (alpha(i1,t) + phi(j1,t));
        end;
    end;
end;
t_seq = toc;
disp('t spmd : ');disp(t_spmd);
disp('t seq : ');disp(t_seq);
There are two reasons here. Firstly, your use of if labindex == 2 means that the main body of the spmd block (the triple loop) is being executed by only a single worker, so that loop is not parallelized at all.
Secondly, it's important to remember that (by default) parallel pool workers run in single computational thread mode. Therefore, when using local workers, you can only expect speedup when the body of your parallel construct cannot be implicitly multi-threaded by MATLAB.
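One way to see the threading difference on a given machine is to query maxNumCompThreads on the client and on the workers. This is only a sketch: it assumes the Parallel Computing Toolbox is installed and that gcp is allowed to start the default pool; clientThreads, workerThreads and pool are names introduced here.

```matlab
clientThreads = maxNumCompThreads;      % computational threads available to the client session
pool = gcp;                             % reuse the current pool, or start the default one

spmd
    workerThreads = maxNumCompThreads;  % evaluated separately on every worker
end

fprintf('client: %d thread(s)\n', clientThreads);
for w = 1:pool.NumWorkers
    % workerThreads is a Composite; index it with {} to fetch each worker's value
    fprintf('worker %d: %d thread(s)\n', w, workerThreads{w});
end
```

If the workers report a single thread while the client reports several, operations that MATLAB already multithreads on the client (such as the matrix product alpha*beta) have little to gain from being moved onto one worker.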
Finally, in this particular case, you're much better off using bsxfun (or implicit expansion in R2016b or later), like so:
T = 10;
n = 7;
alpha = rand(n, T);
phi = rand(n, T);
alpha_r = reshape(alpha, n, 1, T);
phi_r = reshape(phi, 1, n, T);
% In R2016b or later:
kesi = alpha_r + phi_r;
% In R2016a or earlier:
kesi = bsxfun(@plus, alpha_r, phi_r);
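As a quick sanity check, the vectorized form can be compared against the nested loops from the question. The sketch below uses bsxfun so that it runs on either side of R2016b; kesi_loop, kesi_vec and same are names introduced here for illustration.

```matlab
T = 10;
n = 7;
alpha = rand(n, T);
phi   = rand(n, T);

% Reference result from the nested loops in the question.
kesi_loop = zeros(n, n, T);
for t = 1:T
    for i1 = 1:n
        for j1 = 1:n
            kesi_loop(i1, j1, t) = alpha(i1, t) + phi(j1, t);
        end
    end
end

% Vectorized result: n-by-1-by-T plus 1-by-n-by-T expands to n-by-n-by-T.
kesi_vec = bsxfun(@plus, reshape(alpha, n, 1, T), reshape(phi, 1, n, T));

% Each element is the same single addition, so an exact comparison is appropriate.
same = isequal(kesi_loop, kesi_vec);
disp(same);
```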
I am trying to reduce the run time of the following MATLAB code, which currently takes more than 4 hours to run (I have preallocated the two structures; that part is just not included here):
for combination = 1:1771
    for hankel_size = 1:4
        for window = 1:999
            Output.bin_r(:, window, combination, hankel_size) = bsxfun(@minus, data.hankel_index_mean(window, combination, hankel_size), centers(window, :, hankel_size)');
            Output.score(window, combination, hankel_size) = probs(window, :, hankel_size)*Output.bin_r(:, window, combination, hankel_size);
        end
    end
end
Note that:
centers is a 999 x 50 x 4 matrix
hankel_index_mean is a 999 x 1771 x 4 matrix
probs is a 999 x 50 x 4 matrix
Thanks for your help in advance!
parfor combination = 1:1771
    for hankel_size = 1:4
        for window = 1:999
            Output.bin_r(:, window, combination, hankel_size) = bsxfun(@minus, data.hankel_index_mean(window, combination, hankel_size), centers(window, :, hankel_size)');
            Output.score(window, combination, hankel_size) = probs(window, :, hankel_size)*Output.bin_r(:, window, combination, hankel_size);
        end
    end
end
parfor distributes the loop iterations across the workers of a parallel pool, so it can use all the cores in your CPU.
A parallel pool is opened automatically (MATLAB's default) the first time a parallel function such as parfor or spmd is called, or you can open one explicitly by calling parpool or gcp.
Parallel preferences can be edited under Home -> Parallel -> Parallel Preferences.
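Independently of parfor, the loop body itself can be vectorized, because probs*(m - centers) expands to m*sum(probs) - sum(probs.*centers). The sketch below is an illustration under assumptions, not the original poster's code: it uses small made-up sizes W, B, C, H in place of 999, 50, 1771 and 4, him stands in for data.hankel_index_mean, and it checks the vectorized arrays against the triple loop.

```matlab
% Small hypothetical sizes so the check runs quickly (the question uses 999, 50, 1771, 4).
W = 6; B = 5; C = 7; H = 4;
him     = rand(W, C, H);   % stands in for data.hankel_index_mean
centers = rand(W, B, H);
probs   = rand(W, B, H);

% Reference: the triple loop from the question.
bin_r_loop = zeros(B, W, C, H);
score_loop = zeros(W, C, H);
for c = 1:C
    for h = 1:H
        for w = 1:W
            bin_r_loop(:, w, c, h) = him(w, c, h) - centers(w, :, h)';
            score_loop(w, c, h)    = probs(w, :, h) * bin_r_loop(:, w, c, h);
        end
    end
end

% Vectorized bin_r: a 1-by-W-by-C-by-H array minus a B-by-W-by-1-by-H array.
bin_r = bsxfun(@minus, reshape(him, [1, W, C, H]), permute(centers, [2, 1, 4, 3]));

% Vectorized score, using sum_b p_b*(m - c_b) = m*sum_b p_b - sum_b p_b*c_b.
P = sum(probs, 2);              % W-by-1-by-H
Q = sum(probs .* centers, 2);   % W-by-1-by-H
score = bsxfun(@minus, bsxfun(@times, him, P), Q);

max_err_bin   = max(abs(bin_r(:) - bin_r_loop(:)));
max_err_score = max(abs(score(:) - score_loop(:)));
fprintf('max |bin_r diff| = %g, max |score diff| = %g\n', max_err_bin, max_err_score);
```

The score values can differ from the loop by floating-point rounding, since the arithmetic is reordered. At the sizes in the question, bin_r alone has 50*999*1771*4 elements, roughly 2.8 GB in double precision, so if only score is needed it may be worth skipping bin_r entirely.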
Asking for random numbers in a parallel loop always returns the same pseudorandom numbers. How can I avoid this?
% workers initialization:
if matlabpool('size') == 0
    matlabpool('open',2);
else
    matlabpool('close');
    matlabpool('open',2);
end
% parallel loop always give the same random numbers...
parfor k = 1:10
    fprintf([num2str(rand(1,1)), ' ']);
end
One ideal solution would be to initialize the pseudo random number generator in each thread by CPU time or similar. Things like rng('shuffle') don't seem to help here...
console output:
Sending a stop signal to all the workers ... stopped.
Starting matlabpool using the 'local' profile ... connected to 2 workers.
0.32457 0.66182 0.63488 0.64968
0.26459 0.096779 0.50518 0.48662 0.034895 0.85227
There is documentation about the various options. Here's one way you might do something close.
numWorkers = matlabpool('size');
[streams{1:numWorkers}] = RandStream.create('mrg32k3a', ...
    'Seed', 'shuffle', 'NumStreams', numWorkers);
spmd
    RandStream.setGlobalStream(streams{labindex});
end
Or, to avoid creating all the streams at the client, you could do this instead:
rng('shuffle'); % shuffle the client
workerSeed = randi([0, 2^32-1]);
spmd
    stream = RandStream.create('mrg32k3a', ...
        'Seed', workerSeed, ...
        'NumStreams', numlabs, ...
        'StreamIndices', labindex);
    RandStream.setGlobalStream(stream);
end
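A related pattern, sketched here as an assumption rather than as part of the answer above, is to give every parfor iteration its own substream of a single mrg32k3a stream, so the numbers drawn do not depend on which worker happens to run an iteration. It assumes the Parallel Computing Toolbox is available; seed, nIter and r are illustrative names.

```matlab
rng('shuffle');                    % make the client's seed differ between runs
seed  = randi([0, 2^32 - 1]);      % one seed shared by all iterations
nIter = 10;
r     = zeros(1, nIter);

parfor k = 1:nIter
    s = RandStream('mrg32k3a', 'Seed', seed);  % same base stream in every iteration...
    s.Substream = k;                           % ...moved to a substream unique to iteration k
    r(k) = rand(s);                            % draw from this stream explicitly
end

fprintf('%g ', r);
fprintf('\n');
```

Replacing the shuffled seed with a fixed integer makes the whole loop reproducible from run to run, which can be convenient when debugging a simulation.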