I am doing parallel computations with MATLAB's parfor. The code structure looks pretty much like this:
%%% assess fitness %%%
% save communication overheads
bitmaps = pop(1, new_indi_idices);
porosities = pop(2, new_indi_idices);
mid_fitnesses = zeros(1, numel(new_indi_idices));
right_fitnesses = zeros(1, numel(new_indi_idices));
% parallelization starts
parfor idx = 1:numel(new_indi_idices) % only assess the necessary
    bitmap = bitmaps{idx};
    if porosities{idx}>POROSITY_MIN && porosities{idx}<POROSITY_MAX
        [mid_dsp, right_dsp] = compute_displacement(bitmap, ['1/' num2str(PIX_NO_PER_SIDE)]);
        mid_fitness = 100+mid_dsp;
        right_fitness = 100+right_dsp;
    else % porosity not even qualified
        mid_fitness = 0;
        right_fitness = 0;
    end
    mid_fitnesses(idx) = mid_fitness;
    right_fitnesses(idx) = right_fitness;
    fprintf('Done.\n');
    pause(0.01); % for break
end
I encountered the following weird error.
Error using parallel.internal.pool.deserialize (line 9)
Bad version or endian-key
Error in distcomp.remoteparfor/getCompleteIntervals (line 141)
origErr =
parallel.internal.pool.deserialize(intervalError);
Error in nsga2 (line 57)
parfor idx = 1:numel(new_indi_idices) % only assess the necessary
How should I fix it? A quick Google search returns no solution.
Update 1
The weirder thing is that the following snippet works perfectly under exactly the same settings and on the same HPC. I think there might be some subtle difference between the two, causing one to work and the other to fail. The working snippet:
%%% assess fitness %%%
% save communication overheads
bitmaps = pop(1, new_indi_idices);
porosities = pop(2, new_indi_idices);
fitnesses = zeros(1, numel(new_indi_idices));
% parallelization starts
parfor idx = 1:numel(new_indi_idices) % only assess the necessary
    bitmap = bitmaps{idx};
    if porosities{idx}>POROSITY_MIN && porosities{idx}<POROSITY_MAX
        displacement = compute_displacement(bitmap, ['1/' num2str(PIX_NO_PER_SIDE)]);
        fitness = 100+displacement;
    else % porosity not even qualified
        fitness = 0;
    end
    fitnesses(idx) = fitness;
    %fprintf('Done.\n', gen, idx);
    pause(0.01); % for break
end
pop(3, new_indi_idices) = num2cell(fitnesses);
Update 2
Suspecting that [mid_dsp, right_dsp] = compute_displacement(bitmap, ['1/' num2str(PIX_NO_PER_SIDE)]); was causing the trouble, I replaced it with
mid_dsp = rand();
right_dsp = rand();
Then it works! This proves that the problem is indeed caused by that particular line. However, I have tested the function, and it returns two numbers correctly! Since the function returns values just as rand() does, I can't see any difference. This confuses me even more.
I had the same issue, and it turned out that MATLAB 2015 reserves all the necessary memory resources for each iteration of the parfor loop, which resulted in a memory shortage.
The error message is misleading. After fine-tuning the code in the loop and providing 120 GB of virtual memory from the SSD through the pagefile settings in Windows 10, the parfor executed beautifully.
After working a while on my own similar code block, I've concluded that this is actually a memory issue.
I'm using a 6-core 4 GHz CPU with 8 GB of RAM and have seen this issue (on MATLAB 2014b) when I set the worker count high; I did not have any problems with low worker counts.
When I use 6 or more workers (which is not ideal, I know), memory consumption is high and this error message pops up sporadically. I have also seen various out-of-memory errors in my tests.
I haven't seen the error with 5 or fewer workers so far, and I'm pretty sure some memory limit (possibly inside a Java code block) is causing this issue by compromising the integrity (or existence) of some of the results.
Hope you can resolve this issue by reducing the worker count.
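A minimal sketch of capping the worker count (assuming the Parallel Computing Toolbox on R2013b or later, where parpool is available; the count of 4 is only an example to tune against your RAM):
delete(gcp('nocreate'));     % close any pool that is already running
pool = parpool('local', 4);  % start a smaller pool so each worker gets more RAM
% ... run the parfor loop here ...
delete(pool);                % release the workers when done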
Related
This has got me stumped.
I've written a function parObjectiveFunction that runs several simulations in parallel using createJob and createTask. It takes as an argument an objectiveFunction which is passed deeper into the code to calculate the objective function value for each simulation.
When I run parObjectiveFunction from the directory where objectiveFunction is found, it works as expected, but when I go one level up, it can no longer find objectiveFunction. The specific error I get is
Error using parallel.Job/fetchOutputs (line 1255)
An error occurred during execution of Task with ID 1.
Error in parObjectiveFunction (line 35)
taskoutput = fetchOutputs(job);
Caused by:
Error using behaviourObjective/getPenalty (line 57)
Undefined function 'objectiveFunction' for input arguments of type 'double'.
(behaviourObjective is an object)
This is weird for several reasons.
objectiveFunction is definitely on the path, and when I try which objectiveFunction, it points to the correct function.
I have other components of the deeper code in other directories, and they are found without issue (they are objects rather than functions, but that shouldn't make a difference).
There's a line of code in parObjectiveFunction that runs the simulation, and when I run that directly in the matlab command window it finds objectiveFunction without issue.
I get the same results on my local machine and an HPC server.
My first thought was that the individual task might have its own path which didn't include objectiveFunction, but then that should cause problems for the other components (it doesn't). The problem is compounded because I can't work out how to debug the parallel code.
What am I doing wrong? Code that produced the issue is below.
Are there any known issues where MATLAB can't find functions when using parallel processing with createJob, createTask, submit and fetchOutputs?
How can you debug in MATLAB when the issue only occurs when operating in parallel? None of my print statements appear.
To make something work for external testing would take quite a bit of hacking, but for the sake of the question, the parallel function is:
function penalty = parObjectiveFunction(params, objectiveFunction, N)
% Takes a vector of input parameters, runs N instances of them in parallel
% then assesses the output through the objectiveFunction
n = params(1);
np = params(2);
ees = params(3);
ms = params(4);
cct = params(5);
wt = params(6);
vf = params(7);
dt = 0.001;
bt = 10;
t = 10;
c = parcluster;
job = createJob(c);
testFunction = @(run_number)behaviourObjective(objectiveFunction,n,np,ees,ms,cct,wt,vf,t,dt,bt,run_number);
for i = 1:N
    createTask(job, testFunction, 1, {i});
end
submit(job);
wait(job);
taskoutput = fetchOutputs(job);
pensum = 0;
for i = 1:N
    pensum = pensum + taskoutput{i}.penalty;
end
penalty = pensum/N;
end
It sounds like you need to attach some additional files to your job. You can see which files were picked up by MATLAB's dependency analysis by running listAutoAttachedFiles, i.e.
job.listAutoAttachedFiles()
If this isn't showing your objectiveFunction, then you can manually attach this by modifying the AttachedFiles property of the job.
It seems as though objectiveFunction is a function_handle though, so you might need to do something like this:
f = functions(objectiveFunction)
job.AttachedFiles = {f.file}
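As a rough usage sketch in the context of parObjectiveFunction above (the property and function names are the documented ones; everything else is just an assumption about where this slots into your code):
c = parcluster;
job = createJob(c);
% resolve the file behind the function handle and attach it explicitly,
% so workers find it regardless of the directory the job was created from
f = functions(objectiveFunction);
job.AttachedFiles = {f.file};
% ... createTask calls as before ...
submit(job);
job.listAutoAttachedFiles()   % sanity check: what dependency analysis picked up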
I am using parfor for parallel computing in MATLAB. I am not familiar with this command. If possible, please look at my code below and tell me whether I can write it with parfor.
These errors and warnings appear in the MATLAB Editor:
The parfor loop cannot be run due to the way variable Dat is used. (When I comment out the line Dat.normXpj = normXpj(pj,:); this error goes away and other errors similar to the following warning appear.)
The entire array or structure Bound is a broadcast variable. This might result in unnecessary communication overhead.
parfor pj = 1:size(normXpj,1)
    Dat.normXpj = normXpj(pj,:);
    if size(Dat.InitialGuess)==0
        X = (Bound(:,1)+(Bound(:,2)-Bound(:,1)).*rand(Nvar,1))';
    else
        X = Dat.InitialGuess;
    end
    [Xsqp, ~, FLAG,Options] = mopOPT(X,Dat);
    FEVALS = Options.funcCount;
    FES = FES+FEVALS;
    PSet(pj,:) = Xsqp;
    PFront(pj,:) = mop(Xsqp,Dat,0);
    if FLAG==-2
        disp('.......... Algo paso...');
    else
        F = PFront(pj,:);
        if Nobj==2
            plot(F(1,1),F(1,2),'*r'); grid on; hold on;
        elseif Nobj==3
        end
    end
end
The problem here is that we can see you're not using Dat in a way that is order-dependent, but the static analysis machinery of parfor cannot deduce that because of the way you're assigning into it. I think you can work around this by instead creating a whole new Dat for each iteration of the loop, like so:
Dat = struct('normXpj', rand(10,1), 'InitialGuess', 3);
normXpj = rand(10);
parfor idx = 1:10
    tmpDat = struct('normXpj', normXpj(:,idx), 'InitialGuess', Dat.InitialGuess);
    % use 'tmpDat'
    disp(tmpDat);
end
The answer is no, unfortunately. At the line:
Dat.normXpj = normXpj(pj,:);
you assign a value to Dat.normXpj, but you have to know that in a parfor loop there can be multiple iterations executing at the same time. So what value should be used for Dat.normXpj? MATLAB cannot decide, hence your error.
More generally, your code looks quite messy. I suppose you want to use parfor to increase execution speed. Probably a more effective first step would be to use the profiler (see profile) to detect the bottlenecks in your code, and apply a correction where possible.
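A rough sketch of that profiling workflow (profile is built in; run_my_optimisation is just a hypothetical stand-in for the serial version of your loop):
profile on                 % start collecting timing data
run_my_optimisation();     % placeholder: call the code you want to speed up
profile viewer             % interactive report; sort by self-time to see
                           % whether mopOPT, mop or the plotting dominates
profile off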
Best,
I have this script in MATLAB
struct = svmTraining(feature_train,class_final_train);
svmclassify(struct,feature_test);
but after 5 seconds the following message appears
??? Error using ==> svmclassify at 117
An error was encountered during classification.
Out of memory. Type HELP MEMORY for your options.
Help me, Thanks
I was able to solve this same problem for myself by calling the svmclassify() function on successive subsets of the test data. For some reason it needs an enormous amount of memory if you give it a large array of test data.
So here is something that worked for me
numExemplars = size(testData,1);
chunkSize = 1000;
j=1:chunkSize:numExemplars;
classifications = zeros(numExemplars,1); %initialize
for i=1:length(j)-1
    index1 = j(i);
    index2 = j(i+1)-1;
    fprintf('classifying exemplars %d to %d\n', index1, index2 );
    chunk = testData(index1:index2,:);
    classifications(index1:index2) = svmclassify(SVM_struct,chunk);
end
% last bit of data
chunk = testData(j(end):numExemplars,:);
classifications(j(end):numExemplars) = svmclassify(SVM_struct,chunk);
The error means you do not have enough memory available on your machine to carry out the classification.
Firstly, try repeating the commands with a freshly started MATLAB, without creating any more variables than necessary, and with no other applications running.
If that doesn't work, then essentially you will need to either work with a smaller dataset, or get more memory for your machine.
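A small sketch of checking what you actually have to work with before re-running (the memory function is Windows-only; whos works everywhere):
clear unneededVariable                 % placeholder name: free anything you no longer need
whos                                   % see which workspace variables are largest
[userview, sysview] = memory;          % Windows only: available memory figures
disp(userview.MaxPossibleArrayBytes)   % largest contiguous array MATLAB could allocate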
I have written the following code in MATLAB to process large images of the order of 3000x2500 pixels. Currently the operation takes more than half an hour to complete. Is there any scope to improve the code so that it takes less time? I heard parallel processing can make things faster, but I have no idea how to implement it. How do I do it, given the following code?
function dirvar(subfn)
[fn,pn] = uigetfile({'*.TIF; *.tiff; *.tif; *.TIFF; *.jpg; *.bmp; *.JPG; *.png'}, ...
    'Select an image', '~/');
I = double(imread(fullfile(pn,fn)));
ld = input('Enter the lag distance = '); % prompt for lag distance
fh = eval(['@' subfn]); % Function handles
I2 = uint8(nlfilter(I, [7 7], fh));
imshow(I2); % Texture Layer Image
imwrite(I2,'result_mat.tif');

    % Zero Degree Variogram
    function [gamma] = ewvar(I)
        c = (size(I)+1)/2; % Finds the central pixel of moving window
        EW = I(c(1),c(2):end); % Determines the values from central pixel to margin of window
        h = length(EW) - ld; % Number of lags
        gamma = 1/(2 * h) * sum((EW(1:ld:end-1) - EW(2:ld:end)).^2);
    end
end
The input lag distance is usually 1.
You really need to use the profiler to get some improvements out of it. My first guess (as I haven't run the profiler, which you should, as already suggested) would be to use as few length operations as possible. Since you are processing every image with a [7 7] window, you can precalculate some parts so that you won't repeat these actions:
function dirvar(subfn)
[fn,pn] = uigetfile({'*.TIF; *.tiff; *.tif; *.TIFF; *.jpg; *.bmp; *.JPG; *.png'}, ...
    'Select an image', '~/');
I = double(imread(fullfile(pn,fn)));
ld = input('Enter the lag distance = '); % prompt for lag distance
fh = eval(['@' subfn]); % Function handles
%% precalculations
wind = [7 7];
center = (wind+1)/2; % Finds the central pixel of moving window
EWlength = (wind(2)+1)/2;
h = EWlength - ld; % Number of lags
%% calculations
I2 = nlfilter(I, wind, fh);
imshow(I2); % Texture Layer Image
imwrite(I2,'result_mat.tif');

    % Zero Degree Variogram
    function [gamma] = ewvar(I)
        EW = I(center(1),center(2):end); % Determines the values from central pixel to margin of window
        gamma = 1/(2 * h) * sum((EW(1:ld:end-1) - EW(2:ld:end)).^2);
    end
end
Note that by doing so, you trade clarity of your code for performance, and introduce coupling (between the function dirvar and the nested function ewvar). However, since I haven't profiled your code, you should do that yourself using your own inputs to find out which line of your code consumes the most time.
For batch processing, I would also recommend leaving out any input, imshow, imwrite and uigetfile. Those are commands that you typically call from a more high-level function/script, and they will force you to enter these inputs even when you want them to stay the same. So instead of that code, make each of the variables they produce (/process) a parameter (/return value) of your function. That way, you could leave MATLAB running over the weekend to process everything (without having to manually enter all those values), even if you are unable to speed up the code.
A few general purpose tricks:
1 - use the MATLAB profiler to determine all the computational bottlenecks
2 - parallel processing can make things faster and there are a lot of tools that you can use, but it depends on how your entire code is set up and whether the code is optimized for it. By far the easiest trick to learn is parfor, where you can replace the top-level for loop by parfor. This does mean you must open the MATLAB pool with matlabpool open (see the sketch after this list).
3 - If you have a rather recent Nvidia GPU as well as MATLAB 2011, you can also write some CUDA code.
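For point 2, a minimal sketch of what that replacement might look like (matlabpool is the pre-R2013b syntax mentioned above; the file pattern and loop body are placeholders for your own batch code):
matlabpool open                      % start the local worker pool
files = dir('*.tif');                % example: every TIFF in the current folder
parfor k = 1:numel(files)
    I = double(imread(files(k).name));
    % ... run nlfilter/ewvar on I and imwrite the result here ...
end
matlabpool close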
All in all 30 mins to me is peanuts, so don't fret it too much.
First of all, I strongly suggest you follow the advice by @Egon: Write a separate function that collects a list of files (the excellent UIPICKFILES from the FEX is your friend here), and then runs your filtering code in a loop for each image. Note that you should definitely keep the call to imwrite in your filtering code: In case the analysis crashes at image 48 (e.g. due to power failure), you don't want to lose all the previous work.
Running in batch mode like this has two big advantages: (1) you can start running your code and go home for the weekend, and (2) you can easily parallelize this outer loop using PARFOR. However, with only a dual-core machine, it is unlikely that you will get any significant improvement from parallelization - your OS also wants to run stuff at times, and the overhead of parallelization might be more than the gain from running two workers. Also, 2.5GB of RAM is seriously limiting.
As to your specific code: in my experience, using IM2COL is often faster than NLFILTER. im2col creates an nElementsInMask-by-nMasks array out of your image, so that you can apply the filtering in one single operation. With a 7x7 window, the output of im2col has roughly 3000*2500*49 elements, which is close to 400MB at one byte per element (several times that in double precision), so it should still be feasible. All that you need to do is rewrite ewvar so that it works on a 49-by-1 column of pixels that make up your mask, which will require some index juggling, if I understand your code correctly.
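A rough sketch of that rewrite (assumes the Image Processing Toolbox; note that im2col's 'sliding' output covers only the "valid" region, i.e. (rows-6)-by-(cols-6), whereas nlfilter pads the borders, so the result is slightly smaller than the input):
wind = [7 7];
cols = im2col(I, wind, 'sliding');          % 49-by-nWindows, one patch per column
EWidx = sub2ind(wind, 4*ones(1,4), 4:7);    % centre pixel out to the east margin of the 7x7 patch
EW = cols(EWidx, :);                        % 4-by-nWindows
h = size(EW,1) - ld;                        % number of lags, as in ewvar
gamma = 1/(2*h) * sum((EW(1:ld:end-1,:) - EW(2:ld:end,:)).^2, 1);
I2 = uint8(col2im(gamma, wind, size(I), 'sliding'));  % back to a (smaller) image
If the im2col output does not fit in RAM as double precision, processing the image in horizontal strips is one way around it.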
When multiplying two matrices, I tried the following two options:
1)
res = X*A;
2)
for i = 1:size(A,2)
    res(:,i) = X*A(:,i);
end
I preallocated memory for res in both cases, and surprisingly, I found option 2 to be faster.
Can someone explain how this is so?
edit:
I tried
K=10000;
clear t1 t2
t1=zeros(K,1);
t2=zeros(K,1);
for k=1:K
    clear res
    x = rand(100,100);
    a = rand(100,100);
    tic
    res = x*a;
    t1(k) = toc;
end
for k=1:K
    clear res2
    res2 = zeros(100,100);
    x = rand(100,100);
    a = rand(100,100);
    tic
    for i = 1:100
        res2(:,i) = x*a(:,i);
    end
    t2(k) = toc;
end
I ran both code snippets in a loop 1000 times. On average (but not always) the first, vectorized code was 3-4 times faster. I cleared the result variables and preallocated before starting the timer.
x = rand(100,100);
a = rand(100,100);
K=1000;
clear t1 t2
t1=zeros(K,1);
t2=zeros(K,1);
for k=1:K
    clear res
    tic
    res = x*a;
    t1(k) = toc;
end
for k=1:K
    clear res2
    res2 = zeros(100,100);
    tic
    for i = 1:100
        res2(:,i) = x*a(:,i);
    end
    t2(k) = toc;
end
So, never make a timing conclusion based on a single run.
I believe I can chime in on the variation in timings between the two methods, as well as why people are getting different relative speeds.
Before MATLAB version 2008a (or a version near that release), for loops took a major hit in any MATLAB code because the interpreter (a layer between the very readable script and a lower-level implementation of the code) would have to re-interpret the code each time through the for loop.
Since that release, the interpreter has gotten progressively better so, when running a modern version of Matlab, the interpreter can look at your code and say "Ah ha! I know what he is doing, let me optimize it just a bit" and avoid the hit it would otherwise take by reinterpreting the code.
I would expect the two ways of performing matrix multiplies to evaluate in about the same amount of time; why the for-loop implementation runs faster comes down to some detail in the interpreter's optimizations that we mere mortals are not privy to.
One broad lesson we should take from this is that not all versions are equal. I do work on a couple of bleeding-edge cases using two MATLAB add-ons, the SimBiology and Parallel Computing Toolboxes, both of which (especially if you want them to work together) are version-dependent in speed of execution, and from time to time have other stability issues. As such, I keep the three most recent releases of MATLAB, test that I get the same answers out of each version, and occasionally roll back to an earlier version if I find issues with some features. This is probably overkill for most people, but it gives you an idea of version differences.
Hope this helps.
Edits:
To clarify, code vectorization is still important. But given a script like:
x_slow = zeros(1,1e5);
x_fast = zeros(1,1e5);
tic;
for i=1:1e5
    x_slow(i) = log(i);
end
time_slow = toc; % evaluates for me in .0132 seconds
tic;
x_fast = log(1:1e5);
time_fast = toc; % evaluates for me in .0055 seconds
The disparity between time_slow and time_fast has shrunk over the past several versions thanks to improvements in the interpreter. The example I saw was, I believe, on 2000a vs. 2008b, but that's subject to my recollection.
There is something else that might be going on that was addressed by Oli and Yuk. There is often a difference between time_1 and time_2 in:
tic; x = log(1:1e5); time_1 = toc
tic; x = log(1:1e5); time_2 = toc
So the test of one million evaluations vs. one evaluation is valuable, depending on where x is in memory (in cache or not).
Hope this helps again.
This may well be an effect of caching. a is already in the cache by the time you do the second version, so it has an advantage. Try creating an independent set of inputs to make it fair. Also, it's probably better to measure the time of e.g. 1 million iterations of this, in order to eliminate typical variations due to outside effects.
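Along those lines, a sketch of a fairer comparison (timeit ships with R2013b and later and handles warm-up and repetition internally; columnwise_multiply is a hypothetical helper, which you would save as its own file on releases older than R2016b):
x = rand(100);
a = rand(100);
t_vec  = timeit(@() x*a);                       % option 1: single matrix multiply
t_loop = timeit(@() columnwise_multiply(x, a)); % option 2: column-by-column loop
fprintf('vectorized: %.2e s, loop: %.2e s\n', t_vec, t_loop);

function res = columnwise_multiply(X, A)
% hypothetical helper wrapping the loop version from the question
res = zeros(size(X,1), size(A,2));
for i = 1:size(A,2)
    res(:,i) = X*A(:,i);
end
end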
It looks to me like you are not multiplying the matrices properly; you need to sum all the products of the i-th row of the X matrix with the j-th column of the A matrix, and that might be the reason.
Look here to see how it's done.