Parallel image processing with MATLAB

I have written a MATLAB program which performs calculations on a video. I think it is a perfect candidate to adapt to multiple CPU cores as there is a lot of averaging done. I am just struggling with the first bit: sending each section of frames to each lab. Say (for simplicity) it is a 200-frame file. I have read some guides and, using SPMD, got this.
spmd
    limitA = 1;
    limitB = 200;
    a = floor(((limitB-limitA)/numlabs)*(labindex-1)+limitA);
    b = floor((((limitB-limitA)/numlabs)*(labindex-1)+limitA)+(((limitB-limitA)/numlabs)));
    fprintf(1,'Lab %d works on [%f,%f].\n',labindex,a,b);
end
It successfully outputs that each worker will work on its respective section (e.g. Lab 1 works on 1:50, Lab 2 on 50:100, etc.).
Now, where I am stuck is how to actually make my main body of code work on each lab's section of frames. Is there a tip or an easy way to edit my main code so it knows which frames to work on based on labindex? Adding spmd to the loop results in an error, hence my question.
Thanks

Following on from what you had, don't you simply need something like this:
spmd
    % each lab has its own different values for 'a' and 'b'
    for idx = a:b
        frame = readFrame(idx); % or whatever
        newFrame = doSomethingWith(frame);
        writeFrame(idx, newFrame);
    end
end
Of course, if that's the sort of thing you're doing, you may need to serialize the frame writing (i.e. make sure only one process at a time is writing).
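One rough way to serialize the writing: let each worker collect its processed frames locally, then write everything from the client after the spmd block ends (readFrame and doSomethingWith are the same placeholders as above, and 'a'/'b' are the per-lab limits from the question):
spmd
    localFrames = cell(1, b-a+1);
    for idx = a:b
        frame = readFrame(idx);                          % placeholder
        localFrames{idx-a+1} = doSomethingWith(frame);   % placeholder
    end
end
% outside spmd, localFrames is a Composite: fetch each lab's results on the client
for lab = 1:length(localFrames)
    resultForLab = localFrames{lab};   % cell array produced by that lab
    % write resultForLab{1}, resultForLab{2}, ... to the output video here,
    % so only the client process ever writes
end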

Related

Framework for Managing Matlab Runs

TL;DR: How should custom simulation runs be managed in MATLAB? Detailed questions at the end.
I am working with MATLAB, where I created some code to check the influence of various parameters on a simulated system. It has a lot of inputs and outputs, but an MWE would be:
function [number2x,next_letter] = test(number, letter)
number2x = number * 2;
next_letter = letter + 1;
disp(['next letter is ' next_letter])
disp(['number times 2 is ' num2str(number2x)])
end
This works if this is all there is to test. However, with time, multiple new inputs and outputs had to be added. Also, because of the growing number of parameters that have been tested, some sort of log had to be created:
xlswrite('testfile.xlsx',[num2str(number), letter,num2str(number2x),next_letter],'append');
Also, because the calculation takes a few hours and should run overnight, multiple parameter sets had to be started at once. This is easily done with [x1,y1] = test(1,'a'); [x2,y2] = test(2,'b'); in one line, or by adding new tasks while the old ones are still running. However, this way you can't keep track of how many are still open.
So in total I need some sort of testing framework that can keep up with changing inputs and outputs, keeps track of calculations that are already done, and ideally also handles the open runs.
I feel like I can't be the only one who faces this issue; in fact, I think so many people face it that MathWorks must already have come up with a solution.
For Simulink this has been done in the form of the Simulation Manager, but for MATLAB functions the closest thing I found is the unit testing framework (example below), which seems to be intended for software development and debugging rather than for what I am trying to do. At some point there also seem to have been third-party solutions, but they are no longer maintained in favor of this testing framework.
function solutions = sampleTest
solutions = functiontests({@paramtertest});
end
function paramtertest(varargin)
test(1,'a');
test(2,'b');
end
function [number2x,next_letter] = test(number, letter)
number2x = number * 2;
next_letter = letter + 1;
disp(['next letter is ' next_letter])
disp(['number times 2 is ' num2str(number2x)])
xlswrite('testfile.xlsx',[num2str(number), letter,num2str(number2x),next_letter],'append');
end
Alternatively, I could create my test as a class, create an interface similar to the Simulation Manager, create numerous functions for managing inputs and outputs and visualizing the progress, and then spawn multiple instances of it if I want to set up a new set of parameters while a simulation is already running. Possible, yet a lot of work that does not involve the simulation directly.
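For illustration only (this is not an existing MathWorks class, just a sketch of that idea), such a class could be as small as a handle object that records parameter sets and their completion state:
classdef RunManager < handle
    % Sketch of a minimal run logger; names and layout are hypothetical
    properties
        runs = struct('params', {}, 'done', {}, 'result', {});
    end
    methods
        function id = addRun(obj, params)
            % params: a struct holding the input values of one run
            id = numel(obj.runs) + 1;
            obj.runs(id).params = params;
            obj.runs(id).done   = false;
            obj.runs(id).result = [];
        end
        function finishRun(obj, id, result)
            obj.runs(id).done   = true;
            obj.runs(id).result = result;
        end
        function n = openRuns(obj)
            % number of submitted runs that have not finished yet
            n = sum(~[obj.runs.done]);
        end
    end
end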
In total, the following questions arise:
Is there a built-in MATLAB function for managing simulations that I have totally missed so far?
Can the testing framework be used for this purpose?
Is there already some framework (not from MathWorks) that can handle this?
If I create my own class, could multiple instances run individually and keep track of their own progress? And would those be handled simultaneously, or would MATLAB end up running them in the order they were started?
I know this question is somewhat in the off-topic "recommend or find a tool, library or favorite off-site resource" area. If you feel it is too much so, please focus on the last question.
Thank you!
I've done similar tests using GUI elements. The basic part of the simulation was inside a while loop, for example in your case:
iter = 0;
iter_max = 5;  % the number of times you will call the script
accu_step = 2; % the accuracy of stored data
Alphabet = 'abcdefghijklmnopqrstuvwxyz';
while iter < iter_max
    iter = iter + 1;
    [x1,y1] = test(iter, Alphabet(iter));
end
Now you should create a handle to a progress bar inside your computation script. It will show you both which step you are on and the progress of the current step.
global h;
global iter_opt;
if isempty(h)
h=waitbar(0,'Solving...');
else
waitbar(t/t_end,h,sprintf('Solving... current step is:%d',iter));
end
You didn't specify which function you use; if it is, for example, time-dependent, then the above t/t_end is an estimate of the current progress.
The saving of the results also needs to change on every execution of the loop, for example:
global iter;
i_line = (t_end/accu_step+2)*(iter-1);
xlswrite('results.xlsx',{'ITERATION ', iter},sheet,strcat('A',num2str(i_line+5)))
xlswrite('results.xlsx',results_matrix(1:6),sheet,strcat('D',num2str(i_line+5)))
The above example also assumes that your results are time-related, so you store data every 2 units of time (days, hours, minutes, whatever you need), from t_0 to t_end, with an additional 2 rows of separation between steps. The number of columns is just an example; you can adjust it to your needs.
After the calculation is done, you can close waitbar with:
global h
close(h)
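Putting those pieces together without globals, a minimal sketch of the whole loop (reusing the test function from the question and logging one row per run) could look like this:
Alphabet = 'abcdefghijklmnopqrstuvwxyz';
iter_max = 5;
h = waitbar(0, 'Solving...');
for iter = 1:iter_max
    [number2x, next_letter] = test(iter, Alphabet(iter));
    waitbar(iter/iter_max, h, sprintf('Solving... finished step %d of %d', iter, iter_max));
    % inputs and outputs side by side, one row per run
    xlswrite('results.xlsx', {iter, Alphabet(iter), number2x, char(next_letter)}, 1, sprintf('A%d', iter));
end
close(h)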

Sending data to workers

I am trying to create a piece of parallel code to speed up the processing of a very large (couple of hundred million rows) array. In order to parallelise this, I chopped my data into 8 pieces (my number of cores) and tried sending each worker one piece. Looking at my RAM usage, however, it seems each piece is sent to every worker, effectively multiplying my RAM usage by 8. A minimal working example:
A = 1:16;
for ii = 1:8
data{ii} = A(2*ii-1:2*ii);
end
Now, when I send this data to the workers using parfor, it seems to send the full cell instead of just the desired piece:
output = cell(1,8);
parfor ii = 1:8
output{ii} = data{ii};
end
I actually use a function within the parfor loop, but this illustrates the case. Does MATLAB actually send the full cell data to each worker, and if so, how can I make it send only the desired piece?
In my personal experience, I found that using parfeval is better regarding memory usage than parfor. In addition, your problem seems to be quite breakable, so you can use parfeval to submit many smaller jobs to the MATLAB workers.
Let's say that you have workerCnt MATLAB workers which are going to handle jobCnt jobs. Let data be a cell array of size jobCnt x 1, where each of its elements corresponds to one data input for the function getOutput, which does the analysis on the data. The results are then stored in a cell array output of size jobCnt x 1.
In the following code, jobs are assigned in the first for loop and the results are retrieved in the second while loop. The boolean variable doneJobs indicates which jobs are done.
poolObj = parpool(workerCnt);
jobCnt = length(data); % number of jobs
output = cell(jobCnt,1);
for jobNo = 1:jobCnt
    future(jobNo) = parfeval(poolObj,@getOutput,...
        nargout('getOutput'),data{jobNo});
end
doneJobs = false(jobCnt,1);
while ~all(doneJobs)
    [idx,result] = fetchNext(future);
    output{idx} = result;
    doneJobs(idx) = true;
end
Also, you can take this approach one step further if you want to save even more memory. What you could do is, after fetching the results of a completed job, delete the corresponding element of future. The reason is that this object stores all the input and output data of the getOutput function, which is probably going to be huge. But you need to be careful, as deleting elements of future results in an index shift.
The following is the code I wrote for this purpose.
poolObj = parpool(workerCnt);
jobCnt = length(data); % number of jobs
output = cell(jobCnt,1);
for jobNo = 1:jobCnt
    future(jobNo) = parfeval(poolObj,@getOutput,...
        nargout('getOutput'),data{jobNo});
end
doneJobs = false(jobCnt,1);
while ~all(doneJobs)
    [idx,result] = fetchNext(future);
    future(idx) = []; % remove the done future object
    oldIdx = 0;
    % find the index offset and correct the index accordingly
    while oldIdx ~= idx
        doneJobsInIdxRange = sum(doneJobs((oldIdx + 1):idx));
        oldIdx = idx;
        idx = idx + doneJobsInIdxRange;
    end
    output{idx} = result;
    doneJobs(idx) = true;
end
The comment from @m.s is correct: when parfor slices an array, each worker is sent only the slice necessary for the loop iterations that it is working on. However, you might well see the RAM usage increase beyond what you originally expect, because unfortunately copies of the data are required as it is passed from the client to the workers via the parfor communication mechanism.
If you need the data only on the workers, then the best solution is to create/load/access it only on the workers, if possible. It sounds like you're after data parallelism rather than task parallelism, for which spmd is indeed a better fit (as @Kostas suggests).
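As a sketch of that idea (the per-worker chunk files and processChunk are hypothetical; the point is only that each worker loads and touches nothing but its own share):
spmd
    % hypothetical layout: the big array has been pre-split into one .mat file per worker
    chunk = load(sprintf('chunk_%02d.mat', labindex));
    localResult = processChunk(chunk.data);   % placeholder for the real analysis
end
% localResult is now a Composite; gather it on the client only if you really need it there
combined = cell(1, length(localResult));
for w = 1:length(localResult)
    combined{w} = localResult{w};
end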
I would suggest to use the spmd command of MATLAB.
You can write code almost as it would be for a non-parallel implementation and also have access to the current worker by the labindex "system" variable.
Have a look here:
http://www.mathworks.com/help/distcomp/spmd.html
And also at this SO question about spmd vs parfor:
SPMD vs. Parfor

advice with pointers in matlab

I am running a very large meta-simulation where I sweep two hyperparameters (let's say x and y), and for each set of hyperparameters (x_i and y_j) I run a modest-sized subsimulation. Thus:
for x = 1:I
    for y = 1:J
        subsimulation(x,y)
    end
end
For each subsimulation, however, about 50% of the data is common to every other subsimulation, i.e. subsimulation(x_1,y_1).commondata = subsimulation(x_2,y_2).commondata.
This is very relevant since so far the total simulation results file size is ~10 GB! Obviously, I want to save the common subsimulation data only once to save space. However, the obvious solution, saving it in one place, would break my plotting function, since it directly accesses subsimulation(x,y).commondata.
I was wondering whether I could do something like
subsimulation(x,y).commondata=% pointer to 1 location in memory %
If that can't work, what about this less elegant solution:
subsimulation(x,y).commondata='variable name' %string
and then adding
if(~isstruct(subsimulation(x,y).commondata)),
subsimulation(x,y).commondata=eval(subsimulation(x,y).commondata)
end
What solution do you guys think is best?
Thanks
DankMasterDan
You could do this fairly easily by defining a handle class. See also the documentation.
An example:
classdef SimulationCommonData < handle
    properties
        someData
    end
    methods
        function this = SimulationCommonData(someData)
            % Constructor
            this.someData = someData;
        end
    end
end
Then use it like this:
commonData = SimulationCommonData(something);
subsimulation(x, y).commondata = commonData;
subsimulation(x, y+1).commondata = commonData;
% These now point to the same reference (handle)
As per my comment, as long as you do not modify the common data, you can pass it as a third input and still not copy the array in memory on each iteration (a very good read is Internal Matlab memory optimizations). The memory-usage plot (not reproduced here) makes this clear: the first jump in memory is due to the creation of common and the second one to the allocation of the output c. If the data were copied on each iteration, you would see many more memory fluctuations, for instance a third jump, then a decrease, then back up again, and so on.
The code follows (I added a pause between iterations to make it clearer that no big jumps occur during the loop):
function out = foo(a,b,common)
out = a + b + common;
end

common = rand(5e7,1); % some large shared array (example size)
for ii = 1:10; c = foo(ii,ii+1,common); pause(2); end

Scope for improvement in this code

I have written the following code in MATLAB to process large images, of the order of 3000x2500 pixels. Currently the operation takes more than half an hour to complete. Is there any scope to improve the code so that it takes less time? I heard parallel processing can make things faster, but I have no idea how to implement it. How do I do it, given the following code?
function dirvar(subfn)
[fn,pn] = uigetfile({'*.TIF; *.tiff; *.tif; *.TIFF; *.jpg; *.bmp; *.JPG; *.png'}, ...
    'Select an image', '~/');
I = double(imread(fullfile(pn,fn)));
ld = input('Enter the lag distance = '); % prompt for lag distance
fh = eval(['@' subfn]); % function handle to the nested function below
I2 = uint8(nlfilter(I, [7 7], fh));
imshow(I2); % texture layer image
imwrite(I2,'result_mat.tif');
    % Zero-degree variogram
    function gamma = ewvar(I)
        c = (size(I)+1)/2; % central pixel of the moving window
        EW = I(c(1),c(2):end); % values from the central pixel to the margin of the window
        h = length(EW) - ld; % number of lags
        gamma = 1/(2 * h) * sum((EW(1:ld:end-1) - EW(2:ld:end)).^2);
    end
end
The input lag distance is usually 1.
You really need to use the profiler to get some improvements out of this. My first guess (as I haven't run the profiler, which you should, as already suggested) would be to use as few length operations as possible. Since you are processing every image with a [7 7] window, you can precalculate some parts so that you won't repeat these actions:
function dirvar(subfn)
[fn,pn] = uigetfile({'*.TIF; *.tiff; *.tif; *.TIFF; *.jpg; *.bmp; *.JPG; *.png'}, ...
    'Select an image', '~/');
I = double(imread(fullfile(pn,fn)));
ld = input('Enter the lag distance = '); % prompt for lag distance
fh = eval(['@' subfn]); % function handle
%% precalculations
wind = [7 7];
center = (wind+1)/2;     % central pixel of the moving window
EWlength = (wind(2)+1)/2;
h = EWlength - ld;       % number of lags
%% calculations
I2 = nlfilter(I, wind, fh);
imshow(I2); % texture layer image
imwrite(I2,'result_mat.tif');
    % Zero-degree variogram
    function gamma = ewvar(I)
        EW = I(center(1),center(2):end); % values from the central pixel to the margin of the window
        gamma = 1/(2 * h) * sum((EW(1:ld:end-1) - EW(2:ld:end)).^2);
    end
end
Note that by doing so, you trade some clarity of your code, and introduce coupling between the function dirvar and the nested function ewvar, in exchange for performance. However, since I haven't profiled your code (you should do that yourself, using your own inputs), only you can find out which lines of your code consume the most time.
For batch processing, I would also recommend leaving out any input, imshow, imwrite and uigetfile calls. Those are commands that you typically call from a more high-level function/script, and they will force you to enter these inputs even when you want them to stay the same. So instead of that code, make each of the variables they produce (or process) a parameter (or return value) of your function. That way you could leave MATLAB running over the weekend to process everything (without having to manually enter all those values), even if you are unable to speed up the code.
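As a sketch of that refactoring (the function and file names below are just examples), the interactive parts of dirvar become parameters while the processing stays the same:
function dirvar_batch(imageFile, outputFile, subfn, ld)
% Same computation as dirvar, but with all interactive input turned into parameters.
I  = double(imread(imageFile));
fh = eval(['@' subfn]);                 % handle to the nested function, as before
I2 = uint8(nlfilter(I, [7 7], fh));
imwrite(I2, outputFile);
    function gamma = ewvar(I)
        c  = (size(I)+1)/2;
        EW = I(c(1), c(2):end);
        h  = length(EW) - ld;
        gamma = 1/(2*h) * sum((EW(1:ld:end-1) - EW(2:ld:end)).^2);
    end
end

% driver, in a separate script: process a whole list of images unattended
files = {'img001.tif', 'img002.tif'};   % e.g. collected with uigetfile/uipickfiles beforehand
for k = 1:numel(files)
    dirvar_batch(files{k}, sprintf('result_%03d.tif', k), 'ewvar', 1);
end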
A few general purpose tricks:
1 - use the MATLAB profiler to determine all the computational bottlenecks
2 - Parallel processing can make things faster and there are a lot of tools you can use, but it depends on how your entire code is set up and whether the code is optimized for it. By far the easiest trick to learn is parfor, where you can replace the top-level for loop by parfor (see the sketch after this list). This does mean you must open the MATLAB pool with matlabpool open.
3 - If you have a rather recent Nvidia GPU as well as MATLAB 2011, you can also write some CUDA code.
All in all 30 mins to me is peanuts, so don't fret it too much.
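For point 2, the outer file loop from the sketch above (reusing the hypothetical dirvar_batch and files list) is exactly the kind of top-level loop that can become a parfor loop:
matlabpool open    % parpool in newer releases
parfor k = 1:numel(files)
    dirvar_batch(files{k}, sprintf('result_%03d.tif', k), 'ewvar', 1);
end
matlabpool close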
First of all, I strongly suggest you follow the advice by @Egon: write a separate function that collects a list of files (the excellent UIPICKFILES from the FEX is your friend here), and then run your filtering code in a loop for each image. Note that you should definitely keep the call to imwrite in your filtering code: in case the analysis crashes at image 48 (e.g. due to a power failure), you don't want to lose all the previous work.
Running in batch mode like this has two big advantages: (1) you can start running your code and go home for the weekend, and (2) you can easily parallelize this outer loop using PARFOR. However, with only a dual-core machine, it is unlikely that you will get any significant improvement from parallelization: your OS also wants to run stuff at times, and the overhead of parallelization might be more than the gain from running two workers. Also, 2.5 GB of RAM is seriously limiting.
As to your specific code: in my experience, using IM2COL is often faster than NLFILTER. im2col creates an nElementsInMask-by-nMasks array out of your image, so that you can apply the filtering in one single operation. With a 7x7 window, the output of im2col will be 3000*2500*49 bytes, which is close to 400 MB. Thus, it should just work. All that you need to do is rewrite ewvar so that it works on the 49x1 array of pixels that make up your mask, which will require some index juggling, if I understand your code correctly.
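A hedged sketch of what that im2col rewrite might look like for the zero-degree variogram with ld = 1 (two differences from nlfilter: cols holds 49 shifted copies of the image at once, so memory is the main cost, and im2col skips the image border, so the output is 6 rows/columns smaller instead of zero-padded):
wind = [7 7];
ld   = 1;
cols = im2col(I, wind, 'sliding');           % 49-by-nWindows, each column is one 7x7 window
% rows of 'cols' holding the centre pixel and the pixels east of it: (4,4),(4,5),(4,6),(4,7)
ewIdx = sub2ind(wind, repmat(4,1,4), 4:7);   % = [25 32 39 46]
EW    = cols(ewIdx, :);
h     = size(EW,1) - ld;                     % number of lags
gamma = 1/(2*h) * sum((EW(1:ld:end-1,:) - EW(2:ld:end,:)).^2, 1);
I2    = uint8(reshape(gamma, size(I) - wind + 1));   % one value per window centre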

MATLAB program takes more and more of my memory

I'm writing a program in MATLAB that takes a function, steps the value D from 10 to 100 (the for loop), integrates the function with Simpson's rule (the while loop) and then displays the result. Now, this works fine for the first 7-8 values, but then each value takes longer and longer and eventually I run out of memory, and I don't understand the reason for this. This is the code so far:
global D;
s = 200;
tolerance = 9*10^(-5);
for D = 10:1:100
    r = Simpson(@f,0,D,s);
    error = 1;
    while(error > tolerance)
        s = 2*s;
        error = (1/15)*(Simpson(@f,0,D,s)-r);
        r = Simpson(@f,0,D,s);
    end
    clear error;
    disp(r)
end
mtrw's comment probably already answers the question in part: s should be reinitialized inside the for loop. In the posted code, s increases irreversibly every time the error is too large, so for larger values of D the largest s reached so far is used as the starting point.
Additionally, since the code re-evaluates the entire integral instead of reusing the previous integration over [0, D-1], you waste a lot of resources, unless you explicitly want to check the error tolerance of your Simpson function for every D: s will have to increase a lot for large D to maintain the same low error (since you integrate over a larger range, you have to sum up more points).
Finally, your implementation of Simpson could of course do funny stuff as well, which no one can tell without seeing it...
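A sketch of the corrected loop (Simpson and f are the asker's own functions, not shown): s is reset for every D, the finer integration is evaluated only once per refinement, and the built-in name error is no longer shadowed.
global D;
tolerance = 9e-5;
for D = 10:1:100
    s = 200;                           % reset the subdivision count for every new D
    r = Simpson(@f, 0, D, s);
    err = Inf;
    while err > tolerance
        s    = 2*s;
        rNew = Simpson(@f, 0, D, s);   % evaluate the finer integration only once
        err  = abs(rNew - r)/15;       % Richardson-style estimate, as in the question
        r    = rNew;
    end
    disp(r)
end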