How Should I Parallelize My Genetic Algorithm Fitness Evaluation? - matlab

I have a GA code that I developed myself. Since I'm new to coding, my code is not fast. I have a Dual-Core CPU 2.6GHz.
The only line of the code that takes a long time to run is the fitness function. I am not familiar with the GA toolbox and my fitness function is quite complex so I assume even if I knew how to use the GA toolbox, I would have to code the fitness function myself.
The algorithm's structure is as follows:
After generating the initial generation and evaluating its fitness values (which takes long, but does not matter much because it only runs once), the code enters a loop that iterates up to 10000 times. Each iteration produces a new generation whose fitness values need to be calculated. So when a new generation of 50 individuals is generated, the whole generation is fed to fitness_function. Inside this function, a for loop calculates the fitness value of each of the 50 individuals (so the loop iterates 50 times). Here is my question: how should I use parfor so that 25 individuals are evaluated by one CPU core and the other 25 by the other core, cutting the calculation time almost in half? I already know from here
I have tried changing the for loop in fitness_function directly to parfor, and I received the following errors: "The PARFOR loop cannot run due to the way variable "Z" is used." and "Variable z is indexed in different ways. Potentially causing dependencies between iterations." Variable Z is a 50-by-3 matrix that stores the fitness values of the individuals.

The problem with your assignment into Z is that you have three different assignment statements, and that is not allowed. You need to make the assignment into Z meet the requirements for a "sliced" variable. The easiest way to do this is to make a temporary variable Zrow to store the values for the ith row of Z, and then make a single assignment, like this
parfor i = 1:50
    Zrow = zeros(1, 3); % allocate to ensure parfor knows this is a temporary
    ...
    Zrow(1) = TTT;
    ...
    Zrow(2) = sum(FSL,1);
    Zrow(3) = 0.5*Zrow(1)+0.5*Zrow(2); % read from the temporary, not from Z
    % Finally, make a single sliced assignment into Z
    Z(i, :) = Zrow;
end
Also, in general, it's best to have the parfor loop be the outermost one. Whether parfor actually gives you any speed-up also depends a lot on whether the body of the loop is already multithreaded by MATLAB's built-in multithreading. (If it is, then parfor on your local machine alone cannot make things faster, because the multithreaded code is already taking full advantage of your computer's resources.)
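A minimal, self-contained sketch of the temporary-row pattern is shown here. The population matrix pop and the two placeholder objectives are hypothetical stand-ins for the asker's TTT and FSL computations, which the post does not show. Without the Parallel Computing Toolbox, the parfor loop simply runs serially.

```matlab
% Hypothetical population: 50 individuals with 4 genes each (assumed sizes).
pop = rand(50, 4);
Z = zeros(50, 3);            % one row of fitness values per individual

parfor i = 1:50
    genes = pop(i, :);       % pop is indexed by i only, so it is sliced too
    Zrow = zeros(1, 3);      % temporary, local to this iteration
    Zrow(1) = sum(genes.^2); % placeholder for the first objective (TTT in the post)
    Zrow(2) = max(genes);    % placeholder for the second objective (sum(FSL,1) in the post)
    Zrow(3) = 0.5*Zrow(1) + 0.5*Zrow(2);
    Z(i, :) = Zrow;          % the only assignment into Z, in sliced form
end
```

Because Z is written exactly once per iteration and always as Z(i, :), MATLAB can classify it as a sliced output variable.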

Related

Iteration for convergence in Matlab without using a while loop

I have to iterate a process where I have an initial guess for the Mach number (M0). Using two equations, this initial guess gives me another guess for the Mach number (Mn). I want to repeat this process until the error between M0 and Mn is small. I have the following piece of code, and it actually works well with a while loop.
However, I am afraid that the while loop will need many iterations and a lot of computation time for certain inputs, since this will be part of a bigger code which will most likely feed infeasible inputs to the while loop.
Therefore my question is the following: how can I iterate this process in MATLAB without using a while loop? The code that I am implementing now is the following:
%% Input
gamma = 1.4;
theta = atan(0.315);
cpi = -0.732;
%% Loop
M0 = 0.2; %initial guess
Err = 100;
iterations = 0;
while Err > 0.5E-3
    B = (1-(M0^2)*(1-M0*cpi))^0.5;
    Mn = (((gamma+1)/2) * ((B+((1-cpi)^0.5)*sec(theta)-1)^2/(B^2 + (tan(theta))^2)) - ((gamma-1)/2) )^-0.5;
    Err = abs(M0 - Mn);
    M0 = Mn;
    iterations=iterations+1;
end
disp(iterations)
disp(Mn)
Many thanks
Since M0 is recalculated in each iteration from the previous result, and you have trigonometric functions, you cannot avoid an iterative structure (i.e. while).
If M0 increased or decreased by a fixed step, you could initialize a vector of M0 values and compute B and Err with vector operations.
But with sec and tan this is not possible.
Another way would be to use parallel processing. But since M0 changes at each iteration, you cannot use a parfor loop.
As for a for loop, in MATLAB the for statement needs an array to iterate over (e.g. 1:10 or 1:length(x), or i = A where A = 1:10 or A = [1:10;11:20]). Since you evaluate a condition and decide from the result whether to continue, the while loop (or do-while in another language) seems to be the only way to go.
I think you need to clarify the issue. If the issue you want to solve is that some inputs take a long time to calculate, it is not the while loop itself that takes the time; it is executing the code many times. Any looping method will be bounded by the time the block of code takes to execute multiplied by the number of iterations required to converge.
You can add a condition that stops after a certain number of iterations, conceptually:
while ((err > tolerance) && (numIterations < limit))
If you want an answer that does not require iterating, this is akin to finding a closed-form solution, and I suspect one does not exist.
Edit to add: by "does not exist" I mean in a practical form that can be implemented more efficiently than iterating to a solution.
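As a rough sketch of the iteration cap suggested above, applied to the equations from the question: the limit maxIter = 200 and the converged flag are assumptions added for illustration, not part of the original code.

```matlab
% Inputs taken from the question.
gamma = 1.4;
theta = atan(0.315);
cpi   = -0.732;

M0      = 0.2;      % initial guess
tol     = 0.5e-3;   % convergence tolerance from the question
maxIter = 200;      % assumed safety limit, not from the original post
Err     = Inf;
iterations = 0;

% Stop on convergence OR when the iteration budget is used up.
while (Err > tol) && (iterations < maxIter)
    B  = (1 - (M0^2)*(1 - M0*cpi))^0.5;
    Mn = (((gamma+1)/2) * ((B + ((1-cpi)^0.5)*sec(theta) - 1)^2 / (B^2 + (tan(theta))^2)) - ((gamma-1)/2))^-0.5;
    Err = abs(M0 - Mn);
    M0  = Mn;
    iterations = iterations + 1;
end

converged = (Err <= tol);   % false means the cap was hit (or Err became NaN)
if ~converged
    warning('No convergence after %d iterations (last error %g).', iterations, Err);
end
```

The calling code can then check converged and discard or penalise infeasible inputs instead of waiting on a loop that may never terminate.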

Running internal matlab functions in parallel within existing matlabpool with "-singleCompThread"

I'm implementing an adaptive (approximate) matrix-vector multiplication for very large systems (known sparsity structure) - see Predicting runtime of parallel loop using a-priori estimate of effort per iterand (for given number of workers) for a more long-winded description. I first determine the entries I need to calculate for each block, but even though the entries are only a small subset, calculating them directly (with quadrature) would be impossibly expensive. However, they are characterised by an underlying structure (the difference of their respective modulations) which means I only need to calculate the quadrature once per "equivalence class", which I get by calling unique on a large 2xN matrix of differences (and then mapping back to the original entries).
Unfortunately, this 2xN matrix becomes so large in practice that the call to unique is becoming somewhat of a bottleneck in my code. It is still orders of magnitude faster than calculating the quadrature redundantly, but annoying nevertheless, since it could run faster in principle.
The problem is that the cluster on which I compute requires the -singleCompThread option, so that Matlab doesn't spread where it shouldn't. This means that unique is forced to use only one core, even though I could arrange it within the code that it is called serially (as this procedure must be completed for all relevant blocks).
My search for a solution has led me to the function maxNumCompThreads, but it is deprecated and will be removed in a future release (aside from throwing warnings every time it's called), so I didn't pursue it further.
It is also possible to pass a function to a batch job and specify a cluster and a pool size it should run on (e.g. j=batch(cluster,@my_unique,3,{D,'cols'},'matlabpool',127); this is 2013a; in 2013b, the key for 'matlabpool' changed to 'Pool'), but the problem is that batch opens a new pool. In my current setup, I can have a permanently open pool on the cluster, and it would take a lot of unnecessary time to always open and shut pools for batch (aside from the fact that the maximal size of the pool I could open would decrease).
What I'd like is to call unique in such a way that it takes advantage of the currently open matlabpool, without requesting new pools or submitting jobs to the cluster.
Any ideas? Or is this impossible?
Best regards,
Axel
P.S. It is completely unfathomable to me why the standard set functions in MATLAB have a 'rows' option but not a 'cols' option, especially since this would "cost" about 5 lines of code within each function. This is the reason for my_unique:
function varargout=my_unique(a,elem_type,varargin)
% Adapt unique to be able to deal with columns as well
% Inputs:
%   a:
%       Set of which the unique values are sought
%   elem_type (optional, default='scalar'):
%       Parameter determining which kind of unique elements are sought.
%       Possible arguments are 'scalar', 'rows' and 'cols'.
%   varargin (optional):
%       Any valid combination of optional arguments that can be passed to
%       unique (with the exception of 'rows' if elem_type is either 'rows'
%       or 'cols')
%
% Outputs:
%   varargout:
%       Same outputs as unique
    if nargin < 2; elem_type='scalar'; end
    if ~any(strcmp(elem_type,{'scalar','rows','cols'}))
        error('Unknown Flag')
    end
    varargout=cell(1,max(nargout,1));
    switch (elem_type)
        case 'scalar'
            [varargout{:}]=unique(a,varargin{:});
        case 'rows'
            [varargout{:}]=unique(a,'rows',varargin{:});
        case 'cols'
            % unique has no 'cols' option: transpose, use 'rows', transpose back
            [varargout{:}]=unique(transpose(a),'rows',varargin{:});
            varargout=cellfun(@transpose,varargout,'UniformOutput',false);
    end
end
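To make the "one evaluation per equivalence class, then map back" idea concrete, here is a small sketch using the same transpose trick as the 'cols' branch. The matrix D and the placeholder computation sum(U.^2, 1) are hypothetical; they only stand in for the real differences and the expensive quadrature.

```matlab
% Small hypothetical 2xN matrix of differences; columns 1 and 3 are identical.
D = [1 2 1 3;
     4 5 4 6];

% Column-wise unique via transpose + 'rows'.
[uRows, ~, ic] = unique(D.', 'rows');
U = uRows.';                 % one column per equivalence class

% Pretend each class needs one expensive evaluation (placeholder computation).
classValue = sum(U.^2, 1);

% Map the per-class results back to every original column.
valuePerColumn = classValue(ic.');
```

The third output of unique (ic) is what maps each original column to its class, so U(:, ic) reproduces D.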
I have not tried the example you cited above, but you could try blockproc to do block processing. Note, however, that it belongs to the Image Processing Toolbox.
Leaving aside the 'rows' problem for the time being, if I've understood correctly, what you're after is a way to use an open parallel pool to do a large call to 'unique'. One option may be to use distributed arrays. For example, you could do:
spmd
    A = randi([1 100], 1e6, 2, codistributor()); % codistributed, already transposed to Nx2
    r = unique(A, 'rows'); % operates in parallel
end
This works because sortrows is implemented for codistributed arrays. You'll find that you only get speedup from (co)distributed arrays if you can arrange for the data always to live on the cluster, and also when the data is so large that processing it on one machine is infeasible.

how to create a changing variable for fsolve

I want fsolve to calculate the output for a different uc each time (increasing uc by 0.001 each time). Each output from fsolve should be sent to a Simulink model separately. So I set up a loop to do this, but I believe that the current setup (if it works at all) will just calculate 1000 different values. Is there a way to send out the values separately?
If not, how can I create a parameter uc that goes from 0 to, say, 1000? I tried uc=0:0.001:1000, but again the dimensions don't seem to fit.
How do I create a function that takes the next element of a vector/matrix each time it is called?
Best regards
The general approach to iterating over an array of values and feeding them one-by-one into a series of evaluations of a function follows this form:
for ix = 0:0.1:10
    func(arg1, arg2, ix)
end
See how each call to func includes the current value of ix? On the first iteration ix==0, on the next ix==0.1, and so forth. You should be able to adapt this to your needs; in your code the loop index (which you call i) is not used inside the loop.
Now some un-asked-for criticism of your code. The lines
x0=[1,1,1];
y=x0(1);
u=x0(2);
yc=x0(3);
options=optimset('Display','off');
do not change as the loop iterations advance; they always return the same values whatever the value of the loop iterator (i in your code) may be. It is pointless including them inside the loop.
Leaving them inside the loop may even be a waste of a lot of time if Matlab decides to calculate them at every iteration. I'm not sure what Matlab does in this case, it may be smart enough to figure out that these values don't change at each iteration, but even if it does it is bad programming practice to write your code this way; lift constant expressions such as these out of loops.
It's not clear from the fragment you've posted why you have defined y, u and yc at all, they're not used anywhere; perhaps they're used in other parts of your program.
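Putting both points together (pass the current loop value into the solve, and keep constant setup outside the loop), a sketch might look like the following. The residual function is purely hypothetical, since the asker's equations are not shown, and fsolve requires the Optimization Toolbox.

```matlab
% Hypothetical residual: three equations in x = [y; u; yc], parameterised by uc.
residual = @(x, uc) [x(1) - uc; x(2)^2 - 4; x(3) - x(1) - 1];

% Loop-invariant setup stays outside the loop.
x0 = [1; 1; 1];
options = optimset('Display', 'off');

ucValues  = 0:0.001:0.01;                % short range for illustration
solutions = zeros(numel(ucValues), 3);   % one row per uc value

for k = 1:numel(ucValues)
    uc = ucValues(k);
    % Bind the current uc so fsolve sees a function of x only.
    xSol = fsolve(@(x) residual(x, uc), x0, options);
    solutions(k, :) = xSol.';
end
```

Each row of solutions then holds the result for one uc value and can be handed on individually.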

Parfor in MATLAB Problem

Why can't I use the parfor in this piece of code?
parfor i=1:r
    for j=1:N/r
        xr(j + (N/r) * (i-1)) = x(i + r * (j-1));
    end
end
This is the error:
Error: The variable xr in a parfor cannot be classified.
See Parallel for Loops in MATLAB, "Overview".
The issue here is improper indexing of the sliced array. parfor iterations run asynchronously, so the order in which they execute is not deterministic. From the documentation:
MATLAB workers evaluate iterations in no particular order, and independently of each other. Because each iteration is independent, there is no guarantee that the iterations are synchronized in any way, nor is there any need for this.
You can easily verify the above statement by typing the following in the command line:
parfor i=1:100
    i
end
You'll see that the ordering is arbitrary. Hence if you split a parallel job between different workers, one worker has no way of telling if a different iteration has finished or not. Hence, your variable indexing cannot depend on past/future values of the iterator.
Let me demonstrate this with a simple example. Consider the Fibonacci series 1,1,2,3,5,8,.... You can generate the first 10 terms of the series easily (in a naïve for loop) as:
f=zeros(1,10);
f(1:2)=1;
for i=3:10
    f(i)=f(i-1)+f(i-2);
end
Now let's do the same with a parfor loop.
f=zeros(1,10);
f(1:2)=1;
parfor i=3:10
    f(i)=f(i-1)+f(i-2);
end
??? Error: The variable f in a parfor cannot be classified.
See Parallel for Loops in MATLAB, "Overview"
But why does this give an error?
I've shown that iterations are executed in an arbitrary order. So let's say that a worker gets the loop index i=7 and the expression f(i)=f(i-1)+f(i-2);. It is now supposed to execute the expression and return the results to the master node. Now has iteration i=6 finished? Is the value stored in f(6) reliable? What about f(5)? Do you see what I'm getting at? Supposing f(5) and f(6) are not done, then you'll incorrectly calculate that the 7th term in the Fibonacci series is 0!
Since MATLAB has no way of telling if your calculation can be guaranteed to run correctly and reproduce the same result each time, such ambiguous assignments are explicitly disallowed.
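For the loop in the question, the iterations are in fact independent; what parfor objects to is the computed index j + (N/r)*(i-1), which is not a valid sliced form (the loop variable must appear as a plain i, or i plus or minus a constant, in one subscript). One possible workaround, which is an illustration and not taken from the answer above, is to fill a 2-D array row by row and reshape afterwards. The sizes N = 12 and r = 3 are assumed for the example.

```matlab
N = 12;  r = 3;              % example sizes (assumed); N must be divisible by r
x = 1:N;
cols = N / r;

% Reference result from the original nested loops, run serially.
xrRef = zeros(1, N);
for i = 1:r
    for j = 1:cols
        xrRef(j + cols*(i-1)) = x(i + r*(j-1));
    end
end

% parfor version: each iteration fills exactly one row, indexed as X(i, :).
X = zeros(r, cols);
parfor i = 1:r
    X(i, :) = x(i + r*(0:cols-1));   % x is a broadcast variable here
end
xr = reshape(X.', 1, []);    % concatenate the rows in order

% Loop-free equivalent for comparison.
xrVec = reshape(reshape(x, r, cols).', 1, []);
```

For a pure re-ordering like this one, the loop-free reshape is likely to be faster than any parfor version, since the work per iteration is tiny compared with the cost of distributing it.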

Parallelize or vectorize all-against-all operation on a large number of matrices?

I have approximately 5,000 matrices with the same number of rows and varying numbers of columns (20 x ~200). Each of these matrices must be compared against every other in a dynamic programming algorithm.
In this question, I asked how to perform the comparison quickly and was given an excellent answer involving a 2D convolution. Serially, iteratively applying that method, like so
list = who('data_matrix_prefix*')
H = cell(numel(list),numel(list));
for i=1:numel(list)
    for j=1:numel(list)
        if i ~= j
            eval([ 'H{i,j} = compare(' char(list(i)) ',' char(list(j)) ');']);
        end
    end
end
is fast for small subsets of the data (e.g. for 9 matrices, 9*9 - 9 = 72 calls are made in ~1 s, 870 calls in ~2.5 s).
However, operating on all the data requires almost 25 million calls.
I have also tried using deal() to make a cell array composed entirely of the next element in data, so I could use cellfun() in a single loop:
% who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
nextData = cell(k,1);
for i=1:k
    [nextData{:}] = deal(data{i});
    H(:,i) = cellfun(@compare,data,nextData,'UniformOutput',false);
end
Unfortunately, this is not really any faster, because all the time is in compare(). Both of these code examples seem ill-suited for parallelization. I'm having trouble figuring out how to make my variables sliced.
compare() is totally vectorized; it uses matrix multiplication and conv2() exclusively (I am under the impression that all of these operations, including the cellfun(), should be multithreaded in MATLAB?).
Does anyone see a (explicitly) parallelized solution or better vectorization of the problem?
Note
I realize both my examples are inefficient - the first would be twice as fast if it calculated a triangular cell array, and the second is still calculating the self comparisons, as well. But the time savings for a good parallelization are more like a factor of 16 (or 72 if I install MATLAB on everyone's machines).
Aside
There is also a memory issue. I used a couple of evals to append each column of H into a file, with names like H1, H2, etc. and then clear Hi. Unfortunately, the saves are very slow...
Does
compare(a,b) == compare(b,a)
and
compare(a,a) == 1
hold? If so, change your loop
for i=1:numel(list)
    for j=1:numel(list)
        ...
    end
end
to
for i=1:numel(list)
    for j= i+1 : numel(list)
        ...
    end
end
and handle the symmetric and identity cases separately. This will cut your calculation time roughly in half.
The second example can easily be sliced for use with the Parallel Computing Toolbox. This toolbox distributes iterations of your code among up to 8 local workers (the limit in releases of that era). If you want to run the code on a cluster, you also need MATLAB Distributed Computing Server (since renamed MATLAB Parallel Server).
%# who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
H = cell(k,k); %# preallocate the result
parfor i=1:k-1 %# this will run the loop in parallel with the Parallel Computing Toolbox
    %# only make the necessary comparisons, collecting them in a temporary column
    %# (assigning directly to H(i+1:k,i) is not a valid sliced index in parfor)
    hSlice = cell(k,1);
    hSlice(i+1:k) = cellfun(@compare,data(i+1:k),repmat(data(i),k-i,1),'UniformOutput',false);
    H(:,i) = hSlice; %# single sliced assignment into H
end
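A toy version of this column-at-a-time pattern that can be run as-is is sketched below. The data sizes and compareFcn are hypothetical stand-ins (the real compare is not shown in the question); only the slicing structure is the point. Without the Parallel Computing Toolbox, the parfor loop runs serially.

```matlab
% Toy data: k matrices with 20 rows and varying column counts (assumed sizes).
k = 6;
data = cell(k, 1);
for n = 1:k
    data{n} = rand(20, 5 + n);
end

% Hypothetical stand-in for the asker's compare(): peak of a 2-D cross-correlation.
compareFcn = @(a, b) max(max(conv2(a, rot90(b, 2))));

H = cell(k, k);              % only the strict lower triangle is filled
parfor i = 1:k-1
    hSlice = cell(k, 1);     % temporary column for this iteration
    hSlice(i+1:k) = cellfun(compareFcn, data(i+1:k), ...
        repmat(data(i), k-i, 1), 'UniformOutput', false);
    H(:, i) = hSlice;        % single sliced assignment
end
```

Note that the iterations are unevenly sized (column 1 does k-1 comparisons, column k-1 does one), so load balance across workers may be worth checking on the real data.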
If I understand correctly, you have to perform 5000^2 matrix comparisons? Rather than trying to parallelise the compare function, perhaps you should think of your problem as being composed of 5000^2 tasks. The MATLAB Parallel Computing Toolbox supports task-based parallelism. Unfortunately, my experience with PCT is with parallelisation of large linear-algebra problems, so I can't really tell you much more than that. The documentation will undoubtedly help you more.