How can I incorporate a for loop into my genetic algorithm? - matlab

I'm doing a genetic algorithm that attempts to find an optimized solution over the course of 100 generations. My code as is will find 2 generations. I'm trying to find a way to add a for loop in order to repeat the process for the full duration of 100 generations.
clc,clear
format shorte
k=80;
mu=50;
s=.05;
c1=k+(4/3)*mu;
c2=k-(2/3)*mu;
for index=1:50 %6 traits generated at random 50 times
a=.005*rand-.0025;
b=.005*rand-.0025;
c=.005*rand-.0025;
d=.005*rand-.0025;
e=.005*rand-.0025;
f=.005*rand-.0025;
E=[c1,c2,c2,0,0,0;
c2,c1,c2,0,0,0;
c2,c2,c1,0,0,0;
0,0,0,mu,0,0;
0,0,0,0,mu,0;
0,0,0,0,0,mu];
S=[a;d;f;2*b;2*c;2*e];
G=E*S;
g=G(1);
h=G(2);
i=G(3);
j=G(4);
k=G(5);
l=G(6);
F=[(g-h)^2+(h-i)^2+(i-g)^2+6*(j^2+k^2+l^2)];
PI=((F-(2*s^2))/(2*s^2))^2; %cost function, fitness assessed
RP(index,:)=[a,b,c,d,e,f,PI]; %initial random population
end
Gen1=sortrows(RP,7,{'descend'}); %the initial population ranked
%for loop 1:100 would start here
children=zeros(10,6); %10 new children created from the top 20 parents
babysitter=1;
for parent=1:2:20
theta=rand(1);
traita=theta*Gen1(parent,1)+(1-theta)*Gen1(1+parent,1);
theta=rand(1);
traitb=theta*Gen1(parent,2)+(1-theta)*Gen1(1+parent,2);
theta=rand(1);
traitc=theta*Gen1(parent,3)+(1-theta)*Gen1(1+parent,3);
theta=rand(1);
traitd=theta*Gen1(parent,4)+(1-theta)*Gen1(1+parent,4);
theta=rand(1);
traite=theta*Gen1(parent,5)+(1-theta)*Gen1(1+parent,5);
theta=rand(1);
traitf=theta*Gen1(parent,6)+(1-theta)*Gen1(1+parent,6);
children(babysitter,:)=[traita,traitb,traitc,traitd,traite,traitf];
babysitter=babysitter+1;
end
top10parents=Gen1(1:10,1:6);
Gen1([11:50],:)=[]; %bottom 40 parents removed
for newindex=1:30 %6 new traits generated randomly 30 times
newa=.005*rand-.0025;
newb=.005*rand-.0025;
newc=.005*rand-.0025;
newd=.005*rand-.0025;
newe=.005*rand-.0025;
newf=.005*rand-.0025;
newgenes(newindex,:)=[newa,newb,newc,newd,newe,newf];
end
nextgen=[top10parents;children;newgenes]; %top 10 parents, the 10 new children, and the new 30 random traits added into one new matrix
for new50=1:50
newS=[nextgen(new50,1);nextgen(new50,4);nextgen(new50,6);2*nextgen(new50,2);2*nextgen(new50,3);2*nextgen(new50,5)];
newG=E*newS;
newg=newG(1);
newh=newG(2);
newi=newG(3);
newj=newG(4);
newk=newG(5);
newl=newG(6);
newF=[(newg-newh)^2+(newh-newi)^2+(newi-newg)^2+6*(newj^2+newk^2+newl^2)]; %von-Mises criterion
newPI=((newF-(2*s^2))/(2*s^2))^2; %fitness assessed for new generation
PIcolumn(new50,:)=[newPI];
end
nextgenwPI=[nextgen,PIcolumn]; %pi column added to nextgen matrix
Gen2=sortrows(nextgenwPI,7,{'descend'}) %generation 2 ranked
So my question is: how can I get the generations to count themselves so that the for loop works? I've searched for an answer and read that numbering matrices dynamically (Gen1, Gen2, ...) is not a good idea. However, I'm not sure how else to do this besides creating a genN matrix whose number increases by 1 after the first generation. Any suggestions?
Thank you
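One possible way to avoid numbered matrices is to keep a single population matrix and overwrite it on every pass of the loop. The sketch below is only an illustration under a few assumptions: the names pop, popSize, nGen, newTraits and bestPI are hypothetical, the fitness is evaluated for all rows at once instead of row by row, and the ranking keeps the 'descend' order from the code above (switch to 'ascend' if PI is meant to be minimized).

```matlab
% Sketch only: one population matrix 'pop' is overwritten every generation.
k  = 80;  mu = 50;  s = 0.05;
c1 = k + (4/3)*mu;
c2 = k - (2/3)*mu;
E  = [c1 c2 c2 0 0 0; ...
      c2 c1 c2 0 0 0; ...
      c2 c2 c1 0 0 0; ...
      0 0 0 mu 0 0; ...
      0 0 0 0 mu 0; ...
      0 0 0 0 0 mu];

popSize = 50;                                  % hypothetical name
nGen    = 100;                                 % hypothetical name
newTraits = @(n) 0.005*rand(n,6) - 0.0025;     % n random rows [a b c d e f]

pop    = newTraits(popSize);                   % initial random population
bestPI = zeros(nGen,1);                        % top-ranked PI per generation

for gen = 1:nGen
    % Fitness of every individual (one column of S per individual)
    S = [pop(:,1) pop(:,4) pop(:,6) 2*pop(:,2) 2*pop(:,3) 2*pop(:,5)].';
    G = E*S;
    F = (G(1,:)-G(2,:)).^2 + (G(2,:)-G(3,:)).^2 + (G(3,:)-G(1,:)).^2 ...
        + 6*(G(4,:).^2 + G(5,:).^2 + G(6,:).^2);
    PI = ((F - 2*s^2) / (2*s^2)).^2;

    % Rank the current generation (same order as in the question)
    ranked = sortrows([pop PI.'], 7, 'descend');
    bestPI(gen) = ranked(1,7);

    % 10 children from the top 20 parents, one random weight per trait
    children = zeros(10,6);
    for c = 1:10
        theta = rand(1,6);
        children(c,:) = theta.*ranked(2*c-1,1:6) + (1-theta).*ranked(2*c,1:6);
    end

    % Next generation: top 10 parents, 10 children, 30 new random rows
    pop = [ranked(1:10,1:6); children; newTraits(30)];
end
```

If every generation has to be kept, a cell array (for example history{gen} = ranked) is one option that still avoids creating separately named variables.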

Related

MATLAB is too slow calculating Spearman's rank correlation for 9-element vectors

I need to calculate Spearman's rank correlation (using the corr function) for pairs of vectors of different lengths (for example, from 5-element vectors to 20-element vectors). There are usually more than 300 pairs for each length. I track the progress with waitbar. I have noticed that it takes an unusually long time for pairs of 9-element vectors, whereas for other lengths (both greater and smaller) it is very fast. Since the formula is exactly the same, the problem must originate in the MATLAB function corr.
I wrote the following code to verify that the problem is with the corr function and not with the other calculations I perform besides corr, all of which (including corr) take place inside 2 or 3 nested for loops. The code repeats the timing 50 times to avoid accidental results.
The result is a bar graph confirming that MATLAB takes a long time to calculate Spearman's rank correlation for 9-element vectors. Since my calculations are not that heavy, this does not cause an endless wait; it just increases the total time of the whole process. Can someone tell me what causes the problem and how to avoid it?
Times1 = zeros(20,50);
for i = 5:20
for j = 1:50
tic
A = rand(i,2);
[r,p] = corr(A(:,1),A(:,2),'type','Spearman');
Times1(i,j) = toc;
end
end
Times2 = mean(Times1,2);
bar(Times2);
xticks(1:25);
xlabel('number of elements in vectors');
ylabel('average time');
After some investigation, I think I found the root of this very interesting problem. My tests have been conducted profiling every outer iteration using the built-in Matlab profiler, as follows:
res = cell(20,1);
for i = 5:20
profile clear;
profile on -history;
for j = 1:50
uni = rand(i,2);
corr(uni(:,1),uni(:,2),'type','Spearman');
end
profile off;
p = profile('info');
res{i} = p.FunctionTable;
end
The produced output looks like this:
The first thing I noticed is that the Spearman correlation for matrices with a number of rows less than or equal to 9 is computed in a different way than for matrices with 10 or more rows. For the former, the functions being internally called by the corr function are:
Function                   Number of Calls
-----------------------    ---------------
'factorial'                100
'tiedrank>tr'              100
'tiedrank'                 100
'corr>pvalSpearman'        50
'corr>rcumsum'             50
'perms>permsr'             50
'perms'                    50
'corr>spearmanExactSub'    50
'corr>corrPearson'         50
'corr>corrSpearman'        50
'corr'                     50
'parseArgs'                50
'parseArgs'                50
For the latter, the functions being internally called by the corr function are:
Function                   Number of Calls
-----------------------    ---------------
'tiedrank>tr'              100
'tiedrank'                 100
'corr>AS89'                50
'corr>pvalSpearman'        50
'corr>corrPearson'         50
'corr>corrSpearman'        50
'corr'                     50
'parseArgs'                50
'parseArgs'                50
Since the computation of the Spearman correlation for matrices with 10 or more rows runs quickly and shows no evidence of a performance bottleneck, I did not investigate that path any further and focused on the main concern: the small matrices.
I tried to understand the difference between the execution time of the whole process for a matrix with 5 rows and for one with 9 rows (the one notably showing the worst performance). This is the code I used:
res5 = res{5,1};
res5_tt = [res5.TotalTime];
res5_tt_perc = ((res5_tt ./ sum(res5_tt)) .* 100).';
res9_tt = [res{9,1}.TotalTime];
res9_tt_perc = ((res9_tt ./ sum(res9_tt)) .* 100).';
res_diff = res9_tt_perc - res5_tt_perc;
[~,res_diff_sort] = sort(res_diff,'descend');
tab = [cellstr(char(res5.FunctionName)) num2cell([res5_tt_perc res9_tt_perc res_diff])];
tab = tab(res_diff_sort,:);
tab = cell2table(tab,'VariableNames',{'Function' 'TT_M5' 'TT_M9' 'DIFF'});
And here is the result:
Function                   TT_M5                TT_M9                 DIFF
_______________________    _________________    __________________    __________________
'corr>spearmanExactSub'    7.14799963478685     16.2879721171023      9.1399724823154
'corr>pvalSpearman'        7.98185309750143     16.3043118970503      8.32245879954885
'perms>permsr'             3.47311716905926     8.73599255035966      5.26287538130039
'perms'                    4.58132952553723     8.77488502392486      4.19355549838763
'corr>corrSpearman'        15.629476293326      16.440893059217       0.811416765890929
'corr>rcumsum'             0.510550019981949    0.0152486312660671    -0.495301388715882
'factorial'                0.669357868472376    0.0163923929871943    -0.652965475485182
'parseArgs'                1.54242684137027     0.0309456171268161    -1.51148122424345
'tiedrank>tr'              2.37642998160463     0.041010720272735     -2.3354192613319
'parseArgs'                2.4288171135289      0.0486075856244615    -2.38020952790444
'corr>corrPearson'         2.49766877262937     0.0484657591710417    -2.44920301345833
'tiedrank'                 3.16762535118088     0.0543584195582888    -3.11326693162259
'corr'                     21.8214856092549     16.5664346332513      -5.25505097600355
Once the bottleneck was detected, I started analyzing the internal code (open corr) and I finally found the cause of the problem. Within the spearmanExactSub, this part of code is being executed (where n is the number of rows of the matrix):
n = arg1;
nfact = factorial(n);
Dperm = sum((repmat(1:n,nfact,1) - perms(1:n)).^2, 2);
All permutations of the vector 1:n are being computed, which produces a matrix with factorial(n) rows. This is what increases the computational complexity (and, obviously, the computation time) of the function. Other operations, such as the subsequent repmat that replicates 1:n factorial(n) times and the ones below that point, make the situation worse. Now, long story short...
factorial(5) = 120
factorial(6) = 720
factorial(7) = 5040
factorial(8) = 40320
factorial(9) = 362880
Can you see why, between 5 and 9, your bar graph shows an "exponentially" increasing computation time? The number of permutations grows factorially with n, and from 10 elements onward corr switches to the AS89 routine shown above, so the time drops again.
On a side note, there is nothing you can do to solve this problem unless you find another implementation of the Spearman correlation that doesn't have the same bottleneck, or you implement your own.
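As a rough illustration of what "implementing your own" could look like, here is a minimal sketch. It assumes the data contain no ties (true with probability 1 for rand data) and it replaces the exact permutation p-value with the common t-distribution approximation, which is a different technique from the one corr uses for small samples and is only approximate for such small n. The variable names are hypothetical, and tcdf requires the Statistics Toolbox, like corr itself.

```matlab
% Sketch: Spearman's rho without enumerating permutations (no ties assumed)
x = [1; 2; 3; 4; 5];                 % hypothetical example data
y = [2; 1; 4; 3; 5];
n = numel(x);

% Ranks via sort (valid only when there are no tied values)
[~, ix] = sort(x);  rx = zeros(n,1);  rx(ix) = (1:n)';
[~, iy] = sort(y);  ry = zeros(n,1);  ry(iy) = (1:n)';

% Classic formula based on squared rank differences
d   = rx - ry;
rho = 1 - 6*sum(d.^2) / (n*(n^2 - 1));

% Approximate two-sided p-value from a t distribution with n-2 degrees
% of freedom (an approximation, not the exact permutation distribution)
tstat   = rho * sqrt((n-2) / (1 - rho^2));
pApprox = 2 * tcdf(-abs(tstat), n-2);
```

Whether the approximation is acceptable depends on how the p-values are used; for very small samples it can differ noticeably from the exact value.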

Animated plot of infectious disease spread with for loop (Matlab)

I'm a beginner in Matlab and I'm trying to model the spread of an infectious disease using Matlab. However, I encounter some problems.
At first, I define the matrices that need to be filled and their initial status:
diseasematrix=zeros(20,20);
inirow=10;
inicol=10;
diseasematrix(inirow,inicol)=1; % The first place where a sick person is
infectionmatrix=zeros(20,20); % Infected people, initially all 0
healthymatrix=round(rand(20,20)*100); % Initial healthy population (randomly)
Rate=0.0001; % Rate of spread
Now I want to make a plot that shows the spread of the disease, using a for loop. But I'm stuck here...
for t=1:365
Zneighboursum=zeros(size(diseasematrix));
out_ZT = calc_ZT(Zneighboursum, diseasematrix);
infectionmatrix(t) = round((Rate).*(out_ZT));
diseasematrix(t) = diseasematrix(t-1) + infectionmatrix(t-1);
healthymatrix(t) = healthymatrix(t-1) - infectionmatrix(t-1);
imagesc(diseasematrix(t));
title(sprintf('Day %i',t));
drawnow;
end
This basically says that the infectionmatrix is calculated with the formula in the loop, and the diseasematrix is calculated by adding the sick people of the previous time step to the people infected in the previous time step. The remaining healthy people are calculated by subtracting the infected people from the healthy people of the previous time step. The variable out_ZT is the output of a function I made:
function [ZT] = calc_ZT(Zneighboursum, diseasematrix)
Zneighboursum = Zneighboursum + circshift(diseasematrix,[1 0]);
Zneighboursum = Zneighboursum + circshift(diseasematrix,[0 1]);
ZT=Zneighboursum;
end
This is to quantify the number of sick people around a central cell.
However, the result is not what I want. The plot does not evolve dynamically and the values don't seem to be right. Can anyone help me?
Thanks in advance!
There are several problems with the code:
1) (Rate).*(out_ZT) is not an error by itself: .* accepts a scalar and a matrix, so it gives the same result as Rate*out_ZT. The code below uses a single *.
2) infectionmatrix, diseasematrix and healthymatrix are all 2-dimensional matrices. Indexing them with (t) addresses a single element, and at t=1 the index t-1 is 0, which is invalid. To keep every time step in memory you would need a 3-dimensional array, but since you don't use the stored values later you can simply overwrite the old matrix.
3) You store integers in the infectionmatrix because you calculate it with round(). With such a small Rate, that always sets the result to zero.
4) The value of Rate was too low to see any result, so I increased it to 0.01.
5) (Just a cautionary point) healthymatrix is updated but never used in the infection calculation.
The code for the function is fine, so after debugging according to what I perceived, here's the code:
diseasematrix=zeros(20,20);
inirow=10;
inicol=10;
diseasematrix(inirow,inicol)=1; % The first place where a sick person is
infectionmatrix=zeros(20,20); % Infected people, initially all 0
healthymatrix=round(rand(20,20)*100); % Initial healthy population (randomly)
Rate=0.01;
for t=1:365
Zneighboursum=zeros(size(diseasematrix));
out_ZT = calc_ZT(Zneighboursum, diseasematrix);
infectionmatrix = (Rate*out_ZT);
diseasematrix = diseasematrix + infectionmatrix;
healthymatrix = healthymatrix - infectionmatrix;
imagesc(diseasematrix);
title(sprintf('Day %i',t));
drawnow;
end
There are several problems:
1) If you want to keep every time step, you need a 3D array,
so you have to replace myvariable(t) with myvariable(:,:,t).
2) Why did you use round? If you round a value < 0.5 the result will be 0, so nothing will change in your loop.
3) You need to define the initial condition (t=1) and then start your loop at t=2.
diseasematrix=zeros(20,20);
inirow=10;
inicol=10;
diseasematrix(inirow,inicol)=1; % The first place where a sick person is
infectionmatrix =zeros(20,20); % Infected people, initially all 0
healthymatrix=round(rand(20,20)*100); % Initial healthy population (randomly)
Rate=0.01; % Rate of spread
for t=2:365
Zneighboursum=zeros(size(diseasematrix,1),size(diseasematrix,2));
out_ZT = calc_ZT(Zneighboursum, diseasematrix(:,:,t-1));
infectionmatrix(:,:,t) = (Rate).*(out_ZT);
diseasematrix(:,:,t) = diseasematrix(:,:,t-1) + infectionmatrix(:,:,t-1);
healthymatrix(:,:,t) = healthymatrix(:,:,t-1) - infectionmatrix(:,:,t-1);
imagesc(diseasematrix(:,:,t));
title(sprintf('Day %i',t));
drawnow;
end
IMPORTANT: circshift wraps the matrix around, so cells on one edge are treated as neighbours of the opposite edge (a periodic boundary).
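The wrap-around can be seen on a tiny matrix. The sketch below also shows one possible alternative, a neighbour sum with conv2 and zero padding, for the case where the grid should not be periodic; the kernel and variable names are assumptions for illustration and are not part of the code above.

```matlab
% A single sick cell in the top-left corner of a 3x3 grid
M = zeros(3,3);
M(1,1) = 1;

% circshift wraps: shifting rows up by one moves row 1 to row 3
wrapped = circshift(M, [-1 0]);

% Non-periodic alternative (assumption): 4-neighbour sum with zero padding
kernel = [0 1 0; 1 0 1; 0 1 0];
neighbourSum = conv2(M, kernel, 'same');

disp(wrapped);
disp(neighbourSum);
```

With conv2 the corner cell only contributes to the two cells next to it, while with circshift it also reaches the opposite edges.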

Matlab forecasting with autoregressive exogenous model

I have a file with the energy consumption of a house.
There is one value (in watts) every 10 minutes:
10:00 123
10:10 125
10:20 0
...
This means each day has 144 values (rows).
I want to forecast the energy consumption of the next day with an ARX and an ARMAX program. I wrote the ARX code in Matlab, but I can't forecast the next day. My code takes the last 5 consumption values and forecasts the 6th one. How can I forecast the next 144 values (= the day after)?
% ARX Process----------------------------
L=length(u_in)
u_in_ID=u_in;% Input data used for Identification
u_in_vfy=u_in;% Input data used for verification
y_out_ID=y_out;% Output data used for Identification
y_out_vfy=y_out;%Output data used for verification
m=5; %Parameter to be used to generate order of delay for Input, Output and Error
n=length(y_out_ID)-m;
I=eye(n,1)+1;
I(1)=I(1)-1;
A=I; % Initialize Matrix A
Y=y_out_ID((m+1):end); % Defining Y vector
length(Y)
na=1;
% Put output delay 1 to m-na in A matrix
for k=1:1:m-na
A=[A y_out_ID((m-k+1):(end-k))];
end
% Put "Current Input -- mth delayed Input" to Matrix A
for p=1:1:m
k=p-1;
A=[A u_in_ID((m-k+1):(end-k))];
end
A(:,1)=[]; % Delete 1st column of Matrix A, which was used to Initialize it
parsol=A\Y; % least-squares solution, numerically safer than inv(A'*A)*A'*Y
BB=A*parsol;
% Generate Identified Output vector based on previous
% outputs, current and previous Inputs and Parameters solved by Least
% square method
n=length(y_out_vfy)-m;
I=eye(n,1)+1;
I(1)=I(1)-1;
A=I;
for k=1:1:m-na
A=[A y_out_vfy((m-k+1):(end-k))];
end
for p=1:1:m
k=p-1;
A=[A u_in_vfy((m-k+1):(end-k))];
end
A(:,1)=[]; % Delete 1st column of Matrix A, which was used to Initialize it
A;
y_out_sysID=A*parsol;
Can anyone help me?
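One common way to get more than one step ahead is to forecast recursively: predict the next value, append it to the history, and repeat 144 times. The sketch below is only an illustration under strong assumptions: it uses synthetic data, it drops the exogenous input (a pure AR model with a constant term), because an ARX forecast would also need the future inputs to be known or forecast, and the names y, p, theta, horizon and forecast are hypothetical.

```matlab
% Synthetic stand-in for the measured consumption (assumption)
rng(0);
t = (1:1440)';                               % ten days of 10-minute samples
y = 100 + 50*sin(2*pi*t/144) + randn(size(t));

p = 5;                                       % AR order (same role as m above)
N = numel(y);

% Regression matrix: the row for time k holds [y(k-1) ... y(k-p) 1]
A = ones(N-p, p+1);
for lag = 1:p
    A(:,lag) = y(p+1-lag:N-lag);
end
theta = A \ y(p+1:N);                        % least-squares AR parameters

% Recursive forecast: feed each prediction back in as the newest value
horizon  = 144;                              % one day ahead
history  = y(end:-1:end-p+1);                % most recent value first
forecast = zeros(horizon,1);
for h = 1:horizon
    forecast(h) = [history' 1] * theta;
    history = [forecast(h); history(1:end-1)];
end
```

Because each prediction is built on earlier predictions, errors accumulate over the horizon, so the later values of the day should be treated with more caution than the first ones.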

flip coin 100 times get exactly 50 Matlab

If I flip a coin 100 times, what is the probability that exactly 50 will be heads? My thoughts were to get the number of times exactly 50 appeared in the 100 coin flips out of 1000 times and divide that by 1000, the number of events.
I have to model this experiment in Matlab.
I understand that flipping a coin 100 times and retrieving the number of heads and adding a count to the number of exactly 50 heads is one event. But I do not know how to repeat that event 1000, or 10000 times.
Here is the code I have written so far:
total_flips=100;
heads=0;
tails=0;
n=0;
for z=1:1000
%tosses 100 coins
for r=1:100
%randomizes to choose 1 or 0, 0 being heads
coin=floor(2*rand(1));
if (coin==0)
heads=heads+1;
else
tails=tails+1;
end
end
if heads==50
n=n+1;
end
end
I have tried to encompass the for loop and the if statement within a for loop, but had no luck. How do I repeat it?
Although your problem is solved, here are some comments on your code:
1) You set the variable total_flips=100 but never use it; the inner for-loop runs from 1 to 100 and could run from 1 to total_flips instead.
2) Omitting for-loops: this was not your question, but your code can be vectorized. You do not need a single for-loop for this problem:
repetitions = 1000;
total_flips = 100;
coin_flip_matrix = floor(2*rand(total_flips, repetitions)); % all coin flips: one column per repetition
num_of_heads = sum(coin_flip_matrix); % number of ones in each repetition (shaped: 1 x repetitions)
n = sum(num_of_heads == 50) % how often did we hit 50?
You don't need tails at all, and you need to set heads back to zero inside the outer for z=1:1000 loop.
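For reference, a loop version with the counter reset in the right place might look like the sketch below. It also compares the estimate with the exact binomial probability C(100,50)/2^100, computed through gammaln to avoid the precision warning that nchoosek gives for numbers this large. The names repetitions, pEstimate and pExact are assumptions, and the fixed seed is only there to make the run repeatable.

```matlab
rng(1);                          % fixed seed, only for repeatability
repetitions = 10000;
total_flips = 100;
n = 0;                           % experiments with exactly 50 heads

for z = 1:repetitions
    heads = 0;                   % reset for every experiment
    for r = 1:total_flips
        if rand < 0.5            % treat this outcome as heads
            heads = heads + 1;
        end
    end
    if heads == 50
        n = n + 1;
    end
end

pEstimate = n / repetitions;     % simulated probability

% Exact value: nchoosek(100,50) / 2^100, evaluated in log space
half   = total_flips/2;
pExact = exp(gammaln(total_flips+1) - 2*gammaln(half+1) - total_flips*log(2));
```

The exact probability is about 0.0796, so the simulated estimate should land close to 8%.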

Selection operator and minimizing the fitness function in Genetic Algorithms

I'm developing a nurse rostering tool in MATLAB using genetic algorithms, without using the GA toolbox.
The individual is a weekly schedule, represented as a 2-D array with one row per nurse and seven columns (one per day of the week).
The fitness function takes the entire population and returns an array with size equal to the population size containing fitness values.
The fitness function should be minimized so the best schedule is the one having the lowest fitness value.
My fitness function is:
function fitness_values =Fitness_Function( thePopulation)
%UNTITLED Summary of this function goes here
% Detailed explanation goes here
[Ar1 Ar2 popsize num_nur] = Return_Data( 0,0,0,0 );
[prev_sched OffArr]=Return_Data1(0,0);
constraints=cell(popsize,1);
fitness_values=zeros(popsize,1);
size=[1 7];
c1=zeros(popsize,1);
c1values=cell(popsize,1);
W1=0.25; %hard
W2=0.25; %hard
W3=0.25; %hard
W4=0.125; %soft
W5=0.125; %soft
for i=1: popsize
c1values{i}=zeros(size);
end
% Checking Constraint c1 (the difference between night and day shifts in
% each day of the schedule
for i=1:popsize
for j=1:7
day_sum=0;
night_sum=0;
for k=1:num_nur
if thePopulation{i}(k,j)==1
day_sum=day_sum+1;
elseif thePopulation{i}(k,j)==2
night_sum=night_sum+1;
end
end
abs_diff=abs(day_sum-night_sum);
c1values{i}(1,j)=abs_diff.^2;
end
c1(i)=sum(c1values{i}(1,:));
%celldisp(c1values);
%defining the array that will hold the result of multiplying the number of
%violations with the corresponding weight, a cell array where each cell
%contains num_nur rows and 4 columns for c2, c3, c4 and c5.
nurse_fitness=zeros(num_nur,1);
for in=1:popsize
constraints{in}=zeros(num_nur,4);
end
for j=1:num_nur
v2=0;
v3=0;
v4=0;
%check violations with the previous schedule(the last day of the
%previous schedule with the first day of the evaluated schedule
% c2
if prev_sched(j,7)==2 && thePopulation{i}(j,1)==1
v2=v2+1;
end
% c3
%check the last day of the previous schedule
if prev_sched(j,7)==1 && thePopulation{i}(j,1)==1 && thePopulation{i}(j,2)~=3
v3=v3+1;
%check the last 2 days of the previous schedule
elseif prev_sched(j,6)==1 &&prev_sched(j,7)==1 && thePopulation{i}(j,2)~=3
v3=v3+1;
end
%c4
%check the last day of the previous schedule
if prev_sched(j,7)==2 && thePopulation{i}(j,1)==3 &&thePopulation{i}(j,2)==1
v4=v4+1;
%check the last 2 days of the previous schedule
elseif prev_sched(j,6)==2 &&prev_sched(j,7)==3 && thePopulation{i}(j,2)==1
v4=v4+1;
end
%check violations of constraints c2,c3 and c4 in the
%evaluated schedule
for k=1:6
%check violations of c2 N->N or N->O (hard)
if thePopulation{i}(j,k)==2 && thePopulation{i}(j,k+1)==1
v2=v2+1;
end
end
%check violations of c3 D->D->O (hard)
for k=1:5
if thePopulation{i}(j,k)==1 && thePopulation{i}(j,k+1)==1 && thePopulation{i}(j,k+2)~=3
v3=v3+1;
end
%check violations of c4 N->O->N or N->O->O (soft)
if thePopulation{i}(j,k)==2 && thePopulation{i}(j,k+1)==3 && thePopulation{i}(j,k+2)==1
v4=v4+1;
end
end
constraints{i}(j,1)=v2*W2;
constraints{i}(j,2)=v3*W3;
constraints{i}(j,3)=v4*W4;
%check violations of c5 (preferences of each nurse)
offdays=find(thePopulation{i}(j,:)==3);
%disp(offdays);
%disp(OffArr(j,:));
%find intersection between the preferences and the days off in the
%schedule of each nurse
inters=intersect(offdays,OffArr(j,:));
num_inters=length(inters);
if(length(offdays)==1)
%for head nurse
if num_inters==1
constraints{i}(j,4)=0;
else
constraints{i}(j,4)=3*W5;
end
else
penalty=3-num_inters;
constraints{i}(j,4)=penalty*W5;
end
nurse_fitness(j)=sum(constraints{i}(j,:));
end
%calculating the fitness value for the whole schedule
fitness_values(i)=W1*c1(i)+sum(nurse_fitness);
end
end
I'll summarize how it works: it takes a cell array (the population) in which each cell contains a schedule, represented as a matrix with one row per nurse and 7 columns (weekly schedule). The problem has 3 hard constraints and 2 soft constraints, so the fitness function checks for violations of these constraints in each schedule. A violation is penalized by multiplying the number of violations for each nurse by the corresponding weight of the constraint, so the final fitness value is the sum of the penalty values of all nurses (plus the weighted c1 term). Finally, the fitness value of the evaluated schedule is saved in an array of fitness values, at the same index at which the evaluated schedule is stored in the population array.
My question is what is the suitable selection operator to select parents for crossover and mutation operators?
Points missing from your question:
what is the meaning of a value in the array of an "individual"?
what are the constraints that can be violated?
Does an "individual" mean both a Genotype and a Phenotype?
I think you could also illustrate these with a simple example, and for others' best understanding can you please use the GA terminology?
Up to this point I can only give a general answer. Generally I think it is better to search only through non-violating individuals. What I would do is avoid a complicated fitness function. I would rather have the phenotype always be a non-violating solution that can be quickly calculated from a (possibly violating) genotype. Maybe the genotype should not capture the whole problem, but just give a starting point for a simple allocation algorithm that produces the phenotype.
If you have non-violating chromosomes, the mutations should have only a light effect, leading to similar solutions. Your chromosomes will potentially be some sort of permutations, and the mutations could be transpositions on these. The children produced by crossover should jump away from the parent solutions while preserving some of their characteristics. For permutation-type chromosomes you can find standard crossover operators.
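On the selection question itself, one operator that is often used when the fitness has to be minimized is tournament selection, because it only compares fitness values and needs no scaling or inversion. The sketch below is a generic illustration, not something taken from the question or the answer above; the fitness values are made up, and tournamentSize, numParents and parents are hypothetical names.

```matlab
rng(2);                                            % fixed seed for repeatability
fitness_values = [4.5; 1.25; 3.0; 0.5; 2.75; 6.0]; % made-up values, lower is better
popsize        = numel(fitness_values);

tournamentSize = 2;                                % individuals per tournament
numParents     = 4;                                % how many parents to pick
parents        = zeros(numParents,1);              % indices into the population

for s = 1:numParents
    % Draw random competitors (with replacement)
    candidates = randi(popsize, tournamentSize, 1);
    % The competitor with the LOWEST fitness wins, since we minimize
    [~, best] = min(fitness_values(candidates));
    parents(s) = candidates(best);
end
```

A larger tournamentSize increases the selection pressure towards the best schedules, while a size of 2 keeps more diversity in the population.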