Please help me improve the following MATLAB code to reduce its execution time.
I want to create a random matrix of size [8,12,10] in which every row contains only integer values between 1 and 12. In every column, each of the values 1, 2, 3 and 4 must appear either exactly twice or not at all.
The following code should make this clearer, but it is very slow.
Can anyone give me a suggestion?
clc
clear all
jum_kel=8
jum_bag=12
uk_pop=10
for ii=1:uk_pop;
for a=1:jum_kel
krom(a,:,ii)=randperm(jum_bag); %batasan tidak boleh satu kelompok melakukan lebih dari satu aktivitas dalam satu waktu (constraint: one group may not do more than one activity at the same time)
end
end
for ii=1:uk_pop;
gab1(:,:,ii) = sum(krom(:,:,ii)==1)
gab2(:,:,ii) = sum(krom(:,:,ii)==2)
gab3(:,:,ii) = sum(krom(:,:,ii)==3)
gab4(:,:,ii) = sum(krom(:,:,ii)==4)
end
for jj=1:uk_pop;
gabh1(:,:,jj)=numel(find(gab1(:,:,jj)~=2& gab1(:,:,jj)~=0))
gabh2(:,:,jj)=numel(find(gab2(:,:,jj)~=2& gab2(:,:,jj)~=0))
gabh3(:,:,jj)=numel(find(gab3(:,:,jj)~=2& gab3(:,:,jj)~=0))
gabh4(:,:,jj)=numel(find(gab4(:,:,jj)~=2& gab4(:,:,jj)~=0))
end
for ii=1:uk_pop;
tot(:,:,ii)=gabh1(:,:,ii)+gabh2(:,:,ii)+gabh3(:,:,ii)+gabh4(:,:,ii)
end
for ii=1:uk_pop;
while tot(:,:,ii)~=0;
for a=1:jum_kel
krom(a,:,ii)=randperm(jum_bag); %batasan tidak boleh satu kelompok melakukan lebih dari satu aktivitas dalam satu waktu (constraint: one group may not do more than one activity at the same time)
end
gabb1 = sum(krom(:,:,ii)==1)
gabb2 = sum(krom(:,:,ii)==2)
gabb3 = sum(krom(:,:,ii)==3)
gabb4 = sum(krom(:,:,ii)==4)
gabbh1=numel(find(gabb1~=2& gabb1~=0));
gabbh2=numel(find(gabb2~=2& gabb2~=0));
gabbh3=numel(find(gabb3~=2& gabb3~=0));
gabbh4=numel(find(gabb4~=2& gabb4~=0));
tot(:,:,ii)=gabbh1+gabbh2+gabbh3+gabbh4;
end
end
Some general suggestions:
Name variables in English. Give a short explanation if it is not immediately clear
what they are intended for. What is jum_bag, for example? To me, uk_pop is a music style.
Write comments in English, even if you develop source code only for yourself.
If you ever have to share your code with a foreigner, you will spend a lot of time
explaining or re-translating. I would like to know, for example, what
%batasan tidak boleh means. Perhaps you describe here that this is only a quick
hack and that someone should really check it again before it goes into production.
Specific to your code:
It's really easy to confuse gab1 with gabh1 or gabb1.
To me, krom is too similar to the built-in function kron. In fact, I first
thought that you were computing lots of tensor products.
gab1 .. gab4 are probably best combined into an array or a cell array, e.g. you
could use
gab = cell(1, 4);
for ii = ...
gab{1}(:,:,ii) = sum(krom(:,:,ii)==1);
gab{2}(:,:,ii) = sum(krom(:,:,ii)==2);
gab{3}(:,:,ii) = sum(krom(:,:,ii)==3);
gab{4}(:,:,ii) = sum(krom(:,:,ii)==4);
end
The advantage is that you can rewrite the comparisons with another loop.
It also helps when computing gabh1, gabb1 and tot later on.
If you further introduce a variable like highestNumberToCompare, you only have to
make one change when you inevitably find out that it's important to check whether the
elements are equal to 5 and 6, too.
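The cell-and-loop idea can be taken one step further. The following is only a sketch, assuming the sizes from the question; the names counts and violations are hypothetical and do not appear in the original code. It keeps one count per value, column and page in a single array, so raising highestNumberToCompare is the only change needed to check 5 and 6 as well.

```matlab
% Sizes taken from the question; highestNumberToCompare is the one knob to change.
jum_kel = 8; jum_bag = 12; uk_pop = 10;
highestNumberToCompare = 4;

% Random start population: every row of every page is a permutation of 1..jum_bag.
krom = zeros(jum_kel, jum_bag, uk_pop);
for ii = 1:uk_pop
    for a = 1:jum_kel
        krom(a, :, ii) = randperm(jum_bag);
    end
end

% counts(v, c, ii): how often value v occurs in column c of page ii.
counts = zeros(highestNumberToCompare, jum_bag, uk_pop);
for v = 1:highestNumberToCompare
    counts(v, :, :) = sum(krom == v, 1);
end

% violations(ii): number of (value, column) pairs on page ii whose count
% is neither 0 nor 2. This plays the role of tot in the question.
violations = squeeze(sum(sum(counts ~= 2 & counts ~= 0, 1), 2));
```

A page ii satisfies the constraint exactly when violations(ii) is 0.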
Add a semicolon at the end of every command. Printing that much output is annoying and
also slow.
The numel(find(gabb1 ~= 2 & gabb1 ~= 0)) is better expressed as
sum(gabb1(:) ~= 2 & gabb1(:) ~= 0). A find is not needed because you do not care
about the indices themselves, only about how many there are, which is equal to the number
of true values.
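A small illustration of that equivalence, using made-up column counts (the variable names viaFind, viaSum and viaNnz are only for this example). The built-in nnz is a third way to count the true entries:

```matlab
gabb1 = [2 0 1 3 2 0];                 % example column counts
mask = gabb1 ~= 2 & gabb1 ~= 0;        % true where the count is neither 0 nor 2

viaFind = numel(find(mask));           % builds an index list first
viaSum  = sum(mask(:));                % counts the true entries directly
viaNnz  = nnz(mask);                   % built-in "number of nonzeros"
```

All three variables hold the same number; only the first one allocates an intermediate index vector.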
And of course: This code
for ii=1:uk_pop
gab1(:,:,ii) = sum(krom(:,:,ii)==1)
end
is really, really slow. In every iteration, you increase the size of the gab1
array, which means that MATLAB has to i) allocate more memory, ii) copy the old array
and iii) write the new page. This is much faster if you set the size of the
gab1 array before the loop:
gab1 = zeros(... final size ...);
for ii=1:uk_pop
    gab1(:,:,ii) = sum(krom(:,:,ii)==1);
end
You should probably also rethink the size and shape of gab1. I don't think you
need a full 3D array here, because sum() already collapses one dimension (if krom is
3D, the output of sum() has only two non-singleton dimensions).
You can probably skip the loop altogether and use a simple sum(krom==1, 1) instead,
which sums down the columns of every page at once.
However, in every case you should be really aware of the size and shape of your
results.
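To check the size and shape concretely, here is a sketch that compares the preallocated loop with a single call to sum along the first dimension. The sizes are the ones from the question; gab1_loop, gab1_vec and sameResult are hypothetical names used only here.

```matlab
A = 8; B = 12; C = 10;
krom = zeros(A, B, C);
for ii = 1:C
    for a = 1:A
        krom(a, :, ii) = randperm(B);
    end
end

% Loop version with preallocation: one 1-by-B row of counts per page.
gab1_loop = zeros(1, B, C);
for ii = 1:C
    gab1_loop(:, :, ii) = sum(krom(:, :, ii) == 1, 1);
end

% Vectorized version: sum along dimension 1 of the whole 3D array.
gab1_vec = sum(krom == 1, 1);

sameResult = isequal(gab1_loop, gab1_vec);
```

Both results are 1-by-12-by-10, so squeeze(gab1_vec) gives a plain 12-by-10 matrix if a 2D shape is preferred.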
Edit inspired by Rody Oldenhuis:
As Rody pointed out, the 'problem' with your code is that it's highly unlikely (though
not impossible) that you create a matrix which fulfills your constraints by assigning
the numbers randomly. The code below creates a matrix temp with the following characteristics:
The numbers 1 .. maxNumber appear either twice per column or not at all.
All rows are a random permutation of the numbers 1 .. B, where B is equal to
the length of a row (i.e. the number of columns).
Finally, the temp matrix is used to fill a 3D array called result. I hope you can adapt it to your needs.
clear all;
A = 8; B = 12; C = 10;
% The numbers [1 .. maxNumber] have to appear exactly twice in a
% column or not at all.
maxNumber = 4;
result = zeros(A, B, C);
for ii = 1 : C
    temp = zeros(A, B);
    for number = 1 : maxNumber
        forbiddenRows = zeros(1, A);
        forbiddenColumns = zeros(1, A/2);
        for count = 1 : A/2
            illegalIndices = true;
            while illegalIndices
                illegalIndices = false;
                % Draw a column which has not been used for this number.
                randomColumn = randi(B);
                while any(ismember(forbiddenColumns, randomColumn))
                    randomColumn = randi(B);
                end
                % Draw two rows which have not been used for this number.
                randomRows = randi(A, 1, 2);
                while randomRows(1) == randomRows(2) ...
                        || any(ismember(forbiddenRows, randomRows))
                    randomRows = randi(A, 1, 2);
                end
                % Make sure not to overwrite previous non-zeros.
                if any(temp(randomRows, randomColumn))
                    illegalIndices = true;
                    continue;
                end
            end
            % Mark the rows and column as forbidden for this number.
            forbiddenColumns(count) = randomColumn;
            forbiddenRows((count - 1) * 2 + (1:2)) = randomRows;
            temp(randomRows, randomColumn) = number;
        end
    end
    % Now every row contains the numbers [1 .. maxNumber] by
    % construction. Fill the zeros with a permutation of the
    % interval [maxNumber + 1 .. B].
    for count = 1 : A
        mask = temp(count, :) == 0;
        temp(count, mask) = maxNumber + randperm(B - maxNumber);
    end
    % Store this page.
    result(:,:,ii) = temp;
end
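The two characteristics listed above can also be checked after the fact. The helpers below are hypothetical (they are not part of the answer) and are shown on tiny hand-made pages rather than on the 8-by-12 pages, so that the outcome is easy to follow by eye.

```matlab
% Hypothetical helpers (not from the answer) that test one 2D page.
rowsArePermutations = @(page) isequal(sort(page, 2), ...
    repmat(1:size(page, 2), size(page, 1), 1));
countsAreZeroOrTwo = @(page, maxNumber) all(arrayfun( ...
    @(v) all(ismember(sum(page == v, 1), [0 2])), 1:maxNumber));

% Tiny hand-made pages instead of the 8-by-12 ones, to keep it readable.
goodPage = [1 2 3; 1 3 2];   % value 1 sits twice in column 1
badPage  = [1 2 3; 2 1 3];   % value 1 sits once in columns 1 and 2

goodOk = rowsArePermutations(goodPage) && countsAreZeroOrTwo(goodPage, 1);
badOk  = rowsArePermutations(badPage)  && countsAreZeroOrTwo(badPage, 1);
```

Here goodOk is true and badOk is false. For the generated array, the same two helpers would be called on result(:,:,ii) with maxNumber for every ii.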
OK, the code below will improve the timing significantly. It's not perfect yet, it can all be optimized a lot further.
But, before I do so: I think what you want is fundamentally impossible.
So you want
all rows contain the numbers 1 through 12, in a random permutation
any value between 1 and 4 must be present either twice or not at all in any column
I have a hunch this is impossible (that's why your code never completes), but let me think about this a bit more.
Anyway, my 5-minute-and-obvious-improvements-only-version:
clc
clear all
jum_kel = 8;
jum_bag = 12;
uk_pop = 10;
A = jum_kel; % renamed to make language independent
B = jum_bag; % and a lot shorter for readability
C = uk_pop;
krom = zeros(A, B, C);
for ii = 1:C;
for a = 1:A
krom(a,:,ii) = randperm(B);
end
end
gab1 = sum(krom == 1);
gab2 = sum(krom == 2);
gab3 = sum(krom == 3);
gab4 = sum(krom == 4);
gabh1 = sum( gab1 ~= 2 & gab1 ~= 0 );
gabh2 = sum( gab2 ~= 2 & gab2 ~= 0 );
gabh3 = sum( gab3 ~= 2 & gab3 ~= 0 );
gabh4 = sum( gab4 ~= 2 & gab4 ~= 0 );
tot = gabh1+gabh2+gabh3+gabh4;
for ii = 1:C
ii
while tot(:,:,ii) ~= 0
for a = 1:A
krom(a,:,ii) = randperm(B);
end
gabb1 = sum(krom(:,:,ii) == 1);
gabb2 = sum(krom(:,:,ii) == 2);
gabb3 = sum(krom(:,:,ii) == 3);
gabb4 = sum(krom(:,:,ii) == 4);
gabbh1 = sum(gabb1 ~= 2 & gabb1 ~= 0);
gabbh2 = sum(gabb2 ~= 2 & gabb2 ~= 0);
gabbh3 = sum(gabb3 ~= 2 & gabb3 ~= 0);
gabbh4 = sum(gabb4 ~= 2 & gabb4 ~= 0);
tot(:,:,ii) = gabbh1+gabbh2+gabbh3+gabbh4;
end
end
I'm writing a program that finds the indices of a matrix G where there is only a single 1 in either a column or a row, and removes any found index that has a single 1 in both its column and its row. Then I want to take these indices and use them as indices into an array U, which is where the trouble starts. The indices do not seem to be stored as integers, and I'm not sure what they are being stored as or why. I'm quite new to MATLAB (but that's probably obvious), so I don't really understand how types work in MATLAB or how they're assigned. So I'm not sure why I'm getting the error message mentioned in the title, and I'm not sure what to do about it. Any assistance you can provide would be greatly appreciated.
I forgot to mention this before, but G is a matrix that only contains 1s or 0s and U is an array of strings (I think that would be called a cell?).
function A = ISClinks(U, G)
B = [];
[rownum,colnum] = size(G);
j = 1;
for i=1:colnum
s = sum(G(:,i));
if s == 1
B(j,:) = i;
j = j + 1;
end
end
for i=1:rownum
s = sum(G(i,:));
if s == 1
if ismember(i, B)
B(B == i) = [];
else
B(j,:) = i;
j = j+1;
end
end
end
A = [];
for i=1:size(B,1)
s = B(i,:);
A(i,:) = U(s,:);
end
end
This is the problem code, but I'm not sure what's wrong with it.
A = [];
for i=1:size(B,1)
s = B(i,:);
A(i,:) = U(s,:);
end
Your program seems to be structured as though it had been written in a language like C. In MATLAB, you can usually replace low-level loops with vectorized functions (e.g. sum() along a dimension, or any()). Your function could be written more efficiently as:
function A = ISClinks(U, G)
% Find columns and rows that contain exactly one 1, as in the original loops
% (G must be square so that both vectors have the same length)
active_columns=sum(G,1)==1;
active_rows=sum(G,2).'==1;
% (Optional) Prevent columns and rows with same index from being simultaneously set
%exclusive_active_columns = active_columns & ~active_rows; %not needed; this line is only for illustrative purposes
%exclusive_active_rows = active_rows & ~active_columns; %same as above
% Merge column state vector and row state vector by XORing them
active_indices=xor(active_columns,active_rows);
% Select appropriate rows of matrix U
A=U(active_indices,:);
end
This function does not cause errors with the example input matrices I tested. If U is a cell array (e.g. U={'Lorem','ipsum'; 'dolor','sit'; 'amet','consectetur'}), then return value A will also be a cell array.
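To make the selection concrete, here is a small worked example. It is a sketch only: the matrix G is made up, U is the sample cell array mentioned above, and "a single 1" is read as a row or column sum equal to 1. The names single_columns and single_rows are hypothetical.

```matlab
G = [0 1 0; 0 0 0; 1 0 1];
U = {'Lorem','ipsum'; 'dolor','sit'; 'amet','consectetur'};

single_columns = sum(G, 1) == 1;      % [1 1 1]: every column holds one 1
single_rows    = sum(G, 2).' == 1;    % [1 0 0]: only row 1 holds one 1
active_indices = xor(single_columns, single_rows);   % [0 1 1]

A = U(active_indices, :);             % rows 2 and 3 of U, still a cell array
```

Index 1 is dropped because both its column and its row contain a single 1, which matches what the loops in the question do for this input.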
I have code with two for loops, and it works properly. The problem is that, at the end, I would like to get a variable megafinal with the results for all the years. The original variable A has 3M rows, and I get an error because the size of megafinal changes with each loop iteration and MATLAB stops running the code. I guess it's a problem of inefficiency. Does anyone know a way to get this final variable despite its size?
y = 1997:2013;
for i=1:length(y)
A=b(cell2mat(b(:,1))==y(i),:);
%Obtain the absolute value of the difference
c= cellfun(@minus,A(:,3),A(:,4));
c=abs(c);
c= num2cell(c);
A(:,end+1) = c;
%Delete rows based on a condition
d = (abs(cell2mat(A(:,8)) - cell2mat(A(:,7))));
[~, ind1] = sort(d);
e= A(ind1(end:-1:1),:);
[~, ind2,~] = unique(strcat(e(:,2),e(:, 6)));
X= e(ind2,:);
(…)
for j = 2:length(X)
if strcmp(X(j,2),X(j-1,2)) == 0
lin2 = j-1;
%Sort
X(lin1:lin2,:) = sortrows(X(lin1:lin2,:),13);
%Rank
[~,~,f]=unique([X{lin1:lin2,13}].');
g=accumarray(f,(1:numel(f))',[],@mean);
X(lin1:lin2,14)=num2cell(g(f));
%Score
out1 = 100 - ((cell2mat(X(lin1:lin2,14))-1) ./ size(X(lin1:lin2,:),1))*100;
X(lin1:lin2,15) = num2cell(out1);
lin1 = j;
end
end
%megafinal(i)=X
end
Make megafinal a cell array. This will account for the varying sizes of X at each iteration. As such, simply do this:
megafinal{i} = X;
To access a cell element, you just have to do megafinal{num}, where num is any index you want.
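A minimal sketch of the pattern, with the cell array allocated once before the loop. The X built here is only a stand-in for the question's per-year result, and allYears is a hypothetical name; stacking with vertcat assumes that every X has the same number of columns.

```matlab
y = 1997:2013;
megafinal = cell(1, numel(y));          % one slot per year, allocated once
for i = 1:numel(y)
    % Stand-in for the question's X: a cell table whose row count varies by year.
    nRows = mod(i, 3) + 1;
    X = repmat({y(i), 'placeholder'}, nRows, 1);
    megafinal{i} = X;
end

% Optional: stack all years into one tall cell array. This assumes that
% every X has the same number of columns.
allYears = vertcat(megafinal{:});
```

Because each year lives in its own cell, the differing row counts never force MATLAB to resize one big array inside the loop.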
I have written code that stores data in matrices, but I want to shorten it so that it loops over the matrices instead of repeating itself.
The number of matrices to create is known in advance. If it were 3, the code would be:
for i = 1:31
if idx(i) == 1
C1 = [C1; Output2(i,:)];
end
if idx(i) == 2
C2 = [C2; Output2(i,:)];
end
if idx(i) == 3
C3 = [C3; Output2(i,:)];
end
end
If I understand correctly, you want to extract rows from Output2 into new variables based on idx values? If so, you can do as follows:
Output2 = rand(5, 10); % example
idx = [1,1,2,2,3];
% get the rows of Output2 whose entry in idx has the given value
C1 = Output2(find(idx==1),:);
C2 = Output2(find(idx==2),:);
C3 = Output2(find(idx==3),:);
Similar to Marcin, I have another solution. Here I predefine my_C as a cell array. Output2 and idx are randomly generated, and instead of find I just use logical addressing. You have to wrap the data in a cell with {}.
Output2 = round(rand(31,15)*10);
idx = uint8(round(1+rand(1,31)*2));
my_C = cell(1,3);
my_C(1,1) = {Output2(idx==1,:)};
my_C(1,2) = {Output2(idx==2,:)};
my_C(1,3) = {Output2(idx==3,:)};
If you want to get your data back just use e.g. my_C{1,1} for the first group.
If you have not 3 but n resulting matrices you can use:
Output2 = round(rand(31,15)*10);
idx = uint8(round(1+rand(1,31)*(n-1)));
my_C = cell(1,n);
for k=1:n
my_C(1,k) = {Output2(idx==k,:)};
end
Where n is a positive integer number
I would recommend a slightly different approach. Besides making the rest of the code more maintainable, it may also slightly speed up execution. This is because MATLAB uses a JIT compiler, and code passed to eval has to be recompiled every time. Try this:
nMatrices = 3;
C = cell(1, nMatrices); % preallocate one cell per matrix
for k = 1:nMatrices
    C{k} = Output2(idx==k,:);
end
As patrik said in the comments, naming variables like this is poor practice. You would be better off using cell arrays M{1}=C1, or if all the Ci are the same size, even just a 3D array M, for example, where M(:,:,1)=C1.
If you really want to use C1, C2, ... as your variable names, I think you will have to use eval, as arielnmz mentioned. One way to do this in MATLAB (with C1, C2 and C3 initialised to [] beforehand) is
for i=1:length(idx)
eval(['C' num2str(idx(i)) '=[C' num2str(idx(i)) ';Output2(' num2str(i) ',:)];'])
end
Edited to add test code:
idx=[2 1 3 2 2 3];
Output2=rand(6,4);
C1a=[];
C2a=[];
C3a=[];
for i = 1:length(idx)
if idx(i) == 1
C1a = [C1a; Output2(i,:)];
end
if idx(i) == 2
C2a = [C2a; Output2(i,:)];
end
if idx(i) == 3
C3a = [C3a; Output2(i,:)];
end
end
C1=[];
C2=[];
C3=[];
for i=1:length(idx)
eval(['C' num2str(idx(i)) '=[C' num2str(idx(i)) ';Output2(' num2str(i) ',:)];'])
end
all(C1a(:)==C1(:))
all(C2a(:)==C2(:))
all(C3a(:)==C3(:))
Hello again logical friends!
I'm aware this is quite an involved question, so please bear with me! I think I've managed to get it down to two specifics: I need two loops which I can't seem to get working…
Firstly: the variable rollers(1).ink is a (12x1) vector containing ink values. This program shares the ink equally between rollers at each connection. I'm attempting to get rollers(1).ink to interact with rollers(2) only at specific time steps. The ink should transfer into the system once for every full revolution, i.e. whenever the time step is a multiple of nBins_max. The ink should not transfer back to rollers(1).ink as the system rotates; it should only introduce ink to the system once per revolution and not take any back out. Currently I've set rollers(1).ink = ones, but only for testing. I'm truly stuck here!
Secondly: the reason it needs to do this is that at the end of the sim I also wish to remove ink in the form of a printed image. The image should be a reflection of the ink on the last roller in my system, and half of this value should be removed from the last roller and taken out of the system at each revolution. The ink remaining on the last roller should be recycled and 're-split' in the system ready for the next rotation.
So… I think it's around the loop beginning at line 86 where I need to do all this stuff. In pseudocode, for the intermittent in-feed I've been trying something like:
For k = 1:nTimeSteps
While nTimesSteps = mod(nTimeSteps, nBins_max) == 0 % This should only output when nTimeSteps is a whole multiple of nBins_max i.e. one full revolution
‘Give me the ink on each segment at each time step in a matrix’
End
The output for averageAmountOfInk is the exact format I would like to return this data except I don’t really need the average, just the actual value at each moment in time. I keep getting errors for dimensional mismatches when I try to re-create this using something like:
For m = 1:nTimeSteps
For n = 1:N
Rollers(m,n) = rollers(n).ink’;
End
End
I’ll post the full code below if anyone is interested to see what it does currently. There’s a function at the end also which of course needs to be saved out to a separate file.
I’ve posted variations of this question a couple of times but I’m fully aware it’s quite a tricky one and I’m finding it difficult to get my intent across over the internets!
If anyone has any ideas/advice/general insults about my lack of programming skills then feel free to reply!
%% Simple roller train
% # Single forme roller
% # Ink film thickness = 1 micron
clc
clear all
clf
% # Initial state
C = [0,70; % # Roller centres (x, y)
10,70;
21,61;
11,48;
21,34;
27,16;
0,0
];
R = [5.6,4.42,9.8,6.65,10.59,8.4,23]; % # Roller radii (r)
% # Direction of rotation (clockwise = -1, anticlockwise = 1)
rotDir = [1,-1,1,-1,1,-1,1]';
N = numel(R); % # Amount of rollers
% # Find connected rollers
isconn = @(m, n)(sum(([1, -1] * C([m, n], :)).^2)...
-sum(R([m, n])).^2 < eps);
[Y, X] = meshgrid(1:N, 1:N);
conn = reshape(arrayfun(isconn, X(:), Y(:)), N, N) - eye(N);
% # Number of bins for biggest roller
nBins_max = 50;
nBins = round(nBins_max*R/max(R))';
% # Initialize roller struct
rollers = struct('position',{}','ink',{}','connections',{}',...
'rotDirection',{}');
% # Initialise matrices for roller properties
for ii = 1:N
rollers(ii).ink = zeros(1,nBins(ii));
rollers(ii).rotDirection = rotDir(ii);
rollers(ii).connections = zeros(1,nBins(ii));
rollers(ii).position = 1:nBins(ii);
end
for ii = 1:N
for jj = 1:N
if(ii~=jj)
if(conn(ii,jj) == 1)
connInd = getConnectionIndex(C,ii,jj,nBins(ii));
rollers(ii).connections(connInd) = jj;
end
end
end
end
% # Initialize averageAmountOfInk and calculate initial distribution
nTimeSteps = 1*nBins_max;
averageAmountOfInk = zeros(nTimeSteps,N);
inkPerSeg = zeros(nTimeSteps,N);
for ii = 1:N
averageAmountOfInk(1,ii) = mean(rollers(ii).ink);
end
% # Iterate through timesteps
for tt = 1:nTimeSteps
rollers(1).ink = ones(1,nBins(1));
% # Rotate all rollers
for ii = 1:N
rollers(ii).ink(:) = ...
circshift(rollers(ii).ink(:),rollers(ii).rotDirection);
end
% # Update all roller-connections
for ii = 1:N
for jj = 1:nBins(ii)
if(rollers(ii).connections(jj) ~= 0)
index1 = rollers(ii).connections(jj);
index2 = find(ii == rollers(index1).connections);
ink1 = rollers(ii).ink(jj);
ink2 = rollers(index1).ink(index2);
rollers(ii).ink(jj) = (ink1+ink2)/2;
rollers(index1).ink(index2) = (ink1+ink2)/2;
end
end
end
% # Calculate average amount of ink on each roller
for ii = 1:N
averageAmountOfInk(tt,ii) = sum(rollers(ii).ink);
end
end
image(5:20) = (rollers(7).ink(5:20))./2;
inkPerSeg1 = [rollers(1).ink]';
inkPerSeg2 = [rollers(2).ink]';
inkPerSeg3 = [rollers(3).ink]';
inkPerSeg4 = [rollers(4).ink]';
inkPerSeg5 = [rollers(5).ink]';
inkPerSeg6 = [rollers(6).ink]';
inkPerSeg7 = [rollers(7).ink]';
This is an extended comment rather than a proper answer, but the comment box is a bit too small ...
Your code overwhelms me, I can't see the wood for the trees. I suggest that you eliminate all the stuff we don't need to see to help you with your immediate problem (all those lines drawing figures for example) -- I think it will help you to debug your code yourself to put all that stuff into functions or scripts.
Your code snippet
For k = 1:nTimeSteps
While nTimesSteps = mod(nTimeSteps, nBins_max) == 0
‘Give me the ink on each segment at each time step in a matrix’
End
might be (I don't quite understand your use of the while statement: the capitalised While is not a MATLAB keyword, and as you have written it the value of the condition doesn't change from iteration to iteration) equivalent to
for k = nBins_max:nBins_max:nTimeSteps
‘Give me the ink on each segment at each time step in a matrix’
end
You seem to have missed an essential feature of Matlab's colon operator ...
1:8 = [1 2 3 4 5 6 7 8]
but
1:2:8 = [1 3 5 7]
that is, the second number in the triplet is the stride between successive elements.
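As a sketch of how the stride picks out full revolutions: the value of nBins_max is the one from the question, nTimeSteps is set to three revolutions purely for illustration, and the names revolutionSteps and viaMod are hypothetical.

```matlab
nBins_max  = 50;
nTimeSteps = 3 * nBins_max;             % three revolutions, chosen for illustration

% Start, stride, stop: every time step that completes a revolution.
revolutionSteps = nBins_max:nBins_max:nTimeSteps;

% The same steps picked out with mod inside an ordinary loop.
viaMod = [];
for tt = 1:nTimeSteps
    if mod(tt, nBins_max) == 0
        viaMod(end + 1) = tt;           %#ok<AGROW> small list, growth is fine here
    end
end
```

Both variables contain 50, 100 and 150. The mod test is the form to use when the once-per-revolution action has to live inside an existing loop over every time step.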
Your matrix conn has a 1 at the (row,col) where rollers are connected, and a 0 elsewhere. You can find the row and column indices of all the 1s like this:
[ri,ci] = find(conn==1)
You could then pick up the (row,col) locations of the 1s without the nest of loops and if statements that begins
for ii = 1:N
for jj = 1:N
if(ii~=jj)
if(conn(ii,jj) == 1)
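A short sketch of that replacement, using a made-up connection matrix for a chain of three rollers rather than the seven from the question:

```matlab
conn = [0 1 0;
        1 0 1;
        0 1 0];                 % made-up 3-roller chain: 1-2 and 2-3 touch

[ri, ci] = find(conn == 1);     % one (row, col) pair per connection
for k = 1:numel(ri)
    fprintf('roller %d is connected to roller %d\n', ri(k), ci(k));
end
```

The single loop over k visits only the connected pairs, so the ii~=jj and conn(ii,jj)==1 tests are no longer needed.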
I could go on, but won't, that's enough for one comment.
I am trying to implement a decision tree with recursion. So far I have written the following:
1. From a given data set, find the best split and return the branches. To give more detail: my data has the features as columns of a matrix, and the last column indicates the class of the data (1 or -1).
2. Based on 1, I have the best feature to split on along with the branches under that split. Say that, based on information gain, feature 9 is the best split; the unique values in feature 9, {1,3,5}, are then the branches of 9.
3. I have figured out how to get the data related to each branch; then I need to iterate over each branch's data to get the next set of splits. I am having trouble figuring out this recursion.
Here is the code that I have so far. The recursion that I am doing right now doesn't look right. How can I fix this?
function [indeces_of_node, best_split] = split_node(X_train, Y_train)
%cell to save split information
feature_to_split_cell = cell(size(X_train,2)-1,4);
%iterate over features
for feature_idx=1:(size(X_train,2) - 1)
%get current feature
curr_X_feature = X_train(:,feature_idx);
%identify the unique values
unique_values_in_feature = unique(curr_X_feature);
H = get_entropy(Y_train); %This is actually H(X) in slides
%temp entropy holder
%Storage for feature element's class
element_class = zeros(size(unique_values_in_feature,1),2);
%conditional probability H(X|y)
H_cond = zeros(size(unique_values_in_feature,1),1);
for aUnique=1:size(unique_values_in_feature,1)
match = curr_X_feature(:,1)==unique_values_in_feature(aUnique);
mat = Y_train(match);
majority_class = mode(mat);
element_class(aUnique,1) = unique_values_in_feature(aUnique);
element_class(aUnique,2) = majority_class;
H_cond(aUnique,1) = (length(mat)/size((curr_X_feature),1)) * get_entropy(mat);
end
%Getting the information gain
IG = H - sum(H_cond);
%Storing the IG of features
feature_to_split_cell{feature_idx, 1} = feature_idx;
feature_to_split_cell{feature_idx, 2} = max(IG);
feature_to_split_cell{feature_idx, 3} = unique_values_in_feature;
feature_to_split_cell{feature_idx, 4} = element_class;
end
%set feature to split zero for every fold
feature_to_split = 0;
%getting the max IG of the fold
max_IG_of_fold = max([feature_to_split_cell{:,2:2}]);
%vector to store values in the best feature
values_of_best_feature = zeros(size(15,1));
%Iterating over cell to get get the index and the values under best
%splited feature.
for i=1:length(feature_to_split_cell)
if (max_IG_of_fold == feature_to_split_cell{i,2});
feature_to_split = i;
values_of_best_feature = feature_to_split_cell{i,4};
end
end
display(feature_to_split)
display(values_of_best_feature(:,1)')
curr_X_feature = X_train(:,feature_to_split);
best_split = feature_to_split
indeces_of_node = unique(curr_X_feature)
%testing
for k = 1 : length(values_of_best_feature)
% Condition to stop the recursion: if the classes are pure then we are
% done splitting; if both classes have the same number of attributes
% then we are done splitting.
if (sum(values_of_best_feature(:,2) == -1) ~= sum(values_of_best_feature(:,2) == 1))
if((sum(values_of_best_feature(:,2) == -1) ~= 0) || (sum(values_of_best_feature(:,2) == 1) ~= 0))
mat1 = X_train(X_train(:,5)== values_of_best_feature(k),:);
[indeces_of_node, best_split] = split_node(mat1, Y_train);
end
end
end
end
Here is the output of my code. It looks like somewhere in my recursion I am only going down one branch, and after that I never go back to the rest of the branches:
feature_to_split =
5
ans =
1 2 3 4 5 6 7 8 9
feature_to_split =
9
ans =
3 5 7 8 11
feature_to_split =
21
feature_to_split =
21
feature_to_split =
21
feature_to_split =
21
If you are interested in running this code: git
After multiple rounds of debugging, I figured out the answer. I hope someone will benefit from this:
for k = 1 : length(values_of_best_feature)
% Condition to stop the recursion: if the classes are pure then we are
% done splitting; if both classes have the same number of attributes
% then we are done splitting.
if((sum(values_of_best_feature(:,2) == -1) ~= 0) || (sum(values_of_best_feature(:,2) == 1) ~= 0))
X_train(:,feature_to_split) = [];
mat1 = X_train(X_train(:,5)== values_of_best_feature(k),:);
%if(level >= curr_level)
split_node(mat1, Y_train, 1, 2, level-1);
%end
end
end
return;
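For readers who want to see the general pattern of visiting every branch, here is a minimal, hypothetical sketch. It is not the code from this thread: build_tree_sketch, its struct fields and its "first feature that still varies" rule are assumptions made for illustration, and the information-gain choice from the question would take the place of that rule. It also assumes the labels y are kept in a separate column vector. Two things matter: the labels are filtered with the same rows as the data, and each branch's result is stored instead of overwriting the previous one. It would be saved as build_tree_sketch.m.

```matlab
function tree = build_tree_sketch(X, y)
% Hypothetical sketch: recurse into EVERY branch of the chosen feature.
% Splits on the first column that still has more than one distinct value;
% the question's information-gain choice would replace that rule.
tree = struct('feature', 0, 'values', [], 'children', {{}}, 'label', mode(y));
if numel(unique(y)) <= 1            % pure node: stop
    return;
end
feature = 0;
for f = 1:size(X, 2)
    if numel(unique(X(:, f))) > 1
        feature = f;
        break;
    end
end
if feature == 0                     % nothing left to split on: stop
    return;
end
tree.feature = feature;
tree.values = unique(X(:, feature));
tree.children = cell(numel(tree.values), 1);
for k = 1:numel(tree.values)
    rows = X(:, feature) == tree.values(k);
    % Pass the matching labels along with the matching rows, and keep
    % the subtree of every branch instead of overwriting one variable.
    tree.children{k} = build_tree_sketch(X(rows, :), y(rows));
end
end
```

A call such as tree = build_tree_sketch([1 1; 1 2; 2 1; 2 2], [1; 1; -1; 1]) splits on feature 1 first, leaves the pure branch alone, and splits the mixed branch again on feature 2.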