Implementing sequentialfs - MATLAB

I am trying to implement sequentialfs for feature selection. I saw this post: Sequential feature selection Matlab
I tried to follow the example given in its solution.
My TrainVec is a 268 x 1475 matrix, TestVec is 116 x 1475, TestLabel is 116 x 1 and TrainLabel is 268 x 1.
The code I implemented is:
f = @(TrainVec,TrainLabel,TestVec,TestLabel) sum(TestLabel ~= predict_label);
fs = sequentialfs(f,Vec,Label);
The error I get is:
??? Error using ==> crossval>evalFun at 505
The function
'@(TrainVec,TrainLabel,TestVec,TestLabel)sum(TestLabel~=predict_label)'
generated the following error:
Matrix dimensions must agree.
Error in ==> crossval>getFuncVal at 524
funResult = evalFun(funorStr,arg(:));
Error in ==> crossval at 363
funResult = getFuncVal(1, nData, cvp, data, funorStr, []);
Error in ==> sequentialfs>callfun at 495
funResult = crossval(fun,x,other_data{:},...
Error in ==> sequentialfs at 357
crit(k) = callfun(fun,x,other_data,cv,mcreps);
I have checked all my matrices and made sure their dimensions are consistent. I'm not sure what is wrong and need some guidance.

I'm not sure here if predict_label is a variable or a function with zero input arguments. I would guess that if it's a variable, it's not the same size as TestLabel; and if it's a function, it's either not returning something the same size as TestLabel or it has some intermediate calculation that is erroring.
Either way, you would typically want to be writing
f = @(TrainVec,TrainLabel,TestVec,TestLabel) sum(TestLabel ~= predict_label(TrainVec,TrainLabel,TestVec));
where predict_label is now a function that takes in your TrainVec and TrainLabel, builds a model, evaluates it on TestVec, and returns an array of predicted labels of the same size as TestLabel.
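Below is a minimal sketch of that pattern. It assumes numeric class labels and uses fitcdiscr (Statistics and Machine Learning Toolbox) as a stand-in classifier. predict_label is written as a hypothetical anonymous function, and the synthetic data exists only so the snippet runs on its own:

```matlab
% Synthetic stand-in data: 100 samples, 5 features, two numeric classes.
rng(0);
X = [randn(50, 5); randn(50, 5) + 1];
y = [ones(50, 1); 2 * ones(50, 1)];

% Hypothetical helper: fit a model on the training fold and return one
% predicted label per row of the test fold (same size as the test labels).
predict_label = @(XT, yT, Xt) predict(fitcdiscr(XT, yT), Xt);

% Criterion: number of misclassified test samples in this fold.
f = @(XT, yT, Xt, yt) sum(yt ~= predict_label(XT, yT, Xt));

% sequentialfs returns a logical vector marking the selected features.
fs = sequentialfs(f, X, y);
disp(find(fs));
```

The model is rebuilt on every call, so it always sees exactly the columns sequentialfs is currently evaluating.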

Related

Optimization with genetic algorithm in MATLAB

I have written a simple optimization code using the genetic algorithm. I don't know why I get an error when running the code. Here is my code:
f = @(x1,x2) 1-x1.^2+(x1-x2).^2;
A = [1 1;-1 2;2 1];
b =[2 2 3]' ;
Aeq = [];
beq = [];
Lb = [0 0]';
Ub = [];
[Xopt,Fval] = ga(f,2,A,b,Aeq,beq,Lb,Ub)
I don't know why MATLAB gives me an error. I wrote this program based on the "Genetic Algorithm" documentation, but it still gives me this error:
Error using @(x1,x2)1-x1.^2+(x1-x2).^2
Not enough input arguments.
Error in createAnonymousFcn>@(x)fcn(x,FcnArgs{:}) (line 11)
fcn_handle = @(x) fcn(x,FcnArgs{:});
Error in makeState (line 48)
firstMemberScore = FitnessFcn(state.Population(initScoreProvided+1,:));
Error in galincon (line 18)
state = makeState(GenomeLength,FitnessFcn,Iterate,output.problemtype,options);
Error in ga (line 351)
[x,fval,exitFlag,output,population,scores] = galincon(FitnessFcn,nvars, ...
Caused by:
Failure in initial user-supplied fitness function evaluation. GA cannot continue
The objective functions of MATLAB's optimization solvers take a single argument: the vector of decision variables. According to the ga documentation:
fun — Objective function
Objective function, specified as a function handle or function name. Write the objective function to accept a row vector of length nvars and return a scalar value.
When the 'UseVectorized' option is true, write fun to accept a pop-by-nvars matrix, where pop is the current population size. In this case, fun returns a vector the same length as pop containing the fitness function values. Ensure that fun does not assume any particular size for pop, since ga can pass a single member of a population even in a vectorized calculation.
Change your objective function and it should work:
f = @(x) 1-x(1).^2+(x(1)-x(2)).^2;

Error in MATLAB sequentialfs while selecting features from 94*263 feature vectors

I have 94 samples with 263 features each, so the feature matrix is 94*263. There are no NaN or Inf values in the feature vectors. There are two classes (51 samples in class a and 43 in class b). I am using sequentialfs to select features, but I get the following error every time:
Error using crossval>evalFun (line 480)
The function '@(XT,yT,Xt,yt)(sum(~strcmp(yt,classify(Xt,XT,yT,'quadratic'))))' generated the following error:
The input to SVD must not contain NaN or Inf.
The code is:
X = FEATUREVECTOR;
y = LABELS;
c = cvpartition(y,'k',10);
opts = statset('display','iter');
fun = @(XT,yT,Xt,yt)...
(sum(~strcmp(yt,classify(Xt,XT,yT,'quadratic'))));
[fs,history] = sequentialfs(fun,X,y,'cv',c,'options',opts)
Can you please tell me how to solve the problem?
It looks like you are calling sequentialfs with inputs that may be only vaguely related to the numbers shown in your question. Beyond that, there is no way to tell what is going on; if you want help, you need to show exactly what you did.
I changed the input data and it works fine:
load fisheriris;
X = randn(150,10);
X(:,[1 3 5 7 ])= meas;
y = species;
c = cvpartition(y,'k',10);
opts = statset('display','iter');
fun = @(XT,yT,Xt,yt)...
(sum(~strcmp(yt,classify(Xt,XT,yT,'quadratic'))));
[fs,history] = sequentialfs(fun,X,y,'cv',c,'options',opts)
So the problem is in your input data.
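With 'quadratic', classify estimates a separate covariance matrix for each class. That matrix must be positive definite, so it is worth checking the real FEATUREVECTOR for problems that only show up per class. The snippet below is a hypothetical pre-check, not something sequentialfs or the answer above does. It looks for common causes of classify failures in general and does not confirm a diagnosis of the SVD message. The synthetic X (with a deliberately constant column) and the cellstr labels 'a'/'b' stand in for the question's data:

```matlab
% Stand-in data: 94 samples, 6 features, column 4 deliberately constant.
rng(1);
X = randn(94, 6);
X(:, 4) = 3;
y = [repmat({'a'}, 51, 1); repmat({'b'}, 43, 1)];

% 1) No NaN or Inf anywhere in the matrix.
allFinite = all(isfinite(X(:)));

% 2) Features with zero variance inside at least one class; these make
%    that class's covariance matrix singular.
groups = unique(y);
constantInClass = false(1, size(X, 2));
for k = 1:numel(groups)
    Xk = X(strcmp(y, groups{k}), :);
    constantInClass = constantInClass | (std(Xk, 0, 1) == 0);
end

% 3) Smallest class size: a per-class covariance over p selected features
%    needs more than p samples of that class in each training fold.
minClassSize = min(cellfun(@(g) sum(strcmp(y, g)), groups));

fprintf('All finite: %d\n', allFinite);
fprintf('Constant within some class: %s\n', mat2str(find(constantInClass)));
fprintf('Smallest class: %d samples\n', minClassSize);
```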

Matlab feature selection

I am trying to learn the relevant features of a 300*299 training matrix by taking a random row from it as my test data and applying sequentialfs to it. I have used the following code:
Md1 = fitcdiscr(xtrain,ytrain);
func = @(xtrain, ytrain, xtest, ytest) sum(ytest ~= predict(Md1,xtest));
learnt = sequentialfs(func,xtrain,ytrain)
xtrain and ytrain are 299*299 and 299*1 respectively. predict gives me the predicted label for xtest (which is a random row from the original xtrain).
However, when I run my code, I get the following error:
Error using crossval>evalFun (line 480)
The function '@(xtrain,ytrain,xtest,ytest)sum(ytest~=predict(Md1,xtest))' generated the following error:
X must have 299 columns.
Error in crossval>getFuncVal (line 497)
funResult = evalFun(funorStr,arg(:));
Error in crossval (line 343)
funResult = getFuncVal(1, nData, cvp, data, funorStr, []);
Error in sequentialfs>callfun (line 485)
funResult = crossval(fun,x,other_data{:},...
Error in sequentialfs (line 353)
crit(k) = callfun(fun,x,other_data,cv,mcreps,ParOptions);
Error in new (line 13)
learnt = sequentialfs(func,xtrain,ytrain)
Where did I go wrong?
You should build your classifier inside func, not before.
sequentialfs calls the function each time on a different set, and a classifier must be built specifically for each set, using only the features sequentialfs selected for that iteration. That is why predict complains that X must have 299 columns: Md1 was trained on all 299 features, but xtest only contains the candidate subset of columns.
I'm not sure I managed to be clear, so in practice: move the first line of your code (the fitcdiscr call) inside the body of func.
Source: MathWorks
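Here is a minimal sketch of that change. Synthetic data replaces the question's xtrain and ytrain, and the labels are assumed to be numeric. The classifier is refit inside func on whatever columns sequentialfs passes in, and a cvpartition object is supplied explicitly through the 'cv' option:

```matlab
% Synthetic stand-in data: 120 samples, 8 features, two numeric classes.
rng(1);
xtrain = [randn(60, 8); randn(60, 8) + 0.8];
ytrain = [zeros(60, 1); ones(60, 1)];

% fitcdiscr runs once per fold and candidate subset, so the model always
% has the same number of columns as xte.
func = @(xtr, ytr, xte, yte) sum(yte ~= predict(fitcdiscr(xtr, ytr), xte));

c = cvpartition(ytrain, 'KFold', 5);   % stratified 5-fold partition
learnt = sequentialfs(func, xtrain, ytrain, 'cv', c);
disp(find(learnt));
```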

Matlab invalid vector using lpsolve

I am using a function in Matlab based on lp_solve. In my case, lp_solve is structured as follows:
A = rand (13336,3); %A is made of real numbers between 0 and 1. For this mwe, I thought 'rand' was fine
W = [0; 0; 1];
C = A(:,3);
B = 1E+09;
e = -1;
m= 13336;
xint = linspace(1,13336,13336);
xint = xint';
obj = lp_solve(A*W,C,B,e,zeros(m,1),ones(m,1),xint)
But when I run it, I get this error:
Error using mxlpsolve
invalid vector.
Error in lp_solve (line 46)
mxlpsolve('set_rh_vec', lp, b);
Error in mylpsolvefunction (line 32) %This is my function that uses lp_solve
obj = lp_solve(A*W,C,B,e,zeros(m,1),ones(m,1),xint);
I looked in the documentation, and it says, under the chapter "Matrices", that:
[...] if a dense matrix is provided, the dimension must exactly match the dimension that is expected by mxlpsolve. Matrices with too few or too much elements gives an 'invalid vector.' error. Sparse matrices can of course provide less elements (the non provided elements are seen as zero). However if too many elements are provided or an element with a too large index, again an 'invalid vector.' error is raised.
I did not understand what they mean when they say that the dimension "must exactly match the dimension that is expected by mxlpsolve". Anyway, since they say the error may also occur "if too many elements are provided", I tried to cut my inputs down from 13336 elements to 50 (I am sure it works with 58 and quite sure it also works with 2000), but I still get the same error. What could the problem be?
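One way to narrow this down is to compare the argument shapes with the conventions in the lp_solve.m help text: f an n-vector, a an m-by-n constraint matrix, b and e m-vectors, vlb/vub n-vectors. Treat those conventions as an assumption to verify against your copy. The sketch does not call mxlpsolve; it only compares shapes. Passed as written, a = C is 13336-by-1, meaning 13336 constraint rows on one variable, while b has a single element. That would fit an error raised at set_rh_vec, and cutting all inputs to 50 elements would not change it. Transposing C assumes the intended model is one constraint over all 13336 variables:

```matlab
% Shapes from the question (rand stands in for the real A, as in the MWE).
A = rand(13336, 3);
W = [0; 0; 1];
f = A * W;          % objective coefficients: expected to have n elements
a = A(:, 3);        % passed as the constraint matrix: expected m-by-n
b = 1E+09;          % right-hand side: expected to have m elements
e = -1;             % constraint senses: expected to have m elements

% Assumed convention: m = rows of a (constraints), n = columns (variables).
shapesOk = @(f, a, b, e) numel(f) == size(a, 2) && ...
    numel(b) == size(a, 1) && numel(e) == size(a, 1);

okAsWritten  = shapesOk(f, a, b, e);    % 13336 rows vs. 1 right-hand side
okTransposed = shapesOk(f, a', b, e);   % 1 row over 13336 variables
fprintf('as written: %d, transposed: %d\n', okAsWritten, okTransposed);
```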

Comparing MATLAB fmincon and ga (genetic algorithm) results: issue with ga

I have a fairly complex optimization problem set up that I've solved through fmincon by calling it like this
myfun = @(x5) 0.5 * (norm(C*x5 - d))^2 + 0.5 * (timeIntervalMeanGlobal * powerAbsMaxMaxGlobal * sum(x5(28:128),1))^2;
[x5, fval] = fmincon(myfun, initialGuess, -A, b, Aeq, beq, lb, []);
The components are far too long to print here, but here are the dimensions:
C: 49 x 128
x5: 128 x 1
d: 49 x 1
timeIntervalMeanGlobal, powerAbsMaxMaxGlobal: constants
initialGuess: 128 x 1
A: 44541 x 128
b: 44541 x 1
Aeq: 24 x 128
beq: 24 x 1
lb: 128 x 1
This works in code, but I don't get results that I'm completely happy with. I'd like to compare it with the built in ga function in MATLAB, which is called in a similar way, but I get an error when I try to run it like this
[x5, fval] = ga(myfun, nvars, -A, b, Aeq, beq, lb, []);
where nvars = 128. There's a long list of about 8 errors starting with
??? Error using ==> mtimes
Inner matrix dimensions must agree.
and ending with
Caused by:
Failure in user-supplied fitness function evaluation. GA cannot continue.
Can someone please instruct me on how to call ga properly, and give insight on why this error might occur with the ga call when the same code doesn't cause an error with fmincon? I've tried all the MATLAB help files and examples with a few different permutations of this but no better luck. Thanks.
UPDATE: I think I found the problem but I don't know how to fix it. The ga documentation says "The fitness function should accept a row vector of length nvars". In my case, myfun is the fitness function, but x5 is a column vector (so is lb). So while mathematically I know that C*x5 = d is the same as x5'*C' = d' even for non-square matrices, I can't formulate the problem that way for the ga solver. I tried - it makes it past the fitness function but then I get the error
The number of rows in A must be the same as the length of b.
Any thoughts on how to get this problem in the right format for the solver? Thanks!
Got it! I just had to manipulate the fitness function to make it use x5 as a row vector even though it's a column vector in all the constraints
myfun = @(x5) 0.5 * (norm(x5 * C' - d'))^2 + 0.5 * (timeIntervalMeanGlobal * powerAbsMaxMaxGlobal * sum(x5(28:128)))^2;
Phew!
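An alternative that avoids juggling transposes is to reshape the decision vector inside the fitness function with x(:). The same handle then works whether the solver passes a row (ga) or a column (fmincon). Note also that sum(x5(28:128),1) on a row vector sums along dimension 1 and returns the row unchanged, so the ",1" had to go for ga as well. The sketch uses random stand-ins for C and d and hypothetical values for the two constants:

```matlab
rng(2);
C = randn(49, 128);   % stand-in for the real 49-by-128 C
d = randn(49, 1);     % stand-in for the real 49-by-1 d
tMean = 0.1;          % hypothetical timeIntervalMeanGlobal
pMax  = 2.0;          % hypothetical powerAbsMaxMaxGlobal

% x(:) forces a column, so C*x(:) is 49-by-1 for row or column input;
% sum over a row or a column slice returns a scalar either way.
myfun = @(x) 0.5 * norm(C * x(:) - d)^2 ...
           + 0.5 * (tMean * pMax * sum(x(28:128)))^2;

xCol = randn(128, 1);
valCol = myfun(xCol);    % fmincon-style column input
valRow = myfun(xCol');   % ga-style row input
fprintf('column: %g, row: %g\n', valCol, valRow);
```

The linear constraints (A, b, Aeq, beq, lb) can then be passed to both solvers unchanged.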