Matlab feature selection - matlab

I am trying to learn relevant features in a 300*299 training matrix by taking a random row from it as my test data and applying sequentialfs on it. I have used the following code:
>> Md1=fitcdiscr(xtrain,ytrain);
>> func = #(xtrain, ytrain, xtest, ytest) sum(ytest ~= predict(Md1,xtest));
>> learnt = sequentialfs(func,xtrain,ytrain)
xtrain and ytrain are 299*299 and 299*1 respectively. Predict will give me the predicted label for xtest(which is some random row from original xtrain).
However, when I run my code, I get the following error:
Error using crossval>evalFun (line 480)
The function '#(xtrain,ytrain,xtest,ytest)sum(ytest~=predict(Md1,xtest))' generated the following error:
X must have 299 columns.
Error in crossval>getFuncVal (line 497)
funResult = evalFun(funorStr,arg(:));
Error in crossval (line 343)
funResult = getFuncVal(1, nData, cvp, data, funorStr, []);
Error in sequentialfs>callfun (line 485)
funResult = crossval(fun,x,other_data{:},...
Error in sequentialfs (line 353)
crit(k) = callfun(fun,x,other_data,cv,mcreps,ParOptions);
Error in new (line 13)
learnt = sequentialfs(func,xtrain,ytrain)
Where did I go wrong?

You should build your classifier inside func, not before.
sequentialfs calls the function each time on different sets, and a classifier must be built specifically for each set, using only the features sequentialfs selected for that iteration.
I'm not sure I managed to be clear, in practice you should move the first line of your code inside the body of func
Source: MathWorks

Related

How can I Minimize this function with MATLAB?

I wrote the following function in MATLAB:
function EX_EFFICIENCY=EXERGY_EFFICIENCY_FUNCTION(CR,ER,PC,T0,P0)
I used the following order (ga):
x = ga(#EXERGY_EFFICIENCY_FUNCTION,5)
But it gets the error:
Not enough input arguments.
Error in EXERGY_EFFICIENCY_FUNCTION (line 22)
T7p=T0.*(PC.^((k-1)./k));
Error in createAnonymousFcn>#(x)fcn(x,FcnArgs{:}) (line 11) fcn_handle
= #(x) fcn(x,FcnArgs{:});
Error in makeState (line 47)
firstMemberScore = FitnessFcn(state.Population(initScoreProvided+1,:));
Error in gaunc (line 40) state =
makeState(GenomeLength,FitnessFcn,Iterate,output.problemtype,options);
Error in ga (line 398)
[x,fval,exitFlag,output,population,scores] = gaunc(FitnessFcn,nvars, ...
Caused by:
Failure in initial user-supplied fitness function evaluation. GA cannot continue.
How can I minimize this function?
What are the variables you want to minimize over? All five CR,ER,PC,T0,P0? Then you need to tell ga to use a length-5-vector and deal its elements to the input arguments of the function. Like this:
xopt = ga(#(x) EXERGY_EFFICIENCY_FUNCTION(x(1),x(2),x(3),x(4),x(5)), 5);
You can also fix some and optimize over the others of course, like this:
xopt = ga(#(x) EXERGY_EFFICIENCY_FUNCTION(x(1),x(2),PC,T0,P0), 2);
optimizes CR, ER for fixed values of PC,T0, and P0.

"Unable to prove literally" error when trying to create a symbolic function based on an existing function

I need to find inverse of virtual value function of lognormal random variables. This is what I tried to do:
syms flogn(x,p1,p2)
% assume(p2<=0); %Adding this doesn't change the error
flogn(x,p1,p2) = x - logncdf(x,p1,p2,'upper')/lognpdf(x,p1,p2);
glogn = finverse(flogn,x);
But I got the error:
Error using symengine
Unable to prove 'p2 <= 0' literally. Use 'isAlways' to test the statement mathematically.
Error in sym/subsindex (line 810)
X = find(mupadmex('symobj::logical',A.s,9)) - 1;
Error in sym/privsubsasgn (line 1085)
L_tilde2 = builtin('subsasgn',L_tilde,struct('type','()','subs',{varargin}),R_tilde);
Error in sym/subsasgn (line 922)
C = privsubsasgn(L,R,inds{:});
Error in logncdf>locallogncdf (line 73)
sigma(sigma <= 0) = NaN;
Error in logncdf (line 47)
[varargout{1:max(1,nargout)}] = locallogncdf(uflag,x,varargin{:});
Error in Untitled5 (line 3)
flogn(x,p1,p2) = x - logncdf(x,p1,p2,'upper')/lognpdf(x,p1,p2);
I also tried with beta distribution, and got a similar error. How can I use logncdf with symbolic variables?
Note that you could have minimalize your sample code even more, like this:
syms flogn(x,p1,p2)
flogn(x,p1,p2) = logncdf(x,p1,p2);
It's shorter, produces the same error and helps us to focus on the source of the error message.
So, the error doesn't come from trying to inverse a function, but from trying to use an existing function for numerical calculations with symbolic variables.
The error comes from the fact that you want to create a symbolic function flogn based on existing function logncdf, and logncdf has a multiple comparisons.
With the command edit logncdf, you can read the source code of the function and see comparison at lines 73 and 76.
% Return NaN for out of range parameters.
sigma(sigma <= 0) = NaN;
% Negative data would create complex values, which erfc cannot handle.
x(x < 0) = 0;
Matlab cannot compare symbols so it throws errors.
Depending on what you really need, you can have different solutions.
Do you really need to symbolize the function flogn? Couldn't you just write it as a function then calculate the inverse of it (if it can be inversed...)?
If you really want to keep the symbolization, you can also rewrite your own function logncdf (with another name) so it does not have the comparisons. But it's still not guaranteed that you will find an inverse.

Matlab TreeBagger: Table variable is not a valid predictor

I'm trying to use TreeBagger to build a classifier based on the UCI Diabetes 130-US database, http://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008.
I have imported the data as a table (letting Matlab decide on the data types), and have done some cleaning on the data. I'm calling the classifier as from an example, using my own data:
num_trees = 50;
B = TreeBagger(num_trees, train, train.readmitted,...
'OOBPrediction','On',...
'Method','classification');
oobErrorBaggedEnsemble = oobError(B);
plot(oobErrorBaggedEnsemble)
xlabel 'Number of grown trees';
ylabel 'Out-of-bag classification error';
I get the following error:
Error using classreg.learning.internal.table2PredictMatrix>makeXMatrix (line 100) Table variable is not a valid predictor.
Error in classreg.learning.internal.table2PredictMatrix (line 57)
Xout = makeXMatrix(X,CategoricalPredictors,vrange,pnames);
Error in classreg.learning.classif.CompactClassificationTree/predict (line 639)
X = classreg.learning.internal.table2PredictMatrix(X,[],[],...
Error in CompactTreeBagger/treeEval (line 1083)
[labels,~,nodes] = predict(tree,x);
Error in CompactTreeBagger/predictAccum (line 1414)
thisR = treeEval(bagger,it,thisX,doclassregtree);
Error in CompactTreeBagger/error (line 470)
predictAccum(bagger,X,'useifort',useIforT,...
Error in TreeBagger/oobError (line 1479)
err = error(bagger.Compact,bagger.X,bagger.Y,...
train is a table, and table.readmitted is a cell retrieved from the table. Most of the rows are cells, as most of the data in this dataset is categorical.
I'm wondering is there are certain datatypes that the classifier can't handle.
Thanks for any help!
The use of tables for the Machine Learning toolbox was introduced in R2016a. For previous versions, the data could only be passed to the fit* functions or TreeBagger as arrays.
The behavior changed from R2015(a,b) to R2016a.

Regression with Categorical Covariates matlab error

I am trying to run the example at:
http://nl.mathworks.com/help/stats/group-comparisons-using-categorical-arrays.html
using Matlab R2013b.
clear
load('carsmall')
cars = table(MPG,Weight,Model_Year);
cars.Model_Year = nominal(cars.Model_Year);
fit = fitlm(cars,'MPG~Weight*Model_Year')
Unfortunatelly I get the error:
Error using classreg.regr.FitObject/assignData (line 257)
Predictor and response variables must have the same length.
Error in classreg.regr.TermsRegression/assignData (line 349)
model = assignData#classreg.regr.ParametricRegression(model,X,y,w,asCat,varNames,excl);
Error in LinearModel.fit (line 852)
model = assignData(model,X,y,weights,asCatVar,dummyCoding,model.Formula.VariableNames,exclude);
Error in fitlm (line 111)
model = LinearModel.fit(X,varargin{:});
Any clue why?

How to use fmincon to optimize two control vectors of a function

I have a function of 2 different vector. These are the control vector (decision variables) of the function. I want to use fmincon to optimize this function and also get the both control vector results separately.
I have tried to use handle ,#, but I got an error.
The function is:
function f = myFS(x,sv) % x is a vector (5,1)
f = norm(x)^2-sigma*(sv(1)+sv(2));
end
%% I tried to write fmincone to consider both control vectors (x and sv)
[Xtemp(:,h2),Fval, fiasco] = fmincon(#(x,sv)myFS(x,sv)...
,xstart,[],[],[],[],VLB,VUB,#(x,sv)myCon(sv),options);
Here is the error I get:
Error using myFS (line 12) Not enough input arguments.
Error in fmincon (line 564)
initVals.f =
feval(funfcn{3},X,varargin{:});
Error in main_Econstraint (line 58) [Xtemp(:,h2),Fval, fiasco] =
fmincon('myFS',xstart,[],[],[],[],VLB,VUB,#(x,sv)myCon(sv),options);
Thanks
fmincon expects your function to be of a single variable, there is no getting around that, but see:
http://se.mathworks.com/help/optim/ug/passing-extra-parameters.html
for example, if both x, cv are variables of the optimization you can combine them and then split them in the actual objective
for example
x_cv = vertcat(x, cv) and then x = x_cv(1:5); cv = x_cv(6:end)'
if cv is not a variable of the optimization, then 'freeze it' as the link above suggests