Error in MATLAB sequentialfs while selecting features from 94*263 feature vectors

I have 94 samples with 263 features each, so the full feature matrix is 94*263. There are no NaN or Inf values in the feature vectors. There are two classes (51 samples in class a and 43 in class b). I am using sequentialfs to select features, but I get the following error each time:
Error using crossval>evalFun (line 480)
The function '@(XT,yT,Xt,yt)(sum(~strcmp(yt,classify(Xt,XT,yT,'quadratic'))))' generated the following error:
The input to SVD must not contain NaN or Inf.
The code is:
X = FEATUREVECTOR;
y = LABELS;
c = cvpartition(y,'k',10);
opts = statset('display','iter');
fun = @(XT,yT,Xt,yt)...
(sum(~strcmp(yt,classify(Xt,XT,yT,'quadratic'))));
[fs,history] = sequentialfs(fun,X,y,'cv',c,'options',opts)
Can you please tell me how to solve the problem?

It looks like you are calling sequentialfs with some inputs, that MAY be vaguely related to the mess of random numbers we see in your question. Beyond that, I can't read anything from your mind. If you want help you need to show what you did.
I changed the input data and it works well:
load fisheriris;
X = randn(150,10);
X(:,[1 3 5 7 ])= meas;
y = species;
c = cvpartition(y,'k',10);
opts = statset('display','iter');
fun = @(XT,yT,Xt,yt)...
(sum(~strcmp(yt,classify(Xt,XT,yT,'quadratic'))));
[fs,history] = sequentialfs(fun,X,y,'cv',c,'options',opts)
Your input data has a problem.
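One way to narrow down what is wrong with the data is to screen the feature matrix before calling sequentialfs. The sketch below is only a diagnostic under assumptions: it takes X (94x263 double) and y (a cell array of class names, since the question compares labels with strcmp) from the question, and it assumes the error may come from non-finite values or from features that are constant within one class. Neither cause is confirmed by the original post.

```matlab
% Diagnostic sketch (assumptions: X is numeric, y is a cell array of class names).
nonFinite = any(~isfinite(X), 1);             % columns containing NaN or Inf

classes = unique(y);
constInClass = false(1, size(X, 2));
for k = 1:numel(classes)
    rows = strcmp(y, classes{k});             % samples belonging to class k
    % A feature with zero variance inside a class makes that class's
    % covariance estimate singular.
    constInClass = constInClass | (var(X(rows, :), 0, 1) == 0);
end

fprintf('%d non-finite columns, %d columns constant within a class\n', ...
    nnz(nonFinite), nnz(constInClass));

% Candidate input for sequentialfs with the suspicious columns dropped.
Xclean = X(:, ~(nonFinite | constInClass));
```

If Xclean is passed to sequentialfs instead of X, the selected column indices refer to Xclean, not to the original 263 features.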

Related

Matlab Error in using pdist2 for high dimensional data

I have a feature vector of 13 dimensions for m samples. I am trying to find k nearest neighbors of each sample.
I have selected the feature vector required.
[m,~] = size(featurevector);
X = [];
Y = [];
for i = 1: m-1
    if (featurevector(i,14) == 3)
        X = [X;featurevector(i,1:13)];
        Y = [Y;featurevector(i,1:13)];
    end
end
I have tried calculating the distances for each point
d = pdist2(X,Y,'euclidean');
It works fine up to here. Now I want to find the indexes of the k = 3 nearest neighbors of every sample, so I tried
[idx,dist] = knnsearch(X,Y,'k',3,'distance','euclidean');
but it shows an error
Error using pdist2
Too many input arguments.
Error in ExhaustiveSearcher/knnsearch (line 207)
[dist,idx] = pdist2(obj.X,Y, distMetric, arg{:}, 'smallest',numNN);
Error in knnsearch (line 144)
[idx, dist] = knnsearch(O,Y,'k',numNN, 'includeties',includeTies);
I tried the example given in the help and it works fine, but it fails with my own data, where the number of samples is 630.
Is something wrong with my code?
I am using MATLAB 2015a.
Checking my path on the command line with
which -all pdist2
gives
c:\toolbox\classify\pdist2.m
C:\Program Files (x86)\MATLAB\MATLAB Production Server\R2015a\toolbox\stats\stats\pdist2.m % Shadowed
Any help appreciated!
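The which output above marks the Statistics Toolbox copy of pdist2 as shadowed, which means MATLAB calls c:\toolbox\classify\pdist2.m first. A likely explanation, offered here as an assumption rather than something confirmed in the post, is that this other pdist2 does not accept the 'smallest' argument that knnsearch passes internally. A minimal sketch of how one might test that, using the folder name from the output above:

```matlab
% List every pdist2 on the path; the first entry is the one MATLAB calls.
which -all pdist2

% Temporarily remove the folder holding the shadowing copy (current session only).
% Note: this also hides any other functions stored in that folder.
rmpath('c:\toolbox\classify');

% The Statistics Toolbox version should now be listed first.
which pdist2

% Retry the neighbor search from the question.
[idx, dist] = knnsearch(X, Y, 'K', 3, 'Distance', 'euclidean');
```

If the search works after rmpath, renaming the custom pdist2.m or moving its folder to the end of the path would be a more permanent fix.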

MATLAB: Using the 'resubstitution' option in sequentialfs

I'm new to MATLAB and trying to implement sequentialfs to identify the best subsets to fit a linear regression. I've read through the online documentation but am finding it difficult to understand.
I have a set of training data and would like to use the 'resubstitution' option so that the same data is also used as test data.
x_train is a matrix with 67 rows and 8 columns.
y_train has 67 rows and 1 column.
Could somebody please check why this code doesn't work?
x_train = std_pros_pred_full;
y_train = pros_resp;
fun_RSS = @RSS_check;
inmodel = sequentialfs(fun_RSS,x_train,y_train,'resubstitution');
The function RSS_check performs a linear regression calculation and outputs the sum of squared errors. It's defined (externally) like this:
function RSS_out = RSS_check(X,Y)
lin1 = X'*X;
lin2 = X'*Y;
lin_coef = (lin1^-1)*lin2;
lin_fit = X*lin_coef;
row_count = size(Y,1);
RSS_out = 0;
for q = 1:row_count
    pred_diff = lin_fit(q,1) - Y(q,1);
    RSS_out = RSS_out + pred_diff^2;
end
The error message is:
Error using sequentialfs (line 212)
Wrong number of arguments.
When trying different options, I've also had errors concerning the number of inputs and outputs to the function. A lot of examples I've seen reference separate matrices of test data (giving 4 inputs), but I thought that would be unnecessary with the 'resubstitution' option.
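A probable cause of "Wrong number of arguments" is that 'resubstitution' is passed on its own, while sequentialfs expects options as name/value pairs, with 'resubstitution' being a value of the 'cv' option. As I read the sequentialfs documentation, 'cv','resubstitution' still calls the criterion with four inputs (training and test data, both set to the full data), whereas 'cv','none' calls it as fun(X,Y). The sketch below shows both variants under that reading; rssFun is a hypothetical name, and x_train, y_train and RSS_check are taken from the question.

```matlab
% Option 1: keep the two-input criterion RSS_check and disable validation.
inmodel1 = sequentialfs(@RSS_check, x_train, y_train, 'cv', 'none');

% Option 2: use 'resubstitution' with a four-input criterion.
% The coefficients are fitted on the "training" data and the residual sum
% of squares is computed on the "test" data (here both are the full data).
rssFun = @(Xtr, ytr, Xte, yte) sum((yte - Xte * (Xtr \ ytr)).^2);
inmodel2 = sequentialfs(rssFun, x_train, y_train, 'cv', 'resubstitution');
```

The backslash operator solves the same least-squares problem as (X'*X)^-1*X'*Y in RSS_check, but it is usually better conditioned numerically.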

Fitting model to data in matlab

I have some experimental data and a theoretical model that I would like to fit to it. I have made a function file with the model; the code is shown below.
function [ Q,P ] = RodFit(k,C )
    % Function file for the theoretical scattering from a Rod
    % R = radius, L = length
    R = 10; % radius in Å
    L = 1000; % length in Å
    Q = 0.001:0.0001:0.5;
    fun = @(x) ( (2.*besselj(1,Q.*R.*sin(x)))./...
        (Q.*R.*sin(x)).*...
        (sin(Q.*L.*cos(x)./2))./...
        (Q.*L.*cos(x)./2)...
        ).^2.*sin(x);
    P = (integral(fun,0,pi/2,'ArrayValued',true))*k+C;
end
Here Q holds the x-values and P the y-values. I can call the function from the MATLAB command line and it works fine, e.g. [Q,P] = RodFit(1,0.001) gives me a result I can plot using plot(Q,P).
But I cannot figure out how best to fit the model to some experimental data. Ideally, I would like to use the Optimization Toolbox and lsqcurvefit, since I would then also be able to optimize the R and L parameters, but I do not know how to pass (x,y) data to lsqcurvefit. I have attempted it with the code below, but it does not work:
File = 30; % the specific observation you want to fit the model to
ydata = DataFiles{1,File}.data(:,2)';
% RAdius = linspace(10,1000,length(ydata));
% LEngth = linspace(100,10000,length(ydata));
Multiplier = linspace(1e-3,1e3,length(ydata));
Constant = linspace(0,1,length(ydata));
xdata = [Multiplier; Constant]; % RAdius; LEngth;
L = lsqcurvefit(@RodFit,[1;0],xdata,ydata);
it gives me the error message:
Error using *
Inner matrix dimensions must agree.
Error in RodFit (line 15)
P = (integral(fun,0,pi/2,'ArrayValued',true))*k+C;
Error in lsqcurvefit (line 199)
initVals.F = feval(funfcn_x_xdata{3},xCurrent,XDATA,varargin{:});
Caused by:
Failure in initial user-supplied objective function evaluation. LSQCURVEFIT cannot continue.
I have tried i) making all vectors/matrices the same length and ii) using .* instead. Nothing works, and I get the same error message.
Any kind of help would be greatly appreciated, whether it is a suggestion regarding which method I should use, suggestions for my code, or something else.
EDIT TO ANSWER Osmoses:
A really good point, but I do not think that is the problem. I just checked the sizes of all the vectors/matrices and they should be all right:
>> size(Q)
ans =
1 1780
>> size(P)
ans =
1 1780
>> size(xdata)
ans =
2 1780
>> size([1;0.001]) - the initial guess/start point for xdata (x0)
ans =
2 1
>> size(ydata)
ans =
1 1780
UPDATE
I think I have identified the problem. The function RodFit works fine when I specify the inputs directly, e.g. [Q,P] = RodFit(1,0.001);.
However, if I define x0 as x0 = [1;0.001], I cannot pass x0 to the function:
>> x0 = [1;0.001]
x0 =
1.0000
0.0010
>> RodFit(x0);
Error using *
Inner matrix dimensions must agree.
Error in RodFit (line 15)
P = (integral(fun,0,pi/2,'ArrayValued',true))*k+C;
The same happens if I use x0 = [1,0.001].
Clearly, MATLAB is interpreting x0 as the input for k only and attempts to multiply a vector of length(ydata) by a vector of length(x0), which fails.
So my problem is that I need to write the code so that lsqcurvefit understands that the first column of xdata and x0 is the k variable and the second column of xdata and x0 is the C variable. According to the documentation (Passing Matrix Arguments), I should be able to pass x0 as a matrix to the solver. The solver should then also pass xdata in the same format as x0.
Have you tried (that's sometimes the mistake) looking at the orientation of your input data (e.g. if xdata & ydata are both row/column vectors?). Other than that your code looks like it should work.
I have been able to solve some of the problems. One mistake in my code was that the objective function did not take a vector of variables but instead took two separate variables, k and C. Changing the code to accept a vector solved this problem:
function [ Q,P ] = RodFit(X)
    % Function file for the theoretical scattering from a Rod
    % R = radius, L = length
    R = 10; % radius in Å (value from the first version of RodFit)
    L = 1000; % length in Å (value from the first version of RodFit)
    % Q = 0.001:0.0001:0.5;
    Q = linspace(0.11198,4.46904,1780);
    fun = @(x) ( (2.*besselj(1,Q.*R.*sin(x)))./...
        (Q.*R.*sin(x)).*...
        (sin(Q.*L.*cos(x)./2))./...
        (Q.*L.*cos(x)./2)...
        ).^2.*sin(x);
    P = (integral(fun,0,pi/2,'ArrayValued',true))*X(1)+X(2);
end
With the code above, I can define x0 as x0 = [1 0.001];, pass that into RodFit and get a result. I can also pass xdata into the function and get a result, e.g. [Q,P] = RodFit(xdata(2,:));
Notice that I have changed the orientation of all vectors so that they are now row vectors, and xdata has size size(xdata) = 1780 2.
So I thought I had solved the problem completely, but I still run into problems when I run lsqcurvefit. I get the error message
Error using RodFit
Too many input arguments.
Error in lsqcurvefit (line 199)
initVals.F = feval(funfcn_x_xdata{3},xCurrent,XDATA,varargin{:});
Caused by:
Failure in initial user-supplied objective function evaluation. LSQCURVEFIT cannot continue.
I have no idea why. Does anyone know why RodFit receives too many input arguments when I call lsqcurvefit, but not when I run the function manually using xdata?
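The "Too many input arguments" error is consistent with how lsqcurvefit calls the model: it passes two inputs, fun(x, xdata), where x is the parameter vector being optimized and xdata is the independent variable, and it expects a single output of the same size as ydata. RodFit(X) accepts only one input and returns Q as its first output. A minimal sketch of a model with the expected signature follows. The name rodModel is hypothetical, and the sketch assumes that the measured Q values are stored in column 1 of the data file, which the original post does not state.

```matlab
% Sketch: fit k and C with lsqcurvefit (Optimization Toolbox).
% Assumption: column 1 of the data file holds the measured Q values.
File  = 30;
qdata = DataFiles{1,File}.data(:,1)';   % independent variable, passed as xdata
ydata = DataFiles{1,File}.data(:,2)';   % measured values to fit

p0   = [1, 0.001];                      % initial guess for [k, C]
pfit = lsqcurvefit(@rodModel, p0, qdata, ydata);
plot(qdata, ydata, '.', qdata, rodModel(pfit, qdata), '-');

% Hypothetical model function. Save it as rodModel.m, or keep it at the end
% of a script file (local functions in scripts need R2016b or later).
function P = rodModel(p, q)
    R = 10;      % radius in Å, fixed as in the original RodFit
    L = 1000;    % length in Å, fixed as in the original RodFit
    integrand = @(x) ( (2.*besselj(1, q.*R.*sin(x))) ./ ...
                       (q.*R.*sin(x)) .* ...
                       (sin(q.*L.*cos(x)./2)) ./ ...
                       (q.*L.*cos(x)./2) ...
                     ).^2 .* sin(x);
    % p(1) is the multiplier k, p(2) is the constant C.
    P = integral(integrand, 0, pi/2, 'ArrayValued', true) .* p(1) + p(2);
end
```

To optimize R and L as well, the parameter vector could be extended to [k, C, R, L], with R and L read from p(3) and p(4) instead of being fixed inside the function.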

"Index Exceeds Matrix Dimensions" neural network function error

I've got two datasets, which I load from a CSV file, and split them into X and T:
X (3x5000) double
T (1x5000) double
I'm trying to configure this function, but I can't
http://www.mathworks.co.uk/help/toolbox/nnet/ref/layrecnet.html
X has three features and 5000 examples. T has one feature and 5000 examples. For example, the target is feature 1, 20 steps ahead, so basically X(1,21) == T(1).
[X,T] = simpleseries_dataset;
This works perfectly, in this case, I have 1x100, 1x100.
If I use my own data set, however, I get this:
X = data(:,1:3)';
T = data(:,4)';
net = layrecnet(1:2,10);
[Xs,Xi,Ai,Ts] = preparets(net,X,T);
??? Index exceeds matrix dimensions.
Error in ==> preparets at 273
ti = tt(:,FBS+((1-net.numLayerDelays):0));
I don't understand, what am I doing wrong?
UPDATE
I've noticed that my data set is T (1x5000) double while the example dataset is T (1x100) cell. What's the difference between double and cell?
I solved it by:
X = num2cell(X);
T = num2cell(T);
I have no idea why; it must be MATLAB syntax...
You can solve it with con2seq:
P = con2seq(p);
T = con2seq(t);
% For example (con2seq treats each column as one timestep):
p = [1 2; 3 4; 5 6]';   % 2 features x 3 timesteps
t = [3; 7; 11]';        % 1 target x 3 timesteps
% Now convert to cell arrays:
P = con2seq(p);
T = con2seq(t);
net = elmannet(1:2,12);
[Xs,Xi,Ai,Ts] = preparets(net,P,T);
net = train(net,Xs,Ts,Xi,Ai);
view(net)
Y = net(Xs,Xi,Ai);
perf = perform(net,Ts,Y);
To clarify "(...)it must be MATLAB syntax...":
The problem here is the conversion from double to cell arrays. Matlab does not do this automatically since a cell can contain any type of value as mentioned here: http://www.mathworks.com/help/matlab/matlab_prog/what-is-a-cell-array.html
So, as mentioned in your answer, you can either convert your double arrays to cell arrays using num2cell() or you can allocate X and T as cell arrays from the very beginning using cell() and then copying your double values into them. This explicit type cast is necessary because preparets expects cell arrays as input, much like many of the plot functions in the ANN package.
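To make the double versus cell distinction concrete, here is a small sketch with made-up data. It uses only base MATLAB: num2cell(X,1) puts each column (one timestep) into its own cell, which I believe matches the 1-by-TS layout that con2seq produces.

```matlab
X = rand(3, 5);            % numeric matrix: 3 features x 5 timesteps
Xseq = num2cell(X, 1);     % 1x5 cell array; each cell holds one 3x1 column

class(X)                   % 'double'
class(Xseq)                % 'cell'
size(Xseq)                 % 1 5
size(Xseq{1})              % 3 1

% Curly braces return the contents of a cell, parentheses return a sub-cell.
firstStep = Xseq{1};       % 3x1 double, equal to X(:,1)
isequal(firstStep, X(:,1)) % true
```

A numeric matrix stores values of one type in a regular grid, while each cell of a cell array can hold anything, here one column vector per timestep.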

Implementing Sequentialfs

I am trying to implement sequentialfs for feature selection. I saw this post : Sequential feature selection Matlab
Tried to follow the example given as the solution to implement.
My TrainVec is a matrix of dimension 268 x 1475, whereas TestVec is 116 x 1475, TestLabel is 116 x 1 and TrainLabel is 268 x 1.
The code I implemented is
f = @(TrainVec,TrainLabel,TestVec,TestLabel) sum(TestLabel ~= predict_label);
fs = sequentialfs(f,Vec,Label);
The error i get is :
??? Error using ==> crossval>evalFun at 505
The function
'@(TrainVec,TrainLabel,TestVec,TestLabel)sum(TestLabel~=predict_label)'
generated the following error:
Matrix dimensions must agree.
Error in ==> crossval>getFuncVal at 524
funResult = evalFun(funorStr,arg(:));
Error in ==> crossval at 363
funResult = getFuncVal(1, nData, cvp, data, funorStr, []);
Error in ==> sequentialfs>callfun at 495
funResult = crossval(fun,x,other_data{:},...
Error in ==> sequentialfs at 357
crit(k) = callfun(fun,x,other_data,cv,mcreps);
I have checked all my matrices and made sure that they have the same dimensions. I am not sure what is wrong and need some guidance.
I'm not sure here if predict_label is a variable or a function with zero input arguments. I would guess that if it's a variable, it's not the same size as TestLabel; and if it's a function, it's either not returning something the same size as TestLabel or it has some intermediate calculation that is erroring.
Either way, you would typically want to be writing
f = @(TrainVec,TrainLabel,TestVec,TestLabel) sum(TestLabel ~= predict_label(TrainVec,TrainLabel,TestVec));
where predict_label is now a function that takes in your TrainVec and TrainLabel, builds a model, evaluates it on TestVec, and returns an array of predicted labels of the same size as TestLabel.
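As an illustration of what such a predict_label function could look like, here is a minimal sketch. It assumes numeric class labels and uses classify from the Statistics Toolbox with the 'diaglinear' option as a stand-in classifier. The original post does not say which model is intended, so treat the choice of classifier as a placeholder.

```matlab
% Vec: observations x features, Label: numeric class labels (assumption).
f  = @(TrainVec,TrainLabel,TestVec,TestLabel) ...
        sum(TestLabel ~= predict_label(TrainVec,TrainLabel,TestVec));
fs = sequentialfs(f, Vec, Label);

% Hypothetical helper. Save it as predict_label.m, or keep it at the end of
% a script file (local functions in scripts need R2016b or later).
function labels = predict_label(TrainVec, TrainLabel, TestVec)
    % Fit a discriminant classifier on the training fold and return one
    % predicted label per row of TestVec, so that the result has the same
    % size as TestLabel.
    labels = classify(TestVec, TrainVec, TrainLabel, 'diaglinear');
end
```

Because sequentialfs passes only the currently selected columns to f, TrainVec and TestVec inside the handle always have matching numbers of columns.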