K-fold cross validation - matlab

How do I perform k-fold cross validation on a data set, say X?
I have gone through the MATLAB site and have tried this for a data set X.
The following is the code for 10-fold cross validation on set X:
c = cvpartition(X,'KFold',10);
This creates an object c, but how do I access the different parts and use them to test my classifier? I have not been able to work this out even after going through various texts.
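For reference, a cvpartition object exposes per-fold logical masks through its training and test methods; a minimal sketch of how one would typically consume it (assuming X has one observation per row):
c = cvpartition(size(X,1), 'KFold', 10);
for i = 1:10
    trainIdx = training(c, i);  % logical index of training observations in fold i
    testIdx = test(c, i);       % logical index of held-out observations in fold i
    % train on X(trainIdx,:) and evaluate on X(testIdx,:)
end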

Follow this:
C = crossvalind('Kfold', X_label, 10);
for i = 1:10
    Test = (C == i);    % logical mask for the i-th fold
    Train = ~Test;      % everything else is training data
    SVMStruct = svmtrain(X(Train,:), X_label(Train,:));
    Result = svmclassify(SVMStruct, X(Test,:));  % predicted labels for fold i
end
Here X is your data set and X_label contains the corresponding class labels. Note that Result is overwritten on every iteration; store it (or a per-fold accuracy) if you want to aggregate performance across folds.
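Note that svmtrain and svmclassify were removed in newer MATLAB releases in favor of fitcsvm and predict. A minimal sketch of the same loop in the newer API, which also accumulates per-fold accuracy (assuming X is a numeric matrix and X_label a numeric label vector):
C = crossvalind('Kfold', X_label, 10);
acc = zeros(10,1);
for i = 1:10
    Test = (C == i);
    Train = ~Test;
    Mdl = fitcsvm(X(Train,:), X_label(Train));  % train an SVM on the training folds
    pred = predict(Mdl, X(Test,:));             % classify the held-out fold
    acc(i) = mean(pred == X_label(Test));       % fraction correct in this fold
end
fprintf('mean CV accuracy: %.3f\n', mean(acc));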

Related

Nonlinear Regression & Optimisation

I need some help... I have a model function:
function y = Surf(param, x)
global af1 af2 tData
A1 = param(1); mER1 = param(2); A2 = param(3); mER2 = param(4);
m = param(5); n = param(6);
k1 = @(T) A1*exp(mER1/T);   % Arrhenius-type rate terms
k2 = @(T) A2*exp(mER2/T);
af = @(T) sech(af1*T + af2);
y = zeros(length(x),1);
for i = 1:length(x)
    a = x(i,1); T = tData(i,1);  % temperatures come in through the global tData
    y(i) = (k2(T) + k1(T)*(a.^m)) * ((af(T) - a).^n);
end
end
And I have a set of data giving Cure, Cure_rate, and Temperature, each as a single column vector.
Basically, I tried to use:
[output,R1] = lsqcurvefit(@Surf, initial_guess, Cure, Cure_rate)
[output2,R2] = nlinfit(Cure, Cure_rate, @Surf, initial_guess)
and they work pretty well (initial_guess holds the initial guesses for the parameters of the model above: [1.1e+07 -7.8e+03 1.2e+06 -7.1e+03 2.2 0.72]).
My main problem is that when I look into other methods that could do nonlinear regression, such as fminsearch, fmincon, fsolve, fminunc, etc., they just don't work, and I am confused about what input they expect. Unlike nlinfit and lsqcurvefit (which take Cure and Cure_rate as inputs), most of them take only the model function and the initial guess. This is what I tried:
output3 = fminsearch(@Surf, initial_guess)
output4 = fsolve(@Surf, initial_guess)
output5 = fmincon(@Surf, x0, A, b, Aeq, beq)
(I am not sure what to put for the linear inequality constraints A, b and the equality constraints Aeq, beq.)
output6 = fminunc(@Surf, initial_guess)
The problem is that MATLAB keeps saying I have either not enough or too many input arguments, which I don't understand. How should I pass my data set (Cure, Cure_rate) to these functions, as in nlinfit and lsqcurvefit?
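For anyone with the same confusion: fminsearch, fminunc and fmincon minimize a scalar objective of the parameters alone, so the data has to be captured inside the objective, e.g. via an anonymous function. A minimal sketch, assuming Cure and Cure_rate are column vectors and the globals used by Surf are already set:
% sum of squared residuals, with the data baked into the closure
sse = @(param) sum((Surf(param, Cure) - Cure_rate).^2);
output3 = fminsearch(sse, initial_guess);
output6 = fminunc(sse, initial_guess);
% fmincon with no linear constraints: pass [] for A, b, Aeq, beq
output5 = fmincon(sse, initial_guess, [], [], [], []);
fsolve is a root finder, not a least-squares fitter, so it is not the right tool here.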

EEG data classification with SWLDA using matlab

I would like to ask for your help with EEG data classification.
I am a graduate student trying to analyze EEG data.
I am currently struggling to classify ERP speller (P300) data with SWLDA using MATLAB.
Maybe there is something wrong in my code.
I have read several articles, but they did not cover much detail.
My data sizes are as follows:
size(target) = [300 1856]
size(nontarget) = [998 1856]
Rows are trials, columns are the flattened features
(each trial's [64 29] matrix was stretched into one row; I did not select an ROI).
I used the stepwisefit function in MATLAB to classify target vs. non-target.
The code is attached below.
ingredients = [targets; nontargets];
heat = [class_targets; class_nontargets]; % target: 1, non-target: -1
% shuffle the rows (there is no built-in shuffle; randperm does the job)
full_set = [ingredients heat];
randomized_set = full_set(randperm(size(full_set,1)), :);
for k = 1:10 % 10-fold cross validation
    partition_factor = ceil(size(randomized_set,1) / 10);
    cv_test_idx = (k-1)*partition_factor + 1 : min(k*partition_factor, size(randomized_set,1));
    total_idx = 1:size(randomized_set,1);
    cv_train_idx = total_idx(~ismember(total_idx, cv_test_idx));
    ingredients = randomized_set(cv_train_idx, 1:end-1);
    heat = randomized_set(cv_train_idx, end);
    [W,SE,PVAL,INMODEL,STATS,NEXTSTEP,HISTORY] = stepwisefit(ingredients, heat, 'penter', .1);
    valid_id = find(INMODEL==1);               % features kept by stepwise selection
    v_weights = W(valid_id)';
    t_ingredients = randomized_set(cv_test_idx, 1:end-1);
    t_heat = randomized_set(cv_test_idx, end); % true labels for test set
    v_features = t_ingredients(:, valid_id);
    v_weights = repmat(v_weights, size(v_features, 1), 1);
    predictor = sum(v_weights .* v_features, 2); % note: STATS.intercept is not added here
    m_result = predictor > 0;                  % class A: +1, B: 0
    t_heat(t_heat==-1) = 0;
    acc(k) = sum(m_result==t_heat) / length(m_result);
end
P.S. My code is currently very inefficient and might be bad.
My understanding is that stepwisefit evaluates the significance of the coefficients at every step, so only the significant columns remain in the model.
It is not LDA as such, but for binary classification, LDA and linear regression lead to the same decision rule.
However, the results were almost at chance level (on other binary data sets from the internet, it worked).
I think I have done something wrong, and your help could correct me.
I would appreciate any suggestions and tips for implementing a classifier for an ERP speller.
Or any idea for implementing SWLDA in MATLAB?
The name SWLDA is only used in the context of Brain Computer Interfaces, but I bet it has another name in a more general context.
If you track down the recipe for SWLDA you will end up at the Krusienski 2006 papers ("A comparison..." and "Toward enhanced P300...") and from there at the book where stepwise regression is explained: Draper & Smith, Applied Regression Analysis, 1981. However, as far as I am aware, no paper actually gives the complete recipe for implementing it (with all its details and secrets).
My approach was using stepwiseglm:
H = predictors;   % training features (one column per observation)
TH = variables;   % test features
lbs = labels;     % class labels (1,2)
if (stepwiseflag)
    % stepwise logistic regression, starting from a constant model
    mdl = stepwiseglm(H', lbs'-1, 'constant', 'upper', 'linear', 'distr', 'binomial');
    if (mdl.NumEstimatedCoefficients > 1)
        inmodel = [];
        for i = 2:mdl.NumEstimatedCoefficients
            % coefficient names look like 'x12'; strip the 'x' to get the feature index
            inmodel = [inmodel str2num(mdl.CoefficientNames{i}(2:end))];
        end
        H = H(inmodel,:);   % keep only the selected features
        TH = TH(inmodel,:);
    end
end
lbls = classify(TH', H', lbs', 'linear');  % LDA on the selected features
You can also use a k-fold cross validation approach using MATLAB's cvpartition:
c = cvpartition(lbs,'k',10);
opts = statset('display','iter');
fun = @(XT,yT,Xt,yt)...
    (sum(~strcmp(yt,classify(Xt,XT,yT,'linear'))));
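The snippet defines fun but stops short of using it; one way to consume it is MATLAB's crossval (a minimal sketch, assuming H holds one observation per column, hence the transposes, and lbs is a cell array of label strings, since fun compares labels with strcmp):
missed = crossval(fun, H', lbs', 'partition', c);  % misclassifications per fold
errRate = sum(missed) / sum(c.TestSize);           % overall error rate
The opts line with 'display','iter' suggests the criterion function was meant to be plugged into sequentialfs, which accepts the same signature.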

How to perform stratified 10 fold cross validation for classification in MATLAB?

My implementation of the usual K-fold cross-validation is pretty much like:
K = 10;
CrossValIndices = crossvalind('Kfold', size(B,2), K);
for i = 1:K
    display(['Cross validation, fold ' num2str(i)])
    xTraining = B(:, CrossValIndices~=i);   % observations are in columns
    tTrain = T_new1(:, CrossValIndices~=i);
    xTest = B(:, CrossValIndices==i);
    tTest = T_new1(:, CrossValIndices==i);
end
But to ensure that the training, testing, and validation sets have similar proportions of the classes (there are 20 classes), I want to use a stratified sampling technique. The basic purpose is to avoid the class imbalance problem. I know about the SMOTE technique, but I want to apply stratification here.
You can simply use crossvalind('Kfold', Group, K), where Group is the vector containing the class label for each observation. This leads to folds in which each class is proportionally represented.
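A minimal sketch with made-up, deliberately imbalanced labels to show the effect (the Group vector here is hypothetical):
Group = [ones(120,1); 2*ones(60,1); 3*ones(20,1)];  % three imbalanced classes
K = 10;
idx = crossvalind('Kfold', Group, K);
tabulate(Group(idx == 1))  % fold 1 holds roughly 12/6/2 items of classes 1/2/3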

First non demo example for Gaussian process using GPML (Matlab)?

After gaining a basic understanding of the GPML toolbox, I wrote my first code using these tools. I have a data matrix, data, consisting of two columns with 1000 values in total. I want to use this matrix to estimate GP values with the GPML toolbox. My code is as follows:
X = data(1:200,1);    % training inputs
Y = data(1:200,2);    % training targets
Ys = data(201:400,2);
Xs = data(201:400,1); % test cases
covfunc = {@covSE, 3};
ell = 1/4; sf = 1;
hyp.cov = log([ell; sf]);
likfunc = @likGauss;
sn = 0.1;
hyp.lik = log(sn);
[ymu ys2 fmu fs2] = gp(hyp, @infExact, [], covfunc, likfunc, X, Y, Xs, Ys);
plot(Xs, fmu);
But when I run this code, I get:
Error using covMaha (line 58) Parameter mode is either 'eye', 'iso',
'ard', 'proj', 'fact', or 'vlen'
Could you please help me figure out where I am going wrong?
I know this is way late, but I just ran into this myself. The way to fix it is to change
covfunc = {@covSE, 3};
to something like
covfunc = {@covSE, 'iso'};
It doesn't have to be 'iso'; it can be any of the options listed in the error message. Just make sure your hyperparameters are set correctly for the specific mode you choose. This is detailed further in the covMaha.m file in GPML.
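As a quick sanity check, GPML covariance functions report the number of hyperparameters they expect when called with no data arguments; a minimal sketch, assuming a recent GPML version on the path:
covfunc = {@covSE, 'iso'};
nhyp = eval(feval(covfunc{:}));  % e.g. 2 for 'iso': log(ell) and log(sf)
hyp.cov must then be a vector of exactly that length before calling gp.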

How to use aryule() in Matlab to extend a number series?

I have a series of numbers, and I calculated the autoregressive (AR) coefficients for it using the Yule-Walker method.
But now how do I extend the series?
My whole working is as follows:
a) the series I use:
143.85 141.95 141.45 142.30 140.60 140.00 138.40 137.10 138.90 139.85 138.75 139.85 141.30 139.45 140.15 140.80 142.50 143.00 142.35 143.00 142.55 140.50 141.25 140.55 141.45 142.05
b) this data is loaded in to data using:
data = load('c:\input.txt', '-ascii');
c) the calculation of the coefficients:
ar_coeffs = aryule(data,9);
this gives:
ar_coeffs =
1.0000 -0.9687 -0.0033 -0.0103 0.0137 -0.0129 0.0086 0.0029 -0.0149 0.0310
d) Now using this, how do I calculate the next number in the series?
[Any other method of doing this (besides aryule()) is also fine... this is just what I did; if you have a better idea, please let me know!]
For a real valued sequence x of length N, and a positive order p:
coeff = aryule(x, p)
returns the AR coefficients of order p for the data x (note that coeff(1) is a normalizing factor and is always 1). In other words, it models each value as a linear combination of the past p values. So to predict the next value from the last p values:
x(N+1) = -sum_{k=1:p} ( coeff(k+1)*x(N+1-k) )
or in actual MATLAB code:
p = 9;
data = [...]; % the seq you gave
coeffs = aryule(data, p);
nextValue = -coeffs(2:end) * data(end:-1:end-p+1)';
EDIT: If you have access to the System Identification Toolbox, you can use any of a number of functions to estimate AR/ARMAX models (ar/arx/armax), or even find the order of the AR model using selstruc:
m = ar(data, p, 'yw'); % yw for Yule-Walker method
pred = predict(m, data, 1);
coeffs = m.a;
nextValue = pred(end);
subplot(121), plot(data)
subplot(122), plot( cell2mat(pred) )
Your data has a non-zero mean. Doesn't the Yule-Walker model assume the data is the output of a linear filter excited by a zero-mean white noise process?
If you remove the mean, this example using ARYULE and LPC might be what you're looking for. The procedure boils down to:
a = lpc(data,9); % uses Yule-Walker modeling
pred = filter(-a(2:end),1,data);
disp(pred(end)); % the predicted value at time N+1
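Putting the mean handling together, a minimal sketch (the demeaning step follows the caveat above; the add-back at the end is my assumption about what you want to report):
mu = mean(data);
xc = data - mu;                   % zero-mean series, as the AR model assumes
a = lpc(xc, 9);                   % Yule-Walker AR(9) fit
pred = filter(-a(2:end), 1, xc);  % one-step-ahead predictions of the demeaned series
nextValue = pred(end) + mu;       % add the mean back to get the actual forecast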