Hi everyone, I have a question. I have a 2095x6 matrix for training and a 2291x6 matrix for testing, and I want to use these datasets to train and test a kNN classifier in MATLAB. I managed to train it, but I don't know how to test it.
Please help me.
I wrote this code to train the kNN:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clc;close all
clear all;
%%
%== data for train
X1=load('hof_features_ped2_train001');
%%
%===train
y1=ones(size(X1.hof_features_ped2_train,1),1);
Mdl = fitcknn(X1.hof_features_ped2_train,y1)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Mdl =
ClassificationKNN
ResponseName: 'Y'
CategoricalPredictors: []
ClassNames: 1
ScoreTransform: 'none'
NumObservations: 2095
Distance: 'euclidean'
NumNeighbors: 1
Properties, Methods
%%%%%%%%%%%%%%%%%%%%
If my training code is right, how can I test the kNN?
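For reference, here is a minimal sketch of how the trained model could be applied to the test matrix. The test file and variable names (hof_features_ped2_test001, hof_features_ped2_test) are assumed by analogy with the training names and are not given in the question. Note that the model above was trained with a single class label (all ones), so predict will return 1 for every test row; a meaningful classifier would need training labels from more than one class.
% Load the test features (file/variable names assumed, mirroring the training set)
X2 = load('hof_features_ped2_test001');
Xtest = X2.hof_features_ped2_test;
% Predict a class label (and score) for each test row with the trained kNN model
[labels, scores] = predict(Mdl, Xtest);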
My data set is basically a matrix of 3 variables (input) and a matrix of 1 variable (target). There are 50 data sets in total for each of these (basically 50 samples of f(x,y,z) = t).
I have only done ANN training using the GUI, never really with a script/code.
My simplest objective now is to split the data manually for each train/test run, so I can painstakingly run the neural network 5 times. But I'm not even sure how to manually select one range of the data set for training and another for testing.
Here's the full exported script from MATLAB. The point of focus is shown below the wall of code.
% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by NFTOOL
% Created Mon Jul 17 02:39:31 SGT 2017
%
% This script assumes these variables are defined:
%
% DEinp - input data.
% DEcgl - target data.
inputs = DEinp;
targets = DEcgl;
% Create a Fitting Network
hiddenLayerSize = 10;
net = fitnet(hiddenLayerSize);
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideMode = 'sample'; % Divide up every sample
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% For help on training function 'trainlm' type: help trainlm
% For a list of all training functions type: help nntrain
net.trainFcn = 'trainlm'; % Levenberg-Marquardt
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse'; % Mean squared error
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
% Train the Network
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
% Recalculate Training, Validation and Test Performance
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs)
valPerformance = perform(net,valTargets,outputs)
testPerformance = perform(net,testTargets,outputs)
% View the Network
view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, plotfit(net,inputs,targets)
%figure, plotregression(targets,outputs)
%figure, ploterrhist(errors)
I figured that all I needed to do was mess with the net.divideMode section, but I really have no idea how to change the syntax to complete my objective.
Network Parameters
The process of splitting the data into training, validation and test sets happens in the section that you identified. I'm just going to break down each of the lines. Starting with:
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideMode = 'sample'; % Divide up every sample
The divideMode is well documented in Neural Network Object Properties
net.divideMode
This property defines the target data dimensions to divide up when
the data division function is called. Its default
value is 'sample' for static networks and 'time' for dynamic networks.
It may also be set to 'sampletime' to divide targets by both sample
and timestep, 'all' to divide up targets by every scalar value, or
'none' to not divide up data at all (in which case all data is used
for training, none for validation or testing).
So your network is a static network whose data is divided up by sample. This will remain the same for your cross-validation. What you are interested in manipulating is the training, test, and validation splits.
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
Okay, the variable names here seem promising, but you want a little more control than just choosing the ratio size.
Again the Neural Network Object Properties point us towards more information
net.divideParam
This property defines the parameters and values of the current
data-division function. To get a description of what each field means,
type the following command:
help(net.divideFcn)
This will print out information about how your dataset is partitioned into training, validation, and test splits. In your current configuration, the message reads
dividerand Partition indices into three sets using random indices.
[trainInd,valInd,testInd] = dividerand(Q,trainRatio,valRatio,testRatio) takes a number of
samples Q and divides up the sample indices 1:Q between training,
validation and test indices.
dividerand randomly assigns sample indices to the three sets according to the three ratios.
(...)
See also divideblock, divideind, divideint, dividetrain.
Since you want more control of the partitions, you should check out these additional options.
I think the most promising is divideind. This option allows you to specify the indices for each partition. You can calculate the indices for each fold in your k-fold cross validation and reassign the partitions in each iteration using this option.
To set this parameter, replace the net.divideParam lines above with something like,
net.divideFcn = 'divideind';
% The total number of samples is supplied automatically from the data,
% so only the three index vectors need to be set:
net.divideParam.trainInd = your_train_ind;
net.divideParam.valInd = your_val_ind;
net.divideParam.testInd = your_test_ind;
K-folds
One last detail: how do you select the indices? First, a quick review of k-fold cross-validation.
The data is split into k equally sized subsamples.
In each iteration of cross-validation, we train on k-1 of the subsamples and test on the remaining subsample, rotating to a new test subsample each time.
An implementation sketch might look like this
k = 5; % As an example, let k = 5
N = length(targets);
% Size of each fold (any leftover samples when N is not divisible by k are ignored)
sample_size = floor(N/k);
% Make a vector of all the data indices, in random order
indices = randperm(N);
% Iterate in steps of sample_size
for ii = 1:sample_size:(k-1)*sample_size + 1
% Grab one subsample of indices for testing
your_test_ind = indices(ii:ii + sample_size - 1);
% Everything else goes into the training set
your_train_ind = indices([1:ii-1, ii + sample_size:end]);
%Train and test your network here!
end
This is just an implementation sketch and doesn't handle every edge case; for example, when the number of samples is not divisible by k, the leftover samples are never used for testing. It should be enough to get you started, though.
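As an aside (not part of the original answer), the same index bookkeeping can be handed to cvpartition from the Statistics and Machine Learning Toolbox. A minimal sketch, assuming targets holds one sample per column as in the script above:
c = cvpartition(size(targets,2), 'KFold', 5); % 5 folds over the samples
for fold = 1:c.NumTestSets
your_train_ind = find(training(c, fold)); % training sample indices for this fold
your_test_ind = find(test(c, fold)); % held-out sample indices for this fold
your_val_ind = []; % no separate validation split in this sketch
% Assign these to net.divideParam.trainInd / valInd / testInd as shown above,
% then call train(net, inputs, targets)
end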
I have a dataset of 20000 instances with 4421 features. For scientific reasons (publication), I need to perform 10-fold cross-validation on this dataset and report the individual and average accuracy of a random forest classifier in MATLAB. Please, could you tell me how to perform 10-fold CV on my dataset and obtain the classification accuracy?
Here is my code so far:
data = load ('HCTSA_N.mat');
% This makes sure we get the same results every time we run the code.
rng default
traindata = data.TS_DataMat;
trainlabels = {data.TimeSeries.Keywords};
% How many trees do you want in the forest?
nTrees = 20;
% Train the TreeBagger (Decision Forest).
B = TreeBagger(nTrees,traindata,trainlabels, 'Method', 'classification');
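One way to extend this (a sketch, not a definitive recipe): use cvpartition to build a stratified 10-fold split, train a TreeBagger on each training fold, and score it on the held-out fold. The variable names traindata, trainlabels and nTrees come from the code above; everything else is an assumption.
labels = trainlabels(:); % column cell array of class labels
k = 10;
cvp = cvpartition(labels, 'KFold', k); % stratified 10-fold partition
foldAcc = zeros(k, 1);
for i = 1:k
trIdx = training(cvp, i);
teIdx = test(cvp, i);
B = TreeBagger(nTrees, traindata(trIdx,:), labels(trIdx), 'Method', 'classification');
pred = predict(B, traindata(teIdx,:)); % cell array of predicted labels
foldAcc(i) = mean(strcmp(pred, labels(teIdx))); % accuracy on this fold
end
foldAcc % individual accuracy per fold
mean(foldAcc) % average accuracy over the 10 folds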
I am looking for an example of applying 10-fold cross-validation to a neural network. I need something like the answer to this question: Example of 10-fold SVM classification in MATLAB
I would like to classify all 3 classes, while in the example only two classes were considered.
Edit: here is the code I wrote for the iris example
load fisheriris %# load iris dataset
k=10;
cvFolds = crossvalind('Kfold', species, k); %# get indices of 10-fold CV
net = feedforwardnet(10);
for i = 1:k %# for each fold
testIdx = (cvFolds == i); %# get indices of test instances
trainIdx = ~testIdx; %# get indices training instances
%# train
net = train(net,meas(trainIdx,:)',species(trainIdx)');
%# test
outputs = net(meas(trainIdx,:)');
errors = gsubtract(species(trainIdx)',outputs);
performance = perform(net,species(trainIdx)',outputs)
figure, plotconfusion(species(trainIdx)',outputs)
end
The error given by MATLAB:
Error using nntraining.setup>setupPerWorker (line 62)
Targets T{1,1} is not numeric or logical.
Error in nntraining.setup (line 43)
[net,data,tr,err] = setupPerWorker(net,trainFcn,X,Xi,Ai,T,EW,enableConfigure);
Error in network/train (line 335)
[net,data,tr,err] = nntraining.setup(net,net.trainFcn,X,Xi,Ai,T,EW,enableConfigure,isComposite);
Error in Untitled (line 17)
net = train(net,meas(trainIdx,:)',species(trainIdx)');
It's a lot simpler to just use MATLAB's crossval function than to do it manually using crossvalind. Since you are just asking how to get the test "score" from cross-validation, as opposed to using it to choose an optimal parameter like for example the number of hidden nodes, your code will be as simple as this:
load fisheriris;
% // Split up species into 3 binary dummy variables
S = unique(species);
O = [];
for s = 1:numel(S)
O(:,end+1) = strcmp(species, S{s});
end
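As an aside (not part of the original answer), the same dummy encoding can be produced with two Statistics Toolbox helpers; a minimal equivalent sketch:
% grp2idx maps each class name to an integer 1..3; dummyvar expands that
% into three 0/1 indicator columns, one per class
O = dummyvar(grp2idx(species));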
% // Crossvalidation
vals = crossval(@(XTRAIN, YTRAIN, XTEST, YTEST) fun(XTRAIN, YTRAIN, XTEST, YTEST), meas, O);
All that remains is to write that function fun, which takes in input and output training and test sets (all provided to it by the crossval function, so you don't need to worry about splitting your data yourself), trains a neural net on the training set, tests it on the test set and then outputs a score using your preferred metric. So something like this:
function testval = fun(XTRAIN, YTRAIN, XTEST, YTEST)
net = feedforwardnet(10);
net = train(net, XTRAIN', YTRAIN');
yNet = net(XTEST');
%// find which output (of the three dummy variables) has the highest probability
[~,classNet] = max(yNet',[],2);
%// convert YTEST into a format that can be compared with classNet
[~,classTest] = find(YTEST);
%// Check the success of the classifier
cp = classperf(classTest, classNet);
testval = cp.CorrectRate; %// replace this with your preferred metric
end
I don't have the Neural Network Toolbox, so I am unable to test this, I'm afraid. But it should demonstrate the principle.
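By default crossval uses 10-fold cross-validation, so vals will be a 10-by-1 vector of per-fold correct rates; a short usage sketch:
vals % correct rate for each of the 10 folds
mean(vals) % average cross-validated accuracy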
In order to find the best parameters to be used with libsvm, I used the code below. Instead of './heart_scale' I had a file containing positive and negative examples, each with a HOG vector in libsvm format. I had 1000 positive examples and 4000 negative ones. However, these were in order, i.e. the first 1000 examples were positive and the rest were negative.
Question: Now I am in doubt whether the accuracy returned by this code is the actual accuracy. This is because, from what I read about 5-fold cross-validation, it takes the first 4/5 of the data for training and the remaining 1/5 for testing. Does this mean that the test set could be all negative? Or does it pick the examples randomly?
%# read some training data
[labels,data] = libsvmread('./heart_scale');
%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);
%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
cv_acc(i) = svmtrain(labels, data, ...
sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end
%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);
%# contour plot of parameter selection
contour(C, gamma, reshape(cv_acc,size(C))), colorbar
hold on
plot(C(idx), gamma(idx), 'rx')
text(C(idx), gamma(idx), sprintf('Acc = %.2f %%',cv_acc(idx)), ...
'HorizontalAlign','left', 'VerticalAlign','top')
hold off
xlabel('log_2(C)'), ylabel('log_2(\gamma)'), title('Cross-Validation Accuracy')
%# now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
%# ...
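For completeness, a sketch of the final step, assuming the LIBSVM MATLAB interface (svmtrain/svmpredict) is on the path; test_labels and test_data are hypothetical placeholders for a held-out set:
% Train the final model on all training data with the selected parameters
final_model = svmtrain(labels, data, sprintf('-c %f -g %f', best_C, best_gamma));
% Evaluate it on an independent test set (placeholder variables, not defined above)
[predicted_labels, accuracy, dec_values] = svmpredict(test_labels, test_data, final_model);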
You can find the answer to your question in the LIBSVM source code.
See the function svm_cross_validation in svm.cpp.
As you can see, for the classification cross-validation problem LIBSVM first performs class grouping and then shuffling.
So, the answer to your question is: yes, the accuracy returned by this code is the actual accuracy.
Note: the accuracy estimate also depends on the nature of the data and the number of cross-validation folds, and is itself a random value with some distribution.
I tried to run this code found online, but it does not work. The error is
Error using svmclassify (line 53)
The first input should be a struct generated by SVMTRAIN.
Error in fisheriris_classification (line 27)
pred = svmclassify(svmModel, meas(testIdx,:), 'Showplot',false);
Can anyone help me fix this problem? Thank you so much!
clear all;
close all;
load fisheriris %# load iris dataset
groups = ismember(species,'setosa'); %# create a two-class problem
%# number of cross-validation folds:
%# If you have 50 samples, divide them into 10 groups of 5 samples each,
%# then train with 9 groups (45 samples) and test with 1 group (5 samples).
%# This is repeated ten times, with each group used exactly once as a test set.
%# Finally the 10 results from the folds are averaged to produce a single
%# performance estimation.
k=10;
cvFolds = crossvalind('Kfold', groups, k); %# get indices of 10-fold CV
cp = classperf(groups); %# init performance tracker
for i = 1:k %# for each fold
testIdx = (cvFolds == i); %# get indices of test instances
trainIdx = ~testIdx; %# get indices training instances
%# train an SVM model over training instances
svmModel = svmtrain(meas(trainIdx,:), groups(trainIdx), ...
'Autoscale',true, 'Showplot',false, 'Method','QP', ...
'BoxConstraint',2e-1, 'Kernel_Function','rbf', 'RBF_Sigma',1);
%# test using test instances
pred = svmclassify(svmModel, meas(testIdx,:), 'Showplot',false);
%# evaluate and update performance object
cp = classperf(cp, pred, testIdx);
end
%# get accuracy
cp.CorrectRate
%# get confusion matrix
%# columns:actual, rows:predicted, last-row: unclassified instances
cp.CountingMatrix
%with the output:
%ans =
% 0.99333
%ans =
% 100 1
% 0 49
% 0 0
The reason for the issue seems to me to be the way MATLAB finds functions on the search path. I am fairly certain that it is still attempting to use the LIBSVM function rather than the built-in MATLAB function. Here is more information about the search path:
http://www.mathworks.com/help/matlab/matlab_env/what-is-the-matlab-search-path.html
To verify whether this is the issue, please try the following command in the command window:
>> which -all svmtrain
You should find that the built-in function is being shadowed by the LIBSVM function. You can either remove LIBSVM from the MATLAB search path using the "Set Path" tool in the Toolstrip, or run your code from a different directory that does not contain the LIBSVM files. I would recommend the first option. To read more about the built-in MATLAB functions, check these links:
http://www.mathworks.com/help/stats/svmtrain.html
http://www.mathworks.com/help/stats/svmclassify.html
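If you prefer to do this from the command line instead of the Set Path tool, a minimal sketch (the LIBSVM folder path below is hypothetical; use the location reported by which -all svmtrain on your machine):
rmpath('C:\toolboxes\libsvm\matlab'); % remove the shadowing LIBSVM folder (hypothetical path)
which -all svmtrain % the built-in Statistics Toolbox version should now be found first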
If you would like to continue using LIBSVM, I would recommend checking out the following site:
https://www.csie.ntu.edu.tw/~cjlin/index.html
Hope this helps.