I'm trying to build a diagnostic tool for predicting disease outcome. My plan is to train a Nearest Mean Classifier (NMC) on candidate genes and estimate its error on a test set. To this end I generated a training set and a test set using gendat from PRTools. However, when I try to train the NMC, MATLAB reports that the training set has no classes. How do I assign classes to the dataset?
load vantVeer.mat
% D.data is data from vantVeer
[train_data,test_data,I_train,I_test]=gendat(D.data',39);
W=nmc(train_data)
Error using isvaldfile (line 48)
Labeled datafile(set) expected
Error in nmc (line 52)
isvaldfile(a,1,2); % at least 1 object per class, 2 classes
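You need to build a labeled dataset from the data and its class labels, as below. Here train_classes and test_classes are the label vectors for the rows of train_data and test_data: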
traindataset = dataset(train_data, train_classes);
testdataset = dataset(test_data, test_classes);
W = nmc(traindataset)
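To see why the classifier cannot be trained without labels, here is a minimal sketch of the nearest-mean idea in plain Python. It is not PRTools' implementation; the function names and the toy data are made up for illustration. Training computes one mean vector per class, which is impossible if the samples carry no class labels.

```python
def train_nmc(samples, labels):
    """Return a dict mapping each class label to the mean of its samples."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        # Accumulate the feature-wise sum for class y
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_nmc(means, x):
    """Assign x to the class whose mean is closest (squared Euclidean distance)."""
    def sq_dist(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(means, key=lambda y: sq_dist(means[y]))


# Hypothetical toy data: two features, two outcome classes
train_x = [[0, 0], [0, 2], [10, 10], [10, 12]]
train_y = ["good", "good", "poor", "poor"]
means = train_nmc(train_x, train_y)
print(predict_nmc(means, [1, 1]))
```

The test error is then the fraction of test samples whose predicted label differs from their true label, which again requires a labeled test set.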
It seems that cross-validated models cannot be used with the predict function. How would one go about using such a model with a test set? For example:
ens = fitcecoc(X, T, 'KFold', 10)
Using the predict function directly throws an error, and the MATLAB documentation explains why very well: ens is a partitioned model containing 10 different classifiers. Should we run predict with each classifier and then pick the class with the most votes?
A couple of other similar questions haven't received answers, so I figured I'd answer my own question with the solution I found. MATLAB K-fold cross-validation produces K different classifiers or regressors. Each one is trained with a different portion of the data held out (the hold-out operation is random, so be careful if you have an unbalanced dataset). To predict the output class, you can iterate over all K trained models and use mode to take the majority vote.
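% Cross-validate the trained ensemble; this yields one model per fold
cv_Ensemble = crossval(Ensemble_Model, 'KFold', 10);
K = numel(cv_Ensemble.Trained);
N = size(Data_f, 1);            % number of test observations (rows of Data_f)
classIdx = zeros(N, K);         % assumes numeric class labels
for p = 1:K
    % Predict the test data with the model trained on fold p
    classIdx(:, p) = predict(cv_Ensemble.Trained{p}, Data_f);
end
% Majority vote across the K models for each observation
classIdx = mode(classIdx, 2)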
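The voting step is independent of MATLAB, so here is a small language-neutral sketch of it in plain Python. The function name and the sample predictions are hypothetical. One difference worth knowing: MATLAB's mode returns the smallest value on a tie, whereas this sketch returns the label that was seen first.

```python
from collections import Counter


def majority_vote(fold_predictions):
    """Combine per-fold predictions into one label per sample.

    fold_predictions[p][i] is the label that fold model p assigns to sample i.
    """
    n_samples = len(fold_predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in fold_predictions)
        # most_common(1) gives the label with the highest count;
        # ties go to the label encountered first
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical predictions of 3 fold models for 4 test samples
preds = [
    [1, 2, 2, 3],
    [1, 2, 3, 3],
    [2, 2, 3, 3],
]
print(majority_vote(preds))
```

With an even number of folds, ties are possible, so it can be worth checking how often the models disagree before trusting the vote.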
I have two input data files to use in Orange: one contains the training set (with targets "A", "B" and "C") and the other the unknown samples (with targets "D" and "E", so that the unknown samples can be identified in the scatter plot of the first two principal components).
I have applied PCA to the training set and, through a Python script, reapplied the PCA transformation to the test set. However, in the result all entries from the unknown samples set have ? as their target value.
I have also tried to merge the training and unknown samples sets with the Merge Table widget, and apparently it does the same thing: all training samples are correct, but the unknown samples have ? as targets.
The only way I managed to get this running properly is to have the unknown samples and the training set in the same input file, which is not practical for obvious reasons.
Is there any way to fix this?
Please note that I have tried to change domain.class_var and the target value directly on the transformed unknown samples, but this also alters the domain of the training set. Apparently, when the new table is created it only holds a reference to the domain of the original training data after PCA.
I solved it by converting the data into NumPy arrays, concatenating them, and then converting the result back into a table.
Here is the code if anyone is interested:
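import numpy
from Orange.data.table import Table
from Orange.data import Domain, DiscreteVariable, ContinuousVariable

# in_object is assumed to be the PCA-transformed training table, in_data the unknown samples
trnsfrmd_knwn_data = numpy.array(in_object)
# Map the unknown samples into the domain of the transformed training data
trnsfrmd_unkwn_data = numpy.array(Table(in_object.domain, in_data))
# First free class index: one past the largest index in the class (last) column.
# max() replaces list(set(...))[-1], which relied on set ordering.
ndx = trnsfrmd_knwn_data[:, -1].max() + 1
# Overwrite the missing class values of the unknown samples with new indices starting at ndx
trnsfrmd_unkwn_data[:, -1] = numpy.arange(len(trnsfrmd_unkwn_data)) + ndx
# Build a new class variable that holds the target values of both tables
targets = in_object.domain.class_var.values + in_data.domain.class_var.values
dm = Domain([ContinuousVariable(x.name) for x in in_object.domain.attributes], DiscreteVariable('region', values=targets))
out_data = Table.from_numpy(dm, numpy.append(trnsfrmd_knwn_data, trnsfrmd_unkwn_data, axis=0))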
I am new to MATLAB. I found some SVM code that selects the training and test sets randomly with crossvalind, and I would like to fix the two sets instead so that they are the same on every run. My database is the YMU database. How should I fix the training and test sets in this code, and which variable should I change in the crossvalind call?
%load YMU database
%NMC is non-makeup , MC is makeup
%testingset = non-makeup, trainingset is makeup
load TestingSetNMC.mat
load TrainingSetMC.mat
load gnd_Test.mat
load gnd_Train.mat
data1 = TrainingSet;
data2 = TestingSet;
groups1 = ismember(gnd_Train,'data1');
groups2 = ismember(gnd_Test,'data2');
%crossvalind is random choose
[train] = crossvalind('holdOut',groups1);
[test] = crossvalind('holdOut',groups2);
cp = classperf(groups1);
svmStruct = svmtrain(data1(train,:),groups1(train),'showplot',true);
classes = svmclassify(svmStruct,data2(test,:),'showplot',true);
classperf(cp,classes,test);
cp.CorrectRate
With (most) MATLAB functions that generate pseudo-random output, you can control that output by explicitly specifying the random number generator's seed and method.
In your case, place the following line anywhere before you call crossvalind:
rng(1, 'twister');
This sets the seed to 1 and the method to Mersenne Twister. In the documentation for rng you will find a more detailed explanation about controlling pseudo-random output.
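The same idea in a minimal plain-Python sketch: a hold-out split driven by a seeded generator returns the same partition on every run. The function name and parameters are hypothetical, and although Python's random module also uses the Mersenne Twister, its number stream is not the same as MATLAB's, so the actual indices will differ from what crossvalind produces.

```python
import random


def holdout_split(n_samples, test_fraction=0.5, seed=1):
    """Return (train_indices, test_indices) for a reproducible hold-out split."""
    rng = random.Random(seed)          # private generator with a fixed seed
    indices = list(range(n_samples))
    rng.shuffle(indices)               # same seed -> same shuffle
    n_test = int(round(n_samples * test_fraction))
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


first = holdout_split(20, test_fraction=0.25, seed=1)
second = holdout_split(20, test_fraction=0.25, seed=1)
print(first == second)  # the split is identical because the seed is identical
```

Changing the seed gives a different but equally repeatable split, which is handy when comparing several fixed partitions.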
Thanks in advance for the help.
I am trying to use stepwise regression on a set of data. I have the data in a table, with the single response variable in the rightmost column. Here is what my code looks like.
mdl = stepwiseglm(dummyTrainingTable,'modelspec',modelTech,'Criterion',criterion);
where modelTech and criterion are variables that hold strings dictating two name-value pair options. I am getting the following error
Error using classreg.regr.FitObject/assignData (line 257)
Predictor and response variables must have the same length.
Error in classreg.regr.TermsRegression/assignData (line 349)
model = assignData@classreg.regr.ParametricRegression(model,X,y,w,asCat,varNames,excl);
Error in GeneralizedLinearModel/assignData (line 794)
model = assignData@classreg.regr.TermsRegression(model,X,y,w,asCat,dummyCoding,varNames,excl);
Error in GeneralizedLinearModel.fit (line 1165)
model = assignData(model,X,y,weights,offset,binomN,asCatVar,dummyCoding,model.Formula.VariableNames,exclude);
Error in GeneralizedLinearModel.stepwise (line 1271)
model = GeneralizedLinearModel.fit(X,y,start.Terms,'Distribution',distr,
...
Error in stepwiseglm (line 148)
model = GeneralizedLinearModel.stepwise(X,varargin{:});
This doesn't make sense to me, since my response and predictor variables clearly have the same length: they're in a table together. If they weren't the same length, they couldn't be in a table, right? Is this an issue with MATLAB, or is there something simple that I am missing?
Note: when I convert the table to a matrix, stepwiseglm runs just fine, i.e.,
dummyTrainingArray = table2array(dummyTrainingTable);
mdl = stepwiseglm(dummyTrainingArray(:,1:size(dummyTrainingArray,2) - 1), dummyTrainingArray(:,size(dummyTrainingArray,2)),modelTech,'VarNames', ...
dummyTrainingTable.Properties.VariableNames,'Criterion', criterion);
I figured out a solution. Although the documentation online states that the input can be a table, when I checked the manual within my version of Matlab (run 'help stepwiseglm'), I found that the function was compatible only with datasets. I then converted my table to a dataset and it ran fine.
Edit: I have MATLAB version 8.2.0.701 (R2013b).
'modelspec' is not a valid argument name for the function. Try:
mdl = stepwiseglm(dummyTrainingTable, modelTech, 'Criterion', criterion);
I have a 90×8 dataset that I feature-extracted (by summing 1's in every 10×10 cell) from 90 character images i.e. digits 1-9. Every row represents an image.
I am trying to use the following code to train a neural network and recognize new input images (digits between 1 and 9 inclusive):
net.trainFcn='traingdx';
net.performFcn='sse';
net.trainParam.goal=0.1;
net.trainParam.show=20;
net.trainParam.epochs=5000;
net.trainParam.mc=0.95;
net =newff(minmax(datasetNormalized'),[20 9],{'logsig' 'logsig'});
T=reshape(repmat([1:9],10,1),1,90);
[net,tr]=train(net,datasetNormalized,T);
Afterwards I want to use the following to recognize new images using the trained network. m is an image character that has also been feature extracted.
[a,m]=max(sim(net,m));
disp(b);
I am getting the following errors and I don't have any idea how to solve it:
Error using trainlm (line 109)
Inputs and targets have different numbers of samples.
Error in network/train (line 106) [net,tr] =
feval(net.trainFcn,net,X,T,Xi,Ai,EW,net.trainParam);
Error in Neural (line 55) [net,tr]=train(net,datasetNormalized,T);
Note: datasetNormalized is my dataset normalized in [0,1].
Which part causes the problem?
The error message itself points at the problem: inputs and targets have different numbers of samples. The toolbox treats every column as one sample. T is 1×90, so it already has 90 columns, but datasetNormalized is 90×8 and has to be transposed:
[net,tr]=train(net,datasetNormalized,T); --> [net,tr]=train(net,datasetNormalized',T);
T is used as the target for the network. Following a friend's advice, I therefore defined T as a 9×90 array in which the first 10 columns have a 1 in the first row and zeros elsewhere, the next 10 columns have a 1 in the second row, and so on:
T=zeros(9,90);
for j=1:90
i=ceil(j/10);
T(i,j)=1;
end
[net,tr]=train(net,datasetNormalized',T);
This solved the error I was getting when training the network, though I'm still not sure how the outputs map back to the input characters so that they can be identified.
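On the mapping back to characters: with this kind of one-of-N target, the usual convention is that row i of the output stands for digit i, so the recognized digit is the index of the largest output. The plain-Python sketch below builds the same 9×90 target layout and decodes a made-up output column; the function names and the activation values are hypothetical. In MATLAB the decoding step would presumably be the second output of max applied to sim(net, x), as in the question's own snippet.

```python
def one_hot_targets(n_classes, per_class):
    """Build an n_classes x (n_classes * per_class) target matrix.

    Column j belongs to class j // per_class (0-based), mirroring
    i = ceil(j / 10) in the 1-based MATLAB loop.
    """
    n_samples = n_classes * per_class
    T = [[0] * n_samples for _ in range(n_classes)]
    for j in range(n_samples):
        T[j // per_class][j] = 1
    return T


def decode_digit(output_column):
    """Return the 1-based index of the largest network output."""
    best = max(range(len(output_column)), key=lambda i: output_column[i])
    return best + 1


T = one_hot_targets(9, 10)
print(len(T), len(T[0]))

# Hypothetical network output for one image: the third unit is most active
output = [0.10, 0.05, 0.80, 0.02, 0.01, 0.00, 0.01, 0.00, 0.01]
print(decode_digit(output))
```

This only works if the training samples are ordered the way the target matrix assumes, that is, ten images of digit 1 first, then ten of digit 2, and so on.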