I am trying to use a self-organizing map (SOM) to split a dataset into training, validation and test sets.
I created the SOM model:
dimension1 = 10;
dimension2 = 10;
net = selforgmap([dimension1 dimension2],100,3,'hextop','linkdist');
[net, tr] = train(net, cancer);
However, when I try to partition the dataset using
net.divideParam.trainRatio = 0.6;
net.divideParam.valRatio = 0.2;
net.divideParam.testRatio = 0.2;
I get the following error:
"Error in network/subsasgn>network_subsasgn (line 456)
if isempty(err), [net,err]=setDivideParam(net,divideParam); end
Error in network/subsasgn (line 10)
net = network_subsasgn(net,subscripts,v,netname);"
Could someone please give me some guidelines on how to split datasets when using a SOM in MATLAB?
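You cannot use trainRatio, valRatio and testRatio with a SOM. A SOM is trained without targets (unsupervised), so the training/validation/test division used for early stopping does not apply to it, and net.divideParam of a selforgmap network does not accept those fields.
These parameters can be used in other neural networks, such as an MLP (e.g. patternnet or feedforwardnet).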
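Since the SOM itself cannot split the data, one option is to split the sample indices yourself before training and then map the held-out samples onto the trained map. A sketch, assuming Deep Learning Toolbox is available; cancer_dataset stands in for the asker's `cancer` variable, and the 60/20/20 ratios come from the question. For an unsupervised SOM, "validation" and "test" here simply mean samples the map never saw during training.

```matlab
% Sample data shipped with Deep Learning Toolbox: x is 9-by-699 (features-by-samples)
[x, ~] = cancer_dataset;
Q = size(x, 2);

% selforgmap does not divide the data, so split the sample indices first
rng('default');
[trainInd, valInd, testInd] = dividerand(Q, 0.6, 0.2, 0.2);

% Train the SOM on the training columns only
net = selforgmap([10 10], 100, 3, 'hextop', 'linkdist');
net.trainParam.showWindow = false;   % train without opening the GUI
net = train(net, x(:, trainInd));

% Map held-out samples to their winning neurons (cluster indices 1..100)
valClusters  = vec2ind(net(x(:, valInd)));
testClusters = vec2ind(net(x(:, testInd)));
```

The cluster indices of the held-out samples can then be compared with those of the training samples, e.g. to check that the map generalizes to data it was not fitted on.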
I am trying to learn the correct procedure for training a neural network for classification. There are many tutorials, but they never explain how to report generalization performance. Can somebody please tell me whether the following method is correct? I am using the first 100 examples from the fisheriris data set, which have labels 1 and 2, and call the data and labels X and Y respectively. Then I split X into trainData and Xtest with a 90/10 split ratio. Using trainData, I trained the NN model. The NN then internally splits trainData further into tr, val and test subsets. My confusion is: which one is usually used to report the model's generalization performance on unseen data in conference/journal papers?
The dataset can be found in the link: https://www.mathworks.com/matlabcentral/fileexchange/71468-simple-neural-networks-with-k-fold-cross-validation-manner
rng('default')
load iris.mat;
X = [f(1:100,:) l(1:100)];
numExamples = size(X,1);
indx = randperm(numExamples);
X = X(indx,:);
Y = X(:,end);
split1 = cvpartition(Y,'Holdout',0.1,'Stratify',true); %90% trainval 10% test
istrainval = training(split1); % index for fitting
istest = test(split1); % indices for quality assessment
trainData = X(istrainval,:);
Xtest = X(istest,:);
Ytest = Y(istest);
numExamplesXtrainval = size(trainData,1);
indxXtrainval = randperm(numExamplesXtrainval);
trainData = trainData(indxXtrainval,:);
Ytrain = trainData(:,end);
hiddenLayerSize = 10;
% data format = rows = number of dim, column = number of examples
net = patternnet(hiddenLayerSize);
net = init(net);
net.performFcn = 'crossentropy';
net.trainFcn = 'trainscg';
net.trainParam.epochs=50;
[net tr]= train(net,trainData', Ytrain');
Trained = sim(net, trainData'); %outputs predicted labels
train_predict = net(trainData');
performanceTrain = perform(net,Ytrain',train_predict)
lbl_train=grp2idx(Ytrain);
Yhat_train = (train_predict >= 0.5);
Lbl_Yhat_Train = grp2idx(Yhat_train);
[cmMatrixTrain]= confusionmat(lbl_train,Lbl_Yhat_Train )
accTrain=sum(lbl_train ==Lbl_Yhat_Train)/size(lbl_train,1);
disp(['Training Set: Total Train Accuracy by MLP = ',num2str(100*accTrain ), '%'])
[confTest] = confusionmat(lbl_train(tr.testInd),Lbl_Yhat_Train(tr.testInd) )
%unknown test
test_predict = net(Xtest');
performanceTest = perform(net,Ytest',test_predict);
Yhat_test = (test_predict >= 0.5);
test_lbl=grp2idx(Ytest);
Lbl_Yhat_Test = grp2idx(Yhat_test);
[cmMatrix_Test]= confusionmat(test_lbl,Lbl_Yhat_Test )
Problem 1: There seems to be no prediction for the other class. Why?
Problem 2: Do I need a separate dataset, like the one I created as Xtest, to report the generalization error, or is it standard practice to use trainData(tr.testInd,:) as the generalization test set? Did I create an unnecessary subset?
This is the output:
performanceTrain =
2.2204e-16
cmMatrixTrain =
45 0
45 0
Training Set: Total Train Accuracy by MLP = 50%
confTest =
9 0
5 0
cmMatrix_Test =
5 0
5 0
There are a few issues with the code. Let's deal with them before answering your question. First, you set a threshold of 0.5 for making decisions (Yhat_train = (train_predict >= 0.5);) while all of your network's predictions are above 0.5. As a result, every sample is assigned to the same class, and the second column of each confusion matrix is all zeros. You can plot the scores to choose a better threshold:
figure;
plot(train_predict(Ytrain == 1),'.b')
hold on
plot(train_predict(Ytrain == 2),'.r')
legend('label 1','label 2')
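A likely reason all outputs are above 0.5 (note the near-zero performanceTrain of 2.2204e-16) is that recent patternnet releases use a softmax output layer, and softmax over a single output row is always exactly 1, so with labels 1 and 2 in one target row no threshold can separate the classes. A base-MATLAB sketch of both points, using hypothetical labels: one output row per class (one-hot targets) avoids the problem, and the predicted class is then the row with the largest output. In Deep Learning Toolbox, full(ind2vec(labels)) and vec2ind do the same encoding and decoding.

```matlab
% Softmax applied down each column (one column per sample)
softmaxCols = @(n) exp(n) ./ sum(exp(n), 1);

% With a single output row, every sample's softmax output is exactly 1
singleRow = softmaxCols(randn(1, 6));

% Hypothetical class labels 1/2 converted to a 2-by-N one-hot target matrix
labels = [1 2 2 1 2 1];
N = numel(labels);
T = zeros(2, N);
T(sub2ind(size(T), labels, 1:N)) = 1;

% Decode outputs back to class labels by taking the row with the largest value
[~, decoded] = max(T, [], 1);
```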
cvpartition gave me an error; it ran successfully as split1 = cvpartition(Y,'Holdout',0.1);. In any case, artificial neural networks usually handle partitioning within the training process: you feed in X and Y plus some parameters for how to split them. See the example here (link), where you set:
net.divideParam.trainRatio = .4;
net.divideParam.valRatio = .3;
net.divideParam.testRatio = .3;
So how should you report the results? Only on the test data. The training data will suffer from overfitting and will show misleadingly good results. If you use validation data (you haven't set any up yourself), you cannot report results on it either, because it was used to decide when to stop training, so it is also optimistically biased. If you let the training do the validation split for you, the results on its test subset are not used during training and are safe from overfitting.
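A sketch of reporting accuracy only on the samples that train() held out for testing (tr.testInd). It assumes Deep Learning Toolbox plus Statistics and Machine Learning Toolbox; Fisher's iris data (load fisheriris) stands in for the asker's iris.mat, and one-hot targets give patternnet one output row per class. Keeping a separate Xtest as in the question is equally valid; what matters is that the reported samples were used neither for training nor for validation.

```matlab
% Two-class subset of Fisher's iris data
load fisheriris                          % meas: 150-by-4, species: 150-by-1
X = meas(1:100, :)';                     % setosa vs. versicolor, 4-by-100
labels = [ones(1, 50), 2*ones(1, 50)];   % class indices 1 and 2
T = full(ind2vec(labels));               % one-hot targets, 2-by-100

rng('default');
net = patternnet(10);
net.trainParam.showWindow = false;       % train without opening the GUI
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
[net, tr] = train(net, X, T);

% Predicted class = output row with the largest score (no threshold needed)
pred = vec2ind(net(X));

% Report generalization only on the samples train() held out for testing
testIdx = tr.testInd;
testAcc = mean(pred(testIdx) == labels(testIdx));
cmTest  = confusionmat(labels(testIdx), pred(testIdx));
fprintf('Test accuracy: %.1f%% on %d samples\n', 100*testAcc, numel(testIdx));
```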
I trained an SVM classification model using the "fitcsvm" function and tested it with the test data set. Now I want to use this model to predict the classes of new (previously unseen) data. What should I do?
Following is the code I used.
load FeatureLabelsNum.csv
load FeatureOne.csv
X = FeatureOne(1:42,:);
y = FeatureLabelsNum(1:42,:);
%dividing the dataset into training and testing
rand_num = randperm(42);
%training Set
X_train = X(rand_num(1:34),:);
y_train = y(rand_num(1:34),:);
%testing Set (start at 35 so that sample 34 is not in both sets)
X_test = X(rand_num(35:end),:);
y_test = y(rand_num(35:end),:);
%preparing validation set out of training set
c = cvpartition(y_train,'k',5);
SVMModel = fitcsvm(X_train,y_train,'Standardize',true,'KernelFunction','RBF',...
    'KernelScale','auto','OutlierFraction',0.05);
CVSVMModel = crossval(SVMModel);
classLoss = kfoldLoss(CVSVMModel)
classOrder = SVMModel.ClassNames
sv = SVMModel.SupportVectors;
figure
gscatter(X_train(:,1),X_train(:,2),y_train)
hold on
plot(sv(:,1),sv(:,2),'ko','MarkerSize',10)
legend('Resampled','Non','Support Vector')
hold off
X_test_w_best_feature =X_test(:,:);
bp = (predict(SVMModel,X_test)== y_test);
You already use the predict function in your script; just pass the new data in. The first output contains the predicted labels, and the optional second output contains the scores:
[label,score] = predict(SVMModel,X_new_data);
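A self-contained sketch of the whole workflow (hold-out test, then prediction on new data), assuming Statistics and Machine Learning Toolbox; Fisher's iris data replaces the asker's CSV files, and X_new_data holds two hypothetical new observations with the same two features.

```matlab
load fisheriris                          % meas: 150-by-4, species: 150-by-1 cellstr
X = meas(51:150, 1:2);                   % versicolor vs. virginica, first two features
y = species(51:150);

% Hold out 20% of the labeled data for testing (stratified by class)
rng('default');
cv = cvpartition(y, 'HoldOut', 0.2);
SVMModel = fitcsvm(X(training(cv), :), y(training(cv)), ...
    'Standardize', true, 'KernelFunction', 'rbf', 'KernelScale', 'auto');

% Accuracy on the held-out test set
testLabels = predict(SVMModel, X(test(cv), :));
testAcc = mean(strcmp(testLabels, y(test(cv))));

% Hypothetical new, unlabeled observations (same features, same units)
X_new_data = [6.0 2.8; 7.2 3.4];
[newLabels, newScores] = predict(SVMModel, X_new_data);

% Save the trained model so it can be loaded and reused in a later session
save('SVMModel.mat', 'SVMModel');
```

newLabels holds one predicted class per row of X_new_data; newScores has one column per class, in the order given by SVMModel.ClassNames.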
I'm trying to show the difference between results of an ANN trained with and without validation...
Assume that I'm trying to train an ANN to learn a sine function.
This is going to be my training data:
x = -1:0.05:1;
t = sin(2*pi*x)+0.01*randn(size(x));
and for validation data I'm going to use this:
val.X = -0.975:.05:0.975;
val.T = sin(2*pi*val.X)+0.01*randn(size(val.X));
then I configure my net as follows:
net = feedforwardnet(10,'trainlm');
net.trainParam.show = 50;
net.trainParam.epochs = 300;
then I train the net without validation:
[net1, tr1] = train(net,x,t);
and for training with validation I use this code:
[net2,tr2]=train(net,x,t,[],[],val);
but it doesn't work!?
EDIT:
the error says: "Error weights EW is not a matrix or cell array."
How can I validate the ANN training with my own custom validation data?
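In current releases the sixth argument of train is the error-weight input EW (train(net,X,T,Xi,Ai,EW)), so the val structure is rejected with the message above. A sketch of one way to use custom validation data, assuming Deep Learning Toolbox: concatenate both sets and tell train which columns belong to which subset via divideind, and use dividetrain for the no-validation run. The names valX, valT, netNoVal and netVal are illustrative.

```matlab
rng('default');
x = -1:0.05:1;                                   % 41 training points
t = sin(2*pi*x) + 0.01*randn(size(x));
valX = -0.975:0.05:0.975;                        % 40 validation points
valT = sin(2*pi*valX) + 0.01*randn(size(valX));

% Put both sets in one matrix and record where the training part ends
X = [x valX];
T = [t valT];
nTrain = numel(x);

net = feedforwardnet(10, 'trainlm');
net.trainParam.epochs = 300;
net.trainParam.showWindow = false;

% Without validation: every column is used for training
netNoVal = net;
netNoVal.divideFcn = 'dividetrain';
[net1, tr1] = train(netNoVal, x, t);

% With custom validation: fixed index ranges via divideind
netVal = net;
netVal.divideFcn = 'divideind';
netVal.divideParam.trainInd = 1:nTrain;
netVal.divideParam.valInd   = nTrain+1:numel(X);
netVal.divideParam.testInd  = [];
[net2, tr2] = train(netVal, X, T);
```

Comparing tr1.perf with tr2.perf and tr2.vperf (or calling plotperform on each record) then shows the effect of validation-based early stopping.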
I am trying to train my neural network using vInputs and vTargets to create a model, net,
but I keep getting this error no matter what I change in my code:
ERROR: Error using zeros. Maximum variable size allowed by the program is exceeded.
I don't see any such large data being created by my code, and I have 4 GB of RAM.
I have 230 images, which I need to classify into 3 classes: 'Sedan', 'SUV', 'Hatchback'.
Image size: [15x26]
I used this code to convert a single image's features, extracted by a Gabor filter, into a feature vector:
function IMVECTOR = im2vec (img)
load Gabor;
img = adapthisteq(img);
Features75x208 = cell(5,8);
for s = 1:5
for j = 1:8
Features75x208{s,j} = mminmax(abs(ifft2(G{s,j}.*fft2(double(img),32,32),15,26)));
end
end
Features25x70 = cell2mat(Features75x208);
Features25x70 (3:3:end,:)=[];
Features25x70 (2:2:end,:)=[];
Features25x70 (:,3:3:end)=[];
Features25x70 (:,2:2:end)=[];
IMVECTOR = reshape (Features25x70,[1750 1]);
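The two decimation steps are what turn the 5-by-8 grid of 15-by-26 responses (75-by-208 after cell2mat) into 25-by-70 = 1750 values, so vInputs itself is only about 3 MB (1750 × 230 × 8 bytes). A quick base-MATLAB check of that arithmetic, with random numbers standing in for the real Gabor responses:

```matlab
% Random stand-in for cell2mat of the 5-by-8 grid of 15-by-26 responses
F = rand(5*15, 8*26);          % 75-by-208
F(3:3:end, :) = [];            % drop every 3rd row    -> 50 rows
F(2:2:end, :) = [];            % drop every 2nd row    -> 25 rows
F(:, 3:3:end) = [];            % drop every 3rd column -> 139 columns
F(:, 2:2:end) = [];            % drop every 2nd column -> 70 columns
v = reshape(F, [], 1);         % 1750-by-1 feature vector
```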
After writing this function, I built vInputs by applying it to all 230 images.
This gives
1. vInputs=[1750x230].
2. vTargets=[4x230].
But whenever I run the project, I get an error in this function:
function net = create_pr_net(inputs,targets)
%CREATE_PR_NET Creates and trains a pattern recognition neural network.
% Create Network
numHiddenNeurons = 35;
net = newpr(inputs,targets,numHiddenNeurons);
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% Train and Apply Network
[net,tr] = train(net,inputs,targets);% <== ERROR OCCURS HERE<==
outputs = sim(net,inputs);
% Plot
plotperf(tr);
plotconfusion(targets,outputs);
save net net
Here is the complete Error:
Error using zeros
Maximum variable size allowed by the program is exceeded.
Error in nnMex2.codeHints (line 117)
hints.TEMP = zeros(1,ceil(tempSize/8),'double');
Error in nncalc.setup2 (line 13)
calcHints = calcMode.codeHints(calcHints);
Error in network/train (line 306)
[calcLib,calcNet] = nncalc.setup2(calcMode,calcNet,calcData,calcHints);
Error in create_pr_net (line 14)
[net,tr] = train(net,inputs,targets);
Error in executeMe (line 22)
net=create_pr_net(vInputs,vTargets);
Please help me, and ask if I've missed anything or need to give more details.
EDIT: I figured out that, because I am using a 32-bit system, I can address at most 2^32 ≈ 4.29e+09 bytes (4 GB) at a time.
On a 64-bit system I would be able to address up to 2^64 ≈ 1.84e+19 bytes.
So can you give me some idea of how to make it work on my system?
I am having a problem running a huge dataset in the MATLAB NN Toolbox: when I use the trainlm algorithm, the toolbox fails and shows an Out of memory error, but other algorithms have no memory problem. Why is this so? Moreover, when I use more than 15 hidden neurons it also runs out of memory. How can I solve this kind of problem?
One more thing: I set a 10/45/45% data division for training, validation and testing, but after running the code I found in the workspace that it used 25% of the data for training, 37% for validation, and 37% for testing. How can I resolve this?
Does anybody have an idea how to solve these problems? I'd be glad for any comments or suggestions. Thanks.
I am using MATLAB R2010b on my laptop, which runs Windows 7.
Here is the code I used to train on the dataset:
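To see what a 10/45/45 split should look like, one can ask dividerand (the divideFcn set in the code) for the split sizes directly and compare them with numel(tr.trainInd), numel(tr.valInd) and numel(tr.testInd) after training. A sketch, assuming Deep Learning Toolbox and the 435105 data rows read by the xlsread calls (B2:B435106):

```matlab
Q = 435105;                       % data rows B2:B435106 in the spreadsheet
rng('default');
[trainInd, valInd, testInd] = dividerand(Q, 0.10, 0.45, 0.45);
fractions = [numel(trainInd), numel(valInd), numel(testInd)] / Q;
fprintf('train %.3f  val %.3f  test %.3f\n', fractions);

% After [net,tr] = train(...), the actual split can be checked the same way:
% [numel(tr.trainInd), numel(tr.valInd), numel(tr.testInd)] / Q
```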
EX_355 = xlsread('Training Dataset.xlsx','B2:B435106');
EX_532 = xlsread('Training Dataset.xlsx','C2:C435106');
BA_355 = xlsread('Training Dataset.xlsx','D2:D435106');
BA_532 = xlsread('Training Dataset.xlsx','E2:E435106');
BA_1064 = xlsread('Training Dataset.xlsx','F2:F435106');
Reff = xlsread('Training Dataset.xlsx','G2:G435106');
Input(1,:) = EX_355;
Input(2,:) = EX_532;
Input(3,:) = BA_355;
Input(4,:) = BA_532;
Input(5,:) = BA_1064;
Target(1,:) = Reff;
net = feedforwardnet;
net = configure(net,Input,Target);
net = init(net);
inputs = Input;
targets = Target;
hiddenLayerSize = 10;
net = fitnet(hiddenLayerSize);
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};
net.divideFcn = 'dividerand';
net.divideMode = 'sample';
net.divideParam.trainRatio = 10/100;
net.divideParam.valRatio = 45/100;
net.divideParam.testRatio = 45/100;
net.trainFcn = 'trainlm';
net.performFcn = 'mse';
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
    'plotregression', 'plotfit'};
[net,tr] = train(net,inputs,targets);
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
net.trainParam.epochs;
net.trainParam.time;
net.trainParam.goal;
net.trainParam.min_grad;
net.trainParam.mu_max;
net.trainParam.max_fail;
net.trainParam.show;
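On the question of why trainlm in particular runs out of memory: MathWorks' documentation describes the main drawback of Levenberg-Marquardt as storing a Jacobian of size Q-by-n (Q training samples, n weights and biases), which gradient-based functions such as trainscg do not form. A rough base-MATLAB estimate for this dataset, assuming one hidden layer, 5 inputs, 1 output and 8-byte doubles; the toolbox's real usage includes further temporaries, so treat these as lower bounds. Both Q and the number of hidden neurons H scale the estimate linearly.

```matlab
% Weights and biases of a one-hidden-layer net: R inputs, H hidden, S outputs
numWB = @(R, H, S) R*H + H + H*S + S;

% Approximate bytes for a (Q*S)-by-n Jacobian of 8-byte doubles
jacBytes = @(Q, R, H, S) 8 * Q * S * numWB(R, H, S);

Qall = 435105;                    % rows B2:B435106 in the spreadsheet
for H = [10 15 20 30]
    fprintf('H = %2d: n = %3d, ~%6.1f MB (all rows), ~%5.1f MB (10%% train)\n', ...
        H, numWB(5, H, 1), jacBytes(Qall, 5, H, 1) / 2^20, ...
        jacBytes(round(0.1*Qall), 5, H, 1) / 2^20);
end
```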
Paste this before the call to train:
net.efficiency.memoryReduction = NUMBER;
Increase NUMBER, starting from 1, until the code runs.
More details are available at http://www.mathworks.com/help/nnet/ug/train-the-network.html
You can check this link http://www.mathworks.co.uk/help/nnet/ug/optimize-neural-network-training-speed-and-memory.html
This is a quote from MathWorks:
If MATLAB is being used and memory limitations are a problem, the amount of temporary storage needed can be reduced by a factor of N, in exchange for performing the computations N times sequentially on each of N subsets of the data.
net2 = train(net1,x,t,'reduction',N);
This is called memory reduction.
Keep increasing the value of N till your code runs.
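The "keep increasing N" advice can be automated with a retry loop. A sketch, assuming a Deep Learning Toolbox release that accepts the 'reduction' name-value pair quoted above (on older releases such as R2010b, set net.efficiency.memoryReduction = N before each attempt instead); simplefit_dataset is a small stand-in so the block runs on its own.

```matlab
% Small stand-in data shipped with Deep Learning Toolbox
[x, t] = simplefit_dataset;
net = fitnet(10, 'trainlm');
net.trainParam.showWindow = false;

trained = false;
for N = [1 2 4 8 16 32]
    try
        % Temporary storage is split into N subsets processed sequentially
        [net, tr] = train(net, x, t, 'reduction', N);
        trained = true;
        fprintf('Training succeeded with reduction = %d\n', N);
        break
    catch err
        % Any error (not only out-of-memory) lands here; print it and retry
        fprintf('reduction = %d failed: %s\n', N, err.message);
    end
end
if ~trained
    error('Training failed for every reduction value tried.');
end
```

Larger N trades speed for memory, so the loop stops at the smallest value that works.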