I am new to MATLAB. I want to know how to fix the train and test sets in my SVM code, because the code I found selects the test and train sets randomly. My database is the YMU database. How should I fix the train and test sets in the SVM code? I currently use crossvalind, which chooses the train and test sets randomly. Which variable should I change in the crossvalind call?
%load YMU database
%NMC is non-makeup , MC is makeup
%testingset = non-makeup, trainingset is makeup
load TestingSetNMC.mat
load TrainingSetMC.mat
load gnd_Test.mat
load gnd_Train.mat
data1 = TrainingSet;
data2 = TestingSet;
groups1 = ismember(gnd_Train,'data1');
groups2 = ismember(gnd_Test,'data2');
%crossvalind is random choose
[train] = crossvalind('holdOut',groups1);
[test] = crossvalind('holdOut',groups2);
cp = classperf(groups1);
svmStruct = svmtrain(data1(train,:),groups1(train),'showplot',true);
classes = svmclassify(svmStruct,data2(test,:),'showplot',true);
classperf(cp,classes,test);
cp.CorrectRate
With (most) MATLAB functions that generate pseudo-random output, you can control that output by explicitly specifying the random number generator's seed and method.
In your case, place the following line anywhere before you call crossvalind:
rng(1, 'twister');
This sets the seed to 1 and the method to Mersenne Twister. In the documentation for rng you will find a more detailed explanation about controlling pseudo-random output.
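For example, a minimal sketch (re-using the groups1 labels from the question) showing that the split becomes repeatable once the seed is fixed:
rng(1, 'twister');                        % fix the seed before the random split
idxA = crossvalind('HoldOut', groups1);   % first hold-out split
rng(1, 'twister');                        % reset to the same seed
idxB = crossvalind('HoldOut', groups1);   % second hold-out split
isequal(idxA, idxB)                       % returns true: identical train/test indices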
I have 297 grayscale images and I would like to divide them into 3 parts (train, test and validation).
Of course, I tried some sample code first, for example the following code from the MathWorks example (Object Detection Using Faster R-CNN Deep Learning):
% Split data into a training and test set.
idx = floor(0.6 * height(vehicleDataset));
trainingData = vehicleDataset(1:idx,:);
testData = vehicleDataset(idx:end,:);
But MATLAB 2018a shows the following error:
Error:"Undefined function 'height' for input arguments of type
'struct'."
I would like to detect objects in images using the "Faster R-CNN" method and determine their locations in the images.
Suppose your images are saved in the path "C:\Users\Student\Desktop\myImages"
First, create an imageDatastore object to manage the collection of image files.
datapath = "C:\Users\Student\Desktop\myImages";
imds = imageDatastore(datapath,'IncludeSubfolders',true,'LabelSource','foldernames'); % labels (here assumed to come from per-class subfolder names) are required for splitEachLabel; see the documentation for other customizations
[trainds,testds,valds] = splitEachLabel(imds,0.6,0.2); % say 60% of each label for training, 20% for testing and the remaining 20% for validation
Now you have the training data in the variable trainds, the test data in testds and the validation data in valds.
You can retrieve individual images using readimage, say the 5th image from the training set:
im = readimage(trainds,5);
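If it helps, a minimal sketch of how you might sanity-check and iterate over the resulting datastores (assuming the labels were taken from folder names as above):
countEachLabel(trainds)           % number of training images per label
im = readimage(trainds, 5);       % 5th image of the training set
imshow(im);                       % display it
while hasdata(testds)             % loop over the whole test set
    img = read(testds);           % reads the next test image
end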
It seems that cross-validated models cannot be used with the predict function. How would one go about using the model with a test set? For example:
ens = fitcecoc(X, T, 'KFold', 10)
Directly using the predict function throws an error, and the MATLAB documentation explains very well why it does so. ens is a partitioned model with 10 different classifiers. Should we run predict using each classifier and then use the class with the maximum agreement?
A couple of other similar questions haven't received answers, so I figured I'd answer my own question with the solution I found. MATLAB K-fold cross-validation produces K different classifiers or regressors, each generated by holding out a portion of the data (the holdout is random, so be careful if you have an unbalanced dataset). To predict the output class, you can iterate over all K trained models and take the mode of their predictions.
cv_Ensemble = crossval(Ensemble_Model, 'KFold', 10);
N = size(Data_f, 1);                              % number of observations to classify
classIdx = zeros(N, length(cv_Ensemble.Trained));
for p = 1:length(cv_Ensemble.Trained)
    [temp, ~] = predict(cv_Ensemble.Trained{p}, Data_f);
    classIdx(:, p) = temp;
end
classIdx = mode(classIdx, 2);
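As a side note: if you only need predictions for the original training observations (rather than a separate test set like Data_f above), the partitioned model already provides this directly, since each observation is predicted by the one fold model that did not see it during training:
labels = kfoldPredict(cv_Ensemble);   % built-in alternative for the training data only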
I have been reading the documentation (here and here), but it is still unclear to me how to practically use crossval to do leave-one-out cross-validation.
vals = crossval(fun,X)
vals = crossval(fun,X,Y,...)
mse = crossval('mse',X,y,'Predfun',predfun)
mcr = crossval('mcr',X,y,'Predfun',predfun)
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun)
vals = crossval(...,'name',value)
I really don't understand the fun part.
I have estimated the chlorophyll rate with different indices. Then I did a linear regression between those indices and the chlorophyll rate measured in the field. Now I want to validate them: one of my estimates is a column with 22 entries, so I want to use 21 of them for training and 1 as a test, and loop 22 times so that every observation gets used as the test once.
But where should I put the regression model? If my regression is Y = aX + b,
do I re-use the a and b calculated before on the training part, or do I fit a new linear regression on the training part and then see what the prediction is for the test point?
I am not sure I totally understand how to build a leave-one-out model.
Then I want to know the result of the test by calculating the RMSE (and maybe the R²).
How do I code that using crossval?
I saw the answer to the question here, but I don't have access to the crossvalind function with my license.
Well, I finally figured it out; here is my script.
First I loaded my data and defined the linear regression function:
X=indicesCha_without_Cloud(:,3);
y=Cha_g_m2t_without_Cloud(:,3);
testval = @(XTRAIN,ytrain,XTEST) Linear_regression_indices(XTRAIN,ytrain,XTEST);
where in my case fun (in the MathWorks help) is testval, and Linear_regression_indices is a very simple function:
function [ Linear_regression_indices ] = Linear_regression_indices(XTRAIN,ytrain,XTEST )
Linear_regression_indices=(polyval(polyfit(XTRAIN,ytrain,1),XTEST));
end
There are two ways to do it and they both give the same result.
The first uses the crossval function directly:
cvMse = crossval('mse',X,y,'predfun',testval,'leaveout',1);
This will do as many folds as there are observations, each time using one of them as XTEST.
The second uses cvpartition:
c = cvpartition(n,'LeaveOut') creates a random partition for leave-one-out cross validation on n observations. Leave-one-out is a special case of 'KFold', in which the number of folds equals the number of observations. link
c = cvpartition(length(y),'LeaveOut'); % as per the documentation above, cvpartition expects the number of observations here
cvMse2=crossval('mse',X,y,'predfun',testval,'partition',c);
then the RMSE can be easily calculated
RMSE=sqrt(cvMse);
RMSE2=sqrt(cvMse2);
Then I simply get my answer; in my case RMSE = 0.3548.
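If you also want an R² for the cross-validated predictions, one rough shortcut is to compare the leave-one-out MSE against the variance of the observations (this is the predictive R², sometimes called Q²):
R2 = 1 - cvMse / var(y, 1);   % var(y,1) normalizes by N, matching the MSE definition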
I just trained a neural network, and I would like to test it with a new data set that was not included in the training, so as to check its performance on new data. This is my code:
net = patternnet(30);
net = train(net,x,t);
save (net);
y = net(x);
perf = perform(net,t,y)
classes = vec2ind(y);
where x and t are my input and target, respectively. I understand that save net and load net; can be used, but my questions are as follows:
At what point in my code should I use save net?
Using save net;, which location on the system is the trained network saved?
When I exit and open MATLAB again, how can I load the trained network and supply new data that I want to test it with?
Please Note: I have discovered that each time I run my code, it gives a different output which I do not want once I have an acceptable result. I want to be able to save the trained neural network such that when I run the code over and over again with the training data set, it gives the same output.
If you just call save net, all your current variables from the workspace will be saved as net.mat. You want to save only your trained network, so you need to use save('path_to_file', 'variable'). For example:
save('C:\Temp\trained_net.mat','net');
In this case the network will be saved under the given file name.
The next time you want to use the saved pre-trained network you just need to call load('path_to_file'). If you don't reinitialize or train this network again, the performance will be the same as before, because all weights and bias values will be the same.
You can inspect the weights and bias values by checking fields like net.IW{i,j} (input weights), net.LW{i,j} (layer weights) and net.b{i} (biases). As long as they stay the same, the network's performance stays the same.
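For example, a small sanity check (using the same example path as below) that the weights survive the save/load round trip:
w_before = net.IW{1,1};                  % input weights of the trained network
save('C:\Temp\trained_net.mat','net');
clear net;
load('C:\Temp\trained_net.mat');         % restores the variable 'net'
isequal(w_before, net.IW{1,1})           % true: the weights, and thus the performance, are unchanged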
Train and save
[x,t] = iris_dataset;
net = patternnet;
net = configure(net,x,t);
net = train(net,x,t);
save('C:\Temp\trained_net.mat','net');
y = net(x);
perf = perform(net,t,y);
display(['performance: ', num2str(perf)]);
It returns performance: 0.11748 in my case. The values will be different after each new training.
Load and use
clear;
[x,t] = iris_dataset;
load('C:\Temp\trained_net.mat');
y = net(x);
perf = perform(net,t,y);
display(['performance: ', num2str(perf)]);
It returns performance: 0.11748. The values will be the same when using the network on the same data set. Here we used the training set again.
If you get an absolutely new data set, the performance will be different, but it will always be the same for this particular data set.
clear;
[x,t] = iris_dataset;
%simulate a new data set of size 50
data_set = [x; t];
data_set = data_set(:,randperm(size(data_set,2)));
x = data_set(1:4, 1:50);
t = data_set(5:7, 1:50);
load('C:\Temp\trained_net.mat');
y = net(x);
perf = perform(net,t,y);
display(['performance: ', num2str(perf)]);
It returns performance: 0.12666 in my case.
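As a side note (not strictly required once you load a saved network): if you also want the training run itself to be repeatable, you can fix the seed before creating and training the network, since both the weight initialization and the random train/validation/test division draw from the global random stream. A minimal sketch:
rng(1, 'twister');        % fix the seed so initialization and data division are repeatable
net = patternnet(30);
net = train(net, x, t);   % the same x and t now give the same trained network on every run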
I constructed a Gaussian mixture model in MATLAB with a dataset:
model = gmdistribution.fit(data,M,'Replicates',5);
with M = 3 Gaussian components. I tested new data with:
[P, l] = posterior(model,new_data);
I ran the program several times and didn't get the same result. Each run produces different log-likelihood values. I use the log-likelihood to make decisions, and this value for the same data (new_data) differs for each run. What does it depend on? How can I resolve this problem?
First, assuming that you're using a newish version of MATLAB, the gmdistribution.fit documentation indicates that the fit method is deprecated and that fitgmdist should be used. See here for an example.
Second, the documentation for gmdistribution.fit indicates that if the 'Replicates' option is larger than 1, the 'randSample' start method will be used to produce the initial parameters. This may be the cause (or at least one of the causes) of your observed variability.
Finally, you can also try using rng before calling gmdistribution.fit to set the seed of the global random number stream (assuming the function doesn't use its own stream internally). Alternatively, you can try specifying an 'Options' parameter via statset:
seed = 1;
s = RandStream('mt19937ar','Seed',seed);
opts = statset('Streams',s);
model = gmdistribution.fit(data,M,'Replicates',5,'Options',opts);
I can't test this fully myself – see the gmdistribution class documentation for further details.
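As a rough sketch of the modern equivalent (using fitgmdist as mentioned above, with the same data, M and new_data from the question), fixing the global stream before the replicated fits:
rng(1, 'twister');                               % fix the global random stream
model = fitgmdist(data, M, 'Replicates', 5);     % modern replacement for gmdistribution.fit
[P, nlogL] = posterior(model, new_data);         % posterior probabilities and negative log-likelihood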