How to do 10-fold cross-validation without using built-in functions in MATLAB?

I am trying to do 10-fold cross-validation, without using built-in functions, to train on and recognize the digits 0-9. I have a sample of 500 pictures (50 for each digit, for training and testing).
I tried to implement the answer to "MATLAB: 10 fold cross Validation without using existing functions", and looked at other websites, but it didn't help much, mostly because I'm new to MATLAB and don't know what I should tweak.
This is the code I have so far.
c=zeros(10,size(x,2),size(x,3));
K=10;
k=10;
test= 1:50/K;
for fold =1:K
if(test(1)~=1)
train = x(1:test(1)-1,:,:);
if (test(5) ~=50)
train=[train ; x(test(end):50,:,:)];
end
else
train = x(test(1):50,:,:);
end
test = test+ones(1,50/K)*50/K;
end
for i =0:9
test=test+50/K*ones(1,5);
c(i+1,:,:)=cal_likelihood(x(1+i*50:50+i*50,:,:),50/k*(k-1));
end
Variable explanation
x is the 500x28x28 double array that holds all 500 digit pictures.
test is the test set.
train is the training set.
To do 10-fold cross-validation I need to change the training set for each fold, like this:
1st fold: 1:5 for test, 6:50 for train
2nd fold: 6:10 for test, 1:5 and 11:50 for train, and so on
The problem is that I don't know how to shift the training set from one fold to the next, for example from 6:50 to 1:5 plus 11:50. Or can I write a better loop than this?
P.S. If whoever answers doesn't mind: what does "500x28x28 double" actually mean?

There are a few ways you could write this, some of which are easier to understand than others. MATLAB makes this convenient: while an expression such as 1:3 evaluates to [1,2,3], the expression 1:0 evaluates to an empty array. So the index sets can be generated without any if statements.
I'd start off the loop as:
K=10;
samples_per_digit=50;
block_sze=samples_per_digit/K;
for fold =1:K
    test_ind = 1+(fold-1)*block_sze:fold*block_sze;
    train_ind = [1:(fold-1)*block_sze, (fold*block_sze+1):samples_per_digit];
    for i=0:9
        train=x(train_ind+i*samples_per_digit,:,:);
        test=x(test_ind+i*samples_per_digit,:,:);
        % Perform training and validation in here for this fold of the digit i
    end
end
You can verify that test_ind and train_ind correspond to the subsets of blocks of training and validation that you need. It is only in the innermost loop that these translate to the matrices corresponding to the digit images, using the value of i to compute the offset. Of course, if you wish, you can swap the order of the loops, computing all of the folds for a single digit. It all depends on how you wish to store your results.
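As a quick check of the index arithmetic, the following sketch runs the fold loop on placeholder data and counts how often each sample ends up in a test set. The random array x and the counter test_count are assumptions made for illustration; only the index expressions come from the loop above. It also touches on the side question: a 500x28x28 double is a three-dimensional array of double-precision numbers, so x(n,:,:) holds the 28x28 image of sample n.

```matlab
% Placeholder data: 500 images of 28x28 pixels (assumed layout: 50 per digit,
% digit 0 in rows 1:50, digit 1 in rows 51:100, and so on).
x = rand(500, 28, 28);

K = 10;
samples_per_digit = 50;
block_sze = samples_per_digit / K;      % 5 test samples per digit per fold

test_count = zeros(size(x, 1), 1);      % how often each sample is tested

for fold = 1:K
    test_ind  = 1 + (fold-1)*block_sze : fold*block_sze;
    train_ind = [1:(fold-1)*block_sze, (fold*block_sze+1):samples_per_digit];
    for i = 0:9
        rows_test  = test_ind  + i*samples_per_digit;
        rows_train = train_ind + i*samples_per_digit;
        train = x(rows_train, :, :);    % 45x28x28
        test  = x(rows_test,  :, :);    % 5x28x28
        test_count(rows_test) = test_count(rows_test) + 1;
    end
end

% One image as an ordinary 28x28 matrix (squeeze drops the leading singleton)
img = squeeze(x(1, :, :));
```

If the indices are right, every entry of test_count is 1 after the loops: each sample is tested in exactly one fold.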

Related

How to use `crossval` in matlab for a Leave one Out Validation method

I have been reading the documentation, but it's really unclear to me, and I don't see how to use crossval in practice to do leave-one-out cross-validation.
vals = crossval(fun,X)
vals = crossval(fun,X,Y,...)
mse = crossval('mse',X,y,'Predfun',predfun)
mcr = crossval('mcr',X,y,'Predfun',predfun)
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun)
vals = crossval(...,'name',value)
I really don't understand the fun part.
I have estimated the chlorophyll rate with different indices. Then I did a linear regression between those indices and the chlorophyll rate measured in the field. Now I want to validate them. One of my estimates is a column with 22 entries, so I want to use 21 of them for training and 1 for testing, and loop 22 times so that every data point is used as the test point once.
But I don't know where I should put the regression model. If my regression is Y=aX+b,
do I reuse the a and b calculated before for the training part, or do I fit a new linear regression on the training part and then see what it predicts for the test point?
I am not sure I have fully understood how to build a leave-one-out model.
Then I want to assess the test results by calculating the RMSE (and maybe the R²).
How do I code that using crossval?
I saw the answer to a related question, but I don't have access to the crossvalind function with my license.
Well, I finally figured it out. This is my script:
First I loaded my data and defined the linear regression function
X=indicesCha_without_Cloud(:,3);
y=Cha_g_m2t_without_Cloud(:,3);
testval=@(XTRAIN,ytrain,XTEST)Linear_regression_indices(XTRAIN,ytrain,XTEST);
where, in my case, fun (in the MathWorks help) is testval, and Linear_regression_indices is a very simple function:
function yfit = Linear_regression_indices(XTRAIN,ytrain,XTEST)
% Fit a straight line to the training data and evaluate it at the test points
yfit = polyval(polyfit(XTRAIN,ytrain,1),XTEST);
end
There are two ways to do it, and they both give the same result:
the first simply uses the crossval function
cvMse = crossval('mse',X,y,'predfun',testval,'leaveout',1);
this runs as many folds as there are data points, each time using one data point as XTEST
the second uses cvpartition
From the documentation: c = cvpartition(n,'LeaveOut') creates a random partition for leave-one-out cross validation on n observations. Leave-one-out is a special case of 'KFold', in which the number of folds equals the number of observations.
c = cvpartition(numel(y),'LeaveOut');
cvMse2=crossval('mse',X,y,'predfun',testval,'partition',c);
then the RMSE can be easily calculated
RMSE=sqrt(cvMse);
RMSE2=sqrt(cvMse2);
This gives my answer: in my case, RMSE = 0.3548.
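The same leave-one-out procedure can also be sketched with base MATLAB functions only (polyfit and polyval), which makes the mechanics explicit: the line is refitted on the 21 training points in every fold, rather than reusing the a and b from the fit on all the data. The data below are synthetic placeholders and the variable names are chosen for illustration.

```matlab
% Synthetic placeholder data: 22 observations scattered around a line
X = (1:22)';
y = 0.5*X + 2 + 0.3*sin(X);

n = numel(y);
sq_err = zeros(n, 1);

for i = 1:n
    train_mask = true(n, 1);
    train_mask(i) = false;                          % hold out observation i

    p = polyfit(X(train_mask), y(train_mask), 1);   % refit a and b on 21 points
    y_hat = polyval(p, X(i));                       % predict the held-out point

    sq_err(i) = (y(i) - y_hat)^2;
end

cvMse_manual = mean(sq_err);       % mean squared error over the held-out points
RMSE_manual  = sqrt(cvMse_manual);
```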

Leave one out - MATLAB

I was trying to classify a dataset using the following strategy:
Leave one out cross validation
KNN to classify (count the number of errors) for each "fold"
Calculate final error
repeat for k=[1,2,3,4,5,7,10,12,15,20]
Here's the code, for the fisheriris dataset:
load fisheriris
cur=meas;true_label=species;
for norm=0:2
feats=normalizamos(cur,norm); % this is just a function I use on my dataset
% for normalization. norm=0 means no normalization;
% norm=1 and norm=2 are two different normalizations
c=cvpartition(size(feats,1),'leaveout');
for k=[1,2,3,4,5,7,10,12,15,20]
clear n_erros
for i=1:c.NumTestSets
tr=c.training(i);te=c.test(i);
train_set=feats(tr,:);
test_set=feats(te,:);
train_class=true_label(tr);
test_class=true_label(te);
pred=knnclassify(test_set,train_set,train_class,k);
n_erros(i)=sum(~strcmp(pred,test_class));
end
err_rate=sum(n_erros)/sum(c.TestSize)
end
end
Since the results (for my dataset) showed strange incoherent values, I decided to write my own version of LOO, as follows:
for i=1:size(cur,1)
test_set=feats(i,:);
test_class=true_label(i);
if i==1
train_set=feats(i+1:end,:);
train_class=true_label(i+1:end);
else
train_set=[feats(1:i-1,:);feats(i+1:end,:)];
train_class=[true_label(1:i-1);true_label(i+1:end)];
end
pred=knnclassify(test_set,train_set,train_class,k);
n_erros(i)=sum(~strcmp(pred,test_class));
end
Assuming my version of the code is well written, I was hoping for the same, or at least similar results. Here are both outcomes:
Any idea why the results are so different? What version should I use?
Now I'm thinking to rewrite the other tests I did (for 3-fold, 5-fold, etc...) just to be sure.
Thank you all
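As a side note on the hand-written loop, the special case for i==1 is not needed: a logical mask selects every row except the held-out one. The sketch below uses made-up numeric features and labels, and a hand-rolled 1-nearest-neighbour rule as a stand-in for knnclassify so that it runs in base MATLAB. It illustrates the splitting only and does not reproduce the original experiment.

```matlab
% Made-up data: two well-separated 2-D clusters with labels 1 and 2
feats = [0 0; 0 1; 1 0; 1 1; 10 10; 10 11; 11 10; 11 11];
true_label = [1; 1; 1; 1; 2; 2; 2; 2];

n = size(feats, 1);
n_erros = zeros(n, 1);

for i = 1:n
    tr = true(n, 1);
    tr(i) = false;                      % everything except observation i

    train_set   = feats(tr, :);
    train_class = true_label(tr);
    test_set    = feats(i, :);

    % 1-nearest neighbour by squared Euclidean distance (stand-in classifier)
    d = sum((train_set - repmat(test_set, n-1, 1)).^2, 2);
    [~, nearest] = min(d);
    pred = train_class(nearest);

    n_erros(i) = (pred ~= true_label(i));
end

err_rate = sum(n_erros) / n;
```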

Create different `randperm` numbers in loops

Suppose that we have this structure:
for i=1:x1
Out = randperm(40);
Out_Final = %% divide 'Out' to 10 parts. and select these parts for some purposes
for j=1:x2
%% Process on `Out_Final`
end
end
I'm using the outer loop (for i=1:x1) to repeat the main process (the for j=1:x2 loop) and average the outputs to get more robust results. I don't want randperm to return equal (or nearly equal) outputs; I want its output to be as different as possible on every call in the for i=1:x1 loop.
How can I do that in MATLAB R2014a?
The randomness algorithms used by randperm are very good. So, don't worry about that.
However, if you draw 10 random numbers from 1 to 10, you are likely to see some more frequently than others.
If you REALLY don't want this, you should probably not focus on selecting the numbers randomly, but on selecting them so that they are evenly spread throughout their possible range. (This is quite a different problem to solve.)
To address your comment:
The rng function allows you to create reproducible results; make sure to check doc rng for examples.
In your case it seems like you actually don't want to reset the rng each time, as that would lead to repeated or correlated random numbers (reseeding with the same seed reproduces exactly the same permutation).
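A minimal sketch of the loop structure from the question, under the assumption that dividing Out into 10 parts means ten groups of four indices. The generator is seeded once before the loop and left alone afterwards, so successive randperm calls continue the same random stream. The value x1 = 5, the seed, and the reshape layout are illustrative choices.

```matlab
rng(42);                    % seed once for reproducibility (or rng('shuffle'))

x1 = 5;                     % number of repetitions (illustrative)
all_perms = zeros(x1, 40);  % keep every permutation for inspection

for i = 1:x1
    Out = randperm(40);               % a new permutation on every call
    Out_Final = reshape(Out, 4, 10);  % column j is part j (4 indices each)
    all_perms(i, :) = Out;
    % ... process the columns of Out_Final here ...
end
```

Each row of all_perms is a permutation of 1:40; comparing the rows shows whether any two repetitions happened to coincide.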

How to sort in ascending order the solution vector in each iteration using ODE?

I've got an ODE system working perfectly. Now I want to sort the solution vector in ascending order at each iteration. I've tried many approaches but could not get it to work. Does anyone know how to do this?
Here is a simplified code:
function dtemp = tanque1(t,temp)
for i=1:N
if i==1
dtemp(i)=(((-k(i)*At*(temp(i)-temp(i+1)))/(y))-(U*As(i)*(temp(i)-Tamb)))/(ro(i)*vol_nodo*cp(i));
end
if i>1 && i<N
dtemp(i)=(((k(i)*At*(temp(i-1)-temp(i)))/(y))-((k(i)*At*(temp(i)-temp(i+1)))/(y))-(U*As(i)*(temp(i)-Tamb)))/(ro(i)*vol_nodo*cp(i));
end
if i==N
dtemp(i)=(((k(i)*At*(temp(i-1)-temp(i)))/(y))-(U*As(i)*(temp(i)-Tamb)))/(ro(i)*vol_nodo*cp(i));
end
end
end
Test Script:
inicial=343.15*ones(200,1);
[t, temp]=ode45(@tanque1,0:360:18000,inicial);
It looks like you have three different sets of differential equations depending on the index i of the solution vector. I don't think you mean "sort," but rather a more efficient way to implement what you've already done - basically vectorization. Provided I haven't accidentally made any typos (you should check), the following should do what you need:
function dtemp = tanque1(t,temp)
dtemp = zeros(N,1); % preallocate as a column vector, as ode45 expects
dtemp(1) = (-k(1)*At*(temp(1)-temp(2))/y-U*As(1)*(temp(1)-Tamb))/(ro(1)*vol_nodo*cp(1));
% diff(v) returns v(2:end)-v(1:end-1), so the two terms are ordered to match the loop version
dtemp(2:N-1) = (k(2:N-1).*(diff(temp(2:N))-diff(temp(1:N-1)))*At/y-U*As(2:N-1).*(temp(2:N-1)-Tamb))./(vol_nodo*ro(2:N-1).*cp(2:N-1));
dtemp(N) = (k(N)*At*(temp(N-1)-temp(N))/y-U*As(N)*(temp(N)-Tamb))/(ro(N)*vol_nodo*cp(N));
end
You'll still need to define N and the other parameters and ensure that dtemp is returned as a column vector. You could also try replacing N with the end keyword, which might be faster. The two uses of diff make the code shorter and, depending on the value of N, they may also speed up the calculation. They could be replaced with the explicit differences temp(1:N-2)-temp(2:N-1) and temp(2:N-1)-temp(3:N). It may be possible to collapse these down to a single vectorized equation, but I'll leave that as an exercise for you to attempt if you like.
Note that I also removed a great many unnecessary parentheses for clarity. As you learn MATLAB you'll get used to the order of operations and figure out when parentheses are needed.
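Since it is easy to slip on a sign or an index when vectorizing, one option is to evaluate the loop form and a vectorized form on the same random inputs and compare the results. The parameter values below are arbitrary placeholders (the question does not give them), and all per-node parameters are assumed to be N-by-1 column vectors.

```matlab
N = 8;
% Arbitrary placeholder parameters (not from the question)
k  = 0.5 + rand(N,1);       As = 0.1 + rand(N,1);
ro = 900 + 100*rand(N,1);   cp = 4000 + 200*rand(N,1);
At = 0.2;  y = 0.05;  U = 1.5;  Tamb = 293.15;  vol_nodo = 0.01;
temp = 343.15 + 5*rand(N,1);

% Loop form, following the three cases in the question
d_loop = zeros(N,1);
for i = 1:N
    loss = U*As(i)*(temp(i)-Tamb);
    if i == 1
        flux = -k(i)*At*(temp(i)-temp(i+1))/y;
    elseif i < N
        flux = k(i)*At*(temp(i-1)-temp(i))/y - k(i)*At*(temp(i)-temp(i+1))/y;
    else
        flux = k(i)*At*(temp(i-1)-temp(i))/y;
    end
    d_loop(i) = (flux - loss)/(ro(i)*vol_nodo*cp(i));
end

% Vectorized form; diff(v) returns v(2:end)-v(1:end-1)
in = 2:N-1;
d_vec = zeros(N,1);
d_vec(1)  = (-k(1)*At*(temp(1)-temp(2))/y - U*As(1)*(temp(1)-Tamb))/(ro(1)*vol_nodo*cp(1));
d_vec(in) = (k(in).*(diff(temp(2:N))-diff(temp(1:N-1)))*At/y - U*As(in).*(temp(in)-Tamb))./(vol_nodo*ro(in).*cp(in));
d_vec(N)  = (k(N)*At*(temp(N-1)-temp(N))/y - U*As(N)*(temp(N)-Tamb))/(ro(N)*vol_nodo*cp(N));

max_diff = max(abs(d_loop - d_vec));   % should be at rounding-error level
```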

Parallelize or vectorize all-against-all operation on a large number of matrices?

I have approximately 5,000 matrices with the same number of rows and varying numbers of columns (20 x ~200). Each of these matrices must be compared against every other in a dynamic programming algorithm.
In an earlier question, I asked how to perform the comparison quickly and was given an excellent answer involving a 2D convolution. Serially, iteratively applying that method, like so
list = who('data_matrix_prefix*')
H = cell(numel(list),numel(list));
for i=1:numel(list)
for j=1:numel(list)
if i ~= j
eval([ 'H{i,j} = compare(' char(list(i)) ',' char(list(j)) ');']);
end
end
end
is fast for small subsets of the data (e.g. for 9 matrices, 9*9 - 9 = 72 calls are made in ~1 s, 870 calls in ~2.5 s).
However, operating on all the data requires almost 25 million calls.
I have also tried using deal() to make a cell array composed entirely of the next element in data, so I could use cellfun() in a single loop:
% who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
nextData = cell(k,1);
for i=1:k
[nextData{:}] = deal(data{i});
H(:,i) = cellfun(@compare,data,nextData,'UniformOutput',false);
end
Unfortunately, this is not really any faster, because all the time is in compare(). Both of these code examples seem ill-suited for parallelization. I'm having trouble figuring out how to make my variables sliced.
compare() is totally vectorized; it uses matrix multiplication and conv2() exclusively (I am under the impression that all of these operations, including the cellfun(), should be multithreaded in MATLAB?).
Does anyone see a (explicitly) parallelized solution or better vectorization of the problem?
Note
I realize both my examples are inefficient - the first would be twice as fast if it calculated a triangular cell array, and the second is still calculating the self comparisons, as well. But the time savings for a good parallelization are more like a factor of 16 (or 72 if I install MATLAB on everyone's machines).
Aside
There is also a memory issue. I used a couple of evals to append each column of H into a file, with names like H1, H2, etc. and then clear Hi. Unfortunately, the saves are very slow...
Do the following hold?
compare(a,b) == compare(b,a)
and
compare(a,a) == 1
If so, change your loop from
for i=1:numel(list)
for j=1:numel(list)
...
end
end
to
for i=1:numel(list)
for j= i+1 : numel(list)
...
end
end
and deal with the symmetry and identity case. This will cut your calculation time by half.
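A sketch of the triangular loop, assuming the comparison really is symmetric. The function handle compare below is a hypothetical symmetric stand-in (the real compare from the question is not shown), and five small random matrices take the place of the roughly 5,000 data matrices.

```matlab
% Stand-in data: 5 matrices with 20 rows and varying numbers of columns
n = 5;
data = cell(n, 1);
for m = 1:n
    data{m} = rand(20, 3 + m);
end

% Hypothetical symmetric comparison (placeholder for the real compare)
compare = @(a, b) sum(sum((a*a') .* (b*b')));

H = cell(n, n);
n_calls = 0;
for i = 1:n
    for j = i+1:n
        H{i,j} = compare(data{i}, data{j});
        H{j,i} = H{i,j};          % reuse the result for the mirrored entry
        n_calls = n_calls + 1;
    end
end
```

For n matrices this makes n*(n-1)/2 calls instead of n*(n-1); for n = 5000 that is 12,497,500 calls instead of 24,995,000. The diagonal entries are left empty here and would be filled with the known self-comparison value.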
The second example can be easily sliced for use with the Parallel Computing Toolbox. This toolbox distributes the iterations of your loop among local workers (up to 8 in the releases current when this was written). If you want to run the code on a cluster, you also need MATLAB Distributed Computing Server (since renamed MATLAB Parallel Server).
%# who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
H = cell(k,k);
parfor i=1:k-1 %# this will run the loop in parallel with the Parallel Computing Toolbox
%# only make the necessary comparisons
%# (assigning directly to H(i+1:k,i) is not a valid sliced assignment in parfor,
%# so build the full column first and assign it in one go)
hSlice = cell(k,1);
hSlice(i+1:k) = cellfun(@compare,data(i+1:k),repmat(data(i),k-i,1),'UniformOutput',false);
H(:,i) = hSlice;
end
If I understand correctly, you have to perform 5000^2 matrix comparisons? Rather than trying to parallelise the compare function, perhaps you should think of your problem as being composed of 5000^2 tasks. The MATLAB Parallel Computing Toolbox (PCT) supports task-based parallelism. Unfortunately, my experience with PCT is with parallelising large linear-algebra problems, so I can't really tell you much more than that. The documentation will undoubtedly help you more.
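One way to read the "many independent tasks" suggestion is to enumerate the index pairs up front and loop over that flat list, which gives parfor a simple sliced output. This is a sketch under assumptions: compare is a hypothetical symmetric stand-in, the data are small random matrices, and the names pairs and results are made up for illustration. Without Parallel Computing Toolbox, parfor runs as an ordinary serial loop.

```matlab
n = 6;
data = cell(n, 1);
for m = 1:n
    data{m} = rand(20, 3 + m);
end
compare = @(a, b) sum(sum((a*a') .* (b*b')));   % hypothetical placeholder

pairs = nchoosek(1:n, 2);        % each row is one task: [i j] with i < j
n_tasks = size(pairs, 1);
results = cell(n_tasks, 1);      % sliced output: one cell per task

parfor t = 1:n_tasks
    p = pairs(t, :);
    results{t} = compare(data{p(1)}, data{p(2)});
end

% Scatter the flat results back into the n-by-n cell array
H = cell(n, n);
for t = 1:n_tasks
    H{pairs(t,1), pairs(t,2)} = results{t};
    H{pairs(t,2), pairs(t,1)} = results{t};
end
```

Note that data is sent to every worker in full in this layout, so with thousands of matrices the memory cost per worker is worth checking before scaling up.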