MATLAB: 10 fold cross Validation without using existing functions - matlab

I have a matrix (I guess in MatLab you call it a struct) or data structure:
data: [150x4 double]
labels: [150x1 double]
here is out my matrix.data looks like assume I do load my file with the name of matrix:
5.1000 3.5000 1.4000 0.2000
4.9000 3.0000 1.4000 0.2000
4.7000 3.2000 1.3000 0.2000
4.6000 3.1000 1.5000 0.2000
5.0000 3.6000 1.4000 0.2000
5.4000 3.9000 1.7000 0.4000
4.6000 3.4000 1.4000 0.3000
5.0000 3.4000 1.5000 0.2000
4.4000 2.9000 1.4000 0.2000
4.9000 3.1000 1.5000 0.1000
5.4000 3.7000 1.5000 0.2000
4.8000 3.4000 1.6000 0.2000
4.8000 3.0000 1.4000 0.1000
4.3000 3.0000 1.1000 0.1000
5.8000 4.0000 1.2000 0.2000
5.7000 4.4000 1.5000 0.4000
5.4000 3.9000 1.3000 0.4000
5.1000 3.5000 1.4000 0.3000
5.7000 3.8000 1.7000 0.3000
5.1000 3.8000 1.5000 0.3000
And here is my matrix.labels look like
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
I am trying to create 10 cross fold validation without using any of the existing functions in MatLab and due to my very limited MatLab knowledge I am having trouble going forward with from what I have. Any help would be great.
This is what I have so far, and I am sure this probably not the matlab way, but I am very new to matlab.
function[output] = fisher(dataFile, number_of_folds)
data = load(dataFile);
%create random permutation indx
idx = randperm(150);
output = data.data(idx(1:15),:);
end

Here is my take for this cross validation. I create dummy data using magic(10) also I create labels randomly. Idea is following , we get our data and labels and combine them with random column. Consider following dummy code.
>> data = magic(4)
data =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
>> dataRowNumber = size(data,1)
dataRowNumber =
4
>> randomColumn = rand(dataRowNumber,1)
randomColumn =
0.8147
0.9058
0.1270
0.9134
>> X = [ randomColumn data]
X =
0.8147 16.0000 2.0000 3.0000 13.0000
0.9058 5.0000 11.0000 10.0000 8.0000
0.1270 9.0000 7.0000 6.0000 12.0000
0.9134 4.0000 14.0000 15.0000 1.0000
If we sort X according column 1, we sort our data randomly. This will give us cross validation randomness. Then next thing is to divide X according to cross validation percentage. Accomplishing this for one case easy enough. Lets consider %75 percent is train case and %25 percent is test case. Our size here is 4, then 3/4 = %75 and 1/4 is %25.
testDataset = X(1,:)
trainDataset = X(2:4,:)
But accomplishing this a bit harder for N cross folds. Since we need to make this N times. For loop is necessary for this. For 5 cross folds. I get , in first f
1st fold : 1 2 for test, 3:10 for train
2nd fold : 3 4 for test, 1 2 5:10 for train
3rd fold : 5 6 for test, 1:4 7:10 for train
4th fold : 7 8 for test, 1:6 9:10 for train
5th fold : 9 10 for test, 1:8 for train
Following code is an example for this process:
data = magic(10);
dataRowNumber = size(data,1);
labels= rand(dataRowNumber,1) > 0.5;
randomColumn = rand(dataRowNumber,1);
X = [ randomColumn data labels];
SortedData = sort(X,1);
crossValidationFolds = 5;
numberOfRowsPerFold = dataRowNumber / crossValidationFolds;
crossValidationTrainData = [];
crossValidationTestData = [];
for startOfRow = 1:numberOfRowsPerFold:dataRowNumber
testRows = startOfRow:startOfRow+numberOfRowsPerFold-1;
if (startOfRow == 1)
trainRows = [max(testRows)+1:dataRowNumber];
else
trainRows = [1:startOfRow-1 max(testRows)+1:dataRowNumber];
end
crossValidationTrainData = [crossValidationTrainData ; SortedData(trainRows ,:)];
crossValidationTestData = [crossValidationTestData ;SortedData(testRows ,:)];
end

Hahaha sorry, no solution. Don't have MATLAB on me right now so can't check code for errors. But here's the general idea:
Generate k (in your case 10) subsamples
Start two counters at 1 and preallocate new matrix: index = 1; subsample = 1; newmat = zeros("150","6") < 150 is the number of samples, 6 = 4 wide data + 1 wide labels + 1 we will use later
While you still have data: while ( length(labels) > 0 )
Generate a random number within the amount of data left: randNum = randi(length(labels))? I think that's a random int that goes from 1 to the size of your labels array (it could be 0, please check the doc - if it is, do simple math to make it 1 < rand < length)
Add that row to a new data set with labels: newmat(index,:) = [data(randNum,:) labels(randNum) subsample] < that last column is the subsample number from 1-10
Delete the row from data and labels: data(randNum,:) = []; same for labels < note this will physically remove a row from the matrices, which is why we have to use a while loop and check for length > 0 rather than a for loop and simple indices
Increment counters: index = index + 1; subsample = subsample + 1;
if subsample = 11, make it 1 again.
At the end of this, you should have a large data matrix that looks almost exactly like your original, but has randomly assigned "fold labels".
Loop over all this and your executing code k (10) times.
EDIT: code placed in more accessible manner. NOTE it's still pseudo-y code and is not complete! Also, you should note that this is NOT AT ALL the most efficient way, but shouldn't be too bad if you can't use matlab functions.
for k = 1:10
index = 1; subsample = 1; newmat = zeros("150","6");
while ( length(labels) > 0 )
randNum = randi(length(labels));
newmat(index,:) = [data(randNum,:) labels(randNum) subsample];
data(randNum,:) = []; same for labels
index = index + 1; subsample = subsample + 1;
if ( subsample == 11 )
subsample = 1;
end
end
% newmat is complete, now run code here using the sampled data
%(ie pick a random number from 1:10 and use that as your validation fold. the rest for training
end
EDIT FOR ANSWER #2:
Ok another way, is to create a vector that is as long as your data set
foldLabels = zeros("150",1);
Then, looping for that long (150), assign labels to random indices!
foldL = 1;
numAssigned = 0;
while ( numAssigned < 150 )
idx = randi(150);
% no need to reassign a given label, so check if is still 0
if ( foldLabels(idx) == 0 )
foldLabels(idx) = foldL;
numAssigned++; % not matlab code, just got lazy. you get it
foldL++;
if ( foldL > 10 )
foldL = 1;
end
end
end
EDIT FOR ANSWER #2.5
foldLabels = zeros("150",1);
for i = 1:150
notChosenLabels = [notChosenLabels i];
end
foldL = 1;
numAssigned = 0;
while ( length(notChosenLabels) > 0 )
labIdx = randi(length(notChosenLabels));
idx = notChosenLabels(labIdx);
foldLabels(idx) = foldL;
numAssigned++; % not matlab code, just got lazy. you get it
foldL++;
if ( foldL > 10 )
foldL = 1;
end
notChosenLabels(labIdx) = [];
end
EDIT FOR RANDPERM
generate the indices with randperm
idxs = randperm(150);
now just assign
foldLabels = zeros(150,1);
for i = 1:150
foldLabels(idxs(i)) = sampleLabel;
sampleLabel = sampleLabel + 1;
if ( sampleLabel > 10 )
sampleLabel = 1;
end
end

Related

Solve System of Linear Equations in MatLab with Matrix of Arbitrary Size for Finite Difference Calculation

I am trying to write a script in MatLab R2016a that can solve a system of linear equations that can have different sizes depending on the values of p and Q.
I have the following equations that I am trying to solve, where h=[-p:1:p]*dx. Obviously, there is some index m where h=0, but that shouldn't be a problem.
I'm trying to write a function where I can input p and Q and build the matrix and then just solve it to get the coefficients. Is there a way to build a matrix using the variables p, Q, and h instead of using different integer values for each individual case?
I would use bsxfun(in recent matlab versions this function may be implented to the interpreter, I don't know for sure):
p = 4;
Q = 8;
dx = 1;
h = -p:p*dx
Qvector = [Q,1:Q-1]'
Matrix = bsxfun(#(Qvector, h)h.^(Qvector)./factorial(Qvector), Qvector, h)
Output:
h =
-4 -3 -2 -1 0 1 2 3 4
Qvector =
8
1
2
3
4
5
6
7
Matrix =
1.6254 0.1627 0.0063 0.0000 0 0.0000 0.0063 0.1627 1.6254
-4.0000 -3.0000 -2.0000 -1.0000 0 1.0000 2.0000 3.0000 4.0000
8.0000 4.5000 2.0000 0.5000 0 0.5000 2.0000 4.5000 8.0000
-10.6667 -4.5000 -1.3333 -0.1667 0 0.1667 1.3333 4.5000 10.6667
10.6667 3.3750 0.6667 0.0417 0 0.0417 0.6667 3.3750 10.6667
-8.5333 -2.0250 -0.2667 -0.0083 0 0.0083 0.2667 2.0250 8.5333
5.6889 1.0125 0.0889 0.0014 0 0.0014 0.0889 1.0125 5.6889
-3.2508 -0.4339 -0.0254 -0.0002 0 0.0002 0.0254 0.4339 3.2508

Operations within matrix avoiding for loops

I have a rather simple question but I can't get the proper result in MATLAB.
I am writing a code in Matlab where I have a 200x3 matrix. This data corresponds to the recording of 10 different points, for each of which I took 20 frames.
This is just in order to account for the error in the measuring system. So now I want to calculate the 3D coordinates of each point from this matrix by calculating an average of the measured independent coordinates.
An example (for 1 point with 3 measurements) would be:
MeasuredFrames (Point 1) =
x y z
1.0000 2.0000 3.0000
1.1000 2.2000 2.9000
0.9000 2.0000 3.1000
Point = mean(MeasuredFrames(1:3, :))
Point =
1.0000 2.0667 3.0000
Now I want to get this result but for 10 points, all stored in a [200x3] array, in intervals of 20 frames.
Any ideas?
Thanks in advance!
If you have the Image processing toolbox blockproc could be an option:
A = blockproc(data,[20 3],#(x) mean(x.data,1))
If not the following using permute with reshape works as well:
B = permute(mean(reshape(data,20,10,3),1),[2,3,1])
Explanation:
%// transform data to 3D-Matrix
a = reshape(data,20,10,3);
%// avarage in first dimension
b = mean(a,1);
%// transform back to 10x3 matrix
c = permute(b,[2,3,1])
Some sample data:
x = [ 1.0000 2.0000 3.0000
1.1000 2.2000 2.9000
0.9000 2.0000 3.1000
1.0000 2.0000 3.0000
1.1000 2.2000 2.9000
0.9000 2.0000 3.1000
1.0000 2.0000 3.0000
1.1000 2.2000 2.9000
0.9000 2.0000 3.1000
1.1000 2.2000 2.9000]
data = kron(1:20,x.').';
A = B =
1.5150 3.1200 4.4850
3.5350 7.2800 10.4650
5.5550 11.4400 16.4450
7.5750 15.6000 22.4250
9.5950 19.7600 28.4050
11.6150 23.9200 34.3850
13.6350 28.0800 40.3650
15.6550 32.2400 46.3450
17.6750 36.4000 52.3250
19.6950 40.5600 58.3050
If you do not have access to the blockproc function, you can do it with a combination of reshape:
np = 20 ; %// number of points for averaging
tmp = reshape( A(:) , np,[] ) ; %// unfold A then group by "np"
tmp = mean(tmp); %// calculate mean for each group
B = reshape(tmp, [],3 ) ; %// reshape back to nx3 matrix
In your case, replace A by MeasuredFrames and B by Points, and group in one line:
Points = reshape(mean(reshape( MeasuredFrames (:) , np,[] )), [],3 ) ;
Matrix multiplication can be used:
N=20;
L=size(MeasuredFrames,1);
Points = sparse(ceil((1:L)/N), 1:L, 1)*MeasuredFrames/N;

Generating a grid in matlab with a general number of dimensions

Problem
I have a vector w containing n elements. I do not know n in advance.
I want to generate an n-dimensional grid g whose values range from grid_min to grid_max and obtain the "dimension-wise" product of w and g.
How can I do this for an arbitrary n?
Examples
For simplicity, let's say that grid_min = 0 and grid_max = 5.
Case: n=1
>> w = [0.75];
>> g = 0:5
ans =
0 1 2 3 4 5
>> w * g
ans =
0 0.7500 1.5000 2.2500 3.0000 3.7500
Case: n=2
>> w = [0.1, 0.2];
>> [g1, g2] = meshgrid(0:5, 0:5)
g1 =
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
g2 =
0 0 0 0 0 0
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
5 5 5 5 5 5
>> w(1) * g1 + w(2) * g2
ans =
0 0.1000 0.2000 0.3000 0.4000 0.5000
0.2000 0.3000 0.4000 0.5000 0.6000 0.7000
0.4000 0.5000 0.6000 0.7000 0.8000 0.9000
0.6000 0.7000 0.8000 0.9000 1.0000 1.1000
0.8000 0.9000 1.0000 1.1000 1.2000 1.3000
1.0000 1.1000 1.2000 1.3000 1.4000 1.5000
Now suppose a user passes in the vector w and we do not know how many elements (n) it contains. How can I create the grid and obtain the product?
%// Data:
grid_min = 0;
grid_max = 5;
w = [.1 .2 .3];
%// Let's go:
n = numel(w);
gg = cell(1,n);
[gg{:}] = ndgrid(grid_min:grid_max);
gg = cat(n+1, gg{:});
result = sum(bsxfun(#times, gg, shiftdim(w(:), -n)), n+1);
How this works:
The grid (variable gg) is generated with ndgrid, using as output a comma-separated list of n elements obtained from a cell array. The resulting n-dimensional arrays (gg{1}, gg{2} etc) are contatenated along the n+1-th dimension (using cat), which turns gg into an n+1-dimensional array. The vector w is reshaped into the n+1-th dimension (shiftdim), multiplied by gg using bsxfun, and the results are summed along the n+1-th dimension.
Edit:
Following #Divakar's insightful comment, the last line can be replaced by
sz_gg = size(gg);
result = zeros(sz_gg(1:end-1));
result(:) = reshape(gg,[],numel(w))*w(:);
which results in a significant speedup, because Matlab is even better at matrix multiplication than at bsxfun (see for example here and here).

Blockdiagonal variation grid

I have the feeling I am missing something intuitive in my solution for generating a partially varied block-diagonal grid. In any case, I would like to get rid of the loop in my function (for the sake of challenge...)
Given tuples of parameters, number of intervals and percentage variation:
params = [100 0.5 1
24 1 0.9];
nint = 1;
perc = 0.1;
The desired output should be:
pspacegrid(params,perc,nint)
ans =
90.0000 0.5000 1.0000
100.0000 0.5000 1.0000
110.0000 0.5000 1.0000
100.0000 0.4500 1.0000
100.0000 0.5000 1.0000
100.0000 0.5500 1.0000
100.0000 0.5000 0.9000
100.0000 0.5000 1.0000
100.0000 0.5000 1.1000
21.6000 1.0000 0.9000
24.0000 1.0000 0.9000
26.4000 1.0000 0.9000
24.0000 0.9000 0.9000
24.0000 1.0000 0.9000
24.0000 1.1000 0.9000
24.0000 1.0000 0.8100
24.0000 1.0000 0.9000
24.0000 1.0000 0.9900
where you can see that the variation occurs at the values expressed by this mask:
mask =
1 0 0
1 0 0
1 0 0
0 1 0
0 1 0
0 1 0
0 0 1
0 0 1
0 0 1
1 0 0
1 0 0
1 0 0
0 1 0
0 1 0
0 1 0
0 0 1
0 0 1
0 0 1
The function pspacegrid() is:
function out = pspacegrid(params, perc, nint)
% PSPACEGRID Generates a parameter space grid for sensitivity analysis
% Size and number of variation steps
sz = size(params);
nsteps = nint*2+1;
% Preallocate output
out = reshape(permute(repmat(params,[1,1,nsteps*sz(2)]),[3,1,2]),[],sz(2));
% Mask to index positions where to place interpolated
[tmp{1:sz(2)}] = deal(true(nsteps,1));
mask = repmat(logical(blkdiag(tmp{:})),sz(1),1);
zi = cell(sz(1),1);
% LOOP per each parameter tuple
for r = 1:sz(1)
% Columns, rows, rows to interpolate and lower/upper parameter values
x = 1:sz(2);
y = [1; nint*2+1];
yi = (1:nint*2+1)';
z = [params(r,:)*(1-perc); params(r,:)*(1+perc)];
% Interpolated parameters
zi{r} = interp2(x,y,z, x, yi);
end
out(mask) = cat(1,zi{:});
I think I got it, building off your pre-loop code:
params = [100 0.5 1
24 1 0.9];
nint = 1;
perc = 0.1;
sz = size(params);
nsteps = nint*2+1;
% Preallocate output
out = reshape(permute(repmat(params,[1,1,nsteps*sz(2)]),[3,1,2]),[],sz(2));
%Map of the percentage moves
[tmp{1:sz(2)}] = deal(linspace(-perc,perc,nint*2+1)');
mask = repmat(blkdiag(tmp{:}),sz(1),1) + 1; %Add one so we can just multiply at the end
mask.*out
So instead of making your mask replicate the ones I made it replicate the percentage moves each element makes which is a repeating pattern, the basic element is made like this:
linspace(-perc,perc,nint*2+1)'
Then it's as simple as adding 1 to the whole thing and multiplying by your out matrix
I tested it as follows:
me = mask.*out;
you = pspacegrid(params, perc, nint);
check = me - you < 0.0001;
mean(check(:))
Seemed to work when I fiddled with the inputs. However I did get an error with your function, I had to change true(...) to ones(...). This might be because I'm running it online which probably uses Octave rather than Matlab.

Inserting rows into matrix matlab

I have a ~ 100000/2 matrix. I'd like to go down the columns, average each vertically adjacent value, and insert that value in between the two values. For example...
1 2
3 4
4 6
7 8
would become
1 2
2 3
3 4
3.5 5
4 6
5.5 7
7 8
I'm not sure if there is a terse way to do this in matlab. I took a look at http://www.mathworks.com/matlabcentral/fileexchange/9984 but it seems to insert all of the rows in a matrix into the other one at a specific point. Obviously it can still be used, but just wondering if there is a simpler way.
Any help is appreciated, thanks.
Untested:
% Take the mean of adjacent pairs
x_mean = ([x; 0 0] + [0 0; x]) / 2;
% Interleave the two matrices
y = kron(x, [1;0]) + kron(x_mean(1:end-1,:), [0;1]);
%# works for any 2D matrix of size N-by-M
X = rand(100,2);
adjMean = mean(cat(3, X(1:end-1,:), X(2:end,:)), 3);
Y = zeros(2*size(X,1)-1, size(X,2));
Y(1:2:end,:) = X;
Y(2:2:end,:) = adjMean;
octave-3.0.3:57> a = [1,2; 3,4; 4,6; 7,8]
a =
1 2
3 4
4 6
7 8
octave-3.0.3:58> b = (circshift(a, -1) + a) / 2
b =
2.0000 3.0000
3.5000 5.0000
5.5000 7.0000
4.0000 5.0000
octave-3.0.3:60> reshape(vertcat(a', b'), 2, [])'(1:end-1, :)
ans =
1.0000 2.0000
2.0000 3.0000
3.0000 4.0000
3.5000 5.0000
4.0000 6.0000
5.5000 7.0000
7.0000 8.0000