I have the following 10-fold cross-validation implementation. I am using a data set published by the UCI Machine Learning Repository. Here is the link for the data set:
Here are my dimensions
x =
data: [178x13 double]
labels: [178x1 double]
This is the error that I am getting
Index exceeds matrix dimensions.
Error in GetTenFold (line 33)
results_cell{i,2} = shuffledMatrix(testRows ,:);
This is my code:
%Function that accept data file as a name and the number of folds
%For the cross fold
function [results_cell] = GetTenFold(dataFile, x)
%loading the data file
dataMatrix = load(dataFile);
%combine the data and labels as one matrix
X = [dataMatrix.data dataMatrix.labels];
%getting the length of the matrix
dataRowNumber = length(dataMatrix.data);
%shuffle the matrix while keeping rows intact
shuffledMatrix = X(randperm(size(X,1)),:);
crossValidationFolds = x;
%Assigning number of rows per fold
numberOfRowsPerFold = dataRowNumber / crossValidationFolds;
crossValidationTrainData = [];
crossValidationTestData = [];
%Assigning 10X2 cell to hold each fold as training and test data
results_cell = cell(10,2);
%starting from the first row and segment it based on folds
i = 1;
for startOfRow = 1:numberOfRowsPerFold:dataRowNumber
testRows = startOfRow:startOfRow+numberOfRowsPerFold-1;
if (startOfRow == 1)
trainRows = (max(testRows)+1:dataRowNumber);
else
trainRows = [1:startOfRow-1 max(testRows)+1:dataRowNumber];
i = i + 1;
end
%for i=1:10
results_cell{i,1} = shuffledMatrix(trainRows ,:);
results_cell{i,2} = shuffledMatrix(testRows ,:); %This is where I am getting my dimension error
%end
%crossValidationTrainData = [crossValidationTrainData ; shuffledMatrix(trainRows ,:)];
%crossValidationTestData = [crossValidationTestData ;shuffledMatrix(testRows ,:)];
end
end
You're looping over 1:numberOfRowsPerFold:dataRowNumber, which is 1:(178/x):178, and i increments on every pass after the first. So i can run past the 10 rows you preallocated in results_cell.
Since the error is raised on the read shuffledMatrix(testRows ,:), the more direct way to get it is that testRows selects rows out of bounds of shuffledMatrix.
Learn to debug
To pause the code and start debugging when the error occurs, run dbstop if error before executing your code. MATLAB then enters debug mode as soon as an error is raised, so you can inspect the state of the variables right before things go wrong.
(to disable this debugging mode, run dbclear if error.)
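One way to keep testRows inside the matrix is to compute integer fold boundaries up front. The sketch below is a hypothetical illustration, not the asker's code: nRows, nFolds, edges and folds are made-up names, and it stores row indices rather than the data itself.

```matlab
% Hypothetical sketch: split nRows rows into nFolds contiguous folds
nRows  = 178;                                   % rows in the shuffled matrix
nFolds = 10;                                    % number of folds
edges  = round(linspace(0, nRows, nFolds + 1)); % integer fold boundaries, 0 .. nRows
folds  = cell(nFolds, 2);                       % column 1: train rows, column 2: test rows
for k = 1:nFolds
    testRows  = edges(k)+1 : edges(k+1);               % never exceeds nRows
    trainRows = [1:edges(k), edges(k+1)+1:nRows];      % everything else
    folds{k,1} = trainRows;
    folds{k,2} = testRows;
end
```

Because the boundaries are rounded once, every index is an integer and the last fold ends exactly at nRows, even when nRows is not divisible by nFolds. The fold sizes then differ by at most one row.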
% 3. Calculation of strain energy density
% CALCULATION OF STRAIN-ENERGY-DENSITY FOR EACH LOAD CASE
% u=1/2*sigma*epsilon
for p = 1:N_ele
uLS1(p) = 1/2*(sigma_1(p,2:7)*epsilon_1(p,2:7)');
uLS2(p) = 1/2*(sigma_2(p,2:7)*epsilon_2(p,2:7)');
uLS3(p) = 1/2*(sigma_3(p,2:7)*epsilon_3(p,2:7)');
end
% AVERAGE OF ALL LOAD CASES
sed(:,a) = (uLS1' + uLS2' + uLS3')/3; %11 ... line
Error on command window:
"Unrecognized function or variable 'uLS1'."
Error in main_file (line 86)
sed(:,a) = (uLS1' + uLS2' + uLS3')/3;
Regarding the error: The variable sed must have N_ele rows such that size(sed,1) = N_ele. If the number N_ele changes with every iteration a, then you can use a cell array instead of a numeric array, i.e., sed{a} = (uLS1' + uLS2' + uLS3')/3;.
Regarding the warning: Preallocate the arrays uLS1, uLS2, and uLS3 before the for-loop when you know the size they will have, i.e.,
uLS1 = zeros(1, N_ele);
uLS2 = zeros(1, N_ele);
uLS3 = zeros(1, N_ele);
If you don't know their sizes in advance, you have the choice to ignore Matlab's warning and proceed as is.
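As a minimal sketch of the preallocation advice, the loop for one load case can be written as below and compared against a loop-free version. The data is random and purely hypothetical; only the names N_ele, sigma_1, epsilon_1 and uLS1 come from the question, and column 1 is assumed to hold an element id.

```matlab
% Hypothetical data: column 1 = element id, columns 2:7 = six components
N_ele     = 4;
sigma_1   = [(1:N_ele)', rand(N_ele, 6)];
epsilon_1 = [(1:N_ele)', rand(N_ele, 6)];

% Preallocated loop, as in the question: u = 1/2 * sigma * epsilon'
uLS1 = zeros(1, N_ele);
for p = 1:N_ele
    uLS1(p) = 1/2 * (sigma_1(p,2:7) * epsilon_1(p,2:7)');
end

% Loop-free alternative: row-wise dot product via elementwise multiply and sum
uVec = 1/2 * sum(sigma_1(:,2:7) .* epsilon_1(:,2:7), 2)';   % 1-by-N_ele row
```

Both variants produce a 1-by-N_ele row, so uLS1' has N_ele rows and fits into a column of sed.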
I have data that can be simulated in the following way:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
In other words, it is a matrix where the last column holds the subscripts.
Now I want to calculate nanmean() for each subscript. Also I want to save number of rows for each subscript. I have a 'dummy' code for this:
uniqueSubs = unique(M(:,6));
avM = nan(numel(uniqueSubs),6);
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1)];
end
The problem is that it is too slow. I want it to work for N = 10^8 and K = 10^6 (see the commented-out values in the definitions of these variables).
How can I find the mean of the data in a faster way?
This sounds like a perfect job for findgroups and splitapply.
% Find groups in the final column
G = findgroups(M(:,6));
% function to apply per group
fcn = @(group) [mean(group, 1, 'omitnan'), size(group, 1)];
% Use splitapply to apply fcn to each group in M(:,1:5)
result = splitapply(fcn, M(:, 1:5), G);
% Check
assert(isequaln(result, avM));
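If findgroups and splitapply are not available, a similar result can be sketched with accumarray, which is a different technique from the one above. The tiny matrix below is hypothetical, and avM3, vals and valid are made-up names.

```matlab
% Tiny hypothetical data set: 5 value columns, subscript in column 6
M = [ 1   2   3   4   5  1;
      3 NaN   5   6   7  1;
     10  20  30  40  50  2];

[~, ~, G] = unique(M(:,6));        % group index 1..nGroups for every row
nGroups   = max(G);
vals      = M(:,1:5);
valid     = ~isnan(vals);          % which entries take part in the mean
vals(~valid) = 0;                  % NaNs contribute nothing to the sums

avM3 = zeros(nGroups, 6);
for c = 1:5
    s = accumarray(G, vals(:,c), [nGroups 1]);            % per-group sum
    n = accumarray(G, double(valid(:,c)), [nGroups 1]);   % per-group count of non-NaN values
    avM3(:,c) = s ./ n;            % 0/0 gives NaN if a group has no valid value
end
avM3(:,6) = accumarray(G, 1, [nGroups 1]);                % rows per subscript
```

The loop runs over the 5 columns rather than over the groups, so its cost does not grow with the number of unique subscripts.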
M = sortrows(M,6); % sort the data per subscript
IDX = diff(M(:,6)); % find where the subscript changes
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)]; % add start and end of data
for iSub= 2:numel(tmp)
% Calculate the mean over just a single subscript, store in iSub-1
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
This is some 60 times faster than your original code on my computer. The speed-up mainly comes from presorting the data and then finding all locations where the subscript changes. That way you do not have to traverse the full array each time to find the correct subscripts, but rather you only check what's necessary each iteration. You thus calculate the mean over ~100 rows, instead of first having to check in 1,000,000 rows whether each row is needed that iteration or not.
Thus: in the original you check, for each of the numel(uniqueSubs) subscripts (10,000 in this case), whether each of the N rows (1,000,000 here) belongs to that subscript, which results in 10^10 checks. The proposed code sorts the rows (sorting is O(N log N), on the order of 10^7 operations here) and then loops once over the full array without additional checks.
For completeness, here is the original code along with my version, showing that the two give the same result:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
uniqueSubs = unique(M(:,6));
%% zlon's original code
avM = nan(numel(uniqueSubs),7); % add the subscript for comparison later
tic
uniqueSubs = unique(M(:,6));
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1) uniqueSubs(iSub)];
end
toc
%%%%% End of zlon's code
avM = sortrows(avM,7); % Sort for comparison
%% Start of Adriaan's code
avM2 = nan(numel(uniqueSubs),6);
tic
M = sortrows(M,6);
IDX = diff(M(:,6));
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)];
for iSub = 2:numel(tmp)
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
toc %tic/toc should not be used for accurate timing, this is just for order of magnitude
%%%% End of Adriaan's code
all(avM(:,1:6) == avM2) % Do the comparison
% End of script
% Output
Elapsed time is 58.561347 seconds.
Elapsed time is 0.843124 seconds. % ~70 times faster
ans =
1×6 logical array
1 1 1 1 1 1 % i.e. the matrices are equal to one another
Hoping you may be able to assist me with this error. I am running some code to fit curves to ages using a cross validation regime. I iterate the curve fitting 1000 times to assess the best fit.
I define my models as:
linear_ft = fittype({'x', '1'});
monotonic_ft= fittype({'-1/x', '1'});
quadratic_ft = fittype('poly2');
I then run the following to iterate through different selections of data splitting, recording the residuals following the curve fit...
Data = randn(4,300,10,10);
Ages = randn(300,1);
for thisDim1 = 1:4
for thisDim2 = 1:10
for thisDim3 = 1:10
for nIts = 1:1000
RandomOrder = randperm(300,300);
Fit_Subs = RandomOrder(1:length(Ages)/2); % Take random subs to fit to
Test_Subs = RandomOrder(length(Ages)/2+1:300); % Take random subs to test fit to
Fit_Data = squeeze(Data(thisDim1,Fit_Subs,thisDim2,thisDim3)); % Take data to fit to
Test_Data = squeeze(Data(thisDim1,Test_Subs,thisDim2,thisDim3)); % Take data to test fit
Fit_Ages = Ages;
Fit_Ages(Fit_Subs) = []; %Take ages of Fit Subs only
Test_Ages = Ages;
Test_Ages(Test_Subs) = []; % Take ages of Test Subs only
Nsubs = (length(Ages)/2);
% Model Data using Curves
fFit_Lin = fit(Fit_Ages,Fit_Data',linear_ft);
fFit_Mon = fit(Fit_Ages,Fit_Data',monotonic_ft);
fFit_Quad = fit(Fit_Ages,Fit_Data',quadratic_ft);
% Fit Modelled Data to Test Data
tFit_Lin = fFit_Lin(Test_Ages);
tFit_Mon = fFit_Mon(Test_Ages);
tFit_Quad = fFit_Quad(Test_Ages);
% Calculate Median Residual
Lin_Med_Resid(nIts) = median(tFit_Lin - Test_Data');
Mon_Med_Resid(nIts) = median(tFit_Mon - Test_Data');
Quad_Med_Resid(nIts) = median(tFit_Quad - Test_Data');
end
end
end
end
If you run this with the fourth loop (nIts) as a for-loop, it runs. If you run it as a parfor-loop, it fails with the error:
Error using fit>iFit (line 264)
The name 'lower' is not an accessible property for an instance of class
'llsqoptions'.
Error in fit (line 108) [fitobj, goodness, output, convmsg] = iFit(
xdatain, ydatain, fittypeobj, ...
Does anyone have any idea how to fix this? I would be most grateful for any advice!!
Thanks,
Ben
Try restarting MATLAB or typing clear all to see if it clears things up for you.
Your code works for me, but the parallel toolbox can be a bit finicky in my experience.
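Separately from the parfor error, one thing that may be worth checking in the question's code: Fit_Ages(Fit_Subs) = [] deletes the ages of the fit subjects instead of keeping them, so the remaining ages belong to the other half and are no longer in the order of Fit_Data. A minimal sketch of selecting by index instead, on a hypothetical 10-subject example (the variable Deleted is a made-up name used only for contrast):

```matlab
% Hypothetical small example (10 subjects instead of 300)
Ages        = (21:30)';
RandomOrder = randperm(numel(Ages));
half        = numel(Ages) / 2;
Fit_Subs    = RandomOrder(1:half);
Test_Subs   = RandomOrder(half+1:end);

% Select by index: ages of the chosen subjects, in the same order as the indices
Fit_Ages  = Ages(Fit_Subs);
Test_Ages = Ages(Test_Subs);

% For contrast: deleting by index keeps the other half, in its original order
Deleted = Ages;
Deleted(Fit_Subs) = [];
```

Here Deleted contains the same values as Test_Ages, only in a different order, which is not what the comment "Take ages of Fit Subs only" describes.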
Finding maximum values of wave heights and wave lengths
dwcL01 through dwcL10 are arrays of <3001x2 double> with output from a numerical wave model.
Part of my script:
%% Plotting results from SWASH
% Examination of phase velocity on deep water with different number of layers
% Wave height 3 meters, wave period 8 sec on a depth of 30 meters
clear all; close all; clc;
T=8;
L0=1.56*T^2;
%% Loading results tables.
load dwcL01.tbl; load dwcL02.tbl; load dwcL03.tbl; load dwcL04.tbl;
load dwcL05.tbl; load dwcL06.tbl; load dwcL07.tbl; load dwcL08.tbl;
load dwcL09.tbl; load dwcL10.tbl;
M(:,:,1) = dwcL01; M(:,:,2) = dwcL02; M(:,:,3) = dwcL03; M(:,:,4) = dwcL04;
M(:,:,5) = dwcL05; M(:,:,6) = dwcL06; M(:,:,7) = dwcL07; M(:,:,8) = dwcL08;
M(:,:,9) = dwcL09; M(:,:,10) = dwcL10;
%% Finding position of wave crest using diff and sign.
for ii=1:10
Tp(:,1,ii) = diff(sign(diff([M(1,2,ii);M(:,2,ii)]))) < 0;
Wc(:,:,ii) = M(Tp,1,ii);
L(:,ii) = diff(Wc(:,1,ii))
end
The loop
for ii=1:10
Tp(:,1,ii) = diff(sign(diff([M(1,2,ii);M(:,2,ii)]))) < 0;
Wc(:,:,ii) = M(Tp,1,ii);
L(:,ii) = diff(Wc(:,1,ii))
end
works fine for ii = 1, but I get the following error for ii = 2:
Index exceeds matrix dimensions.
Error in mkPlot (line 19)
Wc(:,:,i) = M(Tp,:,i);
Since the different setups don't have the same number of wave crests, M(Tp,1,ii) naturally has different dimensions for each ii. How do I work around this issue? Can it be done in a for loop? Please feel free to email me or otherwise ask for further information.
The problem is that Tp is a three-dimensional array, so I need to index the slice Tp(:,:,ii) that corresponds to the current scenario. Doing that, and defining Wc as a cell array, solved my issue.
for ii = 1:10
Tp(:,1,ii) = diff(sign(diff([M(1,2,ii);M(:,2,ii)]))) < 0;
Wc{:,:,ii} = M(Tp(:,:,ii),1,ii);
L{:,ii} = diff(cell2mat(Wc(ii)));
end
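The same idea can be tried on synthetic data. The sketch below is only an illustration under assumptions: the two sine waves stand in for the model output (column 1 = position, column 2 = surface elevation), and x, eta, isCrest and nCases are made-up names.

```matlab
% Hypothetical stand-in for the wave model output
x = (0:0.5:100)';
M = zeros(numel(x), 2, 2);
M(:,:,1) = [x, sin(2*pi*x/20)];     % wavelength 20
M(:,:,2) = [x, sin(2*pi*x/12.5)];   % wavelength 12.5, so more crests fit in the domain

nCases = size(M, 3);
Wc = cell(1, nCases);               % crest positions, one cell per case
L  = cell(1, nCases);               % distances between consecutive crests
for ii = 1:nCases
    eta     = M(:,2,ii);
    % Slope changes from rising to falling at a crest
    isCrest = diff(sign(diff([eta(1); eta]))) < 0;
    Wc{ii}  = M(isCrest, 1, ii);    % each case may have a different number of crests
    L{ii}   = diff(Wc{ii});
end
```

Because each cell holds its own column vector, the differing number of crests per case no longer causes a dimension mismatch, and L{ii} can be read directly without cell2mat.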
I am working on a thumb recognition system in MATLAB. I implemented the k-means algorithm and got results. Now I want to plot the results like they did here, but I haven't been able to. I am using the following code.
load training.mat; % loaded just to get trainingData variable
labelData = zeros(200,1);
labelData(1:100,:) = 0;
labelData(101:200,:) = 1;
k=2;
[trainCtr, traina] = kmeans(trainingData,k);
trainingResult1=[];
for i=1:k
trainingResult1 = [trainingResult1 sum(trainCtr(1:100)==i)];
end
trainingResult2=[];
for i=1:k
trainingResult2 = [trainingResult2 sum(trainCtr(101:200)==i)];
end
load testing.mat; % loaded just to get testingData variable
c1 = zeros(k,1054);
c1 = traina;
cluster = zeros(200,1);
for j=1:200
testTemp = repmat(testingData(j,1:1054),k,1);
difference = sum((c1 - testTemp).^2, 2);
[value index] = min(difference);
cluster(j,1) = index;
end
testingResult1 = [];
for i=1:k
testingResult1 = [testingResult1 sum(cluster(1:100)==i)];
end
testingResult2 = [];
for i=1:k
testingResult2 = [testingResult2 sum(cluster(101:200)==i)];
end
In the above code, trainingData is a 200 x 1054 matrix: 200 rows, one per thumb image, and 1054 columns. Each image is 25 x 42, which I reshaped into a row vector (1 x 1050) and extended with 4 other feature columns, giving 1054 columns per image. testingData was built in the same manner as trainingData and is also 200 x 1054. Now my problem is just to plot the results as they did here.
After selecting 2 features, you can just follow the example. Start a figure, use hold on, and use plot or scatter to plot the centroids and the data points. E.g.
selectedFeatures = [42,43];
plot(trainingData(trainCtr==1,selectedFeatures(1)), ...
trainingData(trainCtr==1,selectedFeatures(2)), ...
'r.','MarkerSize',12)
would plot the selected feature values of the data points in cluster 1.
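Extending that to all clusters plus the centroids could look like the sketch below. The data here is random and hypothetical, standing in for trainingData, trainCtr (the cluster index per row) and traina (the centroid matrix) from the question; the colors cell and the feature choice are made up as well.

```matlab
% Hypothetical stand-ins for the variables from the question
trainingData = [randn(20, 3); randn(20, 3) + 4];
trainCtr     = [ones(20, 1); 2 * ones(20, 1)];                 % cluster index per row
traina       = [mean(trainingData(1:20, :), 1); ...
                mean(trainingData(21:40, :), 1)];              % one centroid per row

selectedFeatures = [1, 2];          % the two feature columns to plot
colors = {'r.', 'b.'};              % one marker style per cluster

figure
hold on
for c = 1:size(traina, 1)
    inCluster = (trainCtr == c);
    plot(trainingData(inCluster, selectedFeatures(1)), ...
         trainingData(inCluster, selectedFeatures(2)), ...
         colors{c}, 'MarkerSize', 12)
end
% Centroids on top, as black crosses
plot(traina(:, selectedFeatures(1)), traina(:, selectedFeatures(2)), ...
     'kx', 'MarkerSize', 15, 'LineWidth', 3)
hold off
```

With the real data, only the first three assignments would be dropped and selectedFeatures set to the two columns of interest.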