I've created a neural network to model a certain (simple) input-output relationship. When I look at the time-series response plot in the nntrain GUI, the predictions seem quite adequate. However, when I try out-of-sample prediction, the results are nowhere close to the function being modelled.
I've googled this problem extensively and messed around with my code to no avail. I'd really appreciate a little insight into what I've been doing wrong.
I've included a minimal working example below.
A = 1:1000; B = 10000*sin(A); C = A.^2 +B;
Set = [A' B' C'];
input = Set(:,1:end-1);
target = Set(:,end);
inputSeries = tonndata(input(1:700,:),false,false);
targetSeries = tonndata(target(1:700,:),false,false);
inputSeriesVal = tonndata(input(701:end,:),false,false);
targetSeriesVal = tonndata(target(701:end,:),false,false);
inputDelays = 1:2;
feedbackDelays = 1:2;
hiddenLayerSize = 5;
net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize);
[inputs,inputStates,layerStates,targets] = preparets(net,inputSeries,{},targetSeries);
net.divideFcn = 'divideblock'; % Divide data in blocks
net.divideMode = 'time'; % Divide up every value
% Train the Network
[net,tr] = train(net,inputs,targets,inputStates,layerStates);
Y = net(inputs,inputStates,layerStates);
% Prediction Attempt
delay=length(inputDelays); N=300;
inputSeriesPred = [inputSeries(end-delay+1:end),inputSeriesVal];
targetSeriesPred = [targetSeries(end-delay+1:end), con2seq(nan(1,N))];
netc = closeloop(net);
[Xs,Xi,Ai,Ts] = preparets(netc,inputSeriesPred,{},targetSeriesPred);
yPred = netc(Xs,Xi,Ai);
perf = perform(net,yPred,targetSeriesVal);
legend('Original Targets','Network Predictions','Expected Outputs')
I realise narx net with a time delay is probably overkill for this type of problem but I intend on using this example as a base for a more complicated time-series problem in the future.
Kind regards, James

The most likely causes of poor generalization from the training data to new data are that either (1) there was not enough training data to characterize the problem, or (2) the neural network has more neurons and delays than the problem needs, so it is overfitting the data (i.e. it is having an easy time memorizing the examples instead of having to figure out how they are related).
The fix for (1) is typically more data. The fix for (2) is to reduce the number of tap delays and/or neurons.
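Cause (2) can be illustrated without the toolbox. The following is only a sketch: polynomial degree stands in for the number of neurons and delays, the data are made up, and `polyfit` replaces network training. It fits a small and a large model to the same ten noisy samples and compares the error on the training points with the error on points beyond the training range.

```matlab
% Toy data (hypothetical): a linear trend plus a small alternating disturbance.
xTrain = linspace(0, 1, 10);
noise  = 0.1 * (-1).^(0:9);
yTrain = 2 * xTrain + noise;

% Held-out points lie beyond the training range, as in out-of-sample prediction.
xTest = linspace(1.1, 2, 10);
yTest = 2 * xTest;

degrees  = [1 9];   % small model vs. a model with as many parameters as samples
trainMse = zeros(size(degrees));
testMse  = zeros(size(degrees));
for i = 1:numel(degrees)
    p = polyfit(xTrain, yTrain, degrees(i));
    trainMse(i) = mean((polyval(p, xTrain) - yTrain).^2);
    testMse(i)  = mean((polyval(p, xTest)  - yTest).^2);
end

% Rows: degree, training MSE, held-out MSE.
disp([degrees; trainMse; testMse]);
```

The larger model reproduces the training samples almost exactly, disturbance included, and pays for it on the held-out points. The same pattern (training error falling while held-out error rises) is what to look for when reducing neurons or tap delays in the real network.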
Hope this helps!

I'm not sure if you solved the problem yet. But there is at least one more solution to your problem.
Since you are dealing with a time series it is better (at least in this case) to set net.divideFcn = 'dividerand'. The 'divideblock' will only use the first part of the time series for training which may result in lost information about the long-term trends.
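To make the difference concrete, here is a minimal sketch in plain MATLAB that mimics the two division schemes by hand. It does not call the toolbox functions, and the 0.7 training ratio is an assumption chosen for illustration.

```matlab
Q = 1000;                 % number of time steps, as in the question's series
trainRatio = 0.7;         % hypothetical split, chosen for illustration
nTrain = round(trainRatio * Q);

% Block division: the training samples are one contiguous run at the start.
blockTrain = 1:nTrain;

% Random division: the training samples are scattered over the whole series.
perm = randperm(Q);
randTrain = sort(perm(1:nTrain));

fprintf('block  training indices span %d..%d\n', blockTrain(1), blockTrain(end));
fprintf('random training indices span %d..%d\n', randTrain(1), randTrain(end));
```

With block division the network never sees the later part of the series during training, which matters here because the quadratic term keeps growing.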

Increase the input delays, feedback delays and hidden layer size as follows:
inputDelays = 1:30;
feedbackDelays = 1:3;
hiddenLayerSize = 30;
Also change the division function to
net.divideFcn = 'dividerand';
These changes work for me, even though the network takes longer to train.


EEG data classification with SWLDA using matlab

I want to ask your help in EEG data classification.
I am a graduate student trying to analyze EEG data.
Now I am struggling to classify ERP speller (P300) data with SWLDA using Matlab.
Maybe there is something wrong in my code.
I have read several articles, but they did not cover much detail.
My data size is described as below.
size(target) = [300 1856]
size(nontarget) = [998 1856]
Rows are trials and columns are the flattened features.
(I stretched each [64 29] trial into one row of 1856 features; for visual representation I did not select an ROI.)
I used the stepwisefit function in Matlab to classify target vs non-target.
Code is attached below.
ingredients = [targets; nontargets];
heat = [class_targets; class_nontargets]; % target: 1, non-target: -1
randomized_set = shuffle([ingredients heat]);
for k=1:10 % 10-fold cross validation
    parition_factor = ceil(size(randomized_set,1) / 10);
    cv_test_idx = (k-1)*parition_factor + 1:min(k * parition_factor, size(randomized_set,1));
    total_idx = 1:size(randomized_set,1);
    cv_train_idx = total_idx(~ismember(total_idx, cv_test_idx));
    ingredients = randomized_set(cv_train_idx, 1:end-1);
    heat = randomized_set(cv_train_idx, end);
    [W,SE,PVAL,INMODEL,STATS,NEXTSTEP,HISTORY]= stepwisefit(ingredients, heat, 'penter', .1);
    valid_id = find(INMODEL==1);
    v_weights = W(valid_id)';
    t_ingredients = randomized_set(cv_test_idx, 1:end-1);
    t_heat = randomized_set(cv_test_idx, end); % true labels for test set
    v_features = t_ingredients(:, valid_id);
    v_weights = repmat(v_weights, size(v_features, 1), 1);
    predictor = sum(v_weights .* v_features, 2);
    m_result = predictor > 0; % class A: +1, B: 0
    t_heat(t_heat==-1) = 0;
    acc(k) = sum(m_result==t_heat) / length(m_result);
end
P.S. My code is currently very inefficient and might be bad.
My assumption is that stepwisefit tests the coefficients for significance at every step and keeps only the significant columns.
It is not LDA, but for binary classification LDA and linear regression should not differ.
However, the results were close to random chance (for other binary data sets from the internet, it worked).
I think I did something wrong, and I hope you can correct me.
I would appreciate any suggestions and tips for implementing a classifier for the ERP speller.
Or any idea for implementing SWLDA in Matlab code?
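The claim above, that least-squares regression on the class labels and LDA agree for two classes, can be checked numerically. The sketch below uses made-up deterministic data and plain MATLAB only (no stepwise selection). It compares the regression weight vector with the Fisher/LDA direction computed from the pooled within-class scatter.

```matlab
% Deterministic toy data (hypothetical): 40 samples, 3 features, two classes.
n = 40;
t = (1:n)';
X = [sin(t), cos(2*t), sin(3*t + 1)];
y = [ones(15, 1); -ones(25, 1)];     % unbalanced classes, coded +1 / -1
X(y == 1, 1) = X(y == 1, 1) + 1;     % shift class +1 so the classes differ

% Least-squares regression of the labels on the features, with an intercept.
beta = [ones(n, 1), X] \ y;
wLs  = beta(2:end);

% Fisher/LDA direction: inverse pooled within-class scatter times mean difference.
X1 = X(y == 1, :);
X2 = X(y == -1, :);
m1 = mean(X1, 1);
m2 = mean(X2, 1);
D1 = X1 - repmat(m1, size(X1, 1), 1);
D2 = X2 - repmat(m2, size(X2, 1), 1);
Sw = D1' * D1 + D2' * D2;
wLda = Sw \ (m1 - m2)';

% Cosine of the angle between the two weight vectors (1 means same direction).
cosAngle = (wLs' * wLda) / (norm(wLs) * norm(wLda));
fprintf('cosine between LS and LDA directions: %.6f\n', cosAngle);
```

The agreement concerns the direction of the weight vector only. The decision threshold still depends on the intercept (`beta(1)` here), and with unbalanced classes a threshold at zero on the weighted features alone can be well off. As far as the documentation goes, stepwisefit reports its intercept separately in STATS.intercept, so that may be worth checking in the code above; this is a guess at one possible cause, not a confirmed diagnosis.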
The name SWLDA is only used in the context of Brain Computer Interfaces, but I bet it has another name in a more general context.
If you trace the recipe for SWLDA you will end up at the Krusienski 2006 papers ("A comparison..." and "Toward enhanced P300..") and from there at the book where stepwise logarithmic regression is explained: "Draper Smith, Applied Regression Analysis, 1981". However, as far as I am aware, no paper actually gives the complete recipe for implementing it (with its details and secrets).
My approach was using stepwiseglm:
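lbs = labels; % class labels coded as (1,2)
if (stepwiseflag)
    mdl = stepwiseglm(H', lbs'-1,'constant','upper','linear','distr','binomial');
    if (mdl.NumEstimatedCoefficients>1)
        inmodel = [];
        % Collect the indices of the selected predictors (names look like 'x12').
        for i=2:mdl.NumEstimatedCoefficients
            inmodel = [inmodel str2num(mdl.CoefficientNames{i}(2:end))];
        end
        % Keep only the selected features in the training and test sets.
        H = H(inmodel,:);
        TH = TH(inmodel,:);
    end
end
lbls = classify(TH',H',lbs','linear');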
You can also use a k-fold cross-validation approach using Matlab's cvpartition.
c = cvpartition(lbs,'k',10);
opts = statset('display','iter');
fun = @(XT,yT,Xt,yt)...

How can I save the parameters in neural networks at the end of every t-1 and use them to reinitialize train from that point?

I am using Matlab to train a neural network. Because of the large number of observations I need to reduce the computation time. Hence, I would like my network to save the parameters computed for time t-1 and use them as the starting point for time t (so that it needs, say, 6-10 iterations instead of 1000 to reach the solution). The mock code I prepared, without the for loop, is the following. It runs, but it does not do what I want:
x = randn(1,50);
y = x.^2;
HU = 2;
nets = trainnet(HU)
nets = train(nets,x,y)
net1 = configure(nets,x,y);
net1.IW = nets.IW;
net1.LW = nets.LW;
net1.b = nets.b;
net1 = trainnet(HU)
net1 = train(net1,x,y)
function net = trainnet(HU)
    % Build an untrained fitting network with HU hidden units.
    trainFcn = 'trainlm';
    hiddenLayerSize = HU;
    net = fitnet(hiddenLayerSize,trainFcn);
end
I would really appreciate any help. Thanks in advance.
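The idea of reusing the parameters from t-1 can be sketched independently of the toolbox. The example below is only an illustration under stated assumptions: a two-parameter linear model and plain gradient descent stand in for the network and for trainlm, and the data, step size and tolerance are made up. It counts how many iterations are needed at time t when starting from scratch versus starting from the parameters saved at t-1.

```matlab
% Toy problem (hypothetical): fit y = w*x + b by gradient descent on two
% consecutive "time steps" whose true parameters differ only slightly.
x     = linspace(-1, 1, 50);
yPrev = 3.00 * x + 1.00;          % data at time t-1
yNow  = 3.03 * x + 1.01;          % data at time t

eta = 0.5;  tol = 1e-6;  maxIter = 1000;
targets = {yPrev, yNow, yNow};    % run 1: t-1 cold, run 2: t cold, run 3: t warm
iters   = zeros(1, 3);
for r = 1:3
    if r == 3
        theta = thetaPrev;        % warm start: parameters saved at t-1
    else
        theta = [0; 0];           % cold start
    end
    for k = 1:maxIter
        res  = theta(1) * x + theta(2) - targets{r};
        grad = [mean(res .* x); mean(res)];
        if norm(grad) < tol
            break;
        end
        theta = theta - eta * grad;
    end
    iters(r) = k;
    if r == 1
        thetaPrev = theta;        % "save the parameters" for the next time step
    end
end
fprintf('iterations at t: cold start %d, warm start %d\n', iters(2), iters(3));
```

In the mock code above, the copied weights are discarded because `net1 = trainnet(HU)` creates a fresh network right after the copy. The toolbox analogue of the warm start would presumably be to keep the trained network object and call train on it again with the new data, but that is an assumption to verify against the documentation.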

Matlab Convolutional Neural network not learning

I'm running an example that I got from a webinar.
This is the code:
%% Fine Tuning A Deep Neural Network
clear; clc;close all;
imagenet_cnn = load('imagenet-cnn');
net = imagenet_cnn.convnet;
%% Perform net surgery
layers = net.Layers(1:end-3);
layers(end+1) = fullyConnectedLayer(12, 'Name', 'fc8_2')
layers(end+1) = softmaxLayer('Name','prob_2');
layers(end+1) = classificationLayer('Name','classificationLayer_2')
%% Setup learning rates for fine-tuning
% fc 8 - bump up learning rate for last layers
layers(end-2).WeightLearnRateFactor = 100;
layers(end-2).WeightL2Factor = 1;
layers(end-2).BiasLearnRateFactor = 20;
layers(end-2).BiasL2Factor = 0;
%% Load Image Data
rootFolder = fullfile('E:\Universidad\Tesis\Matlab', 'TesisDataBase');
categories = {'Avion','Banana','Carro','Gato', 'Mango','Perro','Sandia','Tijeras','Silla','Mouse','Calculadora','Arbol'};
imds = imageDatastore(fullfile(rootFolder, categories), 'LabelSource', 'foldernames');
tbl = countEachLabel(imds);
%% Equalize number of images of each class in training set
minSetCount = min(tbl{:,2}); % determine the smallest amount of images in a category
% Use splitEachLabel method to trim the set.
imds = splitEachLabel(imds, minSetCount);
% Notice that each set now has exactly the same number of images.
[trainingDS, testDS] = splitEachLabel(imds, 0.7,'randomize');
% Convert labels to categoricals
trainingDS.Labels = categorical(trainingDS.Labels);
trainingDS.ReadFcn = @readFunctionTrain;
%% Setup test data for validation
testDS.Labels = categorical(testDS.Labels);
testDS.ReadFcn = @readFunctionValidation;
%% Fine-tune the Network
miniBatchSize = 32; % lower this if your GPU runs out of memory.
numImages = numel(trainingDS.Files);
numIterationsPerEpoch = 250;
maxEpochs = 62;
lr = 0.01;
opts = trainingOptions('sgdm', ...
'InitialLearnRate', lr,...
'LearnRateSchedule', 'none',...
'L2Regularization', 0.0005, ...
'MaxEpochs', maxEpochs, ...
'MiniBatchSize', miniBatchSize);
net = trainNetwork(trainingDS, layers, opts);
As you can see, this code uses the well-known AlexNet as a starting point. The last 3 layers are then deleted and replaced with 3 new layers that have the number of neurons needed for the new task.
The read functions for test and training are the same; here is one of them:
function Iout = readFunctionTrain(filename)
    % Resize the flowers images to the size required by the network.
    I = imread(filename);
    % Some images may be grayscale. Replicate the image 3 times to
    % create an RGB image.
    if ismatrix(I)
        I = cat(3,I,I,I);
    end
    % Resize the image as required for the CNN.
    Iout = imresize(I, [227 227]);
end
This code runs well in the webinar, where they use it to classify cars and subs that pass through the MathWorks door.
The problem is that the new net is not learning when I try it with my own images. I have a data set of 12 categories with about 1000 images each, all downloaded from ImageNet.
The net does not increase its mini-batch accuracy; sometimes it does, but very slowly.
I also did the tutorial on this page (Matlab Deep Learning ToolBox) and it worked well with my images. So I don't understand what is wrong with my fine-tuning. Thanks.
If you have R2016a and a GeForce GTX1080 or other Pascal GPU, then see this tech support answer and this bug report.
Your learning rate for the pre-trained section of the network (0.01) looks very high for a fine-tuning workflow. Also, your effective LR of 1.0 (0.01 times the WeightLearnRateFactor of 100) is quite high for the randomly initialized classification head.
What happens if you set the learning rate of the pre-trained section to 0 and train only the randomly initialized head of the network? What happens if you just use a low learning rate and train end to end (say 1e-5)?
It would be useful to see the training-progress plot, but I think it's possible you're not converging due to your learning rate settings.
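For reference, the effective rates follow from multiplying the global rate by each layer's factor. A minimal sketch of that arithmetic with the values from the question's code (the factor of 1 for the pre-trained layers is an assumption, since their stored factors are not shown in the question):

```matlab
% Values taken from the question's code.
initialLearnRate      = 0.01;   % 'InitialLearnRate' in trainingOptions
weightLearnRateFactor = 100;    % layers(end-2).WeightLearnRateFactor
biasLearnRateFactor   = 20;     % layers(end-2).BiasLearnRateFactor

% Effective rate of a layer = global rate * that layer's factor.
headWeightLR = initialLearnRate * weightLearnRateFactor;
headBiasLR   = initialLearnRate * biasLearnRateFactor;

% Assumption: pre-trained layers keep a factor of 1.
pretrainedLR = initialLearnRate * 1;

fprintf('head weights: %g, head biases: %g, pre-trained layers: %g\n', ...
        headWeightLR, headBiasLR, pretrainedLR);
```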

Matlab fitcsvm gives me a zero training error and 40% in testing

I know it's over-fitting the training data set, yet I don't know how to change the parameters to avoid this.
I have tried BoxConstraint values of 1e0, 1e1 and 1e10 and got the same result each time.
tTargets = ones(size(trainTargets,1),1);
svmModel = fitcsvm(trainData, ...
[Group, score] = predict(svmModel, trainData);
tTargets = ones(size(trainTargets,1),1);
svmTrainError = sum(tTargets ~= Group)/size(trainTargets,1);
[Group, score] = predict(svmModel, testData);
tTargets = ones(size(testTargets,1),1);
svmTestError = sum(tTargets ~= Group)/size(testTargets,1);
I hope someone can help with this
I found out that I was using too large a C for training. It made the separation on the training data really good, but not on the test data.
Changing C to a smaller value (1e-2) made my code run faster, and now I have comparable overall accuracy in training and testing.
Thank you!!!!

Using PCA before classification

I am using PCA to reduce the number of features before training a Random Forest. I first used around 70 principal components out of 125, which accounted for around 99% of the energy (according to the eigenvalues). I got much worse results after training Random Forests on the new transformed features. After that I used all the principal components and I got the same results as when I used 70. This made no sense to me, since that is the same feature space only in a different basis (the space has only been rotated, so that should not affect the boundary).
Does anyone have an idea what the problem may be here?
Here is my code
clear all;
close all;
load patches_training_256.txt
load patches_testing_256.txt
Xtr = patches_training_256(:,2:end);
Xtr = Xtr';
Ytr = patches_training_256(:,1);
Ytr = Ytr';
Xtest = patches_testing_256(:,2:end);
Xtest = Xtest';
Ytest = patches_testing_256(:,1);
Ytest = Ytest';
data_size = size(Xtr, 2);
feature_size = size(Xtr, 1);
mu = mean(Xtr,2);
sigma = std(Xtr,0,2);
mu_mat = repmat(mu,1,data_size);
sigma_mat = repmat(sigma,1,data_size);
cov = ((Xtr - mu_mat)./sigma_mat) * ((Xtr - mu_mat)./sigma_mat)' / data_size;
[v d] = eig(cov);
%[U S V] = svd(((Xtr - mu_mat)./sigma_mat)');
k = 124;
%Ureduce = U(:,1:k);
%XtrReduce = ((Xtr - mu_mat)./sigma_mat) * Ureduce;
XtrReduce = v'*((Xtr - mu_mat)./sigma_mat);
B = TreeBagger(300, XtrReduce', Ytr', 'Prior', 'Empirical', 'NPrint', 1);
data_size_test = size(Xtest, 2);
mu_test = repmat(mu,1,data_size_test);
sigma_test = repmat(sigma,1,data_size_test);
XtestReduce = v' * ((Xtest - mu_test) ./ sigma_test);
Ypredict = predict(B,XtestReduce');
error = sum(Ytest' ~= (double(cell2mat(Ypredict)) - 48))
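For reference, the component count described in the question (enough components to cover about 99% of the energy) can be derived from the eigenvalues as sketched below. The data are a made-up toy set with rows as samples, which differs from the layout used in the code above, and the 0.99 threshold is simply the figure quoted in the question.

```matlab
% Toy data (hypothetical): rows are samples, columns are features.
X  = [1 0; -1 0; 0 0.1; 0 -0.1];
Xc = X - repmat(mean(X, 1), size(X, 1), 1);   % center each feature
C  = (Xc' * Xc) / size(X, 1);                 % covariance, normalized by N

[V, D] = eig(C);
[lambda, order] = sort(diag(D), 'descend');   % do not rely on eig's ordering
V = V(:, order);

energy = cumsum(lambda) / sum(lambda);        % cumulative fraction of variance
k = find(energy >= 0.99, 1, 'first');         % smallest k reaching 99%
Xreduced = Xc * V(:, 1:k);                    % project onto the first k components

fprintf('components kept: %d of %d\n', k, numel(lambda));
```

Sorting explicitly matters when only the leading columns of the eigenvector matrix are kept, because the largest eigenvalues are not necessarily returned first.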
Random forest depends heavily on the choice of basis. Unlike a linear model, which is (up to normalization) rotation invariant, RF completely changes behaviour once you "rotate the space". The reason is that it uses decision trees as base classifiers, and these analyze each feature completely independently, so it fails to find any linear combination of features. Once you rotate your space you change the "meaning" of the features. There is nothing wrong with that; tree-based classifiers are simply a rather bad choice to apply after such transformations. Use feature selection methods instead (methods which select which features are valuable without creating any linear combinations). In fact, RFs themselves can be used for this task thanks to their internal "feature importance" computation.
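The effect of a rotation on axis-aligned splits can be seen in a small sketch. It uses plain MATLAB, made-up grid data, and a single one-split "stump" as a stand-in for a tree (an assumption made for brevity, not a model of TreeBagger). The class depends on the sign of one feature, and the best single split is searched before and after a 45 degree rotation.

```matlab
% Toy data (hypothetical): a 20-by-20 grid, class decided by the sign of feature 1.
g = linspace(-1, 1, 20);
[x1, x2] = meshgrid(g, g);
X = [x1(:), x2(:)];
y = X(:, 1) > 0;

% Best accuracy of a one-split stump on a single feature f:
% try every observed value as threshold, in both polarities.
stumpAcc = @(f) max(arrayfun(@(t) max(mean((f > t) == y), mean((f <= t) == y)), f));
% Best stump over all features (columns) of a data matrix Z.
bestAcc  = @(Z) max(arrayfun(@(j) stumpAcc(Z(:, j)), 1:size(Z, 2)));

% Rotate the feature space by 45 degrees (an orthogonal change of basis).
a = pi / 4;
R = [cos(a) -sin(a); sin(a) cos(a)];
Xrot = X * R;

accOriginal = bestAcc(X);
accRotated  = bestAcc(Xrot);
fprintf('best single-split accuracy: original %.3f, rotated %.3f\n', ...
        accOriginal, accRotated);
```

In the original basis one split separates the classes perfectly; after the rotation no single split does. Deeper trees can still approximate the rotated boundary with a staircase of splits, but they need more splits and more data to do so.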
There is already a Matlab function, princomp (superseded by pca in newer releases), which does PCA for you. I would suggest not reimplementing it and risking numerical errors. They have done it for us. :)