Matlab - Neural network training

Matlab - Neural network training - matlab

I'm working on creating a 2 layer neural network with back-propagation. The NN is supposed to get its data from a 20001x17 vector that holds following information in each row:
-The first 16 cells hold integers ranging from 0 to 15 which act as variables to help us determine which one of the 26 letters of the alphabet we mean to express when seeing those variables. For example a series of 16 values as follows are meant to represent the letter A: [2 8 4 5 2 7 5 3 1 6 0 8 2 7 2 7].
-The 17th cell holds a number ranging from 1 to 26 representing the letter of the alphabet we want. 1 stands for A, 2 stands for B etc.
The output layer of the NN consists of 26 outputs. Every time the NN is fed an input like the one described above it's supposed to output a 1x26 vector containing zeros in all but the one cell that corresponds to the letter that the input values were meant to represent. for example the output [1 0 0 ... 0] would be letter A, whereas [0 0 0 ... 1] would be the letter Z.
Some things that are important before i present the code: I need to use the traingdm function and the hidden layer number is fixed (for now) at 21.
Trying to create the above concept i wrote the following matlab code:
%%%%%%%%
%Start of code%
%%%%%%%%
%
%Initialize the input and target vectors
%
p = zeros(16,20001);
t = zeros(26,20001);
%
%Fill the input and training vectors from the dataset provided
%
for i=2:20001
for k=1:16
p(k,i-1) = data(i,k);
end
t(data(i,17),i-1) = 1;
end
net = newff(minmax(p),[21 26],{'logsig' 'logsig'},'traingdm');
y1 = sim(net,p);
net.trainParam.epochs = 200;
net.trainParam.show = 1;
net.trainParam.goal = 0.1;
net.trainParam.lr = 0.8;
net.trainParam.mc = 0.2;
net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 0.7;
net.divideParam.testRatio = 0.2;
net.divideParam.valRatio = 0.1;
%[pn,ps] = mapminmax(p);
%[tn,ts] = mapminmax(t);
net = init(net);
[net,tr] = train(net,p,t);
y2 = sim(net,pn);
%%%%%%%%
%End of code%
%%%%%%%%
Now to my problem: I want my outputs to be as described, namely each column of the y2 vector for example should be a representation of a letter. My code doesn't do that though. Instead it produced results that vary greatly between 0 and 1, values from 0.1 to 0.9.
My question is: is there some conversion i need to be doing that i am not? Meaning, do i have to convert my input and/or output data to a form by which i can actually see if my NN is learning correctly?
Any input would be appreciated.

This is normal. Your output layer is using a log-sigmoid transfer function, and that will always give you some intermediate output between 0 and 1.
What you would usually do would be to look for the output with the largest value -- in other words, the most likely character.
This would mean that, for every column in y2, you're looking for the index of the row that contains the largest value in that row. You can compute this as follows:
[dummy, I]=max(y2);
I is then a vector containing the indexes of the largest value in each row.

You can think of y2 as an output probability distribution for each input being one of the 26 alphabet characters, for example if one column of y2 says:
.2
.5
.15
.15
then its 50% probability that this character is B (if we assume only 4 possible outputs).
==REMARK==
The output layer of the NN consists of
26 outputs. Every time the NN is fed
an input like the one described above
it's supposed to output a 1x26 vector
containing zeros in all but the one
cell that corresponds to the letter
that the input values were meant to
represent. for example the output [1 0
0 ... 0] would be letter A, whereas [0
0 0 ... 1] would be the letter Z.
It is preferable to avoid using target values of 0,1 to encode the output of the network.
The reason for avoiding target values of 0 and 1 is that 'logsig' sigmoid transfer function cannot produce these output values given finite weights. If you attempt to train the network to fit target values of exactly 0 and 1, gradient descent will force the weights to grow without bound.
So instead of 0 and 1 values, try using values of 0.04 and 0.9 for example, so that [0.9,0.04,...,0.04] is the target output vector for the letter A.
Reference:
Thomas M. Mitchell, Machine Learning, McGraw-Hill Higher Education, 1997, p114-115

Use hardlin fcn in output layer.
Use trainlm or trainrp for training the network.
To learn your network, use a for loop and a condition that compare the output and target. When it is the best use, break to exit from the learning loop.
Use another way instead of mapminmax for pre-processing data set.

I don't know if this constitutes an actual answer or not: but here are some remarks.
I don't understand your coding scheme. How is an 'A' represented as that set of numbers? It looks like you're falling into a fairly common trap of using arbitrary numbers to code categorical values. Don't do this: for example if 'a' is 1, 'b' is 2 and 'c' is 3, then your coding has implicitly stated that 'a' is more like 'b' than 'c' (because the network has real-value inputs the ordinal properties matter). The way to do this properly is to have each letter represented as 26 binary valued inputs, where only one is ever active, representing the letter.
Your outputs are correct, the activation at the output layer will not
ever be either 0 or 1, but real numbers. You could take the max as
your activity function, but this is problematic because it's not
differentiable, so you can't use back-prop. What you should do is
couple the outputs with the softmax function, so that their sum
is one. You can then treat the outputs as conditional probabilities
given the inputs, if you so desire. While the network is not
explicitly probabilistic, with the correct activity and activation
functions is will be identical in structure to a log-linear model
(possibly with latent variables corresponding to the hidden layer),
and people do this all the time.
See David Mackay's textbook for a nice intro to neural nets which will make clear the probabilistic connection. Take a look at this paper from Geoff Hinton's group which describes the task of predicting the next character given the context for details on the correct representation and activation/activity functions (although beware their method is non-trivial and uses a recurrent net with a different training method).

Related

How to decide the range for the hyperparameter space in SVM tuning? (MATLAB)

I am tuning an SVM using a for loop to search in the range of hyperparameter's space. The svm model learned contains the following fields
SVMModel: [1×1 ClassificationSVM]
C: 2
FeaturesIdx: [4 6 8]
Score: 0.0142
Question1) What is the meaning of the field 'score' and its utility?
Question2) I am tuning the BoxConstraint, C value. Let, the number of features be denoted by the variable featsize. The variable gridC will contain the search space which can start from any value say 2^-5, 2^-3, to 2^15 etc. So, gridC = 2.^(-5:2:15). I cannot understand if there is a way to select the range?

1. score had been documented in here, which says:
Classification Score
The SVM classification score for classifying observation x is the signed distance from x to the decision boundary ranging from -∞ to +∞.
A positive score for a class indicates that x is predicted to be in
that class. A negative score indicates otherwise.
In two class cases, if there are six observations, and the predict function gave us some score value called TestScore, then we could determine which class does the specific observation ascribed by:
TestScore=[-0.4497 0.4497
-0.2602 0.2602;
-0.0746 0.0746;
0.1070 -0.1070;
0.2841 -0.2841;
0.4566 -0.4566;];
[~,Classes] = max(TestScore,[],2);
In the two-class classification, we can also use find(TestScore > 0) instead, and it is clear that the first three observations are belonging to the second class, and the 4th to 6th observations are belonging to the first class.
In multiclass cases, there could be several scores > 0, but the code max(scores,[],2) is still validate. For example, we could use the code (from here, an example called Find Multiple Class Boundaries Using Binary SVM) following to determine the classes of the predict Samples.
for j = 1:numel(classes);
[~,score] = predict(SVMModels{j},Samples);
Scores(:,j) = score(:,2); % Second column contains positive-class scores
end
[~,maxScore] = max(Scores,[],2);
Then the maxScore will denote the predicted classes of each sample.
2. The BoxConstraint denotes C in the SVM model, so we can train SVMs in different hyperparameters and select the best one by something like:
gridC = 2.^(-5:2:15);
for ii=1:length(gridC)
SVModel = fitcsvm(data3,theclass,'KernelFunction','rbf',...
'BoxConstraint',gridC(ii),'ClassNames',[-1,1]);
%if (%some constraints were meet)
% %save the current SVModel
%end
end
Note: Another way to implement this is using libsvm, a fast and easy-to-use SVM toolbox, which has the interface of MATLAB.

Modeling an hrf time series in MATLAB

I'm attempting to model fMRI data so I can check the efficacy of an experimental design. I have been following a couple of tutorials and have a question.
I first need to model the BOLD response by convolving a stimulus input time series with a canonical haemodynamic response function (HRF). The first tutorial I checked said that one can make an HRF that is of any amplitude as long as the 'shape' of the HRF is correct so they created the following HRF in matlab:
hrf = [ 0 0 1 5 8 9.2 9 7 4 2 0 -1 -1 -0.8 -0.7 -0.5 -0.3 -0.1 0 ]
And then convolved the HRF with the stimulus by just using 'conv' so:
hrf_convolved_with_stim_time_series = conv(input,hrf);
This is very straight forward but I want my model to eventually be as accurate as possible so I checked a more advanced tutorial and they did the following. First they created a vector of 20 timepoints then used the 'gampdf' function to create the HRF.
t = 1:1:20; % MEASUREMENTS
h = gampdf(t,6) + -.5*gampdf(t,10); % HRF MODEL
h = h/max(h); % SCALE HRF TO HAVE MAX AMPLITUDE OF 1
Is there a benefit to doing it this way over the simpler one? I suppose I have 3 specific questions.
The 'gampdf' help page is super short and only says the '6' and '10' in each function call represents 'A' which is a 'shape' parameter. What does this mean? It gives no other information. Why is it 6 in the first call and 10 in the second?
This question is directly related to the above one. This code is written for a situation where there is a TR = 1 and the stimulus is very short (like 1s). In my situation my TR = 2 and my stimulus is quite long (12s). I tried to adapt the above code to make a working HRF for my situation by doing the following:
t = 1:2:40; % 2s timestep with the 40 to try to equate total time to above
h = gampdf(t,6) + -.5*gampdf(t,10); % HRF MODEL
h = h/max(h); % SCALE HRF TO HAVE MAX AMPLITUDE OF 1
Because I have no idea what the 'gampdf' parameters mean (or what that line does, in all actuality) I'm not sure this gives me what I'm looking for. I essentially get out 20 values where 1-14 have SOME numeric value in them but 15-20 are all 0. I'm assuming there will be a response during the entire 12s stimulus period (first 6 TRs so values 1-6) with the appropriate rectification which could be the rest of the values but I'm not sure.
Final question. The other code does not 'scale' the HRF to have an amplitude of 1. Will that matter, ultimately?

The canonical HRF you choose is dependent upon where in the brain the BOLD signal is coming from. It would be inappropriate to choose just any HRF. Your best source of a model is going to come from a lit review. I've linked a paper discussing the merits of multiple HRF models. The methods section brings up some salient points.

Unreasonable [positive] log-likelihood values from matlab "fitgmdist" function

I want to fit a data sets with Gaussian mixture model, the data sets contains about 120k samples and each sample has about 130 dimensions. When I use matlab to do it, so I run scripts (with cluster number 1000):
gm = fitgmdist(data, 1000, 'Options', statset('Display', 'iter'), 'RegularizationValue', 0.01);
I get the following outputs:
iter log-likelihood
1 -6.66298e+07
2 -1.87763e+07
3 -5.00384e+06
4 -1.11863e+06
5 299767
6 985834
7 1.39525e+06
8 1.70956e+06
9 1.94637e+06
The log likelihood is bigger than 0! I think it's unreasonable, and don't know why.
Could somebody help me?

First of all, it is not a problem of how large your dataset is.
Here is some code that produces similar results with a quite small dataset:
options = statset('Display', 'iter');
x = ones(5,2) + (rand(5,2)-0.5)/1000;
fitgmdist(x,1,'Options',options);
this produces
iter log-likelihood
1 64.4731
2 73.4987
3 73.4987
Of course you know that the log function (the natural logarithm) has a range from -inf to +inf. I guess your problem is that you think the input to the log (i.e. the aposteriori function) should be bounded by [0,1]. Well, the aposteriori function is a pdf function, which means that its value can be very large for very dense dataset.
PDFs must be positive (which is why we can use the log on them) and must integrate to 1. But they are not bounded by [0,1].
You can verify this by reducing the density in the above code
x = ones(5,2) + (rand(5,2)-0.5)/1;
fitgmdist(x,1,'Options',options);
this produces
iter log-likelihood
1 -8.99083
2 -3.06465
3 -3.06465
So, I would rather assume that your dataset contains several duplicate (or very close) values.

Creating a MIMO (multiple input multiple output) transfer function system without hardcoding the number of inputs and outputs

Introduction
As part of a larger system I'm trying to create a multiple input multiple output transfer function that only links inputs to outputs on the lead diagonal*. I.e. it has non zero transfer functions between input 1 and output 1, input 2 and output 2 etc etc.
*whether you really count that as a MIMO system is a fair comment, I want it in this format because it links to a larger system that really is MIMO.
Hard Coding
I can achieve this by concatenating transfer functions as so
tf1=tf([1 -1],[1 1]);
tf2=tf([1 2],[1 4 5]);
tf3=tf([1 2],[5 4 1]);
G=[tf1 0 0; 0 tf2 0; 0 0 tf3];
Which works fine, but (a) hard codes the number of inputs/outputs and (b) becomes increasingly horrible the more inputs and outputs you have.
Diag function
This problem seemed perfect for the diag function however diag does not seem to be defined for type 'tf'
G=diag([tf1, tf2, tf3])
??? Undefined function or method 'diag' for input arguments of type 'tf'.
Manual Matrix manipulation
I also tried manually manipulating a matrix (not that I was really expecting it to work)
G=zeros(3);
G(1,1)=tf1;
G(2,2)=tf2;
G(3,3)=tf3;
??? The following error occurred converting from tf to double:
Error using ==> double
Conversion to double from tf is not possible.
tf's direct to MIMO format
tf also has a format in which all the numerators and denominators are represented seperately and a MIMO system is directly created. I attempted to use this in a non hard coded format
numerators=diag({[1 -1], [1 2],[1 2]})
denominators=diag({[1 1], [1 4 5],[5 4 1]})
G=tf( numerators , denominators )
??? Error using ==> checkNumDenData at 19
Numerators and denominators must be specified as non empty row vectors.
This one almost worked, unfortunately numerators and denominators are empty on the off diagonal rather than being 0; leading to the error
Question
Is it possible to create a MIMO system from transfer functions without "hard coding" the number of inputs and outputs

I suggest you try realizing each SISO as a state space system, say (Ak, Bk, Ck, Dk), assembling a large diagonal system like
A = blkdiag(A1,....)
B = blkdiag(B1,...)
C = blkdiag(C1,...)
D = diag([D1, ....])
and then use ss2tf to compute the transfer function of the augmented system.

diag in matlab is not the same as blkdiag. The overloaded LTI operator is the blkdiag to put things on a diagonal of a matrix structure.
In your case, it is done simply by
tf1=tf([1 -1],[1 1]);
tf2=tf([1 2],[1 4 5]);
tf3=tf([1 2],[5 4 1]);
G = blkdiag(tf1,tf2,tf3)
The MIMO syntax requires cells to distinguish the polynomial entries from the MIMO structure. Moreover, it does not like identically zero denominator entries (which is understandable) hence if you wish to enter in the mimo context you need to use
G = tf({[1 -1],0,0;0,[1 2],0;0,0,[1 2]},{[1 1],1,1;1,[1 4 5],1;1,1,[5 4 1]})
or in your syntax
Num = {[1 -1],0,0;0,[1 2],0;0,0,[1 2]};
Den = {[1 1],1,1;1,[1 4 5],1;1,1,[5 4 1]};
tf(Num,Den)
Instead of ones you can basically use anything valid nonzero entries.

Matlab fast neighborhood operation

I have a Problem. I have a Matrix A with integer values between 0 and 5.
for example like:
x=randi(5,10,10)
Now I want to call a filter, size 3x3, which gives me the the most common value
I have tried 2 solutions:
fun = #(z) mode(z(:));
y1 = nlfilter(x,[3 3],fun);
which takes very long...
and
y2 = colfilt(x,[3 3],'sliding',#mode);
which also takes long.
I have some really big matrices and both solutions take a long time.
Is there any faster way?

+1 to #Floris for the excellent suggestion to use hist. It's very fast. You can do a bit better though. hist is based on histc, which can be used instead. histc is a compiled function, i.e., not written in Matlab, which is why the solution is much faster.
Here's a small function that attempts to generalize what #Floris did (also that solution returns a vector rather than the desired matrix) and achieve what you're doing with nlfilter and colfilt. It doesn't require that the input have particular dimensions and uses im2col to efficiently rearrange the data. In fact, the the first three lines and the call to im2col are virtually identical to what colfit does in your case.
function a=intmodefilt(a,nhood)
[ma,na] = size(a);
aa(ma+nhood(1)-1,na+nhood(2)-1) = 0;
aa(floor((nhood(1)-1)/2)+(1:ma),floor((nhood(2)-1)/2)+(1:na)) = a;
[~,a(:)] = max(histc(im2col(aa,nhood,'sliding'),min(a(:))-1:max(a(:))));
a = a-1;
Usage:
x = randi(5,10,10);
y3 = intmodefilt(x,[3 3]);
For large arrays, this is over 75 times faster than colfilt on my machine. Replacing hist with histc is responsible for a factor of two speedup. There is of course no input checking so the function assumes that a is all integers, etc.
Lastly, note that randi(IMAX,N,N) returns values in the range 1:IMAX, not 0:IMAX as you seem to state.

One suggestion would be to reshape your array so each 3x3 block becomes a column vector. If your initial array dimensions are divisible by 3, this is simple. If they don't, you need to work a little bit harder. And you need to repeat this nine times, starting at different offsets into the matrix - I will leave that as an exercise.
Here is some code that shows the basic idea (using only functions available in FreeMat - I don't have Matlab on my machine at home...):
N = 100;
A = randi(0,5*ones(3*N,3*N));
B = reshape(permute(reshape(A,[3 N 3 N]),[1 3 2 4]), [ 9 N*N]);
hh = hist(B, 0:5); % histogram of each 3x3 block: bin with largest value is the mode
[mm mi] = max(hh); % mi will contain bin with largest value
figure; hist(B(:),0:5); title 'histogram of B'; % flat, as expected
figure; hist(mi-1, 0:5); title 'histogram of mi' % not flat?...
Here are the plots:
The strange thing, when you run this code, is that the distribution of mi is not flat, but skewed towards smaller values. When you inspect the histograms, you will see that is because you will frequently have more than one bin with the "max" value in it. In that case, you get the first bin with the max number. This is obviously going to skew your results badly; something to think about. A much better filter might be a median filter - the one that has equal numbers of neighboring pixels above and below. That has a unique solution (while mode can have up to four values, for nine pixels - namely, four bins with two values each).
Something to think about.
Can't show you a mex example today (wrong computer); but there are ample good examples on the Mathworks website (and all over the web) that are quite easy to follow. See for example http://www.shawnlankton.com/2008/03/getting-started-with-mex-a-short-tutorial/

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse