Confusion with inputs and targets for a neural network - matlab

Recently I've posted many questions regarding a character recognition program that I am making. I thought I had it working fully until today. I think the problem lies in how I train the network. What follows is an explanation of how I think the training and simulation procedure goes.
Given these two images:
targets
inputs
I want to train the network to recognize the letter D. Note that before this is done, I've processed the images into a binary matrix. For training I use
[net,tr] = train(net,inputs,targets);
where, in place of inputs, I pass targets, because I want to train the network to recognize all the letters in the target image.
I then run
outputs = sim(net,inputs);
where inputs is the image with the letter "D", or an image with any other letter in ABCD. The basic premise is that I want to train the network to recognize all the letters in ABCD, then choose any letter A, B, C, or D and see if the network recognizes the chosen letter.
Question:
Am I correct with the training procedure?

Well, it greatly depends on how you implemented your neural network. Judging from the question you're asking, though, I guess you didn't implement it yourself but used some ready-made API.
Anyway, you should first understand the tools you use before using them (here, neural networks).
A neural network takes an input, performs linear or non-linear transformations of it, and returns an output.
Inputs and outputs are always numeric values. However, they may represent any kind of data.
Inputs can be:
Pixels of an image
Real valued or integer attributes
Categories
etc.
In your case the inputs are the pixels of your character images (your binary matrices).
Outputs can be:
Classes (if you're doing classification)
Values (if you're doing regression)
Next value in a time series (if you're doing time series prediction)
In your case, you're doing classification (predicting which character the inputs represent) so your output is a class.
For you to understand how the network is trained, I'll first explain how to use it once it's trained and then what it implies for the training phase.
So once you've trained you network, you will give it the binary matrix representing your image and it will output the class (the character) which will be (for example): 0 for A, 1 for B, 2 for C and 3 for D. In other words, you have:
Input: binary matrix (image)
Output: 0,1,2 or 3 (depending on which character the network recognizes in the image)
The training phase consists of telling the network which output you would like for each input.
The type of data used during the training phase is the same as the one being used in the "prediction phase". Hence, for the training phase:
Inputs: binary matrices [A,B,C,D] (one for each letter, very important!)
Targets: corresponding classes [0,1,2,3]
This way, you're telling the network to learn that if you give it the image of A it should output 0, if you give it the image of B it should output 1, and so on.
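In MATLAB's Neural Network Toolbox, that setup could be sketched roughly as follows (the variable names imgA..imgD, the hidden-layer size, and the use of patternnet are illustrative assumptions, not the asker's actual code):

```matlab
% imgA..imgD: hypothetical preprocessed binary matrices, one per letter.
% Each column of `inputs` is one flattened image.
inputs  = [imgA(:), imgB(:), imgC(:), imgD(:)];

% One-hot targets: column k has a 1 in row k (row 1 = A, ..., row 4 = D).
targets = eye(4);

net = patternnet(10);               % 10 hidden neurons, chosen arbitrarily
net.divideFcn = 'dividetrain';      % too few samples to split off val/test sets
[net, tr] = train(net, inputs, targets);

% After training, classify a new image:
scores = net(imgD(:));              % one score per class
[~, class] = max(scores);           % class 4 corresponds to "D"
```

Note that the toolbox expects one-hot target columns rather than the raw class numbers 0-3 used in the explanation above.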
Note: You were mistaken because you thought of the "inputs" as the inputs you wanted to give the network after the training phase, when they are actually the inputs given to the network during the training phase.

Related

Backpropagation and training set for dummies

I'm at the very beginning of studying neural networks, but my scarce skills or lack of intelligence do not allow me to understand, from popular articles, how to correctly prepare a training set for the backpropagation training method (or its limitations). For example, I want to train the simplest two-layer perceptron to solve XOR with backpropagation (e.g. modify random initial weights for 4 synapses from the first layer and 4 from the second). The simple XOR function has two inputs and one output: {0,0}=>0, {0,1}=>1, {1,0}=>1, {1,1}=>0. But neural network theory says that "backpropagation requires a known, desired output for each input value in order to calculate the loss function gradient". Does this mean that backpropagation can't be applied if, in the training set, the number of inputs is not strictly equal to the number of outputs, and that this restriction cannot be avoided? Or does it mean that, if I want to use backpropagation for solving classification tasks such as XOR (i.e. where the number of inputs is bigger than the number of outputs), the theory says it's always necessary to remake the training set in a similar way (input => desired output): {0,0}=>{0,0}, {0,1}=>{1,1}, {1,0}=>{1,1}, {1,1}=>{0,0}?
Thanks for any help in advance!
Does this mean that backpropagation can't be applied if, in the training set, the number of inputs is not strictly equal to the number of outputs
If you mean that the output is "the class" in a classification task, I don't think so.
backpropagation requires a known, desired output for each input value in order to calculate the loss function gradient
I think this means every input should have a corresponding output, not that each input needs a distinct output.
In a real-life problem like handwritten digit classification (MNIST), there are around 50,000 training samples (inputs), but they are classed into only 10 digits.
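For XOR specifically, a valid backpropagation training set keeps exactly one target per input pattern; in MATLAB it could be sketched like this (the hidden-layer size and the dividetrain setting are illustrative, and training may need several restarts depending on the random initialization):

```matlab
% Four 2-dimensional input patterns, one per column...
inputs  = [0 0 1 1;
           0 1 0 1];
% ...and one desired output per pattern: four inputs, four targets.
targets = [0 1 1 0];

net = feedforwardnet(2);            % one hidden layer with 2 neurons
net.divideFcn = 'dividetrain';      % use all 4 patterns for training
net = train(net, inputs, targets);

round(net(inputs))                  % ideally approximates [0 1 1 0]
```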

Using neural network for classification in matlab

I'm working on an optical character recognition problem. I've successfully extracted features: a [1x32] vector for each segmented character (32 features per character). I have the complete training data set (images of every individual character), but I'm breaking my head over creating the Input & Target data set matrices. So please tell me about those matrices, the testing data, and in what format I will get output from the neural network.
1) There are 258 different patterns (characters), so should there be 258 class labels?
My input matrix size is: number of rows = 32 (features); number of columns = 258*4 = 1032 (number of characters * number of instances per character).
2) What should be the size of my target matrix? Just draw a dummy target matrix for my case.
Have you already checked the Neural Network Toolbox of MATLAB (http://www.mathworks.co.uk/help/nnet/examples/crab-classification.html?prodcode=NN&language=en)? There you can find examples of how to work with neural networks.
Regarding your two specific questions:
1) Typically, if you want to differentiate between N different characters, you will need that many class labels. So in your case, yes, you should have 258 class labels. The output of a classification problem using neural networks is typically a binary output, with a 1 for the identified class and a 0 for the remaining classes. However, if you use a sigmoid function as the last activation function, it can happen that no output node is exactly 0 or 1; in that case you can, for example, take the maximum over all output nodes to get the most probable class for a given input.
2) The target matrix should be a binary matrix with a 1 for the correct class and 0 for all other classes for each input. So in your case it should be a 258x1032 matrix. Again, I recommend you check the link given above.
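Assuming the 1032 columns are ordered character by character (the 4 instances of each of the 258 characters grouped together, which is an assumption about your data layout), the one-hot target matrix could be built like this:

```matlab
numClasses    = 258;
instancesEach = 4;

% Class label of each column: 1 1 1 1 2 2 2 2 ... 258 258 258 258
labels  = repelem(1:numClasses, instancesEach);   % 1 x 1032 vector

% One-hot targets: row k is 1 wherever the column belongs to class k.
targets = full(ind2vec(labels));                  % 258 x 1032 matrix
```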
Good luck.

ANN multiple vs single outputs

I recently started studying ANN, and there is something that I've been trying to figure out that I can't seem to find an answer to (probably because it's too trivial or because I'm searching for the wrong keywords..).
When do you use multiple outputs instead of a single output? I guess in the simplest case of 1/0 classification it's easiest to use the sign as the output activation function. But in which case do you use several outputs? Is it when you have, for instance, a multi-class classification problem, so you want to classify something as, say, A, B, or C, and you choose one output neuron for each class? How do you determine which class it belongs to?
In a classification context, there are a couple of situations where using multiple output units can be helpful: multiclass classification, and explicit confidence estimation.
Multiclass
For the multiclass case, as you wrote in your question, you typically have one output unit in your network for each class of data you're interested in. So if you're trying to classify data as one of A, B, or C, you can train your network on labeled data, but convert all of your "A" labels to [1 0 0], all your "B" labels to [0 1 0], and your "C" labels to [0 0 1]. (This is called a "one-hot" encoding.) You also probably want to use a logistic activation on your output units to restrict their activation values to the interval (0, 1).
Then, when you're training your network, it's often useful to optimize a "cross-entropy" loss (as opposed to a somewhat more intuitive Euclidean distance loss), since you're basically trying to teach your network to output the probability of each class for a given input. Often one uses a "softmax" (also sometimes called a Boltzmann) distribution to define this probability.
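As a concrete illustration of the one-hot conversion described above (the label values here are made up):

```matlab
labels = [1 2 3 2 1];            % hypothetical labels: 1 = "A", 2 = "B", 3 = "C"
onehot = full(ind2vec(labels));  % 3 x 5 matrix, one one-hot column per label
% onehot(:,1) is [1;0;0] for "A", onehot(:,2) is [0;1;0] for "B", and so on.
```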
For more info, please check out http://www.willamette.edu/~gorr/classes/cs449/classify.html (slightly more theoretical) and http://deeplearning.net/tutorial/logreg.html (more aimed at the code side of things).
Confidence estimation
Another cool use of multiple outputs is to use one output as a standard classifier (e.g., just one output unit that generates a 0 or 1), and a second output to indicate the confidence that this network has in its classification of the input signal (e.g., another output unit that generates a value in the interval (0, 1)).
This could be useful if you trained up a separate network on each of your A, B, and C classes of data, but then also presented data to the system later that came from class D (or whatever) -- in this case, you'd want each of the networks to indicate that they were uncertain of the output because they've never seen something from class D before.
Have a look at the softmax layer, for instance. The maximum output of this layer is your class, and it has a nice theoretical justification.
To be concise: you take the previous layer's output and interpret it as a vector in an m-dimensional space. Then you fit K Gaussians to it, which share covariance matrices. If you model this and write out the equations, it amounts to a softmax layer. For more details, see "Machine Learning: A Probabilistic Perspective" by Kevin Murphy.
That is just one example of using the last layer for multiclass classification. You can also use multiple outputs for something else. For instance, you can train an ANN to "compress" your data, i.e. compute a function from an N-dimensional to an M-dimensional space that minimizes the loss of information (this model is called an autoencoder).

How to use created "net" neural network object for prediction?

I used ntstool to create a NAR (nonlinear autoregressive) net object, trained on a 1x1247 input vector (daily stock prices for 6 years).
I have finished all the steps and saved the resulting net object to workspace.
Now I am clueless about how to use this object to predict y(t) for, say, t = 2000 (I trained the model on t = 1:1247).
In some other threads, people recommended using the sim(net, t) function; however, this gives me the same result for any value of t (the same goes for net(t)).
I am not familiar with the specific neural net commands, but I think you are approaching this problem in the wrong way. Typically you want to model the evolution in time, which you do by specifying a certain window, say 3 months.
What you are training on now is a single input vector, which contains no information about evolution in time. The reason you always get the same prediction is that you only used a single point for training (even though it is 1247-dimensional, it is still one point).
You probably want to make input vectors of this nature (for simplicity, assume you are working with months):
[month1 month2; month2 month3; month3 month4]
This example contains 2 training points with the evolution of 3 months. Note that they overlap.
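Building such overlapping windows from a daily price series could be sketched like this (the window length and variable names are illustrative):

```matlab
prices = rand(1, 1247);             % placeholder for the 1x1247 price series
win    = 90;                        % window length, e.g. roughly 3 months

% Each column of `inputs` holds `win` consecutive prices; the target is
% the price on the day right after the window, so windows overlap.
nSamples = numel(prices) - win;
inputs   = zeros(win, nSamples);
targets  = zeros(1,  nSamples);
for t = 1:nSamples
    inputs(:, t) = prices(t : t+win-1)';
    targets(t)   = prices(t + win);
end
```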
Use the Network
After the network is trained and validated, the network object can be used to calculate the network response to any input. For example, if you want to find the network response to the fifth input vector in the building data set, you can use the following:
a = net(houseInputs(:,5))
a =
34.3922
If you try this command, your output might be different, depending on the state of your random number generator when the network was initialized. Below, the network object is called to calculate the outputs for a concurrent set of all the input vectors in the housing data set. This is the batch mode form of simulation, in which all the input vectors are placed in one matrix. This is much more efficient than presenting the vectors one at a time.
a = net(houseInputs);
Each time a neural network is trained it can produce a different solution, due to different initial weight and bias values and different divisions of the data into training, validation, and test sets. As a result, different neural networks trained on the same problem can give different outputs for the same input. To ensure that a neural network of good accuracy has been found, retrain several times.
There are several other techniques for improving upon initial solutions if higher accuracy is desired. For more information, see Improve Neural Network Generalization and Avoid Overfitting.
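A simple retraining loop that keeps the best of several runs could look like this (a sketch: `houseTargets` is assumed to accompany `houseInputs` from the housing example, and `best_vperf` is the best validation performance recorded in the training record `tr`):

```matlab
bestPerf = Inf;
for k = 1:10
    net = fitnet(10);                        % fresh random initialization
    [net, tr] = train(net, houseInputs, houseTargets);
    if tr.best_vperf < bestPerf              % keep the lowest validation error
        bestPerf = tr.best_vperf;
        bestNet  = net;
    end
end
```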

Time series classification MATLAB

My task is to classify time-series data with use of MATLAB and any neural-network framework.
Describing task more specifically:
It is a problem from the computer-vision field: a scene-boundary detection task.
The source data are 4 arrays of neighbouring-frame histogram correlations from the video stream.
Based on this data, we have to classify the time series into 2 classes:
"scene break"
"no scene break"
So the network input is 4 double values for each source data entry, and the output is one binary value. An example of the source data is shown below:
0.997894,0.999413,0.982098,0.992164
0.998964,0.999986,0.999127,0.982068
0.993807,0.998823,0.994008,0.994299
0.225917,0.000000,0.407494,0.400424
0.881150,0.999427,0.949031,0.994918
The problem is that the pattern-recognition tools from the MATLAB Neural Toolbox (like patternnet) treat the source data as independent entries. But I strongly believe the results will only be precise if the net makes its decision based on the history of previous correlations.
However, I also did not manage to get a valid response from the recurrent nets meant for time-series analysis (like delaynet and narxnet).
narxnet and delaynet return lousy results, and it looks like these types of networks are not meant for classification tasks. I am not inserting any code here because it is almost entirely autogenerated using the MATLAB Neural Toolbox GUI.
I would appreciate any help, especially advice on which tool fits my task best.
I am not sure how difficult this classification problem is.
Given your sample, a feed-forward neural network with 4 inputs and 1 output is sufficient.
If you insist on using historical inputs, simply pre-process your input d such that
your new input D(t) (a vector at time t) is composed of d(t), the 1x4 vector at time t; d(t-1), the 1x4 vector at time t-1; ...; and d(t-k), the 1x4 vector at time t-k.
If t-k < 0, just pad with zeros.
So you have a 1x(4(k+1)) vector as input and 1 output.
As Dan mentioned, you need to find a good k.
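Assembling the zero-padded history vector D(t) described above could be sketched as (a hypothetical helper, not toolbox code):

```matlab
% d: T x 4 matrix of correlations, one row per frame; k: history depth.
function Dt = historyVector(d, t, k)
    Dt = zeros(1, 4*(k+1));
    for j = 0:k
        if t - j >= 1                   % zero-pad before the series start
            Dt(4*j+1 : 4*j+4) = d(t-j, :);
        end
    end
end
```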
Speaking of weights, I think additional pre-processing such as a windowing method on the input is unnecessary, since the neural network will be trained to assign a weight to each input dimension.
It does sound a bit messy, though, since the neural network considers each input dimension independently. That means you lose the information that the four neighboring correlations belong together.
One possible solution is pre-processing that extracts neighborhood features, e.g. using the mean and standard deviation as two features representative of the originals.