I'm working on an optical character recognition problem. I've successfully extracted features as a [1x32] matrix (I've extracted 32 features from each segmented character). I have the complete training data set (images of every individual character), but I'm struggling to create the input and target data set matrices. So please tell me about those matrices, the testing data, and in what format I will get the output from the neural network.
1) There are 258 different patterns (characters), so should there be 258 class labels?
My input matrix size is: no. of rows = 32 (features), no. of cols = 258*4 = 1032 (no. of characters × no. of instances of each character).
2) What should be the size of my target matrix? Please sketch a dummy target matrix for my case.
Have you already checked the Neural Network Toolbox of MATLAB (http://www.mathworks.co.uk/help/nnet/examples/crab-classification.html?prodcode=NN&language=en)? There you can find some examples of how to work with neural networks.
Regarding your two specific questions:
1) Typically, if you want to differentiate between N different characters, you will need that many class labels. So in your case, yes, you should have 258 class labels. The output of a classification problem using neural networks is typically a binary output, with 1 for the identified class and 0 for the remaining classes. However, if you use a sigmoid function as the last activation function, it can happen that no output node is exactly 0 or 1; in that case you can, for example, take the maximum over all output nodes to get the most probable class for a given input.
2) The target matrix should be a binary matrix with a 1 for the correct class and 0 for all other classes, for each input. So in your case it should be a 258x1032 matrix. Again, I recommend you check the link given above.
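For illustration, here is a minimal sketch of how such a one-hot target matrix can be built. It is in Python/NumPy rather than MATLAB, and it assumes the 1032 samples are ordered in groups of 4 per character, which is an assumption about your data layout:

```python
import numpy as np

n_classes = 258      # number of distinct characters
n_per_class = 4      # instances per character
n_samples = n_classes * n_per_class  # 1032 columns

# class label of each column, assuming samples are grouped by character:
# [0,0,0,0, 1,1,1,1, ..., 257,257,257,257]
labels = np.repeat(np.arange(n_classes), n_per_class)

# one-hot target matrix: 258 rows (classes) x 1032 columns (samples);
# each column has exactly one 1, in the row of its class
targets = np.zeros((n_classes, n_samples))
targets[labels, np.arange(n_samples)] = 1
```

Each column of `targets` then lines up with the corresponding column of your 32x1032 input matrix.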
Good luck.
I am working on a traffic sign recognition code in MATLAB using the Belgian Traffic Sign Dataset. The dataset can be found here.
The dataset consists of training data and test data (or evaluation data).
I resized the given images and extracted HOG features using the VL_HOG function from the VLFeat library.
Then, I trained a multi-class SVM using all of the signs inside the training dataset. There are 62 categories (i.e. different types of traffic signs) and 4577 frames inside the training set.
I used the fitcecoc function to obtain the classifier.
After training the multi-class SVM, I want to test the classifier performance on the test data, so I used the predict and confusionmat functions.
For some reason, the size of the returned confusion matrix is 53-by-53 instead of 62-by-62. Why is the size of the confusion matrix not the same as the number of categories?
Some of the folders inside the testing dataset are empty, so those classes never appear in the test labels and confusionmat omits the corresponding rows and columns from the confusion matrix.
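The effect can be shown language-agnostically. Below is a sketch in Python with a hypothetical miniature of the situation (6 categories, 2 of them absent from the test set), showing how a confusion matrix built only from the observed labels shrinks, while one built over the full category list does not:

```python
# Hypothetical miniature: 6 categories, but the test set contains
# no examples of classes 3 and 5 (their folders are empty).
all_classes = [0, 1, 2, 3, 4, 5]
y_true = [0, 0, 1, 2, 2, 4]           # classes 3 and 5 never occur
y_pred = [0, 1, 1, 2, 4, 4]

def confusion(y_true, y_pred, labels):
    """Confusion matrix over an explicit label list."""
    index = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m

# Built only from labels that actually occur -> 4x4 (like your 53x53)
observed = sorted(set(y_true) | set(y_pred))
small = confusion(y_true, y_pred, observed)

# Built from the full category list -> 6x6 (like the expected 62x62)
full = confusion(y_true, y_pred, all_classes)
```

In MATLAB, passing the complete list of 62 categories via confusionmat's 'Order' argument forces a 62-by-62 matrix, with all-zero rows and columns for the missing classes.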
I am designing an algorithm for OCR using a neural network. I have 100 images (each a [40x20] matrix) of each character, so my input should be 2600x800. I have some questions regarding the inputs and targets.
1) Is my input correct? And can all 2600 images be used in random order?
2) What should the target be? Do I have to define the target for all 2600 inputs?
3) As the target for the same character is a single class, what should the final target matrix be: (26x800) or (2600x800)?
Your input should be correct. You have (I am guessing) 26 characters and 100 images of size 800 for each, so the matrix looks good. As a side note, that is a pretty big input size; you may want to consider doing PCA and training on the leading principal components, or just reducing the size of the images. I have been able to train NNs with 10x10 images, but bigger == more difficult. Try it, and if it doesn't work, try PCA.
(2 and 3) Of course, if you want to train a NN you need to give it inputs with outputs; how else are you going to train it? Your output should be of size 26x1 for each of the images, so the output matrix for training should be 2600x26. In each output you should have a 1 at the index of the character it belongs to and zeros in the rest.
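The PCA suggestion above can be sketched like this (Python/NumPy for illustration; random data stands in for the real 2600x800 image matrix, and keeping 50 components is an arbitrary choice you would tune):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the real data: 2600 images flattened to 800 pixels each
X = rng.random((2600, 800))

k = 50                                   # number of components to keep (illustrative)

# PCA via SVD on the mean-centred data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project onto the top-k principal directions: 2600 x 50 training inputs
X_reduced = Xc @ Vt[:k].T
```

Training on `X_reduced` instead of `X` cuts the input dimension from 800 to 50 while keeping the directions of largest variance.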
Recently I've posted many questions regarding a character recognition program that I am making. I thought I had it working fully until today. I think the problem has to do with my training of the network. What follows is an explanation of how I think the training and simulation procedure goes.
Given these two images
targets
inputs
I want to train the network to recognize the letter D. Note that before this is done, I've processed the images into binary matrices. For training I use
[net,tr] = train(net,inputs,targets);
where instead of inputs I pass targets, because I want to train the network to recognize all the letters in the target image.
I then run
outputs = sim(net,inputs);
where inputs is the image with the letter "D", or an image with any other letter in ABCD. The basic premise here is that I want to train the network to recognize all the letters in ABCD, then choose any letter A, B, C, or D and see if the network recognizes the chosen letter.
Question:
Am I correct with the training procedure?
Well, it greatly depends on how you implemented your neural network. Although, judging by the question you're asking, I guess you didn't implement it yourself but used some ready-made API.
Anyways, you should first understand the tools you use before you use them (here neural networks).
A neural network takes an input and performs linear or non-linear transformations of the input and returns an output.
Inputs and outputs are always numeric values. However they may represent any kind of data.
Inputs can be:
Pixels of an image
Real valued or integer attributes
Categories
etc.
In your case the inputs are the pixels of your character images (your binary matrices).
Outputs can be:
Classes (if you're doing classification)
Values (if you're doing regression)
Next value in a time series (if you're doing time series prediction)
In your case, you're doing classification (predicting which character the inputs represent) so your output is a class.
For you to understand how the network is trained, I'll first explain how to use it once it's trained and then what it implies for the training phase.
So once you've trained your network, you will give it the binary matrix representing your image and it will output the class (the character), which will be (for example): 0 for A, 1 for B, 2 for C and 3 for D. In other words, you have:
Input: binary matrix (image)
Output: 0,1,2 or 3 (depending on which character the network recognizes in the image)
The training phase consists in telling the network which output you would like for each input.
The type of data used during the training phase is the same as the one being used in the "prediction phase". Hence, for the training phase:
Inputs: binary matrices [A,B,C,D] (One for each letter! Very important !)
Targets: corresponding classes [0,1,2,3]
This way, you're telling the network to learn that if you give it the image of A it should output 0, if you give it the image of B it should output 1, and so on.
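The training pairing described above, one (input, target) pair per letter, can be sketched as follows. This is an illustrative Python stand-in, not your MATLAB setup: the 3x3 binary "letter images" are made up, and a nearest-neighbour lookup stands in for the trained network's prediction:

```python
import numpy as np

# Hypothetical 3x3 binary "images" of four letters (stand-ins for A, B, C, D)
images = {
    0: np.array([[0,1,0],[1,1,1],[1,0,1]]),  # "A"
    1: np.array([[1,1,0],[1,1,1],[1,1,0]]),  # "B"
    2: np.array([[0,1,1],[1,0,0],[0,1,1]]),  # "C"
    3: np.array([[1,1,0],[1,0,1],[1,1,0]]),  # "D"
}

# Training phase: one input (flattened image) paired with one target (class)
inputs  = np.stack([img.ravel() for img in images.values()])  # 4 x 9
targets = np.array(list(images.keys()))                       # [0, 1, 2, 3]

def predict(img):
    """Stand-in for the trained network: return the class of the
    stored training input closest to img."""
    dists = np.abs(inputs - img.ravel()).sum(axis=1)
    return targets[int(np.argmin(dists))]
```

The point is the shape of the training data: one input per letter, each paired with its own target class, exactly as listed above.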
Note: You were mistaken because you thought of the "inputs" as the inputs you wanted to give the network after the training phase, when they were actually the inputs given to the network during the training phase.
I have a big data set (time series, about 50 parameters/values). I want to use a Kohonen network to group similar data rows. I've read a bit about Kohonen neural networks and I understand the idea of a Kohonen network, but:
I don't know how to implement a Kohonen network with so many dimensions. I found an example on CodeProject, but only with 2- or 3-dimensional input vectors. When I have 50 parameters, should I create 50 weights in each of my neurons?
I don't know how to update the weights of the winning neuron (how do I calculate the new weights?).
My English is not perfect and I don't understand everything I read about Kohonen networks, especially the descriptions of variables in formulas; that's why I'm asking.
One should distinguish the dimensionality of the map, which is usually low (e.g. 2 in the common case of a rectangular grid), from the dimensionality of the reference vectors, which can be arbitrarily high without problems.
Look at http://www.psychology.mcmaster.ca/4i03/demos/competitive-demo.html for a nice example with 49-dimensional input vectors (7x7 pixel images). The Kohonen map in this case has the form of a one-dimensional ring of 8 units.
See also http://www.demogng.de for a Java simulator for various Kohonen-like networks, including ring-shaped ones like the one at McMasters. The reference vectors there, however, are all 2-dimensional, but only for easier display; they could have arbitrarily high dimensions without any change to the algorithms.
Yes, you would need 50 weights per neuron. However, the maps themselves are usually low dimensional, as described in this self-organizing map article; I have never seen them use more than a few map dimensions.
You have to use an update formula. From the same article: Wv(s + 1) = Wv(s) + Θ(u, v, s) α(s)(D(t) - Wv(s))
Yes, you'll need 50 inputs (and thus 50 weights) for each neuron.
You basically do a linear interpolation between each neuron's weight vector and the input vector, using W(s + 1) = W(s) + Θ() * α(s) * (Input(t) - W(s)), with Θ being your neighbourhood function.
And you should update all your neurons, not only the winner.
Which function you use as a neighbourhood function depends on your actual problem.
A common property of such a function is that it has the value 1 when i = k and falls off with the Euclidean distance. Additionally, it shrinks with time (in order to localize clusters).
Simple neighbourhood functions include linear falloff (up to a "maximum distance") or a Gaussian function.
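One full update step, 50-dimensional reference vectors, winner search, Gaussian neighbourhood, and an update of all neurons, can be sketched like this (Python/NumPy for illustration; the 8x8 grid size, learning rate, and neighbourhood width are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

grid = 8                                  # 8x8 map of neurons
dim = 50                                  # 50-dimensional reference vectors
W = rng.random((grid, grid, dim))         # one 50-weight vector per neuron

# grid coordinates of every neuron, used by the neighbourhood function
ys, xs = np.mgrid[0:grid, 0:grid]

def som_step(W, x, lr, sigma):
    """One Kohonen update: move ALL neurons toward input x,
    scaled by a Gaussian neighbourhood around the winner."""
    # winner = neuron whose weight vector is closest to x (Euclidean)
    d = np.linalg.norm(W - x, axis=2)
    wy, wx = np.unravel_index(np.argmin(d), d.shape)
    # Gaussian neighbourhood on the grid: 1 at the winner,
    # falling off with distance (sigma would also shrink over time)
    theta = np.exp(-((ys - wy) ** 2 + (xs - wx) ** 2) / (2 * sigma ** 2))
    # W(s+1) = W(s) + theta * alpha * (x - W(s))
    return W + lr * theta[:, :, None] * (x - W)

x = rng.random(dim)
W2 = som_step(W, x, lr=0.5, sigma=2.0)
```

Note that the update touches every neuron; the Gaussian factor simply makes neurons far from the winner move very little.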
My task is to classify time-series data using MATLAB and any neural-network framework.
Describing task more specifically:
It is a problem from the computer-vision field. It is a scene boundary detection task.
The source data are 4 arrays of neighbouring-frame histogram correlations from the video stream.
Based on this data, we have to classify this timeseries with 2 classes:
"scene break"
"no scene break"
So the network input is 4 double values for each source data entry, and the output is one binary value. An example of the source data is shown below:
0.997894,0.999413,0.982098,0.992164
0.998964,0.999986,0.999127,0.982068
0.993807,0.998823,0.994008,0.994299
0.225917,0.000000,0.407494,0.400424
0.881150,0.999427,0.949031,0.994918
The problem is that the pattern-recognition tools from the Matlab Neural Toolbox (like patternnet) treat source data entries as independent. But I strongly believe that the results will only be precise if the net makes its decision based on the history of previous correlations.
But I also did not manage to get valid responses from the recurrent nets meant for time-series analysis (like delaynet and narxnet). narxnet and delaynet return lousy results, and it looks like these types of networks are not meant to solve classification tasks. I am not inserting any code here because it is almost totally autogenerated with the Matlab Neural Toolbox GUI.
I would appreciate any help, especially advice on which tool fits my task better.
I am not sure how difficult this problem is to classify.
Given your sample, a feed-forward neural network with 4 inputs and 1 output is sufficient.
If you insist on using historical inputs, you simply pre-process your input d, so that your new input D(t) (a vector at time t) is composed of d(t), a 1x4 vector at time t; d(t-1), a 1x4 vector at time t-1; ...; and d(t-k), a 1x4 vector at time t-k.
If t-k < 0, just treat those entries as 0.
So you have a 1x(4(k+1)) vector as input, and 1 output.
Similar to what Dan mentioned, you need to find a good k.
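The windowing described above can be sketched as follows (Python/NumPy for illustration; random data stands in for the real correlation series, and k=3 is an arbitrary choice for the history length):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100
d = rng.random((T, 4))   # stand-in: T rows of 4 correlation values each

k = 3                    # history length; finding a good k is the tuning step

# D[t] = [d(t), d(t-1), ..., d(t-k)] flattened, zero-padded when t - j < 0
D = np.zeros((T, 4 * (k + 1)))
for t in range(T):
    for j in range(k + 1):
        if t - j >= 0:
            D[t, 4 * j: 4 * (j + 1)] = d[t - j]
```

Each row of `D` is then one 1x(4(k+1)) training input, paired with the binary scene-break label for time t.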
Speaking of the weights, I think additional pre-processing like a windowing method on the input is not necessary, since the neural network will be trained to assign weights to each input dimension.
It does sound a bit messy, though, since the neural network would consider each input dimension independently. That means you lose the information that the four correlations are neighbours.
One possible solution is to have the pre-processing extract neighbourhood features, e.g. using the mean and std as two features representative of the original four.
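A sketch of that pre-processing, in Python/NumPy and using two rows from the sample data above, replacing the 4 raw correlations with their mean and standard deviation:

```python
import numpy as np

# Each row: the 4 neighbouring-frame correlations at one time step
# (taken from the sample data in the question)
d = np.array([
    [0.997894, 0.999413, 0.982098, 0.992164],
    [0.225917, 0.000000, 0.407494, 0.400424],
])

# Replace the 4 raw values with 2 neighbourhood features: mean and std
features = np.column_stack([d.mean(axis=1), d.std(axis=1)])
```

The first row (a "no scene break" pattern) gets a high mean and low std; the fourth sample row (the likely "scene break") gets a low mean and high std, which is exactly the neighbourhood information a per-dimension weighting would miss.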