Neural network output layer for multiple pattern recognition - neural-network

Assume that I have a method or other neural network to do pattern detection on an image correctly. How should I design a neural network where there are multiple patterns in an image?
Say that in an image, there are X patterns to be detected, what would be the best approach? AFAIK output layer neurons values should be [-1,1]. How would I know if there are X amount of patterns recognised? Does this mean that I have to set a hardcoded limit on how many patterns it can recognise (since number of output neuron is fixed)?

Here's a suggestion using face detection as an example. This Face Detection link on Github is described to detect multiples pattern (i.e. faces) using a Haar Classifier. If you read under the Implementation section it states that the algorithm uses scaleOption and templateSizeOption parameters (among others) to govern how many faces are detected in an image. It sounds like you should look for features in subspaces or windows of a given image (perhaps even spaces that overlap).
scaleOption - this parameter is used to specify the
rate at which the haar features used
for face detection will be scaled. A
lower scale option means that more
faces will be detected, while a higher
scale option will perform a faster
detection, but may miss some faces
from the input image. The default
scale value is 1.1, that determines an
increase in the features dimension of
10% at each step.
templateSizeOption – it is used to
specify the minimal area in which to
search for a face. If we want to
detect persons from close-up images,
the size should be over 40 pixels,
otherwise a 25 region pixels (which is
the default value ) is enough for
detecting a large number of faces.

to do this use a hopfild net.at first in equal windows extract your target and save in your the net. then with a simple algoritm search in your image and in any time compare the sim of the net with your target and for any target use separate array to save the result.at the end extract the nearest pattern in each array.you can use some image proccesing in your original image before starting.

Yes, this can be done by neural network. I think that most practical solutions would involve applying the neural network to a window which scanned over the image. Multiple hits from the neural network would imply multiple target objects in the image.
Incidentally, neural networks do not have to lie in the range -1 .. 1.

Related

How to decide which Convolution Neural Network architecture will work to identify the own data set?

I have data-set regarding chocolates. I need to detect whether it has scratches or not. I am planning to detect from Convolution Neural Network using Caffe. But how to define which neural network architecture will suit to my data-set?
Also how to generate heat values when there is any scratches in image?
I have tried detect normal image processing algorithms and it did not work.
Abnormal Image
Normal Image
Based on the little info you provide, the network architecture choice should be the last of your concerns. Also "trying normal image processing algorithms" is quite a vague statement.
A few points to consider
How big is the dataset? Are the chocolate photos taken in a controlled setting where they are always similar to your example photos or are they taken in the wild, i.e. where they could have different lighting conditions, positions, etc.? Is the dataset balanced?
How is the dataset labelled? Is it just a class for the whole image specifying normal vs abnormal? If so, you'd just be doing classification, and one way to potentially just visualise the location of the scratches (if they turn out to be the most prominent feature for the classification) is to use gradient-weighted class activation maps. On the other hand, if your dataset has labelled scratch points over images, then you can directly train your network to output heatmaps.
Once your dataset is properly set up with a training and validation set, you can just start with a baseline simple small convolutional network architecture, and then you can try out different and bigger network architectures like VGG16, ResNet, etc., and check whether they improve performance on your validation set.

How does a neural network produce an array?

Background
I've been studying Neural Networks, specifically the implmentation provided by this incredible online book. In the example network provided, we're shown how to create a neural network that classifies the MNIST training data to perform Optical Character Recognition (OCR).
The network is configured so that the input stimuli represents a discrete range of thresholded pixel data from a 24x24 image; at the output, we have ten signal paths which represent each of the different solutions for the input images; these are used classify a handwritten digit from zero to nine. In this implementation, a handwritten '3' would drive a strong signal down the third output path.
Now, I've seen that Neural Networks can be applied to far more 'unpredictable' output solutions; for example, take the team who taught a network to recognize the hair on a human:
Question
Surely in the application above, we couldn't use a fixed output array length because the number of points that would qualify within an image would vary just so wildly between different samples. Can anyone recommend what kind of pattern would have been used to accomplish this?
Assumption
In the interest of completeness, I'm going to propose that the team could have employed a kind of 'line following robot' for the classification task. So for an input image, a network could be trained by using a small range of discrete commands (LEFT, RIGHT, UP, DOWN) for a fixed period t and train the network to control the robot like an Etch-a-Sketch.
Alternatively, we could implement a network which would map pixels one-to-one, and define whether individual pixels contributed to hair; but this wouldn't be compatible with different image resolutions.
So, do either of these solutions sound plausable? If so, are these basic implementations of a known generic solution for this kind of problem? What approach would you use?

Convolution Neural Network for image detection/classification

So here is there setup, I have a set of images (labeled train and test) and I want to train a conv net that tells me whether or not a specific object is within this image.
To do this, I followed the tensorflow tutorial on MNIST, and I train a simple conv net reduced to the area of interest (the object) which are training on image of size 128x128. The architecture is as follows : successively 3 layers consisting of 2 conv layers and 1 max pool down-sampling layers, and one fully connected softmax layers (with two class 0 and 1 whether the object is present or not)
I impleted it using tensorflow, and this works quite well, but since I have enough computing power I was wondering how I could improve the complexity of the classification:
- adding more layers ?
- adding more channel at each layer ? (currently 32,64,128 and 1024 for the fully connected)
- anything else ?
But the most important part is that now I want to detect this same object on larger images (roughle 600x600 whereas the size of the object should be around 100x100).
I was wondering how I could use the previously training "small" network used for small images, in order to pretrained a larger network on the large images ? One option could be to classify the image using a slicing window of size 128x128 and scan the whole image but I would like to try if possible to train a whole network on it.
Any suggestion on how to proceed ? Or an article / ressource tackling this kind of problem ? (I am really new to deep learning so sorry if this is stupid question...)
Thanks !
I suggest that you continue reading on the field overall. Your search keys include CNN, image classification, neural net, AlexNet, GoogleNet, and ResNet. This will return many articles, on-line classes and lectures, and other materials to help you learn about classification with neural nets.
Don't just add layers or filters: the complexity of the topology (net design) must be fitted to the task; a net that's too complex will over-fit the training data. The one you've been using is probably LeNet; the three I cite above are for the ImageNet image classification contest.
Since you are working on images, I would suggest you to use a pretrained image classification network (like VGG, Alexnet etc.)and fine tune this network with your 128x128 image data. In my experience until we have very large data set fine tuned network will give more accuracy and also save training time. After building a good image classifier on your data set you can use any popular algorithm to generate region of proposal from the image. Now take all regions of proposal and pass them to classification network one by one and check weather this network is classifying given region of proposal as positive or negative. If it classifying as positively then most probably your object is present in that region. Otherwise it's not. If there are a lot of region of proposal in which object is present according to classifier then you can use non maximal suppression algorithms to reduce number of positive proposals.

Convolutional Neural Networks, Matrix of Convolutional (Kernel)

Good afternoon! In the first stage where on input of Convolutional Neural Network (input layer) we recieve a source image (hence an image of handwritten English letter). First of all we are using an nxn window which goes from left to right for scanning image and multiplication on kernel (convolutional matrix) to build Feature maps? But nowhere written about what exact values a kernel should be have (In other words on what Kernel values I should multiply data retrieved from n*n window ). Is it suitable to multiply data on this Convolutional Kernel intended for edge detection? There a numerous Convolutional Kernels (Emboss, Gaussian Filter, Edge detection, Angle detection, etc.)? But nowhere is written to what exact kernel it is need to multiply data for detecting hand written symbols.
Sample of Edge detection 3*3 kernel
Convolutional operation for multiplication on kernel
In addition, if size of entire image is 30*30, than is it possible to use window of 5*5 for building feature maps? Would it be enough sufficient for reaching optimal precision of letter detection?
On what exact kernel it is best to multiply area of entire image for the maximum precision of letter recognition? Or initially all values in kernel is equaled to 0? Could i also ask, what formula or rule is applied to detect overall needed amount of to be built feature maps? Or if the task is in letter recognition of English Language, than in each stage of Feature maps building process there must be exact 25 feature maps? Thank you for reply!
In a CNN, the convolutional kernel is a shared weight matrix, and is learned in a similar way to other weights. It is initialized in the same way, with small random values, and the weight deltas from back propagation are summed across all the features that receive its output (i.e. usually all "pixels" in the output of the convolutional layer)
A typical random kernel will perform a little like an edge detector.
After training, the first CNN layer can be displayed and will often have learned some kernels that can be interpreted if you are familiar with image processing
There is a nice animated view of kernel features being learned here: http://cs.nyu.edu/~yann/research/sparse/
In short your answer is this: There is no need to look for correct kernels to use. Instead look for a CNN library where you set params such as number of convolutional layers, and research the way to view the kernels as they learn - most CNN libraries will have a documented way to visualise them.

Can neural network fail to learn a function? and How to choose better feature descriptors for pattern recognition?

I was working on webots which is an environment used to model, program and simulate mobile robots. Basically i have a small robot with a VGA camera, and it looks for simple blue coloured patterns on white walls of a small lego maze and moves accordingly
The method I used here was
​
Obtain images of the patterns from webots and save it in a location
in PC.
​​Detect the blue pattern, form a square enclosing the pattern
with atleast 2 edges of the pattern being part of the boundary of the
square.
​Resize it to 7x7 matrix(using nearest neighbour
interpolation algorithm)
The input to the network is nothing but the red pixel intensities of each of the 7x7 image(when i look at the blue pixel through a red filter it appears black so). The intensities of each pixel is extracted and the 7x7 matrix is then converted it to a 1D vector i.e 1x49 which is my input to the neural network. (I chose this characteristic as my input because it is 'relatively' less difficult to access this information using C and webots.​​)
I used MATLAB for this offline training method and I used a slower learning rate(0.06) to ensure parameter convergence and tested it on large and small datasets(1189 and 346 respectively). On all the numerous times I have tried, the network fails to classify the pattern.(it says the pattern belongs to all the 4 classes !!!! ) . There is nothing wrong with the program as I tested it out on the simpleclass_dataset in matlab and it works almost perfectly
Is it possible that the neural network fails to learn the function because of really poor data? (by poor data i mean that the datapoints corresponding to one sample of one class are very close to another sample belonging to a different class or something of that sort). Or can the neural network fail because of very poor feature descriptors?
Can anyone suggest a simpler method to extract features from the image(I am now shifting to MATLAB as I am now only concerned with simulations in webots and not the real robot). What sort of features can I choose? The patterns are very simple (L,an inverted L and its reflected versions are the 4 patterns)
Neural networks CAN fail to learn a function; this is most often caused by employing a network topology which is too simple to model the necessary function. A classic example of this case is attempting to learn an XOR function using a perceptron classifier, although it can even happen in multilayer neural nets sometimes; especially for complex tasks like image recognition. See my previous answer for a rough guide on how to select neural network parameters (ignore the convolution stuff if you want, although I would highly recommened looking into convolutional neural networks if you are still having problems).
It is a possiblity that there is too little seperability between classes, although I doubt that this is the case given your current features. Is there a reason that your network needs to allow an image to be four classifications simultaneously? If not, then perhaps you could classify the input as the output with the highest activation instead of all those with high activations.