Trying to find object coordinates (x,y) in image, my neural network seems to optimize error without learning [closed] - neural-network

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
I generate images of a single coin pasted over a white background of size 200x200. The coin is randomly chosen among 8 euro coin images (one for each coin) and has :
random rotation ;
random size (bewteen fixed bounds) ;
random position (so that the coin is not cropped).
Here are two examples (center markers added): Two dataset examples
I am using Python + Lasagne. I feed the color image into the neural network that has an output layer of 2 linear neurons fully connected, one for x and one for y.
The targets associated to the generated coin images are the coordinates (x,y) of the coin center.
I have tried (from Using convolutional neural nets to detect facial keypoints tutorial):
Dense layer architecture with various number of layers and number of units (500 max) ;
Convolution architecture (with 2 dense layers before output) ;
Sum or mean of squared difference (MSE) as loss function ;
Target coordinates in the original range [0,199] or normalized [0,1] ;
Dropout layers between layers, with dropout probability of 0.2.
I always used simple SGD, tuning the learning rate trying to have a nice decreasing error curve.
I found that as I train the network, the error decreases until a point where the output is always the center of the image. It looks like the output is independent of the input. It seems that the network output is the average of the targets I give. This behavior looks like a simple minimization of the error since the positions of the coins are uniformly distributed on the image. This is not the wanted behavior.
I have the feeling that the network is not learning but is just trying to optimize the output coordinates to minimize the mean error against the targets. Am I right? How can I prevent this? I tried to remove the bias of the output neurons because I thought maybe I'm just modifying the bias and all others parameters are being set to zero but this didn't work.
Is it possible for a neural network alone to perform well at this task?
I have read that one can also train a net for present/not present binary classification and then scan the image to find possible locations of objects. But I just wondered if it was possible just using the forward computation of a neural net.

Question : How can I prevent this [overfitting without improvement to test scores]?
What needs to be done is to re-architect your neural net. A neural net just isn't going to do a good job at predicting an X and Y coordinate. It can through create a heat map of where it detects a coin, or said another way, you could have it turn your color picture into a "coin-here" probability map.
Why? Neurons have a good ability to be used to measure probability, not coordinates. Neural nets are not the magic machines they are sold to be but instead really do follow the program laid out by their architecture. You'd have to lay out a pretty fancy architecture to have the neural net first create an internal space representation of where the coins are, then another internal representation of their center of mass, then another to use the center of mass and the original image size to somehow learn to scale the X coordinate, then repeat the whole thing for Y.
Easier, much easier, is to create a coin detector Convolution that converts your color image to a black and white image of probability-a-coin-is-here matrix. Then use that output for your custom hand written code that turns that probability matrix into an X/Y coordinate.
Question : Is it possible for a neural network alone to perform well at this task?
A resounding YES, so long as you set up the right neural net architecture (like the above), but it would probably be much easier to implement and faster to train if you broke the task into steps and only applied the Neural Net to the coin detection step.

Related

Explain how is heat map used in CNN for crowd count case?

im new to neural netwworks, and through the readings i often come across where heat maps are used in the network along with the ground truth provided with the dataset to (as far as my understanding is) evaluate the accuracy of network performance.
to be specific, consider the application of a crowd density estimation network, the dataset provide the crowd images, each image hase a corresponding ground truth .mat file, this file has:
a matrix of X and Y coordinations representing the appearanse of human head in the image.
the total number of human heads in the image (crowd count) which is equal to the matrix rows.
my current understanding is that one image will get through the network and the result is a going to be compared with the given ground-truth (either the head locations, ore the crowd count),
SO, how and at which point and is the crowd density map or heat map is used? do we generate one for the image while training an compare it with the one generated from the ground truth? how is this done?
all the papers i've read neglect describing this process.
any clear explanation will be appreciated.
The counting-by-density CNN approache, which you describe are fully convolutional regression network where the output is a density (heat map). That mean the training ground-truth data is a density map too. In general a MSE loss is used for training. The ground-truth density maps are in general precomputed from the head detection ground-truth.
E.g. if you look at the MCNN code preparation folder you will find get_density_map_gaussian.m file which performs the density estimation from the ground-truth annotated head.

Handwritten Word Segmentation using neural network

So what I am trying to do is to segment cursive handwritten English words into individual characters. I have applied a simple heuristic approach with artificial intelligence to do a basic over-segmentation of the words something like this:
I am coding this in Matlab. The approach involves preprocessing, slant correction size normalization etc and then thinning pen strokes to 1 pixel width and identify the ligatures present in the image using column sum of pixels of the image. Every column with pixel sum lower than a threshold is a possible segmentation point. Problem is open characters like 'u', 'v, 'm' 'n' and 'w' also have low column sum of pixels and gets segmented.
The approach I have used is a modified version of what is presented in this paper:
cursive script segmentation using neural networks.
Now to improve this arrangement I have to use a neural network to correct these over segmented points and recognize them as bad segmentations. I will write a 'newff' function for that and label the segments as good and bas manually but I fail to understand what should be the input to that neural network?
My guess is that we have to give some image data along with the column number at which possible segments are made(one segmentation point per training sample. The given image has about 40 segmentation points so it will lead to 40 training samples) and have it label as good or bad segment for training.
There will be just one output neuron telling us if the segmentation point is good or bad.
Can I give column sums of all the columns as input to the input layer? How do I tell it what the segmentation point for this training instance is? Won't the actual column number we have to classify as good or bad segment which is the most important value here drown in the sea of this n-dimensional input? (n being width of the image pixel-wise)
Since it has been last asked, I am now using image features in vicinity of each segmented column my heuristic algorithm has returned. These features (like column sum pixel density close to the segmented column) is my input to the neural network with a single output neuron. Target vectors are 1 for a good segmentation point and 0 for a bad one.

Convolutional Neural Networks, Matrix of Convolutional (Kernel)

Good afternoon! In the first stage where on input of Convolutional Neural Network (input layer) we recieve a source image (hence an image of handwritten English letter). First of all we are using an nxn window which goes from left to right for scanning image and multiplication on kernel (convolutional matrix) to build Feature maps? But nowhere written about what exact values a kernel should be have (In other words on what Kernel values I should multiply data retrieved from n*n window ). Is it suitable to multiply data on this Convolutional Kernel intended for edge detection? There a numerous Convolutional Kernels (Emboss, Gaussian Filter, Edge detection, Angle detection, etc.)? But nowhere is written to what exact kernel it is need to multiply data for detecting hand written symbols.
Sample of Edge detection 3*3 kernel
Convolutional operation for multiplication on kernel
In addition, if size of entire image is 30*30, than is it possible to use window of 5*5 for building feature maps? Would it be enough sufficient for reaching optimal precision of letter detection?
On what exact kernel it is best to multiply area of entire image for the maximum precision of letter recognition? Or initially all values in kernel is equaled to 0? Could i also ask, what formula or rule is applied to detect overall needed amount of to be built feature maps? Or if the task is in letter recognition of English Language, than in each stage of Feature maps building process there must be exact 25 feature maps? Thank you for reply!
In a CNN, the convolutional kernel is a shared weight matrix, and is learned in a similar way to other weights. It is initialized in the same way, with small random values, and the weight deltas from back propagation are summed across all the features that receive its output (i.e. usually all "pixels" in the output of the convolutional layer)
A typical random kernel will perform a little like an edge detector.
After training, the first CNN layer can be displayed and will often have learned some kernels that can be interpreted if you are familiar with image processing
There is a nice animated view of kernel features being learned here: http://cs.nyu.edu/~yann/research/sparse/
In short your answer is this: There is no need to look for correct kernels to use. Instead look for a CNN library where you set params such as number of convolutional layers, and research the way to view the kernels as they learn - most CNN libraries will have a documented way to visualise them.

Can neural network fail to learn a function? and How to choose better feature descriptors for pattern recognition?

I was working on webots which is an environment used to model, program and simulate mobile robots. Basically i have a small robot with a VGA camera, and it looks for simple blue coloured patterns on white walls of a small lego maze and moves accordingly
The method I used here was
​
Obtain images of the patterns from webots and save it in a location
in PC.
​​Detect the blue pattern, form a square enclosing the pattern
with atleast 2 edges of the pattern being part of the boundary of the
square.
​Resize it to 7x7 matrix(using nearest neighbour
interpolation algorithm)
The input to the network is nothing but the red pixel intensities of each of the 7x7 image(when i look at the blue pixel through a red filter it appears black so). The intensities of each pixel is extracted and the 7x7 matrix is then converted it to a 1D vector i.e 1x49 which is my input to the neural network. (I chose this characteristic as my input because it is 'relatively' less difficult to access this information using C and webots.​​)
I used MATLAB for this offline training method and I used a slower learning rate(0.06) to ensure parameter convergence and tested it on large and small datasets(1189 and 346 respectively). On all the numerous times I have tried, the network fails to classify the pattern.(it says the pattern belongs to all the 4 classes !!!! ) . There is nothing wrong with the program as I tested it out on the simpleclass_dataset in matlab and it works almost perfectly
Is it possible that the neural network fails to learn the function because of really poor data? (by poor data i mean that the datapoints corresponding to one sample of one class are very close to another sample belonging to a different class or something of that sort). Or can the neural network fail because of very poor feature descriptors?
Can anyone suggest a simpler method to extract features from the image(I am now shifting to MATLAB as I am now only concerned with simulations in webots and not the real robot). What sort of features can I choose? The patterns are very simple (L,an inverted L and its reflected versions are the 4 patterns)
Neural networks CAN fail to learn a function; this is most often caused by employing a network topology which is too simple to model the necessary function. A classic example of this case is attempting to learn an XOR function using a perceptron classifier, although it can even happen in multilayer neural nets sometimes; especially for complex tasks like image recognition. See my previous answer for a rough guide on how to select neural network parameters (ignore the convolution stuff if you want, although I would highly recommened looking into convolutional neural networks if you are still having problems).
It is a possiblity that there is too little seperability between classes, although I doubt that this is the case given your current features. Is there a reason that your network needs to allow an image to be four classifications simultaneously? If not, then perhaps you could classify the input as the output with the highest activation instead of all those with high activations.

Pattern recognition in Neural Network using matlab simulation

I am new to this neural network in matlab. I wanted to create a Neural Network using matlab simulation.
This matlab simulation is using pattern recognition.
I am running on a windows XP platform.
For example, I have a sets of waveforms of circular shape.
I have extracted out the poles.
These poles will teach my Neural Network that it is circular in shape, hence whenever I input another set of slightly different circular shape waveform, the Neural Network is able to distinguish between the shape.
Currently, I have extracted the poles of these 3 shapes, cylinder, circle and rectangle.
But I am clueless of how I should go about creating my Neural Network.
I'd recommend utilizing SOM (Self-organizing map) for pattern recognition since it's really robust. Also there's a Som Toolbox for Matlab you might be interested in. However, to make it learn waves while neglecting their offsets, you'd need to make some changes to the "similarity function". These changes will affect quite a lot on the SOM's training time but if that's not a problem, keep reading.
For the SOM you'll have to sample your waves to constant sized vectors, let say:
sin x -> sin_vector = (a1, a2, a3, ..., aN)
cos x -> cos_vector = (b1, b2, b3, ..., bN)
Usually similarity of "SOM-vectors" is calculated with euclidian distance. Euclidian distance of those two vectors is huge since they have a different offset. In your case they should be considered to be similar ie. distance to be small. So.. if you don't sample all the similar waves from the same starting point, they will be classified in different classes. That is probably a problem. But! Similarity of vectors in SOM is calculated in order to find the BMU (best-matching unit) from the map and pulling the BMU's and its neigborhood's vectors torwards the values of the given sample. So all you need to change is the way to compare those vectors and the way to pull the vectors' values torwards the sample so that both will be "offset-tolerent".
Slow but working solution is first finding the best offset index for each vector. Best offset index is the one that will produce the smallest value with euclidian distance for the sample. Smallest distance calculated with some node of the net will then be the BMU. Then the BMU's and its neigborhood's vectors are pulled torwards the given sample using the offset index calculated for each node just before. Everything else should work out-of-the-box.
This solution is relatively slow but should work great. I'd recommend studying the consept of SOM thoroughly and then reading this post (and angry comments) again :)
PLEASE comment if you know some mathematical solution that would be better than that previous one!
You can try to use Matlab's Neural network pattern recognition tool nprtool as it is specialize to train and test neural network for pattern recognition.