neural net width, depth - neural-network

Judging by the answers to questions about the width and depth of neural nets, selecting these values is more an art than a science. Some nets are even built to grow their own layer sizes on the fly, to avoid making these choices at all.
Why?
The obvious factors would be input data width and output data width. Adding complexity: each input and output would have disallowed states that cannot occur in real-world data sets. Input and output data sets are usually very finite in size, though the full range of real possibilities may not always be intuitive.
Couldn't a neural net be constructed to recommend better layer widths and depths than simple guesswork? That would seem a natural fit for an AI researcher's toolbox.

Related

How does upsampling in Fully Connected Convolutional network work?

I read several posts / articles and have some doubts about the mechanism of upsampling after the CNN's downsampling.
I took the first answer from this question:
https://www.quora.com/How-do-fully-convolutional-networks-upsample-their-coarse-output
I understood that, similar to a normal convolution operation, the "upsampling" also uses kernels which need to be trained.
Question 1: if the "spatial information" is already lost during the first stages of the CNN, how can it be reconstructed in any way?
Question 2: Why is it that "upsampling from a small (coarse) feature map deep in the network has good semantic information but bad resolution, while upsampling from a larger feature map closer to the input will produce better detail but worse semantic information"?
Question #1
Upsampling doesn't (and cannot) reconstruct any lost information. Its role is to bring the resolution back up to the resolution of the previous layer.
Theoretically, we could eliminate the down/up-sampling layers altogether. However, to reduce the number of computations, we can downsample the input before a layer and then upsample its output.
Therefore, the sole purpose of the down/up-sampling layers is to reduce the computations in each layer, while keeping the dimensions of the input/output as before.
You might argue that the downsampling could cause information loss. That is always a possibility, but remember that the role of a CNN is essentially to extract "useful" information from the input and reduce it into a smaller dimension.
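A minimal sketch of this idea (using PyTorch, which is my own choice here, not something from the original answer): a pooling layer halves the resolution, the convolution then runs on the cheaper, smaller map, and a learned transposed convolution brings the resolution back, so the block's input and output sizes match.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 16, 64, 64)                               # one 64x64 feature map with 16 channels

    down = nn.MaxPool2d(kernel_size=2)                           # downsample: 64x64 -> 32x32
    conv = nn.Conv2d(16, 16, kernel_size=3, padding=1)           # does its work on the cheaper 32x32 map
    up = nn.ConvTranspose2d(16, 16, kernel_size=2, stride=2)     # learned upsampling: 32x32 -> 64x64

    y = up(conv(down(x)))
    print(x.shape, y.shape)                                      # both torch.Size([1, 16, 64, 64])

The upsampling kernel in up is trained just like any other convolution kernel, which matches the point made in the question.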
Question #2
As we go from the input layer in CNN to the output layer, the dimension of data generally decreases while the semantic and extracted information hopefully increases.
Suppose we have a CNN for image classification. In such a CNN, the early layers usually extract the basic shapes and edges in the image. The next layers detect more complex concepts like corners and circles. You can imagine that the very last layers might have nodes that detect very complex features (like the presence of a person in the image).
So upsampling from a large feature map close to the input produces better detail but carries less semantic information than the last layers. Conversely, the last layers generally have lower dimensions, hence their resolution is worse compared to the early layers.
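To make the resolution side of that trade-off concrete, here is a toy sketch (PyTorch, my own example rather than anything from the answer) that prints the spatial size of the feature maps as depth increases; the deeper maps are smaller (coarser) even though they encode higher-level features.

    import torch
    import torch.nn as nn

    net = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # edges, basic shapes
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # corners, circles, ...
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # more complex concepts
    )

    x = torch.randn(1, 3, 224, 224)
    for layer in net:
        x = layer(x)
        if isinstance(layer, nn.MaxPool2d):
            print(x.shape)        # 112x112 -> 56x56 -> 28x28: resolution drops as depth grows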

How does a neural network produce an array?

Background
I've been studying Neural Networks, specifically the implementation provided by this incredible online book. In the example network provided, we're shown how to create a neural network that classifies the MNIST training data to perform Optical Character Recognition (OCR).
The network is configured so that the input stimuli represent a discrete range of thresholded pixel data from a 28x28 image; at the output, we have ten signal paths, one for each of the possible solutions for the input images; these are used to classify a handwritten digit from zero to nine. In this implementation, a handwritten '3' would drive a strong signal down the output path corresponding to '3'.
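As a rough sketch of that shape of network (plain NumPy with made-up weights; this is not the book's actual code), the 784 pixel inputs feed forward to ten output activations, and the strongest output path names the digit:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # 784 thresholded pixels -> 30 hidden units -> 10 output "signal paths" (digits 0-9)
    W1, b1 = np.random.randn(30, 784) * 0.1, np.zeros(30)
    W2, b2 = np.random.randn(10, 30) * 0.1, np.zeros(10)

    def classify(pixels):                      # pixels: flat vector of 784 values
        hidden = sigmoid(W1 @ pixels + b1)
        output = sigmoid(W2 @ hidden + b2)     # ten activations, one per digit
        return int(np.argmax(output))          # the strongest path names the digit

    print(classify(np.random.rand(784)))       # untrained weights, so this is just a guess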
Now, I've seen that Neural Networks can be applied to far more 'unpredictable' output solutions; for example, take the team who taught a network to recognize the hair on a human.
Question
Surely in the application above we couldn't use a fixed output array length, because the number of points that would qualify within an image would vary so wildly between different samples. Can anyone recommend what kind of pattern would have been used to accomplish this?
Assumption
In the interest of completeness, I'm going to propose that the team could have employed a kind of 'line-following robot' for the classification task. For an input image, the network could be trained to issue a small range of discrete commands (LEFT, RIGHT, UP, DOWN) for a fixed period t, controlling the robot like an Etch-a-Sketch.
Alternatively, we could implement a network which would map pixels one-to-one, and define whether individual pixels contributed to hair; but this wouldn't be compatible with different image resolutions.
So, do either of these solutions sound plausible? If so, are they basic implementations of a known generic solution for this kind of problem? What approach would you use?

Can I use more than ten inputs with a single-layer neural network to separate data into two categories?

I have pattern data with 12 categories, and I want to separate these data into two categories. Can anyone tell me whether this is possible with a single-layer neural network with 12 input values plus the bias term? I also implemented it in MATLAB, but I have some doubts about what the best initial weight values (range) and learning rate would be. Can you please guide me on these points?
Is a single layer enough?
Whether a single hidden layer suffices to correctly label your input data depends on the complexity of your data. You should empirically try different topologies (combinations of layers and number of neurons) until you discover a setting that works for you.
What are the best weight ranges?
The recommended weight range depends on the activation function you intend to use. For the sigmoid function, it is a small interval centered around 0, e.g. [-0.1, 0.1].
What is the ideal learning rate?
The learning rate is often set to a small value such as 0.03, but if the data is easily learned by your network you can often increase the rate drastically, e.g. to 0.3. Check out this discussion on how learning rates affect the learning process: https://stackoverflow.com/a/11415434/1149632
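Putting those two recommendations together, here is a minimal sketch (NumPy rather than MATLAB, with made-up data) of a single-layer network with 12 inputs plus a bias, weights initialised in [-0.1, 0.1] and a learning rate of 0.03:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((200, 12))                  # 200 made-up samples with 12 input values
    y = (X.sum(axis=1) > 6).astype(float)      # toy binary target (two categories)

    w = rng.uniform(-0.1, 0.1, size=12)        # weights in the recommended small range
    b = rng.uniform(-0.1, 0.1)                 # bias term
    lr = 0.03                                  # small learning rate to start with

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(500):
        p = sigmoid(X @ w + b)                 # predictions in (0, 1)
        grad = p - y                           # cross-entropy gradient w.r.t. the pre-activation
        w -= lr * X.T @ grad / len(y)          # gradient-descent step on the weights
        b -= lr * grad.mean()                  # and on the bias

    print(((p > 0.5) == y).mean())             # training accuracy after 500 epochs

If learning stalls you can raise lr toward 0.3 as suggested above; if it oscillates, lower it again.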
A side note
You should search the Web for a few pointers and tips, and post more to-the-point questions on Stack Overflow.
Check this out:
http://www.willamette.edu/~gorr/classes/cs449/intro.html

Using a learned Artificial Neural Network to solve inputs

I've recently been delving into artificial neural networks again, both evolved and trained. I had a question regarding what methods exist, if any, to solve for inputs that would result in a target output set. Is there a name for this? Everything I look for leads me to backpropagation, which isn't necessarily what I need. In my search, the closest thing I've come to expressing my question is:
Is it possible to run a neural network in reverse?
This told me that there would indeed be many solutions for networks with varying numbers of nodes per layer, and that they would not be trivial to solve for. I had the idea of simply marching toward an ideal set of inputs using the weights established during learning. Does anyone else have experience doing something like this?
In order to elaborate:
Say you have a network with 401 input nodes, representing a 20x20 grayscale image plus a bias, two hidden layers of 100 and 25 nodes, and 6 output nodes representing a classification (symbols, Roman numerals, etc.).
After training the neural network so that it classifies with an acceptable error, I would like to run the network backwards. This would mean I input the classification I would like to see at the output, and the network would imagine a set of inputs that would result in that output. For the Roman numeral example, this could mean I would ask it to run the net in reverse for the symbol 'X', and it would generate an image resembling what the net thought an 'X' looked like. In this way, I could get a good idea of the features it learned to separate the classifications. I feel it would be very beneficial in understanding how ANNs function and learn in the grand scheme of things.
For a simple feed-forward, fully connected NN, it is possible to project a hidden unit's activation into pixel space by taking the inverse of the activation function (for example the logit for sigmoid units), dividing it by the sum of the incoming weights, and then multiplying that value by the weight of each pixel. That gives a visualization of the average pattern recognized by this hidden unit. Summing up these patterns for each hidden unit results in the average pattern that corresponds to this particular set of hidden unit activities. The same procedure can in principle be applied to project output activations into hidden unit activity patterns.
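My reading of that procedure, as a rough NumPy sketch with made-up weights (the details, such as how the division by the weight sum is handled, are an assumption on my part):

    import numpy as np

    def logit(a):
        return np.log(a / (1.0 - a))           # inverse of the sigmoid activation

    # W: incoming weights of the hidden layer, shape (n_hidden, n_pixels); made up here
    W = np.random.randn(25, 400)
    hidden_activity = np.random.uniform(0.05, 0.95, size=25)   # some hidden unit activations

    pattern = np.zeros(400)
    for j, a in enumerate(hidden_activity):
        # inverse activation, divided by the sum of incoming weights,
        # then spread over the pixels in proportion to each pixel's weight
        pattern += logit(a) / W[j].sum() * W[j]

    image = pattern.reshape(20, 20)            # average pattern for this set of activities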
This is indeed useful for analyzing what features the NN learned in image recognition. For more complex methods you can take a look at this paper (among other things, it contains examples of patterns that a NN can learn).
You cannot exactly run a NN in reverse, because it does not remember all the information from the source image - only the patterns it has learned to detect. So the network cannot "imagine a set of inputs". However, it is possible to sample a probability distribution (taking the weight as the probability of activation of each pixel) and produce a set of patterns that can be recognized by a particular neuron.
I know that you can, and I am working on a solution now. I have some code on my GitHub here for imagining the inputs of a neural network that classifies the handwritten digits of the MNIST dataset, but I don't think it is entirely correct. Right now, I simply take a trained network and my desired output and multiply backwards by the learned weights at each layer until I have a value for the inputs. This skips over the activation function and may have some other errors, but I am getting fairly reasonable images out of it. For example, this is the result of the trained network imagining a 3: number 3
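For reference, the core of that backwards pass is roughly the following (a NumPy sketch of the idea with made-up weights, not the actual linked code); it simply transposes each weight matrix and, as described, skips the activation functions:

    import numpy as np

    # made-up trained weights for a 784 -> 100 -> 10 network
    W1 = np.random.randn(100, 784)
    W2 = np.random.randn(10, 100)

    desired = np.zeros(10)
    desired[3] = 1.0                   # "imagine" the digit 3

    hidden = W2.T @ desired            # push the target back through the output weights
    pixels = W1.T @ hidden             # and again through the first layer's weights

    image = pixels.reshape(28, 28)     # rough visualisation, since activations are ignored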
Yes, you can run a probabilistic NN in reverse to get it to 'imagine' inputs that would match an output it's been trained to categorise.
I highly recommend Geoffrey Hinton's coursera course on NN's here:
https://www.coursera.org/course/neuralnets
He demonstrates in his introductory video a NN imagining various "2"s that it would recognise having been trained to identify the numerals 0 through 9. It's very impressive!
I think it's basically doing exactly what you're looking to do.

How do neural networks handle large images where the area of interest is small?

If I've understood correctly, when training neural networks to recognize objects in images it's common to map a single pixel to a single input layer node. However, sometimes we might have a large picture with only a small area of interest. For example, if we're training a neural net to recognize traffic signs, we might have images where the traffic sign covers only a small portion, while the rest is taken up by the road, trees, sky etc. Creating a neural net which tries to find a traffic sign at every position seems extremely expensive.
My question is, are there any specific strategies to handle these sort of situations with neural networks, apart from preprocessing the image?
Thanks.
Using 1 pixel per input node is usually not done. What enters your network is the feature vector, and as such you should input actual features, not raw data. Inputting raw data (with all its noise) will not only lead to bad classification, but training will also take longer than necessary.
In short: preprocessing is unavoidable. You need a more abstract representation of your data. There are hundreds of ways to deal with the problem you're asking about. Let me give you some popular approaches.
1) Image processing to find regions of interest. When detecting traffic signs, a common strategy is to use edge detection (i.e. convolution with some filter), apply some heuristics, use a threshold filter and isolate regions of interest (blobs, strongly connected components etc.), which are taken as input to the network (see the sketch after this list).
2) Applying features without any prior knowledge or image processing. Viola/Jones use a specific image representation, from which they can compute features in a very fast way. Their framework has been shown to work in real-time. (I know their original work doesn't state NNs but I applied their features to Multilayer Perceptrons in my thesis, so you can use it with any classifier, really.)
3) Deep Learning.
Learning better representations of the data can be incorporated into the neural network itself. These approaches are among the most actively researched at the moment. Since this is a very large topic, I can only give you some keywords so that you can research it on your own. Autoencoders are networks that learn efficient representations; it is possible to use them with conventional ANNs. Convolutional Neural Networks seem a bit sophisticated at first sight, but they are worth checking out. Before the actual classification, they have alternating layers of subwindow convolution (edge detection) and resampling. CNNs currently achieve some of the best results in OCR.
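As a concrete illustration of approach 1), here is a minimal OpenCV sketch (my own example, assuming OpenCV 4.x and a hypothetical file sign.png; the original answer names no specific tools) that finds candidate regions and crops them to a fixed size for the classifier:

    import cv2

    img = cv2.imread("sign.png")                             # hypothetical input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    edges = cv2.Canny(gray, 100, 200)                        # edge detection
    _, mask = cv2.threshold(edges, 127, 255, cv2.THRESH_BINARY)

    # connected regions become candidate regions of interest (OpenCV 4.x return convention)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    rois = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h > 400:                                      # heuristic: drop tiny blobs
            rois.append(cv2.resize(gray[y:y + h, x:x + w], (32, 32)))  # fixed-size net input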
In every scenario you have to ask yourself: am I 1) giving my ANN a representation that has all the data it needs to do the job (a representation that is not too abstract), and 2) keeping too much noise away (and thus staying abstract enough)?
We usually don't use a fully connected network to deal with images, because the number of units in the input layer would be huge. For images we have a specific kind of neural network, the Convolutional Neural Network (CNN).
However, the CNN plays the role of a feature extractor. The encoded features are finally fed into a fully connected network which acts as a classifier. In your case, I don't know how small your object is compared to the full image, but if the object of interest is really small, then even with a CNN the performance for image classification won't be very good. In that case we probably need to use object detection (which uses a sliding window) to deal with it.
If you want to recognize small objects in a large image, you should use a "scanning window".
For the "scanning window" you can apply dimension-reducing methods:
DCT (http://en.wikipedia.org/wiki/Discrete_cosine_transform)
PCA (http://en.wikipedia.org/wiki/Principal_component_analysis)
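A rough sketch of the combination (NumPy plus scikit-learn PCA, my own example; a DCT per window would be used analogously): slide a fixed-size window over the image and reduce each window to a small feature vector before handing it to the classifier.

    import numpy as np
    from sklearn.decomposition import PCA

    image = np.random.rand(480, 640)                 # made-up large grayscale image
    win, step = 32, 16                               # window size and stride

    windows = []
    for top in range(0, image.shape[0] - win + 1, step):
        for left in range(0, image.shape[1] - win + 1, step):
            patch = image[top:top + win, left:left + win]
            windows.append(patch.ravel())            # each window becomes a 1024-value vector

    windows = np.array(windows)
    features = PCA(n_components=50).fit_transform(windows)   # reduced input for the classifier
    print(features.shape)                            # (number_of_windows, 50)

In practice the PCA would be fitted once on training patches and then applied to each scanned window.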