Theanets: Removing individual connections

How do you remove connections in Theanets? I'd like to create custom connectivity between an input layer, a single hidden layer, and an output layer. But the only defaults are feedforward all-to-all architectures or recurrent architectures. I'd like to remove specific connections from the all-to-all connectivity and then train the network.
Thanks in advance.

(Developer of theanets here.)
This is currently not directly possible with theanets. For computational efficiency, the underlying computations in feedforward networks are implemented as dense matrix operations, which are fast and can be executed on a GPU for sometimes dramatic speedups.
You can, however, initialize the weights in a layer so that some (or many) of the weights are zero. To do this, just pass a dictionary in the layers list, and include a sparsity key:
import theanets

net = theanets.Autoencoder(
    layers=(784, dict(size=1000, sparsity=0.9), 784))
This initializes the weights for the layer so that the given fraction of weights are zeros. The weights remain eligible for change during training, however, so this is only an initialization trick.
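Conceptually, the initialization does something like this (a NumPy illustration of the idea, not the actual theanets internals):

import numpy as np

# Zero out ~90% of an initial weight matrix, as sparsity=0.9 would.
w = np.random.randn(784, 1000)
w *= np.random.rand(784, 1000) >= 0.9   # keep roughly 10% of the weights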
Alternatively, you can implement a custom Layer subclass that does whatever you like, as long as you stay within the Theano boundaries. You could, for instance, implement a type of feedforward layer that uses a mask to ensure that some weights remain zero during the feedforward computation, as sketched below.
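As a rough illustration, here is a minimal sketch of such a masked layer in plain Theano. This is not the theanets Layer API; the class name, activation, and shapes are assumptions for illustration only:

import numpy as np
import theano
import theano.tensor as T

class MaskedLayer:
    """Hypothetical dense layer whose weights are multiplied elementwise
    by a fixed binary mask, so masked-out connections stay exactly zero."""
    def __init__(self, n_in, n_out, mask):
        # mask: (n_in, n_out) array of 0s and 1s; a 0 removes that connection.
        maskf = mask.astype(theano.config.floatX)
        init = np.random.randn(n_in, n_out).astype(theano.config.floatX)
        self.w = theano.shared(init * maskf, name='w')
        self.b = theano.shared(np.zeros(n_out, dtype=theano.config.floatX), name='b')
        self.mask = T.constant(maskf)

    def output(self, x):
        # Applying the mask inside the graph keeps the effective weights
        # zero even after gradient updates change self.w.
        return T.nnet.sigmoid(T.dot(x, self.w * self.mask) + self.b)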
For more details you might want to ask on the theanets mailing list.

Related

weight update of one random layer in multilayer neural network using backpropagation?

In training multi-layer neural networks using back-propagation, the weights of all layers are updated in each iteration.
I am wondering what happens if we instead randomly select one layer and update only that layer's weights in each iteration of back-propagation.
How would this impact training time? Does model performance (the generalization capability of the model) suffer from this type of training?
My intuition is that the generalization capability will be the same and the training time will be reduced. Please correct me if I am wrong.
Your intuition is wrong. What you are proposing is block coordinate descent, and while it makes sense to do something like this when the gradients are not correlated, it does not make sense in this context.
The problem in NNs is that, due to the chain rule, you get the gradients of the preceding layers almost for free while you calculate the gradient for any single layer. Therefore, updating only one layer just discards this information for no good reason.
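To see why the earlier gradients come almost for free, here is a minimal NumPy sketch of backprop through two linear layers (the shapes and the squared-error loss are illustrative assumptions):

import numpy as np

x = np.random.randn(8, 4)    # batch of inputs
w1 = np.random.randn(4, 5)   # layer 1 weights
w2 = np.random.randn(5, 3)   # layer 2 weights
t = np.random.randn(8, 3)    # targets

h = x @ w1                   # forward pass
y = h @ w2
delta2 = y - t               # dL/dy for squared error (up to a constant)
grad_w2 = h.T @ delta2       # layer 2 gradient
delta1 = delta2 @ w2.T       # chain rule: delta propagated backward
grad_w1 = x.T @ delta1       # layer 1 gradient costs just two more matmuls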

Implementing FC layers as Conv layers

I understand that implementing a fully connected layer as a convolutional layer reduces the number of parameters, but does it increase computational speed? If yes, then why do people still use fully connected layers?
Convolutional layers are used for low-level reasoning like feature extraction. At this level, using a fully connected layer would be wasteful of resources, because so many more parameters have to be computed. If you have an image of size 32x32x3, every single neuron in a fully connected first layer would require 32*32*3 = 3072 weights. That many parameters are not required for low-level reasoning: features tend to have spatial locality in images, and local connectivity is sufficient for feature extraction. If you used a convolutional layer with 12 filters of size 3x3 (spanning the 3 input channels), you only need 12*3*3*3 = 324 weights, shared across all positions.
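To make that comparison concrete, a quick back-of-the-envelope sketch (the fully connected layer size of 128 is an assumption for illustration; biases are omitted):

h, w, c = 32, 32, 3
fc_neurons = 128                        # assumed fully connected layer size
fc_params = h * w * c * fc_neurons      # every input feeds every neuron
conv_filters, k = 12, 3
conv_params = conv_filters * k * k * c  # 12 filters of 3x3 over 3 channels
print(fc_params, conv_params)           # 393216 vs 324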
Fully connected layers are used for high-level reasoning. These are the layers in the network which determine the final output of the convolutional network. As the reasoning becomes more complex, local connectivity is no longer sufficient, which is why fully connected layers are used in later stages of the network.
Please read this for a more detailed and visual explanation

caffe SqueezeNet: where is the fully connected FC layer in prototxt

I am working on caffe SqueezeNet prototxt link.
I am just wondering: where is the FC layer? (I only see type: data, conv, relu, pooling, concat, SoftmaxWithLoss, and accuracy.)
The reason is that FC layers have a ton of parameters, accounting for the majority of the network's parameters in some architectures. The authors of SqueezeNet removed the FCs, replacing them with a convolutional layer and global average pooling.
The conv layer has a number of filters equal to the number of classes, processing the output of the previous layer into (roughly) one map per class. The pooling then averages the response of each of these maps. They end up with a flattened vector whose dimension equals the number of classes, which is then fed to the softmax layer.
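A minimal NumPy sketch of that classifier head (the map size of 13x13 and the 10 classes are illustrative assumptions):

import numpy as np

num_classes = 10
maps = np.random.randn(num_classes, 13, 13)    # one activation map per class
logits = maps.mean(axis=(1, 2))                # global average pooling
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over class scores
print(probs.shape)                             # (10,)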
With these modifications (not forgetting the Fire modules they proposed), they were able to significantly reduce the memory footprint.
I strongly recommend that you read the SqueezeNet paper.
SqueezeNet doesn't have fully connected layers; it uses global average pooling instead.

Can a convolutional neural network be built with perceptrons?

I was reading this interesting article on convolutional neural networks. It showed this image, explaining that for every receptive field of 5x5 pixels/neurons, the value of one hidden neuron is calculated.
We can think of max-pooling as a way for the network to ask whether a given feature is found anywhere in a region of the image. It then throws away the exact positional information.
So max-pooling is applied.
With multiple convolutional layers, it looks something like this:
But my question is: this whole architecture could be built with perceptrons, right?
For every convolutional layer, one perceptron is needed, with layers:
input_size = 5x5;
hidden_size = 10 (for example);
output_size = 1;
Then for every receptive field in the original image, the 5x5 area is fed into a perceptron to output the value of one neuron in the hidden layer. So basically this is done for every receptive field:
So the same perceptron is used 24x24 times to construct the hidden layer, because:
…we're going to use the same weights and bias for each of the 24×24 hidden neurons.
And this works from the hidden layer to the pooling layer as well, with input_size = 2x2 and output_size = 1. And in the case of a max-pool layer, it's just a max() function on an array, as in the sketch below.
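For instance, a minimal NumPy sketch of 2x2 max pooling over an assumed 24x24 feature map:

import numpy as np

fmap = np.random.rand(24, 24)                         # assumed feature map
pooled = fmap.reshape(12, 2, 12, 2).max(axis=(1, 3))  # 2x2 max-pool -> 12x12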
and then finally:
The final layer of connections in the network is a fully-connected layer. That is, this layer connects every neuron from the max-pooled layer to every one of the 10 output neurons.
which is a perceptron again.
So my final architecture looks like this:
-> 1 perceptron for every convolutional layer/feature map
-> run this perceptron over every receptive field to create the feature map
-> 1 perceptron for every pooling layer
-> run this perceptron over every field in the feature map to create the pooling layer
-> finally, feed the values of the pooling layer into a regular all-to-all perceptron
Or am I overlooking something? Or is this already how they are programmed?
The answer very much depends on what exactly you call a Perceptron. Common options are:
The complete architecture. Then no, simply because a CNN is by definition a different NN.
A model of a single neuron, specifically y = 1 if (w.x + b) > 0 else 0, where x is the input of the neuron, w and b are its trainable parameters, and w.x denotes the dot product. Then yes, you can force a bunch of these perceptrons to share weights and call it a CNN (see the sketch after this list). You'll find variants of this idea being used in binary neural networks.
A training algorithm, typically associated with the Perceptron architecture. This reading makes little sense for the question, because the learning algorithm is in principle orthogonal to the architecture. Moreover, you cannot really use the Perceptron algorithm for anything with hidden layers, which suggests no as the answer in this case.
The loss function associated with the original Perceptron. This notion of Perceptron is orthogonal to the problem at hand; your loss function with a CNN is given by whatever you are trying to do with your whole model. You could conceivably use it, but it is non-differentiable, so good luck :-)
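Here is a minimal NumPy sketch of option 2, using the question's numbers (28x28 image, 5x5 receptive fields, random shared weights, all assumptions for illustration): one thresholded neuron with shared weights and bias, slid over every receptive field to build a 24x24 feature map, which is exactly a convolution with a hard-threshold activation.

import numpy as np

w = np.random.randn(5, 5)        # shared weights
b = 0.0                          # shared bias
image = np.random.rand(28, 28)

feature_map = np.zeros((24, 24))
for i in range(24):
    for j in range(24):
        field = image[i:i+5, j:j+5]
        # classic perceptron activation: hard threshold on w.x + b
        feature_map[i, j] = 1.0 if (w * field).sum() + b > 0 else 0.0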
A sidenote rant: you will see people refer to feed-forward, fully connected NNs with hidden layers as "Multilayer Perceptrons" (MLPs). This is a misnomer: there are no Perceptrons in MLPs (see e.g. this discussion on Wikipedia), unless you go explore some really weird ideas. It would make more sense to call these networks multilayer linear logistic regressions, because that is what they used to be composed of, up until a few years ago.

Artificial Neural Networks: Choosing initial neurons

How is the initial structure (the neurons and the connections between them) chosen? My book only states that we give the connections random weights in the beginning, before we train the network.
I think that we would add neurons during the training, like this:
Start with a completely empty network.
The first value encountered during training will not yet exist in the network.
Add a neuron to correspond to this value, with a random weight.
What you are after is a self-organizing ANN. Usually, the way the connections are organized is hand-designed: the developer builds a model that they think will have sufficient power to perform the necessary computation. You can of course start with a random selection of nodes with random connections, but the evolution of such a network will probably take a lot longer than training a standard two- or three-layer network.
So, yes, you are right that you would use a similar approach when building a self-organizing network. Keep two genetic algorithms, one for the structure and one for the weights (or combine the two in some devious way), and evolve as you please.
I do not believe the question is about self-organising or GA-evolved ANNs. It sounds more like it is about the most common ANN: a perceptron (single- or multi-layer), in which case the structure of the network (the number of layers and the size of each layer) must be chosen by hand at the beginning. A simple rule of thumb for initialising the weights is to pick uniformly random values between -1.0 and 1.0, as in the sketch below.
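For instance, a minimal NumPy sketch of that rule of thumb (the layer sizes are illustrative assumptions): a hand-chosen structure with weights drawn uniformly from [-1.0, 1.0].

import numpy as np

sizes = [4, 8, 3]   # hand-chosen: input, hidden, and output layer sizes
weights = [np.random.uniform(-1.0, 1.0, size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]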