Implementing FC layers as Conv layers - neural-network

I understand that implementing a fully connected layer as a convolutional layer reduces the number of parameters, but does it also increase computational speed? If yes, why do people still use fully connected layers?

Convolutional layers are used for low-level reasoning such as feature extraction. At this level a fully connected layer would waste resources, because it has far more parameters to learn. For an input image of size 32x32x3, each neuron of a fully connected first layer needs 32*32*3 = 3072 weights, one per input value. That many parameters are not needed for low-level reasoning: features in images tend to be spatially local, and local connectivity is sufficient for feature extraction. With a convolutional layer of 12 filters of size 3x3 (each filter spans the 3 input channels), you only need 12*3*3*3 = 324 weights, no matter how large the image is.
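As a quick check of the arithmetic above: a fully connected neuron needs one weight per input value, while a 3x3 filter on an RGB input has 3x3x3 weights that are reused at every position. A minimal plain-Python sketch (biases ignored; the 12-neuron fully connected layer is a hypothetical size chosen to match the 12 filters):

```python
# First-layer weight counts for a 32x32x3 input (biases excluded).
H, W, C = 32, 32, 3

# A fully connected neuron sees every input value, so it needs one weight per value.
fc_weights_per_neuron = H * W * C

def fc_weights(num_neurons):
    # Total weights of a fully connected layer with num_neurons outputs
    return num_neurons * fc_weights_per_neuron

# A 3x3 filter covers all C channels but only a 3x3 window, and the same
# filter is shared across all spatial positions, so image size does not matter.
k, num_filters = 3, 12
conv_weights = num_filters * k * k * C

print(fc_weights_per_neuron)  # 3072
print(fc_weights(12))         # 36864
print(conv_weights)           # 324
```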
Fully connected layers are used for high-level reasoning. These are the layers in the network which determine the final output of the convolutional network. As the reasoning becomes more complex, local connectivity is no longer sufficient, which is why fully connected layers are used in later stages of the network.
Please read this for a more detailed and visual explanation
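To make "an FC layer implemented as a conv layer" concrete, the sketch below (NumPy, with hypothetical sizes and random values) reshapes a fully connected weight matrix into K filters that each cover the whole H x W x C input. A "valid" convolution with such a filter has exactly one output position, which is the same dot product the FC layer computes.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, K = 4, 4, 3, 5                      # hypothetical input size and output count

x = rng.standard_normal((H, W, C))           # one input feature map (height, width, channels)
w_fc = rng.standard_normal((K, H * W * C))   # FC weights acting on the flattened input
b = rng.standard_normal(K)

# Fully connected layer: flatten, then a single matrix-vector product.
y_fc = w_fc @ x.reshape(-1) + b

# The same weights viewed as K convolution filters of size H x W x C.
filters = w_fc.reshape(K, H, W, C)

# A filter as large as the input fits in only one position, so each filter
# yields one number: the elementwise product with the input, summed.
y_conv = np.array([np.sum(f * x) for f in filters]) + b

print(np.allclose(y_fc, y_conv))  # True
```

Both views hold K*H*W*C weights plus K biases, so this exact conversion does not by itself change the parameter count. The convolutional form mainly pays off when the network is applied to inputs larger than H x W: the filters then slide over the input and produce a spatial map of outputs in a single pass.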

Related

Does BatchNormalization count as a layer in a network?

Is BatchNormalizationLayer considered a layer in a neural network?
For example, if we say ResNet50 has 50 layers, does that mean some of those layers may be batch normalization layers?
When building models in Keras I considered it an extra, similar to a dropout layer or an “Activation” layer. But BatchNormalization has trainable parameters, so... I am confused.
In the deep learning literature, an X-layer network refers to the number of learnable layers that make up the representational capacity of the network.
Activation layers, normalization layers (such as LRN, BatchNorm, etc.) and downsampling layers (such as max pooling) are not counted.
Layers such as convolutional, recurrent and fully connected (FC) layers, which are responsible for the representational capacity of the network, are counted.
It really depends on how precisely you define a "layer"; this may vary between authors.
For your ResNet example it is pretty clear: Section 3.4, Implementation, describes the network, and there it says:
We adopt batch normalization (BN) right after each convolution and
before activation, [...].
So convolution and batch normalization are considered a single layer. Figure 3 in the paper shows ResNet34 without explicit batch normalization layers, and the layers add up to 34.
So in conclusion, the ResNet paper does not count batch normalization as extra layer.
Furthermore, Keras makes it easy to check this for many pretrained models, e.g.:
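import tensorflow as tf
# Builds ResNet50 with ImageNet weights (downloaded on first use)
resnet = tf.keras.applications.ResNet50()
# summary() prints the layer table itself and returns None, so no print() is needed
resnet.summary()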
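To make the counting rule concrete, here is a simplified plain-Python sketch. It assumes the stage depths [3, 4, 6, 3] from the ResNet paper's architecture table, places a BN after every convolution, and leaves out pooling and the 1x1 projection shortcuts (which the paper's layer count does not include either). Counting only conv and fc layers recovers 34 and 50.

```python
# Simplified layer list for a ResNet: only the layer types matter here.
def resnet_layers(blocks_per_stage, convs_per_block):
    layers = ["conv", "bn", "relu"]              # 7x7 stem convolution
    for n_blocks in blocks_per_stage:
        for _ in range(n_blocks):
            for _ in range(convs_per_block):
                # BN right after each convolution and before activation
                layers += ["conv", "bn", "relu"]
    layers.append("fc")                          # final fully connected classifier
    return layers

def count_learnable(layers):
    # Count only conv and fc layers, as the ResNet paper does
    return sum(1 for t in layers if t in ("conv", "fc"))

resnet34 = resnet_layers([3, 4, 6, 3], convs_per_block=2)  # basic blocks
resnet50 = resnet_layers([3, 4, 6, 3], convs_per_block=3)  # bottleneck blocks

print(count_learnable(resnet34))  # 34
print(count_learnable(resnet50))  # 50
print(resnet50.count("bn"))       # 49
```

The BN layers are present (and have trainable parameters), but they are simply not part of the count.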

What is the fully connected layer in GoogLeNet/ResNet50/ResNet101/Inception v2 and v3?

I'm working in MATLAB and trying to use the pretrained models listed above as feature extractors. In AlexNet and VGGNet the fully connected layer is clear (it is named 'fc7'), but in GoogLeNet/ResNet50/ResNet101/Inception v2 and v3 it is not. Could someone guide me? Also, what is the size of the features in these models? In AlexNet, for example, it is 4096.
In a typical classification CNN, the fully connected layer sits at the end of the network, where it processes the features extracted by the convolutional layers. If you inspect net.Layers, you will see that MATLAB labels it "Fully Connected" (in ResNet50 it is fc1000). It is followed by a softmax layer and a classification output layer.
The size of the features depends on the convolutional part used for feature extraction. In AlexNet, several fully connected layers are stacked (fc6, fc7, fc8). I think you can get the features by taking the output of the layer just before the first fully connected layer and flattening it into a vector; in ResNet50 that is the layer feeding fc1000.
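In ResNet50 the layer feeding fc1000 is a global average pooling over the last 7x7x2048 feature map (for a 224x224 input), so the feature vector is 2048-dimensional (ResNet101 is also 2048; GoogLeNet's is 1024). A NumPy sketch of that last step, not the MATLAB API; random values stand in for real activations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for ResNet50's last convolutional output for one image:
# a 7x7 spatial grid with 2048 channels.
last_conv = rng.standard_normal((7, 7, 2048))

# Global average pooling collapses the spatial grid, leaving one value per channel.
features = last_conv.mean(axis=(0, 1))

# fc1000 maps the 2048-dim feature vector to 1000 class scores (random weights here).
w_fc1000 = rng.standard_normal((1000, 2048)) * 0.01
scores = w_fc1000 @ features

print(features.shape)  # (2048,)
print(scores.shape)    # (1000,)
```

So the feature size is set by the channel count of the last convolutional block, not by a 4096-unit layer as in AlexNet.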

caffe SqueezeNet: where is the fully connected FC layer in prototxt

I am working with the Caffe SqueezeNet prototxt (link).
I am just wondering: where is the FC layer? (I only see the types data, conv, relu, pooling, concat, SoftmaxWithLoss and accuracy.)
SqueezeNet has no FC layer. The reason is that FC layers have a huge number of parameters, accounting for the majority of the network's parameters in some architectures. The authors of SqueezeNet removed the FC layers and replaced them with a convolutional layer followed by global average pooling.
The conv layer has as many filters as there are classes, turning the output of the previous layer into (roughly) one map per class. The pooling then averages each of these maps, producing a flattened vector whose dimension equals the number of classes, which is fed to the softmax layer.
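A NumPy sketch of this conv + global-average-pooling head. The sizes (a 13x13x512 input map, 10 classes) and the 1x1 kernel are hypothetical choices for illustration, and random values stand in for real activations:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, num_classes = 13, 13, 512, 10            # hypothetical sizes

x = rng.standard_normal((H, W, C))                 # output of the last feature layer
w = rng.standard_normal((C, num_classes)) * 0.01   # 1x1 conv weights
b = np.zeros(num_classes)

# 1x1 convolution: the same C -> num_classes linear map at every pixel,
# giving one H x W map per class. matmul broadcasts over the (H, W) axes.
class_maps = x @ w + b                             # shape (H, W, num_classes)

# Global average pooling: average each class map down to a single score.
logits = class_maps.mean(axis=(0, 1))              # shape (num_classes,)

# Softmax turns the scores into class probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Weights of the 1x1 conv vs. an FC layer on the flattened H x W x C map.
print(w.size, H * W * C * num_classes)             # 5120 865280
```

Because both steps are linear, the head gives the same scores as an FC layer applied to the channel-averaged features, yet its weight count does not grow with the spatial size of the map.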
With these modifications (not forgetting the Fire modules they proposed) they were able to significantly reduce the memory footprint.
I strongly recommend that you read the SqueezeNet paper.
SqueezeNet doesn't have fully connected layers; it uses global average pooling instead.

Where do filters/kernels for a convolutional network come from?

I've seen some tutorial examples, like the UFLDL convolutional net, where they use features obtained by unsupervised learning, and others where the kernels are engineered by hand (using Sobel and Gabor detectors, different sharpness/blur settings, etc.). Strangely, I can't find a general guideline on how to choose a good kernel for anything more than a toy network. For example, in a deep network with many convolution-pooling layers, are the same kernels used at each layer, or does each layer have its own kernel subset? If so, where do the deeper layers' filters come from? Should I learn them with some unsupervised learning algorithm on data passed through the first convolution-and-pooling layer pair?
I understand that this question doesn't have a single answer; I'd be happy just to learn the general approach (a review article would be fantastic).
The current state of the art suggests learning all the convolutional layers from the data using backpropagation (ref).
Also, this paper recommends small kernels (3x3) and pooling (2x2). You should train different filters for each layer.
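One common argument for small kernels: stacking 3x3 layers reaches the receptive field of a larger kernel with fewer weights (and adds a nonlinearity between layers). A plain-Python sketch of the arithmetic, assuming stride 1, no biases, and a hypothetical channel count C kept equal for input and output:

```python
# Receptive field and weight count of n stacked 3x3 conv layers,
# compared with a single conv layer that has the same receptive field.
def stacked_3x3(n, C):
    receptive_field = 2 * n + 1           # each extra 3x3 layer widens the field by 2
    stacked_weights = n * 3 * 3 * C * C   # n layers, each with C filters of size 3x3xC
    single_weights = receptive_field ** 2 * C * C
    return receptive_field, stacked_weights, single_weights

for n in (2, 3):
    print(n, stacked_3x3(n, C=64))
# 2 (5, 73728, 102400)
# 3 (7, 110592, 200704)
```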
Kernels in deep networks are mostly trained all at the same time in a supervised way (known inputs and outputs of network) using Backpropagation (computes gradients) and some version of Stochastic Gradient Descent (optimization algorithm).
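To illustrate that kernels are learned rather than hand-designed, the NumPy sketch below recovers a 3x3 kernel from known input/output pairs. It is a toy, single-layer case: the target is a Sobel filter chosen only for illustration, the gradient is written out by hand (what backpropagation would compute here), and plain full-batch gradient descent stands in for SGD to keep it short.

```python
import numpy as np

def correlate2d_valid(img, k):
    """'Valid' 2-D cross-correlation, which is what CNN 'convolution' layers compute."""
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)

# Kernel that produced the training targets (a Sobel x-derivative filter).
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

# Known inputs and outputs: random images and their filtered versions.
images = rng.standard_normal((20, 8, 8))
targets = np.array([correlate2d_valid(im, sobel_x) for im in images])

kernel = np.zeros((3, 3))   # start from an uninformative kernel
lr = 0.1
for step in range(100):
    grad = np.zeros_like(kernel)
    for im, t in zip(images, targets):
        err = correlate2d_valid(im, kernel) - t   # prediction error, shape (6, 6)
        # Gradient of the mean squared error w.r.t. the kernel:
        # a correlation of the input with the error map.
        grad += 2 * correlate2d_valid(im, err) / err.size
    kernel -= lr * grad / len(images)

print(np.round(kernel, 2))
```

In a real network every layer's kernels are updated this way at the same time, with the gradients for deeper layers obtained by backpropagating through the layers above them.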
Kernels in different layers are usually independent. They can have different sizes and their numbers can differ as well. How to design a network is an open question and it depends on your data and the problem itself.
If you want to work with your own dataset, you should start with an existing pre-trained network [Caffe Model Zoo] and fine-tune it on your dataset. This way the architecture of the network is fixed, since you have to respect the architecture of the original network. The networks you can download are trained on very large problems, which lets them generalize well to other classification/regression problems. If your dataset is at least partly similar to the original dataset, the fine-tuned networks should work very well.
A good place to get more information is the Caffe CVPR2015 tutorial.

Theanets: Removing individual connections

How do you remove connections in Theanets? I'd like to create custom connectivity between an input layer, a single hidden layer, and an output layer. But the only defaults are feedforward all-to-all architectures or recurrent architectures. I'd like to remove specific connections from the all-to-all connectivity and then train the network.
Thanks in advance.
(Developer of theanets here.)
This is currently not directly possible with theanets. For computational efficiency the underlying computations in feedforward networks are implemented as simple matrix operations, which are fast and can be executed on a GPU for sometimes dramatic speedups.
You can, however, initialize the weights in a layer so that some (or many) of the weights are zero. To do this, just pass a dictionary in the layers list, and include a sparsity key:
import theanets
# Autoencoder with a 1000-unit hidden layer whose weights start out 90% zeros
net = theanets.Autoencoder(
    layers=(784, dict(size=1000, sparsity=0.9), 784))
This initializes the weights for the layer so that the given fraction of weights are zeros. The weights are, however, eligible for change during the training process, so this is only an initialization trick.
Alternatively, you can implement a custom Layer subclass that does whatever you like, as long as you stay within Theano's boundaries. For instance, you could implement a feedforward layer that multiplies its weights by a mask, so that some weights remain zero during the feedforward computation.
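The masking idea can be sketched in NumPy (this is not theanets or Theano code; the mask, sizes, and training loop are hypothetical). Because the mask is applied inside the forward pass, removed connections contribute nothing, and masking the gradient as well leaves those weights untouched during training:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 4, 3

# Hypothetical connectivity mask: 1 keeps a connection, 0 removes it.
mask = np.array([[1, 0, 1],
                 [1, 1, 0],
                 [0, 1, 1],
                 [1, 0, 0]], dtype=float)

W = rng.standard_normal((n_in, n_out))
W0 = W.copy()                              # kept only to check the removed weights later
x = rng.standard_normal((8, n_in))         # a small batch of inputs
target = rng.standard_normal((8, n_out))

def forward(x, W):
    # Removed connections contribute nothing, whatever value W holds there.
    return x @ (W * mask)

for _ in range(50):
    err = forward(x, W) - target
    # Gradient of half the mean squared error w.r.t. the masked weights,
    # masked again so removed weights receive zero gradient.
    grad = (x.T @ err) / len(x) * mask
    W -= 0.1 * grad

effective = W * mask
print(np.all(effective[mask == 0] == 0))   # True
```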
For more details you might want to ask on the theanets mailing list.