What is the fully connected layer in GoogLeNet/ResNet50/ResNet101/Inception v2 and v3? - MATLAB

I'm working in MATLAB and trying to use the pretrained models cited above as feature extractors. In AlexNet and VGGNet the fully connected layer is clearly the one named 'fc7', but in GoogLeNet/ResNet50/ResNet101/Inception v2 and v3 it is not clear. Could someone guide me? Also, what is the size of the features in these models? In AlexNet, for example, it is 4096.

In any CNN, the fully connected layer can be spotted at the end of the network, as it processes the features extracted by the convolutional layers. If you access
net.Layers, you will see that MATLAB labels the fully connected layer "Fully Connected" (in ResNet-50 it is named fc1000). It is followed by a softmax layer and a classification output layer.
The size of the extracted features depends on the convolutional layers used for feature extraction. In AlexNet, several fully connected layers are stacked (fc6, fc7, fc8). You can obtain the features by flattening the output of the layer immediately before the first fully connected layer; in this case, the layer before fc1000. For reference, the feature size at that point is 1024 in GoogLeNet and 2048 in ResNet-50, ResNet-101, and Inception-v3, versus 4096 for AlexNet's fc7.
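To make the "flatten the output before the first fully connected layer" step concrete, here is a minimal, framework-agnostic sketch in plain Python (not MATLAB): in ResNet-50, a global average pool reduces each HxWxC activation map to a C-dimensional feature vector, and that vector is exactly what fc1000 consumes. The tiny 2x2x3 map below stands in for ResNet-50's real 7x7x2048 activation.

```python
def global_average_pool(fmap):
    """fmap: H x W x C nested lists -> length-C feature vector."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [sum(fmap[i][j][k] for i in range(h) for j in range(w)) / (h * w)
            for k in range(c)]

# Tiny 2x2x3 activation map standing in for ResNet-50's 7x7x2048 output
fmap = [[[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
        [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]]

features = global_average_pool(fmap)  # one value per channel: [1.0, 2.0, 3.0]
```

In MATLAB you would not implement this by hand; the pretrained networks already contain the pooling layer, so extracting activations at the layer just before the fully connected one gives you these features directly.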

Related

In MATLAB 2019b/2020a, when building a DNN, how can I reshape the output of a fully connected layer to a 2D shape so that a pretrained CNN can follow?

I am using the Deep Learning Toolbox to design a deep neural network. In the network, a 2D convolutional layer needs to follow a fully connected layer. But the deepNetworkDesigner doesn't allow such a structure, because the output of a fully connected layer is 1D. In other frameworks like Torch, the way to solve this is to reshape the output of the fully connected layer to be 2D. Is there a method to achieve this in MATLAB 2019b/2020a? Thank you.
If you are creating your network without using the Deep Network Designer
You can do this by creating a fully connected layer with an output size of 10 and the name 'fc1':
layer = fullyConnectedLayer(10,'Name','fc1')
Then, include it in your Layer array:
layers = [ ...
imageInputLayer([28 28 1])            % 28x28 grayscale input
convolution2dLayer(5,20)              % 20 filters of size 5x5
reluLayer
maxPooling2dLayer(2,'Stride',2)
fullyConnectedLayer(10,'Name','fc1')  % the layer created above
softmaxLayer
classificationLayer]
As you can see, the 2D convolutional layers can be followed by a fully connected layer of size 10 with no problems.
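As for the reshape itself: in 2019b/2020a one workaround is a custom layer whose predict method rearranges the fully connected layer's 1D output into an HxWxC array that a convolutional layer can consume. Conceptually the operation is just an index rearrangement, sketched here in plain Python (not MATLAB; dimensions are illustrative):

```python
def reshape_1d_to_2d(vec, h, w, c):
    """Rearrange a length h*w*c vector into an h x w x c nested-list map."""
    assert len(vec) == h * w * c, "vector length must equal h*w*c"
    return [[[vec[(i * w + j) * c + k] for k in range(c)]
             for j in range(w)]
            for i in range(h)]

v = list(range(12))               # stand-in for a fullyConnectedLayer(12) output
m = reshape_1d_to_2d(v, 2, 2, 3)  # 2x2 spatial map with 3 channels
# m[0][0] == [0, 1, 2] and m[1][1] == [9, 10, 11]
```

No values are changed, only their arrangement, which is why frameworks like Torch expose this as a cheap reshape operation.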
As the name suggests, all neurons in a fully connected layer connect
to all the neurons in the previous layer. This layer combines all of
the features (local information) learned by the previous layers across
the image to identify the larger patterns. For classification
problems, the last fully connected layer combines the features to
classify the images. This is the reason that the output size argument
of the last fully connected layer of the network is equal to the
number of classes of the data set. For regression problems, the output
size must be equal to the number of response variables.
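The quoted explanation can be made concrete: a fully connected layer is just the affine map y = W*x + b, where W has one row per output neuron, which is why the last layer's OutputSize must equal the number of classes. A plain-Python sketch (toy dimensions, not MATLAB):

```python
def fully_connected(W, b, x):
    """y = W*x + b, with W given as a list of rows (one row per output)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

num_classes, num_features = 3, 4
W = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 1]]        # 3 x 4 weight matrix: one row per class
b = [0, 0, 1]
x = [2, 3, 4, 5]          # feature vector from the previous layers

scores = fully_connected(W, b, x)  # one score per class: [2, 3, 10]
```

The softmax layer then turns these per-class scores into probabilities.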
Then, you can finish up your network by creating the layer graph as lgraph = layerGraph(layers)
If you have already designed your network using the Deep Network Designer
You can export the network's architecture to the workspace, and modify its output layer in two ways:
1. Using the GUI
In the Designer pane, drag a new fullyConnectedLayer from the Layer Library onto the canvas, and set OutputSize to the number of classes in the new data.
Delete the last fully connected layer and connect your new layer in its place.
If your goal is a classification network, also replace the output layer: scroll to the end of the Layer Library, drag a new classificationLayer onto the canvas, delete the original output layer, and connect your new layer in its place.
Check your network by clicking Analyze. The network is ready for training if the Deep Learning Network Analyzer reports zero errors.
2. Without using the GUI
On the Designer tab, click Export. Depending on the network architecture, Deep Network Designer exports the network as a LayerGraph lgraph.
Replace the final layer using replaceLayer:
newlgraph = replaceLayer(lgraph,'nameOfLayerToReplace',newLayer)
Use analyzeNetwork(newlgraph) to check whether the network is ready for training; if so, the Network Analyzer should report zero errors.

Implementing FC layers as Conv layers

I understand that implementing a fully connected layer as a convolutional layer reduces the number of parameters, but does it increase computational speed? If yes, then why do people still use fully connected layers?
Convolutional layers are used for low-level reasoning like feature extraction. At this level, a fully connected layer would waste resources, because many more parameters would have to be computed. If you have an image of size 32x32x3, a fully connected layer would require 32*32*3 = 3072 weights for every neuron in the first layer. That many parameters are not needed for low-level reasoning. Features tend to have spatial locality in images, and local connectivity is sufficient for feature extraction. If you instead used a convolutional layer with 12 filters of size 3x3 (each spanning the 3 input channels), you would only need 12*3*3*3 = 324 weights, shared across all spatial positions.
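The parameter counts above (biases ignored) can be reproduced with two one-line formulas; a quick plain-Python check:

```python
def fc_weights(in_size, out_neurons):
    """Weights in a fully connected layer: every input feeds every output."""
    return in_size * out_neurons

def conv_weights(kh, kw, in_channels, num_filters):
    """Weights in a conv layer: kernel area x input channels x filter count."""
    return kh * kw * in_channels * num_filters

in_size = 32 * 32 * 3                    # 3072 inputs from a 32x32x3 image
per_neuron = fc_weights(in_size, 1)      # 3072 weights for a single FC neuron
conv_total = conv_weights(3, 3, 3, 12)   # 324 weights for all 12 3x3 filters,
                                         # reused at every spatial position
```

Note the asymmetry: one FC neuron alone already costs more weights than the entire 12-filter convolutional layer.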
Fully connected layers are used for high-level reasoning. These are the layers in the network which determine the final output of the convolutional network. As the reasoning becomes more complex, local connectivity is no longer sufficient, which is why fully connected layers are used in later stages of the network.
Please read this for a more detailed and visual explanation

Feature Extraction from Convolutional Neural Network (CNN) and use this feature to other classification algorithm

My question is: can we use a CNN for feature extraction, and then use the extracted features as input to another classification algorithm like SVM?
Thanks
Yes, this has already been done and is well documented in several research papers, like CNN Features off-the-shelf: an Astounding Baseline for Recognition and How transferable are features in deep neural networks?. Both show that CNN features trained on one dataset but tested on a different one usually perform very well or even beat the state of the art.
In general you can take the features from the layer before the last, normalize them and use them with another classifier.
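The "normalize them" step is commonly L2 normalization, so that the downstream classifier (an SVM, for instance) sees vectors of comparable magnitude. A minimal plain-Python sketch of that step (the 2-d vector is a stand-in for a real 4096-d CNN feature vector):

```python
import math

def l2_normalize(v, eps=1e-12):
    """Scale a feature vector to unit L2 norm (eps guards against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]

feat = [3.0, 4.0]          # stand-in for a CNN feature vector
unit = l2_normalize(feat)  # approximately [0.6, 0.8], norm 1
```

After normalization, the vectors can be fed to any off-the-shelf classifier's fit/predict interface.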
Another related technique is fine-tuning, where after training a network, the last layer is replaced and retrained, but the previous layers' weights are kept fixed.

How do I use a pre-trained Caffe model?

I have some questions about how to actually interact with a pre-trained Caffe model. In my case I'm using a model for scene recognition.
In the caffe git repository, there are some code examples in Python and C++ on the implementations of Image Classifiers. However, those do not apply to my use case (since they only classify the input image as ONE class).
My goal is an application that takes an input image (jpg) and outputs the highest predicted class label for each pixel in the input image (i.e., indices for sky, beach, road, car).
Could anyone give me some pointers on how to proceed?
There already seem to be implementations for this. This demo (http://places.csail.mit.edu/demo.html) is kind of what I want.
Thank you!
What you are looking for is not image classification, but rather semantic segmentation.
A recent work by Jonathan Long, Evan Shelhamer, and Trevor Darrell is based on Caffe, and can be found here. It uses a fully convolutional network, that is, a network with no "InnerProduct" layers, only convolutional layers, and is thus capable of producing outputs of different sizes for inputs of different sizes.
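The reason a fully convolutional network handles variable input sizes is that a convolutional layer's output size follows directly from its input size, whereas an "InnerProduct" (fully connected) layer hard-codes the input size into its weight matrix. A plain-Python sketch of the standard output-size formula:

```python
def conv_out(in_size, kernel, stride=1, pad=0):
    """Spatial output size of a conv layer along one dimension."""
    return (in_size + 2 * pad - kernel) // stride + 1

a = conv_out(224, 3, stride=1, pad=1)  # 224 -> 224
b = conv_out(500, 3, stride=1, pad=1)  # 500 -> 500: same layer, larger map
```

The same layer weights apply to both inputs; only the output map grows, which is exactly what per-pixel (semantic segmentation) prediction needs.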

Where do filters/kernels for a convolutional network come from?

I've seen some tutorial examples, like the UFLDL convolutional net, where they use features obtained by unsupervised learning, and others where kernels are engineered by hand (using Sobel and Gabor detectors, different sharpness/blur settings, etc.). Strangely, I can't find a general guideline on how one should choose a good kernel for something more than a toy network. For example, considering a deep network with many convolutional-pooling layers, are the same kernels used at each layer, or does each layer have its own kernel subset? If so, where do these deeper layers' filters come from? Should I learn them using some unsupervised learning algorithm on data passed through the first convolution-and-pooling layer pair?
I understand that this question doesn't have a single answer; I'd be happy with just the general approach (a review article would be fantastic).
The current state of the art suggests learning all the convolutional layers from the data using backpropagation (ref).
Also, this paper recommends small kernels (3x3) and pooling (2x2). You should train different filters for each layer.
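One reason small 3x3 kernels are recommended (an argument made in the VGG line of work) is parameter efficiency: two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution but with fewer weights. A quick plain-Python check (bias terms ignored, C channels in and out):

```python
def conv_layer_weights(k, c_in, c_out):
    """Weights in one k x k conv layer mapping c_in to c_out channels."""
    return k * k * c_in * c_out

C = 64
two_3x3 = 2 * conv_layer_weights(3, C, C)  # two stacked 3x3 layers: 73728
one_5x5 = conv_layer_weights(5, C, C)      # one 5x5 layer: 102400
```

The stacked version is cheaper and also inserts an extra nonlinearity between the two layers, which tends to help.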
Kernels in deep networks are mostly trained all at the same time in a supervised way (known inputs and outputs of the network) using backpropagation (which computes gradients) and some version of stochastic gradient descent (the optimization algorithm).
Kernels in different layers are usually independent. They can have different sizes and their numbers can differ as well. How to design a network is an open question and it depends on your data and the problem itself.
If you want to work with your own dataset, you should start with an existing pre-trained network (e.g. from the Caffe Model Zoo) and fine-tune it on your dataset. This way the architecture of the network is fixed, since you have to respect the architecture of the original network. The networks you can download are trained on very large problems, which makes them able to generalize well to other classification/regression problems. If your dataset is at least partly similar to the original dataset, the fine-tuned networks should work very well.
A good place to get more information is the Caffe CVPR2015 tutorial.