I'm confused about using a CNN (VGG, ResNet) as the backbone of a U-Net. I'm using the segmentation_models library to use VGG and ResNet as backbones. My input shape is 512x512x3. As far as I understand, in U-Net a skip connection is taken just before every layer where downsampling happens (for example, max-pooling in VGG or a convolution with stride 2 in ResNet). But in the model summary, for both the VGG- and ResNet-based backbones, I see that the skip connections start from the second downsampling (256x256x64) and there is no skip connection from the 512 resolution. Can someone explain the reason? I've added the detailed model diagram for reference.
I was following this code, https://github.com/bnsreenu/python_for_microscopists/blob/master/214_multiclass_Unet_sandstone_segm_models_ensemble.py.
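For reference, this is roughly how I build the model (a minimal sketch following that code; the choice of backbone here is just one of the two I tried):

```python
import segmentation_models as sm

# U-Net with a ResNet-50 encoder, pretrained ImageNet weights for the backbone
model = sm.Unet('resnet50', input_shape=(512, 512, 3),
                encoder_weights='imagenet')
model.summary()  # the first skip feature appears at 256x256, not at 512x512
```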
Thanks in advance.
Related
I'm working in MATLAB and trying to use the pretrained models cited above as feature extractors. In AlexNet and VGGNet the fully connected layer is clearly identified (it is named 'fc7'), but in GoogLeNet/ResNet-50/ResNet-101/Inception v2/v3 it is not clear; could someone guide me? Also, what is the size of the features in these models? In AlexNet, for example, it is 4096.
In any CNN, the fully connected layer can be spotted at the end of the network, since it processes the features extracted by the convolutional layers. If you access net.Layers, you will see that MATLAB labels the fully connected layer "Fully Connected" (in ResNet-50 it is fc1000). It is also followed by a softmax and a classification output.
The size of the classification layers depends on the convolutional layers used for feature extraction. In AlexNet, several fully connected layers are stacked (fc6, fc7, fc8). You can obtain the extracted features by flattening the output just before the first fully connected layer; in ResNet-50, that is the layer right before fc1000.
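The same idea in Keras, for illustration (a sketch, not MATLAB; it assumes the Keras ResNet-50, where the layer before fc1000 is the global average pool named 'avg_pool' and the features have size 2048):

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Model

base = ResNet50(weights='imagenet')
# cut the network just before the final fully connected layer
feat_extractor = Model(inputs=base.input,
                       outputs=base.get_layer('avg_pool').output)
# feat_extractor.predict(batch) now returns (batch_size, 2048) feature vectors
```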
I have 3 classes (50k images for training, 12k for validation).
By using pretrained VGG16 and ResNet50, freezing the models, and training only a dense layer on top, I reach a validation accuracy of 99%.
Should I fine-tune to improve the features by unfreezing the layers, or should I use the features as they are?
Also, is VGG16 a better feature extractor than ResNet50, or should I use the features from ResNet?
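For context, my setup looks roughly like this (a sketch; the input size and optimizer are incidental details, not part of the question):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

base = VGG16(weights='imagenet', include_top=False, pooling='avg',
             input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained convolutional base

model = models.Sequential([
    base,
    layers.Dense(3, activation='softmax'),  # one unit per class
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```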
Thanks!
It depends on your problem domain. If you are fine-tuning the pretrained model on the same problem domain and the training data size is small, then what you have done is correct.
You might boost performance if you freeze only the first layers, which are well trained for general feature extraction (edges, blobs, shapes, etc.), and fine-tune the rest (see the sketch below). It is also recommended to apply data augmentation if you do this, to avoid overfitting.
I encourage you to check the following tutorial on Transfer Learning for more details:
http://cs231n.github.io/transfer-learning/
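A minimal sketch of that partial unfreezing in Keras (reusing the `base`/`model` names from the setup above; how many layers to leave frozen is a judgment call):

```python
from tensorflow.keras.optimizers import Adam

# unfreeze only the last few layers of the backbone; the earlier layers,
# which capture generic edges/blobs/shapes, stay frozen
base.trainable = True
for layer in base.layers[:-4]:
    layer.trainable = False

# recompile with a small learning rate so fine-tuning does not destroy
# the pretrained features
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```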
So here is the setup: I have a set of images (labeled train and test), and I want to train a conv net that tells me whether or not a specific object is present in an image.
To do this, I followed the TensorFlow tutorial on MNIST, and I trained a simple conv net on crops reduced to the area of interest (the object), training on images of size 128x128. The architecture is as follows: three successive blocks, each consisting of 2 conv layers and 1 max-pool downsampling layer, followed by one fully connected softmax layer (with two classes, 0 and 1, for whether the object is present or not).
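For reference, the network looks roughly like this (a Keras sketch; the kernel sizes, activations, and single-image channel count are my guesses, and the channel widths are the ones mentioned below):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    # three blocks of (conv, conv, max-pool)
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, padding='same', activation='relu'),
    layers.Conv2D(128, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    # fully connected softmax head: object present / absent
    layers.Flatten(),
    layers.Dense(1024, activation='relu'),
    layers.Dense(2, activation='softmax'),
])
```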
I implemented it using TensorFlow, and it works quite well, but since I have enough computing power I was wondering how I could increase the complexity of the classifier:
- adding more layers?
- adding more channels at each layer? (currently 32, 64, 128, and 1024 for the fully connected layer)
- anything else?
But the most important part is that I now want to detect this same object in larger images (roughly 600x600, whereas the size of the object should be around 100x100).
I was wondering how I could use the previously trained "small" network for small images in order to pretrain a larger network on the large images. One option would be to classify the image using a sliding window of size 128x128 and scan the whole image, but if possible I would like to try training a whole network on it.
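Concretely, the sliding-window option I have in mind would look something like this (a sketch; the stride and the helper name are my own, and `model` is the small 128x128 classifier from above):

```python
import numpy as np

def sliding_window_scores(image, model, win=128, stride=64):
    """Score every win x win window of a large image with the small classifier."""
    h, w = image.shape[:2]
    scores = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = image[y:y + win, x:x + win]
            prob = model.predict(patch[None, ...], verbose=0)[0, 1]
            scores.append((y, x, prob))  # probability the object is in this window
    return scores
```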
Any suggestions on how to proceed? Or an article / resource tackling this kind of problem? (I am really new to deep learning, so sorry if this is a stupid question...)
Thanks !
I suggest that you continue reading on the field overall. Your search keywords include CNN, image classification, neural net, AlexNet, GoogLeNet, and ResNet. This will return many articles, online classes and lectures, and other materials to help you learn about classification with neural nets.
Don't just add layers or filters: the complexity of the topology (net design) must be fitted to the task; a net that's too complex will overfit the training data. The one you've been using is probably LeNet; the three I cite above were designed for the ImageNet image classification contest.
Since you are working on images, I would suggest you use a pretrained image classification network (like VGG, AlexNet, etc.) and fine-tune it on your 128x128 image data. In my experience, unless you have a very large dataset, a fine-tuned network will give higher accuracy and also save training time. After building a good image classifier on your dataset, you can use any popular algorithm to generate region proposals from the image. Then pass each region proposal to the classification network and check whether the network classifies it as positive or negative. If it is classified as positive, your object is most probably present in that region; otherwise it is not. If the classifier reports the object in many region proposals, you can use a non-maximum suppression algorithm to reduce the number of positive proposals.
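For the last step, greedy non-maximum suppression is only a few lines; here is a minimal sketch (boxes as (x1, y1, x2, y2) rows of a NumPy array; the IoU threshold value is an arbitrary choice):

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it too much, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection-over-union of box i with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou < iou_thresh]
    return keep  # indices of the boxes to keep
```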
I am a beginner in deep learning. I am using a deep neural network (DNN) for image segmentation, and I have a few doubts. My input image size is 512x512. 1. I want to use 6 kernels of 5x5 pixels. I could not understand how these kernels are to be selected; is there any standard kernel available? If yes, please tell me. 2. How can I take a patch of an image? Is it like manually cropping some part of the original image?
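To make the first doubt concrete, the layer I have in mind is something like this (a Keras sketch; the framework choice and the single input channel are my assumptions):

```python
from tensorflow.keras import layers

# a convolutional layer with 6 kernels of 5x5 pixels; I am unsure how
# the kernel weights themselves should be chosen
conv = layers.Conv2D(filters=6, kernel_size=(5, 5), activation='relu',
                     input_shape=(512, 512, 1))
```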
A very good paper for CNN-based segmentation is "Fully Convolutional Networks for Semantic Segmentation" by J. Long et al., and they released their pretrained networks. They can be found on the Caffe Model Zoo page. They also released their code (in Caffe), so it is possible to train or fine-tune models on new segmentation problems.
Note that these models directly learn the "complete" segmentation of the images. They do not rely on sampled image patches with a single class as output like previous classification-based approaches.
I'm replicating the steps in
http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
I want to change the network to VGG model which is obtained at
http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel
Does it suffice to simply substitute the model parameter as follows?
./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights VGG_ILSVRC_16_layers.caffemodel -gpu 0
Or do I need to adjust learning rates, iterations, etc.? I.e., does it come with separate prototxt files?
There needs to be a one-to-one correspondence between the weights of the network you want to train and the weights you use for initializing/fine-tuning: the architectures of the old and new models have to match.
VGG-16 has a different architecture from the model described by models/finetune_flickr_style/train_val.prototxt (FlickrStyleCaffeNet), which is the network that the solver will try to optimize. Even if training doesn't crash, the weights you've loaded won't have any meaning in the new network.
The VGG-16 network is described in the deploy.prototxt file on this page in Caffe's Model Zoo.
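If you want to sanity-check the correspondence, here is a small pycaffe sketch (the deploy prototxt filename is assumed from the Model Zoo page). Caffe copies parameters by layer name, so loading weights into a prototxt only transfers parameters for layers whose names and shapes match:

```python
import caffe

# load the VGG-16 weights into the matching VGG-16 architecture and
# list each parameterized layer with the shapes of its weight blobs
net = caffe.Net('VGG_ILSVRC_16_layers_deploy.prototxt',
                'VGG_ILSVRC_16_layers.caffemodel',
                caffe.TEST)
for name, blobs in net.params.items():
    print(name, [b.data.shape for b in blobs])
```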