How to improve the features from a convnnet for image retrieval? - neural-network

I have 3 classes. (50k for training, 12k for validation)
By using pretrained vgg16 and resnet50, and freezing the models and only training a dense layer on top, I reach a validation accuracy of 99%.
Should I fine tune to improve features by unfreezing the layers or should I use the features as it is?
Also, is vgg16 a better feature extractor than Resnet50 or should I use features from Resnet?
Thanks!

It depends on your problem domain. If you are fine-tuning the pretrained model for the same problem domain and the training data size is small, then what you have done is correct.
Maybe if you freeze only the first layers, which are well trained on for general feature extraction (egdes, blobs, shapes ..etc), you can boost your performance. It also recommended to apply data augmentation if you are going to do this to avoid over fitting
I encourage you to check the following tutorial on Transfer Learning for more details:
http://cs231n.github.io/transfer-learning/

Related

How to evaluate a quality of neural network for object detection in Keras?

I've already trained the neural network in Keras for detecting two classes of images (cats and dogs) and got accuracy on test data. Is it enough for the conclusion in the master thesis or should I do other actions for evaluating the quality of network (for instance, cross-validation)?
Not really, I would expect more than just accuracy from my students in any classification setup. Accuracy only evaluates that particular network on that particular test set but you would have to some extent justify the design choices you've made in building that network. Here are some things to consider:
Presumably you have some hyper-parameters you've fixed, you can investigate how these affect your results. How many filters? How many layers? and most importantly why?
An important aspect of object classification is how your model handles noise. Depending on your dataset, one simple way would be to pre-process the test data, blur it, invert colours etc and you'll see that your performance will drop. Why does it do that? How does the confusion matrix look like then?
What is the performance of the network? Is it fast, slow compared to another system, say VGG?
When you evaluate your project in general not just the network, asking why things worked helps a lot, not just why things didn't work.

Steps to think of while trying to increase accuracy of network

How do you proceed to increasing accuracy of your neural network?
I have tried lots of architectures yet in my image detection ( classification + localization ) I can only get 75% accuracy.
I am using VOC2007 dataset, and I extracted only data where 1 person is present.
What are the steps I can think of to increase the accuracy of my object detector?
thanks for help.
You might want to have a look at my masters thesis Analysis and Optimization of Convolutional Neural Network Architectures, chapter 2.5 page 15:
A machine learning developer has the following choices to improve the model’s quality:
(I1) Change the problem definition (e.g., the classes which are to be distinguished)
(I2) Get more training data
(I3) Clean the training data
(I4) Change the preprocessing (see Appendix B.1)
(I5) Augment the training data set (see Appendix B.2)
(I6) Change the training setup (see Appendices B.3 to B.5)
(I7) Change the model (see Appendices B.6 and B.7)
It's always good to check thoroughly where exactly the problem is and compare it with a human baseline. Reliably getting better than a human is super hard.

Neural Network Architectures that add layers based on complexity?

I'm searching for existing work on Neural Net architectures that grow based on need or complexity/variability of training data. Some architectures that I've found include self-organizing maps, and growing Neural gas. Are these the only one's out there?
What I'm searching for is best illustrated by a simple scenario;
if the training data only has a few patterns, then the neural net would be 2-3 layers deep with a small set of nodes in each layer. If the training data was more convoluted, then we would see deeper networks.
Such work seems rare or absent in the AI literature. Is it because the performance is comparatively weak ? I'd appreciate any guidance.
An example of this is called neuro-evolution. What you could do is combine backprop with evolution to find the optimal structure for your dataset. Neataptic is one of the NN libraries which offers neuro-evolution. With some simple coding you could turn this into backprop + evolution.
The disadvantage of this is that it will require much more computation power as it requires a genetic algorithm to run an entire population. So using neuro-evolution does make the performance comparibly weak.
However, I think there are more techniques out there that disable certain nodes, and if there is no negative effect on the output, they will be removed. I'm not sure though.

Feature Extraction from Convolutional Neural Network (CNN) and use this feature to other classification algorithm

My question is can we use CNN for feature extraction and then can we use this extracted feature as an input to another classification algorithm like SVM.
Thanks
Yes, this has already been done and well documented in several research papers, like CNN Features off-the-shelf: an Astounding Baseline for Recognition and How transferable are features in deep neural networks?. Both show that using CNN features trained on one dataset, but tested on a different one usually perform very well or beat the state of the art.
In general you can take the features from the layer before the last, normalize them and use them with another classifier.
Another related technique is fine tuning, where after training a network, the last layer is replaced and retrained, but previous layers' weights are kept fixed.

Where do filters/kernels for a convolutional network come from?

I've seen some tutorial examples, like UFLDL covolutional net, where they use features obtained by unsupervised learning, or some others, where kernels are engineered by hand (using Sobel and Gabor detectors, different sharpness/blur settings etc). Strangely, I can't find a general guideline on how one should choose a good kernel for something more than a toy network. For example, considering a deep network with many convolutional-pooling layers, are the same kernels used at each layer, or does each layer have its own kernel subset? If so, where do these, deeper layer's filters come from - should I learn them using some unsupervised learning algorithm on data passed through the first convolution-and-pooling layer pair?
I understand that this question doesn't have a singular answer, I'd be happy to just the the general approach (some review article would be fantastic).
The current state of the art suggest to learn all the convolutional layers from the data using backpropagation (ref).
Also, this paper recommend small kernels (3x3) and pooling (2x2). You should train different filters for each layer.
Kernels in deep networks are mostly trained all at the same time in a supervised way (known inputs and outputs of network) using Backpropagation (computes gradients) and some version of Stochastic Gradient Descent (optimization algorithm).
Kernels in different layers are usually independent. They can have different sizes and their numbers can differ as well. How to design a network is an open question and it depends on your data and the problem itself.
If you want to work with your own dataset, you should start with an existing pre-trained network [Caffe Model Zoo] and fine-tune it on your dataset. This way, the architecture of the network would be fixed, as you would have to respect the architecture of the original network. The networks you can donwload are trained on very large problems which makes them able to generalize well to other classification/regression problems. If your dataset is at least partly similar to the original dataset, the fine-tuned networks should work very well.
Good place to get more information is Caffe # CVPR2015 tutorial.