What are the kernels normally used in a convolutional neural network? - matlab

I am a beginner in deep learning. I am using a deep neural network (DNN) for image segmentation and have a few doubts. My input image size is 512x512.
1. I want to select 6 kernels of 5x5 pixels. I could not understand how I am supposed to select these kernels; is there any standard kernel available? If yes, please tell me.
2. How can I take a patch of an image? Is it like manually cropping some part of the original image?

A very good paper for CNN-based segmentation is "Fully Convolutional Networks for Semantic Segmentation" by J. Long et al., and they released their pre-trained networks; they can be found on the Caffe Model Zoo page. They also released their code (in Caffe), so it is possible to train or fine-tune models on new segmentation problems.
Note that these models directly learn the "complete" segmentation of the images. They do not rely on sampled image patches with a single class as output like previous classification-based approaches.
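To answer question 2 concretely: taking a patch simply means cropping a sub-window of the image, usually at many positions. A minimal NumPy sketch (the function name and the stride are my own choices):

```python
import numpy as np

def extract_patches(image, patch_size=5, stride=5):
    """Collect every patch of a 2-D image visited by a sliding window."""
    h, w = image.shape
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)

# Stand-in for a 512x512 input image.
img = np.random.rand(512, 512)
patches = extract_patches(img, patch_size=5, stride=5)
print(patches.shape)  # (10404, 5, 5)
```

As for question 1: there is no standard set of kernels to pick. In a CNN the kernels are initialized randomly and learned from the training data, as the answers below also explain.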

Related

Convolutional Neural Network for image detection/classification

So here is the setup: I have a set of images (labeled train and test sets) and I want to train a conv net that tells me whether or not a specific object is present in an image.
To do this, I followed the TensorFlow tutorial on MNIST, and I trained a simple conv net on 128x128 images cropped to the area of interest (the object). The architecture is as follows: three successive blocks, each consisting of 2 conv layers and 1 max-pool down-sampling layer, followed by one fully connected softmax layer (with two classes, 0 and 1, for whether the object is present or not).
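For concreteness, a minimal sketch of the described architecture in Keras (kernel sizes and input channels are not specified above, so the ones below are placeholders):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Three blocks of (conv, conv, max-pool), then a fully connected
# softmax over two classes (object present / absent).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(1024, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```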
I implemented it using TensorFlow, and this works quite well, but since I have enough computing power I was wondering how I could improve the complexity of the classification:
- adding more layers?
- adding more channels at each layer? (currently 32, 64, 128, and 1024 for the fully connected layer)
- anything else?
But the most important part is that now I want to detect this same object in larger images (roughly 600x600, whereas the size of the object should be around 100x100).
I was wondering how I could use the previously trained "small" network, built for small images, to pretrain a larger network on the large images. One option could be to classify the image using a sliding window of size 128x128 and scan the whole image, but if possible I would like to try to train a whole network on it.
Any suggestion on how to proceed? Or an article/resource tackling this kind of problem? (I am really new to deep learning, so sorry if this is a stupid question...)
Thanks !
I suggest that you continue reading on the field overall. Your search keys include CNN, image classification, neural net, AlexNet, GoogLeNet, and ResNet. These will return many articles, on-line classes and lectures, and other materials to help you learn about classification with neural nets.
Don't just add layers or filters: the complexity of the topology (net design) must be fitted to the task; a net that's too complex will over-fit the training data. The one you've been using is probably LeNet; the three I cite above are for the ImageNet image classification contest.
Since you are working on images, I would suggest you use a pretrained image classification network (like VGG, AlexNet, etc.) and fine-tune it with your 128x128 image data. In my experience, unless you have a very large dataset, a fine-tuned network will give better accuracy and also save training time. After building a good image classifier on your dataset, you can use any popular algorithm to generate region proposals from the image. Then take all the region proposals and pass them to the classification network one by one, checking whether the network classifies each proposal as positive or negative. If a proposal is classified as positive, your object is most probably present in that region; otherwise it is not. If there are many region proposals in which the object is present according to the classifier, you can use a non-maximum suppression algorithm to reduce the number of positive proposals, as in the sketch below.
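A minimal NumPy sketch of greedy non-maximum suppression (the box format and the IoU threshold are my own conventions):

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping heavily overlapping ones.

    boxes  : (N, 4) array of [x1, y1, x2, y2]
    scores : (N,) classifier confidences for the positive class
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the kept box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]  # discard heavy overlaps
    return keep
```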

Deconvolution with Caffe

I was wondering if it is possible to perform deconvolution of images in Caffe using the point spread function of the objective at a given focal point, something along the lines of this approach.
If yes, what would be the best way to proceed?
It is possible to deconvolve images using Caffe (and CNNs in general), but the approach may not be as general as you hope it to be.
CNNs can take a blurry image as input and output a sharp image. As the networks are convolutional, the input can be of any size. This can be done easily in Caffe using convolution layers and a Euclidean loss layer. Optionally, you can experiment with adding some pooling and deconvolution layers.
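A rough equivalent sketched in Keras rather than Caffe (the layer widths and kernel sizes are illustrative, loosely following SRCNN-style deblurring nets; mean squared error plays the role of the Euclidean loss):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Fully convolutional: blurry image in, sharp image out.
# Input height/width are left as None so any image size works.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, None, 1)),
    layers.Conv2D(64, 9, padding="same", activation="relu"),
    layers.Conv2D(32, 5, padding="same", activation="relu"),
    layers.Conv2D(1, 5, padding="same"),  # predicted sharp image
])
model.compile(optimizer="adam", loss="mse")  # Euclidean loss counterpart
# model.fit(blurry, sharp, ...) on pairs of blurred and sharp patches.
```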
CNNs can be trained to deconvolve images for a specific blur PSF, as in your link (see [Xu et al.: Deep Convolutional Neural Network for Image Deconvolution. NIPS 2014]). This works well, but you have to re-train the CNN for each new PSF (which takes a lot of time).
I've tried to train CNNs to do blind deconvolution (where the PSF is not known) and it works very well for text documents. You can get trained nets and Python-Caffe scripts at [Hradiš et al.: Convolutional Neural Networks for Direct Text Deblurring. BMVC 2015]. This approach could work for other types of images, but it would not work for unrestricted photographs and diverse blurs. For general photos, I would guess it could work for a small range of blurs.
Another possibility is to do inverse filtering (e.g. using a Wiener filter) and process the output using a CNN. The advantage of this is that you can compute the inverse filter for a new PSF very quickly and the CNN stays the same [Schuler et al.: A machine learning approach for non-blind image deconvolution. CVPR 2013].
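For reference, a minimal NumPy sketch of Wiener deconvolution in the frequency domain (the constant k stands in for the noise-to-signal ratio; PSF centering and edge handling are ignored for brevity):

```python
import numpy as np

def wiener_deconvolve(blurred, psf, k=0.01):
    """Deconvolve a 2-D image with a known PSF using a Wiener filter."""
    # Zero-pad the PSF to the image size and move to the Fourier domain.
    psf_padded = np.zeros_like(blurred, dtype=float)
    psf_padded[:psf.shape[0], :psf.shape[1]] = psf
    H = np.fft.fft2(psf_padded)
    G = np.fft.fft2(blurred)
    # Wiener filter: conj(H) / (|H|^2 + k), applied to the blurred spectrum.
    F = np.conj(H) / (np.abs(H) ** 2 + k) * G
    return np.real(np.fft.ifft2(F))
```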

How do I use a pre-trained Caffe model?

I have some questions about how to actually interact with a pre-trained Caffe model. In my case I'm using a model for scene recognition.
In the Caffe git repository, there are some code examples in Python and C++ of image classifier implementations. However, those do not apply to my use case (since they only classify the input image as ONE class).
My goal is an application that takes an input image (jpg) and outputs the highest predicted class label for each pixel in the input image (i.e., indices for sky, beach, road, car).
Could anyone give me some pointers on how to proceed?
There already seem to be implementations for this. This demo (http://places.csail.mit.edu/demo.html) is kind of what I want.
Thank you!
What you are looking for is not image classification, but rather semantic segmentation.
A recent work by Jonathan Long, Evan Shelhamer and Trevor Darrell is based on Caffe and can be found here. It uses a fully convolutional network, that is, a network with no "InnerProduct" layers, only convolutional layers, and is thus capable of producing outputs of different sizes for different sizes of inputs.
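A minimal pycaffe sketch of getting one class label per pixel from such a model (the file paths are placeholders, and the blob names "data" and "score" follow the released FCN models; adjust them to your network):

```python
import numpy as np
import caffe

net = caffe.Net("fcn_deploy.prototxt", "fcn_weights.caffemodel", caffe.TEST)

# 'image' must already be preprocessed the way the model expects
# (float32, mean-subtracted, BGR channel order for Caffe reference models).
image = np.zeros((500, 500, 3), dtype=np.float32)  # stand-in input
blob = image.transpose(2, 0, 1)[np.newaxis, ...]   # HxWxC -> 1xCxHxW

net.blobs["data"].reshape(*blob.shape)
net.blobs["data"].data[...] = blob
net.forward()

# Output is 1 x num_classes x H x W; argmax over the class axis gives
# one class index per pixel (e.g. sky, beach, road, car).
scores = net.blobs["score"].data[0]
label_map = scores.argmax(axis=0)
```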

Where do filters/kernels for a convolutional network come from?

I've seen some tutorial examples, like the UFLDL convolutional net, where they use features obtained by unsupervised learning, and others where kernels are engineered by hand (using Sobel and Gabor detectors, different sharpness/blur settings, etc.). Strangely, I can't find a general guideline on how one should choose a good kernel for something more than a toy network. For example, considering a deep network with many convolutional-pooling layers, are the same kernels used at each layer, or does each layer have its own kernel subset? If so, where do these deeper layers' filters come from; should I learn them using some unsupervised learning algorithm on data passed through the first convolution-and-pooling layer pair?
I understand that this question doesn't have a single answer; I'd be happy with just the general approach (some review article would be fantastic).
The current state of the art is to learn all the convolutional layers from the data using backpropagation (ref).
Also, this paper recommends small kernels (3x3) and pooling (2x2). You should train different filters for each layer.
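A small Keras sketch (the framework choice is mine) making the point concrete: every conv layer owns its own independently learned set of 3x3 kernels, with 2x2 pooling between them:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
])

for layer in model.layers:
    if layer.weights:
        # Prints (3, 3, 3, 16) then (3, 3, 16, 32): each layer has its own
        # kernels, all randomly initialized and learned by backpropagation.
        print(layer.name, layer.weights[0].shape)
```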
Kernels in deep networks are mostly trained all at the same time in a supervised way (with known inputs and outputs of the network) using backpropagation (which computes the gradients) and some version of stochastic gradient descent (the optimization algorithm).
Kernels in different layers are usually independent. They can have different sizes, and their numbers can differ as well. How to design a network is an open question, and it depends on your data and the problem itself.
If you want to work with your own dataset, you should start from an existing pre-trained network [Caffe Model Zoo] and fine-tune it on your dataset. This way the architecture of the network is fixed, as you have to respect the architecture of the original network. The networks you can download are trained on very large problems, which makes them able to generalize well to other classification/regression problems. If your dataset is at least partly similar to the original dataset, the fine-tuned network should work very well.
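The same fine-tuning idea sketched in Keras instead of Caffe (VGG16 on ImageNet stands in for any pre-trained model; the two-class head is illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Pre-trained convolutional filters, frozen; only the new head is trained.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(128, 128, 3))
base.trainable = False

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(2, activation="softmax"),  # new task: 2 classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
# model.fit(...) on the new dataset; optionally unfreeze the last few
# conv layers afterwards and continue with a small learning rate.
```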
A good place to get more information is the Caffe @ CVPR2015 tutorial.

How to choose the number of nodes when using a BP network for face recognition?

I read some books but still cannot figure out how I should organize the network. For example, I have a PGM image of size 120*100; what should the input look like (a one-dimensional array of size 120*100)? And how many nodes should I use?
It's typically best to organize your input image as a 2D matrix. The reason is that the layers at the lower levels of the neural networks used in machine perception tasks are typically locally connected. For example, each neuron of the first layer of such a neural net will only process the pixels of a small NxN patch of the input image. This naturally leads to a 2D structure which can be more easily described with 2D matrices.
For a detailed explanation I'll refer you to the DeepFace paper, which describes the state of the art in face recognition systems.
A 120*100 one-dimensional vector is fine. The locations of the pixel values in that vector do not matter, because all nodes are fully connected with the nodes in the next layer anyway. But you must be consistent with their locations across training, validation, and testing.
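A minimal Keras sketch of such a fully connected net (the hidden width and the 40 output classes are illustrative choices, e.g. one output node per person):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(120 * 100,)),               # flattened 120x100 image
    layers.Dense(256, activation="relu"),             # hidden layer
    layers.Dense(40, activation="softmax"),           # one node per identity
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```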
The most successful approach so far has been a convolutional neural network with 2D input, just as @benoitsteiner stated. For a far simpler example, I'd refer you to LeNet-5, a small neural network developed for MNIST hand-written digit recognition. It is used in EBLearn for face recognition with quite good results.