How to create a feature vector for a neural network (MATLAB)

I'm trying to use a neural network (multilayer NN) to classify an input image into its respective class (3 classes).
I have done the following:
(1) read the input image (image)
(2) apply a Canny edge detector (image => edgeimage)
(3) label the regions in the edge image (length(B) = 20, i.e. number of regions = 20)
(4) compute features
For the image features, in each region I compute circularity and convexity, which gives 20 feature values for circularity and 20 feature values for convexity. The number of labeled regions can also vary from image to image.
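In MATLAB the computation looks roughly like this (just a sketch: the file name is a placeholder, and I take circularity as 4*pi*Area/Perimeter^2 and convexity as Area/ConvexArea):
I = imread('sample.png');                 % (1) read the input image (placeholder file name)
if size(I,3) == 3, I = rgb2gray(I); end
E = edge(I, 'canny');                     % (2) Canny edge detection
E = imfill(E, 'holes');                   % close the edge contours into filled regions
L = bwlabel(E);                           % (3) label the regions (about 20 per image)
stats = regionprops(L, 'Area', 'Perimeter', 'ConvexArea');
circularity = (4*pi*[stats.Area]) ./ ([stats.Perimeter].^2 + eps);   % (4) one value per region
convexity   = [stats.Area] ./ ([stats.ConvexArea] + eps);            % (4) one value per region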
My current understanding is that the number of input neurons should equal the number of features used. So should the number of input neurons be 2 (one each for circularity and convexity), or should it be 40 (one for each feature value)?
I also want to know how to create the right feature-vector input and target for my problem.
Thanks.

Related

How to extract memnet heat maps with the caffe model?

I want to extract both the memorability score and memorability heat maps by using the available memnet caffemodel by Khosla et al. at link
Looking at the prototxt model, I can understand that the final inner-product output should be the memorability score, but how should I obtain the memorability map for a given input image? Here are some examples.
Thanks in advance
As described in their paper [1], the CNN (MemNet) produces a single, real-valued output for memorability. So the network they made publicly available calculates this single memorability score for a given input image, not a heatmap.
In section 5 of the paper, they describe how to use this trained CNN to predict a memorability heatmap:
To generate memorability maps, we simply scale up the image and apply MemNet to overlapping regions of the image. We do this for multiple scales of the image and average the resulting memorability maps.
Let's consider the two important steps here:
Problem 1: Make the CNN work with any input size.
To make the CNN work on images of any arbitrary size, they use the method presented in [2].
While convolutional layers can be applied to images of arbitrary size - resulting in smaller or larger outputs - the inner product layers have a fixed input and output size.
To make an inner-product layer work with any input size, you apply it just like a convolutional kernel: the first FC layer becomes a convolution whose kernel covers its original input region (6x6 for AlexNet's fc6) with 4096 output feature maps, and the following FC layers become 1x1 convolutions.
To do that in Caffe, you can directly follow the Net Surgery tutorial. You create a new .prototxt file in which the InnerProduct layers are replaced by Convolution layers. Caffe then won't recognize the weights in the .caffemodel, as the layer types no longer match. So you load the old net and its parameters into Python, load the new net, assign the old parameters to the new net, and save it as a new .caffemodel file.
Now we can run images of any dimensions (227x227 or larger) through the network.
Problem 2: Generate the heat map
As explained in the paper [1], you apply this fully-convolutional network from Problem 1 to the same image at different scales. MemNet is a re-trained AlexNet, so the default input size is 227x227. They mention that a 451x451 input gives an 8x8 output, which corresponds to the network's overall stride of 32 ((451 - 227)/32 + 1 = 8). So a simple example could be:
Scale 1: 227x227 → 1x1. (They almost certainly use this scale.)
Scale 2: 259x259 → 2x2. (Wild guess)
Scale 3: 323x323 → 4x4. (Wild guess)
Scale 4: 451x451 → 8x8. (This scale is mentioned in the paper.)
You then average these outputs to get your final 8x8 heatmap: upsample the lower-resolution outputs to 8x8 and average them.
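For illustration, a small MATLAB sketch of this averaging step (the map variables below are placeholders for the network outputs at the different scales):
map1 = rand(1,1);                               % placeholder for the 227x227 output
map2 = rand(2,2);                               % placeholder for a mid-scale output
map3 = rand(4,4);                               % placeholder for a mid-scale output
map4 = rand(8,8);                               % placeholder for the 451x451 output
maps = {map1, map2, map3, map4};
acc = zeros(8, 8);
for k = 1:numel(maps)
    acc = acc + imresize(maps{k}, [8 8], 'bilinear');   % upsample each map to 8x8
end
heatmap = acc / numel(maps);                    % final averaged memorability map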
From the paper, I assume that they use very high-resolution scales, so their heatmap ends up around the same size as the original image. They write that it takes 1 s on a "normal" GPU, which is quite a long time and also suggests that they upsample the input images to quite high dimensions.
Bibliography:
[1]: A. Khosla, A. S. Raju, A. Torralba, and A. Oliva, "Understanding and Predicting Image Memorability at a Large Scale", in: ICCV, 2015. [PDF]
[2]: J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation", in: CVPR, 2015. [PDF]

Handwritten Word Segmentation using neural network

What I am trying to do is segment cursive handwritten English words into individual characters. I have applied a simple heuristic approach to produce a basic over-segmentation of the words.
I am coding this in MATLAB. The approach involves preprocessing (slant correction, size normalization, etc.), then thinning the pen strokes to 1 pixel width and identifying the ligatures in the image using the column sums of its pixels. Every column with a pixel sum lower than a threshold is a possible segmentation point. The problem is that open characters like 'u', 'v', 'm', 'n' and 'w' also have low column pixel sums and get segmented.
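In MATLAB the over-segmentation step looks roughly like this (a sketch: the file name, the ink polarity and the threshold value are placeholders):
I = imread('word.png');                         % placeholder input word image
if size(I,3) == 3, I = rgb2gray(I); end
BW = ~imbinarize(I);                            % assume dark ink on light paper, so invert
BW = bwmorph(BW, 'thin', Inf);                  % thin pen strokes to 1 pixel width
colSum = sum(BW, 1);                            % number of ink pixels in each column
thresh = 2;                                     % placeholder ligature threshold
candidates = find(colSum > 0 & colSum <= thresh);   % possible segmentation columns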
The approach I have used is a modified version of what is presented in this paper:
cursive script segmentation using neural networks.
Now, to improve this, I have to use a neural network to correct these over-segmented points and recognize them as bad segmentations. I will create a network with 'newff' for that and label the segmentation points as good and bad manually, but I fail to understand what the input to that neural network should be.
My guess is that we have to give some image data along with the column number at which each possible segmentation is made (one segmentation point per training sample; the given image has about 40 segmentation points, so it will lead to 40 training samples) and have each labeled as a good or bad segmentation for training.
There will be just one output neuron telling us if the segmentation point is good or bad.
Can I give the column sums of all the columns as input to the input layer? How do I tell it which column is the segmentation point for this training instance? Won't the actual column number we have to classify as a good or bad segmentation, which is the most important value here, drown in the sea of this n-dimensional input? (n being the pixel width of the image)
Update: since I last asked this, I am now using image features in the vicinity of each segmentation column my heuristic algorithm has returned. These features (like the column-sum pixel density close to the segmentation column) are my input to the neural network, which has a single output neuron. Target values are 1 for a good segmentation point and 0 for a bad one.
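Concretely, a sketch of this setup, using BW, colSum and candidates from the over-segmentation sketch above (the window size, the particular features and the manualLabels vector are placeholders):
w = 5;                                          % half-width of the window around each candidate column
X = zeros(3, numel(candidates));                % one column of features per candidate segmentation point
for i = 1:numel(candidates)
    c  = candidates(i);
    lo = max(1, c-w);  hi = min(size(BW,2), c+w);
    window = BW(:, lo:hi);
    X(:, i) = [colSum(c); mean(sum(window, 1)); sum(window(:))];   % local density features
end
T = manualLabels;                               % placeholder 1-by-N vector: 1 = good, 0 = bad
net = feedforwardnet(10);                       % small hidden layer; newff can be used the same way
net = train(net, X, T);
isGood = net(X) > 0.5;                          % output near 1 -> keep the segmentation point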

KNN Classifier using cross validation

I am trying to implement a KNN classifier using the cross-validation approach, where I have different images of a certain character for training (e.g. 5 images) and another two for testing. I get the idea of cross-validation: simply choose the K with the least error value when training, and then use it with the test data to find how accurate my results are.
My question is: how do I train on the images in MATLAB to get my K value? Do I compare them and try to find mismatches, or what?
Any help would be really appreciated.
First of all, you need to define your task precisely. For example: given an image I in R^(MxN), we wish to classify I as an image containing faces or an image without faces.
I often work with pixel classifiers, where the task is something like: for an image I, decide for each pixel whether it is a face pixel or a non-face pixel.
An important part of defining the task is to make a hypothesis that can be used as the basis for training a classifier. For example: we believe that the distribution of pixel intensities can be used to discriminate images of faces from images not containing faces.
Then you need to select some features that define your image. This can be done in many ways and you should search for what other people do when they analyse the same type of images you are working with.
One widely used method in pixel classification is to use pixel intensity values and do a multi-scale analysis of the image. The idea in multi-scale analysis is that different structures are most evident at different levels of blurring, called scales. As an illustration, consider an image of a tree. Without blurring we notice the fine structure, such as small branches and leaves. When we blur the image we notice the trunk and major branches. This is often used as part of segmentation methods.
When you know your task and the features, you can train a classifier. If you use kNN and cross-validation to find the best k, you should split your dataset into training/test sets and then split the training set into train/validation sets. You then train using the reduced training set and use the validation set to decide which k is best. In the case of binary classification, e.g. face vs. non-face, the error rate is often used as the measure of performance.
Finally you use the parameters to train the classifier on the full dataset and estimate its performance on the test set.
A classification example: With or without milk?
As a full example, consider images of a cup of coffee taken from above, so each shows the rim of the cup surrounding a brown-colored disk. Further assume that all images are scaled and cropped so the diameter of the disk is the same and the dimensions of the image are the same. To simplify the task, we convert the color image to grayscale and scale the pixel intensities to the range [0,1].
We want to train a classifier so it can distinguish coffee with milk from coffee without milk. From inspection of histograms of some of the coffee images, we see that each image has two "bumps" in the histogram that are clearly separated. We believe that these bumps correspond to foreground (coffee) and background. Now we make the hypothesis that the average intensity of the foreground can be used to distinguish between coffee+milk/coffee.
To find the foreground pixels we observe that because the foreground/background ratio is the same (by design) we can just find the intensity value that gives us that ratio for each image. Then we calculate the average intensity of the foreground pixels and use this value as a feature for each image.
If we have N images that we have manually labeled, we split this into training and test set. We then calculate the average foreground intensity for each image in the training set, giving us a set of (average foreground intensity, label) values. We want to use kNN where an image is assigned the same class as the majority class of the k closest images. We measure the distance as the absolute value of the difference in average foreground pixel intensity.
We search for the optimal k with cross validation. We use 2-fold cross validation (aka holdout) to find the best k. We test k = {1,3,5} and select the k that gives the least prediction error on the validation set.
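To make the example concrete, here is a rough MATLAB sketch; images, labels and fgRatio are hypothetical placeholders (a cell array of coffee images, a vector of manual labels, and the designed foreground fraction), and fitcknn needs the Statistics and Machine Learning Toolbox:
labels = labels(:);                              % placeholder label vector (e.g. 1 = milk, 0 = no milk)
fgRatio = 0.4;                                   % placeholder foreground/background ratio
feat = zeros(numel(images), 1);
for i = 1:numel(images)
    I = im2double(rgb2gray(images{i}));          % grayscale, intensities scaled to [0,1]
    v = sort(I(:));                              % assume the darkest fgRatio fraction is the coffee disk
    t = v(round(fgRatio * numel(v)));            % threshold that yields the designed ratio
    feat(i) = mean(I(I <= t));                   % average foreground intensity = the single feature
end
idx  = randperm(numel(labels));                  % 2-fold (holdout) split of the training data
half = floor(numel(labels)/2);
tr = idx(1:half);  va = idx(half+1:end);
bestK = 1;  bestErr = Inf;
for k = [1 3 5]
    mdl = fitcknn(feat(tr), labels(tr), 'NumNeighbors', k);
    err = mean(predict(mdl, feat(va)) ~= labels(va));
    if err < bestErr,  bestErr = err;  bestK = k;  end
end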

HOG Feature Implementation with SVM in MATLAB

I would like to do classification based on HOG Features using SVM.
I understand that the HOG feature is the combination of all the histograms in every cell (i.e. it becomes one aggregate histogram).
I extract HOG features using the MATLAB code on this page for the Dalal-Triggs variant.
For example, I have a grayscale image of size 384 x 512, and I extract the HOG features at 9 orientations with a cell size of 8. By doing this, I get 48 x 64 x 36 features.
How can I turn this into a histogram and use it with an SVM classifier?
For example, I'll have 7 classes of images and I want to do training (700 images in total for training) and then classify new data based on the model generated from the training phase.
I read that for multiclass we can train our SVM with ONE vs ALL, which means I have to train 7 classifiers for my 7 classes.
So for the 1st training, I'll label the 1st class +1 and the rest of the classes 0.
For the 2nd training, I'll label the 2nd class +1 and the rest of the classes 0. And so on...
For example, I have classes of colors:
Red, green, blue, yellow, white, black and pink.
So for the 1st training, I make only 2 binary labels: red and not red.
For the 2nd training, I label green and not green.
Is it like that?
The syntax to train an SVM is:
SVMStruct = svmtrain(Training,Group)
But in this case, I'll have 7 SVMStructs.
The syntax to classify / test is:
Group = svmclassify(SVMStruct,Sample)
How do I declare 7 SVMStructs here?
Is that right?
Or is there another concept or syntax that I have to know?
And for training I'll have 48 x 64 x 36 features; how can I train on these features with an SVM?
Because from what I read, SVMs just take a 1xN vector of features.
Please help me.
HOG with an SVM is one of the most successful approaches for object detection. To apply this method, you must have two different training datasets before they are fed into the SVM classifier. For instance, if you want to detect an apple, you need two training datasets: positive images, which contain an apple, and negative images, which contain no apples. Then you extract HOG descriptors from both training datasets separately and label them separately (i.e. 1 for positive, 0 for negative). Afterwards, combine the feature vectors from positive and negative and feed them to the SVM classifier.
You can use SVM Light or LibSVM, which is easier and more user-friendly for beginners.
The Computer Vision System Toolbox for MATLAB includes the extractHOGFeatures function, and the Statistics Toolbox includes SVM classifiers. Here's an example of how to classify images using HOG and SVM.
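A minimal sketch along those lines (trainImages, trainLabels and testImage are hypothetical placeholders; requires the Computer Vision System Toolbox and the Statistics and Machine Learning Toolbox):
hog1 = extractHOGFeatures(trainImages{1}, 'CellSize', [8 8]);   % determine the feature length
X = zeros(numel(trainImages), numel(hog1));                     % one 1xN HOG vector per image
for i = 1:numel(trainImages)
    X(i, :) = extractHOGFeatures(trainImages{i}, 'CellSize', [8 8]);
end
classifier = fitcecoc(X, trainLabels);                          % multiclass SVM (binary SVMs combined)
predicted  = predict(classifier, extractHOGFeatures(testImage, 'CellSize', [8 8]));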
1. How can I make this a histogram and use it with an SVM classifier?
One distinction I want to make is that you already have a 'histogram' of oriented gradient features. You now need to give these features as input to the SVM. It does not make sense to assign a label to each of these features individually, because the same HoG feature might turn up in another image labelled differently.
In practice what is done is to build another histogram, called a bag of words, from these HoG features and give that to the SVM as input. The intuition is that if two features are very similar, you want one representation for both of them; this reduces the variance in the input data. We make this new histogram for each image.
A bag of words is created in the following way:
1. Cluster all the HoG features into 'words'. Say you have 1000 of these words.
2. Go through all HoG features and assign each HoG feature to the word it is closest to (Euclidean distance) among all words in the bag.
3. Count how many HoG features are assigned to each word in the bag. This gives the 1xN (N = number of words in the bag) feature histogram which will be given as input to the SVM after labeling.
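A rough MATLAB sketch of these three steps (allHogFeatures and imageHogFeatures are hypothetical placeholders: the pooled HoG descriptors of all training images, one per row, and the descriptors of a single image):
numWords = 1000;
[~, vocab] = kmeans(allHogFeatures, numWords);        % 1. cluster all descriptors into 'words'
d = pdist2(imageHogFeatures, vocab);                  % 2. distance of each descriptor to every word
[~, nearest] = min(d, [], 2);                         %    index of the closest word per descriptor
bowHistogram = histcounts(nearest, 1:numWords+1);     % 3. 1 x numWords histogram -> SVM input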
2. How do you carry out multi-class classification using an SVM?
Every time you train the SVM you get another model, so yes, you will end up with several SVMStructs. There are two ways you might do multiclass SVM using svmtrain:
1) One vs One SVM
Train a model for each pair of label classes, with input like this:
Example input for model 1:
bag of words features for Image 1, RED
bag of words features for Image 2, GREEN
Example input for model 2:
bag of words features for Image 3, YELLOW
bag of words features for Image 2, GREEN
The above is done for each pair of label classes. You will have N(N-1)/2 models. Now you can count the number of votes for each class from the N(N-1)/2 models to find which label to assign.
2) One vs All SVM
Train a model for each label class, with input like this:
Example input for model 1:
bag of words features for Image 1, RED
bag of words features for Image 2, NOT RED
Example input for model 2:
bag of words features for Image 2, GREEN
bag of words features for Image 1, NOT GREEN
The above is done for each label class. You will have N models. Now you can count the number of votes for each class from the N models to find which label to assign.
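A rough sketch of the one-vs-all scheme with the svmtrain/svmclassify syntax from the question (X is assumed to be an images-by-features matrix of bag-of-words histograms, y a vector of labels 1..7, and testX the test features):
classes = unique(y);
models  = cell(numel(classes), 1);
for c = 1:numel(classes)
    binaryLabels = double(y == classes(c));          % 1 for this class, 0 for all the others
    models{c} = svmtrain(X, binaryLabels);           % one SVMStruct per class
end
votes = zeros(size(testX, 1), numel(classes));
for c = 1:numel(classes)
    votes(:, c) = svmclassify(models{c}, testX);     % 1 where model c says "this class"
end
[~, predictedClass] = max(votes, [], 2);             % most votes wins (ties broken by the first class)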
Read more on category-level classification here: http://www.di.ens.fr/willow/events/cvml2013/materials/slides/tuesday/Tue_bof_summer_school_paris_2013.pdf

Neural network output layer for multiple pattern recognition

Assume that I have a method or a neural network that can do pattern detection on an image correctly. How should I design a neural network when there are multiple patterns in an image?
Say that in an image there are X patterns to be detected; what would be the best approach? AFAIK, output-layer neuron values should be in [-1,1]. How would I know whether X patterns were recognised? Does this mean that I have to set a hard-coded limit on how many patterns it can recognise (since the number of output neurons is fixed)?
Here's a suggestion using face detection as an example. This Face Detection link on Github is described as detecting multiple patterns (i.e. faces) using a Haar classifier. If you read the Implementation section, it states that the algorithm uses scaleOption and templateSizeOption parameters (among others) to govern how many faces are detected in an image. It sounds like you should look for features in subspaces or windows of a given image (perhaps even windows that overlap).
scaleOption - this parameter is used to specify the rate at which the Haar features used for face detection will be scaled. A lower scale option means that more faces will be detected, while a higher scale option will perform a faster detection, but may miss some faces from the input image. The default scale value is 1.1, which determines an increase in the feature dimensions of 10% at each step.
templateSizeOption - it is used to specify the minimal area in which to search for a face. If we want to detect persons from close-up images, the size should be over 40 pixels; otherwise a 25-pixel region (which is the default value) is enough for detecting a large number of faces.
To do this you can use a Hopfield net. First, extract your target patterns in equally sized windows and store them in the net. Then, with a simple algorithm, scan your image and at each position compare the similarity of the net's response with your target, keeping a separate array of results for each target. At the end, extract the nearest pattern in each array. You can apply some image processing to your original image before starting.
Yes, this can be done with a neural network. I think most practical solutions would involve applying the neural network to a window scanned over the image. Multiple hits from the neural network would imply multiple target objects in the image.
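For illustration, a minimal sliding-window sketch in MATLAB, assuming a trained network object net (input size = number of window pixels) and a grayscale image I; the window size, stride and threshold are arbitrary placeholders:
winSize = [32 32];                               % placeholder window size the network was trained on
step    = 8;                                     % placeholder stride between windows
thresh  = 0.8;                                   % placeholder detection threshold
hits = [];                                       % one row per detected pattern: [row col score]
for r = 1:step:size(I,1) - winSize(1) + 1
    for c = 1:step:size(I,2) - winSize(2) + 1
        patch = I(r:r+winSize(1)-1, c:c+winSize(2)-1);
        score = net(double(patch(:)));           % network response for this window
        if score > thresh
            hits(end+1, :) = [r c score];        %#ok<AGROW>
        end
    end
end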
Incidentally, neural network outputs do not have to lie in the range -1..1.