Convert SURFpoints object MATLAB - matlab

Is there any way I can convert the SURFpoints object, generated by matlab, into a matrix with x and y positions, for feeding into a neural network?
I am a pretty much complete beginner, but from what I can tell, and by looking at documentation, I wasn't sure if there was a way to get SURFpoints into neural networks?
Many thanks,
Hugh

SURFPoints has a field, Location, that is an n x 2 matrix that has the (x,y) coordinates of each SURF point detected in the image.
Note, however, that SURF points have other attributes beside their location (such as scale and orientation). If you only take into account the (x,y) locations, you are throwing away a lot of data.
Also, it's unclear how you would feed this information into a neural network. A neural network, like many other machine learning models, expects a uniform length feature vector of an entity. If your task is something like image classification, you'll have to come up with some way to convert the list of SURF points into a feature vector that captures the properties you want your classifier to care about. Depending on your application, a neural network may or may not be the best way to go. In the context of computer vision and image processing, neural networks these days are more commonly used for unsupervised feature discovery (see "deep learning"). For supervised learning tasks, other models like boosted decision trees and SVMs give better theoretical guarantees and have fared much better in practice.

Related

Face Recognition based on Deep Learning (Siamese Architecture)

I want to use pre-trained model for the face identification. I try to use Siamese architecture which requires a few number of images. Could you give me any trained model which I can change for the Siamese architecture? How can I change the network model which I can put two images to find their similarities (I do not want to create image based on the tutorial here)? I only want to use the system for real time application. Do you have any recommendations?
I suppose you can use this model, described in Xiang Wu, Ran He, Zhenan Sun, Tieniu Tan A Light CNN for Deep Face Representation with Noisy Labels (arXiv 2015) as a a strating point for your experiments.
As for the Siamese network, what you are trying to earn is a mapping from a face image into some high dimensional vector space, in which distances between points reflects (dis)similarity between faces.
To do so, you only need one network that gets a face as an input and produce a high-dim vector as an output.
However, to train this single network using the Siamese approach, you are going to duplicate it: creating two instances of the same net (you need to explicitly link the weights of the two copies). During training you are going to provide pairs of faces to the nets: one to each copy, then the single loss layer on top of the two copies can compare the high-dimensional vectors representing the two faces and compute a loss according to a "same/not same" label associated with this pair.
Hence, you only need the duplication for the training. In test time ('deploy') you are going to have a single net providing you with a semantically meaningful high dimensional representation of faces.
For a more advance Siamese architecture and loss see this thread.
On the other hand, you might want to consider the approach described in Oren Tadmor, Yonatan Wexler, Tal Rosenwein, Shai Shalev-Shwartz, Amnon Shashua Learning a Metric Embedding for Face Recognition using the Multibatch Method (arXiv 2016). This approach is more efficient and easy to implement than pair-wise losses over image pairs.

Can neural network fail to learn a function? and How to choose better feature descriptors for pattern recognition?

I was working on webots which is an environment used to model, program and simulate mobile robots. Basically i have a small robot with a VGA camera, and it looks for simple blue coloured patterns on white walls of a small lego maze and moves accordingly
The method I used here was
​
Obtain images of the patterns from webots and save it in a location
in PC.
​​Detect the blue pattern, form a square enclosing the pattern
with atleast 2 edges of the pattern being part of the boundary of the
square.
​Resize it to 7x7 matrix(using nearest neighbour
interpolation algorithm)
The input to the network is nothing but the red pixel intensities of each of the 7x7 image(when i look at the blue pixel through a red filter it appears black so). The intensities of each pixel is extracted and the 7x7 matrix is then converted it to a 1D vector i.e 1x49 which is my input to the neural network. (I chose this characteristic as my input because it is 'relatively' less difficult to access this information using C and webots.​​)
I used MATLAB for this offline training method and I used a slower learning rate(0.06) to ensure parameter convergence and tested it on large and small datasets(1189 and 346 respectively). On all the numerous times I have tried, the network fails to classify the pattern.(it says the pattern belongs to all the 4 classes !!!! ) . There is nothing wrong with the program as I tested it out on the simpleclass_dataset in matlab and it works almost perfectly
Is it possible that the neural network fails to learn the function because of really poor data? (by poor data i mean that the datapoints corresponding to one sample of one class are very close to another sample belonging to a different class or something of that sort). Or can the neural network fail because of very poor feature descriptors?
Can anyone suggest a simpler method to extract features from the image(I am now shifting to MATLAB as I am now only concerned with simulations in webots and not the real robot). What sort of features can I choose? The patterns are very simple (L,an inverted L and its reflected versions are the 4 patterns)
Neural networks CAN fail to learn a function; this is most often caused by employing a network topology which is too simple to model the necessary function. A classic example of this case is attempting to learn an XOR function using a perceptron classifier, although it can even happen in multilayer neural nets sometimes; especially for complex tasks like image recognition. See my previous answer for a rough guide on how to select neural network parameters (ignore the convolution stuff if you want, although I would highly recommened looking into convolutional neural networks if you are still having problems).
It is a possiblity that there is too little seperability between classes, although I doubt that this is the case given your current features. Is there a reason that your network needs to allow an image to be four classifications simultaneously? If not, then perhaps you could classify the input as the output with the highest activation instead of all those with high activations.

How to choose the number of nodes for using BP network in face recognition?

I read some books but still cannot make sure how should I organize the network. For example, I have pgm image with size 120*100, how the input should be like(like a one dimensional array with size 120*100)? and how many nodes should I adapt.
It's typically best to organize your input image as a 2D matrix. The reason is that the layers at the lower levels of the neural networks used in machine perception tasks are typically locally connected. For example, each neuron of the first layer of such a neural net will only process the pixels of a small NxN patch of the input image. This naturally leads to a 2D structure which can be more easily described with 2D matrices.
For a detailed explanation I'll refer you to the DeepFace paper which describes the stat of the art in face recognition systems.
120*100 one dimensional vector is fine. The locations of the pixel values in that vector does not matter, because all nodes are fully connected with the nodes in the next layer anyway. But you must be consistent with their locations between training, validating, and testing.
The most successful approach so far was to go with a convolutional neural network with 2D input, just as #benoitsteiner stated. For a far simpler example I'd refer you to a LeNet-5, a small neural network developed for MNIST hand-written digit recognition. It is used in EBLearn for face recognition with quite good results.

How do neural networks handle large images where the area of interest is small?

If I've understood correctly, when training neural networks to recognize objects in images it's common to map single pixel to a single input layer node. However, sometimes we might have a large picture with only a small area of interest. For example, if we're training a neural net to recognize traffic signs, we might have images where the traffic sign covers only a small portion of it, while the rest is taken by the road, trees, sky etc. Creating a neural net which tries to find a traffic sign from every position seems extremely expensive.
My question is, are there any specific strategies to handle these sort of situations with neural networks, apart from preprocessing the image?
Thanks.
Using 1 pixel per input node is usually not done. What enters your network is the feature vector and as such you should input actual features, not raw data. Inputing raw data (with all its noise) will not only lead to bad classification but training will take longer than necessary.
In short: preprocessing is unavoidable. You need a more abstract representation of your data. There are hundreds of ways to deal with the problem you're asking. Let me give you some popular approaches.
1) Image proccessing to find regions of interest. When detecting traffic signs a common strategy is to use edge detection (i.e. convolution with some filter), apply some heuristics, use a threshold filter and isolate regions of interest (blobs, strongly connected components etc) which are taken as input to the network.
2) Applying features without any prior knowledge or image processing. Viola/Jones use a specific image representation, from which they can compute features in a very fast way. Their framework has been shown to work in real-time. (I know their original work doesn't state NNs but I applied their features to Multilayer Perceptrons in my thesis, so you can use it with any classifier, really.)
3) Deep Learning.
Learning better representations of the data can be incorporated into the neural network itself. These approaches are amongst the most popular researched atm. Since this is a very large topic, I can only give you some keywords so that you can research it on your own. Autoencoders are networks that learn efficient representations. It is possible to use them with conventional ANNs. Convolutional Neural Networks seem a bit sophisticated at first sight but they are worth checking out. Before the actual classification of a neural network, they have alternating layers of subwindow convolution (edge detection) and resampling. CNNs are currently able to achieve some of the best results in OCR.
In every scenario you have to ask yourself: Am I 1) giving my ANN a representation that has all the data it needs to do the job (a representation that is not too abstract) and 2) keeping too much noise away (and thus staying abstract enough).
We usually dont use fully connected network to deal with image because the number of units in the input layer will be huge. In neural network, we have specific neural network to deal with image which is Convolutional neural network(CNN).
However, CNN plays a role of feature extractor. The encoded feature will finally feed into a fully connected network which act as a classifier. In your case, I dont know how small your object is compare to the full image. But if the interested object is really small, even use CNN, the performance for image classification wont be very good. Then we probably need to use object detection(which used sliding window) to deal with it.
If you want recognize small objects on large sized image, you should use "scanning window".
For "scanning window" you can to apply dimention reducing methods:
DCT (http://en.wikipedia.org/wiki/Discrete_cosine_transform)
PCA (http://en.wikipedia.org/wiki/Principal_component_analysis)

Pattern recognition in Neural Network using matlab simulation

I am new to this neural network in matlab. I wanted to create a Neural Network using matlab simulation.
This matlab simulation is using pattern recognition.
I am running on a windows XP platform.
For example, I have a sets of waveforms of circular shape.
I have extracted out the poles.
These poles will teach my Neural Network that it is circular in shape, hence whenever I input another set of slightly different circular shape waveform, the Neural Network is able to distinguish between the shape.
Currently, I have extracted the poles of these 3 shapes, cylinder, circle and rectangle.
But I am clueless of how I should go about creating my Neural Network.
I'd recommend utilizing SOM (Self-organizing map) for pattern recognition since it's really robust. Also there's a Som Toolbox for Matlab you might be interested in. However, to make it learn waves while neglecting their offsets, you'd need to make some changes to the "similarity function". These changes will affect quite a lot on the SOM's training time but if that's not a problem, keep reading.
For the SOM you'll have to sample your waves to constant sized vectors, let say:
sin x -> sin_vector = (a1, a2, a3, ..., aN)
cos x -> cos_vector = (b1, b2, b3, ..., bN)
Usually similarity of "SOM-vectors" is calculated with euclidian distance. Euclidian distance of those two vectors is huge since they have a different offset. In your case they should be considered to be similar ie. distance to be small. So.. if you don't sample all the similar waves from the same starting point, they will be classified in different classes. That is probably a problem. But! Similarity of vectors in SOM is calculated in order to find the BMU (best-matching unit) from the map and pulling the BMU's and its neigborhood's vectors torwards the values of the given sample. So all you need to change is the way to compare those vectors and the way to pull the vectors' values torwards the sample so that both will be "offset-tolerent".
Slow but working solution is first finding the best offset index for each vector. Best offset index is the one that will produce the smallest value with euclidian distance for the sample. Smallest distance calculated with some node of the net will then be the BMU. Then the BMU's and its neigborhood's vectors are pulled torwards the given sample using the offset index calculated for each node just before. Everything else should work out-of-the-box.
This solution is relatively slow but should work great. I'd recommend studying the consept of SOM thoroughly and then reading this post (and angry comments) again :)
PLEASE comment if you know some mathematical solution that would be better than that previous one!
You can try to use Matlab's Neural network pattern recognition tool nprtool as it is specialize to train and test neural network for pattern recognition.