Making feature vector from Gabor filters for classification - matlab

My aim is to classify types of cars (sedans, SUVs, hatchbacks). Earlier I was using corner features for classification, but that didn't work very well, so now I am trying Gabor features.
I am using the code from here.
The features are now extracted: when I give an image as input, for 5 scales and 8 orientations I get two [1x40] matrices:
1. 40 columns of squared energy.
2. 40 columns of mean amplitude.
The problem is that I want to use these two matrices for classification, and I have about 230 images of 3 classes (SUV, sedan, hatchback).
I do not know how to create an [N x 230] matrix that can be taken as the inputs of the neural network in MATLAB (where N is the total number of features for one image).
My questions:
How do I create a one-dimensional feature vector from the two [1x40] matrices for one image? (Should I append the mean amplitude to the squared energy matrix to get a [1x80] matrix, or something else?)
Should I be using these Gabor features for this classification task in the first place? If not, what should I use?
Thanks in advance

In general, there is nothing to think about: a simple neural network requires a one-dimensional feature vector and does not care about the ordering, so you can simply concatenate any number of feature vectors into one (even in random order; it does not matter, as long as you use the same order for every image). In particular, if some of your features are matrices, you can concatenate their rows to create a vectorized form.
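A minimal sketch of that concatenation, written in plain Python rather than MATLAB so it stays library-free. The function names and the placeholder feature values are hypothetical; the layout follows the question: each image becomes one 80-element vector (40 squared-energy values followed by 40 mean-amplitude values), and the 230 vectors are stacked as columns to give an [80 x 230] matrix.

```python
def image_feature_vector(squared_energy, mean_amplitude):
    """Concatenate the two 1x40 Gabor feature rows into one 80-element vector."""
    if len(squared_energy) != len(mean_amplitude):
        raise ValueError("expected feature rows of equal length")
    # Fixed order for every image: energy first, then amplitude.
    return list(squared_energy) + list(mean_amplitude)


def build_input_matrix(per_image_features):
    """Stack per-image vectors as columns: result[i][j] is feature i of image j,
    i.e. an N x num_images layout (N = 80 here)."""
    n = len(per_image_features[0])
    if any(len(v) != n for v in per_image_features):
        raise ValueError("all images must yield the same number of features")
    return [[v[i] for v in per_image_features] for i in range(n)]


# Hypothetical placeholder features for 230 images (5 scales x 8 orientations = 40 each).
images = []
for k in range(230):
    energy = [float(k)] * 40
    amplitude = [float(-k)] * 40
    images.append(image_feature_vector(energy, amplitude))

X = build_input_matrix(images)
print(len(X), len(X[0]))  # 80 230
```

In MATLAB the same idea is simply `[energy, amplitude]` per image, with the resulting rows transposed into columns.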
The only exception is when your data actually has some underlying geometrical dependencies, for example when the matrix is actually a pixel matrix. In such cases, architectures like PyraNet, Convolutional Neural Networks and others, which apply some kind of receptive fields based on this 2D structure, should be better. But those implementations simply accept a 2D feature matrix as input.

Related

How can I extract features from a set of matrices and vectors to be used in machine learning in MATLAB?

I have a task where I need to train a machine learning model to predict a set of outputs from multiple inputs. My inputs are 1000 iterations of a set of 3x1 vectors, a set of 3x3 covariance matrices and a set of scalars, while my output is just a set of scalars. I cannot use the Regression Learner app because these inputs need to have the same dimensions. Any idea how to unify them?
One possible way to solve this is to flatten the covariance matrix into a vector. Once you have done that, you can construct a 1000xN matrix, where 1000 is the number of samples in your dataset and N is the number of features. For example, if your features consist of a 3x1 vector, a 3x3 covariance matrix and, let's say, 5 other scalars, N would be 3+3*3+5=17. You then use this matrix to train an arbitrary model, such as a linear regressor or more advanced models like a regression tree.
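A minimal sketch of the flattening step, in plain Python instead of MATLAB. The function names and sample values are hypothetical. Each sample (3x1 vector, 3x3 covariance matrix, 5 scalars) becomes one row of 17 numbers; the matrix is flattened row by row.

```python
def flatten_sample(vec3, cov3x3, scalars):
    """Concatenate a 3x1 vector, a row-major flattened 3x3 matrix and extra scalars."""
    flat_cov = [value for row in cov3x3 for value in row]
    return list(vec3) + flat_cov + list(scalars)


def build_design_matrix(samples):
    """One row per sample, giving a (num_samples x N) matrix as a list of lists."""
    rows = [flatten_sample(vec, cov, extra) for vec, cov, extra in samples]
    n = len(rows[0])
    if any(len(r) != n for r in rows):
        raise ValueError("every sample must produce the same number of features")
    return rows


# Hypothetical sample, repeated 1000 times just to show the shape.
sample = ([1.0, 2.0, 3.0],
          [[4.0, 0.5, 0.1],
           [0.5, 5.0, 0.2],
           [0.1, 0.2, 6.0]],
          [7.0, 8.0, 9.0, 10.0, 11.0])
X = build_design_matrix([sample] * 1000)
print(len(X), len(X[0]))  # 1000 17
```

In MATLAB, `reshape(C.', 1, [])` gives the same row-major flattening of a matrix `C`.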
When training machine learning models, it is important to understand your data and exploit its structure to help the learning algorithms. For example, we could use the fact that a covariance matrix is symmetric and positive semi-definite and thus lives in a closed convex cone. Symmetry alone implies that it lives in a subspace of the set of all 3x3 matrices: the space of 3x3 symmetric matrices has dimension only 6 (the diagonal plus the entries above it), compared with 9. You can use that knowledge to reduce redundancy in your data, e.g. by keeping only those 6 entries, which gives N=3+6+5=14 in the example above.
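A minimal sketch of exploiting the symmetry, again in plain Python with hypothetical names and values: only the diagonal and upper-triangular entries of the covariance matrix are kept, so each sample shrinks from 17 to 14 features. It checks symmetry but does not use the positive semi-definite constraint.

```python
def upper_triangle(matrix):
    """Entries on and above the diagonal, row by row (6 values for a 3x3 matrix)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


def is_symmetric(matrix, tol=1e-12):
    """True if matrix[i][j] equals matrix[j][i] up to a small tolerance."""
    n = len(matrix)
    return all(abs(matrix[i][j] - matrix[j][i]) <= tol
               for i in range(n) for j in range(n))


def compact_sample(vec3, cov3x3, scalars):
    """3x1 vector + 6 unique covariance entries + extra scalars."""
    if not is_symmetric(cov3x3):
        raise ValueError("covariance matrix should be symmetric")
    return list(vec3) + upper_triangle(cov3x3) + list(scalars)


# Hypothetical sample: 3 + 6 + 5 = 14 features instead of 17.
sample = ([1.0, 2.0, 3.0],
          [[4.0, 0.5, 0.1],
           [0.5, 5.0, 0.2],
           [0.1, 0.2, 6.0]],
          [7.0, 8.0, 9.0, 10.0, 11.0])
print(len(compact_sample(*sample)))  # 14
```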

Understanding 3D convolution and when to use it?

I am new to convolutional neural networks, and I am learning about 3D convolution.
As I understand it, 2D convolution captures relationships between low-level features in the X-Y dimensions, while 3D convolution helps detect low-level features and the relationships between them in all 3 dimensions.
Consider a CNN employing 2D conv layers to recognize handwritten digits. If a digit, say 5, was written in different colors:
Would a strictly 2D CNN perform poorly (since the colors belong to different channels in the z dimension)?
Also, are there practical well-known neural nets that employ 3D convolution?
The problem is that the 2D aspects of an image have locality: things that are nearby are expected to be related in some fundamental way. E.g. a pixel next to a hair pixel is expected, a priori, to be a hair pixel too. The different channels have no such relationship. When you only have 3 channels, a 3D convolution whose kernel spans all 3 channels is equivalent to being fully connected in z. When you have 27 channels (e.g. in the middle of the net), why would any 3 channels be considered "close" to each other?
This answer explains the difference nicely.
Using a "fully-connected" relationship over the channels is what most libraries do by default. Note this line in particular: "...a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels]". For an input vector of size in_channels, a matrix of size [in_channels, out_channels] is fully-connected. So the filter can be thought of as a fully-connected layer applied to a "patch" of the image of size [filter_height, filter_width].
To illustrate, on a single channel, a regular plain old image filter takes a patch of the image and maps that patch to a single pixel in a new image, like so: [figure: a filter mapping an image patch to one output pixel]
On the other hand, suppose that we have multiple channels. Instead of performing a linear mapping from a 3x3 patch to a 1x1 pixel, we perform a linear mapping from a 3x3xin_channels patch to a 1x1xout_channels set of pixels. How do we do this? Well, a linear mapping is just a matrix. Note that a 3x3xin_channels patch can be written as a vector with 3*3*in_channels entries, and a 1x1xout_channels set of pixels can be written as a vector with out_channels entries. A linear mapping between the two is given by a matrix with 3*3*in_channels rows and out_channels columns. The entries of that matrix are the parameters of that layer of the network. The layer works by simply multiplying the input vector by the matrix of weights to get the output vector, and this is repeated for every patch of the image. (In practice, instead of looping over all patches, libraries compute the equivalent with more efficient formulations, but the result is the same.)
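A minimal pure-Python sketch of the point above (no deep-learning library assumed). It computes a stride-1, "valid" multi-channel convolution directly, and then computes the same thing by flattening each kh x kw x in_channels patch into a vector and multiplying it by a (kh*kw*in_channels) x out_channels weight matrix. The function names, the H x W x C image layout and the [kh][kw][in][out] kernel layout (mirroring the shape quoted above) are illustrative assumptions; like most libraries it computes cross-correlation (no kernel flip), and it omits the bias.

```python
import random


def direct_conv(image, kernel):
    """Valid, stride-1 multi-channel convolution (cross-correlation).
    image: H x W x C nested lists; kernel: kh x kw x C x O nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    c_in, c_out = len(kernel[0][0]), len(kernel[0][0][0])
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            pixel = []
            for o in range(c_out):
                s = 0.0
                # Every output channel mixes ALL input channels of the patch.
                for dy in range(kh):
                    for dx in range(kw):
                        for c in range(c_in):
                            s += image[y + dy][x + dx][c] * kernel[dy][dx][c][o]
                pixel.append(s)
            row.append(pixel)
        out.append(row)
    return out


def patch_matmul_conv(image, kernel):
    """Same result: flatten each patch to a vector of length kh*kw*C and
    multiply it by a (kh*kw*C) x O weight matrix."""
    kh, kw = len(kernel), len(kernel[0])
    c_in, c_out = len(kernel[0][0]), len(kernel[0][0][0])
    # Weight matrix rows in (dy, dx, c) order; each row holds O weights.
    weights = [kernel[dy][dx][c]
               for dy in range(kh) for dx in range(kw) for c in range(c_in)]
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            patch = [image[y + dy][x + dx][c]
                     for dy in range(kh) for dx in range(kw) for c in range(c_in)]
            row.append([sum(p * wr[o] for p, wr in zip(patch, weights))
                        for o in range(c_out)])
        out.append(row)
    return out


random.seed(0)
image = [[[random.random() for _ in range(3)] for _ in range(5)] for _ in range(5)]
kernel = [[[[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
           for _ in range(3)] for _ in range(3)]
a = direct_conv(image, kernel)
b = patch_matmul_conv(image, kernel)
print(len(a), len(a[0]), len(a[0][0]))  # 3 3 4
```

The two functions agree, which is the sense in which a 2D conv layer is a fully-connected layer over each patch and all of its channels.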
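To illustrate, the mapping takes a 3x3xin_channels column of the input [figure not reproduced]
to a 1x1xout_channels stack of output pixels [figure not reproduced].
Now, what you are proposing is to do something with only a 3x3x3 part of that column [figure not reproduced].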
There is no mathematical reason why you can't do something with that 3x3x3 patch containing only 3 channels of your whole set of in_channels. However, whatever 3 channels you choose is totally arbitrary, and they have no intrinsic relationship to one another that would suggest that treating them as being "nearby" would help.
To reiterate, in an image, the pixels that are near each other are expected to be "similar" or "related" in some sense. This is why a convolution works at all. If you jumbled up the pixels and then did a convolution, it would be worthless. On that note, all of the channels are just a jumble. There is no "nearby relatedness" property along the channels. E.g. the "red" channel isn't near the "green" channel OR the "blue" channel, because "nearness" doesn't make any sense between the channels. Since "nearness" isn't a property of the channel dimension, then doing a convolution in that dimension probably isn't going to be useful.
On the other hand, we can simply take ALL of the in_channels as input to generate ALL of the out_channels simultaneously, and let them influence each other in a linear sort of way. Note that the linear transformation described involves a sort of cross-pollination of the parameters. For example, for the first layer of the network, taking in a 3x3 patch of r, g, b channels labeled r_1_1 to r_3_3 etc., a single pixel in a single output channel computed from that patch would look like:
A*r_1_1 + B*r_1_2 + ... + C*r_3_3 + D*b_1_1 + E*b_1_2 + ... + F*b_3_3 + G*g_1_1 + ...
Where the capital letters are entries of the weight matrix.
So your question, "Would a strictly 2D CNN perform poorly?", is based on the assumption that the convolutional layer doesn't include any "mixing" between the various channels. This is not the case: the in_channels are ALL combined in a linear mapping to obtain the out_channels.

Coordinate normalization for NN input in matlab

I am trying to implement a classification NN in Matlab.
My inputs are clusters of coordinates from an image (corresponding to Delaunay triangulation vertices).
There are 3 clusters (results of the OPTICS algorithm) in this format:
(Not all clusters are the same size.) Elements represent coordinates in 2D Euclidean space, so (110,12) is a point in my image, and the matrix depicted represents one cluster of points.
Clustering was done on image edges, so the coordinates refer to logical values (always 1s in this case) in the image matrix. (After edge detection there are 3 "dense" areas in an image, and these collections of pixels are used for classification.) There are 6 target classes.
So my question is: how can I format them into single column-vector inputs for a neural network?
(There is a relevant answer here, but I would like some elaboration if possible. I am probably too tired right now after 12 hours of trying things and don't get it 100%.)
Remember, there are 3 different coordinate matrices for each picture, so my initial thought was to create a NN with 3 inputs (of different lengths). But how do I serialize this?
Here's a cluster with its tags on in case it helps:
For you to train the classifier, you need a matrix X where each row corresponds to an image. If you want to use a coordinate representation, all images will have to be the same size, say M by N. The row for an image will then have M times N elements (features), and the feature values will be the cluster assignments. The class vector y will be whatever labels you have, i.e. one of the six classes you mentioned in the comments above. Keep in mind that with a coordinate representation X can get very high-dimensional, and unless you have a large number of images, chances are your classifier will perform very poorly. If you have few images, consider using the fractions of pixels belonging to each cluster, as I suggested in one of the comments: this gives you a much shorter feature description that is invariant to rotation and translation, and may yield better classification.
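A minimal sketch of both representations, in plain Python with hypothetical names and data. Assumptions, not from the question: cluster points are 1-based (row, column) pairs as in MATLAB, cluster ids are 1..3 with 0 for unclustered pixels, and "fraction of pixels" is read as the share of all clustered pixels that fall in each cluster, sorted so the result does not depend on how the clusters happen to be numbered.

```python
def coordinate_features(clusters, height, width):
    """Flatten an M x N label image row-major into M*N features:
    cluster id (1..k) for pixels in a cluster, 0 elsewhere.
    Points are assumed to be 1-based (row, col) pairs."""
    labels = [0] * (height * width)
    for cluster_id, points in enumerate(clusters, start=1):
        for r, c in points:
            labels[(r - 1) * width + (c - 1)] = cluster_id
    return labels


def cluster_fraction_features(clusters):
    """One possible short descriptor: share of clustered pixels in each cluster,
    sorted in descending order (unchanged by rotating or translating the points)."""
    total = sum(len(points) for points in clusters)
    return sorted((len(points) / total for points in clusters), reverse=True)


# Hypothetical 4 x 3 image with three small clusters.
clusters = [
    [(1, 1), (1, 2), (2, 1)],  # cluster 1
    [(3, 3)],                  # cluster 2
    [(4, 1), (4, 2)],          # cluster 3
]
x_coord = coordinate_features(clusters, height=4, width=3)
x_frac = cluster_fraction_features(clusters)
print(x_coord)  # [1, 1, 0, 1, 0, 0, 0, 0, 2, 3, 3, 0]
print(x_frac)
```

The coordinate vector has M*N entries per image, while the fraction vector always has 3, which is why the latter is the safer choice with few images.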

How to plot U-Matrix, Sample Hits and Input Planes from data trained by a SOM

I have written a simple SOM algorithm in MATLAB. My main challenge is how to visualize/plot the data as a U-Matrix, Sample Hits and Component/Input Planes. These three plots exist in the SOM toolbox in MATLAB, but I cannot call them to visualize the results of my own code, because they need a 'net' as input and my code does not create any 'net'.
Is there any guidance?
You can create your own functions, as they are not too complicated. I will assume a 20x20x4 SOM (400 nodes, 4 features) for the explanation.
The Hit Map is nothing more than presenting each sample to the already trained SOM and adding 1 to the count of the node that was chosen as the Best Matching Unit (BMU). Then you plot this map. So if node(1,1) fires 10 times and node(1,2) fires 100 times, you will have an image where node(1,2) has a higher intensity than node(1,1).
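A minimal sketch of the hit map in plain Python, assuming the trained SOM is stored as som[row][col] = weight vector (the same layout as node(r,c,:) above, but 0-based). Function names and the tiny 2x2 map are hypothetical; the BMU is the node with the smallest Euclidean distance, and comparing squared distances gives the same winner.

```python
def best_matching_unit(som, sample):
    """Return (row, col) of the node whose weight vector is closest to sample."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(som):
        for c, weights in enumerate(row):
            d = sum((w - s) ** 2 for w, s in zip(weights, sample))  # squared distance
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


def hit_map(som, samples):
    """Count how many samples pick each node as their BMU."""
    hits = [[0] * len(som[0]) for _ in som]
    for s in samples:
        r, c = best_matching_unit(som, s)
        hits[r][c] += 1
    return hits


# Hypothetical 2x2 SOM with 2 features per node.
som = [[[0.0, 0.0], [0.0, 1.0]],
       [[1.0, 0.0], [1.0, 1.0]]]
samples = [[0.1, 0.1], [0.9, 0.9], [0.95, 1.0], [0.0, 0.8]]
hits = hit_map(som, samples)
print(hits)  # [[1, 1], [0, 2]]
```

The resulting counts can then be shown with `imagesc(hits)` in MATLAB.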
The U-Matrix is a map representing the distance between each node's weight vector and those of its closest neighbours (commonly the average distance). So here you can calculate the Euclidean distance between the feature vector of node X and that of every neighbour. For example, if node(1,1,:)=[1,1,2,3], node(1,2,:)=[2,2,1,1] and node(2,1,:)=[1,1,1,1], then the value of the U-matrix for node(1,1) could be U(1,1)=norm(squeeze(node(1,1,:)-node(1,2,:)))+norm(squeeze(node(1,1,:)-node(2,1,:)))=4.8818, i.e. the sum of the distances to its two neighbours (divide by 2 to get the average, 2.4409).
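A minimal sketch of the U-matrix in plain Python, using the same som[row][col] layout as above. Assumptions: a rectangular grid with 4-connected neighbours (a hexagonal SOM would use 6), and a hypothetical `average` flag to choose between the sum used in the worked example and the average. The fourth node's weights are made up so the map is complete.

```python
import math


def u_matrix(som, average=False):
    """Distance from each node's weight vector to its 4-connected neighbours,
    summed (as in the worked example) or averaged."""
    rows, cols = len(som), len(som[0])
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    total += math.dist(som[r][c], som[nr][nc])  # Euclidean distance
                    count += 1
            u[r][c] = total / count if average else total
    return u


# node(1,1), node(1,2), node(2,1) from the example; node(2,2) is hypothetical.
som = [[[1, 1, 2, 3], [2, 2, 1, 1]],
       [[1, 1, 1, 1], [0, 0, 0, 0]]]
u = u_matrix(som)
u_avg = u_matrix(som, average=True)
print(round(u[0][0], 4))  # 4.8818
```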
The Component/Input Planes are the simplest and do not require any processing. You basically pick each feature of the SOM map and plot it. So in our example of a 20x20x4 SOM, you would have 4 features and therefore 4 components, which you can plot with imagesc(node(:,:,1)) for feature 1, and so on.

How to select the top 100 features (a subset) that are most relevant after PCA?

I performed PCA on a 63*2308 matrix and obtained a score and a coefficient matrix. The score matrix is 63*2308 and the coefficient matrix is 2308*2308.
How do I extract the column names of the top 100 most important features so that I can perform regression on them?
PCA should give you both a set of eigenvectors (your coefficient matrix) and a vector of eigenvalues (1*2308, often referred to as lambda). You might need to use a different PCA function in MATLAB to get them (for example, the third output of pca, latent).
The eigenvalues indicate how much of the variance in your data each eigenvector explains. A simple selection method is to keep the 100 components with the highest eigenvalues. This gives you a set of features that explain most of the variance in the data. Note that these components are linear combinations of all your original columns, not a subset of them.
If you need to justify your approach in a write-up, you can calculate the fraction of variance explained by each eigenvector and cut off at, for example, 95% cumulative variance explained.
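A minimal sketch of the variance-explained cut-off in plain Python, taking the eigenvalues as given (in MATLAB, pca also returns the per-component percentages directly as its fifth output, explained). The function names and the example eigenvalues are hypothetical.

```python
def explained_fractions(eigenvalues):
    """Fraction of total variance explained by each component, largest first."""
    total = sum(eigenvalues)
    return [v / total for v in sorted(eigenvalues, reverse=True)]


def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest k such that the k largest eigenvalues explain at least
    `threshold` of the total variance."""
    cumulative = 0.0
    fractions = explained_fractions(eigenvalues)
    for k, f in enumerate(fractions, start=1):
        cumulative += f
        if cumulative >= threshold:
            return k
    return len(fractions)


# Hypothetical eigenvalues: cumulative shares 0.60, 0.80, 0.92, 0.97, 1.00.
eigenvalues = [6.0, 2.0, 1.2, 0.5, 0.3]
print(components_for_variance(eigenvalues))  # 4
```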
Bear in mind that selecting solely based on eigenvalue might not give the set of features most important to your regression, so if you don't get the performance you expect, you might want to try a different feature selection method such as recursive feature elimination. I would suggest using Google Scholar to find a couple of papers doing something similar and seeing what methods they use.
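A quick MATLAB example of taking the top 100 principal components using PCA:
[eigenvectors, projected_data, eigenvalues] = princomp(X);  % newer releases: [coeff, score, latent] = pca(X)
[~, feature_idx] = sort(eigenvalues, 'descend');            % eigenvalues are normally already sorted; this is a safeguard
selected_projected_data = projected_data(:, feature_idx(1:100));  % scores on the 100 strongest components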
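Have you tried:
B = sort(your_matrix,2,'descend');  % sorts the values within each row, largest first
C = B(:,1:100);                     % keeps the 100 largest values of each row (original column positions are not kept)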
Be careful!
With just 63 observations and 2308 variables, your PCA result will be meaningless because the problem is underdetermined. As a rule of thumb, you should have at least three times as many observations as dimensions.
With 63 observations, the centered data can span at most a 62-dimensional subspace!