Relevant eigenvectors in a face image? - matlab

I am using PCA for face recognition. I have obtained the eigenvectors / eigenfaces for each image; each is a column vector. I want to know whether selecting the first three eigenvectors, since their corresponding eigenvalues amount to 70% of the total variance, will be sufficient for face recognition?

Firstly, let's be clear about a few things. The eigenvectors are computed from the covariance matrix formed from the entire dataset, i.e., you reshape each grayscale face image into a single column and treat it as a point in R^d, compute the covariance matrix from these points, and compute the eigenvectors of that covariance matrix. These eigenvectors become a new basis for your space of face images. You do not have eigenvectors for each image. Instead, you represent each face image in terms of the eigenvectors by projecting it onto (possibly a subset of) them.
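As a hedged illustration of that pipeline (the variable name faces and the choice k = 3 are just assumptions for the sketch, not your code), the eigenfaces basis could be computed roughly like this in MATLAB:
% A minimal sketch, assuming faces is a d-by-n matrix whose n columns are the
% vectorized grayscale training faces (d = number of pixels per image).
meanFace = mean(faces, 2);
A        = faces - meanFace;                 % center the dataset
C        = cov(A');                          % d-by-d covariance matrix of the whole dataset
[V, D]   = eig(C);                           % columns of V are the eigenfaces
[evals, ix] = sort(diag(D), 'descend');      % order by explained variance
V        = V(:, ix);
explained = cumsum(evals) / sum(evals);      % explained(3) is the "70%" figure from the question
k = 3;                                       % keep the first k eigenvectors
W = V(:, 1:k)' * A;                          % each column: one face represented in the new basis
(For realistically sized images, d is large and one would usually eigendecompose the much smaller A'*A instead, but the idea is the same.)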
Limitations of eigenfaces
Whether the representation of your face images under this new basis is good enough for face recognition depends on many factors. But in general, the eigenfaces method does not perform well for real-world unconstrained faces. It only works for faces that are pixel-wise aligned, frontal-facing, and have fairly uniform illumination conditions across the images.
More is not necessarily better
While it is commonly believed (when using PCA) that retaining more variance is better than less, things are more complicated than that because of two factors: 1) noise in real-world data and 2) the dimensionality of the data. Sometimes projecting to a lower dimension and losing variance can actually produce better results.
Conclusion
Hence, my answer is that it is difficult to say beforehand whether retaining a certain amount of variance is enough. The number of dimensions (and hence the number of eigenvectors to keep and the associated variance retained) should be determined by cross-validation. But ultimately, as I mentioned above, eigenfaces is not a good method for face recognition unless you have a "nice" dataset. You might be slightly better off using "Fisherfaces", i.e., LDA on the face images, or combining these methods with Local Binary Patterns (LBP) as features (instead of raw face pixels). But seriously, face recognition is a difficult problem, and in general the state of the art has not reached a stage where it can be deployed in real-world systems.

It's not impossible, but it seems a little unusual to me that only 3 eigenvalues can account for 70% of the variance. How many training samples do you have (what is the total dimension)? Make sure you reshape each image from the database into a vector, normalize the vectors, and then stack them into a matrix. The eigenvalues/eigenvectors are obtained from the covariance matrix of that data matrix.
In theory, 70% variance should be enough to form a human-recognizable face from the corresponding eigenvectors. However, the optimal number of eigenvectors is best determined by cross-validation: you can add one eigenvector at a time and observe the reconstructed face and the recognition accuracy. You can even plot the cross-validation accuracy curve; there may be a sharp corner (an elbow) on it, and the corresponding number of eigenvectors is a good candidate to use in your test.
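As a rough sketch of that procedure (fitcknn is only a placeholder classifier; X, y and V are assumed to be your vectorized faces, identity labels, and eigenvectors sorted by decreasing eigenvalue):
% A hedged sketch of choosing the number of eigenvectors by cross-validation.
maxK = 50;                                         % assumed upper bound on eigenvectors to try
acc  = zeros(1, maxK);
Xc   = X - mean(X, 1);                             % center the data (one face per row of X)
for k = 1:maxK
    P      = Xc * V(:, 1:k);                       % project onto the first k eigenvectors
    cvmdl  = fitcknn(P, y, 'NumNeighbors', 1, 'CrossVal', 'on', 'KFold', 5);
    acc(k) = 1 - kfoldLoss(cvmdl);                 % cross-validated recognition accuracy
end
plot(1:maxK, acc), xlabel('number of eigenvectors'), ylabel('CV accuracy')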


Lucas-Kanade optical flow: Understanding the math

I found a Matlab implementation of the LKT algorithm here and it is based on the brightness constancy equation.
The algorithm calculates the Image gradients in x and y direction by convolving the image with appropriate 2x2 horizontal and vertical edge gradient operators.
The brightness constancy equation in the classic literature has on its right hand side the difference between two successive frames.
However, in the implementation referred to by the aforementioned link, the right-hand side is a difference of convolutions:
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
Why couldn't It_m be simply calculated as:
it_m = im1 - im2;
As you mentioned, in theory only the pixel-by-pixel difference is needed for the optical flow computation.
However, in practice all natural (not synthetic) images contain some degree of noise. Moreover, differentiation acts as a kind of high-pass filter and therefore amplifies the noise relative to the signal.
Therefore, to avoid artifacts caused by noise, an image smoothing (or low-pass filtering) step is usually carried out before any image differentiation (we have such a step in edge detection too). The code does exactly this, i.e. it applies a moving-average filter to the images to reduce the effect of noise.
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
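Note that, by linearity of convolution, the quoted line is nothing more than a box-filtered frame difference; a quick hypothetical check, assuming im1 and im2 are same-size double images:
It_a = conv2(im1, [1 1; 1 1]) + conv2(im2, [-1 -1; -1 -1]);
It_b = conv2(im1 - im2, [1 1; 1 1]);   % smooth the raw difference with a 2x2 box filter
max(abs(It_a(:) - It_b(:)))            % essentially zero (floating-point error only)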
(Comments converted to an answer.)
In theory, there is nothing wrong with taking a pixel-wise difference:
It_m = im1 - im2;
to compute the time derivative. Using a spatial smoother when computing the time derivative mitigates the effect of noise.
Moreover, looking at the way that code computes spatial (x and y) derivatives:
Ix_m = conv2(im1,[-1 1; -1 1], 'valid');
computing the time derivative with a similar kernel and the 'valid' option ensures that the matrices Ix_m, Iy_m and It_m have compatible sizes, as sketched below.
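For illustration, a sketch of the three derivative estimates using matching 2x2 kernels and the 'valid' option (im1 and im2 are assumed to be same-size double grayscale frames):
Ix_m = conv2(im1, [-1 1; -1 1], 'valid');                            % derivative along x
Iy_m = conv2(im1, [-1 -1; 1 1], 'valid');                            % derivative along y
It_m = conv2(im1, ones(2), 'valid') - conv2(im2, ones(2), 'valid');  % smoothed temporal difference
isequal(size(Ix_m), size(Iy_m), size(It_m))                          % true: ready for the LK system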
The temporal partial derivative (along t) is connected to the spatial partial derivatives (along x and y).
Think of the video sequence you are analyzing as a spatio-temporal volume. At any given point (x,y,t), if you want to estimate the partial derivatives, i.e. estimate the 3D gradient at that point, then you will benefit from having 3 filters that have the same kernel support.
For more theory on why this should be so, look up the topic of steerable filters, or better yet look up the fundamental concept of what a partial derivative is supposed to be and how it connects to directional derivatives.
Often, the 2D gradient is estimated first, and people then tend to treat the temporal derivative as independent of the x and y components. This can, and very often does, lead to numerical errors in the final optical flow calculations. The common way to deal with those errors is to do a forward and a backward flow estimation and combine the results in the end.
One way to think of the gradient that you are estimating is that it has a support region that is 3D. The smallest size of such a region should be 2x2x2.
If you compute 2D gradients in both the first and the second image using only 2x2 filters, then the corresponding FIR filter for the 3D volume is obtained by averaging the results of the two filters, as in the sketch below.
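A hedged sketch of that averaging (the kernels below are one common 2x2x2 scheme, not necessarily the one used in the linked code; im1 and im2 are consecutive double frames):
kx = [-1 1; -1 1] / 4;                                   % x-derivative kernel
ky = [-1 -1; 1 1] / 4;                                   % y-derivative kernel
kt = ones(2) / 4;                                        % temporal averaging kernel
Ix = conv2(im1, kx, 'valid') + conv2(im2, kx, 'valid');  % x-gradient averaged over both frames
Iy = conv2(im1, ky, 'valid') + conv2(im2, ky, 'valid');  % y-gradient averaged over both frames
It = conv2(im2, kt, 'valid') - conv2(im1, kt, 'valid');  % frame difference averaged over the 2x2 footprint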
The fact that you should have the same filter support region in 2D is clear to most: that's why the Sobel and Scharr operators look the way they do.
You can see the sort of results you get from having sanely designed differential operators for optical flow in this Matlab toolbox that I made, in part to show this particular point.

MATLAB second-moments of a region

This is a follow-up question on the one below:
Second moments question
MATLAB's regionprops function estimates an ellipse from a given set of 2D points. This is done using image moments; the documentation states that normalized second central moments are used, and the formulas also follow what is suggested by the Wikipedia article on image moments.
Effectively, the covariance matrix of the region is calculated (in a slightly more efficient way), and then the square roots of the eigenvalues of this matrix are computed and reported as the major and minor axis lengths, with one change: they are multiplied by a factor of 4.
Why?
Essentially, covariance estimation assumes a multivariate normal distribution. However, an arbitrary image region is most likely not normally distributed; I would rather expect a factor based on the assumption that the data are uniformly distributed. So what is the justification for choosing 4?
Meanwhile I found the answer: the factor 4 yields correct results for regions with a solid elliptical shape. For, e.g., rectangular or non-solid regions, the estimated axis lengths are incorrect, and the error varies nonlinearly with changes in the region.
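A small hypothetical check of this in MATLAB, comparing regionprops against 4 times the square roots of the covariance eigenvalues for a solid ellipse (the axis lengths a and b are arbitrary choices):
[X, Y] = meshgrid(1:400, 1:300);
a = 120; b = 50;                                          % semi-major / semi-minor axes in pixels
mask = ((X - 200) / a).^2 + ((Y - 150) / b).^2 <= 1;      % solid elliptical region
s = regionprops(mask, 'MajorAxisLength', 'MinorAxisLength');
[r, c] = find(mask);                                      % pixel coordinates inside the region
lambda = sort(eig(cov([c r])), 'descend');                % eigenvalues of the 2x2 covariance matrix
[s.MajorAxisLength, s.MinorAxisLength; 4 * sqrt(lambda')] % both rows are approximately [2*a, 2*b]
(The two rows differ only slightly, because regionprops adds a small correction for the pixels' own extent.)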

Hessian Matrix of the image

I am wondering what information the Hessian matrix of an image provides. Does it provide information about the stable points? What is the Hessian matrix used for?
The Hessian matrix describes the second-order local image intensity variations around the selected voxel. Eigendecomposition of the Hessian matrix extracts an orthonormal coordinate system that is aligned with the second-order structure of the image. Given the eigenvalues, the (assumed) model of the structure to be detected, and the resulting theoretical behavior of the eigenvalues, a decision can be made as to whether the analyzed voxel belongs to the structure being searched for.
The figure below illustrates the correspondence between the eigenvalues of the Hessian operator applied to the image and the local features (corner, edge, or flat region).
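Independently of that figure, a minimal sketch of inspecting those eigenvalues at one pixel (I is assumed to be a double grayscale image, and gradient-of-gradient is only one simple way to estimate the second derivatives):
[Ix, Iy]   = gradient(I);                % first derivatives
[Ixx, Ixy] = gradient(Ix);               % second derivatives
[~,   Iyy] = gradient(Iy);
r = 120; c = 85;                         % an arbitrary pixel of interest (assumed in range)
H = [Ixx(r, c) Ixy(r, c); Ixy(r, c) Iyy(r, c)];
lambda = eig(H);                         % two large magnitudes -> corner/blob-like,
                                         % one large, one ~0    -> edge/ridge-like,
                                         % both ~0              -> flat region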
The Hessian operator is also widely used on 3D images, where it can reflect more local features.
It is widely used for vessel detection in medical images. For more details, please see M. Rudzki et al.'s Vessel Detection Method Based on Eigenvalues of the Hessian Matrix and its Applicability to Airway Tree Segmentation.

dimension reduction using pca for FastICA

I am trying to develop a system for image classification. I am following this article:
INDEPENDENT COMPONENT ANALYSIS (ICA) FOR TEXTURE CLASSIFICATION by Dr. Dia Abu Al Nadi and Ayman M. Mansour
In a paragraph it says:
Given the above texture images, the Independent Components are learned by the method outlined above.
The (8 x 8) ICA basis function for the above textures are shown in Figure 2. respectively. The dimension is reduced by PCA, resulting in a total of 40 functions. Note that independent components from different windows size are different.
The "method outlined above" is FastICA; the textures are taken from the Brodatz album, and each texture image has 640x640 pixels. My question is:
What do the authors mean by "The dimension is reduced by PCA, resulting in a total of 40 functions.", and how can I get those functions using MATLAB?
PCA (Principal Component Analysis) is a method for finding an orthogonal basis (think of a coordinate system) for a high-dimensional (data) space. The "axes" of the PCA basis are sorted by variance, i.e. along the first PCA "axis" your data has the largest variance, along the second "axis" the second largest variance, etc.
This is exploited for dimension reduction: say you have 1000-dimensional data. Then you do a PCA, transform your data into the PCA basis, and throw away all but the first 20 dimensions (just an example). If your data follows a certain statistical distribution, then chances are that the 20 PCA dimensions describe your data almost as well as the 1000 original dimensions did. There are methods for finding the number of dimensions to use, but that is beyond the scope here.
Computationally, PCA amounts to finding the eigendecomposition of your data's covariance matrix, in MATLAB: [V,D] = eig(cov(MyData)).
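A hedged sketch of the reduction step, assuming MyData is an observations-by-variables matrix and, as in the paper, k = 40 components are kept:
k = 40;                                   % number of PCA dimensions to keep (from the paper)
Xc      = MyData - mean(MyData, 1);       % center the data
[V, D]  = eig(cov(Xc));                   % eigendecomposition of the covariance matrix
[~, ix] = sort(diag(D), 'descend');       % order components by variance
Vk      = V(:, ix(1:k));                  % the first k principal directions
Reduced = Xc * Vk;                        % data expressed in the reduced k-dimensional basis
Presumably the reduced data is then passed to FastICA, and the 40 independent components it learns are the "functions" the paper refers to.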
Note that if you want to work with these concepts you should do some serious reading. A classic article on what you can do with PCA on image data is Turk and Pentland's Eigenfaces. It also gives some background in an understandable way.
PCA reduces the dimension of the data; ICA then extracts the components of the data, and their number must be <= the data dimension.

relation between support vectors and accuracy in case of RBF kernel

I am using the RBF kernel SVM function in MATLAB.
On a couple of datasets, as I keep increasing the sigma value, the number of support vectors increases and the accuracy increases.
For one dataset, however, as I increase the sigma value, the number of support vectors decreases and the accuracy increases.
I am not able to work out the relation between the number of support vectors and the accuracy in the case of the RBF kernel.
The number of support vectors doesn't have a direct relationship to accuracy; it depends on the shape of the data (and your C/nu parameter).
Higher sigma means that the kernel is a "flatter" Gaussian and so the decision boundary is "smoother"; lower sigma makes it a "sharper" peak, and so the decision boundary is more flexible and able to reproduce strange shapes if they're the right answer. If sigma is very high, your data points will have a very wide influence; if very low, they will have a very small influence.
Thus, often, increasing the sigma values will result in more support vectors: for more-or-less the same decision boundary, more points will fall within the margin, because points become "fuzzier." Increased sigma also means, though, that the slack variables "moving" points past the margin are more expensive, and so the classifier might end up with a much smaller margin and fewer SVs. Of course, it also might just give you a dramatically different decision boundary with a completely different number of SVs.
In terms of maximizing accuracy, you should be doing a grid search over many different values of C and sigma and choosing the combination that gives you the best performance on, e.g., 3-fold cross-validation on your training set. One reasonable approach is to choose from, e.g., 2.^(-9:3:18) for C and median_eval * 2.^(-4:2:10) for sigma; those numbers are fairly arbitrary, but they're ones I've used with success in the past.
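A hedged sketch of such a grid search using fitcsvm (medDist, the median pairwise distance, is only my guess at what "median_eval" stands for; X and y are assumed to be your feature matrix and label vector):
medDist = median(pdist(X));                       % median heuristic for the kernel scale
Cs      = 2.^(-9:3:18);
sigmas  = medDist * 2.^(-4:2:10);
best = struct('err', inf, 'C', NaN, 'sigma', NaN);
for C = Cs
    for s = sigmas
        mdl = fitcsvm(X, y, 'KernelFunction', 'rbf', 'BoxConstraint', C, ...
                      'KernelScale', s, 'CrossVal', 'on', 'KFold', 3);
        err = kfoldLoss(mdl);                     % 3-fold cross-validation error
        if err < best.err
            best = struct('err', err, 'C', C, 'sigma', s);
        end
    end
end
best                                              % the (C, sigma) pair to retrain with on the full set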