I am trying to develop a system for image classification. I am following the article:
INDEPENDENT COMPONENT ANALYSIS (ICA) FOR TEXTURE CLASSIFICATION by Dr. Dia Abu Al Nadi and Ayman M. Mansour
In a paragraph it says:
Given the above texture images, the Independent Components are learned by the method outlined above.
The (8 x 8) ICA basis functions for the above textures are shown in Figure 2, respectively. The dimension is reduced by PCA, resulting in a total of 40 functions. Note that independent components from different window sizes are different.
The "method outlined above" is FastICA, the textures are taken from Brodatz album , each texture image has 640x640 pixels. My question is:
What do the authors mean by "The dimension is reduced by PCA, resulting in a total of 40 functions.", and how can I get those functions using MATLAB?
PCA (Principal Component Analysis) is a method for finding an orthogonal basis (think of a coordinate system) for a high-dimensional (data) space. The "axes" of the PCA basis are sorted by variance, i.e. along the first PCA "axis" your data has the largest variance, along the second "axis" the second largest variance, etc.
This is exploited for dimension reduction: say you have 1000-dimensional data. Then you do a PCA, transform your data into the PCA basis, and throw away all but the first 20 dimensions (just an example). If your data follows a certain statistical distribution, then chances are that the 20 PCA dimensions describe your data almost as well as the 1000 original dimensions did. There are methods for finding the number of dimensions to use, but that is beyond the scope here.
Computationally, PCA amounts to finding the Eigen-decomposition of your data's covariance matrix, in Matlab: [V,D] = eig(cov(MyData)).
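To get something like the 40 basis functions from the question, a minimal sketch along these lines could work. The variable name Patches is hypothetical; it assumes each 8x8 patch has been vectorized into one row of an N-by-64 matrix, and the reduced data would then be fed to FastICA:

% Sketch only: reduce vectorized 8x8 patches to 40 principal directions.
Patches = bsxfun(@minus, Patches, mean(Patches));  % center each feature (column)
[V, D] = eig(cov(Patches));                        % eigendecomposition of the covariance matrix
[~, order] = sort(diag(D), 'descend');             % sort components by variance
V = V(:, order);
Reduced = Patches * V(:, 1:40);                    % keep the 40 dominant components

The number 40 here simply mirrors the paper; in general you would choose it by looking at how much variance the leading components retain.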
Note that if you want to work with these concepts you should do some serious reading. A classic article on what you can do with PCA on image data is Turk and Pentland's Eigenfaces. It also gives some background in an understandable way.
PCA reduces the dimensionality of the data; ICA then extracts independent components, and the number of components it extracts must be less than or equal to the (possibly reduced) data dimension.
Frequently, a Gaussian or uniform prior with zero mean and unit variance is used (often together with linear interpolation) to generate the initial random vectors for the generator model in a Generative Adversarial Network (GAN), where the size of each vector can be chosen arbitrarily, e.g. 100.
Let's say we have 1000 training images and the batch size is 64. Then, in each epoch, we need to generate a random vector from the prior distribution for each image in the mini-batch. The problem I see is that, since there is no mapping between a random vector and a corresponding image, the same image can be generated from multiple initial random vectors. The paper suggests overcoming this problem, to some extent, by using spherical interpolation.
So what happens if I initially generate as many random vectors as there are training images and, when training the model, keep using those same initially generated random vectors?
In GANs, the random seed used as input does not actually correspond to any real input image. What GANs actually do is learn a transformation function from a known noise distribution (e.g. Gaussian) to a complex unknown distribution, which is represented by i.i.d. samples (e.g. your training set). What the discriminator in a GAN does is calculate a divergence (e.g. Wasserstein divergence, KL-divergence, etc.) between the generated data (e.g. the transformed Gaussian) and the real data (your training data). This is done in a stochastic fashion, and therefore no link is necessary between the real and the fake data. If you want to learn more about this with a hands-on example, I can recommend training a Wasserstein GAN to transform one 1D Gaussian distribution into another one. There you can visualize the discriminator and the gradient of the discriminator and really see the dynamics of such a system.
Anyway, what your paper is describing comes after you have trained your GAN and want to see how it has mapped the known noise space to the unknown image space. For this purpose, interpolation schemes such as the spherical one you are quoting have been invented. They also show that the GAN has learned to map some parts of the latent space to key characteristics in images, like smiles. But this has nothing to do with the training of GANs.
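If you want to try the spherical interpolation yourself, a minimal MATLAB sketch could look like this (the helper name slerp is mine, not from the paper):

function z = slerp(z1, z2, t)
    % Spherical linear interpolation between two latent vectors z1 and z2,
    % with t in [0, 1]; t = 0 returns z1, t = 1 returns z2.
    theta = acos(dot(z1, z2) / (norm(z1) * norm(z2)));  % angle between the vectors
    z = (sin((1 - t) * theta) * z1 + sin(t * theta) * z2) / sin(theta);
end

Feeding slerp(z1, z2, t) for a range of t values to the trained generator lets you visualize how the latent space maps onto image characteristics.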
I have computed colour descriptors of a dataset of images and generated a 152×320 matrix (152 samples and 320 features). I would like to use PCA to reduce the dimensionality of my image descriptor space. I know that I could implement this using MATLAB's built-in PCA function, but as I have just started learning about this concept I would like to write the MATLAB code without the built-in function so I can get a clear understanding of how it works. I tried to find out how to do that online, but all I could find is either the general concept of PCA or implementations that use the built-in functions without explaining clearly how they work. Could anyone help me with step-by-step instructions or a link that explains a simple way to implement PCA for dimensionality reduction? The reason I'm so confused is that there are so many uses for PCA and methods to implement it, and the more I read about it the more confused I get.
PCA basically takes the dominant eigenvectors of the data (or, more precisely, projects the data onto the dominant eigenvectors of its covariance matrix).
What you can do is use the SVD (Singular Value Decomposition).
To imitate MATLAB's pca() function, here is what you should do (a sketch follows the list below):
Center all features (Each column of your data should have zero mean).
Apply the svd() function on your data.
Use the columns of the V matrix as the vectors to project your data onto. Choose the number of columns to use according to the dimension you'd like the reduced data to have.
The projected data is now your new, dimensionality-reduced data.
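As a rough sketch of those steps (assuming X is your n-by-p data matrix and k is the number of dimensions you want to keep):

X0 = bsxfun(@minus, X, mean(X));   % 1) center: each column gets zero mean
[~, ~, V] = svd(X0, 'econ');       % 2) SVD of the centered data
W = V(:, 1:k);                     % 3) first k right singular vectors (principal directions)
Xreduced = X0 * W;                 %    the projected, dimensionality-reduced data

The columns of V come out sorted by singular value, so taking the first k columns corresponds to keeping the directions of largest variance.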
I am using PCA for face recognition. I have obtained the eigenvectors / eigenfaces for each image, which is a column matrix. I want to know if selecting the first three eigenvectors, since their corresponding eigenvalues amount to 70% of the total variance, will be sufficient for face recognition?
Firstly, let's be clear about a few things. The eigenvectors are computed from the covariance matrix formed from the entire dataset, i.e., you reshape each grayscale image of a face into a single column and treat it as a point in R^d space, compute the covariance matrix from these points, and compute the eigenvectors of that covariance matrix. These eigenvectors become a new basis for your space of face images. You do not have eigenvectors for each image. Instead, you represent each face image in terms of the eigenvectors by projecting onto (possibly a subset of) them.
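As a small illustrative sketch of that procedure (variable names are mine; Faces is assumed to be a d-by-n matrix with one vectorized grayscale face per column):

meanFace = mean(Faces, 2);             % average face
A = bsxfun(@minus, Faces, meanFace);   % center every face
[U, ~, ~] = svd(A, 'econ');            % columns of U span the eigenface basis
k = 3;                                 % number of eigenfaces to keep (your choice)
weights = U(:, 1:k)' * A;              % each column: one face expressed in the eigenface basis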
Limitations of eigenfaces
Whether the representation of your face images under this new basis is good enough for face recognition depends on many factors. But in general, the eigenfaces method does not perform well for real-world, unconstrained faces. It only works for faces which are pixel-wise aligned, frontal-facing, and have fairly uniform illumination conditions across the images.
More is not necessarily better
While it is commonly believed (when using PCA) that retaining more variance is better than less, things are much more complicated than that because of two factors: 1) Noise in real world data and 2) dimensionality of data. Sometimes projecting to a lower dimension and losing variance can actually produce better results.
Conclusion
Hence, my answer is that it is difficult to say beforehand whether retaining a certain amount of variance is enough. The number of dimensions (and hence the number of eigenvectors to keep and the associated variance retained) should be determined by cross-validation. But ultimately, as I have mentioned above, eigenfaces is not a good method for face recognition unless you have a "nice" dataset. You might be slightly better off using "Fisherfaces", i.e., LDA on the face images, or combining these methods with Local Binary Patterns (LBP) as features (instead of raw face pixels). But seriously, face recognition is a difficult problem, and in general the state of the art has not reached a stage where it can be deployed in real-world systems.
It's not impossible, but it seems a little unusual to me that only 3 eigenvalues can account for 70% of the variance. How many training samples do you have (what is the total dimension)? Make sure you reshape each image from the database into a vector, normalize the vectors, and then stack them into a matrix. The eigenvalues/eigenvectors are obtained from the covariance matrix of that data matrix.
In theory, 70% variance should be enough to reconstruct a human-recognizable face from the corresponding eigenvectors. However, the optimal number of eigenvectors is better determined by cross-validation: add one eigenvector at a time and observe the reconstructed face and the recognition accuracy. You can even plot the cross-validation accuracy curve; there may be a sharp corner on it, and the corresponding number of eigenvectors is a good candidate to use on your test set.
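A small sketch of checking how much variance the first k eigenvalues retain (eigVals is assumed to hold the eigenvalues sorted in descending order):

explained = cumsum(eigVals) / sum(eigVals);   % cumulative fraction of variance retained
k70 = find(explained >= 0.70, 1);             % smallest k reaching 70% of the variance
plot(explained); xlabel('number of eigenvectors'); ylabel('fraction of variance retained');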
I understand the concept of PCA, and what it's doing, but trying to apply the concept to my application is proving difficult.
I have a 1 by X matrix of a physiological signal (it's not EMG, but very similar, so think of it as EMG if it helps) which contains various noise and artefacts. What I've noticed about the noise is that some of it is very large, and I would assume that after PCA it would correspond to the largest principal component; hence my idea of using PCA for some dimensionality reduction.
My problem is that with a 1 by X matrix there is no covariance matrix, only a single variance, and so the eigenvectors and the rest of PCA fall through.
I know I need to rearrange my data into a matrix with more than one dimension, but this is where I need some suggestions. Do I split my data into windows of equal length to create a higher-dimensional matrix which I can apply PCA to? Do I perform several trials of the same action so I have lots of data sets (this would be impractical for my application)?
Any suggestions or examples would be helpful. I'm using MATLAB to perform this task.
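To make the windowing idea concrete, here is a rough sketch of what I mean (signal and winLen are placeholder names; I'm not sure fixed-length windows are appropriate for my signal):

nWin = floor(length(signal) / winLen);                       % number of complete windows
Windows = reshape(signal(1:nWin * winLen), winLen, nWin)';   % one window per row
[coeff, score, latent] = pca(Windows);                       % PCA across windows is now possible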
I have to write a classifier (Gaussian mixture model) that I will use for human action recognition.
I have 4 video datasets. I choose 3 of them as the training set and 1 as the testing set.
Before I apply the GM model to the training set, I run PCA on it.
pca_coeff = princomp(training_data);                              % principal component coefficients of the training data
score = training_data * pca_coeff;                                % project the training data onto the components
training_data = score(:, 1:min(size(score, 2), numDimension));    % keep the first numDimension components
During the testing step, what should I do? Should I compute a new princomp on the testing data:
new_pca_coeff=princomp(testing_data);
score = testing_data * new_pca_coeff;
testing_data = score(:,1:min(size(score,2),numDimension));
or should I use the pca_coeff that I computed for the training data?
score = testing_data * pca_coeff;
testing_data = score(:,1:min(size(score,2),numDimension));
The classifier is being trained on data in the space defined by the principal components of the training data. It doesn't make sense to evaluate it in a different space; therefore, you should apply the same transformation to the testing data as you did to the training data, so don't compute a different pca_coeff.
Incidentally, if your testing data is drawn independently from the same distribution as the training data, then for large enough training and test sets the principal components should be approximately the same.
One method for choosing how many principal components to use involves examining the eigenvalues from the PCA decomposition. You can get these from the princomp function like this:
[pca_coeff, score, eigenvalues] = princomp(data);
The eigenvalues variable will then be an array where each element describes the amount of variance accounted for by the corresponding principal component. If you do:
plot(eigenvalues);
you should see that the first eigenvalue will be the largest, and they will rapidly decrease (this is called a "Scree Plot", and should look like this: http://www.ats.ucla.edu/stat/SPSS/output/spss_output_pca_5.gif, though yours may have up to 800 points instead of 12).
Principal components with small corresponding eigenvalues are unlikely to be useful, since the variance of the data in those dimensions is so small. Many people choose a threshold value and then select all principal components whose eigenvalue is above that threshold. An informal way of picking the threshold is to look at the Scree plot and choose the threshold to be just after the line 'levels out'; in the image I linked earlier, a good value might be ~0.8, selecting 3 or 4 principal components.
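For example, a minimal sketch of that threshold idea (the 0.8 value is just taken from that example plot):

threshold = 0.8;                   % read off the Scree plot
keep = eigenvalues > threshold;    % logical index of retained components
reduced_data = score(:, keep);     % data projected onto the retained components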
IIRC, you could do something like:
proportion_of_variance = sum(eigenvalues(1:k)) ./ sum(eigenvalues);
to calculate "the proportion of variance described by the low dimensional data".
However, since you are using the principal components for a classification task, you can't really be sure that any particular number of PCs is optimal; the variance of a feature doesn't necessarily tell you anything about how useful it will be for classification. An alternative to choosing PCs with the Scree plot is just to try classification with various numbers of principal components and see what the best number is empirically.
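A rough sketch of that empirical approach (trainAndEvaluate is a hypothetical function that trains your classifier on the first k components and returns its accuracy, e.g. via cross-validation):

accuracies = zeros(1, size(score, 2));
for k = 1:size(score, 2)
    accuracies(k) = trainAndEvaluate(score(:, 1:k));   % hypothetical helper
end
[bestAccuracy, bestK] = max(accuracies);                % number of components that works best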