gmdistribution for classification in Matlab

Let's assume I have two gmdistribution models that I obtained using
modeldata1=gmdistribution.fit(data1,1);
modeldata2=gmdistribution.fit(data2,1);
Now I have an unknown 'data' observation, and I want to see if it belongs to data1 or data2.
Based on my understanding of these functions, the NlogL output from the posterior, cluster, or pdf commands wouldn't be a good measure, since I am comparing 'data' to two different distributions.
What measure or output should I use to find p(data|modeldata1) and p(data|modeldata2)?
Many thanks,

If I understand you correctly, you want to assign a new, unknown data point to either class 1 or class 2 using the descriptors for each class (in this case the mean vector and covariance matrix) found by gmdistribution.fit.
When seeing this new data point, let's call it x, you should ask yourself what
p(modeldata1 | x) and p(modeldata2 | x) are, and assign x to whichever of these is higher.
So how do you find these? You just apply Bayes' rule and pick whichever of the following is larger:
p(modeldata1 | x) = p(x|modeldata1)p(modeldata1)/p(x)
p(modeldata2 | x) = p(x|modeldata2)p(modeldata2)/p(x)
Here you don't need to calculate p(x), as it is the same in both equations.
So, you now estimate the priors p(modeldata1) and p(modeldata2) from the number of training points in each class (or use some given information), and then calculate
p(x|modeldata1) = 1/((2*pi)^(d/2) * sqrt(det(Sigma1))) * exp(-0.5*(x-mu1)'*inv(Sigma1)*(x-mu1))
where d is the dimensionality of your data, Sigma1 is the covariance matrix, and mu1 is the mean vector. This is the p(data|modeldata1) you asked for. (Just remember to also use the priors p(modeldata1) and p(modeldata2) when you do the classification.)
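A minimal sketch of that comparison in code, assuming modeldata1/modeldata2 are the fitted objects from the question, x is a 1-by-d observation, and the priors are estimated from the training set sizes (my assumption):
prior1 = size(data1,1) / (size(data1,1) + size(data2,1));
prior2 = 1 - prior1;
like1 = pdf(modeldata1, x);   % p(x | modeldata1), the Gaussian density above
like2 = pdf(modeldata2, x);   % p(x | modeldata2)
if like1*prior1 > like2*prior2
    assignedClass = 1;        % x is more probable under modeldata1
else
    assignedClass = 2;
end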
I know this was a bit unclear, but hopefully it helps you take a step in the right direction.
EDIT: Personally, I find a visualization such as the one below helpful (taken from Pattern Recognition by Theodoridis and Koutroumbas). Here you have two Gaussian mixtures with some priors and different covariance matrices. The blue area is where you would choose one class, while the gray area is where the other would be chosen.
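For reference, a rough sketch of how such a decision-region picture can be produced in Matlab (with made-up means, covariances and priors, not the book's figure):
mu1 = [0 0];  Sigma1 = [1 0.3; 0.3 1];       prior1 = 0.5;
mu2 = [2 2];  Sigma2 = [1.5 -0.2; -0.2 0.8]; prior2 = 0.5;
[xg, yg] = meshgrid(linspace(-4, 6, 200));
p1 = prior1 * mvnpdf([xg(:) yg(:)], mu1, Sigma1);
p2 = prior2 * mvnpdf([xg(:) yg(:)], mu2, Sigma2);
region = reshape(p1 > p2, size(xg));         % true where class 1 would be chosen
imagesc(xg(1,:), yg(:,1), region); axis xy;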

Related

Should I perform data centering before apply SVD?

I have to use SVD in Matlab to obtain a reduced version of my data.
I've read that the function svds(X,k) performs the SVD and returns the k largest singular values and the corresponding singular vectors. There is no mention in the documentation of whether the data have to be normalized.
By normalization I mean both subtraction of the mean value and division by the standard deviation.
When I implemented PCA myself, I used to normalize in that way. But I know that it is not needed when using the Matlab function pca(), because it computes the covariance matrix using cov(), which implicitly performs the centering.
So, the question is: I need the projection matrix to reduce my n-dim data to k-dim by SVD. Should I perform this normalization on the training data (and then apply the same normalization to new data before projecting) or not?
Thanks
Essentially, the answer is yes, you should typically perform normalization. The reason is that features can have very different scales, and we typically do not want the scaling to dominate when considering how much unique information each feature carries.
Suppose we have two features x and y, both with variance 1, but where x has a mean of 1 and y has a mean of 1000. Then the matrix of samples will look like
n = 500;                % number of samples
x = 1 + randn(n,1);     % mean 1, variance 1
y = 1000 + randn(n,1);  % mean 1000, variance 1
svd([x,y])              % singular values of the raw (uncentered) data
But the problem is that the scale of y (without normalizing) essentially washes out the small variations in x. Specifically, if we just examine the singular values of [x,y], we might be inclined to say that x is a linear factor of y (since one of the singular values is much smaller than the other). But we know that is not the case, since x was generated independently.
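Continuing the example above, comparing the singular values with and without centering makes the effect visible:
s_raw      = svd([x, y]);                       % dominated by the large mean of y
s_centered = svd([x - mean(x), y - mean(y)]);   % both directions now on a comparable scale
disp([s_raw, s_centered]);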
In fact, you will often find that you only see the "real" structure in a signal once you remove the mean. At the extreme end, you could imagine that we have some feature
z = 1e6 + sin(t)
Now if somebody just gave you those numbers, you might look at the sequence
z = 1000001.54, 1000001.2, 1000001.4,...
and just think, "that signal is boring, it is basically just 1e6 plus some round-off terms...". But once we remove the mean, we see the signal for what it actually is: a very interesting and specific one indeed. So, long story short, you should always remove the mean and scale.
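A tiny illustration of that point (with a made-up t, just to visualize it):
t = linspace(0, 4*pi, 200)';
z = 1e6 + sin(t);        % looks like a constant plus round-off error
plot(t, z - mean(z));    % removing the mean reveals the sinusoid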
It really depends on what you want to do with your data. Centering and scaling can be helpful to obtain principal components that are representative of the shape of the variations in the data, irrespective of the scaling. I would say it is mostly needed if you want to further use the principal components themselves, particularly if you want to visualize them. It can also help during classification, since your scores will then be normalized, which may help your classifier. However, it depends on the application, since in some applications the energy also carries useful information that one should not discard - there is no general answer!
Now you write that all you need is "the projection matrix useful to reduce my n-dim data to k-dim ones by SVD". In this case, there is no need to center or scale anything:
[U,~] = svd(TrainingData);
ReducedData = U(:,1:k)'*TestData;
will do the job. svds may be worth considering when your TrainingData is huge (in both dimensions) so that svd is too slow (if it is huge in only one dimension, just apply svd to the Gram matrix).
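For the Gram-matrix route, a sketch under the assumption that TrainingData is d-by-n (features in rows, as the projection above suggests) with n much larger than d:
G = TrainingData * TrainingData';      % d-by-d Gram matrix
[U, D] = eig(G);                       % its eigenvectors are the left singular vectors
[~, order] = sort(diag(D), 'descend');
U = U(:, order);                       % largest-variance directions first
ReducedData = U(:,1:k)' * TestData;    % same projection as before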
It depends!!!
A common use case in signal processing where it makes no sense to normalize is noise reduction via dimensionality reduction of correlated signals, where all the features are contaminated with random Gaussian noise of the same variance. In that case, if the magnitude of a certain feature is twice as large, its SNR is also approximately twice as large, so normalizing the features makes no sense: it would just amplify the parts with the worse SNR and shrink the parts with the good SNR. You also don't need to subtract the mean in that case (as is done in PCA); the mean (or DC component) is no different from any other frequency.

How To Do Linearly Separable Binary Classification?

I want to solve the following optimization problem:
Cost Function: 1/2 ||W||^2
Subject to : Y_i(w.X_i - b) >= 1
where X is a 700x3 matrix, Y is a vector that stores the class labels for those instances (valued 1/-1), and w.X_i is the dot product of w and X_i.
I am using CVX -
cvx_begin
variable W(3);
variable B;
minimize (0.5*W'*W)
subject to
Y'*(X*W - B) >= 1;
cvx_end
Then I am plotting w1.x1 + w2.x2 - b, which does not seem to be a separating hyperplane.
What am I doing wrong?
In short:
When you are doing w1.x1 + w2.x2 - b you are trying to specify a hyperplane at a particular location, which is the same as specifying a particular point on a vector. To do either in a 3D space you need to use all three dimensions, so: w1.x1 + w2.x2 + w3.x3 - b
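For the plotting itself, a minimal sketch (assuming W and B are the CVX solution from the question, X and Y are the data, and W(3) is nonzero):
[x1g, x2g] = meshgrid(linspace(min(X(:,1)), max(X(:,1)), 20), ...
                      linspace(min(X(:,2)), max(X(:,2)), 20));
x3g = (B - W(1)*x1g - W(2)*x2g) / W(3);          % plane w1*x1 + w2*x2 + w3*x3 = b solved for x3
surf(x1g, x2g, x3g, 'FaceAlpha', 0.3); hold on;
plot3(X(Y==1,1),  X(Y==1,2),  X(Y==1,3),  'bo'); % one class
plot3(X(Y==-1,1), X(Y==-1,2), X(Y==-1,3), 'rx'); % the other class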
In longer:
When performing a linear classification such as this, the task can be viewed in two ways:
1. Finding a separating hyperplane such that all samples of one class are on one side, and all samples of the other class are on the other side.
2. Finding a projection of the multidimensional space the samples are in onto a single-dimensional line, such that there is a point on the line which clearly separates them.
These are identical tasks, since the single dimension in 2 is essentially how far each sample is from the separating hyperplane (and which side said sample is on). I find it helps to bear both of these viewpoints in mind, particularly since the separating hyperplane is the plane orthogonal to the single dimensional vector.
So, in the case you are dealing with, the weight vector w provided by the model is used to project the samples in matrix X onto a single dimensional line and the offset b indicates at which point along this vector the separating hyperplane occurs. By subtracting b from the projected values they are shifted such that this hyperplane is the one orthogonal to the line at point 0 which makes for simple thresholding.
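In code, that projection view looks something like this (again assuming X, Y, W, B as in the question):
scores    = X*W - B;               % signed distance (up to a scale of norm(W)) from the hyperplane
predicted = sign(scores);          % threshold at 0 to get the +1 / -1 class
accuracy  = mean(predicted == Y);  % fraction of samples on their correct side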

How to select top 100 features(a subset) which are most relevant after pca?

I performed PCA on a 63*2308 matrix and obtained a score matrix and a coefficient matrix. The score matrix is 63*2308 and the coefficient matrix is 2308*2308 in dimensions.
How do I extract the column names for the top 100 features which are most important, so that I can perform regression on them?
PCA should give you both a set of eigenvectors (your coefficient matrix) and a vector of eigenvalues (1*2308), often referred to as lambda. You might need to use a different PCA function in Matlab to get them.
The eigenvalues indicate how much of your data each eigenvector explains. A simple method for selecting features would be to select the 100 features with the highest eigenvalues. This gives you a set of features which explain most of the variance in the data.
If you need to justify your approach for a write-up, you can calculate the amount of variance explained per eigenvector and cut off at, for example, 95% variance explained.
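For that cutoff, a small sketch assuming a vector of eigenvalues such as the one returned in the princomp example further down:
explained = eigenvalues / sum(eigenvalues);            % fraction of variance per component
num_components = find(cumsum(explained) >= 0.95, 1);   % smallest number of components reaching 95%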
Bear in mind that selecting based solely on eigenvalue, might not correspond to the set of features most important to your regression, so if you don't get the performance you expect you might want to try a different feature selection method such as recursive feature selection. I would suggest using google scholar to find a couple of papers doing something similar and see what methods they use.
A quick Matlab example of taking the top 100 principal components using PCA:
[eigenvectors, projected_data, eigenvalues] = princomp(X);
[foo, feature_idx] = sort(eigenvalues, 'descend');   % princomp already returns these sorted, but to be explicit
selected_projected_data = projected_data(:, feature_idx(1:100));
Have you tried with
B = sort(your_matrix,2,'descend');
C = B(:,1:100);
Be careful!
With just 63 observations and 2308 variables, your PCA result will be meaningless because the data are underdetermined. You should have at least (as a rule of thumb) three times as many observations as dimensions.
With 63 observations you can at most define a 62-dimensional subspace!

Different fundamental matrix from the same projection matrices

I use two projection matrices P1 and P2 (for example I'm using dinosaur dataset) and I need to compute the fundamental matrix F.
So I use two Matlab functions:
Peter Kovesi's function: www.csse.uwa.edu.au/~pk/Research/MatlabFns/Projective/fundfromcameras.m
Zisserman: www.robots.ox.ac.uk/~vgg/hzbook/code/vgg_multiview/vgg_F_from_P.m
These functions should do the same thing, but I get a different F from each! How is that possible? Which is the right function?
If two points X1 and X2 are "the same" point in two different images, then X2^T F X1 = 0 ...
So I found corresponding points in two images rotated by 5 degrees using SURF, but X2^T F X1 is never equal to zero with these two functions.
Any ideas?
Instead, if I use this function, which computes F from matched points:
ransac fit fundamental matrix by Peter Kovesi: ransacfitfundmatrix.m
I get X2^T F X1 = 0 .... Obviously this F is different from the two F matrices I got with the other two functions...
Well, for one thing, it's overwhelmingly likely that the points aren't perfectly rotated versions of each other. SURF uses a lot of approximations, bilinear interpolation, and a whole slew of things that break true rotational invariance. So such a fundamental matrix might not exist (if there's no exact linear relationship between the two sets of points). Yes, this is true even after you do point matching.
That said, your X2^T*F*X1 should be small if the matching is really good, but I'd be surprised if it's ever exactly zero for any real image.
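A small sketch of that check, assuming the matched points x1 and x2 are stored as 3-by-N homogeneous coordinates and F is one of the fundamental matrices:
residuals = sum(x2 .* (F * x1), 1);   % x2(:,i)' * F * x1(:,i) for every match
disp(max(abs(residuals)));            % should be small, but rarely exactly zero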
The fundamental matrix is unique only up to a scale.
So, even if you have different fundamental matrices, both can be correct for your images.
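If you want to check whether two fundamental matrices F1 and F2 agree up to scale, one way (a sketch, assuming both are nonzero 3x3 matrices) is to normalize them before comparing:
F1n = F1 / norm(F1, 'fro');
F2n = F2 / norm(F2, 'fro');
if F1n(:)' * F2n(:) < 0, F2n = -F2n; end   % fix the arbitrary sign
max(abs(F1n(:) - F2n(:)))                  % small value => same matrix up to scale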

Matlab Principal Component Analysis (eigenvalues order)

I want to use the "princomp" function of Matlab, but this function returns the eigenvalues in a sorted array. This way I can't find out which eigenvalue corresponds to which column.
For Matlab,
m = [1,2,3;4,5,6;7,8,9];
[pc,score,latent] = princomp(m);
is the same as
m = [2,1,3;5,4,6;8,7,9];
[pc,score,latent] = princomp(m);
That is, swapping the first two columns does not change anything. The result (eigenvalues) in latent will be: (27,0,0)
The information (which eigenvalue corresponds to which original (input) column) is lost.
Is there a way to tell Matlab not to sort the eigenvalues?
With PCA, each principal component returned will be a linear combination of the original columns/dimensions. Perhaps an example might clear up any misunderstanding you have.
Let's consider the Fisher-Iris dataset, comprising 150 instances and 4 dimensions, and apply PCA to the data. To make things easier to understand, I first zero-center the data before calling the PCA function:
load fisheriris
X = bsxfun(@minus, meas, mean(meas)); %# so that mean(X) is the zero vector
[PC score latent] = princomp(X);
Let's look at the first returned principal component (1st column of the PC matrix):
>> PC(:,1)
0.36139
-0.084523
0.85667
0.35829
This is expressed as a linear combination of the original dimensions, i.e.:
PC1 = 0.36139*dim1 + -0.084523*dim2 + 0.85667*dim3 + 0.35829*dim4
Therefore to express the same data in the new coordinates system formed by the principal components, the new first dimension should be a linear combination of the original ones according to the above formula.
We can compute this simply as X*PC, which is exactly what is returned in the second output of PRINCOMP (score). To confirm this, try:
>> all(all( abs(X*PC - score) < 1e-10 ))
1
Finally the importance of each principal component can be determined by how much variance of the data it explains. This is returned by the third output of PRINCOMP (latent).
We can compute the PCA of the data ourselves without using PRINCOMP:
[V E] = eig( cov(X) );
[E order] = sort(diag(E), 'descend');
V = V(:,order);
The eigenvectors of the covariance matrix V are the principal components (the same as PC above, although the signs may be flipped), and the corresponding eigenvalues E represent the amount of variance explained (the same as latent). Note that it is customary to sort the principal components by their eigenvalues. As before, to express the data in the new coordinate system, we simply compute X*V (this should be the same as score above, once you match the signs).
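Continuing the snippets above, a quick sanity check that the eig-based decomposition matches princomp up to sign (assuming no repeated eigenvalues):
all(all( abs(abs(V) - abs(PC)) < 1e-10 ))   % eigenvectors match up to sign
all( abs(E - latent) < 1e-10 )              % eigenvalues match latent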
"The information (which eigenvalue corresponds to which original (input) column) is lost."
Since each principal component is a linear function of all input variables, each principal component (eigenvector, eigenvalue) corresponds to all of the original input columns. Ignoring possible changes in sign, which are arbitrary in PCA, re-ordering the input variables will not change the PCA results.
"Is there a way to tell matlab to not to sort the eigenvalues?"
I doubt it: PCA (and eigen analysis in general) conventionally sorts the results by variance, though I'd note that princomp() sorts from greatest to least variance, while eig() sorts in the opposite direction.
For more explanation of PCA using MATLAB illustrations, with or without princomp(), see:
Principal Components Analysis