How To Do Linearly Separable Binary Classification? - matlab

I want to solve following optimization problem -
Cost Function: 1/2 ||W||^2
Subject to : Y_i(w.X_i - b) >= 1
Where X is a 700x3 matrix, Y is a vector stores the label of classes for those instances (valued as 1/-1) and w.X_i is the dot product of w and X_i.
I am using CVX -
cvx_begin
variable W(3);
variable B;
minimize (0.5*W'*W)
subject to
Y'*(X*W - B) >= 1;
cvx_end
then, I am plotting, w1.x1 + w2.x2 - b
which does not seem to be separating hyper-plane?
Whats wrong am I doing?

In short:
when you are doing w1.x1 + w2.x2 - b you are trying to specify a hyperplane at a particular location, which is also the same as specifying a particular point on a vector. To do either in a 3D space you need to use all three dimensions, so: w1.x1 + w2.x2 +w3.x3 - b
In longer:
When performing a linear classification such as this, the task can be viewed in two ways:
Finding a separating hyperplane such that all samples of one class are on one side, and all samples of the other class are on the other side.
Finding a projection of the multidimensional space which the samples are in, into a single dimensional line, such that there is a point on the line which clearly separates them.
These are identical tasks, since the single dimension in 2 is essentially how far each sample is from the separating hyperplane (and which side said sample is on). I find it helps to bear both of these viewpoints in mind, particularly since the separating hyperplane is the plane orthogonal to the single dimensional vector.
So, in the case you are dealing with, the weight vector w provided by the model is used to project the samples in matrix X onto a single dimensional line and the offset b indicates at which point along this vector the separating hyperplane occurs. By subtracting b from the projected values they are shifted such that this hyperplane is the one orthogonal to the line at point 0 which makes for simple thresholding.

Related

MATLAB - Changepoint analysis or "findchangepts": How does it work?

I am using the function findchangepts and use 'linear' which detects changes in mean and slope. How does it note a change? Is it by consecutive points until the next point has a different mean and slope?
Mathworks has the following explanation:
If x is a vector with N elements, then findchangepts partitions x into two regions, x(1:ipt-1) and x(ipt:N), that minimize the sum of the residual (squared) error of each region from its local mean.
How does the function get ipt?
Thanks in advance!
I am working with a single vector with N elements.

Rotate a basis to align to vector

I have a matrix M of size NxP. Every P columns are orthogonal (M is a basis). I also have a vector V of size N.
My objective is to transform the first vector of M into V and to update the others in order to conservate their orthogonality. I know that the origins of V and M are the same, so it is basically a rotation from a certain angle. I assume we can find a matrix T such that T*M = M'. However, I can't figure out the details of how to do it (with MATLAB).
Also, I know there might be an infinite number of transforms doing that, but I'd like to get the simplest one (in which others vectors of M approximately remain the same, i.e no rotation around the first vector).
A small picture to illustrate. In my actual case, N and P can be large integers (not necessarily 3):
Thanks in advance for your help!
[EDIT] Alternative solution to Gram-Schmidt (accepted answer)
I managed to get a correct solution by retrieving a rotation matrix R by solving an optimization problem minimizing the 2-norm between M and R*M, under the constraints:
V is orthogonal to R*M[1] ... R*M[P-1] (i.e V'*(R*M[i]) = 0)
R*M[0] = V
Due to the solver constraints, I couldn't indicate that R*M[0] ... R*M[P-1] are all pairwise orthogonal (i.e (R*M)' * (R*M) = I).
Luckily, it seems that with this problem and with my solver (CVX using SDPT3), the resulting R*M[0] ... R*M[P-1] are also pairwise orthogonal.
I believe you want to use the Gram-Schmidt process here, which finds an orthogonal basis for a set of vectors. If V is not orthogonal to M[0], you can simply change M[0] to V and run Gram-Schmidt, to arrive at an orthogonal basis. If it is orthogonal to M[0], instead change another, non-orthogonal vector such as M[1] to V and swap the columns to make it first.
Mind you, the vector V needs to be in the column space of M, or you will always have a different basis than you had before.
Matlab doesn't have a built-in Gram-Schmidt command, although you can use the qr command to get an orthogonal basis. However, this won't work if you need V to be one of the vectors.
Option # 1 : if you have some vector and after some changes you want to rotate matrix to restore its orthogonality then, I believe, this method should work for you in Matlab
http://www.mathworks.com/help/symbolic/mupad_ref/numeric-rotationmatrix.html
(edit by another user: above link is broken, possible redirect: Matrix Rotations and Transformations)
If it does not, then ...
Option # 2 : I did not do this in Matlab but a part of another task was to find Eigenvalues and Eigenvectors of the matrix. To achieve this I used SVD. Part of SVD algorithm was Jacobi Rotation. It says to rotate the matrix until it is almost diagonalizable with some precision and invertible.
https://math.stackexchange.com/questions/222171/what-is-the-difference-between-diagonalization-and-orthogonal-diagonalization
Approximate algorithm of Jacobi rotation in your case should be similar to this one. I may be wrong at some point so you will need to double check this in relevant docs :
1) change values in existing vector
2) compute angle between actual and new vector
3) create rotation matrix and ...
put Cosine(angle) to diagonal of rotation matrix
put Sin(angle) to the top left corner of the matric
put minus -Sin(angle) to the right bottom corner of the matrix
4) multiple vector or matrix of vectors by rotation matrix in a loop until your vector matrix is invertible and diagonalizable, ability to invert can be calculated by determinant (check for singularity) and orthogonality (matrix is diagonalized) can be tested with this check - if Max value in LU matrix is less then some constant then stop rotation, at this point new matrix should contain only orthogonal vectors.
Unfortunately, I am not able to find exact pseudo code that I was referring to in the past but these links may help you to understand Jacobi Rotation :
http://www.physik.uni-freiburg.de/~severin/fulltext.pdf
http://web.stanford.edu/class/cme335/lecture7.pdf
https://www.nada.kth.se/utbildning/grukth/exjobb/rapportlistor/2003/rapporter03/maleko_mercy_03003.pdf

How can I generate a set of n dimensional vectors that contains all integer points in an n-dimensional rectangular prism

Okay, so I'm working on a problem related to quantum chaos and one of the things I need to do is to map the unit cube in n-dimensions to a parallelepiped in n-dimensions and find all integer points in the interior of this parallelepiped. I have been trying to do this using the following scheme:
Given the linear map B and the dimension of the cube n, we find the coordinates of the corners of the unit hypercube by converting numbers j from 0 to (2^n -1) into their binary representation and turning them into vectors that describe the vertices of the cube.
The next step was to apply the map B to each of these vectors, which gives me a set of 2^n vectors describing the coordinates of the vertices of the parallelepiped in n dimensions
Now, we take the maximum and minimum value attained by any of these vertices in each coordinate direction, i.e the first element of my vectors might have a maximum value of 4 across all of the vertices and a minimum value of -3 etc. This gives me an n-dimensional rectangular prism that contains my parallelepiped and some extra unwanted space.
I now find all points with integer coordinates in this bounding rectangular prism described as vectors in n dimensions
Finally, I apply the inverse of the map B to each of the points and throw away any points that have any coefficients greater than 1 as they must originally have lain outside my unit hypercube.
My issue arises in step 4, I'm struggling to come up with a way of generating all vectors with integer coordinates in my rectangular hyper-prism such that I can change the number of dimensions n on the fly. Ideally, i'd like to be able to increase n at will until it becomes too computationally heavy to do so, but every method of finding all integer points in the prism i've tried so far has relied on n for loops to permute each element and thus I need to rewrite the code every time.
So I guess my question is this, is there any way to code this up so that I can change n on the fly? Also, any thoughts on the idea of the algorithm itself would be appreciated :) It wouldn't surprise me if i've overcomplicated things massively...
EDIT:
Of course as soon as I post the question I see a lovely little link in the side-bar where a clever method has been given already for how to do this: Generate a matrix containing all combinations of elements taken from n vectors
I'll leave this up for the moment just in case anyone has any comments on the method in general, but otherwise (since I can't upvote yet I'll just say it here) Luis Mendo, you are a hero!

gmdistribution for classification in Matlab

Let's assume I have two gmdistibution models that i obtained using
modeldata1=gmdistribution.fit(data1,1);
modeldata2=gmdistribution.fit(data2,1);
Now I have an unknown 'data' observation, and I want to see if it belongs to data1 or data2.
Based on my understanding of these functions, nlogn output using posterior,cluster, or pdf commands wouldn't be a good measure since I am comparing 'data' to two different distributions.
What measure or output should I use find what is the p(data|modeldata1) and p(data|modeldata2) ?
Many thanks,
If I understand you correctly, you want to assign a new, unknown, datapoint to either class 1 or class 2 with the descriptors for each class (in this case the mean vector and covariance matrix) found by gmdistribution.fit.
In seeing this new datapoint, lets call it x, you should ask yourself what is
p(modeldata1 | x) and p(modeldata2 | x) and which ever one of these is the highest you should assign x to.
So how do you find these? You just apply Bayes rule and pick which ever one is the largest of:
p(modeldata1 | x) = p(x|modeldata1)p(modeldata1)/p(x)
p(modeldata1 | x) = p(x|modeldata2)p(modeldata2)/p(x)
Here you dont need to calculate p(x) as it is the same in each equation.
So, now you estimate the priors p(modeldata1) and p(modeldata2) by the number of training points from each class (or use some given information) and then calculate
p(x|modeldata1)=1/((2pi)^d/2 * sqrt(det(Sigma1)))*exp(0.5*(x-mu1)/Sigma1*(x-mu1))
where d is the dimensionality of your data, Sigma is a corvariance matrix, and mu is a mean vector. This is then your asked for p(data|modeldata1). (Just remember to also use p(modeldata1) and p(modeldata2) when you do the classification).
I know this was a bit unclear, but hopefully it can help you with a step in the right direction.
EDIT: Personally, I find a visualization such as the one below (takes from Pattern Recognition by Theodoridis and Koutroumbas). Here you have two gaussian mixtures with some priors and different covariance matrices. The blue area is where you would choose one class, while the gray area is where the other would be choosen.

Matlab: how to find which variables from dataset could be discarded using PCA in matlab?

I am using PCA to find out which variables in my dataset are redundand due to being highly correlated with other variables. I am using princomp matlab function on the data previously normalized using zscore:
[coeff, PC, eigenvalues] = princomp(zscore(x))
I know that eigenvalues tell me how much variation of the dataset covers every principal component, and that coeff tells me how much of i-th original variable is in the j-th principal component (where i - rows, j - columns).
So I assumed that to find out which variables out of the original dataset are the most important and which are the least I should multiply the coeff matrix by eigenvalues - coeff values represent how much of every variable each component has and eigenvalues tell how important this component is.
So this is my full code:
[coeff, PC, eigenvalues] = princomp(zscore(x));
e = eigenvalues./sum(eigenvalues);
abs(coeff)/e
But this does not really show anything - I tried it on a following set, where variable 1 is fully correlated with variable 2 (v2 = v1 + 2):
v1 v2 v3
1 3 4
2 4 -1
4 6 9
3 5 -2
but the results of my calculations were following:
v1 0.5525
v2 0.5525
v3 0.5264
and this does not really show anything. I would expect the result for variable 2 show that it is far less important than v1 or v3.
Which of my assuptions is wrong?
EDIT I have completely reworked the answer now that I understand which assumptions were wrong.
Before explaining what doesn't work in the OP, let me make sure we'll have the same terminology. In principal component analysis, the goal is to obtain a coordinate transformation that separates the observations well, and that may make it easy to describe the data , i.e. the different multi-dimensional observations, in a lower-dimensional space. Observations are multidimensional when they're made up from multiple measurements. If there are fewer linearly independent observations than there are measurements, we expect at least one of the eigenvalues to be zero, because e.g. two linearly independent observation vectors in a 3D space can be described by a 2D plane.
If we have an array
x = [ 1 3 4
2 4 -1
4 6 9
3 5 -2];
that consists of four observations with three measurements each, princomp(x) will find the lower-dimensional space spanned by the four observations. Since there are two co-dependent measurements, one of the eigenvalues will be near zero, since the space of measurements is only 2D and not 3D, which is probably the result you wanted to find. Indeed, if you inspect the eigenvectors (coeff), you find that the first two components are extremely obviously collinear
coeff = princomp(x)
coeff =
0.10124 0.69982 0.70711
0.10124 0.69982 -0.70711
0.9897 -0.14317 1.1102e-16
Since the first two components are, in fact, pointing in opposite directions, the values of the first two components of the transformed observations are, on their own, meaningless: [1 1 25] is equivalent to [1000 1000 25].
Now, if we want to find out whether any measurements are linearly dependent, and if we really want to use principal components for this, because in real life, measurements my not be perfectly collinear and we are interested in finding good vectors of descriptors for a machine-learning application, it makes a lot more sense to consider the three measurements as "observations", and run princomp(x'). Since there are thus three "observations" only, but four "measurements", the fourth eigenvector will be zero. However, since there are two linearly dependent observations, we're left with only two non-zero eigenvalues:
eigenvalues =
24.263
3.7368
0
0
To find out which of the measurements are so highly correlated (not actually necessary if you use the eigenvector-transformed measurements as input for e.g. machine learning), the best way would be to look at the correlation between the measurements:
corr(x)
ans =
1 1 0.35675
1 1 0.35675
0.35675 0.35675 1
Unsurprisingly, each measurement is perfectly correlated with itself, and v1 is perfectly correlated with v2.
EDIT2
but the eigenvalues tell us which vectors in the new space are most important (cover the most of variation) and also coefficients tell us how much of each variable is in each component. so I assume we can use this data to find out which of the original variables hold the most of variance and thus are most important (and get rid of those that represent small amount)
This works if your observations show very little variance in one measurement variable (e.g. where x = [1 2 3;1 4 22;1 25 -25;1 11 100];, and thus the first variable contributes nothing to the variance). However, with collinear measurements, both vectors hold equivalent information, and contribute equally to the variance. Thus, the eigenvectors (coefficients) are likely to be similar to one another.
In order for #agnieszka's comments to keep making sense, I have left the original points 1-4 of my answer below. Note that #3 was in response to the division of the eigenvectors by the eigenvalues, which to me didn't make a lot of sense.
the vectors should be in rows, not columns (each vector is an
observation).
coeff returns the basis vectors of the principal
components, and its order has little to do with the original input
To see the importance of the principal components, you use eigenvalues/sum(eigenvalues)
If you have two collinear vectors, you can't say that the first is important and the second isn't. How do you know that it shouldn't be the other way around? If you want to test for colinearity, you should check the rank of the array instead, or call unique on normalized (i.e. norm equal to 1) vectors.