Plot SVM margins using MATLAB and libsvm

I am using libsvm to linearly classify two-dimensional, non-separable data. I am able to train the SVM and obtain w and b using libsvm. Using this information I can plot the decision boundary, along with the support vectors, but I am not sure how to plot the margins, using the information that libsvm gives me.
Below is my code:
model = svmtrain(Y,X, '-s 0 -t 0 -c 100');
w = model.SVs' * model.sv_coef;
b = -model.rho;
if (model.Label(1) == -1)
    w = -w; b = -b;
end
y_hat = sign(w'*X' + b);
sv = full(model.SVs);
% plot support vectors
plot(sv(:,1),sv(:,2),'ko', 'MarkerSize', 10);
% plot decision boundary
plot_x = linspace(min(X(:,1)), max(X(:,1)), 30);
plot_y = (-1/w(2))*(w(1)*plot_x + b);
plot(plot_x, plot_y, 'k-', 'LineWidth', 1)

It depends on what you mean by "the margins". It also depends on which SVM version you are talking about (separable or non-separable), but since you mentioned libsvm I'll assume you mean the more general, non-separable version.
The term "margin" can refer to the Euclidean distance from the separating hyperplane to the hyperplane defined by wx+b=1 (or wx+b=-1). This distance is given by 1/norm(w).
"Margin" can also refer to the margin of a specific sample x, which is the Euclidean distance of x from the separating hyperplane. It is given by
(wx+b)/norm(w)
Note that this is a signed distance: it is negative or positive depending on which side of the hyperplane the point x lies. You can draw it as a line from the point, perpendicular to the hyperplane.
Another interesting value is the slack variable xi, which is the "algebraic" distance (not Euclidean) of a support vector from the "hard" margin defined by wx+b=+1 (or -1). It is positive only for support vectors, and if a point is not a support vector, its xi equals 0. More compactly:
xi = max(0, 1 - y*(w'*x+b))
where y is the label.
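For instance, to draw the two margin hyperplanes wx+b=+1 and wx+b=-1 with the w, b and plot_x from the question's code, a minimal sketch (the variable names are the question's):
% the two margin hyperplanes w*x+b = +1 and w*x+b = -1
plot_y_pos = (-1/w(2))*(w(1)*plot_x + b - 1); % w*x+b = +1
plot_y_neg = (-1/w(2))*(w(1)*plot_x + b + 1); % w*x+b = -1
plot(plot_x, plot_y_pos, 'k--');
plot(plot_x, plot_y_neg, 'k--');
% signed Euclidean margin of each sample (rows of X)
margins = (X*w + b) / norm(w);
% slack variables, positive only for points that violate the hard margin
xi = max(0, 1 - Y .* (X*w + b));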

Related

MATLAB - Given a matrix X of samples x_i, a binary column vector y, and a vector w, plot all of these in a 3D graph

I have started to learn machine learning, and programming in MATLAB.
I want to plot a matrix sized m*d where d=3 and m is the number of points.
Using the binary vector y I'd like to color each point blue or red,
and plot the plane described by the vector w normal to it.
The problem I'm trying to solve is to give some kind of visual representation of the data and the linear predictor.
All I know is how to plot single points with plot3, not an arbitrary number of points.
Thanks.
Plot the points using scatter3()
% y is assumed to be a logical index vector; use logical(y) if it is 0/1
scatter3(X(y,1),X(y,2),X(y,3),'filled','MarkerFaceColor','red');
hold on;
scatter3(X(~y,1),X(~y,2),X(~y,3),'filled','MarkerFaceColor','blue');
or using plot3()
plot3(X(y,1),X(y,2),X(y,3),'o','MarkerEdgeColor','red','MarkerFaceColor','red');
hold on;
plot3(X(~y,1),X(~y,2),X(~y,3),'o','MarkerEdgeColor','blue','MarkerFaceColor','blue');
There are a few ways to plot a plane. As long as w(3) isn't very close to 0 then the following will work okay. I'm assuming your plane is defined by x'*w+b=0 where b is a scalar and w and x are column vectors.
x1min = min(X(:,1)); x2min = min(X(:,2));
x1max = max(X(:,1)); x2max = max(X(:,2));
[x1,x2] = meshgrid(linspace(x1min,x1max,20), linspace(x2min, x2max, 20));
x3 = -(w(1)*x1 + w(2)*x2 + b)/w(3);
surf(x1,x2,x3,'FaceColor',[0.6,0.6,0.6],'FaceAlpha',0.7,'EdgeColor',[0.4,0.4,0.4],'EdgeAlpha',0.4);
xlabel('x_1'); ylabel('x_2'); zlabel('x_3'); axis('vis3d');
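If w(3) is close to zero the plane is nearly vertical and solving for x3 becomes ill-conditioned. One workaround (my sketch, not part of the answer above) is to solve for x2 instead whenever abs(w(2)) is the larger coefficient:
% nearly vertical plane: solve x'*w+b=0 for x2 instead of x3
x3min = min(X(:,3)); x3max = max(X(:,3));
[x1,x3] = meshgrid(linspace(x1min,x1max,20), linspace(x3min,x3max,20));
x2 = -(w(1)*x1 + w(3)*x3 + b)/w(2);
surf(x1,x2,x3,'FaceColor',[0.6,0.6,0.6],'FaceAlpha',0.7);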
Resulting plot

Unnormalization of ellipse coefficients after direct ellipse fitting

I am trying to understand the normalization and "unnormalization" steps in the direct least squares ellipse fitting algorithm developed by Fitzgibbon, Pilu and Fisher (improved by Halir and Flusser).
EDITED: More details about the theory added. Is the eigenvalue problem where my confusion stems from?
Short theory:
The ellipse is represented by an implicit second order polynomial (general conic equation):
F(x,y) = a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0
where A = [a, b, c, d, e, f]' is the coefficient vector and z = [x^2, x*y, y^2, x, y, 1] is the monomial vector of a point, so that F(x,y) = z*A.
To constrain this general conic to an ellipse, the coefficients must satisfy the quadratic constraint:
4*a*c - b^2 = 1
which is equivalent to:
A'*C*A = 1
where C is a matrix of zeros except:
C(1,3) = C(3,1) = 2 and C(2,2) = -1
The design matrix D is composed of the monomial vectors z_i of all data points (x_i, y_i), one per row.
The minimization of the distance between the conic and the data points can be expressed as a generalized eigenvalue problem (some theory has been omitted):
D'*D*A = lambda*C*A
Denoting:
S = D'*D
we now have the system:
S*A = lambda*C*A
If we solve this system, the eigenvector corresponding to the single positive eigenvalue is the correct answer. (Note: the code below builds the negated constraint matrix, C(1,3) = C(3,1) = -2 and C(2,2) = 1, so there the relevant eigenvalue is the negative one.)
The code:
The code snippets here are directly from the MATLAB code provided by the authors:
http://research.microsoft.com/en-us/um/people/awf/ellipse/fitellipse.html
The data input is a series of (x,y) points. The points are normalized by subtracting the mean and dividing by the standard deviation (in this case, computed as half the range). I'm assuming this normalization allows for a better fit of the data.
% normalize data
% X and Y are the vectors of data points, not normalized
mx = mean(X);
my = mean(Y);
sx = (max(X)-min(X))/2;
sy = (max(Y)-min(Y))/2;
x = (X-mx)/sx;
y = (Y-my)/sy;
% Build design matrix
D = [ x.*x x.*y y.*y x y ones(size(x)) ];
% Build scatter matrix
S = D'*D;
% Build 6x6 constraint matrix
C(6,6) = 0; C(1,3) = -2; C(2,2) = 1; C(3,1) = -2;
[gevec, geval] = eig(S,C);
% Find the negative eigenvalue
I = find(real(diag(geval)) < 1e-8 & ~isinf(diag(geval)));
% Extract eigenvector corresponding to negative eigenvalue
A = real(gevec(:,I));
After this, the normalization is reversed on the coefficients:
par = [
A(1)*sy*sy, ...
A(2)*sx*sy, ...
A(3)*sx*sx, ...
-2*A(1)*sy*sy*mx - A(2)*sx*sy*my + A(4)*sx*sy*sy, ...
-A(2)*sx*sy*mx - 2*A(3)*sx*sx*my + A(5)*sx*sx*sy, ...
A(1)*sy*sy*mx*mx + A(2)*sx*sy*mx*my + A(3)*sx*sx*my*my ...
- A(4)*sx*sy*sy*mx - A(5)*sx*sx*sy*my ...
+ A(6)*sx*sx*sy*sy ...
]';
At this point, I'm not sure what happened. Why is the unnormalization of the last three coefficients of A (d, e, f) dependent on the first three coefficients? How do you mathematically show where these unnormalization equations come from?
The 2 and 1 coefficients in the unnormalization lead me to believe the constraint matrix must be involved somehow.
Please let me know if more detail is needed on the method...it seems I'm missing how the normalization has propagated through the matrices and eigenvalue problem.
Any help is appreciated. Thanks!
First, let me formalize the problem in a homogeneous space (as used in Richard Hartley and Andrew Zisserman's book Multiple View Geometry):
Assume that,
P=[X,Y,1]'
is our point in the unnormalized space, and
p=lambda*[x,y,1]'
is our point in the normalized space, where lambda is an unimportant free scale (in homogeneous space [x,y,1]=[10*x,10*y,10] and so on).
Now it is clear that we can write
x = (X-mx)/sx;
y = (Y-my)/sy;
as a simple matrix equation like:
p=H*P; %(equation (1))
where
H=[1/sx, 0, -mx/sx;
0, 1/sy, -my/sy;
0, 0, 1];
Also we know that an ellipse with the equation
A(1)*x^2 + A(2)*xy + A(3)*y^2 + A(4)*x + A(5)*y + A(6) = 0 %(first representation)
can be written in matrix form as:
p'*C*p=0 %you can easily verify this by matrix multiplication
where
C=[A(1), A(2)/2, A(4)/2;
A(2)/2, A(3), A(5)/2;
A(4)/2, A(5)/2, A(6)]; %second representation
and
p=[x,y,1]
and it is clear that these two representations of an ellipse are exactly the same and equivalent.
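To see this, multiply out p'*C*p with p = [x, y, 1]':
p'*C*p = A(1)*x^2 + 2*(A(2)/2)*x*y + A(3)*y^2 + 2*(A(4)/2)*x + 2*(A(5)/2)*y + A(6)
       = A(1)*x^2 + A(2)*x*y + A(3)*y^2 + A(4)*x + A(5)*y + A(6)
which is exactly the first representation set to zero.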
Also we know that the vector A=[A(1),A(2),A(3),A(4),A(5),A(6)] is a type-1 representation of the ellipse in the normalized space.
So we can write:
p'*C*p=0
where p is the normalized point and C is as defined previously.
Now we can use equation (1), p=H*P, to derive a useful result:
(H*P)'*C*(H*P)=0
=====>
P'*H'*C*H*P=0
=====>
P'*(H'*C*H)*P=0
=====>
P'*(C1)*P=0 %(equation (2))
We see that the equation (2) is an equation of an ellipse in the unnormalized space where C1 is the type-2 representation of ellipse and we know that:
C1=H'*C*H
And also, because equation (2) is homogeneous (equal to zero), we can multiply it by any non-zero number. So we multiply it by sx^2*sy^2 and can write:
C1=sx^2*sy^2*H'*C*H
And finally we get the result
C1=[ A(1)*sy^2, (A(2)*sx*sy)/2, (A(4)*sx*sy^2)/2 - A(1)*mx*sy^2 - (A(2)*my*sx*sy)/2;
(A(2)*sx*sy)/2, A(3)*sx^2, (A(5)*sx^2*sy)/2 - A(3)*my*sx^2 - (A(2)*mx*sx*sy)/2;
-(- (A(4)*sx^2*sy^2)/2 + (A(2)*my*sx^2*sy)/2 + A(1)*mx*sx*sy^2)/sx, ...
-(- (A(5)*sx^2*sy^2)/2 + A(3)*my*sx^2*sy + (A(2)*mx*sx*sy^2)/2)/sy, ...
(mx*(- (A(4)*sx^2*sy^2)/2 + (A(2)*my*sx^2*sy)/2 + A(1)*mx*sx*sy^2))/sx + (my*(- (A(5)*sx^2*sy^2)/2 + A(3)*my*sx^2*sy + (A(2)*mx*sx*sy^2)/2))/sy + A(6)*sx^2*sy^2 - (A(4)*mx*sx*sy^2)/2 - (A(5)*my*sx^2*sy)/2]
which can be transformed back into the type-1 (vector) representation to get the exact result we were looking for:
[ A(1)*sy^2, A(2)*sx*sy, A(3)*sx^2, A(4)*sx*sy^2 - 2*A(1)*mx*sy^2 - A(2)*my*sx*sy, A(5)*sx^2*sy - 2*A(3)*my*sx^2 - A(2)*mx*sx*sy, A(2)*mx*my*sx*sy + A(1)*mx^2*sy^2 + A(3)*my^2*sx^2 + A(6)*sx^2*sy^2 - A(4)*mx*sx*sy^2 - A(5)*my*sx^2*sy]
If you are curious how I managed to calculate these time-consuming equations, here is the MATLAB code that does it for you:
syms sx sy mx my
syms a b c d e f
C=[a, b/2, d/2;
b/2, c, e/2;
d/2, e/2, f];
H=[1/sx, 0, -mx/sx;
0, 1/sy, -my/sy;
0, 0, 1];
C1=sx^2*sy^2*H.'*C*H
par = [C1(1,1), 2*C1(1,2), C1(2,2), 2*C1(1,3), 2*C1(2,3), C1(3,3)] % type-1 coefficient vector
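As a quick numeric sanity check (a sketch in a fresh workspace, with arbitrary made-up values), you can compare this result against the unnormalization code from the question:
% arbitrary test values
sx = 1.7; sy = 0.6; mx = 2.3; my = -0.4;
A = [0.5; -0.2; 0.8; 0.1; -0.3; -1.0];
% second (matrix) representation and normalization transform
C = [A(1), A(2)/2, A(4)/2; A(2)/2, A(3), A(5)/2; A(4)/2, A(5)/2, A(6)];
H = [1/sx, 0, -mx/sx; 0, 1/sy, -my/sy; 0, 0, 1];
C1 = sx^2*sy^2*(H'*C*H);
% back to the first (vector) representation
par1 = [C1(1,1), 2*C1(1,2), C1(2,2), 2*C1(1,3), 2*C1(2,3), C1(3,3)]';
% par1 should match 'par' computed with the question's formulas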

Using nlinfit to fit Gaussian to x,y paired data

I am trying to use Matlab's nlinfit function to estimate the best fitting Gaussian for x,y paired data. In this case, x is a range of 2D orientations and y is the probability of a "yes" response.
I have copied norm_func from relevant posts and I'd like to return a smoothed, normal distribution that best approximates the observed data in y, and returns the magnitude, mean and SD of the best fitting pdf. At the moment, the fitted function appears to be incorrectly scaled and less than smooth - any help much appreciated!
x = -30:5:30;
y = [0,0.20,0.05,0.15,0.65,0.85,0.88,0.80,0.55,0.20,0.05,0,0;];
% plot raw data
figure(1)
plot(x, y, ':rs');
axis([-35 35 0 1]);
% initial parameter guesses (based on plot)
initGuess(1) = max(y); % amplitude
initGuess(2) = 0; % mean centred on 0 degrees
initGuess(3) = 10; % SD in degrees
% equation for Gaussian distribution
norm_func = @(p,x) p(1) .* exp(-((x - p(2))/p(3)).^2);
% use nlinfit to fit Gaussian using Least Squares
[bestfit,resid]=nlinfit(y, x, norm_func, initGuess);
% plot function
xFine = linspace(-30,30,100);
figure(2)
plot(x, y, 'ro', x, norm_func(xFine, y), '-b');
Many thanks
If your data actually represent probability estimates which you expect come from normally distributed data, then fitting a curve is not the right way to estimate the parameters of that normal distribution. There are different methods of different sophistication; one of the simplest is the method of moments, which means you choose the parameters such that the moments of the theoretical distribution match those of your sample. In the case of the normal distribution, these moments are simply mean and variance (or standard deviation). Here's the code:
% normalize y to be a probability (sum = 1)
p = y / sum(y);
% compute weighted mean and standard deviation
m = sum(x .* p);
s = sqrt(sum((x - m) .^ 2 .* p));
% compute theoretical probabilities
xs = -30:0.5:30;
pth = normpdf(xs, m, s);
% plot data and theoretical distribution
plot(x, p, 'o', xs, pth * 5)
The result shows a decent fit:
You'll notice the factor 5 in the last line. This is due to the fact that you don't have probability (density) estimates for the full range of values, but from points at distances of 5. In my treatment I assumed that they correspond to something like an integral over the probability density, e.g. over an interval [x - 2.5, x + 2.5], which can be roughly approximated by multiplying the density in the middle by the width of the interval. I don't know if this interpretation is correct for your data.
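In code terms that factor is just the spacing of your x grid, so under the same interpretation a slightly more general version of the last line would be:
plot(x, p, 'o', xs, pth * (x(2) - x(1)))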
Your data follow a Gaussian curve and you describe them as probabilities. Are these numbers (y) your raw data – or did you generate them from e.g. a histogram over a larger data set? If the latter, the estimate of the distribution parameters could be improved by using the original full data.
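As an aside, part of the odd scaling in the question's code is simply argument order: nlinfit expects nlinfit(X, Y, modelfun, beta0), and the fitted parameters (not the data) must be passed to norm_func when plotting. A corrected sketch of the original least-squares approach:
% note the argument order: predictors first, responses second
[bestfit, resid] = nlinfit(x, y, norm_func, initGuess);
xFine = linspace(-30, 30, 100);
plot(x, y, 'ro', xFine, norm_func(bestfit, xFine), '-b'); % evaluate with fitted params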

"Frequency" shift in discrete FFT in MATLAB

(Disclaimer: I thought about posting this on math.statsexchange, but found similar questions there that were moved to SO, so here I am)
The context:
I'm using fft/ifft to determine probability distributions for sums of random variables.
So e.g. I'm having two uniform probability distributions - in the simplest case two uniform distributions on the interval [0,1].
So to get the probability distribution for the sum of two random variables sampled from these two distributions, one can calculate the product of the Fourier transforms of each probability density.
Doing the inverse fft on this product, you get back the probability density for the sum.
An example:
function usumdist_example()
x = linspace(-1, 2, 1e5);
dx = diff(x(1:2));
NFFT = 2^nextpow2(numel(x));
% take two uniform distributions on [0,0.5]
intervals = [0, 0.5;
             0, 0.5];
figure();
hold all;
for i = 1:size(intervals,1)
    % construct the prob. dens. function
    P_x = x >= intervals(i,1) & x <= intervals(i,2);
    plot(x, P_x);
    % for each pdf, get the characteristic function fft(pdf,NFFT)
    % and form the product of all char. functions in Y
    if i == 1
        Y = fft(P_x,NFFT) / NFFT;
    else
        Y = Y .* fft(P_x,NFFT) / NFFT;
    end
end
y = ifft(Y, NFFT);
x_plot = x(1) + (0:dx:(NFFT-1)*dx);
plot(x_plot, y / max(y), '.');
end
My issue is: the shape of the resulting prob. dens. function is perfect.
However, the x-axis does not fit the x I create in the beginning, but is shifted.
In the example, the peak is at 1.5, while it should be at 0.5.
The shift changes if I e.g. add a third random variable or if I modify the range of x, but I can't figure out how.
I'm afraid it might have to do with the fact that I have negative x values, while Fourier transforms usually work in a time/frequency domain, where frequencies < 0 don't make sense.
I'm aware I could find e.g. the peak and shift it to its proper place, but seems nasty and error prone...
Glad about any ideas!
The problem is that your x origin is -1, not 0. You expect the center of the triangular pdf to be at .5, because that's twice the value of the center of the uniform pdf. However, the correct reasoning is: the center of the uniform pdf is 1.25 above your minimum x, and you get the center of the triangle at 2*1.25 = 2.5 above the minimum x (that is, at 1.5).
In other words: although your original x axis is (-1, 2), the convolution (or the FFT) behave as if it were (0, 3). In fact, the FFT knows nothing about your x axis; it only uses the y samples. Since your uniform is zero for the first samples, that zero interval of width 1 is amplified to twice its width when you do the convolution (or the FFT). I suggest drawing the convolution on paper to see this (draw original signal, reflected signal about y axis, displace the latter and see when both begin to overlap). So you need a correction in the x_plot line to compensate for this increased width of the zero interval: use
x_plot = 2*x(1) + (0:dx:(NFFT-1)*dx);
and then plot(x_plot, y / max(y), '.') will give the correct graph:
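Following the same reasoning (every additional convolution adds another offset of x(1)), the correction generalizes to n pdfs that share the same x grid - a sketch, not part of the original answer:
n = size(intervals,1); % number of convolved pdfs
x_plot = n*x(1) + (0:dx:(NFFT-1)*dx);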

3D curvefitting

I have a discrete regular grid of (a,b) points and their corresponding c values, and I interpolate them further to get a smooth surface. Now, from the interpolated data, I want to create a polynomial equation for curve fitting. How do I fit a polynomial to this 3D data?
I am trying to do this in MATLAB. I used the Surface Fitting Toolbox in MATLAB (R2010a) to curve fit 3-dimensional data. But how does one find a formula that fits a set of data to the best advantage in MATLAB/MAPLE or any other software? Any advice? Also most useful would be some real code examples to look at, PDF files on the web, etc.
This is just a small portion of my data.
a = [ 0.001 .. 0.011];
b = [1, .. 10];
c = [ -.304860225, .. .379710865];
Thanks in advance.
To fit a curve onto a set of points, we can use ordinary least-squares regression. There is a solution page by MathWorks describing the process.
As an example, let's start with some random data:
% some 3d points
data = mvnrnd([0 0 0], [1 -0.5 0.8; -0.5 1.1 0; 0.8 0 1], 50);
As @BasSwinckels showed, by constructing the desired design matrix, you can use mldivide or pinv to solve the overdetermined system expressed as Ax=b:
% best-fit plane
C = [data(:,1) data(:,2) ones(size(data,1),1)] \ data(:,3); % coefficients
% evaluate it on a regular grid covering the domain of the data
[xx,yy] = meshgrid(-3:.5:3, -3:.5:3);
zz = C(1)*xx + C(2)*yy + C(3);
% or expressed using matrix/vector product
%zz = reshape([xx(:) yy(:) ones(numel(xx),1)] * C, size(xx));
Next we visualize the result:
% plot points and surface
figure('Renderer','opengl')
line(data(:,1), data(:,2), data(:,3), 'LineStyle','none', ...
'Marker','.', 'MarkerSize',25, 'Color','r')
surface(xx, yy, zz, ...
'FaceColor','interp', 'EdgeColor','b', 'FaceAlpha',0.2)
grid on; axis tight equal;
view(9,9);
xlabel x; ylabel y; zlabel z;
colormap(cool(64))
As was mentioned, we can get higher-order polynomial fitting by adding more terms to the independent variables matrix (the A in Ax=b).
Say we want to fit a quadratic model with constant, linear, interaction, and squared terms (1, x, y, xy, x^2, y^2). We can do this manually:
% best-fit quadratic curve
C = [ones(50,1) data(:,1:2) prod(data(:,1:2),2) data(:,1:2).^2] \ data(:,3);
zz = [ones(numel(xx),1) xx(:) yy(:) xx(:).*yy(:) xx(:).^2 yy(:).^2] * C;
zz = reshape(zz, size(xx));
There is also a helper function x2fx in the Statistics Toolbox that helps in building the design matrix for a couple of model orders:
C = x2fx(data(:,1:2), 'quadratic') \ data(:,3);
zz = x2fx([xx(:) yy(:)], 'quadratic') * C;
zz = reshape(zz, size(xx));
Finally there is an excellent function polyfitn on the File Exchange by John D'Errico that allows you to specify all kinds of polynomial orders and terms involved:
model = polyfitn(data(:,1:2), data(:,3), 2);
zz = polyvaln(model, [xx(:) yy(:)]);
zz = reshape(zz, size(xx));
There might be some better functions on the file-exchange, but one way to do it by hand is this:
x = a(:); %make column vectors
y = b(:);
z = c(:);
%first order fit
M = [ones(size(x)), x, y];
k1 = M\z;
%least square solution of z = M * k1, so z = k1(1) + k1(2) * x + k1(3) * y
Similarly, you can do a second order fit:
%second order fit
M = [ones(size(x)), x, y, x.^2, x.*y, y.^2];
k2 = M\z;
which seems to have numerical problems for the limited dataset you gave. Type help mldivide for more details.
To make an interpolation over some regular grid, you can do like so:
ngrid = 20;
[A,B] = meshgrid(linspace(min(a), max(a), ngrid), ...
linspace(min(b), max(b), ngrid));
M = [ones(numel(A),1), A(:), B(:), A(:).^2, A(:).*B(:), B(:).^2];
C2_fit = reshape(M * k2, size(A)); % = k2(1) + k2(2)*A + k2(3)*B + k2(4)*A.^2 + ...
%plot to compare fit with original data
surfl(A,B,C2_fit);shading flat;colormap gray
hold on
plot3(a,b,c, '.r')
A 3rd-order fit can be done using the formula given by TryHard, but the formulas quickly become tedious when the order increases. Better to write a function that can construct M given x, y and order if you have to do that more than once - see the sketch below.
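For instance, a minimal sketch of such a helper (buildM is a hypothetical name, not a toolbox function) that generates all monomials x^i*y^j with i+j <= order, in the same column order as the hand-built M above:
function M = buildM(x, y, order)
% design matrix with columns 1, x, y, x^2, x*y, y^2, ... for all i+j <= order
x = x(:); y = y(:);
M = [];
for total = 0:order
    for j = 0:total
        M = [M, x.^(total-j) .* y.^j]; %#ok<AGROW>
    end
end
end
Then M = buildM(a, b, 2); k2 = M\z; reproduces the second order fit above.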
This sounds like more of a philosophical question than a specific implementation question, specifically the bit: "how does one find a formula that fits a set of data to the best advantage?" In my experience that is a choice you have to make depending on what you're trying to achieve.
What defines "best" for you? For a data fitting problem you can keep adding more and more polynomial coefficients and making a better R^2 value... but will eventually "over fit" the data. A downside of high order polynomials is behavior outside the bounds of the sample data which you've used to fit your response surface - it can quickly go off in some wild direction which may not be appropriate for whatever it is you're trying to model.
Do you have insight into the physical behavior of the system / data you're fitting? That can be used as a basis for what set of equations to use to create a math model. My recommendation would be to use the most economical (simple) model you can get away with.