trying to implement knnclassify in matlab for fisheriris dataset

I am trying to implement knnclassify in matlab for the fisheriris data set and to get the confusion matrix for it. Below is the matlab implementation of knnclassify.
I am unable to understand how to incorporate the fisheriris dataset in this code, and how to calculate the confusion matrix.
function classifications = knnclassify(train_points, train_labels, test_points, k);
%-------------------------------------------
% K nearest neighbour (KNN) classification
% code by Jaakko Peltonen 2008
%-------------------------------------------
% Classifies test points based on majority vote of their k nearest
% neighbours in train points. 'Nearest' is determined from squared
% Euclidean distance. In case the majority voting results in a tie,
% gives equal portions to each class in the tie.
%
% Inputs:
%---------
% train_points: matrix, the i:th row has the features of the i:th
% training point.
%
% train_labels: two possible formats.
% Format 1: a column vector where the i:th element is an integer
% value indicating the label of the i:th training
% point. The labels should start from zero.
%
% Format 2: a matrix where the i:th row has the class memberships
% of the i:th training point. Each class membership is
% a value from 0 to 1, where 1 means the point fully
% belongs to that class.
%
% test_points: feature matrix for test points. Same format as
% train_points. You can give an empty matrix: then the method
% computes leave-one-out classification error based only on
% the training points.
%
% Outputs:
%---------
% classifications: matrix, the i:th row has the predicted class
% memberships of the i:th test point. Each class
% membership is a value from 0 to 1, where 1 means
% the test point is predicted to fully belong to
% that class.
%
nDim = size(test_points,2);
nTrainPoints = size(train_points,1);
% if the training labels were provided in format 1,
% convert them to format 2.
if size(train_labels,2)==1,
    nClasses = max(train_labels)+1;
    train_labels2 = zeros(nTrainPoints,nClasses);
    for i=1:nTrainPoints,
        train_labels2(i,train_labels(i)+1) = 1;
    end;
    train_labels = train_labels2;
end;
nClasses = size(train_labels,2);
% if test_points is empty, perform leave-one-out classification
if isempty(test_points),
    test_points = train_points;
    leave_one_out = 1;
else
    leave_one_out = 0;
end;
nTestPoints = size(test_points,1);
%
% Perform the KNN classification. For leave-one-out classification,
% this code assumes that k < nTrainPoints.
%
classifications = zeros(nTestPoints, nClasses);
for i=1:nTestPoints,
    % find squared Euclidean distances to all training points
    difference = train_points(:,1:nDim) - repmat(test_points(i,:),[nTrainPoints 1]);
    distances = sum(difference.^2,2);
    % in leave-one-out classification, make sure the point being
    % classified is not chosen among the k neighbors.
    if leave_one_out == 1,
        distances(i) = inf;
    end;
    % collect the 'votes' of the k closest points
    [sorted_distances, indices] = sort(distances);
    classamounts = zeros(1, nClasses);
    for j=1:k,
        classamounts = classamounts + train_labels(indices(j),:);
    end;
    % choose the class by majority vote
    indices = find(classamounts == max(classamounts));
    if (length(indices) == 1),
        % there is a single winner
        classifications(i,indices(1)) = 1;
    else
        % there was a tie between two or more classes
        classifications(i,indices) = 1/length(indices);
    end;
end;

Answer for the first question:
You first need to fix a small bug in the function:
Change
difference = train_points(:,1:nDim) - repmat...
to
difference = train_points - repmat...
Then the script to run this classifier on fisheriris:
clear
data=load('fisheriris');
[~,~,ci]=unique(data.species);
data.labels=ci-1; % start from 0
portion_to_test=40/100;
n=size(data.species,1);% #samples
test_idx=randperm(n,round(n*portion_to_test));
train_idx=setdiff(1:n,test_idx);
train_meas=data.meas(train_idx,:);
train_labels=data.labels(train_idx);
test_meas=data.meas(test_idx,:);
test_labels=data.labels(test_idx);
result=knnclassify(train_meas,train_labels,test_meas,1);
[~,mi]=max(result,[],2);
result_labels=mi-1; % start from 0;
error=sum(result_labels~=test_labels)/length(test_labels)
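For the second part of the question, the confusion matrix can be built from test_labels and result_labels; a minimal sketch continuing the script above (confusionmat requires the Statistics and Machine Learning Toolbox, the accumarray line is a toolbox-free alternative):
% rows = true class, columns = predicted class
conf_mat = confusionmat(test_labels, result_labels)
% same matrix without the toolbox function (labels start from 0, hence the +1 shift)
nClasses = max(data.labels) + 1;
conf_mat2 = accumarray([test_labels+1, result_labels+1], 1, [nClasses nClasses])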

Related

Why is my decision boundary wrong for logistic regression using gradient descent?

I am trying to solve a classification task using logistic regression. Part of my task is to plot the decision boundary. I find that the gradient of the decision boundary seems to be solved correctly by my algorithm, but when plotted, the boundary is too high and does not separate the points well. I cannot work out why this is and would be grateful for any advice on how to solve this issue.
data = open('Question5.mat');
x = data.x; y = data.y; % Extract data for ease of use
LR = 0.001; % Set tunable learning rate for gradient descent
w_est = [0; 0; 0]; % Set initial guess for a, b, and c
cost = []; % Initialise array to hold value of cost function
figure;
for i = 1:20000 % Set iteration limit for gradient descent
iteration_cost = 0; grad_a = 0; grad_b = 0; grad_c = 0; % Set initial value of 0 for summed terms
for m = 1:1100 % Iterate through data points
y_hat_est = 1./(1+exp(-w_est'*[x(m,1); x(m,2); 1])); % Calculate value of sigmoid function with estimated coefficients for each datapoint
iteration_cost = iteration_cost + y(m)*log(y_hat_est)+(1-y(m))*log(1-y_hat_est); % Calculate cost function and add it to summed term for each data point
% Calculate each gradient term for each data point and add to
% summed gradient
grad_a = grad_a + (y_hat_est - y(m))*x(m,1);
grad_b = grad_b + (y_hat_est - y(m))*x(m,2);
grad_c = grad_c + (y_hat_est - y(m))*x(m,3);
end
g = [grad_a; grad_b; grad_c]; % Create vector of gradients
w_est = w_est - LR*g; % Update estimate vector with next term
cost(i) = -iteration_cost; % Add the value of the cost function to the array for costs
if mod(i,1000) == 0 % Only plot on some iterations to speed up program
hold off
gscatter(x(:,1),x(:,2),y,'rb'); % Plot scatter plot grouped by class
xlabel('x1'); ylabel('x2'); title(i); % Add title and labels to figure
hold on
x1_plot = -6:4; x2_plot = -3:7; % Create array of values for plotting
plot( -(w_est(1)*x1_plot + w_est(3)) /w_est(2), x2_plot); % Plot decision boundary based on the current coefficient estimates
% pause(1) % Add delay to aid visualisation
end
end
hold off;
figure; plot(cost) % Plot the cost function
title('Cost function'); xlabel('Iteration number'); ylabel('cost');
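One thing worth double-checking (an observation, not a verified fix): for this model the boundary is the line where w_est(1)*x1 + w_est(2)*x2 + w_est(3) = 0, i.e. x2 = -(w_est(1)*x1 + w_est(3))/w_est(2), so it is normally drawn with x1 on the horizontal axis and the computed x2 on the vertical axis, e.g.:
x1_plot = -6:4;                                          % same x1 range as above
x2_boundary = -(w_est(1)*x1_plot + w_est(3))/w_est(2);   % boundary: x2 as a function of x1
plot(x1_plot, x2_boundary);                              % x1 horizontal, x2 vertical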

Demeaned Returns for Covariance (Matlab)

I've got this code:
function [sigma,shrinkage]=covMarket(x,shrink)
% function sigma=covmarket(x)
% x (t*n): t iid observations on n random variables
% sigma (n*n): invertible covariance matrix estimator
%
% This estimator is a weighted average of the sample
% covariance matrix and a "prior" or "shrinkage target".
% Here, the prior is given by a one-factor model.
% The factor is equal to the cross-sectional average
% of all the random variables.
% The notation follows Ledoit and Wolf (2003)
% This version: 04/2014
% de-mean returns
t=size(x,1);
n=size(x,2);
meanx=mean(x);
x=x-meanx(ones(t,1),:);
xmkt=mean(x')';
sample=cov([x xmkt])*(t-1)/t;
covmkt=sample(1:n,n+1);
varmkt=sample(n+1,n+1);
sample(:,n+1)=[];
sample(n+1,:)=[];
prior=covmkt*covmkt'./varmkt;
prior(logical(eye(n)))=diag(sample);
if (nargin < 2 | shrink == -1) % compute shrinkage parameters
c=norm(sample-prior,'fro')^2;
y=x.^2;
p=1/t*sum(sum(y'*y))-sum(sum(sample.^2));
% r is divided into diagonal
% and off-diagonal terms, and the off-diagonal term
% is itself divided into smaller terms
rdiag=1/t*sum(sum(y.^2))-sum(diag(sample).^2);
z=x.*xmkt(:,ones(1,n));
v1=1/t*y'*z-covmkt(:,ones(1,n)).*sample;
roff1=sum(sum(v1.*covmkt(:,ones(1,n))'))/varmkt...
-sum(diag(v1).*covmkt)/varmkt;
v3=1/t*z'*z-varmkt*sample;
roff3=sum(sum(v3.*(covmkt*covmkt')))/varmkt^2 ...
-sum(diag(v3).*covmkt.^2)/varmkt^2;
roff=2*roff1-roff3;
r=rdiag+roff;
% compute shrinkage constant
k=(p-r)/c;
shrinkage=max(0,min(1,k/t))
else % use specified number
shrinkage = shrink;
end
% compute the estimator
sigma=shrinkage*prior+(1-shrinkage)*sample;
end
It's part of the Matlab code from Ledoit/Wolf (2003). I don't understand why the returns are demeaned before calculating the covariance. Is this Matlab-specific? In my opinion, there is no need to demean the returns before calling the cov function (the function does it on its own).
Thanks for help in advance!
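You are right that cov removes the mean on its own; a quick check with made-up data (an illustration only, not part of the original code) shows that cov gives identical results with and without explicit demeaning:
x = randn(100,5) + 3;                 % made-up "returns" with a nonzero mean
x_dm = x - repmat(mean(x),100,1);     % explicitly demeaned copy
max(max(abs(cov(x) - cov(x_dm))))     % zero up to rounding error
That said, the demeaned x is reused further down in covMarket (y = x.^2, z = x.*xmkt, ...), and for those lines the subtraction does matter, so it cannot simply be removed.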

linear regression with feature normalization matlab code

I did it two ways; why is the first way (starting at the line with mu=mean(X)) not working? What's the difference?
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
% FEATURENORMALIZE(X) returns a normalized version of X where
% the mean value of each feature is 0 and the standard deviation
% is 1. This is often a good preprocessing step to do when
% working with learning algorithms.
% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
% of the feature and subtract it from the dataset,
% storing the mean value in mu. Next, compute the
% standard deviation of each feature and divide
% each feature by its standard deviation, storing
% the standard deviation in sigma.
%
% Note that X is a matrix where each column is a
% feature and each row is an example. You need
% to perform the normalization separately for
% each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%
%mu=mean(X)
%X_norm=X-mu;
%sigma=std(X_norm)
%X_norm(1)=X_norm(1)/sigma(1)
%X_norm(2)=X_norm(2)/sigma(2)
% Calculates mean and std dev for each feature
for i=1:size(mu,2)
mu(1,i) = mean(X(:,i));
sigma(1,i) = std(X(:,i));
X_norm(:,i) = (X(:,i)-mu(1,i))/sigma(1,i);
end
% ============================================================
end
The reason is that you are trying to subtract a vector from a matrix. mean(X) gives you a row vector with the mean of each column of X, of dimension [1xC], while X has dimension [RxC]. A way to solve this in a one-liner is
X = (X-repmat(mean(X,1),size(X,1),1))./repmat(std(X,0,1),size(X,1),1)
Alternatively, you can loop through X column by column. You can further verify the output of the above code with normalize(X):
for i = 1: size(X, 2)
    mu = mean(X(:, i));
    sigma = std(X(:, i));
    X_norm(:, i) = (X(:, i) - mu) ./ sigma
end
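As a side note, on MATLAB R2016b and newer, implicit expansion lets you skip the repmat calls; a minimal sketch, with a quick sanity check that every column ends up with mean 0 and standard deviation 1:
X_norm = (X - mean(X)) ./ std(X);   % implicit expansion (R2016b+), same result as the repmat version
mean(X_norm)                        % should be (numerically) zero for every column
std(X_norm)                         % should be 1 for every column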

Revised Simplex Method - Matlab Script

I've been asked to write down a Matlab program in order to solve LPs using the Revised Simplex Method.
The code I wrote runs without problems on the input data, although I've realised it doesn't solve the problem properly, as it does not update the inverse of the basis B (the real core idea of the above-mentioned method).
The problem is only related to a part of the code, the one at the bottom of the script aiming at:
Computing the new inverse basis B^-1 by performing elementary row operations on [B^-1 u] (pivot row index is l_out). The vector u is transformed into a unit vector with u(l_out) = 1 and u(i) = 0 for other i.
Here's the code I wrote:
%% Implementation of the revised Simplex. Solves a linear
% programming problem of the form
%
% min c'*x
% s.t. Ax = b
% x >= 0
%
% The function input parameters are the following:
% A: The constraint matrix
% b: The rhs vector
% c: The vector of cost coefficients
% C: The indices of the basic variables corresponding to an
% initial basic feasible solution
%
% The function returns:
% x_opt: Decision variable values at the optimal solution
% f_opt: Objective function value at the optimal solution
%
% Usage: [x_opt, f_opt] = S12345X(A,b,c,C)
% NOTE: Replace 12345X with your own student number
% and rename the file accordingly
function [x_opt, f_opt] = SXXXXX(A,b,c,C)
%% Initialization phase
% Initialize the vector of decision variables
x = zeros(length(c),1);
% Create the initial Basis matrix, compute its inverse and
% compute the initial basic feasible solution
B=A(:,C);
invB = inv(B);
x(C) = invB*b;
%% Iteration phase
n_max = 10; % At most n_max iterations
for n = 1:n_max % Main loop
% Compute the vector of reduced costs c_r
c_B = c(C); % Basic variable costs
p = (c_B'*invB)'; % Dual variables
c_r = c' - p'*A; % Vector of reduced costs
% Check if the solution is optimal. If optimal, use
% 'return' to break from the function, e.g.
J = find(c_r < 0); % Find indices with negative reduced costs
if (isempty(J))
f_opt = c'*x;
x_opt = x;
return;
end
% Choose the entering variable
j_in = J(1);
% Compute the vector u (i.e., the reverse of the basic directions)
u = invB*A(:,j_in);
I = find(u > 0);
if (isempty(I))
f_opt = -inf; % Optimal objective function cost = -inf
x_opt = []; % Produce empty vector []
return % Break from the function
end
% Compute the optimal step length theta
theta = min(x(C(I))./u(I));
L = find(x(C)./u == theta); % Find all indices with ratio theta
% Select the exiting variable
l_out = L(1);
% Move to the adjacent solution
x(C) = x(C) - theta*u;
% Value of the entering variable is theta
x(j_in) = theta;
% Update the set of basic indices C
C(l_out) = j_in;
% Compute the new inverse basis B^-1 by performing elementary row
% operations on [B^-1 u] (pivot row index is l_out). The vector u is trans-
% formed into a unit vector with u(l_out) = 1 and u(i) = 0 for
% other i.
M=horzcat(invB,u);
[f g]=size(M);
R(l_out,:)=M(l_out,:)/M(l_out,j_in); % Copy row l_out, normalizing M(l_out,j_in) to 1
u(l_out)=1;
for k = 1:f % For all matrix rows
if (k ~= l_out) % Other than l_out
u(k)=0;
R(k,:)=M(k,:)-M(k,j_in)*R(l_out,:); % Set them equal to the original matrix Minus a multiple of normalized row l_out, making R(k,j_in)=0
end
end
invM=horzcat(u,invB);
% Check if too many iterations are performed (increase n_max to
% allow more iterations)
if(n == n_max)
fprintf('Max number of iterations performed!\n\n');
return
end
end % End for (the main iteration loop)
end % End function
%% Example 3.5 from the book (A small test problem)
% Data in standard form:
% A = [1 2 2 1 0 0;
% 2 1 2 0 1 0;
% 2 2 1 0 0 1];
% b = [20 20 20]';
% c = [-10 -12 -12 0 0 0]';
% C = [4 5 6]; % Indices of the basic variables of
% % the initial basic feasible solution
%
% The optimal solution
% x_opt = [4 4 4 0 0 0]' % Optimal decision variable values
% f_opt = -136 % Optimal objective function cost
OK, after a lot of hours spent on the intensive use of printmat and disp to understand what was happening inside the code from a mathematical point of view, I realized the problem was with the index j_in and with the normalization in the case of division by zero, so I managed to solve the issue as follows.
Now it runs perfectly. Cheers.
%% Implementation of the revised Simplex. Solves a linear
% programming problem of the form
%
% min c'*x
% s.t. Ax = b
% x >= 0
%
% The function input parameters are the following:
% A: The constraint matrix
% b: The rhs vector
% c: The vector of cost coefficients
% C: The indices of the basic variables corresponding to an
% initial basic feasible solution
%
% The function returns:
% x_opt: Decision variable values at the optimal solution
% f_opt: Objective function value at the optimal solution
%
% Usage: [x_opt, f_opt] = S12345X(A,b,c,C)
% NOTE: Replace 12345X with your own student number
% and rename the file accordingly
function [x_opt, f_opt] = S472366(A,b,c,C)
%% Initialization phase
% Initialize the vector of decision variables
x = zeros(length(c),1);
% Create the initial Basis matrix, compute its inverse and
% compute the initial basic feasible solution
B=A(:,C);
invB = inv(B);
x(C) = invB*b;
%% Iteration phase
n_max = 10; % At most n_max iterations
for n = 1:n_max % Main loop
% Compute the vector of reduced costs c_r
c_B = c(C); % Basic variable costs
p = (c_B'*invB)'; % Dual variables
c_r = c' - p'*A; % Vector of reduced costs
% Check if the solution is optimal. If optimal, use
% 'return' to break from the function, e.g.
J = find(c_r < 0); % Find indices with negative reduced costs
if (isempty(J))
f_opt = c'*x;
x_opt = x;
return;
end
% Choose the entering variable
j_in = J(1);
% Compute the vector u (i.e., the reverse of the basic directions)
u = invB*A(:,j_in);
I = find(u > 0);
if (isempty(I))
f_opt = -inf; % Optimal objective function cost = -inf
x_opt = []; % Produce empty vector []
return % Break from the function
end
% Compute the optimal step length theta
theta = min(x(C(I))./u(I));
L = find(x(C)./u == theta); % Find all indices with ratio theta
% Select the exiting variable
l_out = L(1);
% Move to the adjacent solution
x(C) = x(C) - theta*u;
% Value of the entering variable is theta
x(j_in) = theta;
% Update the set of basic indices C
C(l_out) = j_in;
% Compute the new inverse basis B^-1 by performing elementary row
% operations on [B^-1 u] (pivot row index is l_out). The vector u is trans-
% formed into a unit vector with u(l_out) = 1 and u(i) = 0 for
% other i.
M=horzcat(u, invB);
[f g]=size(M);
if (theta~=0)
M(l_out,:)=M(l_out,:)/M(l_out,1); % Copy row l_out, normalizing M(l_out,1) to 1
end
for k = 1:f % For all matrix rows
if (k ~= l_out) % Other than l_out
M(k,:)=M(k,:)-M(k,1)*M(l_out,:); % Set them equal to the original matrix Minus a multiple of normalized row l_out, making R(k,j_in)=0
end
end
invB=M(1:3,2:end);
% Check if too many iterations are performed (increase n_max to
% allow more iterations)
if(n == n_max)
fprintf('Max number of iterations performed!\n\n');
return
end
end % End for (the main iteration loop)
end % End function
%% Example 3.5 from the book (A small test problem)
% Data in standard form:
% A = [1 2 2 1 0 0;
% 2 1 2 0 1 0;
% 2 2 1 0 0 1];
% b = [20 20 20]';
% c = [-10 -12 -12 0 0 0]';
% C = [4 5 6]; % Indices of the basic variables of
% % the initial basic feasible solution
%
% The optimal solution
% x_opt = [4 4 4 0 0 0]' % Optimal decision variable values
% f_opt = -136 % Optimal objective function cost
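To reproduce Example 3.5 from the comments above, a small driver script could look like this (assuming the fixed function is saved as S472366.m on the path):
A = [1 2 2 1 0 0;
     2 1 2 0 1 0;
     2 2 1 0 0 1];
b = [20 20 20]';
c = [-10 -12 -12 0 0 0]';
C = [4 5 6];                        % slack variables form the initial basic feasible solution
[x_opt, f_opt] = S472366(A, b, c, C)
% Expected (per the comments above): x_opt = [4 4 4 0 0 0]', f_opt = -136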

Matrices kernelpca

we are working on a project and trying to get some results with KPCA.
We have a dataset (handwritten digits) and have taken the first 200 images of each digit, so our complete traindata matrix is 2000x784 (784 being the dimensions).
When we do KPCA we get a matrix with the new low-dimensional dataset, e.g. 2000x100. However, we don't understand the result. Shouldn't we get other matrices, as we do when we use SVD for PCA? The code we use for KPCA is the following:
function data_out = kernelpca(data_in,num_dim)
%% Checking to ensure output dimensions are lesser than input dimension.
if num_dim > size(data_in,1)
fprintf('\nDimensions of output data has to be lesser than the dimensions of input data\n');
fprintf('Closing program\n');
return
end
%% Using the Gaussian Kernel to construct the Kernel K
% K(x,y) = exp(-(x-y)^2/(sigma)^2)
% K is a symmetric Kernel
K = zeros(size(data_in,2),size(data_in,2));
for row = 1:size(data_in,2)
for col = 1:row
temp = sum(((data_in(:,row) - data_in(:,col)).^2));
K(row,col) = exp(-temp); % sigma = 1
end
end
K = K + K';
% Dividing the diagonal element by 2 since it has been added to itself
for row = 1:size(data_in,2)
K(row,row) = K(row,row)/2;
end
% We know that for PCA the data has to be centered. Even if the input data
% set 'X', let's say, is centered, there is no guarantee the data when mapped
% in the feature space [phi(x)] is also centered. Since we actually never
% work in the feature space we cannot center the data. To include this
% correction a pseudo centering is done using the Kernel.
one_mat = ones(size(K));
K_center = K - one_mat*K - K*one_mat + one_mat*K*one_mat;
clear K
%% Obtaining the low dimensional projection
% The following equation needs to be satisfied for K
% N*lamda*K*alpha = K*alpha
% Thus lamda's has to be normalized by the number of points
opts.issym=1;
opts.disp = 0;
opts.isreal = 1;
neigs = 30;
[eigvec eigval] = eigs(K_center,[],neigs,'lm',opts);
eig_val = eigval ~= 0;
eig_val = eig_val./size(data_in,2);
% Again 1 = lamda*(alpha.alpha)
% Here '.' indicated dot product
for col = 1:size(eigvec,2)
eigvec(:,col) = eigvec(:,col)./(sqrt(eig_val(col,col)));
end
[~, index] = sort(eig_val,'descend');
eigvec = eigvec(:,index);
%% Projecting the data in lower dimensions
data_out = zeros(num_dim,size(data_in,2));
for count = 1:num_dim
data_out(count,:) = eigvec(:,count)'*K_center';
end
We have read lots of papers but still cannot get the hang of KPCA's logic!
Any help would be appreciated!
PCA Algorithm:
Take the data samples.
Compute the mean.
Compute the covariance matrix.
Solve the eigenvalue problem C*V = V*L, where
C: covariance matrix,
V: eigenvectors of the covariance matrix,
L: eigenvalues of the covariance matrix.
With the first n eigenvectors you reduce the dimensionality of your data to n dimensions (a minimal sketch is given below). You can use this code for the PCA; it has an integrated example and it is simple.
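A minimal PCA sketch along these lines, using plain cov/eig with rows as samples (the data here is made up purely for illustration):
X = randn(200,10);                          % made-up data: 200 samples, 10 features
Xc = X - repmat(mean(X), size(X,1), 1);     % center the data
Cmat = cov(Xc);                             % covariance matrix
[V, D] = eig(Cmat);                         % eigenvectors and eigenvalues
[~, order] = sort(diag(D), 'descend');      % sort by decreasing eigenvalue
V = V(:, order);
n = 3;                                      % keep the first n principal components
X_reduced = Xc * V(:, 1:n);                 % 200 x n low-dimensional representation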
KPCA Algorithm:
We choose a kernel function; in your code this is specified by:
K(x,y) = exp(-(x-y)^2/(sigma)^2)
in order to represent your data in a high-dimensional space, hoping that in this space your data will be well represented for further purposes like classification or clustering, whereas this task could be harder to solve in the initial feature space. This trick is also known as the "kernel trick".
[Step1] Construct the Gram matrix
K = zeros(size(data_in,2),size(data_in,2));
for row = 1:size(data_in,2)
for col = 1:row
temp = sum(((data_in(:,row) - data_in(:,col)).^2));
K(row,col) = exp(-temp); % sigma = 1
end
end
K = K + K';
% Dividing the diagonal element by 2 since it has been added to itself
for row = 1:size(data_in,2)
K(row,row) = K(row,row)/2;
end
Here, because the Gram matrix is symmetric, only half of the values are computed, and the final result is obtained by adding the Gram matrix computed so far to its transpose (a vectorized equivalent is sketched below). Finally, the diagonal is divided by 2, as the comments mention.
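As a side note, the same Gram matrix can be built without the double loop; a sketch assuming pdist2 (Statistics Toolbox) is available, with data points as columns of data_in exactly as in the code above:
D2 = pdist2(data_in', data_in').^2;   % squared Euclidean distances between all pairs of points
K = exp(-D2);                         % Gaussian kernel with sigma = 1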
[Step2] Normalize the kernel matrix
This is done by this part of your code:
K_center = K - one_mat*K - K*one_mat + one_mat*K*one_mat;
As the comments mention, a pseudo-centering procedure must be done. For an idea about the proof, see here.
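For reference, the usual formulation of this centering (e.g. Schölkopf et al.) uses the N-by-N matrix whose entries are all 1/N rather than all ones; a sketch of that standard version:
N = size(K,1);                                      % number of data points
one_N = ones(N)/N;                                  % N-by-N matrix with all entries 1/N
K_center = K - one_N*K - K*one_N + one_N*K*one_N;   % double-centered kernel matrix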
[Step3] Solve the eigenvalue problem
For this task this part of the code is responsible.
%% Obtaining the low dimensional projection
% The following equation needs to be satisfied for K
% N*lamda*K*alpha = K*alpha
% Thus lamda's has to be normalized by the number of points
opts.issym=1;
opts.disp = 0;
opts.isreal = 1;
neigs = 30;
[eigvec eigval] = eigs(K_center,[],neigs,'lm',opts);
eig_val = eigval ~= 0;
eig_val = eig_val./size(data_in,2);
% Again 1 = lamda*(alpha.alpha)
% Here '.' indicated dot product
for col = 1:size(eigvec,2)
eigvec(:,col) = eigvec(:,col)./(sqrt(eig_val(col,col)));
end
[~, index] = sort(eig_val,'descend');
eigvec = eigvec(:,index);
[Step4] Change the representation of each data point
For this task this part of the code is responsible.
%% Projecting the data in lower dimensions
data_out = zeros(num_dim,size(data_in,2));
for count = 1:num_dim
data_out(count,:) = eigvec(:,count)'*K_center';
end
Look at the details here.
PS: I encourage you to use the code written by this author; it contains intuitive examples.
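As a usage note (an assumption based on how the function indexes data_in, which treats columns as data points): for the 2000x784 matrix from the question, the data would be passed transposed and the result transposed back, e.g.:
% traindata assumed to be 2000 x 784 (rows = digit images, columns = pixel features)
low_dim = kernelpca(traindata', 100);   % data points must be the columns of data_in
low_dim = low_dim';                     % back to 2000 x 100: one row per sample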