How do I implement stochastic gradient descent correctly? - matlab

I'm trying to implement stochastic gradient descent in MATLAB, but I am not seeing any convergence. Mini-batch gradient descent worked as expected, so I think that the cost function and gradient steps are correct.
The two main issues I am having are:
Randomly shuffling the data in the training set before the for-loop
Selecting one example at a time
Here is my MATLAB code:
Generating Data
alpha = 0.001;
num_iters = 10;
xrange =(-10:0.1:10); % data length
ydata = 5*(xrange)+30; % data with gradient 5, intercept 30
% plot(xrange,ydata); grid on;
noise = (2*randn(1,length(xrange))); % generating noise
target = ydata + noise; % adding noise to data
f1 = figure
subplot(2,2,1);
scatter(xrange,target); grid on; hold on; % plot a scatter
title('Linear Regression')
xlabel('xrange')
ylabel('ydata')
tita0 = randn(1,1); %intercept (randomised)
tita1 = randn(1,1); %gradient (randomised)
% Initialize Objective Function History
J_history = zeros(num_iters, 1);
% Number of training examples
m = (length(xrange));
Shuffling data, Gradient Descent and Cost Function
% STEP1 : we shuffle the data
data = [ xrange, ydata];
data = data(randperm(size(data,1)),:);
y = data(:,1);
X = data(:,2:end);
for iter = 1:num_iters
for i = 1:m
x = X(:,i); % STEP2 Select one example
h = tita0 + tita1.*x; % building the estimated %Changed to xrange in BGD
%c = (1/(2*length(xrange)))*sum((h-target).^2)
temp0 = tita0 - alpha*((1/m)*sum((h-target)));
temp1 = tita1 - alpha*((1/m)*sum((h-target).*x)); %Changed to xrange in BGD
tita0 = temp0;
tita1 = temp1;
fprintf("here\n %d; %d", i, x)
end
J_history(iter) = (1/(2*m))*sum((h-target).^2); % Calculating cost from data to estimate
fprintf('Iteration #%d - Cost = %d... \r\n',iter, J_history(iter));
end
On plotting the cost vs iterations and linear regression graphs, the MSE settles (local minimum?) at around 420 which is wrong.
On the other hand, if I re-run the exact same code using batch gradient descent, I get acceptable results. In batch gradient descent I am changing x to xrange.
Any suggestions on what I am doing wrong?
EDIT:
I also tried selecting random indexes using:
f = round(1+rand(1,1)*201); %generating random indexes
and then selecting one example:
x = xrange(f); % STEP2 Select one example
Proceeding to use x in the hypothesis and GD steps also yields a cost of 420.

First we need to shuffle the data correctly:
data = [ xrange', target'];
data = data(randperm(size(data,1)),:);
Next we need to index X and y correctly:
y = data(:,2);
X = data(:,1);
Then, during gradient descent, each update needs to use the single target value y(i) rather than the whole target vector, like so:
tita0 = tita0 - alpha*((1/m)*((h-y(i))));
tita1 = tita1 - alpha*((1/m)*((h-y(i)).*x));
With the changes above, the parameters converge to the true values: a gradient (tita1) of about 5 and an intercept (tita0) of about 30.
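The fixes above can be put together in one short script. This is only a sketch under a few assumptions that are not in the original post: the step size alpha = 0.002, the 50 epochs, the fixed random seed, and dropping the 1/m factor from the per-example update are all choices made here so that the run converges in a reasonable number of passes. It also reshuffles on every epoch and picks the single example with a scalar index, xrange(i), instead of building a shuffled data matrix.

```matlab
rng(0);                                   % fixed seed so the run is repeatable (assumption)
xrange = -10:0.1:10;
target = 5*xrange + 30 + 2*randn(1, length(xrange));   % gradient 5, intercept 30, plus noise
m = length(xrange);

alpha      = 0.002;                       % assumed step size; no 1/m factor is used below
num_epochs = 50;                          % assumed number of passes over the data
tita0 = randn;                            % intercept (randomised)
tita1 = randn;                            % gradient (randomised)

for epoch = 1:num_epochs
    idx = randperm(m);                    % reshuffle the example order every epoch
    for i = idx
        x = xrange(i);                    % one example ...
        y = target(i);                    % ... and its own target value
        err = (tita0 + tita1*x) - y;      % scalar residual for this example
        tita0 = tita0 - alpha*err;        % simultaneous update: both steps use the same err
        tita1 = tita1 - alpha*err*x;
    end
end
fprintf('tita0 = %.2f, tita1 = %.2f\n', tita0, tita1);
```

Shuffling indices rather than rows keeps each x paired with its own target automatically, which is the pairing the original data matrix lost.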

Related

Incorrect dimensions for matrix multiplication

I just want to make some predictions using these calculations, but I'm getting this error:
Error using *
Incorrect dimensions for matrix multiplication. Check that the number of columns in the first matrix matches the number of rows in the second matrix. To operate on each element of the matrix individually, use TIMES (.*) for elementwise multiplication.
Error in main (line 64)
r=(pn')*(best);
Main driver
clear; clc;
%load dataset
ds = load('ex1data1.txt');
%split x/y
x = ds(:,1); % Examples
y = ds(:,2);
n=1;
m=length(y);
format long;
b1 = x\y; %slope of a least-squares fit through the origin (no intercept term)
%plot linear regression
% Top plot
yCalc1 = b1*x;
ax1 = nexttile;
scatter(ax1,x,y,'o');
hold on
plot(ax1,x,yCalc1);
title(ax1,'Linear Regression Relation Between Profit & Truck')
ax1.FontSize = 14;
ax1.XColor = 'red';
ylabel('profit of a food truck');
xlabel('population of a city');
grid on
%normalise
[x,maxs,mins]=normalize(x);
%add column with ones - helps the hypothesis
xo=[ones(m,1),x];
%gradient descent
repeat=1500;
lrate=0.01;
thetas=zeros(2,1);
[best,costs] = gradientDescent(repeat,lrate,thetas,xo,y,m);
% plot 𝑱(𝜽) vs iteration
ax2 = nexttile;
scatter(ax2,costs,1:repeat)
title(ax2,' 𝑱(𝜽) vs iteration')
grid(ax2,'on')
ylabel('Iteration');
xlabel('J(𝜽)');
%prediction
p=[7;7];
pn=(p-maxs')./(maxs'-mins');
pn = [1;pn];
r=(pn')*(best);
Gradient Descent
function [thetas,costs] = gradientDescent(repeat,lrate,thetas,xo,y,m)
costs = zeros(repeat,1);
for r = 1:repeat
hc=xo*thetas - y;
tempintercept=sum(hc.*xo);
thetas = thetas - (lrate * (1/m)) * tempintercept';
costs(r)=cost(thetas,xo,y);
end
end
Normalize
function [x,maxs,mins] = normalize (x)
x=(x-max(x))./(max(x)/min(x));
maxs=max(x);
mins=min(x);
end
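The error message can be traced by checking sizes. In the driver, p = [7;7] has two entries, so pn = [1;pn] is 3x1, while best has only two entries because xo has two columns. The sketch below reproduces that mismatch and shows one way to get one prediction per query value. The numbers in best are hypothetical stand-ins, not fitted values, and the scaling step is left out; in practice the query values would need the same scaling that was applied to the training x.

```matlab
best = [0.5; 1.2];            % hypothetical 2x1 parameter vector (intercept; slope)
p    = [7; 7];                % two query values, as in the question
pn   = [1; p];                % 3x1: one bias entry followed by both values

% pn' is 1x3 and best is 2x1, so pn' * best is not defined.
canMultiply = size(pn', 2) == size(best, 1);   % false

% Build one row per query instead: [1, value]
P = [ones(numel(p), 1), p];   % 2x2 design matrix for the queries
r = P * best;                 % 2x1, one prediction per query value
disp(r');
```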

Why is my decision boundary wrong for logistic regression using gradient descent?

I am trying to solve a classification task using logistic regression. Part of my task is to plot the decision boundary. I find that the gradient of the decision boundary seems to be solved correctly by my algorithm but when plotting the boundary is too high and does not separate the points well. I cannot work out why this is and would be grateful for any advice to solve this issue.
data = open('Question5.mat');
x = data.x; y = data.y; % Extract data for ease of use
LR = 0.001; % Set tunable learning rate for gradient descent
w_est = [0; 0; 0]; % Set initial guess for a, b, and c
cost = []; % Initialise array to hold value of cost function
figure;
for i = 1:20000 % Set iteration limit for gradient descent
iteration_cost = 0; grad_a = 0; grad_b = 0; grad_c = 0; % Set initial value of 0 for summed terms
for m = 1:1100 % Iterate through data points
y_hat_est = 1./(1+exp(-w_est'*[x(m,1); x(m,2); 1])); % Calculate value of sigmoid function with estimated coefficients for each datapoint
iteration_cost = iteration_cost + y(m)*log(y_hat_est)+(1-y(m))*log(1-y_hat_est); % Calculate cost function and add it to summed term for each data point
% Calculate each gradient term for each data point and add to
% summed gradient
grad_a = grad_a + (y_hat_est - y(m))*x(m,1);
grad_b = grad_b + (y_hat_est - y(m))*x(m,2);
grad_c = grad_c + (y_hat_est - y(m))*x(m,3);
end
g = [grad_a; grad_b; grad_c]; % Create vector of gradients
w_est = w_est - LR*g; % Update estimate vector with next term
cost(i) = -iteration_cost; % Add the value of the cost function to the array for costs
if mod(i,1000) == 0 % Only plot on some iterations to speed up program
hold off
gscatter(x(:,1),x(:,2),y,'rb'); % Plot scatter plot grouped by class
xlabel('x1'); ylabel('x2'); title(i); % Add title and labels to figure
hold on
x1_plot = -6:4; x2_plot = -3:7; % Create array of values for plotting
plot( -(w_est(1)*x1_plot + w_est(3)) /w_est(2), x2_plot); % Plot decision boundary based on the current coefficient estimates
% pause(1) % Add delay to aid visualisation
end
end
hold off;
figure; plot(cost) % Plot the cost function
title('Cost function'); xlabel('Iteration number'); ylabel('cost');
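One thing that stands out in the plotting code is the argument order: plot receives the computed boundary values first and x2_plot second, so the solved-for x2 values end up on the horizontal axis, while gscatter puts x(:,1) there. A minimal sketch of the boundary with the axes matching gscatter follows. The weight vector w is a hypothetical example, not a fit to the question's data. Separately, the bias gradient uses x(m,3) while the sigmoid uses a constant 1, and those two would only agree if the third column of x is all ones.

```matlab
w = [1; 2; -4];                                 % hypothetical [a; b; c], not fitted values
x1_plot = -6:4;                                 % horizontal-axis values

% The boundary is where a*x1 + b*x2 + c = 0; solve for x2
x2_boundary = -(w(1)*x1_plot + w(3)) / w(2);

% Every boundary point should give a zero linear score
residual = w' * [x1_plot; x2_boundary; ones(size(x1_plot))];

plot(x1_plot, x2_boundary);                     % x1 horizontal, x2 vertical, as in gscatter
xlabel('x1'); ylabel('x2');
```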

How to plot decision boundary from linear SVM after PCA in Matlab?

I have trained a linear SVM on a large dataset. To reduce the number of dimensions I first performed a PCA, then trained the SVM on a subset of the component scores (the first 650 components, which explained 99.5% of the variance). Now I want to plot the decision boundary in the original variable space using the beta weights and bias from the SVM created in PCA space, but I can't figure out how to project the bias term from the SVM into the original variable space. I've written a demo using the Fisher iris data to illustrate:
clear; clc; close all
% load data
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
Y = species(inds);
mu = mean(X)
% perform the PCA
[eigenvectors, scores] = pca(X);
% train the svm
SVMModel = fitcsvm(scores,Y);
% plot the result
figure(1)
gscatter(scores(:,1),scores(:,2),Y,'rgb','osd')
title('PCA space')
% now plot the decision boundary
betas = SVMModel.Beta;
m = -betas(1)/betas(2); % my gradient
b = -SVMModel.Bias; % my y-intercept
f = @(x) m.*x + b; % my linear equation
hold on
fplot(f,'k')
hold off
axis equal
xlim([-1.5 2.5])
ylim([-2 2])
% inverse transform the PCA
Xhat = scores * eigenvectors';
Xhat = bsxfun(@plus, Xhat, mu);
% plot the result
figure(2)
hold on
gscatter(Xhat(:,1),Xhat(:,2),Y,'rgb','osd')
% and the decision boundary
betaHat = betas' * eigenvectors';
mHat = -betaHat(1)/betaHat(2);
bHat = b * eigenvectors';
bHat = bHat + mu; % I know I have to add mu somewhere...
bHat = bHat/betaHat(2);
bHat = sum(sum(bHat)); % sum to reduce the matrix to a single value
% the correct value of bHat should be 6.3962
f = @(x) mHat.*x + bHat;
fplot(f,'k')
hold off
axis equal
title('Recovered feature space')
xlim([3 7])
ylim([0 4])
Any guidance on how I'm calculating bHat incorrectly would be much appreciated.
In case anyone else comes across this problem: the bias term gives the y-intercept in PCA space, b = -SVMModel.Bias/betas(2). That y-intercept is just another point in space, [0 b], which can be recovered (unrotated) by inverse transforming it through the PCA. The recovered point can then be used to solve the linear equation y = mx + b for the new intercept (i.e., b = y - mx). So the code should be:
% and the decision boundary
betaHat = betas' * eigenvectors';
mHat = -betaHat(1)/betaHat(2);
yint = b/betas(2); % y-intercept in PCA space (b = -SVMModel.Bias from above)
yintHat = [0 yint] * eigenvectors'; % recover the point [0 yint] in original space
yintHat = yintHat + mu;
bHat = yintHat(2) - mHat*yintHat(1); % solve the linear equation
% the correct value of bHat is now 6.3962
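An equivalent route is to map the weights and the bias algebraically instead of mapping a point. Since scores = (X - mu) * eigenvectors, the boundary scores*beta + bias = 0 becomes X*(V*beta) + (bias - mu*V*beta) = 0 in the original space. The sketch below checks this on made-up numbers: V, mu, beta, and bias are hypothetical stand-ins for eigenvectors, mean(X), SVMModel.Beta, and SVMModel.Bias, so it runs without fitcsvm or pca.

```matlab
V    = [cos(0.3) -sin(0.3); sin(0.3) cos(0.3)];   % hypothetical orthonormal PCA basis (columns)
mu   = [4.9 1.7];                                 % hypothetical feature means
beta = [1.5; -2.0];                               % hypothetical SVM weights in PCA space
bias = 0.4;                                       % hypothetical SVM bias in PCA space

% Weights and bias expressed in the original feature space
betaHat = V * beta;
biasHat = bias - mu * V * beta;
mHat = -betaHat(1) / betaHat(2);                  % gradient of the boundary
bHat = -biasHat / betaHat(2);                     % y-intercept of the boundary

% Check: points on the PCA-space boundary, mapped back, lie on y = mHat*x + bHat
s1 = [-1; 0; 2];
s2 = -(beta(1) * s1 + bias) / beta(2);
Xhat = bsxfun(@plus, [s1 s2] * V', mu);           % inverse PCA transform
err = Xhat(:,2) - (mHat * Xhat(:,1) + bHat);
disp(max(abs(err)));
```

This form also carries over when only a subset of the components is kept, because V is then simply the matrix of retained eigenvectors.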

Get the size of HOG feature vector - MATLAB

I'm a beginner in image processing and I'm using MATLAB to extract HOG features from images to train an SVM classifier. The training images are 480*640 pixels, and I'm getting 167796 features with the default settings of the built-in MATLAB extractHOGFeatures function. However, when I test the model I get far fewer features (only 216!), even though the testing images are the same size as the training images. I get this error in MATLAB: "The number of columns in TEST and training data must be equal".
Do you have any clue how to solve this problem and get feature vector with the same size for the training and testing sets?
Here is the code,
[fpos,fneg] = featuress(pathPos, pathNeg);
%train SVM
HOG_featV = loadingV(fpos,fneg); % loading and labeling each training example
%% Detection
tSize = [24 32];
testImPath = '.\face_detection\dataset\bikes_and_persons2\';
imlist = dir([testImPath '*.bmp']);
for j = 1:length(imlist)
disp ('inside for loop');
img = imread([testImPath imlist(j).name]);
axis equal; axis tight; axis off;
imshow(img); hold on;
detect(img,model,tSize);
%% training
function [fpos, fneg] = featuress(pathPos,pathNeg)
% extract features for positive examples
imlist = dir([pathPos '*.bmp']);
for i = 1:length(imlist)
im = imread([pathPos imlist(i).name]);
fpos{i} = extractHOGFeatures(double(im));
end
% extract features for negative examples
imlist = dir([pathNeg '*.bmp']);
for i = 1:length(imlist)
im = imread([pathNeg imlist(i).name]);
fneg{i} = extractHOGFeatures(double(im));
end
end
%% testing function
function detect(im,model,wSize)
topLeftRow = 1;
topLeftCol = 1;
[bottomRightCol bottomRightRow d] = size(im);
fcount = 1;
for y = topLeftCol:bottomRightCol-wSize(2)
for x = topLeftRow:bottomRightRow-wSize(1)
p1 = [x,y];
p2 = [x+(wSize(1)-1), y+(wSize(2)-1)];
po = [p1; p2];
img = imcut(po,im);
featureVector{fcount} = extractHOGFeatures(double(img));
boxPoint{fcount} = [x,y];
fcount = fcount+1;
x = x+1;
end
end
lebel = ones(length(featureVector),1);
P = cell2mat(featureVector');
% each row of P' correspond to a window
[ predictions] = svmclassify(model, P); % classifying each window
[a, indx]= max(predictions);
bBox = cell2mat(boxPoint(indx));
rectangle('Position',[bBox(1),bBox(2),24,32],'LineWidth',1, 'EdgeColor','r');
end
Thanks in advance.
What's the size of P? Is it 167796 x 216? If so, you should not transpose featureVector when you call cell2mat, or you should transpose P before you use it. You can also make featureVector a matrix rather than a cell array: since you know that the length of the HOG vector is 167796 and you know how many images you have, you can pre-allocate it up front and fill in the rows.
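It may also help to check where the two feature counts come from. The sketch below evaluates the feature-length formula given in the extractHOGFeatures documentation, assuming the documented defaults (CellSize [8 8], BlockSize [2 2], BlockOverlap ceil(BlockSize/2), NumBins 9). The helper names blocksPerImage and hogLength are made up for this example. Under those assumptions a 480x640 image gives 167796 features and a 24x32 window gives 216, which suggests that training runs on whole images while detect runs on tSize windows.

```matlab
cellSize     = [8 8];                 % assumed default CellSize
blockSize    = [2 2];                 % assumed default BlockSize
blockOverlap = ceil(blockSize / 2);   % assumed default BlockOverlap
numBins      = 9;                     % assumed default NumBins

% Hypothetical helpers: number of blocks, then total feature length, for an image size
blocksPerImage = @(sz) floor((sz ./ cellSize - blockSize) ./ (blockSize - blockOverlap) + 1);
hogLength      = @(sz) prod([blocksPerImage(sz), blockSize, numBins]);

nTrain  = hogLength([480 640]);       % whole training image
nWindow = hogLength([24 32]);         % one detection window of size tSize
fprintf('full image: %d features, window: %d features\n', nTrain, nWindow);
```

If that is the cause, one option is to extract the training features from patches of the same size as the detection window, so that both sides produce vectors of the same length.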

Solving ODEs with Matlab, with varying Parameters

Let's say I have a simple logistic equation
dx/dt = 2ax(1 - x/N)
where N is the carrying capacity, a is some growth rate, and both a and N are parameters I'd like to vary.
So what I want to do is to plot a 3D graph of my fixed point and the two parameters.
I understand how to find a fixed point of a single parameter.
Here is my sample code
function xprime = MyLogisticFunction(t,X) %% The ODE
% Parameters
N = 10 % Carrying Capacity
a = 0.5 % Growth Rate
x1prime = 2*a*X(1)*(1 - X(1)/N );
xprime = [x1prime ]';
end
Next my solver
% Initial Number
x0 = 0.4;
%Time Window
tspan=[0 100];
[t,x]=ode45(@MyLogisticFunction,tspan,x0);
clf
x(end,1) % This gives me the fixed point for the parameters above.
So my real question is, how do I put a for loop across two functions, that allows me to vary a and N, so that I can plot out a 3D graph of a and N and my fixed point x*.
I've tried combining both functions into one .m file but it does not seem to work
You need to pass the parameters to your function:
function xprime = MyLogisticFunction(t,X,a,N) %% The ODE
% Parameters (passed as function arguments)
% N = 10 % Carrying Capacity
% a = 0.5 % Growth Rate
x1prime = 2*a*X(1)*(1 - X(1)/N );
xprime = [x1prime ]';
end
and then when you call the ode solver:
% Initial Number
x0 = 0.4;
%Time Window
tspan=[0 100];
a = 0.1:0.1:1; % or whatever
N = 1:10; % or whatever
x_end = zeros(length(a),length(N));
for ii = 1:length(a)
for jj = 1:length(N)
[t,x]=ode45(@(t,X)MyLogisticFunction(t,X,a(ii),N(jj)),tspan,x0);
x_end(ii,jj) = x(end,1);
end
end
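To get the 3D graph the question asks for, the x_end matrix can be passed to surf. The following is a self-contained sketch of the same loop: it uses an anonymous function named logistic (a name chosen here) in place of the MyLogisticFunction file so that everything fits in one script. For this equation the stable fixed point is x* = N for any a > 0, so the surface should not depend on a; maxDev measures how far the numerical end values are from N.

```matlab
% Same right-hand side as MyLogisticFunction, written inline (assumption for brevity)
logistic = @(t, X, a, N) 2*a*X*(1 - X/N);

x0    = 0.4;                 % initial number
tspan = [0 100];             % time window
a = 0.1:0.1:1;               % growth rates to sweep
N = 1:10;                    % carrying capacities to sweep

x_end = zeros(length(a), length(N));
for ii = 1:length(a)
    for jj = 1:length(N)
        [~, x] = ode45(@(t, X) logistic(t, X, a(ii), N(jj)), tspan, x0);
        x_end(ii, jj) = x(end, 1);        % value at the end of the time window
    end
end

% Rows of x_end follow a, columns follow N, so N goes on the x-axis
surf(N, a, x_end);
xlabel('N'); ylabel('a'); zlabel('x*');

% Largest gap between the numerical end value and the analytic fixed point x* = N
maxDev = max(max(abs(bsxfun(@minus, x_end, N))));
```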