I am trying to solve a classification task using logistic regression, and part of the task is to plot the decision boundary. My algorithm seems to find the gradient (slope) of the decision boundary correctly, but the plotted boundary sits too high and does not separate the points well. I cannot work out why this is and would be grateful for any advice on solving it.
data = open('Question5.mat');
x = data.x; y = data.y; % Extract data for ease of use
LR = 0.001; % Set tunable learning rate for gradient descent
w_est = [0; 0; 0]; % Set initial guess for a, b, and c
cost = []; % Initialise array to hold value of cost function
figure;
for i = 1:20000 % Set iteration limit for gradient descent
iteration_cost = 0; grad_a = 0; grad_b = 0; grad_c = 0; % Set initial value of 0 for summed terms
for m = 1:1100 % Iterate through data points
y_hat_est = 1./(1+exp(-w_est'*[x(m,1); x(m,2); 1])); % Calculate value of sigmoid function with estimated coefficients for each datapoint
iteration_cost = iteration_cost + y(m)*log(y_hat_est)+(1-y(m))*log(1-y_hat_est); % Calculate cost function and add it to summed term for each data point
% Calculate each gradient term for each data point and add to
% summed gradient
grad_a = grad_a + (y_hat_est - y(m))*x(m,1);
grad_b = grad_b + (y_hat_est - y(m))*x(m,2);
grad_c = grad_c + (y_hat_est - y(m))*x(m,3);
end
g = [grad_a; grad_b; grad_c]; % Create vector of gradients
w_est = w_est - LR*g; % Update estimate vector with next term
cost(i) = -iteration_cost; % Add the value of the cost function to the array for costs
if mod(i,1000) == 0 % Only plot on some iterations to speed up program
hold off
gscatter(x(:,1),x(:,2),y,'rb'); % Plot scatter plot grouped by class
xlabel('x1'); ylabel('x2'); title(i); % Add title and labels to figure
hold on
x1_plot = -6:4; x2_plot = -3:7; % Create array of values for plotting
plot( -(w_est(1)*x1_plot + w_est(3)) /w_est(2), x2_plot); % Plot decision boundary based on the current coefficient estimates
% pause(1) % Add delay to aid visualisation
end
end
hold off;
figure; plot(cost) % Plot the cost function
title('Cost function'); xlabel('Iteration number'); ylabel('cost');
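One possible cause, offered as a sketch rather than a confirmed diagnosis: the boundary satisfies w(1)*x1 + w(2)*x2 + w(3) = 0, so x2 = -(w(1)*x1 + w(3))/w(2) is a function of x1 and belongs on the vertical axis, with x1_plot on the horizontal axis. The plot call above passes those two arguments the other way round. The weights below are made-up values for illustration only, not fitted to Question5.mat.

```matlab
% Hypothetical weights [a; b; c]; in the script above these would be w_est
w = [1.5; -2.0; 0.5];
x1_plot = -6:4;                               % horizontal-axis values
x2_boundary = -(w(1)*x1_plot + w(3)) / w(2);  % solve w1*x1 + w2*x2 + w3 = 0 for x2
plot(x1_plot, x2_boundary, 'k-');             % x1 horizontal, boundary x2 vertical
xlabel('x1'); ylabel('x2');
% Sanity check: the sigmoid equals 0.5 at every point on the boundary
p_on_boundary = 1 ./ (1 + exp(-(w(1)*x1_plot + w(2)*x2_boundary + w(3))));
```

Separately, the bias gradient in the loop multiplies by x(m,3) while the prediction uses a constant 1. If x has no third column of ones, using (y_hat_est - y(m))*1 for grad_c would match the model.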
I am trying to create a big plot in Matlab by adding subplots in a loop.
% Generation of examples and targets
x = 0 : 0.05 : 3 * pi;
y = sin(x.^2);
% Deep Learning Toolbox™ software arranges concurrent vectors with a
% matrix, and sequential vectors with a cell array (where the second index is the time step).
% con2seq and seq2con allow concurrent vectors to be converted to sequential vectors, and back again.
p = con2seq(x);
t = con2seq(y); % convert the data to a useful format
% Creation of networks (based on algorlm1.m)
num_hid = 50;
% Epoch quantities to evaluate
num_epochs = [1 20 1000]
% We'll evaluate these algorithms
training_algorithms = {
'traingd',
'traingda',
'traincgf',
'traincgp',
'trainlm',
'trainbfg'
};
%% Initialization
% Create arrays to store networks and training parameters
networks = cell(length(training_algorithms), 1);
durations = zeros(length(training_algorithms), 1);
slopes = cell(length(training_algorithms), 1);
intercepts = cell(length(training_algorithms), 1);
correlations = cell(length(training_algorithms), 1);
for tai = 1:length(training_algorithms)
% Create new feedforwardnet
net = feedforwardnet(num_hid, training_algorithms{tai});
% Set all networks weights and biases equal (to the first one created)
if tai > 1
net.IW{1,1} = networks{tai-1}.IW{1,1};
net.LW{1,1} = networks{tai-1}.LW{1,1};
net.b{1} = networks{tai-1}.b{1};
net.b{2} = networks{tai-1}.b{2};
end
% Store network
networks{tai} = net;
end
figure;
for tai = 1:6
net = networks{tai}; % Load network
net.trainParam.showWindow = false; % Don't show graph
% Train network, and time training
tic;
net = train(net, p, t);
durations(tai)=toc;
% Simulate input on trained networks (and convert to double format)
y_result = cell2mat(sim(net, p));
% Evaluate result
[slopes{tai}, intercepts{tai}, correlations{tai}] = postreg(y_result, y);
% Add network to array
networks{tai} = net;
% Plot results
subplot(2,6,tai);
plot(x,y,'bx',x,y_result,'r'); % Plot the sine function and the output of the network
%title('1 epoch');
legend('target',training_algorithms{tai},'Location','north');
subplot(2,6, tai+6);
postregm(y_result, y); % perform a linear regression analysis and plot the result
end
I only get plots that were created in the last iteration though (when tai == 6). I tried adding 'hold on' in front of the loop, and turned it off again. Any ideas why this is happening?
EDIT: Here's the resulting figure:
EDIT2: I added code so it could be reproduced. You'll need the deep learning toolbox.
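One way to narrow this down, assuming (unconfirmed) that one of the toolbox regression calls opens or clears a figure on each iteration: replace them with plain plotting calls and check whether all twelve axes survive. The sketch below swaps in polyfit and corrcoef for the regression and uses noisy copies of the target as a stand-in for the network output, so it runs without the toolbox. The names n_plots and noise_level are introduced here for illustration.

```matlab
x = 0:0.05:3*pi;
y = sin(x.^2);
n_plots = 6;
slopes = zeros(n_plots, 1);
intercepts = zeros(n_plots, 1);
correlations = zeros(n_plots, 1);
figure;
for k = 1:n_plots
    noise_level = 0.1 * k;                          % stand-in for differing network quality
    y_result = y + noise_level * randn(size(y));    % stand-in for cell2mat(sim(net, p))
    c = polyfit(y, y_result, 1);                    % fit y_result = slope*y + intercept
    slopes(k) = c(1);
    intercepts(k) = c(2);
    R = corrcoef(y, y_result);
    correlations(k) = R(1, 2);
    subplot(2, n_plots, k);                         % top row: target and output
    plot(x, y, 'bx', x, y_result, 'r');
    subplot(2, n_plots, k + n_plots);               % bottom row: regression scatter
    plot(y, y_result, '.', y, polyval(c, y), '-');
end
```

If all subplots appear with this version, the figure is most likely being reset by a toolbox plotting function rather than by subplot itself.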
I'm trying to implement stochastic gradient descent in MATLAB, but I am not seeing any convergence. Mini-batch gradient descent worked as expected, so I think that the cost function and gradient steps are correct.
The two main issues I am having are:
Randomly shuffling the data in the training set before the for-loop
Selecting one example at a time
Here is my MATLAB code:
Generating Data
alpha = 0.001;
num_iters = 10;
xrange =(-10:0.1:10); % data length
ydata = 5*(xrange)+30; % data with gradient 5, intercept 30
% plot(xrange,ydata); grid on;
noise = (2*randn(1,length(xrange))); % generating noise
target = ydata + noise; % adding noise to data
f1 = figure
subplot(2,2,1);
scatter(xrange,target); grid on; hold on; % plot a scatter
title('Linear Regression')
xlabel('xrange')
ylabel('ydata')
tita0 = randn(1,1); %intercept (randomised)
tita1 = randn(1,1); %gradient (randomised)
% Initialize Objective Function History
J_history = zeros(num_iters, 1);
% Number of training examples
m = (length(xrange));
Shuffling data, Gradient Descent and Cost Function
% STEP1 : we shuffle the data
data = [ xrange, ydata];
data = data(randperm(size(data,1)),:);
y = data(:,1);
X = data(:,2:end);
for iter = 1:num_iters
for i = 1:m
x = X(:,i); % STEP2 Select one example
h = tita0 + tita1.*x; % building the estimated %Changed to xrange in BGD
%c = (1/(2*length(xrange)))*sum((h-target).^2)
temp0 = tita0 - alpha*((1/m)*sum((h-target)));
temp1 = tita1 - alpha*((1/m)*sum((h-target).*x)); %Changed to xrange in BGD
tita0 = temp0;
tita1 = temp1;
fprintf("here\n %d; %d", i, x)
end
J_history(iter) = (1/(2*m))*sum((h-target).^2); % Calculating cost from data to estimate
fprintf('Iteration #%d - Cost = %d... \r\n',iter, J_history(iter));
end
On plotting the cost vs iterations and linear regression graphs, the MSE settles (local minimum?) at around 420 which is wrong.
On the other hand if I re-run the exact same code however using batch gradient descent I get acceptable results. In batch gradient descent I am changing x to xrange:
Any suggestions on what I am doing wrong?
EDIT:
I also tried selecting random indexes using:
f = round(1+rand(1,1)*201); %generating random indexes
and then selecting one example:
x = xrange(f); % STEP2 Select one example
Proceeding to use x in the hypothesis and GD steps also yields a cost of 420.
First we need to shuffle the data correctly:
data = [ xrange', target'];
data = data(randperm(size(data,1)),:);
Next we need to index X and y correctly:
y = data(:,2);
X = data(:,1);
Then during gradient descent I need to update based on a single value not on target, like so:
tita0 = tita0 - alpha*((1/m)*((h-y(i))));
tita1 = tita1 - alpha*((1/m)*((h-y(i)).*x));
With the changes above, theta converges to [5, 30], i.e. gradient 5 and intercept 30, matching the generated data.
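The steps above can be put together as a minimal self-contained sketch. It is not the exact code from the post: it assumes a per-example step without the 1/m factor, reshuffles on every epoch, runs 200 epochs, and fixes the random seed, all of which are choices made here for illustration. The names theta0, theta1 and num_epochs are likewise introduced for this sketch.

```matlab
rng(0);                                        % fixed seed so the run is repeatable
xrange = (-10:0.1:10)';                        % inputs as a column vector
target = 5*xrange + 30 + 2*randn(size(xrange)); % gradient 5, intercept 30, plus noise
m = numel(xrange);
alpha = 0.001;                                 % learning rate
num_epochs = 200;                              % assumed; more passes than the original 10
theta0 = 0;                                    % intercept estimate
theta1 = 0;                                    % gradient estimate
for epoch = 1:num_epochs
    idx = randperm(m);                         % shuffle example order each epoch
    for k = idx
        x = xrange(k);                         % select one example
        err = theta0 + theta1*x - target(k);   % prediction error on that example
        theta0 = theta0 - alpha*err;           % update uses only this example
        theta1 = theta1 - alpha*err*x;
    end
end
fprintf('intercept = %.3f, gradient = %.3f\n', theta0, theta1);
```

The key points are that x and the target value are taken from the same shuffled index, and that each update uses the error of a single example rather than a sum over the whole target vector.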
I am training an Elman network (a specific type of Recurrent Neural Network) and for that reason my datasets (input/target) need to be cell arrays (so that the examples are considered as a sequence by the train function).
However, I cannot get the train function to use a validation set and a test set.
Here is an example, where I want a validation and test set to be used but the train function is not using any (I know that by looking at the performance plot from the 'nntraintool' wizard or by looking at the content of the 'tr' variable in my example below). It seems the "divideind" property and indexes are ignored.
%% Set the parameters of the run
n_neurons = 50; % Number of neurons
n = 1000; % Total number of samples
ne = 500; % Number of epochs
%% Create the samples
% Allocate memory
u = zeros(1, n);
x = zeros(1, n);
y = zeros(1, n);
% Initialize u, x and y
u(1)=randn;
x(1)=rand+sin(u(1));
y(1)=x(1);
% Calculate the samples
for i=2:n
u(i)=randn;
x(i)=.8*x(i-1)+sin(u(i));
y(i)=x(i);
end
%% Create the datasets
X=num2cell(u);
T=num2cell(y);
%% Train and simulate the network
% Create the net and apply the selected parameters
net = newelm(X,T,n_neurons); % Create network
net.trainParam.epochs = ne; % Number of epochs
%% This seems to be ignored
net.divideFcn = 'divideind';
net.divideParam.trainInd = 1:800;
net.divideParam.valInd = 801:900;
net.divideParam.testInd = 901:1000;
[net,tr]= train(net,X,T);
I found the answer. I need to add:
net.divideMode = 'time';
so that cell arrays can be divided in train/validation/test sets, for example with:
net.divideFcn = 'divideind';
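Put together, the configuration might look like the fragment below. It assumes net, X and T were created as in the question and that the Deep Learning Toolbox is installed. The index ranges are the ones from the question.

```matlab
% Assumes net, X and T already exist as in the question (Deep Learning Toolbox)
net.divideFcn  = 'divideind';          % split by explicit indices
net.divideMode = 'time';               % indices refer to time steps of the sequence
net.divideParam.trainInd = 1:800;
net.divideParam.valInd   = 801:900;
net.divideParam.testInd  = 901:1000;
[net, tr] = train(net, X, T);
% If the split was applied, the training record should list non-empty index sets
disp(numel(tr.valInd));
disp(numel(tr.testInd));
```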
I have been using the STK toolbox for a few days, for kriging of environmental parameter fields, i.e. in a geostatistical context.
I find the toolbox very well implemented and useful (big thanks to the authors!), and the kriging predictions I am getting through STK actually seem fine; however, I am finding myself unable to visualize a semivariogram model based on the STK output (i.e. estimated parameters for gaussian process / covariance functions).
I am attaching an example figure, showing the empirical semivariogram for a simple 1D test case and a Gaussian semivariogram model (as typically used in geostatistics, see also figure) fitted directly to that data. The figure further shows a semivariogram model based on STK output, i.e. using previously estimated model parameters (model.param from stk_param_estim) to get covariance K on a target grid of lag distances and then converting K to semivariance (according to the well-known relation semivar = K0-K where K0 is the covariance at zero lag). I am attaching a simple script to reproduce the figure and detailing the attempted conversion.
As you can see in the figure, this doesn’t do the trick. I have tried several other simple examples and STK datasets, but models obtained through STK vs direct fitting never agree, and in fact usually differ much more than in the example (i.e. the range often seems very different, in addition to the sill/sigma2; uncomment line 12 in the script to see another example). I have also attempted to input the converted STK parameters into the geostatistical model (also in the script); however, the output is identical to the result based on converting K above.
I’d be very thankful for your help!
Figure illustrating the lack of agreement between semivariograms based on direct fit vs conversion of STK output
% Code to reproduce the figure illustrating my problem of getting
% variograms from STK output. The only external functions needed are those
% included with STK.
% TEST DATA - This is simply a monotonic part of the normal pdf
nugget = 0;
X = [0:20]'; % coordinates
% X = [0:50]'; % uncomment this line to see how strongly the models can deviate for different test cases
V = normpdf(X./10+nugget,0,1); % observed values
covmodel = 'stk_gausscov_iso'; % covar model, part of STK toolbox
variomodel = 'stk_gausscov_iso_vario'; % variogram model, nested function
% GET STRUCTURE FOR THE SELECTED KRIGING (GAUSSIAN PROCESS) MODEL
nDim = size(X,2);
model = stk_model (covmodel, nDim);
model.lognoisevariance = NaN; % This makes STK fit nugget
% ESTIMATE THE PARAMETERS OF THE COVARIANCE FUNCTION
[param0, model.lognoisevariance] = stk_param_init (model, X, V); % Compute an initial guess for the parameters of the covariance function (param0)
model.param = stk_param_estim (model, X, V, param0); % Now model the covariance function
% EMPIRICAL SEMIVARIOGRAM (raw, binning removed for simplicity)
D = pdist(X)';
semivar_emp = 0.5.*(pdist(V)').^2;
% THEORETICAL SEMIVARIOGRAM FROM STK
% Target grid of lag distances
DT = [0:1:100]';
DT_zero = zeros(size(DT));
% Get covariance matrix on target grid using STK estimated pars
pairwise = true;
K = feval(model.covariance_type, model.param, DT, DT_zero, -1, pairwise);
% convert covariance to semivariance, i.e. G = C(0) - C(h)
sill = exp(model.param(1));
nugget = exp(model.lognoisevariance);
semivar_stk = sill - K + nugget; % --> this variable is then plotted
% TEST: FIT A GAUSSIAN VARIOGRAM MODEL DIRECTLY TO THE EMPIRICAL SEMIVARIOGRAM
f = @(par) mseval(par,D,semivar_emp,variomodel);
par0 = [10 10 0.1]; % initial guess for pars
[par,mse] = fminsearch(f, par0); % optimize
semivar_directfit = feval(variomodel, par, DT); % evaluate
% TEST 2: USE PARS FROM STK AS INPUT TO GAUSSIAN VARIOGRAM MODEL
par(1) = exp(model.param(1)); % sill, PARAM(1) = log (SIGMA ^ 2), where SIGMA is the standard deviation,
par(2) = sqrt(3)./exp(model.param(2)); % range, PARAM(2) = - log (RHO), where RHO is the range parameter. --- > RHO = exp(-PARAM(2))
par(3) = exp(model.lognoisevariance); % nugget
semivar_stkparswithvariomodel = feval(variomodel, par, DT);
% PLOT SEMIVARIOGRAM
figure(); hold on;
plot(D(:), semivar_emp(:),'.k'); % Observed variogram, raw
plot(DT, semivar_stk,'-b','LineWidth',2); % Theoretical variogram, on a grid
plot(DT, semivar_directfit,'--r','LineWidth',2); % Test direct fit variogram
plot(DT,semivar_stkparswithvariomodel,'--g','LineWidth',2); % Test direct fit variogram using pars from stk
legend('raw empirical semivariance (no binned data here for simplicity) ',...
'Gaussian cov model from STK, i.e. exp(Sigma2) - K + exp(lognoisevar)',...
'Gaussian semivariogram model (fitted directly to semivariance)',...
'Gaussian semivariogram model (using transformed params from STK)');
xlabel('Lag distance','Fontweight','b');
ylabel('Semivariance','Fontweight','b');
% NESTED FUNCTIONS
% Objective function for direct fit
function [mse] = mseval(par,D,Graw,variomodel)
Gmod = feval(variomodel, par, D);
mse = mean((Gmod-Graw).^2);
end
% Gaussian semivariogram model.
function [semivar] = stk_gausscov_iso_vario(par, D) %#ok<DEFNU>
% D : lag distance, c : sill, a : range, n : nugget
c = par(1); % sill
a = par(2); % range
if length(par) > 2, n = par(3); % nugget optional
else, n = 0; end
semivar = n + c .* (1 - exp( -3.*D.^2./a.^2 )); % Model
end
There is nothing wrong with the way you compute the semivariogram.
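As a quick check of that conversion, the sketch below compares the two routes numerically without calling STK. It assumes the Gaussian covariance has the form K(h) = sigma2 * exp(-(h/rho)^2), which is consistent with the sqrt(3) factor used in the question's script but is stated here as an assumption. The parameter values are made up.

```matlab
% Assumed covariance form: K(h) = sigma2 * exp(-(h/rho)^2)
sigma2 = 2.0;                          % made-up variance (sill)
rho    = 4.0;                          % made-up range parameter
nugget = 0.1;                          % made-up noise variance
h = (0:0.5:30)';                       % lag distances
K = sigma2 * exp(-(h ./ rho).^2);      % covariance at each lag
semivar_from_cov = sigma2 - K + nugget;        % G = C(0) - C(h) + nugget
a = sqrt(3) * rho;                             % range of the variogram model
semivar_from_model = nugget + sigma2 .* (1 - exp(-3 .* h.^2 ./ a.^2));
max_diff = max(abs(semivar_from_cov - semivar_from_model));
fprintf('largest difference between the two curves: %g\n', max_diff);
```

Under that assumption the two curves coincide up to rounding, which is why the blue and green lines in the question's figure are identical.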
To understand the figure that you obtain, consider that:
The parameters of the model are estimated in STK using the (restricted) maximum likelihood method, not by least-squares fitting on the semi-variogram.
For very smooth stationary random fields observed over short intervals, you should not expect that the theoretical semivariogram will agree with the empirical semivariogram, with or without binning. The reason for this is that the observations, and thus the squared differences, are very correlated in this case.
To convince yourself of the second point, you can run the following script repeatedly:
% a smooth GP
model = stk_model (@stk_gausscov_iso, 1);
model.param = log ([1.0, 0.2]); % unit variance
x_max = 20; x_obs = x_max * rand (50, 1);
% Simulate data
z_obs = stk_generate_samplepaths (model, x_obs);
% Empirical semivariogram (raw, no binning)
h = (pdist (double (x_obs)))';
semivar_emp = 0.5 * (pdist (z_obs)') .^ 2;
% Model-based semivariogram
x1 = (0:0.01:x_max)';
x0 = zeros (size (x1));
K = feval (model.covariance_type, model.param, x0, x1, -1, true);
semivar_th = 1 - K;
% Figure
figure; subplot (1, 2, 1); plot (x_obs, z_obs, '.');
subplot (1, 2, 2); plot (h(:), semivar_emp(:),'.k'); hold on;
plot (x1, semivar_th,'-b','LineWidth',2);
legend ('empirical', 'model'); xlabel ('lag'); ylabel ('semivar');
Further questions on parameter estimation for Gaussian process models should probably be asked on Cross-Validated rather than Stack Overflow.