Implementing logistic regression with L2 regularization in Matlab - matlab

Matlab has built in logistic regression using mnrfit, however I need to implement a logistic regression with L2 regularization. I'm completely at a loss at how to proceed. I've found some good papers and website references with a bunch of equations, but not sure how to implement the gradient descent algorithm needed for the optimization.
Is there any easily available sample code in Matlab for this? I've found some libraries and packages, but they are all part of larger packages and call so many convoluted functions that one can get lost just going through the trace.

Here is an annotated piece of code for plain gradient descent for logistic regression. To introduce regularisation, you will want to update the cost and gradient equations. In this code, theta holds the parameters, X holds the class predictors (one row per training example), y holds the class labels (0 or 1) and alpha is the learning rate.
I hope this helps :)
function [theta,J_store] = logistic_gradientDescent(theta, X, y, alpha, numIterations)
% Initialize some useful values
m = length(y); % number of training examples
sigmoid = @(z) 1./(1 + exp(-z)); % logistic function (not a Matlab built-in)
J_store = zeros(numIterations,1); % cost at each iteration
for iter = 1:numIterations
    % Predict the class probabilities using the current weights (theta)
    Z = X*theta;
    h = sigmoid(Z);
    % This is the normal (unregularised) cost function equation
    J = (1/m).*sum(-y.*log(h) - (1-y).*log(1-h));
    J_store(iter) = J;
    % This is the gradient of the cost given the current weights, without regularisation
    grad = (1/m).*(X'*(h - y));
    theta = theta - alpha.*grad;
end
end
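As a sketch of what "update the cost and gradient equations" could look like for L2 regularisation: the version below adds a penalty (lambda/(2m))*sum(theta.^2) to the cost and (lambda/m)*theta to the gradient. The name lambda, the function name, and the convention that the first column of X is all ones (so theta(1) is an unpenalised intercept) are assumptions of this sketch, not part of the code above.

```matlab
function [theta,J_store] = logistic_gradientDescent_L2(theta, X, y, alpha, lambda, numIterations)
% Sketch: gradient descent for logistic regression with an L2 penalty.
% ASSUMPTIONS: lambda is the regularisation strength (hypothetical name);
% the first column of X is all ones, so theta(1) is the intercept and is
% left out of the penalty.
m = length(y);                       % number of training examples
sigmoid = @(z) 1./(1 + exp(-z));     % logistic function
J_store = zeros(numIterations,1);    % regularised cost at each iteration
for iter = 1:numIterations
    h = sigmoid(X*theta);            % predicted probabilities
    reg = theta;
    reg(1) = 0;                      % do not penalise the intercept
    % Cross-entropy cost plus L2 penalty
    J_store(iter) = (1/m)*sum(-y.*log(h) - (1-y).*log(1-h)) ...
                    + (lambda/(2*m))*sum(reg.^2);
    % Gradient of the regularised cost
    grad = (1/m)*(X'*(h - y)) + (lambda/m)*reg;
    theta = theta - alpha*grad;
end
end
```

With lambda = 0 this reduces to the unregularised update. A call might look like [theta,J] = logistic_gradientDescent_L2(zeros(size(X,2),1), X, y, 0.1, 1, 500); plotting J is a quick way to check that the learning rate is small enough for the cost to decrease.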

Related

Manual prediction of Gaussian Regression SVM in Matlab

I trained a SVM using the Regression Learner of Matlab with a Gaussian kernel. The learning worked really well and the RSE is small.
Now, I exported the model back to the Matlab workspace (trainedModel) and I can use the predict function to get the estimation of new values. However, I would like to manually implement the prediction function, because I need to export it to a different programming language, thus I cannot rely on the Matlab's predict function. Therefore, following the MATLAB explanation I implemented the following equation:
f(x) = \sum_{n} \alpha_n G(x_n, x) + b
with
G(x_n, x) = \exp(-\lVert x_n - x \rVert^2)
This is my code for a [0.5 1 50] input:
bias = trainedModel.RegressionSVM.Bias;
alpha = trainedModel.RegressionSVM.Alpha;
SV = trainedModel.RegressionSVM.SupportVectors;
Mu = trainedModel.RegressionSVM.Mu;
Sg = trainedModel.RegressionSVM.Sigma;
input = ([0.5 1 50] - Mu) ./ Sg;
sum = bias;
for n=1:length(alpha)
G = exp(-norm((SV(n,:)'-input))^2);
sum = sum + alpha(n) .* G;
end
disp(sum)
(Note that alpha is already the difference of the Lagrangian multipliers according to the documentation)
However, the predicted results are completely wrong. I think something is wrong with G because the values are very small (in the order of 10^(-25)), but I cannot figure out the error.
The mistake was very small: the transposition of the SV array is incorrect. SV(n,:)' is a column vector while input is a row vector, so the - operator expands the result into a matrix, and norm then hides this by returning a matrix norm instead of the distance between two vectors. Therefore, changing the following line:
G = exp(-norm((SV(n,:)'-input))^2);
to
G = exp(-norm((SV(n,:)-input))^2);
solved the problem.
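A minimal sketch of why the transpose goes unnoticed, using made-up 3-element vectors in place of SV(n,:) and input (the values are purely illustrative, and the matrix result assumes Matlab R2016b or later, where column-minus-row subtraction expands implicitly):

```matlab
sv = [1 2 3];          % hypothetical support vector (row, like SV(n,:))
x  = [1 2 4];          % hypothetical standardized input (row)

wrong = sv' - x;       % 3x1 minus 1x3 expands to a 3x3 matrix
right = sv - x;        % 1x3 difference vector, as intended

disp(size(wrong))      % 3 3
disp(size(right))      % 1 3

Gwrong = exp(-norm(wrong)^2);   % norm of a matrix is its largest singular value
Gright = exp(-norm(right)^2);   % squared Euclidean distance is 1 here
```

No error is raised in either case, which is why the bug only shows up as implausibly small kernel values.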

MATLAB - Meaning of Gaussian distribution data (in Neural Network)

I'm a newbie to MATLAB and I'm trying to create 2-D Gaussian-distributed data to train my neural network. I found this code in the official documentation:
mu = [0 0];
Sigma = [.25 .3; .3 1];
x1 = -3:.2:3; x2 = -3:.2:3;
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
I know "mu" is the average of the data and Sigma is something related to the standard deviation. But I just don't get the idea of meshgrid and the intervals (x1, x2), or the geometric meaning of this code.
Also, can someone explain to me why the Gaussian distribution is so important in machine learning and data science? All the courses keep mentioning this term.
Meshgrid is a basic matlab function, that is in no way specifically related to neural networks or a gaussian distribution. Check the documentation of Matlab to find out more about it.
The gaussian distribution (also known as normal distribution) is important for datascience because it comes with several nice statistical properties. Unfortunately it is hard to describe them all in a compact way, and this would also not be a question about programming, but more about statistics.
I think the code you provide seems confusing to you because you expect it to generate samples whereas it merely returns values of the Gaussian PDF (probability density function) for some given pairs of (x1,x2).
For example, F = mvnpdf([a b],mu,Sigma) returns the probability density at x1=a and x2=b, given that they follow a multivariate Gaussian distribution with mean mu and covariance matrix Sigma. Note that this is a density value, not a probability.
Being in Stack Overflow, I am focusing on the Matlab aspect of your question: for generating 100 samples of a 2-D Gaussian you can use something like the following (taken from the Matlab help of randn function):
mu = [1 2];
Sigma = [1 .5; .5 2];
R = chol(Sigma);
z = repmat(mu,100,1) + randn(100,2)*R;
The array z = [x1,x2] contains the x1 and x2 vectors that you are looking for.
A statistics textbook or Wikipedia can explain why the above code indeed generates such samples. The last line of code relies on one of the nice properties of a Gaussian distribution (or any other elliptical distribution): a linear transformation of standard normal samples by R, where R'*R = Sigma, has covariance Sigma.
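For the grid-based code in the question, here is a small sketch of what meshgrid contributes. It uses a deliberately tiny 3-by-3 grid so the matrices can be inspected, and it writes the bivariate density out by hand instead of calling mvnpdf so that no toolbox is needed; the hand-written formula and the variable names pts and d are choices of this sketch (implicit expansion in pts - mu assumes R2016b or later).

```matlab
x1 = -1:1:1;                  % 3 points along the first axis
x2 = -2:2:2;                  % 3 points along the second axis
[X1,X2] = meshgrid(x1,x2);    % every row of X1 is x1, every column of X2 is x2
pts = [X1(:) X2(:)];          % 9-by-2 list of all (x1,x2) combinations

mu = [0 0];
Sigma = [.25 .3; .3 1];

% Bivariate Gaussian density evaluated at each grid point
d = pts - mu;                                         % deviations from the mean
F = exp(-0.5*sum((d/Sigma).*d,2)) / (2*pi*sqrt(det(Sigma)));

F = reshape(F,size(X1));      % back to grid shape, one density value per grid node
surf(x1,x2,F)                 % height of the surface = density at that (x1,x2)
```

Geometrically, the grid is just the set of locations where the bell-shaped surface is sampled; the finer intervals in the question (-3:.2:3) only make the plotted surface smoother.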

Fast scaling of Gaussian Kernel by the Covariance of the Inputs

I am currently fiddling with multivariate kernel density estimations for estimating the probability density functions (PDF) of hydrological data sets using Matlab. I am most familiar with kernel density estimation using Gaussian kernels as outlined in Sharma (2000 and 2014) (where the kernel bandwidths are set using the Gaussian Reference Rule (GRR)). The GRR is written as follows (Sharma, 2000):
where lambda_ref = GRR bandwidth of kernel, n is the sample size, and d is the dimension of the data set we are using for density estimation. To estimate the multivariate density of our data set X we use the following formula (Sharma, 2000):
where lambda is the same as lambda_ref above, S is the sample covariance of X and det() stands for determinant.
My question is: I understand that there are many "fast" methods for calculating the Gaussian kernel function represented by the term exp() such as the method proposed here (using Matlab): http://mrmartin.net/?p=218. Since I will be working with data sets that are quite large in sample size (1000-10,000) I am looking for a fast code. Is anyone aware how I can write a fast code for the second equation that takes into account the inverse of the sample covariance matrix (S^-1)?
I greatly appreciate any help that can be provided on this issue. Thank you!
Note(s):
I understand that there is a Matlab code for calculating the second equation, found as a sub-function in: http://www.mathworks.com/matlabcentral/fileexchange/29039-mutual-information-2-variablle/content/MutualInfo.m. However this code has a bottleneck in how it calculates the kernel matrix.
References:
1 A. Sharma, Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 3 — A nonparametric probabilistic forecast model, Journal of Hydrology, Volume 239, Issues 1–4, 20 December 2000, Pages 249-258, ISSN 0022-1694, http://dx.doi.org/10.1016/S0022-1694(00)00348-6.
2 Sharma, A., and R. Mehrotra (2014), An information theoretic alternative to model a natural system using observational information alone, Water Resour. Res., 50, 650–660, doi:10.1002/2013WR013845.
I have found a code that I am able to modify for my purposes. The original code is listed at the following link: http://www.kernel-methods.net/matlab/kernels/rbf.m.
Code
function K = rbf(coord,sig)
%function K = rbf(coord,sig)
%
% Computes an rbf kernel matrix from the input coordinates
%
%INPUTS
% coord = a matrix containing all samples as rows
% sig = sigma, the kernel width; squared distances are divided by
% squared sig in the exponent
%
%OUTPUTS
% K = the rbf kernel matrix ( = exp(-1/(2*sigma^2)*(coord*coord')^2) )
%
%
% For more info, see www.kernel-methods.net
%
%Author: Tijl De Bie, february 2003. Adapted: october 2004 (for speedup).
n=size(coord,1);
K=coord*coord'/sig^2;
d=diag(K);
K=K-ones(n,1)*d'/2;
K=K-d*ones(1,n)/2;
K=exp(K);
Modified Code incorporating sample covariance scaling:
xcov = cov(x.'); % sample covariance of the data
invxc = pinv(xcov); % inversion of data sample covariance
coord = x.';
sig = sigma; % kernel bandwidth
n = size(coord,1);
K = coord*invxc*coord'/sig^2;
d = diag(K);
K = K-ones(n,1)*d'/2;
K = K-d*ones(1,n)/2;
K = exp(K); % kernel matrix
I hope this helps someone else looking into the same problem.
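As a sanity check on the modified code, the sketch below compares the vectorised kernel matrix against a brute-force double loop that evaluates exp(-(xi-xj)*inv(S)*(xi-xj)'/(2*sig^2)) pair by pair. The random test data, the bandwidth value, and the names Kref, dx and maxErr are assumptions made only for this check; note that neither version includes the normalising constant of the density estimate.

```matlab
rng(0);
x = randn(3,50);             % hypothetical data: 3 variables (rows), 50 samples (columns)
sig = 0.8;                   % hypothetical kernel bandwidth

% Vectorised kernel with covariance scaling (same steps as the modified code)
invxc = pinv(cov(x.'));      % inverse of the sample covariance
coord = x.';                 % samples as rows
n = size(coord,1);
K = coord*invxc*coord.'/sig^2;
d = diag(K);
K = exp(K - ones(n,1)*d.'/2 - d*ones(1,n)/2);

% Brute-force reference: one pair of samples at a time
Kref = zeros(n);
for i = 1:n
    for j = 1:n
        dx = coord(i,:) - coord(j,:);
        Kref(i,j) = exp(-(dx*invxc*dx.')/(2*sig^2));
    end
end

maxErr = max(abs(K(:) - Kref(:)));   % should be at round-off level
disp(maxErr)
```

The two agree because the quadratic form expands to xi*inv(S)*xi' + xj*inv(S)*xj' - 2*xi*inv(S)*xj', which is exactly what subtracting the halved diagonal terms reproduces.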

Fitting a 2D Gaussian to 2D Data Matlab

I have a vector of x and y coordinates drawn from two separate unknown Gaussian distributions. I would like to fit these points to a three dimensional Gauss function and evaluate this function at any x and y.
So far the only manner I've found of doing this is using a Gaussian Mixture model with a maximum of 1 component (see code below) and going into the handle of ezcontour to take the X, Y, and Z data out.
The problems with this method is firstly that its a very ugly roundabout manner of getting this done and secondly the ezcontour command only gives me a grid of 60x60 but I need a much higher resolution.
Does anyone know a more elegant and useful method that will allow me to find the underlying Gauss function and extract its value at any x and y?
Code:
GaussDistribution = fitgmdist([varX varY],1); %Not exactly the intention of fitgmdist, but it gets the job done.
h = ezcontour(@(x,y)pdf(GaussDistribution,[x y]),[-500 -400], [-40 40]);
The general form of the Gaussian distribution is as follows.
I am not allowed to upload a picture, but the formula of the Gaussian is:
1/((2*pi)^(D/2)*sqrt(det(Sigma)))*exp(-1/2*(x-Mu)*Sigma^-1*(x-Mu)');
where D is the data dimension (2 in your case);
Sigma is the covariance matrix;
and Mu is the row vector holding the mean of each data column.
Here is an example in which a Gaussian is fitted to two vectors of randomly generated samples from normal distributions with parameters N1(4,7) and N2(-2,4), given as (mean, standard deviation):
Data = [random('norm',4,7,30,1),random('norm',-2,4,30,1)];
X = -25:.2:25;
Y = -25:.2:25;
D = length(Data(1,:)); % data dimension
Mu = mean(Data); % fitted mean (row vector)
Sigma = cov(Data); % fitted covariance matrix
P_Gaussian = zeros(length(X),length(Y));
for i=1:length(X)
    for j=1:length(Y)
        x = [X(i),Y(j)];
        P_Gaussian(i,j) = 1/((2*pi)^(D/2)*sqrt(det(Sigma)))...
            *exp(-1/2*(x-Mu)*Sigma^-1*(x-Mu)');
    end
end
% Rows of P_Gaussian follow X, so transpose to match mesh's (X,Y) convention
mesh(X,Y,P_Gaussian.')
Run the code in Matlab. For the sake of clarity I wrote the code with explicit loops; it could be written more efficiently from a programming point of view.
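One possible loop-free variant is sketched below. It wraps the fitted density in a function handle so it can be evaluated at any (x,y) and at any grid resolution. The handle name gaussPdf is made up for this sketch, randn is used in place of random('norm',...) only to avoid the toolbox dependency, and pts - Mu relies on implicit expansion (R2016b or later).

```matlab
rng(1);
Data = [4 + 7*randn(30,1), -2 + 4*randn(30,1)];  % samples from N(4,7) and N(-2,4)
D = size(Data,2);
Mu = mean(Data);
Sigma = cov(Data);

% Density of the fitted Gaussian at each row of pts (an N-by-2 list of points)
gaussPdf = @(pts) exp(-0.5*sum(((pts - Mu)/Sigma).*(pts - Mu),2)) ...
                  / ((2*pi)^(D/2)*sqrt(det(Sigma)));

% Evaluate on a grid of any resolution
[Xg,Yg] = meshgrid(-25:.2:25, -25:.2:25);
Z = reshape(gaussPdf([Xg(:) Yg(:)]), size(Xg));
mesh(Xg,Yg,Z)

% Evaluate at a single arbitrary point
pAtOrigin = gaussPdf([0 0]);
```

Here (pts - Mu)/Sigma plays the role of (x-Mu)*Sigma^-1 for all points at once, and the row-wise sum completes the quadratic form.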

Defining an SDE in Matlab in which the Components Are Functions of Other SDEs

I am trying to create an SDE model in Matlab with the sde function in the Econometrics toolbox. From looking at the examples on the website, the basic case seems simple enough in that an equation like
dX(t) = 0.1 X(t) dt + 0.3 X(t) dW(t)
can be defined by first creating anonymous functions and then using those in the SDE equation as seen below (where the variables used in the functions are previously defined):
F = @(t,X) 0.1 * X;
G = @(t,X) 0.3 * X;
obj = sde(F, G) % dX = F(t,X)dt + G(t,X)dW
I was hoping to do something just a bit more complicated in which the drift term of the SDE I would like to model is a function of another SDE. Specifically, the equation is
dY(t) / Y(t) = G(t) dt + sigma dW(t)
Where G(t) is another SDE I've already defined. Would someone be able to give me a sense of what the equation for the drift term (corresponding to F in the code above) would be in this case?
I don't have the Econometrics toolbox so I can't give you detailed code for it (most Matlab installs don't have this toolbox by default). However, your case is pretty common, so I imagine that it shouldn't be too hard to do what you need. You might consider creating a service request with The MathWorks or posting a question at Quant.StackExchange. Make sure you're clear about the SDEs that you're interested in. Sorry that I can't help you more in that area.
Another way to simulate this coupled set of SDEs is to use my own SDETools toolbox, which is available for free on GitHub. Its functions are quite straightforward if you have any experience with Matlab's ODE Suite functions such as ode45. SDETools also has dedicated functions for common stochastic processes (e.g., geometric Brownian motion and Ornstein-Uhlenbeck) using their analytic solutions. Here is basic code with arbitrary parameter values to simulate your SDEs using the Euler-Maruyama integrator function, sde_euler, in SDETools:
t0 = 0;
dt = 1e-2;
tf = 1e1;
tspan = t0:dt:tf; % Time vector
y0 = [1;1]; % Initial conditions
kappa = 10;
mu = 0;
tau = 1e-1;
sigma = 1e-2;
f = @(t,y)[kappa.*(mu-y(1));y(1).*y(2)]; % Drift
g = @(t,y)[tau;sigma.*y(2)]; % Diffusion
% Set random seed and type of stochastic integration
options = sdeset('RandSeed',1,'SDEType','Ito');
Y = sde_euler(f,g,tspan,y0,options); % Euler-Maruyama
figure;
plot(tspan,Y)
In fact, it's possible that the sde function in the Econometrics toolbox could itself take the two function handles, f and g, I created above.
Here's another way you can simulate the system (it may or may not be faster), this time using the sde_ou function to first generate an OU process, which itself is independent of the other equation:
...
options = sdeset('RandSeed',1);
Y(:,1) = sde_ou(kappa,mu,tau,tspan,y0(1),options); % OU process
f = @(t,y)Y(round(t/dt)+1,1).*y; % Index into OU process as a function of t
g = @(t,y)sigma.*y;
options = sdeset('RandSeed',2,'SDEType','Ito');
Y(:,2) = sde_euler(f,g,tspan,y0(2),options); % Euler-Maruyama of just second equation
figure;
plot(tspan,Y)
Note that even though the same random seed is used for the OU process, the output of Y(:,1) will be slightly different for this and the sde_euler case because of the order in which the Wiener increments are generated internally. Again, you may well be able to do something similar using the function in the Econometrics toolbox.
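If neither toolbox is at hand, the same coupled system can be stepped by hand. The following is only a sketch of a basic Euler-Maruyama loop in plain Matlab, reusing the arbitrary parameter values from above and assuming that each equation is driven by its own independent Wiener process; it is not the SDETools or Econometrics toolbox implementation, and its random numbers will not match theirs.

```matlab
rng(1);                       % reproducible random increments
t0 = 0; dt = 1e-2; tf = 1e1;
tspan = t0:dt:tf;             % time vector
N = numel(tspan);

kappa = 10; mu = 0; tau = 1e-1; sigma = 1e-2;

Y = zeros(N,2);
Y(1,:) = [1 1];               % initial conditions: [OU process, Y(t)]
dW = sqrt(dt)*randn(N-1,2);   % independent Wiener increments, one column per equation

for k = 1:N-1
    y = Y(k,:);
    driftTerm = [kappa*(mu - y(1)), y(1)*y(2)];   % same drift as f above
    diffTerm  = [tau, sigma*y(2)];                % same diffusion as g above
    Y(k+1,:) = y + driftTerm*dt + diffTerm.*dW(k,:);   % Euler-Maruyama step
end

figure;
plot(tspan,Y)
```

The first column is the Ornstein-Uhlenbeck process that acts as the drift rate of the second column, which is the structure dY/Y = G dt + sigma dW asked about in the question.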