I would like to implement logistic regression in MATLAB. I have the following code so far:
function B=logistic_regression(x,y)
f=@(a)(sum(y.*log((exp(a(1)+a(2)*x)/(1+exp(a(1)+a(2)*x))))+(1-y).*log((1-((exp(a(1)+a(2)*x)/(1+exp(a(1)+a(2)*x))))))));
a=[0.1, 0.1];
options = optimset('PlotFcns',@optimplotfval);
B = fminsearch(f,a, options);
end
Logistic regression works as follows:
first we calculate the logit, which is equal to
L = b0 + b1*x
then we calculate the probability, which is equal to
p = e^L / (1 + e^L)
and finally we calculate the log-likelihood term
y*ln(p) + (1-y)*ln(1-p)
I decided to write all of this in one line, but when I run the code, it gives me the following error:
>> B=logistic_regression(x,y)
Assignment has more non-singleton rhs dimensions than non-singleton subscripts
Error in fminsearch (line 200)
fv(:,1) = funfcn(x,varargin{:});
Error in logistic_regression (line 6)
B = fminsearch(f,a, options);
How can I fix this problem? Thanks in advance.
In order to implement a logistic regression model, I usually call the glmfit function, which is the simplest way to go. The syntax is:
b = glmfit(x,y,'binomial','link','logit');
b is a vector that contains the coefficients for the linear portion of the logistic regression (the first element is the constant term alpha of the regression). x contains the predictors data, with one row for each observation and one column for each variable. y contains the target variable, usually a vector of boolean (0 or 1) values representing the outcome.
Once you obtain the coefficients, you have to apply the linear part of the regression to your predictors:
z = b(1) + (x * b(2));
To finish, you must apply the logistic function to the output of the linear part:
z = 1 ./ (1 + exp(-z));
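For instance, a small self-contained sketch (the data here is made up purely for illustration):
x = [0.5; 1.2; 2.3; 2.9; 3.6; 4.1; 4.8; 5.5];   % predictor: one column per variable
y = [0; 0; 0; 1; 0; 1; 1; 1];                   % binary outcome
b = glmfit(x, y, 'binomial', 'link', 'logit');  % b(1) is the intercept, b(2) the slope
z = b(1) + (x * b(2));                          % linear part
p = 1 ./ (1 + exp(-z));                         % predicted probabilities
glmval(b, x, 'logit'), from the same toolbox as glmfit, computes the same probabilities in one call.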
If you need more tinkering on your data or on your output, and you require more flexibility and control over your model, I suggest you look at this implementation:
https://github.com/mohammadaltaleb/Logistic-Regression
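Alternatively, if you want to keep the fminsearch approach from your question, here is a minimal sketch of how it could be made to run (my own assumption, not part of the glmfit answer above): fminsearch minimizes, so the log-likelihood must be negated, and the divisions must be element-wise (./) so the objective returns a scalar:
function B = logistic_regression_mle(x, y)
% x and y are assumed to be vectors of the same size
p   = @(a) exp(a(1) + a(2)*x) ./ (1 + exp(a(1) + a(2)*x));   % predicted probabilities
nll = @(a) -sum(y .* log(p(a)) + (1 - y) .* log(1 - p(a)));  % negative log-likelihood (a scalar)
a0  = [0.1, 0.1];                                            % initial guess
B   = fminsearch(nll, a0);
end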
I've never used Matlab before and I really don't know how to fix the code. I need to plot log(1000 over k) with k going from 1 to 1000.
y = @(x) log(nchoosek(1000,x));
fplot(y,[1 1000]);
Error:
Warning: Function behaves unexpectedly on array inputs. To improve performance, properly
vectorize your function to return an output with the same size and shape as the input
arguments.
In matlab.graphics.function.FunctionLine>getFunction
In matlab.graphics.function.FunctionLine/updateFunction
In matlab.graphics.function.FunctionLine/set.Function_I
In matlab.graphics.function.FunctionLine/set.Function
In matlab.graphics.function.FunctionLine
In fplot>singleFplot (line 241)
In fplot>@(f)singleFplot(cax,{f},limits,extraOpts,args) (line 196)
In fplot>vectorizeFplot (line 196)
In fplot (line 166)
In P1 (line 5)
There are several problems with the code:
nchoosek does not vectorize on the second input, that is, it does not accept an array as input. fplot works faster for vectorized functions. Otherwise it can be used, but it issues a warning.
The result of nchoosek is close to overflowing for such large values of the first input. For example, nchoosek(1000,500) gives 2.702882409454366e+299, and issues a warning.
nchoosek expects integer inputs. fplot uses in general non-integer values within the specified limits, and so nchoosek issues an error.
You can solve these three issues by exploiting the relationship between the factorial and the gamma function and the fact that MATLAB has gammaln, which directly computes the logarithm of the gamma function: since n! = gamma(n+1), we have log(nchoosek(n,k)) = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1):
n = 1000;
y = @(x) gammaln(n+1)-gammaln(x+1)-gammaln(n-x+1);
fplot(y,[1 1000]);
Note that you get a plot with y values for all x in the specified range, but actually the binomial coefficient is only defined for non-negative integers.
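As a quick sanity check (a small illustration, not part of the original answer), for small k the gammaln expression matches log(nchoosek(n,k)) to machine precision:
n = 1000;
k = 5;
log(nchoosek(n, k))                              % direct computation
gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1)     % same value via gammaln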
OK, since you've gotten spoilers for your homework exercise anyway now, I'll post an answer that I think is easier to understand.
The multiplicative formula for the binomial coefficient says that
(n over k) = product for i = 1 to k of ( (n+1-i)/i )
(sorry, no way to write proper formulas on SO, see the Wikipedia link if that was not clear).
To compute the logarithm of a product, we can compute the sum of the logarithms:
log(prod(x_i)) = sum(log(x_i))
Thus, we can compute the values of (n+1-i)/i for all i, take the logarithm, and then sum up the first k values to get the result for a given k.
This code accomplishes that using cumsum, the cumulative sum. Its output at array element k is the sum over all input array elements from 1 to k.
n = 1000;
i = 1:1000;
f = (n+1-i)./i;
f = cumsum(log(f));
plot(i,f)
Note also ./, the element-wise division. / performs a matrix division in MATLAB, and is not what you need here.
The symbolic (syms) approach reproduces exactly what you want:
syms x
y = log(nchoosek(1000,x));
fplot(y,[1 1000]);
This solution uses arrayfun to deal with the fact that nchoosek(n,k) requires k to be a scalar. This approach requires no toolboxes.
Also, this uses plot instead of fplot, since this clever answer already addresses how to do it with fplot.
% MATLAB R2017a
n = 1000;
fh=@(k) log(nchoosek(n,k));
K = 1:1000;
V = arrayfun(fh,K); % calls fh on each element of K and return all results in vector V
plot(K,V)
Note that for some values of k greater than or equal to 500, you will receive the warning
Warning: Result may not be exact. Coefficient is greater than 9.007199e+15 and is only accurate to 15 digits
because nchoosek(1000,500) = 2.7029e+299. As pointed out by @Luis Mendo, this is due to realmax = 1.7977e+308, which is the largest real floating-point number that can be represented. See here for more info.
This is my first attempt to write the covariance function. I have the following values,
x = [-1.50 -1.0 -.75 -.40 -.25 0.00];
sf = 1.27;
ell = 1;
sn = 0.3;
The formula for the squared exponential covariance function (with a noise term) is
k(x,x') = sf^2 * exp( -(x - x')^2 / (2*ell^2) ) + sn^2 * delta(x,x')
The MATLAB code I have written for that is:
K = sf^2*exp(-0.5*(squareform(pdist(x)).^2)/ell^2)+(sn)^2*eye(Ntr,Ntr);
where sf is the signal standard deviation, ell the characteristic length scale, sn the noise standard deviation and Ntr the length of the training input data x.
But this gives me no result. Is there any mistake in my code?
And once I calculate it, I want to summarize it in matrix form.
Any help please?
If x_ = 0.2, how can we calculate the following using MATLAB?
a) K_ = [k(x_,x1) k(x_,x2) ... k(x_,xn)] and
b) K__ = k(x_,x_)
Running your code, I obtain the following error message:
Error using +
Matrix dimensions must agree.
Error in untitled3 (line 7)
K = sf^2*exp(-0.5*(squareform(pdist(x)).^2)/ell^2)+(sn)^2*eye(Ntr,Ntr);
indicating that your matrix dimensions do not agree. If you evaluate the parts of your code separately, you will notice that pdist(x) returns a 1x0 vector. In the documentation of pdist the expected format of x is explained:
Rows of X correspond to observations, and columns correspond to
variables
So, instead of calculating pdist(x), you should calculate it for the transpose of x, i.e. pdist(x.'):
K = sf^2*exp(-0.5*(squareform(pdist(x.')).^2)/ell^2)+(sn)^2*eye(Ntr,Ntr);
As a conclusion, always read the error messages and documentation carefully, especially the expected format of your input arguments.
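Putting this together with the values from your question (a sketch; Ntr is taken to be the number of training points):
x   = [-1.50 -1.0 -.75 -.40 -.25 0.00];
sf  = 1.27;  ell = 1;  sn = 0.3;
Ntr = numel(x);                                  % number of training points
K   = sf^2*exp(-0.5*(squareform(pdist(x.')).^2)/ell^2) + sn^2*eye(Ntr);   % 6x6 covariance matrix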
Subquestions
To calculate K for a specific value of x_ (x' in your mentioned formula), you can convert your given formula almost literally to MATLAB:
K_ = sf^2*exp(-0.5*(x-x_).^2/ell^2)+(sn)^2*(x == x_);
To calculate K__, you can use the formula above with x = x_, or you can simplify it to:
K__ = sf^2+sn^2;
OK let me cut to the chase.
I am trying to use MATLAB to
(i) generate the Fourier series based on known coefficients, and thereafter
(ii) determine the output function when the impulse response is known.
So far I used this code to obtain the Fourier series:
clear all
syms x k L n
evalin(symengine,'assume(k,Type::Integer)');
a = @(f,x,k,L) (2/(pi*k))* sin((pi*k)/(2 * L));
fs = @(f,x,n,L) (1/2*L) + symsum(a(f,x,k,L)*cos(k*2*pi*x/L),k,1,n);
f = x;
pretty(fs(f,x,11,1))
This works as desired. Now the impulse response is as follows:
h = heaviside(x) * exp(-5*x);
Now, in order to obtain the output function, we need to perform the convolution of the respective functions. But when I input the following, I get the error:
x1 = fs(f,x,1,1);
conv(h,x1)
Undefined function 'conv2' for input arguments of type 'sym'.
Error in conv (line 38)
c = conv2(a(:),b(:),shape);
Any help would be appreciated
That is because conv is only defined for numeric inputs. If you want to find the convolution symbolically, you'll have to input the equation yourself symbolically using integration.
If you recall, the convolution integral is defined as
(f * g)(x) = integral from -inf to +inf of f(tau) * g(x - tau) dtau
(Source: Wikipedia)
Therefore, you would do this:
syms x tau;
F = int(subs(h, x, tau) * subs(x1, x, x - tau), tau, -inf, inf);
int is a function in MATLAB that does symbolic integration for you. Also note that the convolution integral is commutative, and so this also works:
(f * g)(x) = integral from -inf to +inf of f(x - tau) * g(tau) dtau
(Source: Wikipedia)
Therefore, you should also get the same answer if you did:
syms x tau;
F = int(subs(h, x, x - tau) * subs(x1, x, tau), tau, -inf, inf);
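For reference, a self-contained sketch putting the pieces together (my own assembly of the code from the question and this answer; it assumes the Symbolic Math Toolbox):
syms x k L n tau
a  = @(f,x,k,L) (2/(pi*k)) * sin((pi*k)/(2*L));
fs = @(f,x,n,L) (1/2*L) + symsum(a(f,x,k,L)*cos(k*2*pi*x/L), k, 1, n);
f  = x;
x1 = fs(f, x, 1, 1);                 % first partial sum of the Fourier series
h  = heaviside(x) * exp(-5*x);       % impulse response
F  = int(subs(h, x, tau) * subs(x1, x, x - tau), tau, -inf, inf);   % convolution integral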
Hope this helps!
I am trying to solve a second order differential equation using ode45 in MATLAB with matrices as inputs. I am stuck with a couple of errors, including:
"In an assignment A(I) = B, the number of elements in B and
I must be the same."
The second order differential equations are given below:
dy(1)= diag(ones(1,100) - 0.5*y(2))*Co;
dy(2)= -1 * Laplacian(y(1)) * y(2);
Main function call is:
[T,Y] = ode45(@rigid,[0.000 100.000],[Co Xo]);
Here, Co is Matrix of size 100x100 and Xo is a column matrix of size 100x1. Laplacian is a pre-defined function to compute matrix laplacian.
I would appreciate any help with this. Should I reshape the input matrices and vectors so their dimensions match, or something like that?
Your guess is correct. The MATLAB ode suite can solve only vector-valued ODEs, i.e. an ODE of the form y' = f(t,y). In your case you should convert y and dy back and forth between a matrix and an array by using reshape.
To be more precise, the initial condition will be transformed into the array
y0 = reshape([Co Xo], 100*101, 1);
while y will be obtained with
y_matrix = reshape(y, 100, 101);
y1 = y_matrix(:,1:100);
y2 = y_matrix(:,101);
After having computed the matrices dy1 and dy2 you will have to convert them into an array with
dy = reshape([dy1 dy2], 100*101, 1);
Aside from the limitations of ode45 your code gives that error because, in MATLAB, matrices are not indexed in that way. In fact, if you define A = magic(5), A(11) gives the eleventh element of A i.e. 1.
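For illustration, a minimal sketch of how the right-hand-side function could look after reshaping (the dy1/dy2 formulas below merely mirror the question's equations and are my assumptions, with y1 playing the role of Co; Laplacian is the user's pre-defined function):
% Called, e.g., as: [T,Y] = ode45(@rigid, [0 100], reshape([Co Xo], 100*101, 1));
function dy = rigid(t, y)
y_matrix = reshape(y, 100, 101);         % unpack the state vector into a 100x101 matrix
y1 = y_matrix(:, 1:100);                 % matrix part (initially Co)
y2 = y_matrix(:, 101);                   % column part (initially Xo)
dy1 = diag(1 - 0.5*y2) * y1;             % assumed form of the first equation
dy2 = -Laplacian(y1) * y2;               % assumed form of the second equation
dy  = reshape([dy1 dy2], 100*101, 1);    % pack the derivatives back into a column vector
end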
I have asked a few questions about neural networks on this website in the past and have gotten great answers, but I am still struggling to implement one for myself. This is quite a long question, but I am hoping that it will serve as a guide for other people creating their own basic neural networks in MATLAB, so it should be worth it.
What I have done so far could be completely wrong. I am following the online Stanford machine learning course by Professor Andrew Y. Ng and have tried to implement what he has taught to the best of my ability.
Can you please tell me if the feed forward and cost function parts of my code are correct, and where I am going wrong in the minimization (optimization) part?
I have a 2 layer feed forward neural network.
The MATLAB code for the feedforward part is:
function [ Y ] = feedforward2( X,W1,W2)
%This takes a row vector of inputs into the neural net with weight matrices W1 and W2 and returns a row vector of the outputs from the neural net
%Remember X, Y, and A can be vectors, and W1 and W2 Matrices
X=transpose(X); %X needs to be a column vector
A = sigmf(W1*X,[1 0]); %Values of the first hidden layer
Y = sigmf(W2*A,[1 0]); %Output Values of the network
Y = transpose(Y); %Y needs to be a row vector
So for example a two layer neural net with two inputs and two outputs would look a bit like this:
a1
x1 o--o--o y1 (all weights equal 1)
\/ \/
/\ /\
x2 o--o--o y2
a2
if we put in:
X=[2,3];
W1=ones(2,2);
W2=ones(2,2);
Y = feedforward2(X,W1,W2)
we get the output:
Y = [0.5,0.5]
This represents the y1 and y2 values shown in the drawing of the neural net
The MATLAB code for the squared error cost function is:
function [ C ] = cost( W1,W2,Xtrain,Ytrain )
%This gives a value seeing how close W1 and W2 are to giving a network that represents the Xtrain and Ytrain data
%It uses the squared error cost function
%The closer the cost is to zero, the better these particular weights are at giving a network that represents the training data
%If the cost is zero, the weights give a network that when the Xtrain data is put in, The Ytrain data comes out
M = size(Xtrain,1); %Number of training examples
oldsum = 0;
for i = 1:M,
H = feedforward2(Xtrain,W1,W2);
temp = ( H(i) - Ytrain(i) )^2;
Sum = temp + oldsum;
oldsum = Sum;
end
C = (1/2*M) * Sum;
end
Example
So for example if the training data is:
Xtrain = [0,0; 1,2; 4,1; 5,2; 3,4; 5,3; 1,5; 6,2; 2,1; 5,5];
Ytrain = [0/57; 3/57; 5/57; 7/57; 7/57; 8/57; 6/57; 8/57; 3/57; 10/57];
%This will be for a two input, one output network:
     a1
x1 o--o y1
    \/ \_o
    /\ /
x2 o--o
     a2
We start with initial random weights
W1 = [2,3; 4,1];
W2 = [3,2];
If we put in:
Y= feedforward2([6,2],W1,W2)
We get
Y = 0.9933
Which is far from what the training data says it should be (8/57 = 0.1404). So the initial random weights W1 and W2 were a bad guess.
To measure exactly how good or bad a guess the random weights are, we use the cost function:
C= cost(W1,W2,Xtrain,Ytrain)
This gives the value:
C = 6.6031e+003
Minimizing the cost function
If we minimize the cost function by searching over the possible values of W1 and W2 and picking the ones that give the lowest cost, we get the network that best approximates the training data.
But when I use the code:
[W1,W2]=fminsearch(cost(W1,W2,Xtrain,Ytrain),[W1,W2])
It gives an error message. It says: "Error using horzcat. CAT arguments dimensions are not consistent." Why am I getting this error and what can I do to fix it?
Thank you!!!
Your neural network seems alright, although the kind of training you're trying to do is quite inefficient when training against labeled data, as you are. In that case I would suggest looking into back-propagation.
About your error when training: Your error message hints at the problem: dimensions are not consistent
As argument x0 to fminsearch, which is the initial guess for the optimizer, you send [W1, W2], but from what I can see these matrices don't have the same number of rows, and therefore you can't concatenate them horizontally like that. I would suggest modifying your cost-function to take a vector as argument and then form your weight matrices for the different layers from that one vector.
You are also not supplying the cost-function correctly to fminsearch as you are just evaluating cost with w1, w2, Xtrain and Ytrain in-place.
According to the documentation (it's been years since I used Matlab), you pass a handle to the cost-function as
fminsearch(@cost, [W1; W2])
EDIT: You could express your weights and modify your code as follows:
global Xtrain
global Ytrain
W = [W1; W2]
fminsearch(@cost, W)
The cost-function must be modified so that it doesn't take Xtrain and Ytrain as inputs, because fminsearch would then try to optimize those too. Modify your cost-function like this:
function [ C ] = cost( W )
W1 = W(1:2,:);
W2 = W(3,:);
global Xtrain
global Ytrain
...
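For what it's worth, a minimal sketch of the same idea that avoids globals by passing the training data through an anonymous function (my own variation, assuming feedforward2 is the function from the question and that the weights stack into a 3x2 matrix):
function C = cost(W, Xtrain, Ytrain)
W1 = W(1:2, :);                          % rows 1-2: hidden-layer weights (2x2)
W2 = W(3, :);                            % row 3: output-layer weights (1x2)
M  = size(Xtrain, 1);                    % number of training examples
H  = feedforward2(Xtrain, W1, W2);       % network outputs for all training examples
C  = (1/(2*M)) * sum((H - Ytrain).^2);   % mean squared error
end
It could then be called as, for example:
W0   = [W1; W2];                                       % 3x2 initial guess
Wopt = fminsearch(@(W) cost(W, Xtrain, Ytrain), W0);   % fminsearch accepts a matrix x0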