What's wrong with my Logistic Regression parameters in MATLAB glmnet?

I am using glmnet in MATLAB 2019a on my Macbook to do logistic regression.
Algorithm:
log(pi/(1-pi))=b0+X*b
pi=P(Y=2|X_i)=1-P(Y=1|X_i)
Code:
Y = [2;1;2;1;2;1;2;1;1;2];
X = [0.1451 0.1176 0.0872 0.0544 0.0197 -0.0164 -0.0533 -0.0907;
0.5096 0.7240 0.9038 1.0515 1.1694 1.2599 1.3253 1.3681;
-0.0593 -0.1683 -0.2738 -0.3754 -0.4730 -0.5660 -0.6543 -0.7376;
-1.0128 -0.9539 -0.9004 -0.8522 -0.8089 -0.7701 -0.7355 -0.7047;
0.7533 0.5640 0.4054 0.2752 0.1709 0.0900 0.0302 -0.0109;
0.2014 0.2595 0.3070 0.3444 0.3724 0.3918 0.4032 0.4074;
0.9174 0.8706 0.8260 0.7834 0.7423 0.7025 0.6636 0.6253;
0.7643 0.6115 0.4789 0.3653 0.2693 0.1897 0.1252 0.0744;
-0.3299 -0.5078 -0.6507 -0.7615 -0.8430 -0.8981 -0.9294 -0.9399;
-0.2141 -0.1472 -0.0818 -0.0179 0.0443 0.1045 0.1626 0.2183];
lambda=0.5;
family='binomial';
options.weights = [];
options.alpha = 1;
options.nlambda = 100;
options.lambda_min = 0;
options.lambda = lambda;
options.standardize = false;%true
options.thresh = 1E-4;
options.dfmax = 0;
options.pmax = 0;
options.exclude = [];
options.penalty_factor = [];
options.maxit = 100;
options.HessianExact = false;
options.type = 'naive';
fit = glmnet(X,Y,family,options);
Outcome:
a0 0
label [1;2]
beta [0;0;0;0;0;0;0;0]
dev 0
nulldev 13.8629
df 0
lambda 0.5000
npasses 1
jerr 0
dim [8,1]
class 'lognet'
No matter how I change the inputs and outputs, the function always returns coefficients that are all 0. I picked these options carefully, so I am confused about how it arrives at this outcome.
Is it because the data I picked are special, or because the parameters are wrong?

I have just found where the problem is. The data set here is small, so lambda=0.5 imposes an overly strong penalty on it.
Changing lambda to less than 0.01 solves the problem. In addition, options.pmax = 0 should be deleted; otherwise it forces all coefficients to be 0.
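
A rough way to see why lambda=0.5 is too large is to estimate the smallest penalty at which every lasso coefficient is zero. The sketch below uses plain MATLAB only (no glmnet call) and assumes the usual rule for a lasso logistic fit with an intercept and unstandardized predictors, lambda_max = max|X'*(y - mean(y))| / (n*alpha). The names y01 and lambdaMax are introduced here for illustration and are not part of glmnet.

```matlab
% Data from the question: 10 observations, 8 predictors, labels 1/2.
Y = [2;1;2;1;2;1;2;1;1;2];
X = [0.1451 0.1176 0.0872 0.0544 0.0197 -0.0164 -0.0533 -0.0907;
     0.5096 0.7240 0.9038 1.0515 1.1694 1.2599 1.3253 1.3681;
    -0.0593 -0.1683 -0.2738 -0.3754 -0.4730 -0.5660 -0.6543 -0.7376;
    -1.0128 -0.9539 -0.9004 -0.8522 -0.8089 -0.7701 -0.7355 -0.7047;
     0.7533 0.5640 0.4054 0.2752 0.1709 0.0900 0.0302 -0.0109;
     0.2014 0.2595 0.3070 0.3444 0.3724 0.3918 0.4032 0.4074;
     0.9174 0.8706 0.8260 0.7834 0.7423 0.7025 0.6636 0.6253;
     0.7643 0.6115 0.4789 0.3653 0.2693 0.1897 0.1252 0.0744;
    -0.3299 -0.5078 -0.6507 -0.7615 -0.8430 -0.8981 -0.9294 -0.9399;
    -0.2141 -0.1472 -0.0818 -0.0179 0.0443 0.1045 0.1626 0.2183];

y01   = double(Y == 2);   % recode the labels to 0/1 (1 means Y = 2)
n     = numel(y01);       % number of observations
alpha = 1;                % pure lasso, as in options.alpha

% Assumed rule: penalties at or above this value zero out every coefficient.
lambdaMax = max(abs(X' * (y01 - mean(y01)))) / (n * alpha);

fprintf('Estimated lambda_max: %.4f (the question used lambda = 0.5)\n', lambdaMax);
```

If the estimate comes out well below 0.5, as it should for data of this scale, an all-zero beta is the expected result rather than a bug.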

Related

Solver stops before obtaining a solution in the MATLAB optimization toolbox

I use fmincon function of MATLAB optimization toolbox for a minimization problem. The basic part of the code is as follows:
clc;
clear;
load('coord_points.mat');
load('coord_centroids.mat');
numberOfPoints=200;
numberOfCentroids=40;
Q=5;
tau=0.1;
d = zeros(numberOfPoints,numberOfCentroids);
for j=1:numberOfCentroids
    centroid = coord_centroids(j,:);
    for i=1:numberOfPoints
        point = coord_points(i,:);
        d(i,j)=norm(centroid-point);
    end
end
sectorization = optimproblem;
x = optimvar('x',numberOfPoints,numberOfCentroids,'LowerBound',0,'UpperBound',1);
y = optimvar('y',1,'LowerBound',0,'UpperBound',1);
qjExpression=optimexpr(numberOfCentroids);
for j=1:numberOfCentroids
    qjExpression(j) = sum(x(:,j));
end
qExpression = mean(qjExpression);
objExpression = y;
%%%
sectorization.Objective = objExpression;
cons1 = optimconstr(numberOfPoints,1);
for i=1:numberOfPoints
    cons1(i) = sum(x(i,:)) == 1;
end
cons2 = optimconstr(numberOfCentroids,1);
for j=1:numberOfCentroids
    cons2(j) = sum(x(:,j)) >= 1;
end
cons3 = optimconstr(numberOfCentroids,1);
for j=1:numberOfCentroids
    cons3(j) = sum(x(:,j)) <= Q*(1+tau);
end
cons4 = optimconstr(1);
varianceOfQj = optimexpr(1);
for j=1:numberOfCentroids
    varianceOfQj = varianceOfQj + (qjExpression(j)-qExpression)^2;
end
varianceOfQj=varianceOfQj/numberOfCentroids;
cons4 = varianceOfQj - y <= 0;
sectorization.Constraints.cons1=cons1;
sectorization.Constraints.cons2=cons2;
sectorization.Constraints.cons3=cons3;
sectorization.Constraints.cons4=cons4;
%load('sol.mat');
sol.x=zeros(numberOfPoints,numberOfCentroids);
sol.y=1;
problem = prob2struct(sectorization,sol);
options = optimoptions(@fmincon,'MaxFunctionEvaluations',5000,'Algorithm','active-set','Display','iter')
[x,fval] = fmincon(problem);
But I get the following result:
Solver stopped prematurely.
fmincon stopped because it exceeded the function evaluation limit, options.MaxFunctionEvaluations = 3.000000e+03.
However, as the code shows, the options section sets the maximum number of function evaluations to 5000.
In fact, when the program starts running, the following output appears first:
options =
fmincon options:
Options used by current Algorithm ('active-set'):
(Other available algorithms: 'interior-point', 'sqp', 'sqp-legacy', 'trust-region-reflective')
Set properties:
Algorithm: 'active-set'
Display: 'iter'
MaxFunctionEvaluations: 5000
Default properties:
CheckGradients: 0
ConstraintTolerance: 1.0000e-06
FiniteDifferenceStepSize: 'sqrt(eps)'
FiniteDifferenceType: 'forward'
FunctionTolerance: 1.0000e-06
MaxIterations: 400
OptimalityTolerance: 1.0000e-06
OutputFcn: []
PlotFcn: []
SpecifyConstraintGradient: 0
SpecifyObjectiveGradient: 0
StepTolerance: 1.0000e-06
TypicalX: 'ones(numberOfVariables,1)'
UseParallel: 0
Show options not used by current Algorithm ('active-set')
What can I do to make the Solver work properly? Why does it stop even though I have changed the setting for the maximum evaluation?
Please give your feedback.
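
One likely explanation, offered as a sketch rather than a confirmed diagnosis: the options object is created but never attached to the problem structure, so fmincon(problem) runs with whatever the structure already holds in problem.options. The toy objective below is hypothetical and only stands in for the question's model; it requires the Optimization Toolbox.

```matlab
% Toy problem standing in for the question's model (hypothetical objective).
problem.objective = @(x) (x(1) - 1)^2 + (x(2) - 2)^2;
problem.x0        = [0; 0];
problem.solver    = 'fmincon';

% Build the options, then attach them to the structure so fmincon sees them.
opts = optimoptions(@fmincon, 'MaxFunctionEvaluations', 5000, 'Display', 'off');
problem.options = opts;

[xOpt, fval, exitflag] = fmincon(problem);
fprintf('exitflag = %d, limit used = %d\n', exitflag, problem.options.MaxFunctionEvaluations);
```

In the question's script, the equivalent change would be a line such as problem.options = options; placed between the optimoptions call and the fmincon call.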

Error with matrix indices in heat flow simulation

When I run this code, which I wrote to simulate a heat flow model in MATLAB, I get an error that says 'Subscript indices must either be real positive integers or logicals.' I suspect the linspace command is generating values that are not integers, so the indexing fails, but I'm not sure how to amend my script to correct this.
Cp = 400;
p = 8960;
k = 400;
a = k/(p*Cp);
dt = 0.01;
dx = sqrt(5*a*dt); %% 5 as 1/5 is smaller than 1/4 for stability
T = zeros(20000,10000);
for x = linspace(1,10000,10000);
    T(x,:) = 1000;
end
for x = linspace(10001,20000,10000);
    T(x,:) = 25;
end
for t = linspace(1,10000,10000);
    for x = linspace(1,20000,20000);
        T(x,t+1) = T(x,t)+a*dt*((T(x-1,t)-2*T(x,t)+ T(x+1,t))/(dx*dx));
    end
end

The line that blows up is:
T(x,t+1) = T(x,t)+a*dt*((T(x-1,t)-2*T(x,t)+ T(x+1,t))/(dx*dx));
Specifically T(x-1,t) triggers the error because x starts as 1, hence x - 1 = 0 and 0 is not a valid index.
On a more general Matlab coding note, I would write x = 1:10000 instead of x = linspace(1,10000,10000), but this is not causing the error. Note that I'm only addressing the Matlab error message. I have no idea whether your overall code works.
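
A minimal sketch of one way to avoid the invalid index is to update only the interior points, so that x-1 and x+1 always stay inside the array. The grid sizes nx and nt are deliberately tiny and purely illustrative, and holding the two boundary rows fixed is an assumption about the intended boundary condition, not something stated in the question.

```matlab
% Small grid so the sketch runs quickly; sizes are illustrative only.
nx = 20;                        % number of spatial points
nt = 50;                        % number of time steps stored
r  = 1/5;                       % a*dt/dx^2, since dx = sqrt(5*a*dt) in the question

T = zeros(nx, nt);
T(1:nx/2, 1)    = 1000;         % hot half at the first time step
T(nx/2+1:nx, 1) = 25;           % cold half at the first time step

for t = 1:nt-1
    % Boundary rows are simply carried forward (assumed fixed temperature).
    T(1,  t+1) = T(1,  t);
    T(nx, t+1) = T(nx, t);
    % Interior points only: x-1 >= 1 and x+1 <= nx, so every index is valid.
    for x = 2:nx-1
        T(x, t+1) = T(x, t) + r*(T(x-1, t) - 2*T(x, t) + T(x+1, t));
    end
end
```

The loop over t stops at nt-1 so that T(x,t+1) never refers to a column beyond the preallocated array.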

Matlab: binomial simulation

How do I simulate a binomial distribution of investment values for two stocks, acme and widget?
The number of trials is 1000.
I invest in each stock for 5 years.
This is my code. What am I doing wrong?
nyears = 5;
ntrials = 1000;
startamount = 100;
yrdeposit = 50;
acme = zeros(nyears, 1);
widget = zeros(nyears,1);
v5 = zeros(ntrials*5, 1);
v5 = zeros(ntrials*5, 1);
%market change between -5 to 1%
marketchangeacme = (-5+(1+5)*rand(nyears,1));
marketchangewidget = (-3+(3+3)*rand(nyears,1));
acme(1) = startamount;
widget(1) = startamount;
for m=1:numTrials
    for n=1:nyears
        acme(n) = acme(n-1) + (yrdeposit * (marketchangeacme(n)));
        widget(n) = acme(n-1) + (yrdeposit * (marketchangewidget(n)));
        vacme5(i) = acme(j);
        vwidget5(i) = widget(j);
    end
    theMean(m) = mean(1:n*nyears);
    p = 0.5 % prob neg return
    acmedrop = (marketchangeacme < p)
    widgetdrop = (marketchangewidget <p)
end
plot(mean)

Exactly what you are trying to calculate is not clear. However, some things that are obviously wrong with the code are:
widget(n) presumably isn't a function of acme(n-1) but rather of widget(n-1).
Every entry of theMean will be mean(1:nyears*nyears), which for nyears=5 will be 13. (This is because n=nyears always at that point in the code.)
The probability of a negative return for acme is 5/6, not 0.5.
To find the locations of the negative returns you want acmedrop = (marketchangeacme < 0); not < 0.5 (nor any other probability). Similarly for widgetdrop.
You are not preallocating vacme5 or vwidget5 (but you do preallocate v5 twice, and then never use it).
You don't create a variable called mean (and you never should), so plot(mean) will not work.
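
Two of the numbers above can be checked with a short simulation. This is only a sanity check of the stated probabilities, not a corrected version of the investment model; the variable names and the sample size are chosen here for illustration.

```matlab
rng(0);                                          % fixed seed for repeatability
nSamples = 1e6;

acmeChange   = -5 + (1 + 5)*rand(nSamples, 1);   % uniform on [-5, 1], as in the question
widgetChange = -3 + (3 + 3)*rand(nSamples, 1);   % uniform on [-3, 3], as in the question

pAcmeNeg   = mean(acmeChange < 0);               % should be close to 5/6
pWidgetNeg = mean(widgetChange < 0);             % should be close to 1/2

nyears    = 5;
meanCheck = mean(1:nyears*nyears);               % mean of 1..25, which is exactly 13

fprintf('P(acme < 0) ~ %.3f, P(widget < 0) ~ %.3f, mean(1:25) = %g\n', ...
    pAcmeNeg, pWidgetNeg, meanCheck);
```

Comparing against 0 rather than against a probability is what identifies the negative returns; the probability only describes how often that comparison is true.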

numerical computation problems in matlab

In this question I am asking about numerical computation problems in MATLAB, and I want to learn how to avoid such errors in the future.
For example, consider the following simple code:
t = 0.4 + 0.1 - 0.5
t =
0
It works fine, but
u = 0.4 - 0.5 + 0.1
u =
2.7756e-17
Mathematically this is also 0, so why does the first calculation not give the same result? What is the difference? Also, please look at
v = (sin(2*pi)==sin(4*pi))
v =
0
This suggests that the sine function is not periodic, so what is the general advice in this case? Introduce some epsilon, like this?
V=((sin(2*pi)-sin(4*pi))<eps)
V =
0
or
EPS=0.000000000000001
EPS =
1.0000e-15
>> V=((sin(2*pi)-sin(4*pi))<EPS)
V =
1
please help me

These results are expected, because the floating-point relative accuracy in MATLAB is
eps('double')
ans =
2.2204e-16
For V=((sin(2*pi)-sin(4*pi))<eps), the difference is
sin(2*pi)-sin(4*pi)
ans =
2.4493e-16
which is larger than eps('double'), so the result is V=0.
For V=((sin(2*pi)-sin(4*pi))<EPS), EPS=1e-15 is larger than 2.4493e-16, so the result is V=1.
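
A common pattern, sketched here under the assumption that the values being compared are of order 1, is to compare the absolute difference against a small tolerance instead of testing exact equality. The tolerance tol = 1e-12 is an illustrative choice, not a universal constant; it should be scaled to the magnitude of the data.

```matlab
a = sin(2*pi);                      % tiny nonzero value, not exactly 0
b = sin(4*pi);                      % another tiny nonzero value

tol = 1e-12;                        % illustrative tolerance for values of order 1

isEqualExact = (a == b);            % exact comparison fails
isEqualTol   = abs(a - b) < tol;    % tolerance comparison succeeds

u = 0.4 - 0.5 + 0.1;                % rounding leaves a residue of about 2.8e-17
isZeroTol = abs(u) < tol;           % treated as zero within the tolerance

fprintf('exact: %d, with tolerance: %d, u is zero: %d\n', ...
    isEqualExact, isEqualTol, isZeroTol);
```

Note the abs around the difference: without it, any large negative difference would also pass the test.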

Generating random numbers...Faster way?

Using Run & Time on my algorithm, I found that the step that adds a standard deviation to the integers is a bit slow. First of all I created the large integer matrix:
NumeroCestelli = 5;
lover_bound = 0;
upper_bound = 250;
steps = 10 ;
Alpha = 0.123
livello = [lover_bound:steps:upper_bound];
L = length(livello);
[PianoSperimentale] = combinator(L,NumeroCestelli,'c','r');
for i=1:L
    PianoSperimentale(PianoSperimentale==i)=livello(i);
end
Then I add a standard deviation (sigma = alpha * mu) and the error of a weigher like this:
%Standard Deviation
NumeroEsperimenti = size(PianoSperimentale,1);
PesoCestelli = randn(NumeroEsperimenti,NumeroCestelli)*Alfa;
PesoCestelli = PesoCestelli.*PianoSperimentale + PianoSperimentale;
random = randn(NumeroEsperimenti,NumeroCestelli);
PesoCestelli(PesoCestelli<0) = random(PesoCestelli<0).*(Alfa.*PianoSperimentale(PesoCestelli<0) + PianoSperimentale(PesoCestelli<0));
%Error
IncertezzaCella = 0.5*10^(-6);
Incertezza = randn(NumeroEsperimenti,NumeroCestelli)*IncertezzaCella;
PesoIncertezza = PesoCestelli.*Incertezza+PesoCestelli;
PesoIncertezza = (PesoIncertezza<0).*(-PesoIncertezza)+PesoIncertezza;
Is there a faster way?

There is not enough information for me to test it, but I bet that eliminating all the duplicate calculations that you do will lead to a speedup. I have tried to remove some of them:
PesoCestelli = randn(NumeroEsperimenti,NumeroCestelli)*Alfa;
PesoCestelli = (1+PesoCestelli).*PianoSperimentale;
random = randn(NumeroEsperimenti,NumeroCestelli);
idx = PesoCestelli<0;
PesoCestelli(idx) = random(idx).*(1+Alfa).*PianoSperimentale(idx);
%Error
IncertezzaCella = 0.5*10^(-6);
Incertezza = randn(NumeroEsperimenti,NumeroCestelli)*IncertezzaCella;
PesoIncertezza = max(PesoCestelli.*(1+Incertezza), 0); % same as the original two lines: negative values become 0
Note that I reduced the last two lines to a single line.

You calculate PesoCestelli<0 a number of times. You could just calculate it once and save the value. You also create a full set of random numbers, but only use the subset of them where PesoCestelli<0. You might be able to speed things up by creating only as many random numbers as you need.
It is not clear what Alfa is, but if it is a scalar, instead of
Alfa.*PianoSperimentale(PesoCestelli<0) + PianoSperimentale(PesoCestelli<0)
it might be faster to do
(1+Alfa).*PianoSperimentale(PesoCestelli<0)
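
The two suggestions can be combined in a short sketch: compute the logical mask once, and draw only nnz(idx) random numbers for the entries that need replacing. The matrix below is a hypothetical stand-in for PianoSperimentale (the combinator function from the question is not used), and Alfa is set to 0.5, deliberately larger than the question's 0.123, so that some entries actually go negative in this toy example.

```matlab
rng(1);                                           % fixed seed for repeatability
Alfa = 0.5;                                       % assumed scalar; enlarged for illustration

% Hypothetical stand-in for the design matrix: levels 0:10:250, 1000 rows, 5 baskets.
PianoSperimentale = 10 * randi([0 25], 1000, 5);

% Add the relative noise in one step.
PesoCestelli = (1 + Alfa*randn(size(PianoSperimentale))) .* PianoSperimentale;

% Compute the mask once and reuse it.
idx = PesoCestelli < 0;

% Draw only as many random numbers as there are entries to replace.
PesoCestelli(idx) = randn(nnz(idx), 1) .* (1 + Alfa) .* PianoSperimentale(idx);

fprintf('Replaced %d of %d entries\n', nnz(idx), numel(idx));
```

Whether this is measurably faster depends on how many entries are negative; with a small Alfa the mask is almost empty and the saving comes mainly from not generating the second full random matrix.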