Understanding scipy.integrate's internal behavior

I am trying to understand what scipy.integrate is doing internally, because it seems that something weird and inconsistent is happening.
How do I get it working properly? I need it to perform one integration step at a time, because I do some stuff with t inside the ODE and need it to be consistent.
So, here is my MWE:
import numpy as np
from scipy.integrate import ode

t0 = 0
t1 = 1

def myODE(t, x):
    print('INTERNAL t = {time:2.3f}'.format(time=t))
    Dx = np.zeros([2, 1])
    Dx[0] = -x[0]**2
    Dx[1] = -x[1]**2
    return Dx

simulator = ode(myODE).set_integrator('dopri5')
simulator.set_initial_value(np.ones([2, 1]), t0)

t = simulator.t
while t < t1:
    t = simulator.t
    print('Outside integrate t = {time:2.3f}'.format(time=t))
    x = simulator.integrate(2, step=True)
    print('x1 = {x1:2.3f}'.format(x1=x[0, 0]))
What I'm trying to do is perform one integration step at a time. Instead, integrate does something else. As you can see from the output below, it performs several steps at a time, and those steps are inconsistent: sometimes t increases and then decreases again.
Outside integrate t = 0.000
INTERNAL t = 0.000
INTERNAL t = 0.010
INTERNAL t = 0.004
INTERNAL t = 0.006
INTERNAL t = 0.016
...
INTERNAL t = 1.969
INTERNAL t = 1.983
INTERNAL t = 2.000
INTERNAL t = 2.000
x1 = 0.333
Outside integrate t = 2.000
INTERNAL t = 2.000
...
INTERNAL t = 2.000
x1 = 0.333

All solvers that are better than the standard fixed-step RK4 method use a variable step size. Treating the solver as a black box, one cannot know what internal step sizes are used.
What is known, however, is that the explicit one-step methods have multiple stages, at least as many as their order, each of which comprises a call to the ODE function at a point close to, but not necessarily on, the solution trajectory. Implicit methods may have fewer stages than their order, but require an iterative approach to solve the implicit step equations.
The Dormand-Prince 4(5) method has 7 stages, where the last stage is also the first stage of the next step, so on a long-time average it makes 6 evaluations per step. This is what you see with the ode('dopri5') integrator.
INTERNAL t = 0.00000000
INTERNAL t = 0.01000000
INTERNAL t = 0.00408467
INTERNAL t = 0.00612700
INTERNAL t = 0.01633866
INTERNAL t = 0.01815407
INTERNAL t = 0.02042333
INTERNAL t = 0.02042333
INTERNAL t = 0.03516563
INTERNAL t = 0.04253677
INTERNAL t = 0.07939252
INTERNAL t = 0.08594465
INTERNAL t = 0.09413482
INTERNAL t = 0.09413482
Outside integrate t = 0.09413482
...
There one can see that the minimal step of the scipy method consists of 2 DoPri steps. In the sequence for the first step, the very first evaluation just probes whether the initial step size is appropriate; this is only done once. All the other evaluation points are at the prescribed times t_n + c_i*dt, where c = [0, 1/5, 3/10, 4/5, 8/9, 1, 1].
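One can check this against the printed output by recomputing the stage times of the first accepted step from its step size (a small sketch; the step size 0.02042333 is read off the INTERNAL output above):
import numpy as np

# Dormand-Prince 4(5) stage abscissae; the last stage coincides with the first stage of the next step
c = np.array([0, 1/5, 3/10, 4/5, 8/9, 1, 1])

t_n = 0.0          # start of the first accepted step
dt = 0.02042333    # its step size, read off the INTERNAL output above

print(t_n + c * dt)
# [0.  0.00408467  0.006127  0.01633866  0.01815407  0.02042333  0.02042333]
# which matches, up to rounding, the INTERNAL t values printed for the first step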
You can get proper single steps with the new classes that are the steppers behind the new interface solve_ivp. Take care that the default tolerances here are much looser than in the ode('dopri5') case, probably following the Matlab philosophy of generating "good enough" plots with minimal effort. For RK45 this can look like:
from scipy.integrate import RK45

# note: the ODE function passed to RK45 should return a flat array (shape (2,)), not (2, 1)
simulator = RK45(myODE, t0, [1, 1], t1, atol=6.8e-7, rtol=2.5e-8)
t = simulator.t
while t < t1:
    simulator.step()
    t = simulator.t
    x = simulator.y
    print(f'Outside integrate t = {t:12.8f}')
    print(f'x1 = {x[0]:12.10f}, err = {x[0]-1/(1+t):8.6g}')
This uses slightly different internal steps, but, as said, has a "true" single-step output.
INTERNAL t = 0.00000000
INTERNAL t = 0.01000000
INTERNAL t = 0.00408223
INTERNAL t = 0.00612334
INTERNAL t = 0.01632891
INTERNAL t = 0.01814323
INTERNAL t = 0.02041114
INTERNAL t = 0.02041114
Outside integrate t = 0.02041114
x1 = 0.9799971436, err = 5.2347e-13
INTERNAL t = 0.04750541
INTERNAL t = 0.06105254
INTERNAL t = 0.12878821
INTERNAL t = 0.14083011
INTERNAL t = 0.15588248
INTERNAL t = 0.15588248
Outside integrate t = 0.15588248
x1 = 0.8651399668, err = 1.13971e-07
...
If you have an input that is a step function, or a zero-order hold, the most expedient solution would be to loop over the segments and initialize one RK45 object per segment, with the segment boundaries as integration boundaries. Save the last value as the initial value for the next segment, and perhaps also pass the last step size as the initial step size of the next segment.
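A minimal sketch of that idea, reusing scipy.integrate.RK45 (the segment boundaries, the held input values u_values and the right-hand side rhs are made up for illustration):
import numpy as np
from scipy.integrate import RK45

# hypothetical zero-order-hold input: one held value per segment
seg_bounds = np.linspace(0.0, 1.0, 11)            # segment edges
u_values = np.random.rand(len(seg_bounds) - 1)    # one constant input value per segment

def rhs(t, x, u):
    # the decaying dynamics from above plus the held input u (made up for illustration)
    return -x**2 + u

y = np.array([1.0, 1.0])    # initial state
h = None                    # last step size, carried into the next segment

for (ta, tb), u in zip(zip(seg_bounds[:-1], seg_bounds[1:]), u_values):
    kwargs = {'first_step': min(h, tb - ta)} if h is not None else {}
    solver = RK45(lambda t, x: rhs(t, x, u), ta, y, tb, atol=1e-8, rtol=1e-8, **kwargs)
    while solver.status == 'running':
        solver.step()
    y = solver.y              # last value becomes the initial value of the next segment
    h = solver.step_size      # reuse the last step size as the next initial step size

print(y)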
Directly using a step function inside the ODE function is inefficient, as the step size controller expects a very smooth ODE function for an optimal step size sequence. At jumps this assumption is grossly violated, which can lead to drastic local reductions in the step size and, accordingly, an increased number of function evaluations.

Related

Random number seed overlapping issue

I am using Matlab GPU computing to run a simulation. I suspect I may encounter a "random number seed" overlapping issue. My code is the following
N = 10000;
v = rand(N,1);
p = [0:0.1:1];
pA = [0:0.1:2];
[v,p,pA] = ndgrid(v,p,pA);
v = gpuArray(v);
p = gpuArray(p);
pA = gpuArray(pA);
t = 1;
bH = 0.9;
bL = 0.6;
a = 0.5;
Y = MyFunction(v,p,pA,t,bH,bL,a);

function [RA] = MyFunction(v,p,pA,t,bH,bL,a)
    function [RA] = SSP1(v,p,pA)
        RA = 0;
        S1 = rand;
        S2 = rand;
        S3 = rand;
        vA1 = (S1<a)*bH+(S1>=a)*bL;
        vA2 = (S2<a)*bH+(S2>=a)*bL;
        vA3 = (S3<a)*bH+(S3>=a)*bL;
        if p<=t && pA>3*bL && pA<=3*bH
            if pA>vA1+vA2+vA3
                if v>=p
                    RA = p;
                end
            else
                if v+vA1+vA2+vA3>=p+pA
                    RA = p+pA;
                end
            end
        end
    end
    [RA] = gather(arrayfun(@SSP1,v,p,pA));
end
The idea of the code is the following:
I generate N random agents, each of which is characterized by the value of v. Then for each agent I have to compute a quantity given (p,pA). As I have N agents and many combinations of (p,pA), I want to use the GPU to speed up the process. But here comes the tricky part:
For each agent, in order to finish my computation, I have to generate 3 extra random variables, vA1, vA2, vA3. Based on my understanding of the GPU (I could be wrong), it does these computations simultaneously, i.e., for each agent v it generates 3 random variables vA1, vA2, vA3, and the GPU runs these N procedures at the same time. However, I am not sure whether the corresponding vA1, vA2, vA3 for agent 1 and agent 2 may overlap, because here N could be 1 million. I want to make sure that for all of these agents the random number seeds used to generate their corresponding vA1, vA2, vA3 won't overlap; otherwise, I am in big trouble.
There is a way to prevent this from happening: I could first generate 3N of these random variables vA1, vA2, vA3 and then put them into my GPU. However, that may require a lot of GPU memory, which I don't have. The current method, I guess, does not need too much GPU memory, as I am generating vA1, vA2, vA3 on the fly.
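For scale, here is a minimal NumPy sketch of that pre-generation alternative (NumPy is used purely for illustration rather than Matlab/gpuArray, and the names are made up):
import numpy as np

N = 1_000_000
a, bH, bL = 0.5, 0.9, 0.6

# pre-generate all 3N uniform draws up front: one (N, 3) array,
# i.e. about 3 * N * 8 bytes ~ 24 MB for N = 1e6
S = np.random.rand(N, 3)

# the coin-flip mapping from the question: each draw becomes either bH or bL
vA = np.where(S < a, bH, bL)    # columns play the roles of vA1, vA2, vA3
vA_sum = vA.sum(axis=1)         # vA1 + vA2 + vA3, as used in the pA comparison
print(vA_sum[:5])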
What you say does not happen. The proof is that the following code snippet generates random values in hB.
A = ones(100,1);
dA = gpuArray(A);
[hB] = gather(arrayfun(@applyrand,dA));

function dB = applyrand(dA)
    r = rand;
    dB = dA*r;
end
That said, your code has very little variability in its random variables, because with your use of S1, S2 and S3 you are basically flipping a coin:
vA1 = (S1<0.5)*bH+(S1>=0.5)*bL;
so vA1 is either bH or bL (the two conditions are complementary, so exactly one of them is 1), and likewise for vA2 and vA3.
Maybe this lack of variability is what is making you think that you don't have much randomness; that part is not entirely clear from the question.

SVGP for US Flight data

My problem is an optimization issue with SVGP on the US flight dataset.
I implemented the SVGP model for the US flight data mentioned in Hensman (2014), using 100 inducing points, batch_size = 1000, learning rate = 1e-5 and maxiter = 500.
The result is pretty strange: the ELBO does not increase and it has a large variance, no matter how I tune the learning rate.
Initialization
import gpflow

M = 100   # number of inducing points
D = 8     # input dimension

def init():
    kern = gpflow.kernels.RBF(D, 1, ARD=True)
    Z = X_train[:M, :].copy()
    m = gpflow.models.SVGP(X_train, Y_train.reshape([-1, 1]), kern,
                           gpflow.likelihoods.Gaussian(), Z, minibatch_size=1000)
    return m

m = init()
Inference
m.feature.trainable = True
opt = gpflow.train.AdamOptimizer(learning_rate = 0.00001)
m.compile()
opt.minimize(m, step_callback=logger, maxiter = 500)
plt.plot(logf)
plt.xlabel('iteration')
plt.ylabel('ELBO')
Result:
Added Results
Once I add more iterations and use a larger learning rate, it is good to see that the ELBO increases as the iterations increase. But it is very confusing that the RMSE (root mean square error) for both the training and testing data increases too. Do you have any suggestions?
Figures and code are shown as follows:
ELBOs vs iterations
Train RMSEs vs iterations
Test RMSEs vs iterations
Using logger
def logger(x):
    print(m.compute_log_likelihood())
    logx.append(x)
    logf.append(m.compute_log_likelihood())
    logt.append(time.time() - st)
    py_train = m.predict_y(X_train)[0]
    py_test = m.predict_y(X_test)[0]
    rmse_hist.append(np.sqrt(np.mean((Y_train - py_train)**2)))
    rmse_test_hist.append(np.sqrt(np.mean((Y_test - py_test)**2)))
    logger.i += 1

logger.i = 1
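One thing worth checking in the RMSE lines above (an assumption about the data shapes, not something visible in the snippet): if Y_train has shape (N,) while predict_y returns an (N, 1) array, the subtraction broadcasts to an (N, N) matrix and the logged value is no longer the per-point RMSE. Flattening both arrays avoids this:
import numpy as np

# hypothetical shapes: Y_train of shape (N,), py_train and py_test of shape (N, 1)
rmse_train = np.sqrt(np.mean((Y_train.ravel() - py_train.ravel()) ** 2))
rmse_test = np.sqrt(np.mean((Y_test.ravel() - py_test.ravel()) ** 2))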
And the full code is available through the link.

Which Function in Matlab Should I Use to Validate a Model: forecast() or predict()?

I have used two types of models for modeling a SISO system with time series data. The first is ARIMAX and the second one is Output-Error (OE). Now I need to know which of the two performs best at forecasting the output, given the input over a certain horizon (15 days in my case) and only the observed outputs necessary for the model to initialize properly. Matlab presents two functions that seem to be used to validate models: forecast() and predict(). I have been reading about the difference between predicting and forecasting, and apparently people often confuse the two terms. I would like to know which of the two I should use to validate a model and choose the best one. The main point is that I have to test the model's performance for many horizons; in other words, how the model performs when forecasting one day ahead, two days ahead, and so on up to the 15th day ahead. I wrote the following code as an example:
close all
clear all
tic;
uhe = {'furnas'};
% Set the structures to be evaluated in ARMAx model
na = 10;
nb = 2;
nc = 1;
nk = 2;
% Set the structures to be evaluated in OE model
nbb = 10;
nf = 6;
nkk = 0;
u = 1;
% Read training dataset file and set iddata definitions
data_train = importdata(strcat('train_',uhe{u},'.dat'));
data_test = importdata(strcat('test_',uhe{u},'.txt'));
data_valid = importdata(strcat('valid_',uhe{u},'.txt'));
data_complet = vertcat(data_train, data_valid, data_test);
data_complet = iddata(data_complet(:,2),data_complet(:,1));
data_complet.TimeUnit = 'days';
data_complet.InputName = 'Chuva';
data_complet.OutputName = 'Vazão';
data_complet.InputUnit = 'm³/s';
data_complet.OutputUnit = 'm³/s';
data_complet.Name = 'Sistema Chuva-Vazão';
data_train = iddata(data_train(:,2),data_train(:,1));
data_train.TimeUnit = 'days';
data_train.InputName = 'Chuva';
data_train.OutputName = 'Vazão';
data_train.InputUnit = 'm³/s';
data_train.OutputUnit = 'm³/s';
data_train.Name = 'Sistema Chuva-Vazão';
data_valid = iddata(data_valid(:,2),data_valid(:,1));
data_valid.TimeUnit = 'days';
data_valid.InputName = 'Chuva';
data_valid.OutputName = 'Vazão';
data_valid.InputUnit = 'm³/s';
data_valid.OutputUnit = 'm³/s';
data_valid.Name = 'Sistema Chuva-Vazão';
data_test = iddata(data_test(:,2),data_test(:,1));
data_test.TimeUnit = 'days';
data_test.InputName = 'Chuva';
data_test.OutputName = 'Vazão';
data_test.InputUnit = 'm³/s';
data_test.OutputUnit = 'm³/s';
data_test.Name = 'Sistema Chuva-Vazão';
% Modeling training dataset with ARMAx
models_train_armax = armax(data_train,[na nb nc nk]);
% Modeling training dataset with OE
models_train_oe = oe(data_train,[nbb nf nkk]);
% Evaluating the validation dataset ARMAX
x0 = findstates(models_train_armax,data_valid);
OPT = simOptions('InitialCondition',x0);
ssmodel_armax=idss(models_train_armax);
models_valid_armax = sim(ssmodel_armax,data_valid,OPT);
% Evaluating the validation dataset OE
x0 = findstates(models_train_oe,data_valid);
OPT = simOptions('InitialCondition',x0);
ssmodel_oe=idss(models_train_oe);
models_valid_oe = sim(ssmodel_oe,data_valid,OPT);
% Predicting Horizon
hz = 20;
% Applying predict function
opt = predictOptions('InitialCondition','e');
[y_armax_pred] = predict(ssmodel_armax,data_valid(1:end),hz,opt);
[y_oe_pred] = predict(ssmodel_oe,data_valid(1:end),hz,opt);
% Applying forecast function
opt = forecastOptions('InitialCondition','e');
[y_armax_fc] = forecast(ssmodel_armax,data_train((end-max([na nb nc nk])):end),hz,data_test.u(1:hz),opt);
[y_oe_fc] = forecast(ssmodel_oe,data_train((end-max([nbb nf nkk])):end),hz,data_test(1:hz),opt);
It depends on how you are trying to validate the model. Generally you would use the predict command, as you would want to backtest against previous data.
Alternatively, you could use forecast if you have a cross-validation/holdout sample and you would like to test against that.
Matlab's help has an interesting line regarding the difference between forecast and predict:
forecast performs prediction into the future, in a time range beyond the last instant of measured data. In contrast, the predict command predicts the response of an identified model over the time span of measured data. Use predict to determine if the predicted result matches the observed response of an estimated model. If sys is a good prediction model, consider using it with forecast.
Note that Matlab's help for predict also says that careful model validation should not use the default value of the prediction horizon:
For careful model validation, a one-step-ahead prediction (K = 1) is usually not a good test for validating the model sys over the time span of measured data. Even the trivial one step-ahead predictor, y(hat)(t)=y(t−1), can give good predictions. So a poor model may look fine for one-step-ahead prediction of data that has a small sample time. Prediction with K = Inf, which is the same as performing simulation with sim command, can lead to diverging outputs because low-frequency disturbances in the data are emphasized, especially for models with integration. Use a K value between 1 and Inf to capture the mid-frequency behavior of the measured data.
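To illustrate the quoted point outside of the System Identification Toolbox, here is a small NumPy sketch (the AR(1) model, its coefficient and the data are made up) showing why one-step-ahead prediction can look deceptively good compared to a free-running simulation:
import numpy as np

rng = np.random.default_rng(0)

# made-up data: AR(1) process y(t) = 0.95*y(t-1) + noise
n = 200
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.95 * y[t - 1] + 0.1 * rng.standard_normal()

a_hat = 0.9   # a deliberately mis-estimated model coefficient

# one-step-ahead prediction: each prediction restarts from the measured y(t-1)
y_pred1 = a_hat * y[:-1]
rmse_1step = np.sqrt(np.mean((y[1:] - y_pred1) ** 2))

# free-running simulation (K = Inf): the model feeds back its own output
y_sim = np.zeros(n)
y_sim[0] = y[0]
for t in range(1, n):
    y_sim[t] = a_hat * y_sim[t - 1]
rmse_sim = np.sqrt(np.mean((y[1:] - y_sim[1:]) ** 2))

print(rmse_1step, rmse_sim)   # the one-step error stays small even though the model is off
A mid-range horizon K between 1 and Inf interpolates between these two extremes, which is the point of the quoted advice.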

Issues with Simplex method for linear programming in Matlab (linprog function)

I am using the linprog function in Matlab to solve a set of large linear programming problems. I have 2601 decision variables, 51 inequality constraints, 71 equality constraints, and lower bounds of 0 for all variables.
The coefficients in the objective function and constraints vary between problems. I am using the simplex method (when I try active-set and interior-point, the program never stops running for as long as I have waited, which was more than a few hours).
The simplex method converges for some of the problems very quickly, and for some of them (also very quickly) shows this message:
Exiting: The constraints are overly stringent; no feasible starting point found.
However, even for the ones with that message, it still provides a solution which satisfies the constraints. Can I just ignore that message and use the solutions, or is the message important and the solution probably not optimal?
Update: It turned out that the interior-point method solves some of them, but not others. So in the code below, I used the interior-point method for the ones that work with it and the simplex method for the rest.
These are my files and this is my code:
clc; clear;
%distances
t1 = readtable('t.xlsx', 'ReadVariableNames',false);
ti = table2array(t1);
sz = size(ti);
tiv = reshape(ti, [1,sz(1)*sz(2)]);
%crude oil production and attraction
A = readtable('A.xlsx', 'ReadVariableNames',false);
Ai = table2array(A);
P = readtable('P.xlsx', 'ReadVariableNames',false);
Pi = table2array(P);
%others
one1 = readtable('A Matrix.xlsx', 'ReadVariableNames',false);
one = table2array(one1);
two1 = readtable('Aeq Matrix.xlsx', 'ReadVariableNames',false);
two = table2array(two1);
zero = zeros(sz(1), sz(1));
infin = inf(sz(1), sz(1));
zerov = reshape(zero, [1,sz(1)*sz(2)]);
infinv = reshape(infin, [1,sz(1)*sz(2)]);
%OF
f = (tiv).^1;
%linear program
%x = linprog(f,A,b,Aeq,beq,lb,ub)
options1 = optimoptions('linprog','Algorithm','interior-point');
options2 = optimoptions('linprog','Algorithm','simplex');
x1999 = vec2mat(linprog(f,one,Pi(1,1:end),two,Ai(1,1:end),zerov,infinv,zerov,options2),sz(1));
x2000 = vec2mat(linprog(f,one,Pi(2,1:end),two,Ai(2,1:end),zerov,infinv,zerov,options1),sz(1));
x2001 = vec2mat(linprog(f,one,Pi(3,1:end),two,Ai(3,1:end),zerov,infinv,zerov,options1),sz(1));
x2002 = vec2mat(linprog(f,one,Pi(4,1:end),two,Ai(4,1:end),zerov,infinv,zerov,options1),sz(1));
x2003 = vec2mat(linprog(f,one,Pi(5,1:end),two,Ai(5,1:end),zerov,infinv,zerov,options1),sz(1));
x2004 = vec2mat(linprog(f,one,Pi(6,1:end),two,Ai(6,1:end),zerov,infinv,zerov,options1),sz(1));
x2005 = vec2mat(linprog(f,one,Pi(7,1:end),two,Ai(7,1:end),zerov,infinv,zerov,options1),sz(1));
x2006 = vec2mat(linprog(f,one,Pi(8,1:end),two,Ai(8,1:end),zerov,infinv,zerov,options1),sz(1));
x2007 = vec2mat(linprog(f,one,Pi(9,1:end),two,Ai(9,1:end),zerov,infinv,zerov,options2),sz(1));
x2008 = vec2mat(linprog(f,one,Pi(10,1:end),two,Ai(10,1:end),zerov,infinv,zerov,options2),sz(1));
x2009 = vec2mat(linprog(f,one,Pi(11,1:end),two,Ai(11,1:end),zerov,infinv,zerov,options2),sz(1));
x2010 = vec2mat(linprog(f,one,Pi(12,1:end),two,Ai(12,1:end),zerov,infinv,zerov,options2),sz(1));
x2011 = vec2mat(linprog(f,one,Pi(13,1:end),two,Ai(13,1:end),zerov,infinv,zerov,options2),sz(1));
x2012 = vec2mat(linprog(f,one,Pi(14,1:end),two,Ai(14,1:end),zerov,infinv,zerov,options1),sz(1));
x2013 = vec2mat(linprog(f,one,Pi(15,1:end),two,Ai(15,1:end),zerov,infinv,zerov,options2),sz(1));
x2014 = vec2mat(linprog(f,one,Pi(16,1:end),two,Ai(16,1:end),zerov,infinv,zerov,options2),sz(1));
x2015 = vec2mat(linprog(f,one,Pi(17,1:end),two,Ai(17,1:end),zerov,infinv,zerov,options2),sz(1));
x2016 = vec2mat(linprog(f,one,Pi(18,1:end),two,Ai(18,1:end),zerov,infinv,zerov,options1),sz(1));
In case somebody wants to know what the problem was: I found that for the programs with the error there was actually no feasible point, and what the error said was correct. I found this out by running the same linear programs with a vector of zeros for the objective function's coefficients and getting the same error (a method recommended by Matlab's manual).
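The same feasibility check can be sketched in a few lines with SciPy's linprog (a toy, made-up problem rather than the author's data): solving with a zero objective only asks whether any feasible point exists.
import numpy as np
from scipy.optimize import linprog

# tiny made-up problem: the two equality constraints contradict each other
A_ub = np.array([[1.0, 1.0]])
b_ub = np.array([10.0])
A_eq = np.array([[1.0, 0.0],
                 [1.0, 0.0]])
b_eq = np.array([1.0, 2.0])          # x0 = 1 and x0 = 2 cannot both hold

# zero objective: we only ask "is there any feasible point?"
res = linprog(c=np.zeros(2), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None), (0, None)])

print(res.status, res.message)       # status 2 means the problem is infeasible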

Using PCA before classification

I am using PCA to reduce the number of features before training a Random Forest. I first used around 70 principal components out of 125, which captured around 99% of the energy (according to the eigenvalues). I got much worse results after training Random Forests with the new transformed features. After that I used all the principal components and got the same results as when I used 70. This made no sense to me, since it is the same feature space only in a different basis (the space has only been rotated, so that should not affect the decision boundary).
Does anyone have the idea what may be the problem here?
Here is my code
clc;
clear all;
close all;
load patches_training_256.txt
load patches_testing_256.txt
Xtr = patches_training_256(:,2:end);
Xtr = Xtr';
Ytr = patches_training_256(:,1);
Ytr = Ytr';
Xtest = patches_testing_256(:,2:end);
Xtest = Xtest';
Ytest = patches_testing_256(:,1);
Ytest = Ytest';
data_size = size(Xtr, 2);
feature_size = size(Xtr, 1);
mu = mean(Xtr,2);
sigma = std(Xtr,0,2);
mu_mat = repmat(mu,1,data_size);
sigma_mat = repmat(sigma,1,data_size);
cov = ((Xtr - mu_mat)./sigma_mat) * ((Xtr - mu_mat)./sigma_mat)' / data_size;
[v d] = eig(cov);
%[U S V] = svd(((Xtr - mu_mat)./sigma_mat)');
k = 124;
%Ureduce = U(:,1:k);
%XtrReduce = ((Xtr - mu_mat)./sigma_mat) * Ureduce;
XtrReduce = v'*((Xtr - mu_mat)./sigma_mat);
B = TreeBagger(300, XtrReduce', Ytr', 'Prior', 'Empirical', 'NPrint', 1);
data_size_test = size(Xtest, 2);
mu_test = repmat(mu,1,data_size_test);
sigma_test = repmat(sigma,1,data_size_test);
XtestReduce = v' * ((Xtest - mu_test) ./ sigma_test);
Ypredict = predict(B,XtestReduce');
error = sum(Ytest' ~= (double(cell2mat(Ypredict)) - 48))
Random forest heavily depends on the choice of basis. It is not a linear model, which is (up to normalization) rotation invariant; RF completely changes behaviour once you "rotate the space". The reason lies in the fact that it uses decision trees as base classifiers, which analyze each feature completely independently, so as a result it fails to find any linear combination of features. Once you rotate your space, you change the "meaning" of the features. There is nothing wrong with that; simply, tree-based classifiers are a rather bad choice to apply after such transformations. Use feature selection methods instead (methods which select which features are valuable without creating any linear combinations). In fact, RFs themselves can be used for such a task thanks to their internal "feature importance" computation.
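As a sketch of that feature-selection alternative in scikit-learn (scikit-learn rather than Matlab's TreeBagger; the data shapes here are made up for illustration):
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# made-up stand-ins for the patch data: (n_samples, n_features) and labels
rng = np.random.default_rng(0)
Xtr = rng.standard_normal((500, 125))
Ytr = rng.integers(0, 2, 500)

# fit a forest on the original (un-rotated) features and rank them by importance
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(Xtr, Ytr)
importances = rf.feature_importances_

# keep the k most important original features instead of mixing them via PCA
k = 70
keep = np.argsort(importances)[::-1][:k]
Xtr_selected = Xtr[:, keep]
print(Xtr_selected.shape)   # (500, 70)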
There is already a Matlab function, princomp, which does PCA for you. I would suggest not falling into numerical-error traps; they have done it for us. :)