Increasing simulation time in a Monte Carlo simulation in Matlab/Simulink - matlab

I'm running a Monte Carlo simulation for a Simulink model with a Matlab script that looks more or less like this:
model = 'modelName';
load_system(model)
for ii = 1 : numberOfMC
    % Some set_param...
    % Some values are set
    sim(model);
    results{ii, 1} = numberOfMC;
    % etc...
end
close_system(model, 0);
As the number of Monte Carlo trials increases, the time per simulation also increases, so the total run time grows roughly like n^2.
Is there a simple explanation for that, and is there a solution to make it linear in time?
Thank you!
EDIT:
When I split my simulation into 6 batches and run them in series, the sum of the simulation times is far less than when I run the entire simulation in one shot.

As it seems that there is a limit to what one can do without feedback from the asker I will just post my comment as an answer:
My bet would be memory issues. If you want to rule this out, see whether the problem still occurs when you don't store the results in the first place; simply remove this line:
results{ii, 1} = numberOfMC;
Also confirm that you don't have other growing variables, and that you don't accidentally make the input more complex as you go. It is probably not relevant, but does the time also increase like this if you run all the simulations in reversed order? Or if you do the full number of iterations, but each time with exactly the same input?
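The suspected failure mode is easy to reproduce outside Simulink. The sketch below (plain NumPy, with invented names and sizes, not the asker's model) contrasts a container that is reallocated and fully copied on every pass, giving O(n^2) total work, with a preallocated one that is filled in place in O(n):

```python
import numpy as np

def grow_by_concatenation(n, chunk=1000):
    # Reallocates and copies the whole array on every pass:
    # total elements copied ~ chunk*(1+2+...+n), i.e. O(n^2).
    out = np.empty(0)
    for i in range(n):
        out = np.concatenate([out, np.full(chunk, float(i))])
    return out

def grow_preallocated(n, chunk=1000):
    # Allocate once, fill in place: O(n) total work.
    out = np.empty(n * chunk)
    for i in range(n):
        out[i*chunk:(i+1)*chunk] = i
    return out

a = grow_by_concatenation(50)
b = grow_preallocated(50)
```

Both produce identical results; only the allocation pattern differs, which is why dropping (or preallocating) the growing `results` variable is the first thing to test.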

Related

Ways to Improve Universal Differential Equation Training with sciml_train

About a month ago I asked a question about strategies for better convergence when training a neural differential equation. I've since gotten that example to work using the advice I was given, but when I applied the same advice to a more difficult model, I got stuck again. All of my code is in Julia, primarily making use of the DiffEqFlux library. In an effort to keep this post as brief as possible, I won't share all of my code for everything I've tried, but if anyone wants access to it to troubleshoot, I can provide it.
What I'm Trying to Do
The data I'm trying to learn comes from an SIRx model:
function SIRx!(du, u, p, t)
    β, μ, γ, a, b = Float32.([280, 1/50, 365/22, 100, 0.05])
    S, I, x = u
    du[1] = μ*(1-x) - β*S*I - μ*S
    du[2] = β*S*I - (μ+γ)*I
    du[3] = a*I - b*x
    nothing
end;
The initial condition I used was u0 = Float32.([0.062047128, 1.3126149f-7, 0.9486445]);. I generated data from t=0 to 25, sampled every 0.02 (in training, I only use every 20th point or so for speed; using more doesn't improve the results). The data looks like this: [figure: training data]
The UDE I'm training is
function SIRx_ude!(du, u, p, t)
    μ, γ = Float32.([1/50, 365/22])
    S, I, x = u
    du[1] = μ*(1-x) - μ*S + ann_dS(u, @view p[1:lenS])[1]
    du[2] = -(μ+γ)*I + ann_dI(u, @view p[lenS+1:lenS+lenI])[1]
    du[3] = ann_dx(u, @view p[lenS+lenI+1:end])[1]
    nothing
end;
Each of the neural networks (ann_dS, ann_dI, ann_dx) is defined using FastChain(FastDense(3, 20, tanh), FastDense(20, 1)). I tried using a single neural network with 3 inputs and 3 outputs, but it was slower and didn't perform any better. I also tried normalizing the inputs to the network first, but outside of slowing things down it doesn't make a significant difference.
What I've Tried
Single shooting
The network just fits a line through the middle of the data. This happens even when I weight the earlier data points more heavily in the loss function. [figure: single-shot training]
Multiple Shooting
The best result I had was with multiple shooting. As seen here, it's not simply fitting a straight line, but it's not exactly fitting the data either [figure: multiple shooting result]. I've tried continuity terms ranging from 0.1 to 100 and group sizes from 3 to 30, and neither makes a significant difference.
Various Other Strategies
I've also tried iteratively growing the fit, 2-stage training with collocation, and mini-batching, as outlined here: https://diffeqflux.sciml.ai/dev/examples/local_minima, https://diffeqflux.sciml.ai/dev/examples/collocation/, https://diffeqflux.sciml.ai/dev/examples/minibatch/. Iteratively growing the fit works well for the first couple of iterations, but as the length increases it goes back to fitting a straight line. 2-stage collocation training works really well for stage 1, but it doesn't actually improve performance in the second stage (I've tried both single and multiple shooting there). Finally, mini-batching worked about as well as single shooting (which is to say, not very well), but much more quickly.
My Question
In summary, I have no idea what to try. There are so many strategies, each with so many parameters that can be tweaked. I need a way to diagnose the problem more precisely so I can better decide how to proceed. If anyone has experience with this sort of problem, I'd appreciate any advice or guidance I can get.
This isn't a great SO question because it's more exploratory. Did you lower your ODE tolerances? That would improve your gradient calculation which could help. What activation function are you using? I would use something like softplus instead of tanh so that you don't have the saturating behavior. Did you scale the eigenvalues and take into account the issues explored in the stiff neural ODE paper? Larger neural networks? Different learning rates? ADAM? Etc.
This is much better suited for a forum for discussion like the JuliaLang Discourse. We can continue there since walking through this will not be fruitful without some back and forth.

faster way to add many large matrices in matlab

Say I have many (around 1000) large matrices (about 1000 by 1000) and I want to add them together element-wise. The very naive way is to use a temp variable and accumulate in a loop. For example,
summ = 0;
for ii = 1:20
    for jj = 1:20
        summ = summ + rand(400);
    end
end
After searching the Internet for a while, I found a suggestion that it's better to do this with the help of sum(). For example,
sump = zeros(400, 400, 400);
count = 0;
for ii = 1:20
    for jj = 1:20
        count = count + 1;
        sump(:,:,count) = rand(400);
    end
end
sum(sump, 3);
However, after testing both ways, the result is
Elapsed time is 0.780819 seconds.
Elapsed time is 1.085279 seconds.
which means the second method is even worse.
So I am just wondering whether there is any effective way to do this addition. Assume that I am working on a computer with very large memory and a GTX 1080 (CUDA might help, but I don't know whether it's worth it, since communication also takes time).
Thanks for your time! Any reply will be highly appreciated!
The fastest way is to not use any loops in MATLAB at all.
In many cases, MATLAB's internal functions are well optimized to use SIMD or other acceleration techniques.
An example of using the built-in functionality to create matrices of the desired size is X = rand(sz1,...,szN).
In your specific case, sum(rand(400,400,400), 3) should then give you the fastest result.
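For what it's worth, the trade-off the asker measured is easy to reproduce outside MATLAB. This NumPy sketch (my own names and sizes, not the asker's code) shows that the running-total loop and the stack-and-reduce approach compute the same sum, while the stacked version additionally has to materialize the full 3-D array first, which is exactly the overhead that can make it slower:

```python
import numpy as np

rng = np.random.default_rng(0)
mats = [rng.random((400, 400)) for _ in range(20)]

# Loop accumulation: one running total, minimal extra memory.
total = np.zeros((400, 400))
for m in mats:
    total += m

# Stack-and-reduce: builds a 400x400x20 array first, then does one
# vectorized reduction; extra memory traffic for the same answer.
stacked = np.stack(mats, axis=2).sum(axis=2)
```

Whether the reduction or the accumulation wins depends on memory bandwidth and array sizes, which is consistent with the asker's timings favouring the plain loop.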

Generate random numbers inside spmd in matlab

I am running a Monte Carlo simulation in Matlab using parallelisation, due to the extensive time the simulation takes to run.
The main objective is to create a really big panel data set and use it to estimate some regressions.
The problem is that without parallelisation the runs take A LOT of time, so I decided to use the spmd option. However, the results of the parallelised code are very different from those of the serial one.
rng(3857);
for r = 1:MCREP
    Ycom = [];
    Xcom = [];
    YLcom = [];
    spmd
        for it = labindex:numlabs:NT
            % (code to generate different components: alpha, delta, x_it, eps_it)
            % e.g. x_it = 2 + 1*randn(TT,1);
            % (uses the random number generator randn)
            % Create the time-period observations for each individual
            for t = 2:TT
                yi(t) = xi*alpha + mu*delta + rho*yi(t-1) + beta*x_it(t) + eps_it(t);
                yLi(t) = yi(t-1);
            end
            % Concatenate each individual into a big matrix: create the panel
            Ycom = [Ycom yi];
            Xcom = [Xcom x_it];
            YLcom = [YLcom yLi];
        end
    end
    % Retrieve data stored in composite form
    mm = matlabpool('size');
    for i = 1:mm
        Y(:, (i-1)*(NT/mm)+1:i*(NT/mm)) = Ycom{i};
        X(:, (i-1)*(NT/mm)+1:i*(NT/mm)) = Xcom{i};
        YL(:, (i-1)*(NT/mm)+1:i*(NT/mm)) = YLcom{i};
    end
    % (rest of the code: run regressions)
end
The intensive part of the code is the one parallelised with spmd; it creates a really large panel data set where columns are independent individuals and rows are dependent time periods.
My main problem is that when I run the code in parallel the results differ from the serial run; moreover, the results differ between 8 workers and 16 workers. However, running the code without parallelisation is unfeasible time-wise.
I believe the problem comes from the random number generation, but I cannot fix the seed inside the spmd block, because that would mean fixing the seed inside the Monte Carlo loop, so all the repetitions would get the same numbers.
I would like to know how I can fix the random number generator in such a way that, no matter how many workers I use, it gives me the same results.
PS. Another solution would be to apply spmd to the outermost loop (the Monte Carlo loop); however, I do not see a performance gain when I parallelise that way.
Thank you very much for your help.
Heh... the random generators in MATLAB's parallel execution are indeed an issue.
MATLAB's web page about random generators (http://www.mathworks.com/help/matlab/math/creating-and-controlling-a-random-number-stream.html) states that only two generator types support multiple independent streams. These two have a limited period (see the table at the previous link).
BUT!!! The default generator (mt19937ar) can be seeded in order to have different results :)
Thus, what you can do is start with mrg32k3a, obtain a random number in each worker, and then use this random number along with the worker index to seed an mt19937ar generator.
E.g.
spmd
    % randStream{labindex} is assumed to hold a per-worker mrg32k3a stream
    r1 = rand(randStream{labindex}, [1 1]);
    r2 = rand(randStream{labindex}, [1 1]);
    rng(labindex + (r1/r2), 'twister');
    % Do your stuff
end
Of course, r1 and r2 can be modified (or more r's added) in order to have a more elaborate seeding.
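A more robust variant of the same idea is to derive one reproducible substream per task index rather than per worker, so the draws no longer depend on how tasks are distributed across workers (in MATLAB this would be along the lines of RandStream.create('mrg32k3a', ...) with one stream per individual; whether that fits the asker's exact setup is an assumption). Here is the idea as a NumPy sketch, with invented function names:

```python
import numpy as np

def draws_for_task(base_seed, task_idx, n):
    # One reproducible child stream per task (e.g. per individual `it`),
    # so the draws do not depend on which worker runs which task.
    child = np.random.SeedSequence(base_seed).spawn(task_idx + 1)[task_idx]
    return np.random.default_rng(child).standard_normal(n)

same1 = draws_for_task(3857, 2, 5)
same2 = draws_for_task(3857, 2, 5)   # identical: fully reproducible
other = draws_for_task(3857, 3, 5)   # different task, different stream
```

Because the stream is keyed to the task index and not to labindex, running with 8 or 16 workers assembles exactly the same panel.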

Calculate time of script execution previously with Matlab

Good morning,
I have a question about script execution time in Matlab. Is it possible to know in advance how long the execution of a script will take before running it (an estimated time, for example)? I know that with the tic and toc commands, among others, it is possible to know the time at the end, but I don't know if it's possible to know it beforehand.
Thanks in advance,
It is not too hard to make an estimate of how long your calculation will take.
You already know how to record calculation times with tic and toc, so now you can do this:
Start with a small-scale test (for example, n=1) and record the calculation time
Multiply n by a constant k (I usually choose 2 or 10 for easy calculations) and record the calculation time
Keep multiplying by k until you find a consistent relation: 'If I multiply my input size by k, my calculation time changes like so ...'
Now you can extrapolate your estimated calculation time by:
calculating how many times you need to multiply the input size of your biggest small-scale example to get your real data size
applying the consistent relation you found exactly that many times to the calculation time of your biggest small-scale example
Of course this combines well with some common sense: if you do certain things t times, they will take about t times as long. This can easily be used when you have to perform a certain calculation a million times. Just interrupt the loop after a minute or so; if it is still in the first ten calculations, you may want to give up!
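The extrapolation recipe above can be automated. The sketch below (Python, invented names, toy workload, not anyone's production code) times a function at a few small sizes, fits a power law t ≈ c·n^p through the first and last measurements, and extrapolates to the target size:

```python
import math
import time

def estimate_runtime(f, small_sizes, target_n):
    # Time f at a few small sizes, fit t ~ c * n^p on a log-log line
    # through the first and last measurements, extrapolate to target_n.
    times = []
    for n in small_sizes:
        t0 = time.perf_counter()
        f(n)
        times.append(time.perf_counter() - t0)
    p = (math.log(times[-1] / times[0])
         / math.log(small_sizes[-1] / small_sizes[0]))
    c = times[-1] / small_sizes[-1] ** p
    return c * target_n ** p, p

def work(n):
    # Toy workload with a known O(n^2) cost.
    s = 0
    for i in range(n):
        for j in range(n):
            s += i * j
    return s

est, p = estimate_runtime(work, [300, 600, 1200], 2400)
```

For this workload, the fitted exponent p should come out near 2, and the estimate scales accordingly; in practice you would sanity-check it against one medium-sized run before trusting it.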

Need help on Hopfield Simulation function

I have trained a Hopfield network using the newhop function. When I simulate this network on my test input data, [y,Pf,Af] = sim(net,{1 repeatnum},{},{im1}); it works properly, but the problem is that it takes the number of iterations as an input argument, e.g. 100 iterations. The network may converge on the input data in, for example, the 5th iteration, and there's no need to continue the simulation. Is there any way to simulate until the network converges?
Regards!
Check out net.adaptParam.goal.
If necessary, set net.adaptFcn and net.adaptParam properly, according to the help in help nntrain.
From help traingd:
Training stops when any of these conditions occurs:
1) The maximum number of EPOCHS (repetitions) is reached.
2) The maximum amount of TIME has been exceeded.
3) Performance has been minimized to the GOAL.
4) The performance gradient falls below MINGRAD.
5) Validation performance has increased more than MAX_FAIL times
since the last time it decreased (when using validation).