How to run parameter optimization for my model using the GPU - matlab

Essentially, I have a model that, given a set of parameters, can calculate different thermodynamic properties for different compounds, for example liquid densities and vapor pressures.
When I want to regress the model parameters (e.g. a,b,c,d,e) by fitting data for different compounds, I usually do a lot of sequential operations whose efficiency I am sure could easily be improved. I am thinking about multiobjective optimization or, even better, using the GPU or multiple CPU cores. But I am a bit lost on where to start from reading just the documentation.
So within my objective function I usually have something like:
function fval = objective_function(a,b,c,d,e)
input_comp1=f(constants,a,b,c,d,e)
input_comp2=f(constants,a,b,c,d,e)
exp_density1=load some text file or so.
exp_density2=load some text file or so.
exp_vaporpressure1=load some text file or so.
exp_vaporpressure2=load some text file or so.
densities_1=density(a,b,c,d,e,input_comp1)
vapor_pressures1=vapor_pressures(a,b,c,d,e,input_comp1)
densities_2=density(a,b,c,d,e,input_comp2)
vapor_pressures2=vapor_pressures(a,b,c,d,e,input_comp2)
ARD_d1=expression for deviations between experimental and calculated values for density of comp.1
ARD_d2=...
ARD_p1=...
ARD_p2=...
fval= ARD_d1+ARD_d2+ARD_p1+ARD_p2
This is then minimized by something like fminsearch. I have used other solvers in the past, but fminsearch worked best for me. When I do this for just one component, it works fast enough for my purpose (but I am a patient man). But now I've extended the model in a way that requires regressing parameters simultaneously from more than one component, and it becomes impossibly slow.
I am quite sure this way of doing the calculations can be improved, because I can run the calculations for the different compounds simultaneously instead of sequentially, and then evaluate fval once the calculations for all components are done. But how?
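The structure being asked about (evaluate each compound independently, then combine the deviations into fval) can be sketched as follows. This is only an illustration in Python, not MATLAB: `compound_deviation`, the linear placeholder model and the tiny data set are all made up, and the experimental data is assumed to be loaded once outside the objective rather than re-read from text files on every call.

```python
from concurrent.futures import ThreadPoolExecutor

def compound_deviation(params, compound):
    """Hypothetical stand-in for one compound's model run plus its ARD terms."""
    a, b = params
    total = 0.0
    for x, y_exp in compound["data"]:
        y_calc = a * x + b  # placeholder model, NOT a thermodynamic model
        total += abs((y_calc - y_exp) / y_exp)
    return total / len(compound["data"])

def objective(params, compounds, max_workers=4):
    """Evaluate every compound concurrently, then sum the deviations into fval."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        deviations = list(pool.map(lambda c: compound_deviation(params, c), compounds))
    return sum(deviations)

# Data is prepared once, before the optimizer starts calling objective().
compounds = [
    {"name": "comp1", "data": [(1.0, 2.0), (2.0, 4.0)]},
    {"name": "comp2", "data": [(1.0, 3.0), (2.0, 5.0)]},
]
fval = objective((2.0, 0.0), compounds)
```

Threads are used here only to keep the sketch short; for CPU-bound model evaluations a process pool would be the usual choice in Python. In MATLAB the analogous construct would be a parfor loop over the compounds (Parallel Computing Toolbox), with the sum taken after the loop.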

Related

integrate Modelica variable without influencing state selection

I want to integrate a Modelica variable over time, just for convenience in plotting and post-processing. The variable I want to integrate over time is the power of a compressor so that I get the total energy. The first idea would be to add these lines:
Modelica.Units.SI.Power P_comp;
Modelica.Units.SI.Energy E_comp;
equation
P_comp = der(E_comp);
Is that the recommended way, or are there (better?) alternatives? Is it expected to influence the selection of dynamic states?
Assuming that those two lines are the only ones using E_comp, that should work.
Basically E_comp will be part of its own separate state-selection block and changes there shouldn't influence anything else.
However, state selection consists of a number of algorithms and heuristics so it is difficult to formally guarantee that any change does not influence it.
I could imagine some strange possibilities that would break this, but I don't think anyone has implemented them - and I don't see a use-case for them (except to mess up cases like this).
And if you want to differentiate a signal instead of integrating it, things get a lot messier.

checking for convergence in complex hierarchical models JAGS

I have estimated a complex hierarchical model with many random effects, but don't really know what the best approach is for checking convergence. I have complex longitudinal data from a few hundred individuals and estimate quite a few parameters for every individual. Because of that, I have way too many traceplots to inspect visually. Or should I really spend a day going through all the traceplots? What would be a better way to check for convergence? Do I have to calculate Gelman and Rubin's Rhat for every parameter on the person level? And when can I conclude that the model converged? When absolutely all of the thousands of parameters have reached convergence? Is it even sensible to expect that? Or is there something like "overall convergence"? And what does it mean when some person-level parameters did not converge? Does it make sense to use autorun.jags from the R2jags package with such a model, or will it just run forever? I know these are a lot of questions, but I just don't know how to approach this.
The measure I am using for convergence is a potential scale reduction factor (psrf)* using the gelman.diag function from the R package coda.
But nevertheless, I am also quickly visually inspecting all the traceplots, even though I also have tens/hundreds of them. It can be really fast if you put them in PNG files and then quickly go through them using e.g. IrfanView (let me know if you need me to expand on this).
The reason you should inspect the traceplots is pretty well described by an example from Marc Kéry (author of great Bayesian books): see "Never blindly trust Rhat for convergence in a Bayesian analysis", which includes a self-explanatory image.
Rhat is the estimated potential scale reduction factor, i.e. the same statistic that gelman.diag reports (implementations can differ slightly in detail), so psrf is subject to the same caveat, and it is better to check the chains as well.
*) Gelman, A. & Rubin, D. B. Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–472 (1992).
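One way to avoid inspecting thousands of traceplots is to compute the psrf for every parameter and look only at the ones that get flagged. The following is a minimal sketch of that screening step in Python, using the basic Gelman and Rubin formula. The function names, the toy draws and the 1.1 threshold (a commonly used rule of thumb) are assumptions, and as far as I know coda's gelman.diag applies an additional correction, so its values can differ slightly.

```python
from math import sqrt
from statistics import mean, variance

def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor for one parameter.

    chains: list of equally long lists of draws, one list per chain.
    """
    n = len(chains[0])
    within = mean(variance(c) for c in chains)         # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])  # B: between-chain variance
    pooled = (n - 1) / n * within + between / n        # pooled variance estimate
    return sqrt(pooled / within)

def flag_unconverged(draws, threshold=1.1):
    """Return {parameter name: psrf} for parameters above the threshold."""
    scores = {name: psrf(chains) for name, chains in draws.items()}
    return {name: r for name, r in scores.items() if r > threshold}

# Toy draws (two chains per parameter); real chains would be far longer.
draws = {
    "beta[1]": [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 2.0]],
    "beta[2]": [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]],
}
flagged = flag_unconverged(draws)  # only "beta[2]" is flagged
```

The traceplots of the flagged parameters, plus a random sample of the unflagged ones, are then the ones worth opening, given the caveat above about not trusting the statistic blindly.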

A general question about Modelica initialization

How do I set values for all the variables that could possibly be used as iteration variables? For example, a heat exchanger includes a few connectors, and each connector includes a few variables. I can't know which of these variables will be used as iteration variables. When dealing with initialization, do I need to set values for every variable, so that whichever variable is chosen as an iteration variable has a reasonable value?
Marvel,
I think that you are a bit on the wrong track for finding a solution: setting values to all variables that possibly could become iteration variables is often too many, and will lead to errors and problems. But I think I can give you some useful advice in any case.
Alias variables: there are many alias variables in Modelica models. You should always try to select only one of them to set start values for.
Feedback between start values and iteration variables: most Modelica tools will prefer to select iteration variables that have start values set. Setting fewer start values can thus guide the algorithm towards selecting good ones. Therefore: don't overdo it.
General advice for selecting iteration variables: for a pure ODE, the states will always be a complete set of start variables, even if sometimes not the best one. For a DAE you can start with the following exercise: think of all equations that result from a singular perturbation of the complete physics as differential equations with states. For example, in a heat exchanger, you need to consider the dynamic momentum balance and not the more commonly used static reduction to an algebraic pressure loss, i.e. add the mass flow as a state. Similarly for chemical reactions: think of them as kinetics, not equilibrium reactions. That gives you a pretty good starting point, even though often not the best one.
If your troubles don't quite resolve from that, I recommend that you contact us via www.modelon.com: we have advanced ways of dealing with hard initialization and steady state problems in our Modelica tool. :-)
There is also a simpler way to answer your question, which works quite well with fluid models.
Given that you are using a dynamic model, what you need to initialize are the state variables of your system. To find the state variables, either you know the type of model you are working with, or you can dig them out using options like 'List continuous time states selected' in Dymola (I do not know about other tools), which lists the state variables in the translation log.
In the case of fluid models, most of the time those are pressure and energy (enthalpy or temperature). All other variables will be calculated based on them.
For complex (or not so complex) models, this approach shows its limits, which can sometimes be overcome by changing or correcting the structure of the model.
Static models are something else...
Hope this can help :)

matlab running all linprog algorithms (is there a matlab-list of algorithms?)

Matlab offers multiple algorithms for solving Linear Programs.
For example Matlab R2012b offers: 'active-set', 'trust-region-reflective', 'interior-point', 'interior-point-convex', 'levenberg-marquardt', 'trust-region-dogleg', 'lm-line-search', or 'sqp'.
But other versions of Matlab support different algorithms.
I would like to run a loop over all algorithms that are supported by the user's Matlab version, and I would like them to be ordered according to Matlab's recommendation order.
I would like to implement something like this:
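i=1;
x=[];
while (isempty(x))
% Here_I_need_a_list_of_Algorithms is the placeholder for the list being asked about
options=optimset(options,'Algorithm',Here_I_need_a_list_of_Algorithms(i));
x = linprog(f,A,b,Aeq,beq,lb,ub,x0,options);
i=i+1; % move on to the next algorithm if no solution was returned
end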
In 99% of cases this code should be equivalent to
x = linprog(f,A,b,Aeq,beq,lb,ub,x0,options);
but sometimes the algorithm gives back an empty array because of numerical problems (exitflag -4). If there is a chance that one of the other algorithms can find a solution I would like to try them too.
So my question is:
Is there a way to automatically get a list of all linprog algorithms that are supported by the installed Matlab version, ordered the way Matlab recommends them?
I think looping through all algorithms can make sense in other scenarios too. For example, when you need very precise data and have a lot of time, you could run them all and then evaluate which gives the best results.
Or one might want to loop through all algorithms to find which algorithm is best for LPs with a certain structure.
There's no automatic way to do this as far as I know. If you really want to do it, the easiest thing to do would be to go to the online documentation, and check through previous versions (online documentation is available for old versions, not just the most recent release), and construct some variables like this:
r2012balgos = {'active-set', 'trust-region-reflective', 'interior-point', 'interior-point-convex', 'levenberg-marquardt', 'trust-region-dogleg', 'lm-line-search', 'sqp'};
...
r2017aalgos = {...};
v = ver('matlab');
switch v.Release
case '(R2012b)'
algos = r2012balgos;
....
case '(R2017a)'
algos = r2017aalgos;
end
% loop through each of the algorithms
Seems boring, but it should only take you about 30 minutes.
There's a reason MathWorks aren't making this as easy as you might hope, though, because what you're asking for isn't a great idea.
It is possible to construct artificial problems where one algorithm finds a solution and the others don't. But in practice, typically if the recommended algorithm doesn't find a solution this doesn't indicate that you should switch algorithms, it indicates that your problem wasn't well-formulated, and you should consider modifying it, perhaps by modifying some constraints, or reformulating the objective function.
And after all, why stop with just looping through the alternative algorithms? Why not also loop through lots of values for other options such as constraint tolerances, optimality tolerances, maximum number of function evaluations, etc.? These may have just as much likelihood of affecting things as a choice of algorithm. And soon you're running an optimisation algorithm to search through the space of meta-parameters for your original optimisation.
That's not a great plan - probably better to just choose one of the recommended algorithms, stick to that, and if things don't work out then focus on improving your formulation of the problems rather than over-tweaking the optimisation itself.

Declaring rng('shuffle','twister') many times through the use of functions degrades computation time

I have an optimization program where I have a main program and three subprograms (functions) in MATLAB. I declared rng('shuffle','twister') in my main program but I thought that I needed to declare the same rng('shuffle','twister') under my functions since they also use random sampling. My question is if it is necessary to declare rng('shuffle','twister') in my functions since it greatly degrades the computation time. I seem to be getting the same answers anyway. Is there a way around this?
You do not need to repeatedly run rng(...) in your functions, just once when you start MATLAB if you want to get different numbers each time. The random number functions in MATLAB (i.e. rand, randn, randi, etc.) share a global/system-wide generator, so there is no need to reseed it except when you restart MATLAB.
Since all of these functions access the same underlying stream, a call to one affects the values produced by the others at subsequent calls.
Hence, numbers generated in the different functions and in repeated calls to the functions will be different whether or not you reseed the generator.
This page has more about the 'shuffle' option. It indicates that re-seeding frequently is not only unhelpful, but may actually be undesirable from a statistical standpoint:
'shuffle' is a very easy way to reseed the random number generator. You might think that it's a good idea, or even necessary, to use it to get "true" randomness in MATLAB. For most purposes, though, it is not necessary to use 'shuffle' at all. Choosing a seed based on the current time does not improve the statistical properties of the values you'll get from rand, randi, and randn, and does not make them "more random" in any real sense. While it is perfectly fine to reseed the generator each time you start up MATLAB, or before you run some kind of large calculation involving random numbers, it is actually not a good idea to reseed the generator too frequently within a session, because this can affect the statistical properties of your random numbers.
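The shared-generator behaviour described above can be illustrated with a small sketch. This is an analogy in Python, whose `random` module also exposes one module-level Mersenne Twister generator; it is not MATLAB's rng, and the function names and the seed value 12345 are made up for the example.

```python
import random

def sample_a():
    return random.random()  # draws from the module-level shared generator

def sample_b():
    return random.random()  # same shared generator, so the stream simply continues

random.seed(12345)  # seed once, at the start of the main program
first = [sample_a(), sample_b(), sample_a()]

random.seed(12345)  # reseeding with the same seed restarts the same stream
second = [sample_a(), sample_b(), sample_a()]
```

Within one seeded run the three draws differ even though no function reseeds, because each call advances the shared stream. Reseeding only matters when the stream has to be restarted or made reproducible, which is why a single call in the main program is enough.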