How to improve the convergence performance of Dymola? - modelica

Recently I am working with fluid modeling with Modelica, but I come across a lot of divergence problems of nonlinear equations, like in the following screenshot.
So I am considering if it is possible to use the min/max/nominal attributes of variables to improve the model's convergence, especially when a user comes across the nonlinear solver failure. According to the answer of this question on StackOverflow, min/max attributes won't help convergence, and based on the Modelica Specification 4.8.6, nomial attributes are used to determine appropriate tolerances or epsilons, or may be used for scaling.
So my question is:
If I meet this kind of divergence problem caused by the nonlinearity of my model, how could I help the compiler to get convergence better and quicker?
Someone might suggest better start values of variables used as state variables, but when I am dealing with large models, I am not sure how to find the specific state variables of which I should modify the start values.

Chapter 2.6.13 "Online diagnostics for non-linear systems" in manual 1B and following in the manual should help. You can e.g. list states that dominates error: usually these states are a good hint where to start your improvements.

Adding to the answer by Imke Krueger.
If the models fail after 2917 s one possibility is that the solution was diverging before that, with e.g., enthalpy decreasing further and further until the model has left the valid regions.
Assuming it happened fairly slowly it is best to plot the states and other variables in that components. Additionally the states dominating the error as indicated in the answer by Imke Krueger and see if any of them seem to diverge.
If it happened more quickly:
Log events and check whether something important like a flow reversal just happened before that time.
Disable equidistant output, as it is possible that the model diverged between two output points.

An eigenvalue-based analysis of the Jacobin at time = 0 provides a ranking of state-variables from most significant to the least one. That could be a heuristic to examine the influence of start variables of most significant state-variables.
What could be also helpful is to conduct a similar analysis a little time before the problem occurs.
Also there is a possibility to compute dynamic parameter sensitivities of state variables (before the problem occurs) w.r.t. start values, see e.g. https://github.com/Mathemodica/DerXP for a suggested approach. This gives you a hint which start values significantly influences the values of state variables.

Related

Dymola: why choosing which integration method

Simulating models with dymola, I get different results depending on the chosen integration method. So my question is: why choose which method?
Ideally the choice of method should be based on which one most quickly gives a result close enough to the exact result.
But we don't know the exact result, and in this case at least some of the solvers (likely all) don't generate a result close enough to it.
One possibility is to first try to simulate with a lot stricter tolerance - e.g. 1e-9. (Note: for the fixed step-size solvers, Euler and Rkfix* it would be smaller step-size, but don't start using them.)
Hopefully the difference between the solvers will decrease and the different solvers give more similar results (which should be close to the exact one).
You can then always use this stricter tolerance. Or in case one solver already gave the same result with less strict tolerance - then you can use that one at less strict tolerance; but you also have to make a trade-off between the accuracy and simulation-time.
Sometimes this does not happen and even the same solver generate different results for different tolerances; without converging to a true solution.
(Assumedly the solution are close together at the start but then quickly diverge.)
It is likely that the model is chaotic in that case. That is a bit more complicated to handle and there are several options:
It could be due to a modeling error that can be corrected
It could be that the model is correct, but the system could be changed to become smoother
It could be that the important outputs converge regardless of differences in other variables
It could also be some other error (including problems during initialization), but that requires a more complete example to investigate.
Choose the solver that matches best the exact solution.
Maybe Robert Piché's work on numerical solvers gives some more clues.

When(why) to use step/pulse/ramp functions in simulink?

Hello guys I'd like to know the answer to the question that is the titled named by.
For example if I have physical system described in differential equation(s), how should I know when I should use step, pulse or ramp generator?
What exactly does it do?
Thank you for your answers.
They are mostly the remnants of the classical control times. The main reason why they are so famous is because of their simple Laplace transform terms. 1,1/s and 1/s^2. Then you can multiply these with the plant and you would get the Laplace transform of the output.
Back in the day, what you only had was partial fraction expansion and Laplace transform tables to get an idea what the response would look like. And today, you can basically simulate whatever input you like. So they are not really neeeded which is the answer to your question.
But since people used these signals so often they have spotted certain properties. For example, step response is good for assessing the transients and the steady state tracking error value, ramp response is good for assessing (reference) following error (which introduces double integrators) and so on. Hence, some consider these signals as the characteristic functions though it is far from the truth. Especially, you should keep in mind that, just because the these responses are OK, the system is not necessarily stable.
However, keep in mind that these are extremely primitive ways of assessing the system. Currently, they are taught because they are good for giving homeworks and making people acquainted with Simulink etc.
They are used to determine system characteristics. If you are studying a system of differential equations you would want to know different characteristics from the response of the system from these kinds of inputs since these inputs are the very fundamental ones. For example a system whose output blows up for a pulse input is unstable, and you would not want to have such a system in real life(except in rare situations). It's too difficult for me to explain it all in an answer, you should start with this wiki page.

Why does GlobalSearch return different solutions each run?

When running the GlobalSearch solver on a nonlinear constrained optimization problem I have, I often get very different solutions each run. For the cases that I have an analytical solution, the numerical results are less dispersed than the non-analytical cases but are still different each run. It would be nice to get the same results at least for these analytical cases so that I know the optimization routine is working properly. Is there a good explanation of this in the Global Optimization Toolbox User Guide that I missed?
Also, why does GlobalSearch use a different number of local solver runs each run?
Thanks!
A full description of how the GlobalSearch algorithm works can be found Here.
In summary the GlobalSearch method iteratively performs convex optimization. Basically it starts out by using fmincon to search for a local minimum near the initial conditions you have provided. Then a bunch of "trial points", based on how good the initial result was, are generated using the "scatter search algorithm." Then there is some more convex optimization and rating of "how good" the minima around these points are.
There are a couple of things that can cause the algorithm give you different answers:
1. Changing the initial conditions you give it
2. The scatter search algorithm itself
The fact that you are getting different answers each time likely means that your function is highly non-convex. The best thing that I know of that you can do in this scenario is just to try the optimization algorithm at several different initial conditions and see what result you get back the most frequently.
It also looks like there is something called the 'PlotFcns' property which would allow you get a better idea what the functions the solver is generating for you look like.
You can use the ga or gamulti objective functions within the GlobalSearch api. I would recommend this. Convex optimizers wont be able to solve a non-linear problem. Even then Genetic Algorithms dont gaurantee the solution. If you run the ga and then use its final minimum as the start of your fmincon search then it should result in the same answer consistently. There may be better ones but if the search space is unknown you may never know.

Choose the right classification algorithm. Linear or non-linear? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I find this question a little tricky. Maybe someone knows an approach to answer this question. Imagine that you have a dataset(training data) which you don't know what it is about. Which features of training data would you look at in order to infer classification algorithm to classify this data? Can we say anything whether we should use a non-linear or linear classification algorithm?
By the way, I am using WEKA to analyze the data.
Any suggestions?
Thank you.
This is in fact two questions in one ;-)
Feature selection
Linear or not
add "algorithm selection", and you probably have three most fundamental questions of classifier design.
As an aside note, it's a good thing that you do not have any domain expertise which would have allowed you to guide the selection of features and/or to assert the linearity of the feature space. That's the fun of data mining : to infer such info without a priori expertise. (BTW, and while domain expertise is good to double-check the outcome of the classifier, too much a priori insight may make you miss good mining opportunities). Without any such a priori knowledge you are forced to establish sound methodologies and apply careful scrutiny to the results.
It's hard to provide specific guidance, in part because many details are left out in the question, and also because I'm somewhat BS-ing my way through this ;-). Never the less I hope the following generic advice will be helpful
For each algorithm you try (or more precisely for each set of parameters for a given algorithm), you will need to run many tests. Theory can be very helpful, but there will remain a lot of "trial and error". You'll find Cross-Validation a valuable technique.
In a nutshell, [and depending on the size of the available training data], you randomly split the training data in several parts and train the classifier on one [or several] of these parts, and then evaluate the classifier on its performance on another [or several] parts. For each such run you measure various indicators of performance such as Mis-Classification Error (MCE) and aside from telling you how the classifier performs, these metrics, or rather their variability will provide hints as to the relevance of the features selected and/or their lack of scale or linearity.
Independently of the linearity assumption, it is useful to normalize the values of numeric features. This helps with features which have an odd range etc.
Within each dimension, establish the range within, say, 2.5 standard deviations on either side of the median, and convert the feature values to a percentage on the basis of this range.
Convert nominal attributes to binary ones, creating as many dimensions are there are distinct values of the nominal attribute. (I think many algorithm optimizers will do this for you)
Once you have identified one or a few classifiers with a relatively decent performance (say 33% MCE), perform the same test series, with such a classifier by modifying only one parameter at a time. For example remove some features, and see if the resulting, lower dimensionality classifier improves or degrades.
The loss factor is a very sensitive parameter. Try and stick with one "reasonnable" but possibly suboptimal value for the bulk of the tests, fine tune the loss at the end.
Learn to exploit the "dump" info provided by the SVM optimizers. These results provide very valuable info as to what the optimizer "thinks"
Remember that what worked very well wih a given dataset in a given domain may perform very poorly with data from another domain...
coffee's good, not too much. When all fails, make it Irish ;-)
Wow, so you have some training data and you don't know whether you are looking at features representing words in a document, or genese in a cell and need to tune a classifier. Well, since you don't have any semantic information, you are going to have to do this soley by looking at statistical properties of the data sets.
First, to formulate the problem, this is more than just linear vs non-linear. If you are really looking to classify this data, what you really need to do is to select a kernel function for the classifier which may be linear, or non-linear (gaussian, polynomial, hyperbolic, etc. In addition each kernel function may take one or more parameters that would need to be set. Determining an optimal kernel function and parameter set for a given classification problem is not really a solved problem, there are only useful heuristics and if you google 'selecting a kernel function' or 'choose kernel function', you will be treated to many research papers proposing and testing various approaches. While there are many approaches, one of the most basic and well travelled is to do a gradient descent on the parameters-- basically you try a kernel method and a parameter set , train on half your data points and see how you do. Then you try a different set of parameters and see how you do. You move the parameters in the direction of best improvement in accuracy until you get satisfactory results.
If you don't need to go through all this complexity to find a good kernel function, and simply want an answer to linear or non-linear. then the question mainly comes down to two things: Non linear classifiers will have a higher risk of overfitting (undergeneralizing) since they have more dimensions of freedom. They can suffer from the classifier merely memorizing sets of good data points, rather than coming up with a good generalization. On the other hand a linear classifier has less freedom to fit, and in the case of data that is not linearly seperable, will fail to find a good decision function and suffer from high error rates.
Unfortunately, I don't know a better mathematical solution to answer the question "is this data linearly seperable" other than to just try the classifier itself and see how it performs. For that you are going to need a smarter answer than mine.
Edit: This research paper describes an algorithm which looks like it should be able to determine how close a given data set comes to being linearly seperable.
http://www2.ift.ulaval.ca/~mmarchand/publications/wcnn93aa.pdf

How to optimize neural network by using genetic algorithm?

I'm quite new with this topic so any help would be great. What I need is to optimize a neural network in MATLAB by using GA. My network has [2x98] input and [1x98] target, I've tried consulting MATLAB help but I'm still kind of clueless about what to do :( so, any help would be appreciated. Thanks in advance.
Edit: I guess I didn't say what is there to be optimized as Dan said in the 1st answer. I guess most important thing is number of hidden neurons. And maybe number of hidden layers and training parameters like number of epochs or so. Sorry for not providing enough info, I'm still learning about this.
If this is a homework assignment, do whatever you were taught in class.
Otherwise, ditch the MLP entirely. Support vector regression ( http://www.csie.ntu.edu.tw/~cjlin/libsvm/ ) is much more reliably trainable across a broad swath of problems, and pretty much never runs into the stuck-in-a-local-minima problem often hit with back-propagation trained MLP which forces you to solve a network topography optimization problem just to find a network which will actually train.
well, you need to be more specific about what you are trying to optimize. Is it the size of the hidden layer? Do you have a hidden layer? Is it parameter optimization (learning rate, kernel parameters)?
I assume you have a set of parameters (# of hidden layers, # of neurons per layer...) that needs to be tuned, instead of brute-force searching all combinations to pick a good one, GA can help you "jump" from this combination to another one. So, you can "explore" the search space for potential candidates.
GA can help in selecting "helpful" features. Some features might appear redundant and you want to prune them. However, say, data has too many features to search for the best set of features by some approaches such as forward selection. Again, GA can "jump" from this set candidate to another one.
You will need to find away to encode the data (input parameters, features...) fed to GA. For finding a set of input paras or a good set of features, I think binary encoding should work. In addition, choosing operators for GA to reproduce offsprings is also important. Yet GA needs to be tuned, too (early stopping which can also be applied to ANN).
Here are just some ideas. You might want to search for more info about GA, feature selection, ANN pruning...
Since you're using MATLAB already I suggest you look into the Genetic Algorithms solver (known as GATool, part of the Global Optimization Toolbox) and the Neural Network Toolbox. Between those two you should be able to save quite a bit of figuring out.
You'll basically have to do 2 main tasks:
Come up with a representation (or encoding) for your candidate solutions
Code your fitness function (which basically tests candidate solutions) and pass it as a parameter to the GA solver.
If you need help in terms of coming up with a fitness function, or encoding of candidate solutions then you'll have to be more specific.
Hope it helps.
Matlab has a simple but great explanation for this problem here. It explains both the ANN and GA part.
For more info on using ANN in command line see this.
There is also plenty of litterature on the subject if you google it. It is however not related to MATLAB, but simply the results and the method.
Look up Matthew Settles on Google Scholar. He did some work in this area at the University of Idaho in the last 5-6 years. He should have citations relevant to your work.