Some general Modelica advice?
We've built a model with ~2000 equations and three vectors of input from measured data. Using OpenModelica, attempts at simulation have begun to hang in the translation stage (which runs for hours where it used to take less than a minute) and now I regularly "lose connection to omc.exe." Is there perhaps something cumulative occurring that's degrading translation/compilation performance?
In general, are there any good rules of thumb for keeping simulations lighter and faster? I realize that, depending on the couplings, additional equations could be exponentially increasing the size of the resulting system of equations - could this be a problem?
Thanks for your thoughts!
It shouldn't take that long. Seems like a bug.
You can report this bug here:
https://trac.openmodelica.org/OpenModelica (New Ticket).
If your model is public you can post it there, if not you can contact the OpenModelica team privately.
I did some cleaning in the code; and got the part that repeats 12x (the module) down to ~180 equations; in the process I reduced the size of my input vectors (and also a 2D look-up table the module refers to) by quite a bit - they're both down to a few hundred values. It's working now--simulations run in reasonable time, a few minutes each.
Since all these tables were defined within Modelica functions (as you pointed out, Mr. Tiller) perhaps shrinking them helped to improve the performance. I had assumed that all that data just got spread out in a memory array, without going through any real processing, but maybe that's not the case...time to know more about what's going on under the hood in this environment (as always).
Thanks for the help!
Related
I have a problem that I don't have enough training data for my NN. It is trying to predict the result of a soccer game given the last games which I woulf say is a regression task.
The training data are results of soccer games of the last 15 seasons (which are about 4500 games). Getting to new data would be hard and would take a lot of time.
What should I do now?
Is it good to duplicate the data?
Should I input randomized data? (Maybe noise but I'm not quite sure what that is)
If there is no way of creating more data,
I should probably turn up the learning rate right? (I have it sitting at 0.01 and the momentum at 0.9)
I am using mini batches consisting of 32 training datas in training. Since I don't have a lot of training I don't have a lot of mini batches. Should I stop using them?
To start from the beginning: This is a very theoretical question and is not directly related to programming, which I recommend (in future) to post over at the Data Science Stackexchange.
To go into your problem: 4500 samples is not as bad as it sounds, depending on the exact task at hand. Are you trying to predict the match results (i.e. which team is the winner?), are you looking for more specific predictions (across a lot of different, specific teams)?
If you can make sure that you have a reasonable amount of data per class, one can work with a number of samples lower than what you have. Simply duplicating the data will not help you much, since you are very likely to just overfit on the samples you are seeing, without much of an improvement; Or rather, you will get the same results as training over a longer period (since essentially you see every sample twice per epoch, instead of one).
Again, what usually happens after long training periods is overfitting, so nothing gained here.
Your second suggestion is generally called data augmentation. Instead of simply copying samples, you alter them enough to make it look "different" to the network. But be careful! Data augmentation works well for some inputs, like images, since the change in input is significant enough to not represent the same sample, but still contains meaningful information about the class (a horizontally mirrored image of a cat still shows a "valid cat", unlike a vertically mirrored image, which is more unrealistic in the real world).
Essentially, it depends on your input features to determine where it makes sense to add noise. If you are only changing the results of the previous game, a minor change in input (adding/subtracting one goal at random) can significantly change the prediction you make.
If you slightly scramble ELO scores by a random number, on the other hand, the input value will not be too different, "but different enough" to use it as a novel example.
Turning up the learning rate is not a good idea, since you are essentially just letting the network converge more towards the specific samples. On the contrary, I would argue that the current learning rate is still too high, and you should certainly not increase it.
Regarding mini batches, I think I have referenced this a million times now, but always consider smaller minibatches. From a theoretical point of view, you are more likely to converge to a local minimum.
I have run across issues in developing models where the translation time (simulates quickly but takes far too long to translate) has become a serious issue and could use some insight so I can look into resolving this.
So the question is:
What are some of the primary factors that impact the translation time of a model and ideas to address the issue?
For example, things that may have an impact:
for loops vs a vectorized method - a basic model testing this didn't seem to impact anything
using input variables vs parameters
impact of annotations (e.g., Evaluate=true)
or tough luck, this is tool dependent (Dymola, OMEdit, etc.) :(
use of many connect() - this seems to be a factor (perhaps primary) as it forces translater to do all the heavy lifting
Any insight is greatly appreciated.
Clearly the answer to this question if naturally open ended. There are many things to consider when computation times may be a factor.
For distributed models (e.g., finite difference) the use of simple models and then using connect equations to link them in the appropriate order is not the best way to produce the models. Experience has shown that this method significantly increases the translation time to unbearable lengths. It is better to create distributed models in the same approach that is used the MSL Dynamic pipe (not exactly like it but similar).
Changing the approach as described is significantly faster in translational time (orders of magnitude for larger models, >~100,000 equations) than using connect statements as the number of distributed elements increases to larger numbers. This was tested using Dymola 2017 and 2017FD01.
Some related materials pointed out by others that may be useful for more information have been included below:
https://modelica.org/events/modelica2011/Proceedings/pages/papers/07_1_ID_183_a_fv.pdf
Scalable Test Suite : https://dx.doi.org/10.3384/ecp15118459
I know it sounds strange and that's a bad way to write a question,but let me show you this odd behavior.
as you can see this signal, r5, is nice and clean. exactly what I expected from my simulation.
now look at this:
this is EXACTLY the same simulation,the only difference is that the filter is now not connected. I tried for hours to find a reason,but it seems like a bug.
This is my file, you can test it yourself disconnecting the filter.
----edited.
Tried it with simulink 2014 and on friend's 2013,on two different computers...if Someone can test it on 2015 it would be great.
(attaching the filter to any other r,r1-r4 included ''fixes'' the noise (on ALL r1-r8),I tried putting it on other signals but the noise won't go away).
the expected result is exactly the smooth one, this file showed to be quite robust on other simulations (so I guess the math inside the blocks is good) and this case happens only with one of the two''link number'' (one input on the top left) set to 4,even if a small noise appears with one ''link number'' set to 3.
thanks in advance for any help.
It seems to me that the only thing the filter could affect is the time step used in the integration, assuming you are using a dynamic time step (which is the default). So, my guess is that (if this is not a bug) your system is numerically unstable/chaotic. It could also be related to noise, caused by differentiation. Differentiating noise over a smaller time step mostly makes things even worse.
Solvers such as ode23 and ode45 use a dynamic time step. ode23 compares a second and third order integration and selects the third one if the difference between the two is not too big. If the difference is too big, it does another calculation with a smaller timestep. ode45 does the same with a fourth and fifth order calculation, more accurate, but more sensitive. Instabilities can occur if a smaller time step makes things worse, which could occur if you differentiate noise.
To overcome the problem, try using a fixed time step, change your precision/solver, or better: avoid differentiation, use some type of state estimator to obtain derivatives or calculate analytically.
so I have the following Integral that i need to do numerically:
Int[Exp(0.5*(aCosx + bSinx + cCos2x + dSin2x))] x=0..2Pi
The problem is that the output at any given value of x can be extremely large, e^2000, so larger than I can deal with in double precision.
I havn't had much luck googling for the following, how do you deal with large numbers in fortran, not high precision, i dont care if i know it to beyond double precision, and at the end i'll just be taking the log, but i just need to be able to handle the large numbers untill i can take the log..
Are there integration packes that have the ability to handle arbitrarily large numbers? Mathematica clearly can.. so there must be something like this out there.
Cheers
This is probably an extended comment rather than an answer but here goes anyway ...
As you've already observed Fortran isn't equipped, out of the box, with the facility for handling such large numbers as e^2000. I think you have 3 options.
Use mathematics to reduce your problem to one which does (or a number of related ones which do) fall within the numerical range that your Fortran compiler can compute.
Use Mathematica or one of the other computer algebra systems (eg Maple, SAGE, Maxima). All (I think) of these can be integrated into a Fortran program (with varying degrees of difficulty and integration).
Use a library for high-precision (often called either arbitray-precision or multiple-precision too) arithmetic. Your favourite search engine will turn up a number of these for you, some written in Fortran (and therefore easy to integrate), some written in C/C++ or other languages (and therefore slightly harder to integrate). You might start your search at Lawrence Berkeley or the GNU bignum library.
(Yes I know that I wrote that you have 3 options, but your question suggests that you aren't ready to consider this yet) You could write your own high-/arbitrary-/multiple-precision functions. Fortran provides everything you need to construct such a library, there is a lot of work already done in the field to learn from, and it might be something of interest to you.
In practice it generally makes sense to apply as much mathematics as possible to a problem before resorting to a computer, that process can not only assist in solving the problem but guide your selection or construction of a program to solve what's left of the problem.
I agree with High Peformance Mark that the best option here numerically is to use analytics to scale or simplify the result first.
I will mention that if you do want to brute force it, gfortran (as of 4.6, with the libquadmath library) has support for quadruple precision reals, which you can use by selecting the appropriate kind. As long as your answers (and the intermediate results!) don't get too much bigger than what you're describing, that may work, but it will generally be much slower than double precision.
This requires looking deeper at the problem you are trying to solve and the behavior of the underlying mathematics. To add to the good advice already provided by Mark and Jonathan, consider expanding the exponential and trig functions into Taylor series and truncating to the desired level of precision.
Also, take a step back and ask why you are trying to accomplish by calculating this value. As an example, I recently had to debug why I was getting outlandish results from a property correlation which was calculating vapor pressure of a fluid to see if condensation was occurring. I spent a long time trying to understand what was wrong with the temperature being fed into the correlation until I realized the case causing the error was a simulation of vapor detonation. The problem was not in the numerics but in the logic of checking for condensation during a literal explosion; physically, a condensation check made no sense. The real problem was the code was asking an unnecessary question; it already had the answer.
I highly recommend Forman Acton's Numerical Methods That (Usually) Work and Real Computing Made Real. Both focus on problems like this and suggest techniques to tame ill-mannered computations.
My question is specific to iPhone, iPod, and iPad, since I am assuming that the architecture makes a big difference. I'm hoping there is either a specification somewhere (for the various chips perhaps), or a reliable way to measure T for each specific instruction. I know I can use any number of tools to measure aggregate processor time used, memory used, etc. I want to quantify at a lower level.
So, I'm able to figure out how many times I go through the main part of the algorithm. For example, I iterate n * (n-1) times in a naive implementation, and between n (best case) and n + n * (n-1) (worst case) in another. I can also make a reasonable count of the total number of instructions (+ - = % * /, and logic statements), and I can compare those counts, but that's assuming the weight of each operation is the same. Also, I don't have any idea how to weight the actual time value of a logic statement (if, else, for, while) vs a mathematical operator... is "if" as much work as "+" each time I use it? I would love to know where to find this information.
So, for clarity, my goal is to discover how much processor time I am demanding of the CPU (or GPU or any U) so that I can design an optimal algorithm around processor time. Can someone give me an idea of where to start for iOS hardware?
Edit: This link to ClockServices.c and SIMD stuff in the developer portal might be a good start for people interested in this. A few more cups of coffee tonight and I might get through it ;)
On a modern platform, processor time isn't the only limiting factor. Often, memory access is.
Still, processor time:
Your basic approach at an estimation for the processor load is OK, though, and is sensible: Make a rough estimate of the cost based on your knowledge of typical platforms.
In this article, Table 1 shows the times for typical primitive operations in .NET. While your platform may vary, the relative time is usually very similar. Maybe you can find - or even make - one for iStuff.
(I haven't come across one so thorough for other platforms, except processor / instruction set manuals, but they deal with assembly instructions)
memory locality:
A cache miss can cost you hundreds of cycles, a disk access a thousand times as much. So controlling your memory access patterns (i.e. reducing the working set, restructuring and accessing data in a cache-friendly way) is an important part of evaluating an algorithm.
xCode has instruments to measure performance of each function/operation, you can simply use them.