Do GCC or similar compilers perform optimizations that are aimed at improving the numerical stability of floating-point operations?
It is known that seemingly simple operations like addition or computing the norm of a vector are numerically unstable if implemented in the obvious manner, and, on the other hand, compilers sometimes destroy work-arounds for these problems for the sake of speed optimization.
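To make the kind of instability (and the kind of work-around a compiler can destroy) concrete, here is a small Python illustration; the `kahan_sum` helper is just for exposition:

```python
import math

# Cancellation: adding many tiny terms to a large one in the obvious way loses
# them; Kahan (compensated) summation keeps track of the lost part.
def kahan_sum(xs):
    s, c = 0.0, 0.0                 # running sum and compensation term
    for x in xs:
        y = x - c
        t = s + y
        c = (t - s) - y             # algebraically zero; a compiler that assumes
        s = t                       # associativity may optimize the fix away
    return s

xs = [1.0] + [1e-16] * 10**6
print(sum(xs))                      # 1.0 -- every tiny term is rounded away
print(kahan_sum(xs))                # ~1.0000000001
print(math.fsum(xs))                # 1.0000000001 (exactly rounded reference)

# Overflow: the obvious 2-norm overflows, the scaled (hypot-style) one does not.
v = [1e200, 1e200]
print(math.sqrt(sum(x * x for x in v)))   # inf, because x*x overflows
print(math.hypot(*v))                     # ~1.414e200
```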
What is the state of the art of optimizing compiler output for numerically stable computation? Is anything better pending?
Oracle (formerly Sun) compilers for Linux and Solaris: their C++ and Fortran compilers have support for interval arithmetic.
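For readers unfamiliar with it: interval arithmetic replaces each rounded value by a rigorous lower/upper enclosure. A rough Python sketch of the idea using mpmath's iv context (not the Oracle compilers themselves):

```python
from mpmath import iv

iv.dps = 15                    # working precision (decimal digits)

x = iv.mpf(1) / iv.mpf(3)      # a guaranteed enclosure of 1/3
print(x)                       # prints a [lower, upper] bracket around 1/3

# Every operation widens the bracket just enough to stay rigorous, so the
# printed bounds after the loop enclose the true sum despite rounding errors.
s = iv.mpf(0)
for _ in range(100_000):
    s += x
print(s)
```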
In order to make my model simulations in Modelica run faster, I am asking the following question:
What impacts simulation runtime in Modelica?
I will appreciate any help.
Edit: More details can be found in my book "Modelica by Application -- Power Systems" (URL).
What impacts the runtime performance?
I. Applied compilation techniques
Naturally, object-oriented Modelica models, even trivial ones, correspond to a large-scale system of equations. Modelica simulation environments usually optimize such generated models:
reduce the number of equations by removing trivial ones (i.e. alias equations),
decompose a large block of the equation system with the so-called BLT transformation into smaller cascaded blocks of equation systems that can be solved faster in a sequential manner rather than as a single block of equations (a toy sketch of this decomposition is given at the end of this section),
solve so-called large algebraic loops using tearing methods.
It can theoretically even go so far as to attempt to solve blocks of equations analytically, where possible, instead of conducting expensive numerical integration.
Thus, the runtime performance is influenced by the underlying Modelica compiler and how far it exploits equation-based compiler methods. Usually some extra settings need to be activated to exploit all such techniques, so digging through the documentation to enable them is needed.
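As a toy illustration of what a BLT-style decomposition does (this is only a sketch of the block structure, not how a real Modelica compiler is implemented; the four equations f1..f4 and their matched unknowns are made up):

```python
import networkx as nx

# Hypothetical system of 4 equations f1..f4, each already matched to one unknown.
# An edge j -> i means: equation i uses the unknown that equation j solves for,
# so j must be solved before i.
G = nx.DiGraph()
G.add_edges_from([
    ("f2", "f1"),          # f1 needs x2, which f2 solves
    ("f2", "f4"),
    ("f3", "f4"),          # f3 and f4 need each other's unknowns:
    ("f4", "f3"),          # they form an algebraic loop
])

# Strongly connected components are the BLT blocks; the condensation is a DAG
# whose topological order is the sequence in which the blocks can be solved.
cond = nx.condensation(G)
blocks = [sorted(cond.nodes[b]["members"]) for b in nx.topological_sort(cond)]
print(blocks)              # e.g. [['f2'], ['f1'], ['f3', 'f4']] -- the loop stays one block
```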
II. The nature of the model
The nature of the model would influence the runtime performance, particularly:
Is the model a large-scale system or a small-scale one?
Is it strongly nonlinear or semi-linear?
Is the optimized equation system corresponding to the model sparse (i.e. a large set of equations, each involving few variables, e.g. power system network models) or dense (e.g. multibody systems and biochemical networks)?
Is it a stiff system (e.g. a system where some subsystems exhibit very fast dynamics and others very slow dynamics)?
Does the system exhibit a large number of state events?
...
III. The choice of the solver
The mentioned characteristics of a given model typically influence the ideal choice of the solver, and the solver can largely influence the runtime performance (and accuracy). A strategy for choosing the solver could proceed in the following order:
For a non-stiff, weakly nonlinear model, the ideal choice would be an explicit method, e.g. a single-step Runge-Kutta or a multi-step Adams-Bashforth method of higher order. If accuracy is less significant, one can attempt an explicit method of lower order, which executes faster. Naturally, increasing the solver error tolerance also speeds up the simulation.
However, it could happen, particularly for large-scale systems, that numerical stability is more difficult to guarantee. Then, smaller solver step sizes (and/or smaller error tolerances) should be attempted for explicit solvers. In this case, an implicit solver with a larger error tolerance can be comparable to an explicit solver with a smaller tolerance.
Actually, it is wise to try both classes of methods, compare the accuracy of the results, and figure out whether explicit methods produce comparably accurate results. As a warning, though, this is just a heuristic, since the system does not necessarily have the same behavior over the entire space of admissible parameter values.
For increasing nonlinearity of the model, the choice tends more towards modern solvers making use of variable step-size techniques. Here I would start with implicit variable-step Runge-Kutta (i.e. single-step) methods and/or the implicit variable-step multi-step Adams-Moulton methods. For both of these classes, one can enlarge the solver tolerance and/or lower the solver error order and check whether the simulation still produces comparably accurate solutions (but with a faster runtime).
Implementations of the previous classes of methods are usually less conservative with error control. Therefore, for increasing stiffness of the model or badly scaled models, the choice tends more towards modern solvers implementing the numerically more stable backward differentiation formulas (BDF), such as DASSL, CVODE and IDA. These solvers (can) also make use of the Jacobian of the system for adaptive step-size control.
A modern solver like LSODAR, which switches between explicit and implicit solvers and also performs automatic error-order control (switching between different orders), is a good choice if one does not know much about the behavior of the model. Maybe some Modelica environments provide an advanced solver making use of such automatic switching. However, if one knows the behavior of the model in advance, it is also wise to consider the other suggested methods, since LSODAR may not perform the most optimal switching when needed.
x. ...
The comparisons between solvers from classes 3, 4 and 5 are not straightforward to judge; they also depend on whether the system is continuous or hybrid, i.e. on the underlying root-finding algorithms.
Usually DASSL can be slower, as it is more conservative with step-size/error control, so IDA and others tend to be faster. Some published works exist that give some intuition regarding such comparisons. It would be nice to have a Modelica library including all possible types of models and running all possible benchmarks w.r.t. accuracy and runtime to draw more solver/model-specific conclusions. A library that could be used and extended for such a purpose is the ScalableTestSuite Modelica library.
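As a minimal, tool-agnostic illustration of the explicit-vs-implicit trade-off discussed above, here is a SciPy sketch on the (stiff) Van der Pol oscillator; the model, tolerances and step counts are only placeholders, and Modelica tools use their own solver implementations:

```python
from scipy.integrate import solve_ivp

# Van der Pol oscillator; larger mu makes the problem stiffer.
mu = 100.0
def vdp(t, y):
    return [y[1], mu * (1.0 - y[0] ** 2) * y[1] - y[0]]

for method in ["RK45", "LSODA", "BDF"]:      # explicit / switching / implicit
    sol = solve_ivp(vdp, (0.0, 200.0), [2.0, 0.0], method=method,
                    rtol=1e-6, atol=1e-9)
    print(f"{method:6s} steps: {sol.t.size:7d}  success: {sol.success}")
```

The explicit method is forced to take many small steps by stability, not accuracy, which is exactly why implicit BDF-type solvers pay off on stiff models.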
IV. Advanced aspects
There have been some published works in the Modelica community on making use of sparse solvers to exploit the expected sparsity of the Jacobian. If such a feature is provided by the simulation environment, it usually significantly improves the runtime performance of large-scale models.
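To give a feeling for how much exploiting Jacobian sparsity can matter (outside any particular Modelica tool), here is a small SciPy sketch; the heat-equation model and the grid size are only illustrative stand-ins:

```python
import numpy as np
from scipy.sparse import diags
from scipy.integrate import solve_ivp

# 1-D heat equation on N grid points: large, stiff, with a tridiagonal Jacobian.
N = 2000
def rhs(t, u):
    du = np.empty_like(u)
    du[1:-1] = u[:-2] - 2.0 * u[1:-1] + u[2:]
    du[0] = -2.0 * u[0] + u[1]
    du[-1] = u[-2] - 2.0 * u[-1]
    return du

u0 = np.exp(-np.linspace(-5.0, 5.0, N) ** 2)
pattern = diags([1, 1, 1], [-1, 0, 1], shape=(N, N))   # Jacobian sparsity pattern

# Telling the implicit solver about the sparsity lets it use cheap sparse linear
# algebra instead of dense factorizations of a 2000x2000 Jacobian.
sol = solve_ivp(rhs, (0.0, 1.0), u0, method="BDF", jac_sparsity=pattern)
print(sol.success, sol.t.size)
```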
For models with a massive number of events, numerical integration in the standard way can be extremely inefficient. Particularly challenging is when an event is triggered and further sets of state events are triggered in turn, so a queue of state events has to be evaluated. The root-finding algorithm can trigger yet more events, and the solver can hang in a so-called chattering situation. There are advanced strategies for such situations, so-called sliding-mode handling, but I am not sure how far Modelica simulation environments handle this issue.
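To see why a massive number of events hurts, here is a small SciPy sketch (a bouncing ball, not a Modelica model): each state event is located by root finding, terminates the integration and forces a restart, and as impacts accumulate these restarts dominate the cost:

```python
from scipy.integrate import solve_ivp

# Bouncing ball: every impact is a state event located by the solver's
# root-finding, and the integration has to be stopped and restarted each time.
def fall(t, y):                      # y = [height, velocity]
    return [y[1], -9.81]

def hit_ground(t, y):
    return y[0]
hit_ground.terminal = True           # stop the integration at the event
hit_ground.direction = -1            # only when the ball is moving downwards

t, y, e = 0.0, [1.0, 0.0], 0.85      # e = coefficient of restitution
for bounce in range(20):             # each event forces a full solver restart
    sol = solve_ivp(fall, (t, t + 10.0), y, events=hit_ground)
    if sol.t_events[0].size == 0:
        break
    t = sol.t_events[0][0]
    y = [0.0, -e * sol.y_events[0][0][1]]
print("approximate time of the 20th impact:", t)
```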
One set of suggested solutions (also for systems with a high degree of stiffness) is to employ so-called QSS (quantized state system) methods. These would be significantly beneficial particularly for models that cannot be solved using explicit solvers. There are both explicit and implicit QSS methods. There have also been other worth-trying numerical integration strategies where only a subset of the entire equation system is evaluated when approximating a state event. Here I am not sure about the availability of such solvers.
Some simulation environments differentiate between two simulation modes which can influence the simulation runtime: ODE mode and DAE mode. In the first mode, the system is reduced to an ODE system with potentially additional cascaded blocks of nonlinear equation systems. In DAE mode, the system is reduced to a DAE system of index one. The former mode is beneficial for dense systems exhibiting such large cascaded blocks of nonlinear equations, which are solved using so-called tearing methods instead of numerical integration. DAE mode is beneficial for large-scale sparse systems solved using sparse solvers. I think ODE mode is usually activated by choosing CVODE or LSODAR, while DAE mode is activated by choosing IDA or DASSL, but digging through the documentation here is also recommended.
There are also some published works regarding so-called multirate numerical integration solvers. Here, in each numerical integration step, only the numerically significant portion of the equation system is integrated, not the entire system. Hence, this is significantly beneficial for large-scale stiff systems.
x. ...
V. Parallelization
Obviously, making use of multicore CPUs / GPUs for executing numerical integration in parallel, among other approaches to parallelization, can speed up computations.
VI. Quite advanced topics
Some excellent research efforts can be exploited for speeding up the simulation runtime of large-scale (loosely coupled) hybrid networked models, so I am listing this here as well. Speed-up can be obtained by making use of hybrid paradigms, the agent-based modeling paradigm and/or the multimode paradigm. The underlying idea is that a loosely coupled system can be described as several smaller subsystems, with communication among the subsystems conducted only when necessary. This can be beneficial, and the reasons can be traced by searching for relevant publications. There has been some excellent work in some of the mentioned directions, and it is worth continuing it where it has stopped.
Remark: Not every mentioned solver is necessarily present in all Modelica simulation environments. If a solver is not provided as a choice, one can still produce an FMU-ME (Functional Mock-up Unit for Model Exchange) and write code that numerically integrates this FMU with the desired solver.
Warning: Some of the above aspects are based on personal experience with particular types of models and are not necessarily true for all model types.
A few suggested readings (I am definitely missing a lot of key publications):
F. Casella, Simulation of Large-Scale Models in Modelica: State of the Art and Future Perspectives, Modelica 2016
Liu Liu, Felix Felgner and Georg Frey, Comparison of 4 numerical solvers for stiff and hybrid systems simulation, Conference 2010
Willi Braun, Francesco Casella and Bernhard Bachmann, Solving large-scale Modelica models: new approaches and experimental results using OpenModelica, Modelica 2017
Erik Henningsson and Hans Olsson and Luigi Vanfretti, DAE Solvers for Large-Scale Hybrid Models, Modelica 2019
Tamara Beltrame and François Cellier, Quantised state system simulation in Dymola/Modelica using the DEVS formalism, Modelica 2006
Victorino Sanz and Federico Bergero and Alfonso Urquia, An approach to agent-based modeling with Modelica, Simpra 2010
Is there any maximum limit on the number of decision variables in the scipy linear programming module (minimization) in Python? If so, can it be extended to 10000 decision variables? If scipy is limited in the number of decision variables, is there any other software which can be installed in Python so that I can proceed?
The original scipy simplex LP solver was intended only for very small problems. The newer scipy interior-point solver can handle larger problems more reliably. Also make sure to pass A_eq and/or A_ub as sparse matrices; if you don't do this, you may run out of memory.
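As a rough sketch of a 10000-variable LP with a sparse constraint matrix (the random data is only a placeholder; method="highs" refers to the HiGHS solvers shipped in newer SciPy versions, which also accept sparse input):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.optimize import linprog

# A 10,000-variable LP:  minimize -c.x  subject to  A_ub x <= b_ub,  0 <= x <= 1
rng = np.random.default_rng(0)
n, m = 10_000, 2_000
c = rng.random(n)
A_ub = sparse_random(m, n, density=0.001, random_state=rng, format="csr")
b_ub = np.full(m, 5.0)

res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1), method="highs")
print(res.status, res.fun)
```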
Having said this, I would be more comfortable with LP solvers that have seen more large, sparse problems than scipy. Most LP solvers have a Python interface.
Finally, larger problems are often (but not always) more complex and it may help to use a modeling tool. This will allow you to express the problem in a more natural way than using matrices. For Python there is PuLP and Pyomo (among others). Some commercial solvers also provide excellent modeling tools.
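To illustrate the modeling-tool point, a minimal PuLP sketch (the diet-style data is made up):

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpStatus

# Tiny diet-style LP written against the model, not against matrices.
foods = ["bread", "milk", "eggs"]
cost = {"bread": 2.0, "milk": 3.5, "eggs": 4.0}
protein = {"bread": 4.0, "milk": 8.0, "eggs": 13.0}

prob = LpProblem("diet", LpMinimize)
x = {f: LpVariable(f"x_{f}", lowBound=0) for f in foods}

prob += lpSum(cost[f] * x[f] for f in foods)             # objective: total cost
prob += lpSum(protein[f] * x[f] for f in foods) >= 50    # protein requirement
prob += x["milk"] <= 3                                   # at most 3 units of milk

prob.solve()                                             # default CBC solver
print(LpStatus[prob.status], {f: x[f].value() for f in foods})
```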
I have been learning about logic circuits and computer architecture, including assembly instruction sets (such as the x86 and ARM instruction sets) and microarchitecture (x86/ARM). I found that both Intel and ARM processors can only do addition, subtraction, multiplication and division in hardware, because they only contain adders, subtracters, multipliers and dividers as basic arithmetic units.
But do these processors support more advanced math computations like trigonometric functions, exponential functions, power functions, their derivatives/definite integrals, or even matrix computations?
I know these advanced math computations can be done in software (like Python's NumPy/SciPy), but can Intel/ARM processors support these advanced math computations in hardware just like addition/subtraction/multiplication/division?
Generally speaking, you can build hardware structures to help accelerate the calculation of things such as trigonometric functions. However, in practice, it's pointless, because it's not a good use of hardware resources.
There is a paper from 1983 on how trigonometric functions were implemented on the 8087 floating-point co-processor (Implementation of transcendental functions on a numerics processor). Even there, they rely on a CORDIC implementation, which is a method of calculating trig functions using relatively basic hardware (add/sub/shift/table look-up). You can read more about CORDIC implementations in the following paper: Evaluating Elementary Functions in a Numerical Coprocessor Based on Rational Approximations
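For intuition, here is a simplified Python sketch of the rotation-mode CORDIC iteration described in those papers (double precision instead of the fixed-point arithmetic real hardware would use):

```python
import math

# Rotation-mode CORDIC: in hardware this needs only add/subtract, bit shifts
# (the multiplications by 2**-i) and a small table of arctangents.
N = 32
ANGLES = [math.atan(2.0 ** -i) for i in range(N)]
K = 1.0
for i in range(N):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))      # accumulated CORDIC gain

def cordic_cos_sin(theta):
    """Approximate (cos(theta), sin(theta)) for |theta| <= pi/2."""
    x, y, z = K, 0.0, theta
    for i in range(N):
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x, y

print(cordic_cos_sin(0.5))
print(math.cos(0.5), math.sin(0.5))
```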
On a modern x86 processor, complex instructions like FCOS are implemented using microcode. Intel doesn't like to talk about their microcoded instructions, but you can find a paper from AMD that describes this particular use of microcode: The K5 Transcendental Functions
Intel processors do support trigonometric functions and many other advanced computations.
According to "Professional Assembly Language" by Richard Blum
Since the 80486, the Intel IA-32 platform has directly supported floating-point operations.
The FPU (Floating Point Unit) supports many advanced functions beyond simple add, subtract, multiply and divide.
They include:
Absolute value FABS
Change sign FCHS
Cosine FCOS
Partial Tangent FPTAN
etc.
This page of IMSL says
To obtain improved performance we recommend linking with High
Performance versions of LAPACK and BLAS, if available.
What are High Performance versions of LAPACK and BLAS?
There are plenty of good implementations to pick from:
Intel MKL is likely the best on Intel machines. It's not free though, so that may be a problem.
According to their benchmark, OpenBLAS compares quite well with Intel MKL and is free
Eigen is also an option and has a largish (albeit old) benchmark showing good performance on small matrices (though it's not technically a drop-in BLAS library)
ATLAS, OSKI and POSKI are examples of auto-tuned kernels which claim to work on many architectures
Generally, it is quite hard to pick one of these without benchmarking because:
some implementations work better on different types of matrices. For example Eigen works better on matrices with small rank (100s)
some are optimised for specific architectures (e.g. Intel's)
in some cases the multithreading of the BLAS library may conflict with a multithreaded application (e.g. OpenBLAS)
developers' benchmarks may tend to emphasise cases which work better on their implementation.
I would suggest picking one or two of these libraries that apply to your use case and benchmarking them for your particular application on your particular (or a similar) machine. This is quite easy to do even after compiling your code.
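As a hedged illustration of how simple such a benchmark can be (here via NumPy, which dispatches matrix products to whatever BLAS it was built against; sizes and numbers are only placeholders):

```python
import time
import numpy as np

np.show_config()           # shows which BLAS/LAPACK NumPy was built against

n = 2000
A = np.random.rand(n, n)
B = np.random.rand(n, n)

t0 = time.perf_counter()
C = A @ B                  # dispatched to the underlying BLAS dgemm
dt = time.perf_counter() - t0
print(f"{n}x{n} matmul: {dt:.2f} s, ~{2 * n**3 / dt / 1e9:.1f} GFLOP/s")
```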
LAPACK and BLAS are performance libraries that provide basic linear algebra operations, e.g. for solving systems of linear equations. You can find such libraries useful in computer vision, for example (object detection and classification), classical algorithms, modelling, ...
TASKING provides a full C implementation of the LAPACK and BLAS performance libraries; both are ISO C99 compliant with full documentation and examples. You can check it here:
http://www.tasking.com/products/tasking-lapack-performance-libraries
I'm doing some numerical estimation and correction with the Kalman filter, and would like to better estimate my parameters of Q and R, preferably dynamically.
http://en.wikipedia.org/wiki/Kalman_filter#Estimation_of_the_noise_covariances_Qk_and_Rk
That article mentions that GNU Octave is currently the best way of determining these parameters from data:
http://en.wikipedia.org/wiki/GNU_Octave#C.2B.2B_integration
Unfortunately it is written for Matlab, and there's supposedly a C++ implementation. I'm very weak in C++ and would not even know how to import a C++ library and link it properly in XCode. All of my C++ libraries to date have been wrapped in 3rd party Objective-C classes.
Has anyone used the C++ implementation for scientific computing or engineering applications on iPhone? I'd appreciate any pointers or tutorials on how to do this kind of analysis with Objective-C.
Additional keywords:
estimating covariance from data
Autocovariance Least-Squares (ALS) technique
noise covariance
Thank you!
I do not know of any such C++ library. If you fancy doing numerical analysis on iOS, the best way to go is the Accelerate framework, specifically (from this description):
Linear Algebra: LAPACK and BLAS
The Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package
(LAPACK) libraries contain—as you would expect—functions to perform
linear algebra computations such as solving simultaneous linear
equations, least squares solutions of linear equations, and eigenvalue
problems. The BLAS library serves as a building block for the LAPACK
library. The BLAS and LAPACK libraries are widely distributed and
industry standard computational libraries. They are available on a
number of different platforms and architectures. So, if you are
already using these libraries you should feel right at home, as the
APIs are exactly the same on Mac OS X.
You'll need a fairly good grounding in C, pointers, arrays and such, though; no way around it, I feel. There is a detailed description of how to use these linear algebra primitives to implement Kalman filtering (although it is using R, so probably not of much use to you).
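For reference, the linear-algebra steps of one Kalman predict/update cycle look like this; the sketch below is NumPy rather than Objective-C, but every operation maps onto the BLAS/LAPACK routines Accelerate provides (the constant-velocity model and the Q/R values are made up):

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle, written only with the kind of linear algebra
    primitives that BLAS/LAPACK (and hence Accelerate) provide."""
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R                       # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)            # gain (use a solve in practice)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# 1-D constant-velocity example with made-up Q and R.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), np.array([[0.5]])
x, P = np.zeros(2), np.eye(2)
for z in [1.1, 2.0, 2.9, 4.2]:
    x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
print(x)
```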
This is a SO post on Kalman Filtering which expressed my opinion quite well. I'm afraid I think the chances of finding a magic Objective-C wrapper for Kalman Filtering are fairly low, though I would be very happy to be proven wrong!