Can I tell the QB64 compiler *not* to optimize my code? - compiler-optimization

I want to experiment with the efficiency of various algorithms and compiler optimization is an obstacle. Can I disable compiler optimizations in QB64?

Related

what does impact a simulation runtime in Modelica

In order to make my model simulation's in Modelica run faster am asking the following quesion :
What does impact simulation runtime in Modelica ?
i will aprecicate any help possible.
Edit: More details can be consulted from my book "Modelica by Application -- Power Systems" (URL)
What does impact the runtime performance?
I. Applied compilation techniques
Naturally, object-oriented Modelica models, even trivial ones, would correspond to a large-scale system of equations. Modelica simulation environments would usually optimize such generated models:
reduce the number of possible equations by removing trivial ones (i.e. alias equations)
decompose a large-block of equation system with so called BLT-transformation into smaller cascaded blocks of equation systems that can be solved faster in a sequential manner and not as a single block of equations,
solve s.c. large algebraic loops using tearing methods.
It can theoretically even go too far and attempt to solve blocks of equation system in an analytical manner if possible instead of conducting expensive numerical integration
Thus, the runtime performance would be influenced by the underlying Modelica compiler and how far does it exploit equation-based compiler methods. Usually some extra settings need to be activated to exploit all possible kind of such techniques. Digging the documentation to enable such settings is needed.
II. The nature of the model
The nature of the model would influence the runtime performance, particularly:
Is the model a large-scale system? or a small-scale one?
Is it strongly nonlinear or semi-linear one?
Is the resulting optimized equation system corresponding to the model sparse (i.e. large set of equations each with few number of variables, e.g. power system network models) or dense (e.g. multibody systems and biochemical networks)
Is it a stiff system? (e.g. a system with several subsystems some exhibiting very quick dynamics and others very slow dynamics)
Does the system exhibit large number of state events
...
III The choice of the solver
The mentioned characteristics of a given model would typically influence the ideal choice of the solver. The solver can largely influence the runtime performance (and accuracy). A strategy for solver choice could be made in the following order:
For a non-stiff weakly nonlinear model, the ideal choice would be an explicit method, e.g. Single-step Runga-Kutta or Multi-step Adam-Bashforth of higher order. If accuracy is less significant, one can attempt an explicit method of a lower order which would executes faster. Naturally, increasing the solver error tolerance would also speed-up the simulation.
However, it could happen, particularly for large-scale systems, that numerical stability could be more difficult to guarantee. Then, smaller solver step-sizes (and/or smaller error tolerance) for explicit solvers should be attempted. In this case, an implicit solver with larger error tolerance can be comparable with an explicit solver with a smaller tolerance.
Actually, it is wise to try both methods, comparing the accuracy of the results, and figuring out if explicit methods produce comparably accurate results. However, as a warning this would be just a heuristic, since the system does not necessarily have the same behavior over the entire space of admissible parameter values.
For increasing nonlinearity of the model, the choice would tend more towards modern solvers making use of variable step-size techniques. Here I would start with implicit variable-step Runga-Kutta (i.e. single-step) and/or the implicit variable-step multi-step methods, Adams–Moulton. For both of these classes, one can enlarge the solver tolerance and/or lower the solver error order and figure out if the simulation produces comparably accurate solutions (but with faster runtime).
Implementations of the previous classes of methods are usually less conservative with error control, and therefore, for increasing stiffness of the model or badly scalable models, the choice would tend more towards modern solvers implementing so-called numerically more stable backward differentiation formula (BDF), s.a. DASSL, CVODE, IDA. These solvers (can) also make use of the s.c. Jacobian of the system for adaptive step-size control.
A modern solver like LSODAR that switches between explicit and implicit solvers and also perform automatic error order control (switching between different orders) is a good choice if one does not know that much information about the behavior of the model. May be some Modelica environments have an advanced solver making use of automatic switching. However, if one knows the behavior of the model in advance, it is also wise to use other suggested methods since LSODAR may not perform the most optimal switching when needed.
x. ...
The comparisons between solvers from classes 3,4 and 5 are not straightforward to judge and it depends also on whether the system is continuous or hybrid, i.e. the underlying root-finding algorithms.
Usually DASSL could be slower as it is more conservative with step-size/error control. So it seems that IDA and others are faster. Some published works exist that can give some intuitions regarding such comparisons. It would be nice to have a Modelica library including all possible types of models and running all possible benchmarks w.r.t. accuracy and runtime to draw some more solver/model specific conclusions. A library that could be used and extended for such a purpose is the ScalableTestSuite Modelica library.
IV. Advanced aspects
There have been some published works in the Modelica community regarding making use of sparse solvers to exploit the expected sparsity of the Jacobian. If such a feature is provided by the simulation environment, this would usually significantly improve the runtime performance of large-scale models.
For models with massive number of events, numerical integration in the standard way can be extremely inefficient. Particularly challenging is when an event is triggered, other sets of state-events could be further triggered and a queue of state-events should be evaluated. The root-finding algorithm could further trigger other events and the solver could be hanging on in a s.c. chattering situation. There are advanced strategies for such situations, s.c. sliding mode, however I am not sure how far Modelica simulation environments are handing this issue.
One set of suggested solutions (also for systems with high degree of stiffness) is to employ so called QSS (quantized state system) methods. This would be significantly beneficial particularly for models that can not be solved using explicit solvers. There are both explicit and implicit QSS methods. There have been also other worth-to-try numerical integration strategies where only subsets of the entire equation system is evaluated when approximating a state event. Here I am not sure about availability of such solvers.
Some simulation environments differentiate between two simulation modes which can influence the simulation runtime: the ODE Mode and DAE Mode. In the first mode, the system is reduced to an ODE system with potentially additional cascaded blocks of nonlinear equation systems. In the DAE mode, the system is reduced to a DAE system of index one. The former mode would be beneficial for dense systems exhibiting such large cascaded blocks of nonlinear equations to be solved using s.c. Tearing methods instead of numerical integration. The DAE mode would be beneficial for large-scale sparse systems solved using sparse solvers. I think the ODE mode is usually activated by choosing CVODE or LSODAR while DAE mode is activated by choosing IDA or DASSL. But digging the documentation here and there is also recommended.
There are also some published works regarding so called multirate numerical integration solvers. Here, in each numerical integration step, only the numerically-significant portion of the equation system and not the entire equation system is integrated. Hence, this is significantly beneficial for large-scale stiff systems.
x. ...
V. Parallelization
Obviously, making use of multicore / GPUs for executing numerical integration in parallel, among other approaches for applying parallelization can speed-up computations.
VI. quite very advanced topics
In order to pay attention at some excellent research attempts some of which can be exploited for speeding up the simulation runtime performance of large-scale (loosely-coupled) hybrid networked models, I am listing this here as well. Speed-up can be obtained by making use of hybrid paradigms, agent-based modeling paradigm and/or multimode paradigm. The idea behind is that it is possible to describe a loosely coupled system in several smaller subsystems and conduct the communication among subsystems only when necessary. This can be beneficial and the reasons can be traced by searching for relevant publications. There have been some excellent work in some of the mentioned directions, and it is worth to continue them where they have stopped if this is the case.
Remark: Any of the mentioned solvers is not necessarily present in all possible Modelica simulation environments. If a solver is not provided as a choice, one would still be able to produce an FMU-ME (Functional mockup unit for model exchange) and write code that numerically integrate this FMU with a desired solver.
Warning: Some of the above aspects are based on personal experiences for a particular type of models and are not necessarily true for all model types.
Few suggested reading and I am definitely missing a lot of key publications:
F. Casella, Simulation of Large-Scale Models in Modelica: State of the Art and Future Perspectives, Modelica 2016
Liu Liu, Felix Felgner and Georg Frey, Comparison of 4 numerical solvers for stiff and hybrid systems simulation, Conference 2010
Willi Braun, Francesco Casella and Bernhard Bachmann, Solving large-scale Modelica models: new approaches and experimental results using OpenModelica, Modelica 2017
Erik Henningsson and Hans Olsson and Luigi Vanfretti, DAE Solvers for Large-Scale Hybrid Models, Modelica 2019
Tamara Beltrame and François Cellier, Quantised state system simulation in Dymola/Modelica using the DEVS formalism, Modelica 2006
Victorino Sanz and Federico Bergero and Alfonso Urquia, An approach to agent-based modeling with Modelica, Simpra 2010

Should I use SystemVerilog 2-state data type in design (not verification)?

SystemVerilog has 2-state data type, should I use it in design (not verification)?
I know it would improve the simulation performance but does it have any impacts on synthesis?
Using 2-state does not necessarily improve simulation performance unless it can save you a significant amount of memory. Synthesis tools do not really care about 2-state versus 4-state unless you are specifying don't cares in either literals or parameters, not variables. So I would say it has no impact on performance.
The real issue about using 4-state versus 2-state is really in simulation and where you need or want to propagate X-values for error conditions or don't cares.

Are compilers getting better at optimizing code over time, and if so at what rate? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
We know for example that Moores law states that the number of transistors on a chip doubles every 1.8-2 years (and hence computing power has been approximately increasing at this rate). This got me thinking about compiler optimizations. Are compilers getting better a making codes run faster as time goes on? If they are is there any theory as to how this performance increase scales? If I were to take a piece of code written in 1970 compiled with 1970 compiler optimizations would that same code run faster on the same machine but compiled with todays optimizations? Can I expect a piece a code written today to run faster in say a 100 years solely as the result of better optimizations/compilers (obviously independent of improvements in hardware and algorithm improvements)?
This is a complex, multi-faceted question, so let me try to hit on a few key points:
Compiler optimization theory is highly complex and is often (far) more difficult than the actual design of the language in the first place. This domain incorporates many other complex mathematical subdomains (eg, directed graph theory). Some problems in compiler optimization theory are known to be NP-complete or even undecidable (which represent the most complex categories of problems to solve).
While there are hundreds of known techniques (see here, for example) the implementation of these techniques is highly dependent on both the computer language and the targeted CPU (such as instruction set and pipelines). Because computer languages and CPUs are constantly evolving, the optimal implementations of even well-known techniques can change over time. New CPU features and architectures can also open up previously unavailable optimization techniques. Some of the most cutting-edge techniques may also be proprietary and thus not available to the general public for reuse. For example, several commerical JVMs offer specialty optimizations to the JIT-compilation of Java bytecode which are quantitatively superior to (default) open-source JVMs on a statistical basis.
There is an unmistakable historical trend toward better and better compiler optimization. This is why, for example, it is quite rare nowadays that any manual assembly coding is done regularly. But due to the factors already discussed (and others), the evolution of the efficiency and benefits provided by automatic compiler optimizations has been quite non-linear historically. This is in contrast to the fairly consistent curvature of Moore's law and other laws relating to computer hardware improvements. Compiler optimization's track record is probably better visualized as a line with many "fits and starts". Because the factors driving the non-linearity of compiler optimization theory will not likely change in the immediate future, it's likely this trajectory will remain non-linear for at least the near future.
It would be quite difficult to state even an average rate of improvement when languages themselves are coming and going, not to mention CPU models with different hardware features coming and going. CPUs have evolved different instruction sets and instruction set extensions over time, so it's quite difficult to even do an "apples to apples" comparison. This is true regardless of which metric you use: program length in terms of discrete instructions, program execution time (highly dependent on CPU clock speed and pipelining capabilities), or others.
Compiler optimization theory is probably now in the regime of diminishing returns. That is to say that most of the "low hanging" fruit have been addressed and much of the remaining optimizations are either quite complex or provide relatively small marginal improvements. Perhaps the greatest coming factor which will disruptively impact compiler optimization theory will be the advent of weak (or strong) AI. Because many of the future gains in compiler optimization theory will require highly complex predictive capabilities, the best optimizers will actually have some level of innate intelligence (for example, to predict the most common user inputs, to predict the most common execution paths, and to reduce NP-hard optimization problems into solvable sub-problems, etc.). It could very well be possible in the future that every piece of software you use is specifically custom compiled just for you in a tailored way to your specific use cases, interests, and requirements. Imagine that your OS (operating system) is specifically compiled or re-compiled just for you based on your specific use cases as a scientist vs. a video gamer vs. a corporate executive, or old vs. young, or any other combination of demographics that potentially impact code execution.

pycuda vs theano vs pylearn2

I am currently learning programming with GPU to improve the performance of machine learning algorithms. Initially I try to learn programming cuda with pure c, then I found pycuda which to me a wrapper of cuda library, and then I found theano and pylearn2 and got a little confused:
I understand them in this way:
pycuda: python wrapper for cuda library
theano: similar to numpy but transparent to GPU and CPU
pylearn2: deep learning package which build on theano and implemented several machine learning/deep learning model
Since I am new to GPU programming, should I start learning from C/C++ implementation or starting from pycuda is enough, even starting from theano? E.g. I would like to implement randomForest model after learning the GPU programming.Thanks.
Your understand is almost right. I would just add some remarks about Theano. It's much more than a Numpy which can run on the GPU. Theano is indeed is math expression compiler, which translates symbolic math expressions in highly optimized C/CUDA code, targeted for both CPU and GPU. The code it generates is often much more efficient than the one most programmers would write. Theano also can make symbolic differentiation (very useful for gradient based optimization) and has also a feature to achieve better numerical stability (which probably is something useful, though I don't know for real to what extent). It's very likely Theano will be enough to implement what you need. If you still decide to learn CUDA or PyCUDA, choose the one based no the language you will use, C++ or Python.

compiler-optimization for numerical stability

do GCC or similar compilers perform optimizations that are aimed at improving the numerical stability of floating-point operations.
It is known that seemingly simply operations like addition or computing the norm of a vector are numerically unstable if implemented in the obvious manner, and, on the other hand, compilers sometimes destroy work-arounds for these problems for the sake of speed optimization.
What is the state of the art of optimizing compiler output for numerical stable computation? Anything better than pending?
Oracle (formerly Sun) compilers for Linux and Solaris.
Their C++ and Fortran compilers have support for Interval Arithmetics