Are compilers getting better at optimizing code over time, and if so at what rate? [closed] - compiler-optimization

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
We know, for example, that Moore's law states that the number of transistors on a chip doubles every 1.8-2 years (and hence computing power has been increasing at approximately that rate). This got me thinking about compiler optimizations. Are compilers getting better at making code run faster as time goes on? If they are, is there any theory as to how this performance increase scales? If I were to take a piece of code written in 1970 and compiled with 1970 compiler optimizations, would that same code run faster on the same machine when compiled with today's optimizations? Can I expect a piece of code written today to run faster in, say, 100 years solely as the result of better optimizations/compilers (obviously independent of improvements in hardware and algorithms)?

This is a complex, multi-faceted question, so let me try to hit on a few key points:
Compiler optimization theory is highly complex and is often (far) more difficult than the design of the language itself. The domain draws on many other complex mathematical subdomains (e.g., directed graph theory). Some problems in compiler optimization theory are known to be NP-complete or even undecidable, which means that in general they cannot be solved exactly and compilers must fall back on heuristics and approximations.
While there are hundreds of known techniques (see here, for example), the implementation of these techniques is highly dependent on both the programming language and the targeted CPU (its instruction set, pipelines, and so on). Because languages and CPUs are constantly evolving, the optimal implementations of even well-known techniques can change over time, and new CPU features and architectures can open up previously unavailable optimization techniques. Some of the most cutting-edge techniques may also be proprietary and thus not available to the general public for reuse. For example, several commercial JVMs offer specialty optimizations in their JIT compilation of Java bytecode that are, statistically, quantitatively superior to the default open-source JVMs.
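For concreteness, here is a minimal sketch of one of the oldest techniques on such lists, constant folding, shown with CPython's bytecode compiler purely because its output is easy to inspect (production C/C++ compilers apply the same idea, and many more, far more aggressively):

```python
# Constant folding illustrated with CPython's own bytecode compiler:
# the arithmetic below is evaluated once at compile time, not at run time.
import dis

def seconds_per_day():
    return 24 * 60 * 60   # folded by the compiler into the constant 86400

dis.dis(seconds_per_day)
# The disassembly contains a single LOAD_CONST 86400 instead of two
# run-time multiplications.
```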
There is an unmistakable historical trend toward better and better compiler optimization. This is why, for example, it is now quite rare for manual assembly coding to be done regularly. But due to the factors already discussed (and others), the evolution of the efficiency and benefits provided by automatic compiler optimization has historically been quite non-linear. This is in contrast to the fairly consistent curve of Moore's law and other laws relating to computer hardware improvements. Compiler optimization's track record is probably better visualized as a line with many fits and starts. Because the factors driving this non-linearity are unlikely to change soon, the trajectory will most likely remain non-linear for at least the near future.
It would be quite difficult to state even an average rate of improvement when languages themselves come and go, not to mention CPU models with different hardware features. CPUs have evolved different instruction sets and instruction set extensions over time, so it is hard to make an "apples to apples" comparison at all. This is true regardless of which metric you use: program length in terms of discrete instructions, program execution time (highly dependent on CPU clock speed and pipelining capabilities), or others.
Compiler optimization theory is probably now in the regime of diminishing returns. That is to say, most of the low-hanging fruit has been picked, and much of the remaining optimization work is either quite complex or provides relatively small marginal improvements. Perhaps the greatest coming factor that will disruptively impact compiler optimization theory is the advent of weak (or strong) AI. Because many of the future gains in compiler optimization will require highly complex predictive capabilities, the best optimizers will effectively need some level of innate intelligence (for example, to predict the most common user inputs, to predict the most common execution paths, and to reduce NP-hard optimization problems into solvable sub-problems). It could very well be possible in the future that every piece of software you use is custom compiled just for you, tailored to your specific use cases, interests, and requirements. Imagine that your OS (operating system) is compiled or re-compiled specifically for you based on your use as a scientist vs. a video gamer vs. a corporate executive, or old vs. young, or any other combination of demographics that potentially impacts code execution.

Related

What impacts simulation runtime in Modelica?

In order to make my model simulations in Modelica run faster, I am asking the following question:
What impacts simulation runtime in Modelica?
I will appreciate any help possible.
Edit: More details can be found in my book "Modelica by Application -- Power Systems" (URL).
What impacts the runtime performance?
I. Applied compilation techniques
Naturally, object-oriented Modelica models, even trivial ones, correspond to a large-scale system of equations. Modelica simulation environments usually optimize such generated models; they:
reduce the number of equations by removing trivial ones (e.g. alias equations),
decompose the large block of the equation system, using the so-called BLT transformation, into smaller cascaded blocks of equation systems that can be solved faster in a sequential manner rather than as a single block of equations (see the sketch at the end of this section),
solve so-called large algebraic loops using tearing methods.
A tool can theoretically go even further and attempt to solve blocks of the equation system analytically, where possible, instead of conducting expensive numerical integration.
Thus, the runtime performance is influenced by the underlying Modelica compiler and by how far it exploits such equation-based compiler methods. Usually, some extra settings need to be activated to exploit all possible techniques of this kind; digging through the documentation to enable those settings is needed.
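To make the BLT idea above a bit more tangible, here is a tiny, tool-agnostic sketch (my own illustration, not taken from any Modelica environment) in Python using the networkx library: once each equation has been matched to one unknown, the strongly connected components of the dependency graph are exactly the blocks (algebraic loops) that must be solved together, and a topological order of those blocks gives the sequential solution order.

```python
# Toy illustration of BLT sorting: strongly connected components of the
# matched dependency graph are the blocks (algebraic loops), and a
# topological order of the blocks is the sequential solution order.
import networkx as nx

# Hypothetical matched system: an edge u -> v means "the equation
# assigned to u needs the value of v".
deps = {
    "x1": ["x2"],        # x1 = f(x2)
    "x2": ["x1", "u"],   # x2 = g(x1, u)  -> {x1, x2} form an algebraic loop
    "x3": ["x2"],        # x3 = h(x2)
    "u":  [],            # known input
}

g = nx.DiGraph()
g.add_nodes_from(deps)
g.add_edges_from((var, dep) for var, ds in deps.items() for dep in ds)

# Each strongly connected component is one BLT block; the condensation
# is the acyclic graph of blocks.
blocks = list(nx.strongly_connected_components(g))
cond = nx.condensation(g, scc=blocks)

# Solve the blocks in reverse topological order, so dependencies come first.
for i, block_id in enumerate(reversed(list(nx.topological_sort(cond))), 1):
    block = blocks[block_id]
    kind = "algebraic loop" if len(block) > 1 else "scalar equation"
    print(f"block {i}: {sorted(block)} ({kind})")
```

Real tools of course do this on much larger systems and combine it with tearing inside each block.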
II. The nature of the model
The nature of the model would influence the runtime performance, particularly:
Is the model a large-scale system or a small-scale one?
Is it strongly nonlinear or a semi-linear one?
Is the resulting optimized equation system sparse (i.e. a large set of equations, each involving only a few variables, e.g. power-system network models) or dense (e.g. multibody systems and biochemical networks)?
Is it a stiff system (e.g. a system with several subsystems, some exhibiting very quick dynamics and others very slow dynamics)? See the rough numerical check sketched after this list.
Does the system exhibit a large number of state events?
...
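As a rough, tool-independent illustration of the stiffness question above (my own sketch, not tied to any Modelica environment), one common rule of thumb compares the fastest and slowest time constants of the linearized model, i.e. the spread of the Jacobian eigenvalues:

```python
# Gauge stiffness by comparing the decay rates (real parts of the
# eigenvalues) of the linearized system x' = J x at an operating point.
import numpy as np

# Hypothetical linearization with one fast and one slow mode.
J = np.array([[-1000.0,  0.0],
              [    1.0, -0.1]])

eig = np.linalg.eigvals(J)
rates = np.abs(eig.real[eig.real < 0])     # decay rates of the stable modes
print("eigenvalues:", eig)
print("stiffness ratio ~", rates.max() / rates.min())   # >> 1 hints at stiffness
```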
III. The choice of the solver
The mentioned characteristics of a given model would typically influence the ideal choice of the solver. The solver can largely influence the runtime performance (and accuracy). A strategy for solver choice could be made in the following order:
For a non-stiff, weakly nonlinear model, the ideal choice would be an explicit method, e.g. a single-step Runge-Kutta or a multi-step Adams-Bashforth method of higher order. If accuracy is less significant, one can attempt an explicit method of lower order, which executes faster. Naturally, increasing the solver error tolerance also speeds up the simulation. (A small explicit-vs-implicit comparison is sketched at the end of this section.)
However, it can happen, particularly for large-scale systems, that numerical stability becomes more difficult to guarantee. Then, smaller solver step-sizes (and/or smaller error tolerances) have to be attempted for explicit solvers. In this case, an implicit solver with a larger error tolerance can be comparable to an explicit solver with a smaller one.
Actually, it is wise to try both methods, compare the accuracy of the results, and figure out whether the explicit methods produce comparably accurate results. Be warned, though, that this is just a heuristic, since the system does not necessarily behave the same over the entire space of admissible parameter values.
With increasing nonlinearity of the model, the choice tends more towards modern solvers using variable step-size techniques. Here I would start with implicit variable-step Runge-Kutta (i.e. single-step) methods and/or the implicit variable-step multi-step Adams-Moulton methods. For both of these classes, one can enlarge the solver tolerance and/or lower the solver error order and check whether the simulation still produces comparably accurate solutions (but with a faster runtime).
Implementations of the previous classes of methods are usually less conservative with error control. Therefore, with increasing stiffness of the model, or for badly scaled models, the choice tends more towards modern solvers implementing the numerically more stable backward differentiation formulas (BDF), such as DASSL, CVODE, and IDA. These solvers can also make use of the Jacobian of the system for adaptive step-size control.
A modern solver like LSODAR, which switches between explicit and implicit solvers and also performs automatic order control (switching between different orders), is a good choice if one does not know much about the behavior of the model. Maybe some Modelica environments offer an advanced solver with such automatic switching. However, if one knows the behavior of the model in advance, it is also wise to use the other suggested methods, since LSODAR may not perform the switching at the most suitable moments.
x. ...
Comparisons between solvers from classes 3, 4, and 5 are not straightforward to judge; they also depend on whether the system is continuous or hybrid, i.e. on the underlying root-finding algorithms.
Usually, DASSL can be slower, as it is more conservative with step-size/error control, so IDA and the others tend to be faster. Some published works exist that give some intuition about such comparisons. It would be nice to have a Modelica library covering all possible types of models and to run full benchmarks w.r.t. accuracy and runtime, in order to draw more solver- and model-specific conclusions. A library that could be used and extended for such a purpose is the ScalableTestSuite Modelica library.
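To make the explicit-vs-implicit trade-off above a little more concrete, here is a small, tool-agnostic sketch (using SciPy rather than any Modelica environment, and a benchmark problem of my own choosing): a classic stiff system is integrated once with an explicit Runge-Kutta solver and once with an implicit BDF solver, and the work each one needs is compared.

```python
# Compare an explicit solver (RK45) with an implicit BDF solver on the
# Robertson chemical kinetics problem, a standard stiff benchmark.
from scipy.integrate import solve_ivp

def robertson(t, y):
    y1, y2, y3 = y
    return [-0.04 * y1 + 1e4 * y2 * y3,
            0.04 * y1 - 1e4 * y2 * y3 - 3e7 * y2 ** 2,
            3e7 * y2 ** 2]

y0 = [1.0, 0.0, 0.0]

for method in ("RK45", "BDF"):
    sol = solve_ivp(robertson, (0.0, 100.0), y0, method=method,
                    rtol=1e-4, atol=1e-8)
    print(f"{method:5s}: {sol.nfev:8d} right-hand-side evaluations")
```

The explicit solver is forced to take tiny steps by the fast mode even long after it has decayed, which is exactly why the implicit BDF family wins on stiff models.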
IV. Advanced aspects
There have been some published works in the Modelica community on making use of sparse solvers to exploit the expected sparsity of the Jacobian. If such a feature is provided by the simulation environment, it usually improves the runtime performance of large-scale models significantly (a small sketch of the idea follows below).
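As a hedged sketch of that sparsity point (again with SciPy rather than a Modelica tool, and with a made-up diffusion-like model), supplying the Jacobian sparsity pattern lets the implicit solver factorize a sparse matrix instead of a dense one:

```python
# A 1-D diffusion-like chain of states gives a tridiagonal Jacobian;
# passing its sparsity pattern to the BDF solver avoids dense algebra.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.sparse import diags

N = 2000                                   # hypothetical large model

def rhs(t, u):
    du = np.empty_like(u)
    du[1:-1] = u[:-2] - 2.0 * u[1:-1] + u[2:]
    du[0] = -2.0 * u[0] + u[1]
    du[-1] = u[-2] - 2.0 * u[-1]
    return du

# Tridiagonal sparsity pattern (only the nonzero structure matters).
sparsity = diags([np.ones(N - 1), np.ones(N), np.ones(N - 1)], (-1, 0, 1))

u0 = np.random.rand(N)
sol = solve_ivp(rhs, (0.0, 1.0), u0, method="BDF", jac_sparsity=sparsity)
print("success:", sol.success, "| LU decompositions:", sol.nlu)
```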
For models with a massive number of events, numerical integration in the standard way can be extremely inefficient. Particularly challenging is the situation where triggering one event can trigger further sets of state events, so that a queue of state events has to be evaluated. The root-finding algorithm can then trigger yet more events, and the solver can end up hanging in a so-called chattering situation. There are advanced strategies for such situations, e.g. so-called sliding-mode approaches, but I am not sure how far Modelica simulation environments are handling this issue.
One set of suggested solutions (also for systems with a high degree of stiffness) is to employ so-called QSS (quantized state system) methods. This is significantly beneficial particularly for models that cannot be solved using explicit solvers. There are both explicit and implicit QSS methods. There have also been other numerical integration strategies worth trying, where only a subset of the entire equation system is evaluated when approximating a state event. Here I am not sure about the availability of such solvers.
Some simulation environments distinguish between two simulation modes that can influence the simulation runtime: the ODE mode and the DAE mode. In the first mode, the system is reduced to an ODE system with potentially additional cascaded blocks of nonlinear equation systems. In the DAE mode, the system is reduced to a DAE system of index one. The former mode is beneficial for dense systems exhibiting such large cascaded blocks of nonlinear equations, which are solved using so-called tearing methods instead of numerical integration. The DAE mode is beneficial for large-scale sparse systems solved using sparse solvers. I think the ODE mode is usually activated by choosing CVODE or LSODAR, while the DAE mode is activated by choosing IDA or DASSL, but digging through the documentation here and there is also recommended.
There are also some published works regarding so called multirate numerical integration solvers. Here, in each numerical integration step, only the numerically-significant portion of the equation system and not the entire equation system is integrated. Hence, this is significantly beneficial for large-scale stiff systems.
x. ...
V. Parallelization
Obviously, making use of multicore CPUs or GPUs to execute numerical integration in parallel, among other parallelization approaches, can speed up the computation.
VI. Quite advanced topics
To draw attention to some excellent research efforts, some of which can be exploited for speeding up the simulation runtime of large-scale (loosely coupled) hybrid networked models, I am listing this here as well. Speed-ups can be obtained by making use of hybrid paradigms, the agent-based modeling paradigm, and/or the multi-mode paradigm. The idea behind them is that a loosely coupled system can be described as several smaller subsystems, with communication among the subsystems conducted only when necessary. This can be beneficial, and the reasons can be traced by searching for the relevant publications. There has been some excellent work in some of the mentioned directions, and it is worth continuing it where it has stopped, if that is the case.
Remark: not every solver mentioned here is necessarily present in every Modelica simulation environment. If a solver is not provided as a choice, one can still produce an FMU-ME (Functional Mock-up Unit for Model Exchange) and write code that numerically integrates this FMU with the desired solver.
Warning: Some of the above aspects are based on personal experiences for a particular type of models and are not necessarily true for all model types.
A few suggested readings (and I am definitely missing a lot of key publications):
F. Casella, Simulation of Large-Scale Models in Modelica: State of the Art and Future Perspectives, Modelica 2016
Liu Liu, Felix Felgner and Georg Frey, Comparison of 4 numerical solvers for stiff and hybrid systems simulation, Conference 2010
Willi Braun, Francesco Casella and Bernhard Bachmann, Solving large-scale Modelica models: new approaches and experimental results using OpenModelica, Modelica 2017
Erik Henningsson and Hans Olsson and Luigi Vanfretti, DAE Solvers for Large-Scale Hybrid Models, Modelica 2019
Tamara Beltrame and François Cellier, Quantised state system simulation in Dymola/Modelica using the DEVS formalism, Modelica 2006
Victorino Sanz and Federico Bergero and Alfonso Urquia, An approach to agent-based modeling with Modelica, Simpra 2010

How to design deep convolutional neural networks? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 1 year ago.
As I understand it, all CNNs are quite similar. They all have convolutional layers followed by pooling and ReLU layers. Some, such as FlowNet and SegNet, have specialised layers. My doubt is: how should we decide how many layers to use, and how do we set the kernel size for each layer in the network? I have searched for an answer to this question but I couldn't find a concrete one. Is the network designed by trial and error, or are there specific rules that I am not aware of? If you could clarify this, I would be very grateful.
Short answer: if there are design rules, we haven't discovered them yet.
Note that there are comparable questions in computing. For instance, note that there is only a handful of basic electronic logic units, the gates that drive your manufacturing technology. All computing devices use the same Boolean logic; some have specialised additions, such as photoelectric input or mechanical output.
How do you decide how to design your computing device?
The design depends on the purpose of the CNN. Input characteristics, accuracy, training speed, scoring speed, adaptation, computing resources, ... all of these affect the design. There is no generalized solution, even for a given problem (yet).
For instance, consider the ImageNet classification problem. Note the structural differences between the winners and contenders so far: AlexNet, GoogleNet, ResNet, VGG, etc. If you change inputs (say, to MNIST), then these are overkill. If you change the paradigm, they may be useless. GoogleNet may be a prince of image processing, but it's horrid for translating spoken French to written English. If you want to track a hockey puck in real time on your video screen, forget these implementations entirely.
So far, we're doing this the empirical way: a lot of people try a lot of different things to see what works. We get feelings for what will improve accuracy, or training time, or whatever factor we want to tune. We find what works well with total CPU time, or what we can do in parallel. We change algorithms to take advantage of vector math in lengths that are powers of 2. We change problems slightly and see how the learning adapts elsewhere. We change domains (say, image processing to written text), and start all over -- but with a vague feeling of what might tune a particular bottleneck, once we get down to considering certain types of layers.
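In practice, that empirical loop often looks like the sketch below (my own illustration in PyTorch; the depth/kernel-size grid and the helper name train_and_evaluate are made up for the example): generate candidate architectures from a few design knobs, train each one, and keep whatever scores best on a validation set.

```python
# Enumerate small CNNs over two design choices (depth and kernel size);
# in practice each candidate would be trained and scored on validation data.
import itertools
import torch.nn as nn

def make_cnn(depth: int, kernel_size: int, num_classes: int = 10) -> nn.Module:
    layers, channels = [], 3
    for i in range(depth):
        out_channels = 32 * (2 ** i)
        layers += [
            nn.Conv2d(channels, out_channels, kernel_size, padding=kernel_size // 2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        ]
        channels = out_channels
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, num_classes)]
    return nn.Sequential(*layers)

for depth, k in itertools.product([2, 3, 4], [3, 5]):
    model = make_cnn(depth, k)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"depth={depth}, kernel={k}: {n_params:,} parameters")
    # accuracy = train_and_evaluate(model, train_loader, val_loader)  # hypothetical helper
```

Random search or Bayesian hyperparameter optimisation just automates this same loop.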
Remember, CNNs really haven't been popular for that long, barely 6 years. For the most part, we're still trying to learn what the important questions might be. Welcome to the research team.

What are the available approaches to interconnecting simulation systems?

I am looking for a distributed simulation algorithm that allows me to couple multiple standalone systems. The systems I am targeting for interconnection use different formalisms, e.g. discrete-time and continuous simulation paradigms. Now, the only algorithms I have found are from the field of parallel discrete event simulation (PDES), such as the classical Chandy/Misra "null message" protocol, which has some very undesirable problems. My question is: what other approaches, besides PDES algorithms, are known and can be used for interconnecting simulation systems?
Not an algorithm, but there are two IEEE standards out there that define protocols intended to address your issue: High-Level Architecture (HLA) and Distributed Interactive Simulation (DIS). HLA has a much greater presence in the analytic discrete-event simulation community where I hang out; DIS tends to get more use in training applications. If you'd like to check out some application papers, go to the Winter Simulation Conference / INFORMS-sponsored paper archive site and search for HLA; you'll get 448 hits.
Be forewarned, trying to make this stuff work in general requires some pretty weird plumbing, lots of kludges, and can be very fragile.

Has anyone tried to compile code into neural network and evolve it?

Do you know if anyone has tried to compile high-level programming languages (Java, C#, etc.) into a recurrent neural network and then evolve them?
I mean that the whole process including memory usage is stored in a graph of a neural net, and I'm talking about complex programs (thinking about natural language processing problems).
When I say neural net, I mean a directed weighted graph that spreads activation, where the nodes are functions of their inputs (linear, sigmoid and multiplicative, to keep it simple).
Furthermore, is that what people mean in genetic programming or is there a difference?
Neural networks are not particularly well suited for evolving programs; their strength tends to be in classification. If anyone has tried, I haven't heard about it (which considering I barely touch neural networks is not a surprise, but I am active in the general AI field at the moment).
The main reason why neural networks aren't useful for generating programs is that they basically represent a mathematical equation (numeric, rather than functional). Given some numeric input, you get a numeric output. It is difficult to interpret these in the context of a program any more complicated than simple arithmetic.
Genetic programming traditionally uses Lisp, a functional language, and programs are often shown as tree diagrams (which occasionally look similar to some neural network diagrams; is this the source of your confusion?). The programs are evolved by exchanging entire branches of a tree (a function and all its parameters) between programs, or by regenerating an entire branch randomly.
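As a toy sketch of that idea (my own illustration, not from the answer; the operator set and tree encoding are arbitrary), here are expression trees over a single variable together with a subtree-swapping crossover, which is the core move in tree-based genetic programming:

```python
# Programs as expression trees over one variable 'x', with crossover
# implemented as swapping randomly chosen subtrees between two parents.
import copy
import random

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def random_tree(depth=2):
    if depth == 0 or random.random() < 0.3:                 # leaf: variable or constant
        return random.choice(["x", random.randint(-5, 5)])
    return [random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1)]

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; paths index into nested lists."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of 'tree' with the node at 'path' replaced by 'new'."""
    if not path:
        return copy.deepcopy(new)
    tree = copy.deepcopy(tree)
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = copy.deepcopy(new)
    return tree

def crossover(a, b):
    pa, sa = random.choice(list(subtrees(a)))
    pb, sb = random.choice(list(subtrees(b)))
    return replace(a, pa, sb), replace(b, pb, sa)

p1, p2 = random_tree(), random_tree()
c1, c2 = crossover(p1, p2)
print("child:", c1, "-> value at x=2:", evaluate(c1, 2))
```

A real GP system adds a fitness function, selection, and mutation around this, but crossover on trees is the distinctive part.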
There are certainly a lot of good (and a lot of bad) references on both of these topics out there - I refrain from listing them because it isn't clear what you are actually interested in. Wikipedia covers each of these techniques, and is a good starting point.
Genetic programming is very different from neural networks. What you are suggesting is more along the lines of genetic programming - making small random changes to a program, possibly "breeding" successful programs. It is not easy, and I have my doubts that it can be done successfully across a large program.
You may have more luck extracting a small but critical part of your program, one which has a few particular "aspects" (such as parameter values) that you can try to evolve.
Google is your friend.
Some sophisticated anti-virus programs, as well as sophisticated malware, use formal grammars and genetic operators to evolve against each other using neural networks.
Here is an example paper on the topic: http://nexginrc.org/nexginrcAdmin/PublicationsFiles/raid09-sadia.pdf
Sources: A class on Artificial Intelligence I took a couple years ago.
With regards to your main question, no one has ever tried that on programming languages to the best of my knowledge, but there is some research in the field of evolutionary computation that could be compared to something like that (though it is obviously a far-fetched comparison). As a matter of possible interest, I asked a similar question about self-improving compilers a while ago.
For a difference between genetic algorithms and genetic programming, have a look at this question.
Neural networks have nothing to do with genetic algorithms or genetic programming, but you can obviously use either to evolve neural nets (as anything else, for that matter).
You could have a look at genetic-programming.org, where they claim to have found some near human-competitive results produced by genetic programming.
I have not heard of self-evolving and self-improving programs before. They may exist as special research tools, like those genetic-programming.org has, but there is nothing solid for generic use. And even if they exist, they are very limited to special-purpose operations like the malware detection Alain mentioned.

Landau notation (IDE) tools support

It is a good idea to have important information available during development, like Landau (big-O) notation describing a function's time cost. So it should be documented in the sources, shouldn't it?
I'm looking for tools that can calculate it.
In the general case, the asymptotic complexity of an arbitrary algorithm is undecidable, by Rice's theorem.
But in practice, you can often make a good guess by repeatedly running the algorithm on various inputs (of sizes spanning several orders of magnitude), recording actual CPU time, and fitting a curve. (You should throw out data points with very short runtimes, since these will be dominated by noise. Also, on JITed runtimes like the Java Virtual Machine, make sure to run the function for a while before starting the timing, to make sure the VM has warmed up.)
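As a rough sketch of that curve-fitting approach (the timed function and the size grid are my own, purely illustrative choices), one can estimate the polynomial degree from the slope of a log-log fit of runtime against input size:

```python
# Time the algorithm on inputs spanning several sizes and estimate the
# growth exponent from the slope of log(time) vs log(n).
import time
import numpy as np

def algorithm_under_test(n):
    # Stand-in for the real code; deliberately ~O(n^2) here.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total

sizes = [200, 400, 800, 1600, 3200]
times = []
for n in sizes:
    start = time.perf_counter()
    algorithm_under_test(n)
    times.append(time.perf_counter() - start)

slope, _ = np.polyfit(np.log(sizes), np.log(times), 1)
print(f"estimated exponent: {slope:.2f}  (close to 2 => roughly quadratic)")
```

A slope near 1, 2, or 3 suggests roughly linear, quadratic, or cubic behaviour; n log n shows up as a slope slightly above 1, so this only ever gives a guess, not a proof.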