I have used a Caliper-based template https://github.com/dcsobral/scala-foreach-benchmark happily for years. It runs randomly constructed problems multiple times and then calculates the average time consumption.
Now I am faced with a non-deterministic algorithm, so I need to know both the running time and the resulting fitness. I'm searching for a Java/Scala benchmark framework that can measure both characteristics in the average and worst case.
Non-deterministic means that the algorithm relies on a random generator to make decisions. It is used to find a near-optimal solution where searching for the optimum would require too much processor time, e.g. a solution to the TSP.
Fitness means the cost function of the optimization process. Different runs (with different random seeds) may produce different costs, so you need to stabilize not only the running time but also the cost value (fitness).
I don't know whether repeatedly invoking a function until its running time shows acceptably low variation is a common feature of benchmark frameworks, but Caliper does this, and I am looking for a similar but more advanced framework that can handle fitness as well as time.
Use JMH: it has Scala bindings, and can sample execution time.
I am using Matlab for this project. I have introduced some modifications to the ode45 solver.
I sometimes use up to 64 components, all in the [0,1] interval, and the components sum to 1.
At certain intervals I halt the integration in order to run a quick check of whether further integration is needed, and I am looking for a clever way to decide this efficiently.
I have found four cases and I should be able to detect each of them during a check:
1: The system has settled into an equilibrium and all components are unchanged.
2: Three or more components are wildly fluctuating in a periodic manner.
3: One or two components are changing very rapidly with low amplitude and high frequency.
4: None of the above is true and the integration must be continued.
To give an idea: I have found it good practice to pass the last ~5k states generated by the ode45 solver to a function for this purpose.
In short: how does one detect an equilibrium or an unchanging periodic pattern during ODE integration?
Steady state occurs only when the time derivatives your model function computes are all 0. A periodic solution like the one you describe corresponds rather to a limit cycle, i.e. oscillations around an unstable equilibrium. I don't know whether there are standard methods to detect these cycles; I might update my answer to give more info on that. One idea would be to see whether the last part of the signal correlates with itself (with a delay corresponding to the cycle period).
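As a rough sketch of that correlation idea, assuming the states were stored on a uniform time grid (e.g. by passing a vector tspan to ode45) and checking only the first component of the state matrix y (rows = time steps):

    tail   = y(max(1, end-4999):end, 1);     % last ~5k samples of one component
    maxLag = floor(numel(tail)/2);
    best   = 0;
    for lag = 10:maxLag                      % candidate cycle periods, in steps
        c    = corrcoef(tail(1:end-lag), tail(1+lag:end));
        best = max(best, c(1,2));            % strongest delayed self-correlation
    end
    isPeriodic = (best > 0.99) && (std(tail) > 1e-6);   % correlated but not constant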
Note that if you are only interested in the steady state, an implicit method like ode15s may be more efficient, as it can "dissipate" all the transient fluctuations and use much larger time steps than explicit methods, which must resolve the transient accurately to avoid exploding. However, they may also dissipate small-amplitude limit cycles. A pragmatic solution is then to slightly perturb the steady-state values and see if an explicit integration converges towards the unperturbed steady-state.
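A sketch of that perturbation test, where myOdeFun (a handle to your right-hand-side function) and ySS (the candidate steady state as a column vector) are placeholder names:

    y0 = ySS .* (1 + 1e-3*randn(size(ySS)));      % small relative perturbation
    [~, yP] = ode45(myOdeFun, [0 100], y0);       % explicit re-integration
    returnsToSS = norm(yP(end,:).' - ySS) < 1e-6 * max(norm(ySS), 1);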
Something I often do is to look at the norm of the difference between the solution at each step and the solution at the previous step. If this difference stays small for a sufficiently large number of steps, then steady state is reached. You can also observe how the norm $\|\frac{dy}{dt}\|$ converges to zero.
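In MATLAB this check might look roughly like the following, where t and y are the outputs of ode45, myOdeFun is a handle to your right-hand side, and the tolerance must be tuned to your system:

    N       = min(5000, size(y,1));                 % window of recent states
    tol     = 1e-8;                                 % tolerance, problem dependent
    recent  = y(end-N+1:end, :);
    stepChg = max(sqrt(sum(diff(recent).^2, 2)));   % largest ||y(k+1) - y(k)|| in window
    dNorm   = norm(myOdeFun(t(end), y(end,:).'));   % ||dy/dt|| at the final state
    atSteadyState = (stepChg < tol) && (dNorm < tol);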
I think this question is actually better suited to the Computational Science forum.
I am attempting to train a neural network to control a simple entity in a simulated 2D environment, currently by using a genetic algorithm.
Perhaps due to lack of familiarity with the correct terms, my searches have not yielded much information on how to treat fitness and training in cases where all the following conditions hold:
There is no data available on correct outputs for given inputs.
A performance evaluation can only be made after an extended period of interaction with the environment (with continuous controller input/output invocation).
There is randomness inherent in the system.
Currently my approach is as follows:
The NN inputs are instantaneous sensor readings of the entity and environment state.
The outputs are instantaneous activation levels of its effectors, for example, a level of thrust for an actuator.
I generate a performance value by running the simulation for a given NN controller, either for a preset period of simulation time, or until some system state is reached. The performance value is then assigned as appropriate based on observations of behaviour/final state.
To prevent over-fitting, I repeat the above a number of times with different random generator seeds for the system, and assign a fitness using some metric such as average/lowest performance value.
This is done for every individual at every generation. Within a given generation, for fairness each individual will use the same set of random seeds.
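To make the procedure concrete, here is a minimal sketch in MATLAB (runSimulation and population are placeholder names; rng seeds the random generator so every individual in a generation sees the same set of environments):

    seeds   = randi(1e9, 1, 5);                     % one shared seed set per generation
    fitness = zeros(1, numel(population));
    for i = 1:numel(population)
        perf = zeros(1, numel(seeds));
        for s = 1:numel(seeds)
            rng(seeds(s));                          % same seed for every individual
            perf(s) = runSimulation(population(i)); % run until timeout or terminal state
        end
        fitness(i) = mean(perf);                    % or min(perf) for a worst-case metric
    end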
I have a couple of questions.
Is this a reasonable, standard approach to take for such a problem? Unsurprisingly it all adds up to a very computationally expensive process. I'm wondering if there are any methods to avoid having to rerun a simulation from scratch every time I produce a fitness value.
As stated, the same set of random seeds is used for the simulations for each individual in a generation. From one generation to the next, should this set remain static, or should it be different? My instinct was to use different seeds each generation to further avoid over-fitting, and that doing so would not have an adverse effect on the selective force. However, from my results, I'm unsure about this.
It is a reasonable approach, but genetic algorithms are not known for being very fast/efficient. Try hillclimbing and see if that is any faster. There are numerous other optimization methods, but nothing is great if you assume the function is a black box that you can only sample from. Reinforcement learning might work.
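As a rough illustration of the hill-climbing suggestion, here is a minimal (1+1) hill climber sketched in MATLAB, where evaluateFitness stands for your seed-averaged, simulation-based evaluation and nWeights for the size of the weight vector (both placeholders):

    genome = randn(1, nWeights);
    best   = evaluateFitness(genome);
    for it = 1:5000
        candidate = genome + 0.1*randn(size(genome));   % small Gaussian mutation
        f = evaluateFitness(candidate);
        if f > best                                     % assuming higher fitness is better
            genome = candidate;
            best   = f;
        end
    end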
Using random seeds should prevent overfitting, but may not be necessary, depending on how representative a static test is of the average case and how easy it is to overfit.
If I assume that a problem is a candidate for parallelization, e.g. matrix multiplication or some other problem, and I use an Intel i7 Haswell dual-core, is there some way I can compare a parallel execution to a sequential version of the same program, or will Matlab optimize a program for my architecture (dual-core, quad-core, ...)? I would like to determine the speedup from adding more processors using a good parallel benchmark program.
Unfortunately there is no such thing as a benchmark parallel program. If you measure a speedup for a benchmark algorithm, that does not mean that all algorithms will benefit from parallelization.
Since your target architecture has only two cores, you might be better off avoiding parallelization altogether and letting Matlab and the operating system optimize the execution. Anyway, here are the steps I follow.
Determine whether your problem is suited to parallelization by calculating the theoretical speedup. Some problems, like matrix multiplication or Gaussian elimination, are well studied. Since I assume your problem is more complicated than that, try to decompose your algorithm into simple blocks and determine, block-wise, the advantages of parallelization.
If you find that several parts of your algorithm could profit from parallelization, study those parts separately.
Obtain statistical information about the runtime of your sequential algorithm. That is, run your program a number of times under similar conditions (and with similar inputs) and average the running times.
Obtain the same statistical information about the runtime of your parallel algorithm.
Measure with the profiler. Many people recommend using functions like tic and toc, but the profiler will give you a more accurate picture of your running times, as well as detailed per-function information. See the documentation for details on how to use the profiler.
Don't make the mistake of ignoring the time Matlab takes to open the pool of workers (I assume you are working with the Parallel Computing Toolbox). Depending on the number of workers, opening the pool takes more or less time, and on some occasions it can be up to a minute (R2011b)!
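As a rough sketch of such a comparison (Parallel Computing Toolbox assumed; heavyWork is a placeholder for one independent unit of your computation), with the pool opened outside the timed region so its startup cost is excluded:

    gcp;                                          % open (or reuse) the worker pool first
    n = 200;  nRep = 5;
    tSeq = zeros(1, nRep);  tPar = zeros(1, nRep);
    for rep = 1:nRep                              % repeat to average out noise
        r = zeros(1, n);
        tic;
        for k = 1:n
            r(k) = heavyWork(k);                  % sequential baseline
        end
        tSeq(rep) = toc;

        rp = zeros(1, n);
        tic;
        parfor k = 1:n
            rp(k) = heavyWork(k);                 % parallel version of the same loop
        end
        tPar(rep) = toc;
    end
    fprintf('Mean speedup: %.2fx\n', mean(tSeq)/mean(tPar));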
You can try the "Run and time" feature in MATLAB.
Or simply put tic and toc at the beginning and end of your code, respectively.
Matlab provides a number of timing functions to help you assess the performance of your code: go read the documentation here and select the function that you deem most appropriate in your case! In particular, be aware of the difference between tic/toc and the cputime function.
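To make the difference concrete (myComputation is a placeholder for the code under test):

    tic;                                   % wall-clock time
    myComputation();
    wallTime = toc;

    c0 = cputime;                          % CPU time: summed over cores, so it can
    myComputation();                       % exceed wall-clock time for parallel code
    cpuTime = cputime - c0;

    medTime = timeit(@myComputation);      % timeit calls the function repeatedly and
                                           % returns a robust median-based estimate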
I am writing my thesis using software called Wingen3, and I am facing a problem in determining how many replications I should use when generating data with the program.
Some say 5, some say 10,000, but is there a rule or a formula to determine how many replications are needed?
Nobody can give you more than a hand-waving guess without knowing more about your specific case. Note: I know absolutely nothing about "Wingen3", but sample size questions are (or at least ought to be) a function of the statistical properties of your estimators, not of the software.
In general you replicate simulations when they are stochastic to estimate the distributional behavior of the output measures. How many replications depends entirely on what type of measure you're trying to determine and what margin of error you're willing to tolerate in the estimates. One fairly common technique is to make a small initial run and estimate the sample variability of your performance measure. Then project how large a sample will get you down to the desired margin of error. This works fairly well if you're estimating means, medians, or quartiles, but not at all well for estimating quantiles in the tail of your distribution. For instance, if you want to determine the 99.9%-ile, you're seeking extremes that happen one time in a thousand on average and you may need tens or even hundreds of thousands of replications to accurately assess such rare events.
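As a rough sketch of that pilot-run projection, assuming you are estimating a mean and want a 95% confidence interval of half-width E (runPilotReplications is a placeholder for however you collect the output measure from a small initial run):

    pilotOutput = runPilotReplications(30);        % e.g. 30 pilot replications
    s = std(pilotOutput);                          % sample standard deviation
    E = 0.05;                                      % desired margin of error
    z = 1.96;                                      % normal quantile for 95% confidence
    nNeeded = ceil((z*s/E)^2);                     % projected replication count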
I have a program in SimMechanics that uses 6 derivative blocks (du/dt). It takes about 24 hours to do 10 secs of simulation. Is there any way to reduce the calculation time of the Simulink derivative blocks?
You don't say what your integration time step is. If it's on the order of milliseconds, and you're simulating a 10 sec total transient time, that means 10,000 time steps.
The stability limit of the time step is determined by the characteristics of the dynamic system you're simulating.
It's also affected by the integration scheme you're using. Explicit integration is well-known to have stability problems for larger time steps, so if you're using an Euler method of integration you'll be forced to use a small time step.
Maybe you can switch your integration scheme to an implicit method, a 5th-order Runge-Kutta with error correction, or Bulirsch-Stoer. See your documentation for details.
You've given no useful information about the physics of the system of interest, the size of the model, or your simulation choices, so all this is an educated guess on my part.
Runge-Kutta methods (called ode45 or ode23 in Matlab parlance) are not always a good fit for mechanical problems, because they perform best with a variable-step setup. Move to a fixed-step setup and select the solver by evaluating the error order you can accept. Refer to the Matlab documentation (and some numerical analysis texts too, :-) ) for deeper detail.
Also consider whether your problem needs a "stiff-enabled" solution technique. Large constant terms can drive your solver to instability if not handled properly.
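As a rough sketch, the solver can be switched from the command line like this ('myModel' is a placeholder model name; the parameters are standard Simulink configuration parameters, so check your release's documentation):

    set_param('myModel', 'SolverType', 'Variable-step', 'Solver', 'ode15s');   % stiff solver
    % or a fixed-step run, as suggested above:
    set_param('myModel', 'SolverType', 'Fixed-step', 'Solver', 'ode4', 'FixedStep', '1e-3');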