Getting Average Execution Time of a Program

I am trying to measure the total execution time of a program. Running it several times gives different run times, because the run time depends on many factors, so I run the program multiple times and take the mean running time. Is there a way to determine the optimal number of runs? Running it too many times wastes precious resources, while running it too few times will not give me reliable insights.
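One common way to pick the number of runs is to keep running until the confidence interval around the mean is narrow enough relative to the mean. The sketch below assumes the runs are roughly independent and that a normal approximation is acceptable (z = 1.96 for about 95% confidence; a Student t quantile would be more conservative for small samples). The function names and default thresholds are illustrative assumptions, not from any library.

```python
import math
import statistics
import time
from typing import Callable, List, Tuple


def time_once(fn: Callable[[], object]) -> float:
    """Duration of a single call, using a monotonic high-resolution clock."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def runs_needed(samples: List[float], rel_error: float = 0.05, z: float = 1.96) -> int:
    """Runs needed so the CI half-width is about rel_error * mean.

    Normal approximation: n = (z * s / (rel_error * mean))**2,
    where s is the sample standard deviation of the pilot runs.
    """
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)  # requires at least two samples
    return math.ceil((z * s / (rel_error * mean)) ** 2)


def adaptive_mean_time(fn: Callable[[], object], rel_error: float = 0.05,
                       z: float = 1.96, pilot: int = 5,
                       max_runs: int = 1000) -> Tuple[float, int]:
    """Keep running fn until the CI half-width drops below rel_error * mean.

    Returns (mean seconds, number of runs used); max_runs caps the cost.
    """
    samples = [time_once(fn) for _ in range(max(pilot, 2))]
    while len(samples) < max_runs:
        mean = statistics.mean(samples)
        half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
        if half_width <= rel_error * mean:
            break
        samples.append(time_once(fn))
    return statistics.mean(samples), len(samples)
```

For example, if a few pilot runs show a standard deviation of about 10% of the mean, a ±5% interval needs roughly (1.96 × 0.1 / 0.05)² ≈ 16 runs; noisier programs need quadratically more.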

Related

Internal multithread error during variation experiment

I run a parameter variation experiment that varies five parameters with two levels each, yielding 32 iterations. During the run, the error in the attached image occurred.
- When running this design three separate times with no replications in any of them, the error occurred in only one of the three runs.
- When adding replications to the run (even as few as two), the error always occurs early in the run.
- My selection for the maximum available memory for the experiment is 60,000 MB of the device's 46 GB total RAM.
- Disabling parallel execution is not appealing because of the resulting slow run speed: I use the Material Handling Library, and simulating a single day takes around 18 minutes.
How can I overcome this error?
Thanks
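As a quick check on the run count: five parameters at two levels each form a full factorial design of 2^5 = 32 combinations, and each replication multiplies the number of simulation runs. The parameter names and levels below are hypothetical placeholders; this only enumerates the design and does not reproduce the simulation tool's own experiment setup.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical parameter names, each with two levels (placeholders, not from the question).
LEVELS: Dict[str, Tuple[int, int]] = {
    "p1": (0, 1),
    "p2": (0, 1),
    "p3": (0, 1),
    "p4": (0, 1),
    "p5": (0, 1),
}


def full_factorial(levels: Dict[str, Tuple[int, int]]) -> List[Dict[str, int]]:
    """Every combination of levels; its size is the product of the level counts."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


def total_runs(levels: Dict[str, Tuple[int, int]], replications: int = 1) -> int:
    """Number of simulation runs: one per combination, per replication."""
    return len(full_factorial(levels)) * replications
```

With two replications the experiment therefore performs 64 simulation runs instead of 32, which matters if each run needs substantial memory.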

A way/tool to estimate the execution time of an app/task

I'm trying to run a real-life experiment for an application on a Raspberry Pi, and I need to estimate or predict the execution time of the application. In other words, before executing the task I need to know roughly how long it will take to return its result. I have identified several techniques and prior work, but most of it is simulation work, which doesn't carry over to real-life experiments. Can anyone help me with an idea or technique (no code)? Thank you in advance.
Estimating the execution time of an application or function is going to be difficult in any context. You might want to look up the halting problem for some insight into why: there is no general procedure that can decide, for every program and input, whether the program will finish executing, and therefore you can't in general tell how long a given program will take to finish.
For general computing, the varying hardware capabilities of different systems will always affect a program's execution time. Raspberry Pi hardware is more fixed than that, and therefore more predictable in that sense, but its specifications are not consistent across its various versions, which adds to the difficulty of determining a run time.
Practically, the most reliable way to determine how long a process will take is to just run it and time it. If you absolutely need predicted times, you might be able to build a composite estimate: time smaller chunks of the application separately, then use those times to estimate how long the application as a whole will run. In most situations, though, it is much faster to just run the program than to try to predict its run time.
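The composite estimate can be sketched like this: measure the average cost per item of each stage on a small sample, then multiply by the number of items the full run is expected to process. This assumes each stage's cost scales linearly with its item count and ignores warm-up and caching effects; the stage names, sample sizes and counts are hypothetical.

```python
import time
from typing import Callable, Dict, Sequence


def per_item_cost(stage: Callable[[int], object], sample: Sequence[int]) -> float:
    """Average seconds per item for one stage, measured on a small sample."""
    start = time.perf_counter()
    for item in sample:
        stage(item)
    return (time.perf_counter() - start) / len(sample)


def composite_estimate(costs: Dict[str, float], counts: Dict[str, int]) -> float:
    """Predicted total seconds: sum over stages of (cost per item * expected items)."""
    return sum(costs[name] * counts[name] for name in costs)


# Hypothetical stages, each measured on 100 sample items.
costs = {
    "parse": per_item_cost(lambda x: str(x).split(), range(100)),
    "compute": per_item_cost(lambda x: sum(range(x)), range(100)),
}
predicted = composite_estimate(costs, {"parse": 10_000, "compute": 10_000})
print(f"predicted total: {predicted:.3f} s")
```

The prediction is only as good as the linearity assumption; checking it against one real full run is a cheap sanity test.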
Store the time before and after execution; the difference is the execution time.
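In Python, for instance, that before/after measurement could look like the sketch below. It uses `time.perf_counter()`, a monotonic clock, rather than `time.time()`, whose wall-clock value can jump if the system clock is adjusted during the run. The `Timer` class is an illustrative helper, not a standard-library class.

```python
import time


class Timer:
    """Context manager that records the elapsed seconds of the block it wraps."""

    def __enter__(self) -> "Timer":
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc) -> None:
        # Runs even if the block raises, so elapsed is always set;
        # returning None lets any exception propagate.
        self.elapsed = time.perf_counter() - self.start


with Timer() as t:
    total = sum(i * i for i in range(100_000))
print(f"took {t.elapsed:.6f} s")
```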

Faster way to run a Simulink simulation repeatedly a large number of times

I want to run a simulation that includes SimEvents blocks (so only the Normal simulation mode is available) a large number of times, at least 1000. When I call sim, it compiles the model every time, and I wonder whether there is another way to run the simulation repeatedly that is faster. I disabled the Rebuild option in Configuration Parameters, which does make it faster, but it still takes ages to run around 100 times.
A single simulation is not long at all.
Thank you!
It's difficult to say why the model compiles every time without actually seeing the model and what's inside it. However, the Parallel Computing Toolbox provides you with the ability to distribute the iterations of your model across several cores, or even several machines (with the MATLAB Distributed Computing Server). See Run Parallel Simulations in the documentation for more details.

Why does Matlab run faster after a script is "warmed up"?

I have noticed that the first time I run a script, it takes considerably more time than the second and third time (note 1). The "warm-up" is mentioned in this question without an explanation.
Why does the code run faster after it is "warmed up"?
I don't call clear all between calls (note 2), but the input parameters change for every function call. Does anyone know why this is?
1. I have my license locally, so it's not a problem related to license checking.
2. Actually, the behavior doesn't change if I do call clear all.
One reason it runs faster after the first time is that many things are initialized once, and their results are cached and reused the next time. For example, on the M side, variables can be declared persistent in functions, and those functions can be locked. The same can happen on the MEX side.
In addition, many dependencies are loaded the first time and remain in memory to be reused. This includes M-functions, OOP classes, Java classes, MEX-functions, and so on, and it applies to both built-in and user-defined ones.
For example, issue the following command before and after the first run of your script, then compare the results:
[M,X,C] = inmem('-completenames')
Note that clear all does not necessarily clear all of the above, and it does not clear locked functions at all.
Finally, let us not forget the role of the accelerator. Instead of interpreting the M-code every time a function is invoked, the code gets compiled into machine code instructions at runtime. JIT compilation happens only on the first invocation, so that call pays the compilation cost; ideally, the efficiency of running the compiled code on subsequent calls more than repays that one-time overhead compared with re-interpreting the program every time it runs.
Matlab is interpreted, at least until the JIT compiles the code. If you don't warm up the code, a large part of the measured time goes to interpretation rather than to the actual algorithm, which can skew timing results significantly.
Running the code at least once lets Matlab actually compile the relevant code segments.
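A common benchmarking practice that follows from this is to run the code a few times untimed and then time only the subsequent runs. The sketch below is generic Python rather than MATLAB; the numbers of warm-up and measured runs are arbitrary choices, and reporting the median reduces the influence of outliers.

```python
import statistics
import time
from typing import Callable, List


def benchmark(fn: Callable[[], object], warmup: int = 3, runs: int = 10) -> List[float]:
    """Call fn `warmup` times without timing, then return `runs` timed durations."""
    for _ in range(warmup):
        fn()  # untimed: lets one-time costs (loading, caching, compilation) happen first
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


times = benchmark(lambda: sorted(range(50_000), key=lambda x: -x))
print(f"median {statistics.median(times):.6f} s over {len(times)} runs")
```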
Besides Matlab-specific reasons like JIT compilation, modern CPUs have large caches, branch predictors, and so on. Warming these up is an issue for benchmarking even in assembly language.
Also, more importantly, modern CPUs usually idle at low clock speed, and only jump to full speed after several milliseconds of sustained load.
Intel's Turbo feature complicates this further: when power and thermal limits allow, the CPU can run faster than its sustainable maximum frequency. So the first ~20 seconds to 1 minute of your benchmark may run faster than the rest of it if you aren't careful to control for these factors.
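One simple way to check whether clock ramp-up or Turbo is skewing a benchmark is to compare the mean of the earliest samples with the mean of the latest ones; a large relative difference suggests the runs had not reached a steady state. This is a heuristic sketch with an arbitrary window size, not a substitute for controlling the CPU's power settings.

```python
import statistics
from typing import Sequence


def relative_drift(samples: Sequence[float], window: int = 5) -> float:
    """(mean of last `window` samples - mean of first `window`) / mean of first `window`.

    Positive: later runs were slower (e.g. turbo headroom used up);
    negative: later runs were faster (e.g. clock ramp-up or warm caches).
    """
    if len(samples) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    early = statistics.mean(samples[:window])
    late = statistics.mean(samples[-window:])
    return (late - early) / early
```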
Another issue not mentioned by Amro and Marc is memory (pre)allocation.
If your script does not preallocate its memory, its first run will be very slow due to memory allocation. Once it has completed its first run, all the memory is allocated (the variables remain in the workspace), so subsequent invocations of the script are more efficient.
An illustrative example:
for ii = 1:1000
    vec(ii) = ii; % vec grows inside the loop only the first time this code is executed
end

Is Scala doing anything in parallel on its own?

I have a little program that creates a maze. It uses lots of collections (the default variants, which are immutable, or at least used as immutable).
The program calculates 30 mazes of increasing dimensions, using a for comprehension over (1 to 30).
Since the parallel collections framework became available in the latest versions, I thought I'd give it a spin, hoping for some performance gain.
This failed, and when I investigated a little, I found the following:
When run without any call to anything remotely parallel, it still showed a processor load of about 30% on each of the 4 cores of my machine.
When I replaced the Range 1 to 30 with (1 to 30).par, CPU load went up to about 80% on all cores (which I expected). The order in which the mazes completed became more or less random (which I expected). The total time for all mazes stayed the same.
Replacing some of the internally used collections with their parallel counterparts did seem to have an effect.
I now have 2 questions:
Why do I have all 4 cores spinning, although nothing runs in parallel?
What are likely reasons for the program to take the same time whether it runs in parallel or not? There are no obvious bottlenecks other than CPU cycles (no I/O, no network, plenty of memory via the -Xmx setting).
Any ideas on this?
The 30% per core is just a poor scheduler (sounds like Windows 7) migrating the process from core to core very frequently. It's probably closer to 25% per core (1/4) for your process, plus miscellaneous other load, making 30%. If you run the same example under Linux, you would probably see one core pegged.
When you converted to (1 to 30).par, you started really using threads across all cores but the synchronization overhead of distributing such a small amount of work and then collecting the results cancelled out the parallelism gains. You need to break your work into larger independent chunks.
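Breaking work into larger chunks can be as simple as grouping consecutive work items into batches, so the scheduling and result-collection overhead is paid once per batch rather than once per item. The batch size is a tuning assumption, and this Python sketch only shows the grouping itself, not how Scala's parallel collections split work internally.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunks(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to `size` items; the last batch may be shorter."""
    if size < 1:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each batch can then be handed to a worker as one unit of work.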
EDIT: If each of 1..30 represents some larger amount of work (solving a maze, say), then automatic parallelization will work much better if each unit of work is about the same size. Imagine you had 29 easy mazes and one very, very hard maze: the 30th maze will still run serially (or very nearly so) after everything else has finished. If your mazes increase in complexity by number, try spawning them in the order 30 to 1 by -1 so that the biggest tasks go first. Think of it as a braindead solution to the knapsack problem.
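The effect of ordering can be checked with a simple model: a pool of workers in which each task, taken in list order, goes to the worker that becomes free first. This greedy list-scheduling model is an assumption (Scala's parallel collections use their own work splitting), and the task costs are hypothetical numbers.

```python
import heapq
from typing import Sequence


def makespan(costs: Sequence[float], workers: int) -> float:
    """Finish time when each task, in list order, goes to the least-loaded worker."""
    loads = [0.0] * workers  # heap of per-worker accumulated cost
    heapq.heapify(loads)
    for cost in costs:
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost)
    return max(loads)


# Hypothetical costs: maze n costs n**2 units, shared by four workers.
costs = [n ** 2 for n in range(1, 31)]
ascending = makespan(costs, 4)
descending = makespan(sorted(costs, reverse=True), 4)
print(f"smallest first: {ascending}, largest first: {descending}")
```

In a tiny case, tasks with costs 1, 1, 1, 3 on two workers finish at time 4 in ascending order but at time 3 when the largest task goes first, because the big task no longer runs alone at the end.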