Internal multithread error during variation experiment - simulation

I am running a parameter variation experiment that varies five parameters, each at two levels, yielding 2^5 = 32 iterations. During the run, the error shown in the attached image occurred.
-When running this design three separate times with no replications, the error occurred in only one of the three runs.
-When replications are added to the run (even as few as two), the error always occurs, early in the run.
-My selection for the maximum available memory for the experiment is 60,000 MB, out of the device's total of 46 GB of RAM.
-Disabling parallel execution is not appealing because of the resulting slowdown; I use the Material Handling Library, and a single run simulating one day already takes around 18 minutes.
How can I overcome this error?
Thanks

Related

Getting Average Execution Time of a Program

I am trying to get the total execution time of a program. Repeated runs give different times, since the run time depends on many factors, so I run the program multiple times and take the mean. Is there a way to determine the optimal number of runs? Running the program too many times wastes precious resources, while running it too few times does not give reliable insight.
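One common rule of thumb is to keep sampling until the confidence interval of the mean is tight enough, rather than fixing the number of runs in advance. Below is a minimal sketch of that idea; the program path ./my_program, the run cap, and the 5% tolerance are illustrative assumptions, not values from the question.

```scala
import scala.sys.process._

// Adaptive timing sketch: sample run times until the 95% confidence
// interval of the mean is narrower than `tolerance` of the mean, or
// until a hard cap on the number of runs is reached.
object AdaptiveTiming extends App {
  val command   = Seq("./my_program") // hypothetical program under test
  val minRuns   = 5                   // need a few samples before the CI is meaningful
  val maxRuns   = 100                 // cap so resources are not wasted
  val tolerance = 0.05                // stop when CI half-width < 5% of the mean

  def timeOnce(): Double = {
    val t0 = System.nanoTime()
    command.!                         // run the program and wait for it to exit
    (System.nanoTime() - t0) / 1e9    // elapsed wall-clock seconds
  }

  val samples = scala.collection.mutable.ArrayBuffer.empty[Double]
  var done = false
  while (!done && samples.size < maxRuns) {
    samples += timeOnce()
    if (samples.size >= minRuns) {
      val n        = samples.size
      val mean     = samples.sum / n
      val variance = samples.map(x => (x - mean) * (x - mean)).sum / (n - 1)
      // normal approximation; a Student-t quantile is more defensible for small n
      val halfWidth = 1.96 * math.sqrt(variance / n)
      done = halfWidth / mean < tolerance
    }
  }
  println(f"mean = ${samples.sum / samples.size}%.3f s over ${samples.size} runs")
}
```

The stopping rule directly encodes the trade-off in the question: the loop ends as soon as the estimate is precise enough, so no runs are wasted beyond that point.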

MATLAB Segmentation Violation and Memory Assertion Failure

I am running multiple MATLAB jobs in parallel on a Sun Grid Engine cluster that uses MATLAB 2016b. On my personal MacBook I run MATLAB 2016a. The script does some MRI image processing, where each job uses a different set of parameters, so that I can do parameter optimization for my image-processing routine.
About half of the jobs crash, however, either due to segmentation violations, malloc.c memory assertion failures ('You may have modified memory not owned by you.'), or HDF5-DIAG errors followed by a segmentation violation.
Some observations:
- The errors do not always occur in the same jobs or in the same functions, but the crashes occur in several groups of jobs, where the jobs within one group crash within one minute of one another.
- I am not using dynamically growing arrays anymore but preallocate my arrays. If an array turns out to be too small, I extend it with, for example, cat(2, array, zeros(1, 2000)).
- The jobs partly share the same computations, so they can share data. Each job first checks whether the data has already been generated by another job. If so, it tries to load the data in a while loop with a maximum number of attempts and 1-second pauses (loading may fail while another job is still writing the file; waiting a bit and retrying can then succeed). If the loading fails after the maximum number of attempts, or if the data does not exist yet, the job performs the required computations itself and tries to save the data. If another job saved the data in the meantime, this job does not save it again. (This protocol is sketched in code after the list.)
- I am not using any C/C++ or MEX files.
- I have tested a subset of the jobs on my own laptop with MATLAB 2016a and on a Linux computer with MATLAB 2016b; those worked fine. But the problem only occurs after a few hundred of the total 500 iterations, and due to time constraints I ran only around 20 iterations on my own computer rather than the full simulation.
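For reference, here is the load-or-compute protocol from the third bullet, sketched in Scala for concreteness (the real jobs are MATLAB; the path, attempt count, and pause length below are illustrative):

```scala
import java.nio.file.{Files, Path, Paths}

// Sketch of the shared-data protocol: try to load a result another job may
// have produced, retrying while a writer might still be mid-save; otherwise
// compute it locally and save it unless someone else saved it meanwhile.
object SharedResult {
  val maxAttempts = 10
  val pauseMs     = 1000L

  def loadOrCompute(path: Path, compute: () => Array[Byte]): Array[Byte] = {
    var attempt = 0
    while (Files.exists(path) && attempt < maxAttempts) {
      try {
        return Files.readAllBytes(path) // may fail while another job is writing
      } catch {
        case _: java.io.IOException =>
          attempt += 1
          Thread.sleep(pauseMs)         // wait a bit and retry, as described
      }
    }
    // No readable copy appeared: compute locally, then save it
    // unless another job has saved it in the meantime.
    val data = compute()
    if (!Files.exists(path)) Files.write(path, data)
    data
  }
  // e.g. SharedResult.loadOrCompute(Paths.get("shared/step1.bin"), () => Array[Byte](1, 2, 3))
}
```

Note that the exists-then-write at the end is not atomic: two jobs can still race, and a reader can see a half-written file. Writing to a temporary file and renaming it into place (Files.move with StandardCopyOption.ATOMIC_MOVE) closes most of that window; a partially written -v7.3 .mat file (which is HDF5 underneath) would be one plausible source of the HDF5-DIAG errors.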

VMD terminates simulation before completion

I am trying to run a 1 ns simulation using VMD/NAMD on top of my 200 ps simulation, so I set the program to run 800000 steps with a timestep of 1 fs. The next day the run was complete (it took about 12 hours), but I only had ~16500 frames. Does anyone know why the program collected so few frames? I have a similar issue with other simulations: the length I ask for and the number of frames I get do not match.
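For scale, a back-of-the-envelope check (the dcdfreq value of 50 below is an assumption, not a number from the question): the timestep fixes the simulated time per step, while NAMD's dcdfreq setting controls how often a frame is written to the trajectory, so steps and frames differ by that factor:

\[
800000~\text{steps} \times 1~\mathrm{fs/step} = 800~\mathrm{ps} = 1~\mathrm{ns} - 200~\mathrm{ps}
\]
\[
\text{frames} = \frac{\text{steps}}{\texttt{dcdfreq}} = \frac{800000}{50} = 16000 \approx 16500
\]

With any dcdfreq greater than 1, the number of saved frames will always be smaller than the number of steps requested.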

Faster way to run a Simulink simulation repeatedly for a large number of times

I want to run a simulation that includes SimEvents blocks (so only the Normal simulation mode is available) a large number of times, at least 1000. When I use sim, it compiles the model every time, and I wonder whether there is another way to run the simulation repeatedly that avoids this overhead. I disabled the Rebuild option in the Configuration Parameters, which does make it faster, but it still takes ages to run even around 100 times.
A single simulation run is not long at all.
Thank you!
It's difficult to say why the model compiles every time without actually seeing the model and what's inside it. However, the Parallel Computing Toolbox provides you with the ability to distribute the iterations of your model across several cores, or even several machines (with the MATLAB Distributed Computing Server). See Run Parallel Simulations in the documentation for more details.

Is Scala doing anything in parallel on its own?

I have a little program that creates mazes. It uses lots of collections (the default variants, which are immutable, or at least used as immutable).
The program calculates 30 mazes with increasing dimensions, using a for comprehension over (1 to 30).
Since the parallel collections framework became available in the latest versions, I thought I'd give it a spin, hoping for some performance gain.
This failed and when I investigated a little, I found the following:
When run without any call to anything remotely parallel it still showed a processor load of about 30% on each of the 4 cores of my machine.
When I replaced the Range 1 to 30 with (1 to 30).par CPU load went up to about 80% on all cores (which I expected). The order in which the mazes completed became more or less random (which I expected). The total time for all mazes stayed the same.
Replacing some of the internally used collections with their parallel counterparts didn't seem to have an effect.
I now have 2 questions:
Why do I have all 4 cores spinning although there isn't anything that runs in parallel?
What are likely reasons for the program to still take the same time, no matter whether it runs in parallel or not? There are no obvious bottlenecks other than CPU cycles (no I/O, no network, plenty of memory via the -Xmx setting).
Any ideas on this?
The 30% per core figure is just a poor scheduler (sounds like Windows 7) migrating the process from core to core very frequently. It's probably closer to 25% per core (1/4) for your process, plus miscellaneous other load making up the 30%. If you ran the same example under Linux, you would probably see one core pegged.
When you converted to (1 to 30).par, you started really using threads across all cores but the synchronization overhead of distributing such a small amount of work and then collecting the results cancelled out the parallelism gains. You need to break your work into larger independent chunks.
EDIT: If each of 1..30 represents some larger amount of work (solving a maze, say), then automatic parallelization will work much better if each unit of work is about the same size. Imagine you had 29 easy mazes and one very, very hard maze: the 30th maze will still run serially (or very nearly so) after everything else has finished. If your mazes increase in complexity with their number, try spawning them in the order 30 to 1 by -1, so that the biggest tasks go first. Think of it as a braindead solution to the knapsack problem.
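To make that concrete, a minimal sketch of the scheduling effect (solveMaze below is a hypothetical stand-in that just burns CPU in proportion to its argument; on Scala 2.13+ the .par call additionally needs the scala-parallel-collections module and import scala.collection.parallel.CollectionConverters._, whereas on the 2.9/2.10-era collections discussed here it is built in):

```scala
object MazeScheduling extends App {
  // Hypothetical stand-in for the real maze solver: work grows with n.
  def solveMaze(n: Int): Long = {
    var acc = 0L
    var i   = 0L
    while (i < n.toLong * 5000000L) { acc += i; i += 1 }
    acc
  }

  // Ascending order: the hardest maze (30) may be scheduled last and then
  // run alone while the other cores sit idle.
  val ascending = (1 to 30).par.map(solveMaze)

  // Descending order: the expensive mazes are spawned first and the cheap
  // ones fill the scheduling gaps at the end.
  val descending = (30 to 1 by -1).par.map(solveMaze)

  println(s"solved ${ascending.size + descending.size} mazes")
}
```

Timing the two mapped ranges separately should show the descending order finishing sooner whenever the per-task costs are very uneven, which is exactly the knapsack-style intuition above.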