I am running a parameter variation experiment with 1000 replications per iteration. For each of these model runs, I want to store a copy of a dataset that lives in Main. My current setup writes that dataset to an Excel file after each simulation run, using the "After simulation run" field in the experiment with the following code:
ds_export.fillFrom(root.ds_costAll);
excelfile.writeDataSet(ds_export, 1, 2, 1 + i*2);
Where i is a counter for the current iteration.
However, I am running into some performance issues. I believe copies of ds_costAll are being held in my system's memory until the experiment completes, at which point they are written to the Excel file. As a result, my system's memory utilization is nearing 100% while the CPU is hardly bothered. My system has 16 GB of memory, and the maximum available memory of the experiment is also 16 GB. Is there a way to export this data more efficiently?
How many cores are you using in runtime?
Tools->Preferences->Runtime->Number of processes for parallel execution
Reducing it a bit might be an option, since fewer runs executing in parallel means fewer model instances held in memory at once.
Was playing around with some larger data sets and noticed that VSCode only uses around 30% CPU and RAM.
Is there some way to increase it? Probably some configurations? Thanks
You can increase/decrease the RAM available to VS Code in its Settings. Go to File -> Preferences -> Settings, type files.maxMemoryForLargeFilesMB, and change the value to your desired maximum RAM.
Not sure which programming language you are using, but let's break your question into two parts:
How to use more CPU ? (Can Increase Performance)
By using multiprocessing APIs, which divide a large data set into smaller units to be processed by several CPU cores. It is like a master/worker architecture in which each subprocess executes on a separate core, so the achievable parallelism is capped by the total number of CPU cores.
If there are more data units (and hence subprocesses) than CPU cores, the operating system will context-switch between them.
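Since the language is not known, here is a minimal Python sketch of the idea above, assuming a CPU-bound workload. The function names and the sum-of-squares workload are hypothetical placeholders; only `concurrent.futures.ProcessPoolExecutor` and `os.cpu_count` are standard-library calls.

```python
from concurrent.futures import ProcessPoolExecutor
import os


def split_into_chunks(data, n_chunks):
    """Split data into at most n_chunks contiguous, roughly equal pieces."""
    n_chunks = max(1, min(n_chunks, len(data)))
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Placeholder CPU-bound work done independently on one chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=None):
    """Process each chunk in its own worker process and combine the results."""
    workers = workers or os.cpu_count() or 1
    chunks = split_into_chunks(data, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

To try it, call `parallel_sum_of_squares(list(range(1_000_000)))` from under an `if __name__ == "__main__":` guard, which process-based parallelism in Python needs on platforms that spawn fresh interpreters. Creating one chunk per core keeps the number of processes at or below the core count, which avoids the context switching mentioned above.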
How to use more RAM ? (Can Degrade Performance)
Why do you need to increase RAM usage? It depends anyway on the amount of data the program allocates.
You could create multiple copies of the data so that each thread has its own snapshot and no mutex or lock is needed, but that is generally not good practice.
Finally :
CPU and RAM are used by the process that is executing, depending on the programming language and its runtime, not by VS Code, which is just an editor.
I am running a for loop using MATLAB's parfor function. My CPU's specs are
I set the preferred number of workers to 24. However, MATLAB sets this number to 6. Is the number of workers bounded by the number of cores or by (number of cores) x (number of processors) = 6 x 12?
Matlab prefers to limit the number of workers to the number of cores (six in your case).
Your CPU (Intel i7-9750H) has hyperthreading, i.e. it can run multiple (here 2) threads per core. However, this is of no use if you want to run them under full load, because then there are simply no resources available to switch to a different task (which is effectively what the additional threads are).
See the documentation.
"Restricting to one worker per physical core ensures that each worker has exclusive access to a floating point unit, which generally optimizes performance of computational code. If your code is not computationally intensive, for example, it is input/output (I/O) intensive, then consider using up to two workers per physical core. Running too many workers on too few resources may impact performance and stability of your machine."
Note that MATLAB needs to stream data to every core in order to run the distributed code. This is a kind of initialization effort, and it is the reason why you won't cut the runtime in half by doubling the number of cores/workers. It also explains why MATLAB gains nothing from hyperthreading here: it would just increase the initial streaming effort without any speed-up. In fact, the core would probably force MATLAB to save intermediate results and switch to the other task from time to time... which is the same task as before ;)
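To make the point about initialization effort concrete, here is a toy model in Python. It assumes, purely for illustration, that the streaming cost grows linearly with the number of workers and that the remaining work divides evenly; the function name and all numbers are hypothetical, not measurements of MATLAB.

```python
def estimated_runtime(parallel_work, setup_per_worker, workers):
    """Toy model: setup cost paid once per worker, plus evenly divided work.

    All arguments are in seconds (workers is a count). This is an
    illustrative assumption, not a description of MATLAB's internals.
    """
    return setup_per_worker * workers + parallel_work / workers


# Hypothetical job: 600 s of divisible work, 5 s to stream data to each worker.
for n in (6, 12, 24):
    print(n, estimated_runtime(600, 5, n))
```

Under these assumed numbers, going from 6 to 12 workers reduces the estimate from 130 s to 110 s rather than halving it, and 24 workers come out slower than 6 because the setup term dominates.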
When I clear all, the MATLAB memory usage drops to 0.5GB. I then load a *.mat file in which the memory requirement is dominated by an object requiring 161MB (from whos). The MATLAB memory usage jumps to 1.3GB. I then run a script that processes the data, creating small data objects/structures in the process. From whos, nothing rivals the 161MB object in terms of memory footprint. However, MATLAB's memory footprint creeps up to 2.2GB during processing, then settles down to 1.6GB when done. whos still reveals the overwhelmingly dominant memory user to be the loaded object.
Why does MATLAB use so much more memory than the data that it is processing? It's about 10 times more. Is this just to give it space for intermediate results?
I'm using Windows 7, 64-bit. My code is a pretty simple post-processing script that tallies up some of the loaded data. It invokes no user-defined functions or 3rd party tools. I understand that readers can't analyze my code to track down the specific causes, but is a 10x memory footprint typical? What are typical reasons for this?
Some users (right now 4) are running the same rather large and heavy MATLAB (R2010b) script on the same Windows server.
There seems to be a rather big drop in MATLAB performance (I observed a factor of 5 in running time when doing a bit of benchmarking) when more users are running the same script on the server (on many different datasets). Depending on the size of the dataset, the running time is between a few hours and 1-2 weeks.
There are plenty of CPU and RAM resources available on the server, so this is not the bottleneck. The server has 64 cores and 128 GB RAM; the program uses no more than 10% of the CPU (most of the time less) and about 1 GB of RAM while running.
It does not seem to be a hardware-related bottleneck, as the server in general runs other programs without any significant slowdown. Only MATLAB seems to be slowing down.
Is there some kind of internal resource in MATLAB that is being used up and creating a bottleneck, and if so, is there a way to get around this?
Edit, extra info
When running "bench" while the scripts are running, I also get extremely slow results from this built-in machine benchmark, worse for the heavier tests. This indicates to me that the problem is not directly related to reading/writing files, although it might be indirectly related if MATLAB writes some temporary files.
I also tried increasing the Java Heap Memory to 10 GB. It does improve performance a bit, but there is still a very clear slowdown with each new instance of this script that is run.
Update: upgrading to MATLAB R2015b didn't change much. We have improved the code a lot, so it runs much faster now, but the issue remains, even though the problem is smaller since the script now runs for a shorter amount of time for each user.
I have a little program that creates a maze. It uses lots of collections (the default variant, which is immutable, or at least used as immutable).
The program calculates 30 mazes with increasing dimensions, using a for comprehension over (1 to 30).
Since with the latest versions the parallel collections framework became available I thought to give it a spin, hoping for some performance gain.
This failed and when I investigated a little, I found the following:
When run without any call to anything remotely parallel it still showed a processor load of about 30% on each of the 4 cores of my machine.
When I replaced the Range 1 to 30 with (1 to 30).par CPU load went up to about 80% on all cores (which I expected). The order in which the mazes completed became more or less random (which I expected). The total time for all mazes stayed the same.
Replacing some of the internally used collections with their parallel counterparts did seem to have an effect.
I now have 2 questions:
Why do I have all 4 cores spinning, although there isn't anything that runs in parallel?
What might be likely reasons for the program to still take the same time, no matter whether it runs in parallel or not? There are no obvious bottlenecks other than CPU cycles (no IO, no network, plenty of memory via the -Xmx setting).
Any ideas on this?
The 30% per core version is just a poor scheduler (sounds like Windows 7) migrating the process from core to core very frequently. It's probably closer to 25% per core (1/4) for your process plus misc other load making 30%. If you run the same example under Linux you would probably see one core pegged.
When you converted to (1 to 30).par, you started really using threads across all cores but the synchronization overhead of distributing such a small amount of work and then collecting the results cancelled out the parallelism gains. You need to break your work into larger independent chunks.
EDIT: If each of 1..30 represents some larger amount of work (solving a maze, say), then automatic parallelization will work much better if each unit of work is about the same size. Imagine you had 29 easy mazes and one very, very hard maze. The 30th maze will still run serially (or very nearly so) relative to everything else. If your mazes increase in complexity by number, try spawning them in the order 30 to 1 by -1 so that the biggest tasks go first. Think of it as a braindead solution to the knapsack problem.
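The effect of starting the biggest tasks first can be sketched with a small simulation, written here in Python. It assumes a simplified greedy scheduler (each task goes to whichever worker frees up first) and a hypothetical cost of n time units for maze n. It is not a model of how Scala's parallel collections actually split and schedule work.

```python
import heapq


def makespan(task_costs, workers):
    """Greedy list scheduling: each task, in the given order, is assigned to
    the worker that becomes free first. Returns the finish time of the last
    worker."""
    loads = [0] * workers
    heapq.heapify(loads)
    for cost in task_costs:
        earliest = heapq.heappop(loads)          # worker that is free first
        heapq.heappush(loads, earliest + cost)   # it takes the next task
    return max(loads)


costs = list(range(1, 31))  # hypothetical: maze n costs n time units
ascending = makespan(costs, workers=4)
descending = makespan(sorted(costs, reverse=True), workers=4)
print(ascending, descending)  # prints: 128 117
```

With these assumed costs the total work is 465 units, so four workers cannot finish before 116.25. The descending order reaches 117, while the ascending order leaves the largest mazes for the end and finishes at 128.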