MATLAB parfor processes static memory usage? - matlab

I've had some parfor code running for around a day in order to perform grid search on classifier parameters. Anyway, from the output I'm able to tell that I'm about 95% of the way through the search. I had started my pool with 8 workers. From looking at task manager, it appears that only two of the workers are still running. This is my assumption given two MATLAB.exe processes are at 700MB and six are at 170MB. Anyway, my real concern is that all 8 of these MATLAB.exe instances have static memory usage. I.e., memory usage is not jumping around, which is what I would typically see. In the past, when not using parfor I would assume this means the program has crashed and I'll have to restart. MATLAB GUI is responding and usable.
I'm unsure what to think of this though when using the parallel computing. Anyone experienced this before? I'm running MATLAB R2013a

I don't think there's cause for concern just yet. The MATLAB processes will always use some memory even when idle and 170 MB is not unusual. In fact on my machine, if I start a pool of 4 workers using 'local', and do nothing, each worker uses around 250 MB. The worker processes will continue to exist and remain in an idle state until you close the pool.

Related

MATLAB slowing down with many instances of script running

some users (right now 4) are running the same rather large and heavy MATLAB (R2010b) script on the same windows server.
There seems to be a rather big drop in performance in the MATLAB (observed a factor 5 in running time when doing a bit of benchmarking) when more users are running the same script on the server (many different datasets). Depending on the size of the dataset the running time is between a few hours and 1-2 weeks.
There is plenty of CPU and RAM resources available on the server, this is not the bottleneck. This server has 64 cores and 128 GB RAM, the program uses no more than 10% of the CPU, most of the time less than, and about 1 GB of RAM while running).
It does not seem to be a bottleneck related to hardware, as the server in general is running other programs without any significant slow down, only MATLAB seems to be slowing down.
Is there some kind of internal resources in MATLAB that is being used up and creating a bottleneck and if so is there a way to get around this?
Edit, extra info
When running "bench" while the scripts are running I also get extremely slow speed from this internal machine benchmarking, worse for the heavier tests ... this indicates to me it is not directly related to reading/writing files, it might be indirectly related if matlab writes some temporary files.
Also just tried to increase Java Heap Memory to 10 GB ... it does improve performance a bit, but there is still a very clear slowdown with each new instance of this script that is being run.
Update: upgrading to MATLAB 2015B didn't change much. We have improved a lot on the code so it runs much faster now, but the issue still remains even though the problem is smaller since the program is script is running for shorter amounts of time for each user.

Difference between MATLAB parallel computing terminologies

I want to know the differences between
1. labs
2. workers
3. cores
4. processes
Is it just the semantics or they are all different?
labs and workers are MathWorks terminologies, and they mean roughly the same thing.
A lab or a worker is essentially an instance of MATLAB (without a front-end). You run several of them, and you can run them either on your own machine (requires only Parallel Computing Toolbox) or remotely on a cluster (requires Distributed Computing Server). When you execute parallel code (such as a parfor loop, an spmd block, or a parfeval command), the code is executed in parallel by the workers, rather than by your main MATLAB.
Parallel Computing Toolbox has changed and developed its functionality quite a lot over recent releases, and has also changed and developed the terminologies it uses to describe the way it works. At some point it was convenient to refer to them as labs when running an spmd block, but workers when running a parfor loop, or working on jobs and tasks. I believe they are moving now toward always calling them workers (although there's a legacy in the commands labSend, labReceive, labBroadcast, labindex and numlabs).
cores and processes are different, and are not themselves anything to do with MATLAB.
A core is a physical part of your processor - you might have a dual-core or quad-core processor in your desktop computer, or you might have access to a really big computer with many more than that. By having multiple cores, your processor can do multiple things at once.
A process is (roughly) a program that your operating system is running. Although the OS runs multiple programs simultaneously, it typically does this by interleaving operations from each process. But if you have access to a multiple-core machine, those operations can be done in parallel.
So you would typically want to tell MATLAB to start one worker for each of the cores you have on your machine. Each of those workers will be run as a process by the OS, and will end up being run one worker per core in parallel.
The above is quite simplified, but I hope gives a roughly accurate picture.
Edit: moved description of threads from a comment to the answer.
Threads are something different again. Threads are also not in themselves anything to do with MATLAB.
Let's go back to processes for a moment. One thing I didn't mention above is that the OS allocates each process a specific block of memory which other processes shouldn't be able to touch, so that it's difficult for them to interact with each other and mess things up.
A thread is like a process within a process - it's a stream of operations that the process runs. Typically, operations from each thread would be interleaved, but if you have multiple cores, they can also be parallelized across the cores.
However, unlike processes, they all share a memory block, which is OK because they're all managed by the same program so it should matter less if they're allowed to interact.
Regular MATLAB automatically uses multiple threads to parallelize many built-in operations (such as matrix multiplication, svd, eig, linear algebra etc) - that's without you doing anything, and whether or not you have Parallel Computing Toolbox.
However, MATLAB workers are each run as a single process with a single thread, so you have full control over how to parallelize.
I think workers are synonyms for processes. The term "cores" is related to the hardware. Labs is a mechanism which allows workers to communicate with each other. Each worker has at least one lab but can own more.
This piece of a discussion may be useful
http://www.mathworks.com/matlabcentral/answers/5529-mysterious-behavior-in-parfor-i-know-sounds-basic-but
I hope someone here will deliver more information in a more rigorous way

Using matlabpool with a specified number of workers

I've been using the command matlabpool open 8 for a while in order to speed up things.
However I just tried using it and am denied 8 cores and now limited to 4.
My laptop is an i7 with 4 cores but hyperthreaded which meant I had no issue making matlab working on 8 virtual cores.
Simultaneously I noticed the following warning message:
Warning: matlabpool will be removed in a future release.
Use parpool instead.
Seems like MathsWorks decided this was a great update for some reason.
Any ideas how I can get my code running on 8 cores again?
Note: I was using R2010b (I think) and now using R2014b.
It looks like #horchler has provided you with a direct solution to your question in the comments.
However, I would recommend sticking to the default 4 workers suggested by MATLAB, and not using 8. You're very unlikely to get significant speedup by moving to 8, and you're even likely to slow things down a bit.
You have four physical cores, and they can only do so much work. Hyperthreading enables the operating system to pretend that there are 8 cores, by interleaving operations done on pairs of virtual cores.
This is great for applications such as Outlook, which are not compute-intensive, but require lots of operations to appear simultaneous in order, for example, to keep a GUI responsive while checking for email over a network connection.
But for compute-intensive applications such as MATLAB, it will not give you any sort of real speed up, as the operations are just interleaved - you haven't increased the amount of work that the 4 real, physical cores can do. In addition, there's a small overhead in performing the hyperthreading.
In my experience, MATLAB will benefit slightly by turning hyperthreading off. (Of course other things, such as Outlook, won't: your choice).

How to fully use the CPU in Matlab [Improving performance of a repetitive, time-consuming program]

I'm working on an adaptive and Fully automatic segmentation algorithm under varying light condition , the core of this algorithm uses Particle swarm Optimization(PSO) to tune the fuzzy system and believe me it's very time consuming :| for only 5 particles and 100 iterations I have to wait 2 to 3 hours ! and it's just processing one image from my data set containing over 100 photos !
I'm using matlab R2013 ,with a intel coer i7-2670Qm # 2.2GHz //8.00GB RAM//64-bit operating system
the problem is : when starting the program it uses only 12%-16% of my CPU and only one core is working !!
I've searched a lot and came into matlabpool so I added this line to my code :
matlabpool open 8
now when I start the program the task manger shows 98% CPU usage, but it's just for a few seconds ! after that it came back to 12-13% CPU usage :|
Do you have any idea how can I get this code run faster ?!
12 Percent sounds like Matlab is using only one Thread/Core and this one with with full load, which is normal.
matlabpool open 8 is not enough, this simply opens workers. You have to use commands like parfor, to assign work to them.
Further to Daniel's suggestion, ideally to apply PARFOR you'd find a time-consuming FOR loop in you algorithm where the iterations are independent and convert that to PARFOR. Generally, PARFOR works best when applied at the outermost level possible. It's also definitely worth using the MATLAB profiler to help you optimise your serial code before you start adding parallelism.
With my own simulations I find that I cannot recode them using Parfor, the for loops I have are too intertwined to take advantage of multiple cores.
HOWEVER:
You can open a second (and third, and fourth etc) instance of Matlab and tell this additional instance to run another job. Each instance of matlab open will use a different core. So if you have a quadcore, you can have 4 instances open and get 100% efficiency by running code in all 4.
So, I gained efficiency by having multiple instances of matlab open at the same time and running a job. My jobs took 8 to 27 hours at a time, and as one might imagine without liquid cooling I burnt out my cpu fan and had to replace it.
Also do look into optimizing your matlab code, I recently optimized my code and now it runs 40% faster.

performance issues with parallel MATLAB on a NUMA machine

I'm running memory-intensive parallel computations in MATLAB on a 64-core NUMA machine under Windows 7, 8 cores per socket. I'm using parallel computing toolbox to do that. I've noticed a very strange cpu load pattern: then running say 36 parallel MATLABs, the cores on the 1st socket are fully loaded, 2nd socket is almost fully loaded too, third socket is about 50% and so on. The last socket is usually almost completely free and doing nothing. Running more than 12 parallel workers simultaneously seem to very adversely affect performance of all workers.
I tried to experiment with cpu affinity, pinning different workers to different cores. While it helps in simple tests (i.e. cpu load pattern becomes uniform across all cores), it doesn't help in our real-life memory-intensive computations.
I suspect the problem is with memory locality. I.e. all memory is allocated on 1st and 2nd sockets. This would explain strange cpu load: OS tires to run computational threads closer to the data. But I don't know neither how to confirm this suspicion directly, nor how to fix it, if it's true.
I use maxNumCompThreads(4) in all my parallel workers, if that's important. Hyperthreading is off.
You should only be able to run 12 local workers using Parallel Computing Toolbox. See the data sheet.
Please note that in R2014a the limit on the number of local workers was removed. See the release notes.