matlab parallel processing on several nodes - matlab

I have studied pages and discussion on matlab processing, but I still don't know how to distribute my program over several nodes(not cores). In the cluster which I am using, there are 10 nodes available, and inside each node there are 8 cores available. When Using "parfor" inside each node (locally between 8 cores), the parallel-ization works fine. But when using several nodes, I think that (not sure how to verify this) it doesn't work well. Here is a piece of program which I run on the cluster:
function testPool2()
disp('This is a comment')
disp(['matlab number of cores : ' num2str(feature('numCores'))])
matlabpool('open',5);
disp('This is another comment!!')
tic;
for i=1:10000
b = rand(1,1000);
end;
toc
tic;
parfor i=1:10000
b = rand(1,1000);
end;
toc
end
And the outputs is :
This is a comment
matlab number of cores : 8
Starting matlabpool using the 'local' profile ... connected to 5 labs.
This is another comment!!
Elapsed time is 0.165569 seconds.
Elapsed time is 0.649951 seconds.
{Warning: Objects of distcomp.abstractstorage class exist - not clearing this
class
or any of its super-classes}
{Warning: Objects of distcomp.filestorage class exist - not clearing this class
or any of its super-classes}
{Warning: Objects of distcomp.serializer class exist - not clearing this class
or any of its super-classes}
{Warning: Objects of distcomp.fileserializer class exist - not clearing this
class
or any of its
super-classes}
The program is first compiled using "mcc -o out testPool2.m" and then transferred to an scratch drive of a server. Then I submit the job using Microsoft HPC pack 2008 R2. Also note that I don't have access to the graphical interface of the MATLAB installed on each of the nodes. I can only submit jobs using MSR HPC Job Manager (see this: http://blogs.technet.com/b/hpc_and_azure_observations_and_hints/archive/2011/12/12/running-matlab-in-parallel-on-a-windows-cluster-using-compiled-matlab-code-and-the-matlab-compiler-runtime-mcr.aspx )
Based on the above output we can see that, the number of the available cores is 8; so I infer that the "matlabpool" only works for local cores in a machine; not between nodes (separate computers connected to each other)
So, any ideas how I can generalize my for loop ("parfor") to nodes ?
PS. I have no idea what are the warnings at the end of the output !

In order to run MATLAB on multiple nodes, the distributed computing server is needed in addition to the parallel computing toolbox. The distributing computing server must be installed and correctly configured on all of the nodes in the cluster. Normally MATLAB distributed server comes with shell scripts for launching parallel MATLAB jobs on, multiple nodes based on scheduler and cluster setup.
Without access to the distributed computing server, MATLAB can only be run on a single node. It would be valuable to verify with the cluster administrator that the distributed computing server is setup and running correctly; in some cases the administrators of these servers even have example scripts for launching and running jobs common to their user base, e.g. MATLAB
Here is a link to documentation on the Distributed Computing Server:
http://www.mathworks.com/help/mdce/index.html?searchHighlight=distributed%20computing%20server

Related

Running multiple parpool jobs on a cluster

I am trying to run a number of MATLAB jobs on a cluster.
Since MATLAB saves states and diaries of each parpool job in ~/.matlab/... , when I run multiple jobs on a cluster, (each job using its own parpool), then MATLAB despite the fact that I close every open parpool every time I use one, it gives me errors related to "found 5 pre-existing parallel jobs..."
Is there a way to change the preferences folder of MATLAB for each instance of MATLAB so that this conflict does not arise ?
You need to overwrite JobStorageLocation property with a unique path for each job before starting parallel pool, e.g.
pc = parcluster('local'); % or whatever cluster you're running your jobs on
pc.JobStorageLocation = 'C:\my\unique\job\storage\location';
parpool(pc);

The client lost connection to lab X Matlab

I need help on how to tackle the Matlab error below. After a couple of successful runs I got the error message below in Matlab using parfor.
Opened 2 pools. Send function1 to worker1 and send function2 to worker2. Both functions does some sort of calcs on matrices and generate CSV at the end. It was fine until after a few runs.
The session that parfor is using has shut down
The client lost connection to lab 2. This might be due to network
problems, or the interactive matlabpool job might have errored.
We're using VM machine with a processor Intel Xeon X7560 #2.27GHz (4 processors). The RAM is 16GB. 64-bit OS.
This is part of a batch run. To resolve the issue instead of re-using the pools for every batch iteration. Make sure to "close" it. Then open fresh Matlab pools for every iteration. Seems to be more stable now, although a lot slower than the previous implementation.

Connecting laptop(s)/desktop(s) to form a MATLAB computing cluster?

I have experience running parallel jobs on a remote cluster, and parallel (parfor) jobs on a single local machine, but never tried making a cluster of my own. I have access couple of laptops/desktops/servers (root access on all except one server), and was wondering if I could connect them all (or some) to form a local cluster (will have about 30 cores total).
Once you move beyond working with one machine, you move license types from a parallel computing toolbox to a Distributed Computing Server license. The licenses are available in clusters from 8 workers and up. List price on a 8 worker cluster is $6K, 32 workers are $21K. You can get more information on the Mathworks product page. Also note that submitting jobs to the workers requires the Parallel Computing Toolbox.
Once you have the worker licenses the only supported way to distribute jobs to the workers is through a scheduler. The server licenses come with a basic Mathworks scheduler that does have some limitations, but is ideal for single users or small groups. Beyond that you would need to go with one of the higher end schedulers such as LSF. A full list of supported schedulers is on the product page. Moving from a PCT setup on a single machine to a distributed setup can be fairly involved.
Are you prepared to pay the license cost for this? You can use local clusters (up to 8) using 1 copy of the parallel computing toolbox license. But to use distributed clusters, you need a distributed computing toolbox for each "node" (processor core) on the cluster. I'm not familiar with how to set this up. I know that I have access to a few of these clusters, and I also use local clusters extensively. We opted to not create our own distributed cluster for this reason. We also have data that shows that distributed clusters were slow for our particular tasks (a lot of file io was happening in our case).
I know this doesn't answer your question, just a few things to think about.

How does one create a cluster from different servers in MATLAB?

I have servers A, B & C with 8 cores each. Right now, I'm running jobs in parallel on a single server with 8 workers. Is there any way I can harness the power of all three for a single job? The servers are all inter-accessible via ssh (all three exist behind a gateway, so no password required either)
I'm going to assume you're currently using Parallel Computing Toolbox. To use multiple servers together, you need the following things:
MATLAB Distributed Computing Server licences for the MATLAB workers running on the machines.
Some sort of scheduler to schedule the jobs across the machines. MDCS comes with a basic scheduler, call the "Jobmanager". There are also various freely available schedulers for Linux systems such as Torque.
The installation instructions for MDCS are quite detailed and will lead you through all the stages you need to complete to get parallel jobs running.

How to setup matlabpool for multiple processors?

I just setup a Extra Large Heavy Computation EC2 instance to throw it at my Genetic Algorithms problem, hoping to speed up things.
This instance has 8 Intel Xeon processors (around 2.4Ghz each) and 7 Gigs of RAM.
On my machine I have an Intel Core Duo, and matlab is able to work with my two cores just fine by runinng:
matlabpool open 2
On the EC2 instance though, matlab only is capable of detecting 1 out of 8 processors, and if I try running:
matlabpool open 8
I get an error saying that the ClusterSize is 1 since there's only 1 core on my CPU. True, there is only 1 core on each CPU, but I have 8 CPUs on the given EC2 instance!
So the difference from my machine and the ec2 instance is that I have my 2 cores on a single processor locally, while the EC2 instance has 8 distinct processors.
My question is, how do I get matlab to work with those 8 processors?
I found this paper, but it seems related to setting up matlab with multiple EC2 instances (not related to multiple processors on the same instance, EC2 or not), which is not my problem.
Any help appreciated!
Note: the point is not EC2, I am remoting into it and running matlab on it as if it was any other machine. The point is that I can't get matlab to see the 8 processors!
MATLAB isn't seeing all 8 cores. Set it manually. Parallel menu -> Manage Configurations. Right-click on the "local" line. In the scheduler tab, set the "Number of workers available to scheduler" to 8.
Original answer was a question getting more detail:
Are you trying to use MDCS on EC2 (and MATLAB's user interface on your PC), or are you trying to run MATLAB's user interface and PCT on EC2 (via ssh or vnc or the like)?
This post is to add information in response to a part of original poster's question
[OP] I found this paper, but it seems related to setting up matlab with multiple EC2 instances (not related to multiple processors on the same instance, EC2 or not)...
The paper mentioned above is no longer available
In its place MathWorks offers MATLAB users a way to set up and distribute computations on a cluster running MATLAB Distributed Computing Server (MDCS) on Amazon EC2. More information is available here: http://www.mathworks.com/ec2