How to setup matlabpool for multiple processors? - matlab

I just setup a Extra Large Heavy Computation EC2 instance to throw it at my Genetic Algorithms problem, hoping to speed up things.
This instance has 8 Intel Xeon processors (around 2.4Ghz each) and 7 Gigs of RAM.
On my machine I have an Intel Core Duo, and matlab is able to work with my two cores just fine by runinng:
matlabpool open 2
On the EC2 instance though, matlab only is capable of detecting 1 out of 8 processors, and if I try running:
matlabpool open 8
I get an error saying that the ClusterSize is 1 since there's only 1 core on my CPU. True, there is only 1 core on each CPU, but I have 8 CPUs on the given EC2 instance!
So the difference from my machine and the ec2 instance is that I have my 2 cores on a single processor locally, while the EC2 instance has 8 distinct processors.
My question is, how do I get matlab to work with those 8 processors?
I found this paper, but it seems related to setting up matlab with multiple EC2 instances (not related to multiple processors on the same instance, EC2 or not), which is not my problem.
Any help appreciated!
Note: the point is not EC2, I am remoting into it and running matlab on it as if it was any other machine. The point is that I can't get matlab to see the 8 processors!

MATLAB isn't seeing all 8 cores. Set it manually. Parallel menu -> Manage Configurations. Right-click on the "local" line. In the scheduler tab, set the "Number of workers available to scheduler" to 8.
Original answer was a question getting more detail:
Are you trying to use MDCS on EC2 (and MATLAB's user interface on your PC), or are you trying to run MATLAB's user interface and PCT on EC2 (via ssh or vnc or the like)?

This post is to add information in response to a part of original poster's question
[OP] I found this paper, but it seems related to setting up matlab with multiple EC2 instances (not related to multiple processors on the same instance, EC2 or not)...
The paper mentioned above is no longer available
In its place MathWorks offers MATLAB users a way to set up and distribute computations on a cluster running MATLAB Distributed Computing Server (MDCS) on Amazon EC2. More information is available here: http://www.mathworks.com/ec2

Related

KVM CPU share / priority / overselling

i have question about KVM i could not find any satisfying answer in the net about.
Lets say i want to create 3 virtual machines on a host with 2 CPUs. I am assigning 1 cpu to 1 virtual machines. The other 2 virtual machines should be sharing 1 cpu. If it is possible i want to give 1 vm 30 % and the other one 70% of the cpu.
I know this does not make much sense but i am curious and want to test is :-)
I know that hypervisors like onapp can do that. But how do they do it?
KVM represents each virtual CPU as a thread in the host Linux system, actually as a thread in the QEMU process. So scheduling of guest VCPUs is controlled by the Linux scheduler.
On Linux, you can use taskset to force specific threads onto specific CPUs. So that will let you assign one VCPU to one physical CPU and two VCPUs to another. See, for example, https://groups.google.com/forum/#!topic/linuxkernelnewbies/qs5IiIA4xnw.
As far as controlling what percent of the CPU each VM gets, Linux has several scheduling policies available, but I'm not familiar with them. Any information you can find on how to control scheduling of Linux processes will apply to KVM.
The answers to this question may help: https://serverfault.com/questions/313333/kvm-and-virtual-to-physical-cpu-mapping. (Also that forum may be a better place for this question, since this one is intended for programming questions.)
If you search for "KVM virtual CPU scheduling" and "Linux CPU scheduling" (without the quotes), you should find plenty of additional information.

Matlab, parallel computing and Amazon EC2

I'm completely new to Amazon EC2. I have read the website documentation but I'm still confused.
At the moment I'm estimating my model using Matlab-r2014b. In my Matlab code I use parallel computing ("parfor") on the local cluster. I run my model through the HPC of my University which allows me to access 1 node with 40GB of memory and 12 cores.
My questions are the following:
(1) Does Amazon EC2 offer a machine with more than 40GB of memory and 12 cores where I can run my Matlab code?
(2) Prices and instructions?
Short answer, yes, the c3.8xlarge instance type is available with 60GB of ram and 32 cores.
Per hour pricing is available here: https://aws.amazon.com/ec2/pricing/ along with all the other sizes and options.

matlab parallel processing on several nodes

I have studied pages and discussion on matlab processing, but I still don't know how to distribute my program over several nodes(not cores). In the cluster which I am using, there are 10 nodes available, and inside each node there are 8 cores available. When Using "parfor" inside each node (locally between 8 cores), the parallel-ization works fine. But when using several nodes, I think that (not sure how to verify this) it doesn't work well. Here is a piece of program which I run on the cluster:
function testPool2()
disp('This is a comment')
disp(['matlab number of cores : ' num2str(feature('numCores'))])
matlabpool('open',5);
disp('This is another comment!!')
tic;
for i=1:10000
b = rand(1,1000);
end;
toc
tic;
parfor i=1:10000
b = rand(1,1000);
end;
toc
end
And the outputs is :
This is a comment
matlab number of cores : 8
Starting matlabpool using the 'local' profile ... connected to 5 labs.
This is another comment!!
Elapsed time is 0.165569 seconds.
Elapsed time is 0.649951 seconds.
{Warning: Objects of distcomp.abstractstorage class exist - not clearing this
class
or any of its super-classes}
{Warning: Objects of distcomp.filestorage class exist - not clearing this class
or any of its super-classes}
{Warning: Objects of distcomp.serializer class exist - not clearing this class
or any of its super-classes}
{Warning: Objects of distcomp.fileserializer class exist - not clearing this
class
or any of its
super-classes}
The program is first compiled using "mcc -o out testPool2.m" and then transferred to an scratch drive of a server. Then I submit the job using Microsoft HPC pack 2008 R2. Also note that I don't have access to the graphical interface of the MATLAB installed on each of the nodes. I can only submit jobs using MSR HPC Job Manager (see this: http://blogs.technet.com/b/hpc_and_azure_observations_and_hints/archive/2011/12/12/running-matlab-in-parallel-on-a-windows-cluster-using-compiled-matlab-code-and-the-matlab-compiler-runtime-mcr.aspx )
Based on the above output we can see that, the number of the available cores is 8; so I infer that the "matlabpool" only works for local cores in a machine; not between nodes (separate computers connected to each other)
So, any ideas how I can generalize my for loop ("parfor") to nodes ?
PS. I have no idea what are the warnings at the end of the output !
In order to run MATLAB on multiple nodes, the distributed computing server is needed in addition to the parallel computing toolbox. The distributing computing server must be installed and correctly configured on all of the nodes in the cluster. Normally MATLAB distributed server comes with shell scripts for launching parallel MATLAB jobs on, multiple nodes based on scheduler and cluster setup.
Without access to the distributed computing server, MATLAB can only be run on a single node. It would be valuable to verify with the cluster administrator that the distributed computing server is setup and running correctly; in some cases the administrators of these servers even have example scripts for launching and running jobs common to their user base, e.g. MATLAB
Here is a link to documentation on the Distributed Computing Server:
http://www.mathworks.com/help/mdce/index.html?searchHighlight=distributed%20computing%20server

Connecting laptop(s)/desktop(s) to form a MATLAB computing cluster?

I have experience running parallel jobs on a remote cluster, and parallel (parfor) jobs on a single local machine, but never tried making a cluster of my own. I have access couple of laptops/desktops/servers (root access on all except one server), and was wondering if I could connect them all (or some) to form a local cluster (will have about 30 cores total).
Once you move beyond working with one machine, you move license types from a parallel computing toolbox to a Distributed Computing Server license. The licenses are available in clusters from 8 workers and up. List price on a 8 worker cluster is $6K, 32 workers are $21K. You can get more information on the Mathworks product page. Also note that submitting jobs to the workers requires the Parallel Computing Toolbox.
Once you have the worker licenses the only supported way to distribute jobs to the workers is through a scheduler. The server licenses come with a basic Mathworks scheduler that does have some limitations, but is ideal for single users or small groups. Beyond that you would need to go with one of the higher end schedulers such as LSF. A full list of supported schedulers is on the product page. Moving from a PCT setup on a single machine to a distributed setup can be fairly involved.
Are you prepared to pay the license cost for this? You can use local clusters (up to 8) using 1 copy of the parallel computing toolbox license. But to use distributed clusters, you need a distributed computing toolbox for each "node" (processor core) on the cluster. I'm not familiar with how to set this up. I know that I have access to a few of these clusters, and I also use local clusters extensively. We opted to not create our own distributed cluster for this reason. We also have data that shows that distributed clusters were slow for our particular tasks (a lot of file io was happening in our case).
I know this doesn't answer your question, just a few things to think about.

How does one create a cluster from different servers in MATLAB?

I have servers A, B & C with 8 cores each. Right now, I'm running jobs in parallel on a single server with 8 workers. Is there any way I can harness the power of all three for a single job? The servers are all inter-accessible via ssh (all three exist behind a gateway, so no password required either)
I'm going to assume you're currently using Parallel Computing Toolbox. To use multiple servers together, you need the following things:
MATLAB Distributed Computing Server licences for the MATLAB workers running on the machines.
Some sort of scheduler to schedule the jobs across the machines. MDCS comes with a basic scheduler, call the "Jobmanager". There are also various freely available schedulers for Linux systems such as Torque.
The installation instructions for MDCS are quite detailed and will lead you through all the stages you need to complete to get parallel jobs running.