The client lost connection to lab X in MATLAB

I need help on how to tackle the MATLAB error below. After a couple of successful runs I got the following error message while using parfor.
I opened a pool of 2 workers, sent function1 to worker 1 and function2 to worker 2. Both functions do some sort of calculations on matrices and generate a CSV file at the end. It ran fine until after a few runs.
The session that parfor is using has shut down
The client lost connection to lab 2. This might be due to network
problems, or the interactive matlabpool job might have errored.
We're using a VM with an Intel Xeon X7560 @ 2.27 GHz (4 processors), 16 GB of RAM, and a 64-bit OS.

This is part of a batch run. To resolve the issue, instead of re-using the pool for every batch iteration, make sure to close it and then open a fresh matlabpool for every iteration. This seems to be more stable now, although a lot slower than the previous implementation.
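A minimal sketch of that pattern; numBatches and runBatchIteration are hypothetical names standing in for the batch count and the per-iteration matrix work described above:

for k = 1:numBatches
    matlabpool('open', 2);          % fresh pool of 2 workers per iteration
    parfor w = 1:2
        runBatchIteration(w, k);    % hypothetical per-worker calculation
    end
    matlabpool('close');            % tear down before the next iteration
end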

Related

Solving system of equations in parallel

I'm trying to solve a system of equations in parallel. I'm using the example from
http://www.mathworks.com/matlabcentral/answers/196655-linear-least-squares-mldivide-for-large-matrices-in-parallel
and I get the error:
Starting parallel pool (parpool) using the 'local' profile ... Error using parpool (line 103)
Not enough input arguments.
Any suggestions?
This is likely because your local cluster job storage got corrupted somehow. Find the local_cluster_jobs folder, delete it, and restart MATLAB. On Windows it is probably located at the following path (it could differ slightly depending on your MATLAB version):
%AppData%\MathWorks\MATLAB\local_cluster_jobs
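If you'd rather not hunt for the folder manually, MATLAB can report the location itself; this sketch assumes the default 'local' profile and uses the JobStorageLocation property of the cluster object:

% Print the folder holding local cluster job data; delete that folder
% while MATLAB is closed, then restart MATLAB.
c = parcluster('local');
disp(c.JobStorageLocation)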

matlab parallel processing on several nodes

I have studied pages of discussion on MATLAB parallel processing, but I still don't know how to distribute my program over several nodes (not cores). In the cluster I am using there are 10 nodes available, and inside each node there are 8 cores available. When using "parfor" inside each node (locally among its 8 cores), the parallelization works fine. But when using several nodes, I think it doesn't work well (I'm not sure how to verify this). Here is a piece of the program which I run on the cluster:
function testPool2()
    disp('This is a comment')
    disp(['matlab number of cores : ' num2str(feature('numCores'))])
    matlabpool('open', 5);
    disp('This is another comment!!')
    tic;
    for i = 1:10000
        b = rand(1,1000);
    end
    toc
    tic;
    parfor i = 1:10000
        b = rand(1,1000);
    end
    toc
end
And the output is:
This is a comment
matlab number of cores : 8
Starting matlabpool using the 'local' profile ... connected to 5 labs.
This is another comment!!
Elapsed time is 0.165569 seconds.
Elapsed time is 0.649951 seconds.
{Warning: Objects of distcomp.abstractstorage class exist - not clearing this class or any of its super-classes}
{Warning: Objects of distcomp.filestorage class exist - not clearing this class or any of its super-classes}
{Warning: Objects of distcomp.serializer class exist - not clearing this class or any of its super-classes}
{Warning: Objects of distcomp.fileserializer class exist - not clearing this class or any of its super-classes}
The program is first compiled using "mcc -o out testPool2.m" and then transferred to a scratch drive of a server. Then I submit the job using Microsoft HPC Pack 2008 R2. Also note that I don't have access to the graphical interface of the MATLAB installed on each of the nodes; I can only submit jobs using the MSR HPC Job Manager (see this: http://blogs.technet.com/b/hpc_and_azure_observations_and_hints/archive/2011/12/12/running-matlab-in-parallel-on-a-windows-cluster-using-compiled-matlab-code-and-the-matlab-compiler-runtime-mcr.aspx )
Based on the above output we can see that the number of available cores is 8, so I infer that "matlabpool" only works for local cores in a machine, not between nodes (separate computers connected to each other).
So, any ideas how I can generalize my for loop ("parfor") to nodes?
PS: I have no idea what the warnings at the end of the output are!
In order to run MATLAB on multiple nodes, MATLAB Distributed Computing Server is needed in addition to the Parallel Computing Toolbox. The Distributed Computing Server must be installed and correctly configured on all of the nodes in the cluster. Normally it comes with shell scripts for launching parallel MATLAB jobs on multiple nodes, based on the scheduler and cluster setup.
Without access to the Distributed Computing Server, MATLAB can only be run on a single node. It would be valuable to verify with the cluster administrator that the Distributed Computing Server is set up and running correctly; in some cases the administrators of these clusters even have example scripts for launching and running jobs common to their user base, e.g. for MATLAB. A sketch of what the code change looks like follows the documentation link below.
Here is a link to documentation on the Distributed Computing Server:
http://www.mathworks.com/help/mdce/index.html?searchHighlight=distributed%20computing%20server
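As a hedged sketch only: assuming the administrator has configured a Distributed Computing Server cluster profile (the name 'HPCcluster' here is made up), the pool is opened against that profile instead of the 'local' one, and the parfor body itself stays unchanged:

% 'HPCcluster' is a hypothetical MDCS profile created by the cluster
% admin; its 40 workers could span, e.g., 5 nodes with 8 cores each.
matlabpool('open', 'HPCcluster', 40);
parfor i = 1:10000
    b = rand(1,1000);
end
matlabpool('close');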

Distributed write job crashes remote machine with MongoDB server

Looking for any advice I can get.
I have 16 virtual CPUs all writing to a single remote MongoDB server. The machine that's being written to is a 64-bit machine with 32GB RAM, running Windows Server 2008 R2. After a certain amount of time, all the CPUs stop cold (no gradual performance reduction), and any attempt to get a Remote Desktop Connection hangs.
I'm writing from Python via pymongo, and the insert statement is "[collection].insert([document], safe=True)"
I decided to more actively monitor my server as the distributed write job progressed, remoting in from time to time and checking the Task Manager. What I see is a steady memory creep, from 0.0GB all the way up to 29.9GB, in a fairly linear fashion. My leading theory is therefore that my writes are filling up the memory and eventually overwhelming the machine.
Am I missing something really basic? I'm new to MongoDB, but I remember that when writing to a MySQL database, inserts are typically followed by commits, where it's the commit statement that actually makes sure the record is written. Here I'm not doing any commits...?
Thanks,
Dave
Try it with journaling turned off and see if the problem remains. On MongoDB servers of that era, journaling can be disabled by starting mongod with the --nojournal option; with the memory-mapped storage engine, the journal keeps an extra private view of the data files mapped, which could account for part of the memory creep you're seeing.

Load distribution to instances of a perl script running on each node of a cluster

I have a Perl script (call it "worker") installed on each node/machine (4 total) of a cluster (each running RHEL). The script itself is configured as a Red Hat Cluster service (which means the RH cluster manager ensures that exactly one instance of this script is running as long as at least one node in the cluster is up).
I have X amount of work to be done once a day, which this script does. So far X was small enough that only one instance of this script was needed. But now the load is going to increase, and along with high availability (already implemented using RHCS), I also need load distribution.
Question is how do I do that?
Of course I have a way to split the work into n parts of size X/n each. Options I had in mind:
Create a new load distributor, which splits the work into jobs of X/n each, AND one of the following:
Create a named pipe on the network file system (which is mounted and visible on all nodes) and post all jobs to the pipe. Make each worker script on each node read (atomically) from the pipe and do the work. OR
Make each worker script on each node listen on a TCP socket, and have the load distributor send jobs to each socket in a round-robin (or some other) fashion.
The theoretical problem with #1 is that we've observed some nasty latency problems with NFS, and I'm not even sure NFS supports IPC via named pipes across machines.
The theoretical problem with #2 is that I would have to implement some monitors to ensure that each worker is running and listening, which, being a noob to Perl, I'm not sure is easy enough.
I personally prefer the load distributor creating a pool and workers pulling from it, rather than the load distributor tracking each worker and pushing work to each. Any other options?
I'm open to new ideas as well. :)
Thanks!
-- edit --
using Perl 5.8.8, to be precise: This is perl, v5.8.8 built for x86_64-linux-thread-multi
If you want to keep it simple, use a database to store the jobs, and have each worker lock the table, grab the jobs it needs, then unlock and let the next worker do its thing. This isn't the most scalable solution, since you'll have lock contention, but with just 4 nodes it should be fine.
But if you start going down this road it might make sense to look at a dedicated job-queue system like Gearman.

How to setup matlabpool for multiple processors?

I just set up an Extra Large Heavy Computation EC2 instance to throw at my Genetic Algorithms problem, hoping to speed things up.
This instance has 8 Intel Xeon processors (around 2.4 GHz each) and 7 GB of RAM.
On my machine I have an Intel Core Duo, and MATLAB is able to work with my two cores just fine by running:
matlabpool open 2
On the EC2 instance, though, MATLAB is only capable of detecting 1 out of the 8 processors, and if I try running:
matlabpool open 8
I get an error saying that the ClusterSize is 1 since there's only 1 core on my CPU. True, there is only 1 core on each CPU, but I have 8 CPUs on the given EC2 instance!
So the difference between my machine and the EC2 instance is that I have my 2 cores on a single processor locally, while the EC2 instance has 8 distinct processors.
My question is, how do I get matlab to work with those 8 processors?
I found this paper, but it seems related to setting up matlab with multiple EC2 instances (not related to multiple processors on the same instance, EC2 or not), which is not my problem.
Any help appreciated!
Note: the point is not EC2; I am remoting into it and running MATLAB on it as if it were any other machine. The point is that I can't get MATLAB to see the 8 processors!
MATLAB isn't seeing all 8 cores, so set the worker count manually: open the Parallel menu -> Manage Configurations, right-click the "local" line, and in the Scheduler tab set "Number of workers available to scheduler" to 8.
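For matlabpool-era releases, roughly the same change can be attempted from the command line; this is a sketch only, since findResource and the ClusterSize property belong to the older Parallel Computing Toolbox API and may differ by version:

% Raise the local scheduler's worker limit, then open the pool.
sched = findResource('scheduler', 'type', 'local');
set(sched, 'ClusterSize', 8);   % allow up to 8 local workers
matlabpool open 8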
The original answer was a question asking for more detail:
Are you trying to use MDCS on EC2 (and MATLAB's user interface on your PC), or are you trying to run MATLAB's user interface and PCT on EC2 (via ssh or vnc or the like)?
This post adds information in response to a part of the original poster's question:
[OP] I found this paper, but it seems related to setting up matlab with multiple EC2 instances (not related to multiple processors on the same instance, EC2 or not)...
The paper mentioned above is no longer available.
In its place MathWorks offers MATLAB users a way to set up and distribute computations on a cluster running MATLAB Distributed Computing Server (MDCS) on Amazon EC2. More information is available here: http://www.mathworks.com/ec2