Dedicated distributed system to handle matlab jobs

Dedicated distributed system to handle matlab jobs - matlab

I'm a software engineer and currently looking forward to setup a distributed system at my laboratory so that I can process some matlab jobs over them. I have looked into MATLAB MPI but I want to know if there is some way so that I can setup a system here without any FEE or AMOUNT.

I have spent a lot of time looking at that very issue, and the short answer is: nope, not possible.
There are two long answers. First, if you're constrained to using Matlab, then all roads lead back to MathWorks. One possibility is that you could compile your code, you'd need to buy the compiler from Mathworks, though, and then you could run the compiled code on whatever grid infrastructure you wish, such as Hadoop.
Second, for this reason, I have found it much better to simply port code to another language, usually one in open source. For the work I tend to do, Octave is a poor replacement for Matlab. Instead, R and Python are great for most of the same functionality. Personally, I lean a lot more toward R than Python, but that's because R is a better fit for these applications (i.e. they're very statistical in nature).
I've ported a lot of Matlab code to R and it's not too bad. Porting to Python would be easier in general, though, and there is a very large Matlab refugee community that has switched to Python.
Once you're in either Python or R, there are a lot of options for MPI, multicore tools, distributed systems, GPU tools, and more. In fact, you may find the migration easier by writing some of the distributed functions in Python or R, loading up an easy to use grid system, and then have Matlab submit the job to the server. Your local code could be the same, but then you can work on porting only the gridded parts, which you'd probably have to devote some time to write in Matlab, anyway.

I wouldn't say it's completely impossible; You can use TCP/IP sockets to build a client/server application (you will find many MEX implementations for BSD sockets on File Exchange).
The architecture is simple: your main MATLAB client script sends jobs (code along with any needed data serialized) to nodes to evaluate and send back results when done. These nodes would be distributed MATLAB instances running the server part which listens for connections, and runs anything it receive through the EVAL function.
Obviously it is up to write code that can be divided into breakable tasks.
This is not as sophisticated as what is offered by the Distributed Computing Toolbox, but basically does the same thing...

Related

Parallel scripts on MATLAB

I have two systems running on MATLAB: the control system and the computer vision system.
The control system needs to receive three variables generated by the computer vision system periodically. However, I can't single thread both systems, because the computer vision system latency is too high compared with the control system latency.
I tried to run each program in a different MATLAB session and use a .mat file as interface between both sessions, but it did not work.
I'm not familiar with the Parallel Computing Toolbox. So I was wondering if someone can help with this? Or at lest give a start up idea, because, as I've said, I will start to learn the Parallel Computing Toolbox now.

I think the function in the Parallel Computing Toolbox you may be looking for is parfeval. It lets you spawn an asynchronous task, and get its result whenever it is ready.

In addition to parfeval as suggested by #Dima you might also want to look into labSendReceive
and associated function like labSend and labReceive which allow to share data between individual workers in your parallel pool. I guess which one is best for you depends on the type of calculation you want to do.

Matlab and GPU/CUDA programming

I need to run several independent analyses on the same data set.
Specifically, I need to run bunches of 100 glm (generalized linear models) analyses and was thinking to take advantage of my video card (GTX580).
As I have access to Matlab and the Parallel Computing Toolbox (and I'm not good with C++), I decided to give it a try.
I understand that a single GLM is not ideal for parallel computing, but as I need to run 100-200 in parallel, I thought that using parfor could be a solution.
My problem is that it is not clear to me which approach I should follow. I wrote a gpuArray version of the matlab function glmfit, but using parfor doesn't have any advantage over a standard "for" loop.
Has this anything to do with the matlabpool setting? It is not even clear to me how to set this to "see" the GPU card. By default, it is set to the number of cores in the CPU (4 in my case), if I'm not wrong.
Am I completely wrong on the approach?
Any suggestion would be highly appreciated.
Edit
Thanks. I'm aware of GPUmat and Jacket, and I could start writing in C without too much effort, but I'm testing the GPU computing possibilities for a department where everybody uses Matlab or R. The final goal would be a cluster based on C2050 and the Matlab Distribution Server (or at least this was the first project).
Reading the ADs from Mathworks I was under the impression that parallel computing was possible even without C skills. It is impossible to ask the researchers in my department to learn C, so I'm guessing that GPUmat and Jacket are the better solutions, even if the limitations are quite big and the support to several commonly used routines like glm is non-existent.
How can they be interfaced with a cluster? Do they work with some job distribution system?

I would recommend you try either GPUMat (free) or AccelerEyes Jacket (buy, but has free trial) rather than the Parallel Computing Toolbox. The toolbox doesn't have as much functionality.
To get the most performance, you may want to learn some C (no need for C++) and code in raw CUDA yourself. Many of these high level tools may not be smart enough about how they manage memory transfers (you could lose all your computational benefits from needlessly shuffling data across the PCI-E bus).

Parfor will help you for utilizing multiple GPUs, but not a single GPU. The thing is that a single GPU can do only one thing at a time, so parfor on a single GPU or for on a single GPU will achieve the exact same effect (as you are seeing).
Jacket tends to be more efficient as it can combine multiple operations and run them more efficiently and has more features, but most departments already have parallel computing toolbox and not jacket so that can be an issue. You can try the demo to check.
No experience with gpumat.
The parallel computing toolbox is getting better, what you need is some large matrix operations. GPUs are good at doing the same thing multiple times, so you need to either combine your code somehow into one operation or make each operation big enough. We are talking a need for ~10000 things in parallel at least, although it's not a set of 1e4 matrices but rather a large matrix with at least 1e4 elements.
I do find that with the parallel computing toolbox you still need quite a bit of inline CUDA code to be effective (it's still pretty limited). It does better allow you to inline kernels and transform matlab code into kernels though, something that

CUDA and MATLAB for loop optimization

I'm going to attempt to optimize some code written in MATLAB, by using CUDA. I recently started programming CUDA, but I've got a general idea of how it works.
So, say I want to add two matrices together. In CUDA, I could write an algorithm that would utilize a thread to calculate the answer for each element in the result matrix. However, isn't this technique probably similar to what MATLAB already does? In that case, wouldn't the efficiency be independent of the technique and attributable only to the hardware level?

The technique might be similar but remember with CUDA you have hundreds of threads running simultaneously. If MATLAB is using threads and those threads are running on a Quad core, you are only going to get 4 threads excuted per clock cycle while you might achieve a couple of hundred threads to run on CUDA with that same clock cycle.
So to answer you question, YES, the efficiency in this example is independent of the technique and attributable only to the hardware.

The answer is unequivocally yes, all the efficiencies are hardware level. I don't how exactly matlab works, but the advantage of CUDA is that mutltiple threads can be executed simultaneously, unlike matlab.
On a side note, if your problem is small, or requires many read write operations, CUDA will probably only be an additional headache.

CUDA has official support for matlab.
[need link]
You can make use of mex files to run on GPU from MATLAB.
The bottleneck is the speed at which data is transfered from CPU-RAM to GPU. So if the transfer is minimized and done in large chunks, the speedup is great.

For simple things, it's better to use the gpuArray support in the Matlab PCT. You can check it here
http://www.mathworks.de/de/help/distcomp/using-gpuarray.html
For things like adding gpuArrays, multiplications, mins, maxs, etc., the implementation they use tends to be OK. I did find out that for making things like batch operations of small matrices like abs(y-Hx).^2, you're better off writing a small Kernel that does it for you.

Landau notation (ide) tools support

It is good idea to have impotant information during developing like Landau notation to know functions's time costs. So it should be documented in sources isn't it?
I'm looking for tools that can calculate it.

In the general case, the asymptotic complexity of an arbitrary algorithm is undecidable, by Rice's theorem.
But in practice, you can often make a good guess by repeatedly running the algorithm on various inputs (of sizes spanning several orders of magnitude), recording actual CPU time, and fitting a curve. (You should throw out data points with very short runtimes, since these will be dominated by noise. Also, on JITed runtimes like the Java Virtual Machine, make sure to run the function for a while before starting the timing, to make sure the VM has warmed up.)

Project ideas for discrete mathematics course using MATLAB? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
A professor asked me to help making a specification for a college project.
By the time the students should know the basics of programming.
The professor is a mathematician and has little experience in other programming languages, so it should really be in MATLAB.
I would like some projects ideas. The project should
last about 1 to 2 months
be done individually
have web interface would be great
doesn't necessary have to go deep in maths, but some would be great
use a database (or store data in files)
What kind of project would make the students excited?
If you have any other tips I'll appreciate.
UPDATE: The students are sophomores and have already studied vector calculus. This project is for an one year Discrete Mathematics course.
UPDATE 2: The topics covered in the course are
Formal Logic
Proofs, Recursion, and Analysis of Algorithms
Sets and Combinatorics
Relations, Functions, and Matrices
Graphs and Trees
Graph Algorithms
Boolean Algebra and Computer Logic
Modeling Arithmetic, Computation, and Languages
And it'll be based on this book Mathematical Structures for Computer Science: A Modern Approach to Discrete Mathematics by Judith L. Gersting

General Suggestions:
There are many teaching resources at The MathWorks that may give you some ideas for course projects. Some sample links:
The MATLAB Central blogs, specifically some posts by Loren that include using LEGO Mindstorms in teaching and a webinar about MATLAB for teaching (note: you will have to sign up to see the webinar)
The Curriculum Exchange: a repository of course materials
Teaching with MATLAB and Simulink: a number of other links you may find useful
Specific Suggestions:
One of my grad school projects in non-linear dynamics that I found interesting dealt with Lorenz oscillators. A Lorenz oscillator is a non-linear system of three variables that can exhibit chaotic behavior. Such a system would provide an opportunity to introduce the students to numerical computation (iterative methods for simulating systems of differential equations, stability and convergence, etc.).
The most interesting thing about this project was that we were using Lorenz oscillators to encode and decode signals. This "encrypted communication" aspect was really cool, and was based on the following journal article:
Kevin M. Cuomo and Alan V. Oppenheim,
Circuit Implementation of Synchronized Chaos with Applications
to Communications, Physical Review
Letters 71(1), 65-68 (1993)
The article addresses hardware implementations of a chaotic communication system, but the equivalent software implementation should be simple enough to derive (and much easier for the students to implement!).
Some other useful aspects of such a project:
The behavior of the system can be visualized in 2-D and 3-D plots, thus exposing the students to a number of graphing utilities in MATLAB (PLOT, PLOT3, COMET, COMET3, etc.).
Audio signals can be read from files, encrypted using the Lorenz equations, written out to a new file, and then decrypted once again. You could even have the students each encrypt a signal with their Lorenz oscillator code and give it to another student to decrypt. This would introduce them to various file operations (FREAD, FWRITE, SAVE, LOAD, etc.), and you could even introduce them to working with audio data file formats.
You can introduce the students to the use of the PUBLISH command in MATLAB, which allows you to format M-files and publish them to various output types (like HTML or Word documents). This will teach them techniques for making useful help documentation for their MATLAB code.

I have found that implementing and visualizing Dynamical systems is great
for giving an introduction to programming and to an interesting branch of
applied mathematics. Because one can see the 'life' in these systems,
our students really enjoy this practical module.
We usually start off by visualizing a 1D attractor, so that we can
overlay the evolution rule/rate of change with the current state of
the system. That way you can teach computational aspects (integrating the system) and
visualization, and the separation of both in implementation (on a simple level, refreshing
graphics at every n-th computation step, but in C++ leading to threads, unsure about MATLAB capabilities here).
Next we add noise, and then add a sigmoidal nonlinearity to the linear attractor. We combine this extension with an introduction to version control (we use a sandbox SVN repository for this): The
students first have to create branches, modify the evolution rule and then merge
it back into HEAD.
When going 2D you can simply start with a rotation and modify it to become a Hopf oscillator, and visualize either by morphing a grid over time or by going 3D when starting with a distinct point. You can also visualize the bifurcation diagram in 3D. So you again combine generic MATLAB skills like 3D plotting with the maths.
To link in other topics, browse around in wikipedia: you can bring in hunter/predator models, chaotic systems, physical systems, etc.etc.
We usually do not teach object-oriented-programming from within MATLAB, although it is possible and you can easily make up your own use cases in the dynamical systems setting.
When introducing inheritance, we will already have moved on to C++, and I'm again unaware of MATLAB's capabilities here.
Coming back to your five points:
Duration is easily adjusted, because the simple 1D attractor can be
done quickly and from then on, extensions are ample and modular.
We assign this as an individual task, but allow and encourage discussion among students.
About the web interface I'm at a loss: what exactly do you have in mind, why is it
important, what would it add to the assignment, how does it relate to learning MATLAB.
I would recommend dropping this.
Complexity: A simple attractor is easily understood, but the sky's the limit :)
Using a database really is a lot different from config files. As to the first, there
is a database toolbox for accessing databases from MATLAB. Few institutes have the license though, and apart from that: this IMHO does not belong into such a course. I suggest introducing to the concept of config files, e.g. for the location and strength of the attractor, and later for the system's respective properties.
All this said, I would at least also tell your professor (and your students!) that Python is rising up against MATLAB. We are in the progress of going Python with our tutorials, but I understand if someone wants to stick with what's familiar.
Also, we actually need the scientific content later on, so the usefulness for you will probably depend on which department your course will be related to.

A lot of things are possible.
The first example that comes in mind is to model a public transportation network (the network of your city, with underground, buses, tramways, ...). It is represented by a weighted directed graph (you can use sparse matrix to represent it, for example).
You may, for example, ask them to compute the shortest path from one station to another one (Moore-dijkistra algorithm, for example) and display it.
So, for the students, the several steps to do are:
choose an appropriate representation for the network (it could be some objects to represent the properties of the stations and the lines, and a sparse matrix for the network)
load all the data (you can provide them the data in an XML file)
be able to draw the network (since you will put the coordinates of the stations)
calculate the shortest path from one point to another and display it in a pretty way
create a fronted (with GUI)
Of course, this could be complicated by adding connection times (when you change from one line to another), asking for several options (shortest path with minimum connections, take in considerations the time you loose by waiting for a train/bus, ...)
The level of details will depend on the level of the students and the time they could spend on it (it could be very simple, or very realist)

You want to do a project with a web interface and a database, but not any serious math... and you're doing it in MATLAB? Do you understand that MATLAB is especially designed to be used for "deep math", and not for web interfaces or databases?
I think if this is an intro to a Discrete Mathematics course, you should probably do something involving Discrete Mathematics, and not waste the students' time as they learn a bunch of things in that language that they'll never actually use.
Why not do something involving audio? I did an undergraduate project in which we used MATLAB to automatically beat-match different tunes and DJ mix between them. The full program took all semester, but you could do a subset of it. wavread() and the like are built in and easy to use.
Or do some simple image processing like finding Waldo using cross-correlation.
Maybe do something involving cryptography, have them crack a simple encryption scheme and feel like hackers.

MATLAB started life as a MATrix LAB, so maybe concentrating on problems in linear algebra would be a natural fit.
Discrete math problems using matricies include:
Spanning trees and shortest paths
The marriage problem (bipartite graphs)
Matching algorithms
Maximal flow in a network
The transportation problem
See Gil Strang's "Intro to Applied Math" or Knuth's "Concrete Math" for ideas.

You might look here: http://www.mathworks.com/academia/student_center/tutorials/launchpad.html
on the MathWorks website. The interactive tutorial (second link) is quite popular.
--Loren

I always thought the one I was assigned in grad school was a good choice-a magnetic lens simulator. The math isn't completely overwhelming so you can focus more on learning the language, and it's a good intro to the graphical capabilities (e.g., animating the path of an off-axis electron going through the lens).

db I/O and fancy interfaces are out of place in a discrete math course.
my matlab labs were typically algorithm implementations, with charts as output, and simple file input.
how hard is the material? image processing is really easy in matlab, can you do some discrete 2D filtering? blurs and stuff. http://homepages.inf.ed.ac.uk/rbf/HIPR2/filtops.htm

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse