Does MATLAB support the parallelization of supervised machine learning algorithms? Alternatives?

Up to now I have used RapidMiner for some data/text mining tasks, but with an increasing amount of data there are huge performance issues. AFAIK the RapidMiner Parallel Processing Extensions is only available for the enterprise version - unfortunately I am limited to the community version.
Now I want to transfer the tasks to a high performance cluster by using MATLAB (academic license). I did not find any information that the Parallel Computing Toolbox supports e.g. SVM or KNN.
Do MATLAB or any additional libraries support the parallelization of data mining algorithms?

Most data mining and machine learning functionality for MATLAB is contained within Statistics Toolbox (in recent versions, that's called Statistics and Machine Learning Toolbox). To enable parallelization, you'll also need Parallel Computing Toolbox, and to enable that parallelization to be carried out on an HPC cluster, you'll need to install MATLAB Distributed Computing Server on the cluster.
There are lots of ways that you might want to parallelize data mining tasks - for example, you might want to parallelize an individual learning task, or parallelize a cross-validation, or parallelize several learning tasks across multiple datasets.
The first is possible for some, but not all of the data mining algorithms in Statistics Toolbox. MathWorks are gradually introducing that piece by piece. For example, kmeans is parallelized, and there is a parallelized algorithm for bagged decision trees, but I believe SVM learning is currently not parallelized. You'll need to look into the documentation for Statistics Toolbox to find out if the algorithms you require are on the list.
The latter two are possible. Functionality in Statistics Toolbox for cross-validation (and bootstrapping, jack-knifing) is parallelized, as are some feature selection algorithms. And in order to parallelize running several jobs over multiple datasets, you can use functionality from Parallel Computing Toolbox (such as a parfor or parallel for loop) to iterate over them.
In addition, the upcoming R2015b release of MATLAB (out in September) will include GPU-enabled statistics functionality, providing additional speedups.
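As a rough sketch of the cross-validation and multiple-dataset cases, assuming both Statistics and Machine Learning Toolbox and Parallel Computing Toolbox are installed: the fisheriris sample data, the k-NN classifier, and the way the "datasets" are built here are illustrative choices, not something the question specifies.

```matlab
% Example data shipped with Statistics and Machine Learning Toolbox.
load fisheriris
X = meas;
y = species;

% Parallelized cross-validation: with UseParallel set, crossval distributes
% the folds across the workers of the parallel pool.
% The prediction function trains a k-NN model on each training fold and
% predicts the labels of the matching test fold.
predfun = @(Xtrain, ytrain, Xtest) ...
    predict(fitcknn(Xtrain, ytrain, 'NumNeighbors', 5), Xtest);
opts = statset('UseParallel', true);
mcr = crossval('mcr', X, y, 'Predfun', predfun, 'KFold', 10, 'Options', opts);

% Several independent learning tasks: one parfor iteration per dataset.
% The three feature subsets below are stand-ins for separate datasets.
datasets = {X(:, 1:2), X(:, 3:4), X};
loss = zeros(1, numel(datasets));
parfor k = 1:numel(datasets)
    mdl = fitcknn(datasets{k}, y, 'NumNeighbors', 5);
    loss(k) = resubLoss(mdl);   % resubstitution error of this model
end
```

With default preferences a parallel pool is opened automatically on first use; calling parpool beforehand gives explicit control over the cluster profile and the number of workers.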

Related

Non-linear SVM is not available in Apache Spark

Does anyone know the reason why the Non-Linear SVM has not been implemented in Apache Spark?
I was reading this page:
https://issues.apache.org/jira/browse/SPARK-4638
Look at the last comment. It says:
"Commenting here b/c of the recent dev list thread: Non-linear kernels for SVMs in Spark would be great to have. The main barriers are:
Kernelized SVM training is hard to distribute. Naive methods require a lot of communication. To get this feature into Spark, we'd need to do proper background research and write up a good design.
Other ML algorithms are arguably more in demand and still need improvements (as of the date of this comment). Tree ensembles are first-and-foremost in my mind."
The question is: Why is the kernelized SVM hard to distribute?
Everybody knows that the non-linear SVMs exhibit better performance than the linear ones.

Is it beneficial to run Matlab calculations in parallel on a multi-core computer?

I have a laptop with a multi-core processor and I would like to run a lengthy loop in which Simulink simulations are performed. Is it beneficial to split the loop into two parts (it is possible in my case), open the Matlab application twice, and run a Matlab script in each of them?
Someone told me that Matlab/Simulink always uses one core per opened Matlab application. Is that correct?

MATLAB splits some built-in functions across multiple cores, but standard MATLAB code uses just one core. Generally, if you are running several independent iterations, then the computation time can benefit from parallelization. You can do this easily using either parfor (if you have the Parallel Computing Toolbox) or batch jobs (the batch function).
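A minimal sketch of the parfor approach for independent iterations. The loop body is a placeholder computation, not a Simulink run, and the iteration count is arbitrary.

```matlab
n = 8;
results = zeros(1, n);

% Each iteration depends only on its own index k, so the iterations can be
% handed to different workers in any order.
parfor k = 1:n
    A = magic(100 + k);      % placeholder for the real per-iteration work
    results(k) = norm(A);    % sliced output: one slot per iteration
end
```

Without Parallel Computing Toolbox the same loop still runs, just serially, so the code can be developed on a machine that lacks the toolbox.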

Simulink running on multiple cores (or not) for a given model

I have a huge Simulink model that takes about 1 hour to execute. On my computer (HP Z210), the execution makes the computer use all CPU cores at 100%. What intrigues me is that the same model running on my colleague's computer (Dell Precision T3600) uses ~50% of the CPU "power" (some cores at 100% and some cores remain idle).
My question is:
I always thought Simulink runs on a single core. I'm not using Parallel Computing Toolbox or any other toolbox. I'm only using MATLAB and Simulink licenses.
Why is the execution on my computer different from my colleague's? Does it have anything to do with hyper-threading?

While Simulink itself is single-threaded when executing a single model, a block itself might be multi-threaded. For example, a Simulink block executing a matrix multiplication will use the multi-threaded implementation that is also used by MATLAB.

Simulink is definitely a single-threaded application. The exception to this is if you are using Rapid Accelerator mode and have multiple cores available; then the standalone executable runs on a separate core. See How Acceleration Modes Work for more details.
If you are running multiple simulations, then you can distribute these across multiple cores with the Parallel Computing Toolbox, or even multiple workers (machines) with the MATLAB Distributed Computing Server. However, this is for multiple simulations of a model (e.g. a Monte-Carlo simulation), not for breaking up a large model into several chunks (currently not possible as far as I know). See Run Parallel Simulations for more details.
Not sure why the execution would be different from one machine to the other. Are you both using the same release of MATLAB? Same O/S? There are so many things that could be different. With regards to speeding up the execution of model, you could try running the model in accelerated mode, using the Simulink profiler to see where the bottlenecks are, changing some of the solver settings (e.g. variable-step vs fixed-step), etc...
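For the multiple-simulation case described above, a hedged sketch using parsim, which was added to Simulink in a later release than the one this answer was written for. The model name 'myModel' and the workspace variable 'gain' are hypothetical placeholders.

```matlab
% Hypothetical model and parameter sweep (both are assumptions).
model = 'myModel';
gains = [0.5 1 2 4];

% Build one SimulationInput object per run, each with its own value of 'gain'.
% Looping backwards preallocates the object array on the first iteration.
for k = numel(gains):-1:1
    in(k) = Simulink.SimulationInput(model);
    in(k) = setVariable(in(k), 'gain', gains(k));
end

% parsim distributes the runs across the workers of the parallel pool.
% Without Parallel Computing Toolbox it runs the simulations serially.
out = parsim(in, 'ShowProgress', 'on');
```

Each element of out holds the results of one run, in the same order as the inputs.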

If your model can be built with Simulink Coder, you can use xMOD software (www.xmodsoftware.com) to execute your model in multi-core (subsystem per thread basis, with a dedicated solver and step-size for each subsystem).

Accelerating MATLAB code using GPUs?

AccelerEyes announced in December 2012 that it works with Mathworks on the GPU code and has discontinued its product Jacket for MATLAB:
http://blog.accelereyes.com/blog/2012/12/12/exciting-updates-from-accelereyes/
Unfortunately they do not sell Jacket licences anymore.
As far as I understand, the Jacket GPU Array solution based on ArrayFire was much faster than the gpuArray solution provided by MATLAB.
I started working with gpuArray, but I see that many functions are implemented poorly. For example a simple
myArray(:) = 0
is very slow. I have written some custom CUDA-Kernels, but the poorly-implemented standard MATLAB functionality adds a lot of overhead, even if working with gpuArrays consistently throughout the code. I fixed some issues by replacing MATLAB code with hand written CUDA code - but I do not want to reimplement the MATLAB standard functionality.
Another feature I am missing is sparse GPU matrices.
So my questions are:
How do I speed up the badly implemented default GPU implementations provided by MATLAB? In particular, how do I speed up sparse matrix operations in MATLAB using the GPU?

MATLAB does support CUDA based GPU. You have to access it from the "Parallel Computing Toolbox". Hope these 2 links also help:
Parallel Computing Toolbox Features
Key Features
Parallel for-loops (parfor) for running task-parallel algorithms on multiple processors
Support for CUDA-enabled NVIDIA GPUs
Full use of multicore processors on the desktop via workers that run locally
Computer cluster and grid support (with MATLAB Distributed Computing Server)
Interactive and batch execution of parallel applications
Distributed arrays and single program multiple data (spmd) construct for large dataset handling and data-parallel algorithms
MATLAB GPU Computing Support for NVIDIA CUDA-Enabled GPUs
Using MATLAB for GPU computing lets you accelerate your applications with GPUs more easily than by using C or Fortran. With the familiar MATLAB language you can take advantage of the CUDA GPU computing technology without having to learn the intricacies of GPU architectures or low-level GPU computing libraries.
You can use GPUs with MATLAB through Parallel Computing Toolbox, which supports:
CUDA-enabled NVIDIA GPUs with compute capability 2.0 or higher. For releases 14a and earlier, compute capability 1.3 is sufficient.
GPU use directly from MATLAB
GPU-enabled MATLAB functions such as fft, filter, and several linear algebra operations
GPU-enabled functions in toolboxes: Image Processing Toolbox, Communications System Toolbox, Statistics and Machine Learning Toolbox, Neural Network Toolbox, Phased Array Systems Toolbox, and Signal Processing Toolbox (Learn more about GPU support for signal processing algorithms)
CUDA kernel integration in MATLAB applications, using only a single line of MATLAB code
Multiple GPUs on the desktop and computer clusters using MATLAB workers in Parallel Computing Toolbox and MATLAB Distributed Computing Server

I had the pleasure of attending a talk by John, the founder of AccelerEyes. Their speedup did not come from merely removing poorly written code and replacing it with code that saved a few bits here and there. It came mostly from exploiting the availability of cache and doing a lot of operations in the GPU's own memory. MATLAB relied on transferring data between GPU and CPU, if I remember correctly, and hence the speedup was crazy.
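To illustrate the point about transfers with gpuArray itself, a minimal sketch that creates the data on the GPU, chains several operations there, and copies only the final scalar back. It assumes Parallel Computing Toolbox and a supported NVIDIA GPU; the matrix size and the operations are arbitrary.

```matlab
n = 2000;

% Create the data directly in GPU memory, so nothing is copied from the host.
A = rand(n, 'single', 'gpuArray');
B = rand(n, 'single', 'gpuArray');

% Every intermediate result below is a gpuArray and stays on the device.
C = A * B;
C = C + A .* B;
s = sum(C(:));

% One transfer back to host memory at the very end.
total = gather(s);
```

Calling gather (or otherwise converting to a host array) between the intermediate steps would force a round trip over the PCI-E bus each time.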

Matlab and GPU/CUDA programming

I need to run several independent analyses on the same data set.
Specifically, I need to run bunches of 100 glm (generalized linear models) analyses and was thinking to take advantage of my video card (GTX580).
As I have access to Matlab and the Parallel Computing Toolbox (and I'm not good with C++), I decided to give it a try.
I understand that a single GLM is not ideal for parallel computing, but as I need to run 100-200 in parallel, I thought that using parfor could be a solution.
My problem is that it is not clear to me which approach I should follow. I wrote a gpuArray version of the matlab function glmfit, but using parfor doesn't have any advantage over a standard "for" loop.
Has this anything to do with the matlabpool setting? It is not even clear to me how to set this to "see" the GPU card. By default, it is set to the number of cores in the CPU (4 in my case), if I'm not wrong.
Am I completely wrong on the approach?
Any suggestion would be highly appreciated.
Edit
Thanks. I'm aware of GPUmat and Jacket, and I could start writing in C without too much effort, but I'm testing the GPU computing possibilities for a department where everybody uses Matlab or R. The final goal would be a cluster based on C2050 and the MATLAB Distributed Computing Server (or at least this was the first project).
Reading the ads from MathWorks I was under the impression that parallel computing was possible even without C skills. It is impossible to ask the researchers in my department to learn C, so I'm guessing that GPUmat and Jacket are the better solutions, even if the limitations are quite big and the support for several commonly used routines like glm is non-existent.
How can they be interfaced with a cluster? Do they work with some job distribution system?

I would recommend you try either GPUMat (free) or AccelerEyes Jacket (buy, but has free trial) rather than the Parallel Computing Toolbox. The toolbox doesn't have as much functionality.
To get the most performance, you may want to learn some C (no need for C++) and code in raw CUDA yourself. Many of these high level tools may not be smart enough about how they manage memory transfers (you could lose all your computational benefits from needlessly shuffling data across the PCI-E bus).

parfor will help you utilize multiple GPUs, but not a single GPU. The thing is that a single GPU can do only one thing at a time, so parfor on a single GPU and for on a single GPU will achieve the exact same effect (as you are seeing).
Jacket tends to be more efficient as it can combine multiple operations and run them more efficiently and has more features, but most departments already have parallel computing toolbox and not jacket so that can be an issue. You can try the demo to check.
No experience with gpumat.
The Parallel Computing Toolbox is getting better; what you need is some large matrix operations. GPUs are good at doing the same thing multiple times, so you need to either combine your code somehow into one operation or make each operation big enough. We are talking about a need for ~10000 things in parallel at least, although it's not a set of 1e4 matrices but rather a large matrix with at least 1e4 elements.
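One way to read "combine your code into one operation", sketched under strong assumptions: if the analyses are ordinary least-squares fits (the Gaussian, identity-link special case of a GLM) that share one design matrix, all of them can be solved as a single matrix problem instead of one loop iteration each. General GLMs use per-model iterative reweighting and do not collapse this simply. The sizes and synthetic data are made up for illustration.

```matlab
% Synthetic problem: 200 responses regressed on the same predictors.
nObs = 5000; nPred = 10; nModels = 200;
X = [ones(nObs, 1), randn(nObs, nPred)];          % shared design matrix
Btrue = randn(nPred + 1, nModels);                % one coefficient column per model
Y = X * Btrue + 0.01 * randn(nObs, nModels);      % one response column per model

% Move the data to the GPU once.
Xg = gpuArray(X);
Yg = gpuArray(Y);

% Solve all 200 fits at once through the normal equations.
% This is less numerically robust than a QR-based solve, but it keeps the
% work in two large matrix products and one small square system.
Bg = (Xg' * Xg) \ (Xg' * Yg);

% Bring the coefficients back to host memory.
B = gather(Bg);
```

Column j of B holds the coefficients of model j, so the loop over models disappears entirely.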
I do find that with the parallel computing toolbox you still need quite a bit of inline CUDA code to be effective (it's still pretty limited). It does better allow you to inline kernels and transform matlab code into kernels though, something that
