I recently purchased a P100 GPU in the hope of speeding up parallel code, and I need some help deciding how to translate MATLAB code into CUDA code (I've moved away from plain gpuArrays in MATLAB). I have experimented with .ptx kernels and MEX files and have run into roadblocks with both.
The parallel code has elementwise exponentiation, elementwise multiplication, and FFT and IFFT calls. It also incorporates complex numbers.
Which is easier to work with, .ptx files compiled from CUDA kernels or CUDA MEX files, and which will let me perform the FFT, IFFT, exp, and multiplication calls I need?
It's simple really. You have to use MEX because you want to call into the NVIDIA cuFFT library, which you can only do from the host. However, there are basically no circumstances in which you will get a reasonable speed-up over calling fft and ifft from MATLAB, because those functions call directly into cuFFT, with the added advantage of MATLAB's GPU memory pool and FFT plan cache. So maybe you should focus on the element-wise kernels.
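As a rough illustration of focusing on the element-wise part, the sketch below keeps fft and ifft as the built-in gpuArray calls and fuses the exponentiation and multiplication into one kernel with arrayfun. The array size, the weights h, and the scale factor s are made up for the example, and it assumes Parallel Computing Toolbox and a supported GPU are available.

```matlab
% Sketch only: sizes, weights and the scale factor are illustrative assumptions.
N = 2^14;
x = gpuArray(complex(randn(N,1), randn(N,1)));   % complex signal on the GPU
h = gpuArray(complex(randn(N,1), randn(N,1)));   % hypothetical complex weights
s = 1/N;                                         % scale factor, keeps exp() from overflowing

X = fft(x);                                      % built-in gpuArray FFT
% arrayfun compiles the scalar expression into a single fused element-wise
% kernel; the scalar s is expanded to match the size of X and h.
Y = arrayfun(@(Xk, hk, sk) exp(Xk .* sk) .* hk, X, h, s);
y = ifft(Y);                                     % built-in gpuArray IFFT

% CPU reference for a sanity check
Xc     = fft(gather(x));
yRef   = ifft(exp(Xc .* s) .* gather(h));
relErr = norm(gather(y) - yRef) / norm(yRef);
fprintf('relative error vs CPU: %.2e\n', relErr);
```

If arrayfun turns out not to be flexible enough, the same scalar expression is what a hand-written .ptx kernel would have to implement, with the fft and ifft calls left in MATLAB.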
Related
So I am using MATLAB for a project and am discussing the use of the power method for finding stationary distributions of Markov chains and its convergence rate. I was wondering what method/algorithms MATLAB's eig() function uses to find the eigenvectors of a matrix?
MATLAB normally uses LAPACK routines for these calculations. With that in mind, you should be able to find the code that MATLAB runs starting from here. Be aware that LAPACK is written in Fortran.
MATLAB Incorporates LAPACK
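For the comparison the question has in mind, a minimal sketch of the power method next to eig might look like the following. The 3-state transition matrix is a made-up example, and the tolerance and iteration cap are arbitrary choices.

```matlab
% Hypothetical 3-state row-stochastic transition matrix (illustration only).
P = [0.9 0.1 0.0;
     0.2 0.7 0.1;
     0.1 0.3 0.6];

% Power method: repeatedly apply P to a row vector until it stops changing.
p   = ones(1, 3) / 3;
tol = 1e-12;
for iter = 1:10000
    pNext = p * P;
    delta = norm(pNext - p, 1);
    p = pNext;
    if delta < tol
        break
    end
end

% Same answer from eig: left eigenvector of P for the eigenvalue 1.
[V, D] = eig(P.');
lambda = diag(D);
[~, k] = max(real(lambda));
v = real(V(:, k)).';
v = v / sum(v);                      % normalise to a probability vector

% The second-largest eigenvalue modulus governs the convergence rate.
mods = sort(abs(lambda), 'descend');
fprintf('iterations: %d, |lambda_2| = %.4f\n', iter, mods(2));
fprintf('max difference vs eig: %.2e\n', max(abs(p - v)));
```

The closer |lambda_2| is to 1, the more iterations the power method needs, which is the quantity to look at when discussing its convergence rate.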
I would like to use cuSOLVER for the eigenvalue decomposition of complex matrices in MATLAB.
I am using a MATLAB CUDA kernel, and it seems that it is not possible to interface cuSOLVER with MATLAB this way, because cuSOLVER contains host code as well as device code (as mentioned here: http://docs.nvidia.com/cuda/cusolver/#syevd-example1), while a MATLAB CUDA kernel works only with the kernel function.
Please comment.
Is there any other way to compute the eigenvalue decomposition of a large number of complex matrices in parallel on the GPU from within the MATLAB environment?
You almost certainly need to use the MEX interface. This allows you to take in gpuArray data, and call kernels and other CUDA library functions.
See the doc: http://uk.mathworks.com/help/distcomp/run-mex-functions-containing-cuda-code.html for more.
MATLAB is an effective tool for numerical experiments, and I find that many papers use it to test the flop complexity of an algorithm (e.g., regression, SVD).
However, as I have learnt from others, Matlab uses Intel MKL for Matrix Multiplication. This is highly optimized code taking advantage of all the cores and their Vector Processing Units (SSE / AVX), and optimized for the cache layout in the CPU.
This means that timing code directly in MATLAB cannot truly test flop complexity.
My question is then: how can I disable MKL, or something else in MATLAB, in order to test the flop complexity of an algorithm?
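One way around this, assuming the goal is an operation count rather than wall-clock time, is to instrument a plain-loop implementation and count the floating-point operations directly, so the result does not depend on MKL, threading, or cache effects. The sketch below does this for naive matrix multiplication; the matrix size is arbitrary.

```matlab
% Sketch: count floating-point operations of a naive matrix multiply.
n = 8;                       % arbitrary size for the illustration
A = rand(n);
B = rand(n);
C = zeros(n);
flops = 0;

for i = 1:n
    for j = 1:n
        acc = 0;
        for k = 1:n
            acc = acc + A(i,k) * B(k,j);   % one multiply and one add
            flops = flops + 2;
        end
        C(i,j) = acc;
    end
end

fprintf('n = %d, counted flops = %d, 2*n^3 = %d\n', n, flops, 2*n^3);
fprintf('difference vs built-in A*B: %.2e\n', norm(C - A*B, 'fro'));
```

For timing-based experiments, maxNumCompThreads(1) restricts MATLAB to a single computational thread, but the built-in operations still run optimized library code, so measured times remain only a proxy for the flop count.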
The lsqcurvefit MATLAB function is used to fit the parameters of a model curve to a real curve (data acquired from experiment or observation) so that the squared differences are minimized.
The function is time-consuming, and may be prohibitive when used on a large set of curves.
Can it be used straightforwardly inside a CUDA kernel, with the whole program coded in MATLAB? (Edit: that is, without writing a custom version of lsqcurvefit in C for the kernel. For instance: write the kernel code in MATLAB (using "any" MATLAB function there, such as lsqcurvefit()), then compile the kernel with a MATLAB-provided tool, and finally execute the kernel on the GPU, called from MATLAB host code.)
Thanks
There are many ways to combine the capabilities of MATLAB with GPUs, but there isn't any MATLAB code that can be used in a CUDA kernel.
AccelerEyes announced in December 2012 that it is working with MathWorks on GPU code and has discontinued its product Jacket for MATLAB:
http://blog.accelereyes.com/blog/2012/12/12/exciting-updates-from-accelereyes/
Unfortunately they do not sell Jacket licences anymore.
As far as I understand, the Jacket GPU Array solution based on ArrayFire was much faster than the gpuArray solution provided by MATLAB.
I started working with gpuArray, but I see that many functions are implemented poorly. For example a simple
myArray(:) = 0
is very slow. I have written some custom CUDA kernels, but the poorly implemented standard MATLAB functionality adds a lot of overhead, even when working with gpuArrays consistently throughout the code. I fixed some issues by replacing MATLAB code with hand-written CUDA code, but I do not want to reimplement the standard MATLAB functionality.
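To check a claim like this on a particular machine, a minimal timing sketch could compare the indexed assignment with creating a fresh zero array of the same size and type. The array size is arbitrary, the results will depend on the GPU and MATLAB release, and wait(gpuDevice) is used so that the timings include the actual GPU work.

```matlab
% Sketch: array size is arbitrary; timings depend on GPU and MATLAB release.
dev = gpuDevice;
myArray = rand(4000, 'gpuArray');
wait(dev);

tic;
myArray(:) = 0;                 % indexed assignment into the existing array
wait(dev);
tIndexed = toc;

myArray = rand(4000, 'gpuArray');
wait(dev);

tic;
myArray = zeros(size(myArray), 'like', myArray);   % fresh zero gpuArray
wait(dev);
tZeros = toc;

fprintf('indexed assignment: %.4f s, zeros(...,''like''): %.4f s\n', ...
        tIndexed, tZeros);
```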
Another feature I am missing is sparse GPU matrices.
So my questions are:
How do I speed up the badly implemented default GPU implementations provided by MATLAB? In particular, how do I speed up sparse matrix operations in MATLAB using the GPU?
MATLAB does support CUDA-based GPUs. You have to access them through the Parallel Computing Toolbox. I hope these two links also help:
Parallel Computing Toolbox Features
Key Features
Parallel for-loops (parfor) for running task-parallel algorithms on multiple processors
Support for CUDA-enabled NVIDIA GPUs
Full use of multicore processors on the desktop via workers that run locally
Computer cluster and grid support (with MATLAB Distributed Computing Server)
Interactive and batch execution of parallel applications
Distributed arrays and single program multiple data (spmd) construct for large dataset handling and data-parallel algorithms
MATLAB GPU Computing Support for NVIDIA CUDA-Enabled GPUs
Using MATLAB for GPU computing lets you accelerate your applications with GPUs more easily than by using C or Fortran. With the familiar MATLAB language you can take advantage of the CUDA GPU computing technology without having to learn the intricacies of GPU architectures or low-level GPU computing libraries.
You can use GPUs with MATLAB through Parallel Computing Toolbox, which supports:
CUDA-enabled NVIDIA GPUs with compute capability 2.0 or higher. For releases 14a and earlier, compute capability 1.3 is sufficient.
GPU use directly from MATLAB
GPU-enabled MATLAB functions such as fft, filter, and several linear algebra operations
GPU-enabled functions in toolboxes: Image Processing Toolbox, Communications System Toolbox, Statistics and Machine Learning Toolbox, Neural Network Toolbox, Phased Array Systems Toolbox, and Signal Processing Toolbox
CUDA kernel integration in MATLAB applications, using only a single line of MATLAB code
Multiple GPUs on the desktop and computer clusters using MATLAB workers in Parallel Computing Toolbox and MATLAB Distributed Computing Server
I had the pleasure of attending a talk by John, the founder of AccelerEyes. Their speedup did not come from simply replacing poorly written code with code that saved a few bits here and there. It came mostly from exploiting the available cache and doing a lot of operations in GPU memory. MATLAB relied on transferring data between the GPU and the CPU, if I remember correctly, which is why the speedup was so large.