Segmentation Faults when Running MEX Files in Parallel - matlab

I am currently running repetitions of an experiment that uses MEX files in MATLAB 2012a and occasionally running into segmentation faults that I cannot understand.
Some information about the faults
They occur randomly
They only occur when I run multiple repetitions of my experiment in parallel on a Linux machine using a parfor loop.
They do not occur when I run multiple repetitions of my experiment in parallel on Mac OSX 10.7 using a parfor loop.
They do not occur when I run the repetitions sequentially.
They seem to occur far less frequently when I run 2 experiments in parallel - as opposed to 12 experiments in parallel.
Some information about my MEX file:
It is written in C
It uses the IBM CPLEX 12.4 API (this is thread-safe)
It was compiled using GCC 4.6.3
My thoughts are that there may be some issue in accessing the MEX file in multiple cores. Can anyone shed any light on what might be going on or suggest a fix? I'd be happy to provide more information as necessary.

I've recently sent a stack trace to the people at MATLAB and it turns out that the culprit is not my code but one of the functions from the CPLEX 12.4 API. It turns out that this function uses the putenv() function in C which is not necessarily thread-safe.
Unfortunately, I have to keep using this function and the API so I've posted a follow-up thread that focuses on finding ways to avoid this fault.
Any advice would be appreciated.

My thoughts are that there may be some issue in accessing the MEX file in multiple cores.
It's much more likely that your MEX file has a bug. Various bugs (which are very easy to make in C), such as accessing dangling memory, double-free()ing, or writing past the end of allocated array, will cause intermittent SIGSEGV.
Your best bet is to run Matlab under a debugger, and see where it crashes.

Related

Spawn multiple copies of matlab on the same machine

I am facing a huge problem. I built a complex C application with embedded Matlab functions that I call using the Matlab engine (engOpen() and such ...). The following happens:
I spawn multiple instances of this application on a machine, one for each core
However! ... The application then slows down to a halt. In fact, on my 16-core machine, the application slows down approximately by factor 16.
Now I realized this is because there is only a single Matlab engine started per machine and all my 16 instances share the same copy of Matlab!
I tried to replicate this with the Matlab GUI and it's the same problem. I run a program in the GUI that takes 14 seconds, and THEN I run it in two GUIs at the same time and it takes 28 seconds.
This is a huge problem for me, because I will miss my deadline if I have to reprogram my entire C application without Matlab. I know that Matlab has commands for parallel programming, but my Matlab calls are embedded in the C application and I want to run multiple instances of the C application. Again, I cannot refactor my entire C application because I will miss the deadline.
Can anyone please let me know if there is a solution for this (e.g. really start multiple Matlab processes on the same machine)? I am willing to pay for extra licenses. I currently have fully licensed Matlab installed on all machines.
Thank you so so much!
EDIT
Thank you Ben Voigt for your help. I found that a single instance of Matlab is already using multiple cores. In fact, running one instance shows me full utilization of 4 cores. If I run two copies of Matlab, I get full utilization of 8 cores. Hence it is actually running in parallel. However, even though 2 instances seem to take up double the processing power, I still get 2* slowdown. Hence, 2 instances seem to get twice the result with 4* the compute power total. Why could that be?
Your slowdown is not caused by stuffing all N instances into a single MATLAB instance on a single core, but by the fact that there are no longer 16 cores at the disposal of each instance. Many MATLAB vector operations use parallel computation even without explicit parallel constructs, so more than one core per instance is needed for optimal efficiency.
MATLAB libraries are not thread-safe. If you create multithreaded applications, make sure only one thread accesses the engine application.
I think the MATLAB engine is the wrong technique. On Windows platforms, you can try using the COM Automation server, which has the .Single option that starts one MATLAB instance for each COM client you open.
Alternatives are:
Generate C++ code for the functions.
Create a .NET library. (MATLAB Builder NE)
Run matlab via command line.
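
As a rough illustration of the command-line alternative, the C sketch below starts several independent copies of a command as separate processes and waits for them. It assumes a POSIX system. The helper name run_copies is made up for this example and is not part of any MATLAB API.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: launch `count` independent copies of the command in
 * argv (a NULL-terminated argument vector) and wait for all of them.
 * Returns the number of copies that exited with status 0. */
int run_copies(char *const argv[], int count)
{
    if (count <= 0)
        return 0;

    pid_t *pids = malloc((size_t)count * sizeof *pids);
    if (pids == NULL)
        return 0;

    int started = 0;
    for (int i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid < 0) {              /* could not create another process */
            perror("fork");
            break;
        }
        if (pid == 0) {             /* child: become the requested command */
            execvp(argv[0], argv);
            perror("execvp");       /* only reached if exec failed */
            _exit(127);
        }
        pids[started++] = pid;      /* parent: remember the child */
    }

    int succeeded = 0;
    for (int i = 0; i < started; i++) {
        int status = 0;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            succeeded++;
    }

    free(pids);
    return succeeded;
}
```

A caller might pass an argument vector such as {"matlab", "-nodisplay", "-singleCompThread", "-r", "run_job; exit", NULL}, where run_job is a placeholder script name. The -singleCompThread startup option limits each instance to one computational thread, which may help when many instances would otherwise compete for the same cores.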

Will Matlab standalone be faster than Matlab from UI for long execution code?

I have built a standalone Matlab application. I was expecting it to be faster than running the application from the Matlab environment but it is indeed a bit slower (1.3 s per iteration vs 1.5 s per iteration).
I am not counting the init time required by MCR but the execution of my code.
Is that the expected performance or should I be obtaining a performance improvement?
I haven't found any settings on the deployment tool that could help to reduce execution time.
Thanks in advance
Applications built with MATLAB Compiler should execute at pretty much exactly the same speed as within MATLAB.
MATLAB Compiler does not convert your MATLAB code into machine code in the same way as a C compiler does for C. What it does is to archive and encrypt your MATLAB code (note, it properly encrypts it, not just pcodes it as a comment suggests), create a thin executable wrapper and package them together, possibly also with MATLAB Compiler Runtime (MCR). MCR is very similar to MATLAB itself, without a graphical user interface, and is freely redistributable.
When you run the executable, it dearchives and decrypts your MATLAB code and runs it against the MCR. It should run exactly the same, both in terms of results and speed.
Very old versions of MATLAB Compiler (pre-version 4.0) worked in a different way, converting a subset of the MATLAB language into C code, and compiling this. This provided a potentially significant speed-up, but only a subset of the language was supported and results, unless you were careful, could sometimes be different. Similar functionality is now available in the separate MATLAB Coder product.
There are a few small things you can do to improve performance: for example, within deploytool you can specify which toolboxes your application uses. deploytool uses a dependency checker to package up all MATLAB functionality that it thinks your code might possibly depend on, but it can't always tell exactly, as the functions your code needs might change at runtime. It therefore errs on the side of caution and includes more than necessary. By specifying only the toolboxes you know to be necessary, you can speed things up a little (it also speeds up the build process quite a bit).

Deployed Matlab application using significantly more memory than Matlab scripts

I was testing a stand-alone application we developed in Matlab when I noticed that its memory usage, according to Windows Task Manager, was peaking several times above 16 GB. I decided to run Matlab's profiler with profile -memory on on the scripts behind the compiled version to see where the memory peaks were occurring, using the exact same input. However, the highest peak memory it found was 2400860.00 KB (about 2.3 GB), a small fraction of that, for the function that essentially acts as the program's main().
Thus, I was wondering if people have noticed huge memory usage differences between running a compiled Matlab program and running the original scripts in Matlab. I noticed it took a lot longer running in Matlab, but I figured that was due to the profiler keeping track of all of the memory allocations and deallocations, rather than reading and writing to a swap space on disk.
To make a real quick answer to this question. Yes, MATLAB compiled applications run with more overhead than MATLAB scripts.
This is because MATLAB deployed applications load a runtime version of MATLAB, called the MCR, into memory. The MCR runs with more overhead than MATLAB.
One thing that I have found useful in situations like this is to recompile and see if that helps at all. If it doesn't, you could try to lower the memory usage by running calculations in segments.
This might be helpful for better memory usage: http://www.mathworks.com/help/matlab/matlab_prog/strategies-for-efficient-use-of-memory.html
Source:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/306814
Matlab executable too slow
Comment if you have questions.

Working around segmentation faults that occur in parallel due to a non-thread-safe API function

I am currently coding a MEX file in MATLAB to run experiments in parallel using the parfor function in MATLAB 2012a. The MEX file does some very straightforward numerical tasks but relies on the CPLEX 12.4 API from IBM.
Although my MEX file works sequentially, I will inevitably receive "random" segmentation faults when I run it in parallel. After sending a stack trace of the segmentation fault to MATLAB, they have suggested that the error originates from the "putenv()" function from the C library, which is apparently not thread-safe.
I do not use the putenv() function in my MEX code, but it turns out that one of the functions that I absolutely have to call from the CPLEX 12.4 API does use it. I'm wondering if there is anything I could do to avoid the segmentation faults that come as a result of this function. Someone previously suggested "locking my bits" and "using semaphores" but I'm really a little in over my head when it comes to these concepts.
Any advice or direction would be very much appreciated.
It turns out that the violation occurs since I use the CPLEX MATLAB API in my MATLAB code and the CPLEX C API in my MEX code at the same time. Both APIs use the putenv() function, which is not thread-safe. In particular, the crash occurs whenever two threads try to use the putenv() function at the same time (either in the MEX file or the MATLAB code).
The fix is to use the package and add a mutex_lock / mutex_unlock around the functions that use putenv() in C and MATLAB (i.e. CPXopenCPLEX in C / Cplex() in MATLAB). More information and exact code to create the mutex_lock / mutex_unlock can be found in the following post on the CPLEX forums
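
The locking idea can be sketched in plain C with a pthread mutex. This is only an illustration, not the code from the forum post. It assumes the competing callers are threads inside one process, as described above. unsafe_open is a made-up stand-in for a library call such as CPXopenCPLEX that modifies the environment, and guarded_open and run_workers are hypothetical helper names.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_THREADS 16
#define CALLS_PER_THREAD 1000

/* One process-wide lock shared by every caller of the unsafe routine. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;
static char env_entry[64];      /* storage handed to putenv() */
static int calls_done = 0;      /* only touched while env_lock is held */

/* Hypothetical stand-in for a library call that uses putenv() internally. */
static int unsafe_open(int id)
{
    snprintf(env_entry, sizeof env_entry, "DEMO_SOLVER_ID=%d", id);
    return putenv(env_entry);
}

/* Wrapper that lets only one thread at a time enter unsafe_open(). */
int guarded_open(int id)
{
    pthread_mutex_lock(&env_lock);
    int rc = unsafe_open(id);
    calls_done++;
    pthread_mutex_unlock(&env_lock);
    return rc;
}

static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (int i = 0; i < CALLS_PER_THREAD; i++)
        guarded_open(id);
    return NULL;
}

/* Run n_threads workers concurrently and return the number of completed
 * calls, or -1 if n_threads is out of range. */
int run_workers(int n_threads)
{
    pthread_t threads[MAX_THREADS];
    int ids[MAX_THREADS];

    if (n_threads < 1 || n_threads > MAX_THREADS)
        return -1;

    calls_done = 0;
    int started = 0;
    for (; started < n_threads; started++) {
        ids[started] = started;
        if (pthread_create(&threads[started], NULL, worker, &ids[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return calls_done;
}
```

The lock only helps if every code path that reaches the unsafe function goes through the same wrapper, which is why the fix above guards both the C call and the MATLAB call.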

matlab shared c++ libraries and OpenCL

I have a project that requires lots of image processing and wanted to add GPU support to speed things up.
I was wondering: if I compiled my Matlab code into a C++ shared library and called it from within an OpenCL program, does that mean that the Matlab code is going to be run on the GPU?
My own (semi-educated) guess is that you are going to find this very difficult to do. But, others have trodden the same path. This paper might be a good place to start your research. And Googling turned up Accelereyes and a couple of references to items on the Mathworks File Exchange which you might want to follow up.
Everything in Jacket is written in C/C++/CUDA.
In fact we now have a beta version of libjacket (http://www.accelereyes.com/downloadLibjacket) which can be used to extend not just MATLAB but other languages if you are willing.
@OSaad
Most of our functions are the fastest options out there, be it in C or MATLAB.
The Parallel Computing Toolbox in the upcoming release R2010b (due September 1st) supports GPU processing for several functions. Unfortunately, it only supports CUDA (devices with compute capability 1.3 and later), so with an ATI graphics card, you're out of luck. However, you may just want to buy a dedicated GPU, anyway.
Typically, if you can write your Matlab code in a "vectorized" way, then packages like AccelerEyes' Jacket have a reasonable chance of making things run on the GPU. You can verify this to some extent beforehand by checking whether Matlab itself is able to run on multiple cores on the CPU (these days Matlab will use multiple cores if things are parallelizable in an obvious way).
If that doesn't work, then you need to drop down to C/C++ via mex and then, from there, call OpenCL yourself. Mex is how Matlab talks to C code, so you write C code that is called by Matlab (and receives the matrices, etc), then initialises and calls OpenCL. This is more work, but may be your only route (and, even if the automated packages work to some extent, this approach can still give more speedups because you can be smarter about memory management, for example, if you know what you are doing).
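
A minimal sketch of that MEX route might look like the following. The compute step is a plain C loop standing in for the place where OpenCL setup, buffer transfers and the kernel launch would go. The names scale_array and scale_mex are made up for this example. The gateway is wrapped in a check for MATLAB_MEX_FILE, on the assumption that the mex build command defines that macro, so the file also compiles as ordinary C.

```c
#include <stddef.h>

/* Placeholder compute step: multiply every element by `factor`.
 * In a real build this is where the OpenCL calls would go. */
void scale_array(const double *in, double *out, size_t n, double factor)
{
    for (size_t i = 0; i < n; i++)
        out[i] = factor * in[i];
}

#ifdef MATLAB_MEX_FILE          /* assumed to be set by the mex command */
#include "mex.h"

/* MATLAB entry point, called as: out = scale_mex(in, factor) */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs != 2 || nlhs > 1)
        mexErrMsgIdAndTxt("demo:usage", "Usage: out = scale_mex(in, factor)");
    if (!mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]))
        mexErrMsgIdAndTxt("demo:type", "Input must be a real double array.");

    /* Allocate the output with the same shape as the input matrix. */
    plhs[0] = mxCreateDoubleMatrix(mxGetM(prhs[0]), mxGetN(prhs[0]), mxREAL);

    /* Hand the raw data pointers to the compute step. */
    scale_array(mxGetPr(prhs[0]), mxGetPr(plhs[0]),
                mxGetNumberOfElements(prhs[0]), mxGetScalar(prhs[1]));
}
#endif
```

Keeping the compute step in its own function means it can be tested outside Matlab and later swapped for a GPU implementation without touching the gateway.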