Why is MEX code much slower than the matlab code?

Why is MEX code much slower than the matlab code? - matlab

First of all, it is a request not to mark this question as duplicate as it will be clear once it is studied in detail.
I am trying to implement Orthogonal Matching Pursuit algorithm. For that I need to find the dot product of two matrices of size 144*14596 and 144*1 as shown below
clc,clear;
load('E');
load('R');
load('P');
sparse=zeros(14596,2209);
dictionary=tem2;
atoms=zeros(size(dictionary,1),size(dictionary,2));
coefs=zeros(size(dictionary,2),1);
tic
%Normalize the dictionary
for index=1:size(dictionary,2)
dictionary(:,index)=dictionary(:,index)./norm(dictionary(:,index));
end
D=dictionary;
/* NOTE: I tried for ii=1:5 to check the difference in computational time*/
for ii=1:2209
r=tem4(:,ii);
dictionary=D;
index=[];
count=0;
t=5;
while(t>1e-15 && count~=144)
/***************Problem lies here**************/
% inner_product=dictionary'*r; %Dot Product (Should be slow but is fast)
inner_product=dotProduct(dictionary',r); %(Should be fast but is very slow)
/****************************************************/
[m,ind]=max(abs(inner_product));
index=[index ind];
atoms(:,ind)=dictionary(:,ind); %Select atom which has maximum inner product
dictionary(:,ind)=0;
at=atoms(:,index);
x=(at'*at)\(at'*r);
coefs(index)=x;
r=r-at*x;
t=norm(r);
count=count+1;
end
sparse(:,ii)=coefs;
end
sig=D*sparse;
final=uint8((repmat((((max(tem4))-min(tem4))./((max(sig)-min(sig)))),size(tem4,1),1).*(sig-repmat(min(sig),size(tem4,1),1)))+repmat(min(tem4),size(tem4,1),1));
toc
but the problem I am facing is that it takes a lot of time (as seen in the profiler report) in finding out the dot product using the following code in MATLAB.
inner_product=dictionary'*r;
In order to reduce the computational time, I wrote the MEX code as shown below to find out the dot product:
/***********************************************************************
*Program to create a MEX-file to find the dot product of matrices *
*Created by: Navdeep Singh *
*#Copyright Reserved *
***********************************************************************/
#include "mex.h"
void dot_prod(double *m1,double *m2, double *t,size_t M,size_t N, size_t M2,size_t N2 )
{
int i,j,k;
double s;
for(i=0;i<M;i++)
{ for(k=0;k<N2;k++)
{ s=0;
for(j=0;j<N;j++)
{ s=s+*((m1+i)+(M*j))*(*(m2+(j+M2*k)));
}
*((t+i)+(M*k))=s;
}
}
}
void mexFunction(int nlhs,mxArray *plhs[],int nrhs, const mxArray *prhs[])
{ double *mat1,*mat2,*out;
size_t rows_mat1,cols_mat1,rows_mat2,cols_mat2;
mat1=mxGetPr(prhs[0]);
mat2=mxGetPr(prhs[1]);
rows_mat1=mxGetM(prhs[0]);
cols_mat1=mxGetN(prhs[0]);
rows_mat2=mxGetM(prhs[1]);
cols_mat2=mxGetN(prhs[1]);
plhs[0]=mxCreateDoubleMatrix(rows_mat1,cols_mat2,mxREAL);
out=mxGetPr(plhs[0]);
dot_prod(mat1,mat2,out,rows_mat1,cols_mat1,rows_mat2,cols_mat2);
}
But to my surprise I found out that MEX solution is much much slower than the one used in MATLAB, which defeats the ultimate purpose of MEX. To know the cause I searched a lot on internet and found some interesting facts such as :
Matlab: Does calling the same mex function repeatedly from a loop incur too much overhead?
Matlab mex-file with mexCallMATLAB is almost 300 times slower than the corresponding m-file
These links suggests that overhead should not be much and if there is some it is always for the first call, as time is needed to load symbol tables etc . -- But contrary to this I found that a lot of overhead is incurring in my code.
Also I found out that size of arguments does not matter though number of arguments can affect the computational time but it is again minimal. One of the link also suggest that dynamically allocated memory should be freed (apart from the one allocated by matlab itself) but I don't have any such kind of allocation also.
So kindly let me know what is the reason behind
why MEX is taking huge amount of time?
What can be the solution to it?
Your help is really appreciated.
Various files can be found here:
dictionary.m
dotProduct.c
Report MEX
E.mat
R.mat
P.mat

Matlab has highly optimized code to calculate dot product of to matrices,
you just wrote a nested for loop to calculate the dot product , so you could compare this Mex code just with "similar nested for loops" in matlab then decide whether MEX code is faster or matlab ,
in fact matlab does not use nested for loop to calculate dot product of matrices,
from MATLAB doc:
MEX-Files have several applications:
calling large pre-existing c/c++ and FORTRAN programs from MATLAB without rewriting them as MATLAB functions
Replacing performance-critical routines with c/c++ implementations
MEX files are not appropriate for all applications. MATLAB is a high-productivity environment whose specialty is eliminating time-consuming, low-level programming in compiled languages like C or C++. In general, do your programming in MATLAB. Do not use MEX files unless your application requires it.
EXAMPLE1

Related

Communication between Fortran and Matlab

I am relatively new to making different programing languages communicate with each other and would appreciate some help.
Basically I have a Fortran code and a Matlab code. Both codes are first initialized and then have to run sequentially. Each code requires input from the other one. When this process has repeated often enough some convergence criteria is reached and the iteration is terminated. To make things more complicated the Fortran code not only requires input from Matlab but also from its own previous iteration. The same holds true for Matlab. So as far as I can see it is best to keep both programs open throughout the iteration process as I have a lot of variables and therefore can’t just write them in a text file to hand them over to the other programme and preserve them for the next iteration.
So essentially I’m trying to do something like this:
Initialise variable sets A,B,C and D
Fortran:
Input: A and B
Calculations …
Output: A (variables have now new values) and D
Matlab:
Input: C and D
Calculations …
Output: C (variables have now new values) and B
Repeat Fortran and Matlab until convergence criteria is reached.
So my questions are: How to make Matlab and Fortran communicate with each other and pass variables to one and another? And how can each code trigger the other one but then wait for the other code to finish its calculation first before continuing?

The keywords for your favourite search engine are "fortran mex". MATLAB has a very good documentation/tutorial, you can start here:
A MEX-file lets you call a Fortran subroutine from MATLAB
But I believe it only works if you call Fortran subroutines from Matlab. You cannot easily call Matlab .m function from Fortran code. So your "main" program has to be Matlab .m script which calls Fortran subroutines defined in the MEX file (which is actually a dynamic library).

Vectorising 3d array

I am trying to vectorise a for loop. I have a set of coordinates listed in a [68x200] matrix called plt2, and I have another set of coordinates listed in a [400x1] matrix called trans1. I want to create a three dimensional array called dist1, where in dist1(:,:,1) I have all of the values of plt2 with the first value of trans1 subtracted, all the way through to the end of trans1. I have a for loop like this which works but is very slow:
for i=1:source_points;
dist1(:,:,i)=plt2-trans1(i,1);
end
Thanks for any help.

If I understood correctly, this can be easily solved with bsxfun:
dist1 = bsxfun(#minus, plt2, shiftdim(trans1,-2));
Or, if speed is important, use this equivalent version (thanks to #chappjc), which seems to be much faster:
dist1 = bsxfun(#minus, plt2, reshape(trans1,1,1,[]));
In general, bsxfun is a very useful function for cases like this. Its behaviour can be summarized as follows: for any singleton dimension of any of its two input arrays, it applies an "implicit" for loop to the other array along the same dimension. See the doc for further details.

Vectorizing is a good first optimization, and is usually much easier than going all in writing your own compiled mex-function (in c).
However, the golden middle-way for power users is Matlab Coder (this also applies to slightly harder problems than the one posted, where vectorization is more or less impossible). First, create a small m-file function around the slow code, in your case:
function dist1 = do_some_stuff(source_points,dist1,plt2,trans1)
for i=1:source_points;
dist1(:,:,i)=plt2-trans1(i,1);
end
Then create a simple wrapper function which calls do_some_stuff as well as defines the inputs. This file should really be only 5 rows, with only the bare essentials needed. Matlab Coder uses the wrapper function to understand what typical proper inputs to do_some_stuff are.
You can now fire up the Matlab Coder gui from the Apps section and simply add do_some_stuff under Entry-Point Files. Press Autodefine types and select your wrapper function. Go to build and press build, and you are good to go! This approach usually bumps up the execution speed substantially with almost no effort.
BR
Magnus

How to avoid allocating memory for the returned value each time a function is called

I have a function that returns a large vector and is called multiple times, with some logic going on between calls that makes vectorization not an option.
An example of the function is
function a=f(X,i)
a=zeros(size(X,1),1);
a(:)=X(:,i);
end
and I am doing
for i=1:n a=f(X,i); end
When profiling this (size(X,1)=5.10^5, n=100 ) times are 0.12s for the zeros line and 0.22s for a(:)=X(:,i) the second line. As expected memory is allocated at each call of f in the 'zeros' line.
To get rid of that line and its 0.12s, I thought of allocating the returned value just once, and passing it in as return space each time to an appropriate function g like so:
function a=g(X,i,a)
a(:)=X(:,i);
end
and doing
a=zeros(m,1);
for i=1:n a=g(X,i,a); end
What is surprising to me is that profiling inside g still shows memory being allocated in the same amounts at the a(:)=X(:,i); line, and the time taken is very much like 0.12+0.22s..
1)Is this just "lazy copy on write" because I am writing into a?
2)Going forward, what are the options?
-a global variable for a (messy..)?
-writing a matrix handle class (must I really?)
(The nested function way means some heavy redesigning to make a nesting function to which X is known (the matrix A with notations from that answer)..)

Perhaps this is a bit tangential to your question, but if this is a performance critical application, I think a good way to go is to rewrite your function as a mex file. Here is a quote from http://www.mathworks.com/support/tech-notes/1600/1605.html#intro,
The main reasons to write a MEX-file are:...
Speed; you can rewrite bottleneck computations (like for-loops) as a MEX-file for efficiency.
If you are not familiar with mex files, the link above should get you started. Converting your existing function to C/C++ should not be overly difficult. The yprime.c example included with MATLAB is similar to what you're trying to do, since it is iteratively being called to calculate the derivatives inside ode45, etc.

Methods to speed up for loop in MATLAB

I have just profiled my MATLAB code and there is a bottle-neck in this for loop:
for vert=-down:up
for horz=-lhs:rhs
y = y + x(k+vert.*length+horz).*DM(abs(vert).*nu+abs(horz)+1);
end
end
where y, x and DM are vectors I have already defined. I vectorised the loop by writing,
B=(-down:up)'*ones(1,lhs+rhs+1);
C=ones(up+down+1,1)*(-lhs:rhs);
y = sum(sum(x(k+length.*B+C).*DM(abs(B).*nu+abs(C)+1)));
But this ended up being sufficiently slower.
Are there any suggestions on how I can speed up this for loop?
Thanks in advance.

What you've done is not really vectorization. It's very difficult, if not impossible, to write proper vectorization procedures for image processing (I assume that's what you're doing) in Matlab. When we use the term vectorized, we really mean "vectorized with no additional computation". For example, this code
a = 1:1000000;
for i = a
n = n+i;
end
would run much slower then this code
a = 1:1000000;
sum(a)
Update: code above has been modified, thanks to #Rasman's keen suggestion. The reason is that Matlab does not compile your code into machine language before running it, and that's what causes it to be slower. Built-in functions like sum, mean and the .* operator run pre-compiled C code behind the scenes. For loops are a great example of code that runs slowly when not optimized for you CPU's registers.
What you have done, and please ignore my first comment, is rewriting your procedure with a vector operation and some additional operations. Those are the operations that take extra CPU simply because you're telling your computer to do more computations, even though each computation separately may (or may not) take less time.
If you are really after speeding up you code, take a look at MEX files. They allow you to write and compile C and C++ code, compile it and run as Matlab functions, just like those fast built-in ones. In any case, Matlab is not meant to be a fast general-purpose programming platform, but rather a computer simulation environment, though this approach has been changing in the recent years. My advise (from experience) is that if you do image processing, you will write for loops, and there's rarely a way around it. Vector operations were written for a more intuitive approach to linear algebra problems, and we rarely treat digital images as regular rectangular matrices in terms of what we do with them.
I hope this helps.

I would use matrices when handling images... you could then try to extract submatrices like so:
X = reshape(x,height,length);
kx = mod(k,length);
ky = floor(k/length);
xstamp = X( [kx-down:kx+up], [ky-lhs:ky+rhs]);
xstamp = xstamp.*getDMMMask(width, height);
y = sum(xstamp);
...
function mask = getDMMask(width, height, nu)
% I don't get what you're doing there .. return an appropriate sized mask here.
return mask;
end

MATLAB interview questions?

I programmed in MATLAB for many years, but switched to using R exclusively in the past few years so I'm a little out of practice. I'm interviewing a candidate today who describes himself as a MATLAB expert.
What MATLAB interview questions should I ask?
Some other sites with resources for this:
"Matlab interview questions" on Wilmott
"MATLAB Questions and Answers" on GlobaleGuildLine
"Matlab Interview Questions" on CoolInterview

This is a bit subjective, but I'll bite... ;)
For someone who is a self-professed MATLAB expert, here are some of the things that I would personally expect them to be able to illustrate in an interview:
How to use the arithmetic operators for matrix or element-wise operations.
A familiarity with all the basic data types and how to convert effortlessly between them.
A complete understanding of matrix indexing and assignment, be it logical, linear, or subscripted indexing (basically, everything on this page of the documentation).
An ability to manipulate multi-dimensional arrays.
The understanding and regular usage of optimizations like preallocation and vectorization.
An understanding of how to handle file I/O for a number of different situations.
A familiarity with handle graphics and all of the basic plotting capabilities.
An intimate knowledge of the types of functions in MATLAB, in particular nested functions. Specifically, given the following function:
function fcnHandle = counter
value = 0;
function currentValue = increment
value = value+1;
currentValue = value;
end
fcnHandle = #increment;
end
They should be able to tell you what the contents of the variable output will be in the following code, without running it in MATLAB:
>> f1 = counter();
>> f2 = counter();
>> output = [f1() f1() f2() f1() f2()]; %# WHAT IS IT?!

We get several new people in the technical support department here at MathWorks. This is all post-hiring (I am not involved in the hiring), but I like to get to know people, so I give them the "Impossible and adaptive MATLAB programming challenge"
I start out with them at MATLAB and give them some .MAT file with data in it. I ask them to analyze it, without further instruction. I can very quickly get a feel for their actual experience.
http://blogs.mathworks.com/videos/2008/07/02/puzzler-data-exploration/
The actual challenge does not mean much of anything, I learn more from watching them attempt it.
Are they making scripts, functions, command line or GUI based? Do they seem to have a clear idea where they are going with it? What level of confidence do they have with what they are doing?
Are they computer scientists or an engineer that learned to program. CS majors tend to do things like close their parenthesis immediately, and other small optimizations like that. People that have been using MATLAB a while tend to capture the handles from plotting commands for later use.
How quickly do they navigate the documentation? Once I see they are going down the 'right' path then I will just change the challenge to see how quickly they can do plots, pull out submatrices etc...
I will throw out some old stuff from Project Euler. Mostly just ramp up the questions until one of us is stumped.

Floating Point Questions
Given that Matlab's main (only?) data type is the double precision floating point matrix, and that most people use floating point arithmetic -- whether they know it or not -- I'm astonished that nobody has suggested asking basic floating point questions. Here are some floating point questions of variable difficulty:
What is the range of |x|, an IEEE dp fpn?
Approximately how many IEEE dp fpns are there?
What is machine epsilon?
x = 10^22 is exactly representable as a dp fpn. What are the fpns xp
and xs just below and just above x ?
How many dp fpns are in [1,2)? How many atoms are on an edge of a
1-inch sugar cube?
Explain why sin(pi) ~= 0, but cos(pi) = -1.
Why is if abs(x1-x2) < 1e-10 then a bad convergence test?
Why is if f(a)*f(b) < 0 then a bad sign check test?
The midpoint c of the interval [a,b] may be calculated as:
c1 = (a+b)/2, or
c2 = a + (b-a)/2, or
c3 = a/2 + b/2.
Which do you prefer? Explain.
Calculate in Matlab: a = 4/3; b = a-1; c = b+b+b; e = 1-c;
Mathematically, e should be zero but Matlab gives e = 2.220446049250313e-016 = 2^(-52), machine epsilon (eps). Explain.
Given that realmin = 2.225073858507201e-308, and Matlab's u = rand gives a dp fpn uniformly distributed over the open interval (0,1):
Are the floating point numbers [2^(-400), 2^(-100), 2^(-1)]
= 3.872591914849318e-121, 7.888609052210118e-031, 5.000000000000000e-001
equally likely to be output by rand ?
Matlab's rand uses the Mersenne Twister rng which has a period of
(2^19937-1)/2, yet there are only about 2^64 dp fpns. Explain.
Find the smallest IEEE double precision fpn x, 1 < x < 2, such that x*(1/x) ~= 1.
Write a short Matlab function to search for such a number.
Answer: Alan Edelman, MIT
Would you fly in a plane whose software was written by you?
Colin K would not hire me (and probably fire me) for saying "that
Matlab's main (only?) data type is the double precision floating
point matrix".
When Matlab started that was all the user saw, but over the years
they have added what they coyly call 'storage classes': single,
(u)int8,16,32,64, and others. But these are not really types
because you cannot do USEFUL arithmetic on them. Arithmetic on
these storage classes is so slow that they are useless as types.
Yes, they do save storage but what is the point if you can't do
anything worthwhile with them?
See my post (No. 13) here, where I show that arithmetic on int32s is 12 times slower than
double arithmetic and where MathWorkser Loren Shure says "By
default, MATLAB variables are double precision arrays. In the olden
days, these were the ONLY kind of arrays in MATLAB. Back then even
character arrays were stored as double values."
For me the biggest flaw in Matlab is its lack of proper types,
such as those available in C and Fortran.
By the way Colin, what was your answer to Question 14?

Ask questions about his expertise and experience in applying MATLAB in your domain.
Ask questions about how he would approach designing an application for implementation in MATLAB. If he refers to recent features of MATLAB, ask him to explain them, and how they are different from the older features they replace or supplement, and why they are preferable (or not).
Ask questions about his expertise with MATLAB data structures. Many of the MATLAB 'experts' I've come across are very good at writing code, but very poor at determining what are the best data structures for the job in hand. This is often a direct consequence of their being domain experts who've picked up MATLAB rather than having been trained in computerism. The result is often good code which has to compensate for the wrong data structures.
Ask questions about his experience, if any, with other languages/systems and invite him to expand upon his observations about the relative strengths and weaknesses of MATLAB.
Ask for top tips on optimising MATLAB programs. Expect the answers: vectorisation, pre-allocation, clearing unused variables, etc.
Ask about his familiarity with the MATLAB profiler, debugger and lint tools. I've recently discovered that the MATLAB 'expert' over in the corner here had never, in 10 years using the tool, found the profiler.
That should get you started.

I. I think this recent SO question
on indexing is a very good question
for an "expert".
I have a 2D array, call it 'A'. I have
two other 2D arrays, call them 'ix'
and 'iy'. I would like to create an
output array whose elements are the
elements of A at the index pairs
provided by x_idx and y_idx. I can do
this with a loop as follows:
for i=1:nx
for j=1:ny
output(i,j) = A(ix(i,j),iy(i,j));
end
end
How can I do this without the loop? If
I do output = A(ix,iy), I get the
value of A over the whole range of
(ix)X(iy).
II. Basic knowledge of operators like element-wise multiplication between two matrices (.*).
III. Logical indexing - generate a random symmetric matrix with values from 0-1 and set all values above T to 0.
IV. Read a file with some properly formatted data into a matrix (importdata)
V. Here's another sweet SO question
I have three 1-d arrays where elements
are some values and I want to compare
every element in one array to all
elements in other two.
For example:
a=[2,4,6,8,12]
b=[1,3,5,9,10]
c=[3,5,8,11,15]
I want to know if there are same
values in different arrays (in this
case there are 3,5,8)
Btw, there's an excellent chance your interviewee will Google "MATLAB interview questions" and see this post :)

Possible question:
I have an array A of n R,G,B triplets. It is a 3xn matrix. I have another array B in the form 1xn which stores an index value (association to a cluster) for each triplet.
How do I plot the triplets of A in 3D space (using plot3 function), coloring each triplet according to its index in B? (The goal is to qualitatively evaluate my clustering)
Really, really good programmers who are MATLAB novices won't be able to give you an efficient (== MATLAB style) solution. However, it is a very simple problem if you do know your MATLAB.

Depends a bit what you want to test.
To test MATLAB fluency, there are several nice Stack Overflow questions that you could use to test e.g. array manipulations (example 1, example 2), or you could use fix-this problems like this question (I admit, I'm rather fond of that one), or look into this list for some highly MATLAB-specific stuff. If you want to be a bit mean, throw in a question like this one, where the best solution is a loop, and the typical MATLAB-way-of-thinking solution would just fill up the memory.
However, it may be more useful to ask more general programming questions that are related to your area of work and see whether they get the problem solved with MATLAB.
For example, since I do image analysis, I may ask them to design a class for loading images of different formats (a MATLAB expert should know how to do OOP, after all, it has been out for two years now), and then ask follow-ups as to how to deal with large images (I want to see a check on how much memory would be used - or maybe they know memory.m - and to hear about how MATLAB usually works with doubles), etc.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse