Use and implications of eval('expression') in MATLAB code?

I encountered the use of the function eval(expression) in somebody else's MATLAB code, for example:
for n = 1 : 4
    sn = int2str(n);
    eval( [ 'saveas( fig' sn ', [ sName' sn ' ], ''fig'' ) ' ] );
end
The MathWorks documentation in the MATLAB Help points out that:
Many common uses of the eval function are less efficient and are more difficult to read and debug than other MATLAB functions and language constructs.
After this, I found that this function exists in many other programming languages as well, such as Python, JavaScript, and PHP.
So I have a few questions:
Will use of this function influence the performance of my code?
If it slows down execution, why does that happen?
If it slows down execution every time it is called, what is the reason to use this function at all?

The eval function is dangerous and there are very few examples where you actually need it. For example, the code you gave can easily be rewritten if you store the figure handles in an array fig(1), fig(2) etc and write
for n = 1:4
    filename = sprintf('sName%d', n);
    saveas(fig(n), filename, 'fig');
end
which is clearer, uses fewer characters, can be analysed by the Matlab editor's linter, is more easily modifiable if (when) you need to extend the code, and is less prone to weird bugs.
As a rule of thumb, you should never use eval in any language unless you really know what you are doing (i.e. you are writing a complicated Lisp macro or something else equivalent to manipulating the AST of the language - if you don't know what that means, you probably don't need to use eval).
There are almost always clearer, more efficient and less dangerous ways to achieve the same result. Often, a call to eval can be replaced with some form of recursion, a higher-order function or a loop.
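For example, calling a function whose name is only known as a string is usually better done with str2func or feval than with eval. A minimal sketch (fname is a hypothetical variable holding the name):
f_name = 'sin';          % hypothetical: the function name arrives as data
f = str2func(f_name);    % f is now a handle to sin
result = f(pi/2);        % same as sin(pi/2), with no parsing of code strings
% Equivalent one-liner: result = feval(f_name, pi/2);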

Using eval here will certainly be slower than a non-eval version, but most likely it won't be a bottleneck in your code. However, performance is only one issue; maintenance (including debugging) and readability are others.
The slowdown occurs because Matlab uses a JIT compiler, and eval lines cannot be optimized.
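A minimal timing sketch of that effect (the absolute numbers will vary with MATLAB version and machine; the point is the relative gap):
% The eval'd string must be re-parsed on every iteration,
% so the JIT cannot optimize the loop body.
N = 1e4;
tic
for k = 1:N
    eval('y = sin(1.5);');
end
tEval = toc;
tic
for k = 1:N
    y = sin(1.5);
end
tDirect = toc;
fprintf('eval: %.4f s, direct: %.4f s\n', tEval, tDirect);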
In most cases, eval is used out of a lack of knowledge about the MATLAB functionality that would be appropriate instead. In this particular case, the issue is that the figure handles are stored in variables named fig1 through fig4. If they had been stored in an array called fig, i.e. fig(1) etc., eval would have been unnecessary.
EDIT Here are two excellent articles by Loren Shure about why eval should be avoided in Matlab. Evading eval, and More on eval.

For the most part, the slowdown occurs because the string has to be parsed into actual code. This isn't such a major issue if used sparingly, but if you ever find yourself using it in code that loops (either an explicit loop or things like JavaScript's setInterval()) then you're in for a major performance drop.
Common uses I've seen for eval that could be done better are:
Accessing property names dynamically out of ignorance of the [] (bracket) notation
Calling functions based on an argument name, which could instead be done with a switch (safer, prevents the risk of code injection)
Accessing variables named like var1, var2, var3 when they should be arrays instead (see the sketch below)
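In MATLAB, the first and third items are typically solved with dynamic field names and plain arrays. A minimal sketch (the struct and variable names here are made up for illustration):
% Dynamic field access instead of eval(['s.' fieldName ' = 5;']):
s = struct();
fieldName = 'width';    % hypothetical field name arriving as data
s.(fieldName) = 5;      % same as s.width = 5

% An array instead of eval-built names var1, var2, var3:
vals = zeros(1, 3);
for k = 1:3
    vals(k) = k^2;      % replaces eval(sprintf('var%d = %d;', k, k^2))
end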
To be honest, I don't think I've ever encountered a single situation where eval is the only way to solve a problem. I guess you could liken it to goto in that it's a substitute for program structure, or useful as a temporary solution to test a program before spending the time making it work optimally.

Here is another implication:
When you compile a program that uses eval, you must add pragmas that tell the compiler which functions are needed. For example:
This code will compile and run well:
foo()
But this one needs a pragma added:
%#function foo
eval('foo()')
Otherwise you will encounter a runtime problem.

Related

How to automatically convert a Matlab script into a Matlab function?

My problem is the following:
I have very many (~1000) mutually calling MATLAB scripts, which are very poorly written, regularly damage each other's environments, and have generally become unmanageable.
One of the reasons I even have this problem is that I need to write a test suite covering a big part of them. Luckily, for most of them the main criterion of 'correctness' is 'they don't crash'.
Just running them one by one in a loop is generally not an option, because they regularly call clear classes, close all, clc, shadow built-in functions and operators, et cetera.
So my original aim was to find a way to run a matlab script in sort of an 'isolated environment', but I didn't find a good way to do it. (Suggestions welcome, but it is not the main question.)
Since I will need to convert them all to functions anyway, I am looking for some way to do it auto-magically, or at least semi-automatically.
What I mean by semi-automatically:
Just add a line function varargout = $filename( varargin ) as the first line of the file, and end as the last one. This will at least make them runnable as functions with feval and the like, and (more importantly) prevent them from damaging the test runner.
Do point 1 and scan the file for references to undeclared variables, adding them as function arguments. This should also be doable, since the names of the variables are known. It will not help with identifying output variables, but will still be a lot of help. For example, we could pack the whole workspace into one big output structure.
Do a runtime version of point 2. This way the 'magical converter' can actually track execution environments (workspaces) and identify which variables are implicitly used as 'input arguments' of a script, and which would later be used as 'output arguments'. This option looks EXPHARD, but for a small number of calls it should not be too bad in practice.
Point 1 I can implement myself using sed, and I can also get rid of all the clear classes and clc calls that way, but options 2 and 3 seem much harder. Is there anything at least remotely resembling options 2 or 3?
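For reference, point 1 can also be done from MATLAB itself rather than sed. A minimal sketch (the scripts and funcs folder names are hypothetical, and scripts containing their own top-level end statements would need extra care):
% Wrap every script in a function header and footer (point 1).
files = dir(fullfile('scripts', '*.m'));
for k = 1:numel(files)
    [~, name] = fileparts(files(k).name);
    src = fileread(fullfile('scripts', files(k).name));
    fid = fopen(fullfile('funcs', files(k).name), 'w');
    fprintf(fid, 'function varargout = %s(varargin)\n%s\nend\n', name, src);
    fclose(fid);
end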

When is the eval function necessary?

I just read the following article from MathWorks which describes why it is important to avoid the eval function and lists alternatives to many of eval's common uses.
After reading the article, I have the impression that the eval function is neither useful nor necessary. So, my question is this: When is the eval function necessary?
I have found only one useful case for eval, and then only in the evalc variety: when calling a function with built-in command-line output (e.g. lines without ; or with disp calls) that you cannot modify, for instance an obfuscated function that dumps heaps of stuff to your command window. Even then it's best to try and obtain the source code and modify it to your needs, as using evalc will hurt your performance. Otherwise, I have not found a case where eval is the best solution.
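A minimal sketch of that evalc use (someNoisyFunction is a hypothetical, unmodifiable function that prints to the command window):
% Capture and discard the command-window chatter, keep the result.
[chatter, x] = evalc('someNoisyFunction(42)');
% 'chatter' holds everything that would have been printed; ignore it.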
I wrote an extensive answer detailing why you should try to avoid eval as much as possible here: How to put these images together?
I have used eval when trying to create multiple arrays with different names. This is not really recommended, but it worked for my specific application. For example, if I wanted N matrices with the specific names matrix1, matrix2, ..., matrixN, one solution would be to type these in manually as matrix1 = something, ..., matrixN = somethingelse. If N is really large, this is not ideal. Using eval, you can set up a for loop that changes the name of the matrix on every iteration and computes some value based on that same N.
Of course, ideally saving them into a cell array would be better, but I needed the arrays in the format I described.
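A sketch of the pattern described above (again, not recommended; the rand(3) values are just placeholders):
% eval-built names matrix1 ... matrixN (avoid where possible):
N = 5;
for k = 1:N
    eval(sprintf('matrix%d = rand(3) * %d;', k, k));
end
% The cleaner alternative: matrices{k} = rand(3) * k;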

Can operators be stored in variables?

This is a thought experiment showing what I have in mind:
test = 'x > 0';
while str2func(test)
    % Do your thing
    x = x - 1;
end
Is it possible to store whole logical operations in a variable like this?
Of course str2func will break here; if this is possible, that function will likely be something else. And I have only added apostrophes to the content of the test variable because I cannot think of what else the storing method would be.
I can see it being useful when sending arguments to functions and the like. But mostly I'm just wondering, because I have never seen it done in any programming language before.
You can store the textual representation of a function in a variable and evaluate it, for example
test = 'x > 0';
eval(test)
should result in 1 or 0 depending on x's value.
But you shouldn't use eval for reasons too-often covered here on SO for me to bother repeating. You should instead become familiar with functions and function handles. For example
test = @(x) x > 0
makes test a handle to a function which tests whether its argument is greater than 0 or not.
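A short usage sketch tying it back to the loop in the question:
test = @(x) x > 0;   % function handle instead of an eval'd string
x = 3;
while test(x)
    % do your thing
    x = x - 1;
end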
Many languages which are interpreted at run-time, as opposed to compiled languages, have similar capabilities.

Would it be worth it to use Inline::C to speed up math?

I have been working on a Perl program to process large amounts of DNA. It outputs exactly what I need; however, it takes much longer than I would like. Using NYTProf, I have narrowed the major problem areas down to the loop that adds my values together. Would using Inline::C to do the math make my program faster, or should I accept the speed and move on? Is there another way to improve the speed? Here is my program and an input it would run, as well as an executable with the default values entered already.
It's unlikely you'll get useful help here (this included). I can see various problems with your code, and none have to do with the choice of language.
Use CPAN. If you're parsing GenBank, then use an appropriate module.
You're writing assembly in Perl, and neither Perl nor you are very good at that. It's near impossible to know what's going on when you don't pass parameters to subroutines, instead relying on globals all over the place. What do @X1, @X2, @Y1, @Y2 mean?
The following might be your problem: until ($ender - $starter > $tlength) { (line 153). According to your test case, these start out as 103, 1, and 200, and it's not clear when or if they change. Depending on what's in @te, it might or might not ever get out of the loop; I just can't tell from your code.
It would help if we knew, exactly, what the parameters to add are, the in-out invariants, and what it is returning.
That's all I got.
I second the recommendation of PDL made in a comment, if it's applicable. Or the use of a CPAN module tailored to your problem (again, if applicable).
I didn't see anything that looked unambiguously like "the loop that adds my values together" in that code; please, show just the code you are considering optimizing, ideally with just enough structure around it to actually run it.
So to answer your generic question generically, yes, Inline::C can be a useful tool for optimization if you are certain your performance problem is limited to what it actually can do for you. In using it, be aware that invoking your C code from Perl or vice versa is non-trivially expensive, so you have to have enough code translated to C to minimize the transitions.

Using regexp to index a file for imenu, performance is unacceptable

I'm producing a function for imenu-create-index-function, to index a source code module, for csharp-mode.el
It works, but delivers completely unacceptable performance. Any tips for fixing this?
The Background
I looked at js.el, the rebadged "espresso" that has been included in Emacs since v23.2. It indexes JavaScript files very nicely, and does a good job with anonymous functions and the various coding styles and patterns in common use. For example, in JavaScript one can do:
(function() {
    var x = ... ;
    function foo() {
        if (x == 1) ...
    }
})();
...to define a scope where x is "private" or inaccessible from other code. This gets indexed nicely by js.el, using regexps, and it indexes the inner functions (anonymous or not) within that scope also. It works quickly. A big module can be indexed in less than a second.
I tried following a similar approach in csharp-mode, but it's quite a bit more complicated. In JS, everything that gets indexed is a function, so the starting regex is "function" with some elaboration on either end. Once an occurrence of the function keyword is found, there are 4-8 other regexps that get tried via looking-at; the number depends on settings. One nice thing about js mode is that you can turn regexps for various coding styles on or off, to speed things along I suppose. The default "styles" work for most of the code I tried.
This approach doesn't carry over well to csharp-mode: it works, but it performs poorly enough to make it not very usable. I think the reasons for this are:
there is no single marker keyword in C# that behaves the way function does in JavaScript. In C# I need to look for namespace, class, struct, interface, enum, and so on.
there's a great deal of flexibility in how C# constructs can be defined. As one example, a class can declare base classes as well as implemented interfaces. Another example: the return type for a method isn't a simple word-like string, but can be something messy like Dictionary<String, List<String>>. The index routine needs to handle all those cases, and capture the matches. This makes it run sloooooowly.
I use a lot of looking-back. The marker I use in the current approach is the open curly brace. Once I find one of those, I use looking-back to determine if the curly is a class, interface, enum, method, etc. I read that looking-back can be slow; I'm not clear on how much slower it is than, say, looking-at.
once I find an open-close pair of curlies, I call narrow-to-region in order to index what's inside. I'm not sure whether this kills performance or not; I suspect it is not the main culprit, because the perf problems I see happen in modules with one namespace and 2 or 3 classes, which means narrow gets called only 3 or 4 times total.
What's the Question?
My question is: do you have any tips for speeding up imenu-like indexing in a C# buffer?
I'm considering:
avoiding looking-back. I don't know exactly how to do this, because when re-search-forward finds, say, the keyword class, the cursor is already in the middle of a class declaration; looking-back seems essential.
instead of using open-curly as the marker, use the keywords like enum, interface, namespace, class
avoid narrow-to-region
any hard advice? Further suggestions?
Something I've tried and I'm not really enthused about re-visiting: building a wisent-based parser for C#, and relying on semantic to do the indexing. I found semantic to be very very very (etc) difficult to use, hard to discover, and problematic. I had semantic working for a while, but then upgraded to v23.2, and it broke, and I never could get it working again. Simple things - like indexing the namespace keyword - took a very long time to solve. I'm very dissatisfied with it and don't want to try again.
I don't really know C# syntax, and without looking at your elisp it's hard to give an answer, but here goes anyway.
looking-back can be deadly slow. It's the first thing I'd experiment with. One thing that helps a lot is using the limit arg to, say, restrict your search to the beginning of the current line. A different approach: when you hit the open curly, do backward-char then backward-sexp (or whatever) to get to the front of the previous word, then use looking-at.
Using keywords to search around instead of open curly is probably what I would have done. Maybe something like (re-search-forward "\\(enum\\|interface\\|namespace\\|class\\)[ \t\n]*{" nil t) then using match-string-no-properties on the first capture group to see which of the keywords was found. This might help with the looking-back problem as well.
I don't know how expensive narrow-to-region is, but it could be avoided: when you find an open curly, do save-excursion then forward-sexp, and keep point as a limit for the current iteration of your (I assume recursive) searches.