How to automatically convert a Matlab script into a Matlab function?

My problem is the following:
I have very many (~1000) mutually calling Matlab scripts, which are very poorly written, regularly damage each other's environments and have generally become unmanageable.
One of the reasons I even have this problem is that I need to write a test suite covering a big part of them. Luckily, for most of them the main criterion of 'correctness' is 'they don't crash'.
Just running them one by one in a loop is generally not an option, because they regularly call clear classes, close all, clc, shadow built-in functions and operators, et cetera.
So my original aim was to find a way to run a Matlab script in a sort of 'isolated environment', but I didn't find a good way to do it. (Suggestions welcome, but that is not the main question.)
Since I will need to convert them all to functions anyway, I am looking for some way to do it auto-magically, or at least semi-automatically.
What I mean by 'semi-automatically':
1. Just add a line function varargout = $filename( varargin ) as the first line of the file, and end as the last one. This will at least make them runnable as functions with feval and the like, and (more importantly) prevent them from damaging the test runner.
2. Do point 1, then scan the file for references to undeclared variables and add those as function arguments. This should also be doable, since the names of the variables are known. It will not help with identifying output variables, but it will still be a lot of help; for example, we could pack the whole workspace into one big output structure.
3. Do a runtime version of point 2. This way the 'magical converter' can actually track execution environments (workspaces) and identify which variables are implicitly used as 'input arguments' of a script, and which would later be used as 'output arguments'. This option looks EXPHARD, but for a small number of calls it should not be too bad in practice.
Point 1 I can implement myself using sed, as I can also get rid of all the clear classes and clc calls (a sketch of point 1 is below), but options 2 and 3 seem much harder. Is there anything at least remotely resembling option 2 or 3?
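For point 1, here is a minimal sketch of the wrapping step written in Matlab itself (the folder names are assumptions; the same transformation can of course be done with sed):
% Wrap every script in srcDir into a same-named function in outDir.
srcDir = 'scripts';   % folder holding the original scripts (assumption)
outDir = 'wrapped';   % folder for the generated functions (assumption)
if ~exist(outDir, 'dir'), mkdir(outDir); end
files = dir(fullfile(srcDir, '*.m'));
for k = 1:numel(files)
    [~, name] = fileparts(files(k).name);
    body = fileread(fullfile(srcDir, files(k).name));
    fid = fopen(fullfile(outDir, files(k).name), 'w');
    fprintf(fid, 'function varargout = %s( varargin )\n%s\nend\n', name, body);
    fclose(fid);
end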

Related

When is the eval function necessary?

I just read the following article from MathWorks which describes why it is important to avoid the eval function and lists alternatives to many of eval's common uses.
After reading the article, I have the impression that the eval function is neither useful nor necessary. So, my question is this: When is the eval function necessary?
I have found only one useful case for eval, and then specifically its evalc variety: when calling a function that prints to the command window (e.g. lines without a ; or with disp calls) and which you cannot modify, for instance an obfuscated function that dumps heaps of output to your command window. Even then it is best to try to obtain the source code and modify it to your needs, as using evalc will hurt your performance. Otherwise, I have not found a case where eval is the best solution.
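A minimal sketch of that single use case (noisyFunction and someInput are placeholder names for the chatty function and its input):
% Capture, and thereby suppress, the command-window output of a function
% you cannot edit. 'noisyFunction' and 'someInput' are placeholders.
[captured, result] = evalc('noisyFunction(someInput)');
% 'captured' holds the text that would have been printed;
% 'result' receives the function's first output.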
I wrote an extensive answer detailing why you should try to avoid eval as much as possible here: How to put these images together?
I have already used eval when trying to create multiple arrays with different names. This is not really recommended, but it worked for my specific application. For example, if I wanted to have N matrices with the specific names "matrix1", "matrix2", ..., "matrixN", one solution would be to type these in manually: "matrix1 = something", ..., "matrixN = somethingelse". If N is really large, this is not ideal. Using eval, you can set up a for loop that changes the name of the matrix on every iteration and calculates some value based on the loop index.
Of course, ideally saving them into a cell array would be better, but I needed the arrays in the format I described.
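A minimal sketch of both variants (N and the rand call are placeholders):
N = 5;                       % placeholder number of matrices
% eval-based dynamic names (works, but generally discouraged):
for k = 1:N
    eval(sprintf('matrix%d = rand(k);', k));
end
% The usually preferable alternative: one cell array indexed by k.
matrices = cell(1, N);
for k = 1:N
    matrices{k} = rand(k);
end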

Overwrote built in function - Standard deviation

I want to have a std.m file for the standard deviation. It is in the datafun folder of the Matlab toolbox, but, by mistake, I changed the code and the std command is not working anymore. How can I run the original std (standard deviation) command?
Taking all the comments out, the function std.m is actually extremely simple:
function y = std(varargin)
y = sqrt(var(varargin{:}));
This is the definition of the standard deviation: the square root of the Variance.
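If, after re-creating the file, you are not sure which std.m Matlab actually resolves to (and whether anything still shadows the shipped one), which with the -all flag lists every candidate in precedence order:
% List every std on the search path, in the order Matlab would pick them.
which std -all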
Set built-in functions to Read-Only
Now don't go breaking the var.m file, because it is more complex and I wonder whether there would be a copyright issue in displaying its listing here.
To avoid the problem of breaking built-in files, it is advisable to set all your Matlab toolbox files as read-only. I remember the old Matlab installer offering the option to do that at install time. I don't know if the installer still offers it, but if not it is extremely easy to do manually (on Windows, just select your folders, right-click, choose Properties, tick Read-only, and accept propagating the attribute to all subfolders and files).
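If you prefer doing it from within Matlab, fileattrib can clear the write bit; the path below is just an example, adjust it to your installation:
% Make the shipped std.m read-only from within Matlab ('-w' clears write access).
fileattrib(fullfile(matlabroot, 'toolbox', 'matlab', 'datafun', 'std.m'), '-w');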
Overloading
Once this is done, your built-in files are safe. You can still modify the default behavior of a built-in function by overloading it. This consists of writing a function with the same name and arranging for it to be called before the default function (your overloading function becomes the default one).
This article explains how to overload user functions.
MathWorks does not recommend directly overloading built-in functions (rather, call yours something else, like mySTD.m; a sketch follows below), but if you insist it is perfectly feasible, and still a much better option than modifying the built-in function... at least the original function stays intact somewhere.
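A minimal sketch of that recommended route (mySTD is just an example name, and the NaN handling is only there to illustrate adding behavior without shadowing std):
function y = mySTD(x)
% Custom wrapper with a non-conflicting name: the call to std below still
% resolves to the shipped function, so nothing is shadowed. Ignoring NaNs
% is only an example of added behavior.
y = std(x(~isnan(x)));
end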

At which lines in my MATLAB code a variable is accessed?

I am defining a variable at the beginning of my source code in MATLAB. Now I would like to know at which lines this variable affects something. In other words, I would like to see all lines in which that variable is read. This does not only include accesses in the current function, but also possible accesses in sub-functions that take this variable as an input argument. In this way, I can quickly see where a change to this variable has any influence.
Is there any possibility to do so in MATLAB? A graphical marking of the corresponding lines would be nice but a command line output might be even more practical.
You can always use "Find Files" to search for a certain keyword or expression. In my R2012a/Windows version it is in Edit > Find Files..., with the keyboard shortcut [CTRL] + [SHIFT] + [F].
The result will be a list of lines where the searched string is found, in all the files found in the specified folder. Please check out the options in the search dialog for more details and flexibility.
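Since a command-line output was mentioned as desirable, here is a minimal text-search sketch in Matlab (varName and the folder are assumptions; like Find Files, it only locates occurrences and cannot tell reads from writes):
% Print file:line for every occurrence of varName in the .m files of the
% current folder. Plain text search: it also matches comments.
varName = 'myVar';                      % placeholder variable name
files = dir('*.m');
for k = 1:numel(files)
    txt = fileread(files(k).name);
    lines = regexp(txt, '\n', 'split');
    for n = 1:numel(lines)
        if ~isempty(regexp(lines{n}, ['\<' varName '\>'], 'once'))
            fprintf('%s:%d: %s\n', files(k).name, n, strtrim(lines{n}));
        end
    end
end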
Later edit: thanks to @zinjaai, I noticed that @tc88 requires that this tool also track the effect of the variable inside the functions/subfunctions. I think this is:
1. Very difficult to achieve. The problem of running through all the possible values and branching on every possible conditional expression is... well, it is hard. I think it is halting-problem-hard.
2. In 90% of cases the assumption that the output of a function is influenced by the input is true. But the input and the output are part of the same statement (assigning the result of a function), so looking for where the variable is used as an argument should suffice to identify which output variables are affected.
3. There are perverse cases where functions alter arguments that are handle-type (because the argument is not copied, but referenced). This side effect breaks assumption 2, and is one of the main reasons for point 1. Outlining the cases where these side effects take place is, again, hard, and it is better to assume that all of them are modified.
4. Some other cases are inherently undecidable, because they do not depend on the computer's state, but on the state of the "outside world". Example: suppose one calls uigetfile. The function returns a char type when the user selects a file, and a double type when the user chooses not to select a file. Obviously the two cases will be treated differently. How could you know which variables are created/modified before the user decides?
In conclusion: I think that human intuition, plus the MATLAB Debugger (for run time), Find Files (for quickly searching where a variable is used) and depfun (for quickly identifying function dependencies), is way cheaper. But I would like to be wrong. :-)
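For completeness, a depfun call is as simple as the line below (myFunction is a placeholder; newer releases offer matlab.codetools.requiredFilesAndProducts for the same purpose):
% List the files that myFunction depends on (placeholder function name).
list = depfun('myFunction');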

Anonymous function corruption in MATLAB. What does '@sf%' mean?

I was stumped by a segmentation fault in MATLAB.
It seems like it was caused by an anonymous function that was loaded from a mat file.
The original anonymous function handle was:
@(x)scaledNlfun(x,@logexp1,1e3)
But when it is loaded, it becomes:
@sf%1@(x)scaledNlfun(x,@logexp1,1e3)
It seems to be okay when I call it on the command line, but it creates a segmentation fault (or segmentation violation) within a function. Not at the function call itself, but a few lines after it. In debugging mode, if I step through the statement, it is fine as well.
The stack trace shows a bunch of
[ 0] 0x00002b20b97baba4 /usr/local/MATLAB/R2013a/bin/glnxa64/libmwm_interpreter.so+04127652
and it happens on both MATLAB 2012a and 2013a on a Linux 2.6.18-371.3.1.el5 SMP.
This function handle was saved within a parfor loop using '-v7.3' option because the struct that contains the handle was too big. If I replace the anonymous function after loading the mat file, everything works fine, so I'm thinking the matlab load function has a bug.
Unfortunately, I cannot create a minimal example to reproduce the error. I tried saving anonymous function handles within parfor with '-v7.3', but without the other complex data structures, it seems to work fine. But I have 80 mat files that would reliably crash matlab (many of them more than 1GB).
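A minimal sketch of the workaround mentioned above (the file and field names are hypothetical; the point is simply to rebuild the handle after load instead of trusting the deserialized one):
S = load('results.mat');   % 'results.mat' and the field below are placeholders
% Overwrite the corrupted, deserialized handle with a freshly constructed one.
S.model.nlfun = @(x) scaledNlfun(x, @logexp1, 1e3);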
In any case, does anybody know what that "@sf%" means? (It is not the Stateflow toolbox.)
The core of the problem seems to be that you have @sf%1@ where you would expect @. Just looking at this, I can think of a few possibilities:
Somehow sf%1@ was inserted after the original @
Somehow @sf%1@ was substituted in place of the original @
Somehow @sf%1 was attached before the original @
I would actually bet on the third one, but here are the most logical scenarios I can think of that could cause this problem:
Perhaps there was an invisible char?
Perhaps some kind of strange character conversion?
Probably a situation where two things are stored in a variable instead of one. Perhaps something like @s or @sf and some separating characters.
All in all, this does not explain why it would work when you run the entire program in the console; but perhaps you only ran part of it, in which case these could be things to look out for.

would it be worth it to use inline::C to speed up math

I have been working on a Perl program to process large amounts of DNA. It outputs exactly what I need, but it takes much longer than I would like. Using NYTProf I have narrowed the major problem area down to the loop that adds my values together. Would using Inline::C to do the math make my program faster, or should I accept the speed and move on? Is there another way to improve the speed? Here is my program and an input it would run on, as well as an executable with the default values entered already.
It's unlikely you'll get useful help here (this included). I can see various problems with your code, and none have to do with the choice of language.
Use CPAN. If you're parsing GenBank, then use an appropriate module.
You're writing assembly in Perl, and neither Perl nor you is very good at that. It's nearly impossible to know what's going on when you don't pass parameters to subroutines and instead rely on globals all over the place. What do @X1, @X2, @Y1, @Y2 mean?
The following might be your problem: until ($ender - $starter > $tlength) { (line 153). According to your test case, these start out as 103, 1, and 200, and it's not clear when or if they change. Depending on what's in @te, it might or might not ever get out of the loop; I just can't tell from your code.
It would help if we knew, exactly, what the parameters to add are, what the in-out invariants are, and what it returns.
That's all I got.
I second the recommendation of PDL made in a comment, if it's applicable. Or the use of a CPAN module tailored to your problem (again, if applicable).
I didn't see anything that looked unambiguously like "the loop that adds my values together" in that code; please, show just the code you are considering optimizing, ideally with just enough structure around it to actually run it.
So, to answer your generic question generically: yes, Inline::C can be a useful tool for optimization if you are certain your performance problem is limited to what it actually can do for you. In using it, be aware that invoking your C code from Perl (or vice versa) is non-trivially expensive, so you have to translate enough code to C to minimize the transitions.