would it be worth it to use inline::C to speed up math - perl

i have been working on a perl program to process large amounts of dna. It outputs exactly what i need however it takes much longer than i would like using NYTprof i have narrowed down the major problem areas to be the loop that adds my values together. would using inline::C to do the math make my program faster or should i accept the speed and move on? is there another way to improve the speed? here is my program and an input it would run as well as an executable with the default values entered already.

It's unlikely you'll get useful help here (this included). I can see various problems with your code, and none have to do with the choice of language.
use CPAN. If you're parsing genbank, then use some an appropriate module.
You're writing assembly in Perl, and neither Perl nor you are very good at that. It's near impossible to know what's going on when you don't pass parameters to subroutines, instead relying on globals all over the place. What do #X1, #X2, #Y1, #Y2 mean?
The following might be your problem: until ($ender - $starter > $tlength) { (line 153). According to your test case, these start by being 103, 1, and 200, and it's not clear when or if they change. Depending on what's in #te, it might or might not ever get out of the loop; I just can't tell from your code.
It would help if we knew, exactly, what are the parameters to add, the in-out invariants, and what it is returning.
That's all I got.

I second the recommendation of PDL made in a comment, if it's applicable. Or the use of a CPAN module tailored to your problem (again, if applicable).
I didn't see anything that looked unambiguously like "the loop that adds my values together" in that code; please, show just the code you are considering optimizing, ideally with just enough structure around it to actually run it.
So to answer your generic question generically, yes, Inline::C can be a useful tool for optimization if you are certain your performance problem is limited to what it actually can do for you. In using it, be aware that invoking your C code from Perl or vice versa is non-trivially expensive, so you have to have enough code translated to C to minimize the transitions.

Related

How can I run an algorithm written for Digitool 4.3 (2003)?

I work on computational music. I have found the ps13 pitch spelling algorithm implemented in Lisp in 2003, precisely "Digitool MCL 4.3". I would like to run this code, preferably on a Linux x86 machine, to compare its results with other similar codes.
I am new to Lisp, but so far my research led me to think that Digitool MCL is no longer available. I thought of two ways which may help me:
a virtual environment (Docker or else) which would emulate a machine from 2003…
a code translation tool which would transform the 2003 source code into something executable today
I have not succeeded in finding one of these two options, nor running it directly with sbcl (but, as a newbie, I may have missed a small modification to make it run easily).
May someone help me?
Summary
This code is very close to being portable CL: you won't need something emulating an antique Mac to run it. I ran it on three implementations (SBCL, LispWorks, CCL) within a few minutes. However if you're not a Lisp person (and don't want to become one) it will be somewhat more fiddly to do that.
However I can't just send you a fixed version, both because this isn't the right forum for that, and also because we'd need to get the author's permission to do so. I have asked him if he would be interested in a portabalised version, and if he is, I will send him one in due course. You could also get in touch and ask to be notified.
(Meta-summary: while I think the question is fine, any reasonable answer probably doesn't fit on SO.)
Details
One initial problem with this code is that the file uses old Mac line end conventions (I think: not Unix anyway): unless whatever Lisp you're using is smart enough to spot this (some are, SBCL seems not to be although I am sure there are options to tell it) you'll need to convert it.
Given that, the code that implements this algorithm is very, very close to being portable Common Lisp. It has four dependencies on non-standard things:
two global variables, *save-local-symbols* and *verbose-eval-selection*;
two functions: choose-file-dialog and choose-directory-dialog.
The global variables can probably be safely commented out as I think they are just controls for the compiler, probably. The functions have fairly obvious specifications: they're obviously meant to pop up file / directory choosers.
However you can just not use the bits of the code that use these functions, so you can compile it, get a few compiler warnings about undefined functions, and then it's fine.
But it gets better than that in fact: the latter-day descendant of MCL is Clozure CL: CCL is free, and open source. CCL has both choose-file-dialog and choose-directory-dialog already and both of the globals exist although one is no longer exported.
Unfortunately there are then some hidden portability problems to do with assumptions about what pathnames look like as strings: it's making some assumption about what things looked like on pre-OSX Macs I think. This kind of problem is easy but often a bit fiddly to fix (I think in this case it would be easy). So, again, the answer to that is just not call the things that are doing a lot of pathname munging:
> (ps13-test-from-file-list (directory "~/Downloads/d/*.opnd"))
[... much output ...]
Total number of errors = 81.
Total number of notes = 41544.
Percentage correct = 99.81%
nil
Note that the above output came from LispWorks, not CCL: CCL works just as well though, as will any CL probably.
SBCL has one additional problem: the CL-USER package in SBCL already uses a package which exports int which is defined in this code. So you need to compile it in some other package. But given that, it's fine in SBCL as well.

How to automatically convert a Matlab script into a Matlab function?

My problem is the following:
I have very many (~1000) mutually calling Matlab scripts, which are very poorly written, regularly damage each other's environments and generally became unmanageable.
One of the reasons I even got this problem is that I need to write a testsuite covering a big part of them. Luckily, for most of them the main criterion of 'correctness' is 'they don't crash'.
Just running them one by one in a loop is generally not an option, because they regularly call clear classes, close all, clc, shadow built-in functions and operators, et cetera.
So my original aim was to find a way to run a matlab script in sort of an 'isolated environment', but I didn't find a good way to do it. (Suggestions welcome, but it is not the main question.)
Since I will need to convert them all to functions anyway, I am looking for some way to do it auto-magically, or at least semi-automatically.
What I can mean semi-automatically:
Just add a line function varargout = $filename( varargin ) as the first line of the file, and end as the last one. This will at least make them runnable as functions with feval and all such functions and (more importantly) prevent them from damaging the test-runner.
Do point 1 and scan the file for referencing undeclared variables and add them as function arguments. This should be also doable, since the names of the variables are known. This will not help identifying output variables, but will still be a lot of help. For example, we could pack the whole workspace into one big output structure.
Do a runtime version of point 2. This way the 'magical converter' can actually track execution environments (workspaces) and identify which variables are implicitly used as 'input arguments' of a script, and which would be later used 'output arguments'. This option looks EXPHARD, but for a small number of calls should be not too bad in practice.
Point 1 I can implement myself using sed, as I also can get rid of all clear classes and clc, but the options 2 and 3 seem much harder. Is there anything at least remotely resembling options 2 or 3?

How expensive is: require "foo.pl";

I'm about to rewrite a large portion of a project that I have developed over the last 10years while learning perl. There is alot of optimisation that can be gained.
A key part of the code is a large if/elsif block that require xxx.cgi files depending on a POST value. Eg:
if($FORM{'action'} eq "1"){require "1.cgi";}
elsif($FORM{'action'} eq "2"){require "2.cgi";}
elsif($FORM{'action'} eq "3"){require "3.cgi";}
elsif($FORM{'action'} eq "4"){require "4.cgi";}
It has many more irritations but just how expensive is using "require" in perl?
require itself has a relatively low cost in any case and, if you require the same file more than once within a single run of your program, it will detect that the file has already been loaded and not attempt to load it a second time. However, if you have a long and highly-populated search path (#INC) and you require (or use) a lot of files, it's possible that all of the directory searches could add up; this isn't common (and doesn't sound likely in your case), but it can be improved by reorganizing your module directories so that the things you're loading show up earlier in #INC.
The potentially-major performance hit referred to by earlier answers is the cost of compiling the code in the files you require. Getting rid of the require by moving the code into your main program will not help with this, as the code will still need to be compiled. In your case, it would probably make things worse, as it would cause the code for all options to be compiled on every one rather than only compiling the code used by the one action selected by the user.
As has been said, it really depends on the actual code in those files. Your best bet would be to do tests using Devel::NYTProf and/or Benchmark to see where the most time is being spent in your code if you are unhappy with its performance.
You can also read Profiling Perl on perl.com, but it is a bit outdated as it uses Devel::DProf.
Not answer to your primary question, but still a good idea for code refactor i read recently in Ovid blog.
The first time, possibly expensive; Perl has to search a path to find the file and load it up. Subsequent times, it's cheap -- a table is consulted and the file isn't actually loaded a second time. If this is in a CGI that is run once per request and then exited, then this is not too good.
It's really going to depend on the size of the files you're calling to. If you have massive CGI files, then it might detriment the performance of your software. If we're talking 6 or 7 lines of code each, then no issue. Try benchmarking your program's performance with and without, and make your own judgement.

Why are Perl source filters bad and when is it OK to use them?

It is "common knowledge" that source filters are bad and should not be used in production code.
When answering a a similar, but more specific question I couldn't find any good references that explain clearly why filters are bad and when they can be safely used. I think now is time to create one.
Why are source filters bad?
When is it OK to use a source filter?
Why source filters are bad:
Nothing but perl can parse Perl. (Source filters are fragile.)
When a source filter breaks pretty much anything can happen. (They can introduce subtle and very hard to find bugs.)
Source filters can break tools that work with source code. (PPI, refactoring, static analysis, etc.)
Source filters are mutually exclusive. (You can't use more than one at a time -- unless you're psychotic).
When they're okay:
You're experimenting.
You're writing throw-away code.
Your name is Damian and you must be allowed to program in latin.
You're programming in Perl 6.
Only perl can parse Perl (see this example):
#result = (dothis $foo, $bar);
# Which of the following is it equivalent to?
#result = (dothis($foo), $bar);
#result = dothis($foo, $bar);
This kind of ambiguity makes it very hard to write source filters that always succeed and do the right thing. When things go wrong, debugging is awkward.
After crashing and burning a few times, I have developed the superstitious approach of never trying to write another source filter.
I do occasionally use Smart::Comments for debugging, though. When I do, I load the module on the command line:
$ perl -MSmart::Comments test.pl
so as to avoid any chance that it might remain enabled in production code.
See also: Perl Cannot Be Parsed: A Formal Proof
I don't like source filters because you can't tell what code is going to do just by reading it. Additionally, things that look like they aren't executable, such as comments, might magically be executable with the filter. You (or more likely your coworkers) could delete what you think isn't important and break things.
Having said that, if you are implementing your own little language that you want to turn into Perl, source filters might be the right tool. However, just don't call it Perl. :)
It's worth mentioning that Devel::Declare keywords (and starting with Perl 5.11.2, pluggable keywords) aren't source filters, and don't run afoul of the "only perl can parse Perl" problem. This is because they're run by the perl parser itself, they take what they need from the input, and then they return control to the very same parser.
For example, when you declare a method in MooseX::Declare like this:
method frob ($bubble, $bobble does coerce) {
... # complicated code
}
The word "method" invokes the method keyword parser, which uses its own grammar to get the method name and parse the method signature (which isn't Perl, but it doesn't need to be -- it just needs to be well-defined). Then it leaves perl to parse the method body as the body of a sub. Anything anywhere in your code that isn't between the word "method" and the end of a method signature doesn't get seen by the method parser at all, so it can't break your code, no matter how tricky you get.
The problem I see is the same problem you encounter with any C/C++ macro more complex than defining a constant: It degrades your ability to understand what the code is doing by looking at it, because you're not looking at the code that actually executes.
In theory, a source filter is no more dangerous than any other module, since you could easily write a module that redefines builtins or other constructs in "unexpected" ways. In practice however, it is quite hard to write a source filter in a way where you can prove that its not going to make a mistake. I tried my hand at writing a source filter that implements the perl6 feed operators in perl5 (Perl6::Feeds on cpan). You can take a look at the regular expressions to see the acrobatics required to simply figure out the boundaries of expression scope. While the filter works, and provides a test bed to experiment with feeds, I wouldn't consider using it in a production environment without many many more hours of testing.
Filter::Simple certainly comes in handy by dealing with 'the gory details of parsing quoted constructs', so I would be wary of any source filter that doesn't start there.
In all, it really depends on the filter you are using, and how broad a scope it tries to match against. If it is something simple like a c macro, then its "probably" ok, but if its something complicated then its a judgement call. I personally can't wait to play around with perl6's macro system. Finally lisp wont have anything on perl :-)
There is a nice example here that shows in what trouble you can get with source filters.
http://shadow.cat/blog/matt-s-trout/show-us-the-whole-code/
They used a module called Switch, which is based on source filters. And because of that, they were unable to find the source of an error message for days.

How can I create binary trees in Perl?

How do can I create binary trees in Perl?
CPAN contains a very wide variety of different modules, and rather than reinventing the wheel, I would suggest looking for it there first. Tree::Binary seems to do what you want to do.
I'm guessing this is some kind of homework assignment (although it's hard to tell from the question), so if you actually have to write your own, a good place to start would be learning how to create objects in Perl (here's a tutorial). The wikipedia page will probably be helpful as well.
A more detailed question will yield better responses.
There's the Tree::Binary module in CPAN...
Although I haven't used it, Tree::RedBlack creates the tree and keeps it balanced (if doing deletes or insertions). If I recall, some of the other tree modules may not provide this capacity (if I have it right).
Chris
I would avoid Tree::Binary from CPAN. We have production software that depends on it, and its API has changed significantly twice in the past two years, making the system crash. For instance, there is a function that continues to do the same thing, but the authors saw fit to at first call it "set_left", then change it to "left", and now "setLeft".