What are the uses of Perl's C translation backend?

Other than the purely obvious ("It translates Perl to C"), are there any real-world uses (a.k.a. hacks) for the Perl compiler's optimized C translation backend, B::CC?

Not really. It means you can convert a (small) Perl script into a (big) C program, which will be much harder for the recipient to reverse engineer. In some paranoid circles, this might be accounted an advantage (for example, if your Perl code is embarrassingly bad and you'd rather conceal that fact from your paying customers). But mostly it is of limited to negative value.

Compiling a Perl program into an optree, which can then be executed, can sometimes take a while. You can save some of that time by using perlcc with any of its backends. That will, in one way or another, serialise the compiled optree and make loading it later, when executing your compiled binary, somewhat faster. I can see that being useful in, for example, CGI environments, for which, however, much better alternatives for avoiding startup costs are available.
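For reference, the basic invocations looked something like this (perlcc shipped with perl itself through the 5.8 era and now lives in the B::C distribution on CPAN; treat these flags as illustrative of that era):
perlcc -o hello hello.pl          # compile via the default C backend (B::C)
perlcc -O -o hello hello.pl       # use the optimized C backend (B::CC)
perlcc -B -o hello.plc hello.pl   # serialise the optree as bytecode instead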
Contrary to popular belief, perlcc doesn't make it very hard to reverse-engineer the resulting binary, as discussed in How can I reverse-engineer a Perl program that has been compiled with perlcc?

Related

Are there any performance benefits that come with using subroutines in Perl?

I am new to Perl 5, and so far I understand that subroutines are good for code reuse and for breaking long procedural scripts into shorter, more readable fragments. What I want to know is: are there any performance benefits that come with using subroutines? Do they make the code execute faster?
Take a look at the following snippets:
#!/usr/bin/perl
print "Who is the fastest?\n";
compared to this one:
#!/usr/bin/perl
# defining the subroutine
sub speed_test {
    print "Who is the fastest?\n";
}
# calling the subroutine
speed_test();
The performance benefit of subroutines is that, used proficiently, they allow developers to write, test, and release code faster.
The resulting code does not run faster, but that's OK, because it's not the 1960s any more. These days, hardware is cheap and programmers are expensive. Hiring a second programmer to speed up slow code production costs orders of magnitude more than buying a second machine (or a higher-powered machine) to speed up slow code execution.
Also, as I note with just about every "how can I micro-optimize Perl code?" question, Perl is not a high-performance language, period. Never has been, almost certainly never will be. That's not what it's designed for. Perl is optimized for speed of development, not speed of execution.
If saving a microsecond here and a microsecond there actually matters to your use case (which it almost certainly doesn't), then you will get far greater speed benefits by switching to a lower-level, higher-performance language than you will get by anything you might do differently in Perl.
There is no performance benefit in using subroutines. Actually there is a performance penalty for using them. That is why in highly speed-optimized code subroutines are sometimes "inlined".
But the root of all evil in software development is premature optimization. Writing code that does not use subroutines/methods to delegate different tasks is far worse in the long run.
Any code block that exceeds 10 (or maybe 25? or 50?) lines usually has to be split. The exact limit depends on whom you ask.
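If you want to measure the call overhead yourself, the core Benchmark module makes the comparison easy. A minimal sketch (the exact numbers will vary by machine; only the relative ordering matters):
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @nums = (1 .. 100);

sub sum_list {
    my $total = 0;
    $total += $_ for @_;
    return $total;
}

cmpthese(-1, {
    with_sub => sub { my $s = sum_list(@nums) },        # pays call and argument-passing overhead
    inline   => sub { my $s = 0; $s += $_ for @nums },  # same work, no call
});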

Is there a standard way for perl to behave when it runs out of memory?

Is there a standard(ish) way for a Perl interpreter (aka "perl") to behave when it runs out of memory? Is it documented/specced-out in any way? Coded in some uniform way?
I'm especially interested in any standards which are expressed as a contract with the Perl code being run - e.g., will die be called? Will END blocks be executed? Etc...
I'm fine with both a "theoretical" answer (e.g. some sort of generic "this is what Perl code ought to do in general on out-of-memory" mission-statement document from Larry/P5P/etc., even if not 100% of malloc() calls follow this rule) and a "practical" answer (e.g. all malloc() calls in Perl are wrapped in a generic "allocate_memory" function which uniformly handles all failures).
It may be possible that the answer depends on what specifically causes out of memory (e.g. a request for more memory for Perl code's data structure vs. memory allocated by internal Perl code unrelated to explicit "need to store more data" logic in Perl program).
If the answer is extremely implementation-dependent, assume perl for Solaris/Linux, and narrowing down to any recent stable version (5.8 to 5.16) is acceptable.
The question is limited to standard Perl interpreter, however you wish to define that as far as pre-compile configuration (e.g. perl that comes with a major Linux distribution, or one compiled with all defaults left alone, etc...).
NOTE: This question came out of Gilles's comment to another Q
Taking a look at perldiag, the manual page for the various diagnostic messages that Perl will issue (and that the "use diagnostics" pragma explains at length), you can see several different types of "out of memory" errors and what they mean.
So you can infer the "standard" behavior from these messages; the one with the exclamation point ("Out of memory!") sounds like the one you're asking about:
Out of memory!
    (X) The malloc() function returned 0, indicating there was
    insufficient remaining memory (or virtual memory) to satisfy the
    request. Perl has no option but to exit immediately.
An "X" level error is labeled as "A very fatal error (nontrappable)."
However, if it's a "large" request (for more than 64K), it is trappable (I guess Perl assumes it will have enough memory to shut down cleanly).
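A minimal sketch of that practical difference (illustrative only; the allocation size needed to force the failure, and whether the OS overcommits instead, are system-dependent):
use strict;
use warnings;

# An "Out of memory during 'large' request" error is raised as a
# trappable exception, so eval {} can catch it; ordinary small
# allocation failures make perl exit immediately, never reaching
# the handler below.
my $buf = eval { "x" x (50 * 1024 * 1024 * 1024) };   # ~50 GB, illustrative
if (!defined $buf) {
    warn "large allocation failed: $@";
}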

Does Common Lisp have the fastest PCRE implementation?

A friend claimed that Common Lisp has the fastest Perl-compatible regular expression library of any language, including Perl itself, because with an optimizing JIT compiler like SBCL, CL-PPCRE can compile each particular regex down to native assembly, whereas other implementations, including Perl's, must generate bytecode and interpret it. In practice, especially in the common case where we match the same regex against many inputs or against a long input, the compilation overhead is more than justified.
Unfortunately, I can't find any benchmarks on this, and I don't know enough to run my own, so I turn to the hive mind. Can anyone evaluate this claim?
I have no benchmarks of my own to share, but perhaps your friend was referring to results concerning the portable regular expression library CL-PPCRE. The current web page no longer speaks of benchmarks, but courtesy of the Wayback Machine we can see that it used to show benchmark results where CL-PPCRE outperformed Perl 2-to-1. Benchmarking is a tricky business (especially for moving targets), which might explain why the current page is silent on the matter.
The PCRE library now has a JIT compiler module, and it performs well compared to other regex engines: http://sljit.sourceforge.net/regex_perf.html or http://blog.rburchell.com/2011/12/why-i-avoid-qregexp-in-qt-4-and-so.html
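For what it's worth, on the Perl side you can at least pay the pattern-compile cost once up front with qr//, which is exactly the many-inputs scenario the claim is about. A minimal sketch:
use strict;
use warnings;

my $re = qr/\bAC[GT]+\b/;            # pattern compiled once

my @inputs = ("ACGT", "ACTT", "xyz") x 1000;
my $hits = grep { /$re/ } @inputs;   # reused with no recompilation
print "matched $hits of ", scalar @inputs, " inputs\n";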

Why is Perl used so extensively in biology research? [closed]

Closed 7 years ago. This question is opinion-based and is not currently accepting answers.
I work as support staff in a biology research institute as a student, and Perl seems to be used everywhere. Not for every single project, but it seems that more than half the people here have a few Perl books in/on their office/desk.
Why is Perl used so much in biology?
Lincoln Stein highlighted some of the saving graces of Perl for bioinformatics in his article:
How Perl Saved the Human Genome Project.
From his analysis:
I think several factors are responsible:
Perl is remarkably good for slicing, dicing, twisting, wringing, smoothing, summarizing and otherwise mangling text. Although the biological sciences do involve a good deal of numeric analysis now, most of the primary data is still text: clone names, annotations, comments, bibliographic references. Even DNA sequences are textlike. Interconverting incompatible data formats is a matter of text mangling combined with some creative guesswork. Perl's powerful regular expression matching and string manipulation operators simplify this job in a way that isn't equalled by any other modern language.
Perl is forgiving. Biological data is often incomplete, fields can be missing, or a field that is expected to be present once occurs several times (because, for example, an experiment was run in duplicate), or the data was entered by hand and doesn't quite fit the expected format. Perl doesn't particularly mind if a value is empty or contains odd characters. Regular expressions can be written to pick up and correct a variety of common errors in data entry. Of course this flexibility can also be a curse. I talk more about the problems with Perl below.
Perl is component-oriented. Perl encourages people to write their software in small modules, either using Perl library modules or with the classic Unix tool-oriented approach. External programs can easily be incorporated into a Perl script using a pipe, system call or socket. The dynamic loader introduced with Perl5 allows people to extend the Perl language with C routines or to make entire compiled libraries available for the Perl interpreter. An effort is currently under way to gather all the world's collected wisdom about biological data into a set of modules called "bioPerl" (discussed at length in an article to be published later in the Perl Journal).
Perl is easy to write and fast to develop in. The interpreter doesn't require you to declare all your function prototypes and data types in advance, new variables spring into existence as needed, calls to undefined functions only cause an error when the function is needed. The debugger works well with Emacs and allows a comfortable interactive style of development.
Perl is a good prototyping language. Because Perl is quick and dirty, it often makes sense to prototype new algorithms in Perl before moving them to a fast compiled language.
Sometimes it turns out that Perl is fast enough that the algorithm doesn't have to be ported; more frequently one can write a small core of the algorithm in C, compile it as a dynamically loaded module or external executable, and leave the rest of the application in Perl (for an example of a complex genome mapping application implemented in this way, see http://waldo.wi.mit.edu/ftp/distribution/software/rhmapper/).
Perl is a good language for Web CGI scripting, and is growing in importance as more labs turn to the Web for publishing their data.
The real answer probably has less to do with Perl than you think. Many of the things that happen are accidents of history. At the time, way back when, Perl was pretty popular, Java was getting more popular, not too many people were paying attention to Python, and Ruby was just getting started.
The people who needed to get work done used Perl and made some libraries in Perl, and other people started using those libraries. Once people start using something that is moderately useful to them, they tend not to switch (economists call those "switching costs"). From there, even more people start using it because a lot of other people are using it.
The same evolution might not happen today. I'd say that Perl, Python, and Ruby are all completely adequate and up to the task. All the things that mobrule quotes from Lincoln Stein could apply to any of the three today. If everyone had to start from scratch today, any one of those languages could be the one that everyone uses.
I've noticed, from my own client base though (a very small and unrepresentative sample of biotech), that the people pushing the programming for a lot of the biological stuff seemed to be at least part-time sysadmins who were supporting scientists. The scientists worried about the science and did some light programming, but the IT support people were doing a lot of the heavy lifting for the non-science parts. Perl is very well positioned as a sysadmin tool since it's the duct-tape of the internet.
Probably because Perl is good at manipulating strings, and much research in genetics involves the manipulation of veeery long "ACTGCATG..." strings. Just guessing...
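As a flavor of what that string work looks like, a minimal illustrative sketch:
use strict;
use warnings;

my $seq = "ACTGCATGACTG";   # stand-in for a much longer sequence

# Count each base
my %count;
$count{$_}++ for split //, $seq;
printf "%s: %d\n", $_, $count{$_} // 0 for qw(A C G T);

# Reverse complement via a single transliteration
(my $revcomp = reverse $seq) =~ tr/ACGT/TGCA/;
print "Reverse complement: $revcomp\n";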
I use lots of Perl for dealing with qualitative and quantitative data in social science research. In terms of getting things done (largely with text) quickly, finding libraries on CPAN (nice central location), and generally just getting things done quickly, it can't be surpassed.
Perl is also excellent glue, so if you have some instrumental records, and you need to glue them to data analysis routines, then Perl is your language.
Perl seems to be the language of choice for bioinformatics - there's even an O'Reilly title on just this subject: Beginning Perl for Bioinformatics.
Perl is very powerful when it comes to dealing with text, and it's present in almost every Linux/Unix distribution. In bioinformatics, not only are sequence data very easy to manipulate with Perl, but most bioinformatics algorithms will also output some kind of text results.
Then, the biggest bioinformatics centers like the EBI had that great guy, Ewan Birney, who was leading the BioPerl project. That library has parsers for the results of every kind of popular bioinformatics algorithm, and for manipulating the different sequence formats used in major sequence databases.
Nowadays, however, Perl is not the only language used by bioinformaticians: along with sequence data, labs produce more and more different kinds of data types and other languages are more often used in those areas.
The R statistics programming language for example, is widely used for statistical analysis of microarray and qPCR data (among others). Again, why are we using it so much? Because it has great libraries for that kind of data (see bioconductor project).
Now when it comes to web development, CGI is not really state of the art today, but people who know Perl may stick to it. In my company though it is no longer used...
I hope this helps.
Perl basically forces very short development cycles. That's the kind of development that gets stuff done.
It's enough to outweigh Perl's disadvantages.
Bioinformatics deals primarily in text parsing, and Perl is the best programming language for the job, as it is made for string parsing. As the O'Reilly book (Beginning Perl for Bioinformatics) says: "With [Perl's] highly developed capacity to detect patterns in data, Perl has become one of the most popular languages for biological data analysis."
This seems to be a pretty comprehensive set of responses. Perhaps one thing missing, however, is that most biologists (until recently, perhaps) didn't have much programming experience at all. The learning curve for Perl is much lower than for compiled languages (like C or Java), and yet Perl still provides a ton of features when it comes to text processing. So what if it takes longer to run? Biologists can definitely handle that. Lab experiments routinely take an hour or more to finish, so waiting a few extra minutes for that data processing to finish isn't going to kill them!
Just note that I am talking here about biologists that program out of necessity. I understand that there are some very skilled programmers and computer scientists out there that use Perl as well, and these comments may not apply to them.
People have missed out on mentioning DBI, the Perl abstract database interface that makes it really easy to work with bioinformatics databases.
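For anyone who hasn't seen it, the shape of a typical DBI session looks like this (the DSN, credentials, and schema here are hypothetical):
use strict;
use warnings;
use DBI;

# Hypothetical DSN, credentials, and schema; adjust for your site.
my $dbh = DBI->connect(
    "dbi:mysql:database=genomes;host=localhost",
    "user", "password",
    { RaiseError => 1 },
);

my $sth = $dbh->prepare("SELECT name, length FROM sequences WHERE length > ?");
$sth->execute(10_000);

while (my ($name, $length) = $sth->fetchrow_array) {
    print "$name\t$length\n";
}
$dbh->disconnect;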
There is also the one-liner angle. You can write something to reformat data in a single line in Perl and just use the -pe flag to embed that at the command line. Many people using AWK and sed moved to Perl. Even in full programs, file I/O is incredibly easy and quick to write, and text transformation is expressive at a high level compared to any engineering language around. People who use Java or even Python for one-off text transformation are just too lazy to learn another language. Java especially has a high dependence on the JVM implementation and its I/O performance.
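For example, two classic throwaway one-liners (illustrative; -a autosplits each line into @F, -l handles line endings, -i edits in place):
# Swap the first two tab-separated columns of data.txt
perl -F'\t' -lane 'print join "\t", @F[1,0,2..$#F]' data.txt

# Strip trailing carriage returns (DOS line endings) in place
perl -pi -e 's/\r$//' file.txt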
At least you know how fast or slow Perl will be everywhere: slightly slower than C at I/O. Don't learn grep, cut, sed, or AWK; just learn Perl as your command-line tool, even if you don't produce large programs with it. Regarding CGI, Perl has plenty of better web frameworks such as Catalyst and Mojolicious, but the mindshare definitely came from CGI, and bioinformatics was one of the earliest heavy users of the Internet.
Perl is very easy to learn compared to other languages. It can fully exploit biological data, which is becoming big data: it can manipulate large data sets and performs well for data manipulation, data curation, and all types of DNA programming. The automation of biology has become easy thanks to languages like Perl, Python, and Ruby. It is very approachable for those who know biology but don't know how to program in other languages.
Personally, and I know this will date me, but it's because I learned Perl first. I was being asked to take FASTA files and mix with other FASTA files. Perl was the recommended tool when I asked around.
At the time I'd been through a few computer science classes, but I didn't really know programming all that well.
Perl proved fairly easy to learn. Once I'd gotten regular expressions into my head I was parsing and making new FASTA files within a day.
As has been suggested, I was not a programmer. I was a biochemistry graduate working in a lab, and I'd made the mistake of setting up a Linux server where everyone could see me. This was back in the day when that was an all-day project.
Anyway, Perl became my go-to for anything I needed to do around the lab. It was awesome, easy to use, super flexible; other Perl guys in other labs were a lot like me.
So, to cut it short, Perl is easy to learn, flexible and forgiving, and it did what I needed.
Once I really got into bioinformatics I picked up R, Python, and even Java. Perl is not that great at helping to create maintainable code, mostly because it is so flexible. Now I just use the language for the job, but Perl is still one of my favorite languages, like a first kiss or something.
To reiterate, most bioinformatics folks learned coding by just kluging stuff together, and most of the time you're just trying to get an answer for the principal investigator (PI), so you can't spend days on code design. Perl is superb at just getting an answer, it probably won't work a second time, and you will not understand anything in your own code if you see it six months later; BUT if you need something now, then it is a good choice even though I mostly use Python now.
I hope that gives you an answer from someone who lived it.

What is best practice as far as using perl-isms (idiomatic expressions) in Perl?

A couple of years back I participated in writing the best practices/coding style for our (fairly large and often Perl-using) company. It was done by a committee of "senior" Perl developers.
As anything done by consensus, it had parts which everyone disagreed with. Duh.
The part that rubbed people the wrong way the most was a strong recommendation NOT to use many Perlisms (loosely defined as code idioms not present in, say, C++ or Java), such as "Avoid using '... unless X;' constructs".
The main rationale posited for rules such as this one was that non-Perl developers would otherwise have a much harder time with the Perl code base. The assumption here, I guess, is that Perl code jockeys are a rarer breed overall - and among new hires to the company - than non-Perlers.
I was wondering whether SO has any good arguments to support or reject this logic... it is mostly academic curiosity at this point as the company's Perl coding standard is ossified and will never be revised again as far as I'm aware.
P.S. Just to be clear, the question is in the context I noted - the answer for an all-Perl smaller development shop is obviously a resounding "use Perl to its maximum capability".
I write code assuming that a competent Perl programmer will be reading it. I don't go out of my way to be clever, but I don't dumb it down either.
If you're writing code for people who don't know the language, you're going to miss most of the point of using that language. I often find that people want to outlaw Perlisms because they refuse to learn any more than they already know.
Since you say that you are in a small Perl shop, it should be pretty easy to ask the person who wrote the code what it means if you don't understand it. That sort of stuff should come up in code reviews and so on. Everyone continues to learn more about the language as you have periodic and regular chances to review the code. You shouldn't let too much time elapse without other eyeballs looking at someone's code. You certainly shouldn't wait until a week after they leave the company.
As for new hires, I'm always puzzled why anyone would think that you should sit them in front of a keyboard and turn them loose expecting productive work in a codebase they have never seen.
This isn't limited to Perl, either. It's a general programming issue. You should always be learning more about your tools. Most of the big shops I know have mini-bootcamps to bring developers up to speed on the codebase, including any bits of tricky code they may encounter.
I ask myself two simple questions:
Am I doing this because it's devilishly clever and/or shows off my extensive knowledge of Perl arcana?
Then it's a bad idea. But,
Am I doing this because it's idiomatic Perl and benefits from Perl's distinct advantages?
Then it's a good idea.
I see no justifiable reason to reject, say, string interpolation just because Java and C don't have it. unless is a funny one, but I think having a subroutine start with the occasional
return undef unless <something>;
isn't so bad.
What sort of perlisms do you mean?
Good (these are pulled together in a runnable sketch at the end of this answer):
idiomatic for loops: for(1..5) {} or for( @foo ) {}
Scalar context evaluation of arrays: my $count = @items;
map, grep and sort: my %foo = map { $_->id => $_ } @objects;
OK if limited:
statement modifier control - trailing if, unless, etc.
Restrict to error trapping and early returns. die "Bad juju\n" unless $foo eq 'good juju';
As Schwern pointed out, another good use is conditional assignment of default values: my $foo = shift; $foo = 'blarg' unless defined $foo;. This usage is, IMO, cleaner than my $foo = defined $_[0] ? shift : 'blarg';.
Reason to avoid: if you need to add additional behaviors to the check or an else, you have a big reformatting job. IMO, the hassle to redo a statement (even in a good editor) is more disruptive than typing several "unnecessary" blocks.
Prototypes - use them only to create filtery functions like map. Prototypes are compiler hints, not 'prototypes' in the sense of any other language.
Logical operators - standardize on when to use and and or vs. && and ||. All your code should be consistent. Best if you use a Perl::Critic policy to enforce.
Avoid:
Local variables. Dynamic scope is damn weird, and local is not the same as local anywhere else.
Package variables. Enables bad practices. If you think you need globally shared state, refactor. If you still need globally shared state, use a singleton.
Symbol table hackery
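As promised above, a short runnable sketch pulling the "good" idioms together (the Item class is hypothetical, just enough to make the map example concrete):
use strict;
use warnings;

package Item;
sub new { my ($class, $id) = @_; return bless { id => $id }, $class }
sub id  { return $_[0]{id} }

package main;

my @objects = map { Item->new($_) } 1 .. 5;   # map over an idiomatic range

my $count = @objects;                         # scalar context: element count
print "Have $count items\n";

my %by_id = map { $_->id => $_ } @objects;    # hash keyed by id
print "Found item 3\n" if exists $by_id{3};

for my $i (1 .. 5) {                          # idiomatic for loop
    print "pass $i\n";
}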
It must have been, as you say, a few years ago, because Damian Conway has 'cornered the market' in Perl standards with Perl Best Practices for the last few years.
I've worked in a similarly ossified environment - where we were not allowed to adopt the latest best practices, because that would be a change, and no one at a sufficiently high level in the corporate structure understood (or could be bothered to understand) Perl and sign off on moving in to the 21st Century.
A corporation that deploys a technology and retains it, but doesn't either buy in expertise or train up in house, is asking for trouble.
(I'd guess you're working in a highly change-controlled environment - financial perhaps?)
I agree with brian on this by the way.
I'd say Moose kills off, by convention, 99.9% of the Perl-isms that shouldn't be used: symbol table hackery, reblessing objects, and common black-box violations like treating objects as arrays or hashes. The great thing is that it does all of this without taking the functionality hit of simply "not using it".
If the "perl-isms" you're really referring to are mutator form (warn "bad idea" unless $good_idea), unless, and until then I don't think you really have much of an argument because these "perlisms" don't seem to inhibit readability to either perl users, or non-perl users.
Pick up a copy of Effective Perl Programming: Ways to Write Better, More Idiomatic Perl (2nd Edition), and treat that as a guideline. It contains many of the better idioms and is packed with the little bits of information that will get you writing good Perl style Perl code, as opposed to C or Java (or whatever) style Perl code.