Is there any current review of statistical modules for Perl? - perl

I would like to know which is the current status of the statistical modules in CPAN, does any one know any recent review or could comment about its likes/dislikes with those modules?
I have used the clasical: Statistics::Descriptive, Statistics::Distributions, and some others contained in Bundle::Math::Statistics
Some of the modules has not been updated for long time. I don't know if this is because they are rock solid or has been overtaken by better modules.
Does someone know any current review similar to this old one:
Using Perl for Statistics: Data Processing and Statistical Computing
NB (for the people that will suggest to use R ;-)):
All my code is mainly in perl, but I use R a lot for statistics and plotting. I usually prepare the dataframes with perl and write the R script in the perl modules as templates and save to a file and execute them from perl. But sometimes you have small data sets where efficiency is not an issue (well I am using perl insn't it ;-)) and you want to add some statistics and histograms to your report produced with perl.

PDL, the Perl Data Language is alive and thriving so its worth taking a look at that.
And I think the other stats modules you mention are OK. For eg. Statistics::Descriptive is up-to-date and has been used in answers to a few questions here on Stackoverflow.
NB. There is also a Perl to R bridge called Statistics::R which looks interesting.
/I3az/

Related

perl add options in pod documentation from hash in GetOptions

This is naive question for which I struggled to find an elegant solution. I write perl scripts that as they mature, grow in the number of options passed to GetOptions. The important options for the script, I add on top as POD documentation, but I rely on giving meaningful names to the other options, and I don't bother to document them explicitly.
I would like to pass the hash in GetOptions somehow to the content printed by perldoc, so that the non-documented options is listed there. Any option?
perldoc parses pod. You'd have to write a script to modify your .pl's pod based on values obtained by running your .pl... Yeah, that doesn't sound like a good idea.
You might be interested in Getopt::Euclid, Docopt, Getopt::Auto or Getopt::AsDocumented. These take the opposite approach: You define the options in the documentation, and they parse the documentation to determine how to process the command line.
I ralize that your point of view is code first and document later the things that become stable. That is fair when you need to write twice docs and code. But nowadays that is not true. My recomendation for prototyping is write the manpage and use a module that create the code of the options for you.
My favorite in perl and python is docopt (and has implementations for many other languages). Here you can find the Perl Docopt
I asked about perl implementatios for docopt long time ago in stackoverflow and I was pointed to the docopt module and other options like Getopt::Euclid, Getopt::Auto and Getopt::AsDocumented
Related with your concern about not incrementing cpan dependencies,
in python Docopt is selfcontained but I think it is not true for perl implementation which depends on
boolean,
Class::Accessor::Lite,
List::MoreUtils,
List::Util,
parent,
Pod::Usage
and
Scalar::Util; and the dependencies of the dependencies....
If the perl Docopt implementation is not as small and selfcontained as the original Python then probably is time to give it wider usage and start to fill feature/implementation requests because, IMHO, the doctopt way of doing things is ONE of the many ways of doing the things right.
Finally here you have a talk about the why of docopt. It is about python but command line interfaces logic and UX are the same for all the languages: PyCon UK 2012: Create beautiful command-line interfaces with Python

Are there any modules I'm missing to help me write better code?

I'm using the following command to test my perl code:
perl -MB::Lint::StrictOO -MO=Lint,all,oo -M-circular::require -M-indirect -Mwarnings::method -Mwarnings::unused -c $file
On a system with a perl version less than 5.10 I am also using uninit.
I am also using Perl::Critic and Perl::Tidy and have set up the appropriate rc files to my liking.
These modules have done a great job in helping me break some bad habits I learned when first learning perl.
Are there any more modules or pragmas that will kick me back on the straight and narrow when I mess up?
Using tests, and the Test::* family of modules and some good books have been pointed out. This new information has caused me to reconsider some assumptions about the relationship between testing and code skill building. These are all appreciated and already being researched and put to use.
It seems to me that these are two separate parts of a whole. 'perl -c', Perl::Critic and Perl::Tidy all help during the process of writing code and before execution of code. Devel::Cover, Devel::NYTProf and Tests happen during and after execution of code.
Good development dictates an iterative process, so tests will be run, and code developed over and over, but we still have this separation.
It appears to me that the focus in the answers have been on the 'during and after execution' of code. Again, this is very appreciated. Can I assume that I have the 'writing and pre-execution' part down pretty well then? At least, insomuch as the pragmas, modules and utilities are concerned.
I'm a little worried that you're using Perl 5.9. For two reasons.
Firstly it's a little old. 5.9.0 was released in 2003 and 5.9.5 (the last version in the 5.9.x series) was released in 2007. There have been several high quality versions of Perl since then.
Secondly (and most importantly), 5.9 is an unstable development version of Perl. 5.9 is basically the series of experiments that eventually led to Perl 5.10.0. The only reason to use it is to test that 5.10 will be a stable version of Perl. No-one should be using it at all now.
You don't appear to be testing your code, merely checking that it will compile. I suggest that you look at Test::More (which makes writing actual tests nice and easy), Test::Class (which makes dealing with very large test suites easier), and Devel::Cover (to see which bits of your code are covered by your tests and which aren't).

Why is Perl used so extensively in biology research? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I work as support staff in a biology research institute as a student, and Perl seems to be used everywhere. Not for every single project, but it seems that more than half the people here have a few Perl books in/on their office/desk.
Why is Perl used so much in biology?
Lincoln Stein highlighted some of the saving graces of Perl for bioinformatics in his article:
How Perl Saved the Human Genome Project.
From his analysis:
I think several factors are responsible:
Perl is remarkably good for slicing, dicing, twisting, wringing, smoothing, summarizing and otherwise mangling text. Although the biological sciences do involve a good deal of numeric analysis now, most of the primary data is still text: clone names, annotations, comments, bibliographic references. Even DNA sequences are textlike. Interconverting incompatible data formats is a matter of text mangling combined with some creative guesswork. Perl's powerful regular expression matching and string manipulation operators simplify this job in a way that isn't equalled by any other modern language.
Perl is forgiving. Biological data is often incomplete, fields can be missing, or a field that is expected to be present once occurs several times (because, for example, an experiment was run in duplicate), or the data was entered by hand and doesn't quite fit the expected format. Perl doesn't particularly mind if a value is empty or contains odd characters. Regular expressions can be written to pick up and correct a variety of common errors in data entry. Of course this flexibility can be also be a curse. I talk more about the problems with Perl below.
Perl is component-oriented. Perl encourages people to write their software in small modules, either using Perl library modules or with the classic Unix tool-oriented approach. External programs can easily be incorporated into a Perl script using a pipe, system call or socket. The dynamic loader introduced with Perl5 allows people to extend the Perl language with C routines or to make entire compiled libraries available for the Perl interpreter. An effort is currently under way to gather all the world's collected wisdom about biological data into a set of modules called "bioPerl" (discussed at length in an article to be published later in the Perl Journal).
Perl is easy to write and fast to develop in. The interpreter doesn't require you to declare all your function prototypes and data types in advance, new variables spring into existence as needed, calls to undefined functions only cause an error when the function is needed. The debugger works well with Emacs and allows a comfortable interactive style of development.
Perl is a good prototyping language. Because Perl is quick and dirty, it often makes sense to prototype new algorithms in Perl before moving them to a fast compiled language.
Sometimes it turns out that Perl is fast enough so that of the algorithm doesn't have to be ported; more frequently one can write a small core of the algorithm in C, compile it as a dynamically loaded module or external executable, and leave the rest of the application in Perl (for an example of a complex genome mapping application implemented in this way, see http://waldo.wi.mit.edu/ftp/distribution/software/rhmapper/).
Perl is a good language for Web CGI scripting, and is growing in importance as more labs turn to the Web for publishing their data.
The real answer probably has less to do with Perl than you think. Many of the things that happen are accidents of history. At the time, way back when, Perl was pretty popular, Java was getting more popular, not too many people were paying attention to Python, and Ruby was just getting started.
The people who needed to get work done used Perl and made some libraries in Perl, and other people started using those libraries. Once people start using something that is moderately useful to them, they tend not to switch (economists call those "switching costs"). From there, even more people start using it because a lot of other people are using it.
The same evolution might not happen today. I'd say that Perl, Python, and Ruby are all completely adequate and up to the task. All the things that mobrule quotes from Lincoln Stein could apply to any of the three today. If everyone had to start from scratch today, any one of those languages could be the one that everyone uses.
I've noticed, from my own client base though (a very small and unrepresentative sample of biotech), that the people pushing the programming for a lot of the biological stuff seemed to be at least part-time sysadmins who were supporting scientists. The scientists worried about the science and did some light programming, but the IT support people were doing a lot of the heavy lifting for the non-science parts. Perl is very well positioned as a sysadmin tool since it's the duct-tape of the internet.
Probably because Perl is good at manipulating strings, and much research in genetics involves the manipulation of veeery long "ACTGCATG..." strings. Just guessing...
I use lots of Perl for dealing with qualitative and quantitative data in social science research. In terms of getting things done (largely with text) quickly, finding libraries on CPAN (nice central location), and generally just getting things done quickly, it can't be surpassed.
Perl is also excellent glue, so if you have some instrumental records, and you need to glue them to data analysis routines, then Perl is your language.
Perl seems to be the language of choice for bioinformatics - there's even an O'Reilly title on just this subject: Beginning Perl for Bioinformatics.
Perl is very powerful when it comes to deal with text and it's present in almost every Linux/Unix distribution. In bioinformatics, not only are sequence data very easy to manipulate with Perl, but also most of the bionformatics algorithms will output some kind of text results.
Then, the biggest bioinformatics centers like the EBI had that great guy, Ewan Birney, who was leading the BioPerl project. That library has lots of parsers for every kind of popular bioinformatics algorithms' results, and for manipulating the different sequence formats used in major sequence databases.
Nowadays, however, Perl is not the only language used by bioinformaticians: along with sequence data, labs produce more and more different kinds of data types and other languages are more often used in those areas.
The R statistics programming language for example, is widely used for statistical analysis of microarray and qPCR data (among others). Again, why are we using it so much? Because it has great libraries for that kind of data (see bioconductor project).
Now when it comes to web development, CGI is not really state of the art today, but people who know Perl may stick to it. In my company though it is no longer used...
I hope this helps.
Perl basically forces very short development cycles. That's the kind of development that gets stuff done.
It's enough to outweigh Perl's disadvantages.
Bioinformatics deals primarily in text parsing and Perl is the best programming language for the job as it is made for string parsing. As the O'Reilly book (Beginning Perl for Bioinformatics) says that "With [Perl]s highly developed capacity to detect patterns in data, Perl has become one of the most popular languages for biological data analysis."
This seems to be a pretty comprehensive response. Perhaps one thing missing, however, is that most biologists (until recently, perhaps) don't have much programming experience at all. The learning curve for Perl is much lower than for compiled languages (like C or Java), and yet Perl still provides a ton of features when it comes to text processing. So what if it takes longer to run? Biologists can definitely handle that. Lab experiments routinely take one hour or more finish, so waiting a few extra minutes for that data processing to finish isn't going to kill them!
Just note that I am talking here about biologists that program out of necessity. I understand that there are some very skilled programmers and computer scientists out there that use Perl as well, and these comments may not apply to them.
People missed out DBI, the Perl abstract database interface that makes it really easy to work with bioinformatic databases.
There is also the one-liner angle. You can write something to reformat data in a single line in Perl and just use the -pe flag to embed that at the command line. Many people using AWK and sed moved to Perl. Even in full programs, file I/O is incredibly easy and quick to write, and text transformation is expressive at a high level compared to any engineering language around. People who use Java or even Python for one-off text transformation are just too lazy to learn another language. Java especially has a high dependence on the JVM implementation and its I/O performance.
At least you know how fast or slow Perl will be everywhere, slightly slower than C I/O. Don't learn grep, cut, sed, or AWK; just learn Perl as your command line tool, even if you don't produce large programs with it. Regarding CGI, Perl has plenty of better web frameworks such as Catalyst and Mojolicious, but the mindshare definitely came from CGI and bioinformatics being one of the earliest heavy users of the Internet.
Perl is very easy to learn as compared to other languages. It can fully exploit the biological data which is becoming the big data. It can manipulate big data and perform good for manipulation data curation and all type of DNA programming, automation of biology has become easy due languages like Perl, Python and Ruby. It is very easy for those who are knowing biology, but not knowing how to program that in other programming languages.
Personally, and I know this will date me, but it's because I learned Perl first. I was being asked to take FASTA files and mix with other FASTA files. Perl was the recommended tool when I asked around.
At the time I'd been through a few computer science classes, but I didn't really know programming all that well.
Perl proved fairly easy to learn. Once I'd gotten regular expressions into my head I was parsing and making new FASTA files within a day.
As has been suggested, I was not a programmer. I was a biochemistry graduate working in a lab, and I'd made the mistake of setting up a Linux server where everyone could see me. This was back in the day when that was an all-day project.
Anyway, Perl became my goto for anything I needed to do around the lab. It was awesome, easy to use, super flexible, other Perl guys in other labs we're a lot like me.
So, to cut it short, Perl is easy to learn, flexible and forgiving, and it did what I needed.
Once I really got into bioinformatics I picked up R, Python, and even Java. Perl is not that great at helping to create maintainable code, mostly because it is so flexible. Now I just use the language for the job, but Perl is still one of my favorite languages, like a first kiss or something.
To reiterate, most bioinformatics folks learned coding by just kluging stuff together, and most of the time you're just trying to get an answer for the principal investigator (PI), so you can't spend days on code design. Perl is superb at just getting an answer, it probably won't work a second time, and you will not understand anything in your own code if you see it six months later; BUT if you need something now, then it is a good choice even though I mostly use Python now.
I hope that gives you an answer from someone who lived it.

How can I use Perl to test C programs?

I'm looking for some tutorials showing how I could test C programs by writing Perl programs to automate testing.
Basically I want to learn automation testing with Perl programs. Can anyone kindly share such tutorials or any experiences of yours which can help me kick-start this process?
Perl tests usually use TAP. There are a several C libraries for TAP. Watch this Perl testing presentation.
If you want to start learning how to use Perl to test external programs, start with learning to use Perl to test Perl bits. The Test::More module is a good place to start. Once you understand that, look at all of the other Test::* modules on CPAN to see if one of those modules does the sort of thing you need to do.
If you have a specific question, ask about that. This question is really too broad for anyone to provide a useful answer.
You should look at Perl Testing: A Developer's Notebook by chromatic and Ian Langworth.
I keep meaning to buy a copy but as yet I've just skimmed it at perlmongers meetings. But it seems to be spot on what you're looking for.
UPDATE:
Hm, and this shows I should read the question - testing C programs with Perl, not testing Perl programs with Perl.
The book may still be useful (in that you should probably be writing test scripts and using Test::More and friends) but you will need to write a set of Perl functions to control your C if you take that approach. Basically
sub run_my_c_program {
my #args=#_;
#Set up test environment according to #args
system "my-c-program";
# Turns restults into a $rv data structure
return $rv;
}
and then check $rv in the same way as a normal Perl test:
is_deeply(run_my_c_program(...),
{ .. what I think it returns ..},
".. description of what I'm testing ..");
I don't have a tutorial but I used to be on a test team that tested a C++ compiler. The test harness (home brewed) we used was written in perl and it worked for us for many years. Perl was ideal because we could easily use it to invoke the tools to build the program, capture the output for later insertion into our test logs (using the "backticks" to run the compiler. For example: $compilerOutput = `cl -?`;
), then, if the build was successful, run the test programs and capture their output, again for insertion into our test logs.
Other benefits we got from using Perl:
For pass/fail detection, Perl's regular expression support was helpful.
Our program (the compiler) was ported during the course of its life to many different hardware architectures. Since Perl source was available and in C, we could get perl running on a new architecture as soon as a C compiler was available (which is usually the first language implemented in a new environment after an assembler).
Lots of documentation, books, help available.
-Ron
I wrote an article about Testing C with Libtap that uses Perl's Test::Harness to test C programs. Here's an example project: https://github.com/stig/libggtl/tree/master/t -- might not build any more, as the esoteric build system probably doesn't exist, but you should be able to figure out how it works :-)

The future of Perl? (Perl 6, employability)

I've found a few related questions, like Python vs. Perl (now deleted) and Is Perl Worth it? (now deleted), but I can't seem to find anything that directly addresses this question.
Is there a legitimate future in Perl? I work in a Perl shop right now, and I came from PHP so I see some of the advantages of an arguably "lower" level language when doing things on the server-level, but it seems to me a lot of the tasks in Perl can be performed more quickly in PHP, and SOME ARGUE (subjective, not my opinion) that Python does these tasks in a more explicit way that's easier to maintain.
Is having this job on my resume ultimately going to make me less employable, especially if the language no longer grows?
A few notes:
I love Perl, so don't think I'm bashing the language. It's fun to use and we use a fairly verbose syntax that is relatively easy to maintain.
I realize that "Vaporware" is a buzzword that isn't necessarily applicable to this situation, because Perl doesn't have a marketing department and they're not "promising" Perl 6 by any date.
I realize that CPAN keeps the community going, so whether Perl 6 comes out or not people continue to build modules that increase possibilities in the language, but that doesn't mean that industry shops realize this, and switch to "more supported" languages that keep coming out with revised versions of the language like Python and (especially) PHP.*
EDIT {CLARIFICATION}
Cade Roux and Telemachus both brought up good points about whether or not your future can be defined by your resume.
To be honest, this was brought up when one of my former employers said "I don't hire anyone with Perl as their last job. That's OLD technology." This was a PHP shop, so take all that with a grain of salt.
Now without defaming my former employer, she's not a tech person AT ALL, so she was really expressing an opinion of a layperson, and in this case my question was more along the lines of "Is there a stigma on this particular technology placed on it by people who don't utilize it?", specifically more along the lines of people who may have had past experience with similar employers. I'm not asking you to look into the future with a magic glass to assume what the next "hot" language would be, but rather if this particular language (which is accused of stunted growth, again by laypeople) has negative connotations placed upon it.
I hope that makes a little more sense.
Plenty of shops - including on Wall Street - heavily use Perl and will continue to do so.
However, I have never seen a PHP or Python used in this industry (not saying it is not used, but that I never encountered. Purely personal anecdote. Nor have I EVER heard any conversation of "Perl can not do X that Python can, let's use Python").
Perl6 is irrelevant to job picture.
Many shops are still on 5.8 or G-d forbid 5.6
More importantly, perl5 continues to evolve, including with features/ideas from Perl6. See Perl 5.10 and 5.11
Plus evolution includes really cool framework like Moose etc...
I can probably come up with more bullets later, but the summary is that no, having a Perl job will in no way negatively affect your career prospects.
However, knowing nothing but Perl may affect it negatively, so make sure you know Java, C#, C++ or something besides dynamic interpreted languages. Not many shops would hire "Perl Only" developer, even if they gladly hire "Perl + other stuff" ones.
See Tim Bunce's Perl Myths slides on slide share.
In short, Perl is not dead and has lots of jobs available.
Anyone who actually watches the development of Perl, would know that that there has perhaps been more work on the Perl language in the past decade, than in the previous decade.
This has been spurred on by the introduction of Perl6.
The introduction of Perl 6 spurred on, the now deeply ingrained, testing culture.
Just look at how much the Rakudo implementation of Perl 6, is tested:
Rakudo Progress http://rakudo.de/progress.png
There has also been a lot of back-porting of Perl 6 features into Perl 5.
For example, the Perl 6 "switch" statement
#!/usr/bin/perl
use strict;
use warnings;
use 5.10.1;
# or
use feature qw'switch say';
my $str = "testing 123";
given( $str ){
when(/(\d+)/){
say $1;
}
when( [0..10] ){
say $_, 'is equal to some number between 0 and 10';
# given, sets the current topic "$_"
}
}
There are few languages I would tie my career to. Perl will always be there and it will always be the best tool for certain kinds of jobs. But this is true for many languages. However, there are also languages which have more competition in some of the spaces where they are used. Perl is one language that has a lot more strong niches.
Still, you wouldn't restrict yourself to using just one language for your entire life - or even in one project if there are better options to solve a problem.
Career-wise, there are basic technologies which are fairly universally used, and of these I think a few of the most valuable are: relational database concepts and SQL, XML/HTML/HTTP/DOM, regular expressions. These are all basically independent of any particular vendor or language, and if you are strong in these areas, choice of language and platform are going to be informed by the problem being addressed.
Perl is, and always will be, a practical language for manipulating large amounts of data. I work in an industry where moving, converting, and parsing large amounts of text and image data is what we do, and I couldn't live without Perl.
Likewise, if you're a sysadmin (especially a Unix one), Perl is a necessary tool. There are tons of places where you need to be able to whip up a quick and dirty application that runs right along with the shell functions.
Languages have niches. Perl has a big stable niche, in many ways much more stable than fad-driven web languages. PHP, for example, is a nice little web language, but its saving grace is that it's quick and easy to develop in, not that it is a particularly great language. I'll tend to use PHP over Perl for web applications (though I use Python over PHP, if I have time), but 90% of the stuff I do in my day-to-day would be nearly impossible in PHP, and is flat trivial in Perl.
#Nate: I love Python. LOVE it. I actually worry that I love it too much, and I'm being irrational about it. PHP is a nice tool, but when your main selling point is "Quick and Easy" then you're running a risk. That was the big push behind original Visual Basic, and we all know how that worked out.
I'd discourage you from putting Perl on your resume - there's already too many people in the perl market and we don't want any more! ... just kidding.
The past is supposedly no guide to the future, but, despite having plenty of C (etc.) and Java in my 'skills toolbag' I've seen more gainful employ from my Perl than anything else over the last decade.
I suspect that offshore-perl-new-build may not be the biggest market in the future, but there's certainly active development in the city and media industries in the UK.
Otherwise, I'd just agree with the points above. Technicians with diverse skills are more able to pick the right tools, and less inclined to 'get religious' about language choice.
If you're looking at a post where the non-technical management have a strong point of view about what technology should and shouldn't be used - I'd place that one in the 'avoid' pile.
To add another separate answer - as you have noted - there is a very real danger when dealing with recruiters and others that your resume will be interpreted and things inferred that are not necessarily how you see yourself, and you might get pigeon-holed.
This WILL happen both ways - too much variation and you aren't an expert in anything OR too little variation and you are only good at one thing.
I don't have a simple answer for combatting that, except to ensure that you emphasize portable skills and also achievements which are independent of technology - making the company more money, landing new business, making new markets, etc.
Perl is another tool in your toolbox. If I have an opening and one person is narrow focused to a specific technology, and another has a broad range of skills I would be more inclined to hire the one with the wider range of skills even if they might not be quite as deeply knowledgeable. Some one who has a wide range of skills on a range of platforms is someone who can think, innovate and adapt.
I don't understand the point of this question. You have a job and you already know Perl. You can ask whether or not to learn new languages and which ones to learn (please don't, but you could), but none of us can or should predict whether or not you're going to get another job using Perl.
You ask, "Is having this job on my resume ultimately going to make me less employable, especially if the language no longer grows?"
Well, it's better than a blank resume, and you can't change your past, so really what are we talking about here?