Using Haskell to extend Perl? - perl

Has anyone ever written a Haskell extension to Perl? Maybe something simple, like a function that calculates the Fibonacci sequence? I'm interested in using Haskell, and I see some overlap between the Haskell and Perl communities. Any pointers to Haskell / Perl projects, or cool things that manage to use both of these? I've seen Language::Haskell (which is only an interpreter), but it seems poorly documented, is 6 years old, and has lots of fail.
Is it possible to build extensions to Perl using ghci, comparable to using XS (something I don't claim to know anything about)? I realize this question is probably all kinds of wrong, and badly worded. I'm attempting two things that I know little about: Haskell and extending Perl (both of which have always interested me). Feel free to edit this.

The closest work was Inline::Haskell, I think, from the Pugs / Perl 6 era.
You can also embed Perl5 in a Haskell program: http://hackage.haskell.org/package/HsPerl5
The Haskell FFI happily supports calling into Haskell from other languages, but I'm not sure this is sensible in the larger scheme of things. Sounds like you're doing it wrong.

It's perhaps worth noting here that you can write shell scripts in Haskell as well using runhaskell:
#! /usr/bin/env runhaskell
There's HSH for mixing shell expressions into Haskell programs.
And the Simple UNIX Tools Haskell wiki page is full of ideas too.

Nothing to do with Perl, but more about scripting in Haskell.

Related

Perl does not have a simple #include, OK, but WHY?

Perl does not have a C style preprocessor level "include" function. That is how it is, and there are numerous sites that explain how to more or less emulate the same sort of behavior.
The one thing I couldn't find on any of these sites is any explanation for WHY perl does not have this functionality. Given that Perl often provides many different ways to accomplish the same thing, it is a curious omission.
Can somebody please explain why the decision was made to exclude this sort of functionality?
Perl already has require, do, eval and here documents, among other things. It doesn't need a builtin preprocessor; if you need one that badly, there are source filters. http://perldoc.perl.org/perlfilter.html
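As a minimal sketch of how do FILE can stand in for a textual include: the snippet below writes a tiny file to a temporary location (only so that the example is self-contained) and then pulls it in. The variable name $greeting and the file contents are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small Perl snippet to a temporary file.
# The contents and the variable name are hypothetical.
my ($fh, $filename) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} "our \$greeting = 'hello from the included file';\n1;\n";
close $fh;

# do FILE reads, compiles and runs the file, and returns the value
# of its last statement (the trailing "1;" here).
our $greeting;
my $ok = do $filename;
die "could not load $filename: ", ($@ || $!) unless $ok;

print "$greeting\n";
```

Unlike #include, the file is compiled as its own unit, so it cannot see the lexical (my) variables of the code that loads it; package variables declared with our are shared.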
In general, nobody wants #include, even C and C++ programmers would mostly be happy to give it up in exchange for:
Faster compiles
Clean module system
#include is legacy, period. If a mainstream language designer announced tomorrow that they were adding #include to (your favorite language here) you'd probably see mass hysteria, laughing, and loss of confidence in that designer.
Language designers don't implement #include in any new language; there are simply better ways to do it. In general the trend is to attempt to achieve single-pass lexing. Preprocessing requires you to incrementally expand #includes and potentially revisit the same characters repeatedly. It has been fraught with problems, and is one of the reasons that C++ is such a dog to compile. It was OK in the 60s and 70s, when memory and CPU were tiny and languages, problems and codebases were simpler. Nowadays, you want to be able to compile a "library" once and store its type metadata with it, so the compiler can access it efficiently without rescanning it. That is what Microsoft does anyway with precompiled headers.
So what would #include be good for?
Modules ? No. See above. Modules are compiled once, export their metadata efficiently, they don't pollute the namespace of the clients, they don't recursively inject other includes, they can be distributed in binary form, among umpteen other advantages that I'm not even smart enough to think of.
Including macros ? No. Replace with constants, inlining and generic programming. All of which can be precompiled and exported from a module.
Splicing in generated code ? Better ways to do it anyway. See modules.
The only useful functionality for the preprocessor, IMO, is conditional compilation.
#ifdef _WIN32
// do windowsy stuff
#else
#endif
Again, Perl can do this with do, eval or require as well.
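A rough sketch of that idea, assuming the goal is to pick platform-specific code at run time: $^O names the operating system, and require loads a module chosen by an ordinary conditional. File::Spec::Win32 and File::Spec::Unix are core modules; the path components are placeholders.

```perl
use strict;
use warnings;

# $^O holds the name of the operating system this perl was built for.
my $is_windows = $^O eq 'MSWin32';

# Choose a module with a plain conditional instead of #ifdef.
my $class = $is_windows ? 'File::Spec::Win32' : 'File::Spec::Unix';

# require with a file name loads the module at run time.
(my $file = "$class.pm") =~ s{::}{/}g;
require $file;

# 'tmp' and 'data.txt' are placeholder path components.
my $path = $class->catfile('tmp', 'data.txt');
print "$path\n";
```

In everyday code you would normally just use File::Spec, which performs this selection itself; the point here is only that the branch is ordinary Perl rather than a preprocessor directive.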
Perl doesn't have or lack it any more than C does.
The C preprocessor was designed such that it and C need to know as little as possible about each other. There is no reason why you can't use it with Perl.
So why don't Perl programmers do it?
As codenhein explains, it's generally a bad idea to combine an include mechanism and a compiler that don't know anything about each other, as it leaves you open to some crazy errors that neither can diagnose; the fact that C programmers are used to it doesn't change that.

Is there a runtime flow chart for Perl?

I am trying to better understand the logic and flow of exceptions. In doing so, I realized that I lack an understanding of how Perl interprets and runs programs: which phases are involved and what happens in each phase.
For example, I'd like to understand when the STD* I/O handles are bound and when they are released, what happens with the $SIG{*} handlers, how they relate to exceptions, how a program dies, and so on. I'd like better insight into the internal mechanics.
I am looking for links or books. I prefer material that also includes visual charts, but this is not mandatory. I'd like to see the "big picture" of the whole process; from there I can dig further if I find it necessary.
I found that Chapter 18 of Programming Perl gives an overview of the compilation phase, and I am trying to work through it, but I would appreciate other good sources too.
Some alternative sources (there are not very many):
Manning's Extending and Embedding Perl, which is the go-to reference on Perl's internals outside of the source
The chapter on the Perl internals in Advanced Perl Programming, which may be exactly what you want
Simon Cozens's Perl internals FAQ
Those may be more focused to what you're looking for. I'm not sure any of them explicitly spells out the interpreter's runtime execution order, though. The first one is a better "I want to work with this stuff" book; the second two are probably good introductory references.
Some of the questions you ask are not, as far as I know, explicitly documented - the I/O question being one I can't think of a good source for in particular. Exception handling is documented very well in Try::Tiny's documentation, and it's what we use for exceptions. Signal handling is messy, but perlipc documents it pretty well. With threads, you may be stuck with unsafe signals - I generally avoid threads in favor of multiple processes unless I must have shared memory.
You might start with these topics accessible via the perldoc program:
Internals and C Language Interface
perlembed Perl ways to embed perl in your C or C++ application
perldebguts Perl debugging guts and tips
perlxstut Perl XS tutorial
perlxs Perl XS application programming interface
perlxstypemap Perl XS C/Perl type conversion tools
perlclib Internal replacements for standard C library functions
perlguts Perl internal functions for those doing extensions
perlcall Perl calling conventions from C
perlmroapi Perl method resolution plugin interface
perlreapi Perl regular expression plugin interface
perlreguts Perl regular expression engine internals
perlapi Perl API listing (autogenerated)
perlintern Perl internal functions (autogenerated)
perliol C API for Perl's implementation of IO in Layers
perlapio Perl internal IO abstraction interface
perlhack Perl hackers guide
perlsource Guide to the Perl source tree
perlinterp Overview of the Perl interpreter source and how it works
perlhacktut Walk through the creation of a simple C code patch
perlhacktips Tips for Perl core C code hacking
perlpolicy Perl development policies
perlgit Using git with the Perl repository

How can I test Perl code for DRY (Don't Repeat Yourself)?

For Python, we could use something like Python Code Clone Detector.
But I just could not find anything for Perl.
With reference to DRY, Catalyst mentions that it is built on the DRY principle, and if it is, I would imagine some tool might have been used to verify that claim.
Furthermore does Perl promote DRY or not ? I know for sure it promotes repeat Others by using CPAN.
You probably mean "Perl promotes 'do not repeat others' by providing CPAN", and that is certainly true.
However, DRY is more of a general programming principle (write many specialized, small functions that can be parametrized properly by their arguments instead of writing monolithic functions that "do it all") than a language feature. You can write DRY-compliant code in C++, Python, Perl, Ruby, C and most others. Some languages require more boilerplate, some less.
Perl definitely allows for small functions with little boilerplate by providing concise language constructs.
I don't know of tools detecting non-DRY code for Perl, though.

Why is Perl used so extensively in biology research? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
I work as support staff in a biology research institute as a student, and Perl seems to be used everywhere. Not for every single project, but it seems that more than half the people here have a few Perl books in/on their office/desk.
Why is Perl used so much in biology?
Lincoln Stein highlighted some of the saving graces of Perl for bioinformatics in his article:
How Perl Saved the Human Genome Project.
From his analysis:
I think several factors are responsible:
Perl is remarkably good for slicing, dicing, twisting, wringing, smoothing, summarizing and otherwise mangling text. Although the biological sciences do involve a good deal of numeric analysis now, most of the primary data is still text: clone names, annotations, comments, bibliographic references. Even DNA sequences are textlike. Interconverting incompatible data formats is a matter of text mangling combined with some creative guesswork. Perl's powerful regular expression matching and string manipulation operators simplify this job in a way that isn't equalled by any other modern language.
Perl is forgiving. Biological data is often incomplete, fields can be missing, or a field that is expected to be present once occurs several times (because, for example, an experiment was run in duplicate), or the data was entered by hand and doesn't quite fit the expected format. Perl doesn't particularly mind if a value is empty or contains odd characters. Regular expressions can be written to pick up and correct a variety of common errors in data entry. Of course this flexibility can be also be a curse. I talk more about the problems with Perl below.
Perl is component-oriented. Perl encourages people to write their software in small modules, either using Perl library modules or with the classic Unix tool-oriented approach. External programs can easily be incorporated into a Perl script using a pipe, system call or socket. The dynamic loader introduced with Perl5 allows people to extend the Perl language with C routines or to make entire compiled libraries available for the Perl interpreter. An effort is currently under way to gather all the world's collected wisdom about biological data into a set of modules called "bioPerl" (discussed at length in an article to be published later in the Perl Journal).
Perl is easy to write and fast to develop in. The interpreter doesn't require you to declare all your function prototypes and data types in advance, new variables spring into existence as needed, calls to undefined functions only cause an error when the function is needed. The debugger works well with Emacs and allows a comfortable interactive style of development.
Perl is a good prototyping language. Because Perl is quick and dirty, it often makes sense to prototype new algorithms in Perl before moving them to a fast compiled language.
Sometimes it turns out that Perl is fast enough so that of the algorithm doesn't have to be ported; more frequently one can write a small core of the algorithm in C, compile it as a dynamically loaded module or external executable, and leave the rest of the application in Perl (for an example of a complex genome mapping application implemented in this way, see http://waldo.wi.mit.edu/ftp/distribution/software/rhmapper/).
Perl is a good language for Web CGI scripting, and is growing in importance as more labs turn to the Web for publishing their data.
The real answer probably has less to do with Perl than you think. Many of the things that happen are accidents of history. At the time, way back when, Perl was pretty popular, Java was getting more popular, not too many people were paying attention to Python, and Ruby was just getting started.
The people who needed to get work done used Perl and made some libraries in Perl, and other people started using those libraries. Once people start using something that is moderately useful to them, they tend not to switch (economists call those "switching costs"). From there, even more people start using it because a lot of other people are using it.
The same evolution might not happen today. I'd say that Perl, Python, and Ruby are all completely adequate and up to the task. All the things that mobrule quotes from Lincoln Stein could apply to any of the three today. If everyone had to start from scratch today, any one of those languages could be the one that everyone uses.
I've noticed, from my own client base though (a very small and unrepresentative sample of biotech), that the people pushing the programming for a lot of the biological stuff seemed to be at least part-time sysadmins who were supporting scientists. The scientists worried about the science and did some light programming, but the IT support people were doing a lot of the heavy lifting for the non-science parts. Perl is very well positioned as a sysadmin tool since it's the duct-tape of the internet.
Probably because Perl is good at manipulating strings, and much research in genetics involves the manipulation of veeery long "ACTGCATG..." strings. Just guessing...
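To make that guess concrete, here is a small sketch of a typical sequence operation, the reverse complement of a DNA string, using only Perl built-ins. The subroutine name is made up for illustration, and the input is the fragment quoted above.

```perl
use strict;
use warnings;

# Return the reverse complement of a DNA sequence.
# reverse_complement is a hypothetical helper name.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # swap each base for its complement
    return $rc;
}

print reverse_complement("ACTGCATG"), "\n";   # prints CATGCAGT
```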
I use lots of Perl for dealing with qualitative and quantitative data in social science research. In terms of getting things done (largely with text) quickly, finding libraries on CPAN (nice central location), and generally just getting things done quickly, it can't be surpassed.
Perl is also excellent glue, so if you have some instrumental records, and you need to glue them to data analysis routines, then Perl is your language.
Perl seems to be the language of choice for bioinformatics - there's even an O'Reilly title on just this subject: Beginning Perl for Bioinformatics.
Perl is very powerful when it comes to dealing with text, and it's present in almost every Linux/Unix distribution. In bioinformatics, not only are sequence data very easy to manipulate with Perl, but most bioinformatics algorithms also output some kind of text results.
Then, the biggest bioinformatics centers like the EBI had that great guy, Ewan Birney, who was leading the BioPerl project. That library has lots of parsers for every kind of popular bioinformatics algorithms' results, and for manipulating the different sequence formats used in major sequence databases.
Nowadays, however, Perl is not the only language used by bioinformaticians: along with sequence data, labs produce more and more different kinds of data types and other languages are more often used in those areas.
The R statistics programming language for example, is widely used for statistical analysis of microarray and qPCR data (among others). Again, why are we using it so much? Because it has great libraries for that kind of data (see bioconductor project).
Now when it comes to web development, CGI is not really state of the art today, but people who know Perl may stick to it. In my company though it is no longer used...
I hope this helps.
Perl basically forces very short development cycles. That's the kind of development that gets stuff done.
It's enough to outweigh Perl's disadvantages.
Bioinformatics deals primarily in text parsing, and Perl is the best programming language for the job as it is made for string parsing. As the O'Reilly book (Beginning Perl for Bioinformatics) says, "With [Perl's] highly developed capacity to detect patterns in data, Perl has become one of the most popular languages for biological data analysis."
This seems to be a pretty comprehensive response. Perhaps one thing missing, however, is that most biologists (until recently, perhaps) don't have much programming experience at all. The learning curve for Perl is much lower than for compiled languages (like C or Java), and yet Perl still provides a ton of features when it comes to text processing. So what if it takes longer to run? Biologists can definitely handle that. Lab experiments routinely take one hour or more to finish, so waiting a few extra minutes for that data processing to finish isn't going to kill them!
Just note that I am talking here about biologists that program out of necessity. I understand that there are some very skilled programmers and computer scientists out there that use Perl as well, and these comments may not apply to them.
People missed out DBI, the Perl abstract database interface that makes it really easy to work with bioinformatic databases.
There is also the one-liner angle. You can write something to reformat data in a single line in Perl and just use the -pe flag to embed that at the command line. Many people using AWK and sed moved to Perl. Even in full programs, file I/O is incredibly easy and quick to write, and text transformation is expressive at a high level compared to any engineering language around. People who use Java or even Python for one-off text transformation are just too lazy to learn another language. Java especially has a high dependence on the JVM implementation and its I/O performance.
At least you know how fast or slow Perl will be everywhere, slightly slower than C I/O. Don't learn grep, cut, sed, or AWK; just learn Perl as your command line tool, even if you don't produce large programs with it. Regarding CGI, Perl has plenty of better web frameworks such as Catalyst and Mojolicious, but the mindshare definitely came from CGI and bioinformatics being one of the earliest heavy users of the Internet.
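As an illustration of the one-liner style mentioned above, here is a sketch that turns tab-separated data into comma-separated data. The file names in the comment are hypothetical, and the script form reads only standard input, whereas -p also reads files named on the command line.

```perl
# Shell form, with hypothetical file names:
#   perl -pe 's/\t/,/g' input.tsv > output.csv
#
# The -p switch wraps the code in a loop that reads each input line
# and prints it after the code has run. Written out as a script:
use strict;
use warnings;

# tabs_to_commas is a hypothetical helper name.
sub tabs_to_commas {
    my ($line) = @_;
    $line =~ s/\t/,/g;    # same substitution as in the one-liner
    return $line;
}

while (my $line = <STDIN>) {
    print tabs_to_commas($line);
}
```

This naive substitution does not quote fields that themselves contain commas; for real CSV output a module such as Text::CSV is the safer choice.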
Perl is very easy to learn compared to other languages. It can fully exploit biological data, which is becoming big data. It can manipulate big data and performs well for data manipulation, data curation and all types of DNA programming. Automation of biology has become easy thanks to languages like Perl, Python and Ruby. It is very easy for those who know biology but do not know how to program in other programming languages.
Personally, and I know this will date me, but it's because I learned Perl first. I was being asked to take FASTA files and mix with other FASTA files. Perl was the recommended tool when I asked around.
At the time I'd been through a few computer science classes, but I didn't really know programming all that well.
Perl proved fairly easy to learn. Once I'd gotten regular expressions into my head I was parsing and making new FASTA files within a day.
As has been suggested, I was not a programmer. I was a biochemistry graduate working in a lab, and I'd made the mistake of setting up a Linux server where everyone could see me. This was back in the day when that was an all-day project.
Anyway, Perl became my go-to for anything I needed to do around the lab. It was awesome, easy to use and super flexible, and the other Perl guys in other labs were a lot like me.
So, to cut it short, Perl is easy to learn, flexible and forgiving, and it did what I needed.
Once I really got into bioinformatics I picked up R, Python, and even Java. Perl is not that great at helping to create maintainable code, mostly because it is so flexible. Now I just use the language for the job, but Perl is still one of my favorite languages, like a first kiss or something.
To reiterate, most bioinformatics folks learned coding by just kluging stuff together, and most of the time you're just trying to get an answer for the principal investigator (PI), so you can't spend days on code design. Perl is superb at just getting an answer, it probably won't work a second time, and you will not understand anything in your own code if you see it six months later; BUT if you need something now, then it is a good choice even though I mostly use Python now.
I hope that gives you an answer from someone who lived it.

Why is Perl the best choice for most string manipulation tasks?

I've heard that Perl is the go-to language for string manipulation (and line noise ;). Can someone provide examples and comparisons with other language(s) to show me why?
It is very subjective, so I wouldn't say that Perl is the best choice, but it is certainly a valid choice for string manipulation. Other alternatives are Tcl, Python, AWK, etc.
I like Perl's capabilities because it has excellent support (better than POSIX, as pointed out in the comment) for fast regexes, and the implicit variables make it easy to do basic string crunching with very little code.
If you have a *nix background a lot of what you already know will apply to Perl as well, which makes it fairly easy to pick up for a lot of people.
Perl -> Practical Extraction and Report Language
Perl's strength (when it comes to string processing) lies in its very powerful regular expression engine.
Because of this, many people in the field of bioinformatics use Perl as their main tool, hence the large number of posts about BioPerl on PerlMonks. In bioinformatics they work with strings a lot; they call them "sequences" (I don't know much about this).
Perlmonks.org is the heart of the Perl community. Check out the immense number of hits when you search for site:perlmonks.org regex: 20,000 hits.
You cannot ignore the sheer number of modules on CPAN:
375 modules under the namespace String on CPAN(Perl's module repository)
241 in Regex namespace
156 in Regexp namespace.
This is very clear evidence that Perl is a very powerful language when it comes to string processing.
So if you want to do some string processing and you're using Perl, you've got it covered :)
To address the second part of your question: Perl's reputation for line noise comes from 4 kinds of people:
Overly clever (for their own good) hackers (or sometimes just hacks) who value cleverness and showing off over readability. "If it was hard to write it should be hard to read" is NOT just a mythical attitude.
People who wouldn't know good software development if it hit them over the head with a cluebat. Such as people who save a couple of characters in a program by using $_ instead of a named variable. In a nested scope. Or never heard of comments. Or self-documenting identifiers. Or whitespace.
People who think that software development == code golf. More seriously, that the fewer characters there are in the code, the more readable it is, because they misunderstand what "conciseness" means in code.
(NOTE: first 2 sets are not mutually exclusive)
People who code/hack in perl (e.g. SysAdmins) who have very little training, experience or incentive to do software development. E.g. the percentage of people using Perl who do quick and dirty hacks with bad style and worse code quality is probably higher than, say Python.
Just for reference, 80% of awful Perl "code" in my $work falls under this - it was written by financial analysts who are smart enough to pick up a Perl book and some earlier scripts and clone a script that does roughly what the business needs, but who don't have the CS/programming background to worry about how readable or maintainable their code is.
In other (and less snide) words, you can write beautiful, incredibly readable and easy to maintain software in Perl. It all depends on who does the writing, what their priorities and skills are. Also, just like with any other language, you can write a miserable write-only mess with it.
The difference from other languages is that very often, the write-onlyness of said mess, when done in Perl, does indeed consist of a very high density of non-letter characters (sigils and special characters in poorly written regexes). This high density can indeed asymptotically approximate line noise.
Because it is what Perl was made for. Because Perl is expressive, powerful and fast. Many times I have beaten specialized products with a small and dirty Perl script written in a few minutes. For example, outer joins and large joins vs. MySQL (just because it can't do a merge join), ETL processing vs. Java Hadoop (because I have years of experience writing it effectively and Perl's I/O layer is just great), and so on.
It's a very subjective question. Perhaps the true answer is that Perl has a nice syntax (incl. the regex syntax) that makes people want to sing its praises over other languages? IMHO, any language that supports a rich regex syntax would be considerably powerful at string manipulation.
Kids these days! Back in the day, all we had was SNOBOL -- and we liked it! Try it sometime...you never know, you might want something respectable to fall back on when this Perl fad runs its course!
Perl is widely used for string manipulation tasks because its string manipulation API is easy to learn, and its regex support is widely used too. It has been in use for a very long time, and anyone with a Unix background would pick up Perl very easily. Historically, Perl was developed in the late 80s for report processing and text processing tasks. The trend continues to this day: anyone with a string manipulation or text processing task tends to opt for Perl as the first choice. It's not that other languages like Python aren't up to the task, but Perl is popular in this area.
I like Perl a lot, write books about it, publish a magazine about it, and so on. I don't think I would ever say it's the best language to do anything in. A lot of that has to do with the task you need to do. For many string processing tasks, ETL, data cleanup, and so on, Perl is a very strong and capable language. You wouldn't have that much trouble doing simple tasks.
Your comment sounds like it comes from the early 1990s though, when the rest of the world hadn't caught up. Many of the dynamic languages are now up to task, so you might not have to switch languages. If you decide to use Perl and run into problems, there are plenty of people here who are willing to help, and not all of us will fault you if you choose something else. :)
At the beginning, Perl was developed for easy report processing and dealing with text files, so it has very strong regex support. Most of the information on regexes can be found in perldoc.
Perl was the go-to language for a long time. The problem is it can be pretty messy and difficult to maintain (some people can write Perl that avoids this, but it is very easy to write ugly code). I would not tell you to avoid Perl, but many have moved on to some modern alternatives.
I would recommend learning one of the newer scripting languages such as Python or Ruby. Both will work very well for your needs, and can easily handle more difficult tasks later on. They're both quite nice to work in, after having written C and Perl for so long.
In short, Perl would be a good hammer for this nail. Python and Ruby would be nail-guns.
I disagree that Perl is the best language for text processing. Simple things are easy; to replace foo with bar:
$data =~ s/foo/bar/g;
Harder things are not simple, though. Look at Data::SExpression, for example. It is a lot of code to do something very simple.
A similar implementation in Haskell with PArrow looks something like:
import Text.ParserCombinators.PArrow
data Atom = QuotedString String | Symbol String
    deriving (Show, Eq)
data Sexp = Sexp [Sexp] | Atom Atom
    deriving (Eq)
quotedString :: Char -> Char -> MD a Atom
quotedString quoteChar escapeChar = between q q inside >>^ QuotedString
    where q = char quoteChar
          inside = many $ (char escapeChar >>> anyChar) <+> notChar quoteChar
doubleQuotedString, symbol :: MD a Atom
doubleQuotedString = quotedString '"' '\\'
symbol = word >>^ Symbol
atom, sexp :: MD a Sexp
atom = (doubleQuotedString <+> symbol) >>^ Atom
sexp = atom <+> (between (char '(') (char ')') sexp' >>^ Sexp)
    where sexp' = sepBy1 sexp spaces
Just sayin'. Perl is not the end-all-and-be-all of text manipulation. There are many reasons to prefer Perl to other languages, but parsing is not one of them.