I am in the midst of learning Perl, and I have encountered a question. What exactly is the difference between subroutines and scripts?
A script is just a name for a (usually short) program, usually contained in a single file. It's not really a scientific/technical term and is therefore pretty vague - people can refer to a "script" when discussing a 3-line quick program or a 10,000-line program.
Some people refer to ANY Perl program as a "script" - see below for the historical reason. Some people, when they say "a Perl script" as opposed to a Perl "program", mean a relatively simple, relatively short program, frequently structured without using any subroutines/classes/other methods of code organization. Again, there's no standard definition.
As an aside, the reason why Perl programs are frequently called "scripts" is that Perl was originally used for writing scripts that perform work in the Unix shell, the way shell scripting languages were used. The term "scripting language" means a language used to control an application, in this case the Unix shell.
Of course, since then Perl has grown to become a full-fledged programming language, but the term remained, sometimes used by inertia, sometimes derogatorily.
A subroutine (also known as a procedure, function, routine, method, or subprogram) is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code. It is frequently meant to contain code that performs the task which needs to be done several times in your program, or even by multiple programs.
A subroutine is NOT a Perl-specific concept, though very few languages call it a "subroutine" (most use the term function, method or procedure).
As a special side note, a "method" - in Perl as well as other languages - is a special type of subroutine which is associated with an object-oriented class or an object of that class. The fact that it's merely a special case of a subroutine is, of course, highlighted by the fact that - despite the deepest wishes of "Modern Perl" author chromatic - methods in Perl 5 are declared with the "sub" keyword, same as regular subroutines.
As noted above, some people, when referring to a Perl program as a "script", imply that it does not contain subroutines (e.g. anything complicated enough to have a subroutine is no longer a "script" but a "program"). But that is not an accepted or formal definition - as stated, there is no definition of what a script is, everyone uses the term any which way they want.
A script is usually a file, which can contain statements and subroutines. A subroutine is something you find within a script.
Subroutines are described in detail in the perlsub manual page.
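To make the distinction concrete, here is a minimal sketch of a one-file script that contains a subroutine. The name average and the sample numbers are made up for illustration; they do not come from the answers above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A subroutine: a named, reusable piece of code inside the script.
# (average is a hypothetical name chosen for this sketch.)
sub average {
    my @numbers = @_;
    return 0 unless @numbers;        # avoid dividing by zero
    my $sum = 0;
    $sum += $_ for @numbers;
    return $sum / @numbers;          # @numbers in numeric context is its length
}

# Everything outside the subroutine is the script's top-level code.
print average(2, 4, 6), "\n";        # prints 4
```

The file as a whole is the script (or program); average is just one piece of it that can be called as often as needed.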
I wish that John McCarthy was still alive, but...
From LISP 1.5 Programmer's Manual :
LISP can interpret and execute programs written in the form of S-
expressions. Thus, like machine language, and unlike most other higher
level languages, it can be used to generate programs for further
execution.
I need more clarification about how machine language can be used to generate programs, and how Lisp can do it.
All that is saying is that machine code can directly write machine instructions to memory and jump to those instructions to execute them; this is the basis of many attack vectors to break into software, in fact.
The point is, when you're writing machine code, it's easy to generate machine code. But when you're writing in a compiled language like C, you can't just generate C code at run time and then execute it - unless your program includes a C compiler.
Lisp - and, these days, many other languages, especially "scripting languages" like Perl, Python, Ruby, Tcl, Javascript, and command shells - have the ability to execute code that is generated at runtime. In Lisp, since code and data have the same structure, this is usually less work than it is in the other languages, where the code to be evaluated is generally a string that has to be parsed. (Though Perl has the ability to eval a block instead of a string, which lets the compiler do the parsing ahead of time for literal code.)
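As a rough illustration of the Perl side of this, the sketch below builds a piece of source code as a string at run time and compiles it with string eval, then contrasts that with block eval. The variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# Build source code as a string at run time...
my $op   = '*';
my $code = "sub { \$_[0] $op \$_[1] }";     # becomes: sub { $_[0] * $_[1] }

# ...then have perl parse and compile it with string eval.
my $multiply = eval $code or die "could not compile: $@";
print $multiply->(6, 7), "\n";               # prints 42

# Block eval: the block is parsed when the file itself is compiled;
# at run time eval only traps exceptions thrown inside it.
my $ok    = eval { die "oops\n"; 1 };
my $error = $@;
print "caught: $error" unless $ok;           # prints caught: oops
```

In Lisp the generated code would be a list built with cons rather than a string, so no parsing step is needed before eval.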
A machine language program can alter itself while running. The last assembly programming I did was for MS-DOS: a resident program that I used to run before testing other programs. When my program misbehaved, a keystroke switched to the resident program, which could peek into the running program and alter it directly before resuming. It was quite handy since I didn't have a debugger.
LISP had this from the very beginning since it was originally interpreted. You could change the definition of a function while you were running, and the whole language was always available at runtime, even eval and define. When it started getting compiled, it wasn't compiled like Algol but partially, allowing interpreted and compiled code to intermix at the same time. The fact that its code structure was list structure and that symbols are a data type contributed to this.
In the last interview I saw with McCarthy, he was asked what he thought of modern programming languages (not the LISP family but the Algol-family language Ruby, which is said to be influenced by LISP). Before answering, he asked whether it could represent code as data (like list structure). Since it couldn't, Ruby is, in his opinion, still behind what LISP was in the 60s.
Many new programming languages are emerging in the Algol family and some of the most promising ones, like Perl6 and Nemerle, are getting closer to the features LISP had in the 60s.
Machine language programs can fill memory regions with arbitrary bytes. Then they can just jump to the start of such a region, which then gets executed right away.
Lisp language programs can easily create arbitrary S-expressions in memory, using cons. Then they can just call eval on these S-expressions to evaluate (interpret) them.
High-level language programs can easily fill memory regions with characters representing new code in the language's syntax. But they generally cannot run such code.
I understand that the Perl syntax is ambiguous and that its disambiguation is non-trivial (sometimes involving execution of code during the compile phase). Regardless, does Perl have a formal grammar (albeit ambiguous and/or context-sensitive)?
From perlfaq7
Can I get a BNF/yacc/RE for the Perl language?
There is no BNF, but you can paw your
way through the yacc grammar in
perly.y in the source distribution if
you're particularly brave. The grammar
relies on very smart tokenizing code,
so be prepared to venture into toke.c
as well.
In the words of Chaim Frenkel: "Perl's
grammar can not be reduced to BNF. The
work of parsing perl is distributed
between yacc, the lexer, smoke and
mirrors."
To see the wonderful set of examples of WHY it's pretty much near impossible to parse Perl due to context influences, please look into Randal Schwartz's post: On Parsing Perl
In addition, please see the discussion in "Perl 5 Internals (Chapter 5. The Lexer and the Parser)" by Simon Cozens.
Please note that the answer is different for Perl6:
There exists a grammar for Perl6
Rakudo Perl has its own version of the grammar
Other people have posted this link before on similar questions, but I think it is fun and has a great case example: Perl Cannot Be Parsed (A Formal Proof).
From that link:
[Consider] the following devilish
snippet of code, concocted by Randal
Schwartz, and determine the correct
parse for it:
whatever / 25 ; # / ; die "this dies!";
Schwartz's Snippet can parse two different ways: if whatever is nullary
(that is, takes no arguments), the
first statement is a division in void
context, and the rest of the line is a
comment. If whatever takes an
argument, Schwartz's Snippet parses as
a call to the whatever function with
the result of a match operator, then a
call to the die() function.
This means that, in order to statically parse Perl, it must be
possible to determine from a string of
Perl 5 code whether it establishes a
nullary prototype for the whatever
subroutine.
I just post this part to show that it gets really hard really quickly.
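The two parses can be observed directly. The sketch below is only an illustration: it compiles Schwartz's Snippet twice with string eval, once after declaring whatever with an empty prototype and once with a ($) prototype. The package names and the return value 100 are made up for the example.

```perl
use strict;
use warnings;

# Schwartz's Snippet, kept in a string so it can be compiled twice.
my $snippet = q{whatever / 25 ; # / ; die "this dies!";};

# Case 1: whatever is nullary, so "/" is division and the rest is a comment.
my $quotient = eval qq{
    package Nullary;
    sub whatever() { 100 }
    my \$r = $snippet
    \$r;
};

# Case 2: whatever takes one argument, so "/ 25 ; # /" is a pattern match
# passed to whatever(), and the die() call is real code.
my $survived = eval qq{
    package Unary;
    sub whatever(\$) { 100 }
    local \$_ = '';
    $snippet
    1;
};
my $error = $@;

print "nullary: $quotient\n";                  # prints nullary: 4
print "unary died: $error" unless $survived;   # message starts with: this dies!
```

The same characters produce a number in one case and an exception in the other, and the only difference is a declaration that was compiled earlier.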
Alternatively, many code/text editors can do a decent (though never great) job of syntax highlighting so you may start at those specs to see what they do. In fact you have inspired me, I think I will post a related question asking what editor best highlights Perl.
There is no formal grammar in the sense "this is the specification of Perl 5" (The Perl 6 effort is trying to fix that, though). But there is a formal grammar in the Perl 5 source code. Of course, understanding the code is most likely not a trivial undertaking.
Jeffrey Kegler has written some good articles about the Perl grammar on his blog as well. In particular, see this post and this one. The rest of the blog has some quite interesting thoughts on parsing in general, too.
Is Perl considered a general purpose programming language?
Reading about it on Wikipedia
Perl has a Turing-complete grammar because parsing can be affected by run-time code executed during the compile phase.[41] Therefore, Perl cannot be parsed by a straight Lex/Yacc lexer/parser combination. Instead, the interpreter implements its own lexer, which coordinates with a modified GNU bison parser to resolve ambiguities in the language.
It is often said that "Only perl can parse Perl," meaning that only the Perl interpreter (perl) can parse the Perl language (Perl), but even this is not, in general, true. Because the Perl interpreter can simulate a Turing machine during its compile phase, it would need to decide the Halting Problem in order to complete parsing in every case. It's a long-standing result that the Halting Problem is undecidable, and therefore not even perl can always parse Perl. Perl makes the unusual choice of giving the user access to its full programming power in its own compile phase. The cost in terms of theoretical purity is high, but practical inconvenience seems to be rare.
So, it says that though Perl has the Turing-complete badge, it is different from other languages because it gives "the user access to its full programming power in its own compile phase". What does that mean? What programming power does Perl provide me in the compile phase that others don't?
There are no features of Perl that do not appear in some other language. Lisp can do anything (Lisp is just one example here). So perhaps we can narrow the question down to which features of Perl make wide behavior swings an easy thing to do.
BEGIN blocks (END blocks, too.) which alter the behavior during compile. So I can write Perl code that changes the location of modules to be loaded.
Even the following code might have a different meaning.
use Frobnify;
Frobnify->new->initialize;
Because I could have changed where Frobnify loads from:
BEGIN {
    if ( [ localtime ]->[6] == 2 ) {
        s|^/var|/var/days/tuesday| foreach @INC;
    }
}
So on Tuesdays, I load /var/days/tuesday/perl/lib/Frobnify.pm
Source filters can programmatically edit the code that will be executed. (CAVEAT on source filters!) (Crudely, they are roughly equivalent to LISP macros.)
Closely related to BEGIN blocks are @INC hooks. Just as I can modify @INC at the beginning to change what gets loaded, I can put a subroutine at the front of the @INC array to load anything I want to load. The hook can receive a request to load Frobnify and respond to it by loading Defrobnify.pm.
Closely related to this is symbol manipulation. After loading Defrobnify.pm, I can do this:
*Frobnify:: = \*Defrobnify::;
Now Frobnify->new creates a Defrobnify object!
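An @INC hook of the kind described above might look like the following sketch. It assumes, purely for illustration, that the source of Defrobnify.pm is held in a hash in memory rather than on disk; the hash %fake_source and the name method are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical in-memory "files", keyed by the name require asks for.
my %fake_source = (
    'Defrobnify.pm' =>
        'package Defrobnify; sub new { bless {}, shift } sub name { "defrobnified" } 1;',
);

# A code reference in @INC is called with itself and the requested file name.
unshift @INC, sub {
    my ($self, $file) = @_;
    $file = 'Defrobnify.pm' if $file eq 'Frobnify.pm';   # redirect the request
    return unless exists $fake_source{$file};            # not ours: keep searching @INC
    open my $fh, '<', \$fake_source{$file}
        or die "cannot open in-memory file: $!";
    return $fh;                                          # perl compiles what it reads here
};

require Frobnify;                  # actually compiles the Defrobnify source
my $object = Defrobnify->new;
print $object->name, "\n";         # prints defrobnified
```

Because the hook sits at the front of @INC, it is consulted before any directory on disk.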
Subroutine prototypes are a compile-time feature that is more or less exclusive to Perl. Many of Perl's builtin functions impose special types of context on their arguments (scalar, list, reference, code-block, capture). Prototypes are a way of porting some of that functionality over to user-defined subroutines.
For example, Perl allows you to effectively generate new syntactic constructs with the (&) prototype. This is used in modules like Try::Tiny to add try and catch keywords to the language:
try {
    die "foo";
} catch {
    warn "caught error: $_"; # not $@
};
This works because try and catch are declared as sub try (&;@) { ... }. The sub name {...} syntax is equivalent to BEGIN { *name = sub {...} }, which means it has a compile-time effect. In the case of try, the (&;@) prototype tells the compiler that any time it sees the identifier try, the first argument must be a bare block, and following the block is an optional list.
This is just one example of prototypes, and they are able to do many other things:
$ imposes scalar context on an argument
& imposes code context on an argument
@ imposes list context on an argument
% imposes list context (like @; an even number of elements is expected but not enforced)
* imposes glob context on the argument
\$ imposes scalar reference context
\@ imposes array reference context
... for the rest of the sigils
Due to their power (and absence in other languages) prototypes can be confusing and are best used in moderation. (like every other advanced feature of Perl).
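A small sketch of two of the prototypes listed above. The subroutine names apply_each and count_of are invented for this example and are not part of any module.

```perl
use strict;
use warnings;

# (&@): the first argument may be written as a bare block, like map or grep.
sub apply_each(&@) {
    my ($code, @items) = @_;
    return map { $code->($_) } @items;
}

# (\@): the caller writes a plain array, but the sub receives a reference to it.
sub count_of(\@) {
    my ($array_ref) = @_;
    return scalar @$array_ref;
}

my @doubled = apply_each { $_[0] * 2 } 1, 2, 3;
print "@doubled\n";                 # prints 2 4 6

my @list = (10, 20, 30);
print count_of(@list), "\n";        # prints 3
```

Note that the prototype only takes effect for calls compiled after the declaration, which is why both subs are defined before they are used.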
The simple answer is that BEGIN blocks provide Turing-completeness:
BEGIN {
    my $foo = turing_machine_simulator($program);
}
BEGIN blocks are executed as soon as the perl compiler sees them. This means that the compiler can be asked to do tasks of arbitrary complexity. Anything Perl can do, it can do during its compilation phase.
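The ordering is easy to observe. In this minimal sketch (the @log array is just an illustrative name), the BEGIN block appears later in the file than the ordinary statement, yet its entry is recorded first.

```perl
use strict;
use warnings;

our @log;

push @log, 'run time';                  # executed only after the whole file is compiled
BEGIN { push @log, 'compile time' }     # executed as soon as the block has been parsed

print join(', ', @log), "\n";           # prints compile time, run time
```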
Looking through the perlsub and perlop manpages I've noticed that there are many references to "magic" and "magical" there (just search any of them for "magic"). I wonder why Perl is so rich in them.
Some examples:
print ++($foo = 'zz') # prints 'aaa'
printf "%d: %s", $! = 1, $! # prints '1: Operation not permitted'
while (my $line = <FH>) { ... } # $line is tested for definedness, not truth
use warnings; print "0 but true" + 1 # "0 but true" is a valid number!
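Two of these behaviors can be checked with a short script. This is only a sketch; the variable names and the contrasting string '0 but false' are assumptions added for the comparison.

```perl
use strict;
use warnings;

# Collect warnings so we can see which conversions perl complains about.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Magical string increment carries like an odometer.
my $id = 'Az';
$id++;
print "$id\n";                          # prints Ba

# "0 but true" is exempt from the usual non-numeric warning.
my ($special, $ordinary) = ('0 but true', '0 but false');
my $sum   = $special + 5;               # no warning
my $other = $ordinary + 5;              # warns that the argument isn't numeric
print "$sum $other ", scalar(@warnings), "\n";   # prints 5 5 1
```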
When a Perl feature is described as "magic":
It means that that feature is
implemented by NBA star Magic Johnson.
Whenever Perl executes "magic", it is
actually sending an RPC call to a
remote receiver implanted in Magic
himself. He computes the answer, and
then sends a return message. The use
of Mr. Johnson for all the hard parts
of Perl provides a great abstraction
layer and simplifies porting to new
platforms. It's way easier than, say,
the Apache Portable Runtime.
Source: perrin on Perl Monks
It's official! Perl is more magical.
Hits from the following Google searches:
25 site:ruby-doc.org magic
36 site:docs.python.org magic
497 site:perldoc.perl.org magic
Magic, in Perl parlance, is simply the word given to attributes applied to variables / functions that allow an extension of their functionality. Some of this functionality is available directly from Perl, and some requires the use of the C API.
A perfect example of magic is the tie interface which allows you to define your own implementation of a variable. Every operation that can be done to a variable (fetching or storing a value for instance) is exposed for reimplementation, allowing for elegant and logical syntactic constructs like a hash with values stored on disk, which are transparently loaded and saved behind the scenes.
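A disk-backed hash is too long to show here, so the following sketch uses a much smaller case: a tied scalar that counts how often it is read. The class name CountingScalar is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical class implementing the tied-scalar interface.
package CountingScalar;

sub TIESCALAR {
    my ($class, $value) = @_;
    return bless { value => $value, reads => 0 }, $class;
}
sub FETCH { my ($self) = @_; $self->{reads}++; return $self->{value} }
sub STORE { my ($self, $value) = @_; $self->{value} = $value }

package main;

my $object = tie my $var, 'CountingScalar', 'hello';

my $copy = $var;       # calls FETCH
$var  = 'world';       # calls STORE
$copy = $var;          # calls FETCH again

print "$copy was read $object->{reads} times\n";   # prints world was read 2 times
```

From the outside $var looks like an ordinary scalar; the method calls happen behind the scenes.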
Magic can also refer to the special ways that certain builtins can behave, such as how the first argument to map or grep can either be a block or a bare expression:
my @squares = map {$_**2} 1 .. 10;
my @roots = map sqrt, 1 .. 10;
which is not a behavior available to user defined subroutines.
Many other features of Perl, such as operator overloading or variables that can return different values when used with numeric or string operators are implemented with magic. Context could be seen as magic as well.
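A variable with different numeric and string values can be created from plain Perl with dualvar from the core Scalar::Util module. A brief sketch, with made-up values:

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

# One scalar, two faces: 404 as a number, 'Not Found' as a string.
my $status = dualvar(404, 'Not Found');

my $as_number = $status + 0;
my $as_string = "$status";

print "$as_number\n";      # prints 404
print "$as_string\n";      # prints Not Found
```

The builtin $! behaves the same way: an error number in numeric context and a message in string context.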
In a nutshell, magic is any time that a Perl construct behaves differently than a naive interpretation would suggest: an exception to the rule. Magic is of course very powerful, and should not be wielded without great care. Magic Johnson is of course involved in the execution of all magic (see FM's answer), but that is beyond the scope of this explanation.
I wonder why Perl is so rich in them.
To make things easy.
You'll find that most "magic" in Perl is to simplify the syntax for common tasks.
Because perl always Does What I Mean for some values of always.
I think (opinion more than fact) that this has to do with the organic growth viewpoint that Perl's creator Larry Wall has with the Perl language. Python is a study in the opposite approach, whose style often makes Perl hackers cringe at the perception of being forced to conform to a stylistic regime.
Some of it has to do with Perl being designed to be "efficient" at writing quick scripts to do Perl-ish tasks, in both wall clock time and keystrokes. Some of it has to do with the TMTOWTDI mantra of Perl and its followers.
Programmers tend to be opinionated about Perl's frequent usage of "magic". For some it is an eye-straining visual cacophony of chaos and disrespect for orderliness (which harks back to the days of the computer priesthood in white lab coats behind a glass window); for others it is a shining example of getting things done efficiently, if not always obviously to the novice or outsider.
Perl's design philosophy is that simple things must be simple. This sounds good, and to some extent it is. However, there's a tradeoff involved: making every simple thing a one-liner results in tons of special-case hacks to save a few lines of code. Different people have different preferences regarding making simple operations within a language simple versus making the language specification simple. Perl is at one extreme. Java is at the other, at least among languages that people actually use. Python and C# are somewhere in between.
It is "common knowledge" that source filters are bad and should not be used in production code.
When answering a similar, but more specific, question I couldn't find any good references that explain clearly why filters are bad and when they can be safely used. I think now is the time to create one.
Why are source filters bad?
When is it OK to use a source filter?
Why source filters are bad:
Nothing but perl can parse Perl. (Source filters are fragile.)
When a source filter breaks pretty much anything can happen. (They can introduce subtle and very hard to find bugs.)
Source filters can break tools that work with source code. (PPI, refactoring, static analysis, etc.)
Source filters are mutually exclusive. (You can't use more than one at a time -- unless you're psychotic).
When they're okay:
You're experimenting.
You're writing throw-away code.
Your name is Damian and you must be allowed to program in Latin.
You're programming in Perl 6.
Only perl can parse Perl (see this example):
@result = (dothis $foo, $bar);
# Which of the following is it equivalent to?
@result = (dothis($foo), $bar);
@result = dothis($foo, $bar);
This kind of ambiguity makes it very hard to write source filters that always succeed and do the right thing. When things go wrong, debugging is awkward.
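The answer depends on how dothis was declared, which a source filter working on text alone cannot know. In the sketch below, the names unary and listy are stand-ins for two possible declarations of dothis.

```perl
use strict;
use warnings;

# With a ($) prototype the sub parses like a named unary operator.
sub unary($) { return "[$_[0]]" }

# Without a prototype the sub is a list operator and takes everything.
sub listy { return "[@_]" }

my @first  = (unary 'foo', 'bar');    # same as (unary('foo'), 'bar')
my @second = (listy 'foo', 'bar');    # same as listy('foo', 'bar')

print scalar(@first), " ", scalar(@second), "\n";   # prints 2 1
print "$second[0]\n";                               # prints [foo bar]
```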
After crashing and burning a few times, I have developed the superstitious approach of never trying to write another source filter.
I do occasionally use Smart::Comments for debugging, though. When I do, I load the module on the command line:
$ perl -MSmart::Comments test.pl
so as to avoid any chance that it might remain enabled in production code.
See also: Perl Cannot Be Parsed: A Formal Proof
I don't like source filters because you can't tell what code is going to do just by reading it. Additionally, things that look like they aren't executable, such as comments, might magically be executable with the filter. You (or more likely your coworkers) could delete what you think isn't important and break things.
Having said that, if you are implementing your own little language that you want to turn into Perl, source filters might be the right tool. However, just don't call it Perl. :)
It's worth mentioning that Devel::Declare keywords (and starting with Perl 5.11.2, pluggable keywords) aren't source filters, and don't run afoul of the "only perl can parse Perl" problem. This is because they're run by the perl parser itself, they take what they need from the input, and then they return control to the very same parser.
For example, when you declare a method in MooseX::Declare like this:
method frob ($bubble, $bobble does coerce) {
    ... # complicated code
}
The word "method" invokes the method keyword parser, which uses its own grammar to get the method name and parse the method signature (which isn't Perl, but it doesn't need to be -- it just needs to be well-defined). Then it leaves perl to parse the method body as the body of a sub. Anything anywhere in your code that isn't between the word "method" and the end of a method signature doesn't get seen by the method parser at all, so it can't break your code, no matter how tricky you get.
The problem I see is the same problem you encounter with any C/C++ macro more complex than defining a constant: It degrades your ability to understand what the code is doing by looking at it, because you're not looking at the code that actually executes.
In theory, a source filter is no more dangerous than any other module, since you could easily write a module that redefines builtins or other constructs in "unexpected" ways. In practice, however, it is quite hard to write a source filter in such a way that you can prove that it's not going to make a mistake. I tried my hand at writing a source filter that implements the Perl 6 feed operators in Perl 5 (Perl6::Feeds on CPAN). You can take a look at the regular expressions to see the acrobatics required to simply figure out the boundaries of expression scope. While the filter works, and provides a test bed to experiment with feeds, I wouldn't consider using it in a production environment without many, many more hours of testing.
Filter::Simple certainly comes in handy by dealing with 'the gory details of parsing quoted constructs', so I would be wary of any source filter that doesn't start there.
In all, it really depends on the filter you are using, and how broad a scope it tries to match against. If it is something simple like a C macro, then it's "probably" OK, but if it's something complicated then it's a judgement call. I personally can't wait to play around with Perl 6's macro system. Finally Lisp won't have anything on Perl :-)
There is a nice example here that shows the kind of trouble you can get into with source filters.
http://shadow.cat/blog/matt-s-trout/show-us-the-whole-code/
They used a module called Switch, which is based on source filters. And because of that, they were unable to find the source of an error message for days.