What compile time features does Perl provide that other languages don't?

Is Perl considered a general purpose programming language?
Reading about it on Wikipedia
Perl has a Turing-complete grammar because parsing can be affected by run-time code executed during the compile phase.[41] Therefore, Perl cannot be parsed by a straight Lex/Yacc lexer/parser combination. Instead, the interpreter implements its own lexer, which coordinates with a modified GNU bison parser to resolve ambiguities in the language.
It is often said that "Only perl can parse Perl," meaning that only the Perl interpreter (perl) can parse the Perl language (Perl), but even this is not, in general, true. Because the Perl interpreter can simulate a Turing machine during its compile phase, it would need to decide the Halting Problem in order to complete parsing in every case. It's a long-standing result that the Halting Problem is undecidable, and therefore not even perl can always parse Perl. Perl makes the unusual choice of giving the user access to its full programming power in its own compile phase. The cost in terms of theoretical purity is high, but practical inconvenience seems to be rare.
So, it says that though Perl has the Turing-complete badge, it is different from other languages because it gives "the user access to its full programming power in its own compile phase". What does that mean? What programming power does Perl provide me during the compile phase that others don't?

There are no features of Perl that do not appear in any other language. Lisp can do anything (Lisp is an example, here.). So perhaps we can narrow the question down to what are the features of Perl that make wide behavior swings an easy thing to do.
BEGIN blocks (END blocks, too.) which alter the behavior during compile. So I can write Perl code that changes the location of modules to be loaded.
Even the following code might have a different meaning.
use Frobnify;
Frobnify->new->initialize;
Because I could have changed where Frobnify loads from:
BEGIN {
if ( [ localtime ]->[6] == 2 ) {
s|^/var|/var/days/tuesday| foreach @INC;
}
}
So on Tuesdays, I load /var/days/tuesday/perl/lib/Frobnify.pm
Source Filters can programmatically edit the code that will perform. (CAVEAT on source filters!) (crudely and roughly equivalent to LISP macros)
Somewhat along with BEGIN blocks are @INC hooks. Just as I can modify @INC at the beginning to change what gets loaded, I can set a subroutine at the front of the @INC array to load anything I want to load. The hook can receive a request to load Frobnify and respond to it by loading Defrobnify.pm.
Somewhat along with this is Symbol Manipulation. After loading Defrobnify.pm, I can do this:
*Frobnify:: = \*Defrobnify::;
Now Frobnify->new creates a Defrobnify object!
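The @INC hook described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the hook body and the name method are made up for the example, and the module source is served from an in-memory string so that the snippet runs on its own without any Frobnify.pm on disk.

```perl
use strict;
use warnings;

BEGIN {
    # Hypothetical hook: perl calls it with (the hook itself, the requested file name).
    unshift @INC, sub {
        my ($self, $file) = @_;
        return unless $file eq 'Frobnify.pm';   # decline everything else

        # Illustrative module source; a real hook might open a different file instead.
        my $source = q{package Frobnify; sub new { bless {}, shift } sub name { 'served by hook' } 1;};
        open my $fh, '<', \$source or die "cannot open in-memory file: $!";
        return $fh;                             # perl compiles whatever this handle yields
    };
}

use Frobnify;                                   # satisfied by the hook, not by the disk
print Frobnify->new->name, "\n";                # prints: served by hook
```

Because the hook is installed in a BEGIN block, it is already in place when the compiler reaches the use line.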

Subroutine prototypes are a compile time feature that is more or less exclusive to Perl. Many of Perl's builtin functions impose special types of context on their arguments (scalar, list, reference, code-block, capture). Prototypes are a way of porting some of that functionality over to user defined subroutines.
For example, Perl allows you to effectively generate new syntactic constructs with the (&) prototype. This is used in modules like Try::Tiny to add try and catch keywords to the language:
try {
die "foo";
} catch {
warn "caught error: $_"; # not $@
};
This works because try and catch are declared as sub try (&;@) { ... }. The sub name {...} syntax is equivalent to BEGIN { *name = sub {...} } which means it has a compile time effect. In the case of try, the (&;@) prototype tells the compiler that any time it sees the identifier try, the first argument must be a bare block, and following the block is an optional list.
This is just one example of prototypes, and they are able to do many other things:
$ imposes scalar context on an argument
& imposes code context on an argument
@ imposes list context on an argument
% imposes list context (with an even number of elements)
* imposes glob context on the argument
\$ imposes scalar reference context
\@ imposes array reference context
... for the rest of the sigils
Due to their power (and absence in other languages) prototypes can be confusing and are best used in moderation. (like every other advanced feature of Perl).
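As a small sketch of two of the prototypes listed above, the following defines two hypothetical helpers (my_push and apply_to_each are names invented for this example, not part of any module). The first uses (\@@) to receive its first argument as an array reference; the second uses (&@) to accept a bare block.

```perl
use strict;
use warnings;

# Hypothetical helper: (\@@) passes the first argument as a reference to the array.
sub my_push (\@@) {
    my ($array_ref, @values) = @_;
    push @$array_ref, @values;
    return scalar @$array_ref;
}

# Hypothetical helper: (&@) lets the caller write a bare block as the first argument.
sub apply_to_each (&@) {
    my ($code, @items) = @_;
    my @results;
    push @results, $code->($_) for @items;
    return @results;
}

my @stack = (1, 2);
my_push @stack, 3, 4;                               # no backslash needed at the call site
my @doubled = apply_to_each { $_[0] * 2 } @stack;   # block syntax, like map or grep

print "@stack\n";     # prints: 1 2 3 4
print "@doubled\n";   # prints: 2 4 6 8
```

The prototypes only take effect because the subs are declared before the calls are compiled.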

The simple answer is that BEGIN blocks provide Turing-completeness:
BEGIN {
my $foo = turing_machine_simulator($program);
}
BEGIN blocks are executed as soon as the perl compiler sees them. This means that the compiler can be asked to do tasks of arbitrary complexity. Anything Perl can do, it can do during its compilation phase.
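A minimal sketch of that ordering: the BEGIN block below appears later in the file than the ordinary push, yet it runs first, because it executes while the file is still being compiled. The @order variable is just a name chosen for this illustration.

```perl
use strict;
use warnings;

our @order;                              # declared at compile time, so BEGIN can see it

push @order, 'run time';                 # executes only when the program runs
BEGIN { push @order, 'compile time' }    # executes as soon as the compiler reaches it

print join(', ', @order), "\n";          # prints: compile time, run time
```

BEGIN blocks also run under perl -c, since they belong to the compile phase rather than to the execution of the program.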

Related

Is there an option in the perl command to check for undefined functions?

Background
The perl command has several idiot-proofing command line options described in perldoc perlrun:
-c causes Perl to check the syntax of the program and then exit without executing it.
-w prints warnings about dubious constructs, such as variable names that are mentioned only once and scalar variables that are used before being set, etc.
-T forces "taint" checks to be turned on so you can test them.
After reading through these options, I could not find one that detects undefined functions. For example, I used a function called NFD() that is imported from the Unicode::Normalize package. However, being a perl novice, I did not know if this was already under the fold of the standard perl library or not. Neither perl -c nor any of the other options uncovered this error for me; rather, a coworker noticed that it was somehow undefined (and not inside the standard libraries). Therefore, I was curious about the following:
Question
Is there an option in the perl command to automatically detect if there is an undefined function not already inside an imported package?

I did not know if this was already under the fold of the standard perl library or not.
It sounds like you want to distinguish imported subs from other subs and builtin functions.
If you always list your imports explicitly instead of accepting the defaults like I do, then you'll not only know which subs are imported, you'll know from which module they were imported.
use Foo::Bar; # Default imports
use Foo::Bar qw( ); # Import nothing ("()" also works)
use Foo::Bar qw( foo bar ); # Import subs foo and bar.
Is there an option in the perl command to check for undefined functions?
On the other hand, if you are trying to identify the subs that you call that don't exist or that aren't defined at compile time, then this question is a duplicate of How can I smoke out undefined subroutines?.

Aside from the particular technical details, you can't know if a function will be defined at some time in the future when you plan to use it. In a dynamic language, things come into and go out of existence, and even change their definitions, while the program is running.
Jeffrey Kegler wrote Perl Cannot Be Parsed: A Formal Proof that relied on this idea. The details of the halting problem aren't as interesting as workings of a dynamic language.
And, for what it's worth, those command-line options don't make programs idiot-proof. For example, in Mastering Perl I show that merely adding -T to a program doesn't magically make it secure, as many would have you believe.
What were you doing with Unicode::Normalize? It has an NFD already but your question makes it sound like you were wrapping it somehow:
use Unicode::Normalize qw(NFD);
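Since whether a sub exists is a run-time property, one way to check is to ask at run time. The following is a rough sketch, not a replacement for a command-line option: is_defined_in_main is a hypothetical helper written for this example, and it only answers the question for the moment at which it is called.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);   # explicit import list, so the origin of NFD is visible

# Hypothetical helper: report whether a sub of this name currently exists in package main.
sub is_defined_in_main {
    my ($name) = @_;
    return main->can($name) ? 1 : 0;
}

for my $name (qw(NFD no_such_function)) {
    my $status = is_defined_in_main($name) ? 'defined' : 'not defined';
    print "$name is $status\n";
}
```

Note that can also reports the methods every package inherits from UNIVERSAL, such as isa, so this is a coarse check.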

Why doesn't this run forever?

I was looking at a rather inconclusive question about whether it is best to use for(;;) or while(1) when you want to make an infinite loop and I saw an interesting solution in C where you can #define "EVER" as a constant equal to ";;" and literally loop for(EVER).
I know defining an extra constant to do this is probably not the best programming practice but purely for educational purposes I wanted to see if this could be done with Perl as well.
I tried to make the Perl equivalent, but it only loops once and then exits the loop.
#!/usr/bin/perl -w
use strict;
use constant EVER => ';;';
for (EVER) {
print "FOREVER!\n";
}
Output:
FOREVER!
Why doesn't this work in perl?

C's pre-processor constants are very different from the constants in most languages.
A normal constant acts like a variable which you can only set once; it has a value which can be passed around in most of the places a variable can be, with some benefits from you and the compiler knowing it won't change. This is the type of constant that Perl's constant pragma gives you. When you pass the constant to the for operator, it just sees it as a string value, and behaves accordingly.
C, however, has a step which runs before the compiler even sees the code, called the pre-processor. This actually manipulates the text of your source code without knowing or caring what most of it means, so it can do all sorts of things that you couldn't do in the language itself. In the case of #define EVER ;;, you are telling the pre-processor to replace every occurrence of EVER with ;;, so that when the actual compiler runs, it only sees for(;;). You could go a step further and define the word forever as for(;;), and it would still work.
As mentioned by Andrew Medico in comments, the closest Perl has to a pre-processor is source filters, and indeed one of the examples in the manual is an emulation of #define. These are actually even more powerful than pre-processor macros, allowing people to write modules like Acme::Bleach (replaces your whole program with whitespace while maintaining functionality) and Lingua::Romana::Perligata (interprets programs written in grammatically correct Latin), as well as more sensible features such as adding keywords and syntax for class and method declarations.

It doesn't run forever because ';;' is an ordinary string, not a preprocessor macro (Perl doesn't have an equivalent of the C preprocessor). As such, for (';;') runs a single time, with $_ set to ';;' that one time.
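A short sketch that makes this visible by recording what the loop actually iterates over (the @seen array is just a name used for this illustration):

```perl
use strict;
use warnings;
use constant EVER => ';;';

my @seen;
for (EVER) {              # a foreach loop over a one-element list
    push @seen, $_;       # $_ is the string ';;'
}

print scalar(@seen), " iteration, with \$_ set to '$seen[0]'\n";
# prints: 1 iteration, with $_ set to ';;'
```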

Andrew Medico mentioned in his comment that you could hack it together with a source filter.
I confirmed this, and here's an example.
use Filter::cpp;
#define EVER ;;
for (EVER) {
print "Forever!\n";
}
Output:
Forever!
Forever!
Forever!
... keeps going ...
I don't think I would recommend doing this, but it is possible.

This is not possible in Perl. However, you can define a subroutine named forever which takes a code block as a parameter and runs it again and again:
#!/usr/bin/perl
use warnings;
use strict;
sub forever (&) {
$_[0]->() while 1
}
forever {
print scalar localtime, "\n";
sleep 1;
};

What is the difference between subroutines and scripts in Perl?

I am in the midst of learning Perl, and I have encountered a question. What, exactly is the difference between subroutines and scripts?

A script is just a name for a (usually short) program usually contained in a single file. It's not really a scientific/technical term and therefore is pretty vague - people can refer to a "script" when discussing a 3-line quick program, or a 10000 lines of code program.
Some people refer to ANY Perl program as a "script" - see below for the historical reason. Some people, when they say "a Perl script" as opposed to a Perl "program", mean a relatively simple, relatively short program, frequently structured without using any subroutines/classes/other methods of code organization. Again, there's no standard definition.
As an aside, the reason why Perl programs are frequently called "scripts" is that Perl originally was used for writing scripts that perform work in Unix shell, the way shell scripting languages were used. The term "scripting language" means a language used to control an application, in this case Unix shell.
Of course, since then Perl has grown to become a full fledged programming language, but the word/term remained, sometimes used by inertia, sometimes derogatorily.
A subroutine (also known as a procedure, function, routine, method, or subprogram) is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code. It is frequently meant to contain code that performs the task which needs to be done several times in your program, or even by multiple programs.
A subroutine is NOT a Perl specific concept, though calling it "subroutine" is done in very few languages (most use the term function, method or procedure).
As a special side note, a "method" - in Perl as well as other languages - is a special type of subroutine which is associated with an object oriented class or an object of that class. The fact that it's merely a special case of a subroutine is, of course, highlighted by the fact that - despite deepest wishes by "Modern Perl" author chromatic - methods in Perl 5 are declared with "sub" keyword, same as regular subroutines.
As noted above, some people, when referring to a Perl program as a "script", imply that it does not contain subroutines (e.g. anything complicated enough to have a subroutine is no longer a "script" but a "program"). But that is not an accepted or formal definition - as stated, there is no definition of what a script is, everyone uses the term any which way they want.

A script is usually a file, which can contain statements and subroutines. A subroutine is something you find within a script.
Subroutines are described in detail in the perlsub manual page.

Why is there so much "magic" in Perl?

Looking through the perlsub and perlop manpages I've noticed that there are many references to "magic" and "magical" there (just search any of them for "magic"). I wonder why is Perl so rich in them.
Some examples:
print ++($foo = 'zz') # prints 'aaa'
printf "%d: %s", $! = 1, $! # prints '1: Operation not permitted'
while (my $line = <FH>) { ... } # $line is tested for definedness, not truth
use warnings; print "0 but true" + 1 # "0 but true" is a valid number!

When a Perl feature is described as "magic":
It means that that feature is implemented by NBA star Magic Johnson. Whenever Perl executes "magic", it is actually sending an RPC call to a remote receiver implanted in Magic himself. He computes the answer, and then sends a return message. The use of Mr. Johnson for all the hard parts of Perl provides a great abstraction layer and simplifies porting to new platforms. It's way easier than, say, the Apache Portable Runtime.
Source: perrin on Perl Monks
It's official! Perl is more magical.
Hits from the following Google searches:
25 site:ruby-doc.org magic
36 site:docs.python.org magic
497 site:perldoc.perl.org magic

Magic, in Perl parlance is simply the word given to attributes applied to variables / functions that allow an extension of their functionality. Some of this functionality is available directly from Perl, and some requires the use of the C api.
A perfect example of magic is the tie interface which allows you to define your own implementation of a variable. Every operation that can be done to a variable (fetching or storing a value for instance) is exposed for reimplementation, allowing for elegant and logical syntactic constructs like a hash with values stored on disk, which are transparently loaded and saved behind the scenes.
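As a minimal sketch of the tie interface, the hypothetical UpperCase class below (invented for this example) reimplements storing and fetching for a scalar, so that every value assigned to the variable is upper-cased behind the scenes.

```perl
use strict;
use warnings;

package UpperCase;    # hypothetical class implementing the tied-scalar interface

sub TIESCALAR { my ($class) = @_; my $value; return bless \$value, $class }
sub FETCH     { my ($self) = @_; return $$self }            # called on every read
sub STORE     { my ($self, $new) = @_; $$self = uc $new }   # called on every write

package main;

tie my $shout, 'UpperCase';
$shout = 'hello';         # goes through STORE
print "$shout\n";         # goes through FETCH, prints: HELLO
```

The same pattern, with TIEHASH and its companion methods, is what makes a disk-backed hash look like an ordinary one.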
Magic can also refer to the special ways that certain builtins can behave, such as how the first argument to map or grep can either be a block or a bare expression:
my @squares = map {$_**2} 1 .. 10;
my @roots = map sqrt, 1 .. 10;
which is not a behavior available to user defined subroutines.
Many other features of Perl, such as operator overloading or variables that can return different values when used with numeric or string operators are implemented with magic. Context could be seen as magic as well.
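One way to see a variable that returns different values under numeric and string operators is dualvar from Scalar::Util, which ships with Perl. This sketch only illustrates the idea; it is not a claim about how $! itself is implemented.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

my $five = dualvar(5, 'five');   # numeric value 5, string value 'five'

my $sum  = $five + 1;            # numeric context uses 5
my $text = "value: $five";       # string context uses 'five'

print "$sum\n";                  # prints: 6
print "$text\n";                 # prints: value: five
```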
In a nutshell, magic is any time that a Perl construct behaves differently than a naive interpretation would suggest, an exception to the rule. Magic is of course very powerful, and should not be wielded without great care. Magic Johnson is of course involved in the execution of all magic (see FM's answer), but that is beyond the scope of this explanation.

I wonder why is Perl so rich in them.
To make things easy.
You'll find that most "magic" in Perl is to simplify the syntax for common tasks.

Because perl always Does What I Mean for some values of always.

I think (opinion more than fact) that this has to do with the organic growth viewpoint that Perl's creator Larry Wall has with the Perl language. Python is a study in the opposite approach, whose style often makes Perl hackers cringe at the perception of being forced to conform to a stylistic regime.
Some of it has to do with Perl being designed to be "efficient" at writing quick scripts to do Perl-ish tasks, in both wall clock time, and in keystrokes. Some of it has to do with the TMTOWTDI mantra of Perl and its followers.
Programmers tend to be opinionated about Perl's frequent usage of "magic", for some it is an eye-straining visual cacophony of chaos and disrespect for orderliness (which harkens back to the days of computer Priesthood in white lab coats behind a glass window), for others it is a shining example of getting things done efficiently, if not always obviously to the novice or outsider.

Perl's design philosophy is that simple things must be simple. This sounds good, and to some extent it is. However, there's a tradeoff involved: Making every simple thing a one-liner results in tons of special case hacks to save a few lines of code. Different people have different preferences regarding making simple operations within a language simple versus making the language specification simple. Perl is at one extreme. Java is at the other, at least among languages that people actually use. Python and C# are somewhere in between.

Is it okay to use modules from within subroutines?

Recently I started playing with OO Perl and I've been creating quite a bunch of new objects for a new project that I'm working on. I'm unfamiliar with any best practices regarding OO Perl, and we're kind of in a tight rush to get it done :P
I'm putting a lot of this kind of code into each of my functions:
sub funcx{
use ObjectX; # i don't declare this on top of the pm file
# but inside the function itself
my $obj = new ObjectX;
}
I was wondering if this will have any negative impact compared with putting the use ObjectX line at the top of the Perl module, outside of any function scope.
I was doing this because I feel it's cleaner in case I need to shift the function around.
The other thing that I have noticed is that when I run a test.pl script on the unix server itself to test my objects, it is slow as heck. But when the same code is run through CGI connected to an apache server, the web page doesn't load as slowly.

Where to put use?
use occurs at compile time, so it doesn't matter where you put it. At least from a purely pragmatic, 'will it work', point of view. Because it happens at compile time use will always be executed, even if you put it in a conditional. Never do this: if( $foo eq 'foo' ) { use SomeModule }
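A small sketch of why the conditional does not help: the use statement below sits in a branch that never runs, yet the import has already happened by the time the program starts. The second half shows a run-time alternative with require. List::Util is used here only because it ships with Perl; $want_max is a name made up for the example.

```perl
use strict;
use warnings;

if (0) {
    # This branch is never executed, but use has already run at compile time.
    use List::Util qw(sum);
}
print sum(1, 2, 3), "\n";        # prints: 6

# Run-time alternative: require is an ordinary statement and obeys the condition.
my $want_max = 1;
my $max;
if ($want_max) {
    require List::Util;
    $max = List::Util::max(4, 9, 2);
}
print "$max\n";                  # prints: 9
```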
In my experience, it is best to put all your use statements at the top of the file. It makes it easy to see what is being loaded and what your dependencies are.
Update:
As brian d foy points out, things compiled before the use statement will not be affected by it. So, the location can matter. For a typical module, location does not matter, however, if it does things that affect compilation (for example it imports functions that have prototypes), the location could matter.
Also, Chas Owens points out that it can affect compilation. Modules that are designed to alter compilation are called pragmas. Pragmas are, by convention, given names in all lower-case. These effects apply only within the scope where the module is used. Chas uses the integer pragma as an example in his answer. You can also disable a pragma or module over a limited scope with the keyword no.
use strict;
use warnings;
my $foo;
print $foo; # Generates a warning
{ no warnings 'uninitialized'; # turn off warnings for working with uninitialized values.
print $foo; # No warning here
}
print $foo; # Generates a warning
Indirect object syntax
In your example code you have my $obj = new ObjectX;. This is called indirect object syntax, and it is best avoided as it can lead to obscure bugs. It is better to use this form:
my $obj = ObjectX->new;
Why is your test script slow on the server?
There is no way to tell with the info you have provided.
But the easy way to find out is to profile your code and see where the time is being consumed. NYTProf is another popular profiling tool you may want to check out.
Best practices
Check out Perl Best Practices, and the quick reference card. This page has a nice run down of Damian Conway's OOP advice from PBP.
Also, you may wish to consider using Moose. If the long script startup time is acceptable in your usage, then Moose is a huge win.

question 1
It depends on what the module does. If it has lexical effects, then it will only affect the scope it is used in:
my $x;
{
use integer;
$x = 5/2; #$x is now 2
}
my $y = 5/2; #$y is now 2.5
If it is a normal module then it makes no difference where you use it, but it is common to use all of those modules at the top of the program.
question 2
Things that can affect the speed of a program between machines
speed of the processor
version of modules installed (some modules have XS versions that are much faster)
version of Perl
number of entries in PERL5LIB
speed of the drive

daotoad and Chas. Owens already answered the part of your question pertaining to the position of use statements. Let me remark on something else here:
I was doing this so that I feel it's cleaner in case I need to shift the function around.
Personally, I find it much cleaner to have all the used modules in one place at the top of the file. You won't have to search for use statements to see what other modules are being used and a quick glance will tell you what is being used and even what is not being used.
Regarding your performance problem: with Apache and mod_perl the Perl interpreter will have to parse and compile your used modules only once. The next time the script is run, execution should be much faster. On the command line, however, a second run doesn't get this benefit.
