Explain the deviousness of the Perl "preamble" - perl

The Perl manual describes a totally devious construct that will work under any of csh, sh, or Perl, such as the following:
eval '(exit $?0)' && eval 'exec perl -wS $0 ${1+"$#"}'
& eval 'exec /usr/bin/perl -wS $0 $argv:q'
if $running_under_some_shell;
Devious indeed... can someone please explain in detail how this works?

The idea is that those three lines do 3 different things if they're evaluated in a standard Bourne shell (sh), a C shell (csh), or Perl. This hack is only needed on systems that don't support specifying an interpreter name using a #! line at the start of a script. If you execute a Perl script beginning with those 3 lines as a shell script, the shell will launch the Perl interpreter, passing it the script's filename and the command line arguments.
In Perl, the three lines form one statement, terminated by the ;, of the form
eval '...' && eval '...' & eval '...' if $running_under_some_shell;
Since the script just started, $running_under_some_shell is undef, which is false, and the evals are never executed. It's a no-op.
The devious part is that $?0 is parsed differently in sh versus csh. In sh, that means $? (the exit status of the last command) followed by 0. Since there is no previous command, $? will be 0, so $?0 evaluates to 00. In csh, $?0 is a special variable that is 1 if the current input filename is known, or 0 if it isn't. Since the shell is reading these lines from a script, $?0 will be 1.
Therefore, in sh, eval '(exit $?0)' means eval '(exit 00)', and in csh it means eval '(exit 1)'. The parens indicate that the exit command should be evaluated in a subshell.
Both sh and csh understand && to mean "execute the previous command, then execute the following command only if the previous command exited 0". So only sh will execute eval 'exec perl -wS $0 ${1+"$#"}'. csh will proceed to the next line.
csh will ignore "& " at the beginning of a line. (I'm not sure exactly what that means to csh. Its purpose is to make this a single expression from Perl's point of view.) csh then proceeds to evaluate eval 'exec /usr/bin/perl -wS $0 $argv:q'.
These two command lines are quite similar. exec perl means to replace the current process by launching a copy of perl. -wS means the same as -w (enable warnings) and -S (look for the specified script in $PATH). $0 is the filename of the script. Finally both ${1+"$#"} and $argv:q produce a copy of the current command line arguments (in sh and csh, respectively).
It uses ${1+"$#"} instead of the more usual "$#" to work around a bug in some ancient version of the Bourne shell. They mean the same thing. You can read the details in Bennett Todd's explanation (copied in gbacon's answer).

From Tom Christiansen's collection Far More Than Everything You've Ever Wanted to Know About …:
Why we use eval 'exec perl $0 -S ${1+"$#"}'
Newsgroups: comp.lang.tcl,comp.unix.shell
From: bet#ritz.mordor.com (Bennett Todd)
Subject: Re: "$#" versus ${1+"$#"}
Followup-To: comp.unix.shell
Date: Tue, 26 Sep 1995 14:35:45 GMT
Message-ID: <DFIoJL.934#ritz.mordor.com>
(This isn't really a TCL question; it's a Bourne Shell question; so I've
cross-posted, and set followups, to comp.unix.shell).
Once upon a time (or so the story goes) there was a Bourne Shell somewhere
which offered two choices for interpolating the whole command-line. The
simplest was $*, which just borfed in all the args, losing any quoting that
had protected internal whitespace. It also offered "$#", to protect
whitespace. Now the icko bit is how "$#" was implemented. In this early
shell, the two-character sequence $# would interpolate as
$1" "$2" "$3" "$4" ... $n
so that when you added the surrounding quotes, it finished quoting the whole
schmeer. Cute, cute, too cute.... Now consider what the correct usage
"$#"
will expand to if there are no args:
""
That's the empty string — a single argument of length zero. That's not
the same as no args at all. So, someone came up with a clever application of
another Bourne Shell feature, conditional interpolation. The idiom
${varname+value}
expands to value if varname is set, and nothing otherwise. Thus the
idiom under discussion
${1+"$#"}
means exactly, precisely the same as a simple
"$#"
without that ancient, extremely weird bug.
So now the question: what shells had that bug? Are there any shells
shipped with any even vaguely recent OS that included it?
--
-Bennett
bet#mordor.com
http://www.mordor.com/bet/

Related

Long eval string at the beginnning of Perl script

I am relatively new to Perl and am working on Perl files written by someone else, and I keep encountering the following statement at the beginning of the scripts:
eval '(exit $?0)' && eval 'exec perl -w -S $0 ${1+"$#"}' && eval 'exec perl -w -S $0 $argv:q'
if 0;
What do these two lines do? What is the code checking? and What does the if 0; sentence do?
This is a variant of the exec hack. In the days before interpreters could be reliably specified with a #!, this was used to make the shell exec perl. The if 0 on the second line is never read by the shell, which reads the first line only and execs perl, which reads the if 0 and does not re-execute itself.
This is an interesting variant, but I think not quite correct. It seems to be set up to work with either the bourne shell or with csh variants, using the initial eval to determine the shell that is parsing it and then using the appropriate syntax to pass the arguments to perl. The middle clause is sh syntax and the last clause is appropriate for csh. If the second && were || and the initial eval '(exit $?0)' did actually fail in csh, then this would accomplish those goals, but as written I don't think it quite works for csh. Is there a command that precedes this that would set $? to some value based on the shell? But even if that were the case and $? is set to a non-zero value, then nothing would be exec'ed unless the && is replaced with ||. Something funny is happening.

Perl ambiguous command line options, and security implications of eval with -i?

I know this is incorrect. I just want to know how perl parses this.
So, I'm playing around with perl, what I wanted was perl -ne what I typed was perl -ie the behavior was kind of interesting, and I'd like to know what happened.
$ echo 1 | perl -ie'next unless /g/i'
So perl Aborted (core dumped) on that. Reading perl --help I see -i takes an extension for backups.
-i[extension] edit <> files in place (makes backup if extension supplied)
For those that don't know -e is just eval. So I'm thinking one of three things could have happened either it was parsed as
perl -i -e'next unless /g/i' i gets undef, the rest goes as argument to e
perl -ie 'next unless /g/i' i gets the argument e, the rest is hanging like a file name
perl -i"-e'next unless /g/i'" whole thing as an argument to i
When I run
$ echo 1 | perl -i -e'next unless /g/i'
The program doesn't abort. This leads me to believe that 'next unless /g/i' is not being parsed as a literal argument to -e. Unambiguously the above would be parsed that way and it has a different result.
So what is it? Well playing around with a little more, I got
$ echo 1 | perl -ie'foo bar'
Unrecognized switch: -bar (-h will show valid options).
$ echo 1 | perl -ie'foo w w w'
... works fine guess it reads it as `perl -ie'foo' -w -w -w`
Playing around with the above, I try this...
$ echo 1 | perl -ie'foo e eval q[warn "bar"]'
bar at (eval 1) line 1.
Now I'm really confused.. So how is Perl parsing this? Lastly, it seems you can actually get a Perl eval command from within just -i. Does this have security implications?
$ perl -i'foo e eval "warn q[bar]" '
Quick answer
Shell quote-processing is collapsing and concatenating what it thinks is all one argument. Your invocation is equivalent to
$ perl '-ienext unless /g/i'
It aborts immediately because perl parses this argument as containing -u, which triggers a core dump where execution of your code would begin. This is an old feature that was once used for creating pseudo-executables, but it is vestigial in nature these days.
What appears to be a call to eval is the misparse of -e 'ss /g/i'.
First clue
B::Deparse can your friend, provided you happen to be running on a system without dump support.
$ echo 1 | perl -MO=Deparse,-p -ie'next unless /g/i'
dump is not supported.
BEGIN { $^I = "enext"; }
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined(($_ = <ARGV>))) {
chomp($_);
(('ss' / 'g') / 'i');
}
So why does unle disappear? If you’re running Linux, you may not have even gotten as far as I did. The output above is from Perl on Cygwin, and the error about dump being unsupported is a clue.
Next clue
Of note from the perlrun documentation:
-u
This switch causes Perl to dump core after compiling your program. You can then in theory take this core dump and turn it into an executable file by using the undump program (not supplied). This speeds startup at the expense of some disk space (which you can minimize by stripping the executable). (Still, a "hello world" executable comes out to about 200K on my machine.) If you want to execute a portion of your program before dumping, use the dump operator instead. Note: availability of undump is platform specific and may not be available for a specific port of Perl.
Working hypothesis and confirmation
Perl’s argument processing sees the entire chunk as a single cluster of options because it begins with a dash. The -i option consumes the next word (enext), as we can see in the implementation for -i processing.
case 'i':
Safefree(PL_inplace);
[Cygwin-specific code elided -geb]
{
const char * const start = ++s;
while (*s && !isSPACE(*s))
++s;
PL_inplace = savepvn(start, s - start);
}
if (*s) {
++s;
if (*s == '-') /* Additional switches on #! line. */
s++;
}
return s;
For the backup file’s extension, the code above from perl.c consumes up to the first whitespace character or end-of-string, whichever is first. If characters remain, the first must be whitespace, then skip it, and if the next is a dash then skip it also. In Perl, you might write this logic as
if ($$s =~ s/i(\S+)(?:\s-)//) {
my $extension = $1;
return $extension;
}
Then, all of -u, -n, -l, and -e are valid Perl options, so argument processing eats them and leaves the nonsensical
ss /g/i
as the argument to -e, which perl parses as a series of divisions. But before execution can even begin, the archaic -u causes perl to dump core.
Unintended behavior
An even stranger bit is if you put two spaces between next and unless
$ perl -ie'next unless /g/i'
the program attempts to run. Back in the main option-processing loop we see
case '*':
case ' ':
while( *s == ' ' )
++s;
if (s[0] == '-') /* Additional switches on #! line. */
return s+1;
break;
The extra space terminates option parsing for that argument. Witness:
$ perl -ie'next nonsense -garbage --foo' -e die
Died at -e line 1.
but without the extra space we see
$ perl -ie'next nonsense -garbage --foo' -e die
Unrecognized switch: -onsense -garbage --foo (-h will show valid options).
With an extra space and dash, however,
$ perl -ie'next -unless /g/i'
dump is not supported.
Design motivation
As the comments indicate, the logic is there for the sake of harsh shebang (#!) line constraints, which perl does its best to work around.
Interpreter scripts
An interpreter script is a text file that has execute permission enabled and whose first line is of the form:
#! interpreter [optional-arg]
The interpreter must be a valid pathname for an executable which is not itself a script. If the filename argument of execve specifies an interpreter script, then interpreter will be invoked with the following arguments:
interpreter [optional-arg] filename arg...
where arg... is the series of words pointed to by the argv argument of execve.
For portable use, optional-arg should either be absent, or be specified as a single word (i.e., it should not contain white space) …
Three things to know:
'-x y' means -xy to Perl (for some arbitrary options "x" and "y").
-xy, as common for unix tools, is a "bundle" representing -x -y.
-i, like -e absorbs the rest of the argument. Unlike -e, it considers a space to be the end of the argument (as per #1 above).
That means
-ie'next unless /g/i'
which is just a fancy way of writing
'-ienext unless /g/i'
unbundles to
-ienext -u -n -l '-ess /g/i'
^^^^^ ^^^^^^^
---------- ----------
val for -i val for -e
perlrun documents -u as:
This switch causes Perl to dump core after compiling your program. You can then in theory take this core dump and turn it into an executable file by using the undump program (not supplied). This speeds startup at the expense of some disk space (which you can minimize by stripping the executable). (Still, a "hello world" executable comes out to about 200K on my machine.) If you want to execute a portion of your program before dumping, use the dump() operator instead. Note: availability of undump is platform specific and may not be available for a specific port of Perl.

Could someone tell me what this means in Perl

I'm new to Perl and was hoping someone could tell me what this means exactly
eval 'exec ${PERLHOME}/bin/perl -S $0 ${1+"$#"}' # -*- perl -*-
if 0;
This is explained in perldoc perlrun:
-S
makes Perl use the PATH environment variable to search for the program
unless the name of the program contains path separators.
...
Typically this is used to emulate #! startup on platforms that don't
support #! . It's also convenient when debugging a script that uses
#! , and is thus normally found by the shell's $PATH search
mechanism.
This example works on many platforms that have a shell compatible with
Bourne shell:
#!/usr/bin/perl
eval 'exec /usr/bin/perl -wS $0 ${1+"$#"}'
if $running_under_some_shell;
The system ignores the first line and feeds the program to /bin/sh,
which proceeds to try to execute the Perl program as a shell script.
The shell executes the second line as a normal shell command, and thus
starts up the Perl interpreter. On some systems $0 doesn't always
contain the full pathname, so the -S tells Perl to search for the
program if necessary. After Perl locates the program, it parses the
lines and ignores them because the variable
$running_under_some_shell is never true. If the program will be
interpreted by csh, you will need to replace ${1+"$#"} with $* ,
even though that doesn't understand embedded spaces (and such) in the
argument list. To start up sh rather than csh, some systems may
have to replace the #! line with a line containing just a colon,
which will be politely ignored by Perl.
In short, it mimics shebang behavior for platforms that have shells compatible with Bash.
It's valid both as shell script and as a Perl program. It is used to run the Perl interpreter after all on systems where the shebang doesn't work, for some reason. It's rarely seen these days but used to be common in the early 1990s.
The comment is just a comment, but it has special meaning in Emacs, which will open the file in perl mode.
I just read #Zaid's response, which is better and more correct than mine as long as this code is on the first line of the script being executed, and no shebang exists. I've never seen this kind of substitute. Quite interesting, really.
The second line, if 0; is a part of the first line. You can tell since the first line lacks a ;. It would be more obvious if this was one long single line with the comment being after the semicolon.
So it's equivalent to:
if(0) {
eval 'exec ${PERLHOME}/bin/perl -S $0 ${1+"$#"}
}
In perl, 0 will be evaluated to false, and so the eval-clause will never execute. Presumably this condition(the if) was a quick way to disable the line. Perhaps the evaluation was once something real instead of an always-false.
See perl --help, perldoc -f eval and perldoc -f exec for information on the evaluation block itself.
The remaining trickyness (${1+"$#"}) I have no idea about. This isn't perl anyway; it's interpreted by whichever shell exec is launching (Correct me if I'm wrong on this!). If it's bash, I don't think it does anything at all and can be substituted with $#, which is the environment variable holding all commandline arguments (ie #ARGV in perl).

What is the exact meaning of the find2perl perl shebang + eval?

What exacly do the following?
#! /usr/bin/perl -w
eval 'exec /usr/bin/perl -S $0 ${1+"$#"}'
if 0; #$running_under_some_shell
the if 0 is never true, so the eval part will never executed,
and the eval is strange too - what is the value of $0 in this context (inside single quotes?)
Ps: taken from the result of the find2perl command
Best guess - as in this comment #$running_under_some_shell, it's to detect if the script is being run by some shell other than perl, e.g. bash.
the if 0 is never true, so the eval part will never executed,
Not by perl, no. By other shells such as bash it won't spot the line continuation and will just execute the eval statement. This then re-runs the script under perl. (Oddly with different options than the hashbang line.)
and the eval is strange too - what is the value of $0 in this context (inside single quotes?)
Again, this will be expanded by bash not perl: here it means the path to find2perl to pass into the perl interpreter.
I found some discussion here:
http://www.perlmonks.org/?node_id=825147
The extended hashbang is there so you
can run your Perl script with almost
any /bin/sh under the sun, even a
shell/kernel that does not honor the
hashbang and it will still launch perl
in the end.

perl: force the use of command line flags?

I often write one-liners on the command line like so:
perl -Magic -wlnaF'\t' -i.orig -e 'abracadabra($_) for (#F)'
In order to scriptify this, I could pass the same flags to the shebang line:
#!/usr/bin/perl -Magic -wlnaF'\t' -i.orig
abracadabra($_) for (#F);
However, there's two problems with this. First, if someone invokes the script by passing it to perl directly (as 'perl script.pl', as opposed to './script.pl'), the flags are ignored. Also, I can't use "/usr/bin/env perl" for this because apparently I can't pass arguments to perl when calling it with env, so I can't use a different perl installation.
Is there anyway to tell a script "Hey, always run as though you were invoked with -wlnaF'\t' -i.orig"?
You're incorrect about the perl script.pl version; Perl specifically looks for and parses options out of a #! line, even on non-Unix and if run as a script instead of directly.
The #! line is always examined for switches as the line is being
parsed. Thus, if you're on a machine that allows only one argument
with the #! line, or worse, doesn't even recognize the #! line, you
still can get consistent switch behavior regardless of how Perl was
invoked, even if -x was used to find the beginning of the program.
(...)
Parsing of the #! switches starts wherever perl is mentioned in the
line. The sequences "-*" and "- " are specifically ignored so that you
could, if you were so inclined, say
#!/bin/sh
#! -*-perl-*-
eval 'exec perl -x -wS $0 ${1+"$#"}'
if 0;
to let Perl see the -p switch.
Now, the above quote expects perl -x, but it works just as well if you start the script with
#! /usr/bin/env perl -*-perl -p-*-
(with enough characters to get past the 32-character limit on systems with that limit; see perldoc perlrun for details on that and the rest of what I quoted above).
I had the same problem with #!env perl -..., and env ended up being helpful:
$ env 'perl -w'
env: ‘perl -w’: No such file or directory
env: use -[v]S to pass options in shebang lines
So, just modify the shebang to #!/usr/bin/env -S perl -...