Perl ambiguous command line options, and security implications of eval with -i? - perl

I know this is incorrect. I just want to know how perl parses this.
So, I'm playing around with perl. What I wanted was perl -ne; what I typed was perl -ie. The behavior was kind of interesting, and I'd like to know what happened.
$ echo 1 | perl -ie'next unless /g/i'
So perl Aborted (core dumped) on that. Reading perl --help I see -i takes an extension for backups.
-i[extension] edit <> files in place (makes backup if extension supplied)
For those that don't know -e is just eval. So I'm thinking one of three things could have happened either it was parsed as
perl -i -e'next unless /g/i' i gets undef, the rest goes as argument to e
perl -ie 'next unless /g/i' i gets the argument e, the rest is hanging like a file name
perl -i"-e'next unless /g/i'" whole thing as an argument to i
When I run
$ echo 1 | perl -i -e'next unless /g/i'
The program doesn't abort. This leads me to believe that 'next unless /g/i' is not being parsed as a literal argument to -e. Unambiguously the above would be parsed that way and it has a different result.
So what is it? Well playing around with a little more, I got
$ echo 1 | perl -ie'foo bar'
Unrecognized switch: -bar (-h will show valid options).
$ echo 1 | perl -ie'foo w w w'
... works fine guess it reads it as `perl -ie'foo' -w -w -w`
Playing around with the above, I try this...
$ echo 1 | perl -ie'foo e eval q[warn "bar"]'
bar at (eval 1) line 1.
Now I'm really confused.. So how is Perl parsing this? Lastly, it seems you can actually get a Perl eval command from within just -i. Does this have security implications?
$ perl -i'foo e eval "warn q[bar]" '

Quick answer
Shell quote-processing is collapsing and concatenating what it thinks is all one argument. Your invocation is equivalent to
$ perl '-ienext unless /g/i'
It aborts immediately because perl parses this argument as containing -u, which triggers a core dump where execution of your code would begin. This is an old feature that was once used for creating pseudo-executables, but it is vestigial in nature these days.
What appears to be a call to eval is the misparse of -e 'ss /g/i'.
First clue
B::Deparse can be your friend, provided you happen to be running on a system without dump support.
$ echo 1 | perl -MO=Deparse,-p -ie'next unless /g/i'
dump is not supported.
BEGIN { $^I = "enext"; }
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined(($_ = <ARGV>))) {
chomp($_);
(('ss' / 'g') / 'i');
}
So why does unle disappear? If you’re running Linux, you may not have even gotten as far as I did. The output above is from Perl on Cygwin, and the error about dump being unsupported is a clue.
Next clue
Of note from the perlrun documentation:
-u
This switch causes Perl to dump core after compiling your program. You can then in theory take this core dump and turn it into an executable file by using the undump program (not supplied). This speeds startup at the expense of some disk space (which you can minimize by stripping the executable). (Still, a "hello world" executable comes out to about 200K on my machine.) If you want to execute a portion of your program before dumping, use the dump operator instead. Note: availability of undump is platform specific and may not be available for a specific port of Perl.
Working hypothesis and confirmation
Perl’s argument processing sees the entire chunk as a single cluster of options because it begins with a dash. The -i option consumes the next word (enext), as we can see in the implementation for -i processing.
case 'i':
Safefree(PL_inplace);
[Cygwin-specific code elided -geb]
{
const char * const start = ++s;
while (*s && !isSPACE(*s))
++s;
PL_inplace = savepvn(start, s - start);
}
if (*s) {
++s;
if (*s == '-') /* Additional switches on #! line. */
s++;
}
return s;
For the backup file’s extension, the code above from perl.c consumes up to the first whitespace character or end-of-string, whichever comes first. If characters remain, the first must be whitespace, so the code skips it; if the next character is a dash, it skips that too. In Perl, you might write this logic as
if ($$s =~ s/^i(\S*)(?:\s-?)?//) {
    my $extension = $1;
    return $extension;
}
Then, all of -u, -n, -l, and -e are valid Perl options, so argument processing eats them and leaves the nonsensical
ss /g/i
as the argument to -e, which perl parses as a series of divisions. But before execution can even begin, the archaic -u causes perl to dump core.
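As a rough illustration (not a port of perl.c), the walk through the cluster can be emulated in a few lines of Perl. The helper name unbundle and the short list of flag switches it knows about are assumptions made for this sketch; real perl recognises many more switches and edge cases.

```perl
use strict;
use warnings;

# Hypothetical helper: a rough emulation of how perl walks one bundled
# argument such as '-ienext unless /g/i'. Only -i, -e and a few flag
# switches are modelled here.
sub unbundle {
    my ($arg) = @_;
    my @found;
    $arg =~ s/^-// or return;              # a cluster starts with a dash
    while (length $arg) {
        if ($arg =~ s/^i(\S*)(?:\s-?)?//) {
            push @found, [ i => $1 ];      # -i stops at the first whitespace
        }
        elsif ($arg =~ s/^e//) {
            push @found, [ e => $arg ];    # -e swallows the rest of the argument
            $arg = '';
        }
        elsif ($arg =~ s/^([unlwp])//) {
            push @found, [ $1 => undef ];  # plain flag switches
        }
        else {
            push @found, [ unrecognized => $arg ];
            last;
        }
    }
    return @found;
}

for my $pair (unbundle('-ienext unless /g/i')) {
    my ($switch, $value) = @$pair;
    print defined $value ? "-$switch => '$value'\n" : "-$switch\n";
}
```

For the argument from the question this reports -i with the value enext, then -u, -n and -l, and finally -e with the value ss /g/i, which matches the Deparse output shown earlier.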
Unintended behavior
An even stranger bit is if you put two spaces between next and unless
$ perl -ie'next  unless /g/i'
the program attempts to run. Back in the main option-processing loop we see
case '*':
case ' ':
while( *s == ' ' )
++s;
if (s[0] == '-') /* Additional switches on #! line. */
return s+1;
break;
The extra space terminates option parsing for that argument. Witness:
$ perl -ie'next  nonsense -garbage --foo' -e die
Died at -e line 1.
but without the extra space we see
$ perl -ie'next nonsense -garbage --foo' -e die
Unrecognized switch: -onsense -garbage --foo (-h will show valid options).
With an extra space and dash, however,
$ perl -ie'next  -unless /g/i'
dump is not supported.
Design motivation
As the comments indicate, the logic is there for the sake of harsh shebang (#!) line constraints, which perl does its best to work around.
Interpreter scripts
An interpreter script is a text file that has execute permission enabled and whose first line is of the form:
#! interpreter [optional-arg]
The interpreter must be a valid pathname for an executable which is not itself a script. If the filename argument of execve specifies an interpreter script, then interpreter will be invoked with the following arguments:
interpreter [optional-arg] filename arg...
where arg... is the series of words pointed to by the argv argument of execve.
For portable use, optional-arg should either be absent, or be specified as a single word (i.e., it should not contain white space) …

Three things to know:
1. '-x y' means -xy to Perl (for some arbitrary options "x" and "y").
2. -xy, as is common for Unix tools, is a "bundle" representing -x -y.
3. -i, like -e, absorbs the rest of the argument. Unlike -e, it considers a space to be the end of the argument (as per #1 above).
That means
-ie'next unless /g/i'
which is just a fancy way of writing
'-ienext unless /g/i'
unbundles to
-ienext -u -n -l '-ess /g/i'
  ^^^^^             ^^^^^^^
----------        ----------
val for -i        val for -e
perlrun documents -u as:
This switch causes Perl to dump core after compiling your program. You can then in theory take this core dump and turn it into an executable file by using the undump program (not supplied). This speeds startup at the expense of some disk space (which you can minimize by stripping the executable). (Still, a "hello world" executable comes out to about 200K on my machine.) If you want to execute a portion of your program before dumping, use the dump() operator instead. Note: availability of undump is platform specific and may not be available for a specific port of Perl.

Related

Run a sed search and replace inside perl

I am trying to test the code snippet below for a bigger script that I am writing. However, I can't get the search working with parentheses and variables.
Appreciate any help someone can give me.
Code snippet:
#!/usr/bin/perl
$file="test4.html";
$Search="Help (Test)";
$Replace="Testing";
print "/usr/bin/sed -i cb 's/$Search/$Replace/g' $file\n";
`/usr/bin/sed -i cb 's/$Search/$Replace/g' $file`;
Thanks,
Ash
The syntax to run a command in a child process and wait for its termination in perl is system "cmd", "arg1", "arg2",...:
#!/usr/bin/perl
$file="test4.html";
$Search="Help (Test)";
$Replace="Testing";
print "/usr/bin/sed -icb -e 's/$Search/$Replace/g' -- $file\n";
system "/usr/bin/sed", "-icb", "-e", "s/$Search/$Replace/g", "--", $file;
(error checking left as an exercise, see perldoc -f system for details)
Note that -i is not a standard sed option. The few implementations that support it (yours must be the FreeBSD one as you've separated the cb backup extension from -i) have actually copied it from perl! It does feel a bit silly to be calling sed from perl here.
Looking at your approach:
The `...` operator itself is reminiscent of the equivalent `...` shell operator. In perl, what's inside is evaluated as if inside double quotes, in that $var, @var... perl variables are expanded, and a shell is started with -c and the resulting string as arguments and with its stdout redirected to a pipe.
The shell interprets that argument as code in the shell syntax. Perl reads the output of that inline shell script from the other end of the pipe and that makes up the expansion of `...`. Same as in shell command substitution except that there is no stripping of zero bytes or of trailing newlines.
sed -i produces no output, so it's pointless to try and capture its output with `...` here.
Now in your case, the code that sh is asked to interpret is:
/usr/bin/sed -i cb 's/Help (Test)/Testing/g' test4.html
That should work fine on FreeBSD or macOS at least. If $file had been test$(reboot).html, that would have been worse though.
Here, because you have the contents of variables that end up interpreted as code in an interpreter (here sh), you have a potential arbitrary command injection vulnerability.
In the system approach, we remove sh, so that particular vulnerability is removed. However sed is also an interpreter of some language. That language is not as omnipotent as that of sh, but for instance sed can write to arbitrary files with its w command. The GNU implementation (which you don't seem to be using) can run arbitrary commands as well.
So you still potentially have a code injection vulnerability in the case of $Search or $Replace coming from an external source.
If that's the case, you'd need to make sure you properly sanitise those values before running sed. See for instance: How to ensure that string interpolated into `sed` substitution escapes all metachars
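Since calling sed from perl is arguably redundant here, one alternative is to do the substitution in Perl itself. The following is only a sketch of that swapped-in approach, not what the question's script does: the helper name replace_in_file is made up, and unlike sed -i cb it keeps no backup copy. \Q...\E quotes regex metacharacters so that "Help (Test)" is matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: literal search-and-replace in a file without sed.
# Assumption: overwriting the file with no backup is acceptable.
sub replace_in_file {
    my ($file, $search, $replace) = @_;

    open my $in, '<', $file or die "Cannot read $file: $!";
    my $content = do { local $/; <$in> };    # slurp the whole file
    close $in;
    $content = '' unless defined $content;

    # \Q...\E escapes metacharacters, so parentheses match literally;
    # the replacement text is inserted as-is, not re-interpreted.
    my $count = ($content =~ s/\Q$search\E/$replace/g) || 0;

    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} $content;
    close $out or die "Cannot close $file: $!";

    return $count;                           # number of replacements made
}

# Example call with the values from the question:
# replace_in_file('test4.html', 'Help (Test)', 'Testing');
```

No shell and no sed script are involved, so the contents of the search and replacement strings are never interpreted as code.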

Inserting headers into multiple files

I found some command line with Perl that inserts headers into my files without going through the tedious process of inserting them one by one. Can someone walk me through the Perl aspect of this command line? I'm new to this and can't seem to find the right explanations for what I wrote.
cat header.txt | perl -0 -i -pe 'BEGIN{$h = <STDIN>}; print $h' 1*
-e
rather than provide a script in a xxxx.pl file, provide it on the command line
-p
makes it iterate over filename arguments somewhat like sed but also prints the contents of $_ at the end of the script.
the two above are combined in -pe
-i
indicate you want to edit the file in place and write the output to the same file. In practice, Perl renames the input file and reads from this renamed version while writing to a new file with the original name
-0
redefines the end of record character (\n by default) so that you can read the entire input file as a single line
1*
is the command line argument to your script, so I guess you are modifying any file with a name that starts with 1 (you could have used *.c, or whatever depending on the type of files you are trying to modify)
print $h
prints the variable $h that is the "main" of your script. if it was initialized with the content of the header file (the intent of this one-liner) then it will print the header file
BEGIN{ some code here }
this is stuff you execute before the script starts. this is where I'm stumped. this doesn't seem like valid perl code
so basically:
this will supposedly slurp the entire header file (because of -0) in the BEGIN block and store it in the variable $h
iterate over all the files specified by the wildcards at the end of the command line
for each file: print the header (print $h) then print the file itself (because of -pe)
so it's equivalent to spelling the script out:
$h = <STDIN>; # gets the content of the entire header file
while (<>) { # loop implied by -pe, iterates over all the 1* files
    # the main contents of the "-e" script are inserted below as part of executing -pe
    print $h; # print the header we saved
    print $_; # implied by -pe, and since we are using -0, this prints the entire content in one shot
    # end of the "-e" script. again it was a single print $h statement, the second print is implied by -pe
}
It's a bit hard to explain, take a look at the perlrun documentation for details (run man perlrun).
This is not a 100% complete explanation because I don't think the BEGIN block is right. I tried it on my Ubuntu machine and it complained about its syntax too.
Here's something similar, with an explanation. The program in the question doesn't run on my mac.
I needed to add the #nullable disable directive to the top of all my csharp files as part of migrating to nullable reference types.
perl -w -i -p -0777 -e 's/^/#nullable disable\n\n/' $(find . -iname '*.cs')
-w enable warnings
-i edit files in place
-p read each file block by block, printing each block after applying a perl expression. the default block size is one line
-0777 changes the default block size to the entire file
-e the perl expression to execute
The final argument uses shell command substitution to create a list of files. It passes that list of file paths to the perl command. The find command searches for files that end in .cs.
The perl program is a single substitution command. It matches the very beginning of the block and replaces (prepends, really) with "#nullable disable" and a couple new-lines.
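For comparison, here is roughly what -i, -p and -0777 set up, written out as a script. This is a sketch under assumptions: the helper name prepend_header is made up, and it keeps a ".orig" backup of each file (the one-liner above keeps none). It relies on the documented $^I variable and on <> reading the files named in @ARGV.

```perl
use strict;
use warnings;

# Hypothetical helper: prepend $header to each named file, editing in place.
# Assumption: a ".orig" backup of every file is acceptable.
sub prepend_header {
    my ($header, @files) = @_;
    return unless @files;                    # an empty @ARGV would read STDIN
    local ($^I, @ARGV) = ('.orig', @files);  # what -i.orig sets up
    local $/;                                # slurp mode, like -0777
    while (<>) {                             # the loop that -p wraps around the code
        print $header, $_;                   # print goes to the file being edited
    }
    return;
}

# Example call (the glob pattern is a placeholder):
# prepend_header("#nullable disable\n\n", glob '*.cs');
```

Because each file is read as a single record, the loop body runs once per file, so the header is printed exactly once at the top.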

Why is Perl complaining about an unterminated string here?

I have a Perl script which runs fine under Perl 5.005 on HPUX 11i v3 and that causes a small problem under Perl 5.10 on my Ubuntu 11.04 box.
It boils down to a line like:
open (CMD, "do something | grep -E 'sometext$' |");
(it's actually capturing ps -ef output to see if a process is running but I don't think that's important (a)).
Now this runs fine on the HPUX environment but, when I try it out under Ubuntu, I get:
sh: Syntax error: Unterminated quoted string
By inserting copious debug statements, I tracked it down to that offending line then started removing characters one-by-one until it stopped complaining. Luckily, $ was the first one I tried and it stopped giving me the error, so I then changed the line to:
open (CMD, "do something | grep -E 'sometext\$' |");
and it worked fine (under Linux anyway - I haven't tested that on HPUX since I don't have access to that machine today - if it does work, I'll just use that approach but I'd still like to know why it's a problem).
So it seems obvious that the $ is "swallowing" the closing single quote under my Linux environment but not apparently on the HPUX one.
My question is simply, why? Surely there weren't any massive changes between 5.005 and 5.10. Or is there some sort of configuration item I'm missing?
(a) But, if you know a better way to do this without external CPAN modules (ie, with just the baseline Perl 5.005 installation), I'd be happy to know about it.
$' is a special variable (see perldoc perlvar). 5.005 was many versions ago, so it's possible that something has changed in the regexp engine to make this variable different (although it appears to be in 5.005 also)
As for the better way, you could at least only run the 'ps -ef' in a pipeline and do the 'grep' in perl.
Use the following!!!
use strict;
use warnings;
You would have gotten
Use of uninitialized value $' in concatenation (.) or string
A sigil followed by any punctuation symbol (on a standard keyboard) is a variable in Perl, regardless of if it is defined or not. So in a double quoted string [$@][symbol] will always be read as one token and interpolated unless the sigil is escaped.
I have a feeling that the difference you are seeing has to do with different system shells rather than different versions of perl.
Consider your line:
open (CMD, "do something | grep -E 'sometext$' |");
When perl sees that, it will interpolate the empty $' variable into the double quoted string, so the string becomes:
open (CMD, "do something | grep -E 'sometext |");
At that point, your shell gets to process a line that looks like:
do something | grep -E 'sometext
And if it succeeds or fails will depend on the shell's rules regarding unterminated strings (some shells will complain loudly, others will automatically terminate the string at eof).
If you were using the warnings pragma in your script, you probably would have gotten a warning about interpolating an undefined variable.
A shorter and cleaner way to read in the output of ps would be:
my @lines = grep /sometext$/, `ps -ef`;
Or using an explicit open:
my @lines = grep /sometext$/, do {
    open my $fh, '-|', 'ps -ef' or die $!;
    <$fh>
};
Because $' is a special variable in recent versions of Perl.
From the official documentation (perlvar):
$':
The string following whatever was matched by the last successful
pattern match (not counting any matches hidden within a BLOCK or
eval() enclosed by the current BLOCK).
If there were no successful pattern matches, $' is empty and your statement essentially interpolates to
open (CMD, "do something | grep -E 'sometext |");
Escaping the dollar sign (the solution that works for you on Linux) should work on HPUX too.
I'm not sure when this variable was added, but I can confirm that it exists in Perl 5.8.5. What's New for Perl 5.005 mentions $' (not as a new feature), so I think it was already there before then.
You should probably use single-quotes rather than double quotes around the string since there is nothing in the string that should be interpolated:
open (CMD, q{do something | grep -E 'sometext$' |});
Better would be to use the 3-argument form of open with a lexical file handle:
open my $cmd, '-|', q{do something | grep -E 'sometext$'} or die 'a horrible death';
I don't have a good explanation for why $' is being recognized as the special variable in 5.10 and yet it was not in 5.005. That is unexpected.
Is there a really good reason you can't upgrade to something like 5.14.1? Even if you don't change the system-provided Perl, there's no obvious reason you can't install a recent version in some other location and use that for all your scripting work.
$' is a special variable.
If you want to avoid variable expansion, just use q():
open (CMD, q(do something | grep -E 'sometext$' |));
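The difference between the quoting styles discussed in this thread can be seen without running any external command. A small sketch (the variable names are just for illustration) that builds the same string three ways and prints what the shell would receive:

```perl
use strict;
use warnings;

# Double quotes: perl sees the punctuation variable $' (post-match text),
# which is undefined here because no pattern match has happened yet.
my $interpolated = do {
    no warnings 'uninitialized';
    "grep -E 'sometext$' |";
};

# Escaping the sigil, or using the non-interpolating q{} quotes, keeps the $.
my $escaped = "grep -E 'sometext\$' |";
my $literal = q{grep -E 'sometext$' |};

print "interpolated: $interpolated\n";
print "escaped:      $escaped\n";
print "literal:      $literal\n";
```

The first line printed has lost both the dollar sign and the closing single quote, which is the unterminated string the shell complains about; the other two are intact.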

Could someone tell me what this means in Perl

I'm new to Perl and was hoping someone could tell me what this means exactly
eval 'exec ${PERLHOME}/bin/perl -S $0 ${1+"$@"}' # -*- perl -*-
if 0;
This is explained in perldoc perlrun:
-S
makes Perl use the PATH environment variable to search for the program
unless the name of the program contains path separators.
...
Typically this is used to emulate #! startup on platforms that don't
support #! . It's also convenient when debugging a script that uses
#! , and is thus normally found by the shell's $PATH search
mechanism.
This example works on many platforms that have a shell compatible with
Bourne shell:
#!/usr/bin/perl
eval 'exec /usr/bin/perl -wS $0 ${1+"$@"}'
if $running_under_some_shell;
The system ignores the first line and feeds the program to /bin/sh,
which proceeds to try to execute the Perl program as a shell script.
The shell executes the second line as a normal shell command, and thus
starts up the Perl interpreter. On some systems $0 doesn't always
contain the full pathname, so the -S tells Perl to search for the
program if necessary. After Perl locates the program, it parses the
lines and ignores them because the variable
$running_under_some_shell is never true. If the program will be
interpreted by csh, you will need to replace ${1+"$@"} with $* ,
even though that doesn't understand embedded spaces (and such) in the
argument list. To start up sh rather than csh, some systems may
have to replace the #! line with a line containing just a colon,
which will be politely ignored by Perl.
In short, it mimics shebang behavior for platforms that have shells compatible with the Bourne shell.
It's valid both as shell script and as a Perl program. It is used to run the Perl interpreter after all on systems where the shebang doesn't work, for some reason. It's rarely seen these days but used to be common in the early 1990s.
The comment is just a comment, but it has special meaning in Emacs, which will open the file in perl mode.
I just read @Zaid's response, which is better and more correct than mine as long as this code is on the first line of the script being executed, and no shebang exists. I've never seen this kind of substitute. Quite interesting, really.
The second line, if 0; is a part of the first line. You can tell since the first line lacks a ;. It would be more obvious if this was one long single line with the comment being after the semicolon.
So it's equivalent to:
if (0) {
    eval 'exec ${PERLHOME}/bin/perl -S $0 ${1+"$@"}';
}
In perl, 0 evaluates to false, and so the eval clause will never execute. Presumably this condition (the if) was a quick way to disable the line. Perhaps the condition was once something real instead of an always-false value.
See perl --help, perldoc -f eval and perldoc -f exec for information on the evaluation block itself.
The remaining trickiness (${1+"$@"}) I have no idea about. This isn't perl anyway; it's interpreted by whichever shell exec is launching (Correct me if I'm wrong on this!). If it's bash, I don't think it does anything at all and can be substituted with $@, which is the shell parameter holding all command-line arguments (i.e. @ARGV in perl).
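The Perl half of the trick is easy to check in isolation. A minimal sketch (the variable name and the string contents are made up for the demonstration): the string eval and the if 0 on the following line form one statement, and because the condition is always false the string is never compiled, so it does not even have to be valid Perl.

```perl
use strict;
use warnings;

my $evaluated = 0;

# One Perl statement spread over two lines: a string eval guarded by a
# statement modifier that is always false, so the string is never compiled.
eval 'this is shell, not Perl; $evaluated = 1'
    if 0;

print "evaluated: $evaluated\n";
```

This prints "evaluated: 0". A Bourne-compatible shell reading the same file would instead run the first line as a command and never reach the second.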

How do I use Perl on the command line to search the output of other programs?

As I understand (Perl is new to me) Perl can be used to script against a Unix command line. What I want to do is run (hardcoded) command line calls, and search the output of these calls for RegEx matches. Is there a way to do this simply in Perl? How?
EDIT: Sequence here is:
-Call another program.
-Run a regex against its output.
my $command = "ls -l /";
my @output = `$command`;
for (@output) {
print if /^d/;
}
The qx// quasi-quoting operator (for which backticks are a shortcut) is stolen from shell syntax: run the string as a command in a new shell, and return its output (as a string or a list, depending on context). See perlop for details.
You can also open a pipe:
open my $pipe, "$command |";
while (<$pipe>) {
# do stuff
}
close $pipe;
This allows you to (a) avoid gathering the entire command's output into memory at once, and (b) gives you finer control over running the command. For example, you can avoid having the command be parsed by the shell:
open my $pipe, '-|', @command, '< single argument not mangled by shell >';
See perlipc for more details on that.
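Putting the two steps from the question together (call a program, then run a regex against its output), a sketch using the list form of the pipe open might look like this. The helper name grep_command is made up, and the list form is assumed to be available, which it is on Unix-like systems with any reasonably modern perl.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without involving a shell and
# return the output lines that match $pattern.
sub grep_command {
    my ($pattern, @command) = @_;
    open my $pipe, '-|', @command
        or die "Cannot run $command[0]: $!";
    my @matches = grep { /$pattern/ } <$pipe>;
    close $pipe
        or warn "$command[0] exited with status $?\n";
    return @matches;
}

# Same example as above: list the directories directly under /
print grep_command(qr/^d/, 'ls', '-l', '/');
```

Because the command and its arguments are passed as a list, none of them is re-parsed by a shell, so spaces or metacharacters in an argument are passed through untouched.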
You might be able to get away without Perl, as others have mentioned. However, if there is some Perl feature you need, such as extended regex features or additional text manipulation, you can pipe your output to perl then do what you need. Perl's -e switch lets you specify the Perl program on the command line:
command | perl -ne 'print if /.../'
There are several other switches you can pass to perl to make it very powerful on the command line. These are documented in perlrun. Also check out some of the articles in Randal Schwartz's Unix Review column, especially his first article for them. You can also google for Perl one liners to find lots of examples.
Do you need Perl at all? How about
command -I use | grep "myregexp" && dosomething
right in the shell?
#!/usr/bin/perl
sub my_action() {
print "Implement some action here\n";
}
open PROG, "/path/to/your/command|" or die $!;
while (<PROG>) {
/your_regexp_here/ and my_action();
print $_;
}
close PROG;
This will scan output from your command, match regexps and do some action (which now is printing the line)
In Perl you can use backticks to execute commands on the shell. Here is a document on using backticks. I'm not sure about how to capture the output, but I'm sure there's more than one way to do it.
You indeed use a one-liner in a case like this. I recently coded up one that I use, among other ways, to produce output which lists the directory structure present in a .zip archive (one dir entry per line). So using that output as an example of command output that we'd like to filter, we could put a pipe in and then use perl with the -n -e flags to filter the incoming data (and/or do other things with it):
[command_producing_text_output] | perl -MFile::Path -n -e ^
"BEGIN{@PTM=()} if (m{^perl/(bin|lib(?!/site))}) {chomp;push @PTM,$_}" ^
-e "END{@WDD=mkpath (\@PTM,1);" ^
-e "printf qq/Created %u dirs to reflect part of structure present in the .ZIP file\n/, scalar(@WDD);}"
The shell syntax used, including quoting of perl code and escaping of newlines, reflects CMD.exe usage in Windows NT-like consoles. If you need to, mentally replace
"^" with "\" and " with ' in the appropriate places.
The one-liner above adds only the directory names that start with "perl/bin" or
"perl/lib" (not followed by "/site"); it then creates those directories. You wind
up with an (empty) tree that you can use for whatever evil purposes you desire.
The main point is to illustrate that there are flags available (-n, -p) to
allow perl to loop over each input record (line), and that what you can do is unlimited in terms of complexity.