Security concerns with Perl /e modifier

Let's say that I have a Perl script containing a substitution command which takes the replacement string as a positional parameter and uses the /e modifier:
perl -pe 's/abc/$ARGV[0]/ge;'
Are there any security concerns with this approach? I mean is it possible to give such positional parameter value which causes perl to execute an unwanted function? I mean something similar: perl -pe 's/abc//;unlink ("/tmp/file");'.

In perldoc perlop, in the section Regexp Quote-Like Operators, it explains
e Evaluate the right side as an expression
ee Evaluate the right side as a string then eval the result.
But this isn't entirely true. In both cases the "right side"—the replacement—is evaluated as if it were a do block†. In the first case the result provides the replacement string, while in the second the result is passed to eval and the result of that provides the replacement string. There is no distinction whereby the replacement is evaluated as an "expression" in the first place and as a "string" in the second.
Both /e and /ee allow for any valid Perl code sequence, including loops, conditionals, and multiple statements, and aren't limited to a single expression
There's never anything wrong with $ARGV[0] in isolation. Tainted strings become dangerous only if you execute them, either as Perl, using eval, or as shell code using system, qx//, or backticks. So it's fine in the replacement part of a substitution with a single /e modifier
But if you use something else in the replacement, for instance
perl -pe 's/abc/qx{$ARGV[0]}/eg'
then that parameter will be executed as a shell command, so it clearly isn't safe. But then nor is
perl -pe 's/abc/unlink glob "*.*"/eg'
so you have to be sensible about it
What is dangerous is the double-e modifier /ee, which treats the replacement as a Perl do block and then does an eval on the result. So something like
s/abc/$ARGV[0]/eeg
is very unsafe, because you could run your code like this
perl -pe 's/abc/$ARGV[0]/eeg' 'unlink glob *.*'
With just a single /e this would just replace abc with the string
unlink glob *.*
in $ARGV[0]. But using /ee, the string is passed to eval and all your files are deleted!
Remember this:
/e — replacement is an expression (a do block)
/ee — replacement is an expression (a do block) and the result is passed to eval
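To make the difference concrete, here is a minimal sketch with a harmless payload. The variable $untrusted is a hypothetical stand-in for $ARGV[0]; nothing is deleted, the string is just arithmetic.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an untrusted command-line argument.
my $untrusted = '2 + 3';

# /e: the replacement is code compiled along with the program; its result
# (the string held in $untrusted) is inserted as-is.
(my $single = 'abc') =~ s/abc/$untrusted/e;

# /ee: the result of that code is then passed to eval, so the string
# itself is run as Perl.
(my $double = 'abc') =~ s/abc/$untrusted/ee;

print "single: $single\n";   # single: 2 + 3
print "double: $double\n";   # double: 5
```

With /e the argument stays data; with /ee it becomes code, which is exactly where an argument such as an unlink call would do damage.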
† This is why I choose to use braces to delimit substitutions that use one of the /e modes. With s{abc}{ $ARGV[0] }ge the replacement looks much more like the block of code that it is than if I had used the usual slashes.

Unlike /ee, there's no inherent risk to /e as it doesn't invoke the Perl parser. It simply causes code in the source file to be evaluated, just like how map BLOCK LIST and for (LIST) BLOCK evaluate their BLOCK.
Note that
s{$foo}{$bar}g
is simply short for
s{$foo}{ qq{$bar} }eg
So if you're ok with
perl -pe's/abc/$ARGV[0]/g'
then you're ok with
perl -pe's/abc/"$ARGV[0]"/eg'
and the virtually identical
perl -pe's/abc/$ARGV[0]/eg'
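A quick way to convince yourself of that equivalence is to run all three forms over the same input. This is only a sketch; $payload is a made-up value standing in for $ARGV[0], chosen to look as much like code as possible.

```perl
use strict;
use warnings;

# Hypothetical argument that looks like Perl code. Single quotes keep it
# a plain string here.
my $payload = '$x; @{[ 1+1 ]}';

(my $plain  = 'abc abc') =~ s/abc/$payload/g;     # ordinary interpolation
(my $quoted = 'abc abc') =~ s/abc/"$payload"/eg;  # /e around a quoted string
(my $bare   = 'abc abc') =~ s/abc/$payload/eg;    # /e around the bare variable

print "$_\n" for $plain, $quoted, $bare;
```

All three lines print the same text, with the payload copied in verbatim and never interpreted.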

Related

perl: upper case evaluation for subroutine

I need to pass a system variable in upper case to a Perl subroutine.
For example, if the variable named VARNAME has the value 'super', I need to pass "SUPER_MAN".
In general, uc converts a string to upper case, as in the example below:
perl -e 'print uc"$ENV{VARNAME}\n"'
But when passing it to a subroutine, the uc call has to be part of the argument and be evaluated at runtime. To emulate that I tried the following, but it does not work. Where am I going wrong?
perl -e 'print ".uc($ENV{VARNAME})_MAN\n"'
.uc(super)_MAN
Alternative approaches are also welcome.
Take the uc out of the quotes "", since perl thinks you want the literal letters uc:
FOO=abc perl -e 'print "." . uc($ENV{FOO}) . "_MAN\n"'
.ABC_MAN
perldoc perlop - Quote and Quote like Operators
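Since alternatives were asked for, here is a short sketch of three ways to build the string. VARNAME is set inside the script only to keep the example self-contained; normally it would come from the environment.

```perl
use strict;
use warnings;

# Set here only so the sketch runs on its own.
$ENV{VARNAME} = 'super';

# 1. Concatenation: call uc outside the quotes.
my $concat = uc($ENV{VARNAME}) . "_MAN";

# 2. The \U ... \E escapes upper-case everything between them inside
#    a double-quoted string.
my $escape = "\U$ENV{VARNAME}\E_MAN";

# 3. Interpolate an arbitrary expression through an anonymous array.
my $inline = "@{[ uc $ENV{VARNAME} ]}_MAN";

print "$_\n" for $concat, $escape, $inline;   # SUPER_MAN three times
```

Any of these values can be passed straight to a subroutine as an argument.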

What is this backtick at the beginning of a directory name? (perl)

I am trying to understand a program. Correct me if I'm wrong, but backticks are used to execute commands in a shell, so I'm not sure what its purpose is in the following code:
my $end = $` if $dir =~ m/\/foldername/;
foreach my $folder (@dirs_) {
my $start_from = "$dir" . "\/" . "$folder";
my $move_to = "$end" . "\/" . "$folder";
rmtree $move_to;
dircopy($start_from, $move_to);
}
It's not very pretty, is it?
The $` variable is one of the trinity $`, $& and $' which represent the pre-match, match, and post-match parts of the last string that was subjected to a successful regex comparison
For instance, if I have
my $s = 'abcdef';
then after
$s =~ /c./;
you will find that $` is ab, $& is cd, and $' is ef
It's important to remember that, just like the capture variables $1, $2 etc., these three are unaffected by failed regex matches. (They are not set to undef.) So it's vital to check whether a regex pattern matched before using any of them
This is archaic Perl, maintained primarily for backward compatibility. It was a good idea at the time because Perl was keeping close to shell syntax (as were awk and sed, which still do). Nowadays it is best to use regex captures, or perhaps substr in conjunction with the newer @- and @+ arrays
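As a rough sketch of the substr approach, the same three pieces can be recovered from the match offsets that perlvar documents in @- and @+ ($-[0] is where the match starts, $+[0] is where it ends). The variable names are my own.

```perl
use strict;
use warnings;

my $s = 'abcdef';
my ($pre, $match, $post);

# Only touch the offsets after a successful match.
if ($s =~ /c./) {
    $pre   = substr($s, 0, $-[0]);               # what $` would hold
    $match = substr($s, $-[0], $+[0] - $-[0]);   # what $& would hold
    $post  = substr($s, $+[0]);                  # what $' would hold
}

print "$pre|$match|$post\n";   # ab|cd|ef
```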
All of the special built-in variables are documented in perldoc perlvar
The variable $` is a Perl special variable whose "English" name is $PREMATCH. From the perldoc website:
The string preceding whatever was matched by the last successful pattern match, not counting any matches hidden within a BLOCK or eval enclosed by the current BLOCK.
The $` is a regex-related special variable, containing the string preceding the last successful match. From perlvar
$`
The string preceding whatever was matched by the last successful pattern match, not counting any matches hidden within a BLOCK or eval enclosed by the current BLOCK.
See Performance issues above for the serious performance implications of using this variable (even once) in your code.
This variable is read-only and dynamically-scoped.
Mnemonic: ` often precedes a quoted string.
In this case it contains the part of the string in $dir that precedes the matched |/foldername|, if the match happened. Note that this line of code, with the conditional declaration, results in undefined behavior if there is no match.
The code in foreach is then meant to copy folders "$dir/$folder" one level up. However, if the match failed this code runs after the program got into an invalid state. So I would urge you to rewrite it, along the lines of: declare $end in a separate statement, then conditionally assign the match and enter the loop, or skip the loop (if the match fails and $end is undef).
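Something along these lines is what I have in mind. It is only a sketch: plan_moves is a hypothetical helper that works out the source and destination paths and leaves the actual rmtree/dircopy calls to the caller, and it uses a capture instead of $` so that a failed match is easy to detect.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [from, to] pairs, or an empty list when
# $dir does not contain "/foldername".
sub plan_moves {
    my ($dir, @folders) = @_;

    # Capture the text before the first "/foldername" (what $` held).
    my ($end) = $dir =~ m{^(.*?)/foldername}s;
    return unless defined $end;   # no match: nothing to do

    return map { [ "$dir/$_", "$end/$_" ] } @folders;
}

for my $pair (plan_moves('/data/foldername', 'one', 'two')) {
    my ($start_from, $move_to) = @$pair;
    print "$start_from -> $move_to\n";
    # rmtree $move_to; dircopy($start_from, $move_to);   # as in the original
}
```

When the pattern does not match, the loop body never runs, so nothing is removed or copied from a half-initialised state.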
Following the link in documentation quote above, to Performance issues
In Perl 5.20.0 a new copy-on-write system was enabled by default, which finally fixes all performance issues with these three variables, and makes them safe to use anywhere.
The "three variables" refers to $`, $&, and $'. Thanks to stevieb for this remark.
However, I suggest following the recommendation by Borodin to use modern tools and techniques.

Meaning of the <*> symbol

I've recently been exposed to a bit of Perl code, and some aspects of it are still elusive to me. This is it:
@collection = <*>;
I understand that the at-symbol defines collection as an array. I've also searched around a bit, and landed on perldoc, specifically at the part about I/O Operators. I found the null filehandle specifically interesting; code follows.
while (<>) {
...
}
On the same topic I have also noticed that this syntax is also valid:
while (<*.c>) {
...
}
According to perldoc, it is actually calling an internal function that invokes glob in a manner similar to the following code:
open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<FOO>) {
...
}
Question
What does the less-than, asterisk, more-than (<*>) symbol mentioned on the first line actually do? Is it a reference to an internally open and referenced glob? Would it be a special case, such as the null filehandle? Or can it be something entirely different, like a legacy implementation?
<> (the diamond operator) is used in two different syntaxes.
<*.c>, <*> etc. is shorthand for the glob built-in function. So <*> returns a list of all files and directories in the current directory. (Except those beginning with a dot; use <* .*> for that).
<$fh> is shorthand for calling readline($fh). If no filehandle is specified (<>) the magical *ARGV handle is assumed, which is a list of files specified as command line arguments, or standard input if none are provided. As you mention, the perldoc covers both in detail.
How does Perl distinguish the two? It checks if the thing inside <> is either a bare filehandle or a simple scalar variable holding a filehandle (e.g. $fh). Otherwise, it calls glob() instead. This even applies to stuff like <$hash{$key}> or <$x > - it will be interpreted as a call to glob(). If you read the perldoc a bit further on, this is explained - and it's recommended that you use glob() explicitly if you're putting a variable inside <> to avoid these problems.
It collects all filenames in the current directory, except those beginning with a dot, and saves them in the array @collection. It's the same as:
@collection = glob "*";
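If you want to see this for yourself, the sketch below builds a scratch directory (the file names are made up) and compares the angle-bracket form with an explicit glob call.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $orig = getcwd();
my $tmp  = tempdir(CLEANUP => 1);

# Create a few empty files, one of them hidden.
for my $name (qw(a.c b.txt .hidden)) {
    open my $fh, '>', "$tmp/$name" or die "Cannot create $name: $!";
    close $fh;
}

chdir $tmp or die "Cannot chdir to $tmp: $!";
my @angle    = <*>;           # shorthand form
my @explicit = glob "*";      # same thing, spelled out
my @with_dot = glob "* .*";   # also picks up dot files (and typically . and ..)
chdir $orig or die "Cannot chdir back: $!";

print "angle:    @angle\n";
print "explicit: @explicit\n";
print "with dot: @with_dot\n";
```

The first two lists are identical and omit .hidden; only the third includes it.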

Should I escape shell arguments in Perl?

When using system() calls in Perl, do you have to escape the shell args, or is that done automatically?
The arguments will be user input, so I want to make sure this isn't exploitable.
If you use system $cmd, @args rather than system "$cmd @args" (an array rather than a string), then you do not have to escape the arguments because no shell is invoked (see system). system {$cmd} $cmd, @args will not invoke a shell either even if $cmd contains metacharacters and @args is empty (this is documented as part of exec). If the args are coming from user input (or other untrusted source), you will still want to untaint them. See -T in the perlrun docs, and the perlsec docs.
If you need to read the output or send input to the command, qx and readpipe have no equivalent. Instead, use open my $output, "-|", $cmd, @args or open my $input, "|-", $cmd, @args although this is not portable as it requires a real fork which means Unix only... I think. Maybe it'll work on Windows with its simulated fork. A better option is something like IPC::Run, which will also handle the case of piping commands to other commands, which neither the multi-arg form of system nor the 4 arg form of open will handle.
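Here is a small sketch of both list forms. To stay self-contained it runs the current Perl interpreter ($^X) as the child command, and the arguments are invented strings full of shell metacharacters; I have only reasoned about this on Unix-like systems.

```perl
use strict;
use warnings;

# Hypothetical hostile-looking arguments.
my @args = ('hello; echo injected', '$HOME', 'a b');

# Child program: print each argument on its own line.
my @cmd = ($^X, '-e', 'print "$_\n" for @ARGV');

# List form of system: no shell, arguments arrive untouched.
my $status = system @cmd, @args;

# List form of pipe open: same guarantee, and we can read the output.
open my $out, '-|', @cmd, @args or die "Cannot run child: $!";
chomp(my @lines = <$out>);
close $out;

print "got back: [$_]\n" for @lines;
```

Each argument comes back exactly as it was passed, so nothing was interpreted by a shell.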
On Windows, the situation is a bit nastier. Basically, all Win32 programs receive one long command-line string -- the shell (usually cmd.exe) may do some interpretation first, removing < and > redirections for example, but it does not split it up at word boundaries for the program. Each program must do this parsing themselves (if they wish -- some programs don't bother). In C and C++ programs, routines provided by the runtime libraries supplied with the compiler toolchain will generally perform this parsing step before main() is called.
The problem is, in general, you don't know how a given program will parse its command line. Many programs are compiled with some version of MSVC++, whose quirky parsing rules are described here, but many others are compiled with different compilers that use different conventions.
This is compounded by the fact that cmd.exe has its own quirky parsing rules. The caret (^) is treated as an escape character that quotes the following character, and text inside double quotes is treated as quoted if a list of tricky criteria are met (see cmd /? for the full gory details). If your command contains any strange characters, it's very easy for cmd.exe's idea of which parts of text are "quoted" and which aren't to get out of sync with your target program's, and all hell breaks loose.
So, the safest approach for escaping arguments on Windows is:
Escape arguments in the manner expected by the command-line parsing logic of the program you're calling. (Hopefully you know what that logic is; if not, try a few examples and guess.)
Join the escaped arguments with spaces.
Prefix every single non-alphanumeric character of the resulting string with ^.
Append any redirections or other shell trickery (e.g. joining commands with &&).
Run the command with system() or backticks.
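sub esc_chars {
# will change, for example, a!!a to a\!\!a
# work on a copy, one string at a time (s/// cannot be bound to an array)
my @escaped = @_;
s/([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g for @escaped;
return @escaped;
}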
http://www.slac.stanford.edu/slac/www/resource/how-to-use/cgi-rexx/cgi-esc.html
If you use system "$cmd @args" (a string), then you have to escape the arguments because a shell is invoked.
Fortunately, for double quoted strings, only four characters need escaping:
" - double quote
$ - dollar
@ - at symbol
\ - backslash
The answers to your question were very useful. In the end I followed @runrig's advice but then used open3() from the core module IPC::Open3 so I could capture the output from STDERR as well as STDOUT.
For sample code of open3() in use with @runrig's solution, see my related question and answer:
Calling system commands from Perl
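For readers who do not want to follow the link, this is roughly what an open3() call looks like. It is a minimal sketch and not the code from the linked answer: the child command is the current Perl interpreter with a made-up one-liner, and reading STDOUT fully before STDERR is only safe for small amounts of output.

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol qw(gensym);

# STDERR needs its own pre-created handle, otherwise it is merged
# into the STDOUT handle.
my $err = gensym;

# List form: the command and its arguments are passed separately.
my $pid = open3(my $in, my $out, $err,
    $^X, '-e', 'print "to stdout\n"; warn "to stderr\n"');
close $in;   # the child gets no input

my @stdout = <$out>;
my @stderr = <$err>;

waitpid $pid, 0;
my $exit = $? >> 8;

print "STDOUT: @stdout";
print "STDERR: @stderr";
print "exit code: $exit\n";
```

For commands that produce a lot of output on both streams, something like IO::Select or IPC::Run is the safer route.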

How to replace $*=1 with an alternative now $* is no longer supported

I'm a complete perl novice, am running a perl script using perl 5.10 and getting this warning:
$* is no longer supported at migrate.pl line 380.
Can anyone describe what $* did and what the recommended replacement of it is now?
Alternatively if you could point me to documentation that describes this that would be great.
The script I'm running is to migrate a source code database from vss to svn and can be found here:
http://www.x2systems.com/files/migrate.pl.txt
The two snippets of code that use it are:
$* = 1;
$/ = ':';
$cmd = $SSCMD . " Dir -I- \"$proj\"";
$_ = `$cmd`;
# what this next expression does is to merge wrapped lines like:
# $/DeviceAuthority/src/com/eclyptic/networkdevicedomain/deviceinterrogator/excep
# tion:
# into:
# $/DeviceAuthority/src/com/eclyptic/networkdevicedomain/deviceinterrogator/exception:
s/\n((\w*\-*\.*\w*\/*)+\:)/$1/g;
$* = 0;
and then some ways later on:
$cmd = $SSCMD . " get -GTM -W -I-Y -GL\"$localdir\" -V$version \"$file\" 2>&1";
$out = `$cmd`;
# get rid of stupid VSS warning messages
$* = 1;
$out =~ s/\n?Project.*rebuilt\.//g;
$out =~ s/\n?File.*rebuilt\.//g;
$out =~ s/\n.*was moved out of this project.*rebuilt\.//g;
$out =~ s/\nContinue anyway.*Y//g;
$* = 0;
many thanks,
Rory
From perlvar:
Use of $* is deprecated in modern Perl, supplanted by the /s and /m modifiers on pattern matching.
If you have access to the place where it's being matched just add it to the end:
$haystack =~ m/.../sm;
If you only have access to the string, you can surround the expression with
qr/(?ms-ix:$expr)/;
Or in your case:
s/\n((\w*\-*\.*\w*\/*)+\:)/$1/gsm;
From Perl 5.8 version of perlvar:
Set to a non-zero integer value to do multi-line matching within a string [...] Use of $* is deprecated in modern Perl, supplanted by the /s and /m modifiers on pattern matching.
While using /s and /m is much better, you need to set the modifiers (appropriately!) for each regular expression.
perlvar also says "This variable influences the interpretation of only ^ and $." which gives the impression that it's equivalent to /m only and not /s.
Note that $* is a global variable. Because the change to it is not made local with the local keyword, it will affect all regular expressions in the program, not just those that follow it in the block. This will make it more difficult to update the script correctly.
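To see why /m and /s are not interchangeable, here is a small sketch on a made-up three-line string. /m changes what ^ and $ match, which is what perlvar says $* did; /s changes what . matches, which is a separate matter.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird";

# Without /m, ^ and $ anchor to the whole string, so nothing matches.
my @no_m = $text =~ /^(\w+)$/g;

# With /m, ^ and $ match at every embedded line boundary.
my @with_m = $text =~ /^(\w+)$/mg;

# Without /s, . stops at a newline; with /s it runs across lines.
my ($dot_plain) = $text =~ /first(.*)/;
my ($dot_s)     = $text =~ /first(.*)/s;

print scalar(@no_m), " matches without /m\n";    # 0 matches without /m
print scalar(@with_m), " matches with /m\n";     # 3 matches with /m
print "plain . captured ", length($dot_plain), " chars\n";   # 0 chars
print "/s . captured ", length($dot_s), " chars\n";          # 13 chars
```

So when replacing $* = 1, adding /s to a pattern that contains .* can change what gets deleted, not just where the anchors match.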
From perldoc perlvar:
$*
Set to a non-zero integer value to do multi-line matching within a string, 0 (or undefined) to tell Perl that it can assume that strings contain a single line, for the purpose of optimizing pattern matches. Pattern matches on strings containing multiple newlines can produce confusing results when $* is 0 or undefined. Default is undefined. (Mnemonic: * matches multiple things.) This variable influences the interpretation of only ^ and $. A literal newline can be searched for even when $* == 0.
Use of $* is deprecated in modern Perl, supplanted by the /s and /m modifiers on pattern matching.
Assigning a non-numerical value to $* triggers a warning (and makes $* act as if $* == 0), while assigning a numerical value to $* makes that an implicit int is applied on the value.
It was basically a way of saying that in subsequent regexes (s/// or m//), the ^ or $ assertions should be able to match before or after newlines embedded in the string.
The recommended equivalent is the m modifier at the end of your regex (e.g., s/\n((\w*\-*\.*\w*\/*)+\:)/$1/gm;).
It turns on multi-line mode. Since perl 5.0 (from 1994), the correct way to do that is to add the m and/or the s modifier to your regexps, like this
s/\n?Project.*rebuilt\.//msg