$> and $? in Perl

In Perl, do $> and $? have special meaning in the same way that $_ and @_ are "special"?

Yes, there are many special variables whose name is a single punctuation character, including the scalar variable > (written $>) and the scalar variable ? (written $?). They are documented in perldoc perlvar.
$> is the process's effective user ID. It's “magical” in that assigning to it will change the EUID (if permitted).
$? contains the status of the last external process call. It's a little magical (e.g. you can only assign integers to it), but mainly several built-in constructs (such as backticks, i.e. `foo`) assign to it.
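A minimal sketch of both variables in use. It uses $^X (the path of the running perl) as a stand-in external command, and the exit code 3 is arbitrary; both are assumptions made for illustration.

```perl
use strict;
use warnings;

# $> is the effective user ID of the current process.
my $euid = $>;
print "Effective UID: $euid\n";

# Run a child process that exits with status 3. $^X is the running perl
# binary, used here only as a convenient external command.
system($^X, '-e', 'exit 3');

# $? holds the raw wait status; the exit code lives in the high byte.
my $exit_code = $? >> 8;
print "Child exit code: $exit_code\n";
```

The low byte of $? carries the signal number (if any), which is why the exit code is normally read with a shift by 8.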

Related

$ENV{$variable} in perl

Is there any way in Perl to expand the variable in $ENV{$variable}?
I exported "a=T" and "T=b" in the shell and ran a Perl script containing print "$ENV{$a}\n", but nothing was printed. I want "b" to be printed; how should I do that in Perl?

You say those environment variables should be chained, so
$ENV{ $ENV{a} };
Note: not $a but a, like $ENV{USER} etc. This uses the hash %ENV (see perlvar), which holds the current environment, with the names of environment variables as its keys.
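The chained lookup can be sketched as follows. The two assignments to %ENV are stand-ins for the shell's export a=T and export T=b from the question, so the snippet runs on its own.

```perl
use strict;
use warnings;

# Stand-ins for `export a=T` and `export T=b` in the shell.
$ENV{a} = 'T';
$ENV{T} = 'b';

my $name  = $ENV{a};        # first lookup yields the name 'T'
my $value = $ENV{$name};    # second lookup yields 'b'

print "$value\n";               # prints b
print $ENV{ $ENV{a} }, "\n";    # same lookup in one step, prints b
```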
It is apparently of interest to use a Perl variable (for the shell variable's name†) in %ENV, and not a string literal as above. In that case we need to pass that shell variable, its name or its value, to the Perl program somehow so that it is stored in a Perl variable; we can't just use it directly.
Incidentally, one of the ways to pass a variable from shell to Perl is precisely by exporting it, which then makes it available via %ENV. However, it can also be passed as usual, via the command line. Assuming the use of a Perl one-liner (common in shell scripts), we have two options for how to pass it:
As an argument, perl -we'...' "$var", in which case it is available in @ARGV
Via the -s command switch, perl -s -we'...' -- -shv="$var", which sets up the $shv variable in the one-liner, with the value $var. The -- marks the start of the arguments.
See this post for details, and perhaps this one for another, more involved, example.
Note: A comment asks how to pass the variable's name (the string a), not its value ($a). This doesn't seem like the best design to me; if the name of a variable for some reason needs to be passed around then it makes sense to store it in a variable (var="a") and pass that variable, as above.
But if the idea is indeed to pass the name itself around, then do that instead, so either of
perl -we'...' "a"
perl -we'...' -s -- -shv="a"
The rest is the same and %ENV uses the variable that got assigned the input.
If a full Perl script is used (not a one-liner) then use Getopt::Long to nicely handle arguments.
† A comment asks about passing the shell variable's name to a Perl variable — so a from the OP, not its value $a. I am a little uncertain of the utility of that but it is of course possible.
The two ways for how to pass a variable from shell to Perl then differ in what is passed.
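For the full-script case, here is a hedged sketch of passing the variable's name with Getopt::Long and then looking it up in %ENV. The option name --var and the helper env_from_args are hypothetical names chosen for this example, and the %ENV assignment stands in for export T=b.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Look up an environment variable whose name arrives as a --var option.
# The argument list is passed in explicitly, so the helper works with
# @ARGV in a real script or with literal data as below.
sub env_from_args {
    my @args = @_;
    my $name;
    GetOptionsFromArray(\@args, 'var=s' => \$name)
        or die "Usage: $0 --var NAME\n";
    die "Missing --var NAME\n" unless defined $name;
    return $ENV{$name};
}

$ENV{T} = 'b';    # stand-in for `export T=b`
my $value = env_from_args('--var', 'T');
print defined $value ? "$value\n" : "not set\n";    # prints b
```

In a real script the call would be env_from_args(@ARGV), invoked from the shell as something like perl script.pl --var "$a".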

security concerns with Perl /e modifier

Let's say that I have a Perl script which contains a substitution command which takes replacement string as a positional parameter and uses /e modifier:
perl -pe 's/abc/$ARGV[0]/ge;'
Are there any security concerns with this approach? I mean is it possible to give such positional parameter value which causes perl to execute an unwanted function? I mean something similar: perl -pe 's/abc//;unlink ("/tmp/file");'.

perl -pe 's/abc/$ARGV[0]/ge'
Are there any security concerns with this approach? I mean is it possible to give such positional parameter value which causes perl to execute an unwanted function?
In perldoc perlop, in the section Regexp Quote-Like Operators, it explains
e   Evaluate the right side as an expression
ee  Evaluate the right side as a string then eval the result.
But this isn't entirely true. In both cases the "right side"—the replacement—is evaluated as if it were a do block†. In the first case the result provides the replacement string, while in the second the result is passed to eval and the result of that provides the replacement string. There is no distinction whereby the replacement is evaluated as an "expression" in the first place and as a "string" in the second.
Both /e and /ee allow for any valid Perl code sequence, including loops, conditionals, and multiple statements, and aren't limited to a single expression.
There's never anything wrong with $ARGV[0] in isolation. Tainted strings become dangerous only if you execute them, either as Perl, using eval, or as shell code using system, qx//, or backticks. So it's fine in the replacement part of a substitution with a single /e modifier.
But if you use something else in the replacement, for instance
perl -pe 's/abc/qx{$ARGV[0]}/eg'
then that parameter will be executed as a shell command, so it clearly isn't safe. But then nor is
perl -pe 's/abc/unlink glob "*.*"/eg'
so you have to be sensible about it.
What is dangerous is the double-e modifier /ee, which treats the replacement as a Perl do block and then does an eval on the result. So something like
s/abc/$ARGV[0]/eeg
is very unsafe, because you could run your code like this
perl -pe 's/abc/$ARGV[0]/eeg' 'unlink glob *.*'
With just a single /e this would just replace abc with the string
unlink glob *.*
in $ARGV[0]. But using /ee, the string is passed to eval and all your files are deleted!
Remember this:
/e — replacement is an expression (a do block)
/ee — replacement is an expression (a do block) and the result is passed to eval
† This is why I choose to use braces to delimit substitutions that use one of the /e modes. With s{abc}{ $ARGV[0] }ge the replacement looks much more like the block of code that it is than if I had used the usual slashes.
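The difference between /e and /ee can be seen safely with a harmless string. In this sketch $input is an assumed stand-in for an untrusted $ARGV[0]; it holds arithmetic rather than anything destructive.

```perl
use strict;
use warnings;

my $input = '2 + 3';    # harmless stand-in for an untrusted $ARGV[0]

my $single = 'abc';
$single =~ s{abc}{ $input }e;     # /e: the block's value is the replacement

my $double = 'abc';
$double =~ s{abc}{ $input }ee;    # /ee: the block's value is then passed to eval

print "$single\n";    # prints 2 + 3
print "$double\n";    # prints 5
```

With /e the string is only ever data; with /ee it is compiled and run, which is exactly what makes untrusted input dangerous there.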

Unlike /ee, there's no inherent risk to /e as it doesn't invoke the Perl parser. It simply causes code in the source file to be evaluated, just like how map BLOCK LIST and for (LIST) BLOCK evaluate their BLOCK.
Note that
s{$foo}{$bar}g
is simply short for
s{$foo}{ qq{$bar} }eg
So if you're ok with
perl -pe's/abc/$ARGV[0]/g'
then you're ok with
perl -pe's/abc/"$ARGV[0]"/eg'
and the virtually identical
perl -pe's/abc/$ARGV[0]/eg'
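A small check of that equivalence, as a sketch. The value of $arg is an assumption chosen to look alarming; in all three forms it is only interpolated as text and never executed.

```perl
use strict;
use warnings;

my $arg = 'unlink glob "*"';    # looks dangerous, but is only ever used as text

(my $plain  = 'abc-abc') =~ s/abc/$arg/g;       # ordinary interpolation
(my $quoted = 'abc-abc') =~ s/abc/"$arg"/eg;    # /e with a quoted string
(my $expr   = 'abc-abc') =~ s/abc/$arg/eg;      # /e with a bare variable

print "$plain\n";
print "all three forms agree\n"
    if $plain eq $quoted && $quoted eq $expr;
```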

Meaning of the <*> symbol

I've recently been exposed to a bit of Perl code, and some aspects of it are still elusive to me. This is it:
@collection = <*>;
I understand that the at-symbol defines collection as an array. I've also searched around a bit, and landed on perldoc, specifically at the part about I/O Operators. I found the null filehandle especially interesting; code follows.
while (<>) {
...
}
On the same topic I have also noticed that this syntax is also valid:
while (<*.c>) {
...
}
According to perldoc, it is actually calling an internal function that invokes glob in a manner similar to the following code:
open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<FOO>) {
...
}
Question
What does the less-than, asterisk, greater-than (<*>) symbol mentioned on the first line actually do? Is it a reference to an internally open and referenced glob? Would it be a special case, such as the null filehandle? Or can it be something entirely different, like a legacy implementation?

<> (the diamond operator) is used in two different syntaxes.
<*.c>, <*> etc. is shorthand for the glob built-in function. So <*> returns a list of all files and directories in the current directory. (Except those beginning with a dot; use <* .*> for that).
<$fh> is shorthand for calling readline($fh). If no filehandle is specified (<>) the magical *ARGV handle is assumed, which is a list of files specified as command line arguments, or standard input if none are provided. As you mention, the perldoc covers both in detail.
How does Perl distinguish the two? It checks if the thing inside <> is either a bare filehandle or a simple scalar variable holding a filehandle (e.g. $fh). Otherwise, it calls glob() instead. This even applies to things like <$hash{$key}> or <$x > (note the trailing space) - they will be interpreted as calls to glob(). If you read the perldoc a bit further on, this is explained - and it's recommended that you use glob() explicitly if you're putting a variable inside <> to avoid these problems.
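The two meanings of the angle brackets can be compared side by side. This is a sketch under assumptions: it creates a scratch directory with File::Temp and three made-up file names (a.c, b.c, .hidden) so that the results are predictable.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
chdir $dir or die "Cannot chdir to $dir: $!";

for my $name ('a.c', 'b.c', '.hidden') {
    open my $fh, '>', $name or die "Cannot create $name: $!";
    print {$fh} "first line\nsecond line\n";
    close $fh or die "Cannot close $name: $!";
}

my @visible = <*>;            # glob: dot files are skipped
my @c_files = glob('*.c');    # explicit spelling of <*.c>
my @hidden  = grep { $_ eq '.hidden' } glob('* .*');

open my $in, '<', 'a.c' or die "Cannot open a.c: $!";
my @lines = <$in>;            # readline: $in is a simple scalar filehandle
close $in;

chdir $start or die "Cannot chdir back to $start: $!";

print "visible: @visible\n";                   # prints visible: a.c b.c
print "lines read: ", scalar(@lines), "\n";    # prints lines read: 2
```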

It collects the names of all files and directories in the current directory, except those beginning with a dot, and saves them to the array @collection. It's the same as:
@collection = glob "*";

How do I "untaint" a variable?

To my knowledge, once a variable is tainted, Perl won't allow it to be used in a system(), exec(), piped open, eval(), backtick command, or any function that affects something outside the program (such as unlink). So what's the process to untaint it?

Use a regular expression on the tainted variable to pull out the "safe" values:
Sometimes you have just to clear your data's taintedness. Values may be untainted by using them as keys in a hash; otherwise the only way to bypass the tainting mechanism is by referencing subpatterns from a regular expression match. Perl presumes that if you reference a substring using $1, $2, etc., that you knew what you were doing when you wrote the pattern.
Don't ignore this warning though:
That means using a bit of thought--don't just blindly untaint anything, or you defeat the entire mechanism. It's better to verify that the variable has only good characters (for certain values of "good") rather than checking whether it has any bad characters. That's because it's far too easy to miss bad characters that you never thought of.
Perlsec: Laundering and Detecting Tainted Data
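A minimal sketch of the regex approach described above. The helper name untaint_filename and its whitelist (word characters, dots and hyphens, starting with a word character) are assumptions for illustration; pick a pattern that matches what counts as "good" for your data.

```perl
use strict;
use warnings;

# Return an untainted copy of $raw if it looks like a plain file name,
# otherwise undef. Under -T, the captured $1 is untainted because it
# comes from a regex capture group.
sub untaint_filename {
    my ($raw) = @_;
    return undef unless defined $raw;
    return $raw =~ /\A(\w[\w.-]*)\z/ ? $1 : undef;
}

for my $candidate ('report.txt', 'x; rm -rf /') {
    my $clean = untaint_filename($candidate);
    print defined $clean ? "ok: $clean\n" : "rejected: $candidate\n";
}
```

The pattern lists what is allowed rather than what is forbidden, following the perlsec advice quoted above.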

use Untaint:
DESCRIPTION
This module is used to launder data which has been tainted by using
the -T switch to be in taint mode. This can be used for CGI scripts
as well as command line scripts. The module will untaint scalars,
arrays, and hashes. When laundering an array, only array elements
which are tainted will be laundered.
SYNOPSIS
use Untaint;
my $pattern = qr(^k\w+);
my $foo = $ARGV[0];
# Untaint a scalar
if (is_tainted($foo)) {
    print "\$foo is tainted. Attempting to launder\n";
    $foo = untaint($pattern, $foo);
} else {
    print "\$foo is not tainted!!\n";
}

Should I escape shell arguments in Perl?

When using system() calls in Perl, do you have to escape the shell args, or is that done automatically?
The arguments will be user input, so I want to make sure this isn't exploitable.

If you use system $cmd, @args rather than system "$cmd @args" (a list rather than a string), then you do not have to escape the arguments because no shell is invoked (see system). system {$cmd} $cmd, @args will not invoke a shell either, even if $cmd contains metacharacters and @args is empty (this is documented as part of exec). If the args are coming from user input (or another untrusted source), you will still want to untaint them. See -T in the perlrun docs, and the perlsec docs.
If you need to read the output or send input to the command, qx and readpipe have no list-form equivalent. Instead, use open my $output, "-|", $cmd, @args or open my $input, "|-", $cmd, @args, although this is not portable as it requires a real fork, which means Unix only... I think. Maybe it'll work on Windows with its simulated fork. A better option is something like IPC::Run, which will also handle the case of piping commands to other commands, which neither the multi-arg form of system nor the 4-arg form of open will handle.
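Both list forms can be sketched as below, assuming a Unix-like system. $^X (the running perl) serves as the external command, and $untrusted is a made-up value full of shell metacharacters to show that no shell ever sees it.

```perl
use strict;
use warnings;

my $untrusted = 'hello; echo injected $HOME';    # assumed hostile input

# List form of system: the arguments go straight to the program, no shell.
system($^X, '-e', 'print "arg: $ARGV[0]\n"', $untrusted) == 0
    or die "system failed with status $?";

# List form of piped open: capture the child's output, again with no shell.
open(my $out, '-|', $^X, '-e', 'print scalar(@ARGV), "\n"', $untrusted)
    or die "Cannot start child: $!";
my $count = <$out>;
close $out or die "Child failed with status $?";
chomp $count;

print "child saw $count argument(s)\n";    # prints child saw 1 argument(s)
```

The child receives the whole string as a single argument, so the semicolon and $HOME are never interpreted.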

On Windows, the situation is a bit nastier. Basically, all Win32 programs receive one long command-line string -- the shell (usually cmd.exe) may do some interpretation first, removing < and > redirections for example, but it does not split it up at word boundaries for the program. Each program must do this parsing itself (if it wishes -- some programs don't bother). In C and C++ programs, routines provided by the runtime libraries supplied with the compiler toolchain will generally perform this parsing step before main() is called.
The problem is, in general, you don't know how a given program will parse its command line. Many programs are compiled with some version of MSVC++, whose quirky parsing rules are described here, but many others are compiled with different compilers that use different conventions.
This is compounded by the fact that cmd.exe has its own quirky parsing rules. The caret (^) is treated as an escape character that quotes the following character, and text inside double quotes is treated as quoted if a list of tricky criteria is met (see cmd /? for the full gory details). If your command contains any strange characters, it's very easy for cmd.exe's idea of which parts of the text are "quoted" and which aren't to get out of sync with your target program's, and all hell breaks loose.
So, the safest approach for escaping arguments on Windows is:
1. Escape arguments in the manner expected by the command-line parsing logic of the program you're calling. (Hopefully you know what that logic is; if not, try a few examples and guess.)
2. Join the escaped arguments with spaces.
3. Prefix every single non-alphanumeric character of the resulting string with ^.
4. Append any redirections or other shell trickery (e.g. joining commands with &&).
5. Run the command with system() or backticks.
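The joining and caret-prefixing steps above can be sketched as a small helper. The name cmd_caret_escape is hypothetical, and the sketch assumes its inputs have already been escaped for the target program's own parser, which is program-specific and not attempted here.

```perl
use strict;
use warnings;

# Takes arguments that are ALREADY escaped for the target program,
# joins them with spaces, then prefixes every non-alphanumeric
# character with ^ so that cmd.exe passes it through literally.
sub cmd_caret_escape {
    my @escaped_args = @_;
    my $line = join ' ', @escaped_args;
    $line =~ s/([^A-Za-z0-9])/^$1/g;
    return $line;
}

my $line = cmd_caret_escape('prog', '"a b"');
print "$line\n";    # prints prog^ ^"a^ b^"
```

Redirections or && would be appended to the returned string afterwards, unescaped, before handing it to system() or backticks.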

sub esc_chars {
    # will change, for example, a!!a to a\!\!a
    my @escaped = @_;    # work on copies; s/// cannot be applied to @_ as a whole
    s/([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g for @escaped;
    return @escaped;
}
http://www.slac.stanford.edu/slac/www/resource/how-to-use/cgi-rexx/cgi-esc.html

If you use system "$cmd @args" (a string), then you have to escape the arguments because a shell is invoked.
Fortunately, for double quoted strings, only four characters need escaping:
" - double quote
$ - dollar
@ - at symbol
\ - backslash

The answers to your question were very useful. In the end I followed @runrig's advice but then used open3() from the core module IPC::Open3 so I could capture the output from STDERR as well as STDOUT.
For sample code of open3() in use with @runrig's solution, see my related question and answer:
Calling system commands from Perl
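As a rough illustration of that approach (not the code from the linked answer), a minimal open3() sketch might look like this. The child command is an assumption: it runs the current perl via $^X and writes one line to each stream.

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol 'gensym';

# Child command given as a list, so no shell is involved.
my @cmd = ($^X, '-e', 'print "to stdout\n"; print STDERR "to stderr\n"');

# The STDERR handle must be created up front, otherwise open3 merges
# the child's STDERR into its STDOUT handle.
my $pid = open3(my $in, my $out, my $err = gensym, @cmd);
close $in;

# Reading one stream fully and then the other is fine for small outputs;
# large outputs on both streams would need select() or IPC::Run.
my @stdout = <$out>;
my @stderr = <$err>;

waitpid($pid, 0);
my $exit_code = $? >> 8;

print "STDOUT: @stdout";
print "STDERR: @stderr";
print "exit code: $exit_code\n";
```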
