Should I escape shell arguments in Perl? - perl

When using system() calls in Perl, do you have to escape the shell args, or is that done automatically?
The arguments will be user input, so I want to make sure this isn't exploitable.

If you use system $cmd, #args rather than system "$cmd #args" (an array rather than a string), then you do not have to escape the arguments because no shell is invoked (see system). system {$cmd} $cmd, #args will not invoke a shell either even if $cmd contains metacharacters and #args is empty (this is documented as part of exec). If the args are coming from user input (or other untrusted source), you will still want to untaint them. See -T in the perlrun docs, and the perlsec docs.
If you need to read the output or send input to the command, qx and readpipe have no equivalent. Instead, use open my $output, "-|", $cmd, #args or open my $input, "|-", $cmd, #args although this is not portable as it requires a real fork which means Unix only... I think. Maybe it'll work on Windows with its simulated fork. A better option is something like IPC::Run, which will also handle the case of piping commands to other commands, which neither the multi-arg form of system nor the 4 arg form of open will handle.

On Windows, the situation is a bit nastier. Basically, all Win32 programs receive one long command-line string -- the shell (usually cmd.exe) may do some interpretation first, removing < and > redirections for example, but it does not split it up at word boundaries for the program. Each program must do this parsing themselves (if they wish -- some programs don't bother). In C and C++ programs, routines provided by the runtime libraries supplied with the compiler toolchain will generally perform this parsing step before main() is called.
The problem is, in general, you don't know how a given program will parse its command line. Many programs are compiled with some version of MSVC++, whose quirky parsing rules are described here, but many others are compiled with different compilers that use different conventions.
This is compounded by the fact that cmd.exe has its own quirky parsing rules. The caret (^) is treated as an escape character that quotes the following character, and text inside double quotes is treated as quoted if a list of tricky criteria are met (see cmd /? for the full gory details). If your command contains any strange characters, it's very easy for cmd.exe's idea of which parts of text are "quoted" and which aren't to get out of sync with your target program's, and all hell breaks loose.
So, the safest approach for escaping arguments on Windows is:
Escape arguments in the manner expected by the command-line parsing logic of the program you're calling. (Hopefully you know what that logic is; if not, try a few examples and guess.)
Join the escaped arguments with spaces.
Prefix every single non-alphanumeric character of the resulting string with ^.
Append any redirections or other shell trickery (e.g. joining commands with &&).
Run the command with system() or backticks.

sub esc_chars {
# will change, for example, a!!a to a\!\!a
#_ =~ s/([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
return #_;
}
http://www.slac.stanford.edu/slac/www/resource/how-to-use/cgi-rexx/cgi-esc.html

If you use system "$cmd #args" (a string), then you have to escape the arguments because a shell is invoked.
Fortunately, for double quoted strings, only four characters need escaping:
" - double quote
$ - dollar
# - at symbol
\ - backslash

The answers on your question were very useful. In the end I followed #runrig's advice but then used the core module open3() command so I could capture the output from STDERR as well as STDOUT.
For sample code of open3() in use with #runrig's solution, see my related question and answer:
Calling system commands from Perl

Related

'Correct' way to have perl arguments interpreted by current shell

Sorry, this is pretty basic, and I suspect a duplicate, but after some searching I'm coming up empty:
Given the following script:
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Run3;
my $stdout2;
print $ARGV[0];
print "\n";
my #cmd1 = split /\s+/, $ARGV[0] ;
run3 (\#cmd1, \undef, \$stdout2, \$stdout2);
print $stdout2
And running it like so:
£ perl comp.pl "md5sum *(.)"
md5sum *(.)
md5sum: '*(.)': No such file or directory
Fair enough. The *(.) isn't being intrepreted by the shell and probably most would consider this a feature. But I would like it to be intepreted by the current shell (or zsh specifically would be fine).
The question is how I can do this without complicating the shell command to run the perl script.
Prepending "zsh" and "-c" to cmd1 is ok if that's a reasonable way to do it. It just seems like...it isn't.
My intention is also to pass slightly more complex commands to this script eventually, like so:
perl comp.pl 'md5sum *(.)' 'ssh remoteHost "md5sum *(.)"'
I have no objection to non-perl answers to the problem you can probably infer I'm trying to solve (I suspect rsync could do this) but I'm primarily interested in solving this through Perl as there'll eventually be business-specific logic in this comparison.
EDIT
I tried various forms of:
my $cmd = $ARGV[0];
run3 (\$cmd, \undef, \$stdout2, \$stdout2);
the documentation seems to think this would be ok, but I get:
Not an ARRAY reference at /usr/local/share/perl/5.22.1/IPC/Run3.pm line 320.
The IPC::Run3 docs say that one can pass a string instead of an arrayref for the command
run3($cmd, $stdin, $stdout, $stderr, \%options)
...
$cmd
Usually $cmd will be an ARRAY reference and the child is invoked via
system #$cmd;
But $cmd may also be a string in which case the child is invoked via
system $cmd;
In this case the string $cmd is passed to the shell if it contains shell metacharacters. So take input without splitting it, $cmd = $ARGV[0], or join it after validation, $cmd = join ' ', #cmd;
Even in general this is not the preferred way, and the docs warn to see system for "pitfalls" of it.
Things are yet much worse here since you'd be passing user input directly for execution! Never mind possible nefarious intents, just think of what a good typo can do. Even without that, there is simply a difference between typing a command at the terminal and passing it to a script, which may edit it, may get modified, pick up bugs, etc.
If nothing else, I'd urge to add code for substantial checks of submitted input. An analysis may involve identifying the known and accepted metacharacters while suitably quoting parts of input that shouldn't be interpreted, for example using String::ShellQuote.
But I'd really suggest to reconsider the design, so to not submit complete commands to the script. Rather, specify with keywords what should happen. Things like globbing (assembling a file list) are done from Perl really nicely and with a lot of control. Do outside only what is necessary; generally there'll be no need for the shell then.

Capturing output from a command specified in Perl array

I thought this must be simple, but I haven't found a viable solution.
The problem is as simple as this: I want to execute a system command and capture the output in Perl variable. The command is specified in Perl array (containing command and parameters, e.g. #cmd = ('mycmd', '-opt1', 'arg1', 'val1')).
I don't want to use forking, i.e. open(FROM_KID, '-|') is not an option. I know that if I had the command in a string I can achieve this with backticks. So perhaps this problem reduces to converting #cmd array into a string. In my case, the command arguments can contain spaces.
Is there a simple way to convert #cmd array into string that can be used with backticks, but such that all arguments are properly quoted? Also ideally without using any external libraries.
Thanks!!
You may be looking for String::ShellQuote. Note that only Bourne shell quoting is supported.
But backticks also perform an implicit fork, and if they in any way differ from the implicit fork of pipe open, I for one never noticed. :-\

how to include single quotes from user input in perl

The user is going to enter input string such as Tom's Toy.
However the perl script complains saying "unmatched '."
This is my code.
my $commandline="";
while (#ARGV) {
$_ = shift #ARGV;
{$commandline .= $_ . ' ';}
}
print " Running $commandline\n";
system ($commandline);
Now if the user input is Tom's Toy. I just want to print back Tom's Toy.
However perl complains "unmatched '.".
IF I dont user quote it works fine. (for eg: Tom Toy is good)
How do I fix this issue.
Any help is greatly appreciated.
Thanks in advance
If you switch things around a little to use the system $cmd, #args version of the function, no shell will be invoked, so no escaping will be necessary.
my $cmd = shift #ARGV;
my #args = #ARGV;
print " Running $cmd\n";
system $cmd, #args;
I tested with ./test.pl echo Tom\'s Toy and it gives the expected output:
Running echo
Tom's Toy
system(#ARGV) is probably all you need.
If you give system() a single argument, and if that argument contains any shell metacharacters (including spaces, quotation marks, etc), then the argument will be passed to the shell. jwodder is quite correct: the error message is from the shell, not from Perl.
If you pass system() multiple arguments, it's done without invoking a shell -- which is usually better. The approach you're using takes your program's command-line arguments, joins them together into a single string, then passes that string to the shell, which splits it back into multiple arguments for execution.
On the other hand, sometimes you might want to invoke the shell, for example if you're building up a complex command using pipes, I/O redirection, and so forth, and you don't want to set it all up in Perl. But you have to be careful about metacharacters, as you've seen.
"perldoc -f system" explains this more fully.
If all you want to do is print back the user input, use print, not system. system will try to pass the supplied string to the shell for execution as a command, and it's the shell that's complaining about the unmatched quote.
(Also, instead of manually concatenating #ARGV, may I direct your attention to the join function?)

What does the '`' character do in Perl?

I was using Perl to read through each line of a file. I used a command line tool to call a service, and I noticed some interesting functionality that I can't figure out how to search for. To the variable $cmd I assigned the command that invokes the service. If I refer to $cmd later in the code it prints out the command line argument, but if I refer to it as `$cmd`, however, it gives the output from running the service.
What is the explanation for this?
It works just like backquotes in the shell, which is why it is called that. See sh(1) for details. It captures the standard output alone, and nothing else. It sets the $? variable to the 16-bit wait status word.
This is all explained in the perlop(1) manpage:
qx/STRING/
`STRING`
A string which is (possibly) interpolated and then
executed as a system command with /bin/sh or its
equivalent. Shell wildcards, pipes, and redirections
will be honored. The collected standard output of the
command is returned; standard error is unaffected. In
scalar context, it comes back as a single (potentially
multi-line) string, or undef if the command failed.
In list context, returns a list of lines (however
you’ve defined lines with $/ or
$INPUT_RECORD_SEPARATOR), or the empty list if the
command failed.
Because backticks do not affect standard error: use
shell file descriptor syntax (assuming the shell
supports this) if you care to address this. To
capture a command’s STDERR and STDOUT merged together:
$output = `cmd 2>&1`;
To capture a command’s STDOUT but discard its STDERR:
$output = `cmd 2>/dev/null`;
To capture a command’s STDERR but discard its STDOUT
(ordering is important here):
$output = `cmd 2>&1 1>/dev/null`;
To exchange a command’s STDOUT and STDERR in order to
capture the STDERR but leave its STDOUT to come out
the old STDERR:
$output = `cmd 3>&1 1>&2 2>&3 3>&-`;
To read both a command’s STDOUT and its STDERR
separately, it’s easiest to redirect them separately
to files, and then read from those files when the
program is done:
system("program args 1>program.stdout 2>program.stderr");
The STDIN filehandle used by the command is inherited
from Perl’s STDIN. For example:
open(BLAM, "blam") || die "$0: can't open blam: $!";
open (STDIN, "<&BLAM") || die "$0: can't dup BLAM: $!";
print `sort`;
will print the sorted contents of the file blam.
Using single-quote as the delimiter protects the command
from Perl’s double-quote interpolation, passing the contents on
to the shell instead:
$perl_info = qx(ps $$); # that's Perl's $$
$shell_info = qx'ps $$'; # that's the new shell's $$
How that string gets evaluated is entirely subject to
the command interpreter on your system. On most
platforms, you will have to protect shell
metacharacters if you want them treated literally.
This is in practice difficult to do, as it’s unclear
which characters need escaping, or how. See perlsec for a
clean and safe example of a manual fork and exec
to emulate backticks safely.
On some platforms (notably DOS-like ones), the shell
may not be capable of dealing with multiline commands,
so putting newlines in the string may not get you what
you want. You may be able to evaluate multiple
commands in a single line by separating them with the
command separator character, if your shell supports
that (e.g. ; on many Unix shells; & on the Windows
NT CMD.COM shell).
Beginning with v5.6.0, Perl attempts to flush all
files opened for output before starting the child
process, but this may not be supported on some
platforms (see perlport(1)). To be safe, you may need to
set $| ($AUTOFLUSH in English) or call the
autoflush method of IO::Handle on any open
handles.
Beware that some command shells may place restrictions
on the length of the command line. You must ensure
your strings don’t exceed this limit after any
necessary interpolations. See the platform-specific
release notes for more details about your particular
environment.
Using this operator can lead to programs that are
difficult to port, because the shell commands called
vary between systems, and may in fact not be present
at all. As one example, the type command under the
POSIX shell is very different from the type command
under DOS. That doesn't mean you should go out of
your way to avoid backticks when they’re the right way
to get something done. Perl was made to be a glue
language, and one of the things it glues together is
commands. Just understand what you’re getting
yourself into.
See I/O Operators for more discussion.
Here’s a simple example of using backticks to get the exit status of the first element in a pipeline:
$device = q(/dev/rmt8);
$dd_noise = q(^[0-9]+\+[0-9]+ records (in|out)$);
$status = `exec 3>&1; ((dd if=$device ibs=64k 2>&1 1>&3 3>&- 4>&-; echo $? >&4) | egrep -v "$dd_noise" 1>&2 3>&- 4>&-) 4>&1`;
EDIT
Well ok then, so maybe that wasn’t that simple an example. :) But this one is.
I’d like to recommend the Capture::Tiny CPAN module as a simpler way to manage the output from external commands that you would normally run using backquotes. It has advantages and disadvantages, but I feel that for many people, the advantages outweigh any arguable disadvantageL
The advantage is that you get to do all this without requiring deep knowledge of arcane mysteries of file-descriptor redirection the way the previous example did.
The disadvantage is it’s yet another non-core dependency — something else you have to install from CPAN.
That’s really not bad for what you get.
Here’s an example of how easy it is:
NAME
Capture::Tiny - Capture STDOUT and STDERR from Perl, XS, or external programs
SYNOPSIS
use Capture::Tiny qw/capture tee capture_merged tee_merged/;
($stdout, $stderr) = capture {
# your code here
};
($stdout, $stderr) = tee {
# your code here
};
$merged = capture_merged {
# your code here
};
$merged = tee_merged {
# your code here
};
DESCRIPTION
Capture::Tiny provides a simple, portable way to capture anything sent to STDOUT or STDERR, regardless of whether it comes from Perl, from XS code
or from an external program. Optionally, output can be teed so that it is captured while being passed through to the original handles. Yes, it
even works on Windows. Stop guessing which of a dozen capturing modules to use in any particular situation and just use this one.
There, isn’t that a whole lot easier?
The back-quote in Perl does much the same as the back-quote in shell - it runs a command and captures the standard output.
See also qx//.
I think the backtick lets you run commands and store their output in a variable:
$listing=`ls -1 /tmp/`;

When is the right time (and the wrong time) to use backticks?

Many beginning programmers write code like this:
sub copy_file ($$) {
my $from = shift;
my $to = shift;
`cp $from $to`;
}
Is this bad, and why? Should backticks ever be used? If so, how?
A few people have already mentioned that you should only use backticks when:
You need to capture (or supress) the output.
There exists no built-in function or Perl module to do the same task, or you have a good reason not to use the module or built-in.
You sanitise your input.
You check the return value.
Unfortunately, things like checking the return value properly can be quite challenging. Did it die to a signal? Did it run to completion, but return a funny exit status? The standard ways of trying to interpret $? are just awful.
I'd recommend using the IPC::System::Simple module's capture() and system() functions rather than backticks. The capture() function works just like backticks, except that:
It provides detailed diagnostics if the command doesn't start, is killed by a signal, or returns an unexpected exit value.
It provides detailed diagnostics if passed tainted data.
It provides an easy mechanism for specifying acceptable exit values.
It allows you to call backticks without the shell, if you want to.
It provides reliable mechanisms for avoiding the shell, even if you use a single argument.
The commands also work consistently across operating systems and Perl versions, unlike Perl's built-in system() which may not check for tainted data when called with multiple arguments on older versions of Perl (eg, 5.6.0 with multiple arguments), or which may call the shell anyway under Windows.
As an example, the following code snippet will save the results of a call to perldoc into a scalar, avoids the shell, and throws an exception if the page cannot be found (since perldoc returns 1).
#!/usr/bin/perl -w
use strict;
use IPC::System::Simple qw(capture);
# Make sure we're called with command-line arguments.
#ARGV or die "Usage: $0 arguments\n";
my $documentation = capture('perldoc', #ARGV);
IPC::System::Simple is pure Perl, works on 5.6.0 and above, and doesn't have any dependencies that wouldn't normally come with your Perl distribution. (On Windows it depends upon a Win32:: module that comes with both ActiveState and Strawberry Perl).
Disclaimer: I'm the author of IPC::System::Simple, so I may show some bias.
The rule is simple: never use backticks if you can find a built-in to do the same job, or if their is a robust module on the CPAN which will do it for you. Backticks often rely on unportable code and even if you untaint the variables, you can still open yourself up to a lot of security holes.
Never use backticks with user data unless you have very tightly specified what is allowed (not what is disallowed -- you'll miss things)! This is very, very dangerous.
Backticks should be used if and only if you need to capture the output of a command. Otherwise, system() should be used. And, of course, if there's a Perl function or CPAN module that does the job, this should be used instead of either.
In either case, two things are strongly encouraged:
First, sanitize all inputs: Use Taint mode (-T) if the code is exposed to possible untrusted input. Even if it's not, make sure to handle (or prevent) funky characters like space or the three kinds of quote.
Second, check the return code to make sure the command succeeded. Here is an example of how to do so:
my $cmd = "./do_something.sh foo bar";
my $output = `$cmd`;
if ($?) {
die "Error running [$cmd]";
}
Another way to capture stdout(in addition to pid and exit code) is to use IPC::Open3 possibily negating the use of both system and backticks.
Use backticks when you want to collect the output from the command.
Otherwise system() is a better choice, especially if you don't need to invoke a shell to handle metacharacters or command parsing. You can avoid that by passing a list to system(), eg system('cp', 'foo', 'bar') (however you'd probably do better to use a module for that particular example :))
In Perl, there's always more than one way to do anything you want. The primary point of backticks is to get the standard output of the shell command into a Perl variable. (In your example, anything that the cp command prints will be returned to the caller.) The downside of using backticks in your example is you don't check the shell command's return value; cp could fail and you wouldn't notice. You can use this with the special Perl variable $?. When I want to execute a shell command, I tend to use system:
system("cp $from $to") == 0
or die "Unable to copy $from to $to!";
(Also observe that this will fail on filenames with embedded spaces, but I presume that's not the point of the question.)
Here's a contrived example of where backticks might be useful:
my $user = `whoami`;
chomp $user;
print "Hello, $user!\n";
For more complicated cases, you can also use open as a pipe:
open WHO, "who|"
or die "who failed";
while(<WHO>) {
# Do something with each line
}
close WHO;
From the "perlop" manpage:
That doesn't mean you should go out of
your way to avoid backticks when
they're the right way to get something
done. Perl was made to be a glue
language, and one of the things it
glues together is commands. Just
understand what you're getting
yourself into.
For the case you are showing using the File::Copy module is probably best. However, to answer your question, whenever I need to run a system command I typically rely on IPC::Run3. It provides a lot of functionality such as collecting the return code and the standard and error output.
Whatever you do, as well as sanitising input and checking the return value of your code, make sure you call any external programs with their explicit, full path. e.g. say
my $user = `/bin/whoami`;
or
my $result = `/bin/cp $from $to`;
Saying just "whoami" or "cp" runs the risk of accidentally running a command other than what you intended, if the user's path changes - which is a security vulnerability that a malicious attacker could attempt to exploit.
Your example's bad because there are perl builtins to do that which are portable and usually more efficient than the backtick alternative.
They should be used only when there's no Perl builtin (or module) alternative. This is both for backticks and system() calls. Backticks are intended for capturing output of the executed command.
Backticks are only supposed to be used when you want to capture output. Using them here "looks silly." It's going to clue anyone looking at your code into the fact that you aren't very familiar with Perl.
Use backticks if you want to capture output.
Use system if you want to run a command. One advantage you'll gain is the ability to check the return status.
Use modules where possible for portability. In this case, File::Copy fits the bill.
In general, it's best to use system instead of backticks because:
system encourages the caller to check the return code of the command.
system allows "indirect object" notation, which is more secure and adds flexibility.
Backticks are culturally tied to shell scripting, which might not be common among readers of the code.
Backticks use minimal syntax for what can be a heavy command.
One reason users might be temped to use backticks instead of system is to hide STDOUT from the user. This is more easily and flexibly accomplished by redirecting the STDOUT stream:
my $cmd = 'command > /dev/null';
system($cmd) == 0 or die "system $cmd failed: $?"
Further, getting rid of STDERR is easily accomplished:
my $cmd = 'command 2> error_file.txt > /dev/null';
In situations where it makes sense to use backticks, I prefer to use the qx{} in order to emphasize that there is a heavy-weight command occurring.
On the other hand, having Another Way to Do It can really help. Sometimes you just need to see what a command prints to STDOUT. Backticks, when used as in shell scripts are just the right tool for the job.
Perl has a split personality. On the one hand it is a great scripting language that can replace the use of a shell. In this kind of one-off I-watching-the-outcome use, backticks are convenient.
When used a programming language, backticks are to be avoided. This is a lack of error
checking and, if the separate program backticks execute can be avoided, efficiency is
gained.
Aside from the above, the system function should be used when the command's output is not being used.
Backticks are for amateurs. The bullet-proof solution is a "Safe Pipe Open" (see "man perlipc"). You exec your command in another process, which allows you to first futz with STDERR, setuid, etc. Advantages: it does not rely on the shell to parse #ARGV, unlike open("$cmd $args|"), which is unreliable. You can redirect STDERR and change user priviliges without changing the behavior of your main program. This is more verbose than backticks but you can wrap it in your own function like run_cmd($cmd,#args);
sub run_cmd {
my $cmd = shift #_;
my #args = #_;
my $fh; # file handle
my $pid = open($fh, '-|');
defined($pid) or die "Could not fork";
if ($pid == 0) {
open STDERR, '>/dev/null';
# setuid() if necessary
exec ($cmd, #args) or exit 1;
}
wait; # may want to time out here?
if ($? >> 8) { die "Error running $cmd: [$?]"; }
while (<$fh>) {
# Have fun with the output of $cmd
}
close $fh;
}