How do I "untaint" a variable? - perl

upto my knowledge once a variable is tainted, Perl won't allow to use it in a system(), exec(), piped open, eval(), backtick command, or any function that affects something outside the program (such as unlink). So whats the process to untaint it?

Use a regular expression on the tainted variable to pull out the "safe" values:
Sometimes you have just to clear your data's taintedness. Values may be untainted by using them as keys in a hash; otherwise the only way to bypass the tainting mechanism is by referencing subpatterns from a regular expression match. Perl presumes that if you reference a substring using $1, $2, etc., that you knew what you were doing when you wrote the pattern.
Don't ignore this warning though:
That means using a bit of thought--don't just blindly untaint anything, or you defeat the entire mechanism. It's better to verify that the variable has only good characters (for certain values of "good") rather than checking whether it has any bad characters. That's because it's far too easy to miss bad characters that you never thought of.
Perlsec: Laundering and Detecting Tainted Data

use Untaint:
DESCRIPTION
This module is used to launder data which has been tainted by using
the -T switch to be in taint mode. This can be used for CGI scripts
as well as command line scripts. The module will untaint scalars,
arrays, and hashes. When laundering an array, only array elements
which are tainted will be laundered.
SYNOPSIS
use Untaint;
my $pattern = qr(^k\w+);
my $foo = $ARGV[0];
# Untaint a scalar
if (is_tainted($foo)) {
print "\$foo is tainted. Attempting to launder\n";
$foo = untaint($pattern, $foo);
}else{
print "\$foo is not tainted!!\n";
}

Related

Simple perl file copy methods without using File::Copy

I have very little perl experience and have been trying a few methods on OS X before I attempt to use Macperl on a more difficult to access OS 9 with very limited memory.
I have been trying simple file copy methods without using File::Copy.
Both of the following appear to work:
open R,"<old";
open W,">new";
print W <R>;
open $r,"<old";
open $w,">new";
while (<$r>) { print $w $_ }
Why can't I use $r and $w with the first method?
How does the first method work without a 'while'?
Is there a better way to do this?
Is there a better way to do this?
There sure is... File::Copy is a core module (no installation requred), so there's little reason to avoid using it.
use File::Copy;
copy('old', 'new');
Alternatively, you can use a system call to the underlying OS, in your case, OS-X
system('cp', 'old', 'new');
UPDATE Oops, you're on OS9, so I'm not sure what system calls are available.
You can use your first method with lexical file handles, but you need to disambiguate a little.
open $r, '<', 'old';
open $w, '>', 'new';
print {$w} <$r>;
Bare in mind this is unidiomatic code, and if you just want to create a direct copy, the first method is preferable (EDIT If your memory constraints allow for it).
Perl operators and functions can return different things depending on what their context expects.
The first method works because the print function creates what is called a list context for the <> operator - then thee operator will "slurp in" the entire file.
In the second example, the <> operator is called in the condition of the loop, which creates a scalar context, returning one line at a time (some asterisks here, but that's another story.)
Here is some explanation about contexts: http://perlmaven.com/scalar-and-list-context-in-perl.
And, both methods should work with the R and W filehandles (which are old fashioned Perl filehandles that are different from regular variables), and with the $r/$w notation that actually denotes variables which hold a filehandle reference. The difference is subtle but in most everyday use cases these can be used interchangeably. Have you tried using $ variables in the first example?
In addition to the Hellmar Becker's answer:
The print $w <$r>; does not work (gives a syntax error) because if the FILEHANDLE argument is a variable, perl tries to interpret the print's argument list beginning ($w <$r) as an operator (see the NOTE in http://perldoc.perl.org/functions/print.html). To disambigue put parentheses around the <$r>:
print $w (<$r>);

Perl - Two questions regarding proper syntax for dereferencing

as a newbie I am trying to explore perl data structures using this material from atlanta perl mongers, avaliable here Perl Data Structures
Here is the sample code that I've writen, 01.pl is the same as 02.pl but 01.pl contains additional two pragmas: use strict; use warnings;.
#!/usr/bin/perl
my %name = (name=>"Linus", forename=>"Torvalds");
my #system = qw(Linux FreeBSD Solaris NetBSD);
sub passStructure{
my ($arg1,$arg2)=#_;
if (ref($arg1) eq "HASH"){
&printHash($arg1);
}
elsif (ref($arg1) eq "ARRAY"){
&printArray($arg1);
}
if (ref($arg2) eq "HASH"){
&printHash($arg2);
}
elsif (ref($arg2) eq "ARRAY"){
&printArray($arg2);
}
}
sub printArray{
my $aref = $_[0];
print "#{$aref}\n";
print "#{$aref}->[0]\n";
print "$$aref[0]\n";
print "$aref->[0]\n";
}
sub printHash{
my $href = $_[0];
print "%{$href}\n";
print "%{$href}->{'name'}\n";
print "$$href{'name'}\n";
print "$href->{'name'}\n";
}
&passStructure(\#system,\%name);
There are several points mentioned in above document that I misunderstood:
1st
Page 44 mentions that those two syntax constructions: "$$href{'name'}" and "$$aref[0]" shouldn't never ever been used for accessing values. Why ? Seems in my code they are working fine (see bellow), moreover perl is complaining about using #{$aref}->[0] as deprecated, so which one is correct ?
2nd
Page 45 mentions that without "use strict" and using "$href{'SomeKey'}" when "$href->{'SomeKey'}" should be used, the %href is created implictly. So if I understand it well, both following scripts should print "Exists"
[pista#HP-PC temp]$ perl -ale 'my %ref=(SomeKey=>'SomeVal'); print $ref{'SomeKey'}; print "Exists\n" if exists $ref{'SomeKey'};'
SomeVal
Exists
[pista#HP-PC temp]$ perl -ale ' print $ref{'SomeKey'}; print "Exists\n" if exists $ref{'SomeKey'};'
but second wont, why ?
Output of two beginning mentioned scripts:
[pista#HP-PC temp]$ perl 01.pl
Using an array as a reference is deprecated at 01.pl line 32.
Linux FreeBSD Solaris NetBSD
Linux
Linux
Linux
%{HASH(0x1c33ec0)}
%{HASH(0x1c33ec0)}->{'name'}
Linus
Linus
[pista#HP-PC temp]$ perl 02.pl
Using an array as a reference is deprecated at 02.pl line 32.
Linux FreeBSD Solaris NetBSD
Linux
Linux
Linux
%{HASH(0x774e60)}
%{HASH(0x774e60)}->{'name'}
Linus
Linus
Many people think $$aref[0] is ugly and $aref->[0] not ugly. Others disagree; there is nothing wrong with the former form.
#{$aref}->[0], on the other hand, is a mistake that happens to work but is deprecated and may not continue to.
You may want to read http://perlmonks.org/?node=References+quick+reference
A package variable %href is created simply by mentioning such a hash without use strict "vars" in effect, for instance by leaving the -> out of $href->{'SomeKey'}. That doesn't mean that particular key is created.
Update: looking at the Perl Best Practices reference (a book that inspired much more slavish adoption and less actual thought than the author intended), it is recommending the -> form specifically to avoid the possibility of leaving off a sigil, leading to the problem mentioned on p45.
Perl has normal datatypes, and references to data types. It is important that you are aware of the differerence between them, both in their meaning, and in their syntax.
Type |Normal Access | Reference Access | Debatable Reference Access
=======+==============+==================+===========================
Scalar | $scalar | $$scalar_ref |
Array | $array[0] | $arrayref->[0] | $$arrayref[0]
Hash | $hash{key} | $hashref->{key} | $$hashref{key}
Code | code() | $coderef->() | &$coderef()
The reason why accessing hashrefs or arrayrefs with the $$foo[0] syntax can be considered bad is that (1) the double sigil looks confusingly like a scalar ref access, and (2) this syntax hides the fact that references are used. The dereferencing arrow -> is clear in its intent. I covered the reason why using the & sigil is bad in this answer.
The #{$aref}->[0] is extremely wrong, because you are dereferencing a reference to an array (which cannot, by definition, be a reference itself), and then dereferencing the first element of that array with the arrow. See the above table for the right syntax.
Interpolating hashes into strings seldom makes sense. The stringification of a hash denotes the number of filled and available buckets, and so can tell you about the load. This isn't useful in most cases. Also, not treating the % character as special in strings allows you to use printf…
Another interesting thing about Perl data structures is to know when a new entry in a hash or array is created. In general, accessing a value does not create a slot in that hash or array, except when you are using the value as reference.
my %foo;
$foo{bar}; # access, nothing happens
say "created at access" if exists $foo{bar};
$foo{bar}[0]; # usage as arrayref
say "created at ref usage" if exists $foo{bar};
Output: created at ref usage.
Actually, the arrayref spings into place, because you can use undef values as references in certain cases. This arrayref then populates the slot in the hash.
Without use strict 'refs', the variable (but not a slot in that variable) springs into place, because global variables are just entries in a hash that represents the namespace. $foo{bar} is the same as $main::foo{bar} is the same as $main::{foo}{bar}.
The major advantage of the $arg->[0] form over the $$arg[0] form, is that it's much clearer with the first type as to what is going on... $arg is an ARRAYREF and you're accessing the 0th element of the array it refers to.
At first reading, the second form could be interpreted as ${$arg}[0] (dereferencing an ARRAYREF) or ${$arg[0]} (dereferencing whatever the first element of #arg is.
Naturally, only one interpretation is correct, but we all have those days (or nights) where we're looking at code and we can't quite remember what order operators and other syntactic devices work in. Also, the confusion would compound if there were additional levels of dereferencing.
Defensive programmers will tend to err towards efforts to make their intentions explicit and I would argue that $arg->[0] is a much more explicit representation of the intention of that code.
As to the automatic creation of hashes... it's only the hash that would be created (so that the Perl interpreter and check to see if the key exists). The key itself is not created (naturally... you wouldn't want to create a key that you're checking for... but you may need to create the bucket that would hold that key, if the bucket doesn't exist. The process is called autovivification and you can read more about it here.
I believe you should be accessing the array as: #{$aref}[0] or $aref->[0].
a print statement does not instantiate an object. What is meant by the implicit creation is you don't need to predefine the variable before assigning to it. Since print does not assign the variable is not created.

Evaluating escape sequences in perl

I'm reading strings from a file. Those strings contain escape sequences which I would like to have evaluated before processing them further. So I do:
$t = eval("\"$t\"");
which works fine. But I'm having doubt about the performance. If eval is forking a perl process each time, it will be a performance killer. Another way I considered to do the job were regex, where I have found related questions in SO.
My question: is there a better, more efficient way to do it?
EDIT: before calling eval in my example $t is containing \064\065\x20a\n. It is evaluated to 45 a<LF>.
It's not quite clear what the strings in the file look like and what you do to them before passing off to eval. There's something missing in the explanation.
If you simply want to undo C-style escaping (as also used in Perl), use Encode::Escape:
use Encode qw(decode);
use Encode::Escape qw();
my $string_with_unescaped_literals = decode 'ascii-escape', $string_with_escaped_literals;
If you have placeholders in the file which look like Perl variables that you want to fill with values, then you are abusing eval as a poor man's templating engine. Use a real one that does not have the dangerous side effect of running arbitrary code.
$string =~ s/\\([rnt'"\\])/"qq|\\$1|"/gee
string eval can solve the problem too, but it brings up a host of security and maintenance issues, like # in string
oh gah don't use eval for this, thats dangerous if someone provides it with input like "system('sync;reboot');"..
But, you could do something like this:
#!/usr/bin/perl
$string = 'foo\"ba\\\'r';
printf("%s\n", $string);
$string =~ s/\\([\"\'])/$1/g;
printf("%s\n", $string);

perl encapsulate single variable in double quotes

In Perl, is there any reason to encapsulate a single variable in double quotes (no concatenation) ?
I often find this in the source of the program I am working on (writen 10 years ago by people that don't work here anymore):
my $sql_host = "something";
my $sql_user = "somethingelse";
# a few lines down
my $db = sub_for_sql_conection("$sql_host", "$sql_user", "$sql_pass", "$sql_db");
As far as I know there is no reason to do this. When I work in an old script I usualy remove the quotes so my editor colors them as variables not as strings.
I think they saw this somewhere and copied the style without understanding why it is so. Am I missing something ?
Thank you.
All this does is explicitly stringify the variables. In 99.9% of cases, it is a newbie error of some sort.
There are things that may happen as a side effect of this calling style:
my $foo = "1234";
sub bar { $_[0] =~ s/2/two/ }
print "Foo is $foo\n";
bar( "$foo" );
print "Foo is $foo\n";
bar( $foo );
print "Foo is $foo\n";
Here, stringification created a copy and passed that to the subroutine, circumventing Perl's pass by reference semantics. It's generally considered to be bad manners to munge calling variables, so you are probably okay.
You can also stringify an object or other value here. For example, undef stringifies to the empty string. Objects may specify arbitrary code to run when stringified. It is possible to have dual valued scalars that have distinct numerical and string values. This is a way to specify that you want the string form.
There is also one deep spooky thing that could be going on. If you are working with XS code that looks at the flags that are set on scalar arguments to a function, stringifying the scalar is a straight forward way to say to perl, "Make me a nice clean new string value" with only stringy flags and no numeric flags.
I am sure there are other odd exceptions to the 99.9% rule. These are a few. Before removing the quotes, take a second to check for weird crap like this. If you do happen upon a legit usage, please add a comment that identifies the quotes as a workable kludge, and give their reason for existence.
In this case the double quotes are unnecessary. Moreover, using them is inefficient as this causes the original strings to be copied.
However, sometimes you may want to use this style to "stringify" an object. For example, URI ojects support stringification:
my $uri = URI->new("http://www.perl.com");
my $str = "$uri";
I don't know why, but it's a pattern commonly used by newcomers to Perl. It's usually a waste (as it is in the snippet you posted), but I can think of two uses.
It has the effect of creating a new string with the same value as the original, and that could be useful in very rare circumstances.
In the following example, an explicit copy is done to protect $x from modification by the sub because the sub modifies its argument.
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f($x);
say $x;
'
Abc
Abc
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f("$x");
say $x;
'
Abc
abc
By virtue of creating a copy of the string, it stringifies objects. This could be useful when dealing with code that alters its behaviour based on whether its argument is a reference or not.
In the following example, explicit stringification is done because require handles references in #INC differently than strings.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib $lib;
use DBI;
say "ok";
'
Can't locate object method "INC" via package "Path::Class::Dir" at -e line 4.
BEGIN failed--compilation aborted at -e line 4.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib "$lib";
use DBI;
say "ok";
'
ok
In your case quotes are completely useless. We can even says that it is wrong because this is not idiomatic, as others wrote.
However quoting a variable may sometime be necessary: this explicitely triggers stringification of the value of the variable. Stringification may give a different result for some values if thoses values are dual vars or if they are blessed values with overloaded stringification.
Here is an example with dual vars:
use 5.010;
use strict;
use Scalar::Util 'dualvar';
my $x = dualvar 1, "2";
say 0+$x;
say 0+"$x";
Output:
1
2
My theory has always been that it's people coming over from other languages with bad habits. It's not that they're thinking "I will use double quotes all the time", but that they're just not thinking!
I'll be honest and say that I used to fall into this trap because I came to Perl from Java, so the muscle memory was there, and just kept firing.
PerlCritic finally got me out of the habit!
It definitely makes your code more efficient, but if you're not thinking about whether or not you want your strings interpolated, you are very likely to make silly mistakes, so I'd go further and say that it's dangerous.

When is the right time (and the wrong time) to use backticks?

Many beginning programmers write code like this:
sub copy_file ($$) {
my $from = shift;
my $to = shift;
`cp $from $to`;
}
Is this bad, and why? Should backticks ever be used? If so, how?
A few people have already mentioned that you should only use backticks when:
You need to capture (or supress) the output.
There exists no built-in function or Perl module to do the same task, or you have a good reason not to use the module or built-in.
You sanitise your input.
You check the return value.
Unfortunately, things like checking the return value properly can be quite challenging. Did it die to a signal? Did it run to completion, but return a funny exit status? The standard ways of trying to interpret $? are just awful.
I'd recommend using the IPC::System::Simple module's capture() and system() functions rather than backticks. The capture() function works just like backticks, except that:
It provides detailed diagnostics if the command doesn't start, is killed by a signal, or returns an unexpected exit value.
It provides detailed diagnostics if passed tainted data.
It provides an easy mechanism for specifying acceptable exit values.
It allows you to call backticks without the shell, if you want to.
It provides reliable mechanisms for avoiding the shell, even if you use a single argument.
The commands also work consistently across operating systems and Perl versions, unlike Perl's built-in system() which may not check for tainted data when called with multiple arguments on older versions of Perl (eg, 5.6.0 with multiple arguments), or which may call the shell anyway under Windows.
As an example, the following code snippet will save the results of a call to perldoc into a scalar, avoids the shell, and throws an exception if the page cannot be found (since perldoc returns 1).
#!/usr/bin/perl -w
use strict;
use IPC::System::Simple qw(capture);
# Make sure we're called with command-line arguments.
#ARGV or die "Usage: $0 arguments\n";
my $documentation = capture('perldoc', #ARGV);
IPC::System::Simple is pure Perl, works on 5.6.0 and above, and doesn't have any dependencies that wouldn't normally come with your Perl distribution. (On Windows it depends upon a Win32:: module that comes with both ActiveState and Strawberry Perl).
Disclaimer: I'm the author of IPC::System::Simple, so I may show some bias.
The rule is simple: never use backticks if you can find a built-in to do the same job, or if their is a robust module on the CPAN which will do it for you. Backticks often rely on unportable code and even if you untaint the variables, you can still open yourself up to a lot of security holes.
Never use backticks with user data unless you have very tightly specified what is allowed (not what is disallowed -- you'll miss things)! This is very, very dangerous.
Backticks should be used if and only if you need to capture the output of a command. Otherwise, system() should be used. And, of course, if there's a Perl function or CPAN module that does the job, this should be used instead of either.
In either case, two things are strongly encouraged:
First, sanitize all inputs: Use Taint mode (-T) if the code is exposed to possible untrusted input. Even if it's not, make sure to handle (or prevent) funky characters like space or the three kinds of quote.
Second, check the return code to make sure the command succeeded. Here is an example of how to do so:
my $cmd = "./do_something.sh foo bar";
my $output = `$cmd`;
if ($?) {
die "Error running [$cmd]";
}
Another way to capture stdout(in addition to pid and exit code) is to use IPC::Open3 possibily negating the use of both system and backticks.
Use backticks when you want to collect the output from the command.
Otherwise system() is a better choice, especially if you don't need to invoke a shell to handle metacharacters or command parsing. You can avoid that by passing a list to system(), eg system('cp', 'foo', 'bar') (however you'd probably do better to use a module for that particular example :))
In Perl, there's always more than one way to do anything you want. The primary point of backticks is to get the standard output of the shell command into a Perl variable. (In your example, anything that the cp command prints will be returned to the caller.) The downside of using backticks in your example is you don't check the shell command's return value; cp could fail and you wouldn't notice. You can use this with the special Perl variable $?. When I want to execute a shell command, I tend to use system:
system("cp $from $to") == 0
or die "Unable to copy $from to $to!";
(Also observe that this will fail on filenames with embedded spaces, but I presume that's not the point of the question.)
Here's a contrived example of where backticks might be useful:
my $user = `whoami`;
chomp $user;
print "Hello, $user!\n";
For more complicated cases, you can also use open as a pipe:
open WHO, "who|"
or die "who failed";
while(<WHO>) {
# Do something with each line
}
close WHO;
From the "perlop" manpage:
That doesn't mean you should go out of
your way to avoid backticks when
they're the right way to get something
done. Perl was made to be a glue
language, and one of the things it
glues together is commands. Just
understand what you're getting
yourself into.
For the case you are showing using the File::Copy module is probably best. However, to answer your question, whenever I need to run a system command I typically rely on IPC::Run3. It provides a lot of functionality such as collecting the return code and the standard and error output.
Whatever you do, as well as sanitising input and checking the return value of your code, make sure you call any external programs with their explicit, full path. e.g. say
my $user = `/bin/whoami`;
or
my $result = `/bin/cp $from $to`;
Saying just "whoami" or "cp" runs the risk of accidentally running a command other than what you intended, if the user's path changes - which is a security vulnerability that a malicious attacker could attempt to exploit.
Your example's bad because there are perl builtins to do that which are portable and usually more efficient than the backtick alternative.
They should be used only when there's no Perl builtin (or module) alternative. This is both for backticks and system() calls. Backticks are intended for capturing output of the executed command.
Backticks are only supposed to be used when you want to capture output. Using them here "looks silly." It's going to clue anyone looking at your code into the fact that you aren't very familiar with Perl.
Use backticks if you want to capture output.
Use system if you want to run a command. One advantage you'll gain is the ability to check the return status.
Use modules where possible for portability. In this case, File::Copy fits the bill.
In general, it's best to use system instead of backticks because:
system encourages the caller to check the return code of the command.
system allows "indirect object" notation, which is more secure and adds flexibility.
Backticks are culturally tied to shell scripting, which might not be common among readers of the code.
Backticks use minimal syntax for what can be a heavy command.
One reason users might be temped to use backticks instead of system is to hide STDOUT from the user. This is more easily and flexibly accomplished by redirecting the STDOUT stream:
my $cmd = 'command > /dev/null';
system($cmd) == 0 or die "system $cmd failed: $?"
Further, getting rid of STDERR is easily accomplished:
my $cmd = 'command 2> error_file.txt > /dev/null';
In situations where it makes sense to use backticks, I prefer to use the qx{} in order to emphasize that there is a heavy-weight command occurring.
On the other hand, having Another Way to Do It can really help. Sometimes you just need to see what a command prints to STDOUT. Backticks, when used as in shell scripts are just the right tool for the job.
Perl has a split personality. On the one hand it is a great scripting language that can replace the use of a shell. In this kind of one-off I-watching-the-outcome use, backticks are convenient.
When used a programming language, backticks are to be avoided. This is a lack of error
checking and, if the separate program backticks execute can be avoided, efficiency is
gained.
Aside from the above, the system function should be used when the command's output is not being used.
Backticks are for amateurs. The bullet-proof solution is a "Safe Pipe Open" (see "man perlipc"). You exec your command in another process, which allows you to first futz with STDERR, setuid, etc. Advantages: it does not rely on the shell to parse #ARGV, unlike open("$cmd $args|"), which is unreliable. You can redirect STDERR and change user priviliges without changing the behavior of your main program. This is more verbose than backticks but you can wrap it in your own function like run_cmd($cmd,#args);
sub run_cmd {
my $cmd = shift #_;
my #args = #_;
my $fh; # file handle
my $pid = open($fh, '-|');
defined($pid) or die "Could not fork";
if ($pid == 0) {
open STDERR, '>/dev/null';
# setuid() if necessary
exec ($cmd, #args) or exit 1;
}
wait; # may want to time out here?
if ($? >> 8) { die "Error running $cmd: [$?]"; }
while (<$fh>) {
# Have fun with the output of $cmd
}
close $fh;
}