What's the difference between open and sysopen in Perl? - perl

It seems both do the same thing, huh?
Can someone show me an example where they do different job?

sysopen is a thin wrapper around the open(2) kernel system call (the arguments correspond directly), whereas open is a higher-level wrapper which enables you to do redirections, piping, etc.
Unless you are working with a specific device that requires some special flags to be passed at open(2) time, for ordinary files on disk you should be fine with open.

To quote perlopentut:
If you want the convenience of the shell, then Perl's open is
definitely the way to go. On the other hand, if you want finer
precision than C's simplistic fopen(3S) provides you should look to
Perl's sysopen, which is a direct hook into the open(2) system call.
That does mean it's a bit more involved, but that's the price of
precision.
Since Perl is written in C, both methods likely end up making the open(2) system call. The difference is that open() in Perl has some niceties built in that make opening, piping and redirection very easy. At the same time, though, open() takes away some flexibility. It has none of the Fcntl functionality available in sysopen(), nor does it have the masking functionality.
Most situations just need open().

Related

Executing system commands safely while coding in Perl

Should one really use external commands while coding in Perl? I see several disadvantages of it. It's not system independent plus security risks might also be there. What do you think? If there is no way and you have to use the shell commands from Perl then what is the safest way to execute that particular command (like checking pid, uid etc)?
It depends on how hard it is going to be to replicate the functionality in Perl. If I needed to run the m4 macro processor on something, I'd not think of trying to replicate that functionality in Perl myself, and since there's no module on http://search.cpan.org/ that looks suitable, it would appear others agree with me. In that case, then, using the external program is sensible. On the other hand, if I needed to read the contents of a directory, then the combination of readdir() et al plus stat() or lstat() inside Perl is more sensible than futzing with the output of ls.
If you need to execute commands, think very carefully about how you invoke them. In particular, you probably want to avoid the shell interpreting the arguments, so use the array form of system (see also exec), etc, rather than a single string for the command plus arguments (which means the shell is used to process the command line).
Executing external commands can be expensive simply because it involves forking new process and watching for its output if you need it.
Probably more importantly, should external process fail for any reason, it may be difficult to understand what happened by means of your script. Worse still, surprisingly often external process can be stuck forever, so will be your script. You can use special tricks like opening pipe and watching for output in loop, but this itself is error-prone.
Perl is very capable of doing many things. So, if you stick to using only Perl native constructs and modules to accomplish your tasks, not only it will be faster because you never fork, but it will be more reliable and easier to catch errors by looking at native Perl objects and structures returned by library routines. And of course, it will be automatically portable to different platforms.
If your script runs under elevated permissions (like root or under sudo), you should be very careful as to what external programs you execute. One of the simple ways to ensure basic security is to always specify commands by full name, like /usr/bin/grep (but still think twice and just do grep by Perl itself!). However, even this may not be enough if attacker is using LD_PRELOAD mechanism to inject rogue shared libraries.
If you are willing to go very secure, it is suggested to use tainted check by using -T flag like this:
#!/usr/bin/perl -T
Taint flag will be also enabled by Perl automatically if your script was determined to have different real and effective user or group ids.
Tainted mode will severely limit your ability to do many things (like system() call) without Perl complaining - see more at http://perldoc.perl.org/perlsec.html#Taint-mode, but it will give you much higher security confidence.
Should one really use external commands while coding in Perl?
There's no single answer to this question. It all depends on what you are doing within the wide range of potential uses of Perl.
Are you using Perl as a glorified shell script on your local machine, or just trying to find a quick-and-dirty solution to your problem? In that case, it makes a lot of sense to run system commands if that is the easiest way to accomplish your task. Security and speed are not that important; what matters is the ability to code quickly.
On the other hand, are you writing a production program? In that case, you want secure, portable, efficient code. It is often preferable to write the functionality in Perl (or use a module), rather than calling an external program. At least, you should think hard about the benefits and drawbacks.

In Perl scripts, should we use shell commands or call Perl functions that imitate shell operations?

I want to know about the best practices here. Suppose I want to get the content of some line of a file. I can use a one-line shell command to get my answer, or write a subroutine, as shown in the code below.
A text file named some_text:
She laughed. Then both continued eating in silence, like strangers,
but after dinner they walked side by side; and there sprang up
between them the light jesting conversation of people who are free
and satisfied, to whom it does not matter where they go or what
they talk about.
Code to get content of line 5 of the file
#!perl
use warnings;
use strict;
my $file = "some_text";
my $lnum = 5;
my $shellcmd = "awk 'NR==$lnum' $file";
print qx($shellcmd);
print getSrcLine($file, $lnum);
sub getSrcLine {
my($file, $lnum) = #_;
open FILE, $file or die "$!";
my #ray = <FILE>;
return $ray[$lnum-1];
}
I ask this because I see a lot of Perl scripts where at some point, a shell command was called, while at some later point, the same task was done by a call to a (library or handwritten) function, for example, rm -rf versus File::Path::rmtree. I just want to make it consistent.
What is the recommended thing to do?
If there's a Perl function for the operation, Perl thinks you should use its version. However, you give an example of a Perl module providing a pure Perl way to do it. That's much different. There's no single answer (as in most things), so you have to decide for yourself what to do:
Does the pure Perl approach do it correctly? For example, File::Copy has some limitations because it makes some awkward decisions for the user, so many people think it's broken. See, for instance, File::Copy versus cp/mv.
Does pure Perl approach do it in an acceptable time? Sometimes the external program is orders of magnitude faster. Sometimes it's a lot slower.
External commands usually are portable within a family of systems (e.g. all linux-like systems) but probably not across families (e.g. Windows and linux). Your tolerance for that might affect your answer. Even if you think you are running the same command, the different flavors of unix-like systems might have different switches for the operations.
Passing complicated arguments—spaces, quotes, and special characters—to external commands can make you cry. You have to do a lot of fiddly work to make sure you're handling arguments correctly. Perl subroutines don't care though.
You have to pay much more attention to what you are doing when you are using the external command. If you just call rm, Perl is going to search through your PATH and use the first thing called rm. That doesn't mean it's the program you think it is. I write about this quite a bit in the "Secure Programming Techniques" in Mastering Perl.
If the pure Perl approach requires a module, especially if that module has many complicated dependencies, you might be in for dependency or distribution hell down the road.
Personally, I start with the pure Perl approach until it doesn't work for the situation.
For your particular examples, I'd use Perl. Shelling out to awk, which is a proto-Perl, is just odd. You should be able to do everything awk does right it Perl. If you have an awk program, you can convert it to Perl with the a2p program:
NR==5
a2p turns that into (modulo some setup bits at the start):
while (<>) {
print $_ if $. == 5;
}
Notice that it still scans the entire file even though you have the fifth line. However, you can use the translated program as a start:
while (<>) {
if( $. == 5 ) {
print;
last;
}
}
I don't think you should shell out to some other program to avoid that Perl code.
To remove a directory tree, I like File::Path. It has some dependencies, but they are all in the Perl Standard Library. There's very little pain, if any, associated with that module. I'd use it until I ran into a problem where it didn't work.
If you want your app to be portable to non-unix systems, then definitely code everything in Perl.
If not, it's really up to you... creating a new process is slower, but if it's not important for the task then it doesn't matter. Personally I would pick the solution which I can quicker implement.
It seems to me that code that works should be the first priority. Yours fails if the file name has a space in it, for example.
Using the shell makes it harder to code correctly since your program needs to properly generate another program to be run by sh. (This problem goes away if you use the multi-arg version of system to avoid the shell.)
Furthermore, using external tools can make it hard to handle errors. You didn't even attempt to do so!
On the flip side, there are multiple reasons for using external tools. For example, Perl doesn't provide as good an file copy utility as cp; using the sort tool allows you to sort arbitrary large files with limited RAM; etc.

Why is there so much "magic" in Perl?

Looking through the perlsub and perlop manpages I've noticed that there are many references to "magic" and "magical" there (just search any of them for "magic"). I wonder why is Perl so rich in them.
Some examples:
print ++($foo = 'zz') # prints 'aaa'
printf "%d: %s", $! = 1, $! # prints '1: Operation not permitted'
while (my $line = <FH>) { ... } # $line is tested for definedness, not truth
use warnings; print "0 but true" + 1 # "0 but true" is a valid number!
When a Perl feature is described as "magic":
It means that that feature is
implemented by NBA star Magic Johnson.
Whenever Perl executes "magic", it is
actually sending an RPC call to a
remote receiver implanted in Magic
himself. He computes the answer, and
then sends a return message. The use
of Mr. Johnson for all the hard parts
of Perl provides a great abstraction
layer and simplifies porting to new
platforms. It's way easier than, say,
the Apache Portable Runtime.
Source: perrin on Perl Monks
It's official! Perl is more magical.
Hits from the following Google searches:
25 site:ruby-doc.org magic
36 site:docs.python.org magic
497 site:perldoc.perl.org magic
Magic, in Perl parlance is simply the word given to attributes applied to variables / functions that allow an extension of their functionality. Some of this functionality is available directly from Perl, and some requires the use of the C api.
A perfect example of magic is the tie interface which allows you to define your own implementation of a variable. Every operation that can be done to a variable (fetching or storing a value for instance) is exposed for reimplementation, allowing for elegant and logical syntactic constructs like a hash with values stored on disk, which are transparently loaded and saved behind the scenes.
Magic can also refer to the special ways that certain builtins can behave, such as how the first argument to map or grep can either be a block or a bare expression:
my #squares = map {$_**2} 1 .. 10;
my #roots = map sqrt, 1 .. 10;
which is not a behavior available to user defined subroutines.
Many other features of Perl, such as operator overloading or variables that can return different values when used with numeric or string operators are implemented with magic. Context could be seen as magic as well.
In a nutshell, magic is any time that a Perl construct behaves differently than a naive interpretation would suggest, an exception to the rule. Magic is of course very powerful, and should not be wielded without great care. Magic Johnson is of course involved in the execution of all magic (see FM's answer), but that is beyond the scope of this explaination.
I wonder why is Perl so rich in them.
To make things easy.
You'll find that most "magic" in Perl is to simplify the syntax for common tasks.
Because perl always Does What I Mean for some values of always.
I think (opinion more than fact) that this has to do with the organic growth viewpoint that Perl's creator Larry Wall has with the Perl language. Python is a study in the opposite approach, whose style often makes Perl hackers cringe at the perception of being forced to conform to a stylistic regime.
Some of it has to do with Perl being designed to be "efficient" at writing quick scripts to do Perl*-ish* tasks, in both wall clock time, and in keystrokes. Some of it has to do with the TMTOWTDI mantra of Perl and its followers.
Programmers tend to be opinionated about Perl's frequent usage of "magic", for some it is an eye-straining visual cacophony of chaos and disrespect for orderliness (which harkens back to the days of computer Priesthood in white lab coats behind a glass window), for others it is a shining example of getting things done efficiently, if not always obviously to the novice or outsider.
Perl's design philosophy is that simple things must be simple. This sounds good,and to some extent it is. However, there's a tradeoff involved: Making every simple thing a one-liner results in tons of special case hacks to save a few lines of code. Different people have different preferences regarding making simple operations within a language simple versus making the language specification simple. Perl is at one extreme. Java is at the other, at least among languages that people actually use. Python and C# are somewhere in between.

Why are Perl source filters bad and when is it OK to use them?

It is "common knowledge" that source filters are bad and should not be used in production code.
When answering a a similar, but more specific question I couldn't find any good references that explain clearly why filters are bad and when they can be safely used. I think now is time to create one.
Why are source filters bad?
When is it OK to use a source filter?
Why source filters are bad:
Nothing but perl can parse Perl. (Source filters are fragile.)
When a source filter breaks pretty much anything can happen. (They can introduce subtle and very hard to find bugs.)
Source filters can break tools that work with source code. (PPI, refactoring, static analysis, etc.)
Source filters are mutually exclusive. (You can't use more than one at a time -- unless you're psychotic).
When they're okay:
You're experimenting.
You're writing throw-away code.
Your name is Damian and you must be allowed to program in latin.
You're programming in Perl 6.
Only perl can parse Perl (see this example):
#result = (dothis $foo, $bar);
# Which of the following is it equivalent to?
#result = (dothis($foo), $bar);
#result = dothis($foo, $bar);
This kind of ambiguity makes it very hard to write source filters that always succeed and do the right thing. When things go wrong, debugging is awkward.
After crashing and burning a few times, I have developed the superstitious approach of never trying to write another source filter.
I do occasionally use Smart::Comments for debugging, though. When I do, I load the module on the command line:
$ perl -MSmart::Comments test.pl
so as to avoid any chance that it might remain enabled in production code.
See also: Perl Cannot Be Parsed: A Formal Proof
I don't like source filters because you can't tell what code is going to do just by reading it. Additionally, things that look like they aren't executable, such as comments, might magically be executable with the filter. You (or more likely your coworkers) could delete what you think isn't important and break things.
Having said that, if you are implementing your own little language that you want to turn into Perl, source filters might be the right tool. However, just don't call it Perl. :)
It's worth mentioning that Devel::Declare keywords (and starting with Perl 5.11.2, pluggable keywords) aren't source filters, and don't run afoul of the "only perl can parse Perl" problem. This is because they're run by the perl parser itself, they take what they need from the input, and then they return control to the very same parser.
For example, when you declare a method in MooseX::Declare like this:
method frob ($bubble, $bobble does coerce) {
... # complicated code
}
The word "method" invokes the method keyword parser, which uses its own grammar to get the method name and parse the method signature (which isn't Perl, but it doesn't need to be -- it just needs to be well-defined). Then it leaves perl to parse the method body as the body of a sub. Anything anywhere in your code that isn't between the word "method" and the end of a method signature doesn't get seen by the method parser at all, so it can't break your code, no matter how tricky you get.
The problem I see is the same problem you encounter with any C/C++ macro more complex than defining a constant: It degrades your ability to understand what the code is doing by looking at it, because you're not looking at the code that actually executes.
In theory, a source filter is no more dangerous than any other module, since you could easily write a module that redefines builtins or other constructs in "unexpected" ways. In practice however, it is quite hard to write a source filter in a way where you can prove that its not going to make a mistake. I tried my hand at writing a source filter that implements the perl6 feed operators in perl5 (Perl6::Feeds on cpan). You can take a look at the regular expressions to see the acrobatics required to simply figure out the boundaries of expression scope. While the filter works, and provides a test bed to experiment with feeds, I wouldn't consider using it in a production environment without many many more hours of testing.
Filter::Simple certainly comes in handy by dealing with 'the gory details of parsing quoted constructs', so I would be wary of any source filter that doesn't start there.
In all, it really depends on the filter you are using, and how broad a scope it tries to match against. If it is something simple like a c macro, then its "probably" ok, but if its something complicated then its a judgement call. I personally can't wait to play around with perl6's macro system. Finally lisp wont have anything on perl :-)
There is a nice example here that shows in what trouble you can get with source filters.
http://shadow.cat/blog/matt-s-trout/show-us-the-whole-code/
They used a module called Switch, which is based on source filters. And because of that, they were unable to find the source of an error message for days.

Why shouldn't I use shell tools in Perl code?

It is generally advised not to use additional linux tools in a Perl code;
e.g if someone intends to print the last line of a text file he can:
$last_line = `tail -1 $file` ;
or otherwise, open the file and read it line by line
open(INFO,$file);
while(<INFO>) {
$last_line = $_ if eof;
}
What are the pitfalls of using the previous and why should I avoid using shell tools in my code?
thanx,
Efficiency - you don't have to spawn a new process
Portability - you don't have to worry about an executable not existing, accepting different switches, or having different output
Ease of use - you don't have to parse the output, the results are already in a usable form
Error handling - you have finer-grained control over errors and what to do about them in Perl.
It's better to keep all the action in Perl because it's faster and because it's more secure. It's faster because you're not spawning a new process, and it's more secure because you don't have to worry about shell meta character trickery.
For example, in your first case if $file contained "afilename ; rm -rf ~" you would be a very unhappy camper.
P.S. The best all-Perlway to do the tail is to use File::ReadBackwards
One of the primary reasons (besides portability) for not executing shell commands is that it introduces overhead by spawning another process. That's why much of the same functionality is available via CPAN in Perl modules.
One reason is that your Perl code might be running in an environment where there is no shell tool called 'tail'.
It's a personal call depending on the project:
Is it going to be always used in shell environments with tail?
Do you care about only using pure Perl code?
Using tail? Fine. But that's really a special case, since it's so easy to use and since it is so trivial.
The problem in general is not really efficiency or portability, that is largely irrelevant; the issue is ease of use. To run an external utility, you have to find out what arguments it accepts, write code to transform your program's data structures to that format, quote them properly, build the command line, and run the application. Then, you might have to feed it data and read data from it (involving complexity like an event loop, worrying about deadlocking, etc.), and finally interpret the return value. (UNIX processes consider "0" true and anything else false, but Perl assumes the opposite. foo() and die is hard to read.) This is a lot of work to do, and that's why people avoid it. It's much easier to create an instance of a class and call methods on it to get the data you need.
(You can abstract away processes this way; see Crypt::GpgME for example. It handles the complexity associated with invoking gpg, which would normally involve creating multiple filehandles other than STDOUT, STDIN, and STDERR, among other things.)
The main reason I see for doing it all in Perl would be for robustness. Your use of tail will fail if the filename has shell metacharacters or spaces or doesn't exist or isn't accessible. From Perl, characters in the filename aren't an issue, and you can distinguish between errors in accessing the file. Sometimes being robust is more important than speedy coding and sometimes it's not.