What are the best-practices for implementing a CLI tool in Perl? - perl

I am implementing a CLI tool using Perl.
What are the best-practices we can follow here?

As a preface, I spent 3 years engineering and implementing a pretty complicated command line toolset in Perl for a major financial company. The ideas below are basically part of our team's design guidelines.
User Interface
Command line option: allow as many as possible have default values.
NO positional parameters for any command that has more than 2 options.
Have readable options names. If length of command line is a concern for non-interactive calling (e.g. some un-named legacy shells have short limits on command lines), provide short aliases - GetOpt::Long allows that easily.
At the very least, print all options' default values in '-help' message.
Better yet, print all the options' "current" values (e.g. if a parameter and a value are supplied along with "-help", the help message will print parameter's value from command line). That way, people can assemble command line string for complicated command and verify it by appending "-help", before actually running.
Follow Unix standard convention of exiting with non-zero return code if program terminated with errors.
If your program may produce useful (e.g. worth capturing/grepping/whatnot) output, make sure any error/diagnostic messages go to STDERR so they are easily separable.
Ideally, allow the user to specify input/output files via command line parameter, instead of forcing "<" / ">" redirects - this allows MUCH simpler life to people who need to build complicated pipes using your command. Ditto for error messages - have logfile option.
If a command has side effect, having a "whatif/no_post" option is usually a Very Good Idea.
Implementation
As noted previously, don't re-invent the wheel. Use standard command line parameter handling modules - MooseX::Getopt, or Getopt::Long
For Getopt::Long, assign all the parameters to a single hash as opposed to individual variables. Many useful patterns include passing that CLI args hash to object constructors.
Make sure your error messages are clear and informative... E.g. include "$!" in any IO-related error messages. It's worth expending extra 1 minute and 2 lines in your code to have a separate "file not found" vs. "file not readable" errors, as opposed to spending 30 minutes in production emergency because a non-readable file error was misdiagnosed by Production Operations as "No input file" - this is a real life example.
Not really CLI-specific, but validate all parameters, ideally right after getting them.
CLI doesn't allow for a "front-end" validation like webapps do, so be super extra vigilant.
As discussed above, modularize business logic. Among other reasons already listed, the amount of times I had to re-implement an existing CLI tool as a web app is vast - and not that difficult if the logic is already a properly designed perm module.
Interesting links
CLI Design Patterns - I think this is ESR's
I will try to add more bullets as I recall them.

Use POD to document your tool, follow the guidelines of manpages; include at least the following sections: NAME, SYNOPSIS, DESCRIPTION, AUTHOR. Once you have proper POD you can generate a man page with pod2man, view the documentation at the console with perldoc your-script.pl.
Use a module that handles command line options for you. I really like using Getopt::Long in conjunction with Pod::Usage this way invoking --help will display a nice help message.
Make sure that your scripts returns a proper exit value if it was successful or not.
Here's a small skeleton of a script that does all of these:
#!/usr/bin/perl
=head1 NAME
simplee - simple program
=head1 SYNOPSIS
simple [OPTION]... FILE...
-v, --verbose use verbose mode
--help print this help message
Where I<FILE> is a file name.
Examples:
simple /etc/passwd /dev/null
=head1 DESCRIPTION
This is as simple program.
=head1 AUTHOR
Me.
=cut
use strict;
use warnings;
use Getopt::Long qw(:config auto_help);
use Pod::Usage;
exit main();
sub main {
# Argument parsing
my $verbose;
GetOptions(
'verbose' => \$verbose,
) or pod2usage(1);
pod2usage(1) unless #ARGV;
my (#files) = #ARGV;
foreach my $file (#files) {
if (-e $file) {
printf "File $file exists\n" if $verbose;
}
else {
print "File $file doesn't exist\n";
}
}
return 0;
}

Some lessons I've learned:
1) Always use Getopt::Long
2) Provide help on usage via --help, ideally with examples of common scenarios. It helps people don't know or have forgotten how to use the tool. (I.e., you in six months).
3) Unless it's pretty obvious to the user as why, don't go for long period (>5s) without output to the user. Something like 'print "Row $row...\n" unless ($row % 1000)' goes a long way.
4) For long running operations, allow the user to recover if possible. It really sucks to get through 500k of a million, die, and start over again.
5) Separate the logic of what you're doing into modules and leave the actual .pl script as barebones as possible; parsing options, display help, invoking basic methods, etc. You're inevitably going to find something you want to reuse, and this makes it a heck of a lot easier.

The most important thing is to have standard options.
Don't try to be clever, be simply consistent with already existing tools.
How to achieve this is also important, but only comes second.
Actually, this is quite generic to all CLI interfaces.

There are a couple of modules on CPAN that will make writing CLI programs a lot easier:
App::CLI
App::Cmd
If you app is Moose based also have a look at MooseX::Getopt and MooseX::Runnable

The following points aren't specific to Perl but I've found many Perl CL scripts to be deficient in these areas:
Use common command line options. To show the version number implement -v or --version not --ver. For recursive processing -r (or perhaps -R although in my Gnu/Linux experience -r is more common) not --rec. People will use your script if they can remember the parameters. It's easy to learn a new command if you can remember "it works like grep" or some other familiar utility.
Many command line tools process "things" (files or directories) within the "current directory". While this can be convenient make sure you also add command line options for explicitly identifying the files or directories to process. This makes it easier to put your utility in a pipeline without developers having to issue a bunch of cd commands and remember which directory they're in.

You should use Perl modules to make your code reusable and easy to understand.
should have a look at Perl best practices

Related

Cleanup huge Perl Codebase

I am currently working on a roughly 15 years old web application.
It contains mainly CGI perl scripts with HTML::Template templates.
It has over 12 000 files and roughly 260 MB of total code. I estimate that no more than 1500 perl scripts are needed and I want to get rid of all the unused code.
There are practically no tests written for the code.
My questions are:
Are you aware of any CPAN module that can help me get a list of only used and required modules?
What would be your approach if you'd want to get rid of all the extra code?
I was thinking at the following approaches:
try to override the use and require perl builtins with ones that output the loaded file name in a specific location
override the warnings and/or strict modules import function and output the file name in the specific location
study the Devel::Cover perl module and take the same approach and analyze the code when doing manual testing instead of automated tests
replace the perl executable with a custom one, which will log each name of file it reads (I don't know how to do that yet)
some creative use of lsof (?!?)
Devel::Modlist may give you what you need, but I have never used it.
The few times I have needed to do somehing like this I have opted for the more brute force approach of inspecting %INC at the end the program.
END {
open my $log_fh, ...;
print $log_fh "$_\n" for sort keys %INC;
}
As a first approximation, I would simply run
egrep -r '\<(use|require)\>' /path/to/source/*
Then spend a couple of days cleaning up the output from that. That will give you a list of all of the modules used or required.
You might also be able to play around with #INC to exclude certain library paths.
If you're trying to determine execution path, you might be able to run the code through the debugger with 'trace' (i.e. 't' in the debugger) turned on, then redirect the output to a text file for further analysis. I know that this is difficult when running CGI...
Assuming the relevant timestamps are turned on, you could check access times on the various script files - that should rule out any top-level script files that aren't being used.
Might be worth adding some instrumentation to CGI.pm to log the current script-name ($0) to see what's happening.

In Perl scripts, should we use shell commands or call Perl functions that imitate shell operations?

I want to know about the best practices here. Suppose I want to get the content of some line of a file. I can use a one-line shell command to get my answer, or write a subroutine, as shown in the code below.
A text file named some_text:
She laughed. Then both continued eating in silence, like strangers,
but after dinner they walked side by side; and there sprang up
between them the light jesting conversation of people who are free
and satisfied, to whom it does not matter where they go or what
they talk about.
Code to get content of line 5 of the file
#!perl
use warnings;
use strict;
my $file = "some_text";
my $lnum = 5;
my $shellcmd = "awk 'NR==$lnum' $file";
print qx($shellcmd);
print getSrcLine($file, $lnum);
sub getSrcLine {
my($file, $lnum) = #_;
open FILE, $file or die "$!";
my #ray = <FILE>;
return $ray[$lnum-1];
}
I ask this because I see a lot of Perl scripts where at some point, a shell command was called, while at some later point, the same task was done by a call to a (library or handwritten) function, for example, rm -rf versus File::Path::rmtree. I just want to make it consistent.
What is the recommended thing to do?
If there's a Perl function for the operation, Perl thinks you should use its version. However, you give an example of a Perl module providing a pure Perl way to do it. That's much different. There's no single answer (as in most things), so you have to decide for yourself what to do:
Does the pure Perl approach do it correctly? For example, File::Copy has some limitations because it makes some awkward decisions for the user, so many people think it's broken. See, for instance, File::Copy versus cp/mv.
Does pure Perl approach do it in an acceptable time? Sometimes the external program is orders of magnitude faster. Sometimes it's a lot slower.
External commands usually are portable within a family of systems (e.g. all linux-like systems) but probably not across families (e.g. Windows and linux). Your tolerance for that might affect your answer. Even if you think you are running the same command, the different flavors of unix-like systems might have different switches for the operations.
Passing complicated arguments—spaces, quotes, and special characters—to external commands can make you cry. You have to do a lot of fiddly work to make sure you're handling arguments correctly. Perl subroutines don't care though.
You have to pay much more attention to what you are doing when you are using the external command. If you just call rm, Perl is going to search through your PATH and use the first thing called rm. That doesn't mean it's the program you think it is. I write about this quite a bit in the "Secure Programming Techniques" in Mastering Perl.
If the pure Perl approach requires a module, especially if that module has many complicated dependencies, you might be in for dependency or distribution hell down the road.
Personally, I start with the pure Perl approach until it doesn't work for the situation.
For your particular examples, I'd use Perl. Shelling out to awk, which is a proto-Perl, is just odd. You should be able to do everything awk does right it Perl. If you have an awk program, you can convert it to Perl with the a2p program:
NR==5
a2p turns that into (modulo some setup bits at the start):
while (<>) {
print $_ if $. == 5;
}
Notice that it still scans the entire file even though you have the fifth line. However, you can use the translated program as a start:
while (<>) {
if( $. == 5 ) {
print;
last;
}
}
I don't think you should shell out to some other program to avoid that Perl code.
To remove a directory tree, I like File::Path. It has some dependencies, but they are all in the Perl Standard Library. There's very little pain, if any, associated with that module. I'd use it until I ran into a problem where it didn't work.
If you want your app to be portable to non-unix systems, then definitely code everything in Perl.
If not, it's really up to you... creating a new process is slower, but if it's not important for the task then it doesn't matter. Personally I would pick the solution which I can quicker implement.
It seems to me that code that works should be the first priority. Yours fails if the file name has a space in it, for example.
Using the shell makes it harder to code correctly since your program needs to properly generate another program to be run by sh. (This problem goes away if you use the multi-arg version of system to avoid the shell.)
Furthermore, using external tools can make it hard to handle errors. You didn't even attempt to do so!
On the flip side, there are multiple reasons for using external tools. For example, Perl doesn't provide as good an file copy utility as cp; using the sort tool allows you to sort arbitrary large files with limited RAM; etc.

Enable global warnings

I have to optimize an intranet written in Perl (about 3000 files). The first thing I want to do is enable warnings "-w" or "use warnings;" so I can get rid of all those errors, then try to implement "use strict;".
Is there a way of telling Perl to use warnings all the time (like the settings in php.ini for PHP), without the need to modify each script to add "-w" to it's first line?
I even thought to make an alias for /usr/bin/perl, or move it to another name and make a simple script instead of it just to add the -w flag (like a proxy).
How would you debug it?
Well…
You could set the PERL5OPT envariable to hold -w. See the perlrun manpage for details. I hope you’ll consider tainting, too, like -T or maybe -t, for security tracking.
But I don’t envy you. Retrofitting code developed without the benefit of use warnings and use strict is usually a royal PITA.
I have something of a standard boiler-plate I use to start new Perl programs. But I haven’t given any thought to one for CGI programs, which would likely benefit from some tweaks against that boiler-plate.
Retrofitting warnings and strict is hard. I don't recommend a Big Bang approach, setting warnings (let alone strictures) on everything. You will be inundated with warnings to the point of uselessness.
You start by enabling warnings on the modules used by the scripts (there are some, aren't there?), rather than applying warnings to everything. Get the core clean, then get to work on the periphery, one unit at a time. So, in fact, I'd recommend having a simple (Perl) script that simply finds a line that does not start with a hash and adds use warnings; (and maybe use strict; too, since you're going to be dealing with one script at a time), so you can do the renovations one script at a time.
In other words, you will probably be best off actually editing each file as you're about to renovate it.
I'd only use the blanket option to make a simple assessment of the scope of the problem: is it a complete and utter disaster, or merely a few peccadilloes in a few files. Sadly, if the code was developed without warnings and strict, it is more likely to be 'disaster' than 'minimal'.
You may find that your predecessors were prone to copy and paste and some erroneous idioms crop up repeatedly in copied code. Write a Perl script that fixes each one. I have a bunch of fix* scripts in my personal bin directory that deal with various changes - either fixing issues created by recalcitrant (or, more usually, simply long departed) colleagues or to accommodate my own changing standards.
You can set warnings and strictures for all Perl scripts by adding -Mwarnings -Mstrict to your PERL5OPT environment variable. See perlrun for details.

What is the Perl equivalent of the 'mv' system command?

I have the below "system command" for moving the file. Is it good to use File::Move command or the use the below one. In case if File::Move is better, how can I use it?
system("mv $LOGPATH/test/work/LOGFILE $LOGPATH/test/done/$LOGFILE") == 0
or WriteLog(ERROR, "mv command Failed $LOGPATH/test/work/$LOGFILE status was $?");
The function you are looking for is in File::Copy:
use File::Copy 'move';
move $x, $y or WriteLog ("move $x, $y failed: $!");
File::Copy is part of the standard distribution of Perl itself.
You can of course use a system ("mv ...") command to do the same thing. The advantages of File::Copy are:
It will work across different operating systems, which may be necessary if you're writing something for other people to use;
It doesn't expose you to security holes due to passing data to the shell, or portability bugs because the filepath has meta characters and spaces in it like /Library/Application Support/Hi I'm On A Mac/;
It is faster: see Schwern's paste. The cost of system swamps everything else. File::Copy::move and rename have about the same cost, so you may as well use the safer move which works across filesystems.
Here's one more thing to add to the mix -- if you're moving the file within
the same device (that is, there is no need to move the file contents, only
update its directory entry), you can simply use the builtin function
rename:
rename OLDNAME,NEWNAME
Changes the name of a file; an existing file NEWNAME will be
clobbered. Returns true for success, false otherwise.
Behavior of this function varies wildly depending on your
system implementation. For example, it will usually not work
across file system boundaries, even though the system mv
command sometimes compensates for this. Other restrictions
include whether it works on directories, open files, or pre-
existing files. Check perlport and either the rename(2)
manpage or equivalent system documentation for details.
For a platform independent "move" function look at the
File::Copy module.
To add to Kinopiko's otherwise excellent answer, one of the two main reasons to use File::Copy over the system mv command - aside from not-always-relevant-but-still-meaningful portability concern he noted - is the fact that calling a system command forks off at least one child process (and if you do any re-directs, two of them - mv and shell). This wastes a lot of system resources compared to File::Copy.

Why shouldn't I use shell tools in Perl code?

It is generally advised not to use additional linux tools in a Perl code;
e.g if someone intends to print the last line of a text file he can:
$last_line = `tail -1 $file` ;
or otherwise, open the file and read it line by line
open(INFO,$file);
while(<INFO>) {
$last_line = $_ if eof;
}
What are the pitfalls of using the previous and why should I avoid using shell tools in my code?
thanx,
Efficiency - you don't have to spawn a new process
Portability - you don't have to worry about an executable not existing, accepting different switches, or having different output
Ease of use - you don't have to parse the output, the results are already in a usable form
Error handling - you have finer-grained control over errors and what to do about them in Perl.
It's better to keep all the action in Perl because it's faster and because it's more secure. It's faster because you're not spawning a new process, and it's more secure because you don't have to worry about shell meta character trickery.
For example, in your first case if $file contained "afilename ; rm -rf ~" you would be a very unhappy camper.
P.S. The best all-Perlway to do the tail is to use File::ReadBackwards
One of the primary reasons (besides portability) for not executing shell commands is that it introduces overhead by spawning another process. That's why much of the same functionality is available via CPAN in Perl modules.
One reason is that your Perl code might be running in an environment where there is no shell tool called 'tail'.
It's a personal call depending on the project:
Is it going to be always used in shell environments with tail?
Do you care about only using pure Perl code?
Using tail? Fine. But that's really a special case, since it's so easy to use and since it is so trivial.
The problem in general is not really efficiency or portability, that is largely irrelevant; the issue is ease of use. To run an external utility, you have to find out what arguments it accepts, write code to transform your program's data structures to that format, quote them properly, build the command line, and run the application. Then, you might have to feed it data and read data from it (involving complexity like an event loop, worrying about deadlocking, etc.), and finally interpret the return value. (UNIX processes consider "0" true and anything else false, but Perl assumes the opposite. foo() and die is hard to read.) This is a lot of work to do, and that's why people avoid it. It's much easier to create an instance of a class and call methods on it to get the data you need.
(You can abstract away processes this way; see Crypt::GpgME for example. It handles the complexity associated with invoking gpg, which would normally involve creating multiple filehandles other than STDOUT, STDIN, and STDERR, among other things.)
The main reason I see for doing it all in Perl would be for robustness. Your use of tail will fail if the filename has shell metacharacters or spaces or doesn't exist or isn't accessible. From Perl, characters in the filename aren't an issue, and you can distinguish between errors in accessing the file. Sometimes being robust is more important than speedy coding and sometimes it's not.