Is there a way to re-use text in perlpod documentation?

I have a bunch of repeated text in my perlpod documentation. I could of course create a separate section and reference it, but I was wondering if there's a way to enter the text once somewhere and have it inserted in multiple places?
I don't think this is possible, but thought I'd ask to make sure I'm not missing anything.
Or - perhaps there's a better perl documentation technique?

As you've realised, Pod is (deliberately) a very simple markup language. It doesn't have any particularly complicated features, and one of the things it is missing is a way to embed text from another source.
I'd suggest moving the repeated text to its own section and linking to that section (using L<...>) whenever you want to reference that text.
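For example, with made-up section names:

=head2 Common details

Text that would otherwise be repeated lives here, once.

=head2 Some feature

For the shared text, see L</"Common details">.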

While Pod markup is meant to be very basic, we don't have to literally type it all by hand.
Text for documentation can be processed like any other text in Perl, using its extensive toolset, to create a string of Pod-formatted text. That string can then be displayed using the core Pod::Usage, via a file (which can be removed or kept), or directly using the core Pod::Simple.
Display the Pod string by writing a file
use warnings;
use strict;
use feature 'say';
use Path::Tiny; # convenience, to "spew" a file
my $man = shift;
show_pod() if $man;
say "done $$";
sub show_pod {
    require Pod::Usage;
    Pod::Usage->import('pod2usage');

    my $pod_text = make_pod();
    my $pod_file = $0 =~ s/\.pl$/.pod/r;
    path($pod_file)->spew($pod_text);

    pod2usage( -verbose => 2, -input => $pod_file );  # exits by default
}
sub make_pod {
    my $repeated = q(lotsa text that gets repeated);

    # Note: blank lines between Pod paragraphs are required for correct parsing
    my $doc_text = <<EOD;
=head1 NAME

$0 - demo perldoc

=head1 SYNOP...

Text that is also used elsewhere: $repeated...

=cut
EOD

    return $doc_text;
}
The .pod file can be removed: add -exitval => 'NOEXIT' to the pod2usage arguments so that it doesn't exit, and then remove the file. Or, rather, keep it available for other tools and uses.
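A sketch of that variant, with the same $pod_file as above:

pod2usage( -verbose => 2, -input => $pod_file, -exitval => 'NOEXIT' );
unlink $pod_file or warn "Can't remove $pod_file: $!";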
I've split the job into two subs as a hint, since it can be useful to be able to write only a .pod file, which can then also be used and viewed in other ways and formats.† This isn't needed just to show docs, and all the Pod business can be done in one sub.
Display the Pod string directly
If there is no desire to keep a .pod file around, then we don't have to make one:
sub show_pod {    # the rest of the program is the same as above
    my $pod_text = make_pod();

    require Pod::Simple::Text;
    Pod::Simple::Text->filter( \$pod_text );  # doesn't exit, so add an exit after it if needed
}
The ->filter call is a shortcut for creating an object, setting a filehandle, and processing the contents. See the docs for a whole lot more.
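Roughly, the longhand equivalent of that one-liner, assuming the output should go to STDOUT:

my $parser = Pod::Simple::Text->new;
$parser->output_fh(*STDOUT);
$parser->parse_string_document($pod_text);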
Either of these approaches provides you with far more flexibility.
Also, while one can indeed handle the repeated text by referencing a separate section with that text, that of course doesn't allow us to use variables or do any Perl processing -- all of which is available when you write a Pod string, which is then handed to a Pod formatter to display.
Note: Use of .pod files affects perldoc. Thanks to @briandfoy for a comment.
For smaller documentation, which gains no particular benefit from separate .pod files, I recommend the second approach, as hinted at in the answer. It only differs in how the docs text is organized in the file, while still allowing one to process it as any text is normally processed with Perl.
For use cases where .pod files are of good value, I still find this an acceptable trade-off, but that is my call. Be aware that perldoc is affected, and assess how much that matters in your project.
† I use a system like this in a large project, with .pod files in their own directory. There is also a simple, separate script for overall documentation management, which invokes individual programs with options to write/update their .pods, in HTML with CPAN's style file, for the main web page. Any program can also simply display its docs in a desired format.

Related

Devel::Cover HTML output

I am playing around with Devel::Cover to see how well our test suite is actually testing our codebase. I run all of our tests using -MDevel::Cover; nothing seems to fail or crash, but the HTML output of the coverage table has entries like these for all of our modules:
The number of BEGINs listed seems to match the number of use Module::X statements in the source file, but it really clutters the HTML output. Is there any way to disable this feature? I don't see any mention of it in the tutorial or the GitHub issue tracker.
The reason for this is that "use" is "exactly equivalent to"
BEGIN { require Module; Module->import( LIST ); }
(See perldoc -f use.)
And then "BEGIN" is basically the same as "sub BEGIN" - you can put the "sub" there if you want to. See perldoc perlmod.
So what you really do have is a subroutine, and that is what Devel::Cover is reporting.
Like many parts of Devel::Cover, the details of perl's implementation, or at least the semantics, are leaking through. There is no way to stop this, though I would be amenable to changes in this area.

Cleanup huge Perl Codebase

I am currently working on a roughly 15-year-old web application.
It contains mainly CGI Perl scripts with HTML::Template templates.
It has over 12,000 files and roughly 260 MB of total code. I estimate that no more than 1500 Perl scripts are needed, and I want to get rid of all the unused code.
There are practically no tests written for the code.
My questions are:
Are you aware of any CPAN module that can help me get a list of only used and required modules?
What would be your approach if you'd want to get rid of all the extra code?
I was thinking at the following approaches:
try to override the use and require Perl builtins with ones that log the loaded file name to a specific location
override the warnings and/or strict modules' import function and log the calling file's name to a specific location
study the Devel::Cover Perl module and take the same approach, analyzing the code while doing manual testing instead of automated tests
replace the perl executable with a custom one that will log the name of each file it reads (I don't know how to do that yet)
some creative use of lsof (?!?)
Devel::Modlist may give you what you need, but I have never used it.
The few times I have needed to do something like this, I have opted for the more brute-force approach of inspecting %INC at the end of the program.
END {
    open my $log_fh, ...;
    print $log_fh "$_\n" for sort keys %INC;
}
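If the full filesystem path of each loaded file is also of interest, the values of %INC hold them; a variant with the open filled in (the log path is made up):

END {
    if ( open my $log_fh, '>>', "/tmp/used-modules.$$.log" ) {
        # key is the module's relative path, value is where it was found
        print $log_fh "$_ => $INC{$_}\n" for sort keys %INC;
        close $log_fh;
    }
}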
As a first approximation, I would simply run
egrep -r '\<(use|require)\>' /path/to/source/*
Then spend a couple of days cleaning up the output from that. That will give you a list of all of the modules used or required.
You might also be able to play around with @INC to exclude certain library paths.
If you're trying to determine execution path, you might be able to run the code through the debugger with 'trace' (i.e. 't' in the debugger) turned on, then redirect the output to a text file for further analysis. I know that this is difficult when running CGI...
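For the non-interactive case, the debugger can be driven from the environment; a sketch (option names per perldebug, output file name made up):

PERLDB_OPTS="NonStop=1 AutoTrace=1 LineInfo=trace.txt" perl -d yourscript.pl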
Assuming the relevant timestamps are turned on, you could check access times on the various script files - that should rule out any top-level script files that aren't being used.
Might be worth adding some instrumentation to CGI.pm to log the current script-name ($0) to see what's happening.
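Rather than patching CGI.pm itself, one could load a tiny logging module into every script via PERL5OPT (assuming the scripts don't run under taint mode, which ignores it); module name and log path here are made up:

# TrackUse.pm - load with PERL5OPT=-MTrackUse, or `use TrackUse;` in each script
package TrackUse;

BEGIN {
    # log a timestamp and the running script's name
    if ( open my $fh, '>>', '/tmp/script-usage.log' ) {
        print $fh scalar(localtime), " $0\n";
        close $fh;
    }
}

1;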

An automated way to do string extraction for Perl/Mason i18n?

I'm currently working on internationalizing a very large Perl/Mason web application, as a team of one (does that make this a death march??). The application is nearing 20 years old, and is written in a relatively old-school Perl style; it doesn't use Moose or another OO module. I'm currently planning on using Locale::Maketext::Gettext to do message lookups, and using GNU Gettext catalog files.
I've been trying to develop some tools to aid in string extraction from our bigass codebase. Currently, all I have is a relatively simple Perl script to parse through source looking for string literals, prompt the user with some context and whether or not the string should be marked for translation, and mark it if so.
There's way too much noise in terms of strings I need to mark versus strings I can ignore. A lot of strings in the source aren't user-facing, such as hash keys, or type comparisons like
if (ref($db_obj) eq 'A::Type::Of::Db::Module')
I do apply some heuristics to each proposed string to see whether I can ignore it off the bat (e.g. I ignore strings that are used for hash lookups, since 99% of the time in our codebase these aren't user-facing). However, despite all that, around 90% of the strings my program shows me are ones I don't care about.
Is there a better way I could help automate my task of string extraction (i.e. something more intelligent than grabbing every string literal from the source)? Are there any commercial programs that do this that could handle both Perl and Mason source?
ALSO, I had a (rather silly) idea for a superior tool, whose workflow I put below. Would it be worth the effort implementing something like this (which would probably take care of 80% of the work very quickly), or should I just submit to an arduous, annoying, manual string extraction process?
Start by extracting EVERY string literal from the source, and putting it into a Gettext PO file.
Then, write a Mason plugin to parse the HTML for each page being served by the application, with the goal of noting strings that the user is seeing.
Use the hell out of the application and try to cover all use cases, building up a store of user facing strings.
Given this store of strings the user saw, do fuzzy matches against strings in the catalog file, and keep track of catalog entries that have a match from the UI.
At the end, anything in the catalog file that didn't get matched would likely not be user facing, so delete those from the catalog.
There are no Perl tools I know of which will intelligently extract strings which might need internationalization vs ones that will not. You're supposed to mark them in the code as you write them, but as you said that wasn't done.
You can use PPI to do the string extraction intelligently.
#!/usr/bin/env perl

use strict;
use warnings;

use Carp;
use PPI;

my $doc = PPI::Document->new(shift);

# See PPI::Node for docs on find
my $strings = $doc->find(sub {
    my($top, $element) = @_;
    print ref $element, "\n";

    # Look for any quoted string or here-doc.
    # Does not pick up unquoted hash keys.
    return $element->isa("PPI::Token::Quote") ||
           $element->isa("PPI::Token::HereDoc");
});

# Display the content and location.
for my $string (@$strings) {
    my($line, $row, $col) = @{ $string->location };
    print "Found string at line $line starting at character $col.\n";
    printf "String content: '%s'\n", string_content($string);
}
# *sigh* PPI::Token::HereDoc doesn't have a string method
sub string_content {
    my $string = shift;
    return $string->isa("PPI::Token::Quote")   ? $string->string
         : $string->isa("PPI::Token::HereDoc") ? $string->heredoc
         : croak "$string is neither a here-doc nor a quote";
}
You can do more sophisticated examination of the tokens surrounding the strings to determine if it's something significant. See PPI::Element and PPI::Node for more details. Or you can examine the content of the string to determine if it's significant.
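For instance, a rough check for strings that are only used as hash subscripts might walk up the parse tree (class names per PPI's docs; the heuristic itself is only a sketch):

sub is_hash_subscript {
    my $element = shift;

    # A quoted hash key like $h{'key'} sits inside a
    # PPI::Structure::Subscript somewhere up the parent chain.
    for ( my $parent = $element->parent; $parent; $parent = $parent->parent ) {
        return 1 if $parent->isa('PPI::Structure::Subscript');
    }
    return 0;
}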
I can't go much further because "significant" is up to you.
Our Source Code Search Engine is normally used to efficiently search large code bases, using indexes constructed from the lexemes of the languages it knows. That list of languages is pretty broad, including Java, C#, COBOL and ... Perl. The lexeme extractors are language precise (because they are "stolen" from our DMS Software Reengineering Toolkit, a language-agnostic program transformation system, where precision is fundamental).
Given an indexed code base, one can then enter queries to find arbitrary sequences of lexemes in spite of language-specific white space; one can log the hits of such queries and their locations.
The extremely short query:
S
to the Search Engine finds all lexical elements which are classified as strings (keywords, variable names, comments are all ignored; just strings!). (Normally people write more complex queries with regular expression constraints, such as S=*Hello to find strings that end with "Hello")
The relevance here is that the Source Code Search Engine has precise knowledge of lexical syntax of strings in Perl (including specifically elements of interpolated strings and all the wacky escape sequences). So the query above will find all strings in Perl; with logging on, you get all the strings and their locations logged.
This stunt actually works for any language the Search Engine understands, so it is a rather general way to extract strings for such internationalization tasks.

Removing comments using Perl

Something I keep doing is removing comments from a file as I process it. I was wondering if there is a module to do this.
The sort of code I keep writing time and again is
while (<>) {
    s/#.*//;
    next if /^ \s+ $/x;

    # ... do something useful here ...
}
Edit: Just to clarify, the input is not Perl. It is a text file of my own making that might have data I want to process in some way or other. I want to be able to place comments in it that are ignored by my programs.
Unless this is a learning experience I suggest you use Regexp::Common::comment instead of writing your own regular expressions.
It supports quite a few languages.
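A minimal sketch of its use for #-style comments like the asker's (pattern name per the module's docs):

use Regexp::Common qw(comment);

while (<>) {
    s/$RE{comment}{Perl}//g;   # Perl's comment pattern matches #-to-end-of-line
    next unless /\S/;          # skip lines that are now blank
    # ... process the remaining data ...
}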
The question does not make clear what type of file it is. Are we dealing with Perl source files? If so, your approach is not entirely correct - see gbacon's comment. Perl source files are notoriously difficult (impossible?) to parse with regexes. In that case, or if you need to deal with several types of files, use Regexp::Common::comment as suggested by Niffle. Otherwise, if you think your regex logic is correct for your scenario, then I personally prefer to write it explicitly; it's just a pair of straightforward lines, and there is little to be gained by using a module (while you introduce a dependency).

What are the best-practices for implementing a CLI tool in Perl?

I am implementing a CLI tool using Perl.
What are the best-practices we can follow here?
As a preface, I spent 3 years engineering and implementing a pretty complicated command line toolset in Perl for a major financial company. The ideas below are basically part of our team's design guidelines.
User Interface
Command line options: allow as many as possible to have default values.
NO positional parameters for any command that has more than 2 options.
Have readable option names. If the length of the command line is a concern for non-interactive calling (e.g. some un-named legacy shells have short limits on command lines), provide short aliases - Getopt::Long allows that easily.
At the very least, print all options' default values in '-help' message.
Better yet, print all the options' "current" values (e.g. if a parameter and a value are supplied along with "-help", the help message will print the parameter's value from the command line). That way, people can assemble the command line for a complicated command and verify it by appending "-help", before actually running it.
Follow Unix standard convention of exiting with non-zero return code if program terminated with errors.
If your program may produce useful (e.g. worth capturing/grepping/whatnot) output, make sure any error/diagnostic messages go to STDERR so they are easily separable.
Ideally, allow the user to specify input/output files via command line parameters, instead of forcing "<" / ">" redirects - this makes life MUCH simpler for people who need to build complicated pipes using your command. Ditto for error messages - have a logfile option.
If a command has side effects, having a "whatif/no_post" option is usually a Very Good Idea.
Implementation
As noted previously, don't re-invent the wheel. Use standard command line parameter handling modules - MooseX::Getopt, or Getopt::Long
For Getopt::Long, assign all the parameters to a single hash, as opposed to individual variables. Many useful patterns include passing that CLI-args hash to object constructors (see the sketch after this list).
Make sure your error messages are clear and informative... E.g. include "$!" in any IO-related error messages. It's worth expending an extra minute and 2 lines in your code to have separate "file not found" vs. "file not readable" errors, as opposed to spending 30 minutes in a production emergency because a non-readable file was misdiagnosed by Production Operations as "no input file" - this is a real-life example.
Not really CLI-specific, but validate all parameters, ideally right after getting them.
CLI doesn't allow for a "front-end" validation like webapps do, so be super extra vigilant.
As discussed above, modularize business logic. Among other reasons already listed, the number of times I have had to re-implement an existing CLI tool as a web app is vast - and it is not that difficult if the logic is already in a properly designed Perl module.
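A minimal sketch of the single-hash pattern mentioned above (the worker class is made up):

use Getopt::Long;

my %opt = ( retries => 3, timeout => 30 );    # defaults live in the hash too

GetOptions( \%opt,
    'retries=i',
    'timeout=i',
    'logfile=s',
    'verbose|v',      # long name plus a short alias
) or die "Error in command line arguments\n";

# hypothetical class; the CLI-args hash passes straight into the constructor
my $worker = My::Worker->new(%opt);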
Interesting links
CLI Design Patterns - I think this is ESR's
I will try to add more bullets as I recall them.
Use POD to document your tool, and follow the guidelines of manpages; include at least the following sections: NAME, SYNOPSIS, DESCRIPTION, AUTHOR. Once you have proper POD, you can generate a man page with pod2man, and view the documentation at the console with perldoc your-script.pl.
Use a module that handles command line options for you. I really like using Getopt::Long in conjunction with Pod::Usage this way invoking --help will display a nice help message.
Make sure that your script returns a proper exit value indicating whether it was successful or not.
Here's a small skeleton of a script that does all of these:
#!/usr/bin/perl

=head1 NAME

simple - simple program

=head1 SYNOPSIS

    simple [OPTION]... FILE...

        -v, --verbose  use verbose mode
        --help         print this help message

Where I<FILE> is a file name.

Examples:

    simple /etc/passwd /dev/null

=head1 DESCRIPTION

This is a simple program.

=head1 AUTHOR

Me.

=cut

use strict;
use warnings;

use Getopt::Long qw(:config auto_help);
use Pod::Usage;

exit main();

sub main {
    # Argument parsing
    my $verbose;
    GetOptions(
        'verbose' => \$verbose,
    ) or pod2usage(1);
    pod2usage(1) unless @ARGV;

    my (@files) = @ARGV;
    foreach my $file (@files) {
        if (-e $file) {
            print "File $file exists\n" if $verbose;
        }
        else {
            print "File $file doesn't exist\n";
        }
    }

    return 0;
}
Some lessons I've learned:
1) Always use Getopt::Long
2) Provide help on usage via --help, ideally with examples of common scenarios. It helps people who don't know or have forgotten how to use the tool. (I.e., you, in six months.)
3) Unless it's pretty obvious to the user why, don't go for long periods (>5s) without output. Something like 'print "Row $row...\n" unless $row % 1000;' goes a long way.
4) For long-running operations, allow the user to recover if possible (a sketch follows this list). It really sucks to get through 500k of a million, die, and have to start over again.
5) Separate the logic of what you're doing into modules and leave the actual .pl script as barebones as possible: parsing options, displaying help, invoking basic methods, etc. You're inevitably going to find something you want to reuse, and this makes it a heck of a lot easier.
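For point 4, a minimal checkpoint sketch (the state-file name and per-row work are made up):

use strict;
use warnings;

my $state_file = 'job.resume';    # hypothetical checkpoint file

# On startup, resume from the checkpoint if one exists.
my $start = 0;
if ( open my $fh, '<', $state_file ) {
    my $line = <$fh>;
    $start = $line if defined $line;
    chomp $start;
}

for my $row ( $start .. 1_000_000 ) {
    process_row($row);    # stand-in for the real per-row work

    # Record progress every 1000 rows.
    unless ( $row % 1000 ) {
        open my $out, '>', $state_file or die "Can't write $state_file: $!";
        print $out $row + 1;    # next row to process on a resume
        close $out;
    }
}

unlink $state_file;    # finished; the next run starts from scratch

sub process_row { }    # empty stub for this sketch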
The most important thing is to have standard options.
Don't try to be clever; simply be consistent with already-existing tools.
How to achieve this is also important, but only comes second.
Actually, this is quite generic to all CLI interfaces.
There are a couple of modules on CPAN that will make writing CLI programs a lot easier:
App::CLI
App::Cmd
If your app is Moose-based, also have a look at MooseX::Getopt and MooseX::Runnable.
The following points aren't specific to Perl but I've found many Perl CL scripts to be deficient in these areas:
Use common command line options. To show the version number implement -v or --version not --ver. For recursive processing -r (or perhaps -R although in my Gnu/Linux experience -r is more common) not --rec. People will use your script if they can remember the parameters. It's easy to learn a new command if you can remember "it works like grep" or some other familiar utility.
Many command line tools process "things" (files or directories) within the "current directory". While this can be convenient, make sure you also add command line options for explicitly identifying the files or directories to process. This makes it easier to put your utility in a pipeline without developers having to issue a bunch of cd commands and remember which directory they're in.
You should use Perl modules to make your code reusable and easy to understand.
You should also have a look at Perl Best Practices.