Is there a tool for extracting all variable, module, and function names from a Perl module file? - perl

My apologies if this is a duplicate; I may not know the proper terms to search for.
I am tasked with analyzing a Perl module file (.pm) that is a fragment of a larger application. Is there a tool, app, or script that will simply go through the code and pull out all the variable names, module names, and function calls? Even better would be something that would identify whether it was declared within this file or is something external.
Does such a tool exist? I only get the one file, so this isn't something I can execute -- just some basic static analysis I guess.

Check out the new, but well recommended Class::Sniff.
From the docs:
use Class::Sniff;
my $sniff = Class::Sniff->new({class => 'Some::class'});
my $num_methods = $sniff->methods;
my $num_classes = $sniff->classes;
my #methods = $sniff->methods;
my #classes = $sniff->classes;
{
my $graph = $sniff->graph; # Graph::Easy
my $graphviz = $graph->as_graphviz();
open my $DOT, '|dot -Tpng -o graph.png' or die("Cannot open pipe to dot: $!");
print $DOT $graphviz;
}
print $sniff->to_string;
my #unreachable = $sniff->unreachable;
foreach my $method (#unreachable) {
print "$method\n";
}
This will get you most of the way there. Some variables, depending on scope, may not be available.

If I understand correctly, you are looking for a tool to go through Perl source code. I am going to suggest PPI.
Here is an example cobbled up from the docs:
#!/usr/bin/perl
use strict;
use warnings;
use PPI::Document;
use HTML::Template;
my $Module = PPI::Document->new( $INC{'HTML/Template.pm'} );
my $sub_nodes = $Module->find(
sub { $_[1]->isa('PPI::Statement::Sub') and $_[1]->name }
);
my #sub_names = map { $_->name } #$sub_nodes;
use Data::Dumper;
print Dumper \#sub_names;
Note that, this will output:
...
'new',
'new',
'new',
'output',
'new',
'new',
'new',
'new',
'new',
...
because multiple classes are defined in HTML/Template.pm. Clearly, a less naive approach would work with the PDOM tree in a hierarchical way.

Another CPAN tools available is Class::Inspector
use Class::Inspector;
# Is a class installed and/or loaded
Class::Inspector->installed( 'Foo::Class' );
Class::Inspector->loaded( 'Foo::Class' );
# Filename related information
Class::Inspector->filename( 'Foo::Class' );
Class::Inspector->resolved_filename( 'Foo::Class' );
# Get subroutine related information
Class::Inspector->functions( 'Foo::Class' );
Class::Inspector->function_refs( 'Foo::Class' );
Class::Inspector->function_exists( 'Foo::Class', 'bar' );
Class::Inspector->methods( 'Foo::Class', 'full', 'public' );
# Find all loaded subclasses or something
Class::Inspector->subclasses( 'Foo::Class' );
This will give you similar results to Class::Sniff; you may still have to do some processing on your own.

There are better answers to this question, but they aren't getting posted, so I'll claim the fastest gun in the West and go ahead and post a 'quick-fix'.
Such a tool exists, in fact, and is built into Perl. You can access the symbol table for any namespace by using a special hash variable. To access the main namespace (the default one):
for(keys %main::) { # alternatively %::
print "$_\n";
}
If your package is named My/Package.pm, and is thus in the namespace My::Package, you would change %main:: to %My::Package:: to achieve the same effect. See the perldoc perlmod entry on symbol tables - they explain it, and they list a few alternatives that may be better, or at least get you started on finding the right module for the job (that's the Perl motto - There's More Than One Module To Do It).

If you want to do it without executing any code that you are analyzing, it's fairly easy to do this with PPI. Check out my Module::Use::Extract; it's a short bit of code shows you how to extract any sort of element you want from PPI's PerlDOM.
If you want to do it with code that you have already compiled, the other suggestions in the answers are better.

I found a pretty good answer to what I was looking for in this column by Randal Schwartz. He demonstrated using the B::Xref module to extract exactly the information I was looking for. Just replacing the evaluated one-liner he used with the module's filename worked like a champ, and apparently B::Xref comes with ActiveState Perl, so I didn't need any additional modules.
perl -MO=Xref module.pm

Related

how to include pl files in batch pod conversion?

I use the module Pod::Simple::HTMLBatch to create the documentation of the software which i write.
The script used is taken from the documentation:
use Pod::Simple::HTMLBatch;
my $batchconv = Pod::Simple::HTMLBatch->new;
$batchconv->some_option( some_value );
$batchconv->some_other_option( some_other_value );
$batchconv->batch_convert( \#search_dirs, $output_dir );
The module just considers pm files. How is it possible to tell the module to create the documentation also for pl files?
I did not find an option in the documentation.
Pod::Simple::Search looks for files matching this regular expression:
m/^[-_a-zA-Z0-9]+\.(?:pod|pm|plx?)\z/is
Which should include any *.pl files. If it's not working for you, try turning on its laborious flag, which is somewhat more forgiving about file names, testing like so:
m/\.(pod|pm|plx?)\z/i || -x _ and -T _
The only way to enable the laborious search with Pod::Simple::HTMLBatch is to create a search subclass, like I did for Pod::Site:
package My::Pod::Search;
use parent 'Pod::Simple::Search';
sub new {
my $self = shift->SUPER::new(#_);
$self->laborious(1);
return $self;
}
Then tell HTMLBatch to use your subclass:
$batchconv->search_class('My::Pod::Search');
$batchconv->batch_convert( \#search_dirs, $output_dir );
Might be nice to update HTMLBatch to accept a search object in its constructor to eliminate this silly workaround, though.

Accessing subs from a require'd perl script

I'm going to import some perl code with the require statement. The code I'd like to import is in mylibA.pl:
#!/usr/bin/perl
package FOO::BAR;
sub routine {
print "A message!\n";
}
and mylibB.pl:
#!/usr/bin/perl
package FOO::BAZ;
sub routine {
print "Another message!\n";
}
Then I'm going to use it like this:
#!/usr/bin/perl
foreach my $lib (qw/ mylibA.pl mylibB.pl /){
require $lib;
print "Make a call to ${lib}'s &routine!\n";
}
Is there a way for my script to figure out the namespace that was pulled in with the require statement?
Wow. I have to say this is the one of the most interesting Perl questions I've seen in a while. On the surface this seems like a very simple request - get an included module's namespace, but there really is no way to do this. You can get it while in the package, but not from outside the package. I tried using EXPORT to send the local package name back to the caller script but that ended up going nowhere given the difference in how "use" and "require" work. A more module type of approach probably would have worked with a "use" statement, but the requirement that the required script be able to run by themselves prevented that approach. The only thing left to do was to directly pollute the caller's namespace and hope for the best (assume that the caller had no package namespace) - something that modules are designed to prevent.
BTW - I can't believe this actually works - in strict mode, no less.
caller.pl
#!/usr/bin/perl
use strict;
#package SomePackageName; #if you enable this then this will fail to work
our $ExportedPackageName;
print "Current package=".__PACKAGE__."\n";
foreach my $lib (qw/ mylibA.pl mylibB.pl /){
require $lib;
print "Make a call to ${lib}'s &routine!\n";
print "Package name exported=".$ExportedPackageName."\n";
$ExportedPackageName->routine;
} #end foreach
print "Normal Exit";
exit;
__END__
mylibA.pl
#!/usr/bin/perl
package FOO::BAR;
use strict;
#better hope the caller does not have a package namespace
$main::ExportedPackageName=__PACKAGE__;
sub routine {
print "A message from ".__PACKAGE__."!\n";
}
1;
mylibB.pl
#!/usr/bin/perl
package FOO::BAZ;
use strict;
#better hope the caller does not have a package namespace
$main::ExportedPackageName=__PACKAGE__;
sub routine {
print "Another message, this time from ".__PACKAGE__."!\n";
}
1;
Result:
c:\Perl>
c:\Perl>perl caller.pl
Current package=main
Make a call to mylibA.pl's &routine!
Package name exported=FOO::BAR
A message from FOO::BAR!
Make a call to mylibB.pl's &routine!
Package name exported=FOO::BAZ
Another message, this time from FOO::BAZ!
Normal Exit
Regarding the mostly academical problem of finding the package(s) in a perl source file:
You can try the CPAN module Module::Extract::Namespaces to get all packages within a perl file. It is using PPI and is thus not 100% perfect, but most of the time good enough:
perl -MModule::Extract::Namespaces -e 'warn join ",", Module::Extract::Namespaces->from_file(shift)' /path/to/foo.pm
But PPI can be slow for large files.
You can try to compare the active packages before and after the require. This is also not perfect, because if your perl library file loads additional modules then you cannot tell which is the package of the prinicipal file and what's loaded later. To get the list of packages you can use for example Devel::Symdump. Here's a sample script:
use Devel::Symdump;
my %before = map { ($_,1) } Devel::Symdump->rnew->packages;
require "/path/to/foo.pm";
my %after = map { ($_,1) } Devel::Symdump->rnew->packages;
delete $after{$_} for keys %before;
print join(",", keys %after), "\n";
You can also just parse the perl file for "package" declarations. Actually, that's what the PAUSE upload daemon is doing, so it's probably "good enough" for most cases. Look at the subroutine packages_per_pmfile in
https://github.com/andk/pause/blob/master/lib/PAUSE/pmfile.pm
There are two problems here:
How do I change the behaviour of a script when executed as a standalone and when used as a module?
How do I discover the package name of a piece of code I just compiled?
The general answer to question 2 is: You don't, as any compilation unit may contain an arbitrary number of packages.
Anyway, here are three possible solutions:
Name your modules so that you already know the name when you load it.
Have each module register itself at a central rendezvous point.
Like #1, but adds autodiscovery of your plugins.
The simplest solution is to put all of the API in an ordinary module, and put the standalone logic in a seperate script:
/the/location/
Module/
A.pm
B.pm
a-standalone.pl
b-standalone.pl
Where each standalone basically looks like
use Module::A;
Module::A->run();
If another script wants to reuse that code, it does
use lib "/the/location";
use Module::A;
...
If the loading happens on runtime, then Module::Runtime helps here:
use Module::Runtime 'use_module';
use lib "/the/location";
my $mod_a = use_module('Module::A');
$mod_a->run();
It isn't strictly necessary to place the contents of a-standalone.pl and Module/A.pm into separate files, although that is clearer. If you want to conditionally run code in a module only if it is used as a script, you can utilize the unless(caller) trick.
Of course all of this is tricksing: Here we determine the file name from the module name, not the other way round – which as I already mentioned we cannot do.
What we can do is have each module register itself at a certain predefined location, e.g. by
Rendezvous::Point->register(__FILE__ => __PACKAGE__);
Of course the standalone version has to shield against the possibility that there is no Rendezvous::Point, therefore:
if (my $register = Rendezvous::Point->can("register")) {
$register->(__FILE__ => __PACKAGE__);
}
Eh, this is silly and violates DRY. So let's create a Rendezvous::Point module that takes care of this:
In /the/location/Rendezvous/Point.pm:
package Rendezvous::Point;
use strict; use warnings;
my %modules_by_filename;
sub get {
my ($class, $name) = #_;
$modules_by_filename{$name};
}
sub register {
my ($file, $package) = #_;
$modules_by_filename{$file} = $package;
}
sub import {
my ($class) = #_;
$class->register(caller());
}
Now, use Rendezvous::Point; registers the calling package, and the module name can be retrived by the absolute path.
The script that wants to use the various modules now does:
use "/the/location";
use Rendezvous::Point (); # avoid registering ourself
my $prefix = "/the/location";
for my $filename (map "$prefix/$_", qw(Module/A.pm Module/B.pm)) {
require $filename;
my $module = Rendezvous::Point->get($filename)
// die "$filename didn't register itself at the Rendezvous::Point";
$module->run();
}
Then there are fully featured plugin systems like Module::Pluggable. This system works by looking at all paths were Perl modules may reside, and loads them if they have a certain prefix. A solution with that would look like:
/the/location/
MyClass.pm
MyClass/
Plugin/
A.pm
B.pm
a-standalone.pl
b-standalone.pl
Everything is just like with the first solution: Standalone scripts look like
use lib "/the/location/";
use MyClass::Plugin::A;
MyClass::Plugin::A->run;
But MyClass.pm looks like:
package MyClass;
use Module::Pluggable require => 1; # we can now query plugins like MyClass->plugins
sub run {
# Woo, magic! Works with inner packages as well!
for my $plugin (MyClass->plugins) {
$plugin->run();
}
}
Of course, this still requires a specific naming scheme, but it auto-discovers possible plugins.
As mentioned before it is not possible to look up the namespace of a 'required' package without extra I/O, guessing or assuming.
Like Rick said before, one have to intrude the namespace of the caller or better 'main'. I prefer to inject specific hooks within a BEGIN block of the 'required' package.
#VENDOR/App/SocketServer/Protocol/NTP.pm
package VENDOR::App::SocketServer::Protocol::NTP;
BEGIN {
no warnings;
*main::HANDLE_REQUEST = \&HANDLE_REQUEST;
}
sub HANDLE_REQUEST {
}
#VENDOR/App/SocketServer.pm
my $userPackage= $ARGV[0];
require $userPackage;
main::HANDLE_REQUEST();
Instead of *main:: you can get more specific with *main::HOOKS::HANDLE_REQUESTS i.e. This enables you to resolve all injected hooks easily within the caller by iterating over the HOOK's namespace portion.
foreach my $hooks( keys %main::HOOKS ) {
}

Extract and Format Information from TAP Archive

What I like to do:
I am using Rex to remotely call tests at servers. I remotely execute the tests with a call the the local prove. I want to gather all the information about the testruns at the different servers at one place. To achieve this I run the tests with prove -a (and maybe also with --merge for capturing STDERR) to create an archive (.tgz). I then download this archive again with Rex to the controlling server. I think this is quite a good plan so far...
My problem now is that I find a lot of hints on creating such a TAP-archive, but none of how I can actually read this archive. Sure, I could open and process it somehow with Archive::Tar or parse it manually with TAP::Parser as suggested by Schwern. But knowing that there are formatters like TAP::Formatter::HTML or TAP::Formatter::JUnit (e.g. for Jenkins) I think there must be a way of using those tools directly on a TAP-archive? When I look up the docs I only find hints on how to use this stuff with prove to format tests while running them. But I need to use this formatters on the archive, I have been running prove already remotely...
So far about the context. My question in short is: How can I use the Perl-TAP-Tools to format TAP coming from a TAP-archive produced by prove?
I am thankful for any little hints. Also if you see a problem in my approach in general.
Renée provided a working solution here: http://www.perl-community.de/bat/poard/thread/18420 (German)
use strict;
use warnings;
use TAP::Harness::Archive;
use TAP::Harness;
use TAP::Formatter::HTML;
my $formatter = TAP::Formatter::HTML->new;
my $harness = TAP::Harness->new({ formatter => $formatter });
$formatter->really_quiet(1);
$formatter->prepare;
my $session;
my $aggregator = TAP::Harness::Archive->aggregator_from_archive({
archive => '/must/be/the/complete/path/to/test.tar.gz',
parser_callbacks => {
ALL => sub {
$session->result( $_[0] );
},
},
made_parser_callback => sub {
$session = $formatter->open_test( $_[1], $_[0] );
}
});
$aggregator->start;
$aggregator->stop;
$formatter->summary($aggregator);
Tanks a lot! I hope this will help some others too. It seems like this knowledge is not very wide spread yet.
I have made a module to pack this solution in a nice interface: https://metacpan.org/module/Convert::TAP::Archive
So from now on you can just type this:
use Convert::TAP::Archive qw(convert_from_taparchive);
my $html = convert_from_taparchive(
'/must/be/the/complete/path/to/test.tar.gz',
'TAP::Formatter::HTML',
);
The problem with the output is mentioned in the docs. Please provide patches or comments if you know how to fix this (minor) issue. E.g. here: https://github.com/borisdaeppen/Convert-TAP-Archive
Renee pointed me to how Tapper makes it: https://metacpan.org/source/TAPPER/Tapper-TAP-Harness-4.1.1/lib/Tapper/TAP/Harness.pm#L273
Quite some effort to read an archive though...

Perl module loading - Safeguarding against: perhaps you forgot to load "Bla"?

When you run perl -e "Bla->new", you get this well-known error:
Can't locate object method "new" via package "Bla"
(perhaps you forgot to load "Bla"?)
Happened in a Perl server process the other day due to an oversight of mine. There are multiple scripts, and most of them have the proper use statements in place. But there was one script that was doing Bla->new in sub blub at line 123 but missing a use Bla at the top, and when it was hit by a click without any of the other scripts using Bla having been loaded by the server process before, then bang!
Testing the script in isolation would be the obvious way to safeguard against this particular mistake, but alas the code is dependent upon a humungous environment. Do you know of another way to safeguard against this oversight?
Here's one example how PPI (despite its merits) is limited in its view on Perl:
use strict;
use HTTP::Request::Common;
my $req = GET 'http://www.example.com';
$req->headers->push_header( Bla => time );
my $au=Auweia->new;
__END__
PPI::Token::Symbol '$req'
PPI::Token::Operator '->'
PPI::Token::Word 'headers'
PPI::Token::Operator '->'
PPI::Token::Word 'push_header'
PPI::Token::Symbol '$au'
PPI::Token::Operator '='
PPI::Token::Word 'Auweia'
PPI::Token::Operator '->'
PPI::Token::Word 'new'
Setting the header and assigning the Auweia->new parse the same. So I'm not sure how you can build upon such a shaky foundation. I think the problem is that Auweia could also be a subroutine; perl.exe cannot tell until runtime.
Further Update
Okay, from #Schwern's instructive comments below I learnt that PPI is just a tokenizer, and you can build upon it if you accept its limitations.
Testing is the only answer worth the effort. If the code contains mistakes like forgetting to load a class, it probably contains other mistakes. Whatever the obstacles, make it testable. Otherwise you're patching a sieve.
That said, you have two options. You can use Class::Autouse which will try to load a module if it isn't already loaded. It's handy, but because it affects the entire process it can have unintended effects.
Or you can use PPI to scan your code and find all the class method calls. PPI::Dumper is very handy to understand how PPI sees Perl.
use strict;
use warnings;
use PPI;
use PPI::Dumper;
my $file = shift;
my $doc = PPI::Document->new($file);
# How PPI sees a class method call.
# PPI::Token::Word 'Class'
# PPI::Token::Operator '->'
# PPI::Token::Word 'method'
$doc->find( sub {
my($node, $class) = #_;
# First we want a word
return 0 unless $class->isa("PPI::Token::Word");
# It's not a class, it's actually a method call.
return 0 if $class->method_call;
my $class_name = $class->literal;
# Next to it is a -> operator
my $op = $class->snext_sibling or return 0;
return 0 unless $op->isa("PPI::Token::Operator") and $op->content eq '->';
# And then another word which PPI identifies as a method call.
my $method = $op->snext_sibling or return 0;
return 0 unless $method->isa("PPI::Token::Word") and $method->method_call;
my $method_name = $method->literal;
printf "$class->$method_name seen at %s line %d.\n", $file, $class->line_number;
});
You don't say what server enviroment you're running under, but from what you say it sounds like you could do with preloading all your modules in advance before executing any individual pages. Not only would this prevent the problems you're describing (where every script has to remember to load all the modules it uses) but it would also save you memory.
In pre-forking servers (as is commonly used with mod_perl and Apache) you really want to load as much of your code before your server forks for the first time so that the code is stored once in copy-on-write shared memory rather than mulitple times in each child process when it is loaded on demand.
For information on pre-loading in Apache, see the section of Practical mod_perl

Curl Perl module not working, formadd method missing

I want to use following script:
use FileHandle;
use WWW::Curl::Easy;
use WWW::Curl::Form;
my $file, my $curl, my $curlf, my $return, my $minified;
$file = new FileHandle();
$curl = new WWW::Curl::Easy();
$curl->setopt(CURLOPT_URL, "http://closure-compiler.appspot.com/compile");
$curlf = new WWW::Curl::Form();
$curlf->formadd('output_format', 'text');
$curlf->formadd('output_info', 'compiled_code');
$curlf->formadd('compilation_level', 'ADVANCED_OPTIMIZATIONS');
$curlf->formaddfile($name, 'js_code', 'multipart/form-data');
$curl->setopt(CURLOPT_HTTPPOST, $curlf);
$file->open(\$minified, ">");
$curl->setopt(CURLOPT_WRITEDATA, $file);
$return = $curl->perform();
Following error is thrown:
Can't locate object method "formadd" via package "WWW::Curl::Form" at ./minifyjs.pl ....
WHY??? The WWW::Curl module is installed properly, I used package libwww-curl-perl under Debian/Ubuntu.
Can anyone help me please?
Whoops.
Looks like this commit broke formadd. The XS sub doesn't match the PREFIX = curl_form_ declaration (as it's named curl_formadd), so perl doesn't know how to map the Perl version of the method back to XS.
4.12 was the first release that tried to support WWW::Curl::Form, looks like it didn't work after all. Not sure how I've missed this one. I should probably note it here that WWW::Curl::Form support wasn't exactly a high priority TODO item on my list, due to the existence of various high quality form handling modules on CPAN. I've only accepted the patch for the sake of feature completeness. You're encouraged to use those modules for managing form content. The standard WWW::Curl use case statement applies.
I released 4.13 to fix this issue. Good catch!
Check out WWW::Mechanize. It has a lot of nice form methods.