Tracing the source of an imported subroutine - perl

I'm modifying a rather large unit test that uses 27 modules before loading the test framework:
use Test::Most;
When the script reaches this line, it outputs the following warning:
mytest.t ........... Subroutine main::explain redefined at mytest.t line 84.
Now I can hide the redefine messages by simply undefining the the subroutine before calling use.
BEGIN {
undef *explain; # Method imported somewhere before. Hide the redefine messages
}
use Test::Most;
However, I'd like to determine which module is importing the other version of explain.
Could use a process of elimination and just comment out everything until I get the warning, but would be nice if there was a more direct route to determining the source.

Inserting use Devel::Peek qw( ); BEGIN { Devel::Peek::Dump(\&foo); } before the line that gives the warning will tell you which package (COMP_STASH) and file name (FILE).
A solution that also gets you the line number is possible. The function's opcode tree could be walked until a nextstate is found (which is probably the very first op of the tree). The file name and line number can be extracted from the op. nextstate ops set the file and line number issued by runtime warnings.
Notes:
#line directives affect both solutions.
If a module exports an imported sub, both solution would give the original package and file of origin, not of the intermediary.

You can use perl's introspection facility (called B) for this:
use B;
my $gv = B::svref_2object(\&explain)->GV;
printf "%s::%s file %s line %s\n", $gv->STASH->NAME, $gv->NAME, $gv->FILE, $gv->LINE;
Test::Most::explain file /usr/share/perl5/Test/Most.pm line 175
(line is the end of the sub, not the beginning)

Related

Perl DATA filehandle is empty when read

I have a Perl module with a template (for processing by the Template module) stored between the __DATA__ and __END__ keywords at the end of the file. When attempting to generate a file using the template, the resulting file comes out empty with no warnings or errors output. After debugging, I found that the DATA filehandle is actually empty before it is passed to the Template module.
A previous version of this module is able to correctly read the template from DATA, but none of the changes that I have made should be affecting this part of the code. These changes consist of logic changes within completely separate functions and adding the following use statements to the module:
use DBI;
use DBI::Const::GetInfoType;
use Switch;
I have tried adding write permissions on the perl module (it was originally read-only) and removing the __END__ keyword as I found that wasn't necessary. Unfortunately the DATA filehandle still appears empty.
What kind of problems could cause the DATA filehandle to be empty, and do any of these problems apply to my situation? I am using perl v5.12.5.
The reason that the DATA filehandle is empty in this case is down to the use of the Switch module. This module works by using a source filter which is clobbering the DATA filehandle during the course of its processing.
Alternatives include using if-elsif-else or using the given-when construct, although this is an experimental feature so it may not behave the same in later versions of Perl.
EDIT: Here is a simple reproducer for the issue described above:
# use Switch;
while(<DATA>) {
print($_);
}
__DATA__
One line of data
Second line of data
Without the "use Switch", you'll see the lines printed out, but with it nothing is printed.

identify a procedure and replace it with a different procedure

What I want to achieve:
###############CODE########
old_procedure(arg1, arg2);
#############CODE_END######
I have a huge code which has a old procedure in it. I want that the call to that old_procedure go to a call to a new procedure (new_procedure(arg1, arg2)) with the same arguments.
Now I know, the question seems pretty stupid but the trick is I am not allowed to change the code or the bad_function. So the only thing I can do it create a procedure externally which reads the code flow or something and then whenever it finds the bad_function, it replaces it with the new_function. They have a void type, so don't have to worry about the return values.
I am usng perl. If someone knows how to atleast start in this direction...please comment or answer. It would be nice if the new code can be done in perl or C, but other known languages are good too. C++, java.
EDIT: The code is written in shell script and perl. I cannot edit the code and I don't have location of the old_function, I mean I can find it...but its really tough. So I can use the package thing pointed out but if there is a way around it...so that I could parse the thread with that function and replace function calls. Please don't remove tags as I need suggestions from java, C++ experts also.
EDIT: #mirod
So I tried it out and your answer made a new subroutine and now there is no way of accessing the old one. I had created an variable which checks the value to decide which way to go( old_sub or new_sub)...is there a way to add the variable in the new code...which sends the control back to old_function if it is not set...
like:
use BadPackage; # sub is defined there
BEGIN
{ package BapPackage;
no warnings; # to avoid the "Subroutine bad_sub redefined" message
# check for the variable and send to old_sub if the var is not set
sub bad_sub
{ # good code
}
}
# Thanks #mirod
This is easier to do in Perl than in a lot of other languages, but that doesn't mean it's easy, and I don't know if it's what you want to hear. Here's a proof-of-concept:
Let's take some broken code:
# file name: Some/Package.pm
package Some::Package;
use base 'Exporter';
our #EXPORT = qw(forty_two nineteen);
sub forty_two { 19 }
sub nineteen { 19 }
1;
# file name: main.pl
use Some::Package;
print "forty-two plus nineteen is ", forty_two() + nineteen();
Running the program perl main.pl produces the output:
forty-two plus nineteen is 38
It is given that the files Some/Package.pm and main.pl are broken and immutable. How can we fix their behavior?
One way we can insert arbitrary code to a perl command is with the -M command-line switch. Let's make a repair module:
# file: MyRepairs.pm
CHECK {
no warnings 'redefine';
*forty_two = *Some::Package::forty_two = sub { 42 };
};
1;
Now running the program perl -MMyRepairs main.pl produces:
forty-two plus nineteen is 61
Our repair module uses a CHECK block to execute code in between the compile-time and run-time phase. We want our code to be the last code run at compile-time so it will overwrite some functions that have already been loaded. The -M command-line switch will run our code first, so the CHECK block delays execution of our repairs until all the other compile time code is run. See perlmod for more details.
This solution is fragile. It can't do much about modules loaded at run-time (with require ... or eval "use ..." (these are common) or subroutines defined in other CHECK blocks (these are rare).
If we assume the shell script that runs main.pl is also immutable (i.e., we're not allowed to change perl main.pl to perl -MMyRepairs main.pl), then we move up one level and pass the -MMyRepairs in the PERL5OPT environment variable:
PERL5OPT="-I/path/to/MyRepairs -MMyRepairs" bash the_immutable_script_that_calls_main_pl.sh
These are called automated refactoring tools and are common for other languages. For Perl though you may well be in a really bad way because parsing Perl to find all the references is going to be virtually impossible.
Where is the old procedure defined?
If it is defined in a package, you can switch to the package, after it has been used, and redefine the sub:
use BadPackage; # sub is defined there
BEGIN
{ package BapPackage;
no warnings; # to avoid the "Subroutine bad_sub redefined" message
sub bad_sub
{ # good code
}
}
If the code is in the same package but in a different file (loaded through a require), you can do the same thing without having to switch package.
if all the code is in the same file, then change it.
sed -i 's/old_procedure/new_procedure/g codefile
Is this what you mean?

Why does Perl compile diagnostics.pm if I have no diagnostics in my code?

When I was looking at the output of Devel::NYTProf v4 for a CGI program, I came across the diagnostics.pm in the report Source Code Files — ordered by exclusive time then name.
First I didn't understand why that would be in the production code. I dug deeper through the report and found out that it was called by main::BEGIN#17. That, in turn, is the following line:
# spent 34µs (26+8) within main::BEGIN#15 which was called: # once (26µs+8µs) by main::RUNTIME at line 15
use strict;
# spent 34µs making 1 call to main::BEGIN#15 # spent 8µs making 1 call to strict::import
# spent 36µs (17+19) within main::BEGIN#16 which was called: # once (17µs+19µs) by main::RUNTIME at line 16
use warnings;
# spent 36µs making 1 call to main::BEGIN#16 # spent 19µs making 1 call to warnings::import
# spent 292ms (171+121) within main::BEGIN#17 which was called: # once (171ms+121ms) by main::RUNTIME at line 17
no diagnostics;
# spent 292ms making 1 call to main::BEGIN#17
# spent 135µs (27+108) within main::BEGIN#18 which was called: # once (27µs+108µs) by main::RUNTIME at line 18
use Carp qw( carp croak );
So this seems to be the culprit. I removed the no diagnostics line, and the call was gone, effectively saving about 300ms of time.
Here's what perldoc use says about the no keyword:
There's a corresponding no declaration that unimports meanings
imported by use, i.e., it calls unimport Module LIST instead of
import. It behaves just as import does with VERSION, an omitted or
empty LIST, or no unimport method being found.
no integer;
no strict 'refs';
no warnings;
So here's my actual question: Am I correct in assuming that if I call no diagnostics, it is actually loaded before it is unimported?
Is the call to no diagnostics similar to this piece of code?
BEGIN {
require diagnostics.pm;
diagnostics->unimport;
}
In consequence, is it a bad idea to just unimport stuff that has never been imported, because that actually loads it first?
Am I correct in assuming that if I call no diagnostics, it is actually loaded before it is unimported?
Yes. Its indeed full equivalent to
BEGIN {
require diagnostics;
diagnostics->unimport;
}
So no module command actually loads and compiles the module; including executing the code which is not in any sub, in BEGIN block etc; same for all dependencies of given module(for every use/ require inside).

Is there a tool to check a Perl script for unnecessary use statements?

For Python, there is a script called importchecker which tells you if you have unnecessary import statements.
Is there a similar utility for Perl use (and require) statements?
Take a look at Devel::TraceUse it might give you a chunk of what you're looking for.
Here is a script I wrote to attempt this. It is very simplistic and will not automate anything for you but it will give you something to start with.
#!/usr/bin/perl
use strict;
use v5.14;
use PPI::Document;
use PPI::Dumper;
use PPI::Find;
use Data::Dumper;
my %import;
my $doc = PPI::Document->new($ARGV[0]);
my $use = $doc->find( sub { $_[1]->isa('PPI::Statement::Include') } );
foreach my $u (#$use) {
my $node = $u->find_first('PPI::Token::QuoteLike::Words');
next unless $node;
$import{$u->module} //= [];
push $import{$u->module}, $node->literal;
}
my $words = $doc->find( sub { $_[1]->isa('PPI::Token::Word') } );
my #words = map { $_->content } #$words;
my %words;
#words{ #words } = 1;
foreach my $u (keys %import) {
say $u;
foreach my $w (#{$import{$u}}) {
if (exists $words{$w}) {
say "\t- Found $w";
}
else {
say "\t- Can't find $w";
}
}
}
There is a number of ways to load packages and import symbols (or not). I am not aware of a tool which single-handedly and directly checks whether those symbols are used or not.
But for cases where an explicit import list is given,
use Module qw(func1 func2 ...);
there is a Perl::Critic policy TooMuchCode::ProhibitUnusedImport that helps with much of that.
One runs on the command line
perlcritic --single-policy TooMuchCode::ProhibitUnusedImport program.pl
and the program is checked. Or run without --single-policy flag for a complete check and seek Severity 1 violations in the output, which this is.
For an example, consider a program
use warnings;
use strict;
use feature 'say';
use Path::Tiny; # a class; but it imports 'path'
use Data::Dumper; # imports 'Dumper'
use Data::Dump qw(dd pp); # imports 'dd' and 'pp'
use Cwd qw(cwd); # imports only 'cwd'
use Carp qw(carp verbose); # imports 'carp'; 'verbose' isn't a symbol
use Term::ANSIColor qw(:constants); # imports a lot of symbols
sub a_func {
say "\tSome data: ", pp [ 2..5 ];
carp "\tA warning";
}
say "Current working directory: ", cwd;
a_func();
Running the above perlcritic command prints
Unused import: dd at line 7, column 5. A token is imported but not used in the same code. (Severity: 1)
Unused import: verbose at line 9, column 5. A token is imported but not used in the same code. (Severity: 1)
We got dd caught, while pp from the same package isn't flagged since it's used (in the sub), and neither are carp and cwd which are also used; as it should be, out of what the policy aims for.
But note
whatever comes with :constants tag isn't found
word verbose, which isn't a function (and is used implicitly), is reported as unused
if a_func() isn't called then those pp and carp in it are still not reported even though they are then unused. This may be OK, since they are present in code, but it is worth noting
(This glitch-list is likely not exhaustive.)
Recall that the import list is passed to an import sub, which may expect and make use of whatever the module's design deemed worthy; these need not be only function names. It is apparently beyond this policy to follow up on all that. Still, loading modules with the explicit import list with function names is good practice and what this policy does cover is an important use case.
Also, per the clearly stated policy's usage, the Dumper (imported by Data::Dumper) isn't found, nor is path from Path::Tiny. The policy does deal with some curious Moose tricks.
How does one do more? One useful tool is Devel::Symdump, which harvests the symbol tables. It catches all symbols in the above program that have been imported (no Path::Tiny methods can be seen if used, of course). The non-existing "symbol" verbose is included as well though. Add
use Devel::Symdump;
my $syms = Devel::Symdump->new;
say for $syms->functions;
to the above example. To also deal with (runtime) require-ed libraries we have to do this at a place in code after they have been loaded, what can be anywhere in the program. Then best do it in an END block, like
END {
my $ds = Devel::Symdump->new;
say for $ds->functions;
};
Then we need to check which of these are unused. At this time the best tool I'm aware of for that job is PPI; see a complete example. Another option is to use a profiler, like Devel::NYTProf.
Another option, which requires some legwork†, is the compiler's backend B::Xref, which gets practically everything that is used in the program. It is used as
perl -MO=Xref,-oreport_Xref.txt find_unused.pl
and the (voluminous) output is in the file report_Xref.txt.
The output has sections for each involved file, which have subsections for subroutines and their packages. The last section of the output is directly useful for the present purpose.
For the example program used above I get the output file like
File /.../perl5/lib/perl5//Data/Dump.pm
...
(some 3,000 lines)
...
File find_unused.pl --> there we go, this program's file
Subroutine (definitions)
... dozens of lines ...
Subroutine (main)
Package main
&a_func &43
&cwd &27
Subroutine a_func
Package ?
#?? 14
Package main
&carp &15
&pp &14
So we see that cwd gets called (on line 27) and that carp and pp are also called in the sub a_func. Thus dd and path are unused (out of all imported symbols found otherwise, by Devel::Symdump for example). This is easy to parse.
However, while path is reported when used, if one uses new instead (also in Path::Tiny as a traditional constructor) then that isn't reported in this last section, nor are other methods.
So in principle† this is one way to find which of the symbols (for functions) reported to exist by Devel::Symdump have been used in the program.
† The example here is simple and easy to process but I have no idea how complete, or hard to parse, this is when all kinds of weird ways for using imported subs are taken into account.

Why are there several c files in symbol table of %main::?

'_<perlmain.c' => *{'::_<perlmain.c'},
'_</usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/auto/Data/Dumper/Dumper.so' => *{'::_</usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/auto/Data/Dumper/Dumper.so'},
'_<universal.c' => *{'::_<universal.c'},
'_<xsutils.c' => *{'::_<xsutils.c'},
...
Why are they in the symbol table of %main::,when are they useful?
To repeat the output from the question, run
#! /usr/bin/env perl
use Data::Dumper;
print Dumper \%main::;
The entries you see are inserted in gv_fetchfile_flags:
/* This is where the debuggers %{"::_<$filename"} hash is created */
tmpbuf[0] = '_';
tmpbuf[1] = '<';
memcpy(tmpbuf + 2, name, namelen);
gv = *(GV**)hv_fetch(PL_defstash, tmpbuf, tmplen, TRUE);
if (!isGV(gv)) {
gv_init(gv, PL_defstash, tmpbuf, tmplen, FALSE);
#ifdef PERL_DONT_CREATE_GVSV
GvSV(gv) = newSVpvn(name, namelen);
#else
sv_setpvn(GvSV(gv), name, namelen);
#endif
}
This is called many times by way of newXS as part of the boot process in S_parse_body.
boot_core_PerlIO();
boot_core_UNIVERSAL();
boot_core_mro();
Note that you also see entries for perlio.c, universal.c, and mro.c in the output.
The Debugger Internals section of the perldebguts documentation explains their use:
For example, whenever you call Perl's built-in caller function from the package DB, the arguments that the corresponding stack frame was called with are copied to the #DB::args array. These mechanisms are enabled by calling Perl with the -d switch. Specifically, the following additional features are enabled (cf. $^P in perlvar):
…
Each array #{"_<$filename"} holds the lines of $filename for a file compiled by Perl. The same is also true for evaled strings that contain subroutines, or which are currently being executed. The $filename for evaled strings looks like (eval 34). Code assertions in regexes look like (re_eval 19).
Each hash %{"_<$filename"} contains breakpoints and actions keyed by line number. Individual entries (as opposed to the whole hash) are settable. Perl only cares about Boolean true here, although the values used by perl5db.pl have the form "$break_condition\0$action".
The same holds for evaluated strings that contain subroutines, or which are currently being executed. The $filename for evaled strings looks like (eval 34) or (re_eval 19) .
Each scalar ${"_<$filename"} contains "_<$filename". This is also the case for evaluated strings that contain subroutines, or which are currently being executed. The $filename for evaled strings looks like (eval 34) or (re_eval 19).
After each required file is compiled, but before it is executed, DB::postponed(*{"_<$filename"}) is called if the subroutine DB::postponed exists. Here, the $filename is the expanded name of the required file, as found in the values of %INC.
…
As perl is a interpreter based language, it needs, right, his interpreter, the perl binary. This binary just reads the perl script, and executes the code by translating it in machine code.
Your perl interpreter is compiled with debug symbols, so it contains informations about the source files it build of. Also you see the objects of loaded modules Data::Dumper in your example.
Hope that helps