Dependencies in Perl code - perl

I've been assigned to pick up a web application written in some old legacy Perl code, get it working on our server, and later extend it. The code was written 10 years ago by a solitary self-taught developer...
The code has weird stuff going on - they are not afraid to require lib-param.pl on line one, and later in the same file require /lib-pl/lib-param.pl - which is of course a different file.
Including a.pl with subs b() and c(), and later including d.pl with subs c() and e(), seems to be quite popular too... Packages appear to be unknown, so you'll just find &c() somewhere in the code later.
Interesting questions:
Is there a tool that can draw the relations between Perl files? Show a list of which files are used by which other files?
The same for MySQL databases and tables? Can it show which schemas/tables are used by which files?
Is there an IDE that knows which c() is called - the one in a.pl or the one in d.pl?
How would you start to try to understand the code?
I'm inclined to go through each file and refactor it, but am not allowed to do that - only the strict minimum to get the code working. (But since the code never uses strict, I don't know if I'm gonna...)

Not using strict is a mistake -- don't continue it. Move the stuff in d.pl to D.pm (or perhaps a better name altogether), and if the code is procedural, use Sub::Exporter to get those subs back into the calling package. strict is lexical, so you can turn it on for just one package, such as your new package D;. To find out which code is being called, use Devel::SimpleTrace.
perl -MDevel::SimpleTrace ./foo.pl
Now any warnings will be accompanied by a full stack trace -- sprinkle warnings around the code and run it.
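For the D.pm suggestion, here's a minimal sketch of the shape it could take, assuming the subs from d.pl are c() and e() as in the question (this uses Sub::Exporter's -setup interface):

package D;
use strict;
use warnings;
use Sub::Exporter -setup => { exports => [qw( c e )] };

sub c { ... }   # bodies moved verbatim from d.pl
sub e { ... }

1;

The old scripts then replace require 'd.pl'; with use D qw( c e ); and keep calling c() and e() unqualified.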
I think the MySQL question should be removed from this post. Schema/table mappings have nothing to do with Perl; it seems an out-of-place distraction on this question.

I would write a utility to scan a complete list of all subs and which file they live in; then I would write a utility to give me a list of all function calls and which file they come from.
By the way - it is not terribly hard to write a fairly mindless static analysis tool to generate a call graph.
For many cases, in well-written code, that will be enough to help me out...
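As a rough illustration of the sub-scanning utility described above - a mindless regex scan that will miss string evals and generated subs, and the *.pl glob is an assumption about how the code is laid out:

#!/usr/bin/perl
use strict;
use warnings;

# map each 'sub NAME' definition to the file(s) it lives in
my %defined_in;
for my $file (glob '*.pl') {
    open my $fh, '<', $file or die "$file: $!";
    while (my $line = <$fh>) {
        push @{ $defined_in{$1} }, $file if $line =~ /^\s*sub\s+(\w+)/;
    }
}

# subs defined in more than one file are the dangerous ones
for my $sub (sort keys %defined_in) {
    print "$sub: @{ $defined_in{$sub} }\n";
}

Collecting the &c()-style call sites is the same loop with a different pattern.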

Related

nytprofhtml seems to ignore a module named DB

I'm trying to profile a big application that contains a module creatively named DB. After running it under -d:NYTProf and calling nytprofhtml, both without any additional switches or related environment variables, I get the usual nytprof directory with HTML output. However, it seems that due to some internal logic, any output related to my module DB is severely mangled. Just to make sure: DB is pure Perl.
In the top/all-subroutines list, while other function links point to the relevant -pm-NN-line.html file, links to subs from DB point to the entry script instead.
"line" link in "Source Code Files" section does point to DB-pm-NN-line.html and it does exists, but unlike all other files it doesn't have "Statements" table inside and "Line" table have absolutely no lines of code, only summary of calls.
Actually, here's a small example:
# main.pl
use lib '.';
use DB;
for (1..10) {
    print DB::do_stuff();
}

# DB.pm
package DB;
sub do_stuff {
    my $a = 1;
    my $b = 2;
    my $c = $a + $b;
    return $c;
}
1;
Try running perl -d:NYTProf main.pl, then nytprofhtml and then inspect nytprof/DB-pm-8-line.html.
I don't know if this happens because NYTProf itself has an internal module named DB, or if it handles modules starting with DB in some magical way - I've noticed the output for functions from DBI looks somewhat different too.
Is there a way to change/disable this behavior short of renaming my DB module?
That's hardly an option
You don't really have an option. The special DB package and the associated Devel:: namespace are coded into the perl compiler/interpreter. Unless you want to do without any debugging facilities altogether and live in fear of mysterious, unquantifiable side effects, you must refactor your code to rename your proprietary DB library.
Anyway, the name is very generic, and that's exactly why it is to be expected that it will be encountered.
On the contrary: Devel::NYTProf is bound to use the existing core package DB. It's precisely because it is a very generic identifier that an engineer should reject it as a choice for production code, which could be required to work with pre-existing third-party code at any point.
a module creatively named DB
This belies your own support of the choice. DB has been a core module since v5.6 in 2000, and anything already in CPAN should have been off the cards from the start. I feel certain that this clash of namespaces must have been discovered before now. Please don't be someone else who just sweeps the issue under the carpet.
Remember that, as it stands, any code you are now putting in the DB package is sharing space with everything that perl has already put there. It is astonishing that you are not experiencing strange and inexplicable symptoms already.
I don't see that it's too big a task to fix this, as long as you have a proper test suite for your 10MB of Perl code. If you don't, then hopefully you will at least not make the same mistake again.

What happens if I reference a package but don't use/require it?

As much as I can (mostly for clarity/documentation), I've been trying to say
use Some::Module;
use Another::Module qw( some namespaces );
in my Perl modules that use other modules.
I've been cleaning up some old code and see some places where I reference modules in my code without ever having used them:
my $example = Yet::Another::Module->AFunction($data); # EXAMPLE 1
my $demo = Whats::The:Difference::Here($data); # EXAMPLE 2
So my questions are:
Is there a performance impact (I'm thinking compile time) by not stating use x and simply referencing it in the code?
I assume that I shouldn't use modules that aren't utilized in the code - I'm telling the compiler to compile code that is unnecessary.
What's the difference between calling functions in example 1's style versus example 2's style?
I would say that this falls firmly into the category of premature optimisation, and if you're not sure, then leave it in. You would have to be including some vast unused libraries if removing them helped at all.
It is typical of Perl to hide a complex issue behind a simple mechanism that will generally do what you mean without too much thought.
The simple mechanisms are these:
use My::Module 'function' is the same as writing
BEGIN {
    require My::Module;
    My::Module->import( 'function' );
}
The first time perl successfully executes a require statement, it adds an element to the global %INC hash, which has the "pathified" module name (in this case, My/Module.pm) for a key and the absolute location where it found the source as a value.
If another require for the same module is encountered (that is, it already exists in the %INC hash) then require does nothing.
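You can watch this happen with any core module (File::Spec here is just an arbitrary example):

require File::Spec;                 # first time: finds, compiles and runs File/Spec.pm
print $INC{'File/Spec.pm'}, "\n";   # absolute path where the source was found
require File::Spec;                 # second time: %INC already has the key, so no-op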
So, to your question:
What happens if I reference a package but don't use/require it?
We're going to have a problem with use, utilise, include and reference here, so I'm code-quoting only use and require when I mean the Perl language words.
Keeping things simple, these are the three possibilities:
As above, if require is seen more than once for the same module source, then it is ignored after the first time. The only overhead is checking to see whether there is a corresponding element in %INC.
Clearly, if you use source files that aren't needed then you are doing unnecessary compilation. But Perl is damn fast, and you will be able to shave only fractions of a second from the build time unless you have a program that uses enormous libraries and looks like use Catalyst; print "Hello, world!\n";
We know what happens if you make method calls to a class library that has never been compiled. We get
Can't locate object method "new" via package "My::Class" (perhaps you forgot to load "My::Class"?)
If you're using a function library, then what matters is the part of use that says
My::Module->import( 'function' );
because the first part is require, and we already know that require never does anything twice. Calling import is usually a simple function call, and you would be saving nothing significant by avoiding it.
What is perhaps less obvious is the effect of big modules that load multiple subsidiary modules. For instance, if I write just
use LWP::UserAgent;
then perl knows what it is likely to need, and these modules will also be compiled:
Carp
Config
Exporter
Exporter::Heavy
Fcntl
HTTP::Date
HTTP::Headers
HTTP::Message
HTTP::Request
HTTP::Response
HTTP::Status
LWP
LWP::MemberMixin
LWP::Protocol
LWP::UserAgent
Storable
Time::Local
URI
URI::Escape
and that's ignoring the pragmas!
Did you ever feel like you were kicking your heels, waiting for an LWP program to compile?
I would say that, in the interests of keeping your Perl code clear and tidy, it may be an idea to remove unnecessary modules from the compilation phase. But don't agonise over it, and benchmark your build times before doing any pre-handover tidy. No one will thank you for reducing the build time by 20ms and then causing them hours of work because you removed a non-obvious requirement.
You actually have a bunch of questions.
Is there a performance impact (thinking compile time) by not stating use x and simply referencing it in the code?
No, there is no performance impact, because you can't do that. Every namespace you are using in a working program gets defined somewhere. Either you used or required it before the point where it's called, or one of your dependencies did, or another way[1] was used to make Perl aware of it.
Perl keeps track of those things in symbol tables. They hold all the knowledge about namespaces and variable names. So if your Some::Module is not in the referenced symbol table, Perl will complain.
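A quick sketch of how you might poke at both from code (Some::Module is a placeholder name):

# was the module's file loaded by use/require?
print "file loaded\n" if $INC{'Some/Module.pm'};

# does the namespace exist in the symbol table at all?
print "stash exists\n" if %Some::Module::;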
I assume that I shouldn't use modules that aren't utilized in the code - I'm telling the compiler to compile code that is unnecessary.
There is no question here. But yes, you should not do that.
It's hard to say if this is a performance impact. If you have a large Catalyst application that just runs and runs for months it doesn't really matter. Startup cost is usually not relevant in that case. But if this is a cronjob that runs every minute and processes a huge pile of data, then an additional module might well be a performance impact.
That's actually also a reason why all use and require statements should be at the top. So it's easy to find them if you need to add or remove some.
What's the difference between calling functions in example 1's style versus example 2's style?
Those are for different purposes mostly.
my $example = Yet::Another::Module->AFunction($data); # EXAMPLE 1
This syntax is very similar to the following:
my $e = Yet::Another::Module::AFunction('Yet::Another::Module', $data);
It's used for class methods in OOP. The most well-known one would be new, as in Foo->new. It passes the thing in front of the -> (the class name here, or the object if it's a blessed reference) as the first argument to the function named AFunction in the corresponding package. But it does more: because it's a method call, it also takes inheritance into account.
package Yet::Another::Module;
use parent -norequire, 'A::First::Module';   # -norequire: the parent is defined below in this same file
1;

package A::First::Module;
sub AFunction { ... }
In this case, your example would also call AFunction, because it's inherited from A::First::Module. In addition to the symbol table referenced above, Perl uses @ISA to keep track of who inherits from whom. See perlobj for more details.
my $demo = Whats::The:Difference::Here($data); # EXAMPLE 2
This has a syntax error. There is a : missing after The.
my $demo = Whats::The::Difference::Here($data); # EXAMPLE 2
This is a function call. It calls the function Here in the package Whats::The::Difference and passes $data and nothing else.
Note that, as Borodin points out in a comment, your function names are very atypical and confusing. Functions in Perl are usually written in all lowercase, with underscores _ instead of camel case, so AFunction should be a_function, and Here should be here.
[1] For example, you can have multiple package definitions in one file (which you should not normally do), or you could assign stuff into a namespace directly with syntax like *Some::Namespace::frobnicate = sub {...}. There are other ways, but that's a bit out of scope for this answer.
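To make footnote [1] concrete, a tiny self-contained sketch (Acme::Demo is a made-up name):

use strict;
use warnings;

# no Acme/Demo.pm exists; the glob assignment alone creates the stash entry
*Acme::Demo::hello = sub { return 'hi from Acme::Demo' };
print Acme::Demo::hello(), "\n";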

How do I find all the modules used by a Perl script?

I'm getting ready to try to deploy some code to multiple machines. As far as I know, using a Makefile.PL to track dependencies is the best way to ensure they are installed everywhere. The problem I have is that I'm not sure our Makefile.PL has been updated as this application has passed through a few different developers.
Is there any way to automatically parse through either my source or a few full runs of my program to determine exactly what versions of what modules my application is depending on? On top of that, is there any way to filter it based on CPAN packages? (So that I only depend on Moose instead of every single module that comes with Moose.)
A third related question is, if you depend on a version of a module that is not the latest, what is the best way to have someone else install it? Should I start including entire localized Perl installations with my application?
Just to be clear - you cannot generically get a list of modules that the app depends on by code analysis alone. E.g. if your app does eval { require $module; $module->import() }, where $module is passed via the command line, then this can ONLY be detected by actually running the specific command-line invocation with ALL the possible module values.
If you do wish to do this, you can figure out every module used by a combination of runs via:
Devel::Cover. Coverage reports would list 100% of the modules used. But you don't get version numbers.
Print %INC at every single possible exit point in the code, as slu's answer says. This should probably be done in an END {} block as well as in a __DIE__ handler to cover all possible exit points, and even then it may not cover the fully generic case if your __DIE__ handler gets overwritten somewhere within the program. (A sketch of the END-block approach follows this list.)
Devel::Modlist (also mentioned by slu's answer) - the downside compared to Devel::Cover is that it does NOT seem to be able to aggregate a database across multiple sample runs the way Devel::Cover does. On the plus side, it's purpose-built, so it has a lot of very useful options (CPAN paths, versions).
Please note that the other module (Module::ScanDeps) does NOT seem to let you do runtime analysis based on arbitrary command-line arguments (at first glance it seems to only allow executing the program with no arguments), and if that's true, it is inferior to all three of the methods above for any code that may load modules dynamically.
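A minimal sketch of the END-block idea mentioned above - version lookup through UNIVERSAL::VERSION is best-effort, and this only reports what happened to be loaded by the time this particular run exited:

END {
    for my $path (sort keys %INC) {
        (my $mod = $path) =~ s{/}{::}g;   # Foo/Bar.pm -> Foo::Bar.pm
        $mod =~ s/\.pm\z//;               # strip the extension
        my $version = eval { $mod->VERSION } || 'unknown';
        print STDERR "$mod $version\n";
    }
}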
Module::ScanDeps - Recursively scan Perl code for dependencies
Does both static and runtime scanning. Just modules, though - I don't know of any exact way of verifying which versions come from which distributions. You could get old packages from BackPAN, or just package your entire chain of local dependencies up with PAR.
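If you go the PAR route, the pp tool from PAR::Packer can bundle a script together with its dependency tree, along the lines of (file names are placeholders):

pp -o myapp myapp.pl   # produces a self-contained executable 'myapp'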
You could look at %INC, see http://www.perlmonks.org/?node_id=681911 which also mentions Devel::Modlist
I would definitely use Devel::TraceUse, which also shows a tree of the modules, so it's easy to guess where they are being loaded.
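Usage is just (the script name is a placeholder):

perl -d:TraceUse yourscript.pl

and the module tree is reported when the program ends.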

Using table-of-contents in code?

Do you use a table of contents listing all the functions (and maybe variables) of a class at the beginning of a big source code file? I know that an alternative to that kind of listing would be to split the big file up into smaller classes/files, so that each class declaration would be self-explanatory enough... but some complex tasks require a lot of code. I'm not sure whether it's really worth spending your time subdividing the implementation into multiple files. Or is it OK to create an index listing in addition to the class/interface declaration?
EDIT:
To better illustrate how I use a table of contents, here is an example from my hobby project. It's actually not listing functions, but code blocks inside a function... but you can probably get the idea anyway.
/*
CONTENTS
Order_mouse_from_to_points
Lines_intersecting_with_upper_point
Lines_intersecting_with_both_points
Lines_not_intersecting
Lines_intersecting_bottom_points
Update_intersection_range_indices
Rough_method
Normal_method
First_selected_item
Last_selected_item
Other_selected_item
*/
void SelectionManager::FindSelection()
{
    // Order_mouse_from_to_points
    ...
    // Lines_intersecting_with_upper_point
    ...
    // Lines_intersecting_with_both_points
    ...
    // Lines_not_intersecting
    ...
    // Lines_intersecting_bottom_points
    ...
    // Update_intersection_range_indices
    for(...)
    {
        // Rough_method
        ...
        // Normal_method
        if(...)
        {
            // First_selected_item
            ...
            // Last_selected_item
            ...
            // Other_selected_item
            ...
        }
    }
}
Notice that the index items don't have spaces. Because of this I can click on one of them and press F4 to jump to the item's usage, and F2 to jump back (simple Visual Studio find-next/find-previous shortcuts).
EDIT:
Another alternative solution to this indexing is using collapsed C# regions. You can configure Visual Studio to show only region names and hide all the code. Of course, keyboard support for that kind of source code navigation is pretty cumbersome...
I know that alternative to that kind of listing would be to split up big files into smaller classes/files, so that their class declaration would be self-explanatory enough.
Correct.
but some complex tasks require a lot of code
Incorrect. While a "lot" of code may be required, long runs of code (over 25 lines) are a really bad idea.
actually not listing functions, but code blocks inside a function
Worse. A function that needs a table of contents must be decomposed into smaller functions.
I'm not sure whether it's really worth spending your time subdividing the implementation into multiple files.
It is absolutely mandatory that you split things into smaller files. The folks that maintain, adapt and reuse your code need all the help they can get.
is it OK to create an index listing in addition to the class/interface declaration?
No.
If you have to resort to this kind of trick, it's too big.
Also, many languages have tools to generate API docs from the code. Java, Python, C, C++ have documentation tools. Even with Javadoc, epydoc or Doxygen you still have to design things so that they are broken into intellectually manageable pieces.
Make things simpler.
Use a tool to create an index.
If you create a big index you'll have to maintain it as you change your code. Most modern IDEs create a list of class members anyway. It seems like a waste of time to create such an index.
I would never ever do this sort of busy-work in my code. The most I would do manually is insert a few lines at the top of the file/class explaining what this module did and how it is intended to be used.
If a list of methods and their interfaces would be useful, I'd generate it automatically, through a tool such as Doxygen.
I've done things like this. Not whole tables of contents, but a similar principle -- just ad-hoc links between comments and the exact piece of code in question. Also to link pieces of code that make the same simplifying assumptions that I suspect may need fixing up later.
You can use Visual Studio's task list to get a listing of certain types of comment. The format of the comments can be configured in Tools|Options, Environment\Task List. This isn't something I ended up using myself but it looks like it might help with navigating the code if you use this system a lot.
If you can split your method like that, you should probably write more methods. After this is done, you can use an IDE to give you the static call stack from the initial method.
EDIT: You can use Eclipse's 'Show Call Hierarchy' feature while programming.

What can make Class::Loader fail where "use" and "new" do not?

I'm working on a very large CGI application that uses Crypt::RSA, which is properly installed. I get an "attempted to call a null reference as a function" type of error (I can't go back to get the exact error right now, because we had to roll back for a release date) when I try to run anything in the embedded library. I traced the null reference to Crypt::RSA's constructor, which uses Class::Loader to enable Crypt::RSA::ES::OAEP.
I replaced the class loader with a "use" and a "new", and that part works fine, though the library still fails in many points. Obviously something is wrong with my environment. I'm just not certain as to what. Can anyone give me any leads?
Ok, after 12 hours of digging into it, I got this working.
Here's what was going on (but not why). Whenever I called eval() on a quoted use or require statement (as occurs in Class::Loader, but also in other locations in the Crypt:: framework), it failed to see paths that were otherwise included in Perl's include path. Since most quoted use/require statements simply assume the class will be there, very few useful errors were thrown at me. I would dump @INC to a file, outside an eval block, and everything would be there.
Ironically, I used the same setup in dev and in staging, and it worked in dev but not in staging. I must also point out that FindBin (I shouldn't be using it in CGI, I know, but Crypt uses it) was flailing up and down about /dev/null in staging, but not in development.
Since I can't easily compare versions or global configs, that's where my quest ends.
How I resolved the issue for myself in Crypt::RSA was to disable all commands tied to FindBin and hard-code require references for anything my code would ever access. I did a require in Crypt::RSA for Crypt::RSA::ES::OAEP, and one in Crypt::Random::Generator for Crypt::Random::Provider::rand.
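In sketch form, the fix amounted to replacing the dynamic loading with explicit requires (the two module names are the ones from the paragraph above):

use strict;
use warnings;

# loaded explicitly instead of through Class::Loader / FindBin logic
require Crypt::RSA::ES::OAEP;
require Crypt::Random::Provider::rand;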
Hope this helps anyone in the future who has the problem. Anyone who can suggest the why of it, please respond and I'll add it to complete the post.