Perl shallow syntax check? ie. do not check syntax of imports - perl

How can I perform a "shallow" syntax check on perl files. The standard perl -c is useful but it checks the syntax of imports. This is sometimes nice but not great when you work in a code repository and push to a running environment and you have a function defined in the repository but not yet pushed to the running environment. It fails checking a function because the imports reference system paths (ie. use Custom::Project::Lib qw(foo bar baz)).

It can't practically be done, because imports have the ability to influence the parsing of the code that follows. For example use strict makes it so that barewords aren't parsed as strings (and changes the rules for how variable names can be used), use constant causes constant subs to be defined, and use Try::Tiny changes the parse of expressions involving try, catch, or finally (by giving them & prototypes). More generally, any module that exports anything into the caller's namespace can influence parsing because the perl parser resolves ambiguity in different ways when a name refers to an existing subroutine than when it doesn't.

There are two problems with this:
How to not fail -c if the required modules are missing?
There are two solutions:
A. Add a fake/stub module in production
B. In all your modules, use a special catch-all #INC subroutine entry (using subs in #INC is explained here). This obviously has a problem of having the module NOT fail in real production runtime if the libraries are missing - DoublePlusNotGood in my book.
Even if you could somehow skip failing on missing modules, you would STILL fail on any use of the identifiers imported from the missing module or used explicitly from that module's namespace.
The only realistic solution to this is to go back to #1a and use a fake stub module, but this time one that has a declared and (as needed) exported identifier for every public interface. E.g. do-nothing subs or dummy variables.
However, even that will fail for some advanced modules that dynamically determine what to create in their own namespace and what to export in runtime (and the caller code could dynamically determine which subs to call - heck, sometimes which modules to import).
But this approach would work just fine for normal "Java/C-like" OO or procedural code that only calls statically named predefined public subs, methods and accesses exported variables.

I would suggest that it's better to include your code repository in your syntax check. perl -I/path/to/working/code/repo/local_perl/ -c or set PERL5LIB=/path/to/working/code/repo/local_perl/ prior to running perl -c. Either option should allow you to check against your working code, assuming you have it in a directory structure similar to your live code.

I guess you could make stubs for the missing libraries in your home folder.

Have you looked into PPI? I think it does follow imports, however it could perhaps be more easily modified to guess what looks like a function name.

Related

What happens if I reference a package but don't use/require it?

As much as I can (mostly for clarity/documentation), I've been trying to say
use Some::Module;
use Another::Module qw( some namespaces );
in my Perl modules that use other modules.
I've been cleaning up some old code and see some places where I reference modules in my code without ever having used them:
my $example = Yet::Another::Module->AFunction($data); # EXAMPLE 1
my $demo = Whats::The::Difference::Here($data); # EXAMPLE 2
So my questions are:
Is there a performance impact (I'm thinking compile time) by not stating use x and simply referencing it in the code?
I assume that I shouldn't use modules that aren't utilized in the code - I'm telling the compiler to compile code that is unnecessary.
What's the difference between calling functions in example 1's style versus example 2's style?
I would say that this falls firmly into the category of preemptive optimisation and if you're not sure, then leave it in. You would have to be including some vast unused libraries if removing them helped at all
It is typical of Perl to hide a complex issue behind a simple mechanism that will generally do what you mean without too much thought
The simple mechanisms are these
use My::Module 'function' is the same as writing
BEGIN {
require My::Module;
My::Module->import( 'function' );
}
The first time perl successfully executes a require statement, it adds an element to the global %INC hash which has the "pathified" module name (in this case, My/Module.pm) for a key and the absolute location where it found the source as a value
If another require for the same module is encountered (that is, it already exists in the %INC hash) then require does nothing
So your question
What happens if I reference a package but don't use/require it?
We're going to have a problem with use, utilise, include and reference here, so I'm code-quoting only use and require when I mean the Perl language words.
Keeping things simple, these are the three possibilities
As above, if require is seen more than once for the same module source, then it is ignored after the first time. The only overhead is checking to see whether there is a corresponding element in %INC
Clearly, if you use source files that aren't needed then you are doing unnecessary compilation. But Perl is damn fast, and you will be able to shave only fractions of a second from the build time unless you have a program that uses enormous libraries and looks like use Catalyst; print "Hello, world!\n";
We know what happens if you make method calls to a class library that has never been compiled. We get
Can't locate object method "new" via package "My::Class" (perhaps you forgot to load "My::Class"?)
If you're using a function library, then what matters is the part of use that says
My::Module->import( 'function' );
because the first part is require and we already know that require never does anything twice. Calling import is usually a simple function call, and you would be saving nothing significant by avoiding it
What is perhaps less obvious is that big modules that include multiple subsidiaries. For instance, if I write just
use LWP::UserAgent;
then it knows what it is likely to need, and these modules will also be compiled
Carp
Config
Exporter
Exporter::Heavy
Fcntl
HTTP::Date
HTTP::Headers
HTTP::Message
HTTP::Request
HTTP::Response
HTTP::Status
LWP
LWP::MemberMixin
LWP::Protocol
LWP::UserAgent
Storable
Time::Local
URI
URI::Escape
and that's ignoring the pragmas!
Did you ever feel like you were kicking your heels, waiting for an LWP program to compile?
I would say that, in the interests of keeping your Perl code clear and tidy, it may be an idea to remove unnecessary modules from the compilation phase. But don't agonise over it, and benchmark your build times before doing any pre-handover tidy. No one will thank you for reducing the build time by 20ms and then causing them hours of work because you removed a non-obvious requirement.
You actually have a bunch of questions.
Is there a performance impact (thinking compile time) by not stating use x and simply referencing it in the code?
No, there is no performance impact, because you can't do that. Every namespace you are using in a working program gets defined somewhere. Either you used or required it earlier to where it's called, or one of your dependencies did, or another way1 was used to make Perl aware of it
Perl keeps track of those things in symbol tables. They hold all the knowledge about namespaces and variable names. So if your Some::Module is not in the referenced symbol table, Perl will complain.
I assume that I shouldn't use modules that aren't utilized in the code - I'm telling the compiler to compile code that is unnecessary.
There is no question here. But yes, you should not do that.
It's hard to say if this is a performance impact. If you have a large Catalyst application that just runs and runs for months it doesn't really matter. Startup cost is usually not relevant in that case. But if this is a cronjob that runs every minute and processes a huge pile of data, then an additional module might well be a performance impact.
That's actually also a reason why all use and require statements should be at the top. So it's easy to find them if you need to add or remove some.
What's the difference between calling functions in example 1's style versus example 2's style?
Those are for different purposes mostly.
my $example = Yet::Another::Module->AFunction($data); # EXAMPLE 1
This syntax is very similar to the following:
my $e = Yet::Another::Module::AFunction('Yet::Another::Module', $data)
It's used for class methods in OOP. The most well-known one would be new, as in Foo->new. It passes the thing in front of the -> to the function named AFunction in the package of the thing on the left (either if it's blessed, or if it's an identifier) as the first argument. But it does more. Because it's a method call, it also takes inheritance into account.
package Yet::Another::Module;
use parent 'A::First::Module';
1;
package A::First::Module;
sub AFunction { ... }
In this case, your example would also call AFunction because it's inherited from A::First::Module. In addition to the symbol table referenced above, it uses #ISA to keep track of who inherits from whom. See perlobj for more details.
my $demo = Whats::The:Difference::Here($data); # EXAMPLE 2
This has a syntax error. There is a : missing after The.
my $demo = Whats::The::Difference::Here($data); # EXAMPLE 2
This is a function call. It calls the function Here in the package Whats::The::Difference and passes $data and nothing else.
Note that as Borodin points out in a comment, your function names are very atypical and confusing. Usually functions in Perl are written with all lowercase and with underscores _ instead of camel case. So AFunction should be a_function, and Here should be here.
1) for example, you can have multiple package definitions in one file, which you should not normally do, or you could assign stuff into a namespace directly with syntax like *Some::Namespace::frobnicate = sub {...}. There are other ways, but that's a bit out of scope for this answer.

Refactoring Perl code on a live Web site. Are mutual module dependencies OK?

We have a big Perl based Web site.
I am assigned to refactor code of many scripts and packages. Sometimes changes are easy and I just modify existing functions. But sometimes I need to rewrite entire functions. The bad news that the functions which I rewrite call other functions. So if I move refactored code in my new module I need also copy all supplementary functions. But if I do not move refactored code to my special module, a tiny syntax error may crash the entire site :-(
Yes, I know we should use version control, etc. But we don't and this is a fact which I can't change. What to do?
So I need to keep some code in a Test module (to avoid syntax errors to crash the entire site). Would it be OK to make circular references from other modules to Test (for my new refactored routines) and from Test to other modules (for supplemetary routines)?
Note that we have some AutoRequire module which is required by most of our scripts and modules. AutoRequire makes A::X() call to automatically load A module (if it is not yet loaded).
My main question is whether it is OK in this settings to use mutual module dependencies. Any other suggestions?
Require works by first checking if the module is already loaded and if not load it and afterwards mark it as loaded. So if you have a dependency that Foo requires Bar and Bar requires Foo it will first try to load Foo, within Foo try to load Bar. But because Bar requires Foo and Bar was not yet finally loaded this can give problems.
Edit after input from #ysth: it will not fail but it might load some thing only partially, which might cause interesting problems later.
So it might just work, but it also might give problems later (like failing to use exported functions or so).
Perl allows mutual dependencies, but they can sometimes lead to unintuitive behaviour. An example I've given before is:
foo.pm
package foo;
use bar;
sub import { printf("%s loaded foo\n", scalar caller) }
1;
bar.pm
package bar;
use foo;
sub import { printf("%s loaded bar\n", scalar caller) }
1;
script.pl
#!/usr/bin/env perl
package main;
use foo;
Output:
foo loaded bar
main loaded foo
Didn't you expect to see bar loaded foo somewhere in that output?
With careful consideration of compile time versus runtime, etc, etc, you'll realise why that line was not output, and that this is proper and documented behaviour. Just not necessarily very intuitive.
That said, for purely object-oriented modules which don't do anything on import (e.g. export anything to their caller, or act as a pragma, or—as in this example—print output), it should generally be pretty safe.
That said, for purely object-oriented modules there's usually no reason to load them upfront with use anyway—instead you can load them with require or Module::Runtime when they are needed.

Explain this witchcraft!!! (in Perl, with Moose and namespace::autoclean)

So these days I'm working with a project that uses Perl and Moose. I understand Moose is built on MOP. I'm not too familiar with MOP, and I've encountered something I don't understand, and I could use a theoretical explanation. Here is the module namespace::autoclean's documentation:
SYNOPSIS
package Foo;
use namespace::autoclean;
use Some::Package qw/imported_function/;
sub bar { imported_function('stuff') }
# later on:
Foo->bar; # works
Foo->imported_function; # will fail. imported_function got cleaned after compilation
So, back before I ever used Moose, the way that you called a method on an object was: the Perl interpreter would look up that method in the symbol table of the package that your object was blessed into (then, if not found, consider #ISA inheritance and the like). The way it called an imported function from within the package was: it looked up the name of the function in the symbol table of the package. As far as I've been aware to date, that means the same symbol table, either way, so this behavior should be impossible.
My initial inspection of the source was not productive. In broad terms, what is different when using Moose, MOP, and namespace::autoclean, that this sort of trickery becomes possible?
ed. To be especially clear, if I were to replace use namespace::autoclean with
CHECK { undef *Foo::imported_function }
then the Foo->bar; call described in the documentation would crash, because Foo->bar doesn't know where to find imported_function.
It's actually quite simple. For
some_sub()
some_sub is resolved at compile time. For
$o->some_method()
some_method is resolved at runtime. It cannot be done at compile-time since it depends on the value of $o.
There is nothing here that is non-standard. The line
use Some::Package qw/imported_function/;
imports imported_function into the current package, so Foo::imported_function is the same subroutine as Some::Package::imported_function. That assumes that Some::Package inherits from Exporter to do the necessary manipulation of the symbol tables.
The calls are method calls, so Foo->bar is the same as Foo::bar('Foo'). The only special thing here is that the magic that has been done by the import function from Exporter is undone at the end of compile time by namespace::autoclean.
I haven't looked at the code for this module, but since a package's symbol table is just a hash (known as a stash, for symbol table hash) it would be easy to preserve its state at one point and restore it afterwards. So I would guess namespace::autoclean takes a snapshot of the symbol table when it is loaded and the restores that state at the end of compilation time. This can conveniently be done in a CHECK block which behaves like a BEGIN block but is executed at the end of compilation and before the run starts.

Using use without package - big mess?

I have been USEing .pm files willy-nilly in my programs without really getting into using packages unless really needed. In essence, I would just have common routines in a .pm and they would become part of main when use'd (the .pm would have no package or Exporter or the like... just a bunch of routines).
However, I came across a situation the other day which I think I know what happened and why, but I was hoping to get some coaching from the experts here on best practices and how to resolve this. In essence, should packages be used always? Should I "do" files when I just want common routines absorbed into main (or the parent module/package)? Is Exporter really the way to handle all of this?
Here's example code of what I came across (I won't post the original code as it's thousands of lines... this is just the essence of the problem).
Pgm1.pl:
use PM1;
use PM2;
print "main\n";
&pm2sub1;
&pm1sub1;
PM1.pm:
package PM1;
require Exporter;
#ISA=qw(Exporter);
#EXPORT=qw(pm1sub1);
use Data::Dump 'dump';
use PM2;
&pm2sub1;
sub pm1sub1 {
print "pm1sub1 from caller ",dump(caller()),"\n";
&pm2sub1;
}
1;
PM2.pm:
use Data::Dump 'dump';
sub pm2sub1 {
print "pm2sub1 from caller ",dump(caller()),"\n";
}
1;
In essence, I'd been use'ing PM2.pm for some time with its &pm2sub1 subroutine. Then I wrote PM1.pm at some point and it needed PM2.pm's routines as well. However, in doing it like this, PM2.pm's modules got absorbed into PM2.pm's package and then Pgm1.pl couldn't do the same since PM2.pm had already been use'd.
This code will produce
Undefined subroutine &main::pm2sub1 called at E:\Scripts\PackageFun\Pgm1.pl line 4.
pm2sub1 from caller ("PM1", "PM1.pm", 7)
main
However, when I swap the use statements like so in Pgm1.pl
use PM2;
use PM1;
print "main\n";
&pm2sub1;
&pm1sub1;
... perl will allow PM2.pm's modules into main, but then not into PM1.pm's package:
Undefined subroutine &PM1::pm2sub1 called at PM1.pm line 7.
Compilation failed in require at E:\Scripts\PackageFun\Pgm1.pl line 2.
BEGIN failed--compilation aborted at E:\Scripts\PackageFun\Pgm1.pl line 2.
So, I think I can fix this by getting religious about packages and Exporter in all my modules. Trouble is, PM2.pm is already used in a great number of other programs, so it would be a ton of regression testing to make sure I didn't break anything.
Ideas?
See my answer to What is the difference between library files and modules?.
Only use require (and thus use) for modules (files with package, usually .pm). For "libraries" (files without package, usually .pl), use do.
Better yet, only use modules!
use will not load same file more than once. It will, however, call target package's import sub every time. You should format your PM2 as proper package, so use can find its import and export function to requestor's namespace from there.
(Or you could sneak your import function into proper package by fully qualifying its name, but don't do that.)
You're just asking for trouble arranging your code like this. Give each module a package name (namespace), then fully qualify calls to its functions, e.g. PM2::sub1() to call sub1 in package PM2. You are already naming the functions with the package name on them (pm2sub1); it is two extra characters (::) to do it the right way and then you don't need to bother with Exporter either.

Perl's fields.pm warning: Name "module::FIELDS" used only once

While using the Net::OpenID::Consumer module, I get a few warnings from the fields pragma.
Name "module::FIELDS" used only once
investigating a bit, I found that this pragma is traversing the inheritance tree recursively, and looking for FIELDS. however, if that module is using Exporter, for example, and fields happen to look on Exporter only once, it trigger this warning.
Also, out of four warnings, three are actually base classes of other classes, (e.g. Exporter, Tie::Hash) but the fourth is 'Cache::RemovalStrategy::LRU', that for some reason includes:
use fields qw();
Apparently, it triggers fields to investigate the module, but not to create the FIELDS hash
So, how do I get rid of these warnings?
Edit: using Perl 5.10.0 on MacOSX
Edit: Fixed module name Net::OpenID => Net::OpenID::Consumer
There is no Net::OpenID module (though there are a number of modules under that namespace).
Please show the code that you are running?