Enable global warnings - perl

I have to optimize an intranet written in Perl (about 3000 files). The first thing I want to do is enable warnings "-w" or "use warnings;" so I can get rid of all those errors, then try to implement "use strict;".
Is there a way of telling Perl to use warnings all the time (like the settings in php.ini for PHP), without having to modify each script to add "-w" to its first line?
I even thought of aliasing /usr/bin/perl, or moving it to another name and putting a simple wrapper script in its place just to add the -w flag (like a proxy).
How would you debug it?

Well…
You could set the PERL5OPT environment variable to hold -w. See the perlrun manpage for details. I hope you'll consider tainting, too, like -T or maybe -t, for security tracking.
But I don’t envy you. Retrofitting code developed without the benefit of use warnings and use strict is usually a royal PITA.
I have something of a standard boiler-plate I use to start new Perl programs. But I haven’t given any thought to one for CGI programs, which would likely benefit from some tweaks against that boiler-plate.

Retrofitting warnings and strict is hard. I don't recommend a Big Bang approach, setting warnings (let alone strictures) on everything. You will be inundated with warnings to the point of uselessness.
Start by enabling warnings on the modules used by the scripts (there are some, aren't there?), rather than applying warnings to everything. Get the core clean, then get to work on the periphery, one unit at a time. In fact, I'd recommend a simple Perl script that finds the first line that does not start with a hash and inserts use warnings; above it (and maybe use strict; too, since you're going to be dealing with one script at a time anyway), so you can do the renovations script by script.
In other words, you will probably be best off actually editing each file as you're about to renovate it.
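A minimal sketch of such a helper (the script name, the .bak backup suffix, and the choice to insert both pragmas at once are illustrative assumptions):

#!/usr/bin/perl
# add-strictures: insert "use warnings;" and "use strict;" before the
# first line that does not start with a hash, in each file named on
# the command line, editing in place and keeping a .bak copy.
use strict;
use warnings;

$^I = '.bak';               # enable in-place editing with backups
my $inserted = 0;
while (<>) {
    if (!$inserted && !/^#/) {
        print "use warnings;\nuse strict;\n";
        $inserted = 1;
    }
    print;
    $inserted = 0 if eof;   # reset for the next file in @ARGV
}

Run it as perl add-strictures script1.pl and renovate one script at a time.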
I'd only use the blanket option to make a quick assessment of the scope of the problem: is it a complete and utter disaster, or merely a few peccadilloes in a few files? Sadly, if the code was developed without warnings and strict, it is more likely to be 'disaster' than 'minimal'.
You may find that your predecessors were prone to copy-and-paste, and that some erroneous idioms crop up repeatedly in the copied code. Write a Perl script that fixes each one. I have a bunch of fix* scripts in my personal bin directory that deal with various changes - either fixing issues created by recalcitrant (or, more usually, simply long-departed) colleagues, or accommodating my own changing standards.

You can set warnings and strictures for all Perl scripts by adding -Mwarnings -Mstrict to your PERL5OPT environment variable. See perlrun for details.
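For example (a sketch assuming a Bourne-style shell; the script name is hypothetical):

export PERL5OPT='-Mwarnings -Mstrict'
perl legacy_script.pl    # compiles as if it began with: use strict; use warnings;

Bear in mind this affects every perl invocation in that environment - cron jobs and build tools included - so it is better suited to an assessment pass than to production.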

Related

Executing system commands safely while coding in Perl

Should one really use external commands while coding in Perl? I see several disadvantages: it isn't system-independent, and there may be security risks as well. What do you think? If there is no way around it and you have to use shell commands from Perl, then what is the safest way to execute that particular command (like checking the pid, uid, etc.)?
It depends on how hard it is going to be to replicate the functionality in Perl. If I needed to run the m4 macro processor on something, I'd not think of trying to replicate that functionality in Perl myself, and since there's no module on http://search.cpan.org/ that looks suitable, it would appear others agree with me. In that case, then, using the external program is sensible. On the other hand, if I needed to read the contents of a directory, then the combination of readdir() et al plus stat() or lstat() inside Perl is more sensible than futzing with the output of ls.
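For instance, a minimal sketch of the directory case, staying entirely inside Perl (the directory and output format are arbitrary choices):

use strict;
use warnings;

# List regular files in a directory with their sizes - no `ls` to parse.
my $dir = '.';
opendir my $dh, $dir or die "can't open $dir: $!";
for my $name (sort readdir $dh) {
    my $path = "$dir/$name";
    next unless -f $path;                        # plain files only
    printf "%10d  %s\n", (stat $path)[7], $name;
}
closedir $dh;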
If you need to execute commands, think very carefully about how you invoke them. In particular, you probably want to avoid having the shell interpret the arguments, so use the list form of system (see also exec) rather than a single string for the command plus arguments (which means the shell is used to process the command line).
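A minimal sketch of the difference (the hostile filename is a hypothetical value, chosen to show the risk):

use strict;
use warnings;

my $filename = 'notes.txt; rm -rf /';    # attacker-supplied input

# Unsafe: the single-string form hands the whole line to /bin/sh,
# so the metacharacters in $filename become shell syntax:
#   system("grep foo $filename");

# Safe: the list form runs the program directly, bypassing the shell.
system('grep', 'foo', $filename) == 0
    or warn "grep exited with status $?\n";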
Executing external commands can be expensive simply because it involves forking a new process and watching its output if you need it.
Probably more importantly, should the external process fail for any reason, it may be difficult for your script to work out what happened. Worse still, surprisingly often an external process can get stuck forever, and then so will your script. You can use special tricks like opening a pipe and watching for output in a loop, but this is itself error-prone.
Perl is very capable of doing many things. So, if you stick to using only Perl-native constructs and modules to accomplish your tasks, not only will it be faster because you never fork, it will also be more reliable and easier to catch errors by looking at the native Perl objects and structures returned by library routines. And of course, it will be automatically portable to different platforms.
If your script runs with elevated permissions (like root or under sudo), you should be very careful about which external programs you execute. One simple way to ensure basic security is to always specify commands by full name, like /usr/bin/grep (but still think twice, and just do the grep in Perl itself!). However, even this may not be enough if an attacker is using the LD_PRELOAD mechanism to inject rogue shared libraries.
If you are willing to go very secure, it is suggested that you enable taint checking with the -T flag, like this:
#!/usr/bin/perl -T
The taint flag will also be enabled automatically by Perl if your script is detected to have different real and effective user or group IDs.
Taint mode will severely limit your ability to do many things (like calling system()) without Perl complaining - see more at http://perldoc.perl.org/perlsec.html#Taint-mode - but it will give you much higher confidence in your security.
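A minimal sketch of working under -T (the allowed character class and the /bin/echo command are illustrative assumptions, not recommendations):

#!/usr/bin/perl -T
use strict;
use warnings;

$ENV{PATH} = '/bin:/usr/bin';                # taint mode insists on a clean PATH
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};    # per perlsec

# Command-line input is tainted; a regex capture extracts the part
# you have decided to trust.
my ($word) = (defined $ARGV[0] ? $ARGV[0] : '') =~ /^(\w+)$/
    or die "unsafe or missing argument\n";

system('/bin/echo', $word) == 0 or die "echo failed: $?";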
Should one really use external commands while coding in Perl?
There's no single answer to this question. It all depends on what you are doing within the wide range of potential uses of Perl.
Are you using Perl as a glorified shell script on your local machine, or just trying to find a quick-and-dirty solution to your problem? In that case, it makes a lot of sense to run system commands if that is the easiest way to accomplish your task. Security and speed are not that important; what matters is the ability to code quickly.
On the other hand, are you writing a production program? In that case, you want secure, portable, efficient code. It is often preferable to write the functionality in Perl (or use a module), rather than calling an external program. At least, you should think hard about the benefits and drawbacks.

Cleanup huge Perl Codebase

I am currently working on a roughly 15-year-old web application.
It contains mainly CGI Perl scripts with HTML::Template templates.
It has over 12,000 files and roughly 260 MB of code in total. I estimate that no more than 1,500 Perl scripts are needed, and I want to get rid of all the unused code.
There are practically no tests written for the code.
My questions are:
Are you aware of any CPAN module that can help me get a list of only used and required modules?
What would be your approach if you'd want to get rid of all the extra code?
I was thinking at the following approaches:
try to override the use and require perl builtins with ones that output the loaded file name in a specific location
override the warnings and/or strict modules import function and output the file name in the specific location
study the Devel::Cover perl module and take the same approach and analyze the code when doing manual testing instead of automated tests
replace the perl executable with a custom one, which will log each name of file it reads (I don't know how to do that yet)
some creative use of lsof (?!?)
Devel::Modlist may give you what you need, but I have never used it.
The few times I have needed to do something like this, I have opted for the more brute-force approach of inspecting %INC at the end of the program.
END {
    open my $log_fh, ...;
    print $log_fh "$_\n" for sort keys %INC;
}
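A related trick, in the spirit of the asker's first idea but without overriding any builtins: entries in @INC may be code references, so a hook can log every module load attempt and then decline, letting the normal search proceed (the log path is a hypothetical choice):

BEGIN {
    unshift @INC, sub {
        my (undef, $module_file) = @_;
        if (open my $fh, '>>', '/tmp/loaded-modules.log') {
            print {$fh} "$0\t$module_file\n";
            close $fh;
        }
        return;    # undef: decline, so the real @INC entries load it
    };
}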
As a first approximation, I would simply run
egrep -r '\<(use|require)\>' /path/to/source/*
Then spend a couple of days cleaning up the output from that. That will give you a list of all of the modules used or required.
You might also be able to play around with @INC to exclude certain library paths.
If you're trying to determine the execution path, you might be able to run the code through the debugger with trace turned on (i.e. 't' in the debugger), then redirect the output to a text file for further analysis. I know that this is difficult when running CGI...
Assuming the relevant timestamps are turned on, you could check access times on the various script files - that should rule out any top-level script files that aren't being used.
It might be worth adding some instrumentation to CGI.pm to log the current script name ($0) to see what's happening.
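A hypothetical version of that instrumentation - a few lines to drop into CGI.pm or, better, into a module that every script already loads (the log path is an arbitrary choice):

BEGIN {
    if (open my $fh, '>>', '/tmp/cgi-script-usage.log') {
        print {$fh} scalar(localtime), "\t$0\n";
        close $fh;
    }
}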

Should I use common::sense or just stick with `use strict` and `use warnings`?

I recently installed a module from CPAN and noticed one of its dependencies was common::sense, a module that offers to enable all the warnings you want, and none that you don't. From the module's synopsis:
use common::sense;
# supposed to be the same, with much lower memory usage, as:
#
# use strict qw(vars subs);
# use feature qw(say state switch);
# no warnings;
# use warnings qw(FATAL closed threads internal debugging pack substr malloc
# unopened portable prototype inplace io pipe unpack regexp
# deprecated exiting glob digit printf utf8 layer
# reserved parenthesis taint closure semicolon);
# no warnings qw(exec newline);
Save for undef warnings sometimes being a hassle, I've usually found the standard warnings to be good. Is it worth switching to common::sense instead of my normal use strict; use warnings;?
While I like the idea of reducing boiler-plate code, I am deeply suspicious of tools like Modern::Perl and common::sense.
The problem I have with modules like this is that they bundle up a group of behaviors and hide them behind glib names with changeable meanings.
For example, Modern::Perl today consists of enabling some Perl 5.10 features and using strict and warnings. But what happens when Perl 5.12 or 5.14 or 5.24 comes out with great new goodies, and the community discovers that we need to use the frobnitz pragma everywhere? Will Modern::Perl provide a consistent set of behaviors, or will it remain "Modern"? If MP keeps up with the times, it will break existing systems that don't keep in lock-step with its compiler requirements, and it adds extra compatibility testing to every upgrade. At least that's my reaction to MP. I'll be the first to admit that chromatic is about 10 times smarter than me and a better programmer as well - but I still disagree with his judgment on this issue.
common::sense has a name problem, too. Whose idea of common sense is involved? Will it change over time?
My preference would be for a module that makes it easy for me to create my own set of standard modules, and even create groups of related modules/pragmas for specific tasks (like date time manipulation, database interaction, html parsing, etc).
I like the idea of Toolkit, but it sucks for several reasons: it uses source filters, and the macro system is overly complex and fragile. I have the utmost respect for Damian Conway, and he produces brilliant code, but sometimes he goes a bit too far (at least for production use, experimentation is good).
I haven't lost enough time typing use strict; use warnings; to feel the need to create my own standard import module. If I felt a strong need for automatically loading a set of modules/pragmas, something similar to Toolkit that allows one to create standard feature groups would be ideal:
use My::Tools qw( standard datetime SQLite );
or
use My::Tools;
use My::Tools::DateTime;
use My::Tools::SQLite;
Toolkit comes very close to my ideal. Its fatal defects are a bummer.
As for whether the choice of pragmas makes sense, that's a matter of taste. I'd rather use the occasional no strict 'foo' or no warnings 'bar' in a block where I need the ability to do something that requires it, than disable the checks over my entire file. Plus, IMO, memory consumption is a red herring. YMMV.
update
It seems that there are many (how many?) different modules of this type floating around CPAN.
There is latest, which is no longer the latest. That demonstrates part of the naming problem.
There is also uni::perl, which adds enabling unicode to the mix.
ToolSet offers a subset of Toolkit's abilities, but without source filters.
I'll include Moose here, since it automatically adds strict and warnings to the calling package.
And finally Acme::Very::Modern::Perl
The proliferation of these modules, and the potential for overlapping requirements, adds another issue.
What happens if you write code like:
use Moose;
use common::sense;
What pragmas are enabled with what options?
I would say stick with warnings and strict for two main reasons.
If other people are going to use or work with your code, they are (almost certainly) used to warnings and strict and their rules. Those represent a community norm that you and other people you work with can count on.
Even if this or that specific piece of code is just for you, you probably don't want to worry about remembering "Is this the project where I adhere to warnings and strict or the one where I hew to common::sense?" Moving back and forth between the two modes will just confuse you.
There is one bit nobody else seems to have picked up on, and that's FATAL in the warnings list.
So as of 2.0, use common::sense is more akin to:
use strict;
use warnings FATAL => 'all'; # but with the specific list of fatals instead of 'all' that is
This is a somewhat important and frequently overlooked feature of warnings that ramps the strictness a whole degree higher. Instead of undef string interpolation or infinite recursion merely warning you and then carrying on despite the problem, it actually halts.
To me this is helpful, because in many cases undef string interpolation leads to further, more dangerous errors that may go silently unnoticed, and failing and bailing out early is a good thing.
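A minimal sketch of the difference, fatalizing a single category rather than common::sense's full list:

use strict;
use warnings FATAL => 'uninitialized';

my $name;
print "Hello, $name\n";    # dies here instead of merely warning
print "never reached\n";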
I obviously have no common sense, because I'm going more for Modern::Perl ;-)
The "lower memory usage" only works if you use no modules that load strict, feature, warnings, etc., and the "much" part is... not all that much.
Not everyone's idea of common sense is the same - in that respect it's anything but common.
Go with what you know. If you get undef warnings, chances are that your program or its input is incorrect.
Warnings are there for a reason. Anything that reduces them cannot be useful. (I always compile with gcc -Wall too...)
I have never had a warning that wasn't something dodgy or just plain wrong in my code. For me, it's always something technically allowed that I almost certainly don't want to do. I think the full suite of warnings is invaluable. If you find use strict + use warnings adequate for now, I don't see why you'd want to change to a non-standard module that then becomes a dependency for every piece of code you write from here on out...
When it comes to warnings, I support the use of any module or built-in language feature that gives you the level of warnings that helps you make your code as solid and reliable as it can possibly be. An ignored warning is not helpful to anyone.
But if you're cozy with the standard warnings, stick with them. Coding to a stricter standard is great if you're used to it! I wouldn't recommend switching just for the memory savings, though. Only switch if the module helps you turn your code around quicker and with more confidence.
Many people argue in the comments that if Modern::Perl changes, it will break your code. While this can be a real threat, there are already MANY things that change over time and break code (sometimes after a deprecation cycle, sometimes not...).
Other modules have changed their APIs, breaking things, and nobody worries much about them. E.g. Moose has at least two features that are deprecated now and will probably be forbidden in some future release.
Another example: years ago you were allowed to write
for $i qw(some words)
but now it is deprecated. And there are many others... And this is CORE language syntax.
Everybody survived. So I don't really understand why so many people argue against helper modules. When they change, there will (probably) be some sort of deprecation cycle... So, my view is:
if you write programs for yourself, use any module you want ;)
if you write a program for someone else, where others are going to maintain it, use a minimum of nonstandard "pragma-like" modules (common::sense, Modern::Perl, uni::perl, etc...)
in Stack Overflow questions, you can safely use common::sense or Modern::Perl etc. - most of the users who will answer your questions know them. Everybody understands that a single use 5.010; - ten characters to pull in the 5.10 features - is easier to type than three separate lines...

Is it okay to use modules from within subroutines?

Recently I started playing with OO Perl and I've been creating quite a bunch of new objects for a new project that I'm working on. I'm unfamiliar with any best practices regarding OO Perl, and we're kind of in a tight rush to get it done :P
I'm putting a lot of this kind of code into each of my functions:
sub funcx {
    use ObjectX;   # I don't declare this at the top of the .pm file,
                   # but inside the function itself
    my $obj = new ObjectX;
}
I was wondering if this will cause any negative impact versus putting the use ObjectX line at the top of the Perl module, outside of any function scope.
I was doing this so that I feel it's cleaner in case I need to shift the function around.
And the other thing I have noticed is that when I run a test.pl script that tests my objects directly on the unix server, it is slow as heck. But when the same code is run through CGI connected to an Apache server, the web page doesn't load nearly as slowly.
Where to put use?
use occurs at compile time, so it doesn't matter where you put it - at least from a purely pragmatic, 'will it work' point of view. Because it happens at compile time, use will always be executed, even if you put it in a conditional. Never do this: if( $foo eq 'foo' ) { use SomeModule }
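A sketch of the pitfall and of the runtime alternative (Heavy::Module is a hypothetical name):

use strict;
use warnings;

my $need_it = 0;    # imagine this is decided at run time

# A conditional `use Heavy::Module` would be loaded regardless of the
# condition above. To really defer loading, use require plus an
# explicit import at run time:
if ($need_it) {
    require Heavy::Module;
    Heavy::Module->import;
}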
In my experience, it is best to put all your use statements at the top of the file. It makes it easy to see what is being loaded and what your dependencies are.
Update:
As brian d foy points out, things compiled before the use statement will not be affected by it, so the location can matter. For a typical module, location does not matter; however, if a module does things that affect compilation (for example, it imports functions that have prototypes), the location could matter.
Also, Chas Owens points out that it can affect compilation. Modules that are designed to alter compilation are called pragmas. Pragmas are, by convention, given names in all lower-case. These effects apply only within the scope where the module is used. Chas uses the integer pragma as an example in his answer. You can also disable a pragma or module over a limited scope with the keyword no.
use strict;
use warnings;

my $foo;
print $foo;    # Generates a warning

{
    no warnings 'uninitialized';   # turn off warnings for uninitialized values
    print $foo;                    # No warning here
}

print $foo;    # Generates a warning
Indirect object syntax
In your example code you have my $obj = new ObjectX;. This is called indirect object syntax, and it is best avoided as it can lead to obscure bugs. It is better to use this form:
my $obj = ObjectX->new;
Why is your test script slow on the server?
There is no way to tell with the info you have provided.
But the easy way to find out is to profile your code and see where the time is being consumed. Devel::NYTProf is a popular profiling tool you may want to check out.
Best practices
Check out Perl Best Practices, and the quick reference card. This page has a nice run down of Damian Conway's OOP advice from PBP.
Also, you may wish to consider using Moose. If the long script startup time is acceptable in your usage, then Moose is a huge win.
question 1
It depends on what the module does. If it has lexical effects, then it will only affect the scope it is used in:
my $x;
{
    use integer;
    $x = 5/2;    # $x is now 2
}
my $y = 5/2;     # $y is now 2.5
If it is a normal module then it makes no difference where you use it, but it is common to use all of those modules at the top of the program.
question 2
Things that can affect the speed of a program between machines
speed of the processor
version of modules installed (some modules have XS versions that are much faster)
version of Perl
number of entries in PERL5LIB
speed of the drive
daotoad and Chas. Owens already answered the part of your question pertaining to the position of use statements. Let me remark on something else here:
I was doing this so that I feel it's cleaner in case I need to shift the function around.
Personally, I find it much cleaner to have all the used modules in one place at the top of the file. You won't have to search for use statements to see what other modules are being used and a quick glance will tell you what is being used and even what is not being used.
Regarding your performance problem: with Apache and mod_perl the Perl interpreter will have to parse and compile your used modules only once. The next time the script is run, execution should be much faster. On the command line, however, a second run doesn't get this benefit.

What's the modern way of declaring which version of Perl to use?

When it come to saying what version of Perl we need for our scripts, we've got options, oh, brother, we've got options:
use 5.010;
use 5.010_001;
use 5.10.0;
use v5.10;
use v5.10.0;
All seem to work. perlcritic complains about all but the first two. (It's unfortunate that the v strings seem to have such flaws, since Perl 6 expects you to do use v6; for your Perl 6 scripts...)
So, what should we be doing to indicate that we want to use a particular version of perl?
There are really only two options: decimal numbers and v-strings. Which form to use depends in part on which versions of Perl you want to "support" with a meaningful error message instead of a syntax error. (The v-string syntax was added in Perl 5.6.)

The accepted best practice - which is what perlcritic enforces - is to use decimal notation. You should specify the minimum version of Perl that's required for your script to behave properly. Normally that means declaring a dependency on language features added in a major release, such as using the say function added in 5.10.

You should include the patch level if it's important for your script to behave properly. For example, some of my code specifies use 5.008001 because it depends on the fix for a bug that 5.8.0 had which was fixed in 5.8.1.
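A short illustration of the decimal mapping - each version component after the first becomes three digits:

use 5.008001;    # at least perl 5.8.1  (8 -> 008, 1 -> 001)
use 5.010000;    # at least perl 5.10.0 (commonly abbreviated to 5.010)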
I just use something like 5.010_001. I've grown weary of dealing with version-string problems for something that should be mind-numbingly simple.
Since I mostly deal with build systems, I have the constant struggle of Module::Build's internal version.pm which is out of sync with the version.pm on CPAN. I think that's mostly better now, but I have better things to think about.
The best practice should always be to do the thing that commands the least of your attention, and certainly not take more attention than the value it gives back. In my opinion, v-strings and dotted decimals were a huge distraction with no additional benefit, wasting a lot of valuable programmer time just to get back to the starting point.
I should also note that Perl::Critic has often pushed questionable practices for the higher purpose of reducing the number of ways that people do things. However, those practices often cause problems that make them un-best. This is one of those cases. A more realistic best practice is not to make Perl::Critic compliance your goal. Use it where it is useful, but in cases like this, don't waste mental time on it.
The "modern" way is to use the forms starting with v. However, that may not necessarily be what you really want to do.
Critic complains because older versions of Perl won't understand and play nicely with the forms that start with v. However, if your version of Perl supports it, v is nicer to read because you can say:
use v5.10.1;
... rather than ...
use 5.010_001;
So, in the documentation for use, the following workaround is offered:
use 5.006;
use v5.6.1;
NB: I think the documentation is in error here, as the v is omitted from the example at perldoc use.
Since the versions of Perl that don't support the v syntax will fail at the first use, they won't get to the second more specific and readable one.