How can I hook into Perl's print?

Here's a scenario. You have a large number of legacy scripts, all using a common library. Said scripts use the 'print' statement for diagnostic output. No changes are allowed to the scripts - they range far and wide, have their approvals, and have long since left the fruitful valleys of oversight and control.
Now a new need has arrived: logging must be added to the library. This must be done automatically and transparently, without users of the standard library needing to change their scripts. Common library methods can simply have logging calls added to them; that's the easy part. The hard part lies in the fact that diagnostic output from these scripts was always displayed using the 'print' statement. This diagnostic output must be stored, but just as importantly, processed.
As an example of this processing, the library should only record the printed lines that contain the words 'warning', 'error', 'notice', or 'attention'. The below Extremely Trivial and Contrived Example Code (tm) would record some of said output:
sub CheckPrintOutput
{
    my @output = @_; # args passed to print eventually find their way here.
    foreach my $value (@output) {
        Log->log($value) if $value =~ /warning|error|notice|attention/i;
    }
}
(I'd like to avoid such issues as 'what should actually be logged', 'print shouldn't be used for diagnostics', 'perl sucks', or 'this example has the flaws x, y, and z'... this is greatly simplified for brevity and clarity.)
The basic problem comes down to capturing and processing data passed to print (or any Perl builtin, for that matter). Is it possible? Is there any way to do it cleanly? Are there any logging modules that have hooks to let you do it? Or is it something that should be avoided like the plague, and I should give up on ever capturing and processing the printed output?
Additional: This must run cross-platform - windows and *nix alike. The process of running the scripts must remain the same, as must the output from the script.
Additional additional: An interesting suggestion made in the comments of codelogic's answer:
You can subclass http://perldoc.perl.org/IO/Handle.html and create your
own file handle which will do the logging work. – Kamil Kisiel
This might do it, with two caveats:
1) I'd need a way to export this functionality to anyone who uses the common library. It would have to apply automatically to STDOUT and probably STDERR too.
2) the IO::Handle documentation says that you can't subclass it, and my attempts so far have been fruitless. Is there anything special needed to make subclassing IO::Handle work? The standard use base 'IO::Handle' and then overriding the new/print methods seems to do nothing.
Final edit: Looks like IO::Handle is a dead end, but Tie::Handle may do it. Thanks for all the suggestions; they're all really good. I'm going to give the Tie::Handle route a try. If it causes problems I'll be back!
Addendum: Note that after working with this a bit, I found that Tie::Handle will work, if you don't do anything tricky. If you use any of the features of IO::Handle with your tied STDOUT or STDERR, it's basically a crapshoot to get them working reliably - I could not find a way to get the autoflush method of IO::Handle to work on my tied handle. If I enabled autoflush before I tied the handle it would work. If that works for you, the Tie::Handle route may be acceptable.
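For reference, the ordering that worked looks roughly like this (My::TieHandle stands in for whatever Tie::Handle subclass you end up writing):
use IO::Handle;
# Enable autoflush on the real STDOUT *before* tying it; calling
# autoflush through the tied handle did not work reliably for me.
STDOUT->autoflush(1);
tie *STDOUT, 'My::TieHandle';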

There are a number of built-ins that you can override (see perlsub). However, print is one of the built-ins that doesn't work this way. The difficulties of overriding print are detailed in this PerlMonks thread.
However, you can
Create a package
Tie a handle
Select this handle.
Now, a couple of people have given the basic framework, but it works out kind of like this:
package IO::Override;
use base qw<Tie::Handle>;
use Symbol qw<geniosym>;
sub TIEHANDLE { return bless geniosym, __PACKAGE__ }
sub PRINT {
    shift;
    # You can do pretty much anything you want here.
    # And it's printing to what was STDOUT at the start.
    #
    print $OLD_STDOUT join( '', 'NOTICE: ', @_ );
}
tie *PRINTOUT, 'IO::Override';
our $OLD_STDOUT = select( *PRINTOUT );
You can override printf in the same manner:
sub PRINTF {
    shift;
    # You can do pretty much anything you want here.
    # And it's printing to what was STDOUT at the start.
    #
    my $format = shift;
    print $OLD_STDOUT join( '', 'NOTICE: ', sprintf( $format, @_ ) );
}
See Tie::Handle for what all you can override of STDOUT's behavior.
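To tie this back to the original question's filtering requirement, the PRINT body might look something like this (just a sketch; Log->log is the hypothetical logger from the question):
sub PRINT {
    shift;
    my $text = join '', @_;
    # Record only the interesting lines, but pass everything
    # through so the script's visible output is unchanged.
    Log->log($text) if $text =~ /warning|error|notice|attention/i;
    print $OLD_STDOUT $text;
}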

You can use Perl's select to redirect STDOUT.
open my $fh, '>', 'log.txt' or die "Can't open log.txt: $!";
print "test1\n";
my $current_fh = select $fh;
print "test2\n";
select $current_fh;
print "test3\n";
The file handle could be anything, even a pipe to another process that post processes your log messages.
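For instance, a sketch of the pipe variant (the post-processor name is made up):
# Send everything print()ed to an external post-processor.
open my $pipe, '|-', 'my-log-filter' or die "Can't start filter: $!";
my $old_fh = select $pipe;
print "this line goes to my-log-filter\n";
select $old_fh; # restore the previous default handle
close $pipe;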
PerlIO::tee in the PerlIO::Util module seems to allow you to 'tee' the output of a file handle to multiple destinations (e.g. a log processor and STDOUT).
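An untested sketch based on the PerlIO::Util synopsis (push_layer and pop_layer are the methods that module adds to filehandles):
use PerlIO::Util;
# Copy everything written to STDOUT into a log file as well.
*STDOUT->push_layer(tee => '>>output.log');
print "appears on the terminal and in output.log\n";
*STDOUT->pop_layer; # stop teeing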

Lots of choices. Use select() to change the filehandle that print defaults to. Or tie STDOUT. Or reopen it. Or apply an IO layer to it.
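Of those, "reopen it" is the only one not demonstrated elsewhere on this page, so here is a minimal sketch (the file name is illustrative):
# Redirect STDOUT to a file for the rest of the program.
open my $log, '>>', 'run.log' or die "Can't open run.log: $!";
open STDOUT, '>&', $log or die "Can't redirect STDOUT: $!";
print "this now lands in run.log\n";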

This isn't the answer to your issue but you should be able to adopt the logic for your own use. If not, maybe someone else will find it useful.
Catching malformed headers before they happen...
package PsychicSTDOUT;
use strict;
my $c = 0;
my $malformed_header = 0;
open(TRUE_STDOUT, '>', '/dev/stdout');
tie *STDOUT, __PACKAGE__, (*STDOUT);
sub TIEHANDLE {
    my $class = shift;
    my $handles = [@_];
    bless $handles, $class;
    return $handles;
}

sub PRINT {
    my $class = shift;
    if (!$c++ && $_[0] !~ /^content-type/i) {
        my (undef, $file, $line) = caller;
        print STDERR "Missing content-type in $file at line $line!!\n";
        $malformed_header = 1;
    }
    return 0 if ($malformed_header);
    return print TRUE_STDOUT @_;
}
1;
usage:
use PsychicSTDOUT;
print "content-type: text/html\n\n"; #try commenting out this line
print "<html>\n";
print "</html>\n";

You could run the script from a wrapper script that captures the original script's stdout and writes the output somewhere sensible.
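A rough sketch of that approach (Log->log is the question's hypothetical logger, and list-form pipe opens may need testing on Windows):
#!/usr/bin/perl
# wrapper.pl - run a legacy script unchanged, passing its output
# through while logging the interesting lines.
use strict;
use warnings;

my $script = shift or die "usage: wrapper.pl script [args]\n";
open my $out, '-|', $^X, $script, @ARGV
    or die "Can't run $script: $!";
while (my $line = <$out>) {
    print $line; # the visible output stays the same
    Log->log($line) if $line =~ /warning|error|notice|attention/i;
}
close $out;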

Related

Capture NOTICE from PostgreSQL in Perl DBI

I'm using full-text search with user-generated input on a PostgreSQL table; the query is "the a" in this example.
my $dbh = DBI->connect("...", { RaiseError => 0, AutoCommit => 1 });
my $sth = $dbh->prepare("SELECT name from mytable WHERE tsv @@ plainto_tsquery('the a')");
my $rv = $sth->execute();
If the user input only contains stop words, I get a NOTICE on STDERR and the query returns no results, but produces no error.
I would like to capture that NOTICE in Perl to possibly alert the user to rephrase the search, but I can't seem to access it.
Setting RaiseError to 1 doesn't change anything and $dbh->pg_notifies returns undef.
Any ideas?
I presume that the mention of "NOTICE" means that something does a RAISE. How that behaves is configurable but delving into that would require far more detail.
At perl level, there are various ways to get to the STDERR stream and capture what's sent to it.
If these are warn-ings, then setting a hook for $SIG{__WARN__} runs that subroutine whenever a warning is about to be printed:
{
    use feature 'say';
    local $SIG{__WARN__} = sub { say "Got warning: $_[0]"; };
    warn "a warning";
}
So you can capture it this way, do what you wish with it, and then perhaps re-emit it. The $_[0] in the example holds the string a warning, and after the anonymous sub runs, control returns to the point right after the statement that issued the warning (the warn statement in this snippet). See %SIG.
I put this in a block only to be able to local-ize the change to $SIG{__WARN__}, which is practically compulsory (if this global isn't local-ized, the change affects all code). If this code is already in a suitable lexical scope, then the extra block isn't needed.
But this won't catch things printed directly to STDERR. For that, the most practical way is to use a library, and a simple and convenient one for this purpose is Capture::Tiny:
use feature 'say';
use Capture::Tiny qw(capture);

my ($stdout, $stderr, $exit) = capture {
    say "to STDOUT";
    say STDERR "to STDERR";
    warn "a warn-ing";
    # ...
};
Now $stdout contains "to STDOUT" while $stderr contains "to STDERR" followed by "a warn-ing". So your database code would go in such a block, and the NOTICE should wind up in that $stderr variable. There is also a capture_stderr function if you prefer to capture only that stream this way.
That doesn't seem to work as it should, and it is recognized as an unsolved bug.

Perl - New definition of myprint() or Overload print command

I am a newb to Perl. I am writing some scripts and want to define my own print called myprint() which will print the stuff passed to it based on some flags (verbose/debug flag)
open(FD, "> /tmp/abc.txt") or die "Cannot create abc.txt file";
print FD "---Production Data---\n";
myprint "Hello - This is only a comment - debug data";
Can someone please help me with some sample code for the myprint() function?
Do you care more about writing your own logging system, or do you want to know how to put logging statements in appropriate parts of your program which you can turn off (and, incur little performance penalty when they are turned off)?
If you want a logging system that is easy to start using, but also offers a world of features which you can incrementally discover and use, Log::Log4perl is a good option. It has an easy mode, which allows you to specify the desired logging level, and emits only those logging messages that are above the desired level.
#!/usr/bin/env perl
use strict; use warnings;
use File::Temp qw(tempfile);
use Log::Log4perl qw(:easy);
Log::Log4perl->easy_init({level => $INFO});
my ($fh, $filename) = tempfile;
print $fh "---Production Data---\n";
WARN 'Wrote something somewhere somehow';
The snippet also shows a better way of opening a temporary file using File::Temp.
As for overriding the built-in print … It really isn't a good idea to fiddle with built-ins except in very specific circumstances. perldoc perlsub has a section on Overriding Built-in Functions. The accepted answer to this question lists the Perl built-ins that cannot be overridden. print is one of those.
But, then, one really does not need to override a built-in to write a logging system.
So, if an already-written logging system does not do it for you, you really seem to be asking "how do I write a function that prints stuff conditionally depending on the value of a flag?"
Here is one way:
#!/usr/bin/env perl
package My::Logger;
{
    use strict; use warnings;
    use Sub::Exporter -setup => {
        exports => [
            DEBUG => sub {
                return sub {} unless $ENV{MYDEBUG};
                return sub { print 'DEBUG: ' => @_ };
            },
        ]
    };
}
package main;
use strict; use warnings;
# You'd replace this with use My::Logger qw(DEBUG) if you put My::Logger
# in My/Logger.pm somewhere in your @INC
BEGIN {
    My::Logger->import('DEBUG');
}

sub nicefunc {
    print "Hello World!\n";
    DEBUG("Isn't this a nice function?\n");
    return;
}
nicefunc();
Sample usage:
$ ./yy.pl
Hello World!
$ MYDEBUG=1 ./yy.pl
Hello World!
DEBUG: Isn't this a nice function?
I wasn't going to answer this because Sinan already has the answer I'd recommend, but tonight I also happened to be working on the "Filehandle References" chapter of the upcoming Intermediate Perl. Here are a couple of relevant paragraphs, which I'll just copy directly without adapting them to your question:
IO::Null and IO::Interactive
Sometimes we don't want to send our output anywhere, but we are forced
to send it somewhere. In that case, we can use IO::Null to create
a filehandle that simply discards anything that we give it. It looks
and acts just like a filehandle, but does nothing:
use IO::Null;
my $null_fh = IO::Null->new;
some_printing_thing( $null_fh, @args );
Other times, we want output in some cases but not in others. If we are
logged in and running our program in our terminal, we probably want to
see lots of output. However, if we schedule the job through cron, we
probably don't care so much about the output as long as it does the job.
The IO::Interactive module is smart enough to tell the difference:
use IO::Interactive qw(is_interactive);
print { is_interactive } 'Bamboo car frame';
The is_interactive subroutine returns a filehandle. Since the
call to the subroutine is not a simple scalar variable, we surround
it with braces to tell Perl that it's the filehandle.
Now that you know about "do nothing" filehandles, you can replace some
ugly code that everyone tends to write. In some cases you want output
and in some cases you don't, so many people use a post-expression
conditional to turn off a statement in some cases:
print STDOUT "Hey, the radio's not working!" if $Debug;
Instead of that, you can assign different values to $debug_fh based
on whatever condition you want, then leave off the ugly if $Debug
at the end of every print:
use IO::Null;
my $debug_fh = $Debug ? *STDOUT : IO::Null->new;
$debug_fh->print( "Hey, the radio's not working!" );
The magic behind IO::Null might give a warning about "print() on
unopened filehandle GLOB" with the indirect object notation (e.g.
print $debug_fh) even though it works just fine. We don't get that
warning with the direct form.
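For contrast, the two forms side by side (with $debug_fh as above; the strings are illustrative):
print $debug_fh "indirect object notation - may warn, but works\n";
$debug_fh->print( "direct method form - no warning\n" );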

Subroutines vs scripts in Perl

I'm fairly new to Perl and was wondering what the best practices regarding subroutines are with Perl. Can a subroutine be too big?
I'm working on a script right now, and it might need to call another script. Should I just integrate the old script into the new one in the form of a subroutine? I need to pass one argument to the script and need one return value.
I'm guessing I'd have to do some sort of black magic to get the output from the original script, so turning it into a subroutine makes sense, right?
Avoiding "black magic" is always a good idea when writing code. You never want to jump through hoops and come up with an unintuitive hack to solve a problem, especially if that code needs to be supported later. It happens, admittedly, and we're all guilty of it. Circumstances can weigh heavily on "just getting the darn thing to work."
The point is, the best practice is always to make the code clean and understandable. Remember, and this is especially true with Perl code in my experience, any code you wrote yourself more than a few months ago may as well have been written by someone else. So even if you're the only one who needs to support it, do yourself a favor and make it easy to read.
Don't cling to broad sweeping ideas like "favor more files over larger files" or "favor smaller methods/subroutines over larger ones" etc. Those are good guidelines to be sure, but apply the spirit of the guideline rather than the letter of it. Keep the code clean, understandable, and maintainable. If that means the occasional large file or large method/subroutine, so be it. As long as it makes sense.
A key design goal is separation of concerns. Ideally, each subroutine performs a single well-defined task. In this light, the main question revolves not around a subroutine's size but its focus. If your program requires multiple tasks, that implies multiple subroutines.
In more complex scenarios, you may end up with groups of subroutines that logically belong together. They can be organized into libraries or, even better, modules. If possible, you want to avoid a scenario where you end up with multiple scripts that need to communicate with each other, because the usual mechanism for one script to return data to another script is tedious: the first script writes to standard output and the second script must parse that output.
Several years ago I started work at a job requiring that I build a large number of command-line scripts (at least, that's how it turned out; in the beginning, it wasn't clear what we were building). I was quite inexperienced at the time and did not organize the code very well. In hindsight, I should have worked from the premise that I was writing modules rather than scripts. In other words, the real work would have been done by modules, and the scripts (the code executed by a user on the command line) would have remained very small front-ends to invoke the modules in various ways. This would have facilitated code reuse and all of that good stuff. Live and learn, right?
Another option that hasn't been mentioned yet for reusing the code in your scripts is to put common code in a module. If you put shared subroutines into a module or modules, you can keep your scripts short and focused on what they do that is special, while isolating the common code in an easy-to-access, reusable form.
For example, here is a module with a few subroutines. Put this in a file called MyModule.pm:
package MyModule;
# Always do this:
use strict;
use warnings;
use IO::Handle; # For OOP filehandle stuff.
use Exporter qw(import); # This lets us export subroutines to other scripts.
# These may be exported.
our @EXPORT_OK = qw( gather_data_from_fh open_data_file );

# Automatically export everything allowed.
# Generally best to leave empty, but in some cases it makes
# sense to export a small number of subroutines automatically.
our @EXPORT = @EXPORT_OK;

# Array of directories to search for files.
our @SEARCH_PATH;
# Parse the contents of an IO::Handle object and return structured data.
sub gather_data_from_fh {
    my $fh = shift;
    my %data;
    while ( my $line = $fh->readline ) {
        # Parse the line.
        chomp $line;
        my ($key, @values) = split ' ', $line;
        $data{$key} = \@values;
    }
    return \%data;
}
# Search a list of directories for a file with a matching name.
# Open it and return a handle if found.
# Die otherwise
sub open_data_file {
    my $file_name = shift;
    for my $path ( @SEARCH_PATH, '.' ) {
        my $file_path = "$path/$file_name";
        next unless -e $file_path;
        open my $fh, '<', $file_path
            or die "Error opening '$file_path' - $!\n";
        return $fh;
    }
    die "No matching file found in path\n";
}
1; # Need to have trailing TRUE value at end of module.
Now in script A, we take a filename to search for and process and then print formatted output:
use strict;
use warnings;
use MyModule;
# Configure which directories to search
@MyModule::SEARCH_PATH = qw( /foo/foo/rah /bar/bar/bar /eeenie/meenie/mynie/moe );

# Get the file name from the args.
my $name = shift;
my $fh = open_data_file($name);
my $data = gather_data_from_fh($fh);
for my $key ( sort keys %$data ) {
    print "$key -> ", join ', ', @{ $data->{$key} };
    print "\n";
}
Script B searches for a file, parses it, and then writes the parsed data structure to a YAML file.
use strict;
use warnings;
use MyModule;
use YAML qw( DumpFile );
# Configure which directories to search
@MyModule::SEARCH_PATH = qw( /da/da/da/dum /tutti/frutti/unruly /cheese/burger );

# Get the file names from the args.
my $infile = shift;
my $outfile = shift;
my $fh = open_data_file($infile);
my $data = gather_data_from_fh($fh);
DumpFile( $outfile, $data );
Some related documentation:
perlmod - About Perl modules in general
perlmodstyle - Perl module style guide; this has very useful info.
perlnewmod - Starting a new module
Exporter - The module used to export functions in the sample code
use - the perlfunc article on use.
Some of these docs assume you will be sharing your code on CPAN. If you won't be publishing to CPAN, simply ignore the parts about signing up and uploading code.
Even if you aren't writing for CPAN, it is beneficial to use the standard tools and CPAN file structure for your module development. Following the standard allows you to use all of the tools CPAN authors use to simplify the development, testing and installation process.
I know that all this seems really complicated, but the standard tools make each step easy. Even adding unit tests to your module distribution is easy thanks to the great tools available. The payoff is huge, and well worth the time you will invest.
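As a taste of how little it takes, a minimal test for the module above might look like this (saved as t/mymodule.t in a standard distribution layout; the test names are made up):
use strict;
use warnings;
use Test::More tests => 2;

# Compile the module and check that one of its subs arrived.
use_ok('MyModule');
ok( defined &MyModule::open_data_file, 'open_data_file is defined' );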
Sometimes it makes sense to have a separate script, sometimes it doesn't. The "black magic" isn't that complicated.
#!/usr/bin/perl
# square.pl
use strict;
use warnings;
my $input = shift;
print $input ** 2;
#!/usr/bin/perl
# sum_of_squares.pl
use strict;
use warnings;
my ($from, $to) = @ARGV;
my $sum;
for my $num ( $from .. $to ) {
    $sum += `square.pl $num` // die "square.pl failed: $? $!";
}
print $sum, "\n";
Easier and better error reporting on failure is automatic with IPC::System::Simple:
#!/usr/bin/perl
# sum_of_squares.pl
use strict;
use warnings;
use IPC::System::Simple 'capture';
my ($from, $to) = @ARGV;
my $sum;
for my $num ( $from .. $to ) {
    $sum += capture( "square.pl $num" );
}
print $sum, "\n";

How can I get around a 'die' call in a Perl library I can't modify?

Yes, the problem is with a library I'm using, and no, I cannot modify it. I need a workaround.
Basically, I'm dealing with a badly written Perl library, that exits with 'die' when a certain error condition is encountered reading a file. I call this routine from a program which is looping through thousands of files, a handful of which are bad. Bad files happen; I just want my routine to log an error and move on.
IF I COULD modify the library, I would simply change the
die "error";
to a
print "error";return;
, but I cannot. Is there any way I can couch the routine so that the bad files won't crash the entire process?
FOLLOWUP QUESTION: Using an "eval" to couch the crash-prone call works nicely, but how do I set up handling for catch-able errors within that framework? To describe:
I have a subroutine that calls the library-which-crashes-sometimes many times. Rather than couch each call within this subroutine with an eval{}, I just allow it to die, and use an eval{} on the level that calls my subroutine:
my $status = eval { function($param); };
unless ($status) { print $@; next; } # print error and go to next file if function() fails
However, there are error conditions that I can and do catch in function(). What is the most proper/elegant way to design the error-catching in the subroutine and the calling routine so that I get the correct behavior for both caught and uncaught errors?
You could wrap it in an eval. See:
perldoc -f eval
For instance, you could write:
# warn if routine calls die
eval { routine_might_die }; warn $@ if $@;
This will turn the fatal error into a warning, which is more or less what you suggested. If die is called, $@ contains the string passed to it.
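As for the follow-up question: one common pattern is to die with a recognizable message for the errors function() raises on purpose, catch everything at the calling level inside your file loop, and re-throw whatever you don't recognize. A sketch, where the 'BAD FILE:' prefix is made up for illustration:
my $status = eval { function($param); 1 };
unless ($status) {
    my $err = $@;
    if ( $err =~ /^BAD FILE:/ ) { # an error function() dies with on purpose
        print $err;               # log it and move on
        next;
    }
    die $err;                     # unexpected - propagate it
}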
Does it trap $SIG{__DIE__}? If it does, then it's more local than you are. But there are a couple of strategies:
You can re-open its package and override die:
package Library::Dumb::Dyer;
use subs 'die';
sub die {
    my ( $package, $file, $line ) = caller();
    unless ( $decider->decide( $file, $package, $line ) eq 'DUMB' ) {
        say "It's a good death.";
        die @_;
    }
}
If not, you can trap it with a $SIG{__DIE__} handler (see %SIG in perlvar):
my $old_die_handler = $SIG{__DIE__};
sub _death_handler {
    my ( $package, $file, $line ) = caller();
    unless ( $decider->decide( $file, $package, $line ) eq 'DUMB DIE' ) {
        say "It's a good death.";
        goto &$old_die_handler;
    }
}
$SIG{__DIE__} = \&_death_handler;
You might have to scan the library, find a sub that it always calls, and use that to load your $SIG handler by overriding that.
my $dumb_package_do_something_dumb = \&Dumb::do_something_dumb;
*Dumb::do_something_dumb = sub {
    $SIG{__DIE__} = ...
    goto &$dumb_package_do_something_dumb;
};
Or override a builtin that it always calls...
package Dumb;
use subs 'chdir';
sub chdir {
    $SIG{__DIE__} = ...
    CORE::chdir @_;
};
If all else fails, you can whip the horse's eyes with this:
package CORE::GLOBAL;
use subs 'die';
sub die {
    ...
    CORE::die @_;
}
This will override die globally; the only way to get the original die back is to address it as CORE::die.
Some combination of this will work.
Although changing a die to not die has a specific solution as shown in the other answers, in general you can always override subroutines in other packages. You don't change the original source at all.
First, load the original package so you get all of the original definitions. Once the original is in place, you can redefine the troublesome subroutine:
BEGIN {
    use Original::Lib;
    no warnings 'redefine';
    sub Original::Lib::some_sub { ... }
}
You can even cut and paste the original definition and tweak what you need. It's not a great solution, but if you can't change the original source (or want to try something before you change the original), it can work.
Besides that, you can copy the original source file into a separate directory for your application. Since you control that directory, you can edit the files in it. You modify that copy and load it by adding that directory to Perl's module search path:
use lib qw(/that/new/directory);
use Original::Lib; # should find the one in /that/new/directory
Your copy sticks around even if someone updates the original module (although you might have to merge changes).
I talk about this quite a bit in Mastering Perl, where I show some other techniques to do that sort of thing. The trick is to not break things even more. How you not break things depends on what you are doing.

Should I manually set Perl's @ARGV so I can use <> to open, scan, and close files?

I have recently started learning Perl, and one of my latest assignments involves searching a bunch of files for a particular string. The user provides the directory name as an argument and the program searches all the files in that directory for the pattern. Using readdir() I have managed to build an array with all the searchable file names, and now need to search each and every file for the pattern. My implementation looks something like this:
sub searchDir($) {
    my $dirN = shift;
    my @dirList = glob("$dirN/*");
    for (@dirList) {
        push @fileList, $_ if -f $_;
    }
    @ARGV = @fileList;
    while (<>) {
        ## Search for pattern
    }
}
My question is - is it alright to manually load the @ARGV array as has been done above and use the <> operator to scan in individual lines, or should I open/scan/close each file individually? Will it make any difference if this processing exists in a subroutine and not in the main function?
On the topic of manipulating @ARGV - that's definitely working code; Perl certainly allows you to do that. I don't think it's a good coding habit, though. Most of the code I've seen that uses the while (<>) idiom uses it to read from standard input, and that's what I initially expect your code to do. A more readable pattern might be to open/close each input file individually:
foreach my $file (@files) {
    open my $fh, '<', $file or die "Error opening file $file ($!)";
    my @lines = <$fh>;
    close $fh or die $!;
    foreach my $line (@lines) {
        if ( $line =~ /$pattern/ ) {
            # do something here!
        }
    }
}
That would read more easily to me, although it is a few more lines of code. Perl allows you a lot of flexibility, but I think that makes it that much more important to develop your own style in Perl that's readable and understandable to you (and your co-workers, if that's important for your code/career).
Putting subroutines in the main function or in a subroutine is also mostly a stylistic decision that you should play around with and think about. Modern computers are so fast at this stuff that style and readability is much more important for scripts like this, as you're not likely to encounter situations in which such a script over-taxes your hardware.
Good luck! Perl is fun. :)
Edit: It's of course true that if he had a very large file, he should do something smarter than slurping the entire file into an array. In that case, something like this would definitely be better:
while ( my $line = <$fh> ) {
    if ( $line =~ /$pattern/ ) {
        # do something here!
    }
}
The point when I wrote "you're not likely to encounter situations in which such a script over-taxes your hardware" was meant to cover that, sorry for not being more specific. Besides, who even has 4GB hard drives, let alone 4GB files? :P
Another Edit: After perusing the Internet on the advice of commenters, I've realized that there are hard drives that are much larger than 4GB available for purchase. I thank the commenters for pointing this out, and promise in the future to never-ever-ever try to write a sarcastic comment on the internet.
I would prefer this more explicit and readable version:
#!/usr/bin/perl -w
foreach my $file (<$ARGV[0]/*>) {
    open(F, $file) or die "$!: $file";
    while (<F>) {
        # search for pattern
    }
    close F;
}
But it is also okay to manipulate @ARGV:
#!/usr/bin/perl -w
@ARGV = <$ARGV[0]/*>;
while (<>) {
    # search for pattern
}
Yes, it is OK to adjust the argument list before you start the 'while (<>)' loop; it would be more nearly foolhardy to adjust it while inside the loop. If you process option arguments, for instance, you typically remove items from @ARGV; here, you are adding items, but it still changes the original value of @ARGV.
It makes no odds whether the code is in a subroutine or in the 'main function'.
The previous answers cover your main Perl-programming question rather well.
So let me comment on the underlying question: How to find a pattern in a bunch of files.
Depending on the OS it might make sense to call a specialised external program, say
grep -l <pattern> <path>
on unix.
Depending on what you need to do with the files containing the pattern, and how big the hit/miss ratio is, this might save quite a bit of time (and re-uses proven code).
The big issue with tweaking @ARGV is that it is a global variable. Also, you should be aware that while (<>) has special magic attributes (it reads each file in @ARGV, or processes STDIN if @ARGV is empty, and it tests for definedness rather than truth). To reduce the magic that needs to be understood, I would avoid it, except for quickie hack jobs.
You can get the filename of the current file by checking $ARGV.
You may not realize it, but you are actually affecting two global variables, not just @ARGV. You are also hitting $_. It is a very, very good idea to localize $_ as well.
You can reduce the impact of munging globals by using local to localize the changes.
BTW, there is another important, subtle bit of magic with <>. Say you want to return the line number of the match in the file. You might think, OK, check perlvar and find that $. gives the line number in the last handle accessed - great. But there is an issue lurking here: $. is not reset between @ARGV files. This is great if you want to know how many lines total you have processed, but not if you want a line number for the current file. Fortunately there is a simple trick with eof that will solve this problem.
use strict;
use warnings;
...
searchDir( 'foo' );
sub searchDir {
    my $dirN    = shift;
    my $pattern = shift;
    local $_;
    my @fileList = grep { -f $_ } glob("$dirN/*");
    return unless @fileList; # Don't want to process STDIN.
    local @ARGV;
    @ARGV = @fileList;
    while (<>) {
        my $found = 0;
        ## Search for pattern
        if ( $found ) {
            print "Match at $. in $ARGV\n";
        }
    }
    continue {
        # reset line numbering after each file.
        close ARGV if eof; # don't use eof().
    }
}
WARNING: I just modified your code in my browser. I have not run it, so it may have typos and probably won't work without a bit of tweaking.
Update: The reason to use local instead of my is that they do very different things. my creates a new lexical variable that is only visible in the containing block and cannot be accessed through the symbol table. local saves the current value of an existing package variable and replaces it with a new value; the localized version is visible to any code that runs, until we leave the enclosing block. See perlsub: Temporary Values via local().
In the general case of making new variables and using them, my is the correct choice. local is appropriate when you are working with globals, but you want to make sure you don't propagate your changes to the rest of the program.
This short script demonstrates local:
$foo = 'foo';
print_foo();
print_bar();
print_foo();
sub print_bar {
    local $foo;
    $foo = 'bar';
    print_foo();
}

sub print_foo {
    print "Foo: $foo\n";
}