Can I skip a whole file with the <> operator? - perl

The following Perl code has an obvious inefficiency;
while (<>)
{
if ($ARGV =~ /\d+\.\d+\.\d+/) {next;}
... or do something useful
}
The code will step through every line of the file we don't want.
On the size of files this particular script is running on this is unlikely to make a noticeable difference, but for the sake of learning; How can I junk the whole file <> is working and move to the next one?
The purpose of this is because the sever this script runs on stores old versions of apps with the version number in the file name, I'm only interested in the current version.

Paul Roub's solution is best if you can filter #ARGV before you start reading any files.
If you have to skip a file after you've begun iterating it,
while (<>) {
if (/# Skip the rest of this file/) {
close ARGV;
next;
}
print "$ARGV: $_";
}

grep ARGV first.
#ARGV = grep { $_ !~ /\d+\.\d+\.\d+/ } #ARGV;
while (<>)
{
# do something with the other files
}

Paul Roub's answer works for more information, see The IO operators section of perlop man page. The pattern of using grep is mentioned as well as a few other things related to <>.
Take note of the mention of ARGV::readonly regarding things like:
perl dangerous.pl 'rm -rfv *|'

Related

What does -l $_ do in Perl, and how does it work?

What is the meaning of the nest code
foreach (#items)
{
if (-l $_) ## this is what I don't understand: the meaning of -l
{
...
}
}
Thanks for any help.
Let's look at each thing:
foreach (#items) {
...
}
This for loop (foreach and for are the same command in Perl) is taking each item from the #items list, and setting it to $_. The $_ is a special variable in Perl that is used as sort of a default variable. The idea is that you could do things like this:
foreach (#items) {
s/foo/bar/;
uc;
print;
}
And each of those command would operate on that $_ variable! If you simply say print with nothing else, it would print whatever is in $_. If you say uc and didn't mention a variable, it would uppercase whatever is in $_.
This is now discouraged for several reasons. First, $_ is global, so there might be side effects that are not intended. For example, imagine you call a subroutine that mucked with the value of $_. You would suddenly be surprised that your program doesn't work.
The other -l is a test operator. This operator checks whether the file given is a symbolic link or not. I've linked to the Perldoc that explains all of the test operators.
If you're not knowledgeable in Unix or BASH/Korn/Bourne shell scripting, having a command that starts with a dash just looks weird. However, much of Perl's syntax was stolen... I mean borrowed from Unix shell and awk commands. In Unix, there's a command called test which you can use like this:
if test -L $FILE
then
....
fi
In Unix, that -L is a parameter to the test command, and in Unix, most parameters to commands start with dashes. Perl simply borrowed the same syntax dash and all.
Interestingly, if you read the Perldoc for these test commands, you will notice that like the foreach loop, the various test commands will use the $_ variable if you don't give it a variable or file name. Whoever wrote that script could have written their loop like this:
foreach (#items)
{
if (-l) ## Notice no mention of the `$_` variable
{
...
}
}
Yeah, that's soooo much clear!
Just for your information, The modern way as recommended by many Perl experts (cough Damian Conway cough) is to avoid the $_ variable whenever possible since it doesn't really add clarity and can cause problems. He also recommends just saying for and forgetting foreach, and using curly braces on the same line:
for my $file (#items) {
if ( -l $file ) {
...
}
}
That might not help with the -l command, but at least you can see you're dealing with files, so you might suspect that -l has something to do with files.
Unfortunately, the Perldoc puts all of these file tests under the -X section and alphabetized under X, so if you're searching the Perldoc for a -l command, or any command that starts with a dash, you won't find it unless you know. However, at least you know now for the future where to look when you see something like this: -s $file.
It's an operator that checks if a file is a symbolic link.
The -l filetest operator checks whether a file is a symbolic link.
The way -l works under the hood resembles the code below.
#! /usr/bin/env perl
use strict;
use warnings;
use Fcntl ':mode';
sub is_symlink {
my($path) = #_;
my $mode = (lstat $path)[2];
die "$0: lstat $path: $!" unless defined $mode;
return S_ISLNK $mode;
}
my #items = #ARGV;
foreach (#items) {
if (is_symlink $_) {
print "$0: link: $_\n";
}
}
Sample output:
$ ln -s foo/bar/baz quux
$ ./flag-links flag-links quux
./flag-links: link: quux
Note the call to lstat and not stat because the latter would attempt to follow symlinks but never identify them!
To understand how Unix mode bits work, see the accepted answer to “understanding and decoding the file mode value from stat function output.”
From perldoc :
-l File is a symbolic link.

How to remove a specific word from a file in perl

A file contains:
rhost=localhost
ruserid=abcdefg_xxx
ldir=
lfile=
rdir=p01
rfile=
pgp=none
mainframe=no
ftpmode=binary
ftpcmd1=
ftpcmd2=
ftpcmd3=
ftpcmd1a=
ftpcmd2a=
notifycc=no
firstfwd=Yes
NOTIFYNYL=
decompress=no
compress=no
I want to write a simple code that removes the "_xxx" in that second line. Keep in mind that there will never be a file that contains the string "_xxx" so that should make it extremely easier. I'm just not too familiar with the syntax. Thanks!
The short answer:
Here's how you can remove just the literal '_xxx'.
perl -pli.bak -e 's/_xxx$//' filename
The detailed explanation:
Since Perl has a reputation for code that is indistinguishable from line noise, here's an explanation of the steps.
-p creates an implicit loop that looks something like this:
while( <> ) {
# Your code goes here.
}
continue {
print or die;
}
-l sort of acts like "auto-chomp", but also places the line ending back on the line before printing it again. It's more complicated than that, but in its simplest use, it changes your implicit loop to look like this:
while( <> ) {
chomp;
# Your code goes here.
}
continue {
print $_, $/;
}
-i tells Perl to "edit in place." Behind the scenes it creates a separate output file and at the end it moves that temporary file to replace the original.
.bak tells Perl that it should create a backup named 'originalfile.bak' so that if you make a mistake it can be reversed easily enough.
Inside the substitution:
s/
_xxx$ # Match (but don't capture) the final '_xxx' in the string.
/$1/x; # Replace the entire match with nothing.
The reference material:
For future reference, information on the command line switches used in Perl "one-liners" can be obtained in Perl's documentation at perlrun. A quick introduction to Perl's regular expressions can be found at perlrequick. And a quick overview of Perl's syntax is found at perlintro.
This overwrites the original file, getting rid of _xxx in the 2nd line:
use warnings;
use strict;
use Tie::File;
my $filename = shift;
tie my #lines, 'Tie::File', $filename or die $!;
$lines[1] =~ s/_xxx//;
untie #lines;
Maybe this can help
perl -ple 's/_.*// if /^ruserid/' < file
will remove anything after the 1st '_' (included) in the line what start with "ruserid".
One way using perl. In second line ($. == 2), delete from last _ until end of line:
perl -lpe 's/_[^_]*\Z// if $. == 2' infile

How can I print a matching line, one line immediately above it and one line immediately below?

From a related question asked by Bi, I've learnt how to print a matching line together with the line immediately below it. The code looks really simple:
#!perl
open(FH,'FILE');
while ($line = <FH>) {
if ($line =~ /Pattern/) {
print "$line";
print scalar <FH>;
}
}
I then searched Google for a different code that can print matching lines with the lines immediately above them. The code that would partially suit my purpose is something like this:
#!perl
#array;
open(FH, "FILE");
while ( <FH> ) {
chomp;
$my_line = "$_";
if ("$my_line" =~ /Pattern/) {
foreach( #array ){
print "$_\n";
}
print "$my_line\n"
}
push(#array,$my_line);
if ( "$#array" > "0" ) {
shift(#array);
}
};
Problem is I still can't figure out how to do them together. Seems my brain is shutting down. Does anyone have any ideas?
Thanks for any help.
UPDATE:
I think I'm sort of touched. You guys are so helpful! Perhaps a little Off-topic, but I really feel the impulse to say more.
I needed a Windows program capable of searching the contents of multiple files and of displaying the related information without having to separately open each file. I tried googling and two apps, Agent Ransack and Devas, have proved to be useful, but they display only the lines containing the matched query and I want aslo to peek at the adjacent lines. Then the idea of improvising a program popped into my head. Years ago I was impressed by a Perl script that could generate a Tomeraider format of Wikipedia so that I can handily search Wiki on my Lifedrive and I've also read somewhere on the net that Perl is easy to learn especially for some guy like me who has no experience in any programming language. Then I sort of started teaching myself Perl a couple of days ago. My first step was to learn how to do the same job as "Agent Ransack" does and it proved to be not so difficult using Perl. I first learnt how to search the contents of a single file and display the matching lines through the modification of an example used in the book titled "Perl by Example", but I was stuck there. I became totally clueless as how to deal with multiple files. No similar examples were found in the book or probably because I was too impatient. And then I tried googling again and was led here and I asked my first question "How can I search multiple files for a string pattern in Perl?" here and I must say this forum is bloody AWESOME ;). Then I looked at more example scripts and then I came up with the following code yesterday and it serves my original purpose quite well:
The codes goes like this:
#!perl
$hits=0;
print "INPUT YOUR QUERY:";
chop ($query = <STDIN>);
$dir = 'f:/corpus/';
#files = <$dir/*>;
foreach $file (#files) {
open (txt, "$file");
while($line = <txt>) {
if ($line =~ /$query/i) {
$hits++;
print "$file \n $line";
print scalar <txt>;
}
}
}
close(txt);
print "$hits RESULTS FOUND FOR THIS SEARCH\n";
In the folder "corpus", I have a lot of text files including srt pdf doc files that contain such contents as follows:
Then I dumped the body.
J'ai mis le corps dans une décharge.
I know you have a wire.
Je sais que tu as un micro.
Now I'll tell you the truth.
Alors je vais te dire la vérité.
Basically I just need to search an English phrase and look at the French equivalent, so the script I finished yesterday is quite satisfying except that it would to be better if my script can display the above line in case I want to search a French phrase and check the English. So I'm trying to improve the code. Actually I knew the "print scalar " is buggy, but it is neat and does the job of printing the subsequent line at least most of the time). I was even expecting ANOTHER SINGLE magic line that prints the previous line instead of the subsequent :) Perl seems to be fun. I think I will spend more time trying to get a better understanding of it. And as suggested by daotoad, I'll study the codes generously offered by you guys. Again thanks you guys!
It will probably be easier just to use grep for this as it allows printing of lines before and after a match. Use -B and -A to print context before and after the match respectively. See http://ss64.com/bash/grep.html
Here's a modernized version of Pax's excellent answer:
use strict;
use warnings;
open( my $fh, '<', 'qq.in')
or die "Error opening file - $!\n";
my $this_line = "";
my $do_next = 0;
while(<$fh>) {
my $last_line = $this_line;
$this_line = $_;
if ($this_line =~ /XXX/) {
print $last_line unless $do_next;
print $this_line;
$do_next = 1;
} else {
print $this_line if $do_next;
$last_line = "";
$do_next = 0;
}
}
close ($fh);
See Why is three-argument open calls with lexical filehandles a Perl best practice? for an discussion of the reasons for the most important changes.
Important changes:
3 argument open.
lexical filehandle
added strict and warnings pragmas.
variables declared with lexical scope.
Minor changes (issues of style and personal taste):
removed unneeded parens from post-fix if
converted an if-not contstruct into unless.
If you find this answer useful, be sure to up-vote Pax's original.
Given the following input file:
(1:first) Yes, this one.
(2) This one as well (XXX).
(3) And this one.
Not this one.
Not this one.
Not this one.
(4) Yes, this one.
(5) This one as well (XXX).
(6) AND this one as well (XXX).
(7:last) And this one.
Not this one.
this little snippet:
open(FH, "<qq.in");
$this_line = "";
$do_next = 0;
while(<FH>) {
$last_line = $this_line;
$this_line = $_;
if ($this_line =~ /XXX/) {
print $last_line if (!$do_next);
print $this_line;
$do_next = 1;
} else {
print $this_line if ($do_next);
$last_line = "";
$do_next = 0;
}
}
close (FH);
produces the following, which is what I think you were after:
(1:first) Yes, this one.
(2) This one as well (XXX).
(3) And this one.
(4) Yes, this one.
(5) This one as well (XXX).
(6) AND this one as well (XXX).
(7:last) And this one.
It basically works by remembering the last line read and, when it finds the pattern, it outputs it and the pattern line. Then it continues to output pattern lines plus one more (with the $do_next variable).
There's also a little bit of trickery in there to ensure no line is printed twice.
You always want to store the last line that you saw in case the next line has your pattern and you need to print it. Using an array like you did in the second code snippet is probably overkill.
my $last = "";
while (my $line = <FH>) {
if ($line =~ /Pattern/) {
print $last;
print $line;
print scalar <FH>; # next line
}
$last = $line;
}
grep -A 1 -B 1 "search line"
I am going to ignore the title of your question and focus on some of the code you posted because it is positively harmful to let this code stand without explaining what is wrong with it. You say:
code that can print matching lines with the lines immediately above them. The code that would partially suit my purpose is something like this
I am going to go through that code. First, you should always include
use strict;
use warnings;
in your scripts, especially since you are just learning Perl.
#array;
This is a pointless statement. With strict, you can declare #array using:
my #array;
Prefer the three-argument form of open unless there is a specific benefit in a particular situation to not using it. Use lexical filehandles because bareword filehandles are package global and can be the source of mysterious bugs. Finally, always check if open succeeded before proceeding. So, instead of:
open(FH, "FILE");
write:
my $filename = 'something';
open my $fh, '<', $filename
or die "Cannot open '$filename': $!";
If you use autodie, you can get away with:
open my $fh, '<', 'something';
Moving on:
while ( <FH> ) {
chomp;
$my_line = "$_";
First, read the FAQ (you should have done so before starting to write programs). See What's wrong with always quoting "$vars"?. Second, if you are going to assign the line that you just read to $my_line, you should do it in the while statement so you do not needlessly touch $_. Finally, you can be strict compliant without typing any more characters:
while ( my $line = <$fh> ) {
chomp $line;
Refer to the previous FAQ again.
if ("$my_line" =~ /Pattern/) {
Why interpolate $my_line once more?
foreach( #array ){
print "$_\n";
}
Either use an explicit loop variable or turn this into:
print "$_\n" for #array;
So, you interpolate $my_line again and add the newline that was removed by chomp earlier. There is no reason to do so:
print "$my_line\n"
And now we come to the line that motivated me to dissect the code you posted in the first place:
if ( "$#array" > "0" ) {
$#array is a number. 0 is a number. > is used to check if the number on the LHS is greater than the number on the RHS. Therefore, there is no need to convert both operands to strings.
Further, $#array is the last index of #array and its meaning depends on the value of $[. I cannot figure out what this statement is supposed to be checking.
Now, your original problem statement was
print matching lines with the lines immediately above them
The natural question, of course, is how many lines "immediately above" the match you want to print.
#!/usr/bin/perl
use strict;
use warnings;
use Readonly;
Readonly::Scalar my $KEEP_BEFORE => 4;
my $filename = $ARGV[0];
my $pattern = qr/$ARGV[1]/;
open my $input_fh, '<', $filename
or die "Cannot open '$filename': $!";
my #before;
while ( my $line = <$input_fh> ) {
$line = sprintf '%6d: %s', $., $line;
print #before, $line, "\n" if $line =~ $pattern;
push #before, $line;
shift #before if #before > $KEEP_BEFORE;
}
close $input_fh;
Command line grep is the quickest way to accomplish this, but if your goal is to learn some Perl then you'll need to produce some code.
Rather than providing code, as others have already done, I'll talk a bit about how to write your own. I hope this helps with the brain-lock.
Read my previous answer on how to write a program, it gives some tips about how to start working on your problem.
Go through each of the sample programs you have, as well as those offered here and comment out exactly what they do. Refer to the perldoc for each function and operator you don't understand. Your first example code has an error, if 2 lines in a row match, the line after the second match won't print. By error, I mean that either the code or the spec is wrong, the desired behavior in this case needs to be determined.
Write out what you want your program to do.
Start filling in the blanks with code.
Here's a sketch of a phase one write-up:
# This program reads a file and looks for lines that match a pattern.
# Open the file
# Iterate over the file
# For each line
# Check for a match
# If match print line before, line and next line.
But how do you get the next line and the previous line?
Here's where creative thinking comes in, there are many ways, all you need is one that works.
You could read in lines one at a time, but read ahead by one line.
You could read the whole file into memory and select previous and follow-on lines by indexing an array.
You could read the file and store the offset and length each line--keeping track of which ones match as you go. Then use your offset data to extract the required lines.
You could read in lines one at a time. Cache your previous line as you go. Use readline to read the next line for printing, but use seek and tell to rewind the handle so that the 'next' line can be checked for a match.
Any of these methods, and many more could be fleshed out into a functioning program. Depending on your goals, and constraints any one may be the best choice for that problem domain. Knowing how to select which one to use will come with experience. If you have time, try two or three different ways and see how they work out.
Good luck.
If you don't mind losing the ability to iterate over a filehandle, you could just slurp the file and iterate over the array:
#!/usr/bin/perl
use strict; # always do these
use warnings;
my $range = 1; # change this to print the first and last X lines
open my $fh, '<', 'FILE' or die "Error: $!";
my #file = <$fh>;
close $fh;
for (0 .. $#file) {
if($file[$_] =~ /Pattern/) {
my #lines = grep { $_ > 0 && $_ < $#file } $_ - $range .. $_ + $range;
print #file[#lines];
}
}
This might get horribly slow for large files, but is pretty easy to understand (in my opinion). Only when you know how it works can you set about trying to optimize it. If you have any questions about any of the functions or operations I used, just ask.

Is there a simple way to do bulk file text substitution in place?

I've been trying to code a Perl script to substitute some text on all source files of my project. I'm in need of something like:
perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi" *.{cs,aspx,ascx}
But that parses all the files of a directory recursively.
I just started a script:
use File::Find::Rule;
use strict;
my #files = (File::Find::Rule->file()->name('*.cs','*.aspx','*.ascx')->in('.'));
foreach my $f (#files){
if ($f =~ s/thisgoesout/thisgoesin/gi) {
# In-place file editing, or something like that
}
}
But now I'm stuck. Is there a simple way to edit all files in place using Perl?
Please note that I don't need to keep a copy of every modified file; I'm have 'em all subversioned =)
Update: I tried this on Cygwin,
perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi" {*,*/*,*/*/*}.{cs,aspx,ascx
But it looks like my arguments list exploded to the maximum size allowed. In fact, I'm getting very strange errors on Cygwin...
If you assign #ARGV before using *ARGV (aka the diamond <>), $^I/-i will work on those files instead of what was specified on the command line.
use File::Find::Rule;
use strict;
#ARGV = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
$^I = '.bak'; # or set `-i` in the #! line or on the command-line
while (<>) {
s/thisgoesout/thisgoesin/gi;
print;
}
This should do exactly what you want.
If your pattern can span multiple lines, add in a undef $/; before the <> so that Perl operates on a whole file at a time instead of line-by-line.
You may be interested in File::Transaction::Atomic or File::Transaction
The SYNOPSIS for F::T::A looks very similar with what you're trying to do:
# In this example, we wish to replace
# the word 'foo' with the word 'bar' in several files,
# with no risk of ending up with the replacement done
# in some files but not in others.
use File::Transaction::Atomic;
my $ft = File::Transaction::Atomic->new;
eval {
foreach my $file (#list_of_file_names) {
$ft->linewise_rewrite($file, sub {
s#\bfoo\b#bar#g;
});
}
};
if ($#) {
$ft->revert;
die "update aborted: $#";
}
else {
$ft->commit;
}
Couple that with the File::Find you've already written, and you should be good to go.
You can use Tie::File to scalably access large files and change them in place. See the manpage (man 3perl Tie::File).
Change
foreach my $f (#files){
if ($f =~ s/thisgoesout/thisgoesin/gi) {
#inplace file editing, or something like that
}
}
To
foreach my $f (#files){
open my $in, '<', $f;
open my $out, '>', "$f.out";
while (my $line = <$in>){
chomp $line;
$line =~ s/thisgoesout/thisgoesin/gi
print $out "$line\n";
}
}
This assumes that the pattern doesn't span multiple lines. If the pattern might span lines, you'll need to slurp in the file contents. ("slurp" is a pretty common Perl term).
The chomp isn't actually necessary, I've just been bitten by lines that weren't chomped one too many times (if you drop the chomp, change print $out "$line\n"; to print $out $line;).
Likewise, you can change open my $out, '>', "$f.out"; to open my $out, '>', undef; to open a temporary file and then copy that file back over the original when the substitution's done. In fact, and especially if you slurp in the whole file, you can simply make the substitution in memory and then write over the original file. But I've made enough mistakes doing that that I always write to a new file, and verify the contents.
Note, I originally had an if statement in that code. That was most likely wrong. That would have only copied over lines that matched the regular expression "thisgoesout" (replacing it with "thisgoesin" of course) while silently gobbling up the rest.
You could use find:
find . -name '*.{cs,aspx,ascx}' | xargs perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi"
This will list all the filenames recursively, then xargs will read its stdin and run the remainder of the command line with the filenames appended on the end. One nice thing about xargs is it will run the command line more than once if the command line it builds gets too long to run in one go.
Note that I'm not sure whether find completely understands all the shell methods of selecting files, so if the above doesn't work then perhaps try:
find . | grep -E '(cs|aspx|ascx)$' | xargs ...
When using pipelines like this, I like to build up the command line and run each part individually before proceeding, to make sure each program is getting the input it wants. So you could run the part without xargs first to check it.
It just occurred to me that although you didn't say so, you're probably on Windows due to the file suffixes you're looking for. In that case, the above pipeline could be run using Cygwin. It's possible to write a Perl script to do the same thing, as you started to do, but you'll have to do the in-place editing yourself because you can't take advantage of the -i switch in that situation.
Thanks to ephemient on this question and on this answer, I got this:
use File::Find::Rule;
use strict;
sub ReplaceText {
my $regex = shift;
my $replace = shift;
#ARGV = (File::Find::Rule->file()->name('*.cs','*.aspx','*.ascx')->in('.'));
$^I = '.bak';
while (<>) {
s/$regex/$replace->()/gie;
print;
}
}
ReplaceText qr/some(crazy)regexp/, sub { "some $1 text" };
Now I can even loop through a hash containing regexp=>subs entries!

Should I manually set Perl's #ARGV so I can use <> to open, scan, and close files?

I have recently started learning Perl and one of my latest assignments involves searching a bunch of files for a particular string. The user provides the directory name as an argument and the program searches all the files in that directory for the pattern. Using readdir() I have managed to build an array with all the searchable file names and now need to search each and every file for the pattern, my implementation looks something like this -
sub searchDir($) {
my $dirN = shift;
my #dirList = glob("$dirN/*");
for(#dirList) {
push #fileList, $_ if -f $_;
}
#ARGV = #fileList;
while(<>) {
## Search for pattern
}
}
My question is - is it alright to manually load the #ARGV array as has been done above and use the <> operator to scan in individual lines or should I open / scan / close each file individually? Will it make any difference if this processing exists in a subroutine and not in the main function?
On the topic of manipulating #ARGV - that's definitely working code, Perl certainly allows you to do that. I don't think it's a good coding habit though. Most of the code I've seen that uses the "while (<>)" idiom is using it to read from standard input, and that's what I initially expect your code to do. A more readable pattern might be to open/close each input file individually:
foreach my $file (#files) {
open FILE, "<$file" or die "Error opening file $file ($!)";
my #lines = <FILE>;
close FILE or die $!;
foreach my $line (#file) {
if ( $line =~ /$pattern/ ) {
# do something here!
}
}
}
That would read more easily to me, although it is a few more lines of code. Perl allows you a lot of flexibility, but I think that makes it that much more important to develop your own style in Perl that's readable and understandable to you (and your co-workers, if that's important for your code/career).
Putting subroutines in the main function or in a subroutine is also mostly a stylistic decision that you should play around with and think about. Modern computers are so fast at this stuff that style and readability is much more important for scripts like this, as you're not likely to encounter situations in which such a script over-taxes your hardware.
Good luck! Perl is fun. :)
Edit: It's of course true that if he had a very large file, he should do something smarter than slurping the entire file into an array. In that case, something like this would definitely be better:
while ( my $line = <FILE> ) {
if ( $line =~ /$pattern/ ) {
# do something here!
}
}
The point when I wrote "you're not likely to encounter situations in which such a script over-taxes your hardware" was meant to cover that, sorry for not being more specific. Besides, who even has 4GB hard drives, let alone 4GB files? :P
Another Edit: After perusing the Internet on the advice of commenters, I've realized that there are hard drives that are much larger than 4GB available for purchase. I thank the commenters for pointing this out, and promise in the future to never-ever-ever try to write a sarcastic comment on the internet.
I would prefer this more explicit and readable version:
#!/usr/bin/perl -w
foreach my $file (<$ARGV[0]/*>){
open(F, $file) or die "$!: $file";
while(<F>){
# search for pattern
}
close F;
}
But it is also okay to manipulate #ARGV:
#!/usr/bin/perl -w
#ARGV = <$ARGV[0]/*>;
while(<>){
# search for pattern
}
Yes, it is OK to adjust the argument list before you start the 'while (<>)' loop; it would be more nearly foolhardy to adjust it while inside the loop. If you process option arguments, for instance, you typically remove items from #ARGV; here, you are adding items, but it still changes the original value of #ARGV.
It makes no odds whether the code is in a subroutine or in the 'main function'.
The previous answers cover your main Perl-programming question rather well.
So let me comment on the underlying question: How to find a pattern in a bunch of files.
Depending on the OS it might make sense to call a specialised external program, say
grep -l <pattern> <path>
on unix.
Depending on what you need to do with the files containing the pattern, and how big the hit/miss ratio is, this might save quite a bit of time (and re-uses proven code).
The big issue with tweaking #ARGV is that it is a global variable. Also, you should be aware that while (<>) has special magic attributes. (reading each file in #ARGV or processing STDIN if #ARGV is empty, testing for definedness rather than truth). To reduce the magic that needs to be understood, I would avoid it, except for quickie-hack-jobs.
You can get the filename of the current file by checking $ARGV.
You may not realize it, but you are actually affecting two global variables, not just #ARGV. You are also hitting $_. It is a very, very good idea to localize $_ as well.
You can reduce the impact of munging globals by using local to localize the changes.
BTW, there is another important, subtle bit of magic with <>. Say you want to return the line number of the match in the file. You might think, ok, check perlvar and find $. gives the linenumber in the last handle accessed--great. But there is an issue lurking here--$. is not reset between #ARGV files. This is great if you want to know how many lines total you have processed, but not if you want a line number for the current file. Fortunately there is a simple trick with eof that will solve this problem.
use strict;
use warnings;
...
searchDir( 'foo' );
sub searchDir {
my $dirN = shift;
my $pattern = shift;
local $_;
my #fileList = grep { -f $_ } glob("$dirN/*");
return unless #fileList; # Don't want to process STDIN.
local #ARGV;
#ARGV = #fileList;
while(<>) {
my $found = 0;
## Search for pattern
if ( $found ) {
print "Match at $. in $ARGV\n";
}
}
continue {
# reset line numbering after each file.
close ARGV if eof; # don't use eof().
}
}
WARNING: I just modified your code in my browser. I have not run it so it, may have typos, and probably won't work without a bit of tweaking
Update: The reason to use local instead of my is that they do very different things. my creates a new lexical variable that is only visible in the contained block and cannot be accessed through the symbol table. local saves the existing package variable and aliases it to a new variable. The new localized version is visible in any subsequent code, until we leave the enclosing block. See perlsub: Temporary Values Via local().
In the general case of making new variables and using them, my is the correct choice. local is appropriate when you are working with globals, but you want to make sure you don't propagate your changes to the rest of the program.
This short script demonstrates local:
$foo = 'foo';
print_foo();
print_bar();
print_foo();
sub print_bar {
local $foo;
$foo = 'bar';
print_foo();
}
sub print_foo {
print "Foo: $foo\n";
}