rysnc on perl until loop never breaks - perl

I was having a difficulty about perl script..I have a script that includes a loop but never breaks out of it once it gets the exit status zero.The loop will run only once the exit status results a 1 upon checking initially by the "if" statement.
my $a = "/home/vivek/generated_mdsum_reference.out";
my $b = "/home/vivek/generated_mdsum_new.out";
sub CHECK {
print "\n";
print "\n";
print "\n\tGenerating MD5SUM ....";
my $dumpfile = "/home/vivek/file_dump.dmp";
print "\n";
# my $md5sum = system("md5sum $dumpfile");
my $md5sum = `md5sum $dumpfile`;
print "\n";
print "\nChecksum: $md5sum.";
# Put checksum in file
my $ochksumfile = "/home/vivek/generated_mdsum_new.out";
open (my $fh, '>', "$ochksumfile") or die "no file:$!";
my $output = $md5sum;
die "$!" if $?;
$value = (split / /, "$output")[0];
print $fh $value;
my $status =compare($b, $a);
}
my $status =compare($b, $a);
if ( $status == 1 ){
do
CHECK;
until ($status == 0 ) {
print "\n\tfiles are now Ok. Exiting..";
print "\n\t";
}
All the variables I have set in there works fine, I only ended up in the until loop which it keeps on running endlessly which I think it cannot get through until the rest of the function "CHECK"
Please help me guys.

Let's look at the block in question
my $status = compare($b, $a);
if ( $status == 1 ) {
do
CHECK;
until ($status == 0 ) {
print "\n\tfiles are now Ok. Exiting..";
print "\n\t";
}
whether you meant it or not, this is equivalent to
my $status = compare($b, $a);
if ( $status == 1 ) {
do CHECK;
until ( $status == 0 ) {
print "\n\tfiles are now Ok. Exiting..";
print "\n\t";
}
So the contents of your until loop is just two print statements, which aren't going to change the value of $status so it will loop forever
What I think you meant is
my $status = compare($b, $a);
if ( $status == 1 ) {
do {
CHECK;
} until $status == 0;
print "\n\tfiles are now Ok. Exiting..";
print "\n\t";
}
which will repeatedly call CHECK until $status is set to zero
EXCEPT THAT you should always use lower-case characters in your local identifiers. Upper-case is reserved for global identifiers such as package names. In this situation you have unwittingly created a CHECK block
perldoc perlmod says this
Five specially named code blocks are executed at the beginning and at the end of a running Perl program. These are the BEGIN , UNITCHECK , CHECK , INIT , and END blocks.
These code blocks can be prefixed with sub to give the appearance of a subroutine (although this is not considered good style). One should note that these code blocks don't really exist as named subroutines (despite their appearance). The thing that gives this away is the fact that you can have more than one of these code blocks in a program, and they will get all executed at the appropriate moment. So you can't execute any of these code blocks by name.
So because "these code blocks don't really exist as named subroutines" your program will only call CHECK during compilation, and subsequent explicit calls will be ignored
So rename your subroutine to check and chnage your code to this
my $status = compare($b, $a);
if ( $status == 1 ) {
do {
CHECK;
} until $status == 0;
print "\n\tfiles are now Ok. Exiting..";
print "\n\t";
}
and all will be well

I do not understand why the until loop should stop.
In short, you get an md5 sum for a file in your "CHECK" sub.
sub CHECK {
open (my $fh, '>', "/home/vivek/generated_mdsum_new.out");
my $dump_file = "/home/vivek/file_dump.dmp";
my $md5sum = `md5sum $dump_file`;
my $value = (split(" ",$md5sum))[0];
print $fh $value;
close $fh;
compare($a,$b);
# question: what is in $a / $b and what does compare do ?
}
Now you compare $a and $b and return that result.
Assuming $a / $b contain something useful, there is nothing in the until block or CHECK block that I see to change either, so the comparison would keep returning the same result ( assuming "compare" compares and does not change anything, which would be a bad naming if it did ).
So in the until block, do something to influence the CHECK, otherwise you are stuck in a loop.

You should always use lower-case characters in your local identifiers. Upper-case is reserved for global identifiers such as package names. In this situation you have unwittingly created a CHECK block
perldoc perlmod says this
Five specially named code blocks are executed at the beginning and at the end of a running Perl program. These are the BEGIN , UNITCHECK , CHECK , INIT , and END blocks.
These code blocks can be prefixed with sub to give the appearance of a subroutine (although this is not considered good style). One should note that these code blocks don't really exist as named subroutines (despite their appearance). The thing that gives this away is the fact that you can have more than one of these code blocks in a program, and they will get all executed at the appropriate moment. So you can't execute any of these code blocks by name.
So because "these code blocks don't really exist as named subroutines" your program will only call CHECK during compilation. That means explicit calls to CHECK will be ignored, and $status never changes
Change your subroutine to check and all will be well

The problem is here:
until ($status == 0 ) {
print "\n\tfiles are now Ok. Exiting..";
print "\n\t";
}
Nowhere in the loop are you changing the value of $status.
The "do CHECK;" isn't part of the loop.
Would eval a file by the name of the return value of the CHECK sub
If it had actually called that sub.
Which it won't because it isn't actually a subroutine.
The other form of until is:
do {
CHECK; # doesn't work as this is a special name
} until $status == 0;
print "\n\tfiles are now Ok. Exiting..";
print "\n\t";
Which is still a problem as CHECK is a special name for a block that only gets called once at CHECK time, that you can't actually call.
Plus the code in CHECK would always have the same result, so calling it repeatedly doesn't make sense, and could still result in an infinite loop if it didn't work the first time.
This is how I might have written it
This is a first pass of making your code make sense to me, also fixing some of the errors pointed out above.
I also changed $a and $b to $ref and $new since $a and $b are reserved variables.
I improved it by using modules that come with Perl, so that I don't have to check the return values of open() and close() ( autodie ), or rely on conventions of a particular platform ( Digest::MD5, and File::Spec::Functions ).
I assume that you loaded File::Compare.
I removed the setting of $status from the check sub to reduce the use of global variables.
#! /usr/bin/env perl
use strict;
use warnings;
use 5.10.1; # set minimum version which was released in 2009
use autodie;
use File::Spec::Functions qw' catfile catdir rootdir curdir ';
use Digest::MD5;
use File::Compare qw' compare ';
# should be // not ||, but it will work if your dir isn't named "0" or ""
my $basedir = $ENV{HOME} || catdir rootdir, qw' home vivek ';
# try the current directory if it doesn't exist
$basedir = curdir unless -d $basedir;
my $ref = catfile $basedir, 'generated_mdsum_reference.out';
my $new = catfile $basedir, 'generated_mdsum_new.out';
my $dumpfile = catfile $basedir, 'file_dump.dmp';
# forward declare so that we can put them at the end
sub md5_hex_file;
sub md5_sum;
sub check;
#-------------------------------------------------------
if ( compare($new, $ref) != 0 ){
if ( check($dumpfile,$new) == 0 ){
print "\n\tfiles are now Ok. Exiting..\n";
} else {
local $| = 1; # make sure the output is flushed to STDOUT
print "\n\tfiles are NOT OK. Exiting..\n";
exit 1;
}
}
#-------------------------------------------------------
# helper subroutines
sub md5_hex_file {
my ($filename) = #_;
# let Digest::MD5 read the file for us
my $ctx = Digest::MD5->new;
{ # limit scope of $fh
open my $fh, '<', $filename;
binmode $fh;
$ctx->addfile( $fh );
close $fh;
}
$ctx->hexdigest;
}
# no longer necessary
sub md5_sum {
my ($filename) = #_;
# `md5sum -b $filename`
md5_hex_file($filename) . " $filename\n";
}
sub check {
my ( $infile, $outfile ) = #_
print "\n" x 3, "\tGenerating MD5SUM ....\n";
my $md5_hex = md5_hex_file $infile;
print "\n" x 2, "Checksum: $md5_hex.\n";
# Put checksum in file
{
open my $out_fh, '>', $outfile;
print {$out_fh} $md5_hex;
close $out_fh;
}
my $status = compare $new, $ref;
return $status if $status == 0;
# add a newline and hope that fixes it
{
open my $out_fh, '>>', $outfile;
print {$out_fh} "\n";
close $out_fh;
}
return compare $new, $ref;
}
Really I think you could have just used these "one-liners"
$ perl -Mautodie -MDigest::MD5 -e \
'open $fh, q[<], shift;
print Digest::MD5->new->addfile($fh)->hexdigest' \
file_dump.dmp > generated_mdsum_new.out
$ perl -MFile::Compare -e \
'if ( compare(shift(),shift()) == 0 ){
print qq[They match\n]
} else {
print qq[They don\'t match\n]
}' \
generated_mdsum_new.out generated_mdsum_reference.out

Related

How to redirect output from pipe provided as argument to a variable

So i've made functions to capture pipes
sub capture_stdout (&) {
my $s;
open(local *STDOUT, '>', \$s);
shift->();
return $s;
}
sub capture_stderr (&) {
my $s;
open(local *STDERR, '>', \$s);
shift->();
return $s;
}
These work great. Now the challenge i am facing is, i want to make a function that takes the pipe(s) as arguments, and redirects all of them in a single sub.
I have yet been unsuccessful in making it work. So far I've cooked up something that compiles;
sub capture(#&) {
my $c = pop;
my $o = [];
say {$_[$_]} $_[$_] for (0 .. $#_);
open(local *{$_[$_]}, '>', \$o->[$_]) for (0 .. $#_);
$c->();
return $o;
}
use Data::Dumper;
say Dumper( capture *STDOUT, *STDERR, sub{ say 1; warn 2; } );
but it does not capture anything. I cant seem to figure out how to fix it. I'm convinced however that it is the local *{$_[$_]} that needs fixing, though i could be wrong.
The complete output is:
*main::STDOUT
*main::STDERR
1
2 at capture.pl line 15.
$VAR1 = [
undef,
undef
];
So then the question: Is it even possible to do what I'm attempting, and if so, how?
Thank you.
The problem with your code is that the effects of local are undone at the end of your
... for (0 .. $#_);
loop. By the time you call $c->(), the filehandles have their original values again.
So ...
You want to localize an arbitrary number of variables.
You can't use blocks (e.g. for (...) { ... }) because local is undone at the end of the scope it's in.
You can't use postfix for because apparently it implicitly creates its own mini-scope.
The solution? goto, of course!
(Or you could use recursion: Use a block, but never leave it or loop back. Just localize a single variable, then call yourself with the remaining variables. But goto is funnier.)
sub capture {
my $c = pop;
my $o = [];
my $i = 0;
LOOP: goto LOOP_END if $i >= #_;
local *{$_[$i++]};
goto LOOP;
LOOP_END:
open(*{$_[$_]}, '>', \$o->[$_]) or die "$_[$_]: $!" for 0 .. $#_;
$c->();
return $o;
}
Effectively we've created a loop without entering/leaving any scopes.
You need to actually switch out the file handles. To do that, first save the existing handles. Then create new ones that point into your output data structure. Once you've run the code, restore the original handles.
sub capture {
my $c = pop;
# we will keep the original handles in here to restore them later
my #old_handles;
my $o = [];
foreach my $i (0 .. $#_) {
# store the original handle
push #old_handles, $_[$i];
# create a new handle
open my $fh, '>', \$o->[$i] or die $!;
# stuff it into the handle slot of the typeglob associated with the old handle
*{$_[$i]} = $fh;
}
# run callback
$c->();
# restore the old handles
*{$_[$_]} = $old_handles[$_] for 0 .. $#_;
return $o;
}
SOLUTION:
The final product, not nearly as convoluted as the original goto loop:
=pod
=item C<capture>
capture takes a list of pipes/filehandles, a code block or sub, optionally arguments to send to
said block and returns any captured output as a string, or an array of strings.
my ($out, $err) = capture *STDOUT, *STDERR, sub { say 'faijas'; warn #_; }, 'jee';
my $output = capture *STDOUT, sub { say 'jee'; };
=cut
sub capture(#&;#) {
my (#o, #h);
# walk through #_, grab all filehandles and the code block into #h
push #h, shift while #_ && ref $h[$#h] ne 'CODE';
my $c = pop #h; # then separate the code block from #h, leaving only handles
# Really we want to do: open(local *{$_[$_]}, '>', \$o->[$_]) for (0 .. $#_);
# but because of scoping issues with the local keyword, we have to loop without
# creating an inner scope
my $i = 0;
R: open(local *{$h[$i]}, '>', \$o[$i]) or die "$h[$i]: $!" ;
goto R if ++$i <= $#h;
$c->(#_);
return wantarray ? #o : $o[0];
}
Big thanks to #melpomene and #simbabque for helping me around the initial problem, and #ikegami for pointing out oversights.

Perl sub skips foreach within which it is called

I'm having some problem with a subroutine that locates certain files and extracts some data out of them.
This subroutine is called inside a foreach loop, but whenever the call is made the loop skips to its next iteration. So I am wondering whether any of the next;'s are somehow escaping from the subroutine to the foreach loop where it is called?
To my knowledge the sub looks solid though so I'm hoping if anyone can see something I'm missing?
sub FindKit{
opendir(DH, "$FindBin::Bin\\data");
my #kitfiles = readdir(DH);
closedir(DH);
my $nametosearch = $_[0];
my $numr = 1;
foreach my $kitfile (#kitfiles)
{
# skip . and .. and Thumbs.db and non-K-files
if($kitfile =~ /^\.$/) {shift #kitfiles; next;}
if($kitfile =~ /^\.\.$/) {shift #kitfiles; next;}
if($kitfile =~ /Thumbs\.db/) {shift #kitfiles; next;}
if($kitfile =~ /^[^K]/) {shift #kitfiles; next;}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<","data\\$kitfile") or die "$!";
while (<$fhkits>) {}
if ($. <= 1) {
print " Empty File!";
next;
}
seek($fhkits,0,0);
while (my $kitrow = <$fhkits>) {
if ($. == 0 && $kitrow =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/g) {
close $fhkits;
return $1;
}
}
$numr++;
close $fhkits;
}
return 0;
}
To summarize comments, the refactored code:
use File::Glob ':bsd_glob';
sub FindKit {
my $nametosearch = $_[0];
my #kitfiles = glob "$FindBin::Bin/data/K*"; # files that start with K
foreach my $kitfile (#kitfiles)
{
open my $fhkits, '<', $kitfile or die "$!";
my $kitrow_first_line = <$fhkits>; # read first line
return if eof; # next read is end-of-file so it was just header
my ($result) = $kitrow_first_line =~
/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
return $result if $result;
}
return 0;
}
I use core File::Glob and enable :bsd_glob option, which can handle spaces in filenames. I follow the docs note to use "real slash" on Win32 systems.
I check whether there is only a header line using eof.†
I do not see how this can affect the calling code, other than by its return value. Also, I don't see how the posted code can make the caller skip the beat, either. That problem is unlikely to be in this sub.
Please let me know if I missed some point with the above rewrite.
† Previous version used to check whether there is just one (header) line by
1 while <$fhkits>; # check number of lines ...
return if $. == 1; # there was only one line, the header
Also correct but eof is way better
The thing that is almost certainly screwing you here, is that you are shifting the list that you are iterating.
That's bad news, as you're deleting elements ... but in places you aren't necessarily thinking.
For example:
#!/usr/bin/env perl
use strict;
use warnings;
my #list = qw ( one two three );
my $count;
foreach my $value ( #list ) {
print "Iteration ", ++$count," value is $value\n";
if ( $value eq 'two' ) { shift #list; next };
}
print "#list";
How many times do you think that should iterate, and which values should end up in the array?
Because you shift you never process element 'three' and you delete element 'one'. That's almost certainly what's causing you problems.
You also:
open using a relative path, when your opendir used an absolute one.
skip a bunch of files, and then skip anything that doesn't start with K. Why not just search for things that do start with K?
read the file twice, and one is to just check if it's empty. The perl file test -z will do this just fine.
you set $kitrow for each line in the file, but don't really use it for anything other than pattern matching. It'd probably work better using implicit variables.
You only actually do anything on the first line - so you don't ever need to iterate the whole file. ($numr seems to be discarded).
you use a global match, but only use one result. The g flag seems redundant here.
I'd suggest a big rewrite, and do something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin;
sub FindKit{
my ($nametosearch) = #_;
my $numr = 1;
foreach my $kitfile (glob "$FindBin::Bin\\data\\K*" )
{
if ( -z $kitfile ) {
print "$kitfile is empty\n";
next;
}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<", $kitfile) or die "$!";
<$kitfile> =~ m/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/
and return $1;
return 0;
}
}
As a big fan of the Path::Tiny module (me have it always installed and using it in every project) my solution would be:
use strict;
use warnings;
use Path::Tiny;
my $found = FindKit('mykit');
print "$found\n";
sub FindKit {
my($nametosearch) = #_;
my $datadir = path($0)->realpath->parent->child('data');
die "$datadir doesn't exists" unless -d $datadir;
for my $file ($datadir->children( qr /^K/ )) {
next if -z $file; #skip empty
my #lines = $file->lines;
return $1 if $lines[0] =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
}
return;
}
Some comments and still opened issues:
Using the Path::Tiny you could always use forward slashes in the path-names, regardless of the OS (UNIX/Windows), e.g. the data/file will work on windows too.
AFAIK the FindBin is considered broken - so the above uses the $0 and realpath ...
what if the Kit is in multiple files? The above always returns on the 1st found one
the my #lines = $file->lines; reads all lines - unnecessary - but on small files doesn't big deal.
the the reality this function returns the arg for the Maakartikel, so probably better name would be find_articel_by_kit or find_articel :)
easy to switch to utf8 - just change the $file->lines to $file->lines_utf8;

Statistics in Perl Script

I have the following question:
I want to create a perl script that reads from a text file (file with several columns of numbers) and calculate some statistics (mean, median, sd, variance). I already built one script, but as I am not in love yet with perl, I can't fix the problems of syntax on it...
Here is my perl script..
#!/usr/bin/perl -w
use strict;
open(FILEHANDLE, data.txt);
while (<FILEHANDLE>) {
shift #ARGV;
my #array = split(\t,$_);
}
close(FILEHANDLE);
###### mean, sum and size
$N = $sum = 0;
$array[$x-1];
$N++;
$sum += $array[$x-1];
###### minimum and the maximum
($min = 0, $max = 0);
$max = $array[$x-1] if ($max < $array[$x-1]), (my#sorted = sort { $a <=> $b } #samples) {
print join(" ",#sorted);
}
##### median
if ($N % 2==1) {
print "$median = $sorted[int($N/2)]\n"; ## check this out
};
else ($median = ($sorted[$N/2] + $sorted[($N/2)-1]) / 2)) {
print "$median\n"; # check this out
};
##### quantiles 1º and 3º
if $qt1 = $sorted[$r25-1] {
print "\n"; # check this out
};
else $qt1 = $fr*($sorted[$ir] - $sorted[$ir-1]) + $sorted[$ir-1] {
print "\n"; # check this out
};
##### variance
for (my $i=0;
$i<scalar(#samples);
$i++)
{
$Var += ($samples[$i]-$mean)**2;
$Var = $Var/($N-1);
};
###### standard error
($Std = sqrt($Var)/ sqrt($N));
############################################################
print "$min\n";
print "$max\n";
print "$mean\n";
print "$median\n";
print "$qt1\n";
print "$var\n";
print "$std\n";
exit(0);
I want to get it working. Please help. THANKS IN ADVANCE!
Errors in your code:
open(FILEHANDLE, data.txt);
data.txt needs to be quoted. You are not checking the return value of the open, e.g. ... or die $!. You should use a lexical filehandle and three argument open, e.g. open my $fh, '<', "data.txt" or die $!.
shift #ARGV;
This does nothing except remove the first value from you argument list, which is then promptly discarded.
my #array = split(\t,$_);
You are using \t as a bareword, it should be a regex, /\t/. Your #array is declared inside a lexical scope of the while loop, and will be undefined outside this block.
$N = $sum = 0;
Both variables are not declared, which will cause the script to die when you use strict (which is a very good idea). Use my $N to solve that. Also, $N is not a very good variable name.
$array[$x-1];
This will do nothing. $x is not declared (see above), and also undefined. The whole statement does nothing, it is like having a line 3;. I believe you will get an error such as Useless use of variable in void context.
$N++;
This increments $N to 1, which is a useless thing to do, since you only a few lines above initialized it to 0.
Well.. the list goes on. I suggest you start smaller, use strict and warnings since they are very good tools, and work out the errors one by one. A very good idea would be to make subroutines of your calculations, e.g.:
sub sum {
# code here
return $sum;
}
Go to perldoc.perl.org and read the documentation. Especially useful would be the syntax related ones and perlfunc.
Also, you should be aware that this functionality can be found in modules, which you can find at CPAN.
Your main problem is you have not declared your variables such as $N, $max, etc.
You need to introduce all new variables with my the first time you reference them. Just like you did with $array and $i. So for example
$N = $sum = 0;
Should become
my( $N, $sum ) = ( 0, 0 );

perl equal strings returns 0 even if they are equal

Perl is continuing to surprise me. I have a code which takes an input from the command line and checks if it is in a file. I have a file like this:
ls
date
pwd
touch
rm
First i read this file as
open(MYDATA,"filename") or die "Can not open file\n";
#commandlist = <MYDATA>;
chomp #commandlist;
close MYDATA;
the argument is in $commandname variable. To check if it is correct i printed to screen.
print $commandname."\n";
it works well. then i write the code.
$count = #commandlist;
for($i=0;$i < $count;$i++)
{
print $commandname;
print $commandlist[$i];
print "\n";
if($commandname eq $commandlist[$i])
{
print "equal\n";
}
}
and it does not print 'equal'. but it should do becaues $commandname variable has the value 'ls' which is in the file. i also print the value of $commandname and $commandlist[$i] to see if "visibly" they are equal and i get the output:
ls
lsls
lsdate
lspwd
lstouch
lsrm
here i see that they got the same value but why never eq operator evaluates to zero.
Additionally to get this task done, I have tried various methods all of which come to be useless like making a hash from the array and using exists.
I am struggling for this seemingly easy problem for a day but i just dont get it.
Thanks in advance
EDIT:
when i change the above loop as below
$count = #commandlist;
for($i=0;$i < $count;$i++)
{
print $commandlist[$i];
print $commandname;
print "\n";
if($commandname eq $commandlist[$i])
{
print "equal\n";
}
}
I got an output like.
ls
ls
lste
lsd
lsuch
ls
it seems like for some reason it overwrites some characters.
EDIT:
my whole script is like:
#reading file code, i posted above
while(<>)
chomp($_);
$commandname = $_;
if($commandname eq "start"){
##something here
} elsif ($commandname eq "machines"){
##something here
} else {
$count = #commandlist;
for($i=0;$i < $count;$i++)
{
print $commandlist[$i];
print $commandname;
print "\n";
if($commandname eq $commandlist[$i])
{
print "equal\n";
}
}
}
A bit change in the code would result in what you are looking for, "chomp" the string from array before you put it for comparison. Here it is
chomp $commandlist[$i];
if($commandname eq $commandlist[$i])
{
print "equal\n";
}
EDIT: as per perldoc chomp when you are chomping a list you should parenthesis. So, in your case ... instead simply saying
chomp #commandlist
make it like
chomp(#commandlist)
FINAL EDIT: I tried this and worked fine. Give it a try
$commandname = $ARGV[0];
open(MYDATA,"chk.txt") or die "Can not open file\n";
#commandlist = <MYDATA>;
chomp(#commandlist);
close MYDATA;
print $commandname."\n";
$count = #commandlist;
print $commandname;
for($i=0;$i < $count;$i++)
{
print $commandlist[$i];
print "\n";
if($commandname eq $commandlist[$i])
{
print "equal\n";
}
}
The overwritting indicates the presence of a CR. The lines end with CR LF, but you only remove the LF with chomp. Change
while (<>) {
chomp($_)
to
while (<>) {
s/\s+\z//;
You might consider restructuring your code as:
my $path='filename';
my $match='ls';
part 1 - read the file
open(my $fh, '<', $path) or die "failed to open $path: $!";
my #commandlist=<$fh>;
chomp #commandlist;
# or you can combine these lines as:
# chomp(my #commandlist=<$fh>);
# because chomp operates on the array itself rather than making a copy.
close($fh);
or
use File::Slurp qw/ read_file /;
# see http://search.cpan.org/dist/File-Slurp/lib/File/Slurp.pm
my #commandlist=read_file($path); # result is pre-chomped!
part 2 - check for a match
foreach my $command (#commandlist) {
print "$match equals $command\n" if $match eq $command;
}
One important consideration is that each line in your file must contain only the command name and cannot begin or end with any spaces or tabs. To compensate for possible leading or trailing whitespace, try:
foreach my $command (#commandlist) {
$command=~s/^\s+|\s+$//g; # strip leading or trailing whitespace
print "$match equals $command\n" if $match eq $command;
}
And finally, always start your Perl script with a Perl developer's bestest friends:
use strict;
use warnings;
which will catch most (if not all) errors caused by sloppy programming practice. (We all suffer from this!)

What's the most defensive way to loop through lines in a file with Perl?

I usually loop through lines in a file using the following code:
open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
while ( my $line = <$fh> ) {
...
}
However, in answering another question, Evan Carroll edited my answer, changing my while statement to:
while ( defined( my $line = <$fh> ) ) {
...
}
His rationale was that if you have a line that's 0 (it'd have to be the last line, else it would have a carriage return) then your while would exit prematurely if you used my statement ($line would be set to "0", and the return value from the assignment would thus also be "0" which gets evaluated to false). If you check for defined-ness, then you don't run into this problem. It makes perfect sense.
So I tried it. I created a textfile whose last line is 0 with no carriage return on it. I ran it through my loop and the loop did not exit prematurely.
I then thought, "Aha, maybe the value isn't actually 0, maybe there's something else there that's screwing things up!" So I used Dump() from Devel::Peek and this is what it gave me:
SV = PV(0x635088) at 0x92f0e8
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0X962600 "0"\0
CUR = 1
LEN = 80
That seems to tell me that the value is actually the string "0", as I get a similar result if I call Dump() on a scalar I've explicitly set to "0" (the only difference is in the LEN field -- from the file LEN is 80, whereas from the scalar LEN is 8).
So what's the deal? Why doesn't my while() loop exit prematurely if I pass it a line that's only "0" with no carriage return? Is Evan's loop actually more defensive, or does Perl do something crazy internally that means you don't need to worry about these things and while() actually only does exit when you hit eof?
Because
while (my $line = <$fh>) { ... }
actually compiles down to
while (defined( my $line = <$fh> ) ) { ... }
It may have been necessary in a very old version of perl, but not any more! You can see this from running B::Deparse on your script:
>perl -MO=Deparse
open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
while ( my $line = <$fh> ) {
...
}
^D
die "Could not open file $file for reading: $!\n" unless open my $fh, '<', $file;
while (defined(my $line = <$fh>)) {
do {
die 'Unimplemented'
};
}
- syntax OK
So you're already good to go!
BTW, this is covered in the I/O Operators section of perldoc perlop:
In scalar context, evaluating a filehandle in angle brackets yields the next line from that file (the newline, if any, included), or "undef" at end-of-file or on error. When $/ is set to "undef" (sometimes known as file-slurp mode) and the file is empty, it returns '' the first time, followed by "undef" subsequently.
Ordinarily you must assign the returned value to a variable, but there is one situation where an automatic assignment happens. If and only if the input symbol is the only thing inside the conditional of a "while" statement (even if disguised as a "for(;;)" loop), the value is automatically assigned to the global variable $_, destroying whatever was there previously. (This may seem like an odd thing to you, but you'll use the construct in almost every Perl script you write.) The $_ variable is not implicitly localized. You'll have to put a "local $_;" before the loop if you want that to happen.
The following lines are equivalent:
while (defined($_ = <STDIN>)) { print; }
while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while defined($_ = <STDIN>);
print while ($_ = <STDIN>);
print while <STDIN>;
This also behaves similarly, but avoids $_ :
while (my $line = <STDIN>) { print $line }
In these loop constructs, the assigned value (whether assignment is automatic or explicit) is then tested to see whether it is defined. The defined test avoids problems where line has a string value that would be treated as false by Perl, for example a "" or a "0" with no trailing newline. If you really mean for such values to terminate the loop, they should be tested for explicitly:
while (($_ = <STDIN>) ne '0') { ... }
while (<STDIN>) { last unless $_; ... }
In other boolean contexts, "<filehandle>" without an explicit "defined" test or comparison elicit a warning if the "use warnings" pragma or the -w command-line switch (the $^W variable) is in effect.
While it is correct that the form of while (my $line=<$fh>) { ... } gets compiled to while (defined( my $line = <$fh> ) ) { ... } consider there are a variety of times when a legitimate read of the value "0" is misinterpreted if you do not have an explicit defined in the loop or testing the return of <>.
Here are several examples:
#!/usr/bin/perl
use strict; use warnings;
my $str = join "", map { "$_\n" } -10..10;
$str.="0";
my $sep='=' x 10;
my ($fh, $line);
open $fh, '<', \$str or
die "could not open in-memory file: $!";
print "$sep Should print:\n$str\n$sep\n";
#Failure 1:
print 'while ($line=chomp_ln()) { print "$line\n"; }:',
"\n";
while ($line=chomp_ln()) { print "$line\n"; } #fails on "0"
rewind();
print "$sep\n";
#Failure 2:
print 'while ($line=trim_ln()) { print "$line\n"; }',"\n";
while ($line=trim_ln()) { print "$line\n"; } #fails on "0"
print "$sep\n";
last_char();
#Failure 3:
# fails on last line of "0"
print 'if(my $l=<$fh>) { print "$l\n" }', "\n";
if(my $l=<$fh>) { print "$l\n" }
print "$sep\n";
last_char();
#Failure 4 and no Perl warning:
print 'print "$_\n" if <$fh>;',"\n";
print "$_\n" if <$fh>; #fails to print;
print "$sep\n";
last_char();
#Failure 5
# fails on last line of "0" with no Perl warning
print 'if($line=<$fh>) { print $line; }', "\n";
if($line=<$fh>) {
print $line;
} else {
print "READ ERROR: That was supposed to be the last line!\n";
}
print "BUT, line read really was: \"$line\"", "\n\n";
sub chomp_ln {
# if I have "warnings", Perl says:
# Value of <HANDLE> construct can be "0"; test with defined()
if($line=<$fh>) {
chomp $line ;
return $line;
}
return undef;
}
sub trim_ln {
# if I have "warnings", Perl says:
# Value of <HANDLE> construct can be "0"; test with defined()
if (my $line=<$fh>) {
$line =~ s/^\s+//;
$line =~ s/\s+$//;
return $line;
}
return undef;
}
sub rewind {
seek ($fh, 0, 0) or
die "Cannot seek on in-memory file: $!";
}
sub last_char {
seek($fh, -1, 2) or
die "Cannot seek on in-memory file: $!";
}
I am not saying these are good forms of Perl! I am saying that they are possible; especially Failure 3,4 and 5. Note the failure with no Perl warning on number 4 and 5. The first two have their own issues...