Return file handle from subroutine and pass to other subroutine - perl

I am trying to create a couple of functions that will work together. getFH should take in the mode to open the file (either > or < ), and then the file itself (from the command line). It should do some checking to see if the file is okay to open, then open it, and return the file handle. doSomething should take in the file handle, and loop over the data and do whatever. However when the program lines to the while loop, I get the error:
readline() on unopened filehandle 1
What am I doing wrong here?
#! /usr/bin/perl
use warnings;
use strict;
use feature qw(say);
use Getopt::Long;
use Pod::Usage;
# command line param(s)
my $infile = '';
my $usage = "\n\n$0 [options] \n
Options
-infile Infile
-help Show this help message
\n";
# check flags
GetOptions(
'infile=s' => \$infile,
help => sub { pod2usage($usage) },
) or pod2usage(2);
my $inFH = getFh('<', $infile);
doSomething($inFH);
## Subroutines ##
## getFH ##
## #params:
## How to open file: '<' or '>'
## File to open
sub getFh {
my ($read_or_write, $file) = #_;
my $fh;
if ( ! defined $read_or_write ) {
die "Read or Write symbol not provided", $!;
}
if ( ! defined $file ) {
die "File not provided", $!;
}
unless ( -e -f -r -w $file ) {
die "File $file not suitable to use", $!;
}
unless ( open( $fh, $read_or_write, $file ) ) {
die "Cannot open $file",$!;
}
return($fh);
}
#Take in filehandle and do something with data
sub doSomething{
my $fh = #_;
while ( <$fh> ) {
say $_;
}
}

my $fh = #_;
This line does not mean what you think it means. It sets $fh to the number of items in #_ rather than the filehandle that is passed in - if you print the value of $fh, it will be 1 instead of a filehandle.
Use my $fh = shift, my $fh = $_[0], or my ($fh) = #_ instead.

As has been pointed out, my $fh = #_ will set $fh to 1, which is not a file handle. Use
my ($fh) = #_
instead to use list assignment
In addition
-e -f -r -w $file will not do what you want. You need
-e $file and -f $file and -r $file and -w $file
And you can make this more concise and efficient by using underscore _ in place of the file name, which will re-use the information fetched for the previous file test
-e $file and -f _ and -r _ and -w _
However, note that you will be rejecting a request if a file isn't writeable, which makes no sense if the request is to open a file for reading. Also, -f will return false if the file doesn't exist, so -e is superfluous
It is good to include $! in your die strings as it contains the reason for the failure, but your first two tests don't set this value up, and so should be just die "Read or Write symbol not provided"; etc.
In addition, die "Cannot open $file", $! should probably be
die qq{Cannot open "$file": $!}
to make it clear if the file name is empty, and to add some space between the message and the value of $!
The lines read from the file will have a newline character at the end, so there is no need for say. Simply print while <$fh> is fine
Perl variable names are conventionally snake_case, so get_fh and do_something is more usual

Related

Counting number of lines with conditions

This is my script count.pl, I am trying to count the number of lines in a file.
The script's code :
chdir $filepath;
if (-e "$filepath"){
$total = `wc -l < file.list`;
printf "there are $total number of lines in file.list";
}
i can get a correct output, but i do not want to count blank lines and anything in the file that start with #. any idea ?
As this is a Perl program already open the file and read it, filtering out lines that don't count with
open my $fh, '<', $filename or die "Can't open $filename: $!";
my $num_lines = grep { not /^$|^\s*#/ } <$fh>;
where $filename is "file.list." If by "blank lines" you mean also lines with spaces only then chagne regex to /^\s*$|^\s*#/. See grep, and perlretut for regex used in its condition.
That filehandle $fh gets closed when the control exits the current scope, or add close $fh; after the file isn't needed for processing any more. Or, wrap it in a block with do
my $num_lines = do {
open my $fh, '<', $filename or die "Can't open $filename: $!";
grep { not /^$|^\s*#/ } <$fh>;
};
This makes sense doing if the sole purpose of opening that file is counting lines.
Another thing though: an operation like chdir should always be checked, and then there is no need for the race-sensitive if (-e $filepath) either. Altogether
# Perhaps save the old cwd first so to be able to return to it later
#my $old_cwd = Cwd::cwd;
chdir $filepath or die "Can't chdir to $filepath: $!";
open my $fh, '<', $filename or die "Can't open $filename: $!";
my $num_lines = grep { not /^$|^\s*#/ } <$fh>;
A couple of other notes:
There is no reason for printf. For all normal prints use say, for which you need use feature qw(say); at the beginning of the program. See feature pragma
Just in case, allow me to add: every program must have at the beginning
use warnings;
use strict;
Perhaps the original intent of the code in the question is to allow a program to try a non-existing location, and not die? In any case, one way to keep the -e test, as asked for
#my $old_cwd = Cwd::cwd;
chdir $filepath or warn "Can't chdir to $filepath: $!";
my $num_lines;
if (-e $filepath) {
open my $fh, '<', $filename or die "Can't open $filename: $!";
$num_lines = grep { not /^$|^\s*#/ } <$fh>;
}
where I still added a warning if chdir fails. Remove that if you really don't want it. I also added a declaration of the variable that is assigned the number of lines, with my $total_lines;. If it is declared earlier in your real code then of course remove that line here.
perl -ne '$n++ unless /^$|^#/ or eof; print "$n\n" if eof'
Works with multiple files too.
perl -ne '$n++ unless /^$|^#/ or eof; END {print "$n\n"}'
Better for a single file.
open(my $fh, '<', $filename);
my $n = 0;
for(<$fh>) { $n++ unless /^$|^#/}
print $n;
Using sed to filter out the "unwanted" lines in a single file:
sed '/^\s*#/d;/^\s*$/d' infile | wc -l
Obviously, you can also replace infile with a list of files.
The solution is very simple, no any magic.
use strict;
use warnings;
use feature 'say';
my $count = 0;
while( <> ) {
$count++ unless /^\s*$|^\s*#/;
}
say "Total $count lines";
Reference:
<>

Search string with multiple words in the pattern

My program is trying to search a string from multiple files in a directory. The code searches for single patterns like perl but fails to search a long string like Status Code 1.
Can you please let me know how to search for strings with multiple words?
#!/usr/bin/perl
my #list = `find /home/ad -type f -mtime -1`;
# printf("Lsit is $list[1]\n");
foreach (#list) {
# print("Now is : $_");
open(FILE, $_);
$_ = <FILE>;
close(FILE);
unless ($_ =~ /perl/) { # works, but fails to find string "Status Code 1"
print "found\n";
my $filename = 'report.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
say $fh "My first report generated by perl";
close $fh;
} # end unless
} # end For
There are a number of problems with your code
You must always use strict and use warnings at the top of every Perl program. There is little point in delcaring anything with my without strict in place
The lines returned by the find command will have a newline at the end which must be removed before Perl can find the files
You should use lexical file handles (my $fh instead of FILE) and the three-parameter form of open as you do with your output file
$_ = <FILE> reads only the first line of the file into $_
unless ($_ =~ /perl/) is inverted logic, and there's no need to specify $_ as it is the default. You should write if ( /perl/ )
You can't use say unless you have use feature 'say' at the top of your program (or use 5.010, which adds all features available in Perl v5.10)
It is also best to avoid using shell commands as Perl is more than able to do anything that you can using command line utilities. In this case -f $file is a test that returns true if the file is a plain file, and -M $file returns the (floating point) number of days since the file's modification time
This is how I would write your program
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
for my $file ( glob '/home/ad/*' ) {
next unless -f $file and int(-M $file) == 1;
open my $fh, '<', $file or die $!;
while ( <$fh> ) {
if ( /perl/ ) {
print "found\n";
my $filename = 'report.txt';
open my $out_fh, '>>', $filename or die "Could not open file '$filename': $!";
say $fh "My first report generated by perl";
close $out_fh;
last;
}
}
}
it should have matched unless $_ contains text in different case.
try this.
unless($_ =~ /Status\s+Code\s+1/i) {
Change
unless ($_ =~ /perl/) {
to:
unless ($_ =~ /(Status Code 1)/) {
I am certain the above works, except it's case sensitive.
Since you question it, I rewrote your script to make more sense of what you're trying to accomplish and implement the above suggestion. Correct me if I am wrong, but you're trying to make a script which matches "Status Code 1" in a bunch of files where last modified within 1 day and print the filename to a text file.
Anyways, below is what I recommend:
#!/usr/bin/perl
use strict;
use warnings;
my $output_file = 'report.txt';
my #list = `find /home/ad -type f -mtime -1`;
foreach my $filename (#list) {
print "PROCESSING: $filename";
open (INCOMING, "<$filename") || die "FATAL: Could not open '$filename' $!";
foreach my $line (<INCOMING>) {
if ($line =~ /(Status Code 1)/) {
open( FILE, ">>$output_file") or die "FATAL: Could not open '$output_file' $!";
print FILE sprintf ("%s\n", $filename);
close(FILE) || die "FATAL: Could not CLOSE '$output_file' $!";
# Bail when we get the first match
last;
}
}
close(INCOMING) || die "FATAL: Could not close '$filename' $!";
}

Can't find error "Global symbol #xx requires explicit package name"

I have checked the questions that may already have an answer and none of them have helped.
This is for my semester project for Unix Programming. I have created a script that compares HTML files to one other from a website.
The script worked perfectly as expected until I tried to implement the second website, so in turn I deleted the added code for the second website and now I get the errors
Global symbol "#master" requires explicit package name
Global symbol "#child" requires explicit package name
within the csite_md5 subroutine. I have gone through the code many times over and cannot see the problem.
I am looking for another set of eyes to see if I'm just missing something simple, which usually is the case.
Also I am new to Perl as this is my first time using the language.
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Basename;
# Path to the c-site download root directory
my $csite_dir = '/root/websites/c-site/wget/';
opendir my $dh, $csite_dir or die $!;
# Finds the sub directories c-site_'date +%F' where the c-site download is located
my #wget_subdir_csite = sort grep /^[^.]/, readdir $dh;
# Creates the absolute path to the c-site download
my $csite_master_dir = "$csite_dir$wget_subdir_csite[0]/dayzunderground.webs.com";
my $csite_child_dir = "$csite_dir$wget_subdir_csite[1]/dayzunderground.webs.com";
# Call to subroutine to append the .html file name to the absolute path
my #master_csite = &gethtml_master_csite($csite_master_dir);
my #child_csite = &gethtml_child_csite($csite_child_dir);
&csite_md5(\#master_csite, \#child_csite);
sub gethtml_master_csite{
my ($master_path) = #_;
opendir (DIR, $master_path) or die $!;
# Ends with .html and is a file
my #html_master = sort grep {m/\.html$/i && -f "$master_path/$_"} readdir(DIR);
my #files_master = ("$master_path/$html_master[0]","$master_path/$html_master[1]","$master_path/$html_master[2]","$master_path/$html_master[3]");
return #files_master
}
sub gethtml_child_csite{
my ($child_path) = #_;
opendir (DIR, $child_path) or die $!;
# Ends with .html and is a file
my #html_child = sort grep {m/\.html$/i && -f "$child_path/$_"} readdir(DIR);
my #files_child = ("$child_path/$html_child[0]","$child_path/$html_child[1]","$child_path/$html_child[2]","$child_path/$html_child[3]");
return #files_child
}
sub csite_md5{
my ($master, $child) = #_;
if(&md5sum($master[0]) ne &md5sum($child[0])){
my $filename = basename($master[0]);
system("diff -u -d -t --width=100 $master[0] $child[0] > ~/websites/c-site/diff/c-site-$filename-`date +%F`");
#print "1"
}
if(&md5sum($master[1]) ne &md5sum($child[1])){
my $filename2 = basename($master[1]);
system("diff -u -d -t --width=100 $master[1] $child[1] > ~/websites/c-site/diff/c-site-$filename2-`date +%F`");
#print "2"
}
if(&md5sum($master[2]) ne &md5sum($child[2])){
my $filename3 = basename($master[2]);
system("diff -u -d -t --width=100 $master[2] $child[2] > ~/websites/c-site/diff/c-site-$filename3-`date +%F`");
#print "3"
}
if(&md5sum($master[3]) ne &md5sum($child[3])){
my $filename4 = basename($master[3]);
system("diff -u -d -t --width=100 $master[3] $child[3] > ~/websites/c-site/diff/c-site-$filename4-`date +%F`");
#print "4"
}
}
sub md5sum{
my $file = shift;
my $digest = "";
eval{
open(FILE, $file) or die "Can't find file $file\n";
my $ctx = Digest::MD5->new;
$ctx->addfile(*FILE);
$digest = $ctx->hexdigest;
close(FILE);
};
if($#){
print $#;
return "";
}
return $digest
}
$master and $child are array references; use them like $master->[0]. $master[0] uses the array #master, which is a completely separate variable.
I thought it may help to go through your program and point out some practices that are less than optimal
You shouldn't use an ampersand & when calling a Perl subroutine. That was required in Perl 4 which was superseded about 22 years ago
It is preferable to use the File::Spec module to manipulate file paths, both to handle cases like multiple path separators and for portability. File::Spec will also do the job of File::BaseName
It is unnecessary to use the shell to create a date string. Use the Time::Piece module and localtime->ymd generates the same string as date +%F
It is neater and more concise to use map where appropriate instead of writing multiple identical assignments
The gethtml_master_csite and gethtml_child_csite subroutines are identical except that they use different variable names internally. They can be replaced by a single gethtml_csite subroutine
You should use lexical file and directory handles throughout, as you have done with the first opendir. You should also use the three-parameter form of open (with the open mode as the second parameter)
If an open fails then you should include the variable $! in the die string so that you know why it failed. Also, if you end the string with a newline then Perl won't append the source file and line number to the string when it is printed
As you have read, the csite_md5 attempts to use arrays #master and #child which don't exist. You have array references $master and $child instead. Also, the subroutine lends itself to a loop structure instead of writing the four comparisons explicitly
In md5sum you have used an eval to catch the die when the open call fails. It is nicer to check for this explicitly
The standard way of returning a false value from a subroutine is a bare return. If you return '' then it will evaluate as true in list context
With those chnages in place your code looks like this. Please ask if you have any problem understanding it. Note that I haven't been able to test it but it does compile
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec::Functions qw/ catdir catfile splitpath /;
use Time::Piece 'localtime';
my $csite_dir = '/root/websites/c-site/wget/';
opendir my $dh, $csite_dir or die qq{Unable to open "$csite_dir": $!};
my #wget_subdir_csite = sort grep /^[^.]/, readdir $dh;
my ($csite_master_dir, $csite_child_dir) = map
catdir($csite_dir, $_, 'dayzunderground.webs.com'),
#wget_subdir_csite[0,1];
my #master_csite = gethtml_csite($csite_master_dir);
my #child_csite = gethtml_csite($csite_child_dir);
csite_md5(\#master_csite, \#child_csite);
sub gethtml_csite {
my ($path) = #_;
opendir my $dh, $path or die qq{Unable to open "$path": $!};
my #files = sort grep { /\.html$/i and -f } map catfile($path, $_), readdir $dh;
return #files;
}
sub csite_md5 {
my ($master_list, $child_list) = #_;
for my $i ( 0 .. $#$master_list ) {
my ($master, $child) = ($master_list->[$i], $child_list->[$i]);
if ( md5sum($master) ne md5sum($child) ) {
my $filename = (splitpath($master))[-1]; # Returns (volume, path, file)
my $date = localtime->ymd;
system("diff -u -d -t --width=100 $master $child > ~/websites/c-site/diff/c-site-$filename-$date");
}
}
}
sub md5sum {
my ($file) = #_;
my $digest = "";
open my $fh, '<', $file or do {
warn qq{Can't open file "$file": $!}; # '
return;
};
my $ctx = Digest::MD5->new;
$ctx->addfile($fh);
return $ctx->hexdigest;
}

foreach and special variable $_ not behaving as expected

I'm learning Perl and wrote a small script to open perl files and remove the comments
# Will remove this comment
my $name = ""; # Will not remove this comment
#!/usr/bin/perl -w <- wont remove this special comment
The name of files to be edited are passed as arguments via terminal
die "You need to a give atleast one file-name as an arguement\n" unless (#ARGV);
foreach (#ARGV) {
$^I = "";
(-w && open FILE, $_) || die "Oops: $!";
/^\s*#[^!]/ || print while(<>);
close FILE;
print "Done! Please see file: $_\n";
}
Now when I ran it via Terminal:
perl removeComments file1.pl file2.pl file3.pl
I got the output:
Done! Please see file:
This script is working EXACTLY as I'm expecting but
Issue 1 : Why $_ didn't print the name of the file?
Issue 2 : Since the loop runs for 3 times, why Done! Please see file: was printed only once?
How you would write this script in as few lines as possible?
Please comment on my code as well, if you have time.
Thank you.
The while stores the lines read by the diamond operator <> into $_, so you're writing over the variable that stores the file name.
On the other hand, you open the file with open but don't actually use the handle to read; it uses the empty diamond operator instead. The empty diamond operator makes an implicit loop over files in #ARGV, removing file names as it goes, so the foreach runs only once.
To fix the second issue you could use while(<FILE>), or rewrite the loop to take advantage of the implicit loop in <> and write the entire program as:
$^I = "";
/^\s*#[^!]/ || print while(<>);
Here's a more readable approach.
#!/usr/bin/perl
# always!!
use warnings;
use strict;
use autodie;
use File::Copy;
# die with some usage message
die "usage: $0 [ files ]\n" if #ARGV < 1;
for my $filename (#ARGV) {
# create tmp file name that we are going to write to
my $new_filename = "$filename\.new";
# open $filename for reading and $new_filename for writing
open my $fh, "<", $filename;
open my $new_fh, ">", $new_filename;
# Iterate over each line in the original file: $filename,
# if our regex matches, we bail out. Otherwise we print the line to
# our temporary file.
while(my $line = <$fh>) {
next if $line =~ /^\s*#[^!]/;
print $new_fh $line;
}
close $fh;
close $new_fh;
# use File::Copy's move function to rename our files.
move($filename, "$filename\.bak");
move($new_filename, $filename);
print "Done! Please see file: $filename\n";
}
Sample output:
$ ./test.pl a.pl b.pl
Done! Please see file: a.pl
Done! Please see file: b.pl
$ cat a.pl
#!/usr/bin/perl
print "I don't do much\n"; # comments dont' belong here anyways
exit;
print "errrrrr";
$ cat a.pl.bak
#!/usr/bin/perl
# this doesn't do much
print "I don't do much\n"; # comments dont' belong here anyways
exit;
print "errrrrr";
Its not safe to use multiple loops and try to get the right $_. The while Loop is killing your $_. Try to give your files specific names inside that loop. You can do this with so:
foreach my $filename(#ARGV) {
$^I = "";
(-w && open my $FILE,'<', $filename) || die "Oops: $!";
/^\s*#[^!]/ || print while(<$FILE>);
close FILE;
print "Done! Please see file: $filename\n";
}
or that way:
foreach (#ARGV) {
my $filename = $_;
$^I = "";
(-w && open my $FILE,'<', $filename) || die "Oops: $!";
/^\s*#[^!]/ || print while(<$FILE>);
close FILE;
print "Done! Please see file: $filename\n";
}
Please never use barewords for filehandles and do use a 3-argument open.
open my $FILE, '<', $filename — good
open FILE $filename — bad
Simpler solution: Don't use $_.
When Perl was first written, it was conceived as a replacement for Awk and shell, and Perl heavily borrowed from that syntax. Perl also for readability created the special variable $_ which allowed you to use various commands without having to create variables:
while ( <INPUT> ) {
next if /foo/;
print OUTPUT;
}
The problem is that if everything is using $_, then everything will effact $_ in many unpleasant side effects.
Now, Perl is a much more sophisticated language, and has things like locally scoped variables (hint: You don't use local to create these variables -- that merely gives _package variables (aka global variables) a local value.)
Since you're learning Perl, you might as well learn Perl correctly. The problem is that there are too many books that are still based on Perl 3.x. Find a book or web page that incorporates modern practice.
In your program, $_ switches from the file name to the line in the file and back to the next file. It's what's confusing you. If you used named variables, you could distinguished between files and lines.
I've rewritten your program using more modern syntax, but your same logic:
use strict;
use warnings;
use autodie;
use feature qw(say);
if ( not $ARGV[0] ) {
die "You need to give at least one file name as an argument\n";
}
for my $file ( #ARGV ) {
# Remove suffix and copy file over
if ( $file =~ /\..+?$/ ) {
die qq(File "$file" doesn't have a suffix);
}
my ( $output_file = $file ) =~ s/\..+?$/./; #Remove suffix for output
open my $input_fh, "<", $file;
open my $output_fh, ">", $output_file;
while ( my $line = <$input_fh> ) {
print {$output_fh} $line unless /^\s*#[^!]/;
}
close $input_fh;
close $output_fh;
}
This is a bit more typing than your version of the program, but it's easier to see what's going on and maintain.

Filehandle open() and the split variable

I am a beginner in Perl.
What I do not understand is the following:
To write a script that can:
Print the lines of the file $source with a comma delimiter.
Print the formatted lines to an output file.
Allow this output file to be specified in command-line.
Code:
my ( $source, $outputSource ) = #ARGV;
open( INPUT, $source ) or die "Unable to open file $source :$!";
Question: I do not understand how one can specify in the command line, upon starting to write the code the text of the output file.
I would rely on redirection operator in the shell instead, such as:
script.pl input.txt > output.txt
Then it is a simple case of doing this:
use strict;
use warnings;
while (<ARGV>) {
s/\n/,/;
print;
}
Then you can even merge several files with script.pl input1.txt input2.txt ... > output_all.txt. Or just do one file at the time, with one argument.
If I understood your question right I hope this example can help.
Program:
use warnings;
use strict;
## Check input and output file as arguments in command line.
die "Usage: perl $0 input-file output-file\n" unless #ARGV == 2;
my ( $source, $output_source ) = #ARGV;
## Open both files, one for reading and other for writing.
open my $input, "<", $source or
die "Unable to open file $source : $!\n";
open my $output, ">", $output_source or
die "Unable to open file $output_source : $!\n";
## Read all file line by line, substitute the end of line with a ',' and print
## to output file.
while ( my $line = <$input> ) {
$line =~ tr/\n/,/;
printf $output "%s", $line;
}
close $input;
close $output;
Execution:
$ perl script.pl infile outfile