Can't find error "Global symbol @xx requires explicit package name" - perl

I have checked the questions that may already have an answer and none of them have helped.
This is for my semester project for Unix Programming. I have created a script that compares HTML files from a website to one another.
The script worked as expected until I tried to add a second website, so I deleted the code I had added for it, and now I get the errors
Global symbol "#master" requires explicit package name
Global symbol "#child" requires explicit package name
within the csite_md5 subroutine. I have gone through the code many times over and cannot see the problem.
I am looking for another set of eyes to see if I'm just missing something simple, which usually is the case.
Also I am new to Perl as this is my first time using the language.
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Basename;
# Path to the c-site download root directory
my $csite_dir = '/root/websites/c-site/wget/';
opendir my $dh, $csite_dir or die $!;
# Finds the sub directories c-site_'date +%F' where the c-site download is located
my @wget_subdir_csite = sort grep /^[^.]/, readdir $dh;
# Creates the absolute path to the c-site download
my $csite_master_dir = "$csite_dir$wget_subdir_csite[0]/dayzunderground.webs.com";
my $csite_child_dir = "$csite_dir$wget_subdir_csite[1]/dayzunderground.webs.com";
# Call to subroutine to append the .html file name to the absolute path
my @master_csite = &gethtml_master_csite($csite_master_dir);
my @child_csite = &gethtml_child_csite($csite_child_dir);
&csite_md5(\@master_csite, \@child_csite);
sub gethtml_master_csite{
my ($master_path) = @_;
opendir (DIR, $master_path) or die $!;
# Ends with .html and is a file
my @html_master = sort grep {m/\.html$/i && -f "$master_path/$_"} readdir(DIR);
my @files_master = ("$master_path/$html_master[0]","$master_path/$html_master[1]","$master_path/$html_master[2]","$master_path/$html_master[3]");
return @files_master
}
sub gethtml_child_csite{
my ($child_path) = @_;
opendir (DIR, $child_path) or die $!;
# Ends with .html and is a file
my @html_child = sort grep {m/\.html$/i && -f "$child_path/$_"} readdir(DIR);
my @files_child = ("$child_path/$html_child[0]","$child_path/$html_child[1]","$child_path/$html_child[2]","$child_path/$html_child[3]");
return @files_child
}
sub csite_md5{
my ($master, $child) = @_;
if(&md5sum($master[0]) ne &md5sum($child[0])){
my $filename = basename($master[0]);
system("diff -u -d -t --width=100 $master[0] $child[0] > ~/websites/c-site/diff/c-site-$filename-`date +%F`");
#print "1"
}
if(&md5sum($master[1]) ne &md5sum($child[1])){
my $filename2 = basename($master[1]);
system("diff -u -d -t --width=100 $master[1] $child[1] > ~/websites/c-site/diff/c-site-$filename2-`date +%F`");
#print "2"
}
if(&md5sum($master[2]) ne &md5sum($child[2])){
my $filename3 = basename($master[2]);
system("diff -u -d -t --width=100 $master[2] $child[2] > ~/websites/c-site/diff/c-site-$filename3-`date +%F`");
#print "3"
}
if(&md5sum($master[3]) ne &md5sum($child[3])){
my $filename4 = basename($master[3]);
system("diff -u -d -t --width=100 $master[3] $child[3] > ~/websites/c-site/diff/c-site-$filename4-`date +%F`");
#print "4"
}
}
sub md5sum{
my $file = shift;
my $digest = "";
eval{
open(FILE, $file) or die "Can't find file $file\n";
my $ctx = Digest::MD5->new;
$ctx->addfile(*FILE);
$digest = $ctx->hexdigest;
close(FILE);
};
if($@){
print $@;
return "";
}
return $digest
}

$master and $child are array references; use them like $master->[0]. $master[0] uses the array @master, which is a completely separate variable.
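As a minimal illustration of the difference (the names here are made up, not taken from your script):

my @list = ('a.html', 'b.html');
my $ref  = \@list;          # $ref holds a reference to @list

print $ref->[0], "\n";      # dereference the reference: prints "a.html"
print $list[0], "\n";       # the array itself: also prints "a.html"
# print $ref[0];            # WRONG: element 0 of a completely separate array @ref;
#                           # under strict this fails with
#                           # Global symbol "@ref" requires explicit package name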

I thought it may help to go through your program and point out some practices that are less than optimal
You shouldn't use an ampersand & when calling a Perl subroutine. That was required in Perl 4, which was superseded about 22 years ago
It is preferable to use the File::Spec module to manipulate file paths, both to handle cases like multiple path separators and for portability. File::Spec will also do the job of File::Basename
It is unnecessary to use the shell to create a date string. Use the Time::Piece module; localtime->ymd generates the same string as date +%F
It is neater and more concise to use map where appropriate instead of writing multiple identical assignments
The gethtml_master_csite and gethtml_child_csite subroutines are identical except that they use different variable names internally. They can be replaced by a single gethtml_csite subroutine
You should use lexical file and directory handles throughout, as you have done with the first opendir. You should also use the three-parameter form of open (with the open mode as the second parameter)
If an open fails then you should include the variable $! in the die string so that you know why it failed. Also, if you end the string with a newline then Perl won't append the source file and line number to the string when it is printed
As you have read, the csite_md5 subroutine attempts to use arrays @master and @child, which don't exist. You have array references $master and $child instead. Also, the subroutine lends itself to a loop structure instead of writing the four comparisons explicitly
In md5sum you have used an eval to catch the die when the open call fails. It is nicer to check for this explicitly
The standard way of returning a false value from a subroutine is a bare return. If you return '' then it will evaluate as true in list context
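As a quick sketch of that last point (made-up subroutine names):

sub empty_string { return "" }   # returns a one-element list: ("")
sub bare_return  { return }      # returns an empty list in list context, undef in scalar

my @a = empty_string();          # @a holds one element, so it tests true
my @b = bare_return();           # @b is empty, so it tests false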
With those changes in place your code looks like this. Please ask if you have any problem understanding it. Note that I haven't been able to test it but it does compile
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec::Functions qw/ catdir catfile splitpath /;
use Time::Piece 'localtime';
my $csite_dir = '/root/websites/c-site/wget/';
opendir my $dh, $csite_dir or die qq{Unable to open "$csite_dir": $!};
my @wget_subdir_csite = sort grep /^[^.]/, readdir $dh;
my ($csite_master_dir, $csite_child_dir) = map
catdir($csite_dir, $_, 'dayzunderground.webs.com'),
@wget_subdir_csite[0,1];
my @master_csite = gethtml_csite($csite_master_dir);
my @child_csite = gethtml_csite($csite_child_dir);
csite_md5(\@master_csite, \@child_csite);
sub gethtml_csite {
my ($path) = @_;
opendir my $dh, $path or die qq{Unable to open "$path": $!};
my @files = sort grep { /\.html$/i and -f } map catfile($path, $_), readdir $dh;
return @files;
}
sub csite_md5 {
my ($master_list, $child_list) = @_;
for my $i ( 0 .. $#$master_list ) {
my ($master, $child) = ($master_list->[$i], $child_list->[$i]);
if ( md5sum($master) ne md5sum($child) ) {
my $filename = (splitpath($master))[-1]; # Returns (volume, path, file)
my $date = localtime->ymd;
system("diff -u -d -t --width=100 $master $child > ~/websites/c-site/diff/c-site-$filename-$date");
}
}
}
sub md5sum {
my ($file) = @_;
my $digest = "";
open my $fh, '<', $file or do {
warn qq{Can't open file "$file": $!}; # '
return;
};
my $ctx = Digest::MD5->new;
$ctx->addfile($fh);
return $ctx->hexdigest;
}

Related

Recovering a specific line in multiple .txt in a directory using Perl

I have the results of a program that performs a search and gives me 2000+ .txt files. I just need a specific line from each file; this is what I have been trying with Perl:
opendir(DIR, $dirname) or die "Could not open $dirname\n";
while ($filename = readdir(DIR)) {
print "$filename\n";
open ($filename, '<', $filename)or die("Could not open file.");
my $line;
while( <$filename> ) {
if( $. == $27 ) {
print "$line\n";
last;
}
}
}
closedir(DIR);
But there is a problem with $filename in line 5, and I don't know an alternative to it that lets me avoid naming each file manually.
Several issues with that code:
Using an old-school bareword identifier for the directory handle instead of an autovivified variable like you are for the file handle.
Using the same variable for the filename and file handle is pretty strange.
You don't check to see if the file is a directory or something else other than a plain file before trying to open it.
$27?
You never assign anything to that $line variable before printing it.
Unless $directory is your program's current working directory, you're running into an issue mentioned in the readdir documentation
If you're planning to filetest the return values out of a readdir, you'd better prepend the directory in question. Otherwise, because we didn't chdir there, it would have been testing the wrong file.
(Substitute open for filetest)
Always use strict; and use warnings;.
Personally, if you just want to print the 27th line of a large number of files, I'd turn to awk and find (using its -exec ... + form to avoid potential errors about the command line maximum length being hit):
find directory/ -maxdepth 1 -type f -exec awk 'FNR == 27 { print FILENAME; print }' \{\} \+
If you're on a Windows system without standard unix tools like those installed, or it's part of a bigger program, a fixed up perl way:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw/say/;
use File::Spec;
my $directory = shift;
opendir(my $dh, $directory);
while (my $filename = readdir $dh) {
my $fullname = File::Spec->catfile($directory, $filename); # Construct a full path to the file
next unless -f $fullname; # Only look at regular files
open my $fh, "<", $fullname;
while (my $line = <$fh>) {
if ($. == 27) {
say $fullname;
print $line;
last;
}
}
close $fh;
}
closedir $dh;
You might also consider using glob to get the filenames instead of opendir/readdir/closedir.
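A rough sketch of that glob-based variant (untested; it assumes the same 27th-line task and mirrors the names used above):

#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw/say/;

my $directory = shift;

# glob returns names with the directory prefix already attached
for my $fullname (glob "$directory/*") {
    next unless -f $fullname;    # only look at regular files
    open my $fh, "<", $fullname;
    while (my $line = <$fh>) {
        if ($. == 27) {
            say $fullname;
            print $line;
            last;
        }
    }
    close $fh;
}

Note that glob splits its pattern on whitespace, so a directory path containing spaces would need extra quoting inside the pattern.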
And if you have Path::Tiny available, a simpler version is:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw/say/;
use Path::Tiny;
my $directory = shift;
my $dir = path $directory;
for my $file ($dir->children) {
next unless -f $file;
my @lines = $file->lines({count => 27});
if (@lines == 27) {
say $file;
print $lines[-1];
}
}

How do I execute a unix command containing a perl variable in perl

In the following Perl code, I am trying to copy a file, whose name is in the Perl variable $file, from one directory to another directory with:
system("cp $file $Output_Dir");
This command writes down the file name alright but then says:
cp: cannot stat 'tasmax_AFR-44_CNRM-CERFACS-CNRM-CM5_historical_r1i1p1_CLMcom-CCLM4-8-17_v1_day_19910101-19951231.nc': No such file or directory
The command
system("#sixfiles = ls $Vars[$kk]}*");
gives me the error:
sh: 1: =: not found
I wonder what is wrong with this code. Assistance will be appreciated.
#!/usr/bin/perl -w
use strict;
use warnings;
use File::Path;
use File::Copy;
my $debug = 1;
my @Vars = ("pr","tasmin","tasmax");
my $Vars;
my @sixfiles;
my $sixfiles;
my $Input_Dir = "/home/zmumba/DATA/Input_Dir";
my $Output_Dir = "/home/zmumba/DATA/Output_Dir";
for (my $kk=0; $kk < @Vars; ++$kk) {
opendir my $in_dir, $Input_Dir or die "opendir failed on $Input_Dir: $! ($^E)";
while (my $file=readdir $in_dir) {
next unless $file =~ /^$Vars[$kk]/;
next if -d $file;
print "$file\n";
print "Copying $file\n" if $debug;
my $cmd01 = "cp $file $Output_Dir";
print "Doing system ($cmd01)\n" if $debug;
system ($cmd01);
system("#sixfiles = ls $Vars[$kk]}*");
}
}
Try this:
use feature qw(say);
use strict;
use warnings;
use File::Spec;
my @Vars = ("pr","tasmin","tasmax");
my $Input_Dir = "/home/zmumba/DATA/Input_Dir";
my $Output_Dir = "/home/zmumba/DATA/Output_Dir";
opendir my $in_dir, $Input_Dir or die "opendir failed on $Input_Dir: $! ($^E)";
while (my $file=readdir $in_dir) {
next if ($file eq '.') || ($file eq '..');
next if -d $file;
next if !grep { $file =~ /^$_/ } @Vars;
say "Copying $file";
$file = File::Spec->catfile( $Input_Dir, $file );
system "cp", $file, $Output_Dir;
}
system ($cmd01);
Gives:
cp: cannot stat '<long-but-correct-file-name>': No such file or directory
This is almost certainly because you are not running the code from $Input_Dir, so that file doesn't exist in your current directory. You need to either chdir to the correct directory or add the directory path to the front of the file name variable.
system("#sixfiles = ls $Vars[$kk]}*");
This code makes no sense. The code passed to system() needs to be Unix shell code. That's the ls $Vars[$kk]}* bit (but I'm not sure where that } comes from). You can't populate a Perl array inside a shell command. You would need to capture the value returned from the ls command and then parse it somehow to separate it into a list.
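For example, a rough sketch using the question's own $Input_Dir and $Vars[$kk] (backticks capture the command's standard output, though letting Perl expand the pattern with glob avoids the shell entirely):

# capture the output of ls into a Perl array, one line per element
my @sixfiles = `ls $Input_Dir/$Vars[$kk]*`;
chomp @sixfiles;    # strip the trailing newlines

# ...or skip the shell and let Perl expand the wildcard itself
@sixfiles = glob "$Input_Dir/$Vars[$kk]*";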
You can give a try with the following code:
#!/usr/bin/env perl
use strict;
use warnings;
my $debug = 1;
my @Vars = ("pr", "tasmin", "tasmax");
my $Vars;
my $Input_Dir = "/home/zmumba/DATA/Input_Dir";
my $Output_Dir = "/home/zmumba/DATA/Output_Dir";
my ($cpsrc, $cpdest) = ('', '');
print "No Write Permission: $!" unless(-w $Output_Dir);
for my $findex (0 .. $#Vars) {
$cpsrc = qq($Input_Dir/$Vars[$findex]);
print "$Vars[$findex]\n";
print "Copying $Vars[$findex]\n" if $debug;
my $cmd01 = "cp $cpsrc $Output_Dir";
print "Doing system ($cmd01)\n" if $debug;
system($cmd01);
}
You don't have to go through each file in the source dir: you already know which files to copy from the source.
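Since File::Copy is already loaded in the original script, one further (untested) simplification is to skip the shell and cp altogether:

use File::Copy 'copy';

for my $var (@Vars) {
    # let Perl expand the pattern instead of the shell
    for my $file (glob "$Input_Dir/$var*") {
        copy($file, $Output_Dir) or warn "Could not copy $file: $!";
    }
}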

Return file handle from subroutine and pass to other subroutine

I am trying to create a couple of functions that will work together. getFH should take in the mode to open the file (either > or <), and then the file itself (from the command line). It should do some checking to see if the file is okay to open, then open it, and return the file handle. doSomething should take in the file handle, and loop over the data and do whatever. However, when the program gets to the while loop, I get the error:
readline() on unopened filehandle 1
What am I doing wrong here?
#! /usr/bin/perl
use warnings;
use strict;
use feature qw(say);
use Getopt::Long;
use Pod::Usage;
# command line param(s)
my $infile = '';
my $usage = "\n\n$0 [options] \n
Options
-infile Infile
-help Show this help message
\n";
# check flags
GetOptions(
'infile=s' => \$infile,
help => sub { pod2usage($usage) },
) or pod2usage(2);
my $inFH = getFh('<', $infile);
doSomething($inFH);
## Subroutines ##
## getFH ##
## @params:
## How to open file: '<' or '>'
## File to open
sub getFh {
my ($read_or_write, $file) = @_;
my $fh;
if ( ! defined $read_or_write ) {
die "Read or Write symbol not provided", $!;
}
if ( ! defined $file ) {
die "File not provided", $!;
}
unless ( -e -f -r -w $file ) {
die "File $file not suitable to use", $!;
}
unless ( open( $fh, $read_or_write, $file ) ) {
die "Cannot open $file",$!;
}
return($fh);
}
#Take in filehandle and do something with data
sub doSomething{
my $fh = @_;
while ( <$fh> ) {
say $_;
}
}
my $fh = @_;
This line does not mean what you think it means. It sets $fh to the number of items in @_ rather than the filehandle that is passed in - if you print the value of $fh, it will be 1 instead of a filehandle.
Use my $fh = shift, my $fh = $_[0], or my ($fh) = @_ instead.
As has been pointed out, my $fh = @_ will set $fh to 1, which is not a file handle. Use
my ($fh) = @_
instead to use list assignment
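A small demonstration of the two kinds of assignment (illustrative only):

sub demo {
    my $count   = @_;    # scalar context: the number of arguments
    my ($first) = @_;    # list context: the first argument itself
    return ($count, $first);
}

my ($n, $fh) = demo(\*STDIN);    # $n is 1, $fh is the filehandle (a glob reference)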
In addition
-e -f -r -w $file will not do what you want. You need
-e $file and -f $file and -r $file and -w $file
And you can make this more concise and efficient by using underscore _ in place of the file name, which will re-use the information fetched for the previous file test
-e $file and -f _ and -r _ and -w _
However, note that you will be rejecting a request if a file isn't writeable, which makes no sense if the request is to open a file for reading. Also, -f will return false if the file doesn't exist, so -e is superfluous
It is good to include $! in your die strings as it contains the reason for the failure, but your first two tests don't set this value up, and so should be just die "Read or Write symbol not provided"; etc.
In addition, die "Cannot open $file", $! should probably be
die qq{Cannot open "$file": $!}
to make it clear if the file name is empty, and to add some space between the message and the value of $!
The lines read from the file will have a newline character at the end, so there is no need for say. Simply print while <$fh> is fine
Perl variable names are conventionally snake_case, so get_fh and do_something is more usual

Print files and subdirectories of given directory

I am trying to get all files and directories from a given directory, but I can't determine the type (file/directory). Nothing is being printed. What am I doing wrong, and how can I solve it? Here is the code:
sub DoSearch {
my $currNode = shift;
my $currentDir = opendir (my $dirHandler, $currNode->rootDirectory) or die $!;
while (my $node = readdir($dirHandler)) {
if ($node eq '.' or $node eq '..') {
next;
}
print "File: " . $node . "\n" if -f $node;
print "Directory " . $node . "\n" if -d $node;
}
closedir($dirHandler);
}
readdir returns only the node name without any path information. The file test operators will look in the current working directory if no path is specified, and because the current directory isn't $currNode->rootDirectory they won't be found
I suggest you use rel2abs from the File::Spec::Functions core module to combine the node name with the path. You can use string concatenation, but the library function takes care of corner cases like whether the directory ends with a slash
It's also worth pointing out that Perl identifiers are most often in snake_case, and people familiar with the language would thank you for not using capital letters. They should especially be avoided for the first character of an identifier, as names like that are reserved for globals like package names
I think your subroutine should look like this
use File::Spec::Functions 'rel2abs';
sub do_search {
my ($curr_node) = @_;
my $dir = $curr_node->rootDirectory;
opendir my $dh, $dir or die qq{Unable to open directory "$dir": $!};
while ( my $node = readdir $dh ) {
next if $node eq '.' or $node eq '..';
my $fullname = rel2abs($node, $dir);
print "File: $node\n" if -f $fullname;
print "Directory $node\n" if -d $fullname;
}
}
An alternative method is to set the current working directory to the directory being read. That way there is no need to manipulate file paths, but you would need to save and restore the original working directory before and after changing it
The Cwd core module provides getcwd and your code would look like this
use Cwd 'getcwd';
sub do_search {
my ($curr_node) = @_;
my $cwd = getcwd;
chdir $curr_node->rootDirectory or die $!;
opendir my $dh, '.' or die $!;
while ( my $node = readdir $dh ) {
next if $node eq '.' or $node eq '..';
print "File: \n" if -f $node;
print "Directory $node\n" if -d $node;
}
chdir $cwd or die $!;
}
Use the File::Find module to get all files and subdirectories recursively.
use File::Find;
find(\&getFile, $dir);
my @fileList;
sub getFile{
print $File::Find::name."\n";
# Below lines will print only file name.
#if ($File::Find::name =~ /.*\/(.*)/ && $1 =~ /\./){
#push @fileList, $File::Find::name."\n";
}
Already answered, but sometimes it is handy not to care about the implementation details, and you can use a CPAN module to hide them.
One of them is the wonderful Path::Tiny module.
Your code could be as:
use 5.014; #strict + feature 'say' + ...
use warnings;
use Path::Tiny;
do_search($_) for @ARGV;
sub do_search {
my $curr_node = path(shift);
for my $node ($curr_node->children) {
say "Directory : $node" if -d $node;
say "Plain File : $node" if -f $node;
}
}
The children method excludes the . and the .. automatically.
You also need to understand that the -f test is true only for real files. So the above code excludes, for example, symlinks (which point to real files), FIFO files, and so on. Such "files" can usually be opened and read as plain files, so sometimes, instead of -f, it is handy to use the -e && ! -d test (i.e. exists, but is not a directory).
The Path::Tiny has some methods for this, e.g. you could write
for my $node ($curr_node->children) {
print "Directory : $node\n" if $node->is_dir;
print "File : $node\n" if $node->is_file;
}
the is_file method is usually DWIM - i.e. it does the -e && ! -d check.
Using the Path::Tiny you could also easily extend your function to walk the whole tree using the iterator method:
use 5.014;
use warnings;
use Path::Tiny;
do_search($_) for @ARGV;
sub do_search {
#maybe you need some error-checking here for the existence of the argument or like...
my $iterator = path(shift)->iterator({recurse => 1});
while( my $node = $iterator->() ) {
say "Directory : ", $node->absolute if $node->is_dir;
say "File : ", $node->absolute if $node->is_file;
}
}
The above prints the type of every file and directory recursively down from the given argument...
And so on... Path::Tiny is really worth installing.

In Perl, how can I filter all log files in a directory, and extract interesting lines?

I'm trying to select only the .log files in my directory and then search in those files for the word "unbound" and print the entire line into a new output file with the same name as the log file (number###.log) but with a .txt extension. This is what I have so far:
#!/usr/bin/perl
use strict;
use warnings;
my $path = $ARGV[0];
my $outpath = $ARGV[1];
my #files;
my $files;
opendir(DIR,$path) or die "$!";
@files = grep { /\.log$/} readdir(DIR);
my #out;
my $out;
opendir(OUT,$outpath) or die "$!";
my $line;
foreach $files (@files) {
open (FILE, "$files");
my @line = <FILE>;
my $regex = Unbound;
open (OUT, ">>$out");
print grep {$line =~ /$regex/ } <>;
}
close OUT;
close FILE;
closedir(DIR);
closedir (OUT);
I'm a beginner, and I don't really know how to create a new text file with the acquired output.
A few things I'd suggest to improve this code:
declare your loop iterators within the loop. foreach my $file ( #files ) {
use 3 arg open: open ( my $input_fh, "<", $filename );
use glob rather than opendir then grep. foreach my $file ( <$path/*.txt> ) {
grep is good for extracting things into arrays. Your grep reads the whole file to print it, which isn't necessary. Doesn't matter much if the file is short though.
perltidy is great for reformatting code.
you're opening 'OUT' on a directory path (I think?) which isn't going to work.
$outpath isn't a file, it's a directory. You need to do something different to output to different files. opendir isn't really valid for output.
because you're using opendir, readdir is actually giving you filenames - not full paths. So you might be in the wrong place to actually open the files. Prepending the path name or doing a chdir are possible solutions. But that's one of the reasons I like glob: it returns a path as well.
So with that in mind - how about:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
#Extract paths
my $input_path = $ARGV[0];
my $output_path = $ARGV[1];
#Error if paths are invalid.
unless (defined $input_path
and -d $input_path
and defined $output_path
and -d $output_path )
{
die "Usage: $0 <input_path> <output_path>\n";
}
foreach my $filename (<$input_path/*.log>) {
# extract the 'name' bit of the filename.
# be slightly careful with this - it's based
# on an assumption which isn't always true.
# File::Spec is a more powerful way of accomplishing this.
# but should grab 'number####' from /path/to/file/number####.log
my $output_file = basename ( $filename, '.log' );
#open input and output filehandles.
open( my $input_fh, "<", $filename ) or die $!;
open( my $output_fh, ">", "$output_path/$output_file.txt" ) or die $!;
print "Processing $filename -> $output_path/$output_file.txt\n";
#iterate input, extracting into $line
while ( my $line = <$input_fh> ) {
#check if $line matches your RE.
if ( $line =~ m/Unbound/ ) {
#write it to output.
print {$output_fh} $line;
}
}
#tidy up our filehandles. Although technically, they'll
#close automatically because they leave scope
close($output_fh);
close($input_fh);
}
Here is a script that takes advantage of Path::Tiny. Now, at this stage of your learning process, you are probably better off understanding @Sobrique's solution, but using modules such as Path::Tiny or Path::Class will make it easier to write these one-off scripts more quickly and correctly.
Also, I didn't really test this script, so watch out for bugs.
#!/usr/bin/env perl
use strict;
use warnings;
use Path::Tiny;
run(\@ARGV);
sub run {
my $argv = shift;
unless (@$argv == 2) {
die "Need source and destination paths\n";
}
my $it = path($argv->[0])->realpath->iterator({
recurse => 0,
follow_symlinks => 0,
});
my $outdir = path($argv->[1])->realpath;
while (my $path = $it->()) {
next unless -f $path;
next unless $path =~ /[.]log\z/;
my $logfh = $path->openr;
my $outfile = $outdir->child($path->basename('.log') . '.txt');
my $outfh;
while (my $line = <$logfh>) {
next unless $line =~ /Unbound/;
unless ($outfh) {
$outfh = $outfile->openw;
}
print $outfh $line;
}
if ($outfh) {
close $outfh
or die "Cannot close output '$outfile': $!";
}
}
}
Notes
realpath will croak if the path provided does not exist.
Similarly for openr and openw.
I am reading input files line-by-line to keep the memory footprint of the program independent of the sizes of input files.
I do not open the output file until I know I have a match to print to.
When matching a file extension using a regular expression pattern, keep in mind that \n is a valid character in Unix file names, and the $ anchor will match it.
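A tiny demonstration of that point (the file name is hypothetical):

my $tricky = "evil.log\n";    # a legal Unix file name ending in a newline
print "matched by \$\n"  if $tricky =~ /[.]log$/;    # matches: $ allows one trailing newline
print "matched by \\z\n" if $tricky =~ /[.]log\z/;   # no match: \z anchors at the true end of the string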