How do I get a filehandle from the command line? - perl

I have a subroutine that takes a filehandle as an argument. How do I make a filehandle from a file path specified on the command line? I don't want to do any processing of this file myself, I just want to pass it off to this other subroutine, which returns an array of hashes with all the parsed data from the file.
Here's what the command line input I'm using looks like:
$ ./getfile.pl /path/to/some/file.csv
Here's what the beginning of the subroutine I'm calling looks like:
sub parse {
my $handle = shift;
my #data = <$handle>;
while (my $line = shift(#data)) {
# do stuff
}
}

Command line arguments are available in the predefined #ARGV array. You can get the file name from there and use open to open a filehandle to it. Assuming that you want read-only access to the file, you would do it this way:
my $file = shift #ARGV;
open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
parse($fh);
Note that the or die... checks the call open for success and dies with an error message if it wasn't. The built-in variable $! will contain the (OS dependent) error message on failure that tells you why the call wasn't successful. e.g. "Permission denied."

parse(*ARGV) is the simplest solution: the explanation is a bit long, but an important part of learning how to use Perl effectively is to learn Perl.
When you use a null filehandle (<>), it actually reads from the magical ARGV filehandle, which has special semantics: it reads from all the files named in #ARGV, or STDIN if #ARGV is empty.
From perldoc perlop:
The null filehandle <> is special: it can be used to emulate the
behavior of sed and awk. Input from <> comes either from standard
input, or from each file listed on the command line. Here’s how it
works: the first time <> is evaluated, the #ARGV array is checked, and
if it is empty, $ARGV[0] is set to "-", which when opened gives you
standard input. The #ARGV array is then processed as a list of
filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(#ARGV, '-') unless #ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn’t so cumbersome to say, and will actually work. It
really does shift the #ARGV array and put the current filename into the
$ARGV variable. It also uses filehandle ARGV internally--<> is just a
synonym for <ARGV>, which is magical. (The pseudo code above doesn’t
work because it treats <ARGV> as non-magical.)
You don't have to use <> in a while loop -- my $data = <> will read one line from the first non-empty file, my #data = <>; will slurp it all up at once, and you can pass *ARGV around as if it were a normal filehandle.

This is what the -n switch is for!
Take your parse method, and do this:
#!/usr/bin/perl -n
#do stuff
Each line is stored in $_. So you run
./getfile.pl /path/to.csv
And it does this.
See here and here for some more info about these. I like -p too, and have found the combo of -a and -F to be really useful.
Also, if you want to do some extra processing, add BEGIN and end blocks.
#!/usr/bin/perl -n
BEGIN {
my $accumulator;
}
# do stuff
END {
print process_total($accumulator);
}
or whatever. This is very, very useful.

Am I missing something or are you just looking for the open() call?
open($fh, "<$ARGV[0]") or die "couldn't open $ARGV[0]: $!";
do_something_with_fh($fh);
close($fh);

Related

Perl script to parse a text file and match a string

I'm editing my question to add more details
The script executes the command and redirects the output to a text file.
The script then parses the text file to match the following string " Standard 1.1.1.1"
The output in the text file is :
Host Configuration
------------------
Profile Hostname
-------- ---------
standard 1.1.1.1
standard 1.1.1.2
The code works if i search for either 1.1.1.1 or standard . When i search for standard 1.1.1.1 together the below script fails.
this is the error that i get "Unable to find string: standard 172.25.44.241 at testtest.pl
#!/usr/bin/perl
use Net::SSH::Expect;
use strict;
use warnings;
use autodie;
open (HOSTRULES, ">hostrules.txt") || die "could not open output file";
my $hos = $ssh->exec(" I typed the command here ");
print HOSTRULES ($hos);
close(HOSTRULES);
sub find_string
{
my ($file, $string) = #_;
open my $fh, '<', $file;
while (<$fh>) {
return 1 if /\Q$string/;
}
die "Unable to find string: $string";
}
find_string('hostrules.txt', 'standard 1.1.1.1');
Perhaps write a function:
use strict;
use warnings;
use autodie;
sub find_string {
my ($file, $string) = #_;
open my $fh, '<', $file;
while (<$fh>) {
return 1 if /\Q$string/;
}
die "Unable to find string: $string";
}
find_string('output.txt', 'object-cache enabled');
Or just slurp the entire file:
use strict;
use warnings;
use autodie;
my $data = do {
open my $fh, '<', 'output.txt';
local $/;
<$fh>;
};
die "Unable to find string" if $data !~ /object-cache enabled/;
You're scanning a file for a particular string. If that string is not found in that file, you want an error thrown. Sounds like a job for grep.
use strict;
use warnings;
use features qw(say);
use autodie;
use constant {
OUTPUT_FILE => 'output.txt',
NEEDED_STRING => "object-cache enabled",
};
open my $out_fh, "<", OUTPUT_FILE;
my #output_lines = <$out_fh>;
close $out_fh;
chomp #output_lines;
grep { /#{[NEEDED_STRING]}/ } #output_lines or
die qq(ERROR! ERROR! ERROR!); #Or whatever you want
The die command will end the program and exit with a non-zero exit code. The error will be printed on STDERR.
I don't know why, but using qr(object-cache enabled), and then grep { NEEDED_STRING } didn't seem to work. Using #{[...]} allows you to interpolate constants.
Instead of constants, you might want to be able to pass in the error string and the name of the file using GetOptions.
I used the old fashion <...> file handling instead of IO::File, but that's because I'm an old fogy who learned Perl back in the 20th century before it was cool. You can use IO::File which is probably better and more modern.
ADDENDUM
Any reason for slurping the entire file in memory? - Leonardo Herrera
As long as the file is reasonably sized (say 100,000 lines or so), reading the entire file into memory shouldn't be that bad. However, you could use a loop:
use strict;
use warnings;
use features qw(say);
use autodie;
use constant {
OUTPUT_FILE => 'output.txt',
NEEDED_STRING => qr(object-cache enabled),
};
open my $out_fh, "<", OUTPUT_FILE;
my $output_string_found; # Flag to see if output string is found
while ( my $line = <$out_fh> ) {
if ( $line =~ NEEDED_STRING ){
$output_string_found = "Yup!"
last; # We found the string. No more looping.
}
}
die qq(ERROR, ERROR, ERROR) unless $output_string_found;
This will work with the constant NEEDED_STRING defined as a quoted regexp.
perl -ne '/object-cache enabled/ and $found++; END{ print "Object cache disabled\n" unless $found}' < input_file
This just reads the file a line at a time; if we find the key phrase, we increment $found. At the end, after we've read the whole file, we print the message unless we found the phrase.
If the message is insufficient, you can exit 1 unless $found instead.
I suggest this because there are two things to learn from this:
Perl provides good tools for doing basic filtering and data munging right at the command line.
Sometimes a simpler approach gets a solution out better and faster.
This absolutely isn't the perfect solution for every possible data extraction problem, but for this particular one, it's just what you need.
The -ne option flags tell Perl to set up a while loop to read all of standard input a line at a time, and to take any code following it and run it into the middle of that loop, resulting in a 'run this pattern match on each line in the file' program in a single command line.
END blocks can occur anywhere and are always run at the end of the program only, so defining it inside the while loop generated by -n is perfectly fine. When the program runs out of lines, we fall out the bottom of the while loop and run out of program, so Perl ends the program, triggering the execution of the END block to print (or not) the warning.
If the file you are searching contained a string that indicated the cache was disabled (the condition you want to catch), you could go even shorter:
perl -ne '/object-cache disabled/ and die "Object cache disabled\n"' < input_file
The program would scan the file only until it saw the indication that the cache was disabled, and would exit abnormally at that point.
First, why are you using Net::SSH::Expect? Are you executing a remote command? If not, all you need to execute a program and wait for its completion is system.
system("cmd > file.txt") or die "Couldn't execute: $!";
Second, it appears that what fails is your regular expression. You are searching for the literal expression standard 1.1.1.1 but in your sample text it appears that the wanted string contains either tabs or several spaces instead of a single space. Try changing your call to your find_string function:
find_string('hostrules.txt', 'standard\s+1.1.1.1'); # note '\s+' here

issues for a code snippet to handle the input file

I am studying a Perl program, which includes the following segment for handling an input file. I do not understand what is s/^\s+//; used for? Moreover, what are '|' and '||' stand for in open(FILE, "cat $fileName |") || die "could not open file";
open(FILE, "cat $fileName |") || die "could not open file";
while (<FILE>)
{
s/^\s+//;
my #line = split;
if ($line[0]!~ /\:/) {$mark=0}
my $var = $line[$mark];
## some other code
}
You can read the documentation for the various functions in perlfunc.
This code will open a file for reading, by the rather circumspect way of piping from cat instead of simply opening the file. The | means that the shell command cat is piped to the open command, and our file handle will read from the output.
|| is simply or. Open the pipe, and if that fails, the program dies.
while(<FILE>) will read through every line of the input and assign each line to $_. That line is then used implicitly in the substitution and split below. I.e. s/^\s+// is equal to $_ =~ s/^\s+//, and split is equal to split(' ', $_).
s/^\s+//
Will remove leading whitespace. The split will split each line on whitespace, and the elements are stored in the array #line.
Because of the use of implicit split on whitespace, the stripping the leading whitespace with s/^\s+// is not really needed, as that is done automatically.
If the first element does not contain a colon :, $mark is set to 0. Otherwise, it is not set, and will presumably use the value from the previous iteration, since it is not defined inside the loop. Finally, $var is initialized as element number $mark, which is either 0 or whatever.
ETA: As a rather insidious oops: If $mark is undefined, i.e. it does not contain a colon, then $var will still be assigned $line[0], since undef will be converted to 0, with a warning. If use warnings is not in effect, this error is silent, and therefore insidious.
This code seems to be written by someone who does not know too much about perl, and it might not be very safe to use.
The substitution trims leading whitespace that appears at the beginning of the line (^), leaving any non-whitespace characters as the first.
The || operator in open... || die ... is a high-precedence or. If open fails, die executes.
open(FILE, "cat $fileName |") is a waste of an external process. To read a file for input, simply do:
open FILE, '<', $filename or die qq{Could not open "$filename" for reading: $!};
The parentheses for the open call are optional because or does not bind tightly.
It is also better to use lexical file handles:
open my $fh, '<' $filename or die qq{Could not open "$filename" for reading: $!};
This file handle is assigned to a lexical variable that lives only within the scope it is declared. Once the program flow exits this scope, the file closes automatically.
Part of the confusion is that the developer is using the default variable, $_. Many Perl commands (I would say about 1/3 of them) act upon $_ when you don't specify the name of the variable in the function. For example, these are syntactically the same:
my $uppercase_name = uc($_);
my $uppercase_name = uc;
In both cases, the uc function will print the string in the $_ variable in upper case characters. In fact, even the print statement uses the $_ variable. Again, these are both the same:
print $_;
print;
It's frowned upon to use the default variable in newer Perl scripts because it doesn't add clarity to the program and it doesn't make the program faster. I've rewritten the same code snippet you used in order to show the missing $_ variable. It might make the code easier to understand:
open(FILE, "cat $fileName |") || die "could not open file";
while ($_ = <FILE>)
{
$_ =~ s/^\s+//;
my #line = split $_;
if ($line[0] !~ /\:/) {
$mark = 0;
}
my $var = $line[$mark];
## some other code
}
Notice that the while statement is putting the value of the line read into the $_ variable and that the substitute command (the s/^\s+//) is also operating on the $_ variable. I hope that clarifies the code a bit for you.
Now for your questions:
_[W]hat do '|' and '||' stand for?
The || means or as in do this or that. In practice, the or can be thought of as an if statement:
if (not open(FILE, "cat $fileName |")) {
die "could not open file";
}
That is, if the open statement failed, then execute the die statement. If the open statement did manage to open the file, then don't execute the die statement.
In Perl, you now see or instead of || in cases like this:
open(FILE, "cat $fileName |") or die "could not open file";
which makes the meaning a bit more obvious: Open the file, or kill the program.
The single pipe (|) at the end of the file name means execute the command in the open statement (the cat $filename) and read from the output of this command. Imagine something like this:
open (COMMAND, "java -jar foo.war|") or die "Can't execute 'java -jar foo.war'";
Now, I'm running the command java -jar foo.war and using its output in my Perl script.
You can do this the other way around too:
open (MAIL, "|mail $recipient") or die "Can't mail $recipient";
print MAIL "Dear $recipient\n\n";
print MAIL "I hope everything is well.\n";
print MAIL "Sincerely,\n\nDavid";
close MAIL;
I'm now opening the command mail $recipient and writing to it with the print statements. In this case, I'm emailing $recipient with a simple message.
I do not understand what is s/^\s+//; used for?
In the original program, it was on a line by itself:
s/^\s+//;
I've added the missing variable which should help clarify it a bit:
$_ =~ s/^\s+//;
This is the Substitute command in Perl. It's taking the $_ variable and substituting the regular expression ^\s+ with nothing. If you don't understand what are regular expressions, you should take a look at the Perldoc tutorial on the subject. Basically, this is removing all spaces, tabs, and other forms of white space from the beginning of the line.

Is there an issue with opening filenames provided on the command line through $_?

I'm having trouble modifying a script that processes files passed as command line arguments, merely for copying those files, to additionally modifying those files. The following perl script worked just fine for copying files:
use strict;
use warnings;
use File::Copy;
foreach $_ (#ARGV) {
my $orig = $_;
(my $copy = $orig) =~ s/\.js$/_extjs4\.js/;
copy($orig, $copy) or die(qq{failed to copy $orig -> $copy});
}
Now that I have files named "*_extjs4.js", I would like to pass those into a script that similarly takes file names from the command line, and further processes the lines within those files. So far I am able get a file handle successfully as the following script and it's output shows:
use strict;
use warnings;
foreach $_ (#ARGV) {
print "$_\n";
open(my $fh, "+>", $_) or die $!;
print $fh;
#while (my $line = <$fh>) {
# print $line;
#}
close $fh;
}
Which outputs (in part):
./filetree_extjs4.js
GLOB(0x1a457de8)
./async_submit_extjs4.js
GLOB(0x1a457de8)
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves. A start would be to print the files lines, which I've tried to do with the commented out code above.
But that code has no effect, the files' lines do not get printed. What am I doing wrong? Is there a conflict between the $_ used to process command line arguments, and the one used to process file contents?
It looks like there are a couple of questions here.
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves.
The reason why print $fh is returning GLOB(0x1a457de8) is because the scalar $fh is a filehandle and not the contents of the file itself. To access the contents of the file itself, use <$fh>. For example:
while (my $line = <$fh>) {
print $line;
}
# or simply print while <$fh>;
will print the contents of the entire file.
This is documented in pelrdoc perlop:
If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the filehandle to
input from, or its typeglob, or a reference to the same.
But it has already been tried!
I can see that. Try it after changing the open mode to +<.
According to perldoc perlfaq5:
How come when I open a file read-write it wipes it out?
Because you're using something like this, which truncates the file
then gives you read-write access:
open my $fh, '+>', '/path/name'; # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
doesn't exist:
open my $fh, '+<', '/path/name'; # open for update
Using ">" always clobbers or creates. Using "<" never does either. The
"+" doesn't change this.
It goes without saying that the or die $! after the open is highly recommended.
But take a step back.
There is a more Perlish way to back up the original file and subsequently manipulate it. In fact, it is doable via the command line itself (!) using the -i flag:
$ perl -p -i._extjs4 -e 's/foo/bar/g' *.js
See perldoc perlrun for more details.
I can't fit my needs into the command-line.
If the manipulation is too much for the command-line to handle, the Tie::File module is worth a try.
To read the contents of a filehandle you have to call readline read or place the filehandle in angle brackets <>.
my $line = readline $fh;
my $actually_read = read $fh, $text, $bytes;
my $line = <$fh>; # similar to readline
To print to a filehandle other than STDIN you have to have it as the first argument to print, followed by what you want to print, without a comma between them.
print $fh 'something';
To prevent someone from accidentally adding a comma, I prefer to put the filehandle in a block.
print {$fh} 'something';
You could also select your new handle.
{
my $oldfh = select $fh;
print 'something';
select $oldfh; # reset it back to the previous handle
}
Also your mode argument to open, causes it to clobber the contents of the file. At which point there is nothing left to read.
Try this instead:
open my $fh, '+<', $_ or die;
I'd like to add something to Zaid's excellent suggestion of using a one-liner.
When you are new to perl, and trying some tricky regexes, it can be nice to use a source file for them, as the command line may get rather crowded. I.e.:
The file:
#!/usr/bin/perl
use warnings;
use strict;
s/complicated/regex/g;
While tweaking the regex, use the source file like so:
perl -p script.pl input.js
perl -p script.pl input.js > testfile
perl -p script.pl input.js | less
Note that you don't use the -i flag here while testing. These commands will not change the input files, only print the changes to stdout.
When you're ready to execute the (permanent!) changes, just add the in-place edit -i flag, and if you wish (recommended), supply an extension for backups, e.g. ".bak".
perl -pi.bak script.pl *.js

STDOUT to array Perl

I am compiling a Perl program, i am writing the output STDOUT to a file. In the same program , i want to run another small script using while function on the output of STDOUT. So, I need to save the output of first script in an array, then i can use in while<#array>. Like
open(File,"text.txt");
open(STDOUT,">output,txt");
#file_contents=<FILE>;
foreach (#file_contents){
//SCRIPT GOES HERE//
write;
}
format STDOUT =
VARIABLE #<<<<<< #<<<<<< #<<<<<<
$x $y $z
.
//Here I want to use output of above program in while loop //
while(<>){
}
How can i save the output of first program into array so that i can use in while loop, or how can i directly use STDOUT in while loop. I have to make sure that first part is completely executed. Thanks in advance.
Since you remapped STDOUT so it writes to a file, you could presumably close STDOUT, and then reopen the file for reading.
Quite where you're going to send any other output is a bit of a mystery, but presumably you can resolve that. Were it me, I'd not fiddle with STDOUT. I'd make the script write to a file handle:
use strict;
use warnings;
open my $input, "<", "text.txt" or die "A horrible death";
open my $output, ">", "output.txt" or die "A horrible death";
my #file_contents = <$input>;
close($input);
foreach (#file_contents)
{
# Script goes here
print $output "Any information that goes to output\n";
}
close $output;
open my $reread, "<", "output.txt" or die "A horrible death";
while (<$reread>)
{
# Process the previous output
}
Note the use of lexical file handles, the checking that the open worked, the close when finished with the input file, the use of use strict; and use warnings;. (I've only been working with Perl for 20 years and I know I don't trust my scripts until they run clean with those settings.)
I assume you want to reopen STDOUT in order to make the write function work. However, the correct solution for that is to either specify the file handle, or to a lesser extent, to use select.
write FILEHANDLE;
or
select FILEHANDLE;
write;
Unfortunately, it seems the IO of perlform is a bit arcane, and does not seem to allow for lexical file handles.
Your problem is you can't reuse the formatted text within the program, so a bit of trixy programming is required. What you can do is open a file handle that prints to a scalar. Which is another somewhat arcane perl functionality, but in this case, it might be the only way to do this directly.
# Using FOO as format to avoid destroying STDOUT
format FOO =
VARIABLE #<<<<<< #<<<<<< #<<<<<<
$x $y $z
.
my $foo;
use autodie; # save yourself some typing
open INPUT, '<', "text.txt"; # normally, we would add "or die $!" on these
open FOO, '>', \$foo; # but now autodie handles that for us
open my $output, '>', "output.txt";
while (<FILE>) {
$foo = ""; # we need to reset $foo each iteration
write FOO; # write to the file handle instead
print $output $foo; # this now prints $foo to output.txt
do_something($foo); # now you can also process the text at the same time
}
As you'll notice, we now first print the formatted line to the scalar $foo. While it is there, we can handle it as regular data, so there's no need to save to a file and reopening it to get to the data.
Each iteration, data is concatenated to the end of $foo, so to avoid accumulation, we need to reset $foo. The best way to handle this would be to make $foo lexical within the scope, but unfortunately we need $foo to be declared outside the while loop in order to be able to use it in the open statement.
It might be possible to use local $foo inside the while-loop, but I think that's adding yet more bad practice to this already very bad hack.
Conclusion:
With all this said and done, I suspect the best way to handle this is to not use perlform at all, and format your data in some other way. While perlform might be well suited to print to a file, it is not the best suited for what you have in mind. I recall this question from earlier, perhaps there was some other answer that would work better. Such as using sprintf, like Jonathan suggested
Assuming the output from your first program is tab-delimited:
while (<>) {
chomp $_;
my ($variable, $x, $y, $z) = split("\t", $_);
# do stuff with values
}

How can I mix command line arguments and filenames for <> in Perl?

Consider the following silly Perl program:
$firstarg = $ARGV[0];
print $firstarg;
$input = <>;
print $input;
I run it from a terminal like:
perl myprog.pl sample_argument
And get this error:
Can't open sample_argument: No such file or directory at myprog.pl line 5.
Any ideas why this is? When it gets to the <> is it trying to read from the (non-existent) file, "sample_argument" or something? And why?
<> is shorthand for "read from the files specified in #ARGV, or if #ARGV is empty, then read from STDIN". In your program, #ARGV contains the value ("sample_argument"), and so Perl tries to read from that file when you use the <> operator.
You can fix it by clearing #ARGV before you get to the <> line:
$firstarg = shift #ARGV;
print $firstarg;
$input = <>; # now #ARGV is empty, so read from STDIN
print $input;
See the perlio man page, which reads in part:
The null filehandle <> is special: it can be used to emulate the behavior of sed
and awk. Input from <> comes either from standard input, or from each file listed
on the command line. Here’s how it works: the first time <> is evaluated, the
#ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when
opened gives you standard input. The #ARGV array is then processed as a list of
filenames.
If you want STDIN, use STDIN, not <>.
By default, perl consumes the command line arguments as input files for <>. After you've used them, you should consume them yourself with shift;