How can I check for eof in Perl? - perl

So I have a bit of a problem figuring out what Perl does in the following case:
while(1){
$inputLine=<STDIN>;
#parse $inputLine below
#BUT FIRST, I need to check if $inputLine = EOF
}
Before I get the obvious answer of using while(<>){}, let me say that there is a very strong reason that I have to do the above (basically, I am setting up an alarm to interrupt blocking, and I didn't want that code to clutter the example).
Is there some way to compare $inputLine == undef (as I think that is what <STDIN> returns at the end)?
Thanks.

Inside your loop, use
last unless defined $inputLine;
From the perlfunc documentation on defined:
defined EXPR
defined
Returns a Boolean value telling whether EXPR has a value other than the undefined value undef. If EXPR is not present, $_ will be checked.
Many operations return undef to indicate failure, end of file, system error, uninitialized variable, and other exceptional conditions. This function allows you to distinguish undef from other values. (A simple Boolean test will not distinguish among undef, zero, the empty string, and "0", which are all equally false.) Note that since undef is a valid scalar, its presence doesn't necessarily indicate an exceptional condition: pop returns undef when its argument is an empty array, or when the element to return happens to be undef.
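As a minimal sketch of the defined check inside a while(1) loop: the in-memory filehandle below stands in for STDIN so the example is self-contained, and read_until_eof is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;

# Read lines from a handle until readline returns undef (end of file).
# read_until_eof is a hypothetical helper used only for illustration.
sub read_until_eof {
    my ($fh) = @_;
    my @lines;
    while (1) {
        my $inputLine = <$fh>;
        last unless defined $inputLine;   # undef means EOF (or a read error)
        chomp $inputLine;
        push @lines, $inputLine;          # "0" and "" are kept: both are defined
    }
    return @lines;
}

# Assumed sample input; the second and third lines are "0" and empty.
my $data = "first\n0\n\nlast\n";
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";
my @got = read_until_eof($in);
close $in;

print scalar(@got), " lines read\n";
```

A plain truth test (last unless $inputLine) would not have been safe here in general, which is why defined is the check to use.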

defined($inputLine)
Also, see the 4 argument version of the select function for an alternative way to read from a filehandle without blocking.

You can use eof on the filehandle. eof will return 1 if the next read on FILEHANDLE is an EOF.

The following will have problems with input files that have lines containing only a line feed or, as in the case that was giving me problems, an FF (form feed) at the beginning of some lines. (The file was the output of a program developed at the end of the 70s that still has formatting for a line printer and is still in FORTRAN. I do miss the wide continuous paper for drawing flow diagrams on the back.)
open (SIMFIL, "<", 'InputFileName') or die "Can't open InputFileName\n" ;
open (EXTRDATS, ">>", 'OutputFileName' ) or die "Can't open OutputFileName\n";
$Simfilline = "";
while (<SIMFIL>) {
$Simfilline = <SIMFIL>;
print EXTRDATS $Simfilline;
$Simfilline = <SIMFIL>;
print EXTRDATS $Simfilline;
}
close SIMFIL;
close EXTRDATS;
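The following is where eof comes in handy: the expression "while (<SIMFIL>)" can return false under conditions other than the end of the file. Note also that while (<SIMFIL>) reads a line into $_ on every test, and the loop body above never prints that line; eof tests for end of file without consuming a line.
open (SIMFIL, "<", 'InputFileName') or die "Can't open InputFileName\n" ;
open (EXTRDATS, ">>", 'OutputFileName' ) or die "Can't open OutputFileName\n";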
$Simfilline = "";
while (!eof SIMFIL) {
$Simfilline = <SIMFIL>;
print EXTRDATS $Simfilline;
$Simfilline = <SIMFIL>;
print EXTRDATS $Simfilline;
}
close SIMFIL;
close EXTRDATS;
This last code fragment appears to duplicate the input file exactly.
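The difference between the two loops can be sketched with an in-memory file. The four-line sample text and the variable names are assumptions made for illustration; lexical filehandles are used instead of the bareword handles above.

```perl
use strict;
use warnings;

my $text = "line1\nline2\nline3\nline4\n";   # assumed sample input

# Variant A: the while condition itself reads a line into $_,
# and the body never copies that line.
open my $fh_a, '<', \$text or die "Cannot open: $!";
my @copied_a;
while (<$fh_a>) {
    for my $i (1 .. 2) {
        my $line = <$fh_a>;
        push @copied_a, $line if defined $line;
    }
}
close $fh_a;

# Variant B: eof only tests for end of file; no line is consumed by the test.
open my $fh_b, '<', \$text or die "Cannot open: $!";
my @copied_b;
while (!eof($fh_b)) {
    for my $i (1 .. 2) {
        my $line = <$fh_b>;
        push @copied_b, $line if defined $line;
    }
}
close $fh_b;

print "A copied ", scalar(@copied_a), " lines, B copied ", scalar(@copied_b), " lines\n";
```

With this input, variant A copies only line2 and line3, while variant B copies all four lines.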

Related

perl script not looking for text file

I have a Perl script that is supposed to look for the text file I write in the command argument but, for whatever reason, it doesn't even acknowledge the existence of the text file I write in the argument, even though it is in the same folder.
This is where the code starts going haywire
my $filename = $ARGV[0];
if($filename == "") {
print("[ERROR] Argument unavailable! use ./script.pl filename.txt\n");
end;
} elsif (open (FILE, "<", $filename)) {
print("[INFO] File $filename loaded successfully!\n\n");
menu();
close FILE;
} else{
die("An error occured while opening the file: $!\n\n");
end;
}
Always use
use strict;
use warnings;
when writing Perl code. These pragmas will tell you when you do something wrong, and give you information that might otherwise be hard to find.
What do I get when I run your program with these pragmas on?
$ foo.pl asdasd
Argument "" isn't numeric in numeric eq (==) at foo.pl line 9.
Argument "asdasd" isn't numeric in numeric eq (==) at foo.pl line 9.
[ERROR] Argument unavailable! use ./script.pl filename.txt
Those warnings come from use warnings. Good thing we were using that! Here, I am told that using == for string comparisons is causing some issues.
What happens is that both the filename and the empty string "" are being cast to numbers. Perl uses context for operators, and == forces Perl to use a numeric, scalar context. It assumes the arguments are supposed to be numbers, so it tries to coerce them into numbers. It will attempt to find a number at the beginning of the string, and if it doesn't find one, cast the value to 0. So your comparison becomes:
if (0 == 0)
# equal to "foo.txt" == ""
Which is true. Hence the program never gets further than this.
The proper way to fix this particular problem is to use eq, the string equality comparison:
if ($file eq "")
Then it will check if the file name is the empty string. However, this is not the correct solution for you. Let's try it out, using the test case where the user forgot the argument:
$ foo.pl
Use of uninitialized value $filename in string eq at foo.pl line 9.
[ERROR] Argument unavailable! use ./script.pl filename.txt
Why? Because in this case $ARGV[0] is not the empty string, it is undef, or uninitialized. It still sort of gets it right, since undef eq "" is true, but does give a warning that you are using the wrong method.
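The three cases discussed so far (numeric ==, string eq, and an undefined argument) can be compared side by side. This is only a sketch: classify_arg is a hypothetical helper, and the numeric warning is silenced locally so the comparison can be shown without noise.

```perl
use strict;
use warnings;

# Classify a candidate filename argument (hypothetical helper for illustration).
sub classify_arg {
    my ($arg) = @_;
    return 'missing' unless defined $arg;   # no argument given: $ARGV[0] is undef
    return 'empty'   if $arg eq '';         # string comparison, not ==
    return 'ok';
}

my ($name, $empty) = ("foo.txt", "");
my $numeric_equal;
{
    no warnings 'numeric';                  # silence "isn't numeric" for this demo only
    $numeric_equal = ($name == $empty) ? 1 : 0;   # both sides numify to 0
}
my $string_equal = ($name eq $empty) ? 1 : 0;

print "numeric ==: $numeric_equal, string eq: $string_equal\n";
print join(' ', classify_arg(undef), classify_arg(''), classify_arg('foo.txt')), "\n";
```

The numeric comparison reports the two strings as equal, the string comparison does not, and the defined check distinguishes a missing argument from an empty one.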
What you want to do here is just check if it exists. A new strategy:
if (@ARGV < 1) # check for number of arguments to program
You can also adopt a file test, and check if the file exists:
if ( ! -e $file)
However, the simpler way to handle those cases is to just use a proper open statement:
open my $fh, "<", $file or die "Cannot open '$file': $!";
Which will then tell you if the file did not exist. This is the idiomatic way to open files in Perl: Three argument open with explicit open mode to prevent code injection, lexical file handle, and handling exceptions and reporting the error.
If I were to write your program, I would write it as:
if (@ARGV < 1) {
die "Usage: $0 <filename>"; # $0 is your program's filename
}
my $file = shift; # default shift uses @ARGV, or @_ inside a sub, this is a common Perl idiom
open my $fh, "<", $file or die "Cannot open '$file': $!";
menu(); # your menu subroutine, I assume...
close $fh;

Operator to read a file using file handle

I am trying to read a file,
while($line = $file_handle)
When I ran this code, the program hung.
I noticed that to read the file using a file handle, we need to use <>:
while($line = <file_handle>)
The latter code obviously ran.
Now I know the <> operator reads the file line by line. I want to know what exactly is happening when I don't provide the <> operator.
Is it not able to find the end of the line, or something else?
Thank you.
Given this code:
use warnings;
use strict;
open my $fh, '<', 'in.txt' or die $!;
while (my $x = $fh){
$DB::single=1;
print "$x\n";
}
when run in the debugger, you can see that $x now contains a GLOB (a copy of the file handle). On each iteration, $x takes the whole handle as an assignment, and causes an infinite loop, because the statement is always true. Because you're always assigning a true value (the handle), the while statement is effectively no different than writing while(1){....
perl -d script.pl
main::(x.pl:4): open my $fh, '<', 'in.txt' or die $!;
DB<1> s
main::(x.pl:6): while (my $x = $fh){
DB<1> s
main::(x.pl:7): $DB::single=1;
DB<1> x $x
0 GLOB(0xbfdae8)
-> *main::$fh
FileHandle({*main::$fh}) => fileno(6)
DB<2> s
main::(x.pl:7): $DB::single=1;
DB<2> x $x
0 GLOB(0xbfdae8)
-> *main::$fh
FileHandle({*main::$fh}) => fileno(6)
<$fh> essentially extracts a single line from the file handle, and when EOF is hit (or an error occurs), returns undef, which terminates the loop (because undef is false).
In short: the file is read by the <> operator, so without it there is only an assignment, hence the infinite loop.
The while (...) { ... } checks the condition inside () and if true it executes the body in the block {}. It keeps doing this until the condition in () evaluates to what is understood as false (generally 0, '0', '', or undef). This is what the operator <> provides, for example, so we have an idiom
while (my $line = <$file_handle>) { ... }
The <> operator reads a line at each iteration from the resource that $file_handle is associated with and when it reaches end-of-file it returns undef so the loop terminates and the program execution continues at the next statement after the loop. The diamond operator <> is the operator form for the function readline. See I/O Operators in perlop. For all this to work $file_handle has to be a valid resource that can retrieve data.
Without the <> operator nothing is read from anywhere, but there is only an assignment. The code does the following. It copies the variable $file_handle into the variable $line. The return value of that operation in Perl is the value that ends up in $line, and if that is a 'truthy' value then the body { ... } is executed. The $file_handle clearly evaluates to 'true', otherwise the loop body would not execute even once and the program would continue. Thus $line is true, too. So if the $file_handle doesn't change in the body {...} of the loop the condition is always true.
Then whatever is in the body keeps being executed, without a reason for the loop to terminate, and it thus never returns control to the program. It's an infinite loop, and the program appears to hang.
Note that this is sometimes used deliberately and you may see code like
while (1) {
# Compute what is needed
# Recalculate the condition for when to stop
last if $condition_to_terminate;
}
This approach can be risky though, since the condition can get more and more complicated and an error could sneak in, in which case we end up with an infinite loop. There are usually clearer ways to control loops.
A completely different example is an event loop, where it is crucial to enter an infinite loop so that we can then wait for an event of some sort, at which point a particular action is taken. This is how GUIs work, for example, as do a number of other systems.
For the example that 'hung':
while($line = $file_handle)
The expression $line = $file_handle is purely an assignment. At that point, your while is just checking that the assigned value is truthy, i.e. that $line is not the number 0, the string "0", an empty string, or undef, which of course it's not. Hence you get an infinite loop.
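To see the difference without running an infinite loop, here is a small sketch that inspects the plain assignment once and then reads with <>. The in-memory file and the variable names $copy, $kind, and $count are assumptions for illustration.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\n";                  # assumed sample input
open my $file_handle, '<', \$text or die "Cannot open: $!";

# Plain assignment: copies the handle (a glob reference) and reads nothing.
my $copy    = $file_handle;
my $kind    = ref $copy;                     # reference type of the copied handle
my $is_true = $copy ? 1 : 0;                 # a reference is always true

# With <>: each evaluation reads one line; undef at end of file ends the loop.
my $count = 0;
while (my $line = <$file_handle>) {
    $count++;
}
close $file_handle;

print "ref: $kind, true: $is_true, lines: $count\n";
```

Because the assignment never touches the file, both lines are still available to the while loop that follows it.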

Correct use of input file in perl?

database.Win.txt is a file that contains a multiple of 3 lines. The second of every three lines is a number. The code is supposed to print out the three lines (in a new order) on one line separated by tabs, but only if the second line is 1.
Am I, by this code, actually getting the loop to create an array with three lines of database.Win.txt each time it runs through the loop? That's my goal, but I suspect this isn't what the code does, since I get an error saying that the int() function expects a numeric value, and doesn't find one.
while(<database.Win.txt>){
$new_entry[0] = <database.Win.txt>;
$new_entry[1] = <database.Win.txt>;
$new_entry[2] = <database.Win.txt>;
if(int($new_entry[1]) == 1) {
chomp($new_entry);
print "$new_entry[1], \t $new_entry[2], \t $new_entry[0], \n"
}
}
I am a total beginner with Perl. Please explain as simply as possible!
I think you've got a good start on the solution. However, your while reads one line right before the next three lines are read (if those were <$file_handles>). int isn't necessary, but chomp is--before you check the value of $new_entry[1] else there's still a record separator at the end.
Given this, consider the following:
use strict;
use warnings;
my @entries;
open my $fh, '<', 'database.Win.txt' or die $!;
while (1) {
last if eof $fh;
chomp( $entries[$_] = <$fh> ) for 0 .. 2;
if ( $entries[1] == 1 ) {
print +( join "\t", @entries ), "\n";
}
}
close $fh;
Always start with use strict; use warnings. Next, open the file using the three-argument form of open. A while (1) is used here, so three lines at a time can be read within the while loop. Since it's an 'infinite' while loop, the last if eof $fh; gives a way out, viz., if the next file read produces an end of file, it's the last. Right below that is a for loop that effectively does what you did: assign a file line to an array position. Note that chomp is used to remove the record separator during the assignment. The last part is also similar to yours, as it checks whether the second of the three lines is 1, and then the line is printed if it is.
Hope this helps!
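For trying the approach without the real file, here is a self-contained sketch on made-up data. The sample records are assumptions, the output order (second, third, first line) follows the question, and the check for an incomplete trailing record is an addition that the code above does not have.

```perl
use strict;
use warnings;

# Made-up data standing in for database.Win.txt: three lines per record.
my $data = join '', map { "$_\n" } qw(apple 1 red pear 0 green plum 1 blue);
open my $fh, '<', \$data or die "Cannot open: $!";

my @printed;
until (eof($fh)) {
    my @entries;
    $entries[$_] = <$fh> for 0 .. 2;          # read one three-line record
    last if grep { !defined } @entries;        # stop on an incomplete trailing record
    chomp @entries;
    # Keep the record only if its second line is 1; reorder as in the question.
    push @printed, join("\t", @entries[1, 2, 0]) if $entries[1] == 1;
}
close $fh;

print "$_\n" for @printed;
```

With this data, the first and third records are kept and the second is skipped.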

How Can I Store a File Handle in a Perl Object and how can I access the result?

I wanted to store a file handle in a Perl Object. Here is how I went about it.
sub openFiles {
my $self = shift;
open (my $itemsFile, "<", "items.txt") or die $!;
open (my $nameFile, "<", "FullNames.txt") or die $!;
$self->{itemsFile} = $itemsFile;
$self->{nameFile} = $nameFile;
return $self;
}
Then I'm looking to access some information from one of these files. Here is how I go about it.
sub getItemDescription {
my $self = @_;
chomp(my $record = $self->{itemsFile});
return $record;
}
I attempt to access it in another procedure as follows:
print "Test 3: $self->getItemDescription()\n";
My questions are as follows:
Is the way I'm saving the file handle in the object correct? If not, how is it wrong?
Is the way I'm reading the lines of the file correct? If not, how can I get it right?
Finally, is the way I'm printing the returned object correct?
This is really important to me. If there is any way that I can improve the structure of my code, i.e. making a global variable for file handling or changing the structure of the object, please let me know.
Is the way I'm saving the file handle in the object correct?
Yes.
Is the way I'm reading the lines of the file correct?
No. That just assigns the file handle. One reads a line from the file using the readline operator.
One would normally use the <...> syntax of the readline operator, but <...> is a shortcut for both readline(...) and glob(qq<...>), and Perl thinks <$self->{itemsFile}> is short for glob(qq<$self->{itemsFile}>). You have to use readline specifically
my $record = readline($self->{itemsFile});
chomp($record) if defined($record);
or do some extra work
my $fh = $self->{itemsFile};
my $record = <$fh>;
chomp($record) if defined($record);
(Note that I don't call chomp unconditionally since readline/<> can return undef.)
Finally, is the way I'm printing the returned object correct?
I presume you mean returned string, as in the string returned by getItemDescription. The catch is, you never actually call the method. ->getItemDescription() has no meaning in double quoted string literals, even after a variable. You need to move $self->getItemDescription() out of the double quotes.
You also fail to check if you've reached the end of the file.
You are close.
To read a record (line) from a filehandle, you use the builtin readline function or the <...> operator AFTER you assign the filehandle to a "simple scalar" (see edit below).
chomp(my $record = readline( $self->{itemsFile} ));
# or, equivalently:
my $fh = $self->{itemsFile};
chomp(my $record = <$fh>);
There is also a bug in your getItemDescription method. You'll want to say
my ($self) = @_;
instead of
my $self = @_;
The latter call is a scalar assignment of an array, which resolves to the length of the array, not the first element of the array.
EDIT: <$self->{itemsFile}> and <{$self->{itemsFile}}> do not work, as perlop explains:
If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. This distinction is determined on syntactic grounds alone. That means <$x> is always a readline() from an indirect handle, but <$hash{key}> is always a glob(). That's because $x is a simple scalar variable, but $hash{key} is not--it's a hash element. Even <$x > (note the extra space) is treated as glob("$x "), not readline($x).
The openFiles piece is correct.
The errors occur primarily in the getItemDescription method.
First, as previously mentioned, my $self = @_; should be my ($self) = @_;.
However, the crux of the question is solved in the following fashion:
Change chomp(my $record = $self->{itemsFile}); to two lines:
my $file1 = $self->{itemsFile};
chomp(my $record = <$file1>);
To clarify: you must (in my experience, and I tried all the solutions suggested) use a simple scalar variable inside the angle brackets.
Finally, see the last two paragraphs in ikegami's answer.
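Putting the pieces from the answers together, a minimal sketch might look like this. The class name ItemReader, the constructor that takes an already opened handle, and the in-memory sample data are assumptions made to keep the example self-contained; they are not part of the original code.

```perl
use strict;
use warnings;

package ItemReader;   # hypothetical class name

# Store an already opened filehandle in the object.
sub new {
    my ($class, $fh) = @_;
    return bless { itemsFile => $fh }, $class;
}

# Read one record; returns undef at end of file.
sub getItemDescription {
    my ($self) = @_;                              # list assignment: first element of @_
    my $record = readline($self->{itemsFile});    # readline accepts a hash element
    chomp($record) if defined $record;
    return $record;
}

package main;

my $items = "widget\ngadget\n";                   # assumed sample data
open my $itemsFile, '<', \$items or die "Cannot open: $!";
my $reader = ItemReader->new($itemsFile);

my $first = $reader->getItemDescription();
print "Test 3: " . $first . "\n";                 # method call kept outside the quotes

my $second = $reader->getItemDescription();
my $third  = $reader->getItemDescription();       # undef: end of file reached
print "no more records\n" unless defined $third;
```

The caller checks defined on the return value, which covers the end-of-file case mentioned above.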

What do these Perl variables mean?

I'm a little noobish to Perl coding conventions. Could someone help explain:
Why are there / and /< in front of Perl variables?
What do \= and =~ mean, and what is the difference?
Why does the code require an ending / before the ;, e.g. /start=\'([0-9]+)\'/?
The first three sub-questions were more or less answered by the perldocs, but what does the following line in the code mean?
push(@{$Start{$start}},$features);
I understand that we are pushing $features onto a @Start array, but what does @$Start{$start} mean? Is it the same as:
@Start = ($start);
Within the code there is something like this:
use FileHandle;
sub open_infile {
my $file = shift;
my $in = FileHandle->new($file,"<:encoding(UTF-8)")
or die "ERROR: cannot open $file: $!\n" if ($Opt_utf8);
$in = new FileHandle("$file")
or die "ERROR: cannot open $file: $!\n" if (!$Opt_utf8);
return $in;
}
$uamf = shift @ARGV;
$uamin = open_infile($uamf);
while (<$uamin>) {
chomp;
if(/<segment /){
/start=\'([0-9]+)\'/;
/end=\'([0-9]+)\'/;
/features=\'([^\']+)\'/;
$features =~ s/annotation;//;
push(@{$Start{$start}},$features);
push(@{$End{$end}},$features);
}
}
EDITED
So after some intensive reading of the Perl docs, here are some things I've gathered:
The /<segment / is a regex check that tests whether the line read in while (<$uamin>) contains the string <segment.
Similarly, /start=\'([0-9]+)\'/ has nothing to do with instantiating any variable; it's a regex check to see whether the line read in while (<$uamin>) matches start=\'([0-9]+)\', where \'([0-9]+)\' matches a quoted string of digits.
In $features =~ s/annotation;//, the =~ is used because the substitution is applied through a regular expression match on $features. See
What does =~ do in Perl?
Where did you see this syntax (or more to the point: have you edited stuff out of what you saw)? /foo/ represents the match operator using regular expressions, not variables. In other words, the first line is checking to see if the input string $_ contains the character sequence <segment.
The subsequent three lines essentially do nothing useful, in the sense that they run regular expression matches and then discard the results (there are side-effects, but subsequent regular expressions discard the side-effects, too).
The last line does a substitution, replacing the first occurrence of the characters annotation; with the empty string in the string $features.
Run the command perldoc perlretut to learn about regex in Perl.
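As a sketch of what the loop may have been meant to do, assuming the intent was to capture the three attributes into $start, $end, and $features: the sample lines below are invented to fit the regexes in the question. The expression @{ $Start{$start} } dereferences the array reference stored under the key $start in the hash %Start; push creates that array automatically the first time a key is used.

```perl
use strict;
use warnings;

my (%Start, %End);

# Invented sample lines; the attribute layout is assumed from the regexes.
my @lines = (
    "<segment start='10' end='20' features='annotation;noun'>",
    "<segment start='10' end='35' features='annotation;verb'>",
    "<other start='99'>",
);

for my $line (@lines) {
    next unless $line =~ /<segment /;
    # A match in list context returns its captures, so the results are kept.
    my ($start)    = $line =~ /start='([0-9]+)'/   or next;
    my ($end)      = $line =~ /end='([0-9]+)'/     or next;
    my ($features) = $line =~ /features='([^']+)'/ or next;
    $features =~ s/annotation;//;
    push @{ $Start{$start} }, $features;   # hash of arrays, keyed by start position
    push @{ $End{$end} },     $features;   # hash of arrays, keyed by end position
}

print "start 10: @{ $Start{10} }\n";
print "end 35: @{ $End{35} }\n";
```

So the push line is not the same as @Start = ($start): it appends to a separate array for each distinct start value, while %Start itself stays a hash.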