Operator to read a file using file handle

I am trying to read a file,
while($line = $file_handle)
When I ran this code, the program hung.
I noticed that to read the file using a file handle, we need to use <>
while($line = <$file_handle>)
The latter code obviously ran.
Now I know the <> operator reads the file line by line, but I want to know what exactly is happening when I don't provide the <> operator.
Is it not able to find the end of the line, or something else?
Thank you

Given this code:
use warnings;
use strict;
open my $fh, '<', 'in.txt' or die $!;
while (my $x = $fh){
$DB::single=1;
print "$x\n";
}
when run in the debugger, you can see that $x now contains a GLOB (a copy of the file handle). On each iteration, $x takes the whole handle as an assignment, and causes an infinite loop, because the statement is always true. Because you're always assigning a true value (the handle), the while statement is effectively no different than writing while(1){....
perl -d script.pl
main::(x.pl:4): open my $fh, '<', 'in.txt' or die $!;
DB<1> s
main::(x.pl:6): while (my $x = $fh){
DB<1> s
main::(x.pl:7): $DB::single=1;
DB<1> x $x
0 GLOB(0xbfdae8)
-> *main::$fh
FileHandle({*main::$fh}) => fileno(6)
DB<2> s
main::(x.pl:7): $DB::single=1;
DB<2> x $x
0 GLOB(0xbfdae8)
-> *main::$fh
FileHandle({*main::$fh}) => fileno(6)
<$fh> essentially extracts a single line from the file handle, and when EOF is hit (or an error occurs), returns undef, which terminates the loop (because undef is false).
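The difference between the two loop conditions can be seen side by side in a minimal sketch. It assumes an in-memory filehandle (so no in.txt is needed) and adds a counter as a guard so that the assignment-only loop stops; the names $data and $spins are made up for the illustration.

```perl
use strict;
use warnings;

# In-memory filehandle, an assumption so the sketch needs no file on disk.
my $data = "first\nsecond\nthird\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

# Plain assignment: the condition is the handle itself, which is always true.
my $spins = 0;
while (my $x = $fh) {
    last if ++$spins >= 3;    # guard; without it this loop never ends
}

# <$fh> calls readline, which returns undef at end-of-file and ends the loop.
my @lines;
while (my $line = <$fh>) {
    chomp $line;
    push @lines, $line;
}
close $fh;

print "guarded loop ran $spins times; read ", scalar(@lines), " lines\n";
```

The first loop reads nothing from the handle, which is why all three lines are still available to the second loop.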

In short: the file is read by the <> operator, so without it there is only an assignment, hence the infinite loop.
The while (...) { ... } checks the condition inside () and if true it executes the body in the block {}. It keeps doing this until the condition in () evaluates to what is understood as false (generally 0, '0', '', or undef). This is what the operator <> provides, for example, so we have an idiom
while (my $line = <$file_handle>) { ... }
The <> operator reads a line at each iteration from the resource that $file_handle is associated with and when it reaches end-of-file it returns undef so the loop terminates and the program execution continues at the next statement after the loop. The diamond operator <> is the operator form for the function readline. See I/O Operators in perlop. For all this to work $file_handle has to be a valid resource that can retrieve data.
Without the <> operator nothing is read from anywhere, but there is only an assignment. The code does the following. It copies the variable $file_handle into the variable $line. The return value of that operation in Perl is the value that ends up in $line, and if that is a 'truthy' value then the body { ... } is executed. The $file_handle clearly evaluates to 'true', otherwise the loop body would not execute even once and the program would continue. Thus $line is true, too. So if the $file_handle doesn't change in the body {...} of the loop the condition is always true.
Then whatever is in the body keeps being executed, without a reason for the loop to terminate, and it thus never returns control to the program. It's an infinite loop, and the program appears to hang.
Note that this is sometimes used deliberately and you may see code like
while (1) {
# Compute what is needed
# Recalculate the condition for when to stop
last if $condition_to_terminate;
}
This approach can be risky though, since the condition can get more and more complicated and an error could sneak in, in which case we end up with an infinite loop. There are usually clearer ways to control loops.
A completely different example is an event loop, where it is crucial to enter an infinite loop so that we can then wait for an event of some sort, at which point a particular action is taken. This is how GUIs work, for example, and a number of other systems.

For the example that 'hung':
while($line = $file_handle)
The expression $line = $file_handle is purely an assignment. At that point, your while is just checking that the assigned value is truthy, i.e. that $line is not the number 0, the string "0", an empty string, or undef, which of course it's not. Hence you get an infinite loop.

Related

Meaning of if(!$_)

I have a script in Perl that is reading a file. At some point, the code uses the following if statement inside a for loop:
for (my $i = 0; $i<10 ; $i++ ) {
$_ = <INPUT>;
if (!$_) {last;}
...
I am new to Perl, so I would like to know the meaning of !$_. In this example, $_ is a line of my file. So, what content should the line have for the if condition to be true?

The if condition, what is inside (), is evaluated in a boolean scalar context to be tested for "truthiness." So if it's undef or '' (empty string) or 0 (or string "0") it's false.
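That ! negates what follows it, so if (!$_) is true if $_ is false (undef or '' or 0 or "0"). However, in this case $_ is assigned from the <> operator, so it will normally have a newline at the end and therefore be true, even for an empty line, unless the source for <> was exhausted, in which case <> returns undef. (The one exception is a final line consisting of just 0 with no trailing newline, which is also false.)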
So, in this case, that if (!$_) tests for whether there is nothing more to read from INPUT, and exits the for loop with last if that is the case.
A few comments on the shown code.
That C-style for loop can also be written as for my $i (0..9), which is considered far nicer and more readable.† See foreach, and do read up on the flow-control keywords.
The piece of code
$_=<INPUT>;
if (!$_) { last; }
...
reads from INPUT filehandle and exits its loop (see last) once there is end-of-file. (That need not be an actual file but any resource readable via a filehandle.)
This is clumsy, to say the least; a common way of doing it is
while (<INPUT>) {
...
}
† So much so that even hard-core compiled languages now have it. The C++11 introduced the range-based for loop
for (auto var: container) ... // (really, const auto&), or auto&, or auto&&
and the standard reference linked above says
Used as a more readable equivalent to the traditional for loop [...]
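As a hedged illustration of the points above, here is a sketch of the same "read at most ten lines" loop written with a range-based for and an explicit defined test. The in-memory input and the names $text, $in and @seen are assumptions made for the example; the last line is deliberately "0" with no trailing newline, the case where a plain if (!$_) would stop too early.

```perl
use strict;
use warnings;

# Hypothetical input held in memory; the final line is "0" without a newline.
my $text = "alpha\n\n0";
open my $in, '<', \$text or die "Cannot open in-memory handle: $!";

my @seen;
for my $i (0 .. 9) {
    my $line = <$in>;
    last unless defined $line;    # stops only when the input is exhausted
    chomp $line;
    push @seen, $line;
}
close $in;

print scalar(@seen), " lines read\n";
```

Testing with defined keeps the empty line and the "0" line, so all three lines are collected.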

Definition of empty {} after If() statement in eval{} statement

I am currently attempting to document a Perl script in preparation for converting it to .NET. I have no prior experience in Perl before now, however I was managing to get through it with a lot of Google-fu. I have run into a single line of code that has stopped me as I am unsure of what it does. I've been able to figure out most of it, but I'm missing a piece and I don't know if it's really that important. Here is the line of code:
eval { if(defined $timeEnd && defined $timeStart){}; 1 } or next;
I know that defined is checking the variables $timeEnd and $timeStart to see if they are null/nothing/undef. I also believe that the eval block is being used as a Try/Catch block to trap any exceptions. The line of code is in a foreach loop so I believe the next keyword will continue on with the next iteration of the foreach loop.
The part I'm having difficulty deciphering is the {};1 bit. My understanding is that the ; is a statement separator in Perl and since it's not escaped with a backslash, I have no idea what it is doing there. I also don't know what the {} means. I presume it has something to do with an array, but it would be an empty array and I don't know if it means something special when it is directly after an if() block. Lastly, I have no idea what a single integer of 1 means or what it is doing there at the end of an eval block.
If someone could break that line of code down into individual parts and their definitions, I would greatly appreciate it.
Bonus: If you can give me a .NET conversion, and how each Perl bit relates to it, I will most certainly give you my internet respects. Here's how I would convert it to VB.NET with what I know now:
For each element in xmlList 'This isn't in the Perl code I posted, but it's the `foreach` loop that the code resides in.
Try
If Not IsNothing(timeEnd) AND Not IsNothing(timeStart) then
End If
Catch ex as Exception
Continue For
End Try
Next

Ignoring elsif and else clauses, the syntax of an if statement is the following:
if (EXPR) BLOCK
The block is executed if the EXPR evaluates to something true. A block consists of a list of statements in curly braces. The {} in your code is the block of the if statement.
It's perfectly valid for blocks to be empty (to contain a list of zero statements). For example,
while (s/\s//) { }
is an inefficient way of doing
s/\s//g;
The thing is, the condition in the following has no side-effects, so it's quite useless:
if(defined $timeEnd && defined $timeStart){}
It can't even throw an exception![1] So
eval { if(defined $timeEnd && defined $timeStart){}; 1 } or next;
is equivalent to
eval { 1 } or next;
which is equivalent to[2]
1 or next;
which is equivalent to
# Nothing to see here...
[1] Technically, it can if the variables are magical.
$ perl -MTie::Scalar -e'
our @ISA = "Tie::StdScalar";
tie(my $x, __PACKAGE__);
sub FETCH { die }
defined($x)
'
Died at -e line 4.
I doubt the intent is to check for this.
[2] Technically, it also clears $@.

eval{} returns the result of the last expression (1 in your example), or undef if there was an exception. You can write the same code as
my $ok = eval {
if (defined $timeEnd && defined $timeStart){};
1
};
$ok or next;
From perldoc -f eval
.. the value returned is the value of the last expression evaluated inside the mini-program; a return statement may be also used, just as with subroutines.
If there is a syntax error or runtime error, or a die statement is executed, eval returns undef in scalar context or an empty list in list context, and $@ is set to the error message
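To see why the trailing 1 matters when the eval block does real work, here is a small sketch of the same pattern. The division and the names @kept and @skipped are invented for the illustration and are not from the script in the question.

```perl
use strict;
use warnings;

my (@kept, @skipped);
for my $divisor (2, 0, 5) {
    my $quotient;
    # The block ends in 1, so eval returns true on success.
    # If the division dies, eval returns undef and $@ holds the message.
    unless (eval { $quotient = 10 / $divisor; 1 }) {
        push @skipped, $divisor;
        next;                     # move on to the next iteration
    }
    push @kept, $quotient;
}

print "kept: @kept; skipped: @skipped\n";
```

Without the final 1, a block whose last expression happened to be false (for example a quotient of 0) would be treated as a failure even though nothing died.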

Perl while loops not working

I'm quite new to perl and apologies if this has already been answered in a previous discussion. I have a script that needs to use the declared variables outside the loops, but only one loop is working, even though I have declared the variables outside of the loop, the code is:
my $sample;
open(IN, 'ls /*_R1_*.gz |');
while (my $sample = <IN>) {
chomp $sample;
print "sample = $sample\n";
my $fastq1="${sample}"; #need to use fastq1 later on hence it's declared here
my $sample2;
open(IN, 'ls /*_R2_*.gz |');
while (my $sample2 = <IN>) {
chomp $sample2;
print "sample2 = $sample2\n";
my $fastq2="${sample2}"; #need to use fastq2 later on hence it's declared here
}
}
Sample2 works but sample1 does not, only the first sample is output and then the loop goes onto sample2, the output is:
sample =/sample1_R1_001.fastq.gz
sample2 =/sample1_R2_001.fastq.gz
sample2 =/sample2_R2_001.fastq.gz
sample2 =/sample3_R2_001.fastq.gz
etc..
Can anyone figure this out?
Thanks

From your comments, I assume that your problem is probably that you declare $fastq1 and $fastq2 inside the loop. That means they will be out of scope outside the loops, and not accessible. You need something like:
my ($fastq1, $fastq2);
while ( ... ) {
....
$fastq1 = $sample;
}
Note that this will only save the last value in the loop of that variable. The others will of course be overwritten each loop iteration. If you have more values to save, use an array or hash.
Some other notes on your code.
You should always use
use strict;
use warnings;
Not doing so is a very bad idea, as it will only hide the errors and warnings, not solve them.
my $sample;
You declare this variable twice.
open(IN, 'ls /*_R1_*.gz |');
This is just bad on all possible levels:
System calls are always the least desirable option, unless no alternatives exist
Perl has many ways of reading file names
Parsing the output of ls is fragile and not portable
Piping the result of the system command through open is compounding the other flaws with this approach.
Recommended solution: Use either opendir + readdir or glob:
for my $files (</*_R1_*.gz>) { ... }
# or
opendir my $dh, "/" or die $!;
while (my $file = readdir $dh) {
next unless $file =~ /_R1_.*\.gz$/;
...
}
my $fastq1 = "${sample}";
You do not need to quote a variable, nor do you need the curly braces.
When declaring the variable with my inside a loop, it only retains its value that single loop iteration. Since you never use this variable, I assume you meant to use it outside the loop. But it will be out of scope there.
This can be written
my $fastq1 = $sample;
But you probably want to declare those variables outside your while loops, or they will be out of scope there. You should know that this will only save the last value for these variables, of course.
Also, as Rohit says, your loops are nested, which I assume is not what you wanted. This is most likely because you do not use a proper text editor to write your code, so your indentation is all messed up, and it is hard to see where one loop ends. Follow Rohit's advice there.
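Putting these suggestions together, a rough sketch might look like the following. It assumes a scratch directory with made-up file names (created via File::Temp, only so the example is self-contained and its path has no spaces) and collects every match in arrays declared outside the loops, rather than keeping just the last value.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical file names in a scratch directory, created only for this sketch.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(s1_R1_001.fastq.gz s1_R2_001.fastq.gz
                 s2_R1_001.fastq.gz s2_R2_001.fastq.gz)) {
    open my $out, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $out;
}

# Declared outside the loops, so they are still in scope afterwards.
my (@fastq1, @fastq2);

# Two separate loops, not nested; glob replaces the piped ls.
for my $file (glob "$dir/*_R1_*.gz") { push @fastq1, $file }
for my $file (glob "$dir/*_R2_*.gz") { push @fastq2, $file }

print scalar(@fastq1), " R1 files, ", scalar(@fastq2), " R2 files\n";
```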

You are closing the first while loop after the end of the 2nd while loop. Because of that, your 2nd while loop becomes a part of your 1st while loop, wherein you are re-assigning the filehandle IN to a different command. And since you exhaust it in the inner while loop, your outer while loop never runs again.
You should close the brace before starting the next while:
while(my $sample = <IN>){
chomp $sample;
print "sample = $sample\n";
my $fastq1="${sample}";
} # You need this
my $sample2;
open(IN, 'ls /data_n2/vmistry/Fluidigm_Exome/300bp_fastq/*_R2_*.gz |');
while(my $sample2 = <IN>){
chomp $sample2;
print "sample2 = $sample2\n";
my $fastq2="${sample2}";
}
# } # Remove this

Syntax errors at line 24 and 26. I don't know why?

syntax error at bioinfo2.pl line 24, near ");"
syntax error at bioinfo2.pl line 26, near "}"
Execution of bioinfo2.pl aborted due to compilation errors.
print "Enter file name......\n\n";
chomp($samplefile = <STDIN>);
open(INFILE,"$samplefile") or die "Could not open $samplefile";
@residue_name= ();
@residue_count= ();
while($newline = <INFILE>)
{
if ($newline =~ /^ATOM/)
{
chomp $newline;
@columns = split //, $newline;
$res = join '', $columns[17], $columns[18], $columns[19];
splice @columns,0;
$flag=0
for ($i = 0; $i<scalar(@residue_name); $i++;)
{
if (@residue_name[i] == $res)
{
@residue_count[i] = @residue_count[i] + 1;
$flag=1;
}
}
if($flag==0)
{
push(@residue_name, $res);
}
for ($i = 0; $i<scalar(@residue_name); $i++)
{
print (@residue_name[i], "-------", @residue_count[i], "\n");
}
}
}

It might be advisable to use strict; use warnings. That forces you to declare your variables (you can do so with my), and rules out many possible errors.
Here are a few things that I noticed:
In Perl5 v10 and later, you can use the say function (use 5.010 or use feature 'say'). This works like print but adds a newline at the end.
Never use the two-arg form of open. This opens some security issues. Provide an explicit open mode. Also, you can use scalars as filehandles; this provides nice features like auto-closing of files.
open my $INFILE, '<', $samplefile or die "Can't open $samplefile: $!";
The $! variable contains the reason why the open failed.
If you want to retrieve a list of elements from an array, you can use a slice (multiple subscripts):
my $res = join '', @columns[17 .. 19]; # also, range operator ".."
Note that the sigil is now an @, because we take multiple elems.
The splice @columns, 0 is a fancy way of saying “delete all elements from the array, and return them”. This is not necessary (you don't read from that variable later). If you use lexical variables (declared with my), then each iteration of the while loop will receive a new variable. If you really want to remove the contents, you can undef @columns. This should be more efficient.
Actual error: You require a semicolon after $flag = 0 to terminate the statement before you can begin a loop.
Actual error: A C-style for-loop contains three expressions contained in parens. Your last semicolon divides them into 4 expressions, this is an error. Simply remove it, or look at my next tip:
C-style loops (for (foo; bar; baz) {}) are painful and error-prone. If you only iterate over a range (e.g. of indices), then you can use the range operator:
for my $i (0 .. $#residue_name) { ... }
The $# sigil gives the last index of an array.
When subscripting arrays (accessing array elements), then you have to include the sigil of the index:
$residue_name[$i]
Note that the sigil of the array is $, because we access only one element.
The pattern $var = $var + 1 can be shortened to $var++. This uses the increment operator.
The $flag == 0 could be abbreviated to !$flag, as all numbers except zero are considered true.
Here is a reimplementation of the script. It takes the filename as a command line argument; this is more flexible than prompting the user.
#!/usr/bin/perl
use strict; use warnings; use 5.010;
my $filename = $ARGV[0]; # @ARGV holds the command line args
open my $fh, "<", $filename or die "Can't open $filename: $!";
my @residue_name;
my @residue_count;
while(<$fh>) { # read into "$_" special variable
next unless /^ATOM/; # start a new iteration if regex doesn't match
my $number = join "", (split //)[17 .. 19]; # who needs temp variables?
my $push_number = 1; # self-documenting variable names
for my $i (0 .. $#residue_name) {
if ($residue_name[$i] == $number) {
$residue_count[$i]++;
$push_number = 0;
}
}
push @residue_name, $number if $push_number;
# are you sure you want to print this after every input line?
# I'd rather put this outside the loop.
for my $i (0 .. $#residue_name) {
say $residue_name[$i], ("-" x 7), $residue_count[$i]; # "x" repetition operator
}
}
And here is an implementation that may be faster for large input files: We use hashes (lookup tables), instead of looping through arrays:
#!/usr/bin/perl
use strict; use warnings; use 5.010;
my $filename = $ARGV[0]; # @ARGV holds the command line args
open my $fh, "<", $filename or die "Can't open $filename: $!";
my %count_residue; # this hash maps the numbers to counts
# automatically guarantees that every number has one count only
while(<$fh>) { # read into "$_" special variable
next unless /^ATOM/; # start a new iteration if regex doesn't match
my $number = join "", (split //)[17 .. 19]; # who needs temp variables?
if (exists $count_residue{$number}) {
# if we already have an entry for that number, we increment:
$count_residue{$number}++;
} else {
# We add the entry, and initialize to zero
$count_residue{$number} = 0;
}
# The above if/else initializes new numbers (seen once) to zero.
# If you want to count starting with one, replace the whole if/else by
# $count_residue{$number}++;
# print out all registered residues in numerically ascending order.
# If you want to sort them by their count, descending, then use
# sort { $count_residue{$b} <=> $count_residue{$a} } ...
for my $num (sort {$a <=> $b} keys %count_residue) {
say $num, ("-" x 7), $count_residue{$num};
}
}
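For comparison, here is a compact sketch of the hash-based count that starts at one and prints after the loop. It makes assumptions beyond the answer above: the records are invented, fixed-width lines held in an array, the field at offsets 17 to 19 is treated as a three-letter string (so the keys are sorted as strings, not numerically), and substr is swapped in for the split and slice.

```perl
use strict;
use warnings;

# Hypothetical fixed-width records; characters 17..19 (0-based) hold the code.
my @records = (
    "ATOM      1  N   ALA A   1",
    "ATOM      2  CA  ALA A   1",
    "HETATM    3  O   HOH A   2",
    "ATOM      4  N   GLY A   3",
);

my %count;
for my $record (@records) {
    next unless $record =~ /^ATOM/;          # skip non-ATOM records
    my $residue = substr $record, 17, 3;     # three characters from offset 17
    $count{$residue}++;                      # a new key starts at 1
}

# Print once, after all input has been counted.
for my $residue (sort keys %count) {
    print $residue, ("-" x 7), $count{$residue}, "\n";
}
```

Because ++ on a missing hash entry creates it with the value 1, no exists check is needed.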

It took me a while to chase down all the various errors. As others have said, use use warnings; and use strict;
Rule #1: Whenever you see syntax error pointing to a perfectly good line, you should always see if the line before is missing a semicolon. You forgot the semicolon after $flag=0.
In order to track down all the issues, I've rewritten your code into a more modern syntax:
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;
print "Enter file name......\n\n";
chomp (my $samplefile = <STDIN>);
open my $input_file, '<:crlf', $samplefile;
my @residue_name;
my @residue_count;
while ( my $newline = <$input_file> ) {
chomp $newline;
next if $newline !~ /^ATOM/; #Eliminates the internal `if`
my @columns = split //, $newline;
my $res = join '', $columns[17], $columns[18], $columns[19];
my $flag = 0;
for my $i (0..$#residue_name) {
if ( $residue_name[$i] == $res ) {
$residue_count[$i]++;
$flag = 1;
}
}
if ( $flag == 0 ) {
push @residue_name, $res;
}
for my $i (0..$#residue_name) {
print "$residue_name[$i] ------- $residue_count[$i]\n";
}
}
close $input_file;
Here's a list of changes:
Lines 2 & 3: Always use use strict; and use warnings;. These will help you track down about 90% of your program errors.
Line 4: Use use autodie;. This will eliminate the need for checking whether a file opened or not.
Line 7 (and others): Using use strict; requires you to predeclare variables. Thus, you'll see my whenever a variable is first used.
Line 8: Use the three parameter open and use lexical variables for file handles instead of globs (i.e. $file_handle vs. FILE_HANDLE). The main reason is that lexical variables are easier to pass into subroutines than globs.
Lines 9 & 10: No need to initialize the arrays; just declaring them is enough.
Line 13: Always chomp as soon as you read in.
Line 14: Doing this eliminates an entire inner if statement that embraces your entire while loop. Code blocks (such as if, while, and for) get hard to figure out when they get too long and too many are embedded inside each other. Using next in this way allows me to eliminate the if block.
Line 17: Here's where you missed the semicolon which gave you your first syntax error. The main thing is I eliminated the very confusing splice command. If you want to zero out your array, you could have simply said @columns = (); which is much clearer. However, since @columns is now in scope only in the while loop, I no longer have to blank it out since it will be redefined for each line of your file.
Line 18: This is a much cleaner way of looping through all the elements of your array. Note that $#residue_name gives you the last index of @residue_name while scalar @residue_name gives you the number of elements. This is a very important distinction! If I have an @array = (0, 1, 2, 3, 4), $#array will be 4, but scalar @array will be 5. Using the C style for loop can be a bit confusing when doing this. Should you use > or >=? Using (0..$#residue_name) is obvious and eliminates the chance of errors, which included the extra semicolon inside your C style for statement. Because of the chance of errors and the complexity of the syntax, the developers who created Python decided not to allow C style for loops.
Line 19 (and others): Using warnings pointed out that you did @residue_name[i] and it had several issues. First of all, you should use $residue_name[...] when indexing an array, and second of all, i is not an integer. You meant $i. Thus @residue_name[i] becomes $residue_name[$i].
Line 20: If you're incrementing a variable, use $foo++; or $foo += 1; and not $foo = $foo + 1;. The first two make it easier to see that you're incrementing a variable and not recalculating its value.
Line 29: One of the great features of Perl is that variables can be interpolated inside quotes. You can put everything inside a single set of quotes. By the way, you should use . and not , if you do break up a print statement into multiple pieces. The , is a list operation. This means that what you print out is dependent upon the value of $,. The $, is a Perl variable that says what to print out between each item of a list that you pass to print (a separate variable, $", is used when you interpolate an array into a string).
Please don't take this as criticism of your coding abilities. Many books that teach Perl, and many courses that teach Perl, seem to teach it as it was back in the Perl 3.0 days. When I first learned Perl, it was at Perl 3.0, and much of my syntax would have looked like yours. However, Perl 5.x has been out for quite a while and contains many features that make programs easier to write and cleaner to read.
It took me a while to get out of Perl 3.0 habits and into Perl 4.0 and later Perl 5.0 habits. You learn by looking at what others do, and asking questions on forums like Stack Overflow.
I still can't say your code will work. I don't have your input, so I can't test it against that. However, by using this code as the basis of your program, debugging these errors should be pretty easy.
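Two of the distinctions in the list of changes, $#array versus scalar @array and $, versus $", can be checked with a short sketch. The sample values and the in-memory output handle are assumptions made purely for the illustration.

```perl
use strict;
use warnings;

my @array = (0, 1, 2, 3, 4);
my $last_index = $#array;          # index of the last element
my $count      = scalar @array;    # number of elements

# $" separates array elements interpolated into a string.
my $interpolated;
{
    local $" = '-';
    $interpolated = "@array";
}

# $, separates the arguments of print; captured here in an in-memory handle.
my $printed = '';
{
    local $, = '|';
    open my $out, '>', \$printed or die "Cannot open in-memory handle: $!";
    print {$out} 'a', 'b', 'c';
    close $out;
}

# prints: last index 4, count 5, [0-1-2-3-4], [a|b|c]
print "last index $last_index, count $count, [$interpolated], [$printed]\n";
```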

How can I check for eof in Perl?

So I have a bit of problem figuring what Perl does in the following case:
while(1){
$inputLine=<STDIN>
#parse $inputLine below
#BUT FIRST, I need to check if $inputLine = EOF
}
Before I get the obvious answer of using while(<>){}, let me say that there is a very strong reason that I have to do the above (basically setting up an alarm to interrupt blocking, and I didn't want that code to clutter the example).
Is there some way to compare $inputLine == undef (as I think that is what STDIN returns at the end)?
Thanks.

Inside your loop, use
last unless defined $inputLine;
From the perlfunc documentation on defined:
defined EXPR
defined
Returns a Boolean value telling whether EXPR has a value other than the undefined value undef. If EXPR is not present, $_ will be checked.
Many operations return undef to indicate failure, end of file, system error, uninitialized variable, and other exceptional conditions. This function allows you to distinguish undef from other values. (A simple Boolean test will not distinguish among undef, zero, the empty string, and "0", which are all equally false.) Note that since undef is a valid scalar, its presence doesn't necessarily indicate an exceptional condition: pop returns undef when its argument is an empty array, or when the element to return happens to be undef.

defined($inputLine)
Also, see the 4 argument version of the select function for an alternative way to read from a filehandle without blocking.

You can use eof on the filehandle. eof will return 1 if the next read on FILEHANDLE is an EOF.
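A minimal sketch of an eof-driven loop follows. It assumes an in-memory filehandle in place of STDIN so that it can run unattended; the names $text and @lines are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical input held in memory instead of STDIN.
my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @lines;
until (eof($fh)) {          # true once the next read would hit end-of-file
    my $line = <$fh>;
    chomp $line;
    push @lines, $line;
}
close $fh;

print scalar(@lines), " lines\n";
```

Note that eof may have to wait for input in order to answer on an interactive handle, so on a terminal the defined test after the read is often the simpler choice.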

The following will have problems with input files that have lines which contain only a line feed or, as in the case that was giving me problems, a FF (form feed) at the beginning of some lines. (The file was the output from a program developed at the end of the 70s, which still has formatting for a line printer and is still in FORTRAN. I do miss the wide continuous paper for drawing flow diagrams on the back.)
open (SIMFIL, "<", 'InputFileName') or die "Can't open InputFileName\n" ;
open (EXTRDATS, ">>", 'OutputFileName' ) or die "Can't open OutputFileName\n";
$Simfilline = "";
while (<SIMFIL>) {
$Simfilline = <SIMFIL>;
print EXTRDATS $Simfilline;
$Simfilline = <SIMFIL>;
print EXTRDATS $Simfilline;
}
close SIMFIL;
close EXTRDATS;
The following is when eof comes in handy: the test "while (<SIMFIL>)" itself reads a line on every pass (into $_, which the fragment above never prints), whereas eof only checks whether more input remains.
open (SIMFIL, "<", 'InputFileName') or die "Can't open InputFileName\n" ;
open (EXTRDATS, ">>", 'OutputFileName' ) or die "Can't open OutputFileName\n";
$Simfilline = "";
while (!eof SIMFIL) {
$Simfilline = <SIMFIL>;
print EXTRDATS $Simfilline;
$Simfilline = <SIMFIL>;
print EXTRDATS $Simfilline;
}
close SIMFIL;
close EXTRDATS;
This last code fragment appears to duplicate the input file exactly.