Passing Arguments Without Commas and With Produces Different Results - perl

I'm trying to figure out why these two lines produce different results:
print($fh, "text"); --> 0x10101010 text (on STDOUT)
print($fh "text"); --> text (inside of file $fh)
When I have the comma I understand I create a list and when print only has a list it prints the list to STDOUT.
But, what is print doing when I don't have a comma? The result I want is the one I get without a comma.
This seems strange to me and runs counter to my expectation that the version with the comma would work for my intended purpose. Code I usually see does filehandle printing with a line like "print $file "text"", but I want to use the parentheses as I find that more consistent with other languages. But omitting the comma is just as inconsistent.
An explanation of the internals of "print" might help me understand. How is it getting the FILEHANDLE and LIST separate when there is no comma?
Docs: http://perldoc.perl.org/functions/print.html
Thanks!

print isn't a normal function, and you shouldn't call it with the parentheses because you're not really passing a parameter list to the function.
The way I typically see it written is
print {$fh} 'text';
print {$fh} 'text1', 'text2';
or not going to a file:
print 'text';
print 'text1', 'text2';
You ask "How is it getting the FILEHANDLE and LIST separate when there is no comma?" and the answer is "Magic, because it's not a normal function."

In Perl, parens are mostly just used for precedence. It is customary to call builtins like print without parens – this emphasizes that they aren't subroutines, but special syntax like for, map, split, or my.
In your case, you have a variety of possibilities:
Leave out the comma, but this is error-prone:
print($fh @list);
print $fh (@list);
Use curly braces around the file handle (which I would suggest anyway):
print { $fh } (@list);
print({ $fh } @list);
Use the object-oriented interface:
use IO::File; # on older perls
$fh->print(@list);
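As a rough sketch of the options above (the in-memory handle and the sample data are assumptions chosen for illustration, not part of the original question), every form that omits the comma sends the list to the handle:

```perl
use strict;
use warnings;
use IO::Handle;    # makes $fh->print available on older perls

my @list = ('a', 'b', 'c');    # hypothetical sample data

# Write to an in-memory handle so the example leaves no file behind.
my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";

print $fh @list;       # no comma, no parentheses
print($fh @list);      # no comma, with parentheses
print {$fh} @list;     # handle wrapped in a block
print({$fh} @list);    # block form, with parentheses
$fh->print(@list);     # object-oriented interface

close($fh) or die "Cannot close in-memory handle: $!";

print "buffer now holds: $buffer\n";
```

Each of the five statements appends "abc" to the buffer, so none of them is treated as print LIST.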


Reading a file line by line in Perl

I want to read a file by one line, but it's reading just the first line. How to read all lines?
My code:
open(file_E, $file_E);
while ( <file_E> ) {
/([^\n]*)/;
print $line1;
}
close($file_E);
Let's start by looking at your code.
open(file_E, $file_E);
while ( <file_E> ) {
/([^\n]*)/;
print $line1;
}
close($file_E);
On the first line you open a file named in $file_E using the bareword filehandle file_E. This should work so long as the file successfully opens. It would be better to also check the success of this operation one of two ways: Either put use autodie; at the top of your script (but then risk applying its semantics in places where your code is incompatible with this level of error handling), or change your open to look like this:
open(file_E, $file_E) or die "Failed to open $file_E: $!\n";
Now if you fail to open the file you will get an error message that will help track down the problem.
Next let's look at the while loop, because it's here where you have an issue that is causing the bug you are experiencing. On the first line of the while loop you have this:
while ( <file_E> ) {
By consulting perldoc perlsyn you will see that line is special-cased to actually do this:
while (defined($_ = <file_E>)) {
So your code is implicitly assigning each line to $_ on successive iterations. Also by consulting perldoc perlop you'll find that when the match operator (/.../ or m/.../) is invoked without binding the match explicitly using =~, the match will bind against $_. So far, so good. However, you are not actually doing anything useful with the match. The match operator will return Boolean truth / falsehood for whether or not the match succeeded. And because your pattern contains capturing parentheses, it will capture something into the capture variable $1. But you are never testing for match success, nor are you ever referring to $1 again.
On the line that follows, you do this: print $line1. Where, in your code, is $line1 being assigned a value? Because it is never being assigned a value in what you've shown us.
I can only guess that your intent is to iterate over the lines of the file, capture the line but without the trailing newline, and then print it. It seems that you wish to print it without any newlines, so that all of the input file is printed as a single line of output.
open my $input_fh_e, '<', $file_E or die "Failed to open $file_E: $!\n";
while(my $line = <$input_fh_e>) {
chomp $line;
print $line;
}
close $input_fh_e or die "Failed to close $file_E: $!\n";
No need to capture anything -- if all that the capture is doing is just grabbing everything up to the newline, you can simply strip off the newline with chomp to begin with.
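To illustrate that point, here is a small sketch comparing the two approaches. The helper names via_capture and via_chomp are hypothetical, introduced only for this comparison:

```perl
use strict;
use warnings;

# Hypothetical helpers: both return the line without its trailing newline.
sub via_capture {
    my ($line) = @_;
    # Test the match before trusting $1.
    return $line =~ /([^\n]*)/ ? $1 : undef;
}

sub via_chomp {
    my ($line) = @_;
    chomp $line;    # removes one trailing newline, if present
    return $line;
}

for my $line ("first line\n", "second line\n") {
    printf "capture: <%s>  chomp: <%s>\n", via_capture($line), via_chomp($line);
}
```

For ordinary single-line input both helpers return the same string, which is why chomp alone is enough here.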
In my example I used a lexical filehandle (a file handle that is lexically scoped, declared with my). This is generally a better practice in modern Perl as it avoids using a bareword, avoids possible namespace collisions, and assures that the handle will get closed as soon as the lexical scope closes.
I also used the 'three arg' version of open, which is safer because it eliminates the potential for $file_E to be used to open a pipe or do some other nefarious or simply unintended shell manipulation.
I suggest also starting your script with use strict;, because had you done so, you would have gotten an error message at compile time telling you that $line1 was never declared. Also start your script with use warnings;, so that you would get a warning when you try to print $line1 before assigning a value to it.
Most of the issues in your code will be discussed in perldoc perlintro, which you can arrive at from your command line simply by typing perldoc perlintro, assuming you have Perl installed. It typically takes 20-40 minutes to read through perlintro. If ever there were a document that should constitute required reading before getting started writing Perl code, that reading would probably include perlintro.
Another alternative; note that $_ includes the newline, so you will need to chomp it if you don't want the newline in $line:
open(file_E, $file_E) or die "Failed to open $file_E: $!\n";
while ( <file_E> ) {
my $line = $_;
print $line;
}
close(file_E); # close the handle, not the file name

Function parameter separator in Perl?

Function parameters are usually separated by comma(,), but seems also by space in some cases, like print FILEHANDLE 'string'. Why are both of such separators necessary?
print is a special builtin Perl function with special syntax rules. The documentation lists four possible invocations:
print FILEHANDLE LIST
print FILEHANDLE
print LIST
print
So while in general function arguments are separated by a comma (,), this is an exception that disambiguates the print destination from the contents to be printed. Another function that exhibits this behavior is system.
Another feature that shares this syntax is called "Indirect Object Notation". An expression of the form:
function object arg1, arg2, … argn
# ^- no comma here
Is equivalent to:
object->function(arg1, arg2, … argn)
So that following statement pairs are equivalent:
$foo = new Bar;
$foo = Bar->new;
The Indirect Object Notation has several problems and generally should be avoided, save for a few well-known idioms such as print F "something".
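A minimal sketch of that equivalence, using a hypothetical class Bar defined in the same file (the class and its empty constructor are assumptions made for illustration):

```perl
use strict;
use warnings;

package Bar;    # hypothetical class, for illustration only

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

package main;

my $arrow    = Bar->new;    # arrow syntax
my $indirect = new Bar;     # indirect object notation: the same method call

# Both calls construct an object of the same class.
print ref($arrow), ' ', ref($indirect), "\n";
```

Note that this relies on the indirect feature being enabled, which is the default; as far as I know, scripts that start with use v5.36 (or later) turn it off, and new Bar then fails to compile.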
In that example, the special syntax exists because print takes a list of arguments and has a default destination, so Perl needs to tell the difference between printing a scalar and printing to a filehandle.
The interpreter can tell the difference between:
print $fh "some scalar text";
print $bar, "set to a value ","\n";
print $fh;
print $bar;
But this is rather a special case - most functions don't work like that. For print, I normally suggest surrounding the filehandle argument in braces to make the distinction explicit.
You can look at prototypes as a way to get perl to do things with parameters, but I'd also normally suggest that makes for less clear code.

Why is the perl print filehandle syntax the way it is?

I am wondering why the perl creators chose an unusual syntax for printing to a filehandle:
print filehandle list
with no comma after filehandle. I see that it's to distinguish between "print list" and "print filehandle list", but why was the ad-hoc syntax preferred over creating two functions - one to print to stdout and one to print to given filehandle?
In my searches, I came across the explanation that this is an indirect object syntax, but didn't the print function exist in perl 4 and before, whereas the object-oriented features came into perl relatively late? Is anyone familiar with the history of print in perl?
Since the comma is already used as the list constructor, you can't use it to separate semantically different arguments to print.
open my $fh, ...;
print $fh, $foo, $bar
would just look like you were trying to print the values of 3 variables. There's no way for the parser, which operates at compile time, to tell that $fh is going to refer to a file handle at run time. So you need a different character to syntactically (not semantically) distinguish between the optional file handle and the values to actually print to that file handle.
At this point, it's no more work for the parser to recognize that the first argument is separated from the second argument by blank space than it would be if it were separated by any other character.
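The difference between the two parses can be observed with a short sketch. It assumes two in-memory handles, one playing the role of the file and one standing in for STDOUT, so that both results can be inspected:

```perl
use strict;
use warnings;

# Two in-memory handles: one plays the file, the other stands in for STDOUT.
my ($file, $console) = ('', '');
open(my $fh,  '>', \$file)    or die "Cannot open in-memory handle: $!";
open(my $out, '>', \$console) or die "Cannot open in-memory handle: $!";

my $previous = select($out);    # prints without a handle now go to $out
print $fh, "text";              # comma: print LIST, so $fh is stringified
print $fh "text";               # no comma: print FILEHANDLE LIST
select($previous);              # restore the original default handle

close($fh)  or die "Cannot close in-memory handle: $!";
close($out) or die "Cannot close in-memory handle: $!";

print "default handle received: $console\n";    # GLOB(0x...)text, address varies
print "file handle received:    $file\n";
```

The statement with the comma never touches $fh as a handle; it only prints its string form to the default output.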
If Perl had used the comma to make print look more like a function, the filehandle would always have to be included if you are including anything to print besides $_. That is the way functions work: If you pass in a second parameter, the first parameter must also be included. There isn't one function I can think of in Perl where the first parameter is optional when the second parameter exists. Take a look at split. It can be written using zero to three parameters. However, if you want to specify a LIMIT, you have to specify the first two parameters too.
If you look at other languages, they all include two different ways to print: One if you want STDOUT, and another if you're printing to something besides STDOUT. Thus, Python has both print and write. C has both printf and fprintf. However, Perl can do this with just a single statement.
Let's look at the print statement a bit more closely -- thinking back to 1987 when Perl was first written.
You can think of the print syntax as really being:
print <filehandle> <list_to_print>
To print to the file behind the filehandle OUTFILE, you would say:
print OUTFILE "This is being printed to myfile.txt\n";
The syntax is almost English-like (PRINT to OUTFILE the string "This is being printed to myfile.txt\n").
You can also do the same thing with STDOUT:
print STDOUT "This is being printed to your console";
print STDOUT " unless you redirected the output.\n";
As a shortcut, if the filehandle was not given, it would print to STDOUT or whatever filehandle the select was set to.
print "This is being printed to your console";
print " unless you redirected the output.\n";
select OUTFILE;
print "This is being printed to whatever the filehandle OUTFILE is pointing to\n";
Now, we see the thinking behind this syntax.
Imagine I have a program that normally prints to the console. However, my boss now wants some of that output printed to various files when required instead of STDOUT. In Perl, I could easily add a few select statements, and my problems will be solved. In Python, Java, or C, I would have to modify each of my print statements and add some logic to make a file write go to STDOUT when needed (which may involve some conniptions in file opening and dupping to STDOUT).
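A small sketch of that scenario follows. The report subroutine and the in-memory log handle are hypothetical stand-ins for existing code and for a real output file:

```perl
use strict;
use warnings;

# In-memory handle standing in for the file the boss asked for.
my $log = '';
open(my $log_fh, '>', \$log) or die "Cannot open in-memory handle: $!";

# Hypothetical existing code that prints without naming a handle.
sub report {
    print "status: ok\n";
    return;
}

report();                           # goes to STDOUT
my $previous = select($log_fh);     # change the default output handle
report();                           # same code, now redirected
select($previous);                  # restore the old default
report();                           # STDOUT again

close($log_fh) or die "Cannot close in-memory handle: $!";
```

Only the middle call ends up in $log; the print statement inside report never had to change.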
Remember that Perl wasn't written to be a full-fledged language. It was written to do the quick and dirty job of parsing text files more easily and flexibly than awk did. Over the years, people used it because of its flexibility and new concepts were added on top of the old ones. For example, before Perl 5, there was no such thing as references, which meant there was no such thing as object oriented programming. If we, back in the days of Perl 3 or Perl 4, needed something more complex than the simple list, hash, scalar variable, we had to munge it ourselves. It's not like complex data structures were unheard of. C had struct since its initial beginnings. Heck, even Pascal had the concept with records back in 1969 when people thought bellbottoms were cool. (We plead insanity. We were all on drugs.) However, since neither Bourne shell nor awk had complex data structures, why would Perl need them?
The answer to "why" is probably subjective and something close to "Larry liked it".
Do note however, that indirect object notation is not a feature of print, but a general notation that can be used with any object or class and method. For example with LWP::UserAgent.
use strict;
use warnings;
use LWP::UserAgent;
my $ua = new LWP::UserAgent;
my $response = get $ua "http://www.google.com";
my $response_content = decoded_content $response;
print $response_content;
Any time you write method object, it means exactly the same as object->method. Note also that the parser seems to work reliably only as long as you don't nest such notations and don't use complex expressions to get the object, so unless you want to have lots of fun with brackets and quoting, I'd recommend against using it anywhere except the common cases of print, close and the rest of the IO methods.
Why not? It's concise and it works, in Perl's DWIM spirit.
Most likely it's that way because Larry Wall liked it that way.

How to tell perl to print to a file handle instead of printing the file handle?

I'm trying to wrap my head around the way Perl handles the parsing of arguments to print.
Why does this
print $fh $stufftowrite
write to the file handle as expected, but
print($fh, $stufftowrite)
writes the file handle to STDOUT instead?
My guess is that it has something to do with the warning in the documentation of print:
Be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print; put parentheses around all arguments (or interpose a + , but that doesn't look as good).
Should I just get used to the first form (which just doesn't seem right to me, coming from languages that all use parentheses around function arguments), or is there a way to tell Perl to do what I want?
So far I've tried a lot of combination of parentheses around the first, second and both parameters, without success.
On lists
The structure bareword (LIST1), LIST2 means "apply the function bareword to the arguments LIST1", while bareword +(LIST1), LIST2 can, but doesn't necessarily, mean "apply bareword to the arguments of the combined list LIST1, LIST2". This is important for grouping arguments:
my ($a, $b, $c) = (0..2);
print ($a or $b), $c; # print $b
print +($a or $b), $c; # print $b, $c
The prefix + can also be used to distinguish hashrefs from blocks, and functions from barewords, e.g. when subscripting a hash: $hash{shift} returns the element stored under the key "shift", while $hash{+shift} calls the function shift and returns the hash element for the value that shift returns.
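Both uses of the prefix + can be tried out with the sketch below. The hash contents, the helper names by_literal and by_computed, and the in-memory handle are assumptions made for illustration:

```perl
use strict;
use warnings;

my %hash = (shift => 'literal key', first => 'computed key');

sub by_literal  { return $hash{shift}  }    # bareword: the key is the string "shift"
sub by_computed { return $hash{+shift} }    # unary plus: the key is shift(@_)

print by_literal('first'),  "\n";
print by_computed('first'), "\n";

# Grouping with print, captured in memory so the result can be inspected.
my ($x, $y, $z) = (0 .. 2);
my $captured = '';
open(my $fh, '>', \$captured) or die "Cannot open in-memory handle: $!";
my $previous = select($fh);
print +($x or $y), $z;    # the + keeps the parentheses from ending the argument list
select($previous);
close($fh) or die "Cannot close in-memory handle: $!";

print "captured: $captured\n";
```

Here by_literal ignores its argument entirely, while the grouped print writes both the value of ($x or $y) and $z.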
Indirect syntax
In object oriented Perl, you normally call methods on an object with the arrow syntax:
$object->method(LIST); # call `method` on `$object` with args `LIST`.
However, it is possible, but not recommended, to use an indirect notation that puts the verb first:
method $object (LIST); # the same, but stupid.
Because classes are just instances of themselves (in a syntactic sense), you can also call methods on them. This is why
new Class (ARGS); # bad style, but pretty
is the same as
Class->new(ARGS); # good style, but ugly
However, this can sometimes confuse the parser, so indirect style is not recommended.
But it does hint at what print does:
print $fh ARGS
is the same as
$fh->print(ARGS)
Indeed, the filehandle $fh is treated as an object of the class IO::Handle.
(While this is a valid syntactic explanation, it is not quite true. The source of IO::Handle itself uses the line print $this @_;. The print function is just defined this way.)
Looks like you have a typo. You have put a comma between the file handle and the argument in the second print statement. If you do that, the file handle will be seen as an argument. This seems to apply only to lexical file handles. If done with a global file handle, it will produce the fatal error
No comma allowed after filehandle at ...
So, to be clear, if you absolutely have to have parentheses for your print, do this:
print($fh $stufftowrite)
Although personally I prefer to not use parentheses unless I have to, as they just add clutter.
The Modern Perl book states in Chapter 11 ("What to Avoid"), section "Indirect Notation Scalar Limitations":
Another danger of the syntax is that the parser expects a single scalar expression as the object. Printing to a filehandle stored in an aggregate variable seems obvious, but it is not:
# DOES NOT WORK AS WRITTEN
say $config->{output} 'Fun diagnostic message!';
Perl will attempt to call say on the $config object.
print, close, and say—all builtins which operate on filehandles—operate in an indirect fashion. This was fine when filehandles were package globals, but lexical filehandles (Filehandle References) make the indirect object syntax problems obvious. To solve this, disambiguate the subexpression which produces the intended invocant:
say {$config->{output}} 'Fun diagnostic message!';
Of course, print({$fh} $stufftowrite) is also possible.
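A minimal sketch of the block form with a handle stored in a hash, assuming an in-memory handle and a hypothetical $config structure; print is used instead of say so that no feature needs to be enabled:

```perl
use strict;
use warnings;

# In-memory handle so the output can be inspected afterwards.
my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";

my $config = { output => $fh };    # hypothetical configuration hash

# The block tells the parser which expression yields the filehandle.
print { $config->{output} } "Fun diagnostic message!\n";

close($fh) or die "Cannot close in-memory handle: $!";

print "captured: $buffer";
```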
It's how the syntax of print is defined. It's really that simple. There's kind of nothing to fix. If you put a comma between the file handle and the rest of the arguments, the expression is parsed as print LIST rather than print FILEHANDLE LIST. Yes, that looks really weird. It is really weird.
The way not to get parsed as print LIST is to supply an expression that can legally be parsed as print FILEHANDLE LIST. If what you're trying to do is get parentheses around the arguments to print to make it look more like an ordinary function call, you can say
print($fh $stufftowrite); # note the lack of comma
You can also say
(print $fh $stufftowrite);
if what you're trying to do is set off the print expression from surrounding code. The key point is that including the comma changes the parse.

How do I print on a single line all content between certain start- and stop-lines?

while(<FILE>)
{
chomp $_;
$line[$i]=$_;
++$i;
}
for($j=0;$j<$i;++$j)
{
if($line[$j]=~/Syn_Name/)
{
do
{
print OUT $line[$j],"\n";
++$j;
}
until($line[$j]=~/^\s*$/)
}
}
This is my code I am trying to print data between Syn_Name and a blank line.
My code extracts the chunk that I need.
But the data between the chunk is printed line by line. I want the data for each chunk to get printed on a single line.
Simplification of your code. Using the flip-flop operator to control the print. Note that printing the final line will not add a newline (unless the line contained more than one newline). At best, it prints the empty string. At worst, it prints whitespace.
You do not need an intermediate array for the lines; you can use a while loop. In case you want to store the lines anyway, I added a commented line showing how that is best done.
#chomp(my @line = <FILE>);
while (<FILE>) {
chomp;
if(/Syn_Name/ .. /^\s*$/) {
print OUT;
print OUT "\n" if /^\s*$/;
}
}
Contents
Idiomatic Perl
Make errors easier to fix
Warnings about common programming errors
Don't execute unless variable names are consistent
Developing this habit will save you lots of time
Perl's range operator
Working demos
Print chomped lines immediately
Join lines with spaces
One more edge case
Idiomatic Perl
You seem to have a background with the C family of languages. This is fine because it gets the job done, but you can let Perl handle the machinery for you, namely
chomp defaults to $_ (also true with many other Perl operators)
push adds an element to the end of an array
to simplify your first loop:
while (<FILE>)
{
chomp;
push @line, $_;
}
Now you don't have to update $i to keep track of how many lines you've already added to the array.
On the second loop, instead of using a C-style for loop, use a foreach loop:
The foreach loop iterates over a normal list value and sets the variable VAR to be each element of the list in turn …
The foreach keyword is actually a synonym for the for keyword, so you can use foreach for readability or for for brevity. (Or because the Bourne shell is more familiar to you than csh, so writing for comes more naturally.) If VAR is omitted, $_ is set to each value.
This way, Perl handles the bookkeeping for you.
for (@line)
{
# $_ is the current element of @line
...
}
Make errors easier to fix
Sometimes Perl can be too accommodating. Say in the second loop you made an easy typographical error:
for (@lines)
Running your program now produces no output at all, even if the input contains Syn_Name chunks.
A human can look at the code and see that you probably intended to process the array you just created and pluralized the name of the array by mistake. Perl, being eager to help, creates a new empty @lines array, which leaves your foreach loop with nothing to do.
You may delete the spurious s at the end of the array's name but still have a program that produces no output! For example, you may have an unhandled combination of inputs that doesn't open the OUT filehandle.
Perl has a couple of easy ways to spare you these (and more!) kinds of frustration from dealing with silent failures.
Warnings about common programming errors
You can turn on an enormous list of warnings that help diagnose common programming problems. With my imagined buggy version of your code, Perl could have told you
Name "main::lines" used only once: possible typo at ./synname line 16.
and after fixing the typo in the array name
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
Right away, you see valuable information that may be difficult or at least tedious to spot unaided:
variable names are inconsistent, and
the program is trying to produce output but needs a little more plumbing.
Don't execute unless variable names are consistent
Notice that even with the potential problems above, Perl tried to execute anyway. With some classes of problems such as the variable-naming inconsistency, you may prefer that Perl not execute your program but stop and make you fix it first. You can tell Perl to be strict about variables:
This generates a compile-time error if you access a variable that wasn't declared via our or use vars, localized via my, or wasn't fully qualified.
The tradeoff is you have to be explicit about which variables you intend to be part of your program instead of allowing them to conveniently spring to life upon first use. Before the first loop, you would declare
my @line;
to express your intent. Then with the bug of a mistakenly pluralized array name, Perl fails with
Global symbol "@lines" requires explicit package name at ./synname line 16.
Execution of ./synname aborted due to compilation errors.
and you know exactly which line contains the error.
Developing this habit will save you lots of time
I begin almost every non-trivial Perl program I write with
#! /usr/bin/env perl
use strict;
use warnings;
The first is the shebang line, an ordinary comment as far as Perl is concerned. The use lines enable the strict pragma and the warnings pragma.
Not wanting to be a strict-zombie, as Mark Dominus chided, I'll point out that use strict; as above with no option makes Perl strict in dealing with three error-prone areas:
strict vars, as described above;
strict refs, disallows use of symbolic references; and
strict subs, requires the programmer to be more careful in referring to subroutines.
This is a highly useful default. See the strict pragma's documentation for more details.
Perl's range operator
The perlop documentation describes .., Perl's range operator, that can help you greatly simplify the logic in your second loop:
In scalar context, .. returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each .. operator maintains its own boolean state, even across calls to a subroutine that contains it. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again. It doesn't become false till the next time the range operator is evaluated.
In your question, you wrote that you want “data between Syn_Name and a blank line,” which in Perl is spelled
/Syn_Name/ .. /^\s*$/
In your case, you also want to do something special at the end of the range, and .. provides for that case too, ibid.
The final sequence number in a range has the string "E0" appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint.
Assigning the value returned from .. (which I usually do to a scalar named $inside or $is_inside) allows you to check whether you're at the end, e.g.,
my $is_inside = /Syn_Name/ .. /^\s*$/;
if ($is_inside =~ /E0$/) {
...
}
Writing it this way also avoids duplicating the code for your terminating condition (the right-hand operand of ..). This way if you need to change the logic, you change it in only one place. When you have to remember to change it in several places, you'll forget sometimes and create bugs.
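To see the values that .. actually returns, here is a small sketch over a hypothetical list of lines (the input is an assumption made for illustration):

```perl
use strict;
use warnings;

# Hypothetical input: one chunk that starts at Syn_Name and ends at the blank line.
my @input = ('before', 'Syn_Name', 'foo', '', 'after');

my @states;
for (@input) {
    # Scalar context: empty string while outside, a sequence number while inside.
    my $is_inside = /Syn_Name/ .. /^\s*$/;
    push @states, $is_inside;
}

# Brackets make the empty strings visible.
print join(',', map { "[$_]" } @states), "\n";
```

The value for the blank line is "3E0", which is still true and numerically 3, so testing for /E0$/ identifies the end of the chunk.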
Working demos
See below for code you can copy-and-paste to get working programs. For demo purposes, they read input from the built-in DATA filehandle and write output to STDOUT. Writing it this way means you can transfer my code into yours with little or no modification.
Print chomped lines immediately
As defined in your question, there's no need for one loop to collect the lines in a temporary array and then another loop to process the array. Consider the following code
#! /usr/bin/env perl
use strict;
use warnings;
# for demo only
*FILE = *DATA;
*OUT = *STDOUT;
while (<FILE>)
{
chomp;
if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
my $is_last = $is_inside =~ /E0$/;
print OUT $_, $is_last ? "\n" : ();
}
}
__DATA__
ERROR IF PRESENT IN OUTPUT!
Syn_Name
foo
bar
baz

ERROR IF PRESENT IN OUTPUT!
whose output is
Syn_Namefoobarbaz
We always print the current line, stored in $_. When we're at the end of the range, that is, when $is_last is true, we also print a newline. When $is_last is false, the empty list in the other branch of the ternary operator is the result—meaning we print $_ only, no newline.
Join lines with spaces
You didn't show us an example input, so I wonder whether you really want to butt the lines together rather than joining them with spaces. If you want the latter behavior, then the program becomes
#! /usr/bin/env perl
use strict;
use warnings;
# for demo only
*FILE = *DATA;
*OUT = *STDOUT;
my @lines;
while (<FILE>)
{
chomp;
if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
push @lines, $_;
if ($is_inside =~ /E0$/) {
print OUT join(" ", @lines), "\n";
@lines = ();
}
}
}
__DATA__
ERROR IF PRESENT IN OUTPUT!
Syn_Name
foo
bar
baz

ERROR IF PRESENT IN OUTPUT!
This code accumulates in @lines only those lines within a Syn_Name chunk, prints the chunk, and clears out @lines when we see the terminator. The output is now
Syn_Name foo bar baz
One more edge case
Finally, what happens if we see Syn_Name at the end of the file but without a terminating blank line? That may be impossible with your data, but in case you need to handle it, you'll want to use Perl's eof operator.
eof FILEHANDLE
eof
Returns 1 if the next read on FILEHANDLE will return end of file or if FILEHANDLE is not open … An eof without an argument uses the last file read.
So we terminate on either a blank line or end of file.
#! /usr/bin/env perl
use strict;
use warnings;
# for demo only
*FILE = *DATA;
*OUT = *STDOUT;
my @lines;
while (<FILE>)
{
s/\s+$//;
#if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
if (my $is_inside = /Syn_Name/ .. /^\s*$/ || eof) {
push @lines, $_;
if ($is_inside =~ /E0$/) {
print OUT join(" ", @lines), "\n";
@lines = ();
}
}
}
__DATA__
ERROR IF PRESENT IN OUTPUT!
Syn_Name
foo
bar

YOU CANT SEE ME!
Syn_Name
quux
potrzebie
Output:
Syn_Name foo bar
Syn_Name quux potrzebie
Here instead of chomp, the code removes any trailing invisible whitespace at the ends of lines. This will make sure spacing between joined lines is uniform even if the input is a little sloppy.
Without the eof check, the program does not print the latter line, which you can see by commenting out the active conditional and uncommenting the other.
Another simplified version:
foreach (grep {chomp; /Syn_Name/ .. /^\s*$/ } <FILE>) {
print OUT;
print OUT "\n" if /^\s*$/;
}