Why can't I have a literal list slice right after a print in Perl? - perl

I see I can do something like this:
print STDOUT (split /\./, 'www.stackoverflow.com')[1];
and "stackoverflow" is printed. However, this:
print +(split /\./, 'www.stackoverflow.com')[1];
does the same, and this:
print (split /\./, 'www.stackoverflow.com')[1];
is a syntax error. So what exactly is going on here? I've always understood the unary plus sign to do nothing whatsoever in any context. And if "print FILEHANDLE EXPR" works, I would have imagined that "print EXPR" would always work equally well. Any insights?

You do not have warnings enabled. In the print (...)[1] case, the parentheses are regarded as part of the function-call syntax. With warnings on, Perl reports:
print (...) interpreted as function at C:\Temp\t.pl line 4.
From perldoc -f print:
Also be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print—interpose a + or put parentheses around all the arguments.
See also Why aren't newlines being printed in this Perl code?

perldoc for print includes this nugget:
Also be careful not to follow the print keyword with
a left parenthesis unless you want the corresponding right
parenthesis to terminate the arguments to the print--interpose
a "+" or put parentheses around all the arguments.
print always evaluates its arguments in LIST context.
To say
print (split /\./, 'www.stackoverflow.com')
is ok. But when you say
print (split /\./, 'www.stackoverflow.com')[0]
the parser expects a LIST after it sees the first (, and considers the LIST to be complete when it sees the closing ). The [0] is not interpreted as operating on anything, so you get a syntax error.
print "abc","def"; # prints "abcdef"
print ("abc","def"); # prints "abcdef"
print ("abc"), "def"; # prints "abc"
Other Perl functions that can take a LIST as the first argument behave the same way:
warn ($message),"\n" # \n not passed to warn, line # info not suppressed
system ("echo"),"x" # not same as system("echo","x") or system "echo","x"
# or system(("echo"),"x")
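The workarounds named in the perldoc passage can be sketched as follows. This is a minimal illustration rather than code from the answers above; the variable names $host and $second are made up for the example.

```perl
use strict;
use warnings;

my $host = 'www.stackoverflow.com';

# Take the slice first, so print never starts with a parenthesis.
my $second = (split /\./, $host)[1];
print "$second\n";

# A unary plus stops the parenthesis from being read as print's argument list.
print +(split /\./, $host)[1], "\n";

# Or wrap all of print's arguments in one outer pair of parentheses.
print((split /\./, $host)[1], "\n");
```

All three statements print the same element of the list returned by split.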

Related

Why can't I concatenate and print the value returned by a function?

I have a Perl program that calls a function, in this case ref, and checks the result. Specifically, I am testing that a variable is a hash reference. In that case, ref will return 'HASH'. I tested it and it worked.
Then I decided to log it, adding a print that displays the result of the same call, but it didn't work correctly. Here is a reduced version:
use strict;
use warnings;
my $book_ref = {};
$book_ref->{'title'} = 'The Lord of the Rings';
if (ref $book_ref eq 'HASH') {
print "ref \$book_ref is a " . ref $book_ref . "\n";
}
print "Program is over\n";
To my surprise, this was the output:
ref $book_ref is a Program is over
And despite using strict and warnings there were neither errors nor warnings.
The call to ref is exactly the same (it's a copy and paste), but while it works correctly inside the if condition, print doesn't display anything, and actually seems to be interrupted, as the newline character is clearly skipped. Why does the behaviour change?
The reason is that the function ref is called without parentheses, and this causes the line to be parsed incorrectly.
When ref is called inside the if, the condition is clearly delimited by parentheses, which means that ref knows perfectly well what its argument is: $book_ref. There's no ambiguity.
Instead, when printing the result, the lack of parentheses means that Perl will parse the line in a way that was not intended:
First it will concatenate $book_ref and "\n". In string context, $book_ref evaluates to a string like HASH(0x1cbef70), so the result is the string "HASH(0x1cbef70)\n".
Then, ref will be called on the string "HASH(0x1cbef70)\n", producing as output an empty string: ''.
At this point, print prints the empty string, that is, nothing, and stops there. The newline character \n is skipped because it has already been consumed by ref, so print doesn't even see it. And there are no errors.
All of this descends from Perl's operator precedence: from the table,
(...)
8. left + - .
9. left << >>
10. nonassoc named unary operators
11. nonassoc < > <= >= lt gt le ge
12. nonassoc == != <=> eq ne cmp ~~
(...)
where the "function call" is actually a "named unary operator" (the unary operator being ref). So the . operator at line 8 has higher precedence than the function call at line 10, which is why the result of print is not the expected one. On the other hand, the function call has higher precedence than eq (at line 12), which is why inside the if everything works as expected.
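The two parses can be compared directly. The sketch below is only an illustration; the variable names $implicit, $explicit and $in_if are assumptions made for the example.

```perl
use strict;
use warnings;

my $book_ref = { title => 'The Lord of the Rings' };

# "." binds tighter than the named unary operator ref,
# so these two statements mean exactly the same thing.
my $implicit = ref $book_ref . "\n";
my $explicit = ref($book_ref . "\n");

# "eq" binds looser than ref, so this is (ref $book_ref) eq 'HASH'.
my $in_if = (ref $book_ref eq 'HASH');

printf "implicit: '%s'\n", $implicit;
printf "explicit: '%s'\n", $explicit;
printf "in if:    %s\n",   $in_if ? 'true' : 'false';
```

Running perl -MO=Deparse,-p on the original print line is another way to see how Perl groups the expression.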
The solution to precedence problems is to use parentheses.
A possibility is to use parentheses to emphasize the function call:
print "ref \$book_ref is a " . ref($book_ref) . "\n";
Another one, which I like less but nevertheless works, is to use parentheses to isolate the string that must be concatenated, by putting the opening bracket just before ref:
print "ref \$book_ref is a " . (ref $book_ref) . "\n";
Another possible approach, suggested by zdim in a comment, is to use commas:
print "ref \$book_ref is a ", ref $book_ref, "\n";
When I first wrote the if I decided to avoid the parentheses to make the code more readable. Then I copied it and didn't notice the problem. I ended up wasting 2 hours to find the bug.
The best solution seems to be the first one, because if you copy it to another place (like another print) you are guaranteed to also copy the parentheses that prevent the problem. With the second one I probably wouldn't realize how important the parentheses are and wouldn't copy them. And the third one works only if you remember that you have to always use commas and not dots, which is not obvious and therefore error prone. So, although they work, I consider them less safe.
Other comments have also suggested to use printf, which requires dealing with format specifiers, or expression interpolation, like print "ref \$book_ref is a ${\ ref $book_ref }\n";, which I find harder to read.
Bottom line: always use the parentheses.

Passing Arguments Without Commas and With Produces Different Results

I'm trying to figure out why these two lines produce different results:
print($fh, "text"); --> 0x10101010 text (on STDOUT)
print($fh "text"); --> text (inside of file $fh)
When I have the comma I understand I create a list and when print only has a list it prints the list to STDOUT.
But, what is print doing when I don't have a comma? The result I want is the one I get without a comma.
This is strange to me and runs counter to my expectation that the version with the comma would work for my intended purpose. Code I usually see does filehandle printing with a line like "print $file "text"", but I want to use the parentheses as I find that more consistent with other languages. But leaving out the comma is just as inconsistent.
An explanation of the internals of "print" might help me understand. How is it getting the FILEHANDLE and LIST separate when there is no comma?
Docs: http://perldoc.perl.org/functions/print.html
Thanks!
print isn't a normal function, and you shouldn't call it with the parentheses because you're not really passing a parameter list to the function.
The way I typically see it written is
print {$fh} 'text';
print {$fh} 'text1', 'text2';
or not going to a file:
print 'text';
print 'text1', 'text2';
You ask "How is it getting the FILEHANDLE and LIST separate when there is no comma?" and the answer is "Magic, because it's not a normal function."
In Perl, parens are mostly just used for precedence. It is customary to call builtins like print without parens – this emphasizes that they aren't subroutines, but special syntax like for, map, split, or my.
In your case, you have a variety of possibilities:
Leave out the comma, but this is error-prone:
print($fh @list);
print $fh (@list);
Use curly braces around the file handle (which I would suggest anyway):
print { $fh } (@list);
print({ $fh } @list);
Use the object-oriented interface:
use IO::File; # on older perls
$fh->print(@list);
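The three styles can be tried side by side without touching the disk. The sketch below is an assumption-laden demo, not code from the answers: it writes to an in-memory filehandle opened on the scalar $buffer, and the names $buffer and @list are chosen only for the example.

```perl
use strict;
use warnings;
use IO::Handle;    # makes the method form work on older perls too

my @list = ('text1', 'text2');

# Open a filehandle onto a scalar so the demo needs no real file.
my $buffer = '';
open my $fh, '>', \$buffer or die "open: $!";

print {$fh} @list;     # block form: the handle is unambiguous
print $fh @list;       # indirect form: no comma after the handle
$fh->print(@list);     # method form

close $fh or die "close: $!";
print "buffer holds: $buffer\n";
```

Each of the three calls appends the same two strings to $buffer, which shows that they are equivalent ways of naming the filehandle.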

How do I print on a single line all content between certain start- and stop-lines?

while(<FILE>)
{
chomp $_;
$line[$i]=$_;
++$i;
}
for($j=0;$j<$i;++$j)
{
if($line[$j]=~/Syn_Name/)
{
do
{
print OUT $line[$j],"\n";
++$j;
}
until($line[$j]=~/^\s*$/)
}
}
This is my code I am trying to print data between Syn_Name and a blank line.
My code extracts the chunk that I need.
But the data between the chunk is printed line by line. I want the data for each chunk to get printed on a single line.
Simplification of your code. Using the flip-flop operator to control the print. Note that printing the final line will not add a newline (unless the line contained more than one newline). At best, it prints the empty string. At worst, it prints whitespace.
You do not need a transition array for the lines, you can use a while loop. In case you want to store the lines anyway, I added a commented line with how that is best done.
#chomp(my @line = <FILE>);
while (<FILE>) {
chomp;
if(/Syn_Name/ .. /^\s*$/) {
print OUT;
print OUT "\n" if /^\s*$/;
}
}
Contents
Idiomatic Perl
Make errors easier to fix
Warnings about common programming errors
Don't execute unless variable names are consistent
Developing this habit will save you lots of time
Perl's range operator
Working demos
Print chomped lines immediately
Join lines with spaces
One more edge case
Idiomatic Perl
You seem to have a background with the C family of languages. This is fine because it gets the job done, but you can let Perl handle the machinery for you, namely
chomp defaults to $_ (also true with many other Perl operators)
push adds an element to the end of an array
to simplify your first loop:
while (<FILE>)
{
chomp;
push @line, $_;
}
Now you don't have to update $i to keep track of how many lines you've already added to the array.
On the second loop, instead of using a C-style for loop, use a foreach loop:
The foreach loop iterates over a normal list value and sets the variable VAR to be each element of the list in turn …
The foreach keyword is actually a synonym for the for keyword, so you can use foreach for readability or for for brevity. (Or because the Bourne shell is more familiar to you than csh, so writing for comes more naturally.) If VAR is omitted, $_ is set to each value.
This way, Perl handles the bookkeeping for you.
for (@line)
{
# $_ is the current element of @line
...
}
Make errors easier to fix
Sometimes Perl can be too accommodating. Say in the second loop you made an easy typographical error:
for (@lines)
Running your program now produces no output at all, even if the input contains Syn_Name chunks.
A human can look at the code and see that you probably intended to process the array you just created and pluralized the name of the array by mistake. Perl, being eager to help, creates a new empty @lines array, which leaves your foreach loop with nothing to do.
You may delete the spurious s at the end of the array's name but still have a program that produces no output! For example, you may have an unhandled combination of inputs that doesn't open the OUT filehandle.
Perl has a couple of easy ways to spare you these (and more!) kinds of frustration from dealing with silent failures.
Warnings about common programming errors
You can turn on an enormous list of warnings that help diagnose common programming problems. With my imagined buggy version of your code, Perl could have told you
Name "main::lines" used only once: possible typo at ./synname line 16.
and after fixing the typo in the array name
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
Right away, you see valuable information that may be difficult or at least tedious to spot unaided:
variable names are inconsistent, and
the program is trying to produce output but needs a little more plumbing.
Don't execute unless variable names are consistent
Notice that even with the potential problems above, Perl tried to execute anyway. With some classes of problems such as the variable-naming inconsistency, you may prefer that Perl not execute your program but stop and make you fix it first. You can tell Perl to be strict about variables:
This generates a compile-time error if you access a variable that wasn't declared via our or use vars, localized via my, or wasn't fully qualified.
The tradeoff is you have to be explicit about which variables you intend to be part of your program instead of allowing them to conveniently spring to life upon first use. Before the first loop, you would declare
my @line;
to express your intent. Then with the bug of a mistakenly pluralized array name, Perl fails with
Global symbol "@lines" requires explicit package name at ./synname line 16.
Execution of ./synname aborted due to compilation errors.
and you know exactly which line contains the error.
Developing this habit will save you lots of time
I begin almost every non-trivial Perl program I write with
#! /usr/bin/env perl
use strict;
use warnings;
The first is the shebang line, an ordinary comment as far as Perl is concerned. The use lines enable the strict pragma and the warnings pragma.
Not wanting to be a strict-zombie, as Mark Dominus chided, I'll point out that use strict; as above with no option makes Perl strict in dealing with three error-prone areas:
strict vars, as described above;
strict refs, disallows use of symbolic references; and
strict subs, requires the programmer to be more careful in referring to subroutines.
This is a highly useful default. See the strict pragma's documentation for more details.
Perl's range operator
The perlop documentation describes .., Perl's range operator, that can help you greatly simplify the logic in your second loop:
In scalar context, .. returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each .. operator maintains its own boolean state, even across calls to a subroutine that contains it. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again. It doesn't become false till the next time the range operator is evaluated.
In your question, you wrote that you want “data between Syn_Name and a blank line,” which in Perl is spelled
/Syn_Name/ .. /^\s*$/
In your case, you also want to do something special at the end of the range, and .. provides for that case too, ibid.
The final sequence number in a range has the string "E0" appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint.
Assigning the value returned from .. (which I usually do to a scalar named $inside or $is_inside) allows you to check whether you're at the end, e.g.,
my $is_inside = /Syn_Name/ .. /^\s*$/;
if ($is_inside =~ /E0$/) {
...
}
Writing it this way also avoids duplicating the code for your terminating condition (the right-hand operand of ..). If you need to change the logic, you change it in only one place. When you have to remember to update several copies, you'll sometimes forget and create bugs.
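To see the values that .. returns in scalar context, here is a small sketch. The input list and the names @input, @seen and $state are invented for the illustration.

```perl
use strict;
use warnings;

my @input = ('skip', 'Syn_Name', 'foo', '', 'skip');
my @seen;

for (@input) {
    # In scalar context .. returns the empty string outside the range,
    # a sequence number 1, 2, 3, ... inside it, and appends "E0" to the
    # number on the line that ends the range.
    my $state = /Syn_Name/ .. /^\s*$/;
    push @seen, $state;
    printf "%-12s => '%s'\n", "'$_'", $state;
}
```

For this input the recorded states are '', 1, 2, 3E0 and '', so a test for /E0$/ fires only on the blank line that closes the chunk.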
Working demos
See below for code you can copy-and-paste to get working programs. For demo purposes, they read input from the built-in DATA filehandle and write output to STDOUT. Writing it this way means you can transfer my code into yours with little or no modification.
Print chomped lines immediately
As defined in your question, there's no need for one loop to collect the lines in a temporary array and then another loop to process the array. Consider the following code
#! /usr/bin/env perl
use strict;
use warnings;
# for demo only
*FILE = *DATA;
*OUT = *STDOUT;
while (<FILE>)
{
chomp;
if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
my $is_last = $is_inside =~ /E0$/;
print OUT $_, $is_last ? "\n" : ();
}
}
__DATA__
ERROR IF PRESENT IN OUTPUT!
Syn_Name
foo
bar
baz
ERROR IF PRESENT IN OUTPUT!
whose output is
Syn_Namefoobarbaz
We always print the current line, stored in $_. When we're at the end of the range, that is, when $is_last is true, we also print a newline. When $is_last is false, the empty list in the other branch of the ternary operator is the result—meaning we print $_ only, no newline.
Join lines with spaces
You didn't show us an example input, so I wonder whether you really want to butt the lines together rather than joining them with spaces. If you want the latter behavior, then the program becomes
#! /usr/bin/env perl
use strict;
use warnings;
# for demo only
*FILE = *DATA;
*OUT = *STDOUT;
my @lines;
while (<FILE>)
{
chomp;
if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
push @lines, $_;
if ($is_inside =~ /E0$/) {
print OUT join(" ", @lines), "\n";
@lines = ();
}
}
}
__DATA__
ERROR IF PRESENT IN OUTPUT!
Syn_Name
foo
bar
baz
ERROR IF PRESENT IN OUTPUT!
This code accumulates in @lines only those lines within a Syn_Name chunk, prints the chunk, and clears out @lines when we see the terminator. The output is now
Syn_Name foo bar baz
One more edge case
Finally, what happens if we see Syn_Name at the end of the file but without a terminating blank line? That may be impossible with your data, but in case you need to handle it, you'll want to use Perl's eof operator.
eof FILEHANDLE
eof
Returns 1 if the next read on FILEHANDLE will return end of file or if FILEHANDLE is not open … An eof without an argument uses the last file read.
So we terminate on either a blank line or end of file.
#! /usr/bin/env perl
use strict;
use warnings;
# for demo only
*FILE = *DATA;
*OUT = *STDOUT;
my @lines;
while (<FILE>)
{
s/\s+$//;
#if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
if (my $is_inside = /Syn_Name/ .. /^\s*$/ || eof) {
push @lines, $_;
if ($is_inside =~ /E0$/) {
print OUT join(" ", @lines), "\n";
@lines = ();
}
}
}
__DATA__
ERROR IF PRESENT IN OUTPUT!
Syn_Name
foo
bar
YOU CANT SEE ME!
Syn_Name
quux
potrzebie
Output:
Syn_Name foo bar
Syn_Name quux potrzebie
Here instead of chomp, the code removes any trailing invisible whitespace at the ends of lines. This will make sure spacing between joined lines is uniform even if the input is a little sloppy.
Without the eof check, the program does not print the latter line, which you can see by commenting out the active conditional and uncommenting the other.
Another simplified version:
foreach (grep {chomp; /Syn_Name/ .. /^\s*$/ } <FILE>) {
print OUT;
print OUT "\n" if /^\s*$/;
}

I want to create a perl code to extract what is in the parentheses and port it to a variable

I want to create a perl code to extract what is in the parentheses and port it to a variable.
"(05-NW)HPLaserjet" should become "05-NW"
Something like this:
Catch "("
take out any spaces that exist in between ()
everything in between () = variable 1
How would I go about doing this?
This is a job for regular expressions. It looks confusing because parentheses are metacharacters in regular expressions and are also part of the pattern in your example, where the literal ones are escaped by backslashes.
C:\temp $ echo (05-NW)HPLaserjet | perl -nlwe "print for m/\(([^)]+)\)/g"
Match opening paren, start capture group, match one or more characters that aren't the closing paren, close capture group, match closing paren.
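The same pattern can be wrapped in a script that also strips the spaces, as the question asks. This is a sketch under assumptions: the helper name paren_content and the sample string with extra spaces are hypothetical and not taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text inside the first (...) with all
# whitespace removed, or nothing when the input has no parentheses.
sub paren_content {
    my ($text) = @_;
    my ($inside) = $text =~ /\(([^)]+)\)/ or return;
    $inside =~ s/\s+//g;
    return $inside;
}

my $code = paren_content('(05 - NW)HPLaserjet');
print "$code\n";
```

Keeping the capture in a list assignment means the helper can bail out early when the match fails instead of reusing a stale $1.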
You can use regular expressions (see perlretut) to match and capture the value. By assigning to a list, you can put your captures into named variables. The global variables $1, $2 etc. are also used for capture groups, so you can use that instead of list assignment if you like.
use strict;
use warnings;
while (<DATA>) # read every line of the __DATA__ section
{
my ($printer_code) = m/
\( # Match literal opening parenthesis
([^\)]*) # Capture group (printer_code): Match characters which aren't right parenthesis, zero or more times
\)/x; # Match literal closing parenthesis
# The 'x' modifier allows you to add whitespace and comments to regex for clarity.
# If you use it, make sure you use '\ ' (or '\s', etc.) for actual literal whitespace matching!
print "$printer_code\n" if defined $printer_code;
}
__DATA__
(05-NW)HPLaserjet
perldoc perlre
use warnings;
use strict;
my $s = '(05-NW)HPLaserjet';
my ($v) = $s =~ /\((.*)\)/; # Grab everything between parens (including other parens)
$v =~ s/\s//g; # Remove all whitespace
print "$v\n";
__END__
05-NW
See also: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g

Need to print the last occurrence of a string in Perl

I have a script in Perl that searches for an error that is in a config file, but it prints out any occurrence of the error. I need to match what is in the config file and print out only the last time the error occurred. Any ideas?
Wow... I was not expecting this much of a response. I should've been more clear in stating this is for log monitoring on a Windows box that sends an alert to Nagios. This is actually my first Perl program and all this information has been very helpful. Does anyone know how I can apply any of the tail answers on a wintel box?
Another way to do it:
perl -n -e '$e = $1 if /(REGEX_HERE)/; END{ print $e }' CONFIG_FILE_HERE
What exactly do you need to print? The line containing the error? More context than that?
File::ReadBackwards can be helpful.
In outline:
my $errinfo;
while (<>)
{
$errinfo = "whatever" if (m/the error pattern/);
}
print "error: $errinfo\n" if ($errinfo);
This catches all errors, but doesn't print until the end, when only the last one survives.
A brute-force approach involves setting up your own pipeline by pointing STDOUT to tail. This allows you to print all errors, and then it's up to tail to worry about only letting the last one out.
You didn't specify, so I assume a legal config line is of the form
Name = some value
Matching that is straightforward:
^ (starting at the beginning of line)
\w+ (one or more “word characters”)
\s+ (followed by mandatory whitespace)
= (followed by an equals sign)
\s+ (more mandatory whitespace)
.+ (some mandatory value)
$ (finishing at the end of the line)
Gluing it together, we get
#! /usr/bin/perl
use warnings;
use strict;
# for demo only
*ARGV = *DATA;
my $pid = open STDOUT, "|-", "tail", "-1" or die "$0: open: $!";
while (<>) {
print unless /^ \w+ \s+ = \s+ .+ $/x;
}
close STDOUT or warn "$0: close: $!";
__DATA__
This = assignment is ok
But := not this
And == definitely not this
Output:
$ ./lasterr
And == definitely not this
With regular expressions, when you want the last occurrence of a pattern, place ^.* at the front of your pattern. For example, to replace the last X in the input with Y, use
$ echo XABCXXXQQQXX | perl -pe 's/^(.*)X/$1Y/'
XABCXXXQQQXY
Note that the ^ is redundant because regular-expression quantifiers are greedy, but I like having it there for emphasis.
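The same greedy-prefix idea works for capturing rather than replacing. The sketch below is only an illustration: the log text in $log and the ERROR prefix are made-up sample data, not the asker's real format.

```perl
use strict;
use warnings;

# Hypothetical log contents, for demonstration only.
my $log = "ERROR disk full\nINFO ok\nERROR link down\nINFO ok\n";

# With /s the greedy .* runs to the end of the string and backtracks,
# so the capture lands on the last line that starts with ERROR.
# With /m the ^ matches at the start of every line.
my ($last_error) = $log =~ /\A.*^(ERROR[^\n]*)/ms;

print "$last_error\n" if defined $last_error;
```

Because the whole input is held in one string, this approach suits files that fit comfortably in memory.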
Applying this technique to your problem, you can search for the last line in your config file that contains an error as in the following program:
#! /usr/bin/perl
use warnings;
use strict;
local $_ = do { local $/; scalar <DATA> };
if (/\A.* ^(?! \w+ \s+ = \s+ [^\r\n]+ $) (.+?)$/smx) {
print $1, "\n";
}
__DATA__
This = assignment is ok
But := not this
And == definitely not this
The syntax of the regular expression is a bit different because $_ contains multiple lines, but the principle is the same. \A is similar to ^, but it matches only at the beginning of the string to be searched. With the /m switch (“multi-line”), ^ matches at logical line boundaries.
Up to this point, we know the pattern
/\A.* ^ .../
matches the last line that looks like something. The negative look-ahead assertion (?!...) looks for a line that is not a legal config line. Ordinarily . matches any character except newline, but the /s switch (“single line”) lifts this restriction. Specifying [^\r\n]+, that is, one or more characters that are neither carriage return nor line feed, does not allow the match to spill into the next line.
Look-around assertions do not capture, so we grab the offending line with (.+?)$. The reason it's safe to use . in this context is because we know the current line is bad and the non-greedy quantifier +? stops matching as soon as it can, which in this case is the end of the current logical line.
All these regular expressions use the /x switch (“extended mode”) to allow extra whitespace: the aim is to improve readability.