Why can't I concatenate and print the value returned by a function? - perl

I have a Perl program that calls a function, in this case ref, and checks the result. Specifically, I am testing that a variable is a hash reference. In that case, ref will return 'HASH'. I tested it and it worked.
Then I decided to log it, adding a print that displays the result of the same call, but it didn't work correctly. Here is a reduced version:
use strict;
use warnings;
my $book_ref = {};
$book_ref->{'title'} = 'The Lord of the Rings';
if (ref $book_ref eq 'HASH') {
print "ref \$book_ref is a " . ref $book_ref . "\n";
}
print "Program is over\n";
To my surprise, this was the output:
ref $book_ref is a Program is over
And despite using strict and warnings there were neither errors nor warnings.
The call to ref is exactly the same (it's a copy and paste), but while it works correctly inside the if condition, print doesn't display anything, and actually seems to be interrupted, as the newline character is clearly skipped. Why does the behaviour change?

The reason is that the function ref is called without parentheses, and this causes the line to be parsed incorrectly.
When ref is called inside the if, the condition is clearly delimited by parentheses, which means that ref knows perfectly well what its argument is: $book_ref. There's no ambiguity.
Instead, when printing the result, the lack of parentheses means that Perl will parse the line in a way that was not intended:
First it will concatenate $book_ref and "\n". In scalar context, $book_ref evaluates to a string like HASH(0x1cbef70), therefore the result is the string "HASH(0x1cbef70)\n"
Then, ref will be called on the string "HASH(0x1cbef70)\n", producing as output an empty string: ''.
At this point, print prints the empty string, that is, nothing, and stops there. The newline character \n is skipped because it has already been consumed by ref, so print doesn't even see it. And there are no errors.
All of this descends from Perl's operator precedence: from the table,
(...)
8. left + - .
9. left << >>
10. nonassoc named unary operators
11. nonassoc < > <= >= lt gt le ge
12. nonassoc == != <=> eq ne cmp ~~
(...)
where the "function call" is actually a "named unary operator" (the unary operator being ref). So the . operator at line 8 has higher precedence than the function call at line 10, which is why the result of print is not the expected one. On the other hand, the function call has higher precedence than eq (at line 12), which is why inside the if everything works as expected.
The solution to precedence problems is to use .
A possibility is to use parentheses to emphasize the function call:
print "ref \$book_ref is a " . ref($book_ref) . "\n";
Another one, which I like less but nevertheless works, is to use parentheses to isolate the string that must be concatenated, by putting the opening bracket just before ref:
print "ref \$book_ref is a " . (ref $book_ref) . "\n";
Another possible approach, suggested by zdim in a comment, is to use commas:
print "ref \$book_ref is a ", ref $book_ref, "\n";
When I first wrote the if I decided to avoid the parentheses to make the code more readable. Then I copied it and didn't notice the problem. I ended up wasting 2 hours to find the bug.
The best solution seems to be the first one, because if you copy it to another place (like another print) you are guaranteed to also copy the parentheses that prevent the problem. With the second one I probably wouldn't realize how important the parentheses are and wouldn't copy them. And the third one works only if you remember that you have to always use commas and not dots, which is not obvious and therefore error prone. So, although they work, I consider them less safe.
Other comments have also suggested to use printf, which requires dealing with format specifiers, or expression interpolation, like print "ref \$book_ref is a ${\ ref $book_ref }\n";, which I find harder to read.
Bottom line: always use the parentheses.

Related

How to tell perl to print to a file handle instead of printing the file handle?

I'm trying to wrap my head around the way Perl handles the parsing of arguments to print.
Why does this
print $fh $stufftowrite
write to the file handle as expected, but
print($fh, $stufftowrite)
writes the file handle to STDOUT instead?
My guess is that it has something to do with the warning in the documentation of print:
Be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print; put parentheses around all arguments (or interpose a + , but that doesn't look as good).
Should I just get used to the first form (which just doesn't seem right to me, coming from languages that all use parentheses around function arguments), or is there a way to tell Perl to do what I want?
So far I've tried a lot of combination of parentheses around the first, second and both parameters, without success.
On lists
The structure bareword (LIST1), LIST2 means "apply the function bareword to the arguments LIST1", while bareword +(LIST1), LIST2 can, but doesn't neccessarily mean "apply bareword to the arguments of the combined list LIST1, LIST2". This is important for grouping arguments:
my ($a, $b, $c) = (0..2);
print ($a or $b), $c; # print $b
print +($a or $b), $c; # print $b, $c
The prefix + can also be used to distinguish hashrefs from blocks, and functions from barewords, e.g. when subscripting an hash: $hash{shift} returns the shift element, while $hash{+shift} calls the function shift and returns the hash element of the value of shift.
Indirect syntax
In object oriented Perl, you normally call methods on an object with the arrow syntax:
$object->method(LIST); # call `method` on `$object` with args `LIST`.
However, it is possible, but not recommended, to use an indirect notation that puts the verb first:
method $object (LIST); # the same, but stupid.
Because classes are just instances of themselves (in a syntactic sense), you can also call methods on them. This is why
new Class (ARGS); # bad style, but pretty
is the same as
Class->new(ARGS); # good style, but ugly
However, this can sometimes confuse the parser, so indirect style is not recommended.
But it does hint on what print does:
print $fh ARGS
is the same as
$fh->print(ARGS)
Indeed, the filehandle $fh is treated as an object of the class IO::Handle.
(While this is a valid syntactic explanation, it is not quite true. The source of IO::Handle itself uses the line print $this #_;. The print function is just defined this way.)
Looks like you have a typo. You have put a comma between the file handle and the argument in the second print statement. If you do that, the file handle will be seen as an argument. This seems to apply only to lexical file handles. If done with a global file handle, it will produce the fatal error
No comma allowed after filehandle at ...
So, to be clear, if you absolutely have to have parentheses for your print, do this:
print($fh $stufftowrite)
Although personally I prefer to not use parentheses unless I have to, as they just add clutter.
Modern Perl book states in the Chapter 11 ("What to Avoid"), section "Indirect Notation Scalar Limitations":
Another danger of the syntax is that the parser expects a single scalar expression as the object. Printing to a filehandle stored in an aggregate variable seems obvious, but it is not:
# DOES NOT WORK AS WRITTEN
say $config->{output} 'Fun diagnostic message!';
Perl will attempt to call say on the $config object.
print, close, and say—all builtins which operate on filehandles—operate in an indirect fashion. This was fine when filehandles were package globals, but lexical filehandles (Filehandle References) make the indirect object syntax problems obvious. To solve this, disambiguate the subexpression which produces the intended invocant:
say {$config->{output}} 'Fun diagnostic message!';
Of course, print({$fh} $stufftowrite) is also possible.
It's how the syntax of print is defined. It's really that simple. There's kind of nothing to fix. If you put a comma between the file handle and the rest of the arguments, the expression is parsed as print LIST rather than print FILEHANDLE LIST. Yes, that looks really weird. It is really weird.
The way not to get parsed as print LIST is to supply an expression that can legally be parsed as print FILEHANDLE LIST. If what you're trying to do is get parentheses around the arguments to print to make it look more like an ordinary function call, you can say
print($fh $stufftowrite); # note the lack of comma
You can also say
(print $fh $stufftowrite);
if what you're trying to do is set off the print expression from surrounding code. The key point is that including the comma changes the parse.

How to get rid of use of an uninitialized value within an 'if' construct using a Perl regex

How do I get rid of use of an uninitialized value within an if construct using a Perl regex?
When using the code below, I get use of uninitialized value messages.
if($arrayOld[$i] =~ /-(.*)/ || $arrayOld[$i] =~ /\#(.*)/)
When using the code below, I get no output.
if(defined($arrayOld[$i]) =~ /-(.*)/ || defined($arrayOld[$i]) =~ /\#(.*)/)
What is the proper way to check if a variable has a value given the code above?
Try:
if($arrayOld[$i] && $arrayOld[$i] =~ /-|\#(.*)/)
This first checks $arrayOld[$i] for a value before running a regx against it.
(Have also combined the || into the regex.)
From the error message in your comment, you're accessing an element of #arrayOld that isn't defined. Without seeing the rest of the code, this could indicate a bug in your program, or it could just be expected behavior.
If you understand why $arrayOld[$i] is undef, and you want to allow that without getting a warning, there's a couple of things you can do. Perl 5.10.0 introduced the defined-or operator //, which you can use to substitute the empty string for undef:
use 5.010;
...
if(($arrayOld[$i] // '') =~ /-(.*)/ || ($arrayOld[$i] // '') =~ /\#(.*)/)
Or, you can just turn off the warning:
if (do { no warnings 'uninitalized';
$arrayOld[$i] =~ /-(.*)/ || $arrayOld[$i] =~ /\#(.*)/ })
Here, I'm using do to limit the time the warning is disabled. However, turning off the warning also suppresses the warning you'd get if $i were undef. Using // allows you to specify exactly what is allowed to be undef, and exactly what value should be used instead of undef.
Note: defined($arrayOld[$i]) =~ /-(.*)/ is running a pattern match on the result of the defined function, which is just going to be a true/false value; not the string you want to test.
To answer your question narrowly, you can prevent undefined-value warnings in that line of code with
if (defined $i && defined $arrayOld[$i]
&& ($arrayOld[$i] =~ /-(.*)/ || $arrayOld[$i] =~ /\#(.*)/))
{
...;
}
That is, evaluating either $i or the expression $arrayOld[$i] may result in an undefined value. Note the additional layer of parentheses that are necessary as written above because of the difference in precedence between && and ||, with the former binding more tightly. For the particular patterns in your question, you could sidestep this precedence issue by combining your patterns into one regex, but this can be tricky to do in the general case.
I recommend against using the unpleasing code above. Read on to see an elegant solution to your problem that has Perl do the work for you and is much easier to read.
Looking back
From the slightly broader context of your earlier question, $i is a loop variable and by construction will certainly be defined, so testing $i is overkill. Your code blindly pulls elements from #arrayOld, and Perl happily obliges. In cases where nothing is there, you get the undefined value.
This sort of one-by-one peeking and poking is common in C programs, but in Perl, it is almost always a red flag that you could express your algorithm more elegantly. Consider the complete, working example below.
Working demonstration
#! /usr/bin/env perl
use strict;
use warnings;
use 5.10.0; # given/when
*FILEREAD = *DATA; # for demo only
my #interesting_line = (qr/-(.*)/, qr/\#(.*)/);
$/ = ""; # paragraph mode
while(<FILEREAD>) {
chomp;
my #arrayOld = split /\n/;
my #arrayNewLines;
for (1 .. #arrayOld) {
given (shift #arrayOld) {
push #arrayNewLines, $_ when #interesting_line;
push #arrayOld, $_;
}
}
print "\#arrayOld:\n", map("$_\n", #arrayOld), "\n",
"\#arrayNewLines:\n", map("$_\n", #arrayNewLines);
}
__DATA__
#SCSI_test # put this line into #arrayNewLines
kdkdkdkdkdkdkdkd
dkdkdkdkdkdkdkdkd
- ccccccccccccccc # put this line into #arrayNewLines
Front matter
The line
use 5.10.0;
enables Perl’s given/when switch statement, and this makes for a nice way to decide which array gets a given line of input.
As the comment indicates
*FILEREAD = *DATA; # for demo only
is for the purpose of this Stack Overflow demonstration. In your real code, you have open FILEREAD, .... Placing the input from your question into Perl’s DATA filehandle allows presenting code and input in one self-contained unit, and then we alias FILEREAD to DATA so the rest of the code will drop into yours with no fuss.
The main event
The core of the processing is
for (1 .. #arrayOld) {
given (shift #arrayOld) {
push #arrayNewLines, $_ when #interesting_line;
push #arrayOld, $_;
}
}
Notice that there are no defined checks or even explicit regex matches! There’s no $i or $arrayOld[$i]! What’s going on?
You start with #arrayOld containing all the lines from the current paragraph and want to end with the interesting lines in #arrayNewLines and everything else staying in #arrayOld. The code above takes the next line out of #arrayOld with shift. If the line is interesting, we push it onto the end of #arrayNewLines. Otherwise, we put it back on the end of #arrayOld.
The statement modifier when #interesting_line performs an implicit smart-match with the topic from given. As explained in “Smart matching in detail,” when smart matching against an array, Perl implicitly loops over it and stops on the first match. In this case, the array #interesting_line contains compiled regexes that match lines you want to move to #arrayNewLines. If the current line (in $_ thanks to given) does not match any of those patterns, it goes back in #arrayOld.
We do the preceding process exactly scalar #arrayOld times, that is, once for each line in the current paragraph. This way, we process everything exactly once and do not have to worry about fussy bookkeeping over where the current array index is. Whatever is left in #arrayOld after that many shifts must be the lines we pushed back onto it, which are the uninteresting lines in the order that the occurred in the input.
Sample output
For the input in your question, the output is
#arrayOld:
kdkdkdkdkdkdkdkd
dkdkdkdkdkdkdkdkd
#arrayNewLines:
#SCSI_test # put this line into #arrayNewLines
- ccccccccccccccc # put this line into #arrayNewLines

Question about precedence + repetition modifer

Please could you explain this apparently inconsistent behaviour to me:
use strict;
my #a;
print "a" x 2; # this prints: aa
#a = "a" x 2; print #a; # this prints: aa
print ("a") x 2; # this prints: a
#a = ("a") x 2; print #a; # this prints: aa
Shouldn't the last one print a single 'a'?
Edit: Ok so now this is kind of making more sense to me:
"Binary "x" is the repetition operator...In list context, if the left operand is enclosed in parentheses or is a list formed by qw/STRING/, it repeats the list." perlop
This is as clear as mud to me (binary x - why use the word binary? Is there a denary X?)
But anyway:
#a = ("a") x 2 # seems to be in list context, in that we have an array at the beginning - an array is not a list, but it contains a list, so I think we probably have a list context, (not an array context although they might be synonymous).
i suppose the "left operand" is ("a"). (It's either that or #a). perlop doesn't say what an operand actually is, querying perldoc.perl.org gives "No matches found", and googling gives "In computer programming, an operand is a term used to describe any object that is capable of being manipulated." Like an array for instance.
So the left operand might be enclosed in brackets so maybe it should "repeat the list". The list is either: ("a") x 2
or it is: ("a")
If we repeated ("a") x 2 we would get ("a") x 2 ("a") x 2. This doesn't seem right.
If we type: print $a[1] we will get a single 'a', so "it repeats the list" means Perl turns ("a") x 2 into ("a", "a") so we effectively get #a=("a", "a")
However, print ("a") x 2 does not get turned into ("a", "a"). That's because print is a "list operator" with a high precedence. So there we effectively get: (print ("a")) x 2
An array is a term so it also has a high precedence, but #a=stuff involves the assignation operator = which has a relatively low precedence. So it's quite different from print.
add use warnings; in your script, then you will get warnings like
print (...) interpreted as function .
Useless use of repeat (x) in void context .
Do it like print (("a") x 2); # this prints: aa
As you mentioned in comments, how to format your code, i would say see Perltidy.
Perltidy is a Perl script which indents and reformats Perl scripts to make them easier to read. If you write Perl scripts, or spend much time reading them, you will probably find it useful.
Some more information about Perltidy:
You can download perltidy and run it
with its defaults, or customise it to use your preferred bracing style, indentation width etc.
Customisation is easy and there are a huge number of options. The defaults are usually a fine starting place.
Perltidy can also be used to generate colourised HTML output of your code.
Perltidy comes with a sensible set of defaults; but they may not be right for you. Fortunately you can use tidyview as a graphical user interface to preview perltidy’s changes to your code and see which options suit you best. You can download tidyview from CPAN.
Note: Always add use strict and use warnings at the begining of your scripts.
I think the reason that you are getting strange behavior is because the print command for perl is to expect a list. Quoting from the relevant documentation:
Also be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print; put parentheses around all the arguments.
Putting parentheses makes it act like a function. Consider if you perform this test case:
x("y") x 2;
sub x {
my $y = shift;
print "1: $y\n"; #result 1: y
my $y = shift;
print "2: $y\n"; #result 2:
}
When written as a function call, your results are consistent for a function being evaluated to its arguments rather than an assignment being evaluated after the following operations.
You are being bitten by a common Perl parsing gotcha. Your third statement print ("a") x 2 is parsed as:
(print ("a")) x 2;
You can add another set of parentheses to fix the parsing:
print (("a") x 2); # prints aa

Why can't I have a literal list slice right after a print in Perl?

I see I can do something like this:
print STDOUT (split /\./, 'www.stackoverflow.com')[1];
and "stackoverflow" is printed. However, this:
print +(split /\./, 'www.stackoverflow.com')[1];
does the same, and this:
print (split /\./, 'www.stackoverflow.com')[1];
is a syntax error. So what exactly is going on here? I've always understood the unary plus sign to do nothing whatsoever in any context. And if "print FILEHANDLE EXPR" works, I would have imagined that "print EXPR" would always work equally well. Any insights?
You do not have warnings enabled. In the print(...)[1] case, the set of parentheses are regarded as part of the function syntax.
print (...) interpreted as function at C:\Temp\t.pl line 4.
From, perldoc -f print:
Also be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print—interpose a + or put parentheses around all the arguments.
See also Why aren't newlines being printed in this Perl code?
perldoc for print includes this nugget:
Also be careful not to follow the print keyword with
a left parenthesis unless you want the corresponding right
parenthesis to terminate the arguments to the print--interpose
a "+" or put parentheses around all the arguments.
print always evaluates its arguments in LIST context.
To say
print (split /\./, 'www.stackoverflow.com')
is ok. But when you say
print (split /\./, 'www.stackoverflow.com')[0]
the parser expects a LIST after it sees the first (, and considers the LIST to be complete when it sees the closing ). The [0] is not interpreted as operating on anything, so you get a syntax error.
print "abc","def"; # prints "abcdef"
print ("abc","def"); # prints "abcdef"
print ("abc"), "def"; # prints "abc"
Other Perl functions that can take a LIST as the first argument behave the same way:
warn ($message),"\n" # \n not passed to warn, line # info not suppressed
system ("echo"),"x" # not same as system("echo","x") or system "echo","x"
# or system(("echo"),"x")

Can someone suggest how this Perl script works?

I have to maintain the following Perl script:
#!/usr/bin/perl -w
die "Usage: $0 <file1> <file2>\n" unless scalar(#ARGV)>1;
undef $/;
my #f1 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
my #f2 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
die "Error: file1 has $#f1 serials, file2 has $#f2\n" if ($#f1 != $#f2);
foreach my $g (0 .. $#f1) {
print (($f2[$g] =~ m/RESULT:\s+PASS/) ? $f2[$g] : $f1[$g]);
}
print STDERR "$#f1 serials found\n";
I know pretty much what it does, but how it's done is difficult to follow. The calls to split() are particulary puzzling.
It's fairly idiomatic Perl and I would be grateful if a Perl expert could make a few clarifying suggestions about how it does it, so that if I need to use it on input files it can't deal with, I can attempt to modify it.
It combines the best results from two datalog files containing test results. The datalog files contain results for various serial numbers and the data for each serial number begins and ends with SERIAL NUMBER: n (I know this because my equipment creates the input files)
I could describe the format of the datalog files, but I think the only important aspect is the SERIAL NUMBER: n because that's all the Perl script checks for
The ternary operator is used to print a value from one input file or the other, so the output can be redirected to a third file.
This may not be what I would call "idiomatic" (that would be use Module::To::Do::Task) but they've certainly (ab)used some language features here. I'll see if I can't demystify some of this for you.
die "Usage: $0 <file1> <file2>\n" unless scalar(#ARGV)>1;
This exits with a usage message if they didn't give us any arguments. Command-line arguments are stored in #ARGV, which is like C's char **argv except the first element is the first argument, not the program name. scalar(#ARGV) converts #ARGV to "scalar context", which means that, while #ARGV is normally a list, we want to know about it's scalar (i.e. non-list) properties. When a list is converted to scalar context, we get the list's length. Therefore, the unless conditional is satisfied only if we passed no arguments.
This is rather misleading, because it will turn out your program needs two arguments. If I wrote this, I would write:
die "Usage: $0 <file1> <file2>\n" unless #ARGV == 2;
Notice I left off the scalar(#ARGV) and just wrote #ARGV. The scalar() function forces scalar context, but if we're comparing equality with a number, Perl can implicitly assume scalar context.
undef $/;
Oof. The $/ variable is a special Perl built-in variable that Perl uses to tell what a "line" of data from a file is. Normally, $/ is set to the string "\n", meaning when Perl tries to read a line it will read up until the next linefeed (or carriage return/linefeed on Windows). Your writer has undef-ed the variable, though, which means when you try to read a "line", Perl will just slurp up the whole file.
my #f1 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
This is a fun one. <> is a special filehandle that reads line-by-line from each file given on the command line. However, since we've told Perl that a "line" is an entire file, calling <> once will read in the entire file given in the first argument, and storing it temporarily as a string.
Then we take that string and split() it up into pieces, using the regex /(?=(?:SERIAL NUMBER:\s+\d+))/. This uses a lookahead, which tells our regex engine "only match if this stuff comes after our match, but this stuff isn't part of our match," essentially allowing us to look ahead of our match to check on more info. It basically splits the file into pieces, where each piece (except possibly the first) begins with "SERIAL NUMBER:", some arbitrary whitespace (the \s+ part), and then some digits (the \d+ part). I can't teach you regexes, so for more info I recommend reading perldoc perlretut - they explain all of that better than I ever will.
Once we've split the string into a list, we store that list in a list called #f1.
my #f2 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
This does the same thing as the last line, only to the second file, because <> has already read the entire first file, and storing the list in another variable called #f2.
die "Error: file1 has $#f1 serials, file2 has $#f2\n" if ($#f1 != $#f2);
This line prints an error message if #f1 and #f2 are different sizes. $#f1 is a special syntax for arrays - it returns the index of the last element, which will usually be the size of the list minus one (lists are 0-indexed, like in most languages). He uses this same value in his error message, which may be deceptive, as it will print 1 fewer than might be expected. I would write it as:
die "Error: file $ARGV[0] has ", $#f1 + 1, " serials, file $ARGV[1] has ", $#f2 + 1, "\n"
if $#f1 != $#f2;
Notice I changed "file1" to "file $ARGV[0]" - that way, it will print the name of the file you specified, rather than just the ambiguous "file1". Notice also that I split up the die() function and the if() condition on two lines. I think it's more readable that way. I also might write unless $#f1 == $#f2 instead of if $#f1 != $#f2, but that's just because I happen to think != is an ugly operator. There's more than one way to do it.
foreach my $g (0 .. $#f1) {
This is a common idiom in Perl. We normally use for() or foreach() (same thing, really) to iterate over each element of a list. However, sometimes we need the indices of that list (some languages might use the term "enumerate"), so we've used the range operator (..) to make a list that goes from 0 to $#f1, i.e., through all the indices of our list, since $#f1 is the value of the highest index in our list. Perl will loop through each index, and in each loop, will assign the value of that index to the lexically-scoped variable $g (though why they didn't use $i like any sane programmer, I don't know - come on, people, this tradition has been around since Fortran!). So the first time through the loop, $g will be 0, and the second time it will be 1, and so on until the last time it is $#f1.
print (($f2[$g] =~ m/RESULT:\s+PASS/) ? $f2[$g] : $f1[$g]);
This is the body of our loop, which uses the ternary conditional operator ?:. There's nothing wrong with the ternary operator, but if the code gives you trouble we can just change it to an if(). Let's just go ahead and rewrite it to use if():
if($f2[$g] =~ m/RESULT:\s+PASS/) {
print $f2[$g];
} else {
print $f1[$g];
}
Pretty simple - we do a regex check on $f2[$g] (the entry in our second file corresponding to the current entry in our first file) that basically checks whether or not that test passed. If it did, we print $f2[$g] (which will tell us that test passed), otherwise we print $f1[$g] (which will tell us the test that failed).
print STDERR "$#f1 serials found\n";
This just prints an ending diagnostic message telling us how many serials were found (minus one, again).
I personally would rewrite that whole hairy bit where he hacks with $/ and then does two reads from <> to be a loop, because I think that would be more readable, but this code should work fine, so if you don't have to change it too much you should be in good shape.
The undef $/ line deactivates the input record separator. Instead of reading records line by line, the interpreter will read whole files at once after that.
The <>, or 'diamond operator' reads from the files from the command line or standard input, whichever makes sense. In your case, the command line is explicitely checked, so it will be files. Input record separation has been deactivated, so each time you see a <>, you can think of it as a function call returning a whole file as a string.
The split operators take this string and cut it in chunks, each time it meets the regular expression in argument. The (?= ... ) construct means "the delimiter is this, but please keep it in the chunked result anyway."
That's all there is to it. There would always be a few optimizations, simplifications, or "other ways to do it," but this should get you running.
You can get quick glimpse how the script works, by translating it into java or scala. The inccode.com translator delivers following java code:
public class script extends CRoutineProcess implements IInProcess
{
VarArray arrF1 = new VarArray();
VarArray arrF2 = new VarArray();
VarBox call ()
{
// !/usr/bin/perl -w
if (!(BoxSystem.ProgramArguments.scalar().isGT(1)))
{
BoxSystem.die(BoxString.is(VarString.is("Usage: ").join(BoxSystem.foundArgument.get(0
).toString()).join(" <file1> <file2>\n")
)
);
}
BoxSystem.InputRecordSeparator.empty();
arrF1.setValue(BoxConsole.readLine().split(BoxPattern.is("(?=(?:SERIAL NUMBER:\\s+\\d+))")));
arrF2.setValue(BoxConsole.readLine().split(BoxPattern.is("(?=(?:SERIAL NUMBER:\\s+\\d+))")));
if ((arrF1.length().isNE(arrF2.length())))
{
BoxSystem.die("Error: file1 has $#f1 serials, file2 has $#f2\n");
}
for (
VarBox varG : VarRange.is(0,arrF1.length()))
{
BoxSystem.print((arrF2.get(varG).like(BoxPattern.is("RESULT:\\s+PASS"))) ? arrF2.get(varG) : arrF1.get(varG)
);
}
return STDERR.print("$#f1 serials found\n");
}
}