Clarification on chomp - perl

I'm on break from classes right now and decided to spend my time learning Perl. I'm working with Beginning Perl (http://www.perl.org/books/beginning-perl/) and I'm finishing up the exercises at the end of chapter three.
One of the exercises asked that I "Store your important phone numbers in a hash. Write a program to look up numbers by the person's name."
Anyway, I had come up with this:
#!/usr/bin/perl
use warnings;
use strict;
my %name_number=
(
Me => "XXX XXX XXXX",
Home => "YYY YYY YYYY",
Emergency => "ZZZ ZZZ ZZZZ",
Lookup => "411"
);
print "Enter the name of who you want to call (Me, Home, Emergency, Lookup)", "\n";
my $input = <STDIN>;
print "$input can be reached at $name_number{$input}\n";
And it just wouldn't work. I kept getting this error message:
Use of uninitialized value in concatenation (.) or string at hello.plx
line 17, line 1
I tried playing around with the code some more but each "solution" looked more complex than the "solution" that came before it. Finally, I decided to check the answers.
The only difference between my code and the answer was the presence of chomp ($input); after <STDIN>;.
Now, the author has used chomp in previous example but he didn't really cover what chomp was doing. So, I found this answer on www.perlmeme.org:
The chomp() function will remove (usually) any newline character from
the end of a string. The reason we say usually is that it actually
removes any character that matches the current value of $/ (the input
record separator), and $/ defaults to a newline..
Anyway, my questions are:
What newlines are getting removed? Does Perl automatically append a "\n" to the input from <STDIN>? I'm just a little unclear because when I read "it actually removes any character that matches the current value of $/", I can't help but think "I don't remember putting a $/ anywhere in my code."
I'd like to develop best practices as soon as possible - is it best to always include chomp after <STDIN> or are there scenarios where it's unnecessary?

<STDIN> reads to the end of the input string, which contains a newline if you press return to enter it, which you probably do.
chomp removes the newline at the end of a string. $/ is a variable (as you found, defaulting to newline) that you probably don't have to worry about; it just tells perl what the 'input record separator' is, which I'm assuming means it defines how far <FILEHANDLE> reads. You can pretty much forget about it for now, it seems like an advanced topic. Just pretend chomp chomps off a trailing newline. Honestly, I've never even heard of $/ before.
As for your other question, it is generally cleaner to always chomp variables and add newlines as needed later, because you don't always know if a variable has a newline or not; by always chomping variables you always get the same behavior. There are scenarios where it is unnecessary, but if you're not sure it can't hurt to chomp it.
Hope this helps!

OK, as of 1), perl doesn't add any \n at input. It is you that hit Enter when finished entering the number. If you don't specify $/, a default of \n will be put (under UNIX, at least).
As of 2), chomp will be needed whenever input comes from the user, or whenever you want to remove the line ending character (reading from a file, for example).
Finally, the error you're getting may be from perl not understanding your variable within the double quotes of the last print, because it does have a _ character. Try to write the string as follows:
print "$input can be reached at ${name_number{$input}}\n";
(note the {} around the last variable).

<STDIN> is a short-cut notation for readline( *STDIN );. What readline() does is reads the file handle until it encounters the contents of $/ (aka $INPUT_RECORD_SEPARATOR) and returns everything it has read including the contents of $/. What chomp() does is remove the last occurrence contents of $/, if present.
The contents is often called a newline character but it may be composed of more than one character. On Linux, it contains a LF character but on Windows, it contains CR-LF.
See:
perldoc -f readline
perldoc -f chomp
perldoc perlvar and search for /\$INPUT_RECORD_SEPARATOR/

I think best practice here is to write:
chomp(my $input = <STDIN>);
Here is quick example how chomp function ($/ meaning is explained there) works removing just one trailing new line (if any):
chomp (my $input = "Me\n"); # OK
chomp ($input = "Me"); # OK (nothing done)
chomp ($input = "Me\n\n"); # $input now is "Me\n";
chomp ($input); # finally "Me"
print "$input can be reached at $name_number{$input}\n";
BTW: That's funny thing is that I am learning Perl too and I reached hashes five minutes ago.

Though it may be obvious, it's still worth mentioning why the chomp is needed here.
The hash created contains 4 lookup keys: "Me", "Home", "Emergency" and "Lookup"
When $input is specified from <STDIN>, it'll contain "Me\n", "Me\r\n" or some other line-ending variant depending on what operating system is being used.
The uninitialized value error comes about because the "Me\n" key does not exist in the hash. And this is why the chomp is needed:
my $input = <STDIN>; # "Me\n" --> Key DNE, $name_number{$input} not defined
chomp $input; # "Me" --> Key exists, $name_number{$input} defined

Related

Reading a file line by line in Perl

I want to read a file by one line, but it's reading just the first line. How to read all lines?
My code:
open(file_E, $file_E);
while ( <file_E> ) {
/([^\n]*)/;
print $line1;
}
close($file_E);
Let's start by looking at your code.
open(file_E, $file_E);
while ( <file_E> ) {
/([^\n]*)/;
print $line1;
}
close($file_E);
On the first line you open a file named in $file_E using the bareword filehandle file_E. This should work so long as the file successfully opens. It would be better to also check the success of this operation one of two ways: Either put use autodie; at the top of your script (but then risk applying its semantics in places where your code is incompatible with this level of error handling), or change your open to look like this:
open(file_E, $file_E) or die "Failed to open $file_E: $!\n";
Now if you fail to open the file you will get an error message that will help track down the problem.
Next lets look at the while loop, because it's here where you have an issue that is causing the bug you are experiencing. On the first line of the while loop you have this:
while ( <file_E> ) {
By consulting perldoc perlsyn you will see that line is special-cased to actually do this:
while (defined($_ = <file_E>)) {
So your code is implicitly assigning each line to $_ on successive iterations. Also by consulting perldoc perlop you'll find that when the match operator (/.../ or m/.../) is invoked without binding the match explicitly using =~, the match will bind against $_. Still then, so far so good. However, you are not actually doing anything useful with the match. The match operator will return Boolean truth / falsehood for whether or not the match succeeded. And because your pattern contains capturing parenthesis, it will capture something into the capture variable $1. But you are never testing for match success, nor are you ever referring to $1 again.
On the line that follows, you do this: print $line1. Where, in your code, is $line1 being assigned a value? Because it is never being assigned a value in what you've shown us.
I can only guess that your intent is to iterate over the lines of the file, capture the line but without the trailing newline, and then print it. It seems that you wish to print it without any newlines, so that all of the input file is printed as a single line of output.
open my $input_fh_e, '<', $file_E or die "Failed to open $file_E: $!\n";
while(my $line = <$input_fh_e>) {
chomp $line;
print $line;
}
close $input_fh_e or die "Failed to close $file_E: $!\n";
No need to capture anything -- if all that the capture is doing is just grabbing everything up to the newline, you can simply strip off the newline with chomp to begin with.
In my example I used a lexical filehandle (a file handle that is lexically scoped, declared with my). This is generally a better practice in modern Perl as it avoids using a bareword, avoids possible namespace collisions, and assures that the handle will get closed as soon as the lexical scope closes.
I also used the 'three arg' version of open, which is safer because it eliminates the potential for $file_E to be used to open a pipe or do some other nefarious or simply unintended shell manipulation.
I suggest also starting your script with use strict;, because had you done so, you would have gotten an error message at compiletime telling you that $line1 was never declared. Also start your script with use warnings, so that you would get a warning when you try to print $line1 before assigning a value to it.
Most of the issues in your code will be discussed in perldoc perlintro, which you can arrive at from your command line simply by typing perldoc perlintro, assuming you have Perl installed. It typically takes 20-40 minutes to read through perlintro. If ever there were a document that should constitute required reading before getting started writing Perl code, that reading would probably include perlintro.
Another alternative, note that $_ will include newline so you will need to chomp it if you don't want the newline in $line:
open(file_E, $file_E);
while ( <file_E> ) {
my $line = $_;
print $line;
}
close($file_E);

Need further understanding in the next unless code I'm reading

I need help with 2 things on this code that I'm reading. First, is I keep seeing this inside of while loop to read a file:
wile(<filename>){
next unless (/\w/);
chomp;
s/^\s*//;
s/^\s*$//;
my($name, $datatype, $io, $dummy) = split /\s*,\s*/, $_, 4;
}
So, I'm wondering what that is doing? Because there are commas in the same line being read, so wouldn't the commas make it go to the next iteration? SO how would it split the lines if it is going to another iteration when the commas are being read?
Another one I'm stomped by is:
while (<AP>) {
chomp;
s/
//g;
}
I have no idea what that code is actually substituting...
Thanks!
The first snippet:
Reads a line from a filehandle called filename. This is a really bad name for a filehandle
It skips the processing if there is not even a single \w (word character) on the line.
The next unless (/\w/); is the same as next if not (/\w/). Note that there is no need for parenthesis -- next unless /\w/; is fine.
A word character is, from perlretut
\w matches a word character (alphanumeric or _), not just [0-9a-zA-Z_] but also digits and characters from non-roman scripts
It removes (only) the newline with chomp. Then it removes leading spaces, if any
It removes blank lines, the ones with only spaces on them
It splits the line by commas, allowing that they have spaces before and/or after. It also limits the number of terms returned, to 4. This means that it returns the first three comma-separated fields, and then all the rest as one string in the last element of the list
The second snippet is really bad, whatever it is meant to do. (Remove spaces on the line?)
Comments
It is far better to use lexical filehandles, rather than barenames. So you'd open a file as
open my $fh, '<', $filename or die "Can't open $filename: $!";
and read it by while (my $line = <$fh>) or by while (<$fh>).
Normally you'll see lines skipped if they have nothing other than spaces
next unless /\S/; # or
next if /^\s*$/;
Using \w also skips lines with some characters (other than what is matched by \w), which means that one had better be very sure that those are fine to skip.
Here it may be meant to skip a line with commas but no \w (comma is not matched by \w), for which split would return spaces (or empty strings) in a list. I find this a bit hidden and fragile. I'd drop lines with spaces only, and handle possible loose commas in processing. As it stands it doesn't help with ,,a, anyway, what yields ('', '', 'a'). So checking is probably needed in any case.
Note that altogether this code leaves trailing spaces. When split is invoked with the optional fourth argument it keeps all spaces, and they haven't been removed otherwise.

Print Line By Line

I've been trying to work on a lyrical bot for my server, but before I started to work on it, I wanted to give it a test so I came up with this script using the Lyrics::Fetcher module.
use strict;
use warnings;
use Lyrics::Fetcher;
my ($artist, $song) = ('Coldplay', 'Adventures Of A Lifetime');
my $lyrics = Lyrics::Fetcher->fetch($artist, $song, [qw(LyricWiki AstraWeb)]);
my #lines = split("\n\r", $lyrics);
foreach my $line (#lines) {
sleep(10);
print $line;
}
This script works fine, it grabs the lyrics and prints it out in a whole(which is not what I'm looking for).
I was hoping to achieve a line by line print of the lyrics every 10 seconds. Help please?
Your call to split looks suspicious. In particular the regex "\n\r". Note, the first argument to split is always interpreted as a regex regardless of whether you supply a quoted string.
On Unix systems the line ending is typically "\n". On DOS/Windows it's "\r\n" (the reverse of what you have). On ancient Macs it was "\r". To match all thre you could do:
my #lines = split(/\r\n|\n|\r/, $lyrics);
You will need to enable autoflush, otherwise the lines will just be buffered and printed when the buffer is full or when the program terminates
STDOUT->autoflush;
You can use the regex generic newline pattern \R to split on any line ending, whether your data contains CR, LF, or CR LF. This feature is available only in Perl v5.10 or better
my #lines = split /\R/, $lyrics;
And you will need to print a newline after each line of lyrics, because the split will have removed them
print $line, "\n";

Perl Program to Mimic RNA Synthesis

Looking for suggestions on how to approach my Perl programming homework assignment to write an RNA synthesis program. I've summed and outlined the program below. Specifically, I'm looking for feedback on the blocks below (I'll number for easy reference). I've read up to chapter 6 in Elements of Programming with Perl by Andrew Johnson (great book). I've also read the perlfunc and perlop pod-pages with nothing jumping out on where to start.
Program Description: The program should read an input file from the command line, translate it into RNA, and then transcribe the RNA into a sequence of uppercase one-letter amino acid names.
Accept a file named on the command line
here I will use the <> operator
Check to make sure the file only contains acgt or die
if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }
Transcribe the DNA to RNA (Every A replaced by U, T replaced by A, C replaced by G, G replaced by C)
not sure how to do this
Take this transcription & break it into 3 character 'codons' starting at the first occurance of "AUG"
not sure but I'm thinking this is where I will start a %hash variables?
Take the 3 character "codons" and give them a single letter Symbol (an uppercase one-letter amino acid name)
Assign a key a value using (there are 70 possibilities here so I'm not sure where to store or how to access)
If a gap is encountered a new line is started and process is repeated
not sure but we can assume that gaps are multiples of threes.
Am I approaching this the right way? Is there a Perl function that I'm overlooking that can simplify the main program?
Note
Must be self contained program (stored values for codon names & symbols).
Whenever the program reads a codon that has no symbol this is a gap in the RNA, it should start a new line of output and begin at the next occurance of "AUG". For simplicity we can assume that gaps are always multiples of threes.
Before I spend any additional hours on research I am hoping to get confirmation that I'm taking the right approach. Thanks for taking time to read and for sharing your expertise!
1. here I will use the <> operator
OK, your plan is to read the file line by line. Don't forget to chomp each line as you go, or you'll end up with newline characters in your sequence.
2. Check to make sure the file only contains acgt or die
if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }
In a while loop, the <> operator puts the line read into the special variable $_, unless you assign it explicitly (my $line = <>).
In the code above, you're reading one line from the file and discarding it. You'll need to save that line.
Also, the ne operator compares two strings, not one string and one regular expression. You'll need the !~ operator here (or the =~ one, with a negated character class [^acgt]. If you need the test to be case-insensitive, look into the i flag for regular expression matching.
3. Transcribe the DNA to RNA (Every A replaced by U, T replaced by A, C replaced by G, G replaced by C).
As GWW said, check your biology. T->U is the only step in transcription. You'll find the tr (transliterate) operator helpful here.
4. Take this transcription & break it into 3 character 'codons' starting at the first occurance of "AUG"
not sure but I'm thinking this is where I will start a %hash variables?
I would use a buffer here. Define an scalar outside the while(<>) loop. Use index to match "AUG". If you don't find it, put the last two bases on that scalar (you can use substr $line, -2, 2 for that). On the next iteration of the loop append (with .=) the line to those two bases, and then test for "AUG" again. If you get a hit, you'll know where, so you can mark the spot and start translation.
5. Take the 3 character "codons" and give them a single letter Symbol (an uppercase one-letter amino acid name)
Assign a key a value using (there are 70 possibilities here so I'm not sure where to store or how to access)
Again, as GWW said, build a hash table:
%codons = ( AUG => 'M', ...).
Then you can use (for eg.) split to build an array of the current line you're examining, build codons three elements at a time, and grab the correct aminoacid code from the hash table.
6.If a gap is encountered a new line is started and process is repeated
not sure but we can assume that gaps are multiples of threes.
See above. You can test for the existence of a gap with exists $codons{$current_codon}.
7. Am I approaching this the right way? Is there a Perl function that I'm overlooking that can simplify the main program?
You know, looking at the above, it seems way too complex. I built a few building blocks; the subroutines read_codon and translate: I think they help the logic of the program immensely.
I know this is a homework assignment, but I figure it might help you get a feel for other possible approaches:
use warnings; use strict;
use feature 'state';
# read_codon works by using the new [state][1] feature in Perl 5.10
# both #buffer and $handle represent 'state' on this function:
# Both permits abstracting reading codons from processing the file
# line-by-line.
# Once read_colon is called for the first time, both are initialized.
# Since $handle is a state variable, the current file handle position
# is never reset. Similarly, #buffer always holds whatever was left
# from the previous call.
# The base case is that #buffer contains less than 3bp, in which case
# we need to read a new line, remove the "\n" character,
# split it and push the resulting list to the end of the #buffer.
# If we encounter EOF on the $handle, then we have exhausted the file,
# and the #buffer as well, so we 'return' undef.
# otherwise we pick the first 3bp of the #buffer, join them into a string,
# transcribe it and return it.
sub read_codon {
my ($file) = #_;
state #buffer;
open state $handle, '<', $file or die $!;
if (#buffer < 3) {
my $new_line = scalar <$handle> or return;
chomp $new_line;
push #buffer, split //, $new_line;
}
return transcribe(
join '',
shift #buffer,
shift #buffer,
shift #buffer
);
}
sub transcribe {
my ($codon) = #_;
$codon =~ tr/T/U/;
return $codon;
}
# translate works by using the new [state][1] feature in Perl 5.10
# the $TRANSLATE state is initialized to 0
# as codons are passed to it,
# the sub updates the state according to start and stop codons.
# Since $TRANSLATE is a state variable, it is only initialized once,
# (the first time the sub is called)
# If the current state is 'translating',
# then the sub returns the appropriate amino-acid from the %codes table, if any.
# Thus this provides a logical way to the caller of this sub to determine whether
# it should print an amino-acid or not: if not, the sub will return undef.
# %codes could also be a state variable, but since it is not actually a 'state',
# it is initialized once, in a code block visible form the sub,
# but separate from the rest of the program, since it is 'private' to the sub
{
our %codes = (
AUG => 'M',
...
);
sub translate {
my ($codon) = #_ or return;
state $TRANSLATE = 0;
$TRANSLATE = 1 if $codon =~ m/AUG/i;
$TRANSLATE = 0 if $codon =~ m/U(AA|GA|AG)/i;
return $codes{$codon} if $TRANSLATE;
}
}
I can give you a few hints on a few of your points.
I think your first goal should be to parse the file character by character, ensuring each one is valid, group them into sets of three nucleotides and then work on your other goals.
I think your biology is a bit off as well, when you transcribe DNA to RNA you need to think about what strands are involved. You may not need to "complement" your bases during your transcription step.
2. You should check this as your parse the file character by character.
3. You could do this with a loop and some if statements or hash
4. This could probably be done with a counter as you read the file character by character. Since you need to insert a space after every 3rd character.
5. This would be a good place to use a hash that's based on the amino acid codon table.
6. You'll have to look for the gap character as you parse the file. This seems to contradict your #2 requirement since the program says your text can only contain ATGC.
There are a lot of perl functions that could make this easier. There are also perl modules such as bioperl. But I think using some of these could defeat the purpose of your assignment.
Look at BioPerl and browse the source-modules for indicators on how to go about it.

Can someone suggest how this Perl script works?

I have to maintain the following Perl script:
#!/usr/bin/perl -w
die "Usage: $0 <file1> <file2>\n" unless scalar(#ARGV)>1;
undef $/;
my #f1 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
my #f2 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
die "Error: file1 has $#f1 serials, file2 has $#f2\n" if ($#f1 != $#f2);
foreach my $g (0 .. $#f1) {
print (($f2[$g] =~ m/RESULT:\s+PASS/) ? $f2[$g] : $f1[$g]);
}
print STDERR "$#f1 serials found\n";
I know pretty much what it does, but how it's done is difficult to follow. The calls to split() are particulary puzzling.
It's fairly idiomatic Perl and I would be grateful if a Perl expert could make a few clarifying suggestions about how it does it, so that if I need to use it on input files it can't deal with, I can attempt to modify it.
It combines the best results from two datalog files containing test results. The datalog files contain results for various serial numbers and the data for each serial number begins and ends with SERIAL NUMBER: n (I know this because my equipment creates the input files)
I could describe the format of the datalog files, but I think the only important aspect is the SERIAL NUMBER: n because that's all the Perl script checks for
The ternary operator is used to print a value from one input file or the other, so the output can be redirected to a third file.
This may not be what I would call "idiomatic" (that would be use Module::To::Do::Task) but they've certainly (ab)used some language features here. I'll see if I can't demystify some of this for you.
die "Usage: $0 <file1> <file2>\n" unless scalar(#ARGV)>1;
This exits with a usage message if they didn't give us any arguments. Command-line arguments are stored in #ARGV, which is like C's char **argv except the first element is the first argument, not the program name. scalar(#ARGV) converts #ARGV to "scalar context", which means that, while #ARGV is normally a list, we want to know about it's scalar (i.e. non-list) properties. When a list is converted to scalar context, we get the list's length. Therefore, the unless conditional is satisfied only if we passed no arguments.
This is rather misleading, because it will turn out your program needs two arguments. If I wrote this, I would write:
die "Usage: $0 <file1> <file2>\n" unless #ARGV == 2;
Notice I left off the scalar(#ARGV) and just wrote #ARGV. The scalar() function forces scalar context, but if we're comparing equality with a number, Perl can implicitly assume scalar context.
undef $/;
Oof. The $/ variable is a special Perl built-in variable that Perl uses to tell what a "line" of data from a file is. Normally, $/ is set to the string "\n", meaning when Perl tries to read a line it will read up until the next linefeed (or carriage return/linefeed on Windows). Your writer has undef-ed the variable, though, which means when you try to read a "line", Perl will just slurp up the whole file.
my #f1 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
This is a fun one. <> is a special filehandle that reads line-by-line from each file given on the command line. However, since we've told Perl that a "line" is an entire file, calling <> once will read in the entire file given in the first argument, and storing it temporarily as a string.
Then we take that string and split() it up into pieces, using the regex /(?=(?:SERIAL NUMBER:\s+\d+))/. This uses a lookahead, which tells our regex engine "only match if this stuff comes after our match, but this stuff isn't part of our match," essentially allowing us to look ahead of our match to check on more info. It basically splits the file into pieces, where each piece (except possibly the first) begins with "SERIAL NUMBER:", some arbitrary whitespace (the \s+ part), and then some digits (the \d+ part). I can't teach you regexes, so for more info I recommend reading perldoc perlretut - they explain all of that better than I ever will.
Once we've split the string into a list, we store that list in a list called #f1.
my #f2 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
This does the same thing as the last line, only to the second file, because <> has already read the entire first file, and storing the list in another variable called #f2.
die "Error: file1 has $#f1 serials, file2 has $#f2\n" if ($#f1 != $#f2);
This line prints an error message if #f1 and #f2 are different sizes. $#f1 is a special syntax for arrays - it returns the index of the last element, which will usually be the size of the list minus one (lists are 0-indexed, like in most languages). He uses this same value in his error message, which may be deceptive, as it will print 1 fewer than might be expected. I would write it as:
die "Error: file $ARGV[0] has ", $#f1 + 1, " serials, file $ARGV[1] has ", $#f2 + 1, "\n"
if $#f1 != $#f2;
Notice I changed "file1" to "file $ARGV[0]" - that way, it will print the name of the file you specified, rather than just the ambiguous "file1". Notice also that I split up the die() function and the if() condition on two lines. I think it's more readable that way. I also might write unless $#f1 == $#f2 instead of if $#f1 != $#f2, but that's just because I happen to think != is an ugly operator. There's more than one way to do it.
foreach my $g (0 .. $#f1) {
This is a common idiom in Perl. We normally use for() or foreach() (same thing, really) to iterate over each element of a list. However, sometimes we need the indices of that list (some languages might use the term "enumerate"), so we've used the range operator (..) to make a list that goes from 0 to $#f1, i.e., through all the indices of our list, since $#f1 is the value of the highest index in our list. Perl will loop through each index, and in each loop, will assign the value of that index to the lexically-scoped variable $g (though why they didn't use $i like any sane programmer, I don't know - come on, people, this tradition has been around since Fortran!). So the first time through the loop, $g will be 0, and the second time it will be 1, and so on until the last time it is $#f1.
print (($f2[$g] =~ m/RESULT:\s+PASS/) ? $f2[$g] : $f1[$g]);
This is the body of our loop, which uses the ternary conditional operator ?:. There's nothing wrong with the ternary operator, but if the code gives you trouble we can just change it to an if(). Let's just go ahead and rewrite it to use if():
if($f2[$g] =~ m/RESULT:\s+PASS/) {
print $f2[$g];
} else {
print $f1[$g];
}
Pretty simple - we do a regex check on $f2[$g] (the entry in our second file corresponding to the current entry in our first file) that basically checks whether or not that test passed. If it did, we print $f2[$g] (which will tell us that test passed), otherwise we print $f1[$g] (which will tell us the test that failed).
print STDERR "$#f1 serials found\n";
This just prints an ending diagnostic message telling us how many serials were found (minus one, again).
I personally would rewrite that whole hairy bit where he hacks with $/ and then does two reads from <> to be a loop, because I think that would be more readable, but this code should work fine, so if you don't have to change it too much you should be in good shape.
The undef $/ line deactivates the input record separator. Instead of reading records line by line, the interpreter will read whole files at once after that.
The <>, or 'diamond operator' reads from the files from the command line or standard input, whichever makes sense. In your case, the command line is explicitely checked, so it will be files. Input record separation has been deactivated, so each time you see a <>, you can think of it as a function call returning a whole file as a string.
The split operators take this string and cut it in chunks, each time it meets the regular expression in argument. The (?= ... ) construct means "the delimiter is this, but please keep it in the chunked result anyway."
That's all there is to it. There would always be a few optimizations, simplifications, or "other ways to do it," but this should get you running.
You can get quick glimpse how the script works, by translating it into java or scala. The inccode.com translator delivers following java code:
public class script extends CRoutineProcess implements IInProcess
{
VarArray arrF1 = new VarArray();
VarArray arrF2 = new VarArray();
VarBox call ()
{
// !/usr/bin/perl -w
if (!(BoxSystem.ProgramArguments.scalar().isGT(1)))
{
BoxSystem.die(BoxString.is(VarString.is("Usage: ").join(BoxSystem.foundArgument.get(0
).toString()).join(" <file1> <file2>\n")
)
);
}
BoxSystem.InputRecordSeparator.empty();
arrF1.setValue(BoxConsole.readLine().split(BoxPattern.is("(?=(?:SERIAL NUMBER:\\s+\\d+))")));
arrF2.setValue(BoxConsole.readLine().split(BoxPattern.is("(?=(?:SERIAL NUMBER:\\s+\\d+))")));
if ((arrF1.length().isNE(arrF2.length())))
{
BoxSystem.die("Error: file1 has $#f1 serials, file2 has $#f2\n");
}
for (
VarBox varG : VarRange.is(0,arrF1.length()))
{
BoxSystem.print((arrF2.get(varG).like(BoxPattern.is("RESULT:\\s+PASS"))) ? arrF2.get(varG) : arrF1.get(varG)
);
}
return STDERR.print("$#f1 serials found\n");
}
}