Missing expression in split syntax in perl - perl

I haven't done much (any really) perl programming and today someone sent me some code to review in perl. Mostly I can understand what's happening but there was one line that I simply cannot get my head around, the split function in the code block below
while(<>) {
chomp;
my($v1, $v2, $v3, $v4) = split(/,/);
# ....
I read this blog post and it suggests that the syntax is valid (and I believe that it is). But what I don't understand is what exactly is being split with the Expression missing

The split documentation explains what happens when each argument is omitted.
If only PATTERN is given, EXPR defaults to $_.

This code uses a number of shortcuts. It is equivalent to this:
while(defined($_ = readline(ARGV))) {
chomp($_);
my($v1, $v2, $v3, $v4) = split(/,/, $_, 5);

Related

what does print for mean in Perl?

I need to edit some Perl script and I'm new to this language.
I encountered the following statement:
print for (#$result);
I know that $result is a reference to an array and #$result returns the whole array.
But what does print for mean?
Thank you in advance.
In Perl, there's such a thing as an implicit variable. You may have seen it already as $_. There's a lot of built in functions in perl that will work on $_ by default.
$_ is set in a variety of places, such as loops. So you can do:
while ( <$filehandle> ) {
chomp;
tr/A-Z/a-z/;
s/oldword/newword/;
print;
}
Each of these lines is using $_ and modifying it as it goes. Your for loop is doing the same - each iteration of the loop sets $_ to the current value and print is then doing that by default.
I would point out though - whilst useful and clever, it's also a really good way to make confusing and inscrutable code. In nested loops, for example, it can be quite unclear what's actually going on with $_.
So I'd typically:
avoid writing it explicitly - if you need to do that, you should consider actually naming your variable properly.
only use it in places where it makes it clearer what's going on. As a rule of thumb - if you use it more than twice, you should probably use a named variable instead.
I find it particularly useful if iterating on a file handle. E.g.:
while ( <$filehandle> ) {
next unless m/keyword/; #skips any line without 'keyword' in it.
my ( $wiggle, $wobble, $fronk ) = split ( /:/ ); #split $_ into 3 variables on ':'
print $wobble, "\n";
}
It would be redundant to assign a variable name to capture a line from <$filehandle>, only to immediately discard it - thus instead we use split which by default uses $_ to extract 3 values.
If it's hard to figure out what's going on, then one of the more useful ways is to use perl -MO=Deparse which'll re-print the 'parsed' version of the script. So in the example you give:
foreach $_ (#$result) {
print $_;
}
It is equivalent to for (#$result) { print; }, which is equivalent to for (#$result) { print $_; }. $_ refers to the current element.

Perl - Code review [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
I am working on a program that takes information from a CSV file as a source to search with through a text file that has "customer packages". I am getting odd counts on only some of the entries, and I can't seem to figure out what is causing the duplicate counts. Can anyone look through my code and tell me if my logic/syntax is off? (probably is). All i am trying to accomplish is to count the total occurances in the text file of an entry in the csv file (packageid,package_description)
Thanks for the help! im going nuts over here.
#!/usr/bin/perl
use strict;
use Text::CSV;
# Variables already declared in the other PL file ** Remove if consolidating **
my $file2 = 'master_plist.csv';
my $csv2 = Text::CSV->new(); # Create a Text::CSV object
open (CSV2, "<", $file2) or die $!; #open CSV file for parsing
while (<CSV2>) {
if ($csv2->parse($_)) {
my #columns2 = $csv2->fields(); # Parse CSV and load into an array for each row.
my $packID = $columns2[0];
my $packDESC = $columns2[1];
my $val = 'customer_packages_report.txt';
chomp ($val);
my $cnt=0;
open (HNDL, "$val") || die "wrong filename";
while ($val = <HNDL>)
{
while ($val =~ /$packID - $packDESC/ig)
{
$cnt++;
}
}
#if ($packDESC =~ /\(/g) {
# $packDESC =~ s/\(/\(/g;
#}
print "Total iterations of $packDESC: $cnt\n";
close (HNDL);
# End original code
} # Close IF
} # Close WHILE
close CSV;
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
# Variables already declared in the other PL file ** Remove if consolidating **
my $file2 = 'master_plist.csv';
my $csv2 = Text::CSV->new(); # Create a Text::CSV object
open (CSV2, "<", $file2) or die "I die while opening $file2! $!"; #open CSV file for parsing
while ($each_csv2_line=<CSV2>) {
if ($csv2->parse($each_csv2_line)) {
my #columns2 = $csv2->fields(); # Parse CSV and load into an array for each row.
my $packID = $columns2[0];
my $packDESC = $columns2[1];
my $val = 'customer_packages_report.txt';
chomp ($val);
my $cnt=0;
open (HNDL,"<","$val") or die "wrong filename: $val! $!";
while (<HNDL>){
$cnt++ while (/$packID - $packDESC/ig);
}
#if ($packDESC =~ /\(/g) {
# $packDESC =~ s/\(/\(/g;
#}
print "Total iterations of $packDESC: $cnt\n";
close (HNDL);
# End original code
} # Close IF
} # Close WHILE
# end of script
close CSV;
My recommendations:
Use $HNDL instead of HNDL <- lexical variables for filehandles more better.
Try to catch all mistakes (by defined and ==0 and eq "")
I try to format your code and add some features that i sometimes use. Be better than me and read first Style Coding for Little Perl Monk. And you can be more impressive with this language and write not only writeonly code.
Example (and also a quote):
"The situation is exactly the same for the line-input operator, <>, although Perl does this for you automatically.
It looks like you’re testing the line from STDIN in this while:
while (<STDIN>) {
do_something($_);
}
However, this is a special case in which Perl automatically converts to check $_ for definedness:
while ( defined( $_ = <STDIN> ) ) { # implicitly done
do_something($_);
}
"
Effective Perl Programming, page 24.
You could do a number of things to improve your code:
use warnings;.
Use proper indentation.
Use descriptive variable names. Instead of $file2 (has no meaning, and why is there no file 1?), use $package_file or whatever makes sense.
if you are already using Text::CSV, you can use $csv->getline() to go through the file line by line. This will simplify your code. See the documentation for an example.
chomp($val) removes a newline from the end of a string. You are using it on a string literal you just declared, which has no newline. That doesn't make sense.
Never use the same variable ($val) to do two completely different things. This is extremely confusing.
Might the variables that you are interpolating in the regex contain special characters? If so, you need to escape them. For example, if $packDESC contained a period, it would match any character in the regex. To treat the contents of the variable literally, use \Q..\E, as in this example: /\Q$packID - $packDESC\E/ig.
You are opening customer_packages_report.txt and going through it line-by-line on every line of the csv file. You could simplify this by reading it in once and storing the results in an array.
You don't need a while loop to count matches: $cnt = () = /$packID - $packDESC/ig;. This puts the match in array context, returning an array of matches, then puts it back in scalar context to count the matches. A little bit tricky, but simpler.
It's hard to say exactly what is causing your problem without seeing the data. Might you have some unnecessary repetition that stems from your nested looping over both files? I would start by rewriting to improve your code, then see if the problem still exists.
Your code seems to compile with perl -c without errors, so that's good. If I were to guess, I would assume your problem lies in having meta characters in some of your fields. The regex /$packID - $packDESC/ is vulnerable to meta characters. For example
my $str = "foo? bar";
$str =~ /$str/; # returns false, because ? is a meta character
In the above example, the question mark ? is a quantifier which affects whatever comes before it, so that o? means "0 or 1 o". To solve the meta character problem, use the \Q ... \E escape:
$str =~ /\Q$str/; # will now match
Terminating the escape sequence with \E is optional.
Some other things to note:
It is very good that you use use strict. You should also always use warnings. Not doing so is not removing the issues with your code, only hiding them.
You create a Text::CSV object with default settings. Depending on your input, that may or may not be appropriate. Setting binary => 1 is recommended in the documentation.
Using the parse() function may not be the best option, the documentation has good things to say about getline.
As loldop points out in the comments, you are reusing $val to read from your file. While technically that should work, it is asking for trouble.
Style and practice notes and practical tips:
Using three-argument open and lexical file handles is a good thing to do. Three-argument in essence means to use an explicit open mode, which makes your script safer to use. Using lexical file handles means that you will not have global scope on your file handle, which is a good thing.
This code
my #columns2 = $csv2->fields();
my $packID = $columns2[0];
my $packDESC = $columns2[1];
Can be written like this
my ($packID, $packDESC) = $csv2->fields();
You are chomping $val right after you assign it. That is redundant, because chomp by default only removes newlines from the end of your strings, and you did not add any such. It doesn't change anything, but not required here. If you read something from stdin or a file, you would probably want to use chomp, though.
Using die without referring to the error $! is a sure way to make yourself annoyed.
Do not underestimate how much easier it becomes to write code when you use proper indentation. Use a text editor with automatic indentation and colouring. I can warmly recommend vim (gvim if you are using windows). Though it has a learning curve, is is a powerful editor that also often comes already installed on many systems.
Since so many people have already commented on your program itself, I'm going to talk about how you can become a better Perl programmer, and help write in such a way that will help eliminate many of your issues.
Take a look at Perl::Tidy and run your program thorough that. That will help improve your syntax and Perl and will help you catch a lot of the various issues you're having.
Also, you should get a copy of Perl Best Practices which is where most of Perl Tidy is taken from. And, as someone already referenced Effective Perl Programming is another excellent book.
The big issue with Perl is that few people learn it. Most are tossed into a situation where we had to pick it up ourselves. Plus, Perl is a fairly old and rather crufty language. Most Perl books still lean heavily on Perl 3.x ways of programming and fail to mention such basics as using use strict; and use warnings;.
You combine old programming practices, with most people learning Perl by hacking their way through old programs with old syntax (and probably written by people who learned Perl by hacking their way through even older programs), and you can see why Perl has a reputation of being a write-only language.
You may want to use the getline method from Text::CSV, which saves a few lines of code.
The problem is likely to be because you have regex metacharacters in the strings you are searching for. Escape them with \Q...\E in the regex so that they are taken literally. In the rewrite below I have also added \s* instead of a literal space, just in case there isn't exactly one space on either side of the hyphen.
I have also changed the filehandles to lexical ones, which have the advantage that they will be closed automatically when the handle goes out of scope.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $file2 = 'master_plist.csv';
my $csv2 = Text::CSV->new();
open(my $csv_fh, '<', $file2) or die $!;
while (my $row = $csv2->getline($csv_fh)) {
my ($packID, $packDESC) = #$row;
my $val = 'customer_packages_report.txt';
chomp($val);
open(my $fh, '<', $val) or die "wrong filename";
my $cnt = 0;
while ($val = <$fh>) {
while ($val =~ /\Q$packID\E\s*-\s*\Q$packDESC\E/ig) {
$cnt++;
}
}
print "Total iterations of $packDESC: $cnt\n";
}

How to get rid of use of an uninitialized value within an 'if' construct using a Perl regex

How do I get rid of use of an uninitialized value within an if construct using a Perl regex?
When using the code below, I get use of uninitialized value messages.
if($arrayOld[$i] =~ /-(.*)/ || $arrayOld[$i] =~ /\#(.*)/)
When using the code below, I get no output.
if(defined($arrayOld[$i]) =~ /-(.*)/ || defined($arrayOld[$i]) =~ /\#(.*)/)
What is the proper way to check if a variable has a value given the code above?
Try:
if($arrayOld[$i] && $arrayOld[$i] =~ /-|\#(.*)/)
This first checks $arrayOld[$i] for a value before running a regx against it.
(Have also combined the || into the regex.)
From the error message in your comment, you're accessing an element of #arrayOld that isn't defined. Without seeing the rest of the code, this could indicate a bug in your program, or it could just be expected behavior.
If you understand why $arrayOld[$i] is undef, and you want to allow that without getting a warning, there's a couple of things you can do. Perl 5.10.0 introduced the defined-or operator //, which you can use to substitute the empty string for undef:
use 5.010;
...
if(($arrayOld[$i] // '') =~ /-(.*)/ || ($arrayOld[$i] // '') =~ /\#(.*)/)
Or, you can just turn off the warning:
if (do { no warnings 'uninitalized';
$arrayOld[$i] =~ /-(.*)/ || $arrayOld[$i] =~ /\#(.*)/ })
Here, I'm using do to limit the time the warning is disabled. However, turning off the warning also suppresses the warning you'd get if $i were undef. Using // allows you to specify exactly what is allowed to be undef, and exactly what value should be used instead of undef.
Note: defined($arrayOld[$i]) =~ /-(.*)/ is running a pattern match on the result of the defined function, which is just going to be a true/false value; not the string you want to test.
To answer your question narrowly, you can prevent undefined-value warnings in that line of code with
if (defined $i && defined $arrayOld[$i]
&& ($arrayOld[$i] =~ /-(.*)/ || $arrayOld[$i] =~ /\#(.*)/))
{
...;
}
That is, evaluating either $i or the expression $arrayOld[$i] may result in an undefined value. Note the additional layer of parentheses that are necessary as written above because of the difference in precedence between && and ||, with the former binding more tightly. For the particular patterns in your question, you could sidestep this precedence issue by combining your patterns into one regex, but this can be tricky to do in the general case.
I recommend against using the unpleasing code above. Read on to see an elegant solution to your problem that has Perl do the work for you and is much easier to read.
Looking back
From the slightly broader context of your earlier question, $i is a loop variable and by construction will certainly be defined, so testing $i is overkill. Your code blindly pulls elements from #arrayOld, and Perl happily obliges. In cases where nothing is there, you get the undefined value.
This sort of one-by-one peeking and poking is common in C programs, but in Perl, it is almost always a red flag that you could express your algorithm more elegantly. Consider the complete, working example below.
Working demonstration
#! /usr/bin/env perl
use strict;
use warnings;
use 5.10.0; # given/when
*FILEREAD = *DATA; # for demo only
my #interesting_line = (qr/-(.*)/, qr/\#(.*)/);
$/ = ""; # paragraph mode
while(<FILEREAD>) {
chomp;
my #arrayOld = split /\n/;
my #arrayNewLines;
for (1 .. #arrayOld) {
given (shift #arrayOld) {
push #arrayNewLines, $_ when #interesting_line;
push #arrayOld, $_;
}
}
print "\#arrayOld:\n", map("$_\n", #arrayOld), "\n",
"\#arrayNewLines:\n", map("$_\n", #arrayNewLines);
}
__DATA__
#SCSI_test # put this line into #arrayNewLines
kdkdkdkdkdkdkdkd
dkdkdkdkdkdkdkdkd
- ccccccccccccccc # put this line into #arrayNewLines
Front matter
The line
use 5.10.0;
enables Perl’s given/when switch statement, and this makes for a nice way to decide which array gets a given line of input.
As the comment indicates
*FILEREAD = *DATA; # for demo only
is for the purpose of this Stack Overflow demonstration. In your real code, you have open FILEREAD, .... Placing the input from your question into Perl’s DATA filehandle allows presenting code and input in one self-contained unit, and then we alias FILEREAD to DATA so the rest of the code will drop into yours with no fuss.
The main event
The core of the processing is
for (1 .. #arrayOld) {
given (shift #arrayOld) {
push #arrayNewLines, $_ when #interesting_line;
push #arrayOld, $_;
}
}
Notice that there are no defined checks or even explicit regex matches! There’s no $i or $arrayOld[$i]! What’s going on?
You start with #arrayOld containing all the lines from the current paragraph and want to end with the interesting lines in #arrayNewLines and everything else staying in #arrayOld. The code above takes the next line out of #arrayOld with shift. If the line is interesting, we push it onto the end of #arrayNewLines. Otherwise, we put it back on the end of #arrayOld.
The statement modifier when #interesting_line performs an implicit smart-match with the topic from given. As explained in “Smart matching in detail,” when smart matching against an array, Perl implicitly loops over it and stops on the first match. In this case, the array #interesting_line contains compiled regexes that match lines you want to move to #arrayNewLines. If the current line (in $_ thanks to given) does not match any of those patterns, it goes back in #arrayOld.
We do the preceding process exactly scalar #arrayOld times, that is, once for each line in the current paragraph. This way, we process everything exactly once and do not have to worry about fussy bookkeeping over where the current array index is. Whatever is left in #arrayOld after that many shifts must be the lines we pushed back onto it, which are the uninteresting lines in the order that the occurred in the input.
Sample output
For the input in your question, the output is
#arrayOld:
kdkdkdkdkdkdkdkd
dkdkdkdkdkdkdkdkd
#arrayNewLines:
#SCSI_test # put this line into #arrayNewLines
- ccccccccccccccc # put this line into #arrayNewLines

Usage of defined with Filehandle and while Loop

While reading a book on advanced Perl programming(1), I came across
this code:
while (defined($s = <>)) {
...
Is there any special reason for using defined here? The documentation for
perlop says:
In these loop constructs, the assigned value (whether assignment is
automatic or explicit) is then tested to see whether it is defined. The
defined test avoids problems where line has a string value that would be
treated as false by Perl, for example a "" or a "0" with no trailing
newline. If you really mean for such values to terminate the loop, they
should be tested for explicitly: [...]
So, would there be a corner case or that's simply because the book is too old
and the automatic defined test was added in a recent Perl version?
(1) Advanced Perl Programming, First Edition, Sriram Srinivasan. O'Reilly
(1997)
Perl has a lot of implicit behaviors, many more than most other languages. Perl's motto is There's More Than One To Do It, and because there is so much implicit behavior, there is often More Than One Way To express the exact same thing.
/foo/ instead of $_ =~ m/foo/
$x = shift instead of $x = shift #_
while (defined($_=<ARGV>)) instead of while(<>)
etc.
Which expressions to use are largely a matter of your local coding standards and personal preference. The more explicit expressions remind the reader what is really going on under the hood. This may or may not improve the readability of the code -- that depends on how knowledgeable the audience is and whether you are using well-known idioms.
In this case, the implicit behavior is a little more complicated than it seems. Sometimes perl will implicitly perform a defined(...) test on the result of the readline operator:
$ perl -MO=Deparse -e 'while($s=<>) { print $s }'
while (defined($s = <ARGV>)) {
print $s;
}
-e syntax OK
but sometimes it won't:
$ perl -MO=Deparse -e 'if($s=<>) { print $s }'
if ($s = <ARGV>) {
print $s;
}
-e syntax OK
$ perl -MO=Deparse -e 'while(some_condition() && ($s=<>)) { print $s }'
while (some_condition() and $s = <ARGV>) {
print $s;
}
-e syntax OK
Suppose that you are concerned about the corner cases that this implicit behavior is supposed to handle. Have you committed perlop to memory so that you understand when Perl uses this implicit behavior and when it doesn't? Do you understand the differences in this behavior between Perl v5.14 and Perl v5.6? Will the people reading your code understand?
Again, there's no right or wrong answer about when to use the more explicit expressions, but the case for using an explicit expression is stronger when the implicit behavior is more esoteric.
Say you have the following file
4<LF>
3<LF>
2<LF>
1<LF>
0
(<LF> represents a line feed. Note the lack of newline on the last line.)
Say you use the code
while ($s = <>) {
chomp;
say $s;
}
If Perl didn't do anything magical, the output would be
4
3
2
1
Note the lack of 0, since the string 0 is false. defined is needed in the unlikely case that
You have a non-standard text file (missing trailing newline).
The last line of the file consists of a single ASCII zero (0x30).
BUT WAIT A MINUTE! If you actually ran the above code with the above data, you would see 0 printed! What many don't know is that Perl automagically translates
while ($s = <>) {
to
while (defined($s = <>)) {
as seen here:
$ perl -MO=Deparse -e'while($s=<DATA>) {}'
while (defined($s = <DATA>)) {
();
}
__DATA__
-e syntax OK
So you technically don't even need to specify defined in this very specific circumstance.
That said, I can't blame someone for being explicit instead of relying on Perl automagically modifying their code. After all, Perl is (necessarily) quite specific as to which code sequences it will change. Note the lack of defined in the following even though it's supposedly equivalent code:
$ perl -MO=Deparse -e'while((), $s=<DATA>) {}'
while ((), $s = <DATA>) {
();
}
__DATA__
-e syntax OK
while($line=<DATA>){
chomp($line);
if(***defined*** $line){
print "SEE:$line\n";
}
}
__DATA__
1
0
3
Try the code with defined removed and you will see the different result.

How can I print a matching line, one line immediately above it and one line immediately below?

From a related question asked by Bi, I've learnt how to print a matching line together with the line immediately below it. The code looks really simple:
#!perl
open(FH,'FILE');
while ($line = <FH>) {
if ($line =~ /Pattern/) {
print "$line";
print scalar <FH>;
}
}
I then searched Google for a different code that can print matching lines with the lines immediately above them. The code that would partially suit my purpose is something like this:
#!perl
#array;
open(FH, "FILE");
while ( <FH> ) {
chomp;
$my_line = "$_";
if ("$my_line" =~ /Pattern/) {
foreach( #array ){
print "$_\n";
}
print "$my_line\n"
}
push(#array,$my_line);
if ( "$#array" > "0" ) {
shift(#array);
}
};
Problem is I still can't figure out how to do them together. Seems my brain is shutting down. Does anyone have any ideas?
Thanks for any help.
UPDATE:
I think I'm sort of touched. You guys are so helpful! Perhaps a little Off-topic, but I really feel the impulse to say more.
I needed a Windows program capable of searching the contents of multiple files and of displaying the related information without having to separately open each file. I tried googling and two apps, Agent Ransack and Devas, have proved to be useful, but they display only the lines containing the matched query and I want aslo to peek at the adjacent lines. Then the idea of improvising a program popped into my head. Years ago I was impressed by a Perl script that could generate a Tomeraider format of Wikipedia so that I can handily search Wiki on my Lifedrive and I've also read somewhere on the net that Perl is easy to learn especially for some guy like me who has no experience in any programming language. Then I sort of started teaching myself Perl a couple of days ago. My first step was to learn how to do the same job as "Agent Ransack" does and it proved to be not so difficult using Perl. I first learnt how to search the contents of a single file and display the matching lines through the modification of an example used in the book titled "Perl by Example", but I was stuck there. I became totally clueless as how to deal with multiple files. No similar examples were found in the book or probably because I was too impatient. And then I tried googling again and was led here and I asked my first question "How can I search multiple files for a string pattern in Perl?" here and I must say this forum is bloody AWESOME ;). Then I looked at more example scripts and then I came up with the following code yesterday and it serves my original purpose quite well:
The codes goes like this:
#!perl
$hits=0;
print "INPUT YOUR QUERY:";
chop ($query = <STDIN>);
$dir = 'f:/corpus/';
#files = <$dir/*>;
foreach $file (#files) {
open (txt, "$file");
while($line = <txt>) {
if ($line =~ /$query/i) {
$hits++;
print "$file \n $line";
print scalar <txt>;
}
}
}
close(txt);
print "$hits RESULTS FOUND FOR THIS SEARCH\n";
In the folder "corpus", I have a lot of text files including srt pdf doc files that contain such contents as follows:
Then I dumped the body.
J'ai mis le corps dans une décharge.
I know you have a wire.
Je sais que tu as un micro.
Now I'll tell you the truth.
Alors je vais te dire la vérité.
Basically I just need to search an English phrase and look at the French equivalent, so the script I finished yesterday is quite satisfying except that it would to be better if my script can display the above line in case I want to search a French phrase and check the English. So I'm trying to improve the code. Actually I knew the "print scalar " is buggy, but it is neat and does the job of printing the subsequent line at least most of the time). I was even expecting ANOTHER SINGLE magic line that prints the previous line instead of the subsequent :) Perl seems to be fun. I think I will spend more time trying to get a better understanding of it. And as suggested by daotoad, I'll study the codes generously offered by you guys. Again thanks you guys!
It will probably be easier just to use grep for this as it allows printing of lines before and after a match. Use -B and -A to print context before and after the match respectively. See http://ss64.com/bash/grep.html
Here's a modernized version of Pax's excellent answer:
use strict;
use warnings;
open( my $fh, '<', 'qq.in')
or die "Error opening file - $!\n";
my $this_line = "";
my $do_next = 0;
while(<$fh>) {
my $last_line = $this_line;
$this_line = $_;
if ($this_line =~ /XXX/) {
print $last_line unless $do_next;
print $this_line;
$do_next = 1;
} else {
print $this_line if $do_next;
$last_line = "";
$do_next = 0;
}
}
close ($fh);
See Why is three-argument open calls with lexical filehandles a Perl best practice? for an discussion of the reasons for the most important changes.
Important changes:
3 argument open.
lexical filehandle
added strict and warnings pragmas.
variables declared with lexical scope.
Minor changes (issues of style and personal taste):
removed unneeded parens from post-fix if
converted an if-not contstruct into unless.
If you find this answer useful, be sure to up-vote Pax's original.
Given the following input file:
(1:first) Yes, this one.
(2) This one as well (XXX).
(3) And this one.
Not this one.
Not this one.
Not this one.
(4) Yes, this one.
(5) This one as well (XXX).
(6) AND this one as well (XXX).
(7:last) And this one.
Not this one.
this little snippet:
open(FH, "<qq.in");
$this_line = "";
$do_next = 0;
while(<FH>) {
$last_line = $this_line;
$this_line = $_;
if ($this_line =~ /XXX/) {
print $last_line if (!$do_next);
print $this_line;
$do_next = 1;
} else {
print $this_line if ($do_next);
$last_line = "";
$do_next = 0;
}
}
close (FH);
produces the following, which is what I think you were after:
(1:first) Yes, this one.
(2) This one as well (XXX).
(3) And this one.
(4) Yes, this one.
(5) This one as well (XXX).
(6) AND this one as well (XXX).
(7:last) And this one.
It basically works by remembering the last line read and, when it finds the pattern, it outputs it and the pattern line. Then it continues to output pattern lines plus one more (with the $do_next variable).
There's also a little bit of trickery in there to ensure no line is printed twice.
You always want to store the last line that you saw in case the next line has your pattern and you need to print it. Using an array like you did in the second code snippet is probably overkill.
my $last = "";
while (my $line = <FH>) {
if ($line =~ /Pattern/) {
print $last;
print $line;
print scalar <FH>; # next line
}
$last = $line;
}
grep -A 1 -B 1 "search line"
I am going to ignore the title of your question and focus on some of the code you posted because it is positively harmful to let this code stand without explaining what is wrong with it. You say:
code that can print matching lines with the lines immediately above them. The code that would partially suit my purpose is something like this
I am going to go through that code. First, you should always include
use strict;
use warnings;
in your scripts, especially since you are just learning Perl.
#array;
This is a pointless statement. With strict, you can declare #array using:
my #array;
Prefer the three-argument form of open unless there is a specific benefit in a particular situation to not using it. Use lexical filehandles because bareword filehandles are package global and can be the source of mysterious bugs. Finally, always check if open succeeded before proceeding. So, instead of:
open(FH, "FILE");
write:
my $filename = 'something';
open my $fh, '<', $filename
or die "Cannot open '$filename': $!";
If you use autodie, you can get away with:
open my $fh, '<', 'something';
Moving on:
while ( <FH> ) {
chomp;
$my_line = "$_";
First, read the FAQ (you should have done so before starting to write programs). See What's wrong with always quoting "$vars"?. Second, if you are going to assign the line that you just read to $my_line, you should do it in the while statement so you do not needlessly touch $_. Finally, you can be strict compliant without typing any more characters:
while ( my $line = <$fh> ) {
chomp $line;
Refer to the previous FAQ again.
if ("$my_line" =~ /Pattern/) {
Why interpolate $my_line once more?
foreach( #array ){
print "$_\n";
}
Either use an explicit loop variable or turn this into:
print "$_\n" for #array;
So, you interpolate $my_line again and add the newline that was removed by chomp earlier. There is no reason to do so:
print "$my_line\n"
And now we come to the line that motivated me to dissect the code you posted in the first place:
if ( "$#array" > "0" ) {
$#array is a number. 0 is a number. > is used to check if the number on the LHS is greater than the number on the RHS. Therefore, there is no need to convert both operands to strings.
Further, $#array is the last index of #array and its meaning depends on the value of $[. I cannot figure out what this statement is supposed to be checking.
Now, your original problem statement was
print matching lines with the lines immediately above them
The natural question, of course, is how many lines "immediately above" the match you want to print.
#!/usr/bin/perl
use strict;
use warnings;
use Readonly;
Readonly::Scalar my $KEEP_BEFORE => 4;
my $filename = $ARGV[0];
my $pattern = qr/$ARGV[1]/;
open my $input_fh, '<', $filename
or die "Cannot open '$filename': $!";
my #before;
while ( my $line = <$input_fh> ) {
$line = sprintf '%6d: %s', $., $line;
print #before, $line, "\n" if $line =~ $pattern;
push #before, $line;
shift #before if #before > $KEEP_BEFORE;
}
close $input_fh;
Command line grep is the quickest way to accomplish this, but if your goal is to learn some Perl then you'll need to produce some code.
Rather than providing code, as others have already done, I'll talk a bit about how to write your own. I hope this helps with the brain-lock.
Read my previous answer on how to write a program, it gives some tips about how to start working on your problem.
Go through each of the sample programs you have, as well as those offered here and comment out exactly what they do. Refer to the perldoc for each function and operator you don't understand. Your first example code has an error, if 2 lines in a row match, the line after the second match won't print. By error, I mean that either the code or the spec is wrong, the desired behavior in this case needs to be determined.
Write out what you want your program to do.
Start filling in the blanks with code.
Here's a sketch of a phase one write-up:
# This program reads a file and looks for lines that match a pattern.
# Open the file
# Iterate over the file
# For each line
# Check for a match
# If match print line before, line and next line.
But how do you get the next line and the previous line?
Here's where creative thinking comes in, there are many ways, all you need is one that works.
You could read in lines one at a time, but read ahead by one line.
You could read the whole file into memory and select previous and follow-on lines by indexing an array.
You could read the file and store the offset and length each line--keeping track of which ones match as you go. Then use your offset data to extract the required lines.
You could read in lines one at a time. Cache your previous line as you go. Use readline to read the next line for printing, but use seek and tell to rewind the handle so that the 'next' line can be checked for a match.
Any of these methods, and many more could be fleshed out into a functioning program. Depending on your goals, and constraints any one may be the best choice for that problem domain. Knowing how to select which one to use will come with experience. If you have time, try two or three different ways and see how they work out.
Good luck.
If you don't mind losing the ability to iterate over a filehandle, you could just slurp the file and iterate over the array:
#!/usr/bin/perl
use strict; # always do these
use warnings;
my $range = 1; # change this to print the first and last X lines
open my $fh, '<', 'FILE' or die "Error: $!";
my #file = <$fh>;
close $fh;
for (0 .. $#file) {
if($file[$_] =~ /Pattern/) {
my #lines = grep { $_ > 0 && $_ < $#file } $_ - $range .. $_ + $range;
print #file[#lines];
}
}
This might get horribly slow for large files, but is pretty easy to understand (in my opinion). Only when you know how it works can you set about trying to optimize it. If you have any questions about any of the functions or operations I used, just ask.