search a paragraph in reverse order without array in perl - perl

i have a log file where the errors will be mentioned as "ERROR" in the beginning of the line and next line will have the detailed text about the error. I would like to search for "ERROR" in the reverse order so i can find the last error and print the next line or copy is the line to a variable.
In shell i can try the below command which will help me to achieve the same. Can some one give me a equivalent perl code.
grep -A2 ERROR sapinst.log | tail -2
As the log file will be huge (~5000+ lines), so I don't want to store it in an array.

Your file size is rather small and perl is pretty quick, so I wouldn't worry about reverse order that much. This little program reads lines of input from the files you specify on the command line (or standard input if you specify none), skips lines until it finds ERROR, then prints that and the next line:
#!perl
while( <> ) {
next unless /ERROR/;
print;
print scalar <>;
}
From there you can use tail if you like. This Perl does the same as the grep you posted (although since you already have that solution I wonder why you want a different one).
If you don't want to use tail, keep track of the two lines you'll output and replace them when you find a new set:
my( $error_line, $next_line );
while( <> ) {
next unless /ERROR/;
$error_line = $_;
$next_line = scalar <>;
}
print $error_line, $next_line;
If you have a recent enough perl, you can use the safer double diamond line input operator:
use v5.22;
my( $error_line, $next_line );
while( <<>> ) {
next unless /ERROR/;
$error_line = $_;
$next_line = scalar <<>>;
}
print $error_line, $next_line;
You can use File::ReadBackwards, but you'll have to do the same task by remembering every line then checking if the previous line had ERROR. For you data sizes, the benefit probably isn't apparent. If the simple solution isn't fast enough, it's time to get fancier (but not before then).

Related

Perl, find a match and read next line in perl

I would like to use
myscript.pl targetfolder/*
to read some number from ASCII files.
myscript.pl
#list = <#ARGV>;
# Is the whole file or only 1st line is loaded?
foreach $file ( #list ) {
open (F, $file);
}
# is this correct to judge if there is still file to load?
while ( <F> ) {
match_replace()
}
sub match_replace {
# if I want to read the 5th line in downward, how to do that?
# if I would like to read multi lines in multi array[row],
# how to do that?
if ( /^\sName\s+/ ) {
$name = $1;
}
}
I would recommend a thorough read of perlintro - it will give you a lot of the information you need. Additional comments:
Always use strict and warnings. The first will enforce some good coding practices (like for example declaring variables), the second will inform you about potential mistakes. For example, one warning produced by the code you showed would be readline() on unopened filehandle F, giving you the hint that F is not open at that point (more on that below).
#list = <#ARGV>;: This is a bit tricky, I wouldn't recommend it - you're essentially using glob, and expanding targetfolder/* is something your shell should be doing, and if you're on Windows, I'd recommend Win32::Autoglob instead of doing it manually.
foreach ... { open ... }: You're not doing anything with the files once you've opened them - the loop to read from the files needs to be inside the foreach.
"Is the whole file or only 1st line is loaded?" open doesn't read anything from the file, it just opens it and provides a filehandle (which you've named F) that you then need to read from.
I'd strongly recommend you use the more modern three-argument form of open and check it for errors, as well as use lexical filehandles since their scope is not global, as in open my $fh, '<', $file or die "$file: $!";.
"is this correct to judge if there is still file to load?" Yes, while (<$filehandle>) is a good way to read a file line-by-line, and the loop will end when everything has been read from the file. You may want to use the more explicit form while (my $line = <$filehandle>), so that your variable has a name, instead of the default $_ variable - it does make the code a bit more verbose, but if you're just starting out that may be a good thing.
match_replace(): You're not passing any parameters to the sub. Even though this code might still "work", it's passing the current line to the sub through the global $_ variable, which is not a good practice because it will be confusing and error-prone once the script starts getting longer.
if (/^\sName\s+/){$name = $1;}: Since you've named the sub match_replace, I'm guessing you want to do a search-and-replace operation. In Perl, that's called s/search/replacement/, and you can read about it in perlrequick and perlretut. As for the code you've shown, you're using $1, but you don't have any "capture groups" ((...)) in your regular expression - you can read about that in those two links as well.
"if I want to read the 5th line in downward , how to do that ?" As always in Perl, There Is More Than One Way To Do It (TIMTOWTDI). One way is with the range operator .. - you can skip the first through fourth lines by saying next if 1..4; at the beginning of the while loop, this will test those line numbers against the special $. variable that keeps track of the most recently read line number.
"and if I would like to read multi lines in multi array[row], how to do that ?" One way is to use push to add the current line to the end of an array. Since keeping the lines of a file in an array can use up more memory, especially with large files, I'd strongly recommend making sure you think through the algorithm you want to use here. You haven't explained why you would want to keep things in an array, so I can't be more specific here.
So, having said all that, here's how I might have written that code. I've added some debugging code using Data::Dumper - it's always helpful to see the data that your script is working with.
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dumper; # for debugging
$Data::Dumper::Useqq=1;
for my $file (#ARGV) {
print Dumper($file); # debug
open my $fh, '<', $file or die "$file: $!";
while (my $line = <$fh>) {
next if 1..4;
chomp($line); # remove line ending
match_replace($line);
}
close $fh;
}
sub match_replace {
my ($line) = #_; # get argument(s) to sub
my $name;
if ( $line =~ /^\sName\s+(.*)$/ ) {
$name = $1;
}
print Data::Dumper->Dump([$line,$name],['line','name']); # debug
# ... do more here ...
}
The above code is explicitly looping over #ARGV and opening each file, and I did say above that more verbose code can be helpful in understanding what's going on. I just wanted to point out a nice feature of Perl, the "magic" <> operator (discussed in perlop under "I/O Operators"), which will automatically open the files in #ARGV and read lines from them. (There's just one small thing, if I want to use the $. variable and have it count the lines per file, I need to use the continue block I've shown below, this is explained in eof.) This would be a more "idiomatic" way of writing that first loop:
while (<>) { # reads line into $_
next if 1..4;
chomp; # automatically uses $_ variable
match_replace($_);
} continue { close ARGV if eof } # needed for $. (and range operator)

Reading a file line by line in Perl

I want to read a file by one line, but it's reading just the first line. How to read all lines?
My code:
open(file_E, $file_E);
while ( <file_E> ) {
/([^\n]*)/;
print $line1;
}
close($file_E);
Let's start by looking at your code.
open(file_E, $file_E);
while ( <file_E> ) {
/([^\n]*)/;
print $line1;
}
close($file_E);
On the first line you open a file named in $file_E using the bareword filehandle file_E. This should work so long as the file successfully opens. It would be better to also check the success of this operation one of two ways: Either put use autodie; at the top of your script (but then risk applying its semantics in places where your code is incompatible with this level of error handling), or change your open to look like this:
open(file_E, $file_E) or die "Failed to open $file_E: $!\n";
Now if you fail to open the file you will get an error message that will help track down the problem.
Next lets look at the while loop, because it's here where you have an issue that is causing the bug you are experiencing. On the first line of the while loop you have this:
while ( <file_E> ) {
By consulting perldoc perlsyn you will see that line is special-cased to actually do this:
while (defined($_ = <file_E>)) {
So your code is implicitly assigning each line to $_ on successive iterations. Also by consulting perldoc perlop you'll find that when the match operator (/.../ or m/.../) is invoked without binding the match explicitly using =~, the match will bind against $_. Still then, so far so good. However, you are not actually doing anything useful with the match. The match operator will return Boolean truth / falsehood for whether or not the match succeeded. And because your pattern contains capturing parenthesis, it will capture something into the capture variable $1. But you are never testing for match success, nor are you ever referring to $1 again.
On the line that follows, you do this: print $line1. Where, in your code, is $line1 being assigned a value? Because it is never being assigned a value in what you've shown us.
I can only guess that your intent is to iterate over the lines of the file, capture the line but without the trailing newline, and then print it. It seems that you wish to print it without any newlines, so that all of the input file is printed as a single line of output.
open my $input_fh_e, '<', $file_E or die "Failed to open $file_E: $!\n";
while(my $line = <$input_fh_e>) {
chomp $line;
print $line;
}
close $input_fh_e or die "Failed to close $file_E: $!\n";
No need to capture anything -- if all that the capture is doing is just grabbing everything up to the newline, you can simply strip off the newline with chomp to begin with.
In my example I used a lexical filehandle (a file handle that is lexically scoped, declared with my). This is generally a better practice in modern Perl as it avoids using a bareword, avoids possible namespace collisions, and assures that the handle will get closed as soon as the lexical scope closes.
I also used the 'three arg' version of open, which is safer because it eliminates the potential for $file_E to be used to open a pipe or do some other nefarious or simply unintended shell manipulation.
I suggest also starting your script with use strict;, because had you done so, you would have gotten an error message at compiletime telling you that $line1 was never declared. Also start your script with use warnings, so that you would get a warning when you try to print $line1 before assigning a value to it.
Most of the issues in your code will be discussed in perldoc perlintro, which you can arrive at from your command line simply by typing perldoc perlintro, assuming you have Perl installed. It typically takes 20-40 minutes to read through perlintro. If ever there were a document that should constitute required reading before getting started writing Perl code, that reading would probably include perlintro.
Another alternative, note that $_ will include newline so you will need to chomp it if you don't want the newline in $line:
open(file_E, $file_E);
while ( <file_E> ) {
my $line = $_;
print $line;
}
close($file_E);

How to remove a specific word from a file in perl

A file contains:
rhost=localhost
ruserid=abcdefg_xxx
ldir=
lfile=
rdir=p01
rfile=
pgp=none
mainframe=no
ftpmode=binary
ftpcmd1=
ftpcmd2=
ftpcmd3=
ftpcmd1a=
ftpcmd2a=
notifycc=no
firstfwd=Yes
NOTIFYNYL=
decompress=no
compress=no
I want to write a simple code that removes the "_xxx" in that second line. Keep in mind that there will never be a file that contains the string "_xxx" so that should make it extremely easier. I'm just not too familiar with the syntax. Thanks!
The short answer:
Here's how you can remove just the literal '_xxx'.
perl -pli.bak -e 's/_xxx$//' filename
The detailed explanation:
Since Perl has a reputation for code that is indistinguishable from line noise, here's an explanation of the steps.
-p creates an implicit loop that looks something like this:
while( <> ) {
# Your code goes here.
}
continue {
print or die;
}
-l sort of acts like "auto-chomp", but also places the line ending back on the line before printing it again. It's more complicated than that, but in its simplest use, it changes your implicit loop to look like this:
while( <> ) {
chomp;
# Your code goes here.
}
continue {
print $_, $/;
}
-i tells Perl to "edit in place." Behind the scenes it creates a separate output file and at the end it moves that temporary file to replace the original.
.bak tells Perl that it should create a backup named 'originalfile.bak' so that if you make a mistake it can be reversed easily enough.
Inside the substitution:
s/
_xxx$ # Match (but don't capture) the final '_xxx' in the string.
/$1/x; # Replace the entire match with nothing.
The reference material:
For future reference, information on the command line switches used in Perl "one-liners" can be obtained in Perl's documentation at perlrun. A quick introduction to Perl's regular expressions can be found at perlrequick. And a quick overview of Perl's syntax is found at perlintro.
This overwrites the original file, getting rid of _xxx in the 2nd line:
use warnings;
use strict;
use Tie::File;
my $filename = shift;
tie my #lines, 'Tie::File', $filename or die $!;
$lines[1] =~ s/_xxx//;
untie #lines;
Maybe this can help
perl -ple 's/_.*// if /^ruserid/' < file
will remove anything after the 1st '_' (included) in the line what start with "ruserid".
One way using perl. In second line ($. == 2), delete from last _ until end of line:
perl -lpe 's/_[^_]*\Z// if $. == 2' infile

How do I print on a single line all content between certain start- and stop-lines?

while(<FILE>)
{
chomp $_;
$line[$i]=$_;
++$i;
}
for($j=0;$j<$i;++$j)
{
if($line[$j]=~/Syn_Name/)
{
do
{
print OUT $line[$j],"\n";
++$j;
}
until($line[$j]=~/^\s*$/)
}
}
This is my code I am trying to print data between Syn_Name and a blank line.
My code extracts the chunk that I need.
But the data between the chunk is printed line by line. I want the data for each chunk to get printed on a single line.
Simplification of your code. Using the flip-flop operator to control the print. Note that printing the final line will not add a newline (unless the line contained more than one newline). At best, it prints the empty string. At worst, it prints whitespace.
You do not need a transition array for the lines, you can use a while loop. In case you want to store the lines anyway, I added a commented line with how that is best done.
#chomp(my #line = <FILE>);
while (<FILE>) {
chomp;
if(/Syn_Name/ .. /^\s*$/) {
print OUT;
print "\n" if /^\s*$/;
}
}
Contents
Idiomatic Perl
Make errors easier to fix
Warnings about common programming errors
Don't execute unless variable names are consistent
Developing this habit will save you lots of time
Perl's range operator
Working demos
Print chomped lines immediately
Join lines with spaces
One more edge case
Idiomatic Perl
You seem to have a background with the C family of languages. This is fine because it gets the job done, but you can let Perl handle the machinery for you, namely
chomp defaults to $_ (also true with many other Perl operators)
push adds an element to the end of an array
to simplify your first loop:
while (<FILE>)
{
chomp;
push #line, $_;
}
Now you don't have update $i to keep track of how many lines you've already added to the array.
On the second loop, instead of using a C-style for loop, use a foreach loop:
The foreach loop iterates over a normal list value and sets the variable VAR to be each element of the list in turn …
The foreach keyword is actually a synonym for the for keyword, so you can use foreach for readability or for for brevity. (Or because the Bourne shell is more familiar to you than csh, so writing for comes more naturally.) If VAR is omitted, $_ is set to each value.
This way, Perl handles the bookkeeping for you.
for (#line)
{
# $_ is the current element of #line
...
}
Make errors easier to fix
Sometimes Perl can be too accommodating. Say in the second loop you made an easy typographical error:
for (#lines)
Running your program now produces no output at all, even if the input contains Syn_Name chunks.
A human can look at the code and see that you probably intended to process the array you just created and pluralized the name of the array by mistake. Perl, being eager to help, creates a new empty #lines array, which leaves your foreach loop with nothing to do.
You may delete the spurious s at the end of the array's name but still have a program produces no output! For example, you may have an unhandled combination of inputs that doesn't open the OUT filehandle.
Perl has a couple of easy ways to spare you these (and more!) kinds of frustration from dealing with silent failures.
Warnings about common programming errors
You can turn on an enormous list of warnings that help diagnose common programming problems. With my imagined buggy version of your code, Perl could have told you
Name "main::lines" used only once: possible typo at ./synname line 16.
and after fixing the typo in the array name
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
Right away, you see valuable information that may be difficult or at least tedious to spot unaided:
variable names are inconsistent, and
the program is trying to produce output but needs a little more plumbing.
Don't execute unless variable names are consistent
Notice that even with the potential problems above, Perl tried to execute anyway. With some classes of problems such as the variable-naming inconsistency, you may prefer that Perl not execute your program but stop and make you fix it first. You can tell Perl to be strict about variables:
This generates a compile-time error if you access a variable that wasn't declared via our or use vars, localized via my, or wasn't fully qualified.
The tradeoff is you have to be explicit about which variables you intend to be part of your program instead of allowing them to conveniently spring to life upon first use. Before the first loop, you would declare
my #line;
to express your intent. Then with the bug of a mistakenly pluralized array name, Perl fails with
Global symbol "#lines" requires explicit package name at ./synname line 16.
Execution of ./synname aborted due to compilation errors.
and you know exactly which line contains the error.
Developing this habit will save you lots of time
I begin almost every non-trivial Perl program I write with
#! /usr/bin/env perl
use strict;
use warnings;
The first is the shebang line, an ordinary comment as far as Perl is concerned. The use lines enable the strict pragma and the warnings pragma.
Not wanting to be a strict-zombie, as Mark Dominus chided, I'll point out that use strict; as above with no option makes Perl strict in dealing with three error-prone areas:
strict vars, as described above;
strict refs, disallows use of symbolic references; and
strict subs, requires the programmer to be more careful in referring to subroutines.
This is a highly useful default. See the strict pragma's documentation for more details.
Perl's range operator
The perlop documentation describes .., Perl's range operator, that can help you greatly simplify the logic in your second loop:
In scalar context, .. returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each .. operator maintains its own boolean state, even across calls to a subroutine that contains it. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again. It doesn't become false till the next time the range operator is evaluated.
In your question, you wrote that you want “data between Syn_Name and a blank line,” which in Perl is spelled
/Syn_Name/ .. /^\s*$/
In your case, you also want to do something special at the end of the range, and .. provides for that case too, ibid.
The final sequence number in a range has the string "E0" appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint.
Assigning the value returned from .. (which I usually do to a scalar named $inside or $is_inside) allows you to check whether you're at the end, e.g.,
my $is_inside = /Syn_Name/ .. /^\s*$/;
if ($is_inside =~ /E0$/) {
...
}
Writing it this way also avoids duplicating the code for your terminating condition (the right-hand operand of ..). This way if you need to change the logic, you change it in only one place. When you have to remember, you'll forget sometimes and create bugs.
Working demos
See below for code you can copy-and-paste to get working programs. For demo purposes, they read input from the built-in DATA filehandle and write output to STDOUT. Writing it this way means you can transfer my code into yours with little or no modification.
Print chomped lines immediately
As defined in your question, there's no need for one loop to collect the lines in a temporary array and then another loop to process the array. Consider the following code
#! /usr/bin/env perl
use strict;
use warnings;
# for demo only
*FILE = *DATA;
*OUT = *STDOUT;
while (<FILE>)
{
chomp;
if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
my $is_last = $is_inside =~ /E0$/;
print OUT $_, $is_last ? "\n" : ();
}
}
__DATA__
ERROR IF PRESENT IN OUTPUT!
Syn_Name
foo
bar
baz
ERROR IF PRESENT IN OUTPUT!
whose output is
Syn_Namefoobarbaz
We always print the current line, stored in $_. When we're at the end of the range, that is, when $is_last is true, we also print a newline. When $is_last is false, the empty list in the other branch of the ternary operator is the result—meaning we print $_ only, no newline.
Join lines with spaces
You didn't show us an example input, so I wonder whether you really want to butt the lines together rather than joining them with spaces. If you want the latter behavior, then the program becomes
#! /usr/bin/env perl
use strict;
use warnings;
# for demo only
*FILE = *DATA;
*OUT = *STDOUT;
my #lines;
while (<FILE>)
{
chomp;
if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
push #lines, $_;
if ($is_inside =~ /E0$/) {
print OUT join(" ", #lines), "\n";
#lines = ();
}
}
}
__DATA__
ERROR IF PRESENT IN OUTPUT!
Syn_Name
foo
bar
baz
ERROR IF PRESENT IN OUTPUT!
This code accumulates in #lines only those lines within a Syn_Name chunk, prints the chunk, and clears out #lines when we see the terminator. The output is now
Syn_Name foo bar baz
One more edge case
Finally, what happens if we see Syn_Name at the end of the file but without a terminating blank line? That may be impossible with your data, but in case you need to handle it, you'll want to use Perl's eof operator.
eof FILEHANDLE
eof
Returns 1 if the next read on FILEHANDLE will return end of file or if FILEHANDLE is not open … An eof without an argument uses the last file read.
So we terminate on either a blank line or end of file.
#! /usr/bin/env perl
use strict;
use warnings;
# for demo only
*FILE = *DATA;
*OUT = *STDOUT;
my #lines;
while (<FILE>)
{
s/\s+$//;
#if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
if (my $is_inside = /Syn_Name/ .. /^\s*$/ || eof) {
push #lines, $_;
if ($is_inside =~ /E0$/) {
print OUT join(" ", #lines), "\n";
#lines = ();
}
}
}
__DATA__
ERROR IF PRESENT IN OUTPUT!
Syn_Name
foo
bar
YOU CANT SEE ME!
Syn_Name
quux
potrzebie
Output:
Syn_Name foo bar
Syn_Name quux potrzebie
Here instead of chomp, the code removes any trailing invisible whitespace at the ends of lines. This will make sure spacing between joined lines is uniform even if the input is a little sloppy.
Without the eof check, the program does not print the latter line, which you can see by commenting out the active conditional and uncommenting the other.
Another simplified version:
foreach (grep {chomp; /Syn_Name/ .. /^\s*$/ } <FILE>) {
print OUT;
print OUT "\n" if /^\s*$/;
}

get last few lines of file stored in variable

How could I get the last few lines of a file that is stored in a variable? On linux I would use the tail command if it was in a file.
1) How can I do this in perl if the data is in a file?
2) How can I do this if the content of the file is in a variable?
To read the end of a file, seek near the end of the file and begin reading. For example,
open my $fh, '<', $file;
seek $fh, -1000, 2;
my #lines = <$fh>;
close $fh;
print "Last 5 lines of $file are: ", #lines[-5 .. -1];
Depending on what is in the file or how many lines you want to look at, you may want to use a different magic number than -1000 above.
You could do something similar with a variable, either
open my $fh, '<', \$the_variable;
seek $fh, -1000, 2;
or just
open my $fh, '<', \substr($the_variable, -1000);
will give you an I/O handle that produces the last 1000 characters in $the_variable.
The File::ReadBackwards module on the CPAN is probably what you want. You can use it thus. This will print the last three lines in the file:
use File::ReadBackwards
my $bw = File::ReadBackwards->new("some_file");
print reverse map { $bw->readline() } (1 .. 3);
Internally, it seek()s to near the end of the file and looks for line endings, so it should be fairly efficient with memory, even with very big files.
To some extent, that depends how big the file is, and how many lines you want. If it is going to be very big you need to be careful, because reading it all into memory will take a lot longer than just reading the last part of the file.
If it is small. the easiest way is probably to File::Slurp it into memory, split by record delimiters, and keep the last n records. In effect, something like:
# first line if not yet in a string
my $string = File::Slurp::read_file($filename);
my #lines = split(/\n/, $string);
print join("\n", #lines[-10..-1])
If it is large, too large to find into memory, you might be better to use file system operations directly. When I did this, I opened the file and used seek() and read the last 4k or so of the file, and repeated backwards until I had enough data to get the number of records I needed.
Not a detailed answer, but the question could be a touch more specific.
I know this is an old question, but I found it while looking for a way to search for a pattern in the first and last k lines of a file.
For the tail part, in addition to seek (if the file is seekable), it saves some memory to use a rotating buffer, as follows (returns the last k lines, or less if fewer than $k are available):
my $i = 0; my #a;
while (<$fh>) {
$a[$i++ % $k] = $_;
}
my #tail = splice #a,0,$i % $k;
splice #a,#a,0,#tail;
return #a;
A lot has already been stated on the file side, but if it's already in a string, you can use the following regex:
my ($lines) = $str ~= /
(
(?:
(?:(?<=^)|(?<=\n)) # match beginning of line (separated due to variable lookbehind limitation)
[^\n]*+ # match the line
(?:\n|$) # match the end of the line
){0,5}+ # match at least 0 and at most 5 lines
)$ # match must be from end of the string
/sx # s = treat string as single line
# x = allow whitespace and comments
This runs extremely fast. Benchmarking shows between 40-90% faster compared to the split/join method (variable due to current load on machine). This is presumably due to less memory manipulations. Something you might want to consider if speed is essential. Otherwise, it's just interesting.