How can I extract certain lines with Perl?

I have a string like this:
Modified files: ['A', 'B']
File: /tpl/src/vlan/VlanInterfaceValidator.cpp
Newly generated warnings:
A has warnings
B has warning
Status: PASS
I want the value of "Newly generated warnings:", which should be:
A has warnings
B has warning
I am new to Perl and don't know how to use regexes in Perl. Kindly help.

Here are two options:
split the string into lines, and filter the lines array using grep
use a regex on the multi-line string
my $str = "
Modified files: ['A', 'B']
File: /tpl/src/vlan/VlanInterfaceValidator.cpp
Newly generated warnings:
A has warnings
B has warning
Status: PASS";
my @lines = grep { /\w+ has warning/ } split(/\n/, $str);
print "Option 1 using split and grep:\n";
print join("\n", @lines);
$str =~ s/^.*Newly generated warnings:\s*(.*?)\s+Status:.*$/$1/sm;
print "\n\nOption 2 using regex:\n";
print $str;
Output:
Option 1 using split and grep:
A has warnings
B has warning
Option 2 using regex:
A has warnings
B has warning
Explanation for option 1:
split(/\n/, $str) - split the string into an array of strings
grep{ /\w+ has warning/ } - filter using a grep regex to lines of interest
Note: This is short for the standard regex test $_ =~ /\w+ has warning/. $_ holds the current element, i.e. the line.
Explanation for option 2:
$str =~ s/search/replace/ - standard search and replace on a string
Note: Unlike in many other languages, strings are mutable in Perl
s/^.*Newly generated warnings:\s*(.*?)\s+Status:.*$/$1/sm:
search:
^.* - from beginning of string grab everything until:
Newly generated warnings:
\s* - scan over optional whitespace
(.*?) - capture group 1 with non-greedy scan
\s+Status:.*$ - scan over whitespace, Status:, and everything else to end of string
replace:
$1 - use capture group 1
flags:
s - dot matches newlines
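m - multi-line mode: ^ and $ can also match at the start and end of each line inside the string (not actually needed here, because ^.* and .*$ together with /s already cover the whole string)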
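Note that option 2 overwrites $str. If you want to keep the original string, a plain match with the same capture group is enough. A minimal sketch, reusing the sample text from the question as a here-doc ($warnings is just an illustrative name):

```perl
use strict;
use warnings;

my $str = <<'END';
Modified files: ['A', 'B']
File: /tpl/src/vlan/VlanInterfaceValidator.cpp
Newly generated warnings:
A has warnings
B has warning
Status: PASS
END

# Match (not substitute): capture group 1 holds the warning block,
# and $str itself is left unchanged.
my ($warnings) = $str =~ /Newly generated warnings:\s*(.*?)\s+Status:/s;

if (defined $warnings) {
    print "$warnings\n";
}
else {
    print "No 'Newly generated warnings:' section found\n";
}
```

In list context the match returns the captured groups, so ($warnings) receives group 1, or stays undef if the section is missing.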

This is the sort of problem where you read up to the line that starts the section you want, doing nothing with those lines, and then keep reading and processing lines until the section ends:
# ignore all these lines
while( <DATA> ) {
last if /Newly generated warnings/;
}
# process all these lines
while( <DATA> ) {
last if /\A\s*\z/ or /^Status:/; # stop at the first blank line or at the Status: line
print; # do whatever you need
}
__END__
Modified files: ['A', 'B']
File: /tpl/src/vlan/VlanInterfaceValidator.cpp
Newly generated warnings:
A has warnings
B has warning
Status: PASS
That reads from a filehandle. Handling a string is just as easy, because you can open a filehandle on a string and then treat the string line by line:
my $string = <<'HERE';
Modified files: ['A', 'B']
File: /tpl/src/vlan/VlanInterfaceValidator.cpp
Newly generated warnings:
A has warnings
B has warning
Status: PASS
HERE
open my $fh, '<', \ $string or die "Cannot open string: $!";
while( <$fh> ) {
last if /Newly generated warnings/;
}
while( <$fh> ) {
last if /\A\s*\z/ or /^Status:/; # stop at a blank line or the Status: line
print; # do whatever you need
}
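If you need this in more than one place, the two loops can be wrapped in a small sub. This is only a sketch: lines_between is a hypothetical name, and the stop pattern used here (a blank line or a Status: line) is an assumption based on the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines after the first line matching
# $start, stopping before the first line matching $stop (or at EOF).
sub lines_between {
    my ($text, $start, $stop) = @_;
    open my $fh, '<', \$text or die "Cannot open string: $!";

    # Skip everything up to and including the start marker.
    while (my $line = <$fh>) {
        last if $line =~ $start;
    }

    # Keep lines until the stop marker.
    my @kept;
    while (my $line = <$fh>) {
        last if $line =~ $stop;
        chomp $line;
        push @kept, $line;
    }
    close $fh;
    return @kept;
}

my $string = <<'HERE';
Modified files: ['A', 'B']
File: /tpl/src/vlan/VlanInterfaceValidator.cpp
Newly generated warnings:
A has warnings
B has warning
Status: PASS
HERE

my @warnings = lines_between($string, qr/Newly generated warnings/, qr/\A\s*\z|^Status:/);
print "$_\n" for @warnings;
```

If the start marker never appears, the first loop consumes the whole string and the sub returns an empty list.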

Related

print lines after finding a key word in perl

I have a variable $string and I want to print all the lines after I find a keyword in a line (including the line with the keyword).
$string=~ /apple /;
I'm using this regexp to find the keyword, but I do not know how to print the lines after it.

It's not really clear where your data is coming from. Let's assume it's a string containing newlines. Let's start by splitting it into an array.
use feature 'say'; # say is not enabled by default
my @string = split /\n/, $string;
We can then use the flip-flop operator to decide which lines to print. I'm using \0 as a regex that is very unlikely to match any string (so, effectively, it's always false).
for (@string) {
say if /apple / .. /\0/;
}

Just keep a flag variable, set it to true when you see the string, print if the flag is true.
perl -ne 'print if $seen ||= /apple/'
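The same flag idea also works for data that is already in a scalar, and it keeps the keyword line itself. A minimal sketch with made-up sample lines:

```perl
use strict;
use warnings;

my $string = "pear\nbanana\napple pie\ncherry\nplum\n";   # made-up sample data

my $seen = 0;
my @after;
for my $line (split /\n/, $string) {
    $seen ||= $line =~ /apple /;   # once set, the flag stays true
    push @after, $line if $seen;
}
print "$_\n" for @after;
```

Because =~ binds tighter than ||=, the match result is what gets or-assigned to $seen.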

If your data is in a scalar variable, we can use several methods.
Recommended method:
($matching) = $string=~ /([^\n]*apple.+)/s;
print "$matching\n";
There is also another way to do it:
$string=~ /[^\n]*apple.+/s;
print $&; # prints the text that matched
If you are reading the data from a file, try the following:
while (<$fh>)
{
if(/apple/)
{
print $_, <$fh>; # the matching line, then all remaining lines
}
}
Or else try the following one-liner:
perl -ne 'if (/apple/) { print $_, <>; exit }' file.txt

How to remove the word using array index in perl?

How can I remove certain words by array index from the following input using Perl?
file.txt:
BOCK:top:blk1
BOCK:block2:blk2
BOCK:test:blk3
After join:
/BOCK/top/blk1
/BOCK/block2/blk2
/BOCK/test/blk3
Expected output:
/BOCK/blk1
/BOCK/blk2
/BOCK/blk3
Code I have tried:
use warnings;
use strict;
my @words;
open(my $infile,'<','file.txt') or die $!;
while(<$infile>)
{
push(@words,split /\:/);
}
my $word=join("/",@words);
print $word;
close ($infile);
foreach my $word(@words)
{
if($word=~ /(\w+\/\w+\/\w+)/)
{
print $word;
}
}

The easiest way to get rid of the middle element is to use splice.
while ( my $line = <DATA> ) {
my @words;
push( @words, split( /:/, $line ) ); # colon has no special meaning
splice( @words, 1, 1 );
print '/', join( '/', @words );
}
__DATA__
BOCK:top:blk1
BOCK:block2:blk2
BOCK:test:blk3
I assumed that you want to do that for every line. The code that you had did something else. Because your @words is declared outside of the while loop, it gets bigger with every iteration, and every third element contains a newline \n character because you never chomp. Then you create one long $word that has all the words from all lines joined with a slash /. Afterwards you try to match that for three words joined with slashes, which works. But you only have one capture group, so your $3 is never defined.

The code can be simplified and cleaned up, even to the point of:
my @paths = map { chomp; '/' . join '/', (split ':')[0,-1] } <$infile>;
print "$_\n" for @paths;
The map imposes list context on the filehandle read, which therefore returns a list of all lines from the file. The code in map's block is applied to each element: it removes the trailing newline with chomp, splits the line, takes the first and last elements of that list, joins them, and prepends the leading /. Inside the block the line is in the variable $_, which chomp and split use by default. The resulting list is returned and assigned to @paths.
A number of errors in the posted code have been explained clearly in simbabque's answer.
Thanks to jm666 in a comment for catching the requirement for the leading /.
The same approach also works as a one-liner:
perl -F: -lane'print "/" . join "/", @F[0,-1]' < file.txt > out.txt
The -a switch turns on autosplit mode (together with -n or -p), whereby each line is split and the fields are available in @F. The -F switch lets you specify the pattern to split on, here :, instead of the default whitespace. The -l switch chomps each input line and adds a newline to every print.
See switches in perlrun.
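For readers unfamiliar with these switches, the loop below is a rough long-hand sketch of what -F: -lane does to each line. To keep it self-contained, it runs over the three sample lines from the question instead of reading <> as -n would:

```perl
use strict;
use warnings;

# Sample input standing in for file.txt; with -n you would loop over <> instead.
my @input = ("BOCK:top:blk1\n", "BOCK:block2:blk2\n", "BOCK:test:blk3\n");

my @out;
for my $line (@input) {
    chomp $line;                              # -l: chomp each input line
    my @F = split /:/, $line;                 # -a with -F: : autosplit on ':'
    push @out, '/' . join('/', @F[0, -1]);    # keep first and last field
}
print "$_\n" for @out;                        # -l: each print ends with "\n"
```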

Perl: Find a match, remove the same lines, and to get the last field

Being a Perl newbie, please pardon me for asking this basic question.
I have a text file on server1 that contains a bunch of sentences (whitespace is the field separator) on many lines.
I needed to match lines with my keyword, remove the same lines, and extract only the last field, so I have tried with:
my @allmatchedlines;
open(output1, "ssh user1@server1 cat /tmp/myfile.txt |");
while(<output1>) {
chomp;
@allmatchedlines = $_ if /mysearch/;
}
close(output1);
my @uniqmatchedline = split(/ /, @allmatchedlines);
my $lastfield = $uniqmatchedline[-1]\n";
print "$lastfield\n";
and it gives me the output showing:
1
I don't know why it's giving me just "1".
Could someone please explain why I'm getting "1" and how I can get the last field of the matched line correctly?
Thank you!

my @uniqmatchedline = split(/ /, @allmatchedlines);
You're getting "1" because split takes a scalar, not an array. An array in scalar context returns the number of elements.
You need to split on each individual line. Something like this:
my @uniqmatchedline = map { split(/ /, $_) } @allmatchedlines;
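A small sketch that reproduces the "1" with a made-up single matched line, next to the per-line fix:

```perl
use strict;
use warnings;

my @allmatchedlines = ('alpha beta gamma');   # one matched line (made-up sample)

# split's second argument is evaluated in scalar context, so the array
# becomes its element count (1) before anything is split.
my @wrong = split / /, @allmatchedlines;
print "wrong: @wrong\n";    # the single field "1"

# Split each line separately instead.
my @right = map { split / /, $_ } @allmatchedlines;
print "right: @right\n";
```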

There are two issues with your code:
split is expecting a scalar value (string) to split on; if you are passing an array, it will convert the array to scalar (which is just the array length)
You did not have a way to remove same lines
To address these, the following code should work (untested, since no sample data was given):
my @allmatchedlines;
open(output1, "ssh user1\@server1 cat /tmp/myfile.txt |") or die "Cannot run ssh: $!";
while(<output1>) {
chomp;
push @allmatchedlines, $_ if /mysearch/; # push, so earlier matches are kept
}
close(output1);
my %existing;
my @uniqmatchedline = grep !$existing{$_}++, @allmatchedlines; # this will return the unique lines
my @lastfields = map { ((split / /, $_)[-1]) . "\n" } @uniqmatchedline; # this maps the last field in each line into an array
print for @lastfields;

Apart from two errors in the code, I find the statement "remove the same lines and extract only the last field" unclear. Once duplicate matching lines are removed, there may still be multiple distinct sentences with the pattern.
Until a clarification comes, here is code that picks the last field from the last such sentence.
use warnings 'all';
use strict;
use List::MoreUtils qw(uniq);
my $file = '/tmp/myfile.txt';
my $cmd = "ssh user1\@server1 cat $file";
open my $fh, '-|', $cmd or die "Error opening $cmd: $!";
my @allmatchedlines;
while (<$fh>) {
chomp;
push @allmatchedlines, $_ if /mysearch/;
}
close $fh;
my @unique_matched_lines = uniq @allmatchedlines;
my $lastfield = ( split ' ', $unique_matched_lines[-1] )[-1];
print $lastfield, "\n";
I changed to the three-argument open, with error checking. Recall that open for a process involves a fork and returns the pid, so an "error" there has nothing to do with what happened with the command itself; see open. Also note that @ inside "..." indicates an array and thus needs to be escaped.
The (default) pattern ' ' used in split splits on any amount of whitespace. The regex / / turns off this behavior and splits on a single space. You most likely want to use ' '.
For more comments please see the original post below.
The statement @allmatchedlines = $_ if /mysearch/; assigns to the array on every iteration, overwriting whatever has been in it. So you end up with only the last line that matched mysearch. You want push @allmatchedlines, $_ ... to get all those lines.
Also, as shown in the answer by Justin Schell, split needs a scalar, so it is taking the length of @allmatchedlines, which is 1 as explained above. You should have
my @words_in_matched_lines = map { split } @allmatchedlines;
When all this is straightened out, you'll have words in the array @uniqmatchedline, and if that is the intention then its name is misleading.
To get unique elements of the array you can use the module List::MoreUtils:
use List::MoreUtils qw(uniq);
my @unique_elems = uniq @whole_array;
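Putting the fixes from these answers together, here is a sketch of the whole pipeline: push instead of assignment, order-preserving de-duplication, and the last whitespace-separated field of each line. The input lines are made up and stand in for the ssh output:

```perl
use strict;
use warnings;

# Made-up sample lines standing in for the output of the ssh command.
my @input = (
    "first line mysearch alpha",
    "unrelated line",
    "first line mysearch alpha",      # duplicate of the first match
    "second line mysearch beta",
);

my @allmatchedlines;
for (@input) {
    push @allmatchedlines, $_ if /mysearch/;   # push, do not overwrite
}

# Remove duplicate lines while keeping their original order.
my %seen;
my @unique = grep { !$seen{$_}++ } @allmatchedlines;

# Last whitespace-separated field of each unique line.
my @lastfields = map { (split ' ', $_)[-1] } @unique;
print "$_\n" for @lastfields;
```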

Regular expression to print a string from a command output

I have written a function that uses regex and prints the required string from a command output.
The script works as expected, but it does not support dynamic output. Currently, I use a regex for "icmp" and "OK" and print the values. Now, the type, destination and return code could change, and there is a high chance that the command doesn't return any output at all. How do I handle such scenarios?
sub check_summary{
my ($self) = @_;
my $type = 0;
my $return_type = 0;
my $ipsla = $self->{'ssh_obj'}->exec('show ip sla');
foreach my $line( $ipsla) {
if ( $line =~ m/(icmp)/ ) {
$type = $1;
}
if ( $line =~ m/(OK)/ ) {
$return_type = $1;
}
}
INFO ($type,$return_type);
}
Command output:
PSLAs Latest Operation Summary
Codes: * active, ^ inactive, ~ pending
ID Type Destination Stats Return Last
(ms) Code Run
-----------------------------------------------------------------------
*1 icmp 192.168.25.14 RTT=1 OK 1 second ago

Updated for some clarifications: we need only the last line.
As is often the case, you don't need a regex to parse the output as shown. You have space-separated fields and can just split the line and pick the elements you need.
We are told that the line of interest is the last line of the command output. Then we don't need the loop but can take the last element of the array with lines. It is still unclear how $ipsla contains the output: as a multi-line string or perhaps as an arrayref. Since it is output of a command I'll treat it as a multi-line string, akin to what qx returns. Then, instead of the foreach loop:
my @lines = split /\n/, $ipsla; # if $ipsla is a multi-line string
# my @lines = @$ipsla; # if $ipsla is an arrayref
pop @lines while @lines and $lines[-1] !~ /\S/; # remove possible empty lines at end
my ($type, $return_type) = (split ' ', $lines[-1])[1,4];
Here are some comments on the code. Let me know if more is needed.
We can see in the shown output that the fields up to what we need have no spaces. So we can split the last line on whitespace, by split ' ', $lines[-1], and take the 2nd and 5th elements (indices 1 and 4), by ( ... )[1,4]. These are our two needed values and we assign them.
Just in case the output ends with empty lines, we first remove them by doing pop @lines as long as there are lines left and the last one has no non-space characters, while @lines and $lines[-1] !~ /\S/. That is the same as
while ( @lines and $lines[-1] !~ /\S/ ) { pop @lines }
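The question also asks what to do when the command returns no output. One option is to wrap the parsing in a sub that returns an empty list when there is no data row. A sketch, assuming $ipsla is a multi-line string; parse_summary is a hypothetical name, and treating a row as data only if it starts with something like *1 is an assumption based on the single sample line:

```perl
use strict;
use warnings;

# Hypothetical helper: return (type, return code) from the last data row of
# the "show ip sla" output, or an empty list when there is no such row.
sub parse_summary {
    my ($output) = @_;
    return unless defined $output;

    my @lines = grep { /\S/ } split /\n/, $output;   # drop blank lines
    return unless @lines;                             # no output at all

    my @fields = split ' ', $lines[-1];
    # Assumed shape of a data row: optional status code plus numeric ID, e.g. "*1".
    return unless @fields >= 5 && $fields[0] =~ /^[*^~]?\d+$/;

    return @fields[1, 4];
}

# Sample output copied from the question.
my $ipsla = <<'END';
PSLAs Latest Operation Summary
Codes: * active, ^ inactive, ~ pending
ID Type Destination Stats Return Last
(ms) Code Run
-----------------------------------------------------------------------
*1 icmp 192.168.25.14 RTT=1 OK 1 second ago
END

my ($type, $return_type) = parse_summary($ipsla);
if (defined $type) {
    print "type: $type, return code: $return_type\n";
}
else {
    print "no IP SLA data in the command output\n";
}
```

The caller only has to check whether it got values back, which covers both an empty output and output that contains only the header.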
Original version, edited for clarifications. It is also a valid way to do what is needed.
I assume that data starts after the line with only dashes. Set a flag once that line is reached, process the line(s) if the flag is set. Given the rest of your code, the loop
my $data_start;
foreach (@lines)
{
if (not $data_start) {
$data_start = 1 if /^\s* -+ \s*$/x; # only dashes and optional spaces
}
else {
my ($type, $return_type) = (split)[1,4];
print "type: $type, return code: $return_type\n";
}
}
This is a sketch until clarifications come. It also assumes that there is more than one line.

I'm not sure of all possibilities of output from that command so my regular expression may need tweaking.
I assume the goal is to get the values of all columns in variables. I opted to store values in a hash using the column names as the hash keys. I printed the results for debugging / demonstration purposes.
use strict;
use warnings;
sub check_summary {
my ($self) = @_;
my %results = map { ($_,undef) } qw(Code ID Type Destination Stats Return_Code Last_Run); # Put results in hash, use column names for keys, set values to undef.
my $ipsla = $self->{ssh_obj}->exec('show ip sla');
foreach my $line (@$ipsla) {
chomp $line; # Remove newlines from last field
if($line =~ /^([*^~])([0-9]+)\s+([a-z]+)\s+([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s+([[:alnum:]=]+)\s+([A-Z]+)\s+([^\s].*)$/) {
$results{Code} = $1; # Code prefixing ID
$results{ID} = $2;
$results{Type} = $3;
$results{Destination} = $4;
$results{Stats} = $5;
$results{Return_Code} = $6;
$results{Last_Run} = $7;
}
}
# Testing
use Data::Dumper;
print Dumper(\%results);
}
# Demonstrate
check_summary();
# Commented for testing
#INFO ($type,$return_type);
Worked on the submitted test line.
EDIT:
Regular expressions allow you to specify patterns instead of the exact text you are attempting to match. This is powerful but complicated at times. You need to read the Perl Regular Expression documentation to really learn them.
Perl regular expressions also allow you to capture the matched text. This can be done multiple times in a single pattern which is how we were able to capture all the columns with one expression. The matches go into numbered variables...
$1
$2
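Besides numbered variables, Perl 5.10 and later support named capture groups, whose values are available in the %+ hash. A sketch using a variant of the pattern above with names (the group names are illustrative), applied to the sample data row from the question:

```perl
use strict;
use warnings;

# The data row from the question's sample output.
my $line = '*1 icmp 192.168.25.14 RTT=1 OK 1 second ago';

my %row;
if ($line =~ /^(?<code>[*^~])(?<id>\d+)\s+(?<type>[a-z]+)\s+(?<destination>\d+(?:\.\d+){3})\s+(?<stats>\S+)\s+(?<return_code>[A-Z]+)\s+(?<last_run>\S.*)$/) {
    %row = %+;    # copy: %+ only reflects the most recent successful match
}
print "$_ => $row{$_}\n" for sort keys %row;
```

Named groups make the hash keys explicit in the pattern itself, so adding or reordering columns does not shift $1, $2, ... around.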

Usage of Range operator in perl

I have the following code, which reads the text below from a file and displays the IDs as shown below. I am especially interested in the condition in the if block and how the ID is being fetched.
Using a range operator:
use strict;
use warnings;
use autodie;
#open my $fh, '<', 'sha.log';
my $fh = \*DATA;
my @work_items;
while (<$fh>) {
if ( my $range = /Work items:/ ... !/^\s*\(\d+\) (\d+)/ ) {
push @work_items, $1 if $range > 1 && $range !~ /E/;
}
}
print "@work_items\n";
Text in the file:
__DATA__
Change sets:
(0345) ---$User1 "test12"
Component: (0465) "textfiles1"
Modified: 14-Sep-2014 02:17 PM
Changes:
---c- (0574) /<unresolved>/sha.txt
Work items:
(0466) 90516 "test defect
(0467) 90517 "test defect
Change sets:
(0345) ---$User1 "test12"
Component: (0465) "textfiles1"
Modified: 14-Sep-2014 02:17 PM
Changes:
---c- (0574) /<unresolved>/sha.txt
Work items:
(0468) 90518 "test defect
Output:
90516 90517 90518
Question: the range operator is normally written with two dots; why is it used with three dots here?

First, it's not really the range operator; it's known as the flip-flop operator when used in scalar context. And like all symbolic operators, it's documented in perlop.
... is almost the same as .. (two dots). When ... is used instead of .., the end condition isn't tested on the same pass as the start condition.
$ perl -E'for(qw( a b a c a d a )) { say if $_ eq "a" .. $_ eq "a"; }'
a # Start and stop at the first 'a'
a # Start and stop at the second 'a'
a # Start and stop at the third 'a'
a # Start and stop at the fourth 'a'
$ perl -E'for(qw( a b a c a d a )) { say if $_ eq "a" ... $_ eq "a"; }'
a # Start at the first 'a'
b
a # Stop at the second 'a'
a # Start at the third 'a'
d
a # Stop at the fourth 'a'

Per http://perldoc.perl.org/perlop.html#Range-Operators:
If you don't want it to test the right operand until the next evaluation, as in sed, just use three dots ("...") instead of two. In all other regards, "..." behaves just like ".." does.
So, this:
/Work items:/ ... !/^\s*\(\d+\) (\d+)/
means "from a line that matches /Work items:/ until the next subsequent line that doesn't match /^\s*\(\d+\) (\d+)/", whereas this:
/Work items:/ .. !/^\s*\(\d+\) (\d+)/
would mean "from a line that matches /Work items:/ until the line that doesn't match /^\s*\(\d+\) (\d+)/" (even if it's the same one).
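The question also asks about the condition in the if block. In scalar context the flip-flop returns "" while outside the range, a sequence number (1, 2, ...) inside it, and the final number with "E0" appended on the line that closes it. A small sketch on a few lines taken from the question's data, recording both the three-dot and the two-dot results:

```perl
use strict;
use warnings;

# A few lines from the question's sample data.
my @lines = (
    'Changes:',
    'Work items:',
    '(0466) 90516 "test defect',
    '(0467) 90517 "test defect',
    'Change sets:',
);

my (@three_dot, @two_dot);
for (@lines) {
    # Each flip-flop keeps its own state across loop iterations.
    my $r3 = /Work items:/ ... !/^\s*\(\d+\) (\d+)/;
    my $r2 = /Work items:/ ..  !/^\s*\(\d+\) (\d+)/;
    push @three_dot, $r3;
    push @two_dot,   $r2;
    printf "%-28s ...: %-4s ..: %s\n", $_, $r3, $r2;
}
```

With three dots the values are "", 1, 2, 3 and 4E0, so $range > 1 skips the "Work items:" line itself, and $range !~ /E/ skips the closing line, where the ID pattern did not match and $1 therefore does not hold an ID from that line. With two dots the range opens and closes on the "Work items:" line (1E0), because that line does not match the ID pattern, so no IDs would be collected.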

We Keep Coding
