Usage of range operator in Perl

I have the following code, which reads the text below from a file and displays the IDs listed under Outputs. I don't understand the condition in the if block or how the ID is fetched:
Using a Range operator ..:
use strict;
use warnings;
use autodie;
#open my $fh, '<', 'sha.log';
my $fh = \*DATA;
my @work_items;
while (<$fh>) {
    if ( my $range = /Work items:/ ... !/^\s*\(\d+\) (\d+)/ ) {
        push @work_items, $1 if $range > 1 && $range !~ /E/;
    }
}
print "@work_items\n";
Text in the file
__DATA__
Change sets:
(0345) ---$User1 "test12"
Component: (0465) "textfiles1"
Modified: 14-Sep-2014 02:17 PM
Changes:
---c- (0574) /<unresolved>/sha.txt
Work items:
(0466) 90516 "test defect
(0467) 90517 "test defect
Change sets:
(0345) ---$User1 "test12"
Component: (0465) "textfiles1"
Modified: 14-Sep-2014 02:17 PM
Changes:
---c- (0574) /<unresolved>/sha.txt
Work items:
(0468) 90518 "test defect
Outputs:
90516 90517 90518
Question: The range operator is normally written with two dots. Why is it used with three dots here?

First, it's not really the range operator; it's known as the flip-flop operator when used in scalar context. And like all symbolic operators, it's documented in perlop.
... is almost the same thing as .. (two dots). When ... is used instead of .., the end condition isn't tested on the same pass as the start condition.
$ perl -E'for(qw( a b a c a d a )) { say if $_ eq "a" .. $_ eq "a"; }'
a # Start and stop at the first 'a'
a # Start and stop at the second 'a'
a # Start and stop at the third 'a'
a # Start and stop at the fourth 'a'
$ perl -E'for(qw( a b a c a d a )) { say if $_ eq "a" ... $_ eq "a"; }'
a # Start at the first 'a'
b
a # Stop at the second 'a'
a # Start at the third 'a'
d
a # Stop at the fourth 'a'

Per http://perldoc.perl.org/perlop.html#Range-Operators:
If you don't want it to test the right operand until the next evaluation, as in sed, just use three dots ("...") instead of two. In all other regards, "..." behaves just like ".." does.
So, this:
/Work items:/ ... !/^\s*\(\d+\) (\d+)/
means "from a line that matches /Work items:/ until the next subsequent line that doesn't match /^\s*\(\d+\) (\d+)/", whereas this:
/Work items:/ .. !/^\s*\(\d+\) (\d+)/
would mean "from a line that matches /Work items:/ until the line that doesn't match /^\s*\(\d+\) (\d+)/" (even if it's the same one).
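It may also help to see why the question's code tests $range > 1 && $range !~ /E/. Per perlop, the flip-flop returns a sequence number starting at 1 while the range is active, and the last number of a range has the string "E0" appended. A minimal sketch, where the @lines sample and the @seen array are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample lines, shaped like the data in the question.
my @lines = (
    'Changes:',
    'Work items:',
    '(0466) 90516 "test defect',
    '(0467) 90517 "test defect',
    'Change sets:',
    'Component: (0465) "textfiles1"',
);

my @seen;
for (@lines) {
    # Same condition as in the question; $range is "" outside the range.
    my $range = /Work items:/ ... !/^\s*\(\d+\) (\d+)/;
    push @seen, $range if $range;
}

print "@seen\n";    # prints: 1 2 3 4E0
```

So $range > 1 skips the "Work items:" header line (sequence number 1), and $range !~ /E/ skips the line that ends the range (here "4E0"), leaving only the lines where the second regex matched and set $1.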

Related

How can I extract certain lines with Perl?

I have a string like this:
Modified files: ['A', 'B']
File: /tpl/src/vlan/VlanInterfaceValidator.cpp
Newly generated warnings:
A has warnings
B has warning
Status: PASS
I want the value of "Newly generated warnings:" which should be
A has warnings
B has warning
I am new to Perl and don't know how to use regexes in it. Kindly help.
Here are two options:
split the string into lines, and filter the lines array using grep
use a regex on the multi-line string
my $str = "
Modified files: ['A', 'B']
File: /tpl/src/vlan/VlanInterfaceValidator.cpp
Newly generated warnings:
A has warnings
B has warning
Status: PASS";
my @lines = grep { /\w+ has warning/ } split(/\n/, $str);
print "Option 1 using split and grep:\n";
print join("\n", @lines);
$str =~ s/^.*Newly generated warnings:\s*(.*?)\s+Status:.*$/$1/sm;
print "\n\nOption 2 using regex:\n";
print $str;
Output:
Option 1 using split and grep:
A has warnings
B has warning
Option 2 using regex:
A has warnings
B has warning
Explanation for option 1:
split(/\n/, $str) - split the string into an array of strings
grep{ /\w+ has warning/ } - filter using a grep regex to lines of interest
Note: This is short for the standard regex test $_ =~ /\w+ has warning/. The $_ contains the string element, e.g. line.
Explanation for option 2:
$str =~ s/search/replace/ - standard search and replace on a string
Note: Unlike in many other languages, strings are mutable in Perl
s/^.*Newly generated warnings:\s*(.*?)\s+Status:.*$/$1/sm:
search:
^.* - from beginning of string grab everything until:
Newly generated warnings:
\s* - scan over optional whitespace
(.*?) - capture group 1 with non-greedy scan
\s+Status:.*$ - scan over whitespace, Status:, and everything else to end of string
replace:
$1 - use capture group 1
flags:
s - dot matches newlines
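m - multi-line mode: ^ and $ match at the start and end of each line, not only at the start and end of the whole string (with the s flag the .* already spans lines, so m makes little difference here)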
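If you would rather not overwrite $str, a match in list context returns the capture group directly. This is only a sketch of a variation on option 2, not part of the answer above; the names $block and @warnings are made up for illustration, and the ^ anchors rely on the m flag:

```perl
use strict;
use warnings;

my $str = "
Modified files: ['A', 'B']
File: /tpl/src/vlan/VlanInterfaceValidator.cpp
Newly generated warnings:
A has warnings
B has warning
Status: PASS";

# A match in list context returns the captures, so $str is left unchanged.
# s lets . cross newlines; m lets ^ match at the start of any line.
my ($block) = $str =~ /^Newly generated warnings:\s*(.*?)\s*^Status:/ms;

# Split the captured section into one element per warning line.
my @warnings = defined($block) ? split( /\n/, $block ) : ();

print "$_\n" for @warnings;
```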
This is the sort of problem where you can read up to the line that starts the section you want, doing nothing with those lines, and then read on until the end of that section, keeping those lines:
# ignore all these lines
while( <DATA> ) {
last if /Newly generated warnings/;
}
# process all these lines
while( <DATA> ) {
last if /\A\s*\z/; # stop at the first blank line
print; # do whatever you need
}
__END__
Modified files: ['A', 'B']
File: /tpl/src/vlan/VlanInterfaceValidator.cpp
Newly generated warnings:
A has warnings
B has warning
Status: PASS
That's reading from a filehandle. Handling a string is trivially easy because you can open a filehandle on a string so you can treat the string line-by-line:
my $string = <<'HERE';
Modified files: ['A', 'B']
File: /tpl/src/vlan/VlanInterfaceValidator.cpp
Newly generated warnings:
A has warnings
B has warning
Status: PASS
HERE
open my $fh, '<', \ $string;
while( <$fh> ) {
last if /Newly generated warnings/;
}
while( <$fh> ) {
last if /\A\s*\z/;
print; # do whatever you need
}
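The two loops can also be folded into one with the flip-flop operator discussed at the top of this page. This is a sketch under the assumption that the section always ends at a line starting with "Status:" (the sample has no blank line after the warnings); @warnings and $range are illustrative names:

```perl
use strict;
use warnings;

my $string = <<'HERE';
Modified files: ['A', 'B']
File: /tpl/src/vlan/VlanInterfaceValidator.cpp
Newly generated warnings:
A has warnings
B has warning
Status: PASS
HERE

open my $fh, '<', \$string or die "Cannot open in-memory handle: $!";

my @warnings;
while ( my $line = <$fh> ) {
    # Sequence number: 1 on the header line, "NE0" on the closing line.
    my $range = ( $line =~ /^Newly generated warnings:/ ) ... ( $line =~ /^Status:/ );
    next unless $range;                              # outside the section
    next if $range == 1 or $range =~ /E0\z/;         # skip both delimiter lines
    chomp $line;
    push @warnings, $line;
}
close $fh;

print "$_\n" for @warnings;
```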

How to grep a pattern for each and every element of an array

I have buffered a complete file in an array, @soc_data. Now I am trying to grep for a pattern in each and every line of the file, that is, in each and every element of the array.
I have used the code below, but it is not working properly: the flag is always 1, when it should toggle, because the pattern is present in only a few lines, not in every line.
my $flag = 0 ;
if(grep(" ".$val." ".$instance, @soc_data)){
$flag = 1 ;
}
else {
$flag = 0 ;
}
Please suggest the right way to do it and point out the mistake I am making here.
It cannot toggle with the code you have written. Your code checks if there is any occurrence of the pattern in the array. grep will look at each line (element in the array) in turn, and return a list of the ones that match the pattern. Then your flag is set and you are done.
my @list = ( 1 .. 20 );
my @match = grep /3/, @list;
print "@match";
# 3 13
If you want to do each line individually, you need to loop over the array yourself in an outer loop, and then do a match. No need for grep then.
foreach my $line (@soc_data) {
my $flag = 0;
$flag = 1 if $line =~ m/ $val $instance/; # you might want to use \s
# do things with $flag
}
grep is often used to apply regular expressions to a list of strings, but the grep in Perl is more general than that, and you must explicitly use a regular expression if that's what you want to use grep for. Compare:
@list = (7,8,9,10);
print grep /1/, @list; # 10 -- only "10" matches /1/
print grep 1, @list; # 7 8 9 10, EXPR is always true
Your use of grep is more like the 2nd case. The first argument to grep is just a non-empty and non-zero scalar, so it is always "true" and the return value of grep is always every element in the list. I suspect you want something like
if (grep / $val $instance/, @soc_data) { ... }
or if $val and $instance might have regexp metacharacters,
if (grep / \Q$val\E \Q$instance\E/, @soc_data) { ... }
(I don't really know what you mean by "toggling", so I probably haven't properly addressed that part of your question)
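To tie both answers together, here is a small sketch showing a whole-array count next to a per-line flag. The contents of @soc_data, $val, and $instance are invented for illustration, and reading "toggle" as "one flag per line" is an assumption:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the buffered file.
my @soc_data = ( 'u1 foo bar', 'u2 baz qux', 'u3 foo bar x' );
my ( $val, $instance ) = ( 'foo', 'bar' );

# Build the pattern once; \Q...\E quotes any regex metacharacters.
my $pattern = qr/ \Q$val\E \Q$instance\E/;

# In scalar context grep returns the number of matching elements.
my $count = grep { /$pattern/ } @soc_data;

# One flag per element: 1 where the line matches, 0 where it does not.
my @flags = map { /$pattern/ ? 1 : 0 } @soc_data;

print "matching lines: $count\n";    # prints: matching lines: 2
print "flags: @flags\n";             # prints: flags: 1 0 1
```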

How to remove the word using array index in perl?

How can I remove certain words by array index from the following input using Perl?
file.txt
BOCK:top:blk1
BOCK:block2:blk2
BOCK:test:blk3
After join:
/BOCK/top/blk1
/BOCK/block2/blk2
/BOCK/test/blk3
Expected output:
/BOCK/blk1
/BOCK/blk2
/BOCK/blk3
Code which I had tried:
use warnings;
use strict;
my @words;
open(my $infile,'<','file.txt') or die $!;
while(<$infile>)
{
push(@words,split /\:/);
}
my $word=join("/",@words);
print $word;
close ($infile);
foreach my $word(@words)
{
if($word=~ /(\w+\/\w+\/\w+)/)
{
print $word;
}
}
The easiest way to get rid of the middle element is to use splice.
while ( my $line = <DATA> ) {
my @words;
push( @words, split( /:/, $line ) ); # colon has no special meaning
splice( @words, 1, 1 );
print '/', join( '/', @words );
}
__DATA__
BOCK:top:blk1
BOCK:block2:blk2
BOCK:test:blk3
I assumed that you want to do that for every line. The code that you had did something else. Because your @words is declared outside of the while loop, it gets bigger with every iteration, and every third element contains a newline \n character because you never chomp. Then you create one long $word that has all the words from all lines joined with a slash /. Afterwards you try to match that for three words joined with slashes, which works. But you only have one capture group, so your $3 is never defined.
The code can be simplified and cleaned up, even to the point of
my @paths = map { chomp; '/' . join '/', (split ':')[0,-1] } <$infile>;
print "$_\n" for @paths;
The map imposes list context on the filehandle read, which thus returns a list of all lines from the file. The code in map's block is applied to each element: it removes the trailing newline, splits the line, takes the first and last elements of that list, joins them, and prepends the leading /. Inside the block the line is in the variable $_, which chomp and split use by default. The resulting list is returned and assigned to @paths.
A number of errors in the posted code have been explained clearly in simbabque's answer.
Thanks to jm666 in a comment for catching the requirement for the leading /.
The above can also be used for a one-liner
perl -F: -lane'print "/" . join "/", @F[0,-1]' < file.txt > out.txt
The -a turns on autosplit mode (with -n or -p), whereby each line is split and available in @F. The -F switch lets you specify the pattern to split on, here :, instead of the default whitespace.
See switches in perlrun.
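For comparison, here is the slice approach written out as a small subroutine. It is only a sketch; first_last_path is a made-up name, and it assumes every line has at least two colon-separated fields:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first and last colon-separated fields.
sub first_last_path {
    my ($line) = @_;
    chomp $line;                              # drop the trailing newline, if any
    my @words = split /:/, $line;
    return '/' . join( '/', @words[ 0, -1 ] );    # array slice: first and last element
}

for my $line ( "BOCK:top:blk1\n", "BOCK:block2:blk2\n", "BOCK:test:blk3\n" ) {
    print first_last_path($line), "\n";
}
```

Unlike splice( @words, 1, 1 ), the slice @words[0, -1] still gives the first and last field if a line ever has more than three fields.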

What's wrong with this code to read a file?

I have been trying to read a file called "perlthisfile.txt" which is basically the output of nmap on my computer.
I want to print only the IP addresses, so I wrote the following code, but it is not working:
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
print"\n running \n";
open (MYFILE, 'perlthisfile.txt') or die "Cannot open file\n";
while(<MYFILE>) {
chomp;
my @value = split(' ', <MYFILE>);
print"\n before foreach \n";
foreach my $val (@value) {
if (looks_like_number($val)) {
print "\n looks like number block \n";
if ($val == /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5})/) {
print "\n$val\n";
}
}
}
}
close(MYFILE);
exit 0;
And when i ran this code the output was:
running
before foreach
before foreach
looks like number block
before foreach
looks like number block
before foreach
looks like number block
My perlthisfile.txt:
Starting Nmap 6.00 ( http://nmap.org ) at 2013-10-16 22:59 EST
Nmap scan report for BoB2.iiNet (10.1.1.1)
Nmap scan report for android-fbff3c3812154cdc (10.1.1.3)
All 1000 scanned ports on android-fbff3c3812154cdc (10.1.1.3) are closed
Nmap scan report for 10.1.1.5
All 1000 scanned ports on 10.1.1.5 are open|filtered
Nmap scan report for 10.1.1.6
All 1000 scanned ports on 10.1.1.6 are closed
Several issues here. As @toolic said, calling <MYFILE> inside the split is probably not what you want: it will read the next record from the file. Use $_ instead.
Also, you are using == with a regex; you should use the binding operator, =~ (== is only used for numeric comparisons in Perl):
if ($val =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5})/){
I suggest that looks_like_number is redundant if the regex works. I suspect that you are using it because == gives something like isn't numeric in numeric eq (==) depending on the version of perl you are using.
You had a few errors, one of which is the regex, which should make the port number part (: and the following \d{1,5}) optional.
#!/usr/bin/perl
use strict;
use warnings;
open (my $MYFILE, '<', 'perlthisfile.txt') or die $!;
my $looks_like_ip = qr/( \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} (?: : \d{1,5})? )/x;
while (<$MYFILE>) {
chomp;
my @value = split;
print"\n before foreach \n";
foreach my $val (@value) {
if (my ($match) = $val =~ /$looks_like_ip/){
print "\n$match\n";
}
# else { print "$val doesn't contain IP\n" }
}
}
close($MYFILE) or warn $!;
If this is what it looks to be, which is a quick hack to extract IPs, you might get away with something simple such as:
perl -nlwe '/((?:\d+\.)+\d+)/ && print $1' perlthisfile.txt
Which is to say, not a very strict regex by any means, it just matches numbers joined by periods. If you'd like to only print unique IPs, you can make use of a hash to dedupe:
perl -nlwe '/((?:\d+\.)+\d+)/ && !$seen{$1}++ && print $1' perlthisfile.txt
With a slightly tighter regex that also matches port numbers:
perl -nlwe '/((?:\d+[\.:]){3,4}\d+)/ && print $1' perlthisfile.txt
This will disallow shorter chains of numbers, and allow for a port number.
This last regex explained:
/( # opening parenthesis, starts a string capture
(?: # a non-capturing parenthesis
\d+ # match a digit, repeated one or more times
[\.:] # [ ... ] is a character class, it matches one of the literal
# characters inside it, and only one time
){3,4} # closing the non-capturing parenthesis, adding a quantifier
# that says this parenthesis can match 3 or 4 times
\d+ # match one or more digits
)/x # close capturing parenthesis (added `/x` switch)
The /x switch is just so that you can use the above regex as-is, with comments and whitespace.
The logic behind this is simply: We want a string consisting of a number followed by a period or a colon. We want this string 3 or 4 times. End with another number.
The + and {3,4} are quantifiers, they dictate how many times the item to the left of it is supposed to match. By default, every item matches one time, but by using a quantifier you can change that. + is shorthand for {1,}, and you also have:
? -> {0,1}
* -> {0,}
The syntax is {min,max}, and when max is missing, there is no upper limit.
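If the loose patterns above match too much, the same quantifier syntax can be used to build a stricter pattern. The following is a sketch, not a full validator: $octet and $ip_re are made-up names, each octet is limited to 0-255, the port is optional, and the @report lines are copied from the question's file:

```perl
use strict;
use warnings;

# One octet, 0-255; {3} below repeats ".octet" exactly three times.
my $octet = qr/(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/;
my $ip_re = qr/\b($octet(?:\.$octet){3})(?::(\d{1,5}))?\b/;

my @report = (
    'Nmap scan report for BoB2.iiNet (10.1.1.1)',
    'Nmap scan report for android-fbff3c3812154cdc (10.1.1.3)',
    'All 1000 scanned ports on android-fbff3c3812154cdc (10.1.1.3) are closed',
    'Nmap scan report for 10.1.1.5',
);

my ( %seen, @ips );
for my $line (@report) {
    # /g in a while loop visits every address on the line.
    while ( $line =~ /$ip_re/g ) {
        push @ips, $1 unless $seen{$1}++;    # keep only the first occurrence
    }
}

print "$_\n" for @ips;
```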

In Perl, can I apply the 'grep' command directly to data I captured using the flip-flop operator?

I need to find the 'number' of occurrences of particular words (C7STH, C7ST2C) that come in the output of a command. The command starts and ends with a 'fixed' text - START & END like below. This command is repeated many times against different nodes in the log file.
...
START
SLC ACL PARMG ST SDL SLI
0 A1 17 C7STH-1&&-31 MSC19-0/RTLTB2-385
1 A1 17 C7STH-65&&-95 MSC19-0/RTLTB2-1697
SLC ACL PARMG ST SDL SLI
0 A2 0 C7ST2C-4 ETRC18-0/RTLTB2-417
1 A2 0 C7ST2C-5 ETRC18-0/RTLTB2-449
2 A2 0 C7ST2C-6 ETRC18-0/RTLTB2-961
...
END
....
I am using the flip-flop operator (if (/^START$/ .. /^END$/)) to get each command output. Now:
Is there a way to do 'grep' on this data without going line by line? For example, can I get all the text between 'START' and 'END' into an array and do 'grep' on that?
Also, is it 'ok' to have multiple levels of if blocks with the flip-flop operator from a performance point of view?
This would be a simple solution:
my $number = grep {/particular word/} grep {/START/../END/} <>;
(Since you don't provide a code sample I've used the diamond operator and assumed the log file is passed as an argument to the script. Replace with file handle if needed.)
grep {/START/../END/} <> creates a list of elements within and including the delimiters, and grep {/particular word/} works on that list.
From a performance point of view you'd be better off with
for (<>) {
$number++ if /START/../END/ and /particular word/;
}
Note that you have to use and instead of && or wrap your flip-flop expression in parentheses because of operator precedence.
Combining both:
my $number = grep {/START/../END/ and /particular word/} <>;
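Since the command is repeated for several nodes, you may want one count per START..END block rather than a single total. Here is a sketch of that, using the "E0" suffix the flip-flop adds on the last line of a range; the @log sample is abbreviated from the question, and @blocks and %count are made-up names:

```perl
use strict;
use warnings;

# Abbreviated sample log with two START..END blocks.
my @log = (
    'noise before',
    'START',
    '0 A1 17 C7STH-1&&-31 MSC19-0/RTLTB2-385',
    '1 A1 17 C7STH-65&&-95 MSC19-0/RTLTB2-1697',
    '0 A2 0 C7ST2C-4 ETRC18-0/RTLTB2-417',
    'END',
    'noise between',
    'START',
    '1 A2 0 C7ST2C-5 ETRC18-0/RTLTB2-449',
    'END',
);

my @blocks;    # one hash of word counts per block
my %count;
for (@log) {
    if ( my $range = /^START$/ .. /^END$/ ) {
        # Count every occurrence of either word on this line.
        $count{$1}++ while /\b(C7STH|C7ST2C)\b/g;

        # The last line of a range carries the "E0" suffix.
        if ( $range =~ /E0\z/ ) {
            push @blocks, +{%count};
            %count = ();
        }
    }
}

for my $i ( 0 .. $#blocks ) {
    for my $word ( sort keys %{ $blocks[$i] } ) {
        print "block $i: $word = $blocks[$i]{$word}\n";
    }
}
```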
Maybe you're looking for something along these lines:
#!/usr/bin/env perl
use strict;
use warnings;
my $word = q(stuff);
my @data;
while (<DATA>) {
    if ( /^START/../^END/ ) {
        chomp;
        push @data, $_ unless /^(?:START|END)/;
    }
    if ( /^END/ ) {
        my $str = "@data";
        print +(scalar grep {/$word/} (split / /,$str)),
            " occurances of '$word'\n";
        @data = ();
    }
}
__DATA__
this is a line
START of my stuff
more my stuff
and still more stuff
and lastly, yet more stuff
END of my stuff
this is another line
START again
stuff stuff stuff stuff
yet more stuff
END again
...which would output:
3 occurances of 'stuff'
5 occurances of 'stuff'
Like can i get all the text between 'START' and 'END' into an array and do 'grep' on this etc?
(push @ar,$_) if /START/ .. /END/;
grep {/word/} @ar;
Also is it 'ok' to have multiple levels of if blocks with flip-flop operator from performance point of view?
As long as you are not working for NASA.