Search array of strings for specific words - perl

Firstly, I don't know Perl at all, and need a reasonably quick answer on this. I have the result of running a command stored in an array:
my @result = `$command`;
What I need to do is search the array to see if any element contains the word "Merge" or the word "changed" (both case insensitive).
Can someone advise please?

The tool for the job here is grep - a function that allows you to specify a filter against a list. You can use it much like Unix grep, but it'll also allow for more complex tests (e.g. code to run).
In your case:
my @matches = grep { /merge|changed/i } @result;
if ( @matches ) {
print "One or more lines matched\n";
}
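As a minimal sketch of how grep behaves in the two contexts (the sample lines below are made up to stand in for real command output, and the \b word boundaries are an optional tightening so that, say, "unchanged" does not count):

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the command output.
my @result = (
    "Merge made by the 'recursive' strategy.\n",
    " 2 files changed, 10 insertions(+)\n",
    "Already up to date.\n",
);

# In list context grep returns the matching elements;
# in scalar context it returns how many elements matched.
my @matches = grep { /\b(?:merge|changed)\b/i } @result;
my $count   = grep { /\b(?:merge|changed)\b/i } @result;

print "Matched $count line(s):\n", @matches if $count;
```

If you only need a yes/no answer, testing the scalar count is enough and avoids keeping a copy of the matching lines.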

You can do this with a regular expression. In this example, if the line matches 'merge' or 'changed' (case-insensitively, of course), the line is printed:
#!/usr/bin/perl
use strict;
use warnings;
my @result = `command`;
foreach my $line (@result){
if ($line =~ /merge|changed/i){
print $line;
}
}

Related

Using binding operator in perl

I am working on a program in Perl and I am trying to combine more than one regex in a binding operator. I have tried using the syntax below but it's not working. I would like to know if there is any other way to go about this.
$in =~ (s/pattern/replacement/)||(s/pattern/replacement/)||...
You can often get a clue about what Perl makes of some code using B::Deparse.
$ perl -MO=Deparse -E'$in =~ (s/pattern1/replacement1/)||(s/pattern2/replacement2/)'
[ ... snip ... ]
s/pattern2/replacement2/u unless $in =~ s/pattern1/replacement1/u;
-e syntax OK
So it's attempting your first substitution on $in. And if that fails, it is then trying your second substitution. But it's not using $in for the second substitution, it's using $_ instead.
You're running up against precedence issues here. Perl interprets your code as:
($in =~ s/pattern1/replacement1/) or (s/pattern2/replacement2/)
Notice that the opening parenthesis has moved before $in.
As others have pointed out, it's best to use a loop approach here. But I thought it might be useful to explain why your version didn't work.
Update: To be clear, if you wanted to use syntax like this, then you would need:
($in =~ s/pattern1/replacement1/) or
($in =~ s/pattern2/replacement2/);
Note that I've included $in =~ in each expression. At this point, it becomes obvious (I hope) why the looping solution is better.
However, because or is a short-circuiting operator, this statement will stop after the first successful substitution. I assumed that's what you wanted from your use of it in your original code. If that's not what you want, then you need to either switch to using and (which carries on only while each substitution succeeds) or (better, in my opinion) break them out into separate statements.
$in =~ s/pattern1/replacement1/;
$in =~ s/pattern2/replacement2/;
The closest you could get with a syntax looking similar to that would be
s/one/ONE/ or
s/two/TWO/ or
...
s/ten/TEN/ for $str;
This will attempt each substitution in turn, once only, stopping after the first successful one.
Use for to "topicalize" (alias $_ to your variable).
for ($in) {
s/pattern/replacement/;
s/pattern/replacement/;
}
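A small sketch of both forms, using made-up strings and patterns purely for illustration:

```perl
use strict;
use warnings;

my $in = "the quick brown fox";

# for aliases $_ to $in, so every s/// in the block edits $in itself.
for ($in) {
    s/quick/slow/;
    s/brown/red/;
}
print "$in\n";    # the slow red fox

# Statement-modifier form with "or": stops after the first success.
my $str = "one two";
s/one/ONE/ or s/two/TWO/ for $str;
print "$str\n";   # ONE two
```

The block form applies every substitution; the "or" chain applies at most one, which is the short-circuit behaviour discussed above.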
A simpler way might be to create an array of all such patterns and replacements, then simply iterate through your array applying the substitution one pattern at a time.
my $in = "some string you want to modify";
my @patterns = (
['pattern to match', 'replacement string'],
# ...
);
$in = replace_many($in, \@patterns);
sub replace_many {
my ($in, $replacements) = @_;
foreach my $replacement ( @$replacements ) {
my ($pattern, $replace_string) = @$replacement;
$in =~ s/$pattern/$replace_string/;
}
return $in;
}
It's not at all clear what you need, and it's not at all clear that you can accomplish what you appear to want by the means you suggest. The OR operator is a short circuit operator, and you may not want this behavior. Please give an example of the input you expect, and the output you desire, hopefully several examples of each. Meanwhile, here is a test script.
use warnings;
use strict;
my $in1 = 'George Walker Bush';
my $in2 = 'George Walker Bush';
my $in3 = 'George Walker Bush';
my $in4 = 'George Walker Bush';
(my $out1 = $in1) =~ s/e/*/g;
print "out1 = $out1 \n";
(my $out2 = $in2) =~ s/Bush/Obama/;
print "out2 = $out2 \n";
(my $out3 = $in3) =~ s/(George)|(Bush)/Obama/g;
print "out3 = $out3\n";
$in4 =~ /(George)|(Walker)|(Bush)/g;
print "$1 - $2 - $3\n";
exit(0);
You will notice in the last case that only the first alternative in the regular expression matches, so $2 and $3 are left undefined. If you wanted to replace 'George Walker Bush' with 'Barack Hussein Obama', you could do that easily enough, but you would also replace 'George Washington' with 'Barack Washington' - is this what you want? Here is the output of the script:
out1 = G*org* Walk*r Bush
out2 = George Walker Obama
out3 = Obama Walker Obama
Use of uninitialized value $2 in concatenation (.) or string at pq_151111a.plx line 19.
Use of uninitialized value $3 in concatenation (.) or string at pq_151111a.plx line 19.
George - -

Perl - Need to append duplicates in a file and write unique value only

I have searched a fair bit and hope I'm not duplicating something someone has already asked. I have what amounts to a CSV that is specifically formatted (as required by a vendor). There are four values that are being delimited as follows:
"Name","Description","Tag","IPAddresses"
The list is quite long (and there are ~150 unique names--only 2 in the sample below) but it basically looks like this:
"2B_AppName-Environment","desc","tag","192.168.1.1"
"2B_AppName-Environment","desc","tag","192.168.22.155"
"2B_AppName-Environment","desc","tag","10.20.30.40"
"6G_ServerName-AltEnv","desc","tag","1.2.3.4"
"6G_ServerName-AltEnv","desc","tag","192.192.192.40"
"6G_ServerName-AltEnv","desc","tag","192.168.50.5"
I am hoping for a way in Perl (or sed/awk, etc.) to come up with the following:
"2B_AppName-Environment","desc","tag","192.168.1.1,192.168.22.155,10.20.30.40"
"6G_ServerName-AltEnv","desc","tag","1.2.3.4,192.192.192.40,192.168.50.5"
So basically, the resulting file will APPEND the duplicates to the first match -- there should only be one line per each app/server name with a list of comma-separated IP addresses just like what is shown above.
Note that the "Description" and "Tag" fields don't need to be considered in the duplication removal/append logic -- let's assume these are blank for the example to make things easier. Also, in the vendor-supplied list, the "Name" entries are all already sorted to be together.
This short Perl program should suit you. It expects the path to the input CSV file as a parameter on the command line and prints the result to STDOUT. It keeps track of the appearance of new name fields in the @names array so that it can print the output in the order that each name first appears, and it takes the values for desc and tag from the first occurrence of each unique name.
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({always_quote => 1, eol => "\n"});
my (@names, %data);
while (my $row = $csv->getline(*ARGV)) {
my $name = $row->[0];
if ($data{$name}) {
$data{$name}[3] .= ','.$row->[3];
}
else {
push @names, $name;
$data{$name} = $row;
}
}
for my $name (@names) {
$csv->print(*STDOUT, $data{$name});
}
output
"2B_AppName-Environment","desc","tag","192.168.1.1,192.168.22.155,10.20.30.40"
"6G_ServerName-AltEnv","desc","tag","1.2.3.4,192.192.192.40,192.168.50.5"
Update
Here's a version that ignores any record that doesn't have a valid IPv4 address in the fourth field. I've used Regexp::Common as it's the simplest way to get complex regex patterns right. It may need installing on your system.
use strict;
use warnings;
use Text::CSV;
use Regexp::Common;
my $csv = Text::CSV->new({always_quote => 1, eol => "\n"});
my (@names, %data);
while (my $row = $csv->getline(*ARGV)) {
my ($name, $address) = @{$row}[0,3];
next unless $address =~ $RE{net}{IPv4};
if ($data{$name}) {
$data{$name}[3] .= ','.$address;
}
else {
push @names, $name;
$data{$name} = $row;
}
}
for my $name (@names) {
$csv->print(*STDOUT, $data{$name});
}
I would advise you to use a CSV parser like Text::CSV for this type of problem.
Borodin has already pasted a good example of how to do this.
One approach that I'd advise you NOT to use is regular expressions.
The following one-liner demonstrates how one could do this, but this is a very fragile approach compared to an actual csv parser:
perl -0777 -ne '
while (m{^((.*)"[^"\n]*"\n(?:(?=\2).*\n)*)}mg) {
$s = $1;
$s =~ s/"\n.*"([^"\n]+)(?=")/,$1/g;
print $s
}' test.csv
Outputs:
"2B_AppName-Environment","desc","tag","192.168.1.1,192.168.22.155,10.20.30.40"
"6G_ServerName-AltEnv","desc","tag","1.2.3.4,192.192.192.40,192.168.50.5"
Explanation:
Switches:
-0777: Slurp the entire file
-n: Creates a while(<>){...} loop for each “line” in your input file.
-e: Tells perl to execute the code on command line.
Code:
while (m{^((.*)"[^"\n]*"\n(?:(?=\2).*\n)*)}mg): Separate text into matching sections.
$s =~ s/"\n.*"([^"\n]+)(?=")/,$1/g;: Join all ip addresses by a comma in matching sections.
print $s: Print the results.

How can I determine if an element exists in an array (perl)

I'm looping through an array, and I want to test if an element is found in another array.
In pseudo-code, what I'm trying to do is this:
foreach $term (@array1) {
if ($term is found in @array2) {
#do something here
}
}
I've got the "foreach" and the "do something here" parts down-pat ... but everything I've tried for the "if term is found in array" test does NOT work ...
I've tried grep:
if grep {/$term/} @array2 { #do something }
# this test always succeeds for values of $term that ARE NOT in @array2
if (grep(/$term/, @array2)) { #do something }
# this test likewise succeeds for values NOT IN the array
I've tried a couple different flavors of "converting the array to a hash" which many previous posts have indicated are so simple and easy ... and none of them have worked.
I am a long-time low-level user of perl, I understand just the basics of perl, do not understand all the fancy obfuscated code that comprises 99% of the solutions I read on the interwebs ... I would really, truly, honestly appreciate any answers that are explicit in the code and provide a step-by-step explanation of what the code is doing ...
... I seriously don't grok $_ and any other kind or type of hidden, understood, or implied value, variable, or function. I would really appreciate it if any examples or samples have all variables and functions named with clear terms ($term as opposed to $_) ... and describe with comments what the code is doing so I, in all my mentally deficient glory, may hope to possibly understand it some day. Please. :-)
...
I have an existing script which uses 'grep' somewhat successfully:
$rc=grep(/$term/, @array);
if ($rc eq 0) { #something happens here }
but I applied that EXACT same code to my new script and it simply does NOT succeed properly ... i.e., it "succeeds" (rc = zero) when it tests a value of $term that I know is NOT present in the array being tested. I just don't get it.
The ONLY difference in my 'grep' approach between 'old' script and 'new' script is how I built the array ... in old script, I built array by reading in from a file:
@array=`cat file`;
whereas in new script I put the array inside the script itself (coz it's small) ... like this:
@array=("element1","element2","element3","element4");
How can that result in different output of the grep function? They're both bog-standard arrays! I don't get it!!!! :-(
########################################################################
addendum ... some clarifications or examples of my actual code:
########################################################################
The term I'm trying to match/find/grep is a word element, for example "word123".
This exercise was just intended to be a quick-n-dirty script to find some important info from a file full of junk, so I skip all the niceties (use strict, warnings, modules, subroutines) by choice ... this doesn't have to be elegant, just simple.
The term I'm searching for is stored in a variable which is instantiated via split:
foreach $line(@array1) {
chomp($line); # habit
# every line has multiple elements that I want to capture
($term1,$term2,$term3,$term4)=split(/\t/,$line);
# if a particular one of those terms is found in my other array 'array2'
if (grep(/$term2/, @array2) {
# then I'm storing a different element from the line into a 3rd array which eventually will be outputted
push(@known, $term1) unless $seen{$term1}++;
}
}
see that grep up there? It ain't workin right ... it is succeeding for all values of $term2 even if it is definitely NOT in array2 ... array1 is a file of a couple thousand lines. The element I'm calling $term2 here is a discrete term that may be in multiple lines, but is never repeated (or part of a larger string) within any given line. Array2 is about a couple dozen elements that I need to "filter in" for my output.
...
I just tried one of the below suggestions:
if (grep $_ eq $term2, @array2)
And this grep failed for all values of $term2 ... I'm getting an all or nothing response from grep ... so I guess I need to stop using grep. Try one of those hash solutions ... but I really could use more explanation and clarification on those.
This is in perlfaq. A quick way to do it is
my %seen;
$seen{$_}++ for @array1;
for my $item (@array2) {
if ($seen{$item}) {
# item from @array2 is also in @array1, do something
}
}
If letter case is not important, you can set the keys with $seen{ lc($_) } and check with if ($seen{ lc($item) }).
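Since the question asks for explicit variable names rather than $_, here is a sketch of the same hash-lookup idea spelled out step by step. The sample data and the names %in_array2 and @found are made up for illustration, and the direction follows the question (loop over @array1, test membership in @array2):

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @array1 = ("word123", "word456", "word789");
my @array2 = ("word456", "word000");

# Build a lookup table: one hash key per element of @array2.
my %in_array2;
foreach my $element (@array2) {
    $in_array2{$element} = 1;
}

# exists() asks whether the exact string is a key in the hash.
my @found;
foreach my $term (@array1) {
    if (exists $in_array2{$term}) {
        push @found, $term;
    }
}

print "Found: @found\n";   # Found: word456
```

One thing to watch: elements read with backticks or from a file keep their trailing newline, so "word456\n" is not the same key as "word456". Calling chomp(@array2) before building the hash removes those newlines.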
ETA:
With the changed question: If the task is to match single words in @array2 against whole lines in @array1, the task is more complicated. Trying to split the lines and match against hash keys will likely be unsafe, because of punctuation and other such things. So, a regex solution will likely be the safest.
Unless @array2 is very large, you might do something like this:
my $rx = join "|", @array2;
for my $line (@array1) {
if ($line =~ /\b(?:$rx)\b/) { # group the alternatives so the word boundaries apply to each one
# do something
}
}
If @array2 contains meta characters, such as *?+|, you have to make sure they are escaped, in which case you'd do something like:
my $rx = join "|", map quotemeta, @array2;
# etc
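Putting the pieces together, a rough sketch with made-up data: the words are escaped with quotemeta, joined into one alternation, and wrapped in a non-capturing group so that \b applies to every alternative rather than only the first and last. The names $alternatives and @hits are illustrative only:

```perl
use strict;
use warnings;

# Hypothetical sample data: words to look for, and tab-separated lines.
my @array2 = ("word123", "a+b", "foo");
my @array1 = ("x\tword123\ty", "x\tword1234\ty", "sum\ta+b\tz");

# Escape regex metacharacters, then compile one pattern for all words.
my $alternatives = join "|", map { quotemeta($_) } @array2;
my $rx = qr/\b(?:$alternatives)\b/;

my @hits;
for my $line (@array1) {
    if ($line =~ $rx) {
        push @hits, $line;   # line contains one of the words as a whole word
    }
}

print scalar(@hits), " line(s) matched\n";   # 2 line(s) matched
```

Here "word1234" is not matched by "word123" because of the trailing word boundary, and "a+b" is matched literally because its + was escaped.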
You could use the (infamous) "smart match" operator, provided you are on 5.10 or later:
#!/usr/bin/perl
use strict;
use warnings;
my @array1 = qw/a b c d e f g h/;
my @array2 = qw/a c e g z/;
print "a in \@array1\n" if 'a' ~~ @array1;
print "z in \@array1\n" if 'z' ~~ @array1;
print "z in \@array2\n" if 'z' ~~ @array2;
The example is very simple, but you can use an RE if you need to as well.
I should add that not everyone likes ~~ because there are some ambiguities and, um, "undocumented features"; it has also been flagged as experimental since Perl 5.18, so expect a warning there. Should be OK for this though.
This should work.
#!/usr/bin/perl
use strict;
use warnings;
my @array1 = qw/a b c d e f g h/;
my @array2 = qw/a c e g z/;
for my $term (@array1) {
if (grep $_ eq $term, @array2) {
print "$term found.\n";
}
}
Output:
a found.
c found.
e found.
g found.
#!/usr/bin/perl
@ar = ( '1','2','3','4','5','6','10' );
@arr = ( '1','2','3','4','5','6','7','8','9' ) ;
foreach $var ( @arr ){
print "$var not found\n " if ( ! ( grep /$var/, @ar )) ;
}
Pattern matching is a compact way of matching elements, but note that an unanchored pattern also matches substrings (1 matches 10). This would do the trick. Cheers!
print "$element found in the array\n" if ("@array" =~ m/$element/);
Your 'actual code' shouldn't even compile:
if (grep(/$term2/, @array2) {
should be:
if (grep (/$term2/, @array2)) {
You have unbalanced parentheses in your code. You may also find it easier to use grep with a block that operates on its arguments (the array). It helps keep the parentheses from blurring together. This is optional, though. It would be:
if (grep {/$term2/} @array2) {
You may want to use strict; and use warnings; to catch issues like this.
The example below might be helpful, it tries to see if any element in @array_sp is present in @my_array:
#! /usr/bin/perl -w
@my_array = qw(20001 20003);
@array_sp = qw(20001 20002 20004);
print "@array_sp\n";
foreach $case(@my_array){
if("@array_sp" =~ m/$case/){
print "My God!\n";
}
}
Using pattern matching can solve this. Hope it helps
-QC
1. grep with eq , then
if (grep {$_ eq $term2} @array2) {
print "$term2 exists in the array";
}
2. grep with regex , then
if (grep {/$term2/} @array2) {
print "element with pattern $term2 exists in the array";
}
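A third option, sketched here on the assumption that List::Util 1.33 or later is available (it ships with Perl 5.20 and newer), is its any function. Unlike grep, it stops at the first element that passes the test. The data and the %is_present hash are made up for illustration:

```perl
use strict;
use warnings;
use List::Util qw(any);

my @array2 = qw(alpha beta gamma);

# Record and report, for each candidate, whether it is an exact element.
my %is_present;
foreach my $term2 (qw(beta bet)) {
    $is_present{$term2} = (any { $_ eq $term2 } @array2) ? 1 : 0;
    if ($is_present{$term2}) {
        print "$term2 exists in the array\n";
    }
    else {
        print "$term2 does not exist in the array\n";
    }
}
```

With eq, "bet" is reported as absent even though it is a prefix of "beta"; the regex form in option 2 would have reported it as present.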

Extracting specific lines with Perl

I am writing a Perl program to extract the lines between two patterns that I am matching. For example, the text file below has 6 lines. I am matching load balancer and end, and I want to get the 4 lines in between.
**load balancer**
new
old
good
bad
**end**
My question is how do you extract lines in between load balancer and end into an array. Any help is greatly appreciated.
You can use the flip-flop operator to tell you when you are between the markers. It will also include the actual markers, so you'll need to exclude them from the data collection.
Note that this will mash together all the records if you have several, so if you do, you need to store and reset @array somehow.
use strict;
use warnings;
my @array;
while (<DATA>) {
if (/^load balancer$/ .. /^end$/) {
push @array, $_ unless /^(load balancer|end)$/;
}
}
print @array;
__DATA__
load balancer
new
old
good
bad
end
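If the input can hold several load balancer ... end blocks, one possible way to keep them apart is to collect each block into its own array reference. This is only a sketch: the in-memory @lines list stands in for lines already read and chomped from a file, and @records and $current are illustrative names. It relies on the flip-flop's return value, which is a sequence number starting at 1, with E0 appended on the last line of the range:

```perl
use strict;
use warnings;

# Hypothetical input, already chomped.
my @lines = (
    "load balancer", "new", "old", "end",
    "noise",
    "load balancer", "good", "bad", "end",
);

my @records;   # one array reference per load balancer ... end block
my $current;   # lines of the block currently being read

for my $line (@lines) {
    # False outside a block; otherwise 1, 2, 3, ... and finally "NE0".
    my $range = ($line =~ /^load balancer$/ .. $line =~ /^end$/);
    next unless $range;

    if ($range eq '1') {
        $current = [];               # start marker: begin a new record
    }
    elsif ($range =~ /E0\z/) {
        push @records, $current;     # end marker: store the finished record
    }
    else {
        push @$current, $line;       # a line between the markers
    }
}

print scalar(@records), " record(s)\n";
print "  @$_\n" for @records;
```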
You can use the flip-flop operator.
Additionally, you can also use the return value of the flipflop to filter out the boundary lines. The return value is a sequence number (starting with 1) and the last number has the string E0 appended to it.
# Define the marker regexes separately, cuz they're ugly and it's easier
# to read them outside the logic of the loop.
my $start_marker = qr{^ \s* \*\*load \s balancer\*\* \s* $}x;
my $end_marker = qr{^ \s* \*\*end\*\* \s* $}x;
while( <DATA> ) {
# False until the first regex is true.
# Then it's true until the second regex is true.
next unless my $range = /$start_marker/ .. /$end_marker/;
# Flip-flop likes to work with $_, but it's bad form to
# continue to use $_
my $line = $_;
print $line if $range !~ /^1$|E/;
}
__END__
foo
bar
**load balancer**
new
old
good
bad
**end**
baz
biff
Outputs:
new
old
good
bad
If you prefer a command line variation:
perl -ne 'print if m{\*load balancer\*}..m{\*end\*} and !m{\*load|\*end}' file
For files like this, I often use a change in the Record Separator ( $/ or $RS from English )
use English qw<$RS>;
local $RS = "\nend\n";
my $record = <$open_handle>;
When you chomp it, you get rid of that line.
chomp( $record );
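A fuller sketch of the record-separator idea, reading from an in-memory string so it is self-contained. The sample text and the @records name are assumptions for illustration, and it uses the plain $/ variable rather than the English alias:

```perl
use strict;
use warnings;

# Hypothetical file contents with two blocks.
my $text = "load balancer\nnew\nold\nend\nload balancer\ngood\nbad\nend\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "\nend\n";    # each read now returns one whole block
    while (my $record = <$fh>) {
        chomp $record;                        # chomp strips the current $/
        $record =~ s/\Aload balancer\n//;     # drop the start marker line
        push @records, [ split /\n/, $record ];
    }
}
close $fh;

print "@$_\n" for @records;
```

Localizing $/ inside a bare block keeps the changed separator from leaking into other reads later in the program.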

how can i fetch the whole word on the basis of index no of that string in perl

I have a line of text like this:
comments:[I#1278327] is related to office communicator.i fixed the bug to declare it null at first time.
Here I am searching for the index of I# and then I want the whole word, meaning [I#1278327]. I'm doing it like this:
open(READ1,"<letter.txt");
while(<READ1>)
{
if(index($_,"I#")!=-1)
{
$indexof=index($_,"I#");
print $indexof,"\n";
$string=substr($_,$indexof);##i m cutting that string first from index of I# to end then...
$string=substr($string,0,index($string," "));
$lengthof=length($string);
print $lengthof,"\n";
print $string,"\n";
print $_,"\n";
}
}
Is there any API in Perl to find the word length directly after finding the index of I# in that line?
You could do something like:
$indexof=index($_,"I#");
$index2 = index($_,' ',$indexof);
$lengthof = $index2 - $indexof;
However, the bigger issue is you are using Perl as if it were BASIC. A more perlish approach to the task of printing selected lines:
use strict;
use warnings;
open my $read, '<', 'letter.txt' or die "Cannot open letter.txt: $!"; # safer version of open
LINE:
while (<$read>) {
print "$1 - $_" if (/(I#.*?) /);
}
I would use a regex instead, a regex will allow you to match a pattern ("I#") and also capture other data from the string:
$_ =~ m/I#(\d+)/;
The line above will match and set $1 to the number.
See perldoc perlre
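If the whole bracketed word, its position, and its length are all wanted, a regex can supply them in one step. This is a sketch that assumes the token always has the form [I# followed by digits and a closing bracket]; the sample line is taken from the question:

```perl
use strict;
use warnings;

my $line = "comments:[I#1278327] is related to office communicator.";

my ($word, $start, $length);
if ($line =~ /(\[I#\d+\])/) {
    $word   = $1;             # the whole bracketed token
    $start  = $-[1];          # offset where capture group 1 begins
    $length = length $word;   # length of the captured token
    print "$word starts at $start and is $length characters long\n";
}
```

The @- array holds the start offsets of the last successful match, so $-[1] plays the role that index() did in the original code.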