How to search for multiple strings in same line using perl? - perl

I know how to extract a line by searching for a single string in a file inside perl script and below command worked perfectly fine which gave the lines having 255.255.255.255 in it.
my #newList = grep /255.255.255.255/, #File1;
However when I want to search for multiple strings(fields) in a file, grep command is not working
I have below file where if sourceipaddress, destipaddr and port number matches, it should extract the entire line and write into an array
gitFile:
access abc permit tcp sourceipaddress sourcesubnet destipaddr destsubnet eq portnumber
This is the way I have chosen to resolve the issue where I'm splitting based on the fields and searching for those fields in an array using grep but it does not seem to work (Tried 5 different ways which are commented below but none of the commands worked). I just want a way to search multiple strings(which includes ipaddress) in a line. Kindly help as I’m struggling with this as I’m new to perl.
my #columns = split(' ',$line);
my $fld0 = $columns[3];
my $fld3 = $columns[6];
my $fld5 = $columns[9];
#my #gitLines = grep {$_ =~ "$fld0" && $_ =~ "$sIP" && $_ =~ "$dIP" && $_ =~ "$fld5"} #gitFile;
#my #gitLines = #gitFile =~ /$fld0|$sIP|$dIP|$fld5/;
#my #gitLines = grep /$fld0/ && /$sIP/ && /$dIP/ &&/$fld5/,#gitFile;
#grep {$fld0} && {$sIP} && {$dIP} && {$fld5} #gitFile;
#my #gitLines = grep /255.255.255.255/ && /$fld0/, #File1;
I'm trying this in Linux GNU/Linux flavor

Without complete code it is not clear what is going on in your program. I infer by context that $line has a template for a line so you extract patterns from it, and that #gitFile has all lines from a file. Then among those lines you want to identify the ones that have all three patterns.
The first attempt should be written as
my #gitLines = grep { /$fld0/ && /$fld1/ && /$fld2/ } #gitFile;
While you may indeed pick your delimiters, for any other than // there must be the m, so you can have grep { m"$fld0" && .. } (there is no value in explicit $_ as it only adds noise). But I find it only obscuring to use uncommon delimiters in this case.
The second attempt is wrong as you cannot match an array. Also, using the alternation | would match even when only one pattern matches.
Another way is to form a regex to parse the line instead of matching separately on each field
my $re = join '.*?', map { quotemeta } (split ' ', $line)[3,6,9];
my #gitLines = grep { /$re/ } #gitFile;
Regex patterns should be built using qr operator but for the simple .*? pattern a string works.
Here the patterns need be present in a line in the exact order, unlike in the grep above. A clear advantage is that it runs regex over the line once, while in grep the engine starts three times.
Note that it is generally better to process files line-by-line, unless there are specific reasons to read the whole file ahead of time. For example
# $line contains patterns that must all match at indices 3,6,9
my $re = join '.*?', map { quotemeta } (split ' ', $line)[3,6,9];
my #gitLines;
open my $fh, '<', $git_file_name or die "Can't open $git_file_name: $!";
while (<$fh>) {
next if not /$re/;
push #gitLines, $_;
}
More than the efficiency this has the advantage of being more easily maintained.

Basicaly, I believe you are trying to find more than 1 match in a line and you have got each line in an array called #gitFile.
I am trying to do it in a simpler way as per my understanding.
$fld0 = 'pattern1';
$fld1 = 'pattern2';
foreach(#gitFile)
{
if(($_=~ m/$fld0/ && $_ =~ m/$fld1/))
{
push(#gitLines ,$_);
}
}

Related

Remove duplicate lines on file by substring - preserve order (PERL)

i m trying to write a perl script to deal with some 3+ gb text files, that are structured like :
1212123x534534534534xx4545454x232322xx
0901001x876879878787xx0909918x212245xx
1212123x534534534534xx4545454x232323xx
1212133x534534534534xx4549454x232322xx
4352342xx23232xxx345545x45454x23232xxx
I want to perform two operations :
Count the number of delimiters per line and compare it to a static number (ie 5), those lines that exceed said number should be output to a file.control.
Remove duplicates on the file by substring($line, 0, 7) - first 7 numbers, but i want to preserve order. I want the output of that in a file.output.
I have coded this in simple shell script (just bash), but it took too long to process, the same script calling on perl one liners was quicker, but i m interested in a way to do this purely in perl.
The code i have so far is :
open $file_hndl_ot_control, '>', $FILE_OT_CONTROL;
open $file_hndl_ot_out, '>', $FILE_OT_OUTPUT;
# INPUT.
open $file_hndl_in, '<', $FILE_IN;
while ($line_in = <$file_hndl_in>)
{
# Calculate n. of delimiters
my $delim_cur_line = $line_in =~ y/"$delimiter"//;
# print "$commas \n"
if ( $delim_cur_line != $delim_amnt_per_line )
{
print {$file_hndl_ot_control} "$line_in";
}
# Remove duplicates by substr(0,7) maintain order
my substr_in = substr $line_in, 0, 11;
print if not $lines{$substr_in}++;
}
And i want the file.output file to look like
1212123x534534534534xx4545454x232322xx
0901001x876879878787xx0909918x212245xx
1212133x534534534534xx4549454x232322xx
4352342xx23232xxx345545x45454x23232xxx
and the file.control file to look like :
(assuming delimiter control number is 6)
4352342xx23232xxx345545x45454x23232xxx
Could someone assist me? Thank you.
Posting edits : Tried code
my %seen;
my $delimiter = 'x';
my $delim_amnt_per_line = 5;
open(my $fh1, ">>", "outputcontrol.txt");
open(my $fh2, ">>", "outputoutput.txt");
while ( <> ) {
my $count = ($_ =~ y/x//);
print "$count \n";
# print $_;
if ( $count != $delim_amnt_per_line )
{
print fh1 $_;
}
my ($prefix) = substr $_, 0, 7;
next if $seen{$prefix}++;
print fh2;
}
I dont know if i m supposed to post new code in here. But i tried the above, based on your example. What baffles me (i m still very new in perl) is that it doesnt output to either filehandle, but if i redirected from the command line just as you said, it worked perfect. The problem is that i need to output into 2 different files.
It looks like entries with the same seven-character prefix may appear anywhere in the file, so it's necessary to use a hash to keep track of which ones have already been encountered. With a 3GB text file this may result in your perl process running out of memory, in which case a different approach is necessary. Please give this a try and see if it comes in under the bar
The tr/// operator (the same as y///) doesn't accept variables for its character list, so I've used eval to create a subroutine delimiters() that will count the number of occurrences of $delimiter in $_
It's usually easiest to pass the input file as a parameter on the command line, and redirect the output as necessary. That way you can run your program on different files without editing the source, and that's how I've written this program. You should run it as
$ perl filter.pl my_input.file > my_output.file
use strict;
use warnings 'all';
my %seen;
my $delimiter = 'x';
my $delim_amnt_per_line = 5;
eval "sub delimiters { tr/$delimiter// }";
while ( <> ) {
next if delimiters() == $delim_amnt_per_line;
my ($prefix) = substr $_, 0, 7;
next if $seen{$prefix}++;
print;
}
output
1212123x534534534534xx4545454x232322xx
0901001x876879878787xx0909918x212245xx
1212133x534534534534xx4549454x232322xx
4352342xx23232xxx345545x45454x23232xxx

grep in perl without array

If I have one variable : I assigned entire file text to it
$var = `cat file_name`
Suppose in the file , the word 'mine' comes in 17th line (location is not available but just giving example) and I want to search a pattern 'word' after N (eg 10) lines of word 'mine' if pattern 'word' exist in those lines or not. How can i do that in the regular expression without using array'
Example:
$var = "I am good in perl\n but would like to know about the \n grep command in details";
I want to search particular pattern in specific lines (lines 2 to 3 only). How can I do it without using array.
There is a valid case for not using arrays here - when files are prohibitively large.
This is a pretty specific requirement. Rather than beat around the bush to find that Perl idiom, I'd prescribe a subroutine:
sub n_lines_apart {
my ( $file, $n, $first_pattern, $second_pattern ) = #_;
open my $fh, '<', $file or die $!;
my $lines_apart;
while (<$fh>) {
$lines_apart++ if qr/$first_pattern/ .. qr/$second_pattern/;
}
return $lines_apart && $lines_apart <= $n+1;
}
Caveat
The sub above is not designed to handle multiple matches in a single file. Let that be an exercise for the reader.
You can do this with a regular expression match like this:
my $var = `cat $filename`;
while ( $var =~ /foo/g ) {
print $1, "\n";
print "match occurred at position ", pos($var), " in the string.\n";
}
This will print out all the matches of the string 'foo' from your string, similar to grep but not using an array (or list). The /$regexp/g syntax makes the regular expression iteratively match against the string from left to right.
I'd recommend reading perlrequick for a tutorial on matching with regular expressions.
Try this:
perl -ne '$m=$. if !$m && /first-pattern/;
print if $m && ($.-$m >= 2 && $.-$m <= 3) && /second-pattern/'

Matching of data from output table

We need to match certain data element by element that is an output in tabular form obtained on the command prompt.The following is the approach being currently followed wherein the $Var contains the output. Is there an optimal way of doing this without directing the command output to file.
Please share your thoughts.
$Var = "iSCSI Storage LHN StgMgmt Name IP Name
==============================================================
0 Storage_1 15.178.209.194 admin
1 acct-mgmt 15.178.209.194 storage1
2 acct-mgmt2 15.178.209.194 storage2";
#tab = split("\n",$Var);
foreach (#tab) {
next if ($_ !~ /^\d/);
$_ =~ s/\s+//g;
$first=0 if($_ =~ /Storage/i && /15.178.209.194/);
push(#Array, $_); }
$_ =~ /Storage/i && /15.178.209.194/ is silly. That gets broken up like this: ($_ =~ /Storage/i) && (/15.178.209.194/). Either use $_ consistently or don't - the // and s/// operators automatically operate on $_.
Also you should know that in the regex /15.178.209.194/, the .s are being interpreted as any character. Either escape them or use the index() function.
Additionally, I would recommend that you separate each line using split(). This allows you to compare each individual column. You can use split() with a regex like so: #array = split(/\s+/, $string);.
Finally, I'm not really sure what $first is for, but I notice that all three sample lines in that input trigger $first=0 as they all contain that IP and the string "storage".
If I understand you correctly you want to invoke your script like this:
./some_shell_command | perl perl_script.pl
What you want to use is the Perl diamond operator <>:
#!/usr/bin/perl
use strict;
use warnings;
my $first;
my #Array;
for (<>) {
next unless /^\d/;
s/\s+/ /g;
$first = 0 if /Storage/i && /15.178.209.194/;
push(#Array, $_);
}
I've removed the redundant uses of $_ and fixed your substitution, since you probably don't want to remove all spaces.

Perl comparison operation between a variable and an element of an array

I am having quite a bit of trouble with a Perl script I am writing. I want to compare an element of an array to a variable I have to see if they are true. For some reason I cannot seem to get the comparison operation to work correctly. It will either evaluate at true all the time (even when outputting both strings clearly shows they are not the same), or it will always be false and never evaluate (even if they are the same). I have found an example of just this kind of comparison operation on another website, but when I use it it doesn't work. Am I missing something? Is the variable type I take from the file not a string? (Can't be an integer as far as I can tell as it is an IP address).
$ipaddress = '192.43.2.130'
if ($address[0] == ' ')
{
open (FH, "serverips.txt") or die "Crossroads could not find a list of backend servers";
#address = <FH>;
close(FH);
print $address[0];
print $address[1];
}
for ($i = 0; $i < #address; $i++)
{
print "hello";
if ($address[$i] eq $ipaddress)
{print $address[$i];
$file = "server_$i";
print "I got here first";
goto SENDING;}
}
SENDING:
print " I am here";
I am pretty weak in Perl, so forgive me for any rookie mistakes/assumptions I may have made in my very meager bit of code. Thank you for you time.
if ($address[0] == ' ')
{
open (FH, "serverips.txt") or die "Crossroads could not find a list of backend servers";
#address = <FH>;
close(FH);
You have several issues with this code here. First you should use strict because it would tell you that #address is being used before it's defined and you're also using numeric comparison on a string.
Secondly you aren't creating an array of the address in the file. You need to loop through the lines of the file to add each address:
my #address = ();
while( my $addr = <FH> ) {
chomp($addr); # removes the newline character
push(#address, $addr);
}
However you really don't need to push into an array at all. Just loop through the file and find the IP. Also don't use goto. That's what last is for.
while( my $addr = <FH> ) {
chomp($addr);
if( $addr eq $ipaddress ) {
$file = "server_$i";
print $addr,"\n";
print "I got here first"; # not sure what this means
last; # breaks out of the loop
}
}
When you're reading in from a file like that, you should use chomp() when doing a comparison with that line. When you do:
print $address[0];
print $address[1];
The output is on two separate lines, even though you haven't explicitly printed a newline. That's because $address[$i] contains a newline at the end. chomp removes this.
if ($address[$i] eq $ipaddress)
could read
my $currentIP = $address[$i];
chomp($currentIP);
if ($currentIP eq $ipaddress)
Once you're familiar with chomp, you could even use:
chomp(my $currentIP = $address[$i]);
if ($currentIP eq $ipaddress)
Also, please replace the goto with a last statement. That's perl's equivalent of C's break.
Also, from your comment on Jack's answer:
Here's some code you can use for finding how long it's been since a file was modified:
my $secondsSinceUpdate = time() - stat('filename.txt')->mtime;
You probably are having an issue with newlines. Try using chomp($address[$i]).
First of all, please don't use goto. Every time you use goto, the baby Jesus cries while killing a kitten.
Secondly, your code is a bit confusing in that you seem to be populating #address after starting the if($address[0] == '') statement (not to mention that that if should be if($address[0] eq '')).
If you're trying to compare each element of #address with $ipaddress for equality, you can do something like the following
Note: This code assumes that you've populated #address.
my $num_matches=0;
foreach(#address)
{
$num_matches++ if $_ eq $ipaddress;
}
if($num_matches)
{
#You've got a match! Do something.
}
else
{
#You don't have any matches. This may or may not be bad. Do something else.
}
Alternatively, you can use the grep operator to get any and all matches from #address:
my #matches=grep{$_ eq $ipaddress}#address;
if(#matches)
{
#You've got matches.
}
else
{
#Sorry, no matches.
}
Finally, if you're using a version of Perl that is 5.10 or higher, you can use the smart match operator (ie ~~):
if($ipaddress~~#address)
{
#You've got a match!
}
else
{
#Nope, no matches.
}
When you read from a file like that you include the end-of-line character (generally \n) in each element. Use chomp #address; to get rid of it.
Also, use last; to exit the loop; goto is practically never needed.
Here's a rather idiomatic rewrite of your code. I'm excluding some of your logic that you might need, but isn't clear why:
$ipaddress = '192.43.2.130'
open (FH, "serverips.txt") or die "Crossroads could not find a list of backend servers";
while (<FH>) { # loop over the file, using the default input space
chomp; # remove end-of-line
last if ($_ eq $ipaddress); # a RE could easily be used here also, but keep the exact match
}
close(FH);
$file = "server_$."; # $. is the line number - it's not necessary to keep track yourself
print "The file is $file\n";
Some people dislike using perl's implicit variables (like $_ and $.) but they're not that hard to keep track of. perldoc perlvar lists all these variables and explains their usage.
Regarding the exact match vs. "RE" (regular expression, or regexp - see perldoc perlre for lots of gory details) -- the syntax for testing a RE against the default input space ($_) is very simple. Instead of
last if ($_ eq $ipaddress);
you could use
last if (/$ipaddress/);
Although treating an ip address as a regular expression (where . has a special meaning) is probably not a good idea.

How do I remove a a list of character sequences from the beginning of a string in Perl?

I have to read lines from a file and store them into a hash in Perl. Many of these lines have special character sequences at the beginning that I need to remove before storing. These character sequences are
| || ### ## ##||
For example, if it is ||https://ads, I need to get https://ads; if ###http, I need to get http.
I need to exclude these character sequences. I want to do this by having all the character sequences to exclude in a array and then check if the line starts with these character sequences and remove those. What is a good way to do this?
I've gone as far as:
our $ad_file = "C:/test/list.txt";
our %ads_list_hash = ();
my $lines = 0;
# List of lines to ignore
my #strip_characters = qw /| || ### ## ##||/;
# Create a list of substrings in the easylist.txt file
open my $ADS, '<', $ad_file or die "can't open $ad_file";
while(<$ADS>) {
chomp;
$ads_list_hash{$lines} = $_;
$lines ++;
}
close $ADS;
I need to add the logic to remove the #strip_characters from the beginning of each line if any of them are present.
Probably a bit too complex and general for the task, but still..
my $strip = join "|", map {quotemeta} #strip_characters;
# avoid bare [] etc. in the RE
# ... later, in the while()
s/^(?:$strip)+//o;
# /o means "compile $strip into the regex once and for all"
Why don't you do it with a regex? Something like
$line =~ s/^[## |]+//;
should work.
If you want to remove a list of characters (according to your title), then a very simple regular expression will work.
Within the loop, add the following regular expression
while( <$ADS> ) {
chomp;
s/^[## \|]+//;
$ads_list_hash{$lines++} = $_;
}
Note the pipe charachter ('|') is escapted.
However, it appears that you want to remove a list of expressions. You can do the following
while( <$ADS> ) {
chomp;
s/^((\|)|(\|\|)|(###)|(##)|(##\|\|))+//;
$add_list_hash{$lines++} = $_;
}
You said that the list of expression is stored in an array or words. In your sample code, you create this array with 'qw'. If the list of expressions isn't known at compile time, you can build a regular expression in a variable, and use it.
my #strip_expression = ... // get an array of strip expressions
my $re = '^((' . join(')|(',#strip_expression) . '))+';
and then, use the following statement in the loop:
s/$re//;
Finaly, one thing not related to the question can be said about the code: It would be much more appropriate to use Array instead of Hash, to map an integer to a set of strings. Unless you have some other requirement, better have:
our #ads_list; // no need to initialize the array (or the hash) with empty list
...
while( <$ADS> ) {
chomp;
s/.../;
push #ads_list, $_;
}
$ads_list_hash{$lines} = $_;
$lines ++;
Don't do that. If you want an array, use an array:
push #ads_lines, $_;
Shawn's Rule of Programming #7: When creating data structures: if preserving the order is important, use an array; otherwise use a hash.
Because substitutions return whether or not they did anything you can use a
substitution to search the string for your pattern and remove it if it's there.
while( <$ADS> ) {
next unless s/^\s*(?:[#]{2,3}|(?:##)?[|]{1,2})\s*//;
chomp;
$ads_list_hash{$lines} = $_;
$lines ++;
}