grep in perl without array - perl

I have a variable that holds the entire text of a file:
$var = `cat file_name`;
Suppose the word 'mine' appears on line 17 of the file (the location is not known in advance; this is just an example). I want to check whether the pattern 'word' occurs within the N (e.g. 10) lines that follow 'mine'. How can I do that with a regular expression, without using an array?
Example:
$var = "I am good in perl\n but would like to know about the \n grep command in details";
I want to search particular pattern in specific lines (lines 2 to 3 only). How can I do it without using array.

There is a valid case for not using arrays here - when files are prohibitively large.
This is a pretty specific requirement. Rather than beat around the bush to find that Perl idiom, I'd prescribe a subroutine:
sub n_lines_apart {
my ( $file, $n, $first_pattern, $second_pattern ) = @_;
open my $fh, '<', $file or die $!;
my $lines_apart;
while (<$fh>) {
$lines_apart++ if /$first_pattern/ .. /$second_pattern/; # match against $_; a bare qr// object is always true
}
return $lines_apart && $lines_apart <= $n+1;
}
Caveat
The sub above is not designed to handle multiple matches in a single file. Let that be an exercise for the reader.

You can do this with a regular expression match like this:
my $var = `cat $filename`;
while ( $var =~ /(foo)/g ) { # the parentheses capture the match into $1
print $1, "\n";
print "match occurred at position ", pos($var), " in the string.\n";
}
This will print out all the matches of the string 'foo' from your string, similar to grep but not using an array (or list). The /$regexp/g syntax makes the regular expression iteratively match against the string from left to right.
I'd recommend reading perlrequick for a tutorial on matching with regular expressions.

Try this:
perl -ne '$m=$. if !$m && /first-pattern/;
print if $m && ($.-$m >= 2 && $.-$m <= 3) && /second-pattern/'
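For the in-memory case from the question, where the whole file is already in $var, a single regular expression can do the check without building an array. This is only a minimal sketch: the sub name found_within and its argument order are made up for illustration, and it assumes "within N lines" means on one of the N lines after the line that contains the first pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $second matches on one of the $n lines
# that follow a line matching $first. Works on a string; no array is built.
sub found_within {
    my ( $text, $first, $second, $n ) = @_;
    return 0 if $n < 1;
    my $skip = $n - 1;    # at most this many whole lines may be skipped

    # "." does not match "\n" without /s, so each ".*\n" consumes one line
    return $text =~ /(?:$first).*\n(?:.*\n){0,$skip}?.*(?:$second)/ ? 1 : 0;
}

# Made-up sample text: 'mine' is on line 2, 'word' on line 5
my $var = join "\n", 'a', 'mine here', 'b', 'c', 'word here', 'd';
print found_within( $var, 'mine', 'word', 3 ), "\n";
print found_within( $var, 'mine', 'word', 2 ), "\n";
```

Both patterns are interpolated as regular expressions, so literal text with metacharacters would need quotemeta first.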

Related

How to search for multiple strings in same line using perl?

I know how to extract a line by searching for a single string in a file inside perl script and below command worked perfectly fine which gave the lines having 255.255.255.255 in it.
my @newList = grep /255.255.255.255/, @File1;
However when I want to search for multiple strings(fields) in a file, grep command is not working
I have below file where if sourceipaddress, destipaddr and port number matches, it should extract the entire line and write into an array
gitFile:
access abc permit tcp sourceipaddress sourcesubnet destipaddr destsubnet eq portnumber
This is the way I have chosen to resolve the issue where I'm splitting based on the fields and searching for those fields in an array using grep but it does not seem to work (Tried 5 different ways which are commented below but none of the commands worked). I just want a way to search multiple strings(which includes ipaddress) in a line. Kindly help as I’m struggling with this as I’m new to perl.
my @columns = split(' ',$line);
my $fld0 = $columns[3];
my $fld3 = $columns[6];
my $fld5 = $columns[9];
#my @gitLines = grep {$_ =~ "$fld0" && $_ =~ "$sIP" && $_ =~ "$dIP" && $_ =~ "$fld5"} @gitFile;
#my @gitLines = @gitFile =~ /$fld0|$sIP|$dIP|$fld5/;
#my @gitLines = grep /$fld0/ && /$sIP/ && /$dIP/ &&/$fld5/,@gitFile;
#grep {$fld0} && {$sIP} && {$dIP} && {$fld5} @gitFile;
#my @gitLines = grep /255.255.255.255/ && /$fld0/, @File1;
I'm running this on GNU/Linux.
Without complete code it is not clear what is going on in your program. I infer by context that $line has a template for a line so you extract patterns from it, and that @gitFile has all lines from a file. Then among those lines you want to identify the ones that have all three patterns.
The first attempt should be written as
my @gitLines = grep { /$fld0/ && /$fld1/ && /$fld2/ } @gitFile;
While you may indeed pick your delimiters, for any other than // there must be the m, so you can have grep { m"$fld0" && .. } (there is no value in explicit $_ as it only adds noise). But I find it only obscuring to use uncommon delimiters in this case.
The second attempt is wrong as you cannot match an array. Also, using the alternation | would match even when only one pattern matches.
Another way is to form a regex to parse the line instead of matching separately on each field
my $re = join '.*?', map { quotemeta } (split ' ', $line)[3,6,9];
my @gitLines = grep { /$re/ } @gitFile;
Regex patterns should be built using qr operator but for the simple .*? pattern a string works.
Here the patterns need be present in a line in the exact order, unlike in the grep above. A clear advantage is that it runs regex over the line once, while in grep the engine starts three times.
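The difference between the two approaches can be seen on a pair of sample lines. This is a sketch with made-up addresses; $sIP, $dIP and $port simply stand in for the extracted fields, and the values are not from the original post.

```perl
use strict;
use warnings;

# Made-up sample lines; the second one has the two addresses swapped
my @gitFile = (
    'access abc permit tcp 10.0.0.1 255.255.255.0 10.0.0.2 255.255.255.0 eq 80',
    'access abc permit tcp 10.0.0.2 255.255.255.0 10.0.0.1 255.255.255.0 eq 80',
);
my ( $sIP, $dIP, $port ) = ( '10.0.0.1', '10.0.0.2', '80' );

# Separate matches: every field must appear, in any order.
# \Q...\E keeps the dots in the addresses literal.
my @any_order = grep { /\Q$sIP\E/ && /\Q$dIP\E/ && /\Q$port\E/ } @gitFile;

# One joined regex: the fields must appear in exactly this order
my $re = join '.*?', map { quotemeta($_) } ( $sIP, $dIP, $port );
my @in_order = grep { /$re/ } @gitFile;

print scalar(@any_order), " line(s) match in any order, ",
    scalar(@in_order), " in the given order\n";
```

Note that both versions match substrings, so a port of 80 would also match 8080; anchoring with \b or whitespace may be needed for real data.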
Note that it is generally better to process files line-by-line, unless there are specific reasons to read the whole file ahead of time. For example
# $line contains patterns that must all match at indices 3,6,9
my $re = join '.*?', map { quotemeta } (split ' ', $line)[3,6,9];
my @gitLines;
open my $fh, '<', $git_file_name or die "Can't open $git_file_name: $!";
while (<$fh>) {
next if not /$re/;
push @gitLines, $_;
}
More than the efficiency this has the advantage of being more easily maintained.
Basically, I believe you are trying to find more than one match in a line, and you have each line in an array called @gitFile.
I am trying to do it in a simpler way as per my understanding.
$fld0 = 'pattern1';
$fld1 = 'pattern2';
foreach(@gitFile)
{
if(($_=~ m/$fld0/ && $_ =~ m/$fld1/))
{
push(@gitLines ,$_);
}
}

Perl: Find a match, remove the same lines, and to get the last field

Being a Perl newbie, please pardon me for asking this basic question.
I have a text file @server1 that shows a bunch of sentences (white space is the field separator) on many lines in the file.
I needed to match lines with my keyword, remove the same lines, and extract only the last field, so I have tried with:
my @allmatchedlines;
open(output1, "ssh user1@server1 cat /tmp/myfile.txt |");
while(<output1>) {
chomp;
@allmatchedlines = $_ if /mysearch/;
}
close(output1);
my @uniqmatchedline = split(/ /, @allmatchedlines);
my $lastfield = $uniqmatchedline[-1]\n";
print "$lastfield\n";
and it gives me the output showing:
1
I don't know why it's giving me just "1".
Could someone please explain why I'm getting "1" and how I can get the last field of the matched line correctly?
Thank you!
my @uniqmatchedline = split(/ /, @allmatchedlines);
You're getting "1" because split takes a scalar, not an array. An array in scalar context returns the number of elements.
You need to split on each individual line. Something like this:
my @uniqmatchedline = map { split(/ /, $_) } @allmatchedlines;
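The effect is easy to reproduce in isolation. A small sketch with a made-up sample line (the variable names @wrong and @right are only for illustration):

```perl
use strict;
use warnings;

my @allmatchedlines = ('alpha beta gamma');    # made-up sample line

# split evaluates its second argument in scalar context,
# so the array turns into its element count before splitting
my @wrong = split / /, @allmatchedlines;

# Splitting each element individually gives the actual fields
my @right = map { split ' ', $_ } @allmatchedlines;

print "wrong: @wrong\n";
print "last field: $right[-1]\n";
```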
There are two issues with your code:
split is expecting a scalar value (string) to split on; if you are passing an array, it will convert the array to scalar (which is just the array length)
You did not have a way to remove same lines
To address these, the following code should work (not tested as no data):
my @allmatchedlines;
open(output1, "ssh user1\@server1 cat /tmp/myfile.txt |");
while(<output1>) {
chomp;
push @allmatchedlines, $_ if /mysearch/; # collect every match instead of overwriting the array
}
close(output1);
my %existing;
my @uniqmatchedline = grep !$existing{$_}++, @allmatchedlines; #this will return the unique lines
my @lastfields = map { ((split / /, $_)[-1]) . "\n" } @uniqmatchedline ; #this maps the last field in each line into an array
print for @lastfields;
Apart from two errors in the code, I find the statement "remove the same lines and extract only the last field" unclear. Once duplicate matching lines are removed, there may still be multiple distinct sentences with the pattern.
Until a clarification comes, here is code that picks the last field from the last such sentence.
use warnings 'all';
use strict;
use List::MoreUtils qw(uniq);
my $file = '/tmp/myfile.txt';
my $cmd = "ssh user1\@server1 cat $file";
open my $fh, '-|', $cmd or die "Error opening $cmd: $!";
my @allmatchedlines;
while (<$fh>) {
chomp;
push @allmatchedlines, $_ if /mysearch/;
}
close $fh;
my @unique_matched_lines = uniq @allmatchedlines;
my $lastfield = ( split ' ', $unique_matched_lines[-1] )[-1];
print $lastfield, "\n";
I changed to the three-argument open, with error checking. Recall that open for a process involves a fork and returns the pid, so an "error" here doesn't relate to what happened with the command itself. See open. Also note that @ inside "..." indicates an array and thus needs to be escaped.
The (default) pattern ' ' used in split splits on any amount of whitespace. The regex / / turns off this behavior and splits on a single space. You most likely want to use ' '.
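A quick comparison of the two split patterns, as a sketch on a made-up string with leading, doubled and trailing spaces:

```perl
use strict;
use warnings;

my $line = '  a  b ';    # made-up input

# ' ' skips leading whitespace and splits on runs of whitespace
my @by_whitespace = split ' ', $line;

# / / splits on every single space, so empty fields appear
# (trailing empty fields are still dropped)
my @by_space = split / /, $line;

print scalar(@by_whitespace), " fields with ' ', ",
    scalar(@by_space), " fields with / /\n";
```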
For more comments please see the original post below.
The statement @allmatchedlines = $_ if /mysearch/; on every iteration assigns to the array, overwriting whatever has been in it. So you end up with only the last line that matched mysearch. You want push @allmatchedlines, $_ ... to get all those lines.
Also, as shown in the answer by Justin Schell, split needs a scalar so it is taking the length of @allmatchedlines, which is 1 as explained above. You should have
my @words_in_matched_lines = map { split } @allmatchedlines;
When all this is straightened out, you'll have words in the array @uniqmatchedline and if that is the intention then its name is misleading.
To get unique elements of the array you can use the module List::MoreUtils
use List::MoreUtils qw(uniq);
my @unique_elems = uniq @whole_array;

How to use "or " operator to assign several values to a variable?

I'm new with perl.
I would like to say that a variable could take 2 values, then I call it from another function.
I tried:
my(@file) = <${dirname}/*.txt || ${dirname}/*.xml> ;
but this seems not working for the second value, any suggestions?
When using the <*> operator as a fileglob operator, you can use any common glob pattern. Available patterns are
* (any number of any characters),
? (any single character),
{a,b,c} (any of the a, b or c patterns),
So you could do
my @file = glob "$dirname/*.{txt,xml}";
or
my @file = (glob("$dirname/*.txt"), glob("$dirname/*.xml"));
or
my @file = glob "$dirname/*.txt $dirname/*.xml";
as the glob pattern is split at whitespace into subpatterns
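To try the brace pattern without touching real data, here is a sketch that builds a scratch directory with File::Temp. The file names a.txt, b.xml and c.log are made up, and it assumes the temporary directory path contains no whitespace (glob would otherwise split the pattern there).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

# Scratch directory with three empty files (names are illustrative)
my $dirname = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.xml c.log)) {
    open my $fh, '>', "$dirname/$name" or die "Can't create $name: $!";
    close $fh;
}

# One glob call picks up both extensions and ignores c.log
my @file = glob "$dirname/*.{txt,xml}";
my @names = sort map { basename($_) } @file;
print "@names\n";
```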
If I understood correctly, you want @files to fall back on the second option (*.xml) if no *.txt files are found.
If so, your syntax is close, but || evaluates its left operand in scalar context, where a glob returns only one file name at a time. Assign the first list and fall back when it is empty:
my @files = glob( "$dirname/*.txt" );
@files = glob( "$dirname/*.xml" ) unless @files;
Also, it's a good idea to check @files to make sure it's populated (what if you don't have any *.txt or *.xml?)
warn 'No @files' unless @files;
my (@file) = (<${dirname}/*.txt>, <${dirname}/*.xml>);
Each <> expands to a list of file names, so you are essentially doing my @file = (@array1, @array2). This will iterate first through the txt files and then through the xml files. The parentheses on the right are required: without them the assignment binds tighter than the comma and only the txt files are stored.
This works
my $file = $val1 || $val2;
What it means is: set $file to $val1, but if $val1 is false (0, the empty string, or undef), set $file to $val2.
In essence, surrounding a variable with < > means either
1) treat it as a filehandle ( for example $read=<$filehandle> )
2) use it as a shell glob (for example @files=<*.xml> )
Looks to me like you wish to interpolate the value $dirname and add either .txt or .xml on the end. The < > will not achieve this
If you wish to send two values to a function then this might be what you want
my @file=("$dirname.txt","$dirname.xml");
then call the function with @file, i.e. myfunction(@file)
In the function
sub myfunction {
my $file1=shift;
my $file2=shift;
# ... use $file1 and $file2 here
}
All this stuff is covered in perldocs perlsub and perldata
Have fun

how can i fetch the whole word on the basis of index no of that string in perl

I have one string of line like
comments:[I#1278327] is related to office communicator.i fixed the bug to declare it null at first time.
Here I am searching for the index of I#, then I want the whole word, i.e. [I#1278327]. I'm doing it like this:
open(READ1,"<letter.txt");
while(<READ1>)
{
if(index($_,"I#")!=-1)
{
$indexof=index($_,"I#");
print $indexof,"\n";
$string=substr($_,$indexof);##i m cutting that string first from index of I# to end then...
$string=substr($string,0,index($string," "));
$lengthof=length($string);
print $lengthof,"\n";
print $string,"\n";
print $_,"\n";
}
}
Is there any API in Perl to find the word length directly after finding the index of I# in that line?
You could do something like:
$indexof=index($_,"I#");
$index2 = index($_,' ',$indexof);
$lengthof = $index2 - $indexof;
However, the bigger issue is you are using Perl as if it were BASIC. A more perlish approach to the task of printing selected lines:
use strict;
use warnings;
open my $read, '<', 'letter.txt' or die "Can't open letter.txt: $!"; # safer version of open
LINE:
while (<$read>) {
print "$1 - $_" if (/(I#.*?) /);
}
I would use a regex instead, a regex will allow you to match a pattern ("I#") and also capture other data from the string:
$_ =~ m/I#(\d+)/;
The line above will match and set $1 to the number.
See perldoc perlre
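If the index and length from the question are still needed, a single match can provide them along with the word itself. A sketch on the sample line from the question; the variable names are reused from the question, and the @- array (see perldoc perlvar) holds the start offsets of the last successful match:

```perl
use strict;
use warnings;

my $line = 'comments:[I#1278327] is related to office communicator.';

my ( $word, $number, $indexof );
if ( $line =~ /(\[I#(\d+)\])/ ) {
    ( $word, $number ) = ( $1, $2 );    # whole bracketed word, and just the digits
    $indexof = $-[1];                   # offset where capture group 1 starts
}
print "$word starts at $indexof, length ", length($word), "\n";
```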

perl split on empty file

I have basically the following perl I'm working with:
open I,$coupon_file or die "Error: File $coupon_file will not Open: $! \n";
while (<I>) {
$lctr++;
chomp;
my @line = split/,/;
if (!@line) {
print E "Error: $coupon_file is empty!\n\n";
$processFile = 0; last;
}
}
I'm having trouble determining what the split/,/ function is returning if an empty file is given to it. The code block if (!@line) is never being executed. If I change that to be
if (@line)
then the code block is executed. I've read information on the perl split function over at
http://perldoc.perl.org/functions/split.html and the discussion here about testing for an empty array but not sure what is going on here.
I am new to Perl so am probably missing something straightforward here.
If the file is empty, the while loop body will not run at all.
Evaluating an array in scalar context returns the number of elements in the array.
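split /,/ returns an empty list when $_ is an empty string (or contains nothing but commas, since trailing empty fields are dropped); otherwise it returns a list of one or more elements.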
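How split treats empty input is easy to check directly. A quick sketch (the variable names are only for illustration); per perldoc -f split, an empty string always yields zero fields, and trailing empty fields are dropped unless a negative limit is given:

```perl
use strict;
use warnings;

my @from_empty  = split /,/, '';         # empty string: no fields at all
my @from_commas = split /,/, ',,,';      # only empty fields, all dropped
my @from_text   = split /,/, 'abc';      # no comma: the whole string is one field
my @keep_all    = split /,/, ',,,', -1;  # negative limit keeps trailing empty fields

print join( ' ',
    scalar(@from_empty), scalar(@from_commas),
    scalar(@from_text),  scalar(@keep_all) ), "\n";
```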
You might try some debugging:
...
chomp;
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper( { "line is" => $_ } );
my @line = split/,/;
print Dumper( { "split into" => \@line } );
if (!@line) {
...
Below are a few tips to make your code more idiomatic:
The special variable $. already holds the current line number, so you can likely get rid of $lctr.
Are empty lines really errors, or can you ignore them?
Pull apart the list returned from split and give the pieces names.
Let Perl do the opening with the "diamond operator":
The null filehandle <> is special: it can be used to emulate the behavior of sed and awk. Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(@ARGV, '-') unless @ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn't so cumbersome to say, and will actually work.
Say your input is in a file named input and contains
Campbell's soup,0.50
Mac & Cheese,0.25
Then with
#! /usr/bin/perl
use warnings;
use strict;
die "Usage: $0 coupon-file\n" unless @ARGV == 1;
while (<>) {
chomp;
my($product,$discount) = split /,/;
next unless defined $product && defined $discount;
print "$product => $discount\n";
}
that we run as below on Unix:
$ ./coupons input
Campbell's soup => 0.50
Mac & Cheese => 0.25
Empty file or empty line? Regardless, try this test instead of !@line.
if (scalar(@line) == 0) {
...
}
The scalar function forces scalar context, in which an array returns its length.
Some clarification:
if (@line) {
}
Is the same as:
if (scalar(@line)) {
}
In a scalar context, arrays (@line) return the length of the array. So scalar(@line) forces @line to evaluate in a scalar context and returns the length of the array.
I'm not sure whether you're trying to detect if the line is empty (which your code is trying to) or whether the whole file is empty (which is what the error says).
If the line, please fix your error text and the logic should be like the other posters said (or you can put if ($line =~ /^\s*$/) as your if).
If the file, you simply need to test if (!$lctr) {} after the end of your loop - as noted in another answer, the loop will not be entered if there are no lines in the file.
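The whole-file check can be sketched as follows. The temporary file stands in for an empty coupon file, and the -z file test is an additional option not mentioned above; $lctr and $coupon_file are reused from the question, the other names are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# An empty temporary file stands in for the coupon file
my ( $tmp, $coupon_file ) = tempfile( UNLINK => 1 );
close $tmp;

open my $in, '<', $coupon_file or die "Can't open $coupon_file: $!";
my $lctr = 0;
while (<$in>) {
    $lctr++;    # never reached for an empty file
}
close $in;

my $no_lines  = !$lctr;                     # check after the loop, not inside it
my $zero_size = -z $coupon_file ? 1 : 0;    # file test: exists with zero size

print "Error: $coupon_file is empty!\n" if $no_lines;
```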