how to count a repeating string in a line using perl - perl

I have the below file
file1:
abc def host 123 host 869 host
I wrote below script to count the occurrence of a "host" keyword in each line.
I tried all the ways(refer the ones which are commented) still it does not seem to work. sed command worked in command line but not inside the perl script
#!/usr/bin/perl
open(SOURCE,"</home/amp/surevy01/file1");
open(DESTINATION,"</home/amp/surevy01/file2");
while(my $line = <SOURCE>)
{
while(my $line1 = <DESTINATION>)
{
#chomp($line);
#chomp($line1);
if ($line =~ "host")
{
#my $count = grep {host} $line;
#my $count = `sed -i {s/host/host\n/g} $line1 | grep -c {host}`;
#my $count = `perl -pi -e 's/host/host\n/g' $line1 | grep -c host`;
#my $count grep ("host" ,$line);
print "$count";
print "match found \n";
next;
}
else
{
print "match not found \n";
exit;
}
}
}
I'm a beginner to perl. Looking for your valuable suggestions

Your own solution will match instances like hostages and Shostakovich
grep is the canonical way to count elements of a list, and split will turn your line into a list of words, giving
my $count = grep { $_ eq 'host' } split ' ', $line

I don't know why you're looping through two files in your example, but you can use the /g (global) flag:
my $line = "abc def host 123 host 869 host";
my $x = 0;
while ($line =~ /host/g){
$x++;
}
print "$x\n"; # 3
When you run a regex with /g in scalar context (as is the conditional in the while statement), it will keep track of the location of the last match and restart from there. Therefore, /host/g in a loop as above will find each occurence of host. You can also use the /g in list contexts:
my $line = "abc def host 123 host 869 host";
my #matches = $contents =~ /host/g;
print scalar #matches; # 3 again
In this case, #matches will contain all matches of the regexp against the string, which will be ('host', 'host', 'host') since the query is a simple string. Then, scalar(#matches) will yield the length of the list.

This produces the number of instances of host in $line:
my $count = () = $line =~ /host/g;
But that also matches hosting. To avoid that, the following will probably do the trick:
my $count = () = $line =~ /\bhost\b/g;
=()= this is called Perl secret Goatse operator. More info

Related

zcat working in command line but not in perl script

Here is a part of my script:
foreach $i ( #contact_list ) {
print "$i\n";
$e = "zcat $file_list2| grep $i";
print "$e\n";
$f = qx($e);
print "$f";
}
$e prints properly but $f gives a blank line even when $file_list2 has a match for $i.
Can anyone tell me why?
Always is better to use Perl's grep instead of using pipe :
#lines = `zcat $file_list2`; # move output of zcat to array
die('zcat error') if ($?); # will exit script with error if zcat is problem
# chomp(#lines) # this will remove "\n" from each line
foreach $i ( #contact_list ) {
print "$i\n";
#ar = grep (/$i/, #lines);
print #ar;
# print join("\n",#ar)."\n"; # in case of using chomp
}
Best solution is not calling zcat, but using zlib library :
http://perldoc.perl.org/IO/Zlib.html
use IO::Zlib;
# ....
# place your defiiniton of $file_list2 and #contact list here.
# ...
$fh = new IO::Zlib; $fh->open($file_list2, "rb")
or die("Cannot open $file_list2");
#lines = <$fh>;
$fh->close;
#chomp(#lines); #remove "\n" symbols from lines
foreach $i ( #contact_list ) {
print "$i\n";
#ar = grep (/$i/, #lines);
print (#ar);
# print join("\n",#ar)."\n"; #in case of using chomp
}
Your question leaves us guessing about many things, but a better overall approach would seem to be opening the file just once, and processing each line in Perl itself.
open(F, "zcat $file_list |") or die "$0: could not zcat: $!\n";
LINE:
while (<F>) {
######## FIXME: this could be optimized a great deal still
foreach my $i (#contact_list) {
if (m/$i/) {
print $_;
next LINE;
}
}
}
close (F);
If you want to squeeze out more from the inner loop, compile the regexes from #contact_list into a separate array before the loop, or perhaps combine them into a single regex if all you care about is whether one of them matched. If, on the other hand, you want to print all matches for one pattern only at the end when you know what they are, collect matches into one array per search expression, then loop them and print when you have grepped the whole set of input files.
Your problem is not reproducible without information about what's in $i, but I can guess that it contains some shell metacharacter which causes it to be processed by the shell before the grep runs.

Is there a better way to count occurrence of char in a string?

I felt there must a better way to count occurrence instead of writing a sub in perl, shell in Linux.
#/usr/bin/perl -w
use strict;
return 1 unless $0 eq __FILE__;
main() if $0 eq __FILE__;
sub main{
my $str = "ru8xysyyyyyyysss6s5s";
my $char = "y";
my $count = count_occurrence($str, $char);
print "count<$count> of <$char> in <$str>\n";
}
sub count_occurrence{
my ($str, $char) = #_;
my $len = length($str);
$str =~ s/$char//g;
my $len_new = length($str);
my $count = $len - $len_new;
return $count;
}
If the character is constant, the following is best:
my $count = $str =~ tr/y//;
If the character is variable, I'd use the following:
my $count = length( $str =~ s/[^\Q$char\E]//rg );
I'd only use the following if I wanted compatibility with versions of Perl older than 5.14 (as it is slower and uses more memory):
my $count = () = $str =~ /\Q$char/g;
The following uses no memory, but might be a bit slow:
my $count = 0;
++$count while $str =~ /\Q$char/g;
Counting the occurences of a character in a string can be performed with one line in Perl (as compared to your 4 lines). There is no need for a sub (although there is nothing wrong with encapsulating functionality in a sub). From perlfaq4 "How can I count the number of occurrences of a substring within a string?"
use warnings;
use strict;
my $str = "ru8xysyyyyyyysss6s5s";
my $char = "y";
my $count = () = $str =~ /\Q$char/g;
print "count<$count> of <$char> in <$str>\n";
In a beautiful* Bash/Coreutils/Grep one-liner:
$ str=ru8xysyyyyyyysss6s5s
$ char=y
$ fold -w 1 <<< "$str" | grep -c "$char"
8
Or maybe
$ grep -o "$char" <<< "$str" | wc -l
8
The first one works only if the substring is just one character long; the second one works only if the substrings are non-overlapping.
* Not really.
toolic has given a correct answer, but you might consider not hardcoding your values to make the program reusable.
use strict;
use warnings;
die "Usage: $0 <text> <characters>" if #ARGV < 1;
my $search = shift; # the string you are looking for
my $str; # the input string
if (#ARGV && -e $ARGV[0] || !#ARGV) { # if str is file, or there is no str
local $/; # slurp input
$str = <>; # use diamond operator
} else { # else just use the string
$str = shift;
}
my $count = () = $str =~ /\Q$search\E/gms;
print "Found $count of '$search' in '$str'\n";
This will allow you to use the program to count for the occurrence of a character, or a string, inside a string, a file, or standard input. For example:
count.pl needles haystack.txt
some_process | count.pl foo
count.pl x xyzzy

Perl: Grep unique value

Basically I wanted to emulate the piped grep operation as we do in shell script, (grep pattern1 |grep pattern2) in my Perl code to make the result unique.
Below code is working, bust just wanted to know this is the right approach. Please note, I don't want to introduce a inner loop here, just for the grep part.
foreach my $LINE ( #ARRAY1 ) {
#LINES = split /\s+/, $LINE;
#RESULT= grep ( /$LINES[0]/, ( grep /$LINES[1]/, #ARRAY2 ) );
...
This is basically same thing what you're doing, "for every #ARRAY2 element, check whether it matches ALL elements from #LINES" (stop as soon as any of the #LINES element does not match),
use List::Util "none";
my #RESULT= grep { my $s = $_; none { $s !~ /$_/ } #LINES } #ARRAY2;
# index() is faster for literal values
my #RESULT= grep { my $s = $_; none { index($s, $_) <0 } #LINES } #ARRAY2;
There is no need to cascade calls to grep -- you can simply and the conditions together
It's also worth saying that you should be using lower-case letters for your identifiers, and split /\s+/ should almost always be split ' '
Here's what I would write
for my $line ( #array1 ) {
my #fields = split ' ', $line;
my #result = grep { /$fields[0]/ and /$fields[1] } #array2;
...
}
There are different ways to grep/extract unique values from array in perl.
##2) Best of all
my %hash = map { $_ , 1 } #array;
my #uniq = keys %hash;
print "\n Uniq Array:", Dumper(\#uniq);
##3) Costly process as it involves 'greping'
my %saw;
my #out = grep(!$saw{$_}++, #array);
print "\n Uniq Array: #out \n";

How to grep read file in perl

I would like to read a file in perl, after, the user can input any string and grep will try to find the string inputted in the file read. It will only exit when the user input nothing or any space character. Here's my code which is not working:
#! usr/bin/perl
use warnings;
use strict;
open MATCHSTRING,"matchstring";
my #lines = <MATCHSTRING>;
while (<>) {
chomp;
my #match = grep {/\b$_\b/s} #lines;
print #match;
}
I'm still lacking the condition where it will exit once nothing is inputted or a newline or any space character.
while (<>)
means
while (defined($_ = <>))
so need to press Ctrl-D (unix) or Ctrl-Z, Enter (Windows) to signal end of input. Or you could add a check for a blank line:
while (<>) {
chomp;
last if $_ eq "";
print grep /\b$_\b/s, #lines;
}
There is possible problem in your example with my #match = grep {/\b$_\b/s} #lines; as grep is not working with user input, but only with content of #lines. What it does is this:
grep { $lines[index] =~ /\b$lines[index]\b/s }
and you probably want this:
while (my $input = <>) {
chomp($input);
last if $input =~ /^ \s* $/x; # exit loop if no input or only whitespaces
my #match = grep { /\b$input\b/s } #lines;
print #match;
}

While and foreach mixed loop issue

!C:\Perl\bin\perl.exe
use strict;
use warnings;
my $numArgs = $#ARGV + 1;
print "thanks, you gave me $numArgs command-line arguments.\n";
while (my $line = <DATA> ) {
foreach my $argnum (0 .. $#ARGV) {
if ($line =~ /$ARGV[$argnum]/)
{
print $line;
}
}
}
__DATA__
A
B
Hello World :-)
Hello World !
when I passed one arg, it works well.
Such as I run test.pl A or test.pl B or **test.pl Hello"
when I passed two args, it works some time only.
Successful: When I run test.pl A B or test.pl A Hello or **test.pl B Hello"
Failed: when I run test.pl Hello World*
Produced and output duplicate lines:
D:\learning\perl>t.pl Hello World
thanks, you gave me 2 command-line arguments.
Hello World :-)
Hello World :-)
Hello World !
Hello World !
D:\learning\perl>
How to fix it? Thank you for reading and replies.
[update]
I don't want to print duplicate lines.
I don't see the problem, your script processes the __DATA__ and tests all input words against it: since "Hello" and "World" match twice each, it prints 4 rows.
If you don't want it to write multiple lines, just add last; after the print statement.
The reason you're getting the duplicate output is because the regex $line =~ /Hello/ matches both "Hello World" lines and $line =~ /World/ also matches both "Hello World" lines. To prevent that, you'll need to add something to remember which lines from the __DATA__ section have already been printed so that you can skip printing them if they match another argument.
Also, some very minor stylistic cleanup:
#!C:\Perl\bin\perl.exe
use strict;
use warnings;
my $numArgs = #ARGV;
print "thanks, you gave me $numArgs command-line arguments.\n";
while (my $line = <DATA> ) {
foreach my $arg (#ARGV) {
if ($line =~ /$arg/)
{
print $line;
}
}
}
__DATA__
A
B
Hello World :-)
Hello World !
Using an array in scalar context returns its size, so $size = #arr is preferred over $size = $#arr + 1
If you're not going to use a counter for anything other than indexing through an array (for $i (0..$#arr) { $elem = $arr[$i]; ... }), then it's simpler and more straightforward to just loop over the array instead (for $elem (#arr) { ... }).
Your foreach loop could also be replaced with a grep statement, but I'll leave that as an exercise for the reader.
Assuming you want to print each line from DATA only once if one or more patterns match, you can use grep. Note that use of \Q to quote regex metacharacters in the command line arguments and the use of the #patterns array to precompile the patterns.
Read if grep { $line =~ $_ } #patterns out loud: If $line matches one or more patterns ;-)
#!/usr/bin/perl
use strict; use warnings;
printf "Thanks, you gave me %d command line arguments.\n", scalar #ARGV;
my #patterns = map { qr/\Q$_/ } #ARGV;
while ( my $line = <DATA> ) {
print $line if grep { $line =~ $_ } #patterns;
}
__DATA__
A
B
Hello World :-)
Hello World !
Here are some comments on your script to help you learn:
my $numArgs = $#ARGV + 1;
print "thanks, you gave me $numArgs command-line arguments.\n";
The command line arguments are in #ARGV (please do read the documentation). In scalar context, #ARGV evaluates to the number of elements in that array. Therefore, you can simply use:
printf "Thanks, you gave me %d command line arguments.\n", scalar #ARGV;
Further, you can iterate directly over the elements of #ARGV in your foreach loop instead of indexed access.
while (my $line = <DATA> ) {
foreach my $arg ( #ARGV ) {
if ( $line =~ /$arg/ ) {
print $line;
}
}
}
Now, what happens to your program if I pass ( to it on the command line? Or, even World? What should happen?