Is there a better way to count occurrences of a char in a string? - perl

I felt there must be a better way to count occurrences than writing a sub, in Perl or in the shell on Linux.
#!/usr/bin/perl -w
use strict;
# When loaded with require, return true without running main()
return 1 unless $0 eq __FILE__;
main() if $0 eq __FILE__;
sub main {
    my $str = "ru8xysyyyyyyysss6s5s";
    my $char = "y";
    my $count = count_occurrence($str, $char);
    print "count<$count> of <$char> in <$str>\n";
}
sub count_occurrence {
    my ($str, $char) = @_;
    my $len = length($str);
    # \Q...\E so that a metacharacter such as "." is removed literally
    $str =~ s/\Q$char\E//g;
    my $len_new = length($str);
    my $count = $len - $len_new;
    return $count;
}

If the character is constant, the following is best:
my $count = $str =~ tr/y//;
If the character is variable, I'd use the following:
my $count = length( $str =~ s/[^\Q$char\E]//rg );
I'd only use the following if I wanted compatibility with versions of Perl older than 5.14 (as it is slower and uses more memory):
my $count = () = $str =~ /\Q$char/g;
The following uses no extra memory, but might be a bit slower:
my $count = 0;
++$count while $str =~ /\Q$char/g;
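As a sanity check, the four variants above can be run side by side on the question's sample string. This is a minimal sketch (the variable names are illustrative); the s///r variant needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Sample data from the question.
my $str  = "ru8xysyyyyyyysss6s5s";
my $char = "y";

# 1. tr/// with a constant character: returns the count, leaves $str unchanged.
my $by_tr = ($str =~ tr/y//);

# 2. s///r (Perl 5.14+): delete every character that is NOT $char, then measure.
my $by_subst = length( $str =~ s/[^\Q$char\E]//rg );

# 3. List-context match assigned to an empty list, evaluated in scalar context.
my $by_list = () = $str =~ /\Q$char/g;

# 4. Scalar-context //g loop: one match at a time, no list is built.
my $by_loop = 0;
++$by_loop while $str =~ /\Q$char/g;

printf "tr=%d subst=%d list=%d loop=%d\n", $by_tr, $by_subst, $by_list, $by_loop;
# tr=8 subst=8 list=8 loop=8
```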

Counting the occurrences of a character in a string can be performed with one line in Perl (as compared to your four lines). There is no need for a sub (although there is nothing wrong with encapsulating functionality in a sub). From perlfaq4, "How can I count the number of occurrences of a substring within a string?":
use warnings;
use strict;
my $str = "ru8xysyyyyyyysss6s5s";
my $char = "y";
my $count = () = $str =~ /\Q$char/g;
print "count<$count> of <$char> in <$str>\n";
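The \Q in the pattern matters whenever $char might be a regex metacharacter. A small sketch, using a dot as a hypothetical search character:

```perl
use strict;
use warnings;

my $str  = "a.b.c";
my $char = ".";

# Without \Q, "." is a regex metacharacter that matches ANY character,
# so every character in $str is counted.
my $unquoted = () = $str =~ /$char/g;

# With \Q ... \E (quotemeta), "." is matched literally.
my $quoted = () = $str =~ /\Q$char\E/g;

print "unquoted=$unquoted quoted=$quoted\n";   # unquoted=5 quoted=2
```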

In a beautiful* Bash/Coreutils/Grep one-liner:
$ str=ru8xysyyyyyyysss6s5s
$ char=y
$ fold -w 1 <<< "$str" | grep -c "$char"
8
Or maybe
$ grep -o "$char" <<< "$str" | wc -l
8
The first one works only if the substring is just one character long; the second one works only if the substrings are non-overlapping.
* Not really.
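Perl has the same non-overlapping limitation with a plain /.../g match. If overlapping occurrences should count, one option (a sketch, assuming a literal substring) is a zero-width lookahead: each match consumes no characters, so the next attempt starts one position later.

```perl
use strict;
use warnings;

my $str = "aaaa";
my $sub = "aa";

# Plain //g: each match consumes its characters, so matches cannot overlap.
my $non_overlapping = () = $str =~ /\Q$sub\E/g;

# Lookahead (?=...) is zero-width: it matches at every position where
# $sub starts, so overlapping occurrences are all counted.
my $overlapping = () = $str =~ /(?=\Q$sub\E)/g;

print "non-overlapping=$non_overlapping overlapping=$overlapping\n";
# non-overlapping=2 overlapping=3
```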

toolic has given a correct answer, but you might consider not hardcoding your values to make the program reusable.
use strict;
use warnings;
die "Usage: $0 <search> [<text> | <file>]\n" if @ARGV < 1;
my $search = shift;   # the string you are looking for
my $str;              # the input string
if (@ARGV && -e $ARGV[0] || !@ARGV) {   # if str is a file, or there is no str
local $/; # slurp input
$str = <>; # use diamond operator
} else { # else just use the string
$str = shift;
}
my $count = () = $str =~ /\Q$search\E/gms;
print "Found $count of '$search' in '$str'\n";
This lets you use the program to count the occurrences of a character or a string inside a string, a file, or standard input. For example:
count.pl needles haystack.txt
some_process | count.pl foo
count.pl x xyzzy

Related

How to separate an array in Perl based on pattern

I am trying to write a big script but I am stuck on one part. I want to split an array based on "..".
From the script I got this:
print @coordinates;
gene complement(872..1288)
My desired output:
complement 872 1288
I tried:
1) my @answer = split(.., @coordinates)
print("@answer\n");
2) my @answer = split /../, @coordinates;
3) print +(split /\../)[-1],[-2],[-3] while <@coordinates>
4) foreach my $anwser ( @coordinates )
{$anwser =~ s/../"\t"/;
print $anwser;}
5) my @answer = split(/../, "complement(872..1288)"); #to see if the printed array is problematic.
which prints:
) ) ) ) ) ) ) ) )
6) my @answer = split /"gene "/, @coordinates; # I tried to "catch" the entire output's spaces and tabs
which prints
0000000000000000000000000000000001000000000100000000
But none of them works. Does anyone have any idea how to get past this issue?
P.S. Unfortunately, I can't run my script on Linux right now, so I used this website to run it. I hope this is not the reason I didn't get my desired output.
my $RE_COMPLEMENT = qr{(complement)\((\d+)\.\.(\d+)\)}msx;
for my $item (@coordinates) {
    my ($head, $i, $j) = $item =~ $RE_COMPLEMENT;
    if (defined($head) && defined($i) && defined($j)) {
        print("$head\t$i\t$j\n");
    }
}
split operates on a scalar, not on an array.
my $string = 'gene complement(872..1288)';
my @parts = split /\.\./, $string;
print $parts[0]; # gene complement(872
print $parts[1]; # 1288)
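Attempts like split /../ misbehave because . is a regex metacharacter that matches any character. A sketch that escapes the dots and splits on several delimiters at once (the alternation below is one illustration, not the only way):

```perl
use strict;
use warnings;

my $string = 'gene complement(872..1288)';

# "." matches any character, so /../ matches every pair of characters;
# \.\. matches two literal dots. Here we split on whitespace, "(", "..", or ")".
my @parts = split /\s+|\(|\.\.|\)/, $string;

# split drops trailing empty fields, so @parts is
# ('gene', 'complement', '872', '1288').
print join("\t", @parts[1 .. 3]), "\n";   # complement<TAB>872<TAB>1288
```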
To get the desired output, you can use a substitution:
my $string = 'gene complement(872..1288)';
$string =~ s/gene +|\)//g;
$string =~ s/\.\./ /;
$string =~ s/\(/ /;
The desired effect can be achieved by:
1) using the tr operator to replace each of '(', '.' and ')' with ' '
2) splitting the data string into elements on whitespace
3) keeping only the required part of the array
4) printing the elements joined with tabs
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ tr/(.)/ /;
my @elements = (split ' ', $data)[1..3];
say join "\t", @elements;
__DATA__
gene complement(872..1288)
Or, as an alternative, use only substitutions (without splitting the data string into an array):
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ s/gene\s+//;
$data =~ s/\)//;
$data =~ s/[(.]+/\t/g;
say $data;
__DATA__
gene complement(872..1288)
Output
complement 872 1288

How can I extract the number from the output of a shell command?

The output of the command is ent3, and from that output I want 3 to be stored in a variable.
Perl code
sub {
    if ( $exit == 1 ) {
        $cmdStr = "lsdev | grep en | grep VLAN | awk '{ print \$1 }'\r";
        $result = _run_cmd($cmdStr);
        my @PdAt_val = split("\r?\n", $result);
        my $num = $result =~ /([0-9]+)/;
        print "The char is $num\n";
        $exit = 0;
        exp_continue;
Your code that is doing the work here is:
my $num = $result =~ /([0-9]+)/;
Let's put that into a simple program so we can see what's going on.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $result = 'ext3';
my $num = $result =~ /([0-9]+)/;
say $num;
And that prints 1, which isn't what we want. What's going on?
Well, if you read the documentation for the match operator (in the section Regexp Quote-Like Operators in "perlop"), you'll see what the operator returns under different circumstances. It says:
Searches a string for a pattern match, and in scalar context returns true if it succeeds, false if it fails.
So that explains the behaviour we're seeing. That "1" is just a true value saying that the match succeeded. But how do we get the value that we captured in our parentheses? There are a couple of ways. Firstly, it's written into the $1 variable.
my $num;
if ($result =~ /([0-9]+)/) {
$num = $1;
}
say $num;
But I think the other approach is what you were looking for. If you read on, you'll see what the operator returns in list context:
m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, that is, ($1, $2, $3 ...)
So if we put the match operator in list context, then we'll get the contents of $1 returned. How do we put a match into list context? By making the expression a list assignment - which we can do by putting parentheses around the left-hand side of the assignment.
my ($num) = $result =~ /([0-9]+)/;
say $num;
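The three forms discussed above can be compared in one small script. A sketch using the question's value ent3 (the variable names are illustrative):

```perl
use strict;
use warnings;

my $result = 'ent3';

# Scalar context: the match returns true (1) or false (''), not the capture.
my $flag = $result =~ /([0-9]+)/;

# After a successful match, the first capture group is in $1.
my $from_dollar1;
$from_dollar1 = $1 if $result =~ /([0-9]+)/;

# List context (parentheses on the left): the captures themselves are returned.
my ($num) = $result =~ /([0-9]+)/;

# With no match, the list assignment leaves the variable undefined.
my ($none) = 'ent' =~ /([0-9]+)/;

print "flag=$flag dollar1=$from_dollar1 list=$num none=",
    (defined $none ? $none : 'undef'), "\n";
# flag=1 dollar1=3 list=3 none=undef
```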
Using regex, something like this should work:
if($result =~ /([0-9]+)/) {
$num = $1;
}
print $num;

How to count a repeating string in a line using Perl

I have the file below:
file1:
abc def host 123 host 869 host
I wrote the script below to count the occurrences of the keyword "host" in each line.
I tried all the approaches (see the commented-out ones), but it still does not seem to work. The sed command worked on the command line but not inside the Perl script.
#!/usr/bin/perl
open(SOURCE,"</home/amp/surevy01/file1");
open(DESTINATION,"</home/amp/surevy01/file2");
while(my $line = <SOURCE>)
{
while(my $line1 = <DESTINATION>)
{
#chomp($line);
#chomp($line1);
if ($line =~ "host")
{
#my $count = grep {host} $line;
#my $count = `sed -i {s/host/host\n/g} $line1 | grep -c {host}`;
#my $count = `perl -pi -e 's/host/host\n/g' $line1 | grep -c host`;
#my $count grep ("host" ,$line);
print "$count";
print "match found \n";
next;
}
else
{
print "match not found \n";
exit;
}
}
}
I'm a beginner in Perl. Looking for your valuable suggestions.
Your own solution will match instances like hostages and Shostakovich.
grep is the canonical way to count elements of a list, and split will turn your line into a list of words, giving
my $count = grep { $_ eq 'host' } split ' ', $line;
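To see the difference between counting whole words and counting substrings, here is a sketch on a made-up line that also contains hostages and Shostakovich:

```perl
use strict;
use warnings;

# Made-up line: "host" appears as a word twice, and also inside
# "hostages" and "Shostakovich".
my $line = 'abc host hostages 123 Shostakovich host';

# split ' ' splits on whitespace; grep in scalar context returns the
# number of elements for which the block is true (exact-word matches).
my $words = grep { $_ eq 'host' } split ' ', $line;

# A plain substring match also counts "host" inside other words.
my $substrings = () = $line =~ /host/g;

print "words=$words substrings=$substrings\n";   # words=2 substrings=4
```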
I don't know why you're looping through two files in your example, but you can use the /g (global) flag:
my $line = "abc def host 123 host 869 host";
my $x = 0;
while ($line =~ /host/g){
$x++;
}
print "$x\n"; # 3
When you run a regex with /g in scalar context (as in the condition of the while statement), it keeps track of the location of the last match and restarts from there. Therefore, /host/g in a loop as above will find each occurrence of host. You can also use /g in list context:
my $line = "abc def host 123 host 869 host";
my @matches = $line =~ /host/g;
print scalar @matches; # 3 again
In this case, @matches will contain all matches of the regexp against the string, which will be ('host', 'host', 'host') since the query is a simple string. Then, scalar(@matches) will yield the length of the list.
This produces the number of instances of host in $line:
my $count = () = $line =~ /host/g;
But that also matches hosting. To avoid that, the following will probably do the trick:
my $count = () = $line =~ /\bhost\b/g;
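The =()= construct is known as Perl's secret "Goatse" operator: the empty-list assignment puts the match in list context, and a list assignment evaluated in scalar context returns the number of elements on its right-hand side.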
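A small sketch (with a made-up input line) showing what each piece of the idiom contributes:

```perl
use strict;
use warnings;

my $line = 'abc def host 123 host 869 host hosting';

# Scalar assignment: //g in scalar context only reports whether a match
# was found (and advances pos), so this is 1, not a count.
my $scalar = $line =~ /\bhost\b/g;
pos($line) = undef;   # reset the //g position left by the scalar match

# "() = ..." puts the match in list context; the list assignment, evaluated
# in scalar context, returns the number of elements on its right-hand side.
my $count = () = $line =~ /\bhost\b/g;

# Without \b, "hosting" is counted too.
my $loose = () = $line =~ /host/g;

print "scalar=$scalar count=$count loose=$loose\n";   # scalar=1 count=3 loose=4
```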

What does dot-equals mean in Perl?

What does ".=" mean in Perl (dot-equals)? Example code below (in the while clause):
if( my $file = shift @ARGV ) {
$parser->parse( Source => {SystemId => $file} );
} else {
my $input = "";
while( <STDIN> ) { $input .= $_; }
$parser->parse( Source => {String => $input} );
}
exit;
Thanks for any insight.
The period . is the concatenation operator. The equal sign to its right makes it an assignment operator, like += in C.
For example:
$input .= $_;
Does the same as
$input = $input . $_;
However, there's also some Perl magic here: the assignment form does not warn when the variable is undefined, which removes the need to initialize it to avoid "uninitialized" warnings. Try the difference:
perl -we 'my $x; $x = $x + 1' # Use of uninitialized value in addition ...
perl -we 'my $x; $x += 1' # no warning
This means that the line in your code:
my $input = "";
is quite redundant, although some people might find it comforting.
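The warning behaviour can also be checked from a script by collecting warnings with a $SIG{__WARN__} handler. A sketch mirroring the command-line example above (the explicit form uses addition, as in that example):

```perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them, so they can be counted.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $x;
$x .= 'a';        # .= on an undefined variable: no warning

my $n;
$n += 1;          # += on an undefined variable: no warning

my $y;
$y = $y + 1;      # explicit addition with undef: "Use of uninitialized value"

print scalar(@warnings), " warning(s)\n";   # 1 warning(s)
```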
For pretty much any binary operator X, $a X= $b is equivalent to $a = $a X $b. The dot . is a string concatenation operator; thus, $a .= $b means "stick $b at the end of $a".
In your code, you start with an empty $input, then repeatedly read a line and append it to $input until there are no lines left. You end up with the entire input in $input, built up one line at a time.
It should be equivalent to the loopless
local $/;
$input = <STDIN>;
(This sets the input record separator $/ to undef, so the read does not stop at any line ending and continues to the end of the file.)
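A sketch comparing the loop with the slurp, using an in-memory filehandle (opened on a scalar reference) as a hypothetical stand-in for STDIN:

```perl
use strict;
use warnings;

# Sample input held in memory; open() on a scalar reference gives a
# read-only filehandle, standing in for STDIN here.
my $text = "line one\nline two\nline three\n";

# Approach 1: read line by line and append each line with .=
open my $fh1, '<', \$text or die "open: $!";
my $by_loop = '';
while (my $line = <$fh1>) { $by_loop .= $line; }
close $fh1;

# Approach 2: undefine the input record separator ($/) so that a single
# readline returns everything up to end of file.
open my $fh2, '<', \$text or die "open: $!";
my $slurped = do { local $/; <$fh2> };
close $fh2;

print $by_loop eq $slurped ? "same\n" : "different\n";   # same
```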
EDIT: Changed according to TLP's comment.
You have found the string concatenation operator.
Let's try it:
my $string = "foo";
$string .= "bar";
print $string;
foobar
This appends to the $input variable: whatever comes in via STDIN is added to the end of $input.

How can I search and replace a match a specific number of times in a string in Perl?

How can I search and replace a match a specific number of times using s///? For example:
$string="abcabdaaa";
I want to replace a with i in $string n times. How can I do that? n is an integer provided by user.
The simple answer probably doesn't do what you want.
my $str = 'aaaa';
$str =~ s/a/a_/ for 1..2;
print $str, "\n"; # a__aaa. But you want a_a_aa, right?
You need to count the replacements yourself, and act accordingly:
$str = 'aaaa';
my $n = 0;
$str =~ s/(a)/ ++$n > 2 ? $1 : 'a_' /ge;
print $str, "\n";
See the FAQ entry "How do I change the Nth occurrence of something?" for related examples.
Just substitute $n times:
$string =~ s/a/i/ for 1..$n;
This will do it.
A more general solution is a global substitution with a counter:
my $i = 0; # count the substitutions made
$string =~ s/(a)/ ++$i > $n ? $1 : "i" /ge;
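The counter approach can be wrapped in a reusable sub. A sketch; replace_first_n is a hypothetical name, and \Q...\E makes the search string literal:

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first $n non-overlapping matches of the
# literal string $from with $to.
sub replace_first_n {
    my ($string, $from, $to, $n) = @_;
    my $done = 0;
    # /e evaluates the replacement as code once per match: the first $n
    # matches become $to, later ones are put back unchanged via $1.
    $string =~ s/(\Q$from\E)/ ++$done > $n ? $1 : $to /ge;
    return $string;
}

print replace_first_n('abcabdaaa', 'a', 'i', 2), "\n";   # ibcibdaaa
print replace_first_n('aaaa', 'a', 'a_', 2), "\n";       # a_a_aa
```

Because all matches are found in a single left-to-right pass, inserted text (such as the "a" in "a_") is never matched again, unlike repeating a one-shot s/// in a loop.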
I'm not aware of any flag that would do that. I'd simply use a loop:
for (my $i = 0; $i < $n; $i++)
{
$string =~ s/a/i/;
}
You can try this:
$str1 = join('i', split(/a/, $str, $n + 1));   # n + 1 fields = n separators replaced
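One detail to watch with the split approach: a positive LIMIT caps the number of fields split returns, not the number of separators it consumes. A sketch on the question's string:

```perl
use strict;
use warnings;

my $str = 'abcabdaaa';
my $n   = 2;

# $n fields means only $n - 1 separators are consumed; the last field keeps
# the unsplit remainder. To replace $n separators you need $n + 1 fields.
my @too_few = split /a/, $str, $n;        # ('', 'bcabdaaa')
my @right   = split /a/, $str, $n + 1;    # ('', 'bc', 'bdaaa')

print join('i', @too_few), "\n";   # ibcabdaaa  (only one 'a' replaced)
print join('i', @right), "\n";     # ibcibdaaa
```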
Here is a way to do it, based on the comment you made to eugene y's answer:
#!/usr/bin/perl
use strict; use warnings;
my $string = '***ab***c';
my $n = 3;
1 while $n-- and $string =~ s/\*([^\n])/*\n$1/;
print "$string\n";
Output:
*
*
*
ab***c
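Using the following sub:
sub substitute_n {
    my $n = shift;
    my $pattern = shift;
    my $replace = shift;
    local $_ = shift;
    my $i = 1;
    # The first $n matches get $replace, string-eval'd so that e.g. '$1\n'
    # interpolates; later matches are put back unchanged. Because of the
    # eval, never pass an untrusted $replace.
    s{($pattern)} {
        $i++ <= $n ? eval qq{"$replace"} : $1;
    }ge;
    $_;
}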
You can then write
my $s = "***ab***c";
print "[", substitute_n(2, qr/\*/, '$1\n', $s), "]\n";
to get the following output:
[*
*
*ab***c]