perl while loop iterates through a variable. - perl

This little snippet is from the first chapter of the LWP perl oreilly book.
This line
$count++ while $catalog =~ m/Perl/gi;
perplexes me
I do not understand how the while statement iterates through the lines in the $catalog variable to find the matched, I don't even know how to explain what that line does in english much less perl
#!/usr/bin/perl -w
use strict ;
use LWP::Simple ;
my $catalog = get("http://www.oreilly.com/catalog");
my $count = 0;
$count++ while $catalog =~ m/Perl/gi;
print "$count\n";
so I have tried writing it out long hand to no avail.
#!/usr/bin/perl -w
use strict ;
use LWP::Simple ;
my $catalog = get("http://www.oreilly.com/catalog");
open( my $fh_catalog ,"<" , $catalog) || die "cant open $!";
while (<$fh_catalog>) {
print $_ ;
sleep 1;
}
I even tried
#!/usr/bin/perl -w
use strict ;
use LWP::Simple ;
my $catalog = get("http://www.oreilly.com/catalog");
while (<$catalog>) {
print $_ ;
sleep 1;
}

$catalog contains the string <!DOCTYPE HTML PUB[...][newline][newline]<html>[...].
Your first snippet fails because $catalog doesn't contain a file name.
Your second snippet fails because $catalog doesn't contain a file handle.
When a match operator with the /g modifier is used scalar context, it searches from where the last search left off.
The analog would be
use Time::HiRes qw( sleep ); # Support sleeping fractions of seconds.
$| = 1; # Turn off STDOUT's output buffering.
for my $i (0..length($content)-1) {
print(substr($content, $i, 1));
sleep 0.1;
}
Let's use a simpler string as an example.
my $s = "a000a000a000";
++$count while $s =~ /a/g;
Here's what happens:
The match operator is executed. It finds the first a, sets pos($s) = 1;, and returns true.
The loop body is entered, and $count is incremented.
The match operator is executed. It behaves as if the string started as pos($s) (1), finds the second a, sets pos($s) = 5;, and returns true.
The loop body is entered, and $count is incremented.
The match operator is executed. It behaves as if the string started as pos($s) (5), finds the third a, sets pos($s) = 9;, and returns true.
The loop body is entered, and $count is incremented.
The match operator is executed. It behaves as if the string started as pos($s) (9), fails to find a match, clears pos($s), and returns false. The loops exits.
Nothing changes if some of the characters of the string are newlines.

Related

Perl diamond operator in double while loop hangs

In my Perl script I have a double infinite while loop. I read lines from a file with the diamond operator. But somehow if my script reaches the last line of the file, it does not return undef, but hangs forever.
If I reduced my code to a single while loop this did not happen. So I wonder if I am doing something wrong or if this is a known limitation of the language. (This is actually my first perl script.)
Below is my script. It is meant to count the size of DNA sequences in fasta files, but the hanging behavior can be observed with any other file with multiple lines of text.
Perl version 5.18.2
Invoked from the commandline like perl script.pl file.fa
$l = <>;
while (1) {
$N = 0;
while (1) {
print "Get line";
$l = <>;
print "Got line";
if (not($l)) {
last;
}
if ($l =~ /^>/) {
last;
}
$N += length($l);
}
print $N;
if (not($N)) {
last;
}
}
I put some debug print statements so that you can see that the last line printed is "Get line" and then it hangs.
Welcome to Perl.
The issue with your code is that you have no way of escaping the outer loop. <> will return undef when it reaches the end of the file. At this point your inner loop ends and the outer loop sends it back in. Forcing further reads causes <> to start looking at STDIN which never sends an EOF, so your loop continues forever.
As this is your first Perl script I'm going to rewrite it for you with some comments. Perl is a fantastic language, you can write some great code, however mostly due to it's age there are some older styles which are no longer advised.
use warnings; # Warn about coding errors
use strict; # Enforce good style
use 5.010; # Enable modernish (10 year old) features
# Another option which mostly does the same as above.
# I normally do this, but it does require a non-standard CPAN library
# use Modern::Perl;
# Much better style to have the condition in the while loop
# Much clearer than having an infinite loop with break/last statements
# Also avoid $l as a variable name, it looks too much like $1
my $count = 0; # Note variable declaration, enforced by strict
while(my $line = <>) {
if ($line =~ /^>/) {
# End of input block, output and reset
say $count;
$count = 0;
} else {
$count += length($line);
}
}
# Have reached the end of the input files
say $count;
try "echo | perl script.pl file.fa".
works for me with same "problem" in my code.
gets EOF from stdin.

Grep using perl

I'm trying to grep multiple patterns from a log file using perl. For the first pattern i'm getting the desired matching pattern via read only variable($1,$2..). But for the next pattern the read only variable is returning the previous value but not the value matching the second pattern.
here is the code:
$tmp = `grep "solo_video_channel_.*(0): queueing" $log`;
chomp($tmp);
$tmp =~ m/(.*):.*solo_video_channel_write(.*): queueing page (.*).*/;
$chnl = $2;
$page = $3;
$timestamp = $1;
$tmp1 = `grep "(0): DUMP GO" $log`;
chomp($tmp1);
$tmp1 =~ m/(.*): solo_video_channel_write(0): DUMP GO/;
$dmp = $1;
print "dump go time = $1\n";
tmp1's value after grep is coming as expected. but $1 value remains same as the previous one.
Any suggestions?
Always make sure that you verify that a regex matched before using a captured variable.
Additionally, there is no reason to shell out to grep. Use Perl's file processing instead:
use strict;
use warnings;
local #ARGV = $log;
while (<>) {
chomp;
if (/solo_video_channel_.*\(0\): queueing/) {
if ( my ( $timestamp, $chnl, $page ) = m/(.*):.*solo_video_channel_write(.*): queueing page (.*).*/ ) {
print "$. - $timestamp, $chnl, $page\n";
}
}
if ( my ($dmp) = m/(.*): solo_video_channel_write\(0\): DUMP GO/ ) {
print "dump go time = $dmp\n";
}
}
Note, your first set of if's could almost certainly be combined into a single if statement, but I left it as is for now.
Why not use Pure Perl? It's faster than running external greps. Plus, you can grep both regular expressions at once. Faster than looping through the file twice.
Always check the value of your rexp match. Here I'm using if statements to do this. Note too that I am printing all lines that don't match with UNMATCHED LINES. You can remove the else when you see that everything is working, or simply redirect 2> /dev/null.
use strict;
use warnings;
use autodie;
use feature qw(say);
my $log = "log.txt";
open my $log_fh, "<", $log;
while ( my $line = <$log_fh> ) {
my $timestamp;
my $channel;
my $page;
my $gotime;
if ( $line =~ /(.*):.*solo_video_channel_(.*):\s+queueing page (.*)/ ) {
$timestamp = $1;
$channel = $2;
$page = $3;
say qq(Timestamp = "$timestamp" Channel = "$channel" Page = "$page");
}
elsif ( $line =~ /(.*): solo_video_channel_write(0): DUMP GO/ ) {
$gotime = $1;
say "Dump Go Time = $1";
}
else {
say STDERR qq(UNMATCHED LINES: "$line");
}
}
close $log_fh;
In the second regexp you need to escape the literal brackets
$tmp1 =~ m/(.*): solo_video_channel_write\(0\): DUMP GO/
This is because the expression \(0\) matches the exact pattern (0)
In the example given in this answer this would include strings such as
37: solo_video_channel_write(0): DUMP GO
In contrast, the expression (0) matches the exact pattern 0 and sets a capture group.
With the regexp given in your original question
$tmp1 =~ m/(.*): solo_video_channel_write(0): DUMP GO/;
matching would occur on strings such as
37: solo_video_channel_write0: DUMP GO
Of course in the original program the strings are not in this format, so they do not match and $1 is not set
The regular expression syntax for the shell program grep is (confusingly) different
To use round brackets for setting a capture group they must be escaped with a backslash, which is the opposite to the syntax in perl

How can I count the characters in STDIN using perl without wc?

I am attempting to write a script to count the number of lines, words, and characters input by the user in STDIN. Using the script below, I can accomplish this when a user inputs a file as a CLI, but when I attempt to use this code for STDIN, I end up with an infinite loop. What should I change to fix this?
print "Enter a string to be counted";
my $userInput = <STDIN>;
while ($userInput) {
$lines++;
$chars += length ($_);
$words += scalar(split(/\s+/, $_));
}
printf ("%5d %5d %5d %10s", $lines, $words, $chars, $fileName);
Your program is fine, expect that you need to read from the file handle in the while test. At present you are just reading one line from STDIN and repeatedly checking that it is true - i.e. not zero or undef.
Your code should look like this
use strict;
use warnings;
my ($lines, $chars, $words) = (0, 0, 0);
print "Enter a string to be counted";
while (<STDIN>) {
++$lines;
$chars += length;
$words += scalar split;
}
printf "%5d %5d %5d\n", $lines, $words, $chars;
Note that I have used just length instead of length $_ as $_ is the default parameter for the length operator. $_ only really comes into its own if you use the defaults.
Similarly, the default parameters to split are split ' ', $_ which is what you want in preference to split /\s+/, $_ because the latter returns a zero-length initial field if there are any leading spaces in $_. The special value of a single literal space ' ' just extracts all the sequences of non-space characters, which is almost always what you want. Anything other than just a single space is converted to a regex pattern as normal.
Finally, I have used ++$lines instead of $lines++. The latter is popular only because of the name of the language C++, and it is less common that the value returned by the expression needs to be the original value of the variable rather than the new one. Much more often the increment is used as a statement on its own, as here, when the returned value is irrelevant. If Perl didn't optimise it out (because the context is void and the return value is unused) the code would be doing unnecessary additional work to save the original value of the variable so that it can be returned after the increment. I also think ++$var looks more like the imperative "increment $var" and improves the readability of the code.
Your input has to be within the loop. Else you are processing the same string over and over again.
Maybe this is what you need?
use strict;
use warnings;
print "Enter a string to be counted:\n";
my $lines = 0;
my $chars = 0;
my $words = 0;
while (<>) {
chomp;
$lines++;
$chars += length ($_);
$words += scalar(split(/\s+/, $_));
}
printf ("lines: %5d words: %5d chars: %5d\n", $lines, $words, $chars);

Perl regular expressions and returned array of matched groups

i am new in Perl and i need to do some regexp.
I read, when array is used like integer value, it gives count of elements inside.
So i am doing for example
if (#result = $pattern =~ /(\d)\.(\d)/) {....}
and i was thinking it should return empty array, when pattern matching fails, but it gives me still array with 2 elements, but with uninitialized values.
So how i can put pattern matching inside if condition, is it possible?
EDIT:
foreach (keys #ARGV) {
if (my #result = $ARGV[$_] =~ /^--(?:(help|br)|(?:(input|output|format)=(.+)))$/) {
if (defined $params{$result[0]}) {
print STDERR "Cmd option error\n";
}
$params{$result[0]} = (defined $result[1] ? $result[1] : 1);
}
else {
print STDERR "Cmd option error\n";
exit ERROR_CMD;
}
}
It is regexp pattern for command line options, cmd options are in long format with two hyphens preceding and possible with argument, so
--CMD[=ARG]. I want elegant solution, so this is why i want put it to if condition without some prolog etc.
EDIT2:
oh sry, i was thinking groups in #result array are always counted from 0, but accesible are only groups from branch, where the pattern is success. So if in my code command is "input", it should be in $result[0], but actually it is in $result[1]. I thought if $result[0] is uninitialized, than pattern fails and it goes to the if statement.
Consider the following:
use strict;
use warnings;
my $pattern = 42.42;
my #result = $pattern =~ /(\d)\.(\d)/;
print #result, ' elements';
Output:
24 elements
Context tells Perl how to treat #result. There certainly aren't 24 elements! Perl has printed the array's elements which resulted from your regex's captures. However, if we do the following:
print 0 + #result, ' elements';
we get:
2 elements
In this latter case, Perl interprets a scalar context for #result, so adds the number of elements to 0. This can also be achieved through scalar #results.
Edit to accommodate revised posting: Thus, the conditional in your code:
if(my #result = $ARGV[$_] =~ /^--(?:(help|br)|(?:(input|output|format)=(.+)))$/) { ...
evaluates to true if and only if the match was successful.
#results = $pattern =~ /(\d)\.(\d)/ ? ($1,$2) : ();
Try this:
#result = ();
if ($pattern =~ /(\d)\.(\d)/)
{
push #result, $1;
push #result, $2;
}
=~ is not an equal sign. It's doing a regexp comparison.
So my code above is initializing the array to empty, then assigning values only if the regexp matches.

Statistics in Perl Script

I have the following question:
I want to create a perl script that reads from a text file (file with several columns of numbers) and calculate some statistics (mean, median, sd, variance). I already built one script, but as I am not in love yet with perl, I can't fix the problems of syntax on it...
Here is my perl script..
#!/usr/bin/perl -w
use strict;
open(FILEHANDLE, data.txt);
while (<FILEHANDLE>) {
shift #ARGV;
my #array = split(\t,$_);
}
close(FILEHANDLE);
###### mean, sum and size
$N = $sum = 0;
$array[$x-1];
$N++;
$sum += $array[$x-1];
###### minimum and the maximum
($min = 0, $max = 0);
$max = $array[$x-1] if ($max < $array[$x-1]), (my#sorted = sort { $a <=> $b } #samples) {
print join(" ",#sorted);
}
##### median
if ($N % 2==1) {
print "$median = $sorted[int($N/2)]\n"; ## check this out
};
else ($median = ($sorted[$N/2] + $sorted[($N/2)-1]) / 2)) {
print "$median\n"; # check this out
};
##### quantiles 1º and 3º
if $qt1 = $sorted[$r25-1] {
print "\n"; # check this out
};
else $qt1 = $fr*($sorted[$ir] - $sorted[$ir-1]) + $sorted[$ir-1] {
print "\n"; # check this out
};
##### variance
for (my $i=0;
$i<scalar(#samples);
$i++)
{
$Var += ($samples[$i]-$mean)**2;
$Var = $Var/($N-1);
};
###### standard error
($Std = sqrt($Var)/ sqrt($N));
############################################################
print "$min\n";
print "$max\n";
print "$mean\n";
print "$median\n";
print "$qt1\n";
print "$var\n";
print "$std\n";
exit(0);
I want to get it working. Please help. THANKS IN ADVANCE!
Errors in your code:
open(FILEHANDLE, data.txt);
data.txt needs to be quoted. You are not checking the return value of the open, e.g. ... or die $!. You should use a lexical filehandle and three argument open, e.g. open my $fh, '<', "data.txt" or die $!.
shift #ARGV;
This does nothing except remove the first value from you argument list, which is then promptly discarded.
my #array = split(\t,$_);
You are using \t as a bareword, it should be a regex, /\t/. Your #array is declared inside a lexical scope of the while loop, and will be undefined outside this block.
$N = $sum = 0;
Both variables are not declared, which will cause the script to die when you use strict (which is a very good idea). Use my $N to solve that. Also, $N is not a very good variable name.
$array[$x-1];
This will do nothing. $x is not declared (see above), and also undefined. The whole statement does nothing, it is like having a line 3;. I believe you will get an error such as Useless use of variable in void context.
$N++;
This increments $N to 1, which is a useless thing to do, since you only a few lines above initialized it to 0.
Well.. the list goes on. I suggest you start smaller, use strict and warnings since they are very good tools, and work out the errors one by one. A very good idea would be to make subroutines of your calculations, e.g.:
sub sum {
# code here
return $sum;
}
Go to perldoc.perl.org and read the documentation. Especially useful would be the syntax related ones and perlfunc.
Also, you should be aware that this functionality can be found in modules, which you can find at CPAN.
Your main problem is you have not declared your variables such as $N, $max, etc.
You need to introduce all new variables with my the first time you reference them. Just like you did with $array and $i. So for example
$N = $sum = 0;
Should become
my( $N, $sum ) = ( 0, 0 );