How can I count the characters in STDIN using perl without wc? - perl

I am attempting to write a script to count the number of lines, words, and characters input by the user on STDIN. Using the script below, I can accomplish this when a user passes a file as a command-line argument, but when I attempt to use this code for STDIN, I end up with an infinite loop. What should I change to fix this?
print "Enter a string to be counted";
my $userInput = <STDIN>;
while ($userInput) {
$lines++;
$chars += length ($_);
$words += scalar(split(/\s+/, $_));
}
printf ("%5d %5d %5d %10s", $lines, $words, $chars, $fileName);

Your program is fine, except that you need to read from the file handle in the while test. At present you are just reading one line from STDIN and repeatedly checking that it is true - i.e. not zero or undef.
Your code should look like this
use strict;
use warnings;
my ($lines, $chars, $words) = (0, 0, 0);
print "Enter a string to be counted";
while (<STDIN>) {
++$lines;
$chars += length;
$words += scalar split;
}
printf "%5d %5d %5d\n", $lines, $words, $chars;
Note that I have used just length instead of length $_ as $_ is the default parameter for the length operator. $_ only really comes into its own if you use the defaults.
Similarly, the default parameters to split are split ' ', $_ which is what you want in preference to split /\s+/, $_ because the latter returns a zero-length initial field if there are any leading spaces in $_. The special value of a single literal space ' ' just extracts all the sequences of non-space characters, which is almost always what you want. Anything other than just a single space is converted to a regex pattern as normal.
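To make the difference between the two split forms concrete, here is a small sketch. The sample line is made up for illustration; it just needs some leading whitespace.

```perl
use strict;
use warnings;

# Hypothetical sample line with leading whitespace.
my $line = "  the quick  brown fox\n";

# A /\s+/ pattern produces an empty leading field before "the".
my @by_regex = split /\s+/, $line;

# The special single-space pattern skips leading whitespace first.
my @by_space = split ' ', $line;

printf "regex: %d fields, space: %d fields\n",
    scalar @by_regex, scalar @by_space;
```

With this input the /\s+/ form reports one field more than the ' ' form, which is exactly the off-by-one you would see in a word count.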
Finally, I have used ++$lines instead of $lines++. The latter is popular only because of the name of the language C++, and it is less common that the value returned by the expression needs to be the original value of the variable rather than the new one. Much more often the increment is used as a statement on its own, as here, when the returned value is irrelevant. If Perl didn't optimise it out (because the context is void and the return value is unused) the code would be doing unnecessary additional work to save the original value of the variable so that it can be returned after the increment. I also think ++$var looks more like the imperative "increment $var" and improves the readability of the code.

Your input has to be within the loop. Else you are processing the same string over and over again.
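As a rough sketch of what "input within the loop" looks like, the counting can be wrapped in a sub that takes any filehandle. The name count_stream and the sample text are assumptions for illustration, not part of the original script; an in-memory filehandle stands in for STDIN so the example can run unattended.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines, words and characters on a filehandle.
sub count_stream {
    my ($fh) = @_;
    my ($lines, $words, $chars) = (0, 0, 0);

    # The read happens in the loop test, so every pass sees a new line
    # and the loop ends at end-of-file.
    while (my $line = <$fh>) {
        ++$lines;
        $chars += length $line;
        my @fields = split ' ', $line;
        $words += @fields;
    }
    return ($lines, $words, $chars);
}

# Made-up sample input, read through an in-memory filehandle.
my $text = "one two\nthree\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my ($lines, $words, $chars) = count_stream($fh);
close $fh;

printf "%5d %5d %5d\n", $lines, $words, $chars;
```

Calling count_stream(\*STDIN) should behave the same way on real input, ending when the user sends end-of-file (Ctrl-D on most Unix terminals).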

Maybe this is what you need?
use strict;
use warnings;
print "Enter a string to be counted:\n";
my $lines = 0;
my $chars = 0;
my $words = 0;
while (<>) {
chomp;
$lines++;
$chars += length ($_);
$words += scalar(split(/\s+/, $_));
}
printf ("lines: %5d words: %5d chars: %5d\n", $lines, $words, $chars);

Related

Accept user input perl

How do I accept a list of integers as input? The only thing I can think of is getting each integer from the list specifically using STDIN. Is there a better way to do this?
You want to input a list of integers? I take it you mean that you want to enter a list of numbers, and accept that input if they're all integers.
In this program, I loop forever until I get a valid list, thus for (;;). Some people prefer while (1).
I use Scalar::Util's looks_like_number to test whether the input is numeric, and then use int to verify that the number is an integer. You could use a regular expression like /^\d+$/, but there's no guarantee that it works in all circumstances. Using int and looks_like_number guarantees the results.
I assume that a list of integers could be space separated or comma separated or both, thus my split /[\s,]+/.
You said:
The only thing I can think of is getting each integer from the list specifically using STDIN. Is there a better way to do this?
You read in data from a file handle, whether a file or something like STDIN. No way around that. However, you can at least make it a single input rather than one at a time, which I assume is what you mean.
By the way, I could have combined my numeric test with:
if( not looks_like_number $integer or $integer != int $integer ) {
Since this is an or statement, it would first check whether $integer looks numeric and, if it doesn't, warn about the input before ever checking whether it's an integer. However, I'm not sure this is actually clearer than making it two separate statements.
#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);
use Scalar::Util qw(looks_like_number);
my @integers; # Storage for my integers
#
# Keep looping until you have a valid input
#
INPUT:
for (;;) {
print "List of integers: ";
my $input = <STDIN>;
chomp $input;
@integers = split /[\s,]+/, $input;
#
# Verify inputted "number" is numeric and an integer
#
for my $integer ( @integers ) {
if( not looks_like_number $integer ) {
warn qq(Invalid number entered: "$integer");
next INPUT;
}
if( $integer != int $integer ) {
warn qq(Invalid integer entered: "$integer");
next INPUT;
}
}
#
# Exit if there's at least one integer in @integers
#
last if @integers;
}
say "Integers: " . join ": ", @integers;
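To illustrate how the plain regex test differs from the looks_like_number plus int test mentioned above, here is a small comparison. The helper name is_integer and the list of candidates are assumptions made up for this sketch.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: true if the string is numeric with no fractional part.
sub is_integer {
    my ($value) = @_;
    return 0 unless looks_like_number($value);
    return $value == int($value) ? 1 : 0;
}

# Made-up candidates: the two tests disagree on '-7' and '1e3'.
for my $candidate ('42', '-7', '1e3', '3.5', 'abc') {
    my $by_regex = $candidate =~ /^\d+$/      ? 'yes' : 'no';
    my $by_check = is_integer($candidate)     ? 'yes' : 'no';
    print "$candidate: regex=$by_regex check=$by_check\n";
}
```

Which behaviour you want depends on the input you expect: /^\d+$/ accepts only strings of digits, while the numeric check also accepts signed values and exponent notation.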
This is how I did it:
$input = 0;
while($input != -1)
{
print "add input, finish with -1", "\n";
$input = <STDIN>;
chomp($input);
push(@array, $input);
}
#You also need to remove the last input, -1, with pop:
pop(@array);
print @array;
Console output:
add input, finish with -1
1
add input, finish with -1
2
add input, finish with -1
-1
12
If the user inputs a tab-delimited string of numbers directly, you can use the split function to separate the values.
@array = split(/\t/, $array[0]);
Here's one approach, taking a comma-separated list of integers:
my $input = <STDIN>;
chomp($input);
if ($input !~ m/^(\d+(,\d+)*)?$/) { die('invalid input'); }
my @input = split(/,/, $input );
Or you could read one integer per line:
my @input;
while (my $input = <STDIN>) {
chomp($input);
if ($input !~ m/^\d+$/) { die('invalid input'); }
push(@input, $input );
} ## end while

What does this if statement do? (string comparison)

I am trying to understand a piece of code which loops over a file, does various assignments, then enters a set of if statements where a string is seemingly compared to nothing. What are /nonsynonymous/ and /prematureStop/ being compared to here? I am mostly experienced with python.
open(IN,$file);
while(<IN>){
chomp $_;
my @tmp = split /\t+/,$_;
my $id = join("\t",$tmp[0],$tmp[1]-1);
$id =~ s/chr//;
my @info_field = split /;/,$tmp[2];
my $vat = $info_field[$#info_field];
my $score = 0;
$self -> {VAT} ->{$id}= $vat;
$self ->{GENE} -> {$id} = $tmp[3];
if (/nonsynonymous/ || /prematureStop/){...
It is comparing against the current input line ($_).
By default, perl will automatically use the current input line ($_) when doing regex matches unless overridden (with =~).
From http://perldoc.perl.org/perlretut.html
If you're matching against the special default variable $_ , the $_ =~
part can be omitted:
$_ = "Hello World";
if (/World/) {
print "It matches\n";
}
else {
print "It doesn't match\n";
}
Often in Perl, if a specific variable isn't given, it's assumed that you want to use the default variable $_. For instance, the while loop assigns the incoming lines from <IN> to that variable, chomp $_; could just as well have been written chomp;, and the regular expressions in the if statement try to match with $_ as well.

Perl regular expressions and returned array of matched groups

I am new to Perl and I need to do some regexp matching.
I read that when an array is used as an integer value, it gives the count of elements inside.
So I am doing, for example:
if (@result = $pattern =~ /(\d)\.(\d)/) {....}
and I was thinking it should return an empty array when the pattern match fails, but it still gives me an array with 2 elements, with uninitialized values.
So how can I put pattern matching inside an if condition? Is it possible?
EDIT:
foreach (keys @ARGV) {
if (my @result = $ARGV[$_] =~ /^--(?:(help|br)|(?:(input|output|format)=(.+)))$/) {
if (defined $params{$result[0]}) {
print STDERR "Cmd option error\n";
}
$params{$result[0]} = (defined $result[1] ? $result[1] : 1);
}
else {
print STDERR "Cmd option error\n";
exit ERROR_CMD;
}
}
This is a regexp pattern for command line options. The options are in long format, preceded by two hyphens and possibly taking an argument, so
--CMD[=ARG]. I want an elegant solution, which is why I want to put it in the if condition without any prologue.
EDIT2:
Oh sorry, I was thinking the groups in the @result array are always counted from 0, but only the groups from the branch where the pattern succeeds are filled in. So if the command in my code is "input", I expected it to be in $result[0], but it is actually in $result[1]. I thought that an uninitialized $result[0] meant the pattern had failed and the if branch was still being entered.
Consider the following:
use strict;
use warnings;
my $pattern = 42.42;
my @result = $pattern =~ /(\d)\.(\d)/;
print @result, ' elements';
Output:
24 elements
Context tells Perl how to treat @result. There certainly aren't 24 elements! Perl has printed the array's elements which resulted from your regex's captures. However, if we do the following:
print 0 + @result, ' elements';
we get:
2 elements
In this latter case, Perl interprets a scalar context for @result, so adds the number of elements to 0. This can also be achieved through scalar @result.
Edit to accommodate revised posting: Thus, the conditional in your code:
if(my @result = $ARGV[$_] =~ /^--(?:(help|br)|(?:(input|output|format)=(.+)))$/) { ...
evaluates to true if and only if the match was successful.
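Here is a small sketch of why that works, using the pattern from the question. A list assignment evaluated in scalar context yields the number of elements on its right-hand side, and a failed match in list context returns the empty list. The helper name capture_count and the sample arguments are assumptions for illustration.

```perl
use strict;
use warnings;

# The option pattern from the question.
my $option_re = qr/^--(?:(help|br)|(?:(input|output|format)=(.+)))$/;

# Hypothetical helper: how many values the list-context match returns.
sub capture_count {
    my ($text) = @_;
    my $count = () = $text =~ $option_re;
    return $count;
}

print capture_count('--help'), "\n";         # 3: one slot per group
print capture_count('--input=a.txt'), "\n";  # 3
print capture_count('bogus'), "\n";          # 0: empty list, so the if is false

# Groups from the branch that did not match come back as undef.
my @groups = '--input=a.txt' =~ $option_re;
print defined $groups[0] ? "group 1 set\n" : "group 1 undef\n";
print "group 2 is $groups[1], group 3 is $groups[2]\n";
```

This also shows the effect described in EDIT2: a successful match always returns one slot per capture group, so the option name lands in $result[0] or $result[1] depending on which branch matched.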
@results = $pattern =~ /(\d)\.(\d)/ ? ($1,$2) : ();
Try this:
@result = ();
if ($pattern =~ /(\d)\.(\d)/)
{
push @result, $1;
push @result, $2;
}
=~ is not an equal sign. It's doing a regexp comparison.
So my code above is initializing the array to empty, then assigning values only if the regexp matches.

Calculate Character Frequency in Message using Perl

I am writing a Perl Script to find out the frequency of occurrence of characters in a message. Here is the logic I am following:
Read one char at a time from the message using getc() and store it into an array.
Run a for loop starting from index 0 to the length of this array.
This loop will read each char of the array and assign it to a temp variable.
Run another for loop nested in the above, which will run from the index of the character being tested till the length of the array.
Using a string comparison between this character and the current array indexed char, a counter is incremented if they are equal.
After completion of inner For Loop, I am printing the frequency of the char for debug purposes.
Question: I don't want the program to recompute the frequency of a character if it's already been calculated. For instance, if character "a" occurs 3 times, for the first run, it calculates the correct frequency. However, at the next occurrence of "a", since loop runs from that index till the end, the frequency is (actual freq -1). Similary for the third occurrence, frequency is (actual freq -2).
To solve this. I used another temp array to which I would push the char whose frequency is already evaluated.
And then at the next run of for loop, before entering the inner for loop, I compare the current char with the array of evaluated chars and set a flag. Based on that flag, the inner for loop runs.
This is not working for me. Still the same results.
Here's the code I have written to accomplish the above:
#!/usr/bin/perl
use strict;
use warnings;
my $input=$ARGV[0];
my ($c,$ch,$flag,$s,$count,@arr,@temp);
open(INPUT,"<$input");
while(defined($c = getc(INPUT)))
{
push(@arr,$c);
}
close(INPUT);
my $length=$#arr+1;
for(my $i=0;$i<$length;$i++)
{
$count=0;
$flag=0;
$ch=$arr[$i];
foreach $s (@temp)
{
if($ch eq $s)
{
$flag = 1;
}
}
if($flag == 0)
{
for(my $k=$i;$k<$length;$k++)
{
if($ch eq $arr[$k])
{
$count = $count+1;
}
}
push(@temp,$ch);
print "The character \"".$ch."\" appears ".$count." number of times in the message"."\n";
}
}
You're making your life much harder than it needs to be. Use a hash:
my %freq;
while(defined($c = getc(INPUT)))
{
$freq{$c}++;
}
print $_, " ", $freq{$_}, "\n" for sort keys %freq;
$freq{$c}++ increments the value stored in $freq{$c}. (If it was unset or zero, it becomes one.)
The print line is equivalent to:
foreach my $key (sort keys %freq) {
print $key, " ", $freq{$key}, "\n";
}
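The same hash idea also works on a string that is already in memory. This is a minimal sketch; the sub name char_frequency and the sample word are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: tally every character of a string in one pass.
sub char_frequency {
    my ($text) = @_;
    my %freq;
    $freq{$_}++ for split //, $text;
    return %freq;
}

my %freq = char_frequency("banana");
print "$_ $freq{$_}\n" for sort keys %freq;
# prints:
# a 3
# b 1
# n 2
```

Each character is visited once, so there is no need to track which characters have already been counted.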
If you want to do a single character count for the whole file then use any of the suggested methods posted by the others. If you want a count of all the occurrences of each character in a file then I propose:
#!/usr/bin/perl
use strict;
use warnings;
# read in the contents of the file
my $contents;
open(TMP, "<$ARGV[0]") or die ("Failed to open $ARGV[0]: $!");
{
local($/) = undef;
$contents = <TMP>;
}
close(TMP);
# split the contents around each character
my @bits = split(//, $contents);
# build the hash of each character with its respective count
my %counts = map {
# use lc($_) to make the search case-insensitive
my $foo = $_;
# filter out newlines
$_ ne "\n" ?
($foo => scalar grep {$_ eq $foo} @bits) :
() } @bits;
# reverse sort (highest first) the hash values and print
foreach(reverse sort {$counts{$a} <=> $counts{$b}} keys %counts) {
print "$_: $counts{$_}\n";
}
I don't understand the problem you are trying to solve, so I propose a simpler way to count the characters in a string:
$string = "fooooooobar";
$char = 'o';
$count = grep {$_ eq $char} split //, $string;
print $count, "\n";
This prints the number of $char occurrences in $string (7).
Hope this helps you write more compact code.
Faster solution :
@result = $subject =~ m/a/g; # $subject holds the contents of your file
print "Found : ", scalar @result, " a characters in file!\n";
Of course you can put a variable in the place of 'a' or even better execute this line for whatever characters you want to count the occurrences.
As a one-liner:
perl -F"" -anE '$h{$_}++ for @F; END { say "$_ : $h{$_}" for keys %h }' foo.txt

Perl if equals sign

I need to detect if the first character in a file is an equals sign (=) and display the line number. How should I write the if statement?
$i=0;
while (<INPUT>) {
my($line) = $_;
chomp($line);
$findChar = substr $_, 0, 1;
if($findChar == "=")
$output = "$i\n";
print OUTPUT $output;
$i++;
}
Idiomatic perl would use a regular expression (^ meaning beginning of line) plus one of the dreaded builtin variables which happens to mean "line in file":
while (<INPUT>) {
print "$.\n" if /^=/;
}
See also perldoc -v '$.'
Use $findChar eq "=". In Perl:
== and != are numeric comparisons. They will convert both operands to a number.
eq and ne are string comparisons. They will convert both operands to a string.
Yes, this is confusing. Yes, I still write == when I mean eq ALL THE TIME. Yes, it takes me forever to spot my mistake too.
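A quick sketch of why == goes wrong here: both operands are non-numeric strings, so they are both converted to 0 and compare as equal. The helper name starts_with_equals and the sample lines are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line's first character is "=".
sub starts_with_equals {
    my ($line) = @_;
    return substr($line, 0, 1) eq '=' ? 1 : 0;
}

{
    # Numeric comparison turns both non-numeric strings into 0, so this
    # is true even though the strings differ. With warnings enabled Perl
    # would normally complain; they are silenced here for the demonstration.
    no warnings 'numeric';
    print "== thinks 'a' equals '='\n" if 'a' == '=';
}

print starts_with_equals("=head1 NAME\n"), "\n";  # 1
print starts_with_equals("a line\n"), "\n";       # 0
```

With == in the original if statement, every line would therefore be reported, not just the ones starting with an equals sign.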
It looks like you are not using strict and warnings. Use them. Especially since you do not know Perl, you might also want to add diagnostics to the list of must-use pragmas.
You are keeping track of the input line number in a separate variable $i. Perl has various builtin variables documented in perlvar. Some of these, such as $., are very useful. Use them.
You are using my($line) = $_; in the body of the while loop. Instead, avoid $_ and assign to $line directly as in while ( my $line = <$input> ).
Note that bareword filehandles such as INPUT are package global. With the exception of the DATA filehandle, you are better off using lexical filehandles to properly limit the scope of your filehandles.
In your posts, include sample data in the __DATA__ section so others can copy, paste and run your code without further work.
With these comments in mind, you can print all lines that do not start with = using:
#!/usr/bin/perl
use strict; use warnings;
while (my $line = <DATA> ) {
my $first_char = substr $line, 0, 1;
if ( $first_char ne '=' ) {
print "$.:$first_char\n";
}
}
__DATA__
=
=
a
=
+
However, I would be inclined to write:
while (my $line = <DATA> ) {
# this will skip blank lines
if ( my ($first_char) = $line =~ /^(.)/ ) {
print "$.:$first_char\n" unless $first_char eq '=';
}
}