!C:\Perl\bin\perl.exe
use strict;
use warnings;
my $numArgs = $#ARGV + 1;
print "thanks, you gave me $numArgs command-line arguments.\n";
while (my $line = <DATA> ) {
foreach my $argnum (0 .. $#ARGV) {
if ($line =~ /$ARGV[$argnum]/)
{
print $line;
}
}
}
__DATA__
A
B
Hello World :-)
Hello World !
when I passed one arg, it works well.
Such as I run test.pl A or test.pl B or **test.pl Hello"
when I passed two args, it works some time only.
Successful: When I run test.pl A B or test.pl A Hello or **test.pl B Hello"
Failed: when I run test.pl Hello World*
Produced and output duplicate lines:
D:\learning\perl>t.pl Hello World
thanks, you gave me 2 command-line arguments.
Hello World :-)
Hello World :-)
Hello World !
Hello World !
D:\learning\perl>
How to fix it? Thank you for reading and replies.
[update]
I don't want to print duplicate lines.
I don't see the problem, your script processes the __DATA__ and tests all input words against it: since "Hello" and "World" match twice each, it prints 4 rows.
If you don't want it to write multiple lines, just add last; after the print statement.
The reason you're getting the duplicate output is because the regex $line =~ /Hello/ matches both "Hello World" lines and $line =~ /World/ also matches both "Hello World" lines. To prevent that, you'll need to add something to remember which lines from the __DATA__ section have already been printed so that you can skip printing them if they match another argument.
Also, some very minor stylistic cleanup:
#!C:\Perl\bin\perl.exe
use strict;
use warnings;
my $numArgs = #ARGV;
print "thanks, you gave me $numArgs command-line arguments.\n";
while (my $line = <DATA> ) {
foreach my $arg (#ARGV) {
if ($line =~ /$arg/)
{
print $line;
}
}
}
__DATA__
A
B
Hello World :-)
Hello World !
Using an array in scalar context returns its size, so $size = #arr is preferred over $size = $#arr + 1
If you're not going to use a counter for anything other than indexing through an array (for $i (0..$#arr) { $elem = $arr[$i]; ... }), then it's simpler and more straightforward to just loop over the array instead (for $elem (#arr) { ... }).
Your foreach loop could also be replaced with a grep statement, but I'll leave that as an exercise for the reader.
Assuming you want to print each line from DATA only once if one or more patterns match, you can use grep. Note that use of \Q to quote regex metacharacters in the command line arguments and the use of the #patterns array to precompile the patterns.
Read if grep { $line =~ $_ } #patterns out loud: If $line matches one or more patterns ;-)
#!/usr/bin/perl
use strict; use warnings;
printf "Thanks, you gave me %d command line arguments.\n", scalar #ARGV;
my #patterns = map { qr/\Q$_/ } #ARGV;
while ( my $line = <DATA> ) {
print $line if grep { $line =~ $_ } #patterns;
}
__DATA__
A
B
Hello World :-)
Hello World !
Here are some comments on your script to help you learn:
my $numArgs = $#ARGV + 1;
print "thanks, you gave me $numArgs command-line arguments.\n";
The command line arguments are in #ARGV (please do read the documentation). In scalar context, #ARGV evaluates to the number of elements in that array. Therefore, you can simply use:
printf "Thanks, you gave me %d command line arguments.\n", scalar #ARGV;
Further, you can iterate directly over the elements of #ARGV in your foreach loop instead of indexed access.
while (my $line = <DATA> ) {
foreach my $arg ( #ARGV ) {
if ( $line =~ /$arg/ ) {
print $line;
}
}
}
Now, what happens to your program if I pass ( to it on the command line? Or, even World? What should happen?
Related
Here is a part of my script:
foreach $i ( #contact_list ) {
print "$i\n";
$e = "zcat $file_list2| grep $i";
print "$e\n";
$f = qx($e);
print "$f";
}
$e prints properly but $f gives a blank line even when $file_list2 has a match for $i.
Can anyone tell me why?
Always is better to use Perl's grep instead of using pipe :
#lines = `zcat $file_list2`; # move output of zcat to array
die('zcat error') if ($?); # will exit script with error if zcat is problem
# chomp(#lines) # this will remove "\n" from each line
foreach $i ( #contact_list ) {
print "$i\n";
#ar = grep (/$i/, #lines);
print #ar;
# print join("\n",#ar)."\n"; # in case of using chomp
}
Best solution is not calling zcat, but using zlib library :
http://perldoc.perl.org/IO/Zlib.html
use IO::Zlib;
# ....
# place your defiiniton of $file_list2 and #contact list here.
# ...
$fh = new IO::Zlib; $fh->open($file_list2, "rb")
or die("Cannot open $file_list2");
#lines = <$fh>;
$fh->close;
#chomp(#lines); #remove "\n" symbols from lines
foreach $i ( #contact_list ) {
print "$i\n";
#ar = grep (/$i/, #lines);
print (#ar);
# print join("\n",#ar)."\n"; #in case of using chomp
}
Your question leaves us guessing about many things, but a better overall approach would seem to be opening the file just once, and processing each line in Perl itself.
open(F, "zcat $file_list |") or die "$0: could not zcat: $!\n";
LINE:
while (<F>) {
######## FIXME: this could be optimized a great deal still
foreach my $i (#contact_list) {
if (m/$i/) {
print $_;
next LINE;
}
}
}
close (F);
If you want to squeeze out more from the inner loop, compile the regexes from #contact_list into a separate array before the loop, or perhaps combine them into a single regex if all you care about is whether one of them matched. If, on the other hand, you want to print all matches for one pattern only at the end when you know what they are, collect matches into one array per search expression, then loop them and print when you have grepped the whole set of input files.
Your problem is not reproducible without information about what's in $i, but I can guess that it contains some shell metacharacter which causes it to be processed by the shell before the grep runs.
When I run this code, I am purely trying to get all the lines containing the word "that" in them. Sounds easy enough. But when I run it, I get a list of matches that contain the word "that" but only at the end of the line. I don't know why it's coming out like this and I have been going crazy trying to solve it. I am currently getting an output of 268 total matches, and the output I need is only 13. Please advise!
#!/usr/bin/perl -w
#Usage: conc.shift.pl textfile word
open (FH, "$ARGV[0]") || die "cannot open";
#array = (1,2,3,4,5);
$count = 0;
while($line = <FH>) {
chomp $line;
shift #array;
push(#array, $line);
$count++;
if ($line =~ /that/)
{
$output = join(" ",#array);
print "$output \n";
}
}
print "Total matches: $count\n";
Don't you want to increment your $count variable only if the line contains "that", i.e.:
if ($line =~ /that/) {
$count++;
instead of incrementing the counter before checking if $line contains "that", as you have it:
$count++;
if ($line =~ /that/) {
Similarly, I suspect that your push() and join() calls, for stashing a matching line in #array, should also be within the if block, only executed if the line contains "that".
Hope this helps!
I felt there must a better way to count occurrence instead of writing a sub in perl, shell in Linux.
#/usr/bin/perl -w
use strict;
return 1 unless $0 eq __FILE__;
main() if $0 eq __FILE__;
sub main{
my $str = "ru8xysyyyyyyysss6s5s";
my $char = "y";
my $count = count_occurrence($str, $char);
print "count<$count> of <$char> in <$str>\n";
}
sub count_occurrence{
my ($str, $char) = #_;
my $len = length($str);
$str =~ s/$char//g;
my $len_new = length($str);
my $count = $len - $len_new;
return $count;
}
If the character is constant, the following is best:
my $count = $str =~ tr/y//;
If the character is variable, I'd use the following:
my $count = length( $str =~ s/[^\Q$char\E]//rg );
I'd only use the following if I wanted compatibility with versions of Perl older than 5.14 (as it is slower and uses more memory):
my $count = () = $str =~ /\Q$char/g;
The following uses no memory, but might be a bit slow:
my $count = 0;
++$count while $str =~ /\Q$char/g;
Counting the occurences of a character in a string can be performed with one line in Perl (as compared to your 4 lines). There is no need for a sub (although there is nothing wrong with encapsulating functionality in a sub). From perlfaq4 "How can I count the number of occurrences of a substring within a string?"
use warnings;
use strict;
my $str = "ru8xysyyyyyyysss6s5s";
my $char = "y";
my $count = () = $str =~ /\Q$char/g;
print "count<$count> of <$char> in <$str>\n";
In a beautiful* Bash/Coreutils/Grep one-liner:
$ str=ru8xysyyyyyyysss6s5s
$ char=y
$ fold -w 1 <<< "$str" | grep -c "$char"
8
Or maybe
$ grep -o "$char" <<< "$str" | wc -l
8
The first one works only if the substring is just one character long; the second one works only if the substrings are non-overlapping.
* Not really.
toolic has given a correct answer, but you might consider not hardcoding your values to make the program reusable.
use strict;
use warnings;
die "Usage: $0 <text> <characters>" if #ARGV < 1;
my $search = shift; # the string you are looking for
my $str; # the input string
if (#ARGV && -e $ARGV[0] || !#ARGV) { # if str is file, or there is no str
local $/; # slurp input
$str = <>; # use diamond operator
} else { # else just use the string
$str = shift;
}
my $count = () = $str =~ /\Q$search\E/gms;
print "Found $count of '$search' in '$str'\n";
This will allow you to use the program to count for the occurrence of a character, or a string, inside a string, a file, or standard input. For example:
count.pl needles haystack.txt
some_process | count.pl foo
count.pl x xyzzy
I am trying to read from a file and print out the location of duplicate words on each line.I have stored each line in an array, but I am not sure if this is the right way to start.
while (my $fileLine = <$fh>){
my #lineWords = split /\s+/, $fileLine;
print "#\n"
}
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>){
chomp; # remove end of line chars
my #wordsInLine = split /\s+/, $_;
#wordsInLine = map {lc($_)} #wordsInLine; # convert words to lowercase
my( $word, %wordsInLine, $n );
for $word (#wordsInLine) {
$wordsInLine{$word}++; # use hash %wordsInLine to count occurences of words
}
for $word (#wordsInLine) {
$n++;
if( (my $count = $wordsInLine{$word}||0) > 1 ) {
print "line $.: Word $n \"$word\" is repeated $count times\n";
delete($wordsInLine{$word}); # do not generate more than one report
# about the same word in single line
}
}
}
__DATA__
This this is a sample sentence
A that That THAT !
I have created an array as follows
while (defined ($line = `<STDIN>`))
{
chomp ($line);
push #stack,($line);
}
each line has two numbers.
15 6
2 8
how do iterate over each item in each line?
i.e. I want to print
15
6
2
8
I understand it's something like
foreach (#{stack}) (#stack){
print "?????
}
This is where I am stuck.
See the perldsc documentation. That's the Perl Data Structures Cookbook, which has examples for dealing with arrays of arrays. From what you're doing though, it doesn't look like you need an array of arrays.
For your problem of taking two numbers per line and outputting one number per line, just turn the whitespace into newlines:
while( <> ) {
s/\s+/\n/; # turn all whitespace runs into newlines
print; # it's ready to print
}
With Perl 5.10, you can use the new \h character class that matches only horizontal whitespace:
while( <> ) {
s/\h+/\n/; # turn all horizontal whitespace runs into newlines
print; # it's ready to print
}
As a Perl one-liner, that's just:
% perl -pe 's/\h+/\n/' file.txt
#!/usr/bin/perl
use strict;
use warnings;
while ( my $data = <DATA> ) {
my #values = split ' ', $data;
print $_, "\n" for #values;
}
__DATA__
15 6
2 8
Output:
C:\Temp> h
15
6
2
8
Alternatively, if you want to store each line in #stack and print out later:
my #stack = map { [ split ] } grep { chomp; length } <DATA>;
The line above slurps everything coming from the DATA filehandle into a list of lines (because <DATA> happens in list context). The grep chomps each line and filters by length after chomping (to avoid getting any trailing empty lines in the data file -- you can avoid it if there are none). The map then splits each line along spaces, and then creates an anonymous array reference for each line. Finally, such array references are stored in each element of #stack. You might want to use Data::Dumper to look at #stack to understand what's going on.
print join("\n", #$_), "\n" for #stack;
Now, we look over each entry in stack, dereferencing each array in turn, then joining the elements of each array with newlines to print one element per line.
Output:
C:\Temp> h
15
6
2
8
The long way of writing essentially the same thing (with less memory consumption) would be:
my #stack;
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
my #values = split ' ', $line;
push #stack, \#values;
}
for my $ref ( #stack ) {
print join("\n", #$ref), "\n";
}
Finally, if you wanted do something other than printing all values, say, sum all the numbers, you should store one value per element of #stack:
use List::Util qw( sum );
my #stack;
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
my #values = split ' ', $line;
push #stack, #values;
}
printf "The sum is %d\n", sum #stack;
#!/usr/bin/perl
while ($line = <STDIN>) {
chomp ($line);
push #stack, $line;
}
# prints each line
foreach $line (#stack) {
print "$line\n";
}
# splits each line into items using ' ' as separator
# and prints the items
foreach $line (#stack) {
#items = split / /, $line;
foreach $item (#items) {
print $item . "\n";
}
}
I use 'for' for "C" style loops, and 'foreach' for iterating over lists.
#!/usr/bin/perl
use strict;
use warnings;
open IN, "< read.txt" or
die "Can't read in 'read.txt'!";
my $content = join '', <IN>;
while ($content =~ m`(\d+)`g) {
print "$1\n";
}