Counting and printing location of duplicate words in a line using Perl

I am trying to read from a file and print out the location of duplicate words on each line. I have stored the words of each line in an array, but I am not sure if this is the right way to start.
while (my $fileLine = <$fh>){
    my @lineWords = split /\s+/, $fileLine;
    print "@lineWords\n";
}

#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>){
    chomp; # remove end of line chars
    my @wordsInLine = split /\s+/, $_;
    @wordsInLine = map {lc($_)} @wordsInLine; # convert words to lowercase
    my( $word, %wordsInLine, $n );
    for $word (@wordsInLine) {
        $wordsInLine{$word}++; # use hash %wordsInLine to count occurrences of words
    }
    for $word (@wordsInLine) {
        $n++;
        if( (my $count = $wordsInLine{$word}||0) > 1 ) {
            print "line $.: Word $n \"$word\" is repeated $count times\n";
            delete($wordsInLine{$word}); # do not generate more than one report
                                         # about the same word in single line
        }
    }
}
__DATA__
This this is a sample sentence
A that That THAT !
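The answer above reports only the position of the first occurrence of each repeated word. If you want every location, a minimal sketch (assuming, as in the answer, that words are whitespace-separated and compared case-insensitively) collects the positions of each word in a hash of arrays. The sub name `duplicate_positions` and the sample lines are illustrative, not from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: map each repeated word (lower-cased) in $line
# to the list of 1-based positions where it occurs.
sub duplicate_positions {
    my ($line) = @_;
    my %pos;
    my $n = 0;
    for my $word (map { lc($_) } split ' ', $line) {
        push @{ $pos{$word} }, ++$n;    # record the position of this occurrence
    }
    my %dups;
    for my $word (keys %pos) {
        # keep only words that were seen more than once
        $dups{$word} = $pos{$word} if @{ $pos{$word} } > 1;
    }
    return \%dups;
}

# Usage: report duplicates for each line of some sample text.
my @lines = ("This this is a sample sentence", "A that That THAT !");
for my $i (0 .. $#lines) {
    my $dups = duplicate_positions($lines[$i]);
    for my $word (sort keys %$dups) {
        printf qq{line %d: "%s" at positions %s\n},
            $i + 1, $word, join(', ', @{ $dups->{$word} });
    }
}
```

In a real program you would call `duplicate_positions($_)` inside the `while (<$fh>)` loop and use `$.` for the line number.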


how to count the specific word inputted in STDIN inside the text (PERL)

My output just counts all the words found inside the text, but I need to count only the specific word I entered on STDIN.
open my($file), '<','C:\Users\yukari\Desktop\hi.txt' or die "not exist";
print "Search the word:";
$word = <STDIN>;
print "\n";
while ( my $line = <$file> ) {
chomp($line);
# print $line;
foreach $word (split(' ', $line)) {
$count{$word}++;
}
}
foreach $word (sort keys %count) {
print "$word: $count{$word}\n";
}
I believe you want to get a word from the user and count the number of occurrences of that word in the entire text file.
You can try something like this:
use strict;
use warnings;
open(WRITE,'>','log.txt') or die "Unable to open the file";
my $string = <<END;
foo baz bar
baz bar bar foo
foo foo bar bar baz baz
baz baz baz
END
print WRITE $string;
close WRITE;
open(READ, '<','log.txt') or die "unable to open the file";
my $search = <STDIN>;
chomp $search;
my $count = 0;
while ( my $line = <READ> ) {
    chomp($line);
    my @words = split(' ',$line);
    foreach my $word (@words){
        $count++ if($word eq $search);
    }
}
close READ;
print "Search string: $search, Count: $count","\n";
You have a problem here. You are using the variable $word for three different things.
You are using it as the word that you are searching for:
$word = <STDIN>;
You are using it to store each word on a line read from your file:
foreach $word (split(' ', $line)) {
And you are using it to contain the keys you are reading from your hash at the end.
foreach $word (sort keys %count) {
In particular, the second use is interfering with the first. When you are reading data from your file, you have no way of knowing what word you are looking for.
If you are looking for a single word, there is no need for a hash to store the counts. I'd write your code like this:
# ALWAYS INCLUDE THESE
use strict;
use warnings;
use feature 'say';
# Renamed your variable, it's a file handle, not a file.
# Also, include $! in error message so we know what really
# went wrong.
open my $fh, '<', 'C:\Users\yukari\Desktop\hi.txt'
    or die "Can't open file: $!";
print "Search the word:";
my $search_word = <STDIN>;
# Remove the newline, otherwise the comparison below never succeeds
chomp $search_word;
print "\n";
# Scalar variable to store the count
my $count = 0;
# Remove the $line variable and use Perl's default variable $_
# instead. This makes the code *far* cleaner.
while ( <$fh> ) {
    chomp;
    # By default, split splits $_ on whitespace
    foreach my $word (split) {
        # Skip words we don't care about
        next if $word ne $search_word;
        # Increment the counter
        $count++;
    }
}
say "$search_word appeared $count times";
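If you would rather keep the hash from the question (for example, to look up several words later), the fix is to give each role its own variable and look up the searched word only at the end. A minimal sketch, with the hypothetical sub name `word_counts` and an in-memory filehandle standing in for hi.txt:

```perl
use strict;
use warnings;

# Hypothetical helper: count every whitespace-separated word read from $fh.
sub word_counts {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $count{$_}++ for split ' ', $line;   # split ' ' also drops the newline
    }
    return \%count;
}

# Usage with an in-memory file instead of C:\Users\yukari\Desktop\hi.txt.
my $text = "foo bar\nbar baz bar\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my $counts = word_counts($fh);
close $fh;

my $search_word = 'bar';    # in the real program: chomp(my $w = <STDIN>)
print "$search_word appeared ", $counts->{$search_word} // 0, " times\n";
```

The `// 0` (Perl 5.10+) prints 0 instead of an undefined value when the word never occurs.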

Attach frequent words in text file

I have two files. The first contains frequent word sequences extracted from a text file.
a.txt:
big pizza
eat big pizza
...
The text file is b.txt:
i eat big pizza .my big pizza ...
My problem is to add bbb between the words of each sequence from a.txt wherever that sequence occurs in the text, and write the result to a new file.
So the result should be:
i eatbbbbigbbbpizza.my bigbbbpizza...
Below is my script. It adds bbb only between pairs of words. How can I correct this?
use strict;
use warnings;
use autodie;
my ($f1, $f2) = ('a.txt', 'b.txt');
open( my $fh, $f1 );
my @seq;
foreach ( <$fh> ) {
    chomp;
    s/^\s+|\s+$//g;
    push @seq, $_;
}
close $fh;
open($fh, $f2);
foreach (<$fh> ) {
    foreach my $r (@seq) {
        my $t = $r =~ s/ /bbb/r;
        if (/$r/) {
            s/$r/$t/g;
        }
    }
    print ;
}
close $fh;
All that is wrong is your line
my $t = $r =~ s/ /bbb/r;
This substitution runs just once, and so replaces only the first space with bbb.
You need to use a global substitution instead. And while we're changing this line, it's best to also replace the space with \h+, which matches any amount of "horizontal space", including both tabs and spaces:
my $t = $r =~ s/\h+/bbb/gr;
As it stands, your code will find and replace substrings of other entries in @seq if they appear earlier in the array. In this case, that means big pizza will be found first and converted to bigbbbpizza, after which eat big pizza can no longer be found. You need to first sort your array in descending order of length so that longer phrases are found before shorter ones:
@seq = sort { length($b) <=> length($a) } @seq;
Then your program will work a little better.
Here is the modified code.
use strict;
use warnings;
use autodie;
my ($f1, $f2) = ('a.txt', 'b.txt');
open(my $fh, $f1);
my @seq;
foreach (<$fh> )
{
    chomp;
    s/^\s+|\s+$//g;
    push @seq, $_;
}
close $fh;
@seq = sort bylen @seq; # need to sort @seq by length.
open($fh, $f2);
foreach (<$fh> ) {
    foreach my $r (@seq) {
        my $t = $r =~ s/ /bbb/gr;
        s/$r/$t/g; # you may need to take care of cases of extra spaces
    }
    print;
}
close $fh;
exit 0;
sub bylen {
length($b) <=> length($a);
}
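Both versions interpolate each phrase straight into a regex, so a phrase containing metacharacters such as `.` or `(` would be misread as a pattern. A minimal sketch that quotes each word with `quotemeta`, allows any run of horizontal whitespace between words, and tries longer phrases first within a single alternation. The sub name `join_sequences` is illustrative, and `\h` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: glue the words of every phrase in @seqs with "bbb"
# wherever that phrase occurs in $text.
sub join_sequences {
    my ($text, @seqs) = @_;
    my @alts;
    # Longest phrases first, so "eat big pizza" wins over "big pizza".
    for my $phrase (sort { length($b) <=> length($a) } @seqs) {
        # quote each word so "." or "(" match literally; allow any horizontal space between words
        push @alts, join '\h+', map { quotemeta($_) } split ' ', $phrase;
    }
    return $text unless @alts;    # an empty alternation would match everywhere
    my $re = join '|', @alts;
    $text =~ s{($re)}{
        my $m = $1;
        $m =~ s/\h+/bbb/g;        # replace the gaps inside the matched phrase
        $m;
    }ge;
    return $text;
}

my @seqs = ('big pizza', 'eat big pizza');
print join_sequences("i eat big pizza .my big pizza ...\n", @seqs);
```

Because the alternation tries alternatives left to right at each position, sorting by length plays the same role as the sort in the answers above, but the text is scanned only once.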

Can't retrieve values from hash reversal (Perl)

I've initialized a hash with names and their class rankings as follows:
a=>5,b=>2,c=>1,d=>3,e=>5
I have this code so far:
my %Ranks = reverse %Class; #As I need to find out who's ranked first
print "\nFirst place goes to.... ", $Ranks{1};
The code only prints out
"First place goes to...."
I want it to print out
First place goes to....c
Could you tell me where I'm going wrong here?
The class hash prints correctly, but if I try to print the reversed hash using
foreach $t (keys %Ranks) {
print "\n $t $Ranks{$t}"; }
It prints
5
abc23
cab2
ord
If this helps in any way
FULL CODE
#Script to read from the data file and initialize it into a hash
my %Code;
my %Ranks;
#Check whether the file exists
open(fh, "Task1.txt") or die "The File Does Not Exist!\n", $!;
while (my $line = <fh>) {
chomp $line;
my @fields = split /,/, $line;
$Code{$fields[0]} = $fields[1];
$Class{$fields[0]} = $fields[2];
}
close(fh);
#Prints the dataset
print "Code \t Name\n";
foreach $code ( keys %Code) {
print "$code \t $Code{$code}\n";
}
#Find out who comes first
my %Ranks = reverse %Class;
foreach $t (keys %Ranks)
{
print "\n $t $Ranks{$t}";
}
print "\nFirst place goes to.... ", $Ranks{1}, "\n";
When you want to check what your data structures actually contain, use Data::Dumper:
use Data::Dumper;
local $Data::Dumper::Useqq = 1;
print(Dumper(\%Class));
You'll find un-chomped newlines.
You need to use chomp. At present your $fields[2] value has a trailing newline.
Change your file read loop to this:
while (my $line = <fh>) {
    chomp $line;
    my @fields = split /,/, $line;
    $Code{$fields[0]} = $fields[1];
    $Class{$fields[0]} = $fields[2];
}
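If Task1.txt was saved with Windows line endings, `chomp` on a Unix-like system removes only the `\n` and leaves a `\r` on each rank. That is an assumption, not something the question confirms, but it would explain why the loop above does not help. Separately, `reverse %Class` keeps only one name per rank, so the two people ranked 5 collapse into one entry. A minimal sketch that strips either line ending and looks rank 1 up directly; the sub names and the sample data (names Ann, Bob, ...) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical reader for lines of the form "code,name,rank".
sub read_ranks {
    my ($fh) = @_;
    my (%code, %class);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;              # strip LF or CRLF, unlike a plain chomp on Unix
        next unless length $line;          # ignore blank lines
        my ($id, $name, $rank) = split /,/, $line;
        $code{$id}  = $name;
        $class{$id} = $rank;
    }
    return (\%code, \%class);
}

# Find everyone with rank 1 without reversing the hash,
# so ties (two people ranked 5, say) are not silently lost.
sub first_place {
    my ($class) = @_;
    my @names = grep { $class->{$_} == 1 } keys %$class;
    return sort @names;
}

my $data = "a,Ann,5\r\nb,Bob,2\r\nc,Cid,1\r\nd,Dee,3\r\ne,Eve,5\r\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my ($code, $class) = read_ranks($fh);
close $fh;
print "First place goes to.... ", join(', ', first_place($class)), "\n";
```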

How can I iterate through nested arrays?

I have created an array as follows:
while (defined ($line = <STDIN>))
{
    chomp ($line);
    push @stack, $line;
}
Each line has two numbers:
15 6
2 8
How do I iterate over each item in each line?
That is, I want to print:
15
6
2
8
I understand it's something like
foreach (@{stack}) (@stack){
    print "?????";
}
This is where I am stuck.
See the perldsc documentation. That's the Perl Data Structures Cookbook, which has examples for dealing with arrays of arrays. From what you're doing though, it doesn't look like you need an array of arrays.
For your problem of taking two numbers per line and outputting one number per line, just turn the whitespace into newlines:
while( <> ) {
    s/\s+/\n/g; # turn all whitespace runs into newlines
    print;      # it's ready to print
}
With Perl 5.10 or later, you can use the \h character class, which matches only horizontal whitespace:
while( <> ) {
    s/\h+/\n/g; # turn all horizontal whitespace runs into newlines
    print;      # it's ready to print
}
As a Perl one-liner, that's just:
% perl -pe 's/\h+/\n/g' file.txt
#!/usr/bin/perl
use strict;
use warnings;
while ( my $data = <DATA> ) {
    my @values = split ' ', $data;
    print $_, "\n" for @values;
}
__DATA__
15 6
2 8
Output:
C:\Temp> h
15
6
2
8
Alternatively, if you want to store each line in @stack and print it out later:
my @stack = map { [ split ] } grep { chomp; length } <DATA>;
The line above slurps everything coming from the DATA filehandle into a list of lines (because <DATA> happens in list context). The grep chomps each line and filters by length after chomping (to avoid getting any trailing empty lines in the data file; you can skip this step if there are none). The map then splits each line on whitespace and creates an anonymous array reference for each line. Finally, these array references are stored as the elements of @stack. You might want to use Data::Dumper to look at @stack to understand what's going on.
print join("\n", @$_), "\n" for @stack;
Now, we loop over each entry in @stack, dereferencing each array in turn, then joining the elements of each array with newlines to print one element per line.
Output:
C:\Temp> h
15
6
2
8
The long way of writing essentially the same thing (with less memory consumption) would be:
my @stack;
while ( my $line = <DATA> ) {
    last unless $line =~ /\S/;
    my @values = split ' ', $line;
    push @stack, \@values;
}
for my $ref ( @stack ) {
    print join("\n", @$ref), "\n";
}
Finally, if you wanted to do something other than printing all values, say, sum all the numbers, you should store one value per element of @stack:
use List::Util qw( sum );
my @stack;
while ( my $line = <DATA> ) {
    last unless $line =~ /\S/;
    my @values = split ' ', $line;
    push @stack, @values;
}
printf "The sum is %d\n", sum @stack;
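Since the question is literally about iterating through nested arrays, here is a minimal sketch of the explicit double loop over an array of arrays, using `$#{ $rows[$i] }` for the last index of each inner array. The sub names `read_rows` and `describe_rows` are illustrative, and the input is the two sample lines from the question.

```perl
use strict;
use warnings;

# Hypothetical reader: one array reference per non-blank line.
sub read_rows {
    my ($fh) = @_;
    my @rows;
    while (my $line = <$fh>) {
        my @values = split ' ', $line;
        push @rows, \@values if @values;   # skip blank lines; @values is fresh each pass
    }
    return @rows;
}

# Walk the array of arrays with explicit indices.
sub describe_rows {
    my @rows = @_;
    my @out;
    for my $i (0 .. $#rows) {
        for my $j (0 .. $#{ $rows[$i] }) {
            push @out, "row $i, column $j: $rows[$i][$j]";
        }
    }
    return @out;
}

my $input = "15 6\n2 8\n";
open my $fh, '<', \$input or die "Can't open in-memory file: $!";
my @stack = read_rows($fh);
close $fh;
print "$_\n" for describe_rows(@stack);
```

If you do not need the indices, `for my $row (@stack) { for my $item (@$row) { ... } }` walks the same structure.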
#!/usr/bin/perl
use strict;
use warnings;
my @stack;
while (my $line = <STDIN>) {
    chomp ($line);
    push @stack, $line;
}
# prints each line
foreach my $line (@stack) {
    print "$line\n";
}
# splits each line into items using ' ' as separator
# and prints the items
foreach my $line (@stack) {
    my @items = split ' ', $line;
    foreach my $item (@items) {
        print $item . "\n";
    }
}
I use 'for' for "C" style loops, and 'foreach' for iterating over lists.
#!/usr/bin/perl
use strict;
use warnings;
open my $in, '<', 'read.txt'
    or die "Can't read in 'read.txt': $!";
my $content = join '', <$in>;
close $in;
while ($content =~ m/(\d+)/g) {
    print "$1\n";
}
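The `while (... =~ m/.../g)` loop above walks the matches one at a time. When you just want all of them, a `/g` match in list context returns every captured number at once. A small sketch on the question's sample data; the sub name `all_numbers` is illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: return every run of digits in $text, in order.
sub all_numbers {
    my ($text) = @_;
    my @numbers = $text =~ /(\d+)/g;   # list context: one element per match
    return @numbers;
}

print "$_\n" for all_numbers("15 6\n2 8\n");
```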

Cleanest Perl parser for Makefile-like continuation lines

A Perl script I'm writing needs to parse a file that has continuation lines like a Makefile, i.e. lines that begin with whitespace are part of the previous line.
I wrote the code below but don't feel like it is very clean or Perl-ish (heck, it doesn't even use "redo"!)
There are many edge cases: EOF at odd places, single-line files, files that start or end with a blank line (or non-blank line, or continuation line), empty files. All my test cases (and code) are here: http://whatexit.org/tal/flatten.tar
Can you write cleaner, more Perl-ish code that passes all my tests?
#!/usr/bin/perl -w
use strict;
sub process_file_with_continuations {
my $processref = shift @_;
my $nextline;
my $line = <ARGV>;
$line = '' unless defined $line;
chomp $line;
while (defined($nextline = <ARGV>)) {
chomp $nextline;
next if $nextline =~ /^\s*#/; # skip comments
$nextline =~ s/\s+$//g; # remove trailing whitespace
if (eof()) { # Handle EOF
$nextline =~ s/^\s+/ /;
if ($nextline =~ /^\s+/) { # indented line
&$processref($line . $nextline);
}
else {
&$processref($line);
&$processref($nextline) if $nextline ne '';
}
$line = '';
}
elsif ($nextline eq '') { # blank line
&$processref($line);
$line = '';
}
elsif ($nextline =~ /^\s+/) { # indented line
$nextline =~ s/^\s+/ /;
$line .= $nextline;
}
else { # non-indented line
&$processref($line) unless $line eq '';
$line = $nextline;
}
}
&$processref($line) unless $line eq '';
}
sub process_one_line {
my $line = shift @_;
print "$line\n";
}
process_file_with_continuations \&process_one_line;
How about slurping the whole file into memory and processing it with regular expressions? It's much more 'perlish'. This passes your tests and is much smaller and neater:
#!/usr/bin/perl
use strict;
use warnings;
$/ = undef; # we want no input record separator.
my $file = <>; # slurp whole file
$file =~ s/^\n//; # Remove newline at start of file
$file =~ s/\s+\n/\n/g; # Remove trailing whitespace.
$file =~ s/\n\s*#[^\n]+//g; # Remove comments.
$file =~ s/\n\s+/ /g; # Merge continuations
# Done
print $file;
If you don't mind loading the entire file in memory, then the code below passes the tests.
It stores the lines in an array, adding each line either to the previous one (continuation) or at the end of the array (other).
#!/usr/bin/perl
use strict;
use warnings;
my @out;
while( <>)
{ chomp;
s{#.*}{}; # suppress comments
next unless( m{\S}); # skip blank lines
if( s{^\s+}{ }) # does the line start with spaces?
{ $out[-1] .= $_; } # yes, continuation, add to last line
else
{ push @out, $_; } # no, add as new line
}
$, = "\n"; # set output field separator
$\ = "\n"; # set output record separator
print @out;
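One edge case worth checking in the array-based version: if the very first non-blank line is indented, `$out[-1] .= $_` modifies an element of an empty array, which is a fatal error in Perl ("Modification of non-creatable array value attempted"). A minimal streaming sketch with a guard for that case follows. Treating a leading continuation as the start of a new logical line is an assumption, and like the array-based answer it strips everything after `#` and simply skips blank lines. The sub name `flatten_continuations` is illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh and return the logical lines,
# joining indented lines onto the previous one with a single space.
sub flatten_continuations {
    my ($fh) = @_;
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/#.*//;            # drop comments (assumes '#' never appears in data)
        $line =~ s/\s+\z//;          # drop trailing whitespace
        next unless $line =~ /\S/;   # skip blank lines
        if ($line =~ s/^\s+/ / && @out) {
            $out[-1] .= $line;       # continuation: append to the previous logical line
        }
        else {
            $line =~ s/^\s+//;       # a leading continuation starts a new line (assumption)
            push @out, $line;
        }
    }
    return @out;
}

my $sample = "all: foo\n  bar\n# a comment\n\nbaz\n  qux  \n";
open my $fh, '<', \$sample or die "Can't open in-memory file: $!";
print "$_\n" for flatten_continuations($fh);
close $fh;
```

Reading line by line keeps memory use flat for large inputs, at the cost of a few more lines than the slurp-and-regex version.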