So i have been working on this perl script that will analyze and count the same letters in different line spaces. I have implemented the count to a hash but am having trouble excluding a " - " character from the output results of this hash. I tried using delete command or next if, but am not getting rid of the - count in the output.
So with this input:
#extract = ------------------------------------------------------------------MGG-------------------------------------------------------------------------------------
And following code:
#Count selected amino acids.
my %counter = ();
foreach my $extract(#extract) {
#next if $_ =~ /\-/; #This line code does not function correctly.
$counter{$_}++;
}
sub largest_value_mem (\%) {
my $counter = shift;
my ($key, #keys) = keys %$counter;
my ($big, #vals) = values %$counter;
for (0 .. $#keys) {
if ($vals[$_] > $big) {
$big = $vals[$_];
$key = $keys[$_];
}
}
$key
}
I expect the most common element to be G, same as the output. If there is a tie in the elements, say G = M, if there is a way to display both in that would be great but not necessary. Any tips on how to delete or remove the '-' is much appreciated. I am slowly learning perl language.
Please let me know if what I am asking is not clear or if more information is needed, thanks again kindly for all the comments.
Your data doesn't entirely make sense, since it's not actually working perl code. I'm guessing that it's a string divided into characters. After that it sounds like you just want to be able to find the highest frequency character, which is essentially just a sort by descending count.
Therefore the following demonstrates how to count your characters and then sort the results:
use strict;
use warnings;
my $str = '------------------------------------------------------------------MGG-------------------------------------------------------------------------------------';
my #chars = split '', $str;
#Count Characteres
my %count;
$count{$_}++ for #chars;
delete $count{'-'}; # Don't count -
# Sort keys by count descending
my #keys = sort {$count{$b} <=> $count{$a}} keys %count;
for my $key (#keys) {
print "$key $count{$key}\n";
}
Outputs:
G 2
M 1
foreach my $extract(#extract) {
#next if $_ =~ /\-/
$_ setting is suppressed by $extract here.
(In this case, $_ keeps value from above, e.g. routine argument list, previous match, etc.)
Also, you can use character class for better readability:
next if $extract=~/[-]/;
Related
I'm having trouble understanding why this code will not output anything:
#!/usr/bin/perl -w
use strict;
my %allwords = (); #Create an empty hash list.
my $running_total = 0;
while (<>) {
print "In the loop 1";
chomp;
print "Got here";
my #words = split(/\W+/,$_);
}
foreach my $val (my #words) {
print "$val\n";
}
And I run it from the terminal using the command:
perl wordfinder.pl < exampletext.txt
I would expect the code above to output each word from the input file, but it does not output anything other than "In the loop 1" and "Got here". I'm trying to separate the input file word by word, using the split parameter I specified.
Update 1: Here, I have declared the variables within their proper scope, which was my main issue. Now I am getting all of the words from the input file to output on the terminal:
my %allwords = (); #Create an empty hash list.
my $running_total = 0;
my #words = ();
my $val;
while (<>) {
print "Inputting words into an array! \n";
chomp;
#words = split(/\W+/,$_);
}
print("Words have been input successfully, performing analysis: \n");
foreach $val (#words) {
print "$val\n";
}
UPDATE 2: Progress has been made. Now, we put all words from any input files into a hash, and then print each unique key (i.e. each unique word found across all input files) from the hash.
#!/usr/bin/perl -w
use strict;
# Description: We want to take ALL text files from the command line input and calculate
# the frequencies of the words contained therein.
# Step 1: Loop over all words in all input files, and put each new unique word in a
# hash (check to see if contained in hash, if not, put the word in; if the word already
# exists in the hash, then increase its "total" by 1). Also, keep a running total of
# all words.
print("Welcome to word frequency finder. \n");
my $running_total = 0;
my %words;
my $val;
while (<>) {
chomp;
foreach my $str (split(/\W+/,$_)) {
$words{$str}++;
$running_total++;
}
}
print("Words have been input successfully, performing analysis: \n");
# Step 2: Loop over all entries in the hash and look for the word (key) with the
# maximum amount, and then remove this from the hash and put in a separate list.
# Do this until the size of the separate list is 10, since we want the top 10 words.
foreach $val (keys %words) {
print "$val\n";
}
Since you've already completed step 1, you're left with getting your top ten most common words. Rather than looping through the hash and finding the most frequent entry, let's let Perl do the work for us by sorting the hash by its values.
To sort the %words hash by its keys, we can use the expression sort keys %words; to sort a hash by its values, but be able to access its keys, we need a more complex expression:
sort { $words{$a} <=> $words{$a} } keys %words
Breaking it down, to sort numerically, we use the expression
sort { $a <=> $b } #array
(see [perl sort][1] for more on the special variables $a and $b used in sorting)
sort { $a <=> $b } keys %words
would sort on the hash keys, so to sort on the values, we do
sort { $words{$a} <=> $words{$b} } keys %words
Note that the output is still the keys of the hash %words.
We actually want to sort from high to low, so swap $a and $b over to reverse the sort direction:
sort { $words{$b} <=> $words{$a} } keys %words
Since we're compiling a top ten list, we only want the first ten from our hash. It's possible to do this by taking a slice of the hash, but the easiest way is just to use an accumulator to keep count of how many entries we have in the top ten:
my %top_ten;
my $i = 0;
for (sort { $words{$b} <=> $words{$a} } keys %words) {
# $_ is the current hash key
$top_ten{$_} = $words{$_};
$i++;
last if $i == 10;
}
And we're done!
I have a hash that I sorted by values greatest to least. How would I go about getting the top 5? There was a post on here that talked about getting only one value.
What is the easiest way to get a key with the highest value from a hash in Perl?
I understand that so would lets say getting those values add them to an array and delete the element in the hash and then do the process again?
Seems like there should be an easier way to do this then that though.
My hash is called %words.
Edited Took out code as the question answered without really needing it.
Your question is how to get the five highest values from your hash. You have this code:
my #keys = sort {
$words{$b} <=> $words{$a}
or
"\L$a" cmp "\L$b"
} keys %words;
Where you have your sorted hash keys. Take the five top keys from there?
my #highest = splice #keys, 0, 5; # also deletes the keys from the array
my #highest = #keys[0..4]; # non-destructive solution
Also some comments on your code:
open( my $filehandle0, '<', $file0 ) || die "Could not open $file0\n";
It is a good idea to include the error message $! in your die statement to get valuable information for why the open failed.
for (#words) {
s/[\,|\.|\!|\?|\:|\;|\"]//g;
}
Like I said in the comment, you do not need to escape characters or use alternations in a character class bracket. Use either:
s/[,.!?:;"]//g for #words; #or
tr/,.!?:;"//d for #words;
This next part is a bit odd.
my #stopwords;
while ( my $line = <$filehandle1> ) {
chomp $line;
my #linearray = split( " ", $line );
push( #stopwords, #linearray );
}
for my $w ( my #stopwords ) {
s/\b\Q$w\E\B//ig;
}
You read in the stopwords from a file... and then you delete the stopwords from $_? Are you even using $_ at this point? Moreover, you are redeclaring the #stopwords array in the loop header, which will effectively mean your new array will be empty, and your loop will never run. This error is silent, it seems, so you might never notice.
my %words = %words_count;
Here you make a copy of %words_count, which seems to be redundant, since you never use it again. If you have a big hash, this can decrease performance.
my $key_count = 0;
$key_count = keys %words;
This can be done in one line: my $key_count = keys %words. More readable, in my opinion.
$value_count = $words{$key} + $value_count;
Can also be abbreviated with the += operator: $value_cont += $words{$key}
It is very good that you use strict and warnings.
If performance isn't a big deal
(sort {$words{$a} <=> $words{$b}} keys %words)[0..4])
if you absolutely need killer speed, a selection sort which terminates after 5 iterations is probably the best thing for you.
my #results;
for (0..4) {
my $maxkey;
my $max = 0;
for my $key (keys %words){
if ($max < $words{$key}){
$maxkey = $key;
$max = $words{$key};
}
}
push #results, $maxkey;
delete $words{$maxkey};
}
say join(","=>#results);
There's CPAN module for that, Sort::Key::Top.
It has a straight-forward interface and an efficient XS implementation:
use Sort::Key::Top qw(rnkeytop);
my #results = rnkeytop { $words{$_} } 5 => keys %words;
# my code as follows
use strict;
use FileHandle;
my #LISTS = ('incoming');
my $WORK ="c:\";
my $OUT ="c:\";
foreach my $list (#LISTS) {
my $INFILE = $WORK."test.dat";
my $OUTFILE = $OUT."TEST.dat";
while (<$input>) {
chomp;
my($f1,$f2,$f3,$f4,$f5,$f6,$f7) = split(/\|/);
push #sum, $f4,$f7;
}
}
while (#sum) {
my ($key,$value)= {shift#sum, shift#sum};
$hash{$key}=0;
$hash{$key} += $value;
}
while my $key (#sum) {
print $output2 sprintf("$key1\n");
# print $output2 sprintf("$key ===> $hash{$key}\n");
}
close($input);
close($output);
I am getting errors Unintialized error at addition (+) If I use 2nd print
I get HASH(0x19a69451) values if I use 1st Print.
I request you please correct me.
My output should be
unique Id ===> Total Revenue ($f4==>$f7)
This is wrong:
"c:\";
Perl reads that as a string starting with c:";\n.... Or in other words, it is a run away string. You need to write the last character as \\ to escape the \ and prevent it from escaping the subsequent " character
You probably want to use parens instead of braces:
my ($key, $value) = (shift #sum, shift #sum);
You would get that Unintialized error at addition (+) warning if the #sum array has an odd number of elements.
See also perltidy.
You should not enter the second while loop :
while my $key (#sum) {
because the previous one left the array #sum empty.
You could change to:
while (<$input>) {
chomp;
my #tmp = split(/\|/);
$hash{$tmp[3]} += $tmp[6];
}
print Dumper \%hash;
I'm looking for help sorting an array where each element is made up of "a number, then a string, then a number". I would like to sort on the first number part of the array elements, descending (so that I list the higher numbers first), while also listing the text etc.
am still a beginner so alternatives to the below are also welcome
use strict;
use warnings;
my #arr = map {int( rand(49) + 1) } ( 1..100 ); # build an array of 100 random numbers between 1 and 49
my #count2;
foreach my $i (1..49) {
my #count = join(',', #arr) =~ m/$i,/g; # maybe try to make a string only once then search trough it... ???
my $count1 = scalar(#count); # I want this $count1 to be the number of times each of the numbers($i) was found within the string/array.
push(#count2, $count1 ." times for ". $i); # pushing a "number then text and a number / scalar, string, scalar" to an array.
}
#for (#count2) {print "$_\n";}
# try to add up all numbers in the first coloum to make sure they == 100
#sort #count2 and print the top 7
#count2 = sort {$b <=> $a} #count2; # try to stop printout of this, or sort on =~ m/^anumber/ ??? or just on the first one or two \d
foreach my $i (0..6) {
print $count2[$i] ."\n"; # seems to be sorted right anyway
}
First, store your data in an array, not in a string:
# inside the first loop, replace your line with the push() with this one:
push(#count2, [$count1, $i];
Then you can easily sort by the first element of each subarray:
my #sorted = sort { $b->[0] <=> $a->[0] } #count2;
And when you print it, construct the string:
printf "%d times for %d\n", $sorted[$i][0], $sorted[$i][1];
See also: http://perldoc.perl.org/perlreftut.html, perlfaq4
Taking your requirements as is. You're probably better off not embedding count information in a string. However, I'll take it as a learning exercise.
Note, I am trading memory for brevity and likely speed by using a hash to do the counting.
However, the sort could be optimized by using a Schwartzian Transform.
EDIT: Create results array using only numbers that were drawn
#!/usr/bin/perl
use strict; use warnings;
my #arr = map {int( rand(49) + 1) } ( 1..100 );
my %counts;
++$counts{$_} for #arr;
my #result = map sprintf('%d times for %d', $counts{$_}, $_),
sort {$counts{$a} <=> $counts{$b}} keys %counts;
print "$_\n" for #result;
However, I'd probably have done something like this:
#!/usr/bin/perl
use strict; use warnings;
use YAML;
my #arr;
$#arr = 99; #initialize #arr capacity to 100 elements
my %counts;
for my $i (0 .. 99) {
my $n = int(rand(49) + 1); # pick a number
$arr[ $i ] = $n; # store it
++$counts{ $n }; # update count
}
# sort keys according to counts, keys of %counts has only the numbers drawn
# for each number drawn, create an anonymous array ref where the first element
# is the number drawn, and the second element is the number of times it was drawn
# and put it in the #result array
my #result = map [$_, $counts{$_}],
sort {$counts{$a} <=> $counts{$b} }
keys %counts;
print Dump \#result;
I am writing a Perl Script to find out the frequency of occurrence of characters in a message. Here is the logic I am following:
Read one char at a time from the message using getc() and store it into an array.
Run a for loop starting from index 0 to the length of this array.
This loop will read each char of the array and assign it to a temp variable.
Run another for loop nested in the above, which will run from the index of the character being tested till the length of the array.
Using a string comparison between this character and the current array indexed char, a counter is incremented if they are equal.
After completion of inner For Loop, I am printing the frequency of the char for debug purposes.
Question: I don't want the program to recompute the frequency of a character if it's already been calculated. For instance, if character "a" occurs 3 times, for the first run, it calculates the correct frequency. However, at the next occurrence of "a", since loop runs from that index till the end, the frequency is (actual freq -1). Similary for the third occurrence, frequency is (actual freq -2).
To solve this. I used another temp array to which I would push the char whose frequency is already evaluated.
And then at the next run of for loop, before entering the inner for loop, I compare the current char with the array of evaluated chars and set a flag. Based on that flag, the inner for loop runs.
This is not working for me. Still the same results.
Here's the code I have written to accomplish the above:
#!/usr/bin/perl
use strict;
use warnings;
my $input=$ARGV[0];
my ($c,$ch,$flag,$s,#arr,#temp);
open(INPUT,"<$input");
while(defined($c = getc(INPUT)))
{
push(#arr,$c);
}
close(INPUT);
my $length=$#arr+1;
for(my $i=0;$i<$length;$i++)
{
$count=0;
$flag=0;
$ch=$arr[$i];
foreach $s (#temp)
{
if($ch eq $s)
{
$flag = 1;
}
}
if($flag == 0)
{
for(my $k=$i;$k<$length;$k++)
{
if($ch eq $arr[$k])
{
$count = $count+1;
}
}
push(#temp,$ch);
print "The character \"".$ch."\" appears ".$count." number of times in the message"."\n";
}
}
You're making your life much harder than it needs to be. Use a hash:
my %freq;
while(defined($c = getc(INPUT)))
{
$freq{$c}++;
}
print $_, " ", $freq{$_}, "\n" for sort keys %freq;
$freq{$c}++ increments the value stored in $freq{$c}. (If it was unset or zero, it becomes one.)
The print line is equivalent to:
foreach my $key (sort keys %freq) {
print $key, " ", $freq{$key}, "\n";
}
If you want to do a single character count for the whole file then use any of the suggested methods posted by the others. If you want a count of all the occurances
of each character in a file then I propose:
#!/usr/bin/perl
use strict;
use warnings;
# read in the contents of the file
my $contents;
open(TMP, "<$ARGV[0]") or die ("Failed to open $ARGV[0]: $!");
{
local($/) = undef;
$contents = <TMP>;
}
close(TMP);
# split the contents around each character
my #bits = split(//, $contents);
# build the hash of each character with it's respective count
my %counts = map {
# use lc($_) to make the search case-insensitive
my $foo = $_;
# filter out newlines
$_ ne "\n" ?
($foo => scalar grep {$_ eq $foo} #bits) :
() } #bits;
# reverse sort (highest first) the hash values and print
foreach(reverse sort {$counts{$a} <=> $counts{$b}} keys %counts) {
print "$_: $counts{$_}\n";
}
I donĀ“t understand the problem you are trying to solve, so I propose a more simple way to count the characters in a string:
$string = "fooooooobar";
$char = 'o';
$count = grep {$_ eq $char} split //, $string;
print $count, "\n";
This prints the number of $char occurrences in $string (7).
Hope this helps to write a more compact code
Faster solution :
#result = $subject =~ m/a/g; #subject is your file
print "Found : ", scalar #result, " a characters in file!\n";
Of course you can put a variable in the place of 'a' or even better execute this line for whatever characters you want to count the occurrences.
As a one-liner:
perl -F"" -anE '$h{$_}++ for #F; END { say "$_ : $h{$_}" for keys %h }' foo.txt