Mode function in Perl - perl

I need help with creating the mode function in Perl. I have written a code using help from different sources but every time the value differs.
Mode - The maximum frequency of occurrence of an element in an array
Current Code -
#array = <STDIN>;
#sorted = sort { $a <=> $b } #array ;
for $i(#sorted)
{
$cnt =0;
for $j(#sorted)
{
if($i eq $j)
{
$cnt = $cnt + 1;
$data{$i}= $cnt;
}
}
}
#modes = sort { $data{$a} <=> $data{$b} } keys %data;
$mode = $modes[-1];

Having cleaned up your indentation and added use strict and use warnings, I get the following code.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my #array = <STDIN>;
my #sorted = sort { $a <=> $b } #array ;
my %data;
for my $i (#sorted) {
my $cnt = 0;
for my $j (#sorted) {
if ($i eq $j) {
$cnt = $cnt + 1;
$data{$i}= $cnt;
}
}
}
my #modes = sort { $data{$a} <=> $data{$b} } keys %data;
say "The mode is: $modes[-1]";
And that seems to work correctly. So, perhaps, you need to explain the problem in a little more detail.
Update: Ok, so I think I now understand your problem - even though you really haven't said what it is very clearly.
If there are two potential values for the mode (i.e. two numbers that appear the same number of times) then you want the smallest one. And your current solution picks one of the potential values at random.
The problem is this line:
#modes = sort { $data{$a} <=> $data{$b} } keys %data;
This sorts the hash by the number of times the values appear. But if multiple values appear the same number of time, they can appear in any order - so you'll get a seemingly random number out of the end.
The solution is to add more intelligence to the sort so that in case of a tie, the smaller key sorts last. That would look like this:
my #modes = sort {
$data{$a} <=> $data{$b}
or
$b <=> $a
} keys %data;

Related

A simple variable count inside array

After working with this code, I am stuck at what I think is a simple error, yet I need outside eyes to see what is wrong.
I used unpack function to divide an array into the following.
#extract =
------MMMMMMMMMMMMMMMMMMMMMMMMMM-M-MMMMMMMM
------SSSSSSSSSSSSSSSSSSSSSSSSSS-S-SSSSSDTA
------TIIIIIIIIIIIIITIIIVVIIIIII-I-IIIIITTT
Apparently, after unpacking into the array, when I try to go into the while loop, #extract shows up completely empty. Any idea as to why this is happening?
print #extract; #<-----------Prints input
my $sum = 0;
my %counter = ();
while (my $column = #extract) {
print #extract; #<------- This extract is completely empty. Should be input
for (my $aa = (split ('', $column))){
$counter{$aa}++;
delete $counter{'-'}; # Don't count -
}
# Sort keys by count descending
my #keys = (sort {$counter{$b} <=> $counter{$a}} keys %counter) [0]; #gives highest letter
for my $key (#keys) {
$sum += $counter{$key};
print OUTPUT "$key $counter{$key} ";
Each line is an array element correct? I don't see in your code where you are checking the individual characters.
Assuming the input that you have shown is a 3 element array containing the line as a string:
#!/usr/bin/perl
use strict;
use warnings;
my #entries;
while(my $line = shift(#extract)){
my %hash;
for my $char(split('', $line)){
if($char =~ /[a-zA-Z]/) { $hash{$char}++ }
}
my $high;
for my $key (keys %hash) {
if(!defined($high)){ $high = $key }
elsif($hash{$high} < $hash{$key}){
$high = $key
}
}
push #entries, {$high => $hash{$high}};
}
Note this empties #extract, if you don't want to do that you'd have to use a for loop like below
for my $i (0 .. $#extract){
#my %hash etc...
}
EDIT:
Changed it so that only the highest number is actually kept
An approach using reduce from List::Util.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util 'reduce';
my #extract = qw/
------MMMMMMMMMMMMMMMMMMMMMMMMMM-M-MMMMMMMM
------SSSSSSSSSSSSSSSSSSSSSSSSSS-S-SSSSSDTA
------TIIIIIIIIIIIIITIIIVVIIIIII-I-IIIIITTT
/;
for (#extract) {
my %count;
tr/a-zA-Z//cd;
for (split //) {
$count{$_}++;
}
my $max = reduce { $count{$a} > $count{$b} ? $a : $b } keys %count;
print "$max $count{$max}\n";
}

How can I get this basic Perl sub program that sorts to work properly?

I am brand new to Perl. Can someone help me out and give me a tip or a solution on how to get this sorting sub program to work. I know it has something to do with how arrays are passed to sub programs. I searched online and did not find an answer that I was satisfied with... I also like the suggestions the helpful S.O. users give me too. I would like to have the program print the sorted array in the main sub program. Currently, it is printing the elements of the array #a in original order. I want the sub program to modify the array so when I print the array it is in sorted order. Any suggestions are appreciated. Of course, I want to see the simplest way to fix this.
sub sort {
my #array = #_;
my $i;
my $j;
my $iMin;
for ( $i = 0; $i < #_ - 1; $i++ ) {
$iMin = $i;
for ( $j = $i + 1; $j < #_; $j++ ) {
if ( $array[$j] < $array[$iMin] ) {
$iMin = $j;
}
}
if ( $iMin != $i ) {
my $temp = $array[$i];
$array[$i] = $array[$iMin];
$array[$iMin] = $temp;
}
}
}
Then call from a main sub program:
sub main {
my #a = (-23,3,234,-45,0,32,12,54,-10000,1);
&sort(#a);
my $i;
for ( $i = 0; $i < #a; $i++ ) {
print "$a[$i]\n";
}
}
main;
When your sub does the following assignment my #array = #_, it is creating a copy of the passed contents. Therefore any modifications to the values of #array will not effect #a outside your subroutine.
Following the clarification that this is just a personal learning exercise, there are two solutions.
1) You can return the sorted array and assign it to your original variable
sub mysort {
my #array = #_;
...
return #array;
}
#a = mysort(#a)
2) Or you can pass a reference to the array, and work on the reference:
sub mysort {
my $arrayref = shift;
...
}
mysort(\#a)
Also, it's probably a good idea to not use a sub named sort since that's that's a builtin function. Duplicating your code using perl's sort:
#a = sort {$a <=> $b} #a;
Also, the for loops inside your sub should be rewritten to utilize the last index of an #array, which is written as $#array, and the range operator .. which is useful for incrementors :
for ( my $j = $i + 1; $j <= $#array; $j++ ) {
# Or simpler:
for my $j ($i+1 .. $#array) {
And finally, because you're new, I should pass on that all your scripts should start with use strict; and use warnings;. For reasons why: Why use strict and warnings?
With very few, rare exceptions the simplest (and easiest) way to sort stuff in perl is simply to use the sort builtin.
sort takes an optional argument, either a block or a subname, which can be used to control how sort evaluates which of the two elements it is comparing at any given moment is greater.
See sort on perldoc for further information.
If you require a "natural" sort function, where you get the sequence 0, 1, 2, 3, ... instead of 0, 1, 10, 11, 12, 2, 21, 22, 3, ..., then use the perl module Sort::Naturally which is available on CPAN (and commonly available as a package on most distros).
In your case, if you need a pure numeric sort, the following will be quite sufficient:
use Sort::Naturally; #Assuming Sort::Naturally is installed
sub main {
my #a = (-23,3,234,-45,0,32,12,54,-10000,1);
#Choose one of the following
#a = sort #a; #Sort in "ASCII" ascending order
#a = sort { $b cmp $a } #a; #Sort in reverse of the above
#a = nsort #a; #Sort in "natural" order
#a = sort { ncmp($b, $a) } #a; #Reverse of the above
print "$_\n" foreach #a; #To see what you actually got
}
It is also worth mentioning the use sort 'stable'; pragma which can be used to ensure that sorting occurs using a stable algorithm, meaning that elements which are equal will not be rearranged relative to one another.
As a bonus, you should be aware that sort can be used to sort data structures as well as simple scalars:
#Assume #a is an array of hashes
#a = sort { $a->{name} cmp $b->{name} } #; #Sort #a by name key
#Sort #a by name in ascending order and date in descending order
#a = sort { $a->{name} cmp $b->{name} || $b->{date} cmp $a->{date} } #a;
#Assume #a is an array of arrays
#Sort #a by the 2nd element of the arrays it contains
#a = sort { $a->[1] cmp $b->[1] } #a;
#Assume #a is an array of VERY LONG strings
#Sort #a alphanumerically, but only care about
#the first 1,000 characters of each string
#a = sort { substr($a, 0, 1000) cmp substr($b, 0, 1000) } #a;
#Assume we want to "sort" an array without modifying it:
#Yes, the names here are confusing. See below.
my #idxs = sort { $a[$a] cmp $a[$b] } (0..$#a);
print "$a[$_]\n" foreach #idxs;
##idxs contains the indexes to #a, in the order they would have
#to be read from #a in order to get a sorted version of #a
As a final note, please remember that $a and $b are special variables in perl, which are pre-populated in the context of a sorting sub or sort block; the upshot is that if you're working with sort you can always expect $a and $b to contain the next two elements being compared, and should use them accordingly, but do NOT do my $a;, e.g., or use variables with either name in non-sort-related stuff. This also means that naming things %a or #a, or %b or #b, can be confusing -- see the final section of my example above.

Perl Hash of Hash Output

I'm reading a file. I want a hash that gives me the first number of a line as a key to a hash of all the numbers of the rest of the line to 1.
I believe I'm adding the hash correctly, because Dumper prints correctly.
However, print "$first $secondID\n" is not giving me any output.
while (<FILE>) {
chomp $_;
if (/(\d+)\t(.+)/) {
$firstNum = $1;
#seconds = split(/\,/,$2);
foreach $following (#seconds) {
$Pairs->{$firstNum}{$following} = 1;
}
foreach $first (sort {$a <=> $b} keys %Pairs) {
print "$first\n";
%second = {$Pairs{$first}};
foreach $secondID (sort {$a <=> $b} keys %second) {
print "$first $secondID\n";
}
}
print Dumper($Pairs);
}
else {
print "ERROR\n";
}
}
Later on, given a pair of numbers I would like to look up to see whether $Pairs{$num1}{$num2} is defined. would I write
if(defined $Pairs{$num1}{$num2})
Or should I check the first key first. Then check the second key
if (defined $Pairs{$num1}) {
$temp = $Pairs{$num1};
if (defined $temp{$num2}) {
print "true\n;
}
}
You have a couple of errors. Firstly you seem to be unsure whether you are using %Pairs or $Pairs to store your hash, and secondly you have %second = {$Pairs{$first}}, which tries to assign a hash reference to the hash %second. Presumably you want my %second = %{ $Pairs{$first} }.
You should always use strict and use warnings at the start of all your Perl programs, and declare all variables at the point of first use using my. This will alert you to simple mistakes you could otherwise easily overlook, and would have shown up your use of both %Pairs and $Pairs in this program, as well as your attempt to assign a single value (a hash reference) to a hash.
Rather than copying the entire hash, you should save a reference to it in $seconds. Then you can dereference it in the following for loop.
Experienced Perl programmers would also thank you for using lower-case plus underscore for local (my) variables, and reserving capitals for package and class names.
This program works as you intended, and expects the file name as a command-line parameter:
use strict;
use warnings;
my %pairs;
while (<>) {
unless ( /(\d+)\s+(.+)/ ) {
print "ERROR\n";
next;
}
my $first_num = $1;
my #seconds = split /,/, $2;
foreach my $following (#seconds) {
$pairs{$first_num}{$following} = 1;
}
foreach my $first (sort { $a <=> $b } keys %pairs) {
print "$first\n";
my $second = $pairs{$first};
foreach my $second_id (sort { $a <=> $b } keys %$second) {
print "$first $second_id\n";
}
}
}
my %hash;
while ( <> ) {
my #numbers = split /\D+/;
my $key = shift #numbers;
#{$hash{$key}}{ #numbers } = ( 1 ) x #numbers;
}
# test it this way...
if ( $hash{ $num1 }{ $num2 } ) {
}
Use:
%second = %{$Pairs->{$first}};

How can I do alpha numeric sort in Perl?

I have a file which looks like this:
80,1p21
81,19q13
82,6p12.3
83,Xp11.22
84,3pter-q21
86,3q26.33
87,14q24.1-q24.2|14q24|14q22-q24
88,1q42-q43
89,11q13.1
90,2q23-q24
91,12q13
92,2q22.3
93,3p22
94,12q11-q14
95,3p21.1
97,14q24.3
98,2p16.2
And I want to sort them based on the second column. And the first column should change accordingly too. When you use the 'sort' command in Perl, it doesn't do it because it says it's not numeric. Is there a way to sort things alpha numerically in Perl?
If you read the documentation for sort, you'll see that you don't need to do a numeric sort in Perl. You can do string comparisons too.
#sorted = sort { $a cmp $b } #unsorted;
But that still leaves you with a problem as, for example, 19q will sort before 6p. So you can write your own sort function which can make whatever transformations you want before doing the comparison.
#sorted = sort my_complex_sort #unsorted;
sub my_complex_sort {
# code that compares $a and $b and returns -1, 0 or 1 as appropriate
# It's probably best in most cases to do the actual comparison using cmp or <=>
# Extract the digits following the first comma
my ($number_a) = $a =~ /,(\d+)/;
my ($number_b) = $b =~ /,(\d+)/;
# Extract the letter following those digits
my ($letter_a) = $a =~ /,\d+(a-z)/;
my ($letter_b) = $b =~ /,\d+(a-z)/;
# Compare and return
return $number_a <=> $number_b or $letter_a cmp $letter_b;
}
#!/usr/bin/env perl
use strict;
use warnings;
my #datas = map { /^(\d+),(\d*)(.*)$/; [$1, $2, $3]; } <DATA>;
my #res = sort {$a->[1] <=> $b->[1] or $a->[2] cmp $b->[2]} #datas;
foreach my $data (#res) {
my ($x, $y, $z) = #{$data};
print "$x,$y$z\n";
}
__DATA__
80,1p21
81,19q13
82,6p12.3
83,Xp11.22
84,3pter-q21
86,3q26.33
87,14q24.1-q24.2|14q24|14q22-q24
88,1q42-q43
89,11q13.1
90,2q23-q24
91,12q13
92,2q22.3
93,3p22
94,12q11-q14
95,3p21.1
97,14q24.3
98,2p16.2
I actually found the answer to this. The code looks a bit complicated though.
#!/usr/bin/env perl
use strict;
use warnings;
sub main {
my $file;
if (#ARGV != 1) {
die "Usage: perl hashofhash_sort.pl <filename>\n";
}
else {
$file = $ARGV[0];
}
open(IN, $file) or die "Error!! Cannot open the $file file: $!\n";
my #file = <IN>;
chomp #file;
my ($entrez_gene, $loci, $chr, $band, $pq, $band_num);
my (%chromosome, %loci_entrez);
foreach my $line (#file) {
if ($line =~ /(\d+),(.+)/) {
# Entrez genes
$entrez_gene = $1;
# Locus like 12p23.4
$loci = $2;
if ($loci =~ /^(\d+)(.+)?/) {
# chromosome number alone (only numericals)
$chr = $1;
if ($2) {
# locus minus chromosome number. If 12p23.4, then $band is p23.4
$band = "$2";
if ($band =~ /^([pq])(.+)/) {
# either p or q
$pq = $1;
# stores the numericals. for p23.4, stores 23.4
$band_num = $2;
}
if (exists $chromosome{$chr}) {
if (exists $chromosome{$chr}{$pq}) {
push (#{$chromosome{$chr}{$pq}}, $band_num);
}
else {
$chromosome{$chr}{$pq} = [$band_num];
}
}
else {
$chromosome{$chr}{$pq} = [$band_num];
}
}
}
}
} # End of foreach loop
foreach my $key (sort {$a <=> $b} keys %chromosome) {
my %seen = ();
foreach my $key2 (sort {$a cmp $b } keys %{$chromosome{$key}}) {
my #unique = grep { ! $seen{$_}++ } #{$chromosome{$key}{$key2}};
my #sorted = sort #unique;
foreach my $element (#sorted) {
my $sorted_locus = "$key$key2$element";
if (exists $loci_entrez{$sorted_locus}) {
foreach my $element2 (#{$loci_entrez{$sorted_locus}}) {
print "$element2,$sorted_locus\n";
}
}
}
}
}
} # End of main
main();
In the very general case, the question is ambiguous on what to do with integers that are equal but written differently, because of the possibility of leading zeros. The following comparison function (for sort) allows one to consider the lexicographic order as soon as one doesn't have different integers. This is the same as zsh's numeric sort.
sub alphanumcmp ($$)
{
my (#u,#v);
if ((#u = $_[0] =~ /^(\d+)/) &&
(#v = $_[1] =~ /^(\d+)/))
{
my $c = $u[0] <=> $v[0];
return $c if $c;
}
if ((#u = $_[0] =~ /^(.)(.*)/) &&
(#v = $_[1] =~ /^(.)(.*)/))
{
return $u[0] cmp $v[0] || &alphanumcmp($u[1],$v[1]);
}
return $_[0] cmp $_[1];
}
For instance, one would get the following sorted elements:
a0. a00. a000b a00b a0b a001b a01. a01b a1. a1b a010b a10b a011b a11b
Note 1: The use of <=> assumes that the numbers are not too large.
Note 2: In the question, the user wants to do an alphanumeric sort on the second column (instead of the whole string). So, in this particular case, the comparison function could just be adapted to ignore the first column or a Schwartzian transform could be used.

perl: shuffle value-sorted hash?

At first sorry for my english - i hope you will understand me.
There is a hash:
$hash{a} = 1;
$hash{b} = 3;
$hash{c} = 3;
$hash{d} = 2;
$hash{e} = 1;
$hash{f} = 1;
I want to sort it by values (not keys) so I have:
for my $key ( sort { $hash{ $a } <=> $hash{ $b } } keys %hash ) { ... }
And at first I get all the keys with value 1, then with value 2, etc... Great.
But if hash is not changing, the order of keys (in this sort-by-value) is always the same.
Question: How can I shuffle sort-results, so every time I run 'for' loop, I get different order of keys with value 1, value 2, etc. ?
Not quite sure I well understand your needs, but is this ok:
use List::Util qw(shuffle);
my %hash;
$hash{a} = 1;
$hash{b} = 3;
$hash{c} = 3;
$hash{d} = 2;
$hash{e} = 1;
$hash{f} = 1;
for my $key (sort { $hash{ $a } <=> $hash{ $b } } shuffle( keys %hash )) {
say "hash{$key} = $hash{$key}"
}
You can simply add another level of sorting, which will be used when the regular sorting method cannot distinguish between two values. E.g.:
sort { METHOD_1 || METHOD_2 || ... METHOD_N } LIST
For example:
sub regular_sort {
my $hash = shift;
for (sort { $hash->{$a} <=> $hash->{$b} } keys %$hash) {
print "$_ ";
};
}
sub random_sort {
my $hash = shift;
my %rand = map { $_ => rand } keys %hash;
for (sort { $hash->{$a} <=> $hash->{$b} ||
$rand{$a} <=> $rand{$b} } keys %$hash ) {
print "$_ ";
};
}
To sort the keys by value, with random ordering of keys with identical values, I see two solutions:
use List::Util qw( shuffle );
use sort 'stable';
my #keys =
sort { $hash{$a} <=> $hash{$b} }
shuffle keys %hash;
or
my #keys =
map $_->[0],
sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
map [ $_, $hash{$_}, rand ],
keys %hash;
The use sort 'stable'; is required to prevent sort from corrupting the randomness of the list returned by shuffle.
The above's use of the Schwartzian Transform is not an attempt at optimisation. I've seen people use rand in the compare function itself to try to achieve the above result, but doing so is buggy for two reasons.
When using "misbehaving" comparisons such as that, the results are documented as being undefined, so sort is allowed to return garbage, repeated elements, missing elements, etc.
Even if sort doesn't return garbage, it won't be a fair sort. The result will be weighed.
You can have two functions for ascending and decending order and use them accordingly like
sub hasAscending {
$hash{$a} <=> $hash{$b};
}
sub hashDescending {
$hash{$b} <=> $hash{$a};
}
foreach $key (sort hashAscending (keys(%hash))) {
print "\t$hash{$key} \t\t $key\n";
}
foreach $key (sort hashDescending (keys(%hash))) {
print "\t$hash{$key} \t\t $key\n";
}
It seems like you want to randomize looping through the keys.
Perl, does not store in sequential or sorted order, but this doesn't seem to be random enough for you, so you may want to create an array of keys and loop through that instead.
First, populate an array with keys, then use a random number algorithm (1..$#length_of_array) to push the key at that position in the array, to the array_of_keys.
If you're trying to randomize the keys of the sorted-by-value hash, that's a little different.
See Codepad
my %hash = (a=>1, b=>3, c=>3, d=>2, e=>1, f=>1);
my %hash_by_val;
for my $key ( sort { $hash{$a} <=> $hash{$b} } keys %hash ) {
push #{ $hash_by_val{$hash{$key}} }, $key;
}
for my $key (sort keys %hash_by_val){
my #arr = #{$hash_by_val{$key}};
my $arr_ubound = $#arr;
for (0..$arr_ubound){
my $randnum = int(rand($arr_ubound));
my $val = splice(#arr,$randnum,1);
$arr_ubound--;
print "$key : $val\n"; # notice: output varies b/t runs
}
}