Function explanation: delete a key in a hashtable - perl

I'm a Perl programmer beginner and I am seeking for some explanation about the code above. The function is named delete_depth_hash but I can't understand how it works.
sub delete_depth_hash {
my (%hash_genotype_depth) = #_;
my %new_hash;
foreach my $geno(keys %hash_genotype_depth) {
foreach my $dep(keys %{$hash_genotype_depth{$geno}}) {
my $frequence = scalar($hash_genotype_depth{$geno}{$dep});
$new_hash{$geno} +=$frequence;
}
}
return %new_hash;
}

The behaviour can be explained by an example:
use Data::Dumper;
print Dumper({ delete_depth_hash( a => { b => 1, c => 2 } ) });
It outputs:
$VAR1 = {
'a' => 3
};
So, it basically sums the numbers regardless of the subkeys in a hash of hashes. Note that the scalar does nothing here, as the value of a hash must always be a scalar.
As the inner keys are just removed, you can simplify the code using the List::Util's sum and the values function:
use List::Util qw{ sum };
sub delete_depth_hash {
my (%hash_genotype_depth) = #_;
my %new_hash;
for my $geno (keys %hash_genotype_depth) {
$new_hash{$geno} = sum(values %{ $hash_genotype_depth{$geno} });
}
return %new_hash;
}

Related

Find key for greatest value in hash of hashes in Perl

I have a hash of hashes containing keys, values, and counts of the form ((k1, v1), c1). I am trying to write a subroutine that returns the value of the key passed as a parameter with the greatest count. For example, if I had:
%hash = (
"a" => {
"b" => 2,
"c" => 1,
},
"d" => {
"e" => 4,
},
);
and made the call:
print &function("a");
it should print "b" because key "a" has the highest count of 2 with "b" as its value. Here is the code I have so far:
sub function() {
$key = $_[0];
if(exists($hash{$key})) {
while (my ($value, $count) = each %{$hash{$key}}) {
#logic goes here
}
} else {
return "$key does not exist";
}
}
The sub doesn't need to know anything about the outer hash, so it makes far more sense to call the sub as follows:
print key_with_highest_val($hash{a});
The sub simply needs to iterate over all the elements of that hash, keeping track of the highest value seen, and the key at which it was seen.
sub key_with_highest_val {
my ($h) = #_;
my $hi_v;
my $hi_k;
for my $k (keys(%$h)) {
my $v = $h->{$k};
if (!defined($hi_v) || $v > $hi_v) {
$hi_v = $v;
$hi_k = $k;
}
}
return $hi_k;
}
As Chris Charley points out, List::Util's reduce can simply this function. With the calling convention I recommended above, the reduce solution becomes the following:
use List::Util qw( reduce );
sub key_with_highest_val {
my ($h) = #_;
return reduce { $h->{$a} >= $h->{$b} ? $a : $b } keys(%$h);
}
Both versions return an arbitrary key among those that tied when there's a tie.
Use the reduce function from List::Util (which is part of core perl).
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/reduce/;
my %hash = (
"a" => {
"b" => 2,
"c" => 1,
},
"d" => {
"e" => 4,
},
);
my $key = 'a';
print "For key: $key, max key is ", max_key($key, %hash), "\n";
sub max_key {
my ($key, %hash) = #_;
return "$key does not exist" unless exists $hash{$key};
my $href = $hash{$key};
return reduce { $href->{$a} > $href->{$b} ? $a : $b } keys %$href;
}
You should always include use strict and use warnings at the top of your programs to catch errors so you can find and fix them. This requires declaring of your variables with my, like my $key = 'a';, for example and my %hash = ...
This program prints:
For key: a, max key is b
This code makes the following assumptions:
The values of your nested hashes are always numeric.
You don't have duplicate values.
Anything else is left as an exercise for the reader.
use strict;
use warnings;
use Data::Dump;
use List::Util qw(max);
my %hash = (
a => {
b => 2,
c => 1,
},
d => {
e => 4,
},
);
dd(max_hash_value(\%hash, $_)) for 'a' .. 'd';
sub max_hash_value {
my ($hash_ref, $search_key) = #_;
return unless $hash_ref->{$search_key};
my %lookup;
while (my ($key, $value) = each(%{$hash_ref->{$search_key}})) {
$lookup{$value} = $key;
}
return $lookup{max(keys(%lookup))};
}
Output:
"b"
()
()
"e"

Adding each element in an array to a complex hash in Perl

I have an array with n number of elments. I want to add each of the elements to a complex hash, each as a key/value pair. If the number of elements were fixed, say three, I would do:
my %hash;
my #array = ("first", "second", "third");
$hash{$array[0]}{$array[1]}{$array[2]}++;
The structure I want to end up with, is this (printed with Data::Dumper):
$VAR1 = 'first';
$VAR2 = {
'second' => {
'third' => 1
};
But I am at loss at achieving the same structure when the number of elements in the array isn't fixed. Something with anonymous variables and iterating through the array, yes, but something like foreach #array{$hash{$_}++}; will obviously only make one entry per element, and not the desired structure. Help?
Something like this could build the structure you desire for N elements:
use strict;
use warnings;
use Data::Dumper;
my #array = qw(first second third four five six seven);
my $hash;
foreach my $key ( reverse #array ) {
$hash = { $key => $hash };
}
print Dumper $hash;
__END__
$VAR1 = {
'first' => {
'second' => {
'third' => {
'fourth' => {
'fifth' => {
'sixth' => {
'seventh' => undef
}
}
}
}
}
}
};
It is not clear what you really need this for. There may be a better solution if you explain your use-case a little more. Incrementing this structure doesn't appear to be very easy.
After playing around a little, you can increment by traversing the hash reference to the bottom then incrementing the value of the last element. It is not very pretty though :|
# incrementing
my $elem = $hash; # copy the reference
foreach my $key ( #array ) {
# found the bottom of the hash
unless ( $elem->{$key} && ref($elem->{$key}) ) {
$elem->{$key}++;
last;
}
# not at the bottom, move to the next level
$elem = $elem->{$key};
}
print Dumper $hash;
__END__
$VAR1 = {
'first' => {
'second' => {
'third' => {
'fourth' => {
'fifth' => {
'sixth' => {
'seventh' => 1
}
}
}
}
}
}
};
This is relatively simple if you maintain a current hash reference. This short program demonstrates
The first few steps make sure that each hash element exists and its value is a hash reference. $href is moved to the next level of hash at each stage. For the final element of the array, the latest hash level's element is incremented instead of being set to a hash reference.
Whether or not this data structure is the correct choice depends on what else you need to do with it once you have built it
use strict;
use warnings;
my %hash;
my #array = qw/ first second third fourth fifth /;
drill_hash(\%hash, #array);
use Data::Dump;
dd \%hash;
sub drill_hash {
my ($href, #list) = #_;
my $final = pop #list;
$href = $href->{$_} //= {} for #list;
++$href->{$final};
}
output
{
first => { second => { third => { fourth => { fifth => 1 } } } },
}
Update
Having understood your purpose, the simplest way to keep a count of occurrences of ngrams like that is to have a speficic hash key that is used to keep the count of the sequence of words so far.
This program uses the value _COUNT for that key, and you can see that, for example, {under}{a}{_COUNT} and {under}{a}{rock}{_COUNT} are both 1
use strict;
use warnings;
my %counts;
count_ngram(\%counts, qw/ under a /);
count_ngram(\%counts, qw/ a rock /);
count_ngram(\%counts, qw/ under a rock /);
count_ngram(\%counts, qw/ a tree /);
count_ngram(\%counts, qw/ under a tree /);
use Data::Dump;
dd \%counts;
sub count_ngram {
my ($href, #ngram) = #_;
my $final = pop #ngram;
$href = $href->{$_} //= {} for #ngram;
++$href->{$final}{_COUNT};
}
output
{
a => { rock => { _COUNT => 1 }, tree => { _COUNT => 1 } },
under => {
a => { _COUNT => 1, rock => { _COUNT => 1 }, tree => { _COUNT => 1 } },
},
}

Printing Hash of Hash into a Matrix Table in Perl

I have a data structure like this:
#!/usr/bin/perl -w
my $hash = {
'abTcells' => {
'mesenteric_lymph_node' => {
'Itm2a' => '664.661',
'Gm16452' => '18.1425',
'Sergef' => '142.8205'
},
'spleen' => {
'Itm2a' => '58.07155',
'Dhx9' => '815.2795',
'Ssu72' => '292.889'
}
}
};
What I want to do is to print it out into this format:
mesenteric_lymph_node spleen
Itm2a 664.661 58.07155
Gm16452 18.1425 NA
Sergef 142.8205 NA
Dhx9 NA 815.2795
Ssu72 NA 292.889
What's the way to do it.
I'm currently stuck with the following code https://eval.in/44207
foreach my $ct (keys %{$hash}) {
print "$ct\n\n";
my %hash2 = %{$hash->{$ct}};
foreach my $ts (keys %hash2) {
print "$ts\n";
my %hash3 = %{$hash2{$ts}};
foreach my $gn (keys %hash3) {
print "$gn $hash3{$gn}\n";
}
}
}
Use Text::Table for output. Beautify to taste.
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Table;
my $hash = {
'abTcells' => {
'mesenteric_lymph_node' => {
'Itm2a' => '664.661',
'Gm16452' => '18.1425',
'Sergef' => '142.8205'
},
'spleen' => {
'Itm2a' => '58.07155',
'Dhx9' => '815.2795',
'Ssu72' => '292.889'
}
}
};
my $struct = $hash->{abTcells};
my #cols = sort keys %{ $struct };
my #rows = sort keys %{ { map {
my $x = $_;
map { $_ => undef }
keys %{ $struct->{$x} }
} #cols } };
my $tb = Text::Table->new('', #cols);
for my $r (#rows) {
$tb->add($r, map $struct->{$_}{$r} // 'NA', #cols);
}
print $tb;
Output:
mesenteric_lymph_node spleen
Dhx9 NA 815.2795
Gm16452 18.1425 NA
Itm2a 664.661 58.07155
Sergef 142.8205 NA
Ssu72 NA 292.889
Now, the order of the rows above is different than the one you show because I wanted it to be consistent. If you know the set of all possible rows, then you can specify another order obviously.
First thing would be to separate out the two hashes:
my %lymph_node = %{ $hash->{abTcells}->{mesenteric_lymph_node} };
my %spleen = %{ $hash->{abTcells}->{spleen} };
Now, you have two separate hashes that contains the data you want.
What we need is a list of all the keys. Let's make a third hash that contains your keys.
my %keys;
map { $keys{$_} = 1; } keys %lymph_node, keys %spleen;
Now, we can go through all your keys and print the value for each of the two hashes. If one of the hashes doesn't have the data, we'll set it to NA:
for my $value ( sort keys %keys ) {
my $spleen_value;
my $lymph_nodes_value;
$spleen_value = exists $spleen{$value} ? $spleen{$value} : "NA";
$lymph_node_value = exists $lymph_node{$value} ? $lymph_node{$value} : "NA";
printf "%-20.20s %-9.5f %-9.5f\n", $key, $lymph_node_value, $spleen_value;
}
The printf statement is a nice way to tabularize data. You'll have to create the headings yourself. The ... ? ... : ... statement is an abbreviated if/then/else If the statement before the ? is true, then the value is the value between the ? and the :. Else, the value is the value after the :.
Both of your inner hashes have the same keys, So do a foreach on one of the hashes to get the key, and then print both.

how to declare array reference in hash refrence

my $memType = [];
my $portOp = [];
my $fo = "aster.out.DRAMA.READ.gz";
if($fo =~/aster.out\.(.*)\.(.*)\.gz/){
push (#{$memType},$1);
push (#{$portOp},$2);
}
print Dumper #{$memType};
foreach my $mem (keys %{$portCapability->{#{$memType}}}){
//How to use the array ref memType inside a hash//
print "entered here\n";
//cannot post the rest of the code for obvious reasons//
}
I am not able to enter the foreach loop . Can anyone help me fix it?
Sorry this is not the complete code . Please help me.
%{$portCapability->{#{$memType}}}
This doesn't do what you may think it means.
You treat $portCapability->{#{$memType}} as a hash reference.
The #{$memType} is evaluated in scalar context, thus giving the size of the array.
I aren't quite sure what you want, but would
%{ $portCapability->{ $memType->[0] } }
work?
If, however, you want to slice the elements in $portCapability, you would need somethink like
#{ $portCapability }{ #$memType }
This evaluates to a list of hashrefs. You can then loop over the hashrefs, and loop over the keys in an inner loop:
for my $hash (#{ $portCapability }{ #$memType }) {
for my $key (keys %$hash) {
...;
}
}
If you want a flat list of all keys of the inner hashes, but don't need the hashes themselves, you could shorten above code to
for my $key (map {keys %$_} #{ $portCapability }{ #$memType }) {
...;
}
I think what you want is this:
my $foo = {
asdf => {
a => 1, b => 2,
},
foo => {
c => 3, d => 4
},
bar => {
e => 5, f => 6
}
};
my #keys = qw( asdf foo );
foreach my $k ( map { keys %{ $foo->{$_} } } #keys ) {
say $k;
}
But you do not know which of these $k belongs to which key of $foo now.
There's no direct way to get the keys of multiple things at the same time. It doesn't matter if these things are hashrefs that are stored within the same hashref under different keys, or if they are seperate variables. What you have to do is build that list yourself, by looking at each of the things in turn. That's simply done with above map statement.
First, look at all the keys in $foo. Then for each of these, return the keys inside that element.
my $memType = [];
my $portOp = [];
my $fo = “aster.out.DRAMA.READ.gz”;
if ($fo =~ /aster.out\.(\w+)\.(\w+)\.gz/ ) { #This regular expression is safer
push (#$memType, $1);
push (#$portOp, $2);
}
print Dumper “#$memType”; #should print “DRAMA”
#Now if you have earlier in your program the hash %portCapability, your code can be:
foreach $mem (#$memType) {
print $portCapability{$mem};
}
#or if you have the hash $portCapability = {…}, your code can be:
foreach $mem (#$memType) {
print $portCapability->{$mem};
}
#Hope it helps

Equivalent of "shift" for a hash to create a $class->next() method

I almost feel like saying "it's me again!".
Anyway, here we go.
I like using while $object->next() style constructs. They appeal to me and seem "neat".
Now, when the thing I'm iterating over is an array, it's straightforward ("shift #ary or return undef")
sub next {
my ( $self, $args ) = #_;
my $next = shift #{ $self->{list_of_things} } or return undef;
my ( $car, $engine_size, $color )
= split( /\Q$opts->{fieldsep}/, $next );
$self->car = $host;
$self->engine_size = $engine_size;
$self->color = $color;
}
In this example I use AUTOLOAD to create the getters and setters and then have those instance variables available in my object during the while loop.
I'd like to do something similar but with the "list_of_things" being a %hash.
Here's a non-OO example that doesn't make it into the first iteration. Any ideas why?
(The total "list_of_things" is not that big - maybe 100 entries - so to do a keys(%{$hash}) every time doesn't seem too wasteful to me).
use strict;
use warnings;
use Data::Dumper;
my $list_of_things = {
volvo => {
color => "red",
engine_size => 2000,
},
bmw => {
color => "black",
engine_size => 2500,
},
mini => {
color => "british racing green",
engine_size => 1200,
}
};
sub next {
my $args = $_;
my #list = keys( %{$list_of_things} );
return undef if scalar #list == "0";
my $next = $list_of_things->{ $list[0] };
delete $list_of_things->{ $list[0] };
return $next;
}
while ( next()) {
print Dumper $_;
print scalar keys %{ $list_of_things }
}
Is there a better way of doing this? Am I doing something crazy?
EDIT:
I tried Ikegami's suggestion. Of course, Ikegami's example works flawlessly. When I try and abstract a little, so that all that is exposed to the object is a next->() method, I get the same "perl-going-to-100%-cpu" problem as in my original example.
Here's a non-OO example:
use Data::Dumper qw( Dumper );
sub make_list_iter {
my #list = #_;
return sub { #list ? shift(#list) : () };
}
sub next {
make_list_iter( keys %$hash );
}
my $hash = { ... };
while ( my ($k) = next->() ) {
print Dumper $hash->{$k};
}
It does not seem to get past the first step of the while() loop.
I am obviously missing something here...
If you don't want to rely on the hash's builtin iterator (used by each, keys and values), there's nothing stopping you from making your own.
use Data::Dumper qw( Dumper );
sub make_list_iter {
my #list = #_;
return sub { #list ? shift(#list) : () };
}
my $list_of_things = { ... };
my $i = make_list_iter(keys %$list_of_things);
while (my ($k) = $i->()) {
local $Data::Dumper::Terse = 1;
local $Data::Dumper::Indent = 0;
say "$k: " . Dumper($list_of_things->{$k});
}
The each operator is a builtin that iterates over hashes. It returns undef when it runs out of elements to return. So you could so something like
package SomeObject;
# creates new object instance
sub new {
my $class = shift;
return bless { hash_of_things => { #_ } }, $class
}
sub next {
my $self = shift;
my ($key,$value) = each %{ $self->{hash_of_things} };
return $key; # or return $value
}
Calling keys on the hash will reset the each iterator. It's good to know this so you can reset it on purpose:
sub reset {
my $self = shift;
keys %{ $self->{hash_of_things} }
}
and so you can avoid resetting it on accident.
The section on tie'ing hashes in perltie also has an example like this.
Here's how List::Gen could be used to create an iterator from a list:
use strict;
use warnings;
use List::Gen 'makegen';
my #list_of_things = ( # This structure is more suitable IMO
{
make => 'volvo',
color => 'red',
engine_size => 2000,
},
{
make => 'bmw',
color => 'black',
engine_size => 2500,
},
{
make => 'mini',
color => 'british racing green',
engine_size => 1200,
}
);
my $cars = makegen #list_of_things;
print $_->{make}, "\n" while $cars->next;
Well, if you don't need $list_of_things for later, you can always do something like
while(keys %$list_of_things)
{
my $temp=(sort keys %$list_of_things)[0];
print "key: $temp, value array: " . join(",",#{$list_of_things->{$temp}}) . "\n";
delete $list_of_things->{$temp};
}
And if you do need it, you can always assign it to a temporary hash reference and perform the same while loop on it.