looping through hash of hashes reference perl - perl

I have a hash of hashes that I am passing to a subroutine. In the subroutine I need to loop over the hash of hashes and access the value of the inner hash based on the outer hash's key. I am having trouble referencing and dereferencing the hash of hashes.
Here is my code.
use List::Util qw( min max );
##testingWords is array of strings
foreach(#testingWords)
{
#skip values that are '[' or ']' and move onto next value in array.
if($_ eq '[' or $_ eq ']')
{
next;
}
#if value in array matches key in %trainingHashRaw (hash of hashes)pass key to getMax.
if($trainingHashRaw{$_})
{
#key is $_, value is returned string from getMax
#%trainingHashRelative is hash of hashes
$testingHash{$_} = getMax($_, \%trainingHashRelative);
}
}
sub getMax
{
my $key = shift;
my $hash = shift;
my #max = ();
my $max = 0;
my $tag = "";
for my $i(keys $hash)
{
for my $j(keys $hash->{$i})
{
if($key eq $i)
{
push(#max, $hash->{$i}->{$j});
}
}
if(#max)
{
$max = max #max;
}
}
for my $i(keys $hash)
{
for my $j(keys $hash->{$i})
{
if($max == $hash->{$i}->{$j})
{
$tag = $j;
}
}
}
return $tag;
}

It is not very clear what your data structures should contain. So I made up an example for what I think that you have meant. The max value can occur more than once, so I keep track of all occurrences.
use strict;
use warnings;
use List::Util qw( min max );
use Data::Dumper;
# for each outer key we want to get the max of values of the associated hashref
# outer_a: 3
# outer_b: 100
my %hash_of_hashes = (
outer_a => {
inner_a_x => 1,
inner_a_y => 2,
inner_a_z => 3,
},
outer_b => {
inner_a_x => -100,
inner_a_y => 100,
},
outer_c => {
inner_a_x => 100,
inner_a_y => 1,
}
);
my ($max_value, $keys_of_max_value) = get_max( \%hash_of_hashes );
print "The max value $max_value occured in ", join( ' ,', #{$keys_of_max_value} ), ".\n";
sub get_max {
my ($hoh_ref) = #_; # reference to a hash of hashes
# %tmp keeps track of the outer_key for the max of the inner values
# %tmp = (
# 3 => ['outer_a'],
# 100 => ['outer_b', 'outer_c']
# )
my %tmp;
while ( my ($outer_key, $inner_hashref) = each %{$hoh_ref} ) {
my #inner_values = values %{$inner_hashref};
my $max_inner_values = max( #inner_values );
$tmp{$max_inner_values} ||= []; # for clarity create the arref expiclitly
push #{$tmp{$max_inner_values}}, $outer_key;
}
my $max_value = max( keys %tmp ); # 100
return ( $max_value, $tmp{$max_value} ); # 100, ['outer_a', 'outer_b']
}

Related

Iterate through values of a HASH and convert to comma separated strings

I want to iterate through the values of a big hash, and if any of the values of that hash are keys, I want to convert it into a comma separated list which can be parsed in 'query_form'.
Right now from the data below I have:
name=Bob&surname=Whitbread&customerErrors=HASH(Xa456) (for example)
Here's what I have so far:
sub convertArgsToQueryString {
my $class = shift;
my $args = shift;
return unless ($args && ref($args) eq 'HASH');
foreach my $key (values %$args) {
if (ref($key) eq 'HASH') {
# change to a comma separated list
}
}
my $dummyURL = URI->new('', 'http');
$dummyURL->query_form(%$args);
return $dummyURL->query;
}
Data:
my $data = {
'name' => 'Bob',
'surname' => 'Whitbread',
'customerErrors' => {
'error1' => 'paymentError',
'error2' => 'addressError'
},
};
Query Form:
name=Bob&surname=Whitbread&customerErrors=paymentError,addressError
This will do what you want
print join ",", values %{$data->{customerErrors}},"\n";
Although I would suggest, rather than error1 as hash keys, you'd be better off with an array:
my $data = {
'name' => 'Bob',
'surname' => 'Whitbread',
'customerErrors' => [ 'paymentError', 'addressError' ],
};
Scaling that out to be generic, you will find the ref function to be helpful:
foreach my $key ( keys %$data ) {
print "$key is a ", ref $data->{$key},"\n";
if ( ref $data->{$key} eq 'HASH' ) {
print join ",", values %{$data->{$key}};
}
else {
print $data -> {$key},"\n";
}
}
Or tersely:
print join "\&", map { #join iterated on &
join "=", $_, #join paired values on =
ref $data->{$_} eq 'HASH' #ternary to check reference type
? values %{ $data->{$_} } #extract values if HASH
: $data->{$_} #extract just value if not.
} keys %$data; #iterate keys of data
Which gives as output:
name=Bob&customerErrors=addressError=paymentError&surname=Whitbread

Find key for greatest value in hash of hashes in Perl

I have a hash of hashes containing keys, values, and counts of the form ((k1, v1), c1). I am trying to write a subroutine that returns the value of the key passed as a parameter with the greatest count. For example, if I had:
%hash = (
"a" => {
"b" => 2,
"c" => 1,
},
"d" => {
"e" => 4,
},
);
and made the call:
print &function("a");
it should print "b" because key "a" has the highest count of 2 with "b" as its value. Here is the code I have so far:
sub function() {
$key = $_[0];
if(exists($hash{$key})) {
while (my ($value, $count) = each %{$hash{$key}}) {
#logic goes here
}
} else {
return "$key does not exist";
}
}
The sub doesn't need to know anything about the outer hash, so it makes far more sense to call the sub as follows:
print key_with_highest_val($hash{a});
The sub simply needs to iterate over all the elements of that hash, keeping track of the highest value seen, and the key at which it was seen.
sub key_with_highest_val {
my ($h) = #_;
my $hi_v;
my $hi_k;
for my $k (keys(%$h)) {
my $v = $h->{$k};
if (!defined($hi_v) || $v > $hi_v) {
$hi_v = $v;
$hi_k = $k;
}
}
return $hi_k;
}
As Chris Charley points out, List::Util's reduce can simply this function. With the calling convention I recommended above, the reduce solution becomes the following:
use List::Util qw( reduce );
sub key_with_highest_val {
my ($h) = #_;
return reduce { $h->{$a} >= $h->{$b} ? $a : $b } keys(%$h);
}
Both versions return an arbitrary key among those that tied when there's a tie.
Use the reduce function from List::Util (which is part of core perl).
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/reduce/;
my %hash = (
"a" => {
"b" => 2,
"c" => 1,
},
"d" => {
"e" => 4,
},
);
my $key = 'a';
print "For key: $key, max key is ", max_key($key, %hash), "\n";
sub max_key {
my ($key, %hash) = #_;
return "$key does not exist" unless exists $hash{$key};
my $href = $hash{$key};
return reduce { $href->{$a} > $href->{$b} ? $a : $b } keys %$href;
}
You should always include use strict and use warnings at the top of your programs to catch errors so you can find and fix them. This requires declaring of your variables with my, like my $key = 'a';, for example and my %hash = ...
This program prints:
For key: a, max key is b
This code makes the following assumptions:
The values of your nested hashes are always numeric.
You don't have duplicate values.
Anything else is left as an exercise for the reader.
use strict;
use warnings;
use Data::Dump;
use List::Util qw(max);
my %hash = (
a => {
b => 2,
c => 1,
},
d => {
e => 4,
},
);
dd(max_hash_value(\%hash, $_)) for 'a' .. 'd';
sub max_hash_value {
my ($hash_ref, $search_key) = #_;
return unless $hash_ref->{$search_key};
my %lookup;
while (my ($key, $value) = each(%{$hash_ref->{$search_key}})) {
$lookup{$value} = $key;
}
return $lookup{max(keys(%lookup))};
}
Output:
"b"
()
()
"e"

Grouping with Perl: finding a faster solution to recursion

The Perl code below works, but it doesn't scale well even with considerable computer resources. I hoping that someone can help me find more efficient code such as by replacing recursion with iteration, if that's the problem.
my data structure looks like this:
my %REV_ALIGN;
$REV_ALIGN{$dna}{$rna} = ();
Any dna key may have multiple rna sub keys. The same rna sub key may appear with multiple different dna keys. The purpose is to group rna ( transcripts ) based on shared dna sequence elements. For example, if dnaA has RNA1, RNA8, RNA9, and RNA4, and dnaB has RNA11, RNA4, and RNA99, then we group all these transcripts together ( RNA1, RNA9, RNA4, RNA11, RNA99 ) and continue to proceed to try and add to the group by selecting other dna. My recusive solution to this problem works but doesn't scale so well when using data from whole genome to transcriptome alignment.
SO MY QUESTION IS: WHAT IS A MORE EFFICIENT SOLUTION TO THIS PROBLEM? THANK YOU VERY MUCH
my #groups;
while ( my $x =()= keys %REV_ALIGN )
{
my #DNA = keys %REV_ALIGN;
my $dna = shift #DNA;
# the corresponding list of rna
my #RNA = keys %{$REV_ALIGN{$dna}};
delete $REV_ALIGN{$dna};
if ( $x == 1 )
{
push #groups, \#RNA;
last;
}
my $ref = group_transcripts ( \#RNA, \%REV_ALIGN );
push #groups, $ref;
}
sub group_transcripts
{
my $tran_ref = shift;
my $align_ref = shift;
my #RNA_A = #$tran_ref;
my %RNA;
# create a null hash with seed list of transcripts
#RNA{#RNA_A} = ();
# get a list of all remaining dna sequences in the alignment
my #DNA = keys %{$align_ref};
my %count;
# select a different list of transcripts
for my $dna ( #DNA )
{
next unless exists $align_ref->{$dna};
my #RNA_B = keys %{$align_ref->{$dna}};
# check to see two list share and transcripts
for my $element ( #RNA_A, #RNA_B )
{
$count{$element}++;
}
for my $rna_a ( keys %count )
{
# if they do, add any new transcripts to the current group
if ( $count{$rna_a} == 2 )
{
for my $rna_b ( #RNA_B )
{
push #RNA_A, $rna_b if $count{$rna_b} == 1;
}
delete $align_ref->{$dna};
delete $count{$_} foreach keys %count;
# recurse to try and continue adding to list
#_ = ( \#RNA_A, $align_ref );
goto &group_transcripts;
}
}
delete $count{$_} foreach keys %count;
}
# if no more transcripts can be added, return a reference to the group
return \#RNA_A;
}
You have a loops nested four deep. It's an pretty safe bet that's why your code scales poorly.
If I understand correctly what you are trying to accomplish, the input
my %REV_ALIGN = (
"DNA1" => { map { $_ => undef } "RNA1", "RNA2" }, # \ Linked by RNA1 \
"DNA2" => { map { $_ => undef } "RNA1", "RNA3" }, # / \ Linked by RNA3 > Group
"DNA3" => { map { $_ => undef } "RNA3", "RNA4" }, # / /
"DNA4" => { map { $_ => undef } "RNA5", "RNA6" }, # \ Linked by RNA5 \ Group
"DNA5" => { map { $_ => undef } "RNA5", "RNA7" }, # / /
"DNA6" => { map { $_ => undef } "RNA8" }, # > Group
);
should result in
my #groups = (
[
dna => [ "DNA1", "DNA2", "DNA3" ],
rna => [ "RNA1", "RNA2", "RNA3", "RNA4" ],
],
[
dna => [ "DNA4", "DNA5" ],
rna => [ "RNA5", "RNA6", "RNA7" ],
],
[
dna => [ "DNA6" ],
rna => [ "RNA8" ],
],
);
If so, you can use the following:
use strict;
use warnings;
use Graph::Undirected qw( );
my %REV_ALIGN = (
"DNA1" => { map { $_ => undef } "RNA1", "RNA2" },
"DNA2" => { map { $_ => undef } "RNA1", "RNA3" },
"DNA3" => { map { $_ => undef } "RNA3", "RNA4" },
"DNA4" => { map { $_ => undef } "RNA5", "RNA6" },
"DNA5" => { map { $_ => undef } "RNA5", "RNA7" },
"DNA6" => { map { $_ => undef } "RNA8" },
);
my $g = Graph::Undirected->new();
for my $dna (keys(%REV_ALIGN)) {
for my $rna (keys(%{ $REV_ALIGN{$dna} })) {
$g->add_edge("dna:$dna", "rna:$rna");
}
}
my #groups;
for my $raw_group ($g->connected_components()) {
my %group = ( dna => [], rna => [] );
for (#$raw_group) {
my ($type, $val) = split(/:/, $_, 2);
push #{ $group{$type} }, $val;
}
push #groups, \%group;
}
use Data::Dumper qw( Dumper );
print(Dumper(\#groups));
If you just want the RNA, the final section simplifies to the following:
my #groups;
for my $raw_group ($g->connected_components()) {
my #group;
for (#$raw_group) {
my ($type, $val) = split(/:/, $_, 2);
push #group, $val if $type eq 'rna';
}
push #groups, \#group;
}

Getting values from a hash by order defined in another list

Looking for a way, how to get the values from a hash, in an order defined by another list.
The "demo" code (real values are different):
use 5.014;
use warnings;
my $href = {
'Long one' => 'v1',
'xxx two' => 'v2',
'another3' => 'v3',
'xfour' => 'v4',
'some5' => undef,
};
#keys from the $href in defined order
my #order = ('Long one', 'xfour', 'nono', 'another3', 'some5', 'xxx two');
#in the real code:
#my $href = some_sub(......); my #order = another_sub(....);
#cleanup the #order form undefined values
#order = grep { exists $href->{$_} && defined $href->{$_} } #order;
#my input
while(<DATA>) {
chomp;
#filter out nonexistent keys and undefined values
my #defined_data = grep { exists $href->{$_} && defined $href->{$_} } split '/';
my $str = "xx";
$str = join('/',
map { $href->{$_} } some_my_sort(#defined_data)
) if #defined_data;
say $str;
}
sub some_my_sort {
my(#list) = #_;
# "somewhat"sort the #list in order defined by #order
# haven't any idea how to do this :(
# __HERE NEED HELP__ to sort the #list, to the order defined in the #order
#and get only first two values. if exists only one value, return only one
if($#list > 0) {
return ($list[0], $list[1]);
}
else {
return($list[0]);
}
}
__DATA__
another3/some5/Long one/xfour/xxx two
xxx two/blabla/some5/another3/xfour
some5
notexists/some5/xxx two/Long one
some5/another3
for the above input want get the next output:
v1/v4
v4/v3
xx
v1/v2
v3
Form #ikegami solution:
use 5.014;
use warnings;
my $href = { 'Long one' => 'v1', 'xxx two' => 'v2', 'another3' => 'v3', 'xfour' => 'v4', 'some5' => undef, };
my #order = ('Long one', 'xfour', 'nono', 'another3', 'some5', 'xxx two');
#order = grep { exists $href->{$_} && defined $href->{$_} } #order;
my %order = map { $order{$_} => $_ } 0..$#order;
while (<DATA>) {
chomp;
my #keys = grep { defined $href->{$_} } split '/';
#keys = sort { $order{$a} <=> $order{$b} } #keys;
splice(#keys, 2) if #keys > 2;
#keys = 'xx' if !#keys;
say join '/', #{$href}{ #keys };
}
get the next - error - and don't really understand why:
Global symbol "%order" requires explicit package name at ike line 8.
Execution of ike aborted due to compilation errors.
After defining #order, define %order_h:
my %order_h;
my $i = 0;
$order_h{$_} = $i++ for #order;
Then, instead of the comment about sorting, add this line:
#list = sort { $order_h{$a} <=> $order_h{$b} } #list;
Here's a cleaned-up version of the entire code:
my %order = map { $order[$_] => $_ } 0..$#order;
while (<DATA>) {
chomp;
my #keys = grep { defined $href->{$_} } split '/';
#keys = sort { $order{$a} <=> $order{$b} } #keys;
splice(#keys, 2) if #keys > 2;
say join '/', #keys ? #{$href}{ #keys } : 'xx';
}

Perl Working On Two Hash References

I would like to compare the values of two hash references.
The data dumper of my first hash is this:
$VAR1 = {
'42-MG-BA' => [
{
'chromosome' => '19',
'position' => '35770059',
'genotype' => 'TC'
},
{
'chromosome' => '2',
'position' => '68019584',
'genotype' => 'G'
},
{
'chromosome' => '16',
'position' => '9561557',
'genotype' => 'G'
},
And the second hash is similar to this but with more hashes in the array. I would like to compare the genotype of my first and second hash if the position and the choromosome matches.
map {print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n"}sort keys %$cave_snp_list;
map {print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n"}sort keys %$geno_seq_list;
I could do that for the first array of the hashes.
Could you help me in how to work for all the arrays?
This is my actual code in full
#!/software/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Benchmark;
use Config::Config qw(Sequenom.ini);
useDatabase::Conn;
use Data::Dumper;
GetOptions("sam=s" => \my $sample);
my $geno_seq_list = getseqgenotypes($sample);
my $cave_snp_list = getcavemansnpfile($sample);
#print Dumper($geno_seq_list);
print scalar %$geno_seq_list, "\n";
foreach my $sam (keys %{$geno_seq_list}) {
my $seq_used = $geno_seq_list->{$sam};
my $cave_used = $cave_snp_list->{$sam};
print scalar(#$geno_seq_list->{$_}) if sort keys %$geno_seq_list, "\n";
print scalar(#$cave_used), "\n";
#foreach my $seq2com (# {$seq_used } ){
# foreach my $cave2com( # {$cave_used} ){
# print $seq2com->{chromosome},":" ,$cave2com->{chromosome},"\n";
# }
#}
map { print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n" } sort keys %$cave_snp_list;
map { print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n" } sort keys %$geno_seq_list;
}
sub getseqgenotypes {
my $snpconn;
my $gen_list = {};
$snpconn = Database::Conn->new('live');
$snpconn->addConnection(DBI->connect('dbi:Oracle:pssd.world', 'sn', 'ss', { RaiseError => 1, AutoCommit => 0 }),
'pssd');
#my $conn2 =Database::Conn->new('live');
#$conn2->addConnection(DBI->connect('dbi:Oracle:COSI.world','nst_owner','nst_owner', {RaiseError =>1 , AutoCommit=>0}),'nst');
my $id_ind = $snpconn->execute('snp::Sequenom::getIdIndforExomeSample', $sample);
my $genotype = $snpconn->executeArrRef('snp::Sequenom::getGenotypeCallsPosition', $id_ind);
foreach my $geno (#{$genotype}) {
push #{ $gen_list->{ $geno->[1] } }, {
chromosome => $geno->[2],
position => $geno->[3],
genotype => $geno->[4],
};
}
return ($gen_list);
} #end of sub getseqgenotypes
sub getcavemansnpfile {
my $nstconn;
my $caveman_list = {};
$nstconn = Database::Conn->new('live');
$nstconn->addConnection(
DBI->connect('dbi:Oracle:CANP.world', 'nst_owner', 'NST_OWNER', { RaiseError => 1, AutoCommit => 0 }), 'nst');
my $id_sample = $nstconn->execute('nst::Caveman::getSampleid', $sample);
#print "IDSample: $id_sample\n";
my $file_location = $nstconn->execute('nst::Caveman::getCaveManSNPSFile', $id_sample);
open(SNPFILE, "<$file_location") || die "Error: Cannot open the file $file_location:$!\n";
while (<SNPFILE>) {
chomp;
next if /^>/;
my #data = split;
my ($nor_geno, $tumor_geno) = split /\//, $data[5];
# array of hash
push #{ $caveman_list->{$sample} }, {
chromosome => $data[0],
position => $data[1],
genotype => $nor_geno,
};
} #end of while loop
close(SNPFILE);
return ($caveman_list);
}
The problem that I see is that you're constructing a tree for generic storage of data, when what you want is a graph, specific to the task. While you are constructing the record, you could also be constructing the part that groups data together. Below is just one example.
my %genotype_for;
my $record
= { chromosome => $data[0]
, position => $data[1]
, genotype => $nor_geno
};
push #{ $gen_list->{ $geno->[1] } }, $record;
# $genotype_for{ position }{ chromosome }{ name of array } = genotype code
$genotype_for{ $data[1] }{ $data[0] }{ $sample } = $nor_geno;
...
return ( $caveman_list, \%genotype_for );
In the main line, you receive them like so:
my ( $cave_snp_list, $geno_lookup ) = getcavemansnpfile( $sample );
This approach at least allows you to locate similar position and chromosome values. If you're going to do much with this, I might suggest an OO approach.
Update
Assuming that you wouldn't have to store the label, we could change the lookup to
$genotype_for{ $data[1] }{ $data[0] } = $nor_geno;
And then the comparison could be written:
foreach my $pos ( keys %$small_lookup ) {
next unless _HASH( my $sh = $small_lookup->{ $pos } )
and _HASH( my $lh = $large_lookup->{ $pos } )
;
foreach my $chrom ( keys %$sh ) {
next unless my $sc = $sh->{ $chrom }
and my $lc = $lh->{ $chrom }
;
print "$sc:$sc";
}
}
However, if you had limited use for the larger list, you could construct the specific case
and pass that in as a filter when creating the longer list.
Thus, in whichever loop creates the longer list, you could just go
...
next unless $sample{ $position }{ $chromosome };
my $record
= { chromosome => $chromosome
, position => $position
, genotype => $genotype
};
...