Search whether AoH value exists in same Hash - perl

I have hash which contains some data.
I want my final %hash to be printed like this:
'UGroup=1' => [ 'C72', 'C73', 'C71' ]
Here is my script:
use Data::Dumper;
my %h = (
'C72' => [ 'S=2-1' ],
'C73' => [ 'S=3-1' ],
'C71' => [ 'S=91-1'],
'UGroup=1' => [ 'S=1-1',
'S=2-1',
'S=3-1',
'S=91-1'],
);
print Dumper(\%h);
foreach my $C (sort keys %h) {
next unless $C =~ /UGroup/;
for my $f (#{$h{$C}}){
print "\tf:$f\n";
#This is not correct, but wanted to do something like this.
push #{$hash{$C}}, $f if(exists $h{$f});
}
}
print Dumper(\%hash);
Here in example input hash I need to check if S=91-1 has any key? If yes then associate that key to value for %hash with its original key.
How can I do that?

You didn't name the things, so
S=91-1 shall be a snake,
C71 shall be a cow, and
UGroup=1 shall be a group.
Start by building this hash:
my %cows_by_snake = (
'S=91-1' => [ 'C71' ],
'S=2-1' => [ 'C72' ],
'S=3-1' => [ 'C73' ],
);
Just ignore the keys that of %h that are groups when you do so.
Once you built a hash, it's simply a question of doing the following:
Create an empty result hash.
For each group,
Create an empty collection of cows.
For each snake associated the the group,
Add the cows associated with the snake to the collection.
Eliminate the duplicates in the collection of cows.
Add the group and the associated cows to the result hash.
my #groups;
my #cows;
for my $cow_or_group (keys(%h)) {
if ($cow_or_group =~ /^UGroup=/) {
push #groups, $cow_or_group;
} else {
push #cows, $cow_or_group;
}
}
my %cows_by_snake;
for my $cow (#cows) {
for my $snake (#{ $h{$cow} }) {
push #{ $cows_by_snake{$snake} }, $cow;
}
}
my %results;
for my $group (#groups) {
my %group_cows;
for my $snake (#{ $h{$group} }) {
for my $cow (#{ $cows_by_snake{$snake} }) {
++$group_cows{$cow};
}
}
$results{$group} = [ sort keys %group_cows ];
}

Related

How to fetch key of a hash through values in Hash of Arrays

Below is the small code snippet i have created using Hash Key and Hash Value which is an array.
And my input is Hash value (Array in this case) , i have to search for all the arrays in $ENV hash and fetch the hash key
Example if i search for m3d1 , the output should be TEST3
use strict;
use warnings;
use Data::Dumper;
my %ENV;
$ENV{"TEST3"}=["m3d1","m3d2"] ;
$ENV{"TEST4"}=["m4d1","m4d2"] ;
$ENV{"TEST5"}=["m5d1","m5d2"] ;
$ENV{"TEST6"}=["m6d1","m6d2"] ;
print Dumper \#keys;
print Dumper \#values;
Is it possible , or is there a better design ?
Of course it's possible. We can visit every value in the data structure until we find the one that matches.
You didn't say what the strings represent, to I'm going to call TEST3 a group, and I'm going to call m3d1 a host. The following snippets don't assume a host is preset in only one of group.
my #groups;
for my $group (keys(%ENV)) {
for my $host (#{ $ENV{$group} }) {
if ($host eq $target_host) {
push #groups, $group;
}
}
}
die("Not found\n") if !#groups;
say for #groups;
But this isn't efficient. If you were planning on doing many lookups, this would be slow.
Let's start by turning the structure inside out.
my %groups_by_host;
for my $group (keys(%ENV)) {
for my $host (#{ $ENV{$group} }) {
push #{ $groups_by_host{$host} }, $group;
}
}
The above produces
my %groups_by_host = (
m3d1 => [ "TEST3" ],
m3d2 => [ "TEST3" ],
m4d1 => [ "TEST4" ],
m4d2 => [ "TEST4" ],
m5d1 => [ "TEST5" ],
m5d2 => [ "TEST5" ],
m6d1 => [ "TEST6" ],
m6d2 => [ "TEST6" ],
);
Then, searching becomes instantaneous.
my $groups = $groups_by_host{$target_host}
or die("Not found\n");
say for #$groups;

Hash content extraction based on condition

I have a hash containing node data.
I am expecting hash content to be printed in -r_<count> and -d_<count> attributes.
Here is the script:
use strict; use warnings;
use Data::Dumper;
my %hash = (
'Network=Test,Cell=31' => [ 'Network=Test,Unit=RU-1-1,Port=A',
'Network=Test,Unit=RU-1-2,Port=A'
],
'Network=Test,Cell=32' => [ 'Network=Test,Unit=RU-1-1,Port=A',
'Network=Test,Unit=RU-1-2,Port=A'
],
'Network=Test,Cell=33' => [ 'Network=Test,Unit=RU-1-5,Port=A',
'Network=Test,Unit=RU-1-6,Port=A'
],
);
print "hash:\n".Dumper(\%hash);
my $count = 0;
foreach my $d (sort keys %hash) {
$count++;
print "-d_". $count."=".$d . "\n";
my %seen = ();
foreach my $r (sort #{$hash{$d}}) {
$seen{$r}++;
}
if ((keys %seen) > 0) {
my $uniq = join ("###",sort keys %seen);
print "-r_". $count . "=" . $uniq . "\n";
} else {
print "-r_". $count."="."NA\n";
}
}
And I am able to print output like below(current output):
-d_1=Network=Test,Cell=31
-r_1=Network=Test,Unit=RU-1-1,Port=A###Network=Test,Unit=RU-1-2,Port=A
-d_2=Network=Test,Cell=32
-r_2=Network=Test,Unit=RU-1-1,Port=A###Network=Test,Unit=RU-1-2,Port=A
-d_3=Network=Test,Cell=33
-r_3=Network=Test,Unit=RU-1-5,Port=A###Network=Test,Unit=RU-1-6,Port=A
But I want output to be printed like below (expected output):
-r_1=Network=Test,Unit=RU-1-1,Port=A
-d_1=Network=Test,Cell=31###Network=Test,Cell=32
-r_2=Network=Test,Unit=RU-1-2,Port=A
-d_2=Network=Test,Cell=31###Network=Test,Cell=32
-r_3=Network=Test,Unit=RU-1-5,Port=A
-d_3=Network=Test,Cell=33
-r_4=Network=Test,Unit=RU-1-6,Port=A
-d_4=Network=Test,Cell=33
The expected output is, the value of -r_<count> should be printed as singular (from %hash keys array value) and -d_<count> (from %hash keys) should printed.
The output is guided by the unique values of the arrays. Your output loop must therefore iterate over these.
(
'Network=Test,Unit=RU-1-1,Port=A',
'Network=Test,Unit=RU-1-2,Port=A',
'Network=Test,Unit=RU-1-5,Port=A',
'Network=Test,Unit=RU-1-6,Port=A',
)
However, for each of these, the output needs the associated keys. This means the output loop requires the following data:
(
'Network=Test,Unit=RU-1-1,Port=A' => [ 'Network=Test,Cell=31', 'Network=Test,Cell=32' ],
'Network=Test,Unit=RU-1-2,Port=A' => [ 'Network=Test,Cell=31', 'Network=Test,Cell=32' ],
'Network=Test,Unit=RU-1-5,Port=A' => [ 'Network=Test,Cell=33' ],
'Network=Test,Unit=RU-1-6,Port=A' => [ 'Network=Test,Cell=33' ],
)
Basically, your data structure is inside-out. But now that we know what we want, it's just a question of transforming the data structure into what we need.
my %foos_by_bar;
for my $foo (keys %hash) { # %hash_b
my $bars = $hash{$foo}; # %hash_a
for my $bar (#$bars) {
push #{ $foos_by_bar{$bar} }, $foo;
}
}
The output loop simply needs to iterate over the (possibly sorted) keys of %foos_by_bar, and #{ $foos_by_bar{$bar} } contains the data you need for -d.
Nothing's stopping you from iterating over the sorted keys of %foos_by_bar in the output loop to produce predictable output, but that won't necessarily give you the same order as in the question. If you need that specific order, you can use the following:
my #bars;
my %foos_by_bar;
for my $foo (sort keys %hash) { # %hash_b
my $bars = $hash{$foo}; # %hash_a
for my $bar (#$bars) {
push #bars, $bar if !$foos_by_bar{$bar};
push #{ $foos_by_bar{$bar} }, $foo;
}
}
In this case, the output loop would iterate over #bars.

Creating hash of hash dynamically in perl

I am trying to create a hash of hash of - the nesting depth depends on the number of arguments passed into #aGroupByFields array.
In the below implementation, I am getting the desired hash structure.But I have hard coded the fields [ example - $phBugRecord->{createdBy} ] instead of deriving it from the array.
I am not sure how to dynamically create this.
my (#aGroupByFields) = ['createdBy','status','devPriority'];
# In real case,these are passed in as arguments
my (%hTemp);
# This is the final hash which will be structured according to above fields
# %hBugDetails is the hash containing details of all bugs
foreach my $phBugRecord ( #{ $hBugDetails{records} } ) {
# The below statement needs to be generated dynamically as
# opposed to the hard-coded values.
push(
#{
$hTemp{ $phBugRecord->{createdBy} }{ $phBugRecord->{status} }
{ $phBugRecord->{devPriority} }
},
$phBugRecord
);
}
Any pointer will be a great help.Thanks.
Here is a working implementation with Data::Diver.
use strict;
use warnings;
use Data::Diver 'DiveVal';
use Data::Printer;
my %hBugDetails = (
records => [
{
createdBy => 'created_by1',
status => 'status1',
devPriority => 'dev_priority1',
foo => 'foo1',
bar => 'bar1',
},
{
createdBy => 'created_by1',
status => 'status2',
devPriority => 'dev_priority2',
foo => 'foo',
bar => 'bar',
},
],
);
# we want to group by these fields
my #group_by = ( 'createdBy', 'status', 'devPriority' );
my $grouped_bugs = {}; # for some reason we need to start with an empty hashref
foreach my $bug ( #{ $hBugDetails{records} } ) {
# this will auto-vivify the hash for us
push #{ DiveVal( $grouped_bugs, map { $bug->{$_} } #group_by ) }, $bug;
}
p $grouped_bugs;
The output looks like this.
\ {
created_by1 {
status1 {
dev_priority1 [
[0] {
bar "bar1",
createdBy "created_by1",
devPriority "dev_priority1",
foo "foo1",
status "status1"
}
]
},
status2 {
dev_priority2 [
[0] {
bar "bar",
createdBy "created_by1",
devPriority "dev_priority2",
foo "foo",
status "status2"
}
]
}
}
}
Note that I renamed your variables. It was very hard to read the code like that. It makes more sense to just use speaking names instead of cryptic abbreviations for the type of variable. The sigil already does that for you.
This code will do what you need
my #aGroupByFields = qw/ createdBy status devPriority /;
my %hTemp;
for my $phBugRecord ( #{ $hBugDetails{records} } ) {
my $hash = \%hTemp;
for my $field ( #aGroupByFields ) {
my $key = $phBugRecord->{$field};
if ( $field eq $aGroupByFields[-1] ) {
push #{ $hash->{ $key } }, $phBugRecord;
}
else {
$hash = $hash->{ $key } //= {};
}
}
}

Grouping with Perl: finding a faster solution to recursion

The Perl code below works, but it doesn't scale well even with considerable computer resources. I hoping that someone can help me find more efficient code such as by replacing recursion with iteration, if that's the problem.
my data structure looks like this:
my %REV_ALIGN;
$REV_ALIGN{$dna}{$rna} = ();
Any dna key may have multiple rna sub keys. The same rna sub key may appear with multiple different dna keys. The purpose is to group rna ( transcripts ) based on shared dna sequence elements. For example, if dnaA has RNA1, RNA8, RNA9, and RNA4, and dnaB has RNA11, RNA4, and RNA99, then we group all these transcripts together ( RNA1, RNA9, RNA4, RNA11, RNA99 ) and continue to proceed to try and add to the group by selecting other dna. My recusive solution to this problem works but doesn't scale so well when using data from whole genome to transcriptome alignment.
SO MY QUESTION IS: WHAT IS A MORE EFFICIENT SOLUTION TO THIS PROBLEM? THANK YOU VERY MUCH
my #groups;
while ( my $x =()= keys %REV_ALIGN )
{
my #DNA = keys %REV_ALIGN;
my $dna = shift #DNA;
# the corresponding list of rna
my #RNA = keys %{$REV_ALIGN{$dna}};
delete $REV_ALIGN{$dna};
if ( $x == 1 )
{
push #groups, \#RNA;
last;
}
my $ref = group_transcripts ( \#RNA, \%REV_ALIGN );
push #groups, $ref;
}
sub group_transcripts
{
my $tran_ref = shift;
my $align_ref = shift;
my #RNA_A = #$tran_ref;
my %RNA;
# create a null hash with seed list of transcripts
#RNA{#RNA_A} = ();
# get a list of all remaining dna sequences in the alignment
my #DNA = keys %{$align_ref};
my %count;
# select a different list of transcripts
for my $dna ( #DNA )
{
next unless exists $align_ref->{$dna};
my #RNA_B = keys %{$align_ref->{$dna}};
# check to see two list share and transcripts
for my $element ( #RNA_A, #RNA_B )
{
$count{$element}++;
}
for my $rna_a ( keys %count )
{
# if they do, add any new transcripts to the current group
if ( $count{$rna_a} == 2 )
{
for my $rna_b ( #RNA_B )
{
push #RNA_A, $rna_b if $count{$rna_b} == 1;
}
delete $align_ref->{$dna};
delete $count{$_} foreach keys %count;
# recurse to try and continue adding to list
#_ = ( \#RNA_A, $align_ref );
goto &group_transcripts;
}
}
delete $count{$_} foreach keys %count;
}
# if no more transcripts can be added, return a reference to the group
return \#RNA_A;
}
You have a loops nested four deep. It's an pretty safe bet that's why your code scales poorly.
If I understand correctly what you are trying to accomplish, the input
my %REV_ALIGN = (
"DNA1" => { map { $_ => undef } "RNA1", "RNA2" }, # \ Linked by RNA1 \
"DNA2" => { map { $_ => undef } "RNA1", "RNA3" }, # / \ Linked by RNA3 > Group
"DNA3" => { map { $_ => undef } "RNA3", "RNA4" }, # / /
"DNA4" => { map { $_ => undef } "RNA5", "RNA6" }, # \ Linked by RNA5 \ Group
"DNA5" => { map { $_ => undef } "RNA5", "RNA7" }, # / /
"DNA6" => { map { $_ => undef } "RNA8" }, # > Group
);
should result in
my #groups = (
[
dna => [ "DNA1", "DNA2", "DNA3" ],
rna => [ "RNA1", "RNA2", "RNA3", "RNA4" ],
],
[
dna => [ "DNA4", "DNA5" ],
rna => [ "RNA5", "RNA6", "RNA7" ],
],
[
dna => [ "DNA6" ],
rna => [ "RNA8" ],
],
);
If so, you can use the following:
use strict;
use warnings;
use Graph::Undirected qw( );
my %REV_ALIGN = (
"DNA1" => { map { $_ => undef } "RNA1", "RNA2" },
"DNA2" => { map { $_ => undef } "RNA1", "RNA3" },
"DNA3" => { map { $_ => undef } "RNA3", "RNA4" },
"DNA4" => { map { $_ => undef } "RNA5", "RNA6" },
"DNA5" => { map { $_ => undef } "RNA5", "RNA7" },
"DNA6" => { map { $_ => undef } "RNA8" },
);
my $g = Graph::Undirected->new();
for my $dna (keys(%REV_ALIGN)) {
for my $rna (keys(%{ $REV_ALIGN{$dna} })) {
$g->add_edge("dna:$dna", "rna:$rna");
}
}
my #groups;
for my $raw_group ($g->connected_components()) {
my %group = ( dna => [], rna => [] );
for (#$raw_group) {
my ($type, $val) = split(/:/, $_, 2);
push #{ $group{$type} }, $val;
}
push #groups, \%group;
}
use Data::Dumper qw( Dumper );
print(Dumper(\#groups));
If you just want the RNA, the final section simplifies to the following:
my #groups;
for my $raw_group ($g->connected_components()) {
my #group;
for (#$raw_group) {
my ($type, $val) = split(/:/, $_, 2);
push #group, $val if $type eq 'rna';
}
push #groups, \#group;
}

Perl accessing the elements in a hash/hash reference data structure

I have a question I'm hoping you could help with as I am new to hashes and hash reference stuff?
I have the following data structure:
$VAR1 = {
'http://www.superuser.com/' => {
'difference' => {
'http://www.superuser.com/questions' => '10735',
'http://www.superuser.com/faq' => '13095'
},
'equal' => {
'http://www.superuser.com/ ' => '20892'
}
},
'http://www.stackoverflow.com/' => {
'difference' => {
'http://www.stackoverflow.com/faq' => '13015',
'http://www.stackoverflow.com/questions' => '10506'
},
'equal' => {
'http://www.stackoverflow.com/ ' => '33362'
}
}
If I want to access all the URLs in the key 'difference' so I can then perform some other actions on the URLs, what is the correct or preferred method of accessing those elements?
e.g I will end up with the following URLs that I can then do stuff to in a foreach loop with:
http://www.superuser.com/questions
http://www.superuser.com/faq
http://www.stackoverflow.com/faq
http://www.stackoverflow.com/questions
------EDIT------
Code to access the elements further down the data structure shown above:
my #urls;
foreach my $key1 ( keys( %{$VAR1} ) ) {
print( "$key1\n" );
foreach my $key2 ( keys( %{$VAR1->{$key1}} ) ) {
print( "\t$key2\n" );
foreach my $key3 ( keys( %{$VAR1->{$key1}{$key2}} ) ) {
print( "\t\t$key3\n" );
push #urls, keys %{$VAR1->{$key1}{$key2}{$key3}};
}
}
}
print "#urls\n";
Using the code above why do I get the following error?
Can't use string ("13238") as a HASH ref while "strict refs" in use at ....
It is not difficult, just take the second level of keys off every key in the variable:
my #urls;
for my $key (keys %$VAR1) {
push #urls, keys %{$VAR1->{$key}{'difference'}};
}
If you're struggling with dereferencing, just keep in mind that all values in a hash or array can only be a scalar value. In a multilevel hash or array the levels are just single hashes/arrays stacked on top of each other.
For example, you could do:
for my $value (values %$VAR1) {
push #urls, keys %{$value->{'difference'}};
}
Or
for my $name (keys %$VAR1) {
my $site = $VAR1->{$name};
push #urls, keys %{$site->{'difference'}};
}
..taking the route either directly over the value (a reference to a hash) or over a temporary variable, representing the value via the key. There is more to read in perldoc perldata.