Perl Hash Comparison

I have two "different" files with the same kind of data i.e.
KEY_gl Start_gl End_gl
1 114029 17
2 284 1624
3 1803 2942
4 3070 3282
5 3295 4422
KEY_gm Start_gm End_gm
1 115000 17
2 284 1624
3 1803 2942
4 3070 3282
5 3295 4422
I have loaded these two files into two hashes. The KEY column is the key, and the start and end columns are the values for each key.
I have written code to compare these two hashes and print out the matching and non-matching keys from the files.
The code is:
my %hash_gl = ();
my %hash_gm = ();
open( my $fgl, "/home/gaurav/GMR/new_gl.txt" ) or die "Can't open the file";
while ( my $line_gl = <$fgl> ) {
chomp $line_gl;
my ( $key_gl, $start_gl, $end_gl ) = split( "\t", $line_gl );
$hash_gl{$key_gl} = [ $start_gl, $end_gl ];
}
while ( my ( $key_gl, $val_gl ) = each %hash_gl ) {
#print "$key_gl => @{$val_gl}\n";
}
open( my $fgm, "/home/gaurav/GMR/new_gm.txt" ) or die "Can't open the file";
while ( my $line_gm = <$fgm> ) {
chomp $line_gm;
my ( $key_gm, $start_gm, $end_gm ) = split( "\t", $line_gm );
$hash_gm{$key_gm} = [ $start_gm, $end_gm ];
}
while ( my ( $key_gm, $val_gm ) = each %hash_gm ) {
#print "$key_gm => @{$val_gm}\n";
}
for ( sort keys %hash_gl ) {
unless ( exists $hash_gm{$_} ) {
print "$_: not found in second hash\n";
next;
}
if ( $hash_gm{$_} == $hash_gl{$_} ) {
print "$_: values are equal\n";
} else {
print "$_: values are not equal\n";
}
}
Please point out the errors in this, as I am not getting the desired output. Also, I am sorry; I am new to this forum, so I have not been able to format the post correctly.

After reading your files, the two hashes look like this. I created the output with the dd function from Data::Dump.
my %hash_gl = (
1 => [ 114029, 17 ],
2 => [ 284, 1624 ],
3 => [ 1803, 2942 ],
4 => [ 3070, 3282 ],
5 => [ 3295, 442 ],
KEY_gl => [ "Start_gl", "End_gl" ],
);
my %hash_gm = (
1 => [ 115000, 17 ],
2 => [ 284, 1624 ],
3 => [ 1803, 2942 ],
4 => [ 3070, 3282 ],
5 => [ 3295, 4422 ],
KEY_gm => [ "Start_gm", "End_gm" ],
);
As you can see, the values are array refs. You put them in array refs when you wrote $hash_gl{$key_gl} = [ $start_gl, $end_gl ]; (and the same for gm).
When you compare the two, you are using ==, which is used for numerical comparison. If you print one of the $hash_gm{$_} values, you will get something like this:
ARRAY(0x3bb114)
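That's because it's an array ref. With ==, each reference is converted to its memory address, so you are comparing the addresses of two different arrays. They are never equal, even when their contents match.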
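A quick way to see this (with two made-up arrays holding the same numbers): == turns each reference into its memory address, which is also what the core function Scalar::Util::refaddr returns.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $gl = [ 284, 1624 ];    # built while reading the first file
my $gm = [ 284, 1624 ];    # built separately while reading the second file

# == numifies each reference to its address, not its contents
print $gl == $gm ? "equal\n" : "not equal\n";    # two different arrays
print $gl == $gl ? "equal\n" : "not equal\n";    # the same array twice

# refaddr returns the same number that == compares
print refaddr($gl) == $gl + 0 ? "address matches\n" : "address differs\n";
```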
You have two options:
You can do the comparison yourself. To do that, you need to go into each array ref and compare the values one by one:
if ( $hash_gm{$_}->[0] == $hash_gl{$_}->[0]
&& $hash_gm{$_}->[1] == $hash_gl{$_}->[1] )
{
print "$_: values are equal\n";
} else {
print "$_: values are not equal\n";
}
Or you can install and use Array::Utils:
use Array::Utils 'array_diff';
# later...
if (! array_diff( @{ $hash_gm{$_} }, @{ $hash_gl{$_} } )) {
print "$_: values are equal\n";
} else {
print "$_: values are not equal\n";
}
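I would go with the first solution: it is more readable because you do not need the extra dereferencing, and installing a module just to save half a line of code is not worth the effort. Note also that array_diff treats the arrays as sets, so it would report [ 17, 114029 ] and [ 114029, 17 ] as having no difference.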
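Building on the first option, here is a sketch that wraps the element-by-element comparison in a sub (same_values and compare_hashes are made-up names) and also checks that both arrays have the same length. It uses a trimmed copy of the data above. The dump also shows that each header line became a key (KEY_gl, KEY_gm), so it is worth skipping that line while reading the files; otherwise it shows up in the report as a key that is not found in the second hash.

```perl
use strict;
use warnings;

# Compare two array refs element by element; the lengths must match too.
sub same_values {
    my ( $x, $y ) = @_;
    return 0 unless @$x == @$y;
    for my $i ( 0 .. $#$x ) {
        return 0 unless $x->[$i] == $y->[$i];
    }
    return 1;
}

# Report the status of every key in %$gl, like the loop in the question.
sub compare_hashes {
    my ( $gl, $gm ) = @_;
    my @report;
    for my $key ( sort keys %$gl ) {
        if ( !exists $gm->{$key} ) {
            push @report, "$key: not found in second hash";
        }
        elsif ( same_values( $gl->{$key}, $gm->{$key} ) ) {
            push @report, "$key: values are equal";
        }
        else {
            push @report, "$key: values are not equal";
        }
    }
    return @report;
}

# Sample data taken from the dump above, without the header rows.
my %hash_gl = ( 1 => [ 114029, 17 ], 2 => [ 284, 1624 ], 5 => [ 3295, 442 ] );
my %hash_gm = ( 1 => [ 115000, 17 ], 2 => [ 284, 1624 ], 5 => [ 3295, 4422 ] );

print "$_\n" for compare_hashes( \%hash_gl, \%hash_gm );
```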

Assuming that you want to compare the values, say the start position, here's how I'd do it:
use warnings;
use strict;
open my $in, '<', '1.txt' or die "$!\n";
open my $in2, '<', '2.txt' or die "$!\n";
my (%hash1, %hash2);
while (<$in>) {
    # skip the header and any other line that isn't three integers
    my ($key, $start, $stop) = /^\s*(\d+)\s+(\d+)\s+(\d+)/ or next;
    $hash1{$key} = [$start, $stop];
}
while (<$in2>) {
    my ($key, $start, $stop) = /^\s*(\d+)\s+(\d+)\s+(\d+)/ or next;
    $hash2{$key} = [$start, $stop];
}
for my $key (sort keys %hash1) {
    unless (exists $hash2{$key}) {
        print "key $key: not found in file2\n";
        next;
    }
    if ($hash1{$key}[0] == $hash2{$key}[0]) {
        print "start matches: file1 $hash1{$key}[0]\tfile2 $hash2{$key}[0]\n";
    }
    else {
        print "start doesn't match: file1 $hash1{$key}[0]\tfile2 $hash2{$key}[0]\n";
    }
}

#!/usr/bin/perl
use warnings;
use strict;
use feature 'say';
my %hash_gl = (
1 => [ 114029, 17 ],
2 => [ 284, 1624 ],
3 => [ 1803, 2942 ],
4 => [ 3070, 3282 ],
5 => [ 3295, 442 ],
);
my %hash_gm = (
1 => [ 115000, 17 ],
2 => [ 284, 1624 ],
3 => [ 1803, 2942 ],
4 => [ 3070, 3282 ],
5 => [ 3295, 4422 ],
);
sub check_hash_size {
my $hash_gl = shift;
my $hash_gm = shift;
if ((keys %$hash_gl) != (keys %$hash_gm)) {
say "the hashes are 2 different sizes";
}
else
{
say "the hashes are the same size";
}
}
sub diag_hashes {
my $hash_gl = shift;
my $hash_gm = shift;
for my $gl_key ( keys %$hash_gl ) {
if ( (scalar @{$$hash_gl{$gl_key}}) != (scalar @{$$hash_gm{$gl_key}}) ) {
say "$gl_key entry arrays are different sizes";
}
else
{
say "arrays are the same size for key $gl_key";
}
if ( (scalar @{$$hash_gl{$gl_key}}) > 2 or (scalar @{$$hash_gm{$gl_key}}) > 2 ) {
say "$gl_key entry array exceeds 2 values";
}
if ($$hash_gl{$gl_key}[0] eq $$hash_gm{$gl_key}[0]) {
say "$gl_key start is the same in both hashes";
}
else
{
say "** key $gl_key start is different";
}
if ($$hash_gl{$gl_key}[1] eq $$hash_gm{$gl_key}[1]) {
say "$gl_key end is the same in both hashes";
}
else
{
say "** key $gl_key end is different";
}
}
}
check_hash_size( \%hash_gl ,\%hash_gm);
diag_hashes( \%hash_gl ,\%hash_gm);
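check_hash_size only compares counts, and two hashes of the same size can still hold different keys: the question's data, for example, has KEY_gl in one file and KEY_gm in the other. A sketch (key_differences is a made-up name) that lists the keys found in only one of the hashes:

```perl
use strict;
use warnings;

# Return two array refs: keys only in %$first, and keys only in %$second.
sub key_differences {
    my ( $first, $second ) = @_;
    my @only_first  = grep { !exists $second->{$_} } sort keys %$first;
    my @only_second = grep { !exists $first->{$_} } sort keys %$second;
    return ( \@only_first, \@only_second );
}

# Same size (3 keys each), but not the same keys.
my %hash_gl = ( 1 => [ 114029, 17 ], 2 => [ 284, 1624 ], KEY_gl => [ 'Start_gl', 'End_gl' ] );
my %hash_gm = ( 1 => [ 115000, 17 ], 2 => [ 284, 1624 ], KEY_gm => [ 'Start_gm', 'End_gm' ] );

my ( $only_gl, $only_gm ) = key_differences( \%hash_gl, \%hash_gm );
print "only in first: @$only_gl\n";
print "only in second: @$only_gm\n";
```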

Related

How to iterate over 300 pages with a parser with Perl::Mechanize?

I have written a little parser that extracts the data out of a page.
use strict;
use warnings FATAL => qw#all#;
use LWP::UserAgent;
use HTML::TreeBuilder::XPath;
use Data::Dumper;
my $handler_relurl = sub { q#https://europa.eu# . $_[0] };
my $handler_trim = sub { $_[0] =~ s#^\s*(.+?)\s*$#$1#r };
my $handler_val = sub { $_[0] =~ s#^[^:]+:\s*##r };
my $handler_split = sub { [ split $_[0], $_[1] ] };
my $handler_split_colon = sub { $handler_split->( qr#; #, $_[0] ) };
my $handler_split_comma = sub { $handler_split->( qr#, #, $_[0] ) };
my $conf =
{
    url      => q#https://europa.eu/youth/volunteering/evs-organisation_en#,
    parent   => q#//div[@class="vp ey_block block-is-flex"]#,
    children =>
    {
        internal_url => [ q#//a/@href#, [ $handler_relurl ] ],
        external_url => [ q#//i[@class="fa fa-external-link fa-lg"]/parent::p//a/@href#, [ $handler_trim ] ],
        title        => [ q#//h4# ],
        topics       => [ q#//div[@class="org_cord"]#, [ $handler_val, $handler_split_colon ] ],
        location     => [ q#//i[@class="fa fa-location-arrow fa-lg"]/parent::p#, [ $handler_trim ] ],
        hand         => [ q#//i[@class="fa fa-hand-o-right fa-lg"]/parent::p#, [ $handler_trim, $handler_split_comma ] ],
        pic_number   => [ q#//p[contains(.,'PIC no')]#, [ $handler_val ] ],
    }
};
print Dumper browse( $conf );
sub browse
{
my $conf = shift;
my $ref = [ ];
my $lwp_useragent = LWP::UserAgent->new( agent => q#IE 6#, timeout => 10 );
my $response = $lwp_useragent->get( $conf->{url} );
die $response->status_line unless $response->is_success;
my $content = $response->decoded_content;
my $html_treebuilder_xpath = HTML::TreeBuilder::XPath->new_from_content( $content );
my @nodes = $html_treebuilder_xpath->findnodes( $conf->{parent} );
for my $node ( @nodes )
{
push @$ref, { };
while ( my ( $key, $val ) = each %{$conf->{children}} )
{
my $xpath = $val->[0];
my $handlers = $val->[1] // [ ];
$val = ($node->findvalues( qq#.$xpath# ))[0] // next;
$val = $_->( $val ) for @$handlers;
$ref->[-1]->{$key} = $val;
}
}
return $ref;
}
At first glance, scraping from page to page can be solved in several different ways.
There is pagination at the bottom of the page; see for example:
http://europa.eu/youth/volunteering/evs-organisation_en?country=&topic=&field_eyp_vp_accreditation_type=All&town=&name=&pic=&eiref=&inclusion_topic=&field_eyp_vp_feweropp_additional_mentoring_1=&field_eyp_vp_feweropp_additional_physical_environment_1=&field_eyp_vp_feweropp_additional_other_support_1=&field_eyp_vp_feweropp_other_support_text=&&page=5
and
http://europa.eu/youth/volunteering/evs-organisation_en?country=&topic=&field_eyp_vp_accreditation_type=All&town=&name=&pic=&eiref=&inclusion_topic=&field_eyp_vp_feweropp_additional_mentoring_1=&field_eyp_vp_feweropp_additional_physical_environment_1=&field_eyp_vp_feweropp_additional_other_support_1=&field_eyp_vp_feweropp_other_support_text=&&page=6
and
http://europa.eu/youth/volunteering/evs-organisation_en?country=&topic=&field_eyp_vp_accreditation_type=All&town=&name=&pic=&eiref=&inclusion_topic=&field_eyp_vp_feweropp_additional_mentoring_1=&field_eyp_vp_feweropp_additional_physical_environment_1=&field_eyp_vp_feweropp_additional_other_support_1=&field_eyp_vp_feweropp_other_support_text=&&page=7
We can use these URLs as a base.
If we keep an array of the URLs that need to be visited, we will eventually cover all the pages.
Note: there are more than 6000 results, with 21 entries (one per record) on each page, so there are approximately 305 pages to visit.
We can increment the page number (as in the URLs above) and count up to 305.
Hardcoding the total number of pages isn't practical, though, as it could change. Instead, we could:
- extract the number of results from the first page, divide it by the number of results per page (21), and round it down.
- extract the URL from the "last" link at the bottom of the page, create a URI object, and read the page number from the query string.
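Both options can be sketched with plain core Perl. The text mentions creating a URI object; the regex below is a lighter substitute that assumes the query parameter is literally named page. The count-based version subtracts one before dividing so that an exact multiple of 21 does not produce an extra, empty page, and it assumes page numbering starts at 0. Both function names are hypothetical.

```perl
use strict;
use warnings;

# Read the page number from a pagination URL such as the "last" link.
# Assumes the parameter is named "page" and follows ? or &.
sub page_from_url {
    my ($url) = @_;
    return $url =~ /[?&]page=(\d+)/ ? $1 : undef;
}

# Index of the last page, assuming 0-based pages and a fixed page size.
sub last_page_from_count {
    my ( $results, $per_page ) = @_;
    return int( ( $results - 1 ) / $per_page );
}

my $last_link = 'http://europa.eu/youth/volunteering/evs-organisation_en?country=&topic=&&page=7';
print "last page from link: ", page_from_url($last_link), "\n";
print "last page from count: ", last_page_from_count( 6300, 21 ), "\n";
```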
Now I think I have to loop over all the pages:
my $url_pattern = 'https://europa.eu/youth/volunteering/evs-organisation_en?page=%s';
for my $page ( 0 .. $last )
{
my $url = sprintf $url_pattern, $page;
...
}
Alternatively, I could incorporate paging into $conf, perhaps as an iterator that fetches the next node on each call.
After parsing each page, check whether the "next ›" link exists at the bottom. When you arrive on page 292, there are no more pages, so you are done and can exit the loop, e.g. with last.
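A sketch of that stopping rule, kept independent of the network: crawl_pages and the two callbacks it takes are hypothetical names. In the real scraper the fetch callback would request sprintf($url_pattern, $page) and the has-next callback would look for the "next ›" link; here both are stubbed with fake page data. The $max_pages cap is an extra safeguard so a parsing mistake cannot loop forever.

```perl
use strict;
use warnings;

# Visit pages 0, 1, 2, ... until a page has no "next" link.
# $fetch->($page) returns the page content; $has_next->($content) says
# whether a further page exists. Both are supplied by the caller.
sub crawl_pages {
    my ( $fetch, $has_next, $max_pages ) = @_;
    my @pages;
    for my $page ( 0 .. $max_pages - 1 ) {    # hard cap against endless loops
        my $content = $fetch->($page);
        push @pages, $content;
        last unless $has_next->($content);
    }
    return \@pages;
}

# Stub data standing in for real HTTP responses: three pages, the last
# one without a next link.
my %fake_site = ( 0 => 'rows next', 1 => 'rows next', 2 => 'rows' );
my $pages = crawl_pages(
    sub { $fake_site{ $_[0] } },
    sub { $_[0] =~ /next/ },
    500,
);
print scalar(@$pages), " pages visited\n";
```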

GetOptions - Perl - Referencing

So I have stumbled upon a little issue when trying to build out a simple "Airport Search Script" in Perl.
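use strict;
use warnings;
use feature 'say';
use Getopt::Long;
use Text::CSV;
use Data::Dump qw(pp);
my $filename = '/home/student/perl-basic/topic-07/iata_airports.csv';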
my $number = '1';
my $matching;
my $latitude;
my $longitude;
my $word = 0;    # a boolean flag; the default string 'false' would be true in Perl
GetOptions (
"filename=s" => \$filename,
"number=i" => \$number,
"matching=s" => \$matching,
"latitude=f" => \$latitude,
"longitude=f" => \$longitude,
"word" => \$word
);
sub parse_airports {
my $file = shift;
my $csv = Text::CSV->new( { binary => 1, eol => $/ } );
open ( my $fh, "<", $file ) or die "Error opening input file: $!";
my $ra_colnames = $csv->getline ( $fh );
$csv->column_names( @$ra_colnames );
my $ra_airports = $csv->getline_hr_all( $fh );
close ( $fh );
return $ra_airports;
}
sub get_name_matching_airports {
}
my $rah_airports = parse_airports( $filename );
my $rah_airports_found = [];
if ($matching) {
say "Up to $number airports matching $matching in $filename:";
$rah_airports_found = get_name_matching_airports(
airports => $rah_airports,
matching_string => $matching,
word => $word,
);
}
elsif ($latitude && $longitude) {
say "Up to $number airports near [$latitude, $longitude] in $filename:"
}
else {
say "Must have at least --matching, or --latitude and --longitude as arguments";
}
print pp($rah_airports_found);
Where I am struggling is the sub get_name_matching_airports.
Because you do not have the file, let me explain its structure.
It is a collection of all IATA airports, where each airport is a hash of its details. There are around 15 keys in each airport hash, and one of them is NAME. I open the file and parse all the info into a reference (an array ref of hash refs, as returned by getline_hr_all), which is returned at the end of the sub parse_airports.
In the sub get_name_matching_airports, I need to find the airports whose names match the argument passed in through --matching ($matching).
For example, I pass "London" as an argument on the command line, e.g. ./search_airports2 --matching London. The sub get_name_matching_airports then needs to return every airport that has london (case-insensitive) in its name.
These matching airports are then pushed into the array referenced by $rah_airports_found, which is printed at the end.
I solved my problem with the following code:
sub get_name_matching_airports {
    my %params = (
        airports        => undef,
        matching_string => undef,
        word            => undef,
        @_
    );
    my @rah_airports_found;
    my $ra_airports = $params{airports};
    my $counter = 0;
    foreach my $i ( @$ra_airports ) {
        if ( $params{word} ) {
            if ( $i->{name} eq $params{matching_string} ) {
                push @rah_airports_found, $i;
                $counter++;
            }
        }
        else {
            # \Q...\E treats the search text literally rather than as a regex
            if ( $i->{name} =~ /\Q$params{matching_string}\E/i ) {
                push @rah_airports_found, $i;
                $counter++;
            }
        }
        # stop after $number matches, in either mode
        if ( defined( $number ) && $counter == $number ) {
            return \@rah_airports_found;
        }
    }
    return \@rah_airports_found;
}
Example:
for my $airport_rf ( @{$rah_airports} ) {
    if ( $airport_rf->{NAME} =~ m{\Q$matching\E}i ) {
        # do your stuff here, e.g. push @$rah_airports_found, $airport_rf;
    }
}
If you don't know the exact key in each airport hash, you have to match the command-line parameter against all of its values.
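A sketch of that last suggestion: instead of looking at one known key, test every value of each airport hash. The function name, the sample records, and the field names (NAME, IATA) are hypothetical stand-ins for the CSV data; \Q...\E keeps characters such as . or ( in the search text from being treated as regex syntax.

```perl
use strict;
use warnings;

# Return the airports (hash refs) where any field contains $matching,
# case-insensitively.
sub airports_matching_any_field {
    my ( $airports, $matching ) = @_;
    return grep {
        my $airport = $_;
        grep { defined && /\Q$matching\E/i } values %$airport;
    } @$airports;
}

# Hypothetical records shaped like the rows returned by getline_hr_all.
my $airports = [
    { NAME => 'London Heathrow Airport',         IATA => 'LHR' },
    { NAME => 'Paris Charles de Gaulle Airport', IATA => 'CDG' },
    { NAME => 'London International Airport',    IATA => 'YXU' },
];

my @found = airports_matching_any_field( $airports, 'london' );
print "$_->{NAME}\n" for @found;
```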

Grouping with Perl: finding a faster solution to recursion

The Perl code below works, but it doesn't scale well, even with considerable computing resources. I'm hoping that someone can help me find more efficient code, for example by replacing recursion with iteration, if that's the problem.
My data structure looks like this:
my %REV_ALIGN;
$REV_ALIGN{$dna}{$rna} = ();
Any dna key may have multiple rna subkeys, and the same rna subkey may appear under several different dna keys. The purpose is to group rna (transcripts) based on shared dna sequence elements. For example, if dnaA has RNA1, RNA8, RNA9, and RNA4, and dnaB has RNA11, RNA4, and RNA99, then we group all these transcripts together (RNA1, RNA8, RNA9, RNA4, RNA11, RNA99), because both share RNA4, and then keep trying to add to the group by checking the other dna keys. My recursive solution to this problem works, but it doesn't scale well when using data from a whole genome to transcriptome alignment.
So my question is: what is a more efficient solution to this problem? Thank you very much.
my @groups;
while ( my $x =()= keys %REV_ALIGN )
{
    my @DNA = keys %REV_ALIGN;
    my $dna = shift @DNA;
    # the corresponding list of rna
    my @RNA = keys %{$REV_ALIGN{$dna}};
    delete $REV_ALIGN{$dna};
    if ( $x == 1 )
    {
        push @groups, \@RNA;
        last;
    }
    my $ref = group_transcripts ( \@RNA, \%REV_ALIGN );
    push @groups, $ref;
}
sub group_transcripts
{
    my $tran_ref = shift;
    my $align_ref = shift;
    my @RNA_A = @$tran_ref;
    my %RNA;
    # create a null hash with seed list of transcripts
    @RNA{@RNA_A} = ();
    # get a list of all remaining dna sequences in the alignment
    my @DNA = keys %{$align_ref};
    my %count;
    # select a different list of transcripts
    for my $dna ( @DNA )
    {
        next unless exists $align_ref->{$dna};
        my @RNA_B = keys %{$align_ref->{$dna}};
        # check whether the two lists share any transcripts
        for my $element ( @RNA_A, @RNA_B )
        {
            $count{$element}++;
        }
        for my $rna_a ( keys %count )
        {
            # if they do, add any new transcripts to the current group
            if ( $count{$rna_a} == 2 )
            {
                for my $rna_b ( @RNA_B )
                {
                    push @RNA_A, $rna_b if $count{$rna_b} == 1;
                }
                delete $align_ref->{$dna};
                delete $count{$_} foreach keys %count;
                # recurse to try and continue adding to list
                @_ = ( \@RNA_A, $align_ref );
                goto &group_transcripts;
            }
        }
        delete $count{$_} foreach keys %count;
    }
    # if no more transcripts can be added, return a reference to the group
    return \@RNA_A;
}
You have loops nested four deep. It's a pretty safe bet that's why your code scales poorly.
If I understand correctly what you are trying to accomplish, the input
my %REV_ALIGN = (
"DNA1" => { map { $_ => undef } "RNA1", "RNA2" }, # \ Linked by RNA1 \
"DNA2" => { map { $_ => undef } "RNA1", "RNA3" }, # / \ Linked by RNA3 > Group
"DNA3" => { map { $_ => undef } "RNA3", "RNA4" }, # / /
"DNA4" => { map { $_ => undef } "RNA5", "RNA6" }, # \ Linked by RNA5 \ Group
"DNA5" => { map { $_ => undef } "RNA5", "RNA7" }, # / /
"DNA6" => { map { $_ => undef } "RNA8" }, # > Group
);
should result in
my @groups = (
    {
        dna => [ "DNA1", "DNA2", "DNA3" ],
        rna => [ "RNA1", "RNA2", "RNA3", "RNA4" ],
    },
    {
        dna => [ "DNA4", "DNA5" ],
        rna => [ "RNA5", "RNA6", "RNA7" ],
    },
    {
        dna => [ "DNA6" ],
        rna => [ "RNA8" ],
    },
);
If so, you can use the following:
use strict;
use warnings;
use Graph::Undirected qw( );
my %REV_ALIGN = (
"DNA1" => { map { $_ => undef } "RNA1", "RNA2" },
"DNA2" => { map { $_ => undef } "RNA1", "RNA3" },
"DNA3" => { map { $_ => undef } "RNA3", "RNA4" },
"DNA4" => { map { $_ => undef } "RNA5", "RNA6" },
"DNA5" => { map { $_ => undef } "RNA5", "RNA7" },
"DNA6" => { map { $_ => undef } "RNA8" },
);
my $g = Graph::Undirected->new();
for my $dna (keys(%REV_ALIGN)) {
for my $rna (keys(%{ $REV_ALIGN{$dna} })) {
$g->add_edge("dna:$dna", "rna:$rna");
}
}
my @groups;
for my $raw_group ($g->connected_components()) {
    my %group = ( dna => [], rna => [] );
    for (@$raw_group) {
        my ($type, $val) = split(/:/, $_, 2);
        push @{ $group{$type} }, $val;
    }
    push @groups, \%group;
}
use Data::Dumper qw( Dumper );
print(Dumper(\@groups));
If you just want the RNA, the final section simplifies to the following:
my @groups;
for my $raw_group ($g->connected_components()) {
    my @group;
    for (@$raw_group) {
        my ($type, $val) = split(/:/, $_, 2);
        push @group, $val if $type eq 'rna';
    }
    push @groups, \@group;
}
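If installing Graph isn't an option, here is a sketch of the same connected-components idea in core Perl, using a hand-written breadth-first search in place of Graph's connected_components. The function name group_alignments is made up; it returns hash refs shaped like the expected result above, with each list sorted. Each node and each link is processed once, so the cost grows roughly linearly with the number of alignments (plus the final sorts), instead of rescanning the remaining DNA on every recursion.

```perl
use strict;
use warnings;

# Group DNA/RNA keys into connected components via breadth-first search.
sub group_alignments {
    my ($rev_align) = @_;

    # Undirected adjacency list over "dna:..." and "rna:..." nodes.
    my %adj;
    for my $dna ( keys %$rev_align ) {
        for my $rna ( keys %{ $rev_align->{$dna} } ) {
            push @{ $adj{"dna:$dna"} }, "rna:$rna";
            push @{ $adj{"rna:$rna"} }, "dna:$dna";
        }
    }

    my ( %seen, @groups );
    for my $start ( sort keys %adj ) {
        next if $seen{$start}++;    # already part of an earlier group
        my %group = ( dna => [], rna => [] );
        my @queue = ($start);
        while ( defined( my $node = shift @queue ) ) {
            my ( $type, $val ) = split /:/, $node, 2;
            push @{ $group{$type} }, $val;
            for my $next ( @{ $adj{$node} } ) {
                push @queue, $next unless $seen{$next}++;
            }
        }
        $group{$_} = [ sort @{ $group{$_} } ] for qw( dna rna );
        push @groups, \%group;
    }
    return \@groups;
}
```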

Pass hash to subroutine inside a subroutine already passed that hash

I am working with passing hashes to various subroutines, and I was wondering how to pass a hash to a subroutine, then pass the same hash from inside that subroutine to another subroutine, and so on.
For example, the following code works fine.
use strict;
use warnings;
my %hash = (
key1 => 'value1',
key2 => 'value2',
key3 => 'value3',
key4 => '',
);
print %hash, "\n";
check_it(\%hash);
sub check_it {
my $params = shift;
foreach(keys %{$params}){
if($params->{$_}) {
print "'$_' defined as '$params->{$_}'\n";
}
else {
print "'$_' not defined as '$params->{$_}'. Deleting it.\n";
#delete $params->{$_};
$params->{$_} = 'null';
}
}
for ( my $i = 0 ; $i < 7 ; $i++ ) {
print "looping\n";
&check_tags_again(\%hash);
}
}
sub check_tags_again {
my $hash_now = shift;
#check again...
foreach(keys %{$hash_now}){
print "An element of hash: ", $hash_now->{$_}, "\n";
#if(defined $hash_now->{$_}){ delete $hash_now->{$_};}
}
&check_tags_thrice(\%hash);
}
sub check_tags_thrice {
my $hash_now = shift;
#check again...
foreach(keys %{$hash_now}){
print "An element of hash: ", $hash_now->{$_}, "\n";
#if(defined $hash_now->{$_}){ delete $hash_now->{$_};}
}
}
print "final hash:", %hash, "\n";
But when I run the following code:
use strict;
use warnings;
use Data::Dumper;
sub process_data {
my $group_size = 10;
my %HoA = (
flintstones => [ "fred", "barney" ],
jetsons => [ "george", "jane", "elroy" ],
simpsons => [ "homer", "marge", "bart" ],
);
&delete_stuff( \%HoA, $group_size );
print "New group:\n";
print Dumper( \%HoA );
undef %HoA;
}
sub delete_stuff {
my $HoARef = shift;
my $group_size = shift;
print "group size in sub $group_size\n";
for ( my $j = 0 ; $j < $group_size ; $j++ ) {
my $dlted = &delete_other_stuff( \%HoA, $j );
print "deleted? '$dlted'\n";
if ( $dlted == 0 ) {
&presence_check( \%HoA, $j );
}
for ( my $i = 0 ; $i < $group_size ; $i++ ) {
}
}
}
sub delete_other_stuff {
my $HoAref = shift;
my $Dex = shift;
return $deleted;
}
sub presence_check {
my $HoAreF = shift;
my $DeX = shift;
}
I get:
Global symbol "%HoA" requires explicit package name at x.pl line 32.
Global symbol "%HoA" requires explicit package name at x.pl line 35.
Execution of x.pl aborted due to compilation errors.
I'm confused because I think it's doing the same thing as the first, but now it claims that %HoA was never initialized.
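In delete_stuff, you don't have %HoA; you have $HoARef. %HoA is declared with my inside process_data, so it is visible only within that sub. In the first example, %hash was declared at file scope before the subs, which is why all of them could see it. If all the subs expect a reference to a hash, you can just pass along the reference you already have: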
for ( my $j = 0 ; $j < $group_size ; $j++ ) {
my $dlted = &delete_other_stuff( $HoARef, $j );
print "deleted? '$dlted'\n";
if ( $dlted == 0 ) {
&presence_check( $HoARef, $j );
}
...
}
By the way, we're closing in on 20 years of Perl 5. There is no reason to call a sub with explicitly passed parameters using &, which is a Perl 4 holdover.
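A minimal sketch of the pattern: the hash is declared in one sub, and only a reference is passed down, level by level. The body of delete_other_stuff (deleting one key) is made up for illustration; the point is that every level works on the same underlying hash, so changes are visible to the caller.

```perl
use strict;
use warnings;

# %HoA is lexical to process_data, so only a reference leaves this sub.
sub process_data {
    my %HoA = (
        flintstones => [ "fred", "barney" ],
        jetsons     => [ "george", "jane", "elroy" ],
    );
    delete_stuff( \%HoA );
    return \%HoA;    # returned only so the effect can be inspected
}

# Receives the reference and passes the same reference on.
sub delete_stuff {
    my ($hoa_ref) = @_;
    delete_other_stuff( $hoa_ref, 'jetsons' );
}

# Works through the reference, modifying the caller's %HoA in place.
sub delete_other_stuff {
    my ( $hoa_ref, $key ) = @_;
    delete $hoa_ref->{$key};
}

my $result = process_data();
print join( ', ', sort keys %$result ), "\n";
```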

Sorting a hash by value when it has many keys

I believe this is how you would normally sort a hash by value:
foreach my $key (sort { $hash{$a} <=> $hash{$b} } (keys %hash) ) {
print "$key=>$hash{$key}";
}
This would print out the values smallest to largest.
Now, what if I have a hash like this:
$hash{$somekey}{$somekey2}{$thirdkey}
How could I sort by values and get all the keys as well?
I would just create a new hash:
my %new;
for my $k1 (keys %hash) {
for my $k2 (keys %{$hash{$k1}}) {
for my $k3 (keys %{$hash{$k1}{$k2}}) {
$new{$k1,$k2,$k3} = $hash{$k1}{$k2}{$k3};
}
}
}
my @ordered = sort { $new{$a} <=> $new{$b} } keys %new;
for my $k (@ordered) {
my @keys = split($;, $k);
print "key: @keys - value: $new{$k}\n";
}
I have done something similar by moving a reference down to the appropriate hash key. You can then perform the sort on the pointer.
The advantage to doing it this way is that it is easy to adjust if the level changes.
What I have used this approach for is systematically moving the pointer down to a specific level by following an array of keys (e.g. my @Keys = ('Value', 'Value2');).
I believe a derivative of the following example might give you what you are looking for.
my $list_ref;
my $pointer;
my %list = (
Value => {
Value2 => {
A => '1',
C => '3',
B => '2',
},
},
);
$list_ref = \%list;
$pointer = $list_ref->{Value}->{Value2};
foreach my $key (sort { $pointer->{$a} <=> $pointer->{$b} } (keys %{$pointer})) {
print "Key: $key\n";
}
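The key-path idea can be sketched as a small helper. descend is a hypothetical name; it follows an array of keys such as ('Value', 'Value2') and returns the hash reference at that level (or undef if a key is missing), which can then be sorted by value exactly as above.

```perl
use strict;
use warnings;

# Walk down a nested hash following a list of keys.
# Returns undef if a level is missing or is not a hash.
sub descend {
    my ( $ref, @keys ) = @_;
    for my $key (@keys) {
        return unless ref $ref eq 'HASH' && exists $ref->{$key};
        $ref = $ref->{$key};
    }
    return $ref;
}

my %list = ( Value => { Value2 => { A => 1, C => 3, B => 2 } } );
my @Keys = ( 'Value', 'Value2' );

my $pointer = descend( \%list, @Keys );
my @sorted  = sort { $pointer->{$a} <=> $pointer->{$b} } keys %$pointer;
print "Key: $_\n" for @sorted;
```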
For academic purposes, here's a fairly tidy recursive function:
sub flatten_hash {
    my ($hash, $path) = @_;
    $path = [] unless defined $path;
    my @ret;
    while (my ($key, $value) = each %$hash) {
        if (ref $value eq 'HASH') {
            push @ret, flatten_hash($value, [ @$path, $key ]);
        } else {
            push @ret, [ [ @$path, $key ], $value ];
        }
    }
    return @ret;
}
which takes a hash like
{
roman => {
i => 1,
ii => 2,
iii => 3,
},
english => {
one => 1,
two => 2,
three => 3,
},
}
and turns it into a list like
(
[ ['roman','i'], 1 ],
[ ['roman', 'ii'], 2 ],
[ ['roman', 'iii'], 3 ],
[ ['english', 'one'], 1 ],
[ ['english', 'two'], 2 ],
[ ['english', 'three'], 3 ]
)
although of course the order is bound to vary. Given that list, you can sort it on { $a->[1] <=> $b->[1] } or similar, and then extract the key path from @{ $entry->[0] } for each entry. It works regardless of the depth of the data structure, and even if the leaf nodes don't all occur at the same depth. It needs a little bit of extension to deal with structures that aren't made up purely of hashrefs and plain scalars, though.
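Putting the pieces together, here is a sketch that flattens the example hash and sorts the entries by value. flatten_hash is repeated so the block runs on its own; the tie-break on the joined key path is an addition so that equal values come out in a stable order despite the unpredictable hash ordering.

```perl
use strict;
use warnings;

# flatten_hash as above: returns [ [ @path ], $value ] pairs.
sub flatten_hash {
    my ( $hash, $path ) = @_;
    $path = [] unless defined $path;
    my @ret;
    while ( my ( $key, $value ) = each %$hash ) {
        if ( ref $value eq 'HASH' ) {
            push @ret, flatten_hash( $value, [ @$path, $key ] );
        }
        else {
            push @ret, [ [ @$path, $key ], $value ];
        }
    }
    return @ret;
}

my %numbers = (
    roman   => { i   => 1, ii  => 2, iii   => 3 },
    english => { one => 1, two => 2, three => 3 },
);

# Sort by value, then by the key path so ties are ordered consistently.
my @sorted = sort { $a->[1] <=> $b->[1] or "@{ $a->[0] }" cmp "@{ $b->[0] }" }
    flatten_hash( \%numbers );
print join( ' -> ', @{ $_->[0] } ), " = $_->[1]\n" for @sorted;
```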
Here's a way to do it using Deep::Hash::Utils.
use Deep::Hash::Utils qw(slurp);
my %h = (
A => {
Aa => { Aaa => 4, Aab => 5 },
Ab => { Aba => 1 },
Ac => { Aca => 2, Acb => 9, Acc => 0 },
},
B => {
Ba => { Baa => 44, Bab => -55 },
Bc => { Bca => 22, Bcb => 99, Bcc => 100 },
},
);
my @all_keys_and_vals = slurp \%h;
print "@$_\n" for sort { $a->[-1] <=> $b->[-1] } @all_keys_and_vals;
Output:
B Ba Bab -55
A Ac Acc 0
A Ab Aba 1
A Ac Aca 2
A Aa Aaa 4
A Aa Aab 5
A Ac Acb 9
B Bc Bca 22
B Ba Baa 44
B Bc Bcb 99
B Bc Bcc 100