Can Perl's "exists" modify data structure values? - perl

I have a nested hash table that looks like this:
my %myhash = (
"val1" => {
"A/B.c" => {
"funct1" => 1
}
},
"val2" => {
"C/D.c" => {
"funct2" => 1
}
}
)
My objective with this data structure is to produce different values based on whether certain hash tables exist. For example,
sub mysub
{
my $val = shift;
my $file = shift;
my $funct = shift;
if (exists $myhash{$val}{$file}{$funct}) {
return "return1";
}
if (exists $myhash{$val}{$file}) {
return "return2";
}
return "return3";
}
The behavior I'm encountering is as follows. I have an instance in time when
my $val = "val1";
my $file = "C/D.c";
my $funct = "funct3";
At this point in time, the return value I get "return2". These are my observations with the Perl debugger:
Break at first "if" in mysub
Print p $proxToBugs{"val1"}{"C/D.c"} ==> Returns blank line. Okay. Continue and this "if" is skipped.
Continue and break at the second "if" in mysub
Print p $proxToBugs{"val1"}{"C/D.c"} ==> Returns "HASH(0x...)". WTF moment. Function returns "return2".
This tells me that running the first if modified the data structure, which allows the second if to pass when in fact it shouldn't. The function I'm running is identical to the function shown above; this one is just sanitized. Anyone has an explanation for me? :)

Yes. This is because of autovivification. See the bottom of the exists documentation:
Although the mostly deeply nested array or hash will not spring into existence just because its existence was tested, any intervening ones [autovivified arrays or hashes] will [spring into existance]. Thus $ref->{"A"} and $ref->{"A"}->{"B"} will spring into existence due to the existence test for the $key element above. This happens anywhere the arrow operator is used...
Where "...test for the $key element above..." refers to:
if (exists $ref->{A}->{B}->{$key}) { }
if (exists $hash{A}{B}{$key}) { } # same idea, implicit arrow
Happy coding.

As pst rightly points out, this is autovivification. There are at least two ways to avoid it. The first (and most common in my experience) is to test at each level:
if (
exists $h{a} and
exists $h{a}{b} and
exists $h{a}{b}{c}
) {
...
}
The short-circuit nature of and causes the second and third calls to exists to not be executed if the earlier levels don't exist.
A more recent solution is the autovivification pragma (available from CPAN):
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq = 1;
{
my %h;
if (exists $h{a}{b}{c}) {
print "impossible, it is empty\n";
}
print Dumper \%h;
}
{
no autovivification;
my %h;
if (exists $h{a}{b}{c}) {
print "impossible, it is empty\n";
}
print Dumper \%h;
}
A third method that ysth mentions in the comments has the benefits of being in core (like the first example) and of not repeating the exists function call; however, I believe it does so at the expense of readability:
if (exists ${ ${ $h{a} || {} }{b} || {} }{c}) {
...
}
It works by replacing any level that doesn't exist with a hashref to take the autovivification. These hashrefs will be discarded after the if statement is done executing. Again we see the value of short-circuiting logic.
Of course, all three of these methods makes an assumption about the data the hash is expected to hold, a more robust method includes calls to ref or reftype depending on how you want to treat objects (there is a third option that takes into account classes that overload the hash indexing operator, but I can't remember its name):
if (
exists $h{a} and
ref $h{a} eq ref {} and
exists $h{a} and
ref $h{a}{b} eq ref {} and
exists $h{a}{b}{c}
) {
...
}
In the comments, pst asked if something like myExists($ref,"a","b","c") exists. I am certain there is a module in CPAN that does something like that, but I am not aware of it. There are too many edge cases for me to find that useful, but a simple implementation would be:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
sub safe_exists {
my ($ref, #keys) = #_;
for my $k (#keys) {
return 0 unless ref $ref eq ref {} and exists $ref->{$k};
$ref = $ref->{$k};
}
return 1;
}
my %h = (
a => {
b => {
c => 5,
},
},
);
unless (safe_exists \%h, qw/x y z/) {
print "x/y/z doesn't exist\n";
}
unless (safe_exists \%h, qw/a b c d/) {
print "a/b/c/d doesn't exist\n";
}
if (safe_exists \%h, qw/a b c/) {
print "a/b/c does exist\n";
}
print Dumper \%h;

If you want to turn off autovivification, you can do that lexically with the autovivification pragma:
{
no autovivification;
if( exists $hash{A}{B}{$key} ) { ... }
}
I wrote more about this at The Effective Perler as Turn off autovivification when you don’t want it.

Related

Automatically call hash values that are subroutine references

I have a hash with a few values that are not scalar data but rather anonymous subroutines that return scalar data. I want to make this completely transparent to the part of the code that looks up values in the hash, so that it doesn't have to be aware that some of the hash values may be anonymous subroutines that return scalar data rather than just plain old scalar data.
To that effect, is there any way to have the anonymous subroutines executed when their keys are accessed, without using any special syntax? Here's a simplified example that illustrates the goal and the problem:
#!/usr/bin/perl
my %hash = (
key1 => "value1",
key2 => sub {
return "value2"; # In the real code, this value can differ
},
);
foreach my $key (sort keys %hash) {
print $hash{$key} . "\n";
}
The output I would like is:
perl ./test.pl
value1
value2
Instead, this is what I get:
perl ./test.pl
value1
CODE(0x7fb30282cfe0)
As noted by Oleg, it's possible to do this using various more or less arcane tricks like tie, overloading or magic variables. However, this would be both needlessly complicated and pointlessly obfuscated. As cool as such tricks are, using them in real code would be a mistake at least 99% of the time.
In practice, the simplest and cleanest solution is probably to write a helper subroutine that takes a scalar and, if it's a code reference, executes it and returns the result:
sub evaluate {
my $val = shift;
return $val->() if ref($val) eq 'CODE';
return $val; # otherwise
}
and use it like this:
foreach my $key (sort keys %hash) {
print evaluate($hash{$key}) . "\n";
}
I don't believe that the words that others have written in disapproval of the tie mechanism are warranted. None of the authors seem to properly understand how it works and what core library backup is available
Here's a tie example based on Tie::StdHash
If you tie a hash to the Tie::StdHash class then it works exactly as a normal hash. That means there's nothing left to write except for methods that you may want to override
In this case I've overridden TIEHASH so that I could specify the initialisation list in the same statement as the tie command, and FETCH, which calls the superclass's FETCH and then makes a call to it if it happens to be a subroutine reference
Your tied hash will work as normal except for the change that you have asked for. I hope it is obvious that there is no longer a direct way to retrieve a subroutine reference if you have stored it as a hash value. Such a value will always be replaced by the result of calling it without any parameters
SpecialHash.pm
package SpecialHash;
use Tie::Hash;
use base 'Tie::StdHash';
sub TIEHASH {
my $class = shift;
bless { #_ }, $class;
}
sub FETCH {
my $self = shift;
my $val = $self->SUPER::FETCH(#_);
ref $val eq 'CODE' ? $val->() : $val;
}
1;
main.pl
use strict;
use warnings 'all';
use SpecialHash;
tie my %hash, SpecialHash => (
key1 => "value1",
key2 => sub {
return "value2"; # In the real code, this value can differ
},
);
print "$hash{$_}\n" for sort keys %hash;
output
value1
value2
Update
It sounds like your real situation is with an existing hash that looks something like this
my %hash = (
a => {
key_a1 => 'value_a1',
key_a2 => sub { 'value_a2' },
},
b => {
key_b1 => sub { 'value_b1' },
key_b2 => 'value_b2',
},
);
Using tie on already-populated variables isn't so neat as tying then at the point of declaration and then inserting the values as the data must be copied to the tied object. However the way I have written the TIEHASH method in the SpecialHash class makes this simple to do in the tie statement
If possible, it would be much better to tie each hash before you put data into it and add it to the primary hash
This program ties every value of %hash that happens to be a hash reference. The core of this is the statement
tie %$val, SpecialHash => ( %$val )
which functions identically to
tie my %hash, SpecialHash => ( ... )
in the previous code but dereferences $val to make the syntax valid, and also uses the current contents of the hash as the initialisation data for the tied hash. That is how the data gets copied
After that there is just a couple of nested loops that dump the whole of %hash to verify that the ties are working
use strict;
use warnings 'all';
use SpecialHash;
my %hash = (
a => {
key_a1 => 'value_a1',
key_a2 => sub { 'value_a2' },
},
b => {
key_b1 => sub { 'value_b1' },
key_b2 => 'value_b2',
},
);
# Tie all the secondary hashes that are hash references
#
for my $val ( values %hash ) {
tie %$val, SpecialHash => ( %$val ) if ref $val eq 'HASH';
}
# Dump all the elements of the second-level hashes
#
for my $k ( sort keys %hash ) {
my $v = $hash{$k};
next unless ref $v eq 'HASH';
print "$k =>\n";
for my $kk ( sort keys %$v ) {
my $vv = $v->{$kk};
print " $kk => $v->{$kk}\n"
}
}
output
a =>
key_a1 => value_a1
key_a2 => value_a2
b =>
key_b1 => value_b1
key_b2 => value_b2
There's a feature called "magic" that allows code to be called when variables are accessed.
Adding magic to a variable greatly slows down access to that variable, but some are more expensive than others.
There's no need to make access to every element of the hash magical, just some values.
tie is an more expensive form of magic, and it's not needed here.
As such, the most efficient solution is the following:
use Time::HiRes qw( time );
use Variable::Magic qw( cast wizard );
{
my $wiz = wizard(
data => sub { my $code = $_[1]; $code },
get => sub { ${ $_[0] } = $_[1]->(); },
);
sub make_evaluator { cast($_[0], $wiz, $_[1]) }
}
my %hash;
$hash{key1} = 'value1';
make_evaluator($hash{key2}, sub { 'value2#'.time });
print("$hash{$_}\n") for qw( key1 key2 key2 );
Output:
value1
value2#1462548850.76715
value2#1462548850.76721
Other examples:
my %hash; make_evaluator($hash{key}, sub { ... });
my $hash; make_evaluator($hash->{$key}, sub { ... });
my $x; make_evaluator($x, sub { ... });
make_evaluator(my $x, sub { ... });
make_evaluator(..., sub { ... });
make_evaluator(..., \&some_sub);
You can also "fix up" an existing hash. In your hash-of-hashes scenario,
my $hoh = {
{
key1 => 'value1',
key2 => sub { ... },
...
},
...
);
for my $h (values(%$hoh)) {
for my $v (values(%$h)) {
if (ref($v) eq 'CODE') {
make_evaluator($v, $v);
}
}
}
Yes you can. You can either tie hash to implementation that will resolve coderefs to their return values or you can use blessed scalars as values with overloaded mehods for stringification, numification and whatever else context you want to resolve automatically.
One of perl's special features for just such a use case is tie. This allows you to attach object oriented style methods, to a scalar or hash.
It should be used with caution, because it can mean that your code is doing really strange things, in unexpected ways.
But as an example:
#!/usr/bin/env perl
package RandomScalar;
my $random_range = 10;
sub TIESCALAR {
my ( $class, $range ) = #_;
my $value = 0;
bless \$value, $class;
}
sub FETCH {
my ($self) = #_;
return rand($random_range);
}
sub STORE {
my ( $self, $range ) = #_;
$random_range = $range;
}
package main;
use strict;
use warnings;
tie my $random_var, 'RandomScalar', 5;
for ( 1 .. 10 ) {
print $random_var, "\n";
}
$random_var = 100;
for ( 1 .. 10 ) {
print $random_var, "\n";
}
As you can see - this lets you take an 'ordinary' scalar, and do fruity things with it. You can use a very similar mechanism with a hash - an example might be to do database lookups.
However, you also need to be quite cautious - because you're creating action at a distance by doing so. Future maintenance programmers might well not expect your $random_var to actually change each time you run it, and a value assignment to not actually 'set'.
It can be really useful for e.g. testing though, which is why I give an example.
In your example - you could potentially 'tie' the hash:
#!/usr/bin/env perl
package MagicHash;
sub TIEHASH {
my ($class) = #_;
my $self = {};
return bless $self, $class;
}
sub FETCH {
my ( $self, $key ) = #_;
if ( ref( $self->{$key} ) eq 'CODE' ) {
return $self->{$key}->();
}
else {
return $self->{$key};
}
}
sub STORE {
my ( $self, $key, $value ) = #_;
$self->{$key} = $value;
}
sub CLEAR {
my ($self) = #_;
$self = {};
}
sub FIRSTKEY {
my ($self) = #_;
my $null = keys %$self; #reset iterator
return each %$self;
}
sub NEXTKEY {
my ($self) = #_;
return each %$self;
}
package main;
use strict;
use warnings;
use Data::Dumper;
tie my %magic_hash, 'MagicHash';
%magic_hash = (
key1 => 2,
key2 => sub { return "beefcake" },
);
$magic_hash{random} = sub { return rand 10 };
foreach my $key ( keys %magic_hash ) {
print "$key => $magic_hash{$key}\n";
}
foreach my $key ( keys %magic_hash ) {
print "$key => $magic_hash{$key}\n";
}
foreach my $key ( keys %magic_hash ) {
print "$key => $magic_hash{$key}\n";
}
This is slightly less evil, because future maintenance programmers can use your 'hash' normally. But dynamic eval can shoot the unwary in the foot, so still - caution is advised.
And alternative is to do it 'proper' object oriented - create a 'storage object' that's ... basically like the above - only it creates an object, rather than using tie. This should be much clearer for long term usage, because you won't get unexpected behaviour. (It's an object doing magic, which is normal, not a hash that 'works funny').
You need to identify when a code ref is present, then execute it as an actual call:
foreach my $key (sort keys %hash) {
if (ref $hash{$key} eq 'CODE'){
print $hash{$key}->() . "\n";
}
else {
print "$hash{$key}\n";
}
}
Note that you may consider making all of the hash values subs (a true dispatch table) instead of having some that return non-coderefs and some that return refs.
However, if you define the hash as such, you don't have to do any special trickery when it comes time to use the hash. It calls the sub and returns the value directly when the key is looked up.
key2 => sub {
return "value2";
}->(),
No, not without some ancillary code. You are asking for a simple scalar value and a code reference to behave in the same way. The code that would do that is far from simple and also injects complexity between your hash and its use. You might find the following approach simpler and cleaner.
You can make all values code references, making the hash a dispatch table, for uniform invocation
my %hash = (
key1 => sub { return "value1" },
key2 => sub {
# carry on some processing ...
return "value2"; # In the real code, this value can differ
},
);
print $hash{$_}->() . "\n" for sort keys %hash;
But of course there is a minimal overhead to this approach.

Perl, check if pair exists in hash of hashes

In Perl, I have a hash of hashes created with a loop similar to the following
my %HoH
for my $i (1..10) {
$HoH{$a}{$b} = $i;
}
$a and $b are variables that do have some value when the HoH gets filled in. After creating the HoH, how can I check if a particular pair ($c, $d) exists in the HoH? The following does not work
if (defined $HoH{$c}{$d}) {...}
because if $c does not exist in HoH already, it will be created as a key without a value.
Writing
if (defined $HoH{$c}{$d}) {...}
will "work" insomuch as it will tell you whether or not $HoH{$c}{$d} has a defined value. The problem is that if $HoH{$c} doesn't already exist it will be created (with an appropriate value) so that $HoH{$c}{$d} can be tested. This process is called "autovivification." It's convenient when setting values, e.g.
my %hoh;
$hoh{a}{b} = 1; # Don't need to set '$hoh{a} = {}' first
but inconvenient when retrieving possibly non-existent values. I wish that Perl was smart enough to only perform autovivification for expressions used as lvalues and short-circuit to return undef for rvalues but, alas, it's not that magical. The autovivification pragma (available on CPAN) adds the functionality to do this.
To avoid autovivification you need to test the intermediate values first:
if (exists $HoH{$c} && defined $HoH{$c}{$d}) {
...
}
use Data::Dumper;
my %HoH;
$HoH{A}{B} = 1;
if(exists $HoH{C} && exists $HoH{C}{D}) {
print "exists\n";
}
print Dumper(\%HoH);
if(exists $HoH{C}{D}) {
print "exists\n";
}
print Dumper(\%HoH);
Output:
$VAR1 = {
'A' => {
'B' => 1
}
};
$VAR1 = {
'A' => {
'B' => 1
},
'C' => {}
};
Autovivification is causing the keys to be created. "exists" in my second example shows this so the first example checks both keys individually.
Several ways:
if ( $HoH{$c} && defined $HoH{$c}{$d} ) {...}
or
if ( defined ${ $HoH{$c} || {} }{$d} ) {...}
or
no autovivification;
if (defined $HoH{$c}{$d}) {...}
or
use Data::Diver;
if ( defined Data::Diver::Dive( \%HoH, $c, $d ) ) {...}
You have to use the exists function
exists EXPR
Given an expression that specifies an
element of a hash, returns true if the
specified element in the hash has ever
been initialized, even if the
corresponding value is undefined.
Note that the EXPR can be arbitrarily
complicated as long as the final
operation is a hash or array key
lookup or subroutine name:
if (exists $ref->{A}->{B}->{$key}) { }
if (exists $hash{A}{B}{$key}) { }
My take:
use List::Util qw<first>;
use Params::Util qw<_HASH>;
sub exists_deep (\[%$]#) {
my $ref = shift;
return unless my $h = _HASH( $ref ) // _HASH( $$ref )
and defined( my $last_key = pop )
;
# Note that this *must* be a hash ref, for anything else to make sense.
return if first { !( $h = _HASH( $h->{ $_ } )) } #_;
return exists $h->{ $last_key };
}
You could also do this recursively. You could also create a descent structure allowing intermediate and even terminal arrayref with just a little additional coding.

Perl: if ( element in list )

I'm looking for presence of an element in a list.
In Python there is an in keyword and I would do something like:
if element in list:
doTask
Is there something equivalent in Perl without having to manually iterate through the entire list?
UPDATE:
The smartmatch family of features are now experimental
Smart match, added in v5.10.0 and significantly revised in v5.10.1, has been a regular point of complaint. Although there are a number of ways in which it is useful, it has also proven problematic and confusing for both users and implementors of Perl. There have been a number of proposals on how to best address the problem. It is clear that smartmatch is almost certainly either going to change or go away in the future. Relying on its current behavior is not recommended.
Warnings will now be issued when the parser sees ~~, given, or when.
If you can get away with requiring Perl v5.10, then you can use any of the following examples.
The smart match ~~ operator.
if( $element ~~ #list ){ ... }
if( $element ~~ [ 1, 2, 3 ] ){ ... }
You could also use the given/when construct. Which uses the smart match functionality internally.
given( $element ){
when( #list ){ ... }
}
You can also use a for loop as a "topicalizer" ( meaning it sets $_ ).
for( #elements ){
when( #list ){ ... }
}
One thing that will come out in Perl 5.12 is the ability to use the post-fix version of when. Which makes it even more like if and unless.
given( $element ){
... when #list;
}
If you have to be able to run on older versions of Perl, there still are several options.
You might think you can get away with using List::Util::first, but there are some edge conditions that make it problematic.
In this example it is fairly obvious that we want to successfully match against 0. Unfortunately this code will print failure every time.
use List::Util qw'first';
my $element = 0;
if( first { $element eq $_ } 0..9 ){
print "success\n";
} else {
print "failure\n";
}
You could check the return value of first for defined-ness, but that will fail if we actually want a match against undef to succeed.
You can safely use grep however.
if( grep { $element eq $_ } 0..9 ){ ... }
This is safe because grep gets called in a scalar context. Arrays return the number of elements when called in scalar context. So this will continue to work even if we try to match against undef.
You could use an enclosing for loop. Just make sure you call last, to exit out of the loop on a successful match. Otherwise you might end up running your code more than once.
for( #array ){
if( $element eq $_ ){
...
last;
}
}
You could put the for loop inside the condition of the if statement ...
if(
do{
my $match = 0;
for( #list ){
if( $element eq $_ ){
$match = 1;
last;
}
}
$match; # the return value of the do block
}
){
...
}
... but it might be more clear to put the for loop before the if statement.
my $match = 0;
for( #list ){
if( $_ eq $element ){
$match = 1;
last;
}
}
if( $match ){ ... }
If you're only matching against strings, you could also use a hash. This can speed up your program if #list is large and, you are going to match against %hash several times. Especially if #array doesn't change, because then you only have to load up %hash once.
my %hash = map { $_, 1 } #array;
if( $hash{ $element } ){ ... }
You could also make your own subroutine. This is one of the cases where it is useful to use prototypes.
sub in(&#){
local $_;
my $code = shift;
for( #_ ){ # sets $_
if( $code->() ){
return 1;
}
}
return 0;
}
if( in { $element eq $_ } #list ){ ... }
if( $element ~~ #list ){
do_task
}
~~ is the "smart match operator", and does more than just list membership detection.
grep is helpful here
if (grep { $_ eq $element } #list) {
....
}
If you plan to do this many times, you can trade-off space for lookup time:
#!/usr/bin/perl
use strict; use warnings;
my #array = qw( one ten twenty one );
my %lookup = map { $_ => undef } #array;
for my $element ( qw( one two three ) ) {
if ( exists $lookup{ $element }) {
print "$element\n";
}
}
assuming that the number of times the element appears in #array is not important and the contents of #array are simple scalars.
List::Util::first
$foo = first { ($_ && $_ eq "value" } #list; # first defined value in #list
Or for hand-rolling types:
my $is_in_list = 0;
foreach my $elem (#list) {
if ($elem && $elem eq $value_to_find) {
$is_in_list = 1;
last;
}
}
if ($is_in_list) {
...
A slightly different version MIGHT be somewhat faster on very long lists:
my $is_in_list = 0;
for (my $i = 0; i < scalar(#list); ++$i) {
if ($list[i] && $list[i] eq $value_to_find) {
$is_in_list = 1;
last;
}
}
if ($is_in_list) {
...
TIMTOWTDI
sub is (&#) {
my $test = shift;
$test->() and return 1 for #_;
0
}
sub in (#) {#_}
if( is {$_ eq "a"} in qw(d c b a) ) {
print "Welcome in perl!\n";
}
List::MoreUtils
On perl >= 5.10 the smart match operator is surely the easiest way, as many others have already said.
On older versions of perl, I would instead suggest List::MoreUtils::any.
List::MoreUtils is not a core module (some say it should be) but it's very popular and it's included in major perl distributions.
It has the following advantages:
it returns true/false (as Python's in does) and not the value of the element, as List::Util::first does (which makes it hard to test, as noted above);
unlike grep, it stops at the first element which passes the test (perl's smart match operator short circuits as well);
it works with any perl version (well, >= 5.00503 at least).
Here is an example which works with any searched (scalar) value, including undef:
use List::MoreUtils qw(any);
my $value = 'test'; # or any other scalar
my #array = (1, 2, undef, 'test', 5, 6);
no warnings 'uninitialized';
if ( any { $_ eq $value } #array ) {
print "$value present\n"
}
P.S.
(In production code it's better to narrow the scope of no warnings 'uninitialized').
Probably Perl6::Junction is the clearest way to do. No XS dependencies, no mess and no new perl version required.
use Perl6::Junction qw/ any /;
if (any(#grant) eq 'su') {
...
}
This blog post discusses the best answers to this question.
As a short summary, if you can install CPAN modules then the best solutions are:
if any(#ingredients) eq 'flour';
or
if #ingredients->contains('flour');
However, a more usual idiom is:
if #any { $_ eq 'flour' } #ingredients
which i find less clear.
But please don't use the first() function! It doesn't express the intent of your code at all. Don't use the "Smart match" operator: it is broken. And don't use grep() nor the solution with a hash: they iterate through the whole list. While any() will stop as soon as it finds your value.
Check out the blog post for more details.
PS: i'm answering for people who will have the same question in the future.
You can accomplish a similar enough syntax in Perl if you do some Autoload hacking.
Create a small package to handle the autoload:
package Autoloader;
use strict;
use warnings;
our $AUTOLOAD;
sub AUTOLOAD {
my $self = shift;
my ($method) = (split(/::/, $AUTOLOAD))[-1];
die "Object does not contain method '$method'" if not ref $self->{$method} eq 'CODE';
goto &{$self->{$method}};
}
1;
Then your other package or main script will contain a subroutine that returns the blessed object which gets handled by Autoload when its method attempts to be called.
sub element {
my $elem = shift;
my $sub = {
in => sub {
return if not $_[0];
# you could also implement this as any of the other suggested grep/first/any solutions already posted.
my %hash; #hash{#_} = ();
return (exists $hash{$elem}) ? 1 : ();
}
};
bless($sub, 'Autoloader');
}
This leaves you with usage looking like:
doTask if element('something')->in(#array);
If you reorganize the closure and its arguments, you can switch the syntax around the other way to make it look like this, which is a bit closer to the autobox style:
doTask if search(#array)->contains('something');
function to do that:
sub search {
my #arr = #_;
my $sub = {
contains => sub {
my $elem = shift or return;
my %hash; #hash{#arr} = ();
return (exists $hash{$elem}) ? 1 : ();
}
};
bless($sub, 'Autoloader');
}

How can I cleanly turn a nested Perl hash into a non-nested one?

Assume a nested hash structure %old_hash ..
my %old_hash;
$old_hash{"foo"}{"bar"}{"zonk"} = "hello";
.. which we want to "flatten" (sorry if that's the wrong terminology!) to a non-nested hash using the sub &flatten(...) so that ..
my %h = &flatten(\%old_hash);
die unless($h{"zonk"} eq "hello");
The following definition of &flatten(...) does the trick:
sub flatten {
my $hashref = shift;
my %hash;
my %i = %{$hashref};
foreach my $ii (keys(%i)) {
my %j = %{$i{$ii}};
foreach my $jj (keys(%j)) {
my %k = %{$j{$jj}};
foreach my $kk (keys(%k)) {
my $value = $k{$kk};
$hash{$kk} = $value;
}
}
}
return %hash;
}
While the code given works it is not very readable or clean.
My question is two-fold:
In what ways does the given code not correspond to modern Perl best practices? Be harsh! :-)
How would you clean it up?
Your method is not best practices because it doesn't scale. What if the nested hash is six, ten levels deep? The repetition should tell you that a recursive routine is probably what you need.
sub flatten {
my ($in, $out) = #_;
for my $key (keys %$in) {
my $value = $in->{$key};
if ( defined $value && ref $value eq 'HASH' ) {
flatten($value, $out);
}
else {
$out->{$key} = $value;
}
}
}
Alternatively, good modern Perl style is to use CPAN wherever possible. Data::Traverse would do what you need:
use Data::Traverse;
sub flatten {
my %hash = #_;
my %flattened;
traverse { $flattened{$a} = $b } \%hash;
return %flattened;
}
As a final note, it is usually more efficient to pass hashes by reference to avoid them being expanded out into lists and then turned into hashes again.
First, I would use perl -c to make sure it compiles cleanly, which it does not. So, I'd add a trailing } to make it compile.
Then, I'd run it through perltidy to improve the code layout (indentation, etc.).
Then, I'd run perlcritic (in "harsh" mode) to automatically tell me what it thinks are bad practices. It complains that:
Subroutine does not end with "return"
Update: the OP essentially changed every line of code after I posted my Answer above, but I believe it still applies. It's not easy shooting at a moving target :)
There are a few problems with your approach that you need to figure out. First off, what happens in the event that there are two leaf nodes with the same key? Does the second clobber the first, is the second ignored, should the output contain a list of them? Here is one approach. First we construct a flat list of key value pairs using a recursive function to deal with other hash depths:
my %data = (
foo => {bar => {baz => 'hello'}},
fizz => {buzz => {bing => 'world'}},
fad => {bad => {baz => 'clobber'}},
);
sub flatten {
my $hash = shift;
map {
my $value = $$hash{$_};
ref $value eq 'HASH'
? flatten($value)
: ($_ => $value)
} keys %$hash
}
print join( ", " => flatten \%data), "\n";
# baz, clobber, bing, world, baz, hello
my %flat = flatten \%data;
print join( ", " => %flat ), "\n";
# baz, hello, bing, world # lost (baz => clobber)
A fix could be something like this, which will create a hash of array refs containing all the values:
sub merge {
my %out;
while (#_) {
my ($key, $value) = splice #_, 0, 2;
push #{ $out{$key} }, $value
}
%out
}
my %better_flat = merge flatten \%data;
In production code, it would be faster to pass references between the functions, but I have omitted that here for clarity.
Is it your intent to end up with a copy of the original hash or just a reordered result?
Your code starts with one hash (the original hash that is used by reference) and makes two copies %i and %hash.
The statement my %i=%{hashref} is not necessary. You are copying the entire hash to a new hash. In either case (whether you want a copy of not) you can use references to the original hash.
You are also losing data if your hash in the hash has the same value as the parent hash. Is this intended?

Traversing a multi-dimensional hash in Perl

If you have a hash (or reference to a hash) in perl with many dimensions and you want to iterate across all values, what's the best way to do it. In other words, if we have
$f->{$x}{$y}, I want something like
foreach ($x, $y) (deep_keys %{$f})
{
}
instead of
foreach $x (keys %f)
{
foreach $y (keys %{$f->{$x})
{
}
}
Stage one: don't reinvent the wheel :)
A quick search on CPAN throws up the incredibly useful Data::Walk. Define a subroutine to process each node, and you're sorted
use Data::Walk;
my $data = { # some complex hash/array mess };
sub process {
print "current node $_\n";
}
walk \&process, $data;
And Bob's your uncle. Note that if you want to pass it a hash to walk, you'll need to pass a reference to it (see perldoc perlref), as follows (otherwise it'll try and process your hash keys as well!):
walk \&process, \%hash;
For a more comprehensive solution (but harder to find at first glance in CPAN), use Data::Visitor::Callback or its parent module - this has the advantage of giving you finer control of what you do, and (just for extra street cred) is written using Moose.
Here's an option. This works for arbitrarily deep hashes:
sub deep_keys_foreach
{
my ($hashref, $code, $args) = #_;
while (my ($k, $v) = each(%$hashref)) {
my #newargs = defined($args) ? #$args : ();
push(#newargs, $k);
if (ref($v) eq 'HASH') {
deep_keys_foreach($v, $code, \#newargs);
}
else {
$code->(#newargs);
}
}
}
deep_keys_foreach($f, sub {
my ($k1, $k2) = #_;
print "inside deep_keys, k1=$k1, k2=$k2\n";
});
This sounds to me as if Data::Diver or Data::Visitor are good approaches for you.
Keep in mind that Perl lists and hashes do not have dimensions and so cannot be multidimensional. What you can have is a hash item that is set to reference another hash or list. This can be used to create fake multidimensional structures.
Once you realize this, things become easy. For example:
sub f($) {
my $x = shift;
if( ref $x eq 'HASH' ) {
foreach( values %$x ) {
f($_);
}
} elsif( ref $x eq 'ARRAY' ) {
foreach( #$x ) {
f($_);
}
}
}
Add whatever else needs to be done besides traversing the structure, of course.
One nifty way to do what you need is to pass a code reference to be called from inside f. By using sub prototyping you could even make the calls look like Perl's grep and map functions.
You can also fudge multi-dimensional arrays if you always have all of the key values, or you just don't need to access the individual levels as separate arrays:
$arr{"foo",1} = "one";
$arr{"bar",2} = "two";
while(($key, $value) = each(%arr))
{
#keyValues = split($;, $key);
print "key = [", join(",", #keyValues), "] : value = [", $value, "]\n";
}
This uses the subscript separator "$;" as the separator for multiple values in the key.
There's no way to get the semantics you describe because foreach iterates over a list one element at a time. You'd have to have deep_keys return a LoL (list of lists) instead. Even that doesn't work in the general case of an arbitrary data structure. There could be varying levels of sub-hashes, some of the levels could be ARRAY refs, etc.
The Perlish way of doing this would be to write a function that can walk an arbitrary data structure and apply a callback at each "leaf" (that is, non-reference value). bmdhacks' answer is a starting point. The exact function would vary depending one what you wanted to do at each level. It's pretty straightforward if all you care about is the leaf values. Things get more complicated if you care about the keys, indices, etc. that got you to the leaf.
It's easy enough if all you want to do is operate on values, but if you want to operate on keys, you need specifications of how levels will be recoverable.
a. For instance, you could specify keys as "$level1_key.$level2_key.$level3_key"--or any separator, representing the levels.
b. Or you could have a list of keys.
I recommend the latter.
Level can be understood by #$key_stack
and the most local key is $key_stack->[-1].
The path can be reconstructed by: join( '.', #$key\_stack )
Code:
use constant EMPTY_ARRAY => [];
use strict;
use Scalar::Util qw<reftype>;
sub deep_keys (\%) {
sub deeper_keys {
my ( $key_ref, $hash_ref ) = #_;
return [ $key_ref, $hash_ref ] if reftype( $hash_ref ) ne 'HASH';
my #results;
while ( my ( $key, $value ) = each %$hash_ref ) {
my $k = [ #{ $key_ref || EMPTY_ARRAY }, $key ];
push #results, deeper_keys( $k, $value );
}
return #results;
}
return deeper_keys( undef, shift );
}
foreach my $kv_pair ( deep_keys %$f ) {
my ( $key_stack, $value ) = #_;
...
}
This has been tested in Perl 5.10.
If you are working with tree data going more than two levels deep, and you find yourself wanting to walk that tree, you should first consider that you are going to make a lot of extra work for yourself if you plan on reimplementing everything you need to do manually on hashes of hashes of hashes when there are a lot of good alternatives available (search CPAN for "Tree").
Not knowing what your data requirements actually are, I'm going to blindly point you at a tutorial for Tree::DAG_Node to get you started.
That said, Axeman is correct, a hashwalk is most easily done with recursion. Here's an example to get you started if you feel you absolutely must solve your problem with hashes of hashes of hashes:
#!/usr/bin/perl
use strict;
use warnings;
my %hash = (
"toplevel-1" =>
{
"sublevel1a" => "value-1a",
"sublevel1b" => "value-1b"
},
"toplevel-2" =>
{
"sublevel1c" =>
{
"value-1c.1" => "replacement-1c.1",
"value-1c.2" => "replacement-1c.2"
},
"sublevel1d" => "value-1d"
}
);
hashwalk( \%hash );
sub hashwalk
{
my ($element) = #_;
if( ref($element) =~ /HASH/ )
{
foreach my $key (keys %$element)
{
print $key," => \n";
hashwalk($$element{$key});
}
}
else
{
print $element,"\n";
}
}
It will output:
toplevel-2 =>
sublevel1d =>
value-1d
sublevel1c =>
value-1c.2 =>
replacement-1c.2
value-1c.1 =>
replacement-1c.1
toplevel-1 =>
sublevel1a =>
value-1a
sublevel1b =>
value-1b
Note that you CAN NOT predict in what order the hash elements will be traversed unless you tie the hash via Tie::IxHash or similar — again, if you're going to go through that much work, I recommend a tree module.