Confusion with checking Hash of Hash of Arrays - perl

I am trying to compare my hash input to valid allowed options in my data structure, and if it's not one of the options then I set the default value for the key. I seem to be missing something here though.
Example of current data structure..
my $opts = {
file => { require => 1 },
head => {
default => 1,
allowed => [0,1],
},
type => {
default => 'foo',
allowed => [qw(foo bar baz)]
},
};
$args is my hash ref ( file => 'file.txt', type => 'foo', head => 1 )
Snippet of what I've tried..
for my $k ( keys %$opts ) {
croak("Argument '$k' is required in constructor call!")
if $opts->{$k}->{require} and !exists $args->{$k};
if (exists $args->{$k}) {
if (grep {!$args->{$k}} #{$opts->{$k}->{allowed}} ) {
$args->{$k} = $opts->{$k}->{default};
}
...
} else {
..set our defaults
$args->{$k} = $opts->{$k}->{default};
}
}

The checking for allowed values is faulty.
The grep function takes a code block and a list. It sets the $_ variable to each element in the list in turn. If the block returns a true value, the element is kept. In scalar context, grep does not return a list of kept elements, but a count.
Your grep block is {!$args->{$k}}. This returns true when $args->{$k} is false and vice versa. The result does not depend on $_, and therefore doesn't check if the argument is one of the allowed values.
To see if the given value is allowed value, you'll have to test for some form of equivalence, e.g.
if (grep { $args->{$k} eq $_ } #{ $opts->{$k}{allowed} }) {
# this is executed when the arg matches an allowed value
} else {
# the arg is not allowed
}
An Excursion To Smart Matching and List::MoreUtils
If you can use a perl > v10, then smart matching is available. This would express above condition as
use 5.010;
$args->{$k} ~~ $opts->{$k}{allowed}
The lengthy table of possible type combinations states that this is roughly equivalent to the grep if the arg is a scalar (string/number), and the allowed arrayref holds only normal scalars as well.
However, smart matching was re-marked as experimantal in v18, and behaviour will likely change soon.
In the meantime, it might be better to stick to explicit grep etc. But we could implement two improvements:
The grep will test all elements, even when a match was already found. This can be inefficient. The first function from List::Util core module has the same syntax as grep, but stops after the first element. If the block matches a value, this value is returned. If no value matches, it returns undef. This makes things complicated when undef might be a valid value, or even when false values may be allowed. But in your case, the grep could be replaced by
use List::Util 'first';
defined first { $_ eq $args->{$k} } #{ $opts->{$k}{allowed} }
The List::MoreUtils module has even more functionality. It provides for example the any function, which corresponds to the mathematical ∃ (there exists) quantifier:
use List::MoreUtils 'any';
any { $_ eq $args->{$k} } #{ $opts->{$k}{allowed} }
This only returns a boolean value. While it may not be as efficient as a plain grep or first, using any is quite self-documenting, and easier to use.
Until now, I have assumed that we'll only ever do string comparision to the allowed values. This sometimes works, but it would be better to specify an explicit mode. For example
croak qq(Value for "$k": "$args->{$k}" not allowed) unless
$opts->{$k}{mode} eq 'str' and any { $args->{$k} eq $_ } #{ $opts->{$k}{allowed} }
or $opts->{$k}{mode} eq 'like' and any { $args->{$k} =~ $_ } #{ $opts->{$k}{allowed} }
or $opts->{$k}{mode} eq 'num' and any { $args->{$k} == $_ } #{ $opts->{$k}{allowed} }
or $opts->{$k}{mode} eq 'smart' and any { $args->{$k} ~~ $_ } #{ $opts->{$k}{allowed} }
or $opts->{$k}{mode} eq 'code' and any { $args->{$k}->($_) } #{ $opts->{$k}{allowed} };
Preventing unknown options
You may or may not want to forbid unknown options in your $args hash. Especially if you consider composability of classes, you may want to ignore unknown options, as a superclass or subclass may need these.
But if you choose to check for wrong options, you could delete those elements you already handled:
my $self = {};
for my $k (keys %$opts) {
my $v = delete $args->{$k};
...; # use $v in the rest of the loop
$self->{$k} = $v;
}
croak "Unknown arguments (" . (join ", ", keys %$args) . ") are forbidden" if keys %$args;
or grep for unknown args:
my #unknown = grep { not exists $opts->{$_} } keys %$args;
croak "Unknown arguments (" . (join ", ", #unknown) . ") are forbidden" if #unknown;
for my $k (keys %$opts) {
...;
}
or you could loop over the combined keys of $args and $opts:
use List::Util 'uniq';
for my $k (uniq keys(%$opts), keys(%$args)) {
croak "Unknown argument $k" unless exists $opts->{$k};
...;
}
Scalar Context
I have assumed that you correctly initialized $args as a hash reference:
my $args = { file => 'file.txt', type => 'foo', head => 1 };
Using parens instead of curlies is syntactically valid:
my $args = ( file => 'file.txt', type => 'foo', head => 1 );
but this doesn't produce a hash. Instead, the => and , behave like the comma operator in C: the left operand is evaluated and discarded. That is, only the last element is kept:
my $args = 1; # equivalent to above snippet.

Related

Convert hashref to array of kv pairs

Here's my initial code:
sub my_sub {
my $hash = {
age => 5, # default value for "age" key
#_ # the rest of the "hash" as an array
};
...
}
#used like so:
my_sub("age" => 42, ...);
But I'd like to also support taking in a hashref in addition to an array. So I've tried:
sub my_sub {
my $hash = {
age => 5, # default value for "age" key
ref(#_) eq "" ? #_ : %{$_[0]}
};
...
}
If you call it with an array like before, the ref(#_) eq "" check would be true and the conditional would evaluate to #_ like before. But I can't for the life of me get the false case to work. Currently, it says there are an odd number of elements in the anonymous hash. But when I print out the value of %{$_[0]} I see the expected flattened array of (age, 42, ...) - in other words it looks right in the context of printing it out, but it barfs when it's in the conditional, and I have no idea why. Halp.
I'm not sure why you'd want to use a hash ref in the caller since it just makes the call noisier. But you could use
sub my_sub {
my %args = #_ == 1 ? %{ $_[0] } : #_;
$args{ age } //= 5;
...
}
or
sub my_sub {
my $args = #_ == 1 ? $_[0] : { #_ };
$args->{ age } //= 5; # Warning: modifies caller
...
}
On the surface, the second is faster when provided a hash ref (since it doesn't build a new hash). But, in practice, a new hash is created in the caller each time, cancelling all benefits. On the plus side, that also means there's usually no consequences to modifying the hash (the issue identified by comment in the snippet).

Perl: How to grep with Hash Slices?

I am trying to sort out defined parameters from a complex hash, using hash slices. Hash slices are great because they avoid lots of foreach and if defined so they simplify syntax greatly.
However, I'm clearly not doing this correctly:
use DDP;
my %hash = (
'a' => # first patient
{
'age' => 9,
'BMI' => 20
},
'b' =>
{
'age' => 8
}
);
my %defined_patients = grep {defined $hash{$_}{'age', 'BMI'}} keys %hash;
p %defined_patients;
The above code gives an empty hash, when I want it to return just patient a.
This question is similar to "no autovivication" pragma fails with grep in Perl and I've based my code on it.
I've also tried
my #defined_patients = grep {defined $hash{$_}{'age', 'BMI'}} keys %hash;
but that doesn't work either.
How can I use hash slices to grep patients with the defined keys?
If you want to check that none of the target keys are undefined, you have to check separately. This greps for undefined values:
grep { ! defined } #{ $hash{$_} }{ qw(age BMI) }
Notice that the hash slice
#{ $hash{$_} }{ qw(age BMI) }
In v5.24, you can use postfix dereferencing instead:
$hash{$_}->#{ qw(age BMI) }
But that grep has to fit in another one. Since you want the cases where all values are defined, you have to negate the result of the inner grep:
my #patients =
grep { ! grep { ! defined } $hash{$_}->#{ qw(age BMI) } }
keys %hash;
That's pretty ugly though. I'd probably do something simpler in a subroutine. This way you can handle any number of keys easily:
sub some_patients {
my( $hash, $keys ) = #_;
my #patient_keys;
foreach my $key ( keys %$hash ) {
next unless grep { ! defined } $hash{$key}->#{ #$keys };
push $key, #patient_keys;
}
return #patient_keys;
}
Now I simply call a subroutine instead of grokking multilevel greps:
my #patient_keys = some_patients( \%patients, [ qw(age BMI) ] );
Or, for something more targeted, maybe something like this that tests a particular sub-hash instead of the whole data structure:
sub has_defined_keys {
my( $hash, $keys ) = #_;
! grep { ! defined } $hash->#{#$keys}
}
my #target-keys = ...;
my #keys = grep {
has_defined_keys( $patients{$_}, \#target-keys )
} keys %patients;
Either way, when things start getting a bit too complex, use a subroutine to give those things names so you can hide the code in favor of something short.

Return boolean value after iterating through an array

I'm trying to see if keyword_objects has an element {name=>'CRASH', status=>'+'}.
# $bug is a reference to the below data
{
'keyword_objects' => [
bless( { 'name' => 'CRASH', 'status' => '+'}, 'SomeModule::SomeFilename' ),
bless( { 'name' => 'CUSTOMER', 'status' => '-' }, 'SomeModule::SomeFilename' ) ],
'category' => 'error'
}
I couldn't find something like filter in another language so my alternative was using map.
my #isCrash = map { $_->name eq 'CRASH' && $_->status eq '+' } #{ $bug->{keyword_objects} };
The problem with this is that, when there is no such keyword, every time the operation is done, it seems to return an empty value. #isCrash becomes an array of multiple empty values and if(#isCrash) becomes useless. I surely can introduce a new variable which can be changed from the map operation but I feel like there should be a better way to do it. Someone please chime in and share your knowledge.
In Perl (and, indeed, in Unix in general), filter is spelled grep.
my $is_crash = grep { $_->name eq 'CRASH' and $_->status eq '+' }
#{ $bug->{keyword_objects} };
From perldoc -f grep:
grep BLOCK LIST
grep EXPR,LIST
[ ... snip ...]
Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value consisting of those elements for which the expression evaluated to true. In scalar context, returns the number of times the expression was true.
So $is_crash will contain the number of elements in your input list where the expression is true. And that can be used as a Boolean value (as zero is false and all other integers are true).
Your code mapped every entry to scalar value of logical expression (true of false).
Try the code below to map it to entry itself for match and empty list for no match.
Sum of multiple empty lists is empty list.
my #isCrash = map { $_->name eq 'CRASH' && $_->status eq '+' ? ($_) : () } #{ $bug->{keyword_objects} };
Alternative approach is to use grep:
# get list of "crashed" elements
my #CrashedList = grep { $_->name eq 'CRASH' && $_->status eq '+'} #{ $bug->{keyword_objects} };
# get number of "crashed" elements in scalar context
my $numberOfCrashes = grep { $_->name eq 'CRASH' && $_->status eq '+'} #{ $bug->{keyword_objects} };

Perl map block local variable usage

This code compiles a set by way of hash keys of the unique basename stubs in a set of paths.
%stubs = map { $f=basename $_; $f =~ /^([A-Za-z]+[0-9]+)\./ ; $1=>() } #pathlist;
Why do I need the $f references here? I thought I'd be ok with:
%stubs = map { basename; /^([A-Za-z]+[0-9]+)\./; $1=>() } #pathlist;
But I get no match. Am I not permitted to modify $_ in the map block?
For those wondering what the code is doing:
For each $path (#pathlist), it's getting the basename, matching the first letter-number sequence, and then returning the first bracket match as the key on an empty list value. Example:
/some/dir/foo123.adfjijoijb
/some/dir/foo123.oibhobihe
/some/dir/bar789.popjpoj
returns
foo123 => ()
bar789 => ()
After which I use the keys of the map as the set of values so process.
basename does not default to acting on $_. But you can match against its return value instead of using $f:
%stubs = map { basename($_) =~ /^([A-Za-z]+[0-9]+)\./; $1 => undef } #pathlist;
Note that () in a list doesn't produce an element, it just flattens to nothing; you have to provide a value, even if only undef. With $1 => (), map iterations would alternate producing a key and a value for %stubs.
It's good to always check that your regex succeed before using $1:
%stubs = map { basename($_) =~ /^([A-Za-z]+[0-9]+)\./ ? ($1 => undef) : () } #pathlist;
though if you don't mind the hash values being the empty string instead of undef, you can just make the regex match return the desired list:
%stubs = map { basename($_) =~ /^([A-Za-z]+[0-9]+)()\./ } #pathlist;
In map and grep, $_ is an alias for the values in the array. If you modify them, you actually modify the values in the array. This is probably not what you want and probably what is going wrong, but to debug print keys %stubs and #pathlist afterwards in both cases and let us know what it says.
Also: File::Basename's basename does not implicitly work on $_. It generates an error for me.
#!/usr/bin/perl
use feature say;
use File::Basename;
#pathlist=("/some/dir/foo123.adfjijoijb","/some/dir/foo123.oibhobihe","/some/dir/bar789.popjpoj");
%stubs1 = map { $f=basename $_; $f =~ /^([A-Za-z]+[0-9]+)\./ ; $1=>() } #pathlist;
say join(',',keys %stubs1);
say "---";
say join(',',#pathlist);
say "---";
%stubs = map { $_=basename $_; /^([A-Za-z]+[0-9]+)\./; $1=>() } #pathlist;
say join(',',keys %stubs);
say "---";
say join(',',#pathlist);
say "---";
%stubs = map {basename; /^([A-Za-z]+[0-9]+)\./; $1=>() } #pathlist;
Alternate implementation:
my %stubs =
map { $_ => undef }
map { basename($_) =~ /^([A-Za-z]+[0-9]+)\./ }
#pathlist;

Perl, check if pair exists in hash of hashes

In Perl, I have a hash of hashes created with a loop similar to the following
my %HoH
for my $i (1..10) {
$HoH{$a}{$b} = $i;
}
$a and $b are variables that do have some value when the HoH gets filled in. After creating the HoH, how can I check if a particular pair ($c, $d) exists in the HoH? The following does not work
if (defined $HoH{$c}{$d}) {...}
because if $c does not exist in HoH already, it will be created as a key without a value.
Writing
if (defined $HoH{$c}{$d}) {...}
will "work" insomuch as it will tell you whether or not $HoH{$c}{$d} has a defined value. The problem is that if $HoH{$c} doesn't already exist it will be created (with an appropriate value) so that $HoH{$c}{$d} can be tested. This process is called "autovivification." It's convenient when setting values, e.g.
my %hoh;
$hoh{a}{b} = 1; # Don't need to set '$hoh{a} = {}' first
but inconvenient when retrieving possibly non-existent values. I wish that Perl was smart enough to only perform autovivification for expressions used as lvalues and short-circuit to return undef for rvalues but, alas, it's not that magical. The autovivification pragma (available on CPAN) adds the functionality to do this.
To avoid autovivification you need to test the intermediate values first:
if (exists $HoH{$c} && defined $HoH{$c}{$d}) {
...
}
use Data::Dumper;
my %HoH;
$HoH{A}{B} = 1;
if(exists $HoH{C} && exists $HoH{C}{D}) {
print "exists\n";
}
print Dumper(\%HoH);
if(exists $HoH{C}{D}) {
print "exists\n";
}
print Dumper(\%HoH);
Output:
$VAR1 = {
'A' => {
'B' => 1
}
};
$VAR1 = {
'A' => {
'B' => 1
},
'C' => {}
};
Autovivification is causing the keys to be created. "exists" in my second example shows this so the first example checks both keys individually.
Several ways:
if ( $HoH{$c} && defined $HoH{$c}{$d} ) {...}
or
if ( defined ${ $HoH{$c} || {} }{$d} ) {...}
or
no autovivification;
if (defined $HoH{$c}{$d}) {...}
or
use Data::Diver;
if ( defined Data::Diver::Dive( \%HoH, $c, $d ) ) {...}
You have to use the exists function
exists EXPR
Given an expression that specifies an
element of a hash, returns true if the
specified element in the hash has ever
been initialized, even if the
corresponding value is undefined.
Note that the EXPR can be arbitrarily
complicated as long as the final
operation is a hash or array key
lookup or subroutine name:
if (exists $ref->{A}->{B}->{$key}) { }
if (exists $hash{A}{B}{$key}) { }
My take:
use List::Util qw<first>;
use Params::Util qw<_HASH>;
sub exists_deep (\[%$]#) {
my $ref = shift;
return unless my $h = _HASH( $ref ) // _HASH( $$ref )
and defined( my $last_key = pop )
;
# Note that this *must* be a hash ref, for anything else to make sense.
return if first { !( $h = _HASH( $h->{ $_ } )) } #_;
return exists $h->{ $last_key };
}
You could also do this recursively. You could also create a descent structure allowing intermediate and even terminal arrayref with just a little additional coding.