Perl map block local variable usage - perl

This code compiles a set by way of hash keys of the unique basename stubs in a set of paths.
%stubs = map { $f=basename $_; $f =~ /^([A-Za-z]+[0-9]+)\./ ; $1=>() } #pathlist;
Why do I need the $f references here? I thought I'd be ok with:
%stubs = map { basename; /^([A-Za-z]+[0-9]+)\./; $1=>() } #pathlist;
But I get no match. Am I not permitted to modify $_ in the map block?
For those wondering what the code is doing:
For each $path (#pathlist), it's getting the basename, matching the first letter-number sequence, and then returning the first bracket match as the key on an empty list value. Example:
/some/dir/foo123.adfjijoijb
/some/dir/foo123.oibhobihe
/some/dir/bar789.popjpoj
returns
foo123 => ()
bar789 => ()
After which I use the keys of the map as the set of values so process.

basename does not default to acting on $_. But you can match against its return value instead of using $f:
%stubs = map { basename($_) =~ /^([A-Za-z]+[0-9]+)\./; $1 => undef } #pathlist;
Note that () in a list doesn't produce an element, it just flattens to nothing; you have to provide a value, even if only undef. With $1 => (), map iterations would alternate producing a key and a value for %stubs.
It's good to always check that your regex succeed before using $1:
%stubs = map { basename($_) =~ /^([A-Za-z]+[0-9]+)\./ ? ($1 => undef) : () } #pathlist;
though if you don't mind the hash values being the empty string instead of undef, you can just make the regex match return the desired list:
%stubs = map { basename($_) =~ /^([A-Za-z]+[0-9]+)()\./ } #pathlist;

In map and grep, $_ is an alias for the values in the array. If you modify them, you actually modify the values in the array. This is probably not what you want and probably what is going wrong, but to debug print keys %stubs and #pathlist afterwards in both cases and let us know what it says.
Also: File::Basename's basename does not implicitly work on $_. It generates an error for me.
#!/usr/bin/perl
use feature say;
use File::Basename;
#pathlist=("/some/dir/foo123.adfjijoijb","/some/dir/foo123.oibhobihe","/some/dir/bar789.popjpoj");
%stubs1 = map { $f=basename $_; $f =~ /^([A-Za-z]+[0-9]+)\./ ; $1=>() } #pathlist;
say join(',',keys %stubs1);
say "---";
say join(',',#pathlist);
say "---";
%stubs = map { $_=basename $_; /^([A-Za-z]+[0-9]+)\./; $1=>() } #pathlist;
say join(',',keys %stubs);
say "---";
say join(',',#pathlist);
say "---";
%stubs = map {basename; /^([A-Za-z]+[0-9]+)\./; $1=>() } #pathlist;

Alternate implementation:
my %stubs =
map { $_ => undef }
map { basename($_) =~ /^([A-Za-z]+[0-9]+)\./ }
#pathlist;

Related

String Parsing for nested parenthesis in perl

The issue is when I try to compare the input to the output file, i am unable to handle the nesting of the parenthesis, and the complexity needs to be very low. is there a parsing module for this? compatible to 5.8.4. I found modules but they needed at least 5.10.:(
Input
(K1=V1,K2=V2,K3=V3(K2=V2.K5=V5)K6=V6(K7=V7,K8=V8(K9=V9,K10=V10)K11=V11)K12=V12,K13=V13)
OUTPUT FILE
(K0=V0,K1=V1,K2=V2,K3=V3(K1=V1,K2=V2,K4=V4,K5=V5,K14=V14),K15=V15,K6=V6(K18=V18,K7=V7,K19=V19,K8=V8(K20=V20,K9=V9,K16=V16,K10=V10,K21=V21)K11=V11)K12=V12,K13=V13,K22=V22)
I need to pick up each key value pair from input and one by one verify from the output file that the value is the same. if not
I need to store the key with the existing value.( The issue is with the nesting )
INPUT
K3=V3(K2=V2,K5=V5)
OUTPUT
K3=V3(K1=V1,K2=V2,K4=V4,K5=V5,K14=V14)
The issue is that "K2=V2" inside the V3 value is to be checked inside the V3 value in the output file. So I cannot just use a regular expression to do that as K2=V2 may appear outside the V3 parenthesis too.
I was trying to create a hash of a hash of a hash but failed. could someone suggest a way I could achieve this?
The following code builds the hash of hashes. Note that values (V3) are lost if they contain an inner hash.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
sub to_hash {
my $string = shift;
$string =~ s/^\( | \)$//gx; # Remove the outer parentheses.
my #stack = {};
my #keys;
while (length $string) {
$string =~ s/^([^,=()]+) = ([^(),]*)//x or die $string;
my ($key, $value) = ($1, $2);
$stack[-1]{$key} = $value;
next if $string =~ s/^,//;
if ($string =~ s/^\(//) {
push #stack, {};
push #keys, $key;
} elsif ($string =~ s/^\),?//) {
my $last = pop #stack;
$stack[-1]{ pop #keys } = $last;
}
}
return $stack[0]
}
my $input = '(K1=V1,K2=V2,K3=V3(K2=V2,K5=V5)K6=V6(K7=V7,K8=V8(K9=V9,K10=V10)K11=V11)K12=V12,K13=V13)';
print Dumper to_hash($input);
Output
$VAR1 = {
'K2' => 'V2',
'K13' => 'V13',
'K6' => {
'K7' => 'V7',
'K8' => {
'K9' => 'V9',
'K10' => 'V10'
},
'K11' => 'V11'
},
'K3' => {
'K2' => 'V2',
'K5' => 'V5'
},
'K12' => 'V12',
'K1' => 'V1'
};
Nested parens either suggests an application of Text::Balanced and its extract_bracketed function, or building yourself a little parser subclass on Parser::MGC. Using the latter to build a little "convert string into data structure" parser is usually pretty straightforward for simple examples like this.

Confusion with checking Hash of Hash of Arrays

I am trying to compare my hash input to valid allowed options in my data structure, and if it's not one of the options then I set the default value for the key. I seem to be missing something here though.
Example of current data structure..
my $opts = {
file => { require => 1 },
head => {
default => 1,
allowed => [0,1],
},
type => {
default => 'foo',
allowed => [qw(foo bar baz)]
},
};
$args is my hash ref ( file => 'file.txt', type => 'foo', head => 1 )
Snippet of what I've tried..
for my $k ( keys %$opts ) {
croak("Argument '$k' is required in constructor call!")
if $opts->{$k}->{require} and !exists $args->{$k};
if (exists $args->{$k}) {
if (grep {!$args->{$k}} #{$opts->{$k}->{allowed}} ) {
$args->{$k} = $opts->{$k}->{default};
}
...
} else {
..set our defaults
$args->{$k} = $opts->{$k}->{default};
}
}
The checking for allowed values is faulty.
The grep function takes a code block and a list. It sets the $_ variable to each element in the list in turn. If the block returns a true value, the element is kept. In scalar context, grep does not return a list of kept elements, but a count.
Your grep block is {!$args->{$k}}. This returns true when $args->{$k} is false and vice versa. The result does not depend on $_, and therefore doesn't check if the argument is one of the allowed values.
To see if the given value is allowed value, you'll have to test for some form of equivalence, e.g.
if (grep { $args->{$k} eq $_ } #{ $opts->{$k}{allowed} }) {
# this is executed when the arg matches an allowed value
} else {
# the arg is not allowed
}
An Excursion To Smart Matching and List::MoreUtils
If you can use a perl > v10, then smart matching is available. This would express above condition as
use 5.010;
$args->{$k} ~~ $opts->{$k}{allowed}
The lengthy table of possible type combinations states that this is roughly equivalent to the grep if the arg is a scalar (string/number), and the allowed arrayref holds only normal scalars as well.
However, smart matching was re-marked as experimantal in v18, and behaviour will likely change soon.
In the meantime, it might be better to stick to explicit grep etc. But we could implement two improvements:
The grep will test all elements, even when a match was already found. This can be inefficient. The first function from List::Util core module has the same syntax as grep, but stops after the first element. If the block matches a value, this value is returned. If no value matches, it returns undef. This makes things complicated when undef might be a valid value, or even when false values may be allowed. But in your case, the grep could be replaced by
use List::Util 'first';
defined first { $_ eq $args->{$k} } #{ $opts->{$k}{allowed} }
The List::MoreUtils module has even more functionality. It provides for example the any function, which corresponds to the mathematical ∃ (there exists) quantifier:
use List::MoreUtils 'any';
any { $_ eq $args->{$k} } #{ $opts->{$k}{allowed} }
This only returns a boolean value. While it may not be as efficient as a plain grep or first, using any is quite self-documenting, and easier to use.
Until now, I have assumed that we'll only ever do string comparision to the allowed values. This sometimes works, but it would be better to specify an explicit mode. For example
croak qq(Value for "$k": "$args->{$k}" not allowed) unless
$opts->{$k}{mode} eq 'str' and any { $args->{$k} eq $_ } #{ $opts->{$k}{allowed} }
or $opts->{$k}{mode} eq 'like' and any { $args->{$k} =~ $_ } #{ $opts->{$k}{allowed} }
or $opts->{$k}{mode} eq 'num' and any { $args->{$k} == $_ } #{ $opts->{$k}{allowed} }
or $opts->{$k}{mode} eq 'smart' and any { $args->{$k} ~~ $_ } #{ $opts->{$k}{allowed} }
or $opts->{$k}{mode} eq 'code' and any { $args->{$k}->($_) } #{ $opts->{$k}{allowed} };
Preventing unknown options
You may or may not want to forbid unknown options in your $args hash. Especially if you consider composability of classes, you may want to ignore unknown options, as a superclass or subclass may need these.
But if you choose to check for wrong options, you could delete those elements you already handled:
my $self = {};
for my $k (keys %$opts) {
my $v = delete $args->{$k};
...; # use $v in the rest of the loop
$self->{$k} = $v;
}
croak "Unknown arguments (" . (join ", ", keys %$args) . ") are forbidden" if keys %$args;
or grep for unknown args:
my #unknown = grep { not exists $opts->{$_} } keys %$args;
croak "Unknown arguments (" . (join ", ", #unknown) . ") are forbidden" if #unknown;
for my $k (keys %$opts) {
...;
}
or you could loop over the combined keys of $args and $opts:
use List::Util 'uniq';
for my $k (uniq keys(%$opts), keys(%$args)) {
croak "Unknown argument $k" unless exists $opts->{$k};
...;
}
Scalar Context
I have assumed that you correctly initialized $args as a hash reference:
my $args = { file => 'file.txt', type => 'foo', head => 1 };
Using parens instead of curlies is syntactically valid:
my $args = ( file => 'file.txt', type => 'foo', head => 1 );
but this doesn't produce a hash. Instead, the => and , behave like the comma operator in C: the left operand is evaluated and discarded. That is, only the last element is kept:
my $args = 1; # equivalent to above snippet.

Copy array to hash

I'm trying to copy an array to a hash, such that each element of the array is a key, followed by an empty value.
my %questions = map { #u_list => $_ } #u_list;
This only prints out
=>
I see on perldoc this idiom:
%hash = map { get_a_key_for($_) => $_ } #array;
But I cannot figure out how to set the keys. I want the keys to be each element in the array.
Super confusing but functional answer:
#questions{#u_list}=();
This is using the hash slice syntax to specify a set of hash keys..
my %questions = map { $_ => undef } #u_list;
In the map, each element of #u_list gets set to $_.
%hash = map { $_ => '' } #array;
This sets the values to an empty string
$_ is the current element of your list #u_list.
So you have to say
my %questions = map { $_ => 1 } #u_list;
to map your list elements as hash keys.
Here are a few different ways to do this, just for reference.
Using map
my %questions = map { $_, undef } #u_list;
Using a foreach
my %questions;
$questions{$_} = undef foreach ( #u_list );
Using a hash slice.
my %questions;
#questions{#u_list} = (undef) x #u_list;

Perl, check if pair exists in hash of hashes

In Perl, I have a hash of hashes created with a loop similar to the following
my %HoH
for my $i (1..10) {
$HoH{$a}{$b} = $i;
}
$a and $b are variables that do have some value when the HoH gets filled in. After creating the HoH, how can I check if a particular pair ($c, $d) exists in the HoH? The following does not work
if (defined $HoH{$c}{$d}) {...}
because if $c does not exist in HoH already, it will be created as a key without a value.
Writing
if (defined $HoH{$c}{$d}) {...}
will "work" insomuch as it will tell you whether or not $HoH{$c}{$d} has a defined value. The problem is that if $HoH{$c} doesn't already exist it will be created (with an appropriate value) so that $HoH{$c}{$d} can be tested. This process is called "autovivification." It's convenient when setting values, e.g.
my %hoh;
$hoh{a}{b} = 1; # Don't need to set '$hoh{a} = {}' first
but inconvenient when retrieving possibly non-existent values. I wish that Perl was smart enough to only perform autovivification for expressions used as lvalues and short-circuit to return undef for rvalues but, alas, it's not that magical. The autovivification pragma (available on CPAN) adds the functionality to do this.
To avoid autovivification you need to test the intermediate values first:
if (exists $HoH{$c} && defined $HoH{$c}{$d}) {
...
}
use Data::Dumper;
my %HoH;
$HoH{A}{B} = 1;
if(exists $HoH{C} && exists $HoH{C}{D}) {
print "exists\n";
}
print Dumper(\%HoH);
if(exists $HoH{C}{D}) {
print "exists\n";
}
print Dumper(\%HoH);
Output:
$VAR1 = {
'A' => {
'B' => 1
}
};
$VAR1 = {
'A' => {
'B' => 1
},
'C' => {}
};
Autovivification is causing the keys to be created. "exists" in my second example shows this so the first example checks both keys individually.
Several ways:
if ( $HoH{$c} && defined $HoH{$c}{$d} ) {...}
or
if ( defined ${ $HoH{$c} || {} }{$d} ) {...}
or
no autovivification;
if (defined $HoH{$c}{$d}) {...}
or
use Data::Diver;
if ( defined Data::Diver::Dive( \%HoH, $c, $d ) ) {...}
You have to use the exists function
exists EXPR
Given an expression that specifies an
element of a hash, returns true if the
specified element in the hash has ever
been initialized, even if the
corresponding value is undefined.
Note that the EXPR can be arbitrarily
complicated as long as the final
operation is a hash or array key
lookup or subroutine name:
if (exists $ref->{A}->{B}->{$key}) { }
if (exists $hash{A}{B}{$key}) { }
My take:
use List::Util qw<first>;
use Params::Util qw<_HASH>;
sub exists_deep (\[%$]#) {
my $ref = shift;
return unless my $h = _HASH( $ref ) // _HASH( $$ref )
and defined( my $last_key = pop )
;
# Note that this *must* be a hash ref, for anything else to make sense.
return if first { !( $h = _HASH( $h->{ $_ } )) } #_;
return exists $h->{ $last_key };
}
You could also do this recursively. You could also create a descent structure allowing intermediate and even terminal arrayref with just a little additional coding.

How can I cleanly turn a nested Perl hash into a non-nested one?

Assume a nested hash structure %old_hash ..
my %old_hash;
$old_hash{"foo"}{"bar"}{"zonk"} = "hello";
.. which we want to "flatten" (sorry if that's the wrong terminology!) to a non-nested hash using the sub &flatten(...) so that ..
my %h = &flatten(\%old_hash);
die unless($h{"zonk"} eq "hello");
The following definition of &flatten(...) does the trick:
sub flatten {
my $hashref = shift;
my %hash;
my %i = %{$hashref};
foreach my $ii (keys(%i)) {
my %j = %{$i{$ii}};
foreach my $jj (keys(%j)) {
my %k = %{$j{$jj}};
foreach my $kk (keys(%k)) {
my $value = $k{$kk};
$hash{$kk} = $value;
}
}
}
return %hash;
}
While the code given works it is not very readable or clean.
My question is two-fold:
In what ways does the given code not correspond to modern Perl best practices? Be harsh! :-)
How would you clean it up?
Your method is not best practices because it doesn't scale. What if the nested hash is six, ten levels deep? The repetition should tell you that a recursive routine is probably what you need.
sub flatten {
my ($in, $out) = #_;
for my $key (keys %$in) {
my $value = $in->{$key};
if ( defined $value && ref $value eq 'HASH' ) {
flatten($value, $out);
}
else {
$out->{$key} = $value;
}
}
}
Alternatively, good modern Perl style is to use CPAN wherever possible. Data::Traverse would do what you need:
use Data::Traverse;
sub flatten {
my %hash = #_;
my %flattened;
traverse { $flattened{$a} = $b } \%hash;
return %flattened;
}
As a final note, it is usually more efficient to pass hashes by reference to avoid them being expanded out into lists and then turned into hashes again.
First, I would use perl -c to make sure it compiles cleanly, which it does not. So, I'd add a trailing } to make it compile.
Then, I'd run it through perltidy to improve the code layout (indentation, etc.).
Then, I'd run perlcritic (in "harsh" mode) to automatically tell me what it thinks are bad practices. It complains that:
Subroutine does not end with "return"
Update: the OP essentially changed every line of code after I posted my Answer above, but I believe it still applies. It's not easy shooting at a moving target :)
There are a few problems with your approach that you need to figure out. First off, what happens in the event that there are two leaf nodes with the same key? Does the second clobber the first, is the second ignored, should the output contain a list of them? Here is one approach. First we construct a flat list of key value pairs using a recursive function to deal with other hash depths:
my %data = (
foo => {bar => {baz => 'hello'}},
fizz => {buzz => {bing => 'world'}},
fad => {bad => {baz => 'clobber'}},
);
sub flatten {
my $hash = shift;
map {
my $value = $$hash{$_};
ref $value eq 'HASH'
? flatten($value)
: ($_ => $value)
} keys %$hash
}
print join( ", " => flatten \%data), "\n";
# baz, clobber, bing, world, baz, hello
my %flat = flatten \%data;
print join( ", " => %flat ), "\n";
# baz, hello, bing, world # lost (baz => clobber)
A fix could be something like this, which will create a hash of array refs containing all the values:
sub merge {
my %out;
while (#_) {
my ($key, $value) = splice #_, 0, 2;
push #{ $out{$key} }, $value
}
%out
}
my %better_flat = merge flatten \%data;
In production code, it would be faster to pass references between the functions, but I have omitted that here for clarity.
Is it your intent to end up with a copy of the original hash or just a reordered result?
Your code starts with one hash (the original hash that is used by reference) and makes two copies %i and %hash.
The statement my %i=%{hashref} is not necessary. You are copying the entire hash to a new hash. In either case (whether you want a copy of not) you can use references to the original hash.
You are also losing data if your hash in the hash has the same value as the parent hash. Is this intended?