Cleanest way to parse argument with Getopt::Long - perl

I use Getopt::Long to parse command-line arguments. I would like to add a new option "multi" which takes a string that looks like the following: key1=abc,key2=123,key3=xwz.
I don't know how many custom keys the user will want to give, but they can give at most 5 keys. Also, I would like to put them in a hash keyed by those keys.
I'm looking for a good and clean way to implement it.
For starters, I thought of using --multi {key1=abc,key2=123,key3=xwz} but for some reason, it gets only
the first key key1=abc. Also I tried: --multi {key1=abc},{key2=123},{key3=xwz} but it feels kind of messy. I want to give the user the possibility to add arguments with - like key1=./some_script.pl --help. Part of the code:
my %arg;
GetOptions(
"multi=s" => \$arg{"multi"},
);
Then I would like to somehow put those keys in the hash so it will be easy to use them. So I thought of using: $arg{"multi"}{"key3"} in order to get the value of key3.
How should I approach this feature? What is the cleanest way to do so?
To summarize it:
What is the best way to ask the user to give keys in order to get a similar situation to key1=abc,key2=123,key3=xwz, without using a file (giving options, not in a file way)? Meaning - how would you like, as a user of the script, to give those fields?
How to validate that user gave less than 5 keys?
How should I parse those keys and what is the best way to insert those keys into the hash map in the multi key.
Expected output: I would like to have a hash which looks like this: $arg{"multi"}{"key3"} and returns xwz.

The following program reads the comma-separated sub-options from the --multi option on the command line.
#!perl
use strict;
use warnings;
use Data::Dumper;
use Getopt::Long 'GetOptionsFromArray';
my @args = ('--multi', '{key1=abc,key2=123,key3=xwz}', 'some', 'other');
my %arg;
GetOptionsFromArray(
\@args,
"multi=s" => \$arg{"multi"},
);
if( $arg{multi} and $arg{multi} =~ /^\{(.*)\}$/) {
# split up into hash:
$arg{ multi } = { split /[{},=]/, $1 };
};
print Dumper \%arg;
__END__
$VAR1 = {
'multi' => {
'key2' => '123',
'key1' => 'abc',
'key3' => 'xwz'
}
};
The program uses GetOptionsFromArray for easy testability. In the real program, you will likely use GetOptions(...), which is identical to GetOptionsFromArray(\@ARGV, ...).
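The split on /[{},=]/ above breaks when a value itself contains = (or a brace), as in the key1=./some_script.pl --help case from the question. Here is a minimal sketch of one way around that, assuming values never contain a comma: split on commas first, then split each pair on the first = only. The sub name parse_multi and the sample input are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Getopt::Long): turn "k1=v1,k2=v2"
# into a hashref. Assumes values never contain a comma.
sub parse_multi {
    my ($string) = @_;
    $string =~ s/^\{|\}$//g;    # strip optional surrounding braces
    my %pairs;
    for my $pair (split /,/, $string) {
        # Limit of 2: only the first '=' separates key from value
        my ($key, $value) = split /=/, $pair, 2;
        die "Bad sub-option '$pair' (expected key=value)\n"
            unless defined $value && length $key;
        $pairs{$key} = $value;
    }
    return \%pairs;
}

my $multi = parse_multi('{key1=./some_script.pl --help,key2=a=b}');
print "$_ => $multi->{$_}\n" for sort keys %$multi;
```

On a real command line the whole value has to be quoted so the shell passes it as one argument, for example --multi 'key1=./some_script.pl --help,key2=123'.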

One way is to assign options in key=value format to a hash, which Getopt::Long allows. Even better, as this functionality merely needs a hash reference, it turns out that you can have it assign to a hashref that is a value inside a deeper data structure. You can make direct use of that:
use warnings;
use strict;
use feature 'say';
use Getopt::Long;
use Data::Dump qw(dd);
my %args;
$args{multi} = {};
GetOptions( 'multi=s' => $args{multi} ) or die "Bad options: $!";
dd \%args;
With multiple invocations of that option the key-value pairs are added
script.pl --multi k1=v1 --multi k2=v2
and the above program prints
{ multi => { k1 => "v1", k2 => "v2" } }
I use Data::Dump to print complex data. Change to core Data::Dumper if that's a problem.
While Getopt::Long has a way to limit the number of arguments that an option takes, that apparently applies only to array destinations. So you'd have to count the keys afterwards to check.
Another way is to process the input string in a subroutine, where you can do practically anything you want. Adding that to the above script, to add yet another key with its hashref to %args
use warnings;
use strict;
use feature 'say';
use Getopt::Long;
use Data::Dump qw(dd);
my %args;
$args{multi} = {};
GetOptions(
'multi=s' => $args{multi},
'other=s' => sub { $args{other} = { split /[=,]/, $_[1] } }
) or die "Bad options: $!";
dd \%args;
When called as
script.pl --multi k1=v1 --multi k2=v2 --other mk1=mv1,mk2=mv2
This prints
{
other => { mk1 => "mv1", mk2 => "mv2" },
multi => { k1 => "v1", k2 => "v2" },
}
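As noted above, the limit on the number of keys has to be checked by hand once parsing is done. A minimal sketch of that check, assuming the maximum of 5 from the question; the variable names and the simulated command line are illustrative only.

```perl
use strict;
use warnings;
use Getopt::Long 'GetOptionsFromArray';

my $MAX_KEYS = 5;    # limit taken from the question

# Simulated command line; a real script would call GetOptions instead
my @argv = qw(--multi k1=v1 --multi k2=v2 --multi k3=v3);

my %args = (multi => {});
GetOptionsFromArray(\@argv, 'multi=s' => $args{multi})
    or die "Bad options\n";

# keys in scalar (numeric) context gives the number of keys
my $count = keys %{ $args{multi} };
die "--multi accepts at most $MAX_KEYS keys, got $count\n"
    if $count > $MAX_KEYS;

print "Got $count keys\n";
```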

Related

Merge two yml files does not handle duplicates?

I am trying to merge two yml files using the Hash::Merge Perl module, and to dump the result to a yml file using Dump from the YAML module.
use strict;
use warnings;
use Hash::Merge qw( merge );
Hash::Merge::set_behavior('RETAINMENT_PRECEDENT');
use File::Slurp qw(write_file);
use YAML;
my $yaml1 = $ARGV[0];
my $yaml2 = $ARGV[1];
my $yaml_output = $ARGV[2];
my $clkgrps = &YAML::LoadFile($yaml1);
my $clkgrps1 = &YAML::LoadFile($yaml2);
my $clockgroups = merge($clkgrps1, $clkgrps);
my $out_yaml = Dump $clockgroups;
write_file($yaml_output, { binmode => ':raw' }, $out_yaml);
After merging the yml files, I see duplicate entries, i.e. the following content is the same in the two yml files. While merging, it is treated as different entries. Is there any built-in way to handle duplicates?
The data structures obtained from YAML files generally contain keys with values being arrayrefs with hashrefs. In your test case that's the arrayref for the key test.
Then a tool like Hash::Merge can only add the hashrefs to the arrayref belonging to the same key; it is not meant to compare array elements, as there aren't general criteria for that. So you need to do this yourself in order to prune duplicates, or apply any specific rules of your choice to data.
One way to handle this is to serialize (so stringify) the complex data structures in each arrayref that may contain duplicates, so that you can build a hash with them as keys. That is a standard way to handle duplicates (with O(1) lookup per element, albeit possibly with a large constant).
There are a number of ways to serialize data in Perl. I'd recommend JSON::XS, as a very fast tool with output that can be used by any language and tool. (But please research others of course, that may suit your precise needs better.)
A simple complete example, using your test cases
use strict;
use warnings;
use feature 'say';
use Data::Dump qw(dd pp);
use YAML;
use JSON::XS;
use Hash::Merge qw( merge );
#Hash::Merge::set_behavior('RETAINMENT_PRECEDENT'); # irrelevant here
die "Usage: $0 in-file1 in-file2 output-file\n" if @ARGV != 3;
my ($yaml1, $yaml2, $yaml_out) = @ARGV;
my $hr1 = YAML::LoadFile($yaml1);
my $hr2 = YAML::LoadFile($yaml2);
my $merged = merge($hr2, $hr1);
#say "merged: ", pp $merged;
for my $key (keys %$merged) {
# The same keys get overwritten
my %uniq = map { encode_json $_ => 1 } @{$merged->{$key}};
# Overwrite the arrayref with the one without dupes
$merged->{$key} = [ map { decode_json $_ } keys %uniq ];
}
dd $merged;
# Save the final structure...
More complex data structures require a more judicious traversal; consider using a tool for that.
With files as shown in the question this prints
{
test => [
{ directory => "LIB_DIR", name => "ObsSel.ktc", project => "TOT" },
{ directory => "MODEL_DIR", name => "pipe.v", project => "TOT" },
{
directory => "PCIE_LIB_DIR",
name => "pciechip.ktc",
project => "PCIE_MODE",
},
{ directory => "NAME_DIR", name => "fame.v", project => "SINGH" },
{ directory => "TREE_PROJECT", name => "Syn.yml", project => "TOT" },
],
}
(I use Data::Dump to show complex data, for its simplicity and default compact output.)
If there are issues with serializing and comparing entire structures consider using a digest (checksum, hashing) of some sort.
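A sketch of the digest idea, under a couple of assumptions. Plain encode_json does not sort hash keys, so two equal hashes could in principle serialize differently; the canonical option avoids that. The example uses the core modules JSON::PP and Digest::MD5 so that it runs without extra installs, and the sample entries are illustrative rather than taken from the actual files.

```perl
use strict;
use warnings;
use JSON::PP;                   # core module
use Digest::MD5 qw(md5_hex);    # core module

# canonical() sorts hash keys, so equal hashes serialize identically;
# utf8() makes encode() return bytes, which is what md5_hex expects
my $json = JSON::PP->new->utf8->canonical;

# Illustrative sample data; the third entry repeats the first one
my @entries = (
    { directory => 'LIB_DIR',   name => 'ObsSel.ktc', project => 'TOT' },
    { directory => 'MODEL_DIR', name => 'pipe.v',     project => 'TOT' },
    { project => 'TOT', name => 'ObsSel.ktc', directory => 'LIB_DIR' },
);

my %seen;
# Keep the first occurrence of each distinct structure, in original order
my @unique = grep { !$seen{ md5_hex($json->encode($_)) }++ } @entries;

print scalar(@unique), " unique entries\n";
```

Unlike rebuilding the array from the keys of a hash, the grep keeps the entries in their original order and leaves the original data structures untouched.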
Another option altogether would be to compare data structures as they are in order to resolve duplicates, by hand. For comparison of complex data structures I like to use Test::More, which works very nicely for mere comparisons outside of any testing. But there are dedicated tools as well of course, like Data::Compare.
Finally, instead of manually processing the result of a naive merge, like above, one can code the desired behavior using Hash::Merge::add_behavior_spec and then have the module do it all. For specific examples of how to use this feature see for instance this post
and this post and this post.
Note that in this case you still write all the code to do the job like above but the module does take some of the mechanics off of your hands.

Perl: Add hash as sub hash to simple hash

I looked at the other two questions that seem to be about this, but they are a little obtuse and I can't relate them to what I want to do, which I think is simpler. I also think this will be a much clearer statement of a very common problem/task so I'm posting this for the benefit of others like me.
The Problem:
I have 3 files, each file a list of key=value pairs:
settings1.ini
key1=val1
key2=val2
key3=val3
settings2.ini
key1=val4
key2=val5
key3=val6
settings3.ini
key1=val7
key2=val8
key3=val9
No surprise, I want to read those key=value pairs into a hash to operate on them, so...
I have a hash of the filenames:
my %files = ( file1 => 'settings1.ini'
, file2 => 'settings2.ini'
, file3 => 'settings3.ini'
);
I can iterate through the filenames like so:
foreach my $fkey (keys %files) {
say $files{$fkey};
}
Ok.
Now I want to add the list of key=value pairs from each file to the hash as a sub-hash under each respective 'top-level' filename key, such that I can iterate through them like so:
foreach my $fkey (keys %files) {
say "File: $files{$fkey}";
foreach my $vkey (keys %{ $files{$fkey} }) {
say " $vkey: $files{$fkey}{$vkey}";
}
}
In other words, I want to add a second level to the hash such that it goes from just being (in pseudo terms) a single layer list of values:
file1 => settings1.ini
file2 => settings2.ini
file3 => settings3.ini
to being a multi-layered list of values:
file1 => key1 => 'val1'
file1 => key2 => 'val2'
file1 => key3 => 'val3'
file2 => key1 => 'val4'
file2 => key2 => 'val5'
file2 => key3 => 'val6'
file3 => key1 => 'val7'
file3 => key2 => 'val8'
file3 => key3 => 'val9'
Where:
my $fkey = 'file2';
my $vkey = 'key3';
say $files{$fkey}{$vkey};
would print the value
'val6'
As a side note, I am trying to use File::Slurp to read in the key=value pairs. Doing this on a single level hash is fine:
my %new_hash = read_file($files{$fkey}) =~ m/^(\w+)=([^\r\n\*,]*)$/img;
but - to rephrase this whole problem - what I really want to do is 'graft' the new hash of key=value pairs onto the existing hash of filenames 'under' the top $file key as a 'child/branch' sub-hash.
Questions:
How do I do this, how do I build a multi-level hash one layer at a time like this?
Can I do this without having to pre-define the hash as multi-layered up front?
I use strict; and so I have seen the
Can't use string ("string") as a HASH ref while "strict refs" in use at script.pl line <lineno>.
which I don't fully understand...
Edit:
Thank you Timur Shtatland, Polar Bear and Dave Cross for your great answers. In mentally parsing your suggestions it occurred to me that I had slightly misled you by being a little inconsistent in my original question. I apologize. I also think I see now why I saw the 'strict refs' error. I have made some changes.
Note that my first mention of the initial hash of filename is correct. The subsequent foreach examples looping through %files, however, were incorrect because I went from using file1 as the first file key to using settings1.ini as the first file key. I think this is why Perl threw the strict refs error - because I tried to change the key from the initial string to a hash_ref pointing to the sub-hash (or vice versa).
Have I understood that correctly?
There are several CPAN modules designed for ini files. You should study what is available and choose whatever your heart desires.
Otherwise, you can write your own code, something in the spirit of the following snippet:
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my @files = qw(settings1.ini settings2.ini settings3.ini);
my %hash;
for my $file (@files) {
$hash{$file} = read_settings($file);
}
say Dumper(\%hash);
sub read_settings {
my $fname = shift;
my %hash;
open my $fh, '<', $fname
or die "Couldn't open $fname";
while( <$fh> ) {
chomp;
my($k,$v) = split '=';
$hash{$k} = $v
}
close $fh;
return \%hash;
}
Output
$VAR1 = {
'settings1.ini' => {
'key2' => 'val2',
'key1' => 'val1',
'key3' => 'val3'
},
'settings2.ini' => {
'key2' => 'val5',
'key1' => 'val4',
'key3' => 'val6'
},
'settings3.ini' => {
'key1' => 'val7',
'key2' => 'val8',
'key3' => 'val9'
}
};
To build the hash one layer at a time, use anonymous hashes. Each value of %files here is a reference to a hash, for example, for $files{'settings1.ini'}:
# read the data into %new_hash, then:
$files{'settings1.ini'} = { %new_hash }
You do not need to predefine the hash as multi-layered (as hash of hashes) upfront.
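To make the grafting step concrete, here is a minimal sketch that starts from the flat label => file name hash in the question and replaces each file-name string with a reference to the parsed sub-hash. It writes two small sample files to a temporary directory only so that it is self-contained; the contents and variable names are illustrative. The "strict refs" error from the question appears when $files{$fkey} still holds the file-name string and is dereferenced as if it were already a hash.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two small sample files so the sketch is self-contained
my $dir = tempdir(CLEANUP => 1);
my %content = (
    'settings1.ini' => "key1=val1\nkey2=val2\n",
    'settings2.ini' => "key1=val4\nkey2=val5\n",
);
for my $name (keys %content) {
    open my $out, '>', "$dir/$name" or die "Can't write $dir/$name: $!";
    print {$out} $content{$name};
    close $out or die "Can't close $dir/$name: $!";
}

# Start with a flat hash: label => file name
my %files = (file1 => 'settings1.ini', file2 => 'settings2.ini');

for my $fkey (keys %files) {
    my $path = "$dir/$files{$fkey}";
    open my $in, '<', $path or die "Can't open $path: $!";
    my %new_hash;
    while (my $line = <$in>) {
        chomp $line;
        my ($key, $value) = split /=/, $line, 2;
        $new_hash{$key} = $value if defined $value;
    }
    close $in;
    # Replace the file-name string with a reference to the sub-hash
    $files{$fkey} = \%new_hash;
}

print $files{file2}{key1}, "\n";
```

Because %new_hash is declared inside the loop, each iteration gets a fresh hash, so storing \%new_hash is as safe as copying it with { %new_hash }.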
Also, avoid reinventing the wheel. Use Perl modules for common tasks, in this case consider something like Config::IniFiles for parsing *.ini files
SEE ALSO:
Anonymous hashes: perlreftut
Hashes of hashes: perldsc
Perl makes stuff like this ridiculously easy.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my %files;
# <> reads from the files given on the command line
# one line at a time.
while (<>) {
chomp;
my ($key, $val) = split /=/;
# $ARGV contains the name of the file that
# is currently being read.
$files{$ARGV}{$key} = $val;
}
say Dumper \%files;
Running this as:
$ perl readconf settings1.ini settings2.ini settings3.ini
Gives the following output:
$VAR1 = {
'settings3.ini' => {
'key2' => 'val8',
'key1' => 'val7',
'key3' => 'val9'
},
'settings2.ini' => {
'key3' => 'val6',
'key1' => 'val4',
'key2' => 'val5'
},
'settings1.ini' => {
'key3' => 'val3',
'key1' => 'val1',
'key2' => 'val2'
}
};

Is it possible to read __DATA__ with Config::General in Perl?

I'd like to setup Config::General to read from the __DATA__ section of a script instead of an external file. (I realize that's not normally how it works, but I'd like to see if I can get it going. A specific use case is so I can send a script example to another developer without having to send a separate config file.)
According to the perldoc perldata, $main::DATA should act as a valid filehandle. I think Config::General should then be able to use -ConfigFile => \$FileHandle to read it, but it's not working for me. For example, this script will execute without crashing, but the __DATA__ isn't read in.
#!/usr/bin/perl -w
use strict;
use Config::General;
use YAML::XS;
my $configObj = new Config::General(-ConfigFile => $main::DATA);
my %config_hash = $configObj->getall;
print Dump \%config_hash;
__DATA__
testKey = testValue
I also tried:
my $configObj = new Config::General(-ConfigFile => \$main::DATA);
and
my $configObj = new Config::General(-ConfigFile => *main::DATA);
and a few other variations, but couldn't get anything to work.
Is it possible to use Config::General to read config key/values from __DATA__?
-ConfigFile requires a reference to a handle. This works:
my $configObj = Config::General->new(
-ConfigFile => \*main::DATA
);
The DATA handle is a glob, not a scalar.
Try *main::DATA instead of $main::DATA.
(and maybe try \*main::DATA. From the Config::General docs it looks like you are supposed to pass a filehandle argument as a reference.)
If the -ConfigFile => filehandle argument to the constructor doesn't do what you mean, an alternative is
new Config::General( -String => join ("", <main::DATA>) );
This works for me:
#!/usr/bin/perl
use strict;
use warnings;
use Config::General;
use YAML::XS;
my $string;
{
local $/;
$string = <main::DATA>;
};
my $configObj = new Config::General(-String => $string);
my %config_hash = $configObj->getall;
use Data::Dumper;
warn Dumper(\%config_hash);
__DATA__
testKey = testValue

Is there a way to make Perl compilation fail if a hash key wasn't defined in the initial hash definition?

All keys used should be present in the initial %hash definition.
use strict;
my %hash = ('key1' => 'abcd', 'key2' => 'efgh');
$hash{'key3'} = '1234'; ## <== I'd like for these to fail at compilation.
$hash{'key4'}; ## <== I'd like for these to fail at compilation.
Is there a way to do this?
The module Hash::Util has been part of Perl since 5.8.0. And that includes a 'lock_keys' function that goes some way to implementing what you want. It gives a runtime (not compile-time) error if you try to add a key to a hash.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Hash::Util 'lock_keys';
my %hash = (key1 => 'abcd', key2 => 'efgh');
lock_keys(%hash);
$hash{key3} = '1234'; ## <== dies here at runtime: key3 is not an allowed key
say $hash{key4}; ## <== would also die at runtime if execution got this far
Tie::StrictHash dies when you try to assign a new hash key, but it does it at runtime instead of compile time.
use strict;
my %hash = ('key1' => 'abcd', 'key2' => 'efgh');
my $ke = 'key3';
if (!exists $hash{$ke}) {
exit;
}

How can I make Perl die when reading, but not writing, to non-existing keys in deep hash?

I'm using dynamic multilevel hashes, from which I read data and to which I also write data.
A common pitfall for me is accessing non-existing keys (typos, db revisions etc.). I get undefs which propagate to other parts and cause problems. I would like to die whenever I try to read a non-existing key, but still be allowed to add new keys.
So the wanted behavior is:
my %hash;
$hash{A} = 5; # ok
print $hash{A}, "\n"; # ok
print $hash{X}, "\n"; # should die
$hash{B}{C}{D} = 10; # ok
print $hash{B}{C}{X}, "\n"; # should die
I previously posted a similar question and got great answers. I especially like the accepted one, which allows using the normal hash syntax. The only problem is I'm not sure how to easily generalize this to deep hashes as in the example above.
p.s.
I find this feature really useful and I wonder if I'm missing something, since it does not seem very popular. Perhaps it is not common to read/write from/to the same hash?
With the warnings pragma switched on, you do get Use of uninitialized value in print at... warnings at the two lines where you want to die.
So if you make warnings fatal then they would die instead:
use warnings FATAL => 'all';
Update
Based on comments you've made I assume your common case issue is something along these lines:
my $x = $hash{B}{C}{X};
Which won't throw warning/error until you actually use $x later on.
To get around this then you can do:
my $x = $hash{B}{C}{X} // 'some default value';
my $z = $hash{B}{C}{Z} // die "Invalid hash value";
Unfortunately the above would mean a lot of extra typing :(
Here is at least a short cut:
use 5.012;
use warnings FATAL => 'all';
use Carp 'croak';
# Value Or Croak!
sub voc { $_[0] // croak "Invalid hash" }
Then below would croak!
my $x = voc $hash{B}{C}{X};
Hopefully this and also the fatal warnings are helpful to you.
/I3az/
It's late for me so I'll be brief, but you could do this using the tie functionality -- have your hash represented by an object underneath, and implement the functions needed to interact with the hash.
Check out perldoc -f tie; there are also many classes on CPAN to look at, including Tie::Hash itself which is a good base class for tied hashes which you could build on, overriding a few methods to add your error checking.
If you want to wrap checks around a hash, create a subroutine to do it and use it as your interface:
use 5.010;
use Carp qw(croak);
sub read_from_hash {
my( $hash, @keys ) = @_;
return check_hash( $hash, @keys ) // croak ...;
}
But now you're starting to look like a class. When you need specialized behavior, start writing object-oriented classes. Do whatever you need to do. That's the part you're missing, I think.
The problem with sticking to the hash interface is that people expect the hash syntax to act as normal hashes. When you change that behavior, other people are going to have a tough time figuring out what's going on and why.
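The check_hash helper in the snippet above is left undefined in the answer. The following is one hypothetical way to fill it in, folded into read_from_hash: walk the keys one level at a time and test each with exists, so nothing is autovivified along the way, and croak at the first missing key. Everything beyond the sub name is an assumption made for illustration.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical implementation: descend through nested hashes key by key,
# croaking as soon as a level is missing. Only exists is used on the
# way down, so no intermediate hashes are created.
sub read_from_hash {
    my ($hash, @keys) = @_;
    my $node = $hash;
    my @path;
    for my $key (@keys) {
        push @path, $key;
        croak "No such key: " . join('/', @path)
            unless ref($node) eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

my %hash;
$hash{A} = 5;             # writes use normal hash syntax
$hash{B}{C}{D} = 10;

print read_from_hash(\%hash, qw(B C D)), "\n";    # existing path

my $ok = eval { read_from_hash(\%hash, qw(B X Y)); 1 };
print "died: $@" unless $ok;
print exists $hash{B}{X} ? "autovivified\n" : "B/X was not created\n";
```

Writes stay ordinary hash assignments; only reads go through the checking sub, which matches the read-strict, write-free behavior asked for in the question.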
If you don't know what keys the hash might have, use one of the tied hash suggestions or just turn on warnings. Be aware that tying is very slow, nine times slower than a regular hash and three times slower than an object.
If you have a fixed set of possible keys, what you want is a restricted hash. A restricted hash will only allow you to access a given set of keys and will throw an error if you try to access anything else. It can also recurse. This is much faster than tying.
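A minimal sketch of a restricted hash using lock_keys from the core module Hash::Util. The recursive helper lock_keys_deep and the sample %config data are made up for illustration; the helper restricts only the set of keys at each level, so values of existing keys can still be changed.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

# Hypothetical helper: restrict the key set of a hash and of every
# hash nested directly inside it (hashes inside arrays are not visited).
sub lock_keys_deep {
    my ($hash) = @_;
    for my $value (values %$hash) {
        lock_keys_deep($value) if ref($value) eq 'HASH';
    }
    lock_keys(%$hash);
    return $hash;
}

my %config = (
    db    => { host => 'localhost', port => 5432 },
    debug => 0,
);
lock_keys_deep(\%config);

$config{db}{port} = 5433;    # existing key: still writable

# Reading a key that is not in the allowed set dies at runtime
my $read_ok = eval { my $v = $config{db}{prot}; 1 };
print "read failed: $@" unless $read_ok;

# Adding a new key dies as well
my $write_ok = eval { $config{verbose} = 1; 1 };
print "write failed: $@" unless $write_ok;
```

Since new keys are rejected too, this fits the fixed-key case described above, not the case where keys are added freely.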
Otherwise, I would suggest turning your data into an object with methods rather than direct hash accesses. This is slower than a hash or restricted hash, but faster than a tied hash. There are many modules on CPAN to generate methods for you starting with Class::Accessor.
If your data is not fixed, you can write simple get() and set() methods like so:
package Safe::Hash;
use strict;
use warnings;
use Carp;
sub new {
my $class = shift;
my $self = shift || {};
return bless $self, $class;
}
sub get {
my($self, $key) = @_;
croak "$key has no value" unless exists $self->{$key};
return $self->{$key};
}
sub set {
my($self, $key, $value) = @_;
$self->{$key} = $value;
return;
}
You can get recursive behavior by storing objects in objects.
my $inner = Safe::Hash->new({ foo => 42 });
my $outer = Safe::Hash->new({ bar => 23 });
$outer->set( inner => $inner );
print $outer->get("inner")->get("foo");
Finally, since you mentioned db revisions, if your data is being read from a database then you will want to look into an object relation mapper (ORM) to generate classes and objects and SQL statements for you. DBIx::Class and Rose::DB::Object are two good examples.
Use DiveDie from Data::Diver:
use Data::Diver qw(DiveDie);
my $href = { a => { g => 4}, b => 2 };
print DiveDie($href, qw(a g)), "\n"; # prints "4"
print DiveDie($href, qw(c)), "\n"; # dies
re: your comment - hints on how to get the recursive effect on Ether's tie answer.
It's not for the fainthearted, but below is a basic example of one way that you might do what you're after by using Tie::Hash:
HashX.pm
package HashX;
use 5.012;
use warnings FATAL => 'all';
use Carp 'croak';
use Tie::Hash;
use base 'Tie::StdHash';
sub import {
no strict 'refs';
*{caller . '::hash'} = sub {
tie my %h, 'HashX', @_;
\%h;
}
}
sub TIEHASH {
my $class = shift;
croak "Please define a structure!" unless @_;
bless { @_ }, $class;
}
sub STORE {
my ($self, $key, $value) = @_;
croak "Invalid hash key used to store a value" unless exists $self->{$key};
$self->{$key} = $value;
}
sub FETCH {
my ($self, $key) = @_;
exists $self->{$key}
? $self->{$key}
: croak "Invalid hash key used to fetch a value";
}
1;
Above module is like a strict hash. You have to declare the hash structure up front then any FETCH or STORE will croak unless the hash key does exist.
The module has a simple hash function which is imported into calling program and is used to build the necessary tie for everything to work.
use 5.012;
use warnings;
use HashX;
# all my hashrefs are tied by using hash()
my $hash = hash(
a => hash(
b => hash(
c => undef,
),
),
);
$hash->{a}{b}{c} = 1; # ok
$hash->{a}{b}{c} = 2; # also ok!
$hash->{a}{b}{d} = 3; # throws error
my $x = $hash->{a}{b}{x}; # ditto
Remember this is a quick & dirty example and is untested beyond above. I'm hoping it will give you the idea of how it could be done using Tie::Hash and even whether it's worth attempting :)