Perl - reading in multi-line records from config file - perl

I'm trying to read in a multi-line config file with records into a perl hash array
Example Config File:
record_1
phone=5551212
data=1234234
end_record_1
record_2
people_1=bob
people_2=jim
data=1234
end_record_2
record_3
people_1=sue
end_record_3
here's what I'm looking for:
$myData{1}{"phone"} <--- 5551212
$myData{1}{"data"} <--- 1234234
$myData{2}{"people_1"} <--- bob
... etc
What's the best way to read this in? Module? Regex with multi-line match? Brute force? I'm up in the air on where to head next.

Here's one option with your data set:
use strict;
use warnings;
use Data::Dumper;
my %hash;
{
local $/ = '';
while (<DATA>) {
my ($rec) = /record_(\d+)/;
$hash{$rec}{$1} = $2 while /(\S+)=(.+)/g;
}
}
print Dumper \%hash;
__DATA__
record_1
phone=5551212
data=1234234
end_record_1

record_2
people_1=bob
people_2=jim
data=1234
end_record_2

record_3
people_1=sue
end_record_3
Output:
$VAR1 = {
'1' => {
'data' => '1234234',
'phone' => '5551212'
},
'3' => {
'people_1' => 'sue'
},
'2' => {
'people_1' => 'bob',
'data' => '1234',
'people_2' => 'jim'
}
};
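Setting local $/ = '' enables paragraph mode, in which one or more empty lines are treated as the record separator. Each read then returns one whole record (this relies on the records being separated by blank lines in the file), so we can use regexes on those records to grab the information for the hash keys/values.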
Hope this helps!

There are a number of modules for this, so the best practice is (as usual) to use them rather than re-invent the wheel.
From the snippet of the config file you posted, it looks like Config::Simple may be the best choice. If you can simplify the config format, then Config::Tiny would be easier to use. If things get more complicated, then you may have to use Config::General.
http://metacpan.org/pod/Config::Tiny
http://metacpan.org/pod/Config::Simple
http://metacpan.org/pod/Config::General

Read it in one line at a time.
When you see the start of a new record, add a new empty hash to myData and grab a reference to it; this will be your "current record".
When you see a key/value pair on a line, add it to the current record's hash (if there is one).
When you see the end of a record, just clear the reference to the current record.
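The line-at-a-time approach described above can be sketched roughly as follows. This is only an illustration: the helper name parse_records and the in-memory filehandle are assumptions made for the example, and the patterns assume the record_N / end_record_N markers shown in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: parse record_N ... end_record_N blocks line by line.
sub parse_records {
    my ($fh) = @_;
    my (%data, $current);    # $current references the record being filled
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^record_(\d+)\s*$/) {
            $current = $data{$1} = {};       # start a new record
        }
        elsif ($line =~ /^end_record_\d+\s*$/) {
            undef $current;                  # record closed
        }
        elsif ($current && $line =~ /^([^=\s]+)=(.*)$/) {
            $current->{$1} = $2;             # key=value inside a record
        }
    }
    return \%data;
}

# Sample input taken from the question, read through an in-memory filehandle.
my $config = <<'END';
record_1
phone=5551212
data=1234234
end_record_1
record_2
people_1=bob
people_2=jim
data=1234
end_record_2
END

open my $fh, '<', \$config or die "Cannot open in-memory file: $!";
my $my_data = parse_records($fh);
close $fh;

print "$my_data->{1}{phone}\n";      # 5551212
print "$my_data->{2}{people_1}\n";   # bob
```

Unlike paragraph mode, this version does not depend on blank lines between records, and lines outside any record are simply ignored.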

Related

Cleanest way to parse argument with Getopt::Long

I use Getopt::Long to parse command-line arguments. I would like to add a new option "multi" which takes a string that looks like this: key1=abc,key2=123,key3=xwz.
I don't know how many custom keys the user will want to give, but they can give a maximum of 5 keys. I would also like to put them in a hash, keyed by name.
I'm looking for a good and clean way to implement it.
For starters, I thought of using --multi {key1=abc,key2=123,key3=xwz} but for some reason, it gets only
the first key key1=abc. Also I tried: --multi {key1=abc},{key2=123},{key3=xwz} but it feels kind of messy. I want to give the user the possibility to add arguments with - like key1=./some_script.pl --help. Part of the code:
my %arg;
GetOptions(
"multi=s" => \$arg{"multi"},
);
Then I would like to somehow put those keys in the hash so it will be easy to use them. So I thought of using: $arg{"multi"}{"key3"} in order to get the value of key3.
How should I approach this feature? What is the cleanest way to do so?
To summarize it:
What is the best way to ask the user to give keys in order to get a similar situation to key1=abc,key2=123,key3=xwz, without using a file (giving options, not in a file way)? Meaning - how would you like, as a user of the script, to give those fields?
How to validate that user gave less than 5 keys?
How should I parse those keys and what is the best way to insert those keys into the hash map in the multi key.
Expected output: I would like to have a hash which looks like this: $arg{"multi"}{"key3"} and returns xwz.
The following program reads the comma-separated sub-options from the --multi option on the command line.
#!perl
use strict;
use warnings;
use Data::Dumper;
use Getopt::Long 'GetOptionsFromArray';
my @args = ('--multi', '{key1=abc,key2=123,key3=xwz}', 'some', 'other');
my %arg;
GetOptionsFromArray(
\@args,
"multi=s" => \$arg{"multi"},
);
if( $arg{multi} and $arg{multi} =~ /^\{(.*)\}$/) {
# split up into hash:
$arg{ multi } = { split /[{},=]/, $1 };
};
print Dumper \%arg;
__END__
$VAR1 = {
'multi' => {
'key2' => '123',
'key1' => 'abc',
'key3' => 'xwz'
}
};
The program uses GetOptionsFromArray for easy testability. In the real program, you will likely use GetOptions(...), which is identical to GetOptionsFromArray(\@ARGV, ...).
One way is to assign options of key=value format to a hash, which Getopt::Long allows. Even better, as this functionality merely needs a hash reference, you can have it assign to a hashref that is a value inside a deeper data structure, and make direct use of that:
use warnings;
use strict;
use feature 'say';
use Getopt::Long;
use Data::Dump qw(dd);
my %args;
$args{multi} = {};
GetOptions( 'multi=s' => $args{multi} ) or die "Bad options: $!";
dd \%args;
With multiple invocations of that option the key-value pairs are added
script.pl --multi k1=v1 --multi k2=v2
and the above program prints
{ multi => { k1 => "v1", k2 => "v2" } }
I use Data::Dump to print complex data. Change to core Data::Dumper if that's a problem.
Getopt::Long has a way to limit the number of arguments that an option takes, but that apparently applies only to array destinations, so you'd have to count the keys yourself to check.
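Counting the keys after parsing could look something like the sketch below. The limit of 5 comes from the question (which says both "maximum 5" and "less than 5", so adjust the comparison as needed); the $MAX_KEYS name and the simulated argument list are assumptions made for illustration.

```perl
use strict;
use warnings;
use Getopt::Long 'GetOptionsFromArray';

my $MAX_KEYS = 5;    # assumed limit, taken from the question

# Simulated command line; a real script would pass \@ARGV instead.
my @argv = ('--multi', 'k1=v1', '--multi', 'k2=v2');

my %args = (multi => {});
GetOptionsFromArray(\@argv, 'multi=s' => $args{multi})
    or die "Bad options\n";

# In scalar context, keys returns the number of entries in the hash.
my $count = keys %{ $args{multi} };
die "--multi accepts at most $MAX_KEYS keys, got $count\n"
    if $count > $MAX_KEYS;

print "$count key(s) accepted\n";
```

Because the check runs after GetOptions has finished, it works the same way whether the pairs arrived through repeated --multi options or through a custom subroutine handler.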
Another way is to process the input string in a subroutine, where you can do practically anything you want. Adding that to the above script, to add yet another key with its hashref to %args
use warnings;
use strict;
use feature 'say';
use Getopt::Long;
use Data::Dump qw(dd);
my %args;
$args{multi} = {};
GetOptions(
'multi=s' => $args{multi},
'other=s' => sub { $args{other} = { split /[=,]/, $_[1] } }
) or die "Bad options: $!";
dd \%args;
When called as
script.pl --multi k1=v1 --multi k2=v2 --other mk1=mv1,mk2=mv2
This prints
{
other => { mk1 => "mv1", mk2 => "mv2" },
multi => { k1 => "v1", k2 => "v2" },
}

HTML::Parser handler sends undefined parameter to callback function?

How it's being declared:
my $HTML_GRABBER = HTML::Parser->new('api_version' => 2,
'handlers' => {
'start' => [\&start_tag,"tagname,text"],
'text' => [\&read_text,"tagname, text"],
'end' => [\&end_tag,"tagname"]
}
);
callback function:
sub read_text {
print Dumper(@_);
die "\n";
my ($tag,$stuff) = @_;
if(($DO_NOTHING==0)&&($tag eq $current_tag))
{
push @{$data_queue}, $stuff;
}
}
result:
$VAR1 = undef;
$VAR2 = '
';
So it apparently passes an undefined value for the tag and a string containing only a newline for the text. This is reading from a saved HTML file on my hard drive. I don't know why.
I had something like this in mind:
#DOC structure:
#(
# "title"=> {"text"=>#("text")}
# "div" => [
# {
# "p"=> [
# {
# "class" => string
# "id" => string
# "style" => string
# "data"=>["first line", "second line"]
# }
# ],
# "class" => string
# "id" => string
# "style" => string
# }
# ]
#)
You've told it to.
You specified which parameters should be passed to the text handler:
'text' => [\&read_text,"tagname, text"],
Well, there is no tagname for a text token, and therefore it passes you undef as the first parameter.
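If the goal is to know which element a piece of text belongs to, one possible workaround is to remember the open tags in the start handler and consult that state from the text handler. The following is a minimal sketch under that assumption, not the asker's actual design: the @open_tags stack and %text_for hash are names invented for the example, and it uses the api_version 3 handler style (start_h, text_h, end_h) from the HTML::Parser documentation rather than the declaration shown in the question.

```perl
use strict;
use warnings;
use HTML::Parser;

my @open_tags;    # stack of currently open elements (illustrative state)
my %text_for;     # text collected per enclosing tag name

my $parser = HTML::Parser->new(
    api_version => 3,
    # Remember each opened tag so the text handler can see the context.
    start_h => [ sub { push @open_tags, $_[0] }, 'tagname' ],
    # Pop only when the closing tag matches the innermost open tag.
    end_h   => [ sub {
        pop @open_tags if @open_tags && $open_tags[-1] eq $_[0];
    }, 'tagname' ],
    # Text events carry no tagname, so only the (decoded) text is requested.
    text_h  => [ sub {
        my ($text) = @_;
        return unless @open_tags && $text =~ /\S/;   # skip whitespace-only text
        $text_for{ $open_tags[-1] } .= $text;
    }, 'dtext' ],
);

$parser->parse('<body><h1>My Header</h1><p>Hello</p></body>');
$parser->eof;

print "$_: $text_for{$_}\n" for sort keys %text_for;
```

This naive stack does not cope with unclosed or misnested tags, which is one reason a DOM-style module can be the more comfortable choice for real documents.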
What exactly are you trying to do? If you describe your actual goal, we might be able to suggest a better solution instead of just pointing out the flaws in your current implementation. Check out: What is an XY Problem?
Addendum about Mojo::DOM
There are modern modules like Mojo::DOM that are much better for navigating a document structure and finding specific data. Check out Mojocast Episode 5 for a helpful 8 minute introductory video.
You appear to be prematurely worried about efficiency of the parse. Initially, I'd advise you to just store the raw html in the database, and reparse it whenever you need to pull new information.
If you Benchmark and decide this is too slow, then you can use Storable to save a serialized copy of the parsed $dom object. However, this should definitely be in addition to the saved html.
use strict;
use warnings;
use Mojo::DOM;
use Storable qw(freeze thaw);
my $dom = Mojo::DOM->new(do {local $/; <DATA>});
# Serializing to memory - Can then put it into a DB if you want
my $serialized = freeze $dom;
my $newdom = thaw($serialized);
# Load Title from Serialized dom
# at() returns the first matching element
print $newdom->at('title')->text;
__DATA__
<html>
<head><title>My Title</title></head>
<body>
<h1>My Header one</h1>
<p>My Paragraph One</p>
<p>My Paragraph Two</p>
</body>
</html>
Outputs:
My Title

Accessing Hash of hash of Array in Perl

I have the following data I wish to access:
$data1=
{'Family' => {
'House' => [
{
'Id' => '1111',
'Name' => 'DFG'
},
{
'Id' => '211',
'Name' => 'ABC'
}
]
}
}
I want to access the each Name field value. I am using this code:
foreach(keys%$data1) {
if(ref($data1->{$_}) eq 'HASH') {
foreach my $inner_key (keys%{$data1->{$_}}) {
print "Key:$inner_key and value:$data1->{$_}->{$inner_key}\n";
}
}
else {
print "Key: $_ and Value: $data1->{$_}\n"
}
}
It prints Key:House and value:ARRAY(OXXX).
I know I am doing something wrong. Since the data in 'House' is an array of hashes, I even tried accessing through $data1->{$_}->{$inner_key}[0]. What is wrong in the code???
You have to dereference the array for the foreach loop first, and then dereference each hashref to reach the "Name" values.
print "Key:$inner_key and value:$_->{Name}\n"
for @{$data1->{$_}->{$inner_key}};
You should read perlref first to learn how to create and use references.
Here is a demonstration:
#!/usr/bin/perl
use strict;
use warnings;
my $data1=
{'Family' => {
'House' => [
{
'Id' => '1111',
'Name' => 'DFG'
},
{
'Id' => '211',
'Name' => 'ABC'
}
]
}
};
while (my ($key1, $val1) = each %$data1) {
print "\$key1 = $key1\n";
while (my ($key2, $val2) = each %$val1) {
print "\t\$key2 = $key2\n";
foreach my $val3 (@$val2) {
while (my ($key4, $val4) = each %$val3) {
print "\t\t\$key4 = $key4 => $val4\n";
}
print "\n";
}
}
}
[Edit: I typed too slowly while answering, so this response basically duplicates @mpapec's below. I will leave the references here and you can vote me up for those ;-) but do not accept my response as the answer.]
Try something like the following to see if it works:
for my $inner_hash (@{ $data1->{Family}{House} }) {
say "Name: $inner_hash->{Name}"
}
since you need to get the inner hashes' values from inside the elements of the array (that is what value:ARRAY(OXXX) is telling you).
You can use perldoc to look at the perldata, perlref, perlreftut and perldsc PODs to learn more about data structures and dereferencing. If keeping your data structure in mind while you are writing code gets too hard, it may mean you need to simplify things: either the data itself, or by writing subs to make it easier to access, or by making use of some of the excellent utility modules from CPAN.
There are also some good Perl data structure tutorials out there. The POD/perldoc documentation that ships with perl (along with Chapter 9 of Programming Perl) is the canonical reference, but you might browse these nodes from perlmonks:
References quick reference
Referencing in advanced data structures
Visualizing perl data structures
Perlmonks Hash Tutorial
NB Above I'm using the perlcritic and Perl Best Practices style of dereferencing, e.g. @{ $data1->{Family}{House} }, so the syntax reminds me that the inner hashes (or inner-inner?) are inside an array. There's a cool new way of dereferencing being introduced in perl 5.20 called postfix dereferencing which will be great, but you can't go wrong following the recommendations of PBP.
"Until you start thinking in terms of hashes, you aren't really thinking in Perl." -- Larry Wall
Cheers,
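For comparison, here is a small sketch of the postfix dereferencing mentioned above next to the circumfix style. It assumes perl 5.24 or later, where the syntax is enabled by default (in 5.20 and 5.22 it was an experimental feature that had to be switched on); the array names are chosen only for the example.

```perl
use v5.24;        # postfix dereference is available by default from 5.24
use warnings;

my $data1 = { Family => { House => [
    { Id => '1111', Name => 'DFG' },
    { Id => '211',  Name => 'ABC' },
] } };

# Circumfix form, as used in the answers above.
my @circumfix = map { $_->{Name} } @{ $data1->{Family}{House} };

# Equivalent postfix form: the dereference reads left to right.
my @postfix = map { $_->{Name} } $data1->{Family}{House}->@*;

say "@postfix";    # DFG ABC
```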

Issue in hash modification

I am printing a hash [ print Dumper($myhash); ], it is as below :
$VAR1= {
'context_verdict' => 'Failed',
'logfile' => 'abc',
'Block_000' => {
'Element_0032' => {
'e_verdict' => 'FAILED',
'e_name' => 'Element_0032',
'e_log' => 'This is really bad...',
'e_ref' => 'Good'
}
}
};
Now I want to change the value of logfile from abc to def. how to achieve this ?
I wrote
$myhash{'$VAR1'}->{'logfile'}="def";
But it does not work! It is still "abc".
Try this one:
$myhash->{'logfile'}="def";
Data::Dumper names your variable as $VAR1, this is not an entry in your hash.
First of all, always use use strict; use warnings;.
You want
$VAR1->{'logfile'} = "def";
If you obtained the dump using Dumper(\%myhash),
$myhash{'logfile'} = "def";
If you obtained the dump using Dumper($myhash),
$myhash->{'logfile'} = "def";
$myhash holds a reference to a hash, so you need to dereference it to access the hash. That's what -> is doing.
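The difference between the two cases can be seen side by side in a short sketch. The variable $myref is a name introduced here only to keep the hash and the hash reference apart in one script.

```perl
use strict;
use warnings;

my %myhash = (logfile => 'abc');      # a hash
my $myref  = { logfile => 'abc' };    # a reference to a hash

$myhash{logfile}  = 'def';    # direct element access on the hash
$myref->{logfile} = 'def';    # dereference with ->, then access the element

print "$myhash{logfile} $myref->{logfile}\n";   # def def
```

With use strict in effect, writing $myhash{...} when only the scalar $myhash exists is reported at compile time, which is how the original mistake would have been caught.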
Data::Dumper helps to analyse a large hash; the top-level value is named $VAR1 in its output, which is only a label and not a key in your hash.
Answer to your question is:
You can set the value as
$myhash->{'logfile'}="def";

Testing for different types of hash values in Perl?

I'm writing a small Perl script that goes through an XML file via XML::Simple
my $xml = new XML::Simple;
my $detail= $xml->XMLin($xml_local);
Sometimes, the contents of an element in the XML are empty.
When there is no content in an element in the XML, and I try to print out the contents using:
print $detail->{Parsing}->{Through}->{XML}->{ElementContents}
I get the output:
HASH(0x18948c4)
......or something similar..... the only difference is the chars between the ()'s
I want to test if the content is empty and default the variable to something else - maybe '' or "" - anything but the hash reference/address/whatever that is.
I tried this, but got an error that it's not an array reference:
print $detail->{Parsing}->{Through}->{XML}->{ElementContents}[0]
UPDATE
Output of one of the elements using Data::Dumper:
'something' => [
{
'somedetail' => '',
'somedetail' => '',
'somedetail' => 'http://www.google.com',
'somedetail' => 'google',
'somedetail' => '1',
'somedetail' => '01/21/02'
},
How can I test for these '' empty strings using Perl? They are returned as HASH(0x18948c4) unless some filtering is enabled.
The reason it prints HASH(0x18948c4) is because the contents of that value are NOT in fact empty, but a hashref. When you print something, Perl tries to stringify that something, and stringified result of a hash reference is HASH(address) where address is the address of the actual hash.
Print the actual contents of that hashref as follows:
use Data::Dumper;
print Data::Dumper->Dump([$detail->{Parsing}->{Through}->{XML}->{ElementContents}]);
If as you say there are "no contents", it will probably be an empty hashref:
$VAR1 = {};
If so, you can check for it via:
if (ref($detail->{Parsing}->{Through}->{XML}->{ElementContents}) eq ref({})
&& !keys %{ $detail->{Parsing}->{Through}->{XML}->{ElementContents} }) {
print "No contents, empty hashref";
}
First condition ensures it's a hashref, second, that the hash resulting from its dereference has zero elements as its keys - meaning it's an empty hash being referenced.
However, I seriously doubt it's an empty hash from what I recall about XML::Simple - and doing the Data::Dumper print as shown above will show you HOW to deal with it. You should always print out unknown data structures this way to figure out what to do with them.
E.g., if your Data::Dumper output was:
$VAR1 = {
'a' => 1
};
Then you need to print $detail->{Parsing}->{Through}->{XML}->{ElementContents}->{a}, obviously. Again, be careful to only print something that is a scalar and not an arrayref or hashref, so go down the data structure as much as needed to get to a scalar.
This is a modified version of DVK's answer that worked for me:
if (ref($detail->{Parsing}->{Through}->{XML}->{ElementContents}) eq ref({}))
{
...empty element content...
}
I needed to remove the 2nd condition of the if(condition1 && condition2) statement he gave me.
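To default such values to an empty string, as the question asks, the check can be wrapped in a small helper. This is a sketch only: scalar_or_default is a hypothetical name, and the $detail structure below is a hand-built stand-in for what XML::Simple might return, not real parser output.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value unchanged if it is usable, or a
# default when the parser left undef or an empty hashref in its place.
sub scalar_or_default {
    my ($value, $default) = @_;
    return $default if !defined $value;
    return $default if ref $value eq 'HASH' && !keys %$value;
    return $value;
}

# Stand-in data: one empty element and one element with content.
my $detail = { ElementContents => {}, Url => 'http://www.google.com' };

printf "[%s]\n", scalar_or_default($detail->{ElementContents}, '');  # []
printf "[%s]\n", scalar_or_default($detail->{Url}, '');              # [http://www.google.com]
```

A non-empty hashref or an arrayref is passed through untouched, so the caller still needs to walk further down the structure in those cases.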