need to modify perl Deep::Hash::Utils - perl

i am very new to perl. i am trying to use the below code from CPAN.
my $C;
# Recursive version of C<each>;
sub reach {
my $ref = shift;
if (ref $ref eq 'HASH') {
if (defined $C->{$ref}{v}) {
if (ref $C->{$ref}{v} eq 'HASH') {
if (my #rec = reach($C->{$ref}{v})) {
return ($C->{$ref}{k},#rec);
}
} elsif (ref $C->{$ref}{v} eq 'ARRAY') {
if (my #rec = reach($C->{$ref}{v})) {
if (defined $C->{$ref}{k}) {
return $C->{$ref}{k},#rec;
}
return #rec;
}
}
undef $C->{$ref};
}
if (my ($k,$v) = each %$ref) {
$C->{$ref}{v} = $v;
$C->{$ref}{k} = $k;
return ($k,reach($v));
}
return ();
} elsif (ref $ref eq 'ARRAY') {
if (defined $C->{$ref}{v}) {
if (ref $C->{$ref}{v} eq 'HASH' ||
ref $C->{$ref}{v} eq 'ARRAY') {
if (my #rec = reach($C->{$ref}{v})) {
if (defined $C->{$ref}{k}) {
return $C->{$ref}{k},#rec;
}
return #rec;
}
}
}
if (my $v = $ref->[$C->{$ref}{i}++ || 0]) {
$C->{$ref}{v} = $v;
return (reach($v));
}
return ();
}
return $ref;
}
input:
bar => {cmd_opts => { gld_upf => ['abc' , 'def']} }
current output:
[bar, cmd_opts, gld_upf, abc]
[bar, cmd_opts, gld_upf, def]
desired output:
[bar, cmd_opts, gld_upf, ['abc', 'def']]
also, what are the concepts that are being used in this code?
are there any books/courses i can take for this?

also, what are the concepts that are being used in this code? are there any books/courses i can take for this?
The code mentioned by you from the Deep::Hash::Utils CPAN module is mainly handling nested data structures.
A couple of places to read about these:
the official docs: perldsc ; perlreftut ; perlref ;
Modern Perl by chromatic has a section on Nested Data Structures around page 60
Intermediate Perl: Beyond The Basics of Learning Perl 2nd edition has a section about Nested Data Structures around page 44.
In the most basic case, in these nested data structures, every node has one of the following types:
scalar
hashref
arrayref
In turn, the values in the array pointed to by an arrayref can be of type scalar/hashref/arrayref.
The same goes for the values of the hash pointed to by an arrayref, it can be of type scalar/hashref/arrayref.
This induces a tree-like structure. The algorithm for traversing such a tree is depth-first search
where some additional logic is required to check the type of the node and depending on the type decide how to proceed further down the tree.
To make a parallel, all of this is not that much different from traversing a filesystem hierarchy (see link1, link2).
A bigger list called perlres on Perl resources is available.
In this specific case, the function reach from Deep::Hash::Utils acts as an iterator, and it returns all paths descending from the root down to each leaf.
Whenever a #path to a leaf is found, its elements are compared side-by-side with another list called #output, and there are three cases:
there's no element on that position, so we store it
the elements are equal, so we skip them
the elements are different, so we merge them together in a list
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Deep::Hash::Utils qw/reach/;
my $input = { bar => {cmd_opts => { gld_upf => ['abc' , 'def']} } };
my #output = ();
while (my #path = reach($input)) {
for(my $i=0;$i<=$#path;$i++){
if(defined $output[$i]) {
if(ref($output[$i]) eq "") {
if($output[$i] eq $path[$i]) {
next;
};
my $e1 = $output[$i];
my $e2 = $path[$i];
$output[$i] = [$e1,$e2];
}elsif(ref($output[$i]) eq "ARRAY"){
push #{$output[$i]}, $path[$i];
};
} else {
$output[$i] = $path[$i];
};
};
}
print Dumper \#output;
OUTPUT:
$VAR1 = [
'bar',
'cmd_opts',
'gld_upf',
[
'abc',
'def'
]
];

Related

Split string into a hash of hashes (perl)

at the moment im a little confused..
I am looking for a way to write a string with an indefinite number of words (separated by a slash) in a recursive hash.
These "strings" are output from a text database.
Given is for example
"office/1/hardware/mouse/count/200"
the next one can be longer or shorter..
This must be created from it:
{
office {
1{
hardware {
mouse {
count => 200
}
}
}
}
}
Any idea ?
Work backwards. Split the string. Use the last two elements to make the inner-most hash. While more words exist, make each one the key of a new hash, with the inner hash as its value.
my $s = "office/1/hardware/mouse/count/200";
my #word = split(/\//, $s);
# Bottom level taken explicitly
my $val = pop #word;
my $key = pop #word;
my $h = { $key => $val };
while ( my $key = pop #word )
{
$h = { $key => $h };
}
Simple recursive function should do
use strict;
use warnings;
use Data::Dumper;
sub foo {
my $str = shift;
my ($key, $rest) = split m|/|, $str, 2;
if (defined $rest) {
return { $key => foo($rest) };
} else {
return $key;
}
}
my $hash = foo("foo/bar/baz/2");
print Dumper $hash;
Gives output
$VAR1 = {
'foo' => {
'bar' => {
'baz' => '2'
}
}
};
But like I said in the comment: What do you intend to use this for? It is not a terribly useful structure.
If there are many lines to be read into a single hash and the lines have a variable number of fields, you have big problems and the other two answers will clobber data by either smashing sibling keys or overwriting final values. I'm supposing this because there is no rational reason to convert a single line into a hash.
You will have to walk down the hash with each field. This will also give you the most control over the process.
our $hash = {};
our $eolmark = "\000";
while (my $line = <...>) {
chomp $line;
my #fields = split /\//, $line;
my $count = #fields;
my $h = $hash;
my $i = 0;
map { (++$i == $count) ?
($h->{$_}{$eolmark} = 1) :
($h = $h->{$_} ||= {});
} #fields;
}
$h->{$_}{$eolmark} = 1 You need the special "end of line" key so that you can recognize the end of a record and still permit longer records to coexist. If you had two records
foo/bar/baz foo/bar/baz/quux, the second would overwrite the final value of the first.
$h = $h->{$_} ||= {} This statement is a very handy idiom to both create and populate a cache in one step and then take a shortcut reference to it. Never do a hash lookup more than once.
HTH

Possible to call STORE on deeper level tied hash assignment?

I'm trying to write a Perl module for a “persistent YAML hash”, with the following properties:
With every access, check if the YAML file has changed, and if so, reload.
As soon as any data in the hash is changed, save.
Don't save on UNTIE, so that the file isn't updated when you only read values.
My first attempt seemed to work pretty well:
package YAMLHash;
use v5.24;
use warnings;
use experimental 'signatures';
use YAML::XS qw(DumpFile LoadFile);
use File::stat;
sub refresh($self)
{
if (-f $self->{file}) {
if (stat($self->{file})->mtime > $self->{mtime}) {
$self->{data} = LoadFile($self->{file});
$self->{mtime} = stat($self->{file})->mtime;
}
}
}
sub save($self)
{
DumpFile($self->{file}, $self->{data});
$self->{mtime} = stat($self->{file})->mtime;
}
sub TIEHASH($class, #args)
{
my ($filename) = $args[0];
die "No filename specified" unless $filename;
my $self = bless { data=>{}, file=>$filename, mtime=>0 }, $class;
refresh($self);
return $self;
}
sub FETCH($self, $key = '')
{
refresh($self);
return $self->{data}{$key};
}
sub EXISTS($self, $key)
{
refresh($self);
return exists($self->{data}{$key});
}
sub FIRSTKEY($self)
{
refresh($self);
my #ignore = keys %{$self->{data}}; # reset iterator
return each %{$self->{data}};
}
sub NEXTKEY($self, $lastkey)
{
refresh($self);
return each %{$self->{data}};
}
sub SCALAR($self)
{
return scalar %{$self->{data}};
}
sub STORE($self, $key, $value)
{
refresh($self);
$self->{data}{$key} = $value;
save($self);
}
sub DELETE($self, $key)
{
refresh($self);
delete $self->{data}{$key};
save($self);
}
sub CLEAR($self, $key)
{
$self->{data} = {};
save($self);
}
1;
I tried this as follows:
use YAMLHash;
tie my %foo, 'YAMLHash', 'test.yaml';
$foo{hello} = 'world';
$foo{answer} = 42;
$foo{counter}++;
and the resulting YAML file looks like this:
---
answer: 42
counter: 1
hello: world
But then I changed my example code to:
use YAMLHash;
tie my %foo, 'YAMLHash', 'test.yaml';
$foo{hello} = 'world';
$foo{answer} = 42;
$foo{counter}++;
$foo{a}{b}{c}{d} = 'e';
and the result is:
---
a: {}
answer: 42
counter: 2
hello: world
So, obviously, STORE is called when $foo{a} is created, but not when $foo{a}{b}{c}{d} is assigned.
Is there any way to make this do what I want?
You will need to tie %{ $foo{a} }, %{ $foo{a}{b} } and %{ $foo{a}{b}{c} } as well.
You could recursively tie the hashes and arrays in the data structure in TIEHASH. Don't forget to the do the same thing to data added to the structure via STORE!
You might want to use a different class for the root of the data structure and non-root nodes.
Warning: Using tie will make accesses slower.
Note that you need to tie the scalars too, not just the hashes (and arrays). All of the following change the value of a hash element without calling STORE:
Changing the scalar directly:
++$foo{a};
chomp($foo{a});
$foo{a} =~ s/x/y/g;
...
Changing a scalar via an alias or a reference:
my \$x = \$foo{a}; $x = 123;
my $r = \$foo{a}; $$r = 123;
for ($foo{a}) { $_ = 123; }
sub { $_[0] = 123; }->($foo{a});
...

functional Perl: Filter, Iterator

I have to write Perl although I'm much more comfortable with Java, Python and functional languages. I'd like to know if there's some idiomatic way to parse a simple file like
# comment line - ignore
# ignore also empty lines
key1 = value
key2 = value1, value2, value3
I want a function that I pass an iterator over the lines of the files and that returns a map from keys to list of values. But to be functional and structured I'd like to:
use a filter that wraps the given iterator and returns an iterator without empty lines or comment lines
The mentioned filter(s) should be defined outside of the function for reusability by other functions.
use another function that is given the line and returns a tuple of key and values string
use another function that breaks the comma separated values into a list of values.
What is the most modern, idiomatic, cleanest and still functional way to do this? The different parts of the code should be separately testable and reusable.
For reference, here is (a quick hack) how I might do it in Python:
re_is_comment_line = re.compile(r"^\s*#")
re_key_values = re.compile(r"^\s*(\w+)\s*=\s*(.*)$")
re_splitter = re.compile(r"\s*,\s*")
is_interesting_line = lambda line: not ("" == line or re_is_comment_line.match(line))
and re_key_values.match(line)
def parse(lines):
interesting_lines = ifilter(is_interesting_line, imap(strip, lines))
key_values = imap(lambda x: re_key_values.match(x).groups(), interesting_lines)
splitted_values = imap(lambda (k,v): (k, re_splitter.split(v)), key_values)
return dict(splitted_values)
A direct translation of your Python would be
my $re_is_comment_line = qr/^\s*#/;
my $re_key_values = qr/^\s*(\w+)\s*=\s*(.*)$/;
my $re_splitter = qr/\s*,\s*/;
my $is_interesting_line= sub {
my $_ = shift;
length($_) and not /$re_is_comment_line/ and /$re_key_values/;
};
sub parse {
my #lines = #_;
my #interesting_lines = grep $is_interesting_line->($_), #lines;
my #key_values = map [/$re_key_values/], #interesting_lines;
my %splitted_values = map { $_->[0], [split $re_splitter, $_->[1]] } #key_values;
return %splitted_values;
}
Differences are:
ifilter is called grep, and can take an expression instead of a block as first argument. These are roughly equivalent to a lambda. The current item is given in the $_ variable. The same applies to map.
Perl doesn't emphazise laziness, and seldomly uses iterators. There are instances where this is required, but usually the whole list is evaluated at once.
In the next example, the following will be added:
Regexes don't have to be precompiled, Perl is very good with regex optimizations.
Instead of extracting key/values with regexes, we use split. It takes an optional third argument that limits the number of resulting fragments.
The whole map/filter stuff can be written in one expression. This doesn't make it more efficient, but emphazises the flow of data. Read the map-map-grep from bottom upwards (actually right to left, think of APL).
.
sub parse {
my %splitted_values =
map { $_->[0], [split /\s*,\s*/, $_->[1]] }
map {[split /\s*=\s*/, $_, 2]}
grep{ length and !/^\s*#/ and /^\s*\w+\s*=\s*\S/ }
#_;
return \%splitted_values; # returning a reference improves efficiency
}
But I think a more elegant solution here is to use a traditional loop:
sub parse {
my %splitted_values;
LINE: for (#_) {
next LINE if !length or /^\s*#/;
s/\A\s*|\s*\z//g; # Trimming the string—omitted in previous examples
my ($key, $vals) = split /\s*=\s*/, $_, 2;
defined $vals or next LINE; # check if $vals was assigned
#{ $splitted_values{$key} } = split /\s*,\s*/, $vals; # Automatically create array in $splitted_values{$key}
}
return \%splitted_values
}
If we decide to pass a filehandle instead, the loop would be replaced with
my $fh = shift;
LOOP: while (<$fh>) {
chomp;
...;
}
which would use an actual iterator.
You could now go and add function parameters, but do this only iff you are optimizing for flexibility and nothing else. I already used a code reference in the first example. You can invoke them with the $code->(#args) syntax.
use Carp; # Error handling for writing APIs
sub parse {
my $args = shift;
my $interesting = $args->{interesting} or croak qq("interesting" callback required);
my $kv_splitter = $args->{kv_splitter} or croak qq("kv_splitter" callback required);
my $val_transform= $args->{val_transform} || sub { $_[0] }; # identity by default
my %splitted_values;
LINE: for (#_) {
next LINE unless $interesting->($_);
s/\A\s*|\s*\z//g;
my ($key, $vals) = $kv_splitter->($_);
defined $vals or next LINE;
$splitted_values{$key} = $val_transform->($vals);
}
return \%splitted_values;
}
This could then be called like
my $data = parse {
interesting => sub { length($_[0]) and not $_[0] =~ /^\s*#/ },
kv_splitter => sub { split /\s*=\s*/, $_[0], 2 },
val_transform => sub { [ split /\s*,\s*/, $_[0] ] }, # returns anonymous arrayref
}, #lines;
I think the most modern approach consists in taking advantage of the CPAN modules. In your example, Config::Properties may helps:
use strict;
use warnings;
use Config::Properties;
my $config = Config::Properties->new(file => 'example.properties') or die $!;
my $value = $config->getProperty('key');
As indicated in the posts linked to by #collapsar, Higher-Order Perl is a great read for exploring functional techniques in Perl.
Here is an example that hits your bullet points:
use strict;
use warnings;
use Data::Dumper;
my #filt_rx = ( qr{^\s*\#},
qr{^[\r\n]+$} );
my $kv_rx = qr{^\s*(\w+)\s*=\s*([^\r\n]*)};
my $spl_rx = qr{\s*,\s*};
my $iterator = sub {
my ($fh) = #_;
return sub {
my $line = readline($fh);
return $line;
};
};
my $filter = sub {
my ($it,#r) = #_;
return sub {
my $line;
do {
$line = $it->();
} while ( defined $line
&& grep { $line =~ m/$_/} #r );
return $line;
};
};
my $kv = sub {
my ($line,$rx) = #_;
return ($line =~ m/$rx/);
};
my $spl = sub {
my ($values,$rx) = #_;
return split $rx, $values;
};
my $it = $iterator->( \*DATA );
my $f = $filter->($it,#filt_rx);
my %map;
while ( my $line = $f->() ) {
my ($k,$v) = $kv->($line,$kv_rx);
$map{$k} = [ $spl->($v,$spl_rx) ];
}
print Dumper \%map;
__DATA__
# comment line - ignore
# ignore also empty lines
key1 = value
key2 = value1, value2, value3
It produces the following hash on the provided input:
$VAR1 = {
'key2' => [
'value1',
'value2',
'value3'
],
'key1' => [
'value'
]
};
you might be interested in this SO question as well as this one.
the following code is a self-contained perl script destined to give you an idea of how to implement in perl (only partially in a functional style; in case you don't revulse seeing the particular coding style and/or language construct, i can refine the solution somewhat).
Miguel Prz is right that in most cases you'd search CPAN for solutions to match your requirements.
my (
$is_interesting_line
, $re_is_comment_line
, $re_key_values
, $re_splitter
);
$re_is_comment_line = qr(^\s*#);
$re_key_values = qr(^\s*(\w+)\s*=\s*(.*)$);
$re_splitter = qr(\s*,\s*);
$is_interesting_line = sub {
my $line = shift;
return (
(!(
!defined($line)
|| ($line eq '')
))
&& ($line =~ /$re_key_values/)
);
};
sub strip {
my $line = shift;
# your implementation goes here
return $line;
}
sub parse {
my #lines = #_;
#
my (
$dict
, $interesting_lines
, $k
, $v
);
#
#$interesting_lines =
grep {
&{$is_interesting_line} ( $_ );
} ( map { strip($_); } #lines )
;
$dict = {};
map {
if ($_ =~ /$re_key_values/) {
($k, $v) = ($1, [split(/$re_splitter/, $2)]);
$$dict{$k} = $v;
}
} #$interesting_lines;
return $dict;
} # parse
#
# sample execution goes here
#
my $parse =<<EOL;
# comment
what = is, this, you, wonder
it = is, perl
EOL
parse ( split (/[\r\n]+/, $parse) );

Mapping values with Column header and row header

I have some files with below data.
sample File 1:
sitename1,2009-07-19,"A1",11975,17.23
sitename1,2009-07-19,"A2",11,0.02
sitename1,2009-07-20,"A1",2000,17.23
sitename1,2009-07-20,"A2",538,0.02
I want to map the values in column 4 with column 2 and 3 as shown below.
Output required.
Site,Type,2009-07-19,2009-07-20
sitename1,"A1",11975,2000
sitename1,"A2",11,538
Here is what I have tried so far:
#! /usr/bin/perl -w
use strict;
use warnings;
my $column_header=["Site,Type"];
my $position={};
my $last_position=0;
my $current_event=[];
my $events=[];
while (<STDIN>) {
my ($site,$date,$type,$value,$percent) = split /[,\n]/, $_;
my $event_key = $date;
if (not defined $position->{$event_key}) {
$last_position+=1;
$position->{$event_key}=$last_position;
push #$column_header,$event_key;
}
my $pos = $position->{$event_key};
if (defined $current_event->[$pos]) {
dumpEvent();
}
if (not defined $current_event->[0]) {
$current_event->[0]="$site,$type";
}
$current_event->[$pos]=$value;
}
dumpEvent();
my $order = [];
for (my $scan=0; $scan<scalar(#$column_header); $scan++) {
push #$order,$scan;
}
printLine($column_header);
map { printLine($_) } #$events;
sub printLine {
my $record=shift;
my #result=();
foreach my $offset (#$order) {
if (defined $record->[$offset]) {
push #result,$record->[$offset];
} else {
push #result,"";
}
}
print join(",",#result)."\n";
}
sub dumpEvent {
return unless defined $current_event->[0];
push #$events,$current_event;
$current_event=[];
}
The output i am getting is as below.
*Site,Type,2009-07-19,2009-07-20*
sitename1,"A1",11975,
sitename1,"A2",11,
sitename1,"A1",,14620
sitename1,"A2",,538
If I understand you correctly (and I have to admit I'm only guessing), you have several types of things at different dates and a value for each. Thus you need a data structure like this hash for each site:
$foo = {
site => 'sitename1',
type => 'A1',
dates => [
{
date => '2009-07-19',
value => 11975,
},
{
date => '2009-07-20',
value => 538,
},
],
};
Is that even close?
The folowing code produces the expected result and makes "some" sense. I don't know if it makes real sense.
my %dates;
my %SiteType;
while (<DATA>) {
chomp;
my ($site,$date,$type,$value,$percent) = split /,/;
$dates{$date} = '1';
push #{$SiteType{"$site,$type"}}, $value ;
};
print 'Site,Type,', join(',', sort keys %dates), "\n";
foreach ( sort keys %SiteType) {
print $_, ',', join(',', #{$SiteType{$_}}), "\n";
};

How can I execute a given function on every element of a complicated data structure in Perl?

I want to decode all the HTML entities in a complicated data structure. Basically I'm looking for a "super map()" function. Here's what I have so far:
sub _html_decode {
my $self = shift;
my $ref = shift;
if (ref($ref) eq "HASH") {
$self->_html_decode_hash($ref)
}
if (ref($ref) eq "ARRAY") {
$self->_html_decode_array($ref);
}
}
sub _html_decode_array {
my $self = shift;
my $ref = shift;
unless (#$ref) {return;}
foreach (0 .. (scalar(#$ref) - 1)) {
if (ref($ref->[$_]) eq "HASH") {
$self->_html_decode_hash($ref->[$_]);
}
if (ref($ref->[$_]) eq "ARRAY") {
$self->_html_decode_array($ref->[$_]);
}
else {
$ref->[$_] = decode_entities($ref->[$_]);
}
}
}
sub _html_decode_hash {
my $self = shift;
my $ref = shift;
unless (%$ref) {return;}
while (my ($k, $v) = each %$ref) {
if (ref($v) eq "HASH") {
$self->_html_decode_hash($v);
}
if (ref($v) eq "ARRAY") {
$self->_html_decode_array($v)
}
else {
$ref->{$k} = decode_entities($v);
}
}
}
I think this should do, but I haven't tested it.
sub _html_decode {
my ($self, $ref) = #_;
if (ref($ref) eq "HASH") {
for my $value (values %{$ref}) {
$self->_html_decode($value);
}
}
elsif (ref($ref) eq "ARRAY") {
for my $value (#{$ref}) {
$self->_html_decode($value);
}
}
else {
$_[1] = decode_entities($_[1]);
}
}
I admit the last part isn't pretty though.
To execute a subroutine on every element of an arbitrary complex data structure, check out the Visitor design pattern. Basically, your data structure is an object that knows what elements it still needs to process and it applies your subroutine to them. There is also a bit of the Iterator pattern there because you figure out how to
I have an example in my Netscape::Bookmarks module. That data structure is deeply nested with several different sorts of objects. By using the Visitor pattern, most of the complexity went away.
Besides that, you might want to check out my Object::Iterate module. It has an imap function that works with objects instead of lists. I stole the idea of __next__ from Python and applied it to Perl.
Data::Rmap seems to do this as well. Does anyone have any experience with this module?