Extract subset of XML with XML::Twig - perl

I'm trying to use XML::Twig to extract a subset of an XML document so that I can convert it to CSV.
Here's a sample of my data:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Actions>
<Click>
<Field1>Data1</Field1>
<Field2>Data2</Field2>
</Click>
<Click>
<Field1>Data3</Field1>
<Field2>Data4</Field2>
</Click>
</Actions>
And here's an attempt at coding the desired outcome
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use Text::CSV; # later
use Data::Dumper;
my $file = shift @ARGV or die "Need a file to process\n";
my $twig = XML::Twig->new();
$twig->parsefile($file);
my $root = $twig->root;
my @data;
for my $node ( $twig->findnodes( '//Click/*' ) ) {
my $key = $node->name;
my $val = $node->text;
push @data, { $key => $val };
}
print Dumper \@data;
which gives
$VAR1 = [
{
'Field1' => 'Data1'
},
{
'Field2' => 'Data2'
},
{
'Field1' => 'Data3'
},
{
'Field2' => 'Data4'
}
];
What I'm looking to create is an array of hashes, if that's best
my @AoH = (
{ Field1 => 'Data1', Field2 => 'Data2' },
{ Field1 => 'Data3', Field2 => 'Data4' },
);
I'm not sure how to loop through the data to extract this.

Your structure has two levels, so you need two levels of loops.
my @data;
for my $click_node ( $twig->findnodes( '/Actions/Click' ) ) {
my %click_data;
for my $child_node ( $click_node->findnodes( '*' ) ) {
my $key = $child_node->name;
my $val = $child_node->text;
$click_data{$key} = $val;
}
push @data, \%click_data;
}
local $Data::Dumper::Sortkeys = 1;
print(Dumper(\@data));
Output:
$VAR1 = [
{
'Field1' => 'Data1',
'Field2' => 'Data2'
},
{
'Field1' => 'Data3',
'Field2' => 'Data4'
}
];
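Since the stated goal is CSV, the array of hashes can be turned into rows with a hash slice. This is only a minimal sketch using core Perl: it assumes the field values contain no commas, quotes, or newlines, and the names @columns and @lines are illustrative. For real data, a CSV module such as the Text::CSV already loaded in the question handles quoting properly.

```perl
use strict;
use warnings;

# Sample rows in the shape produced by the nested loops above.
my @data = (
    { Field1 => 'Data1', Field2 => 'Data2' },
    { Field1 => 'Data3', Field2 => 'Data4' },
);

# Collect every field name seen in any row; sort for a stable column order.
my %seen;
$seen{$_} = 1 for map { keys %$_ } @data;
my @columns = sort keys %seen;

# Header line first, then one line per row built with a hash slice.
# Missing fields become empty strings.
my @lines = ( join ',', @columns );
for my $row (@data) {
    push @lines, join ',', map { $_ // '' } @{$row}{@columns};
}

print "$_\n" for @lines;
```

Deriving the header from the data rather than hardcoding it means a Click element with an extra or missing child still produces aligned columns.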

Related

Perl data structure to xml

I have got perl data something like below:
$data = {
id => 1,
name => "A",
users => [ { id => 1, name => "u1" }, { id => 2, name => "u2" } ],
groups => [ { id => 1, name => "g1" } ]
};
I would like to convert this into an xml something like below:
<map>
<item id="1" name="A">
<users>
<user id="1" name="u1"/>
<user id="2" name="u2"/>
</users>
<groups>
<group id="1" name="g1"/>
</groups>
</item>
</map>
I could do that manually creating each line explicitly. However I am looking for any CPAN Module base solution.
I tried XML::Twig but didn't get anywhere. I have used XML::Simple in the past for this kind of thing, but this time I wanted to try something else, as XML::Simple has been getting bad reviews.
You can do it similarly to Sobrique's method but with fewer hardcoded strings, like this:
#!/usr/bin/env perl
use strict; use warnings;
use XML::Twig;
my $data = {
id => 1,
name => "A",
users => [ { id => 1, name => "u1" }, { id => 2, name => "u2" } ],
groups => [ { id => 1, name => "g1" } ]
};
sub array_to_elts {
my ( $root, $name, $arrayref ) = @_;
map { $root->insert_new_elt($name, $_) } @{ $arrayref };
}
my $twig = XML::Twig
->new()
->set_xml_version("1.0")
->set_encoding('utf-8');
my $map = XML::Twig::Elt->new('map');
$twig->set_root($map);
my $item = $map->insert_new_elt(
'item',
{ id => $data->{'id'}, name => $data->{'name'} },
);
my $lines = $item->insert_new_elt('groups');
my $links = $item->insert_new_elt('users' );
array_to_elts($lines, 'group', $data->{'groups'});
array_to_elts($links, 'user', $data->{'users' });
$twig->set_pretty_print('indented');
$twig->print;
You could go to extreme lengths to reduce the hardcoded values and base more on the raw data, but it quickly gets harder to read.
"Generic" way using XML::LibXML. You might need to add new code to the "else" part to handle other types of structures.
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
my $data = {
id => 1,
name => "A",
users => [ { id => 1, name => "u1" },
{ id => 2, name => "u2" } ],
groups => [ { id => 1, name => "g1" } ],
};
sub to_xml {
my ($data, $xml) = @_;
for my $entry (keys %$data) {
my $ref = ref $data->{$entry};
if (not $ref) {
$xml->setAttribute($entry, $data->{$entry});
} elsif ('ARRAY' eq $ref) {
(my $name = $entry) =~ s/s$// or die "Can't guess the element name.\n";
my $list = $xml->addNewChild(q(), $entry);
for my $inner (@{ $data->{$entry} }) {
to_xml($inner, $list->addNewChild(q(), $name));
}
} else {
die "Unhandled structure $ref.\n";
}
}
}
my $xml = 'XML::LibXML::Document'->createDocument;
my $root = $xml->createElement('map');
$xml->setDocumentElement($root);
for my $entry ($data) {
my $item = $root->addNewChild(q(), 'item');
to_xml($entry, $item);
}
print $xml;
Yes, wise choice. XML::Simple ... isn't. It's for simple XML.
As noted in the comments, though, your data is a little ambiguous: specifically, how do you tell what the elements within 'groups' or 'users' should be called?
This looks like you might have parsed some JSON. Indeed, you can turn it straight back into JSON with to_json from the JSON module:
print to_json ( $data, { pretty => 1 } );
The core problem is - where JSON supports arrays, XML does not. So there's really very little you could do that will directly turn your data structure into XML.
However if you don't mind doing a bit of work yourself:
Here's how you assemble some XML using XML::Twig
Assembling XML in Perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new( 'pretty_print' => 'indented' );
$twig->set_root(
XML::Twig::Elt->new(
'map',
)
);
my $item = $twig->root->insert_new_elt('item', { 'id' => 1, 'name' => 'A' } );
my $users = $item ->insert_new_elt( 'users' );
$users -> insert_new_elt ( 'user', { 'id' => 1, 'name' => 'u1' } );
$users -> insert_new_elt ( 'user', { 'id' => 2, 'name' => 'u2' } );
my $groups = $item -> insert_new_elt ('last_child', 'groups');
$groups -> insert_new_elt ( 'group', { 'id' => 1, 'name' => 'g1' } );
$twig->set_xml_version("1.0");
$twig->set_encoding('utf-8');
$twig->print;
Which prints:
<?xml version="1.0" encoding="utf-8"?>
<map>
<item id="1" name="A">
<users>
<user id="2" name="u2"/>
<user id="1" name="u1"/>
</users>
<groups>
<group id="1" name="g1"/>
</groups>
</item>
</map>
Iterating your data structure is left as an exercise for the reader.
As Borodin correctly notes, you have no way to infer map, item, group, or user from your data structure. The latter two you can perhaps infer based on plurals, but given your data set, the best I can come up with is something like this:
use strict;
use warnings;
use XML::Twig;
my $data = {
id => 1,
name => "A",
users => [ { id => 1, name => "u1" }, { id => 2, name => "u2" } ],
groups => [ { id => 1, name => "g1" } ]
};
my $twig = XML::Twig->new( 'pretty_print' => 'indented' );
$twig->set_root( XML::Twig::Elt->new( 'map', ) );
my $item = $twig->root->insert_new_elt('item');
foreach my $key ( keys %$data ) {
if ( not ref $data->{$key} ) {
$item->set_att( $key, $data->{$key} );
next;
}
if ( ref( $data->{$key} ) eq "ARRAY" ) {
my $fakearray = $item->insert_new_elt($key);
foreach my $element ( @{ $data->{$key} } ) {
my $name = $key;
$name =~ s/s$//g;
$fakearray->insert_new_elt( $name, $element );
}
next;
}
if ( ref ( $data -> {$key} ) eq "HASH" ) {
$item -> insert_new_elt( $key, $data -> {$key} );
next;
}
}
$twig->set_xml_version("1.0");
$twig->set_encoding('utf-8');
$twig->print;
This isn't ideal because map is hardcoded, as is item. I also take the very simplistic approach of assuming each array key is a plural ending in s, and strip that s to get the element name.

How to use selectall-hashref for all columns

I would like to implement this subroutine using selectall_hashref:
sub query {
use SQL::Abstract;
my $sql = SQL::Abstract->new;
my ($table, $fields, $where) = @_;
my ($stmt, @bind) = $sql->select($table, $fields, $where);
my $sth = $dbh->prepare($stmt);
$sth->execute(@bind);
my @rows;
while(my @row = $sth->fetchrow_array() ) {
my %data;
@data{ @{$sth->{NAME}} } = @row;
push @rows, \%data;
}
return \@rows;
}
Unfortunately selectall_hashref requires a list of wanted columns. Is there a way to write something similar to my first subroutine?
Obviously this doesn't work:
sub query {
return $dbh->selectall_hashref(shift, q/*/);
}
The expected output could be an array of hashes or a hash of hashes:
{ '1' => { column1 => 'foo', column2 => 'bar' },
'2' => { column1 => '...', column2 => '...' },
... }
or
[ { column1 => 'foo', column2 => 'bar' },
{ column1 => '...', column2 => '...' },
... ]
What you want is selectall_arrayref, not selectall_hashref. That does this exactly.
use DBI;
use Data::Printer;
my $dbh = DBI->connect('DBI:mysql:database=foo;', 'foo', 'bar');
my $foo = $dbh->selectall_arrayref(
'select * from foo',
{ Slice => {} }
);
p $foo;
__END__
\ [
[0] {
id 1,
baz "",
},
[1] {
id 2,
baz "",
},
]
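If the hash-of-hashes shape from the question is preferred, the array of hashes can be re-keyed afterwards. A minimal sketch, assuming the rows have a unique id column; the $rows literal below merely stands in for the selectall_arrayref result so the example runs without a database, and %by_id is an illustrative name.

```perl
use strict;
use warnings;

# Stand-in for the result of selectall_arrayref(..., { Slice => {} }):
# a reference to an array of hash references, one per row.
my $rows = [
    { id => 1, baz => 'foo' },
    { id => 2, baz => 'bar' },
];

# Re-key the rows by the (assumed unique) id column.
# The parentheses make sure the braces are parsed as a block.
my %by_id = map { ( $_->{id} => $_ ) } @$rows;

print "$_: $by_id{$_}{baz}\n" for sort { $a <=> $b } keys %by_id;
```

If two rows shared the same id, the later one would silently replace the earlier one, so this only suits a column that is known to be unique.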

Converting HoA to HoH with counting

Have this code:
use 5.020;
use warnings;
use Data::Dumper;
my %h = (
k1 => [qw(aa1 aa2 aa1)],
k2 => [qw(ab1 ab2 ab3)],
k3 => [qw(ac1 ac1 ac1)],
);
my %h2;
for my $k (keys %h) {
$h2{$k}{$_}++ for (@{$h{$k}});
}
say Dumper \%h2;
produces:
$VAR1 = {
'k1' => {
'aa2' => 1,
'aa1' => 2
},
'k3' => {
'ac1' => 3
},
'k2' => {
'ab1' => 1,
'ab3' => 1,
'ab2' => 1
}
};
Is it possible to write the above code another way (e.g. simpler or more compact)?
Honestly, I don't like the number of times $h2{$k} is evaluated.
my %h2;
for my $k (keys %h) {
my $src = $h{$k};
my $dst = $h2{$k} = {};
++$dst->{$_} for @$src;
}
A subroutine can help make the intent more obvious. Maybe.
sub counts { my %c; ++$c{$_} for @_; \%c }
$h2{$_} = counts(@{ $h{$_} }) for keys %h;
That can be simplified if you do the change in-place.
sub counts { my %c; ++$c{$_} for @_; \%c }
$_ = counts(@$_) for values %h;
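For reference, here is the in-place variant as a complete script run against the question's data. It is only a sketch: it relies on values returning aliases to the hash's elements, so that assigning to $_ replaces each array reference, and the printing loop is added purely for illustration.

```perl
use strict;
use warnings;

my %h = (
    k1 => [qw(aa1 aa2 aa1)],
    k2 => [qw(ab1 ab2 ab3)],
    k3 => [qw(ac1 ac1 ac1)],
);

# Count how often each value occurs in the argument list.
sub counts { my %c; ++$c{$_} for @_; return \%c }

# values() aliases the hash's elements, so assigning to $_ replaces
# each array reference with its hash of counts.
$_ = counts(@$_) for values %h;

# Print one "key value count" line per distinct value, in sorted order.
for my $k ( sort keys %h ) {
    for my $v ( sort keys %{ $h{$k} } ) {
        print "$k $v $h{$k}{$v}\n";
    }
}
```

Note that this overwrites %h, so it only fits when the original arrays are no longer needed.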

Perl: Sorting hash of hash by value descending order

data :
%HoH = (
abc => {
value => "12",
},
xyz => {
number => "100",
},
pqr => {
digit => "5",
}
)
How do I sort the hash of hash by value in descending order?
Output
100
12
5
You can't sort a hash; it won't hold the order. If you want to keep the entries sorted, you'll have to sort the keys based on the number and store them in an array.
#!/usr/bin/perl
use strict;
use warnings;
my %HoH = (
abc => { value => 12 },
xyz => { value => 100},
pqr => { value => 5},
def => { value => 15},
hij => { value => 30},
);
my @sorted_keys = map { $_->[0] }
sort { $b->[1] <=> $a->[1] } # use numeric comparison
map { my $temp;
if ( exists $HoH{$_}{'value'} ) {
$temp = $HoH{$_}{'value'};
} elsif ( exists $HoH{$_}{'number'} ) {
$temp = $HoH{$_}{'number'};
} elsif ( exists $HoH{$_}{'digit'} ) {
$temp = $HoH{$_}{'digit'};
} else {
$temp = 0;
}
[ $_, $temp ] }
(keys %HoH);
for my $key (@sorted_keys) {
my $temp;
if ( exists $HoH{$key}{'value'} ) {
$temp = $HoH{$key}{'value'};
} elsif ( exists $HoH{$key}{'number'} ) {
$temp = $HoH{$key}{'number'};
} elsif ( exists $HoH{$key}{'digit'} ) {
$temp = $HoH{$key}{'digit'};
} else {
$temp = 0;
}
print $key . ":" . $temp ."\n";
}
Output:
xyz:100
hij:30
def:15
abc:12
pqr:5
This sorting technique is called the Schwartzian transform.
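The transform has three steps: decorate each key with its sort value, sort on that value, then strip the decoration. The if/elsif chain can be dropped if each inner hash is assumed to hold exactly one entry, whatever its name. A minimal sketch under that assumption, using the data from the question:

```perl
use strict;
use warnings;

my %HoH = (
    abc => { value  => 12 },
    xyz => { number => 100 },
    pqr => { digit  => 5 },
);

my @sorted_keys =
    map  { $_->[0] }                                   # 3. keep only the key
    sort { $b->[1] <=> $a->[1] }                       # 2. sort numerically, descending
    map  { [ $_, ( values %{ $HoH{$_} } )[0] // 0 ] }  # 1. pair key with its single value
    keys %HoH;

print "$_\n" for @sorted_keys;
```

An inner hash with no entries sorts as 0 here, matching the fallback in the longer version. An inner hash with several entries would contribute an arbitrary one of them, so this shortcut is only safe for single-entry hashes.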
Given you're not actually using the keys for anything, you can flatten the data structure into a single array and then sort it:
use strict;
use warnings;
my %HoH = (
abc => {value => "12",},
xyz => {number => "100",},
pqr => {digit => "5",},
);
my @numbers = sort {$b <=> $a} map {values %$_} values %HoH;
print "$_\n" for @numbers;
Outputs:
100
12
5
However, if you want to use the additional key information, then you'll need to fold your hash of hashes into an array, and then you can sort however you like:
my @array;
while (my ($k, $ref) = each %HoH) {
while (my ($k2, $v) = each %$ref) {
push @array, [$k, $k2, $v];
}
}
@array = sort {$b->[2] <=> $a->[2]} @array;
use Data::Dump;
dd \@array;
Outputs:
[
["xyz", "number", 100],
["abc", "value", 12],
["pqr", "digit", 5],
]
I came up with this solution
#!/usr/bin/perl
use strict;
use warnings;
my %HoH = (
abc => {
value => "12",
},
xyz => {
number => "100",
},
pqr => {
digit => "5",
}
);
my %rever;
for my $TopKey(keys %HoH){
for my $value(values %{ $HoH{$TopKey} }){
push @{ $rever{$value} }, $TopKey;
}
}
my @nums = sort {$b <=> $a} (keys(%rever));
print $_, "\n" for @nums;
I reversed the values in case you still needed to use the key names.
This is how it looks after using Dumper.
$VAR1 = '100';
$VAR2 = [
'xyz'
];
$VAR3 = '12';
$VAR4 = [
'abc'
];
$VAR5 = '5';
$VAR6 = [
'pqr'
];

Perl adding Lines into a Multi-Dimensional Hash

Hello, I want to split a line and add the values to a multi-dimensional hash. This is what the lines look like:
__DATA__
49839382;Test1;bgsae;npvxs
49839384;Test2;bgsae;npvxs
49839387;Test3;bgsae;npvxs
So what I am doing now is:
my %prefix = map { chomp; split ';' } <DATA>;
But now I can only access Test1 with:
print $prefix{"49839382"}
But how can I also add the bgsae to the hash so I can access it with
$prefix{"49839382"}{"Test1"}
Thank you for your help.
What structure are you trying to build?
use Data::Dumper;
my %prefix = map { chomp (my @fields = split /;/); $fields[0] => { @fields[1 .. $#fields] } } <DATA>;
print Dumper \%prefix;
Output:
$VAR1 = {
'49839384' => {
'Test2' => 'bgsae',
'npvxs' => undef
},
'49839382' => {
'Test1' => 'bgsae',
'npvxs' => undef
},
'49839387' => {
'npvxs' => undef,
'Test3' => 'bgsae'
}
};
Or do you need a deeper hash?
my %prefix;
for (<DATA>) {
chomp;
my $ref = \%prefix;
for (split /;/) {
warn "[$_]";
$ref->{$_} = {};
$ref = $ref->{$_};
}
}
Returns:
$VAR1 = {
'49839384' => {
'Test2' => {
'bgsae' => {
'npvxs' => {}
}
}
},
'49839382' => {
'Test1' => {
'bgsae' => {
'npvxs' => {}
}
}
},
'49839387' => {
'Test3' => {
'bgsae' => {
'npvxs' => {}
}
}
}
};
I don't know what you need the data for, but at a guess you want something more like this.
It builds a hash of arrays, using the first field as the key for the data, and the remaining three in an array for the value. So you can access the test number as $data{'49839382'}[0] etc.
use strict;
use warnings;
my %data = map {
chomp;
my @fields = split /;/;
shift @fields => \@fields;
} <DATA>;
use Data::Dumper;
print Data::Dumper->Dump([\%data], ['*data']);
__DATA__
49839382;Test1;bgsae;npvxs
49839384;Test2;bgsae;npvxs
49839387;Test3;bgsae;npvxs
Output:
%data = (
'49839384' => [
'Test2',
'bgsae',
'npvxs'
],
'49839382' => [
'Test1',
'bgsae',
'npvxs'
],
'49839387' => [
'Test3',
'bgsae',
'npvxs'
]
);
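If named access is preferred over array indexes, the same split can fill a hash of hashes through a hash slice. This is only a sketch: the column names in @names are hypothetical, since the question does not say what the fields mean, and the lines are held in an array here instead of being read from DATA so the example is self-contained.

```perl
use strict;
use warnings;

my @lines = (
    '49839382;Test1;bgsae;npvxs',
    '49839384;Test2;bgsae;npvxs',
    '49839387;Test3;bgsae;npvxs',
);

# Hypothetical column names for the three fields after the id.
my @names = qw(test code1 code2);

my %data;
for my $line (@lines) {
    my ( $id, @fields ) = split /;/, $line;

    # Hash slice: pair each name with the field in the same position.
    # The inner hash is autovivified on first use.
    @{ $data{$id} }{@names} = @fields;
}

print "$data{'49839382'}{test}\n";
print "$data{'49839382'}{code1}\n";
```

With this layout a lookup such as $data{'49839382'}{code1} reads more clearly than $data{'49839382'}[1], at the cost of having to decide on the names up front.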