Perl data structure to xml - perl

I have got perl data something like below:
$data = {
id => 1,
name => "A",
users => [ { id => 1, name => "u1" }, { id => 2, name => "u2" } ],
groups => [ { id => 1, name => "g1" } ]
};
I would like to convert this into an xml something like below:
<map>
<item id="1" name="A">
<users>
<user id="1" name="u1"/>
<user id="2" name="u2"/>
</users>
<groups>
<group id="1" name="g1"/>
</groups>
</item>
</map>
I could do that manually creating each line explicitly. However I am looking for any CPAN Module base solution.
I tried XML::Twig but didn't go anywhere. I have used XML::Simple in the past for such thing but this time wanted to try something else as XML::Simple has been getting bad reviews.

You can do it similarly to Sobrique's method but with less hardcoded strings, like this:
#!/usr/bin/env perl
use strict; use warnings;
use XML::Twig;
my $data = {
id => 1,
name => "A",
users => [ { id => 1, name => "u1" }, { id => 2, name => "u2" } ],
groups => [ { id => 1, name => "g1" } ]
};
sub array_to_elts {
my ( $root, $name, $arrayref ) = #_;
map { $root->insert_new_elt($name, $_) } #{ $arrayref };
}
my $twig = XML::Twig
->new()
->set_xml_version("1.0")
->set_encoding('utf-8');
my $map = XML::Twig::Elt->new('map');
$twig->set_root($map);
my $item = $map->insert_new_elt(
'item',
{ id => $data->{'id'}, name => $data->{'name'} },
);
my $lines = $item->insert_new_elt('groups');
my $links = $item->insert_new_elt('users' );
array_to_elts($lines, 'group', $data->{'groups'});
array_to_elts($links, 'user', $data->{'users' });
$twig->set_pretty_print('indented');
$twig->print;
You could go to extreme lengths to reduce the hardcoded vals and base more off the raw data, but it quickly gets harder to read..

"Generic" way using XML::LibXML. You might need to add new code to the "else" part to handle other types of structures.
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
my $data = {
id => 1,
name => "A",
users => [ { id => 1, name => "u1" },
{ id => 2, name => "u2" } ],
groups => [ { id => 1, name => "g1" } ],
};
sub to_xml {
my ($data, $xml) = #_;
for my $entry (keys %$data) {
my $ref = ref $data->{$entry};
if (not $ref) {
$xml->setAttribute($entry, $data->{$entry});
} elsif ('ARRAY' eq $ref) {
(my $name = $entry) =~ s/s$// or die "Can't guess the element name.\n";
my $list = $xml->addNewChild(q(), $entry);
for my $inner (#{ $data->{$entry} }) {
to_xml($inner, $list->addNewChild(q(), $name));
}
} else {
die "Unhandled structure $ref.\n";
}
}
}
my $xml = 'XML::LibXML::Document'->createDocument;
my $root = $xml->createElement('map');
$xml->setDocumentElement($root);
for my $entry ($data) {
my $item = $root->addNewChild(q(), 'item');
to_xml($entry, $item);
}
print $xml;

Yes, wise choice. XML::Simple ... isn't. It's for simple XML.
As noted in the comments though - your data is a little ambiguous - specifically, how do you tell what the elements should be called within 'groups' or 'users'.
This looks like you might have parsed some JSON. (Indeed, you can turn it straight back into JSON:
print to_json ( $data, { pretty => 1 } );
The core problem is - where JSON supports arrays, XML does not. So there's really very little you could do that will directly turn your data structure into XML.
However if you don't mind doing a bit of work yourself:
Here's how you assemble some XML using XML::Twig
Assembling XML in Perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new( 'pretty_print' => 'indented' );
$twig->set_root(
XML::Twig::Elt->new(
'map',
)
);
my $item = $twig->root->insert_new_elt('item', { 'id' => 1, 'name' => 'A' } );
my $users = $item ->insert_new_elt( 'users' );
$users -> insert_new_elt ( 'user', { 'id' => 1, 'name' => 'u1' } );
$users -> insert_new_elt ( 'user', { 'id' => 2, 'name' => 'u2' } );
my $groups = $item -> insert_new_elt ('last_child', 'groups');
$groups -> insert_new_elt ( 'group', { 'id' => 1, 'name' => 'g1' } );
$twig->set_xml_version("1.0");
$twig->set_encoding('utf-8');
$twig->print;
Which prints:
<?xml version="1.0" encoding="utf-8"?>
<map>
<item id="1" name="A">
<users>
<user id="2" name="u2"/>
<user id="1" name="u1"/>
</users>
<groups>
<group id="1" name="g1"/>
</groups>
</item>
</map>
Iterating your data structure is left as an exercise for the reader.
As Borodin correctly notes - you have no way to infer map item group or user from your data structure. The latter two you can perhaps infer based on plurals, but given your data set, the best I can come up with is something like this:
use strict;
use warnings;
use XML::Twig;
my $data = {
id => 1,
name => "A",
users => [ { id => 1, name => "u1" }, { id => 2, name => "u2" } ],
groups => [ { id => 1, name => "g1" } ]
};
my $twig = XML::Twig->new( 'pretty_print' => 'indented' );
$twig->set_root( XML::Twig::Elt->new( 'map', ) );
my $item = $twig->root->insert_new_elt('item');
foreach my $key ( keys %$data ) {
if ( not ref $data->{$key} ) {
$item->set_att( $key, $data->{$key} );
next;
}
if ( ref( $data->{$key} ) eq "ARRAY" ) {
my $fakearray = $item->insert_new_elt($key);
foreach my $element ( #{ $data->{$key} } ) {
my $name = $key;
$name =~ s/s$//g;
$fakearray->insert_new_elt( $name, $element );
}
next;
}
if ( ref ( $data -> {$key} ) eq "HASH" ) {
$item -> insert_new_elt( $key, $data -> {$key} );
next;
}
}
$twig->set_xml_version("1.0");
$twig->set_encoding('utf-8');
$twig->print;
This isn't ideal because - map is hardcoded, as is item. And I take the very simplistic approach of assuming the array has an s on the end, to pluralise it.

Related

Extract subset of XML with XML::Twig

I'm trying to use
XML::Twig
to extract a subset of an XML document so that I can convert it to CSV.
Here's a sample of my data
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Actions>
<Click>
<Field1>Data1</Field1>
<Field2>Data2</Field2>
</Click>
<Click>
<Field1>Data3</Field1>
<Field2>Data4</Field2>
</Click>
</Actions>
And here's an attempt at coding the desired outcome
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use Text::CSV; # later
use Data::Dumper;
my $file = shift #ARGV or die "Need a file to process: $!";
my $twig = XML::Twig->new();
$twig->parsefile($file);
my $root = $twig->root;
my #data;
for my $node ( $twig->findnodes( '//Click/*' ) ) {
my $key = $node->name;
my $val = $node->text;
push #data, { $key => $val }
}
print Dumper \#data;
which gives
$VAR1 = [
{
'Field1' => 'Data1'
},
{
'Field2' => 'Data2'
},
{
'Field1' => 'Data3'
},
{
'Field2' => 'Data4'
}
];
What I'm looking to create is an array of hashes, if that's best
my #AoH = (
{ Field1 => 'Data1', Field2 => 'Data2' },
{ Field1 => 'Data3', Field2 => 'Data4' },
)
I'm not sure how to loop through the data to extract this.
You structure has two levels, so you need two levels of loops.
my #data;
for my $click_node ( $twig->findnodes( '/Actions/Click' ) ) {
my %click_data;
for my $child_node ( $click_node->findnodes( '*' ) ) {
my $key = $child_node->name;
my $val = $child_node->text;
$click_data{$key} = $val;
}
push #data, \%click_data;
}
local $Data::Dumper::Sortkeys = 1;
print(Dumper(\#data));
Output:
$VAR1 = [
{
'Field1' => 'Data1',
'Field2' => 'Data2'
},
{
'Field1' => 'Data3',
'Field2' => 'Data4'
}
];

executing a function within an array within a hash in perl

I have a Perl data structurte like so
%myhash = (
k1 => v1,
kArray => [
{
name => "anonymous hash",
...
},
\&funcThatReturnsHash,
{
name => "another anonymous hash",
...
}
]
);
Elsewhere I iterate through the list in kArray which contains a bunch of hashes. I would like to either process the actual hash OR the hash returned by the function.
foreach my $elem( #{myhash{kArray}} ) {
if (ref($elem) == "CODE") {
%thisHash = &$elem;
}
else {
%thisHash = %$elem;
}
...
}
However ref ($elem) is always scalar or undefined. I tried func, &func, \&func, \%{&func}, in %myhash to no effect.
how do I extract the hash within the function in the main body?
Apart from the code sample you give being invalid Perl, the main problems seem to be that you are using == to compare strings instead of eq, and you are assigning a hash reference to a hash variable %thishash. I assure you that ref $elem never returns SCALAR with the data you show
It would help you enormously if you followed the common advice to use strict and use warnings at the top of your code
This will work for you
for my $elem ( #{ $myhash{kArray} } ) {
my $this_hash;
if ( ref $elem eq 'CODE' ) {
$this_hash = $elem->();
}
else {
$this_hash = $elem;
}
# Do stuff with $this_hash
}
or you could just use a map like this
use strict;
use warnings;
use 5.010;
use Data::Dump;
my %myhash = (
k1 => v1,
kArray => [
{
name => "anonymous hash",
},
\&funcThatReturnsHash,
{
name => "another anonymous hash",
}
]
);
for my $hash ( map { ref eq 'CODE' ? $_->() : $_ } #{ $myhash{kArray} } ) {
say $hash->{name};
}
sub funcThatReturnsHash {
{ name => 'a third anonymous hash' };
}
output
anonymous hash
a third anonymous hash
another anonymous hash
If you turn on strict and warnings, you'll see that:
foreach my $elem(#{mynahs{kArray}}) {
Isn't valid. You need at the very least a $ before mynahs.
But given something like this - your approach works - here's an example using map to 'run' the code references:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
sub gimme_hash {
return { 'fish' => 'paste' };
}
my $stuff =
[ { 'anon1' => 'value' },
\&gimme_hash,
{ 'anon2' => 'anothervalue' }, ];
my $newstuff = [ map { ref $_ eq "CODE" ? $_->() : $_ } #$stuff ];
print Dumper $newstuff;
Turns that hash into:
$VAR1 = [
{
'anon1' => 'value'
},
{
'fish' => 'paste'
},
{
'anon2' => 'anothervalue'
}
];
But your approach does work:
foreach my $element ( #$stuff ) {
my %myhash;
if ( ref $element eq "CODE" ) {
%myhash = %{$element -> ()};
}
else {
%myhash = %$element;
}
print Dumper \%myhash;
}
Gives:
$VAR1 = {
'anon1' => 'value'
};
$VAR1 = {
'fish' => 'paste'
};
$VAR1 = {
'anon2' => 'anothervalue'
};

How can I find which keys in a Perl multi-level hash correspond to a given value?

I have a data structure which looks like this:
my %hoh = (
'T431567' => {
machin => '01',
bidule => '02',
truc => '03',
},
'T123456' => {
machin => '97',
bidule => '99',
truc => '69',
},
'T444444' => {
machin => '12',
bidule => '64',
truc => '78',
},
);
I want to search the various values of truc for a particular value and find the top-level attribute which corresponds to that entry. For example, looking for a value of 78, I want to find the result 'T444444', because $hoh{T444444}{truc} is 78.
How can I do this, please?
You can do this with grep:
my #keys = grep { $hoh{$_}{truc} == 78 } keys %hoh;
Note that this can return more than one key, if there are duplicate values in the hash. Also note that this is not particularly efficient, since it has to search the entire hash. In most cases it's probably fine, but if the hash can be very large and you may need to run lots of such queries against it, it may be more efficient to build a reverse index as suggested by Sobrique:
my %trucs;
foreach my $part (keys %hoh) {
my $val = $hoh{$part}{truc};
push #{ $trucs{$val} }, $part;
}
my #keys = #{ $trucs{78} };
or, more generally:
my %index;
foreach my $part (keys %hoh) {
my %data = %{ $hoh{$part} };
foreach my $key (keys %data) {
my $val = $data{$key};
push #{ $index{$key}{$val} }, $part;
}
}
my #keys = #{ $index{truc}{78} };
Can't with that data structure as is - There is no 'backwards' relationship from value to key without you creating it.
You've two options - run a search, or create an 'index'. Practically speaking, these are the same, just one saves the results.
my %index;
foreach my $key ( keys %hoh ) {
my $truc = $hoh{$key}{'truc'};
$index{$truc} = $key;
}
Note - won't do anything clever if the 'truc' numbers are duplicated - it'll overwrite. (Handling this is left as an exercise to the reader).
This solution is similar to those already posted, but it uses the each operator to process the original hash in fewer lines of code, and probably more quickly.
I have added the dump output only so that you can see the form of the structure that is built.
use strict;
use warnings;
my %hoh = (
T123456 => { bidule => '99', machin => '97', truc => '69' },
T431567 => { bidule => '02', machin => '01', truc => '03' },
T444444 => { bidule => '64', machin => '12', truc => '78' },
);
my %trucs;
while ( my ($key, $val) = each %hoh ) {
next unless defined( my $truc = $val->{truc} );
push #{ $trucs{$truc} }, $key ;
}
use Data::Dump;
dd \%trucs;
print "\n";
print "$_\n" for #{ $trucs{78} };
output
{ "03" => ["T431567"], "69" => ["T123456"], "78" => ["T444444"] }
T444444
If you can guarantee that the answer is unique, i.e. that there is never more than one element of the original hash that has a given value for the truc entry, or you are interested only in the last one found, then you can write this still more neatly
my %trucs;
while ( my ($key, $val) = each %hoh ) {
next unless defined( my $truc = $val->{truc} );
$trucs{$truc} = $key ;
}
print $trucs{78}, "\n";
output
T444444
Simplest of all, if there is always a truc entry in each second-level hash, and its values is guaranteed to be unique, then this will do the job
my %trucs = map { $hoh{$_}{truc} => $_ } keys %hoh;
print $trucs{78}, "\n";
with the output as above.

Use Mango to search MongoDB by multiple ObjectIDs

How can I find multiple MongoDB documents by their ObjectIDs using Mango in one go. I currently have a non-blocking sub which I'm using to filter out the docs that don't match in a grep statement but was wondering if it's possible to pass the ObjectIDs (maybe an array of instances of Mango::BSON::ObjectID) to the find method?
I also think it's not very efficient this way since big collections will bare a hefty price!
#items;
$self->mango->db->collection('items')->find()->all(sub {
my ($collection, $err, $items) = #_;
return $self->render_exception($err) if $err;
my #oids = $self->req->params->param('ids[]');
foreach my $item (#$items) {
push #items, $item if grep (/$item->{_id}/, #oids);
}
$self->render(json => {items => \#items});
});
This sub is inside one of my Mojolicous controllers which responds to a JSON call.
I'm using the following:
Mojolicous 4.91
Mango 0.24
MongoDB 2.4.9
Thanks in advance.
Update
I have applied Neil's logic and now it works fine.
my #oids = map { Mango::BSON::ObjectID->new($_) } ($self->req->params->param('ids[]'));
print Dumper(#oids),"\n";
$self->mango->db->collection('items')->find({ "_id" => {'$in' => \#oids} })->all(sub {
my ($collection, $err, $items) = #_;
return $self->render_exception($err) if $err;
print Dumper($items),"\n";
$self->render(json => {items => $items});
});
The dumped OIDs are as follows:
$VAR1 = bless( {
'oid' => '52faf6de10d041d196cb545b'
}, 'Mango::BSON::ObjectID' );
$VAR2 = bless( {
'oid' => '5300409310d041d196cb545d'
}, 'Mango::BSON::ObjectID' );
and the found objects will look something like this (depending on your structure):
$VAR1 = [
{
'_id' => bless( {
'oid' => '52faf6de10d041d196cb545b'
}, 'Mango::BSON::ObjectID' ),
'class' => [
'Sport'
],
'make' => '4321',
'year' => 2012
},
{
'_id' => bless( {
'oid' => '5300409310d041d196cb545d'
}, 'Mango::BSON::ObjectID' ),
'class' => [
'Classic'
],
'make' => '1234',
'year' => 2014
}
]
You seem to be looking for the $in operator. So rather than looping all the items to find the ones that match you pass in the _id values sent to your controller within the call to find():
my #oids = $self->req->params->param('ids[]');
$self->mango->db->collection('items')
->find({ "_id" => { '$in' => \#oids } })->all(sub {
my ($collection, $err, $items) = #_;
return $self->render_exception($err) if $err;
$self->render(json => {items => $items});
});
Update
Actually had some time to test this with Mango and managed to get a working case:
{ "_id" : ObjectId("5345faf8a1f97e61848485e8"), "prerequisites" : [ "a" ] }
{ "_id" : ObjectId("5345fafea1f97e61848485e9"), "prerequisites" : [ "a", "b" ] }
{ "_id" : ObjectId("5345fb02a1f97e61848485ea"), "prerequisites" : [ "a", "c" ] }
And then running with the code:
#!/usr/bin/env perl
use Modern::Perl;
use EV;
use AnyEvent;
use Data::Dumper;
use Mango;
use Mango::BSON::ObjectID;
my $mango = Mango->new();
my $cv = AE::cv;
my $col = $mango->db('test')->collection('courses');
my #oids = (
"5345fafea1f97e61848485e9",
"5345fb02a1f97e61848485ea"
);
#oids = map { Mango::BSON::ObjectID->new($_) } #oids;
$col->find({ '_id' => { '$in' => \#oids } })->all( sub {
my ( $cursor, $err, $docs ) = #_;
$cv->send( Dumper( $docs ) );
});
say $cv->recv;
So that actually will select the correct documents from the sample, and it does seem that the ObjectID values need to be passed in that way in order to match.
At least there should be enough there to debug from.

Perl adding Lines into a Multi-Dimensional Hash

Hello I want to split a Line and add the Values in to a multi dimensional Hash. This is how the Lines look like:
__DATA__
49839382;Test1;bgsae;npvxs
49839384;Test2;bgsae;npvxs
49839387;Test3;bgsae;npvxs
So what I am doing now is:
my %prefix = map { chomp; split ';' } <DATA>;
But now I can only access Test1 with:
print $prefix{"49839382"}
But how can I also add the bgsae to the Hash so I can access is with
$prefix{"49839382"}{"Test1"}
Thank you for your help.
What structure are you trying to build?
use Data::Dumper;
my %prefix = map { chomp (my #fields = split /;/); $fields[0] => { #fields[1 .. $#fields] } } <DATA>;
print Dumper \%prefix;
Output:
$VAR1 = {
'49839384' => {
'Test2' => 'bgsae',
'npvxs' => undef
},
'49839382' => {
'Test1' => 'bgsae',
'npvxs' => undef
},
'49839387' => {
'npvxs' => undef,
'Test3' => 'bgsae'
}
};
Or do you need a deeper hash?
my %prefix;
for (<DATA>) {
chomp;
my $ref = \%prefix;
for (split /;/) {
warn "[$_]";
$ref->{$_} = {};
$ref = $ref->{$_};
}
}
Returns:
$VAR1 = {
'49839384' => {
'Test2' => {
'bgsae' => {
'npvxs' => {}
}
}
},
'49839382' => {
'Test1' => {
'bgsae' => {
'npvxs' => {}
}
}
},
'49839387' => {
'Test3' => {
'bgsae' => {
'npvxs' => {}
}
}
}
};
I don't know what you need the data for, but at a guess you want something more like this.
It builds a hash of arrays, using the first field as the key for the data, and the remaining three in an array for the value. So you can access the test number as $data{'49839382'}[0] etc.
use strict;
use warnings;
my %data = map {
chomp;
my #fields = split /;/;
shift #fields => \#fields;
} <DATA>;
use Data::Dumper;
print Data::Dumper->Dump([\%data], ['*data']);
__DATA__
49839382;Test1;bgsae;npvxs
49839384;Test2;bgsae;npvxs
49839387;Test3;bgsae;npvxs
output
%data = (
'49839384' => [
'Test2',
'bgsae',
'npvxs'
],
'49839382' => [
'Test1',
'bgsae',
'npvxs'
],
'49839387' => [
'Test3',
'bgsae',
'npvxs'
]
);