XML::Simple, XML nodes turned into value of 'name' nodes - perl

I am using Perl with XML::Simple to convert a hash into an XML document.
My script looks as follows:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml_simple = XML::Simple->new( NoAttr => 1,
KeepRoot => 1);
my $hash = { output => { 'products' => [ { 'product' => { 'titel' => 'ABN AMRO Bank hypotheken',
'owner' => 'ABN AMRO Hypotheken Groep',
'code' => 'ABN AMRO BANK R' } },
{ 'product' => { 'titel' => 'Aegon',
'owner' => 'AEGON Hypotheken',
'code' => 'AEGON pilot' } } ],
'date' => '2012-02-20'} };
my $xml = $xml_simple->XMLout( $hash );
print Dumper( $xml );
The output I am getting is:
<output>
<date>2012-02-20</date>
<products>
<name>product</name>
<code>ABN AMRO BANK R</code>
<owner>ABN AMRO Hypotheken Groep</owner>
<titel>ABN AMRO Bank hypotheken</titel>
</products>
<products>
<name>product</name>
<code>AEGON pilot</code>
<owner>AEGON Hypotheken</owner>
<titel>Aegon</titel>
</products>
</output>
but what I am looking for is this (see the 'product' nodes):
<output>
<date>2012-02-20</date>
<products>
<product>
<code>ABN AMRO BANK R</code>
<owner>ABN AMRO Hypotheken Groep</owner>
<titel>ABN AMRO Bank hypotheken</titel>
</product>
<product>
<code>AEGON pilot</code>
<owner>AEGON Hypotheken</owner>
<titel>Aegon</titel>
</product>
</products>
</output>
Is this doable with XML::Simple or should I use a different module?

You can let XML::Simple tell you what data structure it wants
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $desired_xml = << 'END';
<myxml>
<output>
<date>2012-02-20</date>
<products>
<product>
<code>ABN AMRO BANK R</code>
<owner>ABN AMRO Hypotheken Groep</owner>
<titel>ABN AMRO Bank hypotheken</titel>
</product>
<product>
<code>AEGON pilot</code>
<owner>AEGON Hypotheken</owner>
<titel>Aegon</titel>
</product>
</products>
</output>
</myxml>
END
#print $desired_xml;
my $xml_simple = XML::Simple->new(
NoAttr => 1,
KeepRoot => 1
);
#my $hash = XMLin( $desired_xml, forcearray => 1 );
my $hash = {
output => [
{
date => ["2012-02-20"],
products => [
{
product => [
{
code => ["ABN AMRO BANK R"],
owner => ["ABN AMRO Hypotheken Groep"],
titel => ["ABN AMRO Bank hypotheken"],
},
{
code => ["AEGON pilot"],
owner => ["AEGON Hypotheken"],
titel => ["Aegon"],
},
],
},
],
},
],
};
#print Data::Dumper->Dump( [$hash], [qw(hash)] );
my $xml = $xml_simple->XMLout($hash);
print Data::Dumper->Dump( [$xml], [qw(xml)] );

Related

Data::Dumper::Freezer proper usage

I'm trying to log data structures in an old and big Perl project. In order to do so, I use Data::Dumper, however, some structures are a bit too large and spam the log. So I'm looking for a way to log them in a less verbose manner.
Now Dumper's doc mentions $Data::Dumper::Freezer = <method_name> variable that can be used to fix that. I've tried using that.
Adding a serializer method that returns "shortened" value results in nothing, though the method gets called. Making the serializer method act on $_[0] causes the needed effect, but spoils the original data structure.
I'm confused. What am I doing wrong? How can I fix it?
Here's a refined sample code:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Indent = 0;
$\="\n";
my $x = Foo->new ( answer => 42, use => "force" );
my $y = { foo => $x };
print "initial plain:\n\t", Dumper( $x );
print "initial compound:\n\t", Dumper( $y );
{
local $Data::Dumper::Freezer = 'freeze_pure';
print "still not abbreviated data:\n\t", Dumper( $y );
};
{
local $Data::Dumper::Freezer = 'freeze_replace';
print "abbreviated data:\n\t", Dumper( $y );
};
print "initial data is still intact:\n\t", Dumper( $x );
print "compound data is corrupted:\n\t", Dumper( $y );
package Foo;
sub new {
my $class = shift;
return bless { #_ }, $class;
};
sub freeze_pure {
my $self = $_[0];
warn "# In freeze_pure";
return bless {
values => join ",", values %$self
}, (ref $self) . "::short";
};
sub freeze_replace {
my $self = $_[0];
warn "# In freeze_replace";
$_[0] = bless {
values => join ",", values %$self
}, (ref $self) . "::short";
return;
};
And output:
initial plain:
$VAR1 = bless( {'use' => 'force','answer' => 42}, 'Foo' );
initial compound:
$VAR1 = {'foo' => bless( {'use' => 'force','answer' => 42}, 'Foo' )};
# In freeze_pure at dumper-freezer.pl line 36.
still not abbreviated data:
$VAR1 = {'foo' => bless( {'use' => 'force','answer' => 42}, 'Foo' )};
# In freeze_replace at dumper-freezer.pl line 42.
abbreviated data:
$VAR1 = {'foo' => bless( {'values' => 'force,42'}, 'Foo::short' )};
initial data is still intact:
$VAR1 = bless( {'use' => 'force','answer' => 42}, 'Foo' );
compound data is corrupted:
$VAR1 = {'foo' => bless( {'values' => 'force,42'}, 'Foo::short' )};
Although the documentation is a bit sparse, the intended use of freezer/toaster is data serialization/de-serialization, not prettification of debugging output.
So, Data::Dumper calls the freezer method, but doesn't use the return value. The idea is probably that if you're going to serialize an object, you won't be messing with it again until you de-serialize it, so there's no problem with changing the object itself.
Here's the relevant section of code from the Data::Dumper source:
# Call the freezer method if it's specified and the object has the
# method. Trap errors and warn() instead of die()ing, like the XS
# implementation.
my $freezer = $s->{freezer};
if ($freezer and UNIVERSAL::can($val, $freezer)) {
eval { $val->$freezer() };
warn "WARNING(Freezer method call failed): $#" if $#;
}
If you just want to reduce the size of the output in your logs, you can remove newlines and indentation by setting $Data::Dumper::Indent to zero:
use Data::Dumper;
use WWW::Mechanize;
$Data::Dumper::Indent = 0;
my $mech = WWW::Mechanize->new;
print Dumper $mech;
Output:
$VAR1 = bless( {'headers' => {},'ssl_opts' => {'verify_hostname' => 1},'forms' => undef,'page_stack' => [],'text' => undef,'requests_redirectable' => ['GET','HEAD','POST'],'timeout' => 180,'onerror' => sub { "DUMMY" },'current_form' => undef,'links' => undef,'max_redirect' => 7,'quiet' => 0,'images' => undef,'noproxy' => 0,'stack_depth' => 8675309,'show_progress' => undef,'protocols_forbidden' => undef,'no_proxy' => [],'handlers' => {'request_prepare' => bless( [{'owner' => 'LWP::UserAgent::cookie_jar','callback' => sub { "DUMMY" },'line' => '/home/foo/perl5/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/LWP/UserAgent.pm:705'}], 'HTTP::Config' ),'response_header' => bless( [{'owner' => 'LWP::UserAgent::parse_head','callback' => sub { "DUMMY" },'m_media_type' => 'html','line' => '/home/foo/perl5/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/LWP/UserAgent.pm:684'}], 'HTTP::Config' ),'response_done' => bless( [{'owner' => 'LWP::UserAgent::cookie_jar','callback' => sub { "DUMMY" },'line' => '/home/foo/perl5/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/LWP/UserAgent.pm:708'}], 'HTTP::Config' )},'onwarn' => sub { "DUMMY" },'protocols_allowed' => undef,'use_eval' => 1,'local_address' => undef,'autocheck' => 1,'title' => undef,'def_headers' => bless( {'user-agent' => 'WWW-Mechanize/1.75'}, 'HTTP::Headers' ),'cookie_jar' => bless( {'COOKIES' => {}}, 'HTTP::Cookies' ),'proxy' => {},'max_size' => undef}, 'WWW::Mechanize' );
This is still a lot of output, but it's certainly more compact than:
$VAR1 = bless( {
'headers' => {},
'ssl_opts' => {
'verify_hostname' => 1
},
'forms' => undef,
'page_stack' => [],
'text' => undef,
'requests_redirectable' => [
'GET',
'HEAD',
'POST'
],
'timeout' => 180,
'onerror' => sub { "DUMMY" },
'current_form' => undef,
'links' => undef,
'max_redirect' => 7,
'quiet' => 0,
'images' => undef,
'noproxy' => 0,
'stack_depth' => 8675309,
'show_progress' => undef,
'protocols_forbidden' => undef,
'no_proxy' => [],
'handlers' => {
'request_prepare' => bless( [
{
'owner' => 'LWP::UserAgent::cookie_jar',
'callback' => sub { "DUMMY" },
'line' => '/home/foo/perl5/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/LWP/UserAgent.pm:705'
}
], 'HTTP::Config' ),
'response_header' => bless( [
{
'owner' => 'LWP::UserAgent::parse_head',
'callback' => sub { "DUMMY" },
'm_media_type' => 'html',
'line' => '/home/foo/perl5/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/LWP/UserAgent.pm:684'
}
], 'HTTP::Config' ),
'response_done' => bless( [
{
'owner' => 'LWP::UserAgent::cookie_jar',
'callback' => sub { "DUMMY" },
'line' => '/home/foo/perl5/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/LWP/UserAgent.pm:708'
}
], 'HTTP::Config' )
},
'onwarn' => sub { "DUMMY" },
'protocols_allowed' => undef,
'use_eval' => 1,
'local_address' => undef,
'autocheck' => 1,
'title' => undef,
'def_headers' => bless( {
'user-agent' => 'WWW-Mechanize/1.75'
}, 'HTTP::Headers' ),
'cookie_jar' => bless( {
'COOKIES' => {}
}, 'HTTP::Cookies' ),
'proxy' => {},
'max_size' => undef
}, 'WWW::Mechanize' );
Alternatively, you could try Data::Dump, which allows you to filter the output using Data::Dump::Filtered. I prefer Data::Dump to Data::Dumper anyway because I think it has more sensible defaults (e.g. outputting escape sequences for whitespace other than spaces).
I haven't used the filtering feature yet, but brian d foy wrote a nice article about it with several examples.

perl Email::MIME not working intermittent

the following code sometimes work, and sometimes do not. It is runnign on linux, where postfix is installed, i disabled it and stopped the service. does this need postfix to run?
when i run this test code in terminal i get no error and no email.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use Email::MIME;
use Email::Sender::Simple qw(sendmail);
my $sub='test';
my $exitCode=0;
my $emailTo='raxxxx#xxxx.com';
my $bcc='';
if ($exitCode == 0){$exitCode = '';}
my #mesgBody = ("test\n","email\n");
my $message = Email::MIME->create(
header_str => [
From => '"Rajeev" <'.$emailTo.'>',
To => $emailTo,
Subject => $sub,
],
attributes => {
'X-Priority' => 1,
'X-MSMail-Priority' => 'High',
encoding => 'quoted-printable',
charset => 'ISO-8859-1',
},
body_str => "#mesgBody"."\n".$exitCode, #old body_str => $sub."\n".$mesg."\n".$exitCode,
);
#sendmail($message);
if ($bcc eq ''){
my $result=sendmail(
$message,
{
from => '"Rajeev" <'.$emailTo.'>',
to => [$emailTo],
}
);
print "result=".Dumper($result)."\n";
} else {
sendmail(
$message,
{
from => '"Rajeev" <'.$emailTo.'>',
to => [$emailTo, $bcc],
}
);
}
output:->
result=$VAR1 = bless( {}, 'Email::Sender::Success' );
so if this is success, why am i not getting any email?
I also see nothing in system logs.
thank you.
# service postfix start
solved the problem.

Perl xml simple for parsing node with the same name

I have the following xml file
<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "http://www.kegg.jp/kegg/xml/KGML_v0.7.1_.dtd">
<pathway name="path:ko01200" org="ko" >
<entry id="1" >
<graphics name="one"
type="circle" />
</entry>
<entry id="7" >
<graphics name="one"
type="rectangle" />
<graphics name="two"
type="rectangle"/>
</entry>
</pathway>
I tired to pars it using xml simple with the following code which I am stuck since one of the nodes had 2 graphic elements. So it complains. I assume I have to have another foreach loop for graphic elements but I don't know how to proceed .
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml=new XML::Simple;
my $data=$xml->XMLin("file.xml",KeyAttr => ['id']);
print Dumper($data);
foreach my $entry ( keys %{$data->{entry}} ) {
print $data->{entry}->{$entry}->{graphics}->{type}."\n";
}
here is the code result
$VAR1 = {
'entry' => {
'1' => {
'graphics' => {
'name' => 'one...',
'type' => 'circle'
}
},
'7' => {
'graphics' => [
{
'name' => 'one',
'type' => 'rectangle'
},
{
'name' => 'two',
'type' => 'rectangle'
}
]
}
},
'org' => 'ko',
'name' => 'path:ko01200'
};
circle
Not a HASH reference at stack.pl line 12.
XML::Simple lacks consistency because it's up to the user to enable strict mode, so graphics node is sometimes hash, sometimes array depending on number of child elements.
for my $entry ( keys %{$data->{entry}} ) {
my $graphics = $data->{entry}{$entry}{graphics};
$graphics = [ $graphics ] if ref $graphics eq "HASH";
print "$_->{type}\n" for #$graphics;
}
There are better modules for XML parsing, please check XML::LibXML
or as #RobEarl suggested use ForceArray parameter:
XMLin("file.xml",KeyAttr => ['id'], ForceArray => [ 'graphics' ]);

How to parse XML and create a tree structure in Perl

I am parsing a XML file with XML::Simple. Is there any way to get a tree form from the XML? If so please explain with example or suggest a CPAN package.
I would like to know which tag I have to process after column and so on.
There is no sequence for the tags. The column tag can appear after Table or display_name many times.
Tab
column
Table
column
display_name
column
display_name
XML:
<Tab>
<column>
<display_name>xyz</display_name>
<display_name>pqr</display_name>
</column>
<Table>
<column><display_name>Department</display_name></column>
</Table>
<display_name>abc</display_name>
<column>pwd</column>
<display_name>jack</display_name>
</Tab>
output with XML::Simple:
$VAR1 = {
'Table' => {
'column' => {
'display_name' => 'Department'
}
},
'display_name' => [
'abc',
'jack'
],
'column' => [
{
'display_name' => [
'xyz',
'pqr'
]
},
'pwd'
]
};
Expected o/p:
$VAR1 = {
'column' => {
'display_name' => [
'xyz',
'pqr'
]
}
'Table' => {
'column' => {
'display_name' => 'Department'
}
},
'display_name' => 'abc',
'column' => 'pwd',
'display_name' =>'jack'
};
I know a hash with same keys isn't possible. Please suggest a way that I can maintain the sequence of tags and will be able to print them.
XML::LibXML creates a tree with no loss of information.
use XML::LibXML qw( );
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($qfn);
You can generate the output you specified from there. (I don't know why you'd want to, since the Perl code you want for output would lose data if run.)
I used XML::Parser for same file
#!/usr/sbin/perl
use XML::Parser;
use Data::Dumper;
use strict;
my $Filename = "abc.xml";
my $Parser = new XML::Parser( Style => 'tree' );
my $Tree = $Parser->parsefile( $Filename );
print Dumper( $Tree );
If there is another way to get desired output please suggest.

Get Perl to print full "key path" to values (Data::Dumper won't)

$foo{alongkeyname}{anotherlongkeyname}{yetanotherlongkeyname}{afairlyshortkeynamewellitgotlongwhileiwastypingitsoiguessnot}{bob}{something} = 1;
How do I get Perl to print $foo and show me the full "path name" to
get to 1? In other words, I want output that looks similar to the
input above.
Data::Dumper won't do this, and the long key names wrap the output,
making even the indented form less useful.
Ages ago, I wrote my own "unfold" subroutine at https://github.com/barrycarter/bcapps/blob/master/bclib.pl#L109 which outputs:
<hash HASH(0x92a33a4)>
<key>
alongkeyname
</key>
<val>
<hash HASH(0x95103b4)>
<key>
anotherlongkeyname
</key>
<val>
<hash HASH(0x9510464)>
<key>
yetanotherlongkeyname
</key>
<val>
<hash HASH(0x9510434)>
<key>
afairlyshortkeynamewellitgotlongwhileiwastypingitsoiguessnot
</key>
<val>
<hash HASH(0x95bae7c)>
<key>
bob
</key>
<val>
<hash HASH(0x95cf8bc)>
something: 1
</hash HASH(0x95cf8bc)>
</val>
</hash HASH(0x95bae7c)>
</val>
</hash HASH(0x9510434)>
</val>
</hash HASH(0x9510464)>
</val>
</hash HASH(0x95103b4)>
</val>
</hash HASH(0x92a33a4)>
but that's not really useful either.
Real-life project inspiring this question: pulling SYNOP/BUOY data from the
XML::Simple hashified output of metaf2xml
EDIT: Thank you Ben! I tried this and it worked great on my example. Then I tried it on another hash, and got:
$VAR1 = {'remark' => [{'obsStationType' => {'stationType' => {'v' => 'AO2'},'s' => 'AO2'}},{'needMaint' => {'s' => '$'}}],'QNH' => {'inHg' => {'v' => '29.99'},'s' => 'A2999'},'visPrev' => {'distance' => {'u' => 'SM','v' => '7','rp' => '1'},'s' => '7SM'},'sfcWind' => {'wind' => {'speed' => {'u' => 'KT','v' => '3'},'dir' => {'rn' => '5','v' => '60','rp' => '4'}},'measurePeriod' => {'u' => 'MIN','v' => '2'},'s' => '06003KT'},'obsStationId' => {'id' => {'v' => 'KBTR'},'s' => 'KBTR'},'obsTime' => {'s' => '080940Z','timeAt' => {'hour' => {'v' => '09'},'minute' => {'v' => '40'},'day' => {'v' => '08'}}},'s' => 'KBTR 080940Z 06003KT 7SM SCT003 BKN200 24/23 A2999 RMK AO2 $','cloud' => [{'cloudCover' => {'v' => 'SCT'},'s' => 'SCT003','cloudBase' => {'u' => 'FT','v' => '300'}},{'cloudCover' => {'v' => 'BKN'},'s' => 'BKN200','cloudBase' => {'u' => 'FT','v' => '20000'}}],'temperature' => {'relHumid4' => {'v' => '94.15'},'dewpoint' => {'temp' => {'u' => 'C','v' => '23'}},'relHumid3' => {'v' => '94.03'},'relHumid1' => {'v' => '94.16'},'relHumid2' => {'v' => '94.17'},'air' => {'temp' => {'u' => 'C','v' => '24'}},'s' => '24/23'}};
So the question I think I want to answer is: what value of this hash will give me the "94.15" you see above? It's sort of hard to tell from the above.
(If anyone's curious, the answer is $hash{temperature}{relHumid4}{v})
MORE EDIT: Thanks, Ilmari. I tried dump_var($VAR1) w/ my VAR1 above and got...
HASH(0x9ae6764) = undef;
I also tried dump_var({$VAR1}) with the same result. I might've missed something. Could you cut and paste my VAR1 above and see if it works? I did export 'Dumper' as you indicate in your 'use' statement.
Here's a quick do-it-yourself solution:
use Data::Dumper 'Dumper';
sub dump_var {
my ($prefix, $var) = #_;
my #rv;
local $Data::Dumper::Indent = 0;
local $Data::Dumper::Terse = 1;
if (ref $var eq 'ARRAY' and #$var) {
for my $i (0 .. $#$var) {
push #rv, dump_var($prefix . "->[$i]", $var->[$i]);
}
} elsif (ref $var eq 'HASH' and %$var) {
foreach my $key (sort keys %$var) {
push #rv, dump_var($prefix . '->{'.Dumper($key).'}', $var->{$key});
}
} elsif (ref $var eq 'SCALAR') {
push #rv, dump_var('${' . $prefix . '}', $$var);
} else {
push #rv, "$prefix = " . Dumper($var) . ";\n";
}
return #rv;
}
and some test code:
my $foo = {
alpha => [ 'beta', \ 'gamma' ],
one => { two => { three => 3, four => 3.141 },
five => { six => undef, seven => \*STDIN },
},
foobar => sub { print "Hello, world!\n"; },
};
print dump_var('$foo' => $foo);
which produces the output:
$foo->{'alpha'}->[0] = 'beta';
${$foo->{'alpha'}->[1]} = 'gamma';
$foo->{'foobar'} = sub { "DUMMY" };
$foo->{'one'}->{'five'}->{'seven'} = \*::STDIN;
$foo->{'one'}->{'five'}->{'six'} = undef;
$foo->{'one'}->{'two'}->{'four'} = '3.141';
$foo->{'one'}->{'two'}->{'three'} = 3;
Edit: While testing a PHP version of this code, I realized that it didn't correctly handle empty arrays and hashes. I've fixed the code so that such values are passed directly to Dumper.
Data::Dumper can print output similar to what you're looking for by setting Indent to 0.
[ben#imac ~]$ perl
use Data::Dumper;
$Data::Dumper::Indent = 0;
$foo{alongkeyname}{anotherlongkeyname}{yetanotherlongkeyname}{afairlyshortkeynamewellitgotlongwhileiwastypingitsoiguessnot}{bob}{something} = 1;
print Dumper(\%foo);
Output:
$VAR1 = {'alongkeyname' => {'anotherlongkeyname' => {'yetanotherlongkeyname' => {'afairlyshortkeynamewellitgotlongwhileiwastypingitsoiguessnot' => {'bob' => {'something' => 1}}}}}};
For a possible solution to the problem behind your question, please see the feature announced today in the Project News for metaf2xml.