How to parse XML and create a tree structure in Perl

I am parsing an XML file with XML::Simple. Is there any way to get a tree structure from the XML? If so, please explain with an example or suggest a CPAN package.
I would like to know which tag I have to process after column, and so on.
The tags have no fixed order: the column tag can appear many times, after Table or after display_name.
Tab
column
Table
column
display_name
column
display_name
XML:
<Tab>
<column>
<display_name>xyz</display_name>
<display_name>pqr</display_name>
</column>
<Table>
<column><display_name>Department</display_name></column>
</Table>
<display_name>abc</display_name>
<column>pwd</column>
<display_name>jack</display_name>
</Tab>
Output with XML::Simple:
$VAR1 = {
'Table' => {
'column' => {
'display_name' => 'Department'
}
},
'display_name' => [
'abc',
'jack'
],
'column' => [
{
'display_name' => [
'xyz',
'pqr'
]
},
'pwd'
]
};
Expected output:
$VAR1 = {
'column' => {
'display_name' => [
'xyz',
'pqr'
]
},
'Table' => {
'column' => {
'display_name' => 'Department'
}
},
'display_name' => 'abc',
'column' => 'pwd',
'display_name' =>'jack'
};
I know a hash can't have duplicate keys. Please suggest a way to preserve the order of the tags so that I can print them in sequence.

XML::LibXML creates a tree with no loss of information.
use strict;
use warnings;
use XML::LibXML qw( );
my $qfn = 'abc.xml';    # path to the XML file
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($qfn);
You can generate the output you specified from there. (I don't know why you'd want to, though: if the Perl code you want as output were run, the duplicate keys would overwrite each other and data would be lost.)
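A minimal sketch of walking the XML::LibXML tree in document order. It assumes the sample XML from the question is embedded as a string, so no file path is needed; the helper name `outline` is hypothetical. It returns one line per element, indented by depth, and for elements with no child elements it also shows their text. Because it visits nodes in order and never builds a hash, repeated tags such as display_name are all kept.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Sample document from the question, embedded so the sketch is self-contained.
my $xml = <<'XML';
<Tab>
<column>
<display_name>xyz</display_name>
<display_name>pqr</display_name>
</column>
<Table>
<column><display_name>Department</display_name></column>
</Table>
<display_name>abc</display_name>
<column>pwd</column>
<display_name>jack</display_name>
</Tab>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Return one line per element, in document order, indented by depth.
# Elements without child elements also show their text content.
sub outline {
    my ($node, $depth) = @_;
    my @kids = grep { $_->isa('XML::LibXML::Element') } $node->childNodes;
    my $line = ('  ' x $depth) . $node->nodeName;
    $line .= ': ' . $node->textContent unless @kids;
    return ($line, map { outline($_, $depth + 1) } @kids);
}

my @lines = outline($doc->documentElement, 0);
print "$_\n" for @lines;
```

For the sample, the first line is Tab and the last is display_name: jack, with both top-level column elements in their original positions.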

I used XML::Parser for the same file:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;
use Data::Dumper;
my $Filename = "abc.xml";
my $Parser = XML::Parser->new( Style => 'Tree' );
my $Tree = $Parser->parsefile( $Filename );
print Dumper( $Tree );
If there is another way to get the desired output, please suggest it.
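With XML::Parser's Tree style, each element's content is an array ref whose first item is the attribute hash, followed by (tag, content) pairs in document order; the pseudo-tag 0 marks a text node. A minimal sketch, assuming you want an order-preserving structure instead of a hash: the hypothetical helper `to_pairs` turns that content into an array of [name, value] pairs, so duplicate names such as display_name simply become separate entries. The XML is inlined without whitespace between tags to keep the example short.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;
use Data::Dumper;

# The question's XML, written without whitespace between tags for brevity.
my $xml = '<Tab><column><display_name>xyz</display_name>'
        . '<display_name>pqr</display_name></column>'
        . '<Table><column><display_name>Department</display_name></column></Table>'
        . '<display_name>abc</display_name><column>pwd</column>'
        . '<display_name>jack</display_name></Tab>';

# Tree style returns [ root_tag, root_content ].
my $tree = XML::Parser->new(Style => 'Tree')->parse($xml);

# Convert Tree-style content into an ordered array ref of [name, value] pairs.
# An element without child elements is represented by its text instead.
sub to_pairs {
    my ($content) = @_;
    my (undef, @rest) = @$content;    # first item is the attribute hash
    my @pairs;
    my $text = '';
    while (my ($tag, $c) = splice(@rest, 0, 2)) {
        if ($tag eq '0') { $text .= $c; next }    # pseudo-tag 0 = text node
        push @pairs, [ $tag => to_pairs($c) ];
    }
    return @pairs ? \@pairs : $text;
}

my $ordered = to_pairs($tree->[1]);
$Data::Dumper::Indent = 1;
print Dumper($ordered);
```

An array of pairs is used because, unlike a hash, it keeps both order and repeated names; printing it in sequence is then a simple loop.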

Related

Perl DBM::Deep - add/delete in an arrayref of hashrefs

I've been working with DBM::Deep, and so far it has been easy to read and update keys in the DB, but adding or deleting entities is more complicated and I can't see how it could be done.
I imported an XML file with XML::Hash and then copied it into a DBM::Deep object, so the resulting structure is somewhat complicated. The goal, of course, is to be able to recreate the XML file easily.
This code:
use strict;
use warnings;
use DBM::Deep;
use List::Util qw(first);
use Data::Dumper;
my $db = DBM::Deep->new('foo.db');
my $devices = $db->{foo}->{devices}->{device};
(my $match) = grep { $_->{hostname} eq 'myfoo' } @$devices;
print Dumper($match);
print Dumper($devices);
Gives the following output for the first print:
$VAR1 = bless( {
'enable' => '0',
'hostname' => 'myfoo',
'auth' => 'myauth',
'ip' => 'myip',
'protocol' => 'ssh'
}, 'DBM::Deep::Hash' );
The second print shows:
$VAR1 = bless( [
bless( {
'enable' => '0',
'hostname' => 'myfoo',
'auth' => 'myauth',
'ip' => 'myip',
'protocol' => 'ssh'
}, 'DBM::Deep::Hash' ),
bless( {
'ip' => 'myotherip',
'hostname' => 'myotherfoo',
'auth' => 'myauth',
'protocol' => 'telnet'
}, 'DBM::Deep::Hash' ),
and so on.
Can someone please help me understand how to create and delete entries in this data structure?
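A minimal sketch of create and delete, using an ordinary array ref of hash refs as a stand-in for $db->{foo}->{devices}->{device}. The assumption, not verified here, is that the DBM::Deep array ref accepts the same push and splice operations as a plain Perl array; the hostname newfoo and its fields are made-up values.

```perl
use strict;
use warnings;

# Stand-in for $db->{foo}{devices}{device}: an array ref of hash refs.
# Assumption: the DBM::Deep array ref accepts the same push/splice calls.
my $devices = [
    { hostname => 'myfoo',      ip => 'myip',      auth => 'myauth', protocol => 'ssh', enable => '0' },
    { hostname => 'myotherfoo', ip => 'myotherip', auth => 'myauth', protocol => 'telnet' },
];

# Create: append a new hash ref (hypothetical values) to the array.
push @$devices, { hostname => 'newfoo', ip => 'newip', auth => 'myauth', protocol => 'ssh' };

# Delete: find the index of the matching entry, then splice it out.
my ($idx) = grep { $devices->[$_]{hostname} eq 'myfoo' } 0 .. $#$devices;
splice @$devices, $idx, 1 if defined $idx;

print join(', ', map { $_->{hostname} } @$devices), "\n";
```

Deleting by index with splice, rather than with delete on an array element, keeps the array free of undef holes, which matters if the structure is later written back out as XML.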

Config::IniFiles hash behaves differently from a manually written hash

I am loading a config file with Config::IniFiles, which gives me a nested hash. After that, I want to modify the resulting hash by moving the values of some keys one level up. In the example below, this is the result I am aiming for:
$VAR1 = {
'max_childrensubtree' => '7',
'port' => '1984',
'user' => 'someuser',
'password' => 'somepw',
'max_width' => '20',
'host' => 'localhost',
'attrs' => {
'subattr2' => 'cat',
'topattr1' => 'cat',
'subattr2_1' => 'pt',
'subattr1' => 'rel'
},
'max_descendants' => '1000'
};
So for the top-level keys params and basex, I want to move their contents (key-value pairs) to the top level and remove the keys themselves. In short:
(
a => {
'key1' => 'ok',
'key2' => 'hello'
}
)
turns into
(
'key1' => 'ok',
'key2' => 'hello'
)
The strange thing is that this does not work on a hash built by reading an INI file, but it does work on a hash written by hand. In other words, this works:
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use Data::Dumper;
my %ini = (
'params' => {
'max_width' => '20',
'max_childrensubtree' => '7',
'max_descendants' => '1000'
},
'attrs' => {
'topattr1' => 'cat',
'subattr1' => 'rel',
'subattr2' => 'cat',
'subattr2_1' => 'pt',
},
'basex' => {
'host' => 'localhost',
'port' => '1984',
'user' => 'someuser',
'password' => 'somepw'
}
);
&_parse_ini(\%ini);
sub _parse_ini {
my $ref = shift;
foreach (('params', 'basex')) {
foreach my $k (keys %{$ref->{$_}}) {
$ref->{$k} = $ref->{$_}->{$k};
}
delete $ref->{$_};
}
print Dumper($ref);
}
But this does not:
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use Data::Dumper;
use Config::IniFiles;
# Load config file
tie my %ini, 'Config::IniFiles', (-file => $ARGV[0]);
&_parse_ini(\%ini);
sub _parse_ini {
my $ref = shift;
foreach (('params', 'basex')) {
foreach my $k (keys %{$ref->{$_}}) {
$ref->{$k} = $ref->{$_}->{$k};
}
delete $ref->{$_};
}
print Dumper($ref);
}
The input INI file for this example would be:
[params]
max_width = 20
max_childrensubtree = 7
max_descendants = 1000
[attrs]
topattr1 = cat
subattr1 = rel
subattr2 = cat
subattr2_1 = pt
[basex]
host = localhost
port = 1984
user = admin
password = admin
I have searched the documentation and SO for similar issues but found none. The hashes appear to be identical (Config::IniFiles doesn't seem to add anything specific), so I have no idea why it works for 'manual' hashes but not for read-in ones.
The two hashes are not identical at all, although they may appear to be from the point of view of the data they contain.
The first one is a regular hash. You can do whatever you like with it.
The second one is a tied hash. It is backed by a Config::IniFiles object but has a hash-like interface. So while it looks like a hash, the package can implement storing and fetching (the STORE and FETCH tie methods) however it likes.
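To see how a tied hash can make an assignment silently do nothing, here is a minimal sketch with a hypothetical tie class, OnlyHashRefs, built on the core Tie::StdHash. Its STORE ignores any value that is not a hash ref. This is an illustration of the mechanism, not Config::IniFiles' actual code.

```perl
use strict;
use warnings;

# Hypothetical tie class: ignores stores whose value is not a hash ref.
# Illustrates the tie mechanism only; this is NOT Config::IniFiles' code.
package OnlyHashRefs;
require Tie::Hash;
our @ISA = ('Tie::StdHash');

sub STORE {
    my ($self, $key, $value) = @_;
    $self->{$key} = $value if ref $value eq 'HASH';
}

package main;

tie my %h, 'OnlyHashRefs';
$h{section} = { host => 'localhost' };   # stored: value is a hash ref
$h{host}    = 'localhost';               # silently ignored by STORE
my @keys = sort keys %h;
print "@keys\n";
```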
In this particular case, it looks like Config::IniFiles will only store a new key in the hash if the value is a hash ref, so you can't flatten the tied hash the way you want. Instead, you'll have to create a new hash and copy the data into it.
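A minimal sketch of that copy, assuming the section names params and basex from the question. The hypothetical helper `flatten_sections` only reads from the hash it is given and builds a new, untied one, so it should also work on the tied %ini; the demo uses a plain hash so it runs without an INI file. With Config::IniFiles you would pass \%ini after the tie line from the question.

```perl
use strict;
use warnings;
use Data::Dumper;

# Build a new, untied hash. Sections named in @flatten are merged into the
# top level; every other section is copied as its own hash ref.
sub flatten_sections {
    my ($ini, @flatten) = @_;
    my %is_flat = map { $_ => 1 } @flatten;
    my %out;
    for my $section (keys %$ini) {
        my %params = %{ $ini->{$section} };    # plain copy of the section
        if ($is_flat{$section}) {
            @out{ keys %params } = values %params;
        }
        else {
            $out{$section} = \%params;
        }
    }
    return \%out;
}

# Plain-hash stand-in for the tied %ini (values from the INI file in the question).
my %ini = (
    params => { max_width => '20', max_childrensubtree => '7', max_descendants => '1000' },
    attrs  => { topattr1 => 'cat', subattr1 => 'rel', subattr2 => 'cat', subattr2_1 => 'pt' },
    basex  => { host => 'localhost', port => '1984', user => 'admin', password => 'admin' },
);

my $flat = flatten_sections(\%ini, 'params', 'basex');
print Dumper($flat);
```

Returning a fresh hash ref, instead of modifying the argument in place, is what avoids the tied hash's STORE entirely.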

Print data after dumper

I have this structure, as printed by Data::Dumper:
$VAR1 = {
'field' => [
{
'content' => {
'en' => [
'Footware haberdashery leather goods'
],
'de' => [
'Schuhe Kurzwaren und Lederartikel'
],
'it' => [
'Calzature mercerie e pelletterie'
]
},
'type' => 'tag',
'valore' => 'TAG3'
},
{
'content' => {
'en' => [
'Cobbler'
],
'de' => [
'Schuster'
],
'it' => [
'Calzolai'
]
},
'type' => 'tag',
'valore' => 'TAG24'
}
]
};
My question is: how do I take the data and print it item by item?
I want to print the name, the tag and valore.
My software needs the name of the shop and other data, for example the type.
It looks like the structure is a hashref containing an arrayref of hashes, and so on. And apparently where you mention 'name' you mean 'content' by language. Likewise, it seems that where you mention 'tag' you mean 'type'. My answer will be based on those assumptions.
foreach my $rec (@{$href->{field}}) {
print "$rec->{content}->{en}->[0]: $rec->{type}, $rec->{valore}\n";
}
The arrows (->) between {content} and {en}, and between {en} and [0], are optional; whether to use them is a matter of style.
If you simply want to access elements directly (foregoing the loop), you might do it like this:
print $href->{field}->[0]->{content}->{en}->[0], "\n";
print $href->{field}->[0]->{type}, "\n";
print $href->{field}->[0]->{valore}, "\n";
If you want to print all the languages, you could do this:
foreach my $rec (@{$href->{field}}) {
print $rec->{content}->{$_}->[0], "\n" foreach sort keys %{$rec->{content}};
print $rec->{type}, "\n";
print $rec->{valore}, "\n\n";
}
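A self-contained version of that loop, assuming the structure from the question is written out as a literal in $href. It labels each name with its language code and collects the output lines in @lines before printing (the labels are an illustrative choice).

```perl
use strict;
use warnings;

# The structure from the question, written out as a literal.
my $href = {
    field => [
        { content => { en => ['Footware haberdashery leather goods'],
                       de => ['Schuhe Kurzwaren und Lederartikel'],
                       it => ['Calzature mercerie e pelletterie'] },
          type => 'tag', valore => 'TAG3' },
        { content => { en => ['Cobbler'], de => ['Schuster'], it => ['Calzolai'] },
          type => 'tag', valore => 'TAG24' },
    ],
};

my @lines;
for my $rec (@{ $href->{field} }) {
    # One line per language, sorted by language code (de, en, it).
    push @lines, "$_: $rec->{content}{$_}[0]" for sort keys %{ $rec->{content} };
    push @lines, "type: $rec->{type}", "valore: $rec->{valore}";
}
print "$_\n" for @lines;
```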
There are several Perl documentation pages that could be of use to you in the future as you learn to manipulate references and datastructures with Perl: perlreftut, perlref, and perldsc. Access them from your own system as perldoc perlreftut, for example.

Perl XML::Simple: parsing nodes with the same name

I have the following XML file:
<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "http://www.kegg.jp/kegg/xml/KGML_v0.7.1_.dtd">
<pathway name="path:ko01200" org="ko" >
<entry id="1" >
<graphics name="one"
type="circle" />
</entry>
<entry id="7" >
<graphics name="one"
type="rectangle" />
<graphics name="two"
type="rectangle"/>
</entry>
</pathway>
I tried to parse it with XML::Simple using the following code, but I am stuck because one of the entry nodes has two graphics elements, so the script dies with an error. I assume I need another foreach loop for the graphics elements, but I don't know how to proceed.
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml = XML::Simple->new;
my $data=$xml->XMLin("file.xml",KeyAttr => ['id']);
print Dumper($data);
foreach my $entry ( keys %{$data->{entry}} ) {
print $data->{entry}->{$entry}->{graphics}->{type}."\n";
}
Here is the output:
$VAR1 = {
'entry' => {
'1' => {
'graphics' => {
'name' => 'one...',
'type' => 'circle'
}
},
'7' => {
'graphics' => [
{
'name' => 'one',
'type' => 'rectangle'
},
{
'name' => 'two',
'type' => 'rectangle'
}
]
}
},
'org' => 'ko',
'name' => 'path:ko01200'
};
circle
Not a HASH reference at stack.pl line 12.
XML::Simple lacks consistency unless you configure it (it is up to the user to enable strict mode or ForceArray), so the graphics node is a hash when an entry has one graphics element and an array ref when it has several.
for my $entry ( keys %{$data->{entry}} ) {
my $graphics = $data->{entry}{$entry}{graphics};
$graphics = [ $graphics ] if ref $graphics eq "HASH";
print "$_->{type}\n" for @$graphics;
}
There are better modules for XML parsing; see XML::LibXML,
or, as @RobEarl suggested, use the ForceArray parameter:
XMLin("file.xml",KeyAttr => ['id'], ForceArray => [ 'graphics' ]);
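A minimal sketch of the same loop with XML::LibXML. XPath queries always return a list, so one graphics element and several are handled identically and no ref check is needed. The XML is inlined and the DOCTYPE line is left out so the sketch does not try to fetch the external DTD; the "id type" output format is an illustrative choice.

```perl
use strict;
use warnings;
use XML::LibXML;

# The question's XML, inlined without the DOCTYPE line.
my $xml = <<'XML';
<pathway name="path:ko01200" org="ko">
  <entry id="1"><graphics name="one" type="circle"/></entry>
  <entry id="7">
    <graphics name="one" type="rectangle"/>
    <graphics name="two" type="rectangle"/>
  </entry>
</pathway>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my @rows;
for my $entry ($doc->findnodes('/pathway/entry')) {
    # findnodes returns a list, whether there is one graphics child or many.
    for my $g ($entry->findnodes('./graphics')) {
        push @rows, join ' ', $entry->getAttribute('id'), $g->getAttribute('type');
    }
}
print "$_\n" for @rows;
```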

How to read data from a hash in Perl?

I have the following XML file:
<?xml version='1.0'?>
<preferences>
<font role="console">
<fname>Courier</fname>
<size>9</size>
</font>
<font role="default">
<fname>Times New Roman</fname>
<size>14</size>
</font>
<font role="titles">
<fname>Helvetica</fname>
<size>10</size>
</font>
</preferences>
I managed to read it and dump it out. Now I need to read all the key-value pairs.
Here is the script:
#!/usr/bin/perl
use warnings;
use strict;
# use module
use XML::Simple;
use Data::Dumper;
my $data = XMLin('test.xml');
# print Dumper(%data);
while ( my ($key, $value) = each(%$data) ) {
print "$key => $value\n";
}
Nothing prints inside the loop. What could be the problem? I am new to this and wrote my Hello World script and this one on the same day, so I will take any advice on the code.
This works just fine:
my $data = XMLin('test.xml');
print Dumper($data);
And it gives me:
$VAR1 = {
'font' => [
{
'fname' => 'Courier',
'role' => 'console',
'size' => '9'
},
{
'fname' => 'Times New Roman',
'role' => 'default',
'size' => '14'
},
{
'fname' => 'Helvetica',
'role' => 'titles',
'size' => '10'
}
]
};
I am guessing that inside the while loop I need to loop through each of the arrays. Am I right?
use strict; is your friend. It would have told you:
Global symbol "%data" requires explicit package name
What you want is %$data
In other words, $data and %data are two different variables.
Update:
Since you changed the whole question, my answer no longer makes much sense, and neither does the question itself: you have already printed the data. What else do you need?
If you wanted to print that structure, you'd need something like (untested):
for my $key1 (keys %$data) {
for my $array_value (@{ $data->{$key1} }) {
for my $key2 (keys %$array_value) {
print "$key2 => $array_value->{$key2}\n";
}
}
}
If you wanted to access a value directly:
print $data->{font}[0]{'fname'};
You'll need to experiment to get what you need. In the Data::Dumper output, you can easily see which values are hashes and which are arrays:
$VAR1 = { # The curly bracket denotes a beginning hash
'font' => [ # Square bracket = array begins
{ # The first array element is a hash
'fname' => 'Courier', # Inside the hash
'role' => 'console',
'size' => '9'
}, # Hash ends
{ # Next array value, new hash begins
'fname' => 'Times New Roman',
'role' => 'default',
'size' => '14'
},
{
'fname' => 'Helvetica',
'role' => 'titles',
'size' => '10'
}
] # Array ends
}; # Hash ends
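The same hash-versus-array distinction can be made in code with ref(). A minimal sketch, assuming the structure is written out as the literal from the Dumper output above; the helper name `walk` and the "path = value" line format are illustrative choices. It visits any mix of hash refs, array refs and scalars, so it does not depend on knowing the nesting in advance.

```perl
use strict;
use warnings;

# The structure XMLin returned in the question, written out as a literal.
my $data = {
    font => [
        { fname => 'Courier',         role => 'console', size => '9'  },
        { fname => 'Times New Roman', role => 'default', size => '14' },
        { fname => 'Helvetica',       role => 'titles',  size => '10' },
    ],
};

# Recursively visit hash refs, array refs and plain scalars, using ref() to
# tell them apart (the { } versus [ ] distinction visible in Dumper output).
# Each leaf is recorded as "path = value".
sub walk {
    my ($node, $path, $out) = @_;
    if (ref $node eq 'HASH') {
        walk($node->{$_}, "$path\{$_}", $out) for sort keys %$node;
    }
    elsif (ref $node eq 'ARRAY') {
        walk($node->[$_], "$path\[$_]", $out) for 0 .. $#$node;
    }
    else {
        push @$out, "$path = $node";
    }
}

my @lines;
walk($data, '$data->', \@lines);
print "$_\n" for @lines;
```

Each printed path, such as $data->{font}[0]{fname}, is also the expression you would write to access that value directly.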
Try with:
while ( my ($key, $value) = each(%$data) ) {
....