I try to parse an xml document using XML::Simple perl parsing.
I noticed, if the document looks like:
<?xml version="1.0" encoding="UTF-8"?>
<fields>
<field>
<f1>1234</f1>
<name>MyName1</name>
</field>
</fields>
the result of print(Dumper($ref)); looks as expected:
$VAR1 = {
'field' => {
'f1' => '1234',
'name' => 'MyName1'
}
};
while if I have more than one list in the document:
<?xml version="1.0" encoding="UTF-8"?>
<fields>
<field>
<f1>1234</f1>
<name>MyName1</name>
</field>
<field>
<f1>567</f1>
<name>MyName2</name>
</field>
</fields>
the results looks like:
$VAR1 = {
'field' => {
'MyName1' => {
'f1' => '1234'
},
'MyName2' => {
'f1' => '567'
}
}
};
while expected results would be:
$VAR1 = { [
'field' => {
'f1' => '1234',
'name' => 'MyName1'
},
'field' => {
'f1' => '567',
'name' => 'MyName2'
}
]
};
what options of XML::Simple parser will prevent tag content substitution with the tag reference and use an array of <field> instead?
By default. XML::Simple names hash keys by the values of tags <name>, <key> and <id>. You can customize this behavior via KeyAttr setting.
Thus, the code which produces the structure closest to what you expect is:
#!/usr/bin/env perl
use common::sense;
use Data::Dumper;
use XML::Simple;
local $/ = undef;
say Dumper XMLin(<DATA>, KeyAttr => []);
__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<fields>
<field>
<f1>1234</f1>
<name>MyName1</name>
</field>
<field>
<f1>567</f1>
<name>MyName2</name>
</field>
</fields>
And here is the output:
$VAR1 = {
'field' => [
{
'f1' => '1234',
'name' => 'MyName1'
},
{
'f1' => '567',
'name' => 'MyName2'
}
]
};
Use the ForceArray => 'field' option to XMLin.
In general you cannot mold the data structures that XML::Simple returns to whatever you want them to. XML::Simple is too simple for that. However, your use case is one that is described in the documentation. I suggest you read the documentation to all items that are marked as important at least, because it really helps knowing what your options are in shaping structures returned by XML::Simple.
Related
I am loading a config file, which ends up as an embedded hash, with Config::IniFiles. After that, I want to modify the resulting hash by, for some keys, bringing its values one level up. In the example below, I am aiming for this as a result:
$VAR1 = {
'max_childrensubtree' => '7',
'port' => '1984',
'user' => 'someuser',
'password' => 'somepw',
'max_width' => '20',
'host' => 'localhost',
'attrs' => {
'subattr2' => 'cat',
'topattr1' => 'cat',
'subattr2_1' => 'pt',
'subattr1' => 'rel'
},
'max_descendants' => '1000'
};
So for the keys params and basex at the highest level, I want to move its contents (key-value pairs) to the highest level - and remove the items themselves. In short:
(
a => {
'key1' => 'ok',
'key2' => 'hello'
}
)
turns into
(
'key1' => 'ok',
'key2' => 'hello'
)
The strange thing is that what I am trying to do does not work on a hash built from a read INI file, but it does work with a manually inserted hash. In other words, this works:
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use Data::Dumper;
my %ini = (
'params' => {
'max_width' => '20',
'max_childrensubtree' => '7',
'max_descendants' => '1000'
},
'attrs' => {
'topattr1' => 'cat',
'subattr1' => 'rel',
'subattr2' => 'cat',
'subattr2_1' => 'pt',
},
'basex' => {
'host' => 'localhost',
'port' => '1984',
'user' => 'someuser',
'password' => 'somepw'
}
);
&_parse_ini(\%ini);
sub _parse_ini {
my $ref = shift;
foreach (('params', 'basex')) {
foreach my $k (keys %{$ref->{$_}}) {
$ref->{$k} = $ref->{$_}->{$k};
}
delete $ref->{$_};
}
print Dumper($ref);
}
But this does not:
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use Data::Dumper;
use Config::IniFiles;
# Load config file
tie my %ini, 'Config::IniFiles', (-file => $ARGV[0]);
&_parse_ini(\%ini);
sub _parse_ini {
my $ref = shift;
foreach (('params', 'basex')) {
foreach my $k (keys %{$ref->{$_}}) {
$ref->{$k} = $ref->{$_}->{$k};
}
delete $ref->{$_};
}
print Dumper($ref);
}
The input ini file for this example would be:
[params]
max_width = 20
max_childrensubtree = 7
max_descendants = 1000
[attrs]
topattr1 = cat
subattr1 = rel
subattr2 = cat
subattr2_1 = pt
[basex]
host = localhost
port = 1984
user = admin
password = admin
I have been looking in the documentation and on SO for similar issues but have found none. It appears that the hashes are identical (Config::IniFiles doesn't seem to add something specific), so I have no idea why it works for 'manual' hashes, and not for read-in ones.
The two hashes are not identical at all, although they may appear to be from the point of view of the data they contain.
The first one is a regular hash. You can do whatever you like with it.
The second one is a tied hash. It becomes an object of Config::IniFiles, but with a hash like interface. So whilst it appears to be a hash, the package can override the methods for storing or fetching information in the hash however it likes.
In this particular case, it looks like Config::IniFiles will only store a new key value in the hash if the value is hash ref. So you can't flatten out the tied hash as you want. Instead you'll have to create a new hash and copy the data in to it to do what you want.
I have the following xml code
<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "http://www.kegg.jp/kegg/xml/KGML_v0.7.1_.dtd">
<!-- Creation date: Aug 26, 2013 10:02:03 +0900 (GMT+09:00) -->
<pathway name="path:ko01200" >
<reaction id="14" name="rn:R01845" type="irreversible">
<substrate id="108" name="cpd:C00447"/>
<product id="109" name="cpd:C05382"/>
</reaction>
<reaction id="15" name="rn:R01641" type="reversible">
<substrate id="109" name="cpd:C05382"/>
<substrate id="104" name="cpd:C00118"/>
<product id="110" name="cpd:C00117"/>
<product id="112" name="cpd:C00231"/>
</reaction>
</pathway>
I am trying to print the substrate id and product id with following code which I am stuck for the one that have more than one ID. Tried to use dumper to see the data structure but I don't know how to proceed. I have already used XML simple for the rest of my parsing script (this part is a small part of my whole script ) and I can not change that now
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml=new XML::Simple;
my $data=$xml->XMLin("test.xml",KeyAttr => ['id']);
print Dumper($data);
foreach my $reaction ( sort keys %{$data->{reaction}} ) {
print $data->{reaction}->{$reaction}->{substrate}->{id}."\n";
print $data->{reaction}->{$reaction}->{product}->{id}."\n";
}
Here is the output
$VAR1 = {
'name' => 'path:ko01200',
'reaction' => {
'15' => {
'substrate' => {
'104' => {
'name' => 'cpd:C00118'
},
'109' => {
'name' => 'cpd:C05382'
}
},
'name' => 'rn:R01641',
'type' => 'reversible',
'product' => {
'112' => {
'name' => 'cpd:C00231'
},
'110' => {
'name' => 'cpd:C00117'
}
}
},
'14' => {
'substrate' => {
'name' => 'cpd:C00447',
'id' => '108'
},
'name' => 'rn:R01845',
'type' => 'irreversible',
'product' => {
'name' => 'cpd:C05382',
'id' => '109'
}
}
}
};
108
109
Use of uninitialized value in concatenation (.) or string at line 12.
Use of uninitialized value in concatenation (.) or string at line 13.
First of all, don't use XML::Simple. it is hard to predict what exact data structure it will produce from a bit of XML, and it's own documentation mentions it is deprecated.
Anyway, your problem is that you want to access an id field in the product and substrate subhashes – but they don't exist in one of the reaction subhashes
'15' => {
'substrate' => {
'104' => {
'name' => 'cpd:C00118'
},
'109' => {
'name' => 'cpd:C05382'
}
},
'name' => 'rn:R01641',
'type' => 'reversible',
'product' => {
'112' => {
'name' => 'cpd:C00231'
},
'110' => {
'name' => 'cpd:C00117'
}
}
},
Instead, the keys are numbers, and each value is a hash containing a name. The other reaction has a totally different structure, so special-case code would have been written for both. This is why XML::Simple shouldn't be used – the output is just to unpredictable.
Enter XML::LibXML. It is not extraordinary, but it implememts standard APIs like the DOM and XPath to traverse your XML document.
use XML::LibXML;
use feature 'say'; # assuming perl 5.010
my $doc = XML::LibXML->load_xml(file => "test.xml") or die;
for my $reaction_item ($doc->findnodes('//reaction/product | //reaction/substrate')) {
say $reaction_item->getAttribute('id');
}
Output:
108
109
109
104
110
112
I have the following xml file
<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "http://www.kegg.jp/kegg/xml/KGML_v0.7.1_.dtd">
<pathway name="path:ko01200" org="ko" >
<entry id="1" >
<graphics name="one"
type="circle" />
</entry>
<entry id="7" >
<graphics name="one"
type="rectangle" />
<graphics name="two"
type="rectangle"/>
</entry>
</pathway>
I tired to pars it using xml simple with the following code which I am stuck since one of the nodes had 2 graphic elements. So it complains. I assume I have to have another foreach loop for graphic elements but I don't know how to proceed .
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml=new XML::Simple;
my $data=$xml->XMLin("file.xml",KeyAttr => ['id']);
print Dumper($data);
foreach my $entry ( keys %{$data->{entry}} ) {
print $data->{entry}->{$entry}->{graphics}->{type}."\n";
}
here is the code result
$VAR1 = {
'entry' => {
'1' => {
'graphics' => {
'name' => 'one...',
'type' => 'circle'
}
},
'7' => {
'graphics' => [
{
'name' => 'one',
'type' => 'rectangle'
},
{
'name' => 'two',
'type' => 'rectangle'
}
]
}
},
'org' => 'ko',
'name' => 'path:ko01200'
};
circle
Not a HASH reference at stack.pl line 12.
XML::Simple lacks consistency because it's up to the user to enable strict mode, so graphics node is sometimes hash, sometimes array depending on number of child elements.
for my $entry ( keys %{$data->{entry}} ) {
my $graphics = $data->{entry}{$entry}{graphics};
$graphics = [ $graphics ] if ref $graphics eq "HASH";
print "$_->{type}\n" for #$graphics;
}
There are better modules for XML parsing, please check XML::LibXML
or as #RobEarl suggested use ForceArray parameter:
XMLin("file.xml",KeyAttr => ['id'], ForceArray => [ 'graphics' ]);
I am parsing a XML file with XML::Simple. Is there any way to get a tree form from the XML? If so please explain with example or suggest a CPAN package.
I would like to know which tag I have to process after column and so on.
There is no sequence for the tags. The column tag can appear after Table or display_name many times.
Tab
column
Table
column
display_name
column
display_name
XML:
<Tab>
<column>
<display_name>xyz</display_name>
<display_name>pqr</display_name>
</column>
<Table>
<column><display_name>Department</display_name></column>
</Table>
<display_name>abc</display_name>
<column>pwd</column>
<display_name>jack</display_name>
</Tab>
output with XML::Simple:
$VAR1 = {
'Table' => {
'column' => {
'display_name' => 'Department'
}
},
'display_name' => [
'abc',
'jack'
],
'column' => [
{
'display_name' => [
'xyz',
'pqr'
]
},
'pwd'
]
};
Expected o/p:
$VAR1 = {
'column' => {
'display_name' => [
'xyz',
'pqr'
]
}
'Table' => {
'column' => {
'display_name' => 'Department'
}
},
'display_name' => 'abc',
'column' => 'pwd',
'display_name' =>'jack'
};
I know a hash with same keys isn't possible. Please suggest a way that I can maintain the sequence of tags and will be able to print them.
XML::LibXML creates a tree with no loss of information.
use XML::LibXML qw( );
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($qfn);
You can generate the output you specified from there. (I don't know why you'd want to, since the Perl code you want for output would lose data if run.)
I used XML::Parser for same file
#!/usr/sbin/perl
use XML::Parser;
use Data::Dumper;
use strict;
my $Filename = "abc.xml";
my $Parser = new XML::Parser( Style => 'tree' );
my $Tree = $Parser->parsefile( $Filename );
print Dumper( $Tree );
If there is another way to get desired output please suggest.
I have the following XML file:
<?xml version='1.0'?>
<preferences>
<font role="console">
<fname>Courier</fname>
<size>9</size>
</font>
<font role="default">
<fname>Times New Roman</fname>
<size>14</size>
</font>
<font role="titles">
<fname>Helvetica</fname>
<size>10</size>
</font>
</preferences>
I managed to read it and dump it out. Now I am supposed to read all the key value pairs.
Here is the script:
#!/usr/bin/perl
use warnings;
use strict;
# use module
use XML::Simple;
use Data::Dumper;
my $data = XMLin('test.xml');
# print Dumper(%data);
while ( my ($key, $value) = each(%$data) ) {
print "$key => $value\n";
}
Nothing prints inside the loop... What could be the problem? I am new to this and wrote my Hello World script and this all in the same day, so I will take any advice on the code.
This works just fine:
my $data = XMLin('test.xml');
print Dumper($data);
And it gives me:
$VAR1 = {
'font' => [
{
'fname' => 'Courier',
'role' => 'console',
'size' => '9'
},
{
'fname' => 'Times New Roman',
'role' => 'default',
'size' => '14'
},
{
'fname' => 'Helvetica',
'role' => 'titles',
'size' => '10'
}
]
};
I am guessing that inside the while loop I need to loop through each of the arrays. Am I right?
use strict;
Is your friend. It would have told you:
Global symbol "%data" requires explicit package name
What you want is %$data
In other words: $data and %data counts as two different variables.
Update:
As you changed the whole question, my answer makes little sense now.. As does your question. You have printed it. What else do you need?
If you wanted to print that structure, you'd need something like (untested):
for my $key1 (keys %$data) {
for my $array_value (#{ $data->{$key1} }) {
for my $key2 (keys %$array_value) {
print "$key2 => $array_value->{$key2}\n";
}
}
}
If you wanted to access a value directly:
print $data->{font}[0]{'fname'}
You'll need to experiment to get what you need. In the Data::Dumper output, you can easily see which values are hashes and which are arrays:
$VAR1 = { # The curly bracket denotes a beginning hash
'font' => [ # Square bracket = array begins
{ # The first array element is a hash
'fname' => 'Courier', # Inside the hash
'role' => 'console',
'size' => '9'
}, # Hash ends
{ # Next array value, new hash begins
'fname' => 'Times New Roman',
'role' => 'default',
'size' => '14'
},
{
'fname' => 'Helvetica',
'role' => 'titles',
'size' => '10'
}
] # Array ends
}; # Hash ends
Try with:
while ( my ($key, $value) = each(%$data) ) {
....