perl XML::Simple for repeated elements - perl

I have the following xml code
<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "http://www.kegg.jp/kegg/xml/KGML_v0.7.1_.dtd">
<!-- Creation date: Aug 26, 2013 10:02:03 +0900 (GMT+09:00) -->
<pathway name="path:ko01200" >
<reaction id="14" name="rn:R01845" type="irreversible">
<substrate id="108" name="cpd:C00447"/>
<product id="109" name="cpd:C05382"/>
</reaction>
<reaction id="15" name="rn:R01641" type="reversible">
<substrate id="109" name="cpd:C05382"/>
<substrate id="104" name="cpd:C00118"/>
<product id="110" name="cpd:C00117"/>
<product id="112" name="cpd:C00231"/>
</reaction>
</pathway>
I am trying to print the substrate id and product id with following code which I am stuck for the one that have more than one ID. Tried to use dumper to see the data structure but I don't know how to proceed. I have already used XML simple for the rest of my parsing script (this part is a small part of my whole script ) and I can not change that now
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml=new XML::Simple;
my $data=$xml->XMLin("test.xml",KeyAttr => ['id']);
print Dumper($data);
foreach my $reaction ( sort keys %{$data->{reaction}} ) {
print $data->{reaction}->{$reaction}->{substrate}->{id}."\n";
print $data->{reaction}->{$reaction}->{product}->{id}."\n";
}
Here is the output
$VAR1 = {
'name' => 'path:ko01200',
'reaction' => {
'15' => {
'substrate' => {
'104' => {
'name' => 'cpd:C00118'
},
'109' => {
'name' => 'cpd:C05382'
}
},
'name' => 'rn:R01641',
'type' => 'reversible',
'product' => {
'112' => {
'name' => 'cpd:C00231'
},
'110' => {
'name' => 'cpd:C00117'
}
}
},
'14' => {
'substrate' => {
'name' => 'cpd:C00447',
'id' => '108'
},
'name' => 'rn:R01845',
'type' => 'irreversible',
'product' => {
'name' => 'cpd:C05382',
'id' => '109'
}
}
}
};
108
109
Use of uninitialized value in concatenation (.) or string at line 12.
Use of uninitialized value in concatenation (.) or string at line 13.

First of all, don't use XML::Simple. it is hard to predict what exact data structure it will produce from a bit of XML, and it's own documentation mentions it is deprecated.
Anyway, your problem is that you want to access an id field in the product and substrate subhashes – but they don't exist in one of the reaction subhashes
'15' => {
'substrate' => {
'104' => {
'name' => 'cpd:C00118'
},
'109' => {
'name' => 'cpd:C05382'
}
},
'name' => 'rn:R01641',
'type' => 'reversible',
'product' => {
'112' => {
'name' => 'cpd:C00231'
},
'110' => {
'name' => 'cpd:C00117'
}
}
},
Instead, the keys are numbers, and each value is a hash containing a name. The other reaction has a totally different structure, so special-case code would have been written for both. This is why XML::Simple shouldn't be used – the output is just to unpredictable.
Enter XML::LibXML. It is not extraordinary, but it implememts standard APIs like the DOM and XPath to traverse your XML document.
use XML::LibXML;
use feature 'say'; # assuming perl 5.010
my $doc = XML::LibXML->load_xml(file => "test.xml") or die;
for my $reaction_item ($doc->findnodes('//reaction/product | //reaction/substrate')) {
say $reaction_item->getAttribute('id');
}
Output:
108
109
109
104
110
112

Related

Perl- Get Hash Value from Multi level hash

I have a 3 dimension hash that I need to extract the data in it. I need to extract the name and vendor under vuln_soft-> prod. So far, I manage to extract the "cve_id" by using the following code:
foreach my $resultHash_entry (keys %hash){
my $cve_id = $hash{$resultHash_entry}{'cve_id'};
}
Can someone please provide a solution on how to extract the name and vendor. Thanks in advance.
%hash = {
'CVE-2015-6929' => {
'cve_id' => 'CVE-2015-6929',
'vuln_soft' => {
'prod' => {
'vendor' => 'win',
'name' => 'win 8.1',
'vers' => {
'vers' => '',
'num' => ''
}
},
'prod' => {
'vendor' => 'win',
'name' => 'win xp',
'vers' => {
'vers' => '',
'num' => ''
}
}
},
'CVE-2015-0616' => {
'cve_id' => 'CVE-2015-0616',
'vuln_soft' => {
'prod' => {
'name' => 'unity_connection',
'vendor' => 'cisco'
}
}
}
}
First, to initialize a hash, you use my %hash = (...); (note the parens, not curly braces). Using {} declares a hash reference, which you have done. You should always use strict; and use warnings;.
To answer the question:
for my $resultHash_entry (keys %hash){
print "$hash{$resultHash_entry}->{vuln_soft}{prod}{name}\n";
print "$hash{$resultHash_entry}->{vuln_soft}{prod}{vendor}\n";
}
...which could be slightly simplified to:
for my $resultHash_entry (keys %hash){
print "$hash{$resultHash_entry}{vuln_soft}{prod}{name}\n";
print "$hash{$resultHash_entry}{vuln_soft}{prod}{vendor}\n";
}
because Perl always knows for certain that any deeper entries than the first one is always a reference, so the deref operator -> isn't needed here.

Perl xml simple for parsing node with the same name

I have the following xml file
<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "http://www.kegg.jp/kegg/xml/KGML_v0.7.1_.dtd">
<pathway name="path:ko01200" org="ko" >
<entry id="1" >
<graphics name="one"
type="circle" />
</entry>
<entry id="7" >
<graphics name="one"
type="rectangle" />
<graphics name="two"
type="rectangle"/>
</entry>
</pathway>
I tired to pars it using xml simple with the following code which I am stuck since one of the nodes had 2 graphic elements. So it complains. I assume I have to have another foreach loop for graphic elements but I don't know how to proceed .
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml=new XML::Simple;
my $data=$xml->XMLin("file.xml",KeyAttr => ['id']);
print Dumper($data);
foreach my $entry ( keys %{$data->{entry}} ) {
print $data->{entry}->{$entry}->{graphics}->{type}."\n";
}
here is the code result
$VAR1 = {
'entry' => {
'1' => {
'graphics' => {
'name' => 'one...',
'type' => 'circle'
}
},
'7' => {
'graphics' => [
{
'name' => 'one',
'type' => 'rectangle'
},
{
'name' => 'two',
'type' => 'rectangle'
}
]
}
},
'org' => 'ko',
'name' => 'path:ko01200'
};
circle
Not a HASH reference at stack.pl line 12.
XML::Simple lacks consistency because it's up to the user to enable strict mode, so graphics node is sometimes hash, sometimes array depending on number of child elements.
for my $entry ( keys %{$data->{entry}} ) {
my $graphics = $data->{entry}{$entry}{graphics};
$graphics = [ $graphics ] if ref $graphics eq "HASH";
print "$_->{type}\n" for #$graphics;
}
There are better modules for XML parsing, please check XML::LibXML
or as #RobEarl suggested use ForceArray parameter:
XMLin("file.xml",KeyAttr => ['id'], ForceArray => [ 'graphics' ]);

Tie Reading and Writing Perl Config Files

I'm using the PerlMonk example I found on:
Reading and Writing Perl Config Files
Configuration.pl:
%CFG = (
'servers' => {
'SRV1' => {
'IP' => 99.32.4.0,
'user' => 'aname',
'pswd' => 'p4ssw0rd',
'status' => 'unavailable'
},
'SRV2' => {
'IP' => 129.99.10.5
'user' => 'guest',
'pswd' => 'guest'
'status' => 'unavailable'
}
},
'timeout' => 60,
'log' => {
'file' => '/var/log/my_log.log',
'level' => 'warn',
},
'temp' => 'remove me'
);
It is working great, but the only issue is when reading and writing the HASH like configuration is being 'out of order'.
Is there a way to keep it TIED?
This important since the configuration file will be also edited manually, so I want the keys and values in the same order.
You could tie config variable before using it, so later hash keys will stay in same order as before,
use strict;
use warnings;
use Tie::IxHash;
tie my %CFG, 'Tie::IxHash';
%CFG = (
'servers' => {
'SRV1' => {
'IP' => '99.32.4.0',
'user' => 'aname',
'pswd' => 'p4ssw0rd',
'status' => 'unavailable'
},
'SRV2' => {
'IP' => '129.99.10.5',
'user' => 'guest',
'pswd' => 'guest',
'status' => 'unavailable'
}
},
'timeout' => 60,
'log' => {
'file' => '/var/log/my_log.log',
'level' => 'warn',
},
'temp' => 'remove me'
);
use Data::Dumper;
print Dumper \%CFG;
If you use JSON then you have the advantage that your software is safe from a malicious attack (or perhaps accidental corruption). JSON also has a simpler syntax than Perl data structures, and it is easier to recover from syntax errors.
Setting the canonical option will create the data with the keys in sorted order, and so generate the same output for the same Perl data every time. If you need the data in a specific order other than alphabetical then you can use the Tie::IxHash module as #mpapec describes in his answer.
Alternatively you can use the sort_by method from the Pure Perl version of the module that lets you pass a collation subroutine. That would let you prescribe the order of your keys, and could be as simple as using a hash that relates all the possible key values with a numerical sort order.
This program uses the sort_by method to reconstruct the JSON in the same order as the keys appear in your original hash. That is unlikely to be the order you want, but the mechanism is there. It works by looking up each key in a hash table to determine how they should be ordered. Any keys (like SVR1 and SVR2 here) that don't appear in the hash are sorted in alphabetical order by default.
use strict;
use warnings;
use JSON::PP ();
my %CFG = (
'servers' => {
'SRV1' => {
'IP' => '99.32.4.0',
'user' => 'aname',
'pswd' => 'p4ssw0rd',
'status' => 'unavailable'
},
'SRV2' => {
'IP' => '129.99.10.5',
'user' => 'guest',
'pswd' => 'guest',
'status' => 'unavailable'
}
},
'timeout' => 60,
'log' => {
'file' => '/var/log/my_log.log',
'level' => 'warn',
},
'temp' => 'remove me'
);
my %sort_order;
my $n = 0;
$sort_order{$_} = ++$n for qw/ servers timeout log temp /;
$sort_order{$_} = ++$n for qw/ IP user pswd status /;
$sort_order{$_} = ++$n for qw/ file level /;
my $json = JSON::PP->new->pretty->sort_by(\&json_sort);
print $json->encode(\%CFG);
sub json_sort {
my ($aa, $bb) = map $sort_order{$_}, $JSON::PP::a, $JSON::PP::b;
$aa and $bb and $aa <=> $bb or $JSON::PP::a cmp $JSON::PP::b;
}
generates this output
{
"servers" : {
"SRV1" : {
"IP" : "99.32.4.0",
"user" : "aname",
"pswd" : "p4ssw0rd",
"status" : "unavailable"
},
"SRV2" : {
"IP" : "129.99.10.5",
"user" : "guest",
"pswd" : "guest",
"status" : "unavailable"
}
},
"timeout" : 60,
"log" : {
"file" : "/var/log/my_log.log",
"level" : "warn"
},
"temp" : "remove me"
}
which can simply be saved to a file and similarly restored.

How do I access certain keys in a Perl nested hash?

I dumped a data structure:
print Dumper($bobo->{'issues'});
and got:
$VAR1 = {
'155' => {
'name' => 'Gender',
'url_name' => 'gender'
}
};
How can I extract 155?
How about if I have:
$VAR1 = {
'155' => {'name' => 'Gender', 'url_name' => 'gender'},
'11' => {'name' => 'Toddler', 'url_name' => 'toddler'},
'30' => {'name' => 'Lolo', 'url_name' => 'lolo'}
};
I want to print one key, i.e. the first or second to see the value of the key?
So, based on the example you posted, the hash looks like this:
$bobo = {
issues => {
155 => {
name => 'Gender',
url_name => 'gender',
},
},
};
'155' is a key in your example code. To extract a key, you would use keys.
my #keys = keys %{$bobo->{issues}};
But to get the value that 155 indexes, you could say:
my $val = $bobo->{issues}{155};
Then $val would contain a hashref that looks like this:
{
name => 'Gender',
url_name => 'gender'
}
Have a look at perldoc perlreftut.
It is a key in the hash referenced by $bobo->{'issues'}. So you would iterate through
keys %{$bobo->{'issues'}}
to find it.

How do I access values in the data structure returned by XML::Simple?

Hi everyone,
This is very simple for perl programmers but not beginners like me,
I have one xml file and I processed using XML::Simple like this
my $file="service.xml";
my $xml = new XML::Simple;
my $data = $xml->XMLin("$file", ForceArray => ['Service','SystemReaction',
'Customers', 'Suppliers','SW','HW'],);
Dumping out $data, it looks like this:
$data = {
'Service' => [{
'Suppliers' => [{
'SW' => [
{'Path' => '/work/service.xml', 'Service' => 'b7a'},
{'Path' => '/work/service1.xml', 'Service' => 'b7b'},
{'Path' => '/work/service2.xml', 'Service' => 'b5'}]}
],
'Id' => 'SKRM',
'Customers' =>
[{'SW' => [{'Path' => '/work/service.xml', 'Service' => 'ASOC'}]}],
'Des' => 'Control the current through the pipe',
'Name' => ' Control unit'
},
{
'Suppliers' => [{
'HW' => [{
'Type' => 'W',
'Path' => '/work/hardware.xml',
'Nr' => '18',
'Service' => '1'
},
{
'Type' => 'B',
'Path' => '/work/hardware.xml',
'Nr' => '7',
'Service' => '1'
},
{
'Type' => 'k',
'Path' => '/work/hardware.xml',
'Nr' => '1',
'Service' => '1'
}]}
],
'Id' => 'ADTM',
'Customers' =>
[{'SW' => [{'Path' => '/work/service.xml', 'Service' => 'SDCR'}]}],
'Des' => 'It delivers actual motor speed',
'Name' => ' Motor Drivers and Diognostics'
},
# etc.
],
'Systemreaction' => [
# etc.
],
};
How to access each elements in the service and systemReaction(not provided). because I am using "$data" in further processing. So I need to access each Id,customers, suppliers values in each service. How to get particular value from service to do some process with that value.for example I need to get all Id values form service and create nodes for each id values.
To get Type and Nr value I tried like this
foreach my $service (#{ $data->{Service}[1]{Suppliers}[0]{HW}[0] }) {
say $service->{Nr};
}
foreach my $service (#{ $data->{Service}[1]{Suppliers}[0]{HW}[0] }) {
say $service->{Type};
}
can you help me how to get all Nr and Type values from Supplier->HW.
I suggest reading perldocs Reference Tutorial and References and Nested Data Structures. They contain an introduction and full explanation of how to access data like that.
But, for example, you can access the service ID by doing:
say $data->{Service}[0]{Id} # prints SKRM
You could go through all the services, printing their ID, with a loop:
foreach my $service (#{ $data->{Service} }) {
say $service->{Id};
}
In response to your edit
$data->{Service}[1]{Suppliers}[0]{HW}[0] is an hash reference (you can check this quickly by either using Data::Dumper or Data::Dump on it, or just the ref function). In particular, it is { Nr => 18, Path => "/work/hardware.xml", Service => 1, Type => "W" }
In other words, you've almost got it—you just went one level too deep. It should be:
foreach my $service (#{ $data->{Service}[1]{Suppliers}[0]{HW} }) {
say $service->{Nr};
}
Note the lack of the final [0] that you had.