How can I access attributes and elements from XML::LibXML in Perl?

I am having trouble understanding and using namespaces with the XML::LibXML package in Perl. I can access an element successfully but not an attribute. I have the following code, which reads an XML file (http://pastebin.com/f3fb9d1d0).
my $tree = $parser->parse_file($file); # parses the file contents into the new libXML object.
my $xpc = XML::LibXML::XPathContext->new($tree);
$xpc->registerNs(microplateML => 'http://moleculardevices.com/microplateML');
I then try to access an element called common-name and an attribute called name.
foreach my $camelid ($xpc->findnodes('//microplateML:species')) {
my $latin_name = $camelid->findvalue('@name');
my $common_name = $camelid->findvalue('common-name');
print "$latin_name, $common_name" ;
}
But only the latin-name (@name) is printing out; the common-name is not. What am I doing wrong, and how can I get the common-name to print out as well?
What does the @name do in this case? I presume it is an array, and that attributes should be put into an array as there can be more than one, but elements (like common-name) should not be because there should just be one?
I've been following the examples here: http://www.xml.com/pub/a/2001/11/14/xml-libxml.html
and here: http://perl-xml.sourceforge.net/faq/#namespaces_xpath, and trying to get their example camel script working with my namespace, hence the weird namespace.

Make sure your XML file is valid, then use $node->getAttribute("someAttribute") to access attributes.
@name is not a Perl array; it is XPath syntax for the attribute called name. You'd use it in findnodes() to select elements with a given attribute value, e.g. a path like:
//camelids/species[@name="Camelus bactrianus"]
Here is a simple/contrived example:
#!/usr/bin/perl -w
use strict;
use XML::LibXML;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file('/Users/castle/Desktop/animal.xml');
my $xc = XML::LibXML::XPathContext->new( $doc->documentElement() );
$xc->registerNs('ns', 'http://moleculardevices.com/microplateML');
# Find every species element in the registered namespace
my @n = $xc->findnodes('//ns:species');
foreach my $nod (@n) {
    print "A: ".$nod->getAttribute("name")."\n";
    # Child elements are namespaced too, so query them through the context
    my @c = $xc->findnodes("./ns:common-name", $nod);
    foreach my $cod (@c) {
        print "B: ".$cod->nodeName;
        print " = ";
        print $cod->getFirstChild()->getData()."\n";
    }
}
Output is:
perl ./xmltest.pl
A: Camelus bactrianus
B: common-name = Bactrian Camel
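For comparison, here is a minimal sketch that stays closer to the question's own loop. The inline XML is a hypothetical stand-in for the pastebin file (only the namespace URI comes from the question), and species_names is a helper name made up for this example. It assumes the document declares that URI as its default namespace, which would explain why the unprefixed 'common-name' finds nothing while the unprefixed attribute still works: attributes without a prefix are in no namespace.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical stand-in for the question's file; only the namespace URI
# is taken from the question.
my $xml = <<'XML';
<camelids xmlns="http://moleculardevices.com/microplateML">
  <species name="Camelus bactrianus">
    <common-name>Bactrian Camel</common-name>
  </species>
</camelids>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(microplateML => 'http://moleculardevices.com/microplateML');

# Returns one "latin, common" string per species element.
sub species_names {
    my ($xpc) = @_;
    my @rows;
    for my $species ($xpc->findnodes('//microplateML:species')) {
        # Unprefixed attributes are in no namespace, so this works directly.
        my $latin = $species->getAttribute('name');
        # The child element is namespaced: use the prefix and evaluate the
        # path through the context, relative to the current species node.
        my $common = $xpc->findvalue('microplateML:common-name', $species);
        push @rows, "$latin, $common";
    }
    return @rows;
}

print "$_\n" for species_names($xpc);
```

The key difference from the question's code is that the lookup goes through $xpc (which knows the prefix) rather than through $camelid->findvalue, which has no registered prefixes.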

Related

Print output using XML::LibXML

my $doc = $parser->parse_string( $res->content );
my $root = $doc->getDocumentElement;
my @objects = $root->getElementsByTagName('OBJECT');
foreach my $object ( @objects ){
    my $name = $object->firstChild;
    print "OBJECT = " . $name . "\n";
}
OUTPUT is:
OBJECT = XML::LibXML::Text=SCALAR(0x262e170)
OBJECT = XML::LibXML::Text=SCALAR(0x2ee4b00)
OBJECT = XML::LibXML::Text=SCALAR(0x262e170)
OBJECT = XML::LibXML::Text=SCALAR(0x2ee4b00)
Can anyone please explain why print shows the $name values like this? Why does it print normally when I use the function getAttribute with virtually the same code?
getAttribute returns the attribute's value as a plain string, while firstChild returns a node object: a text node, element, processing instruction, or comment.
What you see is how Perl normally prints an object: its class and address. Your version of XML::LibXML seems to be rather old; recent versions overload stringification, so the same code prints the text node's content.
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
my $doc = 'XML::LibXML'->load_xml( string => << '__XML__');
<root>
<OBJECT name="o1">hello</OBJECT>
</root>
__XML__
my @objects = $doc->getElementsByTagName('OBJECT');
for my $object (@objects) {
print 'OBJECT = ', $object->firstChild, "\n";
}
Output:
OBJECT = hello
In the old versions, one needed to call the nodeValue or data method.
print 'OBJECT = ', $object->firstChild->data, "\n";
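The difference between the two return types can be checked with ref. This is only a small sketch on the same toy document as above; it avoids relying on overloaded stringification by calling data and textContent explicitly, which should behave the same on old and new versions.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<root><OBJECT name="o1">hello</OBJECT></root>'
);
my ($object) = $doc->getElementsByTagName('OBJECT');

my $attr  = $object->getAttribute('name');   # plain string, ref() is ''
my $child = $object->firstChild;             # an XML::LibXML::Text object

printf "attribute: ref='%s' value=%s\n", ref($attr),  $attr;
printf "child:     ref='%s' value=%s\n", ref($child), $child->data;

# textContent concatenates all text below the element, so it does not
# depend on the first child being a text node.
printf "text:      %s\n", $object->textContent;
```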

Perl XML lib get full xpath

Looking to return the full xpath from a general xpath that may grab multiple results.
The search string would be something general like this:
/myXmlPath/@myValue
The contained xml nodes might look something like this:
<myXmlPath someAttribute="false" myValue="">
<myXmlPath someAttribute="true" myValue="">
Perl code something like this:
use XML::LibXML;
use XML::XPath::XMLParser;
my $filepath = "c:\\temp\\myfile.xml";
my $parser = XML::LibXML->new();
$parser->keep_blanks(0);
my $doc = $parser->parse_file($filepath);
my @myWarn = ('/myXmlPath/@myValue');
foreach (@myWarn) {
my $nodeset = $doc->findnodes($_);
foreach my $node ($nodeset->get_nodelist) {
my $value = $node->to_literal;
print $_,"\n";
print $value," - value \n";
print $node," - node \n";
}
}
I'd like to be able to evaluate the returned full path values from the xml. This code works fine when I'm using it to lookup general things in an xpath, but would be more ideal if I could get at other data from the nodeset result.
Like ikegami said, I'm not sure exactly what you're after, so I've taken a shotgun approach that covers every way I could interpret your question.
use strict;
use warnings;
use XML::LibXML;
use v5.14;
my $doc = XML::LibXML->load_xml(IO => *DATA);
say "Get the full path to the node";
foreach my $node ($doc->findnodes('//myXmlPath/@myValue')) {
    say "\t".$node->nodePath();
}
say "Get the parent node of the attribute by searching";
foreach my $node ($doc->findnodes('//myXmlPath[./@myValue="banana"]')) {
    say "\t".$node->nodePath();
    my ($someAttribute, $myValue) = map { $node->findvalue("./$_") } qw(@someAttribute @myValue);
    say "\t\tsomeAttribute: $someAttribute";
    say "\t\tmyValue: $myValue";
}
say "Get the parent node programatically";
foreach my $attribute ($doc->findnodes('//myXmlPath/@myValue')) {
    my $element = $attribute->parentNode;
    say "\t".$element->nodePath();
}
__DATA__
<document>
<a>
<b>
<myXmlPath someAttribute="false" myValue="apple" />
</b>
<myXmlPath someAttribute="false" myValue="banana" />
</a>
</document>
Which would produce:
Get the full path to the node
/document/a/b/myXmlPath/@myValue
/document/a/myXmlPath/@myValue
Get the parent node of the attribute by searching
/document/a/myXmlPath
someAttribute: false
myValue: banana
Get the parent node programatically
/document/a/b/myXmlPath
/document/a/myXmlPath

How to extract directory names from a path in Perl

I have a path like this
/home/user/doc/loc
I want to extract home, user, doc, loc separately. I tried split (////) and also split("/")
but neither worked. Please give me a sample script:
while (<EXPORT>) {
if (/^di/) {
($key, $curdir) = split(/\t/);
printf "the current dir is %s\n", $curdir;
printf("---------------------------------\n");
($home_dir, $user_dir, $doc_dir, $loc_dir) = split("/");
}
}
But it didn't work; hence please help me.
Given $curdir containing a path, you'd probably use:
my (@names) = split m%/%, $curdir;
on a Unix-ish system. Or you would use File::Spec and splitdir. For example:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;
my $curdir = "/home/user/doc/loc";
my (@names) = split m%/%, $curdir;
foreach my $part (@names)
{
    print "$part\n";
}
print "File::Spec->splitdir()\n";
my (@dirs) = File::Spec->splitdir($curdir);
foreach my $part (@dirs)
{
    print "$part\n";
}
Output (each list starts with a blank line, because the leading slash yields an empty first element):
home
user
doc
loc
File::Spec->splitdir()
home
user
doc
loc
split's first result is the string preceding the first match of the pattern passed to it. Since you have a leading "/" here, you get an empty string in $home_dir, 'home' in $user_dir, and so on. Either put undef in the first position of the list assignment or trim the leading slash first.
Also, split called without a second argument splits $_ (the whole input line here), not $curdir, so pass $curdir explicitly:
(undef, $home_dir, $user_dir, $doc_dir, $loc_dir) = split("/", $curdir);
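If you would rather not count empty fields by hand, one option is to filter them out. The following is a sketch only; path_parts is a made-up helper name, and it assumes you never need to distinguish absolute from relative paths, since the empty leading element is what marks an absolute path.

```perl
use strict;
use warnings;
use File::Spec;

# Split a path into its non-empty components. The empty element produced
# by a leading (or trailing) slash is dropped by the grep.
sub path_parts {
    my ($path) = @_;
    return grep { length } File::Spec->splitdir($path);
}

my ($home_dir, $user_dir, $doc_dir, $loc_dir) = path_parts('/home/user/doc/loc');
print "$home_dir $user_dir $doc_dir $loc_dir\n";
```

With this, the list assignment lines up with the directory names directly and no undef placeholder is needed.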

XML::Simple: Parsing nested hashes/array

I am trying to parse an XML file. The XML file can be found at http://pastebin.com/fvuwbrh9.
I have saved this XML file as packages.xml.
Goal: List all the names that are surrounded by <packagereq> tags in the XML (I am referring to the packagereq elements that fall under group in the Dumper output).
I wrote below script called rpm.pl:
#!/usr/bin/perl -w
use strict;
use XML::Simple;
use Data::Dumper;
my $ref = XMLin ('packages.xml');
#print Dumper ($ref);
foreach my $a ( keys %{ $ref->{group} } )
{
    if ( exists $ref->{group}->{$a}->{packagelist} )
    {
        foreach my $b ( @{ $ref->{group}->{$a}->{packagelist}->{packagereq} } )
        {
            print $b->{content}."\n"; ### <<< referring the Dumper out put
        }
    }
}
Now my script gets halfway through and prints the package names, but then it terminates with the error below:
Not an ARRAY reference at rpm.pl line 29.
After this error, the script does not process the rest of the XML file.
The error makes me believe that somewhere the value of $ref->{group}->{$a}->{packagelist}->{packagereq} is not an ARRAY reference.
I have gone through the XML file (and the Dumper output) as carefully as I can, and packagereq always seems to point to an ARRAY reference, unless I overlooked something, which I doubt.
Could you provide some input on why it is complaining about "Not an ARRAY reference"?
Thanks.
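XML::Simple, the most complicated XML parser to use. By default it only builds an array when an element occurs more than once, so a packagelist with a single packagereq gives you a hash reference instead of an array reference. ForceArray makes the listed elements always come back as arrays. Replace your XMLin call with the following: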
my $ref = XMLin ('packages.xml',
KeyAttr => [qw( id )],
ForceArray => [qw( group packagereq ignoredep )],
);
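To see the effect in isolation, here is a small sketch. The XML is a hypothetical sample shaped like the structure described in the question (the real packages.xml is not reproduced here), and package_names is a made-up helper. Unlike the call above, it passes an empty KeyAttr so that group stays a plain array, which keeps the loop simple; that is a choice made for this example, not a requirement.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical sample: the second group has only ONE packagereq, which is
# the case that breaks without ForceArray.
my $xml = <<'XML';
<comps>
  <group>
    <id>base</id>
    <packagelist>
      <packagereq type="default">bash</packagereq>
      <packagereq type="default">coreutils</packagereq>
    </packagelist>
  </group>
  <group>
    <id>editors</id>
    <packagelist>
      <packagereq type="default">vim</packagereq>
    </packagelist>
  </group>
</comps>
XML

# Collect the text of every packagereq under every group.
sub package_names {
    my ($xml) = @_;
    my $ref = XMLin(
        $xml,
        KeyAttr    => [],                        # no folding into hashes
        ForceArray => [qw( group packagereq )],  # always array references
    );
    my @names;
    for my $group (@{ $ref->{group} }) {
        next unless exists $group->{packagelist};
        push @names, map { $_->{content} } @{ $group->{packagelist}{packagereq} };
    }
    return @names;
}

print "$_\n" for package_names($xml);
```

Dropping packagereq from the ForceArray list should bring back the "Not an ARRAY reference" error on the second group.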

Doing XPath using Perl

I am coding with Perl on a Windows 7 machine. I am able to extract data from the XML using the XPath code below:
use strict;
use warnings;
use XML::LibXML;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($newfile);
my $query = "/tradenet/message/header/unique_ref_no/date/text( )";
my($node) = $doc->findnodes($query);
$node->setData("$file_seq_number");
However, when I use the same code on a different XML document, the XPath for the second document looks like this:
/TradenetResponse/OutboundMessage/out:OutwardPermit/out:Declaration/out:Header/cac:UniqueReferenceNumber/cbc:SequenceNumeric
Together with the Perl code, this is what the extraction code looks like:
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($newfile);
my $query = "/TradenetResponse/OutboundMessage/out:OutwardPermit/out:Declaration/out:Header/cac:UniqueReferenceNumber/cbc:SequenceNumeric/text( )";
my($node) = $doc->findnodes($query);
$node->setData("$file_seq_number");
Using the second code, I am unable to retrieve the data from the second XML. I receive this error: "Can't call method "setData" on an undefined value at Perl.pl line 5".
Is the ":" character in the second XPath expression affecting the code?
You have to define what out, cac, and cbc mean in order for the XPath query to find the appropriate nodes:
my $doc = $parser->parse_file($newfile);
my $xpath_context = XML::LibXML::XPathContext->new($doc->documentElement());
# These URIs need to be the same as the ones in the source document
$xpath_context->registerNs('out', 'http://example.com/out.xsd');
$xpath_context->registerNs('cac', 'http://example.com/cac.xsd');
$xpath_context->registerNs('cbc', 'http://example.com/cbc.xsd');
my $query = "/TradenetResponse/OutboundMessage/out:OutwardPermit/out:Declaration/out:Header/cac:UniqueReferenceNumber/cbc:SequenceNumeric/text( )";
my ($node) = $xpath_context->findnodes($query);
As promised, here is a working example. First, the test input file:
<?xml version="1.0"?>
<!-- input.xml -->
<TradenetResponse xmlns:a="http://example.com/out.xsd"
xmlns:b="http://example.com/cac.xsd"
xmlns:c="http://example.com/cbc.xsd">
<OutboundMessage>
<a:OutwardPermit>
<a:Declaration>
<a:Header>
<b:UniqueReferenceNumber>
<c:SequenceNumeric>1234</c:SequenceNumeric>
</b:UniqueReferenceNumber>
</a:Header>
</a:Declaration>
</a:OutwardPermit>
</OutboundMessage>
</TradenetResponse>
And here is the working Perl script:
#!/usr/bin/perl
# parse.pl
use strict;
use warnings;
use XML::LibXML;
my $parser = XML::LibXML->new();
my $newfile = "input.xml";
my $doc = $parser->parse_file($newfile);
my $xpath_context = XML::LibXML::XPathContext->new($doc->documentElement());
# These URIs need to be the same as the ones in the source document
$xpath_context->registerNs('out', 'http://example.com/out.xsd');
$xpath_context->registerNs('cac', 'http://example.com/cac.xsd');
$xpath_context->registerNs('cbc', 'http://example.com/cbc.xsd');
# Query wrapped for clarity
my $query = "/TradenetResponse/OutboundMessage/out:OutwardPermit" .
"/out:Declaration/out:Header/cac:UniqueReferenceNumber" .
"/cbc:SequenceNumeric/text()";
my ($node) = $xpath_context->findnodes($query);
print "Value: " . $node->getData() . "\n";
The output for me is:
sean@localhost:~xmltest$ ./parse.pl
Value: 1234
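Since the original goal was to call setData, here is a sketch of the update step with a guard against the undefined-node case that caused the error in the question. The document is a cut-down, hypothetical version of the test input above, and set_sequence is a made-up helper name.

```perl
use strict;
use warnings;
use XML::LibXML;

# Cut-down hypothetical input: one namespaced element to update.
my $xml = <<'XML';
<TradenetResponse xmlns:c="http://example.com/cbc.xsd">
  <c:SequenceNumeric>1234</c:SequenceNumeric>
</TradenetResponse>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($doc);
# The prefix used in the query (cbc) need not match the one in the
# document (c); only the URI has to be identical.
$xpc->registerNs(cbc => 'http://example.com/cbc.xsd');

# Replace the text of SequenceNumeric; returns false if nothing matched.
sub set_sequence {
    my ($xpc, $value) = @_;
    my ($node) = $xpc->findnodes('/TradenetResponse/cbc:SequenceNumeric/text()');
    return 0 unless defined $node;   # avoids calling setData on undef
    $node->setData($value);
    return 1;
}

set_sequence($xpc, '5678') or die "SequenceNumeric text node not found\n";
print $xpc->findvalue('/TradenetResponse/cbc:SequenceNumeric'), "\n";
```

Checking that the node is defined before using it turns a confusing runtime error into a clear message when a namespace URI or path is wrong.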