Perl get XML node value using XML::LibXML

I am trying to print the content of some nodes for further processing. I want to print x_id="123" and the content of node "a". I am using the XML::LibXML parser. Any suggestions? I am very new to this parser.
Example XML:
<header>
<id x_id="123">
<a>testing</a>
<b></b>
</id>
</header>
Current code, which does not work:
use strict;
use warnings;
use XML::LibXML;
my $template = "xx.xml";
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($template);
my($object) = $doc->findnodes("/header/id/");
print $doc->findvalue("/header/id/x_id");

Sample code snippet for demo
use strict;
use warnings;
use feature 'say';
use XML::LibXML;
my $file = 'test.xml';
my $dom = XML::LibXML->load_xml(location => $file);
foreach my $node ($dom->findnodes('//idset')) {
say 'NodeID: ', $node->{id};
say 'ItemA: ', $node->findvalue('./a');
say 'ItemB: ', $node->findvalue('./b');
say '';
}
Content of input file test.xml
<header>
<idset id="100">
<a>item_a</a>
<b>item_b</b>
</idset>
<idset id="101">
<a>item_c</a>
<b>item_d</b>
</idset>
</header>
Output
NodeID: 100
ItemA: item_a
ItemB: item_b
NodeID: 101
ItemA: item_c
ItemB: item_d
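Applied to the XML from the question itself, the same idea can be sketched as below. This is only a sketch under a few assumptions: the XML is inlined as a string instead of being read from xx.xml, and the helper name id_and_a is hypothetical. The attribute is read from the matched element, and the XPath has no trailing slash.

```perl
use strict;
use warnings;
use XML::LibXML;

# The XML from the question, inlined so the sketch is self-contained.
my $xml = <<'XML';
<header>
<id x_id="123">
<a>testing</a>
<b></b>
</id>
</header>
XML

# Hypothetical helper: returns the x_id attribute and the text of <a>
# for the first /header/id element, or an empty list if there is none.
sub id_and_a {
    my ($xml_string) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml_string);
    my ($id) = $doc->findnodes('/header/id');    # note: no trailing slash
    return unless $id;
    return ($id->getAttribute('x_id'), $id->findvalue('./a'));
}

my ($x_id, $a_text) = id_and_a($xml);
print "x_id=$x_id a=$a_text\n";
```

The attribute could also be addressed directly in XPath as /header/id/@x_id; the path /header/id/x_id in the question looks for a child element named x_id, which does not exist.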

Related

perl script to iterate over xml nodes using XML::LibXML

I am trying to come up with a Perl script that iterates over some nodes and gets values from an XML file.
My XML file looks like the one below and is saved as spec.xml
<?xml version="1.0" encoding="UTF-8"?>
<WO xmlns="http://www.example.com/yyyy" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
<WOSet>
<SR>
<FINISHTIME>2013-07-29T18:21:38-05:00</FINISHTIME>
<STARTTIME xsi:nil="true" />
<TYPE>SR</TYPE>
<DESCRIPTION>Create CUST</DESCRIPTION>
<EXTERNALSYSTEMID />
<REPORTEDBY>PCAUSR</REPORTEDBY>
<REPORTEDEMAIL />
<STATUS>RESOLVED</STATUS>
<SRID>1001</SRID>
<UID>1</UID>
<SPEC>
<AVALUE>IT</AVALUE>
<ATTRID>CUST_DEPT</ATTRID>
<NALUE xsi:nil="true" />
<TVALUE />
</SPEC>
<SPEC>
<AVALUE>001</AVALUE>
<ATTRID>DEPT_CODE</ATTRID>
<NVALUE xsi:nil="true" />
<TVALUE />
</SPEC>
</SR>
</WOSet>
</WO>
When I run the script below, I get neither output nor any error that would give me a clue about what to fix.
I am not a Perl expert and would love the experts here to shed some light on this.
#!/usr/bin/perl
use XML::LibXML;
use strict;
use warnings;
my $file = 'spec.xml';
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($file);
my $root = $tree->getDocumentElement;
foreach my $atrid ( $tree->findnodes('WO/WOSet/SR/SPEC') ) {
my $name = $atrid->findvalue('ATTRID');
my $value = $atrid->findvalue('AVALUE');
print $name;
print " = ";
print $value;
print ";\n";
}
My expected output is
CUST_DEPT = IT
DEPT_CODE = 001
The XML doesn't contain any element named WO in the null namespace. You want to match the elements named WO in the http://www.example.com/yyyy namespace.
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML qw( );
use XML::LibXML::XPathContext qw( );
my $file = 'spec.xml';
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);
my $root = $doc->getDocumentElement;
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(y => 'http://www.example.com/yyyy');
for my $atrid ( $xpc->findnodes('y:WO/y:WOSet/y:SR/y:SPEC') ) {
my $name = $xpc->findvalue('y:ATTRID', $atrid);
my $value = $xpc->findvalue('y:AVALUE', $atrid);
print "$name = $value\n";
}

XML reading using Perl

I am new to the Perl language. I have an XML like,
<xml>
<date>
<date1>2012-10-22</date1>
<date2>2012-10-23</date2>
</date>
</xml>
I want to parse this XML file and store the values in an array. How can I do this with a Perl script?
Use XML::Simple (an easy API for maintaining XML, especially config files), or see XML::Twig (a Perl module for processing huge XML documents in tree mode).
For example:
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml = q~<xml>
<date>
<date1>2012-10-22</date1>
<date2>2012-10-23</date2>
</date>
</xml>~;
print $xml,$/;
my $data = XMLin($xml);
print Dumper( $data );
my @dates;
foreach my $attributes (keys %{$data->{date}}){
push(@dates, $data->{date}{$attributes});
}
print Dumper(\@dates);
Output:
$VAR1 = [
'2012-10-23',
'2012-10-22'
];
Here's one way with XML::LibXML
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;
my $doc = XML::LibXML->load_xml(location => 'data.xml');
my @nodes = $doc->findnodes('/xml/date/*');
my @dates = map { $_->textContent } @nodes;
Using XML::XSH2, a wrapper around XML::LibXML:
#!/usr/bin/perl
use warnings;
use strict;
use XML::XSH2;
xsh << '__XSH__';
open 2.xml ;
for $t in /xml/date/* {
my $s = string($t) ;
perl { push @l, $s }
}
__XSH__
no warnings qw(once);
print join(' ', @XML::XSH2::Map::l), ".\n";
If you can't or don't want to use any CPAN module:
my @hits = $xml =~ /<date\d+>(.+?)<\/date\d+>/g;
With the /g modifier in list context, this should give you all the dates in the @hits array.
If the XML isn't as simple as your example, using an XML parser is recommended; XML::Parser is one of them.

How to remove comment syntax only?

I want to collect all tags from an XML file. How can I remove only the comment syntax?
XML File:
<xml>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Holt</surname>
<given-names> Maurice<!--<xref ref-type="fn" rid="fnI_1"><sup>1</sup></xref>--></given-names>
</name>
</contrib>
</contrib-group>
</xml>
I need output as:
<xml>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Holt</surname>
<given-names> Maurice<xref ref-type="fn" rid="fnI_1"><sup>1</sup></xref></given-names>
</name>
</contrib>
</contrib-group>
</xml>
How can I remove the comment markers without removing their contents?
script:
#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;
open(my $output , '>', "split.xml") || die "can't open the Output $!\n";
my $xml = XML::Twig->new( twig_handlers => { xref => sub{comments => 'drop'} } );
$xml->parsefile("sample.xml");
$xml->print($output);
I can't get it to work. How can I remove only the <!-- --> markers without removing what they contain?
#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;
open my $output , '>', 'split.xml' or die "Can't open: $!\n";
my $xml = XML::Twig->new( comments => 'process', # Turn on comment processing
twig_handlers =>
{ '#COMMENT' => \&uncomment }
);
$xml->parsefile('sample.xml');
$xml->print($output);
sub uncomment {
my ($xml, $comment) = @_;
$comment->set_outer_xml($comment->text); # Replace the comment with its contents.
}

perl parsing using sax

I would like to write a xml parsing script in Perl that prints all the firstname values from the following xml file using XML::SAX module.
<employees>
<employee>
<firstname>John</firstname>
<lastname>Doe</lastname>
<age>gg</age>
<department>Operations</department>
<amount Ccy="EUR">100</amount>
</employee>
<employee>
<firstname>Larry</firstname>
<lastname>Page</lastname>
<age>45</age>
<department>Accounts</department>
<amount Ccy="EUR">200</amount>
</employee>
<employee>
<firstname>Harry</firstname>
<lastname>Potter</lastname>
<age>50</age>
<department>Human Resources</department>
<amount Ccy="EUR">300</amount>
</employee>
</employees>
Can anyone help me with sample script?
I am new to Perl.
Here's an example using XML::SAX. I've used XML::SAX::PurePerl.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use XML::SAX::ParserFactory;
use XML::SAX::PurePerl;
my $characters;
my @firstnames;
my $factory = new XML::SAX::ParserFactory;
#Let's see which handlers we have available
#print Dumper $factory;
my $handler = new XML::SAX::PurePerl;
my $parser = $factory->parser(
Handler => $handler,
Methods => {
characters => sub {
$characters = shift->{Data};
},
end_element => sub {
push @firstnames, $characters if shift->{LocalName} eq 'firstname';
}
}
);
$parser->parse_uri("sample.xml");
print Dumper \@firstnames;
Output:
$VAR1 = [
'John',
'Larry',
'Harry'
];
I use $characters to hold character data, and push its contents onto @firstnames whenever I see a closing firstname tag.
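The same idea can also be written as a separate handler class, which is the more usual shape for SAX code. The following is a sketch only: the package name FirstNameHandler is made up for illustration, the XML is a shortened inline copy of the sample, and whichever SAX parser is installed gets picked by the factory. It appends to a buffer in characters because a parser is allowed to deliver one text node in several chunks.

```perl
use strict;
use warnings;

# Hypothetical handler class (the name is an assumption, not part of XML::SAX).
package FirstNameHandler;
use base 'XML::SAX::Base';

sub start_document {
    my ($self) = @_;
    $self->{names} = [];    # collected firstname values
    $self->{text}  = '';    # text buffer for the current element
}

sub start_element {
    my ($self, $el) = @_;
    $self->{text} = '';     # a new element starts with an empty buffer
}

sub characters {
    my ($self, $data) = @_;
    $self->{text} .= $data->{Data};    # append: text may arrive in chunks
}

sub end_element {
    my ($self, $el) = @_;
    push @{ $self->{names} }, $self->{text}
        if $el->{LocalName} eq 'firstname';
}

package main;
use XML::SAX::ParserFactory;

my $handler = FirstNameHandler->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);

# Shortened inline copy of the sample file.
$parser->parse_string(<<'XML');
<employees>
<employee><firstname>John</firstname><lastname>Doe</lastname></employee>
<employee><firstname>Larry</firstname><lastname>Page</lastname></employee>
</employees>
XML

print "$_\n" for @{ $handler->{names} || [] };
```

To read the original file instead, parse_uri("sample.xml") can be called on the same parser object, as in the answer above.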
Do you have any reason to stick with XML::SAX? If not, you could look at some of the other XML parsers available in Perl (XML::Twig, XML::LibXML, XML::LibXML::Reader, XML::Simple, and many more).
Here is a sample code to retrieve the firstname using XML::Twig.
use XML::Twig;
my $twig = XML::Twig->new ();
$twig->parsefile ('sample.xml');
my @firstname = map { $_->text } $twig->findnodes ('//firstname');

How can I use Perl's XML::LibXML to extract an attribute in a tag?

I have an XML file
<PARENT >
<TAG string1="asdf" string2="asdf" >
</TAG >
</PARENT>
I want to extract the string2 value here, and I also want to set it to a new value.
How can I do that?
Use XPath expressions
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;
my $doc = XML::LibXML->new->parse_string(q{
<PARENT>
<TAG string1="asdf" string2="asdfd">
</TAG>
</PARENT>
});
my $xpath = '/PARENT/TAG/@string2';
# getting value of attribute:
print Dumper $doc->findvalue($xpath);
my ($attr) = $doc->findnodes($xpath);
# setting new value:
$attr->setValue('dfdsa');
print Dumper $doc->findvalue($xpath);
# do following if you need to get string representation of your XML structure
print Dumper $doc->toString(1);
And read documentation, of course :)
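As a variation on the XPath approach, the attribute can also be read and changed through the element node rather than the attribute node. This is a minimal sketch using the same inline XML; the variable names are only illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => <<'XML');
<PARENT>
<TAG string1="asdf" string2="asdfd">
</TAG>
</PARENT>
XML

# Select the element, then work with its attribute by name.
my ($tag) = $doc->findnodes('/PARENT/TAG');
my $old = $tag->getAttribute('string2');    # read the current value
$tag->setAttribute('string2', 'dfdsa');     # overwrite it
my $new = $tag->getAttribute('string2');

print "$old -> $new\n";
print $doc->toString(1);                    # serialized document with the new value
```

Which form to use is mostly a matter of taste: selecting the element is convenient when several attributes of the same tag need to be touched.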
You could use XML::Parser to get the value as well. For more information refer to the XML::Parser documentation:
#!/usr/local/bin/perl
use strict;
use warnings;
use XML::Parser;
use Data::Dumper;
my $attributes = {};
my $start_handler = sub
{
my ( $expat, $elem, %attr ) = @_;
if ($elem eq 'TAG')
{
$attributes->{$attr{'string1'}} = 'Found';
}
};
my $p1 = new XML::Parser(
Handlers => {
Start => $start_handler
}
);
$p1->parsefile('test.xml');
print Dumper($attributes);
I think you might be better off starting with XML::Simple and playing around a little first:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
my $xml = XMLin(\*DATA);
print $xml->{TAG}->{string2}, "\n";
$xml->{TAG}->{string2} = "asdf";
print XMLout( $xml, RootName => 'PARENT');
__DATA__
<PARENT>
<TAG string1="asdf" string2="value of string 2">
</TAG>
</PARENT>
Thanks for your responses. I found another answer in "Config file processing with LibXML2" which I found very useful.