Perl script to iterate over XML nodes using XML::LibXML

I am trying to write a Perl script that iterates over some nodes and gets their values from an XML file.
My XML file looks like the one below and is saved as spec.xml:
<?xml version="1.0" encoding="UTF-8"?>
<WO xmlns="http://www.example.com/yyyy" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
<WOSet>
<SR>
<FINISHTIME>2013-07-29T18:21:38-05:00</FINISHTIME>
<STARTTIME xsi:nil="true" />
<TYPE>SR</TYPE>
<DESCRIPTION>Create CUST</DESCRIPTION>
<EXTERNALSYSTEMID />
<REPORTEDBY>PCAUSR</REPORTEDBY>
<REPORTEDEMAIL />
<STATUS>RESOLVED</STATUS>
<SRID>1001</SRID>
<UID>1</UID>
<SPEC>
<AVALUE>IT</AVALUE>
<ATTRID>CUST_DEPT</ATTRID>
<NVALUE xsi:nil="true" />
<TVALUE />
</SPEC>
<SPEC>
<AVALUE>001</AVALUE>
<ATTRID>DEPT_CODE</ATTRID>
<NVALUE xsi:nil="true" />
<TVALUE />
</SPEC>
</SR>
</WOSet>
</WO>
When I run the script below, I get neither output nor an error that would give me a clue about what to fix.
I am not a Perl expert and would love for the experts here to shed some light on this.
#!/usr/bin/perl
use XML::LibXML;
use strict;
use warnings;
my $file = 'spec.xml';
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($file);
my $root = $tree->getDocumentElement;
foreach my $atrid ( $tree->findnodes('WO/WOSet/SR/SPEC') ) {
    my $name = $atrid->findvalue('ATTRID');
    my $value = $atrid->findvalue('AVALUE');
    print $name;
    print " = ";
    print $value;
    print ";\n";
}
My expected output is:
CUST_DEPT = IT
DEPT_CODE = 001

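Your XPath steps have no prefix, so they only match elements in no (null) namespace, and the XML doesn't contain any element named WO in the null namespace: the xmlns="http://www.example.com/yyyy" declaration on the root puts WO and all of its descendants in that namespace. findnodes therefore returns an empty list, the loop body never runs, and you get no output and no error. You want to match the elements named WO in the http://www.example.com/yyyy namespace, which means registering a prefix for that namespace and using it in every step: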
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML qw( );
use XML::LibXML::XPathContext qw( );
my $file = 'spec.xml';
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);
my $root = $doc->getDocumentElement;
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(y => 'http://www.example.com/yyyy');
for my $atrid ( $xpc->findnodes('y:WO/y:WOSet/y:SR/y:SPEC') ) {
my $name = $xpc->findvalue('y:ATTRID', $atrid);
my $value = $xpc->findvalue('y:AVALUE', $atrid);
print "$name = $value\n";
}
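If you would rather not hard-code the namespace URI, a minimal sketch (assuming, as in spec.xml, that every element you query lives in the root element's default namespace) can read the URI from the root with namespaceURI and register it under an arbitrary prefix; the name y is just a label. The XML is inlined in trimmed form so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed copy of spec.xml, inlined so the sketch is self-contained
my $xml = <<'END_XML';
<?xml version="1.0" encoding="UTF-8"?>
<WO xmlns="http://www.example.com/yyyy"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <WOSet>
    <SR>
      <SPEC><AVALUE>IT</AVALUE><ATTRID>CUST_DEPT</ATTRID></SPEC>
      <SPEC><AVALUE>001</AVALUE><ATTRID>DEPT_CODE</ATTRID></SPEC>
    </SR>
  </WOSet>
</WO>
END_XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Take the default namespace URI from the root element instead of hard-coding it
my $ns = $doc->documentElement->namespaceURI;

my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(y => $ns);    # the prefix 'y' is arbitrary

for my $spec ($xpc->findnodes('/y:WO/y:WOSet/y:SR/y:SPEC')) {
    # Pass $spec as the context node so the relative paths start from it
    my $name  = $xpc->findvalue('y:ATTRID', $spec);
    my $value = $xpc->findvalue('y:AVALUE', $spec);
    print "$name = $value\n";
}
```

The trade-off is that this only works while the elements you need share the root's namespace; documents that mix several namespaces still need one registerNs call per URI.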

Related

Perl get XML node value using XML::LibXML

I am trying to print out the content of nodes for further processing. I want to print x_id="123" and the content of node "a". I am using the XML::LibXML parser. Any suggestions? I am very new to this parser.
Example XML:
<header>
<id x_id="123">
<a>testing</a>
<b></b>
</id>
</header>
My current (non-working) code:
use strict;
use warnings;
use XML::LibXML;
my $template = "xx.xml";
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($template);
my($object) = $doc->findnodes("/header/id/");
print $doc->findvalue("/header/id/x_id");
Here is a sample snippet for demonstration:
use strict;
use warnings;
use feature 'say';
use XML::LibXML;
my $file = 'test.xml';
my $dom = XML::LibXML->load_xml(location => $file);
foreach my $node ($dom->findnodes('//idset')) {
say 'NodeID: ', $node->{id};
say 'ItemA: ', $node->findvalue('./a');
say 'ItemB: ', $node->findvalue('./b');
say '';
}
Content of the input file test.xml:
<header>
<idset id="100">
<a>item_a</a>
<b>item_b</b>
</idset>
<idset id="101">
<a>item_c</a>
<b>item_d</b>
</idset>
</header>
Output
NodeID: 100
ItemA: item_a
ItemB: item_b
NodeID: 101
ItemA: item_c
ItemB: item_d
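Applied to the XML in the question, the two fixes are that an attribute is selected with @ in XPath (so /header/id/@x_id, not /header/id/x_id) and that an XPath expression must not end in a trailing /. A minimal sketch, with the question's XML inlined instead of read from xx.xml:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# The XML from the question, inlined instead of read from xx.xml
my $doc = XML::LibXML->load_xml(string => <<'END_XML');
<header>
<id x_id="123">
<a>testing</a>
<b></b>
</id>
</header>
END_XML

# '@' selects an attribute; a plain name selects a child element
my $x_id   = $doc->findvalue('/header/id/@x_id');
my $a_text = $doc->findvalue('/header/id/a');
print "x_id = $x_id\n";
print "a = $a_text\n";

# The same attribute read from the element node itself
my ($id) = $doc->findnodes('/header/id');
print 'x_id via getAttribute = ', $id->getAttribute('x_id'), "\n";
```

findvalue is convenient for a single value; getAttribute is handy once you already hold the element, for example inside a findnodes loop like the one above.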

Perl with XML::LibXML Dom (Globally Find and Replace XML)

I am new to DOM and XML::LibXML.
This is my sample MathML (XML) file. The input file is in.xml and the output should be written to out.xml. I would like to find <mi>bcde</mi>, replace it with <mtext>pqsd</mtext> everywhere it occurs, and store the result in out.xml. How can I achieve this?
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mfrac>
<mi>a</mi>
<mrow>
<mi>bcde</mi>
</mrow>
</mfrac>
<msqrt>
<mi>s</mi>
<mi>e</mi>
<mi>f</mi>
</msqrt>
</math>
#!/usr/bin/perl
use strict;
use warnings 'all';
use XML::LibXML;
my $mediaIdFrom = "MEDIAID_TEST";
my $VodItemIdFrom = "VODITEM_ID_TEST";
my $mediaId="";
my $vodItemId="";
my $filename = 'sample1.xml';
my $out_filename = "sample2.xml";
my $dom = XML::LibXML -> load_xml(location => $filename);
foreach $mediaId ($dom->findnodes('/ScheduleProvider/Episode/Media/@id')) {
    $mediaId->setValue("xx " . $mediaIdFrom . " yy");
}
foreach $vodItemId ($dom->findnodes('/ScheduleProvider/VoidItem/@id')) {
    $vodItemId->setValue($VodItemIdFrom);
}
#### for storing the output separate XML file
$dom->toFile($out_filename);
Your XML has a default namespace, but your XPath queries don't use one; see the note under findnodes in man XML::LibXML::Node. This code should work:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;
my $dom = XML::LibXML->load_xml(string => <<'END_OF_XML');
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mfrac>
<mi>a</mi>
<mrow>
<mi>bcde</mi>
</mrow>
</mfrac>
<msqrt>
<mi>s</mi>
<mi>e</mi>
<mi>f</mi>
</msqrt>
</math>
END_OF_XML
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs('math', 'http://www.w3.org/1998/Math/MathML');
foreach my $node ($xpc->findnodes('/math:math/math:mfrac/math:mrow/math:mi', $dom)) {
my $newNode = XML::LibXML::Element->new('mtext');
$newNode->appendText('pqsd');
$node->replaceNode($newNode);
}
print $dom->toString();
Output:
$ perl dummy.pl
<?xml version="1.0"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mfrac>
<mi>a</mi>
<mrow>
<mtext>pqsd</mtext>
</mrow>
</mfrac>
<msqrt>
<mi>s</mi>
<mi>e</mi>
<mi>f</mi>
</msqrt>
</math>
EDIT: Maybe I misunderstood your question and you want to replace all occurrences of <mi>bcde</mi>? Then the foreach line would change to:
foreach my $node ($xpc->findnodes('//math:mi[text()="bcde"]', $dom)) {
EDIT 2: To find several different <mi>xyz</mi> elements and replace them, you could pass text=replacement pairs as command-line parameters, e.g.:
foreach my $argv (@ARGV) {
    next unless my ($find, $replace) = ($argv =~ /^([^=]+)=(.*)$/);
    foreach my $node ($xpc->findnodes(qq{//math:mi[text()="${find}"]}, $dom)) {
        my $newNode = XML::LibXML::Element->new('mtext');
        $newNode->appendText($replace);
        $node->replaceNode($newNode);
    }
}
and your replacement example would then be run as:
$ perl dummy.pl bcde=pqsd
EDIT 3: To replace every <mi>xxx</mi> whose text has more than one character with <mtext>xxx</mtext>:
foreach my $node ($xpc->findnodes('//math:mi', $dom)) {
my $text = $node->textContent();
# strip surrounding white space from text
$text =~ s/^\s+//;
$text =~ s/\s+$//;
# if text has more than one character then replace "mi" with "mtext"
if (length($text) > 1) {
my $newNode = XML::LibXML::Element->new('mtext');
$newNode->appendText($text);
$node->replaceNode($newNode);
}
}
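Two details the answer leaves open can be combined in one sketch: the question asks for the result in out.xml, and XML::LibXML::Element->new('mtext') creates an element with no namespace, even though it serializes as a bare <mtext> under the MathML default namespace. The variant below, assuming the input is the in.xml shown in the question (inlined here), creates the replacement with createElementNS so it stays in the MathML namespace, matches with normalize-space so surrounding whitespace does not matter, and writes the document with toFile.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $MATHML = 'http://www.w3.org/1998/Math/MathML';

# Stand-in for in.xml (same content as in the question)
my $dom = XML::LibXML->load_xml(string => <<'END_OF_XML');
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mfrac>
<mi>a</mi>
<mrow>
<mi>bcde</mi>
</mrow>
</mfrac>
<msqrt>
<mi>s</mi>
<mi>e</mi>
<mi>f</mi>
</msqrt>
</math>
END_OF_XML

my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(m => $MATHML);

# Every <mi> anywhere in the document whose trimmed text is "bcde"
for my $mi ($xpc->findnodes('//m:mi[normalize-space(.) = "bcde"]')) {
    # createElementNS puts <mtext> in the MathML namespace, like its siblings
    my $mtext = $dom->createElementNS($MATHML, 'mtext');
    $mtext->appendText('pqsd');
    $mi->replaceNode($mtext);
}

# Write the modified document to the output file the question asks for
$dom->toFile('out.xml');
```

To read the real file, replace the string => heredoc with location => 'in.xml'. Keeping the new element in the MathML namespace means later XPath queries such as //m:mtext on the same in-memory document can find it.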

How do I extract an attribute/property in Perl using XML::Twig module?

Given the sample XML below, how do I extract the _Id attribute of the to element using XML::Twig?
<note>
<to _Id="100">Share</to>
<from>Jane</from>
<heading>Reminder</heading>
<body>A simple text</body>
</note>
I've tried combinations of the following, with no luck:
sub getId {
my ($twig, $mod) = @_;
##my $to_id = $mod->field('to')->{'_Id'}; ## does not work
##my $to_id = $mod->{'atts'}->{_Id}; ## does not work
##my $to_id = $mod->id; ## does not work
$twig->purge;
}
This is one way to get 100. It uses the first_child method:
use warnings;
use strict;
use XML::Twig;
my $xml = <<XML;
<note>
<to _Id="100">Share</to>
<from>Jane</from>
<heading>Reminder</heading>
<body>A simple text</body>
</note>
XML
my $twig = XML::Twig->new(twig_handlers => { note => \&getId });
$twig->parse($xml);
sub getId {
my ($twig, $mod) = @_;
my $to_id = $mod->first_child('to')->att('_Id');
print "$to_id \n";
}
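If the only goal is the attribute, a handler can also be attached to the to element itself: XML::Twig sets $_ to the element being handled, and att returns the value of the named attribute. A minimal sketch with the same sample XML, collecting the values into an array (the name @ids is just for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $xml = <<'XML';
<note>
<to _Id="100">Share</to>
<from>Jane</from>
<heading>Reminder</heading>
<body>A simple text</body>
</note>
XML

my @ids;
my $twig = XML::Twig->new(
    # Called once per <to> element; $_ is that element
    twig_handlers => { to => sub { push @ids, $_->att('_Id') } },
);
$twig->parse($xml);

print "to _Id: $_\n" for @ids;
```

Handling to directly avoids navigating from note, and it keeps working if the document contains many to elements.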

Perl script using an XML parser to read values from a text file and replace them in an XML file

How do I read each XML entry and replace its value with the value from the text file? If an entry's value is empty in install.properties, the same entry must be emptied in property.xml, and if an entry is empty in the XML, it should be filled in with the value from the text file.
install.properties text file
TYPE = Patch
LOCATION =
HOST = 127.1.1.1
PORT = 8080
property.xml before the values are replaced:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="TYPE">Release</entry>
<!-- tst -->
<entry key="LOCATION">c:/release</entry>
<entry key="HOST">localhost</entry>
<entry key="PORT"></entry>
</properties>
property.xml after the values have been replaced:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="TYPE">Patch</entry>
<!-- tst -->
<entry key="LOCATION"></entry>
<entry key="HOST">127.1.1.1</entry>
<entry key="PORT">8080</entry>
</properties>
A solution using XML::XSH2, a wrapper around XML::LibXML.
#!/usr/bin/perl
use warnings;
use strict;
use XML::XSH2;
open my $INS, '<', 'install.properties' or die $!;
while (<$INS>) {
chomp;
my ($var, $val) = split / = /; # / fix StackOverflow syntax highlighting.
$XML::XSH2::Map::ins->{$var} = $val;
}
xsh << '__XSH__';
open property.xml ;
for /properties/entry {
set ./text() xsh:lookup('ins', @key) ;
}
save :b ;
__XSH__
The same program implemented using only XML::LibXML:
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
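open my $INS, '<', 'install.properties' or die $!;
my %ins;
while (<$INS>) {
    chomp;
    # LIMIT 2 keeps an empty value, so "LOCATION =" maps LOCATION to ''
    my ($var, $val) = split /\s*=\s*/, $_, 2;
    $ins{$var} = defined $val ? $val : '';
}
my $xml = 'XML::LibXML'->load_xml( location => 'property.xml' );
for my $entry ( $xml->findnodes('/properties/entry') ) {
    my $key = $entry->getAttribute('key');
    next unless exists $ins{$key};
    # An empty <entry></entry> has no text node, so replace the children
    # instead of calling setData on a text node that may not exist
    $entry->removeChildNodes();
    $entry->appendText($ins{$key}) if length $ins{$key};
}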
rename 'property.xml', 'property.xml~';
$xml->toFile('property.xml');
Again, with XML::Twig:
#!/usr/bin/perl
use strict;
use warnings;
use autodie qw( open);
use XML::Twig;
my $IN= "install.properties";
my $XML= "property.xml";
# load the input file into a hash key => value
# (LIMIT 2 keeps empty values such as "LOCATION =")
open( my $in, '<', $IN);
my %entry= map { chomp; split /\s*=\s*/, $_, 2; } <$in>;
XML::Twig->new( twig_handlers => { entry => \&entry, },
                keep_spaces => 1,
              )
         ->parsefile_inplace( $XML);
sub entry
  { my( $t, $entry)= @_;
    my $key= $entry->att( 'key');
    # exists (not truth) so that empty values such as LOCATION are copied too
    if( exists $entry{$key} )
      { $entry->set_text( $entry{$key}); }
    $t->flush;
  }
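All three versions rely on turning install.properties into a hash, and the LOCATION = line is the case that is easy to get wrong: without a limit, split drops the trailing empty field, so the key gets no value, and a hash built directly from map goes out of step. A minimal sketch of just the parsing step, assuming the simple key = value format shown above (it does not handle line continuations or the rest of the Java properties syntax) and reading from an in-memory copy of the file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory copy of install.properties (the same four lines as above)
my $properties = <<'END';
TYPE = Patch
LOCATION =
HOST = 127.1.1.1
PORT = 8080
END

open my $in, '<', \$properties or die "Cannot open in-memory file: $!";
my %ins;
while (my $line = <$in>) {
    chomp $line;
    next if $line =~ /^\s*(?:[#!]|$)/;    # skip blank lines and comments
    # LIMIT 2 keeps a trailing empty field, so "LOCATION =" yields ('LOCATION', '')
    my ($key, $value) = split /\s*=\s*/, $line, 2;
    $ins{$key} = defined $value ? $value : '';
}
close $in;

printf "%-8s => '%s'\n", $_, $ins{$_} for sort keys %ins;
```

With every key present (LOCATION mapped to an empty string), the update step can use exists rather than a truth test, so an empty value in the text file empties the matching entry in property.xml.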

Is there a reason to use the XML::LibXML::Number-object in my XML::LibXML-example?

In this example I get '96' both times. Is there any case where I would need an XML::LibXML::Number object to achieve the goal?
#!/usr/bin/env perl
use warnings; use strict;
use 5.012;
use XML::LibXML;
my $xml_string =<<EOF;
<?xml version="1.0" encoding="UTF-8"?>
<filesystem>
<path>
<dirname>/var</dirname>
<files>
<action>delete</action>
<age units="hours">10</age>
</files>
<files>
<action>delete</action>
<age units="hours">96</age>
</files>
</path>
</filesystem>
EOF
my $doc = XML::LibXML->load_xml( string => $xml_string );
my $root = $doc->documentElement;
my $result = $root->find( '//files/age[@units="hours"]' );
$result = $result->get_node( 1 );
say ref $result; # XML::LibXML::Element
say $result->textContent; # 96
$result = $root->find ( 'number( //files/age[@units="hours"] )' );
say ref $result; # XML::LibXML::Number
say $result; # 96
Although I've used XML::LibXML quite a bit I have never encountered the XML::LibXML::Number class. It seems to exist to allow XPath expressions to make numerical assertions about the text content of a node (e.g.: > 10).
If all you want is the number '96' then the easiest way is probably:
my $result = $root->findvalue( '//files/age[@units="hours"]' );
An idiom I find useful for getting multiple values is:
my @values = map { $_->textContent } $doc->findnodes('//files/age');
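One case where an XML::LibXML::Number shows up naturally is when the XPath expression itself evaluates to a number, for example count() or sum(); find then returns a Number object, and its value method gives the plain Perl number. For selecting nodes by their numeric content, a predicate such as [. > 24] is enough and no Number object is involved. A small sketch using the question's XML (the files elements are compacted onto single lines):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => <<'END_XML');
<filesystem>
<path>
<dirname>/var</dirname>
<files><action>delete</action><age units="hours">10</age></files>
<files><action>delete</action><age units="hours">96</age></files>
</path>
</filesystem>
END_XML

# XPath expressions that evaluate to a number come back as XML::LibXML::Number
my $count = $doc->find('count(//files)');
my $total = $doc->find('sum(//files/age[@units="hours"])');
print ref($count), "\n";      # XML::LibXML::Number
print $count->value, "\n";    # 2
print $total->value, "\n";    # 106

# A numeric predicate filters nodes by value without any Number object
for my $age ($doc->findnodes('//files/age[. > 24]')) {
    print 'over a day: ', $age->textContent, "\n";    # 96
}
```

In other words, you rarely construct a Number yourself; it is simply the wrapper for numeric XPath results, alongside XML::LibXML::Literal for strings and XML::LibXML::Boolean for booleans.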