xml parsing and string manipulation using perl - perl

I have an xml file with many lines similar to :
<parameter element="XYZ" module="XYZ" parametername="MyParameter" moc="MyParameter" moi="ABC=1473,DEF=0,GHI=0,JKL=0 />
My requirements are :
If the moc and parametername are same convert the first character in the parametername to lower case.
Reverse the moi like below.
So the converted line should be like :
<parameter element="XYZ" module="XYZ" parametername="myParameter" moc="MyParameter" moi="JKL=0,GHI=0,DEF.dEF=0,ABC.aBC=1473 />

Using XML::LibXML and adding a new element for a complete example (and assuming the changes are to be made on an element called parameter):
use strict;
use warnings;
use XML::LibXML;
my $dom = XML::LibXML->load_xml(string => <DATA>);
for my $node($dom->findnodes('//parameter')) {
my $param = $node->getAttribute('parametername');
my $moc = $node->getAttribute('moc');
my #moi = split ",", $node->getAttribute('moi');
$node->setAttribute('parametername', lcfirst $param) if $param eq $moc;
$node->setAttribute('moi', join ',', reverse #moi);
}
print $dom;
__DATA__
<root>
<parameter element="XYZ" module="XYZ" parametername="MyParameter" moc="MyParameter" moi="ABC=1473,DEF=0,GHI=0,JKL=0"/>
<parameter element="XYZ" module="XYZ" parametername="foo" moc="MyParameter" moi="XYZ=1473,DEF=0,GHI=0,JKL=0"/>
</root>
Result:
<root>
<parameter element="XYZ" module="XYZ" parametername="myParameter" moc="MyParameter" moi="JKL=0,GHI=0,DEF=0,ABC=1473"/>
<parameter element="XYZ" module="XYZ" parametername="foo" moc="MyParameter" moi="JKL=0,GHI=0,DEF=0,XYZ=1473"/>
</root>
Other ways to load_xml file using XML::LibXML

use strict;
use warnings;
use 5.014;
use XML::LibXML;
my $filename = "xml.xml";
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($filename);
say $doc;
for my $param ($doc->findnodes('//parameter')) {
my $pname_attr = $param->getAttribute('parametername');
my $moc_attr = $param->getAttribute('moc');
if ($pname_attr eq $moc_attr) {
$param->setAttribute('parametername', lcfirst $pname_attr);
my $moi_attr = $param->getAttribute('moi');
my #pieces = split ',', $moi_attr;
$pieces[0] =~ s/\A([^=]+)/$1.\l$1/xms;
$pieces[1] =~ s/\A([^=]+)/$1.\l$1/xms;
$param->setAttribute('moi', join ',', reverse #pieces);
}
}
say $doc;
--output:--
<?xml version="1.0" encoding="UTF-8"?>
<root>
<parameter element="XYZ" module="XYZ" parametername="ABC" moc="CBA" moi="ABC=1473,DEF=0,GHI=0,JKL=0"/>
<parameter element="XYZ" module="XYZ" parametername="MyParameter" moc="MyParameter" moi="ABC=1473,DEF=0,GHI=0,JKL=0"/>
</root>
<?xml version="1.0" encoding="UTF-8"?>
<root>
<parameter element="XYZ" module="XYZ" parametername="ABC" moc="CBA" moi="ABC=1473,DEF=0,GHI=0,JKL=0"/>
<parameter element="XYZ" module="XYZ" parametername="myParameter" moc="MyParameter" moi="JKL=0,GHI=0,DEF.dEF=0,ABC.aBC=1473"/>
</root>
If you want to change the moi attribute in all the <parameter> tags, then the code would look like this:
...
...
for my $param ($doc->findnodes('//parameter')) {
my $pname_attr = $param->getAttribute('parametername');
my $moc_attr = $param->getAttribute('moc');
if ($pname_attr eq $moc_attr) {
$param->setAttribute('parametername', lcfirst $pname_attr);
}
my $moi_attr = $param->getAttribute('moi');
my #pieces = split ',', $moi_attr;
$pieces[0] =~ s/\A([^=]+)/$1.\l$1/xms;
$pieces[1] =~ s/\A([^=]+)/$1.\l$1/xms;
$param->setAttribute('moi', join ',', reverse #pieces);
}
Response to comments:
1)
When I run it it says >/usr/bin/perl edit_mpvl.pl Perl v5.14.0
required--this is only v5.10.0,
Change the line:
use 5.014;
to:
use 5.010;
2)
Can we write the output to a file
Sure, add this:
my $fname = 'modified.xml';
open my $OUTFILE, '>', $fname
or die "Couldn't open $fname: $!";
print {$OUTFILE} $doc->toString;
close $OUTFILE;
Or, you can pretty print like this:
...
...
use XML::LibXML::PrettyPrint;
use Readonly;
...
...
Readonly my $SPACE => " ";
my $pp = XML::LibXML::PrettyPrint->new(
indent_string => $SPACE x 4 #Replace 4 by the number of spaces you want the indenting to be.
);
$pp->pretty_print($doc); #modifies $doc inplace
print {$OUTFILE} $doc->toString;
close $OUTFILE;

Related

make changes in xml file in perl

I have this xml file test1.xml:
<body>
<message>
<name>gandalf</name>
<attributes>
<value key="1">1</value>
<value key="2">2</value>
<value key="3">3</value>
<value key="4">4</value>
</attributes>
</message>
</body>
I want to override the value that its key is "4" to "10"
so my xml will look like this:
<body>
<message>
<name>gandalf</name>
<attributes>
<value key="1">1</value>
<value key="2">2</value>
<value key="3">3</value>
<value key="4">10</value>
</attributes>
</message>
</body>
this is my code:
#!/usr/bin/perl
use XML::Simple;
my $xml = new XML::Simple;
my $data = XMLin("test1.xml", ForceArray => 1);
$data->{message}->[0]->{attributes}->[0]->{value}->{4}->{content} = "10";
$newData = $xml->XMLout($data);
open(XML,">test2.xml");
print XML $newData;
close(XML);
when i run this code, the output xml looks like this:
<opt>
<message>
<name>gandalf</name>
<attributes name="value">
<1>1<1>
<2>2<2>
<3>3<3>
<4>10<4>
</attributes>
</message>
</opt>
Don't use XML::Simple.
XML::LibXML and XML::Twig are much better alternatives.
Here's the solution using XML::Twig:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $xml = XML::Twig -> new -> parsefile ( 'test1.xml' );
$_ -> set_text('10') for $xml -> get_xpath('//message/attributes/value[#key="4"]');
$xml -> set_pretty_print('indented');
$xml -> print;
This gives you:
<body>
<message>
<name>gandalf</name>
<attributes>
<value key="1">1</value>
<value key="2">2</value>
<value key="3">3</value>
<value key="4">10</value>
</attributes>
</message>
</body>
You can print to a file, by opening a filehandle and giving that fh as an argument to print:
open ( my $ouput, '>', 'test2.xml' ) or die $!;
$xml -> print ( $output );
Because you also ask in comments:
I also want to know how to set text for a value with a key that doesnt exists. For example i want to add <value key="5">5</value> inside the attributes
my $attributes = $xml -> get_xpath('//message/attributes',0); #0 to find the first one.
$attributes -> insert_new_elt('last_child', 'value', { key => 5 }, 5 );
Or as one line:
$xml -> get_xpath('//message/attributes',0) -> insert_new_elt('last_child', 'value', { key => 5 }, 5 );
Note the slightly different usage of get_xpath - we give a second argument 0 - because that says 'get the first element that matches', rather than every element that matches.

How to extract attribute value from a xml file

I have an xml file like this
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<ids modified="BF43C70442ECB74FA49833BBA44D4679" original="B4870CC046121A41B7D8F0838C87256D"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
<field name="txt_bestelltKW">
<value></value>
</field>
</fields>
</xfdf>
Now I need to extract the value of the f href attribute. I tried it with single line processing but there is certainly a better way to do it. Any idea?
Thanks
After fixing the typo in your XML, I was able to extract the value with the following code:
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
my $dom = 'XML::LibXML'->load_xml( file => 'example.xml' );
my $xc = 'XML::LibXML::XPathContext'->new;
$xc->registerNs('x', 'http://ns.adobe.com/xfdf/');
for my $href ($xc->findvalue('//x:f/#href', $dom)) {
print $href, "\n";
}
I usually find XML::LibXML too verbose, so I'd use XML::XSH2:
open example.xml ;
register-namespace x http://ns.adobe.com/xfdf/ ;
for //x:f echo #href ;
I like XML::Twig. Not to dispute previous poster's solution, I'd do it like this:
use strict;
use warnings;
use XML::Twig;
sub extract_f {
my ( $twig, $f ) = #_;
print $f->atts->{'href'}, "\n";
}
my $twig = XML::Twig->new( twig_handlers => { 'f' => \&extract_f }, );
$twig->parse( \*DATA );
__DATA__
<?xml version="1.0" encoding="UTF-8"?><xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve" >
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<ids modified="BF43C70442ECB74FA49833BBA44D4679" original="B4870CC046121A41B7D8F0838C87256D"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
<field name="txt_bestelltKW">
<value></value>
</field>
</fields>
</xfdf>
The major reason I like XML::Twig is because it allows purging XML as you go - so if you have a lot of XML to work with, it can be invaluable.
I would recommend either XML::LibXML or XML::Twig.
I would consider your goal rather trivial if not for having to deal with namespaces. However, the following demonstrates how to use XML::LibXML to pull your desired value while ignoring the namespaces:
use strict;
use warnings;
use XML::LibXML;
my $dom = XML::LibXML->load_xml( IO => \*DATA );
my ($f) = $dom->findnodes('//*[local-name()="f"]');
print $f->getAttribute('href'), "\n";
__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<ids modified="BF43C70442ECB74FA49833BBA44D4679" original="B4870CC046121A41B7D8F0838C87256D"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
<field name="txt_bestelltKW">
<value></value>
</field>
</fields>
</xfdf>
Outputs:
C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf

perl script to iterate over xml nodes using XML::LibXML

I am trying to come up with a perl script to iterate over some nodes and get values in xml file.
My XML File looks like below and is saved spec.xml
<?xml version="1.0" encoding="UTF-8"?>
<WO xmlns="http://www.example.com/yyyy" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
<WOSet>
<SR>
<FINISHTIME>2013-07-29T18:21:38-05:00</FINISHTIME>
<STARTTIME xsi:nil="true" />
<TYPE>SR</TYPE>
<DESCRIPTION>Create CUST</DESCRIPTION>
<EXTERNALSYSTEMID />
<REPORTEDBY>PCAUSR</REPORTEDBY>
<REPORTEDEMAIL />
<STATUS>RESOLVED</STATUS>
<SRID>1001</SRID>
<UID>1</UID>
<SPEC>
<AVALUE>IT</AVALUE>
<ATTRID>CUST_DEPT</ATTRID>
<NALUE xsi:nil="true" />
<TVALUE />
</SPEC>
<SPEC>
<AVALUE>001</AVALUE>
<ATTRID>DEPT_CODE</ATTRID>
<NVALUE xsi:nil="true" />
<TVALUE />
</SPEC>
</SR>
</WOSet>
</WO>
when I run the below script , I neither get the output nor any error to get clue on where to fix things...
I am not a perl expert , would love experts here to through some light...
#!/usr/bin/perl
use XML::LibXML;
use strict;
use warnings;
my $file = 'spec.xml';
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($file);
my $root = $tree->getDocumentElement;
foreach my $atrid ( $tree->findnodes('WO/WOSet/SR/SPEC') ) {
my $name = $atrid->findvalue('ATTRID');
my $value = $atrid->findvalue('AVALUE');
print $name
print " = ";
print $value;
print ";\n";
}
My expected output is
CUST_DEPT = IT
DEPT_CODE = 001
The XML doesn't contain any element named WO in the null namespace. You want to match the elements named WO in the http://www.example.com/yyyy namespace.
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML qw( );
use XML::LibXML::XPathContext qw( );
my $file = 'spec.xml';
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);
my $root = $doc->getDocumentElement;
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(y => 'http://www.example.com/yyyy');
for my $atrid ( $xpc->findnodes('y:WO/y:WOSet/y:SR/y:SPEC') ) {
my $name = $xpc->findvalue('y:ATTRID', $atrid);
my $value = $xpc->findvalue('y:AVALUE', $atrid);
print "$name = $value\n";
}

perl script using XML parser to read values in text file and replace it xml file

Perl script using XML parser to read values in text file and replace it in xml file
how to read xml tag and replace value from text file value. if an entry value is null in install.properties then same has to be updated in property.xml and if entry value is null xml it should get updated with text file value
install.properties text file
TYPE = Patch
LOCATION =
HOST = 127.1.1.1
PORT = 8080
property.xml file before values are replaced
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="TYPE">Release</entry>
<!-- tst -->
<entry key="LOCATION">c:/release</entry>
<entry key="HOST">localhost</entry>
<entry key="PORT"></entry>
</properties>
property.xml file after values has been replaced
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="TYPE">Patch</entry>
<!-- tst -->
<entry key="LOCATION"></entry>
<entry key="HOST">127.1.1.1</entry>
<entry key="PORT">8080</entry>
</properties>
A solution using XML::XSH2, a wrapper around XML::LibXML.
#!/usr/bin/perl
use warnings;
use strict;
use XML::XSH2;
open my $INS, '<', 'install.properties' or die $!;
while (<$INS>) {
chomp;
my ($var, $val) = split / = /; # / fix StackOverflow syntax highlighting.
$XML::XSH2::Map::ins->{$var} = $val;
}
xsh << '__XSH__';
open property.xml ;
for /properties/entry {
set ./text() xsh:lookup('ins', #key) ;
}
save :b ;
__XSH__
The same programme imlemented using only XML::LibXML:
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
open my $INS, '<', 'install.properties' or die $!;
my %ins;
while (<$INS>) {
chomp;
my ($var, $val) = split / = /; # / fix StackOverflow syntax highlighting.
$ins{$var} = $val;
}
my $xml = 'XML::LibXML'->load_xml( location => 'property.xml' );
for my $entry( $xml->findnodes('/properties/entry')) {
my ($text) = $entry->findnodes('text()');
$text->setData($ins{ $entry->getAttribute('key') });
}
rename 'property.xml', 'property.xml~';
$xml->toFile('property.xml');
Again, with XML::Twig:
#!/usr/bin/perl
use strict;
use warnings;
use autodie qw( open);
use XML::Twig;
my $IN= "install.properties";
my $XML= "properties.xml";
# load the input file into a a hash key => value
open( my $in, '<', $IN);
my %entry= map { chomp; split /\s*=\s*/; } <$in>;
XML::Twig->new( twig_handlers => { entry => \&entry, },
keep_spaces => 1,
)
->parsefile_inplace( $XML);
sub entry
{ my( $t, $entry)= #_;
if( my $val= $entry{$entry->att( 'key')} )
{ $entry->set_text( $val); }
$t->flush;
}

Extract XML tag content in Perl using XML::LibXML

I have the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="CoreNLP-to-HTML.xsl" type="text/xsl"?>
<root>
<document>
<sentences>
<sentence id="1">
<basic-dependencies>
<dep type="nn">
<governor idx="2">Planted</governor>
<dependent idx="1">Europeans</dependent>
</dep>
</basic-dependencies>
</sentence>
</sentences>
</document>
</root>
I can extract the contents 'Europeans' using the code given below. Is there any way I can extract "nn" from the tag using XML::LibXML?
use strict;
use warnings;
use XML::LibXML qw( );
my $output = $filename.'.xml';
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($output);
for my $dependency_node ($doc->findnodes('//document/sentences/sentence/basic-dependencies'))
{
for my $dependent_node ($dependency_node->findnodes('dep'))
{
my $word = $dependent_node->findvalue('dependent/text()');
print "$word\n";
}
}
Yes, just change the assignment to
my $word = $dependent_node->findvalue('#type');
Attributes in XPath start with the # sign.