I have a Perl script to convert the XML file below into a hash:
<university>
<name>svu</name>
<location>ravru</location>
<branch>
<electronics>
<student name="xxx" number="12">
<semester number="1"subjects="7" rank="2"/>
</student>
<student name="xxx" number="15">
<semester number="1" subjects="7" rank="10"/>
<semester number="2" subjects="4" rank="1"/>
</student>
<student name="xxx" number="16">
<semester number="1"subjects="7" rank="2"/>
<semester number="2"subjects="4" rank="2"/>
</student>
</electronics>
</branch>
</university>.
.
.
.
.
.
<data>
<student name="msr" number="1" branch="computers" />
<student name="ksr" number="2" branch="electronics" />
<student name="lsr" number="3" branch="EEE" />
<student name="csr" number="4" branch="IT" />
<student name="msr" number="5" branch="MEC" />
<student name="ssr" number="6" branch="computers" />
<student name="msr" number="1" branch="CIV" />
.............................
..............................
.....................
</data>
How can I create a hash table for the data elements, with the name and number together as the key and the branch as the value? I need this because some students share a name and some share a number.
Using this hash key, I then have to search the university node for each student and, if the student is found, print that student's branch name.
I wrote a script using XML::Simple but am not able to create the hash.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use XML::Simple;
my $xml = new XML::Simple;
my $data = $xml->XMLin("data.xml", forcearray => [ 'student' , 'semister' ],
KeyAttr => { student => "+Name" } );
print Dumper($data);
Using Data::Dumper I am printing the whole XML structure, but I only need to print the elements of the data node. Please help me with how to do this.
I would probably write my own XML::Parser handler to combine attributes into key values (if that's something supported by XML::Simple I couldn't find it in the docs). This example should get you started:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;
use Data::Dumper;
my %hash;
sub tag_start { my ($expat, $tagname) = (shift, shift);
# attributes are now in @_
my %a = @_; # attribute hash for this tag
my $context = join('/',$expat->context()) || '';
if ($context eq 'xml/data') {
if ($tagname eq 'student') {
push @{($hash{"$a{name}:$a{number}"}||=[])}, $a{branch};
}
} elsif ($context eq ...) {
...
}
}
my $p = new XML::Parser(Handlers => { Start=>\&tag_start });
$p->parsefile('file.xml');
print Dumper \%hash;
Note that to get this to work I had to clean up your XML a bit by enclosing it in an <xml> tag and adding some missing spaces:
<xml>
<university>
<name>svu</name>
<location>ravru</location>
<branch>
<electronics>
<student name="xxx" number="12">
<semester number="1" subjects="7" rank="2"/>
</student>
<student name="xxx" number="15">
<semester number="1" subjects="7" rank="10"/>
<semester number="2" subjects="4" rank="1"/>
</student>
<student name="xxx" number="16">
<semester number="1" subjects="7" rank="2"/>
<semester number="2" subjects="4" rank="2"/>
</student>
</electronics>
</branch>
</university>
<data>
<student name="msr" number="1" branch="computers" />
<student name="ksr" number="2" branch="electronics" />
<student name="lsr" number="3" branch="EEE" />
<student name="csr" number="4" branch="IT" />
<student name="msr" number="5" branch="MEC" />
<student name="ssr" number="6" branch="computers" />
<student name="msr" number="1" branch="CIV" />
</data>
</xml>
Result:
$VAR1 = {
'ksr:2' => [
'electronics'
],
'msr:1' => [
'computers',
'CIV'
],
'csr:4' => [
'IT'
],
'ssr:6' => [
'computers'
],
'msr:5' => [
'MEC'
],
'lsr:3' => [
'EEE'
]
};
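The lookup step the question asks about is left elided in the handler above. One possible way to fill it in is sketched below, not as the answer's own method: collect every student outside data during the parse and resolve the branches afterwards, because data comes after university in the file. The handler name on_start, the variables %branch_for and @enrolled, and the shortened sample document (including a ksr:2 student placed under university so that one lookup succeeds) are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Parser;

my (%branch_for, @enrolled);

# Start handler: @_ holds the expat object, the tag name,
# then the attribute name/value pairs.
sub on_start {
    my ($expat, $tag, %attr) = @_;
    return unless $tag eq 'student';
    my $parent = ($expat->context)[-1] // '';
    my $key    = "$attr{name}:$attr{number}";
    if ($parent eq 'data') {
        push @{ $branch_for{$key} }, $attr{branch};
    }
    else {
        push @enrolled, $key;    # a student somewhere under <university>
    }
}

# Shortened, made-up sample in the same shape as the cleaned-up file.
my $xml = <<'XML';
<xml>
  <university><branch><electronics>
    <student name="ksr" number="2"><semester number="1" subjects="7" rank="2"/></student>
    <student name="xxx" number="12"/>
  </electronics></branch></university>
  <data>
    <student name="ksr" number="2" branch="electronics"/>
    <student name="msr" number="1" branch="computers"/>
  </data>
</xml>
XML

XML::Parser->new(Handlers => { Start => \&on_start })->parse($xml);

# <data> follows <university>, so the lookup has to wait until parsing ends.
for my $key (@enrolled) {
    my $branches = $branch_for{$key};
    print "$key: ", ($branches ? join(', ', @$branches) : 'not found'), "\n";
}
```

Keeping the branches in an array per key preserves duplicates such as msr:1, which appears twice in the data node.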
There is no need to use XML::Simple and XML::Fast together. Both do essentially the same job.
Invoking multiple XML parsers for the same functionality invites trouble: undesired behavior, code that should work but doesn't, and debugging that will leave you holding your head in your hands because identically named methods are treading on one another's toes.
I'd stick with XML::Fast for this case:
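use strict;
use warnings;
use XML::Fast;
# xml2hash parses XML text, not a file name, so read the file first
open my $fh, '<', 'data.xml' or die "Cannot open data.xml: $!";
my $xml = do { local $/; <$fh> };
close $fh;
my $data = xml2hash $xml, array => [ 'student', 'semester' ];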
Even if the structure is not exactly the desired one, $data can easily be post-processed and seasoned to taste (it is a data structure after all).
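As a rough illustration of that post-processing, the sketch below re-keys a list of student records by "name:number". The helper index_by_name_number is hypothetical, and the input is a hand-built stand-in for the parsed data node. It assumes attribute keys carry a '-' prefix, so adjust the key names to whatever your parser actually produces.

```perl
use strict;
use warnings;

# Hypothetical helper: index student records by "name:number".
# $students is assumed to be an array ref of hashes whose attribute
# keys carry a '-' prefix; adjust the names to your parser's output.
sub index_by_name_number {
    my ($students) = @_;
    my %branches_for;
    for my $s (@$students) {
        my $key = join ':', $s->{'-name'}, $s->{'-number'};
        push @{ $branches_for{$key} }, $s->{'-branch'};
    }
    return \%branches_for;
}

# Hand-built stand-in for the parsed <data> node.
my $students = [
    { '-name' => 'msr', '-number' => 1, '-branch' => 'computers' },
    { '-name' => 'ksr', '-number' => 2, '-branch' => 'electronics' },
    { '-name' => 'msr', '-number' => 1, '-branch' => 'CIV' },
];

my $index = index_by_name_number($students);
print "$_ => @{ $index->{$_} }\n" for sort keys %$index;
```

Each value is an array because the same name and number can occur more than once with different branches.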
I have this xml file test1.xml:
<body>
<message>
<name>gandalf</name>
<attributes>
<value key="1">1</value>
<value key="2">2</value>
<value key="3">3</value>
<value key="4">4</value>
</attributes>
</message>
</body>
I want to change the value whose key is "4" to "10",
so that my XML looks like this:
<body>
<message>
<name>gandalf</name>
<attributes>
<value key="1">1</value>
<value key="2">2</value>
<value key="3">3</value>
<value key="4">10</value>
</attributes>
</message>
</body>
This is my code:
#!/usr/bin/perl
use XML::Simple;
my $xml = new XML::Simple;
my $data = XMLin("test1.xml", ForceArray => 1);
$data->{message}->[0]->{attributes}->[0]->{value}->{4}->{content} = "10";
$newData = $xml->XMLout($data);
open(XML,">test2.xml");
print XML $newData;
close(XML);
When I run this code, the output XML looks like this:
<opt>
<message>
<name>gandalf</name>
<attributes name="value">
<1>1<1>
<2>2<2>
<3>3<3>
<4>10<4>
</attributes>
</message>
</opt>
Don't use XML::Simple.
XML::LibXML and XML::Twig are much better alternatives.
Here's the solution using XML::Twig:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $xml = XML::Twig -> new -> parsefile ( 'test1.xml' );
$_ -> set_text('10') for $xml -> get_xpath('//message/attributes/value[@key="4"]');
$xml -> set_pretty_print('indented');
$xml -> print;
This gives you:
<body>
<message>
<name>gandalf</name>
<attributes>
<value key="1">1</value>
<value key="2">2</value>
<value key="3">3</value>
<value key="4">10</value>
</attributes>
</message>
</body>
You can print to a file, by opening a filehandle and giving that fh as an argument to print:
open ( my $output, '>', 'test2.xml' ) or die $!;
$xml -> print ( $output );
Because you also ask in comments:
I also want to know how to set text for a value with a key that doesnt exists. For example i want to add <value key="5">5</value> inside the attributes
my $attributes = $xml -> get_xpath('//message/attributes',0); #0 to find the first one.
$attributes -> insert_new_elt('last_child', 'value', { key => 5 }, 5 );
Or as one line:
$xml -> get_xpath('//message/attributes',0) -> insert_new_elt('last_child', 'value', { key => 5 }, 5 );
Note the slightly different usage of get_xpath - we give a second argument 0 - because that says 'get the first element that matches', rather than every element that matches.
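For comparison, the same two edits could be done with XML::LibXML, the other module recommended above. This is only a sketch on a shortened copy of the sample document. It swaps in XML::LibXML's DOM calls and is not a translation of the Twig code line for line.

```perl
use strict;
use warnings;
use XML::LibXML;

# Shortened copy of the sample document.
my $xml = <<'XML';
<body>
  <message>
    <name>gandalf</name>
    <attributes>
      <value key="1">1</value>
      <value key="4">4</value>
    </attributes>
  </message>
</body>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Replace the text of every <value> whose key attribute is "4".
for my $value ($dom->findnodes('//message/attributes/value[@key="4"]')) {
    $value->removeChildNodes;
    $value->appendText('10');
}

# Append <value key="5">5</value> to the first <attributes> element.
my ($attributes) = $dom->findnodes('//message/attributes');
my $new = $dom->createElement('value');
$new->setAttribute(key => 5);
$new->appendText('5');
$attributes->appendChild($new);

print $dom->toString;
```

Assigning findnodes to a one-element list plays the same role as the 0 passed to get_xpath: it keeps only the first match.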
I have an xml file with many lines similar to :
<parameter element="XYZ" module="XYZ" parametername="MyParameter" moc="MyParameter" moi="ABC=1473,DEF=0,GHI=0,JKL=0 />
My requirements are :
If the moc and parametername are same convert the first character in the parametername to lower case.
Reverse the moi like below.
So the converted line should be like :
<parameter element="XYZ" module="XYZ" parametername="myParameter" moc="MyParameter" moi="JKL=0,GHI=0,DEF.dEF=0,ABC.aBC=1473 />
Using XML::LibXML and adding a new element for a complete example (and assuming the changes are to be made on an element called parameter):
use strict;
use warnings;
use XML::LibXML;
my $dom = XML::LibXML->load_xml(IO => \*DATA); # read the whole __DATA__ section
for my $node ($dom->findnodes('//parameter')) {
my $param = $node->getAttribute('parametername');
my $moc = $node->getAttribute('moc');
my @moi = split ",", $node->getAttribute('moi');
$node->setAttribute('parametername', lcfirst $param) if $param eq $moc;
$node->setAttribute('moi', join ',', reverse @moi);
}
print $dom;
__DATA__
<root>
<parameter element="XYZ" module="XYZ" parametername="MyParameter" moc="MyParameter" moi="ABC=1473,DEF=0,GHI=0,JKL=0"/>
<parameter element="XYZ" module="XYZ" parametername="foo" moc="MyParameter" moi="XYZ=1473,DEF=0,GHI=0,JKL=0"/>
</root>
Result:
<root>
<parameter element="XYZ" module="XYZ" parametername="myParameter" moc="MyParameter" moi="JKL=0,GHI=0,DEF=0,ABC=1473"/>
<parameter element="XYZ" module="XYZ" parametername="foo" moc="MyParameter" moi="JKL=0,GHI=0,DEF=0,XYZ=1473"/>
</root>
Another way to load the XML file with XML::LibXML:
use strict;
use warnings;
use 5.014;
use XML::LibXML;
my $filename = "xml.xml";
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($filename);
say $doc;
for my $param ($doc->findnodes('//parameter')) {
my $pname_attr = $param->getAttribute('parametername');
my $moc_attr = $param->getAttribute('moc');
if ($pname_attr eq $moc_attr) {
$param->setAttribute('parametername', lcfirst $pname_attr);
my $moi_attr = $param->getAttribute('moi');
my @pieces = split ',', $moi_attr;
$pieces[0] =~ s/\A([^=]+)/$1.\l$1/xms;
$pieces[1] =~ s/\A([^=]+)/$1.\l$1/xms;
$param->setAttribute('moi', join ',', reverse @pieces);
}
}
say $doc;
--output:--
<?xml version="1.0" encoding="UTF-8"?>
<root>
<parameter element="XYZ" module="XYZ" parametername="ABC" moc="CBA" moi="ABC=1473,DEF=0,GHI=0,JKL=0"/>
<parameter element="XYZ" module="XYZ" parametername="MyParameter" moc="MyParameter" moi="ABC=1473,DEF=0,GHI=0,JKL=0"/>
</root>
<?xml version="1.0" encoding="UTF-8"?>
<root>
<parameter element="XYZ" module="XYZ" parametername="ABC" moc="CBA" moi="ABC=1473,DEF=0,GHI=0,JKL=0"/>
<parameter element="XYZ" module="XYZ" parametername="myParameter" moc="MyParameter" moi="JKL=0,GHI=0,DEF.dEF=0,ABC.aBC=1473"/>
</root>
If you want to change the moi attribute in all the <parameter> tags, then the code would look like this:
...
...
for my $param ($doc->findnodes('//parameter')) {
my $pname_attr = $param->getAttribute('parametername');
my $moc_attr = $param->getAttribute('moc');
if ($pname_attr eq $moc_attr) {
$param->setAttribute('parametername', lcfirst $pname_attr);
}
my $moi_attr = $param->getAttribute('moi');
my @pieces = split ',', $moi_attr;
$pieces[0] =~ s/\A([^=]+)/$1.\l$1/xms;
$pieces[1] =~ s/\A([^=]+)/$1.\l$1/xms;
$param->setAttribute('moi', join ',', reverse @pieces);
}
Response to comments:
1)
When I run it it says >/usr/bin/perl edit_mpvl.pl Perl v5.14.0
required--this is only v5.10.0,
Change the line:
use 5.014;
to:
use 5.010;
2)
Can we write the output to a file
Sure, add this:
my $fname = 'modified.xml';
open my $OUTFILE, '>', $fname
or die "Couldn't open $fname: $!";
print {$OUTFILE} $doc->toString;
close $OUTFILE;
Or, you can pretty print like this:
...
...
use XML::LibXML::PrettyPrint;
use Readonly;
...
...
Readonly my $SPACE => " ";
my $pp = XML::LibXML::PrettyPrint->new(
indent_string => $SPACE x 4 #Replace 4 by the number of spaces you want the indenting to be.
);
$pp->pretty_print($doc); #modifies $doc inplace
print {$OUTFILE} $doc->toString;
close $OUTFILE;
This is my sample xml file (snipped version of deployment plan for weblogic application).
<?xml version="1.0" encoding="UTF-8"?>
<deployment-plan>
<application-name>ear-my-service</application-name>
<variable-definition>
<variable>
<name>characteristics</name>
<value>myTestEAR</value>
</variable>
<variable>
<name>Url</name>
<value>ABC</value>
</variable>
<variable>
<name>time</name>
<value>300</value>
</variable>
</variable-definition>
</deployment-plan>
I want to update the value when the name is characteristics. I looked around and put together this script.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $data = shift ||die $!;
my $t= XML::Twig->new(
twig_handlers => {
q{project[string(name) =~ /\bcharacteristics\b/]/value} => \&value,
},
pretty_print => 'indented',
);
$t->parsefile( $data );
$t->print;
sub value {
my ($twig, $value) = @_;
$value->set_text("myTestEAR_Modified");
}
However, it doesn't change the value to myTestEAR_Modified. Am I doing it incorrectly?
The tag name in the XML is variable, not project. Replace it in the condition (q{variable[string(name)...) and it will work.
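If the XPath-style condition is awkward to debug, an alternative (a different technique from the one in the question) is to handle each complete variable element and test its name child in plain Perl. The sketch below runs on a shortened copy of the deployment plan. The replacement text is the one from the question.

```perl
use strict;
use warnings;
use XML::Twig;

# Shortened copy of the deployment plan.
my $xml = <<'XML';
<deployment-plan>
  <variable-definition>
    <variable><name>characteristics</name><value>myTestEAR</value></variable>
    <variable><name>Url</name><value>ABC</value></variable>
  </variable-definition>
</deployment-plan>
XML

my $t = XML::Twig->new(
    twig_handlers => {
        # Runs once for each complete <variable> element.
        variable => sub {
            my ($twig, $variable) = @_;
            return unless $variable->first_child_text('name') eq 'characteristics';
            $variable->first_child('value')->set_text('myTestEAR_Modified');
        },
    },
    pretty_print => 'indented',
);
$t->parse($xml);
$t->print;
```

Matching on the parent element means both name and value are fully parsed by the time the handler runs, whatever order they appear in.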
I have an xml file like this
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<ids modified="BF43C70442ECB74FA49833BBA44D4679" original="B4870CC046121A41B7D8F0838C87256D"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
<field name="txt_bestelltKW">
<value></value>
</field>
</fields>
</xfdf>
Now I need to extract the value of the f href attribute. I tried it with single line processing but there is certainly a better way to do it. Any idea?
Thanks
After fixing the typo in your XML, I was able to extract the value with the following code:
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
my $dom = 'XML::LibXML'->load_xml( file => 'example.xml' );
my $xc = 'XML::LibXML::XPathContext'->new;
$xc->registerNs('x', 'http://ns.adobe.com/xfdf/');
for my $href ($xc->findvalue('//x:f/@href', $dom)) {
print $href, "\n";
}
I usually find XML::LibXML too verbose, so I'd use XML::XSH2:
open example.xml ;
register-namespace x http://ns.adobe.com/xfdf/ ;
for //x:f echo @href ;
I like XML::Twig. Not to dispute previous poster's solution, I'd do it like this:
use strict;
use warnings;
use XML::Twig;
sub extract_f {
my ( $twig, $f ) = @_;
print $f->atts->{'href'}, "\n";
}
my $twig = XML::Twig->new( twig_handlers => { 'f' => \&extract_f }, );
$twig->parse( \*DATA );
__DATA__
<?xml version="1.0" encoding="UTF-8"?><xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve" >
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<ids modified="BF43C70442ECB74FA49833BBA44D4679" original="B4870CC046121A41B7D8F0838C87256D"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
<field name="txt_bestelltKW">
<value></value>
</field>
</fields>
</xfdf>
The major reason I like XML::Twig is because it allows purging XML as you go - so if you have a lot of XML to work with, it can be invaluable.
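A minimal sketch of that purging pattern follows, on a made-up document small enough to read at a glance. The element and attribute names mirror the xfdf sample but the content is invented. The handler records what it needs from each field and then calls purge, which frees everything parsed so far.

```perl
use strict;
use warnings;
use XML::Twig;

my @names;

my $twig = XML::Twig->new(
    twig_handlers => {
        field => sub {
            my ($t, $field) = @_;
            push @names, $field->att('name');   # keep only what is needed
            $t->purge;                          # discard what has been parsed so far
        },
    },
);

# Made-up input; a real use case would call parsefile on a large file.
$twig->parse(<<'XML');
<fields>
  <field name="FormInstanceID"><value>1</value></field>
  <field name="txt_bestelltKW"><value></value></field>
</fields>
XML

print "$_\n" for @names;
```

With purge in the handler, memory use stays roughly bounded by the size of one field element rather than the whole document.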
I would recommend either XML::LibXML or XML::Twig.
I would consider your goal rather trivial if not for having to deal with namespaces. However, the following demonstrates how to use XML::LibXML to pull your desired value while ignoring the namespaces:
use strict;
use warnings;
use XML::LibXML;
my $dom = XML::LibXML->load_xml( IO => \*DATA );
my ($f) = $dom->findnodes('//*[local-name()="f"]');
print $f->getAttribute('href'), "\n";
__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<ids modified="BF43C70442ECB74FA49833BBA44D4679" original="B4870CC046121A41B7D8F0838C87256D"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
<field name="txt_bestelltKW">
<value></value>
</field>
</fields>
</xfdf>
Outputs:
C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf
I tried to modify the name field in an XML file using this program
use XML::Twig;
open(OUT, ">resutl.xml") or die "cannot open out file main_file:$!";
my $twig = XML::Twig->new(
pretty_print => 'indented',
twig_handlers => {
association => sub {
$_->findnodes('div');
$_->set_att(name => 'xxx');
},
},
);
$twig->parsefile('in.xml');
$twig->flush(\*OUT);
<div
name="test1"
booktype="book1"
price="e200"
/>
<div
name="test2"
booktype="book2"
price="100" />
When I execute the Perl script it prints the error
junk after document element at line 6, column 0, byte 65 at C:/Perl64/lib/XML/Parser.pm line 187.
at C:\Users\admin\Desktop\parse.pl line 14.
I have tried to tidy your post a little but I don't understand the XML fragment that immediately follows the Perl code.
There are two empty div elements without a root element, so as it stands it isn't well-formed XML.
XML::Twig is assuming that the first div element is the document (root) element and, since it has no content, the subsequent text produces the error message
junk after document element
You also have set twig_handlers to just a single element that handles association elements in the XML, but your data has no such elements.
I think you need to explain more about what it is that you need to do.
Well-formed XML requires a single root element. When XML::Twig attempts to parse your file, it finds the first div and decides that is the root element of the file. When it reaches the end of that and finds another tag at line 6, it gets unhappy and rightfully says there's an error.
If this document is actually intended to be XML, you'll need to enclose that data in a fake root element in order for it to be parsable. The following does that:
use strict;
use warnings;
use XML::Twig;
my $data = do {local $/; <DATA>};
# Enclose $data in a fake <root> element
$data = qq{<root>$data</root>};
my $twig = XML::Twig->new(
pretty_print => 'indented',
twig_handlers => {
association => sub {
$_->findnodes('div');
$_->set_att(name => 'xxx');
},
},
);
$twig->parse($data);
$twig->print;
__DATA__
<div
name="test1"
booktype="book1"
price="e200"
/>
<div
name="test2"
booktype="book2"
price="100" />
Outputs:
<root>
<div booktype="book1" name="test1" price="e200"/>
<div booktype="book2" name="test2" price="100"/>
</root>
Now, it's also unclear what you're trying to do with your "XML". I suspect you're trying to change the name attributes of the div tags to be 'xxx'. If that's the case then you need to redo your twig_handlers to the following:
twig_handlers => {
'//div' => sub { $_->set_att(name => 'xxx'); },
},
The output will then be:
<root>
<div booktype="book1" name="xxx" price="e200"/>
<div booktype="book2" name="xxx" price="100"/>
</root>