Perl using XML Path Context to extract out data - perl

I have the following XML:
<?xml version="1.0" encoding="utf-8"?>
<Response>
<Function Name="GetSomethingById">
<something idSome="1" Code="1" Description="TEST01" LEFT="0" RIGHT="750" />
</Function>
</Response>
and I want the attributes of the <something> node as a hash. I'm trying the following:
my $xpc = XML::LibXML::XPathContext->new(
XML::LibXML->new()->parse_string($xml) # $xml is containing the above xml
);
my @nodes = $xpc->findnodes('/Response/Function/something');
I'm expecting to have something like $nodes[0]->getAttributes. Any help?

my %attributes = map { $_->name => $_->value } $node->attributes();

Your XPath query seems to be wrong: you are searching for '/WSApiResponse/Function/something', while the root node of your XML is Response, not WSApiResponse.
According to the docs of XML::LibXML::Node (the kind of object that findnodes() is expected to return), you should use my $attrs = $nodes[0]->attributes() instead of $nodes[0]->getAttributes.
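Putting the two answers above together, a minimal sketch could look like the following. It assumes the XML from the question and uses nodeName and getValue, the long-form accessors that attribute nodes provide; the variable names are only illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

# The XML from the question, inlined so the sketch is self-contained.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<Response>
<Function Name="GetSomethingById">
<something idSome="1" Code="1" Description="TEST01" LEFT="0" RIGHT="750" />
</Function>
</Response>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($doc);

# findnodes returns a list in list context; take the first match.
my ($node) = $xpc->findnodes('/Response/Function/something');
die "no <something> element found\n" unless $node;

# Build the hash from the attribute nodes of the element.
my %attributes = map { $_->nodeName => $_->getValue } $node->attributes;

print "$_ = $attributes{$_}\n" for sort keys %attributes;
```

Note that attributes() also returns namespace declarations when an element carries any; the sample document has none, so no filtering is done here.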

I use XML::Simple for this type of thing. So if the XML file is data.xml:
use strict;
use XML::Simple();
use Data::Dumper();
my $xml = XML::Simple::XMLin( "data.xml" );
print Data::Dumper::Dumper($xml);
my $href = $xml->{Function}->{something};
print Data::Dumper::Dumper($href);
Note: With XML::Simple the root tag maps to the result hash itself. Thus there is no $xml->{Response}

Related

unable to parse xml file using registered namespace

I am using XML::LibXML to parse an XML file. There seems to be a problem with using a registered namespace while accessing the node elements. I am planning to convert this XML data into a CSV file, so I am trying to access each and every element. To start with, I tried extracting the attribute values of the <country> and <state> tags. Below is the code I have come up with, but I get an error saying "XPath error : Undefined namespace prefix".
use strict;
use warnings;
use Data::Dumper;
use XML::LibXML;
my $XML=<<EOF;
<DataSet xmlns="http://www.w3schools.com" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3schools.com note.xsd">
<exec>
<survey_region ver="1.1" type="x789" date="20160312"/>
<survey_loc ver="1.1" type="x789" date="20160312"/>
<note>Population survey</note>
</exec>
<country name="ABC" type="MALE">
<state name="ABC_state1" result="PASS">
<info>
<type>literacy rate comparison</type>
</info>
<comment><![CDATA[
Some random text
contained here
]]></comment>
</state>
</country>
<country name="XYZ" type="MALE">
<state name="XYZ_state2" result="FAIL">
<info>
<type>literacy rate comparison</type>
</info>
<comment><![CDATA[
any random text data
]]></comment>
</state>
</country>
</DataSet>
EOF
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($XML);
my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs('x','http://www.w3schools.com');
foreach my $camelid ($xc->findnodes('//x:DataSet')) {
my $country_name = $camelid->findvalue('./x:country/@name');
my $country_type = $camelid->findvalue('./x:country/@type');
my $state_name = $camelid->findvalue('./x:state/@name');
my $state_result = $camelid->findvalue('./x:state/@result');
print "state_name ($state_name)\n";
print "state_result ($state_result)\n";
print "country_name ($country_name)\n";
print "country_type ($country_type)\n";
}
Update
If I remove the namespace from the XML and change my XPath slightly, it seems to work. Can someone help me understand the difference?
foreach my $camelid ($xc->findnodes('//DataSet')) {
my $country_name = $camelid->findvalue('./country/@name');
my $country_type = $camelid->findvalue('./country/@type');
my $state_name = $camelid->findvalue('./country/state/@name');
my $state_result = $camelid->findvalue('./country/state/@result');
print "state_name ($state_name)\n";
print "state_result ($state_result)\n";
print "country_name ($country_name)\n";
print "country_type ($country_type)\n";
}
This would be my approach
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $XML=<<EOF;
<DataSet xmlns="http://www.w3schools.com" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3schools.com note.xsd">
<exec>
<survey_region ver="1.1" type="x789" date="20160312"/>
<survey_loc ver="1.1" type="x789" date="20160312"/>
<note>Population survey</note>
</exec>
<country name="ABC" type="MALE">
<state name="ABC_state1" result="PASS">
<info>
<type>literacy rate comparison</type>
</info>
<comment><![CDATA[
Some random text
contained here
]]></comment>
</state>
</country>
<country name="XYZ" type="MALE">
<state name="XYZ_state2" result="FAIL">
<info>
<type>literacy rate comparison</type>
</info>
<comment><![CDATA[
any random text data
]]></comment>
</state>
</country>
</DataSet>
EOF
my $parser = XML::LibXML->new();
my $tree = $parser->parse_string($XML);
my $root = $tree->getDocumentElement;
my @country = $root->getElementsByTagName('country');
foreach my $citem(@country){
my $country_name = $citem->getAttribute('name');
my $country_type = $citem->getAttribute('type');
print "Country Name -- $country_name\nCountry Type -- $country_type\n";
my @state = $citem->getElementsByTagName('state');
foreach my $sitem(@state){
my @info = $sitem->getElementsByTagName('info');
my $state_name = $sitem->getAttribute('name');
my $state_result = $sitem->getAttribute('result');
print "State Name -- $state_name\nState Result -- $state_result\n";
foreach my $i (@info){
my $text = $i->getElementsByTagName('type');
print "Info --- $text\n";
}
}
print "\n";
}
Of course you can manipulate the data any way you'd like. If you are parsing from a file, change parse_string to parse_file.
For the individual elements in the XML, use getElementsByTagName to get the elements within the tags. This should be enough to get you going.
There seem to be two small mistakes here:
1. Call findvalue on the XPathContext object, passing the context node as the second parameter.
2. name is an attribute of country, not a child node.
Therefore try:
my $country_name = $xc->findvalue('./x:country/@name', $camelid );
Update, in reply to the updated question ("if I remove the namespace from the XML and change my XPath slightly it seems to work. Can someone help me understand the difference?"):
To understand what happens here, have a look at "NOTE ON NAMESPACES AND XPATH" in the XML::LibXML::Node documentation.
In your case, $camelid->findvalue('./x:state/@name'); calls findvalue on a node.
But: The recommended way is to use the XML::LibXML::XPathContext module to define an explicit context for XPath evaluation, in which a document-independent prefix-to-namespace mapping can be defined. That is what I did above.
Conclusion:
Calling find on a node will only work if the root element has no namespace
(or if you use the same prefix as in the XML document, if there is one).
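To make the conclusion concrete, here is a minimal sketch in which every XPath expression with the x: prefix is evaluated through the XPathContext, with the current node passed as the second argument. The document is a trimmed copy of the one in the question, and the comma-joined output is only an assumption about what the CSV rows might look like.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the question's document (same default namespace).
my $xml = <<'XML';
<DataSet xmlns="http://www.w3schools.com">
<country name="ABC" type="MALE">
<state name="ABC_state1" result="PASS"/>
</country>
<country name="XYZ" type="MALE">
<state name="XYZ_state2" result="FAIL"/>
</country>
</DataSet>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xc  = XML::LibXML::XPathContext->new($doc);
$xc->registerNs(x => 'http://www.w3schools.com');

my @rows;
for my $country ($xc->findnodes('/x:DataSet/x:country')) {
    # The prefix x is only known to $xc, so relative queries also go
    # through $xc, with $country as the context node.
    for my $state ($xc->findnodes('x:state', $country)) {
        push @rows, join ',',
            $country->getAttribute('name'),
            $country->getAttribute('type'),
            $state->getAttribute('name'),
            $state->getAttribute('result');
    }
}

print "$_\n" for @rows;
```

The attributes themselves are not in a namespace, so plain getAttribute works once the elements have been located.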

XMLin not parsing XML properly

I have the following XML in $response_xml:
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">&lt;?xml version="1.0" encoding="utf-8"?&gt;&lt;wholeSaleApi&gt;&lt;credentials&gt;&lt;referenceNumber&gt;&lt;/referenceNumber&gt;&lt;/credentials&gt;&lt;wholeSaleOrderResponse&gt;&lt;statusCode&gt;666&lt;/statusCode&gt;&lt;description&gt;Object reference not set to an instance of an object.&lt;/description&gt;&lt;/wholeSaleOrderResponse&gt;&lt;/wholeSaleApi&gt;</string>
When I parse it using
my $xs = XML::Simple->new();
my $xmlDS = eval{ $xs->XMLin($response_xml) };
I get the following data structure
$xmlDS = {
'xmlns' => 'http://schemas.microsoft.com/2003/10/Serialization/',
'content' => '<?xml version="1.0" encoding="utf-8"?><wholeSaleApi><credentials><referenceNumber></referenceNumber></credentials><wholeSaleOrderResponse><statusCode>666</statusCode><description>Object reference not set to an instance of an object.</description></wholeSaleOrderResponse></wholeSaleApi>'
};
How do I get the content portion from this?
What you get is a hash reference. You can use the following syntax to get the value of a particular key:
my $content = $xmlDS->{content};
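Since the content value is itself an XML document, it can be handed to XMLin a second time to reach the fields inside it. The sketch below assumes the inner document arrives entity-escaped inside <string> (which is what the dumped structure suggests) and uses a shortened stand-in for $response_xml.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Shortened stand-in for $response_xml; the inner document is entity-escaped.
my $response_xml =
      '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">'
    . '&lt;?xml version="1.0" encoding="utf-8"?&gt;'
    . '&lt;wholeSaleApi&gt;&lt;wholeSaleOrderResponse&gt;'
    . '&lt;statusCode&gt;666&lt;/statusCode&gt;'
    . '&lt;description&gt;Object reference not set to an instance of an object.&lt;/description&gt;'
    . '&lt;/wholeSaleOrderResponse&gt;&lt;/wholeSaleApi&gt;'
    . '</string>';

# First pass: the wrapper element, giving { xmlns => ..., content => ... }.
my $outer = XMLin($response_xml);

# Second pass: parse the embedded document held in the content key.
my $inner = XMLin($outer->{content});

print "status: $inner->{wholeSaleOrderResponse}{statusCode}\n";
print "description: $inner->{wholeSaleOrderResponse}{description}\n";
```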

perl script to replace the xml values

I have this XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<BroadsoftDocument protocol = "OCI" xmlns="C" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<sessionId xmlns="">169.254.52.85,16602326,1324821125562</sessionId>
<command xsi:type="UserAddRequest14sp9" xmlns="">
<serviceProviderId>AtyafBahrain</serviceProviderId>
<groupId>LoadTest</groupId>
<userId>user_0002@atyaf.me</userId>
<lastName>0002</lastName>
<firstName>user</firstName>
<callingLineIdLastName>0002</callingLineIdLastName>
<callingLineIdFirstName>user</callingLineIdFirstName>
<password>123456</password>
<language>English</language>
<timeZone>Asia/Bahrain</timeZone>
<address/>
</command>
</BroadsoftDocument>
and I need to replace the values of some fields (userId, firstName, password) and save the output under the same file name.
The code below changes the syntax of the XML fields (the XML format gets disturbed):
XMLout( $xml, KeepRoot => 1, NoAttr => 1, OutputFile => $xml_file, );
Can you please advise how to edit the XML file without changing its syntax?
You can check out the XML::Simple parser for Perl; see its page on CPAN. I have used it for parsing XML files, but I think it should allow modification as well.
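# Open the XML file for reading and writing ($filename_1 is the file name).
open(my $fh, '+<', $filename_1) or die "Cannot open $filename_1: $!";
my @lines = <$fh>;
seek($fh, 0, 0);
foreach my $line (@lines)
{
# Find string_1 and replace it by string_2
$line =~ s/$str_1/$str_2/g;
# Write the modified line back to the file
print {$fh} $line;
}
# Drop leftover bytes in case the new content is shorter than the old
truncate($fh, tell($fh));
close($fh);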
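As an alternative to line-based substitution, a DOM parser such as XML::LibXML can change only the text of the selected elements and serialize the rest of the document as it was parsed. This is a different technique from the two suggested above, shown as a sketch: the document is a trimmed, namespace-free copy of the one in the question, and the replacement values are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed, namespace-free copy of the document; the real file could be
# loaded with XML::LibXML->load_xml(location => $xml_file) instead.
my $xml = <<'XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<BroadsoftDocument protocol="OCI">
<command>
<userId>user_0002@atyaf.me</userId>
<firstName>user</firstName>
<password>123456</password>
<language>English</language>
</command>
</BroadsoftDocument>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Hypothetical replacement values, keyed by element name.
my %new_value = (
    userId    => 'user_0003@atyaf.me',
    firstName => 'another',
    password  => 'changeme',
);

for my $name (sort keys %new_value) {
    my ($node) = $doc->findnodes("//command/$name");
    next unless $node;
    $node->removeChildNodes;               # drop the old text
    $node->appendText($new_value{$name});  # add the new text
}

# $doc->toFile($xml_file) would save the result instead of printing it.
my $out = $doc->toString;
print $out;
```

In the original file <command> is declared with xmlns="", so its children are in no namespace and the unprefixed XPath above should still match them there.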

Dropdown-Menu with optgroup

I am trying to create a dynamic dropdown menu that receives its entries from an XML file at script startup.
First I tried a static version like this:
Tr(td([popup_menu( -name=>'betreff', -values=>[optgroup(-name=>'Mädels',
-values=>['Susi','Steffi',''], -labels=>{'Susi'=>'Petra','Steffi'=>'Paula'})
,optgroup(-name=>'Jungs', -values=>['moe', 'catch',''])])]));
That worked fine.
The problem starts when I try to put the -values parameter of popup_menu into a scalar variable.
It should look something like this:
$popup_values = "[optgroup(-name=>'Mädels', -values=>['Susi','Steffi',''],
-labels=>{'Susi'=>'Petra','Steffi'=>'Paula'}),optgroup(-name=>'Jungs',
-values=>['moe', 'catch',''])]"
or with single quotation marks.
The goal is to build that string by concatenating the syntax-corrected elements of the XML file. That's because I do not know a priori how many optgroups, or list elements within the optgroups, will exist.
Any ideas?
Thanks in advance
Jochen
So you have an XML file which you use to generate that string? Why not directly generate the data structure needed for the popup_menu call? It's just an array (you can call optgroup while "analysing" the XML file).
If you really want to use the string solution, you could use eval to transform the string into the data structure, though this has certain security issues.
Reading from the XML file
Here's an example of how to transform the XML into optgroups; this of course depends on what your XML file looks like.
use strict;
use warnings;
use XML::Simple;
use CGI qw/:standard/;
my $xmlString = join('', <DATA>);
my $xmlData = XMLin($xmlString);
my @popup_values;
foreach my $group (keys(%{$xmlData->{group}})) {
my (@values, %labels);
my $options = $xmlData->{group}->{$group}->{opt};
foreach my $option (keys(%{$options})) {
push @values, $option;
if(exists($options->{$option}->{label}) &&
'' ne $options->{$option}->{label}) {
$labels{$option} = $options->{$option}->{label};
}
}
push @popup_values, optgroup(-name => $group,
-labels => \%labels,
-values => \@values
);
}
print popup_menu(-name=>'betreff', -values=> \@popup_values);
__DATA__
<?xml version="1.0" encoding="UTF-8" ?>
<dropdown>
<group name="Mädels">
<opt name="Susi" label="Petra"/>
<opt name="Steffi" label="Paula"/>
<opt name="" />
</group>
<group name="Jungs">
<opt name="moe" />
<opt name="catch" />
<opt name="" />
</group>
</dropdown>

XML::Simple encoding problem

I have an XML file I want to parse:
<?xml version="1.0" encoding="UTF-8" ?>
<tag>û</tag>
Firefox parses it perfectly, but XML::Simple corrupts some data. I have a Perl program like this:
my $content = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";
$content .= "<tag>\x{c3}\x{bb}</tag>\n";
print "input:\n$content\n";
my $xml = new XML::Simple;
my $data = $xml->XMLin($content, KeepRoot => 1);
print "data:\n";
print Dumper $data;
and get:
input:
<?xml version="1.0" encoding="UTF-8" ?>
<tag>û</tag>
data:
$VAR1 = {
'tag' => "\x{fb}"
};
This doesn't seem to be what I expected. I think there are some encoding issues. Am I doing something wrong?
Update:
I thought that XMLin returned text in UTF-8 (like the input). I just added
encode_utf8($data->{'tag'});
and it worked.
XML::Simple is fickle.
It's calling Encode::decode('UTF-8',$content), which turns your UTF-8 bytes into Perl's native character string.
Do this:
my $content_utf8 = "whatevér";
my $xml = XMLin($content_utf8);
my $item_utf8 = Encode::encode('UTF-8',$xml->{'item'});
This sort of works too, but it risks double encoding:
my $content_utf8 = "whatevér";
my $double_encoded_utf8 = Encode::encode('UTF-8',$content_utf8);
my $xml = XMLin($double_encoded_utf8);
my $item_utf8 = $xml->{'item'};
Hexadecimal FB (decimal 251) is the Unicode code point (and the ISO-8859-1 code) of the "û" character; it lies outside ASCII. Could you please elaborate on what you expected to get in the data structure, and why you conclude that what you got was "corrupt"?
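The difference between the two-byte input and the one-character result can be checked with the core Encode module alone. This is a small sketch, independent of XML::Simple, that decodes the same bytes used in the question and encodes them back.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The two bytes from the question: the UTF-8 encoding of U+00FB.
my $bytes = "\x{c3}\x{bb}";

# Decoding turns the byte string into a one-character string.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, chars: %d, code point: U+%04X\n",
    length($bytes), length($chars), ord($chars);

# Encoding the character string gives back the original bytes.
my $round_trip = encode('UTF-8', $chars);
print "round trip ok\n" if $round_trip eq $bytes;
```

So "\x{fb}" in the Dumper output is the decoded character, and encoding it as UTF-8 before output restores the bytes from the input.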