Organizing data with XPath in Perl

I am using this command to get two data entries from an XML file:
perl xmlPerl.pl zbxml.xml "//zabbix_export/templates/template/items/item/name/text() | //zabbix_export/templates/template/items/item/description/text()"
It takes the data and displays it vertically, one value per line. For example:
name1
description1
name2
description2
In C# I had some code that displayed it like this:
name1 - description1
name2 - description2
name3 - (blank, since there isn't a description)
Some items even have a blank description. Here is the C# code, in case it helps:
XPathExpression expr;
expr = nav.Compile("/zabbix_export/templates/template/items/item/name | /zabbix_export/templates/template/items/item/description");
XPathNodeIterator iterator = nav.Select(expr);
// Iterate on the node set
List<string> listBox1 = new List<string>();
listBox1.Clear();
try
{
    while (iterator.MoveNext())
    {
        // Current node is a <name>: print its value
        XPathNavigator nav2 = iterator.Current.Clone();
        listBox1.Add(nav2.Value);
        Console.Write(nav2.Value);
        // Step to the following <description> and print it on the same line
        iterator.MoveNext();
        nav2 = iterator.Current.Clone();
        Console.Write("-" + nav2.Value + "\n");
    }
}
// (the rest of the try block and its catch/finally were not included in the post)
Now I have to switch to Perl, and I'm not sure whether I should look for some Perl code that does what I need, or whether this can be done in XPath itself. I looked at some w3 tutorials but didn't find what I was looking for.
Thanks!
Edit:
Would I need to edit this part of my xmlPerl.pl?
# print each node in the list
foreach my $node ( $nodeset->get_nodelist ) {
print XML::XPath::XMLParser::as_string( $node ) . "\n";
}
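For reference, the pairing can also be done in Perl by looping over the item elements and reading each item's own name and description, instead of printing the flat union of text() nodes. With text(), an empty <description/> has no text node at all, so the name/description pairs can drift. A minimal sketch, swapping in XML::LibXML for the XML::XPath module that xmlPerl.pl appears to use, with a made-up sample document in the zabbix_export layout from the command above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample following the zabbix_export layout from the question.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<zabbix_export>
  <templates>
    <template>
      <items>
        <item><name>name1</name><description>description1</description></item>
        <item><name>name2</name><description>description2</description></item>
        <item><name>name3</name><description/></item>
      </items>
    </template>
  </templates>
</zabbix_export>
XML

my @lines;
# Visit each <item> once and read its own <name> and <description>,
# so an empty or missing description cannot shift the pairing.
for my $item ($doc->findnodes('/zabbix_export/templates/template/items/item')) {
    my $name        = $item->findvalue('name');
    my $description = $item->findvalue('description');   # '' when empty or absent
    push @lines, "$name - $description";
}
print "$_\n" for @lines;
```

To read the real file instead of the embedded sample, XML::LibXML->load_xml(location => $ARGV[0]) loads it from disk.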

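It cannot be done with a single XPath 1.0 expression: XPath can select the name and description nodes, but it cannot turn each item into its own "name - description" line. It can be done with an XSL transformation: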
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- suppress the default copying of text nodes -->
  <xsl:template match="text()"/>
  <!-- one "name - description" line per item; &#10; is a newline -->
  <xsl:template match="item">
    <xsl:value-of select="concat(name,' - ',description,'&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
A simple Perl script that applies this XSLT will do the trick (see this for example), as will any other command-line utility that applies an XSLT, such as msxsl.exe.
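One way to apply the stylesheet from Perl is the CPAN module XML::LibXSLT. A sketch, assuming XML::LibXSLT is installed; the stylesheet is the one above and the input is a made-up sample in the zabbix_export layout:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# The stylesheet from the answer above.
my $xsl_source = <<'XSL';
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="text()"/>
  <xsl:template match="item">
    <xsl:value-of select="concat(name,' - ',description,'&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
XSL

# Made-up sample; use location => $file to load a real file instead.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<zabbix_export><templates><template><items>
  <item><name>name1</name><description>description1</description></item>
  <item><name>name2</name><description/></item>
</items></template></templates></zabbix_export>
XML

# Compile the stylesheet, run it, and serialize the text result as a Perl string.
my $stylesheet = XML::LibXSLT->new->parse_stylesheet(
    XML::LibXML->load_xml(string => $xsl_source));
my $result = $stylesheet->transform($doc);
my $text   = $stylesheet->output_as_chars($result);
print $text;
```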

Related

How to parse <rss> tag with XML::LibXML to find xmlns definitions

It seems there is no consistent way that podcasts define their RSS feeds.
I ran into one that uses different namespace definitions for the RSS.
What's the best way to scan for the XML namespaces used in an RSS URL with XML::LibXML?
For example,
one feed might be
<rss
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">
Another might be
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0"
xmlns:atom="http://www.w3.org/2005/Atom">
I want my script to assess all the namespaces being used, so that the appropriate field names can be tracked when parsing the RSS.
I'm not sure what that will look like yet, because I don't know whether this module can break down the <rss> tag's attributes the way I want.

I'm not sure I understand exactly what kind of output you're looking for, but XML::LibXML is indeed able to list the namespaces:
use warnings;
use strict;
use XML::LibXML;
my $dom = XML::LibXML->load_xml(string => <<'EOT');
<rss
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">
</rss>
EOT
for my $ns ($dom->documentElement->getNamespaces) {
print $ns->getLocalName(), " / ", $ns->getData(), "\n";
}
Output:
content / http://purl.org/rss/1.0/modules/content/
wfw / http://wellformedweb.org/CommentAPI/
dc / http://purl.org/dc/elements/1.1/
atom / http://www.w3.org/2005/Atom
sy / http://purl.org/rss/1.0/modules/syndication/
slash / http://purl.org/rss/1.0/modules/slash/
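To make that list usable while parsing, one option is to turn it into two lookup tables, prefix to URI and URI to prefix, and then ask which prefix (if any) a feed uses for a namespace you care about. A sketch using the second feed header from the question; the hash names are illustrative. Note that getNamespaces only reports declarations made on that element itself, not ones declared deeper in the feed.

```perl
use strict;
use warnings;
use XML::LibXML;

# The second feed header from the question, closed so that it parses.
my $dom = XML::LibXML->load_xml(string => <<'EOT');
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0"
     xmlns:atom="http://www.w3.org/2005/Atom">
</rss>
EOT

my (%uri_for, %prefix_for);
for my $ns ($dom->documentElement->getNamespaces) {
    my $prefix = $ns->getLocalName;   # the declared prefix, e.g. "itunes"
    my $uri    = $ns->getData;        # the namespace URI
    # Skip a declaration without a prefix (a default namespace).
    next unless defined $prefix && length $prefix;
    $uri_for{$prefix} = $uri;
    $prefix_for{$uri} = $prefix;
}

my $itunes = 'http://www.itunes.com/dtds/podcast-1.0.dtd';
if (exists $prefix_for{$itunes}) {
    print "iTunes namespace declared with prefix '$prefix_for{$itunes}'\n";
}
```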

I know that the OP has already accepted an answer, but for completeness' sake it should be mentioned that the recommended way to make searches on the DOM resilient to whatever prefixes a feed uses is XML::LibXML::XPathContext:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my @examples = (
<<'EOT'
<rss xmlns:atom="http://www.w3.org/2005/Atom">
<atom:test>One Ring to rule them all,</atom:test>
</rss>
EOT
,
<<'EOT'
<rss xmlns:a="http://www.w3.org/2005/Atom">
<a:test>One Ring to find them,</a:test>
</rss>
EOT
,
<<'EOT'
<rss xmlns="http://www.w3.org/2005/Atom">
<test>The end...</test>
</rss>
EOT
,
);
# Register our own prefix for the namespace we care about.
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs('atom', 'http://www.w3.org/2005/Atom');
for my $example (@examples) {
    # load_xml dies on malformed XML, so no separate error check is needed
    my $dom = XML::LibXML->load_xml(string => $example);
    for my $node ($xpc->findnodes("//atom:test", $dom)) {
        printf("%-10s: %s\n", $node->nodeName, $node->textContent);
    }
}
exit 0;
That is, you register your own prefix for each namespace you are interested in, and the query matches no matter which prefix (if any) the feed uses for it.
Output:
$ perl dummy.pl
atom:test : One Ring to rule them all,
a:test    : One Ring to find them,
test      : The end...

Complex XML parsing with Perl and LIBXML

I have XML:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="MeasDataCollection.xsl"?>
<measCollecFile xmlns="">
<fileHeader fileFormatVersion="32.435 V7.2.0">
</fileHeader>
<measData>
<managedElement localDn="bs=8" swVersion="R21A"/>
<measInfo measInfoId="CORE,SIP_session_statistics">
<measType p="1">CPUUSAGE</measType>
<measType p="2">CPUMEM</measType>
<measType p="3">SYSMEM</measType>
<measValue measObjLdn="SGC.bsNo=17,networkRole=2">
<r p="1">10</r>
<r p="2">20</r>
<r p="3">30</r>
</measValue>
<measValue measObjLdn="SGC.bsNo=18,networkRole=2">
<r p="1">40</r>
<r p="2">50</r>
<r p="3">60</r>
</measValue>
</measInfo>
</measData>
</measCollecFile>
QUESTION:
I want to extract the 40 from the <r p="1">40</r> element. The only things given are <measType p="1">CPUUSAGE</measType> and <measValue measObjLdn="SGC.bsNo=18,networkRole=2">.
That is, I only know that I need the CPUUSAGE value for bsNo=18. The order of the data is always preserved.
Here is what I have tried so far:
my $qry="//measInfo[measType/text() = 'CPUUSAGE']/measValue";
my @nodes= $conn->findnodes($qry);
foreach my $vnode (@nodes) {
    if ($vnode->getAttribute('measObjLdn') =~ /'bsNo=18'/) {
        foreach my $node ($vnode) {
            foreach my $p ($node->getChildnodes) {
                if (ref($p)=~'Element'){
                    $no=$p->textContent;
                    print $no;   # this prints the value of all the <r> elements
                }
            }
        }
    }
}
My challenge is that there can be many measType elements (CPUUSAGE, CPUMEM, ...), so how do I reach the <r> element at the matching position for a given measValue attribute (bsNo=18)?
And how do I then change that 40 to some other value?

Your Perl code can't work because you match the attribute value against 'bsNo=18' including single quotes.
If you want to find the r element with the same p attribute as the CPUUSAGE node, you could either try the single XPath expression by ikegami or something like the following:
for my $type_node ($conn->findnodes('//measInfo/measType[.="CPUUSAGE"]')) {
    my $p = $type_node->getAttribute('p');
    my $qry = <<"EOF";
..
/measValue[contains(concat(\@measObjLdn, ','), 'bsNo=18,')]
/r[\@p='$p']
EOF
    for my $r_node ($type_node->findnodes($qry)) {
        print $r_node->textContent, "\n";
    }
}
This first loops over all measType nodes whose content is CPUUSAGE, gets the p attribute then finds all the corresponding r nodes. This approach should be more efficient than a single XPath query.
To find the r node by position and modify its contents, try:
for my $type_node ($conn->findnodes('//measInfo/measType[.="CPUUSAGE"]')) {
    my $pos = $type_node->findvalue('count(preceding-sibling::measType) + 1');
    my $qry = <<"EOF";
..
/measValue[contains(concat(\@measObjLdn, ','), 'bsNo=18,')]
/r[$pos]
EOF
    for my $r_node ($type_node->findnodes($qry)) {
        $r_node->removeChildNodes;
        $r_node->appendText('50');
    }
}
print $conn->toString;
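Putting the position-based approach together, a self-contained sketch might look like this. The find_r helper and the replacement value 99 are made up for illustration, and the document is a trimmed copy of the XML in the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the sample measCollecFile from the question.
my $conn = XML::LibXML->load_xml(string => <<'XML');
<measCollecFile>
  <measData>
    <measInfo measInfoId="CORE,SIP_session_statistics">
      <measType p="1">CPUUSAGE</measType>
      <measType p="2">CPUMEM</measType>
      <measType p="3">SYSMEM</measType>
      <measValue measObjLdn="SGC.bsNo=17,networkRole=2">
        <r p="1">10</r><r p="2">20</r><r p="3">30</r>
      </measValue>
      <measValue measObjLdn="SGC.bsNo=18,networkRole=2">
        <r p="1">40</r><r p="2">50</r><r p="3">60</r>
      </measValue>
    </measInfo>
  </measData>
</measCollecFile>
XML

# Hypothetical helper: returns the <r> element for counter $type at base
# station $bs_no, pairing the Nth measType with the Nth r by position.
sub find_r {
    my ($doc, $type, $bs_no) = @_;
    my ($type_node) = $doc->findnodes(qq{//measInfo/measType[. = "$type"]});
    return unless $type_node;
    my $pos = $type_node->findvalue('count(preceding-sibling::measType) + 1');
    my ($r) = $type_node->findnodes(
        qq{../measValue[contains(concat(\@measObjLdn, ','), 'bsNo=$bs_no,')]/r[$pos]});
    return $r;
}

my $r = find_r($conn, 'CPUUSAGE', 18) or die "no matching <r> element\n";
my $old = $r->textContent;
$r->removeChildNodes;   # drop the old text node
$r->appendText('99');   # arbitrary example value
print "changed $old to ", $r->textContent, "\n";
```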

LibXML - Inserting a Comment

I'm using XML::LibXML and I'd like to add a comment so that it sits outside the tag. Is it even possible to put it outside the tag? I've tried appendChild and insertBefore/insertAfter, with no difference:
<JJ>junk</JJ> <!--My comment Here!-->
# Code excerpt from within a foreach loop:
my $elem = $dom->createElement("JJ");
my $txt_node = $dom->createTextNode("junk");
my $cmt = $dom->createComment("My comment Here!");
$elem->appendChild($txt_node);
$b->appendChild($elem);
$b->appendChild($frag);
$elem->appendChild($cmt);
# but it puts the comment between the tags ...
<JJ>junk<!--My comment Here!--></JJ>

Don't append the comment node to $elem but to the parent node. For example, the following script
use XML::LibXML;
my $doc = XML::LibXML::Document->new;
my $root = $doc->createElement("doc");
$doc->setDocumentElement($root);
$root->appendChild($doc->createElement("JJ"));
$root->appendChild($doc->createComment("comment"));
print $doc->toString(1);
prints
<?xml version="1.0"?>
<doc>
  <JJ/>
  <!--comment-->
</doc>
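Applied to the code in the question, the same idea means inserting the comment into the parent of <JJ> (there, $b) right after <JJ>, for example with insertAfter. A minimal sketch in which a made-up <doc> root stands in for $b:

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new('1.0');
my $root = $doc->createElement('doc');   # stands in for $b in the question
$doc->setDocumentElement($root);

my $elem = $doc->createElement('JJ');
$elem->appendText('junk');
$root->appendChild($elem);

# Insert the comment as the next sibling of <JJ>, i.e. into <JJ>'s parent.
my $cmt = $doc->createComment('My comment Here!');
$elem->parentNode->insertAfter($cmt, $elem);

my $xml = $doc->toString;
print $xml;
```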

Perl using XML Path Context to extract out data

I have the following xml
<?xml version="1.0" encoding="utf-8"?>
<Response>
<Function Name="GetSomethingById">
<something idSome="1" Code="1" Description="TEST01" LEFT="0" RIGHT="750" />
</Function>
</Response>
I want the attributes of the <something> node as a hash. I'm trying this:
my $xpc = XML::LibXML::XPathContext->new(
XML::LibXML->new()->parse_string($xml) # $xml is containing the above xml
);
my @nodes = $xpc->findnodes('/Response/Function/something');
I was expecting to have something like $nodes[0]->getAttributes. Any help?

my %attributes = map { $_->name => $_->value } $node->attributes();
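A self-contained version of that one-liner, using the XML from the question. The variable names are illustrative, and this sketch uses nodeName for the attribute name:

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<Response>
  <Function Name="GetSomethingById">
    <something idSome="1" Code="1" Description="TEST01" LEFT="0" RIGHT="750" />
  </Function>
</Response>
XML

my $doc   = XML::LibXML->load_xml(string => $xml);
my $xpc   = XML::LibXML::XPathContext->new($doc);
my @nodes = $xpc->findnodes('/Response/Function/something');

# Build a name => value hash from the attribute nodes of the first match.
my %attributes = map { $_->nodeName => $_->value } $nodes[0]->attributes;

print "$_ = $attributes{$_}\n" for sort keys %attributes;
```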

Your XPath query seems to be wrong: you are searching for '/WSApiResponse/Function/something', while the root node of your XML is Response, not WSApiResponse.
According to the XML::LibXML::Node documentation (findnodes() returns objects of that class or its subclasses), you should use my $attrs = $nodes[0]->attributes() instead of $nodes[0]->getAttributes.

I use XML::Simple for this type of thing. So if the XML file is data.xml
use strict;
use XML::Simple();
use Data::Dumper();
my $xml = XML::Simple::XMLin( "data.xml" );
print Data::Dumper::Dumper($xml);
my $href = $xml->{Function}->{something};
print Data::Dumper::Dumper($href);
Note: With XML::Simple the root tag maps to the result hash itself. Thus there is no $xml->{Response}

Perl LibXML raw data from textContent?

Given the following XML:
<?xml version="1.0" encoding="utf-8" ?>
<Request>
<form_submit>
<form_submit id = 1424>
<form_id>1424</form_id>
<field1 id=’5’> <![CDATA[ test ]]> </field1>
<field2 id=’6’> <![CDATA[ test2 ]]> </field2>
</form_submit>
</form_submit>
</Request>
I'm trying to get the raw values for the field1 and field2 elements. I'm using the following code:
foreach my $node ( $xml_request->findnodes('Request/*/*/*[@id]') )
{
my $form_field_value = $node->textContent;
print "Value:\"$form_field_value\"\n";
}
But the output is:
Value:" test "
Value:" test2 "
How do I retrieve the exact data, raw and as is, with all the special characters? So that the output is:
Value:" <![CDATA[ test ]]> "
Value:" <![CDATA[ test2 ]]> "
Thank you.

I'm not a libxml expert, but this is what I could figure out after playing with your XML and libxml a bit.
A CDATA section is a node of its own, separate from the surrounding text nodes.
The code below goes one level deep and calls toString() for CDATA child nodes
and textContent for the other nodes.
foreach my $node ( $xml_request->findnodes('Request/*/*/*[@id]') )
{
my $text;
if($node->childNodes) {
foreach my $child ($node->childNodes()) {
if ($child->nodeType == XML::LibXML::XML_CDATA_SECTION_NODE) {
$text .= $child->toString;
} else {
$text .= $child->textContent;
}
}
} else {
$text = $node->textContent;
}
print qq{"$text"\n};
}
will print
" <![CDATA[ test ]]> "
" <![CDATA[ test2 ]]> "

Your sample data is invalid XML, and won't parse unless you replace 1424, ’5’ and ’6’ with "1424", "5" and "6".
You have asked for the text content and have got exactly that. To get what you need you must search for the children of the <fieldN> elements and use the toString method on them.
This code shows the idea. Note that the spaces before and after the CDATA, which would otherwise appear as separate text nodes, have been eliminated by passing the keep_blanks => 0 option to load_xml.
use strict;
use warnings;
use XML::LibXML;
my $xml_request = XML::LibXML->load_xml(string => <<'END', keep_blanks => 0);
<?xml version="1.0" encoding="utf-8" ?>
<Request>
<form_submit>
<form_submit id = "1424">
<form_id>1424</form_id>
<field1 id="5"> <![CDATA[ test ]]> </field1>
<field2 id="6"> <![CDATA[ test2 ]]> </field2>
</form_submit>
</form_submit>
</Request>
END
foreach my $node ( $xml_request->findnodes('//form_submit/*[@id]/text()') ) {
my $form_field_value = $node->toString;
print qq(Value: "$form_field_value"\n);
}
output
Value: "<![CDATA[ test ]]>"
Value: "<![CDATA[ test2 ]]>"
Edit
ikegami has commented that the output requested in the question includes the whitespace surrounding the CDATA section. I don't know whether that is truly part of the requirement, but this edit provides a way to do that.
This would be clearer using XML::LibXML::Reader, as it has a readInnerXml method (comparable to JavaScript's innerHTML) that does exactly what is necessary. Instead, this program has to serialize all the children of the <fieldN> nodes and concatenate them with join.
This is a new foreach loop. The rest of the program remains unchanged except for the construction of $xml_request, which must have the keep_blanks option set to 1 or removed altogether.
foreach my $node ( $xml_request->findnodes('//*[starts-with(name(),"field")][@id]') ) {
my $form_field_value = join '', map $_->toString, $node->childNodes;
print qq(Value: "$form_field_value"\n);
}
output
Value: " <![CDATA[ test ]]> "
Value: " <![CDATA[ test2 ]]> "
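The XML::LibXML::Reader route mentioned above could look roughly like this. A sketch using the corrected sample XML; readInnerXml returns the serialized children of the current element, so the CDATA markup is kept. Matching element names that start with "field" is an assumption carried over from the XPath above.

```perl
use strict;
use warnings;
use XML::LibXML::Reader qw(XML_READER_TYPE_ELEMENT);

my $xml = <<'END';
<?xml version="1.0" encoding="utf-8" ?>
<Request>
<form_submit>
<form_submit id="1424">
<form_id>1424</form_id>
<field1 id="5"> <![CDATA[ test ]]> </field1>
<field2 id="6"> <![CDATA[ test2 ]]> </field2>
</form_submit>
</form_submit>
</Request>
END

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";
my @values;
while ($reader->read == 1) {
    # Only look at start tags whose name begins with "field".
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT
            and $reader->name =~ /^field/;
    my $inner = $reader->readInnerXml;   # serialized children, CDATA markup included
    push @values, $inner;
    print qq(Value: "$inner"\n);
}
```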
