Doing XPath using Perl - perl

I am coding in Perl on a Windows 7 machine. I am able to extract data from the XML using the XPath code below:
use strict;
use warnings;
use XML::LibXML;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($newfile);
my $query = "/tradenet/message/header/unique_ref_no/date/text( )";
my($node) = $doc->findnodes($query);
$node->setData("$file_seq_number");
However, when I use the same code on a different XML document, it fails. The XPath for the second document looks like this:
/TradenetResponse/OutboundMessage/out:OutwardPermit/out:Declaration/out:Header/cac:UniqueReferenceNumber/cbc:SequenceNumeric
Together with the Perl code, this is what the extraction code looks like:
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($newfile);
my $query = "/TradenetResponse/OutboundMessage/out:OutwardPermit/out:Declaration/out:Header/cac:UniqueReferenceNumber/cbc:SequenceNumeric/text( )";
my($node) = $doc->findnodes($query);
$node->setData("$file_seq_number");
Using the second snippet, I am unable to retrieve the data from the second XML document. I receive this error: "Can't call method "setData" on an undefined value at Perl.pl line 5".
Is the ":" character in the second XPath affecting the code?

You have to define what out, cac, and cbc mean in order for the XPath query to find the appropriate nodes:
my $doc = $parser->parse_file($newfile);
my $xpath_context = XML::LibXML::XPathContext->new($doc->documentElement());
# These URIs need to be the same as the ones in the source document
$xpath_context->registerNs('out', 'http://example.com/out.xsd');
$xpath_context->registerNs('cac', 'http://example.com/cac.xsd');
$xpath_context->registerNs('cbc', 'http://example.com/cbc.xsd');
my $query = "/TradenetResponse/OutboundMessage/out:OutwardPermit/out:Declaration/out:Header/cac:UniqueReferenceNumber/cbc:SequenceNumeric/text( )";
my ($node) = $xpath_context->findnodes($query);
As promised, here is a working example. First, the test input file:
<?xml version="1.0"?>
<!-- input.xml -->
<TradenetResponse xmlns:a="http://example.com/out.xsd"
xmlns:b="http://example.com/cac.xsd"
xmlns:c="http://example.com/cbc.xsd">
<OutboundMessage>
<a:OutwardPermit>
<a:Declaration>
<a:Header>
<b:UniqueReferenceNumber>
<c:SequenceNumeric>1234</c:SequenceNumeric>
</b:UniqueReferenceNumber>
</a:Header>
</a:Declaration>
</a:OutwardPermit>
</OutboundMessage>
</TradenetResponse>
And here is the working Perl script:
#!/usr/bin/perl
# parse.pl
use strict;
use warnings;
use XML::LibXML;
my $parser = XML::LibXML->new();
my $newfile = "input.xml";
my $doc = $parser->parse_file($newfile);
my $xpath_context = XML::LibXML::XPathContext->new($doc->documentElement());
# These URIs need to be the same as the ones in the source document
$xpath_context->registerNs('out', 'http://example.com/out.xsd');
$xpath_context->registerNs('cac', 'http://example.com/cac.xsd');
$xpath_context->registerNs('cbc', 'http://example.com/cbc.xsd');
# Query wrapped for clarity
my $query = "/TradenetResponse/OutboundMessage/out:OutwardPermit" .
"/out:Declaration/out:Header/cac:UniqueReferenceNumber" .
"/cbc:SequenceNumeric/text()";
my ($node) = $xpath_context->findnodes($query);
print "Value: " . $node->getData() . "\n";
The output for me is:
sean@localhost:~xmltest$ ./parse.pl
Value: 1234
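Since the original goal was to change the value rather than just print it, here is a minimal sketch of the update step. It assumes a cut-down inline document with placeholder namespace URIs (not the real Tradenet ones), and it checks that the query matched before calling setData, which avoids the "undefined value" error from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, shortened document; the URIs are placeholders.
my $xml = <<'XML';
<TradenetResponse xmlns:out="http://example.com/out.xsd"
                  xmlns:cbc="http://example.com/cbc.xsd">
  <out:Header>
    <cbc:SequenceNumeric>1234</cbc:SequenceNumeric>
  </out:Header>
</TradenetResponse>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($doc);

# The prefixes are local to this script; only the URIs must match the document.
$xpc->registerNs('out', 'http://example.com/out.xsd');
$xpc->registerNs('cbc', 'http://example.com/cbc.xsd');

my $path = '/TradenetResponse/out:Header/cbc:SequenceNumeric';
my ($node) = $xpc->findnodes("$path/text()");

# findnodes returns an empty list when nothing matches, leaving $node undefined.
die "No node matched; check the registered namespace URIs\n" unless defined $node;

$node->setData('5678');
my $updated = $xpc->findvalue($path);
print "Updated value: $updated\n";
```

If the URIs passed to registerNs differ from the ones declared in the file, the script stops at the die line instead of failing later on setData.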

Related

Print output using XML::LibXML

my $doc = $parser->parse_string( $res->content );
my $root = $doc->getDocumentElement;
my @objects = $root->getElementsByTagName('OBJECT');
foreach my $object ( @objects ){
my $name = $object->firstChild;
print "OBJECT = " . $name . "\n";
}
OUTPUT is:
OBJECT = XML::LibXML::Text=SCALAR(0x262e170)
OBJECT = XML::LibXML::Text=SCALAR(0x2ee4b00)
OBJECT = XML::LibXML::Text=SCALAR(0x262e170)
OBJECT = XML::LibXML::Text=SCALAR(0x2ee4b00)
Can anyone please explain why print prints the $name attribute values like this? Why does it print normally when I use the function getAttribute with virtually the same code?
getAttribute returns the attribute's value as a plain string, while firstChild returns a node object: a text node, element, processing instruction, or comment.
What you see is the normal Perl way of printing an object: it prints its class and address. Your version of XML::LibXML seems to be rather old; recent versions overload stringification, so the same code prints the actual text of the node.
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
my $doc = 'XML::LibXML'->load_xml( string => << '__XML__');
<root>
<OBJECT name="o1">hello</OBJECT>
</root>
__XML__
my @objects = $doc->getElementsByTagName('OBJECT');
for my $object (@objects) {
print 'OBJECT = ', $object->firstChild, "\n";
}
Output:
OBJECT = hello
In the old versions, one needed to call the nodeValue or data method.
print 'OBJECT = ', $object->firstChild->data, "\n";
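To see the difference side by side, here is a small sketch that does not rely on stringification at all. It reuses the one-element document from above; the variable names are just for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<root><OBJECT name="o1">hello</OBJECT></root>'
);
my ($object) = $doc->getElementsByTagName('OBJECT');

my $attr = $object->getAttribute('name');  # attribute value, already a plain string
my $text = $object->firstChild->data;      # text held by the first child (a text node)
my $all  = $object->textContent;           # text of all descendants, concatenated

print "name=$attr first=$text all=$all\n";
```

textContent is usually the safer choice when the element might contain nested markup or comments, because firstChild is then not guaranteed to be a text node.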

XML::LibXML perl free memory

I need to process hundreds of XML files. I'm using XML::LibXML. I'm quite new to Perl and I don't understand how to close the first parsed XML file before opening the next one.
Example
use XML::LibXML;
my ($parser, $doc, $node);
foreach my $xmlfullname (@xmlfullnamearray) {
$parser = XML::LibXML->new();
$doc = $parser->parse_file($xmlfullname);
$node = $doc->findnodes("/root/node");
...
}
Thanks to all, Riccardo
There is nothing to close explicitly. The document is freed once nothing references it any more, which already happens when you overwrite the variables on each iteration.
A little cleaner and clearer:
use XML::LibXML;
my $parser = XML::LibXML->new();
foreach my $xmlfullname (@xmlfullnamearray) {
my $doc = $parser->parse_file($xmlfullname);
my $node = $doc->findnodes("/root/node");
...
}
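If you want to convince yourself that a lexical declared inside the loop is released on every iteration, here is a rough sketch in plain Perl. Tracked is a hypothetical stand-in class that only counts live objects; it is not part of XML::LibXML, but as far as I know the parsed documents are released by the same reference-counting rule:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class that counts how many of its objects are alive.
package Tracked {
    my $live = 0;
    sub new     { $live++; return bless {}, shift }
    sub DESTROY { $live-- }
    sub live    { return $live }
}

my $max_live = 0;
for my $i (1 .. 3) {
    my $doc = Tracked->new;    # stands in for a parsed document
    $max_live = Tracked->live if Tracked->live > $max_live;
}   # $doc goes out of scope here, so the object is destroyed each time round

print "max live: $max_live, live after loop: ", Tracked->live, "\n";
# prints "max live: 1, live after loop: 0"
```

At most one object is alive at any time, which is why processing hundreds of files in a loop like this should not accumulate memory.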

WWW::Facebook::API used in Perl

I am using WWW::Facebook::API from CPAN in Perl and get this warning:
Use of uninitialized value within %field in hash element at /usr/share/perl5/WWW/Facebook/API/Auth.pm line 62.
I have defined all the keys:
#!/usr/bin/perl -w
use strict;
use warnings;
use CGI;
use WWW::Facebook::API;
use WWW::Facebook::API::Auth;
use HTTP::Request;
use LWP;
my $TMP = $ENV{HOME}.'/tmp';
my $facebook_api = '--------';
my $facebook_secret = '-------';
my $facebook_clientid = '--------';
my $gmail_user = '-------';
my $gmail_password = '--------';
my $client = WWW::Facebook::API->new(
desktop => 1,
api_version => '1.0',
api_key => $facebook_api,
secret => $facebook_secret,
throw_errors => 1,
);
$client->app_id($facebook_clientid);
local $SIG{INT} = sub {
print "Logging out of Facebook\n";
my $r = $client->auth->logout;
exit(1);
};
my $token = $client->auth->create_token;
print "$token \n";
$client->auth->get_session($token);
print "$client \n";
WWW::Facebook::API doesn't look like it's been updated for a while. Line 62 of that file is:
$self->base->{ $field{$key} } = $resp->{$key};
The undefined value is the $field{$key} part. The %field hash is a hard-coded mapping between the names of the Facebook API's known fields (i.e. the fields in the data Facebook returns to you) and the names the module wants them to be called. It seems that Facebook has added some additional fields to its data, and the module has not been updated to deal with them.
Ultimately, this is just a warning; you can just ignore it if you like. If you want your script's output to be a bit tidier, you could change that line to:
$self->base->{ $field{$key} } = $resp->{$key} if defined $field{$key};
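To illustrate the mechanism outside the module, here is a small standalone sketch. The %field contents and the response hash are made up for the example; they are not the module's real mapping or Facebook's real data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping and response, standing in for the module's internals.
my %field = (session_key => 'session_key', uid => 'session_uid');
my $resp  = { session_key => 'abc', uid => 42, new_field => 'x' };

my %base;
for my $key (sort keys %$resp) {
    # Without this guard, $field{new_field} is undef and using it as a
    # hash key triggers the "uninitialized value" warning.
    next unless defined $field{$key};
    $base{ $field{$key} } = $resp->{$key};
}

print join(', ', map { "$_=$base{$_}" } sort keys %base), "\n";
# prints "session_key=abc, session_uid=42"
```

The unknown key is simply skipped, which is the same effect as the one-line change above.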

Getting list of hyperlinks from an Excel worksheet with Perl Win32::OLE

I want to change the path for a bunch of hyperlinks in an Excel spreadsheet. After searching Google, I came across solutions to the problem of adding hyperlinks to spreadsheets, but not changing them. Microsoft showed how to do something close with VBA here.
Since I want to edit every single hyperlink in my document, the key steps that I don't know how to solve are:
1. Get a list of hyperlink objects in Perl
2. Extract their addresses one by one
3. Run a regular expression to make the path change
4. Store the updated path in the Hyperlink object and repeat
I am new to using OLE and am getting tripped up on (1). Here is what I have tried so far:
#!perl
use strict;
use warnings;
use 5.014;
use OLE;
use Win32::OLE::Const "Microsoft Excel";
my $file_name = 'C:\path\to\spreadsheet.xlsx';
my $excel = Win32::OLE->new('Excel.Application', sub {$_[0]->Quit;});
$excel->{Visible} = 1;
my $workbook = $excel->Workbooks->Open($file_name);
my $sheet = $workbook->Worksheets('Sheet 1');
foreach my $link (in $sheet->Hyperlinks ) {
say $link->Address;
}
But this code gives the error:
Win32::OLE(0.1709): GetOleEnumObject() Not a Win32::OLE::Enum object at C:/Dwimperl/perl/vendor/lib/Win32/OLE/Lite.pm line 167.
Can't call method "Hyperlinks" without a package or object reference at at script.pl line 14.
It's selecting the right worksheet, so I am not sure why it complains about an object reference. I tried several variations (adding {} around Hyperlinks, removing the 'in', trying to store it as a list, as a hash, and as a reference to a hash). Can anyone give me some pointers? Thanks!
First, you should set $Win32::OLE::Warn=3 so your script will croak the moment something goes wrong. Second, I know you can't select sheets by name in older versions of Excel, although I do not know what things are like in the newest versions. Finally, I think you'll find it easier to use Win32::OLE::Enum.
Here is an example:
#!/usr/bin/env perl
use 5.014;
use warnings; use strict;
use Carp qw( croak );
use Path::Class;
use Try::Tiny;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Excel';
use Win32::OLE::Enum;
$Win32::OLE::Warn = 3;
my $book_file = file($ENV{TEMP}, 'test.xls');
say $book_file;
my $excel = Win32::OLE->new('Excel.Application', sub {$_[0]->Quit;});
$excel->{Visible} = 1;
my $book = $excel->Workbooks->Open("$book_file");
my $sheet = get_sheet($book, 'Sheet with Hyperlinks');
my $links = $sheet->Hyperlinks;
my $it = Win32::OLE::Enum->new($links);
while (defined(my $link = $it->Next)) {
my $address = $link->{Address};
say $address;
if ($address =~ s/example/not.example/) {
$link->{Address} = $address;
$link->{TextToDisplay} = "Changed to $address";
}
}
$book->Save;
$book->Close;
$excel->Quit;
sub get_sheet {
my ($book, $wanted_sheet) = @_;
my $sheets = $book->Worksheets;
my $it = Win32::OLE::Enum->new($sheets);
while (defined(my $sheet = $it->Next)) {
my $name = $sheet->{Name};
say $name;
if ($name eq $wanted_sheet) {
return $sheet;
}
}
croak "Could not find '$wanted_sheet'";
}
The workbook did contain a sheet with the name "Sheet with Hyperlinks". Cell A1 in that sheet contained http://example.com and A2 contained http://stackoverflow.com.
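One thing to watch in the substitution step: s/example/not.example/ matches the first "example" anywhere in the address. If you only want to rewrite a particular prefix, anchoring the pattern is safer. Here is a sketch of just that step in plain Perl, with no Excel involved; the addresses and the replacement prefix are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical addresses, standing in for the values read from $link->{Address}.
my @addresses = ('http://example.com/docs/a.html', 'http://stackoverflow.com');

my @updated;
for my $address (@addresses) {
    # Copy first, then substitute, so the original stays available for logging.
    (my $new = $address) =~ s{\Ahttp://example\.com/}{http://not.example.com/};
    push @updated, $new;
    print "$address -> $new\n";
}
```

Addresses that do not start with the old prefix come through unchanged, so you can compare $new with $address to decide whether to write anything back to the link.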

How can I access attributes and elements from XML::LibXML in Perl?

I am having trouble understanding and using namespaces with the XML::LibXML package in Perl. I can access an attribute successfully but not an element. I have the following code which accesses an XML file (http://pastebin.com/f3fb9d1d0).
my $tree = $parser->parse_file($file); # parses the file contents into the new libXML object.
my $xpc = XML::LibXML::XPathContext->new($tree);
$xpc->registerNs(microplateML => 'http://moleculardevices.com/microplateML');
I then try and access an element called common-name and an attribute called name.
foreach my $camelid ($xpc->findnodes('//microplateML:species')) {
my $latin_name = $camelid->findvalue('@name');
my $common_name = $camelid->findvalue('common-name');
print "$latin_name, $common_name" ;
}
But only the latin-name (@name) is printing out; the common-name is not. What am I doing wrong and how can I get the common-name to print out as well?
What does the @name do in this case? I presume it is an array, and that attributes should be put into an array as there can be more than one, but elements (like common-name) should not be because there should just be one?
I've been following the examples here: http://www.xml.com/pub/a/2001/11/14/xml-libxml.html
and here: http://perl-xml.sourceforge.net/faq/#namespaces_xpath, and trying to get their example camel script working with my namespace, hence the weird namespace.
Make sure your XML file is valid, then use $node->getAttribute("someAttribute") to access attributes.
@name is XPath syntax for the attribute called name, not a Perl array. You can also use it in findnodes() to select elements with a given attribute value, e.g. a path like:
//camelids/species[@name="Camelus bactrianus"]
Here is a simple/contrived example:
#!/usr/bin/perl -w
use XML::LibXML;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file('/Users/castle/Desktop/animal.xml');
my $xc = XML::LibXML::XPathContext->new( $doc->documentElement() );
$xc->registerNs('ns', 'http://moleculardevices.com/microplateML');
my @n = $xc->findnodes('//ns:species');
foreach my $nod (@n) {
print "A: ".$nod->getAttribute("name")."\n";
my @c = $xc->findnodes("./ns:common-name", $nod);
foreach my $cod (@c) {
print "B: ".$cod->nodeName;
print " = ";
print $cod->getFirstChild()->getData()."\n";
}
}
Output is:
perl ./xmltest.pl
A: Camelus bactrianus
B: common-name = Bactrian Camel
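The reason the original loop printed only the attribute is that common-name lives in the document's namespace, while an unprefixed attribute such as name is in no namespace. Calling findvalue on the node itself has no prefix to work with, so the element lookup has to go through the XPathContext. Here is a shorter sketch of the same loop using findvalue; the inline XML is my guess at the shape of the pastebin file, not a copy of it:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Assumed document shape, with a default namespace on the root element.
my $xml = <<'XML';
<camelids xmlns="http://moleculardevices.com/microplateML">
  <species name="Camelus bactrianus">
    <common-name>Bactrian Camel</common-name>
  </species>
</camelids>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(ns => 'http://moleculardevices.com/microplateML');

my @rows;
for my $species ($xpc->findnodes('//ns:species')) {
    # The second argument makes the path relative to this species element.
    my $latin  = $xpc->findvalue('@name', $species);
    my $common = $xpc->findvalue('ns:common-name', $species);
    push @rows, "$latin, $common";
}
print "$_\n" for @rows;
```

For the sample above this should print "Camelus bactrianus, Bactrian Camel".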