I need to process hundreds of xml files. I'm using XML::LibXML. I'm quite new to perl and I don't understand how to close the fist XML parsed file, before opening the new one
Example
use XML::LibXML;
my ($parser, $doc, $node);
foreach my $xmlfullname (#xmlfullnamearray) {
$parser = XML::LibXML->new();
$doc = $parser->parse_file($xmlfullname);
$node = $doc->findnodes("/root/node");
...
}
Thanks to all, Riccardo
By losing all references to it, which you already do by overwriting all the variables.
A little cleaner and clearer:
use XML::LibXML;
my $parser = XML::LibXML->new();
foreach my $xmlfullname (#xmlfullnamearray) {
my $doc = $parser->parse_file($xmlfullname);
my $node = $doc->findnodes("/root/node");
...
}
Related
I am coding with Perl on a Window 7 machine. I am able to extract data from the XML using the XPath code below
use strict;
use warning;
use XML::LibXML;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($newfile);
my $query = "/tradenet/message/header/unique_ref_no/date/text( )";
my($node) = $doc->findnodes($query);
$node->setData("$file_seq_number");
However, when i use the same code on a different XML, the xpath from the second document looks as below:
/TradenetResponse/OutboundMessage/out:OutwardPermit/out:Declaration/out:Header/cac:UniqueReferenceNumber/cbc:SequenceNumeric
Together with the Perl code, this is what the extraction code looks like:
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($newfile);
my $query = "/TradenetResponse/OutboundMessage/out:OutwardPermit/out:Declaration/out:Header/cac:UniqueReferenceNumber/cbc:SequenceNumeric/text( )";
my($node) = $doc->findnodes($query);
$node->setData("$file_seq_number");
Using the second code, I am unable to retrieve the data from the second XML. I receive this error "Can't call method "setData"on an undefined value at Perl.pl line 5".
Does the ":" character in the second XPATH address affecting the code?
You have to define what out, cac, and cbc mean in order for the XPath query to find the appropriate nodes:
my $doc = $parser->parse_file($newfile);
my $xpath_context = XML::LibXML::XPathContext->new($doc->documentElement());
# These URIs need to be the same as the ones in the source document
$xpath_context->registerNs('out', 'http://example.com/out.xsd');
$xpath_context->registerNs('cac', 'http://example.com/cac.xsd');
$xpath_context->registerNs('cbc', 'http://example.com/cbc.xsd');
my $query = "/TradenetResponse/OutboundMessage/out:OutwardPermit/out:Declaration/out:Header/cac:UniqueReferenceNumber/cbc:SequenceNumeric/text( )";
my ($node) = $xpath_context->findnodes($query);
As promised, here is a working example. First, the test input file:
<?xml version="1.0"?>
<!-- input.xml -->
<TradenetResponse xmlns:a="http://example.com/out.xsd"
xmlns:b="http://example.com/cac.xsd"
xmlns:c="http://example.com/cbc.xsd">
<OutboundMessage>
<a:OutwardPermit>
<a:Declaration>
<a:Header>
<b:UniqueReferenceNumber>
<c:SequenceNumeric>1234</c:SequenceNumeric>
</b:UniqueReferenceNumber>
</a:Header>
</a:Declaration>
</a:OutwardPermit>
</OutboundMessage>
</TradenetResponse>
And here is the working Perl script:
#!/usr/bin/perl
# parse.pl
use strict;
use warnings;
use XML::LibXML;
my $parser = XML::LibXML->new();
my $newfile = "input.xml";
my $doc = $parser->parse_file($newfile);
my $xpath_context = XML::LibXML::XPathContext->new($doc->documentElement());
# These URIs need to be the same as the ones in the source document
$xpath_context->registerNs('out', 'http://example.com/out.xsd');
$xpath_context->registerNs('cac', 'http://example.com/cac.xsd');
$xpath_context->registerNs('cbc', 'http://example.com/cbc.xsd');
# Query wrapped for clarity
my $query = "/TradenetResponse/OutboundMessage/out:OutwardPermit" .
"/out:Declaration/out:Header/cac:UniqueReferenceNumber" .
"/cbc:SequenceNumeric/text()";
my ($node) = $xpath_context->findnodes($query);
print "Value: " . $node->getData() . "\n";
The output for me is:
sean#localhost:~xmltest$ ./parse.pl
Value: 1234
I want to change the path for a bunch of hyperlinks in an Excel spreadsheet. After searching Google, I came across a solutions to the problem of adding hyperlinks to spreadsheets, but not changing them. Microsoft showed how to something close with VBA here.
Since I want to edit every single hyperlink in my document, the key steps that I don't know how to solve are:
Get a list of hyperlink objects in Perl
Extract their addresses 1 by 1 and
Run a regular expression to make the path change
Store the updated path in the Hyperlink->object and repeat
I am new to using the OLE and am getting tripped up on (1). Here is what I have tried so far:
#!perl
use strict;
use warnings;
use 5.014;
use OLE;
use Win32::OLE::Const "Microsoft Excel";
my $file_name = 'C:\path\to\spreadsheet.xlsx';
my $excel = Win32::OLE->new('Excel.Application', sub {$_[0]->Quit;});
$excel->{Visible} = 1;
my $workbook = $excel->Workbooks->Open($file_name);
my $sheet = $workbook->Worksheets('Sheet 1');
foreach my $link (in $sheet->Hyperlinks ) {
say $link->Address;
}
But this gives code the error:
Win32::OLE(0.1709): GetOleEnumObject() Not a Win32::OLE::Enum object at C:/Dwimperl/perl/vendor/lib/Win32/OLE/Lite.pm line 167.
Can't call method "Hyperlinks" without a package or object reference at at script.pl line 14.
It's selecting the right worksheet, so I am not sure why it complains about an object reference. I tried several variations (Adding {} around Hyperlinks, removing the 'in', trying to store it as a list, as a hash, and as a reference to a hash) Can anyone give me some pointers? Thanks!
First, you should set $Win32::OLE::Warn=3 so your script will croak the moment something goes wrong. Second, I know you can't select sheets by name in older versions of Excel, although I do not know what things are like in the newest versions. Finally, I think you'll find it easier to use Win32::OLE::Enum.
Here is an example:
#!/usr/bin/env perl
use 5.014;
use warnings; use strict;
use Carp qw( croak );
use Path::Class;
use Try::Tiny;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Excel';
use Win32::OLE::Enum;
$Win32::OLE::Warn = 3;
my $book_file = file($ENV{TEMP}, 'test.xls');
say $book_file;
my $excel = Win32::OLE->new('Excel.Application', sub {$_[0]->Quit;});
$excel->{Visible} = 1;
my $book = $excel->Workbooks->Open("$book_file");
my $sheet = get_sheet($book, 'Sheet with Hyperlinks');
my $links = $sheet->Hyperlinks;
my $it = Win32::OLE::Enum->new($links);
while (defined(my $link = $it->Next)) {
my $address = $link->{Address};
say $address;
if ($address =~ s/example/not.example/) {
$link->{Address} = $address;
$link->{TextToDisplay} = "Changed to $address";
}
}
$book->Save;
$book->Close;
$excel->Quit;
sub get_sheet {
my ($book, $wanted_sheet) = #_;
my $sheets = $book->Worksheets;
my $it = Win32::OLE::Enum->new($sheets);
while (defined(my $sheet = $it->Next)) {
my $name = $sheet->{Name};
say $name;
if ($name eq $wanted_sheet) {
return $sheet;
}
}
croak "Could not find '$wanted_sheet'";
}
The workbook did contain a sheet with the name "Sheet with Hyperlinks". Cell A1 in that sheet contained http://example.com and A2 contained http://stackoverflow.com.
I'm using an XPath like "school/student[4]". Could the setNodeText function save to harddisk? My changes only seem to be made in memory.
If I understand correctly, you are trying to change a document then write it to disk.
use XML::LibXML qw( );
my $parser = XML::LibXML->new();
my $doc = $parser->parse_fh(...);
my $root = $doc->documentElement();
for my $node ($root->findnodes('//school/student[4]')) {
$node->removeChildNodes();
$node->appendText("New text");
}
open(my $fh, '>:raw', ...) or die $!;
print($fh $doc->toString());
You can dump the XML structure using the undocumented method getNodeAsXML. The output isn't garanteed to be valid XML (e.g. no header), but it usually does the trick.
my $str = $xp->getNodeAsXML();
print $str;
Source: http://www.perlmonks.org/?node_id=567212
I'm trying to find the comment # VERSION in a perl source file. I then want to insert the version before the comment (or in place of doesn't matter). Could anyone tell me the right way to do this with PPI?
before
use strict;
use warnings;
package My::Package;
# VERSION
...
after
use strict;
use warnings;
package My::Package;
our $VERSION = 0.1;# VERSION
...
maintaining the # VERSION in the end result is optional
I actually have a couple of ideas on how to find # VERSION but one is a regex of a serialized ppi doc which doesn't seem right, and the other is using find_first on a Comment but if it's not the first I'm not sure what to do.
Updated code This seems closer to a correct solution since it only looks at the comments. but I'm not sure how to use or really how to create a new variable.
#!/usr/bin/env perl
use 5.012;
use strict;
use warnings;
use PPI;
my $ppi = PPI::Document->new('test.pm');
my $comments = $ppi->find('PPI::Token::Comment');
my $version = PPI::Statement::Variable->new;
foreach ( #{$comments} ) {
if ( /^\s*#\s+VERSION\b$/ ) {
$_->replace($version);
}
}
UPDATE
The answer to this question became the foundation for DZP::OurPkgVersion
Here's some code that does something like what you describe - It'll get you started anyway. It's edited from Catalyst::Helper::AuthDBIC (source), which is a full example of working with PPI (although bits of it may not be best practices):
sub make_model {
### snip some stuff
my $module = "lib/$user_schema_path.pm";
my $doc = PPI::Document->new($module);
my $digest_code = # some code
my $comments = $doc->find(
sub { $_[1]->isa('PPI::Token::Comment')}
);
my $last_comment = $comments->[$#{$comments}];
$last_comment->set_content($digest_code);
$doc->save($module);
}
I suppose in your case you grab the $comments arrayref and modify the first item that matches /VERSION/ with the replacement content.
And here's the final code courtesy of the poster:
#!/usr/bin/env perl
use 5.012;
use warnings;
use PPI;
my $ppi = PPI::Document->new('test.pm');
my $comments = $ppi->find('PPI::Token::Comment');
my $version = 0.01;
my $_;
foreach ( #{$comments} ) {
if ( /^(\s*)(#\s+VERSION\b)$/ ) {
my $code = "$1" . 'our $VERSION = ' . "$version;$2\n";
$_->set_content("$code");
}
}
$ppi->save('test1.pm');
EDIT: If you have an example in VBA, I'll take it. I'm just trying to understand how to use the Range object with the Tables collection to copy and paste multiple tables without looping. Put another way, how can I specify a range of 1..lastTable using the Tables collection? If I can see a working VBA example of this, I'll work on the VBA --> Perl conversion.
I'm trying to use Perl's Win32::OLE module (via Dave Roth's excellent book) to automate a couple tasks I need to repeatedly perform on some Word documents. However, the book (and most web examples) tends to use Excel for examples, so I am not sure how to copy and paste effectively with the Tables collection object.
Here is a snippet of my code:
my $originalDoc = $MSWord->Documents->Open('C:\Perl\testDocument.doc');
my $newDoc = $MSWord->Documents->Add;
my $selection = $MSWord->Selection(); # this may be spurious
my $Count = int( $originalDoc->Tables()->{Count} );
my $range = $originalDoc->Tables()->Range( { Start => $originalDoc->Tables(1)->{Range}->{Start},
End => $originalDoc->Tables($Count)->{Range}->{End}
} );
$range->Copy();
$newDoc->Range()->Paste();
The original code used Paragraphs, not Tables, so I assume some of the bugs are artifacts from that code (or more likely my non-understanding of that code).
Copying and pasting tables one at a time might be preferable:
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec::Functions qw( catfile );
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';
$Win32::OLE::Warn = 3;
my $word = get_word();
$word->{Visible} = 1;
my $doc = $word->{Documents}->Open(catfile $ENV{TEMP}, 'test.doc');
my $newdoc = $word->Documents->Add;
my $n_tables = $doc->Tables->Count;
for my $table_i ( 1 .. $n_tables ) {
my $table = $doc->Tables->Item($table_i);
$table->Select;
$word->Selection->Copy;
my $end = $newdoc->GoTo(wdGoToLine, wdGoToLast);
$end->InsertBefore("\n");
$end = $newdoc->GoTo(wdGoToLine, wdGoToLast);
$end->Select;
$word->Selection->Paste;
}
$doc->Close(0);
$newdoc->SaveAs('test-output.doc');
sub get_word {
my $word;
eval {
$word = Win32::OLE->GetActiveObject('Word.Application');
};
die "$#\n" if $#;
unless(defined $word) {
$word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
or die "Oops, cannot start Word: ",
Win32::OLE->LastError, "\n";
}
return $word;
}