Process quoted string within XML - perl

Perl version: perl, v5.10.1 (*) built for x86_64-linux-thread-multi
I am a relative newbie to Perl. I have tried looking at the various XML processing modules for Perl: XML::Simple, XML::Parser, XML::LibXML, XML::DOM, XML::Twig, XML::XPath, etc.
I am trying to process some XML that has quotes in the value portion. I am specifically looking to extract the title from the below XML, however, I've been stumbling over this for a bit now and would appreciate some help if possible.
$VAR1 = {
'issue' => {
'priority' => {
'fid' => '11',
'content' => '3 - Best Effort'
},
'transNum' => {
'fid' => '2',
'content' => '170'
},
'dueDate' => {
'fid' => '17',
'content' => '1327944695'
},
'status' => {
'fid' => '18',
'content' => 'Open - Unassigned'
},
'createdBy' => {
'fid' => '15',
'content' => '32'
},
'title' => {
'fid' => '20',
'content' => 'Testing on spider - issue with "quotation marks"'
},
'description' => {
'fid' => '22',
'content' => 'Noticed issue with title having quotes in title'
},
'issueNum' => {
'fid' => '1',
'content' => '33'
}
}
};
Using XML::LibXML and the following code (note: the above is a printout of the contents of the $issueXML variable):
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($issueXML);
print $doc->toString;
This prints out:
<?xml version="1.0" encoding="utf-8"?>
<issues>
<issue>
<issueNum fid="1">33</issueNum>
<transNum fid="2">170</transNum>
<createdBy fid="15">32</createdBy>
<status fid="18">Open - Unassigned</status>
<title fid="20">Testing on spider - issue with "quotation marks"</title>
<priority fid="11">3 - Best Effort</priority>
<description fid="22">Noticed issue with submission of Documentation issue #40 on accurev with quotes in title. </description>
<dueDate fid="17">1327944695</dueDate>
</issue>
</issues>
I am looking to specifically extract value for the title tag.
When I was processing using XML::Parser, I kept ending up with just the final quote mark. I would like to maintain the same format of the string to display:
Testing on spider - issue with "quotation marks"
I am a bit overwhelmed at the moment with the various XML processing functions. I have tried for awhile now to figure this out, and I am seriously spinning my wheels.
TIA, Appreciate any help,
Regards,
Scott

Another go with XML::LibXML. You should have no problems with quotation marks inside text nodes.
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;
my $xml = XML::LibXML->load_xml(string => q{<?xml version="1.0" encoding="utf-8"?>
<issues>
<issue>
<issueNum fid="1">33</issueNum>
<transNum fid="2">170</transNum>
<createdBy fid="15">32</createdBy>
<status fid="18">Open - Unassigned</status>
<title fid="20">Testing on spider - issue with "quotation marks"</title>
<priority fid="11">3 - Best Effort</priority>
<description fid="22">Noticed issue with submission of Documentation issue #40 on accurev with quotes in title. </description>
<dueDate fid="17">1327944695</dueDate>
</issue>
</issues>
});
my $title = $xml->find('/issues/issue/title');
print $title->get_node(0)->textContent;

I am not sure what problem you run into with the quotation marks. They're just a character like any other, except in attribute values where you may have to use an entity if the quote is already used as the value delimiter. Are you sure the "problem" is not just with the way Data::Dumper displays the data structure generated by XML::Simple?
In any case stay away from XML::Parser, which is too low-level, use XML::LibXML or XML::Twig. XML::Simple seems to generate a lot of questions, especially from people not familiar with Perl, so I am not sure it's the right tool to use.
Here is a solution with XML::Twig, but there are many other ways to do this, depending on exactly what you want to do with the titles.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $issueXML=q{<?xml version="1.0" encoding="utf-8"?>
<issues>
<issue>
<issueNum fid="1">33</issueNum>
<transNum fid="2">170</transNum>
<createdBy fid="15">32</createdBy>
<status fid="18">Open - Unassigned</status>
<title fid="20">Testing on spider - issue with "quotation marks"</title>
<priority fid="11">3 - Best Effort</priority>
<description fid="22">Noticed issue with submission of Documentation issue #40 on accurev with quotes in title. </description>
<dueDate fid="17">1327944695</dueDate>
</issue>
</issues>
};
my $t= XML::Twig->new( twig_handlers => { title => sub { print $_->text, "\n"; } })
->parse( $issueXML);
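To illustrate the point about quotes being ordinary characters in text but needing an entity inside attribute values, here is a minimal sketch with XML::LibXML. The element name issue and the sample string are made up for the illustration. It puts the same string into an attribute and into a text node, serializes the document, then parses it back to confirm both values survive unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Build a tiny document by hand (hypothetical element name).
my $doc  = XML::LibXML::Document->new('1.0', 'utf-8');
my $root = $doc->createElement('issue');
$doc->setDocumentElement($root);

my $value = 'Testing "quotation marks"';
$root->setAttribute(title => $value);   # quotes inside an attribute value
$root->appendTextNode($value);          # the same string as element text

# The serializer takes care of any escaping the attribute needs.
my $serialized = $doc->toString;
print $serialized;

# Parse the serialized form again: both values come back as written.
my $again = XML::LibXML->load_xml(string => $serialized);
my $elt   = $again->documentElement;
print $elt->getAttribute('title'), "\n";
print $elt->textContent, "\n";
```

The round trip is the useful check here: whatever the serialized form looks like, the parser hands back the original string.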

I usually use XML::XSH2 for XML manipulation. Your problem simplifies to:
open FILE.xml ;
for //title echo (.) ;

Your best way of pulling bits out of XML is with an XPath query.
In this case you are looking for the element 'title', inside an element 'issue', inside an element 'issues'.
So your XPath query is simply '//issues/issue/title'.
In two lines of code, you can use XML::LibXML::XPathContext to perform the XPath query for you, which will return the element's content which you are looking for.
This code snippet will demonstrate a simple way of doing an XPath query. The important bit of it is the two lines following the comment "Relevant bit here".
For more information, see the documentation for XML::LibXML::XPathContext
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $xml = XML::LibXML->load_xml(string => q{<?xml version="1.0" encoding="utf-8"?>
<issues>
<issue>
<issueNum fid="1">33</issueNum>
<transNum fid="2">170</transNum>
<createdBy fid="15">32</createdBy>
<status fid="18">Open - Unassigned</status>
<title fid="20">Testing on spider - issue with "quotation marks"</title>
<priority fid="11">3 - Best Effort</priority>
<description fid="22">Noticed issue with submission of Documentation issue #40 on accurev with quotes in title. </description>
<dueDate fid="17">1327944695</dueDate>
</issue>
</issues>
});
# Relevant bit here
my $xc = XML::LibXML::XPathContext->new($xml);
my $title = $xc->find('//issues/issue/title');
print "$title\n";
# prints:
# Testing on spider - issue with "quotation marks"
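If only the string value is needed, XML::LibXML nodes also provide findvalue, which evaluates an XPath expression and returns its string value directly. A minimal sketch, using a shortened copy of the XML from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Shortened copy of the document from the question.
my $doc = XML::LibXML->load_xml(string => q{<issues>
<issue>
<title fid="20">Testing on spider - issue with "quotation marks"</title>
</issue>
</issues>
});

# findvalue returns the string value of the XPath result.
my $title = $doc->findvalue('/issues/issue/title');
my $fid   = $doc->findvalue('/issues/issue/title/@fid');

print "$title\n";
print "fid=$fid\n";
```

This avoids handling the node list when a single text value is all that is wanted.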

Related

Perl XML::Twig - preserving quotes in and around attributes

I'm selectively fixing some elements and attributes. Unfortunately, our input files contain both single- and double-quoted attribute values. Also, some attribute values contain quotes (within a value).
Using XML::Twig, I cannot figure out how to preserve whatever quotes exist around attribute values.
Here's sample code:
use strict;
use XML::Twig;
my $file=qq(<file>
<label1 attr='This "works"!' />
<label2 attr="This 'works'!" />
</file>
);
my $fixes=0; # count fixes
my $twig = XML::Twig->new( twig_handlers => {
'[@attr]' => sub {fix_att(@_,\$fixes);} },
# ...
keep_atts_order => 1,
keep_spaces => 1,
keep_encoding => 1, );
#$twig->set_quote('single');
$twig->parse($file);
print $twig->sprint();
sub fix_att {
my ($t,$elt,$fixes) = @_;
# ...
}
The above code returns invalid XML for label1:
<label1 attr="This "works"!" />
If I add:
$twig->set_quote('single');
Then we would see invalid XML for label2:
<label2 attr='This 'works'!' />
Is there an option to preserve existing quotes? Or is there a better approach for selectively fixing twigs?
Is there any specific reason for you to use keep_encoding? Without it the quote is properly encoded.
keep_encoding is used to preserve the original encoding of the file, but there are other ways to do this. It was used mostly in the pre-5.8 era, when encodings didn't work as smoothly as they do now.
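As a rough sketch of that suggestion, the sample from the question can be parsed without keep_encoding and then re-parsed to check that the output is well-formed and the attribute values are intact. The handler and keep_atts_order are left out here only to keep the example short.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $file = q(<file>
<label1 attr='This "works"!' />
<label2 attr="This 'works'!" />
</file>
);

# No keep_encoding, so XML::Twig escapes attribute values on output.
my $twig = XML::Twig->new( keep_spaces => 1 );
$twig->parse($file);
my $output = $twig->sprint;
print $output;

# Round-trip check: the output must parse, and the values must match the input.
my $check = XML::Twig->new->parse($output);
for my $name (qw(label1 label2)) {
    my $elt = $check->root->first_child($name);
    print "$name: ", $elt->att('attr'), "\n";
}
```

The original quote style around each attribute is not preserved, but the values are, and the result is valid XML.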

How to build a hashref with arrays in perl?

I am having trouble building what I think is a hashref (href) in Perl with XML::Simple.
I am new to this, so I am not sure how to go about it, and I can't find much on building this kind of href with arrays. All the examples I have found are for a normal href.
The code below outputs the right bit of XML, but I am really struggling with how to add more to this href.
Thanks
Dario
use XML::Simple;
$test = {
book => [
{
'name' => ['createdDate'],
'value' => [20141205]
},
{
'name' => ['deletionDate'],
'value' => [20111205]
},
]
};
$test ->{book=> [{'name'=> ['name'],'value'=>['Lord of the rings']}]};
print XMLout($test,RootName=>'library');
To add a new hash to the array-ref 'book', you need to dereference the array-ref as an array and then push onto it. @{ $test->{book} } dereferences the array-ref into an array.
push @{ $test->{book} }, { name => ['name'], value => ['The Hobbit'] };
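A minimal, self-contained sketch of that push, using only core Perl (no XML::Simple), so the shape of the structure can be checked before it is handed to XMLout:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same structure as in the question: a hashref holding an array-ref of hashes.
my $test = {
    book => [
        { name => ['createdDate'],  value => [20141205] },
        { name => ['deletionDate'], value => [20111205] },
    ],
};

# Dereference the array-ref and push a new hash onto it.
push @{ $test->{book} }, { name => ['name'], value => ['Lord of the rings'] };

my $count = scalar @{ $test->{book} };
print "$count entries\n";
print "$_->{name}[0] = $_->{value}[0]\n" for @{ $test->{book} };
```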
XML::Simple is a pain because you're never sure whether you need an array or a hash, and it is hard to distinguish between elements and attributes.
I suggest you make a move to XML::API. This program demonstrates how it could be used to create the same XML data as your own program that uses XML::Simple.
It has an advantage because it builds a data structure in memory that properly represents the XML. Data can be added linearly, like this, or you can store bookmarks within the structure and go back and add information to nodes created previously.
This code adds the two book elements in different ways. The first is the standard way, where the element is opened, the name and value elements are added, and the book element is closed again. The second shows the _ast (abstract syntax tree) method that allows you to pass data in nested arrays similar to those in XML::Simple for conciseness. This structure requires you to prefix attribute names with a hyphen - to distinguish them from element names.
use strict;
use warnings;
use XML::API;
my $xml = XML::API->new;
$xml->library_open;
$xml->book_open;
$xml->name('createdDate');
$xml->value('20141205');
$xml->book_close;
$xml->_ast(book => [
name => 'deletionDate',
value => '20111205',
]);
$xml->library_close;
print $xml;
output
<?xml version="1.0" encoding="UTF-8" ?>
<library>
<book>
<name>createdDate</name>
<value>20141205</value>
</book>
<book>
<name>deletionDate</name>
<value>20111205</value>
</book>
</library>

Escape special character at text

I am reading an XML file and adding some additional text, but I can't get the exact text I want because some special characters are automatically converted.
I try this:
<book>
<book-meta>
<book-id pub-id-type="doi">1545</book-id>
<book-title>Regenerating <?tex?> the Curriculum</book-title>
</book-meta>
</book>
Script:
use strict;
use XML::Twig;
open(my $out, '>', 'Output.xml') or die "Can't create output file: $!\n";
my $story_file = XML::Twig->new(
twig_handlers => {
'book-id' => sub { $_->set_text('<?sample?>') },
},
keep_atts_order => 1,
pretty_print => 'indented',
);
$story_file->parsefile('sample.xml');
$story_file->print($out);
Output:
<book>
<book-meta>
<book-id pub-id-type="doi">&lt;?sample?></book-id>
<book-title>Regenerating <?tex?> the Curriculum</book-title>
</book-meta>
</book>
I would like output as:
<book>
<book-meta>
<book-id pub-id-type="doi"><?sample?></book-id>
<book-title>Regenerating <?tex?> the Curriculum</book-title>
</book-meta>
</book>
How can I escape this type of character in XML twig. I tried the set_asis option, but I can't get it to work.
XML::Twig is correctly inserting the string &lt;?sample?> for you, as you are asking for a PCDATA node to be added and < must be replaced with &lt; in such a node. However, what you want is a processing instruction node.
The easiest way to insert such a node using XML::Twig is using the set_inner_xml method, which will parse an XML tree fragment from a string and insert it as the contents of the current node.
If you replace
$_->set_text('<?sample?>')
with
$_->set_inner_xml('<?sample?>')
then your code should do what you want. The output I get is
<book>
<book-meta>
<book-id pub-id-type="doi"><?sample?></book-id>
<book-title>Regenerating <?tex?> the Curriculum</book-title>
</book-meta>
</book>
<? ..... ?> is not (part of) text but a processing instruction. When you add it to your XML with set_text, however, it is processed as text, hence the &lt;.
I'm not familiar with XML::Twig myself, but I think you should check for the possibility to add a processing instruction instead of text.

What is wrong with my declaration of a hash inside a hash in Perl?

I am struggling with the following declaration of a hash in Perl:
my %xmlStructure = {
hostname => $dbHost,
username => $dbUsername,
password => $dbPassword,
dev_table => $dbTable,
octopus => {
alert_dir => $alert_dir,
broadcast_id => $broadcast_id,
system_id => $system_id,
subkey => $subkey
}
};
I've been googling, but I haven't been able to come up with a solution, and every modification I make ends up in another warning or in results that I do not want.
Perl complains with the following text:
Reference found where even-sized list expected at ./configurator.pl line X.
I am doing it that way, since I want to use the module:
XML::Simple
In order to generate a XML file with the following structure:
<settings>
<username></username>
<password></password>
<database></database>
<hostname></hostname>
<dev_table></dev_table>
<octopus>
<alert_dir></alert_dir>
<broadcast_id></broadcast_id>
<subkey></subkey>
</octopus>
</settings>
so something like:
my $data = $xmlFile->XMLout(%xmlStructure);
warn Dumper($data);
would display the latter xml sample structure.
Update:
I forgot to mention that I also tried using parentheses instead of curly braces for the hash, and even though it seems to work, the XML file is not written properly:
I end up with the following structure:
<settings>
<dev_table>5L3IQWmNOw==</dev_table>
<hostname>gQMgO3/hvMjc</hostname>
<octopus>
<alert_dir>l</alert_dir>
<broadcast_id>l</broadcast_id>
<subkey>l</subkey>
<system_id>l</system_id>
</octopus>
<password>dZJomteHXg==</password>
<username>sjfPIQ==</username>
</settings>
Which is not exactly wrong, but I'm not sure if I'm going to have problems later on as the XML file grows bigger. The credentials are encrypted using the RC4 algorithm, but I am encoding them in Base64 to avoid any misbehavior with special characters.
Thanks
{} are used for hash references. To declare a hash use normal parentheses ():
my %xmlStructure = (
hostname => $dbHost,
username => $dbUsername,
password => $dbPassword,
dev_table => $dbTable,
octopus => {
alert_dir => $alert_dir,
broadcast_id => $broadcast_id,
system_id => $system_id,
subkey => $subkey
}
);
See also perldoc perldsc - Perl Data Structures Cookbook.
For your second issue, you should keep in mind that XML::Simple is indeed too simple for most applications. If you need a specific layout, you're better off with a different way of producing the XML, say, using HTML::Template. For example (I quoted variable names for illustrative purposes):
#!/usr/bin/env perl
use strict; use warnings;
use HTML::Template;
my $tmpl = HTML::Template->new(filehandle => \*DATA);
$tmpl->param(
hostname => '$dbHost',
username => '$dbUsername',
password => '$dbPassword',
dev_table => '$dbTable',
octopus => [
{
alert_dir => '$alert_dir',
broadcast_id => '$broadcast_id',
system_id => '$system_id',
subkey => '$subkey',
}
]
);
print $tmpl->output;
__DATA__
<settings>
<username><TMPL_VAR username></username>
<password><TMPL_VAR password></password>
<database><TMPL_VAR database></database>
<hostname><TMPL_VAR hostname></hostname>
<dev_table><TMPL_VAR dev_table></dev_table>
<octopus><TMPL_LOOP octopus>
<alert_dir><TMPL_VAR alert_dir></alert_dir>
<broadcast_id><TMPL_VAR broadcast_id></broadcast_id>
<subkey><TMPL_VAR subkey></subkey>
<system_id><TMPL_VAR system_id></system_id>
</TMPL_LOOP></octopus>
</settings>
Output:
<settings>
<username>$dbUsername</username>
<password>$dbPassword</password>
<database></database>
<hostname>$dbHost</hostname>
<dev_table>$dbTable</dev_table>
<octopus>
<alert_dir>$alert_dir</alert_dir>
<broadcast_id>$broadcast_id</broadcast_id>
<subkey>$subkey</subkey>
<system_id>$system_id</system_id>
</octopus>
</settings>
You're using the curly braces { ... } to construct a reference to an anonymous hash. You should either assign that to a scalar, or change the { ... } to standard parentheses ( ... ).
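The difference between the two forms can be sketched with core Perl only. The values below are placeholders, not the asker's real settings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hash: declared with parentheses and stored in a % variable.
my %settings = (
    hostname => 'db.example',          # placeholder value
    octopus  => { subkey => 'k1' },    # nested hashes are always references
);

# A hash reference: built with braces and stored in a scalar.
my $settings_ref = {
    hostname => 'db.example',
    octopus  => { subkey => 'k1' },
};

# Access syntax differs only at the top level.
print $settings{octopus}{subkey}, "\n";
print $settings_ref->{octopus}{subkey}, "\n";

# A backslash turns the hash into a reference when a function expects one.
my $ref = \%settings;
print ref($ref), "\n";
```

Functions that expect a single data structure argument, such as XML::Simple's XMLout, are normally given the reference form (\%settings or $settings_ref) rather than the flattened hash.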

Perl: what kind of data should I feed to the Delcampe API?

I am writing a SOAP client based on the Delcampe API. Simple methods work fine, but functions that need complex data give me an error message like "You must send item's data!". Based on the PHP example here, I thought that the data should be either a hash or a hashref, but both give me the error mentioned above.
The sample script I use:
use 5.010;
use SOAP::Lite;
use SOAP::WSDL;
use strict;
use warnings;
use Data::Dumper;
my $API_key = 'xyz';
my $service = SOAP::Lite->service('http://api.delcampe.net/soap.php?wsdl');
my $return = $service->authenticateUser($API_key);
if ($return->{status}) {
my $key = $return->{data};
my %data = (description => 'updated description');
my $response = $service->updateItem($key, 123456, \%data);
if ($response->{status}) {
say Dumper $response->{data};
} else {
say $response->{errorMsg};
}
} else {
say "no: " . $return->{status};
}
So, what kind of data structure should I use instead of %data, or how could I debug the SOAP envelope that is produced as the request? (PHP code based on the example works fine.)
ADDITION
With use SOAP::Lite qw(trace); I got the SOAP envelope too:
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:soap-enc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:tns="http://api.delcampe.net/soap.php">
<soap:Body>
<tns:updateItem>
<token xsi:type="xsd:string">secret_one</token>
<id_item xsi:type="xsd:int">123456</id_item>
<arrData xsi:nil="true" xsi:type="soap-enc:Array" />
</tns:updateItem>
</soap:Body>
</soap:Envelope>
As seen above, no data is sent at all. I also tried passing the data as a string, an array and an arrayref. Maybe it is a bug in SOAP::Lite?
Maybe you could try to replace
my %data = (description => 'updated description');
with
my $data = SOAP::Data->name(description => 'updated description');
We had similar issues when working on our SOAP API, and they were solved by something like that: wrapping complex data in SOAP::Data. So I hope this will help. )
UPDATE:
The previous advice didn't help: looks like it's indeed the SOAP::Lite bug, which ignores the 'soap-enc:Array' definition in WSDL file whatsoever.
Have finally found a workaround, though. It's not pretty, but as a final resort it may work.
First, I've manually downloaded the WSDL file from Delcampe site, saved it into local directory, and referred to it as ...
my $service = SOAP::Lite->service('file://...delcampe.wsdl')
... as absolute path is required.
Then I've commented out the 'arrData line' within WSDL updateItem definition.
And, finally, I've made this:
my $little_monster = SOAP::Data->name(arrData =>
\SOAP::Data->value((
SOAP::Data->name(item =>
\SOAP::Data->value(
SOAP::Data->name(key => 'personal_reference'),
SOAP::Data->name(value => 'Some Personal Reference')->type('string'),
)
),
SOAP::Data->name(item =>
\SOAP::Data->value(
SOAP::Data->name(key => 'title'),
SOAP::Data->name(value => 'Some Amazing Title')->type('string'),
)
),
# ...
))
)->type('ns1:Map');
... and, I confess, successfully released it into the wilderness by ...
$service->updateItem($key, 123456, $little_monster);
... which, at least, generated a more-or-less likable Envelope.
I sincerely hope that'll save at least some poor soul from banging head against the wall as much as I did working on all that. )