Why doesn't Perl's XML::LibXML module (specifically XPathContext) evaluate positions? - perl

I have an XML representation of a document that has the form:
<response>
<paragraph>
<sentence id="1">Hey</sentence>
<sentence id="2">Hello</sentence>
</paragraph>
</response>
I'm trying to use XML::LibXML to parse a document and get the position of the sentences.
my $root_node = XML::LibXML->load_xml( ... )->documentElement;
foreach my $sentence_node ( $root_node->findnodes('//sentence')->get_nodelist ){
print $sentence_node->find( 'position()' );
}
The error I get is "XPath error : Invalid context position error". I've read up on the docs and found this interesting tidbit
evaluating XPath function position() in the initial context raises an XPath error
My problem is that I have no idea what to do with this information. What is the 'initial context'? How do I make the engine automatically track the context position?
Re: @Dan
Appreciate the answer. I tried your example and it worked. In my code, I was assuming the context to be the node represented by my Perl variable. So I wanted $sentence->find( 'position()' ) to behave like './position()'. Despite seeing a working example, I still can't do
foreach my $sentence ...
my $id = $sentence->getAttribute('id');
print $root_node->findvalue( '//sentence[@id=' . "$id]/position()");
I can, however, do
$root_node->findvalue( '//sentence[@id=' . "$id]/text()");
Can position() only be used to limit a query like you have?

position() does work in LibXML. For example see
my $root_node = $doc->documentElement;
foreach my $sentence_node ( $root_node->findnodes('//sentence[position()=2]')->get_nodelist ){
print $sentence_node->textContent;
}
This will print Hello with your sample data.
But the way you're using it here, there's no context. For each sentence_node, you want its position relative to what?
If you're looking for specific nodes by position, use a selector like I have above, that's simplest.
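If what you want instead is each sentence's position among its sibling sentences, one option is to count the preceding siblings and add one. This is only a sketch that goes beyond the answer above: the XML is the sample from the question, and %position_of is a hypothetical variable introduced for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample document copied from the question.
my $xml = <<'XML';
<response>
<paragraph>
<sentence id="1">Hey</sentence>
<sentence id="2">Hello</sentence>
</paragraph>
</response>
XML

my $root_node = XML::LibXML->load_xml( string => $xml )->documentElement;

# Hypothetical lookup table: sentence id => position within its parent.
my %position_of;
foreach my $sentence_node ( $root_node->findnodes('//sentence') ) {
    # A node's 1-based position among its siblings of the same name
    # is the number of preceding <sentence> siblings plus one.
    my $position = $sentence_node->findvalue('count(preceding-sibling::sentence) + 1');
    $position_of{ $sentence_node->getAttribute('id') } = $position;
    print $sentence_node->textContent, " is at position $position\n";
}
```

Here the expression is evaluated with each sentence as the context node, so it does not depend on a context position being set.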

Related

Perl Hash using LibXML

I have an XML data as follows.
<type>
<data1>something1</data1>
<data2>something2</data2>
</type>
<type>
<data1>something1</data1>
<data2>something2</data2>
</type>
<type>
<data1>something1</data1>
</type>
As one can see, child node data2 is sometimes not present.
I have used this as a guideline to create the following code.
my %hash;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file_name);
my @nodes = $doc->findnodes("/type");
foreach my $node(@nodes)
{
my $key = $node->getChildrenByTagName('data1');
my $value = $node->getChildrenByTagName('data2');
$hash{$key} = $value;
}
Later, I use this hash to generate more data depending on whether the child node data2 is present.
I use the ne operator, assuming that the data in %hash are key/value pairs of strings and that, when data2 is not present, Perl inserts a space as the value in the hash (I have printed this hash and found that only a space is printed as the value).
However, I end up with following compilation errors.
Operation "ne": no method found,
left argument in overloaded package XML::LibXML::NodeList,
right argument has no overloaded magic at filename.pl line 74.
How do I solve this? What is the best data structure to store this data when we see that sometimes a node will not be there ?
First thing to realize is $value is an XML::LibXML::NodeList object. It only looks like a string when you print it because it has stringification overloaded. You can check with ref $value.
With my $value = $node->getChildrenByTagName('data2');, $value will always be a NodeList object. It might be an empty NodeList, but you'll always get a NodeList object.
Your version of XML::LibXML is out of date. Your version of XML::LibXML::NodeList has no string comparison overloading and, by default, Perl will not "fallback" to use stringification for other string operators like ne. I reported this bug back in 2010 and it was fixed in 2011 in version 1.77.
Upgrade XML::LibXML and the problem will go away.
As a workaround, you can force stringification by quoting the NodeList object.
if( "$nodelist" ne "foo" ) { ... }
But really, update that module. There's been a lot of work done on it.
Perl inserts space as a value in the hash
This is a NodeList object stringifying. I get an empty string from an empty NodeList. You might be getting a space because of an old bug.
You can also check $value->size to see if the NodeList is empty.
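A minimal sketch of that idea, storing plain strings (or undef when data2 is missing) rather than NodeList objects. The wrapping <types> root element and the sample values are assumptions made so the snippet parses on its own; they are not from the question.

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumed sample data: a hypothetical <types> root wraps the <type> elements.
my $xml = <<'XML';
<types>
<type><data1>something1</data1><data2>something2</data2></type>
<type><data1>something3</data1></type>
</types>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

my %hash;
foreach my $node ( $doc->findnodes('/types/type') ) {
    # findvalue returns a plain string, not a NodeList.
    my $key = $node->findvalue('data1');

    # In scalar context this is always a NodeList, possibly empty.
    my $data2_list = $node->getChildrenByTagName('data2');

    # Store the text of the first data2 child, or undef if there is none.
    $hash{$key} = $data2_list->size
        ? $data2_list->get_node(1)->textContent
        : undef;
}

foreach my $key ( sort keys %hash ) {
    print "$key => ", ( defined $hash{$key} ? $hash{$key} : '(no data2)' ), "\n";
}
```

With strings in the hash, a later check such as defined $hash{$key} says whether data2 was present, and ne compares ordinary strings.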

What is Perl DBI difference between with bind_columns and without it?

What is the difference between the following code in perl dbi?
1.
while (my ($p1, $p2, $p3) = $sth->fetchrow_array()) {
# ... some code ...
}
2.
$sth->bind_columns(\my ($p1, $p2, $p3));
while ($sth->fetch) {
# ... some code ...
}
Both lead to the same result.
PerlMonks advises using the bind variant.
I would appreciate it if someone could explain why.
The docs say that binding is a more efficient way to fetch data:
The binding is performed at a low level using Perl aliasing. Whenever
a row is fetched from the database $var_to_bind appears to be
automatically updated simply because it now refers to the same memory
location as the corresponding column value. This makes using bound
variables very efficient.
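A small self-contained sketch of the bound-column variant follows. It assumes DBD::SQLite is installed and uses a hypothetical in-memory table t; neither comes from the question, and any other DBI driver should behave the same way.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available; the table and rows are made up.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 } );
$dbh->do('CREATE TABLE t (a INTEGER, b TEXT, c TEXT)');
$dbh->do( 'INSERT INTO t VALUES (?, ?, ?)', undef, @$_ )
    for [ 1, 'x', 'y' ], [ 2, 'p', 'q' ];

my $sth = $dbh->prepare('SELECT a, b, c FROM t ORDER BY a');
$sth->execute;

# Bind once, after execute; each fetch then updates the same three
# variables in place instead of copying a fresh list per row.
$sth->bind_columns( \my ( $p1, $p2, $p3 ) );

my @seen;
while ( $sth->fetch ) {
    push @seen, "$p1:$p2:$p3";
}
print "$_\n" for @seen;
```

The loop body is the same as in the fetchrow_array version; the only difference is where the column values end up on each iteration.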

Extracting one node with LibXML

This may be very novice of me, but I am a novice at Perl LibXML (and XPath for that matter). I have this XML doc:
<Tims
xsi:schemaLocation="http://my.location.com/namespace http://my.location.com/xsd/Tims.xsd"
xmlns="http://my.location.com/namespace"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<Error>Too many entities for operation. Acceptable limit is 5,000 and 8,609 were passed in.</Error>
<Timestamp>2012-07-27T12:06:24-04:00</Timestamp>
<ExecutionTime>41.718</ExecutionTime>
</Tims>
All I want to do is get the value of <Error>. That's all. I've tried plenty of approaches, most recently this one. I've read the docs through and through. This is what I currently have in my code:
#!/usr/bin/perl -w
my $xmlString = <<XML;
<?xml version="1.0" encoding="ISO-8859-1"?>
<Tims
xsi:schemaLocation="http://my.location.com/namespace http://my.location.com/xsd/Tims.xsd"
xmlns="http://my.location.com/namespace"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<Error>Too many entities for operation. Acceptable limit is 5,000 and 8,609 were passed in.</Error>
<Timestamp>2012-07-27T12:06:24-04:00</Timestamp>
<ExecutionTime>41.718</ExecutionTime>
</Tims>
XML
use XML::LibXML;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xmlString);
my $root = $doc->documentElement();
my $xpc = XML::LibXML::XPathContext->new($root);
$xpc->registerNs("x", "http://my.location.com/namespace");
foreach my $node ($xpc->findnodes('x:Tims/x:Error')) {
print $node->toString();
}
Any advice, links, anything is appreciated. Thanks.
Just add a / at the beginning of the XPath (i.e. into findnodes).
Your code isn't working because you use the document element <Tims> as the context node when you create the XPath context $xpc. The <Error> element is an immediate child of this, so all you need to write is
$xpc->findnodes('x:Error')
or an alternative is to use an absolute XPath which specifies the path from the document root
$xpc->findnodes('/x:Tims/x:Error')
That way it doesn't matter what the context node of $xpc is.
But the proper way is to forget about fetching the element node altogether and use the document root as the context node. You can also use findvalue instead of findnodes to get the text of the error message without the enclosing tags:
my $parser = XML::LibXML->new;
my $doc = $parser->parse_string($xmlString);
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('x', 'http://my.location.com/namespace');
my $error= $xpc->findvalue('x:Tims/x:Error');
print $error, "\n";
output
Too many entities for operation. Acceptable limit is 5,000 and 8,609 were passed in.

Using hash as a reference is deprecated

I searched SO before asking this question, I am completely new to this and have no idea how to handle these errors. By this I mean Perl language.
When I put this
%name->{@id[$#id]} = $temp;
I get the error Using a hash as a reference is deprecated
I tried
$name{@id[$#id]} = $temp
but couldn't get any results back.
Any suggestions?
The correct way to access an element of hash %name is $name{'key'}. The syntax %name->{'key'} was valid in Perl v5.6 but has since been deprecated.
Similarly, to access the last element of array @id you should write $id[$#id] or, more simply, $id[-1].
Your second variation should work fine, and your inability to retrieve the value has an unrelated reason.
Write
$name{$id[-1]} = 'test';
and
print $name{$id[-1]};
will display test correctly
%name->{...}
has always been buggy. It doesn't do what it should do. As such, it now warns when you try to use it. The proper way to index a hash is
$name{...}
as you already believe.
Now, you say
$name{@id[$#id]}
doesn't work, but if so, it's because of an error somewhere else in the code. That code most definitely works
>perl -wE"@id = qw( a b c ); %name = ( a=>3, b=>4, c=>5 ); say $name{@id[$#id]};"
Scalar value @id[$#id] better written as $id[$#id] at -e line 1.
5
As the warning says, though, the proper way to index an array isn't
@id[...]
It's actually
$id[...]
Finally, the easiest way to get the last element of an array is to use index -1. That means your code should be
$name{ $id[-1] }
The popular answer is to just not dereference, but that's not correct. In other words %$hash_ref->{$key} and %$hash_ref{$key} are not interchangeable. The former is required to access a hash reference nested as an element in another hash reference.
For many moons it has been commonplace to nest hash references. In fact, there are several modules that parse data and store it in this kind of data structure. Instantly deprecating the behavior without module updates was not a good thing. At times my data is trapped in a nested hash and the only way to get it is to do something like
$new_hash_ref = $target_hash_ref->{$key1}
$new_hash_ref2 = $target_hash_ref->{$key2}
$new_hash_ref3 = $target_hash_ref->{$key3}
because I can't
foreach my $i(keys(%$target_hash_ref)) {
foreach(%$target_hash_ref->{$i}) {
#do stuff with $_
}
}
anymore.
Yes the above is a little strange, but creating new variables just to avoid accessing a data structure in a certain way is worse. Am I missing something?
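For what it's worth, a nested hash reference can still be walked without temporary variables by wrapping the inner lookup in %{ ... }. This is a sketch with made-up data standing in for $target_hash_ref; the keys and values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical nested data, standing in for $target_hash_ref.
my $target_hash_ref = {
    key1 => { a => 1, b => 2 },
    key2 => { c => 3 },
};

my @seen;
foreach my $i ( sort keys %$target_hash_ref ) {
    # %{ ... } dereferences the inner hash reference explicitly,
    # which replaces the deprecated %$target_hash_ref->{$i} form.
    foreach my $inner_key ( sort keys %{ $target_hash_ref->{$i} } ) {
        push @seen, "$i.$inner_key=$target_hash_ref->{$i}{$inner_key}";
    }
}
print "$_\n" for @seen;
```

The keys are sorted only so that the output order is stable; drop the sort if order does not matter.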
If you want one item from an array or hash, use $. For a list of items, use @ and % respectively. Your use of @ as a reference returned a list instead of an item, which Perl may have interpreted as a hash.
This code demonstrates looking up a hash value using the last element of an array as the key.
#!/usr/bin/perl -w
my %these = ( 'first'=>101,
'second'=>102,
);
my @those = qw( first second );
print $these{$those[$#those]};
prints '102'

What is the correct Xpath query to access an element in Perl's XML::LibXML?

I'm trying to access an element called raw data, inside some <rawData>data is here</rawData> tags. However this XPath query with Perl's XML::LibXML is not working:
foreach my $m ($xc->findnodes(q<//ns:wave[@waveID='1']/ns:well/oneDataSet/rawData>)) {
print $m->textContent, "\n";
}
but a similar query to get an attribute @wellName is working fine:
foreach my $n ($xc->findnodes(q<//ns:wave[@waveID='1']/ns:well/@wellName>)) {
print $n->textContent, "\n";
}
What is wrong with my syntax above for accessing the element?
Without seeing your XML, I can't be sure, but //ns:wave[@waveID='1']/ns:well/oneDataSet/rawData makes me wonder what namespace oneDataSet and rawData are supposed to be in. Do you need to prefix them?
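To illustrate the point, here is a sketch under the assumption that the whole document sits in one default namespace. The <plate> root, the namespace URI http://example.com/ns and the sample values are all invented for the example, since the real XML was not shown.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document: every element inherits the default namespace.
my $xml = <<'XML';
<plate xmlns="http://example.com/ns">
  <wave waveID="1">
    <well wellName="A1">
      <oneDataSet><rawData>data is here</rawData></oneDataSet>
    </well>
  </wave>
</plate>
XML

my $doc = XML::LibXML->load_xml( string => $xml );
my $xc  = XML::LibXML::XPathContext->new($doc);
$xc->registerNs( ns => 'http://example.com/ns' );

# Every element step carries the prefix; the attribute tests do not,
# because unprefixed attributes are in no namespace.
my @raw = map { $_->textContent }
    $xc->findnodes(q<//ns:wave[@waveID='1']/ns:well/ns:oneDataSet/ns:rawData>);
print "$_\n" for @raw;
```

That would also explain why the @wellName query works while the rawData one returns nothing: the attribute needs no prefix, but the unprefixed element steps look for elements in no namespace.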