Why is this xmlns attribute messing up my xpath query? - perl

I'm parsing a simple jhove output using LibXML. However, I don't get the values I expect. Here's the code:
use feature "say";
use XML::LibXML;
my $PRSR = XML::LibXML->new();
my $xs=<DATA>;
say $xs;
my $t1 = $PRSR->load_xml(string => $xs);
say "1:" . $t1->findvalue('//date');
$xs=<DATA>;
say $xs;
$t1 = $PRSR->load_xml(string => $xs);
say "2:" . $t1->findvalue('//date');
__DATA__
<jhove xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://hul.harvard.edu/ois/xml/ns/jhove" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/jhove http://hul.harvard.edu/ois/xml/xsd/jhove/1.3/jhove.xsd" name="Jhove" release="1.0 (beta 3)" date="2005-02-04"><date>2006-10-06T09:11:34+02:00</date></jhove>
<jhove><date>2006-10-06T09:11:34+02:00</date></jhove>
As you can see, the line "1:" is returning an empty string, while "2:" is returning the expected date. What is in the jhove-root-element that keeps the xpath query from working properly? I even tried in XML-Spy and there it works, even with the full header.
Edit: When I remove the xmlns-attribute from the root element, the xpath query works. But how is that possible?

The XML::LibXML::Node documentation specifically mentions this issue and how to deal with it...
NOTE ON NAMESPACES AND XPATH:
A common mistake about XPath is to assume that node tests consisting of an element name with no prefix match elements in the default namespace. This assumption is wrong - by XPath specification, such node tests can only match elements that are in no (i.e. null) namespace.
So, for example, one cannot match the root element of an XHTML document with $node->find('/html') since '/html' would only match if the root element <html> had no namespace, but all XHTML elements belong to the namespace http://www.w3.org/1999/xhtml. (Note that xmlns="..." namespace declarations can also be specified in a DTD, which makes the situation even worse, since the XML document looks as if there was no default namespace).
There are several possible ways to deal with namespaces in XPath:
The recommended way is to use the XML::LibXML::XPathContext module to define an explicit context for XPath evaluation, in which a document independent prefix-to-namespace mapping can be defined. For example:
my $xpc = XML::LibXML::XPathContext->new;
$xpc->registerNs('x', 'http://www.w3.org/1999/xhtml');
$xpc->find('/x:html',$node);
Another possibility is to use prefixes declared in the queried document (if known). If the document declares a prefix for the namespace in question (and the context node is in the scope of the declaration), XML::LibXML allows you to use the prefix in the XPath expression, e.g.:
$node->find('/x:html');

I found another solution. Simply using this
say "1:" . $t1->findvalue('//*[local-name()="date"]');
will also find the value and save the hassle of declaring namespaces in an XPathContext. But apart from that, tobyinks answer is the correct one.

Related

How to isolate a list of URIs in a RDF file using CONSTRUCT or DESCRIBE in SPARQL?

I'm trying to get only a list of URIs in RDF instead of a list of triples:
PREFIX gr: <http://purl.org/goodrelations/v1#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
DESCRIBE ?product
WHERE
{
?product rdfs:subClassOf gr:ProductOrService .
}
Using SELECT, instead of DESCRIBE I receive only the subject (which I want), but not as an RDF but like a SPARQL Result with binds, vars, etc.
Using CONSTRUCT, I can't specify only the ?product, as above, so the closest I can get is:
PREFIX gr: <http://purl.org/goodrelations/v1#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT
WHERE
{
?product rdfs:subClassOf gr:ProductOrService .
}
Which returns an RDF with triples of different products, but the same properties and objects.
It looks like you should read over the SPARQL specification, which tells you that this is exactly as expected. To get the single column you wish, you must use SELECT.
SPARQL has four query forms. These query forms use the solutions from pattern matching to form result sets or RDF graphs. The query forms are:
SELECT
Returns all, or a subset of, the variables bound in a query pattern match.
CONSTRUCT
Returns an RDF graph constructed by substituting variables in a set of triple templates.
ASK
Returns a boolean indicating whether a query pattern matches or not.
DESCRIBE
Returns an RDF graph that describes the resources found.
I found a solution. I've used SELECT, but instead of binds and vars, as output received a "Comma-Separated Values (with fields in N-Triples syntax)" CSV file:
<http://www.productontology.org/id/Real_estate>
<http://www.productontology.org/id/Automobile>
<http://www.productontology.org/id/Auction>
<http://www.productontology.org/id/Video_game>
<http://www.productontology.org/id/Campsite>
<http://www.productontology.org/id/Car>
<http://www.productontology.org/id/Audiobook>
<http://www.productontology.org/id/Browser_game>
...
Representing the result of a SPARQL SELECT query as an RDF List is a tooling issue - it is not something that can be solved in SPARQL in general.
Some SPARQL tools may have ways to support rendering the query result as an RDF list. But it's not something you can fix by just formulating your query differently, you'll need to use a tool to (programmatically) format the result.
In Java, using Eclipse RDF4J, you can do it as follows (untested so you may need to tweak it to work properly, but it should give you a general idea):
String query = "SELECT ?product WHERE { ?product rdfs:subClassOf gr:ProductOrService . }";
// do the query on repo and convert to a Java list of URI objects
List<URI> results = Repositories.tupleQuery(
repo,
query,
r -> QueryResults.stream(r).map(bs -> (URI)bs.getValue("product")).collect(Collectors.toList()
);
// create a resource (bnode or URI) for the start of the rdf:List
Resource head = SimpleValueFactory.getInstance().createBNode();
// convert the Java list of results to an rdf:list and add it
// to a newly-created RDF Model
Model m = RDFCollections.asRDF(results, head, new LinkedHashModel());
Once you have your result as an RDF model you can use any of the existing RDF4J APIs to write it to file or to store it in a triplestore.
Now, I am not claiming that any of this is a good idea - I have never seen a use case for something like this. But this is how I would do it if it were necessary.

Using where() node to filter empty tags in Kapacitor

Using Kapacitor 1.3 and I am trying to use the following where node to keep measurements with an empty tag. Nothing is passing through and I get the same result with ==''.
| where(lambda: 'process-cpu__process-name' =~ /^$/)
I can workaround this issue using a default value for missing tags and filter on this default tag, in the following node but I am wondering if there is a better way structure the initial where statement and avoid an extra node.
| default()
.tag('process-cpu__process-name','system')
| where(lambda: \"process-cpu__process-name\" == 'system' )
Sure it doesn't pass, 'cause this
'process-cpu__process-name'
is a string literal it TICKScript, not a reference to a field, which is
"process-cpu__process-name"
You obviously got the condition always false in this case.
Quite common mistake though, especially for someone with previous experience with the languages that tolerates both single & double quote for mere string. :-)
Also, there's a function in TICKScript lambda called strLength(), find the doc here, please.

parsing xml having multiple namespaces using xml::Libxml::Xpathcontext in perl

I have to parse following XML:
<reply xmlns="urn::ietf::param" xmlns:element="https://xml.example.net/abc/12.3"
message-id='1'>
<abc>
<xyz> hello </xyz>
</abc>
</reply>
I want the value of xyz node i.e. hello, but findnodes is returning null value.
my code is :
my $xpath=XML::LibXML::XPathContext->new($dom);
$xpath->registerNs('ns1','urn::ietf::param');
$xpath->registerNs('ns2','https://xml.example.net/abc/12.3');
$val=$xpath->findnodes('//ns1:reply/ns2:element/abc/xyz');
print $val;
but print statement is returning null value
It looks like you have some misunderstanding about how namespaces work. xmlns:element="https://xml.example.net/abc/12.3" means that there is a prefix element defined with this specific namespace URI. This namespace is actually never used in your XML.
xmlns="urn::ietf::param" defines a default namespaces and also applies to all descendant elements.
And actually you have no <element/> element in your XML.
Thus, the following XPath should work as expected:
//ns1:reply/ns1:abc/ns1:xyz

Perl LibXML: isSameNode method

I'm trying to compare two nodes using 'isSameNode'. However, one of the nodes is created via 'parse_balanced_chunk'. From the http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-isSameNode document, is says "This method provides a way to determine whether two Node references returned by the implementation reference the same object."
So I'm wonder is it not working as I would expect because they are indeed coming from two different sources (one from the parsed txt file, the other from parse_balance_chunk)?
Here is the 'test_in.xml' & code:
<?xml version="1.0"?>
<TT>
<A>ZAB</A>
<B>ZBW</B>
<C>
<D>
<E>ZSE</E>
<F>ZLC</F>
</D>
</C>
<C>
<K>
<H>dog</H>
<E>one123</E>
<M>bird</M>
</K>
</C>
</TT>
use warnings;
use strict;
use XML::LibXML;
my $parser = XML::LibXML->new({keep_blanks=>(0)});
my $dom = $parser->load_xml(location => 'test_in.xml') or die;
#called in scalar context ($na1)
my ($na1) = $dom->findnodes('//D');
my ($na2) = $dom->findnodes('//D');
my $X1 = $na1->isSameNode($na2); ##MATCHES
my ($frg) = $parser->parse_balanced_chunk ("<D><E>ZSE</E><F>ZLC</F></D>");
my $X2 = $na1->isSameNode($frg); ##WHY NO MATCH?
my ($na3) = $frg->findnodes('//D');
my $X3 = $na1->isSameNode($na3); ##WHY NO MATCH?
print "SAME?: $X1\n";
print "SAME?: $X2\n";
print "SAME?: $X3\n";
And the output:
SAME?: 1
SAME?: 0
SAME?: 0
So the first 'isSameNode' test obviously MATCHES (same exact findnodes & xpath expression).
But neither of the 2nd or 3rd 'isSameNode' tests work using the node from the 'parsed_balance_chunk'. Is it something simple I'm overlooking with the syntax or is it just that I can't compare two nodes this way? If not, what is the method for comparing two nodes? Waht I'm ultimately trying to determine if a block of xml code (i.e. from a previous parsed_balance_chuck) already exist in the xml file.
Because they're not the same node. Like the name says, it checks if two nodes are the same node. You seem think think it checks if two nodes are equivalent, but that's not what it does.

WWW::Mechanize::Firefox CSS Selector with multiple elements?

When using WWW::Mechanize::Firefox to select an item, is it possible to iterate through a number of selectors that have the same name?
I use the following code:
my $un = $mech->selector('input.normal', single => 1);
The response is 2 elements found for CSS selector. Is there a way to use XPath or a better method, or is it possible to loop through the results?
Bonus point: typing into the inputs even though it is not in form elements (ie. uses JavaScript)
With the single option you have specified that there should be exactly one element that matches the selector. That is why you get an error message when it finds two matches.
The method will return a list of matches, and you can either use one => 1 in place of single => 1, which will throw van error if there isn't at least one match, or you can leave the option out altogether, when it will simply return all that it finds.
my #inputs = $mech->selector('input.normal')
will fill the array #inputs with a list of matching <input> elements, however many there are.
Module documentation contain these examples:
my $link = $mech->xpath('//a[id="clickme"]', one => 1);
# croaks if there is no link or more than one link found
my #para = $mech->xpath('//p');
# Collects all paragraphs