Saxon XPath 2.0: is distinct-values() working correctly? (Eclipse)

I've got a simple XML file for testing:
<?xml version="1.0" encoding="UTF-8"?>
<A>
<B >1</B>
<B >2</B>
<B >3</B>
<B >2</B>
<B >4</B>
<B >5</B>
<B >3</B>
</A>
I'm trying out some XPath 2.0 functions in the XPath console in Eclipse (Juno), but I can't get the following to work as expected:
distinct-values(/A/B/text())
I expect the result to be the numbers 1 to 5. Instead, I get:
1
2
3
2
4
5
3
Can anyone else confirm this? I tried Saxon 9.4 as the XSLT 2.0 processor, again in Eclipse, and PsychoPath as well.
Am I misunderstanding distinct-values()? I thought that in XSLT 2.0 it takes a sequence of atomic values or a set of nodes and returns the unique values.
Thx.

I can assure you that Saxon supports the distinct-values function. It works fine for me with Saxon 9.5 HE (Java) run from the command line. I used XQuery instead of XPath, as there is no command-line interface for pure XPath 2.0, but that should not matter for testing a simple function call. So when I run
'C:\Program Files (x86)\Java\jre7\bin\java.exe' -cp 'C:\Program Files (x86)\Saxonica\SaxonHE9.5J\saxon9he.jar' net.sf.saxon.Query -s:test2013061801.xml '-qs:distinct-values(/A/B)'
at a PowerShell prompt on Windows 8, where the input file test2013061801.xml contains your XML
<?xml version="1.0" encoding="UTF-8"?>
<A>
<B >1</B>
<B >2</B>
<B >3</B>
<B >2</B>
<B >4</B>
<B >5</B>
<B >3</B>
</A>
Saxon outputs <?xml version="1.0" encoding="UTF-8"?>1 2 3 4 5.
I don't know how Saxon integrates with Eclipse or what PsychoPath is. However, with Saxon or any other XPath 2.0 or XQuery 1.0 implementation, you shouldn't have any problem getting distinct values from that path expression.
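For comparison outside Saxon, here is a minimal Java sketch that uses only the JDK. The JDK's built-in javax.xml.xpath implements XPath 1.0, which has no distinct-values(). The sketch therefore selects /A/B and removes duplicate string values itself. The class and method names are hypothetical. A LinkedHashSet keeps the values in first-occurrence order, which matches the Saxon output above. Note, however, that the XPath 2.0 specification leaves the order of distinct-values() results implementation-dependent.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DistinctValuesSketch {
    /** Emulates distinct-values(/A/B): take each B's string value and drop repeats. */
    public static Set<String> distinctB(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath(); // XPath 1.0 in the JDK
        NodeList nodes = (NodeList) xpath.evaluate(
                "/A/B", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        Set<String> distinct = new LinkedHashSet<>(); // keeps first-occurrence order
        for (int i = 0; i < nodes.getLength(); i++) {
            distinct.add(nodes.item(i).getTextContent());
        }
        return distinct;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<A><B>1</B><B>2</B><B>3</B><B>2</B><B>4</B><B>5</B><B>3</B></A>";
        System.out.println(distinctB(xml)); // prints [1, 2, 3, 4, 5]
    }
}
```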

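Yes, the distinct-values function works fine. You can get the correct result with the following stylesheet (distinct-values() requires an XSLT 2.0 processor such as Saxon):
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="A">
    <xsl:for-each select="distinct-values(child::B)">
      <!-- one distinct value per line -->
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>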

XSLT streaming not streaming

I'm using Saxon-EE to run a streaming XSLT transformation over a large XML file. The transformation works, but it doesn't seem to be streaming. For a 100 MB XML file, the memory of the java.exe process grows by about 1 GB. This is the XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:bb="urn:xx-zz-1.1"
xmlns:aa="urn:xx-yy-1.1">
<xsl:mode streamable="yes"/>
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:for-each select="aa:LevelOne/aa:LevelTwo">
<xsl:iterate select="bb:LevelThree! copy-of(.)">
<xsl:value-of select="concat(bb:fieldOne,',',bb:fieldTwo,'&#10;')"/>
</xsl:iterate>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
This is the XML:
<?xml version="1.0" encoding="utf-8"?>
<aa:LevelOne xmlns="urn:xx-zz-1.1" xmlns:aa="urn:xx-yy-1.1">
<aa:LevelTwo xmlns="urn:xx-zz-1.1" xmlns:aa="urn:xx-yy-1.1">
<LevelThree xmlns="urn:xx-zz-1.1">
<fieldOne>f1</fieldOne>
<fieldTwo>f2</fieldTwo>
</LevelThree>
<!-- Level three is repeated many times -->
</aa:LevelTwo>
</aa:LevelOne>
I would like to know whether there is a problem with the XSLT above, and if so, what it is.
The code I use:
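import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.lib.Feature;
import net.sf.saxon.s9api.*;

// true = request the licensed edition (streaming needs Saxon-EE with a valid license)
Processor processor = new Processor(true);
processor.setConfigurationProperty(Feature.STREAMABILITY, "standard");
XsltCompiler compiler = processor.newXsltCompiler();
XsltExecutable stylesheet = compiler.compile(new StreamSource(stylesheetFile));
Serializer out = processor.newSerializer(outputCsvFile);
Xslt30Transformer transformer = stylesheet.load30();
// Passing the source as a StreamSource lets the processor stream it
transformer.applyTemplates(new StreamSource(xmlFile), out);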
EDIT: Fixed the XSLT so that it compiles, and added an XML example.
Remark: running
java -cp "<path>\test;<path>\saxon9ee.jar" com.example.test.Test -t
does not output any additional information (only the printlns in my code). Running
java -cp "<path>\test;<path>\saxon9ee.jar" -t com.example.test.Test
outputs:
Unrecognized option: -Xt
Error: Could not create the Java Virtual Machine.
If I change the XSLT to a non-streamable rule, e.g. by removing the iterate line, the program outputs "Template rule is not streamable", even without the -t option. In that case, the error goes away if I remove the streamability requirement from the code/XSLT.
Thanks.
The most likely reason for Saxon to fall back to non-streaming mode is that it hasn't found a Saxon-EE license. The easiest way to test this is (unintuitively!) to call processor.isSchemaAware(). That only returns true if you're running Saxon-EE code with a recognized license, which is exactly the condition that enables streaming.
If it hasn't found a license, the Saxon documentation includes a section on troubleshooting license problems at http://www.saxonica.com/documentation/index.html#!about/license
Also, try it from the command line with option -t; that will give you more information (a) about streaming, and (b) about loading of license files.
If the data is as simple and regular as shown in the question, I think you can avoid copy-of() and simply use:
<xsl:iterate select="bb:LevelThree">
<xsl:value-of select="bb:*" separator=","/>
<xsl:text>
</xsl:text>
</xsl:iterate>
In a quick test, that showed reduced memory consumption compared to your approach.
As for your approach not streaming with Saxon EE 9.9: I tested the posted XSLT and input sample with Saxon 9.9 EE from the command line with the -t option, and the output shows that the input is streamed.
I also think the Java code shown is fine for processing the file with streaming in Saxon EE.
For a detailed analysis of the memory consumption, and of any problems you run into with it, it might be better to raise an issue with all the details on the Saxonica support site. I am not sure exactly how the memory figures Saxon outputs relate to the ones you see for java.exe.
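For comparison, the same CSV extraction can be sketched in plain Java with the JDK's StAX pull parser (javax.xml.stream). StAX reads the input one event at a time and writes each line as soon as its LevelThree element ends. This is a swapped-in technique, not Saxon streaming, and the class and method names are hypothetical. The sketch also assumes, as in the question's sample, that every child of LevelThree is a text-only element.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxCsvSketch {
    // Namespace bound to the bb prefix in the question's stylesheet
    static final String BB = "urn:xx-zz-1.1";

    static boolean isLevelThree(XMLStreamReader r) {
        return BB.equals(r.getNamespaceURI()) && "LevelThree".equals(r.getLocalName());
    }

    /** Writes one CSV line per LevelThree, joining its child element texts with commas. */
    public static void toCsv(Reader in, Writer out) throws XMLStreamException, IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD processing needed
        XMLStreamReader r = factory.createXMLStreamReader(in);
        List<String> fields = null; // non-null only while inside a LevelThree
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (isLevelThree(r)) {
                        fields = new ArrayList<>();
                    } else if (fields != null) {
                        // Reads text up to this child's end tag (throws if it has child elements)
                        fields.add(r.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && fields != null && isLevelThree(r)) {
                    out.write(String.join(",", fields));
                    out.write('\n');
                    fields = null; // discard this record before reading the next one
                }
            }
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<aa:LevelOne xmlns='urn:xx-zz-1.1' xmlns:aa='urn:xx-yy-1.1'><aa:LevelTwo>"
                + "<LevelThree><fieldOne>f1</fieldOne><fieldTwo>f2</fieldTwo></LevelThree>"
                + "</aa:LevelTwo></aa:LevelOne>";
        StringWriter out = new StringWriter();
        toCsv(new StringReader(xml), out);
        System.out.print(out); // prints f1,f2
    }
}
```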

Powershell: XPath cannot select when element has "xmlns" tag?

I've got a very simple XML file:
<?xml version="1.0" encoding="utf-8"?>
<First>
<Second>
<Folder>today</Folder>
<FileCount>10</FileCount>
</Second>
<Second>
<Folder>tomorrow</Folder>
<FileCount>90</FileCount>
</Second>
<Second>
<Folder>yesterday</Folder>
<FileCount>22</FileCount>
</Second>
</First>
Then I have a PowerShell script to select the "Folder" elements:
[xml]$xml=Get-Content "D:\m.xml"
$xml.SelectNodes("//Folder")
It outputs:
#text
-----
today
tomorrow
yesterday
No problem. But if I change the XML file to add xmlns="http://schemas.microsoft.com/developer/msbuild/2003" to the "First" element, like below:
<?xml version="1.0" encoding="utf-8"?>
<First xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<Second>
<Folder>today</Folder>
<FileCount>10</FileCount>
</Second>
<Second>
<Folder>tomorrow</Folder>
<FileCount>90</FileCount>
</Second>
<Second>
<Folder>yesterday</Folder>
<FileCount>22</FileCount>
</Second>
</First>
Then my PowerShell script outputs nothing.
Why? How can I change my PowerShell script to support this xmlns?
Thanks a lot.
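What you added is a default namespace. A prefixed namespace applies only to elements that use the prefix. A default namespace, by contrast, applies implicitly to the element that declares it and to all its unprefixed descendants, unless otherwise specified (by an explicit prefix or by a local default namespace that points to a different URI). So Folder is now in the http://schemas.microsoft.com/developer/msbuild/2003 namespace. An unprefixed name in an XPath 1.0 expression, such as //Folder, only matches elements in no namespace, which is why SelectNodes returns nothing.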
To select elements in a namespace, you need to define a prefix that points to the namespace URI and use that prefix in your XPath, for example:
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace("d", $xml.DocumentElement.NamespaceURI)
$xml.SelectNodes("//d:Folder", $ns)
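The same approach works in Java with the JDK's javax.xml.xpath. This is a swapped-in illustration, not PowerShell, and the class name is hypothetical. The sketch parses with namespace awareness enabled, which JAXP turns off by default. It then maps the prefix "d" to the document element's namespace URI through a NamespaceContext and uses that prefix in the expression. An unprefixed //Folder still matches nothing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespacedXPathSketch {
    /** Evaluates expr with the prefix "d" bound to the root element's namespace URI. */
    public static List<String> select(String xml, String expr) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // off by default in JAXP
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        String rootNs = doc.getDocumentElement().getNamespaceURI();
        final String ns = rootNs == null ? XMLConstants.NULL_NS_URI : rootNs;

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "d".equals(prefix) ? ns : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) { return null; } // unused here
            @Override public Iterator<String> getPrefixes(String uri) { return null; } // unused here
        });

        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<First xmlns='http://schemas.microsoft.com/developer/msbuild/2003'>"
                + "<Second><Folder>today</Folder></Second><Second><Folder>tomorrow</Folder></Second></First>";
        System.out.println(select(xml, "//d:Folder")); // prints [today, tomorrow]
        System.out.println(select(xml, "//Folder"));   // prints []
    }
}
```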

Calabash 1.0.23 throws on xsl-formatter step: ERROR: Failed to instantiate FO provider

I'm trying to use XML Calabash 1.0.23 to run an XSLT transformation and FO formatting in a single pipeline. The XSLT step works fine, but I cannot get the xsl-formatter step to work with FOP.
Every time I run the pipeline, Calabash throws:
ERROR: pipeline.xpl:13:68:Failed to instantiate FO provider
ERROR: Underlying exception: org/apache/fop/apps/FopFactory
My call to Calabash from the command line is:
java com.xmlcalabash.drivers.Main -c cfg.xml myPipeline.xpl
And the cfg.xml configuration file being referred to in the above line is:
<cc:xproc-config xmlns:cc="http://xmlcalabash.com/ns/configuration">
<cc:fo-processor class-name="com.xmlcalabash.util.FoFOP"/>
</cc:xproc-config>
For some reason, Calabash seems to ignore the config file setting: regardless of the value of the class-name attribute on <cc:fo-processor>, it always throws the same error message. For example, if I use com.xmlcalabash.util.FoAH, the same happens, and if I put in a nonsensical value, the same occurs. It's always an exception on org/apache/fop/apps/FopFactory.
Just for the sake of completeness, this is my XPL:
<declare-step name="main" version="1.0" xmlns="http://www.w3.org/ns/xproc">
<input port="parameters" kind="parameter" />
<xslt name="transformation">
<input port="source">
<document href="myMarkup.xml" />
</input>
<input port="stylesheet">
<document href="myStylesheet.xsl" />
</input>
</xslt>
<xsl-formatter href="newDoc.pdf" >
<input port="source">
<pipe step="transformation" port="result" />
</input>
</xsl-formatter>
</declare-step>
Of course, if I manually pass the generated FO from the XSLT step to FOP 1.1, it converts it to PDF with no problems. The issue only arises when trying to do the conversion within the pipeline.
I could really use some help to solve this. I'm clueless at this point.
This may seem like a very pedantic reply, but do you have fop.jar (and fop-hyph.jar, which I think FOP requires) on your classpath? They're not bundled in the XML Calabash distribution, you have to get them from Apache.
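The error message names the class in slash-separated form (org/apache/fop/apps/FopFactory). That looks like the form a Java NoClassDefFoundError reports, which would suggest (an assumption) that the FOP classes are simply not on the classpath. A minimal check, with a hypothetical class name, is to run the following with the same -cp value you pass to com.xmlcalabash.drivers.Main. Class.forName with initialize=false only loads the class and does not run its static initializers.

```java
public class ClasspathCheck {
    /** Returns true if the named class can be loaded by this class's loader (without initializing it). */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class named in the Calabash error message
        String fop = "org.apache.fop.apps.FopFactory";
        System.out.println(fop + " on classpath: " + isOnClasspath(fop));
    }
}
```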

Transform referenced XML

I have a pipeline question.
I have a workflow where I have a master XML that references multiple XML files. It would be similar to a DITA map, but this is not DITA.
I have to run an XSLT on the master XML to prepare it for a downstream process that outputs to the web or another medium.
Part of this transform includes resolving database file paths to relative file paths within the package that is exported from the content management system.
I can perform the transformation on the master XML just fine. However, my question is about running an XSLT on the referenced XML files, which also need their paths corrected.
Here is a simple sample...
<master>
<reference url="x-database01.xml"/>
</master>
The reference will resolve to something like url="/files/realXMLname.xml". The problem is that I don't know how to then transform realXMLname.xml to resolve the paths there:
<content>
<graphic href="x-database02.jpg"/>
</content>
The database and the downstream process are two different software packages with an out-of-the-box integration. I could write my own pipeline to do the transforms, but that may be cost-prohibitive. The current integration only allows one XSLT to be run at the pre-processing step.
Is it possible to transform referenced XML files in a single XSLT transform step?
The following setup uses a main input file, input.xml, which contains only a root node, and two other XML files, part1.xml and part2.xml, which are read using the document() function. The XSLT calls <xsl:apply-templates> on the union of the two documents' root elements, so the same templates convert both files. This way it should be possible to process any number of referenced XML files within a single XSLT transformation.
By the way: It turns out that this is already supported in XSLT 1.0 (at least by the xsltproc processor).
File input.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<root/>
File part1.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<root1>
<element>
<subelement1 message="Hello from part1.xml"/>
</element>
</root1>
File part2.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<root2>
<element>
<subelement2 message="Hello from part2.xml"/>
</element>
</root2>
With the input above the following XSLT transformation
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
<xsl:variable name="part1" select="document('part1.xml')/root1"/>
<xsl:variable name="part2" select="document('part2.xml')/root2"/>
<xsl:template match="/">
<messages>
<xsl:apply-templates select="$part1|$part2"/>
</messages>
</xsl:template>
<xsl:template match="*">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="text()"/>
<xsl:template match="subelement1">
<message>
<xsl:value-of select="@message"/>
</message>
</xsl:template>
<xsl:template match="subelement2">
<message>
<xsl:value-of select="@message"/>
</message>
</xsl:template>
</xsl:stylesheet>
yields this output:
<?xml version="1.0" encoding="UTF-8"?>
<messages>
<message>Hello from part1.xml</message>
<message>Hello from part2.xml</message>
</messages>
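The answer's document() technique can also be applied to the question's own structure. The following minimal Java sketch uses the JDK's built-in XSLT 1.0 processor via javax.xml.transform. The class name, the file names and the "images/" prefix are hypothetical stand-ins for the real path mapping. The sketch writes a master file and one referenced file to a temporary directory, then runs a single stylesheet over the master. That stylesheet copies the master, pulls each referenced file in with document(@url), and rewrites graphic/@href along the way.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ReferencedDocsSketch {
    // Identity copy of the master; each <reference> gets the referenced document's root
    // element inlined, and graphic/@href values get a hypothetical "images/" prefix.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='reference'>"
      + "<xsl:copy><xsl:apply-templates select='@*'/>"
      + "<xsl:apply-templates select='document(@url)/*'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='graphic/@href'>"
      + "<xsl:attribute name='href'><xsl:value-of select=\"concat('images/', .)\"/></xsl:attribute>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Writes the given files to a temp directory and transforms the master file. */
    public static String run(Map<String, String> files, String masterName) throws Exception {
        Path dir = Files.createTempDirectory("refdocs");
        for (Map.Entry<String, String> f : files.entrySet()) {
            Files.write(dir.resolve(f.getKey()), f.getValue().getBytes(StandardCharsets.UTF_8));
        }
        // A system ID in the same directory means relative document() URIs resolve
        // there, whichever base URI (stylesheet or source node) the processor uses.
        StreamSource xsl = new StreamSource(new StringReader(XSLT), dir.resolve("sheet.xsl").toUri().toString());
        Transformer t = TransformerFactory.newInstance().newTransformer(xsl);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(dir.resolve(masterName).toFile()), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("master.xml", "<master><reference url='part1.xml'/></master>");
        files.put("part1.xml", "<content><graphic href='x-database02.jpg'/></content>");
        System.out.println(run(files, "master.xml"));
    }
}
```

Because one stylesheet handles both the master and the referenced files, this fits an integration that allows only one XSLT pre-processing step.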

How do I config xml namespace for a smooks input file using WSO2?

I couldn't find any advice on how to set up Smooks (in WSO2 Developer Studio 3.1) so that it properly reads an XML input that has a namespace declaration.
Without the namespace the transformation works just fine!
When I add the xmlns to the input, I get this exception:
Error on line 5, column 19 in free-marker-template
Expecting a string, date or number here, Expression .vars["order"]["order-items/order-item/@id"] is instead a freemarker.ext.dom.NodeListModel
The problematic instruction:
----------
==> ${.vars["order"]["order-items/order-item/@id"]} [on line 5, column 17 in free-marker-template]
These are my input and output:
<order id='444' xmlns="http://example.com">
<header>
<customer number="555">Amila</customer>
</header>
<order-items>
<order-item id='1'>
<product>1</product>
<quantity>2</quantity>
<price>400</price>
</order-item>
</order-items>
</order>
<salesorder>
<details>
<orderid></orderid>
<customer>
<id></id>
<name></name>
</customer>
</details>
<itemList>
<item>
<id></id>
<productId></productId>
<quantity></quantity>
<price></price>
</item>
</itemList>
</salesorder>
I've also tried Smooks core's namespace declarations, as described on the Smooks site:
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd"
xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.4.xsd"
>
<core:namespaces>
<core:namespace prefix="ex" uri="http://example.com/"/>
</core:namespaces>
But this doesn't seem to be supported in the IDE, since the configuration editor raises this error:
-Value 'org.eclipse.emf.ecore.xml.type.impl.AnyTypeImpl@53abd5a9 (mixed: null, anyAttribute: null)' is not legal. (platform:/resource/Corp/smooks-config.xml,6,20)
Well, any idea?
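Instead of Smooks, can't you use XSLT? It would avoid the namespace issue you have, because XSLT lets you bind your own prefix to the input's namespace URI and use that prefix in match and select expressions.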
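To make that suggestion concrete, here is a minimal sketch of the XSLT route, run through the JDK's built-in XSLT 1.0 processor from Java. The class name is hypothetical, and the order-to-salesorder field mapping is inferred from the element names rather than taken from the question. The stylesheet binds its own prefix (ex) to http://example.com, exactly as written in the input, since namespace URIs are compared as exact strings. It uses that prefix in every element path. Attributes such as @id need no prefix, because a default namespace never applies to attributes. Finally, exclude-result-prefixes keeps the ex declaration out of the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OrderToSalesOrderSketch {
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:ex='http://example.com' exclude-result-prefixes='ex'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/ex:order'>"
      + "<salesorder><details>"
      + "<orderid><xsl:value-of select='@id'/></orderid>" // attributes stay unprefixed
      + "<customer>"
      + "<id><xsl:value-of select='ex:header/ex:customer/@number'/></id>"
      + "<name><xsl:value-of select='ex:header/ex:customer'/></name>"
      + "</customer></details>"
      + "<itemList><xsl:for-each select='ex:order-items/ex:order-item'>"
      + "<item>"
      + "<id><xsl:value-of select='@id'/></id>"
      + "<productId><xsl:value-of select='ex:product'/></productId>"
      + "<quantity><xsl:value-of select='ex:quantity'/></quantity>"
      + "<price><xsl:value-of select='ex:price'/></price>"
      + "</item>"
      + "</xsl:for-each></itemList>"
      + "</salesorder>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms a namespaced <order> document into an un-namespaced <salesorder>. */
    public static String transform(String orderXml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(orderXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String order = "<order id='444' xmlns='http://example.com'>"
                + "<header><customer number='555'>Amila</customer></header>"
                + "<order-items><order-item id='1'><product>1</product>"
                + "<quantity>2</quantity><price>400</price></order-item></order-items></order>";
        System.out.println(transform(order));
    }
}
```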