XSLT2 doesn't import user functions?

I've just found that if main.xsl xsl:imports lib.xsl which defines some XSLT2 functions, those functions can't be used in main.xsl.
Error: There's no function in namespace http://foo.bar/my-library-ns
However, xsl:include does import those functions.

This is not reproducible.
This stylesheet:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:local="http://localhost/">
<xsl:import href="lib.xsl"/>
<xsl:template match="/">
<xsl:value-of select="local:function()"/>
</xsl:template>
</xsl:stylesheet>
With this imported module:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:local="http://localhost/"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:function name="local:function" as="xs:boolean">
<xsl:sequence select="true()"/>
</xsl:function>
</xsl:stylesheet>
Output*:
true
* Tested on Saxon and Altova. (My XQSharp has expired...)

I've just found that if main.xsl
xsl:imports lib.xsl which defines some
XSLT2 functions, those functions can't
be used in main.xsl
Of course, this isn't true.
Do have a look at the FXSL library, whose templates contain functions that must be imported by the using XSLT code. The stylesheet modules of FXSL in fact import other stylesheet modules and use their functions.
Here is a simple example:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://fxsl.sf.net/"
exclude-result-prefixes="f"
>
<xsl:import href="../f/func-foldl.xsl"/>
<xsl:import href="../f/func-Operators.xsl"/>
<!--
This transformation calculates 10!
Expected result: 3628800 or 3.6288E6
-->
<xsl:output encoding="UTF-8" omit-xml-declaration="yes"/>
<xsl:template match="/">
<xsl:value-of select="f:foldl(f:mult(), 1, 1 to 10 )"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on any XML document (not used), it calculates the 10! (10 factorial) value:
3628800
All three XSLT 2.0 processors I am working with: Saxon 9, AltovaXML and XQSharp produce this same result.
Here is a partial view of the import hierarchy of the above transformation:
As we can see, only less than half of all imports are shown... :)
Of course here we only see the imported templates, but in every imported stylesheet module, for every template there are at least two <xsl:function>s.

Burst Mode Vs Fully Streaming XSLT

I have written an XSLT stylesheet to transform a huge incoming XML file to JSON using burst-mode streaming. I am new to XSLT and have heard that fully streaming XSLT code is more efficient and faster than burst mode.
Can someone please help me understand:
1. What is the difference between burst mode and fully streaming?
2. How can I convert the XSLT code below to fully streaming to improve performance?
Below is my burst mode XSLT code -
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:wd="urn:com.workday.report/INT1109_CR_REV_Customer_Invoices_to_Connect" exclude-result-prefixes="xs" version="3.0">
<xsl:mode streamable="yes" on-no-match="shallow-skip"/>
<xsl:output method="text" encoding="UTF-8" indent="no"/>
<xsl:template match="wd:Report_Data">
<xsl:iterate select="wd:Report_Entry/copy-of()">
<!--Define Running Totals for Statistics -->
<xsl:param name="TotalHeaderCount" select="0"/>
<xsl:param name="TotalLinesCount" select="0"/>
<!--Write Statistics -->
<xsl:on-completion>
<xsl:text>{"Stats": </xsl:text>
<xsl:text>{"Total Header Count": </xsl:text>
<xsl:value-of select="$TotalHeaderCount"/>
<xsl:text>,</xsl:text>
<xsl:text>"Total Lines Count": </xsl:text>
<xsl:value-of select="$TotalLinesCount"/>
<xsl:text>}}</xsl:text>
</xsl:on-completion>
<!--Write Header Details -->
<xsl:text>{"id": "</xsl:text>
<xsl:value-of select="wd:id"/>
<xsl:text>",</xsl:text>
<xsl:text>"revenue_stream": "</xsl:text>
<xsl:value-of select="wd:revenue_stream"/>
<xsl:text>",</xsl:text>
<!--Write Line Details -->
<xsl:text>"lines": [ </xsl:text>
<!-- Count the number of lines for an invoice -->
<xsl:variable name="Linescount" select="wd:total_lines"/>
<xsl:iterate select="wd:lines">
<xsl:text> {</xsl:text>
<xsl:text>"sequence": </xsl:text>
<xsl:value-of select="wd:sequence"/>
<xsl:text>,</xsl:text>
<xsl:text>"sales_item_id": "</xsl:text>
<xsl:value-of select="wd:sales_item_id"/>
<xsl:text>",</xsl:text>
</xsl:iterate>
<xsl:text>}]}
</xsl:text>
<!--Store Running Totals -->
<xsl:next-iteration>
<xsl:with-param name="TotalHeaderCount" select="$TotalHeaderCount + 1"/>
<xsl:with-param name="TotalLinesCount" select="$TotalLinesCount + $Linescount"/>
</xsl:next-iteration>
</xsl:iterate>
</xsl:template>
</xsl:stylesheet>
Here is the sample XML -
<?xml version="1.0" encoding="UTF-8"?>
<wd:Report_Data xmlns:wd="urn:com.workday.report/INT1109_CR_REV_Customer_Invoices_to_Connect">
<wd:Report_Entry>
<wd:id>CUSTOMER_INVOICE-6-1</wd:id>
<wd:revenue_stream>TESTA</wd:revenue_stream>
<wd:total_lines>1</wd:total_lines>
<wd:lines>
<wd:sequence>ab</wd:sequence>
<wd:sales_item_id>Administrative Cost</wd:sales_item_id>
</wd:lines>
</wd:Report_Entry>
<wd:Report_Entry>
<wd:id>CUSTOMER_INVOICE-6-10</wd:id>
<wd:revenue_stream>TESTB</wd:revenue_stream>
<wd:total_lines>1</wd:total_lines>
<wd:lines>
<wd:sequence>ab</wd:sequence>
<wd:sales_item_id>Data - Web Access</wd:sales_item_id>
</wd:lines>
</wd:Report_Entry>
</wd:Report_Data>
If the order of properties in the JSON doesn't matter then you could directly create XSLT/XPath 3 maps and arrays with xsl:map/xsl:map-entry (or the XPath 3.1 map constructor) and the Saxon specific extension element saxon:array (unfortunately the XSLT 3 language standard lacks an instruction to create an array). Furthermore most of your iteration parameters seem to be easily implemented as accumulators:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:saxon="http://saxon.sf.net/"
extension-element-prefixes="saxon"
xpath-default-namespace="urn:com.workday.report/INT1109_CR_REV_Customer_Invoices_to_Connect"
exclude-result-prefixes="#all"
version="3.0">
<xsl:output method="adaptive" indent="yes"/>
<xsl:mode streamable="yes" use-accumulators="#all" on-no-match="shallow-skip"/>
<xsl:accumulator name="header-count" as="xs:integer" initial-value="0" streamable="yes">
<xsl:accumulator-rule match="Report_Entry" select="$value + 1"/>
</xsl:accumulator>
<xsl:accumulator name="lines-count" as="xs:integer" initial-value="0" streamable="yes">
<xsl:accumulator-rule match="Report_Entry/total_lines/text()" select="$value + xs:integer(.)"/>
</xsl:accumulator>
<xsl:template match="Report_Data">
<xsl:apply-templates/>
<xsl:sequence
select="map {
'Stats': map {
'Total Header Count' : accumulator-after('header-count'),
'Total Lines Count' : accumulator-after('lines-count')
}
}"/>
</xsl:template>
<xsl:template match="Report_Entry">
<xsl:map>
<xsl:apply-templates/>
</xsl:map>
</xsl:template>
<xsl:template match="Report_Entry/id | Report_Entry/revenue_stream | lines/sequence | lines/sales_item_id">
<xsl:map-entry key="local-name()" select="string()"/>
</xsl:template>
<xsl:template match="Report_Entry/lines">
<xsl:map-entry key="local-name()">
<saxon:array>
<xsl:apply-templates/>
</saxon:array>
</xsl:map-entry>
</xsl:template>
</xsl:stylesheet>
The example uses output method adaptive as your current sample doesn't create a single JSON object and I have simply tried to create the same output as your current code; the JSON output method would need a single map or array as the main sequence result.
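As a minimal sketch of that last point, assuming an XSLT 3.0 processor that supports the json output method: the stylesheet below is not streamed and hard-codes the counts from the sample (two entries, two lines) purely for illustration. It only shows that a single map as the main result can be serialized as one JSON object.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The json method expects one map or array as the whole result -->
  <xsl:output method="json" indent="yes"/>

  <!-- Started from the initial template, so no source document is needed -->
  <xsl:template name="xsl:initial-template">
    <!-- Illustrative values only: hard-coded, not computed by accumulators -->
    <xsl:sequence select="map {
        'Stats': map {
          'Total Header Count': 2,
          'Total Lines Count': 2
        }
      }"/>
  </xsl:template>
</xsl:stylesheet>
```

In a real stylesheet the two numbers would come from the accumulators shown above, and the per-entry maps would have to be collected into the same top-level map or array.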
The code works with streaming in Saxon EE 9.9.1.1 in oXygen; unfortunately, 9.8 doesn't consider the code streamable.
As for general rules, there are limits as to what you can achieve with accumulators and template matching when streaming; as you can see, the accumulator to sum up the values from the total_lines elements needs to match on the text child to not consume the element in the accumulator (Saxon has another extension of capturing accumulators to ease such tasks however).
So far I would say it is more important to find code that gets past the streamability analysis and returns the same, correct result as the non-streamable code. For instance, I experimented with generating JSON with streaming in two transformation steps: sample data similar to yours is the input, the first step produces the XML representation of JSON, and the second step applies xml-to-json to that result. In doing so I ran into a Saxon bug: https://saxonica.plan.io/issues/4215.
With streaming, it seems there is not enough test coverage or implementation maturity to be able to combine features reliably in a complex and scalable way, partly due to a complex spec, partly due to the limited use of that stuff by the XSLT community.
So if you find a working way for a particular problem to use streaming to keep memory consumption lower or manageable compared to the normal XSLT 2/3 tree based processing then you can of course experiment with changes to improve performance but it is easy to break things.
One general observation is that streaming allows you to access all attributes of the currently processed/matched element but not its children; therefore it can help immensely to insert a processing step that transforms elements into attributes if you have a simple child element structure. That way you can often avoid any copy-of(). But of course you need a way to chain the two stylesheets, which Saxon allows through its API, although doing so requires writing Java or .NET code.

Adjust Dita-OT plugin to output PDF wireframe with all blocks solid border

I'd like to output a solid black border around all fo:blocks to help see where the boundaries between elements fall in the PDF output.
I would like to apply a transformation at the end of the DITA-OT plugin that adds the borders. I can fiddle with the following XSL; however, I'm not sure how to apply the XSLT at the end of a DITA-OT process.
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet exclude-result-prefixes="xs ditaarch opentopic e" version="2.0" xmlns:ditaarch="http://dita.oasis-open.org/architecture/2005/" xmlns:e="com.docdept.pdf" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:opentopic="http://www.idiominc.com/opentopic" xmlns:opentopic-func="http://www.idiominc.com/opentopic/exsl/function" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|*|processing-instruction()|comment()">
<xsl:copy>
<xsl:apply-templates select="*|@*|text()|processing-instruction()|comment()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="fo:block">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:attribute name="border-style">solid</xsl:attribute>
<xsl:attribute name="border-width">0.5pt</xsl:attribute>
<xsl:attribute name="border-color">black</xsl:attribute>
<!-- node() rather than * so that text inside the block is kept -->
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I thought perhaps the following to apply wireframe.xsl at the end of the process but this does not work.
<?xml version='1.0' encoding='UTF-8'?>
<plugin id="com.docdept.pdf">
<require plugin="org.dita.pdf2" />
<feature extension="dita.conductor.transtype.check" value="adjust-pdf" />
<feature extension="dita.transtype.print" value="docdept-pdf" />
<feature extension="dita.conductor.target.relative" file="integrator.xml" />
<feature extension="dita.xsl.pdf" file="xsl/fo/wireframe.xsl"/>
</plugin>
I'm seeing that it's better to associate borders of different colors to the attribute sets so there is visual reference that can also be searched for by color name within the fo output.
<xsl:attribute name="border">1mm thin solid</xsl:attribute>
<xsl:attribute name="border-color">GOLD</xsl:attribute>

approach for generating cluster of xpaths from html files in perl

As per one of our project requirements, we have to come up with an approach, and a suitable Perl resource, for the following.
We receive bulk input data in HTML format and have to write a parser that brings all the varieties of input data into one generically formatted XML file.
The challenge is that while writing the parser we manually verified only a few sample HTML files, and the parser then failed on other samples; analysing all the HTML files manually is very difficult.
Because of this, we have decided to build an analysis tool in which all the variations in HTML structure can be monitored using XPaths.
Can anyone suggest which module is suitable for this work? I know HTML::TreeBuilder::XPath will help in giving XPaths, but are there any limitations in that module?
Otherwise, please suggest the best approach for analysing the HTML files. Most of our team are more familiar with Perl, which is why we prefer to write it in Perl.
Also suggest whether any other technology could be used more efficiently, or what the best approach within Perl would be.
If the HTML files are guaranteed to be well-formed XHTML, I'd look into XSLT for transforming the input.
Since the structure of the input files is not known in advance, I'd check the output for missing data: after transforming, you can check for missing data at known paths in the result document.
An example of how XSLT works:
The HTML input file:
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="foo">1</div>
<div>
<div class="bar">2</div>
</div>
</body>
</html>
The stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:h="http://www.w3.org/1999/xhtml" exclude-result-prefixes="h">
<!-- the input is in the XHTML namespace, so element names in patterns need the h: prefix -->
<xsl:strip-space elements="*"/>
<xsl:output indent="yes" />
<xsl:template match="/">
<!-- match the document root -->
<data>
<!-- apply all templates to the input document -->
<xsl:apply-templates />
</data>
</xsl:template>
<xsl:template match="/h:html/h:body/h:div[@class='foo']">
<!-- div with attribute class='foo' is expected only at /html/body/div -->
<foo-out><xsl:value-of select="."></xsl:value-of></foo-out>
</xsl:template>
<xsl:template match="//h:div[@class='bar']">
<!-- div with attribute class='bar' can be anywhere in the document -->
<bar-out><xsl:value-of select="."></xsl:value-of></bar-out>
</xsl:template>
</xsl:stylesheet>
The output:
<?xml version="1.0"?>
<data>
<foo-out>1</foo-out>
<bar-out>2</bar-out>
</data>
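To survey which element paths actually occur in the sample files, as the question asks, a small XSLT 2.0 sketch along these lines could list the distinct paths found in one document. It assumes well-formed XHTML input and an XSLT 2.0 processor; it uses local-name(), so namespaces are ignored, and positional predicates and attributes are deliberately left out.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Build one path string per element from the names of its
         ancestors and itself, then keep each distinct path once -->
    <xsl:for-each select="distinct-values(
        //*/string-join(ancestor-or-self::*/local-name(), '/'))">
      <xsl:sort select="."/>
      <!-- One path per line, with a leading slash -->
      <xsl:value-of select="concat('/', ., '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input above this should list four paths, from /html down to /html/body/div/div. Running it over every input file and comparing the lists is one way to spot structural variations before writing the parser.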

current-dateTime() to number

I need to convert the current-dateTime() to a number, because the output only allows an integer (or long). How can I do that?
My desired output:
<?xml version="1.0" encoding="utf-8"?>
<currentDate>NaN</currentDate>
My XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" version="2.0">
<xsl:output method="xml" encoding="utf-8" indent="yes"/>
<xsl:variable name="version" select="'1.0'"/>
<xsl:template match="/">
<xsl:element name="currentDate">
<xsl:value-of select="number(current-dateTime())"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
But I get this here:
<?xml version="1.0" encoding="utf-8"?>
<currentDate>NaN</currentDate>
When I remove the number() I do get the timestamp, but I need to have a clean number.
<?xml version="1.0" encoding="utf-8"?>
<currentDate>2014-01-15T07:01:02.526+01:00</currentDate>
How can I convert the timestamp to a clean number?
As mentioned, to convert "2014-01-15T07:01:02.526+01:00" to "20140115070102526" you can use the translate() function (translate() itself is also available in XSLT 1.0, although current-dateTime() requires XSLT 2.0):
<xsl:value-of select="translate(substring(string(current-dateTime()), 1, 23), '-:T.', '')"/>
Alternatively, you can calculate the duration between the current date and time and an arbitrary start and divide this duration by a millisecond to get the number of milliseconds since that date:
<xsl:value-of select="(current-dateTime() - xs:dateTime('1971-01-01T00:00:00')) div xs:dayTimeDuration('PT0.001S')"/>
In XSLT 2.0, you can use the format-dateTime() function and reformat the date: just use a picture parameter with no separators.
Alternatively - assuming all component values are already padded with zeros to maintain a fixed length - you could simply translate the separator characters away.
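A minimal sketch of the format-dateTime() approach, assuming an XSLT 2.0 processor; the picture string below is one possible choice, not the only one:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="utf-8" indent="yes"/>

  <xsl:template match="/">
    <currentDate>
      <!-- Picture with no separators: 4-digit year, 2-digit month, day,
           hour (24h), minute, second, then 3 fractional-second digits -->
      <xsl:value-of select="format-dateTime(current-dateTime(),
          '[Y0001][M01][D01][H01][m01][s01][f001]')"/>
    </currentDate>
  </xsl:template>
</xsl:stylesheet>
```

For the timestamp shown in the question this should give a value of the form 20140115070102526. The timezone offset is simply dropped, so the digits reflect local time.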

How to convert text to date, then get year value?

Need to convert a text value '2012-03-19' into a date type, then extract the year component.
<xsl:variable name="dateValue" select="2012-03-19"/>
<xsl:variable name="year" select="year-from-date(date($dateValue))"/>
I'm using Saxon 2.0, but it's complaining date function does not exist; I looked around Saxon's documentation and could not find the function, so that's clearly the issue, but I can't find a suitable replacement.
date() is not a built-in function; what you want is the xs:date() constructor function.
Declare the xs namespace and use the xs: prefix, as in xs:date().
The following stylesheet, using any well-formed XML input, will produce 2012:
<xsl:stylesheet version="2.0" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:variable name="dateValue" select="'2012-03-19'"/>
<xsl:variable name="year" select="year-from-date(xs:date($dateValue))"/>
<xsl:value-of select="$year"/>
</xsl:template>
</xsl:stylesheet>
Note that you also have to quote your select in your "dateValue" xsl:variable.
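If the text value might not always be a valid date, a guarded variant could look like the sketch below. It assumes XSLT 2.0, and the fallback string 'invalid date' is only a placeholder chosen for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="dateValue" select="'2012-03-19'"/>
    <!-- Cast only when the string is a valid xs:date, so a bad value
         does not raise a dynamic error -->
    <xsl:value-of select="if ($dateValue castable as xs:date)
                          then year-from-date(xs:date($dateValue))
                          else 'invalid date'"/>
  </xsl:template>
</xsl:stylesheet>
```

With the value shown this outputs 2012; a string such as '2012-13-45' would take the fallback branch instead.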