Burst Mode Vs Fully Streaming XSLT - streaming

I have written an XSLT to transform a huge incoming XML file to JSON using burst mode streaming. I am new to XSLT and have heard that there is a better way of fully streaming XSLT code which is more efficient and faster then burst mode.
Can someone please help me understand -
1. What is the difference between burst mode vs Fully streaming ?
2. How can i convert below XSLT code to fully streaming to improve the perfomance?
Below is my burst mode XSLT code -
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:wd="urn:com.workday.report/INT1109_CR_REV_Customer_Invoices_to_Connect" exclude-result-prefixes="xs" version="3.0">
<xsl:mode streamable="yes" on-no-match="shallow-skip"/>
<xsl:output method="text" encoding="UTF-8" indent="no"/>
<xsl:template match="wd:Report_Data">
<xsl:iterate select="wd:Report_Entry/copy-of()">
<!--Define Running Totals for Statistics -->
<xsl:param name="TotalHeaderCount" select="0"/>
<xsl:param name="TotalLinesCount" select="0"/>
<!--Write Statistics -->
<xsl:on-completion>
<xsl:text>{"Stats": </xsl:text>
<xsl:text>{"Total Header Count": </xsl:text>
<xsl:value-of select="$TotalHeaderCount"/>
<xsl:text>,</xsl:text>
<xsl:text>"Total Lines Count": </xsl:text>
<xsl:value-of select="$TotalLinesCount"/>
<xsl:text>}}</xsl:text>
</xsl:on-completion>
<!--Write Header Details -->
<xsl:text>{"id": "</xsl:text>
<xsl:value-of select="wd:id"/>
<xsl:text>",</xsl:text>
<xsl:text>"revenue_stream": "</xsl:text>
<xsl:value-of select="wd:revenue_stream"/>
<xsl:text>",</xsl:text>
<!--Write Line Details -->
<xsl:text>"lines": [ </xsl:text>
<!-- Count the number of lines for an invoice -->
<xsl:variable name="Linescount" select="wd:total_lines"/>
<xsl:iterate select="wd:lines">
<xsl:text> {</xsl:text>
<xsl:text>"sequence": </xsl:text>
<xsl:value-of select="wd:sequence"/>
<xsl:text>,</xsl:text>
<xsl:text>"sales_item_id": "</xsl:text>
<xsl:value-of select="wd:sales_item_id"/>
<xsl:text>",</xsl:text>
</xsl:iterate>
<xsl:text>}]}
</xsl:text>
<!--Store Running Totals -->
<xsl:next-iteration>
<xsl:with-param name="TotalHeaderCount" select="$TotalHeaderCount + 1"/>
<xsl:with-param name="TotalLinesCount" select="$TotalLinesCount + $Linescount"/>
</xsl:next-iteration>
</xsl:iterate>
</xsl:template>
</xsl:stylesheet>
Here is the sample XML -
<?xml version="1.0" encoding="UTF-8"?>
<wd:Report_Data xmlns:wd="urn:com.workday.report/INT1109_CR_REV_Customer_Invoices_to_Connect">
<wd:Report_Entry>
<wd:id>CUSTOMER_INVOICE-6-1</wd:id>
<wd:revenue_stream>TESTA</wd:revenue_stream>
<wd:total_lines>1</wd:total_lines>
<wd:lines>
<wd:sequence>ab</wd:sequence>
<wd:sales_item_id>Administrative Cost</wd:sales_item_id>
</wd:lines>
</wd:Report_Entry>
<wd:Report_Entry>
<wd:id>CUSTOMER_INVOICE-6-10</wd:id>
<wd:revenue_stream>TESTB</wd:revenue_stream>
<wd:total_lines>1</wd:total_lines>
<wd:lines>
<wd:sequence>ab</wd:sequence>
<wd:sales_item_id>Data - Web Access</wd:sales_item_id>
</wd:lines>
</wd:Report_Entry>
</wd:Report_Data>

If the order of properties in the JSON doesn't matter then you could directly create XSLT/XPath 3 maps and arrays with xsl:map/xsl:map-entry (or the XPath 3.1 map constructor) and the Saxon specific extension element saxon:array (unfortunately the XSLT 3 language standard lacks an instruction to create an array). Furthermore most of your iteration parameters seem to be easily implemented as accumulators:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:saxon="http://saxon.sf.net/"
extension-element-prefixes="saxon"
xpath-default-namespace="urn:com.workday.report/INT1109_CR_REV_Customer_Invoices_to_Connect"
exclude-result-prefixes="#all"
version="3.0">
<xsl:output method="adaptive" indent="yes"/>
<xsl:mode streamable="yes" use-accumulators="#all" on-no-match="shallow-skip"/>
<xsl:accumulator name="header-count" as="xs:integer" initial-value="0" streamable="yes">
<xsl:accumulator-rule match="Report_Entry" select="$value + 1"/>
</xsl:accumulator>
<xsl:accumulator name="lines-count" as="xs:integer" initial-value="0" streamable="yes">
<xsl:accumulator-rule match="Report_Entry/total_lines/text()" select="$value + xs:integer(.)"/>
</xsl:accumulator>
<xsl:template match="Report_Data">
<xsl:apply-templates/>
<xsl:sequence
select="map {
'Stats': map {
'Total Header Count' : accumulator-after('header-count'),
'Total Lines Count' : accumulator-after('lines-count')
}
}"/>
</xsl:template>
<xsl:template match="Report_Entry">
<xsl:map>
<xsl:apply-templates/>
</xsl:map>
</xsl:template>
<xsl:template match="Report_Entry/id | Report_Entry/revenue_stream | lines/sequence | lines/sales_item_id">
<xsl:map-entry key="local-name()" select="string()"/>
</xsl:template>
<xsl:template match="Report_Entry/lines">
<xsl:map-entry key="local-name()">
<saxon:array>
<xsl:apply-templates/>
</saxon:array>
</xsl:map-entry>
</xsl:template>
</xsl:stylesheet>
The example uses output method adaptive as your current sample doesn't create a single JSON object and I have simply tried to create the same output as your current code; the JSON output method would need a single map or array as the main sequence result.
Code works with streaming and Saxon EE 9.9.1.1 in oXygen, unfortunately 9.8 doesn't consider the code streamable.
As for general rules, there are limits as to what you can achieve with accumulators and template matching when streaming; as you can see, the accumulator to sum up the values from the total_lines elements needs to match on the text child to not consume the element in the accumulator (Saxon has another extension of capturing accumulators to ease such tasks however).
So far I would rather say it is more important to find a way to get around the streamability analysis and to have the streamable code return the right and same result as the non-streamable code; for instance, while experimenting with an approach to generate JSON with streaming using two transformation steps where some sample data similar to yours is the input, the XML representation for JSON the result of the first transformation and the JSON supposed to be the result of using xml-to-json on the first step's result I ran into a Saxon bug https://saxonica.plan.io/issues/4215.
With streaming, it seems there is not enough test coverage or implementation maturity to be able to combine features reliably in a complex and scalable way, partly due to a complex spec, partly due to the limited use of that stuff by the XSLT community.
So if you find a working way for a particular problem to use streaming to keep memory consumption lower or manageable compared to the normal XSLT 2/3 tree based processing then you can of course experiment with changes to improve performance but it is easy to break things.
One general observation is that streaming allows you to access all attributes of the currently processed/matched element but not its children, therefore it can help immensely to insert a processing steps that transforms elements into attributes if you have a simple child element structure. That way you can then often avoid any copy-of(). But of course you need a way to combine two stylesheets which Saxon allows with its API but doing it requires writing Java or .NET code.

Related

approach for generating cluster of xpaths from html files in perl

As per one of project requirement, we have to come up with an approach and also suitable perl resource for fullfilling the following requirement.
Our requirement is that we get bulk of input data in html format and we have to write a parser to bring all the varities of input data into one generic formatted xml file.
now the challenge is that while writing parser we manually used to verify very few sample html files, that was failing for other sample html files, but is highly difficult to analyse all html files manually.
Due to the above we have come to a decision that we will have some analysis tool where all the variations in html structure can be monitored using xpaths.
Can you please any of you suggest which module is suitable for my work, I know HTML::TreeBuilder::Xpath will help me in giving xpaths, but any limitations in that module?
or else suggest me the best approach for understanding the analysis of the html file, most of us in our team are more familiar with perl, that's why prefer to go and write in perl.
Suggest me if any other technology can also be used more efficiently, or else within perl also what can be the best approach?
If the html files are guaranteed to be well-formed xhtml, I'd take a look into xslt for transforming the input.
I'd check the output for missing data, since the structure of the input file is unknown. After transforming you can check for the missing data at known paths in the document.
a example how xslt works:
the html input file
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="foo">1</div>
<div>
<div class="bar">2</div>
</div>
</body>
</html>
the stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes" />
<xsl:template match="/">
<!-- match the document root -->
<data>
<!-- apply all templates to the input document -->
<xsl:apply-templates />
</data>
</xsl:template>
<xsl:template match="/html/body/div[#class='foo']">
<!-- div with attribute class='foo' is expected only at /html/body/div -->
<foo-out><xsl:value-of select="."></xsl:value-of></foo-out>
</xsl:template>
<xsl:template match="//div[#class='bar']">
<!--div with attribute class=bar can be anywhere in the document-->
<bar-out><xsl:value-of select="."></xsl:value-of></bar-out>
</xsl:template>
</xsl:stylesheet>
the output
<?xml version="1.0"?>
<data>
<foo-out>1</foo-out>
<bar-out>2</bar-out>
</data>

current-dateTime() to number

I need to convert the current-dateTime() to a number, because the output only allows an integer (or long). How can I do that?
My desired output:
<?xml version="1.0" encoding="utf-8"?>
<currentDate>NaN</currentDate>
My XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" version="2.0">
<xsl:output method="xml" encoding="utf-8" indent="yes"/>
<xsl:variable name="version" select="'1.0'"/>
<xsl:template match="/">
<xsl:element name="currentDate">
<xsl:value-of select="number(current-dateTime())"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
But I get this here:
<?xml version="1.0" encoding="utf-8"?>
<currentDate>NaN</currentDate>
When I remove the number() I do get the timestamp, but I need to have a clean number.
<?xml version="1.0" encoding="utf-8"?>
<currentDate>2014-01-15T07:01:02.526+01:00</currentDate>
How can I convert the timestamp to a clean number?
As mentioned, to convert "2014-01-15T07:01:02.526+01:00" to "20140115070102526" you can use the translate() function (this would work with XSLT 1.0):
<xsl:value-of select="translate(substring(string(current-dateTime()), 1, 23), '-:T.', '')"/>
Alternatively, you can calculate the duration between the current date and time and an arbitrary start and divide this duration by a millisecond to get the number of milliseconds since that date:
<xsl:value-of select="(current-dateTime() - xs:dateTime('1971-01-01T00:00:00')) div xs:dayTimeDuration('PT0.001S')"/>
In XSLT 2.0, you can use the format-dateTime() function and reformat the date: just use a picture parameter with no separators.
Alternatively - assuming all component values are already padded with zeros to maintain a fixed length - you could simply translate the separator characters away.

How to convert text to date, then get year value?

Need to convert a text value `2012-03-19' into a date type, then extract the year component.
<xsl:variable name="dateValue" select="2012-03-19"/>
<xsl:variable name="year" select="year-from-date(date($dateValue))"/>
I'm using Saxon 2.0, but it's complaining date function does not exist; I looked around Saxon's documentation and could not find the function, so that's clearly the issue, but I can't find a suitable replacement.
I don't think date() should be a function, you want the xs:date() data type.
Add the xs namespace and then prefix xs:date().
The following stylesheet, using any well-formed XML input, will produce 2012:
<xsl:stylesheet version="2.0" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:variable name="dateValue" select="'2012-03-19'"/>
<xsl:variable name="year" select="year-from-date(xs:date($dateValue))"/>
<xsl:value-of select="$year"/>
</xsl:template>
</xsl:stylesheet>
Note that you also have to quote your select in your "dateValue" xsl:variable.

Umbraco XSLT RenderTemplate Woes

Needing a little guidance with XSLT and Umbraco. Fairly new to XSLT but I think I understand the concepts. Right on one page I have two columns, each with their own separate pickable content. This is done via the standard content picker property (one for each column). The issue is that I need to be able to have two different templates on the page. So in essence the page navigated two which has the columns has to render two of it's child items in its own page.
I have this working with one column using a generic XSLT which I found that basically just renders what ever child item it finds, but I want it to render what ever one the user picked.
I know the Content Picker returns the XML Node ID of the page and that can be used with the Render Template function to display it (I have an example of that), but when it comes to adding in my own properties and then passing them to the RenderTemplate function I get lost. New to this XSLT :).
So far I have...
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp " "> ]>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxml="urn:schemas-microsoft-com:xslt"
xmlns:umbraco.library="urn:umbraco.library"
xmlns:Exslt.ExsltCommon="urn:Exslt.ExsltCommon"
xmlns:Exslt.ExsltDatesAndTimes="urn:Exslt.ExsltDatesAndTimes"
xmlns:Exslt.ExsltMath="urn:Exslt.ExsltMath"
xmlns:Exslt.ExsltRegularExpressions="urn:Exslt.ExsltRegularExpressions"
xmlns:Exslt.ExsltStrings="urn:Exslt.ExsltStrings"
xmlns:Exslt.ExsltSets="urn:Exslt.ExsltSets">
<xsl:output method="html" omit-xml-declaration="yes"/>
<xsl:param name="currentPage"/>
<xsl:variable name="nodeID" select="data[#alias='leftColumn']"/>
<xsl:template match="/">
<xsl:value-of select="umbraco.library:RenderTemplate($nodeID)" disable-output-escaping="yes"/>
</xsl:template>
</xsl:stylesheet>
Any one know why this doesn't work and how to do what I'm after? The above gives a value was either too large or too small error.
You actually have two problems here ...
Calling RenderTemplate()
RenderTemplate actually requires two arguments when using an alternative template, the first being the content node ID, and the second being the chosen template you want applied.
<xsl:value-of
select="umbraco.library:RenderTemplate($nodeID, $templateID)"
disable-output-escaping="yes" />
See the following link for more information : http://our.umbraco.org/wiki/reference/umbracolibrary/rendertemplate
Too large or too small error
This one is simple to fix by putting an if-empty statement around the the code in question.
<xsl:if test="$nodeID != ''">
<xsl:value-of
select="umbraco.library:RenderTemplate($nodeID, $templateID)"
disable-output-escaping="yes" />
</xsl:if>
The XSLT parser likes to assume certain values are null, when in reality they aren't. Another way to get by this is to check the Skip Errors checkbox when saving, but this makes debugging actual erroneous code a bit of a pain.
Hope it helps.

XSLT2 doesn't import user functions?

I've just found that if main.xsl xsl:imports lib.xsl which defines some XSLT2 functions, those functions can't be used in main.xsl.
Error: There's no function in namespace http://foo.bar/my-library-ns
However, xsl:include does import those functions.
This is no reproducible
This stylesheet:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:local="http://localhost/">
<xsl:import href="lib.xsl"/>
<xsl:template match="/">
<xsl:value-of select="local:function()"/>
</xsl:template>
</xsl:stylesheet>
With this imported module:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:local="http://localhost/"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:function name="local:function" as="xs:boolean">
<xsl:sequence select="true()"/>
</xsl:function>
</xsl:stylesheet>
Output*:
true
* Tested on Saxon and Altova. (My XQSharp has expired...)
I've just found that if main.xsl
xsl:imports lib.xsl which defines some
XSLT2 functions, those functions can't
be used in main.xsl
Of course, this isn't true.
Do have a look at the FXSL library, whose templates contain functions that must be imported by the using XSLT code. The stylesheet modules of FXSL in fact import other stylesheet modules and use their functions.
Here is a simple example:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://fxsl.sf.net/"
exclude-result-prefixes="f"
>
<xsl:import href="../f/func-foldl.xsl"/>
<xsl:import href="../f/func-Operators.xsl"/>
<!--
This transformation calculates 10!
Expected result: 3628800 or 3.6288E6
-->
<xsl:output encoding="UTF-8" omit-xml-declaration="yes"/>
<xsl:template match="/">
<xsl:value-of select="f:foldl(f:mult(), 1, 1 to 10 )"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on any XML document (not used), it calculates the 10! (10 factorial) value:
3628800
All three XSLT 2.0 processors I am working with: Saxon 9, AltovaXML and XQSharp produce this same result.
Here is a partial view at the import hierarchy of the above transformation:
As we can see, only less than half of all imports are shown... :)
Of course here we only see the imported templates, but in every imported stylesheet module, for every template there are at least two <xsl:function>s.