How to specify multiple files as xmlSource in xslt task? - visual-studio-code

I am currently transforming xml to html with an xslt stylesheet. I use VS Code with the XSLT/XPath extension and have a tasks.json to do the transformation. It works fine for a single xml file, but I actually have multiple files, for which the same template applys and which should all be in the same html file.
My working tasks.json for a single file:
{
"version": "2.0.0",
"tasks": [
{
"type": "xslt-js",
"label": "Saxon-JS Transform (Briefe)",
"xsltFile": "briefe/tei_to_html.xsl",
"xmlSource": "briefe/3690096/3690096.xml",
"resultPath": "edition.html",
"group": {
"kind": "build",
"isDefault": true
},
"problemMatcher": [
"$saxon-xslt-js"
]
}
]
}
I tried to use wildcards for the source "briefe/**/*.xml", which results in an error: Failed to read XML source input (Failed to read ... /briefe/**/*.xml (no such file))
An array of the files (it's only a handful, so I could type them all in) is not possible, because xmlSource only accepts string.
I suppose I am able to handle multiple xml files in my stylesheet with for-each and document, but how do I specify multiple files as source in my task?
Update:
The goal is to process multiple xml files with one XSLT, to produce one result file.
I figured out, that I could feed a json as source:
...
"useJsonSource": true,
"xmlSource": "briefe/list.json",
...
I am trying to find out if I can process the json in xslt and this is the solution.

Assuming the list.json is e.g. an array of strings with file names ["file1.xml", "file2.xml", "file3.xml"] your XSLT 3 would process e.g.
<xsl:template match=".[. instance of array(xs:string)]">
<xsl:apply-templates select="?*!doc(.)"/>
</xsl:template>
Depending on the rest of your stylesheet you might want to output the HTML document structure around that xsl:apply-templates e.g.
<xsl:template match=".[. instance of array(xs:string)]">
<html>
<head>
<title>Example</title>
</head>
<body>
<xsl:apply-templates select="?*!doc(.)"/>
</body>
</html>
</xsl:template>
While SaxonJS certainly has a -json option to process a JSON input file I haven't checked how/whether that interacts nicely with the xslt-js task in VS Code so be prepared to experiment how it works.
From the command line you would run e.g. xslt3 -json:list.json -xsl:xslt.xsl to have a list.json containing a JSON array with a string of file names processed by the stylesheet xslt.xsl where the template shown above would match that array and push each file named as a document node created with the doc function to any of your other templates.
Another option would be to pass the file names as a global parameter and to start with the named template name="xsl:initial-template", option -it to do e.g.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:param name="file-names" as="xs:string*" select="'file1.xml', 'file2.xml', 'file3.xml'"/>
<xsl:template name="xsl:initial-template">
<html>
<head>
<title>Example</title>
</head>
<body>
<xsl:apply-templates select="$file-names!doc(.)"/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

I solved this task with building a collection in a separate file letters.xml and used this as xmlSource.
The collection:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.w3.org/2001/XMLSchema"?>
<collection stable="true">
<doc href="letter_1.xml"/>
<doc href="letter_2.xml"/>
</collection>
With the last comment I switched from xslt-js to xslt, because using xslt-js was not a deliberate decision.
My tasks.json:
{
"version": "2.0.0",
"tasks": [
{
"type": "xslt",
"label": "xslt: Saxon Transform (Briefe Ebner-Eschenbach)",
"saxonJar": "SaxonHE11-3J/saxon-he-11.3.jar",
"xsltFile": "tei_to_html.xsl",
"xmlSource": "letters.xml",
"resultPath": "index.html",
"group": {
"kind": "build",
"isDefault": true
},
"problemMatcher": [
"$saxon-xslt"
]
}
]
}
In my tei_to_html.xsl I use the collection in a variable and create pages for the letters:
<xsl:variable name="letters" select="collection('briefe.xml')"/>
[...]
<xsl:for-each select="$letters">
<xsl:result-document href="{concat('letter_', position(),'.html')}" method="html">
<xsl:call-template name="letter_page"/>
</xsl:result-document>
</xsl:for-each>
Thank you all for your help and input! I definitely learned a lot.

Related

Burst Mode Vs Fully Streaming XSLT

I have written an XSLT to transform a huge incoming XML file to JSON using burst mode streaming. I am new to XSLT and have heard that there is a better way of fully streaming XSLT code which is more efficient and faster then burst mode.
Can someone please help me understand -
1. What is the difference between burst mode vs Fully streaming ?
2. How can i convert below XSLT code to fully streaming to improve the perfomance?
Below is my burst mode XSLT code -
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:wd="urn:com.workday.report/INT1109_CR_REV_Customer_Invoices_to_Connect" exclude-result-prefixes="xs" version="3.0">
<xsl:mode streamable="yes" on-no-match="shallow-skip"/>
<xsl:output method="text" encoding="UTF-8" indent="no"/>
<xsl:template match="wd:Report_Data">
<xsl:iterate select="wd:Report_Entry/copy-of()">
<!--Define Running Totals for Statistics -->
<xsl:param name="TotalHeaderCount" select="0"/>
<xsl:param name="TotalLinesCount" select="0"/>
<!--Write Statistics -->
<xsl:on-completion>
<xsl:text>{"Stats": </xsl:text>
<xsl:text>{"Total Header Count": </xsl:text>
<xsl:value-of select="$TotalHeaderCount"/>
<xsl:text>,</xsl:text>
<xsl:text>"Total Lines Count": </xsl:text>
<xsl:value-of select="$TotalLinesCount"/>
<xsl:text>}}</xsl:text>
</xsl:on-completion>
<!--Write Header Details -->
<xsl:text>{"id": "</xsl:text>
<xsl:value-of select="wd:id"/>
<xsl:text>",</xsl:text>
<xsl:text>"revenue_stream": "</xsl:text>
<xsl:value-of select="wd:revenue_stream"/>
<xsl:text>",</xsl:text>
<!--Write Line Details -->
<xsl:text>"lines": [ </xsl:text>
<!-- Count the number of lines for an invoice -->
<xsl:variable name="Linescount" select="wd:total_lines"/>
<xsl:iterate select="wd:lines">
<xsl:text> {</xsl:text>
<xsl:text>"sequence": </xsl:text>
<xsl:value-of select="wd:sequence"/>
<xsl:text>,</xsl:text>
<xsl:text>"sales_item_id": "</xsl:text>
<xsl:value-of select="wd:sales_item_id"/>
<xsl:text>",</xsl:text>
</xsl:iterate>
<xsl:text>}]}
</xsl:text>
<!--Store Running Totals -->
<xsl:next-iteration>
<xsl:with-param name="TotalHeaderCount" select="$TotalHeaderCount + 1"/>
<xsl:with-param name="TotalLinesCount" select="$TotalLinesCount + $Linescount"/>
</xsl:next-iteration>
</xsl:iterate>
</xsl:template>
</xsl:stylesheet>
Here is the sample XML -
<?xml version="1.0" encoding="UTF-8"?>
<wd:Report_Data xmlns:wd="urn:com.workday.report/INT1109_CR_REV_Customer_Invoices_to_Connect">
<wd:Report_Entry>
<wd:id>CUSTOMER_INVOICE-6-1</wd:id>
<wd:revenue_stream>TESTA</wd:revenue_stream>
<wd:total_lines>1</wd:total_lines>
<wd:lines>
<wd:sequence>ab</wd:sequence>
<wd:sales_item_id>Administrative Cost</wd:sales_item_id>
</wd:lines>
</wd:Report_Entry>
<wd:Report_Entry>
<wd:id>CUSTOMER_INVOICE-6-10</wd:id>
<wd:revenue_stream>TESTB</wd:revenue_stream>
<wd:total_lines>1</wd:total_lines>
<wd:lines>
<wd:sequence>ab</wd:sequence>
<wd:sales_item_id>Data - Web Access</wd:sales_item_id>
</wd:lines>
</wd:Report_Entry>
</wd:Report_Data>
If the order of properties in the JSON doesn't matter then you could directly create XSLT/XPath 3 maps and arrays with xsl:map/xsl:map-entry (or the XPath 3.1 map constructor) and the Saxon specific extension element saxon:array (unfortunately the XSLT 3 language standard lacks an instruction to create an array). Furthermore most of your iteration parameters seem to be easily implemented as accumulators:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:saxon="http://saxon.sf.net/"
extension-element-prefixes="saxon"
xpath-default-namespace="urn:com.workday.report/INT1109_CR_REV_Customer_Invoices_to_Connect"
exclude-result-prefixes="#all"
version="3.0">
<xsl:output method="adaptive" indent="yes"/>
<xsl:mode streamable="yes" use-accumulators="#all" on-no-match="shallow-skip"/>
<xsl:accumulator name="header-count" as="xs:integer" initial-value="0" streamable="yes">
<xsl:accumulator-rule match="Report_Entry" select="$value + 1"/>
</xsl:accumulator>
<xsl:accumulator name="lines-count" as="xs:integer" initial-value="0" streamable="yes">
<xsl:accumulator-rule match="Report_Entry/total_lines/text()" select="$value + xs:integer(.)"/>
</xsl:accumulator>
<xsl:template match="Report_Data">
<xsl:apply-templates/>
<xsl:sequence
select="map {
'Stats': map {
'Total Header Count' : accumulator-after('header-count'),
'Total Lines Count' : accumulator-after('lines-count')
}
}"/>
</xsl:template>
<xsl:template match="Report_Entry">
<xsl:map>
<xsl:apply-templates/>
</xsl:map>
</xsl:template>
<xsl:template match="Report_Entry/id | Report_Entry/revenue_stream | lines/sequence | lines/sales_item_id">
<xsl:map-entry key="local-name()" select="string()"/>
</xsl:template>
<xsl:template match="Report_Entry/lines">
<xsl:map-entry key="local-name()">
<saxon:array>
<xsl:apply-templates/>
</saxon:array>
</xsl:map-entry>
</xsl:template>
</xsl:stylesheet>
The example uses output method adaptive as your current sample doesn't create a single JSON object and I have simply tried to create the same output as your current code; the JSON output method would need a single map or array as the main sequence result.
Code works with streaming and Saxon EE 9.9.1.1 in oXygen, unfortunately 9.8 doesn't consider the code streamable.
As for general rules, there are limits as to what you can achieve with accumulators and template matching when streaming; as you can see, the accumulator to sum up the values from the total_lines elements needs to match on the text child to not consume the element in the accumulator (Saxon has another extension of capturing accumulators to ease such tasks however).
So far I would rather say it is more important to find a way to get around the streamability analysis and to have the streamable code return the right and same result as the non-streamable code; for instance, while experimenting with an approach to generate JSON with streaming using two transformation steps where some sample data similar to yours is the input, the XML representation for JSON the result of the first transformation and the JSON supposed to be the result of using xml-to-json on the first step's result I ran into a Saxon bug https://saxonica.plan.io/issues/4215.
With streaming, it seems there is not enough test coverage or implementation maturity to be able to combine features reliably in a complex and scalable way, partly due to a complex spec, partly due to the limited use of that stuff by the XSLT community.
So if you find a working way for a particular problem to use streaming to keep memory consumption lower or manageable compared to the normal XSLT 2/3 tree based processing then you can of course experiment with changes to improve performance but it is easy to break things.
One general observation is that streaming allows you to access all attributes of the currently processed/matched element but not its children, therefore it can help immensely to insert a processing steps that transforms elements into attributes if you have a simple child element structure. That way you can then often avoid any copy-of(). But of course you need a way to combine two stylesheets which Saxon allows with its API but doing it requires writing Java or .NET code.

approach for generating cluster of xpaths from html files in perl

As per one of project requirement, we have to come up with an approach and also suitable perl resource for fullfilling the following requirement.
Our requirement is that we get bulk of input data in html format and we have to write a parser to bring all the varities of input data into one generic formatted xml file.
now the challenge is that while writing parser we manually used to verify very few sample html files, that was failing for other sample html files, but is highly difficult to analyse all html files manually.
Due to the above we have come to a decision that we will have some analysis tool where all the variations in html structure can be monitored using xpaths.
Can you please any of you suggest which module is suitable for my work, I know HTML::TreeBuilder::Xpath will help me in giving xpaths, but any limitations in that module?
or else suggest me the best approach for understanding the analysis of the html file, most of us in our team are more familiar with perl, that's why prefer to go and write in perl.
Suggest me if any other technology can also be used more efficiently, or else within perl also what can be the best approach?
If the html files are guaranteed to be well-formed xhtml, I'd take a look into xslt for transforming the input.
I'd check the output for missing data, since the structure of the input file is unknown. After transforming you can check for the missing data at known paths in the document.
a example how xslt works:
the html input file
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="foo">1</div>
<div>
<div class="bar">2</div>
</div>
</body>
</html>
the stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes" />
<xsl:template match="/">
<!-- match the document root -->
<data>
<!-- apply all templates to the input document -->
<xsl:apply-templates />
</data>
</xsl:template>
<xsl:template match="/html/body/div[#class='foo']">
<!-- div with attribute class='foo' is expected only at /html/body/div -->
<foo-out><xsl:value-of select="."></xsl:value-of></foo-out>
</xsl:template>
<xsl:template match="//div[#class='bar']">
<!--div with attribute class=bar can be anywhere in the document-->
<bar-out><xsl:value-of select="."></xsl:value-of></bar-out>
</xsl:template>
</xsl:stylesheet>
the output
<?xml version="1.0"?>
<data>
<foo-out>1</foo-out>
<bar-out>2</bar-out>
</data>

Umbraco XSLT RenderTemplate Woes

Needing a little guidance with XSLT and Umbraco. Fairly new to XSLT but I think I understand the concepts. Right on one page I have two columns, each with their own separate pickable content. This is done via the standard content picker property (one for each column). The issue is that I need to be able to have two different templates on the page. So in essence the page navigated two which has the columns has to render two of it's child items in its own page.
I have this working with one column using a generic XSLT which I found that basically just renders what ever child item it finds, but I want it to render what ever one the user picked.
I know the Content Picker returns the XML Node ID of the page and that can be used with the Render Template function to display it (I have an example of that), but when it comes to adding in my own properties and then passing them to the RenderTemplate function I get lost. New to this XSLT :).
So far I have...
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp " "> ]>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxml="urn:schemas-microsoft-com:xslt"
xmlns:umbraco.library="urn:umbraco.library"
xmlns:Exslt.ExsltCommon="urn:Exslt.ExsltCommon"
xmlns:Exslt.ExsltDatesAndTimes="urn:Exslt.ExsltDatesAndTimes"
xmlns:Exslt.ExsltMath="urn:Exslt.ExsltMath"
xmlns:Exslt.ExsltRegularExpressions="urn:Exslt.ExsltRegularExpressions"
xmlns:Exslt.ExsltStrings="urn:Exslt.ExsltStrings"
xmlns:Exslt.ExsltSets="urn:Exslt.ExsltSets">
<xsl:output method="html" omit-xml-declaration="yes"/>
<xsl:param name="currentPage"/>
<xsl:variable name="nodeID" select="data[#alias='leftColumn']"/>
<xsl:template match="/">
<xsl:value-of select="umbraco.library:RenderTemplate($nodeID)" disable-output-escaping="yes"/>
</xsl:template>
</xsl:stylesheet>
Any one know why this doesn't work and how to do what I'm after? The above gives a value was either too large or too small error.
You actually have two problems here ...
Calling RenderTemplate()
RenderTemplate actually requires two arguments when using an alternative template, the first being the content node ID, and the second being the chosen template you want applied.
<xsl:value-of
select="umbraco.library:RenderTemplate($nodeID, $templateID)"
disable-output-escaping="yes" />
See the following link for more information : http://our.umbraco.org/wiki/reference/umbracolibrary/rendertemplate
Too large or too small error
This one is simple to fix by putting an if-empty statement around the the code in question.
<xsl:if test="$nodeID != ''">
<xsl:value-of
select="umbraco.library:RenderTemplate($nodeID, $templateID)"
disable-output-escaping="yes" />
</xsl:if>
The XSLT parser likes to assume certain values are null, when in reality they aren't. Another way to get by this is to check the Skip Errors checkbox when saving, but this makes debugging actual erroneous code a bit of a pain.
Hope it helps.

How to get a part of html file?

I just want to get a part of an website within all the html-tags:
<table></table>
...
<div><font>some <b>kind</b> of <i>individual</i> text I need</font></div>
...
<div>other things I don't need</div>
-> I only want this: <font>some <b>kind</b> of <i>individual</i> text I need</font>
My goal is it to display this part with bold tags and images in a UIWebView. I've tried some XPath parser but these skipped the tags which I wanted to display in the web view.
On Stackoverflow I found a solution with java script: extract-part-of-html-in-c-objective-c but I don't know how this could help me in my ios application
Hopefully someone can help me
You may find this useful: (see the Demo inside this article)
http://api.jquery.com/html/
Its almost everything that you need, except the "make tags bold" part
update: includes getting content from separate url
http://api.jquery.com/jQuery.get/
$.get("http://www.website_i_need_to_parce.com", function(data){
/// work with "data" variable as you work with "document"
var htmlStr = data.html().find('#someDiv');
});
After this call - htmlStr will contain contents of the div with id="someDiv". If you need to paste these contents as html - use:
$('#div_on_my_site_where_I_Want_to_paste_code').text(htmlStr);
Supposing the context node is the parent of the div and the div is the first div child of the context node (You haven't provide the complete source XML !!!), this XPath expression selects the wanted nodes:
div[1]/node()
XSLT - based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/t">
<xsl:copy-of select="div[1]/node()"/>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the provided XML fragment (wrapped into a single top node to make it a well-formed XML document):
<t>
<table></table>
...
<div>
<font>some
<b>kind</b> of
<i>individual</i> text I need
</font>
</div>
...
<div>other things I don't need</div>
</t>
the wanted, correct result is produced:
<font>some
<b>kind</b> of
<i>individual</i> text I need
</font>
Explanation: The XPath expression above selects all children nodes of the first div child of the context node. This is exactly what is wanted: all children of the div element but excluding the div element itself.

How to add a colorized HTML code snippet in Sandcastle documentation?

I am using the Sandcastle Help File Builder and would like to include colorized HTML code snippets in the "Conceptual Content". Is this possible and if so, how?
I have tried <code>, <codeExample>, and <sampleCode language="HTML" />.
The best result so far is to HTML-encode the sample HTML and place it in a .snippets file like so.
<?xml version="1.0" encoding="utf-8" ?>
<examples>
<item id="htmlSnippet">
<sampleCode language="HTML">
<span>My Html</span>
</sampleCode>
</item>
</examples>
Then reference it in the .aml file.
<codeReference>htmlSnippet</codeReference>
I would prefer to have it colorized, but I can't figure out a way to add the formatting.
According to the MAML Guide, the proper way of doing this is to use a <code> tag with a CDATA section:
<code language="xml" title="Example Configuration">
<![CDATA[
<span>My Html</span>]]>
</code>
The contents of the CDATA section will be treated as a literal string, and indentation will be preserved.
I know this is old, but Sandcastle supports html as xml. I figured I should comment in case anyone else comes across this post as I did.
This should work:
<?xml version="1.0" encoding="utf-8" ?>
<examples>
<item id="htmlSnippet">
<sampleCode language="xml"><!CDATA[[
<span>My Html</span>
]]>
</sampleCode>
</item>
</examples>
If you're using Sandcastle Help File Builder, you may create your own syntax parser as described here and here, although xml is available by default... using the XAML filter's generator, which is defined here if you want to look at the config:
<generator type="Microsoft.Ddue.Tools.XamlUsageSyntaxGenerator"
assembly="{#SandcastlePath}ProductionTools\SyntaxComponents.dll">
<filter files="{#SandcastlePath}Presentation\Shared\configuration\xamlSyntax.config" />
</generator>
According to the SHFB documentation for the Code Block Component, you should be able to just use the <code>.
I got it to work without a problem; here's what I did:
test.html
<html>
<head>Something!</head>
<body>
<h1>Heading</h1>
<!-- #region myhtml -->
<p>Paragraph</p>
<div>Div for <strong>Good</strong> <em>measure</em>.</div>
<!-- #endregion -->
</body>
</html>
SomethingorOther.aml
<code language="html" source="../Examples/test.html" region="myhtml" />
Result:
Please note that in the preview, your sample will appear as unhighlighted XML, but when you build the documentation, everything should like just fine.