How to use Group By in Marklogic? - group-by

I want to use Group By in xquery. Can someone tell me please how to use Group By in Marklogic ?

Alternatively, you could call out to XSLT using xdmp:xslt-invoke or xdmp:xslt-eval. MarkLogic's XSLT processor supports XSLT 2.0, which includes full support for <xsl:for-each-group>.

The short answer is to use map:map. See http://docs.marklogic.com/map:map for documentation, and http://blakeley.com/blogofile/archives/560/ for a longer discussion.

xquery version "1.0-ml";
let $xml:= <Students>
<Student Country="England" Name="Dan" Age="20" Class="C"/>
<Student Country="England" Name="Maria" Age="20" Class="B" />
<Student Country="Australia" Name="Mark" Age="22" Class="A" />
</Students>
let $xsl:= <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fn="http://www.w3.org/2005/xpath-functions">
<xsl:template match="Students">
<result>
<xsl:for-each-group select="Student" group-by="#Country">
<country>
<xsl:attribute name="name"><xsl:value-of select="fn:current-grouping-key()"/></xsl:attribute>
<xsl:for-each select="fn:current-group()/#Name">
<name><xsl:value-of select="."/></name>
</xsl:for-each>
</country>
</xsl:for-each-group>
</result>
</xsl:template>
</xsl:stylesheet>
return xdmp:xslt-eval($xsl,$xml)

MarkLogic covers parts of XQuery 3.0 (with its 1.0-ml dialect), but unfortunately FLWOR group by support is lacking.
However, you can still programmatically create group by like syntax which will achieve the same results. Here is an XQuery example:
for $d in distinct-values(doc("order.xml")//item/#dept)
let $items := doc("order.xml")//item[#dept = $d]
order by $d
return <department code="{$d}">{
for $i in $items
order by $i/#num
return $i
}</department>
HTH

Related

Burst Mode Vs Fully Streaming XSLT

I have written an XSLT to transform a huge incoming XML file to JSON using burst mode streaming. I am new to XSLT and have heard that there is a better way of fully streaming XSLT code which is more efficient and faster then burst mode.
Can someone please help me understand -
1. What is the difference between burst mode vs Fully streaming ?
2. How can i convert below XSLT code to fully streaming to improve the perfomance?
Below is my burst mode XSLT code -
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:wd="urn:com.workday.report/INT1109_CR_REV_Customer_Invoices_to_Connect" exclude-result-prefixes="xs" version="3.0">
<xsl:mode streamable="yes" on-no-match="shallow-skip"/>
<xsl:output method="text" encoding="UTF-8" indent="no"/>
<xsl:template match="wd:Report_Data">
<xsl:iterate select="wd:Report_Entry/copy-of()">
<!--Define Running Totals for Statistics -->
<xsl:param name="TotalHeaderCount" select="0"/>
<xsl:param name="TotalLinesCount" select="0"/>
<!--Write Statistics -->
<xsl:on-completion>
<xsl:text>{"Stats": </xsl:text>
<xsl:text>{"Total Header Count": </xsl:text>
<xsl:value-of select="$TotalHeaderCount"/>
<xsl:text>,</xsl:text>
<xsl:text>"Total Lines Count": </xsl:text>
<xsl:value-of select="$TotalLinesCount"/>
<xsl:text>}}</xsl:text>
</xsl:on-completion>
<!--Write Header Details -->
<xsl:text>{"id": "</xsl:text>
<xsl:value-of select="wd:id"/>
<xsl:text>",</xsl:text>
<xsl:text>"revenue_stream": "</xsl:text>
<xsl:value-of select="wd:revenue_stream"/>
<xsl:text>",</xsl:text>
<!--Write Line Details -->
<xsl:text>"lines": [ </xsl:text>
<!-- Count the number of lines for an invoice -->
<xsl:variable name="Linescount" select="wd:total_lines"/>
<xsl:iterate select="wd:lines">
<xsl:text> {</xsl:text>
<xsl:text>"sequence": </xsl:text>
<xsl:value-of select="wd:sequence"/>
<xsl:text>,</xsl:text>
<xsl:text>"sales_item_id": "</xsl:text>
<xsl:value-of select="wd:sales_item_id"/>
<xsl:text>",</xsl:text>
</xsl:iterate>
<xsl:text>}]}
</xsl:text>
<!--Store Running Totals -->
<xsl:next-iteration>
<xsl:with-param name="TotalHeaderCount" select="$TotalHeaderCount + 1"/>
<xsl:with-param name="TotalLinesCount" select="$TotalLinesCount + $Linescount"/>
</xsl:next-iteration>
</xsl:iterate>
</xsl:template>
</xsl:stylesheet>
Here is the sample XML -
<?xml version="1.0" encoding="UTF-8"?>
<wd:Report_Data xmlns:wd="urn:com.workday.report/INT1109_CR_REV_Customer_Invoices_to_Connect">
<wd:Report_Entry>
<wd:id>CUSTOMER_INVOICE-6-1</wd:id>
<wd:revenue_stream>TESTA</wd:revenue_stream>
<wd:total_lines>1</wd:total_lines>
<wd:lines>
<wd:sequence>ab</wd:sequence>
<wd:sales_item_id>Administrative Cost</wd:sales_item_id>
</wd:lines>
</wd:Report_Entry>
<wd:Report_Entry>
<wd:id>CUSTOMER_INVOICE-6-10</wd:id>
<wd:revenue_stream>TESTB</wd:revenue_stream>
<wd:total_lines>1</wd:total_lines>
<wd:lines>
<wd:sequence>ab</wd:sequence>
<wd:sales_item_id>Data - Web Access</wd:sales_item_id>
</wd:lines>
</wd:Report_Entry>
</wd:Report_Data>
If the order of properties in the JSON doesn't matter then you could directly create XSLT/XPath 3 maps and arrays with xsl:map/xsl:map-entry (or the XPath 3.1 map constructor) and the Saxon specific extension element saxon:array (unfortunately the XSLT 3 language standard lacks an instruction to create an array). Furthermore most of your iteration parameters seem to be easily implemented as accumulators:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:saxon="http://saxon.sf.net/"
extension-element-prefixes="saxon"
xpath-default-namespace="urn:com.workday.report/INT1109_CR_REV_Customer_Invoices_to_Connect"
exclude-result-prefixes="#all"
version="3.0">
<xsl:output method="adaptive" indent="yes"/>
<xsl:mode streamable="yes" use-accumulators="#all" on-no-match="shallow-skip"/>
<xsl:accumulator name="header-count" as="xs:integer" initial-value="0" streamable="yes">
<xsl:accumulator-rule match="Report_Entry" select="$value + 1"/>
</xsl:accumulator>
<xsl:accumulator name="lines-count" as="xs:integer" initial-value="0" streamable="yes">
<xsl:accumulator-rule match="Report_Entry/total_lines/text()" select="$value + xs:integer(.)"/>
</xsl:accumulator>
<xsl:template match="Report_Data">
<xsl:apply-templates/>
<xsl:sequence
select="map {
'Stats': map {
'Total Header Count' : accumulator-after('header-count'),
'Total Lines Count' : accumulator-after('lines-count')
}
}"/>
</xsl:template>
<xsl:template match="Report_Entry">
<xsl:map>
<xsl:apply-templates/>
</xsl:map>
</xsl:template>
<xsl:template match="Report_Entry/id | Report_Entry/revenue_stream | lines/sequence | lines/sales_item_id">
<xsl:map-entry key="local-name()" select="string()"/>
</xsl:template>
<xsl:template match="Report_Entry/lines">
<xsl:map-entry key="local-name()">
<saxon:array>
<xsl:apply-templates/>
</saxon:array>
</xsl:map-entry>
</xsl:template>
</xsl:stylesheet>
The example uses output method adaptive as your current sample doesn't create a single JSON object and I have simply tried to create the same output as your current code; the JSON output method would need a single map or array as the main sequence result.
Code works with streaming and Saxon EE 9.9.1.1 in oXygen, unfortunately 9.8 doesn't consider the code streamable.
As for general rules, there are limits as to what you can achieve with accumulators and template matching when streaming; as you can see, the accumulator to sum up the values from the total_lines elements needs to match on the text child to not consume the element in the accumulator (Saxon has another extension of capturing accumulators to ease such tasks however).
So far I would rather say it is more important to find a way to get around the streamability analysis and to have the streamable code return the right and same result as the non-streamable code; for instance, while experimenting with an approach to generate JSON with streaming using two transformation steps where some sample data similar to yours is the input, the XML representation for JSON the result of the first transformation and the JSON supposed to be the result of using xml-to-json on the first step's result I ran into a Saxon bug https://saxonica.plan.io/issues/4215.
With streaming, it seems there is not enough test coverage or implementation maturity to be able to combine features reliably in a complex and scalable way, partly due to a complex spec, partly due to the limited use of that stuff by the XSLT community.
So if you find a working way for a particular problem to use streaming to keep memory consumption lower or manageable compared to the normal XSLT 2/3 tree based processing then you can of course experiment with changes to improve performance but it is easy to break things.
One general observation is that streaming allows you to access all attributes of the currently processed/matched element but not its children, therefore it can help immensely to insert a processing steps that transforms elements into attributes if you have a simple child element structure. That way you can then often avoid any copy-of(). But of course you need a way to combine two stylesheets which Saxon allows with its API but doing it requires writing Java or .NET code.

approach for generating cluster of xpaths from html files in perl

As per one of project requirement, we have to come up with an approach and also suitable perl resource for fullfilling the following requirement.
Our requirement is that we get bulk of input data in html format and we have to write a parser to bring all the varities of input data into one generic formatted xml file.
now the challenge is that while writing parser we manually used to verify very few sample html files, that was failing for other sample html files, but is highly difficult to analyse all html files manually.
Due to the above we have come to a decision that we will have some analysis tool where all the variations in html structure can be monitored using xpaths.
Can you please any of you suggest which module is suitable for my work, I know HTML::TreeBuilder::Xpath will help me in giving xpaths, but any limitations in that module?
or else suggest me the best approach for understanding the analysis of the html file, most of us in our team are more familiar with perl, that's why prefer to go and write in perl.
Suggest me if any other technology can also be used more efficiently, or else within perl also what can be the best approach?
If the html files are guaranteed to be well-formed xhtml, I'd take a look into xslt for transforming the input.
I'd check the output for missing data, since the structure of the input file is unknown. After transforming you can check for the missing data at known paths in the document.
a example how xslt works:
the html input file
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="foo">1</div>
<div>
<div class="bar">2</div>
</div>
</body>
</html>
the stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes" />
<xsl:template match="/">
<!-- match the document root -->
<data>
<!-- apply all templates to the input document -->
<xsl:apply-templates />
</data>
</xsl:template>
<xsl:template match="/html/body/div[#class='foo']">
<!-- div with attribute class='foo' is expected only at /html/body/div -->
<foo-out><xsl:value-of select="."></xsl:value-of></foo-out>
</xsl:template>
<xsl:template match="//div[#class='bar']">
<!--div with attribute class=bar can be anywhere in the document-->
<bar-out><xsl:value-of select="."></xsl:value-of></bar-out>
</xsl:template>
</xsl:stylesheet>
the output
<?xml version="1.0"?>
<data>
<foo-out>1</foo-out>
<bar-out>2</bar-out>
</data>

Using a variable to contain the name of another variable in order to call it

Ok, so that title is a little confusing. I think it's a little easier to just explain my problem out. In perl, I am getting an array of string values (I'm not sure how long it will be because it depends on the file). Because I don't know how long the array will be, I'm using a for-each in perl and creating a variable in perl that's just a long string that creates a bunch of variables in xslt. For example, here's my code for doing that:
foreach my $node (#objects) {
$count++;
$xslt_vars = $xslt_vars . '<xsl:variable name="namedsets' . $count . '"/><xsl:text>' . $node . '</xsl:text></xsl:variable>';
}
My problem is I'm creating an unknown number of variables in my xslt stylesheet. I have that number in a variable in xslt and I use it in a template like so:
<xsl:template name="expression">
<xsl:param name="count"/>
<xsl:choose>
<xsl:when test="$count > $name-count">
</xsl:when>
<xsl:otherwise>
<xsl:for-each select=".//expression">
<xsl:variable name="expression" select="."/>
<xsl:variable name="express-test">
<xsl:text>$name-sets{$count}</xsl:text>
</xsl:variable>
<xsl:variable name="trying">
<xsl:value-of select="{$express-test}"/>
</xsl:variable>
<xsl:if test="contains($expression, $trying)">
<a>This Worked</a>
</xsl:if>
</xsl:for-each>
<xsl:call-template name="expression">
<xsl:with-param name="count" select="$count + 1"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
So $count is not the same count as it is in the sample of perl code. The perl count is $name-count (I don't know why I did this, but that doesn't really matter). I created $express-test in order to have the name of the current $namedset00 variable. My problem is calling that variable with the correct number. As you can see I tried setting $trying to the value of {$express-test}, but this syntax isn't allowed in xslt. Has anyone done anything similar in xslt? Or know how to call a changing variable name in xslt?
The best you can do with your current approach is to do something like this:
<xsl:value-of select="document('')//xsl:variable[#name = $express-text]" />
But I would suggest exploring how to use XSL parameters properly and pass in a single nodeset to your XSLT rather than patching the variables together with string concatenation. A nodeset will be much easier to access dynamically.

How to convert text to date, then get year value?

Need to convert a text value `2012-03-19' into a date type, then extract the year component.
<xsl:variable name="dateValue" select="2012-03-19"/>
<xsl:variable name="year" select="year-from-date(date($dateValue))"/>
I'm using Saxon 2.0, but it's complaining date function does not exist; I looked around Saxon's documentation and could not find the function, so that's clearly the issue, but I can't find a suitable replacement.
I don't think date() should be a function, you want the xs:date() data type.
Add the xs namespace and then prefix xs:date().
The following stylesheet, using any well-formed XML input, will produce 2012:
<xsl:stylesheet version="2.0" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:variable name="dateValue" select="'2012-03-19'"/>
<xsl:variable name="year" select="year-from-date(xs:date($dateValue))"/>
<xsl:value-of select="$year"/>
</xsl:template>
</xsl:stylesheet>
Note that you also have to quote your select in your "dateValue" xsl:variable.

How to generate Objective C class files from XML schema?

I have an XML schema that defines my data model. I would now like to have Objective C source files generated from the XML schema. Does anyone know how to accomplish this?
Take a look at this Stack Overflow question on XML serialization, which mentions a project along these lines.
Without knowing the details my immediate thought would be to probably use xslt for this. e.g. if you had something like (I appreciate
<element name="SomeEntity">
<attribute name="someAttr" type="integer" />
<complexType>
<sequence>
<element name="someOtherAttr" type="string" />
</sequence>
</complexType>
</entity>
Create a bunch of templates to translate this,e.g.
<xsl:template match="element">
<xsl:apply-template select="." mode="header"/>
<xsl:apply-template select="." mode="impl"/>
</xsl:template>
<xsl:template match="element" mode="header">
class <xsl:value-of select="#name"/> {
public:
<xsl:apply-template select="attribute" mode="header"/>
<xsl:apply-template select="complexType/element" mode="header"/>
</xsl:template>
...
Though if the logic on the generation is more complex I would probably go down the road of importing the xml into an object model and programmatically process that, possibly using a template engine such as Velocity, as while it is possible complex logic in xslt is a pain.