XML-based template engine in Java?

Can somebody suggest a template engine (preferably written in Java) that can generate any text I like from a given XML input?

StringTemplate, FreeMarker

How about XSLT? You can use JAXP to do the processing.
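A minimal JAXP sketch of that approach, using only the XSLT processor bundled with the JDK (Java 15+ for the text block). The class name `XsltTextGenerator` and the `<classes>` input format are made up for illustration. Because the stylesheet declares `method="text"`, the result is plain text (here, Java class stubs) rather than XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: turns XML input into arbitrary text via an XSLT stylesheet. */
final class XsltTextGenerator {
    /** Illustrative stylesheet: one Java class stub per <class name="..."/> element. */
    static final String DEMO_STYLESHEET = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text" encoding="UTF-8"/>
          <xsl:template match="/classes">
            <xsl:for-each select="class">
              <xsl:text>public class </xsl:text>
              <xsl:value-of select="@name"/>
              <xsl:text> {}&#10;</xsl:text>
            </xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>
        """;

    private final Templates templates; // compiled stylesheet, safe to share between threads

    XsltTextGenerator(String stylesheet) {
        try {
            templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(stylesheet)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Stylesheet does not compile", e);
        }
    }

    String generate(String xmlInput) {
        StringWriter out = new StringWriter();
        try {
            // A Transformer is not thread-safe, so a fresh one is created for each call.
            templates.newTransformer().transform(
                    new StreamSource(new StringReader(xmlInput)), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        XsltTextGenerator generator = new XsltTextGenerator(DEMO_STYLESHEET);
        System.out.print(generator.generate(
                "<classes><class name=\"Foo\"/><class name=\"Bar\"/></classes>"));
    }
}
```

Compiling the stylesheet once into `Templates` and creating a `Transformer` per call is the usual JAXP pattern, since `Templates` may be shared across threads while a `Transformer` may not.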

You can use XSLT; it is not restricted to generating XML output, although it does require XML input. Use the xsl:output element to define the type of output you will be generating.
For example, to generate text output:
<xsl:output method="text" encoding="UTF-8"/>
To generate indented XML output:
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
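To see what the output method changes, this small Java sketch runs the same template under two different `xsl:output` declarations with the JDK's built-in processor. The `<person>` and `<greeting>` names are invented for the example. With `method="text"` only the text nodes of the result tree are written, so the literal `<greeting>` tags and the XML declaration disappear.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OutputMethodDemo {
    /** Builds a stylesheet whose only variable part is the xsl:output method. */
    static String stylesheet(String method) {
        return "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"" + method + "\" encoding=\"UTF-8\"/>"
                + "<xsl:template match=\"/\">"
                + "<greeting>Hello, <xsl:value-of select=\"/person/@name\"/></greeting>"
                + "</xsl:template>"
                + "</xsl:stylesheet>";
    }

    static String run(String method, String xmlInput) {
        StringWriter out = new StringWriter();
        try {
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet(method))))
                    .transform(new StreamSource(new StringReader(xmlInput)), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<person name=\"Ann\"/>";
        System.out.println(run("xml", input));  // XML declaration plus the <greeting> element
        System.out.println(run("text", input)); // only the text: Hello, Ann
    }
}
```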

Related

Approach for generating a cluster of XPaths from HTML files in Perl

As per one of our project requirements, we have to come up with an approach, and a suitable Perl resource, for fulfilling the following requirement.
We receive bulk input data in HTML format, and we have to write a parser that brings all the varieties of input data into one generic XML format.
The challenge is that, while writing the parser, we manually verified only a few sample HTML files, and the parser then failed on other samples; it is very difficult to analyse all the HTML files manually.
Because of this, we have decided to build an analysis tool in which all the variations in HTML structure can be monitored using XPaths.
Can any of you suggest which module is suitable for this work? I know HTML::TreeBuilder::XPath will help me get XPaths, but does that module have any limitations?
Alternatively, please suggest the best approach for analysing the HTML files. Most of our team is more familiar with Perl, which is why we prefer to write it in Perl.
Please also suggest whether any other technology could be used more efficiently, or, within Perl, what the best approach would be.
If the HTML files are guaranteed to be well-formed XHTML, I'd look into XSLT for transforming the input.
I'd check the output for missing data, since the structure of the input file is unknown. After transforming, you can check for missing data at known paths in the document.
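A minimal sketch of that check in Java, using the JDK's DOM and `javax.xml.xpath` APIs. The class name and the rule "missing means no matching node, or a first match whose text is blank" are assumptions; the paths in the usage refer to the `<data>` output format of the example that follows.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class MissingDataCheck {
    /** Returns the required paths that match no node, or whose first match has blank text. */
    static List<String> missingPaths(String transformedXml, List<String> requiredPaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(transformedXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> missing = new ArrayList<>();
            for (String path : requiredPaths) {
                NodeList nodes = (NodeList) xpath.evaluate(path, doc, XPathConstants.NODESET);
                if (nodes.getLength() == 0 || nodes.item(0).getTextContent().isBlank()) {
                    missing.add(path);
                }
            }
            return missing;
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String output = "<data><foo-out>1</foo-out></data>";
        System.out.println(missingPaths(output, List.of("/data/foo-out", "/data/bar-out")));
    }
}
```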
An example of how XSLT works:
The HTML input file:
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="foo">1</div>
<div>
<div class="bar">2</div>
</div>
</body>
</html>
The stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">
  <!-- the input elements are in the XHTML namespace, so patterns need a prefix bound to it -->
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>
  <xsl:template match="/">
    <!-- match the document root -->
    <data>
      <!-- apply all templates to the input document -->
      <xsl:apply-templates/>
    </data>
  </xsl:template>
  <xsl:template match="/h:html/h:body/h:div[@class='foo']">
    <!-- div with attribute class='foo' is expected only at /html/body/div -->
    <foo-out><xsl:value-of select="."/></foo-out>
  </xsl:template>
  <xsl:template match="//h:div[@class='bar']">
    <!-- div with attribute class='bar' can be anywhere in the document -->
    <bar-out><xsl:value-of select="."/></bar-out>
  </xsl:template>
</xsl:stylesheet>
The output:
<?xml version="1.0"?>
<data>
<foo-out>1</foo-out>
<bar-out>2</bar-out>
</data>
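For the structure analysis the question actually asks about (which element paths occur, and how often), here is a rough Java sketch. It assumes well-formed XHTML, as the answer does; real-world tag soup would first need an HTML parser (in Perl, the HTML::TreeBuilder::XPath module the question mentions). The class name and the path format, with a `[@class='name']` predicate added whenever an element has a class attribute, are arbitrary choices. Running it over every input file and merging the maps would show which paths are common and which appear in only a few files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class StructureSurvey {
    /** Counts how often each element path (with class predicate) occurs in one XHTML document. */
    static Map<String, Integer> pathCounts(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not fetch an external DTD if the file has a DOCTYPE (Xerces feature used by the JDK).
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)))
                    .getDocumentElement();
            Map<String, Integer> counts = new TreeMap<>();
            walk(root, "", counts);
            return counts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    private static void walk(Element element, String parentPath, Map<String, Integer> counts) {
        String step = element.getTagName();
        String cls = element.getAttribute("class"); // "" when the attribute is absent
        if (!cls.isEmpty()) {
            step += "[@class='" + cls + "']";
        }
        String path = parentPath + "/" + step;
        counts.merge(path, 1, Integer::sum);
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, path, counts);
            }
        }
    }

    public static void main(String[] args) {
        String sample = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<div class=\"foo\">1</div><div><div class=\"bar\">2</div></div>"
                + "</body></html>";
        pathCounts(sample).forEach((path, n) -> System.out.println(n + "  " + path));
    }
}
```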

Ektron Forms - Generate Plaintext Email

In Ektron I have a form that generates an HTML email on submission and sends it to a mailbox (foo@bar.com). An application (MailReader) checks that mailbox, reads the messages, strips all markup, and saves the resulting messages for later use. This is a problem because all the text from the HTML email ends up mashed together and is completely unreadable for someone using the MailReader app.
For example, this HTML:
<h1>Header1</h1>
<div>
<h2>Header2</h2>
<p>Some text in a paragraph.</p>
</div>
Becomes:
Header1Header2Some text in a paragraph.
I cannot change MailReader in any way; it will always strip any markup. So my solution is to have Ektron generate an email that contains no HTML, for this form only. I know the email is generated by an XSLT transformation using the file /Workarea/controls/forms/DefaultFormEmailBody.xslt.
My attempted solution involved adding a hidden input named "__nohtml" to the form. The XSLT then does the following:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:choose>
<xsl:when test="/field[starts-with(@name, '__nohtml')]">
<xsl:output method="text" />
</xsl:when>
<xsl:otherwise>
<xsl:output method="html" />
</xsl:otherwise>
</xsl:choose>
<xsl:template match="/">
<xsl:choose>
<xsl:when test="/field[starts-with(@name, '__nohtml')]">
text output
</xsl:when>
<xsl:otherwise>
html output
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
But it never sends the email when I use this. I tried rendering with this template on my own machine and I get errors. I've also noticed that the W3C documentation says the xsl:output element is only allowed as a top-level element, which probably explains why I can't put it inside the <xsl:choose> element.
I've also tried omitting the <xsl:output> element completely, and it seems to default to HTML no matter what.
I've tried searching our local Ektron code for the point where the transformation occurs so I can tell Ektron to use the default XSLT in standard cases or use a different XSLT if __nohtml exists, but I don't know if that code is even accessible.
I would greatly appreciate it if someone can help me find a template that would allow either HTML or plaintext based on whether a field exists. If not then it would be equally awesome if someone could point me to the point of XSLT transformation in Ektron's code (if it's even accessible).
I would recommend using a form strategy to generate your own version of the email. There should be a public override void OnAfterSubmit(FormData formData, FormSubmittedData submittedFormData, string formXml, CmsEventArgs eventArgs) method that lets you pull out the submitted form data (either as the raw object or, I believe, as XML as well). From there you could just parse it and save your HTML file.
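Wherever the transformation ends up being hooked in (Ektron itself is .NET, and its exact transformation point isn't known here), the selection logic the question describes can be sketched in Java with JAXP. Because `xsl:output` must be a top-level element, the calling code inspects the form XML for a `__nohtml` field and picks either an HTML or a plain-text stylesheet. The `<form>`/`<field name="email">` input structure and both stylesheets are hypothetical stand-ins, not Ektron's `DefaultFormEmailBody.xslt`. The text stylesheet puts each field on its own line, so nothing runs together.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class EmailBodyRenderer {
    private static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    // Both hypothetical stylesheets skip marker fields whose names start with "__".
    static final String HTML_XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"" + XSL_NS + "\">"
            + "<xsl:output method=\"html\"/>"
            + "<xsl:template match=\"/\"><html><body>"
            + "<xsl:for-each select=\"//field[not(starts-with(@name, '__'))]\">"
            + "<p><xsl:value-of select=\"@name\"/>: <xsl:value-of select=\".\"/></p>"
            + "</xsl:for-each></body></html></xsl:template></xsl:stylesheet>";

    // Plain text: one "name: value" line per field.
    static final String TEXT_XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"" + XSL_NS + "\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:for-each select=\"//field[not(starts-with(@name, '__'))]\">"
            + "<xsl:value-of select=\"@name\"/><xsl:text>: </xsl:text>"
            + "<xsl:value-of select=\".\"/><xsl:text>&#10;</xsl:text>"
            + "</xsl:for-each></xsl:template></xsl:stylesheet>";

    static String render(String formXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(formXml)));
            // The choice a stylesheet cannot make for itself, since xsl:output is top-level only.
            boolean plainText = (Boolean) XPathFactory.newInstance().newXPath().evaluate(
                    "boolean(//field[starts-with(@name, '__nohtml')])", doc, XPathConstants.BOOLEAN);
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(plainText ? TEXT_XSLT : HTML_XSLT)))
                    .transform(new StreamSource(new StringReader(formXml)), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```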

Umbraco XSLT RenderTemplate Woes

I need a little guidance with XSLT and Umbraco. I'm fairly new to XSLT, but I think I understand the concepts. On one page I have two columns, each with its own separately pickable content. This is done via the standard content picker property (one for each column). The issue is that I need to be able to have two different templates on the page. So, in essence, the page the user navigates to, which has the columns, has to render two of its child items within itself.
I have this working for one column using a generic XSLT I found, which basically just renders whatever child item it finds, but I want it to render whichever one the user picked.
I know the Content Picker returns the XML Node ID of the page and that can be used with the Render Template function to display it (I have an example of that), but when it comes to adding in my own properties and then passing them to the RenderTemplate function I get lost. New to this XSLT :).
So far I have...
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#x00A0;"> ]>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxml="urn:schemas-microsoft-com:xslt"
xmlns:umbraco.library="urn:umbraco.library"
xmlns:Exslt.ExsltCommon="urn:Exslt.ExsltCommon"
xmlns:Exslt.ExsltDatesAndTimes="urn:Exslt.ExsltDatesAndTimes"
xmlns:Exslt.ExsltMath="urn:Exslt.ExsltMath"
xmlns:Exslt.ExsltRegularExpressions="urn:Exslt.ExsltRegularExpressions"
xmlns:Exslt.ExsltStrings="urn:Exslt.ExsltStrings"
xmlns:Exslt.ExsltSets="urn:Exslt.ExsltSets">
<xsl:output method="html" omit-xml-declaration="yes"/>
<xsl:param name="currentPage"/>
<xsl:variable name="nodeID" select="data[@alias='leftColumn']"/>
<xsl:template match="/">
<xsl:value-of select="umbraco.library:RenderTemplate($nodeID)" disable-output-escaping="yes"/>
</xsl:template>
</xsl:stylesheet>
Does anyone know why this doesn't work, and how to do what I'm after? The above gives a "value was either too large or too small" error.
You actually have two problems here ...
Calling RenderTemplate()
RenderTemplate actually requires two arguments when using an alternative template, the first being the content node ID, and the second being the chosen template you want applied.
<xsl:value-of
select="umbraco.library:RenderTemplate($nodeID, $templateID)"
disable-output-escaping="yes" />
See the following link for more information : http://our.umbraco.org/wiki/reference/umbracolibrary/rendertemplate
Too large or too small error
This one is simple to fix by putting a check for an empty value around the code in question.
<xsl:if test="$nodeID != ''">
<xsl:value-of
select="umbraco.library:RenderTemplate($nodeID, $templateID)"
disable-output-escaping="yes" />
</xsl:if>
The XSLT parser likes to assume certain values are null when in reality they aren't. Another way around this is to check the Skip Errors checkbox when saving, but that makes debugging genuinely erroneous code a bit of a pain.
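The guard can be tried outside Umbraco with the JDK's XSLT processor. In this Java sketch the `umbraco.library:RenderTemplate` call is replaced by plain text output (the Umbraco extension isn't available outside Umbraco), and the `/node/data` input shape is invented. It shows that `$nodeID != ''` is false both when the variable selects nothing and when it selects an element with an empty value, so the guarded call is skipped in both cases.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class EmptyGuardDemo {
    // Hypothetical stand-in for the macro: text output replaces RenderTemplate.
    static final String XSLT = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:variable name="nodeID" select="/node/data[@alias='leftColumn']"/>
          <xsl:template match="/">
            <!-- false for an empty node-set and for a node whose string value is '' -->
            <xsl:if test="$nodeID != ''">
              <xsl:text>render node </xsl:text>
              <xsl:value-of select="$nodeID"/>
            </xsl:if>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String run(String xmlInput) {
        StringWriter out = new StringWriter();
        try {
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)))
                    .transform(new StreamSource(new StringReader(xmlInput)), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```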
Hope it helps.

How to extract XML tag values from an email body?

I know how to extract the body from an email using MIME::Parser, but my mail contains XML data. What is the process to extract the XML tag values from the email body?
Body:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<TEST>
<test>test</test>
<test1>test1</test1>
</TEST>
Assuming the body actually consists of XML, just feed it into your XML parser of choice (XML::Twig seems popular these days or you could look at Task::Kensho's recommended XML modules) and use it as normal.
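The same idea in Java (used here in place of Perl's XML::Twig): once the MIME library has handed you the body as a string, parse it like any other XML document. The class name is hypothetical, and the flat `<TEST>` layout from the question is assumed; repeated element names would overwrite each other in the map, so collect into a list if that matters.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class MailXmlValues {
    /** Maps each child element of the root to its text content, in document order. */
    static Map<String, String> childValues(String mailBody) {
        try {
            // strip(): mail bodies often carry surrounding blank lines, and nothing
            // may precede the XML declaration.
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(mailBody.strip())))
                    .getDocumentElement();
            Map<String, String> values = new LinkedHashMap<>();
            for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    values.put(child.getNodeName(), child.getTextContent());
                }
            }
            return values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Mail body is not well-formed XML", e);
        }
    }
}
```\n<test1>test1</test1>\n</TEST>\n");
assert v.size() == 2;
assert v.get("test").equals("test");
assert v.get("test1").equals("test1");
</test>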

RSS feed created using Zend_Feed: Firefox prompts me to download it instead of displaying it as RSS

I've created an RSS feed using Zend_Feed.
It seems to have worked in that the resulting XML looks good. My problem is that Firefox won't recognise it as an RSS feed and instead prompts me to download the raw XML.
Trying it in IE gives the error "this feed contains code errors" with the following extra info:
Invalid xml declaration.
Line: 2 Character: 3
< ? xml version="1.0" encoding="UTF-8" ?>
Any help greatly appreciated.
The XML declaration must be at the very beginning of the output, i.e. no blank lines or spaces before the declaration.
This is valid:
<?xml version="1.0" encoding="UTF-8" ?>
This is not, because a blank line comes before the declaration:

<?xml version="1.0" encoding="UTF-8" ?>
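A quick way to confirm this outside the browser is to run the feed text through a strict XML parser. This Java sketch (class name hypothetical) uses the JDK's DOM parser: the same declaration parses when it comes first and is rejected when a single newline or space precedes it, consistent with IE reporting the error on line 2.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class DeclarationCheck {
    /** True if the text parses as XML; whitespace before the declaration makes it fail. */
    static boolean parses(String feed) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors without printing
            builder.parse(new InputSource(new StringReader(feed)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String decl = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>";
        System.out.println(parses(decl + "<rss version=\"2.0\"/>"));        // true
        System.out.println(parses("\n" + decl + "<rss version=\"2.0\"/>")); // false
    }
}
```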
Check whether <?xml version="1.0" encoding="utf-8"?> is the first line in the feed file. No empty lines or spaces!
If PHP is outputting any notices, warnings or similar, those will malform the feed. To test this, try setting error_reporting to zero before the feed is sent:
error_reporting(0);
A good rule of thumb when using PHP class files and such: never close your class files with ?>. Only use ?> in template-type files where regular output follows it. All of the major packages do this now, for exactly the reason above: any whitespace after a closing ?> is sent as output and ends up in front of the XML declaration.