Stripping empty tags from Plone content with Diazo - tinymce

I have a Plone site, themed with plone.app.theming. The design is quite strict and does not allow for empty <p> elements or any of the other nonsense TinyMCE outputs; such elements break the intended layout. So I want to strip the empty elements from the content. I have tried inline XSLT (which, according to http://diazo.org/advanced.html#inline-xsl-directives, should be supported) like:
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(*) and not(text()[normalize-space()])]"/>
But it didn't do the trick. In fact, it did something weird. The empty p tags that I wanted to get rid of stayed intact, but some other elements like
<img src="../++theme++jarn.com/whatever.png" />
turned into
with the image being stripped out. Changing match="*[… in the second template to match="p[… stopped the images from being stripped, but those nasty <p> elements were still in the output.
Any hints on how to get rid of the empty elements using Diazo rules?
UPDATE January 31, 2012
Here is the content from which I need the empty p tags to be stripped off:
<div id="parent-fieldname-text">
<p></p>
<p> </p>
<p> </p>
<p><section id="what-we-do">
<p class="visualClear summary">Not empty Paragraph</p>
<ul class="thumbsList">
<li> <img src="../++theme++jarn.com/whatever.png" /></li>
<li> <img src="../++theme++jarn.com/whatever.png" /></li>
</ul>
</section></p>
</div>
The Diazo transformation rules:
<?xml version="1.0" encoding="UTF-8"?>
<rules
xmlns="http://namespaces.plone.org/diazo"
xmlns:css="http://namespaces.plone.org/diazo/css"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p[not(*) and not(normalize-space())]"/>
<!-- The default theme, used for standard Plone web pages -->
<theme href="index.html" css:if-content="#visual-portal-wrapper" />
<replace css:theme-children="div.contentWrapper" css:content-children="#content" />
</rules>
The output I get after applying the transformations to the Plone site is absolutely identical to the input, whereas I would expect the three empty <p> tags after the opening <div> to go away.
If I change the second template to match all elements like match="*… then the images get stripped out, but the empty <p> tags are still there.

Just have this:
<xsl:template match="p[not(*) and not(normalize-space())]"/>
A complete transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p[not(*) and not(normalize-space())]"/>
</xsl:stylesheet>
when this transformation is applied on this XML document:
<div>
<p/>
<p> </p>
<p><img src="..."/></p>
<img src="..."/>
</div>
the wanted, correct result is produced:
<div>
<p>
<img src="..."/>
</p>
<img src="..."/>
</div>

Works for me. I've added an example of using Dimitre's xpath in a drop content rule at https://github.com/plone/diazo/commit/94ddff7117d25d3a8a89457eeb272b5500ec21c5 but it also works as the equivalent xsl:template. The example is pared down to the basics but it works using the complete example content in the question too.
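For reference, the drop-rule variant could look roughly like the sketch below. It reuses the theme and replace rules from the question and Dimitre's XPath; the exact rules in the linked commit may differ, so treat this as an illustration rather than a copy of that example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rules
    xmlns="http://namespaces.plone.org/diazo"
    xmlns:css="http://namespaces.plone.org/diazo/css">

  <!-- The default theme, used for standard Plone web pages -->
  <theme href="index.html" css:if-content="#visual-portal-wrapper" />

  <!-- Drop every content <p> that has no child elements and no
       non-whitespace text (the XPath from the answer above) -->
  <drop content="//p[not(*) and not(normalize-space())]" />

  <replace css:theme-children="div.contentWrapper"
           css:content-children="#content" />
</rules>
```

One caveat: normalize-space() does not treat a non-breaking space (&#160;) as whitespace, so if TinyMCE fills the "empty" paragraphs with non-breaking spaces, the predicate may need something like not(normalize-space(translate(., '&#160;', ' '))) instead.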

Related

Adjust Dita-OT plugin to output PDF wireframe with all blocks solid border

I would like to output a solid black border around all fo:blocks to help see where the boundaries between elements lie in the PDF output.
I would like to apply a transformation at the end of the DITA-OT plugin that adds the borders. I can fiddle with the following XSL; however, I'm not sure how to apply the XSLT at the end of a DITA-OT process.
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet exclude-result-prefixes="xs ditaarch opentopic e" version="2.0" xmlns:ditaarch="http://dita.oasis-open.org/architecture/2005/" xmlns:e="com.docdept.pdf" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:opentopic="http://www.idiominc.com/opentopic" xmlns:opentopic-func="http://www.idiominc.com/opentopic/exsl/function" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|*|processing-instruction()|comment()">
<xsl:copy>
<xsl:apply-templates select="*|@*|text()|processing-instruction()|comment()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="fo:block">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:attribute name="border-style">solid</xsl:attribute>
<xsl:attribute name="border-width">0.5pt</xsl:attribute>
<xsl:attribute name="border-color">black</xsl:attribute>
<xsl:apply-templates select="*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I thought perhaps the following to apply wireframe.xsl at the end of the process but this does not work.
<?xml version='1.0' encoding='UTF-8'?>
<plugin id="com.docdept.pdf">
<require plugin="org.dita.pdf2" />
<feature extension="dita.conductor.transtype.check" value="adjust-pdf" />
<feature extension="dita.transtype.print" value="docdept-pdf" />
<feature extension="dita.conductor.target.relative" file="integrator.xml" />
<feature extension="dita.xsl.pdf" file="xsl/fo/wireframe.xsl"/>
</plugin>
I'm seeing that it's better to associate borders of different colors to the attribute sets so there is visual reference that can also be searched for by color name within the fo output.
<xsl:attribute name="border">1mm thin solid</xsl:attribute>
<xsl:attribute name="border-color">GOLD</xsl:attribute>

approach for generating cluster of xpaths from html files in perl

As part of a project requirement, we need to come up with an approach, and a suitable Perl resource, for the following.
We receive a large volume of input data as HTML and have to write a parser that brings all the varieties of input into one generically formatted XML file.
The challenge is that while writing the parser we manually verified only a few sample HTML files, and the parser then failed on other samples; analysing all the HTML files manually is very difficult.
Because of this, we have decided to build an analysis tool in which all the variations in HTML structure can be monitored using XPaths.
Can anyone suggest which module is suitable for this work? I know HTML::TreeBuilder::XPath will help in giving XPaths, but does that module have any limitations?
Otherwise, please suggest the best approach for analysing the HTML files. Most of our team are more familiar with Perl, which is why we prefer to write it in Perl.
Also suggest whether another technology could be used more efficiently, or, within Perl, what the best approach would be.
If the html files are guaranteed to be well-formed xhtml, I'd take a look into xslt for transforming the input.
I'd check the output for missing data, since the structure of the input file is unknown. After transforming you can check for the missing data at known paths in the document.
An example of how XSLT works:
the html input file
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="foo">1</div>
<div>
<div class="bar">2</div>
</div>
</body>
</html>
the stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:h="http://www.w3.org/1999/xhtml" exclude-result-prefixes="h">
<!-- the input is in the XHTML namespace, so its elements are matched through the h: prefix -->
<xsl:strip-space elements="*"/>
<xsl:output indent="yes" />
<xsl:template match="/">
<!-- match the document root -->
<data>
<!-- apply all templates to the input document -->
<xsl:apply-templates />
</data>
</xsl:template>
<xsl:template match="/h:html/h:body/h:div[@class='foo']">
<!-- div with attribute class='foo' is expected only at /html/body/div -->
<foo-out><xsl:value-of select="."/></foo-out>
</xsl:template>
<xsl:template match="//h:div[@class='bar']">
<!-- div with attribute class='bar' can be anywhere in the document -->
<bar-out><xsl:value-of select="."/></bar-out>
</xsl:template>
</xsl:stylesheet>
the output
<?xml version="1.0"?>
<data>
<foo-out>1</foo-out>
<bar-out>2</bar-out>
</data>

Cleanup Document in Eclipse not working in specific case

I have a problem using Eclipse Cleanup Document option in case my XML looks like this.
<?xml version="1.0" encoding="UTF-8"?>
<div>
<span style='color:#0000C8'>&gt;
</span>
<h3 style='margin-left:0cm;text-indent:0cm'><a name="_Toc419970681">2.7.2<span
style='font:7.0pt "Times New Roman"'> </span>Helpers</a></h3>
</div>
Cleanup Document works only if I remove the <div> tags or if I move </span> directly after &gt;
After successful [Source - Cleanup Document] XML should look like this.
<?xml version="1.0" encoding="UTF-8"?>
<div>
<span style='color:#0000C8'>&gt;</span>
<h3 style='margin-left:0cm;text-indent:0cm'>
<a name="_Toc419970681">
2.7.2
<span style='font:7.0pt "Times New Roman"'>
</span>
Helpers
</a>
</h3>
</div>
This may be due to the &gt; (>): Eclipse appears to treat it as the closing bracket of the span tag, so it cannot validate the content that follows, because that content is then not a proper closing tag for span or div. If you remove the div tag, span is the only open tag left for it to match against, although the cleanup still does not remove the tag at the end.

How do I enable/use XSLT document function in eclipse?

I am trying to write a XSLT file that will process an Android Activity layout xml file and create an HTML equivalent.
In creating/testing out the xslt file, I am using the eclipse xslt tools.
One of the roadblocks on this path is that many values (such as string/text values) are not held in the Android layout file directly but in a separate XML file located in the 'values' folder.
I have been trying to use the xslt function 'document' to open the strings.xml file but without success.
Q) Are there any eclipse permissions that I need to enable to allow the xslt document function to operate?
Q) Is there something missing in my understanding of how the document function should operate?
The line which is contained within the TextView template (in the xslt file) that is trying to access the android strings.xml file, is:
<xsl:value-of select="document('../values/strings.xml')/String[@name=substring-after(@android:text,'/')]" />
The layout file is in the res/layout folder
The xslt is in a res/xsl folder.
The strings.xml file is in the res/values folder
Here are the code samples:
Android XML Layout snippet:
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:orientation="vertical" >
<!-- This is the full screen vertical layout -->
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="250dp"
android:layout_height="wrap_content"
android:layout_marginTop="10dp"
android:orientation="vertical" >
<!-- This is the main content vertical layout -->
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="fill_parent"
android:layout_height="wrap_content"
android:orientation="vertical"
android:paddingLeft="@dimen/mainHeadingIndent" >
<!-- This is the Status Section Vertical content layout -->
<TextView
style="@style/HeadingTextStyle"
android:id="@+id/textView2"
android:layout_width="fill_parent"
android:layout_height="wrap_content"
android:layout_weight=".5"
android:text="@string/Status"/>
**The rest of the layout file is intentionally omitted**
Here is the XSLT file:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:android="http://schemas.android.com/apk/res/android">
<xsl:output method="html" version="4.0" encoding="iso-8859-1"
indent="yes" />
<xsl:template match="/">
<html>
<head>
</head>
<body>
<xsl:for-each select="LinearLayout">
<xsl:call-template name="LinearLayout">
</xsl:call-template>
</xsl:for-each>
</body>
</html>
</xsl:template>
<xsl:template name="LinearLayout">
<xsl:variable name="width">
<xsl:value-of select="@android:layout_width" />
</xsl:variable>
<xsl:variable name="height">
<xsl:value-of select="@android:layout_height" />
</xsl:variable>
<xsl:variable name="margin">
<xsl:value-of select="@android:layout_margin" />
</xsl:variable>
<xsl:variable name="orient">
<xsl:value-of select="@android:orientation" />
</xsl:variable>
<xsl:variable name="pos">
<xsl:number value="position()" format="1" />
</xsl:variable>
<div div_pos='{$pos}' Width='{$width}' Height='{$height}' Margin='{$margin}'
Orient='{$orient}'>
<xsl:for-each select="LinearLayout">
<xsl:call-template name="LinearLayout">
</xsl:call-template>
</xsl:for-each>
<xsl:for-each select="TextView">
<xsl:call-template name="TextView">
</xsl:call-template>
</xsl:for-each>
<xsl:for-each select="ImageButton">
<xsl:call-template name="ImageButton">
</xsl:call-template>
</xsl:for-each>
<xsl:for-each select="Spinner">
<xsl:call-template name="Spinner">
</xsl:call-template>
</xsl:for-each>
<xsl:for-each select="Button">
<xsl:call-template name="Button">
</xsl:call-template>
</xsl:for-each>
</div>
</xsl:template>
<xsl:template name="ImageButton">
<div id="{@android:id}">
<xsl:comment>
ImageButton
</xsl:comment>
</div>
</xsl:template>
<xsl:template name="TextView">
<xsl:variable name="pos">
<xsl:number value="position()" format="1" />
</xsl:variable>
Text field =
<xsl:value-of select="@android:text" />
<xsl:variable name="txt">
<xsl:choose>
<xsl:when test="contains(@android:text,'/')">
<xsl:value-of select="document('../values/strings.xml')/String[@name=substring-after(@android:text,'/')]" />
<!--
<xsl:value-of select="substring-after(@android:text,'/')" />
-->
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="@android:text" />
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<div id="{@android:id}">
<xsl:comment>
TextView Element
</xsl:comment>
<p>
Text pointer is: <xsl:value-of select="$txt" />
</p>
<!-- <p>Text pointer is:<xsl:copy-of select="$txt"/></p> <xsl:value-of
select="document('../values/strings.xml')/String[@name=$txt]" />
<xsl:value-of select="substring-after(@android:text,'/')" />
-->
</div>
endoftext
<br />
</xsl:template>
<xsl:template name="Spinner">
<div id="{@android:id}">
<xsl:comment>
Spinner
</xsl:comment>
</div>
</xsl:template>
<xsl:template name="Button">
<div id="{@android:id}">
<xsl:comment>
Button
</xsl:comment>
</div>
</xsl:template>
</xsl:stylesheet>
You need to include the document's root element in the XPath (this is very easy to forget):
<xsl:value-of select="document('../values/strings.xml')/*/String[@name=substring-after(@android:text,'/')]" />
Well, I've solved it.
The big issue is that when XSLT fails, it generally fails silently, i.e. with no errors.
The issue turned out not to be the document function per se. It was the selection criteria that followed the document function. They selected no nodes, so the statement produced no output, which made it look as if the document function was not working.
The big change was that the substring-after clause wasn't being evaluated against the right attribute. Once I split the substring clause out, it started to work.
As they say, it's "FM" (flipping magic)!
Here is what I replaced the document line with:
<xsl:variable name='string_value'>
<xsl:value-of select="substring-after(@android:text,'/')" />
</xsl:variable>
<xsl:value-of select="$string_value"/>
<xsl:value-of select="document('../values/strings.xml')/*/string[@name=$string_value]"/>
This successfully pulled the correct value out of the strings.xml file.
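The reason the split helps is that inside a predicate such as string[...] the context node is the candidate element from strings.xml, so @android:text is looked up on that element rather than on the TextView, and nothing matches. A minimal sketch of the same lookup in a single expression, using the XSLT 1.0 current() function to refer back to the TextView, is shown below. It assumes strings.xml has the usual <resources><string name="…">…</string></resources> layout and that the stylesheet lives in res/xsl as described in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:android="http://schemas.android.com/apk/res/android">

  <xsl:output method="text"/>

  <!-- For each TextView, look up the string resource named after the '/'
       in android:text (e.g. "@string/Status" looks up "Status").
       current() is the TextView being processed; a bare @android:text
       inside the predicate would be read from the <string> element. -->
  <xsl:template match="TextView">
    <xsl:value-of select="document('../values/strings.xml')/*/string
        [@name = substring-after(current()/@android:text, '/')]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress stray text from the layout file -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

The relative URI passed to document() is resolved against the stylesheet's own location, which is why '../values/strings.xml' works from res/xsl.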

Can JSTL transform XML encoded in UTF-8?

I am making a simple JSP application to transform XML data into HTML.
I use JSTL and my XML data is encoded in UTF-8. It works, but the Danish characters look strange in the browser.
Like this:
Danish characters written directly in jsp: ÆØÅ æøå
Same Danish characters transformed with jstl:
character: Æ character: æ
character: Ø character: ø
character: Å character: å
However, if I manually change the xml definition like so:
<?xml version="1.0" encoding="ISO-8859-1" ?>
The output is transformed properly.
Should I set up JSTL in some way to handle UTF-8, or is it that my file is actually Latin-1 encoded by mistake? I do not know how to check this...
Here is my test xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<rows>
<row>
<name>character: Æ</name>
<surname>character: æ</surname>
</row>
<row>
<name>character: Ø</name>
<surname>character: ø</surname>
</row>
<row>
<name>character: Å</name>
<surname>character: å</surname>
</row>
</rows>
Here is my xsl:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<table border="0">
<xsl:for-each select="rows/row">
<tr>
<td>
<xsl:value-of select="name" />
</td>
<td>
<xsl:value-of select="surname" />
</td>
</tr>
</xsl:for-each>
</table>
</xsl:template>
</xsl:stylesheet>
My index.jsp:
<?xml version="1.0" encoding="UTF-8" ?>
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core"%>
<%@ taglib prefix="x" uri="http://java.sun.com/jsp/jstl/xml"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Insert title here</title>
</head>
<body>
Written directly in jsp: ÆØÅ æøå
<h3>xml transformed with jstl:</h3>
<c:import url="/Test.xsl" var="xsltdoc" />
<c:import url="/Test.xml" var="xmldoc" />
<x:transform xml="${xmldoc}" xslt="${xsltdoc}" />
</body>
</html>
I am using JSTL libraries (Implementation-Version: 1.2) on JBOSS AP 4.2.3.
Ok, I checked the encoding of my xml data here, and it is correct, that it is UTF-8 encoded.
Apparently, in index.jsp JSTL must be set to use UTF-8 like so
<c:import url="/Metadata1.xsl" var="xsltdoc" charEncoding="UTF-8" />
<c:import url="/Metadata1.xml" var="xmldoc" charEncoding="UTF-8" />
This solves my problem.