Can JSTL transform XML encoded in UTF-8? - encoding

I am making a simple JSP application to transform XML data into HTML.
I use JSTL, and my XML data is encoded in UTF-8. It works, but the Danish characters look strange in the browser.
Like this:
Danish characters written directly in JSP: ÆØÅ æøå
Same Danish characters transformed with JSTL:
character: Æ character: æ
character: Ø character: ø
character: Å character: å
However, if I manually change the xml definition like so:
<?xml version="1.0" encoding="ISO-8859-1" ?>
The output is transformed properly.
Should I set up JSTL in some way to handle UTF-8, or is my file actually Latin-1 encoded by mistake? I do not know how to check this...
Here is my test xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<rows>
<row>
<name>character: Æ</name>
<surname>character: æ</surname>
</row>
<row>
<name>character: Ø</name>
<surname>character: ø</surname>
</row>
<row>
<name>character: Å</name>
<surname>character: å</surname>
</row>
</rows>
Here is my xsl:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<table border="0">
<xsl:for-each select="rows/row">
<tr>
<td>
<xsl:value-of select="name" />
</td>
<td>
<xsl:value-of select="surname" />
</td>
</tr>
</xsl:for-each>
</table>
</xsl:template>
</xsl:stylesheet>
My index.jsp:
<?xml version="1.0" encoding="UTF-8" ?>
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core"%>
<%@ taglib prefix="x" uri="http://java.sun.com/jsp/jstl/xml"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Insert title here</title>
</head>
<body>
Written directly in jsp: ÆØÅ æøå
<h3>xml transformed with jstl:</h3>
<c:import url="/Test.xsl" var="xsltdoc" />
<c:import url="/Test.xml" var="xmldoc" />
<x:transform xml="${xmldoc}" xslt="${xsltdoc}" />
</body>
</html>
I am using JSTL libraries (Implementation-Version: 1.2) on JBOSS AP 4.2.3.

OK, I checked the encoding of my XML data, and it is indeed UTF-8 encoded.
Apparently, JSTL must be told explicitly in index.jsp to use UTF-8, like so:
<c:import url="/Metadata1.xsl" var="xsltdoc" charEncoding="UTF-8" />
<c:import url="/Metadata1.xml" var="xmldoc" charEncoding="UTF-8" />
This solves my problem.
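Garbling of this kind typically arises when UTF-8 bytes are decoded as ISO-8859-1 somewhere along the way, which is presumably what happened here when the files were imported without charEncoding. The following is a minimal Java sketch of that effect, independent of JSTL; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Decode the UTF-8 bytes of a string with the wrong charset (ISO-8859-1).
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Decode the same bytes with the charset they were actually written in.
    static String decodeCorrectly(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes for the Danish letters Æ, Ø and Å, so the demo does
        // not depend on the encoding of this source file.
        String original = "\u00C6\u00D8\u00C5";

        // Each letter takes two bytes in UTF-8, so the wrong decoder
        // produces two characters per letter.
        System.out.println(misdecode(original).length());               // prints 6
        System.out.println(decodeCorrectly(original).equals(original)); // prints true
    }
}
```

Each Danish letter turns into two unrelated characters when the wrong charset is used, while decoding with UTF-8 returns the original text.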

Related

find redirect META in DOMDocument with DOMXPath

I have the following HTML:
<html>
<head>
<meta http-equiv="refresh" content="0;URL=http://amazingjokes.com" />
</head>
</html>
I want to find the META with the redirect, so I wrote the following XPath query:
/html/head/meta[@http-equiv="refresh"]
However, the '-' in 'http-equiv' is causing an error:
Invalid regular expression: //html/head/meta[@http-equiv="refresh"]/:
Range out of order in character class
How can I properly rewrite the xpath query to be able to find the meta redirect?
I experimented with this: when I remove the '-' from both the HTML code and the query, things work as expected. Unfortunately, 'http-equiv' is a set standard, so I cannot change it. This experiment showed me I am very close...
However, the '-' in 'http-equiv' is causing an error:
Invalid regular expression: //html/head/meta[@http-equiv="refresh"]/:
Range out of order in character class
Obviously, the XPath engine you are using is buggy.
The XPath expression used in the question is a valid XPath 1.0 expression and it selects the wanted <meta> element.
Here is an XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select="/html/head/meta[@http-equiv='refresh']"/>
</xsl:template>
</xsl:stylesheet>
When the transformation above is applied on the provided XML document:
<html>
<head>
<meta http-equiv="refresh" content="0;URL=http://amazingjokes.com" />
</head>
</html>
the XPath expression is evaluated and the node it selects is output:
<meta http-equiv="refresh" content="0;URL=http://amazingjokes.com"/>
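The same check can be run against a standard XPath 1.0 engine outside XSLT. The sketch below uses the XPath API bundled with the JDK; the class and method names are illustrative only, and it assumes the markup is well-formed XML, as in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class MetaRefreshXPath {
    // Returns the content attribute of the refresh <meta>, or null if none is found.
    static String findRefreshContent(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The hyphen in the attribute name needs no escaping in XPath.
            Element meta = (Element) XPathFactory.newInstance().newXPath()
                    .evaluate("/html/head/meta[@http-equiv='refresh']",
                              doc, XPathConstants.NODE);
            return meta == null ? null : meta.getAttribute("content");
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta http-equiv=\"refresh\" content=\"0;URL=http://amazingjokes.com\" />"
                + "</head></html>";
        System.out.println(findRefreshContent(html)); // prints 0;URL=http://amazingjokes.com
    }
}
```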

Facelets page with special characters causes MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence at UTF8Reader.invalidByte

I am merging some old and new material into one web application. However, when Swedish letters are used, the page fails. It does not seem to be a server issue, since the old .jsp pages load correctly.
What am I missing in the xhtml header?
mar 25, 2015 11:50:53 FM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [Faces Servlet] in context with path [/BowlingInfo] threw exception
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:691)
<!DOCTYPE html>
<html lang="sv-SE"
xmlns="http://www.w3c.org/1999/xhtml"
xmlns:h="http://xmlns.jcp.org/jsf/html"
xmlns:f="http://java.sun.com/jsf/core"
xmlns:p="http://primefaces.org/ui"
xmlns:c="http://java.sun.com/jsp/jstl/core">
<h:head>
<link rel="stylesheet" href="bowling-style.css" />
<meta http-equiv="content-type" content="text/html" charset="ISO-8859-1" />
</h:head>
<h:body>
<!-- FAIL -->
<h1>Hallmästaren</h1>
</h:body>
</html>
Example of old page that will work
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<%@ taglib prefix="f" uri="http://java.sun.com/jsf/core"%>
<%@ taglib prefix="h" uri="http://java.sun.com/jsf/html"%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Svalövs bowlinghall</title>
<script type="text/JavaScript">
<!--
var currentTime = new Date()
function AutoRefresh( t ) {
setTimeout("location.reload(true);", t);
}
function GetServerDate() {
var date = new Date();
dateNow = date;
document.write(dateNow);
return dateNow;
}
</script>
<link rel="stylesheet" type="text/css" href="bowling-style.css" />
</head>
<body onload="JavaScript:AutoRefresh(15000);" bgcolor="C2F2BD">
<f:view>
.........
Facelets uses UTF-8 encoding by default (as part of World Domination). You should configure all editors and layers to use UTF-8.
In your particular case, there are at least two probable causes to check:
Eclipse should be configured to save files as UTF-8 via Window > Preferences > General > Workspace > Text File Encoding.
The HTTP/HTML Content-Type header should specify charset=UTF-8, exactly as in your JSP, which you for some reason changed to the legacy ISO-8859-1 encoding:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
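The exception message fits a file saved as ISO-8859-1 but parsed as UTF-8. In ISO-8859-1 the letter ä is the single byte 0xE4, which a UTF-8 decoder reads as the start of a 3-byte sequence; the next byte, the letter s, is not a valid continuation byte, hence "Invalid byte 2 of 3-byte UTF-8 sequence". The sketch below reproduces this with a strict decoder, assuming that is how the file was saved; the class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true only if the bytes form a valid UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Hallmästaren", written with a Unicode escape for the letter ä.
        String heading = "Hallm\u00E4staren";

        byte[] savedAsLatin1 = heading.getBytes(StandardCharsets.ISO_8859_1);
        byte[] savedAsUtf8 = heading.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(savedAsLatin1)); // prints false
        System.out.println(isValidUtf8(savedAsUtf8));   // prints true
    }
}
```

Reading a file's bytes and passing them to a check like this is one way to find out whether an editor really saved it as UTF-8.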

<jsp:root and <html xmlns

In a JSP, do I need to provide both a jsp:root element and an XML namespace declaration, or only the latter? That is, if I have the following:
<jsp:directive.page language="java"
contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"
/>
<jsp:root xmlns:c="http://java.sun.com/jsp/jstl/core"
xmlns:s="http://www.springframework.org/tags"
/>
<!DOCTYPE html>
<html xmlns:c="http://java.sun.com/jsp/jstl/core"
xmlns:s="http://www.springframework.org/tags">
<head>
... remainder of my HTML page
should I remove the jsp:root element? The information seems redundant. Removing the namespace declaration from the html element makes Eclipse complain.
You need the jsp:root element to set up the rest of the document to understand tags beginning with s: or c:. So don't remove it.

Why <h:outputScript> does not work inside <h:form>

I am using JSF 2.1. I'm trying to use TinyEditor on a <h:inputTextarea>. Here is my code,
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:h="http://java.sun.com/jsf/html"
xmlns:ui="http://java.sun.com/jsf/facelets"
xmlns:f="http://java.sun.com/jsf/core"
xmlns:p="http://primefaces.org/ui">
<h:head>
<h:outputStylesheet library="css" name="editor_style.css" />
<h:outputStylesheet library="css" name="css/main.css" />
<h:outputStylesheet library="css" name="css/dropdown.css" />
<h:outputScript name="js/tinyeditor.js"></h:outputScript>
</h:head>
<h:body>
<div class="content">
<ui:include src="/template/layout/commonLayout.xhtml" />
<ui:include src="/template/layout/menu.xhtml" />
<h:form>
<div class="quick_links">
<div class="q_title">
</div>
<div class="q_window">
<div class="q_top"></div>
<div class="q_main">
<h:inputTextarea value="#{EditorBean.value}" id="input"
style="width:100%; height:300px;">Sample FAQ</h:inputTextarea>
<h:outputScript>
new TINY.editor.edit('editor',{
id:'input',
width:945,
height:175,
cssclass:'te',
controlclass:'tecontrol',
rowclass:'teheader',
dividerclass:'tedivider',
controls:['bold','italic','underline','strikethrough','|','subscript','superscript','|',
'orderedlist','unorderedlist','|','outdent','indent','|','leftalign',
'centeralign','rightalign','blockjustify','|','unformat','|','undo','redo','n',
'font','size','style','|','hr','link','unlink'],
footer:true,
fonts:['Verdana','Arial','Georgia','Trebuchet MS'],
xhtml:true,
cssfile:'style.css',
bodyid:'editor',
footerclass:'tefooter',
toggle:{text:'Source',activetext:'HTML',cssclass:'toggle'},
resize:{cssclass:'resize'}
});
</h:outputScript>
</div>
<div class="q_bottom"></div>
</div>
<h:commandButton value="Savebutton" action="#{EditorBean.test}"></h:commandButton>
</div>
<div class="push"></div>
</h:form>
</div>
</h:body>
</html>
If I remove the <h:form> tag, then the editor gets displayed, but the <h:commandButton> doesn't work.
If I keep the <h:form> tag, then the <h:commandButton> works, but the TinyEditor editor does not get initialized.
How is this caused and how can I solve it?
The <h:outputScript> works perfectly fine. The concrete problem is just in your own JavaScript code. You specified an ID of "input" in the TinyEditor configuration script:
id:'input',
However, there is no HTML element with that ID in the JSF-generated HTML output. Yes, you should be looking at the JSF-generated HTML output, because that is all the browser retrieves. JavaScript does not run on the web server but in the web browser, and it sees only the JSF-generated HTML output. Right-click the page in the browser and choose View Source to see it for yourself.
In this particular construct, the generated ID of the <h:inputTextarea> has the ID of the <h:form> prepended. You didn't specify any ID for the <h:form>, so JSF autogenerates one such as j_id123, and the HTML element ID of the <textarea> generated by <h:inputTextarea> becomes j_id123:input. You need to specify exactly that ID in the TinyEditor configuration script.
However, it is better to specify a fixed ID on the <h:form>, as the autogenerated ID may change whenever you add or remove components in the view.
<h:form id="form">
<h:inputTextarea id="input" />
...
This way the generated <textarea> will get an ID of form:input. Then you can just use it in the TinyEditor configuration script:
id:'form:input',

Stripping empty tags from Plone content with Diazo

I have a Plone site, themed with plone.app.theming. The problem I have is that the design is quite strict and doesn't assume any empty <p> elements or any other nonsense TinyMCE outputs. Such elements break the intended design. So I want to strip the empty elements from the content. I have tried inline xslt (that, according to http://diazo.org/advanced.html#inline-xsl-directives should be supported) like:
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(*) and not(text()[normalize-space()])]"/>
But it didn't do the trick. In fact it made something weird. The empty p tags that I wanted to get rid of stayed intact but some other elements like
<img src="../++theme++jarn.com/whatever.png" />
turned into
with the image being stripped out. Changing match="*[… in the second template to match="p[… no longer stripped out the images, but those nasty <p> elements were still in the output.
Any hints on how to get rid of the empty elements using Diazo rules?
UPDATE January 31, 2012
Here is the content from which I need the empty p tags to be stripped off:
<div id="parent-fieldname-text">
<p></p>
<p> </p>
<p> </p>
<p><section id="what-we-do">
<p class="visualClear summary">Not empty Paragraph</p>
<ul class="thumbsList">
<li> <img src="../++theme++jarn.com/whatever.png" /></li>
<li> <img src="../++theme++jarn.com/whatever.png" /></li>
</ul>
</section></p>
</div>
The Diazo transformation rules:
<?xml version="1.0" encoding="UTF-8"?>
<rules
xmlns="http://namespaces.plone.org/diazo"
xmlns:css="http://namespaces.plone.org/diazo/css"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p[not(*) and not(normalize-space())]"/>
<!-- The default theme, used for standard Plone web pages -->
<theme href="index.html" css:if-content="#visual-portal-wrapper" />
<replace css:theme-children="div.contentWrapper" css:content-children="#content" />
</rules>
The output I get after applying the transformations to the Plone site is identical to the input, whereas I would expect the three empty <p> tags after the opening <div> to go away.
If I change the second template to match all elements like match="*… then the images get stripped out, but the empty <p> tags are still there.
Just have this:
<xsl:template match="p[not(*) and not(normalize-space())]"/>
A complete transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p[not(*) and not(normalize-space())]"/>
</xsl:stylesheet>
when this transformation is applied on this XML document:
<div>
<p/>
<p> </p>
<p><img src="..."/></p>
<img src="..."/>
</div>
the wanted, correct result is produced:
<div>
<p>
<img src="..."/>
</p>
<img src="..."/>
</div>
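To try the transformation outside Plone and Diazo, it can be run with the XSLT 1.0 processor bundled with the JDK. This is only a sketch: the class name and the inlined strings are illustrative, and the exact indentation of the output may differ between processors.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StripEmptyParagraphs {
    // Identity transform plus an empty template that swallows empty <p> elements.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output omit-xml-declaration='yes' indent='yes'/>"
        + "<xsl:strip-space elements='*'/>"
        + "<xsl:template match='node()|@*'>"
        + "<xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='p[not(*) and not(normalize-space())]'/>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML text and returns the serialized result.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<div><p/><p> </p><p><img src='...'/></p><img src='...'/></div>";
        // The two empty <p> elements are dropped; the <p> wrapping an <img> is kept.
        System.out.println(transform(input));
    }
}
```

The more specific p[...] pattern has a higher default priority than node()|@*, so it takes precedence over the identity template for empty paragraphs only.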
Works for me. I've added an example of using Dimitre's xpath in a drop content rule at https://github.com/plone/diazo/commit/94ddff7117d25d3a8a89457eeb272b5500ec21c5 but it also works as the equivalent xsl:template. The example is pared down to the basics but it works using the complete example content in the question too.