Unicode character encoding

I have a JSP page that takes a first name and last name in Chinese. I am using the Struts framework.
I need to pass the first name and last name from the JSP to the servlet as Unicode characters.
I have made the following changes:
JSP changes:
1) <%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" %>
2) <meta http-equiv="content-type" content="text/html; charset=utf-8"> in the header
In the filter that is called before the action servlet, I used the following code:
request.setCharacterEncoding("UTF-8");
response.setContentType("text/html; charset=UTF-8");
This did not work: the Unicode characters that arrive are incorrect or unreadable.
Since the MVC framework may already have created the request object by the time it reaches the filter,
I modified the JSP to include the following lines of code:
<%@ taglib uri="http://java.sun.com/jsp/jstl/fmt" prefix="fmt" %>
<fmt:requestEncoding value="UTF-8" />
<fmt:setLocale value="zh_CN"/>
None of these changes worked. Please help me get the correct Unicode characters in the action class.
Is there any modification I need to make in the config files?
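Why the characters come out unreadable can be reproduced without a container. Servlet containers typically decode form parameters with ISO-8859-1 when the browser sends no charset, and request.setCharacterEncoding("UTF-8") only takes effect if it runs before the first getParameter (or getReader) call; if anything earlier in the chain reads a parameter first, the call is silently ignored. The sketch below (Java 10+) uses URLEncoder/URLDecoder as stand-ins for the browser and the container; the sample name is hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormDecodingDemo {
    /** What the browser sends for a form field when the page is served as UTF-8. */
    static String encodeAsBrowser(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** What getParameter would return, given the charset the container decodes with. */
    static String decodeAsContainer(String wire, Charset requestCharset) {
        return URLDecoder.decode(wire, requestCharset);
    }

    /** Common after-the-fact repair: undo a wrong ISO-8859-1 decode and redo it as UTF-8. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u5f20\u4e09"; // hypothetical sample name, escaped so the source file encoding does not matter
        String wire = encodeAsBrowser(name);
        System.out.println(wire); // %E5%BC%A0%E4%B8%89

        String garbled = decodeAsContainer(wire, StandardCharsets.ISO_8859_1);
        String correct = decodeAsContainer(wire, StandardCharsets.UTF_8);
        System.out.println(garbled.equals(name));         // false
        System.out.println(correct.equals(name));         // true
        System.out.println(repair(garbled).equals(name)); // true
    }
}
```

The repair() trick works only because ISO-8859-1 maps every byte to one character and back; it is a workaround, and setting the encoding before any parameter is read remains the cleaner fix.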

You also need to set the bundle basename:
<fmt:setBundle basename="[base_file_name_of_language_file]"/>
where [base_file_name_of_language_file] is the base name you gave your language properties file.
For example, the basename is "MyLanguageFileName" for this file:
MyLanguageFileName_zh_CN.properties
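As far as I know, JSTL's fmt tags resolve the basename through java.util.ResourceBundle, so the basename is the file name without the locale suffix and the .properties extension. The sketch below asks the JDK's own ResourceBundle.Control which file name it derives; "com.example.Messages" is a hypothetical packaged basename.

```java
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleNameDemo {
    /** File name that ResourceBundle's default lookup derives from a basename and a locale. */
    static String propertiesFileFor(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        // "MyLanguageFileName" + zh_CN -> "MyLanguageFileName_zh_CN"
        String bundleName = control.toBundleName(baseName, locale);
        // Turns '.' package separators into '/' and appends ".properties".
        return control.toResourceName(bundleName, "properties");
    }

    public static void main(String[] args) {
        Locale zhCN = Locale.forLanguageTag("zh-CN"); // corresponds to <fmt:setLocale value="zh_CN"/>
        System.out.println(propertiesFileFor("MyLanguageFileName", zhCN));
        // MyLanguageFileName_zh_CN.properties
        System.out.println(propertiesFileFor("com.example.Messages", zhCN));
        // com/example/Messages_zh_CN.properties
    }
}
```

Before Java 9, .properties bundles were read as ISO-8859-1, so Chinese text in them had to be written as \u escapes; from Java 9 on they are read as UTF-8 by default.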

Related

German Novel with DKPro

I am trying to process a German novel with DKPro. My sample input file is an XHTML file. How can I get the POS tagger output based on offsets in the original XHTML?
Script:
PACKAGE com.github.uima.ruta.novel;
ENGINE utils.HtmlAnnotator;
ENGINE utils.HtmlConverter;
ENGINE utils.ViewWriter;
TYPESYSTEM utils.HtmlTypeSystem;
TYPESYSTEM utils.TypeSystem;
IMPORT PACKAGE de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos FROM desc.type.POS;
IMPORT de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Lemma FROM desc.type.LexicalUnits;
UIMAFIT org.dkpro.core.opennlp.OpenNlpSegmenter;
UIMAFIT org.dkpro.core.stanfordnlp.StanfordPosTagger;
CONFIGURE(HtmlAnnotator, "onlyContent" = false);
Document{-> EXEC(HtmlAnnotator)};
Document { -> CONFIGURE(HtmlConverter, "inputView" = "_InitialView","outputView" = "plain"),
EXEC(HtmlConverter,{TAG})};
"<\\?xml version=\"1.0\" encoding=\"UTF-8\"\\?>"->MARKUP;
uima.tcas.DocumentAnnotation{-CONTAINS(POS)} -> {
uima.tcas.DocumentAnnotation{-> SETFEATURE("language", "de")};
EXEC(OpenNlpSegmenter);
EXEC(StanfordPosTagger, {POS});
};
Sample Input
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml"><head xmlns="http://www.w3.org/1999/xhtml"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><meta name="viewport" content="width=device-width, initial-scale=1.0" /><style></style><title></title></head><link xmlns="http://www.w3.org/1999/xhtml" src="./ckeditor.css" /><body xmlns="http://www.w3.org/1999/xhtml"><div class="WordSection1"><p class="Normal" data-name="Normal"><span data-bkmark="para10000"></span><span style="font-size:9pt">Der Idiot</span><span data-bkmark="para10000"></span></p>
<p class="Normal" data-name="Normal"><span data-bkmark="para10001"></span><span style="font-size:9pt">Ein Roman in vier Teilen.</span><span data-bkmark="para10001"></span></p>
</div>
<hr align="left" size="1" width="33%" /></body>
</html>
In the sample script, the whole uima.tcas.DocumentAnnotation is passed to the POS tagger. The MARKUP inside this annotation reduces the tagging accuracy. What do I need to do to improve the accuracy?
The HtmlAnnotator can be used to hide additional MARKUP so that rules are not affected by them.
The HtmlConverter is able to create a new document text without html/xml markup, but only in a new CAS view as the initial text in a CAS is static and cannot be changed.
The EXEC action is able to apply an external analysis engine on the current CAS object, and it can be configured to be applied on a different CAS view. However, the external analysis engine is applied on the complete CAS including the markup. No new CAS is created on the fly.
There are several options:
1) Apply the POS tagger to the ‘plain’ view. However, you cannot access these annotations with rules, because they will be present in a different view.
2) Set up a multi-view setting, e.g., with a two-stage process: first convert the text to plain text without markup, then apply the POS tagger to the new text.
3) Depending on the external analysis engine, you may also be able to solve this by redefining what a token is.
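UIMA annotations store begin/end character offsets into their view's text, so a tag found in a markup-free view has different offsets than the same word in the XHTML. One way to get results "based on the XHTML index" is to keep, for every plain-text character, its offset in the original. The sketch below is a hypothetical, naive tag stripper in plain Java (no UIMA API; comments, CDATA, entities and whitespace normalization are not handled), not how HtmlConverter works internally.

```java
import java.util.Arrays;

public class MarkupOffsetMap {
    /** Plain text plus, for each plain character, its index in the original markup. */
    static final class Stripped {
        final String plain;
        final int[] originalIndex;

        Stripped(String plain, int[] originalIndex) {
            this.plain = plain;
            this.originalIndex = originalIndex;
        }
    }

    /** Drops everything between '<' and '>' and records where each kept character came from. */
    static Stripped strip(String markup) {
        StringBuilder plain = new StringBuilder();
        int[] index = new int[markup.length()];
        int n = 0;
        boolean inTag = false;
        for (int i = 0; i < markup.length(); i++) {
            char c = markup.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                plain.append(c);
                index[n++] = i;
            }
        }
        return new Stripped(plain.toString(), Arrays.copyOf(index, n));
    }

    /** Maps a non-empty [begin, end) span on the plain text back to [begin, end) in the markup. */
    static int[] toOriginal(Stripped s, int begin, int end) {
        return new int[] {s.originalIndex[begin], s.originalIndex[end - 1] + 1};
    }

    public static void main(String[] args) {
        String xhtml = "<p><span style=\"font-size:9pt\">Der Idiot</span></p>";
        Stripped s = strip(xhtml);
        System.out.println(s.plain); // Der Idiot
        // Suppose a tagger marked "Idiot" at plain offsets [4, 9).
        int[] span = toOriginal(s, 4, 9);
        System.out.println(span[0] + ".." + span[1]);            // 35..40
        System.out.println(xhtml.substring(span[0], span[1]));   // Idiot
    }
}
```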

Sling Mapping Rewrite Rules do not rewrite paths in meta tags

I have Sling mappings set up that rewrite outgoing paths to the external URL. For example:
/content/www-sitename/home.html would be rewritten to http://www.sitename.com/home.html
I have also configured the LinkCheckerTransformerFactory: linkcheckertransformer.rewriteElements=["a:href","area:href","form:action","link:href","meta:content"]
Some HTML on a page component:
<head>
<link rel="canonical" href="/content/www-sitename/home.html" />
<meta name="canonical" content="/content/www-sitename/home.html" />
</head>
When the page is visited, only link:href has been rewritten; meta:content is unchanged:
<head>
<link rel="canonical" href="http://www.sitename.com/home.html" />
<meta name="canonical" content="/content/www-sitename/home.html" />
</head>
It is worth noting that link:href was not rewritten before I added it to linkcheckertransformer.rewriteElements. Why did this change work for link:href but not for meta:content? Aside from creating a custom rewrite filter, what can be done to get links in meta:content attributes rewritten?
nerd's answer is correct: by default, the internal Sling mechanism responsible for parsing HTML (htmlparser) supports only the following tags: a, area, form, base, link, script, body. So even if you add meta:content to the LinkChecker configuration, CQ won't recognize <meta> as a tag that needs processing.
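The explanation amounts to two gates in series: the htmlparser generator must emit the tag as an element, and the link checker must list the tag:attribute pair. That is why adding link:href worked (LINK already passes the first gate) while meta:content did not. The sketch below is a simplified model of that reasoning, not Sling's actual code; the tag list is the default quoted in the answer and the pairs are the question's rewriteElements value (Java 9+ for Set.of).

```java
import java.util.Locale;
import java.util.Set;

public class RewriteGateModel {
    // Default tags the htmlparser generator emits as elements, as listed in the answer.
    static final Set<String> DEFAULT_INCLUDE_TAGS =
            Set.of("A", "AREA", "FORM", "BASE", "LINK", "SCRIPT", "BODY");

    // The question's linkcheckertransformer.rewriteElements value.
    static final Set<String> REWRITE_ELEMENTS =
            Set.of("a:href", "area:href", "form:action", "link:href", "meta:content");

    /** Simplified model: an attribute is rewritten only if BOTH stages accept it. */
    static boolean isRewritten(String tag, String attribute,
                               Set<String> includeTags, Set<String> rewriteElements) {
        boolean parserEmitsElement = includeTags.contains(tag.toUpperCase(Locale.ROOT));
        boolean checkerListsAttr = rewriteElements.contains(tag.toLowerCase(Locale.ROOT) + ":" + attribute);
        return parserEmitsElement && checkerListsAttr;
    }

    public static void main(String[] args) {
        System.out.println(isRewritten("link", "href", DEFAULT_INCLUDE_TAGS, REWRITE_ELEMENTS));    // true
        System.out.println(isRewritten("meta", "content", DEFAULT_INCLUDE_TAGS, REWRITE_ELEMENTS)); // false

        Set<String> withMeta = Set.of("A", "AREA", "FORM", "BASE", "LINK", "SCRIPT", "BODY", "META");
        System.out.println(isRewritten("meta", "content", withMeta, REWRITE_ELEMENTS));             // true
    }
}
```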
To reconfigure htmlparser, create a node named generator-htmlparser under /libs/cq/config/rewriter/default with the following properties:
jcr:primaryType = nt:unstructured
includeTags = [A, AREA, FORM, BASE, LINK, SCRIPT, BODY, META]
The includeTags property should be multivalued, so you can add other tags in the future.
If you don't want to override the content under /libs, create your own rewriter configuration:
1) Copy /libs/cq/config/rewriter/default and its children to /apps/YOURAPP/config/rewriter/my-rewriter.
2) Set the order property on my-rewriter to 1.
3) Create generator-htmlparser under my-rewriter as described above.
I think you have to add the meta tag to the htmlparser generator.
See my question and answer: How to add additional element to htmlparser generator

title tag html editing

Hi everyone, I am trying to add my title tag, but every time I put it in the format "Company name | Primary keyword and Secondary keyword" I get a parsing error: "Error parsing XML, line 516, column 29: Element type "ShareFreeTemplates" must be followed by either attribute specifications, ">" or "/>"". Here is my HTML code:
<b:include data='blog' name='all-head-content'/>
<!--::::::::::: Block2: Output Index Title,keywords,decription and Post Title,description -->
<!-- Post/Archive Page -->
<b:if cond='data:blog.pageType != "index"'>
<title><data:blog.pageName/></title>
<!-- Index Page -->
<b:else/>
<title><ShareFreeTemplates|Free After Effects Templates And-Tutorials /></title>
<meta content='after effects free templates, templates, after effects project files, free download' name='keywords'/>
</b:if>
Could somebody edit that title tag so that it gives no errors? I want it like this: "Free After Effects Templates and Project Files | ShareFreeTemplates". Thanks in advance.
This is how the title tag works in standard HTML.
<title> insert words here </title> are the tags used; you insert the words between the two tags, like so:
<title>ShareFreeTemplates|Free After Effects Templates And-Tutorials </title>
You don't need to encapsulate your title in another pair of tags.
<title><ShareFreeTemplates|Free After Effects Templates And-Tutorials /></title>
Remove the "<" in front of ShareFreeTemplates and the "/>" just before </title>.
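The b: tags suggest a Blogger template, and the error says "Error parsing XML", so the template is read as XML: inside <title>, "<ShareFreeTemplates" opens an element of that name, and "|" cannot follow an element name. The sketch below reproduces this with the JDK's built-in DOM parser; it only checks well-formedness and is not Blogger's own parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TitleXmlCheck {
    /** Returns true if the snippet is well-formed XML; otherwise prints the parser's message. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            System.out.println(e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String broken = "<title><ShareFreeTemplates|Free After Effects Templates And-Tutorials /></title>";
        String fixed = "<title>Free After Effects Templates and Project Files | ShareFreeTemplates</title>";
        System.out.println(isWellFormed(broken)); // false
        System.out.println(isWellFormed(fixed));  // true
    }
}
```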

Resolve a Kentico localisation macro in a transformation?

As the title suggests, I am trying to resolve a localisation string inside a repeater. I have a WYSIWYG editor for entering some HTML on the form tab of the document type, so the source looks like this:
Field1: "{$localstring$}"
Then in the transformation I have
<li><%# Eval("Field1") %></li>
This outputs the string as
{$localstring$}
and does not resolve it as a macro by looking up localstring in the UI culture localisation.
I have tried different things including
<%# Eval(CMS.GlobalHelper.ResHelper.LocalizeString("Field1")) %>
and
<%# Eval(CMS.CMSHelper.CMSContext.CurrentResolver.ResolveMacros("Field1")) %>
All of these give the same output. Can anyone point me in the right direction? I am sure it is the way Eval is being called.
Thanks in advance.
In case somebody else searches for this: if you want to use the localization string custom.my-string in an ASPX transformation, you should resolve it as follows:
<%# CMS.CMSHelper.CMSContext.CurrentResolver.ResolveMacros("{$custom.my-string$}") %>
Note: no spaces! If you add spaces like this: "{$ custom.my-string $}", it WILL NOT work.
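The correct syntax is as follows. Eval("Field1") returns the field's value, and that value (not the literal text "Field1") is what must be passed to ResolveMacros: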
<%# CMS.CMSHelper.CMSContext.CurrentResolver.ResolveMacros(Eval("Field1").ToString()) %>
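The difference between the failed attempts and the working line is what reaches the resolver: the literal string "Field1" contains no macro, while the field's value "{$localstring$}" does. The sketch below is a hypothetical model of a {$key$} resolver in Java, not Kentico's implementation; the no-whitespace rule is taken from the answer above, and the localized string "Hello" is made up (Java 9+ for Map.of and appendReplacement(StringBuilder, ...)).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocalizationMacroSketch {
    // Hypothetical model of a {$key$} localization macro; whitespace inside the delimiters is not allowed.
    private static final Pattern MACRO = Pattern.compile("\\{\\$([^\\s$]+)\\$\\}");

    /** Replaces each {$key$} in text with its localized string; unknown keys are left as-is. */
    static String resolveMacros(String text, Map<String, String> localized) {
        Matcher m = MACRO.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = localized.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> document = Map.of("Field1", "{$localstring$}"); // what Eval("Field1") reads
        Map<String, String> localized = Map.of("localstring", "Hello");      // hypothetical UI-culture string

        System.out.println(resolveMacros("Field1", localized));               // Field1 (literal name, no macro)
        System.out.println(resolveMacros(document.get("Field1"), localized)); // Hello (field value resolved)
        System.out.println(resolveMacros("{$ localstring $}", localized));    // {$ localstring $} (spaces: no match)
    }
}
```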

set global encoding in Zend Framework

My bootstrap sets encoding for all views:
protected function _initView () {
$view = new Zend_View();
// snip...
$view->setEncoding('utf-8');
// snip...
return $view;
}
However, this does not set the encoding for my form validators. The StringLength validator uses its default encoding (I'm not sure which one that is) and counts each character with a diacritic as two characters.
I know I can set the 'encoding' => 'utf-8' option when creating the validator, but it's kind of pesky to update all validators across my entire (huge) application. Is there a way to set the encoding for all validators at the same time?
You normally shouldn't have a problem with that if you send the Content-Type header very early and also declare it in the HTML document.
In PHP you can send the encoding at the entry point (in Zend Framework, the index.php file):
header( 'Content-Type: text/html; charset=utf-8' );
In your HTML layout, place the meta tag inside the head element:
<!-- do this only in: HTML5 -->
<meta charset="utf-8" />
<!-- do this only in: HTML4 -->
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
And, of course, save every source file (HTML and PHP files) in the project as UTF-8 (without BOM).
In ZF 1.1x, the StringLength validator uses iconv_strlen:
int iconv_strlen(string $str [, string $charset = ini_get("iconv.internal_encoding")])
So one thing to try is to set iconv.internal_encoding with ini_set (or call iconv_set_encoding('internal_encoding', $encoding);).
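The charset argument decides how bytes map to characters. If iconv's internal encoding is a single-byte charset such as ISO-8859-1, every byte of UTF-8 input counts as one character, so each two-byte accented letter is counted twice, which matches the symptom in the question. The sketch below is a Java analogue of that counting, not PHP's code; the sample word is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class StringLengthDemo {
    /** Length when UTF-8 bytes are read with a single-byte charset (one character per byte). */
    static int lengthWithSingleByteCharset(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1).length();
    }

    /** Length in Unicode code points, i.e. what a UTF-8-aware strlen reports. */
    static int lengthWithUtf8(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return decoded.codePointCount(0, decoded.length());
    }

    public static void main(String[] args) {
        String word = "P\u0159\u00edli\u0161"; // "Příliš": 6 characters, 3 of them with diacritics
        System.out.println(lengthWithSingleByteCharset(word)); // 9
        System.out.println(lengthWithUtf8(word));              // 6
    }
}
```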
Maybe a bit late, but this would be the full answer:
mb_internal_encoding('utf-8');
iconv_set_encoding('internal_encoding', 'utf-8');