I'm creating the German version of a CHM help file. My problem is that umlauts are not displayed in the Table of Contents. I assume it is because of the code page. The .hhc file is ANSI. Converting it to Unicode doesn't help: it displays different, but still wrong, characters.
The file "Table of Contents.hhc" starts with
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<meta name="GENERATOR" content="Microsoft® HTML Help Workshop 4.1">
<!-- Sitemap 1.0 -->
</HEAD><BODY>
<OBJECT type="text/site properties">
<param name="ImageType" value="Folder">
</OBJECT>
<UL>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="ÜÜÜÜÜÜÜÜÜÜÜÜÜÜ Uberblick">
<param name="Local" value="overview.htm">
<param name="URL" value="overview.htm">
</OBJECT>
</UL>
</BODY></HTML>
Make sure the "Language" setting in the "Options" section of the project file supports the characters you want. Since you are on a Russian system, the default is probably Russian. Change it to German, for instance.
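In HTML Help Workshop this corresponds to the Language line in the [OPTIONS] section of the .hhp project file. A rough sketch of what it might contain (the compiled-file and contents-file names here are placeholders, and the exact label Workshop writes next to 0x407 can differ):
[OPTIONS]
Compiled file=help_de.chm
Contents file=Table of Contents.hhc
Language=0x407 German (Germany)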
The engine rendering the CHM is Unicode; only the compiler is ANSI.
Try escaping them as HTML entities: http://www.w3schools.com/tags/ref_entities.asp
or set the charset encoding: http://www.w3.org/TR/html4/charset.html#h-5.2.2
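For example, the sitemap entry from the question could be written with entities instead of raw umlauts (a sketch only; whether the compiler resolves entities inside param values is exactly what you would be testing):
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="&Uuml;berblick">
<param name="Local" value="overview.htm">
</OBJECT>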
Actually, you don't need UTF-8 for CHM files, because CHM doesn't support UTF-8 or Unicode. CHM is an ancient format that Microsoft has not really changed since Windows 98, and it has a number of quirks and restrictions like this one.
Read these for more detail:
https://helpman.it-authoring.com/viewtopic.php?t=9294
https://blogs.msdn.microsoft.com/sandcastle/2007/09/29/chm-localization-and-unicode-issues-dbcsfix-exe/
I use UTF-8 encoding and classic ASP with VBScript as the default scripting language on my website. I have split the files into smaller parts for better management.
I always use this trick in the first lines of the included files to preserve UTF-8 encoding while saving; otherwise the language-specific characters get converted to weird characters.
mainfile.asp
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body>
<!--#include file="sub.asp"-->
</body>
</html>
sub.asp
<%if 1=2 then%>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<%end if%>
These are some characters in another language (Persian; the line below reads "a test of text in the Persian language"):
تست متن به زبان فارسی
This trick works well for offline saving, and it also works when the page runs on the server, because the extra lines are omitted (the condition is always false!).
Is there a better way to preserve encoding in separated files?
I use Microsoft Expression Web for editing the files.
I use TextPad to ensure that all main files and includes are saved with UTF-8 encoding. Just hit 'Save As' and set the encoding dropdown in the dialog to the one you want.
Keep the meta tag as well, because that is still necessary.
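For what it's worth, these runtime settings are commonly used alongside the meta tag in classic ASP; a sketch only, not part of the answer above, placed at the very top of mainfile.asp (the directive and Response properties are standard classic ASP, but whether you need them depends on your setup):
<%@ Language="VBScript" CodePage=65001 %>
<%
' CODEPAGE=65001 tells ASP to interpret literal text in this source file as UTF-8;
' Response.CodePage/Charset make the output and its Content-Type header UTF-8.
Response.CodePage = 65001
Response.Charset = "UTF-8"
%>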
I have a basic installation of Atom (atom.io).
I use this in my snippets file:
'.text.xml':
  'Примечание':
    'prefix': 'note'
    'body': """
      <note>
        <title>
          <inlinegraphic fileref="IMG/notice.png" align="absmiddle"/>
          <emphasis role="bold">Примечание</emphasis>
        </title>
        <para>текст примечания</para>
      </note>
    """
and this is what I get as a result when using the snippet:
<note>
<title>
<inlinegraphic fileref="IMG/notice.png" align="absmiddle"/>
<emphasis role="bold">����������</emphasis>
</title>
<para>����� ����������</para>
</note>
I tried changing the encoding (windows-1251, UTF-8, KOI8-R), but it doesn't help turn the signs back into text.
What can I do to make this snippet work with Russian?
Update: I used it at work with windows-1251, since most of our XML files with Russian text are written in that encoding. When I tried it later today at home with Atom's default UTF-8 encoding for my region, I got the correct spelling of the Russian letters from the snippet:
<note>
<title>
<inlinegraphic fileref="IMG/notice.png" align="absmiddle"/>
<emphasis role="bold">Примечание</emphasis>
</title>
<para>текст примечания</para>
</note>
I also found a similar (though not identical) problem description on the atom.io discussion board.
Interestingly, that poster had problems with UTF-8 encoding, while I have them with windows-1251 but not with UTF-8. I would be happy to know how to get rid of this problem altogether.
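One thing that may be worth checking (my assumption, not something from that board thread) is Atom's default file encoding, core.fileEncoding, which can be pinned explicitly in config.cson, roughly like this:
"*":
  core:
    fileEncoding: "utf8"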
I need to create i18n properties files for non-Latin languages (simplified Chinese, Japanese Kanji, etc.). With the Swing portion of our product, we use Java properties files with the raw UTF-8 characters in them, which NetBeans automatically converts to ISO 8859-1 for us, and it works fine. With JavaFX, this strategy isn't working. Our strategy matches this answer precisely, yet it doesn't seem to work in this case.
While investigating the problem, I discovered this old article indicating that I need to use native2ascii to convert the characters in the properties file; that still doesn't work.
To eliminate as many variables as possible, I created a sample FXML project to illustrate the problem. There are three internationalized labels in Japanese Kanji. The first label has the text in the FXML document. The second loads the raw unescaped characters from the properties file. The third loads the escaped Unicode (matching native2ascii output).
jp_test.properties
btn.one=閉じる
btn.two=\u00e9\u2013\u2030\u00e3\ufffd\u02dc\u00e3\u201a\u2039
jp_test.fxml
<?xml version="1.0" encoding="UTF-8"?>
<?import java.lang.*?>
<?import java.util.*?>
<?import javafx.scene.control.*?>
<?import javafx.scene.layout.*?>
<?import javafx.scene.paint.*?>
<?scenebuilder-preview-i18n-resource jp_test.properties?>
<AnchorPane id="AnchorPane" maxHeight="-Infinity" maxWidth="-Infinity" minHeight="-Infinity" minWidth="-Infinity" prefHeight="147.0" prefWidth="306.0" xmlns:fx="http://javafx.com/fxml">
<children>
<Label layoutX="36.0" layoutY="33.0" text="閉じる" />
<Label layoutX="36.0" layoutY="65.0" text="%btn.one" />
<Label layoutX="36.0" layoutY="97.0" text="%btn.two" />
<Label layoutX="132.0" layoutY="33.0" text="Static Label" textFill="RED" />
<Label layoutX="132.0" layoutY="65.0" text="Properties File Unescaped" textFill="RED" />
<Label layoutX="132.0" layoutY="97.0" text="Properties File Escaped" textFill="RED" />
</children>
</AnchorPane>
Result
As you can see, the third label is not rendered correctly.
Environment:
Java 7 u21, u27, u45, u51, 32-bit and 64-bit. (JavaFX 2.2.3-2.2.45)
Windows 7 Enterprise, Professional 64-bit.
UPDATE
I've verified that the properties file is ISO 8859-1.
Most IDEs (NetBeans, at least) handle properties files with Unicode escaping by default. If you create the properties file in NetBeans and enter the Japanese text there, the entered text is escaped automatically. To see this, open the properties file with Notepad(++); you will see that the Japanese characters are escaped.
The Unicode-escaped equivalent of "閉じる" is "\u9589\u3058\u308b", whereas "\u00e9\u2013\u2030\u00e3\ufffd\u02dc\u00e3\u201a\u2039" decodes back to "é–‰ã�˜ã‚‹". So the program output in the picture is correct. Additionally, if you reopen jp_test.properties in NetBeans, you will see the escaped text displayed in decoded form.
EDIT: as per comment,
Why does it do this?
It may be because you are omitting the -encoding parameter of native2ascii, and the default charset of your system is not UTF-8. That may be the reason for that output.
Also, why is it that Java and Swing have no problems with our properties files as they are,
but FXML can't handle it?
That cannot be the case, because FXML is Java. The only difference may be "using the system charset" versus "overriding the charset in some configuration place".
Anyway, I suggest passing the right -encoding parameter to native2ascii, matching the input file's encoding. More specifically, save the properties file as UTF-8 first and then convert it, as in the example below. If you are using NetBeans as your IDE, there is no need for native2ascii.
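For instance, assuming jp_test.properties has been saved as UTF-8 (the output file name is just an example):
native2ascii -encoding UTF-8 jp_test.properties jp_test_escaped.properties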
Properties files should be ISO 8859-1 encoded, not UTF-8.
Characters can be escaped using \uXXXX.
Tools such as NetBeans do this by default, AFAIK.
http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html
http://docs.oracle.com/javase/tutorial/i18n/text/convertintro.html
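A minimal sketch of how such an ISO 8859-1 file with \uXXXX escapes loads, assuming the file has been re-escaped so that it contains btn.one=\u9589\u3058\u308b as derived above (the class name is only for illustration):
import java.io.FileInputStream;
import java.util.PropertyResourceBundle;

public class EscapeCheck {
    public static void main(String[] args) throws Exception {
        // Property streams are read as ISO 8859-1; \uXXXX escapes are decoded on load.
        try (FileInputStream in = new FileInputStream("jp_test.properties")) {
            PropertyResourceBundle bundle = new PropertyResourceBundle(in);
            // Prints 閉じる if the escapes were produced with the correct source encoding.
            System.out.println(bundle.getString("btn.one"));
        }
    }
}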
I have some .html with the font defined as:
<font color="white" face="Arial">
I have no other style applied to my tag. Inside it, when I display data like:
<b> “Software” </b>
or
<b>“Software”</b>
they both display characters I do not want in the UIWebView. It looks like this on a black background:
How do I avoid that? If I don't use font face="Arial", it works fine.
This is an encoding issue. Make sure you use the same encoding everywhere. UTF-8 is probably the best choice.
You can put a line
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
in your html to tell UIWebView about the encoding.
To be precise, â€œ is what you get when you take the UTF-8 encoding of “ and interpret it as ISO-8859-1. So your data is encoded in UTF-8, which is good; you just need to declare the content type as UTF-8 instead of ISO-8859-1 (e.g. using the <meta> tag above).
You shouldn't generally use the curly quote characters themselves; character encodings will always mess you up somehow. No idea why it works correctly when you don't use Arial (though that suggests a great idea: don't use Arial), but your best bet is to use the HTML entities &ldquo; and &rdquo; instead.
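Using the markup from the question, that would be:
<b>&ldquo;Software&rdquo;</b>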
I can't get Zend_Form to accept any inserted Latin characters (ü, é, etc.).
Even when I'm not validating, it doesn't accept them.
Does anyone know how to get this to work?
Gr. Tosh
After doing a couple of tests, it seems to be a simple character encoding issue.
Your server is probably not delivering documents with UTF-8 encoding. You can easily force this in your view / layout by placing this in your <head> (preferably as the first child):
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
or, if using an HTML5 doctype:
<meta charset="utf-8">
It probably doesn't hurt to set the Zend_View encoding as well in your application config file, though this wasn't necessary in my tests (I think "UTF-8" is the default anyway):
resources.view.encoding = "utf-8"
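If you prefer to set it in code rather than in application.ini, the same thing can be done on the view object in the bootstrap; a sketch, assuming the standard Zend_Application bootstrap layout (the method name _initView is just the usual convention):
<?php
class Bootstrap extends Zend_Application_Bootstrap_Bootstrap
{
    protected function _initView()
    {
        // Create the view with UTF-8 encoding and hand it to the ViewRenderer.
        $view = new Zend_View();
        $view->setEncoding('UTF-8');
        $viewRenderer = Zend_Controller_Action_HelperBroker::getStaticHelper('ViewRenderer');
        $viewRenderer->setView($view);
        return $view;
    }
}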