GWT: Character encoding umlauts - gwt

I want to set a text in a label:
labelDemnaechst.setText(" Demnächst fällig:");
In the application's output, the character "ä" is displayed incorrectly.
How can I get it to display correctly?

GWT assumes all source files are encoded in UTF-8 (and this is why you see lÃ¶schen instead of löschen if you open them with an editor that interprets them as Windows-1252 or ISO-8859-1).
GWT also generates UTF-8-encoded files, so your web server should serve them as UTF-8 (or don't specify an encoding at all, to let browsers "sniff" it) and your HTML host page should be served in UTF-8 too (best is to put a <meta charset=utf-8> at the beginning of your <head>).
If a \u00E4 in Java is displayed correctly in the browser, then you're probably in the first case, where your source files are not encoded in UTF-8.
See http://code.google.com/webtoolkit/doc/latest/FAQ_Troubleshooting.html#International_characters_don't_display_correctly
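The effect of reading UTF-8 bytes as a single-byte encoding can be reproduced in a few lines of Java. This is only an illustrative sketch: the class and method names are made up, and ISO-8859-1 stands in for the misconfigured editor's encoding (Windows-1252 maps the two bytes involved the same way).

```java
import java.nio.charset.StandardCharsets;

public final class MojibakeDemo {
    // Encode the text as UTF-8, then decode those bytes with the wrong charset.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The escape keeps this source file pure ASCII, so it compiles
        // the same way whatever encoding the compiler assumes.
        String original = "l\u00F6schen";
        String garbled = misdecode(original);
        System.out.println(original.length() + " chars became " + garbled.length());
    }
}
```

The single o-umlaut is two bytes in UTF-8, so the wrongly decoded string is one character longer: the umlaut turns into "Ã" followed by "¶".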

Well, you have to encode your special characters as Unicode escapes. You can find a list of the corresponding Unicode characters here.
Your example would look like this:
labelDemnaechst.setText("Demn\u00E4chst f\u00E4llig:");
Hope this helps, if no one has a better solution.
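For longer strings, the escapes can be generated instead of looked up by hand. A minimal sketch, assuming you only need to escape characters outside 7-bit ASCII; UnicodeEscaper and escapeNonAscii are hypothetical names, not part of GWT or the JDK:

```java
public final class UnicodeEscaper {
    // Replace every UTF-16 code unit above 127 with a Java-style
    // backslash-u escape made of four uppercase hex digits.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Pass the text to escape as the first command-line argument.
        if (args.length > 0) {
            System.out.println(escapeNonAscii(args[0]));
        }
    }
}
```

The result can be pasted into a string literal such as the setText argument above. Escaping only works around the problem; the source file encoding still has to be fixed if you want to type umlauts directly.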
Appendix:
Thanks, Thomas, for your tip: you really do have to change the encoding in which Eclipse saves its source files. By default it uses something like Cp1252. If you change it to UTF-8, your example works correctly (so Demnächst is displayed correctly).
You can change the encoding used for saving by right-clicking your file and choosing Properties.

To get UTF-8 encoding for your entire workspace, go to Window -> Preferences. In the pop-up, start typing "encoding". You should now see Content Types, Workspace, CSS Files, HTML Files and XML Files as results. Under Content Types you can type UTF-8 in the Default encoding text box; for the other entries you can simply select the encoding in their respective list boxes.
Then check the encoding for your project in Project -> Properties -> Resource.
Detailed instruction with pictures can be found here:
http://stijndewitt.wordpress.com/2010/05/05/unicode-utf-8-in-eclipse-java/
Cheers

What I did:
open the file with Notepad (from Windows Explorer),
and save it with the UTF-8 option instead of the proposed ANSI.
Setting the project encoding to UTF-8 didn't work (for me).
Cheerio
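The same re-save can be scripted when many files are affected. This is a rough sketch under the assumption that you know the files' current encoding (windows-1252 in the usage below); the class and method names are invented for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class Reencode {
    // Decode bytes with the charset they were written in, then encode them as UTF-8.
    static byte[] toUtf8(byte[] input, Charset sourceCharset) {
        return new String(input, sourceCharset).getBytes(StandardCharsets.UTF_8);
    }

    // Rewrite a file in place; the caller must know the file's current encoding.
    static void convertFile(Path file, Charset sourceCharset) {
        try {
            Files.write(file, toUtf8(Files.readAllBytes(file), sourceCharset));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumes the file named in the first argument is currently windows-1252.
        if (args.length > 0) {
            convertFile(Paths.get(args[0]), Charset.forName("windows-1252"));
        }
    }
}
```

Run it only once per file: converting a file that is already UTF-8 would double-encode every non-ASCII character.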

Use iso-8859-1 (western europe) character set instead of UTF-8.

Related

HtmlHelp hhc file doesn't show russian characters

I use Free Pascal's chmcmd command to create a chm file from an hhp project. The content converts fine, but the left pane (the tree) doesn't show Russian characters. I tried setting the charset in the hhc file to cp1251 and saving the file in Windows-1251 encoding. After that, the tree shows correctly in Russian in Cool Reader but not in xChm. In Windows it still doesn't work; I only get weird symbols. UTF-8 doesn't work at all.
The Microsoft CHM help format is very old and not maintained anymore. It wasn't created with Unicode in mind and various tricks need to be done in order to be able to generate CHM files for certain encodings:
Your Windows must be set up in the target language of the help file
The content HTML pages must be created using the proper charset

How should a properly UTF-8 encoded file look in notepad++

I am integrating data using some flat files. I'm getting the flat files delivered by FTP as .csv-files out of MS SQL exports from a business partner.
I asked him to encode it as UTF-8 (just using the standard I thought).
Now I can see in his files that a lot of UTF-8 bytes such as "& # 2 3 3 ;" (w/o the spaces) show up as plain text when I open them in Notepad++ (or in my "ETL" tool).
Before I ask him to fix it into proper UTF-8, I would like to understand the issue and whether my claim is actually correct?
Shouldn't special characters be shown as special characters when I open them in Notepad++ and not as plain text UTF-8 codes?
Any help is much appreciated :))
Cheers
Martin
&#233; is an HTML entity (a numeric character reference for é). For some reason the text is HTML formatted, which I wouldn't count as "plaintext"/flat files. The file may or may not be encoded in UTF-8 in addition to that; we can't tell from the information given.
A file containing "special characters" (meaning non-ASCII characters) encoded in UTF-8 opened in a text editor which correctly interprets the file as UTF-8 looks exactly like the text it should look like, e.g.:
正式名称は、ISO/IEC 10646では “UCS Transformation Format 8”、Unicodeでは “Unicode Transformation Format-8” という。両者はISO/IEC 10646とUnicodeのコード重複範囲で互換性がある。RFCにも仕様がある。
Put this in a file, save it as UTF-8, open it in another application as UTF-8, and this is what the text should look like.
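If the partner cannot change the export, the decimal references could be turned back into real characters on the receiving side. A minimal sketch, assuming only decimal references of the form shown in the question occur (no named or hexadecimal entities); NumericEntityDecoder is a made-up name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class NumericEntityDecoder {
    // Matches decimal references only: ampersand, hash, digits, semicolon.
    private static final Pattern DECIMAL_REFERENCE = Pattern.compile("&#(\\d{1,7});");

    static String decode(String text) {
        Matcher m = DECIMAL_REFERENCE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group(); // leave out-of-range references untouched
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For real HTML input a proper HTML parser is the safer choice; this only covers the one pattern visible in the file.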

Page won't show special characters

So I know this is a common problem with stuff like the charset, but the weird thing is that this works on a page with the same set-up/template, but not on this one!
So basically, my problem is that the page won't show Norwegian characters like å and ø.
Here's the page with the problem: http://suldal.underbakke.net/register.php
and here's one with the same template but working: http://suldal.underbakke.net/
(On the second one, it's a "å" in 4th post, in the name)
The page is declared as being in UTF-8 encoding, but it is in fact windows-1252 (or iso-8859-1) encoded. You can see this by manually selecting the encoding while viewing the page in a browser; browsers typically have a View menu where you can select the encoding.
Thus, as a quick fix, you could just change utf-8 to windows-1252 in the meta tag.
As a different workaround, you could replace the “special characters” (Scandinavian letters) by HTML entities, e.g. “ø” by &oslash;. Depending on the authoring software, you might need to do something special to achieve this (e.g., enter “HTML mode”), because an authoring tool might automatically convert “&” to &amp;.
As the best solution, find out how to save a file in UTF-8 encoding in the authoring program you are using, and keep the meta tag as is. This is typically either an option in the general settings of the program or a choice you can make in a “Save As” command.
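Whether a saved file really is UTF-8 can also be checked programmatically by decoding it strictly. The sketch below is one way to do that in Java, with an invented class name; it reports malformed byte sequences instead of silently replacing them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public final class Utf8Check {
    // Returns true only if the bytes decode as UTF-8 without any error.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A page saved as windows-1252 or iso-8859-1 that contains å or ø fails this check, because those single bytes are not complete UTF-8 sequences. A pure-ASCII file passes in either case, so a positive result only proves something when non-ASCII characters are present.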

Notepad++ can recognize encoding?

I created a file with UTF-8 encoded content (using PHP fputcsv).
When I open this file in Notepad++, the characters are wrong (Notepad++ starts with ANSI encoding).
When I set Format->"Encode in UTF-8" from the menu, everything is fine.
I'm wondering whether Notepad++ can recognize the encoding somehow, or whether something is wrong with the file I created with fputcsv. The first byte or something?
Automatically detecting an encoding is not something that can be done accurately. It's pretty much essential that the encoding be specified explicitly. It can be guessed in some cases, but even then not with 100% certainty.
This documentation (Encoding) explains the situation in relation to Notepad++.
They also point out that the difficulty arises especially if the file has not been saved with a Byte Order Mark (BOM).
Given that your file displays correctly once you manually set the encoding, I would say there's nothing wrong with how you are generating and saving the file. The only thing you can check for is whether a BOM is being saved, which might improve the chances of Notepad++ being able to automatically detect the encoding.
It's worth noting that although a BOM may help editors like Notepad++ identify the encoding more accurately, the Unicode Standard neither requires nor recommends a BOM for UTF-8.
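Checking for a UTF-8 BOM only requires looking at the first three bytes of the file (EF BB BF). The question is about PHP, but to illustrate the check itself, here is a small Java sketch with a hypothetical class name:

```java
public final class BomCheck {
    // A UTF-8 byte order mark is the byte sequence EF BB BF at offset 0.
    static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }
}
```

Feed it the file contents, for example from Files.readAllBytes. If it returns false, the file has no BOM and an editor has to guess the encoding from the content alone.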
Check the lower right corner of the Notepad++ window to see the encoding that is actually being used. The problem isn't specific to Notepad++: guessing the right encoding is a hard problem without any real solution, so it's better to let the user decide which encoding is most appropriate in each case.
When you want to reflect the encoding of the text file in a Java program, you have to consider two things: encoding and character set. When you open a text file, you see the encoding under the "Encoding" menu. Also look at the character set menu item: under "Eastern European" you will find "ISO 8859-2", and under "Central European" you will find "Windows-1250". You can set the corresponding encoding in the Java program
by looking it up in this table:
https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
For example, for the Central European character set "Windows-1250" the table suggests the Java encoding "Cp1250". Set that encoding and the characters will display properly in your program.
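As a rough illustration of setting the encoding in Java, the sketch below decodes bytes using a name from that table. It assumes the running JVM supports "Cp1250" (standard JDK builds normally do); the class and method names are made up.

```java
import java.nio.charset.Charset;

public final class CodePageDemo {
    // Decode raw bytes using a Java encoding name such as "Cp1250".
    static String decode(byte[] bytes, String javaEncodingName) {
        return new String(bytes, Charset.forName(javaEncodingName));
    }

    public static void main(String[] args) {
        // Byte E8 is c-with-caron in Windows-1250 but e-with-grave in ISO-8859-1.
        byte[] sample = {(byte) 0xE8};
        String central = decode(sample, "Cp1250");
        String western = decode(sample, "ISO-8859-1");
        System.out.println("Same character? " + central.equals(western));
    }
}
```

The same Charset object can be passed to Files.newBufferedReader or an InputStreamReader when reading a whole file.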

Is there a way to get the encoding of a text file in UltraEdit?

Is there a setting in UltraEdit that allows me to see the encoding of the file?
In UltraEdit, the encoding that is being used to -display- the file, is shown in the status bar at the right somewhere, together with the line-ending type in use, for example, "U8-UNIX". You can also manually set as what encoding the file has to be displayed. In version 10 this is under menu View -> Set Code Page. You can also -convert- the actual codepage of the file under menu File -> Conversions.
If the file does not have a BOM header (a couple of bytes at the start of the file indicating the encoding), the -actual- encoding of the file can only be guessed. And even if the file has a BOM header, there can still be encoding issues.
All text editors do this, and some are better at it than others. I haven't done a comparison to see which is best at it. At the moment (2012), I know UltraEdit fails to detect UTF-8 and other variants in 1000 line (or longer) text files if the first UTF-8 character only appears later in the document. It also fails to show the encoding properly when you set it manually.
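One plausible explanation for missing a late character is that a detector only samples the beginning of a file; that is an assumption about detectors in general, not a statement about how UltraEdit works internally. The hypothetical Java helper below shows how to find where the first non-ASCII byte actually sits, which tells you how far a detector would have to read:

```java
public final class NonAsciiScan {
    // Returns the index of the first byte outside 7-bit ASCII, or -1 if there is none.
    static int firstNonAsciiOffset(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if ((bytes[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }
}
```

Up to that offset, UTF-8, ISO-8859-1 and Windows-1252 are byte-for-byte identical, so no tool can tell them apart from the earlier content alone.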
Notepad++ is also not great at detecting it, but when you know the encoding, you can set it manually.
Sublime Text is, as far as I know, best at detecting the encoding, also in large files.
I think there are also some very good command line tools out there, ported from GNU to Windows, to detect encoding. My bet would be that that's going to be the best option.