Unicode characters in Javadoc [duplicate]

I am trying to generate Java documentation in Eclipse. The source files are UTF-8 encoded and contain some umlauts. The resulting HTML files do not specify an encoding and do not use HTML entities, so the umlauts aren't displayed correctly in any browser.
What can I do to change this?

Modified from Eclipse javadoc in utf-8:
Project -> Generate Javadoc -> Next -> on the last page, in Extra Javadoc options write:
-encoding UTF-8 -charset UTF-8 -docencoding UTF-8

See the -charset, -encoding and -docencoding flags for the javadoc command.
-encoding specifies the input encoding
-docencoding specifies the output encoding
-charset makes javadoc include a meta tag with encoding info
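For anyone who runs javadoc from Java code rather than from the Eclipse wizard, the same three flags can be passed through the standard javax.tools API. This is only a minimal sketch: the class name Utf8JavadocArgs, the output directory "doc" and the source path "src/Example.java" are made-up placeholders, not something from the question.

```java
import javax.tools.DocumentationTool;
import javax.tools.ToolProvider;
import java.util.List;

// Hypothetical helper; the class name and the paths used in main are illustrative only.
final class Utf8JavadocArgs {

    // Builds a javadoc argument list containing the three encoding flags.
    static List<String> build(String outputDir, String sourceFile) {
        return List.of(
                "-encoding", "UTF-8",     // how the .java sources are read
                "-docencoding", "UTF-8",  // how the generated HTML files are written
                "-charset", "UTF-8",      // charset declared in the HTML meta tag
                "-d", outputDir,          // standard javadoc output-directory option
                sourceFile);
    }

    public static void main(String[] args) {
        DocumentationTool javadoc = ToolProvider.getSystemDocumentationTool();
        if (javadoc == null) {
            // Happens when running on a runtime that ships without the JDK tools.
            System.err.println("No javadoc tool found; a full JDK is required.");
            return;
        }
        List<String> javadocArgs = build("doc", "src/Example.java");
        // Passing null streams makes the tool use System.in, System.out and System.err.
        int status = javadoc.run(null, null, null, javadocArgs.toArray(new String[0]));
        System.out.println("javadoc exit status: " + status);
    }
}
```

The argument list is kept in its own method so that it can be inspected or reused without actually starting the tool.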

If you generate your Javadoc with an Ant task and you use UTF-8, you can do:
<javadoc encoding="UTF-8" charset="UTF-8" docencoding="UTF-8" sourcepath="yoursources" destdir="yourdocdir" />

When generating the javadoc with Gradle add the following to your build.gradle file:
javadoc {
    options.encoding = 'UTF-8'
    options.docEncoding = 'UTF-8'
    options.charSet = 'UTF-8'
}

Related

UTF-8 problems in Grails 2.4 with russian lang

I started a new project in Grails 2.4, and when I write something in Russian in GSP files, the browser renders the page with incorrect symbols. In the GSP files I have charset=utf-8 and the line <%@ page contentType="text/html;charset=UTF-8" %>. In Config.groovy I have:
grails {
    views {
        gsp {
            encoding = 'UTF-8'
            htmlcodec = 'xml'
            codecs {
                expression = 'html'
                scriptlet = 'html'
                taglib = 'none'
                staticparts = 'none'
            }
        }
    }
}
I also changed the encoding to UTF-8 for the whole project in the Eclipse project properties. What is wrong? Please help.

I found the solution: the problem was with Tomcat. I changed the Tomcat configuration to use UTF-8.

Ant Jar task corrupts manifest encoding

Per the JAR specification, the manifest encoding has to be UTF-8.
In some scenarios (e.g. merge), manifests produced by Ant's jar task get corrupted and special characters are double-encoded.
Original manifest (utf-8):
...
Application-Name: spécial
...
Final manifest (utf-8) after being processed by Ant's jar task:
...
Application-Name: spÃ©cial
...

Because the jar task can process filesets, it lets the developer specify the character encoding of the original manifest.
Unfortunately, although the mandatory (final) encoding is UTF-8, Ant's jar task does not default to it, so reading the original manifest relies on the platform default: Windows-1252 in my case, whereas the original manifest (coming from another jar) really is UTF-8.
Solution: specify the encoding in the task attribute.
<jar destfile="final.jar" filesetmanifest="merge" manifestencoding="UTF-8">
    <zipfileset src="original.jar">
        [...]
    </zipfileset>
</jar>
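The double encoding described above can be reproduced in a few lines of Java. This is a sketch of the mechanism only, not of Ant's actual code: it assumes the UTF-8 bytes are decoded as Windows-1252 and the resulting string is then written out as UTF-8 again. The class and method names are made up for the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative names; this simulates the suspected mechanism, it is not Ant code.
final class ManifestMojibake {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Simulates reading UTF-8 bytes with the wrong (platform default) charset.
    static String misread(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, WINDOWS_1252);
    }

    // Reverses the damage, provided every byte survived the wrong decoding.
    static String repair(String garbled) {
        byte[] bytes = garbled.getBytes(WINDOWS_1252);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "sp\u00e9cial";   // the Application-Name value, written with an escape
        String garbled = misread(original); // s, p, U+00C3, U+00A9, c, i, a, l

        // The original is 8 bytes in UTF-8; the garbled string is 10 bytes
        // once it is written back out as UTF-8.
        System.out.println(original.getBytes(StandardCharsets.UTF_8).length); // prints 8
        System.out.println(garbled.getBytes(StandardCharsets.UTF_8).length);  // prints 10
        System.out.println(repair(garbled).equals(original));                 // prints true
    }
}
```

Unicode escapes are used for the non-ASCII literals so that the example does not depend on the encoding javac assumes for this file.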

I've just found my old bug report about this for NetBeans.
As a workaround, I added the attribute manifestEncoding="${source.encoding}" to the copylibs tag in build-impl.xml.

Character encoding annoyance

I am suddenly getting a character encoding error when trying to run ant build-all. I went into the properties for my project and chose UTF-8 under Resource. Still, I'm getting the following error (actually there are more than 100 encoding errors) when trying to build-all:
error: unmappable character for encoding UTF8
[javac] // nedenfor inden f�rste angreb, s� total = 41 = tur 21
I cannot commit my project because of this error. Any idea how to fix this? It just started complaining about encoding all of a sudden.

Sounds like you need to specify the encoding argument on your Ant javac task:
<javac encoding="UTF-8" ...
Ant tasks do not know about Eclipse project properties.
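Since the error message names UTF8 as the encoding javac was using, it may also be worth checking whether the source files are really stored as UTF-8. A small sketch of such a check, using a strict decoder from java.nio; the class name Utf8Check is made up, and the file names are whatever you pass on the command line:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative helper: reports whether files can be decoded as strict UTF-8.
final class Utf8Check {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Utf8Check.java File1.java File2.java ...
        for (String name : args) {
            byte[] content = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + (isValidUtf8(content) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

A file that fails this check was probably saved in another encoding such as ISO-8859-1 or Windows-1252, in which case either the file has to be converted or the encoding attribute has to name the encoding the file is actually in.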

Can't make Ant write proper version info with unicode (c) character

After upgrading Ant from 1.6 to 1.8.3, the version-info resources of Windows DLLs that are built with Ant became corrupted.
Previously this value was properly saved to the version-info resource:
product.copyright=\u00a9 Copyright 20xx-20xx yyyyyyyyyy \u2122 (so (c) and TM symbols were properly displayed).
After the upgrade, Ant's default encoding changed to UTF-8, which is expected, but the copyright string now looks like this:
Â© Copyright 20xx-20xx yyyyyy â„¢
This is not a console issue - I checked with hex editor and File Properties dialog - both display it incorrectly.
Looking at the file's hex dump, I see that the following (obviously incorrect) mapping occurs:
\u00a9 -> 0x00c2 0x00a9
\u2122 -> 0x00e2 0x201e 0x00a2
The problem here is that Ant encodes the UTF-8 bytes (not the Unicode string) into 16-bit characters and writes them to the version-info resource.
Although this looks like a bug in ant, I would ask if anyone managed to find any workarounds for this or similar problems.
Here are some snippets from the script:
Project properties file:
...
product.copyright=(c) Copyright 2005-2012 Clarabridge
....
Files included into build.xml:
<versioninfo id="current-version" if="is-windows"
    fileversion="${product.version}"
    productversion="${product.version}"
    compatibilityversion="1"
    legalcopyright="${product.copyright}"
    companyname="${product.company}"
    filedescription="${ant.project.name}"
    productname="${ant.project.name}"
/>
...
<cc objdir="${target.dir}/${target.platform}/obj"
    outfile="${target.dir}/${target.platform}/${ant.project.name}"
    subsystem="other"
    failonerror="true"
    incremental="false"
    outtype="shared"
    runtime="dynamic"
>
    <versioninfo refid="current-version" />
    <compiler refid="compiler-shared-${target.platform}" />
    <compiler refid="rc-compiler" />
    <linker extends="linker-${target.platform}">
        <libset dir="${target.dir}/${target.platform}/lib" libs="${lib.list}" />
    </linker>
    <fileset dir="${src.dir}" casesensitive="false">
        <include name="*.cpp"/>
    </fileset>
</cc>

Your bug is that something is misinterpreting the UTF-8 characters as 8-bit ones!!!
BTW, Java doesn’t use 16-bit characters; that would be UCS-2. Java uses UTF-16, which is just as much a variable-width encoding as UTF-8 is. Distressing how many Java programmers screw this up!
UTF-8 has 8-bit code units where UTF-16 has 16-bit code units; neither one supports an “8-bit character” or a “16-bit character”. If you catch yourself writing code that thinks they do, you’ve just written buggy code.
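To make the code-unit point concrete, here is a small sketch that counts code points, UTF-16 code units and UTF-8 code units for a few characters. The class and method names are made up for the illustration; only standard String and Character methods are used.

```java
import java.nio.charset.StandardCharsets;

// Illustrative names; compares code points with UTF-16 and UTF-8 code units.
final class CodeUnits {

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of 16-bit code units (Java chars) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of 8-bit code units (bytes) when the string is encoded as UTF-8.
    static int utf8Units(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                     // U+0041
            "\u00a9",                                // U+00A9, copyright sign
            "\u2122",                                // U+2122, trade mark sign
            new String(Character.toChars(0x1F600))   // U+1F600, outside the BMP
        };
        for (String s : samples) {
            System.out.printf("U+%04X: %d code point, %d UTF-16 unit(s), %d UTF-8 unit(s)%n",
                    s.codePointAt(0), codePoints(s), utf16Units(s), utf8Units(s));
        }
    }
}
```

Each sample is a single code point, yet the last one needs two UTF-16 code units and four UTF-8 code units, which is why neither encoding can be treated as fixed-width.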
Your output is the result of erroneously displaying UTF-8 as though it were in Latin1, which does use 8-bit characters. You, however, do not.
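The mapping from the question can be reproduced by decoding the UTF-8 bytes with a single-byte charset and printing the resulting chars in hex. This is a sketch with made-up class and method names. Note that the 0x201e in the question matches Windows-1252 rather than strict ISO-8859-1 (Latin-1), where byte 0x84 decodes to the control character U+0084; which charset was actually in play on the asker's machine is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative names; shows which chars result when UTF-8 bytes are decoded one byte at a time.
final class VersionInfoMapping {

    // Encodes s as UTF-8, decodes the bytes with the wrong charset,
    // and returns the resulting chars as space-separated hex values.
    static String misdecodedHex(String s, Charset wrong) {
        String garbled = new String(s.getBytes(StandardCharsets.UTF_8), wrong);
        StringBuilder sb = new StringBuilder();
        for (char c : garbled.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(misdecodedHex("\u00a9", cp1252));  // prints 0x00c2 0x00a9
        System.out.println(misdecodedHex("\u2122", cp1252));  // prints 0x00e2 0x201e 0x00a2
        // With strict Latin-1 the middle char differs:
        System.out.println(misdecodedHex("\u2122", StandardCharsets.ISO_8859_1)); // prints 0x00e2 0x0084 0x00a2
    }
}
```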

ANT Javac and special characters

I have an ANT task defined like so:
<javac source="1.5" target="1.5" srcdir="${src.dir}" destdir="${classes.dir}" deprecation="on" debug="on" classpathref="classpath" fork="true" memoryMaximumSize="512m" encoding="UTF-8">
    <include name="${app.directory}/**/*.java"/>
</javac>
This works fine, but when I have classes with special characters in their names it gives me the following error:
[iosession] Compiling 131 source files to /C24/PUB/io-stds/trunk/standards/GSIT/build/test/deployment/build/classes
[iosession] javac: file not found: /C24/PUB/io-stds/trunk/standards/GSIT/build/test/deployment/src/java/biz/c24/io/minos/AléaChiffréClass.java
[iosession] Usage: javac <options> <source files>
[iosession] use -help for a list of possible options
[iosession] Target compile finished
[iosession]
[iosession] Building unsuccessful 2 seconds
When I remove the "fork=true" it works, but then it ignores the "memoryMaximumSize" setting. I also tried the nested approach, but to no avail.
Any ideas?

It's perhaps not the answer you expect, but my advice would be to remove all non-ASCII letters from the names of methods and classes. I'm French-speaking too, and I've never seen any company, even in France and using French as its development language, accept accented letters in class and method names. It's just not good practice, simply because it would be very hard for a non-French developer, without accents on their keyboard, to use these classes and methods.
If you use a good IDE, it should allow you to refactor your code easily.

Apache did confirm that the encoding attribute only applies to the file contents and not file names. I reverted back to using fork only when needed and kept encoding="UTF-8".
