UIWebView, quote characters with Arial font not showing up correctly - iphone

I have some .html with the font defined as:
<font color="white" face="Arial">
I have no other style applied to my tag. In it, when I display data like:
<b> “Software” </b>
or
<b>“Software”</b>
they both display characters I do not want in the UIWebView. It looks like this on a black background:
How do I avoid that? If I don't use font face="arial", it works fine.

This is an encoding issue. Make sure you use the same encoding everywhere. UTF-8 is probably the best choice.
You can put a line
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
in your html to tell UIWebView about the encoding.
To be precise, â€œ is what you get when you take the UTF-8 encoding of “ and interpret it as ISO-8859-1 (which browsers treat as Windows-1252). So your data is encoded in UTF-8, which is good, and you just need to declare the content type as UTF-8 instead of ISO-8859-1 (e.g. using the <meta> tag above).
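The mix-up described above can be reproduced in a few lines. This is a minimal Python sketch rather than UIWebView code, and it assumes the wrong decoder is Windows-1252 (Python's cp1252), which is what browsers apply to a page labelled ISO-8859-1.

```python
# Reproduce the mojibake: encode a curly quote as UTF-8,
# then decode the same bytes with the wrong charset.
left_quote = "\u201c"                     # LEFT DOUBLE QUOTATION MARK
utf8_bytes = left_quote.encode("utf-8")   # three bytes: E2 80 9C

# Assumption: the viewer falls back to Windows-1252.
garbled = utf8_bytes.decode("cp1252")     # three unrelated characters

# Decoding with the charset the data was really written in restores it.
restored = utf8_bytes.decode("utf-8")

print(utf8_bytes.hex())
```

The bytes never change; only the label used to read them does, which is why declaring UTF-8 in the page is enough.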

You shouldn’t generally use the curly quote characters themselves; character encodings will always mess you up somehow. No idea why it works correctly when you don’t use Arial (though that suggests a great idea: don’t use Arial), but your best bet is to use the HTML entities &ldquo; and &rdquo; instead.
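If the markup is generated rather than typed by hand, the entity substitution suggested above can be automated. The following is a rough Python sketch; the helper name to_named_entities is made up for illustration, and it relies on the standard library's html.entities.codepoint2name table.

```python
from html.entities import codepoint2name

def to_named_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity reference."""
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            pieces.append(ch)                            # plain ASCII stays as-is
        elif code in codepoint2name:
            pieces.append(f"&{codepoint2name[code]};")   # named, e.g. &ldquo;
        else:
            pieces.append(f"&#{code};")                  # numeric fallback
    return "".join(pieces)

safe_markup = to_named_entities("<b>\u201cSoftware\u201d</b>")
```

The result is pure ASCII, so it displays the same whichever charset the viewer assumes.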

Related

How to properly encode html entities in emails? e.g. &nearr; for Gmail

So I modified some emails I send to get rid of images and replace them by special unicode characters. For example I had an arrow image and replaced it with &nearr; while wrapping it in a <span> to give it the color I want.
When I look at the source in Gmail (3 dots > Show Original) I see this:
...
--1234567890123456789012345678
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.=
w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns=3D"http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8" />
</head>
<body>
...
... <span style=3D"font-family:arial,verdana;font-weight:bold;color:#209a20">&nearr;</span> ...
...
</body>
</html>
--1234567890123456789012345678--
Which is what I'd expect since that's what I wrote in my code.
Now the problem is that it displays like this in the Gmail web interface:
What am I doing wrong? Isn't UTF-8 a unicode encoding that should support this character?
I would understand if some of these special characters are displayed as square boxes or something, but I do not understand how they can remain encoded while the &nbsp; turns into a space correctly.
It also makes me question whether other email clients will display these correctly (would love feedback on that too).
In the 1950s, computers could handle only capital letters, digits and some punctuation.
Before 1970, EBCDIC was invented (only to later die out) for handling lower case and a few more punctuation characters.
Then came a plethora of encodings to handle European accents, Cyrillic, Greek, and eventually Chinese. (There are some interesting stories on the invention of typewriters for handling Chinese!)
Eventually, the Unicode group got together and slowly created a universal standard. It has been evolving for a few decades and continues to be enhanced; emoji are a big, ongoing addition.
But, meanwhile, how does one put Emoji, etc, in URLs, type them on a keyboard, etc, etc? Those standards are lagging way behind. So, there are kludges in place.
HTML allows "entities", such as &nearr; for that arrow.
Putting such in a URL would require something like %E2%86%97.
Several encodings also base their kludge on the hex encoding of the utf8.
Some languages allow an escape such as \u2197, which is based on the hexadecimal value of the "codepoint" (Java goes that direction). The decimal value of the same codepoint is 8599.
MySQL INSERT: UNHEX('E28697')
Keyboards -- good luck.
I don't know of anything other than HTML that reacts favorably to &nearr;
Ever notice a + in a URL? That is the encoding for a single space. (Also %20 works there.)
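The different spellings of the same arrow listed above can be checked against each other. This is a small Python sketch using only the standard library; the variable names are illustrative.

```python
import html
from urllib.parse import quote, quote_plus

arrow = html.unescape("&nearr;")                  # HTML named entity -> character
code_point = ord(arrow)                           # decimal value of the code point
utf8_hex = arrow.encode("utf-8").hex().upper()    # hex of the UTF-8 bytes, as passed to UNHEX()
url_form = quote(arrow)                           # percent-encoded UTF-8 bytes
java_style = f"\\u{code_point:04x}"               # hex escape as used in Java source

# Spaces: '+' in form/query-string encoding, '%20' in plain percent-encoding.
space_plus = quote_plus("a b")
space_pct = quote("a b")
```

All of these are just different notations for the code point or for its UTF-8 bytes.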
Try the HTML code rather than the HTML entity.
So &#8599; for the north east arrow, as per
https://www.toptal.com/designers/htmlarrows/arrows/north-east-arrow/
Best reference for this is usually https://unicode-table.com/en/
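If a template already contains many named entities, the swap to numeric codes can be scripted. Below is a hedged Python sketch: named_to_numeric is a hypothetical helper, it uses the standard library's html.entities.html5 table, and it leaves the markup-significant entities (&amp;, &lt;, &gt;, &quot;, &apos;) alone. Whether a given mail client then renders the character is not something this code can verify.

```python
import re
from html.entities import html5

# Entities that must stay escaped because they are markup-significant.
KEEP = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(markup: str) -> str:
    """Rewrite named entities such as &nearr; as decimal references."""
    def replace(match):
        name = match.group(1)
        chars = html5.get(name + ";")
        if name in KEEP or chars is None:
            return match.group(0)                  # unknown or protected: unchanged
        return "".join(f"&#{ord(c)};" for c in chars)
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)

converted = named_to_numeric('<span style="color:#209a20">&nearr;</span>')
```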

.ENCODING international chars (hebrew,thai,russian,chinese,....)

international html files archived by wget
should contain chars like this
(example hebrew and thai:)
אב
הם
and ยคน
instead they are saved like this:
íäáåãéú and ÃÒ¡à§é
How to get these displayed properly?
iconv filename.html
iconv: illegal input sequence at position 1254
SOLVED: There was nothing wrong.
I just didn't notice that the default php.ini sets the charset in the HTTP header. To use various charsets
via tags like meta http-equiv="Content-Type" content="text/html; charset=windows-874", you need to set: default_charset = "empty";
....
The pages aren't "saved like this"; whatever you're using to view the file is simply interpreting the encoding incorrectly. To know what encoding the file is in you should have paid attention to the HTTP Content-Type header during download; that's gone now.
Your only other chance is to parse the equivalent HTML meta tag in the <head>, if the document has one.
Otherwise, you can only guess the encoding of the document.
See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text for more required background knowledge.
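Reading the charset from the meta tag, as suggested above, might look like the following Python sketch. The function names and the 4096-byte scan window are assumptions made for illustration, and the regular expression is a heuristic, not a full HTML parser.

```python
import re

# Heuristic: find charset=... inside a <meta ...> tag near the top of the file.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?\s*([\w.:-]+)', re.IGNORECASE
)

def sniff_meta_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset named in a meta tag, or the default if none is found."""
    match = META_CHARSET.search(raw[:4096])
    return match.group(1).decode("ascii") if match else default

def decode_html(raw: bytes) -> str:
    """Decode the page with its declared charset (raises LookupError if unknown)."""
    return raw.decode(sniff_meta_charset(raw), errors="replace")

# A Hebrew page saved as windows-1255, as wget would store it byte for byte.
page = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=windows-1255"></head><body>'
    + "\u05d0\u05d1".encode("cp1255")
    + b"</body></html>"
)
text = decode_html(page)
```

Decoding the same bytes as Latin-1 instead produces accented Latin letters, which is the effect shown in the question.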

How do I specify an encoding for TextCells in CellList?

I use a CellList like this
CellList<String> cellList = new CellList<String>(new TextCell());
and then give it an ArrayList<String>.
If a String contains an "ü" I get a question mark in the browser (FF4, GWT Dev Plugin). If I use &uuml; I get &uuml;
Where can I specify the encoding, so that "ü" works? (I'm not sure if it makes a difference, but the "ü" is currently hardcoded in the .java file and not read from somewhere else).
The GWT compiler assumes that your Java files are encoded in UTF-8. Make sure that your editor is set to save in that encoding.
You should also make sure to set the encoding of the HTML page to a Unicode-capable encoding like UTF-8 (this allows you to use even more exotic characters that you won't find in other charsets):
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
...
Moreover, if you later want to retrieve the strings from a database, make sure that it is also set up to handle Unicode and that your JDBC driver connects in Unicode mode (required for some databases).

Using unicode characters in gwt checkbox label

How can I put a Unicode character in the label (constructor) of a GWT checkbox? If I put the entity &euml; in, GWT escapes the & and I end up with &euml; in the label of the checkbox instead of ë.
Unicode characters in Java String literals follow a special syntax.
In your case, you could write it like this:
new CheckBox("H\u00ebllo")
The code for "ë" is 00eb; you can look it up in a Unicode code chart. By the way, 00EB (hexadecimal) = 235 (decimal).
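Looking up each code by hand gets tedious for longer labels. As a rough aid, the following Python sketch (the helper name to_java_escapes is hypothetical) converts a string into the \uXXXX form that a Java literal accepts, emitting surrogate pairs for characters outside the Basic Multilingual Plane.

```python
def to_java_escapes(text: str) -> str:
    """Rewrite non-ASCII characters as Java-style \\uXXXX escapes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                      # ASCII needs no escape
            continue
        units = ch.encode("utf-16-be")          # one or two 16-bit code units
        for i in range(0, len(units), 2):
            value = int.from_bytes(units[i:i + 2], "big")
            out.append("\\u%04x" % value)
    return "".join(out)

label = to_java_escapes("H\u00ebllo")           # text to paste between the quotes
```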
Another possibility is to save your Java files as UTF-8. Then you can write your literals without escaping these characters. This however also requires the compiler to read the files as UTF-8 (the JVM property -Dfile.encoding=UTF-8, or javac's -encoding UTF-8 option). Many IDEs do this automatically if you set the encoding preference for the file to UTF-8.
Another important factor is that you should set the charset of your HTML page correctly (usually UTF-8):
<meta http-equiv="content-type" content="text/html; charset=UTF-8">

How to parse XML with special characters?

Whenever I try to parse XML with special characters such as ō or 満月先生 I get an error. The XML document claims to use UTF-8 encoding but that does not seem to be the case.
Here is what the troublesome text looks like when I view the XML in Firefox:
Bleach: The Diamond Dust
Rebellion - M� Hitotsu no
Hy�rinmaru; Bleach - The
DiamondDust Rebellion - Mou Hitotsu no
Hyourinmaru
On the actual website, Å� is actually the character ō.
<br /> One day,
Doraemon and his friends meet
Professor Mangetsu
(����,
Professor Mangetsu?), who studies
magic and magical beings such as
goblins, and his daughter Miyoko
(���,
Miyoko?), and are warned of the
dangerous approximation of the
"star of the
Underworld" to the
Earth's orbit.<br />
<br />
And once again, on the actual website, those characters appear as 満月先生 and 美夜子.
The actual XML file is formatted properly other than those special characters, which certainly do not appear to be using the UTF-8 encoding. Is there a way to get NSXML to parse these XML files?
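To use characters that do not survive your encoding, you can write them as character references instead. If you want to represent ö in HTML you can type &ouml;. Plain XML predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;), so there you need the numeric form &#246;.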
Find more on
Wikipedia: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
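How an XML parser treats these cases can be tried outside NSXML. The sketch below uses Python's xml.etree.ElementTree purely as a stand-in; NSXML may report errors differently, and the sample title is shortened from the question.

```python
import xml.etree.ElementTree as ET

# 1. Genuine UTF-8 bytes with a matching declaration parse without trouble.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<title>M\u014d Hitotsu no Hy\u014drinmaru</title>"
).encode("utf-8")
title = ET.fromstring(doc).text

# 2. A numeric character reference is always valid XML.
numeric = ET.fromstring("<t>&#246;</t>").text

# 3. An HTML-only named entity is undefined in plain XML and is rejected.
try:
    ET.fromstring("<t>&ouml;</t>")
    named_ok = True
except ET.ParseError:
    named_ok = False
```

If case 1 fails on a real file, the bytes are probably not UTF-8 despite the declaration, and re-decoding them with the actual encoding before parsing is the usual way out.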