On this page (WP) http://jamiestclair.com/band/ the charset is UTF-8, but the special characters in names such as Kai Brückner and Kai Schönberg are showing up as
Kai Sch�nberg
A utf-8 encoding should take care of that....
Header is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Any help appreciated.
OK ----- here is all the relevant info. It's true the problem disappeared, but only because I didn't know if anybody was going to answer this post and I'm under a time constraint, so I fixed the special characters a different way, i.e. by spelling out the name Schönberg with an HTML entity: Sch&ouml;nberg.
It has now been returned to the special characters, and they are again not rendering. The doctype and charset are now this:
-- as opposed to being set at charset=utf-8
The odd thing is that it is a WordPress-produced page. The characters that are not rendering are in the .php template file that produces the page. The exact same names appear below in the body text, which comes from the database, and those render correctly. It is only the characters in the HTML of the .php template page that are not rendering.
If that's not enough information, tell me what else you need and I'll include it. It's the latest version of WP.
Try "iso-8859-1" instead of "UTF-8". It should work then.
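For example, the two declarations being discussed would look something like this (illustrative markup, not copied from the actual theme file); whichever one is used has to match the encoding the .php template file is actually saved in:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<!-- or, if the template file itself is saved as UTF-8: -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />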
So I modified some emails I send to get rid of images and replace them with special Unicode characters. For example, I had an arrow image and replaced it with the entity &nearr; while wrapping it in a <span> to give it the color I want.
When I look at the source in Gmail (3 dots > Show Original) I see this:
...
--1234567890123456789012345678
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.=
w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns=3D"http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8" />
</head>
<body>
...
... <span style=3D"font-family:arial,verdana;font-weight:bold;color:#209a20">&nearr;</span> ...
...
</body>
</html>
--1234567890123456789012345678--
Which is what I'd expect since that's what I wrote in my code.
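(For reference, a part equivalent to the one above could be produced by a PHP sketch along these lines; this is illustrative only, not the actual sending code:)

$headers  = "MIME-Version: 1.0\r\n";
$headers .= "Content-Type: text/html; charset=UTF-8\r\n";
$headers .= "Content-Transfer-Encoding: quoted-printable\r\n";
$html = '<html><body>'
      . '<span style="font-family:arial,verdana;font-weight:bold;color:#209a20">&nearr;</span>'
      . '</body></html>';
// quoted_printable_encode() produces the =3D style encoding seen in the source above
mail('someone@example.com', 'Arrow test', quoted_printable_encode($html), $headers);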
Now the problem is that in the Gmail web interface the arrow still shows up as the encoded text rather than as the character.
What am I doing wrong? Isn't UTF-8 a Unicode encoding that should support this character?
I would understand if some of these special characters were displayed as square boxes or something, but I do not understand how they can remain encoded while the &nbsp; turns into a space correctly.
It also makes me question whether other email clients will display these correctly (would love feedback on that too).
In the 1950's computers could handle only capital letters, digits and some punctuation.
Before 1970, EBCDIC was invented (only to later die out) for handling lower case and a few more punctuation characters.
Then came a plethora of encodings to handle European accents, Cyrillic, Greek, and eventually Chinese. (There are some interesting stories on the invention of typewriters for handling Chinese!)
Eventually, the Unicode group got together and slowly created a universal standard. It has been evolving for a few decades and continues to be enhanced -- emojis are a big addition that is ongoing.
But, meanwhile, how does one put Emoji, etc, in URLs, type them on a keyboard, etc, etc? Those standards are lagging way behind. So, there are kludges in place.
HTML allows "entities", such as &#8599; for that arrow.
Putting such in a URL would require something like %E2%86%97.
Several encodings also base their kludge on the hex encoding of the utf8.
Some languages allow an escape based on the "codepoint", such as \u2197 in hex. (I think Java goes that direction.)
MySQL INSERT: UNHEX('E28697')
Keyboards -- good luck.
I don't know of anything other than HTML that reacts favorably to &#8599;
Ever notice a + in a URL? That is the encoding for a single space. (Also %20 works there.)
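To make those forms concrete, here is a small PHP sketch (mine, not part of the answer above; mb_ord() needs PHP 7.2+ with mbstring) that produces each of them for U+2197:

$arrow = "\u{2197}";                                      // the raw character, via PHP's codepoint escape
echo '&#' . mb_ord($arrow, 'UTF-8') . ';', "\n";          // &#8599;    decimal HTML reference
echo '&#x' . dechex(mb_ord($arrow, 'UTF-8')) . ';', "\n"; // &#x2197;   hex HTML reference
echo rawurlencode($arrow), "\n";                          // %E2%86%97  URL form
echo strtoupper(bin2hex($arrow)), "\n";                   // E28697     bytes for MySQL's UNHEX()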
Try the HTML code rather than the HTML entity.
So &#8599; for the north east arrow, as per
https://www.toptal.com/designers/htmlarrows/arrows/north-east-arrow/
Best reference for this is usually https://unicode-table.com/en/
Privjet!
I don't understand why the non-ASCII characters for my different languages, like say "ç, ñ, я", are not getting displayed.
The text in question is hardcoded, it is not served from a DB.
I have seen identical questions here
Charset=utf8 not working in my PHP page
I have seen that I should write this:
header('Content-type: text/html; charset=utf-8');
But where the heck does that go? I can't write it like that; the browser just mirrors the words and displays them as plain text, no parsing.
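If it helps, a minimal sketch of where that call has to live (the file layout here is just an example); it only works inside a PHP block and must run before any output is sent:

<?php
// must come before any HTML output, otherwise you get "headers already sent"
header('Content-Type: text/html; charset=utf-8');
?>
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>Español, ç, ñ, я</body>
</html>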
My encoding for the frontpage says this:
<head>
<meta charset="utf-8">
</head>
which is supposed to be Unicode.
I tried to test my page in validator.w3.org and it went:
Sorry, I am unable to validate this document because on line 60 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
Line 60 actually has the word Español (Spanish) with that weird ñ.
Any hint?
thank you
best regards
I use the Spring form taglib to generate HTML forms within my XHTML page, which is delivered with Content-Type: application/xhtml+xml;charset=UTF-8.
By default the taglib escapes characters for HTML, and thus it escapes e.g. the German umlaut ü to &uuml;, which is OK for HTML but not for XML - it causes an unknown entity error on the client.
Of course I still want the XML characters (like <) to be escaped, but not perfectly valid UTF-8 characters. The taglib does have an option escapeHTML which I can set to false (even globally in web.xml), but then the XML-entities are not escaped anymore.
Surprisingly Google did not turn up anything useful here. It can't be that much of an uncommon problem, can it?
Read the source, it helps!
The escape symbols are loaded from classpath from file HtmlCharacterEntityReferences.properties in package org.springframework.web.util.
Create a file with the same name in the same package in a classpath folder with a higher priority than the spring-web.jar and with the following content:
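# Escape only what XML itself requires (plus a numeric reference for the
# non-breaking space); other characters, e.g. ü, are then left alone as
# literal UTF-8.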
160 = #160
34 = quot
38 = amp
39 = #39
60 = lt
62 = gt
And you'll be good.
It still feels a little hackish... I could not find any documentation about this and if it is not a documented feature it might easily be changed in a future version. Maybe someone has a better solution...
Every XHTML document served up with application/xhtml+xml should have an XHTML DOCTYPE declaration.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
(or any other valid XHTML DOCTYPE.)
The DTD in the declaration includes all HTML entity names, so you can use all of HTML's named references if you want.
That said, I think it's strange that Spring escapes things like ü. That shouldn't be necessary if the charset is UTF-8.
international html files archived by wget
should contain chars like this
(examples in Hebrew and Thai:)
אב
הם
and ยคน
instead they are saved like this:
íäáåãéú and ÃÒ¡à§é
How to get these displayed properly?
iconv filename.html
iconv: illegal input sequence at position 1254
SOLVED: There was nothing wrong.
I just hadn't noticed that the default php.ini sets the charset in the HTTP header; to let pages use their own charsets via a meta tag like <meta http-equiv="Content-Type" content="text/html; charset=windows-874">, you need to set default_charset to an empty value (default_charset = "") in php.ini.
....
The pages aren't "saved like this"; whatever you're using to view the file is simply interpreting the encoding incorrectly. To know what encoding the file is in, you should have paid attention to the HTTP Content-Type header during download; that's gone now.
Your only other chance is to parse the equivalent HTML meta tag in the <head>, if the document has one.
Otherwise, you can only guess the encoding of the document.
See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text for more required background knowledge.
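Once the source charset is known (windows-874 in the case above, according to the page's own meta tag), converting the saved file is straightforward; a PHP sketch, with the file names purely illustrative:

$html = file_get_contents('archived-page.html');
// the source charset must match what the page was really served as;
// accepted charset names vary between iconv builds (e.g. CP874, TIS-620 for Thai)
file_put_contents('archived-page.utf8.html', iconv('windows-874', 'UTF-8', $html));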
I am using a program which inserts text onto an image.
http://www.free-picture-editor.com/pixenate/themes/cardmaker/arrowheb.php
When I insert Unicode text, it turns into question marks. I did add this line to the top of the PHP file:
<meta http-equiv="content-type" content="text/html;charset=UTF-8">
Any idea what else needs to be done to solve this?
You need a program that can handle Unicode characters.
The image appears fine for me. The Unicode characters are what they should be; they are not question marks. Question marks would imply a data conversion is being performed to a charset that does not support those characters, so it has to be an issue with your particular web browser.
Something else to check: make sure your web server is not sending a Content-Type header that specifies a different charset and overrides your HTML's meta tag. When I go to that URL, there is no charset specified in the Content-Type header, so the HTML charset is used. But maybe on your machine your web browser is identified differently, and so the web server sends a charset in the Content-Type header.
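One way to rule that out (a sketch, assuming you can edit the PHP page) is to send the charset explicitly from the script itself, before any output, so a server default cannot disagree with the meta tag:

<?php
// an explicit header from the script takes precedence over the server's default Content-Type
header('Content-Type: text/html; charset=UTF-8');
?>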