I am using a program which inserts text on to an image.
http://www.free-picture-editor.com/pixenate/themes/cardmaker/arrowheb.php
When I insert Unicode it turns it into question marks. I did add this line to the top of the php file:
<meta http-equiv="content-type" content="text/html;charset=UTF-8">
Any idea what else needs to be done to solve this?
You need a program that can handle Unicode characters.
The image appears fine for me. Unicode characters are what they should be, they are not question marks. That would imply a data conversion is bein performed to a charset that does not support those characters. So it has to be an issue with your particular webbrowser.
Something else to check: make sure your webserver is not sending a Content-Type header that specifies a different charset that overrides your HTML's tag. When I go to that URL, there is no charset specified in the Content-Type header, so the HTML charset is used. But maybe on your machine, your webbrowser is being identified differently and so the webserver sends a charset in the Content-Type header.
Related
On this page (WP) http://jamiestclair.com/band/ the charset is UTF-8, but the special characters in names such as Kai Brückner, and Kai Schönberg are showing up as
Kai Sch�nberg
A utf-8 encoding should take care of that....
Header is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Any help appreciated.
OK ----- here is all the relevant info. It's true the problem disappeared because I didn't know if anybody was going to answer this post, and I'm under a time constraint. So I fixed the special characters a different way, ie: spelling out the name Schönberg this way: Sch-&-ouml-;nberg (spaces not included).
It is now returned to the special characters, and they are now not rendering. The Doc type and charset is now this:
-- as opposed to being set at charset=utf-8
The odd thing is it is a Wordpress produced page. The problem with the characters not rendering is in the .php file that is producing the page. The exact same names are below in the body of the text which is in the db - and they are rendering correctly. It is just the characters in the HTML on the .php template page which is not rendering.
If that's not enuf information, tell me what else you need, and I'll include it. It's the latest v. of WP.
Try instead of "UTF-8" "iso-8859-1". It should work now.
Privjet!
I don't understand for what reason I am not getting displayed the non ASCII language characters like say, "ç, ñ, я " for my different languages.
The text in question is hardcoded, it is not served from a DB.
I have seen identical questions here
Charset=utf8 not working in my PHP page
I have seen that I should write this:
header('Content-type: text/html; charset=utf-8');
But where the heck does that go? I cant write it like that, the browser just mirrors the words and displays them as plain text, no parsing.
My encoding for the frontpage says this:
<head>
<meta charset="utf-8">
</head>
which is supposed to be Unicode.
I tried to test my page in validator.w3.org and it went:
Sorry, I am unable to validate this document because on line 60 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
Line 60 actuallly has the word Español (Spanish) with that weird n.
Any hint?
thank you
best regards
international html files archived by wget
should contain chars like this
(example hebrew and thai:)
אב
הם
and ยคน
instead they are saved like this:
íäáåãéú and ÃÒ¡à§é
How to get the these displayed properly?
iconv filename.html
iconv: illegal input sequence at position 1254
SOLVED: There was nothing wrong.
Only i didnt notice the default php.ini did set the charset in the http header but
to use various charsets like this meta http-equiv="Content-Type" content="text/html; charset=windows-874" you needed to set: default_charset = "empty";
....
The pages aren't "saved like this", whatever you're using to view the file is simply interpreting the encoding incorrectly. To know what encoding the file is in you should have paid attention to the HTTP Content-Type header during download; that's gone now.
Your only other chance is to parse the equivalent HTML meta tag in the <head>, if the document has one.
Otherwise, you can only guess the encoding of the document.
See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text for more required background knowledge.
we built a java ee web project and use jdbc for storing our data.
The problem is that German 'Umlaute' like äöü are in use and properly stored in the mysql database. We don't know why, but in the browser those characters are broken, displaying weird stuff like
ö�
instead.
I've already tried setting the encoding of the jdbc connection like described in this question:
JDBC character encoding
And the encoding of the html page is correctly set:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
Any ideas how to fix that?
Update
connection.prepareStatement("SET CHARACTER SET utf8").execute();
won't make umlauts work.
changing the meta-tag to
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
won't change anything, too
"We don't know why, but in the browser those characters are broken"
Well, that's the first thing to find out. You should trace your data at every stage:
As you fetch it out of the database (with logging)
When you inject it into the page (with logging)
On the wire (via Wireshark)
When you log, don't just log the strings: log the Unicode characters that make up the strings, as integers. Just cast each character in the string to an integer and log it. It's primitive, but it'll tell you what you need to know.
When you look on the wire, of course, you'll be seeing bytes rather than characters as such. You should work out what bytes you expect for your chosen encoding, and check those against what's actually coming across the network.
You've specified the encoding in the HTML - but have you told whatever's generating your page that you want it in ISO Latin 1? That's likely to be responsible for both setting the content-type header and performing the actual conversion from text to bytes.
Additionally, is there any reason why you're using ISO Latin 1 instead of UTF-8? Why would you deliberately restrict yourself like that? (ISO Latin 1 can only handle the first 256 characters of Unicode, instead of the full range of Unicode characters. UTF-8 can handle everything, and is just as efficient for ASCII.)
There doesn't seem to be an accepted way of sending down a header parameter in non ascii format.
The header for file download usually looks like
Content-disposition: attachment; filename="theasciifilename.doc"
Except if you smash a utf8 encoded string in the filename parameter, Firefox will handle it fine, whereas IE will throw up.
There is a document on CodeProject that explains a method for encoding the filename.
This document encodes Bản Kiểm Kê.doc to B%e1%ba%a3n%20Ki%e1%bb%83m%20K%c3%aa.doc by hex encoding the bytes.
Problem #1: the first character in that string: ả has a value of ả -- encode that number in Hex and you get %a3%1e. How did this guy get %e1%ba%a3? (I'm obviously missing something simple here)
Problem #2: While IE acknowledges this encoding, Firefox doesn't! What to do?
The specs basically don't permit anything other than US-ASCII. HTTP headers are US-ASCII. HTTP's payload defaults to ISO 8859-1 but that refers to the content body, not the headers.
Arguably the Right Thing to do would be to use MIME's technique for encoding non-ASCII data in headers, as described in RFC 2047, but I have no idea whether browsers actually support that.
EDIT: Whoops, no, RFC 2047 section 5 explicitly says that the encoded form is not permitted in Content-Disposition. Looks like you're out of luck - there is no standard.
EDIT 2: There is a standard - RFC 2231 defines how this is now supposed to work. It has support from some browsers, but is not supported in IE. I found some test cases which demonstrate how it works and what browser support is available.
Answer to question #1: You are confusing Unicode and UTF-8. The hex value of 'ả' is 0xA31E however that is not a UTF-8 character. In UTF-8 that character requries three bytes, 0xE1 0xBA 0xA3. URL encoding is poorly defined for non-ascii encodings but %e1%ba%a3 is the valid UTF-8 encoding to use for that character.
For Problem #2 you need to URL encode the file name for both Internet Explorer and Firefox. The only difference is that you need to use the format of RFC 2231 in Firefox.
This applies to Firefox 3 and Internet Explorer 7.
In the link you've got above, e1 ba a3 is the UTF-8 encoding of the character mentioned, not the character code.
Answer (sort of) to problem #2:
Since you've discovered that the naming scheme in one browser does not work in the other, your only solution is to do it differently for each browser, similar to the example here.
In case the link goes away, the solution is basically:
1. If browser is IE URL encode filename
2. Generate Content-disposition header
Of course determining if the browser is IE by User-agent (which is about the only way you can do it) is fraught with all sorts of the usual peril.
As North American centric as this sounds, if it is important that this work in a large number of browsers you do not control which may have the User-agent blocked, or modified, then simply avoid UTF-8 encoded characters in the filename and always use "Download" or something.
Unfortunately, there currently is no single way that would work in all User Agents.
See http://greenbytes.de/tech/tc2231/ for test cases, then complain to Microsoft, Google and Apple.