UTF-8 on FF cannot display French accents - encoding

On my FF browser, the encoding is set to UTF-8. The French accents display properly on all pages except one. On the trouble page, they show up as '?' marks. When I change the encoding to Western, the trouble page displays the accents properly, while the other pages no longer do.
On IE, the setting is UTF-8 and all pages show proper French accents.

I know it's an old post, but I was facing the same issue and used htmlentities() in PHP when nothing else worked. This solved it for me, so I thought I'd mention it here so that someone else can benefit from it.
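The same trick sketched in Python (the original answer used PHP's htmlentities(); the function name here is my own): replacing each accented character with a numeric character reference makes the markup pure ASCII, so it renders the same no matter which charset the browser guesses.

```python
# Sketch of the htmlentities() workaround: replace each non-ASCII
# character with an HTML numeric character reference, so the text
# survives any charset the browser assumes.
def to_html_entities(text):
    # 'xmlcharrefreplace' turns unencodable characters into &#NNNN; references
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(to_html_entities("café élève"))  # caf&#233; &#233;l&#232;ve
```

The downside is that the stored text is no longer plain Unicode, so treat it as a last resort when you cannot fix the declared encoding itself.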

What's the web page?
Most likely the page's own encoding is ISO 8859-1 or something similar (a pure 8-bit encoding). Some web pages don't bother to specify their own encoding in the Content-Type: header, leaving the browser to guess. Apparently in this case Internet Explorer guesses better than Firefox.
If you have the curl command, try curl --head URL to see how and whether the encoding is specified, or right-click and View Page Info in Firefox.
You might consider contacting the owner of the web page and asking them to set the encoding properly (or, as I'd do, just ignore it).
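If you'd rather script the check, here is a minimal Python sketch (illustrative, not from the original answer) that pulls the charset parameter out of a raw Content-Type header value such as the one `curl --head` prints:

```python
from email.message import Message

# Given a raw Content-Type header value, extract the declared
# charset, if any. Returns None when no charset parameter is present,
# which is exactly the case where the browser has to guess.
def declared_charset(content_type):
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased charset, or None

print(declared_charset("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(declared_charset("text/html"))                      # None
```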

Related

How to add Unicode emoji to the Internet Archive?

When visiting a website that contains Unicode emoji through the Wayback Machine, the emoji appear to be broken, for example:
https://web.archive.org/web/20210524131521/https://tmh.conlangs.de/emoji-language/
The emoji "😀" is rendered as "ðŸ˜€" and so forth:
This effect happens if a page is mistakenly rendered as if it was ISO-8859-1 encoded, even though it is actually UTF-8.
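The effect is easy to reproduce: take the UTF-8 bytes of the emoji and decode them as windows-1252 (a Python illustration, not part of the original question):

```python
# Reproducing the mojibake: UTF-8 bytes of an emoji misread as windows-1252.
emoji = "😀"                        # U+1F600, four bytes in UTF-8: F0 9F 98 80
raw = emoji.encode("utf-8")
mojibake = raw.decode("windows-1252")
print(mojibake)  # ðŸ˜€  -- each UTF-8 byte shown as one windows-1252 character
```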
So it seems that the Wayback Machine is somehow confused about the character encoding of the page.
The original page source has an HTML5 <!doctype html> declaration and is valid HTML according to W3C's validator. The encoding is specified as utf-8 using a meta charset tag.
The original page renders correctly on all major platforms and browsers, for example Chrome on Linux, Safari on Mac OS, and Edge on Windows.
Does the Internet Archive crawler require a special way of specifying the encoding, or are emoji through UTF-8 simply not supported yet?
tl;dr The original page must be served with a charset in the HTTP content-type header.
As @JosefZ pointed out in the comments, the Wayback Machine mistakenly serves the page as windows-1252 (which has a similar effect to ISO-8859-1).
This is apparently the default encoding that the Internet Archive assumes if no charset can be detected.
The meta charset tag in the original page's source never takes effect when the archived page is rendered by the browser, because with all the extra JavaScript and CSS included by the Wayback Machine, the tag comes after the first 1024 bytes, which is too late according to the HTML5 specification: https://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#charset
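A rough Python illustration of that pre-scan limit (simplified: the real HTML5 algorithm also matches other forms of the meta element, so treat this as a sketch):

```python
# Hypothetical check for the HTML5 pre-scan limit: a meta charset tag
# only takes effect if it starts within the first 1024 bytes of the page.
def meta_charset_in_prescan(html_bytes):
    pos = html_bytes.lower().find(b"<meta charset")
    return 0 <= pos < 1024

page = b"<!doctype html><html><head><meta charset=utf-8>...</head></html>"
print(meta_charset_in_prescan(page))                # True
print(meta_charset_in_prescan(b" " * 2000 + page))  # False: tag pushed past 1024 bytes
```

The Wayback Machine's injected scripts and styles act like that 2000-byte prefix, pushing the original tag out of the window the browser inspects.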
So it seems that the Internet Archive does not take into account meta charset tags when crawling a page.
However, there are other archived pages such as https://web.archive.org/web/20210501053710/https://unicode.org/emoji/charts-13.0/full-emoji-list.html where Unicode emoji are displayed correctly.
It turns out that this correctly rendered page was originally served with an HTTP content-type header that includes a charset: text/html; charset=UTF-8
So, if the webserver of the original page is configured to send such a content-type HTTP header that includes the UTF-8 encoding, the Wayback Machine should display the page correctly after reindexing.
How the webserver can be configured to send the encoding with the content-type header depends on the exact webserver that is being used.
For Apache, for example, adding
AddDefaultCharset UTF-8
to the site's configuration or .htaccess file should work.
Note that for the Internet Archive to actually reindex the page, you may have to make a change to the original page's HTML content, not just change the HTTP headers.

(Question Marks) � instead of ÅÄÖ on old pages

Hi, a few months ago all of my website pages started showing question marks (�) instead of ÅÄÖ. The browser sets the default encoding to Unicode; however, if I change it to Western European, it shows ÅÄÖ just fine.
The weird thing is, all new entries on the website show ÅÄÖ just fine with Unicode. It's only the old pages that seem to have the problem.
I tried to set charset in .htaccess and headers but without any luck.
Any idea what to do here?
Website with Unicode
Website With Western European
You need to set the correct encoding when you save the files. Your old files were probably saved with the wrong encoding.
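A one-time fix sketched in Python (assuming the old files really are Latin-1 / Western European; verify on a copy first, since re-encoding a file that is already UTF-8 would corrupt it):

```python
# Re-encode a file saved as Latin-1 (Western European) into UTF-8.
import tempfile
from pathlib import Path

def reencode_latin1_to_utf8(path):
    text = path.read_text(encoding="latin-1")   # read with the old encoding
    path.write_text(text, encoding="utf-8")     # write back as UTF-8

# Demo on a temporary file standing in for an old page:
p = Path(tempfile.mkdtemp()) / "old_page.html"
p.write_bytes("ÅÄÖ".encode("latin-1"))          # simulate the old Latin-1 file
reencode_latin1_to_utf8(p)
print(p.read_text(encoding="utf-8"))            # ÅÄÖ
```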

utf 8 encoding not working

I have a UTF-8 encoding issue.
I have the UTF-8 charset declaration on all my HTML pages.
But if you go to http://lukasrauen.com/about.html you can see that some letters like ü or – are displayed wrong.
Sometimes it works, sometimes it doesn't. But if you navigate to a page that works and then back to the about page, it displays wrong.
What could be the cause of that?
I changed the server settings to default_charset utf8, but it still doesn't work.
I implemented history.js for fluid page transitions; maybe it has something to do with that?
Thanks in advance
Your page says it is UTF-8 but really it isn't. Enter the URL of your website at the W3C Validator to see for yourself.
This is the current result for http://lukasrauen.com/about.html
Make sure the file is actually saved in UTF-8 format.
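A quick way to test this, sketched in Python: a file saved in a legacy 8-bit encoding will typically contain byte sequences that are not valid UTF-8, so simply trying to decode the raw bytes tells you which case you are in.

```python
# Quick check: does a file's raw byte content decode as UTF-8?
def is_valid_utf8(data):
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8("ü – ok".encode("utf-8")))  # True
print(is_valid_utf8("ü".encode("latin-1")))     # False: a lone 0xFC byte is not valid UTF-8
```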

How can I properly display Vietnamese characters in ColdFusion?

I'm having a hard time trying to properly display Vietnamese text in ColdFusion. I've got the proper charset set to UTF-8 but still no luck. The same text works fine in an HTML page. What else am I missing? Any suggestion would be much appreciated.
Html:
ColdFusion:
Thanks!
There are two things you need to watch out for, as far as I recall off the top of my head.
The first is to ensure that the .cfm file itself is saved as UTF-8 - this is a file system option, and will probably be settable in your editor. This ensures that the UTF-8 characters are correctly preserved when saving the file.
The other is that every .cfm file that includes any UTF-8 text should start with:
<cfprocessingdirective pageencoding="utf-8" />
This ensures that ColdFusion delivers the page to the browser in the correct format.
Just to be sure: when you display your working HTML, can you check the page encoding used by your browser (e.g. in Firefox you can right-click and choose View Page Info)? Maybe your text is not UTF-8 encoded, which could explain the problem...

UTF8 encoding problem, same results work fine in wordpress

I have a WordPress installation that clients can edit, and all characters display OK. On the main homepage I query the same database for the same title and post content, but it doesn't display correctly - just a question mark.
I have tried sending the UTF-8 headers manually, through .htaccess and through meta tags. I have used SET NAMES utf8 (which turns the characters into the diamond symbol with a question mark inside).
I genuinely can't figure out what it could be now, and I really need these characters to display correctly.
Here's the homepage. You can see in the Sounddhism 6 preview that there are lots of question marks; if you click on it you will see what they are meant to look like:
http://nottingham.subverb.net
I have passed it through the validator and it gives me this error:
Sorry, I am unable to validate this document because on line 373 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: utf8 "\xA0" does not map to Unicode
Which, I appreciate, is supposed to help me, but I don't know what to do about it. Especially since the byte generating the error on that line is supposed to be a space and comes AFTER the offending question marks.
Can anyone help?
Compare the encoding of both the back-end scripts in WordPress and your homepage script. If you're using IE, right-click the page and check the encoding. Sometimes it's set to "Auto-detect", and IE will often detect a different encoding for different pages, causing strange issues like this.
If you're not using IE, try using a tool like Fiddler to see exactly what encoding (and what bytes) are being sent back and forth, both in the back-end and in your homepage script.
If forcing UTF-8 on your homepage script doesn't work, I would guess that the back-end is not using UTF-8.
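As for the validator error itself: the byte 0xA0 is a non-breaking space in Latin-1/windows-1252, but it is a continuation byte in UTF-8 and can never start a character, which is exactly why the validator stops there. A small Python demonstration (illustrative, not part of the original answer):

```python
# 0xA0 is a non-breaking space in Latin-1, but invalid on its own in UTF-8.
nbsp = b"\xa0"
print(nbsp.decode("latin-1"))       # the non-breaking space character (U+00A0)
try:
    nbsp.decode("utf-8")
except UnicodeDecodeError as e:
    print(e)                        # can't decode byte 0xa0: invalid start byte
```

So the page (or the database connection feeding it) is emitting Latin-1 bytes while claiming UTF-8, which also matches the question-mark symptoms described above.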