UTF-8 encoding problem: same results display fine in WordPress

I have a WordPress installation that clients can edit, and all characters display correctly there. On the main homepage I query the same database for the same title and post content, but some characters do not display correctly; they show up as question marks.
I have tried sending the UTF-8 headers manually, through .htaccess, and through meta tags. I have also used SET NAMES utf8, which turns the characters into the diamond symbol with a question mark inside.
I genuinely can't figure out what it could be now, and I really need these characters to display correctly.
Here's the homepage. You can see in the Sounddhism 6 preview that there are lots of question marks; if you click on it, you will see what they are meant to look like.
http://nottingham.subverb.net
I have passed it through the validator and it gives me this error:
Sorry, I am unable to validate this document because on line 373 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: utf8 "\xA0" does not map to Unicode
I appreciate that this is supposed to help me, but I don't know what to do about it, especially since the character generating the error on that line is supposed to be a space and comes AFTER the offending question marks.
Can anyone help?

Compare the encoding of both the back-end scripts in WordPress and your homepage script. If you're using IE, right-click the page and check the encoding. Sometimes it's set to "Auto-detect", and IE will often detect a different encoding for different pages, causing strange issues like this.
If you're not using IE, try a tool like Fiddler to see exactly what encoding is declared and what bytes are being sent back and forth, both in the back-end and in your homepage script.
If forcing UTF-8 on your homepage script doesn't work, I would guess that the back-end is not using UTF-8.
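To make the validator message a bit more concrete, here is a minimal Python sketch (Python is used only for illustration, and the sample bytes are made up, not taken from the page). It shows that a lone 0xA0 byte is a non-breaking space in Latin-1 but is not valid UTF-8 on its own, and that lenient UTF-8 decoding turns it into U+FFFD, the "diamond with a question mark".

```python
# Hypothetical sample: two words separated by a Latin-1 non-breaking space.
raw = b"one\xa0two"

# Latin-1 (ISO 8859-1) maps every byte to a character; 0xA0 is U+00A0.
as_latin1 = raw.decode("latin-1")

# Strict UTF-8 decoding rejects the lone 0xA0 byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding substitutes U+FFFD (the replacement character).
as_utf8_lenient = raw.decode("utf-8", errors="replace")

# Decoding with the real source encoding and re-encoding gives valid UTF-8.
fixed = as_latin1.encode("utf-8")

print(repr(as_latin1), strict_ok, repr(as_utf8_lenient), fixed)
```

If the bytes coming out of the database look like the `raw` value here, the data (or the connection) is probably not UTF-8, which would fit the guess above.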

Related

TinyMCE converting ampersand in querystring to HTML entity

Edit: My assumptions about encoding were incorrect. I'm leaving the question as originally asked in case others come here with the same misunderstanding.
When I include a link with a querystring in some text in the editor and then view the source code, I can see that it has converted any & characters in the href to &amp;, which breaks the links.
A link
becomes
A link
and if I change it back to just & in the source, click Ok on the view source dialog, then immediately view source again, it's already worked its charms and encoded the & once again.
Is there a way to cue the editor to go ahead and convert those outside tag attributes, but not mess with those in attributes?
Using an older version (4.0.12), but I see the behavior on the current live sample right on tinymce.com, so if it's a bug it looks like it hasn't been fixed. But I am wondering if it's just a setting I'm missing.
Relevant questions:
Do I encode ampersands in <a href...>?
Do ampersands still need to be encoded in URLs in HTML5?
The HTML spec actually states that ampersands in HTML attributes have to be encoded, so TinyMCE is working 100% as it should. If your server-side code is not handling that correctly, that is an issue with the server-side code.
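As a rough illustration of why the encoded form does not break the link, here is a small Python sketch (the URL is a made-up example). Whatever parses the HTML decodes &amp; back to & before the URL is used, so the query parameters come through unchanged.

```python
from html import escape, unescape
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL with two query parameters.
url = "http://example.com/page?a=1&b=2"

# What an editor writes into the href attribute in the HTML source.
href_in_source = escape(url, quote=True)

# What an HTML parser hands back after decoding character references.
href_after_parsing = unescape(href_in_source)

# The query string is intact once the attribute has been decoded.
params = parse_qs(urlsplit(href_after_parsing).query)

print(href_in_source)
print(params)
```

If a link only breaks when the raw attribute text is used without decoding, that points at the code reading the markup rather than at the editor.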

Support displaying emojis in UILabel

I'm having problems displaying emojis in a UILabel.
In some cases, it even causes a crash when laying out the characters in the label.
These characters are returned from the server as Unicode and are parsed with the AFNetworking framework.
This is an example of how they are returned from the server (console logs):
\U05d4\U05d9\U05d9
I have tried different approaches, like lowercasing this to "\u05d4" or playing with the encoding of the returned string.
Nothing seems to work.
I did manage to show a couple of emojis properly, which makes me think it may be a server-related issue. Does the server need to support particular sets of Unicode characters so that it can return them in the appropriate encoding? I'd be happy if someone could clarify this point for me. (By the way, I believe the server is written in Ruby on Rails.)
Should I parse the data with a different parser (SBJSON)? Switching the networking framework at this point would be impossible given the time and resources available.
What other options do I have?
Thanks
I think you should be able to just paste an emoji character directly into the code as text.
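For what it's worth, the escapes in the log above (U+05D4, U+05D9) are Hebrew letters rather than emoji, and a standards-compliant JSON parser should turn such escapes into real characters by itself. Here is a small Python sketch of that decoding step (Python only for illustration; the payload and its keys are made up). It also shows that an emoji outside the Basic Multilingual Plane arrives in JSON as a surrogate pair and decodes to a single code point.

```python
import json
import unicodedata

# Hypothetical JSON payload: escaped Hebrew text plus an emoji sent as a
# surrogate pair (\ud83d\ude00 is U+1F600).
payload = '{"greeting": "\\u05d4\\u05d9\\u05d9", "emoji": "\\ud83d\\ude00"}'

data = json.loads(payload)
greeting = data["greeting"]
emoji = data["emoji"]

# Name each decoded character to confirm what the escapes stand for.
names = [unicodedata.name(ch) for ch in greeting]

print(greeting, names)
print(emoji, len(emoji), hex(ord(emoji)))
```

So if the parsed string still contains literal backslash sequences, it may be worth checking whether the server is escaping the text twice, or whether the log output is simply an escaped rendering of a string that is actually fine.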

UTF-8 on FF cannot display french accents

In my Firefox browser, the encoding is set to UTF-8. French accents are displayed properly on all pages except one. On the trouble page, they show up as '?' marks. When I change the encoding to Western, the trouble page displays French accents properly, while the other pages no longer do.
In IE, the setting is UTF-8 and all pages show proper French accents.
I know it's an old post, but I was facing the same issue and used htmlentities() in PHP when nothing else worked. This solved the problem for me, so I thought I'd mention it here in case someone else can benefit from it.
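The reason entities sidestep the problem is that the resulting markup is pure ASCII, so it reads the same whichever encoding the browser guesses. A rough Python sketch of the idea follows (this is not PHP's htmlentities(); `to_entities` is a hypothetical helper that only converts non-ASCII characters).

```python
from html import unescape
from html.entities import codepoint2name

def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                          # ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")   # named entity
        else:
            out.append(f"&#{cp};")                  # numeric fallback
    return "".join(out)

encoded = to_entities("déjà vu")
print(encoded)            # ASCII-only markup
print(unescape(encoded))  # what the browser displays
```

This works around a wrong or missing charset declaration rather than fixing it, so the declaration is still worth checking.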
What's the web page?
Most likely the page's own encoding is ISO 8859-1 or something similar (a pure 8-bit encoding). Some web pages don't bother to specify their own encoding in the Content-Type: header, leaving the browser to guess. Apparently in this case Internet Explorer guesses better than Firefox.
If you have the curl command, try curl --head URL to see how and whether the encoding is specified, or right-click and View Page Info in Firefox.
You might consider contacting the owner of the web page and asking them to set the encoding properly (or, as I'd do, just ignore it).
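Once you have the Content-Type header value (from curl --head or Page Info), checking whether it declares an encoding is straightforward. Here is a small Python sketch using the standard library's header parsing; `declared_charset` is a hypothetical helper, and the header values are examples rather than what the page in question actually sends.

```python
from email.message import Message

def declared_charset(content_type: str):
    """Return the charset named in a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

# Example header values (assumed, for illustration only).
print(declared_charset("text/html; charset=ISO-8859-1"))  # charset declared
print(declared_charset("text/html; charset=UTF-8"))       # charset declared
print(declared_charset("text/html"))                      # nothing declared
```

When the result is None, the browser is left to guess, which would explain two browsers disagreeing on the same page.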

How can I properly display Vietnamese characters in ColdFusion?

I'm having a hard time trying to properly display Vietnamese text in ColdFusion. I've set the charset to UTF-8, but still no luck. The same text works fine in an HTML page. What else am I missing? Any suggestions would be much appreciated.
Html:
ColdFusion:
Thanks!
There are two things you need to watch out for, as far as I recall off the top of my head.
The first is to ensure that the .cfm file itself is saved as UTF-8 - this is a file system option, and will probably be settable in your editor. This ensures that the UTF-8 characters are correctly preserved when saving the file.
The other is that every .cfm file that includes any UTF-8 text should start with:
<cfprocessingdirective pageencoding="utf-8" />
This ensures that ColdFusion delivers the page to the browser in the correct format.
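To see why the bytes on disk and the declared encoding both need to agree, here is a short Python illustration (not ColdFusion; the sample text is my own). When UTF-8 bytes are read as if they were a single-byte encoding, each Vietnamese letter turns into several wrong characters.

```python
# "Tiếng Việt", written with escapes so the code points are unambiguous.
text = "Ti\u1ebfng Vi\u1ec7t"

# What a UTF-8 saved file contains on disk.
utf8_bytes = text.encode("utf-8")

# What you get if those bytes are interpreted as Latin-1 instead.
misread = utf8_bytes.decode("latin-1")

# The damage is reversible as long as the bytes themselves were not altered.
recovered = misread.encode("latin-1").decode("utf-8")

print(len(text), len(utf8_bytes), len(misread))
print(recovered == text)
```

The two accented letters take three bytes each in UTF-8, so the misread string comes out four characters longer than the original.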
Just to be sure, when you display your working HTML page, can you check the page encoding used by your browser (i.e. in Firefox you can right-click and choose Page Info)? Maybe your text is not UTF-8 encoded, which could explain the problem.

Why are accented characters rendering inconsistently when accessing the same code on the same server at a different URL?

There is a page on our server that's reachable via two different URLs.
http://www.spotlight.com/6213-5613-0721
http://www.spotlight.com/interactive/cv/1/M103546.html
There's classic ASP behind the scenes, and both of those URLs actually do a Server.Transfer to the same underlying ASP page.
The accents in the name at the top of the page are rendering correctly on one URL and incorrectly on the other - but as far as I can tell, the two requests are returning identical responses (same markup, same headers, same everything) - and I have absolutely no idea why one URL should be rendering correctly whilst the other is corrupting the accented characters.
Is there anything else (content encoding?) that I should be examining - and if so, how can I tell what's being returned beyond the information displayed in Firebug?
I have run into this problem in the past, and the cause was that some file (maybe the ASP file that does the transfer, or some include) was not saved as ANSI.
Check that all files involved in the request have the same encoding on the server (try File -> Save As With Encoding).
I have checked the character encoding in your headers and meta tags and they are consistent across both pages. I also agree that the output of the pages is largely similar - except for the special characters, which are "messed up" in the source file.
I don't think this issue is in the browser; there must be something behind the scenes that causes it. How does the name containing these characters get from the data store to the page?
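Two responses can look identical in Firebug and still differ at the byte level, so one way to settle it is to save the raw body from each URL (with curl, for example) and compare the bytes directly. Below is a minimal Python sketch; `first_difference` is a hypothetical helper, and the sample bytes stand in for the two saved bodies.

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one body is a prefix of the other
    return None

# Stand-ins for the two saved response bodies: the same name, once encoded
# as UTF-8 and once as Latin-1.
body_one = "Jos\u00e9".encode("utf-8")
body_two = "Jos\u00e9".encode("latin-1")

offset = first_difference(body_one, body_two)
print(offset, body_one[offset:], body_two[offset:])
```

If the bodies do differ at the accented characters, that would suggest the two code paths are emitting the name in different encodings before the shared page renders it.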