Special character issue in text fetched from a CMS (iPhone)

I am using an API to fetch data from a CMS, and we display the text that the user entered into the CMS.
My problem is that when the user enters a special character into the CMS, I cannot get that text on the iPhone side.
Here is the link to the text the user entered in the wall description.
We are using a JSON web service that encodes the string as UTF-8, so my JSON string is:
The word 'stop' isn\u0092t in your vocabulary. Run a marathon in 4.5 hours or less.
The character \u0092 is a special character, and we need to display it exactly as shown in the image above.
NOTE:
1) If we pass the string without encoding it to UTF-8 in the web service, I get the whole string as null.
2) I have tried [NSString stringWithCString:[textFromCms cStringUsingEncoding:NSISOLatin1StringEncoding] encoding:NSUTF8StringEncoding];
where textFromCms is the text I got from the CMS, as shown above.
3) I also tried without any conversion/encoding; it ignores the special character.
4) I also tried Base64, but that did not help either.
Any help would be much appreciated.

The CMS apparently uses windows-1252, not UTF-8. The curly apostrophe is byte 92 (hex) in windows-1252 and U+2019 in Unicode, so when properly encoded into JSON it should be \u2019.
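As a hedged illustration of this diagnosis (in Python rather than the Objective-C from the question; the helper name `repair_c1_controls` is hypothetical), the sketch below maps stray C1 control characters such as U+0092 back through windows-1252. It assumes the service really did copy windows-1252 bytes into the JSON as if they were Unicode code points; fixing the encoding on the CMS side would be the cleaner option.

```python
import json


def repair_c1_controls(text):
    """Map C1 control characters (U+0080..U+009F) through windows-1252.

    Hypothetical helper: assumes these code points are really
    windows-1252 bytes that were copied into the string unchanged.
    """
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # byte is unassigned in windows-1252; keep it as-is
        out.append(ch)
    return "".join(out)


# JSON as sent by the service in the question (shortened)
payload = '{"text": "isn\\u0092t in your vocabulary"}'
broken = json.loads(payload)["text"]
fixed = repair_c1_controls(broken)

print(fixed)              # text with the curly apostrophe restored
print(json.dumps(fixed))  # how a correct server would escape it
```

The same idea carries over to the client: decode the JSON first, then remap only the code points in the C1 range, so correctly encoded text is left untouched.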

Related

Displaying foreign characters on the website

I have a small list in a foreign language and I am able to display the special foreign characters on the website I am updating. For example, to display ü on the website, I write Ã¼ in the file. Or to display ö, I write Ã¶ in the file. And they are displayed correctly. So far no problems. But now I must also display the character β. Can you just write me the code for it in that same set? Or better yet, tell me where I can find the corresponding character, such as in a list? What is the name of the list I must look at? Again, I want to display the character β on a website by writing the corresponding special character in the source file, just like I am writing Ã¼ to display ü.
Mojibake is what's happening, because your text editor uses ISO 8859-1 to open and save the files, but your web server serves them to your users as UTF-8. You can confirm it with https://string-functions.com/encodedecode.aspx or other tools, with the encoding set to ISO 8859-1 and the decoding set to UTF-8.
The fix is to set your text editor to use UTF-8.
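A minimal Python sketch of that round trip, for anyone who wants to check it locally instead of with an online tool. The function names `as_mojibake` and `undo_mojibake` are made up for this example, and the entity lookup at the end is an alternative for pages that cannot be switched to UTF-8.

```python
import html


def as_mojibake(text):
    """Show what UTF-8 text looks like when misread as ISO 8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def undo_mojibake(text):
    """Reverse the misreading: ISO 8859-1 characters back to UTF-8 text."""
    return text.encode("iso-8859-1").decode("utf-8")


for ch in "üöβ":
    # Prints each character next to its two-character mojibake form
    print(ch, "->", as_mojibake(ch))

# Named HTML entity for the same letter, decoded with the standard library
print(html.unescape("&beta;"))
```

Once the editor saves the file as UTF-8, the character can simply be typed as-is and none of these workarounds are needed.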

How to convert GB2312 (chinese) characters in UTF-8 inside Weblogic 12?

We have pages that use simplified Chinese (GB2312) in the HTML form. When we submit the form with 3 Chinese characters in a text field, we receive 6 other characters (that aren't Chinese) on the server (Weblogic 12). We then save these 6 characters in the database, and when we read them back to the screen, the Chinese characters are back. Works fine!
But when we receive the 6 characters (that aren't Chinese) on the server, we need to call a web service that only accepts UTF-8 characters. How can I convert the original 3 Chinese characters (GB2312, as in the HTML page) to UTF-8?
I don't know if the characters will show up in the forum, but I'll try:
In the HTML form: 陈玉珍
Received in the Weblogic server: ³ÂÓñÕä
http://www.521yy.com/tools/GB2312-UTF8.html
Since this is a Chinese website, let me explain a little in English.
Copy your Chinese characters into the black form and click the left button at the bottom; it should convert them to UTF-8. The right button at the bottom is reset. Hope it helps.
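The same conversion can also be done in code. Below is a rough Python sketch (not Weblogic/Java, and the function name `recover_gb2312` is hypothetical). It assumes the 6 characters arise because the server decoded the GB2312 bytes as ISO 8859-1, which matches the sample values in the question.

```python
def recover_gb2312(garbled):
    """Undo a GB2312 form value that was decoded as ISO 8859-1.

    Assumption: each garbled character stands for exactly one
    original byte, which holds for ISO 8859-1.
    """
    return garbled.encode("iso-8859-1").decode("gb2312")


received = "³ÂÓñÕä"              # what the server sees (6 characters)
name = recover_gb2312(received)  # the 3 Chinese characters again
utf8_bytes = name.encode("utf-8")  # bytes to send to the UTF-8 web service

print(name)
print(len(utf8_bytes))
```

If the server can be configured to decode request parameters as GB2312 in the first place, this repair step becomes unnecessary.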

Should I use hex ascii accented character code in HTML or use the actual character?

I have several huge CSVs with lots of accented characters in HTML hex code: &#xE9; for é and lots of others, even &#x2013; for –, etc.
My site is a wiki for people to update listings. So when they are presented a textarea for update, the existing content is filled in, and obviously those hex codes will be shown.
Should I bother replacing those codes with the actual accented characters, or just leave them as they are? I wrote a script to replace the characters, but somehow the output is weird characters. Probably the output saved from Ruby isn't in UTF-8.
By default my site is in UTF-8, and the accented characters are displayed properly with some html coding in the view.
Please advise. Thanks.
Could you clarify what the problem is?
If your data (CSV) is in UTF-8, and the default encoding of your site is UTF-8, then all you would need to do is make sure that when users are editing content, that content is properly treated as UTF-8.
You may not need to display the markup to the users. Perhaps you could leverage a WYSIWYG editor package like TinyMCE?
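If you do decide to replace the codes, here is a small sketch of the idea in Python (the question uses Ruby; the sample row and the name `decode_entities` are invented for illustration). It relies on the standard library's `html.unescape`, which handles hex, decimal and named character references.

```python
import html


def decode_entities(text):
    """Replace HTML character references with the actual characters."""
    return html.unescape(text)


# Invented sample cell, using the two codes mentioned in the question
row = "Caf&#xE9; &#x2013; open daily"
clean = decode_entities(row)

print(clean)
# When writing the result back out, encode explicitly as UTF-8
utf8_bytes = clean.encode("utf-8")
```

Weird characters after a conversion like this usually mean the output was written or read with a different encoding than UTF-8, so it is worth setting the encoding explicitly on both sides.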

UTF8 charset, diacritical elements, conversion problems - and Zend Framework form escaping

I am writing a webapp in ZF and am having serious issues with UTF8. It handles multilingual content through Zend Form, and it seems that ZF heavily escapes all of these characters and basically just won't show a field if there are diacritical characters such as 'é', and if I use the HTML entity equivalent, e.g. &eacute;, it gets escaped so that the user will see '&eacute;'.
Zend Form allows for having non escaped data, but trying to use this is confusing, and it seems it'd need to be used all over the place.
So, I have been told that if the page and the text is in UTF8, no conversion to htmlentities is required. Is this true?
And if the last question is true, then how do I convert the source text to UTF8? I am comfortable setting up Apache so that it sends a default UTF8 charset header, and also adding the charset meta tag to the HTML, but after doing this I am still getting messed-up encoding. I have also tried opening the translation CSV file in TextWrangler on OSX as UTF8, but that changed nothing.
Thanks!
L
'é' and if I use the HTML entity equivalent e.g. &eacute; it gets escaped so that the user will see '&eacute;'.
This I don't understand. Can you show an example of how it is displayed, as opposed to how it should be displayed?
So, I have been told that if the page and the text is in UTF8, no conversion to htmlentities is required. Is this true?
Yup. In more detail: If the data you're displaying and the encoding of the HTML page are both UTF-8, the multi-byte special characters will be displayed correctly.
And if the last question is true, then how do I convert the source text to UTF8?
Advanced editors and IDEs enable you to define what encoding the source file is saved in. You would need to open the file in its current encoding (with special characters being displayed correctly) and save it as UTF-8.
If the content is messed up when you have the right content-type header and/or meta tag specified, then the content is not UTF-8 yet. If you don't get it sorted, post an example of what it looks like here.
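To make the two points above concrete, here is a hedged sketch in Python rather than PHP/Zend Framework. `transcode_to_utf8` is a hypothetical helper, ISO 8859-1 is only an assumed source encoding, and `html.escape` stands in for the kind of escaping a form library applies.

```python
import html


def transcode_to_utf8(data, source_encoding):
    """Re-encode raw bytes from their current encoding into UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Bytes as a legacy editor might have saved them (assumed ISO 8859-1)
legacy = "café".encode("iso-8859-1")
utf8 = transcode_to_utf8(legacy, "iso-8859-1")
print(utf8)

# Escaping for HTML only touches markup characters; 'é' passes through
print(html.escape("é <b>"))
```

Simply opening a file "as UTF-8" does not convert it; the bytes have to be decoded with the encoding they were actually written in and then saved again as UTF-8.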

What kind of text code is %62%69%73%68%6F%70?

On a specific webpage, when I hover over a link, I can see the text as "bishop" but when I copy-and-paste the link to TextPad, it shows up as "%62%69%73%68%6F%70". What kind of code is this, and how can I convert it into text?
Thanks!
URL encoding, I think.
You can decode it here: http://meyerweb.com/eric/tools/dencoder/
Most programming languages will have functions to urlencode/decode too.
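For example, a minimal sketch in Python using the standard library's `urllib.parse` (other languages have equivalent functions under different names):

```python
from urllib.parse import quote, unquote

encoded = "%62%69%73%68%6F%70"

# Each %XX is the hex value of one byte; unquote turns them back into text
decoded = unquote(encoded)
print(decoded)

# Going the other way: only characters that need it are escaped by default
print(quote("two words"))
print(quote(decoded))
```

Note that `quote` leaves plain letters alone, so the fully escaped form in the link was produced deliberately rather than by a standard encoder.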
This is URL encoding. It is designed to pass characters like < / or & through a URL using their ASCII values in hex after a %. However, you can also use this for characters that don't need encoding per se. Makes the URL harder to read, which is sometimes desirable.
URL encoding replaces characters outside the ascii set.
More info about URL encoding in the w3schools site.
As mentioned by others, this is simply an ASCII representation of the text so that it can be passed around the HTTP object easily. If you've ever noticed typing in a website URL that has a space in it, the browser will usually convert that to %20. That's the hexadecimal value for the "space" character in ASCII.
This used to be a way to trick old spam scrapers. One way spammers get email addresses is to scrape the source code of websites for strings matching the pattern "username@company.tld". By encoding just the username portion or the whole string as ASCII character codes, the string would be readable by humans, but would require the scraper to convert it to a literal string before it could be used to send emails. Of course, modern-day spamming tools account for this sort of string.