I am trying to do a simple thing: get all my albums.
The problem is that the album names are non-English (they are in Hebrew).
The code that retrieves the albums:
string query = "https://graph.facebook.com/me/albums?access_token=...";
string result = webClient.DownloadString(query);
And this is what one of the returned albums looks like:
{
"id": "410329886431",
"from": {
"name": "Noam Levinson",
"id": "500786431"
},
"name": "\u05ea\u05e2\u05e8\u05d5\u05db\u05ea \u05d2\u05de\u05e8 \u05e9\u05e0\u05d4 \u05d0",
"location": "\u05e9\u05e0\u05e7\u05e8",
"link": "http://www.facebook.com/album.php?aid=193564&id=500786431",
"count": 27,
"type": "normal",
"created_time": "2010-07-18T06:20:27+0000",
"updated_time": "2010-07-18T09:29:34+0000"
},
As you can see, the problem is in the "name" property. Instead of Hebrew letters
I get these codes (they are not garbage; they are consistent, and each code probably represents a single Hebrew letter).
The question is: how can I convert these codes to a non-English language (in my case, Hebrew)?
Or maybe the problem is how I retrieve the albums with the WebClient object. Maybe I should change webClient.Encoding somehow?
What can I do to solve this problem?
Thanks in advance.
That's how Unicode is represented in JSON (see the char definition in the sidebar). They are escape sequences in which the four hex digits are a UTF-16 code unit. For characters in the BMP that is simply the Unicode code point; since only four hex digits are available, a character outside the BMP is written as a surrogate pair, i.e. two consecutive \u escapes.
Any decent JSON parser will transform these Unicode escape sequences into properly encoded characters for you - provided the target encoding supports the character in the first place.
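As a minimal sketch of that point, here is what a standard JSON parser does with the album name from the question. Python's built-in json module is used purely for illustration (the question itself uses C#), and the variable names are made up for this example.

```python
import json

# Raw JSON text as it arrives over the wire; the r'' prefix keeps the
# backslashes literal so this matches what the server actually sends.
raw = r'{"name": "\u05ea\u05e2\u05e8\u05d5\u05db\u05ea \u05d2\u05de\u05e8 \u05e9\u05e0\u05d4 \u05d0"}'

album = json.loads(raw)
name = album["name"]  # now real Hebrew characters, no escapes left

# A character outside the BMP arrives as two escapes (a surrogate pair);
# the parser joins them into a single code point.
emoji = json.loads(r'"\ud83d\ude00"')
print(len(emoji), hex(ord(emoji)))
```

The same applies to any JSON library: parse the response instead of treating it as plain text, and the escapes disappear.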
I had the same problem with the Facebook Graph API and escaped Unicode Romanian characters. I used PHP, but you can probably translate the regexp method into JavaScript.
Method 1 (PHP):
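$str = '\u05ea\u05e2\u05e8\u05d5\u05db\u05ea';
// Turn each \uXXXX escape into the HTML numeric character reference &#xXXXX;
function esc_unicode2html($string) {
    // Hex digits only (a-f, not a-z); the i flag also accepts uppercase digits.
    return preg_replace('/\\\\u([0-9a-f]{4})/i', '&#x$1;', $string);
}
echo esc_unicode2html($str);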
Method 2 (PHP), which probably also works if you declare the charset directly in the HTML:
header('content-type:text/html;charset=utf-8');
These are Unicode character codes. The \u sequence tells the parser that the next 4 characters form a Unicode character number (in hexadecimal). What these characters look like will depend on your font; if someone does not have the correct font, they may just appear as a lot of square boxes.
That's about as much as I know; Unicode is complicated.
For Hebrew texts, this code in PHP will solve the problem:
$str = '\u05ea\u05e2\u05e8\u05d5\u05db\u05ea \u05d2\u05de\u05e8 \u05e9\u05e0\u05d4 \u05d0';
function decode_encoded_utf8($string){
return preg_replace_callback('#\\\\u([0-9a-f]{4})#ism', function($matches) { return mb_convert_encoding(pack("H*", $matches[1]), "UTF-8", "UCS-2BE"); }, $string);
}
echo decode_encoded_utf8($str); // will show (תערוכת גמר שנה א) text
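For Arabic text that arrives double-encoded (each \u00XX escape is really one byte of the UTF-8 sequence, as in the sample below), use this: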
$str = '\u00d8\u00ae\u00d9\u0084\u00d8\u00b5';
function decode_encoded_utf8($string){
return preg_replace_callback('#\\\\u([0-9a-f]{4})#ism', function($matches) { return mb_convert_encoding(pack("H*", $matches[1]), "UTF-8", "UCS-2BE"); }, $string);
}
echo iconv("UTF-8", "ISO-8859-1//TRANSLIT", decode_encoded_utf8($str));
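Why the extra iconv step is needed for that sample can be seen by doing the two steps separately. This is only a sketch in Python, and the variable names are made up for illustration: the escapes decode to six Latin-1 range characters, and only reinterpreting those as UTF-8 bytes yields Arabic.

```python
import json

escaped = r'\u00d8\u00ae\u00d9\u0084\u00d8\u00b5'

# Step 1: decode the \uXXXX escapes (wrapping in quotes makes it a JSON string).
# The result is six characters in the U+0080..U+00FF range, not Arabic yet.
mojibake = json.loads('"' + escaped + '"')

# Step 2: each of those characters is really one UTF-8 byte, so map the
# characters back to bytes via Latin-1 and decode the bytes as UTF-8.
arabic = mojibake.encode("latin-1").decode("utf-8")

print(len(mojibake), len(arabic))
```

If the escapes already hold real code points (as in the Hebrew sample), step 2 must be skipped; applying it there would fail or corrupt the text.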
I'm hoping this is an easy question, and that I'm just not seeing the forest due to all the trees.
I have a string in Flutter that came from a REST API that looks like this:
"What\u0027s this?"
The \u is causing a problem.
I can't do a string.replaceAll("\", "\") on it as the single slash means it's looking for a character after it, which is not what I need.
I tried doing a string.replaceAll(String.fromCharCode(0x92), "") to remove it - That didn't work.
I then tried using a regex to remove it like string.replaceAll("/(?:\)/", "") and the same single slash remains.
So, the question is how to remove that single slash, so I can add in a double slash, or replace it with a double slash?
Cheers
Jase
I found the issue. I was looking for hex 92 (0x92) and it should have been decimal 92.
I ended up solving the issue like this...
String removeUnicodeApostrophes(String strInput) {
// First remove the single slash.
String strModified = strInput.replaceAll(String.fromCharCode(92), "");
// Now, we can replace the rest of the unicode with a proper apostrophe.
return strModified.replaceAll("u0027", "\'");
}
When the string is read, I assume what's happening is that it's being interpreted as literal text rather than as what it should be (code points), i.e. each character of \u0027 is a separate character. You may actually be able to fix this depending on how you access the API; see the dart:convert library. If you decode the raw data with it (utf8.decode for the bytes, then jsonDecode if the body is JSON) you may be able to avoid this entire problem.
However, if that's not an option there's an easy enough solution for you.
What's happening when you're writing out your regex or replace is that you're not escaping the backslash, so it's essentially becoming nothing. If you use a double backslash, that solves the problem, as it escapes the escape character. "\\" => "\".
The other option is to use a raw string like r"\" which ignores the escape character.
Paste this into https://dartpad.dartlang.org:
void main() {
  // Here the Dart compiler itself decodes \u0027, so the string holds a real apostrophe.
  String withapostraphe = "What\u0027s this?";
  String withapostraphe1 = withapostraphe.replaceAll('\u0027', '');
  String withapostraphe2 = withapostraphe.replaceAll(String.fromCharCode(0x27), '');
  print("Original encoded properly: $withapostraphe");
  print("Replaced with nothing: $withapostraphe1");
  print("Using char code for ': $withapostraphe2");
  // Here the backslash is escaped, so the string holds the six literal characters \u0027.
  String unicodeNotDecoded = "What\\u0027s this?";
  String unicodeWithApostraphe = unicodeNotDecoded.replaceAll('\\u0027', '\'');
  String unicodeNoApostraphe = unicodeNotDecoded.replaceAll('\\u0027', '');
  String unicodeRaw = unicodeNotDecoded.replaceAll(r"\u0027", "'");
  print("Data as read with escaped unicode: $unicodeNotDecoded");
  print("Data replaced with apostraphe: $unicodeWithApostraphe");
  print("Data replaced with nothing: $unicodeNoApostraphe");
  print("Data replaced using raw string: $unicodeRaw");
}
To see the result:
Original encoded properly: What's this?
Replaced with nothing: Whats this?
Using char code for ': Whats this?
Data as read with escaped unicode: What\u0027s this?
Data replaced with apostraphe: What's this?
Data replaced with nothing: Whats this?
Data replaced using raw string: What's this?
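The replacements above only handle \u0027. If other escapes can appear in the data, a more general approach is to decode every \uXXXX sequence in one pass. Here is a minimal sketch in Python (not Dart); the function name is hypothetical, and it does not combine surrogate pairs, so it only covers characters in the BMP.

```python
import re

def decode_unicode_escapes(text: str) -> str:
    """Replace every literal \\uXXXX sequence with the character it names."""
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",              # a literal backslash, 'u', four hex digits
        lambda m: chr(int(m.group(1), 16)),  # hex digits -> code point -> character
        text,
    )

# Backslash, 'u', '0', '0', '2', '7' stored as six separate characters.
not_decoded = "What\\u0027s this?"
print(decode_unicode_escapes(not_decoded))  # What's this?
```

When the string comes from a JSON response, letting the JSON parser do this is still the better option; the regex is for text that is not valid JSON as a whole.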
I have the following JSON response; I have shown only a few lines of it here.
How can I replace "&amp;" with "&" and store the result in an array?
Thanks in advance.
[
{
"cat_id": "1",
"cat_name": "Dining &amp; Nightlife",
"0": {
"subcat_id": "2",
"subcat_name": "Restaurants"
},
"1": {
"subcat_id": "3",
"subcat_name": "Bar &amp; Club"
}
}
]
Your JSON data isn't properly encoded. In addition to the JSON format, all the strings seem to be HTML encoded. This is a double encoding, which is completely unnecessary.
I strongly recommend you fix this at the source, i.e. on the server. Then you have proper JSON data that can be used without problems by many different clients.
If you cannot fix it on the server, then you have to decode each string attribute in the JSON response. Before you do that, you had better investigate whether only ampersands are affected (unlikely) or other special characters (such as < > ä ô) as well. Then you can add this to your question and we will likely be able to help you.
Update:
I don't quite understand what classes you're using to keep the parsed JSON data in memory. (You mention arrays.) But to fix the double encoded ampersands, use the following code on each string:
str = [str stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];
Try this:
// Replace the double-encoded ampersands in the raw response before parsing it.
NSString *string = [[request responseString] stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];
NSDictionary *resDict = [parser objectWithString:string error:nil];
You have to go through the array: in a for loop, replace &amp; with & in each string, add that string to a new array, and use the new array.
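If more entities than &amp; turn out to be affected, replacing them one by one gets tedious. A sketch of the "decode each string attribute" approach, written in Python rather than Objective-C and using a hypothetical helper name, parses the JSON first and then HTML-unescapes every string it contains:

```python
import html
import json

def unescape_strings(value):
    """Recursively HTML-unescape every string inside parsed JSON data."""
    if isinstance(value, str):
        return html.unescape(value)
    if isinstance(value, list):
        return [unescape_strings(item) for item in value]
    if isinstance(value, dict):
        return {key: unescape_strings(item) for key, item in value.items()}
    return value  # numbers, booleans and null pass through unchanged

# Shortened version of the response from the question.
raw = ('[{"cat_id": "1", "cat_name": "Dining &amp; Nightlife", '
       '"1": {"subcat_id": "3", "subcat_name": "Bar &amp; Club"}}]')

categories = unescape_strings(json.loads(raw))
print(categories[0]["cat_name"])          # Dining & Nightlife
print(categories[0]["1"]["subcat_name"])  # Bar & Club
```

Unescaping after parsing avoids one pitfall of replacing text in the raw response: an entity such as &quot; would turn into a bare quote and break the JSON.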
I have trouble receiving data from a web page encoded in Chinese Big5.
I looked for sample code but could not find what I need for Big5; what I have looks like this:
if ([encodingName isEqualToString:@"euc-jp"]) {
    receivedDataEncoding = NSJapaneseEUCStringEncoding;
} else {
    receivedDataEncoding = NSUTF8StringEncoding;
}
What should I use in place of "NSJapaneseEUCStringEncoding" for the Big5 Chinese encoding?
Thanks in advance for any answer.
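You can use the kCFStringEncodingBig5_E constant, which is available in
CoreFoundation/CFStringEncodingExt.h. Since it is a CFStringEncoding, pass it through CFStringConvertEncodingToNSStringEncoding() to get a value usable as an NSStringEncoding.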
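The underlying pattern (look up a decoder from the charset name the server reports, fall back to UTF-8 otherwise) is the same on any platform. A rough Python sketch of it, with a hypothetical helper name and locally built sample bytes standing in for network data:

```python
import codecs

def pick_codec(encoding_name: str, default: str = "utf-8") -> str:
    """Return the codec name for a charset label, falling back to a default."""
    try:
        return codecs.lookup(encoding_name).name
    except LookupError:
        # Unknown or misspelled charset label.
        return default

# Stand-in for bytes received from a Big5-encoded page.
received = "中文".encode("big5")

text = received.decode(pick_codec("big5"))
print(len(received), len(text))  # 4 bytes, 2 characters
```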
I would like to initialize a set of strings with Hebrew chars like this:
NSString *hebStr = @" <some hebrew> ";
The problem I am having is that when I type the "some hebrew" in Xcode, it comes out as "werbeh emos", Left-to-Right (LTR) instead of Right-to-Left (RTL).
How do I tell Xcode that I want RTL for that string only?
I have several characters that aren't recognized properly.
Characters like:
º
á
ó
(etc..)
This means that the character encoding is not UTF-8, right?
So, can you tell me what character encoding it could be, please?
We don't have nearly enough information to really answer this, but the gist of it is: you shouldn't just guess. You need to work out where the data is coming from, and find out what the encoding is. You haven't told us anything about the data source, so we're completely in the dark. You might want to try Encoding.Default if these are files saved with something like Notepad.
If you know what the characters are meant to be and how they're represented in binary, that should suggest an encoding... but again, we'd need to know more information.
Read this first: http://www.joelonsoftware.com/articles/Unicode.html
There are two encodings involved: the one that was used to encode the string and the one that is used to decode it. They must be the same to get the expected result. If they are different, some characters will be displayed incorrectly. We can try to guess if you post the actual and expected results.
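A small illustration of that mismatch, sketched in Python with the characters from the question. The choice of UTF-8 and Latin-1 here is only an assumption for the demo; the actual pair in the asker's case is unknown.

```python
original = "º á ó"

# Encode with one encoding, decode with another: the classic mismatch.
# Each non-ASCII character becomes two wrong characters.
wrong = original.encode("utf-8").decode("latin-1")

# Same encoding on both sides round-trips cleanly.
right = original.encode("utf-8").decode("utf-8")

# If you know which pair was mixed up, the damage can be reversed.
repaired = wrong.encode("latin-1").decode("utf-8")

print(len(original), len(wrong), right == original, repaired == original)
```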
I wrote a couple of methods to narrow down the possibilities a while back for situations just like this.
using System.Collections.Generic;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        Encoding[] matches = FindEncodingTable('Ÿ');
        Encoding[] enc2 = FindEncodingTable(159, 'Ÿ');
    }

    // Locates all encodings that map the given byte value to the given character.
    // "characterPosition": decimal position of the character in the unknown encoding table, e.g. 159 in the extended ASCII table
    // "character": the character to locate in the encoding table, e.g. 'Ÿ' in the extended ASCII table
    static Encoding[] FindEncodingTable(int characterPosition, char character)
    {
        List<Encoding> matches = new List<Encoding>();
        byte[] bytes = { (byte)characterPosition };
        foreach (EncodingInfo encInfo in Encoding.GetEncodings())
        {
            Encoding thisEnc = encInfo.GetEncoding();
            char[] chars = thisEnc.GetChars(bytes);
            // Some encodings yield no char for a lone byte, so check the length first.
            // No break here: we want every matching encoding, not just the first.
            if (chars.Length == 1 && chars[0] == character)
                matches.Add(thisEnc);
        }
        return matches.ToArray();
    }

    // Locates all encodings that can represent the specified character.
    static Encoding[] FindEncodingTable(char character)
    {
        List<Encoding> matches = new List<Encoding>();
        string original = character.ToString();
        foreach (EncodingInfo encInfo in Encoding.GetEncodings())
        {
            Encoding thisEnc = encInfo.GetEncoding();
            // GetBytes never returns null; unsupported characters become a fallback such as '?',
            // so test for a lossless round trip instead.
            byte[] temp = thisEnc.GetBytes(original);
            if (thisEnc.GetString(temp) == original)
                matches.Add(thisEnc);
        }
        return matches.ToArray();
    }
}
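The same narrowing-down idea can be sketched in Python. The candidate list below is an arbitrary assumption for the demo (Python has no direct equivalent of Encoding.GetEncodings()), and the function names are hypothetical.

```python
# Arbitrary shortlist of single-byte codecs to test; extend as needed.
CANDIDATES = ["cp1250", "cp1252", "cp437", "latin-1", "mac_roman"]

def encodings_mapping_byte(byte_value, character, candidates=CANDIDATES):
    """Return the candidate codecs in which the single byte decodes to character."""
    matches = []
    for name in candidates:
        # errors="replace" avoids an exception for bytes the codec leaves undefined.
        decoded = bytes([byte_value]).decode(name, errors="replace")
        if decoded == character:
            matches.append(name)
    return matches

def encodings_containing(character, candidates=CANDIDATES):
    """Return the candidate codecs that can encode character at all."""
    matches = []
    for name in candidates:
        try:
            character.encode(name)
        except UnicodeEncodeError:
            continue
        matches.append(name)
    return matches

by_position = encodings_mapping_byte(159, "Ÿ")
by_character = encodings_containing("Ÿ")
print(by_position, by_character)
```

Checking by position is the stronger test: many encodings contain a given character, but far fewer put it at the same byte value.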
Encoding means transforming existing content into a form that the destination protocol can parse.
An example of encoding can be seen when browsing the internet:
The site you visit, www.example.com, may let you run custom searches via the URL:
www.example.com?search=...
The values in the URL's query string require URL encoding. If you were to write:
www.example.com?search=cat food cheap
the request would not be valid, because it contains an invalid character: ' ' (a space).
To correct this encoding error, replace each ' ' with '%20' to form this URL:
www.example.com?search=cat%20food%20cheap
Different systems use different forms of encoding; in this example I have used standard URL percent-encoding, which writes a byte as '%' followed by two hex digits. In other applications you may need other types of encoding.
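In practice you rarely do this replacement by hand; most languages ship a helper for it. A short sketch using Python's standard urllib.parse module, with the example.com URL taken from the text above:

```python
from urllib.parse import quote, unquote, urlencode

query = "cat food cheap"

# Percent-encode a single value: each space becomes %20.
encoded = quote(query)
url = "http://www.example.com/?search=" + encoded
print(url)  # http://www.example.com/?search=cat%20food%20cheap

# Form-style encoding of key/value pairs uses '+' for spaces instead.
form_style = urlencode({"search": query})
print(form_style)  # search=cat+food+cheap

# Decoding reverses the transformation.
decoded = unquote(encoded)
```

Both '%20' and '+' are commonly accepted in a query string, which is one example of different systems using different forms of encoding.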
Good Luck!