Getting weird characters when converting NSArray to JSON - iphone

I am getting some weird characters when converting an NSArray containing NSDictionaries to a JSON string.
I tried using both SBJson and NSJSONSerialization with the same result.
Each NSDictionary is populated from the address book with the contact name, email and phone number, and the values are mostly in Hebrew.
The characters look like this:
\327\237
I could not find any information about this, help anyone?
Thanks in advance!
EDIT *
Here is a snippet of the JSON:
[
{"fname":"סתם טקסט"},
{"fname":"סתם טקסט"},
{"fname":"נ\327\231ר"}
]
It's supposed to be:
[
{"fname":"סתם טקסט"},
{"fname":"סתם טקסט"},
{"fname":"ניר"}
]
And I am getting the JSON by using the following code:
NSData *jsonData = [NSJSONSerialization dataWithJSONObject:ContactsArray options:NSJSONReadingMutableLeaves error:&err];
NSLog(@"JSON: %@", [NSString stringWithUTF8String:[jsonData bytes]]);

These characters are octal escape codes. I prefer to look at things in hex. \327 and \237 are 0xD7 and 0x9F in hex.
I looked up U+00D7 and U+009F (unicode characters). They are MULTIPLICATION SIGN and APPLICATION PROGRAM COMMAND. That doesn't make sense in this context, so a straight conversion is not the way to go.
Next, I tried UTF-8 decoding. D7 9F decodes as U+05DF. This is HEBREW LETTER FINAL NUN. That makes sense in this context.
So my guess is that the data you are seeing is UTF-8 bytes that were not understood and were octal-escaped. JSON doesn't support octal escapes, so I'm guessing it's NSLog() or whatever you are using to print the JSON that is doing the escaping.
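The decoding described above can be checked in a few lines. This is a minimal sketch in Python, used here only for illustration since the question itself is about Objective-C. The helper name decode_octal_escapes is hypothetical, and the sketch assumes the escaped bytes are UTF-8 and that every other character in the string is already correct.

```python
import re
import unicodedata


def decode_octal_escapes(text: str) -> str:
    """Turn three-digit octal escapes such as \\327\\237 back into text.

    Assumption: the escaped bytes form valid UTF-8 sequences.
    """
    out = bytearray()
    pos = 0
    for match in re.finditer(r"\\([0-3][0-7]{2})", text):
        # Copy the untouched text before the escape as UTF-8 bytes.
        out += text[pos:match.start()].encode("utf-8")
        # Convert the octal digits to the single byte they stand for.
        out.append(int(match.group(1), 8))
        pos = match.end()
    out += text[pos:].encode("utf-8")
    return out.decode("utf-8")


final_nun = decode_octal_escapes(r"\327\237")
print(hex(ord(final_nun)), unicodedata.name(final_nun))
# 0x5df HEBREW LETTER FINAL NUN

# The broken name from the question: nun, escaped bytes D7 99, resh.
fixed = decode_octal_escapes("\u05e0" + r"\327\231" + "\u05e8")
print(fixed == "\u05e0\u05d9\u05e8")  # True
```

Note that the bytes in the JSON snippet (\327\231) decode to U+05D9, HEBREW LETTER YOD, which is the middle letter of the expected name.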

Related

C# To transform Facebook Response to proper encoded string

I am using a regular StreamReader to read the response from the Facebook Graph API:
https://graph.facebook.com/XXXX?access_token=&fields=id,name,about,address,last_name
I read the response stream, yet it returns:
{"id":"XXXXX","name":"K\u0131r\u0131nt\u0131 Reklam"...}
My code is below - I unsuccessfully tried using explicitly UTF-8 and "iso-8859-9" (Turkish) encodings and setting accept-charset headers. I read Joel's famous article about encodings. It looks like each of the chars '\' 'u' '1' '3' '1' are coming as characters from facebook - I thought this would have been 2 bytes for value 131 in UTF-8. I am confused. I expect this string to be "Kırıntı Reklam".
I could simply find/replace those strings - yet it would be far from elegant and maintainable. How should I properly process or convert the facebook graph api response for strings with accents?
using (WebResponse response = request.GetResponse())
{
    using (Stream dataStream = response.GetResponseStream())
    {
        if (dataStream != null)
        {
            using (StreamReader reader = new StreamReader(dataStream))
            {
                responseFromServer = reader.ReadToEnd();
            }
        }
    }
}
Thank you in advance
tldr; use a JSON library - I like Json.NET - and don't worry about it.
The JSON shown is valid JSON, where \uABCD in a JSON string represents a UTF-16 code unit [1]. The internal JSON character escaping format is useful to avoid having to deal with Unicode stream encoding issues: it allows JSON to be represented entirely in ASCII/7-bit-clean characters (which is a subset of UTF-8).
Using a conforming JSON library to parse the JSON with such escapes would restore the JSON into an appropriate object-graph, of which some values will be properly-decoded String values. The library is responsible for understanding JSON and converting/reading it as appropriate - this includes correctly handling any such \u escape sequences.
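To illustrate the point about letting the parser handle the escapes, here is a minimal sketch using Python's standard json module rather than Json.NET. The sample text mirrors the response in the question; nothing here is specific to the Facebook API.

```python
import json

# The response text exactly as it arrives: pure ASCII, with \u escapes inside.
raw = r'{"id":"XXXXX","name":"K\u0131r\u0131nt\u0131 Reklam"}'

parsed = json.loads(raw)   # the parser resolves the \u0131 escapes
name = parsed["name"]

print(name)                # Kırıntı Reklam
print(len(name))           # 14 characters, no backslashes left
```

No find/replace step is needed; the escapes only exist in the serialized text, not in the parsed string.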
The stream itself (that of the JSON text) should be read with the encoding that the server declares, that a BOM indicates, or that has been pre-negotiated; in practice, just UTF-8 here. This is how the JSON text is encoded, and it has no bearing on the escape sequences found in JSON strings.
[1] Per RFC 4627, The application/json Media Type for JavaScript Object Notation (JSON):
Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point. The hexadecimal letters A though
F can be upper or lowercase. So, for example, a string containing
only a single reverse solidus character may be represented as
"\u005C".
Alternatively, there are two-character sequence escape
representations of some popular characters. So, for example, a
string containing only a single reverse solidus character may be
represented more compactly as "\\".
To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a twelve-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E"
For the doubters, here is a LINQPad example. This uses JSON.Net and imports the Newtonsoft.Json.Linq namespace.
var json = @"{""name"":""K\u0131r\u0131nt\u0131 Reklam""}";
json.Dump(); // -> {"name":"K\u0131r\u0131nt\u0131 Reklam"}
var name = JObject.Parse(json)["name"].ToString();
(name == "Kırıntı Reklam").Dump(); // -> true
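The surrogate-pair rule quoted from the RFC can be checked the same way. This is a small sketch with Python's standard json module (an illustration only, not part of the original answer) showing that the twelve-character escape for the G clef parses to a single code point.

```python
import json

# Twelve-character escape for U+1D11E, as given in the RFC text.
escaped = r'"\uD834\uDD1E"'

clef = json.loads(escaped)
print(len(clef), hex(ord(clef)))          # 1 0x1d11e

# The same two code units appear when the character is encoded as UTF-16.
print(clef.encode("utf-16-be").hex())     # d834dd1e

# Serializing with the default ASCII-only output recreates the pair.
round_trip = json.dumps(clef)
print(round_trip.lower() == escaped.lower())  # True
```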

ios UTF8 encoding from nsstring

I am receiving an NSString that is not properly decoded, like "mystring%201", where it should be "mystring 1". How can I replace all percent escapes that can be interpreted as UTF-8? I read a lot of posts but found no full solution. Please note that the NSString already arrives in this form; I am not asking how to encode a char sequence. Thank you.
- (NSString *)stringByReplacingPercentEscapesUsingEncoding:(NSStringEncoding)encoding is what you want. Basically, use it like so:
newString = [myString stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
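For comparison, the same percent-decoding step looks like this in Python's standard library. This is only a sketch of the technique, not the Cocoa API; the second sample string is an assumption added to show a multi-byte UTF-8 escape.

```python
from urllib.parse import unquote

# The string from the question: %20 is a percent-escaped space.
decoded = unquote("mystring%201", encoding="utf-8")
print(decoded)        # mystring 1

# Multi-byte UTF-8 sequences are reassembled into one character.
accented = unquote("caf%C3%A9", encoding="utf-8")
print(accented)       # café
```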
[urlString stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding]
Check the Strings and Non-ASCII Characters section of Formatting String Objects:
NSString *s = [NSString stringWithUTF8String:"Long \xe2\x80\x94 dash"];
lblDate.text=[NSString stringWithCString:[[arrTripDetail valueForKey:Param_vCurrencySymbol] UTF8String] encoding:NSUTF8StringEncoding];
You can convert a custom symbol or emoji escape to a string this way
eg :
input : \U20b9
Output: ₹
Do you want just percent encoding/decoding or full URL encoding/decoding? -(NSString*)stringByReplacingPercentEscapesUsingEncoding: will work if it is just percent encoding, but if there is full URL encoding there (so, for example a space could be either %20 or +) then you'll need something like the url decoding in the three20 library (or search on there, there are lots of examples of how to do it such as URL decoding/encoding NSString).
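The difference between plain percent decoding and form-style URL decoding can be seen in a short Python sketch. The sample string is made up for illustration; unquote and unquote_plus are standard-library functions, not the three20 helpers mentioned above.

```python
from urllib.parse import unquote, unquote_plus

sample = "first%20name=John+Smith"

# Percent decoding only: %20 becomes a space, "+" is left alone.
percent_only = unquote(sample)
print(percent_only)   # first name=John+Smith

# Form-style decoding: "+" is also treated as a space.
form_style = unquote_plus(sample)
print(form_style)     # first name=John Smith
```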

iOS Certain Octal Escape Sequences Cause nil for stringWithUTF8String

Our code calls stringWithUTF8String but some data we have uses an octal sequence \340 in the string. This causes some code to break because we never expect the function to return nil. I did some research and found that any octal sequence from \200-\777 will give the same result. I know I can handle this returning nil but I want to understand why it would return nil, and what those octal escapes are interpreted as.
NSString *result = [NSString stringWithUTF8String:"Mfile \340 xyz.jpg"];
Running this code returns nil for result. It appears that, to code defensively, we will have to check for nil everywhere we use it, which seems unfortunate. The documentation for the function does not say anything about returning nil as a possibility. I would bet that there is a lot of code out there that does not check for it either.
A lone \340 (byte 0xE0) is not valid UTF-8: 0xE0 starts a three-byte sequence and must be followed by two continuation bytes, which the space after it is not. Use the ASCII encoding for this instead:
NSString * result = [NSString stringWithCString:"Mfile \340 xyz.jpg" encoding:NSASCIIStringEncoding];
NSLog(@"%@", result);
If you want iOS to handle it as UTF-8 you have to make sure it's valid UTF-8 characters you pass to it, so you may need to convert the octal characters to something human readable first.
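Why the byte is rejected can be shown outside of Cocoa. The following Python sketch decodes the same bytes; note that Python's strict ascii codec would also reject 0xE0, so latin-1 is used as a stand-in for a lenient single-byte encoding. That substitution is an assumption of this sketch, not what the Objective-C code above does.

```python
# Python bytes literals accept the same octal escape: \340 is byte 0xE0.
data = b"Mfile \340 xyz.jpg"

try:
    data.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    # 0xE0 announces a three-byte sequence, but a space (0x20) follows.
    is_valid_utf8 = False
print(is_valid_utf8)                  # False

# A single-byte encoding maps every byte to some character.
as_latin1 = data.decode("latin-1")
print(as_latin1)                      # Mfile à xyz.jpg

# With two proper continuation bytes, 0xE0 is fine in UTF-8.
complete = b"\340\244\205".decode("utf-8")
print(hex(ord(complete)))             # 0x905
```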
I added a category method called safeStringWithUTF8String:, which is now called everywhere instead. It simply checks the return value for nil and returns the empty string if the input is not valid. Not great, but I'm not sure what else to do; we have to be able to handle any data passed in.
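The same defensive wrapper idea can be sketched in Python. The function name safe_decode_utf8 is hypothetical and only mirrors the category described above; the second variant, which substitutes U+FFFD for bad bytes instead of discarding the whole string, is an alternative added here for comparison.

```python
def safe_decode_utf8(data: bytes, fallback: str = "") -> str:
    """Return the decoded text, or `fallback` if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback


bad = b"Mfile \340 xyz.jpg"

empty = safe_decode_utf8(bad)
print(repr(empty))                    # ''

# Alternative: keep the valid parts and mark each bad byte with U+FFFD.
lossy = bad.decode("utf-8", errors="replace")
print(lossy == "Mfile \ufffd xyz.jpg")  # True
```

Returning an empty string hides the failure from callers, so the replacement-character variant may be easier to debug when the data is shown to a user.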

Perl JSON pound sign escaping

I am trying to use a web API of a service written in Perl (OTRS).
The data is sent in JSON format.
One of the string values inside the JSON structure contains a pound sign, which apparently is used as a comment character in JSON.
This results in a parsing error:
unexpected end of string while parsing
JSON string
I couldn't find how to escape the character in order to get the string parsed successfully.
The obvious slash escaping results in:
illegal backslash escape sequence in
string
Any ideas how to escape it?
Update:
The URL I am trying to use looks something like that (simplified but still causes the error):
http://otrs.server.url/otrs/json.pl?User=username&Password=password&Object=TicketObject&Method=ArticleSend&Data={"Subject":"[Ticket#100000] Test Ticket from OTRS"}
Use URI::Escape:
use URI::Escape;
my $safe = uri_escape($data);  # escape the Data value only, not the whole URL
See rfc1738 for the list of characters which can be unsafe.
The hash symbol, #, has a special meaning in URLs, not in JSON. Your URL is probably getting truncated at the hash before the remote server even sees it:
http://otrs.server.url/otrs/json.pl?User=username&Password=password&Object=TicketObject&Method=ArticleSend&Data={"Subject":"[Ticket
And that means that the remote server gets mangled JSON in Data. The solution is to URL encode your parameters before pasting them together to form your URL; eugene y tells you how to do this.
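Both the truncation and the fix can be reproduced with Python's standard library. This is a sketch only: the URL is shortened from the one in the question, and Python stands in for the Perl URI::Escape call.

```python
from urllib.parse import quote, unquote, urlsplit

data = '{"Subject":"[Ticket#100000] Test Ticket from OTRS"}'
base = ("http://otrs.server.url/otrs/json.pl"
        "?Object=TicketObject&Method=ArticleSend&Data=")

# Pasting the raw JSON in: everything after "#" becomes the fragment,
# which is never sent to the server.
naive = urlsplit(base + data)
print(naive.query.endswith('{"Subject":"[Ticket'))   # True
print(naive.fragment.startswith("100000]"))          # True

# Escaping the parameter value first keeps it inside the query string.
encoded = quote(data, safe="")
safe_url = urlsplit(base + encoded)
print("%23" in safe_url.query, safe_url.fragment == "")  # True True
print(unquote(encoded) == data)                          # True
```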

stringByAddingPercentEscapesUsingEncoding not working with NSStrings with ' 0'

I have been having some problem with the stringByAddingPercentEscapesUsingEncoding: method.
Here's what happens:
When I try to use the method to convert the NSString:
"..City=Cl&PostalCode=Rh6 0Nt"
I get this:
"City=Cl&PostalCode=Rh62t"
It should be:
"..City=Cl&PostalCode=Rh6%200Nt"
What can I do about this? Thanks in advance !!
For me, this:
NSString *s=[@"..City=Cl&PostalCode=Rh6 0Nt" stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"s=%@",s);
... outputs:
s=..City=Cl&PostalCode=Rh6%200Nt
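The expected result can be cross-checked with a percent-encoder from another standard library. In this Python sketch, "=" and "&" are marked as safe so that only the space is escaped, which matches the output shown above; that choice of safe characters is an assumption made for the comparison.

```python
from urllib.parse import quote

query = "..City=Cl&PostalCode=Rh6 0Nt"

# Leave the query-string delimiters alone and escape everything else unsafe.
escaped = quote(query, safe="=&")
print(escaped)    # ..City=Cl&PostalCode=Rh6%200Nt
```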
You're most likely using the wrong encoding.
This happens when you try to encode a string to NSASCIIStringEncoding and it contains characters that ASCII does not support.
If the string can contain non-ASCII characters, make sure you encode to NSUTF8StringEncoding; otherwise the method returns nil.