How to parse special characters in XML for iPad? - iphone

I am getting problem while parsing xml files that contains some special characters like single quote,double quote (', "")etc.I am using NSXMLParser's parser:foundCharacters:method to collect characters in my code.
<synctext type = "word" >They raced to the park Arthur pointed to a sign "Whats that say" he asked Zoo said DW Easy as pie</synctext>
When i parse and save the text from above tag of my xml file,the resultant string is appearing,in GDB, as
"\n\t\tThey raced to the park Arthur pointed to a sign \"Whats that say\" he asked Zoo said DW Easy as pie";
Observe there are 2 issues:
1)Unwanted characters at the beginning of the string.
2)The double quotes around Whats that say.
Can any one please help me how to get rid of these unwanted characters and how to read special characters properly.

NSString*string =[string stringByTrimmingCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:#" \n\t"]];

The parser is apparently returning exactly what's in the string. That is, the XML was coded with the starting tag on one line, a newline, two tabs, and the start of the string. And quotes in the string are obviously there in the original (and it's not clear in at least this example why you'd want to delete them).
But if you want these characters gone then you need to post-process the string. You can use Rams' statement to eliminate the newline and tabs, and stringByReplacingOccurrencesOfString:WithString: to zap the quotes.
(Note that some XML parsers can be instructed to return strings like this with the leading/trailing stuff stripped, but I'm not sure about this one. The quotes will always be there, though.)

Related

Differentiate properly escaped HTML metacharacters from improperly escaped ones

I'm working on a replacement for a desktop Java app, a single page app written in Scala and Lift.
I have this situation where some of data in the database has properly used HTML metacharacters, such as Unicode escape sequences for accented characters in non-English names. At the same time, I have other data with improper HTML metacharacters, such as ampersands in the names or organizations.
Good (don't escape): Universita\u0027
Bad (needs escape): Bob & Jim
How do I determine whether or not the data needs to be fixed before I send it to the client?
There are two ways to approach this. One is a function that takes a string and returns the index of any improperly escaped HTML metacharacters (which I can fix myself). Alternately it could be a function that takes a string and returns a string with the improperly escaped metacharacters fixed, and leaves the proper ones alone.

retrieving NSString from array in iPhone

I have an array which contains description of a route on map. I got this array by parsing JSON. My arrays contains string in this format:
"<b>Sri Krishna Nagar Rd</b> \U306b\U5411\U304b\U3063\U3066<b>\U5317\U6771</b>\U306b\U9032\U3080",
"\U53f3\U6298\U3057\U3066\U305d\U306e\U307e\U307e <b>Sri Krishna Nagar Rd</b> \U3092\U9032\U3080",
"\U5927\U304d\U304f\U5de6\U65b9\U5411\U306b\U66f2\U304c\U308a\U305d\U306e\U307e\U307e <b>Bailey Rd/<wbr/>NH 30</b> \U3092\U9032\U3080<div class=\"\">\U305d\U306e\U307e\U307e NH 30 \U3092\U9032\U3080</div><div class=\"google_note\">\n<b landmarkid=\"0x39ed57bfe47253b7:0x779c8bf48892f269\" class=\"dir-landmark\">Petrol Bunk</b>\U3092\U901a\U904e\U3059\U308b<div class=\"dirseg-sub\">\Uff083.9 km \U5148\U3001\U53f3\U624b\Uff09</div>\n</div>",
Now I want to get name of places from this array like Sri Krishna Nagar Rd , NH 30 Petrol Bunk. First two should give Sri Krishna Nagar Rd and last on should give NH 30 Petrol
Bunk
How can I get result like this.Any help would be appreciated. Thanx In Advance.
Again, suppose I have string in this format..."\U5de6\U6298\U3059\U308b" which don't have ny place name.How will i handle this scenarios.
You can get like below:
NSString *strName=[yourArray objectAtIndex:index];
NSString *yourPlaceString=[[strName componentsSeparatedByString:#"<b>"] objectAtIndex:1];
yourPlaceString=[[yourPlaceString componentsSeparatedByString:#"</b>"] objectAtIndex:0];
you can get all places like this.
First of all, you should check if you don't have any other cleaner API available for the service you query this data. If the service returns such garbage in its JSON response, that shouldn't be your responsability to clean up that mess: the service should return some text that is more usable if it is a real clean API.
Next, if you really don't have any other choice and really need to clean this text, you have two options:
If the text is XHTML (I mean real XHTML, conforming to the XML standard) you may use an NSXMLParser to filter out any tags and only keep the text from your string. This may be a bit too much for this anyway so I don't really recommand it.
You can use regular expressions. If you are developping for iOS4.0+ you can use the NSRegularExpressionclass for this purpose. The tricky part is to get the right regex (can help you with that if needed)
You can use the NSScanner class (which is available in iOS since 2.0 IIRC) to scan characters in you string and parse it. This is probably easier to understand and the way to go if you are not a regex expert, so I recommand this approach
For example if you choose the NSScannersolution, you can scan your string for characters in the alphanumeric character set, to scan letters and digits and accumulate it (you may also add ponctuation characters to your NSCharacterSetyou are using if needed). You will have the NSScanner to stop when it encounter characters such as the unicode characters \Uxxxx or like < and >. When you encounter < you can then ask the NSScanner to ignore the characters up to the next >, then start to scan the alphanumeric characters again and accumulating... and so on until the end of the string.
Finally, if you really find a pattern in the response string you are receiving, like if your place names is always between the first <b> and </b> pair (but you have to be sure of that), you can handle it other ways, like:
splitting your string using the <b> text as the separator (e.g. componentsSeparatedByString)
or asking the rangeOfString for the string <b> and then for string </b> and once you have their position, only extract substringWithRange from your original string to extract only the place name (using rangeOfString will be faster that componentsSeparatedByString because it will stop on the first occurrence found)
It looks like an encoding problem - can you change the encoding of the source or target to a different format. I had similar issues with German ö ä ü characters when UTF-8 was turned off....

Stig JSON library parse error: How do you accommodate new lines in JSON?

I have some xml that is coming back from a web service. I in turn use xslt to turn that xml into json (I am turning someone else's xml service into a json-based service). My service, which is now outputting JSON, is consumed by my iphone app using the de facto iphone json framework, SBJSON.
The problem is, using the [string JSONValue] method chokes, and I can see that it's due to line breaks. Lo and behold, even the FAQ tells me the problem but I don't know how to fix it.
The parser fails to parse string X
Are you sure it's legal JSON? This framework is really strict, so won't accept stuff that (apparently) several validators accepts. In particular, literal TAB, NEWLINE or CARRIAGE RETURN (and all other control characters) characters in string tokens are disallowed, but can be very difficult to spot. (These characters are allowed between tokens, of course.)
If you get something like the below (the number may vary) then one of your strings has disallowed Unicode control characters in it.
NSLocalizedDescription = "Unescaped control character '0x9'";
I have tried using a line such as: NSString *myString = [myString stringByReplacingOccurrencesOfString:#"\n" withString:#"\\n"];
But that doesn't work. My xml service is not coming back as CDATA. The xml does have a line break in it as far as I can tell (how would I confirm this). I just want to faithfully transmit the line break into JSON.
I have actually spent an entire day on this, so it's time to ask. I have no pride anymore.
Thanks alot
Escaping a new line character should work. So following line should ideally work. Just check if your input also contains '\r' character.
NSString *myString = [myString stringByReplacingOccurrencesOfString:#"\n" withString:#"\\n"];
You can check which control character is present in the string using any editor which supports displaying all characters (non-displayable characters as well). e.g. using Notepad++ you can view all characters contained in a string.
It sounds like your XSLT is not working, in that it is not producing legal JSON. This is unsurprising, as producing correctly formatted JSON strings is not entirely trivial. I'm wondering if it would be simpler to just use the standard XML library to parse the XML into data structures that your app can consume.
I don't have a solution for you, but I usually use CJSONSerializer and CJSONDeserializer from the TouchJSON project and it is pretty reliable, I have never had a problem with line breaks before. Just a thought.
http://code.google.com/p/touchcode/source/browse/TouchJSON/Source/JSON/CJSONDeserializer.m?r=6294fcb084a8f174e243a68ccfb7e2c519def219
http://code.google.com/p/touchcode/source/browse/TouchJSON/Source/JSON/CJSONSerializer.m?r=3f52118ae2ff60cc34e31dd36d92610c9dd6c306

How to use '^#' in Vim scripts?

I'm trying to work around a problem with using ^# (i.e., <ctrl-#>) characters in Vim scripts. I can insert them into a script, but when the script runs it seems the line is truncated at the point where a ^# was located.
My kludgy solution so far is to have a ^# stored in a variable, then reference the variable in the script whenever I would have quoted a literal ^#. Can someone tell me what's going on here? Is there a better way around this problem?
That is one reason why I never use raw special character values in scripts. While ^# does not work, string <C-#> in mappings works as expected, so you may use one of
nnoremap <C-#> {rhs}
nnoremap <Nul> {rhs}
It is strange, but you cannot use <Char-0x0> here. Some notes about null byte in strings:
Inserting null byte into string truncates it: vim uses old C-style strigs that end with null byte, thus it cannot appear in strings. These strings are very inefficient, so if you want to generate a very large text, try accumulating it into a list of lines (using setline is very fast as buffer is represented as a list of lines).
Most functions that return list of strings (like readfile, getline(start, end)) or take list of strings (like writefile, setline, append) treat \n (NL) as Null. It is also the internal representation of buffer lines, see :h NL-used-for-Nul.
If you try to insert \n character into the command-line, you will get Null shown (but this is really a newline). If you want to edit a file that has \n in a filename (it is possible on *nix), you will need to prepend newline with backslash.
The byte ctrl-# is also known as '\0'. Many languages, programs, etc. use it as an "end of string" marker, so it's not surprising that vim gets confused there. If you must use this byte in the middle of a script string, it sounds like your workaround is a decent one.

NSURL doesn't work any time

i have the following problem sometimes my openURL-Dialog works perfectly, then i looked at the variable from the url and that is the variable:
www.brehm-gmbh.de
but some other times there are some crazy elements at the end of the variable like this:
www.adamczyk-fenster.de%E2%80%8E
i get this pages from an .asc file and both are in this file normal without this elements,
what can i do to solve this problem?
thank you all for helping beforehand
From Wikipedia:
The left-to-right mark (LRM) is a
control character or non-printing
character, used in the computerized
typesetting of bi-directional text,
containing mixed left-to-right scripts
(such as English and Russian) and
right-to-left scripts (such as Arabic
and Hebrew). It is used to change the
way adjacent characters are grouped
with respect to text direction.
You're getting this because (1) you've got non-English URLs, are composing URLs from non-English strings or you have some other non-English elements and the string encoding is attempting to compensate or (2) it's garbarge being interpreted as an encoding (unlikely if it is consistant.)
Call -[NSString localizedNameOfStringEncoding] on the string before you use it see what encoding it is using. You probably need to explicitly establish an encoding when you read in the strings before you put them in the NSURL.