I use NSXMLParser to parse XML documents from a server. They are encoded as UTF-8.
My problem is that NSXMLParser breaks at umlauts (ä, ö, ü) and starts a new element.
For example:
Lösen -- NSXMLParser ---> L + ösen
How do I get NSXMLParser to read my umlaut words completely, like every other word?
Regards
Sorry, but based on your comment on the original question (foundCharacters receiving the text in two calls), the parser is behaving perfectly well. See the "Discussion" section for the parser:foundCharacters: method, quoted below:
The parser object may send the delegate several parser:foundCharacters: messages to report the characters of an element. Because string may be only part of the total character content for the current element, you should append it to the current accumulation of characters until the element changes.
As you can see, the parser is free to pass your delegate the characters in as many chunks as it sees fit.
Calls to foundCharacters: are not delimited by tags; you need to concatenate the characters passed in until the next call to didEndElement.
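The accumulation described above can be sketched as follows. This is a minimal sketch, assuming ARC and a flat document; the class ElementTextCollector, its properties, and the wrapper function CollectElementTexts are hypothetical names introduced for illustration, not part of Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects the complete text of every element with a given name.
@interface ElementTextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, copy) NSString *targetElement;             // element name to collect
@property (nonatomic, strong) NSMutableString *buffer;           // non-nil only inside the target element
@property (nonatomic, strong) NSMutableArray<NSString *> *texts; // completed element texts
@end

@implementation ElementTextCollector

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    if ([elementName isEqualToString:self.targetElement]) {
        self.buffer = [NSMutableString string]; // start a fresh accumulation
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // May arrive in several chunks per element, so append rather than assign.
    [self.buffer appendString:string]; // does nothing while buffer is nil
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:self.targetElement] && self.buffer != nil) {
        [self.texts addObject:[self.buffer copy]]; // the text is complete only here
        self.buffer = nil;
    }
}

@end

// Hypothetical convenience wrapper around the delegate above.
static NSArray<NSString *> *CollectElementTexts(NSData *xml, NSString *elementName) {
    ElementTextCollector *collector = [[ElementTextCollector alloc] init];
    collector.targetElement = elementName;
    collector.texts = [NSMutableArray array];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xml];
    parser.delegate = collector;
    [parser parse];
    return [collector.texts copy];
}
```

Calling CollectElementTexts with the UTF-8 data of a document such as <words><word>Lösen</word></words> and the element name @"word" should give a single string, Lösen, however many foundCharacters: calls the parser chose to make.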
I ran into that issue with Spanish characters in this line:
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
I'm sure if you get the found characters section working well with the didEndElement function, you'll be fine.
Related
I have some xml that is coming back from a web service. I in turn use xslt to turn that xml into json (I am turning someone else's xml service into a json-based service). My service, which is now outputting JSON, is consumed by my iphone app using the de facto iphone json framework, SBJSON.
The problem is that the [string JSONValue] method chokes, and I can see that it's due to line breaks. Lo and behold, even the FAQ tells me the problem, but I don't know how to fix it.
The parser fails to parse string X
Are you sure it's legal JSON? This framework is really strict, so won't accept stuff that (apparently) several validators accepts. In particular, literal TAB, NEWLINE or CARRIAGE RETURN (and all other control characters) characters in string tokens are disallowed, but can be very difficult to spot. (These characters are allowed between tokens, of course.)
If you get something like the below (the number may vary) then one of your strings has disallowed Unicode control characters in it.
NSLocalizedDescription = "Unescaped control character '0x9'";
I have tried using a line such as: NSString *myString = [myString stringByReplacingOccurrencesOfString:@"\n" withString:@"\\n"];
But that doesn't work. My XML service is not coming back as CDATA. The XML does have a line break in it as far as I can tell (how would I confirm this?). I just want to faithfully transmit the line break into JSON.
I have actually spent an entire day on this, so it's time to ask. I have no pride anymore.
Thanks a lot
Escaping the newline character should work, so the following line should ideally work. Also check whether your input contains a '\r' character.
myString = [myString stringByReplacingOccurrencesOfString:@"\n" withString:@"\\n"];
You can check which control characters are present in the string using any editor that can display all characters, including non-printable ones. For example, Notepad++ can show every character contained in a string.
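The same check can also be done in code. Below is a minimal sketch, assuming ARC; LogControlCharacters and EscapeLineBreaksAndTabs are hypothetical helper names. Note that the code 0x9 in the quoted error message is a TAB, so escaping only "\n" would not be enough for that input. The escape helper is meant for a single string value before it is placed inside JSON, not for a whole JSON document, where whitespace between tokens must stay unescaped. It handles only these three characters, not quotes or backslashes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: logs the index and code of every control character
// in the text and returns how many were found.
static NSUInteger LogControlCharacters(NSString *text) {
    NSCharacterSet *controls = [NSCharacterSet controlCharacterSet];
    NSUInteger count = 0;
    for (NSUInteger i = 0; i < text.length; i++) {
        unichar c = [text characterAtIndex:i];
        if ([controls characterIsMember:c]) {
            NSLog(@"control character 0x%X at index %lu", (unsigned int)c, (unsigned long)i);
            count++;
        }
    }
    return count;
}

// Hypothetical helper: replaces newline, carriage return and tab in one
// string value with their two-character JSON escape sequences.
static NSString *EscapeLineBreaksAndTabs(NSString *value) {
    NSMutableString *escaped = [NSMutableString stringWithCapacity:value.length];
    for (NSUInteger i = 0; i < value.length; i++) {
        unichar c = [value characterAtIndex:i];
        switch (c) {
            case '\n': [escaped appendString:@"\\n"]; break;
            case '\r': [escaped appendString:@"\\r"]; break;
            case '\t': [escaped appendString:@"\\t"]; break;
            default:
                // Copy every other UTF-16 unit through unchanged.
                [escaped appendString:[NSString stringWithCharacters:&c length:1]];
                break;
        }
    }
    return escaped;
}
```

Walking the string unit by unit avoids depending on how string search treats a "\r\n" pair.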
It sounds like your XSLT is not working, in that it is not producing legal JSON. This is unsurprising, as producing correctly formatted JSON strings is not entirely trivial. I'm wondering if it would be simpler to just use the standard XML library to parse the XML into data structures that your app can consume.
I don't have a solution for you, but I usually use CJSONSerializer and CJSONDeserializer from the TouchJSON project, and it is pretty reliable; I have never had a problem with line breaks before. Just a thought.
http://code.google.com/p/touchcode/source/browse/TouchJSON/Source/JSON/CJSONDeserializer.m?r=6294fcb084a8f174e243a68ccfb7e2c519def219
http://code.google.com/p/touchcode/source/browse/TouchJSON/Source/JSON/CJSONSerializer.m?r=3f52118ae2ff60cc34e31dd36d92610c9dd6c306
NSXMLParser's foundCharacters: method does not read a string in one go when it contains special characters, as in København, which is a Danish word.
It breaks the string at ø and reads the rest separately...
What's your question? This is documented behavior:
The parser object may send the delegate several parser:foundCharacters: messages to report the characters of an element. Because string may be only part of the total character content for the current element, you should append it to the current accumulation of characters until the element changes.
If you wish to capture the entire textual contents of a tag, you'll have to catch all these messages and join the contents in a string.
I am having a problem parsing XML files that contain special characters such as single and double quotes (', "). I am using NSXMLParser's parser:foundCharacters: method to collect characters in my code.
<synctext type = "word" >They raced to the park Arthur pointed to a sign "Whats that say" he asked Zoo said DW Easy as pie</synctext>
When I parse and save the text from the above tag of my XML file, the resulting string appears in GDB as
"\n\t\tThey raced to the park Arthur pointed to a sign \"Whats that say\" he asked Zoo said DW Easy as pie";
Observe that there are 2 issues:
1) Unwanted characters at the beginning of the string.
2) The double quotes around Whats that say.
Can anyone please help me get rid of these unwanted characters and read special characters properly?
string = [string stringByTrimmingCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@" \n\t"]];
The parser is apparently returning exactly what's in the string. That is, the XML was coded with the starting tag on one line, a newline, two tabs, and the start of the string. And quotes in the string are obviously there in the original (and it's not clear in at least this example why you'd want to delete them).
But if you want these characters gone, then you need to post-process the string. You can use Rams' statement to eliminate the newline and tabs, and stringByReplacingOccurrencesOfString:withString: to zap the quotes.
(Note that some XML parsers can be instructed to return strings like this with the leading/trailing stuff stripped, but I'm not sure about this one. The quotes will always be there, though.)
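The post-processing described above might look like the sketch below, assuming ARC. CleanElementText is a hypothetical helper name, and it uses whitespaceAndNewlineCharacterSet rather than a hand-written character set. The backslashes before the quotes in the GDB output are only the debugger's escaped display of plain double quotes; they are not characters in the string.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: strips leading and trailing whitespace and newlines,
// then removes every double quote from the text.
static NSString *CleanElementText(NSString *raw) {
    NSString *trimmed = [raw stringByTrimmingCharactersInSet:
                         [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    return [trimmed stringByReplacingOccurrencesOfString:@"\"" withString:@""];
}
```

Trimming only touches the two ends of the string, so it should be applied to the fully accumulated text in didEndElement, not to each chunk received in parser:foundCharacters:, or spaces between chunks could be lost.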
I just want to know: are there any limitations in XML parsing with regard to characters?
For example, can we parse a word containing characters like
"frühe" containing "ü"
"böser" containing "ö"
While I am parsing my XML, which is in a few different languages, some characters are like the above.
When I looked at the console, the text gets interrupted exactly when it reaches "ü",
because the console prints "fr".
Can someone give me some ideas about this?
regards
shishir
If you are using the standard NSXMLParser class and the XML file has the correct encoding= attribute, then you shouldn't have anything to worry about. The console output probably isn't Unicode-aware, so it is interpreting the multi-byte UTF-8 characters literally. Try showing the parsed text in a UIAlertView or some other UI element and see if you still have problems.
Problem 1:
Has anyone worked with TouchXML? I am facing a problem parsing an RSS feed that has characters like & or even &
The parser takes the URL as input and doesn’t seem to parse the XML content. NSXMLParser has no such problem for the same feed URL.
Problem 2:
Another problem with NSXMLParser is when the parser:foundCharacters: method finds “\n”.
Even calls like
if ([currentElementValue isEqualToString:@"\n"])
return;
currentElementValue = [currentElementValue stringByReplacingOccurrencesOfString:@"\n" withString:@""];
don’t help: neither of these lines seems to eliminate the \n character.
Any help, guys?
You need to escape the newline character sequence.
The XML has a "\n" in it as a string, but your line above is looking for the byte (0x0A, I think) that is the newline character on OS X.
You need to look for "\\n" which is the character sequence backslash-n.
[currentElementValue stringByReplacingOccurrencesOfString:@"\\n" withString:@""]
That will get you what you want!
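To make the distinction concrete, here is a minimal sketch, assuming ARC; RemoveNewlines is a hypothetical helper name. It removes both the real newline character and the literal two-character text backslash-n, so it works whichever of the two the feed actually contains. NSLiteralSearch is used so that matching is done character by character.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes real newline characters (0x0A) and the
// literal two-character sequence backslash + n from the text.
static NSString *RemoveNewlines(NSString *text) {
    // @"\n" is one character; @"\\n" is two characters: '\' followed by 'n'.
    NSString *withoutReal =
        [text stringByReplacingOccurrencesOfString:@"\n"
                                        withString:@""
                                           options:NSLiteralSearch
                                             range:NSMakeRange(0, text.length)];
    return [withoutReal stringByReplacingOccurrencesOfString:@"\\n"
                                                  withString:@""
                                                     options:NSLiteralSearch
                                                       range:NSMakeRange(0, withoutReal.length)];
}
```

Since parser:foundCharacters: can deliver text in several chunks, a helper like this is best applied to the accumulated text once didEndElement is reached.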