NSXMLParser stops parsing after encountering special character - iphone

I am reading an XML file from the Google Weather API and parsing it with NSXMLParser. The city in question is Paris. Here is a brief excerpt of the XML output I get:
<?xml version="1.0"?>
<xml_api_reply version="1">
<weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" ><forecast_information>
<city data="Paris, Île-de-France"/>
<postal_code data="Paris"/>
<latitude_e6 data=""/>
<longitude_e6 data=""/>
...
...
Here is the code I use to parse this XML:
NSString *address = @"http://www.google.com/ig/api?weather=Paris";
NSURL *URL = [NSURL URLWithString:address];
NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:URL];
[parser setDelegate:self];
[parser parse];
...
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
{
NSLog(@"XML Parser 1 ... elementName ... %@", elementName);
}
This is the output that I get for the above XML:
XML Parser 1 ... elementName ... xml_api_reply
XML Parser 1 ... elementName ... weather
XML Parser 1 ... elementName ... forecast_information
The problem is that it parses all the tags until it reaches "city data", where the name Paris, Île-de-France contains a non-ASCII character, and then it just stops. It doesn't process the tags that follow, such as postal_code, latitude_e6, and longitude_e6.
So my question is: is there a way to remove all non-ASCII characters from the XML string returned by the URL?

I know what could be happening; I just had the same problem.
Look at the foundCharacters method in your parser delegate.
I had something like this:
if (!currentElementValue) {
currentElementValue = [[NSMutableString alloc] initWithString:string];
}
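With this code, currentElementValue stopped accumulating text as soon as special characters appeared, because the parser can deliver an element's text in several foundCharacters calls and only the first chunk was kept.
My working code is now: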
if (!currentElementValue) {
currentElementValue = [[NSMutableString alloc] initWithString:string];
} else {
[currentElementValue appendString:string];
}
Remember to set currentElementValue to nil at the end of your didEndElement method
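The accumulation pattern described in this answer can be sketched as a complete delegate. This is only an illustration: the class name TextCollector and its values dictionary are hypothetical, and the code assumes ARC. It appends every chunk the parser reports and stores the joined text when the element ends.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate (not from the original post): collects the full
// text of each element, even when it arrives in several chunks.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *currentElementValue;
@property (nonatomic, strong) NSMutableDictionary *values;
@end

@implementation TextCollector

- (instancetype)init {
    if ((self = [super init])) {
        _values = [NSMutableDictionary dictionary];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    // Start each element with an empty buffer.
    self.currentElementValue = nil;
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // May be called several times per element, so append instead of replacing.
    if (!self.currentElementValue) {
        self.currentElementValue = [[NSMutableString alloc] initWithString:string];
    } else {
        [self.currentElementValue appendString:string];
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (self.currentElementValue) {
        self.values[elementName] = [self.currentElementValue copy];
    }
    // Reset so the next element does not inherit this text.
    self.currentElementValue = nil;
}

@end
```

To use it, create a TextCollector, set it as the parser's delegate, call parse, and read the collected strings from values afterwards.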

Ok. I have solved this problem. This is how I got it to work.
The first thing I do is fetch the XML, special characters included, from the URL. Then I strip all the special characters out of the XML string, convert the string to NSData, and pass that NSData object to my NSXMLParser. Since the data contains no more special characters, NSXMLParser is happy.
Here's the code for anyone who may run across this in the future. A big thank you to everyone who contributed to this post!
NSString *address = @"http://www.google.com/ig/api?weather=Paris";
NSURL *URL = [NSURL URLWithString:address];
NSError *error;
NSString *XML = [NSString stringWithContentsOfURL:URL encoding:NSASCIIStringEncoding error:&error];
//REMOVE ALL NON-ASCII CHARACTERS
NSMutableString *asciiCharacters = [NSMutableString string];
for (NSInteger i = 32; i < 127; i++)
{
[asciiCharacters appendFormat:@"%c", (char)i];
}
NSCharacterSet *nonAsciiCharacterSet = [[NSCharacterSet characterSetWithCharactersInString:asciiCharacters] invertedSet];
XML = [[XML componentsSeparatedByCharactersInSet:nonAsciiCharacterSet] componentsJoinedByString:@""];
NSData *data = [XML dataUsingEncoding:NSUTF8StringEncoding];
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
[parser setDelegate:self];
[parser parse];
EDIT:
NSXMLParser is a horrible tool. I have successfully used RaptureXML in all my apps. It's super easy to use and avoids all this nonsense with non-ASCII characters. https://github.com/ZaBlanc/RaptureXML

The problem you're having is that Google's response uses a different encoding than the ASCII or UTF8 that you're expecting. Using the handy command line tool curl, it's easy to see that:
$ curl -I http://www.google.com/ig/api?weather=Paris
HTTP/1.1 200 OK
X-Frame-Options: SAMEORIGIN
Content-Type: text/xml; charset=ISO-8859-1
...
If you look up ISO-8859-1, you'll find that it's also known as the Latin-1 character set. One of the built-in encoding options is NSISOLatin1StringEncoding, so do this:
NSString *XML = [NSString stringWithContentsOfURL:URL encoding:NSISOLatin1StringEncoding error:&error];
Using the correct encoding will make it possible for NSString to figure out how to interpret the characters, and you'll get back usable data. Alternately, you may be able to modify your request to specify the character encoding that you want Google to provide. That might be preferable, so that you don't have to try to match the encoding you use to a specific request.
Edit: Up to this point, my answer focuses on just getting the response as a readable string. I see that your real question involves parsing with NSXMLParser, though. I think you have at least two options here:
Modify the XML that you receive to include the character encoding. The XML that you get back is Latin-1 encoded, but the XML tag says just: <?xml version="1.0"?>. You could modify that to look like: <?xml version="1.0" encoding="ISO-8859-1"?>. I don't know if that would solve the problem with NSXMLParser, but it might.
As suggested above, request the character set that you want from Google. Adding an Accept-Charset header to the request should do the trick, though that'll make retrieving the data a little more complicated.
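A third route, not spelled out in the answer, is to decode the Latin-1 bytes into an NSString and re-encode them as UTF-8 before parsing. The following is a minimal sketch under that assumption. The helper name UTF8DataFromLatin1Data is hypothetical, and the approach only makes sense while the XML declaration names no other encoding, as is the case with the <?xml version="1.0"?> header shown in the question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): reinterpret
// ISO-8859-1 bytes as text, then re-encode that text as UTF-8 so it
// matches the default encoding an XML parser assumes when the
// declaration does not name one.
static NSData *UTF8DataFromLatin1Data(NSData *latin1Data) {
    NSString *xml = [[NSString alloc] initWithData:latin1Data
                                          encoding:NSISOLatin1StringEncoding];
    return [xml dataUsingEncoding:NSUTF8StringEncoding];
}
```

The result can be handed to [[NSXMLParser alloc] initWithData:], so nothing has to be stripped and "Île-de-France" survives intact.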

Stick with ISO-8859-1, so you don't need to "remove special characters". Use a different mechanism for getting the http data.
Use an NSURLConnection; it's far more flexible in the long run, and it's asynchronous.
NSMutableURLRequest *theRequest = [NSMutableURLRequest requestWithURL:[NSURL URLWithString:url]
cachePolicy:NSURLRequestUseProtocolCachePolicy
timeoutInterval:15.0];
NSURLConnection *theConnection = [[NSURLConnection alloc] initWithRequest:theRequest delegate:self];
if (theConnection) {
// Create the NSMutableData to hold the received data.
// receivedData is an instance variable declared elsewhere.
receivedData = [[NSMutableData alloc] init];
return YES;
} else {
// Inform the user that the connection failed.
return NO;
}
}
#pragma mark - Url connection data delegate
- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response {
[receivedData setLength:0];
}
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data {
[receivedData appendData:data];
}
- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error {
receivedData = nil;
[self badLoad];
}
- (void)connectionDidFinishLoading:(NSURLConnection *)connection {
//inform delegate of completion
[self.delegate fetchedData:receivedData];
receivedData = nil;
}

Related

iOS Unable to store URL contents into array

I have to connect to a URL to check whether the record set is empty. The response looks something like this:
<?xml version = "1.0" encoding = "UTF-8"?>
<find>
<record_id>1234</record_id>
<no_record>00001</no_record>
<entry_num>00001</entry_num>
<session-id>aijheifaohqrihelrkqn324tlejaofjaf</session-id>
</find>
My code:
NSMutableURLRequest *request = [[[NSMutableURLRequest alloc] init]
autorelease];
[request setURL:[NSURL URLWithString: finalSearchURL]];
// Content-Type related.
[request setValue:@"application/x-www-form-urlencoded"
forHTTPHeaderField:@"Content-Type"];
// Create Connection.
NSURLConnection *conn = [[NSURLConnection alloc] initWithRequest:request delegate:self];
if (conn) {
// The connection was established.
NSMutableData *receivedData = [[NSMutableData alloc] initWithContentsOfURL:request.URL];
NSLog(@"Data will be received from URL: %@", request.URL);
NSLog(@"Recieved Data 2: %@", receivedData);
}
else
{
// The download could not be made.
NSLog(@"Data could not be received from: %@", request.URL);
}
But it returns me:
Recieved Data : <3c3f786d 6c207665 7273696f 6e203d20 22312e30 2220656e 636f6469 6e67203d 20225554 462d3822 3f3e0a3c 66696e64 3e0a3c73 65745f6e 756d6265 723e3031 39303633 3c2f7365 745f6e75 6d626572 3e0a3c6e 6f5f7265 636f7264 733e3030 30303030 3030313c 2f6e6f5f 7265636f 7264733e 0a3c6e6f 5f656e74 72696573 3e303030 30303030 30313c2f 6e6f5f65 6e747269 65733e0a 3c736573 73696f6e 2d69643e 4d505843 33323433 58564336 4534454a 41464232 45473541 39374237 584e3832 43554631 4e314234 584e4c37 424c5947 4e533c2f 73657373 696f6e2d 69643e0a 3c2f6669 6e643e0a 20>
Can anyone tell me what I am doing wrong? This is my first attempt at getting a response from a URL. Please help, thanks!
See the data as a string this way:
NSString *string = [[NSString alloc] initWithData:receivedData encoding:NSUTF8StringEncoding];
NSLog(@"the xml string is %@", string);
If the parsing goal is simple enough - like just to find the value of one tag - you can use string methods to parse. Otherwise, NSXMLParser or several other options are available.
To see if the string contains a substring, you can do something like this:
if (string) {
NSRange range = [string rangeOfString:@"<session-id>"];
if (range.location != NSNotFound) {
// session-id tag is at index range.location, so we know it's there
}
}
The method you used gets the raw data from the URL. You need a parser to convert the raw data into an understandable structure (probably an NSDictionary rather than an NSArray).
Apple provides NSXMLParser for retrieving the XML structure from the URL, or you can find other XML parser libraries.
Actually, your code is returning the correct data. Since NSData can hold any kind of data, logging it just displays the hex values. If you convert the hex data to a string, you'll see that it has the correct text.
Now, your code can be simplified a lot. All the code for setting up the NSURLConnection is not needed at all. All you need is the following line.
NSString *receivedText = [NSString stringWithContentsOfURL:[NSURL URLWithString:finalSearchURL] encoding:NSUTF8StringEncoding error:NULL];

Parse an XML stored in NSData with libxml2

Can you please help me use libxml2 to parse XML stored in an NSMutableData object? I get the XML using
NSString *path = @"http://www.mySite.com/XMLPATH.xml";
NSURLRequest* request = [NSURLRequest requestWithURL:[NSURL URLWithString:path] cachePolicy:NSURLRequestUseProtocolCachePolicy timeoutInterval:60.0];
connection = [[NSURLConnection alloc] initWithRequest:request delegate:self];
and
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data_
{
[data appendData:data_];
}
where data is an instance of NSMutableData.
Now, how can I get libxml2 to parse this data? I need the equivalent of
NSString *xml; // string containing XML
xmlDocPtr doc = xmlParseMemory([xml UTF8String], (int)[xml lengthOfBytesUsingEncoding:NSUTF8StringEncoding]);
where the XML is in my NSMutableData and not in an NSString.
Thanks!
EDIT:
I'm currently using NSXMLParser to do the job, but I'd like a parser that automatically parses the whole XML document, including its node structure. With NSXMLParser I need to build the node structure of my XML manually.
Is there any reason you are not using NSXMLParser? It is a nice wrapper written by Apple and included in all iOS versions.
http://developer.apple.com/library/mac/ipad/#documentation/Cocoa/Reference/Foundation/Classes/NSXMLParser_Class/Reference/Reference.html
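For the libxml2 route the question asks about, the bytes held by the NSData object can be passed to xmlParseMemory directly, with no NSString in between. The sketch below is a hedged illustration: the helper name RootElementName is hypothetical, and it assumes the project links against libxml2 and has the libxml2 headers on its header search path.

```objc
#import <Foundation/Foundation.h>
#import <libxml/parser.h>
#import <libxml/tree.h>

// Hypothetical helper (not from the original post): builds a libxml2
// document tree from NSData and returns the root element's name.
static NSString *RootElementName(NSData *data) {
    // xmlParseMemory takes a byte buffer and its size as an int.
    xmlDocPtr doc = xmlParseMemory((const char *)[data bytes], (int)[data length]);
    if (doc == NULL) {
        return nil; // not well-formed XML
    }
    xmlNodePtr root = xmlDocGetRootElement(doc);
    NSString *name = nil;
    if (root != NULL) {
        name = [NSString stringWithUTF8String:(const char *)root->name];
    }
    xmlFreeDoc(doc); // the tree is owned by the caller and must be freed
    return name;
}
```

From the root node, the children and next pointers of each xmlNodePtr give the full node structure without any delegate callbacks.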

"Unable to download content from web site" while NSXMLParser initWithData

Currently I am trying to parse an XML string that I already have (no web calls needed). My app is a native iPhone app in Objective-C. I have set up an NSXMLParser delegate class which uses initWithData:xmlData. For some reason, the first and only callback on my delegate is parser:parseErrorOccurred:, with the following text:
"Unable to download content from web site (Error code 5 )"
Obviously, this makes no sense since I don't ask for anything from the web. Might it still be using some private URL property to call out for something?
Here is some code:
Delegate Class XmlParser:
- (void)parseXmlString:(NSString *)xml parseError:(NSError **)error {
DEBUG_NSLog(@"XML Parser: Called with string: %@", xml);
NSData *xmlData = [xml dataUsingEncoding:NSASCIIStringEncoding];
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xmlData];
// Set self as the delegate of the parser so that it will receive the parser delegate methods callbacks.
if (parser != nil) {
[parser setDelegate:self];
[parser setShouldProcessNamespaces:NO];
[parser setShouldReportNamespacePrefixes:NO];
[parser setShouldResolveExternalEntities:NO];
[parser parse];
NSError *parseError = [parser parserError];
if (parseError && error) {
*error = parseError;
}
[parser release];
}
}
Called from:
XmlParser *parser = [[XmlParser alloc] init];
NSError *error = nil;
[parser parseXmlString:aString parseError:&error];
if (error) {
DEBUG_NSLog(@"ERROR FROM PARSER");
}
where aString is an NSString containing XML (note: without header).
Error callback that is called:
- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError {
NSString *errorString = [NSString stringWithFormat:@"Unable to download content from web site (Error code %ld )", (long)[parseError code]];
DEBUG_NSLog(@"XML Parser ERROR: %@", errorString);
[parser abortParsing];
}
When the code is run, the parseErrorOccurred hits immediately after [parser parse], and yes, I have implemented each of the didStartDocument, didEndDocument, etc.
Thanks!
UPDATE:
In debugging, it seems that the xmlData object I create is 0 bytes, even though the XML string I pass to dataUsingEncoding: has plenty of data. Is the encoding the issue?
One of the XML elements contains nested HTML. I'm thinking that the "s and &s could be a problem. Hopefully replacing " with \" will fix it.
Neither escaping the quotes nor replacing any &s with &amp; fixed the problem. Could there be something wrong with having a tag in the string?
Your error message is hiding the actual error. Your XML string appears to be invalid, as the error code is 5. See the other SO question "NSXMLparser errorcode 5".
Update
When creating your xmlData instance, use NSUTF8StringEncoding instead of NSASCIIStringEncoding.
If that still fails, post the actual string. Passing an empty data object to the parser is causing the error.
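The empty data object is consistent with how dataUsingEncoding: behaves: it returns nil when the string cannot be converted to the requested encoding without losing information, which is what happens with NSASCIIStringEncoding as soon as the string contains a single non-ASCII character. A minimal sketch of a guard follows; the helper name XMLDataFromString is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): converts the XML
// string to UTF-8 data and returns nil when there is nothing to parse,
// so the caller can report that instead of a misleading parser error.
static NSData *XMLDataFromString(NSString *xml) {
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    if (data.length == 0) {
        NSLog(@"No bytes to parse; check the input string");
        return nil;
    }
    return data;
}
```

Checking the result before calling [[NSXMLParser alloc] initWithData:] separates "the input was empty" from a real XML syntax error.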
I tried the above code with sample XML data and it works great. It looks like there is an issue with the XML data you pass to the function.
Check your XML data or share your xml input for further analysis...
You cannot use literal < and > characters in XML text. Replace them with:
< = &lt;
> = &gt;
When dealing with XML, the first parsing error is always fatal. If there is a parsing error, it's not valid XML.
You should encode the raw HTML into HTML entities. Having raw HTML (from a user or third-party source) zipping around in an app is considered a Bad Idea™.
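The escaping suggested here can be sketched as a small function. The name EscapeForXMLText is hypothetical, and the sketch only covers element text, not attribute values, which would also need their quotes escaped. The ampersand has to be replaced first, otherwise the entities produced by the other replacements would be escaped a second time.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): escapes the
// characters that are not allowed literally in XML element text.
static NSString *EscapeForXMLText(NSString *raw) {
    // "&" must be handled first so "&lt;" does not become "&amp;lt;".
    NSString *escaped = [raw stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    escaped = [escaped stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
    escaped = [escaped stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
    return escaped;
}
```

Apply it to the nested HTML before inserting that HTML into the XML string, not to the finished XML document, or the real tags will be escaped as well.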

XML answer problem

I tried to read the response data from the Google Weather API, but German umlauts aren't shown correctly. Instead of "ö" I get "^".
I think the problem is in these two lines of code:
CXMLElement *resultElement = [nodes objectAtIndex:0];
description = [[[[resultElement attributeForName:@"data"] stringValue] copy] autorelease];
How can I get the data out of resultElement without stringValue?
PS: I use TouchXML to parse xml
You must be using an NSURLConnection to get your data I suppose. When you receive the data you can convert it to an NSString using appropriate encoding. E.g.
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data{
if(xmlResponse == nil){
xmlResponse = [[NSMutableString alloc] initWithData:data encoding:NSISOLatin1StringEncoding];
}
else{
NSMutableString *temp = [[NSMutableString alloc] initWithData:data encoding:NSISOLatin1StringEncoding];
[xmlResponse appendString:temp];
[temp release];
}
}
Here xmlResponse is the NSMutableString that you can pass to your parser. I have used NSISOLatin1StringEncoding. You can try other encodings and see which one gives you the correct characters (NSUTF8StringEncoding should do it, I suppose). Check the API docs for a list of supported encodings.

NSXMLParser and entity references

What do I need to do to NSXMLParser so it handles entity characters? For example, if I have the element <anElement>Left &amp; Right</anElement>, I only get " Right" in the parser:foundCharacters: delegate method.
Thanks.
I threw together a really quick prototype application to test this out. What you are describing is not the behavior I'm seeing:
XML File:
<?xml version="1.0" encoding="UTF-8" ?>
<my_element>Left &amp; Right</my_element>
Implementation:
#import "XMLMeController.h"
@implementation XMLMeController
- (IBAction)parse:(id)sender
{
NSURL *url = [NSURL fileURLWithPath:@"/Users/robertwalker/Desktop/test.xml"];
NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:url];
[parser setDelegate:self];
[parser parse];
[parser release];
}
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
NSLog(@"Found: %@", string);
}
@end
Console output:
2008-11-11 20:41:47.805 XMLMe[10941:10b] Found: Left
2008-11-11 20:41:47.807 XMLMe[10941:10b] Found: &
2008-11-11 20:41:47.807 XMLMe[10941:10b] Found: Right
As you can see, the parser finds "Left", then "&", and then "Right" as three separate events that are sent to the delegate.
I can't really tell from your posting, but you need to make sure that the proper entity "&amp;" is used in the XML file rather than a bare "&" character, which of course is invalid in XML files.