NSXMLParser and entity references - iphone

What do I need to do to NSXMLParser so it handles entity characters? For example, if I have the element <anElement>Left &amp; Right</anElement>, I only get " Right" in the parser:foundCharacters: delegate method.
Thanks.

I threw together a really quick prototype application to test this out. What you are describing is not the behavior I'm seeing:
XML File:
<?xml version="1.0" encoding="UTF-8" ?>
<my_element>Left &amp; Right</my_element>
Implementation:
#import "XMLMeController.h"
@implementation XMLMeController
- (IBAction)parse:(id)sender
{
    NSURL *url = [NSURL fileURLWithPath:@"/Users/robertwalker/Desktop/test.xml"];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:url];
    [parser setDelegate:self];
    [parser parse];
    [parser release];
}
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    NSLog(@"Found: %@", string);
}
@end
Console output:
2008-11-11 20:41:47.805 XMLMe[10941:10b] Found: Left
2008-11-11 20:41:47.807 XMLMe[10941:10b] Found: &
2008-11-11 20:41:47.807 XMLMe[10941:10b] Found: Right
As you can see, the parser finds "Left", then "&", and then "Right" as three separate events that are sent to the delegate.
I can't really tell from your post, but make sure the XML file uses the proper entity "&amp;" rather than a bare "&" character, which is invalid in XML.
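Because the text arrives as several events, the delegate has to join the pieces itself. Here is a minimal sketch of that idea, assuming ARC and a single element of interest; the TextCollector class and the CollectedText function are hypothetical names, not part of the original project.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate (not from the original post) that joins the text
// chunks reported for one element into a single string.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation TextCollector
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict
{
    // Start a fresh buffer each time an element opens.
    self.text = [NSMutableString string];
}
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // May be called several times per element, for example around "&amp;".
    [self.text appendString:string];
}
@end

// Parses an XML string and returns the text of the most recently opened element.
static NSString *CollectedText(NSString *xml)
{
    TextCollector *collector = [[TextCollector alloc] init];
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = collector;
    [parser parse];
    return collector.text;
}
```

With this in place, CollectedText(@"<a>Left &amp; Right</a>") should hand back the whole string rather than only its last chunk.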

Related

NSXMLParser stops parsing after encountering special character

I am reading a XML file from google weather api and parsing it using NSXMLParser. The city in question is Paris. Here is a brief xml output I get
<?xml version="1.0"?>
<xml_api_reply version="1">
<weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" ><forecast_information>
<city data="Paris, Île-de-France"/>
<postal_code data="Paris"/>
<latitude_e6 data=""/>
<longitude_e6 data=""/>
...
...
Now the code I used to parse this XML is
NSString *address = @"http://www.google.com/ig/api?weather=Paris";
NSURL *URL = [NSURL URLWithString:address];
NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:URL];
[parser setDelegate:self];
[parser parse];
...
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
{
    NSLog(@"XML Parser 1 ... elementName ... %@", elementName);
}
This is output that I get for the above xml
XML Parser 1 ... elementName ... xml_api_reply
XML Parser 1 ... elementName ... weather
XML Parser 1 ... elementName ... forecast_information
The problem is that it parses all the tags until it reaches "city data", where the name Paris, Île-de-France contains a non-ASCII character, and then it just stops. It doesn't process the tags that follow, such as postal_code, latitude, longitude, etc.
So my question is: is there a way to remove all non-ASCII characters from the XML string returned by the URL?
I think I know what is happening; I just had the same problem.
Look at the foundCharacters method in your parser delegate.
I had something like this:
if (!currentElementValue) {
    currentElementValue = [[NSMutableString alloc] initWithString:string];
}
With that code, currentElementValue stopped accumulating text whenever special characters occurred, because the parser can deliver the text of one element in several foundCharacters calls and only the first chunk was kept.
My working code is now:
if (!currentElementValue) {
    currentElementValue = [[NSMutableString alloc] initWithString:string];
} else {
    [currentElementValue appendString:string];
}
Remember to set currentElementValue to nil at the end of your didEndElement method.
Ok. I have solved this problem. This is how I got it to work.
First, I get the XML with the special characters from the URL. Then I strip all the special characters out of the XML string. Then I convert the string to NSData and pass that NSData object to my NSXMLParser. Since it has no more special characters, NSXMLParser is happy.
Here's the code for anyone who may run across in future. Big thank you to everyone who contributed to this post!
NSString *address = @"http://www.google.com/ig/api?weather=Paris";
NSURL *URL = [NSURL URLWithString:address];
NSError *error = nil;
NSString *XML = [NSString stringWithContentsOfURL:URL encoding:NSASCIIStringEncoding error:&error];
// REMOVE ALL NON-ASCII CHARACTERS
// Build a string of the printable ASCII characters (codes 32 through 126).
NSMutableString *asciiCharacters = [NSMutableString string];
for (NSInteger i = 32; i < 127; i++)
{
    [asciiCharacters appendFormat:@"%c", (char)i];
}
// Everything outside that set is stripped from the XML.
NSCharacterSet *nonAsciiCharacterSet = [[NSCharacterSet characterSetWithCharactersInString:asciiCharacters] invertedSet];
XML = [[XML componentsSeparatedByCharactersInSet:nonAsciiCharacterSet] componentsJoinedByString:@""];
NSData *data = [XML dataUsingEncoding:NSUTF8StringEncoding];
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
[parser setDelegate:self];
[parser parse];
EDIT:
NSXMLParser is a horrible tool. I have successfully used RaptureXML in all my apps. It's super easy to use and avoids all this nonsense with non-ASCII characters. https://github.com/ZaBlanc/RaptureXML
The problem you're having is that Google's response uses a different encoding than the ASCII or UTF8 that you're expecting. Using the handy command line tool curl, it's easy to see that:
$ curl -I http://www.google.com/ig/api?weather=Paris
HTTP/1.1 200 OK
X-Frame-Options: SAMEORIGIN
Content-Type: text/xml; charset=ISO-8859-1
...
If you look up ISO-8859-1, you'll find that it's also known as the Latin-1 character set. One of the built-in encoding options is NSISOLatin1StringEncoding, so do this:
NSString *XML = [NSString stringWithContentsOfURL:URL encoding:NSISOLatin1StringEncoding error:&error];
Using the correct encoding will make it possible for NSString to figure out how to interpret the characters, and you'll get back usable data. Alternately, you may be able to modify your request to specify the character encoding that you want Google to provide. That might be preferable, so that you don't have to try to match the encoding you use to a specific request.
Edit: Up to this point, my answer focuses on just getting the response as a readable string. I see that your real question involves parsing with NSXMLParser, though. I think you have at least two options here:
1. Modify the XML that you receive to include the character encoding. The XML that you get back is Latin-1 encoded, but the XML declaration says just: <?xml version="1.0"?>. You could modify that to look like: <?xml version="1.0" encoding="ISO-8859-1"?>. I don't know if that would solve the problem with NSXMLParser, but it might.
2. As suggested above, request the character set that you want from Google. Adding an Accept-Charset header to the request should do the trick, though that'll make retrieving the data a little more complicated.
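A variation on the first option is to leave the declaration alone and re-encode the bytes instead, since an XML document without an encoding declaration is read as UTF-8 by default. This is only a sketch, assuming ARC and assuming the response really is Latin-1 as the Content-Type header reports; UTF8DataFromLatin1Data is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes Latin-1 bytes and re-encodes them as UTF-8,
// so that a prolog without an encoding declaration matches the actual bytes.
static NSData *UTF8DataFromLatin1Data(NSData *latin1Data)
{
    NSString *text = [[NSString alloc] initWithData:latin1Data
                                           encoding:NSISOLatin1StringEncoding];
    // Returns nil if the input could not be decoded.
    return [text dataUsingEncoding:NSUTF8StringEncoding];
}
```

The result can then be handed to [[NSXMLParser alloc] initWithData:] in place of the raw response, with no characters stripped.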
Stick with ISO-8859-1, so you don't need to "remove special characters". Use a different mechanism for getting the http data.
Use an NSURLConnection; it's far more flexible in the long run, and asynchronous.
// The method signature below is hypothetical; the original snippet showed only the body.
- (BOOL)startLoadingURLString:(NSString *)url
{
    NSMutableURLRequest *theRequest = [NSMutableURLRequest requestWithURL:[NSURL URLWithString:url]
                                                              cachePolicy:NSURLRequestUseProtocolCachePolicy
                                                          timeoutInterval:15.0];
    NSURLConnection *theConnection = [[NSURLConnection alloc] initWithRequest:theRequest delegate:self];
    if (theConnection) {
        // Create the NSMutableData to hold the received data.
        // receivedData is an instance variable declared elsewhere.
        receivedData = [[NSMutableData alloc] init];
        return YES;
    } else {
        // Inform the user that the connection failed.
        return NO;
    }
}
#pragma mark - Url connection data delegate
- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response {
[receivedData setLength:0];
}
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data {
[receivedData appendData:data];
}
- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error {
receivedData = nil;
[self badLoad];
}
- (void)connectionDidFinishLoading:(NSURLConnection *)connection {
//inform delegate of completion
[self.delegate fetchedData:receivedData];
receivedData = nil;
}

"Unable to download content from web site" while NSXMLParser initWithData

Currently I am trying to parse an XML string that I already have (no web calls needed). My app is a native iPhone app in Objective-C. I have set up an NSXMLParser delegate class which uses initWithData:xmlData. For some reason, the first and only callback on my delegate is parser:parseErrorOccurred: with the following text:
"Unable to download content from web site (Error code 5 )"
Obviously, this makes no sense since I don't ask for anything from the web. Might it still be using some private URL property to call out for something?
Here is some code:
Delegate Class XmlParser:
- (void)parseXmlString:(NSString *)xml parseError:(NSError **)error {
    DEBUG_NSLog(@"XML Parser: Called with string: %@", xml);
    NSData *xmlData = [xml dataUsingEncoding:NSASCIIStringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xmlData];
    // Set self as the delegate of the parser so that it will receive the parser delegate methods callbacks.
    if (parser != nil) {
        [parser setDelegate:self];
        [parser setShouldProcessNamespaces:NO];
        [parser setShouldReportNamespacePrefixes:NO];
        [parser setShouldResolveExternalEntities:NO];
        [parser parse];
        NSError *parseError = [parser parserError];
        if (parseError && error) {
            *error = parseError;
        }
        [parser release];
    }
}
Called from:
XmlParser *parser = [[XmlParser alloc] init];
NSError *error = nil;
[parser parseXmlString:aString parseError:&error];
if (error) {
    DEBUG_NSLog(@"ERROR FROM PARSER");
}
where aString is an NSString containing XML (note: without header).
Error callback that is called:
- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError {
    // [parseError code] is an NSInteger, so cast it for the format string.
    NSString *errorString = [NSString stringWithFormat:@"Unable to download content from web site (Error code %ld )", (long)[parseError code]];
    DEBUG_NSLog(@"XML Parser ERROR: %@", errorString);
    [parser abortParsing];
}
When the code is run, the parseErrorOccurred hits immediately after [parser parse], and yes, I have implemented each of the didStartDocument, didEndDocument, etc.
Thanks!
UPDATE:
In debugging, it seems that the xmlData object I create is 0 bytes, even though the XML string I pass to dataUsingEncoding: has plenty of data. Is the encoding the issue?
One of the XML elements contains nested HTML. I'm thinking that the "s and &s could be a problem. Hopefully escaping each " as \" will fix it.
Neither escaping the quotes nor replacing every & with &amp; fixed the problem. Could there be something wrong with having a tag in the string?
Your error message is hiding the actual error. Your xmlstring appears to be invalid as the error code is "Error code 5". See this other SO question. NSXMLparser errorcode 5
Update
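When creating your xmlData instance, use NSUTF8StringEncoding instead of NSASCIIStringEncoding. dataUsingEncoding: returns nil when the string contains characters the target encoding cannot represent, which would explain the 0-byte xmlData.
If that still fails, post the actual string. Passing an empty data object to the parser is what causes the error.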
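The encoding point can be checked in isolation. Below is a small sketch, assuming ARC; XMLDataFromString is a hypothetical helper that encodes as UTF-8 and refuses to return an empty buffer, so the failure shows up before the parser is involved.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: encodes the XML string as UTF-8 and reports failure
// instead of handing the parser an empty data object.
static NSData *XMLDataFromString(NSString *xml)
{
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    if (data.length == 0) {
        NSLog(@"Could not encode the XML string; there is nothing to parse");
        return nil;
    }
    return data;
}
```

For a string such as @"<name>caf\u00E9</name>", the ASCII conversion used in the question yields nil, while this helper returns usable data.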
I tried above code with a sample XML DATA - it works great. It look like there is some issue with XML data you pass to the function.
Check your XML data or share your xml input for further analysis...
You cannot use raw < and > characters in XML text. Replace them with:
< = &lt;
> = &gt;
When dealing with XML, the first parsing error is always fatal. If there is a parsing error, it's not valid XML.
You should encode the raw HTML into HTML entities. Having raw HTML (from a user or third party source) zipping around in an app is considered a Bad Idea™.
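One way to do that encoding is a small escaping function applied to the HTML before it is placed inside the XML element. This is a sketch assuming ARC and covering only the three characters discussed in this thread; EscapedXMLText is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: escapes text so it can be placed inside an XML element.
static NSString *EscapedXMLText(NSString *raw)
{
    NSMutableString *escaped = [NSMutableString stringWithString:raw];
    NSArray *pairs = @[ @[@"&", @"&amp;"], @[@"<", @"&lt;"], @[@">", @"&gt;"] ];
    // "&" is replaced first so the entities added afterwards are not escaped again.
    for (NSArray *pair in pairs) {
        [escaped replaceOccurrencesOfString:pair[0]
                                 withString:pair[1]
                                    options:NSLiteralSearch
                                      range:NSMakeRange(0, escaped.length)];
    }
    return escaped;
}
```

The parser turns the entities back into the original characters, so the delegate still receives the HTML as it was written.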

iPhone Development: Get images from RSS feed

I am using the NSXMLParser to get new RSS stories from a feed and am displaying them in a UITableView. However now I want to take ONLY the images, and display them in a UIScrollView/UIImageView (3 images side-by side). I am completely lost. I am using the following code to obtain 1 image from a URL.
NSURL *theUrl1=[NSURL URLWithString:@"http://farm3.static.flickr.com/2586/4072164719_0fa5695f59.jpg"];
JImage *photoImage1=[[JImage alloc] init];
[photoImage1 setContentMode:UIViewContentModeScaleAspectFill];
[photoImage1 setFrame:CGRectMake(0, 0, 320, 170)];
[photoImage1 initWithImageAtURL:theUrl1];
[imageView1 addSubview:photoImage1];
[photoImage1 release];
This is all I have accomplished, and it works, for one image, and I have to specify the exact URL. What would you recommend I do to accomplish this?
Further to my other answer, which uses some helper classes and kinda assumes you're storing stuff with Core Data, here's a pure NSXMLParser way to do it.
In this example I'm assuming you have three UIImageViews setup with tags (100,101,102) so we can access them. First off, the code that starts the parser:
// Set the URL with the images, and escape it for creating NSURL
NSString *rssURLString = @"http://feeds.gettyimages.com/channels/RecentEditorialEntertainment.rss";
NSString *escapedURL = [rssURLString stringByAddingPercentEscapesUsingEncoding:NSASCIIStringEncoding];
NSURL *rssURL = [NSURL URLWithString:escapedURL];
// rssParser is an NSXMLParser instance variable
if (rssParser) [rssParser release];
rssParser = [[NSXMLParser alloc] initWithContentsOfURL:rssURL];
[rssParser setDelegate:self];
success = [rssParser parse]; // return value not used
At this point the parsing starts, and NSXMLParser will fire off calls to its delegate methods as it finds different start and end elements in the XML.
In this example I am only writing the didStartElement method:
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    // look for an attribute called url
    if ([attributeDict objectForKey:@"url"]) {
        currentString = [attributeDict objectForKey:@"url"];
        NSLog(@"Image URL: %@", currentString);
        NSString* escapedURL = [currentString stringByAddingPercentEscapesUsingEncoding:NSASCIIStringEncoding];
        UIImage *image = [[UIImage alloc] initWithData:[NSData dataWithContentsOfURL:[NSURL URLWithString:escapedURL]]];
        UIImageView * tmpImageView = (UIImageView*)[scrollView viewWithTag:100+imageCount];
        [tmpImageView setImage:image];
        NSLog(@"images found: %d", imageCount);
        imageCount++;
        if (imageCount>2) [rssParser abortParsing];
    }
}
Here we check whether attributeDict (an NSDictionary object) contains a url attribute. If so, we grab it into currentString and then escape it, just in case it has characters that NSURL will barf on. Then we create an image from that URL and set the appropriate UIImageView image based on the tag numbers. imageCount is a counter; once we've done three images, we tell the NSXMLParser to abort parsing the XML.
If your XML puts the URL inside element tags like:
<image>http://example.com/image.jpg</image>
You'll need to do a bit more work with didEndElement and foundCharacters. See the quite excellent Introduction to Event-Driven XML Programming Guide for Cocoa.
I knocked together a quick and dirty app to demo this, you can grab it here.
It sounds like you first need to identify the XML tag that holds the images in your XML document. You should be able to do this by typing whatever API call you're using into a browser address bar.
Once you've done that, you can build an array of image URLs in the NSXMLParser delegate method that receives new data.
Once you have the array of image URLs, you can do something similar to what you are doing above, except that you would use NSURL *theUrl1=[myArray objectAtIndex:...
You can arrange the images just by changing their center location: image.center = CGPointMake(160,240)..
Hope that helps. There are Apple docs for NSXMLParser.
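The array-building step might look roughly like this. It is a sketch assuming ARC and assuming the feed stores each image address in a url attribute, as in the Getty feed used in the other answer; ImageURLCollector and ImageURLsInFeed are hypothetical names.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: gathers every "url" attribute it sees into an array.
@interface ImageURLCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *imageURLs;
@end

@implementation ImageURLCollector
- (instancetype)init
{
    if ((self = [super init])) {
        _imageURLs = [NSMutableArray array];
    }
    return self;
}
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict
{
    // Elements without a url attribute are ignored.
    NSString *urlString = attributeDict[@"url"];
    if (urlString) {
        [self.imageURLs addObject:urlString];
    }
}
@end

// Parses feed data and returns the image URL strings in document order.
static NSArray *ImageURLsInFeed(NSData *feedData)
{
    ImageURLCollector *collector = [[ImageURLCollector alloc] init];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:feedData];
    parser.delegate = collector;
    [parser parse];
    return collector.imageURLs;
}
```

Each entry can then be turned into an NSURL and loaded into the next image view, instead of hard-coding a single address.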
You can also try a dictionary while fetching data from the API call. First identify the XML tag that holds the images in the XML document; then store each image in a dictionary with its corresponding story as the key. This ensures that each story is displayed only with its associated image. You can then use this information later in your application as requirements vary.
NSMutableDictionary *mutDict = [[NSMutableDictionary alloc] init];
if ([elementName isEqualToString:@"story_image"])
{
    [mutDict setObject:currentImage forKey:currentStory];
}
I suggest using JSON instead of XML, as it is a lightweight data-interchange format. It would also save a lot of formatting effort.
You can visit this page:
http://code.google.com/p/json-frame
It is definitely going to help you.
You just have to download the framework and then use it in your application.
To get the live data, you do the same thing as in XML parsing:
NSString *jsonString = [NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:nil];
NSDictionary *JDict = [jsonString JSONValue];
or
NSArray * JArr = [jsonString JSONValue];
depending upon what your data-feed contains.
I've been using NSXMLParser myself and storing the results using CoreData. I use a version of Björn Sållarp's Parser class from his CoreData example code.
My images end up as NSData/Binary in a SQLite database, but they might just as well get put into an array of UIImage for immediate display.
Extract from Parser.m: (from Björn Sållarp)
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"imagetag"])
    {
        UIImage *newImage = [[UIImage alloc] initWithData:[NSData dataWithContentsOfURL:[NSURL URLWithString:currentString]]];
        NSData *imageData = UIImagePNGRepresentation(newImage);
        [currentSearchResult setImage:imageData];
        currentString = nil;
        return;
    }
Called from my view with:
NSString *searchURL = @"http://www.feedurl.com/feed/address/feed.xml";
NSURL *xmlURL = [NSURL URLWithString:searchURL];
Parser *xmlParse = [[Parser alloc] initWithContext:managedObjectContext];
[xmlParse parseXMLFileAtURL:xmlURL parseError:&parseError];
That code assumes your XML document contains image URLs with the tag format:
<imagetag>http://example.com/path/to/image.png</imagetag>
Even if you're not using CoreData, working through the example at that link would be instructional for processing XML with NSXMLParser.

nsxmlparser not solving &apos;

I'm using NSXMLParser to dissect an XML package, and I'm receiving &apos; inside the package text.
I have the following defined for the xmlParser:
[xmlParser setShouldResolveExternalEntities: YES];
The following method is never called
- (void)parser:(NSXMLParser *)parser foundExternalEntityDeclarationWithName:(NSString *)entityName publicID:(NSString *)publicID systemID:(NSString *)systemID
The text in the field before the &apos; is not reported by the parser.
I'm looking for a way to solve this. Any ideas?
Thanks in advance
Alex
XML package portion attached:
<?xml version="1.0" encoding="ISO-8859-1"?><SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:tns="urn:appwsdl"><SOAP-ENV:Body><ns1:getObjects2Response xmlns:ns1="http://schemas.xmlsoap.org/soap/envelope/"><return xsi:type="tns:objectsResult"><totalRecipes xsi:type="xsd:string">1574</totalObjects><Objects xsi:type="tns:Item"><id xsi:type="xsd:string">4311</id><name xsi:type="xsd:string"> item title 1 </name><procedure xsi:type="xsd:string">item procedure 11......
Here is what I did, after referring a different answer from here.
I replaced all occurrences of &amp;apos; in the XML with "'" when the data is received from the NSURLConnection object. Then I give that data to the parser.
So what I do is:
NSData *parserData = [self resolveHTMLEntities:self.receivedData];
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:parserData];
Here is the body of the resolveHTMLEntities: method:
NSString *xmlCode = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
NSMutableString *temp = [NSMutableString stringWithString:xmlCode];
// Replace all the entities
[temp replaceOccurrencesOfString:@"&amp;apos;" withString:@"'" options:NSLiteralSearch range:NSMakeRange(0, [temp length])];
NSData *finalData = [temp dataUsingEncoding:NSUTF8StringEncoding];
return finalData;
The catch is that &apos; gets converted to &amp;apos; along the way, which is why we need to replace that occurrence.
Note: no memory management is performed in the above block of code.
Hope this helps.
XML predefines five entities: &lt;, &gt;, &amp;, &quot;, and &apos;, so the parser should resolve &apos; on its own, without any external entity handling. Is the ampersand itself escaped in your XML, so that the raw data actually contains &amp;apos;?
(BTW, would be nice to see a small segment of the XML including the header.)

How to parse xml file that is encoded with another encoder different from utf-8 with using NSXMLParser in objective-C?

Is there a way to parse an XML file that is encoded with windows-1254 using NSXMLParser? When I try, the didStartElement method is not called.
Code is
NSXMLParser *xmlParser = [[NSXMLParser alloc] initWithData:webData];
XMLParser *parser = [[XMLParser alloc] initXMLParser: objectList];
[parser setReqType:reqType];
[xmlParser setDelegate:parser];
[xmlParser parse];
XML is
<?xml version="1.0" encoding="windows-1254" ?>
<CANLIMACLAR>
<CANLIMACLARROWS>
<TARIH>19/10/2009 21:15</TARIH>
<TAKIM1>Union Berlin</TAKIM1>
<TAKIM2>Fürth</TAKIM2>
<SONUC1>1</SONUC1>
<SONUC2>2</SONUC2>
<DK_DURUM>Maç Sonu</DK_DURUM> ...
</CANLIMACLARROWS> ...
</CANLIMACLAR>
Don't forget that if parse returns NO, you can check parserError to see what happened:
if ([parser parse] == NO)
{
    NSError *error = [parser parserError];
    NSString *readableMessage = [error localizedDescription];
    NSLog(@"Error occurred: %@\n", readableMessage);
}
I removed the <?xml version="1.0" encoding="windows-1254" ?> part from the XML data before sending it to xmlParser, and the problem went away.
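Removing the declaration makes the parser treat the bytes as UTF-8, which can still go wrong for characters such as ü or ç. An alternative is to re-encode the data and rewrite the declaration to match. This is a sketch assuming ARC and assuming the declaration is spelled exactly as in the question; UTF8XMLDataFromCP1254Data is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: converts windows-1254 XML into UTF-8 XML.
static NSData *UTF8XMLDataFromCP1254Data(NSData *cp1254Data)
{
    NSString *text = [[NSString alloc] initWithData:cp1254Data
                                           encoding:NSWindowsCP1254StringEncoding];
    // Keep the declaration truthful after re-encoding.
    text = [text stringByReplacingOccurrencesOfString:@"encoding=\"windows-1254\""
                                           withString:@"encoding=\"UTF-8\""];
    // Returns nil if the input could not be decoded as windows-1254.
    return [text dataUsingEncoding:NSUTF8StringEncoding];
}
```

In the question's code, webData would be passed through this function before being given to initWithData:.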