How to parse an XML file encoded in something other than UTF-8 using NSXMLParser in Objective-C? - iphone

Is there a way to parse an XML file that is encoded with windows-1254 using NSXMLParser? When I try, the didStartElement method is not called.
Code is
NSXMLParser *xmlParser = [[NSXMLParser alloc] initWithData:webData];
XMLParser *parser = [[XMLParser alloc] initXMLParser: objectList];
[parser setReqType:reqType];
[xmlParser setDelegate:parser];
[xmlParser parse];
XML is
<?xml version="1.0" encoding="windows-1254" ?>
<CANLIMACLAR>
<CANLIMACLARROWS>
<TARIH>19/10/2009 21:15</TARIH>
<TAKIM1>Union Berlin</TAKIM1>
<TAKIM2>Fürth</TAKIM2>
<SONUC1>1</SONUC1>
<SONUC2>2</SONUC2>
<DK_DURUM>Maç Sonu</DK_DURUM> ...
</CANLIMACLARROWS> ...
</CANLIMACLAR>

Don't forget that if parse returns NO, you can check parserError to see what happened:
if ([parser parse] == NO)
{
NSError *error = [parser parserError];
NSString *readableMessage = [error localizedDescription];
NSLog(@"Error occurred: %@\n", readableMessage);
}

I removed the <?xml version="1.0" encoding="windows-1254" ?> part from the XML data before sending it to xmlParser, and the problem went away.
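An alternative to deleting the declaration is to transcode the document before parsing. The following is only a sketch, assuming ARC and assuming the bytes really are windows-1254. TranscodeXMLToUTF8 is a hypothetical helper name, not part of Foundation. It decodes the data with the source encoding, rewrites the declared encoding to UTF-8, and re-encodes, so characters such as ü and ç should survive.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post).
// Decodes `raw` using `sourceEncoding`, rewrites encoding="<declaredName>"
// to encoding="UTF-8", and returns the document re-encoded as UTF-8.
// Returns nil if the bytes cannot be decoded with `sourceEncoding`.
static NSData *TranscodeXMLToUTF8(NSData *raw,
                                  NSStringEncoding sourceEncoding,
                                  NSString *declaredName) {
    NSString *text = [[NSString alloc] initWithData:raw encoding:sourceEncoding];
    if (text == nil) {
        return nil;
    }
    NSString *oldAttribute =
        [NSString stringWithFormat:@"encoding=\"%@\"", declaredName];
    NSString *fixed =
        [text stringByReplacingOccurrencesOfString:oldAttribute
                                        withString:@"encoding=\"UTF-8\""
                                           options:NSCaseInsensitiveSearch
                                             range:NSMakeRange(0, text.length)];
    return [fixed dataUsingEncoding:NSUTF8StringEncoding];
}
```

The result could then be passed to initWithData: in place of webData, for example TranscodeXMLToUTF8(webData, NSWindowsCP1254StringEncoding, @"windows-1254").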

Related

NSXMLParser stops parsing after encountering special character

I am reading a XML file from google weather api and parsing it using NSXMLParser. The city in question is Paris. Here is a brief xml output I get
<?xml version="1.0"?>
<xml_api_reply version="1">
<weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" ><forecast_information>
<city data="Paris, Île-de-France"/>
<postal_code data="Paris"/>
<latitude_e6 data=""/>
<longitude_e6 data=""/>
...
...
Now the code I used to parse this XML is
NSString *address = @"http://www.google.com/ig/api?weather=Paris";
NSURL *URL = [NSURL URLWithString:address];
NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:URL];
[parser setDelegate:self];
[parser parse];
...
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
{
NSLog(@"XML Parser 1 ... elementName ... %@", elementName);
}
This is output that I get for the above xml
XML Parser 1 ... elementName ... xml_api_reply
XML Parser 1 ... elementName ... weather
XML Parser 1 ... elementName ... forecast_information
The problem is that it parses all the tags until it reaches "city data", where the name Paris, Île-de-France contains a non-ASCII character, and then it just stops. It doesn't process the tags that follow, such as postal_code, latitude, longitude, etc.
So my question is, is there a way I can remove all non-ascii characters from the returned URL XML string?
I know what could be happening, i just had the same problem...
Look at your foundCharacters method at your parser...
I had something like this:
if (!currentElementValue) {
currentElementValue = [[NSMutableString alloc] initWithString:string];
}
and currentElementValue just stopped being updated when special characters appeared.
Now my working code is:
if (!currentElementValue) {
currentElementValue = [[NSMutableString alloc] initWithString:string];
} else {
[currentElementValue appendString:string];
}
Remember to set currentElementValue to nil at the end of your didEndElement method
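Put together, the accumulate-then-reset pattern described above might look like the sketch below. It assumes ARC. TextCollector and CollectElementText are hypothetical names introduced only for illustration. The parser may deliver the text of one element in several foundCharacters: calls, so the delegate appends every chunk and only stores the result in didEndElement.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: maps each element name to its accumulated text.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableDictionary<NSString *, NSString *> *values;
@property (nonatomic, strong) NSMutableString *currentElementValue;
@end

@implementation TextCollector
- (instancetype)init {
    if ((self = [super init])) {
        _values = [NSMutableDictionary dictionary];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    // Discard whitespace collected between elements.
    self.currentElementValue = nil;
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // Text can arrive in pieces, so append instead of overwriting.
    if (!self.currentElementValue) {
        self.currentElementValue = [[NSMutableString alloc] initWithString:string];
    } else {
        [self.currentElementValue appendString:string];
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (self.currentElementValue) {
        self.values[elementName] = [self.currentElementValue copy];
    }
    // Reset so the next element starts with an empty buffer.
    self.currentElementValue = nil;
}
@end

// Parses `xml` and returns element name -> text, or nil on a parse error.
static NSDictionary<NSString *, NSString *> *CollectElementText(NSData *xml) {
    TextCollector *collector = [[TextCollector alloc] init];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xml];
    parser.delegate = collector;
    return [parser parse] ? [collector.values copy] : nil;
}
```

This sketch keeps only the last value seen for each element name, which is enough to show the pattern but not for documents with repeated elements.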
Ok. I have solved this problem. This is how I got it to work.
First I get the XML, special characters included, from the URL. Then I strip all the special characters out of the XML string, convert the string to NSData, and pass that NSData object to my NSXMLParser. Since there are no special characters left, NSXMLParser is happy.
Here's the code for anyone who may run across this in the future. Big thank you to everyone who contributed to this post!
NSString *address = @"http://www.google.com/ig/api?weather=Paris";
NSURL *URL = [NSURL URLWithString:address];
NSError *error;
NSString *XML = [NSString stringWithContentsOfURL:URL encoding:NSASCIIStringEncoding error:&error];
//REMOVE ALL NON-ASCII CHARACTERS
NSMutableString *asciiCharacters = [NSMutableString string];
for (NSInteger i = 32; i < 127; i++)
{
[asciiCharacters appendFormat:@"%c", (char)i];
}
NSCharacterSet *nonAsciiCharacterSet = [[NSCharacterSet characterSetWithCharactersInString:asciiCharacters] invertedSet];
XML = [[XML componentsSeparatedByCharactersInSet:nonAsciiCharacterSet] componentsJoinedByString:@""];
NSData *data = [XML dataUsingEncoding:NSUTF8StringEncoding];
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
[parser setDelegate:self];
[parser parse];
EDIT:
NSXMLParser is a horrible tool. I have successfully used RaptureXML in all my apps. It's super easy to use and avoids all this nonsense with non-ASCII characters. https://github.com/ZaBlanc/RaptureXML
The problem you're having is that Google's response uses a different encoding than the ASCII or UTF8 that you're expecting. Using the handy command line tool curl, it's easy to see that:
$ curl -I http://www.google.com/ig/api?weather=Paris
HTTP/1.1 200 OK
X-Frame-Options: SAMEORIGIN
Content-Type: text/xml; charset=ISO-8859-1
...
If you look up ISO-8859-1, you'll find that it's also known as the Latin-1 character set. One of the built-in encoding options is NSISOLatin1StringEncoding, so do this:
NSString *XML = [NSString stringWithContentsOfURL:URL encoding:NSISOLatin1StringEncoding error:&error];
Using the correct encoding will make it possible for NSString to figure out how to interpret the characters, and you'll get back usable data. Alternately, you may be able to modify your request to specify the character encoding that you want Google to provide. That might be preferable, so that you don't have to try to match the encoding you use to a specific request.
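Rather than hard-coding the encoding, the charset name reported by the server can be mapped to an NSStringEncoding. The sketch below assumes ARC, and assumes the name comes from something like the textEncodingName property of NSURLResponse. EncodingForCharsetName and DecodeBody are hypothetical helper names. The two Core Foundation conversion functions are real API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: maps an IANA charset name such as "ISO-8859-1"
// to an NSStringEncoding. Returns `fallback` when the name is nil or
// not recognised by Core Foundation.
static NSStringEncoding EncodingForCharsetName(NSString *name,
                                               NSStringEncoding fallback) {
    if (name == nil) {
        return fallback;
    }
    CFStringEncoding cfEncoding =
        CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)name);
    if (cfEncoding == kCFStringEncodingInvalidId) {
        return fallback;
    }
    return CFStringConvertEncodingToNSStringEncoding(cfEncoding);
}

// Hypothetical helper: decodes a response body using the named charset,
// falling back to UTF-8 when the charset is missing or unknown.
static NSString *DecodeBody(NSData *body, NSString *charsetName) {
    NSStringEncoding encoding =
        EncodingForCharsetName(charsetName, NSUTF8StringEncoding);
    return [[NSString alloc] initWithData:body encoding:encoding];
}
```

For the response shown above, DecodeBody(data, @"ISO-8859-1") should decode the same way as passing NSISOLatin1StringEncoding directly.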
Edit: Up to this point, my answer focuses on just getting the response as a readable string. I see that your real question involves parsing with NSXMLParser, though. I think you have at least two options here:
Modify the XML that you receive to include the character encoding. The XML that you get back is Latin-1 encoded, but the XML tag says just: <?xml version="1.0"?>. You could modify that to look like: <?xml version="1.0" encoding="ISO-8859-1"?>. I don't know if that would solve the problem with NSXMLParser, but it might.
As suggested above, request the character set that you want from Google. Adding an Accept-Charset header to the request should do the trick, though that'll make retrieving the data a little more complicated.
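A minimal sketch of the second option, assuming ARC. RequestPreferringUTF8 is a hypothetical helper name. The header only states a preference, so the server may ignore it and the charset of the response still needs to be checked.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a request that asks the server for UTF-8.
// Servers are free to ignore Accept-Charset.
static NSMutableURLRequest *RequestPreferringUTF8(NSURL *url) {
    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
    [request setValue:@"utf-8" forHTTPHeaderField:@"Accept-Charset"];
    return request;
}
```

The request would then be sent with whichever connection mechanism the app already uses, and the returned data handed to initWithData:.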
Stick with ISO-8859-1, so you don't need to "remove special characters". Use a different mechanism for getting the HTTP data.
Use an NSURLConnection; it's far more flexible in the long run, and it's asynchronous.
NSMutableURLRequest *theRequest = [NSMutableURLRequest requestWithURL:[NSURL URLWithString:url]
cachePolicy:NSURLRequestUseProtocolCachePolicy
timeoutInterval:15.0];
NSURLConnection *theConnection = [[NSURLConnection alloc] initWithRequest:theRequest delegate:self];
if (theConnection) {
// Create the NSMutableData to hold the received data.
// receivedData is an instance variable declared elsewhere.
receivedData = [[NSMutableData alloc] init];
return YES;
} else {
// Inform the user that the connection failed.
return NO;
}
}
#pragma mark - Url connection data delegate
- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response {
[receivedData setLength:0];
}
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data {
[receivedData appendData:data];
}
- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error {
receivedData = nil;
[self badLoad];
}
- (void)connectionDidFinishLoading:(NSURLConnection *)connection {
//inform delegate of completion
[self.delegate fetchedData:receivedData];
receivedData = nil;
}

How to fix issue with unrecognized special character on iPad/iPhone?

I am working on an app that gets a bunch of descriptions through XML and, after parsing, puts them on the screen. In some of the descriptions, apostrophes are turning into question marks. They are apostrophes in the XML, on the output screen, and in the database I get them from, but they show up in the app as question marks. This doesn't always happen, but it happens with the same descriptions every time. Here's an example:
This is what is in the xml/database
But it won't be easy.
After a tour of the house, you'll be
This is what shows up on the app:
But it won?t be easy.
After a tour of the house, you?ll be
I'm pretty sure that the problem is that the iPad/iPhone doesn't recognize the character that it is receiving, but I have no idea how I would go about fixing it. Here is my parser code. I believe the XML is being sent with UTF-8 encoding.
[whereToGetXML appendString:xmlID];
NSURL *URL=[[NSURL alloc] initWithString:whereToGetXML];
NSData *dataFromServer = [[NSData alloc] initWithContentsOfURL:URL];
NSLog(@"where to get xml is:%@", whereToGetXML);
// Do any additional setup after loading the view from its nib.
NSData *dataWithoutStringTag = [[NSData alloc]init];
NSXMLParser *firstParser = [[NSXMLParser alloc] initWithData: dataFromServer];
[firstParser setDelegate:self];
[firstParser parse];
//Convert the returned parsed string to data using dataUsingEncoding method. HAVE TO HAVE allowLossyConversion:YES or else it will crash on half of the states.
dataWithoutStringTag =[tempParserString dataUsingEncoding: NSASCIIStringEncoding allowLossyConversion:YES];
NSString *htmlCode = [[NSString alloc] initWithData:dataWithoutStringTag encoding:NSASCIIStringEncoding];
NSMutableString *temp = [NSMutableString stringWithString:htmlCode];
NSData *finalData = [temp dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSXMLParser* parser = [[NSXMLParser alloc] initWithData:finalData];
// NSLog(@"Data is: %@", dataWithoutStringTag);
//NSXMLParser* parser = [[NSXMLParser alloc] initWithData: dataFromServer];
[parser setDelegate: self];
//[parser setShouldResolveExternalEntities:YES];
[parser parse];
[parser release];
It's presumably because the string isn't escaped.
Try adding a backslash before each quotation mark.
\'message\'
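Another possible cause, offered here only as an assumption: if the affected descriptions contain typographic apostrophes (U+2019) rather than the ASCII ', the lossy conversion to NSASCIIStringEncoding in the question's code cannot represent them. The sketch below assumes ARC. IsPureASCII and DataPreservingText are hypothetical helper names. It checks whether a string fits in ASCII and, either way, produces UTF-8 data that keeps every character.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES only if every character fits in 7-bit ASCII,
// that is, if converting to NSASCIIStringEncoding would lose nothing.
static BOOL IsPureASCII(NSString *text) {
    return [text canBeConvertedToEncoding:NSASCIIStringEncoding];
}

// Hypothetical helper: encodes text for a second parsing pass without
// discarding or substituting any characters.
static NSData *DataPreservingText(NSString *text) {
    return [text dataUsingEncoding:NSUTF8StringEncoding];
}
```

Under that assumption, replacing the NSASCIIStringEncoding conversions in the question with DataPreservingText(tempParserString) would avoid the substitution, and logging IsPureASCII(tempParserString) would show whether the assumption holds for a given description.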

"Unable to download content from web site" while NSXMLParser initWithData

Currently I am trying to parse an xml string that I already have (no web calls needed). My app is native iPhone in Objective-C. I have set up an NSXMLParser delegate class which uses initWithData:xmlData. For some reason, the first and only callback on my delegate is to parser: parseErrorOccurred with the following text:
"Unable to download content from web site (Error code 5 )"
Obviously, this makes no sense since I don't ask for anything from the web. Might it still be using some private URL property to call out for something?
Here is some code:
Delegate Class XmlParser:
- (void)parseXmlString:(NSString *)xml parseError:(NSError **)error {
DEBUG_NSLog(@"XML Parser: Called with string: %@", xml);
NSData *xmlData = [xml dataUsingEncoding:NSASCIIStringEncoding];
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xmlData];
// Set self as the delegate of the parser so that it will receive the parser delegate methods callbacks.
if (parser != nil) {
[parser setDelegate:self];
[parser setShouldProcessNamespaces:NO];
[parser setShouldReportNamespacePrefixes:NO];
[parser setShouldResolveExternalEntities:NO];
[parser parse];
NSError *parseError = [parser parserError];
if (parseError && error) {
*error = parseError;
}
[parser release];
}
}
Called from:
XmlParser *parser = [[XmlParser alloc] init];
NSError *error = nil;
[parser parseXmlString:aString parseError:&error];
if (error) {
DEBUG_NSLog(@"ERROR FROM PARSER");
}
where aString is an NSString containing XML (note: without header).
Error callback that is called:
- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError {
NSString *errorString = [NSString stringWithFormat:@"Unable to download content from web site (Error code %ld )", (long)[parseError code]];
DEBUG_NSLog(@"XML Parser ERROR: %@", errorString);
[parser abortParsing];
}
When the code is run, the parseErrorOccurred hits immediately after [parser parse], and yes, I have implemented each of the didStartDocument, didEndDocument, etc.
Thanks!
UPDATE:
In debugging it seems that the xmlData object that I create is 0 bytes, even though the xml string I pass in to dataUsingEncoding has plenty of data. Is the encoding the issue?
One of the XML elements contains nested HTML. I'm thinking that the "s and &s could be a problem. Hopefully replacing each " with \" will fix it.
Neither escaping the quotes nor replacing any &s with &amp; fixed the problem. Could there be something wrong with having a tag in the string?
Your error message is hiding the actual error. Your xmlstring appears to be invalid as the error code is "Error code 5". See this other SO question. NSXMLparser errorcode 5
Update
When creating your xmlData instance, use NSUTF8StringEncoding instead of NSASCIIStringEncoding.
If that still fails, post the actual string. Passing an empty data object to the parser is causing the error.
I tried the above code with sample XML data and it works great. It looks like there is some issue with the XML data you pass to the function.
Check your XML data or share your XML input for further analysis...
You cannot use unescaped < and > characters in XML text. Replace them with:
< = &lt;
> = &gt;
When dealing with XML, the first parsing error is always fatal. If there is a parsing error, it's not valid XML.
You should encode the raw HTML into HTML entities. Having raw HTML (from a user or third party source) zipping around in an app is considered a Bad Idea™.
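A minimal sketch of that escaping step, assuming ARC. EscapeForXML is a hypothetical helper name. The ampersand has to be replaced first, otherwise the ampersands introduced by the other replacements would be escaped a second time.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: escapes the characters that are significant in
// XML markup so that raw HTML can be embedded as element text.
static NSString *EscapeForXML(NSString *raw) {
    // "&" must go first so the entities added below are left intact.
    NSString *escaped = [raw stringByReplacingOccurrencesOfString:@"&"
                                                       withString:@"&amp;"];
    escaped = [escaped stringByReplacingOccurrencesOfString:@"<"
                                                 withString:@"&lt;"];
    escaped = [escaped stringByReplacingOccurrencesOfString:@">"
                                                 withString:@"&gt;"];
    escaped = [escaped stringByReplacingOccurrencesOfString:@"\""
                                                 withString:@"&quot;"];
    return escaped;
}
```

The escaped string can be placed inside an element before the whole document is converted to data with NSUTF8StringEncoding. NSXMLParser turns the entities back into the original characters when it reports the text.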

How to log nsxmlparser

I'm parsing XML from a URL with
rssParser = [[NSXMLParser alloc] initWithContentsOfURL:xmlURL];
[rssParser parse];
How can I NSLog it so that I can see the XML in the console? If I use
NSLog(@"%@", rssParser);
I am shown 'XMLParser x 4d562' in the console.
You should set parser's delegate and process retrieved xml data in its methods. See NSXMLParserDelegate protocol reference.
You can't. The NSXMLParser class never loads the entire contents of the XML stream in memory at once (that's why it's an "NSXMLParser" and not an "NSXMLDocument"). You should download the data from the URL, and use it to instantiate your RSS parser, and also to create an NSString that you log instead:
NSData *data = [NSData dataWithContentsOfURL:xmlURL];
rssParser = [[NSXMLParser alloc] initWithData:data];
NSString *string = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
NSLog(@"data: %@", string);
[string release];
Please pay attention to the fact that the "initWithContentsOfURL:" methods are synchronous, and will block your UI thread until the data has downloaded. You might want to use ASIHTTPRequest or the NSURLConnection mechanism instead, with asynchronous connections.

NSXMLParser and entity references

What do I need to do to NSXMLParser so it handles entity characters? For example, if I have the following element <anElement>Left & Right</anElement> I am only getting " Right" in the parser:foundCharacters: delegate method.
Thanks.
I threw together a really quick prototype application to test this out. What you are describing is not the behavior I'm seeing:
XML File:
<?xml version="1.0" encoding="UTF-8" ?>
<my_element>Left &amp; Right</my_element>
Implementation:
#import "XMLMeController.h"
@implementation XMLMeController
- (IBAction)parse:(id)sender
{
NSURL *url = [NSURL fileURLWithPath:@"/Users/robertwalker/Desktop/test.xml"];
NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:url];
[parser setDelegate:self];
[parser parse];
[parser release];
}
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
NSLog(@"Found: %@", string);
}
@end
Console output:
2008-11-11 20:41:47.805 XMLMe[10941:10b] Found: Left
2008-11-11 20:41:47.807 XMLMe[10941:10b] Found: &
2008-11-11 20:41:47.807 XMLMe[10941:10b] Found: Right
As you can see the parser is finding the "Left" then the "&" and then "Right" as three separate events that are sent to the delegate.
I can't really tell from your posting, but you need to make sure that the proper entity "&amp;" is used in the XML file rather than just the "&" character, which of course is invalid in XML files.