Parsing malformed HTML with NSXMLParser on iOS

This is my method body for parsing "img src" image links from poorly formed HTML generated by an RSS feed. I am aware that NSXMLParser only parses XML, but I am hoping it can stumble through the mess and find these minuscule image links.
I'm trying to retrieve ONLY the FIRST image link, that is, the src attribute of the first img element in nsData, and then save it to an NSString *img in another class. The img tags are not all the same; for instance, an instance of nsData will contain only one image, like any one of these:
< img class="ms-rteStyle-photoCredit" src="www.imagelinkthatineed.com" stuff I don't need
< img alt="" src="www.imagelinkineedfortableimagecellpreview" stuff I don't need
< img class="ms-rteStyle-photoCredit" src="www.IneedThisLink.com" more stuff I don't need
The only class that seems to generate NSLog output is the first one.
How can I get the parser methods to actually run?
Given that there's a way, is there a different, simpler way you recommend?
#import "HtmlParser.h"
#import "ArticleItem.h"
@implementation HtmlParser
@synthesize elementArray;
- (HtmlParser *) InitHtmlByString:(NSString *)string {
// NSString *description = [NSString string];
NSData *nsData = [[NSData alloc] initWithContentsOfFile:(NSString *)string];
elementArray = [[NSMutableArray alloc] init];
parser = [[NSXMLParser alloc] initWithData:nsData];
parser.delegate = self;
[parser parse];
// If I call NSLog(@"%@", nsData); in this method body, the output is the raw HTML.
currentHTMLElement = [ArticleItem alloc];
return self;
}
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
{
if ([elementName isEqualToString:@"img src"]) {
currentHTMLElement = [[ArticleItem alloc] init];
}
NSLog(@"\t%@ found a %@ element", self, elementName);
}
- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
if (!currentHTMLElement)
currentHTMLElement = [[NSMutableString alloc] initWithString:string];
NSLog(@"Processing Value: %@", currentHTMLElement);
}
- (void) parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
if ([elementName isEqualToString:@"img src"])
{
currentHTMLElement.img = elementName;
[elementArray addObject:currentHTMLElement];
currentHTMLElement = nil;
currentNodeContent = nil;
}
else
{
if (currentHTMLElement != nil && elementName != nil && ([elementName isEqualToString:@"img src"]))
{
[currentHTMLElement setValue:currentHTMLElement forKey:elementName];
}
}
currentHTMLElement = nil;
}
@end
Thank you for your thoughts.

Given that HTML is generally not well-formed XML, NSXMLParser might not work. If you want to parse HTML, you might refer to this Ray Wenderlich article, How to Parse HTML on iOS. If you've followed those instructions and have added Hpple to your project, you can then retrieve the image src attributes like so:
#import "TFHpple.h"
- (void)retrieveImageSourceTagsViaHpple:(NSURL *)url
{
NSData *data = [NSData dataWithContentsOfURL:url];
TFHpple *parser = [TFHpple hppleWithHTMLData:data];
NSString *xpathQueryString = @"//img";
NSArray *nodes = [parser searchWithXPathQuery:xpathQueryString];
for (TFHppleElement *element in nodes)
{
NSString *src = [element objectForKey:@"src"];
NSLog(@"img src: %@", src);
}
}
Alternatively, and I say this bracing myself for the onslaught of anti-NSRegularExpression responses (in the vein of my all-time favorite Stack Overflow answer), if you want a list of img tags in an html file, you can use the following somewhat complicated regular expression:
- (void)retrieveImageSourceTagsViaRegex:(NSURL *)url
{
NSString *string = [NSString stringWithContentsOfURL:url
encoding:NSUTF8StringEncoding
error:nil];
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<img\\s[\\s\\S]*?src\\s*?=\\s*?['\"](.*?)['\"][\\s\\S]*?>)+?"
options:NSRegularExpressionCaseInsensitive
error:&error];
[regex enumerateMatchesInString:string
options:0
range:NSMakeRange(0, [string length])
usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
NSString *src = [string substringWithRange:[result rangeAtIndex:2]];
NSLog(@"img src: %@", src);
}];
}
If you wanted to use NSXMLParser, it would look like so:
- (void)retrieveImageSourceTagsViaNSXMLParser:(NSURL *)url
{
NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:url];
parser.delegate = self;
[parser parse];
}
#pragma mark - NSXMLParserDelegate methods
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
if ([elementName isEqualToString:@"img"])
{
NSString *src = attributeDict[@"src"];
NSLog(@"img src: %@", src);
}
}
The problem is that, in my experience, NSXMLParser is less successful at parsing HTML than LibXML2/Hpple. On some simple pages the above works great, but in other situations it doesn't. Bottom line: while NSXMLParser is great at parsing well-formed XML, I'd be wary of using it to parse HTML.
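Since the question asks for only the first image link, the regular-expression approach can be reduced to a single lookup. The following is a minimal sketch, not a robust HTML parser: FirstImageSource is a hypothetical helper name, the pattern is a simplified variant of the one above, and it assumes the tag is written as <img with no space after the angle bracket and that the src value is quoted.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of any library): returns the src value of
// the first <img> tag found in html, or nil if there is none.
static NSString *FirstImageSource(NSString *html)
{
    if (html.length == 0) {
        return nil;
    }
    NSError *error = nil;
    // Simplified pattern: "<img", any attributes, then src="..." or src='...'.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<img\\s[^>]*?src\\s*=\\s*['\"]([^'\"]*)['\"]"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (!regex) {
        NSLog(@"Invalid pattern: %@", error);
        return nil;
    }
    // firstMatchInString: stops at the first match instead of enumerating all.
    NSTextCheckingResult *match = [regex firstMatchInString:html
                                                    options:0
                                                      range:NSMakeRange(0, html.length)];
    if (!match) {
        return nil;
    }
    // Capture group 1 holds the quoted src value.
    return [html substringWithRange:[match rangeAtIndex:1]];
}
```

The returned string could then be stored in the img property of an ArticleItem. Being a plain text match, this sketch would also match an attribute whose name merely ends in src, so Hpple remains the safer choice for arbitrary pages.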

Related

Trying to parse xml list without parent node using NSxmlparser

I have already read about NSXMLParser. I have the following file, which is a list of elements without a parent node, but I do not understand how to set up the parser for it.
<nodes1>
<child1>txt1</child1>
<child2>Txt2</child2>
</nodes1>
<nodes1>
<child1>Txt3</child1>
<child2>Txt4</child2>
</nodes1>
<nodes1>
<child1>Txt5</child1>
<child2>Txt6</child2>
</nodes1>
Get the file and start parsing
NSError *error;
NSString *xmlPath = [[NSBundle mainBundle] pathForResource:@"yourFile" ofType:@"xml"];
NSString *contents = [NSString stringWithContentsOfFile:xmlPath encoding:NSUTF8StringEncoding error:&error];
self.xmlData = [contents dataUsingEncoding:NSUTF8StringEncoding];
self.xmlParser = [[NSXMLParser alloc] initWithData:self.xmlData];
[self.xmlParser setDelegate: self];
[self.xmlParser setShouldResolveExternalEntities: YES];
[self.xmlParser parse];
then for parsing
BOOL isValid_child1 = NO;
-(void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *) namespaceURI qualifiedName:(NSString *)qName attributes: (NSDictionary *)attributeDict
{
if ([elementName isEqualToString:@"child1"]) {
isValid_child1 = YES;
}
}
-(void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
if (isValid_child1) {
valueInChild1 = string; // string is txt1
}
}
-(void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
if ([elementName isEqualToString:@"child1"]) {
isValid_child1 = NO;
[array addObject:valueInChild1]; // to add value for child1 to an array
}
}
I know this is dirty ;) but it works, and you can make it more flexible.
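Note that a file consisting of several sibling nodes1 elements has no single root element, so it is not well-formed XML, and NSXMLParser will likely report an error once the first nodes1 element has closed. One possible workaround, sketched here under the assumption that the file contains no XML declaration or DOCTYPE, is to wrap the contents in a synthetic root element before parsing. Child1Collector, Child1Values, and the root element name are hypothetical names chosen for illustration; the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects the text of every <child1> element.
@interface Child1Collector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *values;
@property (nonatomic, strong) NSMutableString *buffer; // non-nil only inside <child1>
@end

@implementation Child1Collector

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"child1"]) {
        self.buffer = [NSMutableString string];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // Outside <child1> the buffer is nil, so this message is a no-op.
    [self.buffer appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"child1"] && self.buffer) {
        [self.values addObject:[self.buffer copy]];
        self.buffer = nil;
    }
}

@end

// Wraps a rootless fragment in a synthetic <root> element and parses it.
// Returns the collected <child1> values, or nil if parsing fails.
static NSArray *Child1Values(NSString *fragment)
{
    NSString *wrapped = [NSString stringWithFormat:@"<root>%@</root>", fragment];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[wrapped dataUsingEncoding:NSUTF8StringEncoding]];
    Child1Collector *collector = [[Child1Collector alloc] init];
    collector.values = [NSMutableArray array];
    parser.delegate = collector;
    return [parser parse] ? [collector.values copy] : nil;
}
```

The contents string read from the bundle in the answer above could be passed to Child1Values directly.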

How to display the xml parsing data in the UItextfield in iphone

I am able to parse the xml data and able to display it in the console, but not able to display that data in the UITextField or in UILabel.
I tried to assign to textfield in the viewDidLoad method also.
The following is my code,
NSMutableString *currentNodeContent;
NSXMLParser *parser;
ViewController *currentProfile;
bool isStatus;
ViewController *xmlParser;
-(id)loadXMLByURL:(NSString *)urlString
{
_profile = [[NSMutableArray alloc] init];
NSURL *url = [NSURL URLWithString:urlString];
NSData *data = [[NSData alloc] initWithContentsOfURL:url];
parser = [[NSXMLParser alloc] initWithData:data];
parser.delegate = self;
[parser parse];
return self;
}
-(void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
currentNodeContent = (NSMutableString *) [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
-(void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
if([elementName isEqualToString:@"profileinfo"])
{
currentProfile = [ViewController alloc];
isStatus = YES;
}
if([elementName isEqualToString:@"first_name"])
{
currentProfile = [ViewController alloc];
isStatus = YES;
}
}
-(void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
if([elementName isEqualToString:@"first_name"])
{
currentProfile.firstName = currentNodeContent;
NSLog(@"%@",currentProfile.firstName);
first_Name.text = currentNodeContent;//UITextField
first_name.text = currentNodeContent;//Label
}
if([elementName isEqualToString:@"last_name"])
{
currentProfile.lastName = currentNodeContent;
NSLog(@"%@",currentProfile.lastName);
last_Name.text = currentProfile.lastName;
}
if([elementName isEqualToString:@"profileinfo"])
{
[self.profile addObject:currentProfile];
currentProfile = nil;
currentNodeContent = nil;
}
}
- (void)viewDidLoad
{
[super viewDidLoad];
xmlParser = [[ViewController alloc] loadXMLByURL:@"http://www.mxxxxx.net/xxx/xxxxx.aspx?type=proifileinfo&loginid=xxx@gmail.com"];
}
Instead of assigning the values directly to the 'UITextField' or 'UILabel', store the values in a string. When parsing has completed, assign the string value to the 'UITextField' or 'UILabel'. You should probably do that in the '-viewWillAppear:' method :-)
Create an IBOutlet UILabel *label; in the interface, connect it in Interface Builder, and then you can set its text with label.text = @"Anything";
I also had the same problem while parsing XML.
currentNodeContent = (NSMutableString *) [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
You are using the above line to trim spaces, tabs, and newlines, but sometimes it does not work.
So I implemented the following code, which may be useful to you.
NSString *sname = [currentNodeContent stringByReplacingOccurrencesOfString:@"\n" withString:@""];
NSString *actualString = [sname stringByReplacingOccurrencesOfString:@"\t" withString:@""];
Then pass actualString wherever it is required.
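The suggestions above (store the values first, assign them to the UI after parsing, and clean up whitespace) can be combined in a small delegate that is separate from the view controller. This is only a sketch under stated assumptions: ProfileCollector and ProfileFields are hypothetical names, the element names first_name and last_name are taken from the question, and the code assumes ARC. Text is accumulated across parser:foundCharacters: calls and trimmed once, when the element ends.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: gathers the trimmed text of each leaf element
// into a dictionary keyed by element name.
@interface ProfileCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableDictionary *fields;
@property (nonatomic, strong) NSMutableString *buffer;
@end

@implementation ProfileCollector

- (instancetype)init
{
    self = [super init];
    if (self) {
        _fields = [NSMutableDictionary dictionary];
        _buffer = [NSMutableString string];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    // A new element starts: discard any text left over from the parent.
    [self.buffer setString:@""];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // The parser may deliver an element's text in several pieces.
    [self.buffer appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    // Trim once, after the whole text of the element has been collected.
    NSString *text = [self.buffer stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    if (text.length > 0) {
        self.fields[elementName] = text;
    }
    [self.buffer setString:@""];
}

@end

// Parses xmlData and returns the collected fields, or nil if parsing fails.
static NSDictionary *ProfileFields(NSData *xmlData)
{
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xmlData];
    ProfileCollector *collector = [[ProfileCollector alloc] init];
    parser.delegate = collector;
    return [parser parse] ? [collector.fields copy] : nil;
}
```

In the view controller that was actually loaded from the nib or storyboard, the result could then be assigned on the main thread, for example first_Name.text = fields[@"first_name"];. One likely reason the original code shows nothing is that [[ViewController alloc] loadXMLByURL:...] creates a second ViewController instance whose outlets are not connected.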

NSXMLParser divides strings containing foreign(unicode) characters

I have run into a peculiar problem with NSXMLParser.
For some reason it cuts out all the characters in front of the Norwegian characters æ, ø and å.
However, the problem seems to be the same with all non a-z characters (all foreign characters).
Examples:
Reality: Mål
Output: ål
Reality: Le chant des sirènes
Output: ènes
Here's an example from the log where I have printed out the string from:
- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
Log:
2012-02-22 14:00:01.647 VotePlayClient[2776:207] found characters: Le chant des sir
2012-02-22 14:00:01.647 VotePlayClient[2776:207] found characters: ènes
You can clearly see that it jumps to a new line whenever it encounters a foreign letter.
I believe that I have to figure out how to append the string or something to that effect.
Here are the NSXMLParser files:
SearchXMLParser.h
#import <Foundation/Foundation.h>
#import "Search.h"
@interface SearchXMLParser : NSObject <NSXMLParserDelegate>
{
NSMutableString *currentNodeContent;
NSMutableArray *searchhits;
NSMutableArray *trackhits;
NSXMLParser *parser;
Search *currentSearch;
}
@property (readonly, retain) NSMutableArray *searchhits;
@property (readonly, retain) NSMutableArray *trackhits;
-(id) loadXMLByURL:(NSString *)urlString;
@end
SearchXMLParser.m
#import "SearchXMLParser.h"
#import "Search.h"
@implementation SearchXMLParser
@synthesize searchhits, trackhits;
-(id) loadXMLByURL:(NSString *)urlString
{
searchhits = [[NSMutableArray alloc] init];
trackhits = [[NSMutableArray alloc] init];
NSURL *url = [NSURL URLWithString:urlString];
NSData *data = [[NSData alloc] initWithContentsOfURL:url];
parser = [[NSXMLParser alloc] initWithData:data];
parser.delegate = self;
[parser parse];
return self;
}
- (void) parser:(NSXMLParser *)parser didStartElement:(NSString *)elementname namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
if ([elementname isEqualToString:@"track"])
{
currentSearch = [Search alloc];
}
if ([elementname isEqualToString:@"track"])
{
currentSearch.trackurl = [attributeDict objectForKey:@"href"];
}
}
- (void) parser:(NSXMLParser *)parser didEndElement:(NSString *)elementname namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
if ([elementname isEqualToString:@"name"])
{
[trackhits addObject:currentNodeContent];
}
if ([elementname isEqualToString:@"track"])
{
currentSearch.track = [trackhits objectAtIndex:0];
currentSearch.artist = [trackhits objectAtIndex:1];
currentSearch.album = [trackhits objectAtIndex:2];
[trackhits removeAllObjects];
[searchhits addObject:currentSearch];
[currentSearch release];
currentSearch = nil;
[currentNodeContent release];
currentNodeContent = nil;
}
}
- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
NSLog(@"found characters: %@", string);
currentNodeContent = (NSMutableString *) [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
- (void) dealloc
{
[parser release];
[super dealloc];
}
@end
I have already checked SO for answers and found a couple of similar posts, but nothing that gave a clear solution to this problem.
Can anyone shed some light on this problem? :) Any help is much appreciated!
Your parser:foundCharacters: method does not work as it should.
This is from the NSXMLParserDelegate Protocol Reference:
The parser object may send the delegate several parser:foundCharacters: messages to report the characters of an element. Because string may be only part of the total character content for the current element, you should append it to the current accumulation of characters until the element changes.
You could try something like this (ARC):
- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
NSLog(@"found characters: %@", string);
if (!currentNodeContent) {
currentNodeContent = [[NSMutableString alloc] init];
}
[currentNodeContent appendString:string];
}
- (void) parser:(NSXMLParser *)parser didEndElement:(NSString *)elementname namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
// your code here
// when you are done with the string:
currentNodeContent = nil;
}

Desperate for NSXMLParser Guidance

I previously asked this question XMLParser Advice.
However I am still unable to get it to function properly....
So I guess I will start from scratch:
Located at a certain URL is an XML Tree that looks like this
<result>
//stuff that I dont need
<title>
//thing that I do need
</title>
//stuff that I dont need
<body>
//thing that I do need
</body>
</result>
How the heck do I go about parsing that?
The (useless) code I have so far can be found in the link at the top of this question.
Thank you for your time.
Write a simple class, which will be the parser's delegate.
@interface YourObject : NSObject <NSXMLParserDelegate> {
NSString *title, *body; // object attributes
NSXMLParser *parser; // will parse XML
NSMutableString *strData; // will contain the string data being parsed
}
@property(readwrite, copy) NSString *title, *body;
// will be used to set your object attributes
-(void)fetchValuesAtURL:(NSString *)url;
@end
The fetchValuesAtURL: method will initiate the parse operation.
@implementation YourObject
@synthesize title, body;
-(id)init {
self = [super init];
if(self) {
title = @"";
body = @"";
parser = nil;
strData = [[NSMutableString alloc] initWithCapacity:10];
}
return self;
}
-(void)fetchValuesAtURL:(NSString *)url {
if(parser) {
[parser release];
}
NSURL *xmlURL = [NSURL URLWithString:url];
parser = [[NSXMLParser alloc] initWithContentsOfURL:xmlURL];
[parser setDelegate:self];
[parser parse];
}
-(void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
namespaceURI:(NSString *)namespaceURI
qualifiedName:(NSString *)qName
attributes:(NSDictionary *)attributeDict {
// element is about to be parsed, clean the mutable string
[strData setString:@""];
}
// the probably missing method
-(void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
// content (or part of) has been found, append that to the current string
[strData appendString:string];
}
-(void)parser:(NSXMLParser *)parser
didEndElement:(NSString *)elementName
namespaceURI:(NSString *)namespaceURI
qualifiedName:(NSString *)qName {
// element has been parsed, test the element name
// and store strData accordingly
if([elementName isEqualToString:@"title"]) {
self.title = strData;
}
else if([elementName isEqualToString:@"body"]) { // the second element to parse
self.body = strData;
}
}
-(void)dealloc {
[title release];
[body release];
[strData release];
if(parser) {
[parser release];
}
[super dealloc];
}
@end
Then :
YourObject *obj = [[YourObject alloc] init];
[obj fetchValuesAtURL:@"http://www.site.com/xml/url"];
NSXMLParser's delegate is able to do many more things, as described in Event-Driven XML Programming Guide from Apple.
For complete reference on delegate methods, see NSXMLParserDelegate Protocol Reference.

How to parse a locally stored XML file in iPhone?

How to parse a locally stored XML file in iPhone?
please help me with this using code snippets
I used NSXMLParser to achieve this. I have an r.xml file in my resources, and I parse just the title and display it using NSXMLParser.
r.xml:
<rss>
<eletitle > My Xml Program </eletitle>
</rss>
Here is my sample code:
@interface:
NSXMLParser *rssparser;
NSMutableArray *stories;
NSMutableDictionary *item;
NSMutableString *currrentTitle;
NSString *currentElement;
@implementation:
-(void) viewDidAppear:(BOOL) animated
{
[self parseXMLFileAtURL];
}
-(void) parseXMLFileAtURL
{
stories = [[NSMutableArray alloc] init];
NSURL *xmlURL = [NSURL fileURLWithPath:[[NSBundle mainBundle] pathForResource:@"r" ofType:@"xml"]];
rssparser = [[NSXMLParser alloc] initWithContentsOfURL:xmlURL];
[rssparser setDelegate:self];
[rssparser setShouldProcessNamespaces:NO];
[rssparser setShouldReportNamespacePrefixes:NO];
[rssparser setShouldResolveExternalEntities:NO];
[rssparser parse];
}
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict{
currentElement = [elementName copy];
if([elementName isEqualToString:@"rss"])
{
item = [[NSMutableDictionary alloc] init];
currrentTitle = [[NSMutableString alloc] init];
}
}
-(void) parser:(NSXMLParser *)parser foundCharacters:(NSString *) string
{
if([currentElement isEqualToString:@"eletitle"])
{
[currrentTitle appendString:string];
}
}
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName{
if([elementName isEqualToString:@"rss"])
{
[item setObject:currrentTitle forKey:@"eletitle"];
[stories addObject:[item copy]];
}
}
- (void)parserDidEndDocument:(NSXMLParser *)parser
{
NSLog(@"The currrentTitle is %@",currrentTitle);
}
Best of Luck.
I'm sorry I cannot give you any snippet now, but in one project I did some time ago, we used the touchXML library.
http://code.google.com/p/touchcode/wiki/TouchXML
With this, parsing XML was pretty easy.
Good luck!