to extract a part of the URL after XML parsing? - iphone

I am trying to parse an XML file in which an element named "description" is as given below:
<description>
<![CDATA[
<a href='http://www.okmagazine.com/posts/view/13756/'>
<img src='http://www.okmagazine.com/img/photos/thumbs/27044' />
</a>
<br />
Ashlee and Pete take their tiny tot to FAO Schwarz in NYC for some new toys.
<p> <strong>Pete Wentz</strong> and <strong>Ashlee Simpson Wentz</strong> made the new parent pilgrimage to New York’s FAO Schwarz today, where 6-month old <strong>Bronx Mowgli </strong>was the...]]>
</description>
What I want is to get the link in the tag <img src='http://www.okmagazine.com/img/photos/thumbs/27044'> so that I can display the image in my image view. How can I separate this string from the contents of the description tag?
Here is the part of my code that does the parsing:
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string{
    //NSLog(@"found characters: %@", string);
    // save the characters for the current item...
    if ([currentElement isEqualToString:@"title"]) {
        [currentTitle appendString:string];
    } else if ([currentElement isEqualToString:@"link"]) {
        [currentLink appendString:string];
    } else if ([currentElement isEqualToString:@"description"]) {
        [currentSummary appendString:string];
    } else if ([currentElement isEqualToString:@"pubDate"]) {
        [currentDate appendString:string];
    }
}
Please help
regards
Arun

I've never used that exact framework, but what you have to keep in mind is that while it will notify you when it finds the CDATA, anything inside is just plain text to the parser. So it looks like you want to implement parser:foundCDATA:. You'll be passed an NSData block, and from there you have to parse the contents yourself. You can use another parser to do that, but it's probably faster just to do a manual substring search.
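A minimal sketch of that suggestion, assuming the feed is UTF-8 encoded. The CDATACollector class is hypothetical; its currentElement and currentSummary properties stand in for the question's variables of the same names. The HTML fragment ends up in currentSummary, ready for a substring or regular-expression search:

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects the CDATA content of <description>.
@interface CDATACollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, copy) NSString *currentElement;
@property (nonatomic, strong) NSMutableString *currentSummary;
@end

@implementation CDATACollector

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    self.currentElement = elementName;
    if ([elementName isEqualToString:@"description"]) {
        self.currentSummary = [[NSMutableString alloc] init];
    }
}

// When the delegate implements this method, CDATA sections are reported here as raw bytes.
- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock {
    if ([self.currentElement isEqualToString:@"description"]) {
        // Assumption: the feed is UTF-8 encoded.
        NSString *html = [[NSString alloc] initWithData:CDATABlock
                                               encoding:NSUTF8StringEncoding];
        if (html) [self.currentSummary appendString:html];
    }
}

@end
```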

Have you thought about using regexp?

NSString *str = @"<![CDATA[<a href='http://www.okmagazine.com/posts/view/13756/'><img src='http://www.okmagazine.com/img/photos/thumbs/27044' /></a><br />Ashlee and Pete take their tiny tot to FAO Schwarz in NYC for some new toys. <p> <strong>Pete Wentz</strong> and <strong>Ashlee Simpson Wentz</strong> made the new parent pilgrimage to New York’s FAO Schwarz today, where 6-month old <strong>Bronx Mowgli </strong>was the...]]>";
// Find the start of the src attribute value
NSRange range = [str rangeOfString:@"<img src='"];
if (range.location != NSNotFound) {
    str = [str substringFromIndex:range.location + range.length];
    // The value ends at the next single quote
    range = [str rangeOfString:@"'"];
    if (range.location != NSNotFound) {
        str = [str substringToIndex:range.location];
        NSLog(@"%@", str); // http://www.okmagazine.com/img/photos/thumbs/27044
    }
}
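Since regular expressions come up here, this is a hedged sketch of the same extraction with NSRegularExpression (available from iOS 4). ExtractImageSource is a hypothetical helper name, and the pattern assumes the src value is wrapped in matching single or double quotes; it is a quick extraction for this kind of fragment, not a general HTML parser:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first <img> src value in an HTML fragment, or nil.
static NSString *ExtractImageSource(NSString *html) {
    NSError *error = nil;
    // Group 1 captures the opening quote, group 2 the URL up to the matching quote.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<img[^>]*\\bsrc\\s*=\\s*(['\"])(.*?)\\1"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) return nil;
    NSTextCheckingResult *match = [regex firstMatchInString:html
                                                    options:0
                                                      range:NSMakeRange(0, [html length])];
    if (match == nil) return nil;
    return [html substringWithRange:[match rangeAtIndex:2]];
}
```

Unlike the fixed-prefix search, this also tolerates other attributes placed before src.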

Attributes are passed in to the parser's didStartElement delegate method as a dictionary of strings keyed by attribute name. Thus, you can extract the URLs you want from the attributes using NSDictionary's objectForKey: with the attribute name as the key, i.e.:
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"img"]) // check for <img ...> element
    {
        NSString *url = [attributeDict objectForKey:@"src"];
        // url now contains the url you require from the HTML
    }
}
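One caveat: didStartElement: only fires for markup the parser actually sees as elements, so an <img> inside a CDATA section (as in the question) stays plain text. If the fragment is well-formed and parsed as its own document, or the feed carries real <img> elements, a hedged sketch of this approach looks like the following; the ImgSrcCollector class is hypothetical:

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that records the src attribute of every <img> element.
@interface ImgSrcCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *sources;
@end

@implementation ImgSrcCollector

- (instancetype)init {
    if ((self = [super init])) {
        _sources = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"img"]) {
        NSString *src = [attributeDict objectForKey:@"src"];
        if (src) [self.sources addObject:src];   // skip <img> elements without src
    }
}

@end
```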

Related

Unable to Parse following XML?

I have consecutive img id="#" tags, where # varies from 1 to 9.
The content of each tag consists of floating-point values.
When I use the standard NSXMLParser, I am not getting all the values.
My XML for reference:
<img id="1">-0.0111328,-0.0635608,0.152549,0.11211,-0.0250431,
-0.0370875,0.0862391,0.0970791,-0.0195908,
-0.00892297,0.0791795,0.0554013,0.00362028,0.0138572,0.0432729,
0.0253036,-0.0770325,0.14065,0.118424,0.1787,
0.0734354,0.160883,0.101831,0.237038,0.0681151,0.178331,
0.106532,0.224731,0.133766,0.222096,0.165214,0.240752,
-0.0280366,0.106239,0.052094,0.110642,
</img>
How would I parse the above XML?
Kindly help me out.
Thanks
This is because parser:foundCharacters: is not guaranteed to deliver all the characters at once. You need to concatenate all the strings you receive between the parser:didStartElement:namespaceURI:qualifiedName:attributes: and parser:didEndElement:namespaceURI:qualifiedName: callbacks for the <img> tag.
In the code below, buf is an NSMutableString ivar of your parser delegate.
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"img"]) {
        // Start a fresh buffer for this <img>
        buf = [NSMutableString string];
    }
}
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"img"]) {
        // Trim into a local string; buf itself must stay mutable
        NSString *value = [buf stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
        NSLog(@"Got %@", value);
        buf = nil; // stop collecting until the next <img>
    }
}
- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // May be called several times per element, so append every chunk
    [buf appendString:string];
}
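Since the question is about the floating-point values themselves, here is a hedged sketch that combines the buffering above with a split on commas. The ImgValuesCollector class and its valuesById property are hypothetical, and empty entries (such as the one after a trailing comma) are skipped:

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: maps each <img id="..."> to an array of NSNumber floats.
@interface ImgValuesCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableDictionary *valuesById;
@property (nonatomic, strong) NSMutableString *buf;   // non-nil only inside <img>
@property (nonatomic, copy) NSString *currentId;
@end

@implementation ImgValuesCollector

- (instancetype)init {
    if ((self = [super init])) {
        _valuesById = [[NSMutableDictionary alloc] init];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"img"]) {
        self.currentId = [attributeDict objectForKey:@"id"];
        self.buf = [[NSMutableString alloc] init];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.buf appendString:string];   // no-op while buf is nil
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (![elementName isEqualToString:@"img"] || self.buf == nil) return;
    NSMutableArray *values = [[NSMutableArray alloc] init];
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    for (NSString *part in [self.buf componentsSeparatedByString:@","]) {
        NSString *trimmed = [part stringByTrimmingCharactersInSet:ws];
        if ([trimmed length] > 0) {
            [values addObject:[NSNumber numberWithFloat:[trimmed floatValue]]];
        }
    }
    if (self.currentId) [self.valuesById setObject:values forKey:self.currentId];
    self.buf = nil;
}

@end
```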
Finally got it. I wrapped the img id='#' elements in enclosing start and end tags. My structure now looks like this:
<images>
<img id = '1'> -0.0111328,-0.0635608,0.152549,0.11211,-0.0250431,
-0.0370875,0.0862391,0.0970791,-0.0195908,
-0.00892297,0.0791795,0.0554013,0.00362028,0.0138572,0.0432729,
0.0253036,-0.0770325,0.14065,0.118424,0.1787,
0.0734354,0.160883,0.101831,0.237038,0.0681151,0.178331,
0.106532,0.224731,0.133766,0.222096,0.165214,0.240752,
-0.0280366,0.106239,0.052094,0.110642, ....
</img>
<img id = '2'> ...
</img>
....
....
</images>
<mapping>
<map>
<imgid> 1 </imgid>
<keyword> heavy </keyword>
</map>
<map>
<imgid> 2 </imgid>
<keyword> metal </keyword>
</map>
...
...
</mapping>
Placing these enclosing start and end tags allowed me to parse the whole XML.
Earlier, the only start and end tags were those of the individual images, which resulted in only one img being parsed.
This taught me another key point about parsing XML.
Hope this helps others as well.
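The underlying rule is that a well-formed XML document has exactly one root element. The hedged sketch below (the helper name ParsesCleanly is hypothetical) shows NSXMLParser's parse returning NO when two top-level <img> elements sit side by side, and YES once they share a parent. By the same rule, if <images> and <mapping> live in the same file, they also need a common parent:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if NSXMLParser accepts the string as a complete document.
static BOOL ParsesCleanly(NSString *xml) {
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    BOOL ok = [parser parse];
    if (!ok) NSLog(@"Parse error: %@", [parser parserError]);
    return ok;
}
```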

Get URL from html page - objective

I need to get a URL from a loaded HTML page. Here is the HTML tag that contains my URL:
<a class="top_nav_link" id="logout_link" href="https://login.vk.com/?act=logout&hash=29327318c645d49a48&from_host=vk.com&from_protocol=http" onclick="if (!checkEvent(event)) { ge('logout_form').submit(); return false; }">
And the URL: "https://login.vk.com/?act=logout&hash=29327318c645d49a48&from_host=vk.com&from_protocol=http"
The hash can be different.
How can I get this URL?
Since you say the HTML is actually well-formed XHTML, you can use any XML parsing method to parse the document and find what you are looking for. Using NSXMLParser and a valid parser delegate, you would probably have something like:
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"a"] && [[attributeDict objectForKey:@"id"] isEqualToString:@"logout_link"]) {
        // Found the <a> tag with an id of logout_link
        NSString *linkURL = [attributeDict objectForKey:@"href"];
        // Do what you want with the link URL here
    }
}
This assumes you are looking for a specific <a> element with an id of logout_link. If you are looking for other ways to identify which <a> tag has the URL you want, you can adjust the if statement in this sample code accordingly.
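A hedged sketch of the complete delegate, assuming the page really is well-formed XHTML. In well-formed markup the & characters inside the href have to be written as &amp;, and NSXMLParser hands the decoded value (with plain &) to attributeDict. The LogoutLinkFinder class and its logoutURL property are hypothetical:

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that remembers the href of the <a id="logout_link"> element.
@interface LogoutLinkFinder : NSObject <NSXMLParserDelegate>
@property (nonatomic, copy) NSString *logoutURL;
@end

@implementation LogoutLinkFinder

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"a"] &&
        [[attributeDict objectForKey:@"id"] isEqualToString:@"logout_link"]) {
        // Entity references such as &amp; are already decoded here.
        self.logoutURL = [attributeDict objectForKey:@"href"];
    }
}

@end
```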

NSXMLParser replaces é characters with \U00e9

I'm using NSXMLParser to parse an XML file and return some URLs in an NSMutableArray. Everything is going great except that the French é is replaced by \U00e9.
Here is my code:
- (void)parseXMLFileAtURL:(NSString *)URL
{
    NSURL *xmlURL = [NSURL URLWithString:URL];
    xmlParser = [NSXMLParser alloc] initWithContentsOfURL:xmlURL];
    // Set self as the delegate of the parser so that it will receive the parser delegate methods callbacks.
    [xmlParser setDelegate:self];
    // Depending on the XML document you're parsing, you may want to enable these features of NSXMLParser.
    [xmlParser setShouldProcessNamespaces:NO];
    [xmlParser setShouldReportNamespacePrefixes:NO];
    [xmlParser setShouldResolveExternalEntities:NO];
    [xmlParser parse];
}
- (void) parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict{
    currentElement = [elementName copy];
    if ([elementName isEqualToString:@"catalogue"]) {
        // clear out our story item caches...
        currentCatalogue = [[Catalogue alloc] init];
    }
    if ([elementName isEqualToString:@"partenaire"]) {
        // clear out our story item caches...
        currentPartenaire = [[Partenaire alloc] init];
    }
}
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName{
    if ([elementName isEqualToString:@"catalogue"]) {
        // Add currentCatalogue to array
        [catalogueList addObject: currentCatalogue];
        NSString *urls=[catalogueList valueForKey:@"url"];
        NSLog(@"Current catalogue: urls=%@", urls);
    }
    if ([elementName isEqualToString:@"partenaire"]) {
        // Add currentPartenaire to array
        [partenaireList addObject: currentPartenaire];
        /*NSLog(@"Current partenaire: raison_sociale=%@, lat=%@, lng=%@", currentPartenaire.raison_sociale, currentPartenaire.lat, currentPartenaire.lng);*/
    }
}
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string{
    // Catalogue setup
    if ([currentElement isEqualToString:@"id_model"])
        currentCatalogue.id_model = string;
    if ([currentElement isEqualToString:@"url"])
    {
        if (currentCatalogue.url)
        {
            currentCatalogue.url = [NSString stringWithFormat: @"%@%@", currentCatalogue.url, string];
            // NSLog(@"Value of url in data handler %@", currentCatalogue.url);
        }
        else
            currentCatalogue.url = string;
    }
}
Does anyone have any idea how to fix this?
Tested code (100% working):
NSString* inputString =[NSString stringWithFormat:@"\40_TTRS_Coup\u00e9_TTRS_Roadster_Tarifs_20110428.pdf"];
NSLog(@"inputString is: %@ \n\n",inputString);
OUTPUT:
inputString is: _TTRS_Coupé_TTRS_Roadster_Tarifs_20110428.pdf
Assuming your XML has an explicit encoding, as follows:
<?xml version="1.0" encoding="UTF-8"?>
then there are two problems you should address first. These may or may not fix your direct problem, but if not, they will make it easier to narrow in on the problem you are seeing. Here are the problems you should fix:
Remove the code that calls two different initializer methods on a single NSXMLParser instance. That will have undefined results, and it's impossible to know what is going on until you fix that.
Change how you implement the parser:foundCharacters: method. As the NSXMLParser documentation states, this method may be called multiple times for a given run of characters within an XML element. Instead of just accepting the string and storing its value away, your delegate class should have a mutable character buffer that you append to each time foundCharacters gets called. Then in parser:didEndElement:namespaceURI:qualifiedName: you can grab the contents of the buffer and do what you need to with that value.
Try it out with these fixes and see if it works. If not, update your post with a corrected version of your code and it might be more obvious what the problem is.
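The second fix can be sketched as below. This is a hedged illustration, not the asker's code: the CatalogueURLCollector class and its urls and buffer properties are hypothetical. One more thing worth checking: on Apple platforms, NSLog of an NSArray (which is what [catalogueList valueForKey:@"url"] returns, despite the NSString * declaration in the question) prints the array's description, and that description escapes non-ASCII characters such as é as \U00e9 even when the strings themselves are correct. Logging each string on its own avoids that:

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: buffers <url> text and stores the complete string on element end.
@interface CatalogueURLCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *urls;
@property (nonatomic, strong) NSMutableString *buffer;   // non-nil only inside <url>
@end

@implementation CatalogueURLCollector

- (instancetype)init {
    if ((self = [super init])) {
        _urls = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"url"]) {
        self.buffer = [[NSMutableString alloc] init];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // The text may arrive in several chunks; keep appending.
    [self.buffer appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"url"] && self.buffer) {
        NSString *url = [self.buffer copy];
        [self.urls addObject:url];
        NSLog(@"url = %@", url);   // logs the string itself, not an array description
        self.buffer = nil;
    }
}

@end
```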

Why is NSXMLParser picking up this whitespace in the foundCharacters method?

I'm learning to use the NSXMLParser API for the iOS platform, and so far it's very easy to use. I'm having a small problem, however, in the foundCharacters method. As I understand it, it shouldn't pick up any whitespace, since the foundIgnorableWhitespace method is supposed to catch that, but it looks like it does. Here's my code...
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    //We're at the start of a new data feed
    if([elementName isEqualToString:@"data"])
    {
        if(listOfTimes != nil)
            [listOfTimes release];
        listOfTimes = [[NSMutableArray alloc] init];
    }
    else if ( [elementName isEqualToString:@"start-valid-time"]) {
        currentElementType = kXMLElementTime;
        return;
    }
    else {
        currentElementType = kXMLElementOther;
    }
}
//---------------------------------------------------------------------
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if(currentElementType == kXMLElementTime)
    {
        //We don't want any more than three times
        if ([listOfTimes count] >= 3)
            return;
        [listOfTimes addObject:string];
    }
}
It basically stores three "time" elements in an array. The problem, however, is it seems to be picking up whitespace in the form of a newline. Here's the printout of the array in the console...
Printing description of listOfTimes:
(
"2010-08-21T22:00:00-05:00",
"\n ",
"2010-08-22T01:00:00-05:00"
)
and here's a snippet of the XML data I'm processing...
<time-layout time-coordinate="local" summarization="none">
<layout-key>k-p3h-n40-1</layout-key>
<start-valid-time>2010-08-21T22:00:00-05:00</start-valid-time>
<start-valid-time>2010-08-22T01:00:00-05:00</start-valid-time>
<start-valid-time>2010-08-22T04:00:00-05:00</start-valid-time>
<start-valid-time>2010-08-22T07:00:00-05:00</start-valid-time>
<start-valid-time>2010-08-22T10:00:00-05:00</start-valid-time>
.
.
.
Am I misunderstanding how this works?
Thanks in advance for your help!
The easy solution is to implement a didEndElement: method in which you set currentElementType back to kXMLElementOther.
There is a good description of ignorable white space in the article "Ignorable White Space". The problem is probably that you do not have a DTD associated with your document, so the parser does not actually know what ignorable white space is. (It is not simply white space between tags, which is probably what you think.) So the parser treats everything as character data.
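A hedged sketch that combines the first suggestion (resetting state in didEndElement:) with a buffer, so the newlines between elements are never collected. The TimeCollector class is hypothetical and stands in for the question's delegate; the limit of three times comes from the question:

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects the first three <start-valid-time> values.
@interface TimeCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *listOfTimes;
@property (nonatomic, strong) NSMutableString *buffer;   // non-nil only inside <start-valid-time>
@end

@implementation TimeCollector

- (instancetype)init {
    if ((self = [super init])) {
        _listOfTimes = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"start-valid-time"]) {
        self.buffer = [[NSMutableString alloc] init];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.buffer appendString:string];   // ignored between elements, where buffer is nil
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"start-valid-time"]) {
        if ([self.listOfTimes count] < 3) {
            [self.listOfTimes addObject:[self.buffer copy]];
        }
        self.buffer = nil;   // plays the role of resetting to kXMLElementOther
    }
}

@end
```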
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // append whatever data I get from the node to the nodecontent variable
    [nodecontent appendString:[string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]];
    NSLog(@"node content = %@",nodecontent);
}
I found the answer to your question. Make the following edit in this method:
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
Add this line of code:
NSString *stringToDisplay = [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
Now replace every occurrence of "string" in the method above with "stringToDisplay".
It worked for me. Hope it will work for you too.
Enjoy Coding.

Datatypes for use with NSXMLParser

I'm using NSXMLParser to parse XML data from a remote server. I followed a tutorial to get up and running and everything is ok for any (NSString *) members I have in my objects that I'm creating. I also have integers that I need to set from the XML data, such as:
<root>
<child>
<name> Hello </name>
<number> 123 </number>
</child>
<child>
<name> World</name>
<number> 456 </number>
</child>
</root>
In this case I would be creating two "child" objects. I'm using:
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
...
[aChild setValue:currentElementValue forKey:elementName];
...
}
Delegate to set the values. This is where I get to my problem. The "NSString *name" member is set fine every time; however, if I use an NSInteger, then whenever I try to set "number" I get an EXC_BAD_ACCESS. So I tried using an "int" instead. But now I can't use key-value coding and need to look for the node manually:
if([elementName isEqualToString:@"number"]) {
    aChild.number = [currentElementValue intValue];
}
This is okay, I can deal with that because I know what nodes I'll be getting, but it gets worse. When currentElementValue is an "NSMutableString *" as per the tutorial, it does not return the correct integer even though the string is correct. For instance:
NSLog(@"NSMutableString Value: %@, and intValue %d\n", currentElementValue, [currentElementValue intValue]);
// Value will be 123 but intValue will be 0
So I made currentElementValue an NSString instead of an NSMutableString and I can get the proper intValue. But I read online that the reason it is an NSMutableString is because the:
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
Delegate used to set the value can occur more than once, but typically does not. So my question is does anybody know what I'm doing wrong? This seems like a pretty trivial case for NSXMLParser so I'm sure it's something I'm misunderstanding.
I don't know where your code is failing, but here's the correct way to handle this:
// buffer is an NSMutableString instance variable of the delegate,
// created once (for example in -init): buffer = [[NSMutableString alloc] init];
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict {
    // Start every element with an empty buffer
    [buffer setString:@""];
}
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if([elementName isEqualToString:@"number"]) {
        [aChild setNumber:[buffer intValue]];
        [buffer setString:@""];
    }
}
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // May be called several times per element, so append
    [buffer appendString:string];
}
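To make that pattern concrete, here is a hedged, self-contained sketch that builds the two child objects from the question's XML. The Child model and ChildParser delegate are hypothetical; the buffer is trimmed before conversion, which also covers padded values such as <number> 456 </number>:

```objc
#import <Foundation/Foundation.h>

// Hypothetical model object matching the question's <child> elements.
@interface Child : NSObject
@property (nonatomic, copy) NSString *name;
@property (nonatomic, assign) NSInteger number;
@end
@implementation Child
@end

// Hypothetical delegate that fills one Child per <child> element.
@interface ChildParser : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *children;
@property (nonatomic, strong) Child *aChild;
@property (nonatomic, strong) NSMutableString *buffer;
@end

@implementation ChildParser

- (instancetype)init {
    if ((self = [super init])) {
        _children = [[NSMutableArray alloc] init];
        _buffer = [[NSMutableString alloc] init];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"child"]) self.aChild = [[Child alloc] init];
    [self.buffer setString:@""];   // start each element with an empty buffer
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.buffer appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    NSString *text = [self.buffer stringByTrimmingCharactersInSet:
                      [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    if ([elementName isEqualToString:@"name"]) {
        self.aChild.name = text;
    } else if ([elementName isEqualToString:@"number"]) {
        self.aChild.number = [text integerValue];
    } else if ([elementName isEqualToString:@"child"] && self.aChild) {
        [self.children addObject:self.aChild];
        self.aChild = nil;
    }
    [self.buffer setString:@""];
}

@end
```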
intValue should work identically for NSString and NSMutableString. Are you sure that Value was '123' and not '\n123' (\n means a newline character)? If the string doesn't start with a decimal number, then intValue will return 0.
Are you clearing the mutable string correctly in parser:didStartElement:? If you're clearing it only in parser:didEndElement:, then parser:foundCharacters: will collect characters from the parent element too, which in this case will prefix your string with newlines, and intValue will return 0.
You're correct in that parser:foundCharacters: can be called multiple times for a single element.
A couple of things are going on here.
First, you have obvious blanks in your XML character data, e.g.
<number> 456 </number>
You should really strip out that whitespace. That is likely what is causing the return value of [NSString intValue] to be wrong. If you can remove it at the source, great. If not, you can strip it out on the receiving end by doing:
currentElementValue = [currentElementValue stringByTrimmingCharactersInSet:
[NSCharacterSet whitespaceAndNewlineCharacterSet]];
The reason you couldn't use key/value is that you can't store an NSInteger value in an NSMutableDictionary. Both keys and values in the dictionary have to be Objective-C objects, and NSInteger is not an object: it is a plain C typedef (int on 32-bit platforms, long on 64-bit ones). So you should use an NSNumber instead:
NSNumber *theInt = [NSNumber numberWithInt:[currentElementValue intValue]];
// setValue:forKey: works on an NSMutableDictionary and, via KVC, on a custom object
[aChild setValue:theInt forKey:elementName];
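If aChild is a custom object rather than a dictionary, key-value coding can still fill an NSInteger property, because setValue:forKey: unboxes an NSNumber into a scalar property. A hedged sketch follows; the Item class and the SetParsedValue helper are hypothetical:

```objc
#import <Foundation/Foundation.h>

// Hypothetical model with an object property and a scalar property.
@interface Item : NSObject
@property (nonatomic, copy) NSString *name;
@property (nonatomic, assign) NSInteger number;
@end
@implementation Item
@end

// Hypothetical helper: sets a parsed element value via KVC, boxing numeric text in NSNumber.
static void SetParsedValue(Item *item, NSString *key, NSString *rawText) {
    NSString *text = [rawText stringByTrimmingCharactersInSet:
                      [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    if ([key isEqualToString:@"number"]) {
        // KVC unwraps the NSNumber to fit the NSInteger property.
        [item setValue:[NSNumber numberWithInteger:[text integerValue]] forKey:key];
    } else {
        [item setValue:text forKey:key];
    }
}
```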