Displaying the contents of the xml page - iphone

I am new to iPhone development. I want to parse an XML page. The source contains some HTML tags, and these tags are displayed in my simulator. I want to filter out the tags and display only the content. The XML source looks like this:
<description>
<![CDATA[<br /><p class="author"><span class="by">By: </span>By Sydney Ember</p><br><p>In the week since an earthquake devastated Haiti ...</p>]]>
</description>
I want "In the week since an ..." to be displayed, and not the HTML tags. Please help me out. Thanks.

As said before in other answers, the data in your xml is inside a CDATA block - this means that when you get the contents of the tag, the XML parser won't be able to get rid of the 'By:' bit for you - as far as it's concerned, it's all just text.
However, if you're going to display it as HTML inside a UIWebView (instead of a UILabel etc.), you can add a style sheet to the start of the string that hides the 'By:'. Something like
NSString *cssString = @"<style type='text/css'>span.by { display:none; }</style>";
NSString *html = [NSString stringWithFormat:@"<html><head>%@</head><body>%@</body></html>", cssString, descriptionString];
[webView loadHTMLString:html baseURL:nil];
where descriptionString is the contents of the <description> tag in your xml.
However, this approach is a little heavy-handed. I would try very hard to get some cleaner XML from your server!
As for actually parsing the xml, try the NSXMLParser object.
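To make the NSXMLParser suggestion concrete, here is a minimal sketch in Swift (where the class is called XMLParser). The names DescriptionCollector and descriptionStrings are made up for illustration, and it assumes the CDATA bytes are UTF-8. It only collects the raw contents of each <description> element; the HTML inside the CDATA block is not interpreted.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical helper: collects the text of every <description> element.
final class DescriptionCollector: NSObject, XMLParserDelegate {
    private(set) var descriptions: [String] = []
    private var buffer = ""
    private var insideDescription = false

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "description" {
            insideDescription = true
            buffer = ""
        }
    }

    // CDATA is handed over as raw bytes; the markup inside it is just text.
    func parser(_ parser: XMLParser, foundCDATA CDATABlock: Data) {
        guard insideDescription,
              let text = String(data: CDATABlock, encoding: .utf8) else { return }
        buffer += text
    }

    // Ordinary character data may arrive in several chunks, so append.
    func parser(_ parser: XMLParser, foundCharacters string: String) {
        if insideDescription { buffer += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "description" {
            descriptions.append(buffer.trimmingCharacters(in: .whitespacesAndNewlines))
            insideDescription = false
        }
    }
}

/// Hypothetical convenience wrapper around the delegate above.
func descriptionStrings(in xml: String) -> [String] {
    let parser = XMLParser(data: Data(xml.utf8))
    let collector = DescriptionCollector()
    parser.delegate = collector
    parser.parse()
    return collector.descriptions
}
```

Each returned string still contains the HTML markup, so it could be used as descriptionString in the loadHTMLString call above, or cleaned up further before going into a label.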

The contents of a CDATA block are treated as text (XML-specific characters like <, &, > etc. are not interpreted as markup and are treated as plain characters). If the text canvas you're using to display the text accepts HTML, read the text node of the description tag and assign it to the innerHTML equivalent of the canvas.

I see that all the tags are HTML. In addition, there is a CDATA section, which specifies that its content should be treated as text and not as XML. As for the XML parsing, there are a few XML parsers available for iPhone:
TouchXML
XPathQuery
I prefer the latter.
I'm not sure how the parsers will treat the CDATA.
Maybe you will have to parse twice - first time for getting the CDATA contents and second time for parsing the content...

Related

XMLParser encounter invalid tags

I'm writing an RSS reader app using Swift. I use the built-in class XMLParser to do the parsing job.
The XMLParser stops when it encounters some strange tags, for instance <figure> (this tag is matched by the end tag </figure>). The error code is 76 (tagNameMismatchError).
I extract the part causing the tagNameMismatchError from xml:
<figure tabindex="0" draggable="false" class="ss-img-wrapper" contenteditable="false"><img src="https://cdn.sspai.com/2019/08/19/34d2340bbf2cbc3b08ffe4fe1594168d.png" alt=""><figcaption class="ss-image-caption">图 / iHelpBR</figcaption></figure>
Why do I get this error (tagNameMismatchError)? Is <figure> an invalid tag, or is it something else?
Besides, I can't predict what possible tags could come from possible feeds.
The problem is the img tag, which is not terminated. This is not valid XML. HTML is more lax regarding closing tags than XML is. Insert a </img> or change the img tag to be <img src=... /> and it will work.
If you ever need to confirm that the content is valid XML, you can also save it to a file and then use the command-line tool xmllint, which will report:
parser error : Opening and ending tag mismatch: img line 1 and figure
Bottom line, you’ll need to fix the XML, or use an HTML parser (such as Hpple or NDHpple) instead.
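If you would rather check this in code than with xmllint, a rough sketch follows. The function name isWellFormedXML and the shortened sample fragments are invented for illustration; the check simply runs XMLParser without a delegate and looks at whether parsing succeeded.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical helper: true when the parser reaches the end without an error.
func isWellFormedXML(_ xml: String) -> Bool {
    let parser = XMLParser(data: Data(xml.utf8))
    // No delegate is needed just to learn whether parsing succeeds.
    return parser.parse()
}

// HTML style: <img> is never closed, so </figure> does not match the open tag.
let htmlStyle = "<figure><img src=\"a.png\" alt=\"\"><figcaption>caption</figcaption></figure>"

// XML style: the self-closing <img ... /> keeps the tags balanced.
let xmlStyle = "<figure><img src=\"a.png\" alt=\"\" /><figcaption>caption</figcaption></figure>"

print(isWellFormedXML(htmlStyle))
print(isWellFormedXML(xmlStyle))
```

When the check fails, the parser's parserError property carries the underlying error, which is where a code such as 76 would come from.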

Need to find the tags under a tag in an XML using jQuery

I have this xml as part of the responseXml of an Ajax call:
<banner-ad>
<title><span style="color:#ffff00;"><strong>Title</strong></span></title>
</banner-ad>
When I used this jQuery(responseXml).find("title").text(); the result is "Title".
I also tried jQuery(responseXml).find("title:first-child") but the result is [object Object].
I want to get the result:
<span style="color:#ffff00;"><strong>Title</strong></span>
Please let me know how to do this in jQuery.
Thanks in advance for any help.
Regards,
Racs
Your problem is that you cannot simply append nodes from one document (the XML response) to another (your HTML page). The issue is two-fold:
You can use jQuery to append nodes from the XML document to the HTML page. This works; the nodes appear in the HTML DOM, but they stay XML nodes and therefore the browser ignores the style attribute, for example. Consequently the text will not be yellow (#ffff00).
As far as I can see, jQuery offers no built-in way to get the XML string (i.e. a serialized node) from an XML node. jQuery can handle XML documents quite well, but there is no equivalent to what .html() does in HTML documents.
So to make this work we need to extract the XML string from the XML document. Some browsers (namely IE) support the .xml property on XML nodes; the others come with an XMLSerializer object:
// find the proper XML node
var $title = $(doc).find("title");
// either use .xml or, when unavailable, an XMLSerializer
var html = $title[0].xml || (new XMLSerializer()).serializeToString($title[0]);
// result:
// '<title><span style="color:#ffff00;"><strong>Title</strong></span></title>'
Then we have to feed this HTML string to jQuery so new, real HTML elements can be created from it:
$("#target").append(html);
There is a fiddle to show this in action: http://jsfiddle.net/Tomalak/QWHj8/. This example also gets rid of the superfluous <title> element.
Anyway. If you have a chance to influence the XML itself, it would make sense to change it:
<banner-ad>
<title>&lt;span style="color:#ffff00;"&gt;&lt;strong&gt;Title&lt;/strong&gt;&lt;/span&gt;</title>
</banner-ad>
Just XML-encode the payload of <title> and you can do this in jQuery:
$("#target").append( $(doc).find("title").text() );
This would probably work:
$(responseXml).find("title").html();

Getting HTML tags after the XML Parsing?

I'm getting HTML tags in response and I want to display the response data in TextView.
Like:-(this is my data after the parsing)
like share, in 2010's ninth quarter.
international workers.
How can I show this data directly? Is there a way to remove these HTML tags after the parsing?
I also used this
[temp replaceOccurrencesOfString:@"´" withString:@"'" options:NSLiteralSearch range:NSMakeRange(0,[temp length])];
This works after parsing the data and before displaying it in the TextView.
But what about other tags and special characters?
Can anyone suggest a better way to do this?
Thanks in advance.

Ignore CDATA while xml parsing

I am new to iPhone development. I want to ignore the CDATA tag while parsing, because the parser treats the HTML tags following it as text. Since I want to display the content alone, I want my parser to ignore the CDATA tag. My source code is
[CDATA[<br /><p class="author"><span class="by">By: </span>By Sydney Ember</p><br><p>In the week since an </p>]].
Is there any way to ignore CDATA tag?
Is there any way to parse my source twice so it displays only the content?
Please give me some sample code. Please help me out. Thanks.
If you treat the CDATA content as XML instead of CDATA then your parser will throw an error (since your HTML is a weird mix of XHTML and HTML and is not well formed).
If you want to get the HTML, then parse the XML, extract the text content of the node, then parse that text as HTML.
There is no way to ignore the CDATA tag - it's part of the xml spec and parsers should honour it.
If you don't like the idea of this answer to your earlier question, you could get the contents of the CDATA section and parse them as XML again. However, this is strongly discouraged! You don't know that the contents of the CDATA are going to be valid XML (they're probably not).
If you can 100% guarantee that the CDATA section contains the form you have above, you could probably use some string manipulation to get the data out (i.e. string replace '<span class="by">By: </span>' with '') but again, this will almost certainly break if the CDATA contents change.
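For what it's worth, the string replacement mentioned above could look roughly like this in Swift. The function name removingByline is made up, and the sketch assumes the byline markup appears exactly as in the question, character for character.

```swift
import Foundation

/// Hypothetical helper: removes one known, fixed fragment from the CDATA text.
func removingByline(from html: String) -> String {
    // Exact markup taken from the question; any variation will not match.
    let byline = "<span class=\"by\">By: </span>"
    return html.replacingOccurrences(of: byline, with: "")
}

let cdataText = "<p class=\"author\"><span class=\"by\">By: </span>By Sydney Ember</p>"
print(removingByline(from: cdataText))
```

A different class name, extra whitespace, or single quotes in the feed would leave the span untouched, which is the fragility described above.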
Where is the XML coming from? A better idea is to talk to the owner of the service and get them to send, instead of the current description, something like
<description>
<author>By Sydney Ember</author>
<text>In the week since an </text>
</description>

Remove HTML entities

I am developing an application for iPhone. I need to remove HTML entities like ["<p>"] in a parsed XML response. Is there any direct way to remove all such entities?
How is the data formatted? Are the HTML entities escaped in the original XML, something like this:
<xml><content type="html">&lt;p&gt;A paragraph.&lt;/p&gt;</content></xml>
In this case you could just strip the tags with a regular expression.
Otherwise, I would suggest following the DTD of the XML file, and stripping all other tags under the assumption that they don't constitute part of the XML markup.
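As an illustration of the regular-expression route mentioned above, here is a small Swift sketch. The name strippingTags is invented, and it assumes the XML parser has already turned the escaped text back into a plain string such as "<p>A paragraph.</p>".

```swift
import Foundation

/// Hypothetical helper: drops anything that looks like a tag, then tidies whitespace.
func strippingTags(from html: String) -> String {
    // Replace each <...> run with a space so neighbouring words do not merge.
    let withoutTags = html.replacingOccurrences(
        of: "<[^>]+>", with: " ", options: .regularExpression)
    // Collapse the runs of whitespace left behind by removed tags.
    let collapsed = withoutTags.replacingOccurrences(
        of: "\\s+", with: " ", options: .regularExpression)
    return collapsed.trimmingCharacters(in: .whitespacesAndNewlines)
}

print(strippingTags(from: "<p>A paragraph.</p>"))
```

This is deliberately crude: it does not decode entities such as &amp;, and a > inside an attribute value or a comment would confuse it, so a real HTML parser is the safer choice for anything beyond simple markup.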