I'm looking for detailed instructions on how to parse the contents of an HTML table into a UITableView.
I have the data from the NSURLConnection, but I only need a specific part of the table.
Thanks!
You may use one of the DOM XML parsers, like GData, KissXML or TouchXML, together with
its XPath API (all three seem to support it) to extract your data.
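To illustrate the DOM plus XPath idea independently of any of those libraries, here is a minimal sketch in Python using the standard library's ElementTree, which supports a limited XPath subset. The page content, the table id `prices` and the function name `table_rows` are assumptions made up for the example, and the input is assumed to be well-formed XHTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML standing in for the downloaded page.
PAGE = """<html><body>
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.95</td></tr>
</table>
</body></html>"""

def table_rows(xhtml, table_id):
    """Return the data rows of the table with the given id as lists of cell text."""
    root = ET.fromstring(xhtml)
    rows = []
    # Select only the rows of the one table we care about.
    for tr in root.findall(f".//table[@id='{table_id}']/tr"):
        cells = [(td.text or "").strip() for td in tr.findall("td")]
        if cells:  # header rows contain only <th>, so they yield no cells
            rows.append(cells)
    return rows

rows = table_rows(PAGE, "prices")
```

Each inner list would then map to one table view row. With TouchXML or KissXML the query string would be similar, but the calls differ.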
Related
I just implemented my first XmlParser object (MyParserObj) that relies on the NSXMLParser object.
This parser is embedded inside a tableviewController (MyTableViewController) and it starts parsing at MyTableViewController's viewDidLoad method.
OK. This is working just fine. It's a small XML file, though! I was wondering if I should choose a different approach when dealing with big XML files. Will memory suffer when parsing large XML documents?
UPDATE
The real point I want to understand is the flow of the process:
I placed a few breakpoints and it looks like this:
The app first encounters (inside MyTableViewController's viewDidLoad method) the [MyParserObj parseXMLFileAtURL:path] call and starts parsing the XML document;
The app finishes parsing the whole XML document (parserDidEndDocument..);
The tableviewController starts populating its tableView cells (cellForRowAtIndexPath..);
Apart from choosing an XML parser (among those you suggested) that is more or less time/memory consuming, are the above steps going to be the same?
If so, is it correct to think of starting to populate the cells as soon as the parser is done with a specific element? How do I do that?
thanks
Luca
Jim Dovey wrote a nice blog post about parsing big XML files.
http://blog.alanquatermain.me/2009/04/06/aqxmlparser-equals-equals-big-memory-win/ Here Jim describes his own XML parser, which uses less memory than other parsers.
So if you need to parse large XML files, I would suggest you have a look at the open source XML parser AQXMLParser, which is the parser created in the blog post.
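The incremental idea raised in the question (handing each record over as soon as its element closes, instead of waiting for the end of the document) can be sketched in a language-neutral way. This Python example uses the standard library's `iterparse` as a stand-in for an event-driven parser; it is not AQXMLParser's API, and the element name `item` and the function name `stream_items` are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

def stream_items(source, tag):
    """Yield the text of each <tag> element as soon as its end tag is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield (elem.text or "").strip()
            # Drop the element's text and children once it has been handled.
            elem.clear()

# Hypothetical small document; a real one would be read from a file or socket.
SAMPLE = b"<items><item>one</item><item>two</item><item>three</item></items>"

titles = []
for text in stream_items(io.BytesIO(SAMPLE), "item"):
    titles.append(text)  # in an app, this is where a new row would be added
```

In an iOS app the equivalent step would be appending to the data source in the parser's end-element callback and then asking the table view to refresh on the main thread.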
On iPhone devices, you don't have NSXMLDocument, which is available on the Mac and reads the whole XML document into memory. There are many XML parser classes available, and you can find them in the link below.
http://www.raywenderlich.com/553/how-to-chose-the-best-xml-parser-for-your-iphone-project
This tutorial on Ray Wenderlich's site is what you need to read to know which XML parser is best for your app.
For my own applications, I use SMXMLDocument, you can read about it here:
http://nfarina.com/post/2843708636/a-lightweight-xml-parser-for-ios
For big data, I suggest you use JSON instead of XML; see this tutorial:
http://www.readwriteweb.com/hack/2010/11/json-vs-xml.php
I want to parse an XML file independently of its tags, so the parsing code should be generic. Is there any way to do this on the iPhone that is flexible with tags? I have tried to solve this problem by parsing the XML twice: in the first pass I extracted only the tags, and in the second pass I tried to find the value of each tag. But there were problems with this approach. So is there any API or logic to parse XML independently of tags? Is it really possible?
In iOS there is no existing API or logic for parsing XML files with unknown tags. To parse XML you need to know the tags of that XML.
Essentially, I currently have an iPhone app that can query and parse an XML file on my server. Right now, I currently have to manually update and upload my XML file every morning so my users can have the updated information. I would like to automate this process, which would essentially entail parsing various websites (NYTimes, iAmBored.com, etc), outputting the relevant information from each of these websites to an XML file, and uploading that file to my server.
Does anyone know the best way to accomplish this (parsing HTML to an XML file)? Since I am a beginner, I'm not sure what languages this requires or what the best way to do this is.
Thanks a lot in advance!
You can try to translate HTML to XHTML (XHTML is based on XML, so it's XML with some rules defined in a DTD).
You can also try to parse HTML directly with an SGML parser (just as XHTML is based on XML, HTML is based on SGML).
The links are provided as inspiration.
If the content you need to scrape is in XHTML, then you can easily use the XSLT language to transform the original content into what you need inside the XML you provide to your users.
Otherwise, any kind of scraping and XML-producing solution will be fine; every programming language has support for such things. You could, for example, use XPath to select the elements you need from the page and then save them in the output file.
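As a rough sketch of the select-then-save approach, assuming the scraped page is well-formed XHTML: the Python below picks elements with an XPath-style query and writes them into a new XML document. The `story` class, the `feed` and `headline` element names and the function name `build_feed` are all invented for the example, not taken from any real site.

```python
import xml.etree.ElementTree as ET

# Hypothetical scraped page; real pages would need to be well-formed for this parser.
SCRAPED = """<html><body>
<div class="story"><h2>First headline</h2></div>
<div class="ad"><h2>Buy now</h2></div>
<div class="story"><h2>Second headline</h2></div>
</body></html>"""

def build_feed(xhtml):
    """Select the story headlines and serialize them as a small XML document."""
    page = ET.fromstring(xhtml)
    feed = ET.Element("feed")
    # Keep only headings inside elements marked as stories.
    for h2 in page.findall(".//div[@class='story']/h2"):
        ET.SubElement(feed, "headline").text = h2.text
    return ET.tostring(feed, encoding="unicode")

feed_xml = build_feed(SCRAPED)
```

The resulting string could be written to a file and uploaded by a scheduled job, which would replace the manual morning update.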
Can you get what you need from the RSS/Atom feeds? That will simplify things greatly because they are XML rather than HTML and can be parsed by a standard XML parser. Of course, descriptions embedded inside RSS feeds will be HTML, so depending on your application, that may be when you need to parse HTML.
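For the feed route, a standard XML parser is enough. A minimal Python sketch, assuming a plain RSS 2.0 layout; the feed content and the function name `feed_items` are made up for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document standing in for a downloaded feed.
RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>Story A</title><link>http://example.com/a</link></item>
<item><title>Story B</title><link>http://example.com/b</link></item>
</channel></rss>"""

def feed_items(rss_text):
    """Return (title, link) pairs for every <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

items = feed_items(RSS)
```

Atom feeds use namespaced `entry` elements instead of `item`, so the lookups would need adjusting for those.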
XSLT is a domain-specific programming language designed for processing XML, but you can also use any programming language that includes an XML parser for the task.
TagSoup - Just Keep On Truckin'
...a SAX-compliant parser written in Java
that, instead of parsing well-formed
or valid XML, parses HTML as it is
found in the wild: poor, nasty and
brutish, though quite often far from
short.
TagSoup is designed for people
who have to process this stuff using
some semblance of a rational
application design.
By providing a SAX
interface, it allows standard XML
tools to be applied to even the worst
HTML. TagSoup also includes a
command-line processor that reads HTML
files and can generate either clean
HTML or well-formed XML that is a
close approximation to XHTML.
Also, Taggle, a TagSoup in C++, available now
I'm using a PHP file to get data into my application.
In this file I post data to the server, and the data I get back from the server
is in HTML format.
So the problem is that I have a string with HTML tags. How do I use the data in that string?
How do I extract data from an HTML string?
Use the NSXMLParser class. It works for HTML too, as long as the markup is well-formed XML. There are three useful delegate methods.
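The three-callback pattern (element start, characters found, element end) looks roughly like this. The sketch uses Python's `xml.sax` as a stand-in for NSXMLParser, so the method names are Python's rather than the Objective-C delegate methods; the class name `TextCollector` and the sample response are assumptions, and the input must be well-formed.

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Collect the text content of every element with a given name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.inside = False
        self.buffer = []
        self.results = []

    def startElement(self, name, attrs):
        # Start collecting when the wanted element opens.
        if name == self.wanted:
            self.inside = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.inside:
            self.buffer.append(content)

    def endElement(self, name):
        # The element is complete: store the joined text.
        if name == self.wanted:
            self.results.append("".join(self.buffer).strip())
            self.inside = False

# Hypothetical well-formed server response.
RESPONSE = b"<div><p>Hello</p><p>World &amp; more</p></div>"
handler = TextCollector("p")
xml.sax.parseString(RESPONSE, handler)
```

The accumulate-then-store split matters because a parser is free to deliver one text node in pieces.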
If your HTML output is simple data, maybe you can write a simple NSString parser yourself, like 'markhunte' mentioned; if you have large, complex data in HTML, then you have to go for one of the open source parsers.
Cocoa does not provide an HTML parser. Forum discussions claim that in some cases the XML parser itself works for you, but I never got it working for my data.
In my case I had very simple tags, which I handled with my own parser using NSString.
I have used the code from --> Flatten-html-content-ie-strip-tags-cocoaobjective-c.
There are also examples of its use on SO.
Just use NSScanner; it is great for searching between tags that are permanent. If you post some page code, I can help you set up the scanner.
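The scanning approach boils down to "find the opening marker, read up to the closing marker, repeat". A small Python sketch of that loop, with plain string searching standing in for NSScanner; the function name `scan_between` and the sample markup are assumptions:

```python
def scan_between(text, start_marker, end_marker):
    """Return every substring found between start_marker and end_marker."""
    found = []
    pos = 0
    while True:
        start = text.find(start_marker, pos)
        if start == -1:
            break  # no more opening markers
        start += len(start_marker)
        end = text.find(end_marker, start)
        if end == -1:
            break  # opening marker without a closing one
        found.append(text[start:end])
        pos = end + len(end_marker)
    return found

# Hypothetical fragment with fixed, predictable tags.
HTML = "<b>Name:</b> <span>Ann</span><br><b>City:</b> <span>Oslo</span>"
values = scan_between(HTML, "<span>", "</span>")
```

This only holds up while the surrounding tags stay exactly the same, which is the "permanent tags" condition mentioned above.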
Is there a way to parse a website's source on the iPhone to get the URLs of photos on that page? If so, how would you do that?
Thanks
I'd say go for regular expressions - there is a one-page library that wraps C regexes that you can drop into your project.
I recommend regular expressions. There's a great open source Regex library for Cocoa called RegexKit. For the most part, you can just drop it in your code and it'll "just work".
Getting all the urls of images wouldn't be too difficult (less than 20 lines of code) if you assume that all images are going to be in <img> tags. You'd just grab all the image tags (something like: <img\s+[^>]+>), then iterate through those matches. For each match, you'd pull out whatever's in the src attribute: src\s*=\s*("|')?\s*([^\s"']+)(\s|"|')
You might need to tweak that a bit, but it shouldn't be too bad.
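Here is how the two patterns above fit together, sketched in Python rather than RegexKit so it can be tried quickly. The patterns are the ones from this answer; the function name `image_urls` and the sample page are assumptions.

```python
import re

# The two patterns from the answer: one for whole <img> tags, one for src.
IMG_TAG = re.compile(r"<img\s+[^>]+>", re.IGNORECASE)
SRC_ATTR = re.compile(r"""src\s*=\s*("|')?\s*([^\s"']+)(\s|"|')""", re.IGNORECASE)

def image_urls(html):
    """Return the src value of every <img> tag, in document order."""
    urls = []
    for tag in IMG_TAG.findall(html):
        match = SRC_ATTR.search(tag)
        if match:
            urls.append(match.group(2))  # group 2 is the URL itself
    return urls

# Hypothetical page fragment with mixed quoting and tag case.
PAGE = '<p><img src="http://example.com/a.png" alt="A"> <IMG class="x" src=\'b.jpg\' /></p>'
urls = image_urls(PAGE)
```

One tweak that may be needed: an unquoted src that is the last attribute before `>` is not matched by the second pattern as written.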
There is no super easy way. When I had to do it I wrote a libxml2 SAX parser. libxml2 has an html reader that works fairly well with malformed html, and libxml2 is included with the base system.
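To show the shape of that approach without the libxml2 C API, here is a sketch using Python's built-in `html.parser`, which is also event-driven and also tolerates sloppy markup. It is a stand-in, not libxml2; the class name `ImageCollector` and the sample markup are assumptions.

```python
from html.parser import HTMLParser

class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

# Hypothetical malformed page: unclosed <p>, uppercase tag, unquoted attribute.
MESSY = '<html><body><p>Unclosed paragraph<img src="one.gif"><IMG SRC=two.png><br></body>'
collector = ImageCollector()
collector.feed(MESSY)
collector.close()
```

A libxml2 version would do the same work in its start-element callback.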
You could try it using regular expressions, but I wouldn't recommend that. You should have a look at NSXMLParser, assuming the webpage is coded to be XHTML compliant. TouchXML is another good library.
take a look at Event Driven XML Parsing in the iPhone reference library
Are you OK with an approach that does not pick up images loaded dynamically via JavaScript?
The closest thing I could see working is to parse out any JavaScript imports, load those up too, and then use a regular expression across the whole file looking for anything that ends in ".jpg/.gif/.png" and grab the full URL out from that. The libxml approach would miss out on references to images not in img tags, but it might well be good enough.
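A rough Python sketch of the "regex across the whole file" idea. The pattern is my own guess at something workable, not a tested production expression; it only finds absolute http/https URLs, and the function name `find_image_urls` and the sample source are assumptions.

```python
import re

# Absolute URLs ending in a common image extension, wherever they appear.
IMAGE_URL = re.compile(r"""https?://[^\s"'<>()]+?\.(?:jpg|gif|png)\b""", re.IGNORECASE)

def find_image_urls(source):
    """Return unique image URLs in order of first appearance."""
    seen = []
    for url in IMAGE_URL.findall(source):
        if url not in seen:
            seen.append(url)
    return seen

# Hypothetical page source mixing markup, script and inline CSS.
SOURCE = '''
<img src="http://example.com/a.jpg">
<script>var hero = 'http://example.com/img/hero.png'; load("http://example.com/a.jpg");</script>
<div style="background:url(http://example.com/bg.gif)"></div>
'''
found = find_image_urls(SOURCE)
```

Relative paths and URLs assembled at runtime by JavaScript would still be missed.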