iPhone HTML Parsing using TouchXML and tidy

I'm trying to parse HTML using TouchXML. However, it seems that the data I want to parse (I do not control the source, it's downloaded from the internet) is partially malformed - I get various errors during the parse. Therefore, it seems that I should be using the inbuilt tidy support to fix the HTML but I cannot seem to find any documentation or information on how to enable it or link libtidy successfully into my project.
If anyone has any information on how to do this, it'd be much appreciated. Alternatively if there's another tool I could be using to do this - do tell me!

Actually, you can both link to the framework and include the headers, without needing to download the source.
Link to the existing framework libtidy.dylib
Add /usr/include/tidy to HEADER_SEARCH_PATHS

It turns out that although the framework can be linked into an Xcode project, the headers are missing. I got around this by downloading the HTML Tidy source (the src and include directories) and adding them to my Xcode project so that they compile as part of it.

Related

How to delete the old files that are created with change in the js and css files without manually cleaning them everytime using SquishIt framework

I am working on an ASP.NET MVC 2 project. Due to some limitations I cannot migrate the code to ASP.NET MVC 4, so I cannot use its Bundling and Minification feature to bundle and minify the JS and CSS files. I used the SquishIt framework for this instead. Are there any other good options besides SquishIt for this task?
I used the following url to implement SquishIt framework:
http://www.codethinked.com/SquishIt-The-Friendly-ASPNET-JavaScript-and-CSS-Squisher
How can I delete the old files that are created whenever the JS and CSS files change, without manually cleaning them up every time?
Thanks & Regards,
Santosh Kumar Patro
The best solution for this is to not include the hash in your filename (I assume you are rendering to filename_#.ext). You'd just need to remove the '#' from your rendered filename, then SquishIt will use querystring invalidation by default. This will only keep a single copy of each bundled file on disk. If querystring invalidation won't work for you there is also a relatively new option that allows you to place the hash into the path as a directory, then scrub it out with an IIS rewrite rule.
For more on SquishIt's support for different cache invalidation strategies read this

How to solve iPhone app hpple HTML Parsing 'libxml/tree.h' file not found?

I am trying to parse the HTML content of a URL using hpple in an iPhone app. I want to parse and extract data from a URL like http://www.example.com/mobile/403.html. I searched Google and found hpple for HTML parsing, and I got the sample hpple parsing code from GitHub. When I run the project, the following error occurs:
'libxml/tree.h' file not found
I have added the line ${SDKROOT}/usr/include/libxml2 under project->build->header search paths, but now it looks like this: iPhoneos3.1.2/usr/include/libxml2/**. I have also included libxml2.dylib and libxml2.2.dylib in my project. I am working in Xcode 4.2. Could you please help me solve this error? Thanks in advance.
Make sure you add this line:
${SDKROOT}/usr/include/libxml2
in TARGETS->build->header search paths...
and make sure that you copied your hpple files into your project like this...
Good luck and hope this helps!

Using TouchXML with HTML Tidy

I am trying to set up TouchXML in my iPhone app to parse HTML from a website, but unfortunately the website's HTML isn't valid XML. I'd like to use HTML Tidy to clean it up, and TouchXML has a setting, TOUCHXMLUSETIDY, which does exactly this when turned on. But when I turn on this setting, I get the following error: Tidy.h: No such file or directory found. I have libtidy.dylib linked into my target, and I tried downloading the HTML Tidy source and putting it directly into my app, but nothing is working. Any suggestions for how to tidy up HTML into valid XML on the iPhone?
Tidy.h is found in /usr/include/tidy. Add that directory to your User Header Search Paths in Xcode, and add -ltidy to your Other Linker Flags. You should be all set.
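As a rough sketch, the two settings above could also be written in an .xcconfig file. The setting names are the standard Xcode build settings behind "User Header Search Paths" and "Other Linker Flags". Using an .xcconfig file at all is an assumption here, and the file name is hypothetical; the same values can be typed into the target's Build Settings instead.

```
// Hypothetical Tidy.xcconfig; attach it to the target's build configuration.
// The path is the one given in the answer above.
USER_HEADER_SEARCH_PATHS = $(inherited) /usr/include/tidy
OTHER_LDFLAGS = $(inherited) -ltidy
```

$(inherited) keeps whatever the project already defines for each setting, so existing search paths and linker flags are preserved.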
Did you download the TouchXML?
Then you should add the files in /TouchXML/Externals/tidy/src to your project. That's the tidy!

Building libxslt for iPhone

I just had an app rejected for linking to libxslt using this technique.
I'd really like to use XSLT in my app, so it looks like my only shot is to compile it myself. I don't want to use a UIWebView because I want to store the resulting HTML, not just display it.
Has anyone done this -- compiled libxslt for the iPhone?
After some Googling, I've got an old Xcode solution from here and git-cloned the latest libxslt from gnome.org. Neither approach has worked out so far (autoconf bails out, and the Xcode project is missing a bunch of files).
Any advice would be appreciated.
Thanks!
Update 7/21
I found a workaround by giving UIWebView some XML linked to an XSL stylesheet. After the data loads,
I can grab the transformed HTML like this (source):
- (void)webViewDidFinishLoad:(UIWebView *)webView
{
    // Grab the transformed markup once the XML and its XSL stylesheet have loaded.
    html = [webView stringByEvaluatingJavaScriptFromString:
            @"document.documentElement.outerHTML;"];
}
That said, if anyone has any hints on using libxslt directly, I'd love to hear them. Dropping into JavaScript seems so...unsavory.
You're not allowed to add dylibs, but you can link to a .a file. Can you get the configure for libxslt to output a .a file instead?

iPhone RSS Reader -- parseXML won't Load some XML feeds

I am using the SIMPLE RSS reading example found at http://theappleblog.com/2008/08/04/tutorial-build-a-simple-rss-reader-for-iphone/
It uses parseXML to load the RSS feeds.
Here is the problem I am having: the following RSS feed will not load. The reader comes up with an error saying that it cannot connect. However, the feed works fine in my Mac RSS reader, so I know the link is good.
Any ideas on why it cannot load this particular feed but it can load others fine?
http://www.okstate.com/rss.dbml?db_oem_id=200&media=news
Thanks.
I've just released an open source RSS/Atom Parser for iPhone and hopefully it might be of some use.
I'd love to hear your thoughts on it too!
In my experience, HTML markup causes an RSS parser to fail in most cases. I've experienced a problem like this with a lot of parser classes I've come across (in search of the ultimate one, which I didn't find)
My guess is that entities such as
's
are responsible for your crash. That was usually the case with my crashes. This also led to my decision to create a 'proxy server' to pre-parse the XML before sending it to the iPhone (which gives me the advantage of caching, scaling, and some other stuff). I do believe there are solid solutions out there, but it is always difficult to write a parser for so many RSS implementations.
P.S.: W3C validates this feed as 'valid', so it really is 'our' problem.
Your problem could lie with:
Unicode characters (i.e. I see some o's with two dots above them in the feed)
The code you have doesn't respect CDATA sections correctly
To find out which is the case, save the feed file to your local disk and load it via your code to make sure the error happens.
Do a binary search on the file to find out if a particular RSS entry is causing the problem (i.e. remove all but the first rss entry and see if the problem exists. If it does, then the problem is there, if it doesn't put half the rss entries back in the file and repeat)
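The binary search described above can be sketched in plain C, which also compiles inside an Objective-C project. This is only an illustration under assumptions: parse_fails_fn is a hypothetical callback that you would implement by writing out a copy of the saved feed containing only the given range of entries and running your parser on it, and demo_fails is a stand-in used to exercise the search. It assumes that a range fails exactly when it contains a bad entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns nonzero when parsing a feed trimmed to the
   entries [first, first + count) fails. */
typedef int (*parse_fails_fn)(size_t first, size_t count, void *ctx);

/* Returns the index of one entry that makes the parse fail.
   Precondition: the full feed of n entries (n > 0) fails to parse. */
size_t find_bad_entry(size_t n, parse_fails_fn parse_fails, void *ctx)
{
    size_t lo = 0, hi = n;            /* invariant: range [lo, hi) fails */
    assert(n > 0);
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (parse_fails(lo, mid - lo, ctx)) {
            hi = mid;                 /* a bad entry is in the first half */
        } else {
            lo = mid;                 /* first half is clean, so look in the rest */
        }
    }
    return lo;
}

/* Stand-in predicate for illustration: ctx points at the index of the single
   bad entry, and a range "fails" when it contains that index. */
int demo_fails(size_t first, size_t count, void *ctx)
{
    size_t bad = *(const size_t *)ctx;
    return bad >= first && bad < first + count;
}
```

With eight entries and the bad one at index 5, find_bad_entry(8, demo_fails, &bad) returns 5 after three probes, instead of testing each entry in turn.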
I've been experiencing a similar issue. I haven't yet pinned down the answer, but I've noticed that RSS 2 tends to parse more successfully than the rest.
There are many RSS feeds that contain invalid XML, usually because they were hacked together on the server side using HTML templates by somebody who didn't understand XML. I've seen improperly escaped (or non-escaped) HTML post contents, missing close tags, badly nested tags, and so on.
If you want to be able to parse arbitrary feeds, you have to clean up bad XML. The usual way is to use the "htmlTidy" library, which is included in the OS. This can clean up XML as well as HTML.
The example you're following uses NSXMLParser; I have no idea why. It's a lower-level API and it doesn't support tidying. I would suggest using NSXMLDocument instead. There's an option flag in that API (NSXMLDocumentTidyXML, or NSXMLDocumentTidyHTML for HTML input) that tells it to run tidy when parsing. This API also returns the XML as a handy tree of elements that's easy to work with.