REGEX to Parse Text From Webpage in iOS? - iphone

I am working on updating some of my iOS blog apps to get beyond using UIWebViews that load the article URL. I would like to do something more along the lines of Engadget or TUAW where the text from the article and the main picture is all that is loaded when you click on an article from the tableview, but I am having issues getting only the text of the article.
I tried using some DOM Property codes to get the innerBodyText of the HTML, but no matter what DOM property I try, I end up getting header/footer info, advertisements, and more thrown in with the article.
Is there a simple way in iOS using REGEX or something else to get just the text of the article?

I'd recommend looking into Hpple.
See this thread for more info.

Related

Parse BBCode in an iOS app

I'm writing an app for a forum. I can get the posts as HTML but I need to do lots of custom things with the posts as I'm not displaying it in UIWebView but natively as rich text (custom handling of [youtube][/youtube] tags). So I am instead getting the much cleaner BBCode output of the posts.
This tutorial seems to fit my needs well enough, however there are some obvious problems with it. One is that if the user types malformed BBCode, I get back bad HTML, for example when the closing [/b] is left out.
I am thinking I may just need to loop through the outputted HTML and track if there is an unclosed tag at the end, however I was hoping that there might be a better way to parse BBCode on the iPhone.
Lastly, I know that the approach outlined above is probably the wrong one, but every Stack Overflow question I've found on BBCode parsing says not to reinvent the wheel and to just use an existing PHP library. Of course, this is an iOS app, so I can't use any code written in PHP.
The question is, what is the best way to parse BBCode on iOS (and if there isn't a library or example available then is there a tutorial on writing a good quality one yourself)?
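The unclosed-tag tracking mentioned in the question can be done while converting rather than as a second pass over the HTML. Below is a minimal sketch, not a full BBCode parser: it assumes a tiny tag set ([b], [i], [u]), and the function names `bbcodeToHTML` and `escapeHTML` are made up for illustration. A stack remembers which tags are open, so missing closers are added at the end and stray closers are kept as literal text.

```swift
import Foundation

/// Escapes the characters that would otherwise be read as HTML markup.
func escapeHTML(_ text: Substring) -> String {
    return String(text)
        .replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
}

/// Converts a small, assumed subset of BBCode to HTML.
/// Unclosed tags are closed automatically; unknown or stray tags stay as text.
func bbcodeToHTML(_ input: String) -> String {
    let htmlNames = ["b": "strong", "i": "em", "u": "u"]
    var output = ""
    var openTags: [String] = []          // stack of currently open BBCode tags
    var rest = Substring(input)

    while let open = rest.firstIndex(of: "["),
          let close = rest[open...].firstIndex(of: "]") {
        output += escapeHTML(rest[..<open])
        let token = rest[rest.index(after: open)..<close].lowercased()
        let isClosing = token.hasPrefix("/")
        let tag = isClosing ? String(token.dropFirst()) : token

        if let name = htmlNames[tag], !isClosing {
            openTags.append(tag)
            output += "<\(name)>"
        } else if isClosing, let index = openTags.lastIndex(of: tag) {
            // Close everything opened after the matching tag so nesting stays valid.
            while openTags.count > index {
                let popped = openTags.removeLast()
                output += "</\(htmlNames[popped] ?? popped)>"
            }
        } else {
            // Unknown tag or closer without an opener: keep it as literal text.
            output += escapeHTML(rest[open...close])
        }
        rest = rest[rest.index(after: close)...]
    }

    output += escapeHTML(rest)
    // Close whatever the user forgot to close, innermost first.
    for tag in openTags.reversed() {
        output += "</\(htmlNames[tag] ?? tag)>"
    }
    return output
}
```

For example, `bbcodeToHTML("[b]bold [i]both")` yields balanced markup even though neither tag was closed in the input. Tags with arguments, such as [url=...], would need a richer tokenizer than this sketch provides.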

How to Delete nodes from HTML in iOS

What I am trying to do is load a webpage into a UIWebView. The problem is that I need to do some preprocessing on the HTML before displaying it in the web view.
The UIWebView loadHTMLString is quite slow when the HTML is big. I don't need to display the full page, so I am trying to remove some HTML nodes before displaying it in the web view to speed up loading.
I don't think using regex for that is a wise idea. I checked out NSXMLParser and TFHpple but I couldn't find any way to remove nodes from the HTML tree using an XPath or something similar.
I know I could do that using JavaScript but that won't solve my problem. I also have no control over the website, so I can't edit the webpage itself.
Is there something as easy as deleteNodeUsingXPath or something :)
Cheers and thanks a lot for your help in advance.
One possible solution: set up a proxy website that strips out the unwanted content. The iPhone requests the proxy URL; the proxy loads the page from the original website, strips out what you don't need, and replies with the rest.
There is a tool called Objective-C-HTML-Parser that will do what you are looking for. The documentation is thorough, and the implementation is pretty straight-forward.
Basically, you take your HTML string and make an HTMLParser object that you can then manipulate however you want. It is a very powerful library that basically lets you do whatever you want with HTML with easy-to-use Objective-C APIs.
Good luck!

Grabbing Text from a website using xcode

What I am aiming to do is make an app that grabs text from a website and auto-updates when the text on the website changes. Does anyone have any ideas or solutions that could help?
I think you can use JavaScript to extract the text you need. You can also use NSRegularExpression (but that is the worse option). For auto-updating, make your object the delegate of the UIWebView and check the text in the didFinished… method.
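To illustrate the NSRegularExpression option (and why it is the weaker one), here is a minimal sketch. It assumes the text you want lives in simple, non-nested <p> elements; the function name `paragraphTexts(inHTML:)` is invented for this example. Real pages often break these assumptions, in which case an HTML parser is the safer choice.

```swift
import Foundation

/// Returns the text of every <p>…</p> element, with inner tags removed.
/// Brittle by design: assumes well-formed, non-nested <p> elements.
func paragraphTexts(inHTML html: String) -> [String] {
    guard let paragraph = try? NSRegularExpression(
              pattern: "<p\\b[^>]*>(.*?)</p>",
              options: [.caseInsensitive, .dotMatchesLineSeparators]),
          let anyTag = try? NSRegularExpression(pattern: "<[^>]+>", options: []) else {
        return []
    }

    let fullRange = NSRange(html.startIndex..., in: html)
    let matches = paragraph.matches(in: html, options: [], range: fullRange)

    return matches.compactMap { match -> String? in
        // Capture group 1 is the content between <p ...> and </p>.
        guard let range = Range(match.range(at: 1), in: html) else { return nil }
        let inner = String(html[range])
        // Drop nested markup such as <b> or <a href="...">.
        let stripped = anyTag.stringByReplacingMatches(
            in: inner,
            options: [],
            range: NSRange(inner.startIndex..., in: inner),
            withTemplate: "")
        let text = stripped.trimmingCharacters(in: .whitespacesAndNewlines)
        return text.isEmpty ? nil : text
    }
}
```

Content outside <p> elements (navigation, footers in <div>s) is skipped, but HTML entities such as &amp; are left undecoded, which is one more reason to treat this as a stopgap.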
Do you mean an iPhone web app? If so do the following:
<head></head><body><?php include ("pathtocontent/content.php") ?></body>
And save as e.g. index.php
Put in the content.php file:
<p>content.</p>
Or anything else you want as content.
Then in the iPhone website do the same. Then you only have to update content.php in order to change both websites.
Look at NSURLConnection: send a request to the website, and the data will be in the response.
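Once the response data arrives (via NSURLConnection as suggested, or URLSession on newer SDKs), the auto-update part comes down to comparing it with what you saw last time. A minimal sketch of that comparison follows; the type name `TextChangeDetector` is hypothetical, and it assumes the response is UTF-8 text and that whitespace-only differences should be ignored.

```swift
import Foundation

/// Remembers the last text seen and reports whether a new response differs.
struct TextChangeDetector {
    private(set) var lastText: String?

    /// Pass in the body of the latest response.
    /// Returns true when the text changed and the UI should be refreshed.
    mutating func update(with responseData: Data) -> Bool {
        // Assumption: the server sends UTF-8; undecodable data is ignored.
        guard let text = String(data: responseData, encoding: .utf8) else {
            return false
        }
        let normalized = text.trimmingCharacters(in: .whitespacesAndNewlines)
        guard normalized != lastText else { return false }
        lastText = normalized
        return true
    }
}
```

You would call `update(with:)` each time a scheduled request completes, and reload the label or text view only when it returns true.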

Is it possible to get the page number and line number in iPhone

Is it possible to get the page number and line number of particular text from doc/rtf/pdf in iPhone sdk?
Can QLPreviewController or UIDocumentInteractionController be of any help?
EDIT:
I am trying to create something like iAnnotate or Sente: an application in which the user can select some text and add comments to the selection.
I have gone through the FastPdfKit APIs, which seem to be the only ones that can be of some help.
Can you guys guide me in the right direction?
https://github.com/brow/leaves
Take a look at this link. It will help you build a PDF reader, and you can see there how I handle the total page count and the current page.
If you're looking at doing this with PDF, RTF and DOC it might be best to approach it in a different way as they are all very different formats.
Instead - if you consider attaching comments to an area of the page as opposed to a specific text selection then all you need to know about the document is which page you're looking at. A much easier task. Then you get the user to 'drag out' a comments box for an area of that page. Think of them more like sticky-notes.
This way you can add comments to images as well as text and it allows a much more flexible system for support of other file formats in the future.
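As a rough sketch of the sticky-note idea, the data model only needs a page number, an area, and the comment text. The type names `PageComment` and `CommentStore` are invented here, and storing the area in normalized page coordinates (0 to 1) is an assumption so that the notes survive zooming and different screen sizes.

```swift
import Foundation

/// A sticky-note style comment tied to an area of a page, not to selected text.
struct PageComment {
    let page: Int        // 1-based page number
    let area: CGRect     // assumed: normalized page coordinates in 0...1
    let text: String
}

/// Keeps comments grouped by page so a page view can fetch only its own notes.
struct CommentStore {
    private var commentsByPage: [Int: [PageComment]] = [:]

    mutating func add(_ comment: PageComment) {
        commentsByPage[comment.page, default: []].append(comment)
    }

    func comments(onPage page: Int) -> [PageComment] {
        return commentsByPage[page] ?? []
    }
}
```

Because nothing in the model depends on the file format, the same store could back PDF, RTF, or DOC viewers, as long as each viewer can report the current page.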
I realise this isn't a direct answer to the question, but thought it relevant enough to post.

Access to hyperlink in a pdf(iPhone)?

I am working on a small app that can open a PDF. My question is whether it is possible to access a hyperlink in a PDF. When I tap a hyperlink, nothing happens and I am not redirected to the link. I have searched a lot but didn't have any luck. I'm hoping for a quick response, as my work is being delayed by this issue. If it is possible, what would be the approach? Right now I am using UIWebView to open the PDF. Any sample app or code would be a great help.
Thanks for your time.
The only way I know is to use a third party library (or perhaps parse the PDF yourself), find the links, get their rects and catch user taps and compare that to the list of link/hyperlinks/gotos for that particular page.
Unfortunately, there is nothing in the SDK that can help. You can search for an open-source PDF library, such as MuPDF.
There is a reason why not many applications have that functionality :D
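The tap-matching step described above (compare the user's tap against the link rects for the current page) is the easy part once a PDF library has given you the rects. Here is a minimal sketch of it; `PDFLinkRegion` and `linkTarget(at:onPage:in:)` are hypothetical names, and it assumes the rects have already been converted to the same coordinate space as the tap point.

```swift
import Foundation

/// One link area on a page. In a real app these would come from a PDF library.
struct PDFLinkRegion {
    let page: Int
    let rect: CGRect     // assumed: same coordinate space as the tap point
    let url: URL
}

/// Returns the URL of the first link on `page` whose rect contains `point`, if any.
func linkTarget(at point: CGPoint, onPage page: Int, in regions: [PDFLinkRegion]) -> URL? {
    return regions.first { $0.page == page && $0.rect.contains(point) }?.url
}
```

In a tap handler you would convert the touch location into page coordinates, call `linkTarget(at:onPage:in:)`, and open the returned URL if there is one. Extracting the link rects from the PDF is the harder part and is not covered by this sketch.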