How to parse HTML to PlainText while maintaining paragraph formatting - iphone

I have an iOS app that pulls data from a RESTful web service. Part of the content I receive is loaded into a UITextView, and that part arrives as HTML. I need to convert it from HTML to plain text while using the paragraph tags to format the text view properly.
Here is what the HTML looks like:
<p data-seq="1"><span class="paragraph">Content of paragraph 1</span></p><p data-seq="2"><span class="paragraph">Content of paragraph 2</span></p>
You can see that <p data-seq="2"><span class="paragraph">....</span></p> designates the start and end of the paragraph.
I initially tried using NSScanner, following the example in How to convert NSString HTML markup to plain text NSString?. This was quick to implement, but it strips all tags and parses the text as one long paragraph.
I have added libxml2 to my code. I started following this tutorial for implementation, but after I started working through it I wasn't sure how to format the output into paragraphs.
I have also seen recommendations for the DTCoreText library but I didn't see a lot of info on it.
Could someone possibly throw up a snippet using any of the above three options or one of their own on how to parse html into plain text while maintaining the paragraphs?
SOLUTION
Per lxt's recommendation I investigated DTCoreText. Once I managed to get it installed in my app (I definitely recommend CocoaPods for that), it was as easy as adding #import "DTCoreText.h" to my detailViewController and then the lines below to add it to the UITextView.
NSDictionary *options = @{DTUseiOS6Attributes: [NSNumber numberWithBool:YES]};
NSData *htmlData = [self.htmlString dataUsingEncoding:NSUTF8StringEncoding];
NSAttributedString *stringArticle = [[NSAttributedString alloc] initWithHTMLData:htmlData options:options documentAttributes:NULL];
self.newsDetailText.attributedText = stringArticle;
The first build failed because I didn't include the DTUseiOS6Attributes line. The second build succeeded and the detail view was perfectly formatted. It was a fist pump moment! Thanks again for the recommendation lxt!

I would honestly recommend using DTCoreText rather than writing your own parser. There's no real benefit reinventing the wheel, and it's also a widely used library with a large user base.
I am surprised you had trouble finding info about it. The library has very good documentation available, and the author is also pretty active on Twitter (@cocoanetics).
You can use the nifty DTAttributedTextView class provided in place of your UITextView. The library also provides a category that extends NSAttributedString with an initWithHTMLData:documentAttributes: method. This will let you create your attributed string and plug it into your view. It's really no more than a couple of lines of code.

Related

App not displaying 'á' properly in a UILabel

Why does my app draw 'á' as '&aacute' in a UILabel?
I parse the text off a webpage and then draw the text into a label.
Is there something I am missing?
Many Thanks
-Code
This is because the data in the web page is HTML entity encoded, so that á is expressed as &aacute;. However, as the UILabel doesn't parse/display HTML, it simply displays the content as-is.
As such, you'd need to entity decode the data (to convert &aacute; back to á) prior to displaying it. The existing HTML character decoding in Objective-C / Cocoa Touch question/answers (and other questions it links to) should be of some assistance.
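As a rough illustration of the decoding step described above, here is a minimal sketch that only uses Foundation. The helper name DecodeBasicHTMLEntities and the small entity table are assumptions made for this example; real pages can contain many more named and numeric entities, so treat this as a starting point rather than a complete decoder.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes a small, hand-picked set of named HTML entities.
static NSString *DecodeBasicHTMLEntities(NSString *input) {
    // Illustrative table only; extend it for the entities your pages use.
    NSDictionary *entities = @{
        @"&aacute;": @"á",
        @"&eacute;": @"é",
        @"&lt;": @"<",
        @"&gt;": @">",
        @"&quot;": @"\""
    };
    NSMutableString *result = [input mutableCopy];
    for (NSString *entity in entities) {
        [result replaceOccurrencesOfString:entity
                                withString:entities[entity]
                                   options:NSLiteralSearch
                                     range:NSMakeRange(0, result.length)];
    }
    // Decode &amp; last so that "&amp;lt;" becomes "&lt;" and not "<".
    [result replaceOccurrencesOfString:@"&amp;"
                            withString:@"&"
                               options:NSLiteralSearch
                                 range:NSMakeRange(0, result.length)];
    return result;
}
```

The decoded string can then be assigned to the label's text property as usual.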

Make portion of NSString italicized or bold, not whole string

How would I go about italicizing a single word in an NSString which will be displayed in a UILabel? Specifically, I don't want all text to be italicized, just one word.
Thanks!
(Edited) I meant UILabel, not UITextField.
I don't think that what you are asking to do is possible (I'd be happy to be proven wrong). However, this library (https://github.com/facebook/three20/) is a popular way to achieve the same result in a UILabel (not a text field). The library works fairly well, but it does have a lot of limitations, especially in edge cases, and of course it comes with associated overhead.
I'd encourage you to think about other ways of achieving the same user outcome. Can Placeholder text help? How about hints next to your text field?
Good luck.
A native UILabel does not support NSAttributedString, which is what is normally used to display strings with formatting. You could try to output the text yourself using Core Text, but I would suggest checking out FontLabel or the Three20 project mentioned by @JJ Rohrer.
Use NSAttributedString... Find controllers to draw NSAttributedString, since UILabel won't support NSAttributedString.
Controller for NSAttributedString
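For readers on newer SDKs: from iOS 6 onward UILabel gained an attributedText property, so the NSAttributedString route can be sketched without a third-party view. The helper name ItalicizeWordInLabel below is made up for this example, and the sketch assumes iOS 6 or later; on older systems the libraries mentioned above are still the way to go.

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper: shows `sentence` in `label` with the first
// occurrence of `word` set in italics. Assumes iOS 6 or later.
static void ItalicizeWordInLabel(UILabel *label, NSString *sentence, NSString *word) {
    // Start with the label's current font applied to the whole string.
    NSMutableAttributedString *styled =
        [[NSMutableAttributedString alloc] initWithString:sentence
                                               attributes:@{NSFontAttributeName: label.font}];
    NSRange wordRange = [sentence rangeOfString:word];
    if (wordRange.location != NSNotFound) {
        // Override the font for just the matched word.
        UIFont *italic = [UIFont italicSystemFontOfSize:label.font.pointSize];
        [styled addAttribute:NSFontAttributeName value:italic range:wordRange];
    }
    label.attributedText = styled;
}
```

A call such as ItalicizeWordInLabel(myLabel, @"Only one word is slanted", @"one") would leave the rest of the text in the label's regular font.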

Extract image URL from HTML text

In Objective-C what is the best method to extract an image url from HTML text?
I've got a chunk of HTML text which may contain one or more images. I want to be able to get the src URL of each image.
If you don't mind requiring iOS 4 or later, you can use the text checking APIs introduced there. The ones you want are NSTextCheckingTypeLink (used with NSDataDetector) and the NSRegularExpression class.
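A minimal sketch of the NSRegularExpression route, assuming reasonably well-formed markup with quoted src attributes. The function name ImageSourcesInHTML and the pattern are illustrative choices, not something from the original answer; a regular expression will miss unusual markup, which is where an XPath-based parser is more robust.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the src value of every <img> tag found.
static NSArray *ImageSourcesInHTML(NSString *html) {
    NSError *error = nil;
    // Capture group 1 is the quoted value of the src attribute.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (!regex) {
        return @[];
    }
    NSMutableArray *sources = [NSMutableArray array];
    [regex enumerateMatchesInString:html
                            options:0
                              range:NSMakeRange(0, html.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        NSRange srcRange = [match rangeAtIndex:1];
        if (srcRange.location != NSNotFound) {
            [sources addObject:[html substringWithRange:srcRange]];
        }
    }];
    return sources;
}
```

Each returned string can be turned into an NSURL with URLWithString:, resolving relative paths against the page URL if needed.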
This seems to be answered over here: parsing HTML on the iPhone
The second answer down has a detailed explanation, however you will just have to change the xpath expression to match your image tags.

Programmatically Generate PDF from HTML on iPhone

I am looking for a way to programmatically (in obj-c) generate a PDF file from a local html file. I am dynamically generating the html from user inputs, I need to create the PDF and send it to the user (via email). I am having difficulty with the PDF generation portion.
I have the code to create a PDF using CGPDFContextCreateWithURL but I am struggling with drawing the page using quartz.
I have searched extensively on SO as well as the internet to no avail.
Any help is much appreciated!
To generate a PDF from an HTML file, you need to load the HTML into a web view, take snapshots of the web view, and render them into an image context.
The tutorial might be helpful:
http://www.ioslearner.com/convert-html-uiwebview-pdf-iphone-ipad/
I've written a little piece of code that takes an NSAttributedString from DTCoreText, and renders it into a paged PDF file. You can find it on my GitHub Repository. It won't render images or complex html, but it should serve for most uses. Plus, if you're familiar with CoreText, you can extend my PDF frame setter to generate these items.
So what it does now: Give it an HTML string, and it will use DTCoreText to generate an NSAttributedString, then render that into a PDF. It hands back the location that it saved the PDF file in the app's Documents folder.
Why not use a web service: send the HTML page to it and retrieve the PDF file?
That way you can use iTextSharp and C#, and you're done in about 2 minutes.
Plus (if you're evil) you can store and see all the data on your server.
I haven't tried this myself, so I have nothing concrete to offer, but I'd have to imagine there is an easy way to do this on iPhone due to the imaging model. I'd look deeper into the documentation.
As to pushing back with the client, that is up to you, but there are probably multiple reasons for wanting to keep everything local. Frankly, I would not be pleased at all to hear from somebody I hired that he couldn't manage this particular task. So think long and hard about this pushback. Oh, and even if you do push back, a web server is a poor choice. I'd go back a step further and investigate why you need something in HTML in the first place.
I've never tried this so I have no idea if it'll work, but how about loading the HTML into a UIWebView, and then make the view draw itself into a PDF context? E.g.
UIWebView *webview = [[UIWebView alloc] initWithFrame:CGRectMake(...)];
[webview loadHTMLString:html baseURL:...];
Then:
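- (void)webViewDidFinishLoad:(UIWebView *)webview {
    // Untested sketch: a PDF page must be begun and ended around the drawing.
    CGPDFContextRef pdfContext = CGPDFContextCreateWithURL(...);
    // renderInContext: draws the layer together with its sublayers.
    [webview.layer renderInContext:pdfContext];
    ...
}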
I made it by following this SO: https://stackoverflow.com/a/13342906/448717
To keep the content's proportions, I had to make the WKWebView 1.25 times the size of the printableRect set for the UIPrintPageRenderer, as screen points differ from PostScript points... I guess.

Parsing XML in Hebrew language

I'm using NSXMLParser in an iPhone app that I'm working on.
Later I display the text in a view.
All is well when my XML is in English.
But my XML is in Hebrew, and I'm not able to read the text properly and display it. Please advise me what change I have to make in the XML.
If the XML file is in UTF-8 and you are decoding it using NSUTF8StringEncoding, you should have no problems.
When displaying the strings in the UI, remember to set the correct alignment, or the right-to-left text will not look correct.
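To illustrate the UTF-8 point, here is a small sketch that feeds UTF-8 data to NSXMLParser and collects the character data. The class name TextCollector and the function TextFromXMLString are invented for this example, the sample document is made up, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that concatenates all character data it receives.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation TextCollector
- (instancetype)init {
    if ((self = [super init])) {
        _text = [NSMutableString string];
    }
    return self;
}

// May be called several times per element, so append rather than assign.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.text appendString:string];
}
@end

// Hypothetical helper: parses the XML and returns its text, or nil on error.
static NSString *TextFromXMLString(NSString *xml) {
    // Encode as UTF-8, matching the encoding declared in the XML prolog.
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    TextCollector *collector = [[TextCollector alloc] init];
    parser.delegate = collector;
    return [parser parse] ? [collector.text copy] : nil;
}
```

For example, TextFromXMLString(@"<?xml version=\"1.0\" encoding=\"UTF-8\"?><item>שלום</item>") hands back the Hebrew text unchanged; when the data comes from a file or the network, pass the bytes to initWithData: directly instead of round-tripping through a string with the wrong encoding.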