Is it possible to get page title via iText?
The PdfTextExtractor returns all text from the page, but I don't know which line is the title. Also, the title may span more than one line.
I don't know the coordinates of the title, so I can't use RegionTextRenderFilter.
I could try to analyze the font size and take the line(s) with the biggest font, but TextRenderInfo doesn't provide public access to gs (private final GraphicsState gs).
Any other ideas?
Pages within a PDF don't have titles; they just have text that happens to be bold or in a large font and appears in an area you consider to be "more top" than other pieces of text. It sounds like you know this already; I just needed to be clear on this.
See my post here which shows how to get font information by subclassing ITextExtractionStrategy. My sample targets iTextSharp, which is the .Net port of iText, but they match pretty much feature-to-feature. The biggest difference is that Java uses getXXX and setXXX whereas .Net just uses XXX for both. Otherwise everything should port just fine.
The moral of the story is that you are going to have to write some arbitrary rules defining what you think of as a "title" and then parse based on those rules.
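One such rule, "the title is whatever is set in the largest font", can be sketched as follows. This is only an illustration in Python, not iText code: the `Chunk` record and its fields are hypothetical stand-ins for whatever your own extraction strategy collects (text, font size, vertical position), and the tolerance value is an arbitrary assumption.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str         # text of one extracted line
    font_size: float  # font size reported by your extraction strategy
    y: float          # baseline y coordinate (PDF origin is bottom-left by default)

def guess_title(chunks, tolerance=0.5):
    """Return the lines set in the largest font, joined top to bottom."""
    if not chunks:
        return ""
    largest = max(c.font_size for c in chunks)
    # Keep every line whose size is within `tolerance` of the largest size,
    # so that a multi-line title is captured as a whole.
    title_lines = [c for c in chunks if largest - c.font_size <= tolerance]
    # A higher y means closer to the top of the page.
    title_lines.sort(key=lambda c: -c.y)
    return " ".join(c.text.strip() for c in title_lines)

# Made-up sample data for one page.
page = [
    Chunk("This document describes...", 11.0, 620.0),
    Chunk("Fiscal Year Summary", 24.0, 670.0),
    Chunk("Annual Report", 24.0, 700.0),
]
print(guess_title(page))
```

A rule like this will misfire on pages where a pull quote or page number is larger than the heading, which is why the rules have to be tuned to your own documents.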
Related
I am trying to get an Adobe form to automatically generate a Code128 barcode in a text field when text is entered in another text field. I know there is a Code128 font, and I have found a bunch of PostScript stuff. I am just wondering whether I have to purchase the font, and why (I see something about a license). I worry that I will spend money on a barcode font and still not be able to do what I want by "changing the font" for the "barcode" text field. I don't know any coding; the closest I have come is an HTML web design class in high school YEARS ago. I appreciate any help. The one line of code I found (gosh, I don't even have a clue HOW) is
'''event.value = this.getField("Size In HP").value;'''
I don't even know if I am putting it in the proper "script" box in Adobe; I just paste it in and change it to the field I want, hoping it works. -_- (Side note: the ''' are around the code because the hint below this typing window says to do that?)
If you are not planning on a separate program for calculating the checksum, you should stay away from Code128, as the font won't generate the checksum for you. Code39 does not require a checksum, so no calculation is needed.
As long as all you need is capital letters and numbers, Code39 is much easier to work with.
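To make the difference concrete, here is a rough sketch in Python (not Acrobat JavaScript) of what each symbology asks of you. It assumes Code Set B for Code128 and the standard 43-character Code39 set; the function names are made up for illustration. How a Code128 check value maps to a glyph depends on the particular font, so that step is deliberately left out.

```python
# Characters that plain Code 39 can encode (digits, capitals and seven symbols).
CODE39_CHARS = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%")

def code39_text(data):
    """Validate data for Code 39 and wrap it in the '*' start/stop characters."""
    bad = [ch for ch in data if ch not in CODE39_CHARS]
    if bad:
        raise ValueError(f"not encodable in Code 39: {bad}")
    return f"*{data}*"

def code128b_checksum(data):
    """Return the Code 128 check value (0-102) for data encoded in Code Set B."""
    start_b = 104                      # value of the Start B symbol
    total = start_b
    for position, ch in enumerate(data, start=1):
        value = ord(ch) - 32           # Code Set B maps ASCII 32..127 to 0..95
        if not 0 <= value <= 95:
            raise ValueError(f"not encodable in Code Set B: {ch!r}")
        total += position * value      # each symbol is weighted by its position
    return total % 103

print(code39_text("AB12"))
print(code128b_checksum("PJJ123C"))
```

With Code39 the only work is adding the asterisks; with Code128 you would need to run a calculation like the second function on every change and then append the matching check character before the stop symbol.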
Does anyone know what the 'Pdf embedded' option in Text properties of the Static Text element's attribute is used for in iReport Designer 5.6.0?
I had the same question just today. I took a look at the official page below, and the summary says:
Unless you are okay with your PDF potentially being rendered with a default/native PDF specification font, or unless you are confident your client machines will always have the fonts installed, you need to use the "embed..." checkbox while creating your font extension
https://community.jaspersoft.com/wiki/what-purpose-embed-font-pdf-document
In short, the font needs to be embedded so that other clients can render the PDF correctly.
PS: sorry for my English.
I am using VFR Reader to display my PDFs. I need to extract the table of contents on a button click and display it in a table view, and tapping each entry should lead to the respective page. I googled for this and found these links:
Create a table of contents from a pdf file
http://mobile.tutsplus.com/tutorials/iphone/ios-sdk-adding-a-table-of-contents-to-an-ipad-reader/
From these I learned that, to get the TOC, we must use "CGPDFDocumentGetCatalog(pdf doc)". But in my reader "CGPDFDocumentGetCatalog(pdf doc)" is never called. How can I extract the TOC from my PDF file? Kindly help me out with this. I have been struggling with it for a week. Thanks in advance.
Unfortunately I think the two answers you refer to point to different implementation strategies, which are both possibly valid but are different.
The first question is what the PDF files you have and want to show in your app look like. There is no such thing as a predefined TOC object in a PDF file; there are simply different ways to emulate one. The two most common ways are:
A) Bookmarks, which are a way to add little pieces of text to a structured tree, where each piece of text points to a specific location in the PDF file. These bookmarks can be added in the design application or later (there are specific tools to do so) and they can implement whatever structure.
B) Your PDF file might contain something that looks like a classic TOC from a book, which is basically just text on the opening pages, optionally with hyperlinks to specific locations in the book.
The second link you refer to shows how to create a user interface in which to show the TOC. The remaining question then is to figure out what items you want to display in the TOC window. In this second link, the solution presented is to provide hard-coded items specific to one particular book. Of course this approach is not very useful when you want to display just any book.
So the question you are left with is how to figure out what items to display and where they link to.
If you consider my possibility A) above, a PDF file with bookmarks, the answer could be relatively simple. Answer 1 you point to explains how to look at the different structures inside a PDF file, and bookmarks are simply one such structure (the specification calls them the document outline, defined in section 12.3 of the PDF specification: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf).
This means you could use the techniques shown there to walk the different objects in the PDF file, and find each bookmark. The bookmark will give you the text to display and the actual location in the PDF file that text should jump to when clicked.
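The walk itself is a plain tree traversal. In the PDF specification each outline item carries a Title, an optional First entry pointing at its first child, and an optional Next entry pointing at its following sibling. The sketch below models those dictionaries as Python dicts just to show the traversal order; it is not Core Graphics code, the sample data is invented, and the destination is simplified to a page number (in a real file it is a Dest array or name, or an action).

```python
def walk_outline(item, depth=0):
    """Yield (depth, title, dest) for an item, its descendants and its later siblings."""
    while item is not None:
        yield depth, item["Title"], item.get("Dest")
        # "First" points at the first child, "Next" at the following sibling.
        yield from walk_outline(item.get("First"), depth + 1)
        item = item.get("Next")

# Hypothetical stand-in data; a real reader would read these dictionaries
# from the PDF, starting at the catalog's Outlines entry.
section_1_1 = {"Title": "1.1 Scope", "Dest": 3}
chapter_2 = {"Title": "2 Methods", "Dest": 10}
chapter_1 = {"Title": "1 Introduction", "Dest": 2,
             "First": section_1_1, "Next": chapter_2}
outlines = {"First": chapter_1}

toc = list(walk_outline(outlines.get("First")))
for depth, title, dest in toc:
    print("  " * depth + f"{title} -> page {dest}")
```

The resulting flat list of (depth, title, destination) entries is the kind of data a table view can display directly, using the depth for indentation.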
If you consider my possibility B) above, a PDF file without bookmarks but with a classic TOC, this will be much harder to solve. Such tables of contents are simply text on one or more pages, optionally with hyperlinks. Of course you could try to find all text on these pages (if you can figure out on which page the TOC starts and ends), but you'd then also have to figure out where each item links to. If there are no hyperlinks involved, that would be a daunting task.
So your first question should be how generically you want to solve this problem. Do you know which PDF files you'll want to display? Can you devise a TOC for these files yourself (as in your solution 2)? If not, can you be sure all PDF files contain bookmarks? The answers to those questions will largely determine the rest of your strategy...
I'm fairly new to Cocoa and am having trouble Googling for the best way to design my iPhone app.
This app is for viewing a stageplay. It should pretty print the script such that character headings are centered and in small caps, say, and stage directions are in italics etc. It should also allow one character's lines to be highlighted (i.e. dynamic formatting).
Looking at this question, it seems NSTextView/NSTextStorage will provide the formatting I want; I'm just confused as to how to construct the view from the underlying data.
I'm thinking at the moment my source will be XML in the following form:
<script>
<dialogue character="bob">Hello Sue!</dialogue>
<stageDirection>He moves to the table</stageDirection>
<dialogue character="sue">Hello Bob!</dialogue>
</script>
Which would output something similar to the following:
BOB
Hello Sue!
He moves to the table
SUE
Hello Bob!
How do I go from a document model (XML / CoreData / ...) to a view containing pretty formatted text?
Any advice or pointers would be great; I just can't get my head around this problem!
If interactivity isn't required, then the easiest way I can think of is to generate HTML and render it with UIWebView. Dynamic highlighting can be done with stringByEvaluatingJavaScriptFromString:
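As a rough idea of the XML-to-HTML step, here is a sketch in Python rather than Objective-C, using the element names from the question. The CSS class names and the data-character attribute are assumptions chosen for illustration; the styling itself (centering, small caps, italics) would live in a stylesheet.

```python
import xml.etree.ElementTree as ET
from html import escape

def script_to_html(xml_text):
    """Convert the <script> XML into an HTML fragment, one element per cue."""
    parts = []
    for node in ET.fromstring(xml_text):
        text = escape(node.text or "")
        if node.tag == "dialogue":
            who = node.get("character", "")
            # data-character lets a script or stylesheet find one character's lines.
            parts.append(
                f'<div class="dialogue" data-character="{escape(who)}">'
                f"<h3>{escape(who.upper())}</h3><p>{text}</p></div>"
            )
        elif node.tag == "stageDirection":
            parts.append(f'<p class="direction"><em>{text}</em></p>')
    return "\n".join(parts)

sample = """<script>
<dialogue character="bob">Hello Sue!</dialogue>
<stageDirection>He moves to the table</stageDirection>
<dialogue character="sue">Hello Bob!</dialogue>
</script>"""

html_fragment = script_to_html(sample)
print(html_fragment)
```

Highlighting one character's lines then comes down to toggling a CSS class on the elements whose data-character matches, which is the kind of thing the injected JavaScript could do.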
UPDATE
The next relatively easy option is to compose styled text from individual UILabels. Basically you have an array of text entries (cues, stage directions etc.), each with its own style. You create an array of corresponding UILabels with the styles applied and then lay them out on a "script view". After that you can put this "script view" in a UIScrollView and that's it. The size of label required to fit a particular text can be determined with sizeWithFont:constrainedToSize:lineBreakMode: of NSString.
There are also CoreText services available, but this is a much more advanced option.
I'm trying to use the printing stuff in iOS 4.2 to print from my iPhone app, but I'm having real trouble getting multi-page content to display nicely. As you can see in the attached screenshots of PDFs generated through the iOS printing API, UIMarkupTextPrintFormatter really likes to use a painfully small top-margin when rendering.
Additionally, it doesn't seem to try to split block-elements too nicely either ... it's tough to see in the screenshot but the page break actually occurs halfway through a table row, rather than on a border between rows.
I've tried using the CSS @page directives to specify page boundaries; however, iOS WebKit doesn't seem to support these at all.
Does anyone know of any techniques, either in HTML or through the iOS SDK to make these top-margins bigger?
I really don't want to write a custom UIPrintPageRenderer class because I'm trying to give my users the ability to customize their printouts through HTML templates ... going with a custom renderer would almost certainly make this impossible (or really difficult).
Any help is much appreciated!
You're on the right track with UIPrintPageRenderer, but fortunately you don't need to write a custom subclass to do this. All you need to do is instantiate a vanilla UIPrintPageRenderer, set the headerHeight and footerHeight properties, and add your HTML formatter to the renderer using addPrintFormatter:startingAtPageAtIndex:. It only takes a few extra lines of code; I have posted my method here: Print paper size and content inset