Field Detection using iText - itext

Using Adobe Acrobat, if you choose Add or Edit Fields... from the Forms menu on a file with no fields, you get a pop-up with a message
Currently, there are no form fields in this PDF. Do you want Acrobat
to detect form fields for you?
Is there a way of accomplishing this sort of of field detection using iText?

Not out of the box but the API exists that you could build your own.
Adobe Acrobat is a PDF renderer and as such it can actually "look" at a PDF as a human does. It "sees" a line with text "near" it and can say with a fair amount of certainty that the line represents a field and the text represents the field's label. Same with circles and squares for radio buttons and check boxes. This document actually describes all of the shapes that Adobe Acrobat searches for.
Adobe's technology, however, assumes that a human will confirm and fix any problems that occur, usually using Adobe's technology:
After running the auto field detection process on a form, check it to make sure the correct fields have been created.
So even if iText supported this, you'd still have to open the PDF in Adobe Acrobat to check and fix things anyway.
But if you want to build your own you could use something like this or this to get at the lines. And this to get at the text.

Related

What do I get back from Tesseract when OCR a Checkbox (not a form)

We parse a good number of PDFs, from many vendors. The PDFs are similar, but not exactly the same and things are not always in an exact same position on the same page. Some cases we are able to parse via getting the Strings from the PDF and checkboxes are Unicode. However, many vendors are not using Unicode so an image. These are never forms. So if I use iText to OCR the whole document, what does it produce for these checkboxes? Such that I can look for that and see if a checkbox is checked or not? Or am I just out of luck and the only way the data gets into our application is through manual entry? Thanks.

IText Pdf - RadioBox(On/Off) not appearing for some pdf

In our application we are using Itext Pdf 5.5.3 library.
We have checked with some of the pdfs in which Checkboxes displayed correctly(check/uncheck) .
However there are some pdf with RadioBoxes and do not display radiobutton(on/off) correctly.
I also use this link to validate pdfs and java code
String[] values = form.getAppearanceStates("Checkbox");
return null values.
Also tried Itext RUPS and found that pdf which are working shows Form Field Names in RUPS Form Tab. And PDfs which are not working do not display form fields.
I tried generating pdf from word document and it doesn't display form fields in RUP , neither I can check/uncheck checkbox in Adobe Acrobat Reader.
What could be the solution to display radiobutton with check on / off ?
Edit -
I had created sample web application to reproduce the issue.
Please setup attached web application and let me know the fix for the issue.
Please download from this link
You have successfully discovered the difference between interactive PDF forms and "flat" PDF documents that look like a form to the human eye, but that aren't interactive forms.
To make the "flat" forms interactive, you need to open those flat documents in PDF editing software (e.g. Adobe Acrobat) and you need to add a form field manually.
You can ask Acrobat to guess where it should add fields, but Acrobat will be wrong in many cases for obvious reasons. You always need a human if you want it to be done correctly.
As for creating an interactive PDF from Word... Forget about it. Use OpenOffice or LibreOffice.

Is it possible to attach file while filling in Adobe Forms pdf?

I know that forms in .pdf made by Adobe Acrobat can act like questionnaire or application blank for filling input fileds and saving results, but is it possible in a way to add more interactiviry to them
- add file upload or attachment to resulting document?
It's kind of possible.
You'll need to add Javascript code to your documents. The code should show Browse For File dialog and import selected file to the document. Please have a look at Import Named File Attachment script (at the bottom of the page). I doubt that this solution will work in Adobe Reader or alternative PDF viewers, but Acrobat Professional should do the trick.
Another solution is given is this thread on Adobe forums:
...I figured out a roundabout way to attach files in adobe reader (not
the full blow adobe acrobat reader). Via Adobe Acrobat 9 Pro I
created a blank form with the "comment/mark up" tools enabled. Then I
merged that file with the other PDF form that was formatted with
fields, radial buttons etc. I had to create the blank form for the
attachments separately because if I tried to enable the "comment/mark
up' tool on the form with fields and radial buttons it disabled the
fields. ... The comment markup tool then allowed me to attach files
via the "attach file as a comment" option.

Converting large amounts of text and dynamic data into PDF

I have a three page Word document that needs to be converted into PDF. This Word document was given to me as a template to show me what the PDF output should look like. I tried converting this document into PDF, created a PDF form and used iTextSharp to open the form, populate it with data and return it back to the client. This is all great but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden.
My second attempt was to create an MVC 2 View without master page, pass the model to the view, take the HTML representation of the View, pass it over to iTextSharp and render the PDF. The problem here was that iTextSharp failed on some tags (one of them was <hr> tag). I managed to get rid of the problematic tag, but then tables were not rendered properly. Namely, the border attribute was ignored so I ended up with borderless tables. That attempt failed.
I need a suggestion or advice on the most efficient way to create a PDF document in MVC 2 which would be maintainable in the long run. I really don't want my actions to be 200+ lines long. Working directly with the Word document is not the best solution as I have never worked with VSTO so I don't quite know what it would look like to open Word and manipulate text inside of it and add dynamic data and then convert that dynamically into PDF.
Any suggestion is highly welcome.
Best regards!
One thing that I've done in the past is to save the Word file as a DOCX and unzip it since DOCX is just a renamed zip file. Within the archive open up /word/document.xml and you'll see your document. There's a lot of weird XML tags in there but overall you should get a pretty good idea of where your content is. Then just add placeholder text like {FIRST_NAME}, save the file and re-zip.
Then from code you can just perform the same steps, unzipping with something like SharpZipLib or DotNetZip, swapping placeholder copy, re-zipping and then using very simple Word automation to Save-As a PDF.
The other route is to fully utilize iTextSharp and actually write Paragraphs and PdfPTable and everything else. It takes a lot longer to setup but would give you the most control.
Q: you say "... but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden"
How do you end up having to much data ? If the word template can "hold" the data in 3 pages, they should fit in 3 PDF pages.
I used to use iTextSharp to create my PDF's, but I also almost always ended up building the PDF document from scratch myself.(not really a <200 line solution) Have you considerate another library, I recently switched to MigraDoc's PDFSharp.Way simpler to use then iText, lotsa examples / docus
Just my two cents
Word documents object model is quite easy to understand. It will either contain series of Paragraphs or Tables. Using the Open XML SDK, you can iterate through each paragraph/table in the word document and retrieve it's content and styles. Then you can generate PDF document on the fly using those retrieved information. This will work under MVC too.
But if your word document contains complex elements, then it will take some more time for you to implement based on this approach. Also, this approach would only work with (Word 2007 and 2010) files.
Also, HTML to PDF options currently available in the ITextSharp library would work with only known set of tags, as far as I know.
Another suggestion is to make use of commercially available .NET components. There are lot of good solution available. For ex: Syncfusion

Interactive PDF Creation Alternatives to Acrobat?

Are there any good alternatives to Adobe Acrobat for creating interactive PDFs? The terminology is a little fuzzy here - by interactive, I mean "able to be filled in", and not necessarily "scriptable". So this form would be for data collection, rather than report generation which seems to be the common scenario for pdf-related questions on SO.
The trick is that they need to be fillable using Adobe Reader. For those who have not experienced the many frustrations of Acrobat - by default, Reader cannot fill in a form unless it was created using Acrobat Pro >8.0 and has specifically enabled usage rights. That's fine and it basically works (except then Pro users can't save their data - WTF?).
Because I am getting frustrated, I would ideally like to avoid Adobe products altogether (that is on the design side, for the users Reader is still a necessity or I would just do it as a db-backed web form). I'm wondering if anyone has has good experiences with alternatives? Either software libraries or products?
Thanks!
EDIT - Thanks, matt b - I'd seen iText before but didn't know it could create forms. Unfortunately, it looks like Reader cannot save filled-in data to the forms generated by iText (or generated by OO Writer). I've got the nasty feeling that what I want is fundamentally impossible except using Adobe's own rights management tools. If there are other ideas. I'd love to hear them.
You can create fillable form PDFs using OpenOffice.org as well as LibreOffice.
To create the initial form elements in the *.odt documents, enable the View --> Toolbars --> Form Controls tools, which allow you to add clickable checkboxes + radiobuttons, fillable text fields, pushbuttons and some more to the page(s).
When you're finished with your document, use File --> Export as PDF with the checkbox Create PDF form enabled.
Now your PDF form will be editable (and saveable!) with any non-Adobe PDF viewer.
NOTE, however: Adobe uses an own proprietary way to create and fill PDF forms. Adobe Reader does only support to fill PDF forms which were created by an Adobe product (and which have been assigned 'extended rights' so Reader can indeed save the formdata alongside the document).
Adobe Reader will not work with PDF forms you created with OpenOffice.org or LibreOffice ('work' in the sense of: 'allows you to fill+save the form data'.). The technical mechanism behind this is that Adobe digitally sign their form documents with their own key (which is known to the Adobe Reader, and which you agreed to not reverse engineer when you accepted the Adobe Reader EULA...). --
This means:
Non-Adobe PDF Readers will not be able to 'fill+save' forms created with Adobe products (they can 'fill+print' them however).
Adobe PDF readers will refuse to 'fill+save' forms created with non-Adobe products (they will 'fill+print' them however).
The latter two points will be true for all the tools and utilities mentioned in the other answers to this question. If I'm mistaken here, please let me know in a comment...
iText is pretty much the standard in the java-world for generating PDF files programmatically. Perhaps it can also be used to create PDFs with forms in them as you would like?
The open source page layout tool Scribus has a bunch of features oriented to creating interactive PDF forms. I haven't personally used them, but they appear reasonably complete and are covered by the tutorial.
Scribus is worth knowing about if you ever need to do serious page layout in any case.
XSL FO is some thing we used to create PDF files out of existing form data. Unless you want the fillable pdf to be sent out the client, this is a valid option.
IText lets you create Annotations (there are essentially 3 types of 'interactive' components - forms (old style FDF and new XFA) and Annotations. Acrobat and lots of third party tools should let you modify the Annotations values.
There is also a DotNet version of IText called ISharp - both are freeand extremely powerful.
CutePDF Pro allows you to turn a PDF into an interactive form.
Foxit reader allows you to save any pdf with the filled in fields.
I recently dabbled with Scribus. I found it to be an excellent tool if one has enough time to configure and play around with it. I highly recommend it. Wufoo is also very good.
I am not a fan of Acrobat / Adobe. A software should make my life easier not challenge me at every step.
If you search the net with these keywords - FREE FORM CREATOR and you can add the word HTML5.
You will find an array of sites where you can log online and all your clients can have their separate login, fill in data and the form remains in the Cloud and declutter your hard drive. All stakeholders can access the form and edit at anytime. The account can be used as a folder for your business. These forms can be accessed on any device and any platform.
Many of these forms are HTML5 driven, they are so beautiful and fluid. Keep away from macros, they carry viruses.
www.homebasedofficeservices.com