XFA form filling (growing data) using iText

Can I achieve the following with the iText API?
We need to generate PDF documents with:
A header (static data) that is repeated on every page.
A Product Details section that grows dynamically with the data. This section is a kind of table, but its values are assembled from multiple Hibernate entity fields.
A footer (hard-coded) that is repeated on every page.
If this is achievable with the iText API, we plan to buy a commercial licence.

With the core iText, you can fill out an XFA form by injecting XML. The functionality you describe requires that you create a dynamic XFA form first (e.g. using Adobe LiveCycle Designer). The result will be a filled out XFA form (XML wrapped in PDF).
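As a rough sketch of what "injecting XML" means for the growing Product Details section: the data is just XML with one repeated element per row, and the dynamic form expands to fit it. The Python below (standard library only) builds such a data document. All element names (form1, header, productDetails, product, name, price) and the sample records are hypothetical; they would have to match the data structure of the form you design. The injection into the PDF itself is done by iText and is not shown.

```python
import xml.etree.ElementTree as ET


def build_xfa_data(header_text, products):
    """Build the XML data for a dynamic form: one <product> element per row.

    All element names here are hypothetical; they must mirror the data
    structure defined in the actual XFA form.
    """
    root = ET.Element("form1")
    ET.SubElement(root, "header").text = header_text
    details = ET.SubElement(root, "productDetails")
    for product in products:
        row = ET.SubElement(details, "product")
        # A single cell can be assembled from several entity fields.
        ET.SubElement(row, "name").text = f"{product['brand']} {product['model']}"
        ET.SubElement(row, "price").text = f"{product['price']:.2f}"
    return ET.tostring(root, encoding="unicode")


# Made-up records standing in for values read from the entities.
products = [
    {"brand": "Acme", "model": "X1", "price": 9.5},
    {"brand": "Acme", "model": "X2", "price": 12.0},
]
xml_data = build_xfa_data("Order overview", products)
print(xml_data)
```

Adding more records to the list adds more product elements, which is what lets the section grow across pages.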
If you want to flatten that dynamic PDF (for instance because you want to turn it into a PDF/A, PDF/UA, ordinary PDF document), you need XFA Worker. This will convert the XML stream to PDF syntax (no more XML inside the PDF, except for the XMP data, or, if you need to comply with the ZUGFeRD standard: an XML attachment).
iText is licensed under the AGPL, which means that you can use it for free under specific conditions; for instance, you may need to distribute all your own source code for free. XFA Worker is a closed source product, written on top of iText. You can download a trial version that will add "trial version" on top of all your flattened documents.
If you go for XFA, then your only options are Adobe LiveCycle ES or XFA Worker. I don't know of any other software that supports XFA flattening.

Related

Accessing DataBinding Parameter in a PDF Form

I have a static PDF Form created by Adobe Designer. In the properties of the text fields I can see the DataBinding value (the form is bound to an XML Schema).
I'm trying to read this information with Apache PDFBox 2.0. I can get all the other information, but not this.
Do you have any tips?
Thank you very much
Regards
Fabio
When you create a static PDF form using Adobe LiveCycle Designer, the file contains two form definitions: the AcroForm and the XFA. The AcroForm holds some of the form definitions from the design made in Designer, but not all of them. Unfortunately, the binding information is not part of the AcroForm. What you need to do is extract the XFA and read the binding from the XFA part.
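Once the XFA template XML has been extracted (with PDFBox or any other tool; the extraction is not shown here), the binding of a field can be read from its bind element. A minimal sketch in Python, standard library only. The sample template, the field names, the ref values, and the namespace version are made up for illustration, so check them against your own form:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of an extracted XFA template.
SAMPLE_TEMPLATE = """
<template xmlns="http://www.xfa.org/schema/xfa-template/2.8/">
  <subform name="form1">
    <field name="firstName">
      <bind match="dataRef" ref="$.customer.firstName"/>
    </field>
    <field name="comment"/>
  </subform>
</template>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the template version does not matter."""
    return tag.rsplit("}", 1)[-1]


def field_bindings(template_xml):
    """Map each field name to its bind 'ref', or None without an explicit binding."""
    bindings = {}
    for element in ET.fromstring(template_xml).iter():
        if local_name(element.tag) != "field":
            continue
        ref = None
        for child in element:
            if local_name(child.tag) == "bind":
                ref = child.get("ref")
        bindings[element.get("name")] = ref
    return bindings


print(field_bindings(SAMPLE_TEMPLATE))
```

Matching on the local tag name rather than the full namespace is a deliberate simplification, since the namespace URI changes with the XFA version.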

PDFTK and removing the XFA format

Are there any issues that can come up from removing the XFA format from a PDF form? I'm using PDFTK to fill forms, and found that if a form is XFA, PDFTK doesn't work unless I first run the drop_xfa command to create a new template form. One thing I did notice is that if I didn't do the drop_xfa, I could see the fields pre-filled in Acrobat Reader but not in Acrobat Pro. Other viewers, like Ubuntu Document Viewer, would be fine. I don't mind doing the drop_xfa; I'm just checking whether doing that to forms might cause issues that I am not aware of.
Example: If the form is filled, and it's to be read on a system to grab the fields/values to process.
Thank you in advance.
There are three types of forms in PDF:
Forms using AcroForm technology. In this case, each field corresponds with one or more widgets with fixed positions on specific pages. The form is described using nothing but PDF syntax.
Dynamic forms using the XML Forms Architecture (XFA). In this case, the PDF file is nothing but a container for an XML file that describes the whole form. We refer to this as dynamic XFA, because the form can expand or shrink based on the data that is added: a 1-page form can turn into a 100-page form by adding more data.
Hybrid forms that combine AcroForm and XFA technology. In this case, the form is described twice: once using PDF objects; once using XML. Obviously, such a form is not dynamic: the AcroForm part still defines widget annotations that are defined at absolute positions on specific pages. The form can't adapt to its data.
If you have a dynamic XFA form, dropping the XML will remove the complete form. There won't be anything left.
However, it seems that you are confronted with a hybrid form that consists of both AcroForm and XFA syntax. Hybrid forms are a pain because they often lead to confusion. For instance, a viewer that is not XFA aware will show you the data as stored in the AcroForm, while a viewer that is XFA aware can give preference to the data as stored in the XFA form. What's the problem, you might ask? Aren't both forms equivalent?
Ideally, both versions of the form are indeed equivalent, but:
If the form isn't filled out correctly, the AcroForm can be different from the XFA form.
XFA has more functionality than AcroForm technology. For instance, a text field in an XFA form can be justified (similar to <p align="justify"> in HTML). However, this option doesn't exist in an AcroForm text field (you can only have left, center or right alignment). Hence, if you have text that is justified in an XFA form but you only look at the AcroForm, the text won't be justified.
This is a long answer to explain that, if you have a hybrid form, it is in most cases OK to throw away the XFA part. You may have small differences, but if you are OK with what the form looks like in Ubuntu Document Viewer (a viewer that doesn't support XFA), then you should be fine.
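The reasoning above boils down to a small decision table. The sketch below is only an illustration in Python; the two boolean inputs are assumed to come from whatever PDF library you use to inspect the file, and both function names are made up.

```python
def classify_form(has_acroform_fields, has_xfa):
    """Name the form type from the two ways a form can be described."""
    if has_acroform_fields and has_xfa:
        return "hybrid"
    if has_xfa:
        return "dynamic XFA"
    if has_acroform_fields:
        return "AcroForm"
    return "no form"


def survives_dropping_xfa(kind):
    """A form is still there after dropping XFA only if an AcroForm part remains."""
    return kind in ("hybrid", "AcroForm")


for flags in [(True, False), (False, True), (True, True)]:
    kind = classify_form(*flags)
    print(kind, survives_dropping_xfa(kind))
```

For a hybrid form the function returns True, but the small rendering differences described above can still apply.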
DISCLAIMER: I am the CEO of the iText Group. Pdftk is a third party tool based on an obsolete and no longer supported version of iText. iText Group does not endorse the use of Pdftk.

User Fill in for Adobe forms

I am using Adobe LiveCycle Designer to create documents in my application. All my documents are in Word; I use the export option in LiveCycle Designer to convert them, and now I need users to fill in the exported document. Can someone please suggest how to go about this? We use JavaScript behind the form.
You could have them fill the form in Adobe land, then use the scripting method exportData to get the form data as XML, then inject that XML into your Word docx as a custom xml part.
From there, Word will use the XML in any content controls bound to it.

Converting large amounts of text and dynamic data into PDF

I have a three page Word document that needs to be converted into PDF. This Word document was given to me as a template to show me what the PDF output should look like. I tried converting this document into PDF, created a PDF form and used iTextSharp to open the form, populate it with data and return it back to the client. This is all great but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden.
My second attempt was to create an MVC 2 View without master page, pass the model to the view, take the HTML representation of the View, pass it over to iTextSharp and render the PDF. The problem here was that iTextSharp failed on some tags (one of them was <hr> tag). I managed to get rid of the problematic tag, but then tables were not rendered properly. Namely, the border attribute was ignored so I ended up with borderless tables. That attempt failed.
I need a suggestion or advice on the most efficient way to create a PDF document in MVC 2 which would be maintainable in the long run. I really don't want my actions to be 200+ lines long. Working directly with the Word document is not the best solution as I have never worked with VSTO so I don't quite know what it would look like to open Word and manipulate text inside of it and add dynamic data and then convert that dynamically into PDF.
Any suggestion is highly welcome.
Best regards!
One thing that I've done in the past is to save the Word file as a DOCX and unzip it, since DOCX is just a renamed zip file. Within the archive, open up /word/document.xml and you'll see your document. There are a lot of weird XML tags in there, but overall you should get a pretty good idea of where your content is. Then just add placeholder text like {FIRST_NAME}, save the file and re-zip.
Then from code you can just perform the same steps, unzipping with something like SharpZipLib or DotNetZip, swapping placeholder copy, re-zipping and then using very simple Word automation to Save-As a PDF.
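The unzip, swap, re-zip part of those steps can be sketched as follows. This is Python with the standard library only, swapped in for SharpZipLib/DotNetZip purely for illustration; the function names are made up, and the demo archive is a stand-in that contains only word/document.xml, not a complete DOCX. The final Save-As to PDF through Word automation is not shown.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_docx(template_bytes, values):
    """Return a copy of a DOCX with {PLACEHOLDER} text replaced in word/document.xml."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                for key, value in values.items():
                    # Escape the value so characters like & keep the XML valid.
                    text = text.replace("{" + key + "}", escape(value))
                data = text.encode("utf-8")
            target.writestr(item.filename, data)
    return output.getvalue()


def make_demo_archive():
    """Build a tiny in-memory stand-in for a DOCX (not a complete document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<w:t>Hello {FIRST_NAME}</w:t>")
    return buffer.getvalue()


filled = fill_docx(make_demo_archive(), {"FIRST_NAME": "Ann & Bob"})
with zipfile.ZipFile(io.BytesIO(filled)) as archive:
    print(archive.read("word/document.xml").decode("utf-8"))
```

One caveat with this approach: Word sometimes splits a piece of text across several runs, so a placeholder typed in Word may not appear as one contiguous string in document.xml. Checking the XML after saving the template is worthwhile.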
The other route is to fully utilize iTextSharp and actually write Paragraphs and PdfPTable and everything else. It takes a lot longer to set up but would give you the most control.
Q: you say "... but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden"
How do you end up having too much data? If the Word template can "hold" the data in 3 pages, it should fit in 3 PDF pages.
I used to use iTextSharp to create my PDFs, but I almost always ended up building the PDF document from scratch myself (not really a <200 line solution). Have you considered another library? I recently switched to MigraDoc's PDFsharp. It is way simpler to use than iText, with lots of examples and documentation.
Just my two cents
The Word document object model is quite easy to understand. A document contains a series of paragraphs or tables. Using the Open XML SDK, you can iterate through each paragraph or table in the Word document and retrieve its content and styles. Then you can generate the PDF document on the fly from the retrieved information. This will work under MVC too.
But if your Word document contains complex elements, this approach will take more time to implement. Also, this approach only works with Word 2007 and 2010 files.
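To give an idea of what iterating over paragraphs and tables looks like, here is a minimal sketch. It does not use the Open XML SDK; it is plain Python (standard library only) reading the WordprocessingML directly, and the sample XML and function names are made up for illustration. Styles are ignored.

```python
import xml.etree.ElementTree as ET

# Namespace of the main WordprocessingML elements (w:p, w:tbl, w:t, ...).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical, heavily simplified word/document.xml content.
SAMPLE_DOCUMENT = """
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>Intro text</w:t></w:r></w:p>
    <w:tbl>
      <w:tr>
        <w:tc><w:p><w:r><w:t>A1</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>B1</w:t></w:r></w:p></w:tc>
      </w:tr>
    </w:tbl>
  </w:body>
</w:document>
"""


def text_of(element):
    """Concatenate the text of all w:t elements below an element."""
    return "".join(t.text or "" for t in element.iter(W + "t"))


def walk_body(document_xml):
    """Yield ('paragraph', text) or ('table', rows) for each top-level body block."""
    body = ET.fromstring(document_xml).find(W + "body")
    for block in body:
        if block.tag == W + "p":
            yield "paragraph", text_of(block)
        elif block.tag == W + "tbl":
            rows = [[text_of(cell) for cell in row.findall(W + "tc")]
                    for row in block.findall(W + "tr")]
            yield "table", rows


blocks = list(walk_body(SAMPLE_DOCUMENT))
print(blocks)
```

Each yielded block could then be handed to the PDF library of your choice, for example as a paragraph or a table object.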
Also, as far as I know, the HTML to PDF options currently available in the iTextSharp library work with only a known set of tags.
Another suggestion is to make use of commercially available .NET components. There are a lot of good solutions available, for example Syncfusion.

Interactive PDF Creation Alternatives to Acrobat?

Are there any good alternatives to Adobe Acrobat for creating interactive PDFs? The terminology is a little fuzzy here - by interactive, I mean "able to be filled in", and not necessarily "scriptable". So this form would be for data collection, rather than report generation which seems to be the common scenario for pdf-related questions on SO.
The trick is that they need to be fillable using Adobe Reader. For those who have not experienced the many frustrations of Acrobat - by default, Reader cannot fill in a form unless it was created using Acrobat Pro >8.0 and has specifically enabled usage rights. That's fine and it basically works (except then Pro users can't save their data - WTF?).
Because I am getting frustrated, I would ideally like to avoid Adobe products altogether (that is, on the design side; for the users, Reader is still a necessity, or I would just do it as a db-backed web form). I'm wondering if anyone has had good experiences with alternatives? Either software libraries or products?
Thanks!
EDIT - Thanks, matt b - I'd seen iText before but didn't know it could create forms. Unfortunately, it looks like Reader cannot save filled-in data to the forms generated by iText (or generated by OO Writer). I've got the nasty feeling that what I want is fundamentally impossible except using Adobe's own rights management tools. If there are other ideas, I'd love to hear them.
You can create fillable form PDFs using OpenOffice.org as well as LibreOffice.
To create the initial form elements in the *.odt documents, enable the View --> Toolbars --> Form Controls tools, which allow you to add clickable checkboxes + radiobuttons, fillable text fields, pushbuttons and some more to the page(s).
When you're finished with your document, use File --> Export as PDF with the checkbox Create PDF form enabled.
Now your PDF form will be editable (and saveable!) with any non-Adobe PDF viewer.
NOTE, however: Adobe uses its own proprietary way to create and fill PDF forms. Adobe Reader only supports filling PDF forms which were created by an Adobe product (and which have been assigned 'extended rights' so Reader can indeed save the form data alongside the document).
Adobe Reader will not work with PDF forms you created with OpenOffice.org or LibreOffice ('work' in the sense of: 'allows you to fill+save the form data'). The technical mechanism behind this is that Adobe digitally signs its form documents with its own key (which is known to the Adobe Reader, and which you agreed to not reverse engineer when you accepted the Adobe Reader EULA...).
This means:
Non-Adobe PDF Readers will not be able to 'fill+save' forms created with Adobe products (they can 'fill+print' them however).
Adobe PDF readers will refuse to 'fill+save' forms created with non-Adobe products (they will 'fill+print' them however).
The latter two points will be true for all the tools and utilities mentioned in the other answers to this question. If I'm mistaken here, please let me know in a comment...
iText is pretty much the standard in the Java world for generating PDF files programmatically. Perhaps it can also be used to create PDFs with forms in them as you would like?
The open source page layout tool Scribus has a bunch of features oriented to creating interactive PDF forms. I haven't personally used them, but they appear reasonably complete and are covered by the tutorial.
Scribus is worth knowing about if you ever need to do serious page layout in any case.
XSL-FO is something we used to create PDF files out of existing form data. Unless you want the fillable PDF to be sent out to the client, this is a valid option.
iText lets you create annotations (there are essentially 3 types of 'interactive' components: forms, both old-style FDF and new XFA, and annotations). Acrobat and lots of third-party tools should let you modify the annotation values.
There is also a .NET version of iText called iTextSharp; both are free and extremely powerful.
CutePDF Pro allows you to turn a PDF into an interactive form.
Foxit reader allows you to save any pdf with the filled in fields.
I recently dabbled with Scribus. I found it to be an excellent tool if one has enough time to configure and play around with it. I highly recommend it. Wufoo is also very good.
I am not a fan of Acrobat / Adobe. A software should make my life easier not challenge me at every step.
Search the net with the keywords FREE FORM CREATOR; you can also add the word HTML5.
You will find an array of sites where you can log online and all your clients can have their separate login, fill in data and the form remains in the Cloud and declutter your hard drive. All stakeholders can access the form and edit at anytime. The account can be used as a folder for your business. These forms can be accessed on any device and any platform.
Many of these forms are HTML5 driven; they are so beautiful and fluid. Keep away from macros, as they can carry viruses.
www.homebasedofficeservices.com