Are there any issues that can come up from removing the XFA format from a PDF form? I'm using PDFTK to fill forms, and found that if a form is XFA, PDFTK doesn't work unless I run the drop_xfa command first to create a new template form. One thing I did notice is that if I didn't do the drop_xfa, I could see the pre-filled fields in Acrobat Reader but not in Acrobat Pro. Other viewers, like Ubuntu Document Viewer, were fine. I don't mind doing the drop_xfa, but I'm checking whether doing that to forms might cause issues that I am not aware of.
Example: If the form is filled, and it's to be read on a system to grab the fields/values to process.
Thank you in advance.
There are three types of forms in PDF:
Forms using AcroForm technology. In this case, each field corresponds with one or more widgets with fixed positions on specific pages. The form is described using nothing but PDF syntax.
Dynamic forms using the XML Forms Architecture (XFA). In this case, the PDF file is nothing but a container for an XML file that describes the whole form. We refer to this as dynamic XFA, because the form can expand or shrink based on the data that is added: a 1-page form can turn into a 100-page form by adding more data.
Hybrid forms that combine AcroForm and XFA technology. In this case, the form is described twice: once using PDF objects; once using XML. Obviously, such a form is not dynamic: the AcroForm part still defines widget annotations that are defined at absolute positions on specific pages. The form can't adapt to its data.
If you have a dynamic XFA form, dropping the XML will remove the complete form. There won't be anything left.
However, it seems that you are confronted with a hybrid form that consists of both AcroForm and XFA syntax. Hybrid forms are a pain because they often lead to confusion. For instance: a viewer that is not XFA aware will show you the data as stored in the AcroForm, while a viewer that is XFA aware can give preference to the data as stored in the XFA form. What's the problem, you might ask? Aren't both forms equivalent?
Ideally, both versions of the form are indeed equivalent, but:
If the form isn't filled out correctly, the AcroForm can be different from the XFA form.
XFA has more functionality than AcroForm technology. For instance: a text field in an XFA form can be justified (similar to <p align="justify"> in HTML). However, this option doesn't exist in an AcroForm text field (you can only have left, center or right alignment). Hence, if you have text that is justified in an XFA form but you only look at the AcroForm, the text won't be justified.
This is a long answer to explain that, if you have a hybrid form, it is in most cases OK to throw away the XFA part. You may have small differences, but if you are OK with what the form looks like in Ubuntu Document Viewer (a viewer that doesn't support XFA), then you should be fine.
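If you want to check whether the two halves of a hybrid form have drifted apart before dropping the XFA part, you can compare the values read from each. The sketch below assumes both sets of values have already been extracted into plain maps by whatever PDF library you use; `HybridFormCheck` and `mismatchedFields` are hypothetical names for illustration, not part of iText or Pdftk.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class HybridFormCheck {
    /** Returns the sorted names of fields whose values differ, or that exist on one side only. */
    static Set<String> mismatchedFields(Map<String, String> acroValues, Map<String, String> xfaValues) {
        Set<String> names = new TreeSet<>(acroValues.keySet());
        names.addAll(xfaValues.keySet());
        Set<String> mismatched = new TreeSet<>();
        for (String name : names) {
            // A field missing on one side yields null and counts as a mismatch.
            if (!Objects.equals(acroValues.get(name), xfaValues.get(name))) {
                mismatched.add(name);
            }
        }
        return mismatched;
    }

    public static void main(String[] args) {
        // Hypothetical values, as if read from the AcroForm and the XFA datasets.
        Map<String, String> acro = Map.of("name", "Alice", "city", "Ghent");
        Map<String, String> xfa = Map.of("name", "Alice", "city", "Gent");
        System.out.println(mismatchedFields(acro, xfa)); // prints [city]
    }
}
```

An empty result suggests that a downstream system reading the AcroForm values will see the same data an XFA-aware viewer showed.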
DISCLAIMER: I am the CEO of the iText Group. Pdftk is a third party tool based on an obsolete and no longer supported version of iText. iText Group does not endorse the use of Pdftk.
I have a static PDF Form created by Adobe Designer. In the properties of the text fields I can see the DataBinding value (the form is bound to an XML Schema).
I'm trying to read this information with Apache PDFBox 2.0, and I can get all the info except for this...
Do you have any tips?
Thank you very much
Regards
Fabio
When you create a static PDF form using Adobe LiveCycle Designer, there are two form definitions: the AcroForm and the XFA. The AcroForm holds some of the form definitions from the design made in Designer, but not all of them. Unfortunately, the binding information is not part of it. What you need to do is extract the XFA and get the binding from the XFA part.
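Once the XFA XML has been extracted, reading the bindings is ordinary XML parsing. The sketch below assumes the bindings are stored the usual XFA template way, as a `bind` child element with a `ref` attribute under each `field`; `XfaBindings` and `fieldBindings` are hypothetical names. Getting the XML out of the PDF is left out: in PDFBox 2.0 I believe it is reachable through `PDAcroForm.getXFA()`, but treat that as an assumption and check the API docs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XfaBindings {
    /** Maps each field name to the ref attribute of its bind child, if it has one. */
    static Map<String, String> fieldBindings(String xfaXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XFA packets are namespaced
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xfaXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> bindings = new LinkedHashMap<>();
            // "*" matches any namespace, so the template version does not matter.
            NodeList fields = doc.getElementsByTagNameNS("*", "field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                for (Node child = field.getFirstChild(); child != null; child = child.getNextSibling()) {
                    if (child instanceof Element && "bind".equals(child.getLocalName())) {
                        Element bind = (Element) child;
                        if (bind.hasAttribute("ref")) {
                            bindings.put(field.getAttribute("name"), bind.getAttribute("ref"));
                        }
                    }
                }
            }
            return bindings;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XFA XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical template fragment for illustration only.
        String xml = "<template xmlns=\"http://www.xfa.org/schema/xfa-template/3.0/\">"
                + "<subform name=\"form1\"><field name=\"firstName\">"
                + "<bind match=\"dataRef\" ref=\"$.person.firstName\"/></field></subform></template>";
        System.out.println(fieldBindings(xml));
    }
}
```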
In our application we are using the iText 5.5.3 library.
We have checked some PDFs in which checkboxes are displayed correctly (checked/unchecked).
However, there are some PDFs with radio buttons that do not display the radio button state (on/off) correctly.
I also use this link to validate pdfs and java code
String[] values = form.getAppearanceStates("Checkbox");
returns null.
I also tried iText RUPS and found that the PDFs that work show form field names in the RUPS Form tab, while the PDFs that don't work show no form fields.
I tried generating a PDF from a Word document; it doesn't display form fields in RUPS, and I can't check or uncheck the checkbox in Adobe Acrobat Reader either.
What could be the solution to display radio buttons in their on/off state?
Edit -
I had created sample web application to reproduce the issue.
Please setup attached web application and let me know the fix for the issue.
Please download from this link
You have successfully discovered the difference between interactive PDF forms and "flat" PDF documents that look like a form to the human eye, but that aren't interactive forms.
To make the "flat" forms interactive, you need to open those flat documents in PDF editing software (e.g. Adobe Acrobat) and you need to add a form field manually.
You can ask Acrobat to guess where it should add fields, but Acrobat will be wrong in many cases for obvious reasons. You always need a human if you want it to be done correctly.
As for creating an interactive PDF from Word... Forget about it. Use OpenOffice or LibreOffice.
I have come across a scenario where I have to read html data from database and display it in pdf reports. This html data also contains table structure <table></table> tags and other html element inside it. Previously we used jasper reports for our reporting needs but recently as we came to know that the above functionality is not supported in jasper, I wanted to know which reporting tool can be used so that it can be incorporated with servoy. Does birt provide this functionality?
AFAIK none of the well-known reporting tools supports this, although in BIRT it works "somehow", but not well enough to be usable.
The reason for this is simple, I think: A reporting tool would have to incorporate a complete browser engine like WebKit or others to achieve this, because it would have to "understand" the structure for its page-breaking algorithm.
Yes, BIRT has a text element where we can set the display type to HTML. If the html table is in a dataset field you will just have to include it in the expression of the text using "value-of" tag, something like this:
<VALUE-OF format="HTML">row["htmlTableField"]</VALUE-OF>
The PDF output takes such HTML elements into account, including most simple style settings such as background color, text-align, borders etc.
Usually the reports render just fine with html.
There are some tricks to displaying HTML correctly in BIRT.
You may use a Dynamic Text element and set its content type to html or auto.
Here are some tricks for handling free-form text:
Make sure your XML is valid. I recommend replacing line breaks, or you may hit a scenario where the rptdocument will not export.
Also, if possible, keep these in auto layout when using run + render. The page breaks may actually be calculated once on run and again on render, so you might experience breaking issues with fixed layout. During the RUN() phase (in the web viewer or the rptdocument), the page may attempt to display all the HTML before breaking a page. Then, when rendering to PDF with fixed layout, the breaks are applied differently.
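The advice about valid markup and line breaks can be applied before the text ever reaches the report. This is a minimal sketch in plain Java, assuming the fragment is stored as a string in the database; `HtmlFragmentPrep` and its methods are hypothetical helpers, not BIRT API. Note that the check is for XML well-formedness, so HTML-only entities such as `&nbsp;` will be reported as invalid.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class HtmlFragmentPrep {
    /** Replaces Windows, old Mac and Unix line breaks with an explicit <br/> tag. */
    static String replaceLineBreaks(String html) {
        return html.replaceAll("\\r\\n|\\r|\\n", "<br/>");
    }

    /** Returns true if the fragment parses as XML once wrapped in a single root element. */
    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String cleaned = replaceLineBreaks("line one\nline two");
        System.out.println(cleaned);               // prints line one<br/>line two
        System.out.println(isWellFormed(cleaned)); // prints true
    }
}
```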
Using iText api can I achieve the following?
We have a requirement to generate PDF documents with:
Header (static data) repeated in all pages. Same data should be filled or repeated in all pages.
Product Details section (grows data dynamically). This section is kind of table, but values are formed from multiple hibernate entity fields.
Footer repeats in all pages (hard-coded footer)
If this is achievable with iText api, we are planning to buy commercial licence.
With the core iText, you can fill out an XFA form by injecting XML. The functionality you describe requires that you create a dynamic XFA form first (e.g. using Adobe LiveCycle Designer). The result will be a filled out XFA form (XML wrapped in PDF).
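To give an idea of what "injecting XML" means for the growing product section, here is a sketch that builds the data XML from a list of rows. The element names (`form1`, `product`, `name`, `price`) are assumptions; they must match the data structure of the XFA form you design. Handing the XML to the form is not shown: in iText 5 this is, as far as I recall, done through the `XfaForm` object obtained from `AcroFields`, so check the API docs for the exact call.

```java
import java.util.List;

final class XfaDataXml {
    /** Escapes the characters that are not allowed in XML element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds one product element per row; each row is {name, price}. */
    static String build(List<String[]> products) {
        StringBuilder xml = new StringBuilder("<form1>");
        for (String[] p : products) {
            xml.append("<product><name>").append(escape(p[0]))
               .append("</name><price>").append(escape(p[1]))
               .append("</price></product>");
        }
        return xml.append("</form1>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical rows; in the question they would come from Hibernate entities.
        List<String[]> rows = List.of(new String[] {"Nuts & bolts", "4.50"});
        System.out.println(build(rows));
    }
}
```

The dynamic form then repeats the product subform once per `product` element, which is what lets a one-page form grow to many pages.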
If you want to flatten that dynamic PDF (for instance because you want to turn it into a PDF/A, PDF/UA, ordinary PDF document), you need XFA Worker. This will convert the XML stream to PDF syntax (no more XML inside the PDF, except for the XMP data, or, if you need to comply with the ZUGFeRD standard: an XML attachment).
iText is licensed under the AGPL, that means that you can use it for free under specific conditions. For instance: you may need to distribute all your own source code for free. XFA Worker is a closed source product, written on top of iText. You can download a trial version that will add "trial version" on top of all your flattened documents.
If you go for XFA, then your only options are Adobe LiveCycle ES or XFA Worker. I don't know of any other software that supports XFA flattening.
I have a three page Word document that needs to be converted into PDF. This Word document was given to me as a template to show me what the PDF output should look like. I tried converting this document into PDF, created a PDF form and used iTextSharp to open the form, populate it with data and return it back to the client. This is all great but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden.
My second attempt was to create an MVC 2 View without master page, pass the model to the view, take the HTML representation of the View, pass it over to iTextSharp and render the PDF. The problem here was that iTextSharp failed on some tags (one of them was <hr> tag). I managed to get rid of the problematic tag, but then tables were not rendered properly. Namely, the border attribute was ignored so I ended up with borderless tables. That attempt failed.
I need a suggestion or advice on the most efficient way to create a PDF document in MVC 2 which would be maintainable in the long run. I really don't want my actions to be 200+ lines long. Working directly with the Word document is not the best solution as I have never worked with VSTO so I don't quite know what it would look like to open Word and manipulate text inside of it and add dynamic data and then convert that dynamically into PDF.
Any suggestion is highly welcome.
Best regards!
One thing that I've done in the past is to save the Word file as a DOCX and unzip it since DOCX is just a renamed zip file. Within the archive open up /word/document.xml and you'll see your document. There's a lot of weird XML tags in there but overall you should get a pretty good idea of where your content is. Then just add placeholder text like {FIRST_NAME}, save the file and re-zip.
Then from code you can just perform the same steps, unzipping with something like SharpZipLib or DotNetZip, swapping placeholder copy, re-zipping and then using very simple Word automation to Save-As a PDF.
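The unzip, swap and re-zip steps can be sketched with nothing but the standard library. This version is in Java rather than .NET and works on byte arrays in memory; `DocxPlaceholders` and its methods are hypothetical names, and the final Save-As-PDF step through Word automation is not covered. One caveat: Word sometimes splits text such as {FIRST_NAME} across several XML runs, in which case a plain string replace will not find it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class DocxPlaceholders {
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Copies every ZIP entry, replacing {KEY} placeholders in word/document.xml only. */
    static byte[] fill(byte[] docx, Map<String, String> values) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(docx));
             ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ZipOutputStream out = new ZipOutputStream(buffer)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = in.readAllBytes(); // reads the current entry only
                if (entry.getName().equals("word/document.xml")) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    for (Map.Entry<String, String> e : values.entrySet()) {
                        xml = xml.replace("{" + e.getKey() + "}", escapeXml(e.getValue()));
                    }
                    data = xml.getBytes(StandardCharsets.UTF_8);
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                out.write(data);
                out.closeEntry();
            }
            out.finish(); // write the ZIP directory before reading the buffer
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: builds a one-entry ZIP in memory. */
    static byte[] zipOf(String name, String content) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ZipOutputStream out = new ZipOutputStream(buffer)) {
            out.putNextEntry(new ZipEntry(name));
            out.write(content.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.finish();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: returns the text of one entry, or null if it is absent. */
    static String readEntry(byte[] zip, String name) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] template = zipOf("word/document.xml", "<w:t>Hello {FIRST_NAME}</w:t>");
        byte[] filled = fill(template, Map.of("FIRST_NAME", "Ann"));
        System.out.println(readEntry(filled, "word/document.xml")); // prints <w:t>Hello Ann</w:t>
    }
}
```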
The other route is to fully utilize iTextSharp and actually write Paragraphs and PdfPTable and everything else. It takes a lot longer to setup but would give you the most control.
Q: you say "... but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden"
How do you end up having too much data? If the Word template can "hold" the data in 3 pages, it should fit in 3 PDF pages.
I used to use iTextSharp to create my PDFs, but I almost always ended up building the PDF document from scratch myself (not really a <200 line solution). Have you considered another library? I recently switched to MigraDoc's PDFSharp. It's way simpler to use than iText, with lots of examples and docs.
Just my two cents
The Word document object model is quite easy to understand: a document contains a series of paragraphs or tables. Using the Open XML SDK, you can iterate through each paragraph or table in the Word document and retrieve its content and styles. Then you can generate the PDF document on the fly from the retrieved information. This will work under MVC too.
But if your Word document contains complex elements, this approach will take more time to implement. Also, it only works with Word 2007 and 2010 files.
Also, as far as I know, the HTML to PDF options currently available in the iTextSharp library only work with a known set of tags.
Another suggestion is to make use of commercially available .NET components. There are a lot of good solutions available, for example Syncfusion.