i have a form that i have created in MS Word then converted to a PDF (Form) then i load this in using a PDF Reader, i then have a stamper created that fills in the fields, if i want to add a second page with the same template (Form) how do i do this and populate some of the fields with the same information
i have managed to get a new page with another reader but how do i stamp information onto this page as the AcroFields will have the same name.#
this is how i achieved that:
stamper.insertPage(1,PageSize.A4);
PdfReader reader = new PdfReader("/soaprintjobs/templates/STOTemplate.pdf"); //reads the original pdf
PdfImportedPage page; //writes the new pdf to file
page = stamper.getImportedPage(reader,1); //retrieve the second page of the original pdf
PdfContentByte newPageContent = stamper.getUnderContent(1); //get the over content of the first page of the new pdf
newPageContent.addTemplate(page, 0,0);
Thanks
Acroform fields have the property that fields with the same name are considered the same field. They have the same value. So if you have a field with the same name on page 1 and page 2, they will always display the same value. If you change the value on page 1, it will also change on page 2.
In some cases this is desirable. You may have a multi-page form with a reference number and want to repeat that reference number on each page. In that case you can use fields with the same name.
However, if you want to have multiple copies of the same form with different data in 1 document, you'll run into problems. You'll have to rename the form fields so they are unique.
In iText, you should not use getImportedPage() to copy Acroforms. Starting with iText 5.4.4 you can use the PdfCopy class. In earlier versions the PdfCopyFields class should be used.
Here's some sample code to copy Acroforms and rename fields. Code for iText 5.4.4 and up is in comments.
public static void main(String[] args) throws FileNotFoundException, DocumentException, IOException {
String[] inputs = { "form1.pdf", "form2.pdf" };
PdfCopyFields pcf = new PdfCopyFields(new FileOutputStream("out.pdf"));
// iText 5.4.4+
// Document document = new Document();
// PdfCopy pcf = new PdfCopy(document, new FileOutputStream("out.pdf"));
// pcf.setMergeFields();
// document.open();
int documentnumber = 0;
for (String input : inputs) {
PdfReader reader = new PdfReader(input);
documentnumber++;
// add suffix to each field name, in order to make them unique.
renameFields(reader, documentnumber);
pcf.addDocument(reader);
}
pcf.close();
// iText 5.4.4+
// document.close();
}
public static void renameFields(PdfReader reader, int documentnumber) {
Set<String> keys = new HashSet<String>(reader.getAcroFields()
.getFields().keySet());
for (String key : keys) {
reader.getAcroFields().renameField(key, key + "_" + documentnumber);
}
}
Related
I try to generate a toc(table of content) for my pdf, and I want to get some strings which look like chapter title in xxx.pdf using ITextExtractionStrategy. But I got com.itextpdf.kernel.PdfException when I am running a test.
Here is my code:
#org.junit.Test
public void test() throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfDocument pdfDoc = new PdfDocument(new PdfReader("src/test/resources/template/xxx.pdf"),
new PdfWriter(baos));
pdfDoc.addNewPage(1);
Document document = new Document(pdfDoc);
// when add this code, throw com.itextpdf.kernel.PdfException: Dictionary doesn't have supported font data.
Paragraph title = new Paragraph(new Text("index"))
.setTextAlignment(TextAlignment.CENTER);
document.add(title);
SimpleTextExtractionStrategy extractionStrategy = new SimpleTextExtractionStrategy();
for (int i = 1; i < pdfDoc.getNumberOfPages(); i++) {
PdfPage page = pdfDoc.getPage(i);
PdfCanvasProcessor parser = new PdfCanvasProcessor(extractionStrategy);
parser.processPageContent(page);
}
...
document.close();
pdfDoc.close();
new FileOutputStream("./yyy.pdf").write(baos.toByteArray());
}
Here is the output:
com.itextpdf.kernel.PdfException: Dictionary doesn't have supported font data.
at com.itextpdf.kernel.font.PdfFontFactory.createFont(PdfFontFactory.java:123)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.getFont(PdfCanvasProcessor.java:490)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor$SetTextFontOperator.invoke(PdfCanvasProcessor.java:811)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.invokeOperator(PdfCanvasProcessor.java:454)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processContent(PdfCanvasProcessor.java:282)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processPageContent(PdfCanvasProcessor.java:303)
at com.example.pdf.util.Test.test(Test.java:138)
Whenever you add content to a PdfDocument like you do here
Document document = new Document(pdfDoc);
Paragraph title = new Paragraph(new Text("index"))
.setTextAlignment(TextAlignment.CENTER);
document.add(title);
you have to be aware that this content is not already stored in its final form; for example fonts used are not yet properly subset'ed. The final form is generated when you're closing the document.
Text extraction on the other hand requires the content to extract to be in its final form.
Thus, you should not apply text extraction to a document you're working on. In particular, don't apply text extraction to a page you've changed the content of.
If you need to extract text from the documents you create yourself, close your document first, open a new document from the output, and extract from that new document.
I have multiple copies of a .pdf document that are commented by different users. I would like to merge all these comments into a new pdf "merged".
I wrote this sub inside a class called document with properties "path" and "directory".
Public Sub MergeComments(ByVal pdfDocuments As String())
Dim oSavePath As String = Directory & "\" & FileName & "_Merged.pdf"
Dim oPDFdocument As New iText.Kernel.Pdf.PdfDocument(New PdfReader(Path),
New PdfWriter(New IO.FileStream(oSavePath, IO.FileMode.Create)))
For Each oFile As String In pdfDocuments
Dim oSecundairyPDFdocument As New iText.Kernel.Pdf.PdfDocument(New PdfReader(oFile))
Dim oAnnotations As New PDFannotations
For i As Integer = 1 To oSecundairyPDFdocument.GetNumberOfPages
Dim pdfPage As PdfPage = oSecundairyPDFdocument.GetPage(i)
For Each oAnnotation As Annot.PdfAnnotation In pdfPage.GetAnnotations()
oPDFdocument.GetPage(i).AddAnnotation(oAnnotation)
Next
Next
Next
oPDFdocument.Close()
End Sub
This code results in an exception that I am failing to solve.
iText.Kernel.PdfException: 'Pdf indirect object belongs to other PDF document. Copy object to current pdf document.'
What do I need to change in order to perform this task? Or am I completely off with my code block?
You need to explicitly copy the underlying PDF object to the destination document. After that you will be easily able to add that object to the list of page annotations.
Instead of adding the annotation directly:
oPDFdocument.GetPage(i).AddAnnotation(oAnnotation)
Copy the object to the destination document first, wrap it into PdfAnnotation class with makeAnnotation method and then add it as usual. Code is in Java but you will easily be able to convert it into VB:
PdfObject annotObject = oAnnotation.getPdfObject().copyTo(pdfDocument);
pdfDocument.getPage(i).addAnnotation(PdfAnnotation.makeAnnotation(annotObject));
Here is a working Java code, with annotations copied from one document to other using the copyTo method.
PdfReader reader = new PdfReader(new
RandomAccessSourceFactory().createBestSource(sourceFileName), null);
PdfDocument document = new PdfDocument(reader);
PdfReader toMergeReader = new PdfReader(new RandomAccessSourceFactory().createBestSource(targetFileName), null);
PdfDocument toMergeDocument = new PdfDocument(toMergeReader);
PdfWriter writer = new PdfWriter(targetFileName + "_MergedVersion.pdf");
PdfDocument writeDocument = new PdfDocument(writer);
int pageCount = toMergeDocument.getNumberOfPages();
for (int i = 1; i <= pageCount; i++) {
PdfPage page = document.getPage(i);
writeDocument.addPage(page.copyTo(writeDocument));
PdfPage pdfPage = toMergeDocument.getPage(i);
List<PdfAnnotation> pageAnnots = pdfPage.getAnnotations();
if (pageAnnots != null) {
for (PdfAnnotation pdfAnnotation : pageAnnots) {
PdfObject annotObject = pdfAnnotation.getPdfObject().copyTo(writeDocument);
writeDocument.getPage(i).addAnnotation(PdfAnnotation.makeAnnotation(annotObject));
}
}
}
reader.close();
toMergeReader.close();
toMergeDocument.close();
document.close();
writeDocument.close();
writer.close();
We are in the last steps of evaluating iText7. We use iText 7.1.0 and html2pdf 2.0.0.
What we do: we send a json_encoded collection with pdf-data (which includes html for header, body and footer) to our Java app. There we iterate over the collection, create a byteArrayOutputStream for each pdf-data element and merge them together. We then send the results to a script which echoes it to e.g. a browser. Although the pdf is displayed correctly, we encounter errors while creating it:
com.itextpdf.io.IOException: Error at file pointer 226,416.
...
Caused by: com.itextpdf.io.IOException: xref subsection not found.
... 73 common frames omitted
If we create only one part of the collection, no error is thrown.
Iterate over collection and merge:
#RequestMapping(value = "/pdf", method = RequestMethod.POST, produces = MediaType.APPLICATION_PDF_VALUE)
public byte[] index(#RequestBody PDFDataModelCollection elements, Model model) throws IOException {
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfWriter writer = new PdfWriter(byteArrayOutputStream);
try (PdfDocument resultDoc = new PdfDocument(writer)) {
for (PDFDataModel pdfDataModel : elements.getElements()) {
PdfReader reader = new PdfReader(new ByteArrayInputStream(creationService.createDatasheet(pdfDataModel)));
try (PdfDocument sourceDoc = new PdfDocument(reader)) {
int n = sourceDoc.getNumberOfPages(); //<-- IOException on second iteration
for (int i = 1; i <= n; i++) {
PdfPage page = sourceDoc.getPage(i).copyTo(resultDoc);
resultDoc.addPage(page);
}
}
}
}
return byteArrayOutputStream.toByteArray(); //outputs the final pdf
}
Creation of part:
public byte[] createDatasheet(PDFDataModel pdfDataModel) throws IOException {
PdfWriter writer = new PdfWriter(byteArrayOutputStream);
//Initialize PDF document
PdfDocument pdfDoc = new PdfDocument(writer);
try (
Document document = new Document(pdfDoc)
) {
//header, footer, etc
//body
for (IElement element : HtmlConverter.convertToElements(pdfDataModel.getBody(), this.props)) {
document.add((IBlockElement) element);
}
footer.writeTotalNumberOnPages(pdfDoc);
}
return byteArrayOutputStream.toByteArray();
}
We are grateful for any suggestion.
In createDatasheet you appear to re-use some byteArrayOutputStream without clearing it first.
In the first iteration, therefore, everything works as desired, at the end of createDatasheet you have a single PDF file in it.
In the second iteration, though, you have two PDF files in that byteArrayOutputStream, one after the other. This concatenation does not form a valid single PDF.
Thus, byteArrayOutputStream.toByteArray() returns something broken.
To fix this, either make the byteArrayOutputStream local to createDatasheet and create a new instance every time or alternatively reset byteArrayOutputStream at the start of createDatasheet:
public byte[] createDatasheet(PDFDataModel pdfDataModel) throws IOException {
byteArrayOutputStream.reset();
PdfWriter writer = new PdfWriter(byteArrayOutputStream);
[...]
I used examples available on web to create an application that is able to get xml structure of XFA form and then set it back filled. Important code looks like this:
public void readData(String src, String dest)
throws IOException, ParserConfigurationException, SAXException,
TransformerFactoryConfigurationError, TransformerException {
FileOutputStream os = new FileOutputStream(dest);
PdfReader reader = new PdfReader(src);
XfaForm xfa = new XfaForm(reader);
Node node = xfa.getDatasetsNode();
NodeList list = node.getChildNodes();
for (int i = 0; i < list.getLength(); i++) {
if ("data".equals(list.item(i).getLocalName())) {
node = list.item(i);
break;
}
}
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "yes");
tf.transform(new DOMSource(node), new StreamResult(os));
reader.close();
}
public void fillPdfWithXmlData(String src, String xml, String dest)
throws IOException, DocumentException {
PdfReader.unethicalreading = true;
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest), '\0', true);
AcroFields form = stamper.getAcroFields();
XfaForm xfa = form.getXfa();
xfa.fillXfaForm(new FileInputStream(xml));
stamper.close();
reader.close();
}
When I use it to fill this form: http://www.vzp.cz/uploads/document/tiskopisy-pro-zamestnavatele-hromadne-oznameni-zamestnavatele-verze-2-pdf-56-kb.pdf it works fine and I can see the filled form in Acrobat Reader. However, if I open the document in Foxit Reader I see a blank form (tested in latest version and 5.x version).
I tried to play a little bit with it and got these XfaForm(...).getDomDocument() data:
Filled by Acrobat Reader: http://pastebin.com/kXKyh9EM
Filled by Foxit Reader: http://pastebin.com/tiZ7EmfE
Filled by iText: http://pastebin.com/tTKLMERC
Filled by Foxit Reader after a fill by iText: http://pastebin.com/Uuq0jS4b
The field which was filled is . Is it possible to use iText in a way that it works even with Foxit Reader (and the XFA signature stays?)
In your Filled by iText example there's a superfluous <xfa:data> element:
<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
<xfa:data>
<xfa:data>
<HOZ>
<!-- rest of your data here -->
</HOZ>
</xfa:data>
</xfa:data>
<!-- data description -->
</xfa:datasets>
This is because the fillXfaForm() method of XfaForm expects the XML data without <xfa:data> as the root element. So your XML data should just look like:
<HOZ>
<!-- rest of your data here -->
</HOZ>
I see that your readData() method that extract the existing form data including the <xfa:data> element:
<xfa:data>
<HOZ>
<!-- rest of your data here -->
</HOZ>
</xfa:data>
Stripping the outer element should fix your problem. For example:
XfaForm xfa = new XfaForm(reader);
Node node = xfa.getDatasetsNode();
NodeList list = node.getChildNodes();
for (int i = 0; i < list.getLength(); i++) {
if ("data".equals(list.item(i).getLocalName())) {
node = list.item(i);
break;
}
}
// strip <xfa:data>
node = node.getFirstChild();
// Transformer code here
I have a method that returns a pdf byte stream (from fillable pdf) Is there a straight forward way to merge 2 streams into one stream and make one pdf out of it? I need to run my method twice but need the two pdf's into One pdf stream. Thanks.
You didn't say if you're flattening the filled forms with the PdfStamper, so I'll just say you must flatten the before trying to merge them. Here's a working .ashx HTTP handler:
<%# WebHandler Language="C#" Class="mergeByteForms" %>
using System;
using System.IO;
using System.Web;
using iTextSharp.text;
using iTextSharp.text.pdf;
public class mergeByteForms : IHttpHandler {
HttpServerUtility Server;
public void ProcessRequest (HttpContext context) {
Server = context.Server;
HttpResponse Response = context.Response;
Response.ContentType = "application/pdf";
using (Document document = new Document()) {
using (PdfSmartCopy copy = new PdfSmartCopy(
document, Response.OutputStream) )
{
document.Open();
for (int i = 0; i < 2; ++i) {
PdfReader reader = new PdfReader(_getPdfBtyeStream(i.ToString()));
copy.AddPage(copy.GetImportedPage(reader, 1));
}
}
}
}
public bool IsReusable { get { return false; } }
// simulate your method to use __one__ byte stream for __one__ PDF
private byte[] _getPdfBtyeStream(string data) {
// replace with __your__ PDF template
string pdfTemplatePath = Server.MapPath(
"~/app_data/template.pdf"
);
PdfReader reader = new PdfReader(pdfTemplatePath);
using (MemoryStream ms = new MemoryStream()) {
using (PdfStamper stamper = new PdfStamper(reader, ms)) {
AcroFields form = stamper.AcroFields;
// replace this with your form field data
form.SetField("title", data);
// ...
// this is __VERY__ important; since you're using the same fillable
// PDF, if you don't set this property to true the second page will
// lose the filled fields.
stamper.FormFlattening = true;
}
return ms.ToArray();
}
}
}
Hopefully the inline comments make sense. _getPdfBtyeStream() method above simulates your PDF byte streams. The reason you need to set FormFlattening to true is that a when you fill PDF form fields, names are supposed to be unique. In your case the second page is the same fillable PDF form, so it has the same field names as the first page and when you fill them they're ignored. Comment out the example line above:
stamper.FormFlattening = true;
to see what I mean.
In other words, a lot of the generic code to merge PDFs on the Internet and even here on stackoverflow will not work (for fillable forms) because Acrofields are not being accounted for. In fact, if you take a look at stackoverflow's about itextsharp tag "SO FAQ & Popular" to Merge PDFs, it's mentioned in the third comment for the correctly marked answer by #Ray Cheng.
Another way to merge fillable PDF (without flattening the form) is to rename the form fields for the second/following page(s), but that's more work.