itextsharp 5.4.4 CopyAcroForm no longer there - itext

In version 5.4.2 of itextsharp I was able to use: (fragment in VB)
Dim pdfWriter As iTextSharp.text.pdf.PdfCopy
pdfwriter = New iTextSharp.text.pdf.PdfCopy(outputPDFDocument, New FileStream(destfname, FileMode.Create))
pdfWriter.CopyAcroForm(reader)
to copy a form from one document to another.
In 5.4.4 CopyAcroForm is no longer there under PdfCopy or anywhere else - what is the alternative?

Please read the release notes for iText 5.4.4. It is now possible to use PdfCopy to merge PDFs containing AcroForm forms by using the addDocument() method. This method is much better than the copyAcroForm() method as it also preserves the structured tree root. This is important if your forms are made accessible (cf Section 508 or the PDF/UA standard).

AddDocument() method is cool. Here is my code that read and merge multiple PDFs from SQL server in asp.net . document.Close() is required to flush the content to memory stream.
enter code here
Document document = new Document();
MemoryStream output = new MemoryStream();
PdfCopy writer = new PdfCopy(document, output); // Initialize pdf writer
writer.SetMergeFields();
document.Open();
SqlDataReader dr = cmd.ExecuteReader();
while (dr.Read())
{
PdfReader reader = new PdfReader((Byte[])dr["ImageFile"]);
writer.AddDocument(reader);
}
dr.Close();
document.Close();

It looks like you also need to call .SetMergeFields() or it won't work:
reader = new PdfReader(path);
using (var document = new Document(reader.GetPageSizeWithRotation(1))) {
using (var outputStream = new FileStream(...)) {
using (var writer = new PdfCopy(document, outputStream)) {
writer.SetMergeFields();
document.Open();
//all pages:
writer.AddDocument(reader);
//Particular Pages:
//writer.AddDocument(reader, new List<int> { pageNumber });
}
}
}

Related

iText7: com.itextpdf.kernel.PdfException: Dictionary doesn't have supported font data

I try to generate a toc(table of content) for my pdf, and I want to get some strings which look like chapter title in xxx.pdf using ITextExtractionStrategy. But I got com.itextpdf.kernel.PdfException when I am running a test.
Here is my code:
#org.junit.Test
public void test() throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfDocument pdfDoc = new PdfDocument(new PdfReader("src/test/resources/template/xxx.pdf"),
new PdfWriter(baos));
pdfDoc.addNewPage(1);
Document document = new Document(pdfDoc);
// when add this code, throw com.itextpdf.kernel.PdfException: Dictionary doesn't have supported font data.
Paragraph title = new Paragraph(new Text("index"))
.setTextAlignment(TextAlignment.CENTER);
document.add(title);
SimpleTextExtractionStrategy extractionStrategy = new SimpleTextExtractionStrategy();
for (int i = 1; i < pdfDoc.getNumberOfPages(); i++) {
PdfPage page = pdfDoc.getPage(i);
PdfCanvasProcessor parser = new PdfCanvasProcessor(extractionStrategy);
parser.processPageContent(page);
}
...
document.close();
pdfDoc.close();
new FileOutputStream("./yyy.pdf").write(baos.toByteArray());
}
Here is the output:
com.itextpdf.kernel.PdfException: Dictionary doesn't have supported font data.
at com.itextpdf.kernel.font.PdfFontFactory.createFont(PdfFontFactory.java:123)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.getFont(PdfCanvasProcessor.java:490)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor$SetTextFontOperator.invoke(PdfCanvasProcessor.java:811)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.invokeOperator(PdfCanvasProcessor.java:454)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processContent(PdfCanvasProcessor.java:282)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processPageContent(PdfCanvasProcessor.java:303)
at com.example.pdf.util.Test.test(Test.java:138)
Whenever you add content to a PdfDocument like you do here
Document document = new Document(pdfDoc);
Paragraph title = new Paragraph(new Text("index"))
.setTextAlignment(TextAlignment.CENTER);
document.add(title);
you have to be aware that this content is not already stored in its final form; for example fonts used are not yet properly subset'ed. The final form is generated when you're closing the document.
Text extraction on the other hand requires the content to extract to be in its final form.
Thus, you should not apply text extraction to a document you're working on. In particular, don't apply text extraction to a page you've changed the content of.
If you need to extract text from the documents you create yourself, close your document first, open a new document from the output, and extract from that new document.

iText7 Merge pdf annotations on a new pdf document

I have multiple copies of a .pdf document that are commented by different users. I would like to merge all these comments into a new pdf "merged".
I wrote this sub inside a class called document with properties "path" and "directory".
Public Sub MergeComments(ByVal pdfDocuments As String())
Dim oSavePath As String = Directory & "\" & FileName & "_Merged.pdf"
Dim oPDFdocument As New iText.Kernel.Pdf.PdfDocument(New PdfReader(Path),
New PdfWriter(New IO.FileStream(oSavePath, IO.FileMode.Create)))
For Each oFile As String In pdfDocuments
Dim oSecundairyPDFdocument As New iText.Kernel.Pdf.PdfDocument(New PdfReader(oFile))
Dim oAnnotations As New PDFannotations
For i As Integer = 1 To oSecundairyPDFdocument.GetNumberOfPages
Dim pdfPage As PdfPage = oSecundairyPDFdocument.GetPage(i)
For Each oAnnotation As Annot.PdfAnnotation In pdfPage.GetAnnotations()
oPDFdocument.GetPage(i).AddAnnotation(oAnnotation)
Next
Next
Next
oPDFdocument.Close()
End Sub
This code results in an exception that I am failing to solve.
iText.Kernel.PdfException: 'Pdf indirect object belongs to other PDF document. Copy object to current pdf document.'
What do I need to change in order to perform this task? Or am I completely off with my code block?
You need to explicitly copy the underlying PDF object to the destination document. After that you will be easily able to add that object to the list of page annotations.
Instead of adding the annotation directly:
oPDFdocument.GetPage(i).AddAnnotation(oAnnotation)
Copy the object to the destination document first, wrap it into PdfAnnotation class with makeAnnotation method and then add it as usual. Code is in Java but you will easily be able to convert it into VB:
PdfObject annotObject = oAnnotation.getPdfObject().copyTo(pdfDocument);
pdfDocument.getPage(i).addAnnotation(PdfAnnotation.makeAnnotation(annotObject));
Here is a working Java code, with annotations copied from one document to other using the copyTo method.
PdfReader reader = new PdfReader(new
RandomAccessSourceFactory().createBestSource(sourceFileName), null);
PdfDocument document = new PdfDocument(reader);
PdfReader toMergeReader = new PdfReader(new RandomAccessSourceFactory().createBestSource(targetFileName), null);
PdfDocument toMergeDocument = new PdfDocument(toMergeReader);
PdfWriter writer = new PdfWriter(targetFileName + "_MergedVersion.pdf");
PdfDocument writeDocument = new PdfDocument(writer);
int pageCount = toMergeDocument.getNumberOfPages();
for (int i = 1; i <= pageCount; i++) {
PdfPage page = document.getPage(i);
writeDocument.addPage(page.copyTo(writeDocument));
PdfPage pdfPage = toMergeDocument.getPage(i);
List<PdfAnnotation> pageAnnots = pdfPage.getAnnotations();
if (pageAnnots != null) {
for (PdfAnnotation pdfAnnotation : pageAnnots) {
PdfObject annotObject = pdfAnnotation.getPdfObject().copyTo(writeDocument);
writeDocument.getPage(i).addAnnotation(PdfAnnotation.makeAnnotation(annotObject));
}
}
}
reader.close();
toMergeReader.close();
toMergeDocument.close();
document.close();
writeDocument.close();
writer.close();

Fillable forms: two filled in copies of same form, 2nd one doesn't print data

All,
I have two PDFs. They are the same PDF, but filled in with different information. For instance, the Company field will have "ABC Company" as the filled in data for PDF1 and "XYZ Company" as the filled in data for PDF 2.
When merging, I will get both documents, but the 2nd one will contain no data in the fillable fields.
I can switch the order they get merged and always the first one will have data, but the 2nd one will have no data.
From researching, it seems as if there might be issues due to the fields on the pages all having the same name? Could this be the issue?
I am using version: iText 5.4
Thanks for any help/suggestions you can provide.
--- 9/10/2018 Code to show.
FYI. I have taken over this code from someone who wrote it years ago and have not worked with Itext before so I am just digging in. Here is the code that I am using. Oracle calling Java to run the code.
As always, thanks for the responses that already came in and any future ones..thx.
import java.sql.Blob;
import java.sql.ResultSet;
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;
public class PdfMerger {
public static void merge(ResultSet pdfs, Blob dest) throws Exception {
Document doc = new Document();
PdfCopy copy = new PdfCopy(doc, dest.setBinaryStream(0));
doc.open();
while (pdfs.next()) {
PdfReader rdr = new PdfReader(pdfs.getBinaryStream(1));
for (int i = 1; i <= rdr.getNumberOfPages(); i++)
copy.addPage(copy.getImportedPage(rdr, i));
}
copy.close();
doc.close();
}
}
--JM
Thanks for the response about :
PdfStamper stamper = new PdfStamper(reader, output);
stamper.setFormFlattening(true);
stamper.close();
As a non Java person, (strictly PL/SQL) I very much appreciate the assistance. Where would these three lines be placed in the code above to flatten the docs?
I apologize for not knowing more, but I just got placed on this project. Appreciate the help. This is the last step to completion and if I could get over this hurdle....
Would it look like?
CREATE OR REPLACE AND RESOLVE JAVA SOURCE NAMED "PdfMergerTest" as import java.sql.Blob;
import java.sql.ResultSet;
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
public class PdfMergerTest {
public static void merge(ResultSet pdfs, Blob dest) throws Exception {
Document doc = new Document();
PdfCopy copy = new PdfCopy(doc, dest.setBinaryStream(0));
doc.open();
while (pdfs.next()) {
PdfReader rdr = new PdfReader(pdfs.getBinaryStream(1));
PdfStamper stamper = new PdfStamper(rdr,copy);
stamper.setFormFlattening(true);
stamper.close();
for (int i = 1; i <= rdr.getNumberOfPages(); i++)
copy.addPage(copy.getImportedPage(rdr, i));
}
copy.close();
doc.close();
}
}
You should flatten PDF forms before merging/joining:
PdfStamper stamper = new PdfStamper(reader, output);
stamper.setFormFlattening(true);
stamper.close();
Field values will then become part of the document as a text instead of field + value and it will prevent collision between field values of merged documents.
Update: The result code in your case will be:
PdfReader rdr = new PdfReader(pdfs.getBinaryStream(1));
ByteArrayOutputStream output = new ByteArrayOutputStream();
PdfStamper stamper = new PdfStamper(rdr, output);
stamper.setFormFlattening(true);
stamper.close();
rdr = new PdfReader(output.toByteArray());
for (int i = 1; i <= rdr.getNumberOfPages(); i++) {
copy.addPage(copy.getImportedPage(rdr, i));
}
It means that it fills form using PdfStamper and saves it into the ByteArraouOutputStream and then it again reads the filled PDF and adds page by page into the PdfCopy.
You can also try to use PdfSmartCopy instead of PdfCopy to reduce the PDF file size. PdfSmartCopy tries to share the same objects across the result joined PDF, but it needs much more memory then PdfCopy and will be slower.

Is it possible to merge several pdfs using iText7

I have several datasheets for products. Each is a separate file. What I want to do is to use iText to generate a summary / recommended set of actions, based on answers to a webform, and then append to that all the relevant datasheets. This way, I only need to open one new tab in the browser to print all information, rather than opening one for the summary, and one for each datasheet that is needed.
So, is it possible to do this using iText?
Yes, you can merge PDFs using iText 7. E.g. look at the iText 7 Jump-Start tutorial sample C06E04_88th_Oscar_Combine, the pivotal code is:
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfMerger merger = new PdfMerger(pdf);
//Add pages from the first document
PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
merger.merge(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());
//Add pages from the second pdf document
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
merger.merge(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());
firstSourcePdf.close();
secondSourcePdf.close();
pdf.close();
(C06E04_88th_Oscar_Combine method createPdf)
Depending on your use case, you might want to use the PdfDenseMerger with its helper class PageVerticalAnalyzer instead of the PdfMerger here. It attempts to put content from multiple source pages onto a single target page and corresponds to the iText 5 PdfVeryDenseMergeTool from this answer. Due to the nature of PDF files this only works for PDFs without headers, footers, and similar artifacts.
I found a solution that works quite well.
public byte[] Combine(IEnumerable<byte[]> pdfs)
{
using (var writerMemoryStream = new MemoryStream())
{
using (var writer = new PdfWriter(writerMemoryStream))
{
using (var mergedDocument = new PdfDocument(writer))
{
var merger = new PdfMerger(mergedDocument);
foreach (var pdfBytes in pdfs)
{
using (var copyFromMemoryStream = new MemoryStream(pdfBytes))
{
using (var reader = new PdfReader(copyFromMemoryStream))
{
using (var copyFromDocument = new PdfDocument(reader))
{
merger.Merge(copyFromDocument, 1, copyFromDocument.GetNumberOfPages());
}
}
}
}
}
}
return writerMemoryStream.ToArray();
}
}
Use
DirectoryInfo d = new DirectoryInfo(INPUT_FOLDER);
var pdfList = new List<byte[]> { };
foreach (var file in d.GetFiles("*.pdf"))
{
pdfList.Add(File.ReadAllBytes(file.FullName));
}
File.WriteAllBytes(OUTPUT_FOLDER + "\\merged.pdf", Combine(pdfList));
Autor:
https://www.nikouusitalo.com/blog/combining-pdf-documents-using-itext7-and-c/
If you want to add two array of bytes and return one array of bytes as PDF/A
public static byte[] mergePDF(byte [] first, byte [] second) throws IOException {
// Initialize PDF writer
ByteArrayOutputStream arrayOutputStream = new ByteArrayOutputStream();
PdfWriter writer = new PdfWriter(arrayOutputStream);
// Initialize PDF document
PdfADocument pdf = new PdfADocument(writer, PdfAConformanceLevel.PDF_A_1B, new PdfOutputIntent("Custom", "",
"https://www.color.org", "sRGB IEC61966-2.1", new FileInputStream("sRGB_CS_profile.icm")));
PdfMerger merger = new PdfMerger(pdf);
//Add pages from the first document
PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(new ByteArrayInputStream(first)));
merger.merge(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());
//Add pages from the second pdf document
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(new ByteArrayInputStream(second)));
merger.merge(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());
firstSourcePdf.close();
secondSourcePdf.close();
writer.close();
pdf.close();
return arrayOutputStream.toByteArray();
}
The question doesn't specify the language, so I'm adding an answer using C#; this works for me. I'm creating three separate but related PDFs then combining them into one.
After creating the three separate PDF docs and adding data to them, I combine them this way:
PdfDocument pdfCombined = new PdfDocument(new PdfWriter(destCombined));
PdfMerger merger = new PdfMerger(pdfCombined);
PdfDocument pdfReaderExecSumm = new PdfDocument(new PdfReader(destExecSumm));
merger.Merge(pdfReaderExecSumm, 1, pdfReaderExecSumm.GetNumberOfPages());
PdfDocument pdfReaderPhrases = new PdfDocument(new PdfReader(destPhrases));
merger.Merge(pdfReaderPhrases, 1, pdfReaderPhrases.GetNumberOfPages());
PdfDocument pdfReaderUncommonWords = new PdfDocument(new PdfReader(destUncommonWords));
merger.Merge(pdfReaderUncommonWords, 1, pdfReaderUncommonWords.GetNumberOfPages());
pdfCombined.Close();
So the combined PDF is a PDFWriter type of PdfDocument, and the merged pieces parts are PdfReader types of PdfDocuments, and the PdfMerger is the glue that binds it all together.
Here is the minimum C# code needed to merge file1.pdf into file2.pdf creating new merged.pdf:
var path = #"C:\Temp\";
var src0 = System.IO.Path.Combine(path, "merged.pdf");
var wtr0 = new PdfWriter(src0);
var pdf0 = new PdfDocument(wtr0);
var src1 = System.IO.Path.Combine(path, "file1.pdf");
var fi1 = new FileInfo(src1);
var rdr1= new PdfReader(fi1);
var pdf1 = new PdfDocument(rdr1);
var src2 = System.IO.Path.Combine(path, "file2.pdf");
var fi2 = new FileInfo(src2);
var rdr2 = new PdfReader(fi2);
var pdf2 = new PdfDocument(rdr2);
var merger = new PdfMerger(pdf0);
merger.Merge(pdf1, 1, pdf1.GetNumberOfPages());
merger.Merge(pdf2, 1, pdf2.GetNumberOfPages());
merger.Close();
pdf0.Close();
Here is a VB.NET solution using open source iText7 that can merge multiple PDF files to an output file.
Imports iText.Kernel.Pdf
Imports iText.Kernel.Utils
Public Function Merge_PDF_Files(ByVal input_files As List(Of String), ByVal output_file As String) As Boolean
Dim Input_Document As PdfDocument = Nothing
Dim Output_Document As PdfDocument = Nothing
Dim Merger As PdfMerger
Try
Output_Document = New iText.Kernel.Pdf.PdfDocument(New iText.Kernel.Pdf.PdfWriter(output_file))
'Create the output file (Document) from a Merger stream'
Merger = New PdfMerger(Output_Document)
'Merge each input PDF file to the output document'
For Each file As String In input_files
Input_Document = New PdfDocument(New PdfReader(file))
Merger.Merge(Input_Document, 1, Input_Document.GetNumberOfPages())
Input_Document.Close()
Next
Output_Document.Close()
Return True
Catch ex As Exception
'catch Exception if needed'
If Input_Document IsNot Nothing Then Input_Document.Close()
If Output_Document IsNot Nothing Then Output_Document.Close()
File.Delete(output_file)
Return False
End Try
End Function
USAGE EXAMPLE:
Dim success as boolean = false
Dim input_files_list As New List(Of String)
input_files_list.Add("c:\input_PDF1.pdf")
input_files_list.Add("c:\input_PDF2.pdf")
input_files_list.Add("c:\input_PDF3.pdf")
success = Merge_PDF_Files(input_files_list, "c:\output_PDF.pdf")
'Optional: handling errors'
if success then
'Files merged'
else
'Error merging files'
end if

Streaming OpenXML result

By using OpenXML to manipulating a Word document (as a template), the server application saves the new content as a temporary file and then sends it to user to download.
The question is how to make these content ready to download without saving it on the server as a temporary file? Is it possible to save OpenXML result as a byte[] or Stream instead of saving it as a file?
Using this page:
OpenXML file download without temporary file
I changed my code to this one:
byte[] result = null;
byte[] templateBytes = System.IO.File.ReadAllBytes(wordTemplate);
using (MemoryStream templateStream = new MemoryStream())
{
templateStream.Write(templateBytes, 0, (int)templateBytes.Length);
using (WordprocessingDocument doc = WordprocessingDocument.Open(templateStream, true))
{
MainDocumentPart mainPart = doc.MainDocumentPart;
...
mainPart.Document.Save();
templateStream.Position = 0;
using (MemoryStream memoryStream = new MemoryStream())
{
templateStream.CopyTo(memoryStream);
result = memoryStream.ToArray();
}
}
}
You can create the WordprocessingDocument and then use the Save() method to save it to a Stream.
http://msdn.microsoft.com/en-us/library/cc882497
var memoryStream = new MemoryStream();
document.Clone(memoryStream);