iText7: Error at file pointer when merging two pdfs

We are in the last steps of evaluating iText7. We use iText 7.1.0 and html2pdf 2.0.0.
What we do: we send a json_encoded collection with pdf-data (which includes html for header, body and footer) to our Java app. There we iterate over the collection, create a byteArrayOutputStream for each pdf-data element and merge them together. We then send the results to a script which echoes it to e.g. a browser. Although the pdf is displayed correctly, we encounter errors while creating it:
com.itextpdf.io.IOException: Error at file pointer 226,416.
...
Caused by: com.itextpdf.io.IOException: xref subsection not found.
... 73 common frames omitted
If we create only one part of the collection, no error is thrown.
Iterate over collection and merge:
@RequestMapping(value = "/pdf", method = RequestMethod.POST, produces = MediaType.APPLICATION_PDF_VALUE)
public byte[] index(@RequestBody PDFDataModelCollection elements, Model model) throws IOException {
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    PdfWriter writer = new PdfWriter(byteArrayOutputStream);
    try (PdfDocument resultDoc = new PdfDocument(writer)) {
        for (PDFDataModel pdfDataModel : elements.getElements()) {
            PdfReader reader = new PdfReader(new ByteArrayInputStream(creationService.createDatasheet(pdfDataModel)));
            try (PdfDocument sourceDoc = new PdfDocument(reader)) {
                int n = sourceDoc.getNumberOfPages(); //<-- IOException on second iteration
                for (int i = 1; i <= n; i++) {
                    PdfPage page = sourceDoc.getPage(i).copyTo(resultDoc);
                    resultDoc.addPage(page);
                }
            }
        }
    }
    return byteArrayOutputStream.toByteArray(); //outputs the final pdf
}
Creation of part:
public byte[] createDatasheet(PDFDataModel pdfDataModel) throws IOException {
    PdfWriter writer = new PdfWriter(byteArrayOutputStream);
    //Initialize PDF document
    PdfDocument pdfDoc = new PdfDocument(writer);
    try (Document document = new Document(pdfDoc)) {
        //header, footer, etc
        //body
        for (IElement element : HtmlConverter.convertToElements(pdfDataModel.getBody(), this.props)) {
            document.add((IBlockElement) element);
        }
        footer.writeTotalNumberOnPages(pdfDoc);
    }
    return byteArrayOutputStream.toByteArray();
}
We are grateful for any suggestion.

In createDatasheet you appear to re-use some byteArrayOutputStream without clearing it first.
In the first iteration, therefore, everything works as desired: at the end of createDatasheet the stream contains a single PDF file.
In the second iteration, though, you have two PDF files in that byteArrayOutputStream, one after the other. This concatenation does not form a valid single PDF.
Thus, byteArrayOutputStream.toByteArray() returns something broken.
To fix this, either make the byteArrayOutputStream local to createDatasheet and create a new instance every time or alternatively reset byteArrayOutputStream at the start of createDatasheet:
public byte[] createDatasheet(PDFDataModel pdfDataModel) throws IOException {
byteArrayOutputStream.reset();
PdfWriter writer = new PdfWriter(byteArrayOutputStream);
[...]
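
As a rough illustration of the failure mode described above, the following plain-Java sketch leaves iText out entirely and uses short strings in place of PDF bytes. The class and method names (SharedBufferSketch, createWithoutReset, createWithReset, createWithLocalBuffer) are made up for this example and are not taken from the question's code; the only assumption is that createDatasheet writes into a field that outlives the call.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the service in the question; no iText involved.
class SharedBufferSketch {
    // A buffer stored in a field survives between calls.
    private final ByteArrayOutputStream shared = new ByteArrayOutputStream();

    // Problematic variant: each call appends to what earlier calls wrote.
    byte[] createWithoutReset(String content) {
        byte[] bytes = content.getBytes(StandardCharsets.US_ASCII);
        shared.write(bytes, 0, bytes.length);
        return shared.toByteArray();
    }

    // Fix 1: discard earlier content before writing.
    byte[] createWithReset(String content) {
        shared.reset();
        byte[] bytes = content.getBytes(StandardCharsets.US_ASCII);
        shared.write(bytes, 0, bytes.length);
        return shared.toByteArray();
    }

    // Fix 2: use a fresh local buffer on every call.
    static byte[] createWithLocalBuffer(String content) {
        ByteArrayOutputStream local = new ByteArrayOutputStream();
        byte[] bytes = content.getBytes(StandardCharsets.US_ASCII);
        local.write(bytes, 0, bytes.length);
        return local.toByteArray();
    }
}
```

Calling createWithoutReset twice with four-character strings returns 4 bytes and then 8 bytes, which mirrors the second datasheet arriving with the first one still in front of it. The local-buffer variant is usually the safer choice in a web application, since a shared field could also be written to by concurrent requests.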


iText7 Merge pdf annotations on a new pdf document

I have multiple copies of a .pdf document that are commented by different users. I would like to merge all these comments into a new pdf "merged".
I wrote this sub inside a class called document with properties "path" and "directory".
Public Sub MergeComments(ByVal pdfDocuments As String())
Dim oSavePath As String = Directory & "\" & FileName & "_Merged.pdf"
Dim oPDFdocument As New iText.Kernel.Pdf.PdfDocument(New PdfReader(Path),
New PdfWriter(New IO.FileStream(oSavePath, IO.FileMode.Create)))
For Each oFile As String In pdfDocuments
Dim oSecundairyPDFdocument As New iText.Kernel.Pdf.PdfDocument(New PdfReader(oFile))
Dim oAnnotations As New PDFannotations
For i As Integer = 1 To oSecundairyPDFdocument.GetNumberOfPages
Dim pdfPage As PdfPage = oSecundairyPDFdocument.GetPage(i)
For Each oAnnotation As Annot.PdfAnnotation In pdfPage.GetAnnotations()
oPDFdocument.GetPage(i).AddAnnotation(oAnnotation)
Next
Next
Next
oPDFdocument.Close()
End Sub
This code results in an exception that I am failing to solve.
iText.Kernel.PdfException: 'Pdf indirect object belongs to other PDF document. Copy object to current pdf document.'
What do I need to change in order to perform this task? Or am I completely off with my code block?
You need to explicitly copy the underlying PDF object to the destination document. After that you will be easily able to add that object to the list of page annotations.
Instead of adding the annotation directly:
oPDFdocument.GetPage(i).AddAnnotation(oAnnotation)
Copy the object to the destination document first, wrap it in a PdfAnnotation using the makeAnnotation method, and then add it as usual. The code is in Java, but it should be easy to convert to VB:
PdfObject annotObject = oAnnotation.getPdfObject().copyTo(pdfDocument);
pdfDocument.getPage(i).addAnnotation(PdfAnnotation.makeAnnotation(annotObject));
Here is working Java code that copies annotations from one document to another using the copyTo method.
PdfReader reader = new PdfReader(new RandomAccessSourceFactory().createBestSource(sourceFileName), null);
PdfDocument document = new PdfDocument(reader);
PdfReader toMergeReader = new PdfReader(new RandomAccessSourceFactory().createBestSource(targetFileName), null);
PdfDocument toMergeDocument = new PdfDocument(toMergeReader);
PdfWriter writer = new PdfWriter(targetFileName + "_MergedVersion.pdf");
PdfDocument writeDocument = new PdfDocument(writer);
int pageCount = toMergeDocument.getNumberOfPages();
for (int i = 1; i <= pageCount; i++) {
    PdfPage page = document.getPage(i);
    writeDocument.addPage(page.copyTo(writeDocument));
    PdfPage pdfPage = toMergeDocument.getPage(i);
    List<PdfAnnotation> pageAnnots = pdfPage.getAnnotations();
    if (pageAnnots != null) {
        for (PdfAnnotation pdfAnnotation : pageAnnots) {
            PdfObject annotObject = pdfAnnotation.getPdfObject().copyTo(writeDocument);
            writeDocument.getPage(i).addAnnotation(PdfAnnotation.makeAnnotation(annotObject));
        }
    }
}
reader.close();
toMergeReader.close();
toMergeDocument.close();
document.close();
writeDocument.close();
writer.close();

Is it possible to merge several pdfs using iText7

I have several datasheets for products. Each is a separate file. What I want to do is to use iText to generate a summary / recommended set of actions, based on answers to a webform, and then append to that all the relevant datasheets. This way, I only need to open one new tab in the browser to print all information, rather than opening one for the summary, and one for each datasheet that is needed.
So, is it possible to do this using iText?
Yes, you can merge PDFs using iText 7. For example, see the iText 7 Jump-Start tutorial sample C06E04_88th_Oscar_Combine; the pivotal code is:
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfMerger merger = new PdfMerger(pdf);
//Add pages from the first document
PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
merger.merge(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());
//Add pages from the second pdf document
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
merger.merge(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());
firstSourcePdf.close();
secondSourcePdf.close();
pdf.close();
(C06E04_88th_Oscar_Combine method createPdf)
Depending on your use case, you might want to use the PdfDenseMerger with its helper class PageVerticalAnalyzer instead of the PdfMerger here. It attempts to put content from multiple source pages onto a single target page and corresponds to the iText 5 PdfVeryDenseMergeTool from this answer. Due to the nature of PDF files this only works for PDFs without headers, footers, and similar artifacts.
I found a solution that works quite well.
public byte[] Combine(IEnumerable<byte[]> pdfs)
{
    using (var writerMemoryStream = new MemoryStream())
    {
        using (var writer = new PdfWriter(writerMemoryStream))
        {
            using (var mergedDocument = new PdfDocument(writer))
            {
                var merger = new PdfMerger(mergedDocument);
                foreach (var pdfBytes in pdfs)
                {
                    using (var copyFromMemoryStream = new MemoryStream(pdfBytes))
                    {
                        using (var reader = new PdfReader(copyFromMemoryStream))
                        {
                            using (var copyFromDocument = new PdfDocument(reader))
                            {
                                merger.Merge(copyFromDocument, 1, copyFromDocument.GetNumberOfPages());
                            }
                        }
                    }
                }
            }
        }
        return writerMemoryStream.ToArray();
    }
}
Use
DirectoryInfo d = new DirectoryInfo(INPUT_FOLDER);
var pdfList = new List<byte[]> { };
foreach (var file in d.GetFiles("*.pdf"))
{
pdfList.Add(File.ReadAllBytes(file.FullName));
}
File.WriteAllBytes(OUTPUT_FOLDER + "\\merged.pdf", Combine(pdfList));
Author:
https://www.nikouusitalo.com/blog/combining-pdf-documents-using-itext7-and-c/
If you want to merge two byte arrays and return the result as a single PDF/A byte array:
public static byte[] mergePDF(byte[] first, byte[] second) throws IOException {
    // Initialize PDF writer
    ByteArrayOutputStream arrayOutputStream = new ByteArrayOutputStream();
    PdfWriter writer = new PdfWriter(arrayOutputStream);
    // Initialize PDF document
    PdfADocument pdf = new PdfADocument(writer, PdfAConformanceLevel.PDF_A_1B,
            new PdfOutputIntent("Custom", "", "https://www.color.org", "sRGB IEC61966-2.1",
                    new FileInputStream("sRGB_CS_profile.icm")));
    PdfMerger merger = new PdfMerger(pdf);
    //Add pages from the first document
    PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(new ByteArrayInputStream(first)));
    merger.merge(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());
    //Add pages from the second pdf document
    PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(new ByteArrayInputStream(second)));
    merger.merge(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());
    firstSourcePdf.close();
    secondSourcePdf.close();
    // Close the document before the writer so the PDF is finalized first
    pdf.close();
    writer.close();
    return arrayOutputStream.toByteArray();
}
The question doesn't specify the language, so I'm adding an answer using C#; this works for me. I'm creating three separate but related PDFs then combining them into one.
After creating the three separate PDF docs and adding data to them, I combine them this way:
PdfDocument pdfCombined = new PdfDocument(new PdfWriter(destCombined));
PdfMerger merger = new PdfMerger(pdfCombined);
PdfDocument pdfReaderExecSumm = new PdfDocument(new PdfReader(destExecSumm));
merger.Merge(pdfReaderExecSumm, 1, pdfReaderExecSumm.GetNumberOfPages());
PdfDocument pdfReaderPhrases = new PdfDocument(new PdfReader(destPhrases));
merger.Merge(pdfReaderPhrases, 1, pdfReaderPhrases.GetNumberOfPages());
PdfDocument pdfReaderUncommonWords = new PdfDocument(new PdfReader(destUncommonWords));
merger.Merge(pdfReaderUncommonWords, 1, pdfReaderUncommonWords.GetNumberOfPages());
pdfCombined.Close();
So the combined PDF is a PdfDocument backed by a PdfWriter, the merged parts are PdfDocuments backed by PdfReaders, and the PdfMerger is the glue that binds them all together.
Here is the minimum C# code needed to merge file1.pdf into file2.pdf creating new merged.pdf:
var path = @"C:\Temp\";
var src0 = System.IO.Path.Combine(path, "merged.pdf");
var wtr0 = new PdfWriter(src0);
var pdf0 = new PdfDocument(wtr0);
var src1 = System.IO.Path.Combine(path, "file1.pdf");
var fi1 = new FileInfo(src1);
var rdr1= new PdfReader(fi1);
var pdf1 = new PdfDocument(rdr1);
var src2 = System.IO.Path.Combine(path, "file2.pdf");
var fi2 = new FileInfo(src2);
var rdr2 = new PdfReader(fi2);
var pdf2 = new PdfDocument(rdr2);
var merger = new PdfMerger(pdf0);
merger.Merge(pdf1, 1, pdf1.GetNumberOfPages());
merger.Merge(pdf2, 1, pdf2.GetNumberOfPages());
merger.Close();
pdf0.Close();
Here is a VB.NET solution using open source iText7 that can merge multiple PDF files to an output file.
Imports iText.Kernel.Pdf
Imports iText.Kernel.Utils
Public Function Merge_PDF_Files(ByVal input_files As List(Of String), ByVal output_file As String) As Boolean
Dim Input_Document As PdfDocument = Nothing
Dim Output_Document As PdfDocument = Nothing
Dim Merger As PdfMerger
Try
Output_Document = New iText.Kernel.Pdf.PdfDocument(New iText.Kernel.Pdf.PdfWriter(output_file))
'Create the output file (Document) from a Merger stream'
Merger = New PdfMerger(Output_Document)
'Merge each input PDF file to the output document'
For Each file As String In input_files
Input_Document = New PdfDocument(New PdfReader(file))
Merger.Merge(Input_Document, 1, Input_Document.GetNumberOfPages())
Input_Document.Close()
Next
Output_Document.Close()
Return True
Catch ex As Exception
'catch Exception if needed'
If Input_Document IsNot Nothing Then Input_Document.Close()
If Output_Document IsNot Nothing Then Output_Document.Close()
File.Delete(output_file)
Return False
End Try
End Function
USAGE EXAMPLE:
Dim success as boolean = false
Dim input_files_list As New List(Of String)
input_files_list.Add("c:\input_PDF1.pdf")
input_files_list.Add("c:\input_PDF2.pdf")
input_files_list.Add("c:\input_PDF3.pdf")
success = Merge_PDF_Files(input_files_list, "c:\output_PDF.pdf")
'Optional: handling errors'
if success then
'Files merged'
else
'Error merging files'
end if

How to fill an XFA form using iText so it is Foxit Reader compatible

I used examples available on the web to create an application that can get the XML structure of an XFA form and then set it back filled. The important code looks like this:
public void readData(String src, String dest)
throws IOException, ParserConfigurationException, SAXException,
TransformerFactoryConfigurationError, TransformerException {
FileOutputStream os = new FileOutputStream(dest);
PdfReader reader = new PdfReader(src);
XfaForm xfa = new XfaForm(reader);
Node node = xfa.getDatasetsNode();
NodeList list = node.getChildNodes();
for (int i = 0; i < list.getLength(); i++) {
if ("data".equals(list.item(i).getLocalName())) {
node = list.item(i);
break;
}
}
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "yes");
tf.transform(new DOMSource(node), new StreamResult(os));
reader.close();
}
public void fillPdfWithXmlData(String src, String xml, String dest)
throws IOException, DocumentException {
PdfReader.unethicalreading = true;
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest), '\0', true);
AcroFields form = stamper.getAcroFields();
XfaForm xfa = form.getXfa();
xfa.fillXfaForm(new FileInputStream(xml));
stamper.close();
reader.close();
}
When I use it to fill this form: http://www.vzp.cz/uploads/document/tiskopisy-pro-zamestnavatele-hromadne-oznameni-zamestnavatele-verze-2-pdf-56-kb.pdf it works fine and I can see the filled form in Acrobat Reader. However, if I open the document in Foxit Reader I see a blank form (tested in latest version and 5.x version).
I tried to play a little bit with it and got these XfaForm(...).getDomDocument() data:
Filled by Acrobat Reader: http://pastebin.com/kXKyh9EM
Filled by Foxit Reader: http://pastebin.com/tiZ7EmfE
Filled by iText: http://pastebin.com/tTKLMERC
Filled by Foxit Reader after a fill by iText: http://pastebin.com/Uuq0jS4b
The field which was filled is . Is it possible to use iText in a way that it works even with Foxit Reader (and the XFA signature stays?)
In your Filled by iText example there's a superfluous <xfa:data> element:
<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
<xfa:data>
<xfa:data>
<HOZ>
<!-- rest of your data here -->
</HOZ>
</xfa:data>
</xfa:data>
<!-- data description -->
</xfa:datasets>
This is because the fillXfaForm() method of XfaForm expects the XML data without <xfa:data> as the root element. So your XML data should just look like:
<HOZ>
<!-- rest of your data here -->
</HOZ>
I see that your readData() method extracts the existing form data including the <xfa:data> element:
<xfa:data>
<HOZ>
<!-- rest of your data here -->
</HOZ>
</xfa:data>
Stripping the outer element should fix your problem. For example:
XfaForm xfa = new XfaForm(reader);
Node node = xfa.getDatasetsNode();
NodeList list = node.getChildNodes();
for (int i = 0; i < list.getLength(); i++) {
if ("data".equals(list.item(i).getLocalName())) {
node = list.item(i);
break;
}
}
// strip <xfa:data>
node = node.getFirstChild();
// Transformer code here
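
One caveat with getFirstChild(): if the datasets XML is indented, the first child of <xfa:data> may be a whitespace text node rather than the data element. The sketch below, which uses only the JDK's DOM and transformer APIs, picks the first element child instead and serializes it. The class and method names (XfaDataUnwrapSketch, firstElementChild, unwrap) are hypothetical, and the input is a plain XML string rather than a node obtained from XfaForm, so treat it as an illustration of the unwrapping step only.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper; not part of iText.
class XfaDataUnwrapSketch {

    // Returns the first child that is an element, skipping text and comments.
    static Node firstElementChild(Node parent) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return child;
            }
        }
        return null;
    }

    // Parses XML whose root is the <xfa:data> wrapper and serializes only
    // the element inside it.
    static String unwrap(String wrappedXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(wrappedXml.getBytes(StandardCharsets.UTF_8)));
            Node payload = firstElementChild(doc.getDocumentElement());
            if (payload == null) {
                throw new IllegalArgumentException("wrapper element has no element child");
            }
            Transformer tf = TransformerFactory.newInstance().newTransformer();
            tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            tf.transform(new DOMSource(payload), new StreamResult(out));
            return out.toString();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("could not unwrap XFA data", e);
        }
    }
}
```

In readData() the same idea would amount to replacing node.getFirstChild() with a loop like firstElementChild(node) before handing the node to the Transformer.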

C# - System.ObjectDisposedException in MemoryStream for Capacity, Length, and Position Properties

When I execute the code that renders a PDF file from an HTML stream, it appears to run fine, but no PDF is generated.
When debugging, everything seems to work until I inspect the MemoryStream object's properties, where Capacity, Length, and Position each show a System.ObjectDisposedException.
This is the code:
public partial class WriteNotes : System.Web.UI.Page
{
...
protected override void Render(System.Web.UI.HtmlTextWriter writer)
{
...
using (System.IO.MemoryStream printStream = new System.IO.MemoryStream())
using (System.IO.StreamWriter printStreamWriter = new System.IO.StreamWriter(printStream))
using (System.Web.UI.HtmlTextWriter printWriter = new System.Web.UI.HtmlTextWriter(printStreamWriter))
{
base.Render(printWriter);
printWriter.Flush();
using (System.IO.StreamReader myStreamReader = new System.IO.StreamReader(printStream))
{
myStreamReader.BaseStream.Position = 0;
Document pdfDocument = pdfConverter.GetPdfDocumentObjectFromHtmlStream(myStreamReader.BaseStream, System.Text.Encoding.Default, HttpContext.Current.Request.Url.ToString().Replace(HttpContext.Current.Request.Url.PathAndQuery, "/"));
HttpContext.Current.Response.Clear();
HttpContext.Current.Response.ContentType = "application/pdf";
pdfDocument.Save(HttpContext.Current.Response.OutputStream);
HttpContext.Current.Response.Flush();
HttpContext.Current.Response.End();
}
}
}
...
}
Executing the following line of code produces the described exception for MemoryStream.
Document pdfDocument = pdfConverter.GetPdfDocumentObjectFromHtmlStream(myStreamReader.BaseStream, System.Text.Encoding.Default, HttpContext.Current.Request.Url.ToString().Replace(HttpContext.Current.Request.Url.PathAndQuery, "/"));
The same kind of exception happens if I do not use Disposable Pattern.
The same code is in production and works fine.
What can be the reason?
Sure enough, found it. Read the documentation here:
PdfConverter.GetPdfDocumentObjectFromHtmlStream(htmlStream, streamEncoding)
In a nutshell, it says to close the document object, which you are not doing in your code above.
PdfDocument.Close()
That page states it too: always call Close on the document object once you are done with it. Try this updated code.
Next time, also mention the library you are using if you know it. Some answers can be found right in the documentation (not all the time, of course).
public partial class WriteNotes : System.Web.UI.Page
{
...
protected override void Render(System.Web.UI.HtmlTextWriter writer)
{
...
using (System.IO.MemoryStream printStream = new System.IO.MemoryStream())
using (System.IO.StreamWriter printStreamWriter = new System.IO.StreamWriter(printStream))
using (System.Web.UI.HtmlTextWriter printWriter = new System.Web.UI.HtmlTextWriter(printStreamWriter))
{
base.Render(printWriter);
printWriter.Flush();
using (System.IO.StreamReader myStreamReader = new System.IO.StreamReader(printStream))
{
myStreamReader.BaseStream.Position = 0;
Document pdfDocument = pdfConverter.GetPdfDocumentObjectFromHtmlStream(myStreamReader.BaseStream, System.Text.Encoding.Default, HttpContext.Current.Request.Url.ToString().Replace(HttpContext.Current.Request.Url.PathAndQuery, "/"));
HttpContext.Current.Response.Clear();
HttpContext.Current.Response.ContentType = "application/pdf";
pdfDocument.Save(HttpContext.Current.Response.OutputStream);
pdfDocument.Close(); // add this line and see what happens
HttpContext.Current.Response.Flush();
HttpContext.Current.Response.End();
}
}
}
...
}

Creating a PDF back page using iText

I am currently using iText (Java) to create a PDF document.
This PDF document is meant to be a business card for printing.
This is part of my code:
public void write() throws DocumentException, FileNotFoundException, IOException {
String extension = ".pdf";
String file = "testerPDF";
String filename = file + extension;
Document doku = new Document(PageSize.A4, 10,20,50,150);
PdfWriter writer = PdfWriter.getInstance(doku, new FileOutputStream(new File(filename)));
doku.open();
Phrase textblock = new Phrase();
// now iterate over the vector
for(int i = 0; i < vector.size(); i++) {
Chunk chunk=new Chunk((String)vector.get(i));
textblock.add(chunk);
}
doku.add(new Paragraph(textblock));
doku.close();
writer.close();
}
I now also need a way to write a back page for this PDF, so that the printed document looks like a business card on paper.