iText fails to properly merge signatures in PDF files


I have a problem with merging PDF files using iText. I am using the PdfCopyFields class to do the merging, yet the signatures of the existing PDF files are not merged properly. It seems that some data in the dictionary of the merged PDF file is incorrect.
To be specific:
I have 2 documents I want to merge. Each document has 2 pages, and each page in each document is signed with a different signature (in each signature I put a different reason code in order to be able to tell the signatures apart visually).
The merged PDF file contains:
4 PDF pages (this is correct)
All 4 pages have a signature
The signatures on pages 3 and 4 are NOT the original signatures; they are the same signatures as those on pages 1 and 2. To verify that the signatures are different I put a unique 'Reason' code in each signature.
Although the resulting PDF has 4 signatures (which I know are PDF fields), it seems that these signature fields reference the signature data of the wrong document.
Moreover, when I open the merged PDF file with a text editor and look at the 'header' of the PDF file, I find only 2 Signature entries (instead of 4). This means that the signatures on the pages reference only those 2 signatures, hence the mix-up.
Thank you
Costas
PS: I can post sample PDF files to reproduce the error, but the simplest ones will do the job (I created 2 PDF files with MS Word and stamped each page separately).

Concerning the observation
The signature of pages 3 and 4 are NOT the original signatures, but they are the same signatures as those of page 1 and 2. To verify that the signatures are different i put a unique 'Reason' code in the signature.
That might happen if the names of the signature fields in the documents coincide. Multiple PDF fields with the same name are considered multiple visualizations of the same field. The merge process may throw away duplicate values in that case.
I'm not sure, though, whether that is the case for your files. If you want to know, please share them.
...
Having inspected the sample files it becomes clear that indeed the problem is caused by the identical signature field names in the merged documents:
Doc1-signed.pdf has
a signed signature field Signature1 (field and widget merged) on page 1 with Reason Doc1-Page1 in its value and
a signed signature field Signature2 (field and widget merged) on page 2 with Reason Doc1-Page2 in its value.
Doc2-signed.pdf has
a signed signature field Signature1 (field and widget merged) on page 1 with Reason Doc2-Page1 in its value and
a signed signature field Signature2 (field and widget merged) on page 2 with Reason Doc2-Page2 in its value.
Merging results in MERGED-PDF.pdf, which has
a signed signature field Signature1 with Reason Doc1-Page1 in its value with explicit widgets on pages 1 and 3 and
a signed signature field Signature2 with Reason Doc1-Page2 in its value with explicit widgets on pages 2 and 4.
Because the whole PDF is considered a single form, a form field name can only have a single associated value.
Thus, merging multiple fields with the same name from the two source documents into a single field with multiple widgets (as PdfCopyFields seems to do) is a sensible action to take.
I tried using an online pdf merge service and the signature fields were merged properly
By merged properly I assume you mean they still had their original, differing values. This in turn indicates that that service did not merge the fields as described above.
But this is not more proper than what PdfCopyFields does, it is dumber because the value of the field Signature1 now is unclear, just like the value of Signature2.
The proper thing to do if you want to keep the differing values of the source fields with duplicate names, is to rename such duplicate fields in the course of the merge. (If the online PDF merge service did this, too, it was not dumb. But you did not indicate any change in field names...)
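To make the difference concrete without relying on iText, here is a minimal sketch in plain Java. It assumes a strongly simplified model in which a form is just a map from field name to value; the class and method names are hypothetical and not part of any iText API. Merging by name keeps one value per name, while renaming with a per-document suffix first keeps all four values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only, not iText API:
// a form is a map from field name to field value.
final class FieldMergeSketch {

    // Merge forms by field name. A name can carry only one value, so the
    // first value seen for a name is kept and later ones are dropped.
    static Map<String, String> mergeByName(List<Map<String, String>> forms) {
        Map<String, String> merged = new LinkedHashMap<String, String>();
        for (Map<String, String> form : forms) {
            for (Map.Entry<String, String> field : form.entrySet()) {
                if (!merged.containsKey(field.getKey())) {
                    merged.put(field.getKey(), field.getValue());
                }
            }
        }
        return merged;
    }

    // Copy a form, appending "_<index>" to every field name.
    static Map<String, String> renameFields(Map<String, String> form, int index) {
        Map<String, String> renamed = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> field : form.entrySet()) {
            renamed.put(field.getKey() + "_" + index, field.getValue());
        }
        return renamed;
    }

    public static void main(String[] args) {
        Map<String, String> doc1 = new LinkedHashMap<String, String>();
        doc1.put("Signature1", "Doc1-Page1");
        doc1.put("Signature2", "Doc1-Page2");
        Map<String, String> doc2 = new LinkedHashMap<String, String>();
        doc2.put("Signature1", "Doc2-Page1");
        doc2.put("Signature2", "Doc2-Page2");

        // Merge as-is: the names collide, so only two values survive.
        List<Map<String, String>> plain = new ArrayList<Map<String, String>>();
        plain.add(doc1);
        plain.add(doc2);
        System.out.println(mergeByName(plain));

        // Rename per document first: all four values survive.
        List<Map<String, String>> renamed = new ArrayList<Map<String, String>>();
        renamed.add(renameFields(doc1, 1));
        renamed.add(renameFields(doc2, 2));
        System.out.println(mergeByName(renamed));
    }
}
```

The sketch only models names and values; it says nothing about widgets or about whether the signatures still validate.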
You can find sample code for a merge of documents with their fields renamed in iText in Action chapter 6 Working with existing PDFs example ConcatenateForms2.java:
PdfCopyFields copy
    = new PdfCopyFields(new FileOutputStream(RESULT));
// add a document
PdfReader reader1 = new PdfReader(renameFieldsIn(DATASHEET, 1));
copy.addDocument(reader1);
// add a document
PdfReader reader2 = new PdfReader(renameFieldsIn(DATASHEET, 2));
copy.addDocument(reader2);
// Close the PdfCopyFields object
copy.close();
reader1.close();
reader2.close();
using the helper method
private static byte[] renameFieldsIn(String datasheet, int i)
        throws IOException, DocumentException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // Create the stamper
    PdfStamper stamper = new PdfStamper(new PdfReader(datasheet), baos);
    // Get the fields
    AcroFields form = stamper.getAcroFields();
    // Loop over a copy of the key set, because renaming changes the field map
    Set<String> keys = new HashSet<String>(form.getFields().keySet());
    for (String key : keys) {
        // rename the fields
        form.renameField(key, String.format("%s_%d", key, i));
    }
    // close the stamper
    stamper.close();
    return baos.toByteArray();
}
Obviously you can tweak this by only renaming signature fields. This would merge other fields with "normal" form content but leave signatures as they are.
(remember, i don't refer to the validity of the signatures themselves, just the signature fields).
As an afterthought, flattening the signature fields before merging might be an alternative approach. The visual representation remains but the verification failure messages are gone because nothing is verified anymore.
A general remark on merging signed PDFs
Your intention
I have 2 documents i want to merge, each document has 2 pages, each page in each document is signed with a different signature
cannot be implemented without completely invalidating the signatures, at least those from all but one document. Have a look here for an introduction to integrated PDF signatures. Especially note how multiple integrated signatures in the same document work:
After a merge of two documents, you can keep the signatures from one document valid, but the added signatures of the other document only cover data from their document while after the merge they would have to cover data from both documents.
Thus, a merge is impossible without breaking at least some of the signatures.
Merging in more current iText versions
The OP uses iText version 4.2.0. In current iText versions (5.5.x) much of the form-aware logic has been moved from PdfCopyFields to PdfCopy. If you use such a version or a newer one, try using PdfCopy.
#Bruno on merging signature fields
The result of the merge above will be utterly invalid in PDF-2, not merely because of the invalid signatures themselves but because of the signatures with multiple appearances. You might want to reconsider the behavior of the Pdf*Copy* class family concerning signature fields.

Related

Migrating from itext2 to itext7

Years ago, I wrote a small app in itext2 to gather reports on a weekly basis and concatenate them into one PDF. The app used com.lowagie.text.pdf.PdfCopy to copy and merge the PDFs. And it worked fine. Performed exactly as expected.
A few weeks ago I looked into migrating the application to itext7. To that end, I used the copyPagesTo method of com.itextpdf.kernel.pdf.PdfDocument. When run on the same file set, this produces warnings like:
WARN PdfNameTree - Name "section.1" already exists in the name tree; old value will be replaced by the new one.
When I click on the link to "section.1" in the first document of the merged PDF, I am taken to "section.1" of the last document. This is not what I expected and not what happens when using the itext2 app. In the PDFs produced by itext2, if I click on the link to "section.1" of the first document in the combined PDF, I am taken to section 1 of the first document.
There is a hint in Javadocs for copyPagesTo saying
If outlines destination names are the same in different documents, all
such outlines will lead to a single location in the resultant
document. In this case iText will log a warning. This can be avoided
by renaming destinations names in the source document.
There is, however, no explanation of how this should be done. I find it odd that this should be necessary in itext7, although it wasn't in itext2.
Is there a simple way to get around this problem?
I've also tried the Sejda desktop app and it produces correct results, but I would prefer to automate the process through a batch script.

My guess is iText 2 didn't even know it might be a problem.
If iText can't deduplicate destination names, the procedure is roughly:
Follow /Catalog -> /Names -> /Dests in each document to find the destination name tree.
Deduplicate the names, by adding suffixes. Remember that a name with a suffix added might be equal to an existing name in the same or another document. Be careful!
Now you can rewrite the destination name trees. Since you have only used suffixes, you can do this in place - the lexicographic ordering of the names is unaltered so the search tree structure is not broken.
Now, rewrite destination links in each PDF for the new names. For example any dictionary entry with key /Dest, or any /D in a /GoTo action.
Now, after all this preprocessing, the files will merge without name clashes.
(I know all this because I've just implemented it for my own PDF software. It's slightly hairy stuff, but not intractable.)
If you like, I can provide a devel version of cpdf with this functionality for you to test.
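The deduplication step described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: destination names are modelled as plain strings, names are unique within each document, and the class and method names are hypothetical (they belong neither to iText nor to cpdf). The sketch covers only the choice of new names, including the check that a suffixed name does not collide with a name that already exists somewhere; rewriting the name trees and the /Dest and /D entries is not shown.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of iText or cpdf.
final class DestNameDedupSketch {

    // For each document, in order, map every original destination name to a
    // name that is unique across all documents. The first document to use a
    // name keeps it; later documents get "_1", "_2", ... appended.
    static List<Map<String, String>> dedupe(List<List<String>> namesPerDoc) {
        // All names that exist anywhere, so a suffixed name never collides
        // with an untouched original in the same or another document.
        Set<String> reserved = new HashSet<String>();
        for (List<String> names : namesPerDoc) {
            reserved.addAll(names);
        }

        Set<String> assigned = new HashSet<String>();
        List<Map<String, String>> renamesPerDoc = new ArrayList<Map<String, String>>();
        for (List<String> names : namesPerDoc) {
            Map<String, String> renames = new LinkedHashMap<String, String>();
            for (String name : names) {
                String candidate = name;
                int suffix = 1;
                // Keep trying while the candidate is already handed out, or is
                // a suffixed name that equals some existing original name.
                while (assigned.contains(candidate)
                        || (!candidate.equals(name) && reserved.contains(candidate))) {
                    candidate = name + "_" + suffix++;
                }
                assigned.add(candidate);
                renames.put(name, candidate);
            }
            renamesPerDoc.add(renames);
        }
        return renamesPerDoc;
    }

    public static void main(String[] args) {
        List<List<String>> docs = new ArrayList<List<String>>();
        List<String> first = new ArrayList<String>();
        first.add("section.1");
        first.add("section.2");
        List<String> second = new ArrayList<String>();
        second.add("section.1");
        second.add("section.1_1");
        docs.add(first);
        docs.add(second);
        System.out.println(dedupe(docs));
    }
}
```

In the usage example the second document already contains a destination named "section.1_1", so its "section.1" cannot simply become "section.1_1"; this is the clash the "Be careful!" remark warns about.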

Digitally signed PDF correctly verified by ItextSharp 5.5.12 but not by Adobe Reader DC

This link (Document) contains a digitally signed PDF that is correctly verified by IText (version 5.5.12) but not by Adobe Reader DC which issues the following message:
Error during signature verification.
Unexpected byte range values defining scope of signed data.
Details: The signature byte range is invalid
Who is correct, Adobe Reader DC or iText? Is this an iText bug?
Sample iTextSharp code used to verify the PDF's digital signatures:
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.security;

namespace ClassLibrary1
{
    public class Class1
    {
        public Boolean PDFVerify(String file)
        {
            PdfReader pdfr = new PdfReader(file);
            AcroFields af = pdfr.AcroFields;
            List<String> names = af.GetSignatureNames();
            foreach (String name in names)
            {
                PdfPKCS7 pk = af.VerifySignature(name);
                if (!pk.Verify()) return false;
            }
            return true;
        }
    }
}

The problem is that the PDF itself is broken.
The PDF has two revisions,
a first revision apparently created using "Marvell Semiconductor, Inc. -- http://www.marvell.com" (according to the Producer info dictionary entry)
an incremental update to a second revision containing the signature apparently created using iTextSharp 5.4.3.
The first revision is broken in many ways; two obvious errors are:
Invalid header; according to ISO 32000-1 the first line of a PDF file shall be a header consisting of the 5 characters %PDF- followed by a version number of the form 1.N, where N is a digit between 0 and 7. Here the header line is
%PDF-1.4 Marvell Semiconductor
The above quoted rule does not allow for more than %PDF-1.N.
Invalid cross reference table entries; according to ISO 32000-1 each entry shall be exactly 20 bytes long, including the end-of-line marker. Here each entry is only 19 bytes long, including the end-of-line marker.
There might be more, less obvious issues.
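The two rules quoted above can be checked mechanically. The following plain-Java sketch is an illustration only; the class and method names are hypothetical, and it checks just these two rules rather than validating a PDF. For the cross reference entry it assumes the layout from ISO 32000-1: a 10-digit offset, a space, a 5-digit generation number, a space, the keyword n or f, and a two-byte end-of-line sequence (space plus CR, space plus LF, or CR LF).

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical checks for illustration; not a PDF validator.
final class PdfSyntaxCheckSketch {

    // Strict reading of the header rule: exactly "%PDF-1.N" with N in 0..7.
    private static final Pattern HEADER = Pattern.compile("%PDF-1\\.[0-7]");

    // 10-digit offset, space, 5-digit generation, space, n or f, 2-byte EOL.
    private static final Pattern XREF_ENTRY =
            Pattern.compile("\\d{10} \\d{5} [nf]( \\r| \\n|\\r\\n)");

    // firstLine is the header line without its end-of-line marker.
    static boolean isStrictHeader(String firstLine) {
        return HEADER.matcher(firstLine).matches();
    }

    // entry holds the raw bytes of one cross reference table entry,
    // including its end-of-line marker.
    static boolean isValidXrefEntry(byte[] entry) {
        if (entry.length != 20) {
            return false;
        }
        String text = new String(entry, StandardCharsets.ISO_8859_1);
        return XREF_ENTRY.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStrictHeader("%PDF-1.4"));
        System.out.println(isStrictHeader("%PDF-1.4 Marvell Semiconductor"));
        byte[] shortEntry = "0000000000 65535 f\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidXrefEntry(shortEntry));
    }
}
```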
There are no obvious errors in the incremental update to the second revision.
When Adobe Reader opens this file, it recognizes that it is broken, creates a repaired version internally, and from now on uses this repaired version. When the signature is verified, it is verified in this repaired version. Such repairs, though, obviously will change the hash of the PDF and most likely will move the position of the signature. The signature byte range is expected to be the entire file, including the signature dictionary but excluding the signature value itself. If the repair moved the embedded signature, this requirement is not fulfilled anymore, resulting in the observed error message.
When iText opens this file, it probably also recognizes the errors but it merely creates its internal cross reference table and for every purpose besides cross reference lookups, it still works with the original file. Thus, the hash is correct and everything is located where it belongs, resulting in a verification success.
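The byte range requirement mentioned above can be illustrated with a small plain-Java sketch. It assumes the usual four-number ByteRange layout (offset and length of the part before the signature value, then offset and length of the part after it), and it is a simplification: the class and method names are hypothetical, and this is neither Adobe Reader's nor iText's actual check. In particular it does not verify that the gap between the two ranges is exactly the signature value.

```java
// Hypothetical check for illustration; real validators do more than this.
final class ByteRangeSketch {

    // byteRange = { offset1, length1, offset2, length2 }.
    // The signed ranges span the entire file if the first range starts at
    // byte 0, the second range starts after the first one ends (the gap is
    // where the signature value sits), and the second range ends at the
    // last byte of the file.
    static boolean spansWholeFile(long[] byteRange, long fileLength) {
        if (byteRange == null || byteRange.length != 4) {
            return false;
        }
        long firstEnd = byteRange[0] + byteRange[1];
        long secondEnd = byteRange[2] + byteRange[3];
        return byteRange[0] == 0
                && byteRange[1] > 0
                && byteRange[2] > firstEnd
                && secondEnd == fileLength;
    }

    public static void main(String[] args) {
        long[] byteRange = { 0, 1000, 9194, 500 };
        // Original file: the ranges end exactly at the end of the file.
        System.out.println(spansWholeFile(byteRange, 9694));
        // A repaired copy with a different length no longer matches.
        System.out.println(spansWholeFile(byteRange, 9700));
    }
}
```

The numbers in the usage example are made up; they only show that the same ByteRange values stop matching as soon as a repair changes offsets or the file length.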

iText cannot detect that a signed PDF document was edited by Nitro Pro 10/11 [duplicate]

I used Nitro Pro 10/11 to edit a signed PDF document.
Adobe Reader recognizes that the document's content has been modified, but the integrity check by iText (V5.5.6/V7.0.2) succeeds.
How can I check the integrity correctly using iText?

iText offers an API to validate each integrated signature and to check whether it covers the whole document. It does not check, though, whether changes in incremental updates are allowed changes.
Some background
PDFs can be changed by a mechanism called incremental updates. This mechanism only appends to the file leaving the original bytes unchanged. With each such incremental update a new revision of the file is added to the file.
Integrated PDF signatures sign the complete revision of the document in which they were added to the file with the obvious exception of the actual signature bytes.
Thus, a signature of a former revision still correctly signs its byte ranges even if a later revision completely changes how the PDF is displayed.
As in common signing use cases content someone signed should not be arbitrarily changed, the PDF specification only considers very few types of changes in incremental updates to signed revisions valid, cf. this answer on stack overflow and the documents referenced from there.
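The append-only nature of incremental updates can be illustrated with a rough plain-Java sketch. It rests on two assumptions that do not hold for every file: that each revision ends with an %%EOF marker, and that the marker appears nowhere else (linearized files and stream contents can contain additional markers). The class and method names are hypothetical; iText determines revisions differently.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical heuristics for illustration only.
final class IncrementalUpdateSketch {

    // Count "%%EOF" markers as a rough estimate of the number of revisions.
    static int countEofMarkers(byte[] pdf) {
        byte[] marker = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        int count = 0;
        for (int i = 0; i + marker.length <= pdf.length; i++) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (pdf[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
                i += marker.length - 1; // continue after this marker
            }
        }
        return count;
    }

    // An incremental update only appends: the updated file must be longer
    // and must start with the original bytes, unchanged.
    static boolean isAppendOnlyUpdate(byte[] original, byte[] updated) {
        if (updated.length <= original.length) {
            return false;
        }
        for (int i = 0; i < original.length; i++) {
            if (updated[i] != original[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String rev1 = "%PDF-1.4\n1 0 obj\nendobj\n%%EOF\n";
        String rev2 = rev1 + "2 0 obj\nendobj\n%%EOF\n";
        byte[] first = rev1.getBytes(StandardCharsets.US_ASCII);
        byte[] second = rev2.getBytes(StandardCharsets.US_ASCII);
        System.out.println(countEofMarkers(first));
        System.out.println(countEofMarkers(second));
        System.out.println(isAppendOnlyUpdate(first, second));
    }
}
```

Because the bytes of the earlier revision are untouched, a signature over that revision keeps verifying no matter what the appended revision changes.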
What iText's API offers
iText offers an API to validate each integrated signature, in particular this validation checks whether that integrated signature correctly signs the bytes it applies to.
iText furthermore determines whether a signature covers the whole file, i.e. the latest revision of the file.
iText does not offer simple API functions, though, that check whether the changes in incremental updates to signed revisions are all allowed.
This task actually is decidedly non-trivial as the allowed changes are not specified in deep detail; I am not aware of any proper open source implementation of it. Even Adobe Reader in its code for this check has many false negatives for PDFs in which the allowed changes are implemented differently than Adobe Reader would have done it itself, e.g. see this answer.
iText does offer the low-level tools to implement such tests, though, so anyone is welcome to implement them on top of iText.
iText checks in code
You can execute the checks with iText 5.5.10 like this:
PdfReader reader = new PdfReader(resource);
AcroFields acroFields = reader.getAcroFields();

List<String> names = acroFields.getSignatureNames();
for (String name : names) {
    System.out.println("Signature name: " + name);
    System.out.println("Signature covers whole document: " + acroFields.signatureCoversWholeDocument(name));
    System.out.println("Document revision: " + acroFields.getRevision(name) + " of " + acroFields.getTotalRevisions());
    PdfPKCS7 pk = acroFields.verifySignature(name);
    System.out.println("Subject: " + CertificateInfo.getSubjectFields(pk.getSigningCertificate()));
    System.out.println("Document verifies: " + pk.verify());
}
(VerifySignature test testVerifyBabyloveSigned)
The outputs for the OP's sample files are:
babylove_signed.pdf
===================
Signature name: Fadadaf1a333d3-d51a-4fbb-ad22-bbdcaddd7d8e
Signature covers whole document: true
Document revision: 1 of 1
Subject: {C=[CN], OU=[fabigbig, Individual-1], CN=[051#???#352229198405072013#2], O=[CFCA OCA1]}
Document verifies: true
babylove_signed&modify_by_nitro.pdf
===================
Signature name: Fadadaf1a333d3-d51a-4fbb-ad22-bbdcaddd7d8e
Signature covers whole document: false
Document revision: 1 of 2
Subject: {C=[CN], OU=[fabigbig, Individual-1], CN=[051#???#352229198405072013#2], O=[CFCA OCA1]}
Document verifies: true
As you see, signed.pdf has only one revision, its integrated signature is valid, and it covers the whole file. signed&modify_by_nitro.pdf has two revisions, its integrated signature is valid but it covers only revision one.
Thus, iText says that while the signature in the latter file does correctly sign its revision, there may be any amount of changes in the second revision.

merge multiple docx files into a single docx file

I came across solutions on how to merge docx files using c#:
Append multiple DOCX files together
In this solution, he iterates through files and copies the body "outerxml" to a new document:
XElement tempBody = XElement.Parse(tempDocument.MainDocumentPart.Document.Body.OuterXml);
newBody.Add(tempBody);
This looks like something specific to the C# API, but I am using Ruby. So far I have been able to edit the docx file and make changes to it by editing "word/document.xml". However, now I need to merge multiple docx files, and I would like to know if there is a specific XML file in OpenXML that encompasses an entire document, so that I can copy it into another document.

The main document part (usually at word/document.xml) contains the text of the body of the docx. Headers/footers/comments/footnotes/endnotes are elsewhere.
The problem is that the main document part will often refer to other parts, and you need to manage these references.
Some of these references (eg images, headers, footers) are via "relationships" in the rels part; others are styles, comment ids etc.
If your documents are predictable and simple, you could handle those cases yourself. Otherwise, you'd be better off using http://openxmldeveloper.org/wiki/w/wiki/documentbuilder.aspx (C#) or our commercial MergeDocx component (Java).
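To get a feeling for how many relationship references a main document part contains, one can list the relationship ids it mentions. The sketch below is written in plain Java (the same idea carries over to Ruby) and makes several assumptions: the XML of word/document.xml is already available as a string, the relationships namespace uses the conventional r: prefix, and only r:id and r:embed attributes are of interest. It uses a regular expression rather than an XML parser, so it is a quick inspection aid, not a merge tool; the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical inspection helper; assumes the conventional "r:" prefix.
final class DocxRelRefSketch {

    // Matches r:id="..." and r:embed="..." attributes.
    private static final Pattern REL_ATTR =
            Pattern.compile("\\br:(?:id|embed)=\"([^\"]*)\"");

    // Collect the relationship ids referenced in the given XML, in order of
    // first appearance. Each id must be resolved via the part's rels file.
    static Set<String> referencedRelationshipIds(String documentXml) {
        Set<String> ids = new LinkedHashSet<String>();
        Matcher matcher = REL_ATTR.matcher(documentXml);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<w:body><w:p><w:hyperlink r:id=\"rId5\"/></w:p>"
                + "<w:sectPr><w:headerReference w:type=\"default\" r:id=\"rId8\"/>"
                + "</w:sectPr></w:body>";
        System.out.println(referencedRelationshipIds(xml));
    }
}
```

Every id found this way points into the source document's own rels part, so copying the body into another document requires copying the targets and renumbering the ids as well.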

ITextSharp - Adding a Watermark; Leaving the PDF editable - FormFlattening = false

We have a large number of PDF files, and we are creating a web site that allows users to download them. When they do, we want to:
Put a watermark on it with their name.
We want the form fields to be left open so they can enter their information.
We want to be able to print and save the document
When I put the Watermark on the document and then open it I get a message from Adobe:
"The document has been changed since it was created and use of extended features is no longer available. Please contact the author..."
According to the book "iText in Action", this is a security issue (Chapter 8). There seem to be 2 ways to open them:
Remove usage rights: this breaks #3 above.
Open it in append mode: it does not matter whether I modify it and save it with "FormFlattening = false" or true; if I put a watermark on the form, the fields are no longer editable.
The error message from Adobe does describe the problem: I have modified the content of the document with the watermark, and the form fields become blocked because of this.
I have tried opening the document, putting the watermark on it, saving it to a new file, and closing it, then reopening it and trying to unblock the form fields, but it does not work.
Does anyone know if this is possible?
I have read something about templates; I don't know if this is a solution because of the work to convert the documents to templates? Does anyone know if this would help?
Below is a sample of my code for using an Image as a watermark, although I have tried adding text as well:
PdfReader reader = new PdfReader(sourceFile.FullName);
//reader.RemoveUsageRights();
var fileStream = new FileStream(outputPath, FileMode.Create, FileAccess.ReadWrite);
PdfStamper pdfStamper = new PdfStamper(reader, fileStream, '\0', true);
Image image = Image.GetInstance(imagePath);
image.SetAbsolutePosition(250, 300);
for (int i = 1; i <= reader.NumberOfPages; i++) // Must start at 1 because 0 is not an actual page.
{
    PdfContentByte pdfPageContents = pdfStamper.GetUnderContent(i);
    pdfPageContents.AddImage(image);
}
pdfStamper.FormFlattening = false; // enable this if you want the PDF flattened.
//bool have = pdfStamper.PartialFormFlattening("test");
pdfStamper.Close(); // Always close the stamper or you'll have a 0 byte stream.

A document that is Reader-enabled is digitally signed using a private key owned by Adobe. If Adobe Reader can validate that signature using Adobe's public key, the extra functionality (e.g. allowing you to save a form that has been filled out) is enabled.
Adding a watermark isn't part of the actions you're allowed to do with a digitally signed document. There is absolutely no way you can achieve what you want without invalidating the digital signature that triggers the reader enabling.
In short: you're trying to do something that is impossible. You can only achieve this by using Adobe software because you need Adobe's private key to 'restore' the reader enabling after breaking it.

Good advice.
Every time I reboot my computer Adobe complains about needing to be updated. Last thing I want is to be stuck with a hack, that may not work in the future.
One of my attempts was to create a watermark on a different layer of the PDF in the hope that it would not be seen as changing the text layer of the PDF, but this did not work. My boss had the idea of grabbing the text from the original PDF, copying it to a new document, and then putting the watermark on it. Even though I created a new PDF, it is still seen as modified, and the fields are still not editable.
Still stuck