Add file with bookmark - itext

I want to add a PDF file using iTextSharp but if PDF file contains bookmarks then they should also be added.
Currently I'm using following code
Document document = new Document();
//Step 2: we create a writer that listens to the document
PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(outputFileName, FileMode.Create));
writer.ViewerPreferences = PdfWriter.PageModeUseOutlines;
//Step 3: Open the document
document.Open();
PdfContentByte cb = writer.DirectContent;
//The current file path
string filename = "D:\\rtf\\2.pdf";
// we create a reader for the document
PdfReader reader = new PdfReader(filename);
//Chapter ch = new Chapter("", 1);
for (int pageNumber = 1; pageNumber < reader.NumberOfPages + 1; pageNumber++)
{
document.SetPageSize(reader.GetPageSizeWithRotation(1));
document.NewPage();
// Insert to Destination on the first page
if (pageNumber == 1)
{
Chunk fileRef = new Chunk(" ");
fileRef.SetLocalDestination(filename);
document.Add(fileRef);
}
PdfImportedPage page = writer.GetImportedPage(reader, pageNumber);
int rotation = reader.GetPageRotation(pageNumber);
if (rotation == 90 || rotation == 270)
{
cb.Add(page);
}
else
{
cb.AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
}
}
document.Close();

Please read Chapter 6 of my book. In table 6.1, you'll read:
Can import pages from other PDF documents. The major downside is that all interactive features of the imported page (annotations, bookmarks, fields, and so forth) are lost in the process.
This is exactly what you experience. However, if you look at the other classes listed in that table, you'll discover PdfStamper, PdfCopy, etc... which are classes that do preserve interactive features.
PdfStamper will keep the bookmarks. If you want to use PdfCopy (or PdfSmartCopy), you need to read chapter 7 to find out how to keep them. Chapter 7 isn't available for free, but you can consult the examples here: Java / C#. You need the ConcatenateBookmarks example.
Note that you're code currently looks convoluted because you're not using the correct classes. Using PdfStamper should significantly reduce the number of lines of code.

Related

Copying annotations with PdfWriter instead of PdfCopy

I need to copy annotations using PdfWriter instead of PdfCopy because at the time of the copy I need to resize/rotate the page. Can anyone tell me how to do this?
You think you need to use a plain PdfWriter instead of a PdfCopy for copying PDFs because you need to resize/rotate the page and iText in Action, 2nd Ed, says doing so is not possible with the PdfCopy class. Thus, you look for a way to copy annotations in such a context.
What you should look for instead is a way to rotate or resize pages and at the same time use PdfCopy nonetheless!
While it is true that the PdfCopy class itself does not allow resizing or rotating pages, you can manipulate a PDF loaded into a PdfReader and resize and/or rotate its pages before using the PdfCopy class. If you then copy the pages from this manipulated PdfReader into a PdfCopy, you get a result with resized or rotated pages (due to the manipulated PdfReader) and all the annotations present (due to the use of a PdfCopy).
E.g. you can resize all the pages in a PdfReader like this:
void resize(PdfReader pdfReader, float width, float height) {
for (int i = 1; i <= pdfReader.getNumberOfPages(); i++) {
boolean switched = pdfReader.getPageRotation(i) % 180 != 0;
float widthHere = switched ? height : width;
float heightHere = switched ? width : height;
Rectangle cropBox = pdfReader.getCropBox(i);
float halfWidthGain = (widthHere - cropBox.getWidth()) / 2;
float halfHeightGain = (heightHere - cropBox.getHeight()) / 2;
Rectangle newCropBox = new Rectangle(cropBox.getLeft() - halfWidthGain, cropBox.getBottom() - halfHeightGain,
cropBox.getRight() + halfWidthGain, cropBox.getTop() + halfHeightGain);
Rectangle mediaBox = pdfReader.getPageSize(i);
Rectangle newMediaBox = new Rectangle(Math.min(newCropBox.getLeft(), mediaBox.getLeft()),
Math.min(newCropBox.getBottom(), mediaBox.getBottom()),
Math.max(newCropBox.getRight(), mediaBox.getRight()),
Math.max(newCropBox.getTop(), mediaBox.getTop()));
PdfDictionary pageDictionary = pdfReader.getPageN(i);
pageDictionary.put(PdfName.MEDIABOX, new PdfArray(new float[] {newMediaBox.getLeft(), newMediaBox.getBottom(),
newMediaBox.getRight(), newMediaBox.getTop()}));
pageDictionary.put(PdfName.CROPBOX, new PdfArray(new float[] {newCropBox.getLeft(), newCropBox.getBottom(),
newCropBox.getRight(), newCropBox.getTop()}));
}
}
(CopyWithResizeRotate helper method)
and you can rotate all the pages in a PdfReader like this:
void rotate(PdfReader pdfReader) {
for (int i = 1; i <= pdfReader.getNumberOfPages(); i++) {
int rotation = pdfReader.getPageRotation(i);
int newRotation = rotation + 90 % 360;
PdfDictionary pageDictionary = pdfReader.getPageN(i);
if (newRotation == 0)
pageDictionary.remove(PdfName.ROTATE);
else
pageDictionary.put(PdfName.ROTATE, new PdfNumber(newRotation));
}
}
(CopyWithResizeRotate helper method)
Using these helpers, you can e.g. create a PDF from the rotated and/or resized pages of some source PDF and copy them like this:
byte[] wildPdf = RETRIEVE_SOURCE_PDF;
PdfReader pdfReaderOriginal = new PdfReader(wildPdf);
PdfReader pdfReaderRotate = new PdfReader(wildPdf);
rotate(pdfReaderRotate);
PdfReader pdfReaderResize = new PdfReader(wildPdf);
resize(pdfReaderResize, PageSize.LETTER.getWidth(), PageSize.LETTER.getHeight());
PdfReader pdfReaderRotateResize = new PdfReader(wildPdf);
rotate(pdfReaderRotateResize);
resize(pdfReaderRotateResize, PageSize.LETTER.getWidth(), PageSize.LETTER.getHeight());
try ( OutputStream os = new FileOutputStream(new File(RESULT_FOLDER, "wild-rotated-resized.pdf"))) {
Document document = new Document();
PdfCopy pdfCopy = new PdfCopy(document, os);
document.open();
pdfCopy.addDocument(pdfReaderOriginal);
pdfCopy.addDocument(pdfReaderRotate);
pdfCopy.addDocument(pdfReaderResize);
pdfCopy.addDocument(pdfReaderRotateResize);
document.close();
}
(CopyWithResizeRotate test method testRotateResizeAndCopy)
The result can look as follows, the first row the original pages (#1 A4, #2 HALFLETTER, #3 A5, #4 A5 rotated, #5 500x700), the second row the rotated ones, the third row the resized ones (to LETTER), and the fourth row the rotated and resized ones (to LETTER). The Adobe Reader thumbnails unfortunately are not at all to scale:
Manipulating a single PDF only
If you actually only want to resize/rotate pages of a single input PDF, you should not use a PdfCopy instance but instead a PdfStamper:
PdfReader pdfReader = new PdfReader(SOURCE);
[...manipulate properties of the pdfReader like above...]
new PdfStamper(pdfReader, TARGET_STREAM).close();
The advantage here is that not only page-level data but also document-level data of the original document are retained.
Special annotations
There is one type of annotations which will behave in an unexpected manner with the code above: annotations with the NoRotate flag set. Such annotations will behave like this when their host page is rotated:
(ISO 32000-2 section 12.5.3 — Annotation flags)

Placing Text on Multiple Imported PDF Pages with iTextSharp [duplicate]

I am trying to add a header to existing pdf documents in Java with iText. I can add the header at a fixed place on the document, but all the documents are different page sizes, so it is not always at the top of the page. I have tried getting the page size so that I could calculate the position of the header, but it seems as if the page size is not actually what I want. On some documents, calling reader.getPageSize(i).getTop(20) will place the text in the right place at the top of the page, however, on some different documents it will place it half way down the page. Most of the pages have been scanned be a Xerox copier, if that makes a difference. Here is the code I am using:
PdfReader reader = new PdfReader(readFilePath);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(writeFilePath));
BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
PdfContentByte cb = stamper.getOverContent(i);
cb.beginText();
cb.setFontAndSize(bf, 14);
float x = reader.getPageSize(i).getWidth() / 2;
float y = reader.getPageSize(i).getTop(20);
cb.showTextAligned(PdfContentByte.ALIGN_CENTER, "Copy", x, y, 0);
cb.endText();
}
stamper.close();
PDF that works correctly
PDF that works incorrectly
Take a look at the StampHeader1 example. I adapted your code, introducing ColumnText.showTextAligned() and using a Phrase for the sake of simplicity (maybe you can change that part of your code too):
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
Phrase header = new Phrase("Copy", new Font(FontFamily.HELVETICA, 14));
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
float x = reader.getPageSize(i).getWidth() / 2;
float y = reader.getPageSize(i).getTop(20);
ColumnText.showTextAligned(
stamper.getOverContent(i), Element.ALIGN_CENTER,
header, x, y, 0);
}
stamper.close();
reader.close();
}
As you have found out, this code assumes that no rotation was defined.
Now take a look at the StampHeader2 example. I'm using your "Wrong" file and I've added one extra line:
stamper.setRotateContents(false);
By telling the stamper not to rotate the content I'm adding, I'm adding the content using the coordinates as if the page isn't rotated. Please take a look at the result: stamped_header2.pdf. We added "Copy" at the top of the page, but as the page is rotated, we see the word appear on the side. The word is rotated because the page is rotated.
Maybe that's what you want, maybe it isn't. If it isn't, please take a look at StampHeader3 in which I calculate x and y differently, based on the rotation of the page:
if (reader.getPageRotation(i) % 180 == 0) {
x = reader.getPageSize(i).getWidth() / 2;
y = reader.getPageSize(i).getTop(20);
}
else {
x = reader.getPageSize(i).getHeight() / 2;
y = reader.getPageSize(i).getRight(20);
}
Now the word "Copy" appears on what is perceived as the "top of the page" (but in reality, it could be the side of the page): stamped_header3.pdf

inserting pages with PdfStamper does not import form fields

I'm inserting pages pages into pdf doc. The pages don't get added at the end or begining they need to be inserted in the middle somewhere (I have a way to determine the insert location with bookmarks).
The key is not to loose bookmarks. So I'm using PdfStamper to insert the pages. The problem is pdfs that are being inserted have form fields and those fields are not coming through.
The code that does inserting
for (int pageNum = 1; pageNum <= readerPdfToAdd.NumberOfPages; pageNum++)
{
PdfImportedPage page = pdfStamper.GetImportedPage(readerPdfToAdd, pageNum);
pdfStamper.InsertPage(filesByCategory[i].PageOfInsert + pageNum, readerPdfToAdd.GetPageSizeWithRotation(pageNum));
var rotation = readerPdfToAdd.GetPageRotation(pageNum);
if (rotation == 90 || rotation == 270)
{
pdfStamper.GetUnderContent(filesByCategory[i].PageOfInsert + pageNum)
.AddTemplate(page, 0, -1f, 1f, 0, 0, readerPdfToAdd.GetPageSizeWithRotation(pageNum).Height);
}
else
{
pdfStamper.GetUnderContent(filesByCategory[i].PageOfInsert + pageNum).AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
}
}
I tried something like this to copy the fields but this doesn't copy exactly.
foreach (KeyValuePair<string, AcroFields.Item> kvp in pdfFormFields.Fields)
{
var s = pdfFormFields.GetFieldPositions("Date");
PdfArray r = kvp.Value.GetWidget(0).GetAsArray(PdfName.RECT);
var name = kvp.Value.GetWidget(0).GetAsArray(PdfName.NAME);
Rectangle rr = new Rectangle(r.GetAsNumber(0).FloatValue, r.GetAsNumber(1).FloatValue, r.GetAsNumber(2).FloatValue, r.GetAsNumber(3).FloatValue);
TextField field = new TextField(pdfStamper.Writer, rr, kvp.Value.GetWidget(0).Get(PdfName.T).ToString());
if (kvp.Value.GetWidget(0).Get(PdfName.V) != null)
field.Text = kvp.Value.GetWidget(0).Get(PdfName.V).ToString();
// add the field here, the second param is the page you want it on
pdfStamper.AddAnnotation(field.GetTextField(), filesByCategory[i].PageOfInsert + pageNum);
fields.SetField(kvp.Key, kvp.Value.ToString());
}
Is there a better way to do this? I've tried PdfCopy that looses bookmarks on the source document.
Please use PdfCopy or PdfSmartCopy to assemble documents. As documented in chapter 6 of my book a PdfImportedPage only copies what is in the content stream.
You probably need PdfStamper to stamp some extra content on the original document, and that is fine, but you need to combine this with PdfCopy or PdfSmartCopy to assemble the final document.

Remove unused image objects

I have PDF files that are being created with a composition tool to produce financial statements.
The PDF files are around the 5000 - 10000 pages per file using global image resources to maximise space efficiences.
These statements include marketing images. Many of them (about 3mb worth), not every particular statements uses all the images.
When I extract the PDF file using a tool that has been developed for this purpose (or if I use adobe acrobat just for testing purposes) - to extract a blank page at the start of the PDF file, the resulting extracted PDF is around the 3mb. Auditing the space usage sees that it is comprised of 3mb of images.
Using iTextSharp (latest 5.4.4) I have attempted to iterate through each page and copy to a writer calling reader.RemoveUnusedObjects. But this does not reduce the size.
I also found another example to use a pdfstamper and tried the same thing. Same result.
I've also tried setting maximum compression and SetFullCompression. Neither made any difference.
Can anyone give me any pointers for what I might do. I'm hoping I can do it as a simple exercise and not have to parse the objects in the PDF file and manually remove the unused ones.
Code Below:
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(inputFile);
iTextSharp.text.Document document = new iTextSharp.text.Document(reader.GetPageSizeWithRotation(1));
// step 2: we create a writer that listens to the document
// step 3: we open the document
iTextSharp.text.pdf.PdfCopy pdfCpy = new iTextSharp.text.pdf.PdfCopy(document, new System.IO.FileStream(outputFile, System.IO.FileMode.Create));
document.Open();
iTextSharp.text.pdf.PdfContentByte cb = pdfCpy.DirectContent;
//pdfCpy.NewPage();
int objects = reader.RemoveUnusedObjects();
reader.RemoveFields();
reader.RemoveAnnotations();
// we retrieve the total number of pages
int numberofPages = reader.NumberOfPages;
int i = 0;
while (i < numberofPages)
{
i++;
document.SetPageSize(reader.GetPageSizeWithRotation(i));
document.NewPage();
iTextSharp.text.pdf.PdfImportedPage page = pdfCpy.GetImportedPage(reader, i);
pdfCpy.SetFullCompression();
reader.RemoveUnusedObjects();
reader.RemoveFields();
reader.RemoveAnnotations();
int rotation = reader.GetPageRotation(i);
if (rotation == 90 || rotation == 270)
{
cb.AddTemplate(page, 0, -1f, 1f, 0, 0, reader.GetPageSizeWithRotation(i).Height);
}
else
{
cb.AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
}
pdfCpy.AddPage(page);
}
pdfCpy.NewPage();
pdfCpy.Add(new iTextSharp.text.Paragraph("This is added text"));
document.Close();
pdfCpy.CompressionLevel = iTextSharp.text.pdf.PdfStream.BEST_COMPRESSION;
pdfCpy.Close();
reader.Close();
Stamper example:
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(inputFile);
using (FileStream fs = new FileStream(outputFile + ".2" , FileMode.Create))
{
iTextSharp.text.pdf.PdfStamper stamper = new iTextSharp.text.pdf.PdfStamper(reader, fs, iTextSharp.text.pdf.PdfWriter.VERSION_1_5);
iTextSharp.text.pdf.PdfWriter writer = stamper.Writer;
writer.SetPdfVersion(iTextSharp.text.pdf.PdfWriter.PDF_VERSION_1_5);
writer.CompressionLevel = iTextSharp.text.pdf.PdfStream.BEST_COMPRESSION;
reader.RemoveFields();
reader.RemoveUnusedObjects();
stamper.Reader.RemoveUnusedObjects();
stamper.SetFullCompression();
stamper.Writer.SetFullCompression();
stamper.Close();
}
reader.Close();
Try using iTextSharp.text.pdf.PdfSmartCopy instead of PdfCopy.
For me it decreased a PDF with a size of ~43MB PDF to ~4MB.

Chapter-2 example on countrychunks

I am trying to run Countrychunks example from the Chapter 2. The example works but the line: document.Add(Chunk.NEWLINE); does not produce the new line and the loop overwrites the first line. I am posting my code here in case I am doing anything wrong:
public void createCountryChunks(String fileName)
{
iTextSharp.text.Font font;
Document document = new iTextSharp.text.Document();
//PdfWriter.GetInstance(document, new FileStream(fileName)).setInitialLeading(16);
PdfWriter.GetInstance(document, new FileStream(fileName, FileMode.Create));
document.Open();
font = new iTextSharp.text.Font(iTextSharp.text.Font.FontFamily.HELVETICA, 6, iTextSharp.text.Font.BOLD, iTextSharp.text.BaseColor.WHITE);
foreach (var p in myProducts)
{
// add a country to the document as a Chunk
document.Add(new Chunk(p.pr_name));
document.Add(new Chunk(" "));
Chunk id = new Chunk(p.pr_URN.ToString(), font);
// with a background color
id.SetBackground(BaseColor.BLACK, 1f, 0.5f, 1f, 1.5f);
// and a text rise
id.SetTextRise(6);
document.Add(id);
document.Add(Chunk.NEWLINE);
}
document.Close();
}
As you can see the example is a bit different because of the data but the rest is almost the same as the original Java example.
Any suggestions please?
The setInitialLeading call that you weren't able to bring over and was commented out was actually very important. Adding that back in will solve your problems. I really don't like to add properties directly onto my constructed objects so I'm going to do it in two lines:
var w = PdfWriter.GetInstance(document, new FileStream(fileName, FileMode.Create));
w.InitialLeading = 16;