Issues converting certain TIF compressions to PDF using iTextSharp

I am using iTextSharp to convert single-page TIF files and stitch them into one multi-page PDF file. The single-page TIF files have different bit depths and compressions.
Here is the code:
private void button1_Click(object sender, EventArgs e)
{
    List<string> TIFfiles = new List<string>();
    Document document;
    PdfWriter pdfwriter;
    Bitmap tifFile;
    pdfFilename = <file path>.PDF;
    TIFfiles = <load the path to each TIF file in this array>;
    //Create document
    document = new Document();
    // creation of the different writers
    pdfwriter = PdfWriter.GetInstance(document, new System.IO.FileStream(pdfFilename, FileMode.Create));
    document.Open();
    document.SetMargins(0, 0, 0, 0);
    foreach (string file in TIFfiles)
    {
        //load the tiff image
        tifFile = new Bitmap(file);
        //Total number of pages
        iTextSharp.text.Rectangle pgSize = new iTextSharp.text.Rectangle(tifFile.Width, tifFile.Height);
        document.SetPageSize(pgSize);
        document.NewPage();
        PdfContentByte cb = pdfwriter.DirectContent;
        tifFile.SelectActiveFrame(FrameDimension.Page, 0);
        iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance(tifFile, ImageFormat.Tiff);
        // scale the image to fit in the page
        img.SetAbsolutePosition(0, 0);
        cb.AddImage(img);
    }
    document.Close();
}
This code works and stitches the TIFs into a PDF. The issue is the processing time and the size of the resulting PDF for certain types of TIF.
For example:
Original TIF --> B&W / bit depth 1 / compression CCITT T.6 --> fast processing, PDF file size is ~1.1x the TIF file size.
Original TIF --> Color / bit depth 8 / compression LZW --> fast processing, PDF file size is ~1.1x the TIF file size.
Original TIF --> Color / bit depth 24 / compression JPEG --> slow processing, PDF file size is ~12.5x the TIF file size.
Why doesn't converting the Color / bit depth 24 / JPEG files give results similar to the other TIF files?
Moreover, this issue occurs only with iTextSharp. A colleague tested the same set of TIF samples using Java iText, and the resulting PDF was smaller (~1.1x the TIF size) and was produced faster.
Unfortunately, I need to use .NET for this TIF-to-PDF conversion, so I am stuck with iTextSharp.
Any ideas or suggestions on how to get the JPEG-compressed TIF files to produce PDFs as small as those from the other TIF compressions?
Appreciate your help!
Regards,
AG

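I was able to reproduce your problem with the code you supplied, but it went away once I loaded each file with iTextSharp.text.Image.GetInstance(file) instead of going through a System.Drawing.Bitmap as in your sample. A likely explanation (not verified here) is that passing a Bitmap forces the already decoded pixels to be re-encoded, whereas passing the path lets iTextSharp read the TIFF data itself. With the code below, the file size and run time were the same in Java and C#. If you have any questions about the sample, don't hesitate to ask.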
List<string> TIFfiles = new List<string>();
Document document;
PdfWriter pdfwriter;
iTextSharp.text.Image tifFile;
String pdfFilename = pdfFile;
TIFfiles.Add(tifFile1);
TIFfiles.Add(tifFile2);
TIFfiles.Add(tifFile3);
TIFfiles.Add(tifFile4);
TIFfiles.Add(tifFile5);
TIFfiles.Add(tifFile6);
TIFfiles.Add(tifFile7);
// Create the document
document = new Document();
// Create the writer that streams the document to the output file
pdfwriter = PdfWriter.GetInstance(document, new System.IO.FileStream(pdfFilename, FileMode.Create));
document.Open();
document.SetMargins(0, 0, 0, 0);
// Add the same seven files 50 times over, so the test PDF has 350 pages
int i = 0;
while (i < 50)
{
    foreach (string file in TIFfiles)
    {
        // Let iTextSharp read the TIFF directly from the file path
        tifFile = iTextSharp.text.Image.GetInstance(file);
        // Size the page to the image
        iTextSharp.text.Rectangle pgSize = new iTextSharp.text.Rectangle(tifFile.Width, tifFile.Height);
        document.SetPageSize(pgSize);
        document.NewPage();
        PdfContentByte cb = pdfwriter.DirectContent;
        // Place the image at the bottom-left corner of the page
        tifFile.SetAbsolutePosition(0, 0);
        cb.AddImage(tifFile);
    }
    i++;
}
document.Close();
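
When working out which input files fall into the slow, large-output case, it can help to read each file's TIFF Compression tag (tag 259) before converting. The following is a minimal Java sketch of that check, not part of either code sample above. The class and method names (TiffCompressionProbe, compressionOf, describe) are made up for illustration. It assumes a classic TIFF (not BigTIFF) and inspects only the first IFD.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: reads TIFF tag 259 (Compression) from the first IFD.
final class TiffCompressionProbe {

    /** Returns the Compression tag value, or 1 (uncompressed, the TIFF default) if the tag is absent. */
    static int compressionOf(byte[] tiff) {
        if (tiff.length < 8) {
            throw new IllegalArgumentException("too short to be a TIFF");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        // Bytes 0-1 give the byte order: "II" is little-endian, "MM" is big-endian.
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("missing TIFF byte-order mark");
        }
        // Bytes 2-3 hold the magic number 42 for classic TIFF.
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("not a classic TIFF");
        }
        int ifd = buf.getInt(4);                   // offset of the first IFD
        int entries = buf.getShort(ifd) & 0xFFFF;  // number of 12-byte entries
        for (int i = 0; i < entries; i++) {
            int entry = ifd + 2 + 12 * i;
            int tag = buf.getShort(entry) & 0xFFFF;
            if (tag == 259) {
                // A single SHORT is stored at the start of the 4-byte value field.
                return buf.getShort(entry + 8) & 0xFFFF;
            }
        }
        return 1;
    }

    /** Maps the common Compression values to readable names. */
    static String describe(int compression) {
        switch (compression) {
            case 1: return "uncompressed";
            case 3: return "CCITT T.4";
            case 4: return "CCITT T.6";
            case 5: return "LZW";
            case 6: return "JPEG (old-style)";
            case 7: return "JPEG";
            default: return "other (" + compression + ")";
        }
    }

    public static void main(String[] args) {
        // Synthetic little-endian TIFF header with one IFD entry: Compression = 7.
        ByteBuffer b = ByteBuffer.allocate(26).order(ByteOrder.LITTLE_ENDIAN);
        b.put((byte) 'I').put((byte) 'I').putShort((short) 42).putInt(8);
        b.putShort((short) 1);
        b.putShort((short) 259).putShort((short) 3).putInt(1).putShort((short) 7).putShort((short) 0);
        b.putInt(0);
        System.out.println(describe(compressionOf(b.array())));
    }
}
```

For real files, the byte array could come from java.nio.file.Files.readAllBytes, although reading only the header and the first IFD would be cheaper for large scans.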

Related

How to convert PowerPoint (.ppt) to PDF in Java

Only the last slide appears to be converted: every slide is drawn onto the same page, so the last slide overlaps all the others. Can anyone suggest how to combine all slides into one PDF?
The other approaches I have tried first create an image of each slide and then build the PDF from those images.
FileInputStream inputStream = new FileInputStream(in);
SlideShow ppt = new SlideShow(inputStream);
inputStream.close();
Dimension pgsize = ppt.getPageSize();
Document document = new Document();
PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream(out));
document.setPageSize(new Rectangle(
    (float)pgsize.getWidth(), (float)pgsize.getHeight()));
document.open();
PdfGraphics2D graphics = null;
for (int i = 0; i < ppt.getSlides().length; i++) {
    Slide slide = ppt.getSlides()[i];
    graphics = (PdfGraphics2D) pdfWriter.getDirectContent()
        .createGraphics((float)pgsize.getWidth(), (float)pgsize.getHeight());
    slide.draw(graphics);
}
graphics.dispose();
document.close();

How to calculate the scale size of a watermark image

I have a piece of code that puts an image watermark into an existing PDF. I am looking for a way to calculate the scale percentage of the watermark image:
public void MixFiles(String wmrk, String src, String dest)
{
    string watermarkedFile = dest;
    PdfReader pdfReader = new PdfReader(src);
    PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create, FileAccess.Write, FileShare.None));
    iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance(wmrk);
    PdfContentByte waterMark;
    for (int pageIndex = 1; pageIndex <= pdfReader.NumberOfPages; pageIndex++)
    {
        waterMark = pdfStamper.GetOverContent(pageIndex);
        // the scale percent is found by trial and error how can I calculate it??
        img.ScalePercent(24f);
        img.SetAbsolutePosition(0f, 0f);
        waterMark.AddImage(img);
    }
    pdfStamper.FormFlattening = true;
    pdfStamper.Close();
}
My code works so far, but what happens with other watermark images? What does the scale percentage depend on? The watermark image is a PNG measuring 210x297 mm, and the pages of the source PDF to be stamped are also 210x297 mm; both have a resolution of 300 dpi.
I found out that iText internally uses a resolution of 72 dpi. The original resolution of the watermark PNG is 300 dpi, so 72/300 gives 0.24, i.e. 24 percent.
I tried a 400 dpi watermark and got the expected result with 72/400 = 0.18.
For a watermark of unknown resolution I now use
System.Drawing.Image newImage = System.Drawing.Image.FromFile(wmrk);
var reso=newImage.HorizontalResolution;
float scalepercent = (72/reso)*100;
img.ScalePercent(scalepercent);
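
The arithmetic above can be checked in isolation. The following is a small Java sketch of the same 72/dpi calculation, plus a variant that derives the percentage from the page width instead of the image's DPI metadata. The class and method names are hypothetical. The second method assumes that an unscaled image is laid out at one PDF unit per pixel, which matches the 24 percent found by trial and error above.

```java
// Hypothetical helper illustrating the watermark scale arithmetic.
final class WatermarkScale {

    /** Default PDF user space has 72 units per inch. */
    static final float POINTS_PER_INCH = 72f;

    /** Scale percent that renders an image of the given DPI at its physical size. */
    static float scalePercentForDpi(float dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return POINTS_PER_INCH / dpi * 100f;
    }

    /**
     * Scale percent that makes an image of the given pixel width span the
     * given page width in points. Assumes 1 pixel = 1 point when unscaled.
     */
    static float scalePercentToFitWidth(float imageWidthPx, float pageWidthPt) {
        if (imageWidthPx <= 0) {
            throw new IllegalArgumentException("image width must be positive");
        }
        return pageWidthPt / imageWidthPx * 100f;
    }

    public static void main(String[] args) {
        System.out.println(scalePercentForDpi(300f)); // about 24
        System.out.println(scalePercentForDpi(400f)); // about 18
        // A4 is roughly 595 pt wide; 210 mm at 300 dpi is roughly 2480 px.
        System.out.println(scalePercentToFitWidth(2480f, 595f)); // about 24
    }
}
```

The width-based variant may be the safer choice when the image file carries no resolution metadata or misleading values, since it depends only on pixel and page dimensions.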

Print byte[] to PDF using PDFBox

I have a question about writing image to PDF using PDFBox.
My requirement is very simple: I get an image from a web service using Spring RestTemplate and store it in a byte[] variable, and I need to draw that image into a PDF document.
final byte[] image = this.restTemplate.getForObject(
    this.imagesUrl + cableReference + this.format,
    byte[].class
);
I know that the following are provided: JPEGFactory.createFromStream() for JPEG images, CCITTFactory.createFromFile() for TIFF images, and LosslessFactory.createFromImage() when starting from a BufferedImage. But I don't know which one to use, as the only thing I know about those images is that they are in THUMBNAIL format, and I don't know how to convert from byte[] to those formats.
Thanks a lot for any help.
(This applies to PDFBox version 2.0, not to 1.8.)
I don't know what you mean by THUMBNAIL format, but give this a try:
final byte[] image = ... // your code
ByteArrayInputStream bais = new ByteArrayInputStream(image);
BufferedImage bim = ImageIO.read(bais);
PDImageXObject pdImage = LosslessFactory.createFromImage(doc, bim);
It might be possible to create a more advanced solution by using
PDImageXObject.createFromFileByContent()
but that method takes a file rather than a stream, so it would be slower (although it would produce the best possible image type).
To add the image to your PDF, use this code:
PDDocument doc = new PDDocument();
try
{
    PDPage page = new PDPage();
    doc.addPage(page);
    // create pdImage here, once doc exists, with
    // LosslessFactory.createFromImage(doc, bim) as in the first snippet
    PDPageContentStream contents = new PDPageContentStream(doc, page);
    // draw the image at full size at (x=20, y=20)
    contents.drawImage(pdImage, 20, 20);
    // to draw the image at half size at (x=20, y=20) use
    // contents.drawImage(pdImage, 20, 20, pdImage.getWidth() / 2, pdImage.getHeight() / 2);
    contents.close();
    doc.save(pdfPath);
}
finally
{
    doc.close();
}
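
The byte[] to BufferedImage step from the first snippet can be tried on its own, without PDFBox. Here is a minimal, stdlib-only Java sketch; the class and method names are hypothetical. One detail worth guarding against: ImageIO.read returns null rather than throwing when no registered reader recognises the bytes, which would otherwise only surface later as a failure inside LosslessFactory.createFromImage.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper: decodes raw image bytes with the standard ImageIO readers.
final class ImageBytes {

    /** Decodes the bytes, failing loudly if the format is not recognised. */
    static BufferedImage decode(byte[] data) {
        try {
            BufferedImage img = ImageIO.read(new ByteArrayInputStream(data));
            if (img == null) {
                // ImageIO.read signals "unknown format" with null, not an exception.
                throw new IllegalArgumentException("no ImageIO reader understands these bytes");
            }
            return img;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small blank PNG in memory, standing in for the web service response. */
    static byte[] samplePng(int width, int height) {
        try {
            BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(img, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage img = decode(samplePng(4, 3));
        System.out.println(img.getWidth() + "x" + img.getHeight());
    }
}
```

The resulting BufferedImage is what the answer passes to LosslessFactory.createFromImage(doc, bim).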

Remove unused image objects

I have PDF files that are created with a composition tool to produce financial statements.
Each file is around 5,000 - 10,000 pages and uses global image resources to maximise space efficiency.
These statements include marketing images, many of them (about 3 MB worth), but not every statement uses all of the images.
When I extract a single blank page from the start of the PDF file, using a tool developed for this purpose (or Adobe Acrobat, just for testing), the extracted PDF is around 3 MB. Auditing the space usage shows that it consists of 3 MB of images.
Using iTextSharp (latest 5.4.4) I have attempted to iterate through each page and copy it to a writer, calling reader.RemoveUnusedObjects, but this does not reduce the size.
I also found another example that uses a PdfStamper and tried the same thing, with the same result.
I've also tried setting maximum compression and SetFullCompression. Neither made any difference.
Can anyone give me any pointers on what I might do? I'm hoping I can do it as a simple exercise and not have to parse the objects in the PDF file and manually remove the unused ones.
Code below:
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(inputFile);
iTextSharp.text.Document document = new iTextSharp.text.Document(reader.GetPageSizeWithRotation(1));
// step 2: we create a writer that listens to the document
// step 3: we open the document
iTextSharp.text.pdf.PdfCopy pdfCpy = new iTextSharp.text.pdf.PdfCopy(document, new System.IO.FileStream(outputFile, System.IO.FileMode.Create));
document.Open();
iTextSharp.text.pdf.PdfContentByte cb = pdfCpy.DirectContent;
//pdfCpy.NewPage();
int objects = reader.RemoveUnusedObjects();
reader.RemoveFields();
reader.RemoveAnnotations();
// we retrieve the total number of pages
int numberofPages = reader.NumberOfPages;
int i = 0;
while (i < numberofPages)
{
    i++;
    document.SetPageSize(reader.GetPageSizeWithRotation(i));
    document.NewPage();
    iTextSharp.text.pdf.PdfImportedPage page = pdfCpy.GetImportedPage(reader, i);
    pdfCpy.SetFullCompression();
    reader.RemoveUnusedObjects();
    reader.RemoveFields();
    reader.RemoveAnnotations();
    int rotation = reader.GetPageRotation(i);
    if (rotation == 90 || rotation == 270)
    {
        cb.AddTemplate(page, 0, -1f, 1f, 0, 0, reader.GetPageSizeWithRotation(i).Height);
    }
    else
    {
        cb.AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
    }
    pdfCpy.AddPage(page);
}
pdfCpy.NewPage();
pdfCpy.Add(new iTextSharp.text.Paragraph("This is added text"));
document.Close();
pdfCpy.CompressionLevel = iTextSharp.text.pdf.PdfStream.BEST_COMPRESSION;
pdfCpy.Close();
reader.Close();
Stamper example:
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(inputFile);
using (FileStream fs = new FileStream(outputFile + ".2" , FileMode.Create))
{
    iTextSharp.text.pdf.PdfStamper stamper = new iTextSharp.text.pdf.PdfStamper(reader, fs, iTextSharp.text.pdf.PdfWriter.VERSION_1_5);
    iTextSharp.text.pdf.PdfWriter writer = stamper.Writer;
    writer.SetPdfVersion(iTextSharp.text.pdf.PdfWriter.PDF_VERSION_1_5);
    writer.CompressionLevel = iTextSharp.text.pdf.PdfStream.BEST_COMPRESSION;
    reader.RemoveFields();
    reader.RemoveUnusedObjects();
    stamper.Reader.RemoveUnusedObjects();
    stamper.SetFullCompression();
    stamper.Writer.SetFullCompression();
    stamper.Close();
}
reader.Close();
Try using iTextSharp.text.pdf.PdfSmartCopy instead of PdfCopy.
For me it reduced a PDF from ~43 MB to ~4 MB.
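
As far as I understand the iText documentation, PdfSmartCopy differs from PdfCopy in that it caches references to resources it has already written so they can be reused, at the cost of more memory. The Java sketch below only illustrates that general idea of a content-keyed cache. It is not iText's implementation; the class name and the choice of SHA-256 are assumptions made for the example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: stores each distinct byte sequence once and hands out its id.
final class ResourceCache {

    private final Map<String, Integer> idByDigest = new HashMap<>();
    private final List<byte[]> stored = new ArrayList<>();

    /** Returns the id of an identical resource stored earlier, or stores this one and returns a new id. */
    int add(byte[] resource) {
        String key = digest(resource);
        Integer existing = idByDigest.get(key);
        if (existing != null) {
            return existing; // duplicate content: reuse instead of writing it again
        }
        stored.add(resource.clone());
        int id = stored.size() - 1;
        idByDigest.put(key, id);
        return id;
    }

    /** Number of distinct resources actually kept. */
    int storedCount() {
        return stored.size();
    }

    private static String digest(byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ResourceCache cache = new ResourceCache();
        byte[] logo = {1, 2, 3, 4};
        System.out.println(cache.add(logo));          // first copy is stored
        System.out.println(cache.add(logo.clone()));  // identical bytes reuse the same id
        System.out.println(cache.storedCount());
    }
}
```

A cache like this helps when the same resource is written many times. Whether it also helps with images that are present but never referenced by a page, as in the question, is something to verify on the actual files.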

Add file with bookmark

I want to add a PDF file using iTextSharp, and if the PDF file contains bookmarks, they should be added as well.
Currently I'm using the following code:
Document document = new Document();
//Step 2: we create a writer that listens to the document
PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(outputFileName, FileMode.Create));
writer.ViewerPreferences = PdfWriter.PageModeUseOutlines;
//Step 3: Open the document
document.Open();
PdfContentByte cb = writer.DirectContent;
//The current file path
string filename = "D:\\rtf\\2.pdf";
// we create a reader for the document
PdfReader reader = new PdfReader(filename);
//Chapter ch = new Chapter("", 1);
for (int pageNumber = 1; pageNumber < reader.NumberOfPages + 1; pageNumber++)
{
    document.SetPageSize(reader.GetPageSizeWithRotation(1));
    document.NewPage();
    // Insert to Destination on the first page
    if (pageNumber == 1)
    {
        Chunk fileRef = new Chunk(" ");
        fileRef.SetLocalDestination(filename);
        document.Add(fileRef);
    }
    PdfImportedPage page = writer.GetImportedPage(reader, pageNumber);
    int rotation = reader.GetPageRotation(pageNumber);
    if (rotation == 90 || rotation == 270)
    {
        cb.Add(page);
    }
    else
    {
        cb.AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
    }
}
document.Close();
Please read Chapter 6 of my book. In table 6.1, you'll read:
Can import pages from other PDF documents. The major downside is that all interactive features of the imported page (annotations, bookmarks, fields, and so forth) are lost in the process.
This is exactly what you are experiencing. However, if you look at the other classes listed in that table, you'll discover PdfStamper, PdfCopy, and others, which are classes that do preserve interactive features.
PdfStamper will keep the bookmarks. If you want to use PdfCopy (or PdfSmartCopy), you need to read chapter 7 to find out how to keep them. Chapter 7 isn't available for free, but you can consult the examples here: Java / C#. You need the ConcatenateBookmarks example.
Note that your code currently looks convoluted because you're not using the correct classes. Using PdfStamper should significantly reduce the number of lines of code.