Flying Saucer: iText HTML to PDF conversion takes a long time offline

I use the Flying Saucer library to convert HTML to PDF. It works fine when the system is connected to the internet, but when the system is offline the conversion takes around 5 minutes. Is there a parameter that stops ITextRenderer from looking for images online?
Below is the method I use to convert HTML to PDF:
LOGGER.info("Converting HTML To PDF.");
OutputStream os = null;
try {
    os = new FileOutputStream(finalOutputFilePath);
    final Document cleanedDocument = HtmlUtil.cleanHTML(finalInputFilePath);
    final ITextRenderer renderer = new ITextRenderer();
    final String url = new File(finalInputFilePath).toURI().toURL().toString();
    renderer.setDocument(cleanedDocument, url);
    renderer.layout();
    renderer.createPDF(os);
    LOGGER.info("HTML is converted To PDF.");
} finally {
    if (null != os) {
        os.flush();
        os.close();
    }
}
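One frequent cause of this kind of delay (an assumption here, since nothing in the post confirms it) is that the XML parser tries to download the DTD named in the XHTML DOCTYPE and waits for a network timeout when offline. The sketch below parses the input with an EntityResolver that never touches the network, using only JDK classes. The class name OfflineXhtmlParser is hypothetical, and whether HtmlUtil.cleanHTML can be replaced or configured this way is not known from the post.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class OfflineXhtmlParser {

    /** Parses XHTML text without fetching any external DTD or entity. */
    public static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Answer every external entity request (e.g. the w3.org DTD)
        // with an empty stream instead of opening a network connection.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html><body><p>Hello</p></body></html>";
        Document doc = parse(xhtml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The resulting Document could then be handed to renderer.setDocument(document, url) as in the method above. Because the DTD is replaced by an empty one, named entities such as &nbsp; are no longer defined and would have to be written as numeric references. Images referenced by absolute http URLs are not affected by this sketch.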

Related

Image scrambled when embedded into pdf using iTextSharp

I'm having an issue when attempting to create a PDF with an image using iTextSharp. I'm adding the image to the top of the PDF like a masthead and then the rest of the PDF is HTML. The PDF generates fine and displays correctly when viewed in Edge, Chrome and Firefox. But, if you open it in IE or in Adobe Reader, the masthead is completely scrambled. The rest of the PDF (the HTML content) generates as expected in all browsers and programs. I've googled for days now trying to discover if anyone else has had this issue, but all I can find is text being garbled, not images. Below is the code I'm using to generate the image. Any ideas?
using (var document = new Document(PageSize.A4, 10f, 10f, 10f, 10f))
{
    iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(Server.MapPath("~/Content/Images/kfa_masthead.jpg"));
    using (var ms = new MemoryStream())
    {
        image.Left = 10f;
        image.Top = image.Height;
        image.ScaleAbsolute(560f, 58f);
        using (var writer = PdfWriter.GetInstance(document, ms))
        {
            document.Open();
            writer.CloseStream = false;
            //Add the masthead
            document.Add(image);
            //Add the report information
            var html = CreateTable(profile);
            using (var xHtml = new StringReader(html))
            {
                XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, xHtml);
            }
            document.CloseDocument();
            document.Close();
        }
        return ms.ToArray();
    }
}

Embedding font using itext 5 for PDF/UA compliance

We are currently building a proof of concept to generate PDF/UA-compliant PDFs from a CSS and HTML (XHTML) file using XSLT. We are able to tag the PDF and add the appropriate metadata.
The last major issue we are unable to solve is embedding the standard PDF font ZapfDingbats, which our accessibility assessment tools complain about (we use PAC 2.0 along with the checker built into Adobe DC).
As you can see from the image below, the other fonts we are using seem to get embedded automatically by XML Worker from our CSS.
I have also tried finding the font as indicated and found one, however, it doesn't seem to be the correct one.
Here is a sample of our code
private static ReturnValue CreateFromHtml(string html)
{
    ReturnValue Result = new ReturnValue();
    var stream = new MemoryStream();
    using (var doc = new Document(PageSize.LETTER))
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = PdfWriter.GetInstance(doc, ms))
            {
                writer.CloseStream = false;
                writer.SetPdfVersion(PdfWriter.PDF_VERSION_1_7);
                //TAGGED PDFVERSION_1_7
                //Make document tagged
                writer.SetTagged();
                //===============
                //PDF/UA
                //Set document metadata
                writer.ViewerPreferences = PdfWriter.DisplayDocTitle;
                doc.AddLanguage("en-US");
                doc.AddTitle("document title");
                writer.CreateXmpMetadata();
                doc.Open();
                var embedfont = HttpContext.Current.Server.MapPath("~/scripts/ZapfDingbats.ttf");
                var fontProv = new XMLWorkerFontProvider();
                fontProv.DefaultEncoding = "UTF-8";
                fontProv.Register(embedfont);
                //Testing zapfDingbats font
                Font font = FontFactory.GetFont(embedfont, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
                Paragraph p1 = new Paragraph("Testing of Fonts", font);
                doc.Add(p1);
                //end font processing
                var tagProcessors = (DefaultTagProcessorFactory)Tags.GetHtmlTagProcessorFactory();
                tagProcessors.RemoveProcessor(HTML.Tag.IMG);
                tagProcessors.AddProcessor(HTML.Tag.IMG, new CustomImageTagProcessor());
                var cssFiles = new CssFilesImpl();
                cssFiles.Add(XMLWorkerHelper.GetInstance().GetDefaultCSS());
                var cssResolver = new StyleAttrCSSResolver(cssFiles);
                var charset = Encoding.UTF8;
                var context = new HtmlPipelineContext(new CssAppliersImpl(new XMLWorkerFontProvider()));
                context.SetAcceptUnknown(true).AutoBookmark(true).SetTagFactory(tagProcessors);
                var htmlPipeline = new HtmlPipeline(context, new PdfWriterPipeline(doc, writer));
                var cssPipeline = new CssResolverPipeline(cssResolver, htmlPipeline);
                var worker = new XMLWorker(cssPipeline, true);
                var xmlParser = new XMLParser(true, worker, charset);
                using (var sr = new StringReader(html))
                {
                    xmlParser.Parse(sr);
                    doc.Close();
                    ms.Position = 0;
                    ms.CopyTo(stream);
                    stream.Position = 0;
                }
            }
        }
    }
    // get bytes from stream
    Result.Data = stream.ToArray();
    // success
    Result.Success = true;
    return Result;
}
Maybe there is something in the CSS we need to do (our CSS is quite large f
iText only ships with the Adobe Font Metrics (AFM) file of Zapfdingbats. This means that you can't embed that font unless you provide the corresponding PostScript Font Binary (PFB) file. This PFB file can't be shipped with iText because iText doesn't have a license to do so.
The first step to solve this is to either:
1) purchase a ZapfDingbats license so that you get the PFB (if I recall correctly, it's a font owned by Adobe), or
2) use an alternative font when you want to insert special characters (check boxes, phone symbols, ...) into your text (e.g. purchase a license for the AdobePiStd font that was used as a substitution font and use that font instead of ZapfDingbats).
In your case, you provided a font ZapfDingbats.ttf which you register with the XMLWorkerFontProvider. When you register this font, it can be recognized through an alias. If ZapfDingbats.ttf isn't picked up by XML Worker, there is probably a mismatch between the name of the font used in the PDF and the alias that was used when ZapfDingbats.ttf was registered.
What is the font name used for ZapfDingbats in the CSS? You should register ZapfDingbats using that name as alias.

Map System.Drawing.Image to proper PdfName.COLORSPACE and PdfName.FILTER in iTextSharp

I'm using iTextSharp to update an image object in a PDF with a modified System.Drawing.Image. How do I properly set the PdfName.COLORSPACE and PdfName.FILTER based on the System.Drawing.Image? I'm not sure which System.Drawing.Image properties can be used for the mappings.
private void SetImageData(PdfImageObject pdfImage, System.Drawing.Image image, byte[] imageData)
{
    PRStream imgStream = (PRStream)pdfImage.GetDictionary();
    imgStream.Clear();
    imgStream.SetData(imageData, false, PRStream.NO_COMPRESSION);
    imgStream.Put(PdfName.TYPE, PdfName.XOBJECT);
    imgStream.Put(PdfName.SUBTYPE, PdfName.IMAGE);
    imgStream.Put(PdfName.WIDTH, new PdfNumber(image.Width));
    imgStream.Put(PdfName.HEIGHT, new PdfNumber(image.Height));
    imgStream.Put(PdfName.LENGTH, new PdfNumber(imageData.LongLength));
    // Not sure how to properly set these entries based on the image properties
    imgStream.Put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
    imgStream.Put(PdfName.COLORSPACE, PdfName.DEVICERGB);
    imgStream.Put(PdfName.FILTER, PdfName.DCTDECODE);
}
I took the advice of Chris Haas and cheated by writing the System.Drawing.Image to a temporary PDF and then read it back out as a PdfImageObject.
using (MemoryStream ms = new MemoryStream())
{
    using (iTextSharp.text.Document doc = new iTextSharp.text.Document())
    {
        // The writer needs the target stream as its second argument
        using (iTextSharp.text.pdf.PdfWriter writer = iTextSharp.text.pdf.PdfWriter.GetInstance(doc, ms))
        {
            doc.Open();
            iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(drawingImage, drawingImage.RawFormat);
            image.SimplifyColorspace();
            doc.Add(image);
            doc.Close();
        }
    }
    // ... code that opens the memory stream with PdfReader and retrieves the image as a PdfImageObject ...
}
That worked, but it seemed like a lot of sidestepping for a cheat. I stepped through the code in the call to doc.Add(image) and found that eventually a PdfImage object was created from the iTextSharp.text.Image object, and that the PdfImage object contained all the dictionary entries I needed. So I decided to cut some corners on the original cheat and came up with this as my final solution:
private void SetImageData(PdfImageObject pdfImageObj, byte[] imageData)
{
    iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(imageData);
    image.SimplifyColorspace();
    // Let iTextSharp build the image dictionary (color space, filter, bits per component, ...)
    PdfImage tempPdfImage = new PdfImage(image, "TempImg", null);
    PRStream imgStream = (PRStream)pdfImageObj.GetDictionary();
    imgStream.Clear();
    imgStream.SetDataRaw(imageData);
    // Copy the generated dictionary entries onto the existing stream
    imgStream.Merge(tempPdfImage);
}

GWT Progress Bar while buffering PDF content before displaying on browser

In my GWT app I have written a servlet to download/stream a PDF file.
Following is the code.
protected void updateResponse(HttpServletResponse response, InputStream dataStream, long contentLength, KnownContentTypes contentType, String filename, int cacheSeconds) {
    response.setHeader("Content-Type", contentType.getTypeString());
    response.setHeader("Content-Length", String.valueOf(contentLength));
    response.setHeader("Content-disposition", "inline;filename=" + filename);
    response.setHeader("Cache-Control", "max-age=" + cacheSeconds);
    byte[] buffer = new byte[BUFFER_SIZE];
    try {
        ServletOutputStream out = response.getOutputStream();
        int bytesRead;
        // Write only the bytes actually read; the last chunk is usually shorter than the buffer
        while ((bytesRead = dataStream.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
        }
    } catch (IOException e) {
        sendError(response);
    }
}
The PDF is successfully rendered in the browser.
The problem is that some PDFs are really large, and since this is called from window.open, all I see is a blank browser window while the file loads.
I want to display a dynamic message like '1MB of 5MB downloaded' and render the entire PDF file once all bytes have been streamed.
Please let me know how to do this.
I am new to GWT and any help will be appreciated.
Thanks.
This cannot be done with the synchronous request we usually make by calling the servlet with Window.open. You have to make an Ajax request to calculate the progress and display the response.
Take a look at this library: http://code.google.com/p/gwtupload/. It is really easy to use and works fine in most browsers. It uses Ajax requests to calculate progress.
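Whatever transport is used to report it, the progress figure itself comes from counting bytes as they are copied. The following is a minimal sketch of that counting in plain Java; ProgressCopy, copy and label are hypothetical names, and how the number reaches the GWT client (for example by polling a second servlet) is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Locale;
import java.util.function.LongConsumer;

public class ProgressCopy {

    /** Copies in to out and reports the running byte count after each chunk. */
    public static long copy(InputStream in, OutputStream out, int bufferSize,
                            LongConsumer onProgress) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the bytes actually read
            total += n;
            onProgress.accept(total);
        }
        return total;
    }

    /** Formats a message such as "1.0MB of 5.0MB downloaded". */
    public static String label(long done, long total) {
        double mb = 1024.0 * 1024.0;
        return String.format(Locale.ROOT, "%.1fMB of %.1fMB downloaded",
                done / mb, total / mb);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[3 * 1024 * 1024];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(data), sink, 1024 * 1024,
                done -> System.out.println(label(done, data.length)));
    }
}
```

On the server the callback could store the count where a status request can read it; on the client the same counting would be applied to the bytes received so far.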

adding page numbers and creating landscape A4 in streams with itext

I have the following code that generates a PDF into a stream. This works well, but I now have the following requirements.
1) Make the page landscape: in other examples this property is set on the Document object, but I'm doing this in-stream. How would I set it here?
2) Add page numbers: I need to put items into a grid so that there are x number of rows per page, with a page number in the footer of each page. How can this be achieved with iTextSharp?
public static void Create(ICollection<Part> parts, string path)
{
    PdfReader reader = new PdfReader(path);
    var pageWidth = 500;
    byte[] bytes;
    using (MemoryStream ms = new MemoryStream())
    {
        using (PdfStamper stamper = new PdfStamper(reader, ms))
        {
            PdfContentByte cb = stamper.GetOverContent(1);
            //Flush the PdfStamper's buffer
            stamper.Close();
            //Get the raw bytes of the PDF
            bytes = ms.ToArray();
            var now = String.Format("{0:d-M-yyyy}", DateTime.Now);
            var pdfName = string.Format("{0}_factory_worksheet", now).Replace("%", "").Replace(" ", "_");
            var context = HttpContext.Current;
            context.Response.ContentType = "application/pdf";
            context.Response.AddHeader("content-disposition", "attachment;filename=" + pdfName);
            context.Response.Buffer = true;
            context.Response.Clear();
            //Write the exact PDF bytes; ms.GetBuffer() may include unused trailing capacity
            context.Response.OutputStream.Write(bytes, 0, bytes.Length);
            context.Response.OutputStream.Flush();
            context.Response.End();
        }
    }
}
I don't know exactly how you would handle this in C#, but the logical flow (shown here with the Java API) is as follows.
Use the page's PdfDictionary to rotate the content by 90 degrees. Assuming your PDF contains multiple pages, loop over them:
PdfReader reader = new PdfReader(path);
for (int pgCnt = 1; pgCnt <= reader.getNumberOfPages(); pgCnt++) {
    // Logic to implement rotation and add the page number
}
To get the current rotation (assuming the page is in portrait mode and you want to convert it to landscape), use int rttnPg = reader.getPageRotation(pgCnt); and get the PdfDictionary of that page with PdfDictionary pgDctnry = reader.getPageN(pgCnt);
Now rotate it by 90 degrees:
pgDctnry.put(PdfName.ROTATE, new PdfNumber((rttnPg + 90) % 360));
Then bind it using PdfStamper as you are currently doing. To add the page number, get the over content of the current page (named pgCntntBt here):
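PdfContentByte pgCntntBt = stamper.getOverContent(pgCnt);
Rectangle rctPgSz = reader.getPageSizeWithRotation(pgCnt);
// Base font for the page-number text; Helvetica is only an example choice
BaseFont bfUsed = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED);
String txtPg = String.format(pgTxt + " %d/%d", pgCnt, totPgCnt);
pgCntntBt.beginText();
pgCntntBt.setFontAndSize(bfUsed, 8.2f);
// x and y are the text position, to be chosen from rctPgSz; the last argument is the text rotation
pgCntntBt.showTextAligned(PdfContentByte.ALIGN_RIGHT, txtPg, x, y, 0);
pgCntntBt.endText();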
I don't understand what you mean by "I need to put items into a grid so that there are x number of rows per page. With a page number at the footer of the page". Finally, close the stamper to flush everything to the output stream.
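If the grid requirement means splitting the items into pages of a fixed number of rows, each with a "current/total" footer, the bookkeeping can be sketched independently of iText. This is only one reading of the question; GridPager and its methods are hypothetical names, and drawing the rows and the footer is left to the stamper code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GridPager {

    /** Splits items into consecutive pages of at most rowsPerPage rows. */
    public static <T> List<List<T>> paginate(List<T> items, int rowsPerPage) {
        if (rowsPerPage <= 0) {
            throw new IllegalArgumentException("rowsPerPage must be positive");
        }
        List<List<T>> pages = new ArrayList<>();
        for (int start = 0; start < items.size(); start += rowsPerPage) {
            int end = Math.min(start + rowsPerPage, items.size());
            pages.add(new ArrayList<>(items.subList(start, end)));
        }
        return pages;
    }

    /** Builds the footer text, e.g. "Page 2/3". */
    public static String footer(String prefix, int page, int totalPages) {
        return String.format("%s %d/%d", prefix, page, totalPages);
    }

    /** Adds a quarter turn to a page rotation, keeping it within 0..359. */
    public static int rotateQuarterTurn(int currentRotation) {
        return (currentRotation + 90) % 360;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        List<List<String>> pages = paginate(parts, 3);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println(footer("Page", i + 1, pages.size()) + " " + pages.get(i));
        }
    }
}
```

Each inner list would then be drawn as the rows of one page, with the footer string passed to showTextAligned for that page.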