Itext Html2pdf memory management - itext

I'm using iText 7.1.9 for Java with html2pdf 2.1.6. The following code converts an HTML string to elements and adds them to a PDF. I run out of memory very quickly when adding large amounts of data. The heap dump shows that iText classes take most of the memory, and most of that is taken by some font class. How can I reduce memory consumption when adding an HTML string to a PDF?
private void addHtmlToDocument(String htmlText, Document document) {
List<IElement> elements;
try {
elements = HtmlConverter.convertToElements(htmlText);
for (IElement element : elements) {
if (element instanceof IBlockElement) {
document.add((IBlockElement) element);
}
}
} catch (IOException e) {
log.error("Unable to add Html elements to pdf document: " + e.toString(), e);
}
}
Here is the stack trace
at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
at java.util.HashMap.resize()[Ljava/util/HashMap$Node; (HashMap.java:704)
at java.util.HashMap.putVal(ILjava/lang/Object;Ljava/lang/Object;ZZ)Ljava/lang/Object; (HashMap.java:663)
at java.util.HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (HashMap.java:612)
at java.util.HashSet.add(Ljava/lang/Object;)Z (HashSet.java:220)
at com.itextpdf.layout.tagging.LayoutTaggingHelper.releaseFinishedHints()V (LayoutTaggingHelper.java:282)
at com.itextpdf.layout.renderer.DocumentRenderer.updateCurrentArea(Lcom/itextpdf/layout/layout/LayoutResult;)Lcom/itextpdf/layout/layout/LayoutArea; (DocumentRenderer.java:99)
at com.itextpdf.layout.renderer.RootRenderer.updateCurrentAndInitialArea(Lcom/itextpdf/layout/layout/LayoutResult;)V (RootRenderer.java:463)
at com.itextpdf.layout.renderer.RootRenderer.addChild(Lcom/itextpdf/layout/renderer/IRenderer;)V (RootRenderer.java:234)
at com.itextpdf.layout.RootElement.createAndAddRendererSubTree(Lcom/itextpdf/layout/element/IElement;)V (RootElement.java:377)
at com.itextpdf.layout.RootElement.add(Lcom/itextpdf/layout/element/IBlockElement;)Lcom/itextpdf/layout/IPropertyContainer; (RootElement.java:106)
at com.itextpdf.layout.Document.add(Lcom/itextpdf/layout/element/IBlockElement;)Lcom/itextpdf/layout/Document; (Document.java:160)
at org.test.service.VRGeneratorImpl.addHtmlToDocument(Ljava/lang/String;Lcom/itextpdf/layout/Document;)

Related

iText7: Error at file pointer when merging two pdfs

We are in the last steps of evaluating iText7. We use iText 7.1.0 and html2pdf 2.0.0.
What we do: we send a json_encoded collection with pdf-data (which includes html for header, body and footer) to our Java app. There we iterate over the collection, create a byteArrayOutputStream for each pdf-data element and merge them together. We then send the results to a script which echoes it to e.g. a browser. Although the pdf is displayed correctly, we encounter errors while creating it:
com.itextpdf.io.IOException: Error at file pointer 226,416.
...
Caused by: com.itextpdf.io.IOException: xref subsection not found.
... 73 common frames omitted
If we create only one part of the collection, no error is thrown.
Iterate over collection and merge:
@RequestMapping(value = "/pdf", method = RequestMethod.POST, produces = MediaType.APPLICATION_PDF_VALUE)
public byte[] index(@RequestBody PDFDataModelCollection elements, Model model) throws IOException {
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfWriter writer = new PdfWriter(byteArrayOutputStream);
try (PdfDocument resultDoc = new PdfDocument(writer)) {
for (PDFDataModel pdfDataModel : elements.getElements()) {
PdfReader reader = new PdfReader(new ByteArrayInputStream(creationService.createDatasheet(pdfDataModel)));
try (PdfDocument sourceDoc = new PdfDocument(reader)) {
int n = sourceDoc.getNumberOfPages(); //<-- IOException on second iteration
for (int i = 1; i <= n; i++) {
PdfPage page = sourceDoc.getPage(i).copyTo(resultDoc);
resultDoc.addPage(page);
}
}
}
}
return byteArrayOutputStream.toByteArray(); //outputs the final pdf
}
Creation of part:
public byte[] createDatasheet(PDFDataModel pdfDataModel) throws IOException {
PdfWriter writer = new PdfWriter(byteArrayOutputStream);
//Initialize PDF document
PdfDocument pdfDoc = new PdfDocument(writer);
try (
Document document = new Document(pdfDoc)
) {
//header, footer, etc
//body
for (IElement element : HtmlConverter.convertToElements(pdfDataModel.getBody(), this.props)) {
document.add((IBlockElement) element);
}
footer.writeTotalNumberOnPages(pdfDoc);
}
return byteArrayOutputStream.toByteArray();
}
We are grateful for any suggestion.
In createDatasheet you appear to re-use some byteArrayOutputStream without clearing it first.
In the first iteration, therefore, everything works as desired, at the end of createDatasheet you have a single PDF file in it.
In the second iteration, though, you have two PDF files in that byteArrayOutputStream, one after the other. This concatenation does not form a valid single PDF.
Thus, byteArrayOutputStream.toByteArray() returns something broken.
To fix this, either make the byteArrayOutputStream local to createDatasheet and create a new instance every time or alternatively reset byteArrayOutputStream at the start of createDatasheet:
public byte[] createDatasheet(PDFDataModel pdfDataModel) throws IOException {
byteArrayOutputStream.reset();
PdfWriter writer = new PdfWriter(byteArrayOutputStream);
[...]
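The concatenation effect can be reproduced without iText. The following is a minimal sketch in plain Java, assuming a hypothetical class `StreamReuseDemo` whose `shared` field plays the role of the reused `byteArrayOutputStream`. The short strings are stand-ins for rendered PDF bytes, not real PDF content.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a service that keeps one output stream as a field.
class StreamReuseDemo {
    private final ByteArrayOutputStream shared = new ByteArrayOutputStream();

    // Appends the fake document bytes to the given stream.
    private static void write(ByteArrayOutputStream out, String fakePdf) {
        byte[] bytes = fakePdf.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }

    // Mirrors the problem: the shared stream still holds earlier output.
    byte[] renderWithoutReset(String fakePdf) {
        write(shared, fakePdf);
        return shared.toByteArray();
    }

    // Mirrors the reset() fix: discard earlier output before writing.
    byte[] renderWithReset(String fakePdf) {
        shared.reset();
        write(shared, fakePdf);
        return shared.toByteArray();
    }

    // Mirrors the other fix: a new local stream for every call.
    static byte[] renderWithLocalStream(String fakePdf) {
        ByteArrayOutputStream local = new ByteArrayOutputStream();
        write(local, fakePdf);
        return local.toByteArray();
    }
}
```

Calling `renderWithoutReset` twice on the same instance returns the first document followed by the second on the second call, which is the invalid concatenation described above. The other two methods return only the latest document.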

itext AcroFields form onto second page, needs to keep same template

I have a form that I created in MS Word and then converted to a PDF form. I load it with a PdfReader and create a stamper that fills in the fields. If I want to add a second page with the same template (form), how do I do this and populate some of the fields with the same information?
I have managed to get a new page with another reader, but how do I stamp information onto this page, given that the AcroFields will have the same names?
This is how I achieved that:
stamper.insertPage(1,PageSize.A4);
PdfReader reader = new PdfReader("/soaprintjobs/templates/STOTemplate.pdf"); //reads the original pdf
PdfImportedPage page; //holds the page imported from the template
page = stamper.getImportedPage(reader,1); //retrieve the first page of the original pdf
PdfContentByte newPageContent = stamper.getUnderContent(1); //get the under content of the first page of the new pdf
newPageContent.addTemplate(page, 0,0);
Thanks
Acroform fields have the property that fields with the same name are considered the same field. They have the same value. So if you have a field with the same name on page 1 and page 2, they will always display the same value. If you change the value on page 1, it will also change on page 2.
In some cases this is desirable. You may have a multi-page form with a reference number and want to repeat that reference number on each page. In that case you can use fields with the same name.
However, if you want to have multiple copies of the same form with different data in 1 document, you'll run into problems. You'll have to rename the form fields so they are unique.
In iText, you should not use getImportedPage() to copy Acroforms. Starting with iText 5.4.4 you can use the PdfCopy class. In earlier versions the PdfCopyFields class should be used.
Here's some sample code to copy Acroforms and rename fields. Code for iText 5.4.4 and up is in comments.
public static void main(String[] args) throws FileNotFoundException, DocumentException, IOException {
String[] inputs = { "form1.pdf", "form2.pdf" };
PdfCopyFields pcf = new PdfCopyFields(new FileOutputStream("out.pdf"));
// iText 5.4.4+
// Document document = new Document();
// PdfCopy pcf = new PdfCopy(document, new FileOutputStream("out.pdf"));
// pcf.setMergeFields();
// document.open();
int documentnumber = 0;
for (String input : inputs) {
PdfReader reader = new PdfReader(input);
documentnumber++;
// add suffix to each field name, in order to make them unique.
renameFields(reader, documentnumber);
pcf.addDocument(reader);
}
pcf.close();
// iText 5.4.4+
// document.close();
}
public static void renameFields(PdfReader reader, int documentnumber) {
Set<String> keys = new HashSet<String>(reader.getAcroFields()
.getFields().keySet());
for (String key : keys) {
reader.getAcroFields().renameField(key, key + "_" + documentnumber);
}
}

How to read the Problems view in Eclipse programmatically

Is there any way to read the Eclipse Problems view programmatically from an Eclipse plugin?
I want to fetch the data shown in the Problems view.
Yes: Ask the workbench for all Markers of type IMarker.PROBLEM. The documentation contains a code snippet for this:
IMarker[] problems = null;
int depth = IResource.DEPTH_INFINITE;
try {
problems = resource.findMarkers(IMarker.PROBLEM, true, depth);
} catch (CoreException e) {
// something went wrong
}
To get the workspace root, use ResourcesPlugin.getWorkspace().getRoot();
The file MarkerTypesModel.java contains this code:
private String getWellKnownLabel(String type) {
if (type.equals(IMarker.PROBLEM)) {
return "Problem";//$NON-NLS-1$
}
if (type.equals(IMarker.TASK)) {
return "Task";//$NON-NLS-1$
}
if (type.equals("org.eclipse.jdt.core.problem")) { //$NON-NLS-1$
return "Java Problem";//$NON-NLS-1$
}
return type;
}
As you can see, it compares the type with a fixed string to produce Java Problem (and the NON_NLS-Comments are wrong, too).

GWT numberformat weird behaviour

I have a GWT project for which the locale is set to fr. I have a custom text field that uses a number format to validate and format the numerical inputs.
The formatting works fine but not the input validation. Here is a snapshot of the method that validates that the new value is a valid percentage (this is called onValueChanged):
private void validateNumber(String newVal){
logger.debug("value changed, newVal="+newVal+", current="+current);
// Attempt to parse value
double val=0;
try{
val=Double.parseDouble(newVal);
}catch(NumberFormatException e){
logger.warn("parsing failed",e);
try{
val=getFormatter().parse(newVal);
}catch(NumberFormatException ex){
logger.warn("parsing with nb format failed",ex);
// on failure: restore previous value
setValue(current,false);
return;
}
}
//some check on min and max value here
}
For example if the starting value is set by the program to "0.2" it will show up as 20,00 % hence using the correct decimal separator.
Now:
If I input 0,1, I get a NumberFormatException.
If I input 0.1, it shows as 10,00 %.
If I input 10% (without the space before the '%'), I get a NumberFormatException.
Do you know how I can modify the method to have 0,1 and 10% identified as valid inputs?
As Colin mentioned, you definitely want to parse and format using a GWT Number Format object, not Double, so the parsing and formatting are properly locale specific.
Below are some code snippets I could find to parse, validate and format a percent number.
Note however the edit process has the % unit hard-coded outside of the text box value, hence no conversion between 20,45% and 0.2045 in the edit process, 20,45 is entered directly and visualized as such. I vaguely recall struggling with such conversion during the edit process but forgot the details as it was a while back. So if it is a critical part of your question and requirements then I am afraid the examples below may be of limited value. Anyway, here they are!
Notations:
TextBox txt = new TextBox();
NumberFormat _formatFloat = NumberFormat.getFormat("#,##0.00");
NumberFormat _formatPercent = NumberFormat.getFormat("##0.00%");
Parsing text entry like "20,45" as 20.45 (not "20,45%" as 0.2045):
txt.setText("20,45"); // French locale format example, % symbol hard-coded outside of text box.
try {
float amount = (float) _formatFloat.parse(txt.getText());
} catch (NumberFormatException e) ...
Parsing & Validating text entry like "20,45":
private class PercentEntryValueChangeHandler implements ValueChangeHandler<String>
{
@Override
public void onValueChange(ValueChangeEvent<String> event)
{
validatePercent((TextBox) event.getSource());
}
};
private void validatePercent(final TextBox percentTextBox)
{
try
{
if (!percentTextBox.getText().isEmpty())
{
final float val = (float) _formatFloat.parse(percentTextBox.getText());
if (isValid(val))
percentTextBox.setText(_formatFloat.format(val));
else
{
percentTextBox.setFocus(true);
percentTextBox.setText("");
Window.alert("Please give me a valid value!");
}
}
}
catch (NumberFormatException e)
{
percentTextBox.setFocus(true);
percentTextBox.setText("");
Window.alert("Error: entry is not a valid number!");
}
}
private boolean isValid(float val) { return 12.5 < val && val < 95.5; }
txt.addValueChangeHandler(new PercentEntryValueChangeHandler());
Formatting 20.45 as "20,45":
float val = 20.45f;
txt.setText(_formatFloat.format(val));
Formatting 0.2045 as "20,45%" (read only process, the text box is not editable, the % is set inside the text box):
float val = 0.2045f;
txt.setText(_formatPercent.format((double)(val))); // * 100 embedded inside format.
It is not fancy and probably far from perfect but it works!
Any feedback on how to improve upon this implementation is more than welcome and appreciated!
I hope it helps anyway.
I managed to make it work by changing the code to the following:
private void validateNumber(String newVal){
double val=0;
try{
val=getFormatter().parse(newVal);
}catch(NumberFormatException e){
boolean ok=false;
try{
val=NumberFormat.getDecimalFormat().parse(newVal);
ok=true;
}catch(NumberFormatException e1){}
if(!ok){
try{
val=Double.parseDouble(newVal);
}catch(NumberFormatException ex){
setValue(current,false);
// inform user
Window.alert(Proto2.errors.myTextField_NAN(newVal));
return;
}
}
}
}
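The fallback chain above (percent format, then locale decimal format, then `Double.parseDouble`) can be tried outside GWT. The following is a minimal sketch that swaps GWT's `NumberFormat` for the JDK's `java.text.DecimalFormat` with French symbols, so it is an illustration of the idea rather than the GWT code itself. The class name `LenientPercentParser`, the method names and the two patterns are assumptions made for this example.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper; uses java.text instead of GWT's NumberFormat.
class LenientPercentParser {

    // Returns the parsed value only if the format consumed the whole text, else null.
    static Double parseWhole(DecimalFormat format, String text) {
        ParsePosition pos = new ParsePosition(0);
        Number parsed = format.parse(text, pos);
        if (parsed == null || pos.getIndex() != text.length()) {
            return null;
        }
        return parsed.doubleValue();
    }

    // Tries "10%" style input, then "0,1" style input, then plain "0.1".
    // Throws NumberFormatException if none of the three applies.
    static double parseFraction(String input) {
        String text = input.trim();
        // French symbols: ',' is the decimal separator.
        DecimalFormatSymbols french = DecimalFormatSymbols.getInstance(Locale.FRANCE);
        // The '%' in the pattern makes parse() divide the number by 100.
        Double value = parseWhole(new DecimalFormat("#,##0.00%", french), text);
        if (value == null) {
            value = parseWhole(new DecimalFormat("#,##0.###", french), text);
        }
        if (value == null) {
            value = Double.parseDouble(text);
        }
        return value;
    }
}
```

In this sketch, `parseFraction("0,1")`, `parseFraction("10%")` and `parseFraction("0.1")` all yield approximately 0.1, while `parseFraction("abc")` throws a `NumberFormatException`. Checking the `ParsePosition` matters because `DecimalFormat.parse` otherwise accepts input with trailing characters it could not read.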

Merge 2 pdf byte streams using Itextsharp

I have a method that returns a PDF byte stream (from a fillable PDF). Is there a straightforward way to merge two streams into one stream and make one PDF out of it? I need to run my method twice, but need the two PDFs in one PDF stream. Thanks.
You didn't say if you're flattening the filled forms with the PdfStamper, so I'll just say you must flatten them before trying to merge them. Here's a working .ashx HTTP handler:
<%@ WebHandler Language="C#" Class="mergeByteForms" %>
using System;
using System.IO;
using System.Web;
using iTextSharp.text;
using iTextSharp.text.pdf;
public class mergeByteForms : IHttpHandler {
HttpServerUtility Server;
public void ProcessRequest (HttpContext context) {
Server = context.Server;
HttpResponse Response = context.Response;
Response.ContentType = "application/pdf";
using (Document document = new Document()) {
using (PdfSmartCopy copy = new PdfSmartCopy(
document, Response.OutputStream) )
{
document.Open();
for (int i = 0; i < 2; ++i) {
PdfReader reader = new PdfReader(_getPdfBtyeStream(i.ToString()));
copy.AddPage(copy.GetImportedPage(reader, 1));
}
}
}
}
public bool IsReusable { get { return false; } }
// simulate your method to use __one__ byte stream for __one__ PDF
private byte[] _getPdfBtyeStream(string data) {
// replace with __your__ PDF template
string pdfTemplatePath = Server.MapPath(
"~/app_data/template.pdf"
);
PdfReader reader = new PdfReader(pdfTemplatePath);
using (MemoryStream ms = new MemoryStream()) {
using (PdfStamper stamper = new PdfStamper(reader, ms)) {
AcroFields form = stamper.AcroFields;
// replace this with your form field data
form.SetField("title", data);
// ...
// this is __VERY__ important; since you're using the same fillable
// PDF, if you don't set this property to true the second page will
// lose the filled fields.
stamper.FormFlattening = true;
}
return ms.ToArray();
}
}
}
Hopefully the inline comments make sense. The _getPdfBtyeStream() method above simulates your PDF byte streams. The reason you need to set FormFlattening to true is that when you fill PDF form fields, names are supposed to be unique. In your case the second page is the same fillable PDF form, so it has the same field names as the first page, and when you fill them they're ignored. Comment out the example line above:
stamper.FormFlattening = true;
to see what I mean.
In other words, a lot of the generic code to merge PDFs on the Internet, and even here on Stack Overflow, will not work for fillable forms because AcroFields are not being accounted for. In fact, if you take a look at the "SO FAQ & Popular" entry on merging PDFs in Stack Overflow's itextsharp tag info, this is mentioned in the third comment for the correctly marked answer by @Ray Cheng.
Another way to merge fillable PDF (without flattening the form) is to rename the form fields for the second/following page(s), but that's more work.