I am trying to use iTextSharp to convert some HTML mail from Outlook into PDF. Some mail gives problem to the HTMLWorker, generating exceptions.
In case this happens, I want to catch the exception and abandon the PDF creation. But I can not. What do I have to do to check and properly close the opened Document?
Directly before calling Close() you can check the PageNumber property of your Document to see if there are any pages.
if (doc.PageNumber == 0) {
//Do something here
}
doc.Close();
Also, the HTMLWorker class isn't being actively developed anymore. Instead, almost all new HTML parsing code is being done in a separate library called XMLWorker. See #kuujinbo's sample code here.
Start with a new page and add your paragraphs:
Document document = new Document();
document.Open();
foreach (var item in List)
{
document.NewPage();
AddParagraph(item, document);
}
document.Close();
Related
I have searched and searched and cannot find the answer to my problem. I've tried many different approaches in my code, but I've hit a wall and I'm not sure where to go from here. I seem to be wanting to do the same thing as these two threads:
Trying to insert an image into a pdf in c#
Add image in an existing PDF with itextsharp
They are very similar and the answer is the same. However, when I use that exact code, the result is a PDF without an image. Here is my code:
using (var existingFileStream = new FileStream(fileNameExisting, FileMode.Open))
using (var newFileStream = new FileStream(fileNameNew, FileMode.Create))
{
var pdfReader = new PdfReader(existingFileStream);
var stamper = new PdfStamper(pdfReader, newFileStream, '\0', true);
var form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
foreach (var field in form.Fields)
{
if (field.Key == "form1[0].ec_Bldg_Photo_1[0].ImageField2[0]")
{
PushbuttonField imageField = form.GetNewPushbuttonFromField(field.Key);
imageField.Layout = PushbuttonField.LAYOUT_ICON_ONLY;
imageField.IconReference = null;
imageField.ProportionalIcon = true;
imageField.Image = Image.GetInstance(#"PATH_TO_IMAGE\front.jpg");
form.ReplacePushbuttonField(field.Key, imageField.Field);
}
}
stamper.FormFlattening = false;
stamper.Close();
pdfReader.Close();
}
I have tried to rule out all of the obvious things. My path to the image is correct, the field is indeed a PushbuttonField when I read the existing PDF field and get the field type. If I open the PDF in Adobe Reader and click on the placeholder for the image, it allows me to pick a file from my PC. When I place an image in the file, save, and then read in that PDF, I can then change my code to this:
imageField.ProportionalIcon = false;
And now all of sudden the image is stretched on the saved copy. So I see that it is changing this part but this is when I enter the image manually in Adobe Reader. When I read in the field after I set that image in Adobe Reader and it shows correctly, I see a couple interesting things. The field.Image property IS NULL and the field.IconReference is NOT NULL. When I use the original code to try and insert the image, it is reversed, where Image is NOT NULL but IconReference IS NULL
Any help would be greatly appreciated, thank you!!
EDIT 1: Ok so I didn't see it the first time, but I went back and checked more thoroughly and I did find that key. Here it is:
Several things are at play here.
Usage Rights:
The PDF is digitally signed with a private key owned by Adobe.
You can see this using RUPS here (in your screen shot you didn't go deep enough):
This has two implications:
The signature unlocks special permissions in Adobe Reader, such as the permission to save a filled out form locally.
Making any changes to the original PDF breaks the signature and removes the special permissions leading to an ugly error message in Adobe Reader.
This functionality is deprecated in (and even removed from) PDF 2.0. It's old technology that became obsolete with the emergence of PDF viewers other than Adobe Reader.
My suggestion: remove the usage rights to avoid breaking the signature. See the FAQ entry "Why do I get an error saying that "use of extended features is no longer available"?" iText 7 / iText 5
This is the iText 7 code:
public void removeUsageRights(PdfDocument pdfDoc) {
PdfDictionary perms = pdfDoc.getCatalog().getPdfObject().getAsDictionary(PdfName.Perms);
if (perms == null) {
return;
}
perms.remove(new PdfName("UR"));
perms.remove(PdfName.UR3);
if (perms.size() == 0) {
pdfDoc.getCatalog().remove(PdfName.Perms);
}
}
This is the iText 5 code:
PdfReader reader = new PdfReader(old_file);
if (reader.hasUsageRights()) {
reader.removeUsageRights();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(new_file));
stamper.close();
}
reader.close();
This is the iText 5 answer.
Hybrid Form:
If you click on the /AcroForm entry, you see this:
There is a /Fields array with references to field dictionaries that are also widget annotations. That means that the document has an AcroForm form inside. However, there is also an /XFA entry with a series of XML snippets. That means that the document has an XFA form inside.
In other words: the same form description is added twice inside. You are changing a button in one form (the AcroForm part), but not in the other (the XFA form) and that leads to inconsistencies.
XFA has been deprecated in PDF 2.0 because there weren't many vendors supporting that technology. It's kind of frustrating to be confronted with forms that use deprecated technology.
My suggestion: I would remove the XFA part. See the FAQ entry "Is it safe to remove XFA?" iText 5 / iText 7
In iText 5, removing XFA is done like this:
AcroFields form = stamper.getAcroFields();
form.removeXfa();
Important: my suggestion is to remove all the deprecated functionality from the PDF, but if the government expects that functionality to be present, then you're out of luck. In that case, you will need to use Adobe software to process the form. If that's the case, you could complain to the government that their requirements lead to a de facto vendor lock-in. By the way: iText Software is also a vendor. It's an open source company that offers open source software under the AGPL license. The AGPL license allows free use under certain circumstances (see How do I make sure my software complies with AGPL: How can I use iText for free?) If you don't meet those requirements, you will have to purchase a commercial license for your use of iText.
I want to use iText to convert a series of html file to PDF.
For instance: if have these files:
page1.html
page2.html
page3.html
...
Now I want to create a single PDF file, where page1.html is the first page, page2.html is the second page, and so on...
I know how to convert a single HTML file to a PDF, but I don't know how to combine these different PDFs resulting from this operation into a single PDF.
Before we start: I am not a C# developer, so I can not give you an example in C#. All the iText examples I write, are written in Java. Fortunately, iText and iTextSharp are always kept in sync. In the context of this question, you can rest assure that whatever works for iText will also work for iTextSharp, but you'll have to make small adaptations that are specific to C#. From what I hear from C# developers, this is usually not hard to achieve.
Regarding the answer: there are two answers and answer #2 is generally better than answer #1, but I'm giving both options because there may be specific cases where answer #1 is better.
Test data: I have created 3 simple HTML files, each containing some info about a State in the US:
page1.html: California
page2.html: New York
page3.html: Massachusetts
We are going to use XML Worker to parse these three files and we want a single PDF file as a result.
Answer #1: see ParseMultipleHtmlFiles1 for the full code sample and multiple_html_pages1.pdf for the resulting PDF.
You say that you already succeeded in converting one HTML file into one PDF files. It is assumed that you did it like this:
public byte[] parseHtml(String html) throws DocumentException, IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, baos);
// step 3
document.open();
// step 4
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
new FileInputStream(html));
// step 5
document.close();
// return the bytes of the PDF
return baos.toByteArray();
}
This is not the most efficient way to parse an HTML file (there are other examples on the web site), but it's the simplest way.
As you can see, this method parse an HTML into a PDF file and returns that PDF file in the form of a byte[]. As we want to create a single PDF, we can feed this byte array to a PdfCopy instance, so that we can concatenate multiple documents.
Suppose that we have three documents:
public static final String[] HTML = {
"resources/xml/page1.html",
"resources/xml/page2.html",
"resources/xml/page3.html"
};
We can loop over these three documents, parse them one by one to a byte[], create a PdfReader instance with the PDF bytes, and add the document to the PdfCopy instance using the addDocument() method:
public void createPdf(String file) throws IOException, DocumentException {
Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileOutputStream(file));
document.open();
PdfReader reader;
for (String html : HTML) {
reader = new PdfReader(parseHtml(html));
copy.addDocument(reader);
reader.close();
}
document.close();
}
This solves your problem, but why do I think it's not the optimal solution?
Suppose that you need to use a special font that needs to be embedded. In that case, every separate PDF file will contain a subset of that font. Different files will require different font subsets, and PdfCopy (nor PdfSmartCopy for that matter) can merge font subsets. This could result in a bloated PDF file with way too many font subsets of the same font.
How do we solve this? That's explained in answer #2.
Answer #2: See ParseMultipleHtmlFiles2 for the full code sample and multiple_html_pages2.pdf for the resulting PDF. You already see the difference in file size: 4.61 KB versus 5.05 KB (and we didn't even introduce embedded fonts).
In this case, we don't parse the HTML to a PDF file the way we did in the parseHtml() method from answer #1. Instead, we parse the HTML to an iText ElementList using the parseToElementList() method. This method requires two Strings. One containing the HTML code, the other one containing CSS values.
We use a utility method to read the HTML file into a String. As for the CSS value, we could pass null to parseToElementList(), but in that case, default styles will be ignored. You'll notice that the <h1> tag we introduced in our HTML will look completely different if you don't pass the default.css that is shipped with XML Worker.
Long story short, this is the code:
public void createPdf(String file) throws IOException, DocumentException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(file));
document.open();
String css = readCSS();
for (String htmlfile : HTML) {
String html = Utilities.readFileToString(htmlfile);
ElementList list = XMLWorkerHelper.parseToElementList(html, css);
for (Element e : list) {
document.add(e);
}
document.newPage();
}
document.close();
}
We create a single Document and a single PdfWriter instance. We parse the different HTML files into ElementLists one by one, and we add all the elements to the Document.
As you want a new page, each time a new HTML file is parsed, I introduced a document.newPage(). If you remove this line, you can add the three HTML pages on a single page (which wouldn't be possible if you would opt for answer #1).
I want to use iText to convert a series of html file to PDF.
For instance: if have these files:
page1.html
page2.html
page3.html
...
Now I want to create a single PDF file, where page1.html is the first page, page2.html is the second page, and so on...
I know how to convert a single HTML file to a PDF, but I don't know how to combine these different PDFs resulting from this operation into a single PDF.
Before we start: I am not a C# developer, so I can not give you an example in C#. All the iText examples I write, are written in Java. Fortunately, iText and iTextSharp are always kept in sync. In the context of this question, you can rest assure that whatever works for iText will also work for iTextSharp, but you'll have to make small adaptations that are specific to C#. From what I hear from C# developers, this is usually not hard to achieve.
Regarding the answer: there are two answers and answer #2 is generally better than answer #1, but I'm giving both options because there may be specific cases where answer #1 is better.
Test data: I have created 3 simple HTML files, each containing some info about a State in the US:
page1.html: California
page2.html: New York
page3.html: Massachusetts
We are going to use XML Worker to parse these three files and we want a single PDF file as a result.
Answer #1: see ParseMultipleHtmlFiles1 for the full code sample and multiple_html_pages1.pdf for the resulting PDF.
You say that you already succeeded in converting one HTML file into one PDF files. It is assumed that you did it like this:
public byte[] parseHtml(String html) throws DocumentException, IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, baos);
// step 3
document.open();
// step 4
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
new FileInputStream(html));
// step 5
document.close();
// return the bytes of the PDF
return baos.toByteArray();
}
This is not the most efficient way to parse an HTML file (there are other examples on the web site), but it's the simplest way.
As you can see, this method parse an HTML into a PDF file and returns that PDF file in the form of a byte[]. As we want to create a single PDF, we can feed this byte array to a PdfCopy instance, so that we can concatenate multiple documents.
Suppose that we have three documents:
public static final String[] HTML = {
"resources/xml/page1.html",
"resources/xml/page2.html",
"resources/xml/page3.html"
};
We can loop over these three documents, parse them one by one to a byte[], create a PdfReader instance with the PDF bytes, and add the document to the PdfCopy instance using the addDocument() method:
public void createPdf(String file) throws IOException, DocumentException {
Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileOutputStream(file));
document.open();
PdfReader reader;
for (String html : HTML) {
reader = new PdfReader(parseHtml(html));
copy.addDocument(reader);
reader.close();
}
document.close();
}
This solves your problem, but why do I think it's not the optimal solution?
Suppose that you need to use a special font that needs to be embedded. In that case, every separate PDF file will contain a subset of that font. Different files will require different font subsets, and PdfCopy (nor PdfSmartCopy for that matter) can merge font subsets. This could result in a bloated PDF file with way too many font subsets of the same font.
How do we solve this? That's explained in answer #2.
Answer #2: See ParseMultipleHtmlFiles2 for the full code sample and multiple_html_pages2.pdf for the resulting PDF. You already see the difference in file size: 4.61 KB versus 5.05 KB (and we didn't even introduce embedded fonts).
In this case, we don't parse the HTML to a PDF file the way we did in the parseHtml() method from answer #1. Instead, we parse the HTML to an iText ElementList using the parseToElementList() method. This method requires two Strings. One containing the HTML code, the other one containing CSS values.
We use a utility method to read the HTML file into a String. As for the CSS value, we could pass null to parseToElementList(), but in that case, default styles will be ignored. You'll notice that the <h1> tag we introduced in our HTML will look completely different if you don't pass the default.css that is shipped with XML Worker.
Long story short, this is the code:
public void createPdf(String file) throws IOException, DocumentException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(file));
document.open();
String css = readCSS();
for (String htmlfile : HTML) {
String html = Utilities.readFileToString(htmlfile);
ElementList list = XMLWorkerHelper.parseToElementList(html, css);
for (Element e : list) {
document.add(e);
}
document.newPage();
}
document.close();
}
We create a single Document and a single PdfWriter instance. We parse the different HTML files into ElementLists one by one, and we add all the elements to the Document.
As you want a new page, each time a new HTML file is parsed, I introduced a document.newPage(). If you remove this line, you can add the three HTML pages on a single page (which wouldn't be possible if you would opt for answer #1).
I have some code that generates documents using the Document, PdfWriter and PdfResource. As it loops through the generation of the PDF, it creates new pages. Sometimes, there may be a condition where the generation of the new page fails, and the page should not be added. Is there a way to handle pages "transactionally". i.e. create a page and enter content, if it fails, "rollback" the changes and do not add the page to the document?
I have some code that looks like the following:
pdfResource.document.newPage();
PdfContentByte contentByte = writer.getDirectContent();
contentByte.saveState();
try {
// do some work to fill the page
} catch (Exception e) {
// How do I rollback and remove the page???
} finally {
contentByte.restoreState();
}
I am currently using version 5.0.2
I know of no way to do what you are trying to do.
The recommended path would be to do the "try/catch" stuff before you call newPage(). Set up some state variables variables, perform your sanity tests, etc. If you're working with images make sure that you can actually read them. I would actually go so far as to instantiate/load the image bytes beforehand.
Another option would be to mark those pages as "to delete at the end" and then call pdfReader.selectaPages() on everything except those.
I am working in asp .net mvc3.
I have these statements in controller class:
PdfWriter.GetInstance(doc, new FileStream((Request.PhysicalApplicationPath + "\\Receipt5.pdf"),
FileMode.Create));
doc.Open();
PdfPTable table = new PdfPTable(2);
table.AddCell("tt[0]");
table.AddCell("tt[1]");
doc.close();
All time my values are changing but in pdf sometimes showing old result. please tell me what should i do for it that whenever i press done button then new pdf document should generate.
i am using iTextSharp to generate pdf.
It seems that you're not able to replace the old file cause it is locked.
Try to delete it and see what happens.
Anyway, consider that if more than one user tries to print the same document you can have a concurrency problem.
I would suggest you to use a generated file name:
var newFile = System.IO.Path.Combine(Request.PhysicalApplicationPath, Guid.NewGuid().ToString() + ".pdf");