Display base64 encoded string as image in a PDF file using iText - itext

I have a Base64-encoded string that I want to display as an image in a PDF file. I am using iText to do this, and Apache Commons Codec to convert the Base64 string to a byte array. Below is the code:
Document document = new Document();
PdfWriter.getInstance(document,new FileOutputStream("C:\\Path\\Path\\example.pdf"));
document.open();
String example = "...base64..String";
byte[] decoded = org.apache.commons.codec.binary.Base64.decodeBase64(example.getBytes());
Image image1 = Image.getInstance(decoded);
document.add(image1);
document.close();
This code executes without any error but when I open the generated PDF file it opens with an "internal error" and no image is displayed. What is the problem?
The complete Base64 string is -
_9j_4AAQSkZJRgABAQAAAQABAAD_2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj_2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj_wAARCAIVAUADASIAAhEBAxEB_8QAHQAAAwADAQEBAQAAAAAAAAAAAAECAwYHBQQICf_EAFkQAAEDAgQDBAYFBggKCQIHAAEAAhEDBAUSITEGQVEHEyJhFDJxgZGhF5Ox0dIVI0JSVcEIFiQzYnLh8CU0NUVWgpKVsrM3RlNzdYSiw_FEYyY2VIOUo8L_xAAZAQEBAQEBAQAAAAAAAAAAAAAAAQIDBAX_xAApEQEBAAIBBQACAQQCAwAAAAAAAQIREgMTITFRIkFxBDJhoYGRQlLw_9oADAMBAAIRAxEAPwD9OoTQgSE0IEhNCBITQgSE0IEhNCBITQgSE0IEhNCBITQgSE0IEhNCBITQgSE0IEhNCBITQgSE0IEhNCBITQgSE0IGhNCBL5sSvrXDLCve39enb2tBhfUq1DDWtHMoxO_tcLsK99iFxTt7Sgwvq1ahhrWjmVw17sR7asX7
Thanks!
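One thing that stands out is that the string starts with _9j_ rather than the /9j/ that Base64-encoded JPEG data normally starts with, which suggests the URL-safe Base64 alphabet ('-' and '_' in place of '+' and '/'). A quick sanity check, independent of iText, is to decode the start of the string with a URL-safe decoder and see whether the bytes begin with the JPEG marker FF D8 FF. The sketch below uses only java.util.Base64; the class and method names are made up for illustration, and whether the alphabet is the actual cause of the "internal error" is an assumption, not something the question confirms.

```java
import java.util.Base64;

// Hypothetical helper class, not part of iText or Commons Codec.
class Base64ImageCheck {

    // Decodes text written in the URL-safe Base64 alphabet ('-' and '_').
    static byte[] decodeUrlSafe(String text) {
        return Base64.getUrlDecoder().decode(text);
    }

    // True if the data starts with the JPEG start-of-image marker FF D8 FF.
    static boolean looksLikeJpeg(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8
                && (data[2] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        // First 14 characters of the string from the question.
        byte[] head = decodeUrlSafe("_9j_4AAQSkZJRg");
        System.out.println(looksLikeJpeg(head));
    }
}
```

If the same check passes on the full string, the decoded bytes can be handed to Image.getInstance(decoded) as in the question. If decoding fails or the marker is missing, the string itself (for example a truncated copy) is the more likely culprit.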

Related

iText - add image to an existing PDF

I am doing the following using iText.
I have a PDF
I add an image to the PDF
I save the modified PDF.
To add the image to the PDF I am using this method: itext-add.
This worked fine until I got a certain PDF. For this PDF, the method for adding the image doesn't work. Moreover, it corrupts the PDF.
Points to note:
I'm getting PDFs from a third party and these are contractual PDFs, so it is possible that they have added some restrictions.
And one fun fact: when I add an annotation on the same page where I want to add the image, the image starts showing up!
I'm using iText 7.1.10
String srcFileName = "/Users/kalpit/Desktop/step1-stack.pdf";
String destFileName = "step1-test1-kd2.pdf";
File destFile = new File(destFileName);
PdfDocument pdf = new PdfDocument(new PdfReader(srcFileName), new PdfWriter(destFile),new StampingProperties().useAppendMode());
Document document = new Document(pdf);
String imFile = "/Users/kalpit/Desktop/sample-image.png";
ImageData data = ImageDataFactory.create(imFile);
Image image = new Image(data);
document.add(image);
document.close();
and here is the PDF : https://ipupload.com/tP6/step1.pdf
As @Nikita found out, the problem only occurs when working in append mode. The cause is a bug: under unfortunate conditions the changed resources are not stored.
The unfortunate conditions here are that on page 1
the content stream array already is an indirect object in its own right, so only this indirect object (and not the page object) is marked as changed when the instructions for showing the image are added; and
the resources and resource type dictionaries are direct objects but only they (and not the page object, the indirect object holding them) are marked as changed when the image resource is added.
Thus, the changes to the page content streams are stored, but the page object, which holds the changed resources, is not.
So there now is an instruction to draw an image from an image page resource that does not exist.
One way to work around this is not to use append mode, as proposed by @Nikita.
Alternatively, if you are required to use append mode, you can explicitly mark your page as changed:
Document document = new Document(pdf);
String imFile = "/Users/kalpit/Desktop/sample-image.png";
ImageData data = ImageDataFactory.create(imFile);
Image image = new Image(data);
document.add(image);
pdf.getFirstPage().setModified(); // <-----
document.close();
Well, something is wrong with append mode. Do you really need it? Try adding the image with the following code:
PdfDocument pdf = new PdfDocument(new PdfReader(srcFile), new PdfWriter(outFileName));
Document doc = new Document(pdf);
PdfImageXObject xObject = new PdfImageXObject(ImageDataFactory.createPng(UrlUtil.toURL("pathToYourimage.png")));
Image image = new Image(xObject, 100).setHorizontalAlignment(HorizontalAlignment.RIGHT);
doc.add(image);
doc.close();
The result looks like this: [screenshot not preserved]

Using iTextPDF to parse HTML with XMLWorkerHelper writes the content directly into the Document, but I want to set the data using ColumnText [duplicate]

I want to use iText to convert a series of HTML files to PDF.
For instance, if I have these files:
page1.html
page2.html
page3.html
...
Now I want to create a single PDF file, where page1.html is the first page, page2.html is the second page, and so on...
I know how to convert a single HTML file to a PDF, but I don't know how to combine these different PDFs resulting from this operation into a single PDF.
Before we start: I am not a C# developer, so I cannot give you an example in C#. All the iText examples I write are written in Java. Fortunately, iText and iTextSharp are always kept in sync. In the context of this question, you can rest assured that whatever works for iText will also work for iTextSharp, but you'll have to make small adaptations that are specific to C#. From what I hear from C# developers, this is usually not hard to achieve.
Regarding the answer: there are two answers and answer #2 is generally better than answer #1, but I'm giving both options because there may be specific cases where answer #1 is better.
Test data: I have created 3 simple HTML files, each containing some info about a State in the US:
page1.html: California
page2.html: New York
page3.html: Massachusetts
We are going to use XML Worker to parse these three files and we want a single PDF file as a result.
Answer #1: see ParseMultipleHtmlFiles1 for the full code sample and multiple_html_pages1.pdf for the resulting PDF.
You say that you already succeeded in converting one HTML file into one PDF file. I assume that you did it like this:
public byte[] parseHtml(String html) throws DocumentException, IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, baos);
    // step 3
    document.open();
    // step 4
    XMLWorkerHelper.getInstance().parseXHtml(writer, document,
            new FileInputStream(html));
    // step 5
    document.close();
    // return the bytes of the PDF
    return baos.toByteArray();
}
This is not the most efficient way to parse an HTML file (there are other examples on the web site), but it's the simplest way.
As you can see, this method parses an HTML file into a PDF file and returns that PDF file in the form of a byte[]. As we want to create a single PDF, we can feed this byte array to a PdfCopy instance, so that we can concatenate multiple documents.
Suppose that we have three documents:
public static final String[] HTML = {
    "resources/xml/page1.html",
    "resources/xml/page2.html",
    "resources/xml/page3.html"
};
We can loop over these three documents, parse them one by one to a byte[], create a PdfReader instance with the PDF bytes, and add the document to the PdfCopy instance using the addDocument() method:
public void createPdf(String file) throws IOException, DocumentException {
    Document document = new Document();
    PdfCopy copy = new PdfCopy(document, new FileOutputStream(file));
    document.open();
    PdfReader reader;
    for (String html : HTML) {
        reader = new PdfReader(parseHtml(html));
        copy.addDocument(reader);
        reader.close();
    }
    document.close();
}
This solves your problem, but why do I think it's not the optimal solution?
Suppose that you need to use a special font that needs to be embedded. In that case, every separate PDF file will contain a subset of that font. Different files will require different font subsets, and PdfCopy (like PdfSmartCopy, for that matter) can't merge font subsets. This could result in a bloated PDF file with way too many font subsets of the same font.
How do we solve this? That's explained in answer #2.
Answer #2: See ParseMultipleHtmlFiles2 for the full code sample and multiple_html_pages2.pdf for the resulting PDF. You already see the difference in file size: 4.61 KB versus 5.05 KB (and we didn't even introduce embedded fonts).
In this case, we don't parse the HTML to a PDF file the way we did in the parseHtml() method from answer #1. Instead, we parse the HTML to an iText ElementList using the parseToElementList() method. This method requires two Strings. One containing the HTML code, the other one containing CSS values.
We use a utility method to read the HTML file into a String. As for the CSS value, we could pass null to parseToElementList(), but in that case, default styles will be ignored. You'll notice that the <h1> tag we introduced in our HTML will look completely different if you don't pass the default.css that is shipped with XML Worker.
Long story short, this is the code:
public void createPdf(String file) throws IOException, DocumentException {
    Document document = new Document();
    PdfWriter.getInstance(document, new FileOutputStream(file));
    document.open();
    String css = readCSS();
    for (String htmlfile : HTML) {
        String html = Utilities.readFileToString(htmlfile);
        ElementList list = XMLWorkerHelper.parseToElementList(html, css);
        for (Element e : list) {
            document.add(e);
        }
        document.newPage();
    }
    document.close();
}
We create a single Document and a single PdfWriter instance. We parse the different HTML files into ElementLists one by one, and we add all the elements to the Document.
Since you want a new page each time a new HTML file is parsed, I introduced a document.newPage(). If you remove this line, the three HTML pages can end up on a single page (which wouldn't be possible if you opted for answer #1).
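The createPdf() method of answer #2 relies on two helpers that are not shown here: the utility that reads an HTML file into a String, and readCSS(). A minimal stand-in for the file-reading part, using only the JDK, might look like the sketch below. HtmlFiles, readToString and writeAndReadBack are hypothetical names, and UTF-8 is an assumption about how the files are encoded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the file-reading helpers used in answer #2.
class HtmlFiles {

    // Reads a whole text file into a String, decoding it as UTF-8 (assumed).
    static String readToString(String path) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Demo only: writes content to a temporary file and reads it back.
    static String writeAndReadBack(String content) {
        try {
            Path sample = Files.createTempFile("page", ".html");
            try {
                Files.write(sample, content.getBytes(StandardCharsets.UTF_8));
                return readToString(sample.toString());
            } finally {
                Files.delete(sample);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndReadBack("<h1>California</h1>"));
    }
}
```

The same readToString() call could be used to load a CSS file from disk, so that its contents can be passed as the second argument of parseToElementList().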

RichEditDocumentServer docx conversion to pdf not working

I am trying to convert a Word document (.docx) into a PDF. I pass a Stream object to the LoadDocument method of RichEditDocumentServer, but I get a blank PDF. Passing the file path to LoadDocument works fine, but my requirement is to work with a Stream object. Can anyone help me fix the issue? Sample code is below.
private Stream ConvertToPdf(Stream fileStream)
{
    RichEditDocumentServer server = new RichEditDocumentServer();
    fileStream.Seek(0, SeekOrigin.Begin);
    server.LoadDocument(fileStream, DocumentFormat.Doc);
    Stream convertStream = new MemoryStream();
    server.ExportToPdf(convertStream);
    convertStream.Seek(0, SeekOrigin.Begin);
    return convertStream;
}
It worked using the approach from the link below, loading the .docx as DocumentFormat.OpenXml instead of DocumentFormat.Doc:
https://www.devexpress.com/support/center/Question/Details/T340655#answer-9c3224dd-383a-4faa-9672-9b34e36c1c7a
server.LoadDocument(fileStream, DocumentFormat.OpenXml);

Write hebrew text to pdf using itextpdf

I'm trying to write Hebrew characters to a PDF.
I build the HTML in a servlet by appending to a string buffer and then pass it to XMLWorkerHelper.
I also registered the arial.ttf, arialbold.ttf and arialuni.ttf fonts using XMLWorkerFontProvider, and I use the registered fonts in XMLWorkerHelper.
Any solution for this issue would be very helpful. Let me know if more details are required.
Below is the snippet.
XMLWorkerFontProvider fontImp = new XMLWorkerFontProvider();
fontImp.register(arial);
fontImp.register(arialbold);
fontImp.register(arialuni);
byte[] byte6=PDFtext.toString().getBytes("ISO-8859-1");
String bufferfinalPDF=new String(byte6, "UTF-8");
XMLWorkerHelper.getInstance().parseXHtml(writer, document, new ByteArrayInputStream(bufferfinalPDF.getBytes(Charset.forName(baseFont))),null,Charset.forName(baseFont),fontImp);
A solution or a suggestion would be very helpful.
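One thing that may be worth checking in the snippet is the charset round trip: the text is encoded as ISO-8859-1 and the resulting bytes are then decoded as UTF-8. ISO-8859-1 has no Hebrew letters, so in practice each Hebrew character is replaced by '?' before XML Worker ever sees it. The sketch below shows that effect with the JDK only; the class and method names are hypothetical, and it assumes PDFtext really contains Hebrew characters at that point, which the question does not show.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; uses only the JDK.
class CharsetRoundTrip {

    // Mirrors the two conversion lines from the snippet:
    // encode as ISO-8859-1, then decode those bytes as UTF-8.
    static String viaLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encodes and decodes with the same charset, UTF-8.
    static String viaUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Hebrew word "shalom", written with Unicode escapes.
        String shalom = "\u05E9\u05DC\u05D5\u05DD";
        System.out.println(viaLatin1(shalom).equals("????"));
        System.out.println(viaUtf8(shalom).equals(shalom));
    }
}
```

If this is what happens in the servlet, passing the original string encoded as UTF-8 to parseXHtml(), with the same charset given as the Charset argument, would at least keep the Hebrew characters intact.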
