iText: Unable to print mathematical characters like ∈, ∩, ∑, ∫, ∆, √, ∠

I am using iText version 4.0.2 for PDF generation. I need to print characters/symbols such as ∈, ∩, ∑, ∫, ∆ (mathematical symbols) and many others.
My code:
Document document = new Document(PageSize.A4, 60, 60, 60, 60);
try
{
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("/home/adeel/experiment.pdf"));
    document.open();
    String str = "this string will contains special character like this ∈, ∩, ∑, ∫, ∆";
    BaseFont bfTimes = null;
    try {
        bfTimes = BaseFont.createFont(BaseFont.TIMES_ROMAN, BaseFont.CP1252, BaseFont.EMBEDDED);
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    Font fontnormal = new Font(bfTimes, 12, Font.NORMAL, Color.BLACK);
    Paragraph para = new Paragraph(str, fontnormal);
    document.add(para);
    document.close();
    writer.close();
    System.out.println("Done!");
} catch (DocumentException e)
{
    e.printStackTrace();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
In the code above, I put all the symbols in str and generate the PDF file. Instead of the literal symbols, I also tried putting Unicode escapes in str, but that didn't work either.

Please take a look at the MathSymbols example.
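First you need a font that supports the symbols you need. The standard Type 1 font Times-Roman that you're using has no glyphs for them, but FreeSans.ttf, for instance, does. Then you need to use the right encoding: CP1252 can't represent these characters.
You're using Unicode characters, so you need Identity-H as the encoding.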
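To see why CP1252 can't work here, the JDK's own windows-1252 charset can be asked whether each symbol is encodable. This is a minimal sketch that uses only java.nio.charset; it assumes the JDK ships the windows-1252 charset (mainstream JDKs do) and illustrates the encoding itself, not iText's internal encoding tables. The class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class Cp1252Check {
    // The symbols from the question, written as Unicode escapes
    static final String SYMBOLS = "\u2208\u2229\u2211\u222b\u2206\u221a\u2220";

    /** Returns true if every character of text can be encoded in windows-1252. */
    static boolean fitsInCp1252(String text) {
        // A fresh encoder per call, because CharsetEncoder is stateful
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        for (char c : SYMBOLS.toCharArray()) {
            System.out.printf("U+%04X encodable in CP1252: %b%n", (int) c, fitsInCp1252(String.valueOf(c)));
        }
        System.out.println("Plain ASCII encodable: " + fitsInCp1252("abc"));
    }
}
```

Every math symbol reports false, while ordinary Latin text reports true, which is why switching to a Unicode font with Identity-H is needed rather than a different single-byte encoding.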
You should also use notations such as \u2208, \u2229, \u2211, \u222b, \u2206. That's not a must, but it's good practice: it keeps the source independent of the encoding the compiler assumes for the .java file.
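If the symbols are already typed literally somewhere, a small helper can produce the equivalent escape notation to paste into the source. This is a hypothetical utility (the name toUnicodeEscapes is made up for this sketch): it keeps ASCII as-is and escapes every other char as four lowercase hex digits, so characters outside the Basic Multilingual Plane come out as two escaped surrogates, which is also how Java source spells them.

```java
public class UnicodeEscaper {
    /** Hypothetical helper: replaces every non-ASCII char with a Java Unicode escape (backslash, 'u', four hex digits). */
    static String toUnicodeEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                sb.append(c); // ASCII is kept unchanged
            } else {
                sb.append(String.format("\\u%04x", (int) c)); // everything else becomes an escape
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the input with each math symbol replaced by its escape sequence
        System.out.println(toUnicodeEscapes("like this \u2208, \u2229, \u2211"));
    }
}
```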
This is how it's done:
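import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;
public class MathSymbols {
    public static final String DEST = "results/fonts/math_symbols.pdf";
    public static final String FONT = "resources/fonts/FreeSans.ttf";
    public static final String TEXT = "this string contains special characters like this \u2208, \u2229, \u2211, \u222b, \u2206";
    public void createPdf(String dest) throws IOException, DocumentException {
        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream(dest));
        document.open();
        // A font that has the glyphs (FreeSans), combined with the Identity-H encoding
        BaseFont bf = BaseFont.createFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font f = new Font(bf, 12);
        Paragraph p = new Paragraph(TEXT, f);
        document.add(p);
        document.close();
    }
}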
The result looks like this: math_symbols.pdf
Important: you should always use the most recent, official version of iText. iText 4.0.2 is not an official version.

Related

iText PDF/A-2 Java add total page count in footer

I need to add the total page count to a PDF/A-2 document created using iText in Java. The following code is being used:
public class HeaderFooterPageEvent extends PdfPageEventHelper {
    Font fontHEADER = null;
    /** The template with the total number of pages. */
    PdfTemplate total;

    public HeaderFooterPageEvent() {
        try {
            fontHEADER = new Font(BaseFont.createFont("OpenSans-Regular.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED), 8, Font.BOLD);
        } catch (DocumentException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void onOpenDocument(PdfWriter writer, Document document) {
        total = writer.getDirectContent().createTemplate(30, 16);
        super.onOpenDocument(writer, document);
    }

    @Override
    public void onCloseDocument(PdfWriter writer, Document document) {
        PdfContentByte cb = writer.getDirectContent();
        ColumnText.showTextAligned(total, Element.ALIGN_RIGHT,
                new Phrase(String.valueOf(writer.getPageNumber() - 1), fontHEADER),
                document.right() - document.rightMargin() + 5,
                document.bottom() - 10, 0);
        super.onCloseDocument(writer, document);
    }
}
And when creating the PDF the following code is called:
Document document = new Document(PageSize.A4, 15, 15, 30, 20);
PdfAWriter writer = PdfAWriter.getInstance(document, new FileOutputStream(dest), PdfAConformanceLevel.PDF_A_2A);
writer.createXmpMetadata();
writer.setTagged();
// add header and footer
HeaderFooterPageEvent event = new HeaderFooterPageEvent();
writer.setPageEvent(event);
document.open();
document.addLanguage("en-us");
File file = new File("sRGB_CS_profile.icm");
ICC_Profile icc = ICC_Profile
.getInstance(new FileInputStream(file));
writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
Paragraph p=new Paragraph("Page 1 content",fontEmbedded); //setting an embedded font
p.setSpacingBefore(30f);
document.add(p);
document.newPage();
document.add(new Paragraph("Content of next page goes here",fontEmbedded));
document.close();
Now, when we add content on two pages and use document.newPage() to start the second page, a runtime exception is thrown: "The page 3 was requested but the document has only 2 pages." What is the solution to this problem?
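The event class relies on a common trick: the total page count isn't known until the document is closed, so onOpenDocument reserves a PdfTemplate, every page footer refers to it, and onCloseDocument fills it in once. Note that the code shown never places total on a page; in the usual iText pattern that happens in onEndPage. Also, coordinates used when writing into a template are relative to the template's own 30 by 16 box, so text positioned at document.right() - document.rightMargin() + 5 most likely falls outside it. The plain-Java sketch below models only the deferred-fill idea; Placeholder, Footer and renderFooters are hypothetical names, not iText API.

```java
import java.util.ArrayList;
import java.util.List;

public class DeferredTotalDemo {
    /** Hypothetical stand-in for a PdfTemplate: its content is filled in later. */
    static final class Placeholder {
        private String value = "?";
        void fill(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    /** Hypothetical footer: stores its page number and a reference to the shared placeholder. */
    static final class Footer {
        final int page;
        final Placeholder total;
        Footer(int page, Placeholder total) { this.page = page; this.total = total; }
        String render() { return "Page " + page + " of " + total; }
    }

    static List<String> renderFooters(int pageCount) {
        Placeholder total = new Placeholder();     // like onOpenDocument: reserve the placeholder
        List<Footer> footers = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            footers.add(new Footer(page, total));  // like onEndPage: every footer points at the same placeholder
        }
        total.fill(String.valueOf(pageCount));     // like onCloseDocument: fill it once the count is known
        List<String> rendered = new ArrayList<>();
        for (Footer f : footers) {
            rendered.add(f.render());
        }
        return rendered;
    }

    public static void main(String[] args) {
        renderFooters(2).forEach(System.out::println);
    }
}
```

If Footer copied total.toString() in its constructor instead of keeping the reference, every footer would show "?", which is why the template object itself, not a string, has to be added to each page.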

How can I use this ® symbol in iTextSharp? [duplicate]

How to search for a specific keyword in a PDF and highlight it using iText

I need to do the following:
Read an existing PDF file
Search for specific keywords in the PDF
Highlight them in a specific color or in bold
Save the PDF
I have tried the code below:
public static void main(String[] args) throws IOException, DocumentException
{
    File file = new File(DEST);
    file.getParentFile().mkdirs();
    new BrefingPackageHighlight_Main2().manipulatePdf(SRC, DEST);
}

public void manipulatePdf(String src, String dest) throws IOException, DocumentException
{
    PdfReader reader = new PdfReader(src);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    PdfContentByte canvas = stamper.getOverContent(2);
    canvas.saveState();
    canvas.setColorFill(BaseColor.YELLOW);
    canvas.rectangle(200, 786, 5, 5);
    canvas.fill();
    canvas.restoreState();
    stamper.close();
    reader.close();
}
The code above only draws a small yellow square near the top of the second page. Could you provide a sample that searches for one specific keyword and highlights only its occurrences?
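A common approach with iText 5 is to let a text-extraction listener (for example a RenderListener from the com.itextpdf.text.pdf.parser package, which receives TextRenderInfo objects) report the text together with its coordinates, find the keyword in that stream, and then draw a rectangle over the matching area with computed coordinates instead of fixed ones. The sketch below shows only the middle step in plain Java: given characters with positions, as such a listener might report them, it returns one bounding box per occurrence. Glyph, Box, find and line are hypothetical names, and the sketch assumes each occurrence sits on a single line.

```java
import java.util.ArrayList;
import java.util.List;

public class KeywordBoxes {
    /** Hypothetical positioned character, e.g. built from text-extraction callbacks. */
    static final class Glyph {
        final char c;
        final float x, y, width, height;
        Glyph(char c, float x, float y, float width, float height) {
            this.c = c; this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    /** Rectangle in PDF user space (origin at the lower left). */
    static final class Box {
        final float llx, lly, urx, ury;
        Box(float llx, float lly, float urx, float ury) {
            this.llx = llx; this.lly = lly; this.urx = urx; this.ury = ury;
        }
    }

    /** Returns one bounding box per non-overlapping occurrence of keyword. */
    static List<Box> find(List<Glyph> glyphs, String keyword) {
        List<Box> boxes = new ArrayList<>();
        if (keyword.isEmpty()) return boxes;
        StringBuilder text = new StringBuilder();
        for (Glyph g : glyphs) text.append(g.c);
        int idx = text.indexOf(keyword);
        while (idx >= 0) {
            float llx = Float.MAX_VALUE, lly = Float.MAX_VALUE;
            float urx = -Float.MAX_VALUE, ury = -Float.MAX_VALUE;
            // Union of the boxes of all characters that belong to this match
            for (int i = idx; i < idx + keyword.length(); i++) {
                Glyph g = glyphs.get(i);
                llx = Math.min(llx, g.x);
                lly = Math.min(lly, g.y);
                urx = Math.max(urx, g.x + g.width);
                ury = Math.max(ury, g.y + g.height);
            }
            boxes.add(new Box(llx, lly, urx, ury));
            idx = text.indexOf(keyword, idx + keyword.length());
        }
        return boxes;
    }

    /** Hypothetical test helper: lays out s on one line with fixed-width characters. */
    static List<Glyph> line(String s, float x0, float y, float charWidth, float height) {
        List<Glyph> glyphs = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            glyphs.add(new Glyph(s.charAt(i), x0 + i * charWidth, y, charWidth, height));
        }
        return glyphs;
    }

    public static void main(String[] args) {
        for (Box b : find(line("foo bar foo", 10f, 100f, 5f, 10f), "foo")) {
            System.out.println(b.llx + "," + b.lly + " - " + b.urx + "," + b.ury);
        }
    }
}
```

Each box could then be drawn with canvas.rectangle(box.llx, box.lly, box.urx - box.llx, box.ury - box.lly) followed by canvas.fill(), as in the question's code; drawing on stamper.getUnderContent(page) rather than getOverContent(page) keeps an opaque fill from hiding the text in most documents.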

Issue converting HTML containing a <pre> tag to PDF with Flying Saucer and iText

I am using the Flying Saucer library to convert HTML to PDF. It works fine with most HTML files.
But for some HTML files that contain tags inside a <pre> tag, the generated PDF file displays those tags as text.
If I remove the <pre> tags, the formatting of the data is lost.
My code is:
org.w3c.dom.Document document = null;
try {
    Document doc = Jsoup.parse(new File(htmlFile), "UTF-8", "");
    Whitelist wl = new RelaxedPlusDataBase64Images();
    Cleaner cleaner = new Cleaner(wl);
    doc = cleaner.clean(doc);
    Tidy tidy = new Tidy();
    tidy.setShowWarnings(false);
    tidy.setXmlTags(false);
    tidy.setInputEncoding("UTF-8");
    tidy.setOutputEncoding("UTF-8");
    tidy.setPrintBodyOnly(true);
    tidy.setXHTML(true);
    tidy.setMakeClean(true);
    tidy.setAsciiChars(true);
    if (doc.select("pre").html().contains("</")) {
        doc.select("pre").unwrap();
    }
    Reader reader = new StringReader(doc.html());
    document = (tidy.parseDOM(reader, null));
    Element element = (Element) document.getElementsByTagName("head").item(0);
    element.getParentNode().removeChild(element);
    NodeList elements = document.getElementsByTagName("img");
    for (int i = 0; i < elements.getLength(); i++) {
        String value = elements.item(i).getAttributes().getNamedItem("src").getNodeValue();
        if (value != null && value.startsWith("cid:") && value.contains("#")) {
            value = value.substring(value.indexOf("cid:") + 4, value.indexOf("#"));
            elements.item(i).getAttributes().getNamedItem("src").setNodeValue(value);
            System.out.println(value);
        }
    }
    document.normalize();
    System.out.println(getNiceLyFormattedXMLDocument(document));
} catch (Exception e) {
    System.out.println(e);
}
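The img loop above rewrites src values of the form cid:NAME#REST to just NAME. Pulled out into a standalone method, the rule is easier to test; stripCid is a hypothetical name, and the method mirrors the loop's condition and substring call.

```java
public class CidSrc {
    /** Hypothetical helper mirroring the loop: "cid:NAME#REST" becomes "NAME"; anything else is returned unchanged. */
    static String stripCid(String value) {
        if (value != null && value.startsWith("cid:") && value.contains("#")) {
            // '#' can only occur after the 4-character "cid:" prefix here
            return value.substring("cid:".length(), value.indexOf('#'));
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(stripCid("cid:logo.png#part1"));
        System.out.println(stripCid("images/logo.png"));
    }
}
```

One thing the extraction makes visible: in the original loop, an <img> without a src attribute would cause a NullPointerException, because NamedNodeMap.getNamedItem returns null in that case; checking for null before calling getNodeValue() avoids it.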
The method that creates the PDF:
try {
    org.w3c.dom.Document doc = CleanHtml.cleanNTidyHTML("b.html");
    ITextRenderer renderer = new ITextRenderer();
    renderer.setDocument(doc, null);
    renderer.setPDFVersion(new Character('7'));
    String outputFile = "test.pdf";
    OutputStream os = new FileOutputStream(outputFile);
    renderer.layout();
    renderer.createPDF(os);
    os.flush();
    os.close();
} catch (Exception e) {
    e.printStackTrace();
}
Using iText XMLWorker:
try {
    org.w3c.dom.Document doc = CleanHtml.cleanNTidyHTML("a.html");
    String k = CleanHtml.getNiceLyFormattedXMLDocument(doc);
    OutputStream file = new FileOutputStream(new File("test.pdf"));
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, file);
    document.open();
    ByteArrayInputStream is = new ByteArrayInputStream(k.getBytes());
    XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
    document.close();
    file.close();
} catch (Exception e) {
    e.printStackTrace();
}
public static String getNiceLyFormattedXMLDocument(org.w3c.dom.Document doc) throws IOException, TransformerException {
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer transformer = tf.newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    // transformer.setOutputProperty(OutputKeys.METHOD, "xml");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
    Writer stringWriter = new StringWriter();
    StreamResult streamResult = new StreamResult(stringWriter);
    transformer.transform(new DOMSource(doc), streamResult);
    String result = stringWriter.toString();
    return result;
}
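getNiceLyFormattedXMLDocument uses only the JDK's javax.xml APIs, so it can be tried in isolation. The sketch below builds a tiny DOM and serializes it with the same output properties; the class name, the sample method and the sample content are made up for the example. It also shows that a "<" stored as text content (rather than as a real element) is written out escaped as &lt;, which is one way markup can end up displayed literally if an earlier cleaning step turned tags into text.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PrettyPrintDemo {
    /** Same output properties as getNiceLyFormattedXMLDocument. */
    static String prettyPrint(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Hypothetical sample: a body element containing a pre element whose text includes '<'. */
    static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element body = doc.createElement("body");
        Element pre = doc.createElement("pre");
        pre.setTextContent("a < b"); // text content, not markup
        body.appendChild(pre);
        doc.appendChild(body);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prettyPrint(sample()));
    }
}
```

Exact indentation can differ between JDK versions, so only the absence of the XML declaration and the escaped text are worth relying on.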

XMLWorkerHelper loses text not between tags

I've been using XMLWorkerHelper to add formatted text, entered on web pages via a rich text editor, to PDFs. I've noticed that sometimes not all of the text is rendered in the PDF. Apparently XMLWorkerHelper drops text that is not enclosed in HTML tags.
Is this correct behavior?
I've written a JUnit test case that shows the problem:
public class XMLWorkerTest {
    @Test
    public void test() throws IOException, DocumentException {
        Document document = new Document();
        String fileName = "itext_test_" + System.currentTimeMillis() + ".pdf";
        PdfWriter.getInstance(document, new FileOutputStream(fileName));
        document.open();
        Paragraph paragraph = new Paragraph();
        String s1 = "not between tags<b>between tags</b>not between tags";
        addHtml(paragraph, s1);
        // NOT OK: 'not between tags' missing twice
        paragraph.add(Chunk.NEWLINE);
        String s2 = "<span>" + s1 + "</span>";
        addHtml(paragraph, s2);
        // OK
        document.add(paragraph);
        document.close();
    }

    private void addHtml(final Paragraph paragraph, String html) throws IOException {
        XMLWorkerHelper.getInstance().parseXHtml(new ElementHandler() {
            @Override
            public void add(Writable writable) {
                if (writable instanceof WritableElement) {
                    for (Element element : ((WritableElement) writable).elements()) {
                        paragraph.add(element);
                    }
                }
            }
        }, new ByteArrayInputStream(html.getBytes()), Charset.defaultCharset());
    }
}
We're using version 5.5.6.
That's the expected behavior. Your HTML should have a root tag; otherwise it isn't really well-formed HTML. Just because the text shows in a browser doesn't mean the markup is well-formed.
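A quick way to see the difference is to feed both strings from the test case to the JDK's strict XML parser: the bare fragment is rejected, while the wrapped version parses. XMLWorker uses its own, more lenient parser, so this only illustrates well-formedness, not XMLWorker's internals. wrapInRoot is a hypothetical helper for giving rich-text-editor output a single root element before it is passed to parseXHtml.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class RootCheck {
    /** Returns true if the string parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow errors instead of letting the default handler print "[Fatal Error]" to stderr
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    /** Hypothetical helper: gives an HTML fragment a single root element. */
    static String wrapInRoot(String fragment) {
        return "<div>" + fragment + "</div>";
    }

    public static void main(String[] args) {
        String s1 = "not between tags<b>between tags</b>not between tags";
        System.out.println(isWellFormed(s1));
        System.out.println(isWellFormed(wrapInRoot(s1)));
    }
}
```

The first call prints false and the second prints true. In the test case, s2 achieves the same effect by wrapping s1 in a <span>.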