I have problems outputting UTF-8 characters to a PDF file with Zend_Pdf. Here is my code:
// Load Zend_Pdf class
include 'Zend/Pdf.php';
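// Create new PDF
$pdf = new Zend_Pdf();
// Create an A4 page and add it to the document
$page = $pdf->newPage(Zend_Pdf_Page::SIZE_A4);
$pdf->pages[] = $page;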
// Set font
$page->setFont(Zend_Pdf_Font::fontWithPath('fonts/times.ttf'), 12);
// Draw text
$page->drawText('Janko Hraško', 200, 643, 'UTF-8');
The font I'm loading supports these characters, but I am getting this error:
Notice: iconv() [function.iconv]: Detected an illegal character in input string in D:\data\o\Zend\Pdf\Resource\Font\Type0.php on line 241
With the Helvetica font your code works!
Solved:
$page->drawText('Janko Hraško', 200, 643, 'Windows-1250');
For some reason, Windows-1250 works but UTF-8 doesn't. Weird, but I'll use Windows-1250 then.
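The iconv notice means the bytes passed to iconv were not valid UTF-8. The most likely reason Windows-1250 works (an assumption, since the question doesn't say how the script is saved) is that the PHP file itself is saved in Windows-1250, so the literal 'š' is stored as a Windows-1250 byte. The sketch below is in Java, like the rest of the code on this page, and uses a hypothetical class name `EncodingMismatch`. It shows that 'š' is one byte (0x9A) in Windows-1250 but two bytes (0xC5 0xA1) in UTF-8, and that the Windows-1250 bytes fail strict UTF-8 decoding. It assumes the JVM provides the `windows-1250` charset, as full JDK installs normally do.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatch {

    // True if the bytes decode as UTF-8 without errors.
    // newDecoder() defaults to REPORT, so malformed input throws instead of being replaced.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "Hra\u0161ko"; // "Hraško", written with an escape so the source encoding doesn't matter
        byte[] cp1250 = name.getBytes(Charset.forName("windows-1250")); // 's with caron' -> 0x9A (1 byte)
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);            // 's with caron' -> 0xC5 0xA1 (2 bytes)

        System.out.println("windows-1250: " + cp1250.length + " bytes, valid UTF-8: " + isValidUtf8(cp1250));
        System.out.println("UTF-8:        " + utf8.length + " bytes, valid UTF-8: " + isValidUtf8(utf8));
    }
}
```

Run as-is, it reports 6 bytes (not valid UTF-8) for Windows-1250 and 7 bytes (valid UTF-8) for UTF-8. In Zend_Pdf terms, the encoding argument passed to drawText() has to match how the string's bytes are actually stored: either keep 'Windows-1250', or save the script as UTF-8 and pass 'UTF-8', which should then work as well.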
I'm facing a problem when trying to export a Vietnamese document as PDF using iText.
I put Vietnamese words in an .xml file like this:
<td fontfamily="Helvetica" fontstyle="0" fontsize="9" align="0" colspan="48" lineoccupied="1">T\u1ED5 ch\u1EE9c tham gia</td>
Then I have Java read the phrases from the XML file and convert them to Unicode using this method:
public String convertToUnicode(String s) {
    int i = 0, len = s.length();
    StringBuilder sb = new StringBuilder(len);
    while (i < len) {
        char c = s.charAt(i++);
        if (c == '\\' && i < len) {
            c = s.charAt(i++);
            if (c == 'u') {
                // Check that four characters remain BEFORE reading them;
                // otherwise charAt() throws near the end of the string
                if (i + 4 <= len
                        && Character.digit(s.charAt(i), 16) != -1
                        && Character.digit(s.charAt(i + 1), 16) != -1
                        && Character.digit(s.charAt(i + 2), 16) != -1
                        && Character.digit(s.charAt(i + 3), 16) != -1) {
                    c = (char) Integer.parseInt(s.substring(i, i + 4), 16);
                    i += 4;
                } else {
                    sb.append('\\'); // not a complete escape: keep the backslash
                }
            } // add other cases here as desired...
        } // fall through: \ escapes itself, quotes any character but u
        sb.append(c);
    }
    return sb.toString();
}
After that, I export the String to PDF with UTF-8 encoding.
But the program fails to display the Vietnamese characters '\u1ED5' (ổ) and '\u1EE9' (ứ).
The output becomes "T chc tham gia".
Could you please show me how to fix this issue?
Thanks :)
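To confirm that the conversion step is not what drops the characters, the escapes can be decoded independently. The sketch below is an alternative to convertToUnicode based on a regular expression; the class and method names are hypothetical, and it only handles backslash-u escapes with exactly four hex digits (no other backslash sequences). If the decoded string contains U+1ED5 (ổ) and U+1EE9 (ứ), the text itself is fine and the characters are being lost later, when the PDF is rendered with the chosen font.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapes {
    // Matches a literal backslash, the letter 'u', then exactly four hex digits (as stored in the XML)
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces each backslash-u escape with the UTF-16 code unit it denotes
    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops '$' or a backslash in the result being read as a group reference
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "T\\u1ED5 ch\\u1EE9c tham gia"; // the text exactly as it appears in the XML
        String decoded = unescape(raw);
        System.out.println(decoded.equals("T\u1ED5 ch\u1EE9c tham gia")); // true
        System.out.printf("U+%04X U+%04X%n", (int) decoded.charAt(1), (int) decoded.charAt(5)); // U+1ED5 U+1EE9
    }
}
```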
There are three XML Worker examples involving Asian languages on the official iText website. They parse XHTML files containing Chinese characters, but it should be easy to adapt them to Vietnamese.
You can find the HTML files we're going to parse here:
hero.html
hero2.html
Both files contain the following text:
長空 (Broken Sword), 秦王殘劍 (Flying Snow), 飛雪 (Moon), 如月 (the King), and 秦王 (Sky).
In the first case, a font is defined using CSS:
<span style="font-size:12.0pt; font-family:MS Mincho">長空</span>
In the second case, no specific font is defined:
<body><p>長空 (Broken Sword), 秦王殘劍 (Flying Snow), 飛雪 (Moon), 如月 (the King), and 秦王 (Sky).</p></body>
These files are encoded in UTF-8, so we're going to parse them like this:
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
new FileInputStream(HTML), Charset.forName("UTF-8"));
The first thing you need is a font that supports Vietnamese characters. That's something iText can't help you with. In your HTML file, you've defined Helvetica, but that's a standard Type 1 font that is never embedded when using iText, and it doesn't know how to draw Vietnamese glyphs. That's never going to work.
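The exact symptom in the question ("T chc tham gia") is consistent with this. As a simplified model, assume that iText writes text in a standard Type 1 font such as Helvetica with a WinAnsi (Cp1252) encoding and silently skips the characters that encoding cannot represent. Java's windows-1252 charset then reproduces the output. The class name is hypothetical, and this is only a model of the behavior, not iText's actual code path.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class WinAnsiModel {
    // Keeps only the characters that windows-1252 (the basis of PDF's WinAnsiEncoding) can encode
    static String keepEncodable(String text) {
        CharsetEncoder enc = Charset.forName("windows-1252").newEncoder();
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (enc.canEncode(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String vietnamese = "T\u1ED5 ch\u1EE9c tham gia"; // "Tổ chức tham gia"
        System.out.println(keepEncodable(vietnamese));     // T chc tham gia
    }
}
```

With an embedded font that actually contains the Vietnamese glyphs, used with Identity-H encoding, nothing has to be dropped.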
The first example D07_ParseHtmlAsian will automatically search for a font named MS Mincho. If it finds that font (for instance because you have msmincho.ttc in your Windows fonts directory), the font will show up in your PDF. See hero.pdf. If it doesn't find a font with that name, then the glyphs won't be visible, because you didn't provide any font program for those glyphs.
The second example D07bis_ParseHtmlAsian offers a workaround in case you don't have MS Mincho anywhere. In that case, you have to use an XMLWorkerFontProvider and register a font that can be used instead of MS Mincho. For instance, we use a font stored in the file cfmingeb.ttf and assign it the alias MS Mincho:
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("resources/fonts/cfmingeb.ttf", "MS Mincho");
The resulting file asian.pdf is slightly different from what we expect, but now we can at least see the Chinese glyphs.
In the third example, the HTML file doesn't tell us anything about the font that needs to be used. We'll define the font using CSS like this:
CSSResolver cssResolver = new StyleAttrCSSResolver();
CssFile cssFile = XMLWorkerHelper.getCSS(new ByteArrayInputStream("body {font-family:tsc fming s tt}".getBytes()));
cssResolver.addCss(cssFile);
Now, all the text in the body will use the font TSC FMing S TT (stored in the file cfmingeb.ttf). You can see the difference in the resulting PDF asian2.pdf.
I think you need to declare UTF-8 as the encoding of your HTML, and use &#xHHHH; (hexadecimal) or &#NNNN; (decimal) character references to embed your special characters. I'm not sure where in your program this belongs, since that part isn't shown, but your final HTML should look like this:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML LEVEL 1//EN">
<HTML>
<HEAD>
<TITLE>Your Page Title</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
</HEAD>
<BODY>
<!-- YOUR CONTENT HERE -->
<td fontfamily="Helvetica" fontstyle="0" fontsize="9"
align="0" colspan="48"
lineoccupied="1">Tổ chức tham gia</td>
</BODY>
</HTML>
You can copy and paste the above into an HTML file and view the result. For further reading, see "Unicode and HTML".
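The character references mentioned above can be generated instead of typed by hand. A minimal sketch, with hypothetical class and method names: every non-ASCII code point becomes a hexadecimal reference (&#xHHHH;) or a decimal reference (&#NNNN;), while ASCII characters, including markup characters such as < and &, pass through unchanged.

```java
class CharRefs {
    // Non-ASCII code points become &#xHHHH; (hexadecimal); ASCII is copied unchanged
    static String toHexRefs(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("&#x%X;", cp));
            }
        });
        return sb.toString();
    }

    // Same idea, but with decimal references (&#NNNN;)
    static String toDecimalRefs(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "T\u1ED5 ch\u1EE9c tham gia"; // "Tổ chức tham gia"
        System.out.println(toHexRefs(text));     // T&#x1ED5; ch&#x1EE9;c tham gia
        System.out.println(toDecimalRefs(text)); // T&#7893; ch&#7913;c tham gia
    }
}
```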
Our project needs dynamic PDF generation in 6 languages, including Hindi and Arabic. iText works brilliantly for the other languages, but not for these two. Can someone tell me whether the current version of iText (5.5.5) has a ligature implementation for Hindi and Arabic?
About Arabic:
I understand you are facing one of two problems:
The Arabic words don't appear at all.
The Arabic words appear, but are not rendered correctly.
The solution is:
First, upgrade from iText 5.5.8 to iText 7:
https://search.maven.org/artifact/com.itextpdf/itext7-core/7.2.2/pom
Then edit your code:
String font = "your Arabic font"; // path to a font file that contains Arabic glyphs
PdfFontFactory.register(font);
FontProgram fontProgram = FontProgramFactory.createFont(font, true);
PdfFont f = PdfFontFactory.createFont(fontProgram, PdfEncodings.IDENTITY_H);
LanguageProcessor languageProcessor = new ArabicLigaturizer();
// pdfFile and TempWriter are defined elsewhere in your code
com.itextpdf.kernel.pdf.PdfDocument tempPdfDoc =
        new com.itextpdf.kernel.pdf.PdfDocument(new PdfReader(pdfFile.getPath()), TempWriter);
com.itextpdf.layout.Document tempDoc = new com.itextpdf.layout.Document(tempPdfDoc);
com.itextpdf.layout.element.Paragraph paragraph0 =
        new com.itextpdf.layout.element.Paragraph(languageProcessor.process("الاستماره الالكترونية"))
                .setFont(f)
                .setBaseDirection(BaseDirection.RIGHT_TO_LEFT)
                .setFontSize(15);
// Note the use of setBaseDirection(); don't use TextAlignment, it works without it
tempDoc.add(paragraph0);
tempDoc.close();
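The answer hard-codes BaseDirection.RIGHT_TO_LEFT. If the same code path also produces the left-to-right languages, one option (an assumption, not part of the answer) is to derive the direction from the text itself with the JDK's java.text.Bidi class. The class name below is hypothetical. Note that this only determines direction: it does not shape Arabic letters or form Hindi ligatures.

```java
import java.text.Bidi;

class TextDirection {
    // True when the paragraph's base direction is right-to-left.
    // DIRECTION_DEFAULT_LEFT_TO_RIGHT: the first strong character decides; LTR if there is none.
    static boolean isRightToLeft(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    public static void main(String[] args) {
        String arabic = "\u0627\u0644\u0639\u0631\u0628\u064A\u0629"; // "العربية" (Arabic)
        String hindi = "\u0939\u093F\u0928\u094D\u0926\u0940";        // "हिन्दी" (Hindi)
        System.out.println(isRightToLeft(arabic));    // true
        System.out.println(isRightToLeft(hindi));     // false
        System.out.println(isRightToLeft("English")); // false
    }
}
```

The boolean could then select the value passed to setBaseDirection(...) on the iText 7 Paragraph shown above.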
I need to use special characters such as "ç" and "á", but nothing works.
Part of the code:
Paragraph title = new Paragraph();
com.lowagie.text.Font fontTitle= FontFactory.getFont("Arial", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, 15, Font.BOLD);
title.add(new Paragraph("ç", fontTitle));
document.add(title);
With IDENTITY_H and EMBEDDED, the PDF doesn't work, but if I just use ("Arial", 15, Font.BOLD), the PDF works (though the special characters still don't).
How can I fix it?
Thanks.
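Both "ç" and "á" exist in the default Cp1252 encoding, so one cause worth ruling out before blaming the font (an assumption; the question doesn't say how the source is compiled) is a source-file encoding mismatch. If a UTF-8 source file is compiled as if it were Cp1252, the literal "ç" turns into two different characters. The sketch reproduces that with JDK charsets only; the class name is hypothetical. Writing the literals as escapes such as \u00E7 avoids the problem whatever encoding the compiler assumes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SourceEncodingCheck {
    // Simulates compiling a UTF-8 source file as if it were windows-1252:
    // the literal's UTF-8 bytes are decoded with the wrong charset.
    static String misreadAsCp1252(String literal) {
        byte[] utf8Bytes = literal.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String cedilla = "\u00E7"; // c with cedilla, written as an escape
        String acute = "\u00E1";   // a with acute accent
        // Each one-character literal comes back as two characters, starting with A-tilde
        System.out.println(misreadAsCp1252(cedilla).equals("\u00C3\u00A7")); // true
        System.out.println(misreadAsCp1252(acute).equals("\u00C3\u00A1"));   // true
    }
}
```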
I am reading data from a .csv file and displaying it. When I encounter the micro character (µ), some other symbols are displayed instead. How can I display the micro character?
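import net.rim.device.api.ui.component.LabelField;
import net.rim.device.api.ui.container.MainScreen;

// Test screen: shows the label's font name followed by a µ
class Scr extends MainScreen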
{
public Scr() {
LabelField label = new LabelField();
String fontName = label.getFont().getFontFamily().getName();
String text = "Font name: "+fontName+" Symbol: µ";
label.setText(text);
add(label);
}
}
Screenshot: http://img207.imageshack.us/img207/9606/screenwx.jpg
I can't reproduce this on RIM OS 4.5 (Curve 8300) with the font "BBAlpha Sans".
Check which font you are using; it may not contain the "µ" glyph.
Try debugging to see whether the symbol is read correctly from the file.
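When a character like µ comes out wrong after being read from a file, a common cause is decoding with the wrong charset. µ (U+00B5) is the single byte 0xB5 in ISO-8859-1 but the two bytes 0xC2 0xB5 in UTF-8, so the file has to be read with the encoding it was saved in. BlackBerry apps use a smaller Java ME API, so this is a Java SE sketch of the idea rather than device code. The class name and the assumption that the .csv was saved as ISO-8859-1 are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CsvCharsetCheck {
    // Reads the first line of a stream, decoding it with the given charset
    static String firstLine(InputStream in, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "5 µg" saved as ISO-8859-1: µ is the single byte 0xB5
        byte[] latin1File = {'5', ' ', (byte) 0xB5, 'g'};

        String right = firstLine(new ByteArrayInputStream(latin1File), StandardCharsets.ISO_8859_1);
        String wrong = firstLine(new ByteArrayInputStream(latin1File), StandardCharsets.UTF_8);

        System.out.println(right.charAt(2) == '\u00B5'); // true: decoded as MICRO SIGN
        System.out.println(wrong.charAt(2) == '\uFFFD'); // true: invalid UTF-8 became the replacement character
    }
}
```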