I am trying to create a PDF on z/OS using JZOS and iText.
I have tried many combinations of fonts and of the DefaultPlatformEncoding, but I cannot get the Arabic characters to display in the PDF: they display as Latin characters. When I turn PDF compression off and inspect the hex, I see the EBCDIC hex codes.
The input file on z/OS is IBM-420, and the output PDF should use Cp1256 (Windows-1256) for display on Windows.
Here is the snippet of the code:
// Open the input dataset
ZFile zFilein = new ZFile("//DD:INDS", "rb,type=record,noseek");
// Open the output PDF file
PdfWriter writer = PdfWriter.getInstance(document,
FileFactory.newBufferedOutputStream("//DD:OUTPDF"));
document.open();
// Font cf = new Font(Font.FontFamily.COURIER, Font.DEFAULTSIZE, Font.NORMAL);
// Font cf = FontFactory.getFont("Courier","Cp1256", true);
Font cf = FontFactory.getFont("Arial", BaseFont.IDENTITY_H, true, Font.DEFAULTSIZE, Font.NORMAL);
Paragraph paragraph = new Paragraph();
paragraph.setFont(cf);
// recBuf and nRead come from the zFilein.read(...) record loop (not shown)
String encoding = ZUtil.getDefaultPlatformEncoding();
// String encoding = "Cp1256";
String line = new String(recBuf, 1, nRead - 1, encoding);
paragraph.add(line);
I tried the following options, but I am still unable to get the PDF to display correctly, and the PDF font information does not show the font as EMBEDDED. Is there anything else I missed?
Note: arial.ttf was uploaded from Windows.
Option 1
FontFactory.register("arial.ttf");
Font cf = FontFactory.getFont("Arial", 8);
paragraph = new Paragraph(line, cf);
The FONT information in the PDF displays the following:
ArialMT
Type: TrueType
Encoding: Ansi
Actual Font: ArialMT
Actual Font Type: TrueType
Option 2
BaseFont bf = BaseFont.createFont(font1, BaseFont.IDENTITY_H, true); // font1 = "arial.ttf"
Font cf = new Font(bf, 10);
paragraph = new Paragraph(line, cf);
Viewing the PDF displays the following error:
Cannot extract the embedded font 'ZQRNLC+ArialMT'. Some characters may not display or
print correctly.
Viewing the source of the PDF in an editor, I can see the following:
R/FontName/ZQRNLC+ArialMT/
The FONT in the PDF displays the following information:
ArialMT
Type: TrueType(CID)
Encoding: Identity-H
Actual Font: Unknown
Your question is a duplicate of several other questions.
This line of code may be problematic:
Font cf = FontFactory.getFont("Arial", BaseFont.IDENTITY_H, true, Font.DEFAULTSIZE, Font.NORMAL);
This won't give you the font Arial unless you have registered an Arial font program as explained in my answer to the question
"Why doesn't FontFactory.GetFont("Known Font Name", floatSize) work?"
I don't know z/OS (never heard of it), but if it uses EBCDIC, it's doing something wrong: your Java strings need to be Unicode. It is very important that the String values you are using are in the right encoding, as explained in my answer to the question "getBytes() doesn't work for Cyrillic letters". In your case, you are reading the content from a file, but there is no guarantee that the encoding you use to read the file matches the encoding that was used to store the file. When you say that the glyphs are shown in EBCDIC, I think you're mistaken and that you are experiencing the same problem as explained in "getBytes() doesn't work for Cyrillic letters". You really need Unicode.
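As an illustration, here is a minimal sketch of that decoding step (not tested on z/OS; zFilein and paragraph are from your snippet, and the exact charset name, "IBM-420", "IBM420" or "Cp420", depends on the JVM):
// Decode the record bytes with the charset the dataset was WRITTEN in
// (IBM-420, according to the question), not with the platform default.
// Java Strings are Unicode (UTF-16) internally, which is what iText needs.
byte[] recBuf = new byte[zFilein.getLrecl()];
int nRead;
while ((nRead = zFilein.read(recBuf)) >= 0) {
    // Adjust the offset/length as in your snippet if the records
    // carry a leading control byte.
    String line = new String(recBuf, 0, nRead, "IBM-420");
    paragraph.add(line);
}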
You say that you want to create text in Arabic, but I don't see you using the RTL setting anywhere. This is explained in my answer to this question: "how to create persian content in pdf using eclipse".
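For completeness, a minimal sketch of that RTL setting with iText's ColumnText (writer, line and the font path are taken from your snippet; the column coordinates are placeholders):
// Arabic needs both a font that contains Arabic glyphs and
// RUN_DIRECTION_RTL; Paragraphs added directly to the Document
// are not shaped or reordered.
BaseFont bf = BaseFont.createFont("arial.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
ColumnText ct = new ColumnText(writer.getDirectContent());
ct.setSimpleColumn(36, 36, 559, 806); // the text area, in points
ct.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
ct.addElement(new Paragraph(line, new Font(bf, 10)));
ct.go();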
Please download the free ebook "The Best iText Questions on StackOverflow" and you'll notice that all these problems have been asked and answered before. I would like to close your question as a duplicate of how to create persian content in pdf using eclipse, but unfortunately, that answer wasn't accepted and it didn't receive an upvote, so it can't be used as an original question to mark another question as a duplicate.
Related
I am using iTextSharp 5.5.13.2 in PowerShell. I'm trying to fill out a PDF form with a pre-encoded string of characters that will be read as a Code128 barcode once displayed with a custom font. The template was made in InDesign and edited in Acrobat Pro to set properties such as text alignment, text size and font.
I am using the code below to fill out appropriate fields of the template:
$pdfReader = New-Object iTextSharp.text.pdf.PdfReader("path\to\template.pdf")
$currentRecord = $kat7_List[$l]
$currentId = $currentRecord.Split("`t")[0]
$currentCode = $currentRecord.Split("`t")[1]
$pdfStamper = New-Object iTextSharp.text.pdf.PdfStamper($pdfReader, [System.IO.File]::Create("$($tempPath)\$($l).pdf"))
$pdfStamper.AcroFields.SetField("id_Field", $currentId) | Out-Null
$pdfStamper.AcroFields.SetField("kod_Field", $currentCode) | Out-Null
$pdfStamper.FormFlattening = $false
$pdfStamper.Close()
$pdfReader.Close()
The problem is that when I open the output file, the "kod_Field" AcroField is missing glyphs (Ò and Ó, U+00D2 and U+00D3), which correspond to the start and stop characters in this particular Code128 representation, as pictured here:
[screenshot: barcode before clicking]
When I click to edit it in Acrobat, though, it suddenly regains these glyphs and the code works (they are present in the font; I am using the same system successfully in InDesign, with exactly the same encoded string values):
[screenshot: barcode after clicking]
The coded strings are in a UTF-16LE-encoded text file and should stay that way. I have tried setting a BaseFont for the stamper with BaseFont.IDENTITY_H, .CP1250 and .CP1252; in each case the code came out even more "mangled" than now.
FormFlattening in the final version should be switched to true, because I don't want open form fields in the final product. It flattens with missing glyphs, of course, but for testing purposes I leave the flattening set to $false.
Which properties should I verify, and what method governs encoding in this example? I thought that the .SetField() method should do the trick without much problem, since it's provided with a properly encoded value. What might be the cause of this behavior?
Please help me out, regards.
--- Addendum, 17.08.2021 ---
I have temporarily changed the font to Myriad Pro in the template's kod_Field. It filled out properly, with no glyphs missing and the proper string value. When I changed it back to the code font via Acrobat in the finished PDF, it didn't have any missing glyphs. I really want to avoid an additional operation (setting the "textfont" property on the AcroField after setting the value), because this should already work "out of the box".
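For reference, the extra operation I am trying to avoid would look roughly like this (sketched in iText's Java API; the iTextSharp calls differ only in casing, and the font path is a placeholder):
// Force the field to use the barcode font explicitly before filling it.
BaseFont codeFont = BaseFont.createFont("path/to/code128.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
AcroFields form = stamper.getAcroFields();
form.setFieldProperty("kod_Field", "textfont", codeFont, null);
form.setField("kod_Field", currentCode);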
I am writing an application that generates a PDF file using iText 2.1.3. When printing Unicode text that contains characters not present in the font used, those characters simply vanish in the output.
I have read that the way to solve this is to use a FontSelector, something like this:
public void createPdf(final String filename, final String text)
throws IOException, DocumentException {
// Get at document for printing.
final Document document = new Document(...);
PdfWriter.getInstance(document, new FileOutputStream(filename));
document.open();
// Create a FontSelector with multiple fonts.
final FontSelector selector = new FontSelector();
final Font f1 = ...; // Covers e.g. Western glyphs
selector.addFont(f1);
final Font f2 = ...; // Covers e.g. Chinese glyphs
selector.addFont(f2);
// The Phrase consists of Chunks, each using an appropriate
// font for rendering part of the text.
final Phrase ph = selector.process(text);
document.add(new Paragraph(ph));
document.close();
}
(Example rewritten from this example.)
However, there might still be some characters in the text that are not covered by any of the available fonts.
Is there a way to tell iText 2.1.3 to print e.g. the Unicode Replacement Character (U+FFFD aka �) for such characters?
If you use the Unicode Consortium's Last Resort font as the last font searched by the FontSelector, there will be a glyph for every possible Unicode character.
The Last Resort font is a special font that has a different glyph for each character category, giving a good hint about which kind of characters your fonts do not support.
So instead of getting � glyphs, the glyphs shown here will be displayed in the PDF.
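A minimal sketch of that setup, assuming LastResort.ttf has been downloaded from the Unicode Consortium and is on the path (text and document are as in the example above):
// The Last Resort font goes LAST, so it only catches characters
// that no earlier font in the selector could render.
FontSelector selector = new FontSelector();
selector.addFont(FontFactory.getFont("arial.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED));
selector.addFont(FontFactory.getFont("LastResort.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED));
Phrase ph = selector.process(text);
document.add(new Paragraph(ph));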
I am on the mainframe platform and uploaded arial.ttf from Windows. I used the following code for the font, but the font does not show as SUBSETTED or EMBEDDED in Adobe Reader. I even tried font.getBaseFont().setSubset(true) to force it to embed.
Any reason why it would not embed or subset?
String font1 = "arial.ttf";
FontFactory.register(font1,"myfont");
BaseFont bf = BaseFont.createFont(font1, BaseFont.IDENTITY_H, true);
Font font = FontFactory.getFont("arial");
font.getBaseFont().setSubset(true);
The Adobe document properties show the following font information:
Type: TrueType
Encoding: Ansi
Actual Font: ArialMT
Actual Font Type: TrueType
You create a BaseFont object bf, but you aren't doing anything with it. One would expect that you do this:
BaseFont bf = BaseFont.createFont(pathToFont, BaseFont.IDENTITY_H, true);
Font font = new Font(bf, 12);
In this case, font would make sure that a subset of the font is embedded because the encoding is Identity-H and iText always embeds a subset of a font with that encoding.
As you aren't doing anything with bf, it is as if the line isn't present. In that case, we are left with:
String font1 = "arial.ttf";
FontFactory.register(font1,"myfont");
Font font = FontFactory.getFont("arial");
Assuming that the path to arial.ttf is correct, and that the alias of that font is "arial", you are now creating a font with the default encoding (Ansi), the default font size (12) and the default embedding (false).
That is in line with what is shown in Adobe Reader. If you want a subset of the font to be embedded, you need at least:
Font font = FontFactory.getFont("arial", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
In answer to your question: the reason why the font is not embedded by iText is the fact that you are not telling iText to embed the font.
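Putting it together, a minimal working version of your fragment could look like this (a sketch; it assumes the path to arial.ttf is correct and uses the alias you registered):
// Register the font file once, then request it with Identity-H;
// iText then embeds a subset of the font automatically.
FontFactory.register("arial.ttf", "myfont");
Font font = FontFactory.getFont("myfont", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, 12);
document.add(new Paragraph("Some text in Arial", font));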
We are using iTextSharp to create PDFs using a custom font, and I am running into an issue with the Unicode character U+2120 (℠, SERVICE MARK): the glyph is not in the custom font. Is there a way I can specify a fallback font for a field in a PDF? We tried adding a text field with Verdana so that the form had a secondary font embedded in it, but that didn't seem to help.
First you need to find a font (ttf, otf,...) that has the character you need. Then you can use the FontSelector class and add the different fonts you want to use to this selector (see the FontSelectionExample). Now you can process every string:
FontSelector selector = new FontSelector();
selector.addFont(f1);
selector.addFont(f2);
Phrase ph = selector.process(some_string);
The FontSelector will return a Phrase that consists of Chunks with font f1 for all the glyphs that are available in the first font, and Chunks with font f2 for the glyphs that couldn't be found in f1 but are present in f2.
If you want the C# port of this example, please consult chapter 11.
Update
Using a FontSelector also works in the context of forms (as long as we're talking about AcroForm technology). It's really easy: you just need to add substitution fonts to the form:
AcroFields form = stamper.getAcroFields();
form.addSubstitutionFont(bf1);
form.addSubstitutionFont(bf2);
Now the font defined in the form has preference, but if that font can't show a specific glyph, it will look at bf1, then at bf2 and so on. You can find an example demonstrating this functionality here.
Note that there's a difference between the first and the second example: in the first example we use Font objects, in the second we use BaseFont objects.
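In code, that difference looks like this (a sketch; the font path is a placeholder):
// FontSelector takes Font objects...
Font f1 = FontFactory.getFont("arial.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
// ...while addSubstitutionFont takes BaseFont objects.
BaseFont bf1 = BaseFont.createFont("arial.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);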
I am using the free PDF library libharu to generate a PDF file, but I have an encoding problem: I cannot draw Thai-language text on the PDF file; all the text shows as "???..".
Does somebody know how to fix it?
Thanks
I have succeeded in rendering ideographic text (not Thai, but Chinese and Japanese) using libharu. First of all, I used Unicode mode; please refer to the HPDF_UseUTFEncodings() function documentation.
For C language, here is a sequence of libharu API calls needed to overcome your trouble:
HPDF_UseUTFEncodings(docHandle);
HPDF_SetCurrentEncoder(docHandle, "UTF-8");
Here docHandle is a valid HPDF_Doc object.
The next step is to load the font file and select it with the UTF-8 encoder:
const char *libFontName = HPDF_LoadTTFontFromFile(docHandle, fontFileName, HPDF_TRUE); /* HPDF_TRUE = embed the font */
HPDF_Font font = HPDF_GetFont(docHandle, libFontName, "UTF-8");
After these calls you may render Unicode text containing Thai characters. Also note the embedding flag (the 3rd parameter of HPDF_LoadTTFontFromFile): your PDF file may be unreadable due to external font references. If output PDF size is not a concern, just embed the fonts.
I've tested a couple of Thai .ttf fonts found via Google and they rendered OK. Also (it may be important, but I'm not sure), I'm using the fork of libharu at https://github.com/kdeforche/libharu, which has since been merged into the master branch.
When you write text to the PDF, use the correct font and encoding. In the libharu documentation you have all the possibilities: https://github.com/libharu/libharu/wiki/Fonts
In your case, you must use the ISO8859-11 (Thai, TIS 620-2569) character set.
An example (in Spanish, to show accented characters):
HPDF_Font fontEn = HPDF_GetFont(pdf, "Helvetica-Bold", "ISO8859-2");
HPDF_Page_TextOut(page1, 50.00, 750.00, [@"Código para correcta codificación en libharu" cStringUsingEncoding:NSISOLatin1StringEncoding]); /* "Code for correct encoding in libharu" */