Java iText - special characters not being displayed - itext

I'm trying for quite some time now to generate a PDF in Java using itextpdf (com.itextpdf kernel,layout,form,pdfa) with text containing special characters (äöüß). I tried several things in different variations, like loading a TTF file and setting the encoding:
FontProgram fontProgram = FontProgramFactory.createFont( "font/FreeSans.ttf") ;
PdfFont font = PdfFontFactory.createFont( fontProgram, "UTF-8" ) ;
document.setFont( font );
This way it just doesn't display special characters at all.
This doesn't work either:
var font = PdfFontFactory.createFont(StandardFonts.HELVETICA, PdfEncodings.UTF8);
document.setFont( font );
I haven't found any solution to this and the official tutorials don't seem to have a solution.
Other encodings just render placeholder characters.
this is how I add the text:
PdfWriter writer = new PdfWriter(filename);
PdfDocument pdf = new PdfDocument(writer);
Document document = new Document(pdf);
Paragraph p = new Paragraph("äüöß");
document.add(p);
document.close();
edit: I just realized that it works when I load the text from elsewhere like an input field, instead of passing a normal string. How can I make this work with hardcoded strings?
I tried re-encoding the string as described here: https://www.baeldung.com/java-string-encode-utf-8
but none of these methods work either. It always shows wrong characters.
PdfFont freeUnicode = PdfFontFactory.createFont("font/FreeSans.ttf", PdfEncodings.IDENTITY_H);
String rawString = "äöüß1234'";
byte[] bytes = StringUtils.getBytesUtf8(rawString);
String utf8EncodedString = StringUtils.newStringUtf8(bytes);
document.add(new Paragraph().setFont(freeUnicode)
.add(utf8EncodedString));
edit: The encoding in the source code editor is UTF-8 and I passed UTF-8 to the createFont() method, but that didn't work. When I pass CP1252 and change the source code encoding to ISO-8859-1, it shows the correct characters. Really strange how I couldn't find much information about this problem.

Related

iTextSharp does not show Unicode symbols on pdf document

I have a little problem in ASP.NET Webforms I am using iTextSharp to create a pdf document. Everything works great, except the Unicode characters that are not displayed or they appear as question marks. Here is my code, if anyone knows what am I doing wrong I would be very grateful!
BaseFont base = BaseFont.CreateFont("C:\\Windows\\Fonts\\Wingdings.ttf", BaseFont.CP1252, BaseFont.EMBEDDED);
Font font = new Font(base, 12, Font.NORMAL, BaseColor.BLACK);
using (MemoryStream ms = new MemoryStream())
{
using (var document = new Document())
{
HTMLWorker parser = new HTMLWorker(document);
document.NewPage();
string str = " \u2022 \u2706 ";
Phrase phrase = new Phrase(str, font);
document.Add(phrase);
//and after that I have some other strings which are parsed with parser and that works perfectly
}}
I have tried to parse the phrase with XMLWorkerHelper, but it did not work at all. Also, I have tried to convert the string like this and the symbols appeared as question marks. I have tried to change the font like Wingding2, Wingding3 but it did not work.
byte[] bytestr = Encoding.Default.GetBytes(str);
string convertstr = Encoding.UTF8.GetString(bytestr);
Phrase phrase = new Phrase(convertstr, font);
document.Add(phrase);
Also I have tried to enter them like this, but also I've got the same result with the question marks
string str = "•✆";
Phrase phrase = new Phrase(str, font);
document.Add(phrase);
EDIT: I have tried to change the font to Segoe UI Symbol and escape characters like \u260F, but still does not work.

How to construct a unicode glyph key in C#

I am using FontAwesome to display glyphs in my Xamarin Android application. If I hardcode the glyph like this, where everything works fine:
string iconKey = "\uf0a3";
var drawable = new IconDrawable(this.Context, iconKey, "Font Awesome 5 Pro-Regular-400.otf").Color(Xamarin.Forms.Color.White.ToAndroid()).SizeDp(fontSize);
However, if what I have is the four letter code "f0a3" from FontAwesome's cheatsheet, stored in a string variable, I don't know how to set my iconKey variable to a value that works. Just concatenating a "\u" onto the beginning doesn't work, which makes sense since that's a Unicode escape indicator, not part of a standard string, but I don't know what to do instead. I also tried converting to and from Unicode in various random ways - e.g.
iconKey = unicode.GetChars(unicode.GetBytes("/u" + myFourChar.ToString())).ToString();
but unsurprisingly that didn't work either.
The IconDrawable is from here. The value I send becomes an input there to the Paint.GetTextBounds method and the Canvas.DrawText method.
Thanks for any assistance!
Found the answer here. Here is the code I am using, based on that post but simplified since I have only one hexadecimal character to handle:
string myString = "f0a3";
var chars = new char[] { (char)Convert.ToInt32(myString, 16) };
string iconKey = new string(chars);
var drawable = new IconDrawable(this.Context, iconKey, "Font Awesome 5 Pro-Regular-400.otf").Color(Xamarin.Forms.Color.White.ToAndroid()).SizeDp(fontSize);

Arabic problems with converting html to PDF using ITextRenderer

When I use ITextRenderer converting html to PDF.this is my code
ByteArrayOutputStream out = new ByteArrayOutputStream();
ITextRenderer renderer = new ITextRenderer();
String inputFile = "C://Users//Administrator//Desktop//aaa2.html";
String url = new File(inputFile).toURI().toURL().toString();
renderer.setDocument(url);
renderer.getSharedContext().setReplacedElementFactory(
new B64ImgReplacedElementFactory());
// 解决阿拉伯语问题
ITextFontResolver fontResolver = renderer.getFontResolver();
try {
fontResolver.addFont("C://Users//Administrator//Desktop//arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
} catch (DocumentException e) {
e.printStackTrace();
}
renderer.layout();
OutputStream outputStream = new FileOutputStream("C://Users//Administrator//Desktop//HTMLasPDF.pdf");
renderer.createPDF(outputStream, true);
/*PdfWriter writer = renderer.getWriter();
writer.open();
writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
OutputStream outputStream2 = new FileOutputStream( "C://Users//Administrator//Desktop//HTMLasPDFcopy.txt");
renderer.createPDF(outputStream2);*/
renderer.finishPDF();
out.flush();
out.close();
Actual PDF Result:
Expected PDF Result:
How to make arabic ligature?
If you want to do this properly (I assume using iText, since your post is tagged as such), you should use
iText7
pdfHTML (to convert HTML to PDF)
pdfCalligraph (to handle Arabic ligatures properly)
a font that supports these features (as indicated by another answer)
For an example, please consult the HTML to PDF tutorial, more specifically the following FAQ item: How to convert HTML containing Arabic/Hebrew characters to PDF?
You need fonts that contain the glyphs you need, e.g.:
public static final String[] FONTS = {
"src/main/resources/fonts/noto/NotoSans-Regular.ttf",
"src/main/resources/fonts/noto/NotoNaskhArabic-Regular.ttf",
"src/main/resources/fonts/noto/NotoSansHebrew-Regular.ttf"
};
And you need a FontProvider that knows how to find these fonts in the ConverterProperties:
public void createPdf(String src, String[] fonts, String dest) throws IOException {
ConverterProperties properties = new ConverterProperties();
FontProvider fontProvider = new DefaultFontProvider(false, false, false);
for (String font : fonts) {
FontProgram fontProgram = FontProgramFactory.createFont(font);
fontProvider.addFont(fontProgram);
}
properties.setFontProvider(fontProvider);
HtmlConverter.convertToPdf(new File(src), new File(dest), properties);
}
Note that the text will come out all wrong if you don't have the pdfCalligraph add-on. That add-on didn't exist at the time Flying Saucer was created, hence you can't use Flying Saucer for converting documents with text in Arabic, Hindi, Telugu,... Read the pdFCalligraph white paper if you want to know more about ligatures.
Greek characters seemed to be omitted; they didn’t show up in the document.
In flying saucer the generated PDF uses some kind of default
(probably Helvetica) font, that contains a very limited character set,
that obviously does not contain the Greek code page. link
I change the way to convert pdf by using wkhtmltopdf.

How do I process Russian text in Eclipse?

I need to sort some Russian text file and when I try to read the strings and print them out, they all appear garbled and like boxes. Looks like there is no Russian support for my eclipse. I downloaded Language packs plug in but I can't figure out how to install it.
Help required please.
FileInputStream fstream = new FileInputStream("c:\\textfile.txt");
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
ArrayList<String> allLines = new ArrayList<String>();
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
allLines.add(strLine);
System.out.println(strLine);
}
How can you be sure it's an eclipse problem? It could be:
The encoding of the text file
The method you used to read the text file (eg: InputStreamReader uses default charset unless you explicitly specify it on the constructor)
The method you used to store the text file in memory
The method you used to print the text
The method you used to view the printed text

What is the character encoding?

I have several characters that aren't recognized properly.
Characters like:
º
á
ó
(etc..)
This means that the characters encoding is not utf-8 right?
So, can you tell me what character encoding could it be please.
We don't have nearly enough information to really answer this, but the gist of it is: you shouldn't just guess. You need to work out where the data is coming from, and find out what the encoding is. You haven't told us anything about the data source, so we're completely in the dark. You might want to try Encoding.Default if these are files saved with something like Notepad.
If you know what the characters are meant to be and how they're represented in binary, that should suggest an encoding... but again, we'd need to know more information.
read this first http://www.joelonsoftware.com/articles/Unicode.html
There are two encodings: the one that was used to encode string and one that is used to decode string. They must be the same to get expected result. If they are different then some characters will be displayed incorrectly. we can try to guess if you post actual and expected results.
I wrote a couple of methods to narrow down the possibilities a while back for situations just like this.
static void Main(string[] args)
{
Encoding[] matches = FindEncodingTable('Ÿ');
Encoding[] enc2 = FindEncodingTable(159, 'Ÿ');
}
// Locates all Encodings with the specified Character and position
// "CharacterPosition": Decimal position of the character on the unknown encoding table. E.G. 159 on the extended ASCII table
//"character": The character to locate in the encoding table. E.G. 'Ÿ' on the extended ASCII table
static Encoding[] FindEncodingTable(int CharacterPosition, char character)
{
List matches = new List();
byte myByte = (byte)CharacterPosition;
byte[] bytes = { myByte };
foreach (EncodingInfo encInfo in Encoding.GetEncodings())
{
Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
char[] chars = thisEnc.GetChars(bytes);
if (chars[0] == character)
{
matches.Add(thisEnc);
break;
}
}
return matches.ToArray();
}
// Locates all Encodings that contain the specified character
static Encoding[] FindEncodingTable(char character)
{
List matches = new List();
foreach (EncodingInfo encInfo in Encoding.GetEncodings())
{
Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
char[] chars = { character };
byte[] temp = thisEnc.GetBytes(chars);
if (temp != null)
matches.Add(thisEnc);
}
return matches.ToArray();
}
Encoding is the form of modifying some existing content; thus allowing it to be parsed by the required destination protocols.
An example of encoding can be seen when browsing the internet:
The URL you visit: www.example.com, may have the search facility to run custom searches via the URL address:
www.example.com?search=...
The following variables on the URL require URL encoding. If you was to write:
www.example.com?search=cat food cheap
The browser wouldn't understand your request as you have used an invalid character of ' ' (a white space)
To correct this encoding error you should exchange the ' ' with '%20' to form this URL:
www.example.com?search=cat%20food%20cheap
Different systems use different forms of encoding, in this example I have used standard Hex encoding for a URL. In other applications and instances you may find the need to use other types of encoding.
Good Luck!