iTextSharp does not show Unicode symbols on pdf document - unicode

I have a little problem in ASP.NET Webforms I am using iTextSharp to create a pdf document. Everything works great, except the Unicode characters that are not displayed or they appear as question marks. Here is my code, if anyone knows what am I doing wrong I would be very grateful!
BaseFont base = BaseFont.CreateFont("C:\\Windows\\Fonts\\Wingdings.ttf", BaseFont.CP1252, BaseFont.EMBEDDED);
Font font = new Font(base, 12, Font.NORMAL, BaseColor.BLACK);
using (MemoryStream ms = new MemoryStream())
{
using (var document = new Document())
{
HTMLWorker parser = new HTMLWorker(document);
document.NewPage();
string str = " \u2022 \u2706 ";
Phrase phrase = new Phrase(str, font);
document.Add(phrase);
//and after that I have some other strings which are parsed with parser and that works perfectly
}}
I have tried to parse the phrase with XMLWorkerHelper, but it did not work at all. Also, I have tried to convert the string like this and the symbols appeared as question marks. I have tried to change the font like Wingding2, Wingding3 but it did not work.
byte[] bytestr = Encoding.Default.GetBytes(str);
string convertstr = Encoding.UTF8.GetString(bytestr);
Phrase phrase = new Phrase(convertstr, font);
document.Add(phrase);
Also I have tried to enter them like this, but also I've got the same result with the question marks
string str = "•✆";
Phrase phrase = new Phrase(str, font);
document.Add(phrase);
EDIT: I have tried to change the font to Segoe UI Symbol and escape characters like \u260F, but still does not work.

Related

Java iText - special characters not being displayed

I'm trying for quite some time now to generate a PDF in Java using itextpdf (com.itextpdf kernel,layout,form,pdfa) with text containing special characters (äöüß). I tried several things in different variations, like loading a TTF file and setting the encoding:
FontProgram fontProgram = FontProgramFactory.createFont( "font/FreeSans.ttf") ;
PdfFont font = PdfFontFactory.createFont( fontProgram, "UTF-8" ) ;
document.setFont( font );
This way it just doesn't display special characters at all.
This doesn't work either:
var font = PdfFontFactory.createFont(StandardFonts.HELVETICA, PdfEncodings.UTF8);
document.setFont( font );
I haven't found any solution to this and the official tutorials don't seem to have a solution.
Other encodings just render placeholder characters.
this is how I add the text:
PdfWriter writer = new PdfWriter(filename);
PdfDocument pdf = new PdfDocument(writer);
Document document = new Document(pdf);
Paragraph p = new Paragraph("äüöß");
document.add(p);
document.close();
edit: I just realized that it works when I load the text from elsewhere like an input field, instead of passing a normal string. How can I make this work with hardcoded strings?
I tried re-encoding the string as described here: https://www.baeldung.com/java-string-encode-utf-8
but none of these methods work either. It always shows wrong characters.
PdfFont freeUnicode = PdfFontFactory.createFont("font/FreeSans.ttf", PdfEncodings.IDENTITY_H);
String rawString = "äöüß1234'";
byte[] bytes = StringUtils.getBytesUtf8(rawString);
String utf8EncodedString = StringUtils.newStringUtf8(bytes);
document.add(new Paragraph().setFont(freeUnicode)
.add(utf8EncodedString));
edit: The encoding in the source code editor is UTF-8 and I passed UTF-8 to the createFont() method, but that didn't work. When I pass CP1252 and change the source code encoding to ISO-8859-1, it shows the correct characters. Really strange how I couldn't find much information about this problem.

Arabic problems with converting html to PDF using ITextRenderer

When I use ITextRenderer converting html to PDF.this is my code
ByteArrayOutputStream out = new ByteArrayOutputStream();
ITextRenderer renderer = new ITextRenderer();
String inputFile = "C://Users//Administrator//Desktop//aaa2.html";
String url = new File(inputFile).toURI().toURL().toString();
renderer.setDocument(url);
renderer.getSharedContext().setReplacedElementFactory(
new B64ImgReplacedElementFactory());
// 解决阿拉伯语问题
ITextFontResolver fontResolver = renderer.getFontResolver();
try {
fontResolver.addFont("C://Users//Administrator//Desktop//arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
} catch (DocumentException e) {
e.printStackTrace();
}
renderer.layout();
OutputStream outputStream = new FileOutputStream("C://Users//Administrator//Desktop//HTMLasPDF.pdf");
renderer.createPDF(outputStream, true);
/*PdfWriter writer = renderer.getWriter();
writer.open();
writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
OutputStream outputStream2 = new FileOutputStream( "C://Users//Administrator//Desktop//HTMLasPDFcopy.txt");
renderer.createPDF(outputStream2);*/
renderer.finishPDF();
out.flush();
out.close();
Actual PDF Result:
Expected PDF Result:
How to make arabic ligature?
If you want to do this properly (I assume using iText, since your post is tagged as such), you should use
iText7
pdfHTML (to convert HTML to PDF)
pdfCalligraph (to handle Arabic ligatures properly)
a font that supports these features (as indicated by another answer)
For an example, please consult the HTML to PDF tutorial, more specifically the following FAQ item: How to convert HTML containing Arabic/Hebrew characters to PDF?
You need fonts that contain the glyphs you need, e.g.:
public static final String[] FONTS = {
"src/main/resources/fonts/noto/NotoSans-Regular.ttf",
"src/main/resources/fonts/noto/NotoNaskhArabic-Regular.ttf",
"src/main/resources/fonts/noto/NotoSansHebrew-Regular.ttf"
};
And you need a FontProvider that knows how to find these fonts in the ConverterProperties:
public void createPdf(String src, String[] fonts, String dest) throws IOException {
ConverterProperties properties = new ConverterProperties();
FontProvider fontProvider = new DefaultFontProvider(false, false, false);
for (String font : fonts) {
FontProgram fontProgram = FontProgramFactory.createFont(font);
fontProvider.addFont(fontProgram);
}
properties.setFontProvider(fontProvider);
HtmlConverter.convertToPdf(new File(src), new File(dest), properties);
}
Note that the text will come out all wrong if you don't have the pdfCalligraph add-on. That add-on didn't exist at the time Flying Saucer was created, hence you can't use Flying Saucer for converting documents with text in Arabic, Hindi, Telugu,... Read the pdFCalligraph white paper if you want to know more about ligatures.
Greek characters seemed to be omitted; they didn’t show up in the document.
In flying saucer the generated PDF uses some kind of default
(probably Helvetica) font, that contains a very limited character set,
that obviously does not contain the Greek code page. link
I change the way to convert pdf by using wkhtmltopdf.

How to fill an existing pdf form with itext 7 and itextsharp 5.5.9 without flattening it?

I am trying to fill out a USCIS form and after filling it is making as read only (flattening that). I am not sure why it is doing that. Even though I don’t have any code to flatten that.
I searched the stack overflow and tried many different things (with itextsharp 5.5.9 and itext 7) but still it doesn’t work.
Here is the sample code I am using
string src = #"https://www.uscis.gov/sites/default/files/files/form/i-90.pdf";
string dest = #"C:\temp\i-90Filled.pdf";
var reader = new PdfReader(src);
reader.SetUnethicalReading(true);
var writer = new PdfWriter(dest);
PdfDocument pdfDoc = new PdfDocument(reader, writer);
// add content
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdfDoc, true);
IDictionary<String, PdfFormField> fields = form.GetFormFields();
PdfFormField toSet;
fields.TryGetValue("form1[0].#subform[0].P1_Line3b_GivenName[0]", out toSet);
toSet.SetValue("John");
pdfDoc.Close();
Forms are filled like this with iTextSharp 5:
string src = #"https://www.uscis.gov/sites/default/files/files/form/i-90.pdf";
string dest = #"C:\temp\i-90Filled.pdf";
PdfReader pdfReader = new PdfReader(src);
PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(
dest, FileMode.Create));
AcroFields form = pdfStamper.AcroFields;
form.SetField("form1[0].#subform[0].P1_Line3b_GivenName[0]", "John");
Note that the form you are trying to fill is a hybrid form. It contains the description of the form twice: once as an AcroForm; once as an XFA form. You may want to remove the XFA form by adding this line:
form.RemoveXfa();
For more info, read the documentation:
Is it safe to remove XFA?
How to change the text color of an AcroForm field?
Your code is using iText 7 for C#. If you want that code to work, you most certainly need to remove the XFA part as iText 7 doesn't support XFA. iText 7 was developed with PDF 2.0 in mind (this spec is to be released in 2017). XFA will be deprecated in PDF 2.0. A valid PDF 2.0 file will not be allowed to contain an XFA stream.

How do I process Russian text in Eclipse?

I need to sort some Russian text file and when I try to read the strings and print them out, they all appear garbled and like boxes. Looks like there is no Russian support for my eclipse. I downloaded Language packs plug in but I can't figure out how to install it.
Help required please.
FileInputStream fstream = new FileInputStream("c:\\textfile.txt");
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
ArrayList<String> allLines = new ArrayList<String>();
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
allLines.add(strLine);
System.out.println(strLine);
}
How can you be sure it's an eclipse problem? It could be:
The encoding of the text file
The method you used to read the text file (eg: InputStreamReader uses default charset unless you explicitly specify it on the constructor)
The method you used to store the text file in memory
The method you used to print the text
The method you used to view the printed text

OpenXML editing docx

I have a dynamically generated docx file.
Need write the text strictly to end of page.
With Microsoft.Interop i insert Paragraphs before text:
int kk = objDoc.ComputeStatistics(WdStatistic.wdStatisticPages, ref wMissing);
while (objDoc.ComputeStatistics(WdStatistic.wdStatisticPages, ref wMissing) != kk + 1)
{
objWord.Selection.TypeParagraph();
}
objWord.Selection.TypeBackspace();
But i can't use same code with Open XML, because pages.count calculated only by word.
Using interop impossible, because it so slowwwww.
There are 2 options of doing this in Open XML.
create Content Place holder from Microsoft Office Developer Tab at the end of your document and now you can access this Content Place Holder programatically and can place any text in it.
you can append text driectly to your word document where it will be inserted at the end of your text. In this approach you got to write all the stuff to your document first and once you are done than you can append your document the following way
//
public void WriteTextToWordDocument()
{
using(WordprocessingDocument doc = WordprocessingDocument.Open(documentPath, true))
{
MainDocumentPart mainPart = doc.MainDocumentPart;
Body body = mainPart.Document.Body;
Paragraph paragraph = new Paragraph();
Run run = new Run();
Text myText = new Text("Append this text at the end of the word document");
run.Append(myText);
paragraph.Append(run);
body.Append(paragraph);
// dont forget to save and close your document as in the following two lines
mainPart.Document.Save();
doc.Close();
}
}
I haven't tested the above code but hope it will give you an idea of dealing with word document in OpenXML.
Regards,