ABCpdf ellipsis and apostrophes showing up garbled

In my situation, users are able to insert their own text into a field before printing the page as a pdf. Some of these users have been copying and pasting from Word, so the special apostrophes and ellipsis characters are being saved in the field. These special characters are fine when viewing the html page but render as a garbled mess when ABCpdf gets a hold of them.
I have tried adding the following to the head of the page as per ABCpdf 5 Problems with encoding (special characters)
<meta http-equiv="content-type" content="text/xhtml; charset=utf-8" />
I have also tried this: Symbol font on ABCpdf
theDoc.HtmlOptions.FontEmbed = true;
theDoc.HtmlOptions.FontSubstitute = false;
theDoc.HtmlOptions.FontProtection = false;
I'm using the method AddImageUrl() for the rendering.
How do I make these special characters show up properly using ABCpdf?

Related

Arabic text shows strange characters الÙباى انگليسى ØŒ

I have Arabic text (.sql pure text). When I view it in any document, it shows like this:
حر٠اول الÙباى انگليسى ØŒ حر٠اضاÙÙ‡ مثبت
But when I use an HTML document with <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>, it shows properly like this:
حرف اول الفباى انگليسى ، حرف اضافه مثبت
How can I convert it to readable text?
The Arabic text has been encoded to bytes using UTF-8.
You are explicitly telling the HTML document that the bytes are encoded in UTF-8, which is why any HTML viewer will be able to display the text correctly.
However, any other text viewer will not know the bytes are encoded in UTF-8 unless you put a UTF-8 BOM in front of the text and the viewer supports BOMs. Otherwise, as you are seeing, a text viewer may interpret the bytes as Latin-1 or a similar encoding. So you would have to manually tell the text viewer to interpret the bytes as UTF-8. How you actually do that depends on the particular text viewer you are using. Not all viewers offer this option.
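The misinterpretation described above can be reproduced in a few lines. This is a minimal Python sketch, not something from the original answer; the function names are made up for illustration, and it assumes the viewer used plain Latin-1. Many viewers actually use Windows-1252, where a few byte values are unassigned, so some characters may be lost and the repair step may not round-trip.

```python
def misread_as_latin1(text: str) -> str:
    """Simulate a viewer that decodes UTF-8 bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled: str) -> str:
    """Undo the misreading: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


def with_utf8_bom(text: str) -> bytes:
    """Encode as UTF-8 with a leading BOM (EF BB BF) for BOM-aware viewers."""
    return text.encode("utf-8-sig")


if __name__ == "__main__":
    original = "\u062d\u0631\u0641"  # the Arabic word for "letter"
    garbled = misread_as_latin1(original)
    # Each Arabic letter is two UTF-8 bytes, so the garbled form is twice as long.
    print(len(original), len(garbled))
    print(repair_mojibake(garbled) == original)
```

If the file itself is intact UTF-8, no conversion is needed at all; only the viewer's encoding setting has to change.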

html2pdf special characters not rendered

I am using the html2pdf library for generating a PDF with a bookmarked index. By default it seems to work well for English content, but I need to generate content that includes both English and Arabic text. The "aefurat" font seems to work relatively well, except for some special characters (’, ‘, “, ”, ...) that are rendered as boxes ([]).
The code I used is,
require_once(dirname(__FILE__).'/../html2pdf.class.php');
$html2pdf = new HTML2PDF('P', 'A4', 'en', true, 'UTF-8', 0);
$html2pdf->setDefaultFont('aefurat');
$html2pdf->writeHTML($content);
$html2pdf->Output('bookmark.pdf');
Sample content that includes Arabic and special characters:
’This is Arabic’ "العربية" Example With TCPDF... some text here some
text here some “text here”.
I am wondering whether I need to use some other font or alter some configuration. Kindly advise me.

.ENCODING international chars (hebrew,thai,russian,chinese,....)

international html files archived by wget
should contain chars like this
(example hebrew and thai:)
אב
הם
and ยคน
instead they are saved like this:
íäáåãéú and ÃÒ¡à§é
How do I get these displayed properly?
iconv filename.html
iconv: illegal input sequence at position 1254
SOLVED: There was nothing wrong.
I just didn't notice that the default php.ini sets the charset in the HTTP header, but
to use various charsets like this meta http-equiv="Content-Type" content="text/html; charset=windows-874" you need to set: default_charset = "empty";
....
The pages aren't "saved like this"; whatever you're using to view the file is simply interpreting the encoding incorrectly. To know what encoding the file is in you should have paid attention to the HTTP Content-Type header during download; that's gone now.
Your only other chance is to parse the equivalent HTML meta tag in the <head>, if the document has one.
Otherwise, you can only guess the encoding of the document.
See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text for more required background knowledge.
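As a rough illustration of the "parse the meta tag" fallback, here is a minimal Python sketch. It is not from the original answer: the helper names, the 4096-byte scan window and the UTF-8 fallback are all assumptions, and a regular expression is only a crude stand-in for a real HTML parser.

```python
import re

# Looks for charset=... inside a <meta> tag; covers both the http-equiv form
# and the shorter <meta charset="..."> form.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)


def sniff_charset(raw: bytes, fallback: str = "utf-8") -> str:
    """Return the charset declared in a meta tag, or the fallback guess."""
    match = META_CHARSET.search(raw[:4096])  # the declaration should be near the top
    return match.group(1).decode("ascii") if match else fallback


def decode_html(raw: bytes) -> str:
    """Decode an archived page using its own declared charset."""
    return raw.decode(sniff_charset(raw), errors="replace")


if __name__ == "__main__":
    html = ('<html><head><meta http-equiv="Content-Type" '
            'content="text/html; charset=windows-1255"></head>'
            '<body>\u05d0\u05d1</body></html>')
    page = html.encode("cp1255")         # bytes as they might sit on disk
    print(sniff_charset(page))
    print(decode_html(page) == html)     # Hebrew survives with the right charset
    print(page.decode("latin-1") == html)  # wrong charset gives accented Latin letters
```

If the declared name is one Python does not recognize, `decode` raises `LookupError`, so a real tool would need to catch that and fall back to guessing.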

Zend Framework Form not rendering special characters like (ä, ö, ü etc) - makes the form element value empty

I am trying to get Zend Form working for me. I am using the same form for inserting and editing a particular database object. The object has a name, and I can easily create a new object with the name "Ülo". It saves correctly in the database, and when I fetch it to display in a report it correctly shows "Ülo". The problem is with forms. When I open the edit form, the name element is empty. All other elements show correctly, but if I change them to have "ü" in them they are displayed empty too. The same thing happens with form element labels. When I set a label to contain "ü", the label is no longer shown.
For example, if I have $name->setLabel('Nameü: '); then it does not show the label, but when I change it back to $name->setLabel('Name: '); then it shows correctly.
Same thing when I have $bcrForm->name->setValue('Ülo'); it does not show the value but when I change it to $bcrForm->name->setValue('Alo'); it shows correctly.
How can I fix it to display correctly? It seems like it is some kind of form rendering issue.
This one helped solving it for me:
make sure these settings are in /etc/php5/apache2/php.ini and /etc/php5/cli/php.ini:
default_charset = utf-8
Did you check the encoding? Try adding this to the head...
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Make sure that both your script and view files are encoded in UTF-8 and that your connection to your DB is set to UTF-8 too.
If you are using MySQL, you can force it to return UTF-8 data by opening a DB connection and running: SET NAMES 'utf8'
or, using mysqli, with: mysqli_set_charset($link, 'utf8'); where $link is your open mysqli connection.
I would check:
view charset
database charset (backend)
Zend_Db_Adapter charset
file charset
The view escape method is set to expect utf8 chars and may strip anything else (ie. singlebyte strange chars) :)
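To see why a single-byte "Ü" can make an escaping step produce nothing, it helps to check whether the bytes are valid UTF-8 at all. The following is a Python sketch of that check, not Zend code; `is_valid_utf8` is a hypothetical helper, and the claim that the form's escaping rejects such input is the answerer's suggestion rather than something verified here.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8 without errors."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    name = "\u00dclo"  # "Ülo"
    # Saved from a Latin-1 encoded file, the name starts with the single byte 0xDC,
    # which is not a complete UTF-8 sequence on its own.
    print(is_valid_utf8(name.encode("latin-1")))
    # Saved as UTF-8, the same name starts with the two bytes 0xC3 0x9C.
    print(is_valid_utf8(name.encode("utf-8")))
```

A check like this on the script file, the view file and the database result quickly shows which layer is handing over non-UTF-8 bytes.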
It should be as simple as setting escape flag to false on the elements label decorator.
$name->addDecorator('Label', array('escape'=>false));
Or see setEscape(). http://framework.zend.com/manual/1.12/en/zend.form.standardDecorators.html

Using unicode characters in gwt checkbox label

How can I put a Unicode character in the label (constructor argument) of a GWT CheckBox? If I put the character in as an HTML entity, GWT escapes the & and I end up with the literal entity text (such as &euml;) in the label of the checkbox instead of ë.
Unicode characters in Java String literals follow a special syntax.
In your case, you could write it like this:
new CheckBox("H\u00ebllo")
The code for "ë" is 00eb - you can use e.g. this table. By the way, 00eb (hexadecimal) = 235 (decimal).
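If no table is at hand, the escape can also be computed. This is a small Python sketch (Python rather than Java, purely for illustration; `java_escape` is a made-up helper) that turns a character into the \uXXXX form used in the literal above. It only covers characters up to U+FFFF, since anything beyond that needs a surrogate pair in Java.

```python
def java_escape(ch: str) -> str:
    """Return the Java-style \\uXXXX escape for a single BMP character."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("characters beyond U+FFFF need a surrogate pair")
    return "\\u{:04x}".format(code_point)


if __name__ == "__main__":
    e_diaeresis = chr(235)           # 235 decimal is 0xEB hexadecimal
    print(java_escape(e_diaeresis))  # the escape to paste into the Java literal
    print(0xEB == 235)
```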
Another possibility is to save your Java files as UTF-8. Then you can write your literals without escaping these characters. This, however, also requires telling the compiler which encoding the source files use, for example with javac's -encoding UTF-8 option or by setting -Dfile.encoding=UTF-8. Many IDEs do this automatically if you set the encoding preference for the file to UTF-8.
Another important factor is that you should set the charset of your HTML page correctly (usually UTF-8):
<meta http-equiv="content-type" content="text/html; charset=UTF-8">