html2pdf special characters not rendered - unicode

I am using the html2pdf library for generating a PDF with a bookmarked index. By default it seems to work well for English content, but I need to generate content that includes both English and Arabic text. The "aefurat" font works relatively well, except that some special characters (’, ‘, “, ”, ...) are rendered as boxes ([]).
The code I used is:
require_once(dirname(__FILE__).'/../html2pdf.class.php');
// portrait, A4, language 'en', unicode enabled, UTF-8 input, zero margins
$html2pdf = new HTML2PDF('P', 'A4', 'en', true, 'UTF-8', 0);
$html2pdf->setDefaultFont('aefurat');
$html2pdf->writeHTML($content);
$html2pdf->Output('bookmark.pdf');
Sample content that includes Arabic and special characters:
’This is Arabic’ "العربية" Example With TCPDF... some text here some
text here some “text here”.
I am wondering if I need to use some other font or alter some configuration. Kindly advise.

Related

How does this font manage to display even in plain text?

I came across a piece of text that displays in a mystery font even when you view source in plain text: 𝓦𝓸𝓸𝓭
The word 'Wood' above appears, at least in Chrome, as a sort of calligraphic font when pasted into Notepad or even the Google search bar.
I have tried to work out whether it is base64-encoded characters, quoted-printable, etc.
Can anyone identify how it's done? Can it be done with a different font? Is it cross-browser compatible?
Those characters are not being shown in a different font. They're in the same font as the rest of the page source.
The reason they look strange is that they are not the ordinary letters 'W', 'o' and 'd' represented by the character values 0x57, 0x6F and 0x64. These characters come from the "Mathematical Alphanumeric Symbols" block of Unicode. Specifically, they are the "Mathematical Bold Script Capital W", "Mathematical Bold Script Small O" and "Mathematical Bold Script Small D" characters, represented by the values 0x1D4E6, 0x1D4F8 and 0x1D4ED. See https://unicode-table.com/en/blocks/mathematical-alphanumeric-symbols/ for a table of the characters in that block.
There's a good chance that any modern browser would show those characters just as you're seeing them. It comes down to whether the font that the browser uses to present the page includes glyphs for those character values.
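To make this concrete, here is a minimal sketch in TypeScript that builds the same word from the raw code points listed above (only the variable name is made up):
// Build the styled "Wood" from Mathematical Alphanumeric Symbols code points.
// They live above U+FFFF, so each occupies two UTF-16 code units in a string.
const wood: string = String.fromCodePoint(0x1d4e6, 0x1d4f8, 0x1d4f8, 0x1d4ed);
console.log(wood);             // 𝓦𝓸𝓸𝓭
console.log(wood.length);      // 8 (UTF-16 code units, not characters)
console.log([...wood].length); // 4 (actual code points)
Note that string length counts UTF-16 code units, which is why four visible characters report a length of 8.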

Replace Text in a textfield with Image

I have a form on my website.
Inside this form are some emoji buttons.
If a user clicks one of these buttons, it adds an emoji to a <textfield>,
but it looks like this: " :) ".
So how can I replace the ":)" emoji with images?
Example: if a user clicks a button, it should add <img src="img/emoji/1.png"> and display it.
Use the Unicode characters for the emojis. Since emoji are ordinary text characters, you don't need any HTML for this, and they will render in normal textareas and text inputs without anything special.
Here's a reference for the Unicode emoji. In your button's code, replace the ";)" or whatever with the character's code point. For example, the table at the link shows the code U+1F600 for a smiley face. Note that in JavaScript, code points above U+FFFF cannot be written as \u1F600; use the ES2015 syntax "\u{1F600}" or the surrogate pair "\uD83D\uDE00" in quotes. This will render the emoji character. Most browsers, especially on mobile, have good support for these characters, and you can bolster that by importing an emoji font using CSS.
http://www.unicode.org/emoji/charts/full-emoji-list.html
Addendum: you can even cut and paste Unicode emoji directly into your code. Since they are valid characters in UTF-8 source, they are valid code; you could even use them in variable names 😎
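As a minimal sketch of the button wiring in TypeScript (the element IDs 'emoji-btn' and 'message' are made up for this example):
// Append an emoji character to a textarea when a button is clicked.
const button = document.getElementById('emoji-btn') as HTMLButtonElement;
const field = document.getElementById('message') as HTMLTextAreaElement;
button.addEventListener('click', () => {
  // U+1F600 GRINNING FACE; '\uD83D\uDE00' is the equivalent surrogate pair.
  field.value += '\u{1F600}';
});
Because the emoji is plain text, it displays inside the textarea itself; no <img> replacement is needed.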

Should I use hex ascii accented character code in HTML or use the actual character?

I have several huge CSVs with lots of accented characters as HTML hex codes: &#xE9; for é and lots of others, even &#x2013; for –, etc.
My site is a wiki for people to update listings. So when they are presented with a textarea for updating, the existing content is filled in, and obviously those hex codes will be shown.
Should I bother replacing those codes with the actual accented characters, or just leave them as they are? I wrote a script to replace the codes, but somehow the output is weird characters; probably the file isn't being saved as UTF-8 by Ruby.
By default my site is in UTF-8, and the accented characters are displayed properly with some HTML coding in the view.
Please advise. Thanks.
Could you clarify what the problem is?
If your data (CSV) is in UTF-8, and the default encoding of your site is UTF-8, then all you would need to do is make sure that when users are editing content, that content is properly treated as UTF-8.
You may not need to display the markup to the users. Perhaps you could leverage a WYSIWYG editor package like TinyMCE?
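If you do decide to replace the codes, decoding numeric entities is straightforward. The question mentions Ruby, but the idea is the same in any language; here is a hedged sketch in TypeScript that handles hex (&#xE9;) and decimal (&#233;) entities only, not named ones like &eacute;:
// Decode numeric HTML entities to the actual characters.
function decodeNumericEntities(text: string): string {
  return text
    .replace(/&#x([0-9a-fA-F]+);/g, (_m: string, hex: string) =>
      String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_m: string, dec: string) =>
      String.fromCodePoint(parseInt(dec, 10)));
}
console.log(decodeNumericEntities('caf&#xE9; &#x2013; open')); // "café – open"
Whatever language you use, make sure the result is written back out as UTF-8, or you will see the same weird characters again.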

How to draw Thai text to PDF file by using libharu library

I am using the free PDF library libharu to generate a PDF file,
but I have an encoding problem: I cannot draw Thai-language text on the PDF file;
all the text shows as "???..".
Does somebody know how to fix it?
Thanks
I have succeeded in rendering CJK texts (not Thai, but Chinese and Japanese) using libharu. First of all, I used Unicode mode; please refer to the HPDF_UseUTFEncodings() function documentation.
For C, here is the sequence of libharu API calls needed to overcome your trouble:
HPDF_UseUTFEncodings(docHandle);            /* register the UTF encoders */
HPDF_SetCurrentEncoder(docHandle, "UTF-8"); /* make UTF-8 the current encoder */
Here docHandle is a valid HPDF_Doc object.
The next part is loading a TrueType font for use with UTF-8:
const char *libFontName = HPDF_LoadTTFontFromFile(docHandle, fontFileName, HPDF_TRUE); /* HPDF_TRUE embeds the font */
HPDF_Font font = HPDF_GetFont(docHandle, libFontName, "UTF-8");
After these calls you may render Unicode texts containing Thai characters. Also note the embedding flag (third parameter of HPDF_LoadTTFontFromFile): without embedding, your PDF file only holds external font references and may be unreadable on machines that lack the font. If output PDF size is not a concern, just embed the fonts (HPDF_TRUE above).
I've tested a couple of Thai .ttf fonts found via Google and they rendered OK. Also (it may be important, but I'm not sure) I'm using the fork of libharu at https://github.com/kdeforche/libharu, which has since been merged into the master branch.
When you write text to the PDF, use the correct font and encoding. The libharu documentation lists all the possibilities: https://github.com/libharu/libharu/wiki/Fonts
In your case, you must use the Thai character set: ISO8859-11, TIS 620-2569.
An example (the Spanish string, meaning "code for correct encoding in libharu", exercises accented Latin characters; the string's byte encoding must match the encoder named in HPDF_GetFont, here Latin-1):
HPDF_Font fontEn = HPDF_GetFont(pdf, "Helvetica-Bold", "ISO8859-1");
HPDF_Page_TextOut(page1, 50.00, 750.00, "Código para correcta codificación en libharu");

UTF8 charset, diacritical elements, conversion problems - and Zend Framework form escaping

I am writing a webapp in ZF and am having serious issues with UTF-8. It uses multilingual content through Zend Form, and it seems that ZF heavily escapes all of these characters: it basically just won't show a field if there are diacritical characters like 'é', and if I use the HTML entity equivalent, e.g. &eacute;, it gets escaped so that the user will see '&eacute;'.
Zend Form allows for non-escaped data, but trying to use this is confusing, and it seems it would need to be applied all over the place.
So, I have been told that if the page and the text are in UTF-8, no conversion to HTML entities is required. Is this true?
And if that is true, then how do I convert the source text to UTF-8? I am comfortable setting up Apache so that it sends a default UTF-8 charset header, and with adding the charset meta tag to the HTML, but doing this I am still getting messed-up encoding. I have also tried opening the translation CSV file in TextWrangler on OS X as UTF-8, but it has done nothing.
Thanks!
L
'é' and if I use the HTML entity equivalent, e.g. &eacute;, it gets escaped so that the user will see '&eacute;'.
This I don't understand. Can you show an example of how it is displayed, as opposed to how it should be displayed?
So, I have been told that if the page and the text is in UTF8, no conversion to htmlentities is required. Is this true?
Yup. In more detail: If the data you're displaying and the encoding of the HTML page are both UTF-8, the multi-byte special characters will be displayed correctly.
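For illustration, a minimal sketch in TypeScript/Node (the port and markup are made up for the example): as long as the declared charset and the actual bytes agree, no entity encoding is needed.
import { createServer } from 'node:http';
// Declare UTF-8 in the Content-Type header; the bytes sent must actually
// be UTF-8 (Node encodes the string that way by default) for accented
// characters to display correctly without any HTML entities.
createServer((_req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
  res.end('<p>é ü ñ render fine when header and bytes agree</p>');
}).listen(8080);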
And if the last question is true, then how do I convert the source text to UTF8?
Advanced editors and IDEs enable you to define what encoding the source file is saved in. You would need to open the file in its current encoding (with special characters being displayed correctly) and save it as UTF-8.
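If you would rather script the conversion than rely on an editor, here is a hedged sketch in TypeScript/Node, assuming the source file is Latin-1 (substitute whatever its actual current encoding is; the file names are made up):
import { readFileSync, writeFileSync } from 'node:fs';
// Decode the file from its current encoding (assumed ISO-8859-1 here)
// into a string, then write it back out as UTF-8.
const text = readFileSync('translations.csv', 'latin1');
writeFileSync('translations.utf8.csv', text, 'utf8');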
If the content is messed up when you have the right content-type header and/or meta tag specified, then the content is not UTF-8 yet. If you don't get it sorted, post an example of what it looks like here.