PDFTable Unicode support - fpdf

I'm using PDFTable from http://www.vanxuan.net/tool/pdftable/, which is based on the FPDF class. I managed to export an HTML table to PDF using PDFTable. However, I'm facing one issue: non-English characters are all displayed as gibberish. It doesn't seem to support Unicode. The languages I'm trying to display are Arabic and Russian.
I could, theoretically, create a class similar to PDFTable, inherited from FPDF, and develop it from scratch to add Unicode support. But that's a lot of work. Has anyone done something like that and could perhaps share it? Thank you!

For Unicode support, the best way is to use tFPDF from http://www.fpdf.org/en/script/script92.php. It's a fork of FPDF made specifically to support Unicode. The class is based on the latest FPDF version, 1.7.

Related

How can I create an iTextSharp PDF in a Hindi font?

I am trying to build a desktop application for Hindi PDFs in C#, but the Unicode encoding is not well supported. Any idea how to fix this?
// Arial Unicode MS from the system fonts folder; Path needs "using System.IO;".
string ARIALUNI_TTF = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
BaseFont bf = BaseFont.CreateFont(ARIALUNI_TTF, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
iTextSharp.text.Font font = new iTextSharp.text.Font(bf, 8, iTextSharp.text.Font.NORMAL);
Will IDENTITY_H give support for Hindi encoding?
Hindi is not supported yet. A font like mangal.ttf, which supports the Devanagari script, will show the glyphs in iTextSharp but not the ligatures. Work is being done on the Indic front, not only for Hindi support but also for Telugu, Gujarati and others.
You basically require support for Asian characters. A similar thread can be found here (Stack Overflow). The implementation revolves around BaseFont (use the createFont method), which specifies the font and the appropriate encoding. You can find the example on the official iText site here. Note that the example is in Java, but the same implementation is available in .NET as well.

How to display emoji char in HTML

I saved the "face savouring delicious food" emoji to the database and read it back in PHP, where json_encode shows it as "\uD83D\uDE0B". Usually we use an <img /> tag to replace it.
However, I usually only find the format '\uE056', not "\uD83D\uDE0B", which I would replace with the picture E056.png.
I don't know how to get the picture according to '\uD83D\uDE0B'. Does anyone know?
What is the relation between '\uD83D\uDE0B' and '\uE056'? Do they both represent the "savouring delicious food" emoji?
The Unicode character U+1F60B FACE SAVOURING DELICIOUS FOOD is a so-called Plane 1 character, which means that its UTF-16 encoded form consists of two 16-bit code units, namely 0xD83D 0xDE0B. Generally, Plane 1 characters cause considerable problems because many programs are not prepared to deal with them, and few fonts contain them.
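As a rough illustration of the two-code-unit form described above, here is a minimal Python sketch (Python is only my choice for illustration; the question itself uses PHP). The function names are hypothetical. It splits a code point above U+FFFF into its UTF-16 surrogate pair, joins a pair back together, and decodes the JSON escape from the question.

```python
import json
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair back into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F60B)
print(f"{high:04X} {low:04X}")  # D83D DE0B

# The JSON escape from the question decodes to the same single character.
decoded = json.loads(r'"\uD83D\uDE0B"')
print(f"U+{ord(decoded):X}")  # U+1F60B
```

So "\uD83D\uDE0B" is not two characters to look up separately; it is one character, U+1F60B, written as two UTF-16 code units.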
According to http://www.fileformat.info/info/unicode/char/1f60b/fontsupport.htm this particular character only exists in DejaVu fonts and in Symbola, but the versions of DejaVu I’m using don’t contain it.
Instead of dealing with the problems of encodings (which are not that difficult, but require extra information), you can use the character reference &#x1F60B; in HTML. But this does not solve the font problem. I don’t know about iPhone fonts, but in general in web browsing, the odds of a computer having any font capable of rendering the character are probably less than 1%. So you may need to use downloadable fonts. Using an image is obviously much simpler and mostly more reliable.
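If it helps, a numeric character reference can be generated from the code point rather than typed by hand. This is a small Python sketch using only the standard library; `char_reference` is a hypothetical helper name, and it does nothing about the font problem.

```python
import html


def char_reference(ch: str) -> str:
    """Return the hexadecimal HTML character reference for one character."""
    return f"&#x{ord(ch):X};"


emoji = "\U0001F60B"  # FACE SAVOURING DELICIOUS FOOD
ref = char_reference(emoji)
print(ref)  # &#x1F60B;

# The standard library can also produce the decimal form for any
# character that does not fit in ASCII.
decimal_ref = emoji.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(decimal_ref)  # &#128523;

# Both forms unescape to the same character.
assert html.unescape(ref) == html.unescape(decimal_ref) == emoji
```

Because the reference is plain ASCII, it survives even when the page encoding is misconfigured.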
U+E056 is a Private Use codepoint, which means that anybody can make an agreement about its meaning with his brother or with himself, without asking anyone else’s mind. A font designer may assign any glyph to it.
IMPORTANT: As of this posting, the only browser that doesn't automatically support emoji is Chrome.
FOR CHROME:
Depending on what server-side language you are using, you should be able to find a library that converts emoji for you. I recently needed to solve this issue with PHP and used this library:
https://github.com/iamcal/php-emoji
The creator essentially built a sprite and adjusts the CSS according to the Unicode code point of the emoji. It isn't pretty, but luckily he/she did all the grunt work for you. If you're using a different language, you should be able to find something similar.
how do I put those little boxes into a php file?
Same way as any other Unicode character. Just paste them and make sure you're saving the PHP file and serving the PHP page as UTF-8.
When I put it into a php file, it turns into question marks and what not
Then you have an encoding problem. Work it out with Unicode characters you can actually see properly first, for example ąαд™日本, before worrying about the emoji.
Your PHP file should be saved as UTF-8; the page it produces should be served as Content-Type: text/html; charset=UTF-8 (or with a similar meta tag); the MySQL database should be using a UTF-8 collation to store data and PHP should be talking to MySQL using UTF-8.
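To see what an encoding mismatch looks like in isolation, here is a small Python sketch (not PHP, just for illustration) using the test string suggested above. It shows the two usual symptoms: mojibake when UTF-8 bytes are read as Latin-1, and question marks when text is forced into a charset that cannot represent it.

```python
sample = "ąαд™日本"

# Correct round trip: encode and decode with the same encoding.
utf8_bytes = sample.encode("utf-8")
assert utf8_bytes.decode("utf-8") == sample

# Symptom 1: UTF-8 bytes interpreted as Latin-1 give mojibake,
# one wrong character per byte.
mojibake = utf8_bytes.decode("latin-1")
print(len(sample), len(mojibake))  # 6 15

# Mojibake is reversible as long as no bytes were lost.
repaired = mojibake.encode("latin-1").decode("utf-8")

# Symptom 2: forcing the text into Latin-1 replaces every
# unrepresentable character with "?", and that loss is permanent.
question_marks = sample.encode("latin-1", errors="replace").decode("latin-1")
print(question_marks)  # ??????
```

If you see question marks, some layer has already thrown the data away; if you see mojibake, the bytes are probably still intact and only the declared encoding is wrong.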
However, even when everything is handled correctly like this, PCs will still not show the emoji. That's because:
they don't have fonts that include shapes for those characters, and
emoji are still completely unstandardised. Those characters you posted are in the Unicode Private Use Area, which means they don't have any official meaning at all.
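Whether a given character sits in the Private Use Area can be checked from its general category. A minimal Python sketch follows; `is_private_use` is a hypothetical helper, and U+E056 is only an assumed example of a carrier-style emoji code, not necessarily one of the characters you posted.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True if the character's Unicode general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"


carrier_code = "\uE056"  # assumed example of a private use emoji code
print(is_private_use(carrier_code))  # True

# Private use characters have no official name, so pass a default
# instead of letting unicodedata.name() raise ValueError.
print(unicodedata.name(carrier_code, "<no official name>"))  # <no official name>

# An ordinary assigned character is not private use.
print(is_private_use("A"))  # False
```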
Each network in Japan uses different character codes for their emoji, mapped to different areas in the PUA. So even on another mobile phone, it probably won't display the correct character, unless you spend ages manually converting emoji codes for different networks. I'm guessing the ones you posted above are from SoftBank (iPhone?).
There is an ongoing proposal led by Google and Apple to collate the different networks' emoji and give them a proper standardised place in Unicode. Until then, getting emoji to display consistently across networks is an exercise in unhappiness. See the character overview from the standardisation work to see how much converting you would have to do.
God, I hate emoji. All that pain for such a load of useless twee rubbish.

docx - markup / markup - docx conversion

I have to store some documents in the docx format, but can't stand using msword: I would like to edit some kind of plain text markup, anything except stuff based on XML (I don't like that either) and convert from/to that to/from docx.
Are there any options for this?
EDIT: since people think this is not programming related, I'll extend my question. What libraries do you suggest for writing a complete tex-docx/docx-tex converter?
If you're talking .net, I'd check out the OpenXML toolkit first. There are lots of "libraries" on the internet to do this, but they all seem to just be thin wrappers around the OpenXML stuff.
You might also check out
http://openxmldeveloper.org/
Aspose.Words for .NET allows you to create DOCX files from scratch using text or other content and then convert DOCX files to text etc. It doesn't require MS Office to be installed on the system. And the component is a simple .NET assembly with an easy to learn and implement API. Please try and see if it helps in your scenario.
Disclosure: I work as developer evangelist at Aspose.
You can try the DocxEditorKit http://java-sl.com/docx_editor_kit.html
Set the editor kit to JEditorPane, add styled text and store the document in docx format.

firefox addon development and Unicode

So I started developing my firefox addon.
Most of the work is performed by a referenced javascript file.
The problem is that when I edit some of the HTML elements on the page and, say, set their text, it's written as pure gibberish. I am writing the text in Hebrew. I can't for the life of me figure out the reason.
Any ideas?
Javascript strings are already Unicode at runtime. However, you have to make sure that your files are encoded correctly.
Always use utf-8 (without BOM) file encoding for all your js, XUL, DTD, properties files to be sure.
Firefox might try to guess the file character set incorrectly otherwise, and even worse some stuff might not even try guessing the encoding and instead simply always assume utf-8.
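If you are unsure whether your editor saved a file with a BOM, it is easy to check outside Firefox. This is a rough Python sketch working on in-memory bytes; `has_utf8_bom` is a hypothetical helper and the sample line is made up.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the byte string starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


text = 'greeting = "שלום"\n'  # made-up properties-style line

with_bom = text.encode("utf-8-sig")   # UTF-8 with a leading BOM
without_bom = text.encode("utf-8")    # plain UTF-8, no BOM

print(has_utf8_bom(with_bom))     # True
print(has_utf8_bom(without_bom))  # False

# Strip the BOM: "utf-8-sig" drops it on decode if present,
# then re-encode as plain UTF-8.
cleaned = with_bom.decode("utf-8-sig").encode("utf-8")
```

For a real file, read it in binary mode (`open(path, "rb")`) and pass the bytes to the same check.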
Better yet, do not hard-code strings in js/xul, but use DTD/properties files for localization (XUL tutorial, XUL School).
This snippet, for example, works pretty well for me (on this very page):
document.getElementsByTagName("h1")[0].textContent="русский язык";
(Just fire up the Firefox Web Console)
"Inline" Hebrew embedded in js files might create additional problems because it is right-to-left and bidi sucks, so the localization approach should be preferred.

Converting hebrew text to an image using imagemagick

I need to convert text to an image. Using imagemagick I can get this done.
However, part or all of the text could be in Hebrew (an RTL language).
This means the words in Hebrew are rendered backwards.
If I was assured that the text was only Hebrew, I would have just reversed the text before sending it to ImageMagick. However, this solution won't work if part of the text is in English.
Does anyone have any idea how this can be done?
P.S. I'm not committed to using ImageMagick, if a better way comes up.
However, the solution should work for both Linux and Windows (I might be able to live with a non-windows solution, but a multi OS solution is preferable).
Thanks,
Niv
I found this link:
http://www.experts-exchange.com/Software/Photos_Graphics/Web_Graphics/Q_21766928.html
They suggest:
Maybe Unifier (http://www.melody-soft.com/html/unifier.html) or Encoding Master (http://www.elfdata.com/encodingmaster/index.html)
Sounds like your real issue is re-ordering the bidirectional text for ImageMagick. That is a job for the Unicode bidirectional algorithm; see http://unicode.org/reports/tr9/. That report lists two reference implementations. Or see this one: http://fribidi.org/
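To get a feel for why reversing the whole string fails on mixed text, you can look at the bidi class of each character, which is the input the bidirectional algorithm works from. The Python sketch below only does that classification step with the standard library; it does not reorder anything, so a full implementation such as FriBidi is still needed for rendering. The helper names are hypothetical.

```python
import unicodedata
from typing import List, Tuple


def bidi_classes(text: str) -> List[Tuple[str, str]]:
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def has_rtl(text: str) -> bool:
    """True if any character is strongly right-to-left (Hebrew or Arabic)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


mixed = "ab \u05e9\u05dc 12"  # Latin letters, two Hebrew letters, digits
print([cls for _, cls in bidi_classes(mixed)])
# ['L', 'L', 'WS', 'R', 'R', 'WS', 'EN', 'EN']

print(has_rtl(mixed))    # True
print(has_rtl("hello"))  # False
```

Only the runs classed R (or AL for Arabic) change direction; the L letters and EN digits keep their order, which is why a blanket reversal breaks English words and numbers.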