I am converting emails to PDF using an HTML-to-PDF conversion scheme.
When I convert the email, I see this in the PDF:
So I looked a little deeper into the email and can see the Unicode character U+200E, which is the LEFT-TO-RIGHT MARK:
I am going to strip that character out of the email before the conversion, but is there a better solution?
[EDIT] Thank you for correcting the typo where I misstated the direction.
This email is from a U.S.-based user's computer configured for English. The date is inserted when he hits Reply on an email, as part of the marker that separates the conversations. Due to the nature of the business, it is highly unlikely they will receive right-to-left languages. The U+200E only appears in the date.
This is a programming question because I am converting the HTML email to PDF using C# in an Outlook add-on that we are creating.
We are using HtmlRenderer.PdfSharp to do the conversion; it seems to work very well apart from this annoyance.
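Stripping the mark before conversion is a reasonable fix when, as stated, the mail never contains right-to-left text. In C# the single-character version is simply `html.Replace("\u200E", "")`. The sketch below (in Python, to keep all examples in one language) assumes you may also want to catch the related directional controls and their HTML character-reference forms such as `&lrm;`; `strip_bidi_controls` is a hypothetical helper, not part of HtmlRenderer.PdfSharp. For mixed-direction text, removing these marks could change how it displays.

```python
import re

# Directional formatting characters that are invisible in normal text but may be
# drawn as boxes by a PDF renderer: LRM/RLM (U+200E, U+200F), the embeddings and
# overrides (U+202A..U+202E) and the isolates (U+2066..U+2069).
_BIDI_CHARS = re.compile("[\u200e\u200f\u202a-\u202e\u2066-\u2069]")

# The same LRM/RLM marks written as HTML character references, in case the HTML
# source contains the escaped form rather than the raw character.
_BIDI_REFS = re.compile(r"&(?:lrm|rlm|#820[67]|#x200[ef]);", re.IGNORECASE)

def strip_bidi_controls(html: str) -> str:
    """Remove bidi control characters (raw or as character references) from html."""
    return _BIDI_REFS.sub("", _BIDI_CHARS.sub("", html))
```

Run it on the HTML body just before handing it to the converter, e.g. `strip_bidi_controls("Sent: \u200eMonday")` gives `"Sent: Monday"`.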
The automated Outlook emails I send using pywin32 and plain HTML worked well until people started forwarding and replying to them. Once an email is forwarded, all the HTML formatting is stripped and the table borders suddenly disappear. One way around this is to go to your Outlook settings and disable the option "Reduce message size by removing format information not necessary for the message".
The question is: how should the email be formatted so that the formatting is not lost when it is forwarded, i.e. so that Outlook treats the format information as necessary for the message?
I have found a workaround, though. Outlook strips styles that are defined in a <style> block, but styles embedded inline in the tags escape the stripping. For now, I have taken this approach.
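A minimal sketch of the inline-style workaround, assuming a simple bordered table is what needs to survive forwarding. `build_inline_table` and the style strings are illustrative, not the asker's code; the pywin32 part is shown only as comments because it needs Outlook on Windows.

```python
from html import escape

# Styles written directly on each tag instead of in a <style> block, which the
# workaround above reports survive Outlook's stripping on forward/reply.
TABLE_STYLE = "border-collapse: collapse; border: 1px solid #000000;"
CELL_STYLE = "border: 1px solid #000000; padding: 4px;"

def build_inline_table(header, rows):
    """Build an HTML table whose borders are defined inline on every tag."""
    parts = [f'<table style="{TABLE_STYLE}">']
    parts.append("<tr>" + "".join(
        f'<th style="{CELL_STYLE}">{escape(str(h))}</th>' for h in header) + "</tr>")
    for row in rows:
        parts.append("<tr>" + "".join(
            f'<td style="{CELL_STYLE}">{escape(str(c))}</td>' for c in row) + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)

# Sending it with pywin32 (Windows + Outlook only, not exercised here):
# import win32com.client
# outlook = win32com.client.Dispatch("Outlook.Application")
# mail = outlook.CreateItem(0)  # 0 = olMailItem
# mail.HTMLBody = build_inline_table(["Name", "Count"], [["a", 1]])
```

Repeating the style on every cell is verbose, but nothing depends on a stylesheet that a client might drop.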
I occasionally receive emails with strange encoding issues. Quotation marks show up as ³example², and apostrophes show up as that¹s. I can't imagine that the sender actually meant to use those symbols, even though the email headers specify an encoding of Windows-1252. I'm using Thunderbird on Mac OS X, and I'm not sure which email client is being used to send these messages.
These are curly apostrophes and angled (curly) double quotes. In my experience, these typically come from OS X, because Mac software has historically used its own 8-bit encoding (Mac OS Roman) rather than standard ISO-8859-1; that's what I recall reading when researching this issue a few months ago, and if I find the reference I will add the link.
If the sender specifies UTF-8, this goes away.
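On the sending side, declaring UTF-8 means the charset label and the bytes agree, so the curly quotes arrive intact. A minimal Python sketch, assuming you control the sender; `build_utf8_message` is a hypothetical helper built on the standard `email.mime.text.MIMEText` class.

```python
from email.mime.text import MIMEText

def build_utf8_message(body: str) -> MIMEText:
    """Build a plain-text message whose declared charset matches its encoding."""
    # With charset "utf-8", typographic quotes (U+201C, U+201D, U+2019) are
    # encoded as UTF-8 and labelled as such, instead of relying on a legacy
    # 8-bit charset that the receiving client may decode differently.
    return MIMEText(body, "plain", "utf-8")
```

For example, `build_utf8_message("\u201cexample\u201d and that\u2019s it")` produces a message with `Content-Type: text/plain; charset="utf-8"`.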
I saved the "face savouring delicious food" emoji to a database and read it back in PHP; json_encode shows it as "\uD83D\uDE0B". Usually, though, we replace an emoji with an <img /> tag.
However, I usually only find emoji in the format '\uE056', not "\uD83D\uDE0B", and replace them with a picture such as E056.png.
I don't know how to find the picture for '\uD83D\uDE0B'. Does anyone know?
What is the relationship between '\uD83D\uDE0B' and '\uE056'? Do they both represent the "savouring delicious food" emoji?
The Unicode character U+1F60B FACE SAVOURING DELICIOUS FOOD is a so-called Plane 1 character, which means that its UTF-16 encoded form consists of two 16-bit code units, namely 0xD83D 0xDE0B. Generally, Plane 1 characters cause considerable problems because many programs are not prepared to deal with them, and few fonts contain them.
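The surrogate-pair arithmetic behind 0xD83D 0xDE0B can be checked directly. The Python sketch below splits and recombines a code point; `image_name_for` illustrates one hypothetical way to find a picture: name image files after the real code point (`1f60b.png`) rather than a Private Use value. That naming scheme is an assumption, not an established convention.

```python
import json

def to_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary-plane code point (U+10000..U+10FFFF) into UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low (trail) surrogate
    return high, low

def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a UTF-16 surrogate pair into the code point it encodes."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def image_name_for(text: str) -> str:
    """Hypothetical naming scheme: one image file per code point, e.g. '1f60b.png'."""
    return "-".join(f"{ord(ch):x}" for ch in text) + ".png"
```

`to_surrogate_pair(0x1F60B)` returns `(0xD83D, 0xDE0B)`. Python's `json.loads('"\\uD83D\\uDE0B"')` already recombines the pair into the single character U+1F60B, so `image_name_for` sees one code point.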
According to http://www.fileformat.info/info/unicode/char/1f60b/fontsupport.htm this particular character only exists in DejaVu fonts and in Symbola, but the versions of DejaVu I’m using don’t contain it.
Instead of dealing with the problems of encodings (which are not that difficult, but require extra information), you can use the character reference &#x1F60B; in HTML. But this does not solve the font problem. I don’t know about iPhone fonts, but in general web browsing, the odds of a computer having any font capable of rendering the character are probably less than 1%. So you may need to use downloadable fonts. Using an image is obviously much simpler and generally more reliable.
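Producing character references can be automated. A minimal Python sketch using the standard `xmlcharrefreplace` error handler, which writes decimal references; `to_ascii_html` and `hex_char_ref` are illustrative helpers. The sketch assumes the text is already HTML-escaped (it does not touch `&` or `<`) and, as noted above, does nothing about fonts.

```python
def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # 'xmlcharrefreplace' emits decimal references such as &#128523;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def hex_char_ref(ch: str) -> str:
    """Hexadecimal form of the same reference, e.g. '&#x1F60B;'."""
    return f"&#x{ord(ch):X};"
```

`to_ascii_html("yum \U0001F60B")` gives `"yum &#128523;"`, which is the same character as `&#x1F60B;`.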
U+E056 is a Private Use code point, which means that anybody can make an agreement about its meaning with his brother, or with himself, without asking anyone else. A font designer may assign any glyph to it.
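The difference shows up in the Unicode character database: U+1F60B has an official name, while U+E056 only has the general category "Co" (Private Use) and no name. A small Python sketch using the standard `unicodedata` module; `is_private_use` and `describe` are illustrative helpers.

```python
import unicodedata

def is_private_use(ch: str) -> bool:
    """True for Private Use code points (general category 'Co'), e.g. U+E056."""
    return unicodedata.category(ch) == "Co"

def describe(ch: str) -> str:
    """Show a character's code point and either its Unicode name or a PUA note."""
    cp = f"U+{ord(ch):04X}"
    if is_private_use(ch):
        return f"{cp} (Private Use: meaning depends on who assigned it)"
    return f"{cp} {unicodedata.name(ch, '<unnamed>')}"
```

`describe("\U0001F60B")` returns `"U+1F60B FACE SAVOURING DELICIOUS FOOD"`; for `"\ue056"` there is no name to report, so mapping it to U+1F60B requires the vendor's own table.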
IMPORTANT: As of this posting, the only browser that doesn't automatically support emoji is Chrome.
FOR CHROME:
Depending on what server-side language you are using, you should be able to find a library that converts emoji for you. I recently needed to solve this issue with PHP and used this library:
https://github.com/iamcal/php-emoji
The creator essentially created a sprite and adjusts the CSS according to each emoji's Unicode code point. It isn't pretty, but luckily he/she did all the grunt work for you. If you're using a different language, you should be able to find something similar.
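To illustrate the general sprite technique (not php-emoji's actual markup or class names), here is a Python sketch. It assumes a hypothetical sprite image `emoji-sprite.png` laid out as a grid in the order the code points are listed, and a hypothetical `emoji-<hex>` class convention. The emoji test is deliberately crude: any character outside the BMP is treated as an emoji.

```python
from html import escape

def emoji_to_sprite_spans(text: str) -> str:
    """Replace characters outside the BMP with empty <span>s styled by a CSS sprite.

    Crude on purpose: every supplementary-plane character is treated as an emoji.
    The 'emoji-<hex>' class names are a hypothetical convention.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            out.append(f'<span class="emoji emoji-{cp:x}"></span>')
        else:
            out.append(escape(ch))
    return "".join(out)

def sprite_css(code_points, cell=20, per_row=10, sprite_url="emoji-sprite.png"):
    """CSS for a hypothetical grid sprite: cell x cell pixels per image, per_row per row."""
    rules = [f'.emoji {{ display: inline-block; width: {cell}px; height: {cell}px; '
             f'background-image: url("{sprite_url}"); }}']
    for i, cp in enumerate(code_points):
        x, y = (i % per_row) * cell, (i // per_row) * cell
        rules.append(f".emoji-{cp:x} {{ background-position: {-x}px {-y}px; }}")
    return "\n".join(rules)
```

Shifting one shared image with `background-position` means the browser downloads a single file for all emoji.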
How do I put those little boxes into a PHP file?
Same way as any other Unicode character. Just paste them and make sure you're saving the PHP file and serving the PHP page as UTF-8.
When I put them into a PHP file, they turn into question marks and whatnot.
Then you have an encoding problem. Work it out with Unicode characters you can actually see properly first, for example ąαд™日本, before worrying about the emoji.
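One common source of those question marks is a lossy conversion somewhere in the chain: a character set that lacks a character replaces it with `?`. A Python simulation using the test string above; `lossy_roundtrip` is an illustrative helper, not what PHP or MySQL literally do internally.

```python
SAMPLE = "ąαд™日本"

def lossy_roundtrip(text: str, charset: str) -> str:
    """Encode text in charset, replacing unencodable characters with '?', then decode."""
    return text.encode(charset, errors="replace").decode(charset)
```

In Latin-1 every character of the sample becomes `?`; in Windows-1252 only `™` survives (`"???™??"`); a UTF-8 round trip returns the text unchanged.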
Your PHP file should be saved as UTF-8; the page it produces should be served with Content-Type: text/html; charset=UTF-8 (or with a similar meta tag); the MySQL database should be using a UTF-8 collation to store data, and PHP should be talking to MySQL using UTF-8.
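The "serve it as UTF-8" step is the same in any language: encode the body as UTF-8 and say so in both the header and the meta tag. Since the original is PHP, this is only a Python analogue, sketched as a minimal WSGI app (`app` is an illustrative name).

```python
def app(environ, start_response):
    """Minimal WSGI app that serves UTF-8 HTML and declares it in header and meta tag."""
    html = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="UTF-8"><title>Test</title></head>'
        "<body><p>ąαд™日本</p></body></html>"
    )
    body = html.encode("utf-8")  # the bytes really are UTF-8...
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=UTF-8"),  # ...and the header says so
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally: `from wsgiref.simple_server import make_server; make_server("", 8000, app).serve_forever()`.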
However, even if you handle everything correctly like this, PCs will still not show the emoji. That's because:
they don't have fonts that include shapes for those characters, and
emoji are still completely unstandardised. Those characters you posted are in the Unicode Private Use Area, which means they don't have any official meaning at all.
Each network in Japan uses different character codes for their emoji, mapped to different areas in the PUA. So even on another mobile phone, it probably won't display the correct character, unless you spend ages manually converting emoji codes for different networks. I'm guessing the ones you posted above are from SoftBank (iPhone?).
There is an ongoing proposal led by Google and Apple to collate the different networks' emoji and give them a proper standardised place in Unicode. Until then, getting emoji to display consistently across networks is an exercise in unhappiness. Look at the character overview from the standardisation work to get an idea of how much converting you would have to do.
God, I hate emoji. All that pain for such a load of useless twee rubbish.
I am developing a web app that sends out emails. Currently, all emails have an HTML part.
Questions:
Is it important to include a text part also?
Do you include both?
Is just removing all the tags from the HTML message and adding a few line breaks good enough to create a text part from the HTML part?
Thanks, Kevin
Is it important to include a text part also? It's a best practice to provide a plain-text version of the email. However, in my opinion and in this day and age, I would guess that it is not such a big deal to leave it out. That said, if you know more about your recipients' email clients (e.g. if you're sending the emails in a corporate environment and everyone uses a particular email client), then you can determine how necessary it really is.
Do you include both? The .NET Framework (which I use) provides an AlternateView class (MSDN) that allows you to easily specify copies of an email in different formats. It makes it very easy to include a plain-text version of the email. Perhaps you can find something similar in Apache/PHP.
Is just removing all the tags from the HTML message and adding a few line breaks good enough to create a text part from the HTML part? Technically, yes, but be VERY CAREFUL here. A complex HTML layout that has been converted to plain text will look absolutely terrible if all you do is remove the HTML tags and pile the content together. It really depends on your content and how much you can do to manipulate it. Also, take a look at Campaign Monitor's suggestions for formatting plain-text emails.
One final word of advice for your HTML emails: test, test, and then test some more. When you're finished testing, test again. HTML emails render differently in different email clients, and if some of your recipients are using Outlook 2007/2010 (which renders HTML with Microsoft Word's engine), then you can forget about web standards. I urge you to take a look at Campaign Monitor's Guide to CSS support in email.
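Outside .NET, the same structure is a `multipart/alternative` message. A minimal sketch with Python's standard `email.message.EmailMessage`; `build_alternative_message` is an illustrative helper, and the addresses in the usage line are placeholders.

```python
from email.message import EmailMessage

def build_alternative_message(subject, sender, to, text_body, html_body):
    """Build a multipart/alternative email with a plain-text and an HTML version."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = to
    msg.set_content(text_body)                      # text/plain part, listed first
    msg.add_alternative(html_body, subtype="html")  # text/html part, listed last
    return msg
```

Order matters: in `multipart/alternative` the parts go from plainest to richest, and a client normally displays the last one it can render. Example: `build_alternative_message("Hi", "a@example.com", "b@example.com", "Hello", "<p>Hello</p>")`.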
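A converter only needs a little structure awareness to beat naive tag stripping. The Python sketch below uses the standard `html.parser` module to turn paragraphs and lists into line breaks, keep link targets, and drop `<style>`/`<script>` contents. It is an illustrative starting point under those assumptions, not Campaign Monitor's method, and complex table layouts will still need hand-tuning.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect the visible text of an HTML email, keeping some of its structure."""

    BLOCK_TAGS = {"p", "div", "table", "h1", "h2", "h3", "ul", "ol"}
    SKIP_TAGS = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._href = None
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip += 1
        elif tag == "br":
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")
        elif tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")
        elif tag == "tr":
            self.parts.append("\n")        # one table row per line
        elif tag in ("td", "th"):
            self.parts.append("\t")        # cells separated by tabs
        elif tag == "a" and self._href:
            self.parts.append(f" ({self._href})")  # keep the link target visible
            self._href = None

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(re.sub(r"\s+", " ", data))

def html_to_text(html: str) -> str:
    """Convert an HTML email body to a readable plain-text version."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)  # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)        # at most one blank line in a row
    return text.strip()
```

For example, `<p>Visit <a href="https://example.com">our site</a>.</p>` becomes `Visit our site (https://example.com).`.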
I'm trying to generate email from my code that will read correctly for people using right-to-left languages such as Arabic. My question is: what are my options for achieving this?
I am aware that I can create a multipart email and encode the message body as "text/html", then specify a text direction in the <html> tag (e.g. <html dir="rtl">), but ideally I would like to use plain-text email and not have to rely on HTML formatting, because not all users will have HTML support in their email client.
On the plain-text front, I have managed to encode Arabic text in UTF-8 using the "Content-Type" header as follows:
Content-Type: text/plain;charset=UTF-8
But as for the overall direction of the text, I am unsure how to explicitly specify this in the email, or even if this is necessary. How would an Arabic speaker typically work with plain-text email? Would they usually rely on the global text direction setting in their email client, or is there some other, generally accepted way of forcing the text direction in the email itself?
Any suggestions or general advice regarding right-to-left email would be much appreciated.
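Both options described in the question can live in one message: a UTF-8 `text/plain` part for clients without HTML, plus an HTML alternative carrying `dir="rtl"`. A minimal Python sketch with the standard `email.message.EmailMessage`; `build_rtl_message` is an illustrative helper, and wrapping the whole body in one `<p>` is a simplification.

```python
from email.message import EmailMessage
from html import escape

def build_rtl_message(subject: str, text_body: str) -> EmailMessage:
    """Plain-text part in UTF-8 plus an HTML alternative marked dir="rtl"."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(text_body, charset="utf-8")  # text/plain; charset="utf-8"
    html_body = (
        '<html dir="rtl"><head><meta charset="utf-8"></head>'
        f"<body><p>{escape(text_body)}</p></body></html>"
    )
    msg.add_alternative(html_body, subtype="html", charset="utf-8")
    return msg
```

Clients that render HTML use the right-to-left part; the others fall back to the plain text and their own direction setting.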
I am unsure how to explicitly specify this in the email, or even if this is necessary.
For plain-text Unicode, you can add a right-to-left mark (U+200F) inline. It's not really necessary, but I'd say why not add it.
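Adding the mark is a one-liner per line. Under the Unicode Bidirectional Algorithm, a paragraph with no explicit direction takes its base direction from its first strong character, and RLM is an invisible strong right-to-left character. A Python sketch (`mark_rtl_lines` is illustrative) assuming the receiving client applies that default rather than forcing its own direction:

```python
RLM = "\u200f"  # RIGHT-TO-LEFT MARK: invisible, strongly right-to-left

def mark_rtl_lines(text: str) -> str:
    """Prefix every non-empty line with RLM so its base direction is right-to-left."""
    return "\n".join(RLM + line if line else line for line in text.split("\n"))
```

This matters most for lines that start with a Latin word or a number, where the first strong character would otherwise make the line left-to-right.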
How would an Arabic speaker typically work with plain-text email? Would they usually rely on the global text direction setting in their email client, or is there some other, generally accepted way of forcing the text direction in the email itself?
Most users would be able to either switch the text direction to get the lines aligned correctly, or are used to jagged-right Arabic text (the text itself will appear correctly if it doesn't contain any inline English characters, and users are used to reading it in chunks even if it does).
I am upvoting Osama's question, even though I have a very different opinion than him. The reason is that Unicode control characters are not used enough in the world, and they could fix a lot of problems.
Anyway, to my answer: use HTML. Really. Because even if you do s = RLE + s + PDF, the text will have right-to-left direction but not right-to-left alignment.
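The two approaches compared above, sketched in Python: RLE (U+202B) and PDF (U+202C) wrap the string in a right-to-left embedding, which affects the order in which characters are displayed but not where the line sits. `dir="rtl"` in HTML sets the base direction, and with the default `text-align` the paragraph is also right-aligned. Both helper names are illustrative.

```python
from html import escape

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def embed_rtl(s: str) -> str:
    """s = RLE + s + PDF: right-to-left embedding, but no control over alignment."""
    return RLE + s + PDF

def rtl_html_paragraph(s: str) -> str:
    """The HTML route: dir="rtl" sets the direction and, by default, right alignment."""
    return f'<p dir="rtl">{escape(s)}</p>'
```

Newer text can use the isolate pair RLI/PDI (U+2067/U+2069) instead of RLE/PDF, but neither pair controls alignment.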