What character encoding am I seeing in my emails? - email

I'm getting emails occasionally that are having strange encoding issues. The quotation marks show up as ³example², and apostrophes show up as that¹s. I can't imagine that the other person actually meant to use those symbols, even though the email headers specify an encoding of Windows-1252. I'm using Thunderbird for Mac OSX, and I'm not sure what email client is being used to send these messages.

These are the characters ` and angled double-quotes. In my experience, these are typically from OSX because it uses a specialized version of ISO-8859, that's what I recall reading when researching this issue a few months ago, if I find the reference I will add the link.
If the sender specifies UTF-8, this goes away.

Related

Left to Right Unicode Character in email Time Stamp

I am converting email to a .pdf using an HTML-to-pdf conversion scheme.
When I convert the email I see this in the pdf:
So I looked a little deeper into the email and can see Unicode character 200e which is a left to right character:
I am going to strip that character out of the email before the conversion, but is there a better solution?
[EDIT] Thank you for correcting the typo where I misstated the direction.
This email is from a U.S. based English configured user's computer. The date is inserted when he hits reply on an email and it inserts the marker for separating the conversations. Due to the nature of the business, it is highly unlikely they will get right to left languages. The 0x200e only appears in the date.
This is a programming question because I am converting the HTML email to pdf using c# in an outlook add-on that we are creating.
We are using HtmlRenderer.PdfSharp to do the conversion; seems to work very well other than this annoyance.

Support displaying emojis in UILabel

I'm having problems displaying emojis in a UILabel.
in some cases, it even causes a crash when lay-outing the characters in the label.
these characters are returning from server as unicode, and are parsed with AFNetworking framework.
this is an example of how it is returned from the server (console logs):
\U05d4\U05d9\U05d9
i have tried different approaches, like lowercasing this to "\u05d4" or playing with the encoding of the string returning.
nothing seems to work.
i did managed to show a couple of emojis properly (which makes me think it maybe a server related issue?) - does the server needs to support sets of unicode characters so it can return it in the appropriate encoding? i'd be happy if someone could clarify this point for me. (btw, server is written in RubyOnRails i believe.)
should i parse the data with a different parser (SBJSON)? although switching the networking framework at this point would be impossible due to time and resources available..
what other options do i have?
Thanks
i think you should be able to just paste an emoji character in the code directly as a text.

Russian characters are turned junk when using Filezilla to FTP

When I tried to FTP Russian named files, it is showing as "junk" characters in Linux machine. But when I copied the Russian names it is correctly showing up.
Is there any settings or anything need to be done in Filezilla during FTP. I tried with both Ascii and Binary mode.
The Linux machine is having locale set to ru_RU.cp1251.
FTP was invented with US-ASCII as character set in mind, so it lacks a concept for different character sets at all. The server sends filenames as-is and the client has to properly interpret them.
FileZilla can do that as well: Add your site to the Site Manager (File then Site Manager…). For your site, go to Charset tab and select Use custom charset. As I do not know how the accepted character set name is, you have to try a bit: cp-1251, windows-1251, cp1251, etc.
If possible, make sure the FTP server supports UTF-8 and then always use UTF-8 (Unicode). This way you do not have such problems anymore.
ASCII and binary modes by the way are completely unrelated to character sets - see FileZilla Wiki regarding data type for more information.

How to display emoji char in HTML

I saved the face "savouring delicious food emoji" to database, and read it in php json_encode which show "uD83D\uDE0B"。 but usually we use one <img /> label to replace it .
however,usually I just find this format '\uE056' not "uD83D\uDE0B",to replace with pic E056.png .
I don't know how to get the pic accroding to 'uD83D\uDE0B'.someone know ?
What the relation between 'uD83D\uDE0B' and '\uE056', they both represent emoji "savouring delicious food"?
The Unicode character U+1F60B FACE SAVOURING DELICIOUS FOOD is a so-called Plane 1 character, which means that its UTF-16 encoded form consists of two 16-bit code units, namely 0xD83D 0xDE0B. Generally, Plane 1 characters cause considerable problems because many programs are not prepared to deal with them, and few fonts contain them.
According to http://www.fileformat.info/info/unicode/char/1f60b/fontsupport.htm this particular character only exists in DejaVu fonts and in Symbola, but the versions of DejaVu I’m using don’t contain it.
Instead of dealing with the problems of encodings (which are not that difficult, but require extra information), you can use the character reference 😈 in HTML. But this does not solve the font problem. I don’t know about iPhone fonts, but in general in web browsing, the odds of a computer having any font capable of rendering the character are probably less than 1%. So you may need to use downloadable fonts. Using an image is obviously much simpler and mostly more reliable.
U+E056 is a Private Use codepoint, which means that anybody can make an agreement about its meaning with his brother or with himself, without asking anyone else’s mind. A font designer may assign any glyph to it.
IMPORTANT: As of this posting, the only browser that doesn't automatically support emojis is chrome.
FOR CHROME:
Depending on what server side language you are using, you should be able to find a library that converts emojis for you. I recently needed to solve this issue with php and used this library:
https://github.com/iamcal/php-emoji
The creator essentially created a sprite and adjusts the css according to the unicode of the emoji. It isnt pretty, but luckily he/she did all the grunt work for you. If you're using a different language you should be able to find something similar.
how do I put those little boxes into a php file?
Same way as any other Unicode character. Just paste them and make sure you're saving the PHP file and serving the PHP page as UTF-8.
When I put it into a php file, it turns into question marks and what not
Then you have an encoding problem. Work it out with Unicode characters you can actually see properly first, for example ąαд™日本, before worrying about the emoji.
Your PHP file should be saved as UTF-8; the page it produces should be served as Content-Type: text/html;charset:UTF-8 (or with similar meta tag); the MySQL database should be using a UTF-8 collation to store data and PHP should be talking to MySQL using UTF-8.
However. Even handling everything correctly like this, PCs will still not show the emoji. That's because:
they don't have fonts that include shapes for those characters, and
emoji are still completely unstandardised. Those characters you posted are in the Unicode Private Use Area, which means they don't have any official meaning at all.
Each network in Japan uses different character codes for their emoji, mapped to different areas in the PUA. So even on another mobile phone, it probably won't display the correct character, unless you spend ages manually converting emoji codes for different networks. I'm guessing the ones you posted above are from SoftBank (iPhone?).
There is an ongoing proposal led by Google and Apple to collate the different networks' emoji and give them a proper standardised place in Unicode. Until then, getting emoji to display consistently across networks is an exercise in unhappiness. See the character overview from the standardisation work to see how much converting you would have to do.
God, I hate emoji. All that pain for such a load of useless twee rubbish.

UTF8 encoding problem, same results work fine in wordpress

I have a wordpress installation that clients can edit, all characters display ok. On the main homepage I query the same database for the same title and post content, but it doesn't display correctly - just a question mark
I have tried sending the utf8 headers manually, through htaccess and through meta tags. I have used SET name UTF8 (which turns the characters into the diamond symbol with a questionmark inside).
I genuinely cant figure out what it could be now and I really need these characters to display correctly.
Heres the homepage, you can see in the Sounddhism 6 preview that there are lots of question marks, if you click on it you will see what they are meant to look like
http://nottingham.subverb.net
I have passed it through the validator and it gives me this error:
Sorry, I am unable to validate this document because on line 373 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: utf8 "\xA0" does not map to Unicode
Which, i appreciate is supposed to help me, but I don't know what to do about it. Especially since that line, the letter generating the error is supposed to be a space and is AFTER the offending question marks.
Can anyone help?
Compare the encoding of both the back-end scripts in Wordpress and also your homepage script. If you're using IE, right-click the page and check the encoding. Sometimes it's set to "Auto-detect" and IE will often detect a different encoding for different pages, causing strange issues like this.
If you're not using IE, try using a tool like Fiddler to see exactly what encoding (and what bytes are being sent back and forth both in the back-end and your homepage script.
If forcing UTF-8 on your homepage script doesn't work, I would guess that the back-end is not using UTF-8.