Recently I noticed that a user name was blocked on my web service because it didn't match the regular expression filter that I had placed there to restrict characters to letters, numbers and certain symbols. The string was Dong and it turns out that in my game a user must have pasted a Unicode string into the text entry field, something I hadn't considered up to that point.
I did a bit of research and noted that C# Regex is Unicode aware and I can add \p{L}\p{M}\p{Z}\p{N} in order to allow more combinations in but I still wouldn't be sure that I allowed or disallowed all appropriate combinations. Particularly I wanted to be able to block certain words from being used that might offend younger players (or older ones too). Right now I have a bad word filter that I apply after the user completes entry which does a good job of blocking most attempts at making a username that is offensive but only (as I realize now) if they used ASCII to compose it.
How do other games deal with this given that certain Unicode characters aren't represented in the particular font that is chosen for the game and that certain Unicode combinations can actually muck up the alignment in your display?
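The category-based filtering described above can be sketched in Python as follows. This is only an illustration under assumptions: Python's built-in `re` module does not support `\p{L}`-style classes, so the sketch checks general categories through `unicodedata` instead; the NFKC normalization step, the function names, and the one-word block list are all hypothetical additions, not something from the original post.

```python
import unicodedata

# Hypothetical block list, for illustration only.
BLOCKED_WORDS = {"badword"}


def normalize_username(name: str) -> str:
    """Fold compatibility forms (e.g. fullwidth letters) into their plain equivalents."""
    return unicodedata.normalize("NFKC", name)


def is_allowed_username(name: str) -> bool:
    """Accept only letters, marks, separators and numbers (categories L, M, Z, N)."""
    normalized = normalize_username(name)
    return bool(normalized) and all(
        unicodedata.category(ch)[0] in "LMZN" for ch in normalized
    )


def contains_blocked_word(name: str, blocked=BLOCKED_WORDS) -> bool:
    """Run the word filter on the normalized, case-folded name, not the raw input."""
    folded = normalize_username(name).casefold()
    return any(word in folded for word in blocked)


# A made-up example: "Dong" typed with fullwidth letters (U+FF24 U+FF4F U+FF4E U+FF47).
fullwidth_name = "\uFF24\uFF4F\uFF4E\uFF47"
print(normalize_username(fullwidth_name))
print(is_allowed_username(fullwidth_name))
```

Normalizing before filtering means an ASCII-only word list still catches names composed from compatibility look-alikes, although it will not catch every homoglyph.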
Higher-quality games and apps use a fallback font for characters their main font doesn't support, for example falling back on Arial Unicode if Segoe UI doesn't have that character. This usually works fine, since only the non-English characters, emoji, and the like end up rendered in the alternate font.
If you don't do that, you have to restrict user input to what your chosen fonts will support, if not all the way back to ASCII.
Font support and restricting "bad words" are totally separate problems, though. Having a flagging mechanism is a good idea, and a common list of "naughty words" can be helpful as well. Even if you don't speak every language, you can probably find word lists for those languages on the web somewhere.
Is there a list somewhere of which Unicode characters are well supported? I.e. if I used these characters in an application or on a web page, there's a good chance that the user will see what I see, and not a question mark or a square.
http://apps.timwhitlock.info/emoji/tables/unicode This is a good start. It shows what a number of Unicode characters look like on several common platforms. But this list is limited to emoji. I'm more interested in things like arrows and mathematical symbols.
Of course, I can always see which characters look good on my computer, phone, web browser, favorite font, etc. But I want to know what will work well for most other people, too.
If you need to know if a Unicode character is assigned you should check the official Unicode Chart.
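As a quick local alternative to browsing the charts, a minimal Python sketch can ask the standard `unicodedata` module whether a code point is assigned. The helper names are hypothetical, and the answer reflects the Unicode version bundled with the Python build; it says nothing about whether any font actually has a glyph for the character.

```python
import unicodedata


def is_assigned(ch: str) -> bool:
    """True unless the general category is 'Cn' (unassigned or noncharacter)."""
    return unicodedata.category(ch) != "Cn"


def describe(ch: str) -> str:
    """Return the code point and its official name, if the database has one."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"


for ch in ("\u2192", "\u2211", "\u0378"):
    print(describe(ch), "assigned" if is_assigned(ch) else "unassigned")
```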
Wikipedia has a good list of fonts which cover the most characters. See Unicode Font # List of Unicode fonts.
Arial is preinstalled on almost every machine and covers a lot of characters. Not to forget Noto from Google - a font collection which covers almost every character you will ever come in touch with.
For a fast lookup of Unicode characters I recommend Fileformat.info.
But I want to know what will work well for most other people, too.
I would go with Arial, or Times New Roman, or make your decision platform dependent.
I am developing a website in the Georgian language. The Georgian alphabet has its own Unicode range, but there are also special fonts which have Georgian glyphs in place of English characters, a bit like the "Symbol" and "Dingbats" fonts.
For example the string "saqarTvelo" will be rendered as "საქართველო" with these fonts. So now I have two options and don't know what to do:
1) Using Georgian Unicode for my website, but the problem is that all the fonts I have were created for English (Latin) characters and don't work with Georgian Unicode.
2) Using Georgian fonts with English (Latin) characters. But I don't know how search engines will react.
Please tell me what to do, I am stuck!
The short answer is that using the approach you mean in option 2, search engines will see the word “საქართველო” in your text as “saqarTvelo”, so normal searches will fail.
The question seems to refer to two different ways of using Georgian letters on web pages:
1) Using Unicode encoding, so that characters will be rendered using a Unicode-encoded font (which is what most fonts are, but most fonts don’t contain Georgian letters).
2) Using a nonstandard, “private” encoding, usually one that maps 256 different code positions (8-bit combinations) to whatever characters are needed for some purposes. This presumes that the text is rendered using a font encoded the same way.
Method 2 can be characterized as a wrong approach, but it has been used on the web since the early days (even when CSS was not available and one had to resort to <font face=...> for setting font), and especially in the early days. It really does not work unless the user’s computer has the specific, “privately” encoded font (or some font encoded exactly the same way). Since search engines are font-agnostic, they only see the 8-bit codes and try to interpret them in the encoding declared or implied for the page, not in the “private” encoding (which cannot be declared since it has no published definition and no standard name, or any name for that matter).
Method 1 has the problem that for it to work, the user’s computer needs to have some (Unicode-encoded) font that supports the characters used. Nowadays, this can be reasonably well solved using a downloadable font (web font) via @font-face. Fonts that support Georgian letters include some useful free fonts like DejaVu fonts, GNU Freefont fonts, and Quivira. For more info on this approach, see my Guide to using special characters in HTML.
Using method 1, search engines will see the Georgian letters correctly, provided that the document’s encoding (normally UTF-8) has been properly declared or can be inferred by the search engine.
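To make the difference concrete, here is a small Python sketch of what is actually stored under each method. The mapping is derived only from the single example pair in the question (“saqarTvelo” and “საქართველო”), so it covers just those letters; a real legacy font would need its full table, which is not given here. The function names are hypothetical.

```python
# The only pair known from the question; a real conversion table would be larger.
LATIN_SAMPLE = "saqarTvelo"
GEORGIAN_SAMPLE = "საქართველო"

# Build a translation table letter by letter from the sample pair.
LEGACY_TO_UNICODE = str.maketrans(dict(zip(LATIN_SAMPLE, GEORGIAN_SAMPLE)))


def legacy_to_unicode(text: str) -> str:
    """Convert privately encoded (Latin-letter) text to real Georgian code points."""
    return text.translate(LEGACY_TO_UNICODE)


def code_points(text: str) -> list:
    """List the code points that a search engine would actually index."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(LATIN_SAMPLE))      # plain ASCII letters
print(code_points(GEORGIAN_SAMPLE))   # code points in the Georgian block
print(legacy_to_unicode("saqarTvelo"))
```

With the private encoding, the page contains only the ASCII code points from the first line, whatever glyphs the font draws for them.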
I saved the "face savouring delicious food" emoji to a database and read it back in PHP with json_encode, which shows "\uD83D\uDE0B". Usually we use an <img /> tag to replace it.
However, I usually only find the format '\uE056', not "\uD83D\uDE0B", to replace with the picture E056.png.
I don't know how to get the picture according to '\uD83D\uDE0B'. Does anyone know?
What is the relation between '\uD83D\uDE0B' and '\uE056'? Do they both represent the emoji "face savouring delicious food"?
The Unicode character U+1F60B FACE SAVOURING DELICIOUS FOOD is a so-called Plane 1 character, which means that its UTF-16 encoded form consists of two 16-bit code units, namely 0xD83D 0xDE0B. Generally, Plane 1 characters cause considerable problems because many programs are not prepared to deal with them, and few fonts contain them.
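The arithmetic behind those two code units can be sketched in Python. The function names are hypothetical; the formulas are the standard UTF-16 surrogate computation, and the JSON escape is shown using Python's `json` module rather than PHP's json_encode.

```python
import json


def to_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary-plane code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F60B)
print(f"{high:04X} {low:04X}")
print(f"U+{from_surrogate_pair(0xD83D, 0xDE0B):X}")
print(json.dumps("\U0001F60B"))  # JSON escapes the character as a surrogate pair
```

So a string such as "\uD83D\uDE0B" in JSON output is one character, U+1F60B, and an image lookup can be keyed on the recombined code point.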
According to http://www.fileformat.info/info/unicode/char/1f60b/fontsupport.htm this particular character only exists in DejaVu fonts and in Symbola, but the versions of DejaVu I’m using don’t contain it.
Instead of dealing with the problems of encodings (which are not that difficult, but require extra information), you can use the character reference &#x1F60B; in HTML. But this does not solve the font problem. I don’t know about iPhone fonts, but in general in web browsing, the odds of a computer having any font capable of rendering the character are probably less than 1%. So you may need to use downloadable fonts. Using an image is obviously much simpler and mostly more reliable.
U+E056 is a Private Use codepoint, which means that anybody can make an agreement about its meaning with his brother or with himself, without asking anyone else’s mind. A font designer may assign any glyph to it.
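A quick way to tell the two kinds of code point apart, sketched in Python with a hypothetical helper name: Private Use characters have the general category "Co", while standardized symbols such as U+1F60B do not.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True for Private Use code points (general category 'Co')."""
    return unicodedata.category(ch) == "Co"


for ch in ("\uE056", "\U0001F60B"):
    kind = "private use" if is_private_use(ch) else "standardized"
    print(f"U+{ord(ch):04X}: {kind}")
```

A private-use value like U+E056 therefore only means "face savouring delicious food" within whatever private agreement (for example a carrier's or vendor's emoji table) assigned it that meaning.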
IMPORTANT: As of this posting, the only browser that doesn't automatically support emoji is Chrome.
FOR CHROME:
Depending on what server side language you are using, you should be able to find a library that converts emojis for you. I recently needed to solve this issue with php and used this library:
https://github.com/iamcal/php-emoji
The creator essentially created a sprite and adjusts the CSS according to the Unicode code point of the emoji. It isn't pretty, but luckily he/she did all the grunt work for you. If you're using a different language you should be able to find something similar.
how do I put those little boxes into a php file?
Same way as any other Unicode character. Just paste them and make sure you're saving the PHP file and serving the PHP page as UTF-8.
When I put it into a php file, it turns into question marks and what not
Then you have an encoding problem. Work it out with Unicode characters you can actually see properly first, for example ąαд™日本, before worrying about the emoji.
Your PHP file should be saved as UTF-8; the page it produces should be served as Content-Type: text/html; charset=UTF-8 (or with a similar meta tag); the MySQL database should be using a UTF-8 collation to store data and PHP should be talking to MySQL using UTF-8.
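The failure modes are easy to reproduce outside PHP. The following Python sketch, using the sample string suggested above, shows what a correct UTF-8 round trip looks like, where the question marks come from, and what the typical "garbage" looks like when UTF-8 bytes are read with the wrong encoding. The variable names are hypothetical.

```python
SAMPLE = "ąαд™日本"

# Correct handling: encode as UTF-8 and decode as UTF-8 at every hop.
utf8_bytes = SAMPLE.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Question marks: some step forced the text into an encoding that lacks the characters.
question_marks = SAMPLE.encode("ascii", errors="replace")

# Mojibake: UTF-8 bytes interpreted as a single-byte encoding such as Latin-1.
mojibake = utf8_bytes.decode("latin-1")

print(round_trip == SAMPLE)
print(question_marks)
print(len(SAMPLE), len(mojibake))  # one garbage character per UTF-8 byte
```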
However. Even handling everything correctly like this, PCs will still not show the emoji. That's because:
they don't have fonts that include shapes for those characters, and
emoji are still completely unstandardised. Those characters you posted are in the Unicode Private Use Area, which means they don't have any official meaning at all.
Each network in Japan uses different character codes for their emoji, mapped to different areas in the PUA. So even on another mobile phone, it probably won't display the correct character, unless you spend ages manually converting emoji codes for different networks. I'm guessing the ones you posted above are from SoftBank (iPhone?).
There is an ongoing proposal led by Google and Apple to collate the different networks' emoji and give them a proper standardised place in Unicode. Until then, getting emoji to display consistently across networks is an exercise in unhappiness. See the character overview from the standardisation work to see how much converting you would have to do.
God, I hate emoji. All that pain for such a load of useless twee rubbish.
I wrote some code to send email as both HTML & Text, and I am having trouble testing it.
On Thunderbird and Outlook, there is an option to view as plain text, however I have a feeling that they are being smart and doing something to the plain text (because it looks slightly different in Thunderbird than in Outlook).
What's the crappiest email client out there? One that simply has no HTML support and would not be smart enough to convert HTML to text by itself.
I'd like to see the worst solution.
mail on *nix is an option. Use it on a terminal, and there's no way you'll get automatic conversion of HTML!
(If mail is too crappy, you can try another text-based email client that is actually easy to use and often installed: Alpine. Although, if you want to err on the side of crappy, go for Pine instead.)
Lotus Notes. Even simple HTML is likely to be mauled. This would be a perfect application, and the only suitable purpose it could possibly fulfill. While you're at it, you can get a great example of how to design a bad interface.
Are you dealing with technical people with this? I can't imagine many people use something like Mutt except for CS professors and people who read SO, but realistically not many common readers (other than internal company email at places with limited computers like POS systems) use non-HTML mail.
Also, look into how it looks without externally linked images because most webmail providers (and others) block it.
The view plain text option in Thunderbird actually displays the plain text copy from the Multi-part MIME. That should be fine for what you're looking to do.
Outlook actually converts the HTML version into plain text and does not show the plain text version that was delivered in the MIME.
Bottom line, just use Thunderbird.
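Independently of any client, the text part that was actually delivered can be pulled out of the raw message. Below is a minimal Python sketch using the standard `email` package; the addresses, subject, and function names are placeholders, and it builds its own test message rather than reading one from a mailbox.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_message(text: str, html: str) -> EmailMessage:
    """Build a multipart/alternative message with a text part and an HTML part."""
    msg = EmailMessage()
    msg["Subject"] = "Multipart test"          # placeholder headers
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg.set_content(text)                      # text/plain part
    msg.add_alternative(html, subtype="html")  # text/html part
    return msg


def extract_plain_part(raw: bytes) -> str:
    """Return the text/plain part exactly as delivered, without converting the HTML."""
    parsed = message_from_bytes(raw, policy=policy.default)
    part = parsed.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


raw = build_message("Hello in plain text.", "<p>Hello in <b>HTML</b>.</p>").as_bytes()
print(extract_plain_part(raw))
```

Saving a received message to a file and passing its bytes to `extract_plain_part` shows the delivered text part without relying on a client's own HTML-to-text conversion.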
Make sure to test your mail in both Outlook 2003 and 2007.
Outlook 2003 uses Internet Explorer to render HTML mails.
Outlook 2007 uses Word to render HTML mails. It doesn't support things like positioning of elements, so if you rely heavily on CSS, Outlook 2007 is going to mess things up for sure.
If you want to actually see the Text part of a multi-part email (an email sent with both an HTML version and a Text version in the same message), then I'd recommend having a look at Fastmail.fm. They have an option to show nothing but the Text part of the multipart.
If on the other hand you are looking for an email client with the worst heuristics to convert HTML to Text - then I'm going to take a guess and second Lotus Notes.
You could look for elm or pine on *nix - both very basic.
Whenever I want to see what the message actually is (not just what it looks like) in Thunderbird, I just use the view message source option. That gives me the plain text and the HTML source.
If you just want mangled e-mail, though, Lotus Notes is highly recommended.
Mail on *nix. Very, VERY crappy.
This is sort of a generic question due to my lack of experience with fonts, so a little patience and/or pointing in the right direction to get more info would be appreciated. I have an iPhone app and am noticing that when I print some text on my labels, I end up with garbage when the string contains non-ASCII characters, like Korean for example.
My guess is that since my UILabels, for instance, are using the system font, perhaps the system font does not support displaying wide characters. However, I'm left with a few beginner questions:
1) How do I change the system font so that the iPhone SDK objects relying on it pick up the new one?
2) Does it sound correct that the system font probably doesn't support wide characters, and that this is the reason I see garbage when I have characters outside the normal ASCII range?
Thanks. Let me know if I need to clarify the problem please.
Update:
I later suspected maybe it was a problem on my server end so posted this related but not identical post here: does google app engine display unicode differently in StringProperty v StringListProperty objs?
It turns out the problem was not with the font, but with improperly encoding the data response from the server as ASCII when I should have used UTF-8. It appears the font supported Unicode to begin with.
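The same class of bug can be illustrated outside the iPhone SDK. This Python sketch uses a made-up Korean payload and a hypothetical helper name to show how the choice of codec, not the font, decides whether the text survives.

```python
# A made-up server response body: Korean text sent as UTF-8 bytes.
payload = "한국어".encode("utf-8")


def decode_response(raw: bytes, encoding: str) -> str:
    """Decode a response body with an explicitly chosen encoding."""
    return raw.decode(encoding)


print(decode_response(payload, "utf-8"))  # the intended text

try:
    decode_response(payload, "ascii")
except UnicodeDecodeError as exc:
    print("ASCII cannot represent these bytes:", exc.reason)

# A lenient ASCII decode keeps going but replaces every byte with U+FFFD.
print(payload.decode("ascii", errors="replace"))
```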