Is there a list somewhere of which unicode characters are well supported? I.e. if I used these characters in an application or on a web page, there's a good chance that the user will see what I see, and not a question mark or a square.
http://apps.timwhitlock.info/emoji/tables/unicode This is a good start. This shows what a number of unicode characters look like on several common platforms. But this list is limited to emoji. I'm more interested in things like arrows and mathematical symbols.
Of course, I can always see which characters look good on my computer, phone, web browser, favorite font, etc. But I want to know what will work well for most other people, too.
If you need to know if a Unicode character is assigned you should check the official Unicode Chart.
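As a quick complement to the charts, here is a minimal sketch using Python's standard `unicodedata` module, which reports whether a code point is assigned in the Unicode version bundled with the interpreter. The helper name `describe` is made up for illustration. Note that "assigned" says nothing about whether a user's fonts actually contain a glyph.

```python
import unicodedata

def describe(code_point):
    """Return (assigned?, name) for a code point, per Python's bundled Unicode data."""
    ch = chr(code_point)
    # General category "Cn" means the code point is unassigned in this Unicode version.
    assigned = unicodedata.category(ch) != "Cn"
    name = unicodedata.name(ch, "<no name>")
    return assigned, name

print("Unicode data version:", unicodedata.unidata_version)
for cp in (0x2192, 0x2211, 0x0378):
    print(f"U+{cp:04X}", describe(cp))
```

The result depends on the Unicode version shipped with your Python build, so a very recent character may be reported as unassigned by an older interpreter.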
Wikipedia has a good list of fonts which cover the most characters. See Unicode Font # List of Unicode fonts.
Arial is preinstalled on almost every machine and covers a lot of characters. Not to forget Noto from Google - a font collection which covers almost every character you will ever come in touch with.
For a fast lookup of Unicode characters I recommend Fileformat.info.
But I want to know what will work well for most other people, too.
I would go with Arial, or Times New Roman, or make your decision platform dependent.
Related
I am developing a website in the Georgian language. The Georgian alphabet has its own Unicode range, but there are also special fonts which have Georgian glyphs in place of English characters, a bit like the "Symbol" and "Dingbats" fonts.
For example the string "saqarTvelo" will be rendered as "საქართველო" with these fonts. So now I have two options and don't know what to do:
Using Georgian Unicode for my website, but the problem is that all fonts are created for English Unicode, and don't work with Georgian Unicode.
Using Georgian fonts with English Unicode. But I don't know how search engines will react.
Please tell me what to do, I am stuck!
The short answer is that using the approach you mean in option 2, search engines will see the word “საქართველო” in your text as “saqarTvelo”, so normal searches will fail.
The question seems to refer to two different ways of using Georgian letters on web pages:
Using Unicode encoding, so that characters will be rendered using a Unicode-encoded font (which is what most fonts are, but most fonts don’t contain Georgian letters).
Using a nonstandard, “private” encoding, usually one that maps 256 different code positions (8-bit combinations) to whatever characters are needed for some purposes. This presumes that the text is rendered using a font encoded the same way.
Method 2 can be characterized as a wrong approach, but it has been used on the web, especially in the early days (when CSS was not yet available and one had to resort to <font face=...> for setting the font). It really does not work unless the user’s computer has the specific, “privately” encoded font (or some font encoded exactly the same way). Since search engines are font-agnostic, they only see the 8-bit codes and try to interpret them in the encoding declared or implied for the page, not in the “private” encoding (which cannot be declared since it has no published definition and no standard name, or any name for that matter).
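To make the search-engine problem concrete, here is a small sketch in Python. It shows that text written for a "privately" encoded font is stored as plain Latin letters, and how such text could be converted to real Georgian Unicode. The mapping table is an assumption inferred only from the single example word in the question ("saqarTvelo"), so it is deliberately partial, and the names `LEGACY_TO_GEORGIAN` and `legacy_to_unicode` are made up for illustration.

```python
# Partial, illustrative mapping inferred only from the example word "saqarTvelo";
# a real legacy font would define many more positions.
LEGACY_TO_GEORGIAN = {
    "s": "ს", "a": "ა", "q": "ქ", "r": "რ", "T": "თ",
    "v": "ვ", "e": "ე", "l": "ლ", "o": "ო",
}

def legacy_to_unicode(text):
    """Replace each mapped Latin letter with its Georgian letter; leave others as-is."""
    return "".join(LEGACY_TO_GEORGIAN.get(ch, ch) for ch in text)

stored = "saqarTvelo"
# What a font-agnostic program (such as a search engine) sees: basic Latin code points.
print([f"U+{ord(c):04X}" for c in stored])
# The same word as real Georgian Unicode text.
print(legacy_to_unicode(stored))
```

A one-off conversion like this is one way to migrate existing legacy-encoded content to Unicode, provided the full mapping of the legacy font is known.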
Method 1 has the problem that for it to work, the user’s computer needs to have some (Unicode-encoded) font that supports the characters used. Nowadays, this can be reasonably well solved using a downloadable font (web font) via @font-face. Fonts that support Georgian letters include some useful free fonts like DejaVu fonts, GNU Freefont fonts, and Quivira. For more info on this approach, see my Guide to using special characters in HTML.
Using method 1, search engines will see the Georgian letters correctly, provided that the document’s encoding (normally UTF-8) has been properly declared or can be inferred by the search engine.
Is there a standard governing Unicode font support expected of all browsers?
The latest version of Unicode contains a repertoire of more than 110,000 characters covering 100 scripts. I don't expect the browsers to support all of them, but there should be minimum support for some characters such as letters from the Latin script, common punctuation, and symbols of type math, currency, and other.
I am currently having problems displaying U+060B AFGHANI SIGN (؋) and U+202F NARROW NO-BREAK SPACE in the Android browser. I wonder if there is a list of universally recognized Unicode characters so that developers can use them confidently without having to worry about browser display issues.
There is no standard on Unicode support in browsers. Besides, the ability to display a character mostly depends on fonts, though browsers differ in their abilities in scanning through fonts. Normally what you can do is to specify a suitable font-family list of fonts that each support all the characters you need. For generalities on this, see my Guide to using special characters in HTML.
On Android, the problem is that there is a very limited set of fonts. If you need any characters beyond what is supported by them, you need to use a downloadable font, via @font-face.
The currency symbol “؋” U+060B AFGHANI SIGN is present in about a dozen fonts, but the only free font among them (if we don’t count the bitmap font GNU Unifont) appears to be Scheherazade.
For U+202F NARROW NO-BREAK SPACE, font support is wider. But in general, it is often better to use other methods than such characters. Many fonts contain this character as almost as wide as a normal space, and its description in the Unicode standard as regards to its width is vague: “a narrow form of a no-break space, typically the width of a thin space or a mid space”. “Thin space” is described as “a fifth of an em (or sometimes a sixth)” in the Unicode standard, and in reality its width varies. And “mid space” is really an undefined concept.
For example, if the text is in a language that uses spaces as thousands separators, you could in principle write a number like 100 000 as 100&#x202F;000, but it’s better to write, say,
<span class="gr">100 000</span>
with CSS code like .gr { word-spacing: -0.15em }.
AFAIK, all browsers support @font-face for loading webfonts and can support any character within those fonts. As such, you should be able to display any character in any browser if you make sure you provide access to a webfont with support for those characters.
To avoid using giant fonts just to support a few special characters, you can create your own fonts with tools like the Icomoon App.
I used the Icomoon App to create the Emoji emoticon font as well as for creating custom icon fonts on a per project basis.
For more info on the use or creation of icon fonts (or other webfonts), see Create webfont with Unicode Supplementary Multilingual Plane symbols
I'm trying to figure out why characters like this : 👉 show up like empty boxes. They are unicode characters though and charset is utf-8.
Can it be a font problem which doesn't have a glyph for that? Any ideas?
Details: it is an HTML page, and I use Firefox 16.0.1 on Windows 7. I don't see this glyph on a page like this post either.
Thanks
The character you have there is the Unicode Character 'WHITE RIGHT POINTING BACKHAND INDEX' (U+1F449). On that page, you can find a list of known fonts supporting the character behind the link Fonts that support U+1F449.
Font
LastResort
Segoe UI Emoji
Segoe UI Symbol
Symbola
None of those fonts is used here on stackoverflow.com, so you'll also see an empty box.
If this occurs on your own website, and you'd like to fix it, then you'd need to supply a supporting font along with the webapp by CSS @font-face, or in this particular case perhaps better, look for a CSS based icon library such as Font Awesome. The <i class="fa fa-hand-o-right"> comes very close to this character.
The character U+1F449 was added to Unicode in version 6 in 2010, and it generally takes about ten years from the adoption of a character into Unicode before it is widely supported in fonts.
The few fonts that contain it now include Symbola and Segoe UI Symbol. If you have either of them installed, you’ll probably see it; otherwise not. Segoe UI Symbol is shipped with Windows 8 and apparently with (at least some variants of) Windows 7, though the Windows 7 version may be limited – an update is available from Microsoft. Symbola is a free font, so you could in principle use it as a downloadable font (via @font-face), but its file size is rather large.
Web browsers are supposed to use fallback fonts if the fonts specified for an element do not contain a glyph for some character in the content. Firefox generally implements this well; IE does not, especially in older versions. So if you use the character on a web page, it is best to wrap it in an element of its own (usually span is used for the purpose) and set the following on it in CSS:
font-family: Segoe UI Symbol, Symbola;
But this will as such (without @font-face) work only for people using computers that contain one of the fonts.
Missing font characters will usually be substituted with other fonts, and UTF-8 should be able to display all unicode characters. I suspect that the encoding of your file (how it is saved by your editor), does not match the declaration in the meta tags of your HTML page.
You can check your page with this W3-checker, it can possibly give you hints about the problem of your page.
EDIT:
You are right, it's not an encoding problem, the number of the character has such a high number, that the "normal" fonts do not support it. Maybe you can use one of those ☛ ☞, otherwise you would have to use a web font, and fonts with full unicode support can be quite large.
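The "high number" here means the character lies outside the Basic Multilingual Plane (plane 0). A minimal Python sketch of that distinction follows; the helper name `plane_of` is made up for illustration. The plane alone does not decide font support, but it helps explain why the two suggested replacement characters are more likely to be covered by ordinary fonts.

```python
def plane_of(ch):
    """Return the Unicode plane of a single character (0 = Basic Multilingual Plane)."""
    return ord(ch) >> 16

for ch in "👉☛☞":
    print(f"U+{ord(ch):04X} is in plane {plane_of(ch)}")
```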
I saved the "face savouring delicious food" emoji to a database, and when I read it back in PHP, json_encode shows "\uD83D\uDE0B". Usually we replace such an emoji with an <img /> tag.
However, I usually only find the format '\uE056', not "\uD83D\uDE0B", to replace with the picture E056.png.
I don't know how to get the picture according to '\uD83D\uDE0B'. Does anyone know?
What is the relation between '\uD83D\uDE0B' and '\uE056'? Do they both represent the emoji "savouring delicious food"?
The Unicode character U+1F60B FACE SAVOURING DELICIOUS FOOD is a so-called Plane 1 character, which means that its UTF-16 encoded form consists of two 16-bit code units, namely 0xD83D 0xDE0B. Generally, Plane 1 characters cause considerable problems because many programs are not prepared to deal with them, and few fonts contain them.
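The relation between the code point and its two UTF-16 code units is plain arithmetic. The following Python sketch shows the conversion in both directions and decodes the JSON escape from the question. The helper names `to_surrogates` and `from_surrogates` are illustrative, not part of any library.

```python
import json

def to_surrogates(code_point):
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = code_point - 0x10000
    # Top 10 bits go into the high surrogate, bottom 10 bits into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x1F60B)
print(f"{high:04X} {low:04X}")

# Python's JSON decoder joins the escaped surrogate pair into one character.
decoded = json.loads('"\\uD83D\\uDE0B"')
print(f"U+{ord(decoded):04X}")
```

Once the real code point is recovered, it can serve as a lookup key (for example for an image file name), which a raw surrogate pair cannot do as conveniently.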
According to http://www.fileformat.info/info/unicode/char/1f60b/fontsupport.htm this particular character only exists in DejaVu fonts and in Symbola, but the versions of DejaVu I’m using don’t contain it.
Instead of dealing with the problems of encodings (which are not that difficult, but require extra information), you can use the character reference &#x1F60B; in HTML. But this does not solve the font problem. I don’t know about iPhone fonts, but in general in web browsing, the odds of a computer having any font capable of rendering the character are probably less than 1%. So you may need to use downloadable fonts. Using an image is obviously much simpler and mostly more reliable.
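A hexadecimal character reference is just the code point written between `&#x` and `;`. Here is a small Python sketch that builds one and checks it with the standard `html` module; the function name `char_reference` is made up for illustration.

```python
import html

def char_reference(ch):
    """Return the hexadecimal HTML character reference for a single character."""
    return f"&#x{ord(ch):X};"

ref = char_reference("\U0001F60B")
print(ref)
# Round trip: the standard library resolves the reference to the same character.
print(html.unescape(ref) == "\U0001F60B")
```

Because the reference consists of ASCII characters only, it survives a wrong or missing encoding declaration, but the font problem described above remains.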
U+E056 is a Private Use code point, which means that anybody can make an agreement about its meaning with his brother or with himself, without asking anyone else. A font designer may assign any glyph to it.
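Whether a code point is Private Use can be checked mechanically. A minimal sketch with Python's standard `unicodedata` module follows (the helper name `is_private_use` is illustrative); it confirms that U+E056 has no standard meaning while U+1F60B is a regular assigned symbol.

```python
import unicodedata

def is_private_use(ch):
    """True if the character's general category is 'Co' (private use)."""
    return unicodedata.category(ch) == "Co"

for ch in ("\ue056", "\U0001F60B"):
    print(f"U+{ord(ch):04X} private use: {is_private_use(ch)}")
```

So any mapping between the two values has to come from a vendor-specific conversion table, not from Unicode itself.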
IMPORTANT: As of this posting, the only browser that doesn't automatically support emojis is chrome.
FOR CHROME:
Depending on what server side language you are using, you should be able to find a library that converts emojis for you. I recently needed to solve this issue with php and used this library:
https://github.com/iamcal/php-emoji
The creator essentially created a sprite and adjusts the CSS according to the Unicode value of the emoji. It isn't pretty, but luckily he/she did all the grunt work for you. If you're using a different language, you should be able to find something similar.
how do I put those little boxes into a php file?
Same way as any other Unicode character. Just paste them and make sure you're saving the PHP file and serving the PHP page as UTF-8.
When I put it into a php file, it turns into question marks and what not
Then you have an encoding problem. Work it out with Unicode characters you can actually see properly first, for example ąαд™日本, before worrying about the emoji.
Your PHP file should be saved as UTF-8; the page it produces should be served as Content-Type: text/html; charset=UTF-8 (or with a similar meta tag); the MySQL database should use a UTF-8 character set and collation to store data, and PHP should talk to MySQL using UTF-8.
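The two typical symptoms of a broken link in that chain can be reproduced in a few lines. This is a Python sketch rather than PHP, purely for illustration: decoding UTF-8 bytes with the wrong encoding yields garbage, and forcing text into an encoding that cannot represent it yields question marks.

```python
text = "ąαд™日本"

# Correct chain: encode as UTF-8, decode as UTF-8.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text

# Wrong declaration: UTF-8 bytes interpreted as Latin-1 give mojibake.
mojibake = utf8_bytes.decode("latin-1")
print(repr(mojibake))

# Lossy conversion: characters outside the target encoding become "?".
question_marks = text.encode("ascii", errors="replace").decode("ascii")
print(question_marks)
```

Mojibake is usually still recoverable by reversing the wrong decoding step, whereas the question marks mean the original characters are already lost.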
However. Even handling everything correctly like this, PCs will still not show the emoji. That's because:
they don't have fonts that include shapes for those characters, and
emoji are still completely unstandardised. Those characters you posted are in the Unicode Private Use Area, which means they don't have any official meaning at all.
Each network in Japan uses different character codes for their emoji, mapped to different areas in the PUA. So even on another mobile phone, it probably won't display the correct character, unless you spend ages manually converting emoji codes for different networks. I'm guessing the ones you posted above are from SoftBank (iPhone?).
There is an ongoing proposal led by Google and Apple to collate the different networks' emoji and give them a proper standardised place in Unicode. Until then, getting emoji to display consistently across networks is an exercise in unhappiness. See the character overview from the standardisation work to see how much converting you would have to do.
God, I hate emoji. All that pain for such a load of useless twee rubbish.
I'm using Unicode symbols on a web page as graphic components.
I need to be able to rely on the way these Unicode characters are rendered.
Here is a simplified example of what I'm trying to build.
You can see that the Unicode characters render differently on different computers.
Chrome under OSX:
Chrome under Windows:
I only need to support modern browsers, so @font-face and Google Fonts are allowed.
Updated
I know the problem is that the chosen font lacks the special characters, and that finding one that has them and is compatible with @font-face or Google Fonts would be the solution. But this is the real problem: how to find a font with these characteristics.
The most likely answer is that your selected font has no glyphs defined for those unicode code-points (and from perusing the font, that seems to be the case) and you will need to switch to a font that has glyphs defined for those code-points.
When a font has no defined glyph for a Unicode code-point, it's up to the platform to figure out how to handle it. Windows used to simply show a square box for anything that wasn't defined, but since Windows Vista (or maybe Windows 7), it will now display a glyph from the system default font, if that's available. What you are most likely seeing for your unicode characters are the versions from the system default fonts - which, of course, are not the same on Windows and Mac.
You should try and find a font that a) contains all the characters you need, b) can be legally used as a downloadable font via @font-face.
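Criterion a) can be checked mechanically once you know which code points a font covers. The Python sketch below assumes you already have that set; in practice it might be read from the font file's character map with a font library, which is not shown here. The function name `missing_characters` and the Basic Latin coverage set are hypothetical, chosen only for illustration.

```python
def missing_characters(text, covered_code_points):
    """Return the characters of `text` the font does not cover, in first-seen order."""
    missing = []
    for ch in text:
        if ord(ch) not in covered_code_points and ch not in missing:
            missing.append(ch)
    return missing

# Hypothetical coverage for illustration: printable Basic Latin only.
basic_latin_only = set(range(0x20, 0x7F))

sample = "a # ⋕ →"
for ch in missing_characters(sample, basic_latin_only):
    print(f"not covered: U+{ord(ch):04X}")
```

Running such a check against each candidate font narrows the search to fonts that contain every symbol you need, before you look at licensing.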
You are now using the Fredoka One font, but it contains a very limited character repertoire. The first four characters that you are trying to show are not there (not even “⋕”, since it is quite distinct from the ASCII character “#” despite visual similarity). Since the font-family rule next specifies fantasy, browsers will try whatever fancy font they have been set to use as a generic fantasy font, and it probably hasn’t got them either, since fantasy fonts tend to have a limited repertoire. Browsers then go their own ways, possibly using various fonts.
Those four characters are rare in fonts, and the fonts containing them have no similarity with Fredoka One in style. So you may need to reconsider the approach.
Some notes on using special characters in HTML: http://www.cs.tut.fi/~jkorpela/html/characters.html