Unicode character sets & encoding in browsers - unicode

I'm trying to find out how character sets/encoding are implemented in browsers, specifically Unicode.
Are sets/encodings implemented separately in each browser or is it OS specific?
Is it possible to find out what version of the Unicode Character Database (UCD) is being used?
How are UCD updates pushed to each browser/OS? (Is it ever pushed out via automatic updates or is it just set for whatever version browser/OS you're using?)
Links to character sets/encoding information for each browser/OS manufacturer would be nice.
Thanks

I don't believe the browsers worry about the UCD at all.
A well-formed page declares its charset. Example: <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
Any text that is being displayed will have a list of fonts defined for it (in preferred order). Example:
p { font-family: Verdana, Arial, sans-serif; }
For any character on the page the browser simply looks up the glyph in the first font in the list. If there isn't one it moves to the next font in the list. If it runs out of listed fonts entirely it probably just falls back to whatever catch-all font the OS provides (Arial, for instance).
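The per-character lookup described above can be sketched roughly as follows. This is only an illustration: the coverage table, the `SYSTEM_FALLBACK` label, and the `pick_font` helper are hypothetical, and real browsers consult the fonts' own character maps rather than a hand-written dictionary.

```python
# Hypothetical coverage data: the code points each font is assumed to have
# glyphs for. Real coverage comes from the font files themselves.
FONT_COVERAGE = {
    "Verdana": set(range(0x20, 0x7F)),
    "Arial": set(range(0x20, 0x7F)) | {0x263A},
}

# Placeholder name for whatever last-resort font the OS supplies.
SYSTEM_FALLBACK = "system fallback"


def pick_font(char, font_stack):
    """Return the first font in font_stack that has a glyph for char."""
    for font in font_stack:
        if ord(char) in FONT_COVERAGE.get(font, set()):
            return font
    # No listed font covers the character, so the OS fallback is used.
    return SYSTEM_FALLBACK


stack = ["Verdana", "Arial"]
for ch in ("A", "\u263a", "\u4e2d"):
    print("U+%04X -> %s" % (ord(ch), pick_font(ch, stack)))
```

The point of the sketch is that the choice is made per character, not per paragraph, so a single line of text can end up drawn from several fonts.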


Is it safe to remove the <DOCTYPE ...> in the post-IE era? [duplicate]

Possible Duplicate:
HTML: What is the functionality of !DOCTYPE
I recently asked a question here and the solution was a simple:
You need to add a doctype to the page. This should fix the issue for you.
Now, my pages work fine in every browser without the doctype (except IE). Does IE need a doctype (is this an IE-only thing) and do other browsers just assume it, or is it doing something I'm not seeing?
What are its functions and how does it work?
All browsers need the doctype. Without the DOCTYPE you are forcing the browsers to render in Quirks Mode.
However, DOCTYPE was only partially used by the browsers in determining dialect and parsing, even though that was the purpose. This is why HTML5 has reduced the DOCTYPE to simply:
<!DOCTYPE html>
2.2. The DOCTYPE
The HTML syntax of HTML5 requires a DOCTYPE to be specified to ensure that the browser renders the page in standards mode. The DOCTYPE has no other purpose and is therefore optional for XML. Documents with an XML media type are always handled in standards mode. [DOCTYPE]
The DOCTYPE declaration is <!DOCTYPE html> and is case-insensitive in the HTML syntax. DOCTYPEs from earlier versions of HTML were longer because the HTML language was SGML-based and therefore required a reference to a DTD. With HTML5 this is no longer the case and the DOCTYPE is only needed to enable standards mode for documents written using the HTML syntax. Browsers already do this for <!DOCTYPE html>.
Source: HTML5 differences from HTML4: DOCTYPE
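As a rough illustration of the case-insensitivity mentioned in the quote, a build script could check whether a page starts with the short HTML5 doctype. The `has_html5_doctype` helper below is hypothetical and deliberately simplified: it only recognises the short form at the very start of the document, whereas real parsers also accept legacy doctype strings and allow comments before the doctype.

```python
import re

# Matches the short HTML5 doctype at the start of a document, ignoring case
# and surrounding whitespace. Longer legacy doctypes are not matched.
_HTML5_DOCTYPE = re.compile(r"\s*<!doctype\s+html\s*>", re.IGNORECASE)


def has_html5_doctype(document):
    """Return True if the document begins with <!DOCTYPE html> (any case)."""
    return _HTML5_DOCTYPE.match(document) is not None


print(has_html5_doctype("<!DOCTYPE html><html></html>"))
print(has_html5_doctype("<html></html>"))
```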
The Doctype does two things.
It identifies which dialect of HTML you're using.
It controls whether the browser uses "standards" or "quirks" mode to render the document.
If there is no doctype, or there's an unrecognized one, then it uses "quirks" mode and interprets the document as best it can. If there IS a doctype, and it recognizes it, then it follows the standards. The results of the rendering can vary depending on how it interprets the document.
Why?
Why specify a doctype? Because it defines which version of (X)HTML your document is actually using, and this is a critical piece of information needed by some tools processing the document.
For example, specifying the doctype of your document allows you to use tools such as the Markup Validator to check the syntax of your (X)HTML. Such tools won't be able to work if they do not know what kind of document you are using.
But the most important thing is that with most families of browsers, a doctype declaration will make a lot of guessing unnecessary, and will thus trigger a "standard" rendering mode.
Source: http://www.w3.org/QA/Tips/Doctype
You should have a DOCTYPE for ANY browser. It tells the browser how to interpret the html and css. This is why html4 and html5 have different definitions (as does xhtml). All very important for validation.
What IE will do is put the document into what it calls 'quirks mode' which basically ignores a whole heap of rules for how CSS should (by modern definitions) behave. Here is a good summary of the issue. It harks back to the bad old days of non-standardised CSS support
Browsers need, at the least, to render in what is known as standards mode. See John Resig's article on the HTML 5 doctype: http://ejohn.org/blog/html5-doctype/. If you want your browser to ignore the standards and render like it's 1990, go ahead and leave the doctype out, and you will see floats and other now-standard features stop working correctly. If you want your page to render/validate according to a particular standard then you would add more to the doctype, but it is not necessary.
From W3Schools, a doctype is "an instruction to the web browser about what version of the markup language the page is written in." (http://www.w3schools.com/tags/tag_doctype.asp)
If you do not include the doctype, the browser may assume you are using a different language than you really are, causing it to be rendered incorrectly.
From W3Schools.com:
The doctype declaration is not an HTML tag; it is an instruction to the web browser about what version of the markup language the page is written in.
There are a handful of different doctypes, and changing them can drastically change how your page renders.
The doctype declaration should be the very first thing in an HTML document, before the <html> tag.
The doctype declaration is not an HTML tag; it is an instruction to the web browser about what version of the markup language the page is written in.
The doctype declaration refers to a Document Type Definition (DTD). The DTD specifies the rules for the markup language, so that the browsers render the content correctly.
Reference

Inaccessible glyphs and symbols in Google Fonts

Some glyphs that are shown on the Google Fonts specimen sheet are not available once the font is implemented on a site.
For example, look at this preview for Piazzolla:
https://fonts.google.com/specimen/Piazzolla?preview.text=piazzolla%20%E2%84%A6%E2%86%92%E2%86%92%E2%86%97%E2%86%97&preview.text_type=custom&query=piazzolla#standard-styles
Notice how the arrows are using the custom glyph provided by the font.
Then, compare that to this codepen that uses the same font, but the arrows are not using the same glyph.
<div></div>
(random code block to appease stackoverflow because there is no code that needs to be embedded in the question.)
This leads me to believe that Google is not serving up the entire font, and there might be a way to have access to more characters.
Any help would be greatly appreciated. Thanks!
The Google Fonts API has an advanced feature for this, but you have to read the manual closely (https://developers.google.com/fonts/docs/getting_started) and know how to use the API to do what you want.
Here's a working demo using the arrows in IBM Plex:
https://jsbin.com/neheyuxira/2/edit?html,output
And a fork of your page with the same technique applied
https://codepen.io/davelab6/pen/bGRpJQP
The trick is to first add an API URL that uses the text feature of the API to specify the Unicode characters you want (URL-encoded, e.g. with https://r12a.github.io/app-encodings), and then the regular API URL.
<link href="https://fonts.googleapis.com/css?family=IBM+Plex+Mono|IBM+Plex+Sans|IBM+Plex+Sans+Condensed|IBM+Plex+Serif&text=%E2%86%B3" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=IBM+Plex+Mono|IBM+Plex+Sans|IBM+Plex+Sans+Condensed|IBM+Plex+Serif" rel="stylesheet">
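If you would rather generate the URL-encoded text parameter than encode it by hand, a small script can do it. This is a minimal sketch: the `text_subset_url` helper is hypothetical, and it simply reproduces the URL shape used in the first <link> above.

```python
from urllib.parse import quote


def text_subset_url(families, text):
    """Build a Google Fonts CSS URL that requests only the glyphs in text."""
    # Family names use + for spaces and | between families, as in the links above.
    family_param = "|".join(name.replace(" ", "+") for name in families)
    # quote() percent-encodes the UTF-8 bytes of each character.
    return (
        "https://fonts.googleapis.com/css?family="
        + family_param
        + "&text="
        + quote(text, safe="")
    )


# U+21B3 is the arrow used in the IBM Plex demo above.
print(text_subset_url(["IBM Plex Mono", "IBM Plex Sans"], "\u21b3"))
```

For the arrow U+21B3 this produces the same %E2%86%B3 value that appears in the first <link> tag.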

Should HTML email template use table element for the layout?

I have seen a bunch of HTML email template examples and all of them use the <table> element for layout. Is there any specific reason for using <table>? I tried making one without it and it works for me. Should I be worried that it might break for someone else with a different browser?
The main reason tables are still used nowadays is to support Outlook 2007/2010/2013. Those versions use Microsoft Word's engine to render HTML, and it's quite a mess. It has limited support for CSS (no float or position, for example), and some CSS properties are only supported on some specific HTML elements. For example, padding is supported on a <td>, but not on a <div>. And even when you could theoretically use padding on more semantic elements, or use margin on elements instead, Word's rendering engine is still massively bugged and can behave unpredictably with such HTML and CSS code. Thus, developers find it easier to just use <table> instead.
But here's the thing: if you don't feel like you need to support Outlook 2007/2010/2013, then you can absolutely ditch tables and use better code instead. And even if you need to support it, simple one-column layouts can be done without tables. The reason your template works in Outlook 2011 is that this version (for Mac only) uses the WebKit rendering engine (just like Safari or Apple Mail).
Referring to this old post (why-is-it-still-recommended-to-use-tables-for-email-structure) and some of my own experiments:
We can definitely use other HTML tags and not just the <table> tag. It renders well in modern browsers. I have personally tested Chrome (which most probably means it works in all Chromium-based browsers) and Safari.
Another thing I noticed is that the email clients stripped the template down to the main content. In other words, they only rendered what's inside the <body> tag and removed other tags like <html> and <head>, including the <body> tag itself. So I don't use those tags in my template at all.

How to use Unicode symbols on webpages?

I'm using some Unicode symbols on a webpage I'm making. For purposes of this example, let's say it's this guy: '☺'.
As I understand it, under the correct implementation of CSS, you can set any font you want, and if it runs into a character that is not present in that font, it will start falling back through the font-family backup choices until it finds one that works.
In light of that, I have my font-family set up like this in css:
font-family: Tahoma, Helvetica, "Arial Unicode MS", sans-serif
My rationale is that Tahoma comes bundled with Windows. However, I found out online that only some versions of Windows' bundled Tahoma had Unicode support. Helvetica is a font that is similar to Tahoma for Macs. "Arial Unicode MS" comes bundled with Office 2000 and up and definitely supports Unicode. sans-serif is the fallback in all cases and should also hopefully support Unicode.
For the most part, this works well. However, as it is wont to do, Internet Explorer seems to be ruining my well-laid plan. I can't figure out what the pattern is, as I'm seeing it on one computer running Vista with IE8, and another on Windows XP with IE7, but it works fine on my development machine with Win7 with IE8/IE7 Tester/IE6 Tester. I have found the claim on some obscure webpages that on old versions of IE, it will only look for the first font that it has, and then use that for everything, even if that font is missing a given symbol, but this doesn't explain why it's happening on Vista/IE8. Thus, my lovely Unicode symbols turn up as boxes to some, but not all IE users.
What's the recommended way to be handling Unicode symbols on the web? Are they just not usable for projects where wide browser compatibility is needed? Should I be looking to include code specifically to handle old IE? Are there any other gotcha situations or platforms I should be worrying about?
Edit: Updated with new information on systems this is failing on.
only some versions of Windows' bundled Tahoma had Unicode support
It's not really “Unicode support”. Tahoma supports Unicode in as much as it has Unicode code point lookup tables. That doesn't mean you get a glyph for every character defined by Unicode... actually almost no font has glyphs for every character.
No version of Tahoma includes a glyph for U+263A White Smiling Face, so your code is a test of font fallback capabilities, something IE (especially IE6) is bad at, compared to other browsers. A more common Windows-bundled font that does include U+263A is plain Arial (not “Arial Unicode”), since version 3.00 (included in WinXP).
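When checking font coverage it helps to confirm exactly which code point a symbol is before looking it up in a font's character list. A minimal sketch using Python's standard unicodedata module (the `describe` helper is just an illustrative name):

```python
import unicodedata


def describe(char):
    """Return the code point and Unicode name of a single character."""
    return "U+%04X %s" % (ord(char), unicodedata.name(char, "<unnamed>"))


# The smiley from the question.
print(describe("\u263a"))  # U+263A WHITE SMILING FACE
```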
You can use IE overrides in your CSS file to create different behavior for older versions of Internet Explorer. The overrides are a # or _ prefix on the property name, depending on the version of Internet Explorer.
Put an _ before each property in your CSS for Internet Explorer 6.0 and earlier
Put a # before each property in your CSS for Internet Explorer only (in practice, IE 7 and earlier)
Example:
/* Normal */
font-family: Tahoma, Helvetica, "Arial Unicode MS", sans-serif;
/* IE 6.0 and earlier */
_font-family: Tahoma, Helvetica, sans-serif;
/* IE only */
#font-family: Tahoma, Helvetica, "Arial Unicode MS", sans-serif;
As I understand it, under the correct implementation of CSS, you can set any font you want, and if it runs into a character that is not present in that font, it will start falling back through the font-family backup choices until it finds one that works.
Unfortunately, that's not how it works in Internet Explorer, at least not in older versions. These browsers only use the first font family available on a system. One approach I sometimes use is to add a separate CSS class for Unicode characters:
<span class="unicode">[Unicode character]</span>
.unicode { font-family: "Lucida Sans Unicode", ..., sans-serif; }
I found Lucida Sans Unicode to be a good choice as it's preinstalled on all Windows versions since Windows 98. Its Unicode support isn't as complete as you'd expect, though.
But I started to prefer icon fonts. They have the advantage that the glyphs always look the same regardless of the font actually used. As John Slegers mentions in his answer, there are great online tools to create your own customized icon font.

Use of HTML 5 doctype creates a gap at top of page on iphone safari browser

Update: Please disregard, my problem was caused by an advertisement bar being inserted by the vendor who provides my workplace wireless service.
I was building a mobile-friendly website and wanted to use HTML 5. However, when I specify the doctype as <!DOCTYPE HTML>, I get a gap at the top of the page in Safari on the iPhone.
I notice that other sites, such as nextstop.com and nike.com, have the same problem.
I guess Safari does not fully support HTML 5 yet. Does anybody know of a workaround?
HTML 5 is still in a very unstable state. Don't use it in a production environment.
Edit Just so you guys know what it's about, HTML 5 is currently an Editor's Draft, and the document clearly states (in the Status of This Document section) that this specification is not stable, and that a consensus may not have been reached on any of the proposed sections. I think it should be clear enough that it means it's a bit early to start using it.
All browsers correctly interpret the HTML doctype. Including it puts the browser into standards-compliant mode; that is the only difference between having the doctype and not having it.
You can use a CSS reset tool like http://meyerweb.com/eric/tools/css/reset/ to get rid of default margins and padding on all elements.