The Rocky Road to HTML5

The best-laid plans sometimes go astray, and so it was with the World Wide Web Consortium (W3C) road map for web page structure. Those familiar with web design know the building blocks of web pages: (X)HTML, CSS, plus scripting and programming languages.

  • (X)HTML gives structure to web content.
  • CSS controls web page presentation.
  • Scripts and programming languages add functionality.

The W3C road map for content structure went along the following path up until recently:
HTML4 > XHTML1.0 > XHTML1.1.

While each recommendation added features, such as new elements and attributes, the key requirement progressively introduced along this road map was that documents be "well formed". What does this mean? Essentially, a well formed document is one that obeys XML's syntax rules: every element is closed, elements nest properly and attribute values are quoted. Such a document can then be parsed and validated against a reference called a Document Type Definition (DTD). A DTD specifies a document hierarchy by prescribing what markup elements are allowed and the sequence in which they must appear. This is how XML works, and it applies to both XHTML1.0 and 1.1, but it was "excused" in XHTML1.0 in favour of more tolerant error parsing.
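As a rough illustration of what "well formed" means in practice, the following Python sketch feeds two fragments to the standard library's XML parser. The function name is_well_formed and the sample markup are made up for this example. Note that the check only covers XML syntax; it does not validate the markup against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise.

    Hypothetical helper for illustration: it tests syntax only
    (closed and properly nested elements), not DTD validity.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


good = "<p>Hello <em>world</em></p>"
bad = "<p>Hello <em>world</p></em>"  # elements closed in the wrong order

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

A forgiving HTML parser would still display the text of the second fragment; a strict XML parser rejects it outright.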

The truth is that, by some estimates, as many as 99% of published web pages contain markup errors, but neither HTML4 nor XHTML1.0 triggers error responses that stop page content from being displayed. This is because all browsers have elaborate and forgiving error correction routines. However, this tolerance was to be removed with the introduction of XHTML1.1, thereby forcing authors to produce "well formed" documents. A document that did not parse correctly would not have its incorrectly marked up content displayed by web browsers.

Representatives from Mozilla and Opera petitioned the W3C, arguing that the evolution of HTML should continue and, among other issues, should remain backward compatible with HTML4, including the retention of error correction routines so that incorrect markup would not cause fatal errors. The W3C rejected this proposal on the basis that it conflicted with its previously chosen direction for the evolution of the Web, centered on developing XML-based replacements for HTML4. Those interested in evolving HTML4, including people from Apple, Mozilla and Opera, formed their own working group called the WHATWG. The group still has its own website at http://www.whatwg.org/.

Web standard recommendations suddenly traveled on a rocky road with the W3C position becoming untenable. A few months down the road, the W3C officially abolished its XHTML1.1 working group and announced the termination of the XHTML candidate recommendation. HTML5 was officially born and the W3C and WHATWG are again steering the same course. What does all this mean for the web community?

HTML5 replaces the XHTML de-facto standard but still lets authors have their content parsed and rendered as XML by web browsers. The choice between HTML and XML is made through the MIME type with which the document is transmitted. Browsers take this from the Content-Type header of the HTTP response; the corresponding declarations in the document header are as follows:

  • For HTML, <meta http-equiv="Content-Type" content="text/html">
  • For XML, <meta http-equiv="Content-Type" content="application/xhtml+xml">
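To make the choice concrete, here is a small Python sketch of how a client might map a Content-Type value to a parsing mode. The function parsing_mode is hypothetical and is not taken from any browser. It assumes the registered XHTML type application/xhtml+xml and also treats the generic XML types as XML.

```python
def parsing_mode(content_type: str) -> str:
    """Map a Content-Type value to "html", "xml" or "unknown".

    Hypothetical helper for illustration only. Parameters such as
    "; charset=utf-8" are ignored when deciding the mode.
    """
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"
    if mime in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "xml"
    return "unknown"


print(parsing_mode("text/html; charset=utf-8"))  # html
print(parsing_mode("application/xhtml+xml"))     # xml
```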

These declarations cause XML and HTML documents to be processed differently. With XML, even minor syntax errors will prevent a document from being rendered fully, whereas the same syntax errors in an HTML document would be ignored. So, rather than facing the crossroad where XHTML would impose unforgiving error handling in line with the W3C preference for "well formed" web page markup, HTML5, like its predecessors, will continue to ignore incorrect HTML markup in deference to usability and backward compatibility.
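The difference in error handling can be sketched with Python's standard library, which happens to ship both a lenient HTML tokenizer and a strict XML parser. This only approximates what browsers do, since html.parser is not a browser's HTML5 parser. The names TextCollector, render_as_html and render_as_xml are invented for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TextCollector(HTMLParser):
    """Collects text content without caring whether the tags balance."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def render_as_html(markup: str) -> str:
    """Lenient path: markup errors are tolerated and the text survives."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


def render_as_xml(markup: str) -> str:
    """Strict path: raises ET.ParseError on any well-formedness error."""
    root = ET.fromstring(markup)
    return "".join(root.itertext())


sloppy = "<p>Unclosed <b>bold text</p>"  # the <b> element is never closed

print(render_as_html(sloppy))  # Unclosed bold text
try:
    render_as_xml(sloppy)
except ET.ParseError as err:
    print("XML parser refused the document:", err)
```

The same markup yields readable text on the HTML path and nothing but an error on the XML path, which is the trade-off described above.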

HTML5 delivers some 30 enhancements, too many to list here. You can view them all at http://html5readiness.com/.

Should you wish to know more about the subject of web standard recommendations and their implementation, I can thoroughly recommend Mark Pilgrim's entertaining scribblings at http://diveintohtml5.org/.

On a closing note, every complex subject needs a bit of confusion. When can we have HTML5, I can hear you ask? Officially, its full implementation is a decade or more away, but many browsers already support some of the new features.
