. You’ll see both styles in this book, as we each work as we feel most comfortable and you need to be able to read both. As a brave new HTML5 author, you’re free to choose—but having chosen, keep to it.
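As a rough illustration of the kind of choice involved, the same two elements can be written XHTML-style or in the looser HTML style. This is only a sketch: the exact pair of styles meant here is an assumption, and the file name and alt text are hypothetical.

```html
<!-- XHTML-style: quoted attribute values, trailing slash on void elements -->
<img src="coffee.jpg" alt="A cup of coffee" />
<br />

<!-- Looser HTML style: quotes only where a value contains spaces, no trailing slash -->
<img src=coffee.jpg alt="A cup of coffee">
<br>
```

Both forms should pass an HTML5 validator when served as text/html; the point is to pick one and use it consistently.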
Why such appallingly lax syntax? The answer is simple: browsers never cared about XHTML syntax if it was sent as text/html; only the XHTML validator did. Therefore, favouring one form over the other in HTML5 would be entirely arbitrary, and would make pages that didn’t follow that format invalid even though they work perfectly in any browser. So HTML5 is agnostic about which you use.
While we’re on the subject of appallingly lax syntax rules (from an XHTML perspective), let’s cheat and, after adding the document title, go straight to the content:

<!doctype html>
<meta charset=utf-8>
<title>Interesting blog</title>
<p>Today I drank coffee for breakfast. 14 hours later, I went to bed.</p>
If we validate this exhilarating blog, we find that it validates fine, yet it has no <html> tag, no <head>, and no <body> (Figure 1.1).
FIGURE 1.1 Shockingly, with no head, body, or HTML tag, the document validates.
INTRODUCING HTML5
This is perhaps one of those WTF? moments I mentioned in the introduction. These three elements are (XHTML authors, are you sitting down?) entirely optional, because browsers assume them anyway. A quick glance under the browser hood with Opera Dragonfly confirms this (Figure 1.2).
FIGURE 1.2 Opera Dragonfly debugger shows that browsers add the missing elements.
Figure 1.3 shows it using the Internet Explorer 6 developer tools.
FIGURE 1.3 Internet Explorer 6, like all other browsers, adds missing elements in the DOM. (Old versions of IE seem to swap <title> and <meta>, however.)
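If no debugger is at hand, a small script can make the same point. The page below is a minimal sketch, with a hypothetical title and inline script, that omits the three optional tags and then asks the DOM whether the elements exist anyway.

```html
<!doctype html>
<meta charset=utf-8>
<title>Implied elements check</title>
<p>Today I drank coffee for breakfast.</p>
<script>
  // The parser has already created the elements that the markup left out.
  console.log(document.documentElement.tagName); // HTML
  console.log(document.head.tagName);            // HEAD
  console.log(document.body.tagName);            // BODY
  // The <title> written above ended up inside the implied <head>.
  console.log(document.head.contains(document.querySelector('title'))); // true
</script>
```

Opening the file in a browser and looking at the console should show the three element names followed by true.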
Because browsers do this, HTML5 doesn’t require these tags. Nevertheless, omitting these elements from your markup is likely to confuse your coworkers. Also, if you plan to use AppCache (see Chapter 7) you’ll need the <html> element in your markup. It’s also a good place to set the primary language of the document:

<html lang=en>
A visually-impaired user might come to your website with screenreading software that reads out the text on a page in a synthesized voice. When the screenreader meets the string “six” it will pronounce it very differently if the language of the page is English or French. Screenreaders can attempt to guess at what language your content is in, but it’s much better to unambiguously specify it, as I have here.
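The same attribute can mark a change of language inside a page. The sketch below assumes an English page that quotes one French word; the title and sentences are invented for illustration.

```html
<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title>Counting in two languages</title>
</head>
<body>
<p>In English, the number after five is six.</p>
<!-- lang on a nested element overrides the page language for that element only -->
<p>In French it is spelled the same way: <span lang=fr>six</span>.</p>
</body>
</html>
```

A screenreader that honours lang has what it needs to pronounce the first "six" as English and the second as French, rather than guessing.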
CHAPTER 1 : MAIN STRUCTURE : THE <HEAD>
IE8 and below require the <body> element before they will apply CSS to style new HTML5 elements, so it makes sense to use this element, too.
So, in the interest of maintainability, we’ll add those optional elements to make what’s probably the minimum maintainable HTML5 page:

<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title>Interesting blog</title>
</head>
<body>
<p>Today I drank coffee for breakfast. 14 hours later, I went to bed.</p>
</body>
</html>
Does validation matter anymore?
Given that we have such forgiving syntax, that we can omit implied tags like <html>, <head>, and <body>, and, most importantly, that HTML5 defines a consistent DOM for any bad markup, you might be asking yourself whether validation actually matters anymore. We’ve asked ourselves the same question. Our opinion is that it’s as important as it’s ever been as a quality assurance tool. But it’s only ever been a tool, a means to an end, not a goal in itself.
The goal is semantic markup: ensuring that the elements you choose define the meaning of your content as closely as possible, and don’t describe presentation. It’s possible to have a perfectly valid page made of nothing but display tables, divs, and spans, which is of no semantic use to anyone. Conversely, a single unencoded ampersand can make an excellently structured, semantically rich web page invalid, but it’s still a semantic page.
When we lead development teams, we make passing validation a necessary step before any code review, let alone before making code live. It’s a great way to ensure that your code really does what you want. After all, browsers may make a consistent DOM from bad markup, but it might not be the DOM you want. Also, HTML5 parsers aren’t yet everywhere, so valid pages are absolutely what you should aim for to ensure predictable CSS and JavaScript behaviours.
We recommend using http://validator.w3.org/ or http://html5.validator.nu. We expect that there will be further developments in validators, such as options to enforce coding choices, so you can choose to be warned for not using XHTML syntax, for example, even though that’s not required by the spec. One such tool that looks pretty good is http://lint.brihten.com, although we can’t verify whether the validation routines it uses are up-to-date.
Using new HTML5 structural elements
In 2004, Ian Hickson, the editor of the HTML5 spec, mined one billion web pages via the Google index, looking to see what the “real” Web is made of. One of the analyses he subsequently published (http://code.google.com/webstats/2005-12/classes.html) was a list of the most popular class names in those HTML documents.
More recently, in 2009, the Opera MAMA crawler looked again at class attributes in 2,148,723 randomly chosen URLs, and also at the ids given to elements (which the Google dataset didn’t include) in 1,806,424 URLs. See Table 1.1 and Table 1.2.

TABLE 1.1 Class Names
POPULARITY  VALUE        FREQUENCY
1           footer       179,528
2           menu         146,673
3           style1       138,308
4           msonormal    123,374
5           text         122,911
6           content      113,951
7           title        91,957
8           style2       89,851
9           header       89,274
10          copyright    86,979
11          button       81,503
12          main         69,620
13          style3       69,349
14          small        68,995
15          nav          68,634
16          clear        68,571
17          search       59,802
18          style4       56,032
19          logo         48,831
20          body         48,052

TABLE 1.2 ID Names
POPULARITY  VALUE        FREQUENCY
1           footer       288,061
2           content      228,661
3           header       223,726
4           logo         121,352
5           container    119,877
6           main         106,327
7           table1       101,677
8           menu         96,161
9           layer1       93,920
10          autonumber1  77,350
11          search       74,887
12          nav          72,057
13          wrapper      66,730
14          top          66,615
15          table2       57,934
16          layer2       56,823
17          sidebar      52,416
18          image1       48,922
19          banner       44,592
20          navigation   43,664
CHAPTER 1 : MAIN STRUCTURE : USING NEW HTML5 STRUCTURAL ELEMENTS
As you can see, once we remove obviously presentational classes, we’re left with a good idea of the structures that authors are trying to use on their pages.
Just as HTML 4 reflects the early Web of scientists and engineers (so there are elements like <kbd>, <samp>, and <var>), HTML5 reflects the Web as it was during its development: 30 elements are new, many of them inspired by the class and id names above, because that’s what developers build.
So, while we’re in a pragmatic rather than philosophical mood, let’s actually use them. Here is a sample blog home page marked up as we do in HTML 4 using the semantically neutral <div> element:

<div class="post">
<h2>Yesterday</h2>
<p>Today I drank coffee for breakfast. 14 hours later, I went to bed.</p>
</div>
<div class="post">
<h2>Tuesday</h2>
<p>Ran out of coffee, so had orange juice for breakfast. It was from concentrate.</p>
</div>

By applying some simple CSS to it, we’ll style it:

#sidebar {float:left; width:20%;}
.post {float:right; width:79%;}
#footer {clear:both;}
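To see the rules and the markup working together, here is one way the whole page might be assembled in a single file. It is a sketch under assumptions: the div ids and classes follow the style sheet, but the header, sidebar, and footer contents are placeholders, the heading levels are a guess, and the HTML5 doctype is used only for brevity.

```html
<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title>Interesting blog</title>
<style>
  #sidebar {float:left; width:20%;}
  .post {float:right; width:79%;}
  #footer {clear:both;}
</style>
</head>
<body>
<div id="header">
  <h1>Blog title (placeholder)</h1>
</div>
<div id="sidebar">
  <p>Navigation links (placeholder)</p>
</div>
<div class="post">
  <h2>Yesterday</h2>
  <p>Today I drank coffee for breakfast. 14 hours later, I went to bed.</p>
</div>
<div class="post">
  <h2>Tuesday</h2>
  <p>Ran out of coffee, so had orange juice for breakfast. It was from concentrate.</p>
</div>
<div id="footer">
  <p>Footer text (placeholder)</p>
</div>
</body>
</html>
```

The sidebar floats left at 20% width, each post floats right at 79%, and clear:both on the footer pushes it below both floats.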
Diagrammatically, the page looks like Figure 1.4.
FIGURE 1.4 The HTML 4 structure of our blog.
[Figure 1.4 box labels: div id="header", div class="post", div id="sidebar", div class="post", div id="footer"]
While there is nothing at all wrong with this markup (and it’ll continue working perfectly well in the new HTML5 world), most of the structure is entirely unknown to a browser, as the only real HTML element we can use for these important page landmarks is the semantically neutral <div> (defined in HTML 4 as “a generic mechanism for adding structure to documents”).
So, if it displays fine, what’s wrong with this? Why would we want to use more elements to add more semantics?
It’s possible to imagine a clever browser having a shortcut key that would jump straight to the page’s navigation. The question is: How would it know what to jump to? Some authors write