but
While we’re on the subject of appallingly lax syntax rules (from an XHTML perspective), let’s cheat and, after adding the document title, we’ll go straight to the content:

<!doctype html>
<meta charset=utf-8>
<title>Interesting blog</title>
<p>Today I drank coffee for breakfast. 14 hours later, ¬ I went to bed.</p>

If we validate this exhilarating blog, we find that it validates fine, yet it has no <html> tag, no <head>, and no <body> (Figure 1.1).

FIGURE 1.1 Shockingly, with no head, body, or html tag, the document validates.
INTRODUCING HTML5
This is perhaps one of those WTF? moments I mentioned in the introduction. These three elements are (XHTML authors, are you sitting down?) entirely optional, because browsers assume them anyway. A quick glance under the browser hood with Opera Dragonfly confirms this (Figure 1.2).

FIGURE 1.2 Opera Dragonfly debugger shows that browsers add the missing elements.

Figure 1.3 shows it using the IE8 developer tools.

FIGURE 1.3 Internet Explorer 8, like all other browsers, adds missing elements in the DOM. (IE seems to swap <title> and <meta>, however.)
Because browsers do this, HTML5 doesn’t require these tags. Nevertheless, omitting these elements from your markup is likely to confuse the heck out of your co-workers. Also, skipping the <html> tag hurts your screen reader users, as that’s where you set the primary language of the document:

<html lang=en>
CHAPTER 1: MAIN STRUCTURE: THE <HEAD>
This is important, as the word six, for example, is pronounced differently depending on whether the language is English or French. Also, as we’ll see later, IE requires the <body> element before it will apply CSS to style new HTML5 elements. So, in the interest of maintainability, we’ll add those optional elements to make what’s probably the minimum maintainable HTML5 page:

<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title>Interesting blog</title>
</head>
<body>
<p>Today I drank coffee for breakfast. 14 hours later, ¬ I went to bed.</p>
</body>
</html>
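To see the difference between the terse page and the maintainable one from a tool's point of view, here is a minimal sketch using Python's standard html.parser module. It only reports the tags that are literally written in the source (it does not apply the browser's implied-tag rules), and the names OptionalTagAudit, audit, terse, and maintainable are ours, chosen for illustration. The two sample strings are shortened versions of the pages discussed above, and the doctype line in each is an assumption.

```python
from html.parser import HTMLParser


class OptionalTagAudit(HTMLParser):
    """Record which optional structural tags are written out, plus the html lang value."""

    def __init__(self):
        super().__init__()
        self.seen = set()   # optional tags that appear explicitly in the source
        self.lang = None    # value of the lang attribute on <html>, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("html", "head", "body"):
            self.seen.add(tag)
        if tag == "html":
            self.lang = dict(attrs).get("lang")


def audit(markup):
    """Return (list of optional tags not written out, declared language or None)."""
    parser = OptionalTagAudit()
    parser.feed(markup)
    parser.close()
    missing = [t for t in ("html", "head", "body") if t not in parser.seen]
    return missing, parser.lang


# Hypothetical sample inputs modelled on the two pages in the text.
terse = """<!doctype html>
<meta charset=utf-8>
<title>Interesting blog</title>
<p>Today I drank coffee for breakfast.</p>"""

maintainable = """<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title>Interesting blog</title>
</head>
<body>
<p>Today I drank coffee for breakfast.</p>
</body>
</html>"""

print(audit(terse))
print(audit(maintainable))
```

A check like this could sit alongside validation in a team's workflow: the terse page is valid, but the audit makes it obvious that no document language has been declared.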
Does validation matter anymore?

Given that we have such forgiving syntax, that we can miss out implied tags like <html>, <head>, and <body>, and (most importantly) that HTML5 defines a consistent DOM for any bad markup, you’ll be forgiven for asking yourself if validation actually matters any more. We’ve asked ourselves the same question.

Our opinion is that validation was always a tool, a means to an end, not a goal in itself. The goal is semantic markup: ensuring that the elements you choose define the meaning of your content as closely as possible, and don’t describe presentation. It’s possible to have a perfectly valid page made of nothing other than display tables, divs and spans, which is of no semantic use to anyone. Conversely, a single unencoded ampersand can make an excellently structured, semantically rich web page invalid, but it’s still a semantic page.

We think that validation remains useful quality assurance, too. When we lead development teams, we make passing validation a necessary step before any code review, let alone making code live. It’s a great way of ensuring that your code really does what you want. After all, browsers may make a consistent DOM from bad markup, but it might not be the DOM you want. Also, HTML5 parsers don’t exist yet in production browsers, so ensuring valid pages is absolutely what you should aim for to ensure predictable CSS and JavaScript behaviours.

The validator we use is http://html5.validator.nu. We expect to see further developments in validators, such as options to enforce coding choices, so you can choose to be warned for not using XHTML syntax, for example, even though that’s not required by the spec.
Using new HTML5 structural elements

In 2004, the editor of the HTML5 spec, Ian Hickson, mined 1 billion web pages via the Google index, looking to see what the “real” web is made of. One of the analyses he subsequently published (http://code.google.com/webstats/2005-12/classes.html) was a list of the most popular class names in those HTML documents.

More recently, in 2009 the Opera MAMA crawler (see http://devfiles.myopera.com/articles/572/idlist-url.htm) looked again at class attributes in 2,148,723 randomly chosen URLs and also ids given to elements (which the Google dataset didn’t include) in 1,806,424 URLs. See Table 1.1 and Table 1.2.

TABLE 1.1 Class Names

POPULARITY  VALUE        FREQUENCY
1           footer       179,528
2           menu         146,673
3           style1       138,308
4           msonormal    123,374
5           text         122,911
6           content      113,951
7           title         91,957
8           style2        89,851
9           header        89,274
10          copyright     86,979
11          button        81,503
12          main          69,620
13          style3        69,349
14          small         68,995
15          nav           68,634
16          clear         68,571
17          search        59,802
18          style4        56,032
19          logo          48,831
20          body          48,052

TABLE 1.2 ID Names

POPULARITY  VALUE        FREQUENCY
1           footer       288,061
2           content      228,661
3           header       223,726
4           logo         121,352
5           container    119,877
6           main         106,327
7           table1       101,677
8           menu          96,161
9           layer1        93,920
10          autonumber1   77,350
11          search        74,887
12          nav           72,057
13          wrapper       66,730
14          top           66,615
15          table2        57,934
16          layer2        56,823
17          sidebar       52,416
18          image1        48,922
19          banner        44,592
20          navigation    43,664
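A census like the ones behind Tables 1.1 and 1.2 can be approximated on a small scale. The sketch below is not the method used by either study. It simply tallies every class and id value found in a list of HTML strings, using Python's standard html.parser and collections.Counter. The names NameCensus, census, and pages are hypothetical, the sample pages are invented, and lowercasing the values is our assumption.

```python
from collections import Counter
from html.parser import HTMLParser


class NameCensus(HTMLParser):
    """Tally class and id attribute values into the two counters it is given."""

    def __init__(self, classes, ids):
        super().__init__()
        self.classes = classes
        self.ids = ids

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue  # skip empty or valueless attributes
            if name == "class":
                # A class attribute may hold several space-separated names.
                self.classes.update(value.lower().split())
            elif name == "id":
                self.ids[value.lower()] += 1


def census(pages):
    """Return (class counts, id counts) across all pages, counting every occurrence."""
    classes, ids = Counter(), Counter()
    for page in pages:
        parser = NameCensus(classes, ids)  # fresh parser per page, shared tallies
        parser.feed(page)
        parser.close()
    return classes, ids


# Invented sample pages, for illustration only.
pages = [
    '<div id="header"></div><div class="post"></div>'
    '<div class="post"></div><div id="footer"></div>',
    '<div id="header"></div><div id="sidebar" class="menu small"></div>'
    '<div id="footer" class="footer"></div>',
]

classes, ids = census(pages)
print(classes.most_common(3))
print(ids.most_common(3))
```

Run over a large crawl instead of two toy strings, the same tally would produce ranked lists in the shape of the tables above.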
As we can see, once we remove obviously presentational classes, we’re left with a good idea of the structures that authors are trying to use on their pages.

Just as HTML 4 reflects the early Web of scientists and engineers (so there are elements like <kbd>, <samp>, and <var>), HTML5 reflects the Web as it was during its development: 28 elements are new, many of them inspired by the class and id names above, because that’s what developers actually build.

So, while we’re in a pragmatic rather than philosophical mood, let’s actually use them. Here is a sample blog home page marked up as we do in HTML 4 using the semantically neutral <div> element:
<div class="post">
<h2>Yesterday</h2>
<p>Today I drank coffee for breakfast. 14 hours later, ¬ I went to bed.</p>
</div>
<div class="post">
<h2>Tuesday</h2>
<p>Ran out of coffee, so had orange juice for breakfast. ¬ It was from concentrate.</p>
</div>
By applying some simple CSS to it, we’ll style it:

#sidebar {float:left; width:20%;}
.post {float:right; width:79%;}
#footer {clear:both;}
Diagrammatically, the page looks like Figure 1.4.
FIGURE 1.4 The HTML 4 structure of our blog. [Diagram: div id="header" at the top; div id="sidebar" alongside two div class="post" blocks in the middle; div id="footer" at the bottom.]

While there is nothing at all wrong with this markup (and it’ll continue working perfectly well in the new HTML5 world), most of the structure is entirely unknown to a browser, as the only real HTML element we can use for these important page landmarks is the semantically neutral <div> (defined in HTML 4 as “a generic mechanism for adding structure to documents”).

It’s possible to imagine a clever browser having a shortcut key that would jump straight to the page’s navigation. The question is: how would it know what to jump to? Some users use