We realize you’re a busy professional with deadlines to hit. Whether your goal is to learn a new technology or solve a critical problem, we want to be there to lend you a hand. Our primary objective is to provide you with the insight and knowledge you need to stay atop the highly competitive and ever-changing technology industry. Wiley Publishing, Inc., offers books on a wide variety of technical categories, including security, data warehousing, software development tools, and networking: everything you need to reach your peak. Regardless of your level of expertise, the Wiley family of books has you covered.
• For Dummies® – The fun and easy way™ to learn
• The Weekend Crash Course® – The fastest way to learn a new tool or technology
• Visual – For those who prefer to learn a new topic visually
• The Bible – The 100% comprehensive tutorial and reference
• The Wiley Professional list – Practical and reliable resources for IT professionals
The book you now hold is part of our new 60 Minutes a Day series, which delivers what we think is the closest experience to an actual hands-on seminar that is possible with a book. Our author team are veterans of hundreds of hours of classroom teaching, and they use that background to guide you past the hurdles and pitfalls to confidence and mastery of XML in manageable units that can be read and put to use in just an hour. If you have a broadband connection to the Web, you can see Linda and Al introduce each topic, but this book will still be your best learning resource if you download only the audio files or use it strictly as a printed resource. From fundamentals to security and Web Services, you’ll find this self-paced training to be your best learning aid. Our commitment to you does not end at the last page of this book. We want to open a dialog with you to see what other solutions we can provide. Please be sure to visit us at www.wiley.com/compbooks to review our complete title list and explore the other resources we offer.
If you have a comment, suggestion, or any other inquiry, please locate the “contact us” link at www.wiley.com. Finally, we encourage you to review the following page for a list of Wiley titles on related topics. Thank you for your support and we look forward to hearing from you and serving your needs again in the future. Sincerely,
Richard K. Swadley Vice President & Executive Group Publisher Wiley Technology Publishing
more information on related titles
Wiley Going to the Next Level Available from Wiley Publishing 60 Minutes a Day Books... • Self-paced instructional text packed with real-world tips and examples from real-world training instructors • Skill-building exercises, lab sessions, and assessments • Author-hosted streaming video presentations for each chapter will pinpoint key concepts and reinforce lessons
0-471-43023-4
0-471-42548-6
0-471-42314-9
0-471-42254-1
Available at your favorite bookseller or visit www.wiley.com/compbooks
Wiley, For Dummies, The Fun and Easy Way, Weekend Crash Course, Visual and related trademarks, logos and trade dress are trademarks or registered trademarks of Wiley. Java and J2EE are trademarks of Sun Microsystems, Inc. All other trademarks are the property of their respective owners.
XML in 60 Minutes a Day
Linda McKinnon Al McKinnon
Executive Publisher: Robert Ipsen Vice-President and Publisher: Joseph B. Wikert Senior Editor: Ben Ryan Editorial Manager: Kathryn A. Malm Developmental Editor: Jerry Olson Production Editor: Vincent Kunkemueller Media Development Specialist: Kit Malone Text Design & Composition: Wiley Composition Services Copyright 2003 by Linda McKinnon and Al McKinnon. All rights reserved. Published by Wiley Publishing, Inc., Indianapolis, Indiana Published simultaneously in Canada No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8700. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4447, E-mail: [email protected]. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. 
For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002. Trademarks: Wiley, the Wiley logo and related trade dress are trademarks or registered trademarks of Wiley in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. Wiley Publishing, Inc., is not associated with any product or vendor mentioned in this book. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books. Library of Congress Cataloging-in-Publication Data: ISBN: 0-471-42254-1 Printed in the United States of America 10 9 8 7 6 5 4 3 2 1
A Note from the Consulting Editor
Instructor-led training has proven to be an effective and popular tool for training engineers and developers. To convey technical ideas and concepts, the classroom experience has been shown to be superior when compared to other delivery methods. As a technical trainer for more than 20 years, I have seen the effectiveness of instructor-led training firsthand. 60 Minutes a Day combines the best of the instructor-led training and book experience.
Technical training is typically divided into short and discrete modules, where each module encapsulates a specific topic. Each module is then followed by “questions and answers” and a review. 60 Minutes a Day titles follow the same model: each chapter is short, discrete, and can be completed in 60 minutes a day.
For these books, I have enlisted premier technical trainers as authors. They provide the voice of the trainer and demonstrate classroom experience in each book of the series. You even get an opportunity to meet the actual trainer: As part of this innovative approach, each chapter of a 60 Minutes a Day book is presented online by the author. Readers are encouraged to view the online presentation before reading the relevant chapter. Therefore, 60 Minutes a Day delivers the complete classroom experience, even the trainer.
As an imprint of Wiley Publishing, Inc., Gearhead Press continues to bring you, the reader, the level of quality that Wiley has delivered consistently for nearly 200 years.
Thank you. Donis Marshall Founder, Gearhead Press Consulting Editor, Wiley Technology Publishing Group
Contents

Acknowledgments
About the Authors
Introduction

Chapter 1
XML Backgrounder
  Why Do We Need a History Lesson Chapter?
  Basics: From Documents to Markup and Metalanguages
  What’s a Document?
  What Is Markup?
  XML Is a Markup Language and a Metalanguage
  Markup Languages
  Metalanguages
  The Evolution of XML
  The Advent of Generic Coding
  GML Led the Way
  Other Typesetting Developments
  SGML: Parent of HTML and XML
  HTML: The Older Sibling of XML
  The Arrival of XML
  XML-Related Applications
  The World Wide Web Consortium and XML
  Possible XML Issues: “Nobody’s Perfect (Yet)”
  Lab Exercises: Instructions and Conventions
  A Brief Introduction to Space Gems, Inc.
  Chapter 1 Labs: Web Exploration
  Summary

Chapter 2
Setting Up Your XML Working Environment
  Hardware Requirements
  Web Servers
  Web Browsers
  XML Authoring Tools
  Simple Text Editors
  Graphical Editors
  Use Only the Latest Versions of Microsoft Word for HTML/XML Creation
  Integrated Development Environments
  Converting HTML Documents to XML
  Chapter 2 Labs: Creating an XML Authoring Environment
  Computer System Requirements
  Operating System Requirements
  Creating Your XML Environment: Overview
  Summary

Chapter 3
Anatomy of an XML Document
  What Are XML Documents?
  XML Document Processing
  Applications
  XML Parsers
  Document Errors
  The Structure of XML Documents
  The Logical Structure
  The Prolog
  The Data Instance
  The Physical Structure: Entities
  Entities Are Parsed or Unparsed
  Entities Can Be Internal or External
  General Entities versus Parameter Entities
  Preserving Characters from Parser Misinterpretation
  Predefined Entities
  Numeric Character References
  CDATA Sections
  What Is a Well-Formed XML Document?
  What Is a Valid XML Document?
  Chapter 3 Labs: Anatomy of an XML File
  Summary

Chapter 4
Document Type Definitions
  What Are Document Type Definitions?
  Why Use Document Type Definitions?
  Creating DTDs: General
  DTD Types and Locations
  Internal DTD Subsets
  External DTD Subsets
  Private External DTDs
  External DTD Subsets Located at Web Sites
  Remote External DTDs with Public Access
  Internal DTDs Combined with External DTDs
  DTD Declarations: General
  Element Type Declarations
  The Content Model
  Elements Containing Parsed Character Data
  Element Types Containing Other Element Types
  Element Types Containing Mixed Content
  Empty Element Declarations
  Elements with “Any” Content
  Element Content Operators
  Attribute List Declarations
  Attribute Declarations to Preserve White Space
  Language ID Attribute Declarations
  Entity Declarations
  General Entity Declarations
  Parameter Entity Declarations
  Notation Declarations
  Non-XML Data Introduced with an Attribute
  Non-XML Data Introduced as an Entity
  Declaring Namespace Attributes in the DTD
  Default Namespace Declarations
  Prefix Namespace Declarations
  Limitations of DTDs with Respect to Namespace Declarations
  Normalization

Chapter 5
XML Schemas
  What Are Schemas?
  XML Schema 1.0: A Two-Part W3C XML Schema Recommendation
  The XML Schema Abstract Model
  The Logical Structure of a Sample XML Schema
  The Prolog
  The Element: Namespaces and Qualified or Unqualified Locals
  Namespace Declarations
  Target Namespaces
  The minedata.xsd Document as a Support Schema
  Global and Local References: Qualified and Unqualified Locals
  Element Type Declarations
  The Element Declaration: Complex Data Types
  The Element Declaration
  Compositors
  Empty Element Content
  The Element: Simple Data Types
  Mixed Content Elements
  Using Facets to Define Data More Precisely
  Schema Document Structures
  The Nesting Structure
  The Flat Catalog Structure
  Using Schemas and DTDs Together
  Chapter 5 Labs: Creating Simple Schemas
  Summary

Chapter 6
XHTML
  HTML Review
  A Brief History of HTML and XHTML
  HTML Shortcomings
  XHTML Definition and Background
  Advantages of XHTML
  XHTML Is Related to XML
  XHTML Is Extensible
  XHTML Is Modular
  XHTML Is Portable
  XHTML 1.0’s Three Variants, DTDs, and Schemas
  The XHTML 1.0 Strict Variant
  The XHTML 1.0 Transitional Variant
  The XHTML 1.0 Frameset Variant
  XHTML Syntax
  The Logical Structure of an XHTML Document
  The Prolog
  The Data Instance
  XHTML Follows XML’s Strict Syntax Rules
  XHTML Element Types Must Be Properly Nested
  All HTML-Related Tag Names Must Be Lowercase
  All XHTML Elements Must Be Closed
  Attribute Names Must Be Lowercase; Attribute Values Must Be Quoted
  Attribute Minimization Is Forbidden
  The name Attribute Has Been Replaced by the id Attribute
  Start Moving to XHTML Soon!
  Converting Web Sites to XHTML
  XHTML Utilities and Services Provided by W3C
  W3C’s HTML Validation Service
  Amaya, W3C’s Editor and Browser
  Other XHTML Utilities and Services
  HTML Tidy
  HTML-Kit
  Chapter 6 Labs: Creating XHTML Documents
  Summary

Chapter 7
XML and Cascading Style Sheets
  Overview of Cascading Style Sheets
  CSS and the World Wide Web Consortium
  Dave Raggett’s Adding a Touch of Style Web Site
  W3C’s CSS Validation Service
  Coping with CSS Issues
  Specifying Styles for HTML and XML Documents
  Inline Style Specifications
  Internal Style Sheet Specifications
  Internal Style Sheet Specifications for HTML and XHTML
  Internal Style Sheet Specifications for Other XML-Related Language Documents
  Affiliating Documents with External Style Sheets
  Affiliating HTML and XHTML Documents with External Style Sheets
  Affiliating Other Types of XML Documents with External Style Sheets
  CSS and the Parsing Process
  Creating CSS Style Rules
  Basic Style Rule Syntax
  Selectors
  Declarations
  Displaying Inline versus Block Elements
  Selectors with Pseudo-Elements
  Grouping Selectors by Classes
  Grouping Selectors by Pseudo-Classes
  Combining Pseudo-Classes with Other CSS Classes
  Grouping Selectors by the ID Attribute
  Inserting Images as Backgrounds
  Inserting Images as Discrete Elements
  Drawing Borders around Elements
  Text Alignment, Margins, and Indentations
  Absolute and Relative Positioning
  Example: Absolute Positioning
  Example: Relative Positioning
  The Cascading Nature of Cascading Style Sheets
  Chapter 7 Labs: Applying CSS
  Summary

Chapter 8
XLinks
  XLink: The XML Linking Language
  The W3C and XLink
  XLink and XPointer Implementations
  Basic XLink Concepts
  Resources
  Link Traversal, Arcs, and Link Direction
  XLink Logical Structures
  Declare an XLink Namespace
  Naming XLink Links
  XLink’s Global Attributes
  A Linking Element Needs a type Attribute
  Other Important Attributes: show and actuate
  Combining XLink Type Elements and Attributes: Two Restrictions
  Example: Simple-Type XLink
  Example: Extended-Type XLink
  Combining XLink, XPath, and XPointer to Access Subresources
  The XML Path Language (XPath)
  XPath Expressions, Location Paths, and Location Steps
  XPath Expressions and Location Paths
  Location Steps
  Axes
  Node Tests
  Predicates
  XPath Expressions Can Contain Functions
  The XML Pointer Language Extends XPath
  Pointers Address a Document’s Internal Structure
  XPointer Basics: Points, Ranges, and Locations
  XPointer Points
  Node-points
  Character-points
  XPointer Ranges
  Browser Display of XLink Links and Syntax
  Chapter 8 Labs: Using XLink, XPath, and XPointer
  Summary

Chapter 9
XML Transformations
  Why Transform XML Data?
  The W3C and Transformations
  The Extensible Stylesheet Language (XSL)
  XSL Parsers
  The XSL Transformation Language (XSLT)
  XML Path Language (XPath)
  Sample XML Transformation: Tabulating a List of Diamonds
  The XML Source Document
  The XSLT Style Sheet
  Node 5: Begin Transformation Using Query Contexts and First Template Rule
  Nodes 6 through 12: Creating Elements Using
  Node 13: Building an HTML Table with XSLT Element Types
  Node 14: Processing Continues on the Source Node
  Node 15: The Current Template Rule and a Template Rule for
  Node 16: Creating the First Row in the HTML Table
  Node 17: More Template Patterns Fill Out the Table Row
  Nodes 23 through 25: Filling Out the Individual Name Table Cell
  Nodes 18 through 22: Filling Out the Other Cells in the Table Row
  Filling In the Other Rows in the Table
  Chapter 9 Labs: Using XML Transformation Software
  Summary

Chapter 10
XML Data Binding
  What Is Data Binding?
  Performing Data Binding
  Data Placeholders: Data Consumer Elements
  The Element
  The Element
  The Element
  Data Source/Data Fields: The datasrc and datafld Attributes
  Data Nesting and the Two-Level Rule
  Data Island Storage of XML Data
  External Data Islands
  Internal XML Data Islands
  Data Binding and Table Repetition Agents
  Data Source Objects (DSOs)
  Navigating Recordsets
  Chapter 10 Labs: Data Binding with XML
  Summary

Chapter 11
  VML Development
  What Is VML? A Definition
  Creating VML Documents
  Logical Structure: A Prolog and an Element
  Namespace Declarations
  Behavior Declarations
  VML Elements in the Element
  The Element
  Creating Graphic Objects Using the path Attribute or Element
  VML’s Predefined Shapes
  The Element for Frequently Used Custom Figures
  Figure Placement
  Altering the Appearance of VML Figures
  Grouping Shapes Together

Chapter 12
SMIL
  What Is Streaming Media?
  What Is the Synchronized Multimedia Integrated Language?
  The W3C and SMIL
  SMIL 1.0
  SMIL 2.0
  XHTML+SMIL Profile
  Viewing and Creating SMIL Documents
  Creating SMIL Documents
  The Prolog
  The SMIL 1.0 DTD
  The Root Element: The Element
  The Element
  The Element
  The Element
  When Media Object Dimensions Don’t Match Region Dimensions
  The Element
  The Element
  The Element: Content, Temporal, and Linking Information
  Synchronizing Media Objects with the and Elements
  The SMIL Media Object Elements
  The Element
  SMIL’s Hyperlinking Elements
  Chapter 12 Labs: Getting Started with SMIL
  Summary

Chapter 13
RDF
  Web Search and Publication Issues
  Metadata Is the Key to the Solution
  The W3C, PICS, and RDF
  RDF Defined
  The Semantic Web and Recent RDF Developments
  RDF Implementations
  RDF Concepts and Syntax
  Statements
  Resources
  Properties
  Values
  RDF Graphs
  The Logical Structure of an RDF Document
  The Prolog
  The Root Element, Namespaces, and Content Models
  Resource Descriptions Are Nested within Elements
  Property Elements
  Abbreviating RDF
  Substituting Our Own XML Data into Others’ Data Content Models
  Using the resource Attribute
  Chapter 13 Labs: Creating and Validating RDF
  Summary

Chapter 14
CDF
  Basic Communication Concepts
  Basic Webcasting and Managed Webcasting
  What Are Channels?
  The User’s Side of CDF: Accessing Channels
  Investigating Available Channels
  Adding a Web Site Channel to Your Favorites List
  Adding a Channel from a Web Site That Does Not Provide a CDF Subscription
  Adding a Channel from a Web Site That Offers CDF Subscription
  Channel Synchronization: Setup and Activation
  Viewing a Channel Offline
  Development of the CDF Specification
  CDF Resources
  Channel Definition Format: A Definition
  The Publisher’s Side of CDF: Creating CDF Channels
  Designing the Channel
  Creating Logo Images
  The Logical Structure of a CDF Document
  The Prolog
  The Element
  Other CDF Elements
  Special Characters and Character Encoding
  Test Your Comprehension with a Sample CDF File
  Posting the CDF File to the Web Server
  Providing Access to the Channel
  Chapter 14 Labs: Getting Started with CDF
  Basic CDF File for Web Pages
  Summary

Chapter 15
SOAP
  What Are Web Services?
  The UDDI: Organization, Project, Specification, and Registry
  The Web Service Description Language (WSDL)
  WSDL Development
  A Real WSDL File at Work: The GetLocalTime Web Service
  WSDL File Structure
  A Sample WSDL Document File: GetLocalTime
  The Prolog
  The Root Data Element
  The Element
  The Element
  The Element
  The Element
  The and Elements
  The Last Line
  The Bottom Line
  What Is SOAP?
  Development of the SOAP Specification
  Basic SOAP Message Construct
  The SOAP Envelope
  The SOAP Header
  The role Attribute
  The mustUnderstand Attribute
  The encodingStyle Attribute
  The SOAP Body
  SOAP Request Example
  SOAP Response Example
  SOAP Faults
  Values for the Element within the Element
  Example SOAP Fault Message
  Chapter 15 Labs: Accessing Web Services with SOAP
  Summary

Chapter 16
MathML
  Mathematical Expression Issues
  Early Visual Presentation Solutions
  The W3C and MathML
  The W3C Math Working Group
  MathML Design Goals
  MathML Implementations
  What Is MathML?
  The Logical Structure of a MathML Document
  The Prolog
  MathML DTDs or Schemas
  MathML and Style Sheets
  MathML Markup Specifications
  The Element
  MathML and Namespaces
  MathML Attributes
  Bases, Scripts, Characters, and Symbols
  Presentation Markup
  Content Markup
  Prefix Notation
  Combining Presentation and Content Markup
  Two Basic Math-Expression Creation Techniques and Concepts
  Abstract Expression Trees
  Layout Boxes
  Chapter 16 Labs: Getting Started with MathML
  Summary

About the 60 Minutes Web Site
Acknowledgments
We teach several courses in several information technology curricula. This book is dedicated to all those students who, no matter what their level of expertise, spoke out in class or approached us on the side to ask us about basic XML concepts. It is difficult, we know, to find time to become familiar with the basic concepts of a new and unfamiliar technology like XML, especially when our colleagues already seem to be “in the know.” We thank them for their courage and dedication, and for pointing us in the right direction regarding topics to present in this book. There are many others to thank. A big thanks to Donis Marshall of Gearhead Press for providing this opportunity, for providing support and direction, and for being patient beyond measure. Thanks, too, to J.W. (Jerry) Olsen, our project manager, who suffered with us the most, along with two editors he managed, Sydney Jones and Joann Woy. This is a far better product because of their efforts, flexibility, and adaptability. Thanks to Ben Ryan, Kathryn Malm, and Vincent Kunkemueller at Wiley Publishing, Inc., for their support and patience, too. Finally, thanks to our friends and family. In the future (well, at least until the next project), we promise not to be so preoccupied and to put in more “face time” with them.
About the Authors
Linda McKinnon has a Mass Communications degree and has worked for more than 20 years in computing science and information technology. She has performed increasingly advanced work—design, development, implementation, database management, data control, and system security—on large corporate computer systems across various platforms. At the same time, her duties have also included user administration and assistance, and troubleshooting both mainframe and personal computer systems and networks. Since 1990, Ms. McKinnon has been president and senior consultant for Skills in Motion Inc. In that capacity, she has been responsible for providing, and occasionally developing, instruction on the installation, configuration, and administration of various platforms, such as AIX, Linux, other Unix flavors, Novell NetWare, and Windows NT/2000, 9x, and XP. She is also an expert at TCP/IP addressing and configuration. More recently, Linda has been responsible for the installation, implementation, and administration of many IBM p-Series (RS/6000) SP2 systems. Because of her background in Java, JavaScript, and XML programming, as well as Web services and other Web development, she also teaches those curricula on IBM’s WebSphere Server Application Development systems. Al McKinnon is an engineer, technical author, and trainer who assists clients throughout North America in the areas of network design, installation, and auditing. He has been a contributing author to national standards and has written manuals, specifications, provincial policies, procedures, regulations, legislation, magazine articles, and editorials. Al and Linda are headquartered in Calgary, Canada.
Introduction
Welcome to XML in 60 Minutes a Day! If you’re interested in learning about XML, this is a good place to start. Or if you’re interested in building a simple XML-based Web site, you can also start here. We know there are several XML books available already: textbooks, handbooks, pamphlets—you name it. If you are in a bookstore or library, you are probably surrounded by them. You may even have one or more already at your workstation or office, at home, or in your study carrel. Plus, there are also plenty of Internet sites where you can learn almost everything about XML, from a quick overview to an explanation of the finest syntactic or semantic details. So, why should you choose this book? In the next few sections, we hope to tell you why, to convince you that this book is a good introductory textbook, a good reference manual, and a good investment in your future. It may even entertain you.
Overview of the Book and Technology
Development of XML and its related standards, specifications, and vocabularies is proceeding at an almost explosive rate, with simultaneous progress on many fronts and with ever-evolving objectives. Those who want or need to learn about XML quickly need answers to questions like these: What is XML? Where did it come from? How do I get started? What do I concentrate on? What can I learn that’s useful to me now? How long is it going to take to be productive with XML?
We can help to answer those questions. We wrote this book for several reasons:
• It reflects what our colleagues and students have requested for years: an easily read text that introduces and explains what they need to know now to get up to speed with XML in a Windows environment. Meanwhile, the companion Web site, discussed in Appendix A, will help those who work in a Linux environment.
• Although there are many XML books on the market, we wanted to create one that would allow you to be up and running with XML in a proper order and according to an optimal schedule.
• This book contains material comparable to what you would find in a good introductory XML course. The price of this book is pretty attractive compared to what you would pay at any technical institute, college, or university for a comparable XML intro course.
• This book also makes a great companion for almost anyone’s introductory XML course. Its definitions, explanations, lab exercises, and review questions supplement material in others’ courses. In fact, we take the time to explain some concepts that, because of scheduling or prerequisite assumptions, instructors tend to gloss over or omit.
• If you follow the lab procedures in this book, you can actually build your own XML-oriented Web site, quickly and inexpensively.
• This book will also help you if you are pursuing XML certification. We want to help you get ahead. Our quiz questions are comparable to those you will eventually find on an XML certification test. But please don’t look for everything you will need to know for an XML certification test. This is, after all, an introductory-level book.
• Finally, the book is written as an invitation to you to get involved with XML development. You may already have knowledge, experience, interest, or even the enthusiasm to help with the XML revolution. Or you may be just around the corner from it. If there is a topic that you find interesting or exciting, it’s never too late to volunteer. In almost every chapter, you will see several opportunities to contact those who are continually developing XML standards and vocabularies.
What a challenge it is to be as up-to-date as possible! XML-related standards are constantly being updated. To help you keep pace, we provide Web site references in every chapter so that you can check for the latest developments. When you check the Web sites, you’ll see that the changes are overwhelmingly for the better.
How This Book Is Organized
From the outset, we knew that the outline for our book would be part rigid, part flexible. What does that mean? Well, the first five chapters of this book provide the most basic and fundamental XML information and open the door to the topics in the rest of the book. The latter chapters address several related XML standards and languages, and provide you with other real-world XML information and capabilities.
As an initial strategy, then, we suggest that you start with Chapter 1 and proceed right through to Chapter 16. That way, you will receive the information in what we consider to be an optimal and cumulative order, and you will be able to construct your version of the example Web site in the proper sequence. Alternatively, if you do not intend to perform the lab exercises and construct a Web site, you might start with Chapter 2 and proceed to the end of Chapter 5 to get the basics. Then, you could examine the other chapters as you need to or as your curiosity guides you. In that case, you can also go to the book’s site at www.wiley.com/compbooks/60minutesaday and download various source or solution files to examine their content and structure, or go to the Space Gems, Inc. Web site and examine the source code of the documents you find there.
You probably want to know what’s in our book. Like many introductory courses and textbooks, this one begins with a discussion of the technology it will introduce; that is, it explains the origins of XML and shows you where it fits into the information technology world and into the development of the World Wide Web. In Chapter 1, we go right back to basics. We explain basic document and markup concepts. After that, we define XML as a markup language and a metalanguage. That is followed by a brief history of XML and its ancestor technologies. The World Wide Web Consortium (the W3C) is essential to XML development, so we discuss that organization, its principles, and its objectives, too.
Chapter 2 explains how to create an XML working environment, because we are eager to get up and running quickly so that we can begin creating our sample Web site. It starts by specifying hardware requirements and then discusses Web server, Web browser, and XML authoring applications. The lab exercises provide step-by-step instructions for installing, configuring, and using the applications we will use for the remainder of the book.
In Chapter 3, we begin to discuss XML documents and their processing. We talk about XML-related applications, XML processors (also called parsers), and XML errors. Then we discuss the physical and logical structure of a generic XML document. Chapter 3 continues with an introduction to the basic components of an XML document: element types, attributes, namespace declarations, and entities. It concludes with definitions of two important concepts: the well-formedness and validity of XML documents.
Chapters 4 and 5 discuss two methods for defining (the official XML term is declaring) the components of XML-related documents for purposes of document validation: the more traditional document type definitions (DTDs) and the newer XML schemas. A knowledge of DTDs and schemas is essential if you will eventually be creating your own specific XML vocabularies.
In Chapter 6, we introduce the largest of the XML-derived languages developed so far, XHTML. XHTML resembles HTML Version 4 and is expected to replace HTML eventually. We discuss the conversion of existing HTML documents to XHTML and the creation of XHTML documents from scratch. We also list some free utilities that facilitate those activities.
We introduce the Cascading Style Sheet language (CSS) in Chapter 7. Not only do cascading style sheets allow designers to control data semantics and structure, they also facilitate the transformation of XML data into an appealing presentation.
Chapter 8 shows you how to create XML-related hyperlinks and even how to integrate them with your existing Web page projects. We discuss three XML-related standards that provide linking capability: the XML Linking Language (XLink), the XML Path Language (XPath), and the XML Pointer Language (XPointer). Together, they overcome the inadequacies in classic HTML linking.
In Chapter 9, we discuss another method for transforming XML documents, using the Extensible Stylesheet Language (XSL) family of XML-related standards. But unlike the display-oriented style sheets discussed in Chapter 7, the Chapter 9 style sheets prepare XML data for further processing.
Chapter 10 presents XML documents as both data sources and data retrieval documents.
We discuss basic XML-related data binding concepts and the agent applications that synchronize and retrieve data in an XML environment. Chapters 11 and 12 are a little more fun than the transformation and data binding chapters. Chapter 11 introduces the Vector Markup Language (VML), the prevailing XML-related graphics language. Chapter 12 introduces SMIL (the Synchronized Multimedia Integration Language), which is used for adding multimedia to Web page documents. In Chapter 13, we discuss the Resource Description Framework language (RDF), which allows us to include appropriate metadata in our Web page documents to describe the information in those documents clearly and accurately. RDF will eventually make our systems seem “smarter,” since it will make our Web searches faster and provide the information we really want. Chapter 14 explains the Channel Definition Format (CDF) language, which allows Web users and publishers to obtain or provide, respectively, regularly updated Web site information. We bet you’ve already used CDF without knowing its name or how valuable it can be.
Chapter 15 introduces the Simple Object Access Protocol (SOAP), which has become the most popular protocol for exchanging messages with and otherwise accessing Web services. In this chapter, we discuss Web services in general, the Universal Description, Discovery, and Integration service in particular, and the construction and use of SOAP messages. Chapter 16 takes us back almost to the roots of XML. The Mathematical Markup Language (MathML) has been developed to help us share mathematical and scientific expressions across the Web. MathML allows us to not only display the various numbers and symbols in our equations, but also to transmit their actual meaning. The Appendix contains information about what you can expect to find on the three XML in 60 Minutes a Day companion Web sites. One will provide instructional audio and video presentations. The second will provide downloadable resource and solution files to help you complete the lab exercises found in this book. The third companion Web site is the Web site that belongs to the fictitious Space Gems, Inc. company. When we began this book, we thought it would be instructive and fun to help you the reader create your own real, operating Web site. So we created an imaginary gemstone exploration and marketing company called Space Gems, Inc. When you perform the lab exercises, you can perform tasks that the Space Gems Web site designer and administrator would perform.
Who Should Read This Book
We wrote our book for several audiences, including:
■■ The experienced HTML Web site designer, developer, or Web site administrator who faces a transition from HTML to XML
■■ The manager who faces updating or upgrading an Internet service
■■ The student who faces an introductory XML course or who has been fast-tracked into an intermediate-level course and isn’t quite sure about having the prerequisite knowledge and experience to keep up with the instructor or other students
■■ The work-at-home or small business professional whose firm never seems to have enough funds for training, yet who needs to stay current with Web technology
You don’t need a lot of experience to understand and use this book. It is geared toward the XML newcomer. Granted, it might be beneficial if you already have a background in HTML or Web site publishing or administration, but that’s not necessary. (An old “discount bin” HTML manual is usually sufficient.)
Occasionally, we mention some advanced concepts, but we don’t dwell on them. We mention them mostly to stimulate your curiosity.
Tools You Will Need
In Chapter 2, “Setting Up Your XML Working Environment,” and in the Appendix, we describe the hardware you will need to perform the lab exercises and to access and use the three companion Web sites to XML in 60 Minutes a Day. Thereafter, we suggest you install Windows XP Professional or Windows 2000 Professional as a base operating system, with Internet Explorer as your base Web browser application. In Chapter 2, we describe all the applications you will need to perform your lab exercises. If additional or different applications are required for later exercises, we tell you where the applications are located and how to install them. We have tried to find online sources that are free or that provide trial periods that are long enough for you to complete the relevant exercises. As we mentioned in the earlier Overview of the Book and Technology section, copies of our lab exercises that are oriented to the Linux operating system are available at www.wiley.com/compbooks/60minutesaday. Although we used the Red Hat distribution of Linux to create the exercises, any version of Linux will suffice to perform them. Please be aware that some of the Linux XML labs still require you to use Internet Explorer to test the procedures. On those occasions, you will need both a Windows system and a Linux system. To help you share files between the two systems, we have provided additional technical solutions at our (the authors’) Web site at www.skillsinmotion.com.
Summary
We hope you’ll enjoy this introduction to XML. Once you’ve worked your way through the book, you’ll have enough background to begin creating many XML documents and to contribute to almost any HTML or XML-related Web site. Plus, you will have enough basic knowledge to tackle an intermediate-level XML course or text. Besides being a good introductory course, this book is also a good reference manual and a good investment in your future. Good luck! And thanks for selecting our book!
CHAPTER
1 XML Backgrounder
The past five or six years have witnessed an explosive growth of Extensible Markup Language (XML) as more individuals and organizations link their computer systems together to exchange data and create usable information, and as more vendors convert their electronic commerce Web sites to provide goods and services. XML has matured quickly and now is capable of providing a standard for the structure, transmission, and interchange of data, whether that data travels within the same computer system, through a local network, or clear across the globe, and whether the applications and operating systems processing the data are identical or different. All of the major software companies—most notably the Web browser developers such as Microsoft, Netscape, Mozilla, Konqueror, and Opera—are enthusiastic about XML. Promoting the use of XML standards is the next step in the evolution of the World Wide Web. This book introduces you to XML and shows you why XML is becoming so popular. It also introduces you to several XML-related languages and standards as we teach you to develop a simple e-commerce Web site. You will build this Web site yourself over the course of several laboratory exercises. In this first chapter, we provide an overview of some basic document processing concepts and then discuss the context of XML’s development and the development of its predecessors. We’ll define and discuss markup, markup
languages, and metalanguages, too. We’ll then discuss the need for standards, the role of XML as a standard, and the role of the World Wide Web Consortium (W3C) in the development of the World Wide Web, XML, and other Web-related technologies. By the end of this chapter, you should be familiar with basic markup concepts and be able to participate in any general conversation about XML as a metalanguage and a markup language.
Why Do We Need a History Lesson Chapter?
We swear that the exchange in the accompanying Classroom Q & A actually took place just before we began this book. The question is verbatim, but we’ve paraphrased our answer a little.
Classroom Q & A
Q: I was in the bookstore yesterday and I was looking at some XML books. Why do so many XML books begin with some sort of history lesson? Why should we care about XML’s history? Why not just get at it?
A: It’s true that this first chapter is a combination of concepts and history, but there are several reasons for chapters like this:
■■ XML’s development process is meaningful to your understanding of its concepts and its open, independence-oriented culture.
■■ The XML story is interesting and even heroic. The fact that you’re reading this means you are about to become a character in the story, too. And many of its heroes are among us; some of them you can actually contact with just a few mouse clicks and keystrokes. They’re fighting the good fight, and they’d be happy to have you assist them.
■■ XML didn’t just happen yesterday, and it didn’t happen all at once. It’s not just another flavor of the month. It has evolved from its predecessors over the past 40 years or more, and it’s expanding and evolving constantly.
■■ We’ll show you how XML draws from its heritage, how it constantly evolves to cope with ever-growing needs, and how it pays dividends for the worlds of communication and commerce.
Meanwhile, to illustrate the evolving nature of XML, from the time we began drafting this book until the time we finished it, we had to revise several chapters to keep the information current. By the time you read this book, no doubt even more changes will have occurred. That’s why we provide Web site and other references so that you can get the latest XML information and updates. Several chapters introduce XML-derived and -related markup languages. Because each language came along at a different time and because each has a rather unique heritage and evolution, there will be a brief historical summary in each of those chapters, too. Let’s start our background and history chapter with a discussion of some basic concepts that will appear several times throughout this book.
Basics: From Documents to Markup and Metalanguages
This section examines some basic concepts and then uses those concepts to build a definition of XML.
What’s a Document?
Outside the IT world, we encounter all sorts of hard-copy documents: letters, forms, books, newspapers, magazines, invoices, maps, birthday cards, leaflets, posters, sticky notes, and many others. The concept of the hard-copy document evolved almost without notice. When we encounter new types of hard-copy documents in our homes, offices, classrooms, libraries, stationery stores, or local newsstands, we seem to accept them unconsciously. Meanwhile, within the IT world, the concept of the electronic document has evolved, too. Let’s start with a more basic definition first: the definition of text. Text is generally considered to consist of words, sentences, lines, paragraphs, and even pages. Typically, the term text also refers to electronic text stored as only simple character codes (for example, American Standard Code for Information Interchange, or ASCII, codes), that is, without any formatting. At one time, the electronic document was only considered to be a text file created with applications called text editors or word processors. You could almost use the terms text and document interchangeably. However, as developments occurred on many IT fronts, the concept of the electronic document expanded to contain tables, graphics, charts, and other objects, in a manner that parallels the evolution of hard-copy documents. Now, in the IT world, documents are considered to be electronic files of any size for any media (for example, text, audio, video, and graphics), created by any application. So now, the definition of text is a subset of the definition of the document.
In their Extensible Markup Language 1.0 Recommendation, which is recognized as the official XML standard, the W3C defines an XML document as a “data object if it is well-formed, as defined in (Extensible Markup Language Recommendation). . . . Each XML document has both a logical and a physical structure.” (We discuss the W3C in more detail in The World Wide Web Consortium and XML section later in this chapter.) That definition might appear obscure at this point, but don’t worry. We discuss and expand on that definition in Chapter 3, “Anatomy of an XML Document,” when we discuss XML documents in more detail. Actually, we discuss some form of XML-related document or another in almost every chapter, but Chapter 3 provides the most essential and basic discussion of document components and structure. Related to the discussion of documents is the term document processing, which is the discipline that deals with creating applications that allow you to deal with documents of all types. Document processing is split into creating or manipulating those documents destined for human viewing and consumption (people-oriented processing), as well as those that are destined for computer consumption (machine-oriented processing). Documents of the former type were comparatively long-lived (examples: specifications, drawings, procedures, charts, and memos). Documents of the latter type tend to have shorter lives because their data may be manipulated, transformed, or combined on the fly to create or add to different documents. As you’ll see, XML descends from a rich document-processing heritage.
What Is Markup?
The concept of markup is important. After all, it’s the M in XML. But what does it mean? Basically, it’s a way to add information about data to the data itself. You may not have had much experience with other markup languages, but you have probably used markup in one form or another. For example, have you ever:
■■ Underlined or highlighted words or passages on a hard-copy document to indicate important information?
■■ Marked up a draft hard copy of a document with symbols indicating “new paragraph here,” “bold this,” or “remove this”?
■■ Made marks on a map indicating where you want to turn, or where specific features are located?
■■ Numbered bits of information, such as steps, in an otherwise unnumbered procedure?
Those and similar activities involve marking up data. All the symbols, notes, numbers, designated actions, or highlights (all of which qualify as some sort of markup) emphasize or convey something about the data: what it means or what you are supposed to do with it. A significant paper titled “Markup Systems and the Future of Scholarly Text Processing,” by James H. Coombs and Allen H. Renear of Brown University and Steven J. DeRose of Electronic Book Technologies, describes six types of markup:
■■ Punctuational, which consists of the use of defined marks (examples: spaces, periods, and commas) to provide primarily syntactic information about written utterances. Punctuation has been around so long that we take it for granted.
■■ Presentational, which we use to group our materials for order and clarity. Examples include horizontal and vertical spacing, page breaks, numbering, chapter and section breaks, justification, and lists.
■■ Procedural, which is a characteristic of whatever system will be used to create presentations. Often grouped with what we call file formats, it tells someone or something (such as a formatter with a set of installed drivers) about the size and format of a document (examples: letter, legal, and portrait and landscape views), fonts, and other production information.
■■ Descriptive, which allows authors to identify certain elements of their data as belonging to a specific family of text. The common word-processing tag BT (for basal text) is an example: When a text formatter encounters that code, it consults, and then follows, a predefined set of rules that tell it what to do to display or print the characters associated with that code. If changes become necessary, you only need to change the rules, not each BT tag in the document.
■■ Referential, which refers to separate physical or electronic entities (that is, located external to the document being processed) that will be imported and placed in the proper sequence during document processing. In Chapter 3 and elsewhere in this book, you will see how to incorporate audio, video, and other files into XML documents by using this type of markup.
■■ Metamarkup, which provides the ability to control the definition and interpretation of markup tags, and to extend the vocabulary of derivative markup languages. Metadata, the concept of information about information, is related to this concept.
If you would like to read the Coombs, Renear, and DeRose paper that the preceding definitions were taken from, you can find it online at www.oasis-open.org/cover/coombs.html#Figure1.
Markup, in summary, is the inserting of characters or symbols into a document to indicate the document’s physical and logical structure, to indicate how the information in a document should appear, or to provide some other form of instruction. The primary goal of markup is to separate the treatment (for example, the appearance or structure) of a document from the actual data in the document.
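To make the separation of treatment from data a little more concrete, here is a minimal Python sketch. It is our own illustration, not part of any markup standard: the tag names (H2 and BT), the two rule tables, and the render function are all hypothetical, and the font values are arbitrary. The point is only that the tagged document stays the same while the rules that treat it change.

```python
# Hypothetical illustration of descriptive markup: the tag names, the rule
# tables, and render() are invented for this sketch.

# The "document": each piece of data carries a descriptive tag, nothing more.
document = [
    ("H2", "Definitions"),
    ("BT", "A sapphire is a gem variety of corundum."),
    ("BT", "It is usually a transparent rich blue."),
]

# Two independent sets of rules that say how each tag should be treated.
print_rules = {
    "H2": {"font": "Times", "size": 30},
    "BT": {"font": "Times", "size": 12},
}
screen_rules = {
    "H2": {"font": "sans serif", "size": 24},
    "BT": {"font": "sans serif", "size": 10},
}


def render(doc, rules):
    """Look up the rule for each tag and describe how the text would appear."""
    lines = []
    for tag, text in doc:
        rule = rules[tag]
        lines.append(f"[{rule['font']} {rule['size']}pt] {text}")
    return lines


for line in render(document, print_rules):
    print(line)
for line in render(document, screen_rules):
    print(line)
```

Changing a font means editing one entry in a rule table; none of the tagged items in the document has to be touched.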
XML Is a Markup Language and a Metalanguage
There are over two dozen categories of computer languages; you are probably familiar with some of them already. For example, machine languages consist entirely of numbers and are only understood by computers; assembly languages are symbolic representations of the machine language of a specific computer; programming languages such as COBOL, C++, Java, and Fortran instruct computers to do specific tasks; and fourth-generation languages use a syntax that is closer to human languages. Some language categories are separate and discrete, dedicated to specific functions; some languages are subsets of others; and some are hybrids of other languages. For a more comprehensive listing of computer languages and their respective definitions, consult The Language List Web site, maintained by Bill Kinnersley of the Computer Science Department, University of Kansas, at http://cui.unige.ch/OSG/info/Langlist/intro.html.
XML doesn’t fall into any of the categories previously listed, but it falls into two other categories: It’s a markup language and a metalanguage.
Markup Languages
Extrapolating the definition of markup, markup languages are those that allow us to create documents consisting of plaintext data and other entities, plus markup codes that define the logical components and structure, as well as describe the appearance or other aspects of the data. The markup codes, called tags, are located adjacent to their respective data. In addition, the data and tags are usually composed of common text characters, so they can remain independent of platform and operating system. Why use markup languages? These days, with the proliferation of computer networks across the world, with their myriad of applications, operating systems, and proprietary network devices, the data transmitted over the wire, through the air, and through space must include all the information necessary for automated systems (such as computers, routers, firewalls, and hubs) to transmit, receive, and otherwise deal with the data. The receiver needs the markup tags to interpret the message: the format and content of database data,
multimedia graphic files or audio files, debit card transactions, credit card authorizations, or any other various document types.
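As a rough sketch of how a receiver might use tags to interpret a message, the following Python example reads a small plaintext XML message with the standard library's xml.etree.ElementTree module. The message itself, including its element and attribute names (transaction, merchant, amount, type, currency), is a made-up example rather than any real transaction format, and read_transaction is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A made-up message: the tags sit next to the data they describe, and
# everything is ordinary text that any platform can read.
message = """<transaction type="debit">
  <merchant>Space Gems, Inc.</merchant>
  <amount currency="USD">149.95</amount>
</transaction>"""


def read_transaction(text):
    """Use the tags and attributes to pull out what each piece of data means."""
    root = ET.fromstring(text)
    amount = root.find("amount")
    return {
        "type": root.get("type"),
        "merchant": root.findtext("merchant"),
        "amount": float(amount.text),
        "currency": amount.get("currency"),
    }


summary = read_transaction(message)
print(summary)
```

Without the tags, the receiver would see only the strings "Space Gems, Inc." and "149.95" and would have to guess what they represent.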
Metalanguages
In the What Is Markup? section, we provided a listing of markup types. One of the types was called metamarkup, which provides the capability to control the definition and interpretation of markup tags, and to extend the vocabulary of derivative markup languages. That is consistent with the definition found at Mr. Kinnersley’s Web site, where he defines a metalanguage as a “language used for formal description of another language.” It is also consistent with other definitions of metalanguages, which describe them as languages that provide for conformance-proving mechanisms. XML permits developers to create their own specialized derivative languages, but all of those languages have one thing in common: They meet XML specifications. If languages and documents contravene the XML specifications, the XML processors in their respective applications may or may not process them. Even if they do, they will likely generate error messages.
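To see what an error from an XML processor can look like, here is a small Python sketch that uses the parser in the standard library's xml.etree.ElementTree module. The gem and name element names and the is_well_formed helper are our own inventions for this illustration. Note that this parser checks only well-formedness; as far as we know, it does not validate documents against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, "") if the parser accepts the text, else (False, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        return False, str(error)
    return True, ""


good = "<gem><name>sapphire</name></gem>"
bad = "<gem><name>sapphire</gem>"  # the <name> element is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

The first document follows the XML rules and is accepted. The second breaks them, so the processor refuses it and reports an error message instead of guessing at the author's intent.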
The Evolution of XML
Until the late 1960s, it was accepted practice that electronic manuscript files would contain macros or control codes (referred to as specific coding) to prescribe how the manuscript documents should be rendered. Plus, the format of the document files, and the applications that manipulated them, were often proprietary to the publishers. Also, document processing applications were of a black-box nature. Users couldn’t get at all the coding to examine and possibly modify it; therefore, document coding was not open source. It was also nonstandard: Tags and other coding from one application were not identical or interchangeable with those from another application. Documents created with one application were usually not compatible with other applications.
The Advent of Generic Coding
There are several good historical summaries of the state of document processing prior to the development of generic coding, upon which XML and its predecessors are based. This section paraphrases from several sources, especially those found at Charles F. Goldfarb’s SGML Source home page at www.sgmlsource.com, in his SGML History Niche at www.sgmlsource.com/history/. (SGML stands for Standard Generalized Markup Language; we look at SGML more closely later in this chapter.) The documents there are recommended reading.
We have already mentioned the proprietary nature of early document processing technologies. A number of movements began in the late 1960s that would lead to a substantial change from that philosophy, including the following:
■■ New York book designer Stanley Rice advocated the development and adoption of standard style macros based on the structural elements of publications (examples: parts and chapters).
■■ William Tunnicliffe of the Graphic Communication Association (GCA; now known as the International Digital Enterprise Alliance) advocated “the separation of information content of documents from their format.” This was the concept of generic coding at its embryonic stage.
■■ In 1969, IBM began research on an integrated processing project: the application of computers to the legal profession. The project involved the integration of a text editing application with a database information retrieval system and a document composition application.
There was intellectual cross-pollination among the initiatives, which bore fruit for the GCA, IBM, and, eventually, for all of us. For further information on IDEAlliance (formerly the GCA), consult www.idealliance.org/.
GML Led the Way
As the IBM team worked on their integrated document project, they recognized that their eventual product language would have to reflect three features:
■■ Markup in general would have to be the common language (the developers refer to it as the lingua franca) for data description, structure, and communication, and it would have to be readable and writable by all relevant computer applications.
■■ The markup would have to be extensible, not related to just one industry, because an infinite variety of information types might eventually be created. In other words, they saw that their technology might and should be applied to all professions.
■■ The documents common to the information in each different area would need some sort of description mechanism or rules, against which the documents could be checked for conformity (that is, proofed).
The IBM team called the first version of the product they developed in 1969 the Text Description Language. Development continued and its name was
changed in 1971 to the Generalized Markup Language. The name was chosen deliberately, so that its acronym, GML, could serve as a reminder of GML’s original creators: Charles Goldfarb, Ed Mosher, and Ray Lorie. With GML, IBM removed specific formatting instructions from the content of the document itself. GML’s markup was based only on the identification of the different types of structural components in a document. With GML, an author could assign descriptive tag names to the sections of data. After the various sections were thus identified, any application could be written to manipulate the data as long as it contained the appropriate tag references. GML was first released under its own name as part of Advanced Text Management System in 1973. It became an integral part of several IBM publishing systems, most notably IBM Script. Table 1.1 lists some basic GML codes. The following is a sample of GML markup:
:h2.Definitions:
:ol.
:li.1. noun, a gem variety of corundum in transparent or translucent crystals of a color other than red; especially, a transparent rich blue
:li.2. noun, a gem of such corundum
:li.3. noun, a deep purplish blue color
:li.4. adjective, made of or resembling a sapphire gem
:li.5. adjective, having the color of a blue sapphire
:eol.
Table 1.1 GML Tag Examples
TAG                      EXPLANATION
:title                   Document title
:h0-:h6                  Zero level through sixth-level titles
:ul / :eul               Begin and end unordered lists
:ol / :eol               Begin and end ordered lists
:li                      Item that appears following a “begin list” tag
:hp1 / :hp2 / :hp3       Start highlight level 1, 2, or 3 (where 1=underscore, 2=bold, and 3=both)
:ehp1 / :ehp2 / :ehp3    End highlight level 1, 2, or 3
:lq / :elq               Begin and end a long quote (also called a block quote)
Let’s examine the sample coding. Notice that a GML tag begins at the left margin (that’s why they’re called flush left) and that the tag name is preceded by a colon. The colon is the GML delimiter, which instructs the application (presumably a text formatter) to begin processing a tag. Immediately following the delimiter is the descriptive tag h2, which indicates that the content, when encountered, is to be formatted according to the predefined rules for a second-level title. The tag is followed by a content separator (the period), which tells the processor to stop processing the tag and to, instead, process the text data that follows according to the h2 rules. Finally, the data (that is, the word Definitions followed by a colon) appears. GML’s descriptive generic coding makes a document more portable because the content can be printed or displayed in different ways according to an application’s interpretation of the tag without making any changes to the original document file. Plus, the author doesn’t have to supply the formatting details. So, using one application, the h2-tagged definitions might be printed in a Times font at 30-point size, while the list items (that is, the content on the :li lines) might be printed in Times at 12 points. With another application, definitions might appear on-screen in a bolded sans serif font at 24 points, while the list items appear in sans serif at 10 points. GML development didn’t stop with its release in 1973. As the SGML history documents indicate, Mr. Goldfarb “continued research on document structures . . . short references, link processes, and concurrent document types . . . .
By far the most important was the concept of a validating parser that could read a document type definition and check the accuracy of markup without actually processing a document.” Document type definitions, which would fulfill the third required feature of the three listed earlier in this section, had been in development since the beginning of the work on GML. In 1975 to 1978, IBM introduced their Document Composition Facility (DCF), based on the IBM Script product of the 1960s but with upgrades like GML support. With DCF, GML left the essentially research-only domain and became commercially available.
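The anatomy of a GML tag line described earlier in this section (colon delimiter, tag name, period as content separator, then the data) can be sketched in a few lines of Python. This is a simplified illustration of our own, not a real GML processor: the regular expression TAG_LINE and the parse_gml_line function are hypothetical, and they handle only one flush-left tag per line, as in the sample markup.

```python
import re

# Simplified, hypothetical pattern for one flush-left GML tag line:
#   ":"  the delimiter that starts a tag
#   tag  the descriptive tag name (for example, h2, ol, li, eol)
#   "."  the content separator that ends the tag
#   then the text data, if any
TAG_LINE = re.compile(r"^:(?P<tag>[A-Za-z][A-Za-z0-9]*)(?:\.(?P<content>.*))?$")


def parse_gml_line(line):
    """Split one line into (tag, content); tag is None for untagged text."""
    stripped = line.strip()
    match = TAG_LINE.match(stripped)
    if match is None:
        return None, stripped
    return match.group("tag"), (match.group("content") or "").strip()


sample = [
    ":h2.Definitions:",
    ":ol.",
    ":li.1. noun, a gem of such corundum",
    ":eol.",
]
parsed = [parse_gml_line(line) for line in sample]
for tag, content in parsed:
    print(tag, "|", content)
```

Once each line has been split this way, a formatter is free to look up its own rules for h2 or li, which is what lets different applications present the same file in different ways.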
Other Typesetting Developments
As we stated previously, until the late 1960s computer text processing applications were proprietary in nature. GML’s creators showed the document processing world that there was merit in creating a portable, machine-independent system of encoding. The development of generic coding and other document processing technologies, however, did not cease with the appearance of GML. The GCA had initiated their System X project, which would later be called GenCode. Work on that project would continue through the 1970s.
Other automated typesetting technologies prevalent or being developed simultaneously with (and even later than) GML and System X/GenCode, but in different areas, include the following:
Mainframe publishing applications. Expensive, esoteric, and requiring mainframe systems, these were still fairly powerful; however, most dealt with data display and did not reflect the direction shown by GML.
Desktop publishing (DTP) applications (also called formatting markup applications). As personal computers became more powerful and less expensive, desktop publishing applications appeared more frequently. They are used for producing newsletters, books, and other documents that used to require professional typesetters. DTP applications are advantageous for large, detailed reports. Often, they’re freely available, powerful, and used throughout the world (an example is the TeX family of DTP applications: TeX, LaTeX, MiKTeX, and others). They’re often platform-independent and have powerful facilities for mathematical and scientific equations and other expressions (better than word processing applications, which we discuss in the next bullet point), as well as section and chapter numbering. But they are primarily display-oriented rather than data-oriented. Plus, there is no instant feedback or instant modification. In fact, additional applications are required to translate a DTP rendition file to a presentation format that is human-readable, including Device Independent (DVI), PostScript (PS), Portable Document Format (PDF), and HyperText Markup Language (HTML). After that, a specific viewer application (for example, a DVI viewer) must be activated to view the final results. Only after the document file is displayed can you finally print it.
Word processor applications. From a user interface perspective, word processor applications are an improvement over desktop publishing applications. 
These applications create renditions, but they provide a nicer user interface to create and manipulate them. The interface is designed to look similar to the presentation: the finished paper product or screen layout. That’s why using these applications is called What You See Is What You Get (WYSIWYG, pronounced “wizzywig”) publishing. WYSIWYGs are becoming more sophisticated and more like desktop publishing applications. But WYSIWYG applications have their drawbacks, too: They still don’t have all the sophistication of desktop publishing applications, especially when it comes to rendering mathematical or scientific expressions. Also, with most, their features are generally inserted into the document file with nonstandard markup codes. Plus, the codes are only visible to the application during processing, but not to the user during document
creation or editing. Thus, with most WYSIWYGs (there are exceptions), all you see in the user interface (that is, on-screen) is the effect of the codes that the applications have inserted, not the codes themselves. However, some WYSIWYGs allow you to save a document in HTML format, allowing those applications to compete with Web page design applications. So the codes have become somewhat more standard and, after a fashion, visible, but again, these applications emphasize data display rather than data semantics, and not all of them are platform-independent.
Simple text editors (also called plaintext applications). Simple text editors are scaled-down (compared to WYSIWYG) applications for creating plaintext documents. That is, they create documents consisting of ordinary text. After that, most do not allow you to apply even the simplest typesetting features, including page breaks. But the applications are small, use few resources, and come already installed on almost every operating system.
SGML: Parent of HTML and XML
With all these developments in their respective arenas, two facts became apparent:
■■
For markup languages to be truly portable and useful to the myriad of eventual users in several environments on many networks around the world, a standard would have to be developed to list all acceptable and valid markup tags.
■■
Any eventual standard must clearly define the meaning and syntax of the markup tags.
Further, any information intended for public use could not be proprietary. That is, it couldn’t be restricted to one technology and certainly not to one make, model, or manufacturer of such a technology. In addition, public-oriented information should be in a form that could be reused in many different ways to optimize time and effort. Proprietary data formats, no matter how well documented or publicized, would be unacceptable. To meet the challenge, the American National Standards Institute (ANSI), in 1978, established the Computer Languages for the Processing of Text committee, which began work on a standard text description language. That language was to be based on GML. The GenCode committee of the GCA provided several
people (including the aforementioned Mr. Tunnicliffe; he would come to play a vital role) dedicated to the task of developing the SGML standard. Mr. Goldfarb, one of the three original GML creators, also participated. The committee published its first working draft of the Standard Generalized Markup Language standard in 1980. After several drafts, the United States Internal Revenue Service and the United States Department of Defense adopted SGML. Many national and international organizations, notably other defense organizations in North America and elsewhere, subsequently adopted SGML, too. By 1984, the SGML project had also been authorized by the International Organization for Standardization (ISO), which established its own SGML development team; however, alignment between the ANSI and ISO teams was maintained by Mr. Goldfarb, who served as project editor for both. In 1986, the Standard Generalized Markup Language (ISO 8879:1986), an international standard describing markup for the structure and content of different types of electronic, machine-readable documents, was approved.
SGML was not designed as a document encoder on its own. Its power comes from its use as a standard by which other, more specific languages, tailored to the specific requirements of any organization or industry, can be developed. SGML became the overarching standard metalanguage and would be used to facilitate the creation of many derivative markup languages, most notably HTML and XML. The derivative languages and their respective documents may then be processed, without changes or losses, for varying purposes and in different forms by any appropriately written program that can process SGML. The documents might be transmitted or displayed on a PC, on laptop or handheld computers, in print, or via projection without fear of information being lost or misinterpreted.
SGML-related languages separate the three aspects of a typical document (the data structure, content, and style) and deal mainly with the relationship between structure and content. To that end, the concept of a separate but related document called the document type definition (or DTD, the subject of Chapter 4, “Document Type Definitions”), which had been born with GML, was formalized with SGML. SGML thus became an extremely powerful and extensible tool, and it led to the cataloging and indexing of data in many important and complex industries (examples: defense, as mentioned, plus medical, financial, aerospace, telecommunications, and entertainment). Table 1.2 lists several SGML-based languages.
Table 1.2 Examples of SGML-Based Languages

| LANGUAGE | DESCRIPTION |
| --- | --- |
| HyperText Markup Language (HTML) | Perhaps the most famous; used to create hypertext documents. In use over the World Wide Web since 1990. |
| Extensible Markup Language (XML) | Used to create other industry-specific or organization-specific languages; the scaled-down, Internet-oriented version of SGML. |
| Continuous Acquisition and Lifecycle Support (CALS) | Formerly called Computer-aided Acquisition and Logistics Support. More recently called Commerce at Light Speed. Used for documenting complex military equipment. |
| Text Encoding Initiative (TEI) | An international standard that allows libraries, museums, publishers, scholars, and others to represent texts for online research and teaching. |
| The Standard Music Description Language (SMDL) | Used to define timing and user-defined functions for pitches, chords, and instrumental and vocal sounds. |
| News Industry Text Format (NITF) | For describing information for the News Distribution Industry. |
A longer list of SGML-related languages can be found at the OASIS Cover Pages Web site at http://xml.coverpages.org/gen-apps.html.
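To make the idea of a document type definition a little more concrete, here is a minimal sketch in Python. It is not a DTD processor and does not use real SGML or DTD syntax; it only imitates the two requirements noted earlier, a list of acceptable tags and a rule for where each may appear. The element names (memo, to, from, body), the ALLOWED_CHILDREN table, and the find_violations function are all hypothetical, and the sketch parses a small well-formed XML fragment with Python's xml.etree.ElementTree rather than a true SGML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary (an assumption for illustration, not a real DTD):
# each element name maps to the set of child elements it may contain.
ALLOWED_CHILDREN = {
    "memo": {"to", "from", "body"},
    "to": set(),
    "from": set(),
    "body": set(),
}


def find_violations(xml_text):
    """Return messages for every tag the vocabulary does not permit."""
    root = ET.fromstring(xml_text)
    if root.tag not in ALLOWED_CHILDREN:
        return [f"unknown root element <{root.tag}>"]

    problems = []
    stack = [root]
    while stack:
        parent = stack.pop()
        allowed = ALLOWED_CHILDREN[parent.tag]
        for child in parent:
            if child.tag in allowed:
                # Known element in a permitted position: check its children too.
                stack.append(child)
            else:
                problems.append(
                    f"<{child.tag}> is not allowed inside <{parent.tag}>"
                )
    return problems


good = "<memo><to>Reader</to><from>Author</from><body>Hello</body></memo>"
bad = "<memo><to>Reader</to><blink>Hello</blink></memo>"

print(find_violations(good))
print(find_violations(bad))
```

The rules live in a table that is separate from the documents being checked, which is the same separation a DTD provides: the documents carry structure and content, and the definition says which structures are valid.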
The following is an example of SGML markup. Notice that start tags and end tags (terms enclosed in angle brackets and located before the beginning and after the end of the data, respectively) are used to identify elements (contrast that to GML’s left-flush tags). Everything from the start tag to the end tag, including those tags, is part of the element.
brackets, may provide a clue. You’ll find that, as you progress through XML in 60 Minutes a Day, analyzing XML files becomes easier.
4. When you are finished examining the file, click File, Exit, or simply click the Close (X) button in the top right corner.
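The start tags, end tags, and elements you look for when examining a markup file can also be picked out programmatically. The following is a rough sketch in Python, using a regular expression rather than a real SGML or XML parser, so it handles only simple, well-behaved markup. The greeting element, the sample string, and the function names list_tags and element_text are assumptions made up for this illustration.

```python
import re

# Matches a start tag such as <greeting> or an end tag such as </greeting>.
# Group 1 is "/" for an end tag (empty for a start tag); group 2 is the name.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*>")


def list_tags(markup):
    """Return (kind, name) pairs for each start or end tag, in document order."""
    tags = []
    for match in TAG_PATTERN.finditer(markup):
        kind = "end" if match.group(1) else "start"
        tags.append((kind, match.group(2)))
    return tags


def element_text(markup, name):
    """Return the first element called `name`, including its own tags.

    This simple sketch does not handle an element nested inside another
    element of the same name. Returns None if no such element is found.
    """
    escaped = re.escape(name)
    pattern = re.compile(rf"<{escaped}\b[^>]*>.*?</{escaped}>", re.DOTALL)
    match = pattern.search(markup)
    return match.group(0) if match else None


# Hypothetical sample markup, invented for this example.
sample = "<memo><greeting>Hello, world</greeting></memo>"

print(list_tags(sample))
print(element_text(sample, "greeting"))
```

Note that element_text returns the tags together with the data between them, which reflects the point made above: the element is everything from the start tag to the end tag, inclusive.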
Lab 1.3: Visit Some Web Sites to See How Many Use XML
In this lab, you will check various Web sites to see if they use XML extensively or whether they have incorporated just some XML.
1. We’ll first visit the W3C Web site, where, we can safely wager, we’ll find Web pages containing XML coding.
a. From your Windows Desktop, click Start; then scroll up to Programs, and click your Web browser application (it might be
Internet Explorer, Netscape Navigator, Mozilla, Opera, or any of a number of available browsers). For this exercise, let’s presume you are using Internet Explorer.
b. In the locator bar, type the following and then press Enter:
http://www.w3.org
2. When the home page for the World Wide Web Consortium’s Web site appears, go up to the toolbar and press and hold View; then scroll down to Source, and release the mouse button there. A new window appears, which shows you the beginning of the code that went into creating the W3C site. The first couple of lines will look like this: